Monday, November 20, 2023

Thousands of Australian UFO newsclippings - downloaded, organised and assimilated into wider collection using AI-generated code [AI assisted ufology]

Last year, I uploaded a few hundred scanned Australian newspaper clippings about UFOs. In the last few weeks, I have explored the (impressive) ability of Artificial Intelligence tools to generate code to find further UFO newspaper clippings from various Australian sources and organise them so that they can be assimilated into my wider collection of UFO material. As a result, I added over 3,500 Australian newspaper clippings to my online UFO collection this weekend. More significantly, now that relevant tools have been developed, it should be relatively easy to add thousands more from Australia (or apply the same technique to combine similar UFO newspaper clipping collections from dozens of other countries with previous scans and share them online).

The current Australian collection is at:

I've previously done various projects in relation to UFO newspaper clippings. For example, a few years ago I shared a basic list of over 60,000 scanned UFO newspaper articles that were then in my collection (and shared a sample of that collection, from the 1980s). More recently, I uploaded - with permission from Rod Dyke - scans of the "UFO Newsclipping Service" (1969-2011) and - with permission from Ron and Richard Smotek - scans of a similar service offered by the Aerial Phenomenon Clipping Information Center ("APCIC") (1970s-1990).

Artificial Intelligence tools now allow very rapid reorganisation of UFO material from numerous sources, so it is now relatively simple to assimilate UFO newspaper clippings from online databases, scrapbooks of UFO researchers, official documents (e.g. the Australian files that I've uploaded as PDFs over the years, working with Keith Basterfield and Paul Dean - see HERE), offline UFO databases, archives of UFO groups/researchers and digitised UFO material. 

These UFO newspaper clippings from various sources can all be organised so that they can, in turn, then be assimilated into a wider collection of UFO case files, official UFO documents, UFO magazines, PhD dissertations regarding UFOs, UFO databases, discussion forum posts and related emails/correspondence.  

As a further little case study, this weekend I uploaded a few thousand further Australian newspaper clippings to my free online UFO archive (kindly hosted by the AFU in Sweden). These are being combined with material from the AFU's offline archive in Sweden plus collections of newspaper clippings from various Australian researchers (including collections put together by Anthony Clarke and Judith Houston McGinness - with their kind permission).

I picked Australia for this little case study due to the existence of a huge, free online database of Australian newspaper stories: Trove.   

Various Australian UFO researchers have previously highlighted the existence of Trove in blog posts, including posts by Keith Basterfield and Paul Dean. The Trove newspaper archive includes a huge number of Australian newspaper stories. Unfortunately, it is not easy to find a comprehensive online collection of the UFO newspaper clippings available on Trove (or any collection supplemented by further UFO newspaper clippings from other sources, such as those mentioned above).

Searching Trove can be slightly frustrating. For example, a search of the content of articles on Trove for "UFO" finds _many_ articles from long before 1947 (i.e. before the modern UFO era, and before the term "UFO" was coined).  Some of those early newspaper articles have been scanned poorly, so the text as recognised by the Trove software is basically a collection of random letters. By chance, those random letters include the letters "UFO" in hundreds of articles (e.g. in a line of text which is recognised by the Trove system [wrongly] as being "adfr AWTA hAWrhyu UFO akaRF jsD AlE").

Very brief search terms such as "UFO" therefore generate hundreds (if not thousands) of false positive results which would have to be weeded out if the collection is to be limited to just articles relating to UFOs.  
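
As a rough illustration of how such false positives might be weeded out automatically once the OCR text has been downloaded, here is a minimal sketch. It is not part of the code used for the uploads described in this post. The tiny word list, the 30% threshold and the function names are all hypothetical; a real filter would need a proper dictionary and some tuning.

```python
import re

# Hypothetical, deliberately tiny word list; a real filter would load a full dictionary.
COMMON_WORDS = {
    "the", "a", "of", "in", "and", "was", "to", "over",
    "sky", "seen", "object", "light", "flying", "reported",
}


def looks_like_ocr_noise(text, min_ratio=0.3):
    """Return True if fewer than min_ratio of the words are common English words."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    if not words:
        return True
    known = sum(1 for word in words if word in COMMON_WORDS)
    return known / len(words) < min_ratio


def ufo_hit_is_plausible(text, window=40):
    """Check whether any standalone upper-case 'UFO' sits in readable surrounding text."""
    for match in re.finditer(r"\bUFO\b", text):
        start = max(0, match.start() - window)
        context = text[start:match.end() + window]
        if not looks_like_ocr_noise(context):
            return True
    return False


if __name__ == "__main__":
    # The garbled line is the example quoted earlier in this post.
    print(ufo_hit_is_plausible("adfr AWTA hAWrhyu UFO akaRF jsD AlE"))
    print(ufo_hit_is_plausible("A UFO was seen in the sky over the town"))
```

A check like this only looks at the words near each hit, so it would not help with readable articles that mention "UFO" in an irrelevant context (such as film reviews).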

At the other extreme, it is possible to search Trove for articles which readers have tagged with the label "UFO".  This search returns far less irrelevant material (almost 100% of the results are relevant), but it finds only a small fraction of the relevant material (say 1%). At present, most of the articles in Trove that may be of interest to UFO researchers have not been tagged.

So, the challenge is either:
(1) To find the time to weed out irrelevant results from wider search terms, or
(2) To find search terms/restrictions which result in only (or at least almost entirely) material which is relevant to UFO research.

I don't have the time for (1), so working on my own the only real option is (2).

Fortunately, it is possible to come up with search terms and restrictions which greatly reduce the amount of irrelevant material while finding far more relevant material than just the articles currently tagged with a label such as "UFO".  

To help find useful search terms (and to archive material which is found), I found it useful to download all the articles found as a result of a search and then to glance through the articles offline (which is much faster than reviewing them online). In particular, I used AI software to generate code to download all the search results, then manually reviewed the folder of PDFs for each of those searches, setting the view in Windows Explorer to include a preview pane on the right hand side of the screen. This allowed relatively rapid review of the PDFs to determine if the results were largely relevant or whether they included a lot of irrelevant material.

I think it would be useful to have a discussion of the pros and cons of different potential search terms, with at least a qualitative discussion of actual experiments with those search terms. I don't recall seeing this done within ufology so far.  For example:

The term "UFO", as indicated above, generated far too many irrelevant results due to random letters being perceived by relevant OCR software - particularly in poor scans of earlier newspaper articles (e.g. articles from the 1800s).

The term "UFO" could be combined with another search term or collection of alternative search terms, e.g. searching for "UFO" AND (light OR mysterious OR unidentified OR flying OR sighting OR sighted).  Unfortunately, pre-1952, most of the results were irrelevant (with, say, less than 10% being relevant).   Most were hits for the word "unidentified" in poorly scanned articles from the 1800s with lots of random characters that happen to include the three consecutive letters "UFO".  Post 1952, the percentage rises to, say, about 50% relevant - with fewer poor scans with random characters, but quite a lot of the hits for the keyword "UFO" are in reviews of science fiction books and movies.

Better than "UFO" was a search for "Unidentified" AND "Flying" AND "Object".  I'd estimate that about 95% of results after 1947 were relevant. However, there were a surprisingly large number of results prior to 1947, only about 10% of which were relevant. One possibility would be devising searches that use terms combined with date restrictions e.g. "Unidentified" AND "Flying" AND "Object" but only in relation to articles dating from, say, 1947 or later.
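
To make the idea of combining terms with a date restriction concrete, here is a minimal sketch that applies the same logic offline to articles that have already been downloaded. It does not use Trove's own search syntax. The record layout (a dict with "year" and "text" keys), the function names and the sample records are all hypothetical placeholders.

```python
def matches_all_terms(text, terms):
    """Return True if every term appears somewhere in the text (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def filter_articles(articles, terms, earliest_year=1947):
    """Keep articles dated earliest_year or later that contain all of the terms."""
    return [
        article
        for article in articles
        if article["year"] >= earliest_year
        and matches_all_terms(article["text"], terms)
    ]


if __name__ == "__main__":
    # Made-up placeholder records, not real Trove articles.
    sample_articles = [
        {"year": 1893, "text": "An unidentified flying object struck the barn roof"},
        {"year": 1952, "text": "Unidentified flying objects were reported over the city"},
        {"year": 1960, "text": "Flying club holds its annual meeting"},
    ]
    kept = filter_articles(sample_articles, ["unidentified", "flying", "object"])
    print([article["year"] for article in kept])
```

The substring test is deliberately loose (so "object" also matches "objects"), which mirrors the way the searches described above picked up plurals.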

Turning from "UFO", some other terms resulted in some interesting articles being found but too many irrelevant articles for the search to be relied upon alone (without manual intervention). For example, Trove had about 200 hits for "strange lights in the sky" (i.e. a much smaller number than for UFO or flying saucer).  This found articles containing both "lights" in the plural ("strange lights in the sky") and light in the singular ("strange light in the sky"). However, a relatively high percentage were irrelevant or uninteresting.

By a significant margin, the most productive search term was "flying saucer" (the results for which included the plural, "flying saucers", in addition to the singular).  This resulted in about 2,000 hits, mainly from the 1940s-1950s, almost all of which were relevant.  Due to the high rate of relevancy, the numerous results of this particular search probably do not require much (if any) manual intervention, and this is the most promising search term to be rolled out and applied to databases of newspaper articles from other English-speaking countries.

I'll set out below the code I generated using AI tools to download material from Trove (which includes a line stating the URL for the relevant search results to be downloaded, which obviously has to be changed for each different search term used):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
import requests
import os
import time
import datetime
import re

MAX_RETRIES = 100  # Number of retries for downloading an article and for loading search result pages

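def safe_click(driver, element):
    # Scroll the element into view, then try a normal click;
    # fall back to a JavaScript click if the normal click fails
    driver.execute_script("arguments[0].scrollIntoView();", element)
    try:
        element.click()
    except Exception:
        driver.execute_script("arguments[0].click();", element)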

def sanitize_filename(filename):
    invalid_chars = ['<', '>', ':', '"', '/', '\\', '|', '?', '*']
    for char in invalid_chars:
        filename = filename.replace(char, '_')
    return filename

def download_pdf_from_search_result(driver, search_result_url, save_dir, max_retries=MAX_RETRIES):
    retries = 0
    success = False

    while retries < max_retries and not success:
        try:
            # Open the article page (this step was missing from the listing; restored as an assumption)
            driver.get(search_result_url)
            WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="grp2Download"]/span[1]')))
            # Construct the filename dynamically
            filename = construct_filename_dynamic(driver)
            # The rest of the download steps
            download_icon = driver.find_element(By.XPATH, '//*[@id="grp2Download"]/span[1]')
            safe_click(driver, download_icon)
            pdf_option = driver.find_element(By.XPATH, '//*[@id="articlePdfLink"]')
            safe_click(driver, pdf_option)
            change_size_btn = driver.find_element(By.XPATH, '//*[@id="articleImageQualityShow"]')
            safe_click(driver, change_size_btn)
            largest_checkbox = driver.find_element(By.XPATH, '//*[@id="inlineRadio5"]')
            safe_click(driver, largest_checkbox)
            create_pdf_btn = driver.find_element(By.XPATH, '//*[@id="downloadModal"]/div/div/div[3]/a[10]')
            safe_click(driver, create_pdf_btn)
            time.sleep(10)  # Wait for the "View PDF" button to appear/change after the "Create PDF" button is clicked
            view_pdf_btn = driver.find_element(By.XPATH, '//*[@id="downloadModal"]/div/div/div[3]/a[10]')
            pdf_url = view_pdf_btn.get_attribute('href')
            # Fetch the PDF content
            response = requests.get(pdf_url, stream=True, timeout=60)  # Add a timeout for the request
            # Sanitize the filename
            safe_filename = sanitize_filename(filename)
            # Download the PDF in chunks
            with open(os.path.join(save_dir, safe_filename), 'wb') as pdf_file:
                for chunk in response.iter_content(chunk_size=1024):
                    if chunk:
                        pdf_file.write(chunk)

            success = True
        except (TimeoutException, requests.exceptions.ConnectionError) as e:  # Handle both timeout and connection errors
            retries += 1
            print(f"Attempt {retries} failed due to {str(e)}. Retrying...")
            # Wait longer after each failed attempt, up to a maximum of 600 seconds
            wait_time = min(5 + retries * 2, 600)
            time.sleep(wait_time)
    if not success:
        print(f"Failed to download article from {search_result_url} after {max_retries} attempts.")

def construct_filename_dynamic(driver):
    page_source = driver.page_source
    soup = BeautifulSoup(page_source, 'html.parser')

    source_element = driver.find_element(By.XPATH, '//*[@id="breadcrumb-c"]/ul/li[4]/a')
    source = source_element.text.strip().split('(')[0].strip() + " (Australia)" if source_element else "Browse (Australia)"

    title_element = driver.find_element(By.XPATH, '//*[@id="breadcrumb-c"]/ul/li[7]/a')
    title = title_element.text.strip().lower().title() if title_element else "UnknownTitle"

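    # Look for a date such as "Thu 10 Jul 1947" in the page text
    # (searching the whole page text is an assumption; the original call was lost from the listing)
    date_pattern = re.compile(r'(\w+ \d{1,2} \w+ \d{4})')
    date_match = date_pattern.search(soup.get_text())

    if date_match:
        date_info = date_match.group(1).split()
        day, month, year = date_info[1].zfill(2), date_info[2], date_info[3]
    else:
        day, month, year = "00", "UnknownMonth", "0000"

    month_mapping = {
        'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06',
        'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'
    }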
    date_string = f"{year} {month_mapping.get(month, '00')} {day}"

    filename = f"{date_string}_{source} - {title}.pdf"
    return filename

def process_search_results_page(driver, search_results_url, save_dir, start_article=1, end_article=397, articles_per_page=20):
    # Calculate starting page and ending page
    start_page = (start_article - 1) // articles_per_page + 1
    end_page = (end_article - 1) // articles_per_page + 1

    # Generate the URL for the starting page
    start_pos = (start_page - 1) * articles_per_page
    initial_url = f"{search_results_url}&startPos={start_pos}"

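    driver.get(initial_url)

    # Add a short delay to let the page load and to see if the popup appears
    time.sleep(5)  # The length of this delay is an assumption; the original value was lost from the listing

    # Try to close the popup
    try:
        close_popup = driver.find_element(By.XPATH, '//*[@id="culturalModal___BV_modal_footer_"]/div/div/div[2]/button/span')
        safe_click(driver, close_popup)
        time.sleep(2)  # Give it a moment to close
    except Exception as e:
        print(f"Error closing popup: {e}")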

    article_counter = start_article - 1
    page_count = start_page

    while page_count <= end_page:
        # Extract and process articles on the current page
        print(f"Attempting to extract articles from page {page_count}...")
        article_links_elements = driver.find_elements(By.XPATH, "//a[contains(@href, '/newspaper/article/')]")
        article_links = list(set([link_elem.get_attribute('href') for link_elem in article_links_elements]))

        print(f"Found {len(article_links)} articles on the current page.")

        article_num = 0  # Counter to keep track of which article is being processed on the page

        for link in article_links:
            article_num += 1
            print(f"Processing article number {article_num} with URL {link}...")

            try:
                download_pdf_from_search_result(driver, link, save_dir)
            except Exception as e:
                print(f"Error processing article number {article_num} with URL {link}. Error: {e}")

            article_counter += 1
            print(f"Processed {article_counter} articles.")

        print(f"Current article count after processing page: {article_counter}")

        # Check if we need to navigate to the next page
        if article_num % articles_per_page == 0:
            print("Verifying if we need to navigate to the next page...")
            print(f"article_num: {article_num}, articles_per_page: {articles_per_page}, Modulus result: {article_counter % articles_per_page}")
            for i in range(MAX_RETRIES):
                next_page_url = f"{search_results_url}&startPos={article_counter}"
                driver.get(next_page_url)
                time.sleep(min(5 + i*2, 600))  # Wait time increases with each retry, maxing out at 600 seconds (10 minutes)

                # If articles are found on the page, break out of the retry loop
                article_links_elements = driver.find_elements(By.XPATH, "//a[contains(@href, '/newspaper/article/')]")
                if article_links_elements:
                    break
                else:
                    print(f"Retry {i+1}: No articles found on the new page. Retrying...")

            page_count += 1
        else:
            # A partly filled page means this was the last page of results
            print("No more articles to process or reached the limit.")
            print(f"Processed {page_count - start_page + 1} pages.")
            break

# Script execution starts here
if __name__ == '__main__':
    driver = webdriver.Chrome()
    save_directory = 'e:/temp/Trove/testdown'

    # URL of the search results page
    search_results_url = ""

    # For processing articles 
    process_search_results_page(driver, search_results_url, save_directory, start_article=1, end_article=422)
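
Since the file names built by construct_filename_dynamic begin with the date in the form "YYYY MM DD_", a folder of downloaded PDFs can be summarised by decade without opening any of them. The following is only a sketch of that idea; the function name and the sample file names are hypothetical, and files with the fallback date "0000 00 00" are counted as unknown.

```python
import re
from collections import Counter

# Matches the "YYYY MM DD_" prefix used by the file names in the script above.
FILENAME_DATE = re.compile(r"^(\d{4}) (\d{2}) (\d{2})_")


def count_by_decade(filenames):
    """Tally file names by decade, based on their date prefix."""
    counts = Counter()
    for name in filenames:
        match = FILENAME_DATE.match(name)
        if not match or match.group(1) == "0000":
            counts["unknown"] += 1
            continue
        decade = int(match.group(1)) // 10 * 10
        counts[f"{decade}s"] += 1
    return counts


if __name__ == "__main__":
    # Made-up file names for illustration; in practice these would come from os.listdir(save_directory).
    names = [
        "1947 07 08_Example Paper (Australia) - Flying Saucers Seen.pdf",
        "1954 01 05_Example Paper (Australia) - Saucer Report.pdf",
        "0000 00 00_Browse (Australia) - UnknownTitle.pdf",
    ]
    for decade, total in sorted(count_by_decade(names).items()):
        print(decade, total)
```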


Thursday, November 16, 2023

ATS - 411 selected UFO threads archived as PDFs - some thousands of pages long

I've now archived over 400 selected UFO threads from ATS as searchable PDFs.

I have not checked the total number of pages in the 411 PDFs uploaded so far, but one thread is over 3,000 pages long in PDF format.

ATS was a popular forum until the last few years. Recently, all users were unable to log in until some volunteers (particularly "Djarums") worked to re-admit at least some ATS members (including me). The new owner of ATS did not appear to be able to [or even attempt to] solve the problem, which seems to cast considerable doubt on the future viability of ATS.

Several other fellow members of ATS are working on archiving at least the text of numerous threads on ATS. I don't plan on uploading such an archive, at least unless and until ATS does go offline - but at least the material should be preserved. ATS had some of the most extensive online discussions regarding UFOs and conspiracy theories in the period from, oh, about 2000-2015 (prior to the current popularity of Facebook and Twitter).

(I'd estimate the total number of pages from ATS archived so far, if I converted them all to PDFs, would be at least several million pages of material).



Saturday, November 4, 2023

ATS ("AboveTopSecret") dying? Threads by Karl12 - PDFs added to online archive

ATS ("AboveTopSecret") was a very popular UFO / conspiracy discussion forum until the rise of Facebook, Twitter and other modern social media.  ATS has been in a downward trajectory for a few years due to that competition.  During the last week or two, ATS has appeared to teeter on the brink of collapse. All users were locked out of their accounts. The current owner appears to have disappeared and failed to sort things out. It seems that it's only due to the work of a few active volunteers on ATS, particularly "Djarums", that access has been restored for some members. 

It's rather unclear if ATS will survive for much longer.

I've archived about 100 of my more substantial UFO threads from ATS as searchable PDF documents.  I've also archived over 60 UFO threads by Karl12 after he gave me permission to do the same with his UFO threads.  

I'm tempted to widen the archiving effort of UFO threads from ATS given the recent failures in relation to ATS and the risk that it may go down permanently, possibly soon.  I did want permission from the new owner (and I've requested permission on ATS several times during the last few years, without any objection but no clear consent - although some of the moderators of ATS have helped me develop code to archive ATS threads as searchable PDFs...).    

The archiving code that I've developed (with help from ChatGPT and some other members of ATS, particularly "Drewlander") iterates through a list saved in a file called "thread_details.csv" and creates a PDF of each thread in that list (as in the samples at the link above). That file listing threads can be used to store a list of relevant threads, in the format:

THREADNUMBER,NUMBER OF PAGES,AUTHOR - BRIEF TITLE  [with no spaces after each comma]

e.g. :

1308154,14,Karl12 - New And Revised UFO Quote Directory

1278525,3,Karl12 - Highly Dubious USAF UFO Explanations

1231535,4,Karl12 - Early UFO Saucer Reports

841422,5,Karl12 - UFOs and falling leaf or pendulum motion

1171896,3,Karl12 - UFOs and Colour Change

878723,7,Karl12 - Electromagnetic Effects Associated with UFOS

460705,4,Karl12 - UFO OVNI Shapes

1233389,6,Karl12 - UFO Time Anomaly Research

505080,9,Karl12 - Unusual reports of UFOs taking on water

1261532,7,Karl12 - UFO Faerie Lore Connection

1273353,3,Karl12 - UFOs And Stopped Clocks

1271101,2,Karl12 - UFO Animal Reaction Research

1263051,3,Karl12 - UFO  Cryptid Research

898220,6,Karl12 - UFO Light Beam Cases

1286201,3,Karl12 - UFO Pilot Under-Reporting Bias

900175,5,Karl12 - Missing Gun Camera UFO Footage

513308,10,Karl12 - UFO Government Documentary Evidence - Greenewald
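
For anyone preparing a list of requests, here is a minimal sketch of how lines in that format can be read and checked before they go into "thread_details.csv". This is not the archiving code itself; the ThreadEntry type and the function names are hypothetical. Each line is split on the first two commas only, so a comma inside a title would not break the parsing.

```python
from typing import NamedTuple


class ThreadEntry(NamedTuple):
    thread_number: int
    pages: int
    author: str
    title: str


def parse_thread_line(line):
    """Parse one 'THREADNUMBER,NUMBER OF PAGES,AUTHOR - BRIEF TITLE' line."""
    number, pages, label = line.strip().split(",", 2)
    # Split on the first " - " only, so titles containing " - " stay intact.
    author, _, title = label.partition(" - ")
    return ThreadEntry(int(number), int(pages), author.strip(), title.strip())


def parse_thread_list(lines):
    """Parse every non-blank line of a thread list."""
    return [parse_thread_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = [
        "1308154,14,Karl12 - New And Revised UFO Quote Directory",
        "",
        "513308,10,Karl12 - UFO Government Documentary Evidence - Greenewald",
    ]
    entries = parse_thread_list(sample)
    for entry in entries:
        print(entry.thread_number, entry.pages, entry.author, entry.title)
    print("Total forum pages:", sum(entry.pages for entry in entries))
```

A line with a missing field or a non-numeric page count raises a ValueError, which is a quick way to spot formatting slips before a long archiving run.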

I posted the relevant code and a brief bit of background in a blog post a while ago (before the most recent technical/hacking problems with ATS):

If other members of ATS want (and, ideally, give their permission), I can expand this archive.  If anyone wants particular threads added, it would be helpful if they provided their list of requests in the same format as that used to list threads above (since I can then quickly paste that list into a file to be iterated by the code I've developed).

Sunday, September 24, 2023

PDF: "UFOs - A British Viewpoint" - book by Jenny Randles and Peter Warrington

Further to my recent uploads of several UFO books by veteran researcher Jenny Randles, I have now uploaded a searchable copy of "UFOs : A British Viewpoint" co-authored by Jenny Randles and Peter Warrington (with the kind permission of both authors).

This was the first book written by Jenny Randles. It was written in 1976 to 1978 and published in August 1979.

It was also the first book in which Jenny Randles (with Peter Warrington) stated that we "really ought to redefine our terms and speak of a UAP" and then define UAPs.   Jenny recently stated to me that using "UAP" instead of "UFO" was literally the first thing they wrote about in this book as "definition of the terms of what you are investigating is always the most important starting point of research".

As indicated by the title of the book, it is focused on British sightings, but the analysis of different categories of UFO reports and of different theories regarding them is more general.

(I have a particular soft spot for this book as it was one of the first sensible books about UFOs that I bought...).

Jenny Randles and Peter Warrington, photographed in 1976 when they had just started writing this book:

Monday, September 18, 2023

Dozens more files from Grant Cameron's UFO research archives - Bob Pratt / National Enquirer, Frank Scully, Charlie Red Star, Donald Keyhoe, Roswell, Rahma and more

Grant Cameron has been busy scanning his UFO research files and has allowed me to upload them to my online archive, in a folder in his name.

The latest upload contains over 60 further files provided by Grant, including correspondence, research notes and other material relating to Bob Pratt / National Enquirer, Frank Scully, Charlie Red Star, Donald Keyhoe, Roswell, Rahma and more.

More from Grant's archives will be online in the next few days (together with, if I have a bit of luck confirming relevant permissions, further documents from other archives in the USA and Russia, and publications from skeptical organisations across Europe).

Friday, September 15, 2023

PDFs - "The Skeptic" (UK), first 100 issues : Largest single upload of skeptical material?

I think this may be the largest single upload by me so far (or anyone else, for that matter...) of material by prominent skeptics from Britain and around the world. 

I have now uploaded the first 100 issues of the British magazine "The Skeptic", thanks to scanning by our friends at the AFU in Sweden. After my pestering some of "The Skeptic" team since 2018 (politely, I hope), the current and former editors of "The Skeptic" (i.e.  Michael Marshall, Wendy Grossman, Deborah Hyde and Chris French) kindly indicated this year that they had no objection to this upload. 

As summarised on the website of "The Skeptic", that publication is "the UK’s longest-running publication offering skeptical analysis of pseudoscience, conspiracy theory and claims of the paranormal". It was founded in 1987 and continues to be published. It "has become an invaluable resource for journalists, teachers, psychologists, and inquisitive people of all ages who yearn to discover the truth behind the many extraordinary claims of paranormal and unusual phenomena".

This sizeable upload joins a host of skeptical material in my online archive hosted by the AFU.

For ease of reference, the skeptical material currently in my online archive is listed below (this material being in addition, of course, to the mass of ufological / Fortean material from other viewpoints that I have uploaded to that archive in recent years with the help of well over 100 UFO groups / researchers around the world and, of course, permission from the relevant individuals):

(1) "Skeptic UFO Newsletter" by Philip Klass 

(2) John Rimmer's "Magonia"

(3) Ian Mrzyglod's "Probe"  

(4) Tim Printy's "SUNlite" 

(5) "Tampa Bay Skeptics Report"

(6) "North Texas Skeptic"

(7) "South Shore Skeptic"

(8) "Arizona Skeptic"

(9) "Phoenix Skeptics News"

(10) "Georgia Skeptics" newsletter

(11) "Bay Area Skeptics Information Sheet"

(12) "Cincinnati Skeptic"

(13) "REALL News"  (Rational Examination Association of Lincoln Land)

(14) "Skeptical Eye" (National Capital Area Skeptics, NCAS)

(15) "Shadow of a Doubt" (National Capital Area Skeptics, NCAS)

(16) The book "Flying Saucerers" by David Clarke and Andy Roberts ("A social history of Ufology")

(17) The book "The UFOs That Never Were" by Jenny Randles, Andy Roberts and David Clarke (in my view arguably the best UFO book in the last 50 years...)

(18)   a PDF archive of about 10,000 pages of Tweets by skeptics Mick West and Charlie Wiser

(19) a PDF archive of over 4 million pages of automated transcripts of UFO podcasts and videos, including those by several skeptics e.g. Mick West

Incidentally, after I uploaded one of the skeptical newsletters mentioned above, I was blocked by one or two UFOtwitter users that I had never interacted with (or mentioned and, frankly, rarely heard of). This large upload of further skeptical material is, in part, my response to those individuals...  :) 

Saturday, September 9, 2023

Canadian section of Grant Cameron's archives

Grant Cameron has kindly provided me with scans of numerous UFO documents, articles and notes relating to Canada.

I have added searchable copies of those scans to the folder I created for material from Grant's research archives.

The Canadian government documents in this upload overlap with the large set of PDFs that I uploaded in 2011-2012 (outlined in my items at that time entitled "Canadian disclosure: “UFO Found” and other documents/photos" and "Canadian PDFs – “At no time should it be made available to the public” + more official memos") and my subsequent items / uploads.

Grant's collection complements the official Canadian files previously available, since his research archives include further Canadian documents, press articles, research notes, and other material.

The scans I have currently added to that folder from Grant's collection are:

1976 Ontario Triangles.pdf

1976 Triangles Case.pdf

1991 Canadian UFO survey pilots.pdf

1991 Canadian UFO survey.pdf

2023-07-25 CSiS Five Eyes letter.pdf

Arthur Bray Adamski.pdf

Brian Parks Arthur Bray.pdf

Canadian Embassy correspondence.pdf

Canadian Government Documents 1.pdf

Canadian Government Documents 2.pdf

Canadian Government Documents 3.pdf

Canadian Government Documents 4.pdf

Canadian Government Documents 5.pdf

Canadian Government Documents 6.pdf

Donald Keyhoe Wilbert Smith.pdf

Inside circle member Arthur bray phone notes.pdf

Manitoba  UFO  polls.pdf

Manitoba Cases and Notes 1.pdf

Manitoba Shirtliffe Sighting.pdf

Manitoba UFO  Material 2.pdf

Manitoba UFO Material 1.pdf

Manitoba UFO Stuff.pdf

Musgrave RCMP quote.pdf

My Father-s sighting.pdf

Ottawa Flying Saucer interplanetary society.pdf

Project Magnet 1.pdf

Project Magnet 2.pdf

Project Magnet 3.pdf

Project Magnet 4.pdf

Project Magnet Official Status 1.pdf

Project Magnet official status 2.pdf

Smith contactee.pdf

Smith contactee_page_5.pdf

Wilbert Smith draft article.pdf

Wilbert Smith GeBauer Correspondence.pdf

Wilbert Smith Mrs Swan Project Magnet_page_3.pdf

Wilbert Smith Quotes.pdf