Tuesday, September 20, 2022

Another 238,465 PDF pages added to UFO archive : Open Minds Forum posts (8 million+ links replaced) - Aviary, SERPO, Nimitz, California Drone/CARET, Source A

The Open Minds Forum was a popular UFO Proboards discussion forum online until about the beginning of 2012.  It included numerous posts discussing UFO reports and Disclosure issues. It included several "Special Guests" boards featuring material from various UFO researchers, including John Lear, Ron Schmidt ("Zorgon"), Dan Smith, Angelia Joiner and others. Popular topics included SERPO, members of the Aviary, the California Drone / CARET photographs, Source A, and many other UFO incidents/issues.

I have now created a new archive of posts to the Open Minds Forum, with the kind blessing of its owner (Chris Iversen) and the encouragement and help of various other former members/administrators of the Open Minds Forum (including Brendan Burton, Lee Nicholson, Manuel Lamiroy and others). 

As part of creating this archive, I have merged two different incomplete archives and done various operations to find and replace over 8 million links within the archive. 

I've made the archive available both as:

(1) An HTML archive which can be browsed online; and 

(2) A PDF archive which can be downloaded for ease of offline archiving/searching.  The PDF archive currently includes 238,465 pages.  It can be searched offline (whether alone or as part of a larger search of scanned UFO books, magazines/newsletters and official documents) using free software such as PDF-XChange Editor, as I've outlined several times over the last decade as part of my UFO scanning project, e.g. in my post here. 

As usual, my upload is hosted on the website of Sweden's Archives For the Unexplained.

(The archive remains incomplete and some of the links still do not work. I'd be happy to do more work on this, but I think I would probably require at least several minutes of help from someone more experienced than me with Regular Expressions and html. In particular, I'd like to use a Regular Expression to find all the links that currently seek to use javascript, which are in the format "javascript:if(confirm)...", and replace them with working links. Of course, if I could get several minutes of help from a competent programmer then there are several other UFO mini-projects that could be completed very, very quickly...).
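As a possible starting point for that remaining task, here is a minimal Python sketch. It rests on an assumption that would need checking against the actual files: that each broken link still carries its real target in a trailing window.location='...' part, roughly href="javascript:if(confirm('...'))window.location='URL'". Only the "javascript:if(confirm" prefix is known from the archive; the rest of that shape is a guess, and the function name rewrite_js_links is purely illustrative.

```python
import re

# ASSUMPTION: each broken link looks like
#   href="javascript:if(confirm('...'))window.location='TARGET'"
# Only the "javascript:if(confirm" prefix is known from the archive;
# the rest of this shape is a guess to verify against real pages.
JS_LINK = re.compile(
    r"href=\"javascript:if\(confirm\(.*?\)\)window\.location='([^'\"]*)'\""
)


def rewrite_js_links(html):
    """Return (new_html, number_of_links_rewritten)."""
    # Keep only the captured target URL as a plain href.
    return JS_LINK.subn(r'href="\1"', html)
```

If the links in the archive turn out to have a different shape, only the pattern should need adjusting. The returned count makes it easy to confirm how many links were actually changed in each file.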



One former member and co-owner of the Open Minds Forum, Lee Nicholson, helpfully provided the following recollections of that forum:

"Like many, I joined the Open Minds Forum in 2006 to follow the developing "Project Serpo" story. Five fun filled years followed, and while "Serpo" failed to deliver the promised evidence, the forum went from strength to strength. I met and worked with lots of wonderful people over the years and made friendships which continue today."

"When the forum's 'doors' sadly 'closed' in Dec 2011, the site had recorded some 276,774 posts over 10,041 topic threads. Notable hoaxes such as Serpo, CARET/Drone and 'Source A' were exposed by our staff and membership. Historic cases like the Stephenville sightings were discussed in the pages. The late John Lear, Edgar Fouche and Dan Smith spent many hours debating Area 51, the TR-3B and the 'Best Possible World'. Our members scoured the internet for the 'smoking gun', discussed the merits of individual sightings, videos, documents, researchers and hypotheses."

"In 2010 Kevin Day joined the forum, under the pseudonym TheSeer, to post his short story about the Nimitz/Tic Tac incident, the significance of which wouldn't become apparent until 2017. We were able to confirm visits to the forum from various military and intel organisations and at one point our staff emails were being redirected to a field near Harrogate (RAF Menwith Hill/NSA) for some unknown purpose."

"Two of our staff members, Frank 'Doc' Andrews (SurferDoc) and Lilian Waters (NewYorkLily) have sadly passed away, but no doubt they too would be thrilled to see this huge body of UFO literature revived for the current generation of researchers. An epic task accomplished by respected researcher Isaac Koi and the Archives for the Unexplained (AFU). As a former member and co-owner at the Open Minds Forum, I can't thank them enough."

"Lee Nicholson, September, 2022"

The founder of the forum, Brendan Burton, also provided the following comments:

"I founded the forum and initially was the sole administrator. Indeed I had 'the keys' to the forum and bought the domain 'openmindsforum.com', later passing the domain to Chris to set up as a 'cooperative' venture. My motto was and still is "it's all Good". The rules were simple. All opinions were welcome as long as people were treated with dignity and respect. In that respect we were extremely popular and successful, we had only a few detractors. It was sad to learn of the passing of 'Doc' and I to this day hold my fellow admins with extremely high regard, and I dearly miss the good times we shared..." 
"An important note: in 'The TC' Comms the term 'Solar Warden' first arose and predates all mention anywhere else on the web."

To help others with any similar projects (and to remind myself when I do the next similar project...), it may help for me to set out some of the technical steps I followed (after quite a bit of experimentation and searching...).  These are probably very simple to those with sufficient programming experience, but I'm just a lawyer so I was rather pleased to get these steps to work. :) Fortunately, some of the steps I've previously set out in relation to my previous uploads of archives of pages of the huge UFOmind.com website and of posts to the Reality Uncovered forum overlapped with this task and gave me a bit of a head start, so I did not have to start from scratch with finding relevant software tools and learning relevant techniques (particularly some basic Regular Expressions).

First, I obtained and merged two incomplete archives of posts to the Open Minds Forum. 

One of those archives is the incomplete set of pages held by the Internet Archive's Wayback Machine. Sadly, that archive is very incomplete. Pages archived after about 7 January 2012 were simply placeholders stating that the forum had been removed for an alleged breach of Proboards' terms of service. In any event, that archive is not easily searched.  I used the Wayback Machine Downloader to download the earlier versions of the limited number of Open Minds Forum pages that had been archived on the Wayback Machine, using the steps below (which obtained about 10% of the material I have now uploaded):


1) Downloaded the RubyInstaller recommended at the top of rubyinstaller.org/downloads, then ran the downloaded exe file

2) Downloaded the zip file github.com/hartator/wayback-machine-downloader/archive/…

3) Unzipped the downloaded zip file

4) Used the Windows start menu to search for "Start command prompt with Ruby"

5) Followed the instructions at github.com/hartator/wayback-machine-downloader (i.e. copied and pasted "gem install wayback_machine_downloader" into the prompt and hit enter to install the program)

6) Followed the "Usage" guidelines from that github page, entering commands within the same Ruby command prompt. Due to the issues I mentioned above, it was important to limit the download to the latest archived version of the Open Minds Forum website up to, say, 7 January 2012 (since some webpages from this website are incomplete thereafter). This can be done by using the "to" qualifier in a relevant download command: wayback_machine_downloader http://lucianarchy.proboards.com/ --to 20120107

7) Found the relevant files at C:\Users\YOURusername\websites


A larger (but still incomplete) archive was helpfully brought to my attention by Manuel Lamiroy, at the link below:
http://openmindsforum.com/?fbclid=IwAR02OW1MhYwIVd1X3LpM6nqw_Rw-JnUz8P50f3e2AJzOOKsdi0kk1aZ80T0

This second archive overlaps with the Wayback Machine's archive, but has considerably more pages. I understand that this larger archive is a partial copy of the Open Minds Forum made by one of its users, "Fore". 

Unfortunately, these two archives had adopted different file naming formats, such as the 2 examples below:

(a) index.cgi-action=display&board=analysis&thread=6257&page=1

(b) index.cgi%3faction%3ddisplay%26board%3danalysis%26thread%3d7466%26page%3d1

Since these two archives included different (albeit overlapping) pages, I merged the two archives. To merge them, it was necessary to change the file name format used for every file in one of the archives.  I used the free Bulk Rename Utility to replace relevant filenames for thousands of html files. In particular, I used that software to make the following changes to the file names:

(1) Add a .htm suffix to each file

(2) Replace %3d with =

(3) Replace %3f with -

(4) Replace %26 with &
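For anyone who prefers a script to a renaming tool, the four changes above can be sketched in Python as below. This is only an illustration, not what I actually ran: the function names normalise_name and rename_all are made up for this example, and it assumes the encoded names use lower-case %3d, %3f and %26 exactly as in example (b). (A question mark is not permitted in Windows file names, which is presumably why %3f becomes - rather than ?.)

```python
from pathlib import Path

# The same literal replacements as steps (2) to (4) above.
REPLACEMENTS = [("%3d", "="), ("%3f", "-"), ("%26", "&")]


def normalise_name(name):
    """Convert a URL-encoded file name to the other archive's format."""
    for old, new in REPLACEMENTS:
        name = name.replace(old, new)
    # Step (1): add the .htm suffix (only if it is not already there).
    if not name.endswith(".htm"):
        name += ".htm"
    return name


def rename_all(folder, dry_run=True):
    """Rename every file in `folder`; with dry_run=True only print the plan."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(normalise_name(path.name))
        if target == path:
            continue
        if target.exists():
            print(f"skip (already exists): {target.name}")
        elif dry_run:
            print(f"{path.name} -> {target.name}")
        else:
            path.rename(target)
```

Running rename_all on a copy of the folder with dry_run=True first lists the planned renames without touching any files, which seems a sensible precaution with thousands of pages.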

Most of the links in these archives no longer worked. In some cases the links were in the wrong format. In many cases the link was, in any event, dead. To address these problems, I changed the format of most links in each webpage and also changed most external links to point to the archived copy nearest 7 January 2012 by adding a Wayback Machine prefix (https://web.archive.org/web/2012010751143/http... ; i.e. the date 20120107 followed by the digits 51143), as I discussed in my previous items at:

https://isaackoiup.blogspot.com/2021/10/realityuncovered-forum-searchable-pdfs.html
https://isaackoiup.blogspot.com/2022/03/glenn-campbells-ufomindcom-best-ufo.html

To make the necessary changes to links, I used the free "Notepad++" software (in particular, its "Find in Files" function to rapidly search and replace in a directory of about 50,000 separate html files). For some of the changes, it was useful to use Regular Expressions. The relevant changes included:

(1) Replace ?board with -board

(2) Replace index.cgi? with index.cgi-

(3) Replace index.cgi with index-cgi

(4) Replace http://www.lucianarchy.proboards.com/index with index

(5) Regular Expression replace thread=(.*?(?=")) with thread=\1.htm to find every string that begins with thread= and runs up to (but not including) the next inverted commas, and replace it with the same string plus .htm (effectively adding .htm to each internal link, so that the links work with the archived pages). Because the lookahead (?=") does not consume the closing inverted commas, the replacement does not need to add another one.

(6) Regular Expression replace &user=(.*?(?=")) with &user=\1.htm

(7) Regular Expression replace board=(.*?(?=")) with board=\1.htm

(8) Regular Expression replace index-cgi#(.*?(?=")) with index-cgi#\1.htm

(9) Replace .htm.htm with .htm

(10) Replace href="http with href="https://web.archive.org/web/2012010751143/http

One of these steps changed over 7 million links and several other steps involved over 0.5 million changes each, so the total number of links replaced is over 8 million.
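The ten replacement steps above could also be applied by a short script rather than by hand in Notepad++. The Python sketch below is only an illustration of the same sequence, applied to the text of one page. The function name fix_links is made up for this example. The .htm suffix is appended without adding an extra closing quotation mark, because the lookahead leaves the original one in place.

```python
import re

# Prefix from step (10); the timestamp is the one quoted in the text.
ARCHIVE_PREFIX = "https://web.archive.org/web/2012010751143/"

# Steps (1) to (4): plain text replacements, applied in this order.
LITERAL_STEPS = [
    ("?board", "-board"),
    ("index.cgi?", "index.cgi-"),
    ("index.cgi", "index-cgi"),
    ("http://www.lucianarchy.proboards.com/index", "index"),
]

# Steps (5) to (8): strings starting with these prefixes get .htm appended.
SUFFIX_PREFIXES = ["thread=", "&user=", "board=", "index-cgi#"]


def fix_links(html):
    """Apply steps (1) to (10) to the text of one archived page."""
    for old, new in LITERAL_STEPS:
        html = html.replace(old, new)
    for prefix in SUFFIX_PREFIXES:
        # Match from the prefix up to (not including) the next quotation mark.
        pattern = re.escape(prefix) + r'(.*?)(?=")'
        html = re.sub(pattern, lambda m: m.group(0) + ".htm", html)
    # Step (9): collapse the doubled suffix left when two patterns hit one link.
    html = html.replace(".htm.htm", ".htm")
    # Step (10): point the remaining (external) links at the Wayback Machine.
    html = html.replace('href="http', 'href="' + ARCHIVE_PREFIX + "http")
    return html
```

The order matters: step (4) turns internal links into relative ones before step (10) runs, so only external links receive the archive prefix. One thing to watch is that a link matched by three of the patterns would end up with three suffixes, in which case step (9) would need to be run a second time.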

Finally, I also created a PDF version of the archive. I started by using Adobe Acrobat's batch create file option to convert the html files into searchable PDFs. Unfortunately, after 24 hours less than 5% of the archive had been converted (suggesting that the full conversion would require more than 20 days).   I raised this problem on Twitter and two users (Taras Young and "NJR") recommended trying the free "wkhtmltopdf" software. I was able to create a very brief batch file containing a loop that uses wkhtmltopdf to convert each html file in a directory to a PDF, at a speed between 5 and 10 times faster than Adobe Acrobat.  The real work is done by the (free) wkhtmltopdf software. This very short and simple batch file is just executed from within a folder of html files; it cycles through the files, converts each one to a PDF file and saves it in a stated directory: 

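[start batchfile]
@echo off
rem Convert every .htm file in the current folder to a PDF in E:\temp\pdfsfromhtml
if not exist "E:\temp\pdfsfromhtml" mkdir "E:\temp\pdfsfromhtml"
for %%i in (*.htm) do "C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" "%%i" "E:\temp\pdfsfromhtml\%%~ni.pdf"
[end]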
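The same loop could be written in Python, as in the rough sketch below. It is not what I ran, and it adds one thing the batch file does not do: it skips any page whose PDF already exists, so a long conversion can be stopped and restarted. The function names build_command and convert_all are made up for this example, and the wkhtmltopdf path is simply the one from the batch file above.

```python
import subprocess
from pathlib import Path

# Path taken from the batch file above; adjust to the local installation.
WKHTMLTOPDF = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"


def build_command(html_file, out_dir, exe=WKHTMLTOPDF):
    """Return the wkhtmltopdf command line (exe, input, output) for one file."""
    html_file, out_dir = Path(html_file), Path(out_dir)
    pdf_file = out_dir / (html_file.stem + ".pdf")
    return [exe, str(html_file), str(pdf_file)]


def convert_all(html_dir, out_dir):
    """Convert each .htm file in html_dir, skipping PDFs that already exist."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(Path(html_dir).glob("*.htm")):
        cmd = build_command(html_file, out_dir)
        if Path(cmd[2]).exists():
            continue  # already converted on an earlier run
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"failed: {html_file.name}")
```

Calling convert_all(".", r"E:\temp\pdfsfromhtml") from within the folder of html files would mirror the batch file. Passing the command as a list means file names containing & or = do not need any special quoting.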

It should be possible to modify this simple batch file to create other loops to rapidly complete various mini-projects in relation to archiving/searching other UFO material (including converting multi-page threads from AboveTopSecret and other forums into a single PDF per thread...).  Oh, and a related discussion prompted some thoughts on archiving "UFO Twitter".









