Tuesday, December 6, 2022

Over 62,000 pages of the best of UFO Twitter - Archiving UFO Twitter as searchable PDFs

Thanks to code developed by a fellow Twitter user ("QEDJoe"), over 60,000 pages of material from UFO Twitter are now freely available online as searchable PDF documents in the UFO / Forteana archive I've been helping develop that is hosted on the website of the Swedish AFU.   

(The code to archive Twitter posts as PDFs developed by QEDJoe is now also freely available online and may also be of interest to those outside the UFO community since it could be applied to other material as well...).

Frankly, I'm not the world's biggest fan of Twitter or the discussion of UFOs on it. I find too many of the posts about UFOs on Twitter are extremely polarised, to the point that those posts don't really contribute to the discussion. Polarisation in discussions of ufology has been a problem for decades, but Twitter appears to exacerbate this issue. Also, the restriction on the length of posts can get in the way of providing supporting references or follow-through on the points being made.

But ... the discussion of UFOs on Twitter is now a significant part of the culture of modern ufology, so someone should be making a bit of an effort to archive it. Also, many links have been posted on Twitter that may be difficult to find in the future without preserving these posts. Furthermore, the speed of communication and the size of the community on Twitter mean that it has the potential (although, sadly, this potential is rarely fulfilled...) for issues to be examined in depth relatively quickly.

So, during the last few months I have been keen to archive at least a sample of posts about UFOs on Twitter. I found some code online which was intended to parse Twitter archives into text files and was able to adapt it to work on an archive of my own tweets to convert them to PDFs (after overcoming an encoding issue with the original code). Unfortunately, the code would not work on archives of tweets provided by some other researchers who kindly offered to act as guinea pigs.

The effort to archive Twitter material relating to UFOs (and, potentially, other topics) was given a _major_ boost recently when someone with considerably more computer coding skills than me kindly agreed to help out. "QEDJoe", a physicist with an interest in Forteana, very promptly sorted out the code I had been struggling with, getting it to work and then making some improvements. Crucially, he also stuck around to deal with some issues that arose with material provided by other researchers. That persistence eventually paid off and the code now seems to be able to cope with any Twitter archive I've thrown at it recently.  

(Some formatting issues could be the subject of improvement at some point in the future, e.g. producing the archive in chronological order or addressing a few remaining encoding issues).
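
As a rough illustration of the chronological-order point, the sketch below sorts tweets by date before they would be written out. It is not QEDJoe's code. It assumes that the "tweets.js" file in a Twitter archive is a JavaScript assignment followed by a JSON array, that each entry keeps its fields under a "tweet" key (or directly in the entry), and that "created_at" looks like "Wed Oct 10 20:19:24 +0000 2018". The function names are made up for this example.

```python
import json
from datetime import datetime

# Assumed layout of "created_at", e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweets_js(text):
    """Return the list of tweet dicts found in the text of a tweets.js file.

    Assumption: the file looks like "<something> = [ ...JSON array... ]",
    so everything before the first "[" is skipped.
    """
    start = text.index("[")
    # raw_decode tolerates anything that follows the closing "]".
    entries, _ = json.JSONDecoder().raw_decode(text, start)
    # Assumption: fields sit under a "tweet" key; fall back to the entry itself.
    return [entry.get("tweet", entry) for entry in entries]


def sort_chronologically(tweets):
    """Return the tweets ordered from oldest to newest."""
    return sorted(
        tweets,
        key=lambda t: datetime.strptime(t["created_at"], CREATED_AT_FORMAT),
    )


if __name__ == "__main__":
    # Tiny made-up sample in the assumed layout.
    sample = (
        'window.YTD.tweets.part0 = ['
        '{"tweet": {"created_at": "Tue Dec 06 10:00:00 +0000 2022", "full_text": "later post"}},'
        '{"tweet": {"created_at": "Sat Sep 24 12:30:00 +0000 2022", "full_text": "earlier post"}}'
        ']'
    )
    for tweet in sort_chronologically(parse_tweets_js(sample)):
        print(tweet["created_at"], "-", tweet["full_text"])
```

If the real file layout or date format differs from these assumptions, only the constant and the two small functions would need adjusting.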

Anyone can duplicate or extend this mini-project (or apply it to non-UFO material). The instructions for using QEDJoe's code are included at the link to that code above. Basically:

(1) The relevant Twitter user needs to request that Twitter provide them with a copy of the material they have posted there. This should take a minute or so, with some brief instructions with screenshots at the link below: https://twitter.com/isaackoi/status/1573650114417250304…

(2) About 24-36 hours later, the relevant Twitter user will get an email letting them know that the archive is available. That person will need to download it and either process the contents themselves _OR_ send a single file ("tweets.js", in the "data" folder provided by Twitter) to an archiver (**NOT** the full archive, which includes their private Twitter messages etc...). (If the Twitter user wants to, they can change the file extension of that single file from .js to .txt and then open it to see the content, which is a set of their tweets and related links etc). When sending that single file to an archiver, problems can sometimes be encountered attaching the file to an email (in which case the file can be sent using a free file sharing website, such as Wetransfer.com).

(3) The archiver then needs to install and execute QEDJoe's code at the link above (following the instructions that accompany that code). The code is run from the Command Prompt (run cmd): execute the relevant Python command, giving the path to the Twitter archive file and the name to be given to the resulting PDF [e.g.: python -m tweets2pdf -f tmw.js -p "Tweets - Mick West.pdf" -i].
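
For an archiver who receives several "tweets.js" files, step (3) can be repeated with a small script. The sketch below is only an illustration: it reuses the flags exactly as they appear in the example command above (copied from that example, not from the tool's documentation), and the folder name, the helper names and the rule for naming the PDFs are all made up for this example.

```python
import subprocess
import sys
from pathlib import Path


def build_command(js_file, pdf_name):
    """Return the argument list for one tweets2pdf run.

    The flags (-f, -p, -i) are copied from the example command in
    step (3); check the tool's own instructions before relying on them.
    sys.executable is used so the same Python interpreter runs the module.
    """
    return [sys.executable, "-m", "tweets2pdf",
            "-f", str(js_file), "-p", pdf_name, "-i"]


def pdf_name_for(js_file):
    """Hypothetical naming rule: "tmw.js" becomes "Tweets - tmw.pdf"."""
    return f"Tweets - {Path(js_file).stem}.pdf"


def convert_folder(folder, dry_run=True):
    """Build (and optionally run) one command per .js file in a folder."""
    js_files = sorted(Path(folder).glob("*.js"))
    commands = [build_command(p, pdf_name_for(p)) for p in js_files]
    for command in commands:
        if dry_run:
            # Only show what would be run.
            print(" ".join(command))
        else:
            # Stop at the first failure so problems are not missed.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # "received_archives" is a hypothetical folder of renamed tweets.js files.
    convert_folder("received_archives", dry_run=True)
```

Running it with dry_run=True first makes it easy to check the file names before any PDFs are produced.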


Twitter can, of course, be searched using the search function on that website. But I find searchable PDFs to be useful for searching (in addition to preserving the material), since it is possible to use free PDF software to give more control over a search, e.g. searching for two words where they appear in the same paragraph or excluding a result if a further keyword is present.

The nuanced control of searches available in relation to PDF material means that UFO research can be conducted more efficiently and effectively. Since most people interested in ufology have limited time (including me...), having tools that enable more efficient research is pretty important to me. More effective research within ufology would also be, well, rather nice to see.
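
To make the kind of search described above concrete, here is a minimal sketch of a paragraph-level search with required and excluded words. It is not how any particular PDF program works. It assumes the text has already been extracted from a PDF into a plain string (by whatever tool is to hand) and that paragraphs are separated by blank lines; the function names and the sample text are invented for this example.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def has_word(paragraph, word):
    """True if the word appears as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, paragraph, re.IGNORECASE) is not None


def search_paragraphs(text, all_of, none_of=()):
    """Return paragraphs containing every word in all_of and no word in none_of."""
    return [
        p for p in split_paragraphs(text)
        if all(has_word(p, w) for w in all_of)
        and not any(has_word(p, w) for w in none_of)
    ]


if __name__ == "__main__":
    # Made-up sample text standing in for text extracted from a PDF.
    sample = (
        "Radar data from the Nimitz case was discussed again.\n"
        "\n"
        "The Nimitz video shows no radar track, probably a balloon.\n"
        "\n"
        "New documents were released today.\n"
    )
    for hit in search_paragraphs(sample, ["Nimitz", "radar"], none_of=["balloon"]):
        print(hit)
```

With the sample text, only the first paragraph is returned: the second also mentions both words but is dropped because it contains the excluded keyword.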

I have reached out to some of the people active on Twitter whose posts are largely confined to UFOs/Forteana and, in my highly subjective view, are among the more interesting or useful posters on these topics on Twitter.

(There isn't an easy way to exclude certain material from a user from the archiving process, so I did not include various people whose UFO material I would have liked to include but who also post a considerable proportion of non-UFO material on their Twitter account).

The initial batch of UFO Twitter material now online using QEDJoe's code - thanks to the cooperation of the various researchers listed below - includes the following:



Tweets - Aaron Gulyas (@saucerlife) 1,222 pages

Tweets - Alejandro Rojas (@alejandrotrojas) 4,161 pages

Tweets - Bob McGwier (@BobMcGwier_N4HY) 4,210 pages

Tweets - Bradley Johansson (@bradjohansson21) 1,176 pages

Tweets - Charlie Wiser (@likeitmatters3) 1,562 pages

Tweets - Chris Rutkowski (@ufologyresearch) 844 pages

Tweets - Curt Collins (@CurtCollins579) 616 pages

Tweets - Dan Zetterstrom (@TheZignal) 5,857 pages

Tweets - Daniel Miller (@SicCoP1) 2,841 pages

Tweets - Frank Stalter (@UfoSunday) 1,214 pages

Tweets - Giuliano Marinkovic (@OmniTalkRadio) 4,805 pages

Tweets - Isaac Koi (@isaackoi) 212 pages

Tweets - Jake Mann (@itsredactedjake) 1,191 pages

Tweets - Jay Austin (@JayMatthewsMMA) 2,309 pages

Tweets - Jeff Knox (@mrjeffknox) 2,170 pages

Tweets - Joe Murgia (@TheUfoJoe) 7,384 pages

Tweets - Jonathan Davies (@IWANTTOKNOWUK) 5,120 pages

Tweets - Keith Basterfield (@KeithBasterfie1) 177 pages

Tweets - Michael Huntington (@MHuntington7) 3,284 pages

Tweets - Mick West (@MickWest) 9,417 pages

Tweets - Nick Coffin (@InvNightSchool) 1,176 pages

Tweets - Steve Long (@UAPorSAP) 1,308 pages

Total: 62,256 pages










