Archiving Twitter via open source software

Over the last few months I’ve been helping Lorna Richardson, a PhD student at the Centre for Digital Humanities at UCL. Her research is centred around the use of Twitter and social media by archaeologists and others who have an interest in the subject. I’ve been using the platform for around 3 years (starting in January 2008) and I’ve been collecting data via several methods for several reasons: for a backup of what I have said, to analyse the retweeting of what I’ve said and to see what I’ve passed on. To do this, I’ve been using several different open source software packages: Thinkupapp, Twapperkeeper (the open source version, on my own install) and Tweetnest. Below, I’ll run through how I’ve found these platforms and what problems I’ve had getting them to run. I won’t go into the Twitter terms and conditions conversation and how it has affected academic research; just be aware of it.

Just so you know, the server environment that I’m running all this on is the Portable Antiquities Scheme’s dedicated Dell machine located at the excellent Dedipower facility in Reading, running a Linux O/S (Ubuntu server), Apache 2, PHP 5.2.4 and MySQL 5.04, with the following extensions that you might find useful: curl, gd, imagemagick, exif, json and simplexml. I have root access, so I can pretty much do what I want (as long as I know what I’m doing, but Google teaches me what I need to know!) To install these software packages you don’t need to know too much about programming or server admin unless you want to customise scripts etc. for your own use (I did…) You can probably install all this stuff onto Amazon cloud based services if you can be bothered. I’ve no doubt made some mistakes below, so correct me if I am wrong!

Several factors that you must remember with Twitter:

  1. The system only lets you retrieve 3200 of your tweets. If you chatter a lot like Mar Dixon or Janet Davis, you’ll never get your archive 🙂 Follow them though, they have interesting things to say….
  2. Search only goes back 7 days (pretty useless, hey what!)
  3. Twitter change their T&C, so what is below might be banned under these in the future!
  4. Thinkupapp and Twapperkeeper use OAuth to connect your Twitter account so that no passwords are compromised.
  5. You’ll need to set up your twitter account with application settings – secrets and tokens are the magic here – to do this go to and register a new app and follow the steps that are outlined in the documentation for each app (if you run a blog and have connected your twitter account, this is old hat!)
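
To give a feel for the 3200 tweet limit in point 1, here is a minimal PHP sketch of how an archiving script might page backwards through a timeline. It is not code from any of the packages below. The function names collectTimeline and fakeFetch, the page size of 200 and the fake data source are all assumptions made for illustration; a real script would replace fakeFetch with an authenticated call to Twitter.

```php
<?php
// Hypothetical sketch: page backwards through a timeline until the cap is hit
// or nothing older is returned.
// $fetchPage is an assumed callable (not part of any package named in this post)
// taking a page size and an optional upper tweet id, and returning an array of
// tweets, newest first, each an array with an integer 'id' key.
function collectTimeline($fetchPage, $pageSize = 200, $cap = 3200)
{
    $tweets = array();
    $maxId = null; // null means "start from the newest tweet"

    while (count($tweets) < $cap) {
        $page = call_user_func($fetchPage, $pageSize, $maxId);
        if (count($page) === 0) {
            break; // nothing older is available
        }
        foreach ($page as $tweet) {
            $tweets[] = $tweet;
        }
        // Next time, ask only for tweets strictly older than the last one seen.
        $last = end($page);
        $maxId = $last['id'] - 1;
    }

    return array_slice($tweets, 0, $cap);
}

// Stand-in data source for demonstration only: 5000 fake tweets, ids 5000 to 1.
function fakeFetch($count, $maxId)
{
    $start = ($maxId === null) ? 5000 : min($maxId, 5000);
    $page = array();
    for ($id = $start; $id > 0 && count($page) < $count; $id--) {
        $page[] = array('id' => $id);
    }
    return $page;
}

$archive = collectTimeline('fakeFetch');
echo count($archive), " tweets collected\n";
```

With a page size of 200, reaching 3200 tweets takes 16 requests; anything older than that simply never comes back, which is why heavy tweeters cannot recover a full archive this way.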


Tweetnest is open source software from Andy Graulund at Pongsocket. This is the most lightweight of the software that I’ve been using. It provides a basic archive of your own tweets, with no responses or conversation threading, but it does allow for customisation of the interface via editing of the config file. Installing this is pretty simple: you need a server with PHP 5.2 or greater and the JSON extension. You don’t need to be the owner of the Twitter account to mine the tweets, but each install can only handle one person’s archive. You could have an install for each member of your team, if you wanted to.

Source code is available on github and the code is pretty easy to hack around if you are that way inclined. The interface also allows for basic graphs of when you tweeted and search of your tweet stream, and has .htaccess protection of the update tweets functionality (or you can set up a cron job if you know how to do this). My instance of this can be found at. Below are a few screenshots of the interfaces and updating functions. The only issue I had with installing this was changing the RewriteBase directive due to other things I am up to.

Tweet update interface
Monthly archive of tweets


Thinkupapp has been through a couple of name changes since I first started to use it (I think it was Thinktank at the time), and is updated regularly, with new beta releases and patches appearing frequently. I know of a couple of other people in the heritage sector that use this software: Tom Goskar at Wessex, and Seb Chan of Sydney’s Powerhouse Museum, who mentioned he was using it this morning on Twitter.

This was originally a project by Gina Trapani (started in 2009). It now has a group of contributors who enhance the software via github, is labelled as an Expertlabs project and is used by the White House (they had impressive results around the time of the State of the Union speech). This open source platform allows you to archive your tweets (again within the limits) and their responses, retweets and conversations. As a bonus, it can also mine Facebook for pages or your own data, and it can have multiple user accounts. It has graphical interfaces that allow you to visualise how many followers you have gathered over time and your number of tweets, geocoding of tweets onto a map (you’ll need an API key for Google Maps), export to an Excel-friendly format and a search facility. You can also publish your tweets out onto your own site or blog via the API, and the system will allow you to view images and links that your virtual (or maybe real) friends have published on their stream of consciousness. Finally, you can turn on or off the ability for other users to register on your instance and have multiple people archiving their tweet stream.

This is slightly trickier than Tweetnest to install, but anyone can manage it if they follow the good instructions; if you run into problems, read their Google group. One thing that might present as an issue if you have a large number of tweets is a memory error. Solve this by setting ini_set('memory_limit','32M'); in the config file that throws this exception. You might also time out if a script takes longer than 30 seconds to run; again, this can be solved by adding set_time_limit(500); to your config file. Other things that went wrong on my install included the SQL upgrades (but you can do these manually via phpMyAdmin or the terminal if you are confident) and the Twitter API error count, which needed to be increased. All easy to solve.
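
As a rough sketch, the two settings described above would look like this in PHP. The values are the ones quoted in the text and are only a starting point; which file they belong in depends on your install, so treat the placement as an assumption. Note that the quotes must be plain straight quotes, since curly quotes pasted from a web page are not string delimiters in PHP.

```php
<?php
// Raise the memory ceiling for this script only. '32M' is the value quoted in
// the text; a larger archive may need more.
ini_set('memory_limit', '32M');

// Let the script run for up to 500 seconds rather than the 30 second limit
// mentioned in the text.
set_time_limit(500);
```

Both calls only affect the script they appear in, so they are less drastic than changing php.ini for the whole server.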

Things that I would have preferred on this are clean URLs from mod_rewrite as an option, and maybe for it to have been coded using one of the major frameworks like Symfony or Zend. No big deal though. Maybe there will also be a Solr-type search interface at some point as well, but as it is open source, fork it and create plugins like this visualisation.

You can see my public instance at and there are some screenshots of the interfaces below.

My thinkup app at
Staffordshire hoard retweets

Embed interface
Script to embed your tweet thread into another application

Graphs of followers etc


The Twapperkeeper archiving system has been around for a while now, and has been widely used to archive hashtags from conferences and events. Out of the software that I’ve been using, this is the ugliest, but perhaps the most useful for trend analysis. However, it has recently fallen foul of the changes in Twitter’s T&C, so the original site has had its really useful features expunged, namely data export for analysis. Fortunately, the creator of this excellent software has released an open source version, called yourTwapperkeeper, that you can download and install on your own server. I’ve set this up for the Day of Archaeology project and added a variety of hashtags to the instance so that we can monitor what is going on around the day (I won’t be sharing this URL, I am afraid.) Code for this can be downloaded from the Google code repository; again this is an easy install and you just need to follow the instructions. Important things to remember here include setting up the admin users and who is allowed to register archives, working out whether you want to associate this with your primary account (in case you get pinged for violation of the terms of service), and setting up your account with the correct tokens etc. by registering your app with Twitter in the first place.

Once everything is set up, and you start the crawler process, your archive will begin to fill with tweets (from the date at which archiving started) and you can filter texts for retweets, dates created, terms etc. With your own install of twapperkeeper, you can still export data, but at your own risk so be warned!

Day of archaeology 2011

QRcode for #dayofarch
The Day of Archaeology Qr code

One of the projects that I’m working on, alongside some other digital archaeologists (Lorna Richardson, Matthew Law, Jess Ogden, Stu Eve, Andrew Dufton and Tom Goskar), is “The Day of Archaeology 2011”, a social media based project that will allow archaeologists working all over the world to document what they do on one day, July 29th 2011. I’m providing server space via the Portable Antiquities Scheme’s underused backup box and have also configured the WordPress install and open source Twapperkeeper for storing the social buzz.

This date coincides with the “Festival of British Archaeology”, which runs from 16th – 31st July 2011 and is one of the hundreds of events being held to celebrate archaeology in the UK and beyond.

So how does it work? Well, archaeologists taking part in the project will document their day through photography, video, Facebook activity, Twitter commentary and written blog posts. These will then be collated in real time on the project’s dedicated website, which will provide a glimpse into a day in the life of people working in archaeology, from archaeological excavations to laboratories, universities, community archaeology groups, education services, museums and offices. This project is open to everyone working or volunteering in any aspect of archaeology from anywhere in the world – and even those who have defected! Currently, over 150 people and organisations have signed up. You could be next, so give archaeology a voice!

This innovative idea follows on from the very successful “Day of Digital Humanities”. It was dreamt up by Matthew Law and Lorna Richardson and was then built upon following a twitter conversation and subs

If you would like to get involved, email the project team at and you will receive further details and account details for the website nearer the date. If you have no experience of using blog software, there’s information on how to use the systems provided on the site. If you have experience in graphic design, perhaps you could consider entering the design-a-logo competition; rules and more information can be found on the project’s website.

The hashtag for this project is #dayofarch and can be used on tweets, blog posts and flickr photos to aggregate externally. Please consider using this tag if you refer to this project.

Portable Antiquities Scheme site wins an award

Museums and the web logo

The Portable Antiquities Scheme website, which I rebuilt over a period of around 10 months from 2009 – March 2010, has just won an award at the international ‘Museums and the Web’ conference held in Philadelphia. I originally entered the Scheme’s website just to try and get it a bit more exposure in the international museum sector, and it came first in the ‘Research/online collection’ category against some quite stiff opposition (last year’s section was won by the V&A!) Surprisingly for me, it also gathered votes in the people’s choice award, which makes me feel very humble. I actually found out that the site had won via Twitter, whilst checking in on foursquare to the Ghazala beach bar in Sharm el Sheikh. Power to the web!

Other entrants included:

  • Museum Boijmans Van Beuningen
  • New York Botanical Gardens
  • J. Paul Getty Trust
  • Museum of Fine Arts, Boston
  • National Museum of American History
  • Museum of the City of New York
  • Windsor Historical Society
  • the STERNA consortium
  • Steve in Action Project Team
  • Museum of Fine Arts, Boston
  • Museum Victoria
  • Powerhouse Museum
  • Centraal Museum
  • Queensland Museum
  • The Strong (National Museum of Play, Toy Hall of Fame, ICHEG, National Toy Hall of Fame)

The site was created using Zend Framework (started around version 0.7 and now runs on version 1.11.3 – needs upgrading). It runs on Ubuntu, Solr and MySQL, and makes extensive use of YQL to power the various features that you’ll find. I’m really pleased that the site was recognised at such a prestigious conference and it is testament to all the people who contribute towards the Scheme’s success.

Palestine Exploration Fund flickrstream

Over the last few days, I have been adding a selection of the Palestine Exploration Fund’s extensive image collection to a Flickr profile. The aim of this was to try and make more people aware of some of the gems that the Fund has within the collection in Hinde Mews. This small slice of the photographic collection contains some amazing images of places and landscapes around Palestine. If you like them, please do consider joining the Fund to help with our charitable activities.

Panoramic photograph of the Dome of the Rock
Peristyle of Temple of Jupiter Heliopolitan, Baalbek

You can see more of these amazing images on flickr.