Category Archives: Techie stuff


Preparing the British Museum Bronze Age index for transcription

Originally published at: http://research.micropasts.org/2014/04/30/preparing-the-index/

Since late 2013, the MicroPasts team has been preparing the British Museum's (BM) Bronze Age Index to be the first offering on our crowd-sourcing platform. This corpus consists of around 30,000 (roughly A4 sized) cards, holding information going back to as early as 1913. The majority of these are double sided and generally have text on the front and a line drawing on the reverse (many variants have been discovered, such as large fold-out shield plans).

[Embedded application: British Museum Bronze Age Index Drawer B16 – the crowd-sourcing platform]

Over the last few years, several curators have mooted exercises to turn this amazing resource into a digital archive (Ben Roberts, now at Durham University, attempted to turn the transcription into an AHRC-funded Collaborative Doctoral Award), but none of these came to fruition until the advent of the MicroPasts project. Internal discussions on how best to deal with these cards had been running for a number of years, and it was felt that this project could be the ideal solution, providing a new type of museum and public interaction which the BM had not explored previously.

Digitising this corpus is reasonably straightforward, and we have employed Dr Jennifer Wexler (@jwexler on Twitter) to manage the scanning process; she has been doing this since February, after her return from fieldwork in Benin.

The equipment needed for this is relatively straightforward. The BM has acquired two high-capacity/speed Canon scanners, which can scan 60 and 100 sheets per minute respectively at 600 dpi; once this initial project is over, they can be reused for turning more archival materials into potential crowd-sourcing material. You can see a picture of Neil's former office (he's just moved to a nicer one; we're not jealous) being used as the scanning centre below in one of his tweets:

The first drawer scanned is known as A9 (this application on the platform), and this was done by the Bronze Age Curator Neil Wilkin (@nwilkinBM on Twitter) over a few weeks whilst dispensing with his other duties. Once Jennifer returned, scanning started in earnest! These high-resolution images were then stored in several places to facilitate good data preservation (on an external 4TB hard drive, the Portable Antiquities Scheme server cluster and Amazon S3). They were then stitched together into composite images by Daniel Pett (@portableant on Twitter) using a simple Python script and uploaded to Flickr (for example, see this set) for the crowd-sourcing platform to access and present as tasks for our audience to assist with. All of these images have been released under the most liberal licence that Flickr permits (we would ideally have liked to make them CC0, but this option does not exist) and so they are served up under a CC-BY licence. The data that is transcribed will also be made available for download and reuse by anyone, under a CC0 licence. The embedded tweet below shows an example of one of the stitched cards:
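As a rough illustration of the stitching step (this is not the project's actual script), a few lines of Python with the Pillow imaging library can paste the front and back scans of a card side by side; the filenames and drawer reference here are hypothetical:

```python
from PIL import Image  # the Pillow imaging library

def stitch_card(front_path, back_path, out_path):
    """Paste the front and back scans of one index card side by side."""
    front = Image.open(front_path)
    back = Image.open(back_path)
    # Canvas wide enough for both scans and tall enough for the taller one
    width = front.width + back.width
    height = max(front.height, back.height)
    composite = Image.new("RGB", (width, height), "white")
    composite.paste(front, (0, 0))
    composite.paste(back, (front.width, 0))
    composite.save(out_path, "JPEG", quality=90)

# Hypothetical filenames for one card from drawer A9
stitch_card("A9_0001_front.jpg", "A9_0001_back.jpg", "A9_0001_composite.jpg")
```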

The platform that we're using for serving up the crowd-sourcing tasks has been created by Daniel Lombraña González (lead developer – @teleyinex on Twitter) and the Pybossa team, and it is a departure from the usual technology stack that the project team has used previously. Installation of the platform is straightforward and it was deployed onto Portable Antiquities Scheme hardware in around 15 minutes. We then employed Daniel to assist with building the transcription application skeleton (in conjunction with project lead Andy Bevan (not on Twitter!) and Daniel Pett) that would be used for each drawer, whilst we also developed our own look and feel to give MicroPasts some visual identity. If you're interested, the code is available on GitHub; if you have suggestions for improvements, you could either fork the code or comment on our community forum.
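For anyone curious how tasks end up in a Pybossa application, the pybossa-client Python library gives a flavour of it; the endpoint, API key, task payload and Flickr URLs below are placeholders for illustration, not the MicroPasts configuration:

```python
import pbclient  # the pybossa-client library: pip install pybossa-client

# Placeholder endpoint and key, not real credentials
pbclient.set('endpoint', 'http://example.org/pybossa')
pbclient.set('api_key', 'YOUR-API-KEY')

# Each task points the transcription app at one stitched card image on Flickr
card_urls = [
    'http://farm8.staticflickr.com/1234/card_0001.jpg',  # hypothetical URLs
    'http://farm8.staticflickr.com/1234/card_0002.jpg',
]
for url in card_urls:
    pbclient.create_task(app_id=1, info={'url_b': url})
```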


For the last few months, building up to launch, we conducted lots of debugging and user testing to see how the site performed and whether the tasks we offered were feasible and interesting enough. Chiara Bonacchi (@Chiara_Bonacchi) and Adi Keinan (@Adi_Keinan) worked on the main project site, building our Facebook and Twitter engagement.

Chiara has also developed our evaluation frameworks, which we were integrating into the system and which we feel are vital to discovering more about people's engagement with our platforms, how their motivations progress through time and, hopefully, the project's success! This evaluative work aims to be one of the first studies to follow the development of individual users' interactions on a crowd-sourcing website.

And then we launched, and tasks are ongoing:

This project is very exciting for the BM and especially for our curatorial staff. It could unlock new opportunities, and Neil sums up very succinctly why we are doing this public archaeology project, so we'll leave it to him:

Thank you for participating!


Lost Change: mapping coins from the Portable Antiquities Scheme

Today sees the launch of Lost Change, an innovative and experimental application that allows coins found within England and Wales and recorded through the British Museum's Portable Antiquities Scheme (PAS) to be visualised on an interactive, dual-mapping interface. This tool enables people to interrogate a huge dataset (over 300,000 coin records can be manipulated) and discover links between a coin's place of origin (the issuing mint, or a vaguer attribution if this location is uncertain) and where it was discovered and subsequently reported to the PAS Finds Liaison Officers.

While much of the data is made available for re-use on the PAS website under a Creative Commons licence, some details are closely guarded to prevent illicit activity (for example night-hawking, or detecting without landowner permission), and so this application has been developed with these restrictions in mind. An object's coordinates are only mapped to an Ordnance Survey four-figure National Grid Reference (which equates to a point within a 1km square), and only if the landowner or finder has not requested that they be hidden from the public.
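For anyone wondering what that spatial blurring amounts to in practice, here is a rough Python sketch (not the Lost Change code itself) of reducing a full Ordnance Survey easting/northing to the 1km square behind a four-figure grid reference; the sample coordinates are invented:

```python
def one_km_square(easting, northing):
    """Truncate a full OS National Grid easting/northing (in metres) to the
    south-west corner of its 1 km square, i.e. four-figure precision."""
    return (easting // 1000) * 1000, (northing // 1000) * 1000

# Hypothetical findspot: the full coordinates are blurred to the 1 km square
print(one_km_square(432765, 276894))  # -> (432000, 276000)
```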

The distribution of coins is biased by a number of factors (a project funded by the Leverhulme Trust is looking at this in greater depth) which could include:

  • Whether metal detecting is permitted by the landowner, or the topography makes detecting difficult
  • Soil type and land use
  • Whether there is an active community of metal detectorists within the vicinity


The tool is straightforward to use. The left hand pane holds details for the place of discovery; the right hand side holds details for the place of issue, the mint. These panes work in tandem, with data dynamically updating in each, depending on the user’s choice. A simple example to get going is this:

  • Click on “Iron Age” within the list of periods
  • Within the right hand pane, click on one of the three circular representations and this will highlight where the coins from this mint were found in the left hand pane. The larger the circular representation, the more coins from that mint have been recorded.
  • If one clicks on any of the dots within the left hand pane, these are selected and an overlay in the right hand pane allows dynamic searching of the PAS database.

The PAS intends to build on this project at a later stage and will be seeking further funding to enable this to happen, with many more facets of discovery available to query the dataset.

Lost Change was funded through a £5,000 grant from the CreativeWorks London ‘Entrepreneur-in-Residence’ programme.

The PAS is grateful to Gavin Baily and Sarah Bagshaw from Tracemedia who developed the application, and everyone who has contributed to the PAS database.

If you have any feedback on the project, please contact the PAS via info@finds.org.uk.

This originally appeared on the British Museum blog: http://blog.britishmuseum.org/2014/02/19/lost-change-mapping-coins-from-the-portable-antiquities-scheme/

Yahoo! Openhack EU (Bucharest)

A pair of history enthusiasts

Last weekend, I was invited to attend the Yahoo! Openhack EU event that was held in Bucharest, Romania, as part of a team of "History Enthusiasts" trying to help participants generate ideas using cultural sector data. This came about from the really successful History Hack Day that Matt Patterson organised earlier this year; off the back of that, Yahoo!'s Murray Rowan invited him to assemble a team to go to Romania and evangelise. Our team comprised myself, Jo Pugh from the National Archives and our leader Matt; we went armed with the datasets that were made available for the hack day and a list of APIs from Mia Ridge (formerly of the Science Museum and now pursuing a PhD).

The Openhack event (hosted in the Crystal Palace Ballrooms – don't leave the complex, we were told, the wild dogs will get you!) started with a load of tech talks; most interesting for me were the YQL one (to see how things had progressed), Douglas Crockford on JSON (watched this on video later) and Ted Drake's accessibility seminar. One thing I thought was absent was the geo element, something that is extremely strong at Yahoo! (API-wise, before you moan about maps) and something that always features strongly at hack days in mashups or hacks. Our team then gave a series of short presentations to the Romanians who were interested in our data; unfortunately not too many of them were, but that seemed to be the norm for the enthusiasts' sessions. We felt that a lot of people had already come with ideas and were using the day as a collaborative catalyst to present their work; not that this is a bad thing – be prepared and your work will be more focused at these events. Between us we talked about the success of the hack day at the Guardian, Jo presented material from the National Archives, and we discussed ideas with various people throughout the day; for example:

  1. Accessing shipping data – one of the teams we spoke to wanted some quite specific data about routes. However, we found a static HTML site with a huge amount of detail and suggested scraping it, extracting the entities mentioned in the text and producing a mashup based on those (a rough sketch of that approach follows this list) – see the submarines hack
  2. How to use Google Earth time slider to get some satellite imagery for certain points in time (the deforestation project was after this)
  3. Where you can access museum-type information – the history hack days list
  4. Which APIs they could use – Mia Ridge's wiki list
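That scraping-plus-entity-extraction suggestion from the first point needs only a few lines; here is a hedged Python sketch of the idea (the URL and the choice of libraries are mine, not what the team actually built):

```python
import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

# Invented URL standing in for the static HTML site holding the shipping routes
page = requests.get("http://example.org/shipping-routes/convoys.html")
soup = BeautifulSoup(page.text, "html.parser")

# Throw away the markup and keep the prose, ready for entity extraction
text = " ".join(soup.get_text().split())

# At the hack we suggested posting text like this to a text-extraction API
# (AlchemyAPI at the time) and mapping the place names it returns.
print(text[:500])
```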

I tried to do a few things whilst there: some Twitter analysis with Gephi and R (my laptop wasn't playing ball with this) and building some YQL open tables for Alchemy's text-extraction APIs and Open Library (I'll upload these when tested properly). Matt looked at trying to build either a JSON API or a mobile application for Anna Powell-Smith's excellent Domesday mapping project (Django code base), and Jo played with his data for papal bullae from the National Archives using Google's Fusion Tables, as well as looking at patterns within the syntax via IBM's Many Eyes tool.

Ursus black

Hacking then progressed for the next 24 hours, interspersed with meals and some entertainment provided by the Algorythmics (see the embedded video below from Ted Drake), who danced a bubble sort in Romanian folk style, and two brief interludes to watch Eurovision (Blue and the Romanian entry). We retired to the bar at the JW Marriott for a few Ursus beers and then back to the Ibis for the night, before returning the next day to see what people had produced to wow their fellow hackers and a panel of judges. Unfortunately, I had to head back to the UK from OTP (to help run the ARCN CASPAR conference) while the hacks were being presented, so I didn't get to see the finished products. The internet noise reveals some awesome work, and a few that I liked the sound of are commented on below. I also archived all the Twitter chat using the #openhackeu hashtag if anyone would like it (currently over 1,700 tweets). There was also some brilliant live blogging by a very nice chap called Alex Palcuie, which gives you a good idea of how the day progressed.

So, after reading through the hacks list, these are my favourites:

  1. Mood music
  2. The Yahoo! Farm – robotics and web technology meshed, awesome
  3. Face off (concept seems good)
  4. Pandemic alert – uses webgl (only chrome?)
  5. Where’s tweety

And these are the actual winners (there was also a proper ‘hack’, which wasn’t really in the vein of the competition as laid out on the first day, but shows skill!):

  • Best Product Enhancement – TheBatMail
  • Hack for Social Good – Map of Deforested Areas in Romania
  • Best Yahoo! Search BOSS Hack – Take a hike
  • Best Local Hack – Tourist Guide
  • Hacker's choice – Yahoo! Farm
  • Best Messenger Hack – Yahoo Social Programming
  • Best Mashup – YMotion
  • Best hacker in show – Alexandru Badiu, he built 3 hacks in 24 hours!

To conclude, Murray Rowan and Anil Patel's team produced a fantastic event, which for once had a very high proportion of women in attendance (maybe 10–25% of an audience of over 300) – something that will please many of the people I know via Twitter in the UK and beyond. We met some great characters (like Bogdan Iordache), saw the second biggest building on the planet (it was the biggest on 9/11, the taxi drivers proudly claim) and met a journalist I never want to meet again… According to the hack day write-up, 1,090 cans of Red Bull, 115 litres of Pepsi and 55 lbs of coffee were consumed (and a hell of a lot of food, judging by some of the food mountains that went past!)

Here's to the next one. Maybe a cultural institution can set a specific challenge to be cracked at one of these. And I leave you with Ted Drake's video:

Archiving Twitter via open source software

Over the last few months I've been helping Lorna Richardson, a PhD student at the Centre for Digital Humanities at UCL. Her research is centred around the use of Twitter and social media by archaeologists and others with an interest in the subject. I've been using the platform for around three years (starting in January 2008) and I've been collecting data via several methods, for several reasons: as a backup of what I have said, to analyse the retweeting of what I've said, and to see what I've passed on. To do this, I've been using several different open source software packages: Thinkupapp, Twapperkeeper (open source, own install) and Tweetnest. Below, I'll run through how I've found these platforms and what problems I've had getting them to run. I won't go into the Twitter terms and conditions conversation and how it has affected academic research; just be aware of it…

Just so you know, the server environment that I'm running all this on is as follows: the Portable Antiquities Scheme's dedicated Dell machine located at the excellent Dedipower facility in Reading, running a Linux OS (Ubuntu Server), Apache 2, PHP 5.2.4 and MySQL 5.04, with the following mods that you might find useful: curl, gd, imagemagick, exif, json and simplexml. I have root access, so I can pretty much do what I want (as long as I know what I'm doing, but Google teaches me what I need to know!) To install these software packages you don't need to know too much about programming or server admin, unless you want to customise scripts etc. for your own use (I did…). You could probably install all this onto Amazon cloud-based services if you can be bothered. I've no doubt made some mistakes below, so correct me if I am wrong!

Several factors that you must remember with Twitter:

  1. The system only lets you retrieve 3200 of your tweets. If you chatter a lot like Mar Dixon or Janet Davis, you’ll never get your archive :) Follow them though, they have interesting things to say….
  2. Search only goes back 7 days (pretty useless, hey what!)
  3. Twitter change their T&C, so what is below might be banned under these in the future!
  4. Thinkupapp and Twapperkeeper use OAuth to connect to your Twitter account, so no passwords are compromised.
  5. You'll need to set up your Twitter account with application settings – secrets and tokens are the magic here. To do this, go to https://dev.twitter.com/apps, register a new app and follow the steps outlined in the documentation for each package (if you run a blog and have connected your Twitter account, this is old hat!). A minimal sketch of using those tokens follows this list.
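Once you have those keys, most client libraries wire them up in much the same way; here's a minimal sketch using the tweepy Python library (the key strings are placeholders, and this isn't part of any of the packages discussed below):

```python
import tweepy  # pip install tweepy

# The consumer key/secret come from registering your app at dev.twitter.com;
# the access token/secret are generated there for your own account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# Pull back a handful of your own tweets to check the keys are working
for status in api.user_timeline(count=5):
    print(status.created_at, status.text)
```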

Tweetnest

Tweetnest is open source software from Andy Graulund at Pongsocket. This is the most lightweight of the software that I've been using. It provides a basic archive of your own tweets – no responses or conversation threading – but it does allow for customisation of the interface via editing of the config file. Installing this is pretty simple: you need a server with PHP 5.2 or greater and the JSON extension. You don't need to be the owner of the Twitter account to mine the tweets, but each install can only handle one person's archive. You could have an install for multiple members of your team, if you wanted to…

Source code is available on GitHub and the code is pretty easy to hack around if you are that way inclined. The interface also allows for basic graphs of when you tweeted, search of your tweet stream, and .htaccess protection of the update-tweets functionality (or you can set up a cron job if you know how to do this). My instance can be found at http://finds.org.uk/tweetnest. Below are a few screenshots of the interfaces and updating functions. The only issue I had with installing this was changing the RewriteBase directive, due to other things I am up to.

Tweet update interface
Monthly archive of tweets

Thinkupapp

Thinkupapp has been through a couple of name changes since I first started to use it (I think it was Thinktank back then), and it has been updated regularly, with new beta releases and patches appearing frequently. I know of a couple of other people in the heritage sector that use this software (Tom Goskar at Wessex, and Seb Chan of Sydney's Powerhouse Museum mentioned he was using it this morning on Twitter).

This is originally a project by Gina Trapani (started in 2009); it now has a group of contributors who enhance the software via GitHub, is labelled as an ExpertLabs project and is used by the White House (they had impressive results around the time of the State of the Union speech). This open source platform allows you to archive your tweets (again within the limits) along with their responses, retweets and conversations (it also has the bonus of being able to mine Facebook pages or your own Facebook data, and it can have multiple user accounts). It also has graphical interfaces that allow you to visualise how many followers you have gathered over time and your number of tweets, geocoding of tweets onto a map (you'll need a Google Maps API key), export to an Excel-friendly format and a search facility. You can also publish your tweets out onto your own site or blog via the API, and the system will also allow you to view images and links that your virtual (or maybe real) friends have published on their stream of consciousness. You can also turn on or off the ability for other users to register on your instance and have multiple people archiving their tweet streams.

This is slightly trickier than Tweetnest to install, but anyone can manage it if they follow the good instructions, and if you run into problems read their Google group. One thing that might present an issue if you have a large number of tweets is a memory error – solve this by setting ini_set('memory_limit', '32M'); in the config file that throws the exception. You might also time out if a script takes longer than 30 seconds to run; again, this can be solved by adding set_time_limit(500); to your config file. Other things that went wrong on my install included the SQL upgrades (but you can do these manually via phpMyAdmin or the terminal if you are confident) and the Twitter API error count needing to be increased. All easy to solve.

Things that I would have preferred are clean URLs via mod_rewrite as an option, and perhaps that it was coded using one of the major frameworks like Symfony or Zend. No big deal though. Maybe there will also be a Solr-type search interface at some point as well; but as it is open source, fork it and create plugins like this visualisation.

You can see my public instance at http://finds.org.uk/social, and there are some screenshots of the interfaces below.

My Thinkup app at finds.org.uk
Staffordshire hoard retweets
Script to embed your tweet thread into another application
Graphs of followers etc

Twapperkeeper

The Twapperkeeper archiving system has been around for a while now, and has been widely used to archive hashtags from conferences and events. Out of the software that I've been using, this is the ugliest, but perhaps the most useful for trend analysis. However, it has recently fallen foul of the changes in Twitter's T&Cs, so the original site has had its really useful features expunged – namely data export for analysis. However, the creator of this excellent software has released an open source version you can download and install on your own server, called yourTwapperkeeper. I've set this up for the Day of Archaeology project and added a variety of hashtags to the instance so that we can monitor what is going on around the day (I won't be sharing this URL, I'm afraid…). Code for this can be downloaded from the Google Code repository and again it is an easy install; you just need to follow the instructions. Important things to remember here include setting up the admin users and who is allowed to register archives, working out whether you want to associate this with your primary account in case you get pinged for violation of the terms of service, and setting up your account with the correct tokens etc. by registering your app with Twitter in the first place.

Once everything is set up and you start the crawler process, your archive will begin to fill with tweets (from the date at which archiving started) and you can filter texts for retweets, dates created, terms etc. With your own install of Twapperkeeper you can still export data, but at your own risk – so be warned!

British Institute of Persian Studies

BIPS logo

After a few months of on-off work, I've finally finished the British Institute of Persian Studies website. It has taken a bit longer than I expected, as we've had to get comments from various stakeholders on the committee of the Institute. I'm actually quite pleased with it, and I'm getting more pleased with Textpattern as a web content management platform all the time. I've also started using a virtual server at oneandone, which seems pretty good value at around £19 pcm.

The website has two domains, bips.org.uk and bips.ac.uk (we're not totally sure about future directions for this). Any feedback on this is gratefully received. Now onto the ICOMON website, which will be trilingual. A bit less time to do this one though! Maybe I'll burn the candle as usual from both ends and the middle (I'm also working on some Iron Age data at work at present… but that is a secret…).

Experiments

BIPS logo

I'm just building a new website for the British Institute of Persian Studies to replace their old one, and I've been experimenting with adding Google Maps and Flickr to the basic Textpattern-driven content management system. I haven't gone down a plugin path for this; instead I've used a really good idea from David Ramos and adapted it to suit my idea for mapping research articles and archaeological site notes. The current website doesn't offer this content (sites), so the information is currently lifted directly from Wikipedia, and the Institute's scholars may wish to expand it and correct Wikipedia's errors if they exist. The basic result can be seen on my dev server version and has resized info windows, short excerpts from the info, geo-coordinates and a direct output (&output=kml) to Google Earth. I'm trying to decide whether mouseover or click is the best usability model for this interface; I am leaning towards click, as it allows you to focus better.
The last thing that I think I'll do is add custom markers using the Society logo as the pointer, but I need to be back in the office as I don't have any image-editing software on my MacBook (any open source packages anyone can suggest?). So does the integration add value to this website? I think it helps visualise the locations that get mentioned in the text of the site.
I've learnt quite a lot from these pages produced by the Blackpool Community Church JavaScript Team, and the results are useful. Maybe I'll do something with this for the Scheme's website, running data direct from the database.
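The KML side of that Google Earth output is easy to mock up from the same kind of site records; here is a toy Python sketch (the site names, notes and coordinates are only illustrative, not BIPS data) of the sort of document Google Earth expects:

```python
# Toy data: site names, notes and coordinates are only illustrative
sites = [
    {"name": "Pasargadae", "lat": 30.19, "lon": 53.17, "note": "Achaemenid capital"},
    {"name": "Bishapur", "lat": 29.78, "lon": 51.57, "note": "Sasanian city"},
]

placemarks = "".join(
    "<Placemark><name>{name}</name><description>{note}</description>"
    "<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>".format(**s)
    for s in sites
)

kml = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
       + placemarks + '</Document></kml>')

# Google Earth will open this file directly
with open("sites.kml", "w") as fh:
    fh.write(kml)
```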

The other thing that I've played with is the 24 ways tutorial on parsing JSON data from Flickr's API to add value to the website and draw in current photos of Iran, Persia and archaeological sites. I've not done any JSON stuff before, but I like the result! As mentioned in the article, this output is at present undocumented, but it seems a lot faster than the RSS method I use. You are still limited in how many photos you can pull out, and I can't see how to use only Creative Commons licensed data as yet. So once again, like the Google Maps implementation, it is hard-coded into one of the Textpattern template pages that drive a section. For a simple CMS, Textpattern offers some great functions and is extensible. Better metadata handling would be great, and a better image-management or inbuilt gallery would also be brilliant. However, it suits my needs for projects like this.
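Doing the same thing server-side, rather than with the JSONP trick the article describes, only takes a few lines of Python; treat the exact feed parameters below as my assumption rather than documented fact:

```python
import json
import urllib.request

# Flickr's public photo feed: format=json with nojsoncallback=1 returns plain
# JSON rather than the JSONP wrapper.
url = ("https://api.flickr.com/services/feeds/photos_public.gne"
       "?tags=iran,persia&tagmode=any&format=json&nojsoncallback=1")

with urllib.request.urlopen(url) as resp:
    feed = json.loads(resp.read().decode("utf-8"))

for item in feed["items"][:5]:
    print(item["title"], item["media"]["m"])  # title and a small image URL
```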
Of course the Institute might hate it, and the design is still up for discussion, but as it is CSS-driven it is easy to change; I'm going to add some sand and desert plants to the background, and the blues denote the sky. I've also started to build in microformats, I've used the Zenphoto gallery for images (all temporary, from the lovely Vesta Curtis) and I'm starting to integrate the forum software into the CMS.

Anyway, comments gratefully received….even if you hate it or think it could be improved. I’m still just dabbling with this technology lark.