Smart Data Hack
In the week running up to International Open Data Day, we held the second incarnation of the Smart Data Hack at Edinburgh’s School of Informatics. It is, in essence, a 5-day hackathon for students, focussing on datasets that are locally relevant. The luxury of devoting a whole week to the event was courtesy of the University of Edinburgh’s annual Innovative Learning Week: a time to learn new things in new ways. This year, by the end of the week, there were a total of 16 teams (comprising some 65 students) who submitted projects to the judging panel.
So how does the Smart Data Hack work? One of the first tasks was to bring data publishers on board. This was not completely straightforward, but we were fortunate that two of our strongest supporters from 2013 again stepped forward with data (and prizes): City of Edinburgh Council, thanks to the vision and enthusiasm of Sally Kerr, and Skyscanner. Having the flexibility to accommodate both open and proprietary data within the same event is a plus, even though it makes it hard to neatly categorise the hackathon.
We had five other significant contributors to the hackathon: Scottish Neighbourhood Statistics, supported by Swirrl; Scottish Parliament; Friends of the Earth Scotland; Project Ginsberg, supported by Interface3; and Edinburgh University Student Association (EUSA). Apart from EUSA, the connections with these organisations had been established as a result of Open Knowledge Foundation Scotland activities, most notably our regular #OpenDataEDB meet-ups.
The structure of the week went like this. On Monday morning, we invited data holders to present their data and to pitch some ideas for working with it; Monday afternoon was a time to give students (primarily First and Second Year undergraduates) an introduction to technologies that they might not yet have encountered. The rest of the week, up until 2.00 pm on Friday, was given over to hacking, briefly interrupted by a show’n’tell session on Wednesday to mark the half-way point. Friday afternoon was taken over by groups of judges visiting each team, to receive a short demo and to ask questions. (Although logistically more challenging, we believe that this is a fairer procedure than judging solely on the basis of a more-or-less slick team presentation at the front of the room.)
What were the projects?
Unfortunately there isn’t space here to describe all the projects, and certainly not in the detail they deserve. Here is a brief rundown on seven projects from the hackathon which worked with open datasets.
1. Scottish Neighbourhood Statistics Data
Several of the teams were inspired by the new OpenDataScotland.org site to work with data from Scottish Neighbourhood Statistics and the Scottish Index of Multiple Deprivation. Bill Roberts of Swirrl generously gave his time to help with technical support.
This project created a Child Deprivation Index based on data from the 2012 Scottish Index of Multiple Deprivation, drawing on research that found significant correlations between child neglect on the one hand, and on the other, neighbourhood poverty, school performance, housing, and especially parental depression. The map is intended to alert service providers to areas where greater support for families would be particularly valuable.
The project aims to create a visual representation of information about schools in Scotland and the associated data zones from Scottish Neighbourhood Statistics Data in order to show interesting links between different schools and communities.
The map represents the schools as blue circles while the data zones that send children to a given school are connected to the school with a black line whose opacity is determined by the number of children involved. Data zones are coloured according to their ranking on the Scottish Index of Multiple Deprivation.
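As a rough illustration of how a line's opacity could be tied to the number of children, here is a minimal sketch. It assumes a simple linear scale with a small floor so that faint links stay visible; the function name, the floor value and the zone counts are all hypothetical and not taken from the team's project.

```python
def line_opacity(pupils, max_pupils, min_opacity=0.1):
    """Map a pupil count to a line opacity between min_opacity and 1.0.

    Assumption: a linear scale relative to the busiest data zone.
    """
    if max_pupils <= 0:
        return min_opacity
    # Clamp the share to [0, 1] so odd inputs cannot push opacity out of range.
    share = min(max(pupils / max_pupils, 0.0), 1.0)
    return min_opacity + (1.0 - min_opacity) * share


# Hypothetical counts of children sent from each data zone to one school.
flows = {"zone_a": 40, "zone_b": 10, "zone_c": 0}
busiest = max(flows.values())
opacities = {zone: round(line_opacity(n, busiest), 3) for zone, n in flows.items()}
```

A logarithmic scale would be a reasonable alternative if a few data zones send far more children than the rest.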
2. Scottish Parliament Data
The project aims to examine how much Members of the Scottish Parliament speak up and on what topics, with the goal of analysing their involvement in their constituency and seeing whether this correlates with the constituency’s ranking on the Scottish Index of Multiple Deprivation.
Track your MSP’s mood in (nearly) real time. Get to know your MSP through data scraped from Parliament and committee meeting transcripts, and analysed using natural language processing techniques.
3. City of Edinburgh Data
Spot the bin is an interactive game that teaches you about the city of Edinburgh. The goal of this game is to visit and learn about the location of parks and trees, museums and libraries, community centres, and much more.
In the game, you will use your Android smartphone to compete against others in the city.
As you ‘capture’ locations, you will be blazing a trail. Your aim is to break your competitor’s trail while protecting your own.
Using Tweets and data from City of Edinburgh Council, the project aims to display correlations between location, situation and happiness. Drawing on NLTK, Leaflet, Twitter and multiple other APIs across a massive amount of data, we aim to display cleanly and effectively any links we find between the data points.
The project uses data about air quality in Edinburgh to build a heat map, and to show what kind of exposure to air pollution citizens are likely to experience in the course of different kinds of journeys across the city.
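One simple way to compare journeys is to add up pollutant concentration multiplied by time spent on each leg. The sketch below assumes exactly that; the function name, the units and all the figures are invented for illustration and do not come from the project or from Edinburgh's air quality data.

```python
def journey_exposure(segments):
    """Sum concentration x duration over the legs of a journey.

    Each leg is a (concentration, minutes) pair. Both values are
    made-up inputs; the result is in concentration-minutes.
    """
    return sum(concentration * minutes for concentration, minutes in segments)


# Hypothetical journeys: a walk through a quiet street and then a busy one,
# versus a bus ride along the busy street only.
walk = [(20.0, 10), (45.0, 5)]
bus = [(45.0, 8)]

walk_total = journey_exposure(walk)
bus_total = journey_exposure(bus)
```

In this made-up case the longer walk accumulates more exposure than the shorter bus ride, even though part of it passes through cleaner air.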
Some Lessons and Surprises
What did I learn from this year’s Smart Data Hack?
- Even though we pointed students to the masses of open data out there, many of them still had to resort to screen scraping to get what they wanted.
- It turns out that first year Informatics undergraduates can get to grips with SPARQL in a matter of hours if that’s what they need to pull linked open data out of OpenDataScotland.org.
- Natural Language Processing was unexpectedly popular for getting information out of unstructured data such as Tweets and MSPs’ speeches.
- Enthusiasm and an inspiring vision go a long way in persuading students to work with you, even when you don’t have a lot of data (hat tip to Jaclyn Kaye and Kate Ho).
What else? Organising the Smart Data Hack isn’t easy but it’s fun, and it succeeds in large part because it is a co-creation with the students, not least via the support of CompSoc, the student computing society. And follow-through is important. We are now working to present the projects to wider audiences, and in some cases to see if they can be developed further in partnership with the data providers. This is really exciting. Although the Smart Data Hack is not part of the official University of Edinburgh curriculum, my hope is that it will contribute to teaching our students the benefits and possibilities of opening, spreading, and sharing data.
Finally, we didn’t have the energy to put on anything in Edinburgh for Open Data Day. But let’s give a shout out to Open Glasgow’s First Hackathon, which cleverly straddled 22 February.