January 2016 meetup at the National Library of Scotland

Cross-posted from http://nicolaosborne.blogs.edina.ac.uk/2016/01/19/open-knowledge-edinburgh-meet-up-19/

Screenshot of Greener Leith's Edinburgh Open Data Map

This evening I’m at Open Knowledge Edinburgh Meet Up 19, at the National Library of Scotland on George IV Bridge, organised by OK Scotland.

I’ll be liveblogging so, as usual, any corrections, tweaks, comments etc. are very much welcome.

Tonight’s event has seven lightning talks:

  • Gill Hamilton (NLS): Welcome
  • Pippa Gardner (Urban Tide): Scottish Government Open Data Training Pilot
  • Allan Brown/Gill Hamilton: Identifying People in Scotland’s Post Office Directories
  • Ewan Klein: The UK Local Open Data Index
  • Fred Saunderson: The NLS Open Data Strategy
  • Akiko Kobayashi: The Fountainbridge Community Wikihouse
  • Jeremy Darot: Data Linkage in Scotland / Greener Leith’s Edinburgh Open Data Map

Gill is starting us off with an introduction to the venue and the meet up – which is number 19. She is also giving a shout out to two Wikimedians in Residence: Sara from Museums and Galleries Scotland; and Ewan, Wikimedian in Residence for Edinburgh University.

Pippa Gardner (Urban Tide): Scottish Government Open Data Training Pilot

This is Pippa’s first visit to OK Edinburgh, though she has been to the Glasgow one a few times. So this is basically a big plug for #scotopendata, https://scotopendata.eventbright.co.uk.

ScotOpenData is a free open data training pilot for public sector organisations across Scotland, funded by Scottish Government. We are running 28 courses over the pilot year and this is a pilot – we are interested in content, style, duration, everything. It’s being handled in a very open way. We can get round about 560 people in that year but that’s just a drop in the ocean of the sector and the people working with open data. We’ve run 5 courses so far, two more this week (Aberdeen and Inverness) so do pass on the message.

At the moment there is a 1 day course: Open Data Opportunity – an introduction to what open data is, the cultural changes not just the technical issues. That’s covering background, strategy, aims of Scottish Government, engagement. The 2 day course then goes into much more detail and covers more technical aspects.

The 1 day course is designed with the needs of public sector leaders, senior managers and data owners in mind – although we see a wider range of people coming along. It’s quite high level, not technical at all, but talks about best practice and engagement.

The 2 day course is about the publication process, the publication chain, platforms to use, APIs, licensing, etc. And one of the things we are finding already is that the 2 day course is more popular than the 1 day course. There is a massive appetite for this throughout the country, for that detail not just the “what is” aspect.

A really interesting journey so far. Started in October, running until September 2017. Have first quarterly reporting coming up in the next few weeks. We have had 52% take up already. We have had strong representation from local authorities, NHS and a range of other public sector bodies so far. And we have used networks and social media to spread the word but do share onwards, all are welcome.

Feedback so far has included a lot of people reassured by knowing that there are others in the same boat as them – commenting that they feel they are “Not alone”, “struggling with limited resources”, and that there is a “great deal to gain from greater collaboration”. There is a particular interest in making business cases etc. We think the exchange of ideas and experience and networking is a hugely valuable part of these sessions and we need to think about how to sustain that network on an ongoing basis.


Q1) What are the reasons people are giving for coming along?

A1) For the 2 day course the technical aspects have been really important, there is a real appetite for that. They want to know how to do it and how to coordinate across Scotland. The 1 day course is a lot about people starting out, de-mystifying, and really wanting a focus on benefit and business case – what can I use to take to my senior managers to make my case?

Q2) What is the eligibility here? Are community councillors eligible?

A2) As long as you have an association with an eligible public sector body it should be fine, but I can check. There is a list you can look at too. The only people we’ve had to turn away so far have been from the academic sector – their training is funded separately.

Q3) Has there been any follow up with participants?

A3) We ask questions through Eventbrite at sign up, we ask again at the end of the course, and then we do a 3 month follow up. Some show a dip after the course – we think that may be about them judging their own skills and then reassessing them in light of learning more. But they are quite engaging workshops, getting people talking about what they will do when they go back…

Allan Brown: Identifying People in Scotland’s Post Office Directories

I’m going to be talking about my honours project using the OCR data from scanned historical Post Office Directories (PODs). And taking those original directories and making them into a searchable database – looking for surname, first name, address, business name etc. And I am using machine learning to do this.

So we wanted to identify feature vectors – what a forename looks like, what a surname looks like etc. so that the system can use that as a training set to learn what those features look like, so that it has a mathematical model to predict what kind of word new ones might be.

So, an example of doing this would be to take feature vectors of the form [cloud coverage, temperature, wind speed] and to predict if it will rain. The system looks for features that can differentiate rainy and non-rainy days…
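As a rough illustration of the weather example (not anything shown in the talk), here is a minimal sketch of a classifier that learns from labelled feature vectors. The training values are invented, and the nearest-centroid method is just one simple choice; Allan did not say which algorithm his project uses.

```python
from statistics import mean

# Invented training data: [cloud coverage (0-1), temperature (°C), wind speed (mph)]
TRAINING = [
    ([0.9, 8.0, 20.0], "rain"),
    ([0.8, 10.0, 15.0], "rain"),
    ([0.7, 9.0, 25.0], "rain"),
    ([0.2, 18.0, 5.0], "dry"),
    ([0.1, 21.0, 8.0], "dry"),
    ([0.3, 16.0, 10.0], "dry"),
]


def centroids(examples):
    """Average the feature vectors seen for each label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


model = centroids(TRAINING)
print(predict(model, [0.85, 9.0, 18.0]))  # a cloudy, cool, windy day
print(predict(model, [0.15, 20.0, 6.0]))  # a clear, warm, calm day
```

The features are left unscaled here to keep the sketch short; in practice you would normally rescale them so that one feature (such as wind speed) does not dominate the distance.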

So, we are doing that sort of prediction for the PODs. Why use machine learning for this? Well it handles format differences between directories well – which is good as the directories from across 100 years vary here. It handles format differences within directories… and ambiguities. OCR errors mean it’s not just a case of looking up words in a dictionary (70% accuracy when we tried that) and our machine learning is hitting about 80% accuracy.
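To make the idea of a feature vector for directory text a little more concrete, here is a hedged sketch of how one OCR'd token might be turned into numbers. The features chosen (position, capitalisation, digits, punctuation, length) and the sample line are my own assumptions for illustration, not the features used in Allan's project.

```python
import re


def token_features(tokens, index):
    """Turn one token of an OCR'd directory line into a numeric feature vector."""
    token = tokens[index]
    letters = re.sub(r"[^A-Za-z]", "", token)
    return [
        index / max(len(tokens) - 1, 1),                   # relative position in the line
        1.0 if token[:1].isupper() else 0.0,               # starts with a capital letter
        1.0 if any(c.isdigit() for c in token) else 0.0,   # contains a digit
        1.0 if token.endswith(",") else 0.0,               # followed by a comma
        1.0 if token.endswith(".") else 0.0,               # looks like an abbreviation
        float(len(letters)),                               # number of letters
    ]


# Invented example line in the general style of a directory entry.
line = "Smith, John, grocer, 12 High st."
tokens = line.split()
for i, token in enumerate(tokens):
    print(token, token_features(tokens, i))
```

Vectors like these, paired with hand-labelled examples (surname, forename, occupation, address), are the kind of training set a classifier could learn from.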

The benefit of this project is to provide historians with an open source tool for exploring Scotland’s history. And a free resource. It serves as a springboard for further work with similar data. And demonstrates what can be done with open data and a broad range of experts from different fields – showing the benefit of using this data beyond historians so that more can be done with the data, making it more useful.


Q1) This is important stuff. Are the NLS, and are you, relaxed about copyright and open data?

A1 – Allan) The data we are using is already open source. But the format isn’t that searchable or sortable. So the idea is to attribute metadata to it so we can attribute people to it. I think it will be almost entirely open source.

A1 – Gill) We license transcriptions as CC0, images as CC-BY-NC. But we are using the giant XML transcriptions.

Q2) If you took current valuation data rolls, could you do the same thing? The valuation rolls of commercial properties etc. the NRS data.

Comment) Better to ask for the current owner of the data.

A2 – Allan) This is very much designed for the Edinburgh Post Office Directory. Very…

Q3) How far through?

A3) About half way through… Can take a page, identify people in that data… Looking to de-duplicate data across directories…
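The talk did not go into how de-duplication would be done. As one possible approach, here is a small sketch that groups near-identical entries using fuzzy string matching from Python's standard library; the sample entries and the 0.85 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def normalise(entry):
    """Lower-case and strip punctuation so trivial differences are ignored."""
    kept = "".join(c for c in entry.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def similar(a, b, threshold=0.85):
    """True when two entries are close enough to be treated as the same record."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold


def deduplicate(entries, threshold=0.85):
    """Greedy grouping: an entry joins the first group whose first member it resembles."""
    groups = []
    for entry in entries:
        for group in groups:
            if similar(group[0], entry, threshold):
                group.append(entry)
                break
        else:
            groups.append([entry])
    return groups


# Invented entries; the second imitates an OCR misreading of the first.
entries = [
    "Smith, John, grocer, 12 High st.",
    "Smith, Jolm, grocer, 12 High st",
    "Smyth, Jane, baker, 40 Leith walk",
]
for group in deduplicate(entries):
    print(group)
```

A real pipeline would need to decide whether the same person appearing in directories from different years counts as a duplicate or as useful history, which is a modelling decision rather than a string-matching one.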

Q4) We worked with these same directories a few years ago (for AddressingHistory), looking for locations based on file structure rather than machine learning but that work might be of interest to combine with the person work you are doing.

Comment) I think you already have the POD Parser from that

A4) Yes, but would be useful to discuss.

Ewan Klein: The UK Local Open Data Index

The UK Local Open Data Index is part of a wider OK Foundation project looking to measure how mature, and how open, countries are.

So, you can look at the US City Open Data Census and this compares data sets deemed important, then ranked (with a traffic light colour system) by openness.

So, if you want to run a census for your country you can do that nationally or locally. The community agrees the key data sets. Then we have a hack or sprint event doing some leg (desk) work to see what is available – what open data is available on crime in LA, say. And then from that the ranking by importance is done.

A census was started for the UK but it didn’t get that far. Nottingham, Cambridge, Leeds, Manchester etc. were looked at but the data gathered wasn’t terribly thorough. The default data sets include things like real-time transit; air quality etc. These are reasonable… But are they useful for Scottish cities? For instance:

  • Real-time transport data – is controlled by transport operators
  • Air quality – is published by Air Quality Scotland, collected locally
  • Transport timetables – again, transport operators
  • Crime statistics – collected by police, published by Scottish Government
  • Procurement contracts – published by Scottish Government
  • Food Safety inspections – published by Food Standards Agency
  • Traffic accidents – Published by UK Dept for Transport

Many of these data sets are not at city level, and many cities in Scotland will have the same data available so not useful to compare.
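The census process described above (agree the data sets, check each one, colour the result) can be sketched roughly as follows. The criteria names, the colour thresholds and the two placeholder places are all assumptions made for this illustration; the real census uses its own specific questions.

```python
# Illustrative yes/no criteria; the real census questions may differ.
CRITERIA = (
    "exists",
    "online",
    "machine_readable",
    "free_of_charge",
    "openly_licensed",
    "up_to_date",
)


def traffic_light(answers):
    """Map a data set's yes/no answers to a colour (thresholds are assumptions)."""
    met = sum(1 for criterion in CRITERIA if answers.get(criterion, False))
    if met == len(CRITERIA):
        return "green"
    if met == 0:
        return "red"
    return "amber"


def census(submissions):
    """Build a {place: {data set: colour}} table from desk-research submissions."""
    return {
        place: {name: traffic_light(answers) for name, answers in datasets.items()}
        for place, datasets in submissions.items()
    }


# Entirely invented submissions for two placeholder places.
submissions = {
    "Place A": {
        "air quality": dict.fromkeys(CRITERIA, True),
        "public amenities": {"exists": True, "online": True},
    },
    "Place B": {
        "air quality": dict.fromkeys(CRITERIA, True),
        "public amenities": {},
    },
}

table = census(submissions)
for place, row in table.items():
    print(place, row)
```

In this toy table both places score the same for air quality, which is the situation Ewan describes: a data set published nationally tells you nothing when comparing cities, whereas locally held data sets are where the differences show up.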

In Australia they used different data choices: public amenities; addresses; trees; garbage collection times and places; bike paths and footpaths; ward boundaries; property boundaries; public buildings; building outlines; etc.

I think it is up to the community to decide those data sets that matter, that have relevance and meaning to those communities. I’d like this community to be involved in that. Saturday 5th March is International Open Data Day and I’d like to do a sprint and to carry out an Open Data Census for Seven Scottish Cities! Join me!


Q1) Aren’t there standard city measures?

A1) There is an ISO standard for city indicators – with about 400 measures.

Q2) Can you find those automatically with web crawlers etc.?

A2) There are limits – for instance on whether the license is machine readable. And whether available for download, or by API etc. I’d be happy to sit down and look with you at the data sets… Doing some of this automatically is useful to a point, but you need human judgement too. But first you have to decide what is important.

Q3) Are there clear requirements for openness?

A3) There are quite specific criteria to use.

Q4) It’s maybe dull at city level but it’s a good thing that Scottish Government is ensuring data is comparable across cities. That’s a good thing. And there are things that aren’t being done nationally that could be done more, collected more…

A4) I think that’s important and you could argue that having green all the way down might be a good thing.

And we’ve just had a wee break here… Now onwards… 

Fred Saunderson: The NLS Open Data Strategy


We published our NLS Open Data Publication Plan last week. This comes out of the Scottish Government Open Data Strategy which builds on the principles of open data by default, quality and quantity, usable by all, releasing data for improved governance, releasing data for innovation.

That Scottish Government strategy calls particularly on public sector organisations to publish their data in a format more appropriate for reuse – 3* or above on Tim Berners-Lee’s Deployment Model. So this is really exciting for us, it’s a really strong encouragement and a reason to talk to senior managers, to get buy-in on a plan. There is an appetite for better understanding, better structure, it’s a really nice way to go about it.

So, last month we published our Open Data Publication Plan (http://www.nls.uk/about-us/open-data/). And our plan is to provide our data in 3* and above. Unlike many public bodies we are set up to provide information, we have been thinking about this for a long time, so we are in a good place to get our data to a good standard. We benefit from already having the culture and mindset of data and data sharing.

Fred is explaining Berners-Lee’s deployment model.
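For readers who missed the explanation, the star levels build on each other: open licence on the web, then structured data, then a non-proprietary format, then URIs for things, then links to other data. A minimal sketch of that ladder is below; the `Dataset` class and its field names are hypothetical, invented for this note.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published data set."""
    on_web_open_licence: bool     # 1 star: on the web under an open licence
    structured: bool              # 2 stars: machine-readable structured data
    non_proprietary_format: bool  # 3 stars: open format, e.g. CSV rather than Excel
    uses_uris: bool               # 4 stars: URIs identify the things described
    linked_to_other_data: bool    # 5 stars: links out to other data for context


def star_rating(dataset):
    """Count stars in order, stopping at the first level that is not met."""
    levels = (
        dataset.on_web_open_licence,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    )
    stars = 0
    for met in levels:
        if not met:
            break
        stars += 1
    return stars


scanned_pdf = Dataset(True, False, False, False, False)
open_csv = Dataset(True, True, True, False, False)
print(star_rating(scanned_pdf))
print(star_rating(open_csv))
```

On this reading, an openly licensed CSV file reaches the 3* level that the plan targets, while an openly licensed scan or PDF stays at 1*.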

So, our plan lists the data we have and will make available and two major priorities:

  1. To make available as 3* open data the data that we already make available
  2. We can also identify what we aren’t yet supplying at that level. And we aim to publish appropriate non personal and non sensitive data as 3* open data.

So, we want better data. But we also want better reuse which will benefit us but also will benefit wider society.

Generally we will be licensing under CC-0 or CC-BY for data. And we aim to release as CSV, EAD (Encoded Archival Description) and MARCXML. We may release in other formats but those are our main formats.

We will have 14 datasets opened by the end of 2016; a further 8 identified by the end of 2017. We have a list of the datasets – it is online – but I wanted to point out that it’s an amalgamation of collections metadata and corporate information. For instance the Emigrants Guides to North America, the Bibliography of Scottish Literature in Translation (BOSLIT) – these are things we collate or use. But we also have datasets like Payments with a value in excess of £25,000 – much more data on the running of the organisation.

In terms of other datasets we need to identify those datasets that we can open up. We have that list of what is already available, but we will be adding to that. We will release this data on our website to start with. But I know that the Scottish Government is also working on a data discovery site where it will also be discoverable.


Q1) You mentioned some difficulty identifying non personal, non sensitive data… I was wondering if you’d seen the ODI Data Spectrum as that helps a lot with clarifying that.

A1) I’ll take a look!

Q2) What about 4* or 5*?

A2) We haven’t done any yet, so we want to get some institutional buy-in early, and get to the 3* place first.

A2 – Gill) We will get there… But it’s a step process. We have to do what the government recommends first, and then move onwards.

Q3) Isn’t this a legal requirement? There is a directive… I thought it was a mandate to publish a plan.

A3) That’s different, that’s an EU directive perhaps…? The Scottish Public Sector has a different specific requirement on public sector bodies, which is what we are working towards.

Akiko Kobayashi: The Fountainbridge Community Wikihouse


I’m an architect and I’m going to give you an overview of a project I worked on last year. The site is in the Fountainbridge area. The council acquired the land for Boroughmuir High School and, once it acquired that land, we asked if we could set up a community project. So, our project is Fountainbridge Community Initiative, and brings in the Community Garden and The Forge. The Wikihouse was a project undertaken for £3500 of public and housing association funding.

For me a Wikihouse is about open source design – a whole other talk on the meaning of that. It is also about digital fabrication; and the ease of assembly.

So, Open Source Design… There are hundreds of projects on the Wikihouse website. Earlier designs formed portal frames with two layers sandwiched together – still in use but now the design uses box beam… (cue exciting presentation of samples!). So, you have one layer, with sides, built of plywood to create a very strong cross-section.

I used the design, with some additional privately shared designs, and some hacks. So we have a frame design that builds a strong building but, as it is temporary, we don’t have foundations but instead use breeze blocks as ballast to help hold it down. Then throughout the structure we have a waterproof membrane to keep it water tight.

So if you go to the ecommons folder the models and designs are provided in SketchUp – as that’s free. It’s fascinating to explore and play with those designs. There is little documentation so you have to pick apart the design that way anyway, and you have to personally take responsibility to decide if that’s a sensible approach.

In SketchUp you can then explode the design to see the components and use a tool (e.g. AutoCAD) to make changes etc. Then you use ? software to allocate the pieces to your plywood. That is then turned into instructions for a CNC (Computer Numerically Controlled) router to cut and create those pieces. I’m sure many of you love learning new things – and what I love about new things is sharing that with people. I loved learning how to use the CNC machines, so I trained a bunch of people (from all sorts of backgrounds) to do that too, to help fabricate the pieces.

To me open is also about ease of assembly… Making this as easy as possible means as many people as possible can get involved and be part of the process. To find out how easy it was to do I tried a test assembly of one 1:1 frame to see how long it took, if it worked, was it easy enough to do. The version (v.4.2) I was basing my wikihouse on hadn’t ever been built before so it was a major test. And I was feeding back to the Wikihouse Foundation a lot on that. And we got a lot of feedback from the community on how they found the process.

There was one sequence issue – not readily obvious in the 3D model – that I found in building this. So I fed that back to the forum for version v.4.2… But no one saw it unfortunately (but it was fixed another way!).

We also did some load testing (with human and a keg of Guinness!) and alongside design and fabrication I undertook some 1:3 scale workshops (for context Akiko’s model along today is 1:10) to get a sense of how this would work – with community members, with Primary 2 students, with structural engineers (very competitive but also very lovely in coming up with fun and useful testing suggestions!) etc.

The event and build took place in October. There is one other building on site, the Rubber Mill (shortly to be the home of Edinburgh Print Makers), where we were allowed to store components.

So, we began the project building frames… We had half our volunteers from Canmore Housing Association, school children visiting… We built the frames, lifted them and made connections with pin connectors and joints. We had a rock climber on roofing duty… Companies donated some of the materials. The developers provided a camera to take images of the building. My favourite picture is one where an early mock up was pretty much happening during the build!

One of our participants said: “I can’t believe I helped to build a building. I thought only professionals could do that.” Now, obviously there are some things that only professionals should do but there are lots of things we can all build, do, and be part of. I’m interested in self-build houses and that idea of how much you can do yourself is part of that.

So, take a look at #wikihouseEDIN (@fountainbridgec, @WikiHouse) – do get in touch and use the space – we have community stuff taking place there, Collective Gallery’s Marxist reading group meets there!


Q1) Is the issue with self-build homes that we don’t have the land?

A1) The land is there for developers to use. There is a bigger policy issue around land banking.

Q2) What are the future uses for the system?

A2) People see this in different ways. Alastair Parvin, who leads Wikihouse, is in touch with development companies. There is wastage. As nice as digital fabrication is… I had a nightmare finding a CNC machine I could use affordably… For remote spots, non developed countries… CNC is a big ask. I don’t know how he’s getting on with that. The strongest feature for me is ease of assembly. My idea for this site was as a catalyst to help people to understand that they could get involved in their own self-build house, or whatever… Self-efficacy. Maybe even just for confidence in building a piece of Ikea flat pack furniture…

Jeremy Darot: Data Linkage in Scotland / Greener Leith’s Edinburgh Open Data Map

My day job is as a statistician at the Scottish Government working on making data available as Linked Open Data. Up until now it has been difficult to link data sets together to get the maximum value from open data sets. So the Scottish Government has put together a framework to make that happen, particularly for research and statistics purposes. So, if you are a researcher and have a question that you think would be answered by combining several data sets, and would benefit Scotland, we can offer some support. So please come and speak to me.
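At its simplest, combining data sets means joining records that share an identifier. The sketch below shows that idea only; it is not the Scottish Government framework, and the area codes, column names and figures are invented.

```python
def link(left, right, key):
    """Inner-join two lists of dict records on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    linked = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge the two records; values from `left` win on any name clash.
            linked.append({**match, **row})
    return linked


# Invented records keyed by a made-up area code.
population = [
    {"area": "A01", "population": 5200},
    {"area": "A02", "population": 8100},
]
surgeries = [
    {"area": "A01", "gp_surgeries": 2},
    {"area": "A02", "gp_surgeries": 1},
    {"area": "A03", "gp_surgeries": 4},
]

linked = link(population, surgeries, "area")
for row in linked:
    print(row["area"], row["population"] / row["gp_surgeries"], "people per surgery")
```

The join only works because both data sets use the same identifier for the same thing, which is why consistent codes (or URIs, in Linked Open Data) matter so much.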

Outside of work I work with a charity called Greener Leith with a crazy plan to plant 1000 trees in Leith, but we are also working in sustainable communities and sustainable development. It can be really hard for individuals to track planning, to engage with it, and to have a voice in that. Happily Edinburgh Council are getting better at making data available as open data on their website though.

So, I decided to take this data and create a map of Edinburgh… This is a really simple map, created in R (used in academia and government) and it’s a little bit of server side and client side. I didn’t even have to use a GIS to do the spatial data analysis with this. So, I started taking a small data set from the Leith community council – a curated database of major projects. For each project – such as the tram extension – we have status, information, a link to the Council website and any consultation links. Ideally I’d like to do this for all of Edinburgh. I know the Improvement Service will be launching a website to do this for all planning applications… But some manual curation is needed… Actually things are only going to be in there when an application is made, and often as a community one wants a voice before that stage, before an application is made.
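Jeremy's map is written in R, and his code was not shown. Purely as an illustration of how a curated project list can become something a web map can draw, here is a Python sketch that converts records into GeoJSON. The field names, the sample project, its URL and its coordinates are placeholders I have made up.

```python
import json


def to_feature(project):
    """Convert one curated project record into a GeoJSON point feature."""
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [project["lon"], project["lat"]]},
        "properties": {key: project[key] for key in ("name", "status", "url")},
    }


def to_feature_collection(projects):
    """Wrap all projects in a FeatureCollection that a mapping library can load."""
    return {"type": "FeatureCollection", "features": [to_feature(p) for p in projects]}


# Placeholder record: not a real project, URL or location.
projects = [
    {
        "name": "Example project",
        "status": "Consultation open",
        "url": "https://example.org/project",
        "lat": 55.97,
        "lon": -3.17,
    },
]

collection = to_feature_collection(projects)
print(json.dumps(collection, indent=2))
```

Keeping status and links in the feature properties means each marker's pop-up can show the details described above without any extra lookup.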

So, that was my starting point… I have also brought in the Local Development Plan… Protected areas (including listed buildings). This is a framework… But also with planning you need to understand infrastructure. For instance, in Leith we are a highly populated and growing area… but if anything access to GP surgeries has been decreasing… So I wanted to map GPs (and ideally catchment areas – I have a request in for that), dentists, care homes, and also things like shops. And I have also added options for air quality, based on the University’s air quality monitoring station. It’s possible to add lots of things to this map… One thing that is quite useful is to view administrative boundaries. And you can search the map.

I provide links on the map to the Greener Leith site, and a page with much more information, the data sources etc.

So, this was a weekend pet project… I want to add census data, data on population health, Council Tax paid… And will also use a MongoDB to let people save views, add data, etc. I’ve no idea if this is useful and usable so do play around with it and let me know any feedback, bugs etc.

The map that Jeremy built can be found here: https://myleith.shinyapps.io/myedinburgh/, with more resources available at https://myleith.wordpress.com.


Q1) How will all this information, these maps help developers? And infrastructure shown?

A1) Ideally I’d want it to be useful. I’d like to think developers will look at this stuff when planning. I got some early data under a data sharing agreement… The Council holds this data but not all open. The idea is that this sort of data levels the playing field a little bit for consultations and planning processes.

Q2) Can we get access to the R code that drives this?

A2) Yes, I’ll ping the code on GitHub… And my code is built on that.

Q3) A lot of this data is available on Edinburgh’s atlas actually…

A3) A lot of open data is available, but

Q4) Did you say that you did this over a weekend?

A4) I did, yes. And for free – though it costs me $9 a month to host. I used CartoDB and Leaflet, etc.

Q5) Could this be your day job?

A5) I have a day job! But I know that the Scottish Government has a great open data portal that is really lovely.

Ewan: I should note the MESH project was responsible for many of the detailed contributions to Edinburgh data on OpenStreetMap for that, and we have Richard Rodger, the leading light of that project, present in the audience tonight.

Richard: Indeed. We have what will be the most detailed map in Europe, and we want that to be 100% accurate. So when Jeremy zooms in you can see every single garden… Every house number. They will all be there and that work continues. All of the plots in huge detail. And every business listed is actually accurate at the moment – it’s really really useful. You could use that for checking what else is already open/nearby etc. (e.g. to find the best place for a new pharmacy near a GP’s surgery etc).

And policy isn’t best made with contemporary data, but with 5 years or 10 years’ data. I would like to take the last 10 years of planning data, put in a georeferenced database… And contact community and community organisations to see what is happening, and the upcoming planning proposals. Most importantly it would be great to know if data sets would have 5 or 10 year back projections – so useful for policy making.

Jeremy: There is data going back to the 80s on changes of use that could be useful… Would be great to work together actually.