Developing a National Core Reference Data Set
Today was a big day for open data in the UK with the publication of Stephan Shakespeare’s Independent Review of Public Sector Information. Of course, the Government will be responding formally to the review over the next couple of months, but I wanted to explore one of the main issues highlighted by the review. For me, the standout recommendation from the report was to establish a National Core Reference data set.
Stephan does not set out these data in detail, but offers a common-sense set of principles to assist us:
“Within such National Core Reference Data we would also expect to find the connective tissue of place and location, the administrative building blocks of registered legal entities, the details of land and property ownership.”
“We should define 'National Core Reference Data' as the most important data held by each government department and other publicly funded bodies; this should be identified by an external body; it should (a) identify and describe the key entities at the heart of a department’s responsibilities and (b) form the foundation for a range of other datasets, both inside and outside government, by providing points of reference and interconnection.”
As a result of a number of Prime Minister’s letters, the 2011 Growth Review, the Open Data White Paper and Departmental Open Data Strategies, the Transparency Team in the Cabinet Office have been focussed on getting the most valuable data out of government and into the hands of developers. We have drawn upon advice from senior figures in the world of open data, like Nigel Shadbolt, Rufus Pollock and others who sit on the Transparency Board, to help define our priorities. In addition, through the user request process on data.gov.uk and with assistance from the Open Data User Group, released data sets are increasingly driven by user demand. We report to Parliament on progress against the commitments made in the Open Data White Paper through quarterly Written Ministerial Statements.
Of course, we have already made substantial progress, with over 9,000 data sets released by government departments and agencies, including critical information such as government spending, crime data and school performance. Stephan himself, as Chair of the Data Strategy Board, has helped drive the release of core location data from the Trading Funds, such as Ordnance Survey OpenData maps and the important recent release of Land Registry Historical property data.
But clearly there is more to do, and Stephan offers a potential route to the next stage. I like Stephan’s suggestion that there is a core set of data that is critical to each government department, although I suspect it may not be straightforward to define. If we were thinking about transport, it seems likely this might involve the location of transport terminals, live timetable information, government subsidies/spending and fares (all of which are already openly available), but is there more?
I would like to open a conversation with data users and, of course, government departments about what we think this core reference data set might consist of. It seems sensible to think of it as additional to the unique demand-led process we have in place via data.gov.uk and ODUG, and I suspect there will also always be a role for a central focus on particular data sets that might not be seen as ‘core’ but which could still make a transformational difference to citizens.
I am interested in your comments about how we might create a more detailed set of definitions for a national core reference data set that could be applied across Departments, and I’m also interested in what people think the top five or so data sets in each department might be, regardless of whether they have already been released.