Update from ONS on data interoperability

Overcoming the incompatibility of statistical and geographic information systems

At first glance, publishing statistical data has always appeared to sit quite nicely with the principles of data transparency. Statistics are all about taking an observation and linking it to other attributes such as classifications or geographies. Statistics are also about making data available, and the UKSA publication hub has been making UK-level statistics available for many years.

So if statistical agencies have it all figured out, where is the challenge?

Whilst statistics are published regularly, geography is often overlooked. To geographers within a statistical organisation, its importance seems obvious: “everything happens somewhere”, and statistics are observations of ‘things’. Geography must therefore be a fundamental part of any statistic produced.

The reality, though, is that statistical agencies are primarily concerned with building statistical systems, and geographic and statistical formats are rarely compatible. If a system is built to take the SDMX statistical data format, how do you include the requirements of GEMINI 2.1 metadata or GML? Statistical systems are also often unable to handle the large boundary files that geographers want to make available.

To try to overcome some of the interoperability problems between statistics and geography, ONS published the Geography Policy for National Statistics. This document sets out the geographic parameters that statisticians must work within if they intend to produce National Statistics, based around seven core geographic principles that statistical users must adhere to. In isolation, however, the policy is not enough, and geographic tools are needed to support users.

So this brings us back to the same problem. Statistical users need geographic tools and data to implement the policy. This data needs to be incorporated within products, but the existing systems for disseminating statistical data cannot work with geographic products of the size required. This, therefore, is the challenge of data transparency for statistics.

Historically, ONS has used a combination of dissemination mechanisms for geographic data, such as web dissemination, orphan sites and DVDs sent through the post, but these methods are inconsistent and don’t fit with the transparency agenda.

The solution is to use data.gov.uk as a single access point for discovery of geographic data, and to link from there to a geoportal (currently in development) where users can download the geographic products online. This goes most of the way to delivering the tools that users need to work with statistical data. There is also an opportunity to go further and provide geographic data as linked data, using the GSS codes that uniquely identify each geography to link the attributes from the different geographic products.

Now, instead of a nine-character GSS identifier, each geography is given a URI that not only identifies it uniquely but also makes it available online. We therefore end up with identifiers such as http://statistics.data.gov.uk/id/statistical-geography/E05008305, where users only need to change the GSS code at the end to get to the geographic information that they need.
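
As a rough illustration of this URI scheme, the Python sketch below builds an identifier from a GSS code. The base URI and the example code come from the text above; the function name and the validation pattern (one uppercase letter followed by eight digits) are assumptions made for the example, not part of any ONS tool.

```python
import re

# Base of the identifier URIs, taken from the example in the post.
BASE_URI = "http://statistics.data.gov.uk/id/statistical-geography/"

# Assumption: a GSS code is one uppercase letter followed by eight digits,
# as in the post's example E05008305.
GSS_CODE_PATTERN = re.compile(r"[A-Z][0-9]{8}")


def geography_uri(gss_code: str) -> str:
    """Return the identifier URI for a nine-character GSS code.

    Hypothetical helper: it only assembles the string and does not check
    that the geography actually exists.
    """
    code = gss_code.strip().upper()
    if not GSS_CODE_PATTERN.fullmatch(code):
        raise ValueError(f"not a nine-character GSS code: {gss_code!r}")
    return BASE_URI + code


print(geography_uri("E05008305"))
```

Swapping in a different GSS code gives the identifier for a different geography, which is the only change a user should need to make.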

This has created a single online resource for all statistical geography data, where users can navigate through several products dynamically without being aware that they are doing so. It has allowed the data to be linked to other organisations’ data, such as Ordnance Survey or legislation data. Users can now choose to take data in a variety of machine-readable or human-readable formats, and the expectation is that instead of being supplied with products, users will build applications that take live feeds of the data. Instead of a quarterly product cycle, there will now be daily or weekly data updates.
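
One way an application might ask for a particular format is HTTP content negotiation. The Python sketch below prepares such a request using only the standard library. It assumes the service honours the Accept header, which is not confirmed here, and the format names and media types in the mapping are illustrative choices rather than a documented list.

```python
from urllib.request import Request

# Identifier URI from the example earlier in the post.
GEOGRAPHY_URI = "http://statistics.data.gov.uk/id/statistical-geography/E05008305"

# Assumption: illustrative format names mapped to standard media types.
MEDIA_TYPES = {
    "rdf": "application/rdf+xml",  # machine-readable
    "json": "application/json",    # machine-readable
    "html": "text/html",           # human-readable
}


def build_request(uri: str, fmt: str) -> Request:
    """Prepare (but do not send) a request for `uri` in the given format."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})


request = build_request(GEOGRAPHY_URI, "rdf")
print(request.full_url, request.get_header("Accept"))
```

Nothing is sent over the network here. An application that wanted the data would pass the prepared request to `urllib.request.urlopen` and parse the response.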

So the end result is that data quality, data currency and data availability are all improved as a result of the current work that ONS is doing. The data that is now available will be used and linked in ways that are impossible to foresee at present and will undoubtedly surprise. As an organisation, ONS no longer needs to worry about the interoperability of geographic and statistical systems and products, as the data will be published and linked at the cellular level, removing the long-term requirement for products. Ultimately, civil service departments are no longer telling users what data they need, but are instead making all data available and allowing users to innovate, and this can only be a good thing.


So does this mean that if we look at a paper about how many people got sick in the UK, the statistics should indicate the right geographic zone or just keep account of it?

The link for the RDF/XML file shows the GSS notation but how would we go about finding the notation label using SKOS?

