Linked data, registries and talking about the weather
Here in the UK we love to talk about the weather; it’s in our nature. But whilst we humans can deal with the ambiguity of everyday conversation, machines and software agents just don’t stand a chance. I might say that “it is misty with a temperature of 4 degrees” and the chances are you’ll have a good idea of what I mean (unless you’re from the US, in which case you’ll think I’m talking Fahrenheit rather than Celsius!). Without the benefit of context, machines can’t infer that “mist” is the prevailing weather condition, that by “temperature” I mean the temperature of the air, excluding the effects of wind chill or warming from the sun, and that, as a European, I’m stating the value in “degrees” Celsius. To be interoperable, machines need all these details spelled out explicitly in the data they exchange.
The World Meteorological Organization (WMO), a UN specialised agency, facilitates the “free and unrestricted exchange of data and information [relating to weather, water resources and climate] in real- or near real-time relating to safety and security of society, economic welfare and the protection of the environment”. Since 1963 WMO has been operating a programme called the “World Weather Watch” that, amongst other things, coordinates the routine sharing of weather observations from the meteorological services of Member States. Without this international collaboration, the Met Office would be unable to forecast the weather for the UK or effectively monitor climate change.
When it comes to setting the standards for how weather observations and other meteorological data are shared internationally, WMO takes responsibility. WMO is also responsible, on behalf of the International Civil Aviation Organization (ICAO), for prescribing the standards for sharing meteorological data relating to international air navigation services. Did you know that it is a legal requirement for pilots to have valid “aerodrome routine meteorological reports” (METAR) and “aerodrome forecasts” (TAF) in order to fly?
The details of these data exchange standards are provided in WMO No. 306 “Manual on codes”. Not only do these standards say what type of information needs to be shared and how it should be structured, they also include hundreds of code-tables that provide a controlled vocabulary for talking about the weather: everything from weather conditions, to runway deposit types, to sea surface states, and even a classification scheme for the maturity of locusts! Effectively, the “Manual on codes” provides a shared language for talking about weather, water and climate. Sadly, this “shared language” is tightly coupled with WMO-specific data formats that have been designed over the years to be extremely compact. Whilst experts from meteorological services (and pilots!) have the necessary skills and training to understand data encoded in these formats, they’re pretty impenetrable to everyone else.
WMO recognises that, as the UN system’s “authoritative voice” on meteorology, it must respond to the growing expectation that anyone should be able to gain value from data, not just the technical elite. As such, WMO is beginning to publish new data standards based on common technology with ubiquitous tooling support, such as XML, and with a commitment to the ISO 19100-series of geographic information standards (the set of international standards underpinning the European INSPIRE Directive and UK Location). As the national meteorological service of the UK, the Met Office is committed to driving this initiative within WMO. Met Office staff hold a number of leading roles in WMO expert teams, including my own role as chair of the “Inter Programme Expert Team on Metadata and Data Representation Development” (IPET-MDRD), the expert team responsible for developing new data standards and approaches to interoperability. The first of these new data standards to be delivered, the ICAO Meteorological Information Exchange Model (IWXXM), was published by WMO in September 2013 to meet interoperability requirements from the international aviation community and to reflect changes to ICAO regulation.
During the development of IWXXM, WMO experts recognised that it was no longer sufficient for the code-tables defined in the “Manual on codes” to be published only in document form. The terms from these code-tables, also known as “controlled vocabularies”, needed to be referenced from XML-encoded data products. Referencing such terms by their labels is error-prone. For example, the runway surface deposit type “Damp” (from WMO FM-94 BUFR edition 4 code-table 0-20-086 “Runway deposits”) might be written in block capitals (“DAMP”), with typographic errors (“Dammp”) or in another of WMO’s official languages (“влажный”, in Russian). Whilst it is possible to reconcile these variants with the controlled vocabulary, doing so is non-trivial. The alternative is to use an unambiguous identifier for each term and for the code-table in which it is defined.
When it comes to assigning identifiers to things, Tim Berners-Lee’s 5-star deployment scheme for Open Data provides clear guidance (for a 4-star rating): “use URIs to denote things - so that people can point to your stuff”. Better still, by using HTTP URIs, we can use the machinery of the Internet to resolve those identifiers to some useful information about the identified resource. To do this, WMO has established a sub-domain of wmo.int, codes.wmo.int, within which all these terms and code-tables can be published. So the runway deposit type “Damp” is identified as http://codes.wmo.int/bufr4/codeflag/0-20-086/1. Whilst this HTTP URI might not be everyone’s cup of tea in terms of readability, the WMO experts chose to map the URI path onto the pre-existing governance structures of WMO’s “Manual on codes”: “bufr4” relates to WMO FM-94 BUFR edition 4, “codeflag” relates to the code- and flag-tables within that data format specification, “0-20-086” identifies a particular code-table (“Runway deposits”) and “1” is the local identifier (or “code-figure”) assigned to the term within that code-table.
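To make the path structure concrete, here is a minimal Python sketch that splits a WMO Codes Registry term URI into the four segments described above and rebuilds it. The names `CodeTerm`, `parse_term_uri` and `table_uri` are hypothetical, invented for illustration. The sketch also assumes that a code-table is identified by the parent path (the term URI without its code-figure); that is our assumption, not something stated here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Base of the WMO Codes Registry, as given in the text.
REGISTRY_BASE = "http://codes.wmo.int"


@dataclass(frozen=True)
class CodeTerm:
    """Hypothetical model of a term URI: /<format>/<table kind>/<table>/<code-figure>."""
    data_format: str   # e.g. "bufr4": WMO FM-94 BUFR edition 4
    table_kind: str    # e.g. "codeflag": the code- and flag-tables
    table_id: str      # e.g. "0-20-086": the "Runway deposits" code-table
    code_figure: str   # e.g. "1": local identifier of the term within the table

    def uri(self) -> str:
        """Rebuild the full HTTP URI that identifies the term."""
        return f"{self.table_uri()}/{self.code_figure}"

    def table_uri(self) -> str:
        """ASSUMPTION: the code-table is identified by the parent path."""
        return f"{REGISTRY_BASE}/{self.data_format}/{self.table_kind}/{self.table_id}"


def parse_term_uri(uri: str) -> CodeTerm:
    """Split a registry term URI into its four path segments."""
    parsed = urlparse(uri)
    if f"{parsed.scheme}://{parsed.netloc}" != REGISTRY_BASE:
        raise ValueError(f"not a WMO Codes Registry URI: {uri}")
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) != 4:
        raise ValueError(f"expected 4 path segments, got {len(segments)}: {uri}")
    return CodeTerm(*segments)


if __name__ == "__main__":
    term = parse_term_uri("http://codes.wmo.int/bufr4/codeflag/0-20-086/1")
    print(term.table_id, term.code_figure)  # 0-20-086 1
```

Two data products that both cite `.../0-20-086/1` refer to the same term whether their labels read “Damp”, “DAMP” or “влажный”, so comparing identifiers rather than labels avoids the reconciliation problem described above.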
However, whilst a valid HTTP URI doesn’t have to resolve to anything (HTTP 404 implies only that the server cannot find the requested resource, not that the resource does not exist!), it is plain good practice to publish an information resource describing the identified thing at each URL.
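As an illustration of resolving such an identifier, here is a minimal Python sketch using only the standard library’s `urllib`. It asks for a machine-readable representation through the HTTP `Accept` header and treats a 404 as “no description found” rather than “no such term”. The function names are hypothetical, and requesting `text/turtle` assumes the registry offers an RDF serialisation via content negotiation; that is an assumption, so check the WMO Codes Registry user guide for the formats it actually supports.

```python
import urllib.error
import urllib.request
from typing import Optional

# ASSUMPTION: the registry serves an RDF serialisation (here Turtle)
# when asked for it through HTTP content negotiation.
DEFAULT_ACCEPT = "text/turtle"


def build_request(uri: str, accept: str = DEFAULT_ACCEPT) -> urllib.request.Request:
    """Build a GET request for a term URI, asking for a machine-readable representation."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


def dereference(uri: str, accept: str = DEFAULT_ACCEPT, timeout: float = 10.0) -> Optional[str]:
    """Resolve an HTTP URI to a description of the resource it identifies.

    Returns None on HTTP 404: the server could not find a description,
    which does not mean the identified resource (e.g. the term "Damp")
    does not exist. Other HTTP errors are re-raised.
    """
    try:
        with urllib.request.urlopen(build_request(uri, accept), timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Returning `None` on a 404, rather than raising, keeps the distinction above visible to callers: the identifier remains valid even when nothing is published at it.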
Whilst the WMO experts were considering their options for publishing code-tables and terms in a web-accessible form, similar requirements were emerging within the UK Government, in the UK Government Linked Data working group (UKGovLD): a cross-government forum set up following publication of the Government’s Open Data white paper to “lead the creation and maintenance of the underpinning technologies for Linked Data and promote the benefits across the public sector”. Recognising their shared goals, DEFRA (as the Department responsible for UK Location) and the Met Office commissioned a proof-of-concept system from Epimorphics via the G-Cloud framework. Following completion of the proof of concept and a subsequent phase of funded development, Epimorphics has delivered an open source “Linked Data Registry” for managing controlled vocabularies (a.k.a. “Registers”) and the definitions of the terms they contain. The software is based on Linked Data principles and RDF, and provides both a web application and a RESTful API. Details can be found on the Linked Data Registry project wiki on GitHub. Additional funding is anticipated to drive further enhancement of the software.
On behalf of WMO, the Met Office has used the Linked Data Registry software as the basis for publishing the WMO code-tables and terms in a web-accessible form. The “WMO Codes Registry” is deployed at http://codes.wmo.int. Coverage of terms from the “Manual on codes” is currently sparse, as the initial objective was to publish the aviation-related terms and code-tables required to support IWXXM. However, WMO is committed to expanding the coverage and adding multilingual content. In particular, WMO intends to add authoritative definitions from WMO No. 182 “International meteorological vocabulary”.
WMO’s primary goal was to publish terms in support of a new data standard. The result has been more profound: the crown jewels of WMO, the code-tables and terms that provide a shared language for talking about weather, water and climate, have been published (at least in part) as Linked Data, with a commitment from WMO to maintain these resources long-term. As a result, these terms, and the unambiguous semantics they carry, are publicly visible and can be used by anyone publishing weather-related data. Consistent use of these terms will make it easier to reconcile or merge data products from disparate publishers and, in turn, to derive more value from aggregating weather-related open data.
For more information about the WMO Codes Registry, please see the following presentations: “The WMO Codes Registry - web based publication of the Manual on Codes”, Tandy J, ECMWF’s 14th workshop on meteorological operational systems (PDF); and “Overview of the WMO Codes Registry”, Tandy J, TT-AvXML meeting 3 (PowerPoint). A user guide for the WMO Codes Registry is also available. The Linked Data Registry project is hosted on GitHub; documentation is available on the project wiki.
DEFRA are in the process of establishing a Registry instance at http://environment.data.gov.uk.
Jeremy Tandy, Technology Fellow at the Met Office and Chair of WMO IPET-MDRD.