Guest Post: A Developers' Guide to the Linked Data APIs - Jeni Tennison

Linked data offers some great advantages for publishing government data. The approach makes it easy to publish information in a way that allows it to be combined with other sets of data, without an up-front agreement about exactly what information should be published. The benefits arise from the emphasis on unambiguous, common identifiers for things, from the inherent extensibility of the RDF data model, and from the publication of data in a standard format. Linked data is a great way of publishing information for diverse and distributed organisations, such as government.

However, the RDF model, its various serialisations and the SPARQL query language are foreign to the majority of developers. Those developers understandably want to be able to use the tool chains that they are familiar with to access government data. Publishing data purely as RDF, and providing access purely through SPARQL queries, raises an unacceptable barrier to the use of that data.

We have therefore been working on a way of retaining the advantages that the linked data approach gives us, while providing a much more familiar API to that data. Here, we'll talk about what these APIs look like, how you can use them, and a little about how they work behind the scenes. We're particularly keen to hear feedback to help us improve their usability.

The example that we discuss here is a particular configuration of a linked data API that operates over the SPARQL endpoint provided for the Edubase dataset. Some of the features that we'll look at are generic features that will be common for any configuration; others are specific to the way that this particular education API has been configured. Going forward, we expect the majority of the data that we publish as linked data to be accessible through this type of API.

Simple XML and JSON Formats

Let's first look at a couple of pages to give you a flavour. Turn your browser to:

`http://services.data.gov.uk/education/api/school`

This page lists ten schools from the set of data that we have about schools. By default the result comes back as XML. The most important part (the actual list of schools) looks like:

    <items>
      <item href="http://education.data.gov.uk/id/school/100866">
        <label>Herne Hill School</label>
        <typeOfEstablishment href="http://education.data.gov.uk/def/school/TypeOfEstablishment_TERM_Other_Independent_School">
          <label>Other Independent School</label>
        </typeOfEstablishment>
        <gender href="http://education.data.gov.uk/def/school/Gender_Mixed">
          <label>Mixed</label>
        </gender>
        <establishmentNumber datatype="integer">6375</establishmentNumber>
        <uniqueReferenceNumber datatype="integer">100866</uniqueReferenceNumber>
      </item>
      ... other items ...
    </items>

This provides basic information about each school. The unique identifiers for the schools themselves, and for some of the concepts used to describe each school (such as the facts that it is an independent school and takes both boys and girls), are given in `href` attributes. To keep things simple, the properties of the school are provided using elements without namespaces. Where a value has a datatype, the name of that datatype is provided in a `datatype` attribute.
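As a rough illustration of how a client might consume this XML, the Python sketch below parses the fragment above with the standard library's `xml.etree.ElementTree`. It keeps each `href` as a URI and converts values whose `datatype` is `integer`; the helper names (`element_value`, `parse_items`) are made up for this example, and only the datatype seen in this post is handled.

```python
import xml.etree.ElementTree as ET

# The <items> fragment shown above, with the "... other items ..." placeholder
# removed so that it is well-formed XML.
SAMPLE = """
<items>
  <item href="http://education.data.gov.uk/id/school/100866">
    <label>Herne Hill School</label>
    <typeOfEstablishment href="http://education.data.gov.uk/def/school/TypeOfEstablishment_TERM_Other_Independent_School">
      <label>Other Independent School</label>
    </typeOfEstablishment>
    <gender href="http://education.data.gov.uk/def/school/Gender_Mixed">
      <label>Mixed</label>
    </gender>
    <establishmentNumber datatype="integer">6375</establishmentNumber>
    <uniqueReferenceNumber datatype="integer">100866</uniqueReferenceNumber>
  </item>
</items>
"""

# Only the datatype that appears in this post; any other value stays a string.
CONVERTERS = {"integer": int}

def element_value(el):
    """Turn one property element into a plain Python value."""
    if el.get("href") is not None:
        # A resource with a URI: keep the URI and its label, if any.
        return {"uri": el.get("href"), "label": el.findtext("label")}
    convert = CONVERTERS.get(el.get("datatype"), str)
    return convert((el.text or "").strip())

def parse_items(xml_text):
    """Return one dict per <item>, keyed by property element name."""
    root = ET.fromstring(xml_text.strip())
    schools = []
    for item in root.findall("item"):
        record = {"uri": item.get("href")}
        for prop in item:
            record[prop.tag] = element_value(prop)
        schools.append(record)
    return schools

schools = parse_items(SAMPLE)
print(schools[0]["label"])                # Herne Hill School
print(schools[0]["gender"]["label"])      # Mixed
print(schools[0]["establishmentNumber"])  # 6375
```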

The same information can also be retrieved as JSON, at:

`http://services.data.gov.uk/education/api/school.json`

In the JSON, each school looks like:

    {
      "_about": "http://education.data.gov.uk/id/school/100866",
      "label": "Herne Hill School",
      "typeOfEstablishment": {
        "_about": "http://education.data.gov.uk/def/school/TypeOfEstablishment_TERM_Other_Independent_School",
        "label":"Other Independent School"
      },
      "gender": {
        "_about": "http://education.data.gov.uk/def/school/Gender_Mixed",
        "label": "Mixed"
      },
      "establishmentNumber": 6375,
      "uniqueReferenceNumber": 100866
    }

Again, the URIs for the resources that have them are present, this time in the `_about` property, but otherwise the property names are simple strings which enables you to load the JSON and access the information it contains using standard dot-notation. Numbers are represented as numbers, booleans as booleans and other values as strings.
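In Python, `json.loads` returns dictionaries, so the nearest equivalent to dot-notation is to turn each JSON object into a `types.SimpleNamespace`. This is a minimal sketch using the school record above; it assumes the property names are valid Python identifiers, as they are here.

```python
import json
from types import SimpleNamespace

# One school as it appears in the JSON output shown above.
SAMPLE = """
{
  "_about": "http://education.data.gov.uk/id/school/100866",
  "label": "Herne Hill School",
  "typeOfEstablishment": {
    "_about": "http://education.data.gov.uk/def/school/TypeOfEstablishment_TERM_Other_Independent_School",
    "label": "Other Independent School"
  },
  "gender": {
    "_about": "http://education.data.gov.uk/def/school/Gender_Mixed",
    "label": "Mixed"
  },
  "establishmentNumber": 6375,
  "uniqueReferenceNumber": 100866
}
"""

# object_hook turns every JSON object (including nested ones) into a
# SimpleNamespace, so properties can be read with attribute access.
school = json.loads(SAMPLE, object_hook=lambda d: SimpleNamespace(**d))

print(school.label)                   # Herne Hill School
print(school.gender.label)            # Mixed
print(school.establishmentNumber + 1) # 6376, because numbers arrive as ints
```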

Behind the scenes, these simple XML and JSON formats are generated from an RDF graph. This graph can also be viewed in two RDF serialisations: RDF/XML and Turtle. The different formats in which the page is available are listed within the page itself; for example, in the XML version:

    <format>
      <item href="http://services.data.gov.uk/education/api/school.rdf?_page=1">
        <isFormatOf href="http://services.data.gov.uk/education/api/school?_page=1" />
        <format id="_:format_rdf"><label>application/rdf+xml</label></format>
        <label>rdf</label>
      </item>
      <item href="http://services.data.gov.uk/education/api/school.ttl?_page=1">
        <isFormatOf href="http://services.data.gov.uk/education/api/school?_page=1" />
        <format id="_:format_ttl"><label>text/turtle</label></format>
        <label>ttl</label>
      </item>
      <item href="http://services.data.gov.uk/education/api/school.json?_page=1">
        <isFormatOf href="http://services.data.gov.uk/education/api/school?_page=1" />
        <format id="_:format_json"><label>application/json</label></format>
        <label>json</label>
      </item>
      <item href="http://services.data.gov.uk/education/api/school.xml?_page=1">
        <isFormatOf href="http://services.data.gov.uk/education/api/school?_page=1" />
        <format id="_:format_xml"><label>application/xml</label></format>
        <label>xml</label>
      </item>
    </format>

Clients can also use content negotiation to choose the format in which they retrieve the data. APIs can be configured to provide other formats such as HTML or Atom.
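The two ways of choosing a format might look like this in Python. The suffix-to-media-type table is copied from the `<format>` listing above; `by_suffix` and `by_negotiation` are hypothetical helper names, and the request is built but not sent, so the example runs offline.

```python
import urllib.request

BASE = "http://services.data.gov.uk/education/api/school"

# Suffixes and media types as listed in the <format> element above.
MEDIA_TYPES = {
    "rdf": "application/rdf+xml",
    "ttl": "text/turtle",
    "json": "application/json",
    "xml": "application/xml",
}

def by_suffix(base, fmt):
    """Choose the format through the URI itself, e.g. .../school.json."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unknown format: {fmt}")
    return f"{base}.{fmt}"

def by_negotiation(base, fmt):
    """Keep the URI unchanged and ask for the format in the Accept header."""
    return urllib.request.Request(base, headers={"Accept": MEDIA_TYPES[fmt]})

req = by_negotiation(BASE, "ttl")
print(by_suffix(BASE, "json"))   # http://services.data.gov.uk/education/api/school.json
print(req.get_header("Accept"))  # text/turtle
# urllib.request.urlopen(req) would send the request; it is left out here.
```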

Navigation

Whatever the format, the top level of the result document provides several pointers for navigating through the list of results. In XML, the top-level `<result>` element looks as follows:

    <result format="linked-data-api" version="0.2"
      href="http://services.data.gov.uk/education/api/school?_page=1">
      <type href="http://purl.org/linked-data/api/vocab#Page" />
      <isPartOf href="http://services.data.gov.uk/education/api/school">
        <type href="http://purl.org/linked-data/api/vocab#List" />
        <hasPart href="http://services.data.gov.uk/education/api/school?_page=1" />
      </isPartOf>
      <first href="http://services.data.gov.uk/education/api/school?_page=1" />
      <next href="http://services.data.gov.uk/education/api/school?_page=2" />
      <itemsPerPage datatype="integer">10</itemsPerPage>
      <startIndex datatype="integer">0</startIndex>
      <items>...</items>
      <definition href="http://services.data.gov.uk/education/api#schools" />
      <version>...</version>
      <format>...</format>
    </result>

The particular set of ten schools shown within this XML is the first page of a much longer list. The `<next>` element provides a pointer to the next page, while the `<first>` element points to the first page. On pages after the first page, the `<prev>` element points to the previous page. There are two URI parameters that support paging through the list:

  • `_page` provides the page number
  • `_pageSize` indicates how many items should be listed on each page (the default is configuration dependent)
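A client that wants the whole list can keep following `<next>` links until a page has none. The sketch below assumes that a page without a `<next>` element is the last one; `fetch` is a stand-in for whatever HTTP call you use, and the two tiny pages are made-up stubs so the loop can run without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "http://services.data.gov.uk/education/api/school"

def page_url(base, page, page_size=None):
    """Build a page URL from the _page and (optional) _pageSize parameters."""
    params = {"_page": page}
    if page_size is not None:
        params["_pageSize"] = page_size
    return f"{base}?{urlencode(params)}"

def iter_pages(first_url, fetch):
    """Yield each page's <result> element, following <next> links.

    `fetch` maps a URL to the XML text of that page. The `seen` set stops
    the loop if a <next> link ever points back to a page already visited.
    """
    url, seen = first_url, set()
    while url and url not in seen:
        seen.add(url)
        result = ET.fromstring(fetch(url))
        yield result
        nxt = result.find("next")
        url = nxt.get("href") if nxt is not None else None

# Two made-up stub pages standing in for real responses.
PAGE1, PAGE2 = page_url(BASE, 1), page_url(BASE, 2)
FAKE_PAGES = {
    PAGE1: f'<result><items><item href="s1"/></items><next href="{PAGE2}"/></result>',
    PAGE2: '<result><items><item href="s2"/></items></result>',
}

hrefs = [item.get("href")
         for page in iter_pages(PAGE1, lambda url: FAKE_PAGES[url])
         for item in page.find("items")]
print(hrefs)  # ['s1', 's2']
```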

The `<definition>` element shown above links to some metadata about the API. This aspect of the API has yet to be fleshed out in detail, but we expect it to include:

  • some help about using the particular endpoint, such as a list of parameters that can be used to filter the results
  • the SPARQL queries that were used to generate the particular list, so that they can be used directly and as a learning aid
  • debugging information to help the developers of a particular API configuration

Currently, if you follow the link to the definition, you will see an HTML page that describes the configuration of this particular API. Each API can be configured to support a number of endpoints, whose URIs are described using patterns such as:

`/education/api/school/constituency-name/{constituency}`

The curly braces within a URI pattern indicate a variable that is used to construct the list. For example, accessing:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham`

will provide a list of schools within the constituency of Horsham. Each of these endpoints supports the same set of features (the ability to page through, sort and filter the list, and so on), but they may have different defaults.
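Filling in a URI pattern is a matter of substituting percent-encoded values for the names in curly braces. This is a small sketch with a hypothetical `expand` helper; how the server expects spaces or other special characters in a constituency name to be encoded is an assumption here, not something this post specifies.

```python
import re
from urllib.parse import quote

TEMPLATE = "/education/api/school/constituency-name/{constituency}"

def expand(template, **values):
    """Replace each {name} in a URI pattern with a percent-encoded value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for {{{name}}}")
        # safe="" so that a "/" inside a value cannot add a path segment.
        return quote(str(values[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)

print("http://services.data.gov.uk" + expand(TEMPLATE, constituency="Horsham"))
# http://services.data.gov.uk/education/api/school/constituency-name/Horsham
print(expand(TEMPLATE, constituency="Mid Sussex"))
# /education/api/school/constituency-name/Mid%20Sussex
```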

Views

Each API endpoint has a number of views defined, which contain different information about each school. For most of the education API endpoints, these views are:

  • short - shows very basic information
  • medium - shows a few more fundamental details about each school, such as its address
  • provision - describes the kind and number of children that they take
  • location - describes where the school is
  • performance - gives information related to their performance
  • admin - gives administrative information
  • all - gives you everything that's known about each school

The different views that are available are listed within the `<version>` element in the XML result:

    <version>...
      <item href="http://services.data.gov.uk/education/api/school?_view=short">
        <isVersionOf href="http://services.data.gov.uk/education/api/school?_page=1" />
        <label>short</label>
      </item>
      <item href="http://services.data.gov.uk/education/api/school?_view=medium">
        <isVersionOf href="http://services.data.gov.uk/education/api/school?_page=1" />
        <label>medium</label>
      </item>
      ... other versions ...
    </version>

As you can see from the URIs, you can choose a different view using the `_view` parameter. For example:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_view=provision`

will show information about the numbers of children taught at the schools within Horsham.
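Because the available views are listed in the result itself, a client can discover them rather than hard-coding them. A minimal sketch, parsing the `<version>` fragment above (with its `...` placeholders removed) into a mapping from view label to URL; `available_views` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The <version> fragment shown above, without its "..." placeholders.
VERSION_XML = """
<version>
  <item href="http://services.data.gov.uk/education/api/school?_view=short">
    <isVersionOf href="http://services.data.gov.uk/education/api/school?_page=1" />
    <label>short</label>
  </item>
  <item href="http://services.data.gov.uk/education/api/school?_view=medium">
    <isVersionOf href="http://services.data.gov.uk/education/api/school?_page=1" />
    <label>medium</label>
  </item>
</version>
"""

def available_views(version_xml):
    """Map each view's label to the URL that selects that view."""
    root = ET.fromstring(version_xml.strip())
    return {item.findtext("label"): item.get("href")
            for item in root.findall("item")}

views = available_views(VERSION_XML)
print(sorted(views))  # ['medium', 'short']
```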

You can also supplement your view with extra properties using the `_properties` parameter, with a list of the properties (elements in XML) you want to see separated by commas. For example, to see just the basic information supplemented by the latitude and longitude of each school, you can use:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_view=short&_properties=lat,long`
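Building such URLs in code mostly means joining the property names with commas. A sketch using `urllib.parse.urlencode`, where passing `safe=','` keeps the commas literal rather than encoding them as `%2C`; the helper name `view_url` is hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://services.data.gov.uk/education/api/school/constituency-name/Horsham"

def view_url(endpoint, view=None, extra_properties=()):
    """Add _view and a comma-separated _properties list to an endpoint URL."""
    params = {}
    if view:
        params["_view"] = view
    if extra_properties:
        params["_properties"] = ",".join(extra_properties)
    if not params:
        return endpoint
    # safe="," leaves the commas between property names unencoded.
    return f"{endpoint}?{urlencode(params, safe=',')}"

print(view_url(ENDPOINT, "short", ["lat", "long"]))
# http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_view=short&_properties=lat,long
```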

Sorting

The results can be sorted by any combination of properties using the `_sort` parameter. Simply name the property (or element if you're looking at XML) that you want to sort on. For example, to sort the schools in Horsham alphabetically by name, use:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_sort=label&_view=short`

You can also sort in reverse order by prefixing the property with a hyphen:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_sort=-label&_view=short`

Be aware that sorted results are much more time-consuming to generate than unsorted ones.
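The `_sort` value can be assembled from (property, direction) pairs. The examples above only sort on a single property; joining several with commas is our reading of the Linked Data API specification rather than something shown here, so check it against the implementation you are using. `sort_param` is a hypothetical name.

```python
from urllib.parse import urlencode

ENDPOINT = "http://services.data.gov.uk/education/api/school/constituency-name/Horsham"

def sort_param(*keys):
    """Build a _sort value from (property, descending) pairs.

    A descending key gets a leading hyphen. Joining several keys with
    commas is an assumption; single keys are what the post demonstrates.
    """
    return ",".join(("-" if descending else "") + prop for prop, descending in keys)

params = {"_sort": sort_param(("label", True)), "_view": "short"}
url = f"{ENDPOINT}?{urlencode(params, safe=',')}"
print(url)
# http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_sort=-label&_view=short
```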

Filtering

The results for a given endpoint can be further filtered on any property, simply by using the names of the properties (in JSON) or elements (in XML) as URI parameters. For example, to list all schools in Horsham that have nursery provision:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_view=provision&nurseryProvision=true`

Filtering can reach into nested objects or elements using dot-notation paths. For example, you can restrict the list to girls' schools in Horsham with:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham?gender.label=Girls`
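To make the dot-notation paths concrete, the sketch below applies equality filters locally, in Python, to records shaped like the JSON output. It only illustrates what a path such as `gender.label` refers to; the API itself does the filtering on the server, and the two records and helper names are invented for the example.

```python
def get_path(item, path):
    """Follow a dot-notation path such as "gender.label" through nested dicts."""
    value = item
    for step in path.split("."):
        if not isinstance(value, dict) or step not in value:
            return None  # the path does not exist on this item
        value = value[step]
    return value

def as_param(value):
    """Render a JSON value the way it is written in a URI parameter."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

def matches(item, filters):
    """True if every dotted path in `filters` exists and has the given value."""
    for path, expected in filters.items():
        value = get_path(item, path)
        if value is None or as_param(value) != expected:
            return False
    return True

# Two invented records shaped like the JSON output above.
schools = [
    {"label": "Example Girls School", "gender": {"label": "Girls"}, "nurseryProvision": False},
    {"label": "Example Primary School", "gender": {"label": "Mixed"}, "nurseryProvision": True},
]

girls = [s["label"] for s in schools if matches(s, {"gender.label": "Girls"})]
nursery = [s["label"] for s in schools if matches(s, {"nurseryProvision": "true"})]
print(girls)    # ['Example Girls School']
print(nursery)  # ['Example Primary School']
```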

Filtering can also set ranges on numbers using `min-{property}` and `max-{property}`. For example, to see schools in Horsham that take seven-year-olds you could use:

`http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_view=provision&max-statutoryLowAge=7&min-statutoryHighAge=7`
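The reasoning behind the two range parameters is that a school takes seven-year-olds if its lowest statutory age is at most 7 and its highest statutory age is at least 7. The sketch below encodes that, assuming the bounds are inclusive, and writes booleans as `true`/`false` as in the nursery example; `filter_url` and `age_filter` are hypothetical names.

```python
from urllib.parse import urlencode

ENDPOINT = "http://services.data.gov.uk/education/api/school/constituency-name/Horsham"

def age_filter(age):
    """Range parameters for schools whose statutory age range includes `age`.

    Assumes inclusive bounds: statutoryLowAge <= age <= statutoryHighAge.
    """
    return {"max-statutoryLowAge": age, "min-statutoryHighAge": age}

def filter_url(endpoint, params):
    """Append filter parameters to an endpoint, writing booleans as true/false."""
    encoded = {k: (str(v).lower() if isinstance(v, bool) else v)
               for k, v in params.items()}
    return f"{endpoint}?{urlencode(encoded)}"

print(filter_url(ENDPOINT, {"_view": "provision", **age_filter(7)}))
# http://services.data.gov.uk/education/api/school/constituency-name/Horsham?_view=provision&max-statutoryLowAge=7&min-statutoryHighAge=7
```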

Behind the Scenes

As we've described here, the education API is a particular configuration of some generic middleware that can operate over any SPARQL endpoint. The configuration defines a set of URI patterns (API endpoints), each of which maps onto the queries that are used to construct the list, which is then formatted as required. For each request, the middleware:

  • selects a page's worth of resources that should be viewed
  • views a set of properties of those resources to construct an RDF graph
  • formats the RDF graph in the required serialisation
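To give a feel for the first two steps, here is a simplified Python sketch that generates SPARQL for them: a `SELECT` that picks one page of resources matching dotted-path filters, and a `CONSTRUCT` that fetches the viewed properties of those resources. It is not the actual middleware's code; in particular, the `http://example.org/def/` namespace is a placeholder for the real property URIs, which an API configuration would map from the short names.

```python
import json

# Placeholder namespace; a real configuration maps each short name to its URI.
EX = "http://example.org/def/"

def select_query(filters, page=1, page_size=10):
    """Step 1: SPARQL SELECT for one page's worth of matching resources.

    Each filter maps a dotted path to a plain string value; every step of
    the path becomes one triple pattern, linked by fresh variables. Without
    ORDER BY, page boundaries rely on the store returning a stable order.
    """
    patterns = []
    for n, (path, value) in enumerate(filters.items()):
        subject = "?item"
        steps = path.split(".")
        for i, step in enumerate(steps):
            # json.dumps yields a quoted, escaped literal for simple strings.
            obj = json.dumps(value) if i == len(steps) - 1 else f"?v{n}_{i}"
            patterns.append(f"{subject} <{EX}{step}> {obj} .")
            subject = obj
    body = "\n  ".join(patterns)
    offset = (page - 1) * page_size
    return f"SELECT ?item WHERE {{\n  {body}\n}}\nLIMIT {page_size} OFFSET {offset}"

def view_query(items, properties):
    """Step 2: SPARQL CONSTRUCT for the viewed properties of the selected items.

    Properties must be simple names here, because they double as variables.
    """
    values = " ".join(f"<{uri}>" for uri in items)
    template = " ".join(f"?item <{EX}{p}> ?{p} ." for p in properties)
    optionals = "\n  ".join(f"OPTIONAL {{ ?item <{EX}{p}> ?{p} }}" for p in properties)
    return (f"CONSTRUCT {{ {template} }} WHERE {{\n"
            f"  VALUES ?item {{ {values} }}\n  {optionals}\n}}")

q1 = select_query({"gender.label": "Girls", "establishmentStatus.label": "Open"}, page=2)
q2 = view_query(["http://education.data.gov.uk/id/school/100866"], ["lat", "long"])
print(q1)
print(q2)
# Step 3, serialising the resulting graph as XML, JSON, Turtle and so on, is not shown.
```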

The API configuration itself is done using RDF and specifies both the API endpoints themselves and the way in which the RDF properties that it exposes are mapped onto JSON properties or XML elements. Most of the time, the configuration uses the same kind of dot-notation property path syntax as is used in the URI parameters. For example, the endpoint based on constituency name is specified using:

    # @prefix declarations for api: and spec: are omitted from this excerpt.
    # Endpoint: a URI template, a selector (which items) and a default viewer.
    spec:schoolsByConstituencyName
      a api:ListEndpoint ;
      api:uriTemplate "/education/api/school/constituency-name/{constituency}" ;
      api:selector [
        api:parent spec:schoolsSelector ;
        api:filter "parliamentaryConstituency.label={constituency}"
      ] ;
      api:defaultViewer spec:viewerLocation .
    
    # Shared selector, inherited through api:parent above.
    spec:schoolsSelector
      a api:Selector ;
      api:filter "type=School&establishmentStatus.label=Open" .
    
    # Viewer listing the properties shown in the "location" view.
    spec:viewerLocation
      a api:Viewer ;
      api:name "location" ;
      api:properties "label,uniqueReferenceNumber,establishmentNumber,typeOfEstablishment.label,phaseOfEducation.label,gender.label,religiousCharacter.label,address.address1,address.address2,address.address3,address.town,address.region,address.postcode,lat,long,easting,northing,censusAreaStatisticWard.label,districtAdministrative.label,localAuthority.label,LSOA.label,MSOA.label,LLSC.label,parliamentaryConstituency.label,administrativeWard.label,urbanRural.label,hasGOR.label" .

We used this approach to make it easy to experiment with URI parameters and then simply drop the parameters into a configuration file. However, the specification language also supports generating lists of resources, views over those resources, and even entire RDF graphs using SPARQL directly, for added power and flexibility where that's needed.
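Because the filters are written in URI-parameter syntax, they can be handled with ordinary query-string tools. This is a hedged sketch of how a child selector's `api:filter` might be combined with that of its `api:parent`, filling in the `{constituency}` variable taken from the request URI. Treating the result as a conjunction of all filters, with a child overriding its parent on the same key, is an assumption, and `combined_filters` is a hypothetical name.

```python
from urllib.parse import parse_qsl

# The api:filter strings from spec:schoolsSelector and spec:schoolsByConstituencyName.
PARENT_FILTER = "type=School&establishmentStatus.label=Open"
CHILD_FILTER = "parliamentaryConstituency.label={constituency}"

def combined_filters(parent, child, **variables):
    """Merge a parent selector's filter with a child's, filling in {variables}.

    Values are substituted after parsing, so an '&' or '=' inside a variable
    cannot change the structure of the filter.
    """
    filters = dict(parse_qsl(parent))
    for name, value in parse_qsl(child):
        # Assumption: a child filter on the same key replaces the parent's.
        filters[name] = value.format(**variables)
    return filters

filters = combined_filters(PARENT_FILTER, CHILD_FILTER, constituency="Horsham")
print(filters)
# {'type': 'School', 'establishmentStatus.label': 'Open', 'parliamentaryConstituency.label': 'Horsham'}
```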

Further Information

The specification for the configuration of the linked data API is available as a set of wiki pages. There are a few implementations in the works, which you can use to create your own APIs over any SPARQL endpoint; the most mature of these is the one behind the education API: an open-source PHP implementation, developed by Keith Alexander and others at Talis and called Puelia.

If you have any comments on the linked data API or want to know more, there is a linked data API Google Group that can be used for discussion.

Jeni is an independent consultant specialising in XML, XSLT, schemas and the semantic web. You can find more about her work at jenitennison.com

Comments

Local authority

Do you realise the establishment number is unique within a local authority?
The DCSF number is the LA number combined with the establishment number.

I agree

I agree, Linked Data is the way to go when it comes to connecting diverse, distributed organisations with government data.

Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

If the data sets being shared with the developer community are consistent in one format, open and transparent, then they have the potential to lead to many different applications that connect the general public with the government.

-Steve

Where do we find these URLs?

It is nice to see a set of URLs that will effectively build and execute a SPARQL query, but where do we find the details of them? For example, what if I want to use the Town instead of the Constituency name? Is there a URL for that? If so, where is it listed?

I am amazed to see the amount of data that is available with things like this, but I am also amazed to see the lack of documentation for people building services with it.

Search by school name or postcode

I'm finding this very unhelpful. I'm looking for a solution to aid accurate school naming and addressing. Anyone have any advice for searching the data in that way? Cheers, NEIL

Horses for courses

I think it is very clearly written, and we should thank the author for providing it. The caveat is that it assumes familiarity with the ideas behind APIs: beyond web developers, few of us have those skills, even those of us with relatively technical backgrounds.

I'm a whizz with Excel or Access or Mapinfo, but very unclear (along with 99.9% of people I think) about how to easily move data from these quite technical API formats into the more common desktop products that most people will want. A guide to those steps would be really helpful. I'd probably even pay for a course, but I've yet to see one.

URL - 404 error

I was interested in accessing this data for analytics - however the links in this post are giving me 404 errors.  The SPARQL link has changed to http://education.data.gov.uk/sparql/education/query.html.  I was wondering if you could update the blog post with the new API URLs?

Many thanks
