# Creating URIs #

***Status:** Working Draft*

*This document forms part of the data.gov.uk guidance about publishing Linked Data within the UK public sector.*

On the web, we are used to URIs being used to identify web pages or downloadable content such as PDFs or CSV files. But URIs can also be used to name real-world things, such as ["St John's Primary School"](http://education.data.gov.uk/id/school/123065) or ["the Ministry of Justice"](http://reference.data.gov.uk/id/department/moj), or abstract things such as ["the Civil Service grade of Permanent Secretary"](http://reference.data.gov.uk/def/central-government/PermanentSecretary) or ["the service provided by a council for registering a baby's birth"](http://id.esd.org.uk/service/319).

Naming real-world and abstract things with URIs is extremely powerful for two main reasons:

* it doesn't require any up-front standardisation; as the owner of a URI, you do not have to negotiate with anyone else about what it means
* it helps people find more information, by plugging the URI into a browser; as the owner of a URI, you have control over what information you provide

It's important to have different URIs for real-world or abstract things and documents about those things. If someone requests the URI for "St John's Primary School", the web server cannot give them back the school itself because it is made of bricks and mortar, not bytes. All it can give back is a document about the school. We want to know about the school, of course, but we'll also want to know some things about the document too: when it was created, by whom, and how, so that we know how much to trust the information it contains.

## Reusing vs Inventing ##

Identifying things with URIs works best when everyone uses the same URI to mean the same thing, because that makes it very easy to tie data about that thing together. For example, we use the URI `http://reference.data.gov.uk/id/department/co` to name Cabinet Office in:

* spending data
* organogram data (both within Cabinet Office and across its NDPBs)
* energy usage data

and so on, which means re-users can tie the information across these data sets together very easily.

However, it's also acceptable for separate organisations to mint URIs for the same thing. These distinct URIs can be linked together as a separate step, after the fact. There is no need to wait for an 'official' URI to be minted for something.
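
As a minimal sketch of that later linking step, the snippet below writes one N-Triples statement saying that two URIs name the same thing. It assumes `owl:sameAs` is the linking property, which this guidance does not prescribe, and the second URI (`http://example.org/...`) is hypothetical.

```python
# Assumption: owl:sameAs is used to link the two URIs; the guidance itself
# only says that distinct URIs can be linked after the fact.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(uri_a: str, uri_b: str) -> str:
    """Return one N-Triples line asserting that two URIs name the same thing."""
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


reused_uri = "http://reference.data.gov.uk/id/department/co"
# Hypothetical URI minted independently by another organisation.
local_uri = "http://example.org/id/organisation/cabinet-office"

print(same_as_triple(local_uri, reused_uri))
```

Publishing such links separately from the data keeps each organisation free to mint URIs without waiting for anyone else.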

We provide a number of URI sets within the data.gov.uk domain that are intended for reuse across the public sector. **A URI set is a collection of URIs for particular kinds of real-world (or abstract) things that follow the same pattern and are usually based on some unique identifier for those things.** These include:

* Government Departments
* Local Authorities
* Schools
* Railway Stations

As a general principle, you should invent URIs for real-world and abstract things if:

* you already name them (for example, assign them codes that you use internally)
* you need a URI for them and it doesn't already exist (for example, you have data that refers to them)
* you're not quite sure whether you mean the same as someone else when you use a term (for example, 'Kent' might be viewed differently as an administrative area or a transport area)

## URI Design ##

URIs for things should be designed to:

* uniquely name the thing that they name
* be in use for a long time
* be short and human readable
* incorporate existing identifiers where available

You should have a URI minting policy within your organisation that works with your back-end systems, but do not worry if not all the URIs that you use work immediately.

### URIs for Real-World Things ###

The following diagram shows four kinds of resources that are related to real-world things: URIs for the real-world things themselves, URIs for the documents about those real-world things, URIs for sets of real-world things of the same type, and URIs for documents about those sets.

More detail about each of these kinds of URIs, and the way in which they should be structured, is provided in the following table.

| Kind of URI | Suggested Pattern | Example | Notes |
| --- | --- | --- | --- |
| **Real-World Thing.** URIs for things that exist in the real world, such as schools, railway stations or hospitals. | `/id/{type}/{id}` | `http://education.data.gov.uk/id/school/520965` | A request to one of these URIs should result in an HTTP 303 See Other redirection to a document that describes the thing. The identifier should not be tied to a particular back end (such as the row number of something within a database); it should be something relevant in many datasets, such as a school's unique reference number or a station's TIPLOC code. Note that the case convention for the type within the URI is lower-case-hyphenated. |
| | `/id/{type}/{id}/{child-type}/{child-id}` | `http://reference.data.gov.uk/id/department/bis/unit/finance-commercial-group` | These URIs should be used when the identifier for a real-world thing is only unique within a particular parent thing. Examples are units within organisations or junctions on motorways. They should be handled in the same way as above. |
| | `/data/{dataset}/{date}#{type}{number}` | `http://data.education.gov.uk/data/edubase/2010-09-01#school37` | Do not use URIs of this form unless there is no good reusable identifier for something within a dataset (see URIs for Datasets, below). URIs of this form are closely coupled to datasets, and we use a fragment identifier (e.g. `#school37`) rather than a path to simplify publication in this case. URIs of this form might later be linked to other, dataset-independent, URIs through a separate matching step (such as tallying company names and postcodes). |
| **URI Set.** URI for a set of things that follow a particular URI pattern and are of a particular type, such as the set of schools. | `/id/{type}` | `http://education.data.gov.uk/id/school` | A request to one of these URIs should result in an HTTP 303 See Other redirection to a document that describes the URI set, for example including metadata about the type of things that use this URI pattern, and a list of a few examples. |
| | `/id/{type}/{id}/{child-type}` | `http://transport.data.gov.uk/id/road/M5/junction` | These URIs work in the same way as those above, but are for URI sets that have more than one level. |
| **Generic Documents.** URIs for documents about things or sets of things. | `/doc/{type}/{id}` | `http://education.data.gov.uk/doc/school/520965` | These are documents about real-world things. A request to a URI of this form should result in a document that contains current information about that thing. |
| | `/doc/{type}` | `http://education.data.gov.uk/doc/school` | These are documents about sets of things. A request to a URI of this form should return a document that describes that set and should usually give a list of some examples of that type. It may also be a starting point for an API that enables you to search for particular things of this type. |
| | `/doc/{type}/{id}/{child-type}` | `http://transport.data.gov.uk/doc/road/M5/junction` | These are documents about sets of things, in the same way as above. |
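
The relationship between identifier URIs and document URIs can be sketched as a small routing function. This is only an illustration, assuming that every `/id/...` path maps to the `/doc/...` path with the same remainder; the function names `doc_path_for` and `respond` are hypothetical, and host names and response bodies are left out.

```python
from typing import Dict, Optional, Tuple

ID_PREFIX = "/id/"
DOC_PREFIX = "/doc/"


def doc_path_for(id_path: str) -> Optional[str]:
    """Map an /id/... path to the matching /doc/... path.

    Returns None when the path is not an identifier path.
    """
    if not id_path.startswith(ID_PREFIX) or len(id_path) == len(ID_PREFIX):
        return None
    return DOC_PREFIX + id_path[len(ID_PREFIX):]


def respond(path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status code, headers) for a request path.

    Identifier URIs (things and URI sets) get 303 See Other pointing at the
    document about them; document URIs get 200 (body omitted in this sketch).
    """
    doc_path = doc_path_for(path)
    if doc_path is not None:
        return 303, {"Location": doc_path}
    if path.startswith(DOC_PREFIX):
        return 200, {}
    return 404, {}


print(respond("/id/school/520965"))
print(respond("/id/road/M5/junction"))
```

Because the thing and the document have different URIs, statements about the school and statements about the document describing it never get confused.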

### URIs for Datasets ###

Datasets are resources on the web that actually contain data, usually bringing together information about multiple real-world things.

| Kind of URI | Suggested Pattern | Example | Notes |
| --- | --- | --- | --- |
| **Datasets.** URIs for data about a selection of real-world things, or pure statistical data. | `/data/{dataset}/{version}` | `http://data.bis.gov.uk/data/organogram/2010-06-30` | A request to a URI of this form should return information about a dataset, and usually also the data that it contains. If the dataset is huge, it should be split into multiple subsets, with the document at the dataset URI containing pointers to those subsets using `void:subset`. |
| | `/data/{dataset}/{version}/{subset}` | `http://data.bis.gov.uk/data/organogram/2010-06-30/provenance` | A request to a URI of this form should return the data within a particular subset of a dataset, which may have a different source or have come through different processing. |
| | anything you like | | It really doesn't matter what URI you use for the data that you produce. The only real guideline is that it's a good idea to use a date within the URI so that you can produce different versions of the data over time; even then, it's possible to avoid doing so if you make use of archives to preserve historical information. |
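
The dated dataset pattern and the `void:subset` pointers can be sketched as follows. This is a sketch under assumptions: the helper names are hypothetical, the version is taken to be an ISO 8601 date as in the examples, and the output is plain N-Triples rather than any particular publishing format.

```python
from datetime import date
from typing import List

# Full URI of the void:subset property from the VoID vocabulary.
VOID_SUBSET = "http://rdfs.org/ns/void#subset"


def dataset_uri(base: str, dataset: str, version: date) -> str:
    """Build a /data/{dataset}/{version} URI, using the date as the version."""
    return f"{base.rstrip('/')}/data/{dataset}/{version.isoformat()}"


def subset_triples(dataset: str, subsets: List[str]) -> List[str]:
    """Return N-Triples lines pointing from a dataset URI to its subset URIs."""
    return [f"<{dataset}> <{VOID_SUBSET}> <{dataset}/{name}> ." for name in subsets]


organogram = dataset_uri("http://data.bis.gov.uk", "organogram", date(2010, 6, 30))
for line in subset_triples(organogram, ["provenance"]):
    print(line)
```

Keeping the date in the URI means a later release can be published at a new URI without disturbing links to the earlier one.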

### URIs for Concept Schemes and Code Lists ###

The following diagram shows how URIs for concept schemes and code lists, and the concepts that they contain, work. Concept schemes and code lists are collections of abstract things which are generally used as values in the descriptions of real-world things. For example, the gender 'male' is a concept which might be used when describing a statistic that is the male population in a particular area.

More detail about each of these kinds of URIs, and the way in which they should be structured, is provided in the following table.

| Kind of URI | Suggested Pattern | Example | Notes |
| --- | --- | --- | --- |
| **Concept.** URIs for concepts, which generally have a code associated with them. | `/def/{scheme}/{concept}` | `http://statistics.data.gov.uk/def/gender/M` | A request to one of these URIs should result in an HTTP 303 See Other redirection to a document that describes the concept, which will usually be the document for the concept scheme to which the concept belongs. |
| | `/def/{scheme}#{concept}` | `http://data.statistics.gov.uk/def/age-group#25-35` | Only use URIs in this format if you have a small set of concepts that you want to name. You may want to always use the form above, for consistency. |
| **Concept Scheme / Code List.** URIs for collections of concepts or abstract things that are assigned codes. | `/def/{scheme}` | `http://statistics.data.gov.uk/def/gender` | A request to one of these URIs should result in a human and/or machine-readable description of the concepts within this concept scheme. |

### URIs for Vocabularies ###

The following diagram shows how URIs for vocabularies fit together. Vocabularies are collections of classes and properties that are used to describe real-world and abstract things.

More detail about each of these kinds of URIs, and the way in which they should be structured, is provided in the following table.

| Kind of URI | Suggested Pattern | Example | Notes |
| --- | --- | --- | --- |
| **Classes.** URIs for classes of things, such as schools. Note that the convention is for the class name to use UpperCamelCase. | `/def/{vocabulary}/{class}` | `http://reference.data.gov.uk/def/central-government/PermanentSecretary` | A request to one of these URIs should result in an HTTP 303 See Other redirection to a document that describes the class, which is usually the vocabulary (see below). |
| | `/def/{vocabulary}#{class}` | `http://data.bis.gov.uk/def/grade#G5` | Only use URIs in this format if you have a small set of classes and properties that you want to name. Note that the part of the URI before the `#` is the URI for the vocabulary to which the class belongs. You may want to always use the form above, for consistency. |
| **Properties.** URIs for properties of things, such as the unique reference number of a school. Note that the convention is for the property name to use lowerCamelCase. | `/def/{vocabulary}/{property}` | `http://reference.data.gov.uk/def/central-government/devolvesTo` | A request to one of these URIs should result in an HTTP 303 See Other redirection to a document that describes the property, which is usually the vocabulary (see below). |
| | `/def/{vocabulary}#{property}` | `http://data.bis.gov.uk/def/grade#grade` | Only use URIs in this format if you have a small set of classes and properties that you want to name. Note that the part of the URI before the `#` is the URI for the vocabulary to which the property belongs. You may want to always use the form above, for consistency. |
| **Vocabulary.** URIs for collections of classes and properties. Note that the naming scheme for the vocabulary is lower-case-hyphenated. | `/def/{vocabulary}` | `http://reference.data.gov.uk/def/central-government` | A request to one of these URIs should result in a human and/or machine-readable description of the classes and properties within this vocabulary. |
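
The two term patterns above (path form and `#` form) both let a client recover the vocabulary URI from a class or property URI. The sketch below shows one way to do that split and to guess the kind of term from the naming conventions. The function names are hypothetical, and the case check is only a heuristic based on the stated conventions, not a rule that every vocabulary follows.

```python
import re
from typing import Tuple
from urllib.parse import urldefrag

UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # convention for classes
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")  # convention for properties


def split_term(uri: str) -> Tuple[str, str]:
    """Split a class or property URI into (vocabulary URI, local name).

    Handles both /def/{vocabulary}/{term} and /def/{vocabulary}#{term}.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash form: everything before the # is the vocabulary URI.
        return base, fragment
    # Slash form: the last path segment is the local name.
    vocabulary, _, name = uri.rpartition("/")
    return vocabulary, name


def kind_by_convention(name: str) -> str:
    """Guess whether a local name is a class or a property from its case."""
    if UPPER_CAMEL.match(name):
        return "class"
    if LOWER_CAMEL.match(name):
        return "property"
    return "unknown"


for term in (
    "http://reference.data.gov.uk/def/central-government/PermanentSecretary",
    "http://reference.data.gov.uk/def/central-government/devolvesTo",
    "http://data.bis.gov.uk/def/grade#G5",
):
    vocabulary, name = split_term(term)
    print(vocabulary, name, kind_by_convention(name))
```

With the `#` form the fragment is never sent to the server, so a single request for the vocabulary document serves every term in it, which is why that form suits small vocabularies.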

## Data Formats ##

None of the URIs above indicates the format in which a document should be returned. One of the principles of good URI design is to have a **generic document** that does not have a specified format. Browsers and other clients can then use **[content negotiation](http://en.wikipedia.org/wiki/Content_negotiation)** to state which format they want to receive data in: a web browser will request data as HTML for a human to read, while an AJAX application might request it as JSON and a linked data application as RDF/XML.

The following diagram shows how content negotiation works. Each of the formats should also have its own URI, with a request to that URI returning the document in that format. When requesting a generic document, the client uses an `Accept` header to state which format it wants to receive, and the server responds with the relevant document, using the `Content-Type` header to state which format is being delivered and the `Content-Location` header to point to the URI of the document in that format.

The following table provides some examples of what these URIs might look like.

| Kind of URI | Suggested Pattern | Example |
| --- | --- | --- |
| **Document and Dataset Formats.** URIs for documents in particular formats. A request to a URI of this form should always return the format indicated through the extension used. This enables people to directly reference documents in a particular format. | `/doc/{type}/{identifier}.{ext}` | `http://education.data.gov.uk/doc/school/520965.rdf` |
| | `/doc/{type}.{ext}` | `http://education.data.gov.uk/doc/school.json` |
| | `/data/{dataset}/{version}/index.{ext}` | `http://data.bis.gov.uk/data/organogram/2010-06-30/index.ttl` |
| | `/data/{dataset}/{version}/provenance.{ext}` | `http://data.bis.gov.uk/data/organogram/2010-06-30/provenance.nt` |
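
The content negotiation described above can be sketched as a function that reads the `Accept` header for a generic document and picks the format-specific URI to report in `Content-Location`. This is a simplified illustration, not a complete implementation of HTTP negotiation: the media-type-to-extension mapping is an assumption, wildcards such as `text/*` are not handled beyond `*/*`, and the function names are hypothetical.

```python
from typing import Dict, List

# Assumed mapping from media type to the extension used in format URIs.
FORMATS: Dict[str, str] = {
    "text/html": "html",
    "application/json": "json",
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
}


def parse_accept(header: str) -> List[str]:
    """Return the media types in an Accept header, highest q value first."""
    choices = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        choices.append((q, media_type))
    # sorted() is stable, so equal q values keep their header order.
    ranked = sorted(choices, key=lambda choice: -choice[0])
    return [media_type for q, media_type in ranked if q > 0]


def negotiate(doc_path: str, accept: str, default: str = "text/html") -> Dict[str, str]:
    """Choose a format for a generic document and return response headers."""
    chosen = default
    for media_type in parse_accept(accept):
        if media_type in FORMATS:
            chosen = media_type
            break
        if media_type == "*/*":
            break  # the client accepts anything: fall back to the default
    return {
        "Content-Type": chosen,
        "Content-Location": f"{doc_path}.{FORMATS[chosen]}",
    }


print(negotiate("/doc/school/520965", "application/rdf+xml"))
print(negotiate("/doc/school", "text/html;q=0.5, application/json"))
```

A request made directly to a format-specific URI such as `/doc/school/520965.rdf` would skip this step and always return that format.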

## Further Reading ##

* [Cool URIs for the Semantic Web](http://www.w3.org/TR/cooluris/)