Data on the data on data.gov.uk

I've worked in data procurement for over 10 years and was keen to understand more about the 3236 datasets behind the headlines on data.gov.uk.

I started with the metadata itself and quickly discovered that it's pretty incomplete. To unlock the true value of the data on data.gov.uk, it would be great to see an effort to improve the completeness and input accuracy of the metadata. I absolutely take on board that the main function is to get the data out there first and foremost, but the more reliable the metadata, the easier it is to find this data, thus unlocking its potential.

Here's what I found whilst digging around the metadata:

Data.gov.uk provides room for a weighty 29 attributes for each dataset, and with 3236 datasets [June 2010], that’s a lot of information on the information. On average, 54% of the entries across these 29 attributes were blank, with the worst offender being “last updated” at 92% blank entries [currency is fundamental when considering fitness for purpose].

I made some pretty graphs but I can't post them on here, so here are some key stats:

"Key" Fields - percentage of blank entries:

Last Updated: 92%
Data Type: 69%
Geographic Granularity: 47%
Update Frequency: 46%
Released: 36%
Department: 34%
Geographic coverage: 8%
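
For anyone wanting to reproduce blank-rate figures like these, here is a minimal Python sketch. It is not the method used for the numbers above, which the post doesn't describe. It assumes the metadata has already been loaded as a list of dictionaries (one per dataset), and it assumes "blank" means a missing key, None, or a whitespace-only string. The field names and sample records are made up for illustration.

```python
from typing import Dict, Iterable, List, Mapping, Optional


def is_blank(value: Optional[object]) -> bool:
    """Assumed definition of 'blank': missing/None or whitespace-only text."""
    return value is None or str(value).strip() == ""


def blank_percentages(
    records: Iterable[Mapping[str, object]], fields: List[str]
) -> Dict[str, float]:
    """Return the percentage of blank entries for each field."""
    rows = list(records)
    total = len(rows)
    result: Dict[str, float] = {}
    for field in fields:
        # r.get(field) returns None when the key is absent, which counts as blank.
        blanks = sum(1 for r in rows if is_blank(r.get(field)))
        result[field] = 100.0 * blanks / total if total else 0.0
    return result


# Toy records (hypothetical, not real data.gov.uk metadata).
sample = [
    {"last_updated": "", "department": "Home Office"},
    {"last_updated": None, "department": ""},
    {"last_updated": "2009-05-01", "department": "DEFRA"},
    {"department": "Department of Health"},
]

stats = blank_percentages(sample, ["last_updated", "department"])
# Mean blank rate across the fields checked, analogous to an "average % blank".
average_blank = sum(stats.values()) / len(stats)

for field, pct in stats.items():
    print(f"{field}: {pct:.0f}% blank")
print(f"average: {average_blank:.0f}% blank")
```

How strictly "blank" is defined matters: placeholder values such as "n/a" or "unknown" would count as filled in under this sketch.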

Once you get past the blank fields, it's interesting to see the types of data being entered.

Here are the [crude] stats for the main fields:

Date Released - year by percentage [top 5]:

Blank entries: 36%
2009: 30%
2010: 16%
2008: 10%

Last Updated - year by percentage [top 5]:

Blank entries: 92%
2009: 4%
"Other": 3%
2010: 1%
2008: <1%

Update Frequency - frequency by percentage [top 5]:

Blank entries: 46%
Annual: 27%
Not updated: 8%
Quarterly: 7%
Monthly or more frequent: 5%

Department: entries by percentage [top 5]:

Department of Health: 11%
Department for Children, Schools & Families: 8%
DEFRA: 6%
Welsh Assembly: 6%
Home Office: 5%
[Blank entries: 34%]

Geographic coverage: area by percentage [top 5]:

England: 41%
UK: 15%
England & Wales: 14%
Blank entries: 8%
Scotland: 7%

Geographic Granularity: area by percentage [4 entries; clear confusion about what this field means]:

Blank entries: 47%
GB & UK: 25%
“Other”: 17%
Local Authority: 11%

Data Type: type by percentage [4 entries]:

Blank entries: 69%
Administrative data: 22%
Survey data: 7%
Modelled: 2%
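
The per-field breakdowns above are all "share of entries by value" tables with blanks counted as their own category. A rough Python sketch of that calculation follows. The function name, the "Blank entries" label handling, and the sample values are all assumptions for illustration rather than anything taken from data.gov.uk.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_shares(
    values: Iterable[Optional[str]], n: int = 5, blank_label: str = "Blank entries"
) -> List[Tuple[str, float]]:
    """Return the n most common values with their percentage of all entries.

    Missing (None) or whitespace-only values are grouped under blank_label.
    """
    labels = [
        blank_label if v is None or str(v).strip() == "" else str(v).strip()
        for v in values
    ]
    total = len(labels)
    counts = Counter(labels)
    # most_common() is empty when there are no entries, so no division by zero.
    return [(label, 100.0 * count / total) for label, count in counts.most_common(n)]


# Toy values for one metadata field (hypothetical).
coverage = ["England"] * 5 + ["UK"] * 3 + ["", None]

shares = top_shares(coverage, n=3)
for label, pct in shares:
    print(f"{label}: {pct:.0f}%")
```

Grouping by year, as in the Date Released and Last Updated tables, would need an extra step to pull the year out of each date string before counting, and that depends on how the dates were typed in.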