4. Meaningful Open Data: how should we ensure collection and publication of the most useful data, through an approach that enables public service providers to understand the value of the data they hold and helps the public at large know what data is collected?

5) Should the data that government releases always be of high quality?  How do we define quality?  To what extent should public service providers ‘polish’ the data they publish if at all?

Comments (6)

No dataset is perfect,

No dataset is perfect, particularly when the raw input is likely to come from the lowest-paid public servants.

Place the analytical burden on the user and requestor.

There may be some merit in putting gold stars on good datasets, but the ONS is a good place to start if you want good data, and the public and the market will soon work out which are the good datasets and the good analyses.

Of course it should be of

Of course it should be of high quality; otherwise it is of no use. Incorrect data is wasteful.

Quality should mean standard, consistent and accurate.

It should be correct and consistent, even if this means checking it.

Get the data out

Get the data out there.

Gather feedback on it.

Consider cost-effective improvements to future releases.

But there are plenty of commercial bodies out there willing to take on the role of data scrubbers to polish the data.

The public service should be evaluating, very thoroughly, the cost-effectiveness of any data polishing effort it undertakes.

 

Data Quality

Yes, you need to define the relevant quality required for each dataset you specify, including guidance on how to comply with it, and check it before publishing.

If you are using data for comparison purposes, you need to know that every data source meets the relevant quality standard agreed for that data set, so all providers will need to conform to it.

If you do not have agreed data quality standards, you should not attempt or promote comparisons between data sets. You should require all providers to supply accompanying notes explaining the basis on which the data set has been produced and any limitations on its use or on the conclusions that can be drawn. In this case the data set is valid for that organisation in the context it defines, and not for comparison.

Data Quality

If we passed the data collection source through a filtering process that visualised the results, we would be able to see the wood for the trees. We should go for the low-hanging fruit by generating metadata directly from the collected data and adding sanity checks. This would be enough: polishing costs money, and what benefit is derived from it?

Defining quality

Quality has several attributes. It is not just about accuracy: what made the data useful to government is its quality. The dimensions of relevance, accuracy, timeliness, accessibility, interpretability and coherence are a good place to start.

Metadata should be expected, but an audit would be more effective than an expectation of polish in establishing what is valued. If data are found wanting, they could be re-released with an annotation recording this history. Data collection is not about perfection; that is why data are collected at all. If we already knew what the data showed, the only reason to have them would be to prove it to other people, and I don't think that's really the kind of data we are interested in here.

An expectation that data revised for internal purposes are also revised in the published version should be sufficient. The ONS already does this kind of revision for growth estimates: as long as the process is understood and expected, people can make their own decisions about how to use the data. It would set an unhelpful obstacle if unflattering data could be held back in the hope that they might be found to be inaccurate. Publishing them regardless would produce pressure to collect accurate data in the first place, which would lead to better-informed decisions and more accountability.