ODUG's call for Open Data Requests

Since the Open Data User Group first met in July, we’ve worked hard on getting the Group off the mark to ensure we are in the best position to represent the views of the Open Data community to Government. It has been a fantastic learning process, and I’m proud of the enthusiasm and the abilities the ODUG members have shown in bringing this change about.

In my last blog, I opened the door wide for members of the community to come forward with their thoughts on how we can best serve their data needs. Today we’re making it even easier to communicate with ODUG on these ideas with the publication of our data request form. This form, hosted by data.gov.uk, will enable anyone who’s got an idea for a dataset not yet accessible – be it because of costs or other barriers – to tell us about it and to indicate the benefits of releasing the data. This will become the growing source of evidence ODUG can draw on to make the case to Government for more Open Data releases.

We are breaking new ground by interacting with the community in this way. Upholding our own underlying principles of openness and transparency we will publish each data request form – unless you tell us not to – so that the resource isn’t just for ODUG, but for the entire community to engage with and discuss. Just because someone else has submitted the data request you thought of making, doesn’t mean you can’t support it – you should do as there is definitely strength in numbers. Strength in numbers also covers any estimates about the benefits a data release would bring, so if you can provide figures and descriptions of benefits to support the potential release of a dataset, it’s likely that we can make it happen sooner.

Next week we will present to the Data Strategy Board what we believe is the first tranche of priorities they should focus on in bringing about easier access, reduced or removed costs, to datasets that will make a difference to people’s ability to build businesses and deliver services. We will publish this document alongside the methodology we’re using for prioritising requests, so you can help us make the case for the most beneficial open data.

The impact of ODUG’s input to the DSB relies on the contributions from open data innovators and entrepreneurs. For too long Government has tried to second guess what data might be of use to the community of businesses and third sector organisations, without fully understanding the needs on the ground. This is the opportunity to get it right.

We’re also establishing an online presence hosted by data.gov.uk. This will enable pundits to communicate directly with ODUG.

ODUG is committed to representing the data community at the heart of government and I strongly encourage you to get involved and send us your dataset requests.

Comments

Auditable process?

I was active earlier in data.gov.uk in putting suggestions forward. It is time consuming and did previously seem pretty futile, as there was never any feedback as to what was happening and there didn't seem to be any outcomes.

I would be willing to give it another go, but only if there are going to be some standards for feedback and information.

I think it would be helpful to be told

  • who the request is submitted to (datestamped) (if this is delayed by more than a month from the date of upload then I think there needs to be some interim feedback so it doesn't look like we are posting into a black hole)
  • when the request will be considered
  • what the response to the request is (within a week of the consideration date)
  • subsequently what progress is being made, reported at least monthly (or it looks like it has died)

It would also be interesting to understand how the requestor will be involved in the process, as it is not helpful if the released dataset does not meet the business need envisaged (which anyone who does request data knows is often the case) - there is usually a to-and-fro process to get data right and this will be as true with open data as with other data.

Relationship to Data Unlocking Service

Like Martin, I've previously tried to engage with the Data Unlocking Service, both when it was hosted outside, and when it moved to data.gov.uk (I note that URLs for the old OPSI hosted Unlocking Service currently redirect to a 404 page here on Data.gov.uk - as the example in this blog post).

It would be good to have some clarity on:

  • Whether the ODUG route replaces all previous Data Unlocking Service routes to request data;

and

  • Whether previous requests are being carried forward to ODUG;

Whilst it is good to see ODUG will be putting time into championing requests - the facility to make requests doesn't feel like something new at all - and the new form requires much more input than the previous, making submitting a request more involved than it used to be... which doesn't feel like progress...

Previous data requests

Tim,

The minutes of recent ODUG meetings indicate that the members of the group have been trawling through the legacy requests on Data.gov.uk, so I don't think material submitted under the previous regime has been forgotten.

You will see Heather mentioned a "first tranche of priorities". My understanding from correspondence is that ODUG are working with an existing proposed "data release pipeline" based on the open data that Departments have already committed to release plus selected previous data requests to Data.gov.uk.

Personally I'm somewhat concerned about the transparency of the process and that priorities might be driven too much by the existing Departmental commitments. I would ideally like to see ODUG rigorously champion the data releases that will have the broadest social and economic benefits, not the data that Departments (and Trading Funds) are minded to give us.

-- Owen Boswarva, 27/09/2012

New datasets? What about improving existing?

Hi

I'm concerned that a concentration on requesting the release of further data may divert resources that might otherwise be used to improve existing datasets.  As an example, I was surprised when I saw that Companies House is on the Public Data Group - their monthly data release of businesses is very poor.  It rather feels as though they've been told they must publish something without considering the quality of the product.

I would really like to be able to identify British registered businesses by their location (postcode ideally).  To me that doesn't seem too much to ask.  However, with data shown in incorrect fields and some joined with other fields, that simple task is impossible.  I'm told that Companies House simply takes the information a business supplies them with, no checks, no nothing.  It rather makes me wonder - apart from being a basic store of rough data - what value do they add?

--------

Data quality and Companies House's Free Public Data Product

If you mean the Free Public Data Product that Companies House launched in June, I also had the impression they were not happy about having to publish this as open data. That's based mainly on the lack of publicity when it was released, the obvious lack of effort that went into naming the product, and the fact that the data is not yet registered on Data.gov.uk.

However I've done some work with this data set myself and I thought the data quality seemed pretty good. The data includes properly formatted postcodes for more than 98% of UK companies listed. I did not find any issues with data in incorrect fields or have any particular problems using the data. Is it possible you may not be importing the files correctly into whatever software you are using to view or manipulate them?

Overall I thought the data quality in this product was about what I would expect. If the data is sufficiently accurate for the statutory purposes of Companies House, I don't think we can reasonably argue that they should be making additional checks on accuracy just to support the open data product. That would not be a reasonable use of public resources.

Obviously public bodies shouldn't be releasing rubbish data, and if an open data set is poor quality that may point to an internal problem. However as open data re-users we should expect to do a certain amount of work ourselves to cleanse and process the data if our requirements are more exacting than those of the data provider.

-- Owen Boswarva, 27/09/2012

Data quality and Companies House's Free Public Data Product

Hi,

That's very interesting, thank you for your reply.

That is the dataset I was referring to.  I'm not sure what's happening then, I'm using Excel 2010 to open the CSVs.

Looking at the first file of four from September's data release I see 847,127 UK businesses by filtering on 'CountryOfOrigin' (of 849,999). Then filtering on the blanks in the 'RegAddress.PostCode' field shows up 9,297 records - looking across at the other address fields I see that in many cases the postcode is there but it's been concatenated to another part of the address (company 5537157 is a good example where the 'RegAddress.County' field has "TYNE AND WEARSR2 7SH" in it).
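The same check can be run outside Excel. The following is only a rough sketch in Python: the column names 'CountryOfOrigin' and 'RegAddress.PostCode' are the ones quoted above, but the value "United Kingdom", the function name `summarise_postcodes` and the three sample rows are assumptions made up for illustration, not taken from the Companies House files.

```python
import csv
import io


def summarise_postcodes(csv_file, uk_value="United Kingdom"):
    """Count rows, UK rows, and UK rows with a blank postcode field.

    uk_value is an assumed label for UK companies; adjust it to match
    whatever the 'CountryOfOrigin' column actually contains.
    """
    total = uk = blank = 0
    # skipinitialspace tolerates a space after each comma in the file
    for row in csv.DictReader(csv_file, skipinitialspace=True):
        total += 1
        if (row.get("CountryOfOrigin") or "").strip() != uk_value:
            continue
        uk += 1
        if not (row.get("RegAddress.PostCode") or "").strip():
            blank += 1
    return {"total": total, "uk": uk, "blank_postcode": blank}


# Made-up sample rows, for illustration only.
SAMPLE = (
    "CompanyName,CountryOfOrigin,RegAddress.County,RegAddress.PostCode\n"
    "A LTD,United Kingdom,KENT,ME1 1AA\n"
    "B LTD,United Kingdom,TYNE AND WEARSR2 7SH,\n"
    "C LTD,FRANCE,,\n"
)

summary = summarise_postcodes(io.StringIO(SAMPLE))
print(summary)
```

For a real file, an open file handle (for example `open(path, newline="")`) could be passed in place of the `io.StringIO` sample.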

Do you not have that problem?  Potentially missing ~9,000 records from one of four datasets (whether that equates to a small % or not) seems a problem to me.  I appreciate having to manipulate data for my needs, but this seems such a basic error on Companies House's part.

Thanks again for your reply.

------

What you're describing is *good* data quality :-)

Hi,

I've had a look at that file in MS Access and my numbers agree with yours. I also agree there's concatenation of postcodes in incorrect fields in a small number of records. Everything you say is technically true, and it doesn't sound as if you're doing anything wrong.

But we have a difference of perspective. To me those numbers indicate pretty good data quality, taking into account the type of data and the size of the dataset.

I think what we're seeing are input errors from the registration process, i.e. they haven't been added subsequently by Companies House systems or when the csv files were extracted.

The question is whether we should reasonably expect Companies House to fix those errors. In my view I don't think we should. Efforts to improve data quality are good practice, but there are diminishing returns as you approach perfection. If those data errors don't invalidate the registrations themselves, Companies House probably has no business reason of its own to fix them.

(I probably should have been clearer when I said I didn't find any issues with data in incorrect fields. There's data in incorrect fields; I just don't see it as a real issue. Any address cleansing software should be able to pick out the postcodes and put them in the correct column, or if you know how to write code you can do it with regex or similar.)
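As a rough illustration of the regex approach, here is a minimal Python sketch that pulls a postcode off the end of a field it has been run into, as in the "TYNE AND WEARSR2 7SH" example quoted earlier in the thread. The function name `split_trailing_postcode` is hypothetical, and the pattern is a simplified approximation of the UK postcode format rather than a full validation. It cannot resolve genuinely ambiguous joins: a county ending in a letter followed by a one-letter postcode area may be split one character too early, so the results would still need checking.

```python
import re

# Simplified UK postcode shape, anchored to the end of the field:
# outward code (1-2 letters, a digit, optional letter or digit),
# optional whitespace, inward code (a digit and two letters).
POSTCODE_AT_END = re.compile(
    r"([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\s*$"
)


def split_trailing_postcode(value):
    """Return (remainder, postcode) if the field ends with a postcode,
    otherwise (value, None)."""
    text = value.strip().upper()
    match = POSTCODE_AT_END.search(text)
    if match is None:
        return value, None
    # Everything before the postcode, minus trailing spaces or commas
    remainder = text[:match.start()].rstrip(" ,")
    postcode = f"{match.group(1)} {match.group(2)}"
    return remainder, postcode


print(split_trailing_postcode("TYNE AND WEARSR2 7SH"))
```

The extracted postcode could then be written into the empty 'RegAddress.PostCode' column and the remainder left in the original field.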

I realise this isn't much help. Perhaps it's a matter of expectations? I'm not saying we should be grateful to Companies House for releasing the data, or that we should never raise complaints about open data. But on the other hand I don't expect open data to come with the same standard of service that I might get with a commercial data product.

-- Owen Boswarva, 27/09/2012

Don't expect too much from admin data!

In order to assess the quality of data on administrative datasets such as those from the CH registration process, it is worth asking "is this data item actually important to the administrative process?" and "how important is it that it is exactly right, in the right format etc?".  For postcode in CH data, the answer to the first question is presumably "very important" but the answer to the second is "not important at all" because even if the postcode is enmeshed with the rest of the address the mail will still get through.

I don't think it is reasonable to expect more from any administrative process, at least not until it is firmly established in the psyche of the organisation that the onward use of their data really is important.  If it is that important, of course, they can expect extra staff or software resource to tidy up the data, can't they? - at which point the problem that their resources are struggling to cope with the original job they were established for comes up and bites you.

Of course, even if the data are right, they may not be what you think they are.  There is a myth that there is an item in CH data that tells you how much of their turnover comes from exports.  Not so - it can also be turnover of overseas subsidiaries, and you can't tell which without seeing the Annual Report proper!  And it may be "Europe vs elsewhere". And it only has to be there if a sizeable proportion of turnover is in different geographical segments.  That's a metadata issue, but CH would not be expected to worry overmuch about it, as all they are doing is reproducing the numbers from the reports.

no response yet?

We seem to have a lot of very valid comments, but not much response yet.

Just to clarify where we seem to be, as there seem to be some practical difficulties as well.

  • The reporting/requesting tool is well linked to, but there is no way to see what has been requested, as far as I can see
  • I can see requests under the previous system, but a) I can't comment on them unless someone else has already commented, which isn't helpful, b) the location is very hidden away in the site structure - you have to know to look in participate, then know what ODUG means, then it's right at the bottom, off screen - and c) I know I've requested things in the past, but they aren't there as far as I can see. So I would question how complete the list is. It would be helpful to include requests turned down, as this would establish a corpus of things we won't repeat.

I hope all of the comments can be addressed, as there is a vibrant user base here that really do want to be proactive in helping move things forward.

Viewing and commenting on public data requests

You can view the most recent public data requests here:

     http://data.gov.uk/data_requests?sort_by=created&sort_order=DESC

although I agree the list is not easy to find.

As of earlier today commenting has been enabled on new data requests. However the functionality does not seem to have been made retroactive. There are about 50 public data requests, submitted in the past couple of weeks, that do not currently have commenting enabled.

-- Owen Boswarva, 09/10/2012

Thanks

That's helpful, thanks. Looks like the first text needs to be modified to make clear it is including new requests, as it says otherwise, but definitely a step in the right direction! Obviously being retroactive would be far more sensible.

It would also be helpful if, as per earlier comments, there was a date of request field, and dates for progress. It would be ironic if useful stuff like that that lets us monitor how successful the project is was kept secret on an open data website!

Please enable comments on data requests

"Just because someone else has submitted the data request you thought of making, doesn't mean you can’t support it – you should do as there is definitely strength in numbers."

In support of that point, may I suggest you open up the public data requests to comments?

At the moment the only way to add additional arguments in favour of opening a data set that someone else has already requested is to add another request and rely on ODUG to match up the multiple submissions. It would be much more effective to let Data.gov.uk users add supporting material to existing requests.

Data requests submitted to the previous Unlocking Service were open to comments, so why not do the same for the new requests?

-- Owen Boswarva, 28/09/2012

Update:  Commenting has now been enabled on all public data requests submitted under the new process. Thank you.

-- Owen Boswarva, 14/10/2012
