Confirmed updates to API, CSV files and documentation

Hi everyone,

Thank you for your useful comments, suggestions and feedback about the police.uk data during the past couple of months, on this forum, by email, and in person.

I'm pleased to say that we're able to take quite a few improvements forward, hopefully resulting in an open data offering that is more consistent, easier to use, and more reliable in the long term.

Firstly, all police.uk open data related things (except the application showcase) will be migrating across to the data.police.uk website, which we're currently revamping in line with the Open Data Institute best practices.

From early July, we'll be providing much better information about data provenance, quality issues, our assurance and anonymisation processes, and tools for working with the data. We'll also be making it much easier to get in touch with the Home Office about any concerns, issues or suggestions, as well as launching a Data User Group so it's easier for you to influence future policy decisions.

In terms of the data itself, we'll be launching a clearer change log and providing single-click, permalinked archives of the full crime and outcomes database each month in CSV format.

Based mainly on the feedback from this forum, we'll also be providing a 'custom download' feature so you can bulk download data in CSV format for a specific time period and set of forces. A picture of the in-development site explains it better than I probably can : )

https://dl.dropboxusercontent.com/u/22172657/data_custom-downloads.png

The CSV files contained in the download will be very similar to those currently provided on police.uk/data, with the following changes:

- We won't be providing an aggregation of the data by Neighbourhood Team beyond the end of August. It will still be possible to get this data using the new 'poly' parameter in the API, and we'll make sure to provide help and guidance to anyone who needs to transition over to it.
- The 2011 Lower Super Output Area code and name will be listed alongside all crimes.
- The last known 'status' for each crime (e.g. under investigation, offender cautioned) will be included alongside the rest of the crime information, in addition to the current outcomes files.
- They will use WGS84 latitude and longitude, rather than eastings and northings.

I'll post a full example file to this thread as soon as I have one.

In terms of boundary files, we'll be making the KML files for police force boundaries much easier to download, as well as continuing to publish current and historical archives for the (regularly changing) neighbourhood team boundaries.

A 'poly' parameter will also be added to the street-level crimes and street-level outcome methods in the API, allowing you to query for custom shapes (like wards, neighbourhood watch areas, councils) instead of just a set radius around a single lat long. The format will be:

poly=[lat],[lng]:[lat],[lng]:[lat],[lng]

The last point does not have to be the same as the first, as the system will join them up automatically. For example:

/api/crimes-street/all-crime?date=2013-01&poly=52.268,0.543:52.794,0.238:52.130,0.478
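
As a rough illustration, the poly value can be assembled from a list of (lat, lng) pairs. The Python sketch below only builds the request path shown above and does not call anything; the helper names build_poly and crimes_street_path are made up for this example and are not part of the API, and the three-decimal precision is simply an assumption taken from the example.

```python
def build_poly(points):
    """Join (lat, lng) pairs into the lat,lng:lat,lng format described above."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    # Three decimal places matches the example above; this precision is an assumption.
    return ":".join(f"{lat:.3f},{lng:.3f}" for lat, lng in points)


def crimes_street_path(date, points):
    """Build the request path for street-level crimes inside a custom shape."""
    return f"/api/crimes-street/all-crime?date={date}&poly={build_poly(points)}"


# The same three points as the example above; no need to repeat the first one.
triangle = [(52.268, 0.543), (52.794, 0.238), (52.130, 0.478)]
print(crimes_street_path("2013-01", triangle))
```

Run as-is, this prints the same path as the example above.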

We're also going to be making a few additions to the API documentation in general over the coming months, based on comments and questions that users have been kind enough to send us.

We'll send out a message to the police.uk mailing list and post on this forum when the new data.police.uk site is ready (all things being well, early next week). All data will also continue to be available in their current formats until the end of August.

Thanks

Alex

Comments

Sample CSV data

Hi,

A sample of the new CSV file format can now be downloaded from https://s3-eu-west-1.amazonaws.com/policeuk/uploads/new-csv-example.zip

It's representative of what you would get if you selected April 2013 - April 2013 as the date period, and checked Avon and Somerset as the force.

Alex


Hi Alex, Can you confirm if

Hi Alex,

Can you confirm if the Neighbourhood Crimes endpoint is being retired, and what will be replacing it?

I found it extremely useful for providing crime rates per 1,000 and crime levels, which none of the other endpoints provide.


Hi, Yes, it is. It's being

Hi,

Yes, it is. It's being replaced by the 'poly' parameter on the street-level crimes method, which allows you to query the API for any shape you want (including neighbourhood areas). We will continue to publish the neighbourhood KML files here. As an aside, I made a pre-compiled list of the relevant URLs for the 2013-05 neighbourhood teams at the weekend - you're welcome to download and use it if you like : )
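
If you are starting from one of the KML files, a minimal sketch of turning a boundary into a poly value might look like the following (Python). It assumes you have already pulled out the text of a coordinates element from the file. KML lists each point as longitude,latitude with an optional altitude, so the order has to be swapped for the poly parameter. The function name is hypothetical and the sample ring is made up.

```python
def kml_coordinates_to_poly(coordinates_text):
    """Convert the text of a KML <coordinates> element into a poly value.

    KML stores each point as lng,lat[,alt]; the poly parameter wants lat,lng.
    """
    points = []
    for chunk in coordinates_text.split():
        parts = chunk.split(",")
        lng, lat = parts[0], parts[1]  # ignore altitude if present
        points.append(f"{lat},{lng}")
    # KML rings repeat the first point at the end; the API joins the
    # shape up itself, so the closing duplicate can be dropped.
    if len(points) > 1 and points[0] == points[-1]:
        points.pop()
    return ":".join(points)


# Made-up ring, purely for illustration.
ring = "0.543,52.268,0 0.238,52.794,0 0.478,52.130,0 0.543,52.268,0"
print(kml_coordinates_to_poly(ring))
```

The values are kept as strings throughout, so whatever precision the KML file uses is passed on unchanged.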

The crime rates are actually one of the reasons behind retiring the neighbourhood-crimes method. Because neighbourhood teams are generally structured around operational policing and resourcing issues, rather than census or similar boundaries, it can sometimes be very difficult for forces to provide accurate population figures. We know that some forces are basing the population figures on information that's over a decade out of date, and in the worst cases the figures are nothing more than a guess by force statisticians. There are also discrepancies in the way that different forces provide population estimates for areas with low resident population but high footfall (like an airport).

For those reasons, we're keen to stop releasing information that's potentially misleading into the public domain.

Instead, we're looking to give data users the tools to calculate much more accurate and up-to-date crime rates themselves. You can use the 2011 LSOA population figures in conjunction with the new CSV files to calculate the rates easily, or combine any other type of population figures you have with the new custom shape 'poly' parameter using the API to calculate the crime rates for any area you want.
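
As a rough sketch of the CSV route (Python), the calculation is a count of crimes per LSOA divided by that LSOA's population, multiplied by 1,000. The column header "LSOA code", the other column names, and the population figures below are all assumptions made up for illustration; check them against the real files and your own population source.

```python
import csv
import io
from collections import Counter


def crime_rates_per_1000(rows, population_by_lsoa, lsoa_column="LSOA code"):
    """Return crimes per 1,000 residents for each LSOA with a known population."""
    counts = Counter(row[lsoa_column] for row in rows if row.get(lsoa_column))
    return {
        lsoa: 1000 * count / population_by_lsoa[lsoa]
        for lsoa, count in counts.items()
        if population_by_lsoa.get(lsoa)  # skip unknown or zero populations
    }


# Made-up sample data; the column names are assumptions, not the real header.
sample_csv = io.StringIO(
    "Month,LSOA code,Crime type\n"
    "2013-04,E01000001,Burglary\n"
    "2013-04,E01000001,Vehicle crime\n"
    "2013-04,E01000002,Burglary\n"
)
# Placeholder populations, not real census figures.
populations = {"E01000001": 1500, "E01000002": 2000}

rates = crime_rates_per_1000(csv.DictReader(sample_csv), populations)
print(rates)
```

Because each file covers a fixed period, the result is a rate for that period only; a file covering one month gives a monthly rate, not an annual one.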

Sorry if these changes create extra work for you and are annoying at first. My hope is that over the long term they'll provide a much more stable, accurate and flexible open data offering.

Alex


Coordinates and data formats

Hi Alex,

This is a duplicate of what I posted on the previous thread as I only just spotted this new one...  In response to your reply on the previous thread, I totally understand the rationale for using WGS84 to cover both Great Britain and Northern Ireland, but I have some issues with making the swap and the changes made to the download formats.

1) All the data published so far is in national grid coordinates, so all your early adopters have put systems in place to use these coordinates; your changes will favour certain new adopters but will undoubtedly annoy anyone who has already developed something to load and process the existing data format and in particular the coordinates.

2) I want to use the data in conjunction with OS OpenSpace and Ordnance Survey Open Data, which (last time I checked!) was also a part of government.  It seems a shame for Police.uk to favour Google/Bing users over those who choose to use data from our National Mapping Agency.

3) I don't buy the idea that "supplying the coordinates in three coordinate systems costs taxpayers more money".  The points have been supplied in the national grid coordinate systems for the last two years, so you already have systems in place to deliver them, and the WGS84 coordinates are equally possible (I've now seen the new portal!).  Where, exactly, is the cost in simply appending the lat, lon as new fields while retaining the old ones?  I can't see a genuine cost in that.

4) You need to give people more notice for changes like this, or at least dual-run both formats for a longer period.  Just one or two months is not a fair notice period to developers and entrepreneurs to make significant changes to their systems - particularly when the changes are a replacement rather than an addition to what was available previously.  This kind of thing creates uncertainty and makes OpenData seem like a flaky thing to build a business on (which goes against the whole idea of the thing).  I for one am seriously put off by this.

I hope that explains my position.  Having re-read the previous thread, several others were saying similar things too and it's a shame that this feedback from actual developers and end-users has not been taken on board.  Please take these comments seriously, as what should have been (and almost was) a positive improvement to the download portal has ended up being a disappointment because of the way it's been done.

Many thanks and best regards,

Jonathan.


Please see the other thread

For anyone else reading this, I replied to Jonathan's message on the original thread (#31).


The new data structure looks

The new data structure looks good, and matches the way that we (ocsi.co.uk) are using the data in analysis and tools to improve public services. 

However, a couple of thoughts based on the discussion above and the previous thread:

Adding standard area codes such as LSOA is good.

Although it is clearly straightforward for GIS folks to link latitude-longitude to standard areas such as LSOAs, open data is not just for GIS folks. Adding the LSOA code makes the datasets much more usable for analysts who work directly with these areas. So it's good to see you doing this. And as an SME director / founder, I don't agree with the argument that adding an 'LSOA code' is straying into 'value-added' commercial territory. This is a basic property of the individual data records, and increases usability (and hence take-up) of the data.

However, it would be good to see OA codes added also. These are after all the basic building block of (almost all) standard administrative areas, and are how Census data is delivered. Your primary reason for not using OAs seems to be that streets cross over OA boundaries. However, (1) so do LSOAs (many of which use street centre lines as their boundaries), and (2) you've already linked the data down to a point, so users can only aggregate up to higher areas using this point data - so will inevitably link up to OAs etc from the point data. Both arguments suggest that OAs would be useful (either instead of, or as well as, LSOAs) to increase use of the data by analysts etc. 

Eastings & Northings should be kept (along with Latitude & Longitude)

We use latitude-longitude, so good to see these in the data files. However, many organisations will have spent time, money and goodwill on developing products that expect Eastings & Northings output from your API. Given that there would be only a very small cost of continuing to provide these fields, it seems odd not to maintain backwards compatibility for existing users. Plus, future users may be less likely to take on the risk of building commercial tools, offers or services on the API ("if this kind of change can be made that breaks existing tools, what else might get broken in future?").

But, overall the new data structure works for us, keep up the good work.

Cheers, Tom.

---------------------------------
Dr Tom Smith, Director
Oxford Consultants for Social Inclusion (OCSI)
tom.smith@ocsi.co.uk
@_datasmith
www.ocsi.co.uk
mob +44 7966 543 467, tel +44 1273 810 224.
