Two potential changes to the API and CSV downloads

Hi,

There are a couple of improvements we’re thinking of making to the Police.uk API and data downloads over the next couple of months, but we wanted to run these thoughts past the developer community first for any comments, suggestions or other good ideas!

Using Latitude and Longitude in CSV downloads

  • At the moment the CSV files we make available on www.police.uk/data use a mixture of coordinates – both OSGB36 and OSNI52 easting-northings.
  • We think that switching to the more common WGS84 latitude-longitude format will make the data much easier to work with, and simpler to integrate with systems like OpenStreetMap or the KML boundary files that we also make available on the site. It also reduces the risk of someone not realising that the CSV files currently use two different datums, and converting the coordinates incorrectly.
  • The Police.uk API already uses latitude-longitude throughout, so harmonising the two should also make it simpler for anyone who wants to use the CSV data in conjunction with the API.

If we do make the switch, we will probably have a changeover period of a couple of months where we make both forms of the CSV files available.

Custom location API queries

  • Currently you can only extract crime data in the API for a constant radius from a set latitude-longitude, or for a particular neighbourhood policing team.
  • This isn’t great, and makes it difficult for people who want crime data based on different types of location. So we’d like to depreciate the neighbourhood-crimes method and neighbourhood CSV files, and replace them with something much more flexible.
  • We’re thinking of adding a way for developers to POST their own shapefiles to the API and get back all the information for crime and outcomes in that shape. It means that you’d easily be able to get accurate crime data for any location you want – for example, a council ward, town boundary, neighbourhood watch area, postcode area, census area with population data, or anything else you can think of!
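To illustrate what "all the crime in that shape" involves, here is a minimal sketch of a point-in-polygon filter using the standard ray-casting test. It is not the Police.uk implementation; the function names and the `lon`/`lat` field names are hypothetical, and it treats longitude and latitude as flat coordinates, which is only a reasonable approximation for small areas.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a ray to the right and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def crimes_in_shape(crimes: List[Dict], polygon: List[Point]) -> List[Dict]:
    """Keep only the crimes whose (lon, lat) falls inside the polygon."""
    return [c for c in crimes if point_in_polygon((c["lon"], c["lat"]), polygon)]


# Hypothetical square boundary and two made-up crime records.
square = [(-0.2, 51.4), (0.0, 51.4), (0.0, 51.6), (-0.2, 51.6)]
crimes = [
    {"id": 1, "lon": -0.1, "lat": 51.5},
    {"id": 2, "lon": 0.1, "lat": 51.5},
]
selected = crimes_in_shape(crimes, square)
```

Points lying exactly on a boundary can fall either way with this test, so a production service would need an explicit rule for them.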

Again, we may be able to organise a period of parallel running if the community thinks it would be useful.

Any thoughts would be much appreciated : )

Thanks,

Alex Edwards
Police Company Directorate | Home Office

Comments

Comment on proposed changes

Hi Alex,

Using WGS84 instead of the OS datums is an excellent idea, for the reasons given. Though deeply unpatriotic, of course.

Polygon-based custom location queries are also a good idea in principle, but I would strongly urge you to retain availability of standard csv downloads based on some kind of small-area geography. I've never liked the "neighbourhood" geography much as it seems to be particular to Police.uk data, but wards or LSOAs would be a good replacement. Some users just need the whole dataset in a manageable format; they may not want to faff around with uploading shapefiles. Of course if you can preload a range of standard geographies that would be ideal -- provided they are consistent month to month.

-- Owen Boswarva, 11/04/2013

PS: It's not just the "developer community" who use Police.uk data, y'know. Some of us are analysts.

PPS: I think the word you want is "deprecate". Depreciate means something else.

LSOA-OA-BTP-CSV

I have no strong preference between OA and LSOA - LSOA may be easier to compare with other, non-census, datasets so is probably better for general use. Expert users can easily aggregate the street level data themselves anyway.
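As a rough illustration of that kind of aggregation, the sketch below counts crimes per area code from street-level CSV text. It assumes each row already carries an area code in a column called `LSOA code`; that column name, the other headers and the `AREA_A`/`AREA_B` codes are placeholders, not the real file layout.

```python
import csv
import io
from collections import Counter


def count_crimes_by_area(csv_text, area_column="LSOA code"):
    """Count rows per area code; rows with a blank code are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[area_column] for row in reader if row[area_column])


# Made-up sample data with placeholder area codes.
sample = (
    "Month,Crime type,LSOA code\n"
    "2013-03,Burglary,AREA_A\n"
    "2013-03,Robbery,AREA_A\n"
    "2013-03,Burglary,AREA_B\n"
    "2013-03,Burglary,\n"
)
counts = count_crimes_by_area(sample)
```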

Also, as Owen suggests, at OA you would have more instances where a street covers more than one area and hence a single OA would get some of the crime from its neighbour which could be misleading.

Can you also add an extra column to the CSV street level data with the higher level geography code?

Also, are there any cases where an LSOA straddles two police forces?

Finally, what about BTP? They should be excluded from the aggregation, I guess, otherwise that would make things look interesting.

BTP and LSOAs

Hi,

I'm not sure about LSOAs straddling force boundaries - does anyone else here know? If not, I'll take a look into it.

With regard to BTP, we could do either. I think it comes down to this: If the area you're interested in (say an LSOA) happens to contain a railway station, should the output contain the crimes that happened at the station, or should it act like the station doesn't exist?

My gut feeling is that the former is simpler and has more practical applications, but I'd be interested to understand the arguments either way. If we want to split apart BTP for the purpose of comparing performance, workload etc, then the official and verified ONS police force area data would probably be a better bet than the (indicative, in-flux) police.uk data anyway.

Alex

Hi Owen, Yes, the lack of

Hi Owen,

Yes, the lack of consistency is a big drawback of using neighbourhood teams - forces restructure and change their team boundaries quite often for operational policing reasons, probably at least one force every month.

That's a good idea, we could probably preload some set shapes without too much trouble and continue to make those available as CSVs. Electoral Wards, OAs, LSOAs, Census Merged Wards etc would probably be fine to do. If you had to pick out one type to use for the CSVs, which do you think would be useful to the most people?

Thanks

Alex

Small area geography

Hi Alex,

If I had to pick one set of boundaries I would say use the 2011 Census LSOAs.

I'm rather averse to wards because the policy history tends to make them confusing. (There's nothing worse than trying to match up two datasets that use different types of wards.)

However I hope we can get some other crime and justice data users to respond on this question as well ...

-- Owen Boswarva, 13/04/2013

Agree with Owen, wards are a

Agree with Owen, wards are a very poor geography for analytics. Not only do OAs no longer nest within them but, due to the mix of single and multi-member wards, they are horribly inconsistent in size and shape.

LSOAs are a decent compromise but are too big for many uses. Standard 2011 OAs and/or Workplace Zones would be my pick - many crime types are dominated by where people work (e.g. shoplifting), so WZ might be best.

Does the API cover Scotland/NI? If so we're still waiting for the OA/WZ from NRS. NI LSOA are out but are bigger than EW so a bit of care needed.

Whatever you do, avoid postal geog :-) . Although good for some public facing display, a more horrid framework for analytics would be hard to design.

Best

Yes, OAs would be good too ...

Alex,

I have an idea that the "crimes by street" data is already effectively generalised to a larger area than OAs typically cover, in which case use of OAs might mislead. But you will have a better understanding than me of that, of course. It's always possible to derive LSOA data from OA data anyway, so I would be fine with 2011 OAs. 

Not so sure about Workplace Zones. As Blair says they are useful for analysing certain crime types. However I don't think they are right as the default geography; that would rather seem to prejudge how the data will be re-used. (But if we ever see a release of incident data on crime against businesses, Workplace Zones would be nearly ideal for that.)

We are really spoilt for choice with Census geographies actually. We'll miss them when they're gone ...

(Blair, the Police.uk API is a Home Office project so unfortunately only covers England and Wales.)

-- Owen Boswarva, 13/04/2013

OAs and Northern Ireland

Hi Blair and Owen,

The extent of generalisation of the data varies - in dense urban areas the distance that a crime travels during anonymisation is typically very small, so OAs would probably be OK. But in rural areas the distance can be really quite large, so a larger area like LSOAs would be a safer bet for minimising the number of crimes that get included into the 'wrong' area.

Police.uk and the API/CSVs do include Northern Ireland Crime data (as of December last year) - but no outcomes data at the moment. We'd like to include Scottish data too, but have been waiting for the merger that's just taken place between all the Scottish forces. I think it's something we'll take a more serious look at later in the year.

Alex

It's an interesting idea to

It's an interesting idea to allow a request to specify an area. You might set up a website about, say, the Olympics, and want to show a graph of crime in a user-defined area over time. That would save users filtering the incidents another way. I would say that KML is rather more open than shapefile. A one-off upload is no particular hassle, and would be a benefit for building a site of this sort. However, it's not clear how popular this use would be, compared to the existing circle and ward-level breakdowns you already provide.

LSOAs are interesting if you want to do analysis with ONS data, so have more value than wards, I'd say, given the choice.

On behalf of Dan Lewis

All--- Dan Lewis is having issues posting to the site, so while we fix that he has asked me to post the following on his behalf. He should be able to post with his own account later today. These are Dan's views.

-------------------------------------

 Hi Alex and everyone,

I've only just seen this so I am responding now.

Using our own financial and programming resources,  UKCrimeStats has already done the small (and large) area geography work. We have all crime data going back to December 2010 matched to the following shapes - see here http://www.ukcrimestats.com/Subdivisions/ ;

Constituencies

County Council

County Ward

District Council

District Ward

London Assembly

London Assembly Constituency

London Borough Ward

Metropolitan District

Metropolitan Ward

Unitary Authority

Unitary Authority Electoral Division

Unitary Authority Ward

Welsh Assembly Constituency

Welsh Assembly Region

Postcode District

Postcode Sector

Northern Ireland Data Zone

Lower Layer Super Output Area (LSOA)

So it's not acceptable for police.uk to copy what we've done and undermine one of our website's USPs using taxpayers' funds. Alex, could you confirm that you will now not do this and go into competition with us at the cost of the public purse and the detriment of the market?

We are also nearly finished with the release of an LSOA data product (with 100 plus datasets including population etc.) which we hope analysts and developers will find useful. We are also developing a multi-faceted API which we intend to be better than police.uk along similar lines that you describe so here again, we don't want to be unfairly crowded out by police.uk.

I know not everyone agrees, but the better way to encourage developers and analysts is for police.uk to stop developing and concentrate on cleaning up the data, completing the data (still a lot of missing police stations on the API after 2 years for example) and instigating better data governance. We would all be better served by a plurality in crime data apps and websites rather than the existing monopoly.

I disagree ...

I hope the Home Office will prioritise the wider interests of Police.uk data re-users, and disregard the views put forward by Dan Lewis above.

The concept of providing location data with a selection of administrative geographies is hardly unusual. If that is UKCrimeStats's unique selling proposition, it doesn't seem like much of one. The features proposed by Alex are a sensible extension of the existing Police.uk functionality. 

It's always nice to see additional analysis or added value from third-party data providers who have a properly marketable USP. However crime and justice data is very sensitive to error and interpretation. Developers and analysts must be able to download a neutral presentation of the bulk crime data from an official website, in a flexible manner.

If I were using police data in a project for a client, I would have difficulty explaining why I was relying on a re-cut version bought from a conservative think-tank, rather than from closer to source.

The point of "open data" is to encourage innovative re-use of public information, including re-use for commercial purposes. That doesn't mean we should discourage data publishers from making their bulk data easy to obtain or re-use (or from using that data themselves in support of public services) merely to protect the perceived business interests of individual re-users.

The proposals set out by Alex do not seem to me to be either unfair or anti-competitive. However if Dan Lewis has any real grounds for complaint he could of course pursue them through the OPSI or the OFT.

-- Owen Boswarva, 18/04/2013

Crime or Recorded Crime?

I agree with Owen.  I see what is proposed as part of the basic open data product.  As such it seems like asking for trouble to allow it to drift into being a third party's IP.  In theory, such a third party could lose interest, run out of resource or simply disappear.

I took a quick look at UKCrimeStats via Owen's link.  I was a bit surprised that it wasn't called UKRecordedCrimeStats.  And then that there appeared to be no reference to the fact that it was restricted to reported crime which is also recorded, with all that implies in terms of variations in coverage and practice between forces and locations and across time.

I'm well aware that the British Crime Survey can't be used to produce small area data, but I would have thought that any repository of crime data ought at least to refer to its existence and its role at more aggregated levels of geography, and to include something on the issues that it raises, even if there is not unanimity on some of those issues.

In reply to Owen and ExStat

Hi Owen and Exstat, 

Thanks for your comments. It's really good to have this debate in a public forum. I've no doubt it will carry on for some time and we are very happy to participate, at great length, for as long as it takes.

 First of all @ExStat;

"I took a quick look at UKCrimeStats via Owen's link.  I was a bit surprised that it wasn't called UKRecordedCrimeStats."

No, this is just not the case. For 2 years, we have explained this many times over on the following pages;

http://www.ukcrimestats.com/FAQ/

http://www.ukcrimestats.com/AboutData/

http://www.ukcrimestats.com/Disclaimer/

And there is no link from Owen, but from me via Antonio. We have never ever claimed to offer live data, only recorded data from police.uk which we have then connected to other shapefiles and built into a database - we are the only aggregators of the crime. If it's live data you want then look at www.streetviolence.org from witness confident. 

Secondly, Owen. I can see you've done lots of interesting geospatial work but may I just ask if you have done any published work based on the crime data because I can't seem to see any?

There has to be a distinction between those who have skin in the game and those who don't. Am I correct in saying you've never done anything with the crime data?

If you could put yourself in an entrepreneur's shoes, I just wonder how you would feel if time and money you had spent developing an open data platform was suddenly made worthless by government copying what you have done?

I don't doubt the good intentions of the HO civil servants but you must admit, there is a difficult conflict of interest here - develop police.uk and secure more budget and praise for ourselves or take a step back and let the private sector do more and lose both?

It's a bit disappointing that we only found out about this forum, which we have been campaigning for with the HO for 2 years and which is an outcome we strove for, via a police.uk Google alert. It really should be publicised so a large number of those who are genuinely engaged and actively developing crime data are part of the conversation. Right now, it looks like it is just us.

That leads on to the next point. How do you presume to know what is in the wider interests of police.uk data reusers and do you really believe that the interest of taxpayers are not an issue?

The number one problem facing the country is that there is not enough money. All police.uk developments will cost money, the question is who pays - all taxpayers in perpetuity or do you have a cut-off point and say just those developers who will use it and add value in their own way in a free and fair marketplace that they enter at their own cost and risk?

You have to try to imagine that if the HO had only released the crime data rather than police.uk simultaneously, a large number of entrepreneurial players would have moved in and offered competing platforms, APIs etc. for free. Of course some, maybe even a lot, would have failed, but the data would have got cleaned up and we would all have been better served by a range of innovative apps at a fraction of the cost to the taxpayer. Unfortunately, that's why lots of people around the world do not see police.uk as a model worth emulating.

Do you have a full overview of what everyone is doing in this sector and the downstream impact of police.uk using taxpayer funds to further increase their market impact, in a kind of super-enhanced BBC-style monopoly of the crime data?

Perhaps we don't agree about the role of government in open data. But most genuine believers in open data would see the role of government to publish data and not to develop it and certainly not both. So it really isn't acceptable to avoid the competition issue especially as the HO invited developers to work with the crime data back when it was released in January 2011. The Open Data Institute to their great credit hosted a one day seminar about this which I spoke at http://www.theodi.org/events/open-data-comes-market-mysterious-case-disappearing-crime-apps which was sponsored by the EPSRC project SOCIAM. The postcode address file battle has been lost. Do you want crime data in a government-controlled monopoly too?

We continue to work with a growing wide range of people and organisations of differing views who reuse our refined data for their own purposes who all much appreciate working with an independent think tank with no agenda other than a commitment to data integrity. Here again, you've missed another point. Do you really think it is in the public interest to have only 1 monopoly supplier and what incentive do they have to find and correct their own mistakes and keep costs under control?

You are quite right that "crime and justice data is very sensitive to error and interpretation" and that is an argument for encouraging more players - not just ourselves - to find the errors and innovate in many directions many of us will not have thought of. The competition issue will not go away and will increase right across the board of open data as long as there is no rules-based system for what government can and can't do in the open data marketplace. OPSI only addresses a fraction of this through the Re-use of Public Sector Information Regulations 2005. And in the technology world, where companies succeed or fail within a year or even less, the regulators are too slow, so the rules have to be in place and obeyed first in order to create a fertile evolutionary landscape.

Until then, I fear that over the next few years, the open data meetings I go to will continue to have attendees that are on average 80% public sector, 5% journalists, 5% open data blogger/hangers-on and 5% who may or may not actually try to build a commercially viable platform / app / product.  

If the open data agenda was really working then those percentages would be inverted. I would hope everyone here would want that too? 

Should this be a new thread?

I think Owen is right when he says that Dan raises a fundamental discussion on open data policy - it arcs over a lot more than just the two original questions I asked, and perhaps more than even the police.uk project.

Would it be better suited to its own thread, rather than just as a sideline in this one? It feels like a big enough topic.

Dan's view on these two potential developments is quite clear, but I'd still be keen to hear from other people on the specific questions I started this thread with.

About promoting the forum to the mailing list - we've just been waiting for clarification about upcoming changes to crime categories from ONS and the HO Stats team, so we could bundle the news into one email. I expect us to be sending something out in the next ten days.

Alex

In reply to Dan Lewis ...

Dan, thanks for your response. Obviously we disagree on the fundamentals of open data policy, so I'm happy to let my previous post speak without much elaboration.

With regards to your ad hominem comments, I think this debate should really stand or fall on its merits rather than on my credentials. (Or yours, whatever they may be.)

However for the sake of clarification: My interest in Police.uk is mainly a function of my broader interest in unlocking public data. I have done quite a bit of work with crime indicators, though I won't pretend it's my specialism. My professional background is primarily in modelling of geographic perils for in-house insurance and risk management applications. I don't "publish" any work products; anything I put on the web is likely to be only blogs or hobby stuff.

I'm rather mystified by your mention of "live data" in response to exstat's post. Are you really not familiar with the British Crime Survey outputs? Surely you understand the analytic limitations of working only with recorded crime data?

-- Owen Boswarva, 26/04/2013

Live data???

Owen has beaten me to it.  I'm having difficulty working out what Dan thinks "recorded crime" actually means if he thinks the other option is "live" data.  I would be very chary of relying on an organisation which appeared to believe that for anything!

The British Crime Survey is a far better source for large area data on most types of crime, because it is not affected by changes to the instructions given to the police by the Home Office (remember the furore on Chris Grayling's interpretation of recorded violent crime data?) or by the rigour with which the police apply those instructions.  It's not perfect, not least because people's recall of crime over a given period is not totally reliable, but that unreliability is likely to be consistent, thus not affecting trends.

The wider issue of the interface between public and private sector roles is not straightforward. The rationale behind making open data freely available (in every sense) is in part that third parties will then develop innovative applications which generate value added for the economy, thereby compensating the country for the costs involved in making such data freely available in the first place. Many statistical datasets have come with geographies attached for years, and I doubt that Dan would argue that government should step back from that. To my mind, putting further statistical datasets on a similar basis is pretty much core, and increases the base set of data on which third parties can then build truly innovative applications. In other words, more applications can be developed because the source product has become richer. Make the geographical add-ins the property of one third party and it is easy to see financial or IP obstacles being placed in the way of further third parties, unless each of them reinvents the wheel. On the other hand, if it becomes a risk for a third party to develop something which doesn't currently exist (though arguably should do so) in the core product, it may never be developed at all.

So there is scope for a decent discussion about "what belongs where" but I think it should be based on what is likely to generate most innovation overall.

Hi Alex, We would like to see

Hi Alex,

We would like to see the crime data on Police UK website made available to public in the following format:

COORDINATES

* Both Lat/Lon and OSGB36/OSNI52 format. We do work with both sets of coordinates but it would be useful (since you have coordinates already on your system) to provide both sets of grid references. Not many people are able to convert from OSGB36/OSNI52 to Long/Lat and from Long/Lat to OSGB36/OSNI52.

DATA OUTPUT

* BULK download-type1 in one CSV file all crime data for all Police Forces. At the moment we have to download files separately for every Police Force for every month. So we would like to see one CSV file for "Crimes by Streets", one CSV file for "Crimes by Neighbourhoods" and one CSV file for "Outcomes by Streets" for ALL Police Forces combined for one month. In this case we would have to download only 12x3=36 monthly files. When zipped these files are easy to download.

* BULK download-type2 in one CSV file all crime data for all Police Forces for one year. So we would like to see one CSV file for "Crimes by Streets", one CSV file for "Crimes by Neighbourhoods" and one CSV file for "Outcomes by Streets" for ALL Police Forces combined for one year. In this case users can download only 3 annual CSV files. In the case of the current year, for example, you could provide CUMULATIVE file for 4 months only which will eventually become full annual file. When zipped these files are easy to download.

* I would also leave the option of all data being broken by month by Police Forces as you have it now.

* OUTPUT at different geographies. A few comments were about providing crime data at OA/LSOA/WARD level. If crime data at street level already have coordinates then anybody with GIS software is able to process this data and do point-in-polygon to assign OA/LSOA/Ward code. Census geography is available for download from ONS' website. In my opinion, it is more important that you provide data in a more efficient format (such as bulk download) than to do data processing yourself on behalf of users. If people do not have GIS software to process raw street crime data then I think providing data at OA level would be sufficient.

* POLICE STATIONS. We would like to see one CSV file that contains locations of all Police Stations in the UK with coordinates, postcodes and addresses. At the moment this is not available for download in a simple CSV format.

* BOUNDARIES. Is there a possibility of having Police Force boundaries (and neighbourhoods) in other formats apart from KML (Shape or MapInfo)?

Regards

Armin

Good Plan!

Just to say, I fully support this comment - if this was done, I think most people would be delighted!

Supporting the user base

I'm a bit perplexed that you think not many people are able to datum shift the Police.uk data, or convert boundaries from KML to other formats, yet you think users should be left to do their own point-in-polygon processing. That seems inconsistent. It really depends whether there is demand for the data at LSOA or Ward level. If there is, it would be more efficient to cut the data once at source than require users to make their own arrangements.

It's inherent in the nature of open data that we cannot easily characterise the user base. I think Police.uk should try to accommodate users with minimal technical skillsets, to the extent practical.

It sounds as if what we really need is a wizard like the one on the Nomis website, which allows users to specify geography, variables, format etc. and then generates a download file on the fly. However I'm not sure Police.uk has the wherewithal to implement something like that.

-- Owen Boswarva, 02/05/2013

GIS and Data users

Hi Owen,

I have met many users of GIS software who are capable of doing point-in-polygon processing but are not able to convert coordinates or KML files into Shape or MapInfo format. So I do not see any inconsistency in my comment. It is based on real life experience. I have also met many people who are able to do all sorts of complex GIS programming. Anybody who wants to use this type of data at LSOA level would be much better off having OA data, appending LSOA, Middle SOA, Ward, etc codes and aggregating the data themselves. All these data sets are very large, so any user who wants to process them will need a better tool than Excel, since it cannot handle large data sets and is not the most appropriate software for data manipulation. If we are talking about users with minimal technical skills in data manipulation and processing then those users do not need raw data. They need files at a much higher geographical level (wards, LAs, constituencies, etc).

I think what I have proposed in my earlier post (a different way of summarising and publishing the same Police.uk data) is very easy to produce and put on the website. Police.uk does not need to create any wizard where users can specify geography, variables, formats, etc. All we need is the raw data in CSV format, which can be read by almost every application, with lookup tables for various Police Force geographies. Combine this with already available Ordnance Survey data + data released by ONS and you have the ingredients that allow you to do a lot of things with the data.

Hi Armin

Hi Armin,

I'm totally with you on the bulk downloads suggestion - I've had to download and concatenate all the individual CSV files a couple of times myself, and it's a real PITA.
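For anyone facing the same chore, a minimal sketch of that concatenation step follows. It assumes every file shares an identical header row; the function name and the sample headers are hypothetical, and in-memory text stands in for the downloaded files.

```python
import csv
import io


def concatenate_csv(sources, out):
    """Write one header, then every data row from each source, to `out`.

    `sources` is an iterable of open text files (or file-like objects).
    Returns the number of data rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    header = None
    rows_written = 0
    for src in sources:
        reader = csv.reader(src)
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty file, nothing to copy
        if header is None:
            header = file_header
            writer.writerow(header)
        elif file_header != header:
            raise ValueError("CSV files do not share the same header")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Two made-up monthly files held in memory.
january = io.StringIO("Month,Crime type\n2013-01,Burglary\n")
february = io.StringIO("Month,Crime type\n2013-02,Robbery\n")
combined = io.StringIO()
total = concatenate_csv([january, february], combined)
```

With real downloads, `sources` would be files opened with `open(path, newline="")`, and `out` a file opened for writing in the same way.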

A wizard / download filter like Owen suggests does feel to me like a nice solution, which would meet both your needs of having one file, but also make it easy for people who only need a subset too. It probably wouldn't need to be as complex as the Nomis one, but something that lets you select the date range and geography (either all forces, certain forces, or a certain smaller geography [OA/LSOA]) you're interested in, and then dynamically generates the CSV file would certainly be possible.

We do need to cater for people who use the data outside of GIS systems, in code, databases, spreadsheets etc. I think we need to offer something more than just one big file, and definitely can't assume that everyone has the skills and software to allocate points to polygons that you do : )

Coordinates - If it's a deal breaker we could provide coordinates across three datums, but I'd really like to settle on one coordinate system to minimise complexity and technical debt our end.

Police Stations - I agree. Believe it or not, no such dataset currently exists at a national level. We're working on it!

Thanks again, some really useful comments there.

Alex

Consistency

I think the most important thing for users of police.uk data (be they web developers, academics or police/community safety practitioners) is consistency in what is available. If I'm going to spend time and money building a work-flow around particular features then I need to know that the same data will be available in a few months time. I think these changes should be additions rather than replacements. In my case (running the crimeinlondon.com website) this means CSV files for wards – new features are great, but not at the expense of what we have come to rely on. 

Three small points:

  • WGS84 is probably the easiest co-ordinate system for web developers to use, but (at least in my experience) in academia and the police service itself Eastings and Northings are much more common and more-commonly understood.
  • CSV files are much easier for bulk downloads than the JSON API. If I want to download ward-level data just for London for a month I only need one CSV file but I would need to do more than 600 API calls.
  • I disagree with Dan Lewis on the role of the Home Office in this. The HO's primary role is to keep people safe and helping people understand crime in their area is part of that. If other people can make use of HO data then that's great and if they can make money out of it then that's even better, but I don't think there is any public benefit to holding back police.uk to help developers make money.

Hi Matt,

Apart from making the crime categories more granular, this is the first real change to the open data formats since launch in January 2011. We're really cautious about doing anything that will mess up people's existing processes or code, so if we are making a change we want to make sure that we do it right, and it is suitable for the next few years.

The changes I proposed actually lend themselves to more consistency - currently we publish CSV files based on neighbourhood teams, which are in constant flux due to operational policing reasons. Switching to an 'external' geography which isn't going to change, like the 2011 LSOA boundaries, means the CSV data should be more reliable in the future.

The bulk download point is also taken - we'll definitely be making it easier to download all the CSV data in one go - by means of a full database 'dump', and a filter that allows you to pick the date range and specific force if you only need a subset to work with.

Alex


What services are going to be withdrawn in August?

Hi Alex,

Just to confirm the message that's appeared at police.uk/data (today?), are you completely discontinuing the provision of .csv files for streets, neighbourhoods and outcomes in August?

I appreciate that some forces change their neighbourhood team boundaries frequently, but in London they're based on ward boundaries. This means they are directly comparable with lots of other data published at the ward level and are useful for local councillors (who are elected for wards). I've spent months developing crimeinlondon.com to compare ward-level crime statistics with ward-level demographic information. It appears this will all be wasted if you move to exclusively providing data for LSOAs, since they are not coterminous with any other geographical units outside the OA hierarchy. At UCL we were also just starting the coding for a system to use the police.uk data for student projects, but this will now have to be put on hold until we’re sure what services you are withdrawing.

Is it possible to continue to provide data as .csv files as well as via the API? That’s what I meant when I previously said that any new services for developers should be in addition to what is provided now, not instead of the services that people have built their systems around.

Thanks,

Matt


Hi Matt,

No - you'll still be able to download CSV files of all the crime and outcome data, it'll just be a page on the data.police.uk site instead of the main www.police.uk site, and in a different (hopefully, for the majority of people, better) format. It will be broken down by LSOA instead of NPT though. I'll post a proper, separate thread about this later today.

On the crimeinlondon/met ward/npt issue specifically... I know at the moment the Met NPTs are coterminous with wards, but it's really not safe to rely on that remaining the case.

I found out this morning that the Met are currently restructuring their NPTs, and I've got no idea whether these new ones, when they go live, will be co-terminous or not. It's not something the Home Office have any say or control over, because it's an operational policing decision.

The good news is that as part of the data.police.uk improvements there will be a 'poly' parameter added to the 'crimes-street' and 'outcomes-at-location' API methods, which you can pass the coordinates of the actual, permanent ward boundaries to and get the data back. This will be a much safer, long-term way to source the data you need for the site.
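As a rough sketch of how a ward boundary might be passed to the 'poly' parameter: the thread only names the parameter and the 'crimes-street' method, so the "lat,lng:lat,lng" encoding, the "all-crime" path segment, the "date" parameter and the helper names below are all assumptions to check against the API docs once they are published. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed API root, based on the data.police.uk site mentioned above.
API_ROOT = "https://data.police.uk/api"


def poly_param(boundary):
    """Encode (lat, lng) pairs as 'lat,lng:lat,lng:...' (assumed format)."""
    return ":".join(f"{lat},{lng}" for lat, lng in boundary)


def crimes_street_url(boundary, month=None):
    """Build a crimes-street URL for a custom polygon.

    'all-crime' and 'date' are assumptions, not taken from this thread.
    """
    params = {"poly": poly_param(boundary)}
    if month is not None:
        params["date"] = month  # e.g. "2013-06"
    # Keep commas and colons readable rather than percent-encoding them.
    return f"{API_ROOT}/crimes-street/all-crime?" + urlencode(params, safe=",:")


# A made-up triangle standing in for a real ward boundary.
ward = [(51.5, -0.1), (51.6, -0.1), (51.6, 0.0)]
print(crimes_street_url(ward, "2013-06"))
```

Real ward boundaries can have hundreds of vertices, so a long polygon may need simplifying or sending in a POST body, depending on what limits the API ends up applying.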

I think what you're doing with crimeinlondon is great, and I'll be happy to help you out with switching over.

Alex


Re: Changes

Matt's right - Alex, please could we have a clear statement of exactly what will and what will not be available, prior to the live release of the data which will already be uploaded to police.uk?

3rd party developers who have already built systems around the existing setup need advance warning.


Please keep OSGB36 coordinates

Hi there,

Just to say that I welcome improvements to the download portal, and that better provision of bulk data would be very handy.  I fully support the comment by Armin (post #17) which is a good solution all round!

I would be very sorry to see the OSGB36 coordinates replaced with WGS84 lon/lat coordinates though - these should be provided in addition, not as a replacement.  I want the data to be consistent with Ordnance Survey's Open Data, as well as the statistical boundaries as supplied by ONS, all of which are provided in OSGB36 (in fact most government data is).  Also, the coordinate conversion from WGS84 to OSGB36 is an awkward step to add to any existing data loading process.  I know that Google and Bing have monopolised the mass online mapping market, and that for global data WGS84 is a good choice; but most expert users of GIS data in Great Britain still use (on the whole) OSGB36.

If there is a plan to provide summary statistics at polygon geographies, Output Areas are far superior to LSOAs in my opinion, but the data would ideally need to be "cut" prior to generalisation of the points.

Many thanks,

Jonathan.


Hi Jonathan,

Using a global coordinate system like WGS84 makes more sense for police.uk than OSGB36, because we also cover Northern Ireland.

The vast majority of the open data we make available already uses WGS84 (in the API and in all the boundary files we provide). I'll speak to the rest of the team here about the possibilities, but I think that harmonising on one coordinate system - and leaving it up to the end user to convert if they want to - makes more sense from a cost/benefit perspective than going the opposite way and trying to extend everything to have three-way support for WGS84, OSGB36 and OSNI52. It would certainly cost the tax-payer less!

On the OA/LSOA subject - you're right, the anonymisation of the data means that OAs aren't really suitable, because the risk of crimes 'jumping' across boundaries during the anonymisation is high. Plus they're generally very small. Even up at LSOA level, it looks like there are going to be lots of LSOAs with only one or two crimes.

So going more granular than LSOA isn't going to be that meaningful, and could potentially be very misleading.

Hope that explains the reasoning behind things a bit.

Alex


Coordinates and data formats

Hi Alex,

Thanks for replying.  I totally understand the rationale for using WGS84 to cover both Great Britain and Northern Ireland, but I have some issues with making the swap.

1) All the data published so far is in national grid coordinates, so all your early adopters have put systems in place to use these coordinates; your changes will favour certain new adopters but will undoubtedly annoy anyone who has already developed something to load and process the existing data format and in particular the coordinates.

2) I want to use the data in conjunction with OS OpenSpace and Ordnance Survey Open Data, which (last time I checked!) was also a part of government.  It seems a shame for Police.uk to favour Google/Bing users over those who choose to use data from our National Mapping Agency.

3) I don't buy the idea that "supplying the coordinates in three coordinate systems costs taxpayers more money".  The points have been supplied in the national grid coordinate systems for the last two years, so you already have systems in place to deliver them, and the WGS84 coordinates are equally possible (I've now seen the new portal!).  Where, exactly, is the cost in simply appending the lat, lon as new fields while retaining the old ones?  I can't see a genuine cost in that.

4) You need to give people more notice for changes like this, or at least dual-run both formats for a longer period.  Just one month is not a fair notice period to developers and entrepreneurs to make significant changes to their systems.  This kind of thing creates uncertainty and makes OpenData seem like a flaky thing to build a business on (which goes against the whole idea of the thing).  I for one am seriously put off by this.

I hope that explains my position.  Please take these comments seriously.

Many thanks and best regards,

Jonathan.


Hi Jonathan,

That’s useful feedback and thoughts, thank you.

Favouritism to one mapping provider over another isn’t something that’s played any part here.

However, we do need to use the appropriate tool for the job, and WGS84 is better suited to the task because of the extent of our geographical coverage.

If you look through the API docs (the API is essentially a thin layer on top of the back-end databases), you’ll see that everything from crime locations to neighbourhood meeting locations is held in WGS84 only. We don’t have OSGB36/OSNI52 coordinates stored anywhere, and haven’t for a very long time.

When the old maps.police.uk site launched nearly 4 years ago, the CSV files made available on the site (and the back end databases) used OSGB36. By the time we released the API six months later, the site had proved popular, we had realised that expansion beyond GB was likely, and switched everything over to WGS84. However, like you say, we didn’t want to break anyone’s loading and processing procedures, so maintained the CSV file only as-was for the next 3 and a half years, converting to OS coordinates at runtime when generating the files.

Making breaking changes isn’t something I take lightly, but with the introduction of new crime categories (cross-government) and the retiring of neighbourhood-level data it meant that changes to the format were inevitable anyway. Combined with the release of the data.police.uk ‘hub’, it felt like now was the most appropriate time to make the change.

I don’t anticipate there being any changes to the CSV files or API for the foreseeable future.

Having a longer cross-over period is a learning point we probably need to take away from this release. It’s actually two months (not one), but it would be good to get your thoughts on what kind of length would be appropriate.

Thanks

Alex


Thanks

Hi Alex,

Thanks for your considered and reasonable reply.  The explanation has helped, along with the understanding that changing the format will not be a regular occurrence!  I certainly prefer many aspects of the new download format, and will make the necessary changes to my loading process to read the new column layout and include the conversion from lat/lon to national grid (even though I've literally just completed a loading and processing procedure for the previous format.  Ah well, such is life!).
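For anyone adding the same step to a loader, here is a minimal pure-Python sketch of the lat/lon to national grid conversion for Great Britain. It assumes the commonly published seven-parameter Helmert shift from WGS84 to OSGB36 followed by the standard Transverse Mercator series; the constants should be checked against Ordnance Survey's documentation before relying on them. It takes height as zero, is only good to a few metres (Ordnance Survey's grid-shift transformation is the more accurate route), and does not handle the Irish Grid. The function names are made up for this sketch.

```python
import math

# Ellipsoid semi-major and semi-minor axes in metres.
WGS84 = (6378137.0, 6356752.314245)
AIRY1830 = (6377563.396, 6356256.909)

# Helmert parameters, WGS84 -> OSGB36: metres, scale, arc-seconds -> radians.
TX, TY, TZ = -446.448, 125.157, -542.060
S = 20.4894e-6
RX, RY, RZ = (math.radians(sec / 3600) for sec in (-0.1502, -0.2470, -0.8421))

# National Grid projection constants.
F0 = 0.9996012717
LAT0, LON0 = math.radians(49.0), math.radians(-2.0)
E0, N0 = 400000.0, -100000.0


def wgs84_to_osgb36(lat_deg, lon_deg):
    """Shift WGS84 degrees to OSGB36 lat/lon in radians (height taken as 0)."""
    a, b = WGS84
    e2 = 1 - (b * b) / (a * a)
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    nu = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = nu * math.cos(lat) * math.cos(lon)
    y = nu * math.cos(lat) * math.sin(lon)
    z = nu * (1 - e2) * math.sin(lat)
    # Small-angle Helmert transformation in Cartesian space.
    x2 = TX + (1 + S) * x - RZ * y + RY * z
    y2 = TY + RZ * x + (1 + S) * y - RX * z
    z2 = TZ - RY * x + RX * y + (1 + S) * z
    # Back to latitude/longitude on the Airy 1830 ellipsoid, by iteration.
    a, b = AIRY1830
    e2 = 1 - (b * b) / (a * a)
    p = math.hypot(x2, y2)
    lat = math.atan2(z2, p * (1 - e2))
    for _ in range(10):
        nu = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        lat = math.atan2(z2 + e2 * nu * math.sin(lat), p)
    return lat, math.atan2(y2, x2)


def latlon_to_national_grid(lat_deg, lon_deg):
    """Return (easting, northing) in metres for a WGS84 lat/lon in degrees."""
    lat, lon = wgs84_to_osgb36(lat_deg, lon_deg)
    a, b = AIRY1830
    e2 = 1 - (b * b) / (a * a)
    n = (a - b) / (a + b)
    sin, cos, tan = math.sin(lat), math.cos(lat), math.tan(lat)
    nu = a * F0 / math.sqrt(1 - e2 * sin ** 2)
    rho = a * F0 * (1 - e2) / (1 - e2 * sin ** 2) ** 1.5
    eta2 = nu / rho - 1
    dlat, slat = lat - LAT0, lat + LAT0
    # Meridional arc from the true origin.
    m = b * F0 * (
        (1 + n + 1.25 * n ** 2 + 1.25 * n ** 3) * dlat
        - (3 * n + 3 * n ** 2 + 2.625 * n ** 3) * math.sin(dlat) * math.cos(slat)
        + (1.875 * n ** 2 + 1.875 * n ** 3) * math.sin(2 * dlat) * math.cos(2 * slat)
        - (35 / 24) * n ** 3 * math.sin(3 * dlat) * math.cos(3 * slat)
    )
    dl = lon - LON0
    northing = (
        m + N0
        + nu / 2 * sin * cos * dl ** 2
        + nu / 24 * sin * cos ** 3 * (5 - tan ** 2 + 9 * eta2) * dl ** 4
        + nu / 720 * sin * cos ** 5 * (61 - 58 * tan ** 2 + tan ** 4) * dl ** 6
    )
    easting = (
        E0
        + nu * cos * dl
        + nu / 6 * cos ** 3 * (nu / rho - tan ** 2) * dl ** 3
        + nu / 120 * cos ** 5
        * (5 - 18 * tan ** 2 + tan ** 4 + 14 * eta2 - 58 * tan ** 2 * eta2) * dl ** 5
    )
    return easting, northing


# A point in central London should land in the TQ grid square.
print(latlon_to_national_grid(51.5074, -0.1278))
```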

As for the change-over / dual-running period, I would recommend 6 months.  I know it may seem like a long time, but actually many data providers provide this or longer when making major changes.  Certainly any businesses with production systems could have some major IT hoops to jump through in order to change their loading processes, with all the testing that needs to be done, etc...  Some businesses would need to allocate resource and funding to such a change, while others would need to deal with contractors, etc..., all of which takes time.  Also, having several consecutive months of sample (or real) data provides a much better test of new processes than just one or two files.  As an additional suggestion, I think that making sample data available much earlier would help a lot - part of the problem with this release is that the sample data arrived almost at the same time the new portal was released, and that there is only one (or two?) more files due to be released in the old format.

One final thing - are there likely to be further classifications added in the coming months/years, or has that stabilised again for now?  I welcome the new classifications, but it would be so much easier if all the remaining classifications that are suitable for "public consumption" could be added at the same time. That's because when calculating summary statistics over a particular geography, it generally requires a separate database column for each classification - so each change is a bit of a headache - particularly when it happens mid-year, in which case one has to decide between having a discontinuity, or whether to "calculate" the old classifications from the new ones in order to maintain consistency.  Perhaps I should design my database tables to be more flexible, but I'm going for performance rather than flexibility! :-)
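One way to "calculate" the old classifications from the new ones is a lookup applied while counting, so the summary table keeps its existing columns. This is only a sketch: the category slugs and the new-to-old mapping below are illustrative assumptions, not an official correspondence, and the function names are made up.

```python
from collections import Counter

# Hypothetical mapping from newer, finer categories to older, broader ones.
NEW_TO_OLD = {
    "bicycle-theft": "other-theft",
    "theft-from-the-person": "other-theft",
    "public-order": "public-disorder-weapons",
    "possession-of-weapons": "public-disorder-weapons",
}


def legacy_category(category):
    """Fold a new category into its assumed older equivalent, if any."""
    return NEW_TO_OLD.get(category, category)


def counts_by_area(rows):
    """Count crimes per (area, legacy category) from (area, category) pairs."""
    counts = Counter()
    for area, category in rows:
        counts[(area, legacy_category(category))] += 1
    return counts


# Made-up rows: (area code, category as published).
rows = [
    ("A", "bicycle-theft"),
    ("A", "other-theft"),
    ("A", "burglary"),
    ("B", "possession-of-weapons"),
]
print(counts_by_area(rows))
```

Keeping the raw category alongside the folded one means the finer breakdown is still available later, at the cost of one extra column rather than one per classification.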

Best wishes,

Jonathan.


I too welcome the improvements. The ability to bulk download the data, the linkages to OAs and LSOAs, the expansion of the crime categories, and the inclusion of lat/lon as well as the existing OSGRs are all to be welcomed. However, I am interested in the classification of neighbourhoods into types - geo-demographics and street-level crime data could prove to be very useful if Scotland also published the same data. I have been hunting around but I cannot see that this is going to happen. Does anyone know what Scotland are planning to do?
