Combined Online Information System
Posted by Her Majesty's Treasury on 16/12/2011 106 comments
The Combined Online Information System (COINS) is a database of UK Government expenditure provided by government departments. The data is used to produce reports for Parliament and the public including: expenditure data in the Budget report; Supply Estimates; Public Expenditure Statistical Analyses (PESA); Whole of Government Accounts (WGA); the monthly Public Sector Finance Releases. It is also used by the ONS for statistical purposes..
15/12/11 – Following the publication of the Whole of Government Accounts 2009-10 on 29 November 2011, the raw data used by WGA has now been released.
21/09/2011 - Following the initial release of COINS raw data in June 2010, additional data for the year 2010-11 has now been released, along with updated data files for the years 2006-07 to 2009-10, in September 2011.
Data on COINS are only maintained for five prior years, so that years from 2005-06 back are no longer updated. The data files released in June 2010 (for the years 2005-06 to 2009-10 inclusive) remain available.
15/12/11 – Following the publication of the Whole of Government Accounts 2009-10 on 29 November 2011, the raw data used by WGA has now been released.
21/09/2011 - Following the initial release of COINS raw data in June 2010, additional data for the year 2010-11 has now been released, along with updated data files for the years 2006-07 to 2009-10, in September 2011.
Data on COINS are only maintained for five prior years, so that years from 2005-06 back are no longer updated. The data files released in June 2010 (for the years 2005-06 to 2009-10 inclusive) remain available.
Comments (106)
Thanks for this
Thanks for this article
agriturismo
When I first Discount
When I first Discount Louboutins set eyes on cheap louboutin shoes Helen Steele's SS12 collection, with its vibrant prints on sumptuous silks, I immediately thought it would make for a great story christian louboutin pumps on modern florals, perfect for Chelsea Flower Show Christian Louboutin New Arrivals week. But then I got closer and discovered these weren't florals at all but abstract prints which opened the gateway to an intriguing world Christian Louboutin Boots of thrash metal, duck farming and Christian Louboutin Ankle Boots bungee painting...
Irish artist Christian Louboutin Peep Toe Pumps and fashion designer, Helen Steele, works from her studio in what she describes as "the lush rolling hills of Monaghan, poet Patrick Kavanagh's homeland" in a converted duck hatchery. Not as odd a choice as it may first appear when you hear that Christian Louboutin Pumps her husband is a duck farmer who, according to chef Heston Blumenthal, produces the best Peking duck in the world. From this unusual base, Christian Louboutin Sandals she creates fine art, for customers including Christian Louboutin Wedges fashion luminary and internationally reknowned print queen, Diane von Furstenberg, as well as fashion range, using some unconventional print-making http://www.louboutinpumps-cheapsale.com/ techniques
I appreciated your article.
I appreciated your article. I'm writing a report on this topic for my blog. Thanks
Hotel Toscana
bgnb
Air Conditioner Repair
Thanks for this anyway.
Amazing
Amazing, the up to date information has enhanced my knowledge allot. Keep posting such useful posts so that I can reap maximum benefits from them. Thank you
logo designer
I welcome the move to
I welcome the move to transparency but note all the requests for a summary in an "easy to use form". Although the raw data is useful for those with the time and skills to be "armchair auditors" it would be wrong for this to become the only form of audit.
Hotel Roma
Thanks for this article. I
Thanks for this article. I was looking for this information
Hotel Roma
I am happy to find this post
I am happy to find this post very useful for me, as it contains lot of information. I always prefer to read the quality content and this thing I found in you post. Thanks for sharing small flour mill machinery
Anyone saying bittorrent
Anyone saying bittorrent gives you viruses is a complete idiot, torrenting is fine!
Beat Making Software
Great
Buy Viagra Online
I agree with your point, please share with us more good articles.
Guest
with any specific builder who fits your criteriaCoach Outlet. and complex
carbs found in beans that achieve this effectLouis Vuitton
Handbags. should be clean and positive but not preachy and dull [url=http://www.cheapcoachcouponspurseshandbags.com]Cheap Coach Purses[/url]. There are other Christian youth group activities for churches to consider[url=http://www.coachbagsfactory2012.com]Coach Factory[/url].
Total health expenditure by age groups
Hello,
I need UK total health expenditure for 1950-2011 by age groups <1,1-4,5-9,10-14,15-19 and etc. Could you please tell where I can get it?
Raw data - we must trust someone to digest it
There are various comments here about the impossibility of digesting all this data. I welcome the move to transparency but note all the requests for a summary in an "easy to use form". Although the raw data is useful for those with the time and skills to be "armchair auditors" it would be wrong for this to become the only form of audit. Surely the "easy to use form" is a set of published financial statements, together with assurance from an independent, trusted source that those statements represent a true and fair view?
Doesn't everything we see on this site emphasise the value of the National Audit Office and related functions for Scotland, Wales, the health service and local government?
Craig A
Seruritie Vunribilities
Using Bit Torrent, which is know to transmit viruses, re-enforces my feeling Government don't want people to understand and scrutinise this information.
awesaome topic
What is likely to be embarrassing for the Government is when searchers discover payments which were made as a result of proposals which were approved, became projects, and were later discovered to be fraudulent and covered up.
Buy Backlinks
Bit Torrent does not transmit
Bit Torrent does not transmit viruses. People who wish to infect your computer with a virus, publish copied versions of a computer program with a virus installed on public Bit-Torrent websites.
This site is not for the public to add torrents, they have only been created the site owners. The files are CSV files which cannot be used to transmit viruses.
Viruses are only spread via Bit Torrent when the author of the Torrent intended you to download a virus.
Very user-unfriendly
This data is clearly not designed for Joe Public to use.
Creating lists of huge files without meaningful description is the worst way to manage and present information.
Transparency is great, but only if people can find the information they want.
2/10 for this effort.
Raw data, hard to digest
Hi there, I agree with the chaps that you have done a wonderful job releasing all this information. And I also agree that though the format may not be ideal, the various technologies for its dissemination are a good choice. However, I'd make a tentative suggestion... Now that you have the data readily at hand, you might as well make it a little bit more digestible.
To bring you an example, my company produces mobile VAS solutions for big telcos, and we get immense amount of raw data. My line managers expect me to provide them with intelligible insights that enable them to make informed decisions. So I dig through the pile, slice it and dice it, until it makes sense. I do not draw conclusions, at least not explicitly, nor make decision, only possible suggestions of alternative ways for interpretation.
I think in this case, the public does expect two things. The raw data, so the government can be held accountable. And an informative, easy to undertsand form of what the essence of 44 gigs of this data is. (Yes, I know, I am fully aware that it gives rise to criticism, but let's face it, what does not?)
Thanks,
Lefty
data quality questions
I wonder why these data are encoded in utf16?
Is there a formal description of the schema - the data's hardly exported in tnf, but it would at least be nice to get the format descriptions. I cannot find this info in the treasury docs.
fwiw, the raw data for facts..2009_10 compresses down to 15MB with bzip2 after ripping out the NULLs, as opposed to the 70MB on the web site.
The coins data is actually
The coins data is actually pretty small
try www.publicspendingdata.co.uk for smal csv files for each department / Program / account
More tips for Linux users
The file adjustment_table_extract_2009_10.txt contains many broken lines, i.e. some fields contain one or more newline characters. Here is a small Perl script to convert from UTF-16 to UTF-8, repair the breaks and do some filtering:
#!/usr/bin/perl
# - convert text encoding from UTF-16 to UTF-8
# - convert DOS/Windows newlines (CR-LF) to host system's newlines
# - fix broken lines (spurious line breaks in some records)
# - delete "NULL"s
#
# Examples of usage:
# coins.pl 82 < infile.txt > outfile.txt
# zcat infile.gz | coins.pl 82 > outfile.txt
# 7z -so e infile.zip | coins.pl 82 > outfile.txt
#
my $fields = $ARGV[0]; # Number of fields expected (81 in fact table, 82 in adjustment table)
my $part;
binmode(STDIN, ':raw:encoding(UTF-16)');
binmode(STDOUT, ':raw:encoding(UTF-8)');
while (<STDIN>) {
s/[\n\r]//g; # remove CR and LF
s/\@NULL/@/g; # remove "NULL"
my $data = $_;
my $str = $data; # count the number of fields
$str =~ s/[^@]//g;
if (length($str) > $fields-2) { # enough fields?
print "$data\n";
$part = "";
} else { # not enough fields
$part .= $data; # join the parts that we have so far
$str = $part;
$str =~ s/[^@]//g;
if (length($str) > $fields-2) { # do we have enough now?
print "$part\n";
$part = "";
}
}
}
Geographic Granularity - a null field
Local government... (also citizen bodies?) would I am sure appreciate some drill down route-map into what the local spends are... perhaps down to NUTS 3 level or even district council.
From a quick scan of the code table there I can't see a way into this..
Problems with the adjustments file
Is anyone else having problems with the adjustments file? Having converted the 2009-10 file to utf8, I'm finding that some records have newlines in them, so that when I try to import the file into PostgresQL the loader complains that lines are truncated. I'm also seeing some lines that have more fields than they should.
2009_10 adjustments file
Hi Ian
I've not got any embedded newlines. however, there are a lot of files with embedded @ signs in the data, leading to (I think) 466 lines with broken formatting. Notwithstanding comments on the site, it's not simple to identify where these have gone wrong with general regex's: 528 lines have the string "@ ", 8295 the string " @", 406 lines have the string " @ ".
I think that the only realistic way to get this fixed is to ask for better exporting mechanism.
I find it ironic that there's a big ad for the semantic web on this page as SW depends on accurate data.
I must say that I'm struggling to understand what the data means, eg there seem to be some very odd records (GDP figures and GDP deflators), reading the notes, I cannot see how the number of records can fall between forecasts, but it does and some of the snapshots look very short.
Tim
Adjustment Files - Rosslyn
The adjustment files were very poorly prepared. We found thousands of errors in both files. We had to adjust our data validation tool and create a version just for these files. Combinations of missing end of lines and poor column delimeters - who ever thought of using '@' to seperate columns, when this is also widely used in the descriptions. Obviously the data prepartion must have been done in a hurry and not tested.
Couple of hints - start looking for the following in the rows and replace the '@' with a space:
1. ' @ '
2. '@ xx-' where xx can be any value from 01 to 31
Tools such as UltraEdit are quite useful but you really need to a data validation application that can be modified on the fly, else these files will not be accurate.
Please visit https://rapidgateway.rapidintel.com to get access to the data in user friendly format.
Patience required
To all those who are complaining about the size of the files, format, etc, please just have a bit of patience. There are many groups who are working to provide more person-friendly ways of drilling down into this data. It's a massive step that it has been released at all - I've been involved in open data for some time, and I didn't expect HMT to release COINS data on anything like this timescale. So kudos to them. Everyone else just hang-on a short while until the user-friendly interfaces come along, or better still pitch in with ideas of the capabilities you want. What queries would you want to ask if you had a suitably user-friendly tool?
Tips for Linux users - correction 2
Sorry, the HTML filtering on the web site corrupted the text file in my previous comment. Here it is again:
#!/usr/bin/perl
binmode(STDOUT, ':raw:encoding(UTF-8)');
binmode(STDIN, ':raw:encoding(UTF-16)');
print while <STDIN>;
Tips for Linux users - correction
The method I gave in my last comment for converting from UTF-16 to UTF-8 doesn't work for the large files due to a limitation in iconv.
Instead, you can use a small Perl script. Copy the following four lines into a text editor:
#!/usr/bin/perl
binmode(STDOUT, ':raw:encoding(UTF-8)');
binmode(STDIN, ':raw:encoding(UTF-16)');
print while ;
Save the file as "utf16to8.pl" in your home directory. Make the file executable:
chmod +x utf16to8.pl
Now you can convert a text file of any size from UTF-16 to UTF-8 as follows:
~/utf16to8.pl < filename.txt > filename.utf-8.txt
Or, to unzip, convert and remove NULLs all in one go, type:
7z -so e filename.zip | ~/utf16to8.pl | sed -e 's/@NULL/@/g' > filename.utf-8.nonull.txt
Ironic, as it was government
Ironic, as it was government that "forced" OS to lock everything up and sell it's data in order to become a self supporting cost centre.
Radio
Re: Tips for Linux users
This was very helpful - thank you. The iconv solution worked fine on my 64bit Ubuntu 10.04 system.
Linux users more tips
#!/bin/bash
wget -o getdatai.coinsUrls2010.log -i coinsUrls2010
wget -o getdata.coinsUrls2009.log -i coinsUrls2009
for i in `ls -1 *.zip`
do
echo uncompressing and cleaning $i
cat $i | funzip | tr -dc "[:alnum:][:space:][:punct:]" > ../data/$i.txt
rm $i
done
note that using a stream decompressor such as funzip means you dont need p7 or large files support
Ive put the out turn data into a html format a bit like the gurdian on
http://www.publicspendingdata.co.uk/
and there are also .csv files of just the Outturn data for you to download and analyze in excel... the files are pretty small and easily downloadable which might be useful for those suffering decompression difficulties...
Tips for Linux users
The compressed file can be extracted using 7zip. The result is a text file in UTF-16 encoding which can be converted to the more common UTF-8 encoding using iconv. The NULL strings can be removed using sed.
Extract the data:
7z e filename.zip
where "filename.zip" is, for example, fact_table_extract_2009_10.zip.
Convert from UTF-16 to UTF-8:
iconv -f utf-16 -t utf-8 filename.txt > filename.utf-8.txt
Remove NULL strings:
cat filename.utf-8.txt | sed -e 's/@NULL/@/g' > filename.utf-8.nonull.txt
Or, more succinctly, you can do the whole job in one go:
7z -so e filename.zip | iconv -f utf-16 -t utf-8 | sed -e 's/@NULL/@/g' > filename.utf-8.nonull.txt
On Debian based systems, 7z is part of the p7zip-full package.
Hope that helps someone.
Improved spend visibility
Rosslyn Analytics, a London-based technology company that specializes in enabling organisations to quickly and easily obtain spend visibility, has launched a dedicated portal that gives the general public the ability to view the UK government’s recently published public sector data from COINS. This portal can be found at https://rapidgateway.rapidintel.com.
Great
car emblemsThis is my first time i visit here. I found so many entertaining stuff in your blog, especially its discussion.
This is my first time i visit here. I found so many entertaining stuff in your blog, especially its discussion.
Rapid Gateway access - login
The userid and password given on the site, do not work.
Peter.
A few more useful links
The Open Knowledge Foundation have been logging some of their learning on accessing the data in this Etherpad: http://pad.okfn.org/coins
The page includes links to spreadsheets created to work out what the codes in the dataset mean, contains some example python code for accessing the data. It also links to this set of Python scripts: http://bitbucket.org/okfn/coins
Rosslyn Analytics have imported the data into their analysis tool which generates graphical representation of some of the data: http://www.rosslynanalytics.com/newspress/index.php/rosslyn-analytics-ma...
Equivalent data set for income?
Will it be possible to get an equivalent of the COINS dataset for Income that the treasury receives?
Well done
It is nice to see that the goverment is being open with the information it hold.
Well done.
Hard work extracting
Perhaps in the coming days the data.gov.uk guardians can provide alternative formats for the data.
Having spent a full minute downloading the data (wow for torrents), it has taken me a long time to unzip.
Perhaps encode as ASCII and offer in a .tar.gz format?
So many people complaining about the format...
It's possibly the best format this data could have been released in. Seriously, this way anyone with a bit of python/perl/php/ruby/etc hacking knowledge and first year statistics tuition can do their own data mining and present it to the world, rather than everyone being stuck with the govt's pre-chewed spun-to-death version.
Complaints? Frankly I'm amazed the government has done this. Long may it continue.
Live Data
It would be nice to see access to the live data, via a database connection.
And its all very well expecting people to create software to access the data, but it would be great if some code that reads the files was made available along with it, that would go along way to explaining the format of the data.
Awesome
This is great. As mentioned above, that this is the raw data is very good. For those who are having a tough time importing it, rest assured that there are a whole bunch of concerned citizens and organisations that will be putting it together in a more digestible format very soon.
Data (So Far)
So I've spent a few hours working at the data - first part was going through the process of putting it in UTF8, then re-writing it into a more traditional CSV, importing the whole thing into MySQL and started rationalizing it for things such as Department Codes, Account ID etc, setting table indexes and pulling out DISTINCT sets.
Unfortunately the more processing I do the more stags I hit... SQL info result statements included in the CSV, clusters of non-integer account codes and so forth.
It looks like the Guardian has beat me to the punch in creating a searchable index, but the data format hasn't exactly helped - I understand this was rushed but IMO spitting out cleaned-up data with separate CSVs for each department and a list of department codes in an additional CSV would have taken little additional time on the governments end and have helped those of us not on the civil service payroll enormously :)
-- Mark
Department CSVs
Cleaned up data in CSVs by department and a list of department codes would be so cool! I can carry out basic analysis but don't have the knowhow to clean and organise the data.
If you manage it, please post a link ;-)
Impressive
This is a fantastic initiative. This sort of transparency should be the "default" for all government data. Now could you guys export this to North America now? ;-)
For Linux users
To decompress the fact table, install p7zip and run "7za e fact_table_extract_2009_10.zip".
The files themselves are encoded as UTF-16 (not UTF-8) text, which is a remarkably inefficient encoding for data that is probably mostly ASCII to begin with, but that's probably the fault of whatever database export/dump utility they are using on the system.
With data of this level of volume, you can't expect to use ordinary desktop tools for analysis. The Government should be commended for making this available - I don't know of any other democracy reaching this level of transparency.
Nice!
I think this might take a few minutes to sift through.
I would urge everybody to 'read the readme' (as it were) to be sure you know what it is that you are looking at. At least skim it, especially the "what the data won't tell you" bit.
For people who are just curious as to what it is, pages 11-20 of the guidance pdf have explanations of the column headings which might answer a lot of questions without having to download all the data.
It might be helpful for someone to add to the above datasets a 'representative sample' file with e.g. a hundred or so lines of data, so people can at least get an idea of what it looks like.
Rather more detailed/complicated than "DWP paid x for a red stapler on April 1st"...
-- DW.
Ways of exploring the data (without downloading it)
The guardian have produced a COINS Explorer interface for people who want to browse the data here: http://coins.guardian.co.uk/coins-explorer/search
The guardian site allows you to drill down into particular departments or areas of the dataset and then to export a CSV file. By heading deep enough into the data it should be possible to get simple CSV files which will open even in old versions of Excel etc.
The WhereDoesMyMoneyGo website also host a copy of the dataset here: http://coins.wheredoesmymoneygo.org/coins
For anyone who just wants to browse what's available at present - those should save a lot of hacking about with the data.
another concern
I do not want to complain on the weight or the format of the data, but it seems to me that this is just a query from a database and nobody did care about the quality after it (who uses @ as a separation element?!?!. the most difficult thing is not importing the data in this or that system but controlling that everything is correct because a few records do not import in the right shape. however, I really enjoy reading this sentence from the guidance booklet "Therefore any data extracted from the database are almost immediately out of date." any?in 5Gb data?
great
Is the Government’s Transparency Agenda having a positive impact on businesses that work with or would like to work with the public sector?
Closing The Sale