The real political nerds

May 8th, 2010 by Ben Goldacre in bad science, politics, structured data | 27 Comments »

Ben Goldacre, The Guardian, Saturday 8 May 2010

Data matters. We use it to understand what has already happened in the world, and we use it to make decisions about what to do next. But in among the graphics and electoral cock-ups lies a terrible truth: a small army of amateur enthusiasts are doing a better job of collecting and disseminating basic political data than the state has managed.

Chris Taggart blogs at CountCulture and was baffled to discover that there is no central or open record of the results from local elections in the UK. If you go to the Electoral Commission’s website, they pass the buck to the BBC, where you can find seat numbers for each area, but no record of how many votes were cast for each candidate. Plymouth University holds an unofficial database of these results, and they pay people to type every single one of them in, painstakingly and by hand. After all that they charge for access, which is perfectly understandable. So for democracy, open analysis, and public record, it might as well not exist.

“Want to look back at how people voted in your local council elections over the past 10 years?” asks Chris: “Tough. Want to compare turnout between different areas, and different periods? No can do. Want an easy way to see how close the election was last time, and how much your vote might make a difference? Forget it.”

Like so many data problems, all that’s needed is a tiny tweak. All this information is known to someone, somewhere, and it has all been typed in several times over, in several places: local websites, newspapers, and so on. Chris is pushing a simple solution that is common throughout IT: a standard set of invisible tags on all local authority results webpages, so that the electoral results data can be consistently read and understood by computers, and collated for analysis by anyone who wants it. It costs nothing, it’s already compulsory for public consultation data, and Chris is making genuine headway with his simple idea to solve a huge problem, not because it’s his job in some dismal quango, but for a laugh.
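To make the idea of “invisible tags” concrete, here is a minimal Python sketch of the consuming side. It assumes a results page where each candidate row carries hypothetical class names (`result`, `candidate`, `party`, `votes`). These names and the sample page are invented placeholders, not the tagging scheme Chris actually proposed. The point is only that once every council uses the same labels, one short script can read any of their pages.

```python
from html.parser import HTMLParser

# Invented sample page. The class names "result", "candidate", "party" and
# "votes" are illustrative assumptions, not an agreed standard.
SAMPLE_PAGE = """
<table class="election-result">
  <tr class="result"><td class="candidate">A. Smith</td><td class="party">Party X</td><td class="votes">1,204</td></tr>
  <tr class="result"><td class="candidate">B. Jones</td><td class="party">Party Y</td><td class="votes">987</td></tr>
</table>
"""

FIELDS = {"candidate", "party", "votes"}


class ResultParser(HTMLParser):
    """Collects one dict per element tagged class="result".

    Only handles flat cells: markup nested inside a tagged cell is not supported.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the tagged field we are currently inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "result" in classes:
            self.rows.append({})  # a new candidate row starts here
        for name in classes:
            if name in FIELDS:
                self._field = name

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data.strip()


def parse_results(html):
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    # Turn vote counts such as "1,204" into integers so they can be analysed.
    return [
        {**row, "votes": int(row["votes"].replace(",", ""))}
        for row in parser.rows
        if "votes" in row
    ]


if __name__ == "__main__":
    for row in parse_results(SAMPLE_PAGE):
        print(row)
```

Running it prints one dictionary per candidate, with the vote count already converted to a number. Results in that form are ready to be collated with other councils’ results.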

Until StraightChoice was set up by idealistic nerds, nobody kept a record of the election materials which are distributed to the public across the country. Anyone could contribute simply by sending in an image of a leaflet, and Julian Todd now has an archive which political librarians would cry for, and it betrays many crimes.

There are the inevitable dodgy graphs, with parties using playfully distorted axes, and even citing European and local council election figures where it suited them (a Conservative leaflet in Holborn and St Pancras demotes the Lib Dems from their actual second place to third, and so on). They want a system where copies of every leaflet are formally sent to the Electoral Commission’s website, as with copyright libraries, and properly enforced regulations forbidding graphs which mislead tactical voters.

But beside the evidence of sneakiness, these volunteer projects are also generating data that provides a valuable insight into how politics works, on a par with the kinds of stuff you’d find on UKDA, the UK Data Archive for academics. StraightChoice, for example, has found a huge variation in activity, from a single leaflet in one safe Liverpool seat to 51 in the nearby marginal Liverpool Wavertree.

And what about policies? Francis Irving is one of the founders of mySociety, a charity set up to facilitate public engagement with democracy through nerdy solutions. They built TheyWorkForYou, which tells you more about parliamentary activity than Hansard does, using the same dataset, but using it properly. “Wouldn’t it be nice,” he asks, “to have structured data on what the candidates think on a series of local and national issues?”

Neither academics, nor parties, nor the media have achieved this: but 6000 activists around the country have worked on an incredibly complicated crowdsourcing operation built around DemocracyClub, again set up by two volunteers, Seb Bacon and Tim Green. With the help of mySociety, they populated the YourNextMP database of candidates, itself the baby of another volunteer, Edmund von der Burg. This data is now freely available, a resource for any political theorist or technically capable adolescent, right down to its rawest form.

Data is the fabric of our lives, and everywhere around us: but to be analysed, so it can generate new knowledge and understanding, it must be corralled into one place. In an ideal world, these empty frameworks would be built by national institutions: until they wake up, we have our nerds.



27 Responses



  1. Statto said,

    May 8, 2010 at 11:59 am

    On a related note, I was wondering if anyone knows where one could obtain some data about recounts. I’d like to compare the error on a count with the sizes of small majorities.

  2. ellieban said,

    May 8, 2010 at 12:10 pm

    Statto,

    Has the recount at Oxford West got anything to do with that interest? Mine was piqued when I read somewhere that the first count only gave a 30 vote margin, which increased to 178 after the recount. That’s a big difference. Time we switched to an electronic system?

  3. BrotherLogic said,

    May 8, 2010 at 12:27 pm

    “Want to look back at how people voted in your local council elections over the past 10 years?” asks Chris: “Tough. Want to compare turnout between different areas, and different periods? No can do. Want an easy way to see how close the election was last time, and how much your vote might make a difference? Forget it.”

    A lot of it’s on Wikipedia. en.wikipedia.org/wiki/Sheffield_Council_election,_2007 for example. Plus isn’t this the kind of thing that data.gov was set up to solve? Only in the past few years has open data really become an issue, and things are moving in the right direction, so maybe we should just be patient.

  4. Richard Gadsden said,

    May 8, 2010 at 12:35 pm

    There is no official source for what the first count results were – the returning officer tells the candidates and agents that they are “minded” to declare a result and then someone requests a recount. Those “minded” results are pencilled on a bit of paper and are just destroyed after the result is declared, so they are only recorded if one of the candidates or agents notes them down and chooses to make them public.

  5. pajamapaati said,

    May 8, 2010 at 12:46 pm

    “Wouldn’t it be nice … to have structured data on what the candidates think on a series of local and national issues?”

    Honourable mention, then, to the Skeptical Voter wiki skeptical-voter.org/wiki which recorded the views of prospective MPs on a range of skeptical-related issues including a set of questions posed to candidates (e.g. the place of religion in law making and education, scientific advisers in policy-making etc)

  6. Statto said,

    May 8, 2010 at 12:49 pm

    Ellieban: Maaaaaaybe.

    Richard: Thanks for the info. :) That’s a bit worrying, isn’t it? I’d've thought there’s a case that we have a democratic right to know how a result was obtained… Is the final count always the result declared? Is there any reason to give more weight to subsequent counts? (eg do they do them somehow more carefully?)

  7. adamsaltiel said,

    May 8, 2010 at 12:51 pm

    This is what I meant in my recent tweets. Not the best medium for conveying anything a bit more complicated.
    Thanks for an excellent article, but there is something shameful here. Let me explain. If you go on the Cabinet Office website you will see an invitation to the public to come up with ideas for open data. The submitted ideas are really ridiculous, as we can see in the context of the 2010 UK election. This is an example of how government can become counterproductive and start to get things really wrong.
    And your article shows what has begun to motivate a lot of my political thinking: that small is better, and small works in a way that institutions do not.
    I do not think that either government or Whitehall are malicious, but I do think they can get into a groove which is mistaken. Our collective job is to find a remedy for this. There are no simple answers.
    The website is very unfortunate. A misfortune slipping towards incompetence. And more than anything a missed opportunity. To this extent it is shameful.
    As to the availability of the data for open government and open access, the Cabinet Office does not get my vote of confidence. It seems to me, in the effort towards openness, that one of the first areas that needs to be explored is the relationship between the Cabinet Office and government on the one hand, and Whitehall and industry on the other.
    Meanwhile I will be checking out with enthusiasm the projects you have brought to my attention. Again, many, many thanks!
    Adam

  8. hakan said,

    May 8, 2010 at 12:53 pm

    Electronic voting with no paper trail and easy to rig elections? No thanks! 2nd term of Bush should be a lesson to all of us.

  9. daven said,

    May 8, 2010 at 1:47 pm

    A lot of this data has already been collected and made available in Northern Ireland, on sites like www.ark.ac.uk. The Linen Hall Library has been collecting political posters and leaflets (not just election ones) since the start of the troubles.

    England is just catching up with the experts in democratic innovation, conflict resolution and forming coalitions between people who hate each other.

  10. the.Duke.of.URL said,

    May 8, 2010 at 1:54 pm

    @hakan

    Electronic voting need not entail no trail. And it is possible to set up such a system so that it can be forensically audited. This would make it much harder to rig any elections. Of course, any system can be abused.

  11. adamsaltiel said,

    May 8, 2010 at 2:29 pm

    @BrotherLogic I don’t think we should be patient. I think the projects mentioned in Ben’s article argue against this. We have been ‘patient’ for far too long. The technology could have been embraced by government about 5 years ago, or maybe a bit longer. It all exists. The problem has been that government has historically refused to embrace pragmatic solutions and has instead preferred large solutions from large suppliers. This area is a very good, fundamental illustration of why centralised IT is so very wrong. Done well, it could help to change the game. These issues do need to be proselytised, and that is partly my role.

  12. adamsaltiel said,

    May 8, 2010 at 2:35 pm

    Some odd behaviour on this website. Got to love it. It shows another point. Software is never really finished – and that doesn’t matter, we make do. But big IT has an idea of completion all too close to perfection within what’s defined as needed. Completion and perfection are not needed. We just need what functions, and what is left open enough for improvement or replacement. No more.

  13. Andrew McLean said,

    May 8, 2010 at 2:42 pm

    I agree with the points about the disadvantages of electronic voting, but I think I can also shed some light on the size of the discrepancy in the OXWAB count. I wasn’t there, but the process for counting votes is consistent across the country and I have observed at more counts than I care to remember.

    The typical process goes something like this…

    Stage 1: Ballot papers are separated by candidate and bundled in piles of 20 votes for the same candidate.

    Stage 2: Piles of 20 votes for a candidate are collected in bundles of 25 piles (500 votes).

    Then effectively the votes are counted in a non-decimal counting system of bundles (500), piles (20) and units, i.e. 20 bundles + 5 piles + 2 votes = 10102 votes (in decimal place value notation).

    So a discrepancy of 148 is actually 6 piles and 2 votes. This could have resulted from as few as 4 errors (3 piles in the wrong bundle and 1 vote in the wrong bundle).

    My one complaint about the voting system is that typically the first stage is performed with observers scrutinising the counting from a distance of a few tens of centimetres. But stage 2 and beyond is not subject to such scrutiny, and clearly this is the stage that is prone to the biggest errors.

  14. Toenex said,

    May 8, 2010 at 3:43 pm

    I’d be interested in knowing how the postal vote compares to voting on the day. I wouldn’t be surprised to see more of the ‘Clegg effect’ in the postal vote. Is such a breakdown held in this data?

  15. factician said,

    May 8, 2010 at 6:03 pm

    “Data matters.”

    Data matter. Data=plural. Datum=singular. Datum matters. Data matter.

    /pedant.

  16. Somerset Gestalt said,

    May 8, 2010 at 8:47 pm

    I work in local government research and have consistently tried to get hold of small-level turnout data. Even within the organisation my requests are met with rolled eyes and the complaint of no resources at best, and absolute disinterest at worst.

    Electoral services staff are excellently placed to source this; but as we can see from the recent shrieking across all the news channels, their priority is always going to be patching up the system. Until political pressure is such that we can maturely consider developing our knowledge of the system, rather than resorting to process-driven name-calling, it’s going to stay locked down (literally) and lost.

  17. sjorford said,

    May 9, 2010 at 12:46 am

    @factician:

    “Data” is a mass noun, like water or air. It is in practice rarely used as a plural of datum. Mass nouns take singular verb forms.

    Water matters. Air matters. Data matters.

    Bad pedantry is worse than bad science.

  18. awhaley said,

    May 9, 2010 at 3:35 pm

    You can get a very comprehensive breakdown of local election results in Northern Ireland since 1993 here

    www.ark.ac.uk/elections/flg05.htm

    this is provided by Nicholas Whyte and is hosted by ARK which is a joint resource run by Queen’s University and the University of Ulster. The elections section also provides details of elections to the NI Assembly and for NI representatives at Westminster and the European Parliament.

    It also provides an opportunity to see how STV works in action (we use STV for all our elections except for Westminster), which might be of interest since this is the Lib Dems’ favoured electoral system. One thing you will notice is how rarely seats change hands. I heard Simon Hughes on the television today asserting that a great problem with First Past the Post is that it creates safe seats. You may agree with this, but if you do, then STV isn’t the answer.

  19. Filias Cupio said,

    May 10, 2010 at 4:14 am

    @factician, @sjorford:
    You are both right (both usages are common and acceptable) and therefore both wrong (for claiming the other usage is incorrect.)

    www.askoxford.com/asktheexperts/faq/aboutgrammar/data

    At much greater length but less authoritatively:

    grammar.quickanddirtytips.com/is-data-singular-or-plural.aspx

  20. Sili said,

    May 10, 2010 at 6:52 am

    they pass the buck to the BBC, where you can find seat numbers for each area, but no record of how many votes were cast for each candidate.

    What’s this then?
    news.bbc.co.uk/2/shared/vote2005/flash_map/html/map01.stm

    news.bbc.co.uk/2/shared/vote2005/flash_map/html/map05.stm

  21. hitchin said,

    May 10, 2010 at 9:15 am

    Ben is wrong when he says that “Until the StraightChoice project … nobody kept a record of the election materials distributed to the public across the country”.

    Bristol University Library has a huge collection going back to 1892. See:

    www.bris.ac.uk/is/library/collections/specialcollections/archives/election

  22. hrhpod said,

    May 10, 2010 at 11:37 pm

    I was a Green PPC this time around and I enjoyed the opportunity afforded to me by the ‘sceptical voter questionnaire’ and ‘Your Next MP’ to lay out my views. Apart from anything else, it probably saved me answering the same questions over and over……

  23. tkeetch said,

    May 12, 2010 at 7:12 pm

    Ben,

    A minor but important correction:

    > (a Conservative leaflet in Holborn and St Pancras demotes the Lib Dems from their actual second place to third, and so on)

    The results they quoted on that leaflet were the 2009 European Elections, so they weren’t technically lying, just very misleading!

    I got this leaflet through my door and was very confused, took me a while to realise how they could make that claim.

    Tom

  24. rindo said,

    May 13, 2010 at 7:18 pm

    @ellieban.

    That’s interesting. If there really were only two counts with that degree of variability, surely you couldn’t be very certain who ‘really’ won.

    Published result:
    23730 v 23906

    Assuming total number of votes cast was correct, difference of 30 would be
    23803 v 23833

    95% CI for difference between the means -1059.8 to 858.8 (paired t-test).

  25. rindo said,

    May 13, 2010 at 7:19 pm

    Hmm. Probably should use a non-parametric. No significant difference between the means using Mann-Whitney either.

    Further comments on the stats welcome :-).

  26. rindo said,

    May 13, 2010 at 8:40 pm

    Medians.

  27. factician said,

    May 15, 2010 at 3:53 am

    You are both right (both usages are common and acceptable) and therefore both wrong (for claiming the other usage is incorrect.)

    There’s something about this sentence that I love, very much.
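Comment 13 describes ballot papers being grouped into piles of 20, and those piles into bundles of 25 (500 votes). Here is a minimal Python sketch of that mixed-radix arithmetic, assuming exactly those group sizes. The function names are illustrative, not part of any official counting procedure. The sketch also shows why a single misplaced pile matters: a pile credited to the wrong candidate takes 20 votes from one and gives them to the other, so the margin between the two moves by 40.

```python
# Group sizes as described in comment 13 (an assumption; local practice may vary).
VOTES_PER_PILE = 20
PILES_PER_BUNDLE = 25
VOTES_PER_BUNDLE = VOTES_PER_PILE * PILES_PER_BUNDLE  # 500 votes per bundle


def to_votes(bundles: int, piles: int, loose: int) -> int:
    """Convert a (bundles, piles, loose votes) count into a plain vote total."""
    return bundles * VOTES_PER_BUNDLE + piles * VOTES_PER_PILE + loose


def to_bundles(total: int) -> tuple:
    """Split a vote total into full bundles, leftover piles and loose votes."""
    bundles, rest = divmod(total, VOTES_PER_BUNDLE)
    piles, loose = divmod(rest, VOTES_PER_PILE)
    return bundles, piles, loose


def margin_shift(piles_moved: int, votes_moved: int) -> int:
    """Change in the margin between two candidates when piles or single votes
    are credited to the wrong one: each misplaced vote is lost by one
    candidate and gained by the other, so it counts twice."""
    return 2 * (piles_moved * VOTES_PER_PILE + votes_moved)


if __name__ == "__main__":
    print(to_votes(20, 5, 2))   # 10102
    print(to_bundles(10102))    # (20, 5, 2)
    print(margin_shift(1, 0))   # 40
```

Writing totals this way makes it easier to see how a handful of misplaced piles, rather than hundreds of individual mistakes, can account for a large change in a margin between counts.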
