Is this a joke?

July 18th, 2009 by Ben Goldacre in bad science, evidence, evidence based policy, government reports, politics

Ben Goldacre, 18 July 2009, The Guardian.

We’d all like to help the police to do their job well. They, in turn, would like to have a massive database with DNA profiles from everyone who has been arrested, but not convicted of a crime.

We worry that this is intrusive, but some of us are willing to make concessions on our principles, and to accept some invasion of our privacy, in the name of preventing crime. Before doing so, we’d like to see the evidence on whether this database is helpful, so that we can make an informed decision.

Luckily the Home Office have now published a consultation paper on the subject. They defend their database by arguing that innocent people who have been arrested are as likely to commit crimes in the future as guilty people. “This”, they say, “is obviously a controversial assertion”. That’s not true: it’s a simple matter of fact, and you could easily assemble some good quality evidence to see if it’s true or not.

The Home Office have assembled some evidence. It is not good quality. In fact, this study from the Jill Dando Institute, attached to their consultation paper as an appendix, is possibly the most unclear and badly presented piece of research I have ever seen in a professional environment. Or I am having a bad day. Join me in my struggle to understand their work.

They want to show that the level of criminal activity in a group of people who have been arrested, but on whom no further action has been taken, is the same as the level of criminal activity in people who have been arrested and convicted of a crime, or who accept a caution.

On page 30 they explain their methods, haphazardly, scattered about in the text. They describe some people “sampled on 1st June 2004, 1st June 2005 and 1st June 2006”. These dates are never mentioned again. I have no idea what their plan was there. They then leap to talking about Table 2. This contains data on people each from a “sample” in 1996, 1995, and 1994, followed up for 30 months, 42 months, and 54 months respectively. Are these anything to do with the people from 2004, 2005, and 2006? I have no idea.

In fact I have no idea what “sample” means: perhaps that was the date they were first arrested. I don’t know why they were only followed up for 30, 42, and 54 months, instead of all the way to 2009. Crucially, I also don’t know what the numbers in the table mean, because they don’t explain this properly. I think it is the number of people, from the original group, who have subsequently been arrested again.

Anyway. Then they start to discuss the results from this table. They say that these figures show that arrested non-convicted people are the same as convicted people. There are no statistics conducted on these figures, so there is absolutely no indication of how wide the error margins are, and whether these are chance findings. To give you a hint about the impact of noise on their data, more people are subsequently re-arrested over the 42 month period than over the 54 month period, which seems surprising, given that the people in the 54 month group had a much longer period of time over which to get arrested.
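To make the point about error margins concrete, here is a minimal sketch of a confidence interval for a re-arrest rate in a group of a few hundred people. It uses the Wilson score interval, which is one common choice among several. The counts (60 re-arrested out of 200) are invented for illustration and are not taken from the Home Office table.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a proportion (events out of n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical figures: 60 of 200 people re-arrested during follow-up.
low, high = wilson_interval(60, 200)
print(f"observed rate 30%, 95% interval {low:.1%} to {high:.1%}")
```

With numbers of this size the interval spans roughly six percentage points either side of the observed rate, so two groups could differ by several points without a table of raw counts revealing it.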

This is before we even get on to the other problems. At a few hundred people, this study seems pretty small for one that is supposed to give compelling evidence that there is no difference between two groups, because to prove a negative like this, you’d generally want a large sample, to minimise the chance of missing a true difference in the noise.

There is no evidence that they have done a “power calculation” to determine the sample size they’d need, and in any case, their comparison group feels a bit rigged to me. In their “convicted” sample they only count people who had a non-custodial sentence, and exclude people who got a custodial sentence, on the grounds that those people would be incapable of committing a crime during their incarceration. This also has the effect, however, of making the “criminal” group not quite so criminal, and so a bit more likely to be similar to innocent people.
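For readers who have not met one, a power calculation can be sketched in a few lines. This is the textbook normal-approximation formula for comparing two proportions, not anything the report's authors did. The re-arrest rates of 30% and 25% are hypothetical, chosen only to show the scale involved.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate people needed in EACH group to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # threshold for significance
    z_power = NormalDist().inv_cdf(power)          # threshold for desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical re-arrest rates: 30% in one group, 25% in the other.
print(n_per_group(0.30, 0.25))
```

Under these assumptions the answer comes out at well over a thousand people per group, and that is only for detecting a difference. Demonstrating that two groups are the same (an equivalence test) typically needs at least as many.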

I could go on. Table 1 is so thoroughly “not as described” as to be uninterpretable. In the text they talk about different cells on the table which are “solid red”, “stippled yellow”, and “blank”. In fact the whole thing is just blue.

This research was incomprehensible and unreadable. Anybody who claims to have been persuaded by the data quoted here is telling you, loudly and clearly in the subtitles, that they don’t need to understand a piece of research in order to find it compelling. Such people are not to be trusted, and if research of this calibre is what guides our policy on huge intrusions into the personal privacy of millions of innocent people, then they might as well be channeling spirits.


64 Responses

  1. JoanCrawford said,

    July 24, 2009 at 9:16 am

    @36, 37, 40 (Mike Whit, Health Pain)

    Mike, you are dead right. The other giveaway, of course, is that the post has cock-all to do with the topic under discussion.

  2. ferguskane said,

    July 25, 2009 at 2:45 pm

    The chance of a false positive depends on the test used, and is based on the number of DNA markers tested, combined with the natural variability of such markers. It is fairly easy to decrease the chance of such a false positive: one just increases the number of markers used. Thus, false positives should not be a major problem.

    The essential problem with a national DNA database is not that of a false positive match. I believe the real danger is of finding a true match, but with the wrong samples, leading to the wrong conclusions. If every UK citizen is in a DNA database, then one may assume that any hair fibre, skin sample or blood spot at a crime scene will match someone. The problem is, did that person commit the crime?

    A national DNA database would reduce the emphasis on real policework. A DNA match is a MUCH more powerful finding if the suspect is identified BEFORE the match. If the suspect is identified based on a match, then the rest of the evidence can be fitted around the DNA evidence. The onus then falls upon the suspect to provide an alibi. I’m not suggesting that fitting of the evidence would be done deliberately, but the DNA evidence would inevitably result in a bias in the investigation.

    In the end, a jury may be asked to make a decision largely based on DNA evidence from a search of a national database. Given the state of understanding of science and statistics, not only in the general public, but also within the legal profession, this is not a pleasing prospect.

    On a slightly different tack, it has been said that private citizens should always work to minimise the power of the state. I think this sentiment is generally right, and should be applied here.

  3. bagpuss said,

    July 27, 2009 at 11:45 am

    Psythe, your reasoning has a fundamental flaw.

    Your entire argument is based on the assumption of holding the DNA of everyone, but who is this everyone and how do we ensure that we obtain a DNA sample from them?

    I think it’s generally agreed that we do not have a comprehensive list of everyone living in this country. So how do we identify everyone, including those who are likely to be reluctant to make themselves known since they’re here illegally, and persuade them all to come forward to donate a DNA sample? And even if we could somehow achieve this miracle, there will undoubtedly be a way for people to appear to be giving a sample, but to actually give a fake one – say by paying someone else to pose as them.

    And that’s before we even consider people coming to the country on a temporary basis – tourists, business travellers, etc. Are we to stop everyone at the border and demand a DNA sample for the database? That might have a rather serious impact on the tourist industry and on business in general.

    So, at the “best”, we’ll only ever have an almost-complete database. Criminals will, of course, have the greatest incentive to find some way of avoiding their DNA being on the database, so the holes in the database are likely to be significant, even if they’re only small.

    So now, a DNA sample is taken from a crime scene, but it only matches to one person on the database. This person is, in fact, innocent but we know that they must be guilty, because they’re the only match on this “complete” database. Of course, the real culprit was someone who is already back home in some other country, or is the bloke who paid someone a few quid to impersonate him when his sample was taken, or is someone we have no idea has been living here for the last 5 years. Unfortunately, the innocent match was at home asleep, alone, at the time the crime was committed. They happen to live reasonably close to the crime (hardly unlikely – pretty much everyone in London, and plenty in the surrounding counties would live sufficiently close to a crime in London) and have absolutely no way of proving that they weren’t actually raping someone on the other side of the city.

    So we lock them up and throw away the key, while the real culprit goes on happily committing more crimes. And the innocent man gets to enjoy Her Majesty’s hospitality, with no-one even questioning his guilt unless the real culprit is careless enough to leave his DNA lying around at a crime scene again, while our innocent guy is inside.

  4. mikewhit said,

    July 28, 2009 at 6:09 pm

    @bagpuss: “or is the bloke who paid someone a few quid to impersonate him when his sample was taken”

    So how does this guy get away with not giving a sample when it’s really his turn – presumably a match for someone already on the system would be flagged?

  5. HolfordWatch said,

    July 28, 2009 at 9:01 pm

    Aug. 2009: Popular Mechanics has some good articles on the state of the scientific evidence that underpins popular forensic science.

    CSI Myths: The Shaky Science Behind Forensics

    The truth about 4 common forensic methods: Problems with fingerprinting, ballistics & fiber analysis forensics

  6. bagpuss said,

    July 30, 2009 at 5:25 pm

    @mikewhit – Firstly, I wouldn’t be surprised if the system isn’t set up to flag matches and even if it is, it might not prove a problem. After all, there will be genuine coincidental matches.

    But more importantly, you are making the assumption that the person paid will also be on the database under their own right. If they’re currently living in the country without the authorities knowing, or are a temporary visitor, then they won’t be.

    My post wasn’t intended to be an accurate description of what would happen if a universal database were implemented. It was intended to point out just a few of the more obvious possible and probable flaws that will make it impossible for a database of “everyone” ever to exist. Even if that particular one is impossible in practice (and I doubt that) it doesn’t detract from my fundamental point.

  7. Psythe said,

    July 30, 2009 at 7:56 pm

    Sorry for lateness of this reply – forgot I don’t get an email prompt when someone replies to me here!

    Suw – you were concerned about the alternative uses of the DNA data. I agree wholeheartedly with you, and am most certainly NOT an advocate of the current system where the entire DNA sample gets kept. Another poster made the point that the more DNA you have the more you reduce the risk of a false positive, but there is nothing to stop you checking as many base locations as necessary on your test sample and your suspect once you have identified him – you don’t need to keep the sample for this.

    If only the very small amount of DNA needed for the initial marking technique is kept then data misuse is not a major issue – it is currently useless to insurance companies. You can make a good guess at ethnicity from it but I’m not sure how useful this will be to anyone. The only “naughty” use I can think of is proving someone was somewhere legitimately, but privately – for use in blackmail. It wouldn’t help the tabloids, even if it did get out, as it would be illegal for non-law enforcement agencies to use this data.

    David Mery – you mention a situation where someone got arrested and had his home searched despite the presence of previous DNA evidence. I agree this is pretty stupid, but it is n=1; we’re presumably agreed on the dangers of drawing conclusions from such evidence.

    SteveGJ – you’re absolutely right; my logic was flawed (or at least it didn’t convert very well into prose; it was so long ago I can’t remember now :-) )

    The fact remains that if there are multiple matches it would be better for the jury to know about them, which may or may not be the case with the current system, but should (in the majority of cases) be the case with a universal database.

    SteveGJ also suggests that forcing citizens to do something against their will is a very dangerous line to take. I’d argue that, rather than being forced to do something against our will, agreeing to do things that we would rather not do is a price we pay for living in society, with its privileges that we would rather not do without. A database of DNA markers only is minimally dangerous and can provide large benefits.

    A carte blanche to investigate email is considerably more dangerous (and is, I believe, pretty much done anyway by the USA via Echelon; it’s not illegal for THEM to spy on UK citizens, and not illegal for them to share the information they get with the UK authorities).

    Bish – yes, a DNA database could cause some convictions to be quashed, but only the unsafe ones which relied on DNA matches alone in cases where there was more than one match. One would also expect it to bring many more criminals to justice.

    FergusKane is concerned that one might get the right match on the wrong sample. However I’d argue that the average jury should be able to understand the logic that a skin sample, for instance, ties a person only to the site and not to the crime. Certain samples are harder to explain away than others – skin samples under the nails of the deceased, for instance.

    Bagpuss wonders how we will get a sample from “everyone”. I agree that a truly universal DNA database is not feasible – foreign visitors would as (s)he says likely remain exempt in the current climate. Posing for another could be made tricky by following up apparent duplicates, although this would admittedly increase the expense of the project. Taking a photograph of people at the time of sampling would reduce the cost (on the basis that if you have a match between a 5′ model and a 7′ wrestler you probably don’t need to follow it up) but would probably double the outcry. Getting the DNA sample at birth or entry into the UK would reduce the risk of people slipping through the net.

    Miscarriages of justice such as the one you describe can still occur, but at a rate of less than one in a billion. Assuming a hundred thousand cases in the UK per year which use DNA evidence this would be one case per ten thousand years (I’m sure SteveGJ will check my math :-) )

    Compare this to the potential 1 in 2 miscarriages of justice currently taking place when one relies on a jury.

  8. Psythe said,

    July 30, 2009 at 8:28 pm

    That last link of David Mery’s – the one about the DNA database leading to miscarriages – begged a sceptical investigation. Turns out it’s talking about miscarriages of justice rather than pregnancies – doh!

    It does make the point though that matches made on fewer than the normal ten loci have a higher risk of false positives.

    Does anyone know how big these loci are? They must be more than one base pair long (4^10 is just over a million).

    It also makes the point that a new technique, LtDNA, can use up the sample in getting its match, meaning that if the only match is a false positive, further testing on the sample to exonerate the innocent suspect is not possible. I have to admit this is pretty scary.

    Having a universal database would still dilute the effect of this on the jury. The question would be: would you get more innocent people convicted as a result of this (due to crimes committed by people who were not on the database) than you would acquit from the realisation that the DNA evidence corresponded to multiple potential suspects and as such could not be relied upon?

    This sort of risk analysis does come into play at the level of the jury, rather than at the investigation level, where the data will show police which suspects to concentrate on and, potentially, obtain confessions from criminals who like the jury believe that DNA is incontrovertible – but not from the innocent who know it can’t be because they didn’t do it. Yes, this means that the innocent get questioned, but that’s pretty much the accepted situation with regard to police investigations.

  9. MedsVsTherapy said,

    August 5, 2009 at 3:58 pm

    How much will the false positive rate drop when this genetic data is combined with 20,000 in-home govt surveillance cameras?

    “They will be monitored to ensure that children attend school, go to bed on time and eat proper meals. Private security guards will also be sent round to carry out home checks, while parents will be given help to combat drug and alcohol addiction.”

  10. HungryHobo said,

    August 6, 2009 at 12:09 am

    ‘If I accept that the chance of a match between two close relatives is 1 in 10,000 as you state, then the chance of any of the wrong relatives rather than the “true” culprit are each 1:10,000. As there are three others, then the chance of a any “wrong” relative being picked is (near enough) 3:10,000 meaning that the “right” relative is going to be picked 99.97% of the time.’

    When I read this it struck me that by that arithmetic the chance of any 2 people having the same birthday in a class of 20 would be 20/365, rather than about 41% as it really is…

  11. MedsVsTherapy said,

    August 6, 2009 at 5:19 pm

    birthday: isn’t it around 189/365?
    for the first case, you have 19 possible matches. 19/365. for the second case having a different bday, you would have 18 possible matches – – given that the bday of the first case does not match the second case. Third case: you have already established the likelihood that there is a match with either of the first two, so now establish the likelihood of the third case with any remaining cases: 17/365?

  12. David Mery said,

    August 7, 2009 at 3:35 pm

    Reminder: you have until the end of today to send in your response to the Home Office consultation, if you haven’t done so already.

  13. heavens said,

    August 12, 2009 at 12:20 am

    Delster (and others): Today’s news says that a full DNA sequence costs ‘only’ US $50,000 per person. I think that fact alone addresses the “but they might do a full DNA sequence, instead of only those little cheap bits used for identification”.
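On the birthday arithmetic raised in comments 10 and 11: the usual way to get the answer is to work out the chance that everyone's birthday is different and subtract it from 1. Summing 19/365 + 18/365 + … instead gives the expected number of matching pairs (190/365), which is not the same thing as the probability of at least one match. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
from math import prod


def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    if n > days:
        return 1.0  # pigeonhole: a shared birthday is certain
    # Chance that person k avoids the k birthdays already taken, for k = 0..n-1.
    p_all_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_all_distinct


for n in (20, 23):
    print(n, round(p_shared_birthday(n), 3))
# prints:
# 20 0.411
# 23 0.507
```

So a class of 20 gives about a 41% chance, and it takes 23 people to pass 50/50.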
