Lucia de Berk – a martyr to stupidity

April 9th, 2010 by Ben Goldacre in bad science, numerical context, statistics

Ben Goldacre, The Guardian, Saturday 10 April 2010

Lucia de Berk is a Dutch nurse who has spent 6 years in jail on a life sentence for murdering 7 people, in a killing spree that never happened. She will hear about her appeal on Wednesday, and there is now little doubt that she will be let off. The statistical errors in the evidence against her were so crass that they can be explained in one newspaper column. So will the people who jailed her apologise?

The case against Lucia was built on a suspicious pattern: there were 9 incidents on a ward where she worked, and Lucia was present for all of them. This could be suspicious, but it could also be a random cluster, best illustrated by the “Texas sharpshooter” phenomenon: imagine I am stood blindfolded in front of a wooden barn with a machine gun in each hand, maniacally firing off a thousand bullets into the wall. I remove my blindfold, walk up to the barn, find 3 bullets which are very close together, and carefully paint a target around them. Then I announce that I am an Olympic-standard rifleman.

This is plainly foolish. All across the world, nurses are working on wards, where patients die, and it is inevitable that on one ward, in one hospital, in one town, in one country, somewhere in the world, you will find one nurse who seems to be on a lot when patients die. It’s very unlikely that one particular prespecified person will win the lottery, but it’s inevitable that someone will win: we don’t suspect the winner of rigging the balls.
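To put rough numbers on the lottery point, here is a minimal Python sketch. Every figure in it is a hypothetical assumption, not data from the case: a ward with 1000 shifts, a nurse who works 330 of them, 9 incidents that land on distinct shifts purely at random, and 100,000 nurses worldwide, crudely treated as independent trials. The function names are made up for illustration.

```python
from math import comb

def p_prespecified_nurse(total_shifts: int, nurse_shifts: int, incidents: int) -> float:
    """Chance that one named nurse was on duty for every incident, assuming each
    incident falls on a distinct shift chosen uniformly at random."""
    # Hypergeometric: all `incidents` shifts must come from the nurse's own shifts.
    return comb(nurse_shifts, incidents) / comb(total_shifts, incidents)

def p_someone_somewhere(p_one: float, num_nurses: int) -> float:
    """Chance that at least one of many nurses shows the pattern, crudely
    treating every nurse as an independent trial."""
    return 1.0 - (1.0 - p_one) ** num_nurses

# Hypothetical figures, chosen only for illustration.
p_one = p_prespecified_nurse(total_shifts=1000, nurse_shifts=330, incidents=9)
p_any = p_someone_somewhere(p_one, num_nurses=100_000)
print(f"one prespecified nurse: {p_one:.1e}")
print(f"someone, somewhere:     {p_any:.1%}")
```

Under these made-up numbers, the chance that one prespecified nurse is on duty for all 9 incidents is roughly 4 in 100,000, yet the chance that at least one nurse somewhere shows the pattern is well above 95%.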

And did the idea that there was a killer on the loose make any sense, statistically, for the hospital as a whole? There were 6 deaths over 3 years on one key ward where Lucia supposedly did her murdering. In the 3 preceding years, before Lucia arrived, there were 7 deaths. So the death rate on this ward went down at the precise moment that a serial killer – on a killing spree – moved in.

Even more bizarre was the staggering foolishness of some of the statistical experts used in the court. One, Henk Elffers, a professor of law, combined individual statistical tests by taking p-values – a mathematical expression of statistical significance – and multiplying them together. This bit is for the nerds: you do not just multiply p-values together; you combine them with a proper tool, such as ‘Fisher’s method for combination of independent p-values’. If you multiply p-values together, then chance incidents will rapidly appear to be vanishingly unlikely. Let’s say you worked in twenty hospitals, each with a pattern of incidents that is purely random noise: let’s say p=0.5 in each. If you multiply those harmless p-values, of entirely chance findings, you end up with a final p-value of p < 0.000001, falsely implying that the outcome is extremely highly statistically significant. With this mathematical error, by this reasoning, if you change hospitals a lot, you automatically become a suspect.
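For the nerds, a minimal Python sketch of the difference, using the twenty-hospitals example (p = 0.5 each). It assumes the p-values are independent, which Fisher's method requires, and uses the standard closed form for the upper tail of a chi-squared distribution with an even number of degrees of freedom, so no statistics library is needed. The function names are illustrative.

```python
import math

def naive_product(pvalues):
    """The flawed approach: just multiply the p-values together."""
    return math.prod(pvalues)

def fisher_combined_pvalue(pvalues):
    """Fisher's method for combining k independent p-values.

    Under the null, X = -2 * sum(ln p_i) is chi-squared with 2k degrees of
    freedom. For even degrees of freedom the upper tail has a closed form:
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x/2
    term = 1.0   # (x/2)**i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Twenty hospitals, each showing nothing but random noise (p = 0.5)
pvalues = [0.5] * 20
print(f"naive product:   {naive_product(pvalues):.2e}")
print(f"Fisher's method: {fisher_combined_pvalue(pvalues):.2f}")
```

The naive product comes out below one in a million (0.5^20, about 9.5 × 10^-7), while Fisher's method gives a combined p-value above 0.9: twenty helpings of noise are still just noise.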

One statistician – Richard Gill – has held the Dutch courts’ feet to the fire, writing endless papers on these laughable statistical flaws. Alongside the illusory patterns he has identified, there was one firm piece of forensic evidence: some traces of the drug digoxin were found in one baby who died. The baby had been prescribed digoxin months earlier. Three court toxicologists now say the digoxin was not the cause of death.

In fact, even the Dutch state prosecution now accepts that Lucia should be acquitted, and that there was no evidence of an unnatural death in any of the patients, though her convictions for stealing two books from the hospital library will – shamefully and bizarrely – be upheld. Lucia denies stealing these books. Now living with her partner while she awaits the final judgement, Lucia is penniless, denied unemployment benefits because of her unusual status, and paralysed down one side following a stroke she suffered in 2006, aged 44, in the week she was told that her conviction would be upheld. Watch what the Dutch legal system does next, because they owe this woman a great deal.


53 Responses

  1. gill1109 said,

    May 21, 2010 at 12:44 pm

    My ad hominem to Henk Elffers: in 1974, aged 22, I was hired as a new PhD student at the Mathematical Centre, Amsterdam. Henk Elffers had been hired a year or so earlier; he was quite a few years older than I, and he became a good friend, and I would even say a mentor, to me. I was struck by his drive to “purify” applications of statistics in the social sciences: his fight against the bogus scientificality involved in many applications of then-advanced statistical methods in the social sciences and in psychology.

    In those old days the idea was that you did some research, played around with various statistical consultation jobs, slowly arrived at a PhD thesis topic, wrote some papers yourself, and at some stage found a “supervisor” who became the responsible “promotor” when you eventually defended your completed thesis.

    So we had some fun together and wrote some papers together, and did other things we liked and didn’t like … In about 1976 Henk’s wife was pregnant with their first child and he wanted a half-time position. Our boss did not allow that, so he quit and went to a Geography department in a different city. At that time he had not chosen a topic for his PhD, nor a supervisor. He moved into social-economic geography, from there to economics, and finally to law. As our paths diverged we lost contact altogether, but I always retained a soft spot for him and a great deal of respect.

    The statistics he had learnt in his last (master’s) year at university was basically “almost nothing”, even by the standards of the day, both from the applied and from the theoretical point of view.

    When the Lucia case started up, in 2001, one read in the newspapers that “statisticians” Henk Elffers and Richard de Mulder were giving evidence in court. No-one in the statistical world had ever heard of either, except for the one or two colleagues of ours from back in ’74, ’75 who were still in the business. Later the defence got the services of a rather pure and philosophically inclined probabilist and an even more theoretical and philosophically inclined computer scientist and logician. Lucia’s new defence lawyer had met one of them at a cocktail party. Her first defence lawyer had actually done some research and found that a statistician in Norway, Odd Aalen, had worked on a similar case. He asked my old friend Odd for advice, and Odd suggested he contact me. However, the new lawyer preferred to stick with his new friends Ronald Meester and Michiel van Lambalgen.

    Later there were often talks and symposiums about how one should statistically analyse such data. Henk always gave very clear, and very clearly motivated, talks, in which he spelt out a kind of idealised version of what he had done for the courts. The Bayesians tended to discredit themselves with stupid and over-ambitious claims. The probabilists tended to miss the point.

    It was quite a shock when we later found out, around 2007, that Henk’s sanitised version of his work was rather different from what he had actually done. The multiplication of three p-values, for no reason at all, right next to the statement that combining the data from the three wards gave the resounding conclusion “this was not chance”, was an appalling “mistake”. The number 1 in 342 million took on a life of its own: most journalists, many lawyers, and all the men and women in the street interpreted it as “the probability that Lucia is innocent is…”.

    Of course the product of three p-values can be used as a test statistic, and a p-value can be computed from it under the assumption of independence (under the null). Under the null, p-values are essentially uniform random numbers between 0 and 1, so their product has the distribution of a product of independent uniform random numbers between 0 and 1. Twice the negative logarithm of this product has a chi-squared distribution with 2k degrees of freedom, where k is the number of p-values combined, and Bob’s your uncle (Fisher’s combination method). This is however only a “last resort

  2. gill1109 said,

    May 21, 2010 at 1:03 pm

    Sorry, to continue: the Fisher combination method is however only a “last resort” way to combine the information from several contingency tables; the Cochran-Mantel-Haenszel test is much more sensible. It comes down to looking at the total number of incidents in Lucia’s shifts, and comparing it with the null distribution obtained by thinking of each of Lucia’s wards as a separate vase of red and blue balls (shifts with incidents and shifts without), and taking from each vase separately a number of balls equal to the number of Lucia’s shifts on that ward.

    Of course there are lots more problems around all this. However, if Henk had done a decent computation, the words “1 in 342 million” would never have been spoken in court; it might have been 1 in 1000 or 1 in 10 000. Not so devastating. If he had checked the data, the number would have collapsed and the case would have collapsed. If someone had pointed out that the number would collapse if there were just 2 missing incidents in shifts outside Lucia’s, everything would have been different. Richard de Mulder, Henk’s colleague at the Erasmus law faculty, a lawyer with an MBA who knew how to start up and how to close down Microsoft Word, hence the big man in legal informatics, said that even if the data had been a bit wrong the conclusion would have been the same. He said that everything Henk did was state-of-the-art correct, and that it all meant that Lucia had to explain why the incidents happened in her shifts. Henk had provided some reasons “by way of example”, and this was all there was on the table, so the court asked Lucia verbally whether any of those things was true, and she gave the wrong answers, so she crucified herself on the statistical cross which Henk had put up for her.

    Henk never admitted these mistakes in public, only in private. He never read the judge’s verdict, and he spread slanderous accusations about Ton Derksen, the philosopher of science. Henk had been brainwashed by the lies spread from the children’s hospital; he was deeply moved by the fact that a baby had died after being admitted over the weekend merely for social reasons (to give the parents a break), and Lucia was on duty. This child had in fact been incorrectly diagnosed by Arda Derksen, and she was the only medical specialist who ever claimed that the death was unnatural. It was her own patient, just as the witch-hunt was her own witch-hunt: she was the director of investigations.

    I am so disappointed that Henk never admitted any responsibility for anything. He became a turncoat: he lost his scientist’s humility and took on the lawyer’s arrogance. I am so sad about that.

  3. gill1109 said,

    May 21, 2010 at 1:04 pm

    By the way, “statistician” Richard de Mulder and hospital director Paul Smits both have MBAs from Rochester. I wonder if they were old mates of one another? It wouldn’t surprise me.
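The “vase” computation Richard Gill describes in his second comment can be sketched in a few lines of Python. Each ward is treated as its own vase, so the number of incident shifts among Lucia’s shifts on that ward follows a hypergeometric distribution, and the total across wards is the sum of these independent counts. Convolving the per-ward distributions gives the exact conditional null distribution of that total; the Cochran-Mantel-Haenszel test approximates the same distribution with a normal one. The ward figures below are invented for illustration, not the real case data, and the function names are hypothetical.

```python
from math import comb

def ward_distribution(total_shifts, incident_shifts, nurse_shifts):
    """One ward as a vase: `total_shifts` balls, `incident_shifts` of them red.
    Drawing `nurse_shifts` balls without replacement, return the hypergeometric
    distribution of the number of red balls as {count: probability}."""
    other_shifts = total_shifts - incident_shifts
    denom = comb(total_shifts, nurse_shifts)
    lo = max(0, nurse_shifts - other_shifts)
    hi = min(nurse_shifts, incident_shifts)
    return {
        x: comb(incident_shifts, x) * comb(other_shifts, nurse_shifts - x) / denom
        for x in range(lo, hi + 1)
    }

def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def exact_stratified_pvalue(wards, observed_total):
    """P(total incidents in the nurse's shifts >= observed_total) under the null,
    with each ward given as (total_shifts, incident_shifts, nurse_shifts)."""
    dist = {0: 1.0}
    for ward in wards:
        dist = convolve(dist, ward_distribution(*ward))
    return sum(p for x, p in dist.items() if x >= observed_total)

# Invented figures: (total shifts, shifts with an incident, the nurse's shifts)
wards = [(1000, 8, 150), (400, 4, 80)]
print(exact_stratified_pvalue(wards, observed_total=6))

# Gill's sensitivity point: two extra incidents in shifts outside hers
wards_plus_two = [(1000, 10, 150), (400, 4, 80)]
print(exact_stratified_pvalue(wards_plus_two, observed_total=6))
```

Adding two incidents on shifts she did not work leaves her own count unchanged but raises the p-value, which is the direction of Gill’s point; how far it moves depends entirely on the real data.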