No seriously, I felt the p-values, in my soul…

April 21st, 2007 by Ben Goldacre in bad science, statistics

Ben Goldacre
Saturday April 21, 2007
The Guardian

So this week in the papers a man was allergic to his own hair, bee colonies were collapsing because of mobile phones, and more. Speaking as the veteran of a great many squabbles, on MMR, phone masts, drugs, and more, I can tell you: facts are not entirely welcome. When all the evidence goes against someone’s beliefs, they will tell you, quite plainly, that they just know it to be true. They sense it. They intuit it. Nobody will ever listen to an explanation of why intuitions can be flawed – presumably because their intuitions have told them not to.

But we have an innate human ability to make something out of nothing. We see shapes in the clouds and a man in the moon; gamblers are convinced that they have “runs of luck”; we can take a perfectly cheerful heavy metal record, play it backwards, and hear hidden messages about satan. Our ability to spot patterns is what allows us to make sense of the world but sometimes, in our eagerness, we can mistakenly spot patterns where none exist.

In science, if you want to appreciate a phenomenon, it is often best to reduce it to its simplest and most controlled form. There is a prevalent belief among sporting types that sportsmen, like gamblers (except even more plausibly) have “runs of luck”. People ascribe this to confidence, “getting your eye in”, “warming up”, or more, and while it might exist somewhere, statisticians have looked in various places and found no relationship between, say, hitting a home run in one shot, and then hitting a home run in the next.

Because the “winning streak” is such a prevalent belief, it is an excellent model for looking at how we perceive random sequences of events, and this was used by a social psychologist called Thomas Gilovich in a classic experiment. He took basketball fans and showed them a random sequence of X’s and O’s – telling them that they represented hits and misses in a basketball game – and then asked them if they thought the sequences demonstrated streak shooting.

Here is a perfectly random sequence of figures from that experiment. You could think of it as being generated by a coin being flipped (I can explain why it’s random if you want but it’s a bit boring, essentially there is no correlation between one outcome and the next, and the number of adjacent figures with the same outcome – xx or oo – is the same as the number of adjacent figures with different outcomes – xo or ox). Here’s that random sequence:


The subjects in the experiment, when shown this entirely random sequence, were convinced that it exemplified “streak shooting”, or “runs of luck”. It’s easy to see why, if you look again: 6 of the first 8 shots were hits. No, wait: 8 of the first 11 shots were hits. I agree: no way does that look random. But it is.
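The sequence itself has not survived in this copy of the article. As a hedged illustration, the string below is the 21-shot sequence usually quoted for Gilovich's experiment; treat it as an assumption rather than as part of the text above. The sketch (the function names are invented for illustration) checks that string against the properties the article describes: the hit counts in the opening shots, and equal numbers of same and different adjacent pairs.

```python
# Assumption: the 21-shot sequence usually quoted for this experiment.
# It is NOT present in this copy of the article.
SEQUENCE = "OXXXOXXXOXXOOOXOOXXOO"  # X = hit, O = miss


def hits_in_first(seq: str, n: int) -> int:
    """Count hits (X) among the first n shots."""
    return seq[:n].count("X")


def adjacent_pairs(seq: str) -> tuple[int, int]:
    """Return (same, different) counts over neighbouring shots."""
    same = sum(a == b for a, b in zip(seq, seq[1:]))
    return same, len(seq) - 1 - same


if __name__ == "__main__":
    print(hits_in_first(SEQUENCE, 8))    # hits in the first 8 shots
    print(hits_in_first(SEQUENCE, 11))   # hits in the first 11 shots
    print(adjacent_pairs(SEQUENCE))      # (xx or oo, xo or ox)
```

For the assumed string this reports 6 and 8 hits, and ten adjacent pairs of each kind, which matches the description in the text.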

What this ingenious experiment shows is just how bad we are at correctly identifying random sequences. We are wrong about what they should look like: we expect too much alternation, and to us, even truly random sequences seem somehow too lumpy and ordered.
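One way to see how lumpy genuine randomness is: work out exactly how often a short series of fair coin flips contains a long run. The sketch below is only an illustration, not anything from the experiment; it assumes a fair coin with independent flips, and the function name is made up.

```python
from fractions import Fraction


def prob_run_at_least(n: int, k: int) -> Fraction:
    """Exact probability that n fair coin flips contain a run of
    k or more identical outcomes in a row."""
    if k <= 1:
        return Fraction(1) if n >= 1 else Fraction(0)
    if n < k:
        return Fraction(0)
    # counts[r] = number of sequences so far whose final run has
    # length r + 1, with every run in the sequence shorter than k.
    counts = [2] + [0] * (k - 2)
    for _ in range(n - 1):
        switched = sum(counts)   # next flip differs: run restarts at 1
        extended = counts[:-1]   # next flip matches: run grows by 1
        counts = [switched] + extended
    return 1 - Fraction(sum(counts), 2 ** n)


if __name__ == "__main__":
    # Chance of four or more in a row somewhere in 21 flips.
    print(float(prob_run_at_least(21, 4)))
```

For 21 flips the chance of a run of four or more comes out a little under 0.8, so a sequence of that length without such a "streak" would be the unusual one.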

Why is this important? Because it shows that our intuitions about the most basic observation of all, from which all others follow – our abilities to distinguish an actual pattern, from mere random background noise – are deeply, deeply flawed.

You cannot sense whether a pill improves intelligence, or cures the common cold, or whether MMR causes autism. Your tiny, beautiful ingot of human experience in the world does not present you with sufficient information to spot patterns on that scale: it’s like looking at the ceiling of the Sistine chapel with one eye through a very long cardboard tube.

Intuitions are great shortcuts. They’re very valuable for lots of things in the social domain: deciding if your girlfriend is cheating on you, perhaps, or whether a business partner is trustworthy. But for mathematical issues, or assessing causal relationships, intuitions suffer from inaccuracies, misfires, and oversensitivity. The challenge, perhaps, is to work out which tools to use where: because trying to be “scientific” about your relationship is as stupid as following an intuition about the risks and benefits of a treatment.


35 Responses

  1. jackpt said,

    April 21, 2007 at 12:28 am

    Good piece. Our perception is very efficient, but can’t be trusted. It’s like so-called EVP, or when we experience pareidolia. With sports, magical thinking is prevalent because mood affects play: if I’m overly preoccupied while playing poker or Go, my play is worse. If I believed in bad runs, no doubt I could become preoccupied with that, creating something of a self-fulfilling prophecy.

  2. boredagain said,

    April 21, 2007 at 2:52 am

    This entry made me think of a quote Alan Sokal used to introduce his essay on pseudoscience:

    ‘The human understanding is not composed of dry light, but is subject to influence from the will and the emotions, a fact that creates fanciful knowledge; man prefers to believe what he wants to be true’. – Francis Bacon

  3. Mark Wainwright said,

    April 21, 2007 at 8:30 am

    The ability to spot patterns where none exist is crucial to all kinds of pseudoscience. (And no doubt we all do it; all we can do is be on our guard.) But in the case of sportsmen, the claim that there are no runs of form is, a priori, extremely hard to swallow. A sportsman’s form is dependent on fitness and concentration which all kinds of extraneous things are likely to affect: a troublesome knee, having just split up with his girlfriend, a bit of a hangover, feeling depressed, etc. The non-existence of any such effect seems like a fantastically strong claim, so strong that I assume it isn’t the claim at all. Can you expand, Ben? For instance do the negative findings only apply to patterns within a single day’s play?

  4. wewillfixit said,

    April 21, 2007 at 8:56 am

    And then of course there is the sheep and goats experiment, which showed that believers in psychic phenomena are more likely to be poor at judging randomness than non-believers.

    I wonder if you would get the same result for believers in things like homoeopathy?

  5. j l smith said,

    April 21, 2007 at 9:17 am

    Mark, the Gilovich experiment says absolutely nothing about the existence of runs of form in sport, a priori or otherwise. It’s a test of how humans perceive runs of form, but that’s not the same thing.

    (I wonder if the selection of basketball was deliberate: to the untrained eye (me) it looks more like Brownian motion than a sport).

  6. bazvic said,

    April 21, 2007 at 10:23 am

    The point about the honey bee losses is interesting. CCD (as the Septics call it) is a very old syndrome.

    It is described in text books that pre-date mobile phones by decades (Marie Celeste hives and similar).

    Typically the colonies appear fine in the early part of the year but come April they are empty yet full of stores.

    The 2005-2006 season was very bad for beekeepers in the UK, with losses typically 50 – 100%.

    The root cause is not surprising. In the late winter colonies are trying to grow and are under considerable stress. Many of the bees are old and anything that shortens their life span will tip the colony to destruction.

    To keep with the thread: whoever claimed a link between mobile phones and colony losses faced a choice between the grim certainty that the colony loss had been determined the previous year, and blaming (and so finding comfort in) something or someone that could, in principle, be fixed.

    In the case of the 2005-2006 losses, the trigger was probably the warm autumn: this led to higher populations of varroa in hives, which in turn spread more viruses, which in turn led to shortened worker bee life spans and so colony collapse. Cause and effect.

  7. cedgray said,

    April 21, 2007 at 11:28 am

    Not only is intuition a poor guide to accuracy in numerate fields, it’s very strongly skewed towards ‘agency’ – i.e. finding intentionality in causes.

    This is because most of the function of our frontal cortex is concerned with accurately sensing the intentions of other minds – our competitors, business partners and friends.

    So not only do we infer causation poorly, but the cause we settle on is almost invariably some agency: ‘Lady luck’ or ‘God’, or phrases like ‘The Whole Universe is Against Me Today!’

  8. raygirvan said,

    April 21, 2007 at 12:14 pm

    The Wikipedia List of cognitive biases has many more examples, and some are central to major science-vs-popular-belief controversies.

  9. Moganero said,

    April 21, 2007 at 12:50 pm

    “our abilities to distinguish an actual pattern” no doubt have great survival enhancing value, but evolutionarily the need to interpret statistical data is quite a recent phenomenon.
    Give us a few more million years and maybe we won’t need statisticians to interpret them – though I guess that while we still have people who have learnt to interpret them, we won’t need to develop the intuitive ability to do so.

  10. Bob O'H said,

    April 21, 2007 at 2:36 pm

    Gilovich did also look at “winning streaks” in sports, or at least he looked at the “hot hand”, i.e. the sequences of hits and misses. I think the title says what you need to know:

    Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17, 295-314.


  11. Geoff_S said,

    April 21, 2007 at 5:39 pm

    Are you suggesting that practice, training, fitness levels and self confidence don’t result in improved sporting performance? I know my cycle time trial times (when I was young and fit) improved with training, and the more I practice flying my RC aeroplanes the better my landings 🙂

    Similarly I’m sure with basketball shots. More practice – more goals (or whatever they’re called). However, I’m sure you’re right when referring to random events like gambling or card deals. Though I sometimes thought the gods were against me when playing light-hearted lunchtime bridge games and my hands were devoid of any points at all for the whole hour 🙂


  12. Nanobot said,

    April 22, 2007 at 3:29 pm

    Geoff, I think it is a case of ‘all things being equal’, but you are quite right that in sport that is hardly a realistic scenario.

  13. A.S. said,

    April 22, 2007 at 3:43 pm

    Geoff, nobody could deny the positive effects of practice, training, fitness, etc. on sporting performance. As far as I can tell, the research does not even show that “winning streaks” do not exist (in sports; obviously they cannot exist in gambling). What the Gilovich paper (and other research) does show is that players who are perceived or perceive themselves to have a winning streak do not actually perform significantly above their usual level of performance.

  14. Lise said,

    April 22, 2007 at 4:27 pm

    Training effects aside, doesn’t adrenaline (from scoring a goal or hitting a home run) have a somatic effect of its own? A nervous batsman is likely to be physically tense and therefore more prone to mis-hitting. A pumped-up batsman, fresh from hitting his last home run, is likely to be more physically relaxed and mentally focussed, and therefore likely to perform to the best of his ability. Sportsmen might refer to this effect as feeling “in the zone”. Although there’s a sound underlying point to be made about the perception of random data, there’s also a physiological effect in play when discussing sports performance.

  15. jackpt said,

    April 22, 2007 at 4:28 pm

    A.S., interesting, thanks, although with poker it’s a combination of skill and luck, despite being generally classified as gambling. Playing poker well is risk management. I only play tournaments and it’s usually only the skilful/disciplined players that get to the leader board. If I’m preoccupied with something there is a tendency in me to play worse hands or play badly (naturally I can only speak for myself), so I don’t play if I can’t give it my full attention. There are diverse cognitive elements that wouldn’t be needed in games of pure chance. People that believe in “good runs” and “winning streaks” often make very poor poker players.

  16. Ken Zetie said,

    April 22, 2007 at 5:49 pm

    Didn’t Stephen J Gould investigate this sort of thing as well – when lying in hospital recovering from some rare disease he studied baseball and basketball stats. It might be in “The Mismeasure of Man” but I’ve lent out all my SJG books :(. He looked at the idea that basketball players will claim they are ‘in the zone’ in a game and score a much greater run of hits than normal. He found no evidence in all the meticulous stats of anything other than a normal distribution and a fairly constant figure of % chance to score for a given player for a given season. No run of form, no ‘zone’, just occasionally the coin will come down eight times in a row as heads. Well worth a read…especially if I could remember which book!


  17. bootboy said,

    April 22, 2007 at 8:13 pm

    Well put as usual. The brain is just a great big pattern matcher – matching percepts to abstractions on a sequence of different levels.

    The pattern matching required to turn a pulse of light into an internal cognitive map of the visual scene is a staggeringly impressive task for an evolution-engineered device. It makes sense that our brains are wired to attempt to impose patterns on events, and that they regularly encounter false-positives. For various reasons that I don’t have time to go into here, I reckon that the optimal engineering solution for the particular problems faced by the brain is going to be biased towards false-positives rather than false-negatives.

    I’m also convinced that this propensity towards false-positives in pattern-matching is responsible, in an evolutionary sense, for the conception of god.

  18. apothecary said,

    April 22, 2007 at 9:37 pm

    Another top article. WRT being “in the zone” etc – I’ve several times heard people say something to the effect that immediately after they’ve been praised for doing something well (hitting a six, scoring a good break, playing a tricky piece of music well, etc), their next attempt to do the same will end in failure. This they ascribe to the praise. I can see there might be a psychological aspect to that (being cocky and over confident) but is it not more likely to be an example of regression to the mean? Their mean performance is, say, 7/10 on some arbitrary scale, they do something well (scoring 9/10) – their next act is more likely to be close to 7/10 than 9/10. Not being a statistician, I’d be grateful to be told if I’m barking up the wrong tree there.
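The regression-to-the-mean reading can be illustrated with a small simulation. This is a sketch under loud assumptions, none of which come from the comment beyond the 7/10 and 9/10 figures: performance is modelled as a fixed skill level of 7 plus independent Gaussian noise, "praise" is triggered by any score of 9 or more, and the function name and defaults are invented.

```python
import random
from statistics import mean


def praise_then_next(true_mean=7.0, noise_sd=1.0, praise_at=9.0,
                     attempts=100_000, seed=0):
    """Simulate pairs of independent attempts around a fixed skill level.

    Returns (mean praised score, mean score on the following attempt),
    where an attempt counts as 'praised' if it scores praise_at or more.
    """
    rng = random.Random(seed)
    praised, following = [], []
    for _ in range(attempts):
        first = rng.gauss(true_mean, noise_sd)
        second = rng.gauss(true_mean, noise_sd)  # independent of the first
        if first >= praise_at:
            praised.append(first)
            following.append(second)
    return mean(praised), mean(following)


if __name__ == "__main__":
    praised_avg, next_avg = praise_then_next()
    print(praised_avg, next_avg)
```

In this model the praise does nothing at all, yet the attempt after a praised one averages close to 7, well below the praised score, simply because the praised attempts were selected for being unusually good.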

  19. coracle said,

    April 22, 2007 at 10:50 pm

    It could be that if the performer breaks concentration enough to comment that they are ‘in the zone’ they then cease to be so.

  20. uriel said,

    April 22, 2007 at 11:18 pm

    The title reminds me of Carl Sagan who, when asked a question to which he didn’t know the answer, and when the questioner persisted with ‘But what is your gut feeling?’, replied:

    ‘I try not to think with my gut. If I’m serious about understanding the world, thinking with anything besides my brain, as tempting as that might be, is likely to get me into trouble.’

  21. Filias Cupio said,

    April 23, 2007 at 6:10 am

    I recall a Stephen Gould essay where he looked at games-with-hits runs by baseball players. With lots of players and lots of seasons, you expect some impressive looking runs by chance. He concluded that there was nothing beyond chance expectations *except* for the longest run (57 games, I think) which he said was exceedingly unlikely.

    It wouldn’t be hard to statistically test for “winning streaks” in athletes. E.g. I compile the batting performance of a cricketer for an entire season – for every delivery they faced, I record how many runs they scored and whether they got out. Now I extract at random just one of those deliveries, and give you all the rest of the data (in random order.) Your job is to give odds on what they scored in the delivery I withheld. Now suppose instead of giving you the season’s data all mixed up, I gave it to you in two lumps: deliveries faced by that player in the same week as the withheld one, and the rest. Could you better predict the score of the withheld delivery with this extra information?

    Program a computer to do this, feed it lots of different ‘withheld over’s, and for lots of batsmen…

    Of course, at best you establish a correlation, not a cause. The correlation might (e.g.) be more to do with whether the captain had told them to play for safety or for quick runs, or the skill of the opposing bowler, rather than whether the batter is ‘in the zone’.
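A very simplified sketch of the withheld-delivery test described above might look like this. It is an assumption-laden stand-in, not the procedure itself: it ignores dismissals, predicts a single expected score rather than odds, uses mean squared error to compare predictions, and the function name and data layout (a list of weeks, each a list of runs per delivery) are invented.

```python
from statistics import mean


def leave_one_out_errors(weeks):
    """weeks: list of lists of runs scored per delivery, grouped by week.

    For every delivery, predict its score twice:
      - from the mean of all other deliveries in the season;
      - from the mean of the other deliveries in the same week
        (falling back to the season mean if the week has no others).
    Returns (season_mse, week_mse), the mean squared errors.
    Needs at least two deliveries in total.
    """
    everything = [score for week in weeks for score in week]
    total, count = sum(everything), len(everything)
    season_sq, week_sq = [], []
    for week in weeks:
        for score in week:
            season_guess = (total - score) / (count - 1)
            if len(week) > 1:
                week_guess = (sum(week) - score) / (len(week) - 1)
            else:
                week_guess = season_guess
            season_sq.append((score - season_guess) ** 2)
            week_sq.append((score - week_guess) ** 2)
    return mean(season_sq), mean(week_sq)


if __name__ == "__main__":
    # Toy data with an extreme "streak": one dead week, one golden week.
    print(leave_one_out_errors([[0, 0, 0], [4, 4, 4]]))
```

If the same-week error is consistently lower than the season-wide error across many players, that suggests form clusters in time; as the comment says, it would still only be a correlation.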

  22. Camp Freddie said,

    April 23, 2007 at 10:21 am

    The difficulty is that there are too many variables.
    In basketball, you’ll have “winning runs” due to skill, mainly while the opposing team is unable to deal with your tactics (bad marking or being marked by a below-average player).
    These normally end after a timeout, once the opposition coach spots that he needs to re-organise tactics to deal with you.

    You’ll often get people saying they’ve lost their touch after a break, when really it’s your opponents that have fixed their tactics.

    You’ll also have a roughly equal number of relatively bad runs, when you’re being marked by an above-average player.
    I suppose that this is technically a winning run due to the lack of opponents’ skill, rather than an ‘in the zone’ increase in your own.

    Of course, you’ll also have winning runs due to luck – which often seem like skill.

  23. Delster said,

    April 23, 2007 at 10:34 am


    the evolutionary pressure to develop an instinctive grasp of stats is fairly low…. i don’t know anybody who got themselves dead from misunderstanding the stats…

  24. Ciarán said,

    April 23, 2007 at 1:22 pm

    Was this not pretty much the problem Apple first experienced with the iPod shuffle? It played tracks too randomly, and they had to reprogram the shuffle to make it less random.

  25. BrickWall said,

    April 23, 2007 at 1:47 pm

    Quick observation from today’s Telegraph relating to perceived cause.

    It seems PAT (a teaching union) are calling on the Govt. to undertake further research on Wi-Fi because a member has complained of becoming ill following the installation of Wi-Fi in their school. Strangely enough, when n=1 the whole sample seems to exhibit the problem!!

  26. andrew said,

    April 23, 2007 at 2:23 pm

    This reminds me of a 3-day course I went on courtesy of the wonderful people at social services, who were my employers at the time. It was titled Heuristics and Applied Psychology, and all I remember of it was an entirely pointless exercise in predicting the flipping of a coin, similar to the basketball scores, which was meant to illustrate that humans do indeed work with “gut reactions”. However, as the course was so dull we all tended to go for a liquid lunch, so the rest is quite hazy.
    These days I work in the health food industry (don’t, just don’t; we are not all loons), and the “gut instinct” or perceived value of some snake oils is eminently summed up by the much-reproduced entry from godlessgeeks:
    (1) My aunt had cancer.
    (2) The doctors gave her all these horrible treatments.
    (3) My aunt prayed to God and now she doesn’t have cancer.
    (4) Therefore, God exists.

    Just replace (3) with “I got some deer antler fur supplements” (yes, they exist)
    and (4) with “therefore, supplements work”, and you get a taste of my day at work.

  27. Dudley said,

    April 25, 2007 at 11:46 pm

    Re #16 (Ken)

    I think the book you’re thinking of is Life’s Grandeur by SJGould. Not one of his better ones.

  28. terry hamblin said,

    April 27, 2007 at 7:49 pm

    But I am sure it’s true that Manchester United never have penalties given against them.

  29. Dr Aust said,

    April 27, 2007 at 8:29 pm

    Hmm .. now there’s an interesting research project, terry. I would have thought some enterprising statistician or psychologist would have already looked into this splendidly real-world problem.

  30. bootboy said,

    April 28, 2007 at 3:07 am

    you won’t lose much spearing empty bushes every day, but decide just once that a lion’s a random pattern in the grass, and you’re out of the gene pool.

    That is such a better example than I had ready that I don’t know where to begin. Trouble is, I work on low level stuff – neural networks and how they form abstractions from percepts in a hierarchical chain. I always forget about the big things with teeth.

    * note to self – include more lions in justifications *

  31. jgw said,

    May 3, 2007 at 4:45 pm

    The example used is rather misleading in my opinion. Although the number of switches equals the number of repeats the total number of observations is odd.

    So in this case the ‘player’ has had a run of luck in the sense that he started on an O and that he got more X’s than O’s.

    Add an O to the end, so that the observations are equal, not the transitions, and the sequence looks very different.

    Looking at a sequence with equal number of transitions may well be more suitable, but perhaps the subject should have been asked if any part of the chain represented a streak of luck, rather than the whole chain.

  32. apothecary said,

    May 4, 2007 at 10:24 am

    Runs of “luck” do occur by chance of course – in the DICE therapy for stroke trial (Counsell et al BMJ 1994; 309: 1677-81, simulating stroke RCTs by rolling dice – a 6 = a death, 1-5 = survival), in one “trial” n=20, the 10 rolls for the patients in the “treatment” arm produced no “deaths”, but there were six “deaths” among the 10 “control” patients. The trialist describes how, when rolling the control group, the room fell eerily quiet as he rolled the fourth six in a row – this had never happened before! I think the chance is 1 in 6^4, or 1 in 1296. In my rough and ready way, I reckon that’s about P=0.0008 (I realise that’s not a strictly correct interpretation of P, but it serves to contrast that chance finding with conventional levels of statistical significance). The DICE paper is, BTW and IMHO, a superb exposition of the play of chance, the need for large RCTs, the perils of MAs of small trials, the need for funnel plots, how results can be hyped, etc, etc
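The arithmetic in the comment above can be checked exactly. This is only a sketch assuming a fair six-sided die and independent rolls; the function names are invented, and the binomial helper is an extra not mentioned in the comment.

```python
from fractions import Fraction
from math import comb

P_SIX = Fraction(1, 6)  # a six counts as a "death" in the DICE setup


def four_sixes_in_a_row() -> Fraction:
    """Probability that four given rolls of a fair die are all sixes."""
    return P_SIX ** 4


def exactly_k_deaths(k: int, rolls: int) -> Fraction:
    """Binomial probability of exactly k sixes in the given number of rolls."""
    return comb(rolls, k) * P_SIX ** k * (1 - P_SIX) ** (rolls - k)


if __name__ == "__main__":
    print(four_sixes_in_a_row(), float(four_sixes_in_a_row()))
    print(float(exactly_k_deaths(0, 10)))  # no deaths in a 10-patient arm
    print(float(exactly_k_deaths(6, 10)))  # six deaths in a 10-patient arm
```

The first figure is 1/1296, which rounds to the 0.0008 quoted. Note that this is the chance for four specified rolls; the chance of a run of four sixes turning up somewhere within ten rolls is higher.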

  33. Robert Carnegie said,

    May 7, 2007 at 11:36 am

    Our pattern-seeking skill means that any random number is liable to have something peculiar about it. A simple case (Martin Gardner, I think) would be randomly choosing 10 digits, each 0 to 9. If one or more than one digit appears repeated then it looks as though the system is biased – a little – but even when you choose 9 different digits in advance, the chance that the 10th random choice is different to all the others is only 1 in 10.

    I’m not sure how this is handled formally, but I think a number stops being random when you know its value.
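The ten-digit example can be worked through exactly. A minimal sketch, assuming each digit is drawn independently and uniformly from 0 to 9; the function names are invented for illustration.

```python
from fractions import Fraction


def prob_all_distinct(draws: int = 10, symbols: int = 10) -> Fraction:
    """Probability that every one of the draws is a different symbol."""
    p = Fraction(1)
    for already_seen in range(draws):
        p *= Fraction(symbols - already_seen, symbols)
    return p


def prob_next_is_new(already_seen: int, symbols: int = 10) -> Fraction:
    """Probability the next draw differs from already_seen distinct symbols."""
    return Fraction(symbols - already_seen, symbols)


if __name__ == "__main__":
    print(prob_next_is_new(9))           # the "1 in 10" case
    print(float(prob_all_distinct()))    # all ten digits different
```

The chance that all ten digits differ is about 0.00036, so a ten-digit random number with at least one repeated digit is the overwhelmingly normal case, however suspicious the repeat looks.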

  34. bootboy said,

    May 8, 2007 at 7:08 pm

    “but I think a number stops being random when you know its value.”

    AFAIK it’s when you can predict its value from the preceding sequence. It’s why most computer programming languages use pseudo random numbers rather than random numbers – if you know the seed and the algorithm, you can calculate the next value. If you need true randomness, you need an external source of entropy – for example a plate that detects particle impacts – which themselves are not purely random (they’re chaotic) but for all intents and purposes, they might as well be (you’d need a particle-level model of the universe to make the prediction).
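The point about seeds is easy to demonstrate. The sketch below uses Python's standard `random` and `secrets` modules; the wrapper function names are invented for illustration, and the operating system's entropy pool stands in for the "external source" mentioned above.

```python
import random
import secrets


def pseudo_random_digits(seed: int, count: int = 10) -> list[int]:
    """Digits from a seeded generator: same seed, same digits, every time."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(count)]


def os_entropy_digits(count: int = 10) -> list[int]:
    """Digits drawn via the operating system's entropy source (no seed)."""
    return [secrets.randbelow(10) for _ in range(count)]


if __name__ == "__main__":
    print(pseudo_random_digits(42))
    print(pseudo_random_digits(42))  # identical to the line above
    print(os_entropy_digits())       # not reproducible from a seed
```

Anyone who knows the seed and the algorithm can regenerate the first two lines exactly, which is why seeded generators are fine for simulations but not for anything that must be unpredictable.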
