Crystal Balls… and Positive Predictive Values

December 9th, 2006 by Ben Goldacre in bad science, statistics | 45 Comments »

Ben Goldacre
Saturday December 9, 2006
The Guardian

Basically all those childish columns about Dr Gillian McKeith PhD are just a cover, so that every few months I can lay some viciously complicated maths on you, as background to a major news issue. This week, after a major government report, we heard that one murder a week is committed by someone with psychiatric problems. Psychiatrists should do better, the newspapers told us, and prevent more of these murders.

It’s great to want to reduce psychiatric violence. It’s great to have a public debate about the ethics of preventive detention (for psychiatric patients and other potential risk groups, perhaps). Before you career off and have that vital conversation, you need to understand the maths of predicting very rare events.

Let’s take the very concrete example of the HIV test. The figures here are ballpark, for illustration only. So: what do we measure about a test? Statisticians would say the HIV blood test has a very high “sensitivity” of 0.999. That means that if you do have the virus, there is a 99.9% chance that the blood test will be positive. Statisticians would also say the test has a high “specificity” of 0.9999 – so if a man is not infected, there is a 99.99% chance that the test will be negative. What a smashing blood test.

But if you look at it from the perspective of the person being tested, the maths gets slightly counterintuitive. Because weirdly, the meaning, the predictive value, of a positive or negative test that an individual gets, is changed in different situations, depending on the background rarity of the event that the test is trying to detect. The rarer the event in your population, the worse the very same test becomes.

Let’s say the HIV infection rate amongst high risk men in a particular area is 1.5%. We use our excellent blood test on 10,000 of these men and we can expect 151 positive blood results overall: 150 will be our truly HIV positive men, who will get true positive blood tests; and one will be the one false positive we could expect, from having nearly 10,000 HIV negative men being tested with a test that is wrong one time in 10,000. So, if you get a positive HIV blood test result, in these circumstances your chances of being truly HIV positive are 150 out of 151. It’s a highly predictive test.

But now let’s use the same test where the background HIV infection rate in the population is about one in 10,000. If we test 10,000 people, we can expect two positive blood results overall. One from the person who really is HIV positive; and then one false positive that we could expect, again, from having 10,000 HIV negative men being tested with a test that is wrong one time in 10,000.

Suddenly, when the background rate of an event is rare, even our previously brilliant blood test becomes a bit rubbish. For the two men with a positive HIV blood test result, in this population where one in 10,000 have HIV it’s only 50:50 odds on whether you really are HIV positive.
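The two worked examples above can be checked in a few lines of code. This is a sketch of my own (not part of the column), just applying Bayes’ theorem to the figures quoted:

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: the chance that a positive test
    result is a true positive, given the background rate."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same excellent test, in two different populations:
print(ppv(0.999, 0.9999, 0.015))    # high-risk group, 1.5% infected: ~0.993, i.e. 150 out of 151
print(ppv(0.999, 0.9999, 0.0001))   # background rate 1 in 10,000: ~0.5, i.e. 50:50
```

Note that the only thing changing between the two lines is the prevalence; the sensitivity and specificity stay put.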

Now let’s look at violence. The best predictive tool for psychiatric violence has a “sensitivity” of 0.75, and a “specificity” of 0.75. Accuracy is tougher when you are predicting an event in humans, with human minds, where the prediction changes human lives. Let’s say 5% of patients seen by a community mental health team will be involved in a violent event in a year. Using the same maths as we did for the HIV tests, your “0.75” predictive tool would be wrong 86 times out of 100. For serious violence, occurring at 1% a year, with our best “0.75” tool, you inaccurately finger your potential perpetrator 97 times out of a hundred. Will you preventively detain 97 people to prevent three events? And for murder, the extremely rare crime in question, occurring at one in 10,000 a year among patients with psychosis? The false positive rate is so high that the best test is almost entirely useless. I’m just giving you the maths on rare events. What you do with it is a matter for you.
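The violence figures come out of the same arithmetic. Again, this is my own sketch, not part of the column, using the quoted 0.75/0.75 tool:

```python
def proportion_wrongly_flagged(sensitivity, specificity, prevalence):
    """Of everyone the tool flags as dangerous, the fraction who
    would never have been violent (i.e. 1 minus the PPV)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return false_pos / (true_pos + false_pos)

for event, rate in [("violence, 5% a year", 0.05),
                    ("serious violence, 1% a year", 0.01),
                    ("murder, 1 in 10,000 a year", 0.0001)]:
    wrong = proportion_wrongly_flagged(0.75, 0.75, rate)
    print(f"{event}: {wrong:.0%} of those flagged are false positives")
```

The first two lines reproduce the 86-in-100 and 97-in-100 figures; for murder the proportion is over 99.9%, which is what “almost entirely useless” means in numbers.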

Extra Reading:

I sound like a school mistress when I talk about extra reading.

Gerd Gigerenzer has written a lot about risk; the explanation above is largely drawn from his work on HIV screening, and his excellent book “Reckoning With Risk” is a ripping ride through a lot of similar issues:

George Szmukler has written on the same thing, but looking specifically at psychiatric risk; there are two papers here which seem to be free to access (they didn’t ask me for my Athens login from home, at any rate):

There is also a Gigerenzer paper from the BMJ which should be free to access; not quite on this topic, but very good. If anyone finds a free access Gigerenzer paper that does the HIV stuff in detail, do please let me know:

… and power to the Open Access Journals movement. There are some other great papers, but they’re behind the closed doors of their journals’ login pages. It’s a disgrace that all those of you without a current academic institution Login/Password but who are interested in the questions can’t read these edifying academic papers for free online. Information wants to be free etc.


BUCHANAN, A. & LEESE, M. (2001) Lancet, 358, 1955-1959. A review of risk assessment tools.

MONAHAN, J., STEADMAN, H. J., SILVER, E., et al (2001) Rethinking Risk Assessment: The MacArthur Study of Mental Disorder and Violence. Oxford: Oxford University Press. The tool that gives 0.75.


45 Responses

  1. SciencePunk said,

    December 9, 2006 at 4:13 pm

    Nice job explaining some complicated math! Any plans to clear up the Monty Hall problem once and for all?

  2. Karellen said,

    December 9, 2006 at 5:29 pm

    Hmmmm…..aren’t you mixing up the false positive and false negative rates? Or just assuming they’ll always be the same?

    Or just assuming that both will be “high enough” for the illustration that it won’t make any practical difference?

    (Bruce Schneier has some posts about how false positives and false negatives cause problems; top google result is

    but there are a few more on his site if you look around.)

  3. RS said,

    December 9, 2006 at 5:58 pm

    What is the increased risk of violence or murder in psychiatric patients over and above that of a matched sample?

  4. jimyojimbo said,

    December 9, 2006 at 6:16 pm

    Am interested in your, and other readers’, views on open, i.e., free, access to scientific literature. In an ideal world, obviously it would be great. But the peer review system is necessary, we all know that, and needs regulating somehow. Respected peer-reviewed journals will cost money to run (even if, so I hear, some publishers make absolute sh1t loads of money). So should the cost be placed on academic institutions as now? This, as you say, means a lot of stuff is unavailable to non-affiliated individuals, the general public, and also to less affluent academic institutions, especially in the developing world.

    The alternative? A friend of mine who works in academic publishing described a proposed system whereby journals would be free, and funds raised from people submitting papers. Pay-per-submission – regardless of whether a submission is accepted. The danger in that is obvious – less affluent research groups squeezed out of the “market”, and industry funded research getting first dibs.

    Rock and a hard place, innit? Anyway, would be interesting to hear views on that.

  5. Bob O'H said,

    December 9, 2006 at 7:21 pm

    Wow, all that without a mention of Rev. Bayes!

    Karellen – Ben’s post is about false positive and negative rates (they’re 1-specificity and 1-sensitivity respectively). The site you link to explains the same problem.

    Incidentally, the maths is also explained by Bland and Altman in their Statistics Notes for the BMJ: (check the Diagnostic Tests notes).


  6. BSM said,

    December 9, 2006 at 7:50 pm

    These are points that are very hard to get across to people.

    Here is a company offering a “screening” test for lymphoma in dogs with a sensitivity and specificity of 84% and 83%, respectively.

    If we guess the prevalence to be 1%, which is too high probably, the PPV is 5%. In other words, 19 out of 20 positives will be false and 17 times out of 100 tests you will have created needless worry. They are playing a sneaky little card by emphasising the benefits of a negative result (NPV=99.8%). So, you can scare your pet owner with the idea of a 1% risk then spend a load of money on a test and leave them a 0.2% risk after a negative result.

    Thar’s money in them NPV’s.

    Suffice to say I shall not be using this test.

  7. BSM said,

    December 9, 2006 at 7:53 pm

    If the prevalence is really 0.1%, which is more likely, then only 1 in 200 +ves will be real!
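BSM’s figures check out. Here is a quick sketch (mine, assuming the quoted 84% sensitivity and 83% specificity) that reproduces both the PPV and the flattering NPV:

```python
def ppv(sens, spec, prev):
    # chance a positive result is a true positive
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    # chance a negative result is a true negative
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

print(ppv(0.84, 0.83, 0.01))    # ~0.048: about 1 in 20 positives real at 1% prevalence
print(npv(0.84, 0.83, 0.01))    # ~0.998: the headline negative predictive value
print(ppv(0.84, 0.83, 0.001))   # ~0.005: about 1 in 200 positives real at 0.1% prevalence
```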

  8. Ben Goldacre said,

    December 9, 2006 at 8:06 pm

    wow, thanks for Petscreen, i’ll keep it at the back of my mind, i keep meaning to do something on the maths of screening generally, thats something else that the media cover spectacularly badly, it’s just so tempting for them to assume that action is better than inaction, while any medical student could tell you that assessing the benefits of a screening program is not just a fascinating problem, it’s also, once you’ve learnt the chapter, a total banker question for any epidemiology finals paper.

    anyway, i just got this over email. “What Ben Goldacre had to say was so surprising that I checked it with a spread-sheet, which I have attached in case you might be interested. Best wishes, Seamus O’Connell”

    i can honestly say that very few things could make me happier than the thought of someone reading the column and then firing up Excel on a saturday morning to check my working. bravo and quite right too. it’s a very useful .xls file, which you can download and tinker with here, or there’s an image of it here:

  9. tomh said,

    December 9, 2006 at 10:20 pm

    I must say that all this false positive/negative stuff is really hard to grasp unless you actually try doing the maths for yourself, but once you have – it’s fascinating. Shame you have to be doing A-level maths to have any chance of doing it though.

  10. Ben Goldacre said,

    December 9, 2006 at 10:24 pm

    i dont think the maths itself is very advanced, you just have to reeeeeaaally concentrate. that’s what’s neat about it.

  11. kingshiner said,

    December 9, 2006 at 10:44 pm

    Has mammography already been done to death on Badscience? I couldn’t find anything in the archive…. anyway the PPV of mammography is about 11% in over 50-year-olds, so far more false than true positives. Women having this test over and over again should expect a high probability of a positive mammography at some time during their screening career.

  12. Ben Goldacre said,

    December 10, 2006 at 1:34 am

    i last remember seeing foolish pleas for mammography to be extended to younger age groups ages ago, i think before i started doing the column. if you spot any in mainstream media do let me know and i will activate attack mode. the last time i remember a screening column opportunity coming up was when some companies offering MRIs and stuff were getting some flattering coverage about 18 months ago. a single issue like mammography is a much cleaner kill tho.

  13. Cytos said,

    December 10, 2006 at 4:25 am

    Wikipedia has a nice little section using Bayes to calculate false positive rates in medical tests:

    it also includes some information about the use of DNA testing in court, which often suffers from the same misconceptions about probabilities.

    Also to jimyojimbo: There are already a series of open access journals PLOS probably being one of the most reported:

    Otherwise here’s a list of them:

  14. Mark Frank said,

    December 10, 2006 at 10:08 am

    The maths is fine of course, but a couple of comments.

    In the HIV example I would say a test that shows that your chances of having HIV have increased from 1 in 10,000 to 1 in 2 is rather useful and important – hardly rubbish.

    In the psychiatric example, I don’t see that the Bayesian analysis adds anything. The key estimate that we need is – if a subject has a case history Y then what are the chances that they will be violent in the future. In the case of HIV this is perhaps most easily calculated through asking about the sensitivity and specificity of the test and the prevalence of the condition in the population at large. That’s more practical than giving a large number of random people the test and seeing how many develop HIV. But in mental conditions surely the opposite is true. There is not a condition “will be violent” and a test for it. Rather there is a case history including not just mental history but also domestic and social circumstances which may or may not lead to violence. A Bayesian approach would require knowing what proportion of violent people had this type of case history (assuming you can categorise it satisfactorily) and what proportion of non-violent people had the same case history, as well as the prevalence of violent people in the population. This is a pretty tricky retrospective study. Surely it would be simpler to look at the people known to have that type of case history and track how many became violent?

  15. Joe said,

    December 10, 2006 at 10:37 am

    The HIV infection example is an interesting one, and there’s a further subtlety that’s often ignored.

    People presenting themselves for an HIV test often (though obviously not always) do so because they think they may have been exposed. The prior for the false positive calculation should be the fraction of people seeking testing who have the virus, not the population fraction – this pushes down the chance that a positive is false.
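Joe’s point about the prior is easy to demonstrate. Suppose, purely hypothetically, that people who come forward for testing are a hundred times more likely to be infected than the general population; the predictive value of a positive jumps accordingly:

```python
def ppv(sens, spec, prev):
    # positive predictive value via Bayes' theorem
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# General population: 1 in 10,000 infected.
print(ppv(0.999, 0.9999, 0.0001))  # ~0.50
# Hypothetical self-selected testers: 1 in 100 infected.
print(ppv(0.999, 0.9999, 0.01))    # ~0.99
```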

  16. RS said,

    December 10, 2006 at 12:19 pm

    Re: Open access journals – all the ones I’m familiar with charge you for submission (e.g. but I think they have institutional membership (i.e. the institutions pay upfront and there are no further publication costs) and often you can include publication costs into your grant.

    Re: Mammography in the young, I think it was Friday (or maybe Thursday), in either the Telegraph or the Guardian, that I saw an article advocating exactly that. Ah, found it: one sensible(ish) bit followed by anecdote with some alarming fallacies on display

    I never cease to be amazed by the amount of money spent per life saved on breast screening. You’d get a lot of Herceptin for that.

  17. Ben Goldacre said,

    December 10, 2006 at 1:10 pm

    mark frank #14: if you follow the references in Szmukler’s paper linked above on risk prediction, there are plenty of studies on predicting risk in various ways which follow up and examine the outcomes from the predictions.

    re #16 screening etc. i think it’s really interesting that prof michael baum, cancer chap, has done a lot of work on both quackbusting (healthwatch, numerous chapters, papers and letters to times etc) as well as on disabusing people of the more grand claims of screening advocates.

  18. AndrewT said,

    December 10, 2006 at 3:25 pm

    Cracking article ben, even my mum understood the stats.
    The side issue of compulsory treatment orders was discussed on the Moral Maze this week.

    My friend had a mammogram at 19, as her mum had a positive one. She also came back positive. Neither had it in a further analysis, and the unnecessary trauma this caused for her was ridiculous. You’ll always get a Daily Mail anecdote that ‘a mammogram saved my life’ from a 20 yr old, and never a ‘my life was turned upside down for a month when i thought i had cancer’ from the much higher number of people affected.

  19. Twm said,

    December 10, 2006 at 5:29 pm

    Thanks Ben, very informative. I spent part of my sunday afternoon hung over in my dressing gown fiddling with open office spreadsheet and a mug of coffee.
    Counter intuitive statistics are definitely worth immersing yourself in till you are convinced (à la Monty Hall, as someone mentioned already). I find that as well as Excel, a high level language such as Python is useful for knocking up quick models of populations and statistics.

  20. kim said,

    December 10, 2006 at 7:44 pm

    Thanks – I found this piece a real eye-opener: all stuff I didn’t know before.

    BTW, Bad Science is mentioned briefly in this week’s Private Eye in the context of Gillian McKeith, if you haven’t seen it.

  21. Mark Frank said,

    December 10, 2006 at 8:36 pm

    I have either failed to understand something or failed to make myself clear. But then again it is all quite subtle so I may be totally screwed up.

    My issue is not with studies that “follow up and examine the outcomes from the predictions”. I absolutely would expect them to exist and be useful. But surely you don’t need a Bayesian analysis to use that data and you don’t need to know the frequency of violent or non-violent types in the population. You just measure the proportion of people that are predicted to be violent but turn out not to be; and the proportion that are predicted not to be violent but turn out to be. If you are trying to make a judgement on an individual that’s all the probability data you need (of course, if you are assessing how many violent people will go undetected over a year then you would want some population information).

    As far as I can see it is only necessary to use a Bayesian approach if your data is of the opposite kind – what is the proportion of each prediction given a specific outcome e.g what proportion of people committing violent acts would have been predicted to be violent had they been assessed. This appears to be mind-blowingly hard to measure without bias and a pointlessly complicated way of getting the probabilities you actually need.

    The more I think about it – the more it seems to me the HIV case and the mental health case are not analogous. I expect I’m wrong….but I can’t see why.

  22. Ben Goldacre said,

    December 10, 2006 at 8:46 pm

    Mark Frank, your knickers are officially in a twist, maybe try reading it again, along with this article by Martin Bland (great name for a statistician) on PPV:

  23. Robert Carnegie said,

    December 11, 2006 at 12:22 am

    Mark: I think the condition that you would be screening for is “Is Liable To Go Off On One”. You would not want to have psychiatric patients wandering around who are Liable To Go Off On One. You have a patient review protocol that re-assesses treatment for patients and provides an estimate of whether they will be Liable To Go Off On One if treatment is followed. And you try to treat them accordingly.

    Ben raises doubts about this precautionary principle of locking people up because they are Liable To Go Off On One. After all, anyone in the street is a little bit Liable To Go Off On One, for many sorts of reason. This is the argument I made on the forum – but not what Ben’s taken up for his column – that a very small minority of UK manslaughter is by certified lunatics; most is by normal people. This figure in turn is dwarfed by road death statistics, so we should lock up all motorists and save more lives than if we put away the poor old looneys.

    This is a question of scale, you see – one killing a week sounds like far too common an occurrence, but there are other worse nastinesses that we put out of our mind.

    There also are laws against locking people up because they might do something nasty rather than because they have – unless it’s might do terrorism – so we can’t properly put away a hundred-odd mentally ill people because if we don’t then we expect to see a couple of them kill somebody, but we don’t know which. Except that probably we can lock them up.

    Mental health is said to be a “Cinderella service” of the NHS that hasn’t had enough attention in recent years while spending in other services has increased, and improvement of treatment seems like a good idea here, notwithstanding difficulties such as screening. But how much improvement would it take to improve the image? Even the BBC wilfully interpreted the report to mean “one killing a week”; would they be any quieter about failure in the NHS if it was down to one killing a month by a looney? Would the service have to beat the Typical Man In The Street murder risk by a long way? It’s a bit like that old joke about women having to do their job twice as well as a man to get the same respect.

  24. Ben Goldacre said,

    December 11, 2006 at 12:40 am

    robert: if i’d had space i would have put the “murder a week” stat in its context like you suggest, ie it makes up 5-10% of all homicide, has been stable over time while homicide increases, etc. i kind of feel like anyone could do that job though, and i’ve been sanctimoniously somewhat disappointed by the fact that the everyday political columnists have failed to generate that kind of critique of the “murder a week” coverage.

    the issue of screening everyone is covered by szmukler in one of the papers i reffed above:

    “Discrimination can be avoided in two ways: first, by making legislation generic so that all persons are liable to preventive detention on the basis of having exceeded a particular threshold in the risk they pose to others; that is, all those in the ‘dangerous’ circle in Fig. 1, not just those in the segment ‘a’; or second, not allowing preventive detention for anyone.

    “In relation to risk assessment, the argument for fairness has special relevance. If persons with mental disorder should not be singled out as liable for preventive detention on the basis of risks to others, it follows that risk assessment to predict who will be violent should not be restricted to this group of persons (i.e. to only those in the ‘mental disorder’ circle in Fig. 1). If we wish to reduce violence in our community and if we are to use risk assessment to help, then all of us (i.e. everyone in the large outer circle in Fig. 1) should be equally liable to be assessed when there is some kind of ‘trigger’ event indicating that such testing is appropriate. How this might be implemented is unclear. People subject to a risk assessment might be all those who have been involved in a violent incident of any kind (for example, persons with injuries seen by general practitioners, accident and emergency departments, units dealing with trauma, or by police), all those who misuse substances (including alcohol), all those who have been involved in a road traffic accident, those who have been the subject of accusations of threatening or aggressive behaviour, say by neighbours or in the workplace, and so on. Risk assessments would thus become the business of many agencies – health professionals (not just mental health) in any setting, the police and social agencies (including local authority services, homeless persons units, employment offices). ‘Neighbourhood watch’ schemes could assume a whole new dimension.

    “I doubt that many of us would support such an approach, even if the ‘numbers’ were able to lead to accurate prediction. Yet we accept such procedures in the case of those with mental disorder. This double standard exposes the extent to which we discriminate against those with mental illness, devaluing their rights compared with ours.”

  25. gjf said,

    December 11, 2006 at 2:56 am

    The same type of thing crops up with drug testing before employment.

    Sensitivity :- the chance that a clean person has their life ruined

    Specificity :- the chance that a complete smackhead gets the job

    Mythbusters tested some devices that were way too sensitive to opiates and got positive results from eating poppy seed bagels

  26. parkenf said,

    December 11, 2006 at 9:57 am

    The example of the HIV test is an interesting one as its sensitivity and specificity are astoundingly high. This is not always the case – certainly not for “screening” type programs as alluded above.
    Take cervical cancer: annual incidence (new cases) is about 1 in 20,000. So given a set of women not already known to have cervical cancer, you’d expect about 1 in 20,000 to have it, at any one time.
    The tests: (new) FBC has sensitivity of 90%, and specificity of 85%
    (conventional) Pap has sensitivity of 79%, and specificity of 89%
    Suppose we test a million women annually. From our incidence rate we get that 50 of these women have cancer or cancerous conditions that should be picked up by the test.
    Using the FBC test of the 999950 clear women, 150000 will have false positives which may need following up. Testing the 50 affected women, 5 will have false negatives.
    Using the Pap test of the 999950 clear women, 110000 will have false positives. Testing the 50 affected women, 10 will have false negatives.
    Bear in mind that a lot of these follow up measures are invasive, distressing, and even without them the wrong diagnosis can be very worrying (even if you understand the stats! Which I still can’t quite bring myself to believe).
    Now presumably this is why the test requirement is every 5 years (is it?). This pushes up the number of cancers to 250 per million, and these are less likely to be treatable. But the alternative is a staggering number of false positives every year.
    This is why many people think screening for rare diseases is counterproductive. Even with the most specific tests (and these aren’t particularly specific) you get an enormous number of false positives. Really, how much better off are you knowing that you’ve now only got 150,000 women per million to test to find the (now 45) real cancer patients?

    On another point, this relates to a popular misunderstanding of school level statistics – well I studied maths with some statistics at university and I certainly never considered this. Generally the only statistical analysis taught to students either in maths or science is null hypothesis testing. Assume H0 (e.g. there is no effect) as against H1 (there is an effect) do your experiment, gather the data, reject H0 with p=.05. All well and good (disregarding the fact that my school level experimental skills were so bad that given the error bars any hypothesis was plausible).
    The error is in the logical fallacy: p(H0 is false) = p(H1 is true). That this is not generally true is demonstrated in all these examples.
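parkenf’s screening arithmetic can be reproduced directly from the quoted sensitivities and specificities. A sketch of mine, using the comment’s own ballpark figures:

```python
n = 1_000_000
affected = n / 20_000          # 50 women per million with the condition each year
clear = n - affected

# (name, sensitivity, specificity), as quoted in the comment above
for name, sens, spec in [("FBC", 0.90, 0.85), ("Pap", 0.79, 0.89)]:
    false_positives = clear * (1 - spec)
    false_negatives = affected * (1 - sens)
    print(f"{name}: ~{false_positives:,.0f} false positives, "
          f"~{false_negatives:.0f} missed cases")
```

This reproduces the roughly 150,000 and 110,000 false positives per million tests, against only 50 real cases.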

  27. Dr Aust said,

    December 11, 2006 at 11:35 am

    Completely facile compared to all this weighty number-crunching, but the idea of whether you could predict sociopathy in a “scientific” way, and what that would entail morally, was the subject of Phillip Kerr’s best techno-thriller, “A Philosophical Investigation”

    – which was supposed to be becoming an American movie directed by Chris “C X Files” Carter, though doesn’t seem to have got made yet.

  28. Dr Aust said,

    December 11, 2006 at 11:44 am

    Slightly more seriously:

    The trouble with the “released nutter kills” story is the combination of such an event being (i) utterly random; (ii) unexpected; (iii) gruesome; (iv) rare, shocking and thus ghoulishly media-worthy.

    Car crash deaths meet i, ii, and often iii, but since they don’t meet iv the details are not picked over in public and crash deaths become sort of “rare but commonplace”. If someone you vaguely know is killed in a car smash you are shocked at the death, but not so much at the manner of it.

    Murders most commonly meet ii (although not always – think cases of culmination of repeat domestic violence), but usually only get reported if they have added “special circumstances” (reportable gruesomeness).

    Anyway, events like the random murders of Megan and Lin Russell, or of Jonathan Zito, by mental patients generate such a media storm that rational judgement goes out the window in the face of campaigns, tabloid outrage, and political posturing and knee-jerking.

    In days of yore this would be the stimulus to set up some sort of judicial enquiry, or Royal Commission, that could look at the issue properly, including picking over the kind of stats Ben discusses in the column . But nowadays Govts only have one of these fora if they want to bury the issue in endless hearings – see e.g. the Bloody Sunday enquiry.

    Finally, re screening, Private Eye used to have a campaign arguing that screening men over a certain age (65?) for AAAs (abdominal aortic aneurysms) would be much more cost-effective than any of the current health screening programmes. Might make an interesting counter example. Of course, it is not just the screening stats that are influential, but also the likely health outcome of an “undetected nasty” (probability that a cancerous cell will go on to be a life-threatening cancer, or that an aneurysm will rupture) AND the “political context” (screening for certain things is a political hot-button, breash cacner being an example).

    Another screening test widely touted but (as I remember) likely to lead to loads of false positives, scares and follow-up nasty tests is measuring PSA levels in men to assess risk of prostate cancer. This example highlights what happens with screening in insurance-based systems, where medical providers can make money pushing the idea of screening “direct to the consumer”. I am pretty sure this happens with prostate screening in the US, and I also read somewhere that people over 45 or so in the US are now rushing in to have regular colonoscopies out of a fear of bowel cancer.

  29. Dr Aust said,

    December 11, 2006 at 11:48 am

    Excuse the spelling – meant “breast cancer”, of course.

  30. Dr Aust said,

    December 11, 2006 at 12:10 pm

    Finally, as someone else mentioned above, what is really interesting – at least to me as a teacher of often mathematically lazy (rather than per se ignorant) science and medical students – is how it all makes sense ONCE YOU PUT SOME REAL NUMBERS THROUGH THE STATISTICS – see eg #26.

    This is one of those points one makes at the students over and over – to understand it, run the numbers yourself, don’t just memorize the probabilities without context.

  31. Gleamhound said,

    December 11, 2006 at 1:17 pm

    A psychiatrist friend used to take a pragmatic approach to patients where there was a concern of future violent behaviour.

    These patients would be admitted for a few days to the psychiatric unit for “Star testing”, which was a battery of standard tests and assessments. They would be discharged home/to the community afterwards, when, as usual, the tests did not reveal anything much.

    The reason this was called Star testing was that if the patient then did go and do something bad, the psychiatrist did not end up on the front page of the Daily Star. Or at least if he did, he could justifiably say that the patient had been properly assessed with the correct (presumably evidence-based) tools.

  32. The_Master said,

    December 11, 2006 at 4:21 pm

    I have found this really interesting; the mention of breast cancer screening interested me. I was sent an article last week written by a patient asking for a Prostate Screening Program to be set up; however, at the same time he highlighted the shortcomings of such a program: as you know, PSA is not that specific. 33% of people tested will get a false positive (high PSA, no cancer) on PSA alone, and 20% of people tested will get a false negative (normal PSA, has cancer). He talked himself out of it.

  33. wewillfixit said,

    December 11, 2006 at 5:19 pm

    Going away from the statistics somewhat: if you look past the media coverage to the report itself, you might find that there is more to preventing violence by the mentally ill than “locking them up” – like better follow-up after discharge, assertive outreach etc for high risk groups of patients.

  34. apothecary said,

    December 11, 2006 at 6:03 pm

    Oops. Sorry parkenf. I scrolled straight past and missed your entry for some reason. Grovelling apologies for making the same point you did re cervical screening (but not nearly as well as you did)

  35. kingshiner said,

    December 11, 2006 at 7:32 pm


    I see what you mean – a test which is over sensitive can become non-specific (innocent bagel-eaters get banged up), but strictly the chance that a clean person has their life ruined is 1-specificity and not the sensitivity itself. Similarly a test which is over-specific can become insensitive but the chance that a junkie gets missed is 1-sensitivity, not the specificity. For tests with a continuous result you can pick your tradeoff between sensitivity and specificity by tinkering with the threshold at which you declare the test positive.

    This little bit of maths was applied in WWII where radar operators had to try to distinguish aircraft from everything else using not very good radar. If they had too low a threshold for declaring a little green splodge hostile, Biggles would keep getting scrambled to intercept sparrows; too low and some town would get bombed flat.
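    [Editor’s sketch: the threshold tradeoff kingshiner describes can be run with real numbers, as #30 urges. The scores below are invented purely for illustration; any test with a continuous readout would do.]

    ```python
    # Moving the positivity threshold trades sensitivity against specificity.
    # 'cases' and 'controls' are made-up continuous test scores.
    cases = [3.1, 4.0, 2.8, 5.2, 3.9]      # people who truly have the condition
    controls = [1.0, 2.2, 1.8, 2.9, 1.5]   # people who truly don't

    def sens_spec(threshold):
        # Sensitivity: fraction of cases scoring at or above the threshold.
        sensitivity = sum(s >= threshold for s in cases) / len(cases)
        # Specificity: fraction of controls scoring below the threshold.
        specificity = sum(s < threshold for s in controls) / len(controls)
        return sensitivity, specificity

    print(sens_spec(1.5))  # low threshold: (1.0, 0.2) - sensitive but non-specific
    print(sens_spec(3.0))  # high threshold: (0.8, 1.0) - specific but less sensitive
    ```

    With the low threshold Biggles gets scrambled for sparrows (poor specificity); with the high one some aircraft get missed (poorer sensitivity).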

  36. kingshiner said,

    December 11, 2006 at 7:37 pm

    sorry, that should read too high and some town gets bombed…

  37. Twm said,

    December 13, 2006 at 3:25 pm

    The rather good TED lectures include a talk by Peter Donnelly on misleading statistics (including HIV tests, cot deaths, etc.).

  38. Mark Frank said,

    December 16, 2006 at 11:59 am

    I thought this thread was dead. But as the odd person is still looking at it…

    At odd moments during this week I read all of Ben’s references, and a few other articles, and worked through the maths, and ordered the book. They are all good stuff and the maths is not difficult. But I still don’t think my knickers were so very twisted…. I can however phrase the question better.

    My question is – what evidence is there that the statistical model that applies to HIV testing also works for mental assessment for violence? None of the references addressed this point – they just assumed it was true. It doesn’t seem obvious to me, but quite possibly there is such evidence – I hoped someone might point me to it.

    I wrote a short piece here
    explaining what I mean – just in case someone is interested or knows the answer.

  39. Mark Frank said,

    December 16, 2006 at 12:26 pm

    Oh *** I wrote the link wrongly in my post 38 above. It should be:

  40. cellslayer said,

    December 18, 2006 at 9:28 pm

    With respect Parkenf #26, you are all over the place with your knowledge of cervical cancer and the cervical screening test.

    Why have you selected old data from 2001 when, on the same web page you have cited, more recent data is available which shows that there is no hard evidence that the new test (LBC, not FBC as you state) is any better than the old test? Selection bias or what?

    A few more corrections:
    1. It is disease PREVALENCE that determines the predictive value of a test, not incidence as you imply. The lower the prevalence of the disease (i.e. the more healthy people there are relative to ill people), the more likely a positive test is to be a false positive.

    2. The cervical screening test is a test for pre-cancer, NOT cancer. Many cases of pre-cancer regress to normality without the need for treatment. So even a ‘true positive’ is of limited clinical meaning.

    3. The prevalence of cervical pre-cancer can never be known, as we rely on its detection through an imperfect screening test. So all the maths is guesswork.

    However, I agree with the basic thrust of the Bayesian maths given in the above articles and have now been entertained for a good hour.

    Allow me to offer more realistic figures (although I admit my estimate of prevalence is an educated guess). We know that the cervical smear test is not perfect, so let us assume that the sensitivity for pre-cancer is 95% (I’m being generous) and the false positive rate is 1%. Assume the prevalence of pre-cancer to be 1 in 200 women (0.5%). Suppose we take 100,000 women and these percentages apply exactly to them. 500 of them will have pre-cancer and 99,500 will not. Of the 500 that do have pre-cancer, 475 will have a positive test (95% sensitivity). Of the 99,500 that do not have pre-cancer, 995 will have a positive test (1% false positives). Thus we have the following:

    Number of ‘true positives’ = 475
    Number of ‘false positives’ = 995

    Therefore, out of 100 000 women, 1470 have a positive test but only 475 of these have pre-cancer. So the probability of pre-cancer with a positive test is just 32% (475/1470). This is the positive predictive value (PPV) of the test. It is significantly higher than the prior probability of 0.5% but it is still quite low.
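    [Editor’s sketch: cellslayer’s calculation above, run as code with the same assumed figures (95% sensitivity, 1% false positives, 0.5% prevalence, 100,000 women).]

    ```python
    # Positive predictive value: P(disease | positive test).
    def positive_predictive_value(sensitivity, false_positive_rate,
                                  prevalence, population):
        affected = population * prevalence          # 500 women with pre-cancer
        unaffected = population - affected          # 99,500 without
        true_positives = affected * sensitivity     # 475
        false_positives = unaffected * false_positive_rate  # 995
        return true_positives / (true_positives + false_positives)

    ppv = positive_predictive_value(0.95, 0.01, 0.005, 100_000)
    print(round(ppv, 3))  # 0.323 - i.e. the 32% quoted above
    ```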

    Now here’s an interesting thought. With the impending launch of mass vaccination against HPV (the virus that causes cervical cancer), the PPV of the cervical smear test can only get worse. Women will be told, quite rightly, that they will continue to need screening in the vaccination era. But in due course the screening test as it exists now, even the ‘new’ LBC test, will be quite useless.

  41. rob said,

    January 4, 2007 at 3:41 pm

    Tuesday’s Guardian has a very similar column (no, not Ben’s one, I’m not that foolish) –,,1981233,00.html

  42. Dr Aust said,

    January 4, 2007 at 9:39 pm

    Re #41 –

    Yes, did wonder if Jonathan Wolff had been reading Ben’s column.

    Though if he has, it is probably a good thing, since he is a philosopher and it is nice to see real understanding spreading beyond the science/maths geeky crowd.

    And he generally writes a lot of good sense about universities and academia.

  44. drbloke said,

    April 3, 2010 at 11:37 am

    I hope I’m not breaking some rule of etiquette by posting here 3 and a bit years after the original article appeared.

    I found this article very interesting and very clear. However, it occurred to me that the so called “useless” tests become rather more useful if those tested positive are tested again a second time.

    Take the example of the HIV test where sensitivity is 0.999, specificity is 0.9999 and the infection rate is 1 in 10,000. Ben states there will be 1 true positive test result and 1 false positive test result, and concludes that the chance of your truly being HIV+ given a positive test is 50:50. Fair enough.

    But what if the two people who tested positive are tested a second time? Since there are only two people to be tested, this would not be an arduous task for the doctors involved. There is now a probability of 0.999 that the true positive would give another positive result, but only a 0.0001 probability that the false positive would get a second false positive result. So someone testing positive a second time can be 99.9% sure that the test is correct. Much better than 50:50.

    Am I making a terrible mistake somewhere?
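    [Editor’s sketch: drbloke’s argument, run with the figures he uses. It assumes, as he does, that the two tests are statistically independent – exactly the assumption darzil challenges in the next comment.]

    ```python
    # Posterior probability of infection after n independent positive tests.
    sensitivity = 0.999       # P(positive | infected)
    false_pos = 0.0001        # P(positive | not infected), i.e. 1 - specificity
    prevalence = 1 / 10_000   # background infection rate

    def ppv_after_n_positives(n):
        p_infected_and_pos = prevalence * sensitivity ** n
        p_clean_and_pos = (1 - prevalence) * false_pos ** n
        return p_infected_and_pos / (p_infected_and_pos + p_clean_and_pos)

    print(ppv_after_n_positives(1))  # roughly 0.5: the 50:50 case
    print(ppv_after_n_positives(2))  # roughly 0.9999 after a second positive
    ```

    So under the independence assumption the arithmetic does work out as drbloke says; the catch is whether that assumption holds for real tests.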

  45. darzil said,

    December 20, 2012 at 10:01 pm

    It’s ages later, but the mistake, I think, is that you are assuming each test is entirely independent of the other.

    If, for example, a person had a rare molecule that acted like HIV to the test, but was not HIV, the test would show them as having HIV every time, assuming it is 100% repeatable.

    A test isn’t a dice roll: it is measuring various properties of the sample, and if the false positive is due to those properties rather than to sampling issues, you can’t treat two tests as independent events.