A frankly thin contrivance for writing on the fascinating issue of subgroup analysis

April 25th, 2009 by Ben Goldacre in bad science, big pharma, nutritionists, subgroup analysis | 53 Comments »

Welcome back to the only home-learning statistics and trial methodology course to feature villains. You will remember the comedy factory of the Equazen fish oil “trials”: those amazing capsules that make your child clever and well behaved. A new proper trial has now been published looking at whether these fish oil capsules work. They took 75 children with ADHD aged 8 to 18, split the group in half randomly, and gave each child either genuine fish oil capsules or dummy capsules. They measured rating scales, and a Clinical Global Impression scale, but there was no difference between the two groups. The fish oil pills did nothing, as in many previous studies, so this trial has not been press-released by the company, nor has it been covered in the media.

The funders of this study, Equazen, will doubtless have been disappointed with a negative result. The authors of the study may have been disappointed too. But there was some light on the horizon. They looked at the data more closely, and found that some children did, in fact, respond: “a subgroup of 26% responded with more than 25% reduction of ADHD symptoms and a drop of CGI scores to the near-normal range.”

Subgroup analyses are widely derided in academia, and for very good reasons. Imagine the coins randomly distributed throughout your Christmas pudding. If you x-ray it, and follow a very complex path with your knife, you will be able to create one piece with more coins in it than the others: but that means nothing. The coins are still randomly distributed throughout the pudding.

And yet this optimistic overanalysis is seen echoing out from business presentations, throughout the country, every day of the week. “You can see we did pretty poorly overall,” they might say: “but interestingly our national advertising campaign did cause a massive uptick in sales for the Bognor region.”

Interestingly it turns out that you can show significant benefits, using a subgroup analysis, even in a fake trial, where the intervention consists of doing absolutely nothing whatsoever. Thirty years ago Lee et al published the classic cautionary paper on this topic in the journal Circulation: they recruited 1073 patients with coronary artery disease, and randomly allocated them to receive either Treatment 1 or Treatment 2. Both treatments were non-existent, because this was simply a simulation of a trial, but they went through the motions of randomising and following up the data, to see what they could find in the random noise of patients’ progress.

They were not disappointed. Overall, as expected, there was no difference in survival between the two groups. But in a subgroup of 397 patients (characterized by “three-vessel disease” and “abnormal left ventricular contraction”) the survival of Treatment 1 patients was significantly different from that of Treatment 2 patients. This was entirely by chance.
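Lee and colleagues’ exercise is easy to reproduce on a laptop. Here is a minimal sketch in Python, with invented numbers (500 simulated trials, 20 arbitrary post-hoc subgroups per trial, a 70 per cent survival rate) rather than the original study’s data: both “treatments” are pure noise, yet most of the simulated trials still turn up at least one “significant” subgroup at p < 0.05.

```python
import math
import random

def two_prop_p(k1, n1, k2, n2):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    p_pool = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
n_trials, n_patients, n_subgroups = 500, 1000, 20
spurious = 0

for _ in range(n_trials):
    # the "treatment" allocation is random and does nothing at all
    treated = [random.random() < 0.5 for _ in range(n_patients)]
    survived = [random.random() < 0.7 for _ in range(n_patients)]
    for _ in range(n_subgroups):
        subgroup = random.sample(range(n_patients), 100)  # an arbitrary slice
        t = [i for i in subgroup if treated[i]]
        c = [i for i in subgroup if not treated[i]]
        if not t or not c:
            continue
        if two_prop_p(sum(survived[i] for i in t), len(t),
                      sum(survived[i] for i in c), len(c)) < 0.05:
            spurious += 1  # a "significant" subgroup found in pure noise
            break

print(f"{spurious / n_trials:.0%} of null trials had a 'significant' subgroup")
```

With twenty looks at each null trial you should expect something like six in ten of them to cough up a “significant” subgroup, which is the whole point.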

You can also find spurious subgroup effects in real trials, if you do an analysis that’s foolish enough. Close analysis of the ECST trial found that the efficacy of a procedure called endarterectomy depended on which day of the week you were born on. Base your clinical decisions on that: I dare you. Furthermore there is a beautiful, almost linear relationship in this trial’s results between month of birth, and clinical outcome: patients born in May and June show a huge benefit, then as you move ahead through the calendar, there is less and less effect, until by March it starts to seem almost harmful. If this had been a biologically plausible variable, like age, this subgroup analysis would have been very hard to ignore.

It goes on. The ISIS-2 trial compared the benefits of aspirin against placebo during a heart attack. Aspirin improves outcomes, but a mischievous subgroup analysis revealed that it is not effective in patients born under the star signs of Libra and Gemini. Should these patients be deprived of treatment? Because sometimes subgroup analyses can have a damaging impact on practice: the CCSG trial found that aspirin was effective in preventing stroke and death in men but not in women, and as a result, women were undertreated for a decade, until further trials and overviews showed a benefit.

And sometimes there can be what we might call proper mischief. The CLASS trial compared a painkiller called celecoxib against two older pills, over six months: this new drug showed fewer gastrointestinal complications, and so lots more people prescribed it. A year later, it emerged that the original intention of the trial had been to follow up for over a year. The trial had shown no benefit for celecoxib over that longer period, but when they looked only at the subgroup of results at six months, the drug shined.

You are unlikely to find the answers to complex problems like school performance and behaviour in any pill, whether it’s ritalin or fish oil, and yet despite the rather desperate anti-establishment swagger of the $60bn food supplement pill industry, time and again we see that they use the exact same tricks as the $600bn pharmaceutical industry. Although Equazen, we might finally mention, are wholly owned by the £1.6bn pharmaceutical company Galenica.


I’ll bung up full references to all the comedy stats papers looking at subgroup analysis at 3pm today – sorry, got to sprint now!

If you like what I do, and you want me to do more, you can: buy my books Bad Science and Bad Pharma, give them to your friends, put them on your reading list, employ me to do a talk, or tweet this article to your friends. Thanks!

53 Responses

  1. SteveGJ said,

    April 25, 2009 at 11:01 am

    This particular abuse of statistics drives me nuts – just because it is so prolific and influential. It’s beloved of politicians, managers, consultants and just about any special interest group.

    I think we can give up on ever expecting these people to not clutch at statistical straws. However, I’m wondering if there could be some form of independent service that evaluates the statistical significance of media reports using some form of “traffic light” system on the trustworthiness of the output. At the moment there are a bunch of bloggers who flag up unreliable reports, but not on any consistent basis. Something that had a simple reliable/reservations/witchcraft rating system might be good. A website rating the reliability of reports, maybe. Of course there is not a snowball’s chance in hell of any agency funding something like this, but maybe a volunteer system run by those suitably qualified. It could never keep up with the pace of media reports and press releases in real time, but even a retrospective rating would be useful, as media reports are often referred back to over and over again. To paraphrase a saying attributed (unreliably) to Goebbels, if you repeat a misleading report often enough it becomes accepted as the truth.

  2. MarkThompson said,

    April 25, 2009 at 11:20 am

    Ben, I wonder if you have read “Outliers” by Malcolm Gladwell.

    There is an interesting theme followed throughout the book about how success is often down to much more than innate skill. One of the things that has been discovered is that successful sports people are much more likely to be drawn from the first few months of the academic year of whatever country they live in (e.g. in Canada, most successful Ice Hockey players are born in January, February or March because their academic year starts in January).

    This effect is also seen in all sorts of other areas, as when children are young a 6- or 11-month age gap can make a huge difference both physically and mentally. Having been born in July, I also noticed this effect to an extent (with our academic year starting in September I was always one of the youngest in my year) and always felt like I was playing catch-up to an extent with some of my peers.

    Now, I am not saying that the results you quote regarding the month of birth indicate anything specifically, and they may well be spurious, but it might merit further research to see if something about the life chances and medical responses of people born in certain months, related to the cultural mores and norms of the time in which they grew up could have some effect and their likelihood of surviving for longer.

    Whilst I think that day of the week is almost certainly spurious, because of the way we organise the lives of our children, month of birth may not be quite as random an indicator as it may seem.

  3. groverjones said,

    April 25, 2009 at 11:28 am

    Have a look at MSLT-1 for sentinel lymph node biopsy for melanoma, too. The 5-year survival of the two groups was identical. The subgroup analysis showed that some patients – those who have their nodes resected – have longer disease-free survival. Possibly because they no longer have a regional nodal basin to have a recurrence in?? When disease-free, non-regional-node survival is analysed, the outcomes show no benefit. Despite this, the authors have seized on this irrelevant subgroup to make claims that SLNB is ‘standard of care’ for melanoma management!

  4. Wyatt Earp said,

    April 25, 2009 at 2:18 pm

    Ahem, Ben. Shone. [i]Shone[/i].

  5. Synchronium said,

    April 25, 2009 at 2:54 pm

    I wish I were better at statistics so I could spot these things.

  6. Synchronium said,

    April 25, 2009 at 2:58 pm

    Wish I was* ?? I wish I was/were better at grammar too. :(

  7. Sili said,

    April 25, 2009 at 3:26 pm

    Wyatt Earp,

    Variation is good. Stop being so uptight.


    “were” is correct if you’re one of those few people who retain irrealis as distinct from the base form. In practice it doesn’t matter.

    I think the month of birth thing is actually quite interesting. I seem to recall some talk years ago about schizophrenics reacting differently to medication depending on whether they were Summer or Winter kids. The hypothesis was that they were actually suffering two different illnesses and that the cause was somehow environmental. Thus accounting for the S/W difference.

    That of course is just hearsay, since I never saw the study (if there was one), but it doesn’t sound implausible that an environmental factor could make subgroup analysis according to birthdate sensible.

    If the schizophrenia case is true, then failure to identify a subgroup effect could even disguise a drug that really was effective. Essentially by conflating two different diseases with similar symptoms. An analogy could be saying that aspirin doesn’t help headaches because it doesn’t cure migraines, I think.

    Just to clarify, I’m not saying Equazen are anything but frauds. I’m just curious about whether subgroup analysis can in fact be a force for good if used appropriately?

  8. emen said,

    April 25, 2009 at 3:48 pm

    Ben, this is brilliant! Both on “fish oil magic” and on subgroups.

    When I read it in the paper, I was wondering what the subgroup was, kids
    with brown eyes or something? But it is clear now:
    “Responders tended to have ADHD inattentive subtype and comorbid
    neurodevelopmental disorders.”

    How clever.

    These fish-oil capsules are like aloe vera, they are supposed to be good for
    everything: joint pain, heart problems, to prevent and treat depression and
    ALL kinds of psychiatric problems, prevent colds and flu, and of course to
    make your kids smarter and better behaved.

    It is so rare to read something so elegant and eloquent about it – thank you!

  9. j said,

    April 25, 2009 at 4:49 pm

    Looks like the Guardian edited out the fact that all these kids had ADHD in the version published there (although there is mention of “reduction of ADHD symptoms”) www.guardian.co.uk/commentisfree/2009/apr/25/bad-science-fish-oil

    I’d argue that this detail is important – a pity that it didn’t make it into the Guardian version…

  10. IMSoP said,

    April 25, 2009 at 4:54 pm


    I’m no statistician, but I guess the point is that a subgroup analysis can be interesting and useful for looking for future avenues of research, but not for drawing any actual conclusions.

    So in the seasonal schizophrenia case, you might decide the observation was sufficiently interesting to formulate a hypothesis, and then design a new study looking at that hypothesis – i.e. where the entire design was based around the season of the sufferers’ births.

    But the sub-group analysis isn’t much better an indicator that you *should* test your hypothesis than an informed guess or anecdotal evidence, and it certainly shouldn’t be given meaning in its own right.

  11. Sili said,

    April 25, 2009 at 5:41 pm

    That’s a good point, IMSoP, and a good indication that I engage in woolly thinking despite my supposed education.

    Of course, the hypothesis should be tested separately once formed.

    Thank you.

  12. DrJG said,

    April 25, 2009 at 6:44 pm

    At some medical meeting or another, a Professor dismissed subgroup analysis as “Searching for the pony”. For those who don’t know the tale, there was once a man with twin daughters, one irrepressibly optimistic, the other incurably pessimistic. He felt that neither was a good approach to life, so on their birthday he presented the pessimist with a roomful of fantastic toys and games, and the optimist with a roomful of, err, manure, and left them to it.
    When he came back a while later, he found the pessimist holding a broken toy, crying that sooner or later, all the rest would also break. The optimist, however, had found a shovel and was tackling the manure with gusto. When asked why she was bothering, she replied that: “With all that shit, there’s got to be a pony in there somewhere.”

  13. tonyy said,

    April 25, 2009 at 7:13 pm

    Over the whole sample, the pills made no difference. If they made things better for a subgroup of 26%, then doesn’t it follow that for the other 74% they made things worse?

  14. SteveGJ said,

    April 25, 2009 at 7:42 pm

    The statistical flaw in subgroup analysis is simply that there are so many of them. The more potential subgroups there are, the more likely (in fact a near certainty) it is that you will find one. For them to be worth following up with further research there needs to be the potential for a plausible causative mechanism. However, even if such a potential mechanism is postulated, the result doesn’t amount to a hill of beans until further work is done.

    Of course if the specific subgroup is identified before the results are analysed then it is as valid as any other result. It’s the post-hoc nature of the analysis that is the problem.

    So that’s the issue – post-hoc subgroup analysis (with the emphasis on the post-hoc bit).
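SteveGJ’s “so many of them” point can be made precise with one line of arithmetic. Assuming each of k post-hoc subgroup tests is independent and uses the usual p < 0.05 threshold (a simplification: real subgroups overlap), the chance of at least one spurious “hit” is 1 − 0.95^k:

```python
def chance_of_spurious_hit(alpha, k):
    # probability that at least one of k independent null tests is "significant"
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20, 50):
    print(k, round(chance_of_spurious_hit(0.05, k), 2))
# prints: 1 0.05 / 5 0.23 / 10 0.4 / 20 0.64 / 50 0.92
```

So with twenty candidate subgroups, finding one “significant” result is less a discovery than a weighted coin toss in your favour.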

  15. Staphylococcus said,

    April 26, 2009 at 2:07 am

    That’s an excellent point, tonyy. If the overall mean effect amounts to zero then there must be sub groups with negative impacts to balance those with positive ones. Interesting how the authors gloss over things like that…

  16. modus tollens said,

    April 26, 2009 at 3:36 am

    Agree whole heartedly with IMSoP and SteveGJ. This problem is rife in a number of academic disciplines. Issue seems to be that we need some system for distinguishing between genuine hypothesis tests and interesting (but post-hoc) findings that may suggest testable hypotheses for further research. Currently, in many disciplines, it is very hard to draw this distinction.

    To solve this problem, it would be really helpful if researchers in all disciplines registered their hypotheses with some central authority before they even started their experiment or research. This would also be a good way of keeping track of publication bias etc.

    Serendipitous findings have always played an important part in scientific inquiry but they are only the first steps of the research process.

  17. Pro-reason said,

    April 26, 2009 at 7:54 am

    I too thinked I knowed the correct past tense of ‘shine’, but Sili has sayed that variation is good, so I’ll change my ways, lest anyone think I’ve becomed uptight as I’ve getted older.

  18. Jeesh42 said,

    April 26, 2009 at 9:46 am

    Fascinating article.

    I work for a social research company that does a lot of “subgroup analysis” for survey results. This seems different from trial subgroup analysis because that is comparing subgroups from two different samples (e.g. women with placebo versus women with the genuine treatment), whereas we compare subgroups within the same sample (e.g. whether older people are significantly more satisfied than younger people with a public service – both results having been taken from the same survey).

    I would be really interested in knowing whether this is valid or not, because it seems different from the subgroup analysis Ben talks about. As I say, we would say something like: “70% say they’re satisfied, and subgroup analysis shows people making up this 70% are most likely to be old and female etc”.

    Can anyone link me to any papers on this, or offer any insight? I do have some basic stats knowledge, and I thought it was okay to compare two subgroups within one sample.

  19. Jeesh42 said,

    April 26, 2009 at 10:19 am

    Right, I’ve done some swift reading on this and it seems that the crucial thing is testing whether the effect of the treatment differs significantly between subgroups (the interaction test), rather than the significance of the treatment effect in one subgroup or the other (which is what Equazen does, and where you are more likely to find a meaningless difference). See page 6 of the paper below:


    Please let me know if I’m misinterpreting.
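That is indeed the standard advice. A rough sketch of the distinction, using invented data in which the treatment truly does nothing in either subgroup, and simple normal-approximation z-tests (an illustration of the idea, not the cited paper’s method): the within-subgroup test asks “is the effect significant in men alone?”, while the interaction test asks “does the effect differ between men and women?”

```python
import math
import random
import statistics

def mean_and_se(xs):
    # sample mean and its standard error
    return statistics.mean(xs), statistics.stdev(xs) / math.sqrt(len(xs))

def two_sided_p(diff, se):
    # two-sided p-value under a normal approximation
    return math.erfc(abs(diff / se) / math.sqrt(2))

random.seed(1)
# invented outcomes: the treatment has no real effect in either subgroup
groups = {name: [random.gauss(0, 1) for _ in range(50)]
          for name in ("men_treat", "men_ctrl", "women_treat", "women_ctrl")}

mt, se_mt = mean_and_se(groups["men_treat"])
mc, se_mc = mean_and_se(groups["men_ctrl"])
wt, se_wt = mean_and_se(groups["women_treat"])
wc, se_wc = mean_and_se(groups["women_ctrl"])

effect_men, se_men = mt - mc, math.hypot(se_mt, se_mc)
effect_women, se_women = wt - wc, math.hypot(se_wt, se_wc)

# tempting but misleading: significance of the effect in one subgroup alone
p_within_men = two_sided_p(effect_men, se_men)

# the interaction test: does the effect differ between the subgroups?
p_interaction = two_sided_p(effect_men - effect_women,
                            math.hypot(se_men, se_women))

print(f"p (men alone) = {p_within_men:.3f}, "
      f"p (interaction) = {p_interaction:.3f}")
```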

  20. T.J. Crowder said,

    April 26, 2009 at 11:24 am

    Someone has to say it: I don’t suppose you’re analysing a subgroup of subgroup analyses? 😉

  21. Sili said,

    April 26, 2009 at 11:28 am

    That’s the spirit, Pro-Reason!

  22. Lola Blogger said,

    April 26, 2009 at 11:41 am

    This is a most helpful post – on Wednesday I am due to present an analysis (as part of my degree) of a paper suggesting that coffee drinking in excess of 10 cups a day can protect against Parkinson’s disease.

    (Saaksjarvi, K et al (2008), ‘Prospective study of coffee consumption and risk of Parkinson’s disease’, European Journal of Clinical Nutrition, vol. 62, no. 7, pp. 908-15.)

    It seems a perfect example of a post-hoc subgroup analysis: a large lifestyle survey was done in Finland in 1973-76 that included an open question asking how many cups of coffee were drunk in a day, on average. No data on what size cup, what strength coffee, subjects excluded if no answer was given.

    With much adjusting for baseline markers, significant associations were found for reduced risk of Parkinson’s with more than 4 cups of coffee in those with BMI >= 25, and similarly in those with serum cholesterol < 7.24 mmol/l.

    The abstract asserts that the results support the hypothesis that coffee consumption reduces the risk of Parkinson’s disease, but it took a great deal of detailed scrutiny to discover that the only significant findings (p<0.05) were the two above.

    Keep up the good work, Dr Goldacre!

  23. Delster said,

    April 26, 2009 at 12:21 pm


    What you’re doing could be called data trawling rather than subgroup analysis.

    You’re taking a survey result (rather than experimental data) and analysing it to find out which groups of people tended to give which answers.

    In other words you’re simply extracting data from the numbers.

    What they are doing is taking the data and reversing the process.

    Where you’re asking who was most satisfied and getting the answer “elderly women”, they are searching for arbitrary subsets of people who were the most “satisfied” to get a group that shows a beneficial result.

    You’re using numbers to get the data… they are using their data to get the numbers they want.

  24. modus tollens said,

    April 26, 2009 at 1:11 pm

    The key issue, Jeesh42, is whether or not you have some principled prediction before you do your analysis. For example, “we hypothesise (before we’ve seen the results) that subgroup A has a higher level of x than subgroup B”. If you don’t have such an hypothesis then you are not hypothesis testing – you are data trawling, as Delster notes.

    To deal with this you need to adopt a more stringent alpha level (the probability that your finding is just due to chance). Typically, if you are hypothesis testing you will adopt a 5% alpha level (i.e. p<.05). BUT, for post hoc tests, you should adopt a significance level more like p=.001. The exact figure depends on a number of issues but perhaps start here to find out more:


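The two textbook corrections behind modus tollens’ advice are Bonferroni (divide the familywise alpha equally among the tests) and the slightly less conservative Šidák version. A sketch, with 20 post-hoc tests as an arbitrary example:

```python
def bonferroni(alpha, n_tests):
    # split the overall false-positive budget equally across the tests
    return alpha / n_tests

def sidak(alpha, n_tests):
    # exact familywise threshold if the tests are independent
    return 1 - (1 - alpha) ** (1 / n_tests)

print(round(bonferroni(0.05, 20), 5))  # 0.0025
print(round(sidak(0.05, 20), 5))       # 0.00256
```

Twenty subgroup peeks take you from p<.05 to roughly p<.0025 per test, which is how quickly the evidential bar has to rise.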
  25. nel said,

    April 26, 2009 at 1:56 pm

    Out of interest, anyone know if the research on fish oils for depression is just as dubious?

  26. SteveGJ said,

    April 26, 2009 at 2:40 pm


    What you are doing is perfectly fine provided that those subgroups are defined in advance and you aren’t just trawling through lots of data looking for patterns. So if the age boundaries are selected after the analysis, chosen to match those which show the greatest apparent significance, then that would be very dubious. However, if you have some standardised age groups then it’s perfectly fine (adjusted, of course, for the relevant sample sizes).

    To be sure about avoiding this sort of bias the purist way of doing this is to define all the sub-groups before the analysis. However, then some potentially interesting patterns might be missed.

  27. Jeesh42 said,

    April 26, 2009 at 3:01 pm

    Hmm… I get what Delster says (very useful, thank you) but not entirely what Modus Tollens is saying (but also, thank you). From what Delster says (which puts my mind at ease), there’s nothing wrong with data trawling – it’s just semantics to separate it from the “subgroup analysis” that Ben describes.

    We are testing a predetermined null hypothesis (that, say, older people are more satisfied than younger people). This is different from what Equazen are doing, which is specifically looking for a subgroup that does have a significant difference. I’m guessing the latter is the post-hoc stuff you guys are talking about.

    There’s still room for poor post-hoc analysis in the work that I do, which is what I should be avoiding. As Ben points out, if satisfaction levels are crap generally, it would be wrong for me to start looking for a subgroup that significantly bucks the trend (but satisfaction was significantly higher among people with dogs who use the internet) and use this as evidence.

  28. Lola Blogger said,

    April 26, 2009 at 7:36 pm

    Using Ben’s example, if you take your cake that randomly contains coins, and cut it in all the predetermined ways you can think of, you may find some interesting results, but they will still be spurious. What’s the difference between this and what Jeesh42 is describing?

  29. muscleman said,

    April 26, 2009 at 8:50 pm

    The hypothesis about the seasonal schizophrenia thing is that it might be due to VitD. This seems plausible since schizophrenia is more common in higher latitudes (though genetics has not been ruled out in that one either). I understand the whole area is currently a hot research topic. So watch out for that one.

  30. j said,

    April 26, 2009 at 8:55 pm

    Nel – a 2007 review found limited evidence that they may be useful when taken alongside ‘conventional’ antidepressant meds dtb.bmj.com/cgi/content/abstract/45/2/9 Certainly nothing anywhere near conclusive, though – and not exactly the type of results that set the world alight.

    A shame, really, as if fish oil pills were a cure for depression then that would be rather useful…

  31. modus tollens said,

    April 26, 2009 at 9:39 pm


    If you test your alternate hypothesis that older people are more satisfied than younger people and you formulated this hypothesis before you looked at any results then you are not data trawling or doing “bad” subgroup analysis as described in Ben’s article. It doesn’t matter whether or not your hypothesis involves subgroups per se. What matters is that you don’t look at the results before you formulate the hypothesis.

    However, if you happen to notice when you look at the data that people who own dogs and surf the internet have really high satisfaction compared to others then, as you note, you are data trawling/post hoc testing/ doing subgroup analysis as described by Ben. This is not necessarily bad but you have to be very careful to: a) tell people that your finding is post-hoc; b) adopt a more stringent alpha level; and c) do more research if you think it is potentially a genuine, relevant finding.

  32. Jeesh42 said,

    April 26, 2009 at 9:54 pm

    Makes sense! Lola Blogger, that’s exactly what I was describing.

  33. emen said,

    April 27, 2009 at 8:44 am

    nel # 25

    I would also be interested to find out more about these fish oil
    capsules, whether they are good for ANYTHING at all.

    I’m not an expert to know where to start, but this is what NHS Choices seem
    to say about fish oil and depression:

    “Omega-3 fatty acid
    Research has shown a link between the amount of a fish people in different
    countries eat and the level of depression. In Japan, where people eat on
    average 70kg (150lbs) of fish a year, the rate of depression is 0.12%.
    Whereas in New Zealand, where people eat only 18kg (40lbs) of fish a year,
    the rate of depression is almost 50 times higher.

    It is thought that a chemical found in fish – omega-3 fatty acid – may help
    your brain work more efficiently, so serotonin (which can boost your mood)
    has more of an effect on you.

    Fish that contains a lot of omega-3 fatty acid includes salmon, sardines and
    mackerel. Vegetarian alternatives include walnuts and tofu, and omega-3 food
    supplements are also available over the counter (OTC) from health shops”

    but they don’t give you the link to that research.

    NHS Choices have written a lot about fish oil and Alzheimer’s, cancer,
    diabetes, rheumatoid arthritis, ARMD, memory etc., if you want to have look:


    but as I say, it would be interesting to know more. So many people take fish oil capsules (I don’t, I eat quite a
    lot of fish), are they ANY good or just a waste of money?

  34. heng said,

    April 27, 2009 at 9:51 am

    Feynman discusses this issue quite a bit (or at least it stuck well) in his various popular writings. As other people have noted, the problem occurs, not necessarily with allocating sub-groups, but with allocating sub-groups based on the data already acquired. It is intellectually dishonest to draw conclusions solely from the data. You need an a priori hypothesis and it is that hypothesis that the data is testing. That said, the data may provide a hint for a new hypothesis to test (with new data).

    I thoroughly recommend reading some of Feynman’s books e.g. www.amazon.co.uk/Surely-Youre-Joking-Mr-Feynman-Adventures/dp/009917331X

  35. heng said,

    April 27, 2009 at 9:53 am

    Further to my last post, I think the discussion is in this book…

  36. Robert Carnegie said,

    April 27, 2009 at 10:53 am

    @12 I don’t quite understand why it would be wrong to decide what factors you may be interested in after you have collected data (e.g. lung cancer / smoking, although the Nazis were there first – ideologically biased however), but I understand “searching for the pony”. And then there’s “correlation does not prove causation”. Probably however you’re testing a hypothesis of causation by looking at correlation. So the deficit would be of a credible cause-and-effect conjecture.

    @13, 15 is it statistical significance? A minority of patients may respond well to a drug above a predetermined significance level, overall there is no significant effect, and outside the minority there is no significant effect. And this might be a real effect or a statistical accident. The drug could be doing everyone some good but too little to be detected in most, or nothing for most, or it could be doing a small amount of harm, undetected, as well as good. You’d have to do a larger trial to have a better chance to find out.

  37. caini said,

    April 27, 2009 at 11:04 am

    May I add the devil of multiple interim analyses, and the number of trials stopped at this point. Keep analysing until you find a benefit, then stop. Go activated protein C. £15,000 please.

  38. speedweasel said,

    April 27, 2009 at 12:07 pm

    In the past I’ve compared post-hoc analysis to lying back and finding shapes in the clouds. It’s more honest, but nowhere near as fruitful, if you close your eyes and decide what you will look for before you begin.


    So true. Richard Feynman is, and will always be my greatest hero. His writings are (dare I say it?) magical.

  39. blog.anothergeek.biz said,

    April 27, 2009 at 12:21 pm

    This needs to be a competition!

    Can anybody post the actual data so that we can all find our own subgroups? The obvious ones are people who get worse, or improvers in the control group. But I am sure that we could squeeze out others.

    The best ones could be published in the Saturday column.

  40. Moderate_Nige said,

    April 27, 2009 at 1:03 pm

    Nice to see on page 8 of the 25.04.09 Weekend magazine the advert for Boots’ range of “vitamin and herbal products”, including Equazen eye q capsules (eye q – geddit? That’s SO clever, in some way I can’t think of at the moment) displaying the words ‘Independently tested The Durham Trial’ on the package in a nice circle with a big reassuring tick in the middle. Didn’t Boots used to be called ‘Boots the Chemist’ at one time, i.e. it claimed to be a proper chemist’s shop rather than a lifestyle nutrition supplement industry outlet?
    (Interestingly, the small print at the bottom of the page says ‘Vitamins may benefit women of a child bearing age.’ Not a very positive assertion, is it? And also a waste of time for everyone else, I suppose.)
    If this is too depressing, turn back one page where you’ll find a nice picture of a very spotty cat (as well as some rather freaky baldy cats).

  41. eens said,

    April 27, 2009 at 1:48 pm

    ‘satisfaction was significantly higher among people with dogs who use the internet’

    darn…and there I was thinking satisfaction was only available to people with really really smart dogs that had long long toes and that didn’t mind wasting a few hours reading about all the silly things that obsess people!

  42. lasker said,

    April 27, 2009 at 1:49 pm

    Great column.
    I have always believed in astrology but it’s wonderful to see that it’s been proven. Seeing as I’m a Pisces I’ll redouble my efforts to avoid any endarterectomies.
    Celecoxib is the only NSAID that I can take without immediate stomach pain (OK, so I’ve probably got an ulcer) but now I’ll be careful not to take it for longer than 6 months at a time.
    And now I know that my fish oil capsules have no action whatsoever I can stop worrying about side effects.

  43. Michael Grayer said,

    April 27, 2009 at 1:50 pm

    In support of what Lola Blogger said (#28), using Ben’s other example:

    ‘“You can see we did pretty poorly overall,” they might say: “but interestingly our national advertising campaign did cause a massive uptick in sales for the Bognor region.”’

    This is only a useful result if you had a solid theoretical reason why Bognor may be systematically different from the rest of the country before you started looking at the results. Otherwise it’s purely data dredging.

  44. MedsVsTherapy said,

    April 27, 2009 at 3:18 pm

    “I’m no statistician, but I guess the point is that a subgroup analysis can be interesting and useful for looking for future avenues of research, but not for drawing any actual conclusions.”

    Exactly. It may also be foolish, and possibly unethical, NOT to explore the data and discover what hypotheses subgroup analyses might suggest. But a post-hoc finding is simply one more way to discover a possibility, exactly as broadly sampled epidemiological studies can be used to indicate possible avenues for future research. Ideas can come from anywhere.

    The next step is to define the hypothesis, and test it as an a priori hypothesis: take the fish oil, define this supposedly specific subgroup effect, recruit a bunch of people fitting this profile, then randomize them to fish oil or olive oil.

    Just to play with the true believers, take their declared effect size from this post-hoc analysis, add a bit of error band, and power the study to find at least that degree of benefit.

    If it works, it works. If not, then: Good night! Thanks for playing science! Next?

    Alpha level adjustment helps to some degree, but it is not the total solution. If a stricter alpha were your only criterion for judging which data-mined hypotheses are worthy of future study, you might rule out a true, worthwhile effect. Conversely, some spurious effects will be mathematically strong enough to survive the stricter alpha level.

    Thus, best to dredge data for indications of true phenomena, re-read Bradford Hill, and develop the next study.
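    The "power the confirmatory trial to the declared effect" step can be sketched with the standard normal-approximation sample-size formula (Python; the effect sizes below are illustrative, not Equazen's figures):

    ```python
    from math import ceil
    from statistics import NormalDist

    def n_per_arm(effect_size, alpha=0.05, power=0.80):
        """Participants per arm for a two-arm trial to detect a standardised
        mean difference `effect_size` (Cohen's d), normal approximation."""
        z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha
        z_b = NormalDist().inv_cdf(power)
        return ceil(2 * ((z_a + z_b) / effect_size) ** 2)

    n_moderate = n_per_arm(0.5)   # d = 0.5 -> 63 per arm
    n_small = n_per_arm(0.3)      # a shrunk post-hoc estimate -> 175 per arm
    ```

    Shrinking the post-hoc effect estimate before powering, as the comment suggests, roughly triples the required sample here: post-hoc subgroup effects are usually inflated, so a trial powered to the raw figure is set up to fail.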

  45. MedsVsTherapy said,

    April 27, 2009 at 3:20 pm

    RE: shapes in clouds: I don’t look for shapes in the clouds anymore – the secret government chemtrail program designed to rain down Morgellon’s, and all the UFO traffic, took all of the fun out of that.

  46. DrJG said,

    April 27, 2009 at 5:59 pm

    @Robert Carnegie – An even older professorial pronouncement surfaces from my memory – this from the early 80s, well before PCs became so ubiquitous. He complained that much research was heading in the direction of acquiring a computer with a stats package, collecting and inputting as much biometric data as possible from your patients, and hitting the “compute” key. Then you list the correlations in order of significance, and draw your cutoff line wherever you feel you can justify it. OK, you may stumble on something useful, but without an initial idea of what you are looking for, you will generate mostly junk.

    But I can understand the pressure on researchers, certainly in the medical profession. Names on papers are essential to progress up the career ladder towards consultanthood, and much research is done not because the medic involved has any particular interest in the outcome, but because it looks like a fruitful source of journal credits, much of it destined to be filed and forgotten except as a suffix to a CV.
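    DrJG's "compute key" scenario is easy to reproduce (a Python sketch under assumed numbers: 100 patients, 50 purely random "biometric" variables, none of which has any real relationship to the outcome):

    ```python
    import random

    random.seed(1)

    n_patients, n_variables = 100, 50

    def pearson_r(xs, ys):
        """Plain Pearson correlation coefficient."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / (sxx * syy) ** 0.5

    # outcome and 50 candidate predictors: all independent noise
    outcome = [random.gauss(0, 1) for _ in range(n_patients)]
    variables = [[random.gauss(0, 1) for _ in range(n_patients)]
                 for _ in range(n_variables)]

    # hit "compute", sort by |r|, and admire the top of the list
    rs = sorted((abs(pearson_r(v, outcome)) for v in variables), reverse=True)
    # the strongest correlation typically lands around |r| ~ 0.25 with
    # n = 100, comfortably past the ~0.197 needed for p < 0.05 if it
    # were (wrongly) tested in isolation
    ```

    The top of the sorted list looks like a finding every time, which is exactly the professor's complaint: the ranking is guaranteed to produce "winners" even from pure noise.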

  47. k.r.johnson said,

    April 30, 2009 at 9:55 pm

    A brief note on the purported fish oil controversy. During the 1950s, children under five years of age were issued free cod liver oil from clinics. I can assure you that it tasted revolting. If fish oil had any benefits for children, they would have been obvious when the doses of cod liver oil were withdrawn.

  48. xxjoxx said,

    May 7, 2009 at 9:11 pm

    I don’t see the problem with the finding of this paper, to be honest.

    There is not just one disorder called “ADHD”: there are three subtypes – inattentive, predominantly hyperactive, and combined type. ADHD is a relatively new disorder for research, so it is possible that the three subtypes are not even the same disorder.

    So I disagree that this paper is a result of overanalyzing, when the results could also justify further research into the different subtypes, which may show that the subtypes of ADHD are in fact different disorders, with different etiologies and causes – which would explain why ADHD inattentive type benefits from fish oils and the others do not. Of course further research is required to replicate these findings, but I do think, Ben, that this article is rather subjective due to your bias against previous research involving fish oils.

    I also think that to suggest investigating subtypes is overanalyzing is a weak argument – would you claim that investigating effects in subgroups by gender, education, etc. is overanalyzing too?

  49. xxjoxx said,

    May 7, 2009 at 9:12 pm

    Sorry if I seem rather aggressive in my post above, btw. I read this website on a regular basis, but I felt compelled to post because this is a rather disappointing article.

  50. modus tollens said,

    May 15, 2009 at 11:03 pm

    @ xxjoxx

    Ben’s point is not that analysing subgroups per se is bad science – but rather how you analyse the subgroups. Re-read post 44.

  52. simontax said,

    June 23, 2009 at 4:30 pm

    SteveJG: “However, I’m wondering if there could be some form of independent service that evaluates the statistical significance of media reports”

    It’s called education, Steve. Or if you prefer the political form “Education, Education, Education”.

  53. Lifewish said,

    July 15, 2009 at 11:59 pm

    FYI, the paper on subgroup analysis that Ben mentions is called “Clinical judgment and statistics: Lessons from a simulated randomized trial in coronary artery disease”. It’s available freely online here.

    Anyone should be able to get the gist of it, but Googling may be required for non-mathmos to understand some technical terms. Hell, I didn’t know what a Mantel-Haenszel test was until now.

    I also like the look of this other paper about statistical errors in medical literature.