Ben Goldacre

Saturday April 1, 2006

The Guardian

Nothing comes for free: if you can cope with 400 words on statistics, we can trash a front page news story together. “Cocaine floods the playground,” roared the front page of the Times last Friday. “Use of the addictive drug by children doubles in a year.”

Doubles? Now that was odd, because the press release for this government survey said it found “almost no change in patterns of drug use, drinking or smoking since 2000”. But the Telegraph ran with the story as well. So did the Mirror. Perhaps they had found the news themselves, buried in the report.

So I got the document. It’s a survey of 9,000 children, aged 11 to 15, in 305 schools. The three-page summary said, again, there was no change in prevalence of drug use. I found the data tables, and for the question about using cocaine in the past year, 1% said yes in 2004, and 2% said yes in 2005. Except almost all the figures were 1%, or 2%. They’d all been rounded off. By asking around, I found that the actual figures were 1.4% for 2004 and 1.9% for 2005, not 1% and 2%. So it hadn’t doubled. But if that alone was my story, this would be a pretty lame column, so read on.

What we now have is an increase of 0.5%: out of 9,000 kids, about 45 more kids saying “yes” to the question. Presented with a small increase like this, you have to think: is it statistically significant? Well, I did the maths, and the answer is yes, it is, in that you get a p-value of less than 0.05. What does that mean? Well, sometimes you might throw “heads” five times in a row, just by chance. Let’s imagine that there was definitely no difference in cocaine use, the odds were even, but you took the same survey 100 times: you might get a difference like we have seen here just by chance, but less than five times out of your 100 surveys.
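To make that calculation concrete, here is a minimal sketch of one way to get such a p-value: a two-proportion z-test. The column does not say which test was actually used, and it gives a sample size for only one survey, so the choice of test and the assumption of 9,000 children in each year are illustrative, as is the function name `two_proportion_p_value`.

```python
import math

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided z-test of H0: both samples share one underlying proportion."""
    # Best single estimate of the shared proportion if H0 were true.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference between the two sample proportions.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Area in both tails of the normal curve beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumption: 9,000 children surveyed in each year, all answering independently.
z, p = two_proportion_p_value(0.014, 9000, 0.019, 9000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions the p-value comes out below 0.05, in line with the claim above. Note that the calculation treats every child as an independent data point.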

But this is an isolated figure. To “data mine”, and take it out of its real world context, and say it is significant, is misleading. The statistical test for significance assumes that every data point, every child, is independent. But, of course, here the data is “clustered”. They are not data, they are real children, in 305 schools. They hang out together, they copy each other, they buy drugs off each other, there are crazes, epidemics, group interactions.

The increase of 45 kids taking cocaine could have been three major epidemics of cocaine use in three schools, or mini-epidemics in a handful of schools. This makes our result less significant. The small increase of 0.5% was only significant because it came from a large sample of 9,000 data points – like 9,000 tosses of a coin – but if they’re not independent data points, then you have to treat it, in some respects, like a smaller sample, and so the results become less significant. As statisticians would say, you must “correct for clustering”.
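A rough sketch of what "treating it like a smaller sample" can look like: one common approach is the design effect, 1 + (m - 1) × ICC, where m is the average number of children per school and the ICC (intra-cluster correlation) measures how alike children in the same school are. The survey's real ICC is not given here, so the values below are hypothetical, as are the function names and the assumption of 9,000 children in each year.

```python
import math

def effective_sample_size(n, n_clusters, icc):
    """Shrink n by the design effect 1 + (m - 1) * icc, m = average cluster size."""
    m = n / n_clusters
    design_effect = 1 + (m - 1) * icc
    return n / design_effect

def p_value_for_difference(p1, p2, n_eff):
    """Two-sided z-test p-value, assuming the same effective n in both years."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_eff)
    return math.erfc(abs(p2 - p1) / se / math.sqrt(2))

# Hypothetical ICC values; 0.0 means the children are fully independent.
for icc in (0.0, 0.02, 0.05):
    n_eff = effective_sample_size(9000, 305, icc)
    p = p_value_for_difference(0.014, 0.019, n_eff)
    print(f"ICC = {icc:.2f}: effective n = {n_eff:.0f}, p = {p:.3f}")
```

With no clustering the difference passes the 0.05 threshold. With a hypothetical ICC of 0.05 the effective sample shrinks to well under half of 9,000, and the same difference no longer passes.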

Then there is a final problem with the data. In the report, there are dozens of data points reported: on solvents, smoking, ketamine, cannabis, and so on. Standard practice in research is to say we only accept a finding as significant if it has a p-value of 0.05 or less. But like we said, a p-value of 0.05 means that for every 100 comparisons you do, five will be positive by chance alone. From this report you could have done dozens of comparisons, and some of them would indeed have been positive, but by chance alone, and the cocaine figure could be one of those. This is why statisticians do a “correction for multiple comparisons”, which is particularly brutal on the data, and often reduces the significance of findings dramatically, just like correcting for clustering can.
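The arithmetic behind this can be sketched in a few lines. The number of comparisons (20) and the uncorrected p-value (0.0085, roughly what a simple two-proportion test gives if 9,000 children are assumed in each year) are assumptions for illustration, not figures from the report. The family-wise formula also assumes the comparisons are independent of each other.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons

def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison threshold that keeps the overall error rate at or below alpha."""
    return alpha / n_comparisons

alpha = 0.05
n_comparisons = 20        # hypothetical count of comparisons in the report
cocaine_p_value = 0.0085  # assumed uncorrected p-value for the cocaine figure

print(f"Chance of at least one fluke: {familywise_error_rate(alpha, n_comparisons):.2f}")
print(f"Bonferroni threshold: {bonferroni_threshold(alpha, n_comparisons):.4f}")
print("Survives correction:", cocaine_p_value <= bonferroni_threshold(alpha, n_comparisons))
```

With 20 comparisons there is roughly a two-in-three chance that at least one looks significant by luck alone, and under these assumptions the cocaine figure does not clear the corrected threshold.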

Mining is a dangerous profession – and data mining is just the same. The stats nerds who compiled this government report knew about clustering, and Bonferroni’s correction for multiple comparisons, and that, presumably, is why they said, quite clearly, that there was no change from 2004-2005. The journalists, apparently, did not want to believe this: they tried to re-interpret the data, and the story went from an increase of 0.5%, that might be a gradual trend, but could well be an entirely chance finding, to being a front page story in the Times about cocaine use doubling. You might not trust the press release, but if you don’t know your science, you take a big chance, when you delve under the bonnet of a study to find a story.

· Please send your bad science to bad.science@goldacre.net

## Adam said,

April 1, 2006 at 5:02 am

Yet another example showing that a good grasp of statistics is not something to be sniffed at.

## sam said,

April 1, 2006 at 6:08 am

Nice article. But it would also be useful to point out that this kind of statistics compares two hypotheses: no-change (the null hypothesis) versus change (the experimental hypothesis). This kind of statistics is not as good as saying how big the change is–the emphasis is on rejecting the null hypothesis.

In this case, it seems extremely likely that there is a change from year to year. That is, looking at the population of all children in the UK, there is likely to be some change year to year. It’s very unlikely that drug use stays *exactly* the same every year, considering the millions of children involved.

So, null-hypothesis significance testing does not give a lot of insights here, and does not really address whether drug use has doubled or indeed what size the change has been. Confidence intervals might help a bit….

## profnick said,

April 1, 2006 at 8:06 am

Hmm, all true but Ben’s wording could be another confusing factor. Ignoring the fact for the moment that the change is very small and likely, under the circumstances, to be insignificant, the phrase “What we now have is an increase of 0.5%:” is misleading. The numerical change from 1.4% to 1.9% is indeed 0.5 but in terms of the proportion of the population sampled it is a change of 0.5/1.4 which is about 36%. So in the newspaper jargon it is not “doubled” but “increased by one third”…..still probably insignificant and certainly not so much of a thundering headline
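The two ways of describing the change can be set side by side in a short sketch (the function name is made up for illustration). It also shows how the rounded figures of 1% and 2% produce the “doubling”.

```python
def absolute_and_relative_change(before_pct, after_pct):
    """Return the change in percentage points and as a percent of the starting value."""
    absolute_points = after_pct - before_pct
    relative_pct = 100 * absolute_points / before_pct
    return absolute_points, relative_pct

# Unrounded survey figures: 1.4% in 2004, 1.9% in 2005.
points, relative = absolute_and_relative_change(1.4, 1.9)
print(f"{points:.1f} percentage points, {relative:.0f}% relative increase")

# Rounded figures as printed in the data tables: 1% and 2%.
points, relative = absolute_and_relative_change(1, 2)
print(f"{points:.1f} percentage points, {relative:.0f}% relative increase")
```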

## Jason Steward said,

April 1, 2006 at 8:50 am

““What we now have is an increase of 0.5%:” is misleading.”

Here in the US, we might say “An increase in 0.5 percentage points.” But the public in general wouldn’t know the difference. As I’m sure Ben has thoroughly confused UK readers….

“This is why statisticians do a “correction for multiple comparisons”, which is particularly brutal on the data, and often reduces the significance of findings dramatically, just like correcting for clustering can.”

Also, I always thought the “correction for multiple comparisons” was performed when comparing across the same groups multiple times with an analysis. For example, if instead of comparing by year, you compare between different groups within a year. A vs. B, A vs. C, C vs. B, etc. So an ANOVA (vs. multiple t-tests), for example, would test all of them at the same time correcting for the multiple comparisons.

In Ben’s example, it doesn’t seem necessary.

If someone could clarify……

## Joe Otten said,

April 1, 2006 at 9:44 am

I would disagree with Sam. Testing against the null hypothesis of no change is illuminating. Yes, the sample changes every year, but the question is whether this change is meaningful – whether the causes have changed or whether this is just random variation.

Rejecting the null hypothesis is saying that there has been a meaningful change, in the direction indicated by the data. If the data doesn’t support rejecting the null hypothesis, the underlying causes may still have moved, but there is not enough evidence to say in which direction they have moved.

## Rage on Omnipotent » Blog Archive » Does anyone get statistics? said,

April 1, 2006 at 10:20 am

[…] I used to – certainly don’t any more. Lots of honorable confusion on display while trying to debunk a front page news item. […]

## Ben Goldacre said,

April 1, 2006 at 10:52 am

[coughs] you might notice that this column is about 150 words longer than usual, because i whined and moaned and asked for more space so i could explain the facts of the story, statistical significance, correction for clustering, and correction for multiple comparisons, all in one column, in the main body of a national newspaper on a saturday.

now here is a request, and requests like this would usually be made grumpily, but in this case it’s honestly for my own interest, because if you can do it, i will steal your phrasing next time: if you think a piece of stats could have been explained better, then please write exactly what you would have written, even if it’s just for one section, and give a word count.

i’ll make it easier for you by giving a word count for each section. remember, your prose has to flow, introduce the concept *and* show how it was relevant for this example. your imaginary audience is a perfectly intelligent person who can understand anything that is clearly explained to them, but might not want to spend a huge amount of time trying, and just happens to know nothing at all about stats:

correction for multiple comparisons: 136 words

correction for clustering: 173 words

statistical significance testing: 123 words

## drk said,

April 1, 2006 at 11:14 am

Seems to me we need a “Bad Statistics” site for the statistics enthusiasts.

A long time ago I learnt and taught basic stats at university level – but this discussion will send me grovelling amongst my old textbooks.

Any “Bad Statistics” site would have a faq, possibly a wiki, and plenty of discussions on “Bad Statistics”

I hope some of the statistics mavens here are up to it – the cyber-community could do with a good stats based site that explained everything in basic terms, nailed the lies, and explained the grey areas.

## Pedantica said,

April 1, 2006 at 11:38 am

There is another aspect to this as well. It is based on a survey. When children are surveyed about their behaviour there is evidence that some of them will lie to appear more cool, even on an anonymous survey. For example I think there is evidence that when asked about whether they have had sex, boys turn out to be having a great deal more sex than girls. For boys there is often kudos to be earned from claiming to have had sex; for girls this is generally less true.

Now if this tendency to lie on surveys is stable from one year to the next then the relative increase or decrease shown by a survey should be accurate. But what if it is not? If cocaine was becoming perceived as more ‘cool’ then you might expect more young people to lie about using it.

I think this is another reason to be cautious of any results, particularly where their statistical significance is questionable.

## profnick said,

April 1, 2006 at 11:50 am

Ben,

I’m sure you don’t need anyone to improve your writing to deadlines or word limits so I’m not even going to try. For my own personal preference I would have replaced your instances of “0.5%” with “from 1.4% to 1.9%” and omitted all the stuff about correcting for clustering. The hatchet job on the front page article still stands.

## oharar said,

April 1, 2006 at 12:48 pm

“Also, I always thought the “correction for multiple comparisons” was performed when comparing across the same groups multiple times with an analysis.

…

In Ben’s example, it doesn’t seem necessary.

If someone could clarify……”

It is still necessary, for the reasons that Ben points out. There are many statistical tests reported, so some of them will be significant by chance. It’s the same problem as *post hoc* comparisons in ANOVA (if you throw enough darts at the dart-board, one of them’s going to hit the bull).

Incidentally, I hope whoever decided to round the percentages in the original report feels suitably chastened.

Bob

## Dan Curtis said,

April 1, 2006 at 12:54 pm

Is it bad science that is preventing this article from being available on today’s edition of The Guardian online?

## Clarkeycat said,

April 1, 2006 at 12:56 pm

Further to what Pedantica has said (post 9 above) I remember being given one of these surveys to complete during my school days about a decade ago.

Now at the time I was a pale, callow, 14 year old virgin grammar school boy – but someone reading my survey could only conclude it was completed by a young Pete Doherty.

All my friends and I had a good laugh afterwards and thought no more about it until about three months later the Daily Mail (obviously) began fuming about rampant teenage sex and drug use.

I’m sure the statisticians have got wise to this kind of larking about and do some sort of adjustment but nevertheless I’ve not been able to see one of these stories based on surveys without laughing ever since.

While it is true that if the level of blatant lies remains stable then the real increase should come through, I just find it hard to believe such small increases given the fact that (from my experience) almost everybody taking the survey will be lying to a greater or lesser extent.

Of course, this doesn’t change the fact that the Telegraph’s front page story was still bobbins though…

## Ben Goldacre said,

April 1, 2006 at 12:58 pm

www.guardian.co.uk/life/badscience/story/0,,1744541,00.html

## stever said,

April 1, 2006 at 1:10 pm

*coughs*

www.badscience.net/?a=xdforum&xdforum_action=postreply&xf_id=1&xt_id=225

## stever said,

April 1, 2006 at 1:11 pm

what i meant was

*coughs*

www.badscience.net/?a=xdforum&xdforum_action=viewthread&xf_id=1&xt_id=225

## stever said,

April 1, 2006 at 1:14 pm

Good column as ever Ben. excellent source material.

my only small gripe is that you didn’t mention that the Guardian also ran the story under the headline ‘cocaine use doubles amongst school pupils’ or something similar. The Sun ran it too – all of them taking it directly from the Times.

## SRW said,

April 1, 2006 at 1:45 pm

Ben,

“An increase of 0.5%” is not “misleading” (Profnick above), it’s plain wrong. 1.4% to 1.9% is an increase of 0.5 percentage points or almost 36%.

If you’re responsible for this little mathematical howler please hang your head in shame — it spoils an otherwise superb piece.

The point is in the Grauniad style guide (see “percentage rises” in www.guardian.co.uk/styleguide/page/0,,184842,00.html). It’s been corrected a number of times in the Corrections and Clarifications column.

SRW

## RS said,

April 1, 2006 at 2:14 pm

Surely in context anyone with half a brain can see that a rise from 1.4% to 1.9% is a rise of 0.5 percentage points, or a rise of 30 odd percent. If they are confused by that they’re too dim to understand the rest.

Re: multiple comparisons, although a scientist mightn’t correct for them, doesn’t mean they shouldn’t. In theory you should correct for multiple studies as well.

## MsT said,

April 1, 2006 at 2:48 pm

Surely nobody could seriously suggest those figures were ambiguous? I mean, when all the numbers were given, in full? That’s insane..

## profnick said,

April 1, 2006 at 3:44 pm

OK so I was being polite by using the term “misleading” (SRW post), but it’s being optimistic even to Guardian readers to suggest that “anyone with half a brain can see that a rise from 1.4% to 1.9% is a rise of 0.5 percentage points, or a rise of 30 odd percent.”

## stever said,

April 1, 2006 at 4:05 pm

to quote the pioneering expose by me: ‘the survey shows an increase 0.5% overall and an increase of just over a third on the baseline figure.’

## profnick said,

April 1, 2006 at 4:32 pm

Stever,

Quite.

Still smiling?

## Martin said,

April 1, 2006 at 5:00 pm

Stever didn’t use the words “percentage point”, he just said “0.5%” like Ben. Does that mean his description is inaccurate or ambiguous as well? No. It is made clear by context. This is a pretty ridiculous criticism of the article. Giving all the numbers is a pretty damn clear way of making those numbers ENTIRELY unambiguous to my mind.

## SRW said,

April 1, 2006 at 5:02 pm

In my experience even quite mathematically sophisticated people (such as, for instance, those who have enough statistics to write a Bad Science expose of the Times) don’t register that there is an important difference between percentage point rises and percentage rises.

Or, to personalise it in a different direction: I have a maths degree from a decent university, and have done postgraduate work in maths. It was only when I started work in the real world that someone pointed out that there was an important difference.

It’s obvious when it is pointed out, but it isn’t pointed out often enough.

“0.5% rise” is minuscule, not significant on (almost) any dataset at all. “0.5 percentage points rise” is significant on the particular dataset under discussion. And that was the whole point of the piece.

SRW

## ProfT said,

April 1, 2006 at 5:07 pm

SRW: what you are identifying is that the phrase “percentage point” is inconsistently used and inconsistently understood, especially in the UK, so I am surprised to see you are advocating its use, as it is not a satisfactory term. It is because of that very fact that most people would choose not to use it and disambiguate through proper technical terminology or clear context. The correct term here is “Absolute Risk Increase” for 0.5% and “Relative Risk Increase” for 40%, but neither was necessary here since there was no confusion, the raw numbers were all given.

## Martin said,

April 1, 2006 at 5:10 pm

SRW: “In my experience even quite mathematically sophisticated people (such as, for instance, those who have enough statistics to write a Bad Science expose of the Times) don’t register that there is an important difference between percentage point rises and percentage rises.”

SRW that’s almost unbelievably stupid. It’s quite clear that Ben understands the difference between absolute increase and relative increase, he just didn’t use your imperfect way of disambiguating them, he used the full set of numbers instead, meaning that nobody was in any doubt as to what the numbers meant.

## RS said,

April 1, 2006 at 5:15 pm

But, in context, a 0.5% rise is clearly a rise in the percentage points from 1.4% to 1.9%. Saying 0.5% rise is simply ambiguous, not wrong, and, in the context, the meaning is obvious. If I say 20% of people get X, and you say 50% of people get X, and I point out that a difference of 30% in our estimates is worrying, the correct response is not ‘ah, no, I think you’ll find there is a 150% difference’. Actually, that’s one of the reasons I tend not to talk about percentage differences like that, because the figure depends on which value you use for the denominator, so is also ambiguous.

## sam said,

April 1, 2006 at 6:09 pm

My point was not that the sample changes every year, but that the population also changes every year. With millions of children in the UK, how could drug use in the population stay exactly the same year to year? The words “significant” and “meaningful” can be thrown around, and they sound good. But all that null hypothesis significance testing really does is address whether the null hypothesis (no change) is wrong–and in this case it is just about certain that the null hypothesis is wrong. Null hypothesis significance testing does not say whether a change is large or whether it is “meaningful.” Confidence intervals, or effect sizes, are better for this purpose.

## Janet W said,

April 1, 2006 at 6:33 pm

I want to know what percentage stever gets.

## sam said,

April 1, 2006 at 6:47 pm

Think about your own, say, alcohol use. How likely is the null hypothesis for just one person? Did you drink exactly the same number of pints in 2004 and 2005? It’s very likely that there was some change one way or the other. Now multiply this by millions of children in the population. Surely there is a change. For one thing, if 11-15 year olds are of interest, then different children will be in this range every year. It is extremely unlikely that the new children in this range are exactly the same as the ones who fall out of the age range.

I should add that Ben’s writing is great–the problem is that null hypothesis significance testing is overused, even by people who know about stats.

## RS said,

April 1, 2006 at 7:18 pm

But you are normally constructing your statistical test based on the variance, so the year to year fluctuations will be taken into account in the p-value produced.

## sam said,

April 1, 2006 at 7:38 pm

The point is that once you accept that there must be year to year fluctuations in the population, there is no purpose to testing the null hypothesis, that the population does not change at all. If you drank 150 pints of beer in 2004 and 153 pints in 2005, is that a fluctuation? A change? A significant change? A meaningful change?

## RS said,

April 1, 2006 at 8:34 pm

I don’t know what test Ben did (chi-square?), but most statistical tests are explicitly designed to take into account the random fluctuations about the mean level and only report a significant difference if the figure is outside the range expected by chance. Meaningful change is a completely different issue, and not one being discussed here – the medical literature quite often worries about this, and confidence intervals are a good way to look at it.

## sam said,

April 1, 2006 at 9:04 pm

Well first of all the p value does not tell you how big the change is.

Statistical tests are designed to deal with fluctuations in the sampling process–if you ask 9000 kids out of a population of millions of kids, drug use in the sample will depend on which kids you ask. If you ask a different 9000 kids the same year, you will get slightly different results, due to sampling fluctuations. Note that these are fluctuations in the SAMPLE.

Here, the sample from 2005 was compared to the sample from 2004. The difference was big enough to reject the null hypothesis, that drug use in the population of all children in 2005 was exactly the same as drug use in the population of all children in 2004.

My point is that drug use in the whole population in 2005 was obviously not exactly the same as drug use in the whole population in 2004. The null hypothesis is obviously wrong. There was obviously some true change in the population–how could it possibly stay exactly the same? This change is different from the idea of sampling fluctuations.

Once we accept that the null hypothesis is obviously wrong, there’s no point to doing a significance test on the null hypothesis.

## RS said,

April 1, 2006 at 10:22 pm

Sam, while they are not the same literal population, we are interested in if they are the same statistical population. These are not the same thing at all. Our statistical test is trying to see if both samples are drawn from the same statistical population. A significant difference implies that the two samples are drawn from populations with different statistical structures. You seem to be treating the test as between two populations (2004 & 2005) with two estimates of the different populations. The actual interesting question is whether the 2004 and 2005 populations are both instances of an overall statistical structure that is not changing, only being instantiated (sampled) slightly differently each time. That is the question being tested, not whether the rate in 2004 is exactly the same as 2005.

## sam said,

April 1, 2006 at 10:41 pm

It is extremely unlikely that the two populations have exactly the same statistical structure, for example it is extremely unlikely that the two populations (all children in 2004 and all children in 2005) had exactly the same level of drug use. It is extremely likely that the overall statistical structure, for the whole population, changed from 2004 to 2005, just because it is so implausible that the statistical structure stayed exactly the same. The null hypothesis is that the statistical structures of the two populations are exactly the same–clearly the null hypothesis is wrong. This is why there is no point to testing the null hypothesis. This has nothing to do with sampling–the point is that the two populations truly differ, in terms of statistical structure.

## RS said,

April 1, 2006 at 10:43 pm

Sam, your reasoning says that no statistical test should ever be done.

The question being tested is whether the two populations are drawn from the same distribution, not whether they are exactly the same – this is the fundamental misapprehension that prevents you from understanding the point of testing the null hypothesis.

## sam said,

April 1, 2006 at 10:55 pm

Sorry, RS, you don’t draw populations from a distribution–you draw samples from a distribution. You’re getting samples and populations mixed up.

I’m not saying that no statistical test should ever be done. My point is that null hypothesis significance testing is limited in usefulness. It does not tell you how big a change is–this was the main concern, whether drug use doubled. It is somewhat useful for dealing with fluctuations due to sampling, but here there are two populations that are extremely unlikely to have exactly the same statistical structure. So testing the null hypothesis is misguided in this case. Alternatives such as confidence intervals or effect sizes would be more appropriate–these would give an estimate of how big the change was.

## RS said,

April 1, 2006 at 11:11 pm

Sam, I’m not getting populations and samples mixed up. What I am pointing out is why your problem with null hypothesis testing does not apply. The question being tested is not whether 2004 and 2005 have exactly the same frequencies. The question being tested is whether the estimates from each year are consistent with being drawn from the same distribution. Now you’re trying to argue that, since we can be pretty sure 2004 and 2005 will have different distributions, there is no point doing this test. I’m saying that the interesting question is whether, each year, the difference between populations is just the variation we would expect from them all being drawn from the same distribution (this is like the problem with different samples of the same ‘population’), or whether the populations between years are not being drawn from the same distribution and are thus ‘significantly different’ in an interesting way. This is what a statistical comparison of the two samples tests. Essentially I am saying that you are being misled by excessively strictly construing the meaning of words like sample and population – when it comes to comparing populations over time it becomes meaningless, but the answer is not to make the claim that therefore the numbers are trivially significantly different. The obvious example is that your interpretation cannot distinguish between an increased frequency of 1.4% to 1.9%, or an increased proportion from 1.4% to 50% – ‘hey, we already knew the populations would be a bit different, it’s all significant man!’

What would a confidence interval mean in this context? We are dealing with frequencies, they don’t have confidence intervals, and effect sizes are simply comparing the frequencies without any stats.

## stever said,

April 1, 2006 at 11:19 pm

haha – listen to you two!

## sam said,

April 1, 2006 at 11:20 pm

Well if I am going to be accused of strictly construing the meaning of words like sample and population, that’s good.

The problem with null hypothesis significance testing is that it does not distinguish between a change of 0.5% and a change of 48.6%. It does not tell you how big the change is. Null hypothesis significance testing examines whether the change is exactly 0.0000000000000000%. (some zeroes omitted)

A confidence interval would be something like: our best estimate is that the true change in the population is somewhere in the range from 0.1% to 1.0%. Ben should have reported confidence intervals for the change, rather than testing whether the change is “significant”.
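As a rough sketch of such an interval, assuming 9,000 children in each year and independent responses (neither is confirmed by the report, and the function name is illustrative), a standard normal-approximation interval for the difference of two proportions looks like this:

```python
import math

def diff_confidence_interval(p1, n1, p2, n2, z_crit=1.96):
    """Approximate 95% confidence interval for p2 - p1 (normal approximation)."""
    # Unpooled standard error: each year keeps its own estimated variance.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - z_crit * se, diff + z_crit * se

# Assumption: 9,000 independent responses in both 2004 and 2005.
low, high = diff_confidence_interval(0.014, 9000, 0.019, 9000)
print(f"Change: {100 * low:.2f} to {100 * high:.2f} percentage points")
```

Under those assumptions the interval runs from roughly 0.1 to 0.9 percentage points; allowing for clustering would widen it.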

## PK said,

April 1, 2006 at 11:33 pm

Yes. No. Yes. No. Yes! No! YES! NO! YES! NOOOOOOOO!

## RS said,

April 1, 2006 at 11:51 pm

Sam, where you’re getting mixed up is that a statistical population is not necessarily the same as a population of people.

But I think you have some bizarre bee in your bonnet about significance testing, so I’m going to bed.

## sam said,

April 2, 2006 at 12:01 am

bzzzzzzzzzz

I agree that a statistical population is not the same as a population of people. And I am not the one who tried to use sample and population interchangeably.

RS, you did not respond to my main point, that null hypothesis significance testing does not tell how big the difference is. It only tests whether there is a difference of 0.00000000000. (further zeroes omitted)

I’m reluctant to say this, but intoning (in a deep voice of course, while scratching one’s beard) that a difference is “significant”, without trying to estimate how big the difference is, is bad science.

## Matt said,

April 2, 2006 at 12:28 am

Ignoring the RS vs. sam debate for the moment, I’d like to return to the earlier point about whether saying “an increase of 0.5%” is misleading.

I’m going to try and agree with both sides of the argument. Anyone who reads the whole article won’t be left in any doubt as to the meaning of the numbers and what is going on. However, the reader who reads the first few paragraphs, substitutes “math-math-blah-blah-math” for the middle chunk and then reads the final paragraph might be in trouble. In this paragraph, Ben clearly compares “an increase of 0.5%” with “doubling”, and mocks the Times for mixing them up. This seems to be the classic statistical trick of choosing a way of presenting your figures to gain the maximum instinctive response.

## Martin said,

April 2, 2006 at 2:54 am

Not only would these hypothetical people who are confused have to read only half of the article, they’d also have to read only half of the relevant sentences, to get confused in the way you imagine. This is becoming increasingly bizarre, nobody finds the figures ambiguous, but some of you are trying desperately to find a way that some other people might. The article was a completely accurate description of the facts. I’m really not sure there is anything you can do to account for people who read only half a sentence in half an article.

## Peter said,

April 2, 2006 at 5:15 am

Although the papers may have got their statistics wrong, the fact remains that some kids are taking cocaine. Indeed, as the people here can’t seem to decide what’s exactly right for the statistical analysis how the hell can some hack? It smacks somewhat of being oh-so-smart but only with hindsight.

Glad my first contribution was so soothing and un-feather-ruffling.

## oharar said,

April 2, 2006 at 7:15 am

I’m with sam on this….

“The question being tested is not whether 2004 and 2005 have exactly the same frequencies. The question being tested is whether the estimates from each year are consistent with being drawn from the same distribution.”

Sorry, but I’m pretty sure you’re wrong. I can’t see how Ben could get an estimate of the expected between-year variation from the data, and indeed he explicitly acknowledges that he can’t include any extra variation (that’s the bit about clustering). I suspect that he used either a chi-squared or a Fisher’s exact test, both of which use the null hypothesis that the frequencies are the same. Obviously Ben can correct me if I’m wrong.

Taking a step back, one thing I like about this article is that Ben doesn’t go straight for the p-value: he actually discusses it in terms of effect sizes, and interprets those numbers directly. Most good applied statisticians (at least the ones I respect) downplay the use of significance tests. As sam has pointed out, they don’t tell you how strong the effects are. Actually, with a sample size much below 9000, a difference of this size wouldn’t be statistically significant.

Ah well, it sounds like PK was having a good time anyway.

Bob

## stever said,

April 2, 2006 at 9:50 am

Peter – there are some things that aren’t in dispute.

the main one being that because of some staggeringly shoddy journalism, a .5% point or 35% increase in cocaine use, barely statistically significant, was reported as a ‘doubling’ which then became a flood of cocaine in playgrounds. That was because they cherry-picked a stat and didn’t realise or check that it had (obviously) been rounded, and they didn’t even bother to ask someone with GCSE level maths what it might mean.
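The three ways of describing the same change can be laid side by side. This is only an arithmetic sketch using the 1.4% and 1.9% figures quoted in the article; the variable names are illustrative.

```python
before, after = 1.4, 1.9   # percent answering "yes" in 2004 and 2005

point_change = after - before                      # change in percentage points
relative_change = point_change / before * 100      # change as a percent of the 2004 figure
rounded_ratio = round(after) / round(before)       # what the rounded table (1% and 2%) suggests

print(f"{point_change:.1f} points, {relative_change:.0f}% relative, "
      f"x{rounded_ratio:.0f} if you only see the rounded figures")
```

The "doubling" only appears in the last line, where both figures have been rounded to whole percentages before being compared.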

## Squander Two said,

April 2, 2006 at 9:53 am

> Not only would these hypothetical people who are confused have to read only half of the article, they’d also have to read only half of the relevant sentences, to get confused in the way you imagine.

That is, in fact, how many people read newspapers, especially how mathematically ignorant people read anything with statistics in it. Look at how *The Times* got the story in the first place: ignore the wider report, because it’s all just weird numbers and stuff that no-one really cares about, find the two figures of 1% and 2%, ignore the possibility of rounding because something so mathematical simply wouldn’t occur to your average journalist, and go with “DOUBLED!”

However, Ben’s in the forefront of the fight against such crappy ignorance. Ignorant people who read his articles will end up wiser in the ways of science, assuming that they read the whole thing right through. I can’t imagine how he could possibly combat some readers’ sloppy scan-reading, short of going round to their houses and shouting at them.

No matter what anyone writes about anything, some reader out there will only read bits and therefore completely misconstrue its meaning. That’s not the writer’s fault.

## RS said,

April 2, 2006 at 10:24 am

“Sorry, but I’m pretty sure you’re wrong. I can’t see how Ben could get an estimate of the expected between-year variation from the data, and indeed he explicitly acknowledges that he can’t include any extra variation (that’s the bit about clustering). I suspect that he used either a chi-squared or a Fisher’s exact test, both of which use the null hypothesis that the frequencies are the same. Obviously Ben can correct me if I’m wrong.”

Although they test to see if the frequencies are the same they also have assumptions built in that allow for variation within certain limits of those frequencies – that is why they are statistical tests – all I am trying to say is that while the true 2004 and 2005 frequencies may be slightly different, that isn’t what testing the two samples is trying to find out, it is trying to find out if the two samples are drawn from the same population (in the statistical sense), which can’t just be declared to be trivially false because the 2004 and 2005 samples are drawn from two different populations (in the epidemiological sense).

“RS, you did not respond to my main point, that null hypothesis significance testing does not tell how big the difference is. It only tests whether there is a difference of 0.00000000000. (further zeroes omitted)…I’m reluctant to say this, but intoning (in a deep voice of course, while scratching ones beard) that a difference is “significant”, without trying to estimate how big the difference is, is bad science.”

I did, right at the beginning. All a significance test will tell you is if you can be pretty sure that the samples are drawn from different statistical populations. In the context it is exactly the right thing to do because Ben is wondering whether a difference that is so small represents any difference in the populations at all (i.e. is it just noise). So the null hypothesis is the one he wants to test, and he uses the word significant in its usual statistical sense just like the rest of us do when we publish. The question of whether that is ‘meaningful’, or how the confidence intervals fall is not relevant, although obviously interesting. He could have calculated confidence intervals and noted how they don’t rule out an effect size of only X, but that would be a different argument. The figures he has are a best estimate of the actual frequencies, so worth discussing.
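For readers wondering what such a confidence interval might look like, here is a rough sketch. It assumes simple random samples of 9000 in each year and uses the plain Wald (normal approximation) interval for a difference of two proportions. It ignores clustering by school, so the real interval would be wider, and the function name is hypothetical.

```python
import math

def diff_ci(p1, p2, n1, n2, z=1.96):
    """Approximate 95% Wald interval for p1 - p2 (unpooled standard error)."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

low, high = diff_ci(0.019, 0.014, 9000, 9000)
print(f"increase of {low * 100:.2f} to {high * 100:.2f} percentage points")
```

An interval like this answers the "how big?" question directly: it excludes zero, but it also leaves room for an increase far smaller than half a percentage point.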

## Janet W said,

April 2, 2006 at 10:43 am

“null hypothesis significance testing does not tell how big the difference is..”

I only learned what a null hypothesis was a couple of weeks ago, but….I thought the point of null hypothesis significance testing was to assess the test, rather than the results, i.e. whether or not the test is good enough to tell you anything at all?

This is an odd conversation.

On one hand, Ben is challenging people to find a better way (moving only one word at a time) to pack an explanation of a rounding error and 3 reasonably tricky statistical concepts into one short Guardian article.

On the other hand, people are arguing about whether or not this same Guardian readership can work out that 0.5 is about a third of 1.4.

## oharar said,

April 2, 2006 at 10:49 am

“..all I am trying to say is that while the true 2004 and 2005 frequencies may be slightly different, that isn’t what testing the two samples is trying to find out, …”

I’m sorry, but this is simply wrong. The tests are tests of whether the frequencies are the same in the two populations, and the only variation they allow for is sampling variation (i.e. the fact that only 9000 individuals were sampled in each year, not the full populations).

“…it is trying to find out if the two samples are drawn from the same population (in the statistical sense)…”

Which will not be true if the frequencies are different, and that’s the only information used in the tests. Hence, it must be a test of whether the frequencies are different.

Bob

## RS said,

April 2, 2006 at 11:24 am

“I’m sorry, but this is simply wrong. The tests are tests of whether the frequencies are the same in the two populations.”

No it isn’t. The test is whether the two samples are drawn from the same population. What I’m trying to point out is that the true 2004 and 2005 frequencies are strictly speaking irrelevant to the test, but that we could consider this approach akin to testing whether the 2004 and 2005 populations are drawn from this hypothetical ‘true’ population of kids. I think it is too complicated to explain this when we’re using words like population in two different senses, and I’m off for the day, so y’all have fun.

## oharar said,

April 2, 2006 at 1:46 pm

“No it isn’t. The test is whether the two samples are drawn from the same population.”

But if they’re drawn from the same population, then they must have the same frequency.

“..but that we could consider this approach akin to testing whether the 2004 and 2005 populations are drawn from this hypothetical ‘true’ population of kids.”

Not so. You’re adding an extra level of sampling: first you’re sampling the (2004 or 2005) populations from a super-population, and then you’re sampling the sample used in the analysis. But the statistical tests only assume one level of sampling. There isn’t enough information in the data to do any more: one would need to have several years of data, and then fit some form of hierarchical model to the data, to account for the two levels of sampling.

Bob

## P.L.Hayes said,

April 2, 2006 at 2:43 pm

RS said, “I suspect that he used either a chi-squared or a Fisher’s exact test, both of which use the null hypothesis that the frequencies are the same”.

I’d guess from what he wrote that he just calculated by hand that the p-value for e.g. the hypothesis that p > 0.014 is much less than 0.05. A little help from a stats tool such as R would’ve gotten him the more tedious to calculate confidence intervals:

    > binom.test(171, 9000, 0.014, "g", 0.95)

    Exact binomial test

    data: 171 and 9000
    number of successes = 171, number of trials = 9000, p-value = 7.14e-05
    alternative hypothesis: true probability of success is greater than 0.014
    95 percent confidence interval:
     0.01669346 1.00000000
    sample estimates:
    probability of success
                     0.019

or

    > binom.test(171, 9000, 0.014, "t", 0.95)

    Exact binomial test

    data: 171 and 9000
    number of successes = 171, number of trials = 9000, p-value = 0.0001116
    alternative hypothesis: true probability of success is not equal to 0.014
    95 percent confidence interval:
     0.01628040 0.02203691
    sample estimates:
    probability of success
                     0.019

but I don’t think they’d have added anything useful to the point being made in the article.

## Janet W said,

April 2, 2006 at 2:45 pm

“I don’t know what test Ben did (chi-square?), but most statistical tests are explicitly designed to take into account the random fluctuations about the mean level and…”

I still don’t understand this p thing. Using the “5 heads” analogy, if you want to find the probability of getting at least 171 successes out of 9000 by chance, assuming that the probability of success is 1.4%, why can’t you just use the binomial distribution? I tried this and got 0.00007, which is a lot lower than the 0.05 quoted above, so can someone tell me where I’m going wrong?
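The calculation described here can be sketched in a few lines. It is an illustration only: it treats the 2004 figure of 1.4% as a fixed, known rate with no uncertainty of its own, assumes the 9000 children answer independently, and uses helper names invented for this example. Working in logarithms avoids overflow in the very large binomial coefficients.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def upper_tail(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n trials."""
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

# At least 171 "yes" answers out of 9000 if the true rate were exactly 1.4%.
p_value = upper_tail(171, 9000, 0.014)
print(f"{p_value:.2e}")
```

This is a one-sample question (is 2005 compatible with a known 1.4%?), which is not the same question as comparing two surveys that each carry sampling error.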

## P.L.Hayes said,

April 2, 2006 at 4:16 pm

Nowhere Janet W – or everywhere you can go wrong (except as far wrong as the newspapers go) given such summary data – which is the whole point I suppose 😉

## oharar said,

April 2, 2006 at 4:20 pm

P.L.Hayes and Janet W.: You’re forgetting that there’s uncertainty in the 0.014 figure as well. I back-calculated that to there being 126 children claiming to have used cocaine in 2004, and 171 in 2005. They are both from samples of 9000 children, and assuming simple random sampling, would both be binomially distributed. The test is whether *both* binomial samples came from a population with the same proportion.

As we’re speaking R, I used this:

fisher.test(matrix(c(171,126, 8829,8874), nrow=2))

The p-value is 0.010, and the confidence limit for the odds ratio is (1.075, 1.73) (the point estimate is 1.36, which gives the 36% increase mentioned above). This is too much detail for the article, but anyone who’s got this far down is sufficiently nerdy that they might find it useful.
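For anyone without R to hand, the same comparison can be sketched in Python using only the standard library. This is an assumption-laden illustration, not a reimplementation of R's `fisher.test`: it computes the two-sided p-value by summing the hypergeometric probabilities of all tables no more likely than the observed one, and it reports the simple sample odds ratio, whereas R reports a conditional estimate that can differ slightly. The function names are made up.

```python
import math

def log_comb(n: int, k: int) -> float:
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x: int) -> float:
        # Hypergeometric log-probability of x in the top-left cell, margins fixed.
        return (log_comb(col1, x) + log_comb(total - col1, row1 - x)
                - log_comb(total, row1))

    lowest, highest = max(0, col1 - row2), min(col1, row1)
    observed = log_prob(a)
    # Sum every table that is no more probable than the one actually seen.
    return sum(math.exp(log_prob(x)) for x in range(lowest, highest + 1)
               if log_prob(x) <= observed + 1e-7)

yes_2005, no_2005 = 171, 8829
yes_2004, no_2004 = 126, 8874

odds_ratio = (yes_2005 * no_2004) / (no_2005 * yes_2004)
p_value = fisher_two_sided(yes_2005, no_2005, yes_2004, no_2004)
print(f"odds ratio {odds_ratio:.2f}, p = {p_value:.3f}")
```

Like the R call, this assumes simple random sampling in both years, so it says nothing about the clustering by school.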

Bob

## Robert Carnegie said,

April 2, 2006 at 4:47 pm

Points:

At any rate, it isn’t a flood. If there’s a flood then -everyone- needs dry socks.

Of course kids exaggerate in the survey. They exaggerated in the previous survey too. So all the measurements are biased in the same direction.

(An interesting control would be to include an invented drug in the survey, like Chris Morris’s “cake”. Students who admitted using “cake” would be treated with the derision they deserve. Having said that, I’ve taken “cake” twice today already. With “cream”. I think I can handle it but my trousers don’t button so easily any more.)

If I follow the argument, there’s a 5% chance that there hasn’t been a trend at all, and a 95% chance that there has… both small percentages, of course. In the survey, the change represents a couple of classrooms full of students. I think if you handed out cocaine to -one- class of students you’d get into the newspapers…

Drug use by school students is a serious threat and should be taken seriously. Drug dealers want to establish relationships with users early, just as banks want to take on students as customers although they’re going to be a dead loss for the next few years. And kids can probably get into houses that have good stuff to take, you can try selling them to Gary Glitter… lots of opportunities.

## P.L.Hayes said,

April 2, 2006 at 4:49 pm

“P.L.Hayes and Janet W.: You’re forgetting that there’s uncertainty in the 0.014 figure as well.”

Well, “forgetting” isn’t quite the word I’d use – but point taken

## Fyse said,

April 2, 2006 at 10:46 pm

I fear this thread has got rather old already, but for what it’s worth here’s my two cents. (Many apologies for it being so long, I did try to cut it down…)

What we’re trying to decide is whether there is evidence in this data for a significant change in cocaine use amongst the age range in question. In this context significant means ‘not due to the inevitable random fluctuations from year to year’. (It is this random fluctuation that Sam suggests makes null hypothesis testing irrelevant.) The factors that govern cocaine use (how fashionable it is, local availability, price etc.) give a certain probability of use for each individual child. This defines a probability distribution, and the actual usage among the whole population of children for a given year is a sample (of many millions) from this probability distribution. Because the samples from the underlying distribution are so huge, random fluctuations are likely to be small. This means any change in the actual usage between years is probably indicative of a change in the underlying distribution. It is therefore useful to attempt to characterise these sample populations by a small survey of 9000 individuals.

The null hypothesis would therefore read ‘There is no change in the underlying probability distribution that determines the chance of an individual child taking cocaine’. While there are obviously fluctuations in the actual figures for each year, the important thing when planning strategies to combat use of drugs is to look at the underlying distribution. This is not easy to do and requires a lot of careful consideration, particularly given that we are looking at samples of samples, but that does not make it a pointless exercise. Since government policy is intended to reduce the use of drugs, if there is no evidence for a decrease (of any size) then their policies are probably not having the desired effect.

If I’ve followed the animated discussion correctly, I think this generally agrees with RS. The distinction that I think has caused the confusion is that we’re trying to determine changes in the underlying probability distribution governing child use of cocaine, not change in the actual number of users in any given year.

For what it’s worth Ben, I thought you made an excellent attempt to explain a tricky subject to the layman. I was talking to some friends about this earlier today, but they gained far more from five minutes with your article than from half an hour of my garbled explanations.

## Fyse said,

April 2, 2006 at 10:51 pm

PS I never quite saw the point in statistics at school, and spent A level lessons playing poker on the back row. Only now do I begin to see quite how important it all is…

## sam said,

April 2, 2006 at 11:10 pm

The goal should not be to judge whether there is evidence for a “significant” change. A “significant” change, in statistical terms, does not mean a big change. It would be possible to have a big change that is not statistically significant, or a tiny change that is statistically significant. Significance is not the same as size of the change.

Taking the above example, let’s say there has been a “significant” change of either 0.5% or 48.6%. The size of the change is what matters, e.g., for policy. And null hypothesis significance testing does not tell you the size of the change.

The problem with Fyse’s comments is that there is no statistical concept of “inevitable random fluctuations from year to year”. There is a well defined concept of error due to sampling. Null hypothesis significance testing addresses this. But the problem is that the statistical characteristics of the population are undoubtedly changing from year to year. This is not sampling error. So we already know that the null hypothesis is wrong.

I’ve never heard of the idea that a population is just another sample, and that collecting data involves taking a sample from a sample. The idea of two levels of sampling is interesting, but this is not how conventional statistics works.

Sorry for sounding like a broken record….

## Fyse said,

April 3, 2006 at 12:06 am

Clearly the thread isn’t dead yet! I hope Ben doesn’t mind that this discussion is only tenuously linked to his post…

>> “The goal should not be to judge whether there is evidence for a “significant” change.”

We’re not discussing whether there is evidence for a ‘significant change’. We are discussing whether there is ‘significant evidence’ for a change (of any size). The distinction is vitally important.

Whenever we use statistics to examine change we *must* determine whether it is significant. This tells us whether the observed change is likely an indication of anything, or just a random sampling error. Obviously ‘significant’ doesn’t mean ‘big’. It means the data can give us valuable information on the hypothesis under consideration. The appropriate measure of significance is different for different situations, but it is always a relevant concept.

>> “The size of the change is what matters, e.g., for policy.”

Obviously it would be nice to know with confidence the actual size of the change, but if that isn’t possible then it can still be useful to know simply whether a change is present at all. Incomplete information is not the same as a lack of information.

>> “The problem with Fyse’s comments is that there is no statistical concept of “inevitable random fluctuations from year to year”.”

That was ill-advised phrasing on my part, but the point is valid. When taking a finite sample from a distribution you will inevitably get random sampling errors.

>> “the statistical characteristics of the population are undoubtedly changing”

Yes, but is the underlying probability distribution changing? I agree that given the number of variables involved it is almost certain to alter, if only slightly. But even if we assume it is definitely changing, can we confidently assert that it is increasing (as opposed to decreasing)? That is a simple yes or no question to which null hypothesis testing can be applied.

>> “I’ve never heard of the idea that a population is just another sample”

Yeah, I’ve been pondering that since my comment and I haven’t quite followed it through properly yet. It only came from my thinking through this problem, and is a bit beyond my formal education in stats. I think it’s a valid point however, since a finite sample size cannot perfectly characterise a distribution. I fear we may get rather deep into the very nature of statistical analysis…

Although I think standard null hypothesis testing can give useful information about this problem, it’s probably not the ideal way to approach it. After an Information Theory course I tend to try and apply Bayesian inference to everything, and that would certainly require taking into account that the whole population is a sample of the underlying distribution (if only to assume a limit of large sample size).
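As a rough illustration of what a Bayesian version of the "has it gone up?" question might look like, here is a minimal Monte Carlo sketch. Everything about it is an assumption made for the example: independent children, a flat Beta(1, 1) prior on each year's rate, the back-calculated counts of 126 and 171 out of 9000, and a hypothetical function name. It does not model clustering or the population-as-sample idea discussed above.

```python
import random

def prob_increase(yes_a, n_a, yes_b, n_b, draws=20000, seed=1):
    """Posterior probability that rate B exceeds rate A under flat Beta priors."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior is Beta(yes + 1, no + 1).
        rate_a = rng.betavariate(yes_a + 1, n_a - yes_a + 1)
        rate_b = rng.betavariate(yes_b + 1, n_b - yes_b + 1)
        hits += rate_b > rate_a
    return hits / draws

print(prob_increase(126, 9000, 171, 9000))
```

The output is a direct statement about the direction of the change rather than a p-value, but it inherits the same independence assumption that the clustering argument calls into question.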

Sorry, another massive comment. (I don’t suppose there are many left reading all this.)

## Fyse said,

April 3, 2006 at 12:08 am

PS Oops, I meant to add…

Apologies for my previous sloppiness with the distinction between significant change and significant evidence.

## Sockatume said,

April 3, 2006 at 8:37 am

Robert Carnegie: that’s a great idea. It would provide a way of quantifying the kids-wanting-to-look-cool effect. Just put a made-up drug in there and count the responses.

Anyway, the whole mess just goes to show that you should remind yourself what statistics actually mean in reality. Why, my yogurt intake has increased infinity percent this week compared to the preceding week…

## Chris L said,

April 3, 2006 at 10:33 am

Sockatume (#69): Excellent point. I’m reminded of this at least once a week, when some hobby, disease, cult or whatever claims (or is claimed) to be “the fastest growing X in the country”. This is meaningless without the actual figures, which are never given, because they would reveal the original claim to be nonsense.

Similarly, I often hear the claim that “if everyone did [pro-environment action du jour], we could close down one power station”. Again, without providing the actual figures, such claims are meaningless. I did some research and found that there are approximately 130-ish large power stations in the UK, so these claims amount to saying that “[pro-environment action du jour] will reduce your leccy bill by less than 1%”.

Re: the earlier poster who suggested a “Bad Statistics” site, try www.numberwatch.co.uk – it’s somewhat opinionated but is always a good read.

## AndrewT in a cafe said,

April 3, 2006 at 11:49 am

numberwatch is a nasty, vindictive, ill-thought out disgrace. You can get a good idea of his worldview by looking at his oh so amusing ‘vocabulary’ section. It’s the British ‘junk science.’

## coracle said,

April 3, 2006 at 12:22 pm

On a vaguely related subject there was a rather interesting discussion on the statistics of the news report on teenage drivers on radio 4 at lunch today. They discussed the increase and how changes in the teenage drivers demographics may be erroneously affecting the stats. Anyone else catch it/know what I’m talking about?

## Stephen said,

April 3, 2006 at 12:57 pm

“Of course kids exaggerate in the survey. They exaggerated in the previous survey too. So all the measurements are biased in the same direction.”

In fact most surveys of voting intentions are like this. They are weighted by comparing previous election results to prior samples of voting intentions. People consistently claim they will vote Labour and don’t show, whilst many Conservatives claim they are don’t knows. So the sampling is taken as a trend indicator not an actual mean predictor.

## Fyse said,

April 3, 2006 at 12:58 pm

coracle – Yeah, I heard that too. It was on ‘You and Yours’, right? Stats is suddenly everywhere!

## Chris L said,

April 3, 2006 at 1:24 pm

AndrewT (#71): I’ve seen similar attacks on NumberWatch; clearly, he has strong opinions on the state of the nation, but I’ve never seen anyone actually refute any of his mathematical / statistical arguments. Someone on another forum dismissed him as “clearly knowing nothing about epidemiology”. I would genuinely be interested to know what fault you find with his arguments.

## Briantist said,

April 3, 2006 at 1:35 pm

Did you see this bad maths cocaine story?

news.bbc.co.uk/1/hi/uk/4394392.stm

“HMS Cumberland seized two tonnes of cocaine worth £200m after intercepting a speedboat during a routine patrol.”

1 tonne=1,000 kg = 1,000,000g

That makes the price of a gram £100…
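The arithmetic behind that figure, written out as a small sketch using only the numbers in the BBC quote (variable names are illustrative):

```python
seizure_tonnes = 2
claimed_value_gbp = 200_000_000       # the "£200m" in the news report

grams = seizure_tonnes * 1_000_000    # 1 tonne = 1,000 kg = 1,000,000 g
price_per_gram = claimed_value_gbp / grams

print(f"£{price_per_gram:.0f} per gram")
```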

## Jon said,

April 3, 2006 at 1:38 pm

Robert Carnegie:

“An interesting control would be to include an invented drug in the survey”

that’s often done – certainly in the “British Crime Survey” and the “Offending, Crime and Justice Survey”* the drug ‘semeron’ is used as a fake drug.

Jon

*both Home Office, the first a general household population survey on victimisation with a drugs self-completion, the second a survey about offending behaviour with a youth boost sample (2003) and is now a longitudinal panel survey of ‘young people’

## Briantist said,

April 3, 2006 at 1:54 pm

Like the “drug” cake in Chris Morris’ Brass Eye?

## stever said,

April 3, 2006 at 1:58 pm

Briantist – despite the police and customs routinely exaggerating the value of drug busts there is some tenuous logic to that stat. they always go for street value which is already a bit dishonest because that will be several multiples of what the consignment would sell for to the wholesalers/mid level. Also if the haul, as is likely, was near 100% pure then it is also inevitable that it would be cut to approx 20-40% purity. This would then take the street value – at £30-50 of what the punters bought to around 2million.

they should definitely explain it better. ie ‘would be sold on the streets for 2 million’

## RS said,

April 3, 2006 at 2:01 pm

I remember reading somewhere that they were worried children were just saying ‘oh yes, I have coke all the time, we’ve got a machine dispensing it in cans at school’, don’t know whether they still have ‘coke’ as a synonym for cocaine.

## AndrewT in a cafe said,

April 3, 2006 at 3:06 pm

chris l. primarily my main problem is that he doesn’t actually understand (or at least chooses to ignore) the difference between risk ratio and risk difference. (which is what half of this thread has been about)

that is, a risk ratio of 1.5 is irrelevant if only 0.001% of the population get a disease, but can actually make a huge difference if the prevalence of a disease is higher, say 10%. In fact the risk difference is the ‘public health’ measure of choice, as it estimates how many people are affected. (cf NNT for the doctors out there)
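A small worked sketch of the distinction, using the two baseline prevalences mentioned above and the same risk ratio of 1.5. The function name and the "per 100,000" scale are choices made for this illustration.

```python
def excess_cases_per_100k(baseline_risk: float, risk_ratio: float) -> float:
    """Extra cases per 100,000 exposed people implied by a given risk ratio."""
    risk_difference = baseline_risk * risk_ratio - baseline_risk
    return risk_difference * 100_000

rare = excess_cases_per_100k(0.00001, 1.5)   # baseline prevalence 0.001%
common = excess_cases_per_100k(0.10, 1.5)    # baseline prevalence 10%

print(f"rare disease: {rare:.1f} extra cases per 100,000")
print(f"common disease: {common:.0f} extra cases per 100,000")
```

The risk ratio is identical in both cases; only the risk difference shows that one scenario affects a handful of people and the other affects thousands.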

From an environmental risk perspective, it also makes a huge difference how many people are ‘exposed’. For instance very few of us live near power stations, but 80% of the uk live within 2km of a landfill site. The more people who are exposed, the more important smaller risk ratios are.

He assumes about epidemiologists that they know nothing of this. I taught it in a Basic Epidemiology module last year. lecture 5. its simple stuff.

For instance here:

www.numberwatch.co.uk/big_liars.htm

he bolds up certain key phrases as being ‘weasel words’ when in fact they are standard scientific terminology. What he refers to as ‘normal scientific standards’ for relative risks isn’t true.

www.numberwatch.co.uk/RR.htm

These are anecdotes masquerading as fact. Smaller risks do get published because of their immense public health impact. Insecticide-impregnated bednets reducing childhood mortality being a prime example.

## Briantist said,

April 3, 2006 at 4:20 pm

stever,

But also, I have been told, if you buy it “on the street” in any quantity, you get a discount. I understand that an ounce is priced at around £900, giving a discount of the “street price” of 35% – which is around the “standard retail markup” – 40%.

## Briantist said,

April 3, 2006 at 4:28 pm

And, of course, there is a fantastically simple way to test the purity of cocaine, but it involves turning it into crack, and that might be bad science.

## Jason Ditton said,

April 3, 2006 at 4:54 pm

Yes, but can anybody tell me how to post-hoc control for clustering?

Incidentally, the main design flaw of the BCS is that is a repeat cross-sectional rather than longitudinal panel design. If panel, none of the weighting and controlling procedures would be needed.

The size is so big (circa 40,000 R) that they cannot find any organisation to do it in a reasonable time frame, so they are at it all the time. They call this “continuous sampling”, which is an invented technique (or, rather, a name invented for an untested procedure). The previous thing they made up was “focussed enumeration” for the ethnic booster. This meant that if the intended ethnic R was not at home, go next door, as somebody there will probably be ethnic too.

## stever said,

April 3, 2006 at 5:51 pm

*officially the nerdiest thread EVER*

## sam said,

April 3, 2006 at 8:56 pm

I bet Ben is already dreading the next time he uses the word “significant” in an article.

## Jon said,

April 4, 2006 at 8:38 am

Jason:

> If panel, none of the weighting and controlling procedures would be needed.

if it were a panel (like the Offending, Crime & Justice Survey) it would need *more* weighting procedures because you would have to weight for differential refusal at each successive wave – i.e., the weight for Wave 3 would have to also incorporate weights for the probability of responding at Wave 1, then the probability of having responded to Wave 1 they also respond to Wave 2, and then the probability of responding to Wave 3 given that they’ve responded twice already.
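The chained weighting described here can be sketched as an inverse-probability weight. The response probabilities below are entirely hypothetical, chosen only to show the mechanics, and the function name is invented; a real survey would estimate these probabilities per respondent from a response model.

```python
import math

def wave_weight(response_probs):
    """Inverse of the chance of still being in the panel after every listed wave."""
    return 1.0 / math.prod(response_probs)

# Hypothetical respondent: 0.8 chance of responding at Wave 1,
# 0.9 at Wave 2 given Wave 1, 0.95 at Wave 3 given Waves 1 and 2.
w1 = wave_weight([0.8])
w2 = wave_weight([0.8, 0.9])
w3 = wave_weight([0.8, 0.9, 0.95])

print(round(w1, 3), round(w2, 3), round(w3, 3))
```

Each extra wave multiplies in another conditional probability, so the weights can only grow, which is why a panel needs more weighting work rather than less.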

You’d also have to deal with panel conditioning – to deal with the fact that these young people have completed the same questionnaire every year.

> They call this “continuous sampling”

actually, it is “continuous fieldwork” but not continuous sampling. The sampling is still – as far as I know – carried out at one point of time using the Postcode Address File (stratified and clustered) and then the fieldwork is continuous.

Jon

## BorisTheChemist said,

April 4, 2006 at 9:19 am

I feel like a complete idiot now.

## RW said,

April 4, 2006 at 9:32 am

Nerdiest perhaps but probably the deepest statistical rabbit hole ever. Debatable conclusions from a limited set of data and information? That’s statistics. Making up scare stories from a limited set of data and information? That’s journalism.

So is this bad statistics or bad journalism? Whilst there is debate about the conclusions that could be drawn from the data presented, there is no evidence that this has been dressed up by the report’s authors to give specious findings. The same cannot be said for the journalist’s presentation of that data which builds mountains out of molehills. On this basis one might suggest that the agenda for the headline was set before the report was read but that would be another example of making outrageous claims without all the necessary data to support it.

## AndrewT said,

April 4, 2006 at 10:35 am

“Yes, but can anybody tell me how to post-hoc control for clustering”

[nerd mode]

impossible, without the full data set. The point estimates ie 1.4% and 1.9% are valid, the construction of a confidence interval is not; they are too narrow and p-values are too small. You would need the response rates per school sampled to estimate the between cluster variability, and without that, you’re stuffed
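Although the clustering cannot be corrected without the school-level data, it is possible to sketch how sensitive the headline p-value is to it. The example below is a what-if exercise, not a correction: it uses the standard design-effect approximation 1 + (m − 1) × ICC for equal-sized clusters, takes the 305 schools from the article and assumes the same design in both years, and plugs in intraclass correlation (ICC) values that are purely hypothetical. The function names are invented.

```python
import math

def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc

def clustered_p_value(p1, p2, n, clusters, icc):
    """Two-sided z-test p-value after shrinking n to an effective sample size."""
    n_eff = n / design_effect(n / clusters, icc)
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_eff)
    return math.erfc(abs(p1 - p2) / se / math.sqrt(2))

# ICC values here are guesses for illustration, not estimates from the survey.
for icc in (0.0, 0.01, 0.02, 0.05):
    print(icc, round(clustered_p_value(0.019, 0.014, 9000, 305, icc), 3))
```

With about 30 children per school, even a modest within-school correlation inflates the variance enough to move the p-value noticeably, which is the point being made about intervals that are too narrow.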

[/nerd mode]

## stever said,

April 4, 2006 at 12:22 pm

RW – quite. that was my point really. regardless of the ins and outs of sampling and stats the story really goes back to the bad journalism in the first instance.

## pv said,

April 4, 2006 at 12:47 pm

“On this basis one might suggest that the agenda for the headline was set before the report was read but that would be another example of making outrageous claims without all the necessary data to support it”

Perhaps not so outrageous for anyone who has attended an editorial meeting. In any case it’s not so difficult to spot a pattern of claims unsupported by the actual reports.

## Robert Carnegie said,

April 4, 2006 at 1:00 pm

I don’t think that manufacturing crack cocaine is Bad Science. It’s Naughty Science.

## censored said,

April 4, 2006 at 1:37 pm

In a vaguely related story today, there is this quote:

“A study today in a special MDMA issue of the British Journal of Psychopharmacology, suggests long-term side-effects may be temporary”

How can effects be both long term and temporary?

## RS said,

April 4, 2006 at 1:40 pm

Not read it, but at a guess, the long term effects are from continuous usage, if you stop it, they go away.

## stever said,

April 4, 2006 at 2:01 pm

The geezer who took 25 pills a day for 4 years appears to be in a fair bit of trouble, unsurprisingly.

www.guardian.co.uk/drugs/Story/0,,1746333,00.html

I imagine he could single handedly trash any drug use survey data.

## coracle said,

April 4, 2006 at 2:10 pm

err, that would be The Journal of Psychopharmacology, rather than the British Journal of…

That rather confused me.

## RS said,

April 4, 2006 at 2:14 pm

25 pills a day! Someone was ripping him off.

## RW said,

April 4, 2006 at 4:07 pm

Apparently this was in addition to a rather large selection of other non-prescription drugs including cannabis, solvents, benzodiazepines, amphetamines, LSD, cocaine, and heroin. May I suggest he was perhaps exaggerating his prodigious appetite for those love heart sweets.

I’m also intrigued to understand how he would fund this pharmaceutical extravaganza.

## MattLB said,

April 4, 2006 at 5:12 pm

>“A study today in a special MDMA issue of the British Journal of Psychopharmacology, suggests long-term side-effects may be temporary”

>How can effects be both long term and temporary?

It does look odd, but could be meant to mean that the side-effects only manifest after long-term use, yet will diminish if the user subsequently stops taking the drug.

## Alan Harrison said,

April 4, 2006 at 8:27 pm

Well I’m late again, but I seem to have missed a stats practical (again) so no loss 😉

profnick said some years ago: “I would have replaced your instances of “0.5%” with “from 1.4% to 1.9%” and omitted all the stuff about correcting for clustering”

Ben said, rightly, that would leave a weak premise for a column.

In general most commenters forget that Ben is a journalist and this is his weekly column in the Guardian. Yes he could write tons of clever stuff that would keep all you pedants happy but he’d lose his job. Ben’s challenge to you (i.e. what would you write?) is a good one. If you can’t take it then shut up.

I’m left with memories of 3 hour stats practicals in Sheffield Uni in 1986 which, after the first, I spent feeding pizza crusts to the ducks in the park behind the Arts tower. Happy days.

## Why Dont You…Blog? » Bad Science - Bad Statistics… said,

April 4, 2006 at 8:39 pm

[…] This weeks article (at www.badscience.net/?p=230) is about the way news papers fight for headlines by really overdoing the actual data. The headline claims of the number of children using cocaine has “doubled” is based on an increase from 1.4% to 1.9%. Even my basic understanding of maths doesnt see that as a “doubling.” […]

## Jason Ditton said,

April 5, 2006 at 6:24 pm

Sorry. But how do you control for cluster effects?

## oharar said,

April 5, 2006 at 7:28 pm

“Sorry. But how do you control for cluster effects?”

As we’re already in the nerdiest thread, I don’t feel so bad about answering this. And the answer is…

It depends. In general, the effect of clusters is to increase variation in the data. The solution is to include this variation in the analysis, but how exactly that is done depends on the structure of the data and where the clustering is. You need some sort of replication (e.g. between schools, between years etc.), and then you model this variation. There’s a whole class of models called hierarchical models that deal with this, by estimating the variance at the different levels in the data.

Bob
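The first claim above, that clustering increases the variation in the data, can be checked with a small simulation. This is only a sketch under stated assumptions: each school gets its own underlying prevalence drawn from a Beta distribution (which gives an intra-class correlation of 1 / (a + b + 1)), and the school counts, prevalence and ICC are illustrative numbers, not the survey's. It does not fit a hierarchical model; it only shows why one is needed.

```python
import random
import statistics


def simulate_survey(rng, n_schools, pupils_per_school, prevalence, icc):
    """Return the proportion of 'yes' answers in one simulated survey."""
    if icc > 0:
        # Beta(a, b) school-level prevalences with mean `prevalence`
        # and intra-class correlation 1 / (a + b + 1) = icc.
        total = 1 / icc - 1
        a, b = prevalence * total, (1 - prevalence) * total
    yes = 0
    for _ in range(n_schools):
        p_school = rng.betavariate(a, b) if icc > 0 else prevalence
        yes += sum(rng.random() < p_school for _ in range(pupils_per_school))
    return yes / (n_schools * pupils_per_school)


def variance_ratio(icc, n_surveys=200, n_schools=100, pupils_per_school=30,
                   prevalence=0.0165, seed=1):
    """Observed variance of the survey estimate divided by the naive binomial variance."""
    rng = random.Random(seed)
    estimates = [
        simulate_survey(rng, n_schools, pupils_per_school, prevalence, icc)
        for _ in range(n_surveys)
    ]
    n = n_schools * pupils_per_school
    naive_variance = prevalence * (1 - prevalence) / n
    return statistics.variance(estimates) / naive_variance


print("independent pupils:", round(variance_ratio(0.0), 2))
print("clustered pupils:  ", round(variance_ratio(0.05), 2))
```

With independent pupils the ratio should sit near 1. With clustering it should be noticeably larger, which is the sense in which a test that ignores the schools is overconfident.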

## Steve said,

April 5, 2006 at 8:28 pm

Yes. What hierarchical models do is allow you to understand how much of any variation is attributable to the properties of ‘clusters’ rather than the properties of individuals. In the example talked about in the article there could be several types of clustering, e.g. schools, and classes within schools. It then becomes complicated, as there is the possibility of cross-classification, i.e. clustering of classes within a year group across all schools (a cohort effect). Some researchers like to control for clustering, but I (as a geographer) am interested in the properties of the clusters themselves, i.e. a collective school or classroom ‘effect’ on drug use.

Anyone interested in looking at this should check the pioneering work of Harvey Goldstein the education researcher who developed and popularised multi-level models.

Introductions to multi-level models have been written by Kelvyn Jones and others (within geography)

## oharar said,

April 7, 2006 at 6:01 am

Yes! We’ve finally found the level of this community! Consecutive posts about hierarchical models and everyone shuts up.

Of course this also means that Steve and I are über-nerds (unter-nerds?), but I guess we both already knew that anyway.

Bob

## Sockatume said,

April 7, 2006 at 8:17 am

I can wholeheartedly recommend introducing statistics (particularly the Monty Hall Problem) into dinner conversation.

## Aspiring Pedant said,

April 7, 2006 at 12:18 pm

Why has this message – “You must be logged in to post a comment” – appeared? I really needed another password to forget.

As a humble engineer with a fairly basic understanding of statistics it seems to me that a common problem with statistics, as they appear in the media, is that journalists will often try to compare 2 data points and make out there is an underlying trend. The statistics Coracle referred to in comment 72 – news.bbc.co.uk/1/hi/uk/4871854.stm are another good example. The story says that teenage driver deaths are on the rise but quotes only figures from 2000 & 2004. Are there no figures available from other years? Or would those figures not create the right impression?

I suspect the same might be true for the drug use survey; were we to look at figures for the last 5 years we might find that the figures vary a little between 1 & 2% or maybe there is a genuine trend, but referring to only 2 sets of figures is never going to convince me that there is any kind of trend.

Alan Harrison – “In general most commenters forget that Ben is a journalist” – Ben is definitely not a journalist; he works full time for the NHS and writes a weekly column for the Guardian. The fact that Ben is not a journalist, and doesn’t write like one, is what makes his column so refreshing. However, I agree that he has to write a column that can be understood by most Guardian readers and so too much detail on statistics is unlikely to be welcome. I have to say that I found his latest column a bit dull anyway – and I find statistics quite interesting.

## stever said,

April 7, 2006 at 1:10 pm

AP – it was because the forum got hit with a wave of crappy gambling spam yesterday from automated SPAM BOTS! registering should stop it happening.

## Fyse said,

April 7, 2006 at 8:21 pm

Sockatume – I’ve tried to explain the Monty Hall problem a number of times now, particularly to my Dad, who just won’t buy it. While I was initially persuaded by working through the maths, if your victim doesn’t have the necessary background it’s really difficult to form a convincing explanation. Any tips?!

By the way Ben, I was wondering whether the passwords we choose are visible in plain text in your SQL database? Obviously I trust you not to abuse the privilege, but I was curious for future reference when registering on other blogs.

## briantist said,

April 10, 2006 at 9:49 pm

Ben, what Fyse means is that if you store the passwords as plain text in your database, it is easy for a hacker to view them. The best way to sort this is to use a function like this one in your PHP code to hash the password in a one-way process, so no one can just read the password.

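// Renamed from crypt(): PHP already has a built-in crypt(), and redeclaring it is a fatal error.

function hashPassword($strPassword)

{

// MD5 of the trimmed, lowercased password.

$strmd5 = md5(strtolower(trim($strPassword)));

// CRC32 of the password exactly as typed, as a hex string.

$strcrc = dechex(crc32($strPassword));

// Mix the two values, then hash once more.

$strV1 = md5($strmd5 . $strcrc) . md5(strlen($strcrc) . $strcrc);

return md5($strV1);

}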
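For comparison, here is a rough sketch of the same one-way idea written in Python rather than PHP, using a random per-user salt and a deliberately slow key-derivation function (PBKDF2 from the standard library) instead of chained MD5. This is a swapped-in technique, not what the PHP function above does. The function names and the iteration count are assumptions chosen for illustration, not a recommendation for this site.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (an assumption); tune for real hardware


def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Store the salt and digest in the database, never the password itself.
salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))
print(verify_password("wrong horse", salt, digest))
```

The salt means two users with the same password get different stored values, and the slow derivation makes guessing passwords in bulk more expensive if the database leaks.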

## Ben Goldacre said,

April 10, 2006 at 9:53 pm

you’re clearly mistaking me for someone who programs websites.

## briantist said,

April 10, 2006 at 9:58 pm

I should have guessed! A programmer wouldn’t leave “Fatal error: Call to undefined function: () in /home/users/web/b2624/pow.bengoldacre/htdocs/wp-content/plugins/stattraq.php on line 90” at the bottom of each page. 😀

## briantist said,

April 10, 2006 at 10:08 pm

OK, back to statistics again then. Have a look at Annex I of this document if you have a minute.

www.ofcom.org.uk/research/tv/reports/dtv/dtu_2005_q4/q4_2005.pdf

Is there any statistical basis to the A1.14 adjustment that removes 4.3 million Freeview boxes from the figures? It represents 44% of the boxes sold, and its only purpose seems to be to allow Sky to remain in front.

## RS said,

April 11, 2006 at 7:55 am

briantist, they give their reasoning:

“Second set duplication

A1.11 Latest available data (from Q3 2005) suggest that around 30.2% of Freeview boxes are being used on secondary sets by viewers who already have digital (either Freeview or Sky or cable) on their main set (source: GfK). Ofcom estimates that this equates to a total of 3.2 million DTT receivers on secondary sets.

Inactive boxes

A1.12 A number of DTT boxes are currently inactive, possibly because they were never installed by consumers, have been replaced, or because of reception issues. The latest estimate for this figure (from Q4 2005) is around 1,040,000 (source: GfK).

ITV Digital legacy boxes

A1.13 There are also around an estimated 250,000 ITV Digital legacy boxes remaining in the market. The number of homes where the ITV Digital box is the only digital platform is estimated at 130,000 homes (source: GfK).

Ofcom adjustment

A1.14 Ofcom has therefore deducted around 4.3 million from Freeview sales in order to account for these adjustments. This means the number of Freeview-only homes is therefore calculated as a little under 6.5 million.”

## briantist said,

April 11, 2006 at 12:24 pm

Yeah, I read that. I just don’t think it’s valid.

It seems that anyone who has, say, a Freeview box and NTL digital cable gets discounted as a Freeview customer.

Likewise if you have Sky and Freeview the Freeview number gets reduced, but not the Sky number. This hands a double advantage to the Sky count!

To say that people who bought a DVB-T box with an “ondigital” label or an “itv digital” sticker are not watching Freeview with it is disingenuous to say the least.

Also, people who buy a top-up TV badged box can watch all the Freeview services, but they don’t get counted either.

I just can’t see that for every twenty DVB-T boxes sold NINE have been binned – if this was even remotely true then there would be no need for the BBC licence fee to go up to deal with free boxes for the underprivileged; all that would be needed is a “put your unwanted Freeview box here” box in every high street (in the style of the cat-food collection boxes in my local Sainsbury’s).

## hatter said,

April 11, 2006 at 3:38 pm

Cocaine flooding? Anybody drown?

Pendantica: it is true that boys are generally likely to claim more sex than they have really had, and girls less.

With drug surveys things can go either way. Answering yes to appear cool is possible, but equally possible is the sneaking suspicion that some drug warrior nut is going to use your answers to entrap you.

It is interesting to know the trends, drug fads come and go, but this needs to be set against overall usage. Substitution is typical.

Did this report state levels of usage? I mean actual rates of use rather than the standard loaded terms like heavy, casual, etc. While we might prefer teenagers to not be using recreational drugs, including ones like alcohol and tobacco, both at least as harmful as anything illegal, if they’re just taking them relatively infrequently, no more than a few times a month, and in relatively small doses, it is not a major concern.

I doubt dealers are generally as forward thinking as to consider things like setting up a long-term relationship. Of course if someone can find a reliable dealer who consistently sells a quality product, then they should stick with them. Of course that’s rare, like reliable banks.

They do typically just exaggerate the value of intercepted drugs. It is particularly noticeable with drugs that are not or cannot be diluted such as cannabis and MDMA tablets. They also like to quote big numbers for the quantities involved. Of course if the authorities were to mention that what they catch is a small percentage of the total amount smuggled and that their efforts have almost no impact on street level supply they wouldn’t look nearly so good.

## briantist said,

April 20, 2006 at 1:10 pm

The Ofcom logic boggles me! It seems to be done like this:

How to count fruit, the Ofcom way.

– apples sold on Monday or Tuesday don’t count as apples

– any apple bought by an orange owner doesn’t count as an apple

– any apple bought by a banana owner doesn’t count as an apple

– if you buy a lemon, orange or a banana after you have bought an apple, the apple doesn’t count anymore

– lemons only count as lemons if they were originally oranges, not if they are bought as lemons

## mickjames said,

June 14, 2007 at 4:14 pm

Fyse: the best way to deal with Monty Hall refuseniks is to play the game for money.
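Failing a willing opponent with cash, playing the game many times in a simulation makes much the same point. A minimal sketch, assuming the standard rules (three doors, the host always opens a goat door that is not the contestant's pick, and always offers the switch); the function names are invented for illustration.

```python
import random


def play(rng, switch):
    """Play one round of Monty Hall; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, n_games=100_000, seed=0):
    """Fraction of games won over many rounds with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(rng, switch) for _ in range(n_games)) / n_games


print("stick: ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Sticking should win about a third of the time and switching about two thirds, which is the result the refuseniks would be betting against.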

## Observasjon er vitenskapens mor, men klokkertro er vitenskapens morder | Hjernedvask said,

May 25, 2011 at 8:50 pm

[…] Goldacre on journalists’ far-fetched interpretation of […]

## Are there benefits to gaining a strong statistical background? | humblecontrarian said,

June 19, 2013 at 1:17 pm

[…] www.badscience.net/2006/03/cocaine-floods-the-playground/ […]