A new and interesting form of wrong

November 27th, 2010 by Ben Goldacre in bad science, numerical context, statistics, survey data

Ben Goldacre. The Guardian, Saturday 27 November 2010

Wrong isn’t enough: we need interestingly wrong, and this week that came in some research from Stonewall, an organisation for whom I generally have great respect, which was reported in the Guardian. Stonewall have conducted a survey, and their press release says it shows “the average coming out age has fallen by over 20 years”.

People may well be coming out earlier than before – intuitively that seems plausible – but Stonewall’s survey is flawed by design, and contains some interesting statistical traps.

They gathered data through social networking sites from 1,536 people who were already out, asking them at what age they came out. Among the over-60s, the average age at which they had come out was 37; those in their 30s had come out at an average age of 21; in the group aged 18 to 24, the average age for coming out was 17.

Why is the age coming down? Here’s one reason. Obviously there are no out gay people in the 18 to 24 group who came out any time later than age 24. More generally, because everyone in the survey is already out, nobody can have come out later than their current age, so the average age at which people came out in the 18 to 24 group cannot possibly be greater than the average age of that group, and certainly it will be lower than, say, 37, the average age at which the over-60s came out.

For the same reason, it’s very likely indeed that the average age of coming out will increase as the average age of each age group rises. In fact, suppose we assumed (in formal terms we could call this a “model”) that everyone who is currently out came out at a uniform rate between the age of 10 and their current age: you would get almost exactly the same figures (15, 23, and 35, instead of 17, 21, and 37). This is almost certainly why “the average coming out age has fallen by over 20 years”: in fact you could say that Stonewall’s survey has found that on average, as people get older, they get older.
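
The arithmetic behind that toy model is short: if someone now aged A came out at a uniformly random age between 10 and A, their expected coming-out age is (10 + A) / 2. Here is a minimal Python sketch of it, with a simulation as a cross-check. The representative ages used for each band (21, 35 and 60) are illustrative assumptions, not figures from the survey, so the formula only lands near the 15, 23 and 35 quoted above (it gives 15.5, 22.5 and 35.0).

```python
import random

START_AGE = 10  # earliest coming-out age in the toy model (taken from the text)


def expected_coming_out_age(current_age, start_age=START_AGE):
    """Mean of a uniform distribution on [start_age, current_age]."""
    return (start_age + current_age) / 2


def simulated_coming_out_age(current_age, n=100_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    draws = [rng.uniform(START_AGE, current_age) for _ in range(n)]
    return sum(draws) / n


# Representative ages per band are illustrative assumptions, not survey figures.
ASSUMED_AGES = {"18 to 24": 21, "30s": 35, "over 60": 60}

for band, age in ASSUMED_AGES.items():
    exact = expected_coming_out_age(age)
    approx = simulated_coming_out_age(age)
    print(f"{band}: formula {exact:.1f}, simulation {approx:.1f}")
```

Nothing in this model involves people coming out earlier than they used to: the gap between bands comes entirely from how old the respondents are now.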

But there is also an interesting problem around whether, with the data they collected, Stonewall could ever have created a meaningful answer to the question “have people started coming out earlier?” It’s a difficult analysis to design, because in each age band, there is no information on gay people who are not yet out, but may come out later, and also it’s hard to compare each age band with the others.

You could try to fix this by restricting the data to people who came out under 24, and then measuring the mean age of coming out for each age group (18-24, 30s, 60+) in this subgroup alone. That would give you some kind of answer for this very narrow age band, but it rests on some very dubious statistical assumptions. And if we allowed ourselves that move, we’d then be working with an extremely small set of data: there were only 33 respondents aged over 60 in total.
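
As a rough sketch of what that restriction looks like in practice, the Python below keeps only respondents who came out before 24 and averages within each age band. The records are invented for illustration (the survey’s raw data is not reproduced here), and the function name and `cutoff` parameter are hypothetical.

```python
from statistics import mean

# Hypothetical (age band, age at coming out) records, invented for illustration.
RESPONDENTS = [
    ("18-24", 15), ("18-24", 17), ("18-24", 19),
    ("30s", 16), ("30s", 21), ("30s", 29),
    ("60+", 18), ("60+", 22), ("60+", 45), ("60+", 58),
]


def truncated_means(records, cutoff=24):
    """Per band: (mean coming-out age, sample size) among those who came out before `cutoff`."""
    by_band = {}
    for band, age_out in records:
        if age_out < cutoff:
            by_band.setdefault(band, []).append(age_out)
    return {band: (mean(ages), len(ages)) for band, ages in by_band.items()}


for band, (avg, n) in truncated_means(RESPONDENTS).items():
    print(f"{band}: mean {avg} from {n} respondents")
```

The filter discards every late coming-out in the older bands, so the groups being compared shrink as well as change, which is the small-sample problem in miniature.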

Even then, the discussion of this poll also assumes that the age at which people know their sexuality has remained unchanged. Some believe that everyone’s sexuality is fixed and known from birth – I may be walking into a minefield here – but if the age at which people recognise their own sexuality is changing, then a more relevant figure by which to measure discomfort at coming out might be the delay, rather than the absolute age.

I thought I’d already covered all the ways that a survey could get things wrong, but this one brought something new. Maybe we should accept that all research of this kind is only produced as a hook for a news story about a political issue, and isn’t ever supposed to be taken seriously. And in any case, my hunch is that a well-constructed study would probably confirm Stonewall’s original hypothesis. But it’s still fun to dig.


32 Responses

  1. Mark Wainwright said,

    November 27, 2010 at 3:02 am

    “my hunch is that a well-constructed study would probably confirm Stonewall’s original hypothesis”

    Mine too, but prima facie, the fact that a uniform model fits the data so well suggests we (and Stonewall) are wrong, doesn’t it?

  2. Andrej Bauer said,

    November 27, 2010 at 8:01 am

    Ordinary people and even students of mathematics have real trouble understanding that a faulty argument leading to a true result is faulty. So good luck.

  3. Mark Frank said,

    November 27, 2010 at 9:17 am

    Surely they were answering the wrong question? The real question is what age did people come out in earlier decades. The current age of people who are out is only loosely related to this. If you were investigating e.g. average age of getting married you would look at marriage records for earlier times and find out the age of the bride and groom – not ask people who are currently married what age they first got married.

    It should be possible to do something similar with the data they have gathered – assuming they have the date people came out and their age at that time. There might be very little data for the earlier decades, and all sorts of scope for biases of various kinds, but you can’t get round that.

  4. susu.exp said,

    November 27, 2010 at 1:10 pm

    I’m reminded of a panel I was a part of some time ago. At the end, an audience member complained that no panelist represented people in the closet. We did note that there’s no way for somebody to be on a public panel as a closeted person and actually be a closeted person (i.e. they’d be out as soon as they were part of the panel). The main issue here is that you’d really need to know something about the people who are not out (yet).

  5. iliff said,

    November 27, 2010 at 1:55 pm

    “Stonewall’s survey has found that … as people get older, they get older.”

    Beautifully put.


  6. cwhitrow said,

    November 27, 2010 at 3:12 pm

    It’s an interesting problem, as you say, but I think there is a way into it. As you pointed out, the difficulty is that for any age cohort, much of the distribution of coming-out ages is ‘censored’ (i.e. we have no idea how many people would be coming out later, at older ages). So, for a given age cohort, we only see the distribution up to the highest age in that cohort.

    However, we can get some idea of the rate at which people are coming out within each cohort. The assumption is that if people are coming out at a ‘faster’ rate in one cohort, then the mean (or median) coming out age must be lower in that cohort than in another where the coming-out rate is lower. For example, let’s say that among today’s 30-year-olds, 5% came out at 16, 7% at 17, 10% at 18, etc. We’re looking at the tail of some distribution, which we might assume to be more-or-less uni-modal (probably a reasonable assumption). We also have to assume that the standard deviation for each cohort is roughly the same. Then we could look at the ratios of the numbers coming out in successive years and take this to be the coming-out rate at that age (for that cohort). These could then be compared across cohorts to see if there is a consistent variation. There wouldn’t be an easy statistical test for significance, although one could possibly use a bootstrap method to get a handle on this.

    I might also try to fit a model of the form k.exp(a.x^3 + b.x^2 + c.x + d) to each of the distributions (for different age cohorts). One might have to impose further constraints on these models in order to get useful results.

    In short, there’s no way around the data censorship problem, other than to make a bunch of reasonable assumptions. But if you do make some assumptions, you can get answers which I think have a good chance of being correct.
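
One way to sketch the ratio idea from the comment above in Python: take the share of a cohort coming out at each age and divide each year’s share by the previous year’s. The cohort uses the 5%, 7%, 10% example from the comment; the function name is hypothetical, and this only illustrates the bookkeeping, not a significance test.

```python
def successive_ratios(share_by_age):
    """Ratio of each age's share to the previous age's share, for consecutive ages only."""
    ages = sorted(share_by_age)
    return {
        later: share_by_age[later] / share_by_age[earlier]
        for earlier, later in zip(ages, ages[1:])
        if later == earlier + 1 and share_by_age[earlier] > 0
    }


# Shares (in %) of today's 30-year-olds coming out at each age, from the example above.
cohort_30 = {16: 5, 17: 7, 18: 10}

for age, ratio in successive_ratios(cohort_30).items():
    print(f"age {age}: {ratio:.2f} times the previous year's share")
```

Running the same function on each cohort gives one curve per cohort, and those curves are what would then be compared.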

  7. cakes said,

    November 27, 2010 at 6:25 pm

    Taking a cue from cwhitrow above, could you, for each age-group, plot the incremental proportion of people coming out over time – so, in the 18-24 cohort, the number coming out in their 20th year is 3% of those already out, and in their 21st year, those coming out were 2% of those already out – (dx/dt)/x, and then see how many years of age you have to slide a given cohort’s curve to get it to fit a younger cohort. It is like cwhitrow’s *rate* of coming out (it may actually be the same, I am not thinking too well tonight) – but doesn’t cope with a stretched distribution like his does.
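
A minimal sketch of that curve-sliding idea, assuming we have counts of people coming out at each age within a cohort. Both cohorts below are invented, with the older one built as the younger one shifted by three years, and the function names and the squared-error matching rule are assumptions made for illustration.

```python
def relative_growth(counts_by_age):
    """New arrivals at each age as a fraction of those already out: (dx/dt)/x."""
    rates = {}
    already_out = 0
    for age in sorted(counts_by_age):
        if already_out > 0:
            rates[age] = counts_by_age[age] / already_out
        already_out += counts_by_age[age]
    return rates


def best_shift(older, younger, max_shift=10):
    """Whole-year shift of the older curve that best matches the younger curve."""
    def cost(shift):
        shared = [age for age in younger if age + shift in older]
        if not shared:
            return float("inf")  # no overlapping ages at this shift
        return sum((younger[a] - older[a + shift]) ** 2 for a in shared) / len(shared)

    return min(range(max_shift + 1), key=cost)


# Invented counts: the older cohort is the younger one moved three years later.
younger_counts = {14: 10, 15: 10, 16: 10, 17: 10}
older_counts = {17: 10, 18: 10, 19: 10, 20: 10}

shift = best_shift(relative_growth(older_counts), relative_growth(younger_counts))
print(f"best-fitting shift: {shift} years")
```

With real data the overlap between curves can be small, so a best-fitting shift found this way would need to be treated with caution.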

  8. Redclaire said,

    November 27, 2010 at 6:43 pm

    Just to add another thought to the discussion- what does “coming out” actually mean? Taking myself as an example- I am gay and aged 33. I first came out to my friends when I was 17, my siblings when I was 19, my work colleagues when I was in my late 20s and my mother when I was 32. I am still not out to my other relatives, and am still regularly “coming out” to new people I meet. So, what age did I come out at? I’m not entirely sure.

  9. cwhitrow said,

    November 27, 2010 at 8:27 pm

    Good point!

  10. heavens said,

    November 27, 2010 at 9:24 pm

    Redclaire’s comment explains the problem with reducing a long process to a single moment in time.

    What if, in addition to the process taking some time, the goal posts shift?

    What if, at age 18, I think that being “out” to a select circle of close friends is really being out, and thus answer that I came out at the age of 18?

    What if at age 25, I change my definition, and say that before, I wasn’t really out, because being “out” requires telling all of my co-workers and casual acquaintances? Now I say that I was “pretend out” at age 18, but now I’m really out, at age 25.

    What if at age 40, I say that it’s not enough to tell people who know me fairly well, because being out requires signaling my sexual orientation to complete strangers? I was kind of out before, but now that I have a bumper sticker, pictures of the pride parade on my Facebook page, and a rainbow flag, I’m really, really out at age 40.

    This all reminds me of a 16-year-old girl who said that teenagers should be permitted to do whatever they like, because all her friends were “very mature”. We may, as we age, develop a deeper understanding of what it means to be out, just like as we age, we develop a deeper understanding of what it means to be mature.

  11. finlay said,

    November 27, 2010 at 11:00 pm

    We could always come back to it in 20 years and conduct the survey again. They often have to do that in linguistics (my field) for similar reasons: you can’t measure if something has changed by looking at a static population; you can only make educated guesses from what the old people do and what the young people do. And if they both do the same, you basically have to leave it for a few years to see if it is in fact changing.

    I’d say if we come back in 20 years’ time and the 18-24 age group’s number has gone up to roughly what the current 38-44 age group’s number is now, then we’ll be able to conclude that it hasn’t changed…

  12. AndrewKoster said,

    November 28, 2010 at 8:33 pm

    @finlay: sure, but that’d be “giving up” for now. Obviously the easy solution is to say that in 20 or 40 years we can redo the same survey and then we will know for sure. However, the interesting problem Dr. Goldacre is referring to is how to use the current, known to be incomplete, data to still attempt to verify or reject the hypothesis. What Stonewall did was obviously wrong, but can their data be used? While these methods won’t be as good as if we didn’t have incomplete data in the first place, they may still be good enough to give us reasonable certainty.

  13. Paulski said,

    November 28, 2010 at 9:35 pm

    This reminds me of a segment in ‘Brass Eye.’

    ‘Crimes we know about are steadily increasing, and crimes we know nothing about are rising too.’

  14. Martin B said,

    November 29, 2010 at 7:23 am

    You could use a variation of Finlay’s argument, and say that for any age group, remove anyone who has come out in the previous 20 years and compare the average age of coming out to the average of those 20 years younger. However, I can still see several issues with this, not least heavens’ idea of what is out, or the sort of self-reporting of previous experiences which we so regularly bemoan when performed in woo satisfaction surveys!

  15. Guy said,

    November 29, 2010 at 8:56 am

    Before we all start redefining the data, just a little thought about how and from whom it was collected.
    “The poll was conducted through Stonewall’s social media pages” about says it all for me. This wasn’t a survey with a sample balanced for various factors leaving age as the only uncontrolled variable.
    This was self-responders, on-line users, currently interested in Gay rights etc etc.
    I often read surveys saying “70% of doctors don’t like xxx”. Whereas what they mean is that 70% of doctors who read Pulse magazine online, who read this edition and felt strongly enough to answer our highly biased survey question, felt that xxx.

    As they say, Garbage In Garbage Out.
    Apologies to Stonewall if I’ve been unfair as they give no details about the survey other than above.

  16. plevy said,

    November 29, 2010 at 9:17 am

    “… so the average age at which people came out in the 18 to 24 group cannot possibly be greater than the average age of that group…”

    Pedant’s (and mathematician’s) point – the average age at which people came out in the 18 to 24 group could easily be greater than the average age of that group, though it certainly couldn’t be greater than 24.

    Imagine that there is one person aged 18, one aged 19 etc. but that only the person aged 24 has come out. Then the average age of people in this group is 21, but the average age of people who are out is 24.

  17. plevy said,

    November 29, 2010 at 9:18 am

    Correction, my mistake – I missed the point that everyone in the survey is already out… doh!

  18. lindsay said,

    November 29, 2010 at 11:01 am

    Hi, I was just on the radio as a lesbian over 60 being interviewed about this research. I made exactly the same points you have made to the Woman’s Hour interviewer before they recorded us. So please don’t imagine we have all been impressed by the conclusion that people are coming out earlier. I had an argument with the interviewer before we did it about the premise of the Stonewall research. To the point I annoyed everyone. I am a researcher! If you take a group of queers of 18 and younger they are BOUND to have an earlier average coming out age than a group of over 60s. You could take a bunch of mothers of 18 and under and compare them with a bunch of mothers over 60 and find exactly the same difference of average age of first birth. Which would not mean that women are having babies younger.

  19. lindsay said,

    November 29, 2010 at 11:06 am

    However it would have been interesting, and I said that to Woman’s Hour, to compare the average age of first coming out of people over 60 who came out at 18 or under with those of contemporary out people of 18 or under. Even within the limitations of the survey I think that could have had some validity. If, for instance, it had found that the average coming out age of the older group who came out at or before 18 was 16 or 17 and the current average age is 15. Of course this would not have been much of a story.

  20. lindsay said,

    November 29, 2010 at 11:07 am

    Of course none of the things I said as a researcher were reported on Woman’s Hour, they were interested in my coming out experiences in 1965. A lot of things happen behind the scenes, always.

  21. Klaire said,

    November 29, 2010 at 5:02 pm

    I have observed some women switching to homosexuality after 50 yro, but it is not a real coming out, rather a chosen option, when they think they are not enough attractive anymore to the opposite sex: married, divorced, widowed; one of them gave me a book, of the sort I never buy, called “Empowering Women” by Louise L. Hay; it was underlined, so it was easier for me to pry into it. It says (I’m translating from Spanish): “Intimacy with another woman may reveal depths that women have never before experienced…another woman usually accepts and understands better physical changes related to age”, etc.
    So, motivations in this sense are a work in progress.

  22. johnnye87 said,

    November 29, 2010 at 5:49 pm

    Just to indicate how flawed this study is – the newsreader announcing it on Radio 4 this morning described it as “not particularly scientific”. Of course, they still featured it, but it’s come pretty far if journalists are mocking your research methods…

  23. Nuut said,

    November 30, 2010 at 7:23 am

    There is another problem with this research: in order to report a coming out age you need to still be alive. So only the over-60s who are still alive gave their coming out age. A seriously biased sample. If you came out aged 20 and died aged 30, 30 years ago, you would not be in this study.

  24. Guy said,

    November 30, 2010 at 9:15 am

    I’m really glad I didn’t say

    “I have observed some women switching to homosexuality after 50 yro, but it is not a real coming out, rather a chosen option, when they think they are not enough attractive anymore to the opposite sex”

    You don’t think that this stereotyping might be offensive to some over 50 lesbians???

  25. richardelguru said,

    November 30, 2010 at 2:41 pm

    @Nuut Yes, and when you consider the effects of AIDS…

  26. koalaesq said,

    November 30, 2010 at 7:37 pm

    @johnnye87 HA! That’s pretty funny, when even announcers call you out. How come they don’t do that more often on important issues rife with pseudo-science, like anti-vaccination or homeopathy frauds?

  27. PhilippBayer said,

    December 1, 2010 at 3:13 am

    Couldn’t you bin the results into 4 or more populations, e.g. 0-20, 21-40, 41-60, 61-80, and compare these populations using the Kruskal-Wallis test?
    So you compare the distribution of ages of coming out inside the sub-populations, not the ages themselves, but I might of course be wrong…

  28. IheartNaturalBeauty said,

    December 1, 2010 at 11:59 am

    Hi Ben, loving your work as they say! I’ve popped a photo of you on my blog today as a) you’re cute and b) as a holistic therapist I’ve enjoyed your confirmation that toxins don’t exist. Technically. I’m still going to talk about them though. iheartnaturalbeauty.blogspot.com – it would make my week if you’d check it out.

    Thanks, Iona

  29. Mark Frank said,

    December 1, 2010 at 3:08 pm

    Surely the best way to use the data is to forget organising it by how old the respondent is and organise it by when respondents came out and how old they were at the time.

    If you know how old they are and how old they were when they came out then you can easily work out what year they came out. All you have to do is organise the data into bins of decades or half-decades and work out average age of those who came out in those periods e.g.

    Decade   Average age

    70’s     whatever
    80’s     whatever
    90’s     whatever

    Then you can use one of a number of tests to see if there is a significant difference in the decades.

    There is a potential confounding factor because those who came out late in the earlier decades are more likely to have died than those who came out late in recent decades. You could try allowing for this using survival statistics or whatever. But anyway this bias should reduce the apparent average age in earlier decades. So if there is a trend for a lower average age in later decades despite this, then the figures provide quite strong evidence for lowering average age.
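
The regrouping described in the comment above can be sketched in a few lines of Python. The respondents are invented for illustration, the survey year is assumed to be 2010 (the year of the article), and the function name is hypothetical. No correction for the survival bias mentioned above is attempted.

```python
from collections import defaultdict
from statistics import mean

SURVEY_YEAR = 2010  # assumed: the article's year, not a stated survey date

# Hypothetical (current age, age at coming out) pairs, invented for illustration.
RESPONDENTS = [(65, 37), (62, 20), (35, 21), (38, 19), (21, 17), (23, 15)]


def mean_age_by_decade(respondents, survey_year=SURVEY_YEAR):
    """Average coming-out age, grouped by the decade in which people came out."""
    by_decade = defaultdict(list)
    for current_age, age_out in respondents:
        year_out = survey_year - current_age + age_out  # birth year + age at coming out
        decade = (year_out // 10) * 10
        by_decade[decade].append(age_out)
    return {decade: mean(ages) for decade, ages in sorted(by_decade.items())}


for decade, avg in mean_age_by_decade(RESPONDENTS).items():
    print(f"{decade}s: average age {avg}")
```

Grouping by decade of coming out rather than by current age is the whole change; the significance test would be a separate step on top of this table.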

  30. Guy said,

    December 2, 2010 at 8:57 am

    Mark Frank, no they don’t provide any evidence because they weren’t collected in a systematic way. Whatever you do to juggle the figures is meaningless if the data was invalid to start with. I’m as interested as you in how you can use stats to accurately answer questions. But whatever you do to them, these stats cannot answer that question. Garbage in, garbage out.

  31. Ron Todd said,

    December 12, 2010 at 9:58 am

    I have an ISA account; the bank tells me that it is managed by a team of experts.

    Even when the economy was doing well the best these experts could manage was to increase the value of my fund by an amount almost exactly equal to the management costs.

    How much control do they have?

  32. matkad123 said,

    January 6, 2011 at 1:59 am

    Mark Frank is of course quite correct. It is fairly simple to manipulate the data of this study to get meaningful results.

    As Mr Goldacre pointed out the researchers have asked the wrong question of the data: they have compared age of coming out against age. In order to substantiate their claim that the age of coming out has fallen over time they should compare age of coming out against date of coming out.

    Assuming we know the date of the study, it is quite straightforward to find the date of coming out for each participant: subtract the participant’s age at the time of the study from the date of the study (giving their DOB) and add their age at the time of coming out (giving the date of their coming out). Or in vector terms:
    Y2 = X – Y1 + C

    We could then plot age of coming out against date of coming out and gain results from there.

    Of course those results could then be debated in terms of the reliability of the sample (some commenters have raised some valid points) but at least your method would be correct and you would be asking the right questions of the data.
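
As a hedged illustration of plotting age of coming out against date of coming out, the Python below computes each respondent’s year of coming out and fits an ordinary least-squares slope. The respondents and the 2010 survey year are assumptions made for the example, the function names are hypothetical, and a slope on its own says nothing about the sampling and survival problems raised in the thread.

```python
SURVEY_YEAR = 2010  # assumed for illustration

# Hypothetical (current age, age at coming out) pairs, invented for illustration.
RESPONDENTS = [(60, 30), (45, 25), (30, 20)]


def year_came_out(current_age, age_out, survey_year=SURVEY_YEAR):
    """Year of coming out: survey year minus current age gives birth year, then add age_out."""
    return survey_year - current_age + age_out


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


points = [(year_came_out(age, age_out), age_out) for age, age_out in RESPONDENTS]
slope = least_squares_slope(points)
print(f"change in coming-out age per calendar year: {slope}")
```

A negative slope would point towards people coming out younger over time, subject to all the caveats about who ends up in the sample.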