Datamining for terrorists would be lovely if it worked

February 28th, 2009 by Ben Goldacre in bad science, evidence based policy, politics, statistics, surveillance | 79 Comments »

The Guardian
Saturday February 28 2009
Ben Goldacre

This week Sir David Omand, the former Whitehall security and intelligence co-ordinator, described how the state should analyse data about individuals in order to find terrorist suspects: travel information, tax, phone records, emails, and so on. “Finding out other people’s secrets is going to involve breaking everyday moral rules,” he said, because we’ll need to screen everyone to find the small number of suspects.

There is one very significant issue that will always make data mining unworkable when used to search for terrorist suspects in a general population, and that is what we might call the “baseline problem”: even with the most brilliantly accurate test imaginable, your risk of false positives increases to unworkably high levels as the outcome you are trying to predict becomes rarer in the population you are examining. This stuff is tricky but important. If you pay attention you will understand it.

Let’s imagine you have an amazingly accurate test, and each time you use it on a true suspect, it will correctly identify them as such 8 times out of 10 (but miss them 2 times out of 10); and each time you use it on an innocent person, it will correctly identify them as innocent 9 times out of 10, but incorrectly identify them as a suspect 1 time out of 10.

These numbers tell you about the chances of a test result being accurate, given the status of the individual, which you already know (and the numbers are a stable property of the test). But you stand at the other end of the telescope: you have the result of a test, and you want to use that to work out the status of the individual. That depends entirely on how many suspects there are in the population being tested.

If you have 10 people, and you know that 1 is a suspect, and you assess them all with this test, then you will correctly get your one true positive and – on average – 1 false positive. If you have 100 people, and you know that 1 is a suspect, you will get your one true positive and, on average, 10 false positives. If you’re looking for one suspect among 1000 people, you will get your suspect, and 100 false positives. Once your false positives begin to dwarf your true positives, a positive result from the test becomes pretty unhelpful.
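
The arithmetic above can be sketched in a few lines of Python (the figures are the article’s; the function name and structure are my own):

```python
def expected_results(population, true_suspects, sensitivity=0.8, fp_rate=0.1):
    """Expected true and false positives from one pass of the screening test.

    The article's imaginary test: it flags a true suspect 8 times out
    of 10 (sensitivity 0.8), and wrongly flags an innocent person
    1 time in 10 (false positive rate 0.1).
    """
    true_positives = true_suspects * sensitivity
    false_positives = (population - true_suspects) * fp_rate
    return true_positives, false_positives

for population in (10, 100, 1000):
    tp, fp = expected_results(population, true_suspects=1)
    print(f"{population:>4} people: {tp:.1f} true positives, {fp:.1f} false positives")
```

For 1000 people this gives 0.8 true positives and 99.9 false positives on average, which the article rounds to “your suspect, and 100 false positives”.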

Remember this is a screening tool, for assessing dodgy behaviour, spotting dodgy patterns, in a general population. We are invited to accept that everybody’s data will be surveyed and processed, because MI5 have clever algorithms to identify people who were never previously suspected. There are 60 million people in the UK, with, let’s say, 10,000 true suspects. Using your unrealistically accurate imaginary screening test, you get 6 million false positives. At the same time, of your 10,000 true suspects, you miss 2,000.

If you raise the bar on any test, to increase what statisticians call the “specificity”, and thus make it less prone to false positives, then you also make it much less sensitive, so you start missing even more of your true suspects (remember you’re already missing 2 in 10 of them).

Or do you just want an even more stupidly accurate imaginary test, without sacrificing true positives? It won’t get you far. Let’s say you incorrectly identify an innocent person as a suspect 1 time in 100: you get 600,000 false positives. 1 time in 1000? Come on. Even with these infeasibly accurate imaginary tests, when you screen a general population as proposed, it is hard to imagine a point where the false positives are usefully low, and the true positives are not missed. And our imaginary test really was ridiculously good: it’s a very difficult job to identify suspects, just from slightly abnormal patterns in the normal things that everybody does.
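
To see how little even an implausibly accurate test buys you at national scale, here is a quick sweep over the false positive rates discussed above (the population and suspect counts are the article’s assumptions):

```python
population = 60_000_000    # UK population, as in the article
true_suspects = 10_000     # the article's assumed number of true suspects
sensitivity = 0.8          # the test catches 8 in 10 true suspects

for fp_rate in (0.1, 0.01, 0.001):
    true_pos = true_suspects * sensitivity
    false_pos = (population - true_suspects) * fp_rate
    # Positive predictive value: the chance that a flagged person
    # really is a suspect.
    ppv = true_pos / (true_pos + false_pos)
    print(f"FP rate 1 in {round(1 / fp_rate):>4}: "
          f"{false_pos:>9,.0f} false positives, PPV {ppv:.2%}")
```

Even at 1 error in 1000 (a far better test than any discussed above), roughly 60,000 innocent people are flagged, and only about 12% of flagged people are true suspects.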

Things get worse. These suspects are undercover operatives: they’re trying to hide from you, they know you’re data-mining, and they will go out of their way to produce trails that confuse you.

And lastly, there is the problem of validating your algorithms, and calibrating your detection systems. To do that, you need training data: 10,000 people where you know for definite if they are suspects or not, to compare your test results against. It’s hard to picture how that can be done.

I’m not saying you shouldn’t spy on everyday people: obviously I have a view, but I’m happy to leave the morality and politics to those less nerdy than me. I’m just giving you the maths on specificity, sensitivity, and false positives.

Please send your bad science to


Other good links on this include Schneier:

and some eggheads:

If you like what I do, and you want me to do more, you can: buy my books Bad Science and Bad Pharma, give them to your friends, put them on your reading list, employ me to do a talk, or tweet this article to your friends. Thanks!

79 Responses

  1. 10channel said,

    February 28, 2009 at 1:40 am

    Have these people heard about the lie detector and its problems (which are the same)?

  2. Fish Custard said,

    February 28, 2009 at 2:54 am

    This is a welcome clarification of a bit of the book which made my non-maths head hurt a bit.

    [I do rather suspect that the relevant people, however, don’t give a flying poo for false positives, in a kind of West Midlands Police with Irish people way.]

  3. tamas said,

    February 28, 2009 at 4:30 am

    The problem with your argument is that it leaves out a very important element: how much cost you are willing to put up with to get to a suspect. Even with the numbers you proposed (although despite what you say, it would have to be a very stupid algorithm to leave 10% false positives in, from the ENTIRE population) the police would have to check on 750 people to get to one true suspect. That is not impossible, but very costly. And thus the question is how badly society would want to get to would-be terrorists. If, say, checking out each individual that came up as positive in person (both false and true) would cost 100 pounds, then at the cost of 0.04% of the GDP you would have got rid of 80% of the country’s terrorists. (I have taken all your numbers.) That’s not bad, is it?

  4. Uther said,

    February 28, 2009 at 5:44 am

    I also think the argument in this paper interesting:

    “Strong profiling is not mathematically optimal for discovering rare malfeasors” doi: 10.1073/pnas.0813202106

    Online here:

    They show that even if you have prior probabilities for each individual, giving the likelihood that they’re a terrorist, simply sampling according to those probabilities is inefficient. You end up repeatedly sampling those people with large priors who are not terrorists. They show that uniform sampling is just as efficient.

    They also show that *if you have a numerical prior* then there is a way to calculate the optimal sampling strategy. They sidestep the issue of where such a prior might come from.

    The best comment I’ve seen on profiling for terrorists based on formulae noted the conflict of interest that the hindu-arabic numerals have – you’re just not going to get the right answers :).

    P.S. I highly recommend that you read the actual paper above and not the press commentary – it is much more readable.

  5. phayes said,

    February 28, 2009 at 5:45 am

    I hope and expect MI5 employs a few smart people who already know all this. I doubt they’ll be using these techniques so naïvely as to try to ‘diagnose’ terrorists from the general population. More likely they’ll be using them to augment their other anti-terrorist activities – to avoid missing important connections among already suspected and unsuspected members of terrorist cells, for example.

  6. MPL said,

    February 28, 2009 at 5:57 am

    Even worse, it would only work if the subjects aren’t actively trying to defeat it, and can’t figure out if they’re being selected.

    For instance, datamining to select people for extra screening when boarding planes can be defeated in an obvious way: if you fly enough times (like a business traveler would), you quickly find out if you’re on the “special” list. Bad guys could just test all their candidates to see if they get the special treatment or not, and only make use of those who do not.

    The temptation to start hassling the people your almighty computer tells you are evildoers would be terribly strong, I imagine—pull them over for traffic stops, check them out at work, etc—but the more you do that, the more you give away your hand. Then you’re spending all your time chasing people who know they’re being watched (and so behave themselves), while you have fewer resources to spend watching everyone else.

    It’s the same reason Mafia Dons are so hard to put away—they know they’re always being watched, so they never do anything illegal themselves, instead leaving it to the rank and file.

  7. RogerWilco said,

    February 28, 2009 at 6:15 am

    So Ben points out, rather thoroughly, I thought, why datamining can’t work and tamas says if it did it would be a bargain! Brilliant.

    The whole point is not the fact that you miss 20% of the bad guys but that you can’t find them amongst the 6 Million false positives.

  8. David Mingay said,

    February 28, 2009 at 7:48 am

    Out there in the real world, the job of the public servant is not to catch all the terrorists; rather, it is twofold.

    Firstly, to catch enough terrorists to get a good annual appraisal, and secondly, to expand your empire: think how many extra staff you would need to be in charge of to interview the 6 million false positives.

  9. Allo V Psycho said,

    February 28, 2009 at 7:58 am

    Unfortunately, with medical diagnostic tests as with terrorism data mining, the cost of a false positive is never just money. A false positive diagnosis of cancer causes just as much dismay as a true positive, until it is tested further. A false positive diagnosis of ‘terrorist’ is likely to lead to closer observation, even arrest or harassment, and that in turn is likely to swell the ranks of the disaffected. My memory of the Irish troubles is that the security forces were so confident they could identify terrorists in the (much smaller) NI population that they supported arrest and imprisonment without trial: and, from memory, there were 10 times as many deaths in the 6 months after internment as there were in the 6 months before. The IRA also poisoned the data – feeding false info to the security forces so that obviously innocent people were arrested, and then the whole community was outraged (anecdote warning). I’m pretty sure that the current favourite terrorists would be Muslim. I can easily imagine the disaffection of this community if the 600,000 suspects (at the higher accuracy) all belonged to it.

  10. DTM said,

    February 28, 2009 at 8:08 am

    If you are interested in this sort of thing try reading up on the ‘receiver operating characteristic’ (ROC) curve:

    Basically you need to push the curve as far as possible into the top left hand corner, then set your alert threshold accordingly to get the best trade-off between false alerts and missed alerts. The problem that Ben illustrates is that if your positive events are very rare it is very hard to get close to the top left corner.

    My own suspicion, given some research I did on a much more simple system (with much less noisy data), is that doing this on the general population isn’t going to work.

  11. DevonDozer said,

    February 28, 2009 at 8:31 am

    Surely this is another example of “policy based evidence”. The IPPR seems to be a pet NuLab outfit used to publish arguments that are useful to them. The ‘security establishment’ love it because it is good for their empires. Gordon loves it because he gets more control – or thinks he does. It has nothing to do with the actual problem.

    Wat Tyler did an interesting piece about the IPPR a while back. Have a look at . Could they be a candidate for

    Meanwhile, has anybody tried opening a bank account recently? Rules and regulations brought in “for our security” to crack down on drug dealing, money laundering & terrorism seem to have had no effect on those targets. For example,

    Ben’s excellent piece says there will be more of this.

  12. PhiJ said,

    February 28, 2009 at 8:50 am

    @RogerWilco: No. Unless I’m much mistaken, Ben points out that datamining will give a huge false positives to actual positive ratio, whereas tamas said that maybe we can investigate all of the suspects which the datamining picks out and use that as a cheap way to find four fifths of the terrorists. Nice!

    I thought I’d check the calculation, and make it ~0.03% of the GDP (which according to google is 2.13 trillion). £100 for checking out each person seems like too little to me though, after all, the algorithm is probably going to throw out a lot of people who will look like they might be terrorists, so identifying the ones who are not may be costly, as well as identifying (with certainty, I hope the police have to achieve at least some level of certainty with these matters) the ones who are. But I really don’t know anything about these prices: this is all wild speculation on my part.

    After pointing out that the potential cost may be higher, I should say I agree that a one in ten false positives rate looks a bit steep.

    Oh, and of course the police are going to have some false positives in their investigations, no idea how many though. I hope that they’re pretty good, but I can be rather naive at times. No clue whether this is one.

  13. njdowrick said,

    February 28, 2009 at 9:42 am

    Ben’s analysis is fine if just one test is used. However, suppose that there are two tests we can apply – each with 80% success at identifying positives, and 99% successful at identifying negatives. Suppose further that these two tests are independent of each other, so that the chance of a true negative being identified as positive on one test is independent of the chance of this happening on the other test. Applying both tests to a population of 6×10^7 with 10000 true positives, we’ll get 6400 true positives identified, and 6000 false positives. This should be manageable.

    The question is whether two truly independent tests of this effectiveness exist. Is it obvious that they don’t? Life is multidimensional, after all!
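
    The two-test arithmetic above can be checked with a minimal sketch (variable names are mine; it assumes, as the comment does, that the two tests are fully independent):

```python
population = 60_000_000
true_suspects = 10_000
sensitivity = 0.8    # each test catches 8 in 10 true suspects
fp_rate = 0.01       # each test wrongly flags 1 innocent in 100

# A person counts as positive only if BOTH independent tests flag them.
true_positives = true_suspects * sensitivity ** 2
false_positives = (population - true_suspects) * fp_rate ** 2

print(f"true positives:  {true_positives:,.0f}")   # 6,400
print(f"false positives: {false_positives:,.0f}")  # 5,999 (rounded to 6,000 above)
```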

  14. Mark Wainwright said,

    February 28, 2009 at 10:05 am

    Ben, you have sullied this otherwise excellent article by using “suspect” to mean “person guilty of a crime”. A suspect is someone the police have arrested because they suspect him (or her). They are still presumed INNOCENT until proven otherwise.

    This is not just pedantry, it’s really important, because the tabloid media would love to make us think that everyone who’s ever suspected is actually guilty. Hence “shock” headlines along the lines of “Terrorism suspect wins tombola ahead of disabled child” or whatever.

    Could you edit out this glitch? Use “terrorists” or “wrongdoers” or “criminals” or “tossers” or whatever you prefer.

  15. Steve Senior said,

    February 28, 2009 at 10:05 am

    While I think I understand the analysis, I disagree with the conclusion that data mining is of no utility. Accepting that any system, no matter how good, will lead to a pool of suspects many of whom are innocent and which won’t contain some of the guilty doesn’t mean that the technique is useless. Please correct me if I’m wrong but the situation is exactly analogous to a diagnostic test for a rare disease, and we still use those right?

    All it really means is that any further action taken towards those individuals who are highlighted should take into account of the post-test likelihood of guilt, like not immediately placing them under surveillance or pulling them up at airports. It’s still a useful place to start, and vastly preferable to, say racial profiling.

  16. PhiJ said,

    February 28, 2009 at 10:06 am

    I doubt that would work though: is it really possible to get two independent tests which are that good? All tests will be taking a lot of variables (with varying dependencies on each other) and trying to get the best answer out of those. The more obvious variables are likely to be accounted for in practically all the tests. I would expect that the more obvious variables would be the more powerful ones as well. I suppose you’d need two very different ways of looking at someone, both having lots of variables and a powerful test which we can get a good result from. Sounds unlikely, but not impossible.

    More importantly, it’s an easier getout for terrorists. If you’re going to simply require two positives (instead of treating both of the tests in a complicated way), terrorists can try to get around both of the tests. They will as soon as they work out (or guess) what some of the things you’re testing them on are. If they manage to remain undetected on just one of the two tests, it’s a win (for them). So you’d end up getting a lower actual positive than you’re proposing. That doesn’t mean it can’t be helpful though. Depends on how easy they are to get around.

  17. PhiJ said,

    February 28, 2009 at 10:07 am

    Oh, and as loads of people replied as I was writing my post – it was directed to post 13

  18. Mark Frank said,

    February 28, 2009 at 10:09 am

    I don’t know how counter-terrorism works (and I don’t suppose anyone is about to tell me) but it seems implausible that intelligence analysts run data mining packages on the whole UK population at regular intervals in the hope of turning up likely suspects. If they did then the problem of false positives would indeed be immense. The paper stresses the importance of integrating different sources of information and surely this makes all the difference. In particular there is no reason to suppose that the data mining is the first step in the investigation or that the question to be answered is “is this individual a terrorist?”

    For example, you might learn from other sources that there is a possible threat emerging from the extreme right wing racist community in East London. This immediately dramatically limits the population under investigation and also raises the base rate. You might then go on to try and identify individuals or you might simply ask the question – given the characteristics of this population, how likely is it that someone will be planning a terrorist operation?

    A separate possible and fruitful activity is to identify trends e.g. a sudden increase in workers at a particular government department who recently arrived in the UK and attend fundamentalist mosques. Just as a supermarket might use data mining to notice a sudden jump in demand for nappies in South Yorkshire.

  19. profnick said,

    February 28, 2009 at 10:28 am

    Whilst the argument in Ben’s article is sound in context, the title is misleading because, as pointed out by some other posts, there are some situations where 10% false positives or negatives are tolerable, because further investigations will be conducted on the subset of the dataset. Thus it is not right to dismiss data mining “per se” but only in a given context where it may indeed be a poor choice of tool.
    Data mining is, for example, proving a useful tool in predicting environmental toxicity of industrial chemicals and thus obviating many animal tests.

  20. Ben Goldacre said,

    February 28, 2009 at 10:53 am

    datamining in many contexts obviously fine, title changed

  21. brachyury said,

    February 28, 2009 at 11:05 am

    Hmmm…. odd article: you are making a whole series of assumptions about what David Omand means and presumptions that datamining would be used in the crudest possible way. Obviously you are coming from a medical screening perspective, however

    1. What David Omand is describing is not necessarily datamining – in the sense that CCTV pictures of someone nicking sweets in Woolworth is not datamining, it is just observation of malfeasance. The ability to go back and check someone’s email if you get a tip-off from the public is not ‘datamining’.

    2. An email from a known jihadist to another individual flags the second individual quite clearly; it doesn’t require sophisticated probabilistic models. Once flagged, further observation is relatively costless as it can be done automatically.

    3. Observation of the bank records, telephone calls, travel information of previously identified suspects may be more valuable than data mining of the entire domain of records.

    I wouldn’t actually call any of this datamining its just automated surveillance.

    4. There is, however, a place for datamining of large data series. Indeed banks and insurance companies have used this cost-effectively for years to detect potential fraud: outlier detection. Datamined records are only likely to be flagged up when intersected with individuals who are known to have extremist beliefs or who mix with others with extremist beliefs.

    Anyway … this doesn’t address whether you want this sort of intrusion– but I don’t buy your specific criticisms.

  22. howfar said,

    February 28, 2009 at 11:17 am

    It’s interesting how, as soon as Ben’s writing suggests something political, people (naming no names) start to oppose him with numbers that they seem to have found lodged somewhere in their fundamental cavity.

    I suppose strong ideological commitment to political views is still (thankfully?) much more widespread than similar attachment to crystal healing etc.

    On a different aspect of this:

    @15: The correct analogy would be continual universal screening for rare acquirable conditions. Being “a terrorist” is not genetic, so for the analogy to work we have to consider it as equivalent to communicable diseases. An example might be compulsory, regular, nationwide screening for HIV. Even this doesn’t really go far enough, once we consider the fact that terrorists are actively trying to hide from detection.

  23. Suw said,

    February 28, 2009 at 11:29 am

    @FishCustard – if you want a very good piece explaining the problem of false positives, the BBC did a nice bit about cancer screening that has useful diagrams and everything:

    @MarkWainwright – I think he’s using suspect correctly, to mean someone about whom there is a reasonable suspicion of wrongdoing, as opposed to someone flagged up by a test about whom there is no evidence of wrongdoing. He’s talking about a “pre-screening” process that happens before evidence gathering, and not assuming that the “true suspects” are actually criminals or terrorists. That’s how I read it, anyway.

    @all – Regarding costs, £100 per suspect is way low. It’s not just the cost of running the tests, it’s the cost of setting up all these databases in the first place, as well as the cost of interconnecting them (gov’t IT not being great at communication), and then once all that’s done, there’s the cost of doing the datamining. The National ID card scheme alone is predicted to cost £10 billion. I don’t know what the cost of the National Vehicle Tracking Database is, but I’m betting that it too is going to run into the billions. The proposed Intercept Modernisation Programme, which ORG says includes a “proposal to centralise the electronic communications traffic data of the entire UK population in a database managed by the Government” would cost tens of billions.

    So let’s say, pulling figures out of thinnish air, that in order to set up the databases required to collect the data to be mined, and getting the data mining algorithms in place and tested, the gov’t needs to spend £50 billion. Assuming American billions, that’s £830 per man, woman and child before we even get started, or to put it another way, £83k for each of the 600,000 false positives. (Add three 0s for British billions.)

    Then there’s the cost of doing the data mining, having a human examine the positives (false and true), deciding on which positives to follow up, following them up… That’s going to run into billions too, because that’s a lot of work.

    How much intelligence could be gathered through old-fashioned hard work and infiltration for that amount of money? How much work could be done, on the ground, to help disaffected and alienated youths currently at risk of becoming extremist? Answer: Lots.

    This whole idea of data mining is no more than a civil servant’s wet dream, as @DavidMingay pointed out. It won’t actually work, it will waste taxpayers’ money, and it won’t do a damn thing to address the root causes of native-born terrorists. And that’s not even beginning to address the problems – social and economic – caused to people falsely suspected and investigated, or examine the abuses to which such a system would be prone, both from within and from hackers. After all, the Gov’t hasn’t exactly shown itself to be capable of looking after the data it has. Do we really want them gathering more?

  24. brachyury said,

    February 28, 2009 at 11:33 am

    It’s interesting how, as soon as Ben’s writing suggests something political, people (naming no names) start to oppose him with numbers that they seem to have found lodged somewhere in their fundamental cavity.

    I think people are puzzled as the entire article is uncharacteristically for ‘bad science’ based on conjecture.

  25. njdowrick said,

    February 28, 2009 at 12:08 pm

    Replying to no.16:

    There is certainly more than one sort of evidence that can be used in trying to decide whether a person is up to something. Some of the data collected for each person will be correlated with other data, but I’m sure that the number of independent degrees of freedom in the relevant dataset is considerably greater than one. Of course, I don’t know what the false positive rate for each degree of freedom will be, so trying to estimate the effectiveness of a combined test would be speculation. I just don’t think that Ben’s discussion, which assumes only one degree of freedom, is particularly relevant to the present case.

  26. cybergibbons said,

    February 28, 2009 at 12:15 pm

    I think this is an overly naive analysis of datamining. It does highlight one of the fundamental issues involved, but there are so many more factors involved. There can be many tests involved, with varying degrees of independence. You have other sources of intelligence which are used in conjunction with datamining. Very importantly, you also use connections between people rather than just testing them as individuals.

  27. seanphelan said,

    February 28, 2009 at 12:30 pm

    I haven’t studied or qualified in medicine and have never made clinical diagnoses, but I have done quite a lot of data mining in my time (on large user surveys, and more recently on web server logs looking for data piracy).

    Anybody engaged in actually doing data mining will have strong “quant” skills – it’s a very nerdy, wonky activity – and will certainly understand false positives/false negatives. Practical investigation involves running lots of different tests – data mining, interviewing real people, studious observation – and a lot of debate with your team. Some tests are very helpful and increase your understanding of your data, others do not. Investigation of an occasional counterintuitive result may lead to a breakthrough, but more often just exposes bad data or flawed testing.

    Based on diligent observation of medical TV dramas like House and ER, I suspect that day-to-day medical diagnosis follows a similar path (those programs must be 100% accurate, after all :). False positives/negatives don’t make medical tests useless, but the probabilities have to be taken in to account in interpreting test results.

    False positives/false negatives are two of the many, many factors that competent and experienced data mining people take in to account every day, but they alone do not validate or invalidate the entire idea of data mining.

  28. RS said,

    February 28, 2009 at 12:42 pm

    I would point out that contrary to some of the comments above, Omand talks explicitly about datamining private data:

    “Such information may be held in national records, covered by Data Protection legislation, but it might also be held offshore by other nations or by global companies, and may or may not be subject to international agreements. Access to such information, and in some cases the ability to apply data mining and pattern recognition software to databases, might well be the key to effective pre-emption in future terrorist cases.”

    I think some above commenters have some rather overly optimistic views of the efficacy of datamining. In medicine the main diagnostic tests used have similar sensitivity and specificity rates to the ones Ben is using – because they are well defined and studied physical entities. Datamining is also used in medicine (e.g. for expression or DNA arrays) with rather poor results because the patterns have very mediocre specificity and sensitivity where they work at all.

    Datamining with private data will not be like combining hundreds of tests with the specificity of medical diagnostic tests – more likely it will produce, after combining all the available data and patterns, something that has a much lower specificity and sensitivity. If there were data characteristics of terrorists that were similar to the specificity of medical tests they’d have been found by now.

  29. Mark Frank said,

    February 28, 2009 at 1:31 pm

    Re #28 RS

    I don’t think anyone is denying that Omand is proposing data mining private data. The various comments are just pointing out that data mining does not necessarily mean something analogous to screening women for breast cancer. It is (hopefully) integrated into an overall intelligence strategy and can be used to suggest and answer (or partially answer) all sorts of questions. For example, the problem might not be to identify a terrorist but to decide whether to raise the security level locally or nationally because data suggest an increased risk.

    There is a danger in seeing everything in terms of medical statistics. For example, medical statistics assumes specificity and sensitivity are relatively stable properties of a test. In other fields they vary (sometimes they vary when the base rate changes which totally screws the standard calculations of PPV and NPV) and this makes the medical diagnosis model difficult to apply. I suspect this includes security. An incident in the Middle East or intelligence about a particular source of terrorism may suddenly change either or both figures.

  30. phayes said,

    February 28, 2009 at 2:18 pm

    Having googled around a bit to see what investigative data mining is really all about and how it is/might be used in counter-terrorism, I really don’t see the analogy between investigative data mining techniques and medical diagnostic tests as even remotely valid. AFAICT, they are completely different tools used in completely different ways for completely different purposes.

    Stethoscopes don’t make very good handcuffs. So what’s new?

  31. Andrew Clegg said,

    February 28, 2009 at 2:21 pm


    10% false positives isn’t a “very stupid algorithm”.

    10% FP = 90% precision. I’ve worked a bit in data mining (in a completely different field) and 90% precision is generally considered a very GOOD result, even when dealing with entities and feature sets a LOT less complex and variable than, err, entire humans and their behaviour!

    And in anything but the most trivial problems, halving the false positive rate from there — getting up to 95% precision — is a hard task.

    In some tasks this takes years of work with many different groups pitting their algorithms against each other on standardized training and tests sets, and even then this often leaves the problem of overtraining (i.e. failure to generalize from standard datasets to the real world).

    I leave it to the reader to speculate how well this would work behind the closed doors of the security services or their contractors.
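    The precision figures discussed above can be sketched with invented counts; precision is simply the fraction of flagged people who are genuine positives, so a fixed number of true hits can be swamped by scale:

```python
# A sketch, with invented counts, of the precision figures discussed
# above: precision is the fraction of flagged people who are genuine
# positives, so the same number of true hits looks very different
# depending on how many false alarms come with them.

def precision(true_pos: int, false_pos: int) -> float:
    return true_pos / (true_pos + false_pos)

# 2,000 genuine hits against increasingly many false alarms (hypothetical).
for false_pos in (200, 2_000, 20_000):
    print(f"{false_pos:>6} false positives -> "
          f"precision {precision(2_000, false_pos):.2f}")
```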

  32. Ben Goldacre said,

    February 28, 2009 at 3:00 pm

    there are some interestingly fallacious arguments being mounted here, i guess in the name of “we should at least do something”, which is a sense i can identify with. however.

    firstly, this is not “analogous” to screening, it is screening. it doesnt matter if youre using your algorithm or tool to screen for cancer or terrorism, it is screening, and the issues of the sensitivity, specificity, positive and negative predictive values for your test – and the way they vary with population prevalence of the sought property – are universal. and whether you like it or not, i gave very VERY generous figures for these imaginary tests.

    secondly, this IS about screening the whole population, i dont know how anyone can think it’s not, we are all being invited to accept sharing our data with the security services, as a population, so that we can all be screened with the datamining algorithms. it is not for screening people from one ethnic origin. if you happen to think that ethnic origin is a good starting point for your screen and you will exclude everyone who is “not non-white” then (a) i think you just lost a few true positives and (b) it’s still just the starting point for your screen, and part of the screening process.

    thirdly, data on peoples communications between each other is not an additional source of data to the screening i have discussed, it is precisely the kind of data you would use in this screening. i find it pretty amazing that anyone can say “getting an email from a jihadist” is a good easy simple predictor of being a terrorist suspect. the airport bombers were doctors. i’m sure they emailed a lot of people both at work and socially. for all i know they might have emailed me about a patient, and i got an email from a jihadist.

    lastly, the argument that you run lots of screens one after another and that solves the problem is particularly silly. just because you break your screening algorithm into lots of little bits doesnt make it any more specific or sensitive. you’ll get false positives and lose true positives along the way, in bits.

    oh, and i don’t understand why people think it’s been missed that datamining is only being used to produce leads for future investigation. that is exactly what it would be used for, but for the very simple reasons described above, there is every reason to believe that datamining a general population’s records to look for terrorists is likely to throw up so many false positives that it will be an unusably unwieldy source of leads for subsequent investigation.
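    Ben’s point scales up as follows. Taking the article’s imaginary test (sensitivity 0.8, specificity 0.9) and, purely as assumptions for this sketch, a population of 60 million containing 10,000 genuine suspects:

```python
# The article's imaginary test applied at national scale. Sensitivity 0.8
# and specificity 0.9 come from the column above; the population of 60
# million and the 10,000 genuine suspects are assumptions for this sketch.

population = 60_000_000
suspects = 10_000
sensitivity = 0.8
specificity = 0.9

true_positives = sensitivity * suspects
false_positives = (1 - specificity) * (population - suspects)

print(f"true positives:  {true_positives:,.0f}")
print(f"false positives: {false_positives:,.0f}")
print(f"chance a flagged person is a genuine suspect: "
      f"{true_positives / (true_positives + false_positives):.4f}")
```

    On these assumptions roughly six million innocent people are flagged for every eight thousand genuine suspects found.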

  33. njdowrick said,

    February 28, 2009 at 3:56 pm

    Re: second-last paragraph in Ben’s reply (no.32).

    Certainly “break[ing] your screening algorithm into lots of little bits” won’t improve matters, but this wasn’t suggested. The observation made was that combining several independent screening algorithms would make a huge difference, and I don’t see how this can be disputed. Without knowing how many screening algorithms there are, how independent they are, and how specific and sensitive each of them is I don’t see how one can be confident that the whole thing is nonsense. The possible existence of independent screening algorithms makes a potentially very big difference to Ben’s order-of-magnitude estimates, and should not be ignored.
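    The trade-off involved in chaining genuinely independent screens can be sketched numerically (all figures invented): each extra screen multiplies the sensitivities, shedding true positives, while multiplying the false-positive rates, gaining specificity.

```python
# Chaining screens whose errors are truly independent, with invented
# figures: sensitivities multiply (losing true positives) and
# false-positive rates multiply (gaining specificity), so the false
# positives shrink sharply but may still dominate.

def chain(screens, suspects, innocents):
    tp, fp = float(suspects), float(innocents)
    for sensitivity, specificity in screens:
        tp *= sensitivity          # true suspects surviving this screen
        fp *= (1 - specificity)    # innocents wrongly passed through
    return tp, fp

# Two generous, fully independent screens over a hypothetical population.
tp, fp = chain([(0.8, 0.9), (0.8, 0.9)], suspects=10_000, innocents=59_990_000)
print(f"true positives remaining:  {tp:,.0f}")
print(f"false positives remaining: {fp:,.0f}")
```

    A second screen helps a great deal on these numbers, yet false positives still outnumber true positives by nearly a hundred to one; the question is how many truly independent screens actually exist.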

  34. Jammydodger said,

    February 28, 2009 at 4:02 pm

    “I hope and expect MI5 employs a few smart people and already know all this. I doubt they’ll be using these techniques so naïvely as to try to ‘diagnose’ terrorists from the general population. More likely they’ll be using them to augment their other anti-terrorist activities – to avoid missing important connections among already suspected and unsuspected members of terrorist cells, for example. ”

    This may well be true… about MI5 employing smart people who know this…(but I wouldn’t assume that!)

    More importantly, unless the general population are smart and know this as well, they may not see this activity for what else it could be…

    … A spurious justification by the national security organisations to “encourage” the public to swallow the destruction of our civil liberties and seizing of our personal data for the whatever purpose government or security agencies choose to put it to… whether we see this as in our interests or not!

    Like homeopathic remedies or fish oil supplements, something doesn’t have to work as advertised, and can still enjoy enthusiastic popular support.

    Is this whole anti-terrorism argument, in fact, a patsy to try and garner public support for something which may not be in our interests, and is intended for entirely other purposes?

    I don’t know the answer, but the hypothesis certainly correlates with the available observations.

    And if it turns out to be so, should we really allow ourselves to be fooled in this way?

    Excellent piece Ben.

  35. Jammydodger said,

    February 28, 2009 at 4:07 pm

    I should add that I am not equating “correlation” with causation here! … That’s why ” I don’t know!”

    But I am vigilant and suspicious!

  36. Diversity said,

    February 28, 2009 at 4:27 pm

    Ben is, I guess, telling us how a great deal of money and effort at (UK) GCHQ and (US) NSA has already been wasted. Dumb data mining did not work in security as it did not work in a lot of other attempted applications, commercial and academic.

    What is in mind now (again a guess) is not only a number of independent screening algorithms working in parallel, but having a number of these algorithms capable of learning. And these algorithms will be continually exploring the data, like Google bots.

    What the first generation of these systems will throw up, when someone feeds in search parameters, is a list rather like a Google search result, with likelihoods of a positive match noted for each item in the result. The second generation will also ask “Did you mean to specify exactly this search? Or are you interested in X AND Y OR Z?” Third generations will go in for more sophisticated dialogue with the user, feeding back into changes in the learning programme.

    Sooner or later (I guess later), systems like these will throw up a high proportion of leads with high likelihoods of real matches. What is more, they will be developed outside government as well as inside. We will have to live with them.

    Sir David Omand’s report is not advocating these systems as new initiatives. He concentrates on the questions of how a free, democratic society can live with them as searchers of data that is in the public domain, as searchers of data the Government holds, and as searchers of data that we regard as private. He is asking the right questions, even if (in my obstreperous opinion) he has not found adequate answers.

  37. phayes said,

    February 28, 2009 at 4:41 pm

    There are a wide range of investigative intelligence data mining and analysis techniques being used for a wide variety of purposes on populations of widely varying size – many (most?) of which are not remotely analogous to diagnostic medical tests being used to screen individuals for specific diseases in large populations: identification of (suspected) terrorist group hierarchy/structure or tracking trends and anomalies in activity on extremist websites, for example.

    One of the application areas I came across, investigative search (IS), is a bit like screening but even that’s hardly likely to be used as inefficiently and naïvely as suggested. Why on earth would the intelligence services want to spend all their time and resources misusing their data mining and analysis tools and following up what they (should) already know to be likely ‘false positives’? They could be that stupid, but it’s hardly likely, is it?

  38. brachyury said,

    February 28, 2009 at 6:45 pm

    @Ben (34)

    1. Your analogy is unfair.
    It’s clear from your answer that you regard medical statistics as the highest form of mathematics. So in your own terms: your analogy for datamining is similar to the idea of screening everyone in the country for prostate cancer using PSA. That’s clearly a daft strategy, ergo PSA is a useless test. Except if you are over 55 and have trouble peeing, maybe it’s worthwhile (maybe you might argue PSA is still poor – but hopefully you take my point). We don’t use datamining to generate leads as you suggest; it is largely the other way round – we look carefully at the activity of suspects and their associates.

    2. Anyway… the reason for collecting data is in the main so that you can follow leads and contacts once a conspiracy is uncovered – not, as you claim, for a priori search. It is because we don’t know who the Jihadists are that we keep information for all – not because we want to mine the whole population record for leads.

    3. Regarding the idea that following up those who have contacts with Jihadists is a bad test for finding terrorists: what the hell else could be a better lead? That’s not datamining, it’s the absolute primary lead in any criminal conspiracy. If a contact is innocent then discount it. Do you really think we shouldn’t be monitoring who jihadists are contacting?

  39. tanveer said,

    February 28, 2009 at 7:23 pm

    Surely this is a political issue too, and trying to disengage from this and provide a purely technical explanation of why screening is a bad idea in this case does not tell the full story. I understand the technical argument but it is not the full story. The context of the test is important too, and there are other far more important issues at stake, such as our freedom and right to privacy. It is pretty clear which side of the argument you are on, Ben, and I am on your side, but I do not think this should be reduced to a narrow technical issue. It is an important aspect of the argument but definitely not the whole or most important part of it.

  40. Toenex said,

    February 28, 2009 at 7:46 pm

    Is this Government that will be monitoring my every ‘digital move’ the same one that asks me each year to tell them how much tax I’ve paid? The same Government that runs the Child Tax Credit system? Is this the Government whose local councils run the most information-void web sites on the face of the planet?

    Corrrrr, watch out Mr Terrorist. They’ll be all over you like digital moss.

    Seriously, as someone with a background in data mining and pattern recognition this, even in the hands of the highly competent, is science fiction.

  41. howfar said,

    February 28, 2009 at 7:50 pm

    I, for one, am just glad that British governments have such a good record on quickly developing sophisticated computer systems without wasting enormous amounts of money.


    PS. Can people stop saying “jihadist” like they’re Jack fucking Bauer or something? It’s really getting on my tits. For a start, a mujahid (someone involved in jihad) is not necessarily a terrorist or even a soldier. Furthermore, many scholars of the Qur’an would argue that terrorism cannot form a legitimate part even of jihad as-sayf. Sura 4 verses 90 and 98 seem to be easily open to their interpretation.

    Also, “jihadist” is just such a wanky, Rambo, show-off action-man word. It sounds like the sort of thing one of those wobbly chinned saloon bar generals uses when explaining how he’d fight whatever war better than the highly trained professional soldiers who are actually risking their lives.

    Jihadi has marginally better provenance as a word, but is kinda crappy Arabic, or so the wife tells me. Mujahid would be better Arabic, but is (a) massively unspecific and (b) has too strong a connotation of a specific group of freedom-fighters/terrorists (delete according to decade).

    Also, I don’t know if anyone has noticed, but we didn’t need a specific word for the Irish individuals and groups who terrorised Britain for a few decades. The clue to the appropriate name for these people (other than the one that starts with C and rhymes with James Blount) was in their activities, not our under-informed assumptions about their religion.

    So yes Internets, stop irritating me!one11eleven!1

  42. Mark Frank said,

    February 28, 2009 at 9:49 pm


    “firstly, this is not “analogous” to screening, it is screening. it doesnt matter if youre using your algorithm or tool to screen for cancer or terrorism, it is screening, ”

    Ben – you can call it screening if you like but there must be substantial statistical differences between screening for cancer and using data mining for security.

    * Most importantly – the question you are trying to answer may well be different. If you do cancer screening you want to identify as many people as possible in the population that have that cancer. The security forces may be trying to identify all people likely to do terrorist acts in the UK – but it seems far more plausible that they will be trying to identify a specific group of terrorists with some known characteristics. Or they may not be trying to identify terrorists at all. They might for example be assessing the level of risk to some target or targets by estimating the chances of a terrorist act from persons unknown.

    * This is particularly pertinent when limiting the search to a specific community. If the objective is not to pick up all possible terrorists but to identify one or more terrorists in that community (because, for example, there is other intelligence of an imminent threat from within that community) then the rest of the UK is no more relevant than the rest of the world. This not just part of the test. There is a more limited objective and as a result greatly increased prevalence and sensitivity (although your specificity might be a problem).

    * Terrorists typically work in groups so that an exposure from any one in the group exposes the whole group. This greatly increases the benefit of identifying a terrorist.

    * Some exposures are going to have extremely high specificity and sensitivity. If the e-mail discusses a meeting with known terrorist and the sender has recently arrived from a country where we know a terrorist attack was planned and he has an inexplicably large amount of money in his bank account.
    I can think of others – but they are more contentious.

  43. warhelmet said,

    February 28, 2009 at 10:09 pm

    I tend to see Omand’s comments more in the context of a landgrab for funding and political influence.

  44. Suw said,

    February 28, 2009 at 10:17 pm

    Hm, I’m not sure that anyone could examine all the reasons why massive-scale data collection and mining, targeting the entire population, is a bad idea within the word count of a newspaper column. Ben’s looked at one aspect of why it’s a crap idea. There are lots of other reasons, including financial cost, invasion of privacy, loss of freedom, data security, abuse, etc. etc. etc.

    So yes, of course it’s not the whole story. But it’s a part of the story. And to be honest, I’m not sure that even really complex data mining techniques can avoid lots of false positives when dealing with something as big as a country’s entire population, and something as fuzzy as “suspicious behaviour”.

    Yet being a false positive in the hunt for terrorists could be as life-changingly awful as being falsely accused of paedophilia. Take a look at the fallout from Operation Ore: badly botched investigations resulted in 35 of the accused committing suicide. And that wasn’t even based on data mining, but on supposedly reliable data from a database of credit cards.

    If the authorities can’t get that right, why on earth should we assume that they can get complex data mining of the entire population right?

  45. treeofpain said,

    February 28, 2009 at 11:44 pm

    Ben, Good article & good reply to criticisms so far.
    Stupid bits people have missed that you have already mentioned: none of the algorithms can realistically be tested, or have been tested (disregarding torture of Guantanamo guests or wherever).
    6-degrees of separation (in our connected society, whatever the real number would actually be): the number of individuals loosely connected with a ‘suspect’ may actually be the entire population in only a few steps.
    The only reason for having access to the entirety of the data record is when you have nowhere to start from, apart from untested screen hits (please tell how these screens can be tested?? please?) or, like Googling, user suggestions (and prejudices, failings, mis-correlations etc.).
    If you already have ‘suspects’, then really you can do trad surveillance, as this is what you would have to do anyway to reduce false positive numbers from a screen. Obviously the rest is a complete waste of time.
    A technical discussion is the most appropriate, rather than a political stance, as any sane, scientific look at this sort of tosh shows its weaknesses much more clearly than saying: immoral.
    Unlike medical screens, the cooperation of a true hit is unlikely to be forthcoming in order to prove that veracity.
    2 choices: either GCHQ do not understand how rubbish any of this is going to be, and just like to waste money, OR they do understand and there is an ulterior (or inferior??) motive.

  46. treeofpain said,

    February 28, 2009 at 11:55 pm

    Forgot to mention also: screening from a base position where there is not even a single clean data-set to start from, to do with anything (link in Ben’s article to software company is prob typical of ‘automated’ systems).
    Any links to the war on email spam anyone?? ‘algorithmic’ identifying of suspect data here, is notoriously difficult, and this is 1 of the streams.
    Classic argumentative tactics here I suspect: the data + arguments to make ID cards work were not really won, so quickly assume that position is true and add another layer, so that people forget the fundamentally flawed foundations.

  47. Craig said,

    March 1, 2009 at 12:11 am

    This post is almost word-for-word identical to a statistics lecture given every year to beginning Psych students at my university.

    (that isn’t a criticism, by the way)

    Speaking of psychologists, Kahneman & Tversky have published a fair bit on related issues.

  48. NitWit005 said,

    March 1, 2009 at 5:34 am

    I would argue that such a tool is perfectly valid; it just generates highly circumstantial evidence. The same is true of a lie detector test, psychological profiling, etc. Any tool has a rate of false positives and false negatives.

    The real failure comes from people using the information from tools like this without knowing how to interpret the results. Humans are staggeringly bad at understanding probabilities.

    Where data mining techniques are usually useful in police work is in deciding how to distribute staff. If there are several large concentrations of people flagged by a tool, it may be wise to allocate more personnel to those areas.

  49. Pro-reason said,

    March 1, 2009 at 6:40 am

    By publicly coming out in favour of the Palestinians, and making amends for the aggression against Iraq, the British government could reduce the risk of “terrorism” (Arabs counter-attacking us) virtually to zero. There is no need for false negatives, or the slightest hint of Nineteen Eighty-Four.

  50. NitWit005 said,

    March 1, 2009 at 6:52 am

    Umm, Pro-reason, the British have also had problems with non-Arab terrorists. Hell, back when they controlled Palestine they had problems with Jewish terrorists.

  51. adamcgf said,

    March 1, 2009 at 9:35 am

    #49 and #50 give a kind of mixture of what I wanted to say. Although the details of the maths and stats Ben describes are (to another humanities boy) endlessly fascinating, ultimately this is a waste of money for a host of reasons which are nothing to do with data mining and everything to do with wider policy. I remain deeply sceptical about the idea that we have vast identifiable and observable webs of terrorists plotting in our midst; it feels much too much like a neo-con’s wet dream. And, more than anything, if we want to ‘defeat terrorists’ then we should treat them for what they are, small unrelated groups of criminals carrying out isolated criminal acts, rather than give them what they want and think of them as a big, intelligent and all-powerful organisation. We make these things true by believing them – there’s an unscientific statement for you, but I’d stand by it.

  52. Toenex said,

    March 1, 2009 at 11:07 am

    At least if nefarious organisations decide to utilise more traditional forms of communication the government has control over the Royal Mail. Ohhh.

  53. Pro-reason said,

    March 1, 2009 at 11:19 am

    I’m not talking about the past.

  54. The Biologista said,

    March 1, 2009 at 12:59 pm

    Mark Frank: “Ben – you can call it screening if you like but there must be substantial statistical differences between screening for cancer and using data mining for security.”

    Yes indeed. We can assess the sensitivity, specificity etc for a cancer screening. We can’t assess a screen for terrorist detection because they don’t like to sit still for tests designed to catch them.

    Given the target (largely predictable cancer cells versus largely unpredictable people) there are bound to be statistical differences. The data mining is surely far weaker.

  55. Rich said,

    March 1, 2009 at 2:48 pm

    The clever people at MI5 already know it doesn’t work. Also from Bruce Schneier:

    There is no terrorist profile. If that’s true, talk of “false positives” is meaningless since the tool is no better than picking people at random.

  56. Robert Carnegie said,

    March 1, 2009 at 5:37 pm

    Of course the US has had a policy in place for several years. Amongst people arrested at airports for attempting to board a planet are many WASP members of the Democratic Party. This may be one of many things that President Obama has not gotten around to fixing yet, I haven’t been following it closely.

  57. Robert Carnegie said,

    March 1, 2009 at 5:38 pm

    “attempting to board a planet”, oh you know what I mean. BBC Radio has a science fiction event on, so does Film Four from Monday I think.

  58. Mark Frank said,

    March 2, 2009 at 10:15 am

    Re #54 Biologista
    “Yes indeed. We can assess the sensitivity, specificity etc for a cancer screening. We can’t assess a screen for terrorist detection because they don’t like to sit still for tests designed to catch them.

    Given the target (largely predictable cancer cells versus largely unpredictable people) there are bound to be statistical differences. The data mining is surely far weaker.”

    Of course it is true that you can’t assess the sensitivity and specificity of some kind of test for terrorists, especially as they are going to vary wildly from one situation to another. But that’s not my reason for suggesting the cancer screening model is not appropriate. The most important reasons are that the security forces may be trying to answer a different question, and that merging with other information sources can make a dramatic difference to the usefulness of the model.

    A full explanation is a bit long for a comment so I have put it on my personal blog.

  59. Queex said,

    March 2, 2009 at 10:46 am


    I think you’re misunderstanding how meta-analysis of different algorithms would actually work.

    To have two independent algorithms A and B, each must operate on different subsets of the data. You can combine the results to improve on either individually, but there’s no reason to think it will be any better than an algorithm C working on all the data.

    If both algorithms work on the full data (or indeed overlap at all), then they are not truly independent, which can greatly reduce the specificity gain from combining them – and there’s still no reason to think they would beat C.

    False positives don’t just arise from quirks of the algorithm used, they also arise from quirks of the data itself. The point is, even with the ambitious estimates of accuracy, for every true positive there will be a number of false positives and the only way to tell them apart is by going out and finding more data.

    Meta-analysis is not a panacea to wipe away false positive problems; it’s a toolbox to help get at the results you would have obtained had you done a single big study. If this hypothetical single study is still not specific enough to be practical, you’re SOL.
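    The independence caveat above can be sketched in two extreme cases, with invented figures:

```python
# Two extreme cases of the independence point, with invented figures:
# each screen wrongly passes 10% of innocents. If their errors are
# independent the rates multiply; if perfectly correlated, the second
# screen adds nothing. Real overlapping algorithms sit between the two.

innocents = 1_000_000
fp_rate = 0.1

independent = innocents * fp_rate * fp_rate  # errors uncorrelated
correlated = innocents * fp_rate             # same innocents fail both

print(f"independent errors: {independent:,.0f} false positives")
print(f"fully correlated:   {correlated:,.0f} false positives")
```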

    A lot of the apologists for the approach seem to think that as long as some gain can be shown, it’s reason to support the idea. It’s not. You have to show that it’s more effective than spending a similar amount of money on other approaches. When you look at the likely cost (in terms of infrastructure and the expense of sorting wheat from chaff once you’ve done your mining), even the most unrealistically optimistic guesses as to its efficacy still make it worse than say, hiring a few dozen extra agents or funding a public education programme.

  60. heng said,

    March 2, 2009 at 3:17 pm

    Queex @58 made the comment I was about to make (although probably a bit more politely).

    Following from his discussion, if 2 algorithms, A and B, yielded better combined results than algorithm C, then we would have a new algorithm, D, which split the data into 2 and passed it through A and B and combined the result. It’s nonsensical to talk about an algorithm as anything other than a black box with inputs (all data) and outputs (terrorist or diseased or stolen credit card or whatever).

    For those thinking that medical diagnostics is anything other than data mining, frankly, you are wrong. It’s *exactly* the same problem. You have inputs (blood tests, histories, CT images, MRI images) and outputs (have disease X?). Diagnostic doctors are just Bayesian inference machines. If you could perfectly input the information into a computer, the computer would probably do a better job. Compare this to data mining for terrorists. You have inputs (phone taps, emails, cctv, informants, police interviews etc) and you have outputs (is terrorist?). Once you’ve defined your feature vector, in both cases you could use any of the widely researched machine learning algorithms. Assuming for the moment that you have enough training data, how would you define your feature vector?

    In principle, if you had enough computation, your feature vector could be everything: full frame CCTV images (because anything in the image could be of interest), every email on the internet (which of course can’t be encrypted), lossless PCM speech digitised from every phone call in the country (you need to keep accents and so on), police reports etc. Put them all into one big black box algorithm, along with masses of training data, and see what pops out the end. Of course, in the process you’ll pretty much solve several of the biggest machine intelligence problems around today. It would need to use every email because any self respecting terrorist is not going to flood the net with information.

    Alternatively, because you don’t have infinite computing power, you could reduce your feature vector by passing all the data through a data reduction algorithm. This would either be “tuned” (concentrating on individuals, or areas of the country or whatever), which comes down to profiling, or it would be random, throwing away random pieces of information until we have a computationally tractable problem (this is problematic for the reason given above). Whether what’s left is even potentially useful is anyone’s guess. The real problem here is that (unlike credit card transactions) nobody has the slightest clue what real terrorist actually do that’s different to non-terrorists – mostly because terrorists realise this and go out of their way to be normal.

    Now we come onto the problem of training data. How many terrorists have there been in the UK in the last 50 years (1000 tops?). Assuming that all the data these terrorists generate can be data mined for terrorist traits, and that these traits are correlated across all the data, so useful learning can actually be performed, what sort of problem do we have? Let’s assume that for every terrorist, there is 10 years of incriminating chatter to store and analyse. That means we have a sum total of 10,000 person years worth of incriminating data to analyse and train on. Compare this to the total chatter from the rest of the population: 3e9 person years. That swamps the amount of positive data with hideously noisy negative data. We require a non-trivial (!) amount of computation to train on that. 3 billion years worth of every CCTV in the country, every phone call in the country, every email in the country. Of course, this relies on all the information being available for every known terrorist…
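    The back-of-the-envelope figures above, checked, using the comment’s own rough assumptions (at most 1,000 terrorists over 50 years, each generating 10 years of incriminating chatter, against a population of 60 million):

```python
# The comment's back-of-the-envelope training-data figures, using its own
# rough assumptions. All inputs are the comment's estimates, not data.

terrorists = 1_000
years_of_chatter_each = 10
positive_person_years = terrorists * years_of_chatter_each

population = 60_000_000
years = 50
negative_person_years = population * years

print(f"positive training data: {positive_person_years:,} person-years")
print(f"negative training data: {negative_person_years:,} person-years")
print(f"imbalance: 1 to {negative_person_years // positive_person_years:,}")
```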

    Already it’s a completely ludicrous proposal. I can’t be bothered to think what the requirements would be to actually run such a preposterous scheme. The absolute, fundamental problem is that the vast majority of people and the vast majority of data (even from terrorists) is in no way suggestive of terrorists. It’s useless information that still has to be processed. Nothing beats: a) solving the problem in the first place; b) actually acquiring useful information (that means police and intelligence services work).

  61. The Biologista said,

    March 2, 2009 at 3:42 pm

    Mark Frank: “Of course it is true that you can’t assess the sensitivity and specificity of some kind of test for terrorists, especially as they are going vary wildly from one situation to another. But that’s not my reason for suggesting the cancer screening model is not appropriate. The most important reasons are that the security forces may be trying to answer a different question and that merging with other information sources can make a dramatic difference to the usefulness of the model.”

    The same can be said of all screening, including that for cancer. But the critical flaw still remains. When screening for a rare phenomenon, you’re going to get several orders of magnitude more false positives than true positives. When that is scaled up to a population in the millions and a rarity on the order of one in hundreds of thousands, just what sort of other techniques are really going to prevent thousands of innocent people from becoming suspects?

  62. brachyury said,

    March 2, 2009 at 3:57 pm

    We are going round in circles, with people pretending that datamining = supervised learning against the whole population using the highest-bandwidth, lowest-utility form of data.

    I presume they only keep all the data because they dont know who might become a suspect.

    There are perfectly sensible ways in which you might search subsets of subjects with immediate contacts to known suspects– either using low powered supervised learning– or using unsupervised (exploratory datamining) to look for patterns amongst the suspects and their immediate contacts.

    If you don’t like the idea of surveillance- just say you don’t like it – don’t construct ludicrous straw methodologies to diss.

  63. mikewhit said,

    March 2, 2009 at 5:15 pm

    It’s surely entirely possible that all this talk by the Govt. is just for show/window-dressing/discouraging not-so-clever would-be villains … like those ads about “The Database” that knows about you – was that TV Licensing or tax dodging?

    Also the times that “satellite tracking” gets used by the Govt/media leaving behind the belief in the ill-informed that a satellite is actually watching your every move, whereas it’s just a GPS receiver.

  64. Mark Frank said,

    March 2, 2009 at 5:52 pm

    Re #61

    The same can be said of all screening, including that for cancer research. But the critical flaw still remains. When screening for a rare phenomenon, you’re going to get several orders of magnitude more false positives than true positives.

    OK – let’s get some examples on the table.

    1) Suppose the question you are trying to answer is – do we need to raise the national threat level? Then you are not trying to identify a terrorist. You are just asking whether there is at least one unidentified terrorist somewhere in the UK. This changes the maths completely. It is like running a screen which tests for smallpox (if there is one) to answer the question: is there at least one case of smallpox in the country?

    2) Suppose intelligence information suggests an imminent threat from a group based in Algeria who recently sent operatives to Leeds. You then run a pattern which is restricted to people living near Leeds and recently arrived from Algeria. Three things are going on

    (a) the question is more focussed – we are looking for a specific terrorist

    (b) the specificity and sensitivity have shot up to very high levels while the base rate has also increased (although less dramatically)

    (c) you only have to identify one of the operatives and then follow them to forestall the operation

    I doubt anyone would attempt to quantify the specificity and sensitivity but they could run the test and see how many matches they get. It might be thousands which would be impossibly large. But for such a restricted search it might well be less than a hundred. Of course you have missed out on all sorts of possible positives who are not living in Leeds and recently arrived from Algeria. But that’s not the problem right now. We are responding to this particular intelligence.

    As the NAP book says – this is pretty much the kind of thing the intelligence services would do anyway. Are you suggesting it would be cheaper to do it manually?

  65. njdowrick said,

    March 2, 2009 at 5:57 pm

    Re: #59 (Queex)

    I think that I probably do misunderstand something; I’m certainly no expert. Thank you for trying to enlighten me!

    The picture I had in mind was something like this. Let’s suppose there’s a (completely accurate) database containing DNA profiles for everyone in the country, including me. Suppose that you know that one of my genes is carried by only 1% of the population. You search the entire database, and find 6*10^5 matches. Although these include me, the result is not much use.

    But if I have two other genes, each carried by 1% of the population, and if the probability of finding one of these genes in one person’s genome is independent of whether either of the other two are present, then I can combine the tests by looking only for people who carry all three genes. In this case I’d find about 60 matches, including me, which is not so bad.
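    The arithmetic behind those two paragraphs can be checked in a few lines. A minimal sketch in Python, using the same assumed figures (a population of 60 million and genes that really are independent, each carried by 1% of people):

```python
# Expected number of database matches when combining independent filters.
population = 60_000_000

# One gene carried by 1% of the population:
one_gene = population * 0.01          # 600,000 matches (6*10^5)

# Three independent genes, each carried by 1% of the population:
three_genes = population * 0.01 ** 3  # about 60 matches

print(one_gene, three_genes)
```

The hundred-fold improvement per extra filter depends entirely on the independence assumption, which is exactly the point contested in the replies below.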

    That’s all I was saying: if there are several genuinely independent filters that can be applied to the set of people being considered, the specificity of the test becomes much better. Whether this can in practice be applied to the case Ben was writing about, I don’t know, but to me (knowing nothing about the subject) his claim that a specificity of 99.9% is unrealistically high did not seem to be obvious, given the possibility that multiple filters might be applied.

    I hope this clarifies what I had in mind when I wrote my previous responses.

  66. ajw said,

    March 2, 2009 at 6:47 pm

    This report says much the same thing:

    Effective Counterterrorism and the Limited Role of Predictive Data Mining
    Jeff Jonas and Jim Harper
    Policy Analysis No 584
    Cato Institute
    11 December 2006

    Full report:


    The terrorist attacks on September 11, 2001, spurred extraordinary efforts intended to protect America from the newly highlighted scourge of international terrorism. Among the efforts was the consideration and possible use of “data mining” as a way to discover planning and preparation for terrorism. Data mining is the process of searching data for previously unknown patterns and using those patterns to predict future outcomes.

    Information about key members of the 9/11 plot was available to the U.S. government prior to the attacks, and the 9/11 terrorists were closely connected to one another in a multitude of ways. The National Commission on Terrorist Attacks upon the United States concluded that, by pursuing the leads available to it at the time, the government might have derailed the plan.

    Though data mining has many valuable uses, it is not well suited to the terrorist discovery problem. It would be unfortunate if data mining for terrorism discovery had currency within national security, law enforcement, and technology circles because pursuing this use of data mining would waste taxpayer dollars, needlessly infringe on privacy and civil liberties, and misdirect the valuable time and energy of the men and women in the national security community.

    What the 9/11 story most clearly calls for is a sharper focus on the part of our national security agencies—their focus had undoubtedly sharpened by the end of the day on September 11, 2001—along with the ability to efficiently locate, access, and aggregate information about specific suspects.

  67. richard_p_auckland said,

    March 2, 2009 at 8:45 pm

    I’d point something else out regarding the figure of 10,000 “terrorists”.

    The IRA were reckoned to have had no more than 400 active members during the Troubles. With this small group, they were able to achieve death and disruption on a pretty regular basis.

    The alleged Islamic terrorists operating in the UK have mounted one fatal and two or three non-fatal attacks in the last eight years. I’d deduce from this that they are either a bunch of fantasists with either no ability or no inclination to mount attacks, or that there are very, very few of them.

    Data mining for 20 terrorists would be even harder than for 10,000.

  68. jodyaberdein said,

    March 2, 2009 at 8:52 pm

    One irritation I have is with the word terrorist. Often it seems that there is deliberate vagueness about what the word actually means. Certainly if you commit an act of violence or threaten it then detection becomes quite easy. So how to detect those covertly cooking up an act of destruction? I don’t know. I am quite interested in how another criterion for screening might apply though: that there has to be a treatment that works and is acceptable once you have your screen-positive population. Internment anyone?

  69. Queex said,

    March 3, 2009 at 10:57 am

    65 njdowrick:

    The trouble is it’s not necessarily the case that those genes are independent. If they’re close to one another on the chromosome, they are in fact quite highly correlated.

    I don’t know about other fields in genetics, but in genome-wide association studies a ‘SNP’ with a rarity of 1% is on the cusp of being rejected for being too rare to work with reliably.

    But how independent can any of the factors in the proposed system be? Someone who buys a book on the history of Islamist terrorism is more likely to use flagged words or phrases in their email, independently of whether or not they have any terrorist leanings themselves. The lack of independence makes a big difference to how much better your test gets.

    One point Ben originally made is that the predictive value of a positive result goes down as what you’re looking for becomes rarer. Even if you have a super-effective test, if the proportion of ‘cases’ is super-small, you are still hip-deep in the cacky. Trying to dodge this problem by winnowing down the candidates first doesn’t avoid the problem – it’s just another way of trading away sensitivity for specificity. So to get a workable system in this context (generously allowing 1000 ‘terrorists’ out of 60 million) your test needs to have an accuracy that many precise chemical or physical tests struggle to meet. Expecting to do so with behavioural analysis is foolish.
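    A quick sketch of that arithmetic, using the comment’s assumed 1,000 ‘terrorists’ in 60 million and a generously high, purely hypothetical sensitivity and specificity:

```python
# Positive predictive value of a screen for a very rare target group.
# All figures are illustrative assumptions, not properties of any real system.
population = 60_000_000
targets = 1_000

sensitivity = 0.99    # assumed: 99% of real targets flagged
specificity = 0.999   # assumed: only a 0.1% false-positive rate

true_pos = targets * sensitivity                        # ~990 real hits
false_pos = (population - targets) * (1 - specificity)  # ~59,999 false alarms

ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 4))  # → 0.0162: roughly 1 real hit per 60 positives
```

Even with a test far better than anything behavioural analysis could plausibly deliver, false positives swamp true positives by nearly two orders of magnitude.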

  70. memotypic said,

    March 3, 2009 at 1:56 pm

    This is all very well, but either way the system is a loser (and I’m not sure which outcome is worse).

    1. System is pants: Little change except a few innocents get harassed and probably we lose a few more civil liberties along the way.
    2. System works by some miracle and isn’t polluted by savvy people spamming innocents with attack plans and making spurious car journeys etc.

    Frankly, god help us if it is (2). If there really were an efficient system it would really become a (~civil) war at that point. Nothing recruits people (who are largely drifting and disappointed with life and looking for a way out and a ficus for their anger) like having a ‘real’ enemy who is good at fighting you. A good system would make things much worse. Every new prisoner a martyr, so the martyr count soars. Every relative has grounds to protest against a secret assessment system (they could not reveal how it works/weights because that would kill it).

    The only comfort is that governments are usually bloody awful at anything like this. They can barely spell ‘computer’ let alone use them effectively (because those purchasing and those using are largely different groups, each of which consists of committees of committees). And they don’t seem to be able to keep hold of them either.

    Basically the ‘Dream of Putin’ (/Cheney+/whoever) could never be a reality. Reality is complicated — just ask those in a police state who never did quite manage to get everyone (partly because it is tricky, but probably mostly because for every one sibling/lover/child/parent they captured, they made another two dissenters — hydra-style).

    You can’t crush jelly in your hands.

  71. memotypic said,

    March 3, 2009 at 1:57 pm

    A ficus for their anger. Hmm.

    Got a gripe? Get a fig! :)


  72. njdowrick said,

    March 3, 2009 at 5:23 pm

    69 Queex

    Of course genes *can* be correlated, but they don’t have to be. Surely you’d agree that if you are looking for a person who you know lives in Birmingham, regularly eats Cornish Yarg, and collects calculators you’ll get a far smaller set through using all three filters than through using each one individually and ignoring the others. That’s really the only point I’m making.

    I am certainly not arguing that the proposed system would work (still less that it would be a good thing). All I’m saying is that Ben’s order-of-magnitude estimates just aren’t convincing to me given the huge range of evidence that could potentially be available. Ben’s general discussion failed to persuade me that the idea was obviously nonsense; of course, I fully realize that this may say as much about my intelligence (or my knowledge of the field) as it does about Ben’s arguments! To convince me I’d need to know in detail what was proposed; rough estimates that might be reasonable in different fields might not apply here.

  73. heng said,

    March 3, 2009 at 5:52 pm

    njdowrick @72
    Any algorithm ideally uses *all* available information. It is invariably worse to consider individual tests on individual pieces of information and then attempt to combine the results than it is to do tests on all the data at once. The reason for this is that it is much harder to consider the correlations between pieces of data if you process them separately. Of course, if there are no correlations then you don’t lose anything, but you don’t gain anything either.

  74. misterjohn said,

    March 5, 2009 at 6:34 pm

    Just to put in my twopennyworth about the use of tests.


    Suppose you were tested in a large-scale screening programme for a disease known to affect one person in a hundred.

    The test is 90% accurate.
    You test POSITIVE.

    What is the probability that you have the disease?


    Imagine testing 1000 people:

    10 have the disease, so at 90% accuracy we get 9 hits, 1 miss

    990 have no disease, but at 90% accuracy we will also get 99 false positives


    You are one of 108 people to get a positive result, but only 9 of them have the disease

    P = 9/108 = 1/12

    And that’s when we know both how accurate the test is and the prior likelihood of having the disease. Anyone like to suggest prior probabilities for “being a terrorist”, or the degree of accuracy of these tests, as has already been said?
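    The worked example above can be reproduced in a few lines of Python, following the same figures (prevalence 1 in 100, test 90% accurate in both directions):

```python
# misterjohn's screening example: P(disease | positive test).
tested = 1_000
diseased = tested // 100      # 10 people have the disease
healthy = tested - diseased   # 990 do not

true_pos = diseased * 0.9     # 9 hits (1 missed case)
false_pos = healthy * 0.1     # 99 false alarms

p_disease_given_positive = true_pos / (true_pos + false_pos)
print(p_disease_given_positive)  # 9/108 = 1/12, about 0.083
```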

  75. dslick said,

    March 6, 2009 at 7:46 pm

    Several posts have noted that iterative or recursive methods of data mining such as using neural network learning systems might lead to development of sufficiently accurate algorithms for separating signal (terrorists) from noise (non-terrorists) that it would be cost-effective to thoroughly investigate all “possible terrorists” so identified. Such an approach has in fact been used fruitfully in other areas of science. However, this approach assumes that essential properties of the objects being detected do not change fundamentally over time. I’m not confident that such an approach would work well in a situation where terrorists’ methods for communicating, organizing themselves, recruiting, raising funds, etc actively evolve over time with a primary goal of remaining undetected.

    Another concern with data mining is that sophisticated terrorist organizations that are able to learn about important variables in data mining algorithms would not only be able to change their operations to avoid detection, but might actually set up operations designed to massively increase false positives in order to bog down the system. For example, they might set up an “operation” that is specifically designed to be discovered for purposes of implicating very large numbers of non-terrorists through linkages used by the search algorithm for identifying suspects (e.g., email communications).

  76. Robert Carnegie said,

    March 16, 2009 at 4:10 pm

    Regarding the IRA incidentally: the newly active again Regal IRA and Continental IRA have been estimated recently at 300 people. If the figure of 400 members of the previous IRA is correct, then allowing for retirements that must be nearly all of them, with the possible exception of Martin McGuinness. None of which surprises me. Every time there is an Irish peace deal, the entire IRA membership disbands and reforms under the guise of being Different Republicans.

  77. annsaet said,

    October 21, 2010 at 11:07 pm

    I’m very happy to see this point – the one about the poor predictive value of positive hits using even the most precise algorithm when using it to pick out a small number of “true positive” cases in a large, innocent population – getting more traction. The bigger the haystack, the harder it is to find true needles. See also my article “Nothing to hide, nothing to fear?” in the International Criminal Justice Review (2007). I’m also happy to see people picking up on the point of what we do have to fear if we give up our privacy in a moment of moral panic over terrorism. For instance: While it is hard to find terrorists in the haystack, it is relatively easy to find whistleblowers. Those in power whose misdeeds have been “blown” will know exactly what journalist got the story and approximately when. Not so hard to find out who said journalist was talking to at the time if you have her phone records, eh? And what about tracking your political opposition? This isn’t hypothetical. After all … who gets to say what counts as “terrorism” once we give some mighty protector the right to sift through whatever he thinks he needs to see in order to protect us against it? None other than that mighty protector himself, is who! We know they’re already abusing the powers they have (and some they don’t). Why on earth would we voluntarily give them more? And who’s being “naive” if we do?
