Datamining for terrorists would be lovely if it worked

February 28th, 2009 by Ben Goldacre in bad science, evidence based policy, politics, statistics, surveillance | 79 Comments »

The Guardian
Saturday February 28 2009
Ben Goldacre

This week Sir David Omand, the former Whitehall security and intelligence co-ordinator, described how the state should analyse data about individuals in order to find terrorist suspects: travel information, tax, phone records, emails, and so on. “Finding out other people’s secrets is going to involve breaking everyday moral rules”, he said, because we’ll need to screen everyone to find the small number of suspects.

There is one very significant issue that will always make data mining unworkable when used to search for terrorist suspects in a general population, and that is what we might call the “baseline problem”: even with the most brilliantly accurate test imaginable, your risk of false positives increases to unworkably high levels, as the outcome you are trying to predict becomes rarer in the population you are examining. This stuff is tricky but important. If you pay attention you will understand it.

Let’s imagine you have an amazingly accurate test, and each time you use it on a true suspect, it will correctly identify them as such 8 times out of 10 (but miss them 2 times out of 10); and each time you use it on an innocent person, it will correctly identify them as innocent 9 times out of 10, but incorrectly identify them as a suspect 1 time out of 10.

These numbers tell you about the chances of a test result being accurate, given the status of the individual, which you already know (and the numbers are a stable property of the test). But you stand at the other end of the telescope: you have the result of a test, and you want to use that to work out the status of the individual. That depends entirely on how many suspects there are in the population being tested.

If you have 10 people, and you know that 1 is a suspect, and you assess them all with this test, then you will correctly get your one true positive and – on average – 1 false positive. If you have 100 people, and you know that 1 is a suspect, you will get your one true positive and, on average, 10 false positives. If you’re looking for one suspect among 1000 people, you will get your suspect, and 100 false positives. Once your false positives begin to dwarf your true positives, a positive result from the test becomes pretty unhelpful.

Remember this is a screening tool, for assessing dodgy behaviour, spotting dodgy patterns, in a general population. We are invited to accept that everybody’s data will be surveyed and processed, because MI5 have clever algorithms to identify people who were never previously suspected. There are 60 million people in the UK, with, let’s say, 10,000 true suspects. Using your unrealistically accurate imaginary screening test, you get 6 million false positives. At the same time, of your 10,000 true suspects, you miss 2,000.
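
As a rough sketch of that arithmetic, in Python, using the column’s illustrative figures only (80% sensitivity, 90% specificity, 60 million people, 10,000 true suspects):

    # Back-of-envelope version of the screening arithmetic above. All
    # figures are the column's illustrative assumptions, not real numbers.
    population = 60_000_000
    true_suspects = 10_000
    sensitivity = 0.8    # P(flagged | suspect)
    specificity = 0.9    # P(cleared | innocent)

    innocents = population - true_suspects
    true_positives = sensitivity * true_suspects        # 8,000 caught
    missed_suspects = true_suspects - true_positives    # 2,000 missed
    false_positives = (1 - specificity) * innocents     # 5,999,000 flagged

    ppv = true_positives / (true_positives + false_positives)
    print(f"false positives: {false_positives:,.0f}")
    print(f"missed suspects: {missed_suspects:,.0f}")
    print(f"P(suspect | flagged): {ppv:.3%}")    # ~0.133%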

If you raise the bar on any test, to increase what statisticians call the “specificity”, and thus make it less prone to false positives, then you also make it much less sensitive, so you start missing even more of your true suspects (remember you’re already missing 2 in 10 of them).

Or do you just want an even more stupidly accurate imaginary test, without sacrificing true positives? It won’t get you far. Let’s say you incorrectly identify an innocent person as a suspect 1 time in 100: you get 600,000 false positives. 1 time in 1000? Come on. Even with these infeasibly accurate imaginary tests, when you screen a general population as proposed, it is hard to imagine a point where the false positives are usefully low, and the true positives are not missed. And our imaginary test really was ridiculously good: it’s a very difficult job to identify suspects, just from slightly abnormal patterns in the normal things that everybody does.
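
To make that sweep concrete, here is the same sketch with the false-positive rate varied (again, every figure is illustrative):

    # How far does an ever-more-specific imaginary test get you?
    population, true_suspects, sensitivity = 60_000_000, 10_000, 0.8
    innocents = population - true_suspects
    true_positives = sensitivity * true_suspects    # 8,000, held constant

    for fp_rate in (1 / 10, 1 / 100, 1 / 1000):
        false_positives = fp_rate * innocents
        ppv = true_positives / (true_positives + false_positives)
        print(f"1 in {round(1 / fp_rate):>4}: {false_positives:>9,.0f} "
              f"false positives, P(suspect | flagged) = {ppv:.1%}")
    # 1 in   10: 5,999,000 false positives, P(suspect | flagged) = 0.1%
    # 1 in  100:   599,900 false positives, P(suspect | flagged) = 1.3%
    # 1 in 1000:    59,990 false positives, P(suspect | flagged) = 11.8%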

Things get worse. These suspects are undercover operatives, they’re trying to hide from you, they know you’re data-mining, so they will go out of their way to produce trails which can confuse you.

And lastly, there is the problem of validating your algorithms, and calibrating your detection systems. To do that, you need training data: 10,000 people where you know for definite if they are suspects or not, to compare your test results against. It’s hard to picture how that can be done.

I’m not saying you shouldn’t spy on everyday people: obviously I have a view, but I’m happy to leave the morality and politics to those less nerdy than me. I’m just giving you the maths on specificity, sensitivity, and false positives.

Please send your bad science to bad.science@guardian.co.uk

More:

Other good links on this include Schneier:

www.schneier.com/essay-108.html

and some eggheads:

www.nap.edu/catalog.php?record_id=12452



79 Responses



  1. adamcgf said,

    March 1, 2009 at 9:35 am

    #49 and #50 give a kind of mixture of what I wanted to say. Although the details of the maths and stats that Ben gives are (to another humanities boy) endlessly fascinating, ultimately this is a waste of money for a host of reasons which are nothing to do with data mining and everything to do with wider policy. I remain deeply sceptical about the idea that we have vast identifiable and observable webs of terrorists plotting in our midst; it feels much too much like a neo-con’s wet dream. And, more than anything, if we want to ‘defeat terrorists’ then we should treat them for what they are, small unrelated groups of criminals carrying out isolated criminal acts, rather than give them what they want and think of them as a big, intelligent and all-powerful organisation. We make these things true by believing them – there’s an unscientific statement for you, but I’d stand by it.

  2. Toenex said,

    March 1, 2009 at 11:07 am

    At least if nefarious organisations decide to utilise more traditional forms of communication, the government has control over the Royal Mail. Ohhh.

  3. Pro-reason said,

    March 1, 2009 at 11:19 am

    I’m not talking about the past.

  4. The Biologista said,

    March 1, 2009 at 12:59 pm

    Mark Frank: “Ben – you can call it screening if you like but there must be substantial statistical differences between screening for cancer and using data mining for security.”

    Yes indeed. We can assess the sensitivity, specificity etc for a cancer screening. We can’t assess a screen for terrorist detection because they don’t like to sit still for tests designed to catch them.

    Given the target (largely predictable cancer cells versus largely unpredictable people) there are bound to be statistical differences. The data mining is surely far weaker.

  5. Rich said,

    March 1, 2009 at 2:48 pm

    The clever people at MI5 already know it doesn’t work. Also from Bruce Schneier:
    www.schneier.com/blog/archives/2008/08/mi5_on_terroris.html

    There is no terrorist profile. If that’s true, talk of “false positives” is meaningless, since the tool is no better than picking people at random.

  6. Robert Carnegie said,

    March 1, 2009 at 5:37 pm

    Of course the US has had a policy in place for several years. Amongst people arrested at airports for attempting to board a planet are many WASP members of the Democratic Party. This may be one of many things that President Obama has not gotten aroond to fixing yet, I haven’t been following it closely.

  7. Robert Carnegie said,

    March 1, 2009 at 5:38 pm

    “attempting to board a planet”, oh you know what I mean. BBC Radio has a science fiction event on, so does Film Four from Monday I think.

  8. Mark Frank said,

    March 2, 2009 at 10:15 am

    Re #54 Biologista
    “Yes indeed. We can assess the sensitivity, specificity etc for a cancer screening. We can’t assess a screen for terrorist detection because they don’t like to sit still for tests designed to catch them.

    Given the target (largely predictable cancer cells versus largely unpredictable people) there are bound to be statistical differences. The data mining is surely far weaker.”

    Of course it is true that you can’t assess the sensitivity and specificity of some kind of test for terrorists, especially as they are going to vary wildly from one situation to another. But that’s not my reason for suggesting the cancer screening model is not appropriate. The most important reasons are that the security forces may be trying to answer a different question and that merging with other information sources can make a dramatic difference to the usefulness of the model.

    A full explanation is a bit long for a comment so I have put it on my personal blog.

  9. Queex said,

    March 2, 2009 at 10:46 am

    njdowrick:

    I think you’re misunderstanding how meta-analysis of different algorithms would actually work.

    To have two independent algorithms A and B, each must operate on different subsets of the data. You can combine the results to improve on either individually, but there’s no reason to think it will be any better than an algorithm C working on all the data.

    If both algorithms work on the full data (or indeed overlap at all), then they are not truly independent, which can greatly reduce the specificity gain from combining them, and there’s still no reason to think they would beat C.

    False positives don’t just arise from quirks of the algorithm used, they also arise from quirks of the data itself. The point is, even with the ambitious estimates of accuracy, for every true positive there will be a number of false positives and the only way to tell them apart is by going out and finding more data.

    Meta-analysis is not a panacea to wipe away false positive problems; it’s a toolbox to help get at the results you would have obtained had you done a single big study. If this hypothetical single study is still not specific enough to be practical, you’re SOL.

    A lot of the apologists for the approach seem to think that as long as some gain can be shown, it’s reason to support the idea. It’s not. You have to show that it’s more effective than spending a similar amount of money on other approaches. When you look at the likely cost (in terms of infrastructure and the expense of sorting wheat from chaff once you’ve done your mining), even the most unrealistically optimistic guesses as to its efficacy still make it worse than say, hiring a few dozen extra agents or funding a public education programme.

  10. heng said,

    March 2, 2009 at 3:17 pm

    Queex @58 made the comment I was about to make (although probably a bit more politely).

    Following from his discussion, if 2 algorithms, A and B, yielded better combined results than algorithm C, then we would have a new algorithm, D, which split the data into 2 and passed it through A and B and combined the result. It’s nonsensical to talk about an algorithm as anything other than a black box with inputs (all data) and outputs (terrorist or diseased or stolen credit card or whatever).

    For those thinking that medical diagnostics is anything other than data mining, frankly, you are wrong. It’s *exactly* the same problem. You have inputs (blood tests, histories, CT images, MRI images) and outputs (have disease X?). Diagnostic doctors are just Bayesian inference machines. If you could perfectly input the information into a computer, the computer would probably do a better job. Compare this to data mining for terrorists. You have inputs (phone taps, emails, cctv, informants, police interviews etc) and you have outputs (is terrorist?). Once you’ve defined your feature vector, in both cases you could use any of the widely researched machine learning algorithms. For the moment, assuming you have enough training data, how would you define your feature vector?

    In principle, if you had enough computation, your feature vector could be everything: full frame CCTV images (because anything in the image could be of interest), every email on the internet (which of course can’t be encrypted), lossless PCM speech digitised from every phone call in the country (you need to keep accents and so on), police reports etc. Put them all into one big black box algorithm, along with masses of training data, and see what pops out the end. Of course, in the process you’ll pretty much solve several of the biggest machine intelligence problems around today. It would need to use every email because any self respecting terrorist is not going to flood the net with information.

    Alternatively, because you don’t have infinite computing power, you could reduce your feature vector by passing all the data through a data reduction algorithm. This would either be “tuned” (concentrating on individuals, or areas of the country or whatever), which comes down to profiling, or it would be random, throwing away random pieces of information until we have a computationally tractable problem (this is problematic for the reason given above). Whether what’s left is even potentially useful is anyone’s guess. The real problem here is that (unlike credit card transactions) nobody has the slightest clue what real terrorists actually do that’s different to non-terrorists – mostly because terrorists realise this and go out of their way to be normal.

    Now we come onto the problem of training data. How many terrorists have there been in the UK in the last 50 years (1000 tops?). Assuming that all the data these terrorists generate can be data mined for terrorist traits, and that these traits are correlated across all the data, so useful learning can actually be performed, what sort of problem do we have? Let’s assume that for every terrorist, there is 10 years of incriminating chatter to store and analyse. That means we have a sum total of 10,000 person years worth of incriminating data to analyse and train on. Compare this to the total chatter from the rest of the population: 3e9 person years. That swamps the amount of positive data with hideously noisy negative data. We require a non-trivial (!) amount of computation to train on that. 3 billion years worth of every CCTV in the country, every phone call in the country, every email in the country. Of course, this relies on all the information being available for every known terrorist…
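
    A minimal sketch of that imbalance, using the rough figures above (10,000 positive person-years against 3e9 negative ones): with classes this lopsided, raw accuracy is a meaningless training target.

        # A "classifier" that never flags anyone is almost perfectly
        # accurate on these figures, yet catches no one.
        positive_years = 10_000    # rough estimate of incriminating chatter
        negative_years = 3e9       # ~60m people x ~50 years

        accuracy_of_flagging_nobody = negative_years / (negative_years + positive_years)
        print(f"{accuracy_of_flagging_nobody:.5%}")    # ~99.99967%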

    Already it’s a completely ludicrous proposal. I can’t be bothered to think what the requirements would be to actually run such a preposterous scheme. The absolute, fundamental problem is that the vast majority of people and the vast majority of data (even from terrorists) is in no way suggestive of terrorists. It’s useless information that still has to be processed. Nothing beats: a) solving the problem in the first place; b) actually acquiring useful information (that means police and intelligence services work).

  11. The Biologista said,

    March 2, 2009 at 3:42 pm

    Mark Frank: “Of course it is true that you can’t assess the sensitivity and specificity of some kind of test for terrorists, especially as they are going vary wildly from one situation to another. But that’s not my reason for suggesting the cancer screening model is not appropriate. The most important reasons are that the security forces may be trying to answer a different question and that merging with other information sources can make a dramatic difference to the usefulness of the model.”

    The same can be said of all screening, including that for cancer. But the critical flaw still remains. When screening for a rare phenomenon, you’re going to get several orders of magnitude more false positives than true positives. When that is scaled up to a population in the millions and a rarity on the order of one in hundreds of thousands, just what sort of other techniques are really going to prevent thousands of innocent people from becoming suspects?

  12. brachyury said,

    March 2, 2009 at 3:57 pm

    We are going round in circles, with people pretending that data mining = supervised learning against the whole population, using the highest-bandwidth, lowest-utility form of data.

    I presume they only keep all the data because they don’t know who might become a suspect.

    There are perfectly sensible ways in which you might search subsets of subjects with immediate contacts to known suspects: either using low-powered supervised learning, or using unsupervised (exploratory) data mining to look for patterns amongst the suspects and their immediate contacts.

    If you don’t like the idea of surveillance, just say you don’t like it; don’t construct ludicrous straw methodologies to diss.

  13. mikewhit said,

    March 2, 2009 at 5:15 pm

    It’s surely entirely possible that all this talk by the Govt. is just for show/window-dressing/discouraging not-so-clever would-be villains … like those ads about “The Database” that knows about you – was that TV Licensing or tax dodging?

    Also, the times “satellite tracking” gets used by the Govt/media leave behind the belief in the ill-informed that a satellite is actually watching your every move, whereas it’s just a GPS receiver.

  14. Mark Frank said,

    March 2, 2009 at 5:52 pm

    Re #61

    The same can be said of all screening, including that for cancer research. But the critical flaw still remains. When screening for a rare phenomenon, you’re going to get several orders of magnitude more false positives that true positives.

    OK – let’s get some examples on the table.

    1) Suppose the question you are trying to answer is: do we need to raise the national threat level? Then you are not trying to identify a terrorist. You are just asking whether there is at least one unidentified terrorist somewhere in the UK. This changes the maths completely. It is like running a screen which tests for smallpox (if there is one) to answer the question: is there at least one case of smallpox in the country?

    2) Suppose intelligence information suggests an imminent threat from a group based in Algeria who recently sent operatives to Leeds. You then run a pattern which is restricted to people living near Leeds and recently arrived from Algeria. Three things are going on:

    (a) the question is more focussed – we are looking for a specific terrorist

    (b) the specificity and sensitivity have shot up to very high levels while the base rate has also increased (although less dramatically)

    (c) you only have to identify one of the operatives and then follow them to forestall the operation

    I doubt anyone would attempt to quantify the specificity and sensitivity but they could run the test and see how many matches they get. It might be thousands which would be impossibly large. But for such a restricted search it might well be less than a hundred. Of course you have missed out on all sorts of possible positives who are not living in Leeds and recently arrived from Algeria. But that’s not the problem right now. We are responding to this particular intelligence.
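
    A sketch of point (b), with invented numbers – say 5,000 people living near Leeds and recently arrived from Algeria, 5 of them operatives, screened with the column’s imaginary 80%/90% test:

        # The base-rate effect of restricting the search. All numbers are
        # invented for illustration; the test is the column's imaginary one.
        def screen(pool, suspects, sensitivity=0.8, specificity=0.9):
            tp = sensitivity * suspects
            fp = (1 - specificity) * (pool - suspects)
            return fp, tp / (tp + fp)

        for pool, suspects in ((60_000_000, 10_000), (5_000, 5)):
            fp, ppv = screen(pool, suspects)
            print(f"pool {pool:>10,}: {fp:>9,.0f} false positives, "
                  f"P(suspect | flagged) = {ppv:.2%}")
        # Whole population: ~6m false positives. Focused pool: ~500, still
        # mostly noise, but at least a followable list.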

    As the NAP book says – this is pretty much the kind of thing the intelligence services would do anyway. Are you suggesting it would be cheaper to do it manually?

  15. njdowrick said,

    March 2, 2009 at 5:57 pm

    Re: #59 (Queex)

    I think that I probably do misunderstand something; I’m certainly no expert. Thank you for trying to enlighten me!

    The picture I had in mind was something like this. Let’s suppose there’s a (completely accurate) database containing DNA profiles for everyone in the country, including me. Suppose that you know that one of my genes is carried by only 1% of the population. You search the entire database, and find 6*10^5 matches. Although these include me, the result is not much use.

    But if I have two other genes, each carried by 1% of the population, and if the probability of finding one of these genes in one person’s genome is independent of whether either of the other two are present, then I can combine the tests by looking only for people who carry all three genes. In this case I’d find about 60 matches, including me, which is not so bad.

    That’s all I was saying: if there are several genuinely independent filters that can be applied to the set of people being considered, the specificity of the test becomes much better. Whether this can in practice be applied to the case Ben was writing about, I don’t know, but to me (knowing nothing about the subject) his claim that a specificity of 99.9% is unrealistically high did not seem to be obvious, given the possibility that multiple filters might be applied.
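
    The arithmetic for that, with njdowrick’s hypothetical 1% genes and a 60-million-person database:

        # Three independent filters, each matching 1% of the database.
        population = 60_000_000
        rate = 0.01

        print(f"one filter:    {population * rate:,.0f}")       # 600,000 matches
        print(f"three filters: {population * rate ** 3:,.0f}")  # 60, if independent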

    I hope this clarifies what I had in mind when I wrote my previous responses.

  16. ajw said,

    March 2, 2009 at 6:47 pm

    This report says much the same thing:

    Effective Counterterrorism and the Limited Role of Predictive Data Mining
    Jeff Jonas and Jim Harper
    Policy Analysis No 584
    Cato Institute
    11 December 2006

    Full report: www.cato.org/pub_display.php?pub_id=6784

    Abstract

    The terrorist attacks on September 11, 2001, spurred extraordinary efforts intended to protect America from the newly highlighted scourge of international terrorism. Among the efforts was the consideration and possible use of “data mining” as a way to discover planning and preparation for terrorism. Data mining is the process of searching data for previously unknown patterns and using those patterns to predict future outcomes.

    Information about key members of the 9/11 plot was available to the U.S. government prior to the attacks, and the 9/11 terrorists were closely connected to one another in a multitude of ways. The National Commission on Terrorist Attacks upon the United States concluded that, by pursuing the leads available to it at the time, the government might have derailed the plan.

    Though data mining has many valuable uses, it is not well suited to the terrorist discovery problem. It would be unfortunate if data mining for terrorism discovery had currency within national security, law enforcement, and technology circles because pursuing this use of data mining would waste taxpayer dollars, needlessly infringe on privacy and civil liberties, and misdirect the valuable time and energy of the men and women in the national security community.

    What the 9/11 story most clearly calls for is a sharper focus on the part of our national security agencies—their focus had undoubtedly sharpened by the end of the day on September 11, 2001—along with the ability to efficiently locate, access, and aggregate information about specific suspects.

  17. richard_p_auckland said,

    March 2, 2009 at 8:45 pm

    I’d point something else out regarding the figure of 10,000 “terrorists”.

    The IRA were reckoned to have had no more than 400 active members during the Troubles. With this small group, they were able to achieve death and disruption on a pretty regular basis.

    The alleged Islamic terrorists operating in the UK have mounted one fatal and two or three non-fatal attacks in the last eight years. I’d deduce from this that they are either a bunch of fantasists with either no ability or no inclination to mount attacks, or that there are very, very few of them.

    Data mining for 20 terrorists would be even harder than for 10,000.

  18. jodyaberdein said,

    March 2, 2009 at 8:52 pm

    One irritation I have is with the word terrorist. Often it seems that there is deliberate circumspection as to what the word actually means. Certainly if you commit an act of violence or threaten it then detection becomes quite easy. So how to detect those covertly cooking up an act of destruction? I don’t know. I am quite interested in how another criterion for screening might apply though: that there has to be a treatment that works and is acceptable once you have your screen-positive population. Internment, anyone?

  19. Queex said,

    March 3, 2009 at 10:57 am

    65 njdowrick:

    The trouble is it’s not necessarily the case that those genes are independent. If they’re close to one another on the chromosome, they are in fact quite highly correlated.

    I don’t know about other fields in genetics, but in genome-wide association studies a ‘SNP’ with a rarity of 1% is on the cusp of being rejected for being too rare to work with reliably.

    But how independent can any of the factors in the proposed system be? Someone who buys a book on the history of Islamist terrorism is more likely to use flagged words or phrases in their email, independently of whether or not they have any terrorist leanings themselves. The lack of independence makes a big difference to how much better your test gets.

    One point Ben originally made is that the predictive value of a positive result goes down as what you’re looking for becomes rarer. Even if you have a super-effective test, if the proportion of ‘cases’ is super-small, you are still hip-deep in the cacky. Trying to dodge this problem by winnowing down the candidates first doesn’t avoid it; it’s just another way of trading away sensitivity for specificity. So to get a workable system in this context (generously allowing 1,000 ‘terrorists’ out of 60 million) your test needs to have an accuracy that many precise chemical or physical tests struggle to meet. Expecting to do so with behavioural analysis is foolish.
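
    A small simulation of that lack-of-independence point; the shared “latent interest” factor and its weight are invented purely for illustration:

        import numpy as np

        # Three binary flags, each hitting ~1% of innocent people, driven
        # partly by one shared latent factor (e.g. a general interest in
        # the subject). Under independence the joint rate would be
        # 0.01**3 = 1e-6; with the shared factor it comes out orders of
        # magnitude higher.
        rng = np.random.default_rng(0)
        latent = rng.normal(size=2_000_000)    # the shared factor

        def flag(latent, weight=0.7, rate=0.01):
            score = weight * latent + (1 - weight ** 2) ** 0.5 * rng.normal(size=latent.size)
            return score > np.quantile(score, 1 - rate)    # top 1% flagged

        joint = flag(latent) & flag(latent) & flag(latent)
        print(f"correlated:  {joint.mean():.1e}")    # on the order of 1e-4
        print(f"independent: {0.01 ** 3:.0e}")       # 1e-6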

  20. memotypic said,

    March 3, 2009 at 1:56 pm

    This is all very well, but either way the system is a loser (and I’m not sure which outcome is worse).

    1. System is pants: Little change except a few innocents get harassed and probably we lose a few more civil liberties along the way.
    2. System works by some miracle and isn’t polluted by savvy people spamming innocents with attack plans and making spurious car journeys etc.

    Frankly, god help us if it is (2). If there really were an efficient system it would really become a (~civil) war at that point. Nothing recruits people (who are largely drifting and disappointed with life and looking for a way out and a ficus for their anger) like having a ‘real’ enemy who is good at fighting you. A good system would make things much worse. Every new prisoner a martyr, so the martyr count soars. Every relative has grounds to protest against a secret assessment system (they could not reveal how it works/weights because that would kill it).

    The only comfort is that governments are usually bloody awful at anything like this. They can barely spell ‘computer’ let alone use them effectively (because those purchasing and those using are largely different groups, each of which consists of committees of committees). And they don’t seem to be able to keep hold of them either.

    Basically the ‘Dream of Putin’ (/Cheney+/whoever) could never be a reality. Reality is complicated — just ask those in a police state who never did quite manage to get everyone (partly because it is tricky, but probably mostly because for every one sibling/lover/child/parent they captured, they made another two dissenters — hydra-style).

    You can’t crush jelly in your hands.

  21. memotypic said,

    March 3, 2009 at 1:57 pm

    A ficus for their anger. Hmm.

    Got a gripe? Get a fig! 🙂

    F*o*cus…

  22. njdowrick said,

    March 3, 2009 at 5:23 pm

    69 Queex

    Of course genes *can* be correlated, but they don’t have to be. Surely you’d agree that if you are looking for a person who you know lives in Birmingham, regularly eats Cornish Yarg, and collects calculators, you’ll get a far smaller set through using all three filters than through using each one individually and ignoring the others. That’s really the only point I’m making.

    I am certainly not arguing that the proposed system would work (still less that it would be a good thing). All I’m saying is that Ben’s order-of-magnitude estimates just aren’t convincing to me given the huge range of evidence that could potentially be available. Ben’s general discussion failed to persuade me that the idea was obviously nonsense; of course, I fully realize that this may say as much about my intelligence (or my knowledge of the field) as it does about Ben’s arguments! To convince me I’d need to know in detail what was proposed; rough estimates that might be reasonable in different fields might not apply here.

  23. heng said,

    March 3, 2009 at 5:52 pm

    njdowrick @72
    Any algorithm ideally uses *all* available information. It is invariably worse to consider individual tests on individual pieces of information and then attempt to combine the results than it is to do tests on all the data at once. The reason for this is that it is much harder to consider the correlations between pieces of data if you process them separately. Of course, if there are no correlations then you don’t lose anything, but you don’t gain anything either.

  24. misterjohn said,

    March 5, 2009 at 6:34 pm

    Just to put in my twopennyworth about the use of tests.

    Question

    Suppose you were tested in a large-scale screening programme for a disease known to affect one person in a hundred.

    The test is 90% accurate.
    You test POSITIVE.

    What is the probability that you have the disease?

    Answer

    Imagine testing 1000 people:

    10 have the disease, so at 90% accuracy we get 9 hits, 1 miss

    990 have no disease, but at 90% accuracy we will also get 99 false positives

    Therefore:

    You are one of 108 people to get a positive result, but only 9 of them have the disease

    P = 9/108 = 1/12

    And that’s when we know, firstly, how accurate the test is, and, secondly, the likelihood of having the disease. Anyone like to suggest prior probabilities for “being a terrorist”, or the degree of accuracy of these tests? As has already been said, nobody can.
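
    The same answer, straight from Bayes’ theorem, in exact fractions:

        from fractions import Fraction

        # The worked example above, done directly with Bayes' theorem.
        prevalence = Fraction(1, 100)
        accuracy = Fraction(9, 10)    # used as both sensitivity and specificity

        p_positive = accuracy * prevalence + (1 - accuracy) * (1 - prevalence)
        print(accuracy * prevalence / p_positive)    # 1/12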

  25. dslick said,

    March 6, 2009 at 7:46 pm

    Several posts have noted that iterative or recursive methods of data mining such as using neural network learning systems might lead to development of sufficiently accurate algorithms for separating signal (terrorists) from noise (non-terrorists) that it would be cost effective to thoroughly investigate all “possible terrorists” so identified. Such an approach has in fact been used fruitfully in other areas of science. However, this approach assumes that essential properties of the objects being detected do not change fundamentally over time. I’m not confident that such an approach would work well in a situation where terrorists’ methods for communicating, organizing themselves, recruiting, raising funds, etc. actively evolve over time with a primary goal of remaining undetected.

    Another concern with data mining is that sophisticated terrorist organizations that are able to learn about important variables in data mining algorithms would not only be able to change their operations to avoid detection, but might actually set up operations designed to massively increase false positives in order to bog down the system. For example, they might set up an “operation” that is specifically designed to be discovered for purposes of implicating very large numbers of non-terrorists through linkages used by the search algorithm for identifying suspects (e.g., email communications).

  26. Robert Carnegie said,

    March 16, 2009 at 4:10 pm

    Regarding the IRA incidentally: the newly active again Regal IRA and Continental IRA have been estimated recently at 300 people. If the figure of 400 members of the previous IRA is correct, then allowing for retirements that must be nearly all of them, with the possible exception of Martin McGuinness. None of which surprises me. Every time there is an Irish peace deal, the entire IRA membership disbands and reforms under the guise of being Different Republicans.

  27. annsaet said,

    October 21, 2010 at 11:07 pm

    I’m very happy to see this point – the one about the poor predictive value of positive hits using even the most precise algorithm when using it to pick out a small number of “true positive” cases in a large, innocent population – getting more traction. The bigger the haystack, the harder it is to find true needles. See also my article “Nothing to hide, nothing to fear?” in the International Criminal Justice Review (2007).

    I’m also happy to see people picking up on the point of what we do have to fear if we give up our privacy in a moment of moral panic over terrorism. For instance: while it is hard to find terrorists in the haystack, it is relatively easy to find whistleblowers. Those in power whose misdeeds have been “blown” will know exactly what journalist got the story and approximately when. Not so hard to find out who said journalist was talking to at the time if you have her phone records, eh? And what about tracking your political opposition? This isn’t hypothetical. See www.truth-out.org/attention-left-liberal-and-radical-groups-pennsylvania-has-been-monitoring-you63957.

    After all … who gets to say what counts as “terrorism” once we give some mighty protector the right to sift through whatever he thinks he needs to see in order to protect us against it? None other than that mighty protector himself, is who! We know they’re already abusing the powers they have (and some they don’t). Why on earth would we voluntarily give them more? And who’s being “naive” if we do?
