Saturday February 28 2009
This week Sir David Omand, the former Whitehall security and intelligence co-ordinator, described how the state should analyse data about individuals in order to find terrorist suspects: travel information, tax, phone records, emails, and so on. “Finding out other people’s secrets is going to involve breaking everyday moral rules”, he said, because we’ll need to screen everyone to find the small number of suspects.
There is one very significant issue that will always make data mining unworkable when used to search for terrorist suspects in a general population, and that is what we might call the “baseline problem”: even with the most brilliantly accurate test imaginable, your false positives rise to unworkably high levels as the outcome you are trying to predict becomes rarer in the population you are examining. This stuff is tricky but important. If you pay attention you will understand it.
Let’s imagine you have an amazingly accurate test, and each time you use it on a true suspect, it will correctly identify them as such 8 times out of 10 (but miss them 2 times out of 10); and each time you use it on an innocent person, it will correctly identify them as innocent 9 times out of 10, but incorrectly identify them as a suspect 1 time out of 10.
These numbers tell you about the chances of a test result being accurate, given the status of the individual, which you already know (and the numbers are a stable property of the test). But you stand at the other end of the telescope: you have the result of a test, and you want to use that to work out the status of the individual. That depends entirely on how many suspects there are in the population being tested.
If you have 10 people, and you know that 1 is a suspect, and you assess them all with this test, then you will correctly get your one true positive and – on average – 1 false positive. If you have 100 people, and you know that 1 is a suspect, you will get your one true positive and, on average, 10 false positives. If you’re looking for one suspect among 1000 people, you will get your suspect, and 100 false positives. Once your false positives begin to dwarf your true positives, a positive result from the test becomes pretty unhelpful.
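The arithmetic above can be sketched in a few lines of code, using the article's hypothetical test (catches a true suspect 8 times in 10, wrongly flags an innocent person 1 time in 10); the function name and numbers are illustrative only, and note the text rounds the expected counts slightly:

```python
# Hypothetical test properties from the article.
sensitivity = 0.8          # P(flagged | true suspect)
false_positive_rate = 0.1  # P(flagged | innocent)

def screen(population, suspects):
    """Expected true and false positives when everyone is tested."""
    innocents = population - suspects
    true_pos = suspects * sensitivity
    false_pos = innocents * false_positive_rate
    return true_pos, false_pos

# One suspect hiding in ever-larger populations:
for population in (10, 100, 1000):
    tp, fp = screen(population, suspects=1)
    print(f"{population:>5} people: {tp:.1f} expected true positives, "
          f"{fp:.1f} expected false positives")
```

The true-positive count stays fixed while the false positives grow with the number of innocents tested, which is the whole problem.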
Remember this is a screening tool, for assessing dodgy behaviour, spotting dodgy patterns, in a general population. We are invited to accept that everybody’s data will be surveyed and processed, because MI5 have clever algorithms to identify people who were never previously suspected. There are 60 million people in the UK, with, let’s say, 10,000 true suspects. Using your unrealistically accurate imaginary screening test, you get 6 million false positives. At the same time, of your 10,000 true suspects, you miss 2,000.
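Running the same hypothetical numbers at national scale makes the point starkly; the quantity computed at the end, the chance that a flagged person really is a suspect, is what statisticians call the positive predictive value:

```python
# The article's hypothetical UK-scale scenario.
population = 60_000_000
suspects = 10_000
sensitivity = 0.8          # P(flagged | true suspect)
false_positive_rate = 0.1  # P(flagged | innocent)

innocents = population - suspects
true_positives = suspects * sensitivity            # 8,000 suspects flagged
missed_suspects = suspects - true_positives        # 2,000 suspects missed
false_positives = innocents * false_positive_rate  # ~6 million innocents flagged

# Positive predictive value: P(actually a suspect | flagged by the test).
ppv = true_positives / (true_positives + false_positives)
print(f"{false_positives:,.0f} false positives, "
      f"{missed_suspects:,.0f} missed suspects")
print(f"P(suspect | flagged) = {ppv:.2%}")
```

So a positive result from this supposedly accurate test means there is roughly a 1-in-750 chance the person flagged is actually a suspect.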
If you raise the bar on any test, to increase what statisticians call the “specificity”, and thus make it less prone to false positives, then you also make it much less sensitive, so you start missing even more of your true suspects (remember you’re already missing 2 in 10 of them).
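The trade-off described above can be illustrated by modelling test scores as two overlapping bell curves, one for innocents and one for suspects; the distributions and parameters here are purely hypothetical, chosen only to show that raising the threshold cuts false positives and sensitivity together:

```python
# Sketch of the threshold trade-off (hypothetical score distributions).
from statistics import NormalDist

innocent = NormalDist(mu=0, sigma=1)
suspect = NormalDist(mu=2, sigma=1)   # suspects score higher on average

for threshold in (1.0, 1.5, 2.0):
    sensitivity = 1 - suspect.cdf(threshold)         # suspects correctly flagged
    false_positive_rate = 1 - innocent.cdf(threshold)  # innocents wrongly flagged
    print(f"threshold {threshold}: sensitivity {sensitivity:.0%}, "
          f"false-positive rate {false_positive_rate:.1%}")
```

Pushing the threshold up drives the false-positive rate down, but drags the sensitivity down with it: the two numbers are yoked together, which is why you cannot tune your way out of the baseline problem.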
Or do you just want an even more stupidly accurate imaginary test, without sacrificing true positives? It won’t get you far. Let’s say you incorrectly identify an innocent person as a suspect 1 time in 100: you get 600,000 false positives. 1 time in 1000? Come on. Even with these infeasibly accurate imaginary tests, when you screen a general population as proposed, it is hard to imagine a point where the false positives are usefully low, and the true positives are not missed. And our imaginary test really was ridiculously good: it’s a very difficult job to identify suspects, just from slightly abnormal patterns in the normal things that everybody does.
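The diminishing returns of an ever-more-accurate test can be checked directly, again with the article's hypothetical population and suspect count, sweeping the false-positive rate down while holding sensitivity fixed:

```python
# How far does a more accurate test get you? (Hypothetical numbers.)
population, suspects, sensitivity = 60_000_000, 10_000, 0.8
innocents = population - suspects
true_positives = suspects * sensitivity

for fpr in (0.1, 0.01, 0.001):
    false_positives = innocents * fpr
    ppv = true_positives / (true_positives + false_positives)
    print(f"false-positive rate 1 in {int(round(1 / fpr)):>4}: "
          f"{false_positives:>12,.0f} false positives, "
          f"P(suspect | flagged) = {ppv:.1%}")
```

Even at a wildly optimistic 1-in-1000 false-positive rate, most of the people flagged are still innocent.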
Things get worse. These suspects are undercover operatives, they’re trying to hide from you, they know you’re data-mining, so they will go out of their way to produce trails which can confuse you.
And lastly, there is the problem of validating your algorithms and calibrating your detection systems. To do that, you need training data: 10,000 people where you know for certain whether or not they are suspects, to compare your test results against. It’s hard to picture how that can be done.
I’m not saying you shouldn’t spy on everyday people: obviously I have a view, but I’m happy to leave the morality and politics to those less nerdy than me. I’m just giving you the maths on specificity, sensitivity, and false positives.
Please send your bad science to email@example.com
Other good links on this include Schneier:
and some eggheads: