I’ve been thinking about Franklin, Perrig, Paxson, and Savage’s “An Inquiry into the Nature and Causes of the Wealth of Internet Miscreants” for about three weeks now.
This is a very good paper. For the infosec empiricist, the dataset itself is noteworthy. It consists of 13 million public IRC messages (that is, in-channel stuff, not PRIVMSGs) obtained from several networks and channels, collected over a seven-month period. These messages contain sensitive information (such as PII) and offers to sell various illicit goods. The authors provide no information about how the IRC networks and channels were selected for monitoring, which may matter to anyone seeking to replicate their findings. For the CS crowd, they use a nifty machine-learning technique to identify and categorize messages which are advertisements.
The authors are able to present a number of fascinating descriptive statistics about the market they study, including the number and activity level of market participants, price history, measures of flow of goods into the market, statistics on which goods are offered for sale most often, etc.
This paper has gotten some attention in the trade press because it discusses methods which could potentially be used to disrupt this IRC-based underground economy. In a nutshell, the key is to make it impossible to tell good sellers from bad, thereby deliberately creating a market for lemons and driving out customers.
Ultimately, there are way more questions than answers. This has nothing to do with the paper, which is excellent. It has to do with disciplinary maturity, which we in information security lack, and with quality data, which we lack even more.
But dwelling on the positive for a moment, it is interesting to consider what we might be able to investigate using a dataset like this. At a macro level, we might be able to observe the price reaction given a sudden increase in supply. For example, if we have independent confirmation that at a particular time 100 million credit card numbers became available for sale, it would be interesting to see if this was followed by a drop in the asking price, and if so, how large a drop.
Even more interesting: if we already have an idea about the elasticity of price with respect to supply, we can estimate the size of the market based on observed price movements given a supply shock of known size. If, similarly, we observe an unexplained drop in price, we may presume that an unreported supply shock has taken place. This is an indirect estimator of the amount of the personal information iceberg existing below the waterline. Cool!
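To make the arithmetic behind that estimator concrete, here is a minimal sketch. It is my own illustration, not anything from the paper. It assumes a simple linear approximation in which the percentage change in price equals an assumed elasticity times the percentage change in supply. The function names, the elasticity value, and all the numbers are hypothetical.

```python
def implied_market_size(shock_units, pct_price_change, elasticity):
    """Estimate pre-shock supply from a supply shock of known size.

    Assumes (hypothetically) pct_price_change = elasticity * pct_supply_change,
    where elasticity is the percent change in price per one percent change
    in supply (negative if extra supply pushes prices down).
    """
    if elasticity == 0 or pct_price_change == 0:
        raise ValueError("elasticity and price change must be non-zero")
    # Fractional growth in supply implied by the observed price move.
    pct_supply_change = pct_price_change / elasticity
    # The known shock is that fraction of the pre-shock market.
    return shock_units / pct_supply_change


def implied_shock_size(market_size, pct_price_change, elasticity):
    """Estimate an unreported supply shock from an unexplained price move."""
    if elasticity == 0:
        raise ValueError("elasticity must be non-zero")
    return market_size * pct_price_change / elasticity


if __name__ == "__main__":
    # Made-up inputs: 100 million cards appear, asking prices fall 20%,
    # and we assume an elasticity of -0.5.
    size = implied_market_size(100e6, -0.20, -0.5)
    print(f"implied pre-shock market size: {size:,.0f} cards")
    # Run the logic in reverse for an unexplained 20% price drop.
    shock = implied_shock_size(size, -0.20, -0.5)
    print(f"implied unreported shock: {shock:,.0f} cards")
```

With those made-up inputs the implied pre-shock market is about 250 million cards. The hard part in practice is the elasticity itself, which would have to be estimated from earlier shocks of known size.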
There’s also a certain practical value. Consider the recent UK data breach. Already, there are reports that personal information from this incident is appearing on the underground market. Franklin et al. have provided us with an estimate of how much traffic in UK PII existed prior to this breach. The same surveillance techniques which informed their analysis are undoubtedly still under way today. Perhaps three, six, or twelve months from now a second analysis will show a dramatic increase in the amount of UK PII flowing through the market. The policy ramifications of knowing how great the lag is between PII being pilfered and its appearance on the market are significant. What use is a twelve-month credit freeze, for example, if the lag is 24 months?
A final question that data like this can help answer concerns the relationship between breaches and identity theft. Since British banks reportedly are balking at monitoring all the accounts involved for fraud, this opportunity may be squandered. I commented on the important role banks could play back in June:
One way to estimate the extent to which having your PII exposed in a breach increases the probability of your becoming an identity theft victim is to watch for the exposed data elements using a fraud detection network
Other than using banks as a focal point and having them report on fraud using these stolen elements, I cannot think of another way.
I suppose one could try to determine whether the stolen elements were in the inventory of any black-market sellers, but I do not see how one can gain access to their inventory information. It’s clear that the illicit trade in this stuff is non-trivial, but I honestly do not know that we have anything approaching a comprehensive picture of the landscape.
I now see that I was overly pessimistic in my last two sentences, and I am thankful for that.
Let me close with a quotation about price data which reflects my current mood:
Certainly they tell us a great deal, some but not all of which is reflected in policy debates.
At the same time much remains unknown. Given the number of instances in which deductive arguments have been promulgated with great confidence only to be refuted by empirical evidence, it seems wise to be somewhat cautious in drawing conclusions that go beyond the scope of the data.
Although some of what is not known is probably unknowable–at least in the medium term–there are considerable opportunities for expanding the range of questions that have been addressed empirically. Price data are much more accessible than data pertaining to prevalence or quantity. A relatively modest investment of resources could substantially increase both the quantity and the quality of the price data available for analysis.
This observation is from “What price data tell us about drug markets”, a 1998 paper. I fervently hope its applicability to the information security world is demonstrated by a stream of papers stimulated by Franklin et al.