Measuring the Wrong Stuff

There’s a great deal of discussion out there about security metrics. There’s a belief that better measurement will improve things. And while I don’t disagree, there are substantial risks from measuring the wrong things:

Because the grades are based largely on improvement, not simply meeting state standards, some high-performing schools received low grades. The Clove Valley School in Staten Island, for instance, received an F, although 86.5 percent of the students at the school met state standards in reading on the 2007 tests.

On the opposite end of the spectrum, some schools that had a small number of students reaching state standards on tests received grades that any child would be thrilled to take home. At the East Village Community School, for example, 60 percent of the students met state standards in reading, but the school received an A, largely because of the improvement it showed over 2006, when 46.3 percent of its students met state standards. (The New York Times, “50 Public Schools Fail Under New Rating System”)

Get that? The school that flunked has more students meeting state standards than the school that got an A.

There are two important takeaways. First, if you’re reading “scorecards” from somewhere, make sure you understand the nitty-gritty details. Second, if you’re designing metrics, consider what perverse incentives and results you may be creating. For example, if I were a school principal today, every other year I’d forbid teachers from mentioning the test. That year’s students would do terribly, and then I’d have an easy time improving the next year.
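
To make the perverse incentive concrete, here is a minimal sketch of a grade computed purely from year-over-year change. The letter cutoffs and the `improvement_grade` name are hypothetical, not the formula New York City actually used; only the East Village figures come from the article, and the steady 86.5% school is made up because the article doesn’t give Clove Valley’s prior-year score.

```python
def improvement_grade(prev_pct: float, curr_pct: float) -> str:
    """Grade a school purely on year-over-year change, in percentage points.

    Hypothetical cutoffs, chosen only to illustrate an improvement-only metric."""
    delta = curr_pct - prev_pct
    if delta >= 10:
        return "A"
    if delta >= 5:
        return "B"
    if delta >= 0:
        return "C"
    if delta >= -5:
        return "D"
    return "F"

# East Village figures from the quoted article: 46.3% in 2006, 60% in 2007.
print(improvement_grade(46.3, 60.0))

# A hypothetical school holding steady at a high level gets no credit for it.
print(improvement_grade(86.5, 86.5))

# The "forbid mentioning the test every other year" strategy: sandbag, recover.
history = [70, 50, 70, 50, 70]
grades = [improvement_grade(a, b) for a, b in zip(history, history[1:])]
print(grades)
```

Under this rule the steady high performer earns only a C for holding at 86.5%, while the sandbagging school collects an A every other year.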

NYT Reporter Has Never Heard of Descartes


Or perhaps more correctly, did not internalize Descartes when he heard of him. In “Our Lives, Controlled From Some Guy’s Couch,” John Tierney writes:

Until I talked to Nick Bostrom, a philosopher at Oxford University, it never occurred to me that our universe might be somebody else’s hobby. I hadn’t imagined that the omniscient, omnipotent creator of the heavens and earth could be an advanced version of a guy who spends his weekends building model railroads or overseeing video-game worlds like the Sims.

It is for occasions such as these that the expressions “gobsmacked” and “WTF” were created. How could you survive to adulthood, let alone get a degree in what I presume was some sort of liberal arts, let alone get a job at The Paper of Record, and not once wonder about whether reality is real? This also suggests that the poor thing’s youth was insufficiently misspent.

Perhaps the really interesting work in this sort of liberal arts has moved to the likes of Edward Fredkin at MIT.

It’s a great article, and I’m happy that serious newspapers are talking about things like this. But in World of Warcraft, a simulation that he gives as a comparison, the characters there have a repertoire of jokes. One of the jokes that a woman might say is, “Do you feel that you aren’t in control of your own destiny — like — you’re being controlled by an invisible hand?”

I’m pleased that Oxford philosophers think about this, and I’m glad that professional journalists are paying attention to it rather than the usual fluff. For our children, however, this is just part of popular culture.

Photo courtesy of denzilm.

Emergent Chaos and Pirates


… pirate ships limited the power of captains and guaranteed crew members a say in the ship’s affairs. The surprising thing is that, even with this untraditional power structure, pirates were, in Leeson’s words, among “the most sophisticated and successful criminal organizations in history.”

Leeson is fascinated by pirates because they flourished outside the state—and, therefore, outside the law. They could not count on higher authorities to insure that people would live up to promises or obey rules. Unlike the Mafia, pirates were not bound by ethnic or family ties; crews were as remarkably diverse as in the “Pirates of the Caribbean” films. Nor were they held together primarily by violence; while pirates did conscript some crew members, many volunteered.

Mmmmm, chaos and emergent rules that work. Who’da thunk?

Read about pirates in the New Yorker.

Photo: “Tom Ironlocks, Sam Hawkeye and Wilde Oskar posing,” by larsst.

Astronauts and Terrorists: Limits of Screening

So we here at Emergent Chaos have carefully refrained from using the phrase “astronaut in diapers,” not because we think that it is now incumbent upon the blogosphere to maintain what little dignity remains in American journalism, but because, within about nine minutes of the arrest of Lisa Nowak, the blogosphere had thoroughly digested the story, and there was apparently nothing left to say.

However, when the New York Times published “Astronaut’s Arrest Spurs Review of NASA Testing” with the lead words “NASA is reviewing its psychological screening and checkup process in the wake of the arrest of Capt. Lisa M. Nowak, the astronaut accused of attempted murder, space agency officials said yesterday,” it occurred to me that we could, after all, jump on the “astronaut in diapers” bandwagon.

You see, we’re concerned with the idea of screening. We think it’s way over-applied, and reduces the emergence of chaos with which we are enamoured. And we’re forced to ask, if NASA, who, after all, can put a man on the moon, can’t screen its 100-odd astronauts successfully, what odds does the TSA have of screening for terrorists?

The TSA, you’ll recall, is an agency that has never put anything but a gloved hand where it doesn’t belong. And the TSA wants to screen millions of Americans every day. They want to screen us against a set of criteria that remain extremely fuzzy. (As we covered in our review of the book “Who Becomes a Terrorist and Why?”)

Setting (our) silliness aside for a moment, screening for rare conditions, like being a terrorist, or a willingness to don diapers and drive 15 hours to wave a BB gun in someone’s face, is hard. It’s hard because you don’t have good indicia of what to look for. It’s hard because every small over-reach will result in thousands of false positives, because, after all, most Americans aren’t terrorists, any more than most astronauts are murderers.
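
The base-rate arithmetic behind “thousands of false positives” fits in a few lines. Every number below (two million screenings a day, one real threat per million people, a 99% detection rate, a 0.1% false-positive rate) is an illustrative assumption, not a TSA or NASA figure, and `screening_outcomes` is a name made up for this sketch.

```python
def screening_outcomes(population, prevalence, sensitivity, false_positive_rate):
    """Expected true positives, false positives, and precision of a screen.

    Precision is the share of flagged people who really are what we screen for."""
    targets = population * prevalence
    innocents = population - targets
    true_pos = targets * sensitivity
    false_pos = innocents * false_positive_rate
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision

# Illustrative assumptions only: 2 million screenings a day, one real threat
# per million people, 99% of threats caught, 0.1% of everyone else flagged.
tp, fp, prec = screening_outcomes(2_000_000, 1e-6, 0.99, 0.001)
print(f"true positives: {tp:.2f}  false positives: {fp:,.0f}  precision: {prec:.3%}")
```

Even with a screen far better than anything we actually have, roughly 2,000 innocent people get flagged for every two real threats in the crowd, so fewer than one flag in a thousand points at an actual target.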

Trying to screen for either is a waste.

When a 0% Success Rate is Worthwhile

There’s an article titled “Turkish Hacker Depletes 10,000 Bank Accounts”:

A criminal enterprise comprised of 10 individuals who drained the accounts of 10,580 customers by sending virus-infected e-mails was busted in Istanbul.

The suspects reportedly sent virus-infected emails to 3,450,000 addresses, and subsequently drained 10,850 bank accounts.

That’s a hit rate of 0.314%. Which I’m not going to analyze today.
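
For the record, the arithmetic. The quoted article gives two different counts of drained accounts (10,580 and 10,850), so this sketch computes the rate for both; only the 10,850 figure reproduces 0.314%. Rounded to a whole percent, either one is the 0% in this post’s title. The `hit_rate` helper is just a name for the division.

```python
def hit_rate(accounts_drained: int, emails_sent: int) -> float:
    """Fraction of targeted addresses that turned into a drained account."""
    return accounts_drained / emails_sent

EMAILS_SENT = 3_450_000  # from the quoted article

for drained in (10_580, 10_850):  # the article reports both figures
    rate = hit_rate(drained, EMAILS_SENT)
    print(f"{drained:,} accounts: {rate:.3%} (about {rate:.0%} to a whole percent)")
```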

Additional resources, all in Turkish: “İnternet dolandırıcıları yakalandı” (“Internet fraudsters caught”) and “İnteraktif banka dolandırıcılığı” (“Interactive banking fraud”) both seem to be “TSI” agency stories, and “10 bin müşteri hesabını boşalttılar” (“They emptied 10,000 customer accounts”) seems to be a site with additional details. Do any readers speak Turkish?

Halvar on Vulnerability Economics

Back in July, I wrote:

If fewer outbreaks are evidence that things are getting worse, are more outbreaks evidence things are getting better?

Now, I was actually tweaking F-Secure a little, in a post titled “It’s Getting Worse All The Time?” I didn’t expect Halvar Flake would demonstrate that the answer is yes. Attacks getting worse may well mean that things are getting better. Which is kind of counter-intuitive.

In “Client Side Exploits, a lot of Office bugs and Vista,” he writes about the other side of the Vista exploit coin, and how good security can drive bugs into widespread use:

ASLR is entering the mainstream with Vista, and while it won’t stop any moderately-skilled-but-determined attacker from compromising a server, it will make client side exploits of MSOffice file format parsing bugs a lot harder…As a result of this, client-side bugs in MSOffice are approaching their expiration date. Not quickly, as most customers will not switch to Vista immediately, but they are showing the first brown spots, and will at some point start to smell.

See also “Economics of vulnerabilities,” and “Vulnerability Game Theory.”

Vulnerability Game Theory

So a few days ago, I attended the Vista RTM party. I spent time hanging out with some of the pen testers, and they were surprised that no one had dropped 0day on us yet. These folks did a great job, but we all know that software is never perfect, and that there are things we missed. I hope that the defense-in-depth tools (/GS, SafeSEH, ASLR, UAC) help control the customer impact.

So, that said, I’d like to think about this from the researcher point of view. If you’re a clever researcher who’s finding Vista issues, what do you do with them? I think there are three different answers.

First, if you have one, you publish it immediately. Ideally, you do that in a responsible way, but you don’t want to risk your one vuln being found independently and fixed.

Next, if you have a few vulns, you sit on them all, and try to measure the independent find rate, so you know how long they last. When you have that estimate, you decide what to do with what’s left.
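
One way to turn “measure the independent find rate” into a number is to assume each held vuln is rediscovered independently at a constant rate, which gives it an exponential lifetime. That model, the helper names, and the example numbers are assumptions for illustration, not anything a real researcher has reported.

```python
import math

def estimate_find_rate(rediscovered: int, exposure_days: float) -> float:
    """Maximum-likelihood rate under an exponential model: rediscoveries per
    vuln-day of exposure. Exposure counts each vuln from when you found it
    until it was rediscovered or until today, whichever came first."""
    return rediscovered / exposure_days

def survival_probability(rate: float, days: float) -> float:
    """Chance a still-private vuln survives another `days` without rediscovery
    (the exponential model is memoryless, so its age doesn't matter)."""
    return math.exp(-rate * days)

def median_lifetime(rate: float) -> float:
    """Days until half of a batch of similar vulns would be rediscovered."""
    return math.log(2) / rate

# Hypothetical: five vulns held for 90 days; one was independently found
# (and fixed) on day 30, the other four are still private.
exposure = 30 + 4 * 90
rate = estimate_find_rate(rediscovered=1, exposure_days=exposure)
print(f"rate: {rate:.5f} per vuln-day")
print(f"median lifetime: {median_lifetime(rate):.0f} days")
print(f"P(a held vuln survives 180 more days): {survival_probability(rate, 180):.2f}")
```

With a single rediscovery the estimate is very noisy, which is part of why sitting on a few vulns to measure it is itself a gamble.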

Finally, if you have a lot of vulns and are hoping to sell them, you drop 0day on us as a marketing and advertising ploy. Whoever releases the first working exploit against Vista is going to bring themselves a lot of notoriety, and bring our customers a lot of pain. It’s sorta cool that no one’s done this yet. Maybe they’re waiting for the release to business or consumers? That’s an interesting gamble–you’ll get more attention, but you’re also betting that no one else will take the “first vuln” credit between now and then. So the longer it takes, the larger the compliment implied by waiting: “It’s hard to find vulns, and I expect to be able to wait.”

Implied compliments aren’t all that interesting. Someone will have the first issue.

What matters isn’t the first day, it’s the first year. I think we’re pleased with the work done, know that it’s never-ending, and are optimistic that Vista’s first year is going to look substantially better than XP’s first year. That’s the first real test: do we see fewer vulns, and are the vulns of lower average severity? The second real test is what happens to real customer impact. That’s the test that matters most, and is far harder to measure.

The “Box Switching” Game


I have two boxes. Each has some positive amount of money in it, but I will give you no information about the possible dollar amounts other than the fact that one box has exactly twice the amount of money in it as the other. You randomly select one of the two boxes, open it, and find $100 inside. I now give you the option of keeping the $100 or switching boxes with me and keeping whatever’s inside the other box. Which should you choose?

This reminds me a lot of “The Wallet Game.” I’m not yet sure whether the analysis is the same. The puzzle is from Kevan Choset at the Volokh Conspiracy; read the excellent comments over there.
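
The usual trap is to reason “the other box holds $50 or $200 with equal odds, so switching is worth $125 on average.” That only works if every amount is equally likely, which no proper probability distribution can manage over unboundedly many amounts. A minimal sketch, assuming a hypothetical uniform prior over five possible smaller amounts, computes the exact expected gain from switching for each amount you might see, and for a blanket “always switch” rule:

```python
from fractions import Fraction

def switching_analysis(smaller_amounts):
    """Exact switching gains when the smaller amount is drawn uniformly from
    `smaller_amounts` and you open either box with probability 1/2.

    Returns (gain given the amount seen, overall gain of always switching)."""
    p = Fraction(1, 2 * len(smaller_amounts))  # probability of each outcome
    outcomes = []  # (amount seen, amount in the other box, probability)
    for s in smaller_amounts:
        outcomes.append((s, 2 * s, p))  # you opened the smaller box
        outcomes.append((2 * s, s, p))  # you opened the larger box

    totals = {}  # seen -> (total probability, probability-weighted other amount)
    for seen, other, w in outcomes:
        mass, weighted = totals.get(seen, (Fraction(0), Fraction(0)))
        totals[seen] = (mass + w, weighted + w * other)

    conditional = {seen: weighted / mass - seen
                   for seen, (mass, weighted) in sorted(totals.items())}
    overall = sum(w * (other - seen) for seen, other, w in outcomes)
    return conditional, overall

# Hypothetical prior: the smaller box holds $25, $50, $100, $200 or $400.
gains, always_switch = switching_analysis([25, 50, 100, 200, 400])
for seen, gain in gains.items():
    print(f"see ${seen}: switching gains ${float(gain):+.2f} on average")
print(f"always switch: ${float(always_switch):+.2f}")
```

Under this prior, seeing $100 does make switching worth $25 on average, but seeing the largest possible amount ($800) makes it cost $400, and the cases cancel exactly: always switching gains nothing.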

All boxes by KellyBeth7.

More on Risk Tolerance

There are a number of good comments on “Risk Appetite or Volatility Appetite?” and I’d like to respond to two of the themes.

The first is “risk appetite is an industry-standard term.” I don’t dispute this. I do question whether I should care. On the one hand, terms that an industry picks up and uses tend to be useful and revelatory. On the other hand, they are sometimes distortive. Risk appetite makes sense from the perspective of the financial industry, which sells products of varying riskiness. Knowing their customers’ appetite for risk makes sense. It makes sense even if that appetite rests on a false premise: that you must accept higher risk for a higher return. This is clearly false; just look at interest rates on insured savings accounts. A great deal of return is a function of information, and of the willingness to find and use it. (Admittedly, a high interest rate may correlate with moral hazard on the part of the insured bank, and you may have to accept getting your money back later.) I think that the term risk appetite is also distortive in that it influences the way people look at risk. I once caught myself looking for a risky investment, rather than one with a high expected upside. That high-reward investments often carry lots of risk doesn’t mean risk is what I’m looking for.

The second is that I misunderstand risk. That may well be true, but I think that the goal of disaggregating risk from reward is useful. Anyone who’d like to offer up a more purely disaggregated risk is free to do so. It’s an interesting thought experiment, one that’s clearly making many readers uncomfortable. That’s not my usual goal, but I’m willing to accept it now and then in exchange for a rewarding conversation.

(These dice are from NelC, too.)

Risk Appetite or Volatility Appetite?

Over at “Not Bad For A Cubicle,” Thurston (who is always worth reading) manages to tickle a pet peeve of mine in “A super-size risk appetite?” No rational business has a risk appetite. They accept risk. They may even buy risk in fairly explicit ways (some financial derivatives) if they think that those risks are mis-priced because of either asymmetric information or different risk models. No rational person has a risk appetite. Some rational people have a thrill appetite, which may include elements of risk taking. Gamblers, extreme sports devotees and idiots may all do things in search of a thrill that includes a risk of serious injury or death. That risk may even increase their thrill, but what they’re seeking is the thrill, and they take risk as part of that package.

If you think you have a risk appetite, I have a simple game for you. We flip a coin. If it lands heads, you give me a dollar. If it lands tails, you may choose to play again. This is pure risk. I’ve removed any possible gain. Feel free to play; I’ll send you my address.
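
Since the game can never pay you anything, every way of playing it has a non-positive expected value. A quick sketch, assuming a fair coin and a player who commits in advance to stop after at most `max_flips` tails (the function name is made up here):

```python
def expected_value(max_flips: int) -> float:
    """Expected dollars won by a player who stops after `max_flips` tails.

    The game ends at the first heads, when the player pays $1. Stopping after
    a tails costs nothing, so the only possible outcomes are -1 and 0."""
    p_any_heads = 1 - 0.5 ** max_flips
    return 0.0 - p_any_heads

for k in (0, 1, 2, 10):
    print(k, expected_value(k))
```

Playing longer only pushes the expectation toward -$1; the best you can do is not play.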

The picture is NelC’s “My Lucky Dice.”

[I’ve responded to some of the comments in “More on Risk Tolerance.”]

Avant-Garde: A game for three players

(From Bram Cohen and Nick Mathewson.)
The players are three reclusive artists. Their real names are Anaïs, Benoît,
and Camille, but they sign their works as “A,” “B,” and “C” respectively in
order to cultivate an aura of mystery. Every week, each artist paints a new
work in one of two styles: X and Y.

The art world despises uniformity: if all three artists paint in the same
style, their paintings don’t sell, and they get no points. If one of them
paints in a style different from the others, the different artist is
avant-garde and receives a point.

Because the artists are reclusive, the players can’t communicate with each
other. All they learn from one week to another is what style the other players
used in the previous week. (They learn this when the gallery manager passes them
the latest gossip from the art world.)

What is the ideal strategy? Clearly, it’s bad when all three paint in the same
style. If the players could communicate, they could agree to take turns being
avant-garde, so that one week A wins, the next week B wins, the next week C
wins, and so on. Also, if they could communicate, A and B could conspire to
shut out C by always using opposite styles. (If A and B always differ, C will
always match one of them, and the other will win.) But since the players can’t
communicate except through their plays, how can they arrange to coordinate in
twos or threes?

If somebody ran an iterated tournament of this game in the style of Axelrod’s
Prisoners’ Dilemma challenge, what program would you submit? (Remember that
your program would often be playing against instances of itself, without
knowing it.)

Variation: what happens when the artists are so reclusive that they won’t even
speak to their gallery manager? In this variation, they only learn whether they
won the last week or not (by checking for their check in the mail).
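
For anyone tempted to enter such a tournament, here is a minimal Python harness for the variation, where each artist learns only their own last style and whether it won. The scoring follows the rules above; the two strategies (pure coin-flipping, and “win-stay, lose-randomize”) are baselines I made up, not proposed answers, and all function names are hypothetical.

```python
import random

def score(styles):
    """One point to the lone artist whose style differs; nothing if all match.

    `styles` is a list like ["X", "Y", "X"] for artists A, B, C."""
    return [1 if all(t != s for j, t in enumerate(styles) if j != i) else 0
            for i, s in enumerate(styles)]

def coin_flip(last_style, last_won, rng):
    """Ignore history and pick a style at random."""
    return rng.choice("XY")

def win_stay_lose_randomize(last_style, last_won, rng):
    """Repeat a winning style; otherwise pick at random."""
    if last_won:
        return last_style
    return rng.choice("XY")

def tournament(strategy, weeks, seed=0):
    """Three copies of `strategy` play each other; returns total points per artist."""
    rng = random.Random(seed)
    last_styles = [None, None, None]
    last_won = [False, False, False]
    totals = [0, 0, 0]
    for _ in range(weeks):
        styles = [strategy(last_styles[i], last_won[i], rng) for i in range(3)]
        points = score(styles)
        totals = [t + p for t, p in zip(totals, points)]
        last_styles, last_won = styles, [p == 1 for p in points]
    return totals

for strategy in (coin_flip, win_stay_lose_randomize):
    totals = tournament(strategy, weeks=10_000)
    print(strategy.__name__, totals, "weeks with a winner:", sum(totals))
```

Random play produces a winner in three weeks out of four, since only two of the eight style combinations are unanimous. Passing the other artists’ previous styles into `strategy` as well turns this into the full-information version of the puzzle.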

The painting is Picasso’s Three Musicians.

Man Charged For Notifying USC of Vulnerability

Federal prosecutors charged a San Diego-based computer expert on Thursday with breaching the security of a database server at the University of Southern California last June and accessing confidential student data.

A statement from the U.S. Attorney for the Central District of California names 25-year-old Eric McCarty as the person who contacted SecurityFocus last June with news of a flaw in the Web server and database system used to accept online applications from prospective students. SecurityFocus notified the University of Southern California of the vulnerability and worked with the university to close the flaw before publishing an article about the issue.

“It wasn’t that he could access the database and showed that it could be bypassed,” said Michael Zweiback, an assistant U.S. Attorney for the U.S. Department of Justice’s cybercrime and intellectual property crimes section. “He went beyond that and gained additional information regarding the personal records of the applicant. If you do that you are going to face, like he does, prosecution.”

The clear message: Next time, don’t tell.

[Update: The story quoted is Rob Lemos, “Man Charged With Accessing USC Student Data.”]

[2nd Update: Rob Lemos has a good three page story on this, “Breach case could curtail Web flaw finders.”]

Book Review: The Stag Hunt and the Evolution of Social Structure

Brian Skyrms’ The Stag Hunt and the Evolution of Social Structure addresses a subject lying at the intersection of the social sciences, philosophy, and evolutionary biology: how it is possible for social structures to emerge among populations of selfishly-acting individuals.
Using Rousseau’s example of a Stag Hunt, in which hunters face a decision between a less-risky but less-rewarding individual hunt for hare and the more-risky but more-rewarding cooperative hunt for stag, Skyrms addresses the emergence of social structure as a product of three distinct effects:

  1. Location

  2. Signaling

  3. Association

Two chapters on each of these, plus an initial chapter introducing the stag hunt in elementary game-theoretic terms and describing its relevance to the task at hand, comprise this thoroughly enjoyable 150-page volume.
Readers like myself, who approach Skyrms’ book having read Axelrod’s The Evolution of Cooperation (or much of the voluminous literature it spawned), will hesitate at Skyrms’ choice of an assurance game (as the stag hunt is known in more prosaic circles) to model the growth of societal organization, preferring the familiar Prisoners’ Dilemma. Drawing on the political philosophy of Hume, on recent re-examination of John Maynard Smith’s haystack model of the evolution of altruism, and on experimental economics, Skyrms justifies his choice in the first chapter.
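
For readers coming from the Prisoners’ Dilemma, the difference can be seen in a brute-force search for pure-strategy equilibria. The payoff numbers below are illustrative choices, not necessarily Skyrms’ own, and `pure_nash` is a made-up helper. In the stag hunt, mutual cooperation is itself an equilibrium (the problem is assurance, not temptation), while in the Prisoners’ Dilemma only mutual defection survives.

```python
from itertools import product

def pure_nash(payoff):
    """Pure-strategy Nash equilibria of a symmetric two-player game.

    `payoff[(mine, theirs)]` is my payoff; the game is symmetric, so the
    same table serves both players."""
    actions = sorted({mine for mine, _ in payoff})
    equilibria = []
    for a, b in product(actions, repeat=2):
        a_best = all(payoff[(a, b)] >= payoff[(alt, b)] for alt in actions)
        b_best = all(payoff[(b, a)] >= payoff[(alt, a)] for alt in actions)
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria

# Illustrative payoffs: hunting hare pays 3 no matter what; stag pays 4,
# but only if the partner also hunts stag.
stag_hunt = {("stag", "stag"): 4, ("stag", "hare"): 0,
             ("hare", "stag"): 3, ("hare", "hare"): 3}

# A textbook Prisoners' Dilemma for comparison (C = cooperate, D = defect).
prisoners_dilemma = {("C", "C"): 3, ("C", "D"): 0,
                     ("D", "C"): 5, ("D", "D"): 1}

print("stag hunt:", pure_nash(stag_hunt))
print("prisoners' dilemma:", pure_nash(prisoners_dilemma))
```

Which of the two stag hunt equilibria a population reaches is the question the rest of the book takes up.
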
Next, Skyrms discusses the relevance of Location, as egoistic actors repeatedly play divide-the-dollar against randomly-selected partners, and against neighbors arrayed on a lattice (as in, for example, xlife). In the latter scenario, rapid movement toward a “just” equilibrium of even division is observed. Here, as throughout the book, Skyrms reinforces the timeless relevance of the theme he treats (in this chapter, with allusions to discussions of distributive justice by Aristotle and Kant). This tactic runs the risk of distracting the reader, or of making the writer seem like a name-dropper or pedant, but Skyrms uses it to very positive effect.
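
The review contrasts random pairing with play among lattice neighbors. As a sketch of the random-pairing baseline only (discrete replicator dynamics with three possible demands; this is an illustration, not Skyrms’ code, and it leaves out the lattice entirely), divide-the-dollar looks like this:

```python
from fractions import Fraction

# Possible demands, as fractions of the dollar: modest, fair, greedy.
DEMANDS = (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3))

def payoff(mine, theirs):
    """Each player gets their demand if the two demands fit in one dollar."""
    return mine if mine + theirs <= 1 else 0

def replicator_step(shares):
    """One generation of discrete replicator dynamics under random pairing:
    each demand's population share grows in proportion to its average payoff."""
    fitness = [sum(float(payoff(d, e)) * s for e, s in zip(DEMANDS, shares))
               for d in DEMANDS]
    mean = sum(f * s for f, s in zip(fitness, shares))
    return [s * f / mean for s, f in zip(shares, fitness)]

def evolve(shares, generations=500):
    """Iterate the replicator map from the given shares of (1/3, 1/2, 2/3)."""
    for _ in range(generations):
        shares = replicator_step(shares)
    return shares

print("mostly fair start:", [round(s, 3) for s in evolve([0.2, 0.6, 0.2])])
print("greedy/modest start:", [round(s, 3) for s in evolve([0.5, 0.01, 0.49])])
```

Starting from a population where the fair demand already predominates, fair division takes over. Starting near an even mix of greedy (2/3) and modest (1/3) demanders, the population stays stuck in that unequal mixture, where fair demanders earn only 1/4 against everyone else’s 1/3 and cannot invade. That trap under random pairing is what makes the lattice result striking.
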
In the book’s next chapter, Skyrms discusses the dynamics of local interaction in a stag hunt among actors with different degrees and kinds of knowledge about the previous successes of others. This fills out the picture of how spatial structure affects the macro-level outcome. Since I read this chapter while waiting for a plane, I focussed less on the details and more on the main idea, which is that outcomes vary depending on how widely actors look when deciding whom to imitate, and on how small the neighborhood of actors they may interact with is. Here, the book’s first part ends.
Part II concerns Signals. The second of its two chapters considers the evolutionary dynamics of a stag hunt with “cheap talk” — a player’s strategy is not only whether to hunt stag or hare, but also what signal to send, and how to respond to signals he receives. The preceding chapter concerns itself with the development of social conventions, using as its first example language itself. How can language have come about, since the only way to communicate the extremely complex convention which speech represents is via speech itself? In considering this, Skyrms draws on David Lewis and presents in 14 pages a demonstration of how a system of logical inference can evolve, presupposing nothing (such as rationality, intentionality) that has not been observed at the level of a bacterium! That is cool.
The book’s third and final part concerns Association. In the first of its chapters, actors’ strategies are fixed (in contrast with the book up to this point, in which they evolve), and the interaction patterns among actors are allowed to evolve. Will groups of “friends” form? Will they be long-lived or ephemeral? How does this depend on chance, or on how long actors remember good times and slights? Interesting reading, but by now one’s expectations are high! The final chapter considers simultaneously evolving strategies and interaction structures.
I enjoyed this book immensely. Its power derives from its inter-disciplinary foundation, its unflagging clarity of exposition, and the sheer magnitude of the question it tries (with some success!) to answer.
Inasmuch as the ubiquity of the computer, and the interconnectedness it affords so many people, have focussed attention on the sorts of issues discussed in this small but important volume, Skyrms has produced a work directly relevant to most of those who are reading this (here is proof(?)).
Personally, I feel the value transcends mere pragmatic utility.

The Wallet Game

At lunch after Shmoocon, Nick Mathewson said he’d like to pay something between zero and the amount of money in his wallet. I think this suggests a fascinating game: Alice asks Bob for some amount of money. If Bob has that much money in his wallet, he pays. Otherwise, Alice pays him the amount asked for. How much should Alice ask for?

The more she asks for, the more likely she is to pay that amount. [Updated: That used to say ‘less likely.’] The more information Alice has about Bob, the better off she is. (If she has just seen Bob take a fat wad of bills from the ATM, for example.)

R.G., R.D., and noise suggested that if Alice challenges Bob to the game, Bob should be able to choose if he will ask or be asked.

What is Alice’s optimal strategy, absent special information about Bob’s circumstances? Does the non-continuous nature of US currency change things? What if everyone were carrying coins of a single denomination? Does iteration change things?
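
Alice’s problem only has an answer once she holds some belief about Bob’s wallet, so “absent special information” really means “with some generic prior.” If she asks for a and Bob can pay with probability P(a), her expected gain is a·(2P(a) − 1), so she never wants to ask for an amount Bob is less likely than not to have. A minimal sketch, assuming Bob’s wallet is equally likely to hold any of ten made-up whole-dollar amounts (the helper names are hypothetical too):

```python
from fractions import Fraction

def expected_gain(ask, wallets):
    """Alice's expected gain when Bob's wallet is equally likely to hold any
    amount in `wallets`: she wins `ask` if he can pay, else she pays `ask`."""
    p_bob_pays = Fraction(sum(1 for w in wallets if w >= ask), len(wallets))
    return ask * (2 * p_bob_pays - 1)

def best_ask(wallets):
    """Best ask and its expected gain. Between two wallet amounts the chance
    Bob can pay stays the same while the ask grows, so only the wallet
    amounts themselves need checking."""
    candidates = sorted(set(wallets))
    return max(((a, expected_gain(a, wallets)) for a in candidates),
               key=lambda pair: pair[1])

# Hypothetical wallet contents, in dollars.
wallets = [0, 5, 12, 20, 20, 35, 40, 60, 100, 200]
for ask in (5, 12, 20, 35, 40):
    print(f"ask ${ask}: expected gain ${float(expected_gain(ask, wallets)):+.2f}")
ask, gain = best_ask(wallets)
print(f"best ask: ${ask}, expected gain ${float(gain):.2f}")
```

With these made-up wallets the best ask is $20, for an expected gain of $8; asking $35 or more breaks even at best.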

(Jenlight’s Duct tape wallet is from Flickr.)