[Update: Michael Zimmer points out that it wasn’t Facebook, but outside researchers who released the data.]
I wanted to comment quickly on an interesting post by Michael Zimmer, “On the ‘Anonymity’ of the Facebook Dataset.” He discusses how a group of researchers has released a dataset of Facebook profile information from a group of college students for research purposes, which I know a lot of people will find quite valuable.
Of course, this sounds like an AOL-search-data-release-style privacy disaster waiting to happen. Recognizing this, the researchers detail some of the steps they’ve taken to try to protect the privacy of the subjects, including:
- All identifying information was deleted or encoded immediately after the data were downloaded.
- The roster of student names and identification numbers is maintained on a secure local server accessible only by the authors of this study. This roster will be destroyed immediately after the last wave of data is processed.
In the comments, Jason Kaufman implies that the data really isn’t that private, asking what could go wrong and why someone would post it to Facebook expecting it to remain private.
I have just one question about all of this: if the data isn’t private, why did they attempt to anonymize it?
I believe they attempted to anonymize it because it’s fairly obvious that the data is private, and releasing it with names attached would be pretty shocking. As Michael Zimmer says, “we really need to keep working on a new set of Internet research ethics and methodologies.”
Also, don’t miss Michael Zimmer’s followup post, “More on the anonymity of the Facebook dataset: It’s Harvard College.”