How powerful can data be when compiled and cross-linked?
AOL's recent release of search data collected from its users over a two-month period is quite controversial.
On one hand, a lot of people argue that it's a violation of personal privacy, simply because people can compile and link the data and trace it back to a specific person, no matter how carefully the data is masked. Supporting evidence can be drawn from today's news: "one lady's identity has been compromised using the search data".
On the other hand, this data is extremely helpful for research purposes. Here is one article supporting this: "Your online life is not private".
Coincidentally, the theme of today's group meeting is "Research Ethics". We are also concerned about this, mainly because we do data mining (MSR, mining software repositories). Although it is not on the scale of watching people's backs, the results obtained from cross-linking can be quite intriguing as well. For example, by looking at the development history, we can pinpoint individuals: which developer produces the most bugs, which developer is the least productive, or who is the least popular guy (a.k.a. the person whose posts on the mailing list always get ignored) ... People can get extremely annoyed if this information gets published.
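To make the cross-linking concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the commit log, the bug list, the names `commits`, `bug_origins` and `bugs_per_developer`, and the rule that a bug is blamed on whoever authored the commit assumed to have introduced it. It is not how any particular MSR tool works.

```python
from collections import Counter

# Hypothetical data: commit id -> author, as might be extracted from a repository log.
commits = {
    "c1": "alice",
    "c2": "bob",
    "c3": "alice",
    "c4": "carol",
}

# Hypothetical data: bug id -> commit assumed to have introduced the bug.
bug_origins = {
    "bug-101": "c1",
    "bug-102": "c3",
    "bug-103": "c2",
}

def bugs_per_developer(commits, bug_origins):
    """Cross-link bugs to authors through the commit linked to each bug."""
    counts = Counter()
    for bug_id, commit_id in bug_origins.items():
        author = commits.get(commit_id)
        if author is not None:  # skip bugs whose commit is unknown
            counts[author] += 1
    return counts

# Rank developers by the number of bugs attributed to them.
ranking = bugs_per_developer(commits, bug_origins).most_common()
print(ranking)
```

Neither table names a "buggiest developer" on its own. The ranking only appears once the two are joined, which is exactly why publishing it can upset people.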
Well, one subtle difference from the search data is that our data is already publicly available: the source code repositories. We just don't want our published work to anger anyone and consequently cause legal trouble. In the worst case, we can argue that since all the data is publicly available, open source developers should consider themselves public figures. Therefore, there is no violation of ethics! But well ...
With all that said, here is my stand: let's take the middle ground.
Ethics is about protecting people. Being constantly watched is really uncomfortable, and publicly distributing the search data can be harmful. What if the data is shared in a controlled manner? For example, it could be distributed only among researchers who have signed a consent form that states the proper use of the data. After all, even banks share credit information.
The immediate question is: so what's the proper use of the data?
Wednesday, August 09, 2006