What if shopper anonymity isn’t possible?
A new study led by researchers at the Massachusetts Institute of Technology found that even with names and account numbers deleted from credit card records, just four other pieces of information were enough to re-identify 90 percent of individual shoppers.
The researchers analyzed transactions made by 1.1 million people in 10,000 stores over three months. Although the information had been "anonymized" by removing names and account numbers, each purchase made by the same credit card was tagged with the same random identification number.
The researchers had access to the locations, dates, and prices of a person’s known, non-anonymous purchases. (Prices were given only as ranges.) That information was matched with publicly available information such as Instagram or Twitter posts to re-identify 90 percent of the shoppers as unique individuals and to uncover their records. The authors called the uniqueness of a shopper’s purchase behavior their "unicity."
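As a rough illustration of what a unicity measurement involves, the sketch below counts how often a handful of known purchases matches exactly one record in a data set. It is a minimal toy under stated assumptions, not the authors' method: the `unicity` function, the `traces` data, and the (shop, day) representation of a purchase are all hypothetical, and it checks every possible combination of points, which is only practical for very small data sets.

```python
from itertools import combinations

def unicity(traces, k):
    """Share of k-point combinations that match exactly one record.

    traces: dict mapping a pseudonymous ID to a set of purchases,
            each purchase a hashable tuple such as (shop, day).
    k:      number of purchases assumed to be known from outside sources.
    """
    unique = 0
    total = 0
    for points in traces.values():
        # Try every way of knowing k of this shopper's purchases.
        for known in combinations(sorted(points), k):
            known = set(known)
            # Count the records that contain all of the known purchases.
            matches = sum(1 for other in traces.values() if known <= other)
            if matches == 1:
                unique += 1
            total += 1
    return unique / total if total else 0.0

# Hypothetical toy data: three "anonymized" shoppers, three purchases each.
traces = {
    "id_1": {("coffee shop", 1), ("jeans shop", 2), ("pizzeria", 3)},
    "id_2": {("coffee shop", 1), ("jeans shop", 2), ("bookstore", 4)},
    "id_3": {("coffee shop", 1), ("pizzeria", 3), ("bookstore", 4)},
}

for k in (1, 2, 3):
    print(k, unicity(traces, k))
```

In this toy data, a single known purchase never singles anyone out, while three known purchases always do. Adding a price range to each tuple would be one way to model the extra detail that price information provides.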
"You bought a coffee at that coffee shop, and you bought jeans at that shop, and then you bought a pizza," lead author Yves-Alexandre de Montjoye told theverge.com as an example.
The study, "Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata," was published in the journal Science.
Women and people with higher incomes were easiest to identify, apparently due to their more diverse shopping patterns. Without price information, two data points were still enough to identify more than 40 percent of the people in the data set.
Similar findings about metadata have been reported before. Two years earlier, the same researchers analyzed mobile-phone records and reached similar results. In 2008, computer scientists were able to re-identify some Netflix users in a database of nameless customer records. Last fall, Kourtney Kardashian, Ashlee Simpson and other celebrities were re-identified through an "anonymized" database of taxi ride records made public by New York City’s Taxi and Limousine Commission.
The new study appears to reinforce those findings, with the tie-in to credit cards bringing the issue closer to retail. Keeping identities anonymous in this way is also expected to become more challenging as more behavioral data becomes available.
The authors agreed that the analysis of large data sets on people’s behavior offers insights that can help improve public health, city planning, education, and economic policy. The NSA reportedly mines credit card, e-mail and phone metadata to track down terrorists and criminals.
But the researchers said companies or institutions making such data sets widely available should quantify the risks of re-identification.
"The old model of anonymity doesn’t seem to be the right model when we are talking about large-scale metadata," Mr. de Montjoye told The New York Times.
- Unique in the Shopping Mall: On the reidentifiability of credit card metadata – MIT
- With a Few Bits of Data, Researchers Identify ‘Anonymous’ People – The New York Times (tiered sub.)
- Just four credit card clues can identify anyone – New Scientist
- Analysis: It’s surprisingly easy to identify individuals from credit-card metadata. – MIT
- Your shopping habits are one in a million, literally – The Verge
- People identified through credit-card use alone – Nature
Do the difficulties of preserving anonymity in metadata analysis pose problems for the use of Big Data? Will the possibility of re-identification become a major issue for consumers?