What if shopper anonymity isn’t possible?

A new study led by researchers at the Massachusetts Institute of Technology found that even with names and account numbers deleted from credit card records, just four other pieces of information were enough to re-identify 90 percent of the individual shoppers.

The researchers analyzed transactions made by 1.1 million people in 10,000 stores over three months. Although the information had been "anonymized" by removing names and account numbers, each purchase made by the same credit card was tagged with the same random identification number.

The researchers had access to the locations, dates, and prices of someone’s non-anonymous purchases. (Prices were only given in a range.) That information was matched with publicly available information such as Instagram or Twitter posts to re-identify 90 percent of the shoppers as unique individuals and to uncover their records. The authors defined shoppers’ uniqueness around their purchase behavior as their "unicity."
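As a rough illustration of the idea, the sketch below estimates unicity on a tiny invented data set: for each pseudonymous card ID it draws a few known purchases at random and checks whether they match only that card. The data, the names `TRACES`, `matches` and `unicity`, and the sampling scheme are assumptions made for illustration, not the study's actual code or data.

```python
import random

# Toy data (invented): pseudonymous card ID -> set of (shop, day) purchases.
TRACES = {
    "u1": {("cafe", 1), ("jeans", 2), ("pizza", 3), ("books", 4)},
    "u2": {("cafe", 1), ("jeans", 2), ("pizza", 5), ("books", 4)},
    "u3": {("cafe", 1), ("grocer", 2), ("pizza", 3), ("gym", 6)},
}


def matches(traces, known_points):
    """Return the card IDs whose trace contains every known purchase."""
    return [uid for uid, trace in traces.items() if known_points <= trace]


def unicity(traces, n_points, trials=200, seed=0):
    """Estimate the share of draws in which n_points known purchases
    single out exactly one card (a hypothetical, simplified measure)."""
    rng = random.Random(seed)
    unique = 0
    total = 0
    for uid, trace in traces.items():
        if len(trace) < n_points:
            continue  # not enough purchases to draw from
        for _ in range(trials):
            # Simulate an outsider who knows n_points of this person's purchases.
            known = set(rng.sample(sorted(trace), n_points))
            total += 1
            if matches(traces, known) == [uid]:
                unique += 1
    return unique / total if total else 0.0


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "known purchases ->", round(unicity(TRACES, n), 2))
```

In this toy set a single known purchase rarely singles anyone out, while all four always do, which mirrors the direction of the study's finding, though not its numbers.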

"You bought a coffee at that coffee shop, and you bought jeans at that shop, and then you bought a pizza," lead author Yves-Alexandre de Montjoye told theverge.com as an example.

The study, "Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata," was published in the journal Science.

Women and people with higher incomes were easiest to identify, apparently due to their more diverse shopping patterns. Without price information, two data points were still enough to identify more than 40 percent of the people in the data set.

Similar findings about metadata have been reported before. Two years earlier, the same researchers analyzed mobile-phone records and reached similar results. In 2008, computer scientists were able to re-identify some Netflix users in a database of nameless customer records. Last fall, Kourtney Kardashian, Ashlee Simpson and other celebrities were re-identified through an "anonymized" database of taxi ride records made public by New York City’s Taxi and Limousine Commission.

The new study seemed to reinforce those findings, with the tie-in to credit cards bringing the issue closer to retail. Keeping identities anonymous in this way is also expected to become more challenging as more behavioral data becomes available.

The authors agreed that the analysis of large data sets around people’s behavior offers insights that can help improve public health, city planning, education, and economic policy. The NSA reportedly mines credit card, e-mail and phone metadata to track down terrorists and criminals.

But the researchers said companies or institutions making such data sets widely available should be quantifying the risks of re-identification.

"The old model of anonymity doesn’t seem to be the right model when we are talking about large-scale metadata," Mr. de Montjoye told The New York Times.

Discussion Questions

Do the difficulties of preserving anonymity under metadata analysis present challenges for the use of Big Data? Will the possibility of re-identification become a major issue for consumers?

13 Comments
Dr. Stephen Needel
9 years ago

I don’t think it’s a challenge for the USE of Big Data assuming Big Data users aren’t putting out the raw, merged data, which makes it easy for someone else to identify the shopper. I do think that the day of reckoning for data privacy will come (and I hope soon). At some point, shoppers are going to go off the grid, because the benefits they’re getting for sharing data are meager.

Joan Treistman
9 years ago

Identification is a major issue for consumers, whether it’s re-identification or identification fraud. We are at risk. I’m not convinced corporations are making this a priority. Apparently hacking has a broader definition and maintaining privacy cannot be tackled on an individual basis.

We read about companies not maximizing their Big Data resources. But we don’t read about corporations that strategize to protect those resources. There’s a storm brewing. It’s just a matter of how it gets started.

If companies take the lead, alone or with some consortium, it will be a message to all that this is a serious issue to those who have a stake in the outcome, i.e., trust from their loyal customers. If voters take the lead, companies will have less control of how to protect their Big Data resources.

Adrian Weidmann
9 years ago

Re-identification has been used for many years. One large retailer based in Minneapolis has been aligning car license plates in the parking lots with credit card purchases since the early 2000s. All those cameras that you may think are there to capture criminal activity are being used to also monitor your whereabouts, activities and purchases. The latest companies that are leveraging their embedded technology infrastructure into the Big Data game are IBM, NCR and Cisco. We’re getting dangerously close to the scene in the 2002 film Minority Report with Tom Cruise that became the reference point for many of us promoting in-store technologies. Anderton was offered an AMEX credit card and a Guinness as he walked by a digital display in a future mall environment.

The information is out there. The question is why, how, when, what and where will brands and retailers choose to data mine this pile of information to offer you, the shopper, a valued offering? What will be the cost of getting it wrong?

Chris Petersen, PhD
9 years ago

Consumers have given up a lot of identity information for the sake of convenience. That will quickly reverse if there are more stories like this one.

This report is particularly interesting in that it is not about “theft” per se. It is about the power of Big Data analytics. Most consumers don’t realize the power of Big Data to pinpoint and predict their shopping patterns, or what that might mean to their privacy.

At the end of the day, the prediction of the demise of cold hard cash is premature. Consumers have the final vote. And if they do perceive a threat, they will vote with cash and not their plastic.

Ian Percy
9 years ago

No, right now anonymity is not possible. Using face recognition and GPS location in a cell phone we can track down data on a total stranger walking by. As one article put it, this is the stalker’s dream come true.

Privacy and security can come only by design of the source code itself. It cannot be obtained after the fact. Canadians are global leaders in this space and I refer you to the Privacy by Design website where the “7 Foundational Pillars” of privacy are described. You’ll find a treasure trove of privacy research there too.

Larry Negrich
9 years ago

There is a numbness about the right to privacy that continues to slowly seep into the consumer mentality. These consumers are tracked by mobile phones, cars, apps, websites and search engines, all with their permission. Large numbers of people “share” their location, purchases, habits, etc., via multiple social media channels. Why would these data-sharing consumers care about something such as metadata analysis when they voluntarily offer up more precise data with each virtual and physical interaction?

Gib Bassett
9 years ago

I think there are a couple of dimensions here. One, it illustrates the importance of receiving permission to communicate with consumers through their preferred channels and clarifying how their accrued interaction history is used. Having a clear privacy policy and consumer-accessible preference centers online/available via mobile is key. So is painting a very clear vision for how the relationship and individual insights will be used to provide a tailored and higher-quality customer experience. I think that’s becoming fundamental.

And so if you are able to resolve the identity of someone to transactions, and use this correlation to build a profile that helps you serve your customer better, that’s a good thing. It seems that analytics people and marketers alike tend to skip over the permission and usage steps, straight to what data can be collected and used to target offers and deals to either individuals or segments. The consumer advocate lens then shifts to how retailers are using data in creepy and borderline unethical manners to sell more products. I think retailers and their analytics teams need to step back and work with other areas of the business on clarifying and communicating their overall customer experience value proposition—why shop with you as opposed to anywhere else, and what are you going to do to convince me in a transparent way?

Cathy Hotka
9 years ago

The timing of this story is interesting, given the kerfuffle over the Samsung smart-TV.

For years customers have been told that personal information in the hands of retailers will produce better service. At some point that better service had better show up. We’re rapidly approaching the “creepy” stage, with little to show for it.

Vahe Katros
9 years ago

This MIT study was the topic of discussion by Ben Hunt, the Chief Risk Officer at Salient Partners. He’s smart, and I’ve pulled some entertaining quotes and a section from that discussion. See it all here.

If I were to extrapolate what he’s saying, I think we should be worried for reasons we don’t yet understand.

We kill people based on metadata.
– Gen. Michael Hayden, former head of the NSA and CIA

In the future, everyone will be anonymous for 15 minutes.
– Banksy (2006)

Bene vixit, bene qui latuit. (To live well is to live concealed)
– Ovid (43 BC – 18 AD)

The most sacred thing is to be able to shut your own door.
– G.K. Chesterton (1874 – 1936)

“In exactly the same way that we have given away our personal behavioral data to banks and credit card companies and wireless carriers and insurance companies and a million app providers, so are we now being tempted to give away our portfolio behavioral data to mega-banks and mega-asset managers and the technology providers who work with them. Don’t worry, they say, there’s nothing in this information that identifies you directly. It’s all anonymous. What rubbish! With enough anonymous portfolio behavioral data and a laughably small IT budget, any competent magician can design a Big Data system that can predict with 90% accuracy what you will buy and sell in your account, at what price you will buy and sell, and under what external macro conditions you will buy and sell. Every day these private data sets at the mega-market players get bigger and bigger, and every day we get closer and closer to a Citadel or a Renaissance perfecting their Inference Machine for the liquid capital markets. For all I know, they already have.”

Peter J. Charness
9 years ago

Back to cash. It’s really a question of what the odds are of someone doing something harmful with all that data. If the worst that happens is a mistargeted coupon, then “who cares” that a computer can identify where I’ve been, what I bought…(did I really say that?). If that information can be used in a malicious fashion, then it will matter. As with many things technology has helped, the cost of finding out all about you has gone way down. Fifty years ago one would have had to hire private detectives to glean this same information; today it’s available with the swipe of a card.

Bernice Hurst
9 years ago

On my recent visit to the US, I used a pay as you go cellphone plus a credit card that I only use in the US. I wonder if someone is now going to track me down and try to find out where I am and why I behave that way….

Gordon Arnold
9 years ago

Typically the financial software used in security, business continuity and disaster recovery testing like this is sampled from an approved Internal Revenue Service and United States Treasury Department recommended list of participants. Since the government is usually funding these inquiries, it gets to choose what is approved for testing.

In the land of transaction software and bookkeeping there are several variants that are both legal to use as well as deliberately designed to be easy to use and highly secure. Contrary to popular belief, the government does like it when transaction details are very easy to manipulate for investigation purposes. Sometimes this is a good thing for the public, but it is prolonging an identity theft crisis that retail, insurance and banking can no longer tolerate. When something is broken, it must be fixed or replaced. It is now time for something new that we know works.

Ralph Jacobson
9 years ago

With the volume of consumers more than willing to sign up for merchant programs of myriad types, I think there’s a vocal minority that continues to plague the industry with a false desire of shopper anonymity.
