Google tracks users online, Facebook monitors likes, Amazon follows people down its sales funnel, and the list continues with an army of data brokers. Even security companies have been involved in the data collection business. A recent NYTimes report found hundreds of trackers following its journalist as he visited 47 sites in a day. And regardless of their size, these companies all want you to believe that the data they have collected about you has been safely anonymized.
Any time you pick up your smartphone or type a website address into the browser, you leave a trail of data behind. This flood of data includes information about the device itself, such as its unique MAC address, operating system, and even a fingerprint of the display settings. Then there is information about your location via the IP address and GPS data, the sites visited, and the timestamp on all of these activities. This is without even factoring in all of the data you might have entered into various websites and your online purchases.
These bits of information are being picked up, recorded, and resold. The major players are Google, Facebook, Amazon, Twitter, and lesser-known names such as AppNexus. While the sheer amount of data and number of trackers may seem daunting, these disparate tracking activities are tied together by a handful of unique identifiers that can link individual online activities to you and your device. It’s not just Google that tracks you across its various services and on to individual websites; Google is simply the most advanced at it.
Given this huge pool of data, the basics of anonymization are clear, but the precise process and criteria for achieving it are not. The basic idea is that Personally Identifiable Information (PII), such as social security or driver’s license numbers, is permanently removed from the data. Several processes (some of them even patented, like those used by Avast’s Jumpshot unit) are used to find and remove such personal information.
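As a rough illustration of what removing direct identifiers can look like, the sketch below drops a set of PII fields from a single record. The field names and the `strip_pii` helper are hypothetical, chosen only for this example; this is not any vendor's actual (or patented) process.

```python
# Hypothetical field names; real datasets and real PII rules will differ.
DIRECT_IDENTIFIERS = {"name", "ssn", "drivers_license", "email"}


def strip_pii(record: dict) -> dict:
    """Return a copy of the record without the direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


# Made-up example data.
record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "zip_code": "02139",
    "gender": "F",
    "birth_date": "1985-07-14",
    "sites_visited": 47,
}

anonymized = strip_pii(record)
print(anonymized)
```

Fields such as zip code, gender, and birth date survive this kind of scrub, because they are not identifying on their own.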
Removing PII is also important for financial reasons. Once the PII has been identified and removed, the database is no longer protected by privacy laws like the EU’s GDPR or the California Consumer Privacy Act (CCPA) and can be freely used or resold on the open market. Fair enough, there is no personally incriminating data about anyone in this database, right?
Wrong. The problem is that multiple studies show how simply and accurately individuals can be “reidentified” or “deanonymized” from anonymized, unprotected data. Researchers from the Université catholique de Louvain in Belgium and Imperial College in the UK found that 99.98% of Americans could be correctly reidentified in any available ‘anonymized’ dataset using just 15 characteristics, including age, gender, and marital status. A person can end up in these datasets simply by living: being born, shopping, surfing online, or driving a car.
“We’re often assured that anonymization will keep our personal information safe,” said paper co-author Dr Julien Hendrickx from UCLouvain. “Our paper shows that de-identification is nowhere near enough to protect the privacy of people’s data.”
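One simple way such reidentification can happen is a linkage attack: joining an ‘anonymized’ dataset with a second dataset that still carries names, using the attributes both have in common. The sketch below is a toy version with made-up records and hypothetical field names. It is not the statistical model the researchers used, only an illustration of why leftover attributes matter.

```python
from collections import defaultdict

# All records below are made up for illustration.
anonymized_rows = [
    {"zip_code": "02139", "gender": "F", "birth_date": "1985-07-14", "diagnosis": "asthma"},
    {"zip_code": "02139", "gender": "M", "birth_date": "1990-01-02", "diagnosis": "diabetes"},
    {"zip_code": "94105", "gender": "F", "birth_date": "1985-07-14", "diagnosis": "migraine"},
]

# A hypothetical second dataset that still includes names.
public_rows = [
    {"name": "Jane Doe", "zip_code": "02139", "gender": "F", "birth_date": "1985-07-14"},
    {"name": "John Roe", "zip_code": "02139", "gender": "M", "birth_date": "1990-01-02"},
]

QUASI_IDENTIFIERS = ("zip_code", "gender", "birth_date")


def key(row):
    """Combine the shared attributes into one lookup key."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def reidentify(anonymized, public):
    """Pair a name with an anonymized row when the key is unique on both sides."""
    names_by_key = defaultdict(list)
    for row in public:
        names_by_key[key(row)].append(row["name"])

    rows_by_key = defaultdict(list)
    for row in anonymized:
        rows_by_key[key(row)].append(row)

    matches = []
    for k, rows in rows_by_key.items():
        names = names_by_key.get(k, [])
        if len(rows) == 1 and len(names) == 1:
            matches.append((names[0], rows[0]))
    return matches


matches = reidentify(anonymized_rows, public_rows)
for name, row in matches:
    print(name, "->", row["diagnosis"])
```

In this toy data, two of the three ‘anonymized’ rows get a name attached, even though no name, social security number, or email was ever stored alongside the diagnosis.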
The more researchers look, the fewer data points are really required to strip off the veneer of anonymization. For these researchers, only 15 points were needed, and each of these points was taken from anonymized databases. Back in the 1930s, Edmond Locard needed 12 points to make a unique fingerprint identification from a real person. And even fewer details are needed when you have a smartphone. Other researchers from MIT and UCLouvain found that only four spatio-temporal points (your approximate location at an approximate time) are needed to identify 95 percent of people.
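To make the idea of spatio-temporal points concrete, the sketch below estimates how often a handful of randomly chosen (place, hour) points from one person's trace match nobody else in the dataset. The traces, cell names, and the `unicity` helper are all made up for illustration; this is a simplified take on the idea, not the researchers' code or data.

```python
import random

# Made-up traces: each user maps to a set of (cell id, hour of day) points.
traces = {
    "user_a": {("c1", 8), ("c2", 9), ("c3", 12), ("c4", 18), ("c5", 22)},
    "user_b": {("c1", 8), ("c2", 9), ("c3", 12), ("c6", 18), ("c7", 22)},
    "user_c": {("c1", 8), ("c2", 9), ("c8", 12), ("c4", 18), ("c7", 22)},
    "user_d": {("c9", 8), ("c2", 9), ("c3", 12), ("c4", 18), ("c5", 23)},
}


def matching_users(points, traces):
    """Users whose trace contains every one of the given points."""
    return [user for user, trace in traces.items() if points <= trace]


def unicity(traces, p, trials=200, seed=0):
    """Fraction of trials in which p random points from a user's trace
    match that user and nobody else."""
    rng = random.Random(seed)
    users = sorted(traces)
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        points = set(rng.sample(sorted(traces[user]), p))
        if matching_users(points, traces) == [user]:
            unique += 1
    return unique / trials


for p in (1, 2, 3, 4):
    print(p, unicity(traces, p))
```

In this toy data no two users share more than three points, so any four points always single out one person, while a single point often does not.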
The researchers built an easy-to-use tool that calculates the likelihood of your being outed in an anonymized dataset, without your having to touch any of the statistics. This machine-learning-powered tool starts off with you entering your zip code, gender, and date of birth. Those three details are enough to identify a person 81% of the time in an anonymized dataset. As you add more details, the tool shows you the increased probability of being identified.
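The effect of adding details can be seen even with a plain count. The sketch below measures what share of records in a small, made-up sample become unique as more attributes are considered. It is only a counting exercise on hypothetical data, not the machine-learning model behind the researchers' tool, which estimates uniqueness across a whole population.

```python
from collections import Counter

# Made-up records for illustration.
people = [
    {"zip_code": "02139", "gender": "F", "birth_date": "1985-07-14"},
    {"zip_code": "02139", "gender": "F", "birth_date": "1990-03-01"},
    {"zip_code": "02139", "gender": "M", "birth_date": "1985-07-14"},
    {"zip_code": "94105", "gender": "F", "birth_date": "1985-07-14"},
    {"zip_code": "94105", "gender": "M", "birth_date": "1972-11-30"},
    {"zip_code": "94105", "gender": "M", "birth_date": "1972-11-30"},
]

ATTRIBUTES = ("zip_code", "gender", "birth_date")


def unique_fraction(rows, attributes):
    """Share of rows whose combination of the given attributes appears only once."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)


for n in range(1, len(ATTRIBUTES) + 1):
    attrs = ATTRIBUTES[:n]
    print(attrs, round(unique_fraction(people, attrs), 2))
```

On this sample, zip code alone singles out no one, zip code plus gender singles out a third of the records, and all three attributes together single out two thirds.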
The only conclusion is that, thanks to the rampant collection of private data, you are far less anonymous both online and off than you ever dreamed possible.
“Data is a toxic asset and saving it is dangerous,” wrote Bruce Schneier, a public-interest technologist, in a blog post. To limit your exposure to this toxicity, you can reduce your own digital footprint by using tools such as a VPN and blocking trackers. In addition, you can pick a security partner that designs its tools not to vacuum up your data and that has a long-term policy of not selling user data. The best way for a company to protect your private data is to simply not have it.