How “anonymous” wifi data can still be a privacy risk

The thorny issue of tracking location data without risking individual privacy is neatly illustrated by a Freedom of Information (FOI) request asking London’s transport regulator, Transport for London (TfL), to release the “anonymized” dataset it generated during a four-week trial last year, when it tracked metro users in the UK capital via wi-fi nodes and the MAC addresses of their smartphones as they traveled around its network.

At the time TfL announced the pilot it said the data collected would be “automatically de-personalised”. Its press release further added that it would not be able to identify any individuals.

It said it wanted to use the data to better understand crowding and “collective travel patterns” so that “we can improve services and information provision for customers”.

(Though it’s since emerged TfL may also be hoping to generate additional marketing revenue using the data — by, a spokesman specifies, improving its understanding of footfall around in-station marketing assets, such as digital posters and billboards, so not by selling data to third parties to target digital advertising at mobile devices.)

Press coverage of the TfL wi-fi tracking trial has typically described the collected data as anonymized and aggregated.

Those Londoners not wanting to be tracked during the pilot, which took place between November 21 and December 19 last year, had to actively switch off the wi-fi on their devices. Otherwise their journey data was automatically harvested when they used 54 of the 270 stations on the London Underground network, even if they weren’t logged onto or using station wi-fi networks at the time.

However in an email seen by TechCrunch, TfL has now turned down an FOI request asking for it to release the “full dataset of anonymized data for the London Underground Wifi Tracking Trial” — arguing that it can’t release the data as there is a risk of individuals being re-identified (and disclosing personal data would be a breach of UK data protection law).

“Although the MAC address data has been pseudonymised, personal data as defined under the [UK] Data Protection Act 1998 is data which relate to a living individual who can be identified from the data, or from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller,” TfL writes in the FOI response in which it refuses to release the dataset.

“Given the possibility that the pseudonymised data could, if it was matched against other data sets, in certain circumstances enable the identification of an individual, it is personal data. The likelihood of this identification of an individual occurring would be increased by a disclosure of the data into the public domain, which would increase the range of other data sets against which it could be matched.”

So what value is there in data being “de-personalized” — and a reassuring narrative of ‘safety via anonymity’ being spun to smartphone users whose comings and goings are being passively tracked — if individual privacy could still be compromised?

“At this point we’ve seen enough examples of data sets being sold and re-identified or shared and re-identified that for large scale data-sets that are being collected claims of anonymity are dubious — or at least need to be looked at very carefully,” says Yves-Alexandre de Montjoye, a lecturer in computational privacy at Imperial College’s Data Science Institute, discussing the contradictions thrown up by the TfL wi-fi trial.

He calls the wi-fi data collection trial’s line on privacy a “really thorny one”.

“The data per se is not anonymous,” he tells TechCrunch. “It is not impossible to re-identify if the raw data were to be made public — it is very likely that one might be able to re-identify individuals in this data set. And to be honest, even TfL, it probably would not be too hard for them to match this data — for example — with Oyster card data [aka London’s contactless travelcard system for using TfL’s network].

“They’re specifically saying they’re not doing it but it would not be hard for them to do it.”

Other types of data that could be combined with this large-scale pseudonymised wi-fi location dataset to re-identify individuals might include mobile phone data (such as data held by a carrier) or data from apps on phones, de Montjoye suggests.

In one of his previous research studies, looking at credit card metadata, he found that just four random pieces of information were enough to re-identify 90 per cent of the shoppers as unique individuals.

In another study he co-authored, called Unique in the Crowd: The privacy bounds of human mobility, he and his fellow researchers write: “In a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals”.
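
To make the idea of unicity concrete, here is a minimal Python sketch that estimates what share of people in a dataset are singled out by k randomly chosen points from their own trace. It is an illustration only, not the researchers’ code: the `is_unique` and `unicity` helpers, the (station, hour) point format and the toy traces are all assumptions invented for this example.

```python
import random


def is_unique(user, traces, points):
    """Return True if no other user's trace contains every point in `points`."""
    return not any(
        points <= other_trace
        for other, other_trace in traces.items()
        if other != user
    )


def unicity(traces, k, seed=0):
    """Estimate the share of users singled out by k random points of their own trace."""
    rng = random.Random(seed)
    # Only users with at least k recorded points can be sampled.
    eligible = [user for user, trace in traces.items() if len(trace) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # Sort first so the sample is reproducible for a given seed.
        points = set(rng.sample(sorted(traces[user]), k))
        if is_unique(user, traces, points):
            unique += 1
    return unique / len(eligible)


# Toy traces (hypothetical data): each point is a (station, hour) pair.
toy_traces = {
    "alice": {("Bank", 8), ("Oval", 18), ("Camden Town", 9)},
    "bob": {("Bank", 8), ("Oval", 18), ("Brixton", 9)},
    "carol": {("Bank", 8), ("Angel", 18), ("Euston", 9)},
}

print(is_unique("alice", toy_traces, {("Bank", 8)}))                       # False
print(is_unique("alice", toy_traces, {("Bank", 8), ("Camden Town", 9)}))   # True
print(unicity(toy_traces, k=3))                                            # 1.0
```

In the toy data, one shared point (everyone passes through Bank at 8) identifies nobody, but adding a single less common point is enough to single out one traveler.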

Meanwhile, London’s Tube network handles up to five million passenger journeys per day, and the vast majority of those passengers would be carrying at least one device with embedded wi-fi, making their movements trackable.

The TfL wi-fi trial tracked journeys on about a fifth of the London Underground network for a month.

A spokesman confirms it is now in discussions, including with the UK’s data protection watchdog, the Information Commissioner’s Office (ICO), about how it could implement a permanent rollout on the network, though he adds there is no specific timeframe as yet.

“Now we’re saying how do we do this going forward, basically,” says the TfL spokesman. “We’re saying is there anything we can do better… We’re actively meeting with the ICO and privacy groups and key stakeholders to talk us through what our plans will be for the future… and how can we work with you to look to take this forward.”

On the webpage where it provides privacy information about the wifi trial for its customers, TfL writes:

Each MAC address collected during the pilot will be de-personalised (pseudonymised) and encrypted to prevent the identification of the original MAC address and associated device. The data will be stored in a restricted area of a secure server and it will not be linked to any other data.

As TfL will not be able to link this data to any other information about you or your device, you will not receive any information by email, text, push message or any other means, as a result of this pilot.

The spokesman tells us that the MAC addresses were encrypted twice, using a salt key in the second instance, which he specifies was destroyed at the end of the trial, claiming: “So there was no way you could ascertain what the MAC address was originally.”

“The only reason we’re saying de-personalized, rather than anonymized… is in order to understand how people are moving through the station you have to be able to have the same code going through the station [to track individual journeys],” he adds. “So we could understand that particular scrambled code, but we had no way of working out who it was.”
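
TfL has not published the algorithms it used, so the following Python sketch is only a guess at the general shape of such a scheme: a plain hash of the MAC address followed by a keyed hash using a secret salt. The function name `pseudonymise_mac`, the choice of SHA-256 and HMAC, and the example MAC address are assumptions for illustration, not details from TfL.

```python
import hashlib
import hmac
import secrets


def pseudonymise_mac(mac: str, salt: bytes) -> str:
    """Two-pass one-way pseudonym for a MAC address (illustrative only)."""
    # Normalise formatting so the same device always maps to the same input.
    normalised = mac.lower().replace("-", ":").encode("ascii")
    # First pass: unkeyed hash of the MAC address.
    first_pass = hashlib.sha256(normalised).digest()
    # Second pass: keyed hash using the secret salt.
    return hmac.new(salt, first_pass, hashlib.sha256).hexdigest()


# Hypothetical salt for the whole trial; destroying it afterwards means the
# mapping from MAC address to code cannot be recomputed.
trial_salt = secrets.token_bytes(32)

code_at_entry = pseudonymise_mac("AA:BB:CC:DD:EE:FF", trial_salt)
code_at_exit = pseudonymise_mac("aa-bb-cc-dd-ee-ff", trial_salt)
print(code_at_entry == code_at_exit)  # True
```

The same device produces the same code wherever it is seen, which is what allows a journey to be followed and also what makes the code a persistent identifier for as long as the salt stays the same.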

However de Montjoye points out that TfL could have used daily keys, instead of apparently keeping the same salt keys for a month, to reduce the resolution of the information being gathered — and shrink the re-identification risk to individuals.
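
One way to picture the daily-key suggestion is the Python sketch below, in which a fresh random key is generated for each day and the previous one is discarded. The `DailyPseudonymiser` class and its behavior are hypothetical and are not a description of anything TfL or de Montjoye has specified.

```python
import hashlib
import hmac
import secrets
from datetime import date


class DailyPseudonymiser:
    """Keyed hashing with a fresh random key per day (illustrative only)."""

    def __init__(self) -> None:
        self._day = None
        self._key = b""

    def pseudonym(self, mac: str, day: date) -> str:
        if day != self._day:
            # New day: overwrite the old key so earlier codes cannot be recomputed.
            self._day = day
            self._key = secrets.token_bytes(32)
        return hmac.new(self._key, mac.lower().encode("ascii"), hashlib.sha256).hexdigest()


pseudonymiser = DailyPseudonymiser()
monday = date(2016, 11, 21)
tuesday = date(2016, 11, 22)

morning = pseudonymiser.pseudonym("aa:bb:cc:dd:ee:ff", monday)
evening = pseudonymiser.pseudonym("aa:bb:cc:dd:ee:ff", monday)
next_day = pseudonymiser.pseudonym("aa:bb:cc:dd:ee:ff", tuesday)

print(morning == evening)   # True: journeys within a day can still be followed
print(morning == next_day)  # False: the device gets an unrelated code the next day
```

The trade-off is that multi-day patterns, such as how often the same traveler repeats a commute, can no longer be studied, which is the reduction in resolution de Montjoye describes.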

“The big thing is whether — and one thing we’ve really been strongly advocating for — is to really try to take into account the full set of solutions, including security mechanisms to prevent the re-identification of [large scale datasets] even if the data is pseudoanonymous,” he says. “Meaning for example ensuring that there is no way someone can access, both the dataset or try to take auxiliary information and try to re-identify people through security mechanisms.”

“Here I have not seen anything that describes anything that they would have put in place to prevent these kind of attacks,” he adds of the TfL trial.

He also points out that the specific guidance produced by the UK ICO on wi-fi location analytics recommends data controllers strike a balance between the uses they want to make of data vs collecting excessive data (and thus the risk of re-identification).

The same guidance also emphasizes the need to clearly inform and be transparent with data subjects about the information being gathered and what it will be used for.

Yet, in the TfL instance, busy London Underground commuters would have had to actively switch off all the wi-fi radios on all their devices to avoid their travel data being harvested as they rushed to work — and might easily have missed posters TfL said it put up in stations to inform customers about the trial.

“One could at least question whether they’ve been really respecting this guidance,” argues de Montjoye.

TfL’s spokesman has clearly become accustomed to being grilled by journalists on this topic, and points to comments made last month by the UK information commissioner, Elizabeth Denham, who, when asked about the trial at a local government oversight meeting, described it as “a really good example of a public body coming forward with a plan, a new initiative, consulting us deeply and doing a proper privacy impact assessment”.

“We agreed with them that at least for now, in the one-way hash that they wanted to implement for the trial, it was not reversible and it was impossible at this point to identify or follow the person through the various Tube station,” Denham added. “I would say it is a good example of privacy by design and good conversations with the regulator to try to get it right. There is a lot of effort there.”

Even so, de Montjoye’s view that large-scale location tracking operations need to not just encrypt personally identifiable data properly but also implement well designed security mechanisms to control how data is collected and can be accessed is difficult to argue with — with barely a week going by without news breaking of yet another large scale data breach.

Equifax is just one of the most recent (and egregious) examples. Many more are available.

Meanwhile, much more personal data is leaching into the public domain, where it can be used to cross-reference and unlock other pieces of information, further increasing re-identification risks.

“They are reasonably careful and transparent about what they do which is good and honestly better than a lot of other cases,” says de Montjoye of TfL, though he also reiterates the point about using daily keys, and wonders: “Are they going to keep the same key forever if it becomes permanent?”

He reckons the most important issue likely remains whether the data being collected here “can be considered anonymous”, pointing again to the ICO guidance which he says: “Clearly states that ‘If an individual can be identified from that MAC address, or other information in the possession of the network operator, then the data will be personal data’” — and pointing out that TfL holds not just Oyster card data but also contactless credit and debit card data (bank cards can also be used to move around its transport network), meaning it has various additional large scale data-holdings which could potentially offer a route for re-identifying the ‘anonymous’ location dataset.

“More generally, the FOIA example nicely emphasize the difficulties (for the lack of a better word) of the use of the word anonymous and, IMHO, the issue its preponderance in our legal framework raises,” he adds. “We and PCAST have argued that de-identification is not a useful basis for policy and we need to move to proper security-based and provable systems.”

Asked why, if it’s confident the London Underground wi-fi data is truly anonymized, it’s also refusing to release the dataset as requested by a member of the public, TfL’s spokesman tells us: “What we’re saying is we’re not releasing it because you could say that I know that someone was the only person in the station at that particular time. Therefore if I could see that MAC address, even if it’s scrambled, I can then say well that’s the code for that person and then I can understand where they go — therefore we’re not releasing it.”
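
The scenario the spokesman describes can be sketched in a few lines of Python. Everything here is made up for illustration: the record format of (code, station, hour), the short codes and the helper names `candidates` and `trail` are assumptions, not the structure of TfL’s dataset.

```python
from collections import defaultdict


def candidates(records, sightings):
    """Codes whose records include every known (station, hour) sighting."""
    seen = defaultdict(set)
    for code, station, hour in records:
        seen[code].add((station, hour))
    wanted = set(sightings)
    return {code for code, points in seen.items() if wanted <= points}


def trail(records, code):
    """All (hour, station) entries for one code, in time order."""
    return sorted((hour, station) for c, station, hour in records if c == code)


# Hypothetical pseudonymised records: (scrambled code, station, hour of day).
records = [
    ("c1f3", "Victoria", 5),
    ("c1f3", "Oxford Circus", 6),
    ("c1f3", "Victoria", 23),
    ("9ab0", "Oxford Circus", 6),
    ("9ab0", "Bank", 7),
]

# An observer knows someone was alone at Victoria at 5am.
print(candidates(records, [("Victoria", 5)]))       # {'c1f3'}
print(trail(records, "c1f3"))                       # [(5, 'Victoria'), (6, 'Oxford Circus'), (23, 'Victoria')]
# A sighting at a busy time and place narrows things down less.
print(len(candidates(records, [("Oxford Circus", 6)])))  # 2
```

Once a single sighting pins down the code, the rest of that traveler’s movements follow, without the original MAC address ever being recovered.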

So, in other words, anonymized data is only private until the moment it’s not — i.e. until you hold enough other data in your hand to pattern match and reverse engineer its secrets.

“We made it very clear throughout the pilot we would not release this data to third parties and that’s why we declined the FOI response,” the spokesman adds as further justification for denying the FOI request. And that at least is a more reassuringly solid rationale.

One more thing worth noting is that incoming changes to local data protection rules are likely to reduce some of the confusion in future, when the GDPR (General Data Protection Regulation) comes into effect across the European Union next May.

“I expect the GDPR, and the UK law implementing it, will make the situation around anonymity and pseudonymised data a lot clearer than it is now,” says Eerke Boiten, a cyber security professor at De Montfort University. “The GDPR has separate definitions of both, and does not make a risk-based assessment of whether pseudonymised personal data remains personal: It simply always is.

“Anonymised data under the GDPR is data where nobody can reconstruct the original identifying information — something that you cannot achieve with pseudonymisation on databases like this, even if you throw away the ‘salt’.”

“Pseudonymisation under the GDPR is essentially a security control, reducing in the first place disclosure impact — in the same way that encryption is,” he adds.

The UK is also considering changing domestic law to criminalize the re-identification of anonymous data. (Though de Montjoye voices concern about what that might mean for security researchers — and critics of the proposal have suggested the government should rather focus on ensuring data controllers properly anonymize data.)

GDPR will also change consent regimes as it can require explicit consent for the collection of personal data — though other lawful bases for processing data are available. So it seems unlikely that TfL could roll out a permanent system to gather wi-fi data on the London Underground in the way it did here, i.e. by relying on an opt-out.

“This would never be a ‘consent’ scenario under the GDPR,” agrees Boiten. “Failing to opt-out isn’t a ‘clear affirmative action’. For the GDPR, TfL would need to find a different justification, possibly involving their service responsibilities as well as the impact on passengers’ privacy. Informing passengers adequately would also be central.”

This article was updated to clarify that consent is one of the legal bases for processing personal data under GDPR. Other legal bases do exist, although it’s not clear which of those, if any, TfL could use to process wi-fi location tracking data without obtaining consent.
