Anonymisation will unlock data mining

It is well known that companies such as Facebook, Google and Amazon hold vast amounts of data on our friends, interests and spending habits, amongst other things. At times, for example for data mining or scientific collaboration, it can be useful for companies to access both internal and external data. However, they are rightfully blocked by privacy laws.


Hence, there is a growing body of work on ways to properly anonymise data so that it can be mined. Aggregating data to strip out PII (personally identifiable information) is one way to achieve this, but it can lose important granularity and detail.

Raffael Strassnig, VP Data Scientist at Barclays retail bank, spoke at a summit last month to stress the importance of protecting privacy. Anonymising data at scale is a very hard problem, but Strassnig’s team have implemented an algorithm by a PhD candidate and modified it to work on Barclays’ big data. The method involves:

“clustering the data into k-means clusters, with no cluster overlapping, the clusters being a certain size to comply with k-anonymity constraint, and minimising the loss of data when applying the procedure to the dataset by using a dissimilarity measure”
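To make the idea in the quote more concrete, here is a minimal sketch of clustering-based k-anonymisation. It is not the algorithm Barclays implemented, which is not described in detail; it is a simple greedy stand-in that forms non-overlapping clusters of at least k records, replaces each record with its cluster centroid, and uses the sum of squared Euclidean distances as the dissimilarity measure. The function name `k_anonymise`, the sample records and the choice of distance are all assumptions made for illustration.

```python
from math import dist
from statistics import fmean


def k_anonymise(records, k):
    """Greedy sketch: split numeric records into disjoint clusters of at
    least k rows and replace every row with its cluster centroid."""
    if k < 1 or len(records) < k:
        raise ValueError("need at least k records")

    remaining = list(range(len(records)))
    clusters = []
    # Carve off clusters of exactly k rows while enough rows would be
    # left over to form another valid cluster (size >= k).
    while len(remaining) >= 2 * k:
        seed = records[remaining[0]]
        remaining.sort(key=lambda i: dist(records[i], seed))
        clusters.append(remaining[:k])
        remaining = remaining[k:]
    clusters.append(remaining)  # last cluster holds k to 2k-1 rows

    width = len(records[0])
    anonymised = [None] * len(records)
    loss = 0.0  # information loss: sum of squared distances to centroids
    for cluster in clusters:
        centroid = tuple(
            fmean(records[i][d] for i in cluster) for d in range(width)
        )
        for i in cluster:
            anonymised[i] = centroid
            loss += dist(records[i], centroid) ** 2
    return anonymised, clusters, loss


# Hypothetical (age, income) records, purely for illustration.
records = [
    (25, 30000), (27, 32000), (26, 31000),
    (52, 80000), (55, 82000), (53, 81000), (54, 79000),
]
anonymised, clusters, loss = k_anonymise(records, k=3)
for original, masked in zip(records, anonymised):
    print(original, "->", masked)
```

Every published row is now shared by at least k individuals, which is the k-anonymity constraint mentioned in the quote, and `loss` gives a rough measure of how much detail was given up. In practice the columns would need to be normalised first, since here income dominates the distance, and a production method would search for clusters that minimise the loss rather than picking them greedily.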

Future developments in the application of machine learning techniques may enable the use of PII without anonymisation. Until then, the Data Science team at Barclays is leading the way in protecting its users’ data while processing it.


Beating the government: is big data crucial or creepy?

An article on Thursday in the UK online tech journal Ars Technica reviews the surprising power of mobile communications data to identify unemployment trends.

A PLOS One paper and a Journal of the Royal Society Interface paper, both published last week, look at changes in the frequency, location and timing of interactions between people, as recorded in their cellular records. The correlations between these changes and observed layoffs can be used to train models for future predictions.

The article asks: is this harvesting of phone records to get ahead of employment shocks a critical tool for planners and government officials, or is it a very creepy and invasive use of personal information? Comments welcome!


This image, unrelated to the unemployment study, shows seasonal population changes in France and Portugal, measured by cellphone activity.