Anonymisation will unlock data mining

It is widely accepted that many companies such as Facebook, Google, Amazon, etc have vast amounts of data on our friends, interests, and spending habits amongst other things. At times, for example for data mining or scientific collaboration, it can be useful for companies to access internal and external data. However, they are rightfully blocked by Privacy laws.

Cracked security code

Hence, there is an increasing work in thinking of ways to properly anonymise data to enable mining. Aggregation of data to wash PII (personally identifiable information) is one way to achieve this, but can lose important granularity and detail.

Raffael Strassnig, VP Data Scientist at Barclays retail bank, spoke at a summit last month to stress the importance of protecting privacy. Anonymising data at scale is a very hard problem but Strassnig’s team have implemented an algorithm by a PhD candidate and modified it to work on Barclay’s Big Data. The method involves:

“clustering the data into k-means clusters, with no cluster overlapping, the clusters being a certain size to comply with k-anonymity constraint, and minimising the loss of data when applying the procedure to the dataset by using a dissimilarity measure”

Future developments in application of Machine Learning techniques may enable use of PII without anonymisation. Until then, the Data Science team at Barclays is leading the way in protecting their users’ data while processing it.

Are Algorithms becoming our Overlords?

algosAn apocalyptic piece from the Evening Standard last Easter which highlights all the different points in our lives where algorithms are controlling what we see or hear or do. A few examples include:

  • social network feeds
  • travel websites
  • song compositions
  • pension investments.

Why should we be concerned? As Robert Colvile, author of The Great Acceleration, mentions in the context of financial markets:

‘The real danger is that it can all happen at speeds to which humans can’t react. Firms go bankrupt or markets get shattered before anyone’s really realised what’s going on, which is why it’s really important to have the right safeguards in place.’

Elon Musk wants to open source Artificial Intelligence


Taken from Wired‘s article two months ago:

This morning, OpenAI will release its first batch of AI software, a toolkit for building artificially intelligent systems by way of a technology called “reinforcement learning”—one of the key technologies that, among other things, drove the creation of AlphaGo, the Google AI that shocked the world by mastering the ancient game of Go. With this toolkit, you can build systems that simulate a new breed of robot, play Atari games, and, yes, master the game of Go.

He envisions OpenAI as the modern incarnation of Xerox PARC, the tech research lab that thrived in the 1970s. Just as PARC’s largely open and unfettered research gave rise to everything from the graphical user interface to the laser printer to object-oriented programing, Brockman and crew seek to delve even deeper into what we once considered science fiction. PARC was owned by, yes, Xerox, but it fed so many other companies, most notably Apple, because people like Steve Jobs were privy to its research. At OpenAI, Brockman wants to make everyone privy to its research.

But along with such promise comes deep anxiety. Musk and Altman worry that if people can build AI that can do great things, then they can build AI that can do awful things, too. They’re not alone in their fear of robot overlords, but perhaps counterintuitively, Musk and Altman also think that the best way to battle malicious AI is not to restrict access to artificial intelligence but expand it. That’s part of what has attracted a team of young, hyper-intelligent idealists to their new project.

Giving up control is the essence of the open source ideal. If enough people apply themselves to a collective goal, the end result will trounce anything you concoct in secret. But if AI becomes as powerful as promised, the equation changes. We’ll have to ensure that new AIs adhere to the same egalitarian ideals that led to their creation in the first place. Musk, Altman, and Brockman are placing their faith in the wisdom of the crowd. But if they’re right, one day that crowd won’t be entirely human.

You can read the full text here.

Do you really need Data Scientists?

Data_Science_VDThis and many more common questions about Data Science are tackled by Instacart VP Data Science Jeremy Stanley, and former LinkedIn data leader Daniel Tunkelang. The term Data Science was only coined a decade or so ago but has gathered so much momentum that most business leaders now feel like they should have a Data Science team – even if they don’t know what they would do with them.

Jeremy and Daniel take us through some common misconceptions and recommended ways for thinking about finding real impact from Data Science. Some of my favourite lines from the article:

The above may sound a lot like data analytics, and indeed the difference between analytics and decision science isn’t always clear. Still, decision science should do more than produce reports and dashboards.

But collecting data isn’t enough. Data science only matters if data drives action.

Similarly, data-driven decision making requires a top-down commitment. From the CEO down, the organization has to commit to making decisions using data, rather than based on the highest paid person’s opinion ( or HiPPO).

Many people equate big data to data science, but size isn’t everything. Data science is about separating the signal in data from the noise.

Don’t hire a head of data or build a team until you have work for them to do. At the same time, ensure you’re collecting key data early on so that team can have an impact once you’re ready.

Build a company culture early that makes it a great place to practice data science, and you’ll reap dividends when they matter most.

Over time, the impact that a data science team has will be far higher if you build a diverse team with extremely different backgrounds, skill-sets, and world views.

Finally, focus early on hiring data scientists who reflect your company ideals. To be effective, data scientists must be trusted by their teams, the users of their products, and the decision makers they influence.