Talking Trust and Identity

I spoke at yesterday’s Social Data Revolution class on Trust & Identity at UC Berkeley on behalf of Airbnb. The class also had speakers from Uber and Reddit. You can watch the recording on YouTube.


Fujitsu’s got the wrong picture of a Data Scientist

Fujitsu, the Japanese electronics superpower, says it no longer needs Data Scientists, having automated their job! The company claims that

Data scientists use their skill to select a combination of algorithm and configuration to get the most accurate predictive model from the starting data

and that it has found a way to automate this search over different models and configurations for the optimum. The accompanying diagram depicts a meta machine learning pipeline that tunes the hyper-parameters of a model in Spark or another framework.
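
To make the idea concrete, here is a minimal sketch of an automated model search. It assumes a tiny hypothetical dataset that has already been cleaned and featurised, tries two simple classifiers (k-nearest neighbours and nearest centroid) under a few configurations each, and keeps the one with the best leave-one-out accuracy. This is plain exhaustive grid search in standard-library Python, not Fujitsu's actual method, and every name in it (`DATA`, `SEARCH_SPACE`, `auto_select`) is illustrative.

```python
import math
from itertools import product

# Hypothetical, already-cleaned toy data: ((feature_1, feature_2), label).
DATA = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0), ((1.1, 1.3), 0),
    ((3.0, 3.2), 1), ((3.1, 2.9), 1), ((2.8, 3.0), 1), ((3.3, 3.1), 1),
]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, x, k, dist):
    """k-nearest neighbours: majority vote among the k closest training points."""
    neighbours = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)


def centroid_predict(train, x, dist):
    """Nearest centroid: pick the class whose mean feature vector is closest to x."""
    rows_by_label = {}
    for features, label in train:
        rows_by_label.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in rows_by_label.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


# The search space: every (algorithm, configuration) pair the tuner will try.
SEARCH_SPACE = [
    ("knn", {"k": k, "dist": d}) for k, d in product((1, 3, 5), (euclidean, manhattan))
] + [("centroid", {"dist": d}) for d in (euclidean, manhattan)]


def loo_accuracy(algo, params, data):
    """Leave-one-out accuracy: hold out each point in turn and predict it."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if algo == "knn":
            pred = knn_predict(train, x, params["k"], params["dist"])
        else:
            pred = centroid_predict(train, x, params["dist"])
        correct += pred == y
    return correct / len(data)


def auto_select(data):
    """Score every candidate and return (score, algorithm, params) for the best."""
    scored = [(loo_accuracy(algo, params, data), algo, params) for algo, params in SEARCH_SPACE]
    return max(scored, key=lambda t: t[0])


if __name__ == "__main__":
    score, algo, params = auto_select(DATA)
    print(algo, {name: getattr(v, "__name__", v) for name, v in params.items()}, score)
```

On such a clean, well-separated toy set every candidate scores perfectly, so the tuner simply returns the first one it tried; on real data the search can only be as good as the data and features it is handed.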

While it certainly makes sense to automate this potentially tedious optimisation, it will by no means make the role of a Data Scientist obsolete. A Data Scientist does, of course, have to choose an algorithm and its configuration intelligently, but this is a small part of the full life cycle of a data product for which a Data Scientist is responsible.

Defining a metric to optimise, obtaining and cleaning data, transforming data into informative features and, in the case of supervised learning, obtaining and cleaning labels are all part of a Data Scientist’s responsibilities, and they must be completed before an algorithm can be optimised.

Moreover, these processes account for 90% of the blood, sweat, and tears that a Data Scientist puts into making a successful data product. Algorithm and configuration optimisation can give you a boost of a few percentage points in performance at most; it is the accuracy of the data and intelligent feature sculpting that make the real difference.
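
The unglamorous steps listed above can be sketched in a few lines, assuming hypothetical raw booking records in which every field arrives as a string. The sketch coerces types, drops records with missing or impossible values, removes duplicates, attaches a label for supervised learning, and derives a price-per-night feature. The field names and rules (`RAW`, `clean`, `featurise`) are invented for illustration; real cleaning rules depend entirely on the data source.

```python
# Hypothetical raw booking records as they might arrive from logs:
# every field is a string, and there are gaps, duplicates and impossible values.
RAW = [
    {"id": "1", "price": "$120.00", "nights": "3", "booked": "yes"},
    {"id": "1", "price": "$120.00", "nights": "3", "booked": "yes"},  # duplicate row
    {"id": "2", "price": "", "nights": "2", "booked": "no"},          # missing price
    {"id": "3", "price": "$80", "nights": "0", "booked": "no"},       # zero-night stay
    {"id": "4", "price": "$1,300.50", "nights": "5", "booked": "no"},
]


def parse_price(text):
    """Convert strings such as '$1,300.50' to float; return None if unparseable."""
    try:
        return float(text.replace("$", "").replace(",", ""))
    except ValueError:
        return None


def clean(rows):
    """Coerce types, drop invalid rows, then keep the first valid row per id."""
    seen, cleaned = set(), []
    for row in rows:
        price = parse_price(row["price"])
        nights = int(row["nights"]) if row["nights"].isdigit() else 0
        if price is None or nights <= 0 or row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({
            "id": row["id"],
            "price": price,
            "nights": nights,
            "label": 1 if row["booked"] == "yes" else 0,  # label for supervised learning
        })
    return cleaned


def featurise(rows):
    """Derive a price-per-night feature that the raw data never stored directly."""
    return [((r["price"] / r["nights"],), r["label"]) for r in rows]


if __name__ == "__main__":
    print(featurise(clean(RAW)))
```

Only after a step like this does a dataset exist for any tuner to work on.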

Let’s hope Fujitsu does not sack all their Data Scientists just yet, or they may have a machine learning tuner with no data for the machine to learn from!

I’ve got you (computer chip) under my skin

It’s finally happened. The Epicentre co-working space in Stockholm, Sweden, has implanted near-field communication (NFC) chips in 50 volunteers. Described as like ‘having a vaccination’, the procedure inserts the chip under the skin of the employees’ wrists. This lets them automatically clock in and out of work, register loyalty points, and automate many other transactions.


Personally, I think the costs far outweigh the benefits in this scenario. Other employers, especially military organisations and security companies, are watching this new device keenly. However, saving a few seconds each day and easing a perceived security threat cannot be worth the intrusion into one’s private life.

Wearable devices such as the Fitbit and Apple Watch are becoming ever more popular, but a non-removable device is a new frontier. The argument has always been that they allow us to collect more Big Data and process it more immediately to improve our lives, but it’s not clear why faster is always better.

Being able to track every movement and heartbeat is the stuff of Orwell and Huxley, while the benefits are marginal at best. Caution, brave new world.

Data Science meets Design: my visit to IDEO

Yesterday I was invited by David Webster to talk to the team at the innovative design company IDEO. IDEO is a cutting-edge digital and physical design studio in Palo Alto that has been leading creativity for over 30 years. I was lucky enough to get a tour from David of their workshop, engineering office, and toy lab.
After the tour we had a joint Q&A with the whole team about how big data is used at Airbnb and how it might be used more in the design process at IDEO. Some key thoughts emerged:

  1. The world is moving towards more wearable sensor technology, e.g. Google Glass, the Apple Watch, and Fitbit. With this comes a wealth of feedback data on users in the offline world. The Internet of Things (IoT) will, for example, make A/B testing possible in the offline (physical) world.
  2. For designers to be more data-empowered, the analytics and prediction tools first need to catch up. Currently it is easy to log data, cheap to store it, and there are standardised tools to query it. However, no leader has emerged for extracting insights from data. This democratisation of insights needs to happen before data can permeate design.
  3. Data science works best with design when the two collaborate early. At the start of a project it is easier to scope what data is needed and can be collected easily, so that decisions are informed and iterations are faster.
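
Point 1 suggests that sensors could make A/B testing possible in physical spaces. Once the counts are collected, the analysis looks much like an online test. The sketch that follows is a standard two-proportion z-test, assuming hypothetical counts of visitors who picked up a product under two shop layouts; the function name and numbers are illustrative, not taken from Airbnb or IDEO.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Uses the pooled proportion under the null hypothesis that both variants
    share the same underlying rate. Assumes at least one success and one
    failure overall (otherwise the standard error is zero).
    Returns (z statistic, two-sided p-value).
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical sensor counts: visitors who picked up a product under
    # shop layout A versus layout B, out of all visitors detected in each.
    z, p = two_proportion_z_test(120, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

The hard part offline is the same as the hard part online: making sure visitors are assigned to layouts fairly and that the sensors count both groups the same way.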

The future of Data Science in Design is exciting and, as the two overlap more, we will see the world around us change even faster.

Data Scientists are more than just rebranded Software Engineers

Michael Li of the Data Incubator has written a timely article in VentureBeat on what a Data Scientist is not. In short, a Data Scientist is:

  1. Not just a Business Analyst working on more data,
  2. Not just a rebranded Software Engineer,
  3. Not just a Machine Learning expert with no business knowledge.

A Data Scientist needs to be able to extract insights from datasets that are orders of magnitude larger than those of five years ago, and to do so carefully, with statistical significance and integrity. Moreover, an insight is only as useful as the business need it addresses.
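
One way to attach a significance level to an insight is a permutation test, which makes few assumptions about how the data are distributed. The technique is an illustrative choice, not one prescribed by Michael Li's article or by Airbnb, and the sample values are hypothetical. The sketch shuffles group labels to estimate how often a difference in means at least as large as the observed one would arise by chance alone.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b.

    Under the null hypothesis the group labels are exchangeable, so the pooled
    values are reshuffled into groups of the original sizes, and we count how
    often the shuffled difference is at least as extreme as the observed one.
    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to both counts includes the observed labelling itself,
    # which keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical metric values (e.g. nights booked per user) for two groups.
    control = [2, 3, 1, 4, 2, 3, 2, 5, 1, 3]
    treatment = [3, 4, 2, 5, 3, 4, 3, 6, 2, 4]
    diff, p = permutation_test(control, treatment)
    print(f"difference in means = {diff:.2f}, p = {p:.4f}")
```

With very large datasets almost any difference becomes statistically significant, so the size of the effect and its business relevance still need judging alongside the p-value.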

As a regular interviewer of junior and senior Data Scientists at Airbnb, I consider attention to data cleaning and diligence in statistical analysis fundamental for successful candidates. Moreover, we look for people who understand the ‘why’ of a problem and the business impact of a solution. This is what separates a candidate who is merely very smart from one who gets hired.