Data Science meets Design: my visit to IDEO

Yesterday I was invited by David Webster to talk to the team at innovative design company IDEO. IDEO is a cutting-edge digital and physical design studio in Palo Alto that has been leading creativity for over 30 years. I was lucky enough to be given a tour by David of their workshop, engineering office, and toy lab.
After the tour we had a joint Q&A with the whole team about how big data is used at Airbnb and how it might be used more in the design process at IDEO. Some key thoughts emerged:

  1. The world is moving towards more wearable sensory technology, e.g. Google Glass, Apple Watch, Fitbit. With this comes a wealth of feedback data on the user in the offline world. The internet of things (IoT) will make, for example, A/B testing in the offline (physical) world possible.
  2. For designers to be more data-empowered, we first need the analytics and prediction tools to catch up. Currently it is easy to log data, cheap to store data, and there are standardised tools to query data. However, no leader has emerged for extracting insights from data. This democratisation of insights needs to happen before data can permeate design.
  3. Data science works best with design when the two collaborate early. At the start of a project it is easier to scope which data is necessary and easy to collect, so that decisions can be informed and iterations can be faster.

The future for Data Science in Design is exciting and, when the two start to overlap more, we will see changes in the world around us accelerate even faster.

Data Scientists are more than just rebranded Software Engineers

Michael Li of the Data Incubator has written a timely article in VentureBeat on what a Data Scientist is not. In short, a Data Scientist is:

  1. Not just a Business Analyst working on more data,
  2. Not just a rebranded Software Engineer,
  3. Not just a Machine Learning expert with no business knowledge.

A Data Scientist needs to be able to extract insights from datasets that are orders of magnitude larger than they were 5 years ago. And they need to extract these insights carefully, with statistical significance and integrity. Moreover, an insight is only as useful as the business need it solves.

As a regular interviewer at Airbnb for junior and senior Data Scientists, I find that attention to data cleaning and diligence in statistical analysis are fundamental for successful candidates. Moreover, we look for people who understand the ‘why’ of a problem and the business impact of a solution. This is what differentiates a really smart candidate from a hired candidate.

Forget Big Data, we need Big Insights


A recent article in the UK Computing journal suggests that the new frontier in industry is extracting insights from big data. While a decade or more ago data ingestion was the greatest challenge to companies collecting large swathes of data, that problem has now largely been overcome.

Of 300 IT professionals interviewed, only 13% saw raw data access as the biggest challenge in their work. The majority were split between transforming raw data into useful data and extracting actionable insights from the useful data.

Technologies such as SQL and Hadoop, along with other tools, have largely standardised the warehousing and accessing of data across big data consumers. However, insight extraction is the next frontier and there is still no runaway leader or dominant player in this space. It is difficult to build a one-size-fits-all solution to the problems a traditional business analyst might work on.

But it is likely we will see more and more contenders coming to the fore in this potentially lucrative space. It will be an exciting arms race to watch play out in the data science world!

Beating the government: is big data crucial or creepy?

An article on Thursday in the UK online tech journal Ars Technica reviews the surprising power of mobile communications data to identify trends in unemployment.

A PLOS One paper and a Journal of the Royal Society Interface paper, both published last week, look at changes in the frequency, location, and timing of interactions between people via their cellular records. The correlations between these changes and observed layoffs can be used to train models for future predictions.

The article asks: is this harvesting of phone records to get ahead of employment shocks a critical tool for planners and government officials? Or actually a very creepy and invasive use of personal information? Comments welcome!


This image, unrelated to the unemployment study, shows seasonal population changes in France and Portugal, measured by cellphone activity.

Reliable data is 90% of the data scientist’s work

An article in the UK newspaper The Telegraph today claims to have uncovered a huge scandal in the measurement and use of temperature records from South American weather stations. The article claims that temperature readings have been reversed to show a 1 degree Celsius rise in the past 40 years when, according to the article, temperatures have in fact been cooling.

Whether or not there is any truth in the article is of huge importance and interest. But the meta-message to take away here is that data has to be vetted, reliable, and trustworthy before models can be built and decisions taken.

It is no surprise, then, that a data scientist may find themselves spending much more time obtaining, cleaning, checking, and re-checking data than analysing it. This is just how it should be, and it is also why a data scientist is unique in their role as the curator of data. The article is a timely reminder that this responsibility should be taken extremely seriously and executed with the utmost integrity.
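
To make the "checking and re-checking" step concrete, here is a minimal Python sketch of the kind of sanity checks one might run on temperature records before any modelling. Everything in it is an assumption made for illustration: the `Reading` record, the `vet_readings` helper, and the plausibility bounds of -90 to 60 degrees Celsius are hypothetical and are not taken from the Telegraph article or from any real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    """One hypothetical annual temperature record for a weather station."""
    station: str
    year: int
    temp_c: Optional[float]  # None marks a missing value


def vet_readings(
    readings: List[Reading],
    min_c: float = -90.0,  # assumed lower plausibility bound
    max_c: float = 60.0,   # assumed upper plausibility bound
) -> List[Tuple[int, str]]:
    """Return (index, problem) pairs for readings that need a second look."""
    problems = []
    seen = set()
    for i, r in enumerate(readings):
        key = (r.station, r.year)
        # The same station and year appearing twice suggests a bad merge.
        if key in seen:
            problems.append((i, "duplicate station/year"))
        seen.add(key)
        if r.temp_c is None:
            problems.append((i, "missing value"))
        elif not (min_c <= r.temp_c <= max_c):
            problems.append((i, "out of plausible range"))
    return problems


if __name__ == "__main__":
    sample = [
        Reading("A", 2000, 21.5),
        Reading("A", 2000, 21.7),  # duplicate of the previous key
        Reading("B", 2001, None),  # missing
        Reading("C", 2002, 999.0), # likely a sentinel value
    ]
    for index, problem in vet_readings(sample):
        print(index, problem)
```

Checks like these do not say whether a dataset is right, only where it deserves a closer look. Flagging rather than silently dropping records keeps the curation decisions visible to whoever builds the model afterwards.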