Yesterday I was invited by David Webster to talk to the team at innovative design company IDEO. IDEO is a cutting-edge digital and physical design studio in Palo Alto that has been leading creativity for over 30 years. I was lucky enough to be given a tour by David through their workshop, engineering office, and toy lab.
After the tour we had a joint Q&A with the whole team about how big data is used at Airbnb and how it might be used more in the design process at IDEO. Some key thoughts emerged:
- The world is moving towards more wearable sensory technology, e.g. Google Glass, Apple Watch, Fitbit. With this comes a wealth of feedback data on the user in the offline world. The internet of things (IoT) will make, for example, A/B testing in the offline (physical) world possible.
- For designers to be more data empowered, we first need the analytics and prediction tools to catch up. Currently it is easy to log data, cheap to store data and there are standardised tools to query data. However, no leader has emerged for extracting insights from data. This democratisation of insights needs to happen before data can permeate design.
- Data science works best with design when they collaborate early. At the start of a project it is easier to scope which data is necessary and easy to collect, so that decisions can be informed and iterations can be faster.
The future for Data Science in Design is exciting and, when they start to overlap more, we will see changes in the world around us accelerate even faster.
- Not just a Business Analyst working on more data,
- Not just a rebranded Software Engineer,
- Not just a Machine Learning expert with no business knowledge.
A Data Scientist needs to be able to extract insights from datasets that are orders of magnitude larger than what they were 5 years ago. And they need to extract this insight carefully, with statistical significance and integrity. Moreover, the insight is only as useful as the business need it solves.
As a regular interviewer at Airbnb for junior and senior Data Scientists, I find that attention to data cleaning and diligence in statistical analysis are fundamental for successful candidates. Moreover, we look for people who understand the ‘why’ of a problem and the business impact of a solution. This is what differentiates a really smart candidate from a hired candidate.
Read Riley Newman, head of Data Science at Airbnb, describe his experience during the past 5 years of Airbnb’s hypergrowth in today’s VentureBeat article. Learn about how he scaled the team and brought data to the top of mind in every corner of the company. There’s also a picture of my team featured!
During their trip I invited them to the offices at Airbnb to see how we thought about innovation – especially from the point of view of Data Science – and also for a catch-up. Although I only studied at Imperial College for 1 year between 2005 and 2006, I have a great affinity for the university.
As part of the visit to the office, I was invited to contribute an interview for their alumni pages to hopefully get more of their current students thinking about a life in San Francisco and the tech startup scene. You can read the full transcript here.
I will be joining a panel discussion on behalf of Airbnb on the topic of ‘Extracting Actionable Insights Using Sentiment Analysis’. It’s sure to be a great event bringing together all the big players in the field.
Check out the full schedule and register to join!
While almost all members of the Airbnb community interact in good faith, there is an ever-shrinking group of bad actors that seek to take advantage of the platform for profit. This problem is not unique to Airbnb: social networks battle with attempts to spam or phish users for their details; ecommerce sites try to prevent the use of stolen credit cards. The Trust and Safety team at Airbnb works tirelessly to remove bad actors from the Airbnb community and to help make the platform a safer and more trustworthy place to experience belonging.
Missing Values In A Random Forest
We can train machine learning models to identify new bad actors (for more details see the previous blog post Architecting a Machine Learning System for Risk). One particular family of models we use is Random Forest Classifiers (RFCs). An RFC is a collection of trees, each independently grown using labeled and complete input training data. By complete we explicitly mean that there are no missing values, i.e. NaN values. But in practice the data can often have (many) missing values. In particular, very predictive features do not always have values available, so they must be imputed before a random forest can be trained.
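To make the idea of imputation concrete, here is a minimal sketch of one common choice: replacing each NaN with the median of the observed values in the same feature column. This is an illustration only and not necessarily the approach used by the team; the function name `impute_median`, the toy matrix `X`, and the fallback of 0.0 for a column with no observed values are all assumptions made for this example.

```python
import math
from statistics import median


def impute_median(rows):
    """Return a copy of rows with NaN entries replaced by their column median."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        # Collect only the observed (non-NaN) values of feature j.
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: a column with no observed values falls back to 0.0.
        medians.append(median(observed) if observed else 0.0)
    return [
        [medians[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy feature matrix; NaN marks a missing value.
X = [
    [1.0, float("nan")],
    [3.0, 10.0],
    [float("nan"), 20.0],
    [5.0, 30.0],
]
X_complete = impute_median(X)
```

The resulting `X_complete` has no NaN entries, so it could be handed to any random forest implementation that requires complete input. Median imputation is simple but discards the information that a value was missing in the first place, which may matter when missingness itself is predictive.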
“…below-median income consumers will enjoy a disproportionate fraction of eventual welfare gains from this kind of ‘sharing economy’ through broader inclusion, higher quality rental-based consumption, and new ownership facilitated by rental supply revenues…”.
Commentators on the paper, such as Mashable, write that, while historically emphasis has been placed on the benefits to the consumer of increased access to higher quality products, this study looks at the other side of the equation and identifies short-term and long-term benefits to the suppliers as well.
It remains to be seen if these benefits really hold in the longer term and can be sustained as momentum in the sharing economy gathers. Moreover, if these companies move towards IPO, then it is unclear how the pressure of shareholders will sit with these positive effects.