Want to switch from Academia to Data Science?

Check out my latest Airbnb blog post on how to prepare for moving from Academia to Data Science in industry.

We came, we saw, we hacked!

Last weekend I spent Saturday and Sunday hacking on government data at BayesImpact’s hackathon, Bayes Hack 2016! Held at OpenDNS HQ, the event invited teams of Data Scientists, Engineers, Designers, and anyone else interested in data to hack for 24 hours.

For those unfamiliar with ‘hacking’: the premise is to build something in a very short amount of time. We call it ‘hacking’ because you have to cut corners and write some ugly code to get a product out quickly. It’s different to your normal job but very liberating!

I teamed up with four other Data Scientists from Airbnb and an Engineer, and we decided to look at the Department of Labor’s database on jobs and their associated skills, knowledge, and education requirements. Our prompt was the following:

Economic landscapes change dramatically, often outpacing a workforce lagging in its adaptation to new opportunities and industries. How can data scientists leverage predictive modeling to close the gap?

What did we build? We broke into two teams. One team built a recommendation engine that takes a user’s skills and abilities and returns job suggestions. The second team, which I worked on, built an interactive visualisation of these recommendations so that users can explore related jobs.

For each of the 954 jobs in the database, we computed the coordinates of the job in the 35-dimensional space of skills. These skills include Reading Comprehension, Active Listening, Writing, etc. For each pair of jobs in this skills vector space, we computed the distance between them using the Kullback-Leibler Divergence to give us a value between 0 and 1. The smaller the distance (divergence), the more similar two jobs are in terms of the skills required to be competent in them. The visualisation was made in Gephi and exported to Sigma.js.
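
For the curious, here is a minimal sketch of that pairwise distance computation in Python. It is an illustration rather than the hackathon code: the toy skills matrix, the function names, and the small epsilon are all made up. Raw Kullback-Leibler divergence is asymmetric and has no upper bound, so the sketch assumes each job's skill scores are normalised to sum to 1, averages the divergence in both directions, and divides by the largest value to land in the 0 to 1 range. The scaling used on the day may well have been different.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two non-negative vectors, each normalised to sum to 1."""
    # eps is an assumption: it avoids log(0) when a skill score is zero.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def pairwise_distances(skills):
    """Symmetrised KL between every pair of rows, rescaled to [0, 1]."""
    n = len(skills)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Average both directions so that d(i, j) == d(j, i).
            d = 0.5 * (kl_divergence(skills[i], skills[j])
                       + kl_divergence(skills[j], skills[i]))
            dist[i, j] = dist[j, i] = d
    # Assumed scaling: divide by the largest distance to bound values by 1.
    max_d = dist.max()
    return dist / max_d if max_d > 0 else dist


# Hypothetical toy data: 3 jobs scored on 4 skills
# (the real database had 954 jobs and 35 skills).
skills = np.array([
    [4.0, 3.5, 1.0, 2.0],
    [4.0, 3.5, 1.0, 2.5],
    [1.0, 1.5, 4.5, 4.0],
])
dist = pairwise_distances(skills)
```

In this toy example the first two jobs have nearly identical skill profiles, so their distance comes out much smaller than the distance from either of them to the third job. A matrix like `dist` can then be turned into a weighted edge list for a graph tool such as Gephi.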

We were one of the 8 finalists on the day but eventually lost out to the fantastic Go Bot Chat team working on the Department of the Interior’s database. Their project provides parks and recreation recommendations to people using a chatbot service built on top of Facebook’s Messenger.

The weekend was super fun, and it was inspiring to see how much can be done so quickly with so much openly available data. You can see our full source code on GitHub, along with all the other projects from the competition.

Talking Trust with Kellogg’s MBA class

I had the pleasure of video-conferencing into Kellogg’s MBA class in Social Media at Northwestern University yesterday. Brayden King kindly invited me to talk about how Airbnb thinks about Trust and the challenges facing sharing economies.

We spoke about the role of Data Science at the company and how it has changed over the years. As the volume of data has grown, we have more often than not moved away from explanatory predictive models to Machine Learning algorithms.

One thing that stood out to me as top of mind for the students in the MBA class was the process of Trust development for first-time users. How does a first-time guest get accepted by a host on Airbnb? How does a first-time host get selected by a guest?

At Airbnb we have a team of highly skilled Data Scientists and Engineers working on matching algorithms designed to help first-time guests and hosts. Even more than this, the community are their own best resource. Experienced hosts help new hosts manage their listings and help new guests book their first experience.

At the heart of everything data-related we work on at Airbnb is the community and enabling them to make more connections amongst themselves and new users.

Airbnb launches first ever Kaggle competition!

In an exciting new partnership, Airbnb has teamed up with Kaggle to create an online Data Science challenge. In this challenge we provide historical data on the first country that guests book and then ask candidates to predict future first bookings.

Try the challenge yourself! You have until February 11th 2016 to submit your entries. And if you have any questions you can use the forum and I will respond as soon as possible. Good luck and hope you have fun playing with our data!