An apocalyptic piece from the Evening Standard last Easter highlights all the different points in our lives where algorithms control what we see, hear, or do. A few examples include:
- social network feeds
- travel websites
- song compositions
- pension investments.
Why should we be concerned? As Robert Colvile, author of The Great Acceleration, mentions in the context of financial markets:
‘The real danger is that it can all happen at speeds to which humans can’t react. Firms go bankrupt or markets get shattered before anyone’s really realised what’s going on, which is why it’s really important to have the right safeguards in place.’
Excited and honored to be presenting on Trust alongside Airbnb co-founder Joe Gebbia at next week’s inaugural Airbnb tech conference OpenAir 2016! We will be expanding on the study with Stanford University that Joe introduced in his TED Talk earlier in the year. Please grab a ticket for next Wednesday’s event here.
This and many more common questions about Data Science are tackled by Instacart VP Data Science Jeremy Stanley, and former LinkedIn data leader Daniel Tunkelang. The term Data Science was only coined a decade or so ago but has gathered so much momentum that most business leaders now feel like they should have a Data Science team – even if they don’t know what they would do with them.
Jeremy and Daniel take us through some common misconceptions and recommended ways for thinking about finding real impact from Data Science. Some of my favourite lines from the article:
The above may sound a lot like data analytics, and indeed the difference between analytics and decision science isn’t always clear. Still, decision science should do more than produce reports and dashboards.
But collecting data isn’t enough. Data science only matters if data drives action.
Similarly, data-driven decision making requires a top-down commitment. From the CEO down, the organization has to commit to making decisions using data, rather than based on the highest paid person’s opinion (or HiPPO).
Many people equate big data to data science, but size isn’t everything. Data science is about separating the signal in data from the noise.
Don’t hire a head of data or build a team until you have work for them to do. At the same time, ensure you’re collecting key data early on so that team can have an impact once you’re ready.
Build a company culture early that makes it a great place to practice data science, and you’ll reap dividends when they matter most.
Over time, the impact that a data science team has will be far higher if you build a diverse team with extremely different backgrounds, skill-sets, and world views.
Finally, focus early on hiring data scientists who reflect your company ideals. To be effective, data scientists must be trusted by their teams, the users of their products, and the decision makers they influence.
I was fortunate enough to attend a Q&A with United States Chief Data Scientist DJ Patil at the Commonwealth Club last week. DJ was keen to stress the challenges facing the US government and the Big Data available to help solve them, but he noted that the talent and progress we see in technology is not being applied to ‘real’ problems.
DJ gave examples from Law Enforcement and Health Care as areas that are ripe for disruption by data and technology. He also stressed that much public data is readily available online, at both the local and national level – and he invited the Data Scientists in the audience to start hacking for social solutions!
An arms race has resumed amongst the world’s biggest hedge funds. Seeing the potential of the technologies produced at some of the most prolific Machine Learning groups in big tech companies such as Google and Facebook, a recent article notes that hedge funds are poaching lead Data Scientists to work on building better alpha strategies.
In the past, algorithmic trading prided itself on hiring highly skilled statisticians to sculpt informative signals and combine them in state-of-the-art models to predict movements in prices. With the success of deep learning software such as IBM’s Watson, hedge funds now see potential in throwing their financial big data at these artificial intelligence black boxes to predict alpha.
Bridgewater hired David Ferrucci, who led the development of Watson at IBM; Renaissance Technologies is led by Bob Mercer and Peter Brown, former speech recognition leads at IBM; and Blackrock recently hired Bill MacCartney, a former Google scientist.
For these robotics rockstars moving from Tech to Finance, one downside is that their work becomes a lot more secretive. The nature of algorithmic trading is very hush-hush, with all hedge funds in direct competition with each other. Compared to publishing research papers at IBM or Google, the traders at these funds will have to keep their advances to themselves – a loss for the rest of the scientific community.
In an exciting new partnership, Airbnb has teamed up with Kaggle to create an online Data Science challenge. We provide historical data on the first country in which guests book, and then ask candidates to predict the first booking destinations of new users.
Try the challenge yourself! You have until February 11th 2016 to submit your entries. And if you have any questions you can use the forum and I will respond as soon as possible. Good luck and hope you have fun playing with our data!
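To give a feel for the task, here is a naive baseline: rank destinations by how often they appear in the training data and predict that ranking for every new user. The labels below are illustrative toy data, not the actual competition dataset – see the Kaggle page for the real schema and evaluation metric.

```python
from collections import Counter

# Toy stand-in for the training labels (the real data lives on Kaggle)
first_bookings = ["US", "FR", "US", "NDF", "US", "NDF", "IT", "US"]

# Naive baseline: predict the overall most frequent destinations for every new user
ranked = [country for country, _ in Counter(first_bookings).most_common()]
print(ranked)  # most frequent destination first
```

Beating a frequency baseline like this is usually the first milestone in any Kaggle competition; the interesting work starts when you bring in user features.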
My latest Machine Learning blog post Confidence Splitting Criterions Can Improve Precision And Recall in Random Forest Classifiers is out on the Airbnb Data blog:
The Trust and Safety Team maintains a number of models for predicting and detecting fraudulent online and offline behaviour. A common challenge we face is attaining high confidence in the identification of fraudulent actions: both classifying a fraudulent action as fraudulent (recall) and not classifying a good action as fraudulent (precision).
A classification model we often use is a Random Forest Classifier (RFC). However, by adjusting the logic of this algorithm slightly, so that we look for high confidence regions of classification, we can significantly improve the recall and precision of the classifier’s predictions. To do this we introduce a new splitting criterion (explained below) and show experimentally that it can enable more accurate fraud detection.
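For readers less familiar with the two metrics, here is how precision and recall are computed from a classifier’s predictions. The labels and predictions below are made up for illustration; the splitting criterion itself is explained in the full blog post.

```python
# 1 = fraudulent, 0 = good; toy labels and model outputs for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # fraud caught
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # good actions flagged
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # fraud missed

precision = tp / (tp + fp)  # fraction of flagged actions that are truly fraud
recall = tp / (tp + fn)     # fraction of fraudulent actions that get flagged
print(precision, recall)
```

The confidence-splitting idea in the post is about pushing both numbers up at once by growing trees that seek out high-confidence regions, rather than trading one metric off against the other with a score threshold.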
Have a read and let me know what you think!
I spoke at yesterday’s Social Data Revolution class on Trust & Identity at UC Berkeley on behalf of Airbnb. The class also had speakers from Uber and Reddit. You can see the live recording on YouTube.
A world superpower in electronics, Japanese company Fujitsu claims that it no longer needs Data Scientists and has automated their job! The company claims that
Data scientists use their skill to select a combination of algorithm and configuration to get the most accurate predictive model from the starting data
and that they have found a way to automate this search over different configurations and models for the optimum. The diagram depicts a meta machine-learning pipeline that tunes the hyper-parameters of a model in Spark or another framework.
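The kind of search being automated can be sketched in a few lines: exhaustively score every combination of hyper-parameters and keep the best one. The scoring function below is a toy surrogate for cross-validated model accuracy, and the parameter names are illustrative, not Fujitsu’s actual system.

```python
from itertools import product

# Toy surrogate for cross-validated accuracy; a real pipeline would
# train and evaluate a model for each configuration instead.
def evaluate(n_trees, max_depth):
    return 1.0 - abs(n_trees - 100) / 1000 - abs(max_depth - 8) / 100

# Hypothetical hyper-parameter grid for a tree ensemble
grid = {"n_trees": [10, 100, 500], "max_depth": [4, 8, 16]}

# Grid search: score every combination and keep the best
best = max(product(grid["n_trees"], grid["max_depth"]),
           key=lambda cfg: evaluate(*cfg))
print(best)
```

Grid search is the simplest version; random search and Bayesian optimisation do the same job with far fewer model evaluations, which is presumably where the real engineering effort in such a system goes.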
While it certainly makes sense to automate this potentially tedious optimisation, this will by no means deprecate the role of a Data Scientist. It is of course true that a Data Scientist has to intelligently choose an algorithm and its configuration, but this is a small part of the full life cycle of a data product that a Data Scientist is responsible for.
The processes of defining a metric to optimise, obtaining and cleaning data, transforming data into informative features, and perhaps also obtaining and cleaning labels (in the case of supervised learning) are all part of a Data Scientist’s responsibilities, and need to be completed before an algorithm can be optimised.
Moreover, these processes constitute 90% of the blood, sweat, and tears that a Data Scientist pours into making a successful data product. Algorithm and configuration optimisation can give you a boost of a few percentage points in performance at most; it is the accuracy of the data and intelligent feature sculpting that make the real difference.
Let’s hope Fujitsu does not sack all their Data Scientists just yet, or they may have a machine learning tuner with no data for the machine to learn from!
It’s finally happened. The Epicentre co-working space in Stockholm, Sweden has installed a near-field communication (NFC) chip into 50 volunteers. Described as like ‘having a vaccination’, the chip is inserted under the skin of employees’ wrists. This enables them to automatically clock in and out of work, register loyalty points, and automate many other transactions.
Personally, I think the costs far outweigh the benefits in this scenario. Other employers, especially military organisations and security companies, are watching this new device keenly. But for the sake of saving a few seconds each day, and the perceived security risk of not having a chip, the intrusion into one’s private life cannot be worth it.
Wearable devices such as the Fitbit and Apple Watch are becoming ever more popular, but a non-removable device is a new frontier. The argument has always been that they allow us to collect more Big Data and process it more immediately to improve our lives, but it’s not clear why faster is always better.
Being able to track every movement and heartbeat is the stuff of Orwell and Huxley, while the benefits are cursory at best. Caution, brave new world.