An apocalyptic piece from the Evening Standard last Easter highlights all the different points in our lives where algorithms control what we see, hear, or do. A few examples include:
- social network feeds
- travel websites
- song compositions
- pension investments.
Why should we be concerned? As Robert Colvile, author of The Great Acceleration, mentions in the context of financial markets:
‘The real danger is that it can all happen at speeds to which humans can’t react. Firms go bankrupt or markets get shattered before anyone’s really realised what’s going on, which is why it’s really important to have the right safeguards in place.’
Taken from Wired’s article two months ago:
This morning, OpenAI will release its first batch of AI software, a toolkit for building artificially intelligent systems by way of a technology called “reinforcement learning”—one of the key technologies that, among other things, drove the creation of AlphaGo, the Google AI that shocked the world by mastering the ancient game of Go. With this toolkit, you can build systems that simulate a new breed of robot, play Atari games, and, yes, master the game of Go.
He envisions OpenAI as the modern incarnation of Xerox PARC, the tech research lab that thrived in the 1970s. Just as PARC’s largely open and unfettered research gave rise to everything from the graphical user interface to the laser printer to object-oriented programing, Brockman and crew seek to delve even deeper into what we once considered science fiction. PARC was owned by, yes, Xerox, but it fed so many other companies, most notably Apple, because people like Steve Jobs were privy to its research. At OpenAI, Brockman wants to make everyone privy to its research.
But along with such promise comes deep anxiety. Musk and Altman worry that if people can build AI that can do great things, then they can build AI that can do awful things, too. They’re not alone in their fear of robot overlords, but perhaps counterintuitively, Musk and Altman also think that the best way to battle malicious AI is not to restrict access to artificial intelligence but expand it. That’s part of what has attracted a team of young, hyper-intelligent idealists to their new project.
Giving up control is the essence of the open source ideal. If enough people apply themselves to a collective goal, the end result will trounce anything you concoct in secret. But if AI becomes as powerful as promised, the equation changes. We’ll have to ensure that new AIs adhere to the same egalitarian ideals that led to their creation in the first place. Musk, Altman, and Brockman are placing their faith in the wisdom of the crowd. But if they’re right, one day that crowd won’t be entirely human.
You can read the full text here.
A recent article by a Foreign-Exchange Journalist suggests the ‘Skynet’ of Finance is not too far away.
The article points to the huge improvements in Artificial Intelligence and to how strongly Financial Services firms expect technology to take over their industry. In particular, Transfer/Payments businesses expect to lose 28% of their business to FinTech in the next 5 years, and Banks expect to lose 24%.
The silver lining to this takeover, the article points out, could be a greater emphasis on the ‘human touch’ in key customer-facing areas: for example, a human hand at the wheel to prevent another ‘flash crash’, or a human interpreter of a lending or investment decision made by an Artificial Intelligence.
Whatever happens, we are likely to see more automation, lower costs for the customer, and smarter decision making, at least in the near term.
An arms race has resumed amongst the world’s biggest hedge funds. A recent article notes that hedge funds, seeing the potential of the technologies produced by some of the most prolific Machine Learning groups at big tech companies such as Google and Facebook, are poaching lead Data Scientists to build better alpha strategies.
In the past, algorithmic trading prided itself on hiring highly skilled statisticians to sculpt informative signals and combine them in a state-of-the-art model to predict movements in prices. With the success of deep learning software, such as IBM’s Watson, hedge funds now see potential in throwing their financial big data at these artificial intelligence black boxes to predict alpha.
Bridgewater hired David Ferrucci, the former lead engineer on IBM’s Watson; Renaissance Technologies is led by Bob Mercer and Peter Brown, former language recognition leads at IBM; and recently BlackRock hired Bill MacCartney, a former Google scientist.
For these robotics rockstars moving from Tech to Finance, one downside is that their work becomes a lot more secretive. Algorithmic trading is by nature very hush-hush, with all hedge funds in direct competition with each other. Unlike researchers publishing papers at IBM or Google, the traders at these funds will have to keep their advances to themselves, which is a loss for the rest of the scientific community.
I had the pleasure of video-conferencing into Kellogg’s MBA class in Social Media at Northwestern University yesterday. Brayden King kindly invited me to talk about how Airbnb thinks about Trust and the challenges facing sharing economies.
We spoke about the role of Data Science at the company and how it has changed over the years. As the volume of data has grown, we have more often than not moved away from explanatory predictive models to Machine Learning algorithms.
One thing that stood out to me as top of mind for the students in the MBA class was the process of Trust development for first-time users. How does a first-time guest get accepted by a host on Airbnb? How does a first-time host get selected by a guest?
At Airbnb we have a team of highly skilled Data Scientists and Engineers working on matching algorithms designed to help first-time guests and hosts. Even more than this, the community is its own best resource. Experienced hosts help new hosts manage their listings and help new guests book their first experience.
At the heart of everything data-related we work on at Airbnb is the community and enabling them to make more connections amongst themselves and new users.
I recently joined Cyberlaunch, the world’s leading accelerator for information security (Infosec) and machine learning (ML), as a Mentor for their startup companies.
Last week they launched a Startup Challenge to find the brightest solutions to challenging Infosec and ML problems. There are two prizes, each worth over $150,000.
It’s sure to be a very competitive field and I am looking forward to the entries!
My latest Machine Learning blog post Confidence Splitting Criterions Can Improve Precision And Recall in Random Forest Classifiers is out on the Airbnb Data blog:
The Trust and Safety Team maintains a number of models for predicting and detecting fraudulent online and offline behaviour. A common challenge we face is attaining high confidence in the identification of fraudulent actions. Both in terms of classifying a fraudulent action as a fraudulent action (recall) and not classifying a good action as a fraudulent action (precision).
A classification model we often use is a Random Forest Classifier (RFC). However, by adjusting the logic of this algorithm slightly, so that we look for high confidence regions of classification, we can significantly improve the recall and precision of the classifier’s predictions. To do this we introduce a new splitting criterion (explained below) and show experimentally that it can enable more accurate fraud detection.
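To make the two metrics concrete, here is a minimal sketch in plain Python. It is not the splitting criterion from the post: it only shows how precision and recall are computed, and how acting only on high-confidence scores trades one for the other. The labels, scores, and thresholds are made up for illustration, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(labels, scores, threshold):
    """Flag score >= threshold as fraud and compare against labels (1 = fraud)."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)  # fraud caught
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)  # good action flagged
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)   # fraud missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative data only: 1 = fraudulent action, 0 = good action.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical classifier scores, e.g. the fraction of trees voting "fraud".
scores = [0.95, 0.80, 0.55, 0.60, 0.30, 0.10, 0.40, 0.52]

default_cut = precision_recall(labels, scores, 0.5)
confident_cut = precision_recall(labels, scores, 0.75)
print("threshold 0.50:", default_cut)
print("threshold 0.75:", confident_cut)
```

On this toy data, raising the threshold removes the false positives but misses more fraud. That trade-off is what makes an approach claiming to improve both metrics at once worth reading about.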
Have a read and let me know what you think!
A world superpower in electronics, Japanese company Fujitsu claims that it no longer needs Data Scientists, and has automated their job! The company claims that
Data scientists use their skill to select a combination of algorithm and configuration to get the most accurate predictive model from the starting data
and that they have found a way to automate this search over different configurations and models for the optimum. The diagram depicts a meta machine learning pipeline that tunes the hyper-parameters of a model in Spark or another framework.
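The kind of search being automated can be sketched in a few lines. This is not Fujitsu’s system, just a plain grid search under assumed conditions: a toy k-nearest-neighbour classifier, a made-up dataset with one deliberately mislabelled point, and two hypothetical hyper-parameters (`k` and the distance exponent `p`).

```python
from itertools import product


def knn_predict(train, point, k, p):
    """Majority vote among the k nearest training rows (Minkowski-p distance)."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)
    nearest = sorted(train, key=lambda row: dist(row[0], point))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, valid, k, p):
    """Fraction of validation rows the classifier labels correctly."""
    hits = sum(knn_predict(train, x, k, p) == y for x, y in valid)
    return hits / len(valid)


def grid_search(train, valid, grid):
    """Score every configuration in the grid; return (best_score, best_config)."""
    names = list(grid)
    best_score, best_config = -1.0, None
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = accuracy(train, valid, **config)
        if score > best_score:  # ties keep the earliest configuration
            best_score, best_config = score, config
    return best_score, best_config


# Toy data: two clusters, plus one mislabelled point near the second cluster.
train = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
         ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1),
         ((0.9, 0.95), 0)]
valid = [((0.05, 0.1), 0), ((0.9, 0.97), 1), ((0.3, 0.2), 0), ((0.8, 0.9), 1)]
grid = {"k": [1, 3, 5], "p": [1, 2]}

best_score, best_config = grid_search(train, valid, grid)
print(best_score, best_config)
```

The loop is the easy part. Everything it depends on (the labelled rows, the validation split, the choice of accuracy as the metric) had to be decided by someone before the search could run.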
While it certainly makes sense to automate this potentially tedious optimisation, this will by no means deprecate the role of a Data Scientist. It is of course true that a Data Scientist has to intelligently choose an algorithm and its configuration, but this is a small part of the full life cycle of a data product that a Data Scientist is responsible for.
Defining a metric to optimise, obtaining and cleaning data, transforming data into informative features, and perhaps also obtaining and cleaning labels (in the case of supervised learning) are all part of a Data Scientist’s responsibilities, and all need to be completed before an algorithm can be optimised.
Moreover, these processes constitute 90% of the blood, sweat, and tears of a Data Scientist that go into making a successful data product. Algorithm and configuration optimisation can give you a few percentage points boost in performance at most, but it is the accuracy of the data and intelligent feature sculpting which make the real difference.
Let’s hope Fujitsu does not sack all their Data Scientists just yet, or they may have a machine learning tuner with no data for the machine to learn from!
I am hoping to give a talk with Eric Levine on behalf of Airbnb at next year’s SXSW Interactive conference in Austin. Please vote for our submission and leave some comments too!
Michael Li of the Data Incubator has written a timely article in VentureBeat on what a Data Scientist is not. In short, a Data Scientist is:
- Not just a Business Analyst working on more data,
- Not just a rebranded Software Engineer,
- Not just a Machine Learning expert with no business knowledge.
A Data Scientist needs to be able to extract insights from datasets that are orders of magnitude larger than what they were 5 years ago. And they need to extract this insight carefully, with statistical significance and integrity. Moreover, the insight is only as useful as the business need it solves.
As a regular interviewer at Airbnb for junior and senior Data Scientists, I find that attention to data cleaning and diligence in statistical analysis are fundamental for successful candidates. Moreover, we look for people who understand the ‘why’ of a problem and the business impact of a solution. This is what differentiates a really smart candidate from a hired candidate.