An apocalyptic piece in the Evening Standard last Easter highlighted all the different points in our lives where algorithms control what we see, hear, or do. A few examples include:
- social network feeds
- travel websites
- song compositions
- pension investments.
Why should we be concerned? As Robert Colvile, author of The Great Acceleration, mentions in the context of financial markets:
‘The real danger is that it can all happen at speeds to which humans can’t react. Firms go bankrupt or markets get shattered before anyone’s really realised what’s going on, which is why it’s really important to have the right safeguards in place.’
Twitter’s latest SEC filings declare that only 8.5% of their active accounts are probably robots. That’s 24 million of their 284 million active accounts. However, an earlier report by the Wall Street Journal in April last year suggested that as many as 44% of Twitter accounts have never tweeted.
The devil, as always, is in the detail: first in deciding what constitutes an ‘active’ account, and second in judging which criteria to use to classify an account as a ‘bot’.
Much of a data scientist’s job is precisely this problem of constructing sensible definitions, and of understanding and communicating the implications of alternative definitions. A successful data scientist, like an academic researcher, can accurately draw out what a dataset says under different scenarios, and then act on the likeliest one.
I am a huge football (I think they call it ‘soccer’ in the US) fan, and the football World Cup every four years is an absolute feast for enthusiasts like me. Strangely though, despite all the ‘big data’ progress in most industries, football usually keeps only one statistic up on screen: goals scored by each team.
With three matches per day for the first two weeks of the four-week tournament, the six-hour time difference between San Francisco and Brazil, and having recently started at Airbnb, it was going to be impossible to keep up with every game all the time without some help.
So I wrote myself a Twitter bot in MATLAB called Scotty Stats to tweet the scores and the probability of each side winning the match. It output pairs of tweets, one for the home team and one for the away team. I also included a 🙂 or 😦 at the end if the probability of winning was particularly high or low respectively!
Ideally I would have collected predictive data during the game (shots on goal so far, head-to-head history, pre-match injuries to key players, possession so far) and built a model to predict the probability of winning. But that would have been very time intensive, and probably also cost dollars to get rich enough data.
So I did what every good hacker does and re-used other people’s hard work. Who has live scores and probabilities of winning? The bookies, of course! Every few minutes my MATLAB program would go to my favourite sports betting website, e.g. Betfair or Sporting Index, and fetch the time elapsed, the latest score, and the latest odds of either side winning. Transforming betting odds into a probability is trivial. And although the betting odds have a profit premium built in, they were close enough to the fair odds for my purposes.
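The odds-to-probability step can be sketched as follows. This is not the original MATLAB code, just a minimal Python illustration with made-up odds values: the implied probability of a decimal-odds quote is its reciprocal, and because the bookmaker’s quotes sum to more than 1 (the ‘overround’, i.e. the profit premium), you normalise them to get something closer to fair probabilities.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal betting odds into probabilities that sum to 1."""
    raw = [1.0 / o for o in decimal_odds]  # implied probability of each outcome
    overround = sum(raw)                   # > 1 because of the bookie's margin
    return [p / overround for p in raw]    # normalise the margin away

# Hypothetical in-play odds for home win / draw / away win:
odds = [1.80, 3.60, 4.50]
probs = implied_probabilities(odds)
```

With these example odds the raw reciprocals sum to roughly 1.06, so about a 6% margin is being normalised out before the probabilities are tweeted.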
The Twitter bot is not running anymore, but it served me well during the World Cup. Although the England team crashed out early and the Germans won (again), so it didn’t all go to plan!