Recently, I came across a dataset covering about six months of US domestic flight prices. For about 100 popular routes, it recorded the quoted price of each future flight along with the time of the quote. I wanted to see whether directional changes in price could actually be predicted with any confidence.
I built a model to predict whether the price would drop by at least 10% in the next 7 days. Using only historical price returns as inputs, and re-estimating the model parameters weekly, I calculated the daily out-of-sample performance. The results were much better than I expected.
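To make that setup concrete, here is a minimal Python sketch of the two mechanical pieces: building the "drops by at least 10% within 7 days" label, and generating weekly refit windows. It is not the code behind these results. The function names, the 28-day minimum training window, and the choice to count a drop at any point in the 7-day window (rather than exactly 7 days later) are all assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Tuple


def drop_labels(
    prices: List[float], horizon: int = 7, threshold: float = 0.10
) -> List[Optional[int]]:
    """Label day t with 1 if any price in the next `horizon` days is at least
    `threshold` below prices[t], else 0. Days without a full future window
    get None. (Assumed reading of "drops by at least 10% in the next 7 days".)"""
    labels: List[Optional[int]] = []
    for t, price in enumerate(prices):
        future = prices[t + 1 : t + 1 + horizon]
        if len(future) < horizon:
            labels.append(None)  # outcome not yet observable
        else:
            labels.append(int(min(future) <= price * (1 - threshold)))
    return labels


def weekly_walk_forward(
    n_days: int, min_train: int = 28, step: int = 7, horizon: int = 7
) -> Iterator[Tuple[range, range]]:
    """Yield (train_days, test_days) pairs for a weekly refit.

    Training stops `horizon` days before the test window starts, because the
    labels of later days depend on prices that are not yet known at refit time.
    """
    start = min_train
    while start < n_days:
        yield range(0, start - horizon), range(start, min(start + step, n_days))
        start += step
```

Each yielded pair says which days the parameters are estimated on and which days are then scored out of sample before the next refit.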
Firstly, the two parameters in my model were reasonably stable over time, a key property of a well-defined model. Secondly, the out-of-sample R² (a measure of predictive performance) was consistently positive, at around 5%.
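For readers unfamiliar with out-of-sample R², one common definition compares the model's squared forecast errors against those of a naive benchmark, such as the historical drop frequency. The sketch below uses that definition as an assumption. The post does not say exactly how the 5% figure was computed, and the function and argument names are hypothetical.

```python
from typing import Sequence


def out_of_sample_r2(
    actual: Sequence[float],
    predicted: Sequence[float],
    benchmark: Sequence[float],
) -> float:
    """1 - SSE(model) / SSE(benchmark), all measured on held-out data.

    Positive values mean the model beats the benchmark forecast;
    zero means it does no better; negative means it does worse.
    """
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_bench = sum((a - b) ** 2 for a, b in zip(actual, benchmark))
    return 1.0 - sse_model / sse_bench
```

Under this definition, an R² of around 5% means the model's squared errors were about 5% smaller than the benchmark's, which is modest but meaningful if it holds consistently out of sample.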
More concretely, and more actionably: in the dataset I was looking at, the price actually dropped by at least 10% in the following week 18% of the time. The model predicted a drop 13% of the time, and it was correct in 73% of those predictions.
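Those three percentages imply a few other figures that are worth spelling out. The short Python sketch below derives them from the numbers quoted above, using the standard definitions of precision and recall. The function name and the dictionary keys are just illustrative labels.

```python
def implied_rates(base_rate: float, predicted_rate: float, precision: float) -> dict:
    """Derive confusion-matrix shares from three headline numbers.

    base_rate:      share of all cases where the price really dropped
    predicted_rate: share of all cases where the model predicted a drop
    precision:      share of predicted drops that were correct
    """
    true_pos = predicted_rate * precision   # predicted a drop, and it dropped
    false_pos = predicted_rate - true_pos   # predicted a drop, but it did not
    missed = base_rate - true_pos           # dropped, but not predicted
    return {
        "true_positive_share": true_pos,
        "false_positive_share": false_pos,
        "missed_drop_share": missed,
        "recall": true_pos / base_rate,     # share of real drops that were caught
        "lift": precision / base_rate,      # precision relative to guessing
    }


rates = implied_rates(base_rate=0.18, predicted_rate=0.13, precision=0.73)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

With these inputs, the model catches roughly 53% of all real drops (0.0949 / 0.18), and a predicted drop is about four times as likely to happen as a randomly chosen one (0.73 / 0.18).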
With more feature data, such as flight duration, number of changes, oil prices and seasonality, I’m confident that the 13% could get closer to 18% and the 73% could be pushed even higher, maybe to 95%.