There are high hopes for applying deep learning to finance in areas such as predictive modelling, portfolio optimization, and market risk.
But the more common “frequentist” approaches being used can fall short in measuring model uncertainty.
That’s where Bayesian, or probabilistic, techniques step in.
In brief, frequentists put probability distributions on data given a hypothesis, while Bayesians put probability distributions on everything, hypotheses and data.
What this means is that quantifying uncertainty is left as more of an afterthought by frequentists, whereas Bayesian approaches are designed to model uncertainty, explained Matthew Dixon, assistant professor of statistics and finance at the Illinois Institute of Technology, and a co-founder of the Thalesians, speaking to MarketBrains.
In trading, that translates to a better characterization of relationships between assets over time, a more complete market outlook with real-time model updating and, ultimately, more efficient price discovery, hedging and adverse selection mitigation.
The flip side, however, is that Bayesian approaches tend to be much more computationally expensive because they carry around a whole distribution over the parameters rather than just a single point estimate.
Still, the types of data representation that deep learning permits are central constructs in deep probabilistic programming (DPP), a combination of deep learning and probabilistic programming.
Dixon’s research into DPP shows merit in exploring high-dimensional modelling problems that arise in large-scale portfolios, in systemic risk and contagion in financial networks, and in applications with spatial structure, such as limit order book modelling.
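To make the contrast concrete, here is a minimal sketch using only the Python standard library. It is not taken from Dixon's research: the data, the uniform Beta(1, 1) prior and the function names are illustrative assumptions. It estimates the probability of an up move from a short run of hypothetical daily price moves, first as a single frequentist number and then as a Beta posterior that also carries its own spread.

```python
from math import sqrt

def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return the (a, b) parameters of the Beta posterior after observing data.

    The Beta(1, 1) default prior is an assumption made for this sketch.
    """
    return prior_a + successes, prior_b + failures

def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

# Hypothetical data: 1 = the price went up that day, 0 = it did not.
moves = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
ups = sum(moves)
downs = len(moves) - ups

# Frequentist point estimate: one number, no built-in measure of uncertainty.
mle = ups / len(moves)

# Bayesian estimate: a full distribution, summarized here by mean and spread.
a, b = beta_posterior(ups, downs)
post_mean, post_std = beta_mean_std(a, b)

# New observations (3 ups, 1 down) update the current posterior directly.
a2, b2 = beta_posterior(3, 1, prior_a=a, prior_b=b)
new_mean, new_std = beta_mean_std(a2, b2)

print(f"point estimate: {mle:.3f}")
print(f"posterior: mean {post_mean:.3f}, std {post_std:.3f}")
print(f"after update: mean {new_mean:.3f}, std {new_std:.3f}")
```

This conjugate case is cheap because the posterior has a closed form. The computational cost the article refers to appears in models such as deep networks, where the distribution over parameters generally has to be approximated.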
And during a recent QuantMinds webinar, panellists from the market making, broker and fintech communities explained how DPP is being used, including for high frequency trading.
Programming for speed
Several panellists highlighted Edward, a Turing-complete probabilistic programming language built on TensorFlow that can be used directly from Python. Turing completeness means it can express any computation, including the common deterministic computations from which a TensorFlow computational graph is built.
David Bellot, senior quantitative researcher at Liquid Capital Group, an options market maker, said that representing the model as a computational graph that can be parallelized significantly reduces computation time, a major advantage in high frequency trading.
In terms of modelling, Bellot added, the advantage for a practitioner is being able to iterate over many ideas per day and try different possibilities, which was almost impossible before.
“When you look at financial market data, for example, in the world of high frequency trading, where you are going to have millions of data points per day to deal with, running and estimating one single model would take ages,” he said.
“With this kind of decompositions now, we can run a lot of different possibilities (and) compare models between them. In terms of speed that’s (a) massive benefit.”
The reality is that market patterns change quickly. An algorithm may have a profitable shelf life of just three or four months because of new market entrants and changes in trading venues’ or competitors’ technology, for example.
“Many times, you have to invent something new or slightly different and you need tools in which you can describe machine learning very quickly,” Bellot said. “That’s the benefit we see from this probabilistic programming language, which you didn’t have before.”
Also mentioned was Keras, a neural networks API developed for fast experimentation, written in Python and capable of running on top of TensorFlow.
Getting started is “deceptively” simple, however, and some discipline is needed, Dixon’s research shows.
“The programming convenience of Keras for TensorFlow is alluring, but the ability to build effective and interpretable financial models is still in its infancy and should be investigated more fully by cross-disciplinary researchers in computational science, statistics and finance, before any definitive and practical conclusions can be drawn,” Dixon wrote.
For quants starting out on DPP, Bellot had some advice: “Start with the basics in machine learning, try to understand both the frequentist and probabilistic approach (and) try to have a good understanding of the story behind each model.”
“When it’s about modelling this kind of financial data, in many cases we don’t need to have extremely complex models at first,” he said. “(Don’t) try to use a super deep Bayesian deep learning model just because it’s very fashionable, but to see what the data tells and where do they come from.”
Another panellist, Elijah De Palma, a quantitative research analyst at Charles Schwab, summarized an academic debate that suggests the traditional Bayesian approach of expressing prior knowledge as prior probability is perhaps too crude.
“The big paradigm distinction with probabilistic programming from frequentist statistics is that the probabilistic graphical models can be used to impose a structural model for causal inference, and that’s regardless of the underlying probability distributions on a model’s parameters, whether they be Bayesian or non-parametric,” said De Palma.
In other words, an algorithm that captures reality must impose a causal model that is more fundamental than the data itself, and probabilistic programming offers one approach.
Speaking to MarketBrains, Matthew Dixon explained that discretionary traders develop an intuition for market impact factors that often goes beyond the insight of financial time series.
For example, a natural gas trader will have an understanding of the fundamental factors affecting supply and may express that in a strategy.
Causal models formalize these relationships, allowing a market expert to translate what they know into a probabilistic model before fitting it to data, rather than trying to statistically infer relationships in data and then attempting to establish causality, Dixon explained.
There’s still a lot of convincing to do up the financial industry food chain, panellists noted.
For large fund managers, less nimble than smaller hedge funds, the key barrier to entry is acceptance by senior business leaders, who generally do not have a technical background and also have a lingering memory of the role of quant models leading up to the global financial crisis, said De Palma.
“Thus, key requirements for senior business leaders are clear interpretability and industry-wide consensus acceptance,” he said.
There has been some progress: last year the CFA Institute announced the inclusion of big data analysis and machine learning topics in its curriculum beginning in 2019, but interpretability is still a big barrier to acceptance for probabilistic programming, he added.
Tyler Ward, local search modeler at Google with previous stints at RBC and Morgan Stanley, said that the current state of machine learning, and model building in general, is at a stage that could be compared to pre-Newtonian physics.
“DPP, these sorts of approaches, are clearly moving us towards what the end state of mature machine learning has to look like,” Ward said. “The current situation looks (like) a lot of black boxes that kind of do their thing and you try to link them together.”
“These need to be linked together in a causal way and also so that they can potentially try to cancel out the uncertainty in each other,” he said.
During Ward’s time at RBC, the mortgage models were inexpensive and easy to defend: the team had a computational budget and some data and grew the model progressively adding effects, checking them with information criteria.
“The regulators ask you: why did you make this choice? The answer is we didn’t, it was made automatically based on some very basic criteria that are easy to explain.
“This is the major deficiency left in this field. So, likely the next generation will include a lot more logic and make this sort of thing easier, and there will be a lot less of this hand plugging in parameters,” Ward added.
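The procedure Ward describes, growing a model one effect at a time and checking each addition against an information criterion, can be sketched as greedy forward selection. The version below is an illustration only, not the RBC team's code: it assumes a linear regression scored by the Bayesian information criterion (BIC), and the feature names and synthetic data are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def bic(columns, y):
    """BIC of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    Xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((y[i] - sum(w * x for w, x in zip(beta, X[i]))) ** 2
              for i in range(n))
    # Gaussian-likelihood BIC up to a constant: fit term plus complexity penalty.
    return n * math.log(rss / n) + k * math.log(n)

def forward_select(features, y):
    """Greedily add the feature that lowers BIC most; stop when none does."""
    chosen = []
    best = bic([], y)
    while True:
        trials = {name: bic([features[c] for c in chosen + [name]], y)
                  for name in features if name not in chosen}
        if not trials:
            break
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        chosen.append(name)
        best = trials[name]
    return chosen, best

# Synthetic, hypothetical data: two real effects and one irrelevant column.
random.seed(0)
n = 200
features = {
    "rate_change": [random.gauss(0, 1) for _ in range(n)],
    "loan_age": [random.gauss(0, 1) for _ in range(n)],
    "noise": [random.gauss(0, 1) for _ in range(n)],
}
y = [2.0 * features["rate_change"][i] - 1.0 * features["loan_age"][i]
     + random.gauss(0, 0.5) for i in range(n)]

chosen, best = forward_select(features, y)
print("selected effects:", chosen)
```

Every inclusion decision here follows from a single rule, "add the effect only if it lowers the criterion", which is the kind of easily explained, automatic choice Ward says regulators accept.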
Yam Peleg, founder and chairman of Deep Trading, an AI fintech, noted that Bayesian deep learning is superior in that statistical inference does not rest on a single point estimate of the weights, which is “extremely crucial in finance and quantitative trading as a whole.”
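A toy illustration of that point, with a one-weight linear model standing in for a deep network: when the weight is treated as a distribution rather than a single value, predictions far from the training data come with visibly wider error bars. The data, the Gaussian prior and the known noise level below are all assumptions made for this sketch, not anything presented by the panel.

```python
from math import sqrt

def posterior_weight(xs, ys, noise_std=1.0, prior_std=10.0):
    """Posterior mean and variance of w in y = w * x + noise.

    Assumes Gaussian noise with known noise_std and a N(0, prior_std^2) prior.
    """
    precision = 1.0 / prior_std ** 2 + sum(x * x for x in xs) / noise_std ** 2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_std ** 2) / precision
    return mean, 1.0 / precision

def predict(x_new, w_mean, w_var, noise_std=1.0):
    """Predictive mean and standard deviation at x_new.

    Passing w_var=0.0 treats the weight as a single known point.
    """
    mean = w_mean * x_new
    std = sqrt(noise_std ** 2 + x_new ** 2 * w_var)
    return mean, std

# Hypothetical training data, all close to the origin.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.4, 1.3, 1.2, 2.3]

w_mean, w_var = posterior_weight(xs, ys)

# Predict well outside the training range, at x = 10.
point_mean, point_std = predict(10.0, w_mean, 0.0)    # single point weight
bayes_mean, bayes_std = predict(10.0, w_mean, w_var)  # distribution over weight

print(f"point weight:    {point_mean:.2f} +/- {point_std:.2f}")
print(f"weight posterior: {bayes_mean:.2f} +/- {bayes_std:.2f}")
```

Both versions give the same central forecast. Only the second signals that the forecast at x = 10 rests on an extrapolation the data barely supports.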