Learning algorithms can be abstract, and most of us are wary of their biases and predictions. We get especially wary when we use them to drive our decision making. It’s also hard for most of us to think probabilistically about the world, although it’s “probably” better for us. CEOs and executives are no different: they have their fears and concerns about the choices and decisions handed to them by a probabilistic model. In part 1 of this blog I talked about establishing trust by keeping algorithms and their abstractions simple. The following practice is essential to cement this trust in the data science way.
Make it explainable
Making one’s predictions explainable is key to giving an audience control over the decisions. CEOs like to be in control, and it’s absolutely normal to demand explainability when a capital allocation recommendation is driven by an algorithm. One trick I like to use is to deploy charts that explain how a prediction was made, like a sunburst chart with a word cloud showing which variables and scenarios led to the highest likelihood, or partial dependence plots when tree-based algorithms are used. I highly recommend looking into explained.ai for some good insights on how to explain tree-based algorithms.
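To make the partial dependence idea concrete, here is a minimal sketch of how such a curve can be computed by hand. It assumes only NumPy; `toy_model`, the random data, and the grid are hypothetical stand-ins for a fitted model and real features, not anything from a real project.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for every row
        averages.append(predict(X_mod).mean())
    return np.array(averages)

# Hypothetical stand-in for a fitted model: any callable that maps
# a 2-D array of rows to a 1-D array of predictions would work here.
def toy_model(X):
    return 3.0 * X[:, 0] + np.sin(X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))        # made-up data, two features
grid = np.linspace(-2.0, 2.0, 5)     # values to sweep feature 0 over

pd_curve = partial_dependence_1d(toy_model, X, feature=0, grid=grid)
for value, avg in zip(grid, pd_curve):
    print(f"feature_0 = {value:+.1f} -> average prediction {avg:+.3f}")
```

Plotting `pd_curve` against `grid` gives the chart: it shows how the average prediction moves as one variable changes while everything else is left as observed. For real tree-based models, scikit-learn's `sklearn.inspection` module provides ready-made partial dependence tools.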
Admittedly, there is an explainability crisis in some areas of machine learning, so explaining how confident we are in a certain prediction is a huge step forward. This can come in the form of confidence intervals for our predictions. It’s better to provide probabilistic predictions than deterministic ones, i.e. we should aspire to show confidence intervals for our predictions when we can, or the class probabilities of the classifier that we use. The challenge is that once we go beyond linear regression, providing confidence intervals becomes much harder.
Recently I have been spending time researching how to replicate traditional deep learning techniques using their Bayesian equivalents in order to bring confidence intervals to model predictions. There are some great examples of how to do that; Yarin Gal’s 2016 thesis is a good place to start. The idea is quite revolutionary and “simple”: by keeping dropout active at inference, not only during training, and running several stochastic forward passes, we can treat the spread of the outputs as an approximation to Bayesian uncertainty and attach confidence intervals to our neural network predictions. This means that, for classification problems, we can make an algorithm return something like “I don’t know” when the confidence is low. Having low confidence here is a feature, not a bug.
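The mechanics of dropout at inference can be sketched in a few lines. This is an illustration under stated assumptions, not a faithful reproduction of the thesis: the weights `W1` and `W2` are random placeholders standing in for a trained two-class network, and the dropout rate, number of passes, and 0.8 abstention threshold are hypothetical values you would tune.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder weights standing in for a trained 2-class network (hypothetical).
W1 = rng.normal(size=(4, 32))
W2 = rng.normal(size=(32, 2))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward_with_dropout(x, drop_rate, rng):
    hidden = np.maximum(x @ W1, 0.0)                # ReLU hidden layer
    keep = rng.random(hidden.shape) >= drop_rate    # dropout stays ON at inference
    hidden = hidden * keep / (1.0 - drop_rate)      # inverted-dropout scaling
    return softmax(hidden @ W2)

def mc_dropout_predict(x, n_samples=200, drop_rate=0.5, threshold=0.8, rng=rng):
    # Each pass drops a different random subset of units, so outputs vary.
    samples = np.stack(
        [forward_with_dropout(x, drop_rate, rng) for _ in range(n_samples)]
    )
    mean = samples.mean(axis=0)   # predictive class probabilities
    std = samples.std(axis=0)     # spread across passes = uncertainty estimate
    label = int(mean.argmax())
    decision = label if mean[label] >= threshold else "I don't know"
    return decision, mean, std

decision, mean, std = mc_dropout_predict(np.array([0.5, -1.0, 0.3, 2.0]))
print("decision:", decision)
print("mean probabilities:", mean.round(3), "std:", std.round(3))
```

The design choice worth noting is the abstention rule: the model only commits to a class when the averaged probability clears the threshold, which is one simple way to get the “I don’t know” behaviour described above.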
Last year DeepMind published a paper on neural processes, which mimic Gaussian processes in the sense that we get confidence intervals for the predictions without sacrificing computational efficiency. This too turns out to be a very powerful technique for providing uncertainty to our predictions. Saying that this is useful is an understatement. The majority of machine learning models can’t extrapolate to new data points, yet they still return a “confident” prediction. This makes predictions fairly problematic for unseen scenarios.
If using Bayesian approximation or Bayesian data analytics is out of reach, a good practice is to look for shifts in the data between what the model was trained on and the new data used at inference. It’s a method that I’ve used in production-level systems to catch when a prediction is most vulnerable because of a shift in distribution, i.e. when the data is non-stationary. This is a fairly common issue with capital market trading data, machinery health and integrity, etc., where the machine learning system is constantly “degrading”.
With varying degrees of success, I’ve used a technique from anomaly detection to assess whether or not a distribution has shifted and therefore whether one’s confidence in a prediction should go down. By building a one-class classifier (OCC) on the “normal” distribution, we can detect an abnormal or new distribution and raise an alarm. Obviously there is a bit more to it, but you get the idea. There is no free lunch here: this adds complexity to a learning system and more models to handle.
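As a rough sketch of the idea, the snippet below fits a very simple one-class model on “normal” data and raises an alarm when too many points in a new batch fall outside it. To keep it dependency-light I have swapped in a Gaussian envelope (Mahalanobis distance) as the one-class model, which is an assumption of this example and not necessarily what a production system would use; scikit-learn's `OneClassSVM` or `IsolationForest` could play the same role. The class name, the synthetic data, and the 10% alarm rate are all hypothetical.

```python
import numpy as np

class GaussianOneClass:
    """One-class model: flags points unusually far from the training data."""

    def fit(self, X, quantile=0.99):
        self.mean_ = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        self.precision_ = np.linalg.pinv(np.atleast_2d(cov))
        # Distances beyond this training quantile count as "abnormal".
        self.threshold_ = np.quantile(self._distance(X), quantile)
        return self

    def _distance(self, X):
        diff = X - self.mean_
        # Squared Mahalanobis distance of each row from the training mean.
        return np.einsum("ij,jk,ik->i", diff, self.precision_, diff)

    def outlier_fraction(self, X):
        return float((self._distance(X) > self.threshold_).mean())

rng = np.random.default_rng(7)
train = rng.normal(loc=0.0, scale=1.0, size=(2000, 3))    # "normal" regime
same = rng.normal(loc=0.0, scale=1.0, size=(500, 3))      # new data, no shift
shifted = rng.normal(loc=2.0, scale=1.0, size=(500, 3))   # new data, mean shifted

detector = GaussianOneClass().fit(train)
ALARM_RATE = 0.10  # hypothetical alarm level; tune on held-out data

for name, batch in [("same", same), ("shifted", shifted)]:
    frac = detector.outlier_fraction(batch)
    status = "ALARM: lower confidence in predictions" if frac > ALARM_RATE else "ok"
    print(f"{name}: {frac:.1%} outside the normal region -> {status}")
```

Working at the batch level rather than per point is a deliberate choice here: a single outlier is expected now and then, but a batch with a high outlier fraction is a sign that the distribution itself has moved.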
Ultimately we have to pull out all the stops to bring trust to what our analysis is doing. Life would be a lot simpler if we could fit the world in two dimensions, but it would also be a lot less interesting.