Interpretability

Understand the mechanics behind “black box” machine learning model predictions

Machine learning models are often referred to as “black boxes” because their representations of knowledge are not intuitive and, as a result, it is difficult to understand how they work. Interpretability refers to techniques that overcome the black-box nature of most machine learning algorithms.

By revealing how various features contribute (or do not contribute) to predictions, you can validate that the model is using the right evidence for its predictions and find model biases that were not apparent during training. Some machine learning models, such as linear regression, decision trees, and generalized additive models, are inherently interpretable. However, interpretability often comes at the expense of predictive power and accuracy (Figure 1).

Figure 1: Trade-off between model performance and interpretability.

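For a sense of what “inherently interpretable” means in practice, the sketch below fits a linear regression whose coefficients can be read directly, alongside a bagged tree ensemble that is typically more accurate but opaque. The table tbl and the variables MPG, Horsepower, and Weight are hypothetical placeholders, not part of any particular example.

    % Minimal sketch, assuming a table "tbl" of predictors with a numeric
    % response "MPG" (hypothetical variable names).
    mdl = fitlm(tbl, 'MPG ~ Horsepower + Weight');   % inherently interpretable linear model
    disp(mdl.Coefficients)                           % each coefficient has a direct meaning

    ens = fitrensemble(tbl, 'MPG', 'Method', 'Bag'); % typically more accurate, but a black box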

Applying Interpretability

Practitioners seek model interpretability primarily for three reasons:

  • Debugging: Understanding where or why predictions go wrong and running “what-if” scenarios can improve model robustness and eliminate bias.
  • Guidelines: “Black box” models conflict with many corporate technology best practices and personal preferences.
  • Regulations: Model interpretability is required to comply with government regulations for sensitive applications such as in finance, public health, and transportation.

Model interpretability addresses these concerns and increases trust in the models in situations where explanations for predictions are important or required by regulation.

Interpretability can be applied on three levels as shown in Figure 2 below.

  • Local: Explaining the factors behind an individual prediction, such as why a loan application was rejected
  • Cohort: Demonstrating how a model makes predictions for a specific population or group within a training or test data set, such as why a group of manufactured products was classified as faulty
  • Global: Understanding how a machine learning model works over an entire training or test data set, such as which factors are considered by a model classifying radiology images
Figure 2: Use cases for model interpretability.


Using Interpretability Techniques in MATLAB

Using MATLAB® for machine learning, you can apply techniques to interpret and explain popular, highly accurate machine learning models that aren’t inherently interpretable.

Local interpretable model-agnostic explanations (LIME): Approximate a complex model in the neighborhood of the prediction of interest with a simple interpretable model, such as a linear model or decision tree, and use it as a surrogate to explain how the original (complex) model works. Figure 3 below illustrates the three main steps of applying LIME.

Figure 3: By fitting a simple interpretable model with a lime object, you can obtain LIME explanations in MATLAB.

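As a rough sketch of this workflow (assuming a trained classification or regression model mdl, its predictor data X, and a query point of interest; the variable names are illustrative):

    % Sketch of LIME in MATLAB, assuming a trained model "mdl" and its
    % predictor data "X" (illustrative names).
    queryPoint = X(1,:);          % prediction of interest to explain
    explainer = lime(mdl, X);     % create a LIME explainer for the black-box model
    explainer = fit(explainer, queryPoint, 4, 'SimpleModelType', 'linear');
                                  % fit a simple linear surrogate using the 4 most important predictors
    plot(explainer)               % compare black-box and surrogate predictions and show predictor weights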

Partial dependence (PDP) and individual conditional expectation (ICE) plots: Examine the effect of one or two predictors on the overall prediction by averaging the model output over the values of the remaining features. Figure 4 below shows a partial dependence plot that was generated with the MATLAB function plotPartialDependence.

Figure 4: Partial dependence plot showing that whether x1 is above or below 3000 makes a big difference in the prediction.

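A sketch of generating both kinds of plot (assuming a trained regression model mdl with a predictor named x1; the names are illustrative):

    % Sketch of PDP and ICE plots, assuming a trained regression model "mdl"
    % with a predictor named "x1" (illustrative names).
    plotPartialDependence(mdl, 'x1')                             % PDP: averaged effect of x1
    plotPartialDependence(mdl, 'x1', 'Conditional', 'absolute')  % ICE: one curve per observation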

Shapley values: Explain how much each predictor contributes to a prediction by calculating the deviation of a prediction of interest from the average prediction. This method is popular in the finance industry because it is derived from game theory and satisfies the regulatory requirement of providing complete explanations: the sum of the Shapley values for all features corresponds to the total deviation of the prediction from the average. The MATLAB function shapley computes Shapley values for a query point of interest.

Figure 5: The Shapley values indicate how much each predictor contributes to the deviation of the prediction at the point of interest from the average prediction.

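A sketch of computing and plotting Shapley values at a query point (assuming a trained model mdl and its predictor data X; the names are illustrative):

    % Sketch of Shapley values in MATLAB, assuming a trained model "mdl"
    % and its predictor data "X" (illustrative names).
    queryPoint = X(1,:);                                    % prediction of interest
    explainer = shapley(mdl, X, 'QueryPoint', queryPoint);  % compute Shapley values at the query point
    plot(explainer)                                         % bar chart of each predictor's contribution
    explainer.ShapleyValues                                 % table of the underlying values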

Evaluating all combinations of features generally takes a long time. Therefore, in practice, Shapley values are often approximated with Monte Carlo simulation.

MATLAB also supports permuted predictor importance for random forests. This technique shuffles the values of each predictor in a training or test data set and measures the resulting change in the model’s prediction error; the larger the increase in error, the more important that predictor is to the model.
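
A sketch of this approach using TreeBagger, MATLAB’s random forest implementation (assuming predictor data X and a numeric response Y; the names are illustrative):

    % Sketch of permuted (out-of-bag) predictor importance for a random forest,
    % assuming predictor data "X" and a numeric response "Y" (illustrative names).
    rng('default')                                % for reproducibility
    forest = TreeBagger(100, X, Y, 'Method', 'regression', ...
        'OOBPredictorImportance', 'on');
    imp = forest.OOBPermutedPredictorDeltaError;  % increase in out-of-bag error when each predictor is shuffled
    bar(imp)                                      % larger values indicate more important predictors
    xlabel('Predictor'); ylabel('Importance')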

Choosing a Method for Interpretability

Figure 6 provides an overview of inherently explainable machine learning models, various (model-agnostic) interpretability methods, and guidance on when to apply them.

Figure 6: How to select the appropriate interpretability method.


Interpretability methods have their own limitations. A best practice is to be aware of those limitations as you apply these methods to your use cases. Interpretability tools help you understand why a machine learning model makes the predictions that it does, which is a key part of verifying and validating applications of AI. Certification bodies are currently working on a framework for certifying AI for sensitive applications such as autonomous transportation and medicine.

See also: artificial intelligence, machine learning, supervised learning, deep learning, AutoML