Understand the mechanics behind “black box” machine learning model predictions

Machine learning models are often referred to as “black boxes” because their representations of knowledge are not intuitive and, as a result, it is difficult to understand how they work. Interpretable machine learning refers to techniques that overcome the black-box nature of most machine learning algorithms. By revealing how various features contribute (or do not contribute) to predictions, you can validate that the model is using the right evidence for its predictions and find model biases that were not apparent during training.

Practitioners seek model interpretability primarily for three reasons:

  1. Guidelines: “Black box” models conflict with many corporate technology best practices and with personal preferences.
  2. Validation: It’s valuable to understand where or why predictions go wrong and run “what-if” scenarios to improve model robustness and eliminate bias.
  3. Regulations: Model interpretability is required to comply with government regulations for sensitive applications, such as in finance, public health, and transportation.

Interpretable machine learning addresses these concerns and increases trust in the models in situations where explanations for predictions are important or required by regulation.

Interpretable machine learning works on three levels:

Local: Explaining the factors behind an individual prediction, such as why a loan application was rejected.

Cohort: Demonstrating how a model makes predictions for a specific population or group within a training or test data set, such as why a group of manufactured products was classified as faulty.

Global: Understanding how a machine learning model works over an entire training or test data set, such as which factors are considered by a model classifying radiology images.

Some machine learning models, such as linear regression and decision trees, are inherently interpretable. However, interpretability often comes at the expense of predictive power and accuracy.

Figure 1: Trade-off between model performance and explainability.

Using MATLAB® for machine learning, you can apply techniques to interpret and explain most popular and highly accurate machine learning models that aren’t inherently interpretable.

Local Interpretable Model-Agnostic Explanations (LIME): Approximate a complex model in the neighborhood of the prediction of interest with a simple interpretable model, such as a linear model or decision tree, and use it as a surrogate to explain how the original (complex) model behaves near that prediction. Figure 2 below illustrates the three main steps of applying LIME.


Figure 2: How to obtain Local Interpretable Model-Agnostic Explanations (LIME).
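
The surrogate idea can be sketched in a few lines. The example below is a minimal illustration in plain Python, not the MATLAB implementation: the `black_box` function, the Gaussian sampling scale, and the kernel width are all assumptions chosen for the example. It follows one common description of LIME for numeric data: generate synthetic samples around the query point, score them with the complex model, and fit a proximity-weighted linear model to those scores.

```python
import math
import random


def black_box(x):
    """Hypothetical complex model (an assumption for this sketch)."""
    return math.sin(x[0]) + x[1] ** 2


def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def lime_explain(model, query, n_samples=500, scale=0.3, kernel_width=0.5, seed=0):
    """Fit a weighted linear surrogate to `model` around `query`."""
    rng = random.Random(seed)
    k = len(query) + 1
    # Step 1: generate synthetic samples in the neighborhood of the query point.
    samples = [[q + rng.gauss(0.0, scale) for q in query] for _ in range(n_samples)]
    # Step 2: score the samples with the complex model and weight them by proximity.
    preds = [model(s) for s in samples]
    weights = [
        math.exp(-sum((si - qi) ** 2 for si, qi in zip(s, query)) / kernel_width ** 2)
        for s in samples
    ]
    # Step 3: weighted least squares on the design rows [1, x - query].
    rows = [[1.0] + [si - qi for si, qi in zip(s, query)] for s in samples]
    xtwx = [
        [sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
        for i in range(k)
    ]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, preds)) for i in range(k)]
    coef = solve_linear_system(xtwx, xtwy)
    return coef[0], coef[1:]  # local intercept, local slope per feature


intercept, slopes = lime_explain(black_box, [0.0, 1.0])
print("local intercept:", intercept)
print("local slopes:", slopes)
```

The slopes of the surrogate are the explanation: they indicate how strongly, and in which direction, each feature moves the prediction near the query point. They say nothing reliable about the model far from that point.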

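Partial Dependence and Individual Conditional Expectation Plots: Examine the effect of one or two predictors on the overall prediction. A partial dependence plot sets the predictors of interest to each value on a grid and averages the model output over the observed values of the remaining features; an individual conditional expectation plot shows the same curve separately for each observation.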

Figure 3: Partial dependence plot showing that whether x1 is above or below 3000 makes a big difference to the prediction.
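
As a rough illustration of how these curves are computed, the following plain-Python sketch builds individual conditional expectation curves and averages them into a partial dependence curve. It is not the MATLAB implementation; the `model` function (with a jump at x1 = 3000, loosely echoing the figure), the four data rows, and the grid are hypothetical values made up for the example.

```python
def model(row):
    """Hypothetical fitted model: a jump at x1 = 3000 plus a linear x2 effect."""
    x1, x2 = row
    base = 10.0 if x1 < 3000 else 25.0
    return base + 0.5 * x2


def ice_curves(predict, data, feature, grid):
    """One curve per observation: predictions as `feature` sweeps the grid."""
    curves = []
    for row in data:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, data, feature, grid):
    """Pointwise average of the ICE curves over all observations."""
    curves = ice_curves(predict, data, feature, grid)
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]


# Hypothetical observations (x1, x2) and a grid of x1 values to sweep.
data = [[2500, 4.0], [3200, 8.0], [2900, 6.0], [3500, 2.0]]
grid = [2000, 2800, 3200, 4000]

ice = ice_curves(model, data, 0, grid)
pd_curve = partial_dependence(model, data, 0, grid)
print("ICE curve for first observation:", ice[0])
print("partial dependence on x1:", pd_curve)
```

Because the partial dependence curve is an average, it can hide cases where individual observations respond in opposite directions; plotting the ICE curves alongside it makes such differences visible.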

You can use MATLAB for other popular interpretability methods, including:

  • Permuted Predictor Importance: Measure a model's prediction error on a test or training data set, then shuffle the values of one predictor and measure the error again. The magnitude of the change in error caused by shuffling corresponds to that predictor’s importance.
  • Shapley Value: Derived from cooperative game theory, the Shapley value is the average marginal contribution of a specific feature over all possible “coalitions,” i.e., combinations of features. Evaluating all feature combinations generally takes a long time, so in practice Shapley values are approximated using Monte Carlo simulation.
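
Both methods in the list above can be sketched briefly. The plain-Python example below is an illustration under stated assumptions, not the MATLAB implementation: the `model` function, the synthetic data, the number of repeats, and the sampling scheme (random feature orderings combined with random reference rows) are choices made for the example. The model deliberately ignores its third feature so that both methods should assign it zero.

```python
import random


def model(row):
    """Hypothetical fitted model: uses x0 and x1, ignores x2."""
    return 3.0 * row[0] + 1.0 * row[1]


def mean_squared_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, feature, n_repeats=20, seed=0):
    """Average increase in error after shuffling one predictor's column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    total_increase = 0.0
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        total_increase += mean_squared_error(predict, shuffled, targets) - baseline
    return total_increase / n_repeats


def shapley_monte_carlo(predict, query, background, n_iter=2000, seed=0):
    """Estimate Shapley values by sampling feature orderings and reference rows."""
    rng = random.Random(seed)
    d = len(query)
    phi = [0.0] * d
    for _ in range(n_iter):
        order = list(range(d))
        rng.shuffle(order)
        current = list(rng.choice(background))  # start from a random reference row
        previous = predict(current)
        for j in order:
            current[j] = query[j]  # feature j joins the coalition
            value = predict(current)
            phi[j] += value - previous  # marginal contribution of feature j
            previous = value
    return [p / n_iter for p in phi]


# Synthetic data (an assumption): 200 rows with 3 features drawn from [-1, 1].
data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
targets = [model(r) for r in rows]

importances = [permutation_importance(model, rows, targets, j) for j in range(3)]
shapley = shapley_monte_carlo(model, [1.0, 1.0, 1.0], rows)
print("permutation importances:", importances)
print("Shapley estimates for query [1, 1, 1]:", shapley)
```

Permutation importance is a global measure (one number per predictor for the whole data set), while the Shapley estimates explain a single query point. The Shapley estimates sum to roughly the difference between the query's prediction and the average prediction over the sampled reference rows.
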

In summary, the main use cases for model interpretability are:

What's explained:
  • Local: Individual prediction
  • Cohort: Model behavior on a subset of the population
  • Global: Model behavior "anywhere"

Use cases:
  • Local: When an individual prediction goes wrong; when a prediction seems counter-intuitive; what-if analysis
  • Cohort: Protection against bias; validating the outcome for a particular group
  • Global: Demonstrating how the model works; comparing different models for deployment

Applicable interpretability methods:
  • Local: Local decision tree; Shapley value
  • Cohort: Global methods on a subset of data
  • Global: Global decision tree; feature importance

Interpretability methods have their own limitations. A best practice is to be aware of those limitations as you fit these algorithms to the various use cases. Interpretability tools help you understand why a machine learning model makes the predictions that it does, which is a key part of verifying and validating applications of AI. Certification bodies are currently working on a framework for certifying AI for sensitive applications such as autonomous transportation and medicine.