Interpretability is the degree to which machine learning algorithms can be understood by humans. More specifically, interpretability describes the ability to understand the reasoning behind predictions and decisions made by a machine learning model.
Interpretability techniques help to reveal how machine learning models make predictions. By uncovering how various features contribute (or do not contribute) to predictions, interpretability techniques can help you validate that a machine learning model is using appropriate evidence for predictions and find biases in your model that were not apparent during training. Some machine learning models, such as linear regression, decision trees, and generalized additive models, are inherently interpretable. However, interpretability often comes at the expense of power and accuracy, as illustrated in Figure 1.
Interpretability versus Explainability
Explainable AI is an emerging field where the closely related terms interpretability and explainability are often used interchangeably. However, interpretability and explainability are different. Explainability refers to explaining the behavior of a machine learning model in human terms, without necessarily understanding the model’s inner mechanisms. Explainability can also be viewed as model-agnostic interpretability.
For engineers, one approach to explain the behavior of a system is by using first principles. A first-principles model has clear, explainable physical meaning, and its behavior can be parameterized. That type of model is known as “white box.” The behavior of machine learning models is more “opaque.”
Machine learning models vary in complexity, intuitiveness of knowledge representation, and, thus, difficulty to fully understand how they work. Machine learning models can be “gray box,” in which case you can apply interpretability techniques to understand their inner mechanisms, or “black box,” in which case you can apply explainability (or model-agnostic interpretability) techniques to understand their behavior. Deep learning models are typically black box.
Global and Local Interpretability Methods
Interpretability is typically applied at two levels:
- Global Methods: These interpretability methods provide an overview of the most influential variables in the model based on input data and predicted output.
- Local Methods: These interpretability methods provide an explanation of a single prediction result.
Figure 3 illustrates the difference between the local and global scope of interpretability. You can also apply interpretability to groups within your data and arrive at conclusions at the group level, such as why a group of manufactured products was classified as faulty.
Get Started with Interpretability Examples in MATLAB
Popular techniques for local interpretability include local interpretable model-agnostic explanations (LIME) and Shapley values. For global interpretability, many users start with feature ranking (or importance) and visualizing partial dependence plots. You can apply all of these techniques using MATLAB®.
Engineers and scientists seek model interpretability for three main reasons:
- Debugging: Understanding where or why predictions go wrong. Running “what-if” scenarios can improve model robustness and eliminate bias.
- Guidelines: Using black-box or gray-box models may conflict with industry best practices that call for transparent, well-understood models.
- Regulations: Some government regulations require interpretability for sensitive applications, such as finance, public health, and transportation.
Model interpretability addresses these concerns and increases trust in the models in situations where explanations for predictions are important, such as when comparing results between competing models, or necessary, such as when interpretability is required by regulation.
Applications Where Interpretability Matters
Interpretability tools help you understand why a machine learning model makes the predictions that it does. Interpretability is likely to become increasingly relevant as regulatory and professional bodies continue to work toward a framework for certifying AI for sensitive applications, such as those in finance, public health, and transportation.
Local Interpretable Model-Agnostic Explanations (LIME): Use LIME to approximate a complex model in the neighborhood of the prediction of interest with a simple interpretable model, such as a linear model or decision tree. You can then use the simpler model as a surrogate to explain how the original (complex) model works. Figure 4 illustrates the three main steps of applying LIME.
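The three steps above can be sketched in a toy, self-contained Python example (illustrative only, not the MATLAB implementation; the `black_box` model, kernel width, and one-dimensional setting are all made up for clarity): perturb the query point, weight samples by proximity, and fit a simple weighted linear surrogate.

```python
import math
import random

# Hypothetical "complex" model: a nonlinear decision function of one feature.
def black_box(x):
    return 1.0 if x * x > 4.0 else 0.0  # decision boundary at |x| = 2

def lime_1d(model, x0, n_samples=500, width=0.5, seed=0):
    """Fit a local, proximity-weighted linear surrogate around query point x0."""
    rng = random.Random(seed)
    # Step 1: perturb the query point to generate a local dataset.
    xs = [x0 + rng.gauss(0, width) for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    # Step 2: weight samples by proximity to x0 (Gaussian kernel).
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    # Step 3: weighted least-squares fit of y = a*x + b (closed form).
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = cov / var
    b = my - a * mx
    return a, b  # local slope and intercept explain the model near x0

slope, intercept = lime_1d(black_box, x0=1.9)
print(slope)  # positive: near x0 = 1.9, increasing x pushes toward class 1
```

The surrogate's slope serves as the explanation: near the query point, it tells you in which direction, and how strongly, the feature moves the complex model's output.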
Partial Dependence (PDP) and Individual Conditional Expectation (ICE) Plots: With these methods, you examine the effect of one or two predictors on the overall prediction by varying the predictor(s) of interest over a grid of values and averaging the model output over the observed values of all other predictors. Figure 5 shows an example partial dependence plot.
Strictly speaking, a partial dependence plot shows that certain ranges in the value of a predictor are associated with specific likelihoods for prediction; that’s not sufficient to establish a causal relationship between predictor values and prediction. However, if a local interpretability method like LIME indicates the predictor significantly influenced the prediction (in an area of interest), you can arrive at an explanation why a model behaved a certain way in that local area.
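The partial dependence computation can be sketched in a few lines of Python (a toy illustration with a made-up two-feature model and dataset, not the MATLAB implementation): for each grid value, fix the feature of interest at that value for every row of the data and average the predictions.

```python
# Hypothetical model of two features; we compute partial dependence on feature 0.
def model(row):  # row = [x0, x1]
    return 3.0 * row[0] + row[0] * row[1]

def partial_dependence(model, data, feature, grid):
    """For each grid value v, fix `feature` at v for every row in the data
    and average the model's predictions over the rest of the row."""
    pd = []
    for v in grid:
        preds = []
        for row in data:
            row = list(row)
            row[feature] = v  # override only the feature of interest
            preds.append(model(row))
        pd.append(sum(preds) / len(preds))
    return pd

data = [[0.0, -1.0], [1.0, 0.0], [2.0, 1.0]]  # x1 averages to 0
grid = [0.0, 1.0, 2.0]
print(partial_dependence(model, data, feature=0, grid=grid))  # [0.0, 3.0, 6.0]
```

Because the interaction term averages out over the data, the plot recovers the marginal effect of feature 0 (slope 3); an ICE plot would instead keep one curve per row rather than averaging them.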
Shapley Values: This technique explains how much each predictor contributes to a prediction by calculating the deviation of the prediction of interest from the average prediction. The method is particularly popular in the finance industry because it has its theoretical underpinnings in cooperative game theory, and because it satisfies the regulatory requirement of providing complete explanations: the sum of the Shapley values over all features equals the total deviation of the prediction from the average. The shapley function computes Shapley values for a query point of interest.
Evaluating all combinations of features generally takes a long time. Therefore, Shapley values are often approximated by applying Monte Carlo simulation in practice.
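The exact computation, and the completeness property, can be sketched in Python for a tiny model (an illustrative toy with a made-up model and baseline, not the MATLAB `shapley` implementation; real tools replace the full enumeration below with Monte Carlo sampling):

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values: weight each feature's marginal contribution
    over all coalitions, with absent features set to a baseline value."""
    n = len(x)

    def v(subset):  # model output with only `subset` features taken from x
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (v(set(S) | {i}) - v(set(S)))
    return phi

# Hypothetical model with an interaction term between its two features.
def model(z):
    return z[0] + 2 * z[1] + z[0] * z[1]

phi = shapley_values(model, x=[1.0, 1.0], baseline=[0.0, 0.0])
print(phi)       # [1.5, 2.5] -- the interaction is split between the features
print(sum(phi))  # 4.0 = f(x) - f(baseline): the "complete explanation" property
```

Note that the loop over coalitions grows as 2^n in the number of features, which is exactly why Monte Carlo approximation is used in practice.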
Figure 6 shows that in the context of predicting heart arrhythmia near the sample of interest, MFCC4 had a strong positive impact on predicting "abnormal," while MFCC11 and MFCC5 leaned against that prediction, i.e., toward a "normal" heart.
Predictor Importance Estimations by Permutation: MATLAB also supports permuted predictor importance for random forests. This approach uses the change in model prediction error caused by changes in a predictor's values as an indication of that predictor's importance. The function shuffles the values of a predictor on test or training data and observes the magnitude of the resulting change in error.
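The shuffling idea is simple enough to sketch from scratch in Python (a toy with made-up data and a hand-written "model," not the MATLAB function): an important predictor's column, once shuffled, should noticeably increase the error, while shuffling an irrelevant one should not.

```python
import random

# Toy dataset: the target depends only on feature 0, never on feature 1.
rng = random.Random(42)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [5 * row[0] for row in X]

def model(row):  # a "trained" model that recovered the true rule
    return 5 * row[0]

def mse(X, y):
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature, seed=0):
    """Shuffle one feature's column and report the increase in error."""
    base = mse(X, y)
    col = [row[feature] for row in X]
    random.Random(seed).shuffle(col)
    Xp = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, col)]
    return mse(Xp, y) - base

print(permutation_importance(X, y, feature=0))  # large: feature 0 matters
print(permutation_importance(X, y, feature=1))  # 0.0: feature 1 is irrelevant
```

In practice the shuffle is repeated several times and the increases averaged, so the estimate does not hinge on a single random permutation.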
Choosing a Method for Interpretability
Different interpretability methods have their own strengths and limitations; take these into account when you seek interpretability in your application. Figure 7 provides an overview of interpretability methods and guidance on how to apply them; the methods it presents are available in MATLAB.
- Overview of Interpretability in MATLAB - Documentation
- Applying Visualization and Interpretability to Deep Neural Networks - Documentation
- Model Interpretability in MATLAB (5:49) - Video
- Lowering Barriers to AI Adoption with AutoML and Interpretability (35:11) - Video
- Machine Learning Tutorials and Examples with MATLAB - Overview
- Explainable AI (XAI): Are we there yet? - Blog