
Demystifying Model Interpretability: A Deep Dive into SHAP values

by Veronica Scerra

So far in this series, we've explored PDPs for their global perspective and ICE plots for their individual-level insights. Here, we'll dive into a tool that elegantly bridges the gap between these two, offering both granular and aggregate views of feature importance. Meet SHAP (SHapley Additive exPlanations) values - a powerful, theoretically grounded approach that lets us peek inside the black box and see how each feature contributes to individual predictions.

TL;DR

SHAP
What: Decomposes a model's prediction into per-feature contributions using game theory (fun!)
Use When: You want local interpretability and global insights
Assumptions: Feature independence (though Kernel SHAP and Tree SHAP try to work around this)
Alternatives: ICE, LIME, ALE, PDP

What are SHAP values?

Imagine your model is a team of players working together to make a prediction - say, whether a borrower will default on a loan. Each feature is a player: credit score, income, debt-to-income ratio, age, etc. How much does each of these players contribute to the final prediction? That’s what SHAP values help us understand.

SHAP comes from game theory and borrows concepts of Shapley values, originally designed to fairly distribute the “payout” among players depending on their individual contributions. In machine learning, the “payout” is the model prediction, and SHAP values fairly distribute that prediction across all features.

What makes SHAP compelling is its strong theoretical guarantees. It's the only additive feature attribution method that satisfies local accuracy, missingness, and consistency all at once. (If your eyes glaze over - don't worry, we'll break it down.)
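
In plain terms (writing \( \phi_i \) for feature \( i \)'s SHAP value and \( \phi_0 \) for the base value, i.e. the model's average prediction over the background data): local accuracy means the contributions add back up to the actual prediction,

\[ f(x) = \phi_0 + \sum_{i} \phi_i \]

missingness means a feature that is absent contributes exactly zero, and consistency means that if the model changes so that a feature's marginal contribution never decreases, that feature's SHAP value doesn't decrease either.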

How Does It Work?

Let’s break it down with a food analogy (because who doesn’t love food?).

Imagine you and three friends contribute to cooking a fancy dinner. The final meal is amazing (naturally). You want to figure out how much credit each friend deserves. One way to do it? Try all combinations of who's in or out of the kitchen. A friend's contribution is the average difference in meal quality when they’re included vs excluded.

That’s essentially what SHAP does.

The formula for the SHAP value \( \phi_i \) of feature \( i \) is:

\[ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!(|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right] \]

Here \( F \) is the set of all features, \( S \) ranges over the subsets that exclude feature \( i \), and \( f(S) \) is the model's expected prediction when only the features in \( S \) are known. The formula answers: “Across all possible feature combinations, how much does feature \( i \) change the model's prediction when added?” This goes beyond toggling one feature on and off - SHAP averages feature \( i \)'s marginal contribution over many different contexts, so interactions and dependencies between features are accounted for.
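
To make the formula concrete, here is a brute-force sketch for a hypothetical three-feature loan model. The value function f below is a toy stand-in whose numbers are invented purely for illustration; real SHAP implementations estimate \( f(S) \) from data rather than being handed it:

```python
from itertools import combinations
from math import factorial

FEATURES = ["credit_score", "income", "dti"]

def f(S):
    """Toy stand-in for the model's expected prediction when only the
    features in S are known (all numbers are made up for illustration)."""
    S = set(S)
    base = 0.10                                    # prediction with no features known
    solo = {"credit_score": 0.25, "income": 0.10, "dti": 0.05}
    interaction = 0.05 if {"credit_score", "dti"} <= S else 0.0
    return base + sum(solo[x] for x in S) + interaction

def shapley(i):
    """Exact Shapley value of feature i: a weighted average of its marginal
    contribution f(S ∪ {i}) - f(S) over every subset S of the other features."""
    others = [x for x in FEATURES if x != i]
    n = len(FEATURES)
    phi = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (f(set(S) | {i}) - f(S))
    return phi

phis = {feat: shapley(feat) for feat in FEATURES}
print(phis)                                        # credit_score earns the most credit
print(sum(phis.values()) + f([]), f(FEATURES))     # local accuracy: these two values match
```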

In Practice:
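
In real projects you won't enumerate subsets by hand - the shap Python library does the heavy lifting. Here is a minimal sketch; the demo dataset and XGBoost model are illustrative choices, not a prescription:

```python
import xgboost
import shap

# Train a model on the census-income demo dataset that ships with shap
X, y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y)

# For tree ensembles, shap automatically uses the fast Tree SHAP algorithm
explainer = shap.Explainer(model)
shap_values = explainer(X)

# Local view: decompose one prediction into per-feature contributions
shap.plots.waterfall(shap_values[0])

# Global view: summarize contributions across the whole dataset
shap.plots.beeswarm(shap_values)
```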

Simple 🙂

Strengths of SHAP Plots

SHAP is one of the most flexible and informative interpretability tools available. Its visualizations span local views (force and waterfall plots that decompose a single prediction) and global views (beeswarm and bar plots that summarize feature importance across a dataset), all backed by the theoretical guarantees described above.

When to use SHAP

SHAP should be your go-to when you want local explanations for individual predictions, global insight into which features matter overall, or a defensible way to justify model behavior to stakeholders.

Basically, SHAP is great when you need both rigor and clarity - especially in sensitive or high-stakes domains.

Limitations

Of course, SHAP isn't perfect. Computing exact SHAP values means evaluating the model over an exponential number of feature subsets, so for models with many features it can get computationally expensive. The other edge of that sword is that SHAP provides a lot of information; without thoughtful storytelling and interpretation, it's easy to drown in all those hard-won details. It's best used alongside domain knowledge and complemented with tools like PDPs, ICE, or ALE plots.
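
When compute is the bottleneck, the usual workarounds are to use the model-specific Tree SHAP algorithm where it applies, and, when you must fall back to the model-agnostic Kernel SHAP, to summarize the background data and explain a sample of rows instead of the full dataset. A rough sketch, with an illustrative dataset, model, and sample sizes:

```python
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative setup: a small regression demo dataset bundled with shap
X, y = shap.datasets.diabetes()
model = GradientBoostingRegressor().fit(X, y)

# Model-agnostic Kernel SHAP: cost grows with the background size, the number
# of rows explained, and the number of coalition samples, so cap all three.
background = shap.kmeans(X, 25)                  # 25 centroids instead of every row
kernel_explainer = shap.KernelExplainer(model.predict, background)
sample = X.sample(50, random_state=0)            # explain a subset, not the whole set
kernel_values = kernel_explainer.shap_values(sample, nsamples=200)

# Model-specific Tree SHAP: exact values in polynomial time for tree ensembles
tree_values = shap.TreeExplainer(model)(sample)
```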

Keep in mind that, like PDPs, SHAP assumes feature independence, which can produce misleading results when features are strongly correlated.

Final Thoughts

SHAP values offer one of the most complete pictures we have for model interpretability. By connecting individual predictions to feature contributions in a mathematically sound way, SHAP allows you to explain, debug, and ultimately trust your models like never before.

Where PDPs give you a sense of overall behavior and ICE shows you individual trajectories, SHAP ties it all together - telling you both what the model did and why it did it. If you're going to invest in learning one interpretability tool in-depth, SHAP might be the one.

Luckily, you don't have to pick just one! In my next post, I'll explore another rising interpretability technique - Accumulated Local Effects (ALE) plots, which are designed to overcome some of the key limitations of PDPs and SHAP. Stay tuned!
