
Demystifying Model Interpretability: A Deep Dive into SHAP values

by Veronica Scerra

So far in this series, we've explored PDPs for their global perspective and ICE plots for their individual-level insights. Here, we'll dive into a tool that elegantly bridges the gap between these two, offering both granular and aggregate views of feature importance. Meet SHAP (SHapley Additive exPlanations) values - a powerful, theoretically grounded approach that lets us peek inside the black box and see how each feature contributes to individual predictions.

TL;DR

SHAP
What: Decomposes a model's prediction into per-feature contributions using game theory (fun!)
Use When: You want local interpretability and global insights
Assumptions: Feature independence (though Kernel SHAP and Tree SHAP try to work around this)
Alternatives: ICE, LIME, ALE, PDP

What are SHAP values?

Imagine your model is a team of players working together to make a prediction - say, whether a borrower will default on a loan. Each feature is a player: credit score, income, debt-to-income ratio, age, etc. How much does each of these players contribute to the final prediction? That’s what SHAP values help us understand.

SHAP comes from game theory and borrows concepts of Shapley values, originally designed to fairly distribute the “payout” among players depending on their individual contributions. In machine learning, the “payout” is the model prediction, and SHAP values fairly distribute that prediction across all features.

What makes SHAP compelling is its strong theoretical guarantees. It's the only additive feature attribution method that satisfies local accuracy, missingness, and consistency all at once. (If your eyes glaze over - don't worry, we'll break it down.)
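
In plain terms (writing \( \phi_i \) for feature \( i \)'s SHAP value and \( \phi_0 \) for the base value, i.e. the model's average prediction over the background data): local accuracy means the contributions add back up to the actual prediction,

\[ f(x) = \phi_0 + \sum_{i} \phi_i \]

missingness means a feature that is absent contributes exactly zero, and consistency means that if the model changes so that a feature's marginal contribution never decreases, that feature's SHAP value doesn't decrease either.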

How Does It Work?

Let’s break it down with a food analogy (because who doesn’t love food?).

Imagine you and three friends contribute to cooking a fancy dinner. The final meal is amazing (naturally). You want to figure out how much credit each friend deserves. One way to do it? Try all combinations of who's in or out of the kitchen. A friend's contribution is the average difference in meal quality when they’re included vs excluded.

That’s essentially what SHAP does.

The formula for the SHAP value \( \phi_i \) of feature \( i \) is:

\[ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!(|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right] \]

Here \( F \) is the set of all features, \( S \) ranges over the subsets that exclude feature \( i \), and \( f(S) \) is the model's expected prediction when only the features in \( S \) are known. The formula answers: “Across all possible feature combinations, how much does feature \( i \) change the model's prediction when added?” This goes beyond toggling one feature on and off - SHAP averages feature \( i \)'s marginal contribution over many different contexts, so interactions and dependencies between features are accounted for.
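
To make the formula concrete, here is a brute-force sketch for a hypothetical three-feature loan model. The value function f below is a toy stand-in whose numbers are invented purely for illustration; real SHAP implementations estimate \( f(S) \) from data rather than being handed it:

```python
from itertools import combinations
from math import factorial

FEATURES = ["credit_score", "income", "dti"]

def f(S):
    """Toy stand-in for the model's expected prediction when only the
    features in S are known (all numbers are made up for illustration)."""
    S = set(S)
    base = 0.10                                    # prediction with no features known
    solo = {"credit_score": 0.25, "income": 0.10, "dti": 0.05}
    interaction = 0.05 if {"credit_score", "dti"} <= S else 0.0
    return base + sum(solo[x] for x in S) + interaction

def shapley(i):
    """Exact Shapley value of feature i: a weighted average of its marginal
    contribution f(S ∪ {i}) - f(S) over every subset S of the other features."""
    others = [x for x in FEATURES if x != i]
    n = len(FEATURES)
    phi = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (f(set(S) | {i}) - f(S))
    return phi

phis = {feat: shapley(feat) for feat in FEATURES}
print(phis)                                        # credit_score earns the most credit
print(sum(phis.values()) + f([]), f(FEATURES))     # local accuracy: these two values match
```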

In Practice:
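
In real projects you won't enumerate subsets by hand - the shap Python library does the heavy lifting. Here is a minimal sketch; the demo dataset and XGBoost model are illustrative choices, not a prescription:

```python
import xgboost
import shap

# Train a model on the census-income demo dataset that ships with shap
X, y = shap.datasets.adult()
model = xgboost.XGBClassifier().fit(X, y)

# For tree ensembles, shap automatically uses the fast Tree SHAP algorithm
explainer = shap.Explainer(model)
shap_values = explainer(X)

# Local view: decompose one prediction into per-feature contributions
shap.plots.waterfall(shap_values[0])

# Global view: summarize contributions across the whole dataset
shap.plots.beeswarm(shap_values)
```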

Simple 🙂

Strengths of SHAP Plots

SHAP is one of the most flexible and informative interpretability tools available. Its visualizations span local views (force and waterfall plots that decompose a single prediction) and global views (beeswarm and bar plots that summarize feature importance across a dataset), all backed by the theoretical guarantees described above.

When to use SHAP

SHAP should be your go-to when you want local explanations for individual predictions, global insight into which features matter overall, or a defensible way to justify model behavior to stakeholders.

Basically, SHAP is great when you need both rigor and clarity - especially in sensitive or high-stakes domains.

Limitations

Of course, SHAP isn't perfect. Computing exact SHAP values means evaluating the model over an exponential number of feature subsets, so for models with many features it can get computationally expensive. The other edge of that sword is that SHAP provides a lot of information; without thoughtful storytelling and interpretation, it's easy to drown in all those hard-won details. It's best used alongside domain knowledge and complemented with tools like PDPs, ICE, or ALE plots.
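
When compute is the bottleneck, the usual workarounds are to use the model-specific Tree SHAP algorithm where it applies, and, when you must fall back to the model-agnostic Kernel SHAP, to summarize the background data and explain a sample of rows instead of the full dataset. A rough sketch, with an illustrative dataset, model, and sample sizes:

```python
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative setup: a small regression demo dataset bundled with shap
X, y = shap.datasets.diabetes()
model = GradientBoostingRegressor().fit(X, y)

# Model-agnostic Kernel SHAP: cost grows with the background size, the number
# of rows explained, and the number of coalition samples, so cap all three.
background = shap.kmeans(X, 25)                  # 25 centroids instead of every row
kernel_explainer = shap.KernelExplainer(model.predict, background)
sample = X.sample(50, random_state=0)            # explain a subset, not the whole set
kernel_values = kernel_explainer.shap_values(sample, nsamples=200)

# Model-specific Tree SHAP: exact values in polynomial time for tree ensembles
tree_values = shap.TreeExplainer(model)(sample)
```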

Keep in mind that, like PDPs, SHAP assumes feature independence, which can produce misleading results when features are strongly correlated.

Final Thoughts

SHAP values offer one of the most complete pictures we have for model interpretability. By connecting individual predictions to feature contributions in a mathematically sound way, SHAP allows you to explain, debug, and ultimately trust your models like never before.

Where PDPs give you a sense of overall behavior and ICE shows you individual trajectories, SHAP ties it all together - telling you both what the model did and why it did it. If you're going to invest in learning one interpretability tool in-depth, SHAP might be the one.

Luckily, you don't have to pick just one! In my next post, I'll explore another rising interpretability technique - Accumulated Local Effects (ALE) plots, which are designed to overcome some of the key limitations of PDPs and SHAP. Stay tuned!
