by Veronica Scerra
So far in this series, we’ve explored PDPs for their global perspective and ICE plots for their individual-level insights. Here, we’ll dive into a tool that elegantly bridges the gap between these two, offering both granular and aggregate views of feature importance. Meet SHAP (SHapley Additive exPlanations) values - a powerful, theoretically grounded approach that lets us peek inside the black box and see how each feature contributes to individual predictions.
| SHAP | |
|---|---|
| **What:** | Decomposes model predictions into feature contributions using game theory (fun!) |
| **Use When:** | You want local interpretability and global insights |
| **Assumptions:** | Feature independence (but Kernel SHAP and Tree SHAP try to work around this) |
| **Alternatives:** | ICE, LIME, ALE, PDP |
Imagine your model is a team of players working together to make a prediction - say, whether a borrower will default on a loan. Each feature is a player: credit score, income, debt-to-income ratio, age, etc. How much does each of these players contribute to the final prediction? That’s what SHAP values help us understand.
SHAP comes from game theory and borrows concepts of Shapley values, originally designed to fairly distribute the “payout” among players depending on their individual contributions. In machine learning, the “payout” is the model prediction, and SHAP values fairly distribute that prediction across all features.
What makes SHAP compelling is its strong theoretical guarantees. It’s the only additive feature attribution method that satisfies local accuracy, missingness, and consistency all at once. (If your eyes glaze over - don’t worry, we’ll break it down.)
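The first of those properties fits in a single line: local accuracy just means the per-feature attributions add back up to the prediction,

\[ f(x) = \phi_0 + \sum_{i} \phi_i \]

where \( \phi_0 \) is the model’s average prediction (the baseline) and \( \phi_i \) is the contribution of feature \( i \).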
Let’s break it down with a food analogy (because who doesn’t love food?).
Imagine you and three friends contribute to cooking a fancy dinner. The final meal is amazing (naturally). You want to figure out how much credit each friend deserves. One way to do it? Try all combinations of who's in or out of the kitchen. A friend's contribution is the average difference in meal quality when they’re included vs excluded.
That’s essentially what SHAP does.
The formula for the SHAP value for a feature is:
\[ \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!(|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right] \]
Here, \( F \) is the full set of features, \( S \) ranges over subsets that exclude feature \( i \), and \( f(S) \) is the model’s prediction when only the features in \( S \) are “present.” The formula answers: “Across all possible feature combinations, how much does feature \( i \) change the model’s prediction when added?” This goes beyond toggling one feature on its own - SHAP averages that feature’s contribution over the many contexts it can appear in.
Simple 🙂
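To make the formula concrete, here’s a minimal brute-force sketch in Python. Everything in it is illustrative - the toy loan-default scorer, the baseline values used to stand in for “missing” features, and the instance being explained - and real implementations like Kernel SHAP and Tree SHAP are far smarter and faster. But with only a handful of features, the exact computation is small enough to write out.

```python
from itertools import combinations
from math import factorial

# Toy "model": a hypothetical loan-default score over three features (illustrative only).
def model(credit_score, income, dti):
    return 0.5 - 0.004 * (credit_score - 650) - 0.002 * (income - 50) + 0.3 * dti

FEATURES = ["credit_score", "income", "dti"]
BASELINE = {"credit_score": 650, "income": 50, "dti": 0.3}   # stand-ins for "feature absent"
x = {"credit_score": 700, "income": 80, "dti": 0.45}          # the instance we want to explain

def f(subset):
    """Model output when only the features in `subset` take their real values;
    all others are held at the baseline (one simple way to 'drop' a feature)."""
    inputs = {name: (x[name] if name in subset else BASELINE[name]) for name in FEATURES}
    return model(**inputs)

def shap_value(i):
    """Exact Shapley value for feature i: the weighted average of its marginal
    contribution f(S ∪ {i}) - f(S) over every subset S that excludes i."""
    others = [name for name in FEATURES if name != i]
    n = len(FEATURES)
    phi = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (f(set(S) | {i}) - f(set(S)))
    return phi

phi = {name: shap_value(name) for name in FEATURES}
print(phi)

# Local accuracy check: the values sum to the prediction minus the baseline prediction.
print(sum(phi.values()), f(set(FEATURES)) - f(set()))
```

Holding excluded features at a baseline is just one simple choice of value function, but the core idea - averaging a feature’s marginal contribution over every subset of the other features - is exactly what the formula above expresses, and the final check illustrates local accuracy in action.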
SHAP is one of the most flexible and informative interpretability tools available. It offers:

- Local interpretability: a per-prediction breakdown of how much each feature pushed the output up or down
- Global insights: aggregating those per-prediction values across a dataset gives you overall feature importance
- Strong theoretical guarantees: local accuracy, missingness, and consistency

SHAP should be your go-to when:

- You need to explain individual predictions, not just average model behavior
- You want local and global insights from a single, theoretically grounded framework
Basically, SHAP is great when you need both rigor and clarity - especially in sensitive or high-stakes domains.
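In practice you rarely compute any of this by hand - the shap library does the heavy lifting. Here’s a minimal sketch using its TreeExplainer with a scikit-learn random forest on synthetic data (the dataset, model, and settings are stand-ins, not recommendations):

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data and model (illustrative only)
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Tree SHAP: exact, fast SHAP values for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)        # shape: (n_samples, n_features)

# Local view: contribution of each feature to one prediction
print("Prediction breakdown for row 0:", shap_values[0])

# Global view: aggregate the per-row values into overall feature importance
shap.summary_plot(shap_values, X)
```

Summing a row of shap_values and adding explainer.expected_value recovers that row’s prediction - local accuracy again - and the summary plot is just one of several built-in visualizations for turning thousands of per-row values into a global story.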
Of course, SHAP isn't perfect. When a model has many features, calculating SHAP values can be computationally expensive. And SHAP's richness cuts both ways: it provides a lot of information, and without thoughtful storytelling and interpretation, it's easy to drown in all those hard-won details. It's best used alongside domain knowledge and complemented with tools like PDPs, ICE, or ALE plots.
Keep in mind that, like PDPs, SHAP assumes feature independence, which can produce misleading results when features are strongly correlated.
SHAP values offer one of the most complete pictures we have for model interpretability. By connecting individual predictions to feature contributions in a mathematically sound way, SHAP allows you to explain, debug, and ultimately trust your models like never before.
Where PDPs give you a sense of overall behavior and ICE shows you individual trajectories, SHAP ties it all together - telling you both what the model did and why it did it. If you're going to invest in learning one interpretability tool in-depth, SHAP might be the one.
Luckily, you don't have to pick just one! In my next post, I'll explore another rising interpretability technique - Accumulated Local Effects (ALE) plots, which are designed to overcome some of the key limitations of PDPs and SHAP. Stay tuned!