by Veronica Scerra
In previous posts, I dove into Partial Dependence Plots (PDPs) for their global perspective, Individual Conditional Expectation (ICE) plots for their local granularity, and SHAP values for their fair, game-theoretic feature attribution. Each of those tools carries an important caveat, however: none of them handles feature correlation well.
Enter Accumulated Local Effects (ALE) plots - a relatively new player in the interpretability game, devised from the ground up to avoid the trap of correlated features. ALE plots retain the interpretability goals of PDPs and ICE, while offering faster computation and greater robustness on real-world, messy datasets.
| ALE |  |
|---|---|
| What: | Capture local effects of features on predictions, averaged over the conditional distribution of the data |
| Use When: | You want global interpretability and want to handle correlated features better than PDPs or SHAP |
| Assumptions: | Minimal - doesn’t require independence between features |
| Alternatives: | ICE, SHAP, PDP |
Let’s use a new analogy. Suppose you’re hiking a mountain trail and tracking the elevation gain over time. A PDP is like calculating the average elevation gain if you just walked straight up the mountain (ignoring switchbacks). It might tell you the general trend, but it glosses over a lot of the nuance. An ICE plot could walk you through each person’s specific path up the mountain, which could be helpful but would be overwhelming with data from thousands of hikers. ALE plots, on the other hand, break the trail into small segments (e.g., every 10 meters), calculate the local elevation gain in each segment (i.e., the slope right there), and then accumulate these changes as you go along the trail. This gives you a picture of how steep things are, without assuming all hikers are walking the same path.
In machine learning terms, ALE computes the average change in prediction over small intervals of a feature, based on actual values present in the data, making it more reliable when features are correlated.
For a given feature \( x_j \):
\[ \hat{f}_j^{\text{ALE}}(x_j) = \int_{z_0}^{x_j} \mathbb{E}_{x_{-j} \mid x_j = z} \left[ \frac{\partial \hat{f}(z, x_{-j})}{\partial z} \right] dz \]
I get it, it looks scary, but it just means: accumulate the average local changes in the model’s prediction as you step through the feature’s values. Crucially, these averages are computed only where data actually exists - not by extrapolating into unrealistic feature combinations.
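In practice, the integral isn’t computed directly. The standard estimator (following Apley and Zhu’s original formulation) splits the feature’s range into small bins, replaces the derivative with a finite difference between each bin’s edges, and adds up the per-bin averages:

\[ \hat{\tilde{f}}_j^{\text{ALE}}(x_j) = \sum_{k=1}^{k_j(x_j)} \frac{1}{n_j(k)} \sum_{i:\, x_j^{(i)} \in N_j(k)} \left[ \hat{f}\big(z_{k,j},\, x_{-j}^{(i)}\big) - \hat{f}\big(z_{k-1,j},\, x_{-j}^{(i)}\big) \right] \]

Here the \( z_{k,j} \) are the bin edges, \( N_j(k) \) is the set of observations whose value of \( x_j \) falls in bin \( k \), \( n_j(k) \) is how many there are, and \( k_j(x_j) \) is the index of the bin containing \( x_j \). The final ALE curve subtracts the mean of this quantity over the data, so the effects are centered around zero.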
The result is a clean, readable plot that shows how predictions locally change with the feature without being distorted by unrealistic data configurations.
Simple 🙂
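If you’d like to see those steps in code, here is a minimal, from-scratch sketch of the first-order ALE estimator for a single numeric feature. The function name `ale_1d`, the toy data, and the model choice are all illustrative rather than taken from any library:

```python
# Minimal from-scratch sketch of first-order ALE for one numeric feature.
# `ale_1d` is an illustrative name, not a library function.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def ale_1d(model, X, feature, n_bins=20):
    """Return bin edges and centered ALE values for X[:, feature]."""
    x = X[:, feature]
    # Quantile-based bin edges, so every bin contains real observations.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    # Assign each observation to a bin (0 .. len(edges) - 2).
    bin_idx = np.digitize(x, edges[1:-1], right=True)

    local_effects = np.zeros(len(edges) - 1)
    for k in range(len(edges) - 1):
        in_bin = bin_idx == k
        if not np.any(in_bin):
            continue
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature] = edges[k]      # shift points to the lower bin edge...
        X_hi[:, feature] = edges[k + 1]  # ...and to the upper bin edge.
        # Average local change in prediction across this bin only.
        local_effects[k] = np.mean(model.predict(X_hi) - model.predict(X_lo))

    ale = np.cumsum(local_effects)       # accumulate the local effects
    # Center so the effect averages to zero over the data
    # (weighted by how many observations land in each bin).
    ale -= np.average(ale, weights=np.bincount(bin_idx, minlength=len(edges) - 1))
    return edges, ale

# Toy data with two strongly correlated features.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = x1 + rng.normal(scale=0.3, size=1000)
y = x1**2 + x2 + rng.normal(scale=0.1, size=1000)
X = np.column_stack([x1, x2])

model = GradientBoostingRegressor().fit(X, y)
edges, ale = ale_1d(model, X, feature=0)
print(np.round(ale, 2))  # roughly traces the quadratic effect of x1
```

Because the points in each bin are only nudged to that bin’s own edges, the model is never asked to predict on feature combinations far from anything it saw in training - which is exactly the property that makes ALE robust to correlation.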
Use ALE when:
- Your features are correlated and you don’t want PDP-style extrapolation into unrealistic regions
- You want a global, averaged view of how a feature affects predictions
- You want something faster to compute than SHAP

ALE plots shine in real-world data problems (think credit risk, medical outcomes, pricing models), where features often dance together in complex, interdependent ways.
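For day-to-day work you’ll usually reach for an existing implementation rather than hand-rolled code. The snippet below is a sketch that assumes the `ALE` and `plot_ale` API from the open-source `alibi` library - treat the exact names and arguments as assumptions and check the library’s current documentation before relying on them:

```python
# Hedged sketch: assumes alibi's `ALE` explainer and `plot_ale` helper.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from alibi.explainers import ALE, plot_ale

# Same correlated toy data as before.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = x1 + rng.normal(scale=0.3, size=1000)
y = x1**2 + x2 + rng.normal(scale=0.1, size=1000)
X = np.column_stack([x1, x2])

model = RandomForestRegressor().fit(X, y)

# alibi's ALE wraps a prediction *function*, not the model object itself.
ale_explainer = ALE(model.predict, feature_names=["x1", "x2"], target_names=["y"])
explanation = ale_explainer.explain(X)

plot_ale(explanation)  # one panel per feature, accumulated local effect on the y-axis
plt.show()
```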
ALE plots assume that local perturbations of features are meaningful - so if your model behaves erratically or non-smoothly across the feature space, the local differences might not be representative. Also, while ALE plots handle first-order effects beautifully, interpreting second-order (interaction) ALE plots can get dicey.
While it’s more robust than PDPs, ALE still requires thoughtful feature engineering - as ever: \( f(\text{garbage}) = \text{garbage} \)
If PDPs give you a high-level map of the terrain, and ICE shows you individual paths, ALE is the terrain map drawn from the actual trails people walk. It’s trustworthy, interpretable, and highly informative - especially when features interact in complex, real-world ways.
Use ALE when you’re tired of being betrayed by correlated features, and you want a clear, averaged view of local model behavior. It doesn’t give you everything, but it gives you something you can actually use.
In my next post, I'll discuss Local Interpretable Model-agnostic Explanations (LIME), for when you want to know why your model made that prediction. Stay tuned!