by Veronica Scerra
In my transition from neuroscience to data science, one of the most shocking and upsetting realizations was that so many people, too many people, didn’t know or care about interpretability. Perhaps that’s too harsh - it’s not necessarily that people didn’t care, but that they seemed to think it was extraneous. My desire for interpretability was seen as something extra I was bringing to the table, rather than something fundamental to successful endeavors. Maybe in a world where you can simply install whatever machine learning library you desire and set a model in motion with a few lines, it’s easy to lose sight of what those few lines mean. I’d like to think the tide is turning a bit, and as our AI models and machine learning applications become ubiquitous (even where they’re not strictly necessary), real value will be found in interpretation that connects need with ability, and understanding with function. To that end, I want to embark on a series of posts breaking down some of the common interpretability methods. We begin with one of the foundational tools for model interpretability - Partial Dependence Plots (PDPs).
| PDP | |
| --- | --- |
| What: | Shows average model prediction as a function of a specified feature |
| Use when: | You want global interpretability |
| Assumptions: | Feature independence |
| Alternatives: | ICE, SHAP, ALE |
It was during my master’s coursework in statistics that I learned to calculate statistical tests by hand. We all had laptops and statistical software - our professor simply thought we would understand and appreciate the effects of variables better if we really got into the mud with things like ANOVA and ANCOVA tests, t-tests, and confidence intervals. This was when I developed a deep understanding of the term “marginal effects”. Marginal anything, really, is the thing calculated literally in the margins, across variables.
PDPs are a way to evaluate the marginal effect of an isolated feature on the predicted outcomes of a model. With PDPs you can ask the question: "How does varying this particular feature affect the model’s prediction when everything else is kept the same?" For instance, let’s say you have a stable hair-care routine, and your hair outcomes are fairly predictable. You use shampoo S, conditioner C, brush B, product P, amount of product A, number of brush strokes N, days since last washing D, (etc., etc.,) and get hair H on any given day, and you can rate your hair day on a scale from 1-10 (1 being “AAAack!” and 10 being “AAAMaazing!”). Without changing any of the other variables, you might use a PDP to track how relative humidity affects your hair score. This would tell you the effect that humidity has on your hair beyond what any of the other variables are doing. Keep in mind, PDPs are global interpretability tools; they reflect the average effect over the whole dataset, marginalizing over the other features.
For a given feature \( x_j \) the PDP is:
\[ \hat{f}_{PD}(x_j) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}(x_j, x_{i, -j}) \]
In words, the partial dependence at a given value of the feature of interest is the model’s prediction averaged over all observations in the dataset, with that feature fixed at the given value and every other feature keeping its observed values. In the hair example above, you might find that higher humidity leads to worse hair days and lower humidity leads to better hair days, revealing a negative linear relationship. Running the same analysis on number-of-days-since-last-wash (D) could tell you that you get better hair days for low values of D, peaking around 2, but worse hair days for larger values of D - an inverted-U-shaped relationship. These are all guesses; you’d really need to run the experiment on your own hair to be sure, and PDPs would be a tool to help you understand these relationships.
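To make the averaging concrete, here is a minimal sketch of that formula in Python. The model, the data, and the feature name are all placeholders - any fitted estimator with a `.predict()` method and a pandas DataFrame of features would do:

```python
import numpy as np

def partial_dependence(model, X, feature, grid_values):
    """Compute a one-dimensional partial dependence curve by brute force.

    model       : any fitted estimator with a .predict(X) method (assumed)
    X           : pandas DataFrame of background data (assumed)
    feature     : name of the column to vary
    grid_values : iterable of values at which to evaluate the feature
    """
    pd_values = []
    for v in grid_values:
        X_mod = X.copy()
        X_mod[feature] = v              # fix the feature of interest at v for every row
        preds = model.predict(X_mod)    # the other features keep their observed values
        pd_values.append(preds.mean())  # average over all rows -> one point on the PDP
    return np.array(pd_values)

# Hypothetical usage, e.g. humidity's partial dependence for the hair-day model:
# grid = np.linspace(X["humidity"].min(), X["humidity"].max(), num=20)
# pdp_curve = partial_dependence(model, X, "humidity", grid)
```

Plotting `pdp_curve` against `grid` gives you the partial dependence plot itself.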
In practice:
Simple 🙂
PDPs are incredibly useful: easy to interpret, model-agnostic, and simple to scale. They can be great for debugging models or presenting insights to your team or stakeholders.
PDPs are a low-effort go-to when your model is a black box, like a random forest, gradient-boosted tree, or neural network, and you want to see the global effect of your feature(s) of interest. You can also use them to visualize feature interactions (with 2D PDPs), as in the sketch below.
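As a quick illustration, here is a sketch using scikit-learn's `PartialDependenceDisplay`, with entirely made-up toy data standing in for the hair example (the feature names and the data-generating process are my own inventions, not a real dataset):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Toy stand-in for the hair-day data -- entirely made up for illustration
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "humidity": rng.uniform(10, 90, size=500),
    "days_since_wash": rng.integers(0, 7, size=500),
})
y = (
    8
    - 0.05 * X["humidity"]                   # more humidity, worse hair day
    - 0.3 * (X["days_since_wash"] - 2) ** 2  # best around day 2, worse on either side
    + rng.normal(0, 0.5, size=500)           # noise
)

# A black-box model whose behavior we want to inspect
model = GradientBoostingRegressor().fit(X, y)

# One-dimensional PDPs for each feature, plus a 2D PDP for their interaction
PartialDependenceDisplay.from_estimator(
    model,
    X,
    features=["humidity", "days_since_wash", ("humidity", "days_since_wash")],
)
plt.show()
```

Passing a pair of feature names in the `features` list produces a two-way PDP, which is how you visualize an interaction between two features.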
Be cautious when using PDPs for interpretability if you have any reason to think that your features are correlated - PDPs assume feature independence, and correlated features can force the model to make predictions on unrealistic combinations of values, giving you misleading results. Additionally, since PDPs show marginal effects (effects averaged over the observed values of the other features), they will not give you any insight into heterogeneity across the individual observations in your sample.
As with any other measure of marginal influence, instance-specific nuances or interactions may be averaged out and lost. I'll go over some alternatives like SHAP, ALE, and ICE plots in other posts.
PDPs offer a simple, intuitive way to visualize how features influence model predictions. While not perfect, they're great starting points in any interpretability workflow. Bonus points for them making you sound like you know what you're talking about - people tend to get really impressed by terms like "marginal effects" and "feature independence". In my next post, I'll explore Individual Conditional Expectation (ICE) plots, which complement PDPs by giving us the ability to explore how individual observations respond to feature changes - very useful when PDPs are not granular enough.