by Veronica Scerra
I think that when I say I have a PhD in neuroscience, it conjures up in people’s minds images of white lab coats, pipettes, and cell cultures. That’s not unreasonable - many neuroscientists do that kind of bench work, and our world is better for it. In reality, my days as a neuroscientist consisted of a few hours desperately watching an oscilloscope, and many more hours cleaning, analyzing, and interpreting data, then building models to explain that data. If that were all it took to be successful in neuroscience, I would probably still be there, because I loved those hours in front of my computer with my data.

This brief jaunt down memory lane is actually relevant - what I’m writing about today are counterfactual explanations, and in essence, my work in neuroscience can be distilled down to the exploration of a counterfactual. In basic terms, it works like this: we start with a scenario in which a certain phenomenon can be reliably produced, we create a new scenario in which that phenomenon no longer holds, and then we drill down on what specifically differs between the new scenario and the well-established one. By narrowing the sources of unknown variance to as small a set as possible, we can learn something about the inner workings of a complex system and draw meaningful conclusions from the outcomes. The complex system in my work was the primate frontal eye field, a small visuomotor integration region of the prefrontal cortex, but in a broader sense, this could be any black-box model.
| Counterfactual Explanations | |
|---|---|
| What: | Show minimal changes to input features that would alter the model’s prediction |
| Use When: | You want user recourse, fairness insights, or actionable explanations |
| Assumptions: | You must define plausible alternatives and realistic constraints (the alternatives must exist in the real world) |
| Alternatives: | SHAP, ICE, LIME, contrastive explanations |
Counterfactual explanations are a remarkably versatile and accessible explainability tool because they are built around outcomes and do not require any deep understanding of the inner workings of the models producing those outcomes. It almost seems like a throwaway statement to say that you don’t need to understand a complex model’s internals to use counterfactuals, but it’s actually amazing - it means you don’t need to know anything about a proprietary model’s design, it means you don’t have to understand machine learning at all to find them useful (anyone can understand them), and it means they can give interested parties actionable targets to change in order to obtain different outcomes. What counterfactual explanations do is search the input space for the nearest possible alternate inputs that generate a different output.
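To make that concrete, here is a minimal, purely illustrative sketch of the idea: train a toy classifier, then brute-force search the neighborhood of one input for the closest point whose prediction differs. Everything here (the data, the two features, the grid of perturbations) is invented for illustration; real counterfactual libraries do this search far more cleverly.

```python
# A minimal sketch, not a production method: brute-force search a grid of
# perturbed inputs for the closest one whose prediction differs from the original.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # two hypothetical features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # simple ground-truth rule
model = LogisticRegression().fit(X, y)

x0 = np.array([-0.4, -0.3])                    # an input we expect to be classified as 0
original = model.predict(x0.reshape(1, -1))[0]

# Candidate perturbations on a small grid around x0
deltas = np.stack(np.meshgrid(np.linspace(-2, 2, 81),
                              np.linspace(-2, 2, 81)), axis=-1).reshape(-1, 2)
candidates = x0 + deltas
preds = model.predict(candidates)

# Keep only candidates that flip the prediction, then take the closest one
flipped = candidates[preds != original]
distances = np.linalg.norm(flipped - x0, axis=1)   # L2 distance in feature space
counterfactual = flipped[np.argmin(distances)]

print("original input:", x0, "-> class", original)
print("counterfactual:", counterfactual.round(2),
      "-> class", model.predict(counterfactual.reshape(1, -1))[0])
```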
Let’s say you’re using a classifier model to determine people’s eligibility for a new drug trial, and someone is classified as a 0, or ineligible. A counterfactual exploration can tell you what that candidate might change, or do differently, to receive a different classification. This gives interested parties a genuine understanding of the model’s decisions without their having to know anything about what’s going on under the hood. Similarly, counterfactual explanations can be used to assess fairness and bias in models. If, for example, a counterfactual analysis recommends that candidates change their race or economic status to get a better outcome, the model may be racially or economically biased, which, depending on the usage, could be a problem.
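As a toy illustration of that fairness check - with entirely made-up feature names, data, and model - you can hold everything about a candidate fixed and vary only a sensitive attribute; if that alone flips the model’s decision, the model is leaning on that attribute:

```python
# A toy fairness probe under invented feature names: change only a sensitive
# attribute and see whether the prediction changes.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data for a drug-trial eligibility classifier
df = pd.DataFrame({
    "age":       [34, 51, 29, 62, 45, 38, 57, 41],
    "biomarker": [1.2, 0.4, 2.1, 0.3, 1.8, 0.9, 0.2, 1.5],
    "group":     [0, 1, 0, 1, 0, 1, 1, 0],   # sensitive attribute
    "eligible":  [1, 0, 1, 0, 1, 1, 0, 1],
})
model = RandomForestClassifier(random_state=0).fit(
    df[["age", "biomarker", "group"]], df["eligible"])

candidate = pd.DataFrame([{"age": 50, "biomarker": 0.5, "group": 1}])
flipped = candidate.copy()
flipped["group"] = 0                          # change only the sensitive attribute

print("as submitted: ", model.predict(candidate)[0])
print("group flipped:", model.predict(flipped)[0])
# If these two predictions differ, the sensitive attribute alone is enough
# to change the outcome - a signal worth investigating.
```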
While other interpretability methods can tell you how a decision was reached, counterfactuals let us explore what might have been…
Imagine you have a dial for each of your model’s input features, and you want to turn as few dials as possible - just enough to cross the model’s decision boundary. An ideal counterfactual turns those dials (alters the input features) as little as possible to obtain a new decision, representing the closest world in which things are “different”. Often you have a choice of several “alternate worlds”, each changing different features. This is a good thing, as it can illuminate various paths for targeting changes. You have to be mindful of two things: 1) how you compute the “distance” between your base feature values and the counterfactual’s, and 2) whether the new counterfactual feature values could actually exist (e.g., age can’t be negative), ensuring realistic alternatives. Several libraries help with this process: for example, the DiCE Python library will generate diverse, feasible counterfactual options to test against your model, and the Alibi library lets you try different algorithms and distance metrics to obtain optimal results.
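Here is a minimal sketch of what that might look like with DiCE (`pip install dice-ml`), using invented drug-trial-style data like the example above. The `features_to_vary` and `permitted_range` arguments are how you tell DiCE which dials it may turn and how far; the data, feature names, and classifier here are assumptions for illustration, not a recipe.

```python
# A minimal DiCE sketch with invented data: generate a few counterfactuals while
# keeping the sensitive attribute fixed and keeping "age" inside a realistic range.
import pandas as pd
import dice_ml
from sklearn.ensemble import RandomForestClassifier

train = pd.DataFrame({
    "age":       [34, 51, 29, 62, 45, 38, 57, 41, 49, 33],
    "biomarker": [1.2, 0.4, 2.1, 0.3, 1.8, 0.9, 0.2, 1.5, 0.6, 1.1],
    "group":     [0, 1, 0, 1, 0, 1, 1, 0, 1, 0],   # treated as numeric for simplicity
    "eligible":  [1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
})
model = RandomForestClassifier(random_state=0).fit(
    train.drop(columns="eligible"), train["eligible"])

data = dice_ml.Data(dataframe=train,
                    continuous_features=["age", "biomarker", "group"],
                    outcome_name="eligible")
wrapped = dice_ml.Model(model=model, backend="sklearn")
explainer = dice_ml.Dice(data, wrapped, method="random")

query = pd.DataFrame([{"age": 50, "biomarker": 0.5, "group": 1}])
cfs = explainer.generate_counterfactuals(
    query,
    total_CFs=3,
    desired_class="opposite",
    features_to_vary=["age", "biomarker"],    # never suggest changing "group"
    permitted_range={"age": [18, 80]},        # keep suggested ages realistic
)
cfs.visualize_as_dataframe(show_only_changes=True)
```

Excluding the sensitive attribute from `features_to_vary` keeps the suggestions actionable: every change DiCE proposes is something a candidate could, at least in principle, actually do.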
Simple 🙂
If PDP shows us the forest, ICE gives us the trees, and SHAP tells us the path we’ve taken to get where we are, counterfactuals let us ask what would have happened if we had turned left instead of right. Counterfactual explanations don’t just describe, they suggest. They empower. They let users and stakeholders understand the model’s decision as dynamic rather than static - an invitation to change rather than a closed door.
Stay tuned for a hands-on notebook where we generate and interpret counterfactuals using the DiCE library. You’ll see how simple tweaks can unlock powerful stories about your model - and your data.