by Veronica Scerra
Compartmental disease modeling with vaccination intervention to quantify public health impact and assess endemic potential.
Coming off my HIV/AIDS modeling project for Cuba, I wanted to tackle a different type of epidemic challenge—one with a rapid outbreak timeline, limited data, and a clear intervention point. Yellow fever provided exactly that opportunity. Unlike HIV's slow burn over decades, yellow fever outbreaks explode quickly and can be controlled through vaccination. The 2002 outbreak in Touba, Senegal offered a perfect case study: a discrete epidemic with WHO surveillance data and a documented vaccination response starting October 1st.
The WHO reported a yellow fever outbreak in Touba, Senegal (population ~800,000) beginning in October 2002. Cases were formally identified on October 11th, but the city had already initiated a mass vaccination campaign on October 1st following early detection. The available data consisted of just 8 time points spanning January through November 2002, documenting cumulative cases and deaths.
Working with only 8 observations might seem limiting, but this reflects the reality of outbreak response—you make decisions with incomplete information. The challenge was to build a model that could extract meaningful insights from sparse data while remaining scientifically rigorous.
An early obstacle emerged when examining the temporal structure: the January 18th observation showed 18 cumulative cases, but by October 4th, only 12 cases were reported. Cumulative counts can't decrease—this violated a fundamental constraint of epidemic data. After investigating, I determined the January observation was likely either from background surveillance or misattributed, not part of the October-November outbreak cluster. I removed this data point to maintain monotonicity, leaving 7 observations from the actual outbreak period.
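The check itself is simple. A minimal sketch of it, where the first two counts (18 and 12) are the values quoted above and the remaining dates and counts are illustrative placeholders:

```python
# Monotonicity check for cumulative outbreak counts.
# First two values come from the text; later values are illustrative only.
import numpy as np

dates = ["2002-01-18", "2002-10-04", "2002-10-11", "2002-10-18"]
cum_cases = np.array([18, 12, 15, 24])

# Cumulative counts must be non-decreasing; any negative difference is a violation.
violations = np.where(np.diff(cum_cases) < 0)[0]
for i in violations:
    print(f"Monotonicity violated between {dates[i]} and {dates[i + 1]}: "
          f"{cum_cases[i]} -> {cum_cases[i + 1]}")

# Dropping the offending early observation restores monotonicity.
clean_cases = cum_cases[1:]
```

Here the violation flags the January point, and removing it leaves a monotone series for the actual outbreak window.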
I chose a compartmental SEIR model—a standard framework in infectious disease epidemiology—and extended it to include vaccination. The model divides the population into six compartments: Susceptible (S), Exposed (E), Infectious (I), Recovered (R), Vaccinated (V), and Deceased (D).
The vaccination component was critical—it allowed me to model the intervention explicitly and assess its impact through counterfactual scenarios.
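As a sketch of this structure (the compartment labels, the switch-on of vaccination at a start time, and the exact functional forms are my reading of the model described here, not the project's verbatim code):

```python
# Minimal SEIR model extended with vaccination (V) and disease deaths (D).
# Parameter symbols follow the text: beta (transmission), sigma (1/incubation),
# gamma (1/infectious period), alpha (mortality), nu (vaccination rate).
import numpy as np
from scipy.integrate import odeint

def seirvd(y, t, beta, sigma, gamma, alpha, nu, t_vax, eff, N):
    S, E, I, R, V, D = y
    vax = nu * eff if t >= t_vax else 0.0    # campaign switches on at t_vax
    dS = -beta * S * I / N - vax * S         # infection + effective vaccination
    dE = beta * S * I / N - sigma * E        # incubation (1/sigma ~ 6 days)
    dI = sigma * E - (gamma + alpha) * I     # recovery or death
    dR = gamma * I
    dV = vax * S
    dD = alpha * I
    return [dS, dE, dI, dR, dV, dD]

# Illustrative run using the fitted values reported later in the post.
N = 800_000
t = np.linspace(0, 60, 61)
y0 = [N - 1, 0, 1, 0, 0, 0]
sol = odeint(seirvd, y0, t,
             args=(0.206, 1/6, 1/7, 0.033, 0.009, 0.0, 0.95, N))
```

Note that the six derivatives sum to zero, so the total population is conserved—a useful sanity check on any compartmental implementation.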
With only 7 data points, I couldn't estimate all parameters simultaneously without overfitting. I made a strategic decision to fix well-established parameters from medical literature and focus estimation on the uncertain ones:
| Parameter | Value | Source | Status |
|---|---|---|---|
| Incubation Period (1/σ) | 6 days | WHO Guidelines, Monath & Vasconcelos (2015) | Fixed |
| Infectious Period (1/γ) | 7 days | Clinical literature, Garske et al. (2014) | Fixed |
| Vaccine Efficacy | 95% | WHO Position Paper (2013) | Fixed |
| Transmission Rate (β) | — | Estimated from data | Fitted |
| Mortality Rate (α) | — | Estimated from data | Fitted |
| Vaccination Rate | — | Estimated from data | Fitted |
| Vaccination Start | — | Estimated from data | Fitted |
This approach balanced data constraints with scientific rigor. The fixed parameters came from peer-reviewed clinical studies with narrow uncertainty ranges, while the fitted parameters represented outbreak-specific dynamics that varied by context.
For parameter estimation, I chose Maximum Likelihood Estimation (MLE) with a Poisson likelihood function. This wasn't an arbitrary choice; it was driven by the nature of the data: outbreak counts are small, discrete, non-negative events, which a Poisson model describes naturally, whereas a Gaussian (least-squares) error model breaks down at such low counts.
I implemented the likelihood function to incorporate both cases and deaths, ensuring the optimizer couldn't ignore one data stream in favor of the other. Each observation contributed to the total log-likelihood through its Poisson probability.
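A minimal version of such a joint likelihood, assuming the model's predicted cumulative cases and deaths at the observation times are already available as arrays:

```python
# Joint Poisson negative log-likelihood over cases and deaths.
# Both data streams contribute, so the optimizer cannot ignore either one.
import numpy as np
from scipy.stats import poisson

def neg_log_likelihood(obs_cases, obs_deaths, model_cases, model_deaths):
    """Negative joint log-likelihood; lower is a better fit."""
    eps = 1e-9  # guard against log(0) when the model predicts exactly zero
    ll_cases = poisson.logpmf(obs_cases, np.maximum(model_cases, eps)).sum()
    ll_deaths = poisson.logpmf(obs_deaths, np.maximum(model_deaths, eps)).sum()
    return -(ll_cases + ll_deaths)

# Illustrative call with placeholder observations and model predictions.
nll = neg_log_likelihood(np.array([5, 12, 30]), np.array([1, 2, 5]),
                         np.array([4.8, 13.1, 28.5]), np.array([0.9, 2.2, 4.7]))
```

Minimizing this quantity over (β, α, vaccination rate, vaccination start) is then a standard bounded optimization problem.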
During initial fitting attempts, I encountered a frustrating issue: the mortality parameter (α) kept converging to zero, predicting no deaths despite 11 observed deaths. The optimizer had found a local minimum where it could fit cases reasonably while completely ignoring mortality.
The root cause was simple but critical: the lower bound on α was set to 0.0, letting the optimizer "escape" the death predictions entirely. The fix was to adjust the parameter bounds, giving α a small strictly positive floor so the optimizer could no longer collapse the death stream.
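A hypothetical sketch of that bounds change as it would look with `scipy.optimize.minimize` (the parameter ordering and the specific bound values here are assumptions; only the positive floor on α is the point):

```python
# Bounds for (beta, alpha, vaccination_rate, vaccination_start).
from scipy.optimize import minimize  # noqa: F401  (used in the commented call)

# Before: alpha's lower bound of 0.0 let the optimizer zero out all deaths.
bounds_before = [(0.01, 1.0),    # beta
                 (0.0,  0.2),    # alpha  <-- problem: can collapse to 0
                 (0.0,  0.1),    # vaccination rate
                 (0.0,  60.0)]   # vaccination start (days)

# After: a small strictly positive floor keeps the death stream in play.
bounds_after = [(0.01,  1.0),
                (0.001, 0.2),    # alpha can no longer escape to zero
                (0.0,   0.1),
                (0.0,   60.0)]

# result = minimize(objective, x0, bounds=bounds_after, method="L-BFGS-B")
```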
This debugging process highlighted the importance of understanding optimizer behavior—good bounds aren't just about biological plausibility, they're about guiding the optimization to meaningful solutions.
After resolving the optimization issues, the model converged successfully with excellent fit quality:
| Metric | Cases | Deaths |
|---|---|---|
| R² (Coefficient of Determination) | 0.93 | 0.96 |
| RMSE (Root Mean Squared Error) | 5.0 | 0.80 |
| MAE (Mean Absolute Error) | 4.3 | 0.64 |
| Parameter | Estimate | Interpretation |
|---|---|---|
| Transmission Rate (β) | 0.206 | Rate of new infections per contact |
| Mortality Rate (α) | 0.033 | Per-day death rate while infectious |
| Vaccination Rate | 0.009 | ~0.9% of susceptibles vaccinated daily |
| R₀ (Basic Reproduction Number) | 1.17 | Each case infects ~1.17 others (epidemic potential) |
The R₀ value of 1.17 was particularly important—being greater than 1 confirmed the outbreak had epidemic potential and would continue to grow without intervention. This validated the urgency of the vaccination campaign.
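The fitted values are internally consistent here: in a model where infectious individuals exit by either recovery (γ) or death (α), the standard expression for the basic reproduction number gives

$$R_0 = \frac{\beta}{\gamma + \alpha} = \frac{0.206}{1/7 + 0.033} \approx 1.17,$$

matching the estimate in the table (this assumes R₀ is computed from the classical SEIR next-generation expression with mortality included in the removal rate).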
To quantify the vaccination program's effectiveness, I ran three scenarios using the fitted model:
| Scenario | Total Cases | Total Deaths | Peak Infections |
|---|---|---|---|
| Baseline (With Vaccination) | 62 | 10 | 10 |
| No Vaccination | 97 | 14 | 11 |
| Early Vaccination (2 weeks earlier) | 62 | 10 | 10 |
| Later Vaccination (1 week later) | 70 | 11 | 10 |
Vaccination Impact: The program averted approximately 35 cases (36%) and 4 deaths (29%) compared to the no-intervention scenario. This represents a significant public health benefit from rapid response.
Interestingly, vaccinating two weeks earlier produced identical outcomes to the baseline. This suggests the October 1st start date was already "early enough"—the outbreak was still in its early exponential phase, so the actual timing captured most of the preventable transmission. Starting even earlier wouldn't have helped because there wasn't yet substantial community transmission to prevent. Delaying the rollout by even one week, however, could have resulted in an additional 8 cases and one additional death.
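Mechanically, the scenario analysis amounts to re-running the fitted model with different vaccination start times. A self-contained sketch (the model right-hand side is restated so the snippet runs on its own; initial conditions and the specific day numbers are illustrative):

```python
# Counterfactual scenarios: same fitted parameters, different vaccination start.
import numpy as np
from scipy.integrate import odeint

def seirvd(y, t, beta, sigma, gamma, alpha, nu, t_vax, eff, N):
    S, E, I, R, V, D = y
    vax = nu * eff if t >= t_vax else 0.0
    return [-beta * S * I / N - vax * S,
            beta * S * I / N - sigma * E,
            sigma * E - (gamma + alpha) * I,
            gamma * I,
            vax * S,
            alpha * I]

N, t = 800_000, np.linspace(0, 90, 91)
y0 = [N - 5, 0, 5, 0, 0, 0]                       # illustrative seeding
fitted = (0.206, 1/6, 1/7, 0.033, 0.009)          # beta, sigma, gamma, alpha, nu

scenarios = {"baseline": 14.0,          # vaccination starts on day 14 (assumed)
             "no_vaccination": 1e9,     # campaign never starts
             "one_week_later": 21.0}

for name, t_vax in scenarios.items():
    sol = odeint(seirvd, y0, t, args=(*fitted, t_vax, 0.95, N))
    print(f"{name}: cumulative deaths ~ {sol[-1, 5]:.1f}")
```

Total cases and deaths for each scenario then come straight from the terminal values of the cumulative compartments.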
Since I fixed the incubation and infectious periods based on literature, I needed to verify that my conclusions were robust to these choices. I re-ran the analysis across 16 parameter combinations spanning the full literature-supported ranges:
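Structurally, this robustness check is a grid of re-fits plus a summary. A sketch of the grid and the summary step (the ranges shown are assumptions matching "16 combinations"; `r0_by_combo` would be filled by re-running the MLE fit at each grid point):

```python
# Sensitivity grid over the fixed incubation and infectious periods.
import itertools
import numpy as np

incubation_days = [4, 5, 6, 7]   # assumed literature range for 1/sigma
infectious_days = [5, 6, 7, 8]   # assumed literature range for 1/gamma
grid = list(itertools.product(incubation_days, infectious_days))  # 16 combos

def summarize(r0_by_combo):
    """Summary statistics for the R0 estimates across the grid."""
    r0 = np.asarray(r0_by_combo)
    return {"min": r0.min(),
            "max": r0.max(),
            "cv": r0.std() / r0.mean(),              # coefficient of variation
            "all_above_one": bool((r0 > 1.0).all())} # the epidemic conclusion
```

The conclusion is robust when `all_above_one` holds and the coefficient of variation stays small across the whole grid.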
Key Finding: R₀ remained above 1.0 for all parameter combinations tested, with a coefficient of variation of only 8%. This confirmed that the epidemic conclusion—that yellow fever had outbreak potential requiring intervention—was robust to uncertainty in the fixed parameters.
The project's takeaways fall into three areas:
- Scientific rigor with limited data
- Optimization and debugging
- Practical public health insights
This project reinforced that epidemic modeling is as much about careful decision-making as sophisticated mathematics. Every choice—from which parameters to fix, to how to handle outlier data points, to which optimization bounds to set—required balancing statistical principles, domain knowledge, and practical constraints.
Working with sparse data forced me to be deliberate about uncertainty. Rather than treating 7 data points as a limitation, I treated it as a constraint that required smarter modeling choices. The result was a model that was simultaneously simple enough to be interpretable, complex enough to capture key dynamics, and robust enough to trust for policy insights.
Most importantly, this project demonstrated how mathematical models can bridge the gap between observational data and actionable public health decisions—quantifying not just what happened, but what would have happened under different scenarios.