
Yellow Fever Outbreak Modeling: Senegal 2002

by Veronica Scerra

Compartmental disease modeling with vaccination intervention to quantify public health impact and assess endemic potential.

Coming off my HIV/AIDS modeling project for Cuba, I wanted to tackle a different type of epidemic challenge—one with a rapid outbreak timeline, limited data, and a clear intervention point. Yellow fever provided exactly that opportunity. Unlike HIV's slow burn over decades, yellow fever outbreaks explode quickly and can be controlled through vaccination. The 2002 outbreak in Touba, Senegal offered a perfect case study: a discrete epidemic with WHO surveillance data and a documented vaccination response starting October 1st.

Dataset and Context

The WHO reported a yellow fever outbreak in Touba, Senegal (population ~800,000) beginning in October 2002. The city identified cases on October 11th and had already initiated a mass vaccination campaign on October 1st following early detection. The available data consisted of just 8 time points spanning January through November 2002, documenting cumulative cases and deaths.

Working with only 8 observations might seem limiting, but this reflects the reality of outbreak response—you make decisions with incomplete information. The challenge was to build a model that could extract meaningful insights from sparse data while remaining scientifically rigorous.

Data Quality Challenge

An early obstacle emerged when examining the temporal structure: the January 18th observation showed 18 cumulative cases, but by October 4th, only 12 cases were reported. Cumulative counts can't decrease—this violated a fundamental constraint of epidemic data. After investigating, I determined the January observation was likely either from background surveillance or misattributed, not part of the October-November outbreak cluster. I removed this data point to maintain monotonicity, leaving 7 observations from the actual outbreak period.

Key Decision: Data filtering based on epidemiological principles
  • Removed non-monotonic observation (Jan 18)
  • Verified remaining data showed proper cumulative growth
  • Reset time origin to outbreak detection (Oct 4, 2002)
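A sketch of that monotonicity filter in pandas. The January value (18), the October 4th value (12), and the final totals (60 cases) come from the text; the intermediate counts here are illustrative placeholders, not the actual WHO figures.

```python
import pandas as pd

# Hypothetical reconstruction of the observation table; only the Jan 18,
# Oct 4, and final values are quoted in the text -- the rest are illustrative.
obs = pd.DataFrame({
    "date": pd.to_datetime(["2002-01-18", "2002-10-04", "2002-10-11",
                            "2002-10-18", "2002-10-25", "2002-11-01",
                            "2002-11-08", "2002-11-15"]),
    "cum_cases": [18, 12, 15, 23, 35, 45, 54, 60],  # Jan 18 breaks monotonicity
})

def enforce_monotonic(df, col="cum_cases"):
    """Drop any observation that exceeds a later cumulative count."""
    suffix_min = df[col].iloc[::-1].cummin().iloc[::-1]  # min over each suffix
    return df[df[col] <= suffix_min].reset_index(drop=True)

clean = enforce_monotonic(obs)
# Reset the time origin to the first retained observation (outbreak detection)
clean["day"] = (clean["date"] - clean["date"].iloc[0]).dt.days
```

This drops the January point and leaves 7 monotonically increasing observations with day 0 at October 4th.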

Model Architecture: SEIR with Vaccination

I chose a compartmental SEIR model—a standard framework in infectious disease epidemiology—and extended it to include vaccination. The model divides the population into six compartments:

  • Susceptible (S): individuals at risk of infection
  • Exposed (E): infected but not yet infectious (incubating)
  • Infectious (I): actively transmitting the virus
  • Recovered (R): immune after surviving infection
  • Vaccinated (V): protected by effective vaccination
  • Dead (D): deaths from infection

The vaccination component was critical—it allowed me to model the intervention explicitly and assess its impact through counterfactual scenarios.
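To make the structure concrete, here is a minimal sketch of one plausible six-compartment formulation. The post doesn't print its equations, so the exact terms (in particular, where vaccine efficacy enters) are my assumptions; the parameter values are the fixed and fitted estimates reported below.

```python
# One plausible SEIR + vaccination formulation (assumed, not the article's
# exact equations): S, E, I, R, V (effectively vaccinated), D (dead).
import numpy as np
from scipy.integrate import solve_ivp

def seirvd(t, y, beta, sigma, gamma, alpha, nu, vax_start, efficacy):
    S, E, I, R, V, D = y
    N = S + E + I + R + V                          # living, mixing population
    v = nu * efficacy if t >= vax_start else 0.0   # campaign switches on
    dS = -beta * S * I / N - v * S                 # infection + effective vaccination
    dE = beta * S * I / N - sigma * E              # incubation at rate sigma
    dI = sigma * E - (gamma + alpha) * I           # exit via recovery or death
    dR = gamma * I
    dV = v * S
    dD = alpha * I
    return [dS, dE, dI, dR, dV, dD]

# Fixed literature values and fitted estimates from the results tables
params = (0.206, 1/6, 1/7, 0.033, 0.009, 0.0, 0.95)
y0 = [800_000 - 1, 0, 1, 0, 0, 0]                  # Touba, one initial infectious
sol = solve_ivp(seirvd, (0, 60), y0, args=params,
                t_eval=np.linspace(0, 60, 61), rtol=1e-8)
```

One sanity check on any formulation like this: the six derivatives sum to zero, so total population (living plus dead) is conserved by construction.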

Fixed vs. Fitted Parameters

With only 7 data points, I couldn't estimate all parameters simultaneously without overfitting. I made a strategic decision to fix well-established parameters from medical literature and focus estimation on the uncertain ones:

Parameter                  Value                Source                                        Status
Incubation Period (1/σ)    6 days               WHO Guidelines, Monath & Vasconcelos (2015)   Fixed
Infectious Period (1/γ)    7 days               Clinical literature, Garske et al. (2014)     Fixed
Vaccine Efficacy           95%                  WHO Position Paper (2013)                     Fixed
Transmission Rate (β)      Estimated from data                                                Fitted
Mortality Rate (α)         Estimated from data                                                Fitted
Vaccination Rate           Estimated from data                                                Fitted
Vaccination Start          Estimated from data                                                Fitted

This approach balanced data constraints with scientific rigor. The fixed parameters came from peer-reviewed clinical studies with narrow uncertainty ranges, while the fitted parameters represented outbreak-specific dynamics that varied by context.

Optimization Method: Maximum Likelihood Estimation

For parameter estimation, I chose Maximum Likelihood Estimation (MLE) with a Poisson likelihood function. This wasn't an arbitrary choice—it was driven by the nature of the data:

Why MLE with Poisson?
  • Case and death counts are small, discrete events; Poisson is the natural error model for count data
  • The normal-error assumption behind least squares breaks down when counts are this low
  • Poisson variance scales with the mean, matching the growing uncertainty as the outbreak expands

I implemented the likelihood function to incorporate both cases and deaths, ensuring the optimizer couldn't ignore one data stream in favor of the other. Each observation contributed to the total log-likelihood through its Poisson probability.
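A sketch of what a joint Poisson objective like this looks like (the function name and exact guard values are mine, not the article's implementation):

```python
import numpy as np
from scipy.stats import poisson

def neg_log_likelihood(pred_cases, pred_deaths, obs_cases, obs_deaths):
    """Negative joint log-likelihood over both data streams, so the
    optimizer cannot fit cases while ignoring deaths."""
    pred_cases = np.maximum(pred_cases, 1e-9)    # guard against log(0)
    pred_deaths = np.maximum(pred_deaths, 1e-9)
    ll = poisson.logpmf(obs_cases, pred_cases).sum()
    ll += poisson.logpmf(obs_deaths, pred_deaths).sum()
    return -ll
```

Because both streams enter the same scalar objective, a parameter set that nails cases but predicts zero deaths is heavily penalized, which is exactly the failure mode discussed next.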

Alternatives Considered:
  • Least Squares: Simpler but inappropriate for count data (assumes normal errors)
  • Bayesian MCMC: Would provide full uncertainty but too computationally expensive for this exploratory analysis
  • Profile Likelihood: Excellent for confidence intervals but requires many model runs

Optimization Challenges: The Alpha Problem

During initial fitting attempts, I encountered a frustrating issue: the mortality parameter (α) kept converging to zero, predicting no deaths despite 11 observed deaths. The optimizer had found a local minimum where it could fit cases reasonably while completely ignoring mortality.

The root cause was simple but critical: the lower bound on α was set to 0.0, allowing the optimizer to "escape" the death predictions entirely. The fix required adjusting the parameter bounds to prevent this numerical instability:

Solution: Changed mortality bounds from [0.0, 0.3] to [0.005, 0.3]
  • Justification: Observed case fatality rate (CFR) was 18.3% (11 deaths / 60 cases)
  • Over 7-day infectious period: α ≈ 0.183/7 ≈ 0.026
  • Minimum 0.005 prevents numerical issues while allowing model flexibility

This debugging process highlighted the importance of understanding optimizer behavior—good bounds aren't just about biological plausibility, they're about guiding the optimization to meaningful solutions.
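Here is how that bounds change looks in a scipy.optimize call. The parameter order and the toy objective are stand-ins for the actual fitting pipeline; only the α arithmetic and the bound values come from the text.

```python
from scipy.optimize import minimize

# Back-of-envelope check behind the new lower bound on alpha (values from text)
cfr = 11 / 60                 # observed case fatality rate ~= 0.183
alpha_expected = cfr / 7      # spread over the 7-day infectious period ~= 0.026

# Hypothetical parameter order: beta, alpha, vaccination rate, vaccination start
bounds = [(0.01, 1.0),
          (0.005, 0.3),       # floor keeps the optimizer off the alpha = 0 trap
          (1e-4, 0.05),
          (0.0, 30.0)]

def objective(theta):
    # Stand-in for the real SEIR + Poisson-likelihood objective
    beta, alpha, nu, t0 = theta
    return (beta - 0.206)**2 + (alpha - 0.033)**2 + (nu - 0.009)**2

fit = minimize(objective, x0=[0.3, 0.05, 0.01, 5.0],
               bounds=bounds, method="L-BFGS-B")
```

With the floor at 0.005, any solution predicting literally zero deaths is outside the feasible region, so the optimizer has to reconcile both data streams.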

Results and Model Performance

After resolving the optimization issues, the model converged successfully with excellent fit quality:

Metric                               Cases   Deaths
R² (Coefficient of Determination)    0.93    0.96
RMSE (Root Mean Squared Error)       5.0     0.80
MAE (Mean Absolute Error)            4.3     0.64

Fitted Parameters

Parameter                        Estimate   Interpretation
Transmission Rate (β)            0.206      Rate of new infections per contact
Mortality Rate (α)               0.033      Daily probability of death while infectious
Vaccination Rate                 0.009      ~0.9% of susceptibles vaccinated daily
R₀ (Basic Reproduction Number)   1.17       Each case infects ~1.17 others (epidemic potential)

The R₀ value of 1.17 was particularly important—being greater than 1 confirmed the outbreak had epidemic potential and would continue to grow without intervention. This validated the urgency of the vaccination campaign.
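The article doesn't state which R₀ expression it uses, but the fitted values are consistent with the standard formula for a model where infectious individuals exit through either recovery or death, R₀ = β/(γ + α):

```python
# Reproducing the reported R0 from the fitted parameters (formula assumed:
# infectious individuals leave the I compartment by recovery or death)
beta, gamma, alpha = 0.206, 1/7, 0.033
R0 = beta / (gamma + alpha)
print(round(R0, 2))  # 1.17
```

Note that the incubation rate σ does not appear: in an SEIR model, every exposed individual eventually becomes infectious, so the latent period delays transmission without changing R₀.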

Counterfactual Analysis: Vaccination Impact

To quantify the vaccination program's effectiveness, I ran three scenarios using the fitted model:

Scenario                              Total Cases   Total Deaths   Peak Infections
Baseline (With Vaccination)           62            10             10
No Vaccination                        97            14             11
Early Vaccination (2 weeks earlier)   62            10             10
Later Vaccination (1 week later)      70            11             10

Vaccination Impact: The program averted approximately 35 cases (35%) and 4 deaths (29%) compared to the no-intervention scenario. This represents significant public health benefit from rapid response.

Interestingly, vaccinating two weeks earlier produced identical outcomes to the baseline. This suggests the October 1st start date was already "early enough"—the outbreak was still in its early exponential phase, so the actual timing captured most of the preventable transmission. Starting even earlier wouldn't have helped because there wasn't yet substantial community transmission to prevent. Delaying the vaccination rollout by even one week, however, could have resulted in an additional 8 cases and one additional death.
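The mechanics of a counterfactual sweep like this are simple: rerun the fitted model with the vaccination rate or start time changed, and compare totals. The sketch below uses a simplified version of the model (efficacy folded into the rate, parameters reproduced so the snippet is self-contained); its absolute numbers are illustrative and will not match the article's fitted scenario totals.

```python
from scipy.integrate import solve_ivp

def rhs(t, y, beta, sigma, gamma, alpha, nu, t_vax):
    S, E, I, R, V, D = y
    N = S + E + I + R + V
    v = nu if t >= t_vax else 0.0          # vaccination on/off by scenario
    return [-beta*S*I/N - v*S, beta*S*I/N - sigma*E,
            sigma*E - (gamma + alpha)*I, gamma*I, v*S, alpha*I]

def run(nu, t_vax, days=120):
    y0 = [800_000 - 1, 0, 1, 0, 0, 0]
    sol = solve_ivp(rhs, (0, days), y0,
                    args=(0.206, 1/6, 1/7, 0.033, nu, t_vax), rtol=1e-8)
    S_end, V_end, D_end = sol.y[0, -1], sol.y[4, -1], sol.y[5, -1]
    ever_infected = 800_000 - S_end - V_end   # everyone who left S by infection
    return ever_infected, D_end

scenarios = {
    "baseline":       run(nu=0.009, t_vax=0.0),
    "no_vaccination": run(nu=0.0,   t_vax=0.0),
    "delayed_1_week": run(nu=0.009, t_vax=7.0),
}
```

Even in this simplified form, the ordering the article reports falls out directly: no vaccination is worst, a delayed start is intermediate, and the baseline is best.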

Sensitivity Analysis: Robustness Check

Since I fixed the incubation and infectious periods based on literature, I needed to verify that my conclusions were robust to these choices. I re-ran the analysis across 16 parameter combinations spanning the full literature-supported ranges:

Key Finding: R₀ remained above 1.0 for all parameter combinations tested, with a coefficient of variation of only 8%. This confirmed that the epidemic conclusion—that yellow fever had outbreak potential requiring intervention—was robust to uncertainty in the fixed parameters.
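A sweep like this is just a 4×4 grid over the two fixed durations. The sketch below shows the mechanics only: the ranges are my assumptions, and in the article β is refit for each combination (which is where its 8% coefficient of variation comes from), whereas here the fitted β is held fixed for brevity.

```python
import itertools
import statistics

beta, alpha = 0.206, 0.033
incubation_days = [3, 4, 5, 6]   # assumed illustrative ranges, 4 x 4 = 16 combos
infectious_days = [6, 7, 8, 9]

r0_values = []
for inc, inf in itertools.product(incubation_days, infectious_days):
    gamma = 1 / inf
    # R0 = beta / (gamma + alpha); the incubation period does not enter,
    # but the grid mirrors the 16-combination sweep described above
    r0_values.append(beta / (gamma + alpha))

cv = statistics.stdev(r0_values) / statistics.mean(r0_values)
all_above_one = all(r > 1.0 for r in r0_values)
```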

Key Insights and Methodological Decisions

Scientific Rigor with Limited Data
  • Fixed well-characterized parameters from peer-reviewed literature and fitted only the outbreak-specific ones
  • Removed the non-monotonic observation on epidemiological grounds rather than forcing the model to fit it
  • Verified the epidemic conclusion with a 16-combination sensitivity analysis

Optimization and Debugging
  • Diagnosed a degenerate fit (α converging to zero) and traced it to an overly permissive lower bound
  • Set bounds from observed quantities (a CFR-derived floor on α), not just biological plausibility

Practical Public Health Insights
  • The vaccination campaign averted roughly 35 cases and 4 deaths
  • Timing mattered asymmetrically: starting two weeks earlier gained nothing, but a one-week delay would have cost 8 cases and a death

What I Learned

This project reinforced that epidemic modeling is as much about careful decision-making as sophisticated mathematics. Every choice—from which parameters to fix, to how to handle outlier data points, to which optimization bounds to set—required balancing statistical principles, domain knowledge, and practical constraints.

Working with sparse data forced me to be deliberate about uncertainty. Rather than treating 7 data points as a limitation, I treated it as a constraint that required smarter modeling choices. The result was a model that was simultaneously simple enough to be interpretable, complex enough to capture key dynamics, and robust enough to trust for policy insights.

Most importantly, this project demonstrated how mathematical models can bridge the gap between observational data and actionable public health decisions—quantifying not just what happened, but what would have happened under different scenarios.
