Health Impacts¶
This chapter describes how the model quantifies the health consequences of dietary choices. It begins with the epidemiological concepts that underpin the methodology, then explains the implementation strategy for embedding these nonlinear relationships into a linear optimisation framework.
Conceptual Framework¶
The health module converts dietary intake patterns into monetised health costs using epidemiological dose–response relationships from the Global Burden of Disease (GBD) Study. This section explains the key concepts and formulas.
Relative Risk¶
For a given disease \(d\) (e.g., coronary heart disease) and dietary risk factor (e.g., vegetable intake), the relative risk \(\mathrm{RR}_d(x)\) quantifies how the probability of developing that disease changes with intake level \(x\). Specifically, \(\mathrm{RR}_d(x)\) is the ratio of disease probability at intake \(x\) to the probability at some reference intake.
Example
From GBD data, vegetables and CHD: \(\mathrm{RR}_{\mathrm{CHD}}(0) = 1.0\), \(\mathrm{RR}_{\mathrm{CHD}}(100\text{g}) = 0.91\), \(\mathrm{RR}_{\mathrm{CHD}}(300\text{g}) = 0.80\). Consuming 300g/day of vegetables reduces CHD risk by 20% compared to zero intake.
For protective foods (fruits, vegetables, whole grains, etc.), RR decreases as intake increases. For harmful foods (red meat, processed meat), RR increases with intake.
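To make the dose–response idea concrete, the following minimal sketch evaluates a relative risk curve between tabulated points. It reuses the three vegetable/CHD points from the example above; linear interpolation between points and clamping outside the data range are assumptions made for illustration, not a statement about how GBD or food-opt shapes the curve.

```python
from bisect import bisect_right

# Dose-response points for vegetables and CHD, taken from the example above.
INTAKE_G = [0.0, 100.0, 300.0]
RR_CHD = [1.00, 0.91, 0.80]


def relative_risk(x, xs=INTAKE_G, rrs=RR_CHD):
    """Linearly interpolate RR at intake x (g/day), clamping outside the data range."""
    if x <= xs[0]:
        return rrs[0]
    if x >= xs[-1]:
        return rrs[-1]
    k = bisect_right(xs, x) - 1              # index of the segment containing x
    t = (x - xs[k]) / (xs[k + 1] - xs[k])    # position within that segment, 0..1
    return rrs[k] + t * (rrs[k + 1] - rrs[k])


print(relative_risk(200.0))  # halfway between the 100 g and 300 g points
```

For a protective food such as vegetables the interpolated value falls as intake rises; a harmful food would use the same function with an increasing list of RR values.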
Theoretical Minimum Risk Exposure Level (TMREL)¶
The TMREL, denoted \(\bar{x}\), is the intake level that minimises disease risk. We define:
\[
\mathrm{RR}_d^{\mathrm{ref}} = \mathrm{RR}_d(\bar{x})
\]
as the reference relative risk at optimal intake. For protective foods, TMREL corresponds to high intake where the RR curve reaches its minimum. For harmful foods, TMREL is typically zero.
Population Attributable Fraction (PAF)¶
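The population attributable fraction measures how much of the disease burden would change if intake shifted from a baseline level \(x^{\mathrm{base}}\) to a new level \(x\). It is defined as:
\[
\mathrm{PAF}_d(x) = 1 - \frac{\mathrm{RR}_d(x)}{\mathrm{RR}_d(x^{\mathrm{base}})} = \frac{\mathrm{RR}_d(x^{\mathrm{base}}) - \mathrm{RR}_d(x)}{\mathrm{RR}_d(x^{\mathrm{base}})}
\]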
Interpretation:
\(\mathrm{PAF}_d(x) > 0\): intake \(x\) is healthier than baseline (disease burden decreases)
\(\mathrm{PAF}_d(x) < 0\): intake \(x\) is less healthy than baseline (disease burden increases)
\(\mathrm{PAF}_d(\bar{x})\) is the fraction of burden avoidable by shifting to optimal intake
Example
Suppose baseline vegetable intake is 150g/day with \(\mathrm{RR}_{\mathrm{CHD}}(150) = 0.87\), and we consider shifting to 300g/day with \(\mathrm{RR}_{\mathrm{CHD}}(300) = 0.80\). Then:
\[
\mathrm{PAF}_{\mathrm{CHD}}(300) = 1 - \frac{0.80}{0.87} \approx 0.08
\]
An 8% reduction in CHD burden is attributable to this dietary shift.
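The calculation can be checked with a few lines of Python. This is a sketch that assumes the PAF takes the form \(1 - \mathrm{RR}_d(x) / \mathrm{RR}_d(x^{\mathrm{base}})\), which reproduces the 8% figure above; the function name `paf` is illustrative only.

```python
def paf(rr_base, rr_new):
    """Fraction of baseline burden avoided when RR moves from rr_base to rr_new."""
    if rr_base <= 0:
        raise ValueError("baseline relative risk must be positive")
    return 1.0 - rr_new / rr_base


# Vegetable example: baseline 150 g/day (RR 0.87), new intake 300 g/day (RR 0.80).
print(f"{paf(0.87, 0.80):.3f}")
```

A negative return value indicates a shift that is less healthy than baseline, matching the interpretation given above.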
Years of Life Lost (YLL)¶
Years of life lost quantifies premature mortality by multiplying deaths by remaining life expectancy. Let \(\mathrm{YLL}_d\) denote the observed baseline YLL for disease \(d\) in a population.
When intake changes from baseline \(x^{\mathrm{base}}\) to \(x\), the change in YLL is:
\[
\Delta\mathrm{YLL}_d(x) = \mathrm{PAF}_d(x) \times \mathrm{YLL}_d
\]
Example
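A population loses 50,000 years of life annually to CHD (\(\mathrm{YLL}_{\mathrm{CHD}} = 50{,}000\)). If a dietary intervention achieves \(\mathrm{PAF}_{\mathrm{CHD}} = 0.08\), then:
\[
\Delta\mathrm{YLL}_{\mathrm{CHD}} = 0.08 \times 50{,}000 = 4{,}000 \text{ years of life saved per year}
\]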
Multiple Risk Factors¶
When multiple dietary risk factors affect the same disease \(d\), their effects combine multiplicatively:
\[
\mathrm{RR}_d = \prod_r \mathrm{RR}_{r,d}(x_r)
\]
where \(r\) indexes risk factors and \(x_r\) is the intake for each.
Example
CHD is affected by both vegetables (\(\mathrm{RR}_{v,\mathrm{CHD}} = 0.80\)) and red meat (\(\mathrm{RR}_{m,\mathrm{CHD}} = 1.15\)). The combined effect:
\[
\mathrm{RR}_{\mathrm{CHD}} = 0.80 \times 1.15 = 0.92
\]
Net 8% reduction in CHD risk despite increased red meat consumption.
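As a quick check of the multiplicative rule, the sketch below multiplies per-factor relative risks for one disease. The function name `combined_rr` and the dictionary layout are illustrative assumptions.

```python
import math


def combined_rr(rr_by_factor):
    """Combine per-risk-factor relative risks for one disease by multiplication."""
    return math.prod(rr_by_factor.values())


# Values from the CHD example above.
rr_chd = combined_rr({"vegetables": 0.80, "red_meat": 1.15})
print(f"{rr_chd:.2f}")
```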
Health Cost Formulation¶
In food-opt, we define the health cost as the monetised value of years
of life lost that could have been avoided by eating optimally. For a
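population cluster \(c\) and disease \(d\):
\[
\mathrm{Cost}_{c,d}(x) = V \cdot \left[ \Delta\mathrm{YLL}_d(\bar{x}) - \Delta\mathrm{YLL}_d(x) \right]
\]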
where \(V\) is the value per year of life lost (configured as
health.value_per_yll, default 50,000 USD). The term
\(\Delta\mathrm{YLL}_d(\bar{x})\) is the maximum YLL avoidable (at optimal
intake), while \(\Delta\mathrm{YLL}_d(x)\) is the YLL actually avoided at
intake \(x\). The difference is the YLL that could have been avoided but
wasn’t—the health cost of not eating optimally.
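To get an implementation-friendly formula that uses relative risks directly, we expand and simplify using \(\Delta\mathrm{YLL}_d(x) = \mathrm{PAF}_d(x) \times \mathrm{YLL}_{c,d}\) and \(\mathrm{PAF}_d(x) = 1 - \mathrm{RR}_d(x) / \mathrm{RR}_d(x^{\mathrm{base}})\):
\[
\Delta\mathrm{YLL}_d(\bar{x}) - \Delta\mathrm{YLL}_d(x)
= \left[ \mathrm{PAF}_d(\bar{x}) - \mathrm{PAF}_d(x) \right] \mathrm{YLL}_{c,d}
= \frac{\mathrm{RR}_d(x) - \mathrm{RR}_d^{\mathrm{ref}}}{\mathrm{RR}_d(x^{\mathrm{base}})} \, \mathrm{YLL}_{c,d}
\]
This gives the final formula:
\[
\mathrm{Cost}_{c,d}(x) = V \cdot \mathrm{YLL}_{c,d} \cdot \frac{\mathrm{RR}_d(x) - \mathrm{RR}_d^{\mathrm{ref}}}{\mathrm{RR}_d(x^{\mathrm{base}})}
\]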
Key properties:
Zero cost at TMREL: When \(x = \bar{x}\), the cost is zero because we avoid as many years of life lost as possible.
Non-negative costs: Since TMREL minimises RR, we have \(\mathrm{RR}_d(x) \geq \mathrm{RR}_d^{\mathrm{ref}}\) always.
Example
Consider a cluster with:
\(\mathrm{YLL}_{\mathrm{CHD}} = 100{,}000\) years (observed CHD burden)
\(\mathrm{RR}_{\mathrm{CHD}}(x^{\mathrm{base}}) = 1.10\) (baseline diet slightly unhealthy)
\(\mathrm{RR}_{\mathrm{CHD}}^{\mathrm{ref}} = 0.85\) (at TMREL)
\(\mathrm{RR}_{\mathrm{CHD}}(x) = 0.95\) (optimised diet, not quite optimal)
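\(V = 50{,}000\) USD/YLL
\[
50{,}000 \times 100{,}000 \times \frac{0.95 - 0.85}{1.10} \approx 4.55 \times 10^{8} \text{ USD}
\]
The health cost is approximately 455 million USD for this cluster–disease pair.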
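The worked example can be reproduced with a short function. This is a sketch under the assumption that the cost takes the form \(V \cdot \mathrm{YLL}_{c,d} \cdot (\mathrm{RR}_d(x) - \mathrm{RR}_d^{\mathrm{ref}}) / \mathrm{RR}_d(x^{\mathrm{base}})\), which matches the 455 million USD figure; the function and argument names are hypothetical and do not mirror the food-opt code.

```python
def health_cost(yll, rr_base, rr_ref, rr_x, value_per_yll=50_000.0):
    """Monetised YLL that could have been avoided at TMREL but are not avoided at rr_x."""
    if rr_base <= 0:
        raise ValueError("baseline relative risk must be positive")
    return value_per_yll * yll * (rr_x - rr_ref) / rr_base


# Cluster from the example: 100,000 YLL, RR 1.10 at baseline, 0.85 at TMREL, 0.95 achieved.
cost = health_cost(yll=100_000, rr_base=1.10, rr_ref=0.85, rr_x=0.95)
print(f"{cost / 1e6:.0f} million USD")
```

Passing `rr_x` equal to `rr_ref` returns zero, which is the "zero cost at TMREL" property listed above.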
GBD Dietary Risk Factors¶
The model uses dietary risk factor definitions from the Global Burden of Disease Study 2021 [Brauer2024]. The following table reproduces a subset of these definitions from Brauer et al. (2024, Supplementary Appendix 1, p. 171).
| Risk Factor | Definition of Exposure | Optimal Level (TMREL) |
|---|---|---|
| Diet low in fruit | Average daily consumption of fruit including fresh, frozen, cooked, canned, or dried fruit, excluding fruit juices and salted or pickled fruits | 340–350 g/day |
| Diet low in vegetables | Average daily consumption of vegetables, including fresh, frozen, cooked, canned, or dried vegetables, excluding legumes, salted or pickled vegetables, juices, nuts and seeds, and starchy vegetables | 306–372 g/day |
| Diet low in whole grains | Average daily consumption of whole grains (bran, germ, and endosperm in natural proportion) from cereals, bread, rice, pasta, etc. | 160–210 g/day |
| Diet low in nuts and seeds | Average daily consumption of nuts and seeds, including tree nuts, seeds, and peanuts | 19–24 g/day |
| Diet low in legumes | Average daily consumption of legumes and pulses, including fresh, frozen, cooked, canned, or dried legumes | 100–110 g/day |
| Diet low in seafood omega-3 | Average daily consumption of EPA and DHA (mg/day) | 470–660 mg/day |
| Diet high in red meat | Average daily consumption of unprocessed red meat (beef, pork, lamb, goat), excluding processed meats, poultry, fish, and eggs | 0–200 g/day |
| Diet high in processed meat | Average daily consumption of meat preserved by smoking, curing, salting, or chemical preservatives | 0 g/day |
Notes on current implementation:
Risk factors modelled by default: fruits, vegetables, whole_grains, nuts_seeds, legumes, red_meat (configured in health.risk_factors). GBD also provides seafood omega-3 and processed meat risk factors, but fish/seafood and processed meat are not currently modelled as food groups.
Disease causes modelled: CHD (coronary heart disease), Stroke, T2DM (type 2 diabetes), CRC (colorectal cancer)
Sugar: The GBD dataset includes relative risk factors for sugar-sweetened beverages, which are not represented in the model and thus not included here. No relative risk factors are given for total added sugar intake.
TMREL values: Derived from relative risk curves, not taken from the table above (see TMREL Derivation)
Age range: GBD risk factors apply to adults ≥25 years. The minimum age for intake data is set by health.intake_age_min, which is 11 in the configuration shown under Configuration because the GDD adult band starts at 11.
Intake units: All quantities in fresh (as consumed) weight, matching GDD dietary data conventions
TMREL Derivation¶
Rather than using the published TMREL ranges from the table above, the model derives TMREL values directly from the GBD relative risk curves. For each risk factor, the derived TMREL is the intake level \(x\) that minimises the product of \(\mathrm{RR}_d(x)\) across all associated disease causes \(d\), evaluated on the empirical exposure points in the RR data. This approach ensures consistency between the TMREL used in health cost calculations and the underlying dose–response curves.
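The derivation described above amounts to an argmin over the tabulated exposure points. The sketch below illustrates it; the exposure grid and RR values are hypothetical, the function name `derive_tmrel` is not taken from the code base, and ties are broken in favour of the lowest intake as an arbitrary choice.

```python
import math


def derive_tmrel(exposures, rr_by_cause):
    """Return the exposure minimising the product of RR across causes.

    Minimising the sum of log(RR) is equivalent to minimising the product.
    """
    best_x, best_score = None, math.inf
    for k, x in enumerate(exposures):
        score = sum(math.log(rr[k]) for rr in rr_by_cause.values())
        if score < best_score:  # strict comparison keeps the lowest intake on ties
            best_x, best_score = x, score
    return best_x


# Hypothetical vegetable curves on a shared exposure grid (g/day).
exposures = [0.0, 100.0, 200.0, 300.0, 400.0]
rr_by_cause = {
    "CHD": [1.00, 0.91, 0.85, 0.80, 0.82],
    "Stroke": [1.00, 0.93, 0.88, 0.86, 0.85],
}
print(derive_tmrel(exposures, rr_by_cause))
```

In this made-up data the stroke curve keeps falling at 400 g/day while the CHD curve turns upward, so the combined minimum sits at 300 g/day rather than at either end of the grid.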
Implementation Strategy¶
Embedding the health cost formulation into a linear programme requires careful handling of nonlinearities. This section provides a high-level overview of the implementation approach.
Linearizing multiplicative risk factors¶
The core challenge is that relative risks multiply across risk factors \(r\):
\[
\mathrm{RR}_d = \prod_r \mathrm{RR}_{r,d}(x_r)
\]
This product is nonlinear in the intake variables \(x_r\). This is
a problem since food-opt is nominally formulated as a linear
optimization model. Non-linear constraints such as the above cannot
directly be incorporated into the overall linear program formulation,
and generally make the optimization program more difficult to solve
both theoretically and practically speaking.
In order to still incorporate the multiplicative factors, we convert multiplication to a logarithm + addition + exponential, and use piecewise-linear approximations of the logarithmic and exponential functions.
Convert multiplication to addition: \(\log(\prod_r \mathrm{RR}_{r,d}) = \sum_r \log(\mathrm{RR}_{r,d})\)
Approximate \(\log(\mathrm{RR}_{r,d}(x_r))\) as a piecewise-linear function of \(x_r\)
Approximate \(\exp(z)\) as a piecewise-linear function to recover \(\mathrm{RR}_d\)
Two-Stage SOS2 Interpolation¶
The implementation uses Special Ordered Sets of Type 2 (SOS2) constraints to represent piecewise-linear functions without introducing binary variables (when the solver supports SOS2).
Stage 1: Intake → log(RR)
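For each risk factor \(r\) and disease \(d\), precompute breakpoints \((x_k, \log\mathrm{RR}_{r,d}(x_k))\) from the GBD dose–response data. During optimisation, introduce SOS2 variables \(\lambda_k\) satisfying:
\[
\sum_k \lambda_k = 1, \qquad \lambda_k \geq 0, \qquad x_r = \sum_k \lambda_k x_k, \qquad \log\mathrm{RR}_{r,d}(x_r) \approx \sum_k \lambda_k \log\mathrm{RR}_{r,d}(x_k)
\]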
The SOS2 constraint ensures at most two adjacent \(\lambda_k\) are nonzero, yielding piecewise-linear interpolation.
Stage 2: Aggregated log(RR) → RR
Sum the log-RR contributions across risk factors: \(z_d = \sum_r \log\mathrm{RR}_{r,d}\). Then apply a second SOS2 interpolation using precomputed breakpoints \((z_m, \exp(z_m))\) to recover \(\mathrm{RR}_d\).
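The two stages can be traced outside a solver by computing the \(\lambda\) weights directly for a known intake. The sketch below does this in plain Python; it is not the model's constraint code. The red meat breakpoints and the range of the Stage 2 grid are hypothetical, and the helper names (`sos2_weights`, `interpolate`, `two_stage_rr`) are illustrative.

```python
import math
from bisect import bisect_right


def sos2_weights(x, knots):
    """Lambda weights for x: non-negative, summing to 1, at most two adjacent nonzero."""
    x = min(max(x, knots[0]), knots[-1])                  # clamp to the breakpoint range
    k = min(bisect_right(knots, x) - 1, len(knots) - 2)   # left end of the active segment
    t = (x - knots[k]) / (knots[k + 1] - knots[k])
    lam = [0.0] * len(knots)
    lam[k], lam[k + 1] = 1.0 - t, t
    return lam


def interpolate(x, knots, values):
    """Piecewise-linear value at x as the lambda-weighted sum of breakpoint values."""
    return sum(l * v for l, v in zip(sos2_weights(x, knots), values))


# Stage 1 breakpoints (intake in g/day, RR). Vegetable values follow the earlier
# example; the red meat values are hypothetical.
STAGE1 = {
    "vegetables": ([0.0, 100.0, 300.0], [1.00, 0.91, 0.80]),
    "red_meat": ([0.0, 50.0, 100.0], [1.00, 1.07, 1.15]),
}
# Stage 2 breakpoints (z, exp(z)) on an assumed range of aggregated log-RR values.
Z_KNOTS = [i / 10 for i in range(-5, 6)]


def two_stage_rr(intakes):
    """Stage 1: intake -> log(RR) per risk factor; Stage 2: summed log(RR) -> RR."""
    z = 0.0
    for risk, (xs, rrs) in STAGE1.items():
        z += interpolate(intakes[risk], xs, [math.log(v) for v in rrs])
    return interpolate(z, Z_KNOTS, [math.exp(zk) for zk in Z_KNOTS])


rr = two_stage_rr({"vegetables": 300.0, "red_meat": 100.0})
print(round(rr, 4))
```

With both intakes sitting exactly on Stage 1 breakpoints, the only approximation error comes from Stage 2. Because \(\exp\) is convex, its piecewise-linear interpolant lies on or above the true curve, so the result is slightly above the exact product \(0.80 \times 1.15 = 0.92\); a denser Stage 2 grid shrinks that gap.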
Health Clustering¶
Modelling health impacts for each country individually would create an intractable number of variables and constraints. Instead, countries are grouped into health clusters that share:
Similar geographic location
Similar GDP per capita (proxy for healthcare quality)
Roughly balanced population sizes
The clustering algorithm uses weighted K-means with iterative refinement. The
number of clusters is configured via health.region_clusters.
Health clusters grouping countries based on geographic proximity, GDP per capita similarity, and population balance.¶
Baseline diet-attributable chronic disease burden (years of life lost per 100,000 population) by health cluster, computed from Global Burden of Disease data. Clusters with higher burden tend to have diets with greater exposure to dietary risk factors such as low fruit and vegetable intake or high red meat consumption.¶
Solver Compatibility¶
The piecewise-linear interpolation uses solver-dependent formulations:
Gurobi: Native SOS2 constraint support. Uses λ (lambda) variables with SOS2 adjacency constraints for efficient piecewise-linear interpolation.
HiGHS: Uses the delta (incremental) formulation that requires no binary variables, keeping the problem as a pure LP.
Delta Formulation for HiGHS¶
Since HiGHS lacks native SOS2 support, the implementation uses an incremental formulation that avoids binary variables entirely. For n breakpoints \((x_0, x_1, \ldots, x_{n-1})\) with function values \((f_0, f_1, \ldots, f_{n-1})\):
Variables: δ_j ∈ [0,1] for j = 0, …, n-2 (one per segment)
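Constraints:
\[
x = x_0 + \sum_{j=0}^{n-2} \delta_j \, \Delta x_j, \qquad
f = f_0 + \sum_{j=0}^{n-2} \delta_j \, \Delta f_j, \qquad
\delta_{j+1} \leq \delta_j \quad (j = 0, \ldots, n-3)
\]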
where \(\Delta x_j = x_{j+1} - x_j\) and \(\Delta f_j = f_{j+1} - f_j\).
Why it works: The fill-up constraints ensure segments are “filled” from left to right. When the input x is fixed by an equality constraint (as in both Stage 1 and Stage 2), the δ values are uniquely determined without degeneracy.
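To see what a left-to-right filled solution looks like, the sketch below computes the \(\delta\) values for a given input directly, without a solver, and evaluates the interpolated function value from them. It assumes strictly increasing breakpoints; the function names are illustrative and the breakpoint values reuse the vegetable/CHD example from earlier in this chapter.

```python
def delta_fill(x, knots):
    """Segment fill levels delta_j in [0, 1], filled left to right for input x."""
    deltas = []
    for j in range(len(knots) - 1):
        dx = knots[j + 1] - knots[j]
        # A segment is empty left of its start, full right of its end.
        deltas.append(min(1.0, max(0.0, (x - knots[j]) / dx)))
    return deltas


def delta_interpolate(x, knots, values):
    """f = f_0 + sum_j delta_j * (f_{j+1} - f_j)."""
    deltas = delta_fill(x, knots)
    return values[0] + sum(d * (values[j + 1] - values[j]) for j, d in enumerate(deltas))


knots = [0.0, 100.0, 300.0]
values = [1.00, 0.91, 0.80]
print(delta_fill(200.0, knots))                 # first segment full, second half full
print(delta_interpolate(200.0, knots, values))
```

The returned fill levels are non-increasing in `j`, which is exactly the ordering that the fill-up constraints express.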
Comparison with lambda formulation:
| Aspect | Lambda + SOS2 | Delta (incremental) |
|---|---|---|
| Continuous variables | n (one per breakpoint) | n-1 (one per segment) |
| Binary variables | 0 (with native SOS2) | 0 |
| Additional constraints | Convexity (Σλ=1) + SOS2 | Fill-up ordering (n-2) |
| Problem type | LP (Gurobi), MIP (old HiGHS) | LP (all solvers) |
Data Flow Overview¶
Preprocessing (workflow/scripts/prepare_health_costs.py):
Cluster countries into health regions
Compute baseline YLL and RR for each cluster–cause pair
Build breakpoint tables for SOS2 interpolation
Output: risk_breakpoints.csv, cause_log_breakpoints.csv, cluster_cause_baseline.csv
Solver (workflow/scripts/solve_model.py):
Read breakpoint tables
Create SOS2 variables and constraints for each cluster–risk–cause combination
Construct health cost expressions and add to objective
Detailed Implementation¶
This section provides technical details for developers working with the health module.
Data Inputs¶
workflow/scripts/prepare_health_costs.py assembles the following datasets:
Baseline diet (processing/{name}/dietary_intake.csv): average daily intake by country and food item from the Global Dietary Database (GDD)
Relative risks (processing/{name}/health/relative_risks.csv): dose–response pairs for each (risk factor, cause) combination from GBD
Mortality rates (processing/{name}/health/gbd_mortality_rates.csv): cause-specific death rates by age, country and year
Population and life tables (processing/{name}/population_age.csv and processing/{name}/health/life_table.csv): age-structured population counts and remaining life expectancy schedules
Preparation Workflow¶
The preprocessing script performs these steps:
Health clustering – groups countries into health.region_clusters clusters using a multi-objective approach that balances:
  Geographic proximity (weight: health.clustering.weights.geography)
  GDP per capita similarity (weight: health.clustering.weights.gdp)
  Population balance (weight: health.clustering.weights.population)
  The cluster map is saved as processing/{name}/health/country_clusters.csv.
Baseline burden – combines mortality, population and life expectancy to compute years of life lost (YLL) per cluster. For each cause, it computes both total YLL and diet-attributable YLL using the population-attributable fraction. Results: processing/{name}/health/cluster_cause_baseline.csv.
TMREL derivation – finds the intake that minimises aggregate log(RR) for each risk factor. Results: processing/{name}/health/derived_tmrel.csv.
Risk-factor breakpoints – builds grids of intake values over the empirical RR data range, evaluating \(\log(\mathrm{RR})\) at each point. Results: processing/{name}/health/risk_breakpoints.csv.
Cause-level breakpoints – constructs breakpoints for the aggregated log-RR and its exponential. Results: processing/{name}/health/cause_log_breakpoints.csv.
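The risk-factor breakpoint step can be pictured as follows. This is a sketch only: it assumes an evenly spaced grid over the empirical intake range and linear interpolation of RR between data points before taking the logarithm, neither of which is confirmed by the description above. The function names and the tuple output are hypothetical and do not reflect the column layout of risk_breakpoints.csv.

```python
import math


def interp(x, xs, ys):
    """Linear interpolation of (xs, ys) at x; xs must be strictly increasing."""
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            t = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + t * (ys[k + 1] - ys[k])
    raise ValueError("x lies outside the empirical data range")


def build_risk_breakpoints(xs, rrs, n_points):
    """Return (intake, log RR) pairs on an even grid spanning the RR data range."""
    lo, hi = xs[0], xs[-1]
    rows = []
    for i in range(n_points):
        x = min(hi, lo + (hi - lo) * i / (n_points - 1))  # guard against rounding past hi
        rows.append((x, math.log(interp(x, xs, rrs))))
    return rows


# Vegetable/CHD points from the example earlier in this chapter; 5 knots for brevity.
for x, log_rr in build_risk_breakpoints([0.0, 100.0, 300.0], [1.00, 0.91, 0.80], 5):
    print(f"{x:6.1f}  {log_rr:+.4f}")
```

The argument `n_points` plays the role that health.intake_grid_points has in the configuration.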
From Diet to Risk Exposure¶
Per-capita intake
During optimisation, consumption is tracked using food group stores named
store_<group>_<ISO3>. For each health cluster \(c\) and risk factor
\(r\), the solver computes per-capita intake by summing store levels across
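countries in the cluster:
\[
x_{c,r} = \frac{10^{12}}{365 \, P_c} \sum_{i \in c} e_{i,r}
\]
where \(e_{i,r}\) is the store level for country \(i\) and food group \(r\) in Mt/year, and \(P_c\) is the cluster population. The factor \(10^{12}\) converts from megatonnes to grams, and dividing by 365 converts annual to daily quantities so that \(x_{c,r}\) is in g/day.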
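The unit conversion is easy to get wrong, so a small numeric check may help. The sketch assumes that intake is expressed in g/day per person, which implies a division by 365 days in addition to the \(10^{12}\) g/Mt factor; the country codes, store levels and population below are made up.

```python
GRAMS_PER_MT = 1e12    # 1 megatonne = 10^12 grams
DAYS_PER_YEAR = 365    # assumed annual-to-daily conversion


def per_capita_intake_g_per_day(store_levels_mt, population):
    """Cluster per-capita intake in g/day from per-country store levels in Mt/year."""
    if population <= 0:
        raise ValueError("population must be positive")
    return GRAMS_PER_MT * sum(store_levels_mt.values()) / (DAYS_PER_YEAR * population)


# Hypothetical cluster of two countries with 100 million people in total.
levels = {"AAA": 3.0, "BBB": 0.65}  # Mt/year of one food group
print(per_capita_intake_g_per_day(levels, population=100e6))
```

Here 3.65 Mt/year shared among 100 million people works out to about 100 g per person per day.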
Linearised relative risk curves
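For every (cluster, risk) pair with per-capita intake \(x_{c,r}\), SOS2 variables \(\lambda_k\) satisfy:
\[
\sum_k \lambda_k = 1, \qquad x_{c,r} = \sum_k \lambda_k x_k, \qquad \log\mathrm{RR}_{c,r,d} = \sum_k \lambda_k \log\mathrm{RR}_{r,d}(x_k)
\]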
Aggregating across risk factors
The combined effect on each disease is:
\[
\log\mathrm{RR}_{c,d} = \sum_r \log\mathrm{RR}_{c,r,d}
\]
Recovering total relative risk
A second SOS2 interpolation maps \(z = \log\mathrm{RR}_{c,d}\) back to \(\mathrm{RR}_{c,d} = \exp(z)\) using precomputed breakpoints.
Health cost expression
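The PyPSA store energy level encodes deviation from optimal:
\[
e_{c,d} = \mathrm{YLL}_{c,d} \cdot \frac{\mathrm{RR}_{c,d} - \mathrm{RR}_d^{\mathrm{ref}}}{\mathrm{RR}_{c,d}(x^{\mathrm{base}})}
\]
measured in million YLL. The monetary contribution is marginal_cost_storage × e.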
Configuration¶
health:
  enabled: true # Whether to include health costs in the objective function
  region_clusters: 30
  reference_year: 2018
  intake_grid_points: 15 # Number of grid knots over empirical RR range
  log_rr_points: 15
  ssb_sugar_g_per_100g: 5.7 # ≈50 kcal per 226.8 g sugar-sweetened beverage (SSB) implies ~5.7 g sugar per 100 g
  value_per_yll: 50000 # USD_2024 per year of life lost
  intake_cap_g_per_day: 1000 # Uniform generous cap on intake grids and clipping
  intake_age_min: 11 # GDD adult band starts at 11; set to 11 to retain adult intake data. Note however that GBD chronic disease risk factors are for adults of >=25 years.
  # Dietary risk factors to consider (must match GDD data items)
  risk_factors:
    - fruits
    - vegetables
    - nuts_seeds
    - legumes
    - red_meat
    - whole_grains
    # GBD also covers seafood omega-3 and processed meat risk factors,
    # but fish/seafood and processed meat are not modelled as food groups.
    # GBD has data on sugar-sweetened beverage intake as a risk factor,
    # from which we can in theory derive added sugar intake risk
    # factors. The epidemiological evidence for this is, however,
    # lacking, and so we don't count "sugar" as a risk factor.
    # - sugar
  # Health outcomes/causes to consider (must be present in IHME GBD data and relative risks)
  causes:
    - CHD # Coronary/Ischemic Heart Disease
    - Stroke # Stroke (all types)
    - T2DM # Type 2 Diabetes Mellitus
    - CRC # Colorectal Cancer
  # Mapping of risk factors to the causes they affect
  risk_cause_map:
    fruits: [CHD, Stroke, T2DM]
    vegetables: [CHD, Stroke]
    nuts_seeds: [CHD, T2DM]
    legumes: [CHD]
    red_meat: [CHD, Stroke, T2DM, CRC]
    whole_grains: [CHD, Stroke, T2DM, CRC]
    # sugar: [CHD, Stroke, T2DM, CRC]
  # Multi-objective clustering settings for grouping countries into health clusters
  clustering:
    gdp_reference_year: 2025 # Reference year for GDP per capita data
    weights:
      geography: 1.0 # Weight for geographic proximity
      gdp: 0.5 # Weight for GDP per capita similarity
      population: 0.3 # Weight for population balance across clusters
Key parameters:
region_clusters: Number of health clusters (more = finer resolution, slower)
intake_grid_points: Density of Stage 1 breakpoints
log_rr_points: Density of Stage 2 breakpoints
value_per_yll: Monetary value per year of life lost (USD)
risk_factors: Which dietary risk factors to model
risk_cause_map: Which causes each risk factor affects
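Because risk_factors, causes and risk_cause_map must agree with one another, a consistency check can catch typos before a long preprocessing run. The sketch below works on a plain dictionary mirroring the relevant keys of the health section; the function `check_health_config` is hypothetical and not part of food-opt.

```python
def check_health_config(cfg):
    """Return a list of human-readable inconsistencies (empty if none are found)."""
    problems = []
    for risk in cfg["risk_factors"]:
        if risk not in cfg["risk_cause_map"]:
            problems.append(f"risk factor '{risk}' has no entry in risk_cause_map")
    for risk, causes in cfg["risk_cause_map"].items():
        if risk not in cfg["risk_factors"]:
            problems.append(f"risk_cause_map entry '{risk}' is not in risk_factors")
        for cause in causes:
            if cause not in cfg["causes"]:
                problems.append(f"cause '{cause}' (for '{risk}') is not in causes")
    return problems


# Values copied from the configuration shown above.
HEALTH_CFG = {
    "risk_factors": ["fruits", "vegetables", "nuts_seeds", "legumes", "red_meat", "whole_grains"],
    "causes": ["CHD", "Stroke", "T2DM", "CRC"],
    "risk_cause_map": {
        "fruits": ["CHD", "Stroke", "T2DM"],
        "vegetables": ["CHD", "Stroke"],
        "nuts_seeds": ["CHD", "T2DM"],
        "legumes": ["CHD"],
        "red_meat": ["CHD", "Stroke", "T2DM", "CRC"],
        "whole_grains": ["CHD", "Stroke", "T2DM", "CRC"],
    },
}
print(check_health_config(HEALTH_CFG))
```

An empty list means the three settings are mutually consistent; each string in a non-empty list names one mismatch.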
Outputs¶
The preprocessing rule saves all intermediate products under
processing/{name}/health/:
country_clusters.csv: Cluster assignments
cluster_cause_baseline.csv: Baseline YLL and RR by cluster–cause
cluster_summary.csv: Cluster populations
risk_breakpoints.csv: Stage 1 breakpoint tables
cause_log_breakpoints.csv: Stage 2 breakpoint tables
derived_tmrel.csv: TMREL values derived from RR curves
Plotting rules create visualisations under results/{name}/plots/.
References¶
Brauer M, Roth GA, Aravkin AY, et al. Global Burden and Strength of Evidence for 88 Risk Factors in 204 Countries and 811 Subnational Locations, 1990–2021: A Systematic Analysis for the Global Burden of Disease Study 2021. The Lancet, 2024;403(10440):2162–203. https://doi.org/10.1016/S0140-6736(24)00933-4