Health Impacts¶
Overview¶
The health module converts dietary choices in the optimisation into monetised health impacts. It combines epidemiological evidence on diet–disease links with country-level baseline mortality and demographic data, and then represents that relationship inside the linear programme through carefully constructed piecewise-linear (SOS2) approximations. The objective therefore weighs production, environmental and health costs in a consistent monetary unit.
Key ideas:
Dietary risk factors from the Global Burden of Disease (GBD) study underpin the exposure–response curves.
Countries are grouped into health clusters to keep the optimisation tractable while preserving heterogeneity in baseline burden and valuation.
Relative risks multiply across risk factors, so we work in log space to turn the problem into additions that can be linearised.
Data Inputs¶
workflow/scripts/prepare_health_costs.py assembles the following datasets:
Baseline diet (
data/health/processed/diet_intake.csv): average daily intake by country and food item.Relative risks (
data/health/processed/relative_risks.csv): dose–response pairs for each (risk factor, disease cause) combination.Mortality rates (
data/health/processed/mortality.csv): cause-specific death rates by age, country and year.Population and life tables (
processing/{name}/population_age.csvandprocessing/{name}/life_table.csv): age-structured population counts and remaining life expectancy schedules.
Dietary Risk Factors¶
The model incorporates dietary risk factors as defined by the Global Burden of Disease (GBD) Study 2021 [Brauer2024]. These risk factors link dietary intake patterns to specific disease outcomes through dose-response relationships.
GBD Risk Factor Definitions¶
The following table reproduces the GBD 2021 dietary risk factor definitions from Brauer et al. (2024, Supplementary Appendix 1, p. 171). All intake quantities are expressed in terms of fresh (as consumed) weight unless otherwise specified. The optimal intake levels represent the theoretical minimum risk exposure level (TMREL) used in GBD burden calculations:
Risk Factor |
Definition of Exposure |
Optimal Level or Range |
|---|---|---|
Diet low in fruit |
Average daily consumption (in grams per day) of fruit including fresh, frozen, cooked, canned, or dried fruit, excluding fruit juices and salted or pickled fruits |
340–350 g/day |
Diet low in vegetables |
Average daily consumption (in grams per day) of vegetables, including fresh, frozen, cooked, canned, or dried vegetables and excluding legumes and salted or pickled vegetables, juices, nuts and seeds, and starchy vegetables such as potatoes or corn |
306–372 g/day |
Diet low in whole grains |
Average daily consumption (in grams per day) of whole grains (bran, germ, and endosperm in their natural proportion) from breakfast cereals, bread, rice, pasta, biscuits, muffins, tortillas, pancakes, and other sources |
160–210 g/day |
Diet low in nuts and seeds |
Average daily consumption (in grams per day) of nuts and seeds, including tree nuts and seeds and peanuts |
19–24 g/day |
Diet low in fibre |
Average daily consumption (in grams per day) of fibre from all sources including fruits, vegetables, grains, legumes, and pulses |
22–25 g/day |
Diet low in seafood omega-3 fatty acids |
Average daily consumption (in milligrams per day) of eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) |
470–660 mg/day |
Diet low in omega-6 polyunsaturated fatty acids |
Average daily consumption (in % daily energy) from omega-6 polyunsaturated fatty acids (PUFA) (specifically linoleic acid, γ-linolenic acid, eicosadienoic acid, dihomo-γ-linolenic acid, arachidonic acid) |
9–10% of total daily energy |
Diet low in calcium |
Average daily consumption (in grams per day) of calcium from all sources, including milk, yogurt, and cheese |
0.72–0.86 g/day (males), 1.1–1.2 g/day (females) |
Diet low in milk |
Average daily consumption (in grams per day) of dairy milk including non-fat, low-fat, and full-fat milk, but excluding plant-based milks, fermented milk products such as buttermilk, and other dairy products such as cheese |
280–340 g/day (males), 500–610 g/day (females) |
Diet low in legumes |
Average daily consumption (in grams per day) of legumes and pulses, including fresh, frozen, cooked, canned, or dried legumes |
100–110 g/day |
Diet high in red meat |
Average daily consumption (in grams per day) of unprocessed red meat including pork and bovine meats such as beef, pork, lamb, and goat, but excluding all processed meats, poultry, fish, and eggs |
0–200 g/day |
Diet high in processed meat |
Average daily consumption (in grams per day) of meat preserved by smoking, curing, salting, or addition of chemical preservatives |
0 g/day |
Diet high in sugar-sweetened beverages (SSBs) (refined sugar proxy) |
Average daily consumption (in grams per day) of beverages with ≥50 kcal per 226.8 gram serving, including carbonated beverages, sodas, energy drinks, and fruit drinks, but excluding 100% fruit and vegetable juices. Exposures are converted to refined sugar equivalents assuming 5.7 g sugar per 100 g beverage (consistent with the ≥50 kcal threshold). |
0 g/day (refined sugar equivalent) |
Diet high in trans fatty acids |
Average daily consumption (in percent daily energy) of trans fat from all sources, mainly partially hydrogenated vegetable oils and ruminant products |
0–1.1% of total daily energy |
Diet high in sodium |
Average 24-hour urinary sodium excretion (in grams per day) |
1–5 g/day |
Notes:
All intake quantities are in fresh (as consumed) weight, matching the GDD dietary intake data convention (see Current Diets)
GBD risk factors are evaluated for adult populations (≥25 years) - the implementation weights dietary intake over ages ≥
health.intake_age_min(25 by default) using age-specific populationsThe model currently implements a subset of these risk factors based on data availability and model scope
SSB risk-factor exposures are converted to refined sugar equivalents using
health.ssb_sugar_g_per_100g; added-sugar intake from GDD (variablev35) is aggregated into the same refined-sugar risk factor.All risk factors use a generous flat tail at
health.intake_cap_g_per_day(1 000 g/person/day by default) so equality constraints can exceed observed data without driving the SOS2 outside its domain.Risk factor definitions specify both the intake measure (e.g., grams per day) and the threshold or optimal range
“Diet low in” risk factors specify minimum recommended intakes; “diet high in” risk factors treat any intake as risk-increasing
Milk/dairy measurements use milk equivalents, where cheese and yogurt are converted to their milk equivalent weight
See Current Diets for detailed mapping between GDD dietary intake data and these risk factors
Preparation Workflow¶
The preprocessing script performs these steps:
Health clustering – groups countries into
health.region_clustersclusters using a multi-objective approach that balances three criteria:Geographic proximity: countries close together tend to cluster together (weight:
health.clustering.weights.geography)GDP per capita similarity: countries with similar economic development levels cluster together (weight:
health.clustering.weights.gdp)Population balance: clusters are refined to have roughly equal total populations (weight:
health.clustering.weights.population)
The algorithm projects country geometries to an equal-area CRS, computes centroids, and runs weighted K-means on a feature matrix combining geography and log-transformed GDP per capita. An iterative refinement step then reassigns boundary countries from over-populated to under-populated clusters until the population coefficient of variation reaches an acceptable level. The cluster map is saved as
processing/{name}/health/country_clusters.csv.
Health clusters grouping countries based on geographic proximity, GDP per capita similarity, and population balance.¶
Baseline burden – combines mortality, population and life expectancy to compute years of life lost (YLL) per country and aggregates them to the health clusters. For each cause, it also computes a diet-attributable YLL using the population-attributable fraction \(\\text{PAF} = 1 - 1/\\mathrm{RR}\) derived from baseline intakes and the RR curves. The results go into
processing/{name}/health/cluster_cause_baseline.csvandprocessing/{name}/health/cluster_summary.csv.Record cluster totals – store each cluster’s population for scaling; the solver multiplies baseline YLLs by the configured
health.value_per_yllconstant (no external valuation dataset required).Risk-factor breakpoints – builds grids of intake values by taking evenly spaced knots (
health.intake_grid_points) over the empirical RR data range, then adding observed exposures, TMREL, baseline intakes, and the generous caphealth.intake_cap_g_per_day. It evaluates \(\log(RR)\) for every (risk, cause) pair. These tables are written toprocessing/{name}/health/risk_breakpoints.csv.Cause-level breakpoints – as the optimisation needs to recover \(RR = \exp(\sum_r \log RR_{r})\), the script also constructs breakpoints for the aggregated log-relative-risk and its exponential. Stored as
processing/{name}/health/cause_log_breakpoints.csv.
The generated tables drive the linearisation in
workflow/scripts/solve_model.py.
From Diet to Risk Exposure¶
Per-capita intake¶
During optimisation, consumption flows are tracked on links named
consume_<food>_<ISO3>. For each health cluster \(c\) and risk factor
\(r\), the solver forms a per-capita intake by combining these flows with
shares from workflow/scripts/health_food_mapping.py:
where
\(q_{c,f}\) is the aggregated flow in million tonnes per year for food \(f\) consumed by cluster \(c\);
\(\alpha_{f,r}\) is the share of food \(f\) attributed to risk factor \(r\) (currently 1.0 or 0.0);
\(P_c\) is the population represented by the cluster (baseline or updated planning population);
the constant rescales from Mt/year to g/day.
Linearised relative risk curves¶
Each risk factor \(r\) affects a subset of causes \(g\). The data from
risk_breakpoints.csv provides intake breakpoints
\(x_0, \ldots, x_K\) and the corresponding
\(\log RR_{r,g}(x_k)\) values. For every (cluster, risk) pair we introduce
SOS2 “lambda” variables \(\lambda_k\) that satisfy
and approximate the log-relative-risk as
SOS2 constraints keep only two adjacent \(\lambda_k\) active, yielding a piecewise-linear interpolation without binary decision variables when the solver supports SOS2. When HiGHS is used, the implementation falls back to a compact binary formulation.
Aggregating across risk factors¶
Epidemiological evidence models the combined effect of multiple risk factors on one cause as multiplicative:
Taking logarithms converts this to a sum that remains compatible with linear programming:
The solver accumulates the contributions from each risk factor into
log_rr_totals for every cluster–cause pair.
Recovering total relative risk¶
The optimisation needs \(RR_{c,g}\) again to price health damages. The
preprocessed cause_log_breakpoints.csv supplies points
\((z_m, \exp(z_m))\) that cover the feasible range of
\(z = \log RR_{c,g}\). A second SOS2 interpolation enforces
with \(\sum_m \theta_m = 1\). This gives a consistent linearised mapping from the aggregated log-relative-risk back to the multiplicative relative risk.
Monetising years of life lost¶
For each cluster–cause pair the preprocessing step stores \(\mathrm{YLL}^{\mathrm{base}}_{c,g}\) (baseline years of life lost). The solver also records the reference log-relative-risk \(z^{\mathrm{ref}}_{c,g}\) (from TMREL intake levels) and its exponential \(RR^{\mathrm{ref}}_{c,g}\). The contribution to the objective is constructed as
where \(V\) is the value per year of life lost (configured in
health.value_per_yll as USD_2024 per YLL), and
\(RR^{\mathrm{ref}}_{c,g}\) is the relative risk when all risk
factors are at their TMREL levels. This formulation ensures the health
cost is exactly zero when intake is at TMREL.
Since TMREL represents the theoretical minimum risk level, relative risk curves reach their minimum at TMREL intake. Therefore \(RR_{c,g} \geq RR^{\mathrm{ref}}_{c,g}\) always, and health costs are non-negative. PyPSA store energy levels directly encode the deviation from optimal:
measured in million years of life lost relative to TMREL baseline. The
monetary contribution is marginal_cost_storage × e.
Interpreting Results¶
The optimization objective includes health costs measured relative to TMREL:
Zero health cost: All dietary intakes are at TMREL levels (optimal, minimum risk)
Positive health cost: Intake deviates from TMREL, increasing disease burden relative to optimal
Objective Contribution¶
workflow/scripts/solve_model.py adds the summed cost over all clusters and
causes to the PyPSA objective. If the solver exposes SOS2 constraints, the
implementation keeps the formulation linear without integer variables; for
HiGHS a tight binary fallback is activated.
Configuration Highlights¶
health:
enabled: true # Whether to include health costs in the objective function
region_clusters: 30
reference_year: 2018
intake_grid_points: 100 # Number of grid knots over empirical RR range
log_rr_points: 100
omega3_per_100g_fish: 1.5
ssb_sugar_g_per_100g: 5.7 # ≈50 kcal per 226.8 g sugar-sweetened beverage (SSB) implies ~5.7 g sugar per 100 g
value_per_yll: 50000 # USD_2024 per year of life lost
intake_cap_g_per_day: 1000 # Uniform generous cap on intake grids and clipping
intake_age_min: 11 # GDD adult band starts at 11; set to 11 to retain adult intake data. Note however that GDB chronic disease risk factors are for adults of >=25 years.
# Dietary risk factors to consider (must match GDD data items)
risk_factors:
- fruits
- vegetables
- nuts_seeds
- legumes
- fish
- red_meat
- prc_meat
- whole_grains
# - sugar # Has a funky relative risk curve
# Health outcomes/causes to consider (must be present in IHME GBD data and relative risks)
causes:
- CHD # Coronary/Ischemic Heart Disease
- Stroke # Stroke (all types)
- T2DM # Type 2 Diabetes Mellitus
- CRC # Colorectal Cancer
# Mapping of risk factors to the causes they affect
risk_cause_map:
fruits: [CHD, Stroke, T2DM]
vegetables: [CHD, Stroke]
nuts_seeds: [CHD, CRC, T2DM]
legumes: [CHD]
fish: [CHD]
red_meat: [CHD, Stroke, T2DM, CRC]
prc_meat: [CHD, T2DM, CRC]
whole_grains: [CHD, Stroke, T2DM, CRC]
# sugar: [CHD, Stroke, T2DM, CRC]
# Theoretical minimum risk exposure levels (TMREL) from GBD Study 2021
# Source: Brauer et al. (2024), Global Burden of Disease Study 2021
# Values represent optimal intake levels where health risk is minimized
# Reference: https://doi.org/10.1016/S0140-6736(24)00933-4
tmrel_g_per_day:
fruits: 345 # TMREL: 340-350 g/day (midpoint)
vegetables: 339 # TMREL: 306-372 g/day (midpoint)
whole_grains: 185 # TMREL: 160-210 g/day (midpoint)
nuts_seeds: 21.5 # TMREL: 19-24 g/day (midpoint)
legumes: 105 # TMREL: 100-110 g/day (midpoint)
fish: 37.7 # TMREL: 470-660 mg/day omega-3 (midpoint 565 mg, converted using omega3_per_100g_fish)
red_meat: 0 # TMREL: 0-200 g/day (using conservative lower bound)
prc_meat: 0 # TMREL: 0 g/day (any intake increases risk)
sugar: 0 # TMREL: 0 g/day (any refined sugar intake increases risk)
# Multi-objective clustering settings for grouping countries into health clusters
clustering:
gdp_reference_year: 2025 # Reference year for GDP per capita data
weights:
geography: 1.0 # Weight for geographic proximity
gdp: 0.5 # Weight for GDP per capita similarity
population: 0.3 # Weight for population balance across clusters
Lowering region_clusters or log_rr_points eases the optimisation at the
cost of coarser health resolution. health.intake_grid_points controls the
density of the first-stage interpolation grid; smaller values give smoother
curves but produce larger tables.
Outputs¶
The preprocessing rule saves all intermediate products under
processing/{name}/health/. Downstream plotting rules also create quick-look
maps (results/{name}/plots/health_*.pdf) and CSV summaries to compare
baseline versus optimised health outcomes.
References¶
Brauer M, Roth GA, Aravkin AY, et al. Global Burden and Strength of Evidence for 88 Risk Factors in 204 Countries and 811 Subnational Locations, 1990–2021: A Systematic Analysis for the Global Burden of Disease Study 2021. The Lancet, 2024;403(10440):2162–203. https://doi.org/10.1016/S0140-6736(24)00933-4