Health Impacts
Overview
The health module converts dietary choices in the optimisation into monetised health impacts. It combines epidemiological evidence on diet–disease links with country-level baseline mortality and demographic data, and then represents that relationship inside the linear programme through piecewise-linear approximations built on special ordered sets of type 2 (SOS2). The objective therefore weighs production, environmental and health costs in a consistent monetary unit.
Key ideas:
- Dietary risk factors from the Global Burden of Disease (GBD) study underpin the exposure–response curves.
- Countries are grouped into health clusters to keep the optimisation tractable while preserving heterogeneity in baseline burden and valuation.
- Relative risks multiply across risk factors, so we work in log space to turn the problem into additions that can be linearised.
Data Inputs
`workflow/scripts/prepare_health_costs.py` assembles the following datasets:
- Baseline diet (`data/health/processed/diet_intake.csv`): average daily intake by country and food item.
- Relative risks (`data/health/processed/relative_risks.csv`): dose–response pairs for each (risk factor, disease cause) combination.
- Mortality rates (`data/health/processed/mortality.csv`): cause-specific death rates by age, country and year.
- Population and life tables (`processing/{name}/population_age.csv` and `processing/{name}/life_table.csv`): age-structured population counts and remaining life expectancy schedules.
Dietary Risk Factors
The model incorporates dietary risk factors as defined by the Global Burden of Disease (GBD) Study 2021 [Brauer2024]. These risk factors link dietary intake patterns to specific disease outcomes through dose-response relationships.
GBD Risk Factor Definitions
The following table reproduces the GBD 2021 dietary risk factor definitions from Brauer et al. (2024, Supplementary Appendix 1, p. 171). All intake quantities are expressed in terms of fresh (as consumed) weight unless otherwise specified. The optimal intake levels represent the theoretical minimum risk exposure level (TMREL) used in GBD burden calculations:
Risk Factor | Definition of Exposure | Optimal Level or Range |
---|---|---|
Diet low in fruit | Average daily consumption (in grams per day) of fruit including fresh, frozen, cooked, canned, or dried fruit, excluding fruit juices and salted or pickled fruits | 340–350 g/day |
Diet low in vegetables | Average daily consumption (in grams per day) of vegetables, including fresh, frozen, cooked, canned, or dried vegetables and excluding legumes and salted or pickled vegetables, juices, nuts and seeds, and starchy vegetables such as potatoes or corn | 306–372 g/day |
Diet low in whole grains | Average daily consumption (in grams per day) of whole grains (bran, germ, and endosperm in their natural proportion) from breakfast cereals, bread, rice, pasta, biscuits, muffins, tortillas, pancakes, and other sources | 160–210 g/day |
Diet low in nuts and seeds | Average daily consumption (in grams per day) of nuts and seeds, including tree nuts and seeds and peanuts | 19–24 g/day |
Diet low in fibre | Average daily consumption (in grams per day) of fibre from all sources including fruits, vegetables, grains, legumes, and pulses | 22–25 g/day |
Diet low in seafood omega-3 fatty acids | Average daily consumption (in milligrams per day) of eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) | 470–660 mg/day |
Diet low in omega-6 polyunsaturated fatty acids | Average daily consumption (in % daily energy) from omega-6 polyunsaturated fatty acids (PUFA) (specifically linoleic acid, γ-linolenic acid, eicosadienoic acid, dihomo-γ-linolenic acid, arachidonic acid) | 9–10% of total daily energy |
Diet low in calcium | Average daily consumption (in grams per day) of calcium from all sources, including milk, yogurt, and cheese | 0.72–0.86 g/day (males), 1.1–1.2 g/day (females) |
Diet low in milk | Average daily consumption (in grams per day) of dairy milk including non-fat, low-fat, and full-fat milk, but excluding plant-based milks, fermented milk products such as buttermilk, and other dairy products such as cheese | 280–340 g/day (males), 500–610 g/day (females) |
Diet low in legumes | Average daily consumption (in grams per day) of legumes and pulses, including fresh, frozen, cooked, canned, or dried legumes | 100–110 g/day |
Diet high in red meat | Average daily consumption (in grams per day) of unprocessed red meat including pork and bovine meats such as beef, pork, lamb, and goat, but excluding all processed meats, poultry, fish, and eggs | 0–200 g/day |
Diet high in processed meat | Average daily consumption (in grams per day) of meat preserved by smoking, curing, salting, or addition of chemical preservatives | 0 g/day |
Diet high in sugar-sweetened beverages (SSBs) | Average daily consumption (in grams per day) of beverages with ≥50 kcal per 226.8 gram serving, including carbonated beverages, sodas, energy drinks, and fruit drinks, but excluding 100% fruit and vegetable juices | 0 g/day |
Diet high in trans fatty acids | Average daily consumption (in percent daily energy) of trans fat from all sources, mainly partially hydrogenated vegetable oils and ruminant products | 0–1.1% of total daily energy |
Diet high in sodium | Average 24-hour urinary sodium excretion (in grams per day) | 1–5 g/day |
Notes:
- All intake quantities are in fresh (as consumed) weight, matching the GDD dietary intake data convention (see Current Diets).
- GBD risk factors are evaluated for adult populations (≥25 years). The current implementation uses population-weighted “All ages” dietary intake averages, which may underestimate risk for adult-only populations.
- The model currently implements a subset of these risk factors based on data availability and model scope.
- Risk factor definitions specify both the intake measure (e.g., grams per day) and the threshold or optimal range.
- “Diet low in” risk factors specify minimum recommended intakes; “diet high in” risk factors treat intake above the optimal level as risk-increasing. Where the optimal level is 0 (processed meat, sugar-sweetened beverages), any intake counts as risk-increasing.
- Milk/dairy measurements use milk equivalents, where cheese and yogurt are converted to their milk equivalent weight.
- See Current Diets for the detailed mapping between GDD dietary intake data and these risk factors.
Preparation Workflow
The preprocessing script performs these steps:
1. Health clustering: dissolves country geometries, computes equal-area centroids and runs K-means to assign each country to one of `health.region_clusters` clusters. The cluster map is saved as `processing/{name}/health/country_clusters.csv`.
2. Baseline burden: combines mortality, population and life expectancy to compute years of life lost (YLL) per country and aggregates them to the health clusters. The results go into `processing/{name}/health/cluster_cause_baseline.csv` and `processing/{name}/health/cluster_summary.csv`.
3. Record cluster totals: stores each cluster’s population for scaling; the solver multiplies baseline YLLs by the configured `health.value_per_yll` constant (no external valuation dataset required).
4. Risk-factor breakpoints: builds dense grids of intake values (spaced by the configured `health.intake_grid_step` and including observed exposures) and evaluates \(\log(RR)\) for every (risk, cause) pair. These tables are written to `processing/{name}/health/risk_breakpoints.csv`.
5. Cause-level breakpoints: because the optimisation needs to recover \(RR = \exp(\sum_r \log RR_{r})\), the script also constructs breakpoints for the aggregated log-relative-risk and its exponential. These are stored as `processing/{name}/health/cause_log_breakpoints.csv`.

The generated tables drive the linearisation in `workflow/scripts/solve_model.py`.
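The risk-factor breakpoint step can be illustrated with a minimal sketch. It assumes that \(\log(RR)\) is interpolated linearly between the published dose–response pairs and held flat beyond them; the function names, the `max_intake` argument and the dose–response numbers are hypothetical and are not taken from `prepare_health_costs.py` or from GBD.

```python
import numpy as np


def build_intake_grid(step, max_intake, observed=()):
    """Regular grid from 0 to max_intake, merged with observed exposures."""
    regular = np.arange(0.0, max_intake + step, step)
    regular = regular[regular <= max_intake]
    extra = np.asarray(observed, dtype=float)
    # np.unique sorts the values and removes duplicates.
    return np.unique(np.concatenate([regular, [max_intake], extra]))


def log_rr_on_grid(grid, dose, rr):
    """Evaluate log(RR) on the grid (assumed: linear in log space, flat outside)."""
    return np.interp(grid, dose, np.log(rr))


# Illustrative dose-response pairs for one (risk, cause) pair, NOT GBD values.
dose = np.array([0.0, 100.0, 200.0, 345.0])   # g/day
rr = np.array([1.20, 1.12, 1.05, 1.00])       # relative risk at each dose

grid = build_intake_grid(step=20, max_intake=400, observed=[137.0])
log_rr = log_rr_on_grid(grid, dose, rr)
```

Adding observed exposures to the grid means the baseline diet sits exactly on a breakpoint, so the interpolation introduces no error at the reference point.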
From Diet to Risk Exposure
Per-capita intake
During optimisation, consumption flows are tracked on links named `consume_<food>_<ISO3>`. For each health cluster \(c\) and risk factor \(r\), the solver forms a per-capita intake \(I_{c,r}\) by combining these flows with shares from `workflow/scripts/health_food_mapping.py`:

\[
I_{c,r} = \frac{10^{12}}{365\,P_c} \sum_{f} \alpha_{f,r}\, q_{c,f}
\]

where
- \(q_{c,f}\) is the aggregated flow in million tonnes per year for food \(f\) consumed by cluster \(c\);
- \(\alpha_{f,r}\) is the share of food \(f\) attributed to risk factor \(r\) (currently 1.0 or 0.0);
- \(P_c\) is the population represented by the cluster (baseline or updated planning population);
- the constant \(10^{12}/365\) rescales from Mt/year to g/day (\(10^{12}\) g per Mt, 365 days per year).
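The unit conversion described above can be checked with a small sketch. The function name, the food names and the flow values are hypothetical; the sketch assumes 365 days per year and a population given in persons.

```python
GRAMS_PER_MT = 1e12      # 1 Mt = 1e6 t = 1e12 g
DAYS_PER_YEAR = 365      # assumption: non-leap year


def per_capita_intake_g_per_day(flows_mt, shares, population):
    """Convert cluster-level flows (Mt/year) into per-capita intake (g/day).

    flows_mt   -- mapping food -> consumption flow in Mt/year
    shares     -- mapping food -> share attributed to the risk factor (0.0 or 1.0)
    population -- number of people represented by the cluster
    """
    total_mt = sum(shares.get(food, 0.0) * q for food, q in flows_mt.items())
    return total_mt * GRAMS_PER_MT / (DAYS_PER_YEAR * population)


# Hypothetical cluster of 50 million people.
flows = {"apple": 3.0, "banana": 0.65, "beef": 1.2}
fruit_shares = {"apple": 1.0, "banana": 1.0}

fruit_intake = per_capita_intake_g_per_day(flows, fruit_shares, 50e6)
print(f"{fruit_intake:.1f} g/day")
```

Here 3.65 Mt/year of fruit spread over 50 million people comes to about 200 g per person per day; beef is ignored because its share for the fruit risk factor is zero.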
Linearised relative risk curves
Each risk factor \(r\) affects a subset of causes \(g\). The data from `risk_breakpoints.csv` provides intake breakpoints \(x_0, \ldots, x_K\) and the corresponding \(\log RR_{r,g}(x_k)\) values. For every (cluster, risk) pair we introduce SOS2 “lambda” variables \(\lambda_k\) that satisfy

\[
\sum_{k=0}^{K} \lambda_k = 1, \qquad \lambda_k \ge 0, \qquad \sum_{k=0}^{K} \lambda_k\, x_k = I_{c,r},
\]

where \(I_{c,r}\) is the per-capita intake of cluster \(c\) for risk factor \(r\), and approximate the log-relative-risk as

\[
\log RR_{c,r,g} \approx \sum_{k=0}^{K} \lambda_k \log RR_{r,g}(x_k).
\]

SOS2 constraints keep only two adjacent \(\lambda_k\) active, yielding a piecewise-linear interpolation without binary decision variables when the solver supports SOS2. When HiGHS is used, the implementation falls back to a compact binary formulation.
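To make the role of the lambda variables concrete, the sketch below computes the weights that an SOS2-feasible solution would take for a fixed intake. In the optimisation the solver chooses these weights as decision variables; here they are computed directly, and the helper name and breakpoint values are hypothetical.

```python
import numpy as np


def sos2_weights(x, breakpoints):
    """Weights that are nonnegative, sum to 1, and have at most two adjacent nonzeros."""
    bp = np.asarray(breakpoints, dtype=float)
    if not (bp[0] <= x <= bp[-1]):
        raise ValueError("x lies outside the breakpoint range")
    # Index of the segment [bp[k], bp[k+1]] that contains x.
    k = int(np.searchsorted(bp, x, side="right")) - 1
    k = min(k, len(bp) - 2)
    t = (x - bp[k]) / (bp[k + 1] - bp[k])
    lam = np.zeros(len(bp))
    lam[k] = 1.0 - t
    lam[k + 1] = t
    return lam


# Illustrative breakpoints, NOT GBD values.
breakpoints = np.array([0.0, 100.0, 200.0, 345.0])     # intake in g/day
log_rr = np.log(np.array([1.20, 1.12, 1.05, 1.00]))    # log(RR) at each breakpoint

lam = sos2_weights(150.0, breakpoints)
log_rr_approx = float(lam @ log_rr)
```

For an intake of 150 g/day only the weights at 100 and 200 g/day are nonzero (0.5 each), and the weighted sum of the breakpoint values is the piecewise-linear estimate of the log-relative-risk.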
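Aggregating across risk factors
Epidemiological evidence models the combined effect of multiple risk factors on one cause as multiplicative. Writing \(RR_{c,r,g}\) for the relative risk of cause \(g\) in cluster \(c\) due to risk factor \(r\):

\[
RR_{c,g} = \prod_{r} RR_{c,r,g}
\]

Taking logarithms converts this to a sum that remains compatible with linear programming:

\[
\log RR_{c,g} = \sum_{r} \log RR_{c,r,g}
\]

The solver accumulates the contributions from each risk factor into `log_rr_totals` for every cluster–cause pair.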
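A short numerical check shows why working in log space is equivalent to multiplying relative risks. The relative-risk values are illustrative only.

```python
import math

# Illustrative relative risks for one cluster-cause pair, NOT GBD values.
rr_by_risk = {"fruits": 1.08, "vegetables": 1.05, "red_meat": 1.12}

# Multiplicative combination of the individual relative risks.
rr_product = math.prod(rr_by_risk.values())

# Additive combination in log space, as accumulated by the solver.
log_rr_total = sum(math.log(rr) for rr in rr_by_risk.values())

# Both routes give the same combined relative risk.
print(rr_product, math.exp(log_rr_total))
```

The sum of logs is linear in the per-risk terms, which is what allows each term to be replaced by its piecewise-linear approximation.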
Recovering total relative risk
The optimisation needs \(RR_{c,g}\) again to price health damages. The preprocessed `cause_log_breakpoints.csv` supplies points \((z_m, \exp(z_m))\) that cover the feasible range of \(z = \log RR_{c,g}\). A second SOS2 interpolation enforces

\[
\log RR_{c,g} = \sum_{m} \theta_m\, z_m, \qquad RR_{c,g} = \sum_{m} \theta_m \exp(z_m),
\]

with \(\sum_m \theta_m = 1\). This gives a consistent linearised mapping from the aggregated log-relative-risk back to the multiplicative relative risk.
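The accuracy of this second interpolation can be gauged with a sketch. It assumes evenly spaced breakpoints over a hypothetical range of \(z\); the spacing and range used by the preprocessing script may differ. Ten points are used to echo `log_rr_points: 10` in the sample configuration.

```python
import numpy as np


def exp_breakpoints(z_min, z_max, n_points):
    """Evenly spaced z values and their exponentials (the spacing is an assumption)."""
    z = np.linspace(z_min, z_max, n_points)
    return z, np.exp(z)


z_m, rr_m = exp_breakpoints(-0.5, 0.5, 10)

z = 0.123                                   # some aggregated log-relative-risk
rr_approx = float(np.interp(z, z_m, rr_m))  # value on the piecewise-linear curve
rr_exact = float(np.exp(z))
rel_error = (rr_approx - rr_exact) / rr_exact
```

Because the exponential is convex, the chords between breakpoints lie on or above the curve, so the interpolated relative risk never understates the exact value; adding points shrinks the gap.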
Monetising years of life lost
For each cluster–cause pair the preprocessing step stores \(\mathrm{YLL}^{\mathrm{base}}_{c,g}\) (baseline years of life lost). The solver also records the reference log-relative-risk \(z^{\mathrm{ref}}_{c,g}\) (from baseline diets) and its exponential \(RR^{\mathrm{ref}}_{c,g}\). The contribution to the objective is constructed as

\[
\mathrm{Cost}_{c,g} = V\, \mathrm{YLL}^{\mathrm{base}}_{c,g}\, \frac{RR_{c,g}}{RR^{\mathrm{ref}}_{c,g}},
\]

where \(V\) is the monetary value per year of life lost (`health.value_per_yll`). A constant term subtracts \(V\,\mathrm{YLL}^{\mathrm{base}}_{c,g}\) so that the baseline diet has zero health cost and only improvements or deteriorations relative to the reference affect the optimisation.
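The zero-cost property of the baseline diet can be illustrated with a small sketch. It assumes that years of life lost scale in proportion to the ratio of the optimised to the reference relative risk; the function name and the numbers are hypothetical.

```python
def health_cost(value_per_yll, yll_base, rr, rr_ref):
    """Monetised health cost relative to the baseline diet.

    Assumes YLLs scale with rr / rr_ref; the constant term removes the
    baseline burden so that rr == rr_ref yields zero cost.
    """
    gross = value_per_yll * yll_base * rr / rr_ref
    baseline_constant = value_per_yll * yll_base
    return gross - baseline_constant


V = 150_000          # USD per YLL, as in the sample configuration
YLL_BASE = 1_000.0   # hypothetical baseline YLLs for one cluster-cause pair
RR_REF = 1.25        # hypothetical reference relative risk

cost_at_baseline = health_cost(V, YLL_BASE, rr=1.25, rr_ref=RR_REF)
cost_improved = health_cost(V, YLL_BASE, rr=1.125, rr_ref=RR_REF)
```

With these numbers, lowering the relative risk from 1.25 to 1.125 cuts YLLs by 10% and yields a negative cost (a benefit) of 15 million USD, while the baseline diet costs nothing.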
Objective Contribution
`workflow/scripts/solve_model.py` adds the summed cost over all clusters and causes to the PyPSA objective. If the solver exposes SOS2 constraints, the implementation keeps the formulation linear without integer variables; for HiGHS a tight binary fallback is activated. The script also records the constant baseline adjustment in `network.meta["objective_constant_terms"]["health"]` to help interpret objective values ex post.
Configuration Highlights
```yaml
health:
  region_clusters: 30
  reference_year: 2018
  intake_grid_step: 20  # Intake resolution in g/person/day
  log_rr_points: 10
  omega3_per_100g_fish: 1.5
  value_per_yll: 150000  # USD per year of life lost
  # Dietary risk factors to consider (must match GDD data items)
  risk_factors:
    - fruits
    - vegetables
    - nuts_seeds
    - legumes
    - fish
    - red_meat
    - prc_meat
    - whole_grains
  # Health outcomes/causes to consider (must be present in IHME GBD data and relative risks)
  causes:
    - CHD     # Coronary/Ischemic Heart Disease
    - Stroke  # Stroke (all types)
    - T2DM    # Type 2 Diabetes Mellitus
    - CRC     # Colorectal Cancer
  # Theoretical minimum risk exposure levels (TMREL) from GBD Study 2021
  # Source: Brauer et al. (2024), Global Burden of Disease Study 2021
  # Values represent optimal intake levels where health risk is minimized
  # Reference: https://doi.org/10.1016/S0140-6736(24)00933-4
  tmrel_g_per_day:
    fruits: 345        # TMREL: 340-350 g/day (midpoint)
    vegetables: 339    # TMREL: 306-372 g/day (midpoint)
    whole_grains: 185  # TMREL: 160-210 g/day (midpoint)
    nuts_seeds: 21.5   # TMREL: 19-24 g/day (midpoint)
    legumes: 105       # TMREL: 100-110 g/day (midpoint)
    fish: 37.7         # TMREL: 470-660 mg/day omega-3 (midpoint 565 mg, converted using omega3_per_100g_fish)
    red_meat: 0        # TMREL: 0-200 g/day (using conservative lower bound)
    prc_meat: 0        # TMREL: 0 g/day (any intake increases risk)
```
Lowering `region_clusters` or `log_rr_points` eases the optimisation at the cost of coarser health resolution. `health.intake_grid_step` controls the density of the first-stage interpolation grid; smaller values give smoother curves but produce larger tables.
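The `fish` entry in `tmrel_g_per_day` follows from the omega-3 TMREL by simple arithmetic, sketched below. The sketch assumes that `omega3_per_100g_fish` is expressed in grams of omega-3 per 100 g of fish, which is consistent with the configured value of 37.7; the function name is hypothetical.

```python
def fish_tmrel_g_per_day(omega3_mg_per_day, omega3_g_per_100g_fish):
    """Fish intake (g/day) that supplies the given omega-3 intake (mg/day)."""
    # Assumed unit: grams of omega-3 per 100 g of fish -> mg per g of fish.
    omega3_mg_per_g_fish = omega3_g_per_100g_fish * 1000.0 / 100.0
    return omega3_mg_per_day / omega3_mg_per_g_fish


midpoint_mg = (470 + 660) / 2                  # 565 mg/day, midpoint of the TMREL range
fish_g = fish_tmrel_g_per_day(midpoint_mg, 1.5)
print(round(fish_g, 1))                        # 37.7
```

Changing `omega3_per_100g_fish` therefore calls for a matching update of the `fish` TMREL if the two settings are to stay consistent.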
Outputs
The preprocessing rule saves all intermediate products under `processing/{name}/health/`. Downstream plotting rules also create quick-look maps (`results/{name}/plots/health_*.pdf`) and CSV summaries to compare baseline versus optimised health outcomes.
References
Brauer M, Roth GA, Aravkin AY, et al. Global Burden and Strength of Evidence for 88 Risk Factors in 204 Countries and 811 Subnational Locations, 1990–2021: A Systematic Analysis for the Global Burden of Disease Study 2021. The Lancet, 2024;403(10440):2162–203. https://doi.org/10.1016/S0140-6736(24)00933-4