Health Impacts¶

Overview¶

The health module converts dietary choices in the optimisation into monetised health impacts. It combines epidemiological evidence on diet–disease links with country-level baseline mortality and demographic data, and then represents that relationship inside the linear programme through carefully constructed piecewise-linear (SOS2) approximations. The objective therefore weighs production, environmental and health costs in a consistent monetary unit.

Key ideas:

Dietary risk factors from the Global Burden of Disease (GBD) study underpin the exposure–response curves.
Countries are grouped into health clusters to keep the optimisation tractable while preserving heterogeneity in baseline burden and valuation.
Relative risks multiply across risk factors, so we work in log space to turn the problem into additions that can be linearised.

Data Inputs¶

workflow/scripts/prepare_health_costs.py assembles the following datasets:

Baseline diet (data/health/processed/diet_intake.csv): average daily intake by country and food item.
Relative risks (data/health/processed/relative_risks.csv): dose–response pairs for each (risk factor, disease cause) combination.
Mortality rates (data/health/processed/mortality.csv): cause-specific death rates by age, country and year.
Population and life tables (processing/{name}/population_age.csv and processing/{name}/life_table.csv): age-structured population counts and remaining life expectancy schedules.

Dietary Risk Factors¶

The model incorporates dietary risk factors as defined by the Global Burden of Disease (GBD) Study 2021 [Brauer2024]. These risk factors link dietary intake patterns to specific disease outcomes through dose-response relationships.

GBD Risk Factor Definitions¶

The following table reproduces the GBD 2021 dietary risk factor definitions from Brauer et al. (2024, Supplementary Appendix 1, p. 171). All intake quantities are expressed in terms of fresh (as consumed) weight unless otherwise specified. The optimal intake levels represent the theoretical minimum risk exposure level (TMREL) used in GBD burden calculations:

Risk Factor	Definition of Exposure	Optimal Level or Range
Diet low in fruit	Average daily consumption (in grams per day) of fruit including fresh, frozen, cooked, canned, or dried fruit, excluding fruit juices and salted or pickled fruits	340–350 g/day
Diet low in vegetables	Average daily consumption (in grams per day) of vegetables, including fresh, frozen, cooked, canned, or dried vegetables and excluding legumes and salted or pickled vegetables, juices, nuts and seeds, and starchy vegetables such as potatoes or corn	306–372 g/day
Diet low in whole grains	Average daily consumption (in grams per day) of whole grains (bran, germ, and endosperm in their natural proportion) from breakfast cereals, bread, rice, pasta, biscuits, muffins, tortillas, pancakes, and other sources	160–210 g/day
Diet low in nuts and seeds	Average daily consumption (in grams per day) of nuts and seeds, including tree nuts and seeds and peanuts	19–24 g/day
Diet low in fibre	Average daily consumption (in grams per day) of fibre from all sources including fruits, vegetables, grains, legumes, and pulses	22–25 g/day
Diet low in seafood omega-3 fatty acids	Average daily consumption (in milligrams per day) of eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA)	470–660 mg/day
Diet low in omega-6 polyunsaturated fatty acids	Average daily consumption (in % daily energy) from omega-6 polyunsaturated fatty acids (PUFA) (specifically linoleic acid, γ-linolenic acid, eicosadienoic acid, dihomo-γ-linolenic acid, arachidonic acid)	9–10% of total daily energy
Diet low in calcium	Average daily consumption (in grams per day) of calcium from all sources, including milk, yogurt, and cheese	0.72–0.86 g/day (males), 1.1–1.2 g/day (females)
Diet low in milk	Average daily consumption (in grams per day) of dairy milk including non-fat, low-fat, and full-fat milk, but excluding plant-based milks, fermented milk products such as buttermilk, and other dairy products such as cheese	280–340 g/day (males), 500–610 g/day (females)
Diet low in legumes	Average daily consumption (in grams per day) of legumes and pulses, including fresh, frozen, cooked, canned, or dried legumes	100–110 g/day
Diet high in red meat	Average daily consumption (in grams per day) of unprocessed red meat including pork and bovine meats such as beef, pork, lamb, and goat, but excluding all processed meats, poultry, fish, and eggs	0–200 g/day
Diet high in processed meat	Average daily consumption (in grams per day) of meat preserved by smoking, curing, salting, or addition of chemical preservatives	0 g/day
Diet high in sugar-sweetened beverages (SSBs) (refined sugar proxy)	Average daily consumption (in grams per day) of beverages with ≥50 kcal per 226.8 gram serving, including carbonated beverages, sodas, energy drinks, and fruit drinks, but excluding 100% fruit and vegetable juices. Exposures are converted to refined sugar equivalents assuming 5.7 g sugar per 100 g beverage (consistent with the ≥50 kcal threshold).	0 g/day (refined sugar equivalent)
Diet high in trans fatty acids	Average daily consumption (in percent daily energy) of trans fat from all sources, mainly partially hydrogenated vegetable oils and ruminant products	0–1.1% of total daily energy
Diet high in sodium	Average 24-hour urinary sodium excretion (in grams per day)	1–5 g/day

Notes:

All intake quantities are in fresh (as consumed) weight, matching the GDD dietary intake data convention (see Current Diets)
GBD risk factors are evaluated for adult populations (≥25 years) - the implementation weights dietary intake over ages ≥ health.intake_age_min (25 by default) using age-specific populations
The model currently implements a subset of these risk factors based on data availability and model scope
SSB risk-factor exposures are converted to refined sugar equivalents using health.ssb_sugar_g_per_100g; added-sugar intake from GDD (variable v35) is aggregated into the same refined-sugar risk factor.
All risk factors use a generous flat tail at health.intake_cap_g_per_day (1 000 g/person/day by default) so equality constraints can exceed observed data without driving the SOS2 outside its domain.
Risk factor definitions specify both the intake measure (e.g., grams per day) and the threshold or optimal range
“Diet low in” risk factors specify minimum recommended intakes; “diet high in” risk factors treat any intake as risk-increasing
Milk/dairy measurements use milk equivalents, where cheese and yogurt are converted to their milk equivalent weight
See Current Diets for detailed mapping between GDD dietary intake data and these risk factors

Preparation Workflow¶

The preprocessing script performs these steps:

Health clustering – groups countries into health.region_clusters clusters using a multi-objective approach that balances three criteria:
- Geographic proximity: countries close together tend to cluster together (weight: health.clustering.weights.geography)
- GDP per capita similarity: countries with similar economic development levels cluster together (weight: health.clustering.weights.gdp)
- Population balance: clusters are refined to have roughly equal total populations (weight: health.clustering.weights.population)
The algorithm projects country geometries to an equal-area CRS, computes centroids, and runs weighted K-means on a feature matrix combining geography and log-transformed GDP per capita. An iterative refinement step then reassigns boundary countries from over-populated to under-populated clusters until the population coefficient of variation reaches an acceptable level. The cluster map is saved as processing/{name}/health/country_clusters.csv.

Health clusters grouping countries based on geographic proximity, GDP per capita similarity, and population balance.¶
Baseline burden – combines mortality, population and life expectancy to compute years of life lost (YLL) per country and aggregates them to the health clusters. For each cause, it also computes a diet-attributable YLL using the population-attributable fraction \(\\text{PAF} = 1 - 1/\\mathrm{RR}\) derived from baseline intakes and the RR curves. The results go into processing/{name}/health/cluster_cause_baseline.csv and processing/{name}/health/cluster_summary.csv.
Record cluster totals – store each cluster’s population for scaling; the solver multiplies baseline YLLs by the configured health.value_per_yll constant (no external valuation dataset required).
Risk-factor breakpoints – builds grids of intake values by taking evenly spaced knots (health.intake_grid_points) over the empirical RR data range, then adding observed exposures, TMREL, baseline intakes, and the generous cap health.intake_cap_g_per_day. It evaluates \(\log(RR)\) for every (risk, cause) pair. These tables are written to processing/{name}/health/risk_breakpoints.csv.
Cause-level breakpoints – as the optimisation needs to recover \(RR = \exp(\sum_r \log RR_{r})\), the script also constructs breakpoints for the aggregated log-relative-risk and its exponential. Stored as processing/{name}/health/cause_log_breakpoints.csv.

The generated tables drive the linearisation in workflow/scripts/solve_model.py.

From Diet to Risk Exposure¶

Per-capita intake¶

During optimisation, consumption flows are tracked on links named consume_<food>_<ISO3>. For each health cluster \(c\) and risk factor \(r\), the solver forms a per-capita intake by combining these flows with shares from workflow/scripts/health_food_mapping.py:

\[I_{c,r} = \frac{10^{6}}{365\,P_c} \sum_{f \in \mathcal{F}_r} \alpha_{f,r} \; q_{c,f}\]

where

\(q_{c,f}\) is the aggregated flow in million tonnes per year for food \(f\) consumed by cluster \(c\);
\(\alpha_{f,r}\) is the share of food \(f\) attributed to risk factor \(r\) (currently 1.0 or 0.0);
\(P_c\) is the population represented by the cluster (baseline or updated planning population);
the constant rescales from Mt/year to g/day.

Linearised relative risk curves¶

Each risk factor \(r\) affects a subset of causes \(g\). The data from risk_breakpoints.csv provides intake breakpoints \(x_0, \ldots, x_K\) and the corresponding \(\log RR_{r,g}(x_k)\) values. For every (cluster, risk) pair we introduce SOS2 “lambda” variables \(\lambda_k\) that satisfy

\[\sum_k \lambda_k = 1,\qquad I_{c,r} = \sum_k x_k\,\lambda_k,\]

and approximate the log-relative-risk as

\[\log RR_{c,r,g} = \sum_k \lambda_k\, \log RR_{r,g}(x_k).\]

SOS2 constraints keep only two adjacent \(\lambda_k\) active, yielding a piecewise-linear interpolation without binary decision variables when the solver supports SOS2. When HiGHS is used, the implementation falls back to a compact binary formulation.

Aggregating across risk factors¶

Epidemiological evidence models the combined effect of multiple risk factors on one cause as multiplicative:

\[RR_{c,g} = \prod_{r \in \mathcal{R}_g} RR_{c,r,g}.\]

Taking logarithms converts this to a sum that remains compatible with linear programming:

\[\log RR_{c,g} = \sum_{r \in \mathcal{R}_g} \log RR_{c,r,g}.\]

The solver accumulates the contributions from each risk factor into log_rr_totals for every cluster–cause pair.

Recovering total relative risk¶

The optimisation needs \(RR_{c,g}\) again to price health damages. The preprocessed cause_log_breakpoints.csv supplies points \((z_m, \exp(z_m))\) that cover the feasible range of \(z = \log RR_{c,g}\). A second SOS2 interpolation enforces

\[z = \sum_m z_m \theta_m,\qquad RR_{c,g} = \sum_m e^{z_m} \theta_m,\]

with \(\sum_m \theta_m = 1\). This gives a consistent linearised mapping from the aggregated log-relative-risk back to the multiplicative relative risk.

Monetising years of life lost¶

For each cluster–cause pair the preprocessing step stores \(\mathrm{YLL}^{\mathrm{base}}_{c,g}\) (baseline years of life lost). The solver also records the reference log-relative-risk \(z^{\mathrm{ref}}_{c,g}\) (from TMREL intake levels) and its exponential \(RR^{\mathrm{ref}}_{c,g}\). The contribution to the objective is constructed as

\[\text{Cost}_{c,g} = V\, \mathrm{YLL}^{\mathrm{base}}_{c,g} \left( \frac{RR_{c,g}}{RR^{\mathrm{ref}}_{c,g}} - 1 \right).\]

where \(V\) is the value per year of life lost (configured in health.value_per_yll as USD_2024 per YLL), and \(RR^{\mathrm{ref}}_{c,g}\) is the relative risk when all risk factors are at their TMREL levels. This formulation ensures the health cost is exactly zero when intake is at TMREL.

Since TMREL represents the theoretical minimum risk level, relative risk curves reach their minimum at TMREL intake. Therefore \(RR_{c,g} \geq RR^{\mathrm{ref}}_{c,g}\) always, and health costs are non-negative. PyPSA store energy levels directly encode the deviation from optimal:

\[e_{c,g} = (RR_{c,g} - RR^{\mathrm{ref}}_{c,g}) \cdot \frac{\mathrm{YLL}^{\mathrm{base}}_{c,g}}{RR^{\mathrm{ref}}_{c,g}} \cdot 10^{-6}\]

measured in million years of life lost relative to TMREL baseline. The monetary contribution is marginal_cost_storage × e.

Interpreting Results¶

The optimization objective includes health costs measured relative to TMREL:

Zero health cost: All dietary intakes are at TMREL levels (optimal, minimum risk)
Positive health cost: Intake deviates from TMREL, increasing disease burden relative to optimal

Objective Contribution¶

workflow/scripts/solve_model.py adds the summed cost over all clusters and causes to the PyPSA objective. If the solver exposes SOS2 constraints, the implementation keeps the formulation linear without integer variables; for HiGHS a tight binary fallback is activated.

Configuration Highlights¶

health:
  enabled: true  # Whether to include health costs in the objective function
  region_clusters: 30
  reference_year: 2018
  intake_grid_points: 100  # Number of grid knots over empirical RR range
  log_rr_points: 100
  omega3_per_100g_fish: 1.5
  ssb_sugar_g_per_100g: 5.7  # ≈50 kcal per 226.8 g sugar-sweetened beverage (SSB) implies ~5.7 g sugar per 100 g
  value_per_yll: 50000  # USD_2024 per year of life lost
  intake_cap_g_per_day: 1000  # Uniform generous cap on intake grids and clipping
  intake_age_min: 11  # GDD adult band starts at 11; set to 11 to retain adult intake data. Note however that GDB chronic disease risk factors are for adults of >=25 years.
  # Dietary risk factors to consider (must match GDD data items)
  risk_factors:
  - fruits
  - vegetables
  - nuts_seeds
  - legumes
  - fish
  - red_meat
  - prc_meat
  - whole_grains
  # - sugar # Has a funky relative risk curve
  # Health outcomes/causes to consider (must be present in IHME GBD data and relative risks)
  causes:
  - CHD              # Coronary/Ischemic Heart Disease
  - Stroke           # Stroke (all types)
  - T2DM             # Type 2 Diabetes Mellitus
  - CRC              # Colorectal Cancer
  # Mapping of risk factors to the causes they affect
  risk_cause_map:
    fruits: [CHD, Stroke, T2DM]
    vegetables: [CHD, Stroke]
    nuts_seeds: [CHD, CRC, T2DM]
    legumes: [CHD]
    fish: [CHD]
    red_meat: [CHD, Stroke, T2DM, CRC]
    prc_meat: [CHD, T2DM, CRC]
    whole_grains: [CHD, Stroke, T2DM, CRC]
    # sugar: [CHD, Stroke, T2DM, CRC]
  # Theoretical minimum risk exposure levels (TMREL) from GBD Study 2021
  # Source: Brauer et al. (2024), Global Burden of Disease Study 2021
  # Values represent optimal intake levels where health risk is minimized
  # Reference: https://doi.org/10.1016/S0140-6736(24)00933-4
  tmrel_g_per_day:
    fruits: 345         # TMREL: 340-350 g/day (midpoint)
    vegetables: 339     # TMREL: 306-372 g/day (midpoint)
    whole_grains: 185   # TMREL: 160-210 g/day (midpoint)
    nuts_seeds: 21.5    # TMREL: 19-24 g/day (midpoint)
    legumes: 105        # TMREL: 100-110 g/day (midpoint)
    fish: 37.7          # TMREL: 470-660 mg/day omega-3 (midpoint 565 mg, converted using omega3_per_100g_fish)
    red_meat: 0         # TMREL: 0-200 g/day (using conservative lower bound)
    prc_meat: 0         # TMREL: 0 g/day (any intake increases risk)
    sugar: 0            # TMREL: 0 g/day (any refined sugar intake increases risk)
  # Multi-objective clustering settings for grouping countries into health clusters
  clustering:
    gdp_reference_year: 2025  # Reference year for GDP per capita data
    weights:
      geography: 1.0    # Weight for geographic proximity
      gdp: 0.5          # Weight for GDP per capita similarity
      population: 0.3   # Weight for population balance across clusters

Lowering region_clusters or log_rr_points eases the optimisation at the cost of coarser health resolution. health.intake_grid_points controls the density of the first-stage interpolation grid; smaller values give smoother curves but produce larger tables.

Outputs¶

The preprocessing rule saves all intermediate products under processing/{name}/health/. Downstream plotting rules also create quick-look maps (results/{name}/plots/health_*.pdf) and CSV summaries to compare baseline versus optimised health outcomes.

References¶

[Brauer2024]

Brauer M, Roth GA, Aravkin AY, et al. Global Burden and Strength of Evidence for 88 Risk Factors in 204 Countries and 811 Subnational Locations, 1990–2021: A Systematic Analysis for the Global Burden of Disease Study 2021. The Lancet, 2024;403(10440):2162–203. https://doi.org/10.1016/S0140-6736(24)00933-4