Health Impacts

Overview

The health module converts dietary choices in the optimisation into monetised health impacts. It combines epidemiological evidence on diet–disease links with country-level baseline mortality and demographic data, and represents the resulting relationship inside the linear programme through piecewise-linear (SOS2) approximations. The objective therefore weighs production, environmental and health costs in a single monetary unit.

Key ideas:

  • Dietary risk factors from the Global Burden of Disease (GBD) study underpin the exposure–response curves.

  • Countries are grouped into health clusters to keep the optimisation tractable while preserving heterogeneity in baseline burden and valuation.

  • Relative risks multiply across risk factors, so we work in log space, where the product becomes a sum that can be linearised.

Data Inputs

workflow/scripts/prepare_health_costs.py assembles the following datasets:

  • Baseline diet (data/health/processed/diet_intake.csv): average daily intake by country and food item.

  • Relative risks (data/health/processed/relative_risks.csv): dose–response pairs for each (risk factor, disease cause) combination.

  • Mortality rates (data/health/processed/mortality.csv): cause-specific death rates by age, country and year.

  • Population and life tables (processing/{name}/population_age.csv and processing/{name}/life_table.csv): age-structured population counts and remaining life expectancy schedules.

Dietary Risk Factors

The model incorporates dietary risk factors as defined by the Global Burden of Disease (GBD) Study 2021 [Brauer2024]. These risk factors link dietary intake patterns to specific disease outcomes through dose-response relationships.

GBD Risk Factor Definitions

The following table reproduces the GBD 2021 dietary risk factor definitions from Brauer et al. (2024, Supplementary Appendix 1, p. 171). All intake quantities are expressed in terms of fresh (as consumed) weight unless otherwise specified. The optimal intake levels represent the theoretical minimum risk exposure level (TMREL) used in GBD burden calculations:

| Risk Factor | Definition of Exposure | Optimal Level or Range |
|---|---|---|
| Diet low in fruit | Average daily consumption (in grams per day) of fruit including fresh, frozen, cooked, canned, or dried fruit, excluding fruit juices and salted or pickled fruits | 340–350 g/day |
| Diet low in vegetables | Average daily consumption (in grams per day) of vegetables, including fresh, frozen, cooked, canned, or dried vegetables and excluding legumes and salted or pickled vegetables, juices, nuts and seeds, and starchy vegetables such as potatoes or corn | 306–372 g/day |
| Diet low in whole grains | Average daily consumption (in grams per day) of whole grains (bran, germ, and endosperm in their natural proportion) from breakfast cereals, bread, rice, pasta, biscuits, muffins, tortillas, pancakes, and other sources | 160–210 g/day |
| Diet low in nuts and seeds | Average daily consumption (in grams per day) of nuts and seeds, including tree nuts and seeds and peanuts | 19–24 g/day |
| Diet low in fibre | Average daily consumption (in grams per day) of fibre from all sources including fruits, vegetables, grains, legumes, and pulses | 22–25 g/day |
| Diet low in seafood omega-3 fatty acids | Average daily consumption (in milligrams per day) of eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) | 470–660 mg/day |
| Diet low in omega-6 polyunsaturated fatty acids | Average daily consumption (in % daily energy) from omega-6 polyunsaturated fatty acids (PUFA) (specifically linoleic acid, γ-linolenic acid, eicosadienoic acid, dihomo-γ-linolenic acid, arachidonic acid) | 9–10% of total daily energy |
| Diet low in calcium | Average daily consumption (in grams per day) of calcium from all sources, including milk, yogurt, and cheese | 0.72–0.86 g/day (males), 1.1–1.2 g/day (females) |
| Diet low in milk | Average daily consumption (in grams per day) of dairy milk including non-fat, low-fat, and full-fat milk, but excluding plant-based milks, fermented milk products such as buttermilk, and other dairy products such as cheese | 280–340 g/day (males), 500–610 g/day (females) |
| Diet low in legumes | Average daily consumption (in grams per day) of legumes and pulses, including fresh, frozen, cooked, canned, or dried legumes | 100–110 g/day |
| Diet high in red meat | Average daily consumption (in grams per day) of unprocessed red meat including pork and bovine meats such as beef, pork, lamb, and goat, but excluding all processed meats, poultry, fish, and eggs | 0–200 g/day |
| Diet high in processed meat | Average daily consumption (in grams per day) of meat preserved by smoking, curing, salting, or addition of chemical preservatives | 0 g/day |
| Diet high in sugar-sweetened beverages (SSBs) | Average daily consumption (in grams per day) of beverages with ≥50 kcal per 226.8 gram serving, including carbonated beverages, sodas, energy drinks, and fruit drinks, but excluding 100% fruit and vegetable juices | 0 g/day |
| Diet high in trans fatty acids | Average daily consumption (in percent daily energy) of trans fat from all sources, mainly partially hydrogenated vegetable oils and ruminant products | 0–1.1% of total daily energy |
| Diet high in sodium | Average 24-hour urinary sodium excretion (in grams per day) | 1–5 g/day |

Notes:

  • All intake quantities are in fresh (as consumed) weight, matching the GDD dietary intake data convention (see Current Diets)

  • GBD risk factors are evaluated for adult populations (≥25 years). The current implementation instead uses population-weighted “All ages” dietary intake averages, which may underestimate risk for the adult population

  • The model currently implements a subset of these risk factors based on data availability and model scope

  • Risk factor definitions specify both the intake measure (e.g., grams per day) and the threshold or optimal range

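  • “Diet low in” risk factors specify minimum recommended intakes; “diet high in” risk factors specify an upper limit on intake, which is zero for processed meat and sugar-sweetened beverages (any intake increases risk) and a non-zero range for red meat, trans fatty acids and sodium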

  • Milk/dairy measurements use milk equivalents, where cheese and yogurt are converted to their milk equivalent weight

  • See Current Diets for detailed mapping between GDD dietary intake data and these risk factors

Preparation Workflow

The preprocessing script performs these steps:

  1. Health clustering – dissolves country geometries, computes equal-area centroids and runs K-means to assign each country to one of health.region_clusters clusters. The cluster map is saved as processing/{name}/health/country_clusters.csv.

  2. Baseline burden – combines mortality, population and life expectancy to compute years of life lost (YLL) per country and aggregates them to the health clusters. The results go into processing/{name}/health/cluster_cause_baseline.csv and processing/{name}/health/cluster_summary.csv.

  3. Cluster totals – stores each cluster’s population for scaling. No external valuation dataset is required: the solver multiplies baseline YLLs by the configured health.value_per_yll constant.

  4. Risk-factor breakpoints – builds dense grids of intake values (spaced by the configured health.intake_grid_step and including the observed exposures) and evaluates \(\log(RR)\) for every (risk, cause) pair. These tables are written to processing/{name}/health/risk_breakpoints.csv.

  5. Cause-level breakpoints – because the optimisation needs to recover \(RR = \exp(\sum_r \log RR_{r})\), the script also constructs breakpoints for the aggregated log-relative-risk and its exponential. These are stored in processing/{name}/health/cause_log_breakpoints.csv.
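The baseline-burden step can be illustrated with a minimal sketch. It assumes the standard definition of YLL as deaths multiplied by remaining life expectancy, summed over age groups; the exact formula, rate units and column layout used by prepare_health_costs.py are not shown here, so the function name, the country codes and all numbers below are hypothetical.

```python
def years_of_life_lost(death_rate, population, life_expectancy):
    """Sum deaths x remaining life expectancy over age groups.

    death_rate: dict age group -> deaths per person per year (assumed unit)
    population: dict age group -> persons
    life_expectancy: dict age group -> remaining years of life
    """
    return sum(
        death_rate[age] * population[age] * life_expectancy[age]
        for age in death_rate
    )


# Hypothetical two-country cluster, one cause, two age groups
countries = {
    "AAA": {
        "rate": {"25-29": 0.0001, "60-64": 0.002},
        "pop": {"25-29": 1_000_000, "60-64": 500_000},
        "ex": {"25-29": 55.0, "60-64": 22.0},
    },
    "BBB": {
        "rate": {"25-29": 0.0002, "60-64": 0.003},
        "pop": {"25-29": 2_000_000, "60-64": 800_000},
        "ex": {"25-29": 50.0, "60-64": 20.0},
    },
}

# Country-level YLL, then aggregation to the cluster
yll_by_country = {
    iso: years_of_life_lost(c["rate"], c["pop"], c["ex"])
    for iso, c in countries.items()
}
cluster_yll = sum(yll_by_country.values())
```

Aggregating after the country-level calculation keeps each country's own age structure and life table in the cluster total.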

The generated tables drive the linearisation in workflow/scripts/solve_model.py.
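As a rough illustration of the risk-factor breakpoint step, the sketch below builds a regular intake grid, adds observed exposures, and evaluates log RR at each grid point. It assumes that RR is interpolated linearly between dose–response pairs and held constant outside their range; the actual interpolation rule in the preprocessing script may differ. The helper names and the dose–response values are hypothetical.

```python
import math


def intake_grid(max_intake, step, observed):
    """Regular grid from 0 to max_intake, plus observed exposures, sorted and unique."""
    n_steps = int(max_intake // step)
    points = {i * step for i in range(n_steps + 1)}
    points.add(max_intake)
    points.update(x for x in observed if 0.0 <= x <= max_intake)
    return sorted(points)


def log_rr_at(x, doses, rrs):
    """Interpolate RR linearly between dose-response pairs (assumption), then take the log."""
    if x <= doses[0]:
        return math.log(rrs[0])
    for i in range(len(doses) - 1):
        if x <= doses[i + 1]:
            t = (x - doses[i]) / (doses[i + 1] - doses[i])
            return math.log(rrs[i] + t * (rrs[i + 1] - rrs[i]))
    return math.log(rrs[-1])  # flat beyond the last dose


doses = [0.0, 100.0, 200.0]   # hypothetical intakes, g/day
rrs = [1.0, 0.9, 0.85]        # hypothetical relative risks

grid = intake_grid(200.0, 20.0, observed=[137.5])
table = [(x, log_rr_at(x, doses, rrs)) for x in grid]
```

Including the observed exposure as a grid point means the baseline diet is evaluated exactly rather than between two breakpoints.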

From Diet to Risk Exposure

Per-capita intake

During optimisation, consumption flows are tracked on links named consume_<food>_<ISO3>. For each health cluster \(c\) and risk factor \(r\), the solver forms a per-capita intake by combining these flows with shares from workflow/scripts/health_food_mapping.py:

\[I_{c,r} = \frac{10^{6}}{365\,P_c} \sum_{f \in \mathcal{F}_r} \alpha_{f,r} \; q_{c,f}\]

where

  • \(q_{c,f}\) is the aggregated flow in million tonnes per year for food \(f\) consumed by cluster \(c\);

  • \(\alpha_{f,r}\) is the share of food \(f\) attributed to risk factor \(r\) (currently 1.0 or 0.0);

  • \(P_c\) is the population represented by the cluster (baseline or updated planning population);

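  • the constant rescales from Mt/year to g/person/day. Since 1 Mt = \(10^{12}\) g, the factor \(10^{6}\) applies when \(P_c\) is expressed in millions of people; with \(P_c\) in persons it would be \(10^{12}\).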
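The unit conversion can be checked with a short sketch. It assumes the population is given as a head count, so the factor is \(10^{12}\) g per Mt; this equals the \(10^{6}\) in the formula when \(P_c\) is expressed in millions. The function name, food names and flows are hypothetical.

```python
GRAMS_PER_MT = 1e12   # 1 Mt = 1e6 t = 1e12 g
DAYS_PER_YEAR = 365


def per_capita_intake_g_per_day(flows_mt, shares, population):
    """Return intake in g/person/day.

    flows_mt: dict food -> consumption flow in Mt/year
    shares: dict food -> share attributed to the risk factor (1.0 or 0.0)
    population: cluster population in persons (assumed unit)
    """
    total_mt = sum(shares.get(food, 0.0) * q for food, q in flows_mt.items())
    return total_mt * GRAMS_PER_MT / (DAYS_PER_YEAR * population)


flows = {"apple": 3.0, "banana": 1.0, "beef": 2.0}   # hypothetical Mt/year
alpha = {"apple": 1.0, "banana": 1.0}                # foods counted as fruit
intake = per_capita_intake_g_per_day(flows, alpha, population=50e6)
# roughly 219 g/person/day
```

Foods without an entry in the share mapping contribute nothing, which mirrors a share of 0.0.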

Linearised relative risk curves

Each risk factor \(r\) affects a subset of causes \(g\). The data from risk_breakpoints.csv provides intake breakpoints \(x_0, \ldots, x_K\) and the corresponding \(\log RR_{r,g}(x_k)\) values. For every (cluster, risk) pair we introduce SOS2 “lambda” variables \(\lambda_k\) that satisfy

\[\sum_k \lambda_k = 1,\qquad I_{c,r} = \sum_k x_k\,\lambda_k,\]

and approximate the log-relative-risk as

\[\log RR_{c,r,g} = \sum_k \lambda_k\, \log RR_{r,g}(x_k).\]

The SOS2 condition, together with \(\lambda_k \ge 0\), allows at most two adjacent \(\lambda_k\) to be non-zero. This yields a piecewise-linear interpolation without explicit binary decision variables when the solver supports SOS2. When HiGHS is used, the implementation falls back to a compact binary formulation.
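To make the role of the lambda variables concrete, the sketch below computes, outside any solver, the weights that satisfy these conditions for a given intake. It is an illustration of what a feasible SOS2 solution looks like, not the constraint-building code in solve_model.py; the function name and breakpoint values are hypothetical.

```python
from bisect import bisect_right


def sos2_weights(x, breakpoints):
    """Return weights that are non-negative, sum to 1, with at most two adjacent non-zero."""
    if not breakpoints[0] <= x <= breakpoints[-1]:
        raise ValueError("intake outside breakpoint range")
    # Index k of the segment [x_k, x_{k+1}] that contains x
    k = min(bisect_right(breakpoints, x) - 1, len(breakpoints) - 2)
    t = (x - breakpoints[k]) / (breakpoints[k + 1] - breakpoints[k])
    lam = [0.0] * len(breakpoints)
    lam[k], lam[k + 1] = 1.0 - t, t
    return lam


x_k = [0.0, 100.0, 200.0, 300.0]        # hypothetical intake breakpoints, g/day
log_rr_k = [0.0, -0.10, -0.16, -0.18]   # hypothetical log RR at each breakpoint

lam = sos2_weights(150.0, x_k)
intake = sum(w * x for w, x in zip(lam, x_k))
log_rr = sum(w * v for w, v in zip(lam, log_rr_k))
```

Here the intake of 150 g/day lies halfway between the second and third breakpoints, so the interpolated log RR is the average of their two values.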

Aggregating across risk factors

Epidemiological evidence models the combined effect of multiple risk factors on one cause as multiplicative:

\[RR_{c,g} = \prod_{r \in \mathcal{R}_g} RR_{c,r,g}.\]

Taking logarithms converts this to a sum that remains compatible with linear programming:

\[\log RR_{c,g} = \sum_{r \in \mathcal{R}_g} \log RR_{c,r,g}.\]

The solver accumulates the contributions from each risk factor into log_rr_totals for every cluster–cause pair.
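A quick numerical check of the identity, using hypothetical relative risks for a single cluster–cause pair:

```python
import math

# Hypothetical relative risks, keyed by risk factor
rr_by_risk = {"fruits": 0.95, "vegetables": 0.97, "red_meat": 1.08}

product = math.prod(rr_by_risk.values())
log_total = sum(math.log(rr) for rr in rr_by_risk.values())
recovered = math.exp(log_total)   # equals the product up to rounding
```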

Recovering total relative risk

The optimisation needs \(RR_{c,g}\) again to price health damages. The preprocessed cause_log_breakpoints.csv supplies points \((z_m, \exp(z_m))\) that cover the feasible range of \(z = \log RR_{c,g}\). A second SOS2 interpolation enforces

\[z = \sum_m z_m \theta_m,\qquad RR_{c,g} = \sum_m e^{z_m} \theta_m,\]

with \(\sum_m \theta_m = 1\). This gives a consistent linearised mapping from the aggregated log-relative-risk back to the multiplicative relative risk.
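The sketch below illustrates how the number of breakpoints affects this second interpolation. Because the exponential is convex, interpolating along chords never underestimates \(RR_{c,g}\), and the gap shrinks as the grid gets finer. The uniform grid, its range and the helper names are assumptions for illustration; the count of 10 points mirrors the log_rr_points value in the configuration, but how the script actually places its breakpoints is not shown here.

```python
import math


def interpolate_exp(z, z_points):
    """Chord interpolation of exp(z) between adjacent breakpoints."""
    for lo, hi in zip(z_points, z_points[1:]):
        if lo <= z <= hi:
            t = (z - lo) / (hi - lo)
            return (1.0 - t) * math.exp(lo) + t * math.exp(hi)
    raise ValueError("z outside breakpoint range")


def uniform_grid(z_min, z_max, n_points):
    """Evenly spaced breakpoints (assumed placement)."""
    step = (z_max - z_min) / (n_points - 1)
    return [z_min + i * step for i in range(n_points)]


z = -0.13   # hypothetical aggregated log relative risk
exact = math.exp(z)
coarse = interpolate_exp(z, uniform_grid(-0.5, 0.5, 3))
fine = interpolate_exp(z, uniform_grid(-0.5, 0.5, 10))
```

In this example the 3-point grid overstates the relative risk by about 0.02, while the 10-point grid is within roughly 0.001 of the exact value.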

Monetising years of life lost

For each cluster–cause pair the preprocessing step stores \(\mathrm{YLL}^{\mathrm{base}}_{c,g}\) (baseline years of life lost). The solver also records the reference log-relative-risk \(z^{\mathrm{ref}}_{c,g}\) (from baseline diets) and its exponential \(RR^{\mathrm{ref}}_{c,g}\). The contribution to the objective is constructed as

\[\text{Cost}_{c,g} = V\, \mathrm{YLL}^{\mathrm{base}}_{c,g} \left( \frac{RR_{c,g}}{RR^{\mathrm{ref}}_{c,g}} - 1 \right).\]

The \(-1\) inside the parentheses contributes the constant term \(-V\,\mathrm{YLL}^{\mathrm{base}}_{c,g}\), so the baseline diet (\(RR_{c,g} = RR^{\mathrm{ref}}_{c,g}\)) has zero health cost and only improvements or deteriorations relative to the reference affect the optimisation.
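A small worked example of the cost expression, split into its variable and constant parts. The value per YLL matches the configuration default; the baseline YLL and relative risks are hypothetical, as is the function name.

```python
def health_cost(value_per_yll, yll_base, rr, rr_ref):
    """Cost relative to the reference diet: V * YLL_base * (RR / RR_ref - 1)."""
    return value_per_yll * yll_base * (rr / rr_ref - 1.0)


V = 150_000          # USD per year of life lost
yll_base = 2_000.0   # hypothetical baseline YLL for one cluster-cause pair
rr_ref = 1.20        # hypothetical reference relative risk

baseline = health_cost(V, yll_base, rr_ref, rr_ref)   # zero by construction
improved = health_cost(V, yll_base, 1.14, rr_ref)     # RR falls by 5%

# Same cost written as a variable part plus a constant
variable_part = V * yll_base * 1.14 / rr_ref
constant_part = -V * yll_base
```

A 5% reduction in relative risk gives a cost of about −15 million USD here, that is, a health benefit relative to the reference diet.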

Objective Contribution

workflow/scripts/solve_model.py adds the summed cost over all clusters and causes to the PyPSA objective. If the solver exposes SOS2 constraints, the implementation keeps the formulation free of explicit binary variables; for HiGHS a tight binary fallback is activated. The script also records the constant baseline adjustment in network.meta["objective_constant_terms"]["health"] to help interpret objective values ex post.

Configuration Highlights

health:
  region_clusters: 30
  reference_year: 2018
  intake_grid_step: 20   # Intake resolution in g/person/day
  log_rr_points: 10
  omega3_per_100g_fish: 1.5
  value_per_yll: 150000  # USD per year of life lost
  # Dietary risk factors to consider (must match GDD data items)
  risk_factors:
  - fruits
  - vegetables
  - nuts_seeds
  - legumes
  - fish
  - red_meat
  - prc_meat
  - whole_grains
  # Health outcomes/causes to consider (must be present in IHME GBD data and relative risks)
  causes:
  - CHD              # Coronary/Ischemic Heart Disease
  - Stroke           # Stroke (all types)
  - T2DM             # Type 2 Diabetes Mellitus
  - CRC              # Colorectal Cancer
  # Theoretical minimum risk exposure levels (TMREL) from GBD Study 2021
  # Source: Brauer et al. (2024), Global Burden of Disease Study 2021
  # Values represent optimal intake levels where health risk is minimized
  # Reference: https://doi.org/10.1016/S0140-6736(24)00933-4
  tmrel_g_per_day:
    fruits: 345         # TMREL: 340-350 g/day (midpoint)
    vegetables: 339     # TMREL: 306-372 g/day (midpoint)
    whole_grains: 185   # TMREL: 160-210 g/day (midpoint)
    nuts_seeds: 21.5    # TMREL: 19-24 g/day (midpoint)
    legumes: 105        # TMREL: 100-110 g/day (midpoint)
    fish: 37.7          # TMREL: 470-660 mg/day omega-3 (midpoint 565 mg, converted using omega3_per_100g_fish)
    red_meat: 0         # TMREL: 0-200 g/day (using conservative lower bound)
    prc_meat: 0         # TMREL: 0 g/day (any intake increases risk)

Lowering region_clusters or log_rr_points eases the optimisation at the cost of coarser health resolution. health.intake_grid_step controls the density of the first-stage interpolation grid; smaller values give smoother curves but produce larger tables.
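The fish TMREL in the configuration can be reproduced from the omega-3 range with a short calculation. This sketch assumes omega3_per_100g_fish is expressed in grams of EPA and DHA per 100 g of fish, which is consistent with the configured value of 37.7 g/day; the function name is hypothetical.

```python
def fish_tmrel_g_per_day(omega3_mg_per_day, omega3_g_per_100g_fish):
    """Convert an omega-3 target (mg/day) into an equivalent fish intake (g/day)."""
    omega3_g_per_day = omega3_mg_per_day / 1000.0
    return omega3_g_per_day / omega3_g_per_100g_fish * 100.0


midpoint_mg = (470 + 660) / 2                          # 565 mg/day
fish_tmrel = fish_tmrel_g_per_day(midpoint_mg, 1.5)    # about 37.7 g/day
```

Changing omega3_per_100g_fish without updating tmrel_g_per_day.fish would leave the two settings inconsistent, so it may be worth recomputing one from the other.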

Outputs

The preprocessing rule saves all intermediate products under processing/{name}/health/. Downstream plotting rules also create quick-look maps (results/{name}/plots/health_*.pdf) and CSV summaries to compare baseline versus optimised health outcomes.

References

[Brauer2024]

Brauer M, Roth GA, Aravkin AY, et al. Global Burden and Strength of Evidence for 88 Risk Factors in 204 Countries and 811 Subnational Locations, 1990–2021: A Systematic Analysis for the Global Burden of Disease Study 2021. The Lancet, 2024;403(10440):2162–203. https://doi.org/10.1016/S0140-6736(24)00933-4