Current Diets
Overview
The model represents current consumption patterns by combining three intake datasets — GDD-IA, GBD, and NHANES (for the USA) — with item-level food supply data from FAOSTAT for within-group disaggregation. The pipeline produces a single per-country, per-food baseline diet whose mass basis is aligned with what the model’s food bus delivers after applying food loss and waste. The baseline diet serves several roles:
- Health impact assessment: dietary risk exposure for the burden of disease attributable to current diets.
- Optimization reference: comparison point for optimized diets and, optionally, an equality constraint when enforce_baseline_diet is enabled.
- Calibration: anchors the consumer-utility piecewise blocks (Consumer Values) and the production-stability L1 calibration (Calibration).
Figure: Population-weighted mean food group consumption (g/person/day) by UN M49 macro-region, showing how dietary patterns vary across world regions.
Figure: Global population-weighted mean consumption (g/person/day) broken down by individual foods within each food group.
Data Sources
- Global Dietary Database — Integrated Assessment (GDD-IA)
Provider: Marco Springmann (University of Oxford / UCL). GDD-IA combines the Global Dietary Database (GDD) survey-based intake estimates with FAOSTAT Food Balance Sheets and applies a multi-source caloric-intake normalisation procedure to produce consistent per-country food and energy intake estimates.
Status: Pending publication; available upon personal request from Marco Springmann. Will be re-licensed under CC-BY-NC on release.
Coverage: ~185 countries, per-country mean dietary intake at the reference year, reported in parallel grams/day and kcal/day for every food category.
Role: Primary source of per-country food-group totals for all food groups except the GBD-anchored risk groups (see below).
- Global Burden of Disease (GBD) 2019 dietary risk exposure
Provider: Institute for Health Metrics and Evaluation (IHME) [Brauer2024]
Coverage: country-level mean intake (g/day) for the GBD dietary risk factors, adults 25+.
Role: anchors the risk-factor food groups (fruits, vegetables, whole_grains, legumes, nuts_seeds, red_meat) so the baseline lines up with the same exposure basis the GBD relative-risk functions were calibrated against.
- NHANES — What We Eat in America / FPED
Provider: USDA ARS / CDC NHANES
Coverage: United States; population-mean intake per food group derived from the FPED demographic table.
Role: USA-only override for every food group it covers.
- FAOSTAT FBS + QCL
Provider: FAO Statistics Division
Role: Item-level supply (FBS) drives within-group disaggregation of food-group totals into per-food consumption. Production statistics (QCL) resolve shared FBS items (e.g. several millet species under one FBS code) and weight module-pool projections (see Step 2: Within-group food shares). FBS supply also serves as the anchor source for the foods in diet.fbs_override_foods (meats, eggs, yam, coffee, cocoa).
Weight Conventions
GDD-IA reports intake “as consumed” (cooked weight for cereals and
meats, fresh weight for fruits and vegetables). The pipeline derives the
food-group mass values that downstream rules consume in the model’s
basis, so no further conversion is needed when reading
dietary_intake.csv:
- For most groups (cereals, vegetables, fruits, nuts/seeds, oil, sugar, legumes, poultry, eggs) the IA-reported grams are already close enough to the model basis to be passed through as-is.
- Red meat is inflated from cooked to raw retail mass by the configured diet.gdd_ia.cooked_to_raw factor (default 1.43, i.e. 1/0.7).
- Dairy mass is derived from energy at cow-milk density (0.607 kcal/g) so the value is on a strict cow-milk-equivalent basis. All dairy subcategories reported by GDD-IA (fluid milk, yoghurt, cheese, condensed/evaporated, ice cream, butter, cream) are pooled by energy before the conversion.
GBD exposure is converted to the model basis at load time via
diet.source_basis and diet.weight_conversion (cooked→dry for
whole_grains and legumes at 0.45 and 0.40; cooked→fresh for
red_meat at 1.43). NHANES values are intake-based and pass through
unchanged.
Units in the merged dietary_intake.csv distinguish g/day (fresh
wt) from g/day (milk equiv) for dairy and g/day (refined sugar
eq) for sugar.
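The conversions above can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: the function names are hypothetical, the factors are the defaults quoted in the text, and treating each factor as a multiplier on the source grams is an assumption about how diet.weight_conversion is applied.

```python
# Factors quoted in the text; in the pipeline they are read from the config.
COOKED_TO_RAW_RED_MEAT = 1.43   # diet.gdd_ia.cooked_to_raw default (about 1/0.7)
COW_MILK_KCAL_PER_G = 0.607     # cow-milk energy density
GBD_CONVERSION = {              # values cited for diet.weight_conversion
    "whole_grains": 0.45,       # cooked -> dry
    "legumes": 0.40,            # cooked -> dry
    "red_meat": 1.43,           # cooked -> fresh
}


def gdd_ia_red_meat_raw(cooked_g: float) -> float:
    """Inflate cooked red-meat grams to raw retail grams (hypothetical helper)."""
    return cooked_g * COOKED_TO_RAW_RED_MEAT


def dairy_milk_equivalent(kcal_by_subcategory: dict[str, float]) -> float:
    """Pool dairy energy over all subcategories, then convert to grams of cow milk."""
    return sum(kcal_by_subcategory.values()) / COW_MILK_KCAL_PER_G


def gbd_to_model_basis(group: str, grams: float) -> float:
    """Apply the per-group factor; groups without a factor pass through unchanged."""
    return grams * GBD_CONVERSION.get(group, 1.0)
```

Pooling dairy by energy before dividing by the milk density means that energy-dense products such as cheese or butter count for more milk-equivalent grams than their own mass.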
GDD-IA to Food Group Mapping
GDD-IA’s food categories are mapped onto the model’s food groups in
workflow/scripts/prepare_gdd_ia_dietary_intake.py. The mapping
covers every food group the model uses (fruits, vegetables,
starchy_vegetable, legumes, nuts_seeds, oil, sugar,
grain, whole_grains, red_meat, poultry, dairy,
eggs, plus stimulants for downstream tea/coffee handling).
Categories that are out of scope for the model (alcohol, seafood,
spices, rendered animal fats, miscellaneous “other”) are excluded from
food-group totals but their energy is tracked separately for the
kcal-normalisation step described below. Refined and whole-grain mass
are tracked separately so cereals can be split between the model’s
grain and whole_grains groups; plantain is routed to
starchy_vegetable; and all red-meat subcategories (including
processed) are folded into red_meat so the consumption side stays
consistent with FAOSTAT slaughter-volume animal production.
A more detailed category-level mapping will be added once GDD-IA is published.
Country Coverage
GDD-IA covers ~185 countries. For a handful of territories without
separate IA estimates the pipeline copies values from a configured
proxy. The built-in proxies live in
workflow/scripts/prepare_gdd_ia_dietary_intake.py and can be
extended via diet.gdd_ia.country_proxies in the config:
| Missing country | Proxy | Rationale |
|---|---|---|
| Afghanistan (AFG) | Iran (IRN) | Persian/Pashtun dietary similarity. |
| American Samoa (ASM) | Samoa (WSM) | Pacific islands; geographic proximity. |
| Brunei (BRN) | Malaysia (MYS) | Regional similarity. |
| Bhutan (BTN) | Nepal (NPL) | Himalayan diet. |
| Eritrea (ERI) | Ethiopia (ETH) | Existing convention. |
| Equatorial Guinea (GNQ) | Cameroon (CMR) | Central African neighbour. |
| French Guiana (GUF) | France (FRA) | French overseas territory. |
| Palestine (PSE) | Jordan (JOR) | Regional similarity. |
| Puerto Rico (PRI) | United States (USA) | US territory. |
| Somalia (SOM) | Ethiopia (ETH) | Existing convention. |
| South Sudan (SSD) | Sudan (SDN) | Regional and historical ties. |
| Taiwan (TWN) | China (CHN) | Regional similarity. |
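The proxy fill can be pictured as a dictionary copy. The sketch below is an illustration under assumptions: the function name, the nested-dict data layout, and the rule that a proxy is only applied when the target country has no data of its own are all hypothetical; only the ISO code pairs come from the table above.

```python
from typing import Optional

# Built-in proxy pairs from the table above (missing country -> proxy).
DEFAULT_PROXIES = {
    "AFG": "IRN", "ASM": "WSM", "BRN": "MYS", "BTN": "NPL",
    "ERI": "ETH", "GNQ": "CMR", "GUF": "FRA", "PSE": "JOR",
    "PRI": "USA", "SOM": "ETH", "SSD": "SDN", "TWN": "CHN",
}

Intake = dict[str, dict[str, float]]  # country -> food group -> g/day


def apply_country_proxies(intake: Intake,
                          extra: Optional[dict[str, str]] = None) -> Intake:
    """Copy proxy values into countries that have no estimate of their own."""
    # Extra pairs stand in for diet.gdd_ia.country_proxies (assumed to extend defaults).
    proxies = {**DEFAULT_PROXIES, **(extra or {})}
    filled = {country: dict(groups) for country, groups in intake.items()}
    for missing, proxy in proxies.items():
        if missing not in filled and proxy in intake:
            filled[missing] = dict(intake[proxy])
    return filled


example = apply_country_proxies({"ETH": {"fruits": 40.0}, "NPL": {"fruits": 75.0}})
```

In this example Eritrea and Somalia both inherit Ethiopia's values, Bhutan inherits Nepal's, and Afghanistan stays absent because its proxy (Iran) is not in the input.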
Data Processing
The diet pipeline runs in three preparation stages followed by the baseline-diet estimation:
1. Prepare GDD-IA (prepare_gdd_ia_dietary_intake): reads the parallel grams and kcal CSVs, maps GDD-IA’s food categories to the model’s food groups, derives the per-food-group mass in model basis (pooling all dairy subcategories by energy, applying the cooked-to-raw meat inflation), and emits two files:
   - gdd_ia_dietary_intake.csv: per-(country, food group) intake (g/day) at age = All ages.
   - gdd_ia_kcal_target.csv: per-country kcal accounting, covering the total dietary energy, the out-of-scope subtotal, the in-scope target (total minus out-of-scope), and the refined / whole-grain cereal energy split. Consumed by the cereal residual fix and the kcal-normalisation step in estimate_baseline_diet.
2. Prepare NHANES (prepare_nhanes_dietary_intake): parses the USDA FPED demographic-table PDF for the configured cycle and emits USA-only per-food-group intake with the FAOSTAT butter top-up, cured-meat fold, and fruit-juice projection (see Data Sources for the FPED specifics).
3. Merge sources (merge_dietary_sources): NHANES overrides GDD-IA for the (country, item) pairs it covers; the merged file dietary_intake.csv is the input to estimate_baseline_diet.
The GBD risk-exposure data is processed independently by
prepare_gbd_food_group_intake into
gbd_food_group_intake.csv and is read directly by
estimate_baseline_diet for the GBD-anchored groups.
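The override rule in the merge stage amounts to a keyed update. The following is a minimal sketch, assuming rows are plain dictionaries with the column names of dietary_intake.csv; the function body is illustrative and is not taken from merge_dietary_sources.py.

```python
Row = dict[str, object]


def merge_dietary_sources(gdd_ia_rows: list[Row], nhanes_rows: list[Row]) -> list[Row]:
    """Return GDD-IA rows with NHANES replacing every (country, item) pair it covers."""
    merged = {(r["country"], r["item"]): r for r in gdd_ia_rows}
    # Later updates win, so NHANES rows overwrite matching GDD-IA rows.
    merged.update({(r["country"], r["item"]): r for r in nhanes_rows})
    return sorted(merged.values(), key=lambda r: (r["country"], r["item"]))


gdd_ia = [
    {"country": "USA", "item": "fruits", "value": 150.0},
    {"country": "USA", "item": "oil", "value": 30.0},
    {"country": "FRA", "item": "fruits", "value": 180.0},
]
nhanes = [{"country": "USA", "item": "fruits", "value": 140.0}]
merged_rows = merge_dietary_sources(gdd_ia, nhanes)
```

Groups that NHANES does not report (here the USA oil row) keep their GDD-IA value, which matches the "every food group it covers" wording above. The numbers are made up for illustration.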
Output Format
dietary_intake.csv:
unit,item,country,age,year,value
g/day (milk equiv),dairy,USA,All ages,2018,...
g/day (fresh wt),fruits,USA,All ages,2018,...
...
- unit: g/day (fresh wt), g/day (milk equiv) (dairy), or g/day (refined sugar eq) (sugar).
- item: food group name.
- country: ISO 3166-1 alpha-3 code.
- age: All ages for GDD-IA rows; NHANES uses the configured diet.baseline_age literal (the FPED single population-mean row).
- year: reference year.
- value: mean daily intake in grams per person, in model basis.
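A file in this layout can be read with the standard csv module. The snippet is a sketch only: the sample values are invented for illustration (the source shows them elided), and load_intake is a hypothetical helper name.

```python
import csv
import io

# Illustrative sample in the documented column order; the numbers are made up.
SAMPLE = """unit,item,country,age,year,value
g/day (milk equiv),dairy,USA,All ages,2018,250.0
g/day (fresh wt),fruits,USA,All ages,2018,120.0
"""


def load_intake(handle) -> dict[tuple[str, str], tuple[float, str]]:
    """Map (country, item) to (value in g/day, unit label)."""
    table = {}
    for row in csv.DictReader(handle):
        table[(row["country"], row["item"])] = (float(row["value"]), row["unit"])
    return table


intake = load_intake(io.StringIO(SAMPLE))
```

Keeping the unit label next to the value makes it harder to mix milk-equivalent or refined-sugar-equivalent grams with fresh-weight grams by accident.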
Baseline Diet Estimation
The dietary intake stage produces food-group-level totals. The
optimization model operates at the level of individual foods, so
workflow/scripts/estimate_baseline_diet.py disaggregates the totals
into per-(country, food) consumption estimates and applies a small
number of consistency fixes:
Step 1: Food group totals
For groups in health.risk_factors (currently fruits,
vegetables, whole_grains, legumes, nuts_seeds,
red_meat) the per-country total is taken from GBD when GBD reports
a value and falls back to the merged GDD-IA/NHANES value otherwise.
GBD strictly takes precedence on these groups — no averaging — so the
baseline is on the same intake basis the GBD relative-risk functions are
calibrated against. All other groups use GDD-IA (or NHANES for the USA).
GBD exposure is converted to the model’s basis at load time, per
food-group, using diet.source_basis plus per-(source, country,
food_group) overrides from data/curated/diet_source_basis_overrides.csv
and the conversion tables in diet.weight_conversion. The script also
logs cross-validation metrics: median and range of the GDD-IA/GBD ratio
across countries for every risk group, and GBD’s milk exposure as a
cross-check on the dairy total.
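The precedence rule of Step 1 can be written as a small selection function. This is a sketch under assumptions: the function name and the use of None for "GBD reports no value" are hypothetical, and the group list mirrors the current health.risk_factors given above.

```python
from typing import Optional

# Current contents of health.risk_factors as listed in the text.
RISK_GROUPS = {"fruits", "vegetables", "whole_grains", "legumes", "nuts_seeds", "red_meat"}


def select_group_total(group: str,
                       gbd_value: Optional[float],
                       merged_value: float) -> tuple[float, str]:
    """Pick the per-country group total and report which source supplied it."""
    if group in RISK_GROUPS and gbd_value is not None:
        # GBD strictly takes precedence; the two sources are never averaged.
        return gbd_value, "GBD"
    return merged_value, "GDD-IA/NHANES"
```

For example, select_group_total("fruits", 95.0, 120.0) yields the GBD value, while a non-risk group such as oil always falls through to the merged GDD-IA/NHANES value.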
Step 1b: Cereal residual fix
GBD’s whole_grains risk factor is defined narrowly (dry whole-grain
flour). GDD-IA’s whole_grains is broader (any product with
substantial whole-grain content). When Step 1 anchors whole_grains
to GBD, ~250 kcal/day of cereal energy can disappear from the country’s
cereal budget. To preserve the cereal energy budget, the deficit is
reassigned to refined grain.
The IA cereal kcal pool comes from gdd_ia_kcal_target.csv
(basis-aware), not from nutrition.csv per-group averages.
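One way to read the residual fix is as an energy balance on the cereal pool. The sketch below is an interpretation, not the script's formula: the function name, the energy-density arguments, and the clamp at zero are assumptions made for illustration.

```python
def refined_grain_after_anchor(ia_refined_kcal: float,
                               ia_whole_kcal: float,
                               gbd_whole_grain_g: float,
                               whole_kcal_per_g: float,
                               grain_kcal_per_g: float) -> float:
    """Return refined-grain grams that keep the IA cereal energy budget intact."""
    # Cereal energy pool as reported by GDD-IA (refined plus whole grain).
    cereal_pool_kcal = ia_refined_kcal + ia_whole_kcal
    # Energy that the GBD-anchored whole_grains total actually accounts for.
    anchored_whole_kcal = gbd_whole_grain_g * whole_kcal_per_g
    # Whatever the anchor leaves uncovered is reassigned to refined grain.
    refined_kcal = max(cereal_pool_kcal - anchored_whole_kcal, 0.0)
    return refined_kcal / grain_kcal_per_g


# Illustrative numbers only: 1200 kcal cereal pool, 15 g/day GBD whole grains.
grain_g = refined_grain_after_anchor(900.0, 300.0, 15.0, 3.4, 3.6)
```

With these made-up inputs the anchor covers 51 kcal of the 300 kcal that GDD-IA attributed to whole grains, so the remaining 249 kcal moves to refined grain on top of the original 900 kcal.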
Step 1c: Anchor-aware kcal normalisation
For each country, the unanchored groups are scaled by a single
multiplicative factor so that total kcal across all groups lands on
the in-scope dietary-energy target from gdd_ia_kcal_target.csv
(total energy minus the out-of-scope subtotal). GBD-anchored groups
and the refined-grain residual from Step 1b are held fixed. The factor
is clipped to [0.1, 5.0] to guard against pathological values;
the mean, std, and range of the factor across countries are logged.
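The scaling factor follows from holding the anchored energy fixed and solving for the multiplier on the rest. A minimal sketch, assuming per-country energy totals are already available; the function name and the fallback of 1.0 when no unanchored energy exists are hypothetical choices.

```python
def normalisation_factor(target_kcal: float,
                         fixed_kcal: float,
                         free_kcal: float,
                         lo: float = 0.1,
                         hi: float = 5.0) -> float:
    """Factor applied to unanchored groups so total energy meets the target.

    fixed_kcal: energy of GBD-anchored groups plus the refined-grain residual.
    free_kcal:  energy of the unanchored groups before scaling.
    """
    if free_kcal <= 0.0:
        return 1.0  # nothing to scale (assumed fallback)
    raw = (target_kcal - fixed_kcal) / free_kcal
    # Clip to the documented [0.1, 5.0] range to guard against pathological values.
    return min(max(raw, lo), hi)
```

For a country with a 2400 kcal in-scope target, 600 kcal held fixed, and 1500 kcal in unanchored groups, the factor is 1800 / 1500 = 1.2.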
Step 3: Per-food consumption
Per-food consumption is the product of the food-group total (post Steps 1b and 1c) and the within-group share.
As a validation check, the within-group sums are verified to match the group totals to within 0.1 g/day (excluding foods that will be replaced by FBS overrides in Step 4).
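The product and the tolerance check can be sketched together. This is an illustration with hypothetical names; it does not model the exclusion of FBS-override foods from the check, and the food names and numbers are made up.

```python
def per_food_consumption(group_total_g: float,
                         shares: dict[str, float],
                         tol: float = 0.1) -> dict[str, float]:
    """Split a group total (g/day) across foods by their within-group shares."""
    foods = {food: group_total_g * share for food, share in shares.items()}
    # Validation: per-food values must add back up to the group total.
    if abs(sum(foods.values()) - group_total_g) > tol:
        raise ValueError("within-group sum deviates from group total by more than tol")
    return foods


grain_foods = per_food_consumption(300.0, {"wheat": 0.6, "rice": 0.4})
```

Shares that do not sum to one (for example 0.6 and 0.3) trip the check as soon as the shortfall exceeds 0.1 g/day.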
Step 4: FBS supply overrides
For foods listed in diet.fbs_override_foods the Step-3 estimate is
replaced with an FBS-supply-anchored intake. The override formula is

\[ \text{intake}_{i,f} = S_{i,f} \, \sigma_{i,f} \, r_f \, \bigl(1 - w_{i,g(f)}\bigr) \]

(converted from kg/capita/year to g/person/day), where
- \(S_{i,f}\) is the FAOSTAT FBS supply (kg/capita/year) for the food’s FBS items (carcass-weight for meat);
- \(\sigma_{i,f}\) is the within-FBS-item share (1.0 unless several override foods share an FBS code, in which case the supply is split between them by country-level QCL production weights, e.g. dairy / dairy-buffalo both map to FBS 2848);
- \(r_f\) is the carcass-to-retail factor for meat (0.67 cattle, 0.73 pig, 0.66 sheep, 0.60 chicken; 1.0 for non-meat foods);
- \(w_{i,g(f)}\) is the country- and group-level consumer-waste fraction from processing/{name}/food_loss_waste.csv.
Note that the override deducts only consumer waste, not
supply-chain loss: the FAOSTAT FBS “Food supply” element is already net
of production-side losses (production − feed − seed − processing −
other − losses = food). The \((1-w)\) factor lands the override on
the same post-FLW intake basis the model’s food_processing and
animal_production links deliver after applying their FLW
multipliers, so the diet mass-balances against the food bus.
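As a numerical illustration of the override, the sketch below combines the four quantities defined above. It is a sketch under assumptions: the function name is hypothetical, the dictionary keys are illustrative labels rather than model food names, and the kg/capita/year to g/person/day conversion uses 1000/365, which may differ from the constant the script uses.

```python
# Carcass-to-retail factors quoted in the text; keys are illustrative labels.
CARCASS_TO_RETAIL = {"cattle": 0.67, "pig": 0.73, "sheep": 0.66, "chicken": 0.60}


def fbs_override_intake(supply_kg_per_year: float,
                        item_share: float,
                        carcass_to_retail: float,
                        consumer_waste: float) -> float:
    """FBS-anchored intake in g/person/day, net of consumer waste only."""
    grams_per_day = supply_kg_per_year * 1000.0 / 365.0  # assumed unit conversion
    return grams_per_day * item_share * carcass_to_retail * (1.0 - consumer_waste)


# 36.5 kg/capita/year of chicken supply, sole food on its FBS item, 20 % consumer waste.
chicken_g = fbs_override_intake(36.5, 1.0, CARCASS_TO_RETAIL["chicken"], 0.20)
```

Here 36.5 kg/year is 100 g/day at carcass weight, 60 g/day at retail weight, and 48 g/day after consumer waste. Supply-chain loss is not deducted because FBS food supply is already net of it.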
Why yam needs an override
GDD-IA / GBD starchy-vegetable intake for sub-Saharan Africa is well below FAOSTAT food supply (e.g. Nigeria: GBD ≈ 70 g/day vs. FBS ≈ 700 g/day for starchy vegetables). Because yam production is almost entirely concentrated in West Africa, the within-group underestimate translates directly into a ~10× underestimate of yam demand. The within-group shares are correct — the problem is in the group total — so overriding yam consumption with FBS supply ensures the model’s demand matches observed food availability.
Why animal products use FBS, not survey intake
For meats, poultry, and eggs the per-food intake is anchored to FAOSTAT FBS supply rather than the survey-disaggregated group total. Three reasons:
- Survey bias on socially significant foods. Self-reported food intake systematically over-reports red meat against slaughter-volume supply in many populations. GDD-IA harmonises survey data but does not reconcile against production. The combined intake total for red meat sat ~24 Mt/yr above what total world supply (production net of feed/non-food/exports, after post-loss and consumer waste) can deliver, which is physically impossible, and previously inflated the calibrated animal_feed_l1_cost ninefold because the production-stability calibration was forced to fight intake-derived consumer values that were structurally above supply.
- Trade is handled implicitly. FBS supply per country already encodes production + imports − exports − feed − seed − non-food − stock_changes, so country-level diet automatically reflects observed importer/exporter patterns. The model’s trade hubs then only have to reproduce the observed FAOSTAT trade flows at solve time, instead of resolving a mismatch via expensive feed-deviation L1 penalties.
- Same FAOSTAT backbone as production. Baseline animal production is built from QCL element 5510 with the shared weight_conversion.carcass_to_fresh table applied. FBS aggregates the same QCL primary commodities at carcass-weight balance level. Anchoring both sides to FAOSTAT removes a class of unit/source mismatches that otherwise surfaces as residual slack after solve.
Dairy is intentionally excluded from the override list. Its
food_loss_waste convention is non-standard — the curated dairy
override sets loss_fraction=0 and waste_fraction=0.30, where
the 30 % lumps in non-food uses of raw milk (calf feed, processing,
industrial) plus retail and consumer waste, because the model does
not have an explicit non-food milk outlet. Under that convention the
GDD-IA-based dairy total happens to mass-balance against the
production-side QCL × 0.7 delivered to the food bus. Switching
dairy to an FBS override would break that balance.
Output
processing/{name}/baseline_diet.csv has one row per (country, food):
| Column | Description |
|---|---|
|  | ISO 3166-1 alpha-3 country code. |
|  | Model food name (e.g. …). |
|  | Food group to which the food belongs. |
|  | Estimated daily consumption in grams per person, on a post-loss, post-waste consumer-eaten intake basis; this is the same basis the food bus delivers after the build_model FLW multiplier (see Weight bases for animal products). |
Rows are sorted by (country, food_group, food).
Downstream Uses
- Baseline diet enforcement: when config.validation.enforce_baseline_diet is true, the solver adds per-food, per-country equality constraints on food consumption links.
- Within-group ratio fixing: when config.food_groups.fix_within_group_ratios is true, foods within each group are constrained to keep their baseline proportions while group totals may vary.
- Piecewise consumer utility calibration: baseline per-food consumption and baseline food-equality duals together calibrate results/{name}/consumer_values/utility_blocks.csv (Consumer Values).
- Health impact assessment: baseline consumption feeds the population-attributable fraction calculation (Health Impacts).
Workflow Integration
Snakemake rules (see workflow/rules/diet.smk):
- prepare_gdd_ia_dietary_intake
- prepare_nhanes_dietary_intake
- merge_dietary_sources
- prepare_gbd_food_group_intake
- prepare_faostat_fbs_items
- prepare_food_loss_waste
- estimate_baseline_diet
- validate_baseline_diet and compare_baseline_diet_to_gbd (consistency checks)
Input data:
- data/manually_downloaded/GDD-IA-intake_grams_{baseline_year}.csv
- data/manually_downloaded/GDD-IA-intake_kcals_{baseline_year}.csv
- data/manually_downloaded/IHME_GBD_2019_DIET_RISK_1990_2019_DATA/*.csv
- data/downloads/usda_fped/Table_1_FPED_MaleFemale_{cycle}.pdf
- FAOSTAT FBS and QCL (auto-fetched via the FAOSTAT bulk API)
Curated data files:
| File | Purpose |
|---|---|
|  | Maps model foods to FAOSTAT FBS item codes for within-group share calculation. |
|  | Maps foods sharing an FBS item to QCL production codes for disambiguation. |
|  | Food → food-group mapping. |
|  | Per-food native mass basis (dry / fresh / cooked / milk-equiv). |
|  | Per-(source, country, food_group) basis overrides for the cross-source conversion. |
|  | FPED column → model food-group mapping and unit-conversion factors. |
|  | Per-(country, food_group) loss/waste overrides feeding … |
Configuration parameters:
- config.countries: list of countries.
- config.food_groups.included: food groups to process.
- config.baseline_year: reference year for GDD-IA and GBD.
- config.diet.baseline_age: age label written to NHANES rows (default "All ages").
- config.diet.fbs_override_foods: foods anchored to FBS supply. See Why animal products use FBS.
- config.diet.source_basis and config.diet.weight_conversion: per-source native bases and conversion tables.
- config.diet.gdd_ia.cooked_to_raw: per-group cooked→raw inflation factors for GDD-IA (currently red_meat: 1.43).
- config.diet.gdd_ia.country_proxies: extra proxies beyond the defaults in prepare_gdd_ia_dietary_intake.py.
- config.diet.nhanes.cycle and .reference_year: FPED release.
- config.health.risk_factors: drives which food groups are anchored to GBD in Step 1.
- config.byproducts: foods excluded from share calculation.
Output:
- processing/{name}/gdd_ia_dietary_intake.csv: GDD-IA group-level intake.
- processing/{name}/gdd_ia_kcal_target.csv: per-country kcal accounting (total dietary energy, out-of-scope subtotal, in-scope target, refined / whole-grain cereal energy split).
- processing/{name}/nhanes_dietary_intake.csv: USA NHANES override.
- processing/{name}/dietary_intake.csv: merged GDD-IA + NHANES.
- processing/{name}/gbd_food_group_intake.csv: GBD exposure.
- processing/{name}/baseline_diet.csv: per-food, per-country baseline diet.
Scripts:
- workflow/scripts/prepare_gdd_ia_dietary_intake.py
- workflow/scripts/prepare_nhanes_dietary_intake.py
- workflow/scripts/merge_dietary_sources.py
- workflow/scripts/prepare_gbd_food_group_intake.py
- workflow/scripts/estimate_baseline_diet.py
- workflow/scripts/diet/food_group_projection.py: within-group pooled-projection helpers (FBS-code pools, production-share blends).