Current Diets¶
Overview¶
The model uses a hybrid approach to represent current consumption patterns, combining empirical dietary intake data from the Global Dietary Database (GDD) [GDD2024] [Miller2021] with food supply data from FAOSTAT Food Balance Sheets (FBS). This baseline data serves multiple purposes:
Health impact assessment: Calculating disease burden attributable to current dietary patterns
Baseline reference: Comparing optimized diets against current consumption
Model constraints: Optionally constraining the optimization to remain near current diets
Figure: Population-weighted mean food group consumption (g/person/day) by UN M49 macro-region, showing how dietary patterns vary across world regions.
Figure: Global population-weighted mean consumption (g/person/day) broken down by individual foods within each food group.
Data Sources¶
- Global Dietary Database (GDD)
Provider: Tufts University Friedman School of Nutrition Science and Policy
Coverage: 185 countries, individual-level dietary surveys (1990-2018)
Variables: 54 dietary factors including foods, beverages, and nutrients
Download: Requires free registration at https://globaldietarydatabase.org/data-download
Citation: [GDD2024]
- FAOSTAT Food Balance Sheets (FBS)
Provider: FAO Statistics Division
Coverage: Global, annual estimates of food supply
Variables: Food supply quantity (kg/capita/year)
Usage: Supplements GDD for food groups where intake survey data is sparse or inconsistent (Dairy, Poultry, Vegetable Oils)
Weight Conventions¶
GDD reports all dietary intake values in grams per day using “as consumed” weights [Miller2021]. This means:
Fresh vegetables and fruits: Reported in fresh weight (e.g., a fresh banana, fresh tomato)
Grains: Reported in cooked weight (e.g., cooked rice, prepared bread)
Dairy: Reported as total milk equivalents, which includes milk, yogurt, cheese and other dairy products converted to their milk equivalent weight
Meats: Reported in cooked/prepared weight
The model preserves these conventions in the processed output files. Units in the output CSV distinguish between general fresh weight (g/day (fresh wt)), dairy milk equivalents (g/day (milk equiv)), and refined sugar equivalents (g/day (refined sugar eq)).
GDD to Food Group Mapping¶
The model maps GDD dietary variables to the food groups defined in config/food_groups. This mapping is implemented in workflow/scripts/prepare_gdd_dietary_intake.py.
Food Groups with GDD Data¶
The following food groups are populated from GDD variables:
| Food Group | GDD Code | Description |
|---|---|---|
| fruits | v01 | Total fruits (whole fruits only, excluding juices) |
| vegetables | v02 | Non-starchy vegetables |
| starchy_vegetable | v03, v04 | Potatoes + other starchy vegetables (aggregated) |
| legumes | v05 | Beans and legumes |
| nuts_seeds | v06 | Nuts and seeds |
| grain | v07 | Refined grains (white flour, white rice) |
| whole_grains | v08 | Whole grains |
| red_meat | v10 | Unprocessed red meats (cattle, pig) |
| prc_meat | v09 | Total processed meats |
| eggs | v12 | Eggs |
| sugar | v15, v35 | Sugar-sweetened beverages and added sugars |
| stimulants | v17, v18 | Coffee and tea. GDD reports these in cups/day (brewed beverage); the script converts to dry commodity weight using configured factors (default: 14.4 g-dry/cup for coffee, 2.4 g-dry/cup for tea). Cocoa is not covered by GDD and enters only via FAOSTAT within-group shares. |
Notes:
Multiple GDD variables can map to a single food group (e.g., starchy_vegetable = v03 potatoes + v04 other starchy veg)
When aggregating, values are summed within each food group
The fruits food group uses only v01 (whole fruits), excluding v16 (fruit juices), to align with the GBD fruit risk factor definition used in health impact modeling
GDD also tracks fish/seafood (v11), but fish is not currently modelled as a food group
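The mapping and summation rules above can be illustrated with a minimal Python sketch. It is not the code in prepare_gdd_dietary_intake.py: the dictionary names, the function name, and the intake values are hypothetical, and only a subset of the mapping is shown. The group name stimulants for coffee and tea is assumed from the food group list used elsewhere on this page.

```python
from collections import defaultdict

# Illustrative subset of the GDD-to-food-group mapping (names are assumptions).
GDD_TO_FOOD_GROUP = {
    "v01": "fruits",
    "v03": "starchy_vegetable",
    "v04": "starchy_vegetable",
    "v17": "stimulants",  # coffee, reported by GDD in cups/day
    "v18": "stimulants",  # tea, reported by GDD in cups/day
}

# Default dry-weight factors quoted in the table (g dry commodity per cup).
CUPS_TO_DRY_GRAMS = {"v17": 14.4, "v18": 2.4}


def aggregate_to_food_groups(intake_by_code):
    """Sum GDD variables into food groups, converting cups/day where needed."""
    totals = defaultdict(float)
    for code, value in intake_by_code.items():
        group = GDD_TO_FOOD_GROUP.get(code)
        if group is None:
            continue  # unmapped variables such as v11 (fish) or v16 (juice)
        totals[group] += value * CUPS_TO_DRY_GRAMS.get(code, 1.0)
    return dict(totals)


# Hypothetical intake values for one country and age group.
example = {"v01": 120.0, "v03": 40.0, "v04": 25.0, "v16": 30.0, "v17": 1.5, "v18": 0.5}
group_totals = aggregate_to_food_groups(example)
```

In this sketch, v03 and v04 are summed into starchy_vegetable, v16 is dropped because it has no mapping, and coffee and tea are converted from cups to grams of dry commodity before being added together.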
Food Groups Sourced from FAOSTAT¶
The following food groups are populated from FAOSTAT Food Balance Sheets (FBS) because intake survey data (GDD) is often sparse, inconsistent, or structurally missing for these commodities:
| Food Group | Description & Source Items |
|---|---|
| dairy | Total Milk Equivalent. Aggregated from FAOSTAT items: Milk - Excluding Butter (2848), Butter/Ghee (2740), and Cream (2743). Butter and cream are converted to milk equivalents using FAO dairy commodity tree extraction rates (≈21.3× for butter/ghee, ≈6.7× for cream); milk-excl.-butter is taken as-is. |
| poultry | Poultry Meat (2734). |
| oil | Vegetable Oils (2586). |
Methodology for FAOSTAT Data: FAOSTAT reports “Food Supply” (retail weight), which typically includes household waste. The model converts this to “Dietary Intake” (consumed weight) by applying country-specific waste fractions derived from the UNSD Food Waste Index (see Food Processing & Trade).
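A minimal sketch of the supply-to-intake conversion follows. The function names and supply figures are hypothetical, the waste correction is assumed to be multiplicative (supply times one minus the waste fraction), and a 365-day year is assumed for the unit conversion; the actual script may differ on any of these points.

```python
GRAMS_PER_KG = 1000.0
DAYS_PER_YEAR = 365.0  # assumption: the real script may use another day count

# Milk-equivalent factors quoted in the table above, keyed by FBS item code.
MILK_EQUIVALENT_FACTOR = {2848: 1.0, 2740: 21.3, 2743: 6.7}


def supply_to_intake(supply_kg_per_year, waste_fraction):
    """Convert per-capita supply (kg/year) to intake (g/day) net of waste."""
    if not 0.0 <= waste_fraction < 1.0:
        raise ValueError("waste_fraction must be in [0, 1)")
    grams_per_day = supply_kg_per_year * GRAMS_PER_KG / DAYS_PER_YEAR
    return grams_per_day * (1.0 - waste_fraction)


def dairy_milk_equivalent(supply_by_item):
    """Sum FBS dairy items (kg/capita/year) as total milk equivalents."""
    return sum(MILK_EQUIVALENT_FACTOR[item] * kg for item, kg in supply_by_item.items())


# Hypothetical supply figures for one country (kg/capita/year).
dairy_supply = dairy_milk_equivalent({2848: 80.0, 2740: 1.0, 2743: 0.5})
dairy_intake = supply_to_intake(dairy_supply, waste_fraction=0.10)
```

Butter and cream contribute far more per kilogram than liquid milk because each kilogram represents many kilograms of raw milk, which is why the conversion is applied before the waste correction.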
Data Processing¶
The dietary data processing pipeline involves three stages:
Prepare GDD Data (workflow/scripts/prepare_gdd_dietary_intake.py): Processes GDD survey data for most food groups.
Prepare FAOSTAT Data (workflow/scripts/prepare_faostat_gdd_supplements.py): Fetches FAOSTAT supply data for dairy, poultry, and oil; converts supply to intake by subtracting waste; fills missing countries using proxies.
Merge Sources (workflow/scripts/merge_dietary_sources.py): Combines the datasets into a unified dietary_intake.csv.
The GDD processing step (Step 1) performs the following:
Load GDD files: Read country-level CSV files (v*_cnty.csv) for each dietary variable
Filter to reference year: Extract data for config.health.reference_year (default: 2018)
Map age groups: Convert GDD age midpoints to GBD-compatible age buckets (0-1, 1-2, 2-5, 6-10, 11-74, 75+ years)
Aggregate strata: Use GDD’s pre-computed population-weighted national aggregate rows (female=999, urban=999, edu=999) rather than a simple mean across all demographic strata, which would ignore stratum sizes. Falls back to a simple mean only when aggregate rows are absent.
Map to food groups: Apply the GDD-to-food-group mapping defined in the script
Aggregate variables: Sum multiple GDD variables that map to the same food group (preserving age stratification)
Handle missing countries: Apply proxies for territories without separate GDD data
Validate completeness: Ensure all required countries and food groups are present
Output: Write processing/{name}/gdd_dietary_intake.csv with age-stratified data
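The stratum aggregation rule can be sketched as follows. This is an illustration only: rows are represented as plain dictionaries, and the value key and the function name are hypothetical rather than taken from the GDD files or the script.

```python
from statistics import mean

AGGREGATE_CODE = 999  # GDD marker for "all categories" in a stratum column


def national_value(rows):
    """Return the national estimate from GDD-style stratum rows.

    Prefers the pre-computed aggregate row (female=999, urban=999, edu=999)
    and falls back to an unweighted mean of the rows when none exists.
    """
    aggregate = [
        r["value"]
        for r in rows
        if r["female"] == r["urban"] == r["edu"] == AGGREGATE_CODE
    ]
    if aggregate:
        return aggregate[0]
    return mean(r["value"] for r in rows)


# Hypothetical rows for one country, variable, and age group.
rows = [
    {"female": 0, "urban": 0, "edu": 1, "value": 100.0},
    {"female": 1, "urban": 1, "edu": 2, "value": 140.0},
    {"female": 999, "urban": 999, "edu": 999, "value": 125.0},
]
with_aggregate = national_value(rows)
without_aggregate = national_value(rows[:2])
```

The two results differ (125.0 versus 120.0) because the aggregate row reflects stratum sizes while the fallback mean treats every stratum as equally large.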
Output Format¶
The processed dietary intake file has the following structure:
unit,item,country,age,year,value
g/day (milk equiv),dairy,USA,0-1 years,2018,252.3
g/day (milk equiv),dairy,USA,1-2 years,2018,258.3
g/day (milk equiv),dairy,USA,11-74 years,2018,174.6
g/day (milk equiv),dairy,USA,All ages,2018,187.1
g/day (fresh wt),fruits,USA,11-74 years,2018,145.2
...
Where:
unit: Weight convention specific to the food group
    g/day (fresh wt): Fresh/cooked “as consumed” weight for most foods
    g/day (milk equiv): Total milk equivalents for dairy
    g/day (refined sugar eq): Refined sugar equivalent for the sugar food group
item: Food group name
country: ISO 3166-1 alpha-3 country code
age: Age group using GBD-compatible naming
    0-1 years: Infants under 1 year
    1-2 years: Toddlers 1-2 years
    2-5 years: Early childhood 2-5 years
    6-10 years: Middle childhood 6-10 years
    11-74 years: Adults 11-74 years
    75+ years: Elderly 75+ years
    All ages: Population-weighted average across all age groups
year: Reference year
value: Mean daily intake in grams per person for the specified age group
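As a usage illustration, a file in this format can be read with the Python standard library alone. The sketch below parses an inline sample copied from the rows above; the function name and the choice to key results by (country, item) are assumptions made for the example.

```python
import csv
import io

SAMPLE = """unit,item,country,age,year,value
g/day (milk equiv),dairy,USA,11-74 years,2018,174.6
g/day (milk equiv),dairy,USA,All ages,2018,187.1
g/day (fresh wt),fruits,USA,11-74 years,2018,145.2
"""


def load_intake(text, age="11-74 years"):
    """Return {(country, item): (value, unit)} for a single age group."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["age"] == age:
            key = (row["country"], row["item"])
            result[key] = (float(row["value"]), row["unit"])
    return result


adult_intake = load_intake(SAMPLE)
```

Keeping the unit next to each value guards against accidentally adding milk equivalents to fresh-weight quantities.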
Country Coverage¶
The GDD dataset covers 185 countries. For a small number of territories without separate dietary surveys, the model uses proxy data from similar countries:
American Samoa (ASM): Uses Samoa (WSM) data
French Guiana (GUF): Uses France (FRA) data
Puerto Rico (PRI): Uses USA data
Somalia (SOM): Uses Ethiopia (ETH) data
These proxies are defined in the COUNTRY_PROXIES dictionary in prepare_gdd_dietary_intake.py.
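A sketch of how such a proxy table might be applied is shown below. The dictionary contents come from the list above, but its exact structure in the script, the function name, and the combination with a completeness check are assumptions.

```python
COUNTRY_PROXIES = {"ASM": "WSM", "GUF": "FRA", "PRI": "USA", "SOM": "ETH"}


def fill_proxies(intake_by_country, required, proxies=COUNTRY_PROXIES):
    """Copy a proxy country's records to each required country lacking data.

    Raises KeyError if a required country has neither data nor a usable proxy.
    """
    filled = dict(intake_by_country)
    for country in required:
        if country in filled:
            continue  # real data always takes precedence over a proxy
        source = proxies.get(country)
        if source is None or source not in filled:
            raise KeyError(f"no data or usable proxy for {country}")
        filled[country] = dict(filled[source])
    return filled


# Hypothetical food-group intake (g/day) for two surveyed countries.
surveyed = {"USA": {"fruits": 145.2}, "ETH": {"fruits": 20.0}}
complete = fill_proxies(surveyed, required=["USA", "ETH", "PRI", "SOM"])
```

Copying the source record, rather than sharing it, means later adjustments to a territory do not silently change the country it was copied from.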
GBD Dietary Risk Exposure Data¶
In addition to the GDD survey data and FAOSTAT supplements described above, the model also incorporates dietary exposure estimates from the Global Burden of Disease (GBD) Study 2019 [Brauer2024]. These estimates cover adults aged 25 and older and are derived from the GBD’s dietary risk factor framework (see the “GBD Dietary Risk Factors” section in Health Impacts for the full risk factor definitions).
The GBD data provides country-level intake estimates (g/day) for the following food groups:
| GBD Risk Factor | Model Food Group | Notes |
|---|---|---|
| | fruits | Whole fruits, excluding juices |
| | vegetables | Non-starchy vegetables |
| | whole_grains | Whole grains |
| | legumes | Beans and pulses |
| | nuts_seeds | Nuts, seeds, and peanuts |
| | red_meat | Unprocessed red meats |
| | (cross-validation only) | Logged for comparison against FAOSTAT dairy |
These six food groups (excluding milk) overlap with GDD estimates, enabling
both cross-validation and averaging to produce more robust group totals. The
GBD data is processed by workflow/scripts/prepare_gbd_dietary_risk_exposure.py,
which filters the raw IHME CSV files for the configured reference year, maps
GBD location names to ISO3 country codes, and outputs
processing/{name}/gbd_dietary_risk_exposure.csv.
Baseline Diet Estimation¶
The dietary intake pipeline described above produces food-group-level totals (e.g., “fruits: 145 g/day in the USA”). The model, however, operates at the level of individual foods (e.g., banana, citrus). The baseline diet estimation algorithm bridges this gap by combining food-group totals with FAOSTAT item-level supply data to produce per-food, per-country consumption estimates.
This algorithm is implemented in workflow/scripts/estimate_baseline_diet.py
and proceeds in four steps.
Step 1: Food Group Totals¶
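For food groups where both GDD and GBD estimates are available, the two sources are averaged to produce a more robust estimate:

\[ T_{i,g} = \tfrac{1}{2}\left(T^{\mathrm{GDD}}_{i,g} + T^{\mathrm{GBD}}_{i,g}\right) \]

where \(T_{i,g}\) is the total for food group \(g\) in country \(i\).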
This averaging applies to six food groups: fruits, vegetables,
whole_grains, legumes, nuts_seeds, and red_meat. If GBD data
is missing for a particular country, the GDD value is used alone.
For all other food groups (dairy, poultry, oil, grain,
starchy_vegetable, prc_meat, eggs, sugar,
stimulants), the GDD or FAOSTAT value from dietary_intake.csv
is used as-is.
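The selection logic for group totals can be sketched as below. The sketch assumes an unweighted mean of the two sources, and the function and constant names are hypothetical rather than taken from estimate_baseline_diet.py.

```python
AVERAGED_GROUPS = {
    "fruits", "vegetables", "whole_grains", "legumes", "nuts_seeds", "red_meat",
}


def group_total(group, intake_value, gbd_value=None):
    """Combine the dietary_intake.csv value with a GBD estimate if applicable.

    intake_value is the GDD or FAOSTAT figure (g/day); gbd_value is the GBD
    estimate for the same country and group, or None when it is missing.
    """
    if group in AVERAGED_GROUPS and gbd_value is not None:
        return 0.5 * (intake_value + gbd_value)  # assumed unweighted mean
    return intake_value


# Hypothetical values in g/day.
fruits_both = group_total("fruits", 140.0, 150.0)
fruits_gdd_only = group_total("fruits", 140.0)
dairy = group_total("dairy", 180.0, 200.0)
```

Here dairy ignores the GBD value even when one is supplied, mirroring the rule that GBD milk is logged for comparison but not blended into the total.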
The script also logs cross-validation metrics between GDD and GBD for the overlapping groups, reporting the median and range of the GDD/GBD ratio across countries. GBD milk intake is logged separately for comparison against the FAOSTAT-derived dairy estimate.
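A sketch of the cross-validation summary, assuming the ratio is computed per country and only where both sources report a positive value; the function name and input layout are hypothetical.

```python
from statistics import median


def ratio_summary(gdd_by_country, gbd_by_country):
    """Median and (min, max) of the GDD/GBD ratio over shared countries."""
    ratios = [
        gdd_by_country[c] / gbd_by_country[c]
        for c in gdd_by_country
        if c in gbd_by_country and gbd_by_country[c] > 0
    ]
    if not ratios:
        return None  # nothing to compare
    return median(ratios), (min(ratios), max(ratios))


# Hypothetical intake of one food group (g/day) in four countries.
gdd = {"A": 100.0, "B": 150.0, "C": 90.0, "D": 10.0}
gbd = {"A": 100.0, "B": 100.0, "C": 120.0}
summary = ratio_summary(gdd, gbd)
```

A median close to 1 with a narrow range would indicate that the two sources broadly agree for that food group.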
The age group used for baseline totals is configured via
config.diet.baseline_age (default: "11-74 years").
Step 3: Per-Food Consumption¶
The final per-food consumption estimate is the product of the food-group total and the within-group share:

\[ c_{i,f} = T_{i,g(f)} \cdot s_{i,f} \]
where \(c_{i,f}\) is the estimated consumption (g/day) of food \(f\) in country \(i\), \(T_{i,g(f)}\) is the group total for the food group containing \(f\), and \(s_{i,f}\) is the within-group share.
As a validation check, within-group sums are verified to match group totals within a tolerance of 0.1 g/day. Any discrepancies are logged as warnings.
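The product and the tolerance check can be sketched for a single country and food group as follows. The function name, the share values, and the food name apple are hypothetical; the 0.1 g/day tolerance is the one stated above.

```python
import logging

logger = logging.getLogger(__name__)
TOLERANCE_G_PER_DAY = 0.1


def per_food_consumption(group_total, shares):
    """Split a food-group total (g/day) across foods by within-group share."""
    consumption = {food: group_total * share for food, share in shares.items()}
    gap = abs(sum(consumption.values()) - group_total)
    if gap > TOLERANCE_G_PER_DAY:
        # Shares that do not sum to one show up here as a mismatch.
        logger.warning("within-group sum differs from total by %.2f g/day", gap)
    return consumption


# Hypothetical shares for the fruits group in one country.
fruit = per_food_consumption(145.2, {"banana": 0.5, "citrus": 0.3, "apple": 0.2})
```

Because the shares sum to one, the per-food values add back up to the 145.2 g/day group total and no warning is logged.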
Step 4: FBS Supply Overrides¶
For specific foods where GDD survey-based group totals substantially
underestimate actual consumption, the per-food estimate from Step 3 is replaced
with waste-corrected FAOSTAT Food Balance Sheet supply data. The list of
overridden foods is configured via config.diet.fbs_override_foods (default:
["yam"]).
For each overridden food and country, the replacement consumption is:

\[ c_{i,f} = S_{i,f} \cdot \frac{1000}{365} \cdot \left(1 - w_{i,g(f)}\right) \]
where \(S_{i,f}\) is the FAOSTAT food supply (kg/capita/year) summed across the food’s FBS item codes, and \(w_{i,g(f)}\) is the country- and group-level consumer waste fraction from the food loss and waste dataset.
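A minimal sketch of the override for one country is given below. It assumes a multiplicative waste correction and a 365-day year for converting kg/capita/year to g/day; the function name and all numbers are hypothetical, and cassava is used only as an example of a food that is not overridden.

```python
def apply_fbs_overrides(baseline, supply_kg_per_year, waste_fraction,
                        override_foods=("yam",)):
    """Replace selected per-food estimates with waste-corrected FBS supply.

    baseline maps food -> g/day; supply_kg_per_year maps food -> FBS supply
    in kg/capita/year; waste_fraction is the consumer waste share in [0, 1).
    """
    updated = dict(baseline)
    for food in override_foods:
        if food in supply_kg_per_year:
            grams_per_day = supply_kg_per_year[food] * 1000.0 / 365.0
            updated[food] = grams_per_day * (1.0 - waste_fraction)
    return updated


# Hypothetical values for one country.
baseline = {"yam": 20.0, "cassava": 50.0}
overridden = apply_fbs_overrides(baseline, {"yam": 73.0}, waste_fraction=0.1)
```

Only yam changes in this example; foods outside the override list keep their share-based estimate.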
Why yam needs an override
GDD starchy vegetable intake for sub-Saharan Africa is 7–33× below FAOSTAT food supply (e.g., Nigeria: GDD ≈ 72 g/day vs. FBS ≈ 700 g/day for starchy vegetables). Because yam production is almost entirely concentrated in West Africa, the GDD underestimate translates directly to a ~10× underestimate of yam demand. The within-group shares are correct — the problem is entirely in the GDD group total for starchy vegetables in these countries. Overriding yam consumption with FBS supply ensures that the model’s demand matches observed food availability.
Baseline Diet Output¶
The output file processing/{name}/baseline_diet.csv contains one row per
(country, food) combination with the following columns:
| Column | Description |
|---|---|
| country | ISO 3166-1 alpha-3 country code |
| food | Model food name |
| food_group | Food group to which the food belongs |
| | Estimated daily consumption in grams per person |
Rows are sorted by (country, food_group, food).
Downstream Uses¶
The baseline diet feeds into several parts of the model:
Baseline diet enforcement: When config.validation.enforce_baseline_diet is enabled, the solver adds per-food, per-country equality constraints on food consumption links, forcing the solution to replicate observed intake.
Within-group ratio fixing: When config.food_groups.fix_within_group_ratios is enabled, the solver constrains foods within each group to maintain their baseline proportions while allowing group totals to vary.
Piecewise consumer utility calibration: In the consumer-values workflow, baseline per-food consumption and baseline food-equality duals are combined to calibrate results/{name}/consumer_values/utility_blocks.csv. These blocks are then used in the solve objective when config.food_utility_piecewise.enabled is true. The current calibration anchors utility at baseline quantity: the block containing baseline consumption uses the extracted dual value, while blocks below baseline are more valuable and blocks above baseline are less valuable.
Health impact assessment: Baseline consumption is used when computing the population-attributable fraction of diet-related disease burden (see Health Impacts).
Workflow Integration¶
- Snakemake rules:
prepare_gdd_dietary_intake
prepare_faostat_gdd_supplements
merge_dietary_sources
prepare_gbd_dietary_risk_exposure
estimate_baseline_diet
- Input data:
data/manually_downloaded/GDD-dietary-intake/Country-level estimates/*.csv (GDD)
data/manually_downloaded/IHME_GBD_2019_DIET_RISK_1990_2019_DATA/*.csv (GBD)
FAOSTAT API (live fetch for FBS, QCL, and animal production data)
- Curated data files:

| File | Purpose |
|---|---|
| | Maps model foods to FAOSTAT FBS item codes for within-group share calculation |
| | Maps foods sharing an FBS item to individual QCL production codes for disambiguation |
| | Defines the food-to-food-group mapping |
- Configuration parameters:
config.countries: List of countries to process
config.food_groups.included: Food groups to filter and aggregate
config.health.reference_year: Year for GDD dietary intake data
config.diet.baseline_reference_year: Year for GBD exposure data
config.diet.baseline_age: Age group for baseline totals (default: "11-74 years")
config.diet.fbs_override_foods: Foods whose consumption is overridden with waste-corrected FBS supply (default: ["yam"])
config.byproducts: Foods to exclude from share calculation (e.g., wheat-bran)
- Output:
processing/{name}/dietary_intake.csv: Merged food-group-level intake
processing/{name}/gbd_dietary_risk_exposure.csv: GBD risk exposure estimates
processing/{name}/baseline_diet.csv: Per-food, per-country consumption
- Scripts:
workflow/scripts/prepare_gdd_dietary_intake.py
workflow/scripts/prepare_faostat_gdd_supplements.py
workflow/scripts/merge_dietary_sources.py
workflow/scripts/prepare_gbd_dietary_risk_exposure.py
workflow/scripts/estimate_baseline_diet.py
References¶
[GDD2024] Global Dietary Database. Dietary intake data by country, 2018. Tufts University Friedman School of Nutrition Science and Policy. https://www.globaldietarydatabase.org/ (accessed 2025)
[Miller2021] Miller V, Singh GM, Onopa J, et al. Global Dietary Database 2017: Data Availability and Gaps on 54 Major Foods, Beverages and Nutrients among 5.6 Million Children and Adults from 1220 Surveys Worldwide. BMJ Global Health, 2021;6(2):e003585. https://doi.org/10.1136/bmjgh-2020-003585