Configuration¶
Overview¶
The food-opt model is configuration-driven: all scenario parameters, crop selections, constraints, and solver options are defined in YAML configuration files under config/. This allows exploring different scenarios without modifying code.
The default configuration is config/default.yaml, structured into thematic sections.
Custom configuration files¶
Instead of modifying the default configuration file, it is recommended to explore individual scenarios by creating named configuration files that override specific parts of the default configuration. Such a named configuration file must contain, at minimum, a name. For example:
# config/my_scenario.yaml
name: "my_scenario" # Scenario name → results/my_scenario/
planning_horizon: 2040 # Override the default 2030 horizon
land:
  regional_limit: 0.6 # Tighten land availability
  slack_marginal_cost: 1.0e+10 # Optional: raise slack penalty during validation
emissions:
  ghg_price: 250 # Raise the carbon price above the default
Any keys omitted in your custom file fall back to the defaults shown in the sections below, so you can keep overrides concise.
By default, results are saved under results/{name}/, allowing multiple scenarios coming from different configuration files to coexist. This root (and roots for processing, logs, and benchmarks) can be overridden via paths in the config.
To build and solve the model based on the above example configuration, you would run the following:
tools/smk -j4 --configfile config/my_scenario.yaml
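The fallback to defaults behaves like a recursive (deep) merge of your file onto config/default.yaml. The sketch below only illustrates those semantics; it is not the workflow's own merge code. The `deep_merge` helper is hypothetical, lists are assumed to be replaced wholesale rather than merged element-wise, and the values in `defaults` are placeholders, not the real defaults.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` applied recursively on top of `base`.

    Nested mappings are merged key by key; any other value in `override`
    (scalars, lists) replaces the corresponding value in `base` (assumption).
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical excerpt of the defaults, with placeholder values.
defaults = {
    "planning_horizon": 2030,
    "land": {"regional_limit": 0.7, "slack_marginal_cost": 50.0},
}
custom = {"name": "my_scenario", "land": {"regional_limit": 0.6}}

config = deep_merge(defaults, custom)
# config["land"] == {"regional_limit": 0.6, "slack_marginal_cost": 50.0}
```

Only `regional_limit` changes; the sibling key and the untouched top-level `planning_horizon` keep their default values, and `defaults` itself is not mutated.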
Scenario Presets¶
The workflow supports scenario presets defined in config/scenarios.yaml that apply configuration overrides via a {scenario} wildcard. This allows exploring variations (e.g., with/without health constraints or GHG pricing) within a single configuration without duplicating config files.
Each scenario preset in scenarios.yaml contains a set of configuration overrides that are applied recursively on top of the base configuration. For example:
# config/scenarios.yaml
default:
  health:
    enabled: false
  emissions:
    ghg_pricing_enabled: false
HG:
  health:
    enabled: true
  emissions:
    ghg_pricing_enabled: true
With default path roots, the scenario name becomes part of all output paths:
Built models: results/{name}/build/model_scen-{scenario}.nc
Solved models: results/{name}/solved/model_scen-{scenario}.nc
Plots: results/{name}/plots/scen-{scenario}/
To build a specific scenario:
tools/smk -j4 --configfile config/my_scenario.yaml -- results/my_scenario/build/model_scen-HG.nc
This feature enables systematic sensitivity analysis and comparison across policy scenarios using a single configuration file.
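To build several presets in one invocation, the target paths can be generated from the preset names using the default path layout. This is a sketch that assumes tools/smk forwards several targets after "--" to Snakemake just as it forwards the single target in the example above; `scenario_targets` and `smk_command` are hypothetical helpers.

```python
def scenario_targets(name, scenarios, stage="build", results_root="results"):
    """Model file paths for each scenario preset, following the default layout."""
    return [f"{results_root}/{name}/{stage}/model_scen-{s}.nc" for s in scenarios]


def smk_command(configfile, targets, jobs=4):
    """Argument list for tools/smk; assumes everything after "--" is a target."""
    return ["tools/smk", f"-j{jobs}", "--configfile", configfile, "--", *targets]


targets = scenario_targets("my_scenario", ["default", "HG"])
cmd = smk_command("config/my_scenario.yaml", targets)
# From the repository root: subprocess.run(cmd, check=True)
```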
Programmatic Scenario Generation¶
When conducting sensitivity analyses or parameter sweeps, you often need many scenarios that differ only in one or two parameter values. Writing these out manually is tedious and error-prone. The _generators DSL allows you to define scenario templates that are automatically expanded into concrete scenarios at configuration load time.
Basic structure
A generator specification has three required fields:
_generators:
  - name: scenario_{param}   # Name pattern with {placeholders}
    parameters:              # Parameter definitions
      param:
        <value-spec>
    template:                # Configuration template
      some_section:
        some_key: "{param}"  # Placeholder substitution
When the configuration is loaded, each generator expands into multiple concrete scenarios. The {param} placeholders in both the name and template are replaced with generated values.
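The expansion step can be pictured as follows: for each combination of parameter values, the name pattern is formatted and the template is copied with its placeholders filled in. This is a minimal sketch, not the workflow's loader; `substitute`, `expand`, and the extra `label` key are hypothetical. It follows the DSL's type-preservation rule: a string that is exactly one placeholder receives the raw value, while embedded placeholders are formatted into a string.

```python
import re

# A string consisting of exactly one placeholder, e.g. "{param}".
WHOLE_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def substitute(node, params):
    """Recursively fill {placeholders} in a configuration template."""
    if isinstance(node, dict):
        return {key: substitute(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, params) for item in node]
    if isinstance(node, str):
        match = WHOLE_PLACEHOLDER.fullmatch(node)
        if match:
            return params[match.group(1)]  # keep the int/float type
        return node.format(**params)       # embedded: becomes a string
    return node


def expand(generator, assignments):
    """Turn one generator plus a list of parameter dicts into named scenarios."""
    scenarios = {}
    for params in assignments:
        name = generator["name"].format(**params)
        scenarios[name] = substitute(generator["template"], params)
    return scenarios


generator = {
    "name": "scenario_{param}",
    "template": {"some_section": {"some_key": "{param}", "label": "run_{param}"}},
}
scenarios = expand(generator, [{"param": 1}, {"param": 2}])
# scenarios["scenario_1"] == {"some_section": {"some_key": 1, "label": "run_1"}}
```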
Generating parameter values
There are three ways to specify parameter values:
Log-spaced values (space: log): Uses logarithmic spacing, useful when sensitivity varies across orders of magnitude.
parameters:
  price:
    space: log
    start: 5     # First value
    stop: 500    # Last value
    num: 8       # Number of points
    round: true  # Optional: round to integers
Linear-spaced values (space: lin or omitted): Uses uniform spacing.
parameters:
  fraction:
    space: lin
    start: 0.0
    stop: 1.0
    num: 11
Explicit values (values): Specify exact values for non-uniform grids.
parameters:
  n:
    values: [3, 5, 10, 20, 50, 100]
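The three value specifications reduce to a few lines of arithmetic. This sketch assumes that `num` counts both endpoints and that `round: true` rounds to the nearest integer; `generate_values` is a hypothetical helper, not the workflow's function.

```python
import math


def generate_values(spec):
    """Expand one parameter value spec (values, lin, or log) into a list."""
    if "values" in spec:
        return list(spec["values"])  # explicit grid, used as-is

    start, stop, num = spec["start"], spec["stop"], spec["num"]
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        values = [start]
    elif spec.get("space", "lin") == "log":
        if start <= 0 or stop <= 0:
            raise ValueError("log spacing needs positive start and stop")
        # Equal steps in log space, i.e. a constant ratio between neighbours.
        log_start, log_stop = math.log(start), math.log(stop)
        step = (log_stop - log_start) / (num - 1)
        values = [math.exp(log_start + i * step) for i in range(num)]
    else:
        # Equal steps in linear space ("lin" or space omitted).
        step = (stop - start) / (num - 1)
        values = [start + i * step for i in range(num)]

    if num > 1:
        values[0], values[-1] = start, stop  # pin endpoints exactly

    if spec.get("round"):
        values = [int(round(v)) for v in values]
    return values


prices = generate_values({"space": "log", "start": 5, "stop": 500, "num": 8, "round": True})
# prices == [5, 10, 19, 36, 69, 134, 259, 500]
```

With eight log-spaced points from 5 to 500, consecutive values differ by a constant factor of 100^(1/7) ≈ 1.93.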
Combination modes
When a generator has multiple parameters, the mode field controls how they are combined:
Zip mode (default): Pairs parameters element-wise. All parameter lists must have the same length. Generates N scenarios from N values per parameter. Use this when parameters should vary together along a single dimension.
Grid mode: Computes the Cartesian product. Generates M × N scenarios from M values of one parameter and N of another. Use this to explore a full parameter space.
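Both combination modes map onto Python built-ins: zip pairs lists element-wise and itertools.product forms the Cartesian product. A sketch with a hypothetical `combine` helper that returns one parameter assignment per scenario:

```python
import itertools


def combine(parameter_values, mode="zip"):
    """Combine per-parameter value lists into a list of assignment dicts."""
    names = list(parameter_values)
    lists = [parameter_values[name] for name in names]
    if mode == "zip":
        lengths = {len(values) for values in lists}
        if len(lengths) > 1:
            raise ValueError(f"zip mode needs equal-length lists, got {sorted(lengths)}")
        rows = zip(*lists)  # N values per parameter -> N scenarios
    elif mode == "grid":
        rows = itertools.product(*lists)  # M x N scenarios
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return [dict(zip(names, row)) for row in rows]


grid = combine(
    {"ghg": [0, 50, 100, 150, 200, 250, 300], "biomass": [0, 50, 100, 150, 200]},
    mode="grid",
)
# len(grid) == 35
```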
Example: Single-parameter sweep
This generator creates 8 scenarios with log-spaced GHG prices from 5 to 500:
_generators:
  - name: ghg_{ghg}
    parameters:
      ghg:
        space: log
        start: 5
        stop: 500
        num: 8
        round: true
    template:
      emissions:
        ghg_price: "{ghg}"
Result: scenarios ghg_5, ghg_10, ghg_19, …, ghg_500 (8 total).
Example: Paired parameters (zip mode)
This generator creates scenarios where GHG price and YLL value increase together:
_generators:
  - name: ghg_yll_{ghg}
    mode: zip
    parameters:
      ghg:
        space: log
        start: 5
        stop: 500
        num: 8
        round: true
      yll:
        space: log
        start: 50
        stop: 100000
        num: 8
        round: true
    template:
      emissions:
        ghg_price: "{ghg}"
      health:
        value_per_yll: "{yll}"
Result: 8 scenarios where the i-th GHG value pairs with the i-th YLL value.
Example: Parameter grid (grid mode)
This generator explores all combinations of GHG and biomass prices:
_generators:
- name: ghg{ghg}_biomass{biomass}
mode: grid
parameters:
ghg:
values: [0, 50, 100, 150, 200, 250, 300]
biomass:
values: [0, 50, 100, 150, 200]
template:
emissions:
ghg_price: "{ghg}"
biomass:
marginal_values_usd_per_tonne: "{biomass}"
Result: 35 scenarios (7 × 5 combinations).
Mixing generators with manual scenarios
Generators can coexist with manually defined scenarios in the same file:
# Manual scenario
baseline:
  validation:
    enforce_baseline_diet: true
# Generated scenarios
_generators:
  - name: sensitivity_{x}
    parameters:
      x:
        values: [1, 2, 3]
    template:
      some_param: "{x}"
Type preservation
When a placeholder is the entire value (e.g., "{param}"), the numeric type is preserved. When embedded in a string (e.g., "prefix_{param}"), values are converted to strings. This ensures configuration values have the correct types for downstream processing.
Sensitivity analysis mode¶
In addition to zip and grid modes, generators support mode: sensitivity for surrogate-based global sensitivity analysis. In this mode, parameter values are drawn from a space-filling Sobol sequence transformed to specified probability distributions, rather than from fixed value lists.
Each parameter specifies a distribution instead of a value range:
_generators:
  - name: gsa_{sample_id}
    mode: sensitivity
    samples: 256
    slice_parameters: [ghg_price]
    parameters:
      yield_factor:
        lower: 0.8
        upper: 1.2
      ch4_factor:
        distribution: lognormal
        mu: 0.0
        sigma: 0.15
      ghg_price:
        lower: 0
        upper: 300
    template:
      sensitivity:
        crop_yields:
          all: "{yield_factor}"
        emission_factors:
          ch4: "{ch4_factor}"
      emissions:
        ghg_price: "{ghg_price}"
Supported distributions are uniform (default; requires lower, upper), log_uniform (requires lower, upper; both positive), normal (requires mean, std), normal_ci (requires lower, upper; optional confidence, bounds), and lognormal (requires mu, sigma).
The samples field sets the number of quasi-random samples (should be a power of 2). The slice_parameters field designates parameters for conditional analysis — these are included in the surrogate fit but can be fixed at specific values to study how sensitivity changes with policy choices. Surrogate method configuration (PCE, RF) lives in a separate sensitivity_analysis top-level section.
See Sensitivity Analysis for full methodology details, output file formats, and interpretation guidance.
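Turning unit-cube points into parameter draws is an inverse-CDF transform applied per parameter. The sketch below covers uniform, log_uniform, normal, and lognormal using only the standard library; it is not the workflow's sampler. It assumes lognormal's mu and sigma are the mean and standard deviation of the underlying normal (so mu: 0.0 gives a median of 1.0), and it leaves out normal_ci. The unit-cube points would come from a scrambled Sobol generator such as scipy.stats.qmc.Sobol(d=3).random_base2(m=8) (2^8 = 256 samples); `transform` and `draw_sample` are hypothetical names.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def transform(u, spec):
    """Map one unit-interval coordinate u (0 < u < 1) onto a parameter value."""
    kind = spec.get("distribution", "uniform")
    if kind == "uniform":
        return spec["lower"] + u * (spec["upper"] - spec["lower"])
    if kind == "log_uniform":
        lo, hi = math.log(spec["lower"]), math.log(spec["upper"])
        return math.exp(lo + u * (hi - lo))
    if kind == "normal":
        return NormalDist(spec["mean"], spec["std"]).inv_cdf(u)
    if kind == "lognormal":
        return math.exp(spec["mu"] + spec["sigma"] * _STANDARD_NORMAL.inv_cdf(u))
    raise ValueError(f"distribution not covered by this sketch: {kind}")


def draw_sample(point, parameters):
    """Transform one unit-cube point (one coordinate per parameter, in order)."""
    return {name: transform(u, spec) for u, (name, spec) in zip(point, parameters.items())}


parameters = {
    "yield_factor": {"lower": 0.8, "upper": 1.2},
    "ch4_factor": {"distribution": "lognormal", "mu": 0.0, "sigma": 0.15},
    "ghg_price": {"lower": 0, "upper": 300},
}
# The centre of the unit cube maps to the medians of the three distributions.
sample = draw_sample([0.5, 0.5, 0.5], parameters)
```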
Configuration sections¶
Scenario Metadata¶
scenarios:
  # Each key represents a named scenario that can be activated via the
  # {scenario} wildcard in Snakemake (e.g., model_scen-default.nc).
  # The values are configuration overrides applied recursively on top
  # of the default configuration.
  default: {}
  # Example:
  # high_ghg:
  #   emissions:
  #     ghg_price: 500
# --- section: temporal ---
# Temporal configuration
#
# planning_horizon: Target year for optimization. Controls population
# projections (UN WPP) and GDP clustering (IMF WEO).
#
# baseline_year: Reference year for observed baseline data. Controls FAOSTAT
# production, GDD dietary intake, GBD health data, ESA CCI land cover,
# and LUIcube grassland data.
# Hard constraints (data must exist for exact year):
# - LUIcube grassland: 1992-2020 (binding upper limit)
# - ESA CCI land cover: 1992-2022
# - GBD mortality: manual download required for matching year
# Soft constraints (nearest available year used automatically):
# - GDD dietary intake: latest available is 2018
# - GBD dietary risk exposure: latest available is 2019
#
# currency_base_year: Base year for inflation-adjusted USD values.
planning_horizon: 2030
baseline_year: 2020
currency_base_year: 2024
planning_horizon: Target year for optimization (default: 2030). Determines which projected population levels (UN WPP) and GDP clustering data (IMF WEO) are used.
currency_base_year: Base year for inflation-adjusted USD values (default: 2024). All cost data is automatically converted to real USD in this base year using CPI adjustments. See Crop Production (Production Costs section) for details on cost modeling.
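The CPI adjustment amounts to rescaling each nominal value by the ratio of price-index levels between the base year and the year in which the value was reported. A minimal sketch: `to_base_year_usd` is hypothetical, and the index levels below are made-up placeholders, not real CPI data.

```python
def to_base_year_usd(value, value_year, cpi, base_year=2024):
    """Rescale a nominal USD value from `value_year` to real `base_year` USD.

    `cpi` maps year -> price index level; only the ratio of levels matters.
    """
    return value * cpi[base_year] / cpi[value_year]


# Placeholder index levels for illustration only.
cpi = {2015: 100.0, 2024: 125.0}
cost_2024 = to_base_year_usd(80.0, 2015, cpi)  # 100.0
```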
Download Options¶
downloads:
  show_progress: true
Path Options¶
# Root directories for workflow artifacts. Defaults keep everything under the
# project directory, but these can be redirected (e.g. to scratch storage).
# Environment variables and "~" are expanded by the Snakefile.
paths:
  results_root: "results"
  processing_root: "processing"
  logs_root: "logs"
  benchmarks_root: "benchmarks"
NetCDF Options¶
# NetCDF export settings for PyPSA network files (build and solve outputs)
netcdf:
  float32: true # Downcast float64 to float32 to reduce file size
  compression: # Passed to xarray.Dataset.to_netcdf; set to null to disable
    zlib: true
    complevel: 4
The paths.*_root values support environment-variable and ~ expansion in the
Snakefile (for example "${GROUP_SCRATCH}/${USER}/food-opt/processing").
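In Python terms, this expansion corresponds to os.path.expandvars plus os.path.expanduser. The sketch illustrates the behaviour only; it is an assumption that the Snakefile combines the two calls this way, and `resolve_root` and the GROUP_SCRATCH value are hypothetical.

```python
import os


def resolve_root(raw: str) -> str:
    """Expand environment variables and a leading "~" in a configured root."""
    # expandvars leaves references to unset variables unchanged.
    return os.path.expanduser(os.path.expandvars(raw))


os.environ.setdefault("GROUP_SCRATCH", "/scratch/mygroup")  # hypothetical value
root = resolve_root("${GROUP_SCRATCH}/food-opt/processing")
```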
Validation Options¶
validation:
  use_actual_yields: true
  use_actual_production: false
  enforce_baseline_diet: false # Set food consumption equal to current day values
  enforce_baseline_feed: false # Fix animal feed use to GLEAM baseline values
  land_slack: false # Enable land slack generators (allows exceeding regional land limits at cost)
  disable_new_cropland: false # If true, no new land can supply the cropland pool
  disable_new_pasture: false # If true, no new land can supply the pasture pool
  disable_spared_cropland: false # If true, existing cropland cannot be spared
  disable_spared_grassland: false # If true, existing grassland cannot be spared
  slack_marginal_cost: 50. # bn USD per Mt/Mha for validation slack (food groups, feed, land)
  feed_slack_cost_factor: 0.1 # Feed slack cost as fraction of slack_marginal_cost (lower separates feed from food slack)
  grassland_yield_multiplier: 1.0 # Multiplier applied to effective grassland feed yields before building grassland links
  production_stability:
    enabled: true
    penalty_mode: "l1" # "hard" = inequality bounds, "quadratic" = soft QP penalty, "l1" = linear absolute value penalty
    quadratic_cost: 1.0 # bn USD per deviation² unit (only used when penalty_mode is "quadratic")
    # L1 penalty costs live directly under production_stability with scope-
    # specific names. String "calibrated" resolves at solve time from
    # data/curated/calibration/prod_stability_l1.yaml; a number overrides.
    # Regenerate the calibration with `tools/calibrate stability`.
    land_l1_cost: "calibrated" # bn USD per Mha deviation (crops + grassland, shared)
    animal_feed_l1_cost: "calibrated" # bn USD per Mt DM deviation (animal feed use). Set to null to fall back to automatic Mha-equivalent scaling from land_l1_cost.
    deviation_type: "absolute" # "absolute" or "relative" deviation from baseline
    crops:
      enabled: true
      max_relative_deviation: 0.2 # ±20%
      enable_slack: false # Allow violating minimum production bounds at penalty cost
      min_baseline: 0.000001 # Mha denominator floor for relative penalty modes.
    grassland:
      enabled: true
      max_relative_deviation: 0.2
      enable_slack: false
      min_baseline: 0.000001 # Mha denominator floor for relative penalty modes.
    animals:
      enabled: true
      max_relative_deviation: 0.2
      enable_slack: false # Allow violating minimum production bounds at penalty cost
      min_baseline: 0.00001 # Mt DM denominator floor for relative penalty modes.
    land_conversion:
      enabled: true # Penalize land-use transitions (conversion, pasture routing, sparing) from zero baseline
  diet_stability:
    # Per-(food, country) deviation penalty on food_consumption links, anchored
    # to the observed baseline-year diet (the same matched_baseline used by
    # enforce_baseline_diet). Land/animal-feed production_stability does NOT
    # constrain how the same hectares are routed into different foods, so a
    # priced regime (GHG/YLL > 0) can leave land use ~unchanged while shifting
    # the diet substantially. This block adds an explicit currency cost on diet
    # deviation; off by default so it does not affect the calibration runs or
    # any existing scenario.
    enabled: false
    penalty_mode: "l1" # "l1" = linear |dev|, "quadratic" = 0.5 * c * dev^2
    deviation_type: "absolute" # "absolute" = Mt, "relative" = (Mt-baseline)/baseline
    food_l1_cost: 0.0 # bn USD per Mt deviation (used when penalty_mode == "l1")
    quadratic_cost: 0.0 # bn USD per Mt^2 (used when penalty_mode == "quadratic")
    min_baseline: 0.000001 # Mt floor for relative-mode denominator
  animal_growth_cap:
    enabled: true # Cap animal production growth to prevent unrealistic spatial reallocation
    max_relative_increase: 0.5 # Maximum relative increase from baseline (0.1 = +10%)
  crop_growth_cap:
    # Per-(crop, country) hard upper bound on harvested area at
    # ``(1 + max_relative_increase) * sum_baseline``. Aggregates across
    # regions, resource classes and water-supply types within each
    # country, preserving within-country reallocation freedom while
    # bounding total country-level expansion of any single crop.
    enabled: true
    max_relative_increase: 10.0 # +1000%
# --- section: food_incentives ---
food_incentives:
  enabled: false # When true, food-level incentives are applied to the objective
  sources: []
# --- section: consumer_values ---
consumer_values:
  baseline_scenario: "baseline" # Scenario name for consumer values extraction (must have enforce_baseline_diet=true)
# --- section: food_utility_piecewise ---
food_utility_piecewise:
  enabled: false # When true, use piecewise diminishing marginal utility for food consumption
  n_blocks: 4
  decline_factor: 0.7 # Multiplicative utility decay by block (0 < factor <= 1)
  total_width_multiplier: 2.0 # Total incentivized quantity as multiple of baseline consumption
  min_block_width_mt: 0.00001 # Minimum width floor (Mt/year) for each utility block to avoid tiny upper bounds
# --- section: optimal_taxes ---
optimal_taxes:
  enabled: false # When true, enables the optimal taxes/subsidies workflow
Set validation.enforce_baseline_diet to true to force the optimizer to match
baseline consumption derived from the processed GDD file. When this flag is active,
the diet.baseline_age and baseline_year settings determine which
cohort/year is enforced. Use validation.slack_marginal_cost to set the
penalty (bn USD per Mt) for the slack generators that backstop those fixed
food-group loads. Keep the value high so that slack only activates when recorded
production cannot meet the enforced demand targets.
Set validation.enforce_baseline_feed to true to fix animal feed use to
GLEAM-derived baseline levels (see Baseline Feed Intake). The baseline is
scaled from GLEAM 2.0 (2010) to the reference year and calibrated against the
known GLEAM 3.0 global total using validation.gleam_calibration_year and
validation.gleam_calibration_total_gt_dm.
See Validation for a detailed walkthrough of the validation workflow and diagnostic figures.
Consumer Utility Options¶
Two mutually exclusive options can be used to represent consumer preference in the objective:
food_incentives applies a single linear marginal-cost adjustment per (food, country) pair.
food_utility_piecewise applies a piecewise diminishing marginal utility curve per (food, country) pair.
When food_utility_piecewise.enabled is true, the workflow always reads
utility blocks from results/{name}/consumer_values/utility_blocks.csv.
These blocks are generated by calibrate_food_utility_blocks from:
baseline dual values extracted by extract_consumer_values; and
baseline per-food consumption from the baseline scenario solve.
The current calibration anchors marginal utility at the baseline quantity:
the utility block containing baseline consumption uses the extracted dual
value, with higher utility below baseline and lower utility above baseline
according to food_utility_piecewise.decline_factor.
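A sketch of that anchoring, under assumptions the configuration does not state: blocks have equal width (total_width_multiplier × baseline ÷ n_blocks, floored at min_block_width_mt), block k covers the interval (k·w, (k+1)·w], and utility changes by one factor of decline_factor per block. The function `utility_blocks` is hypothetical, and calibrate_food_utility_blocks may place blocks differently.

```python
import math


def utility_blocks(baseline_mt, dual_value, n_blocks=4, decline_factor=0.7,
                   total_width_multiplier=2.0, min_block_width_mt=0.00001):
    """Return (block width in Mt, per-block marginal utilities); equal widths assumed."""
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    if not 0 < decline_factor <= 1:
        raise ValueError("decline_factor must satisfy 0 < factor <= 1")
    width = max(total_width_multiplier * baseline_mt / n_blocks, min_block_width_mt)
    # Block k covers (k * width, (k + 1) * width]; find the one holding the baseline.
    anchor = min(max(math.ceil(baseline_mt / width) - 1, 0), n_blocks - 1)
    # Anchor block gets the dual value; blocks below it are worth more and
    # blocks above it less, by one factor of decline_factor per step.
    utilities = [dual_value * decline_factor ** (k - anchor) for k in range(n_blocks)]
    return width, utilities


width, utilities = utility_blocks(10.0, 2.0, decline_factor=0.5)
# width == 5.0, utilities == [4.0, 2.0, 1.0, 0.5]
```

Because the utilities decrease block by block, an optimizer fills the blocks in order, which is what makes the piecewise curve represent diminishing marginal utility.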
food_utility_piecewise cannot be combined with
validation.enforce_baseline_diet in the same scenario.
Production Stability Bounds¶
The validation.production_stability section allows constraining how much crop and
animal product production can deviate from current (baseline) levels. This is useful for
investigating what positive changes (e.g., improved health outcomes, reduced emissions)
can be achieved with limited disruption to existing production patterns.
Three penalty modes are available, selected via penalty_mode:
``hard``: Inequality bounds. Per-(product, country) production is bounded by
\[(1 - \delta) \times \text{baseline} \le \text{production} \le (1 + \delta) \times \text{baseline}\]
where \(\delta\) is the max_relative_deviation parameter (e.g., 0.2 for ±20%).
``l1`` (the mode set in config/default.yaml): Soft L1 (linear absolute-value) penalty on deviations from baseline production. Each unit of absolute deviation incurs a cost of land_l1_cost (bn USD per Mha, shared by crops and grassland) or animal_feed_l1_cost (bn USD per Mt DM of animal feed use). An L1 cost of approximately 1.0 is roughly the lowest value that induces the model to replicate current production patterns.
``quadratic``: Soft quadratic penalty on deviations, with cost quadratic_cost (bn USD per deviation² unit).
The deviation_type option (absolute or relative) controls whether deviations
are measured in absolute units or relative to the baseline.
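The three modes can be compared on a single (product, country) pair. This sketch evaluates penalty terms for given numbers rather than building optimization constraints; the helper names are hypothetical. It assumes that relative deviations divide by the baseline floored at min_baseline, and that the quadratic mode charges quadratic_cost times the squared deviation with no factor of one half.

```python
def deviation(production, baseline, deviation_type="absolute", min_baseline=1e-6):
    """Deviation from baseline; relative mode divides by the floored baseline."""
    diff = production - baseline
    if deviation_type == "relative":
        return diff / max(baseline, min_baseline)
    return diff


def within_hard_bounds(production, baseline, max_relative_deviation):
    """hard mode: (1 - delta) * baseline <= production <= (1 + delta) * baseline."""
    delta = max_relative_deviation
    return (1 - delta) * baseline <= production <= (1 + delta) * baseline


def l1_penalty(dev, l1_cost):
    """l1 mode: cost proportional to the absolute deviation."""
    return l1_cost * abs(dev)


def quadratic_penalty(dev, quadratic_cost):
    """quadratic mode (assumed form): quadratic_cost * deviation**2."""
    return quadratic_cost * dev ** 2


# Hypothetical numbers: baseline 2.0 Mha, model chooses 2.5 Mha.
dev_abs = deviation(2.5, 2.0)                 # 0.5 Mha
dev_rel = deviation(2.5, 2.0, "relative")     # 0.25
feasible = within_hard_bounds(2.5, 2.0, 0.2)  # False: upper bound is 2.4 Mha
```

Under ``hard`` this choice is simply infeasible, whereas ``l1`` and ``quadratic`` let it through at a price, so the optimizer trades the deviation against the other objective terms.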
Configuration options:
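production_stability.enabled: Master switch for the feature (default: true)
production_stability.penalty_mode: hard, l1, or quadratic (default: l1)
production_stability.land_l1_cost: L1 penalty per Mha of deviation, shared by crops and grassland (default: "calibrated", resolved at solve time from data/curated/calibration/prod_stability_l1.yaml; a number overrides it; only used when penalty_mode is l1)
production_stability.animal_feed_l1_cost: L1 penalty per Mt DM of animal feed deviation (default: "calibrated"; null falls back to Mha-equivalent scaling from land_l1_cost)
production_stability.quadratic_cost: Quadratic penalty cost (default: 1.0, only used when penalty_mode is quadratic)
production_stability.deviation_type: absolute or relative (default: absolute)
production_stability.crops.enabled: Apply to crop production
production_stability.crops.max_relative_deviation: Maximum relative deviation for crops (0-1, hard mode only)
production_stability.grassland.enabled: Apply to grassland
production_stability.grassland.max_relative_deviation: Maximum relative deviation for grassland (0-1, hard mode only)
production_stability.animals.enabled: Apply to animal product production
production_stability.animals.max_relative_deviation: Maximum relative deviation for animal products (0-1, hard mode only)
production_stability.{crops,grassland,animals}.enable_slack: Allow violating minimum production bounds at penalty cost
production_stability.{crops,grassland,animals}.min_baseline: Denominator floor for relative penalty modes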
Behavior notes:
Products with zero baseline production are constrained to zero (no new products introduced)
Products missing baseline data are skipped with a warning
Multi-cropping is automatically disabled when production stability is enabled
Diet Stability¶
The validation.diet_stability section adds a per-(food, country) soft anchor on
food consumption toward the observed baseline diet (the same per-(food, country)
target_mt derived from baseline_year-resolved data that
enforce_baseline_diet consumes). It is independent of production_stability
and off by default.
Production stability constrains what is produced (and how much land/feed it uses)
but leaves the model free to reroute the same hectares into a very different diet
(e.g. converting feed-grain area to direct-food legumes/whole-grains). Under priced
regimes (positive ghg_price / value_per_yll) this rerouting can be substantial
even when total cropland and pasture deviate by only a few percent. Diet stability
attaches an explicit currency cost to every Mt that consumption of a food deviates
from baseline, restoring a knob to anchor consumption close to current levels.
Two penalty modes are available, selected via penalty_mode:
``l1`` (default): linear absolute-value penalty on consumption deviation. Each Mt of absolute deviation costs food_l1_cost bn USD.
``quadratic``: soft quadratic penalty on consumption deviation, 0.5 * quadratic_cost * sum((p - baseline)^2).
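For the absolute deviation type, the diet-stability objective term follows directly from these formulas. A sketch with a hypothetical `diet_stability_cost` helper; the (food, country) keys and quantities are made up for illustration.

```python
def diet_stability_cost(consumption_mt, baseline_mt, penalty_mode="l1",
                        food_l1_cost=0.0, quadratic_cost=0.0):
    """Diet-stability objective term in bn USD for absolute deviations.

    Both dicts map (food, country) -> Mt; keys are taken from the baseline.
    """
    devs = [consumption_mt.get(key, 0.0) - base for key, base in baseline_mt.items()]
    if penalty_mode == "l1":
        return food_l1_cost * sum(abs(d) for d in devs)
    if penalty_mode == "quadratic":
        return 0.5 * quadratic_cost * sum(d * d for d in devs)
    raise ValueError(f"unknown penalty_mode: {penalty_mode}")


# Hypothetical diet shift: 1 Mt less of food_a, 1 Mt more of food_b.
baseline = {("food_a", "DEU"): 4.0, ("food_b", "DEU"): 0.5}
consumption = {("food_a", "DEU"): 3.0, ("food_b", "DEU"): 1.5}
l1 = diet_stability_cost(consumption, baseline, "l1", food_l1_cost=2.0)             # 4.0
quad = diet_stability_cost(consumption, baseline, "quadratic", quadratic_cost=2.0)  # 2.0
```

Under L1, a swap between foods costs the same per Mt regardless of its size, while the quadratic form penalizes a few large shifts more than many small ones.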
Configuration options:
diet_stability.enabled: Master switch (default: false).
diet_stability.penalty_mode: l1 (default) or quadratic.
diet_stability.deviation_type: absolute or relative (default: absolute).
diet_stability.food_l1_cost: Linear penalty (bn USD per Mt deviation), used when penalty_mode is l1.
diet_stability.quadratic_cost: Quadratic penalty (bn USD per Mt²), used when penalty_mode is quadratic.
diet_stability.min_baseline: Mt floor for relative-mode denominators.
Interaction with other features:
The penalty is added on top of any piecewise consumer-values utility (food_utility_piecewise); the two compose linearly.
Diet stability has no effect when enforce_baseline_diet is true (the diet is already pinned via p_set).
The cost shows up as a separate diet_stability column in the per-scenario analysis/.../objective_breakdown.parquet.
A typical use is reproducing the current observed diet (per baseline_year)
under a priced GHG/YLL regime as the high-stability anchor of a transition study;
the appropriate food_l1_cost is calibration-specific (analogous to
land_l1_cost for production_stability).
Growth Caps¶
Two hard upper bounds on production growth sit alongside the soft production-stability penalty above. They act as structural backstops against runaway expansion in either sector (animals or crops) under L1 stability, without depending on the L1 penalty being well tuned.
Both caps are independent of production_stability.enabled and are
configured under validation.animal_growth_cap and
validation.crop_growth_cap respectively.
Animal growth cap (validation.animal_growth_cap)
Upper-bounds each animal_production link’s feed input at
\((1 + \delta) \cdot \text{baseline}\_\text{feed}\_\text{use}\_\text{mt}\_\text{dm}\).
The granularity is per-(product, feed-category, country), which directly
constrains the feed mix as well as the production level.
animal_growth_cap.enabled: master switch (default: true)
animal_growth_cap.max_relative_increase: cap (default 0.5 = +50%)
Zero-baseline links get an upper bound of zero, so animal systems cannot be introduced in countries where they were not present in the baseline.
Crop growth cap (validation.crop_growth_cap)
Upper-bounds the total country-level harvested area of each modelled crop at \((1 + \delta) \cdot \sum_{r,c,w} \text{baseline}\_\text{area}\_\text{mha}\), where the sum is over regions, resource classes, and water-supply types within a country. Country-level (rather than per-link) granularity preserves within-country reallocation freedom — the model can still shift crop production between regions and resource classes based on yield economics — while bounding total country-level expansion.
crop_growth_cap.enabled: master switch (default: true)
crop_growth_cap.max_relative_increase: cap (default 10.0 = +1000%, i.e. 11× baseline)
Zero-baseline crop-country groups get an upper bound of zero, so crops cannot be introduced in countries where they were not present in the baseline.
The crop cap is intentionally much more generous than the animal
cap's default of +50% because realistic dietary-shift scenarios already produce
legitimate global crop expansions of 300–400% (e.g. legumes under
plant-shift diets), and per-country shifts can be larger still. The
crop cap is a backstop against implausible expansion (the canonical
olive-USA case at 19× baseline) rather than a fine-tuned bound on
realistic reallocation. The principled fix to the underlying cost
calibration / L1 interaction lives elsewhere, in how negative
corrections are applied (see Cost calibration).
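Both caps are simple multiples of baseline quantities; the crop cap additionally sums the baseline area within each country first. A sketch with hypothetical helper names; the split of the olive baseline across two regions is invented for illustration.

```python
from collections import defaultdict


def crop_area_caps(baseline_area_mha, max_relative_increase=10.0):
    """Upper bounds on total harvested area per (crop, country).

    baseline_area_mha maps (crop, country, region, resource_class, water_supply)
    to baseline harvested area in Mha; areas are summed within each country
    before the (1 + delta) factor is applied.
    """
    totals = defaultdict(float)
    for (crop, country, _region, _res_class, _water), area in baseline_area_mha.items():
        totals[(crop, country)] += area
    return {key: (1 + max_relative_increase) * total for key, total in totals.items()}


def crop_cap(caps, crop, country):
    """Crop-country pairs without baseline area are capped at zero."""
    return caps.get((crop, country), 0.0)


def animal_feed_cap(baseline_feed_use_mt_dm, max_relative_increase=0.5):
    """Upper bound on one animal_production link's feed input (zero stays zero)."""
    return (1 + max_relative_increase) * baseline_feed_use_mt_dm


# Hypothetical split of a 0.04 Mha country total across two regions.
baseline = {
    ("olive", "USA", "region_a", 1, "r"): 0.03,
    ("olive", "USA", "region_b", 2, "r"): 0.01,
}
caps = crop_area_caps(baseline)
```

Only the country total is bounded, so the model remains free to move olive area between region_a and region_b as long as the sum stays below the cap.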
Why both caps exist (interaction with cost calibration)
Cost calibration (see Cost calibration) extracts per-Mha (or
per-Mt-DM) cost corrections from the duals of ±1% hard-bound stability
constraints. Those duals are local marginal-cost gradients valid at
baseline production; applied as a constant per-unit correction at any
production level under L1 stability, the calibration interpretation
breaks for crops or products with very small baselines. The canonical
case is olive in the USA: a moderate -0.40 bnUSD/Mha cost correction
calibrated at baseline 0.04 Mha would otherwise drive the model to
~0.7 Mha (19× baseline) and starve other US crops (notably maize) of
land. The growth caps prevent this kind of pathological extrapolation
without changing the calibration itself.
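For comparison, the default crop cap bounds this case at the country level. Assuming that the 0.04 Mha calibration baseline is also the country-level baseline harvested area used by the cap:

\[\text{cap}_{\text{olive, USA}} = (1 + 10.0) \times 0.04\ \text{Mha} = 0.44\ \text{Mha} < 0.7\ \text{Mha}\]

so the uncapped ~0.7 Mha solution is infeasible under the default setting.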
Limitations
Caps are applied uniformly at the country level — the model cannot exceed +X% in any individual country, but a sector-wide expansion (e.g. all major producers grow soybean by 50%) is still permitted. This is intentional: the caps target spatial reallocation artifacts, not sector-level demand growth.
For animals, the per-(product, feed-category) granularity means the
cap also constrains the feed mix: a country can’t shift entirely
from grain-fed to forage-fed cattle even if total cattle output stays
within the cap. This is mostly desirable but can be over-restrictive for
counterfactual scenarios that probe alternative feed regimes; raise
max_relative_increase for such studies.
Crop Selection¶
crops:
# Core cereals
- wheat
- dryland-rice
- wetland-rice
- maize
- barley
- oat
- rye
- sorghum
- buckwheat
- foxtail-millet
- pearl-millet
# Legumes/pulses
- soybean
- dry-pea
- chickpea
- cowpea
- gram
- phaseolus-bean
- pigeonpea
# Roots and tubers
- white-potato
- sweet-potato
- cassava
- yam
- plantain # GAEZ-untracked; uses banana yield surface as a proxy. Harvested area is split off FAOSTAT QCL "Bananas" / "Plantains and others" via prepare_banana_plantain_split.
# Vegetables
- tomato
- carrot
- onion
- cabbage
# Fruits
- banana
- watermelon
- mango
- citrus
- coconut
- apple
# Stimulant crops
- cocoa
- coffee
- tea
# Oil crops
- sunflower
- rapeseed
- groundnut
- sesame
- oil-palm
- olive
# Sugar crops
- sugarcane
- sugarbeet
# Fiber crops
- cotton
# Fodder / biomass (also listed in non_food_crops below)
- alfalfa
- silage-maize
- biomass-sorghum
# Note: mango uses citrus as an explicit RES02 growing-season fallback.
# Taro excluded - missing RES02 (growing season) data for GFDL-ESM4.
# --- section: cropgrids_crops ---
# Crops whose yield, harvested area and suitable area come from CROPGRIDS +
# FAOSTAT instead of GAEZ. Listed crops bypass GAEZ rasters entirely (yield,
# suitability, water requirement, growing season, harvested area). They are
# always rainfed-only and their suitable area is set equal to their current
# CROPGRIDS harvested area (× cropgrids.suitable_area_expansion), so no new
# land can be brought into production for them. Each entry must:
# * appear in `crops`,
# * be absent from `irrigation.irrigated_crops` (rainfed-only),
# * not appear in any `multiple_cropping` combination,
# * have a row in data/curated/cropgrids_crop_mapping.csv,
# enforced by workflow/validation/cropgrids_crops.py.
cropgrids_crops:
- apple
cropgrids:
  # Multiplier on CROPGRIDS harvested area to derive suitable_area. 1.0 means
  # the crop can only be grown on land currently growing it; larger values
  # allow some local expansion (uniformly across resource classes within the
  # region). Per-crop overrides could be added later if needed.
  suitable_area_expansion: 1.0
# --- section: yield_calibration ---
# Per-(country, crop) yield rescaling for crops where GAEZ relies on a proxy
# raster (e.g. plantain uses the GAEZ banana raster because GAEZ has no
# separate plantain output). For each listed crop, the build step derives a
# country-level multiplier
#
# multiplier_c = FBS-corrected FAOSTAT production_c
# / current model GAEZ-derived production_c
#
# that is applied uniformly to every per-cell yield in country c, so the
# country-level production matches FAOSTAT by construction while the GAEZ
# within-country spatial pattern is preserved.
#
# Only applied when ``validation.use_actual_yields`` is true; in optimisation
# mode the GAEZ potential yields are used unrescaled. The mechanism mirrors
# ``fodder_decomposition.yield_corrections`` (Eurostat-anchored) and the two
# corrections compose multiplicatively where they overlap.
yield_calibration:
  enabled: true
  crops:
    - plantain  # GAEZ uses banana raster as proxy; FAOSTAT plantain yields are
                # ~30% higher than GAEZ-banana yields in African producers.
    - coffee    # GAEZ COC yields systematically below FAOSTAT in major
                # producers (BRA, VNM, COL), giving ~25% global under-production.
    - tea       # GAEZ TEA yields overshoot in CHN; without rescaling the model
                # produces ~30% more dried tea than FAOSTAT fresh-leaf statistics
                # imply (after the standard 4:1 fresh-to-dry conversion).
  # Numerical safety: per-country multipliers outside [min, max] are clipped
  # (and logged) to avoid degenerate cases (zero current production, etc.).
  multiplier_min: 0.5
  multiplier_max: 3.0
# --- section: non_food_crops ---
# Crops not intended for human food production (fodder, biomass).
# These are excluded from foods.csv validation but still need yield/land data.
non_food_crops:
- alfalfa
- silage-maize
- biomass-sorghum
See Crop Production for full list. Add/remove crops to explore specialized vs. diversified production systems.
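The yield_calibration multiplier described in the configuration comments (target production over current model production, clipped to [multiplier_min, multiplier_max], then applied uniformly to every cell in the country) can be sketched as below. The helper names and country codes are hypothetical; treating zero current production as an infinite ratio before clipping, using 1.0 when both sides are zero, and leaving countries without a multiplier unscaled are assumptions.

```python
import math


def yield_multipliers(target_production, model_production,
                      multiplier_min=0.5, multiplier_max=3.0):
    """Per-country multiplier = target (FAOSTAT) / current model production, clipped."""
    multipliers = {}
    for country, target in target_production.items():
        current = model_production.get(country, 0.0)
        if current > 0:
            raw = target / current
        else:
            # Degenerate case (assumption): no current production; the clip bounds it.
            raw = math.inf if target > 0 else 1.0
        multipliers[country] = min(max(raw, multiplier_min), multiplier_max)
    return multipliers


def rescale_cells(cells, multipliers):
    """Apply each country's multiplier uniformly to its per-cell yields."""
    return [(country, y * multipliers.get(country, 1.0)) for country, y in cells]


mult = yield_multipliers(
    {"A": 1.3, "B": 1.0, "C": 0.1, "D": 2.0},  # hypothetical targets (Mt)
    {"A": 1.0, "B": 0.2, "C": 1.0},            # hypothetical model output (Mt)
)
# mult == {"A": 1.3, "B": 3.0, "C": 0.5, "D": 3.0}
```

Because every cell in a country is scaled by the same factor, the within-country spatial pattern of the GAEZ raster is preserved while the country total moves toward the target.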
Multiple Cropping¶
multiple_cropping:
  double_rice:
    crops:
      - wetland-rice
      - wetland-rice
    water_supplies:
      - r
      - i
  rice_wheat:
    crops:
      - wetland-rice
      - wheat
    water_supplies:
      - r
      - i
  maize_soybean:
    crops:
      - maize
      - soybean
    water_supplies:
      - r
      - i
Define sequential cropping systems as ordered lists of crops. Entries may
repeat a crop (double rice) or mix cereals and legumes (rice→wheat, maize→soybean) and
list multiple water_supplies (r for rainfed, i for irrigated) to build both
variants. The build_multi_cropping rule checks growing-season compatibility,
aggregates eligible area/yields, and sums irrigated water demand; build_model turns
each combination into a multi-output land link. Leave the section empty to disable the
feature. Multiple cropping zones that imply relay cropping (GAEZ classes “limited double” or
“double rice … limited triple”) are still accepted here but are interpreted as sequential crop
chains; relay-specific dynamics are not yet modelled.
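Enumerating the variants to build (one per combination and listed water supply) can be sketched as follows. The sketch reproduces only this bookkeeping; the growing-season compatibility check and the area/yield aggregation performed by build_multi_cropping are not modelled, and `expand_multi_cropping` is a hypothetical name.

```python
def expand_multi_cropping(multiple_cropping):
    """List (combination, crop sequence, water supply) variants to build."""
    variants = []
    # An empty YAML section parses to None, which disables the feature.
    for combo_name, spec in (multiple_cropping or {}).items():
        sequence = tuple(spec["crops"])  # ordered; repeats allowed (double rice)
        for water in spec["water_supplies"]:  # "r" = rainfed, "i" = irrigated
            variants.append((combo_name, sequence, water))
    return variants


multiple_cropping = {
    "double_rice": {"crops": ["wetland-rice", "wetland-rice"], "water_supplies": ["r", "i"]},
    "rice_wheat": {"crops": ["wetland-rice", "wheat"], "water_supplies": ["r", "i"]},
    "maize_soybean": {"crops": ["maize", "soybean"], "water_supplies": ["r", "i"]},
}
variants = expand_multi_cropping(multiple_cropping)  # 3 combinations x 2 supplies
```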
Country Coverage¶
countries:
# - ABW # No level-1 GADM data
- AFG
- AGO
# - AIA # No regions (microstate)
# - ALA # No population
- ALB
# - AND # excluded: microstate
- ARE
- ARG
- ARM
- ASM
# - ATA # No level-1 GADM data
# - ATF # No population
- ATG
- AUS
- AUT
- AZE
- BDI
- BEL
- BEN
# - BES # excluded: small overseas territory
- BFA
- BGD
- BGR
# - BHR # excluded: desert city-state
- BHS
- BIH
# - BLM # No regions (microstate)
- BLR
- BLZ
# - BMU # No regions (microstate)
- BOL
- BRA
- BRB
- BRN
- BTN
# - BVT # No level-1 GADM data
- BWA
- CAF
- CAN
# - CCK # No level-1 GADM data
- CHE
- CHL
- CHN
- CIV
- CMR
- COD
- COG
# - COK # excluded: small island territory
- COL
- COM
- CPV
- CRI
- CUB
# - CUW # No level-1 GADM data
# - CXR # No level-1 GADM data
# - CYM # excluded: small overseas territory
- CYP
- CZE
- DEU
- DJI
# - DMA # excluded: small island state
- DNK
- DOM
- DZA
- ECU
- EGY
- ERI
# - ESH # excluded: sparse desert territory
- ESP
- EST
- ETH
- FIN
- FJI
# - FLK # No level-1 GADM data
- FRA
# - FRO # excluded: small island territory
# - FSM # excluded: small island state
- GAB
- GBR
- GEO
# - GGY # Too small
- GHA
# - GIB # No level-1 GADM data
- GIN
# - GLP # excluded: overseas department
- GMB
- GNB
- GNQ
- GRC
- GRD
# - GRL # excluded: ice-dominated
- GTM
- GUF
# - GUM # excluded: small island territory
- GUY
# - HKG # No level-1 GADM data
# - HMD # No level-1 GADM data
- HND
- HRV
- HTI
- HUN
- IDN
# - IMN # excluded: small island territory
- IND
# - IOT # No level-1 GADM data
- IRL
- IRN
- IRQ
- ISL
- ISR
- ITA
- JAM
# - JEY # No regions (microstate)
- JOR
- JPN
- KAZ
- KEN
- KGZ
- KHM
# - KIR # No level-1 GADM data
# - KNA # excluded: small island state
- KOR
# - KWT # excluded: desert city-state
- LAO
- LBN
- LBR
- LBY
# - LCA # excluded: small island state
# - LIE # excluded: microstate
- LKA
- LSO
- LTU
- LUX
- LVA
# - MAC # No level-1 GADM data
# - MAF # No level-1 GADM data
- MAR
# - MCO # No level-1 GADM data
- MDA
- MDG
# - MDV # No level-1 GADM data
- MEX
# - MHL # No regions (microstate)
- MKD
- MLI
- MLT
- MMR
- MNE
- MNG
# - MNP # excluded: small island territory
- MOZ
- MRT
# - MSR # excluded: small island territory
# - MTQ # excluded: overseas department
- MUS
- MWI
- MYS
# - MYT # excluded: overseas department
- NAM
# - NCL # excluded: overseas territory
- NER
# - NFK # No level-1 GADM data
- NGA
- NIC
# - NIU # No level-1 GADM data
- NLD
- NOR
- NPL
# - NRU # No regions (microstate)
- NZL
- OMN
- PAK
- PAN
# - PCN # No level-1 GADM data
- PER
- PHL
# - PLW # excluded: small island state
- PNG
- POL
- PRI
# - PRK # excluded: no health data available for North Korea
- PRT
- PRY
- PSE
# - PYF # excluded: overseas territory
# - QAT # excluded: desert city-state
# - REU # excluded: overseas department
- ROU
- RUS
- RWA
- SAU
- SDN
- SEN
# - SGP # excluded: city-state (urban)
# - SGS # No level-1 GADM data
# - SHN # excluded: small island territory
# - SJM # No population
- SLB
- SLE
- SLV
# - SMR # No regions (microstate)
- SOM
# - SPM # excluded: small island territory
- SRB
- SSD
- STP
- SUR
- SVK
- SVN
- SWE
- SWZ
# - SXM # No level-1 GADM data
# - SYC # excluded: small island state
- SYR
# - TCA # excluded: small island territory
- TCD
- TGO
- THA
- TJK
# - TKL # No regions (microstate)
- TKM
- TLS
# - TON # excluded: small island state
- TTO
- TUN
- TUR
# - TUV # No regions (microstate)
- TWN
- TZA
- UGA
- UKR
# - UMI # No population
- URY
- USA
- UZB
# - VAT # No level-1 GADM data
# - VCT # excluded: small island state
- VEN
# - VGB # excluded: small island territory
# - VIR # excluded: small island territory
- VNM
- VUT
# - WLF # excluded: overseas territory
# - WSM # excluded: small island state
- YEM
- ZAF
- ZMB
- ZWE
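List the countries and territories to model; comment out or remove entries to reduce problem size. Microstates, small territories, and countries missing essential data (for example, no level-1 GADM regions or no population data) are commented out by default.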
Spatial Aggregation¶
Controls regional resolution and land classification.
aggregation:
regions:
target_count: 750
allow_cross_border: false
method: "kmeans"
simplify_tolerance_km: 5
simplify_min_area_km: 25
resource_class_quantiles: [0.25, 0.5, 0.75]
# Method used to rank gridcells before binning them into unweighted within-region quantiles.
# - "max_yield": maximum crop yield in each gridcell (uses actual yields when validation.use_actual_yields is true)
# - "regional_crop_mix_actual_yield": actual-yield-only score weighted by the region's current harvested crop mix
resource_class_score: "regional_crop_mix_actual_yield"
# Data source for determining irrigated land area when aggregating by region/resource class.
# - "current": use GAEZ "land equipped for irrigation" dataset (same area for all crops)
# - "potential": use GAEZ irrigated suitability rasters (crop-specific potential area)
irrigated_area_source: "current"
- Trade-offs:
  - More regions → higher spatial resolution, longer solve time
  - Fewer resource classes → faster solving, less yield heterogeneity
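The resource_class_quantiles and resource_class_score settings rank gridcells within each region and bin them into unweighted quantile classes. The following is a minimal sketch of that binning. It assumes linear-interpolation quantiles (numpy's default convention) and assumes that every gridcell counts equally regardless of its area. The function names are hypothetical and are not the workflow's actual API.

```python
from bisect import bisect_right


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def assign_resource_classes(scores, qs=(0.25, 0.5, 0.75)):
    """Map each gridcell score within one region to a class in 0..len(qs).

    Hypothetical helper: every gridcell has equal weight (unweighted quantiles),
    and a score equal to a cut point falls into the higher class.
    """
    ordered = sorted(scores)
    cuts = [quantile(ordered, q) for q in qs]
    return [bisect_right(cuts, s) for s in scores]


# Eight gridcells in one region, ranked by a yield-based score.
print(assign_resource_classes([1, 2, 3, 4, 5, 6, 7, 8]))  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With three cut points each region gets four resource classes. Listing fewer quantiles yields fewer classes and therefore a smaller model.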
Land, Water, Fertilizer, and Residues¶
Limits on land, water, and fertilizer availability, and on residue management.
land:
regional_limit: 1.0 # fraction of each region's potential cropland that is made available.
land_use_cost_usd_per_ha: 0.0 # Small optional per-hectare land-use cost to regularize land allocation (set >0 to activate)
conversion_cost_forest_usd_per_ha: 8000 # Overnight investment cost for converting forested land to agriculture (2024 USD/ha); sources in docs/land_use.rst
conversion_cost_nonforest_usd_per_ha: 2000 # Overnight investment cost for converting non-forested land to agriculture (2024 USD/ha); sources in docs/land_use.rst
investment_horizon: 25 # Years over which to annualize land conversion investment costs
discount_rate: 0.05 # Annual discount rate for annualizing land conversion investment costs
filtering:
min_crop_yield_t_per_ha: 0.01 # Minimum yield for crop links (t/ha); filters ~1% of entries
min_grassland_yield_t_per_ha: 0.05 # Minimum yield for grassland links (t/ha); filters ~6% of entries
min_area_ha: 100 # Minimum land area (ha); filters very small resource classes
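The land.conversion_cost_*_usd_per_ha values are overnight investments that are annualized over investment_horizon at discount_rate. The config does not spell out the formula, so this sketch assumes the standard capital recovery (annuity) factor. Under that assumption, the defaults translate into the following annual per-hectare costs.

```python
def annuity_factor(rate, years):
    """Capital recovery factor: converts an overnight cost into equal annual payments."""
    if rate == 0:
        return 1.0 / years
    return rate / (1.0 - (1.0 + rate) ** -years)


def annualized_conversion_cost(overnight_usd_per_ha, rate=0.05, years=25):
    """Annual cost (USD/ha/yr) for a given overnight conversion cost."""
    return overnight_usd_per_ha * annuity_factor(rate, years)


# Defaults from the land section: discount_rate 0.05, investment_horizon 25.
forest = annualized_conversion_cost(8000)      # ≈ 567.6 USD/ha/yr
nonforest = annualized_conversion_cost(2000)   # ≈ 141.9 USD/ha/yr
print(round(forest, 1), round(nonforest, 1))
```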
Water Supply¶
water:
# Water supply scenario determines which dataset is used for regional water limits:
# - "sustainable": Water Footprint Network blue water availability by basin (Hoekstra & Mekonnen 2011)
# Represents sustainable water extraction limits.
# - "current_use": Huang et al. (2018) gridded irrigation water withdrawals
# Represents actual/current agricultural water use, useful for validation.
supply_scenario: "current_use"
# Reference year for Huang irrigation data (only used when supply_scenario is "current_use")
huang_reference_year: 2010
water.supply_scenario selects the water availability dataset: sustainable (Water Footprint Network blue water availability) or current_use (Huang et al. irrigation withdrawals). Use current_use for validation or benchmarking against present-day withdrawals.
water.huang_reference_year selects the year (1971-2010) used for the Huang monthly withdrawals when supply_scenario is current_use.
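The two water settings interact: huang_reference_year only matters for current_use, and it must fall within the 1971-2010 coverage of the Huang data. The sketch below shows a check that a custom scenario file could be run against. check_water_config is a hypothetical helper and is not part of the workflow; its defaults mirror the values in default.yaml shown above.

```python
VALID_SUPPLY_SCENARIOS = {"sustainable", "current_use"}
HUANG_YEARS = range(1971, 2011)  # Huang et al. (2018) withdrawals cover 1971-2010


def check_water_config(water):
    """Validate the water section and return the selected supply scenario."""
    scenario = water.get("supply_scenario", "current_use")
    if scenario not in VALID_SUPPLY_SCENARIOS:
        raise ValueError(f"unknown water.supply_scenario: {scenario!r}")
    if scenario == "current_use":
        # The reference year is ignored for the "sustainable" dataset.
        year = water.get("huang_reference_year", 2010)
        if year not in HUANG_YEARS:
            raise ValueError(f"huang_reference_year {year} outside 1971-2010")
    return scenario


print(check_water_config({"supply_scenario": "current_use", "huang_reference_year": 2005}))
```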
fertilizer:
limit: 200_000_000 # t-N (200 Mt-N total limit in synthetic fertilizer application)
marginal_cost_usd_per_tonne: 500 # USD per t-N of synthetic fertilizer
# High-input agriculture N application rates (percentile of global FUBC data)
n_percentile: 80 # Use 80th percentile for high-input systems (range: 0-100)
# Manure nitrogen management
manure_n_to_fertilizer: 0.75 # Fraction of N excreted in confined quarters available as fertilizer (accounting for losses during storage/handling)
# Proxy mappings for model crops that are absent from the IFA FUBC dataset.
# Each mapping copies the source crop's derived N rate to the target crop
# so that downstream build_model.py can index every model crop without
# silent-zero fallbacks.
proxy_rates:
plantain: banana # FUBC has no plantain row; banana is the closest match.
silage-maize: maize # silage maize is fertilised like grain maize.
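fertilizer.proxy_rates copies a source crop's derived N rate to a model crop that the FUBC dataset lacks, and manure_n_to_fertilizer scales confined-excretion N to account for storage and handling losses. The following is a minimal sketch of both steps. It makes two assumptions: a crop that already has a FUBC rate keeps it, and a missing proxy source raises an error instead of falling back to zero. The N rates used in the example are illustrative, not FUBC data.

```python
def apply_proxy_rates(n_rates, proxy_rates):
    """Fill N rates for model crops missing from FUBC by copying a proxy crop's rate.

    Raises instead of silently defaulting to zero when the proxy source is missing.
    """
    filled = dict(n_rates)
    for target, source in proxy_rates.items():
        if source not in n_rates:
            raise KeyError(f"proxy source {source!r} for {target!r} has no N rate")
        filled.setdefault(target, n_rates[source])  # keep a real rate if one exists
    return filled


def manure_n_available(excreted_confined_t, fraction=0.75):
    """N from confined excretion usable as fertilizer after storage/handling losses."""
    return excreted_confined_t * fraction


rates = {"banana": 250.0, "maize": 180.0}  # illustrative kg-N/ha values, not FUBC data
proxies = {"plantain": "banana", "silage-maize": "maize"}
print(apply_proxy_rates(rates, proxies))
```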
residues:
max_feed_fraction: 0.30 # Maximum fraction of residues that can be removed for animal feed (remainder must be incorporated into soil)
max_feed_fraction_by_region: # Overrides by ISO3 country code or M49 region/sub-region name (country overrides sub-region overrides region)
Asia: 0.70 # Asia uses a lot of crop residues for feeding; setting this higher helps livestock feed balancing in the model
residues.max_feed_fraction_by_region overrides the global fraction for ISO3 countries or UN M49 regions/sub-regions. Precedence is: country overrides sub-region overrides region.
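The precedence rule can be read as a lookup chain. In this sketch the country's M49 region and sub-region names are passed in directly. How the workflow maps ISO3 codes to M49 names is not shown here, and resolve_feed_fraction is a hypothetical name.

```python
def resolve_feed_fraction(country, sub_region, region, default, overrides):
    """Return the residue max feed fraction for one country.

    Lookup order follows the documented precedence:
    ISO3 country code, then M49 sub-region, then M49 region, then the global default.
    """
    for key in (country, sub_region, region):
        if key in overrides:
            return overrides[key]
    return default


overrides = {"Asia": 0.70}
print(resolve_feed_fraction("IND", "Southern Asia", "Asia", 0.30, overrides))  # 0.7
print(resolve_feed_fraction("FRA", "Western Europe", "Europe", 0.30, overrides))  # 0.3
```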
GAEZ Data Parameters¶
Configures which GAEZ v5 climate scenario and input level to use.
data:
gaez:
# GAEZ v5 parameters
# Note: RES05 (yields/suitability) has ENSEMBLE, but RES02 (growing season) only has individual GCMs
climate_model: "GFDL-ESM4" # Specific GCMs: "GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL"
climate_model_ensemble: "ENSEMBLE" # Multi-model mean (only available for RES05, not RES02)
period: "FP2140" # Future: "FP2140" (2021-2040), "FP4160" (2041-2060), "FP6180" (2061-2080), "FP8100" (2081-2100); Historical: "HP0120" (2001-2020), "HP8100" (1981-2000)
climate_scenario: "SSP126" # "SSP126" (low emissions), "SSP370" (medium-high, ~7.0 W/m² forcing by 2100), "SSP585" (high), "HIST" (historical)
input_level: "H" # "H" (High), "L" (Low)
# Variable codes for GAEZ v5
yield_var: "RES05-YCX" # Average attainable yield, current cropland
water_requirement_var: "RES05-WDC" # Water deficit/net irrigation requirement during crop cycle, current cropland
suitability_var: "RES05-SX1" # Share of grid cell assessed as VS or S (very suitable or suitable)
usda:
# API credentials: configure in config/secrets.yaml or via USDA_API_KEY environment variable
# See config/secrets.yaml.example for setup instructions
retrieve_nutrition: true # Set to true to fetch nutrition data from USDA instead of using the provided data
# Nutrient mapping: internal name -> USDA FoodData Central name
# USDA names must match nutrient names in FoodData Central exactly
nutrients:
protein: "Protein"
carb: "Carbohydrate, by difference"
fat: "Total lipid (fat)"
cal: "Energy"
land_cover:
# ECMWF credentials: configure in config/secrets.yaml or via environment variables
# See config/secrets.yaml.example for setup instructions
version: "v2_1_1"
faostat:
qcl_production_element_code: 5510 # "Production" in tonnes (QCL dataset, covers crops and livestock)
fbs_food_supply_element_code: 645 # "Food supply quantity (kg/capita/yr)" in FBS dataset
fbs_other_uses_element_code: 5154 # "Other uses (non-food)" in 1000 tonnes (FBS dataset)
fbs_production_element_code: 5511 # "Production" in 1000 tonnes (FBS dataset)
soilgrids:
target_resolution_m: 10000 # Target resolution in meters (10000m = 10km)
- Scenarios:
  - SSP126: Strong mitigation (1.5-2°C warming)
  - SSP370: Medium-high emissions (~3-4°C)
  - SSP585: High emissions (~4-5°C)
- Input Levels:
  - H: Modern agriculture (fertilizer, irrigation, pest control)
  - L: Subsistence farming (minimal external inputs)
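The data.gaez comments note that RES05 (yields/suitability) offers a multi-model ENSEMBLE, whereas RES02 (growing season) only has individual GCMs. They also list the period codes. The sketch below shows one plausible way to resolve these settings. It assumes that RES05 variables use climate_model_ensemble and all other variables use climate_model. climate_model_for and PERIOD_YEARS are hypothetical names, and the workflow's retrieval rules may differ.

```python
# GAEZ v5 period codes as listed in the data.gaez.period comment.
PERIOD_YEARS = {
    "HP8100": (1981, 2000),
    "HP0120": (2001, 2020),
    "FP2140": (2021, 2040),
    "FP4160": (2041, 2060),
    "FP6180": (2061, 2080),
    "FP8100": (2081, 2100),
}


def climate_model_for(variable, gaez_cfg):
    """Pick the GCM label for a GAEZ variable code such as "RES05-YCX".

    Assumption: RES05 products use the multi-model ENSEMBLE, while other
    products (e.g. RES02 growing season) use the single configured GCM.
    """
    if variable.startswith("RES05"):
        return gaez_cfg["climate_model_ensemble"]
    return gaez_cfg["climate_model"]


gaez = {"climate_model": "GFDL-ESM4", "climate_model_ensemble": "ENSEMBLE", "period": "FP2140"}
print(climate_model_for("RES05-YCX", gaez), PERIOD_YEARS[gaez["period"]])
```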
Irrigation¶
irrigation:
# Which model crops are allowed to have irrigated production.
# In GAEZ v5, all crops have both irrigated (HILM/LILM) and rainfed (HRLM/LRLM) data available.
# List specific crops here if you want to restrict irrigation, or use "all" for all crops.
irrigated_crops: "all"
# --- section: costs ---
costs:
averaging_period:
start_year: 2015
end_year: 2024
animal_costs:
fadn:
high_cost_threshold_usd_per_t: 50000
livestock_specific_costs:
SE330: "Other livestock specific costs"
shared_farm_costs:
SE340: "Machinery & building current costs"
SE345: "Energy"
SE350: "Contract work"
SE356: "Other direct inputs"
SE360: "Depreciation"
SE370: "Wages paid"
SE380: "Interest paid"
SE390: "Taxes"
grazing_cost_items:
SE310: "Feed for grazing livestock"
SE315: "Feed for grazing livestock home-grown"
exclude_costs:
SE320: "Feed for pigs & poultry"
SE325: "Feed for pigs & poultry home-grown"
SE375: "Rent paid"
usda:
request_timeout_seconds: 120
# Conversion factors: kg per head dressed weight
dressed_weight_kg_per_head:
meat-cattle: 350.0
meat-pig: 90.0
include_items:
- "Hired labor"
- "Opportunity cost of unpaid labor"
- "Bedding and litter"
- "Custom services"
- "Fuel, lube, and electricity"
- "Repairs"
- "Interest on operating capital"
- "Marketing"
- "Veterinary and medicine"
- "Capital recovery of machinery and equipment"
- "General farm overhead"
- "Taxes and insurance"
grazing_cost_items:
- "Grazed feed"
exclude_items:
- "Homegrown harvested feed"
- "Purchased feed"
- "Total, feed costs"
- "Opportunity cost of land"
- "Total, operating costs"
- "Costs listed"
faostat:
aggregate_area_code_limit: 5000
element_codes:
production: ["2510", "5510"]
stocks: ["2111", "5111"]
producing_animals: ["2313", "5318", "5313"]
# Fallbacks for animal products without USDA/FADN data. Resolution order
# in merge_animal_costs.py: source data -> alias -> literature -> zero.
# ``production`` is the non-grazing operating cost and ``grazing`` the
# grazed-forage cost, both in USD per tonne of product (base year),
# mirroring the source-data column convention. See docs/costs.rst for
# the source citations behind each entry.
fallback_aliases:
dairy-buffalo: dairy
fallback_values_usd_per_t:
meat-chicken:
production: 1300
grazing: 0
meat-sheep:
production: 1200
grazing: 2300
crop_costs:
non_endogenous_cost_share: 0.7 # Fraction of revenue (price * yield) attributed to non-endogenous production costs
# Crop-specific upper winsorization on per-ha cost. For each crop, country
# values above this quantile of the non-fallback distribution are clipped
# to that quantile. This removes FAOSTAT greenhouse / data-quality
# outliers (e.g. tomato in cold-climate Europe) without distorting the
# bulk. Null disables the cap. See docs/costs.rst.
outlier_cap_quantile: 0.90
faostat:
price_element_code: 5532 # Producer Price (USD/tonne)
yield_element_code: 5412 # Yield (kg/ha)
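Two rules in the costs section are easier to see in code: the animal-cost fallback order (source data -> alias -> literature -> zero) and the crop-cost winsorization at outlier_cap_quantile. The following is a minimal sketch of both, using hypothetical function names. Two assumptions apply. First, the quantile uses linear interpolation. Second, the cap is applied to every country, because the config does not say whether fallback countries are also clipped. The dairy cost values are illustrative and are not real USDA/FADN figures.

```python
def resolve_animal_cost(product, source, aliases, literature):
    """Return (production, grazing) in USD/t via source -> alias -> literature -> zero."""
    if product in source:
        return source[product]
    alias = aliases.get(product)
    if alias in source:
        return source[alias]
    if product in literature:
        entry = literature[product]
        return (entry["production"], entry["grazing"])
    return (0.0, 0.0)


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def cap_crop_costs(costs_by_country, fallback_countries, q=0.90):
    """Clip per-ha costs above the q-quantile of non-fallback country values."""
    if q is None:  # null disables the cap
        return dict(costs_by_country)
    ref = sorted(v for c, v in costs_by_country.items() if c not in fallback_countries)
    if not ref:
        return dict(costs_by_country)
    cap = quantile(ref, q)
    return {c: min(v, cap) for c, v in costs_by_country.items()}


source = {"dairy": (400.0, 150.0)}  # illustrative USD/t, not real USDA/FADN values
aliases = {"dairy-buffalo": "dairy"}
literature = {"meat-sheep": {"production": 1200, "grazing": 2300}}
print(resolve_animal_cost("dairy-buffalo", source, aliases, literature))  # (400.0, 150.0)
print(resolve_animal_cost("meat-sheep", source, aliases, literature))     # (1200, 2300)
```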
# --- section: cost_calibration ---
# Unified cost calibration corrections extracted from production stability duals.
# Covers crops, grassland, and animals. When enabled, additive cost corrections
# are applied at build time to align model costs with revealed preferences.
cost_calibration:
enabled: true # Apply calibration corrections to production costs
generate: false # Generate calibration from solved model (breaks DAG cycle when true)
scenario: "calibration" # Scenario name used for calibration solve
crop_correction_csv: "data/curated/calibration/crop_cost.csv"
grassland_correction_csv: "data/curated/calibration/grassland_cost.csv"
animal_correction_csv: "data/curated/calibration/animal_cost.csv"
# --- section: prod_stability_calibration ---
# Calibrated L1 production-stability penalty costs (land vs. animal feed).
# Read at solve time whenever ``validation.production_stability.land_l1_cost``
# or ``.animal_feed_l1_cost`` is set to the sentinel string ``"calibrated"``.
# Generated by an in-process Broyden iteration; see
# config/calibration/stability.yaml and tools/calibrate stability.
prod_stability_calibration:
enabled: true # Resolve the "calibrated" sentinel at solve time from the file below
generate: false # Generate the file (breaks DAG cycle when true)
target_deviation_pct: 5.0 # Target deviation (% of baseline)
calibrated_l1_yaml: "data/curated/calibration/prod_stability_l1.yaml"
# Seed used when no previous calibrated YAML is wired in. With a warm-start
# input the seed is overridden by the previous land/animal_feed_l1_cost.
seed_land_l1_cost: 0.1
seed_animal_feed_l1_cost: 0.03
# |log(d/target)|_inf < tolerance -> converged. 0.02 = ±2% relative.
tolerance: 0.02
max_iter: 8
# Per-iteration cap on |Δx|_inf in log-coords. log(2) limits steps to
# at most doubling/halving lambda per iteration.
trust_region_log: 0.693
trace_csv: "<results>/{name}/calibration/prod_stability_trace.csv"
Use irrigation.irrigated_crops to restrict irrigation, for example in water-scarce scenarios, or to explore rainfed-only production.
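The prod_stability_calibration comments describe a convergence test, |log(d/target)|_inf < tolerance, and a log-space trust region for the Broyden iteration. The sketch below illustrates only these two pieces; it does not implement the Broyden update itself. Scaling the whole proposed step, as opposed to clipping each component separately, is an assumption. The step values in the example are hypothetical.

```python
import math


def is_converged(deviations, target=5.0, tol=0.02):
    """|log(d / target)|_inf < tol, i.e. every deviation within roughly ±2% of target."""
    return max(abs(math.log(d / target)) for d in deviations) < tol


def trust_region_update(costs, log_step, trust=0.693):
    """Apply a step in log-coordinates, scaling it so |Δlog x|_inf <= trust.

    trust = log(2) ≈ 0.693 limits each cost to at most doubling or halving.
    """
    biggest = max(abs(s) for s in log_step)
    scale = min(1.0, trust / biggest) if biggest > 0 else 1.0
    return [x * math.exp(scale * s) for x, s in zip(costs, log_step)]


seeds = [0.1, 0.03]  # seed_land_l1_cost, seed_animal_feed_l1_cost
print(is_converged([5.05, 4.95]))  # True: both within 2% in log terms
print(trust_region_update(seeds, [2.0, 0.5]))  # land cost roughly doubles, not e**2
```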
Macronutrients¶
macronutrients: {}
# For each of "carb", "protein", "fat" and "cal" we support "min",
# "max" and "equal" keywords, which are given in g/person/day; see
# example below. Alternatively, use "equal_to_baseline: true" to
# enforce per-country equality at the level implied by each country's
# baseline diet (mutually exclusive with min/max/equal).
# carb:
# min: 250 # g/person/day
# # equal_to_baseline: true # per-country g/person/day from baseline diet
# protein:
# min: 50 # g/person/day
# fat:
# min: 50 # g/person/day
# cal:
# min: 2000 # kcal/person/day
# # equal_to_baseline: true # per-country kcal/person/day from baseline diet
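The macronutrients comments define the allowed keys and state that equal_to_baseline is mutually exclusive with min/max/equal. The following hypothetical validator shows those rules applied to a custom scenario block. The min <= max check is an added assumption. The project's own schema may enforce these rules differently.

```python
NUTRIENTS = {"carb", "protein", "fat", "cal"}
BOUND_KEYS = {"min", "max", "equal"}


def validate_macronutrients(cfg):
    """Check a macronutrients mapping; raises ValueError on an invalid entry."""
    for nutrient, rule in cfg.items():
        if nutrient not in NUTRIENTS:
            raise ValueError(f"unsupported nutrient {nutrient!r}")
        unknown = set(rule) - BOUND_KEYS - {"equal_to_baseline"}
        if unknown:
            raise ValueError(f"{nutrient}: unknown keys {sorted(unknown)}")
        if rule.get("equal_to_baseline") and BOUND_KEYS & set(rule):
            raise ValueError(f"{nutrient}: equal_to_baseline excludes min/max/equal")
        if "min" in rule and "max" in rule and rule["min"] > rule["max"]:
            raise ValueError(f"{nutrient}: min exceeds max")


# Passes: explicit lower bounds in g (or kcal) per person per day.
validate_macronutrients({"cal": {"min": 2000}, "protein": {"min": 50}})
```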
# --- section: sensitivity ---
# Multiplicative adjustment factors for sensitivity analysis. Applied after
# model construction. See config/schemas/config.schema.yaml for structure.
sensitivity: {}
# --- section: byproducts ---
# Foods that are not for direct human consumption (excluded from food group tracking)
byproducts:
- beet-pulp
- wheat-bran
- wheat-germ
- rice-bran
- barley-bran
- oat-bran
- buckwheat-hulls
- oilseed-meal
- palm-kernel-meal
- rapeseed-meal
- ddgs
- molasses
- maize-ethanol
- maize-gluten-feed
- maize-gluten-meal
- maize-starch
- sugarcane-ethanol
- cotton-lint
# --- section: gleam3_feed_attribution ---
# When distributing a GLEAM3 intake bucket (e.g. "By-products" =
# ~200 Mt/yr ruminant DM intake globally) across model feed categories,
# ``compute_gleam3_feed_fractions`` weights each contained model entity
# by its per-country production potential, derived from FAOSTAT crop
# production × the foods.csv pathway factor that produces it. For
# pathways where the realised share of the source crop is materially
# below 1.0 (so the unmodified potential over-states the entity's true
# share of intake), supply a dispatch-share override here. Pathways
# not listed default to 1.0.
gleam3_feed_attribution:
pathway_dispatch_shares:
# Wet-milled corn (HFCS / starch / corn-oil / gluten meal+feed):
# ~15-20 % of US corn × US share of global maize production (~30 %)
# ⇒ roughly 5-8 % of global maize. USDA ERS, Corn and Other Feed
# Grains: https://www.ers.usda.gov/topics/crops/corn-and-other-feed-grains/
maize_wetmill: 0.07
# Dry-milled corn fuel ethanol: ~40 % of US corn × US ~30 % share
# ⇒ ~12 % of global maize. USDA ERS Feed Grains Sector at a Glance.
maize_ethanol: 0.12
# Sugarcane ethanol: Brazil ~50 % of cane to ethanol vs sugar, India
# and most other producers mostly sugar; ~25 % global average. F.O.
# Licht / OECD-FAO Agricultural Outlook 2023-2032 oilseeds chapter.
sugarcane_ethanol: 0.25
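The gleam3_feed_attribution comments describe the weighting: production potential × pathway factor, scaled by a dispatch share that defaults to 1.0. The sketch below normalizes these weights into fractions of one GLEAM3 bucket. The function name, the wheat_milling pathway, and all production numbers are hypothetical. The real compute_gleam3_feed_fractions works per country and may be structured differently.

```python
def gleam3_bucket_fractions(entities, dispatch_shares):
    """Split one GLEAM3 intake bucket across model feed entities.

    entities maps name -> (source_crop_production_t, pathway_factor, pathway).
    Weight = production × pathway factor × dispatch share (default 1.0).
    """
    weights = {
        name: prod * factor * dispatch_shares.get(pathway, 1.0)
        for name, (prod, factor, pathway) in entities.items()
    }
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: w / total for name, w in weights.items()}


shares = {"maize_wetmill": 0.07, "maize_ethanol": 0.12, "sugarcane_ethanol": 0.25}
entities = {
    "ddgs": (100.0, 0.30, "maize_ethanol"),       # illustrative numbers
    "wheat-bran": (50.0, 0.20, "wheat_milling"),  # hypothetical pathway, not listed -> 1.0
}
print(gleam3_bucket_fractions(entities, shares))
```

Without the 0.12 dispatch share, ddgs would take 75% of the bucket in this example. With the share applied, its fraction drops to about 26%, which illustrates why unlisted pathways can over-state an entity's intake share.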
For macronutrients, use min, max, or equal constraints, or equal_to_baseline to pin each country to its baseline-diet level.
Food Groups¶
food_groups:
included:
- whole_grains
- grain
- fruits
- vegetables
- legumes
- nuts_seeds
- starchy_vegetable
- oil
- red_meat
- poultry
- dairy
- eggs
- sugar
- stimulants
- animal_fat
# Optional per-group constraints with "min", "max" or "equal" in g/person/day
constraints: {}
equal_by_country_source: null
# Per-capita consumption caps (g/person/day) applied as e_nom_max on stores.
# Values are set to:
# ceil(2 * max(TMREL, max country-level group consumption))
# using custom baseline diet estimates from processing/{name}/baseline_diet.csv
# and TMREL values from derived health RR curves (where available).
max_per_capita:
whole_grains: 300
grain: 1403
fruits: 658
vegetables: 785
legumes: 300
nuts_seeds: 79
starchy_vegetable: 1221
oil: 155
red_meat: 285
poultry: 241
dairy: 2865
eggs: 213
sugar: 133
stimulants: 50
animal_fat: 50
# Fix relative food contributions within each food group based on baseline
# consumption data. When enabled, the model maintains baseline ratios between
# foods in each group (e.g., if wheat is 60% and rice 40% of grains, that
# ratio is preserved) while allowing total group consumption to vary.
fix_within_group_ratios:
enabled: false
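The max_per_capita values follow ceil(2 * max(TMREL, max country-level group consumption)). The following sketch shows how a cap would be recomputed for one food group. TMREL is optional because it is only available for some groups. The inputs are illustrative and are not taken from baseline_diet.csv.

```python
import math


def max_per_capita_cap(country_consumption_g_day, tmrel_g_day=None):
    """ceil(2 * max(TMREL, max country-level consumption)) in g/person/day."""
    reference = max(country_consumption_g_day)
    if tmrel_g_day is not None:
        reference = max(reference, tmrel_g_day)
    return math.ceil(2 * reference)


print(max_per_capita_cap([10.0, 39.5], tmrel_g_day=25.0))  # 79
print(max_per_capita_cap([12.2, 20.0]))                    # 40
```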
# --- section: weight_conversion ---
# Mass-basis conversion tables, keyed "<from>_to_<to>". Each table maps a
# food (or food group) name to a multiplicative factor; foods not listed
# default to 1.0. Consumed by the diet pipeline (baseline_diet, FLW,
# health RR conversions) and by the animal-product pipeline (FAOSTAT QCL
# carcass → retail conversion and feed→retail ME normalisation).
# Bases recognised model-wide: dry, fresh, cooked, carcass, brewed.
weight_conversion:
# GBD whole_grain is reported on a dry whole-grain basis (the TMREL
# 100-150 g/day is calibrated on dry content); GBD legumes is reported
# on a cooked basis. Convert to model basis (dry) via 0.45 / 0.40.
cooked_to_dry:
grain: 0.45
whole_grains: 0.45
legumes: 0.40
# GBD red_meat is reported on a cooked basis; the model uses raw retail
# mass. Inflation factor 1/0.7 ≈ 1.43 lands GBD red_meat exposure on
# raw retail basis. (Plan: complement with a basis correction to the
# health module's red_meat RR function so attributable burden uses
# cooked-basis exposure end to end.)
cooked_to_fresh:
red_meat: 1.43
poultry: 1.0
# Green-tea-leaf → made (dry) tea: FAO uses 0.22 as the standard
# processing yield (1 kg of green leaf yields ~0.22 kg of dry made
# tea). Applied in the FBS override so the supply-side GAEZ tea yields
# (made-tea basis) and the demand-side baseline consumption (derived
# from FBS green-leaf supply) land in the same basis.
fresh_to_dry:
tea-dried: 0.22
# Carcass → retail (boneless, edible) conversion for meat products.
# Source: OECD-FAO Agricultural Outlook 2023-2032, Meat chapter,
# Box 6.1 ("Edible retail weight"). Cross-reference: USDA Agricultural
# Handbook 697 (1992), Table 7. The food bus carries retail mass, while
# FAOSTAT QCL reports meat in carcass weight — these factors land FBS
# supply (FBS override path), QCL production (prepare_faostat_animal_
# production), implicit FLW (prepare_food_loss_waste), and ME-per-kg
# requirements (build_feed_to_animal_products) on the retail/fresh basis
# the model uses internally. Eggs and dairy aren't listed because their
# FBS supply is already in retail mass (factor 1.0 by default).
carcass_to_fresh:
meat-cattle: 0.67 # OECD-FAO 2023 Box 6.1: Beef 67%
meat-pig: 0.73 # OECD-FAO 2023 Box 6.1: Pigmeat 73%
meat-sheep: 0.66 # OECD-FAO 2023 Box 6.1: Sheep 66%
meat-chicken: 0.60 # OECD-FAO 2023 Box 6.1: Poultry 60%
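weight_conversion tables are keyed "<from>_to_<to>", and foods that are not listed default to 1.0. The following minimal sketch shows a lookup with that behaviour, using a subset of the tables above. conversion_factor is a hypothetical name. The sketch does not cover whether the pipeline also inverts a table for the reverse direction.

```python
weight_conversion = {
    "cooked_to_dry": {"grain": 0.45, "whole_grains": 0.45, "legumes": 0.40},
    "carcass_to_fresh": {"meat-cattle": 0.67, "meat-pig": 0.73,
                         "meat-sheep": 0.66, "meat-chicken": 0.60},
    "fresh_to_dry": {"tea-dried": 0.22},
}


def conversion_factor(table, from_basis, to_basis, food):
    """Multiplicative factor from one mass basis to another; 1.0 if unlisted."""
    if from_basis == to_basis:
        return 1.0
    return table.get(f"{from_basis}_to_{to_basis}", {}).get(food, 1.0)


# 1000 t of beef in carcass weight expressed as retail (fresh) mass: ≈ 670 t
print(1000 * conversion_factor(weight_conversion, "carcass", "fresh", "meat-cattle"))
```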
included lists the food groups tracked by the model. constraints is an
optional mapping where any included group may define min, max, or
equal targets in g/person/day. Leaving constraints empty disables all
food group limits; add entries only for the groups you want to control.
Diet Controls¶
diet:
baseline_age: "All ages"
# Foods whose per-country intake is computed directly from FAOSTAT Food
# Balance Sheet supply rather than disaggregated from GDD/GBD group
# totals. For each listed food the override sets
# intake_g_day = FBS_supply × within_FBS_share × basis_factor
# × (1 − loss_fraction) × (1 − waste_fraction)
# × 1000 / 365
# where ``basis_factor`` is looked up via the shared weight_conversion
# table (e.g. carcass_to_fresh for meats, fresh_to_dry for tea-dried).
# The (1 − loss_fraction) × (1 − waste_fraction) terms mirror the FLW
# correction that the build_model animal_production and food_processing
# links apply on the production side. This anchors the diet to the same
# FAOSTAT backbone as baseline production.
#
# Meats, poultry, and eggs are anchored to FBS because slaughter-volume
# supply is more reliable than self-reported intake (which is known to
# over-report red meat in particular). Dairy is intentionally NOT in the
# list: its food_loss_waste convention is non-standard (waste=0.30
# already lumps in the non-food fraction of raw milk), so the GDD-based
# disaggregation happens to mass-balance against the production-side
# FLW; switching it to an FBS override would break that balance.
fbs_override_foods:
- yam
- cocoa-powder
- coffee-green
- tea-dried
- meat-cattle
- meat-pig
- meat-sheep
- meat-chicken
- eggs
# NHANES "What We Eat in America" / FPED demographic table.
# Used as a region-specific intake override for the United States. The
# `cycle` corresponds to the FPED release cycle in the URL (e.g. 1720 for
# the 2017-March 2020 Prepandemic release). The `url` template embeds the
# cycle and is filled in by the retrieval rule.
nhanes:
cycle: "1720"
url: "https://www.ars.usda.gov/ARSUserFiles/80400530/pdf/fped/Table_1_FPED_MaleFemale_{cycle}.pdf"
reference_year: 2018
# GBD-covered risk groups (fruits, vegetables, whole_grains, legumes,
# nuts_seeds, red_meat) are anchored to GBD's reported intake when
# available, so the model's attributable disease burden is consistent
# with what GBD itself estimates from its dietary risk exposure data.
# GDD/FAOSTAT serves as the fallback when GBD is missing for a country.
# Per-food-group native mass basis declared by each external dietary
# data source. The diet pipeline compares the source's basis to the
# food's basis (data/curated/food_basis.csv) and applies the matching
# factor from the weight_conversion table when they differ. This replaces
# the older scattered conversion flags.
#
# Notes:
# - GDD reports food intake "as consumed" (cooked weight for cereals/
# legumes/meats; fresh weight for fruits/vegetables/dairy).
# - GBD shares basis with its IHME RR dose-response curves. Most
# groups follow GDD; whole_grains is the documented exception
# (calibrated on dry whole-grain content; the GBD TMREL of 100-150
# g/day is on a dry basis -- at cooked weight it would be trivially
# met, contradicting GBD's own findings).
# - FAOSTAT FBS supply is in raw agricultural commodity terms;
# sugar (item 2542) is "Sugar Raw Equivalent" (dry); dairy is
# in milk-equivalent fresh weight; meat is carcass weight which
# we convert to retail (raw fresh) before declaring basis.
# - NHANES values pass through FPED ounce-equivalents: a hybrid
# basis we leave untouched.
source_basis:
# GDD-IA mass is derived from kcal (g = kcal_ia / kcal_per_g_model_basis)
# in prepare_gdd_ia_dietary_intake, so it's already in model basis
# by construction. No `gdd_ia` source_basis entries needed.
gbd:
whole_grains: dry
legumes: cooked
red_meat: cooked
fruits: fresh
vegetables: fresh
nuts_seeds: dry
milk: fresh
# FBS Food-supply "primary equivalents" each carry their own basis
# (cocoa beans, green coffee, milk-equiv, raw sugar, etc.). For most
# foods the FBS basis happens to match the model's food_basis and no
# entry is needed (the override pipeline applies factor 1.0).
# Exceptions:
# - Meats: FBS reports carcass weight; food_basis declares retail
# ("fresh"). Converted via weight_conversion.carcass_to_fresh.
# - Tea: FBS item 2635 ("Tea incl. mate") supply is green leaf;
# food_basis declares tea-dried as dry. Converted via
# weight_conversion.fresh_to_dry.
faostat_fbs_supply:
meat-cattle: carcass
meat-pig: carcass
meat-sheep: carcass
meat-chicken: carcass
tea-dried: fresh
# NHANES is intentionally absent: FPED ounce-equivalents are a
# hybrid basis (flour-content for breads, cooked weight for rice)
# that doesn't admit a single conversion factor, so the helper
# returns 1.0 (no conversion) when a source isn't declared.
# Per-(source, country, food_group) basis overrides live in
# data/curated/diet_source_basis_overrides.csv. The CSV is read
# alongside the global source_basis declaration above; rows in the
# CSV take precedence per (source, country, food_group). Keep
# high-volume per-country exceptions in the CSV (it scales better
# than a YAML block) and reserve this YAML for the cross-cutting
# global defaults.
# GDD-IA pipeline configuration.
gdd_ia:
# GDD-IA reports meat in cooked weight (implied kcal/g for beef
# ≈ 2.4, between raw 2.15 and cooked 2.50). Apply cooked-to-raw
# inflation 1/0.7 ≈ 1.43 to land on raw retail (model basis).
cooked_to_raw:
red_meat: 1.43
# Extra per-country proxies on top of the built-in COUNTRY_PROXIES
# in prepare_gdd_ia_dietary_intake.py. Mapping country → proxy code.
country_proxies: {}
Customize baseline_age if you pre-process alternative cohorts for the baseline
diet. The reference year is controlled by the top-level baseline_year parameter.
These values are used whenever validation.enforce_baseline_diet is set to true.
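For foods in diet.fbs_override_foods, intake follows the formula in the comments above. FBS food supply (element 645) is reported in kg/capita/yr, so the factor 1000 / 365 converts it to g/day. The sketch below evaluates the formula. The beef inputs, including the loss and waste fractions, are illustrative and are not values from the model's FLW data.

```python
def fbs_override_intake_g_day(
    fbs_supply_kg_per_cap_yr, within_fbs_share, basis_factor, loss_fraction, waste_fraction
):
    """Per-capita intake (g/day) from FAOSTAT FBS food supply (kg/capita/yr)."""
    return (
        fbs_supply_kg_per_cap_yr
        * within_fbs_share
        * basis_factor
        * (1 - loss_fraction)
        * (1 - waste_fraction)
        * 1000 / 365
    )


# Illustrative inputs only: 20 kg/cap/yr carcass beef, all of the FBS item,
# carcass_to_fresh 0.67, 5% loss and 10% waste (hypothetical FLW values).
print(round(fbs_override_intake_g_day(20.0, 1.0, 0.67, 0.05, 0.10), 1))  # ≈ 31.4 g/day
```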
Biomass¶
biomass:
crops:
- maize
- oil-palm
- sugarcane
- biomass-sorghum
marginal_values_usd_per_tonne: 0 # USD_2024 per tonne dry matter exported to the energy sector
enforce_baseline_demand: true # Enforce baseline biofuel/industrial and biogas demand
biofuel_demand_scale: 1.0 # Solve-time multiplier on enforced biofuel/industrial and biogas demand
biogas_crop_demand: "data/curated/biogas_crop_demand.csv" # Biogas crop demand (silage maize); null to disable
enforce_fiber_demand: true # Enforce baseline fiber demand (cotton lint) from FAOSTAT FBS
# Foods that stay part of the diet but can additionally be routed to biomass
# for disposal. Without this outlet the model dumps surplus via food slack
# (visible as large negative duals on the food consumption equality).
# Two patterns of "real-world non-food demand" justify a disposal route:
# - Forced co-products of non-food commodity demand (e.g. cottonseed-oil
# is jointly produced when cotton is grown for fiber).
# - Crops where actual production exceeds what the modelled diet absorbs,
# reflecting unrepresented uses (post-harvest losses beyond food-group
# waste factors, birdseed/forage, coir/coconut by-products, etc.).
# See docs/configuration.rst for guidance on adding to this list.
disposal_foods:
- cottonseed-oil
- sesame-oil
- sesame-seed
- groundnut-oil
- groundnut
- coconut-oil
- coconut
- foxtail-millet
# Major oilseed oils: disposal lets the LP process more oilseed to meet
# the meal-driven protein-feed demand without being constrained by the
# fixed enforce_baseline_diet oil consumption (real-world soybean/rapeseed/
# sunflower processing is meal-driven, with surplus oil going to biofuel
# or industrial uses that the model captures via this sink).
- soybean-oil
- rapeseed-oil
- sunflower-oil
- chickpea
- gram
- rendered-fat # tallow/lard yielded as a co-product of cattle/pig animal_production; ~50% goes to industrial uses (soap, biofuel) not captured in IA food intake
Per-country biomass buses track dry-matter exports to the energy sector. All foods
listed under byproducts gain links to this bus, providing a disposal route for
byproducts that lack feed mappings. Crops listed in biomass.crops can be diverted
directly as feedstocks. The marginal_values_usd_per_tonne parameter
(USD2024 per tonne dry matter) sets the price received when biomass leaves the
food system; set to 0 for free disposal.
Foods listed under biomass.disposal_foods get an additional link from their food
bus to the country’s biomass bus, but unlike byproducts they remain part of the
diet and food-group tracking. This route is intended for foods where actual production
exceeds what the modelled diet absorbs, leaving the optimizer no realistic outlet for
the surplus other than food-balance slack. Two patterns are common:
- Forced co-products of non-food commodity demand, e.g. cottonseed oil is a fixed-coefficient byproduct of cotton ginning when cotton is grown for fiber.
- Crops where real-world production includes uses the model does not represent: birdseed and forage for foxtail-millet, post-harvest losses beyond the food-group waste factors for sesame, coir/charcoal/husk uses for coconut, and whole-peanut feed use beyond what the oilseed-meal pool captures for groundnut.
Without a disposal route the consumption equality on these foods would be satisfied
by food slack at validation.slack_marginal_cost, which inflates the objective and,
more importantly, drives the dual variables of the consumption equality strongly
negative — which biases consumer-value calibration (see Consumer Values).
The amount actually routed to biomass in a baseline solve is itself a useful diagnostic
of the gap between baseline production and modelled outlets; it can be inspected via
the biomass_disposal carrier on links in the solved network.
When enforce_baseline_demand is true, biofuel and biogas crop demand is fixed at
baseline levels. Each biofuel link is created with p_nom equal to baseline demand
and p_min_pu = 1.0, forcing flow to match demand exactly. Two sources of demand
are combined:
- Biofuel/industrial demand from the FAOSTAT Food Balance Sheets (Other uses element), routed via food buses. This captures ethanol (maize grain, sugarcane) and biodiesel (vegetable oils) demand.
- Biogas crop demand from biogas_crop_demand (default: data/curated/biogas_crop_demand.csv), routed directly from crop buses. This captures whole-crop silage maize diverted to anaerobic digestion for biogas production. Set biogas_crop_demand to null to disable.
| Country | Crop | Demand (Mt DM) | Source |
|---|---|---|---|
| DEU | silage-maize | 14.85 | FNR 2024: 900 kha biogas maize × ~47 t FM/ha × 35% DM [1] |
| ITA | silage-maize | 2.40 | ISAAC/CIB: ~125 kha biogas maize in Po Valley × ~55 t FM/ha [2] |
| AUT | silage-maize | 0.25 | Austrian Biomass Association: ~20 kha estimated [3] |
| CZE | silage-maize | 0.42 | Czech Biogas Association: ~40 kha [4] |
Countries with negligible or zero biogas crop demand are omitted (zero by default). Denmark banned crop-based biogas feedstock; France caps it at 15%; Poland, Netherlands, and Belgium use manure-dominant systems.
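The DEU entry in the biogas table is derived from area × fresh-matter yield × dry-matter share. The sketch below reproduces that arithmetic. biogas_crop_demand_mt_dm is a hypothetical helper, and the 35% DM default is taken from the DEU source note only. The result of about 14.8 Mt DM is close to the tabulated 14.85, and the difference reflects the rounded ~47 t FM/ha yield.

```python
def biogas_crop_demand_mt_dm(area_kha, yield_t_fm_per_ha, dm_share=0.35):
    """Dry-matter demand (Mt DM) from biogas maize area, fresh-matter yield and DM share."""
    fresh_matter_t = area_kha * 1_000 * yield_t_fm_per_ha
    return fresh_matter_t * dm_share / 1e6


# DEU row: 900 kha × ~47 t FM/ha × 35% DM
print(round(biogas_crop_demand_mt_dm(900, 47), 1))  # 14.8
```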
When enforce_fiber_demand is true, baseline fiber demand (cotton lint) is enforced
via per-country fiber buses and fixed-capacity stores. Each country with positive
demand gets a fiber:{country} bus and a store:fiber:cotton-lint:{country} store
whose capacity equals the FAOSTAT-derived demand. The store bounds
(e_min_pu = e_max_pu = 1.0) force the store level to equal demand exactly, so
cotton lint production must match baseline fiber consumption. Cotton lint is excluded
from biomass byproduct routing when fiber demand is enforced to prevent double-counting.
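Both fixed-demand mechanisms above rely on PyPSA-style per-unit bounds: a link's flow satisfies p_min_pu * p_nom <= p <= p_max_pu * p_nom, and a store's level satisfies e_min_pu * e_nom <= e <= e_max_pu * e_nom. The plain-Python sketch below (no PyPSA dependency) shows why setting the lower per-unit bound to 1.0 pins the value to the nominal capacity, assuming PyPSA's default p_max_pu of 1.0 for links, and builds the fiber bus and store names described above. The helper names and demand figures are hypothetical.

```python
from dataclasses import dataclass


def link_flow_bounds(p_nom: float, p_min_pu: float, p_max_pu: float = 1.0) -> tuple[float, float]:
    """Feasible flow interval: p_min_pu * p_nom <= p <= p_max_pu * p_nom."""
    return (p_min_pu * p_nom, p_max_pu * p_nom)


@dataclass(frozen=True)
class FixedDemandStore:
    """Hypothetical record for one country's fiber store."""
    bus: str
    name: str
    e_nom: float  # baseline cotton-lint demand (store capacity)
    e_min_pu: float = 1.0
    e_max_pu: float = 1.0

    def level_bounds(self) -> tuple[float, float]:
        """Feasible level interval: e_min_pu * e_nom <= e <= e_max_pu * e_nom."""
        return (self.e_min_pu * self.e_nom, self.e_max_pu * self.e_nom)


def build_fiber_stores(demand: dict[str, float]) -> list[FixedDemandStore]:
    """One bus and one pinned store per country with positive demand."""
    return [
        FixedDemandStore(bus=f"fiber:{country}", name=f"store:fiber:cotton-lint:{country}", e_nom=value)
        for country, value in sorted(demand.items())
        if value > 0
    ]


# With p_min_pu = 1.0 (and the default p_max_pu = 1.0) the interval collapses to p_nom.
print(link_flow_bounds(p_nom=3.2, p_min_pu=1.0))
# Hypothetical demand figures; countries with zero demand get no bus or store.
for store in build_fiber_stores({"CHN": 7.5, "IND": 5.0, "ISL": 0.0}):
    print(store.bus, store.name, store.level_bounds())
```

Collapsing the interval to a single point is what makes these constraints equality constraints in the optimisation, which is also why an infeasible baseline shows up as slack rather than as reduced flow.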
Animal Products¶
animal_products:
include:
- meat-cattle
- meat-pig
- meat-chicken
- dairy
- eggs
- dairy-buffalo
- meat-sheep
# GLEAM 3.0 production system → model product mapping.
# Defines which model products each (Animal, LPS) system contributes to.
# Multi-product systems (e.g. cattle grazing → dairy + meat) are split
# using FCR-weighted shares in the feed baseline.
#
# Sheep/goat milk is proxied through "dairy" (cattle milk pathway) rather
# than modeled as a separate product. At ~3-4% of global milk production
# the volume doesn't justify a distinct product with its own efficiency,
# emissions, and nutritional profile. The feed accounting is still correct:
# GLEAM3 sheep/goat feed intake is captured in the dairy baseline, and
# FAOSTAT dairy production (see faostat_items.dairy) includes sheep/goat
# milk. The production-based scaling step reconciles any efficiency mismatch.
gleam3_system_product_map:
Cattle:
Grassland: [dairy, meat-cattle]
Mixed: [dairy, meat-cattle]
Feedlots: [meat-cattle]
Buffalo:
Grassland: [dairy-buffalo, meat-cattle]
Mixed: [dairy-buffalo, meat-cattle]
Sheep:
Grassland: [dairy, meat-sheep]
Mixed: [dairy, meat-sheep]
Goats:
Grassland: [dairy, meat-sheep]
Mixed: [dairy, meat-sheep]
Chicken:
Broiler: [meat-chicken]
Layer: [eggs]
Backyard: [eggs, meat-chicken]
Pigs:
Backyard: [meat-pig]
Intermediate: [meat-pig]
Industrial: [meat-pig]
# For multi-product species (cattle, buffalo, chicken, sheep/goats),
# the Wirsenius scaling factor f splits total GLEAM3 feed between
# co-products. Countries with f far from the regional median likely
# reflect GLEAM3 data quality issues rather than real efficiency
# differences. This factor clamps f to [median/k, median*k] where k
# is the value below. Set to a large value (e.g. 100) to disable.
me_scaling_clamp_factor: 2.0
# Ruminant net-to-metabolizable energy conversion efficiency factors.
# Used to convert Wirsenius (2000) net-energy values to metabolizable
# energy: ME_required = NE_m/k_m + NE_g/k_g (+ NE_l/k_l for dairy).
# k_m and k_g follow the NASEM (2016) / NRC (2000) Beef Cattle California
# Net Energy System equations evaluated at a typical mixed-diet
# metabolizability q = ME/GE ≈ 0.60: k_m ≈ 0.65, k_g ≈ 0.43.
# k_l is the fixed efficiency from NRC (2001) Dairy Cattle, 7th rev. ed.
net_to_metabolizable_energy_conversion:
k_m: 0.65 # Maintenance efficiency (NASEM 2016 / NRC 2000 Beef)
k_g: 0.43 # Growth efficiency (NASEM 2016 / NRC 2000 Beef)
k_l: 0.64 # Lactation efficiency, dairy (NRC 2001 Dairy)
# Carcass-to-retail meat conversion factors live in the top-level
# `weight_conversion.carcass_to_fresh` table (single source of truth used
# by the animal-production, food-loss-waste, and baseline-diet pipelines).
# Animal-production co-product foods. Each entry maps a co-product food
# name to per-source-product yields (Mt co-product per Mt primary retail
# product). Source products not listed yield zero co-product. The bus
# ``food:{co-product}:{country}`` is auto-created in the food_list so
# the co-product can be consumed, traded, or routed to disposal.
# Rendered fat (tallow / lard): typical carcass-fat fractions
# (tallow ~6% of cattle carcass / carcass_to_retail 0.67 = 0.090;
# lard ~10% / 0.73 = 0.137). Sources: USDA AH 697; FAOSTAT FBS
# item 2737 cross-check.
co_products:
rendered-fat:
yield_per_retail:
meat-cattle: 0.090
meat-pig: 0.137
# FAOSTAT QCL item names to aggregate for each model product.
# First item is the primary product; additional items are proxied species
# whose production is lumped into the model product.
faostat_items:
dairy:
- "Raw milk of cattle"
- "Raw milk of goats" # proxy: goat milk → dairy
- "Raw milk of sheep" # proxy: sheep milk → dairy
- "Raw milk of camel" # proxy: camel milk → dairy
meat-cattle:
- "Meat of cattle with the bone, fresh or chilled"
- "Meat of buffalo, fresh or chilled" # proxy: buffalo → cattle
meat-pig:
- "Meat of pig with the bone, fresh or chilled"
meat-chicken:
- "Meat of chickens, fresh or chilled"
- "Meat of ducks, fresh or chilled" # proxy: duck → chicken
- "Meat of turkeys, fresh or chilled" # proxy: turkey → chicken
- "Meat of pigeons and other birds n.e.c., fresh, chilled or frozen"
eggs:
- "Hen eggs in shell, fresh"
dairy-buffalo:
- "Raw milk of buffalo"
meat-sheep:
- "Meat of sheep, fresh or chilled"
- "Meat of goat, fresh or chilled" # proxy: goat → sheep
residue_crops:
- banana
- barley
- chickpea
- cowpea
- dry-pea
- dryland-rice
- foxtail-millet
- gram
- maize
- oat
- pearl-millet
- phaseolus-bean
- pigeonpea
- rye
- sorghum
- sugarcane
- wetland-rice
- wheat
fodder_decomposition:
fdd_crops:
- alfalfa
- silage-maize
eurostat:
averaging_years: 5
suitability_blend_weight: 0.7
yield_corrections:
enabled: true
eurostat_moisture: 0.65
floor: 0.2
ceiling: 2.0
grazing:
enabled: true
isimip_utilization_rate: 0.60 # Applied to ISIMIP yields in merge step
forage_overlap_crops:
- alfalfa
- silage-maize
- biomass-sorghum
grassland_forage_calibration:
enabled: true
generate: false
grassland_yield_correction: "data/curated/calibration/grassland_yield.csv"
fodder_conversion_correction: "data/curated/calibration/fodder_conversion.csv"
exogenous_forage: "data/curated/calibration/exogenous_forage.csv"
scenario: "default"
# Protein-feed calibration: per-country exogenous monogastric/ruminant
# protein supply (Mt DM) derived from positive feed-bus slack on an
# uncalibrated validation solve. Stands in for protein sources the model
# does not produce endogenously: fishmeal, synthetic amino acids, and
# animal by-products (meat & bone meal, blood, feather meal). When a
# scenario has ``enabled: true`` the CSV is read at solve time and free
# generators up to the listed cap are added to each country's protein
# feed bus (in validation/enforce_baseline_feed mode dispatch is forced).
# See docs/calibration.rst for the full discussion.
feed_protein_calibration:
enabled: true
generate: false
exogenous_protein: "data/curated/calibration/exogenous_protein.csv"
scenario: "default"
# Food waste calibration: a per-food-group multiplier on (1 - waste_fraction)
# applied uniformly across countries in prepare_food_loss_waste.py. Derived
# from food-bus slack on an uncalibrated validation solve (see
# config/calibration/food_waste.yaml and tools/calibrate food_waste). Only
# the consumer-side waste fraction is adjusted; the producer-side loss
# fraction is left to its FBS/SDG default.
food_loss_waste_calibration:
enabled: true
generate: false
calibration_file: "data/curated/calibration/food_waste.yaml"
food_groups:
# Groups with documented FBS-vs-GDD gap that the SDG-based defaults
# under- or over-state. The SDG global all-foods 10% waste rate fits
# poorly for several groups; the calibrated multipliers below pull
# each group's effective waste fraction toward what the GDD-vs-FBS
# comparison and FAO Save Food 2011 literature suggest:
# - vegetables / fruits: large excess under SDG (~10% waste vs
# ~50% implied by FAO Save Food / FBS-to-GDD ratios).
# - starchy_vegetable / oil / stimulants: small shortage; SDG
# over-states consumer waste for storage-stable roots/tubers
# and concentrated commodities (oils, coffee/cocoa).
- vegetables
- fruits
- starchy_vegetable
- oil
- stimulants
scenario: "default"
# Food demand calibration: a per-food global multiplier on the baseline-diet
# target_mt applied uniformly across countries at solve time (see
# ``_match_baseline_to_consume_links`` in workflow/scripts/solve_model/core.py).
# Derived from per-food slack on an uncalibrated validation-mode solve (see
# config/calibration/food_demand.yaml and ``tools/calibrate food_demand``).
# Closes the residual per-food global gap that remains between FAOSTAT
# QCL-derived supply and the GDD-IA-derived demand after food-waste
# calibration, so downstream cost and production-stability calibrations
# see a self-consistent baseline.
food_demand_calibration:
enabled: true
generate: false
calibration_file: "data/curated/calibration/food_demand.csv"
# Bounds on the per-food multiplier. Tight on purpose: anything that
# would fall outside flags a structural data issue worth investigating
# rather than being silently absorbed.
min_multiplier: 0.5
max_multiplier: 2.0
scenario: "uncalibrated"
Disable grazing to force intensive feed-based systems.
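The comments in the animal_products block above describe three small calculations: the net-to-metabolizable energy conversion, the clamp on the Wirsenius scaling factor f, and co-product yields per unit of primary retail product. The sketch below restates them under stated assumptions: energies are in any consistent unit, the clamp receives the factors of countries within one region (how regions are grouped is not shown here), and all function names and input numbers are hypothetical.

```python
from statistics import median

# Defaults from net_to_metabolizable_energy_conversion.
K_M, K_G, K_L = 0.65, 0.43, 0.64


def metabolizable_energy_required(ne_m: float, ne_g: float, ne_l: float = 0.0) -> float:
    """ME_required = NE_m/k_m + NE_g/k_g (+ NE_l/k_l for dairy)."""
    return ne_m / K_M + ne_g / K_G + ne_l / K_L


def clamp_scaling_factors(factors: dict[str, float], k: float = 2.0) -> dict[str, float]:
    """Clamp each country's scaling factor f to [median/k, median*k].

    `factors` holds the factors of the countries in one region; k mirrors
    me_scaling_clamp_factor (a large k such as 100 effectively disables the clamp).
    """
    m = median(factors.values())
    lower, upper = m / k, m * k
    return {country: min(max(f, lower), upper) for country, f in factors.items()}


def co_product_mt(primary_retail_mt: dict[str, float], yield_per_retail: dict[str, float]) -> float:
    """Co-product output (Mt); source products without a listed yield contribute zero."""
    return sum(mt * yield_per_retail.get(product, 0.0) for product, mt in primary_retail_mt.items())


# Hypothetical inputs for illustration only.
print(metabolizable_energy_required(ne_m=30.0, ne_g=10.0))
print(clamp_scaling_factors({"A": 1.0, "B": 2.0, "C": 3.0, "D": 20.0, "E": 0.1}))
rendered_fat = co_product_mt(
    {"meat-cattle": 10.0, "meat-pig": 10.0, "meat-chicken": 5.0},
    {"meat-cattle": 0.090, "meat-pig": 0.137},
)
print(rendered_fat)
```

In the clamp example the median is 2.0, so factors are bounded to [1.0, 4.0]: D drops from 20.0 to 4.0 and E rises from 0.1 to 1.0, while A, B, and C are unchanged. Chicken contributes no rendered fat because it has no entry under yield_per_retail.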
Commodity Configuration (Trade and Marketing Costs)¶
# Unified commodity-class framework. Every modelled crop, food (including
# animal products and byproducts), and feed category is assigned to exactly
# one class. Each class carries two cost parameters:
#
# trade_cost_per_t_km -- USD_2024 per tonne per km, charged on inter-hub
# and country-to-hub trade links
# marketing_cost_per_t -- USD_2024 per tonne, one-shot farm-to-wholesale
# marketing markup charged on the production link
# of the commodity (crop_production, food_processing,
# feed_conversion, animal_production)
#
# No fallbacks: a config that fails to assign a commodity to a class will be
# rejected by ``workflow/validation/commodities.py``. See ``docs/costs.rst``
# (Marketing costs) and ``docs/food_processing.rst`` (Trade costs) for the
# literature behind the default magnitudes.
commodities:
hubs: 20 # Number of trade hubs (shared across crop, food, and feed trade)
crops:
non_tradable: [alfalfa, biomass-sorghum, silage-maize]
classes:
bulk_dry_goods:
trade_cost_per_t_km: 0.006 # USD_2024 / t / km
marketing_cost_per_t: 30 # USD_2024 / t (farm to grain elevator / first-handler wholesale)
items:
- wheat
- dryland-rice
- wetland-rice
- maize
- barley
- oat
- rye
- sorghum
- buckwheat
- foxtail-millet
- pearl-millet
- soybean
- dry-pea
- chickpea
- cowpea
- gram
- phaseolus-bean
- pigeonpea
- cocoa
- coffee
- tea
- sunflower
- rapeseed
- groundnut
- sesame
- cotton
bulky_fresh:
trade_cost_per_t_km: 0.014
marketing_cost_per_t: 60 # USD_2024 / t (bulky low-value perishable; cassava, potato, sugar crops, biomass)
items:
- white-potato
- sweet-potato
- cassava
- yam
- plantain
- sugarbeet
- sugarcane
- oil-palm
- alfalfa
- silage-maize
- biomass-sorghum
perishable_high_value:
trade_cost_per_t_km: 0.022
marketing_cost_per_t: 200 # USD_2024 / t (fragile fresh produce; cooling, packing, fast logistics)
items:
- tomato
- carrot
- onion
- cabbage
- banana
- watermelon
- mango
- citrus
- coconut
- apple
- olive
foods:
non_tradable: []
classes:
processed_dry_goods:
trade_cost_per_t_km: 0.012 # USD_2024 / t / km (bulk milled / dried goods)
marketing_cost_per_t: 80 # USD_2024 / t (mill + wholesale margin on staple processed foods)
items:
- flour-white
- flour-wholemeal
- rice-brown
- rice-white
- barley-hulled
- buckwheat
- foxtail-millet
- maize
- oat
- pearl-millet
- rye
- sorghum
- chickpea
- cowpea
- dry-pea
- gram
- phaseolus-bean
- pigeon-pea
- soy
- groundnut
- sesame-seed
- sunflower-seed
- sugar
- cocoa-powder
- coffee-green
- tea-dried
processed_oils:
trade_cost_per_t_km: 0.016 # USD_2024 / t / km (liquid bulk; tankers / drums)
marketing_cost_per_t: 120 # USD_2024 / t (extraction + refining + wholesale)
items:
- coconut-oil
- cottonseed-oil
- groundnut-oil
- olive-oil
- palm-oil
- rapeseed-oil
- sesame-oil
- soybean-oil
- sunflower-oil
fresh_produce:
trade_cost_per_t_km: 0.022
marketing_cost_per_t: 200 # USD_2024 / t (fresh fruits, vegetables, roots delivered as food)
items:
- apple
- banana
- citrus
- coconut
- mango
- plantain
- watermelon
- cabbage
- carrot
- onion
- tomato
- cassava
- potato
- sweet-potato
- yam
chilled_meat:
trade_cost_per_t_km: 0.028
marketing_cost_per_t: 800 # USD_2024 / t retail-equiv (slaughter + packing margin; USDA ERS 2023)
items:
- meat-cattle
- meat-pig
- meat-chicken
- meat-sheep
dairy_and_eggs:
trade_cost_per_t_km: 0.024
marketing_cost_per_t: 300 # USD_2024 / t (dairy processing + packaging + cold chain)
items:
- dairy
- dairy-buffalo
- eggs
feed_byproduct:
trade_cost_per_t_km: 0.008 # USD_2024 / t / km (bulk dry feed-grade)
marketing_cost_per_t: 30 # USD_2024 / t (low-value bulk byproducts routed to feed)
items:
- barley-bran
- beet-pulp
- buckwheat-hulls
- oat-bran
- rice-bran
- wheat-bran
- wheat-germ
- ddgs
- maize-gluten-feed
- maize-gluten-meal
- oilseed-meal
- palm-kernel-meal
- rapeseed-meal
- molasses
industrial_byproduct:
trade_cost_per_t_km: 0.008
marketing_cost_per_t: 40 # USD_2024 / t (bulk fiber / biofuel / co-product exports)
items:
- cotton-lint
- maize-ethanol
- maize-starch
- sugarcane-ethanol
- rendered-fat
feeds:
non_tradable: [ruminant_forage]
classes:
grain_protein:
trade_cost_per_t_km: 0.006 # matches crop bulk_dry_goods (concentrated, dense)
marketing_cost_per_t: 30
items:
- ruminant_grain
- ruminant_protein
- monogastric_grain
- monogastric_protein
forage:
trade_cost_per_t_km: 0.012 # 2x grain cost
marketing_cost_per_t: 25
items:
- ruminant_forage
bulky_low_quality:
trade_cost_per_t_km: 0.016 # ~2.7x grain cost
marketing_cost_per_t: 35
items:
- ruminant_roughage
- monogastric_low_quality
The commodities block carries both the inter-hub trade cost (trade_cost_per_t_km,
USD_2024 per tonne per km) and the farm-to-wholesale marketing markup
(marketing_cost_per_t, USD_2024 per tonne) for every modelled commodity.
Every crop in crops:, every modelled feed category, and every food (including
animal products and byproducts) must appear in exactly one class. The strict
assignment is enforced by workflow/validation/commodities.py – there is no
default fallback. See Production Costs for the literature behind the default magnitudes
and Default marketing-cost parameters (USD_2024 per tonne) for the class-by-class table.
Increase trade_cost_per_t_km to explore localized food systems, or decrease it to explore more globalized trade. The marketing_cost_per_t parameter adds a farm-to-wholesale cost layer on top of farm-gate production costs; raising it widens the gap between farm-gate costs and effective commodity prices in the optimiser.
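The two cost parameters combine with quantities in a straightforward way: trade cost is trade_cost_per_t_km × tonnes × km on a trade link, and marketing cost is marketing_cost_per_t × tonnes on the production link. The sketch below shows that arithmetic together with an exactly-one-class check of the kind the text describes. It is a hypothetical stand-in, not the code in workflow/validation/commodities.py; the class subset and the 5,000 km distance are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommodityClass:
    trade_cost_per_t_km: float   # USD_2024 per tonne per km
    marketing_cost_per_t: float  # USD_2024 per tonne
    items: tuple[str, ...]


def build_class_index(classes: dict[str, CommodityClass], commodities: list[str]) -> dict[str, str]:
    """Map each commodity to its class, rejecting duplicates and unassigned items."""
    index: dict[str, str] = {}
    for name, cls in classes.items():
        for item in cls.items:
            if item in index:
                raise ValueError(f"{item!r} is in both {index[item]!r} and {name!r}")
            index[item] = name
    missing = [c for c in commodities if c not in index]
    if missing:
        raise ValueError(f"no class assigned for: {missing}")
    return index


def trade_cost_usd(cls: CommodityClass, tonnes: float, distance_km: float) -> float:
    """Cost on a trade link: USD/t/km * t * km."""
    return cls.trade_cost_per_t_km * tonnes * distance_km


def marketing_cost_usd(cls: CommodityClass, tonnes: float) -> float:
    """One-shot farm-to-wholesale markup: USD/t * t."""
    return cls.marketing_cost_per_t * tonnes


classes = {
    "bulk_dry_goods": CommodityClass(0.006, 30, ("wheat", "maize", "soybean")),
    "perishable_high_value": CommodityClass(0.022, 200, ("tomato", "banana")),
}
index = build_class_index(classes, ["wheat", "tomato"])
wheat = classes[index["wheat"]]
print(trade_cost_usd(wheat, tonnes=1_000, distance_km=5_000))
print(marketing_cost_usd(wheat, tonnes=1_000))
```

Shipping 1,000 t of wheat over 5,000 km costs about 30,000 USD in trade, and its marketing markup adds another 30,000 USD. Raising either class parameter scales these linearly, which is why the class assignment has to be unambiguous.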
Emissions Pricing¶
emissions:
ghg_pricing_enabled: true # Whether to include GHG pricing in the objective function
ghg_price: 200 # USD_2024/tCO2-eq (emissions stored in MtCO2-eq internally)
ch4_to_co2_factor: 27.0 # IPCC AR6 GWP100 (WG1, Chapter 7, Table 7.15; https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-7/)
n2o_to_co2_factor: 273.0 # IPCC AR6 GWP100 (WG1, Chapter 7, Table 7.15; https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-7/)
rice:
methane_emission_factor_kg_per_ha: 134.47 # kg CH4 per ha per crop (IPCC 2019 Refinement, Vol 4, Chapter 5, Tables 5.11 and 5.11A. Default for continuously flooded fields.)
rainfed_wetland_rice_ch4_scaling_factor: 0.54 # IPCC 2019 Refinement, Vol 4, Chapter 5, Table 5.12. Scaling factor for "Regular rainfed" water regime.
fertilizer:
synthetic_n2o_factor: 0.010 # kg N2O-N per kg N input (IPCC 2019 Refinement, Table 11.1 aggregated default)
organic_n2o_factor: 0.006 # kg N2O-N per kg N applied for organic amendments incl. manure (IPCC 2019 Refinement, Table 11.1; EF1 wet-climate default). Applied to actual n_applied so direct manure-N2O stays consistent with manure_n_to_fertilizer.
# Indirect N2O emission parameters (IPCC 2019 Refinement, Chapter 11.2.2, Table 11.3)
indirect_ef4: 0.010 # kg N2O-N per kg (NH3-N + NOx-N) volatilized and deposited (EF4)
indirect_ef5: 0.011 # kg N2O-N per kg N leached/runoff (EF5)
frac_gasf: 0.11 # Fraction of synthetic fertilizer N volatilized as NH3 and NOx (FracGASF)
frac_gasm: 0.21 # Fraction of organic N and grazing N volatilized as NH3 and NOx (FracGASM)
frac_leach: 0.24 # Fraction of applied/deposited N lost through leaching and runoff in wet climates (FracLEACH-(H))
residues:
incorporation_n2o_factor: 0.010 # kg N2O-N per kg residue N incorporated into soil (IPCC 2019 Refinement, Table 11.1 aggregated default)
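The emission factors above plug into the IPCC 2019 Refinement Tier 1 equations. The sketch below, which assumes the model follows those equations (its own accounting may differ in detail), computes direct plus indirect N2O from synthetic fertilizer, paddy CH4 with the rainfed scaling factor, the GWP100 conversion to CO2-eq, and the cost at ghg_price. Organic N would use organic_n2o_factor and frac_gasm instead and is not shown. Function names and inputs are illustrative.

```python
# GWP100 factors from the emissions block (IPCC AR6).
GWP100 = {"co2": 1.0, "ch4": 27.0, "n2o": 273.0}
N2O_PER_N2O_N = 44.0 / 28.0  # kg N2O per kg N2O-N (molar masses 44 and 28)


def to_co2eq(mass_by_gas: dict[str, float]) -> float:
    """Sum of gas masses weighted by GWP100 (output in the input mass unit)."""
    return sum(mass * GWP100[gas] for gas, mass in mass_by_gas.items())


def synthetic_fertilizer_n2o_kg(
    n_applied_kg: float,
    synthetic_n2o_factor: float = 0.010,  # EF1, direct emissions
    frac_gasf: float = 0.11,              # share of N volatilized as NH3 + NOx
    indirect_ef4: float = 0.010,          # N2O-N per unit volatilized N
    frac_leach: float = 0.24,             # share of N leached / run off
    indirect_ef5: float = 0.011,          # N2O-N per unit leached N
) -> float:
    """Direct plus indirect N2O (kg N2O) from synthetic N, Tier 1 style."""
    direct = n_applied_kg * synthetic_n2o_factor
    volatilized = n_applied_kg * frac_gasf * indirect_ef4
    leached = n_applied_kg * frac_leach * indirect_ef5
    return (direct + volatilized + leached) * N2O_PER_N2O_N


def rice_ch4_kg(area_ha: float, rainfed: bool,
                ef_kg_per_ha: float = 134.47, rainfed_scaling: float = 0.54) -> float:
    """Paddy CH4: continuously flooded default, scaled for rainfed wetland rice."""
    return area_ha * ef_kg_per_ha * (rainfed_scaling if rainfed else 1.0)


def ghg_cost_usd(mt_co2eq: float, ghg_price: float = 200.0) -> float:
    """Convert MtCO2-eq to tCO2-eq and multiply by the price in USD_2024/tCO2-eq."""
    return mt_co2eq * 1e6 * ghg_price


n2o = synthetic_fertilizer_n2o_kg(100.0)
print(f"{n2o:.3f} kg N2O -> {to_co2eq({'n2o': n2o}):.0f} kg CO2-eq")
print(f"1 ha rainfed wetland rice: {rice_ch4_kg(1.0, rainfed=True):.1f} kg CH4")
print(f"1 MtCO2-eq at the default price: {ghg_cost_usd(1.0):,.0f} USD")
```

For 100 kg of synthetic N, the direct, volatilization, and leaching pathways give 1.0 + 0.11 + 0.264 = 1.374 kg N2O-N, or about 2.16 kg N2O and roughly 590 kg CO2-eq. The indirect pathways therefore add about 37% on top of the direct emissions.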
Land Use Change¶
luc:
horizon_years: 25
managed_flux_mode: "zero"
forest_fraction_threshold: 0.2 # Minimum forest fraction (0-1) to apply regrowth sequestration
savanna_pvc_threshold: 75 # MgC/ha potential vegetation carbon; Hayek et al. 2024 threshold for closed vs open savanna
# Data source for cropland baseline area:
# - "gaez": GAEZ RES06-HAR (harvested area downscaled from FAOSTAT 2019-2021 3-year average), consistent with production stability
# - "esa": ESA CCI land cover satellite data
cropland_source: "gaez"
Controls how land use change emissions and carbon sequestration are modeled over the planning horizon.
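Parameters:
- horizon_years: Time horizon (years) for amortizing land use change emissions
- managed_flux_mode: How to treat emissions from existing managed land ("zero" assumes no net flux from current agricultural land)
- forest_fraction_threshold: Minimum forest cover fraction (0-1) required for a grid cell to be eligible for regrowth sequestration when land is spared
- savanna_pvc_threshold: Potential vegetation carbon threshold (MgC/ha) separating closed from open savanna, following Hayek et al. 2024
- cropland_source: Data source for the baseline cropland area, either "gaez" (GAEZ RES06-HAR harvested area, consistent with production stability) or "esa" (ESA CCI land cover satellite data)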
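A minimal sketch of how horizon_years and forest_fraction_threshold could act, assuming linear amortization (a one-off carbon stock loss spread evenly over the horizon) and an inclusive threshold test. Both assumptions, the function names, and the 100 tC/ha input are illustrative rather than taken from the model code.

```python
CO2_PER_C = 44.0 / 12.0  # t CO2 per t C


def annualized_luc_tco2(carbon_loss_tc_per_ha: float, area_ha: float, horizon_years: int = 25) -> float:
    """Spread a one-off carbon stock loss evenly over the amortization horizon."""
    return carbon_loss_tc_per_ha * area_ha * CO2_PER_C / horizon_years


def regrowth_eligible(forest_fraction: float, forest_fraction_threshold: float = 0.2) -> bool:
    """Whether a spared grid cell may count regrowth sequestration."""
    return forest_fraction >= forest_fraction_threshold


# Hypothetical: clearing 1 ha that loses 100 tC.
print(f"{annualized_luc_tco2(100.0, 1.0):.1f} t CO2/yr")
print(regrowth_eligible(0.35), regrowth_eligible(0.05))
```

Under these assumptions a 100 tC/ha loss becomes about 367 t CO2, or roughly 14.7 t CO2 per year over the default 25-year horizon; a longer horizon lowers the annual charge proportionally.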
Health Configuration¶
health:
enabled: true # Whether to include health costs in the objective function
region_clusters: 30
intake_grid_points: 15 # Number of grid knots over empirical RR range
log_rr_points: 15
ssb_sugar_g_per_100g: 5.7 # ≈50 kcal per 226.8 g sugar-sweetened beverage (SSB) implies ~5.7 g sugar per 100 g
value_per_yll: 50000 # USD_2024 per year of life lost
intake_cap_g_per_day: 1000 # Uniform generous cap on intake grids and clipping
intake_age_min: 11 # GDD adult band starts at 11; set to 11 to retain adult intake data. Note, however, that GBD chronic disease risk factors apply to adults aged >=25 years.
# Dietary risk factors to consider (must match GDD data items)
risk_factors:
- fruits
- vegetables
- nuts_seeds
- legumes
- red_meat
- whole_grains
# GBD also covers seafood omega-3 and processed meat risk factors,
# but fish/seafood and processed meat are not modelled as food groups.
# GBD has data on sugar-sweetened beverage intake as a risk factor,
# from which we can in theory derive added sugar intake risk
# factors. The epidemiological evidence for this is, however,
# lacking, and so we don't count "sugar" as a risk factor.
# - sugar
# Health outcomes/causes to consider (must be present in IHME GBD data and relative risks)
causes:
- CHD # Coronary/Ischemic Heart Disease
- Stroke # Stroke (all types)
- T2DM # Type 2 Diabetes Mellitus
- CRC # Colorectal Cancer
# Mapping of risk factors to the causes they affect
risk_cause_map:
fruits: [CHD, Stroke, T2DM]
vegetables: [CHD, Stroke]
nuts_seeds: [CHD, T2DM]
legumes: [CHD]
red_meat: [CHD, Stroke, T2DM, CRC]
whole_grains: [CHD, Stroke, T2DM, CRC]
# sugar: [CHD, Stroke, T2DM, CRC]
# Per risk-factor overrides using log-linear RR from literature CSV files.
# When a risk factor maps to a CSV path, the GBD dose-response curve is replaced
# with a log-linear curve derived from the CSV, age-corrected using relative
# attenuation factors from the GBD data.
alternative_rr:
red_meat: "data/curated/red_meat_rr_log_linear.csv"
# (Per-risk-factor basis is now declared centrally in
# diet.source_basis and applied automatically via the basis helper
# whenever the source basis differs from the food's basis declared
# in data/curated/food_basis.csv.)
# Multi-objective clustering settings for grouping countries into health clusters
clustering:
weights:
geography: 1.0 # Weight for geographic proximity
gdp: 0.5 # Weight for GDP per capita similarity
population: 0.3 # Weight for population balance across clusters
Reduce region_clusters or log_rr_points to speed up solving.
The value_per_yll parameter monetizes health impacts in USD_2024 per year of life lost (YLL).
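The health block combines three ideas: a log-linear relative-risk curve (used for risk factors listed under alternative_rr), a gridded representation of log RR over intake (intake_grid_points and log_rr_points), and monetization of years of life lost at value_per_yll. The sketch below illustrates each one, assuming piecewise-linear interpolation between grid knots; the model's optimisation formulation and its age correction are not shown and may differ. The RR of 1.1 per 100 g/day and the knot values are hypothetical.

```python
import math
from bisect import bisect_right


def log_linear_rr(intake: float, rr_per_unit: float, unit: float) -> float:
    """RR(x) = exp(beta * x) with beta = ln(RR per unit) / unit."""
    beta = math.log(rr_per_unit) / unit
    return math.exp(beta * intake)


def interpolate_log_rr(intake: float, knots: list[float], log_rr: list[float]) -> float:
    """Piecewise-linear log RR between sorted intake knots, held flat outside them."""
    if intake <= knots[0]:
        return log_rr[0]
    if intake >= knots[-1]:
        return log_rr[-1]
    i = bisect_right(knots, intake)  # knots[i-1] <= intake < knots[i]
    x0, x1 = knots[i - 1], knots[i]
    y0, y1 = log_rr[i - 1], log_rr[i]
    return y0 + (y1 - y0) * (intake - x0) / (x1 - x0)


def health_cost_usd(yll: float, value_per_yll: float = 50_000) -> float:
    """Monetize years of life lost at value_per_yll (USD_2024 per YLL)."""
    return yll * value_per_yll


# Hypothetical: RR of 1.1 per 100 g/day, evaluated at 50 g/day.
print(log_linear_rr(50.0, rr_per_unit=1.1, unit=100.0))
print(interpolate_log_rr(150.0, [0.0, 100.0, 200.0], [0.0, 0.1, 0.3]))
print(f"{health_cost_usd(1e6):,.0f} USD")
```

At half the reference dose the log-linear curve gives sqrt(1.1), about 1.049, and one million YLL at the default value_per_yll corresponds to 50 billion USD_2024. Adding knots improves the fit of the interpolated curve at the cost of more variables, which is why reducing the grid sizes speeds up solving.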
Solver Configuration¶
solving:
solver: highs
# solver: gurobi
# io_api controls how the model is communicated to the solver:
# - 'lp' or 'mps': Write problem to file (LP/MPS format) which solver reads
# - 'direct': Use solver's Python API directly (e.g., gurobipy) for faster performance
# - null: Use linopy's default (typically 'lp')
io_api: "direct"
threads: 1 # Number of threads to use for solving
# The calculate_fixed_duals option induces linopy to solve the MILP,
# then fix all integer variables to their optimal values, then solve
# the resulting LP in order to get dual variables for model
# constraints.
calculate_fixed_duals: true
options_gurobi:
LogToConsole: 0
OutputFlag: 1
Method: 2
MIPGap: 0.001 # target 0.1% relative optimality gap
MIPFocus: 2
options_highs:
solver: "choose"
mip_rel_gap: 0.001 # align relative gap with gurobi setting
export_for_tuning: false # Export model to MPS before solving (for Gurobi parameter tuning)
time_limit: null # Solver-internal time limit in minutes (null = no limit)
runtime: 5 # Maximum solver runtime in minutes (used by SLURM)
mem_mb: 8000 # Maximum solve_model memory in MB (used by SLURM)
inline_analysis: false # When true, analysis runs inside the solve process (no intermediate .nc)
# --- section: remote_solve ---
remote_solve:
enabled: false # If true, solve_model is executed remotely over SSH
local_scenarios: ["baseline"] # Scenarios that must always solve locally (currently only "baseline" is supported)
host: "user@login.cluster" # Placeholder SSH host or alias; customize for your setup
workdir: "~/path/to/food-opt" # Placeholder remote project root containing this repository
pixi_env: "default" # Placeholder remote pixi environment passed to tools/smk -e
use_slurm: false # Set true when remote solves should be submitted via --slurm
slurm_account: "" # SLURM account for remote job submission
slurm_partition: "" # SLURM partition for remote compute jobs
sync_workflow: false # Sync workflow/ and config/ code before remote solve (may dirty remote git state)
sync_pixi_files: false # Sync pixi.toml and pixi.lock to remote workdir
ssh_options: [] # Extra ssh CLI args, e.g. ["-o", "ControlMaster=auto"]
rsync_options: [] # Extra rsync CLI args
preflight_check: true # If true, create remote workdir before syncing
# --- section: sensitivity_analysis ---
# Defaults for surrogate-based global sensitivity analysis. GSA configs
# (see e.g. config/gsa.yaml) deep-merge their overrides on top of this.
# Non-GSA configs never run the surrogate rules but still need the block
# to satisfy schema validation.
sensitivity_analysis:
holdout_fraction: 0.15
threads: 6
# Method downstream consumers (uncertainty-band plots, notebooks) use when
# no explicit choice is given. Must match a key under ``methods``.
default_surrogate: xgb
# When false (default), the surrogate-fit rule declares every Sobol scenario
# the generator promises so Snakemake drives the full solve→analyse→
# surrogate chain in a single invocation (the canonical Snakemake idiom).
# When true, the rule instead scans the analysis directory and fits the
# surrogate only on scenarios with complete outputs on disk; intended for
# cluster sweeps where solves run *outside* Snakemake (via
# ``tools/batch-solve``) and a small fraction may legitimately be missing
# because per-solve TimeLimit hit. Setting this to true changes the
# workflow contract: the user must run the solve+analyse phase
# before targeting the surrogate. As a guardrail, the rule errors out if
# more than 50% of scenarios are missing.
discover_scenarios_on_disk: false
# Sobol-index settings, shared across surrogate methods. ``outputs``
# is the allowlist of OutputSpec names whose Sobol indices we
# compute, plot, and persist; vector specs are excluded by default
# because the per-element fan-out across MC samples and plot rules
# blows up for ~80 individual foods.
sobol:
outputs: [total_cost, co2, ch4, n2o, land_use, yll]
grid_resolution: 15
n_mc_global: 16384
n_mc_conditional: 2048
methods:
pce:
method_options:
cross_truncation: 0.8
rf:
method_options:
n_estimators: 500
mars:
method_options:
max_terms: 50
max_degree: 2
penalty: 3.0
n_knots: 25
xgb:
method_options:
n_estimators: 5000
max_depth: 4
learning_rate: 0.02
subsample: 0.8
colsample_bytree: 0.8
min_child_weight: 5
early_stopping_rounds: 50
# Surrogate targets extracted from each scenario's analysis
# directory. Each entry declares:
# kind: "scalar" (default) or "vector"
# source: parquet filename under analysis/scen-<scenario>/
# reducer: reducer registered in
# workflow.scripts.analysis.sensitivity_common.REDUCERS
# (scalar reducers return float; vector reducers return
# dict[str, float], expanded to one column per element)
# (extras): kwargs forwarded to the reducer (e.g. column name)
# label: human-readable axis label used by plots
# units: axis-label suffix used by uncertainty-band plots
# Order here defines the display order in Sobol plots. Vector
# outputs are fit by xgb/rf only; pce/mars hard-fail at fit time.
outputs:
total_cost:
source: objective_breakdown.parquet
reducer: row_sum
label: Total Cost
units: bn USD
co2:
source: net_emissions.parquet
reducer: filter_sum
filter_col: gas
filter_value: co2
value_col: mtco2eq
label: "CO\u2082 Emissions"
units: "MtCO\u2082eq"
ch4:
source: net_emissions.parquet
reducer: filter_sum
filter_col: gas
filter_value: ch4
value_col: mtco2eq
label: "CH\u2084 Emissions"
units: "MtCO\u2082eq"
n2o:
source: net_emissions.parquet
reducer: filter_sum
filter_col: gas
filter_value: n2o
value_col: mtco2eq
label: "N\u2082O Emissions"
units: "MtCO\u2082eq"
land_use:
source: land_use.parquet
reducer: column_sum
column: area_mha
label: Land Use
units: Mha
yll:
source: health_totals.parquet
reducer: column_sum
column: yll_myll
label: Years of Life Lost
units: million YLL
foods:
kind: vector
source: food_consumption.parquet
reducer: pivot_column
key_col: food
value_col: consumption_mt
label: Food consumption (global)
units: Mt
feed_categories:
kind: vector
source: feed_by_category.parquet
reducer: pivot_column
key_col: category
value_col: mt_dm
label: Feed by category (global)
units: Mt DM
yll_by_cause:
kind: vector
source: health_attribution.parquet
reducer: pivot_column
key_col: cause
value_col: yll_myll
label: YLL by disease cause
units: million YLL
yll_by_food_group:
kind: vector
source: health_attribution.parquet
reducer: pivot_column
key_col: food_group
value_col: yll_myll
label: YLL by dietary risk factor
units: million YLL
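Each entry under sensitivity_analysis.outputs names a reducer and forwards its remaining keys (filter_col, value_col, key_col, and so on) as keyword arguments. The sketch below is a hypothetical stand-in for three of those reducers operating on plain lists of row dicts instead of parquet files; it is not the REDUCERS registry in workflow.scripts.analysis.sensitivity_common, and row_sum is omitted because its exact behaviour is not described here.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any

Rows = list[dict[str, Any]]


def column_sum(rows: Rows, column: str) -> float:
    """Scalar: total of one column."""
    return float(sum(r[column] for r in rows))


def filter_sum(rows: Rows, filter_col: str, filter_value: Any, value_col: str) -> float:
    """Scalar: total of value_col over rows where filter_col == filter_value."""
    return float(sum(r[value_col] for r in rows if r[filter_col] == filter_value))


def pivot_column(rows: Rows, key_col: str, value_col: str) -> dict[str, float]:
    """Vector: one global total per key (e.g. per food), summed over all other columns."""
    totals: defaultdict[str, float] = defaultdict(float)
    for r in rows:
        totals[str(r[key_col])] += float(r[value_col])
    return dict(totals)


SKETCH_REDUCERS = {"column_sum": column_sum, "filter_sum": filter_sum, "pivot_column": pivot_column}
# Spec keys that describe the output rather than parameterise the reducer.
NON_REDUCER_KEYS = {"kind", "source", "reducer", "label", "units"}


def reduce_output(spec: dict[str, Any], rows: Rows) -> float | dict[str, float]:
    kwargs = {k: v for k, v in spec.items() if k not in NON_REDUCER_KEYS}
    return SKETCH_REDUCERS[spec["reducer"]](rows, **kwargs)


net_emissions = [  # hypothetical rows shaped like net_emissions.parquet
    {"gas": "co2", "mtco2eq": 1.5},
    {"gas": "ch4", "mtco2eq": 4.0},
    {"gas": "co2", "mtco2eq": 2.5},
]
co2_spec = {"source": "net_emissions.parquet", "reducer": "filter_sum",
            "filter_col": "gas", "filter_value": "co2", "value_col": "mtco2eq",
            "label": "CO2 Emissions", "units": "MtCO2eq"}
print(reduce_output(co2_spec, net_emissions))
```

Keeping the reducer arguments in the config means a new scalar target usually needs only a new YAML entry, while vector outputs expand into one surrogate column per key.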
Solver choice:
- HiGHS: open-source, fast, and suitable for most problems
- Gurobi: commercial, often faster for very large problems; requires a license (free for academic users)
The remote_solve subsection allows delegating only solve_model to a
remote SSH host (for example an HPC login node) while keeping model building
and analysis local. See Workflow & Execution for setup instructions and usage
details.
Use remote_solve.local_scenarios (default: ["baseline"]) to list scenarios that must always use the local solve_model rule; currently only "baseline" is supported.
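A minimal sketch of how the remote_solve keys interact when deciding where solve_model runs for a scenario. The function is hypothetical and only illustrates the documented behaviour (remote solving is opt-in, listed scenarios stay local, and use_slurm switches to SLURM submission); it is not the workflow's actual rule-selection logic, and the scenario name "high_ghg" is invented.

```python
def solve_target(scenario: str, remote_solve: dict) -> str:
    """Return where solve_model would run: 'local', 'remote', or 'remote-slurm'."""
    if not remote_solve.get("enabled", False):
        return "local"  # remote solving is opt-in
    if scenario in remote_solve.get("local_scenarios", ["baseline"]):
        return "local"  # pinned to the local solve_model rule
    return "remote-slurm" if remote_solve.get("use_slurm", False) else "remote"


cfg = {"enabled": True, "local_scenarios": ["baseline"], "use_slurm": True}
print(solve_target("baseline", cfg), solve_target("high_ghg", cfg))
```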
Plotting Configuration¶
plotting:
comparison_scenarios:
- "scen-default"
# Crop groups for map visualizations. Each group has a display name, a hex
# color, and a list of member crops. Every crop listed under the top-level
# `crops` key should belong to exactly one group (validated at startup).
# Colors sourced from ColorBrewer Dark2 and Paired palettes.
crop_groups:
Cereals:
color: "#E6AB02" # Dark2 #6
crops: [wheat, dryland-rice, wetland-rice, maize, barley, oat, rye,
sorghum, buckwheat, foxtail-millet, pearl-millet]
Legumes:
color: "#666666" # Dark2 #8
crops: [soybean, dry-pea, chickpea, cowpea, gram, phaseolus-bean, pigeonpea]
"Roots & tubers":
color: "#A6761D" # Dark2 #7
crops: [white-potato, sweet-potato, cassava, yam, plantain]
Vegetables:
color: "#1B9E77" # Dark2 #1
crops: [tomato, carrot, onion, cabbage]
Fruits:
color: "#D95F02" # Dark2 #2
crops: [banana, watermelon, mango, citrus, coconut, apple]
Oilseeds:
color: "#7570B3" # Dark2 #3
crops: [sunflower, rapeseed, groundnut, sesame, oil-palm, olive]
"Sugar crops":
color: "#E7298A" # Dark2 #4
crops: [sugarcane, sugarbeet]
Stimulants:
color: "#B15928" # Paired #12
crops: [cocoa, coffee, tea]
"Fiber crops":
color: "#1F78B4" # Paired #2
crops: [cotton]
"Feed crops":
color: "#66A61E" # Dark2 #5
crops: [alfalfa, silage-maize, biomass-sorghum, grassland]
colors:
# Sensitivity parameter colors and groups (tab20b palette).
# Group order determines stacking order; colors within a group are adjacent hues.
parameter_groups:
# Group order = stacking order (bottom → top); colours within each
# group ramp light → dark from bottom to top.
"Health risk": # reds, light → dark
color: "#d6616b"
parameters:
rr_protective: "#e7969c"
rr_harmful: "#d6616b"
Agricultural: # blues, light → dark
color: "#5254a3"
parameters:
flw_factor: "#9c9ede"
yield_factor: "#6b6ecf"
fcr_factor: "#5254a3"
Emissions: # greens, light → dark
color: "#8ca252"
parameters:
luc_factor: "#cedb9c"
ch4_factor: "#b5cf6b"
n2o_factor: "#8ca252"
"Policy valuation": # purples, light → dark
color: "#ce6dbd"
parameters:
ghg_price: "#de9ed6"
value_per_yll: "#ce6dbd"
crops:
wheat: "#C58E2D"
'dryland-rice': "#E0B341"
'wetland-rice': "#F7E29E"
maize: "#F1C232"
barley: "#B68D23"
oat: "#D4B483"
rye: "#A67C52"
sorghum: "#A0522D"
buckwheat: "#8B5A2B"
'foxtail-millet': "#E3C878"
'pearl-millet': "#D9A441"
soybean: "#7B4F2A"
'dry-pea': "#B9925B"
chickpea: "#D7B377"
cowpea: "#8C5C38"
gram: "#A47038"
'phaseolus-bean': "#6E3B1E"
pigeonpea: "#9C6B3E"
'white-potato': "#8FB98B"
'sweet-potato': "#CE7B3A"
cassava: "#6E8B3D"
yam: "#4F6F2C"
tomato: "#C0392B"
carrot: "#E67E22"
onion: "#D35400"
cabbage: "#27AE60"
banana: "#F7DC6F"
watermelon: "#E74C3C"
mango: "#F4A62A"
citrus: "#F39C12"
coconut: "#8E735B"
apple: "#E64C3C"
sunflower: "#F1C40F"
rapeseed: "#F5B041"
groundnut: "#A8683C"
sesame: "#C97A2B"
'oil-palm': "#A04000"
olive: "#6E7D57"
cocoa: "#5C3317"
coffee: "#6F4E37"
tea: "#4B7A2E"
cotton: "#F5F5DC"
sugarcane: "#9B59B6"
sugarbeet: "#AF7AC5"
alfalfa: "#1ABC9C"
'biomass-sorghum': "#16A085"
grassland: "#7FB77E"
# Colors sourced from ColorBrewer Dark2, Set2, and Paired palettes.
food_groups:
whole_grains: "#E6AB02" # Dark2 #6
grain: "#FFD92F" # Set2 #6
fruits: "#FC8D62" # Set2 #2
vegetables: "#66C2A5" # Set2 #1
legumes: "#8DA0CB" # Set2 #3
nuts_seeds: "#A6761D" # Dark2 #7
starchy_vegetable: "#E78AC3" # Set2 #4
oil: "#E5C494" # Set2 #7
red_meat: "#E31A1C" # Paired #6
poultry: "#FB9A99" # Paired #5
dairy: "#A6CEE3" # Paired #1
eggs: "#FDBF6F" # Paired #7
stimulants: "#B15928" # Paired #12
animal_fat: "#FFFF99" # Paired #11
fallback_cmaps:
crops: "Set3"
Customize visualization colors for publication-quality plots. The
colors.food_groups palette is applied consistently across all food-group
charts and maps; extend it if you add new groups to data/curated/food_groups.csv.
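The crop_groups comment states that every crop must belong to exactly one group, and colour lookups need a fallback for crops without an explicit entry. For example, plantain and silage-maize appear in crop_groups but have no entry under colors.crops in the block above, so presumably they take a colour from fallback_cmaps.crops. The sketch below inverts the group map with a uniqueness check and assigns fallback colours; the function names are hypothetical, and the two-entry fallback palette is a placeholder, not the actual Set3 colormap.

```python
def crop_group_index(crop_groups: dict[str, dict], crops: list[str]) -> dict[str, str]:
    """Invert crop_groups into crop -> group, enforcing exactly one group per crop."""
    index: dict[str, str] = {}
    for group, spec in crop_groups.items():
        for crop in spec["crops"]:
            if crop in index:
                raise ValueError(f"{crop!r} is in both {index[crop]!r} and {group!r}")
            index[crop] = group
    missing = sorted(c for c in crops if c not in index)
    if missing:
        raise ValueError(f"crops without a group: {missing}")
    return index


def crop_colors(crops: list[str], explicit: dict[str, str], fallback_palette: list[str]) -> dict[str, str]:
    """Use the configured colour where present; cycle a fallback palette otherwise."""
    colors = {crop: explicit[crop] for crop in crops if crop in explicit}
    # Sort the uncoloured crops so fallback assignment is deterministic.
    for i, crop in enumerate(sorted(c for c in crops if c not in explicit)):
        colors[crop] = fallback_palette[i % len(fallback_palette)]
    return colors


groups = {
    "Roots & tubers": {"crops": ["cassava", "plantain"]},
    "Feed crops": {"crops": ["alfalfa", "silage-maize"]},
}
crops = ["cassava", "plantain", "alfalfa", "silage-maize"]
index = crop_group_index(groups, crops)
colors = crop_colors(crops, {"cassava": "#6E8B3D", "alfalfa": "#1ABC9C"}, ["#AAAAAA", "#BBBBBB"])
print(index)
print(colors)
```

Sorting the uncoloured crops before cycling the palette keeps their colours stable across runs, so fallback colours do not shift between figures.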