Model Framework

Mathematical Structure

The food-opt model is formulated as a mixed integer linear programming (MILP) problem that minimizes total system costs subject to constraints on production capacity, nutritional requirements, and environmental limits. Here, we give a high-level overview over the model structure; a complete listing of equations is outstanding.

Objective Function

The objective function minimizes total costs across multiple dimensions:

\[\min \sum_{i} c_i x_i\]

where \(x_i\) are decision variables and \(c_i\) are associated costs including:

  • Production costs: Crop and livestock production expenses

  • Trade costs: Transportation costs based on distance and product type

  • Environmental costs: Emissions priced at configured carbon price (USD/tCO₂-eq)

  • Health costs: Dietary risk factors valued using years of life lost (YLL) multiplied by the configured health.value_per_yll

Decision Variables

The model optimizes the following classes of decision variables:

  • Crop production (\(P_{c,r,w,k}\)): Production of crop \(c\) in region \(r\) with water supply \(w\) (irrigated/rainfed) and resource class \(k\)

  • Livestock production (\(L_{a,r,s}\)): Production of animal product \(a\) in region \(r\) using production system \(s\) (grazing/feed-based)

  • Food processing (\(F_{c,f,r}\)): Processing activities converting crop \(c\) to food product \(f\) in region \(r\)

  • Land allocation (\(A_{r,w,k}\)): Cropland area allocated in region \(r\), water supply \(w\), resource class \(k\)

  • Trade flows (\(T_{c,r,h}\), \(T_{c,h,h'}\)): Trade of commodity \(c\) between region \(r\) and hub \(h\), as well as between hubs \(h\) and \(h'\).

  • Food consumption (\(D_{f,r}\)): Per-capita consumption of food \(f\) in region \(r\)

Constraints

The model is subject to multiple constraint categories:

Production Constraints

  • Crop yields limiting crop production based on region and resource class

  • Livestock feed requirements and conversion efficiencies

  • Land availability limits by region and resource class

  • Water availability constraints by basin and growing season

Nutritional Constraints

  • Minimum macronutrient requirements (carbohydrates, protein, fat, calories)

  • Minimum food group consumption (whole grains, fruits, vegetables, etc.)

  • Per-capita dietary balance across the population

Processing and Trade Constraints

  • Crop-to-food conversion efficiencies

  • Hub-based trade network topology

  • Transport costs differentiated by commodity category

  • Non-tradable commodities (e.g., fodder crops)

Environmental Constraints

  • Optional limits on total greenhouse gas emissions

  • Optional limits on total nitrogen fertilizer application

PyPSA Implementation

The model is implemented using PyPSA (Python for Power System Analysis), which provides a flexible framework for optimizing flow networks with linear constraints. While PyPSA was originally designed for energy systems, its component-based structure maps naturally to food system flows.

Network Components

Carriers

Define the commodity types flowing through the network (e.g., crop_wheat, food_bread, nutrient_protein). Each carrier has an associated unit (tonnes, megacalories, etc.). See the PyPSA carriers documentation for more details.

Buses

Represent locations where commodities accumulate or are exchanged. Buses are typically defined per-country or per-region (e.g., crop_wheat_USA, land_class0_region42). See the PyPSA buses documentation for more details.

Links

Represent transformations or transport of commodities. See the PyPSA links documentation for more details. Links have:

  • bus0: Primary input bus

  • bus1: Primary output bus

  • efficiency: Conversion efficiency from bus0 → bus1

  • bus2, bus3, …: Additional input/output legs

  • efficiency2, efficiency3, …: Efficiencies for additional legs (positive = output, negative = input)

Examples:

  • Crop production link: inputs = land + water + fertilizer; output = crop

  • Food processing link: inputs = crops; output = food product

  • Trade link: input = crop in region A; output = crop in region B (with transport cost)

Stores

Represent resource availability or capacity limits. See the PyPSA stores documentation for more details. Key uses:

  • Land area available in each region/resource class

  • Water availability in each basin

  • Global fertilizer supply limits

Global Constraints

Enforce system-wide limits. See the PyPSA global constraints documentation for more details. Examples include:

  • Total fertilizer consumption

  • Total greenhouse gas emissions

  • Population-level nutritional requirements

Component Naming Conventions

The model uses systematic naming conventions to organize components:

  • Crops: crop_{crop_name}_{country_code}

  • Foods: food_{food_name}_{country_code}

  • Nutrients: nutrient_{nutrient_name}_{country_code}

  • Land: land_class{class_num}_{region_id}

  • Water: water_basin{basin_id}

  • Primary resources: primary_fertilizer, primary_water

Resource Flow Structure

The model follows a hierarchical flow structure:

  1. Primary resources → Land, water, fertilizer availability in each region

  2. Crop production → Raw agricultural commodities produced on land

  3. Animal production → Livestock products from grassland (grazing) or crops (feed-based)

  4. Food processing → Conversion of crops to food products

  5. Trade → Inter-regional flows via hub networks

  6. Consumption → Aggregation to nutritional outcomes and dietary risk factors

  7. Health impacts → DALYs from dietary exposures

Units and Conversions

The model uses consistent units throughout:

Mass
  • Land area: Mha (million hectares)

  • Crop/food production: tonnes (t) or megatonnes (Mt)

  • Nutritional mass (protein, etc.): grams/person/day → Mt/year

Energy
  • Nutritional energy (calories): kcal/person/day → PJ/year

Emissions
  • Methane (CH₄) and nitrous oxide (N₂O): tonnes of gas, converted downstream to Mt CO₂-eq

  • Aggregate greenhouse gases: tCO₂-eq (tonnes CO₂-equivalent)

  • Conversion factors: CH₄ (28 GWP100), N₂O (265 GWP100)

Water
  • Water use: Mm³ (million cubic meters)

Economic
  • Costs: USD (various sub-units: USD/tonne, USD/km, USD/tCO₂-eq)

Key conversion factors used in the code (workflow/scripts/build_model.py):

  • TONNE_TO_MEGATONNE = 1e-6

  • KCAL_TO_PJ = 4.184e-12

  • KCAL_PER_100G_TO_PJ_PER_MEGATONNE 4.184e-2

  • DAYS_PER_YEAR = 365

Solver Configuration

The model supports multiple LP solvers:

  • HiGHS (default, open-source): Fast interior-point method, suitable for large problems

  • Gurobi (commercial): Often faster for very large problems, supports advanced solver options

Solver selection and options are configured in config/default.yaml:

solving:
  solver: highs
  # solver: gurobi
  # io_api controls how the model is communicated to the solver:
  # - 'lp' or 'mps': Write problem to file (LP/MPS format) which solver reads
  # - 'direct': Use solver's Python API directly (e.g., gurobipy) for faster performance
  # - null: Use linopy's default (typically 'lp')
  io_api: "direct"
  threads: 1  # Number of threads to use for solving
  # The calculate_fixed_duals option induces linopy to solve the MILP,
  # then fix all integer variables to their optimal values, then solve
  # the resulting LP in order to get dual variables for model
  # constraints.
  calculate_fixed_duals: true
  options_gurobi:
    LogToConsole: 0
    OutputFlag: 1
    Method: 2
    MIPGap: 0.001  # target 0.1% relative optimality gap
    MIPFocus: 2
  options_highs:
    solver: "choose"
    mip_rel_gap: 0.001  # align relative gap with gurobi setting
  export_for_tuning: false  # Export model to MPS before solving (for Gurobi parameter tuning)
  time_limit: null  # Solver-internal time limit in minutes (null = no limit)
  runtime: 5  # Maximum solver runtime in minutes (used by SLURM)
  mem_mb: 8000  # Maximum solve_model memory in MB (used by SLURM)
  inline_analysis: false  # When true, analysis runs inside the solve process (no intermediate .nc)

# --- section: remote_solve ---
remote_solve:
  enabled: false  # If true, solve_model is executed remotely over SSH
  local_scenarios: ["baseline"]  # Scenarios that must always solve locally (currently only "baseline" is supported)
  host: "user@login.cluster"  # Placeholder SSH host or alias; customize for your setup
  workdir: "~/path/to/food-opt"  # Placeholder remote project root containing this repository
  pixi_env: "default"  # Placeholder remote pixi environment passed to tools/smk -e
  use_slurm: false  # Set true when remote solves should be submitted via --slurm
  slurm_account: ""  # SLURM account for remote job submission
  slurm_partition: ""  # SLURM partition for remote compute jobs
  sync_workflow: false  # Sync workflow/ and config/ code before remote solve (may dirty remote git state)
  sync_pixi_files: false  # Sync pixi.toml and pixi.lock to remote workdir
  ssh_options: []  # Extra ssh CLI args, e.g. ["-o", "ControlMaster=auto"]
  rsync_options: []  # Extra rsync CLI args
  preflight_check: true  # If true, create remote workdir before syncing

# --- section: sensitivity_analysis ---
# Defaults for surrogate-based global sensitivity analysis. GSA configs
# (see e.g. config/gsa.yaml) deep-merge their overrides on top of this.
# Non-GSA configs never run the surrogate rules but still need the block
# to satisfy schema validation.
sensitivity_analysis:
  holdout_fraction: 0.15
  threads: 6
  # Method downstream consumers (uncertainty-band plots, notebooks) use when
  # no explicit choice is given. Must match a key under ``methods``.
  default_surrogate: xgb
  # When false (default), the surrogate-fit rule declares every Sobol scenario
  # the generator promises so Snakemake drives the full solve→analyse→
  # surrogate chain in a single invocation (the canonical Snakemake idiom).
  # When true, the rule instead scans the analysis directory and fits the
  # surrogate only on scenarios with complete outputs on disk; intended for
  # cluster sweeps where solves run *outside* Snakemake (via
  # ``tools/batch-solve``) and a small fraction may legitimately be missing
  # because per-solve TimeLimit hit. Setting this to true changes the
  # workflow contract: the user must run the solve+analyse phase
  # before targeting the surrogate. As a guardrail, the rule errors out if
  # more than 50% of scenarios are missing.
  discover_scenarios_on_disk: false
  # Sobol-index settings, shared across surrogate methods.  ``outputs``
  # is the allowlist of OutputSpec names whose Sobol indices we
  # compute, plot, and persist; vector specs are excluded by default
  # because the per-element fan-out across MC samples and plot rules
  # blows up for ~80 individual foods.
  sobol:
    outputs: [total_cost, co2, ch4, n2o, land_use, yll]
    grid_resolution: 15
    n_mc_global: 16384
    n_mc_conditional: 2048
  methods:
    pce:
      method_options:
        cross_truncation: 0.8
    rf:
      method_options:
        n_estimators: 500
    mars:
      method_options:
        max_terms: 50
        max_degree: 2
        penalty: 3.0
        n_knots: 25
    xgb:
      method_options:
        n_estimators: 5000
        max_depth: 4
        learning_rate: 0.02
        subsample: 0.8
        colsample_bytree: 0.8
        min_child_weight: 5
        early_stopping_rounds: 50
  # Surrogate targets extracted from each scenario's analysis
  # directory.  Each entry declares:
  #   kind:     "scalar" (default) or "vector"
  #   source:   parquet filename under analysis/scen-<scenario>/
  #   reducer:  reducer registered in
  #             workflow.scripts.analysis.sensitivity_common.REDUCERS
  #             (scalar reducers return float; vector reducers return
  #             dict[str, float], expanded to one column per element)
  #   (extras): kwargs forwarded to the reducer (e.g. column name)
  #   label:    human-readable axis label used by plots
  #   units:    axis-label suffix used by uncertainty-band plots
  # Order here defines the display order in Sobol plots.  Vector
  # outputs are fit by xgb/rf only; pce/mars hard-fail at fit time.
  outputs:
    total_cost:
      source: objective_breakdown.parquet
      reducer: row_sum
      label: Total Cost
      units: bn USD
    co2:
      source: net_emissions.parquet
      reducer: filter_sum
      filter_col: gas
      filter_value: co2
      value_col: mtco2eq
      label: "CO\u2082 Emissions"
      units: "MtCO\u2082eq"
    ch4:
      source: net_emissions.parquet
      reducer: filter_sum
      filter_col: gas
      filter_value: ch4
      value_col: mtco2eq
      label: "CH\u2084 Emissions"
      units: "MtCO\u2082eq"
    n2o:
      source: net_emissions.parquet
      reducer: filter_sum
      filter_col: gas
      filter_value: n2o
      value_col: mtco2eq
      label: "N\u2082O Emissions"
      units: "MtCO\u2082eq"
    land_use:
      source: land_use.parquet
      reducer: column_sum
      column: area_mha
      label: Land Use
      units: Mha
    yll:
      source: health_totals.parquet
      reducer: column_sum
      column: yll_myll
      label: Years of Life Lost
      units: million YLL
    foods:
      kind: vector
      source: food_consumption.parquet
      reducer: pivot_column
      key_col: food
      value_col: consumption_mt
      label: Food consumption (global)
      units: Mt
    feed_categories:
      kind: vector
      source: feed_by_category.parquet
      reducer: pivot_column
      key_col: category
      value_col: mt_dm
      label: Feed by category (global)
      units: Mt DM
    yll_by_cause:
      kind: vector
      source: health_attribution.parquet
      reducer: pivot_column
      key_col: cause
      value_col: yll_myll
      label: YLL by disease cause
      units: million YLL
    yll_by_food_group:
      kind: vector
      source: health_attribution.parquet
      reducer: pivot_column
      key_col: food_group
      value_col: yll_myll
      label: YLL by dietary risk factor
      units: million YLL

Model Scale

Typical model dimensions (for the default configuration with 750 regions):

  • Regions: 750 sub-national optimization regions

  • Crops: ~70 crop types

  • Resource classes: 3-4 yield quality classes per region

  • Variables: ~1-5 million decision variables

  • Constraints: ~1-10 million constraints

  • Solve time: Minutes to hours depending on region count and solver

The model scales roughly linearly with the number of regions. Reducing aggregation.regions.target_count in the configuration will decrease solve time at the cost of spatial resolution.