Sensitivity Analysis

The food systems model involves many uncertain inputs: emission factors for different greenhouse gases, crop yield potentials, health dose-response parameters, and policy-level choices like carbon prices and health valuations. Understanding which of these uncertainties matter most for model outcomes is essential for directing research priorities and communicating result robustness.

Global sensitivity analysis answers the question: which input uncertainties drive the most variation in model outcomes? This is quantified through Sobol indices, which decompose the variance of each output into contributions from individual inputs and their interactions.

  • First-order index (\(S_1\)): The fraction of output variance attributable to a single input, acting alone.

  • Total-order index (\(S_T\)): The fraction of output variance attributable to a single input, including all its interactions with other inputs.

A parameter with \(S_1 \approx S_T\) influences the output mainly through its direct effect. A parameter with \(S_T \gg S_1\) is involved in significant interactions with other parameters.

This implementation uses Polynomial Chaos Expansion (PCE) as a surrogate model to compute Sobol indices analytically, avoiding the thousands of additional model evaluations that traditional Monte Carlo methods require.

Methodology

Polynomial Chaos Expansion

The central idea is to approximate the model’s input-output mapping as a polynomial in the uncertain inputs. If \(\mathbf{X} = (X_1, \ldots, X_d)\) are the uncertain parameters and \(Y\) is a scalar output, the PCE representation is:

\[Y \approx \sum_{\boldsymbol{\alpha}} c_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\mathbf{X})\]

where \(\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)\) is a multi-index, \(c_{\boldsymbol{\alpha}}\) are scalar coefficients, and \(\Psi_{\boldsymbol{\alpha}}\) are multivariate orthonormal polynomials with respect to the joint input distribution. Each \(\Psi_{\boldsymbol{\alpha}}\) is a product of univariate orthonormal polynomials:

\[\Psi_{\boldsymbol{\alpha}}(\mathbf{X}) = \prod_{i=1}^d \psi_{\alpha_i}^{(i)}(X_i)\]

where \(\psi_k^{(i)}\) is the degree-\(k\) orthonormal polynomial for the marginal distribution of \(X_i\). For uniform inputs these are (normalised) Legendre polynomials; for normal inputs, Hermite polynomials.

The orthonormality property \(\mathbb{E}[\Psi_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\beta}}] = \delta_{\boldsymbol{\alpha}\boldsymbol{\beta}}\) is what makes the variance decomposition exact: the variance of the expansion is simply the sum of squared coefficients. Once the coefficients are known, Sobol indices follow directly without further model evaluations.

The polynomial basis is generated using chaospy. A cross-truncation parameter \(q \in (0, 1]\) controls the multi-index set: lower values favour lower-order interaction terms, keeping the basis compact. The default is \(q = 0.75\).

Sparse Fitting via LARS

With many parameters and moderate polynomial degree, the number of candidate basis terms can exceed the number of model samples. Rather than requiring a full tensor-product design, the implementation uses Least Angle Regression (LARS) with cross-validation to select a sparse subset of active terms.

LARS incrementally adds basis terms that are most correlated with the current residual, using cross-validation to choose the optimal number of active terms. This produces a parsimonious expansion that captures the dominant polynomial structure without overfitting.

The fitting uses sklearn.linear_model.LarsCV with 5-fold cross-validation.

Validation: The quality of the PCE surrogate is assessed by two metrics:

  • Leave-one-out (LOO) error: A relative error measure computed via the hat matrix, approximating the prediction error on held-out samples without refitting. Values below 0.1 indicate a reliable surrogate.

  • R-squared (\(R^2\)): The coefficient of determination on training data. Values close to 1.0 confirm good fit.

Sobol Indices from PCE Coefficients

Thanks to the orthonormality of the basis, the total variance of the expansion is:

\[D = \sum_{\boldsymbol{\alpha} \neq \mathbf{0}} c_{\boldsymbol{\alpha}}^2\]

The first-order Sobol index for parameter \(i\) sums the squared coefficients of terms where only \(\alpha_i > 0\) (no other parameter is active):

\[\begin{split}S_{1,i} = \frac{1}{D} \sum_{\substack{\boldsymbol{\alpha}:\, \alpha_i > 0 \\ \alpha_j = 0\; \forall\, j \neq i}} c_{\boldsymbol{\alpha}}^2\end{split}\]

The total-order Sobol index sums all terms where \(\alpha_i > 0\), regardless of other active parameters:

\[S_{T,i} = \frac{1}{D} \sum_{\boldsymbol{\alpha}:\, \alpha_i > 0} c_{\boldsymbol{\alpha}}^2\]

These indices satisfy \(0 \le S_{1,i} \le S_{T,i} \le 1\) and \(\sum_i S_{1,i} \le 1\) (with equality when there are no interactions).

Conditional Sensitivity Analysis

Some parameters are policy choices (e.g., GHG price, value per YLL) rather than epistemic uncertainties. It is often useful to ask: given a specific policy setting, how sensitive are outcomes to the remaining uncertain parameters?

Designated slice parameters are fixed at specific conditioning values. The conditioning is performed analytically: for each slice parameter \(j\) with conditioning value \(x_j^*\), the univariate basis is evaluated at that value, and the resulting factors are absorbed into the PCE coefficients:

\[c'_{\boldsymbol{\alpha}} = c_{\boldsymbol{\alpha}} \cdot \prod_{j \in \text{slice}} \psi_{\alpha_j}^{(j)}(x_j^*)\]

Sobol indices are then computed from the transformed coefficients \(c'_{\boldsymbol{\alpha}}\), considering only terms where at least one non-slice parameter is active. The result is a set of Sobol indices for the remaining parameters, conditional on the policy choices — showing how sensitivity patterns shift as policy values change.

Experimental Design

Sobol Quasi-Random Sequences

The model is sampled using a scrambled Sobol sequence — a quasi-random, low-discrepancy design that provides better coverage of the parameter space than simple random sampling, especially in high dimensions. The Sobol sequence fills the \([0, 1]^d\) hypercube with balanced coverage, and an inverse CDF transform maps each dimension to the corresponding parameter distribution.

The implementation uses scipy.stats.qmc.Sobol with scrambling enabled for improved uniformity. Scrambling is seeded for deterministic reproducibility (default seed: 42).

Note

The number of samples should be a power of 2 for optimal balance of the Sobol sequence (e.g., 64, 128, 256, 512, 1024).

Supported Distributions

Each uncertain parameter is assigned an independent marginal distribution. The joint distribution is the product of the marginals (i.e., parameters are assumed independent).

Distribution

Required fields

Description

uniform (default)

lower

upper

Flat distribution over [lower, upper]

normal

mean

std

Gaussian with given mean and standard deviation

lognormal

mu

sigma

Log-normal with log-scale mean and std

When the distribution field is omitted, uniform is assumed (requiring only lower and upper).

Configuration

Sensitivity analysis is configured through the _generators DSL in the scenarios file (see Configuration for the full generator syntax). A sensitivity generator uses mode: sensitivity and specifies parameter distributions rather than fixed value lists.

Full example (from the production configuration):

# config/pce_sensitivity_scenarios.yaml

# Reference scenario (no sensitivity adjustments)
default: {}

_generators:
  - name: pce_{sample_id}
    mode: sensitivity
    samples: 1024
    slice_parameters: [value_per_yll, ghg_price]
    parameters:
      yield_factor:
        lower: 0.8
        upper: 1.2
      ch4_factor:
        lower: 0.7
        upper: 1.3
      n2o_factor:
        lower: 0.7
        upper: 1.3
      luc_factor:
        lower: 0.5
        upper: 1.5
      rr_fruits:
        lower: 0
        upper: 1
      rr_vegetables:
        lower: 0
        upper: 1
      rr_whole_grains:
        lower: 0
        upper: 1
      rr_legumes:
        lower: 0
        upper: 1
      rr_nuts_seeds:
        lower: 0
        upper: 1
      rr_red_meat:
        lower: 0
        upper: 1
      value_per_yll:
        lower: 5000
        upper: 500000
      ghg_price:
        lower: 0
        upper: 300
    template:
      sensitivity:
        crop_yields:
          all: "{yield_factor}"
        emission_factors:
          ch4: "{ch4_factor}"
          n2o: "{n2o_factor}"
          luc: "{luc_factor}"
        health_relative_risk:
          fruits: "{rr_fruits}"
          vegetables: "{rr_vegetables}"
          whole_grains: "{rr_whole_grains}"
          legumes: "{rr_legumes}"
          nuts_seeds: "{rr_nuts_seeds}"
          red_meat: "{rr_red_meat}"
      health:
        value_per_yll: "{value_per_yll}"
      emissions:
        ghg_price: "{ghg_price}"

Health relative risk parameters use a quantile parameterization: each rr_<risk_factor> value is a quantile \(q \in [0, 1]\) that interpolates between the GBD confidence bounds at every dose-response breakpoint:

\[\log(\text{RR}(q)) = (1 - q) \cdot \log(\text{RR}_{\text{low}}) + q \cdot \log(\text{RR}_{\text{high}})\]
  • \(q = 0\): GBD lower bound (strongest protective effect for beneficial foods)

  • \(q = 1\): GBD upper bound (weakest protective effect for beneficial foods)

This is applied per (risk factor, cause, exposure) point, so a single quantile parameter per risk factor produces cause-specific adjustments automatically.

Field reference:

  • name: Scenario name pattern. Use {sample_id} as a placeholder for the zero-indexed sample number (e.g., pce_{sample_id} produces pce_0, pce_1, …, pce_1023).

  • mode: sensitivity: Activates space-filling Sobol sampling with distribution-based parameter specifications.

  • samples: Number of samples to draw. Should be a power of 2.

  • seed (optional): Random seed for the scrambled Sobol sequence. Default: 42.

  • slice_parameters (optional): List of parameter names to use as conditioning variables in the conditional Sobol analysis. These parameters are included in the PCE fit but can be fixed at specific values analytically.

  • parameters: Mapping of parameter names to distribution specifications (see Supported Distributions above).

  • template: Configuration template with {param_name} placeholders that are substituted with sampled values. Type is preserved when the placeholder is the entire value.

Running the Analysis

The sensitivity analysis requires three stages: build all sampled scenarios, solve them, then run the PCE computation. Snakemake handles all dependencies automatically.

Run the full pipeline (build + solve + analyze for all 1024 samples):

tools/smk -j4 --configfile config/pce_sensitivity.yaml

Run just the PCE analysis step (after scenarios are already solved):

tools/smk -j4 --configfile config/pce_sensitivity.yaml -- \
    results/pce_sensitivity/analysis/pce_global_indices_pce_.csv

The {prefix} wildcard in the output path matches the scenario name prefix (here pce_) so the rule knows which scenarios to aggregate.

Note

Each sample requires a full model build and solve. Start with a small sample count (32–64) for testing, then increase (512–1024) for production. A coarser spatial resolution also reduces per-sample cost.

Output Files

Three CSV files are written to results/{name}/analysis/:

pce_global_indices_{prefix}.csv — Global Sobol indices

Column

Type

Description

output

string

Output metric (total_cost, ghg_emissions, land_use, yll)

parameter

string

Parameter name from generator spec

S1

float

First-order Sobol index

ST

float

Total-order Sobol index

One row per (output, parameter) pair. For example, with 4 outputs and 7 parameters, this file has 28 rows.

pce_conditional_indices_{prefix}.csv — Conditional Sobol indices

Column

Type

Description

output

string

Output metric

parameter

string

Parameter name (non-slice parameters only)

S1_cond

float

Conditional first-order Sobol index

ST_cond

float

Conditional total-order Sobol index

conditional_variance

float

Output variance when slice parameters are fixed

slice columns

float

One column per slice parameter with the conditioning value

Slice parameter columns are named after the parameters themselves (e.g., value_per_yll, ghg_price). One row per (output, parameter, conditioning-value combination).

pce_validation_{prefix}.csv — PCE surrogate quality metrics

Column

Type

Description

output

string

Output metric

loo_error

float

Relative leave-one-out error (lower is better)

r2

float

Coefficient of determination

n_terms

int

Total candidate basis terms

n_active_terms

int

Non-zero PCE coefficients after LARS selection

n_samples

int

Number of model samples

max_degree

int

Maximum polynomial degree (default: 3)

One row per output metric.

Conditional sensitivity area plots

You can visualize how conditional first-order Sobol shares change with policy slice parameters using:

tools/smk -j4 --configfile config/pce_sensitivity.yaml -- \
    results/pce_sensitivity/plots/pce_conditional_s1_vs_value_per_yll_pce_.pdf

This rule also generates:

  • results/{name}/plots/pce_conditional_s1_vs_ghg_price_{prefix}.pdf

  • results/{name}/plots/pce_conditional_s1_vs_value_per_yll_{prefix}.pdf

Each figure contains one panel per model output (for example total_cost, ghg_emissions, land_use, yll) with stacked areas for non-slice parameters showing conditional first-order Sobol shares.

Interpreting Results

Reading Sobol indices:

  • \(S_1 \approx 1\): This parameter is the dominant driver; reducing its uncertainty would substantially reduce output uncertainty.

  • \(S_T \gg S_1\): This parameter is involved in significant interactions with other parameters.

  • \(S_1 \approx S_T \approx 0\): This parameter has negligible influence on the output.

Example interpretation: If yield_factor has \(S_1 = 0.6\) for ghg_emissions, then 60% of the variance in GHG emissions is explained by crop yield uncertainty alone.

Validation quality:

  • LOO error < 0.1 indicates a reliable PCE surrogate.

  • LOO error > 0.1 suggests the polynomial approximation is insufficient. Consider increasing the number of samples, raising the polynomial degree, or checking whether the model response is highly non-smooth.

Conditional indices: These show how sensitivity patterns shift with policy choices. For instance, at low GHG prices, yield uncertainty may dominate emissions variance, while at high GHG prices, land-use-change factors may become more important as the model restructures production patterns.