agml-cy-bench

CY-Bench (Crop Yield Benchmark) is a comprehensive dataset and benchmark to forecast crop yields at the subnational level. CY-Bench standardizes selection, processing and spatio-temporal harmonization of public subnational yield statistics with relevant predictors. Contributors include agronomists, climate scientists and machine learning researchers.

https://github.com/wur-ai/agml-cy-bench

Keywords from Contributors

transformation

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: WUR-AI
License: other
Language: Jupyter Notebook
Default Branch: main
Homepage: https://cybench.agml.org/
Size: 81 MB

Statistics

Stars: 32
Watchers: 9
Forks: 14
Open Issues: 35
Releases: 0

Created over 2 years ago · Last pushed 10 months ago

Metadata Files

Readme Contributing License

README.md

AgML - Machine Learning for Agricultural Modeling

AgML is the AgMIP transdisciplinary community of agricultural and machine learning modelers.

AgML aspires to

identify key research gaps and opportunities at the intersection of agricultural modelling and machine learning research,
support enhanced collaboration and engagement between experts in these disciplines, and
conduct and publish protocol-based studies to establish best practices for robust machine learning use in agricultural modelling.

AgML Crop Yield Forecasting

The objective of the AgML Crop Yield Forecasting task is to create a benchmark to compare models for crop yield forecasting across countries and crops. The models and forecasts can be used for food security planning or famine early warning. The benchmark is called CY-Bench (crop yield benchmark).

Overview

Early in-season predictions of crop yields can inform decisions at multiple levels of the food value chain, from late-season agricultural management such as fertilization, harvest, and storage to import or export of produce. Anticipating crop yields is also important to ensure market transparency at the global level (e.g. Agriculture Market Information System, GEOGLAM Crop Monitor) and to plan response actions in food-insecure countries at risk of food production shortfalls.

We propose CY-Bench, a dataset and benchmark for subnational crop yield forecasting, with coverage of major crop growing countries and underrepresented countries of the world for maize and wheat. By subnational, we mean the administrative level where yield statistics are published. When statistics are available for multiple levels, we pick the highest resolution. By yield, we mean end-of-season yield statistics as published by national statistics offices or similar entities representing a group of countries. By forecasting, we mean that the prediction is made ahead of harvest. The task is also called in-season crop yield forecasting.

In-season forecasting is done at a number of time points during the growing season, from start of season (SOS) to end of season (EOS) or harvest. The first forecast is made at middle-of-season, (EOS - SOS)/2. Other options are quarter-of-season, (EOS - SOS)/4, and n-day(s) before harvest. The exact time point or time step at which the forecast is made depends on the crop calendar for the selected crop and country (or region). All time series inputs are truncated at the forecast or inference time point, i.e. data from the remaining part of the season is not used. Since yield statistics may not be available for the current season, we evaluate models using predictors and yield statistics for all available years.

We compare models, algorithms and architectures by keeping other parts of the workflow as similar as possible. For example, the dataset uses the same source for each type of predictor (e.g. weather variables, soil moisture, evapotranspiration, remote sensing biomass indicators, soil properties), the selected data are preprocessed using the same pipeline (use the crop mask and crop calendar; use the same boundary files and approach for spatial aggregation), and algorithms that require feature design follow the same feature design protocol.
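The forecast time points and the truncation of time series inputs described above can be sketched as follows. This is a minimal illustration and not code from the cybench library: the function names `forecast_cutoff` and `truncate_series` are hypothetical, the lead time is assumed to be counted back from EOS (for middle-of-season this gives the same date as counting forward from SOS), and the cutoff date itself is assumed to be included in the inputs.

```python
from datetime import date, timedelta
from typing import List, Tuple, Union


def forecast_cutoff(sos: date, eos: date, lead_time: Union[str, int]) -> date:
    """Return the date at which the forecast is made (hypothetical helper).

    Assumption: the lead time is measured backwards from end of season (EOS).
    An integer lead_time means "n days before harvest".
    """
    season_days = (eos - sos).days
    if isinstance(lead_time, int):
        lead_days = lead_time
    elif lead_time == "middle-of-season":
        lead_days = season_days // 2
    elif lead_time == "quarter-of-season":
        lead_days = season_days // 4
    else:
        raise ValueError(f"unknown lead time: {lead_time!r}")
    return eos - timedelta(days=lead_days)


def truncate_series(
    series: List[Tuple[date, float]], cutoff: date
) -> List[Tuple[date, float]]:
    """Keep only observations up to and including the cutoff date."""
    return [(day, value) for day, value in series if day <= cutoff]


# Example season of 180 days (dates chosen for illustration only).
sos, eos = date(2020, 4, 1), date(2020, 9, 28)
mid = forecast_cutoff(sos, eos, "middle-of-season")
quarter = forecast_cutoff(sos, eos, "quarter-of-season")
```

With these assumptions, a quarter-of-season forecast is made later in the season than a middle-of-season forecast, so it sees more of the time series.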

Coverage for maize

Undifferentiated Maize, or Grain Maize where differentiated.

Figure: Maize Coverage Map

Coverage for wheat

Undifferentiated Wheat, or Winter Wheat where differentiated.

Figure: Wheat Coverage Map

Deciphering crop names

The terms used to reference different varieties or seasons of maize/wheat have been simplified in CY-Bench. The following table describes the representative crop names as provided in the crop statistics:

| Country/Region | Maize | Wheat |
|:---------------|:------------------|:--------------------------|
| EU-EUROSTAT | Grain Maize | Soft Wheat |
| Africa-FEWSNET | Maize | N/A |
| Argentina | Corn | Wheat |
| Australia | N/A | Winter Wheat |
| Brazil | Grain Corn | Grain Wheat |
| China | Grain Corn | Grain/Winter/Spring Wheat |
| Germany | Grain Maize | Winter Wheat |
| India | Maize | Wheat |
| Mali | Maize | N/A |
| Mexico | White/Yellow Corn | N/A |
| USA | Grain Corn | Winter Wheat |

Targets

Yield Map

Getting started

cybench is an open source Python library to load the CY-Bench dataset and run the CY-Bench tasks.

Installation

git clone https://github.com/WUR-AI/AgML-CY-Bench

Requirements

Run the following commands to install the dependencies:

pip install poetry
cd AgML-CY-Bench
poetry install

Downloading the sample dataset

You can work with a small sample of the dataset by running the following command from the AgML-CY-Bench folder:

git clone https://github.com/WUR-AI/sample_data.git cybench/data

Running a reduced version of the benchmark

To check that everything is set up correctly, run

poetry run python cybench/runs/run_benchmark.py -d maize_NL -m test

Running the full benchmark

To run the benchmark for many crops and countries, follow the installation and requirements steps from the previous section on a machine with a significant amount of resources (memory and storage).

Get the dataset from Zenodo. After downloading the dataset, move the unzipped data inside AgML-CY-Bench/cybench/data or make sure AgML-CY-Bench/cybench/data points to the directory containing unzipped data.

Unzip the downloaded data:
unzip cybench-data.zip -d <target_dir>
Move the data to the expected data path:
mv <target_dir> cybench/data
or create a symbolic link from cybench/data to the target directory:
ln -sf <target_dir> cybench/data
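The symbolic-link option can also be scripted. The sketch below is one possible way to do it with the Python standard library. The helper name `link_data_dir` is hypothetical and is not part of cybench. Unlike `ln -sf`, it refuses to touch an existing real directory at the data path instead of creating the link inside it.

```python
import os
from pathlib import Path


def link_data_dir(target_dir: Path, data_path: Path) -> Path:
    """Make data_path a symbolic link to target_dir (hypothetical helper)."""
    target_dir = target_dir.resolve()
    if not target_dir.is_dir():
        raise FileNotFoundError(f"{target_dir} is not a directory")
    if data_path.is_symlink():
        # Replace a stale link, similar to the -f flag of ln.
        data_path.unlink()
    elif data_path.exists():
        raise FileExistsError(f"{data_path} exists and is not a symbolic link")
    data_path.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target_dir, data_path, target_is_directory=True)
    return data_path
```

From the AgML-CY-Bench folder, this could be called as `link_data_dir(Path("<target_dir>"), Path("cybench/data"))`, with `<target_dir>` replaced by the directory holding the unzipped data.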

Run the benchmark on a dataset using poetry run python cybench/runs/run_benchmark.py -d maize_NL
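To run the benchmark on several datasets in sequence, the command above can be wrapped in a small script. This is a sketch under stated assumptions: the helpers `benchmark_command` and `run_all` are hypothetical, only the `-d` and `-m` options shown in this README are used, and the script is assumed to be started from the AgML-CY-Bench folder.

```python
import subprocess
from typing import List, Optional

# Script path as given in this README.
SCRIPT = "cybench/runs/run_benchmark.py"


def benchmark_command(dataset: str, mode: Optional[str] = None) -> List[str]:
    """Build the command line for one dataset (hypothetical helper)."""
    cmd = ["poetry", "run", "python", SCRIPT, "-d", dataset]
    if mode is not None:
        cmd += ["-m", mode]
    return cmd


def run_all(datasets: List[str], mode: Optional[str] = None) -> None:
    """Run the benchmark for each dataset, stopping at the first failure."""
    for dataset in datasets:
        subprocess.run(benchmark_command(dataset, mode), check=True)
```

For example, `run_all(["maize_NL"], mode="test")` would issue the same command as the reduced benchmark run.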

If you want to write your own model and compare its performance with the benchmark, write a model class that extends the BaseModel class. The BaseModel class is defined in cybench.models.model.

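```
from cybench.models.model import BaseModel
from cybench.runs.run_benchmark import run_benchmark


class MyModel(BaseModel):
    pass  # implement the methods required by BaseModel


run_name = "<run_name>"  # placeholder: choose a name for this run
dataset_name = "maize_US"
result = run_benchmark(
    run_name=run_name,
    model_name="my_model",
    model_constructor=MyModel,
    model_init_kwargs={},  # constructor arguments (elided in the source)
    model_fit_kwargs={},  # fit arguments (elided in the source)
    dataset_name=dataset_name,
)

# Average each metric per model over the benchmark results.
metrics = ["normalized_rmse", "mape", "r2"]
df_metrics = result["df_metrics"].reset_index()
print(df_metrics.groupby("model").agg({m: "mean" for m in metrics}))
```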

Compare the results (values of metrics for the specified dataset) with the baseline results for the same dataset.
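For readers unfamiliar with the three metrics named in the example, the sketch below shows common textbook definitions in plain Python. It is not the cybench implementation. In particular, normalizing the RMSE by the mean of the observed yields is an assumption, and other conventions (such as normalizing by the range, or reporting percentages) exist.

```python
import math
from typing import Sequence


def normalized_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the mean observed value (assumed normalization)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, as a fraction (not multiplied by 100)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only, not benchmark results.
observed = [2.0, 4.0, 6.0]
predicted = [2.0, 5.0, 5.0]
scores = {
    "normalized_rmse": normalized_rmse(observed, predicted),
    "mape": mape(observed, predicted),
    "r2": r2(observed, predicted),
}
```

Lower values are better for normalized RMSE and MAPE, while higher values are better for R2.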

Reproducing the baseline results

The baseline results were produced in the following test environment:

Operating system: Ubuntu 18.04
CPU: Intel Xeon Gold 6448Y (32 Cores)
Memory (RAM): 256GB
Disk storage: 2TB
GPU: NVIDIA RTX A6000

Benchmark run time

During the benchmark run with the baseline models, several countries were run in parallel, each on its own GPU in a distributed cluster. The larger countries took approximately 18 hours to complete. If run sequentially on a single capable GPU, the whole benchmark should take 50-60 hours to complete.

Leaderboard

See tables inside results_baselines

Data sources

| Crop Statistics | Shapefiles or administrative boundaries | Predictors, crop masks, crop calendars |
|-----------------|-----------------------------------------|----------------------------------------|
| Africa from FEWSNET | Africa from FEWSNET | Weather: AgERA5 |
| Mali (1) | Use Africa shapefiles from FEWSNET | Soil: WISE soil data |
| Argentina | Argentina | Soil moisture: GLDAS |
| Australia | Australia | Evapotranspiration: FAO |
| Brazil | Brazil | FAPAR: JRC FAPAR |
| China | China | Crop calendars: ESA WorldCereal |
| EU | EU | NDVI: MOD09CMG |
| Germany (2) | Use EU shapefiles | Crop Masks: ESA WorldCereal |
| India | India | |
| Mexico | Mexico | |
| US | US | |

1: Mali data at admin level 3. Mali data is also included in the FEWSNET Africa dataset, but at admin level 1 only.

2: Germany data is also included in the EU dataset, but most of the data there fails coherence tests (e.g. yield = production / harvest_area).
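A coherence test of the kind mentioned in note 2 can be sketched as below. This is an illustration, not the check used to build CY-Bench: the function name `is_coherent` and the 5% relative tolerance are assumptions, and the units of the three values are assumed to be consistent with each other.

```python
import math


def is_coherent(
    yield_value: float,
    production: float,
    harvest_area: float,
    rel_tol: float = 0.05,  # assumed tolerance, not taken from CY-Bench
) -> bool:
    """Check that yield is approximately production / harvest_area."""
    if harvest_area <= 0:
        # A yield cannot be derived without a positive harvested area.
        return False
    return math.isclose(yield_value, production / harvest_area, rel_tol=rel_tol)
```

Records for which this check fails would be flagged for review or exclusion, depending on the data preparation policy.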

How to cite

Please cite CY-Bench as follows:

@dataset{paudel_etal2024,
  author       = {Paudel, Dilli and
                  Baja, Hilmy and
                  van Bree, Ron and
                  Kallenberg, Michiel and
                  Ofori-Ampofo, Stella and
                  Potze, Aike and
                  Poudel, Pratishtha and
                  Saleh, Abdelrahman and
                  Anderson, Weston and
                  von Bloh, Malte and
                  Castellano, Andres and
                  Ennaji, Oumnia and
                  Hamed, Raed and
                  Laudien, Rahel and
                  Lee, Donghoon and
                  Luna, Inti and
                  Masiliūnas, Dainius and
                  Meroni, Michele and
                  Mutuku, Janet Mumo and
                  Mkuhlani, Siyabusa and
                  Richetti, Jonathan and
                  Ruane, Alex C. and
                  Sahajpal, Ritvik and
                  Shuai, Guanyuan and
                  Sitokonstantinou, Vasileios and
                  de Souza Noia Junior, Rogerio and
                  Srivastava, Amit Kumar and
                  Strong, Robert and
                  Sweet, Lily-belle and
                  Vojnović, Petar and
                  de Wit, Allard and
                  Zachow, Maximilian and
                  Athanasiadis, Ioannis N.},
  title        = {{CY-Bench: A comprehensive benchmark dataset
                   for subnational crop yield forecasting}},
  year         = 2024,
  publisher    = {AgML (https://www.agml.org/)},
  version      = {1.0},
  doi          = {10.5281/zenodo.11502142},
}

How to contribute

Thank you for your interest in contributing to AgML Crop Yield Forecasting. Please check contributing guidelines for how to get involved and contribute.

Additional information

For more information please visit the AgML website.

Owner

Name: WUR-AI
Login: WUR-AI
Kind: organization

Repositories: 15
Profile: https://github.com/WUR-AI

GitHub Events

Total

Issues event: 12
Watch event: 11
Delete event: 2
Issue comment event: 20
Push event: 89
Pull request review event: 4
Pull request event: 19
Fork event: 5
Create event: 11

Last Year

Issues event: 12
Watch event: 11
Delete event: 2
Issue comment event: 20
Push event: 89
Pull request review event: 4
Pull request event: 19
Fork event: 5
Create event: 11

Committers

Last synced: 11 months ago

All Time

Total Commits: 786
Total Committers: 23
Avg Commits per committer: 34.174
Development Distribution Score (DDS): 0.457
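
The all-time figures above can be reproduced from the commit counts on this page. The sketch assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is an assumption here, although it matches the reported value.

```python
total_commits = 786
total_committers = 23
top_committer_commits = 427  # krsnapaudel, from the Top Committers table

# Average number of commits per committer.
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (top committer commits / total commits).
dds = 1 - top_committer_commits / total_commits

print(f"Avg commits per committer: {avg_commits:.3f}")
print(f"DDS: {dds:.3f}")
```

A DDS close to 0 means one person made nearly all commits, while a value close to 1 means commits are spread across many committers.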

Past Year

Commits: 254
Committers: 5
Avg Commits per committer: 50.8
Development Distribution Score (DDS): 0.346

Top Committers

Name	Email	Commits
krsnapaudel	d**l@w**l	427
Michiel Kallenberg	m**g@g**m	91
ellaampy	e**y@g**m	45
Michiel Kallenberg		35
ronvree	r**e@g**m	29
Aike Potze	a**e@h**m	26
janet68	4****8	21
Pratishtha Poudel	p**a@g**m	19
hbja	h**6@g**m	18
Maximilian Zachow	m**w@p**m	13
Inti Luna Aviles	i**s@g**m	10
AbdelrahmanAmr3	a**i@g**m	9
Raed Hamed	5****d	9
VANT	i****a	8
Carla	r**n@p**e	7
smkuhlani	3****i	5
Jonathan Richetti	r**5@p**r	5
mmeronijrc	m**i@e**u	2
oumniaennaji	7****i	2
Amit Srivastava	6****n	2
Jonathan Richetti	r**5@v**r	1
Gonzalo-Mier	g**z@w**l	1
gnodnooh	g**h@g**m	1

Committer Domains (Top 20 + Academic)

wur.nl: 2
virga-login.cm.cluster: 1
ext.ec.europa.eu: 1
petrichor-login.cm.cluster: 1
posteo.de: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 37
Total pull requests: 83
Average time to close issues: about 1 month
Average time to close pull requests: 16 days
Total issue authors: 5
Total pull request authors: 9
Average comments per issue: 1.0
Average comments per pull request: 1.25
Merged pull requests: 64
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 11
Pull requests: 23
Average time to close issues: 7 days
Average time to close pull requests: 4 days
Issue authors: 5
Pull request authors: 3
Average comments per issue: 0.18
Average comments per pull request: 0.35
Merged pull requests: 18
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

krsnapaudel (28)
michielkallenberg (4)
ellaampy (3)
Lalitgis (1)
ThScheeve (1)

Pull Request Authors

krsnapaudel (37)
michielkallenberg (23)
ellaampy (6)
mzachow (6)
AbdelrahmanAmr3 (4)
ronvree (3)
umdsgy (2)
Gonzalo-Mier (1)
poudelpratishtha (1)

Top Labels

Issue Labels

data-preparation (7)
model-api (4)
documentation (1)
help wanted (1)
enhancement (1)
baseline-models (1)

Pull Request Labels

baseline-models (1)

Science Score: 39.0%
