gcamfaostat
gcamfaostat: An R package to prepare, process, and synthesize FAOSTAT data for global agroeconomic and multisector dynamic modeling - Published in JOSS (2024)
Science Score: 98.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 15 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
- ✓ Institutional organization owner: Organization jgcri has institutional domain (www.pnnl.gov)
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
An R package to prepare, process, and synthesize FAOSTAT data for global agroeconomic and multisector dynamic modeling
Basic Info
- Host: GitHub
- Owner: JGCRI
- License: other
- Language: R
- Default Branch: main
- Homepage: https://jgcri.github.io/gcamfaostat/
- Size: 480 MB
Statistics
- Stars: 13
- Watchers: 2
- Forks: 4
- Open Issues: 8
- Releases: 5
Metadata Files
README.md
gcamfaostat: An R package to prepare, process, and synthesize FAOSTAT data for global agroeconomic and multisector dynamic modeling
Summary
The gcamfaostat R package is designed for the preparation, processing, and synthesis of the Food and Agriculture Organization (FAO) Statistics (FAOSTAT) agroeconomic data. The primary purpose is to facilitate FAOSTAT data use in global economic and multisector dynamic models while ensuring transparency, traceability, and reproducibility. Here, we provide an overview of the development of gcamfaostat (v1.0.0) and demonstrate its capabilities in generating and maintaining agroeconomic data required for the Global Change Analysis Model (GCAM). Our initiative seeks to enhance the quality and accessibility of data for the global agroeconomic modeling community, with the aim of fostering more robust and harmonized outcomes in a collaborative, efficient, and open-source framework. One of the important features of the package is the possibility to construct the FAO Food Balance Sheets at the disaggregated commodity level (with over 500 commodities), which provides a comprehensive and detailed data input for a variety of analytical and modeling applications. The processed data and visualizations offered by gcamfaostat can also be valuable to a broader audience interested in gaining insights into the intricacies of global agriculture.
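As a rough illustration of what a balanced Food Balance Sheet means, the sketch below checks that supply equals utilization for each commodity. It is a toy example, not the package's implementation: the column names (`Production`, `Import`, `Export`, `StockDraw`, `Food`, `Feed`, `Seed`, `Loss`) and all numbers are assumptions made up for this illustration.

```r
# Toy balance sheet in arbitrary mass units; items and numbers are illustrative only.
fbs <- data.frame(
  item       = c("Wheat", "Maize"),
  Production = c(100, 80),
  Import     = c(20, 5),
  Export     = c(10, 15),
  StockDraw  = c(0, 5),    # hypothetical: net withdrawal from stocks adds to supply
  Food       = c(70, 20),
  Feed       = c(25, 45),
  Seed       = c(5, 3),
  Loss       = c(10, 5)
)

# Supply side and utilization side of the balance
fbs$Supply      <- with(fbs, Production + Import - Export + StockDraw)
fbs$Utilization <- with(fbs, Food + Feed + Seed + Loss)

# A non-zero residual flags a commodity that still needs balancing
fbs$Residual <- fbs$Supply - fbs$Utilization
fbs$Balanced <- abs(fbs$Residual) < 1e-6

print(fbs[, c("item", "Supply", "Utilization", "Residual", "Balanced")])
```

In this toy table the first row balances and the second does not, which is the kind of discrepancy a balancing step has to resolve.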
This tool bridges a crucial gap in the literature by offering several key features and capabilities.
- Transparency and Reproducibility: gcamfaostat incorporates functions for downloading, cleaning, synthesizing, and balancing agroeconomic datasets in a traceable, transparent, and reproducible manner. This enhances the credibility of the processing and allows for better scrutiny of the methods. We have documented and demonstrated the use of the package in generating and updating agroeconomic data needed for the GCAM.
- Expandability and Consistency: gcamfaostat can be used to flexibly process and update agroeconomic data for any agroeconomic model. The package framework can also be easily expanded to include new modules for consistently processing new data.
- Community Collaboration and Efficiency: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the effort required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.
- User Accessibility: Where applicable, the processed data can be mapped and aggregated to user-specified regions and sectors for agroeconomic modeling. Beyond the modeling community, gcamfaostat can be valuable to a broader range of users interested in understanding global agriculture trends and dynamics, as it provides user-friendly data processing and visualization tools.
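The mapping and aggregation to user-specified regions and sectors mentioned above can be sketched with dplyr, which the package imports. This is a minimal sketch under stated assumptions, not the package's own code: the table layouts, column names (`area`, `item`, `region`, `sector`, `value`), and all numbers are hypothetical.

```r
library(dplyr)

# Hypothetical country-level data; values are made up for illustration.
production <- data.frame(
  area  = c("France", "Germany", "Brazil", "Argentina"),
  item  = c("Wheat", "Wheat", "Soybeans", "Soybeans"),
  year  = 2020,
  value = c(30, 22, 120, 50)
)

# User-specified mappings (assumed layout: one row per country or commodity)
region_map <- data.frame(
  area   = c("France", "Germany", "Brazil", "Argentina"),
  region = c("Europe", "Europe", "South America", "South America")
)
sector_map <- data.frame(
  item   = c("Wheat", "Soybeans"),
  sector = c("Cereals", "OilCrops")
)

# Attach the mappings, then sum values within each region, sector, and year
regional <- production %>%
  left_join(region_map, by = "area") %>%
  left_join(sector_map, by = "item") %>%
  group_by(region, sector, year) %>%
  summarise(value = sum(value), .groups = "drop")

print(regional)
```

Keeping the mappings in separate tables means the regional or sectoral resolution can be changed by swapping a mapping table rather than editing the processing code.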
User Guide
The package is documented in the online manual.
Quick Start in R (> 4.0) & RStudio
1. Download and install (size < 1 GB):
- On the command line: navigate to your desired folder location and then enter
git clone https://github.com/JGCRI/gcamfaostat.git
2. Load and run the gcamdata package
- Open the gcamfaostat.Rproj file in the gcamfaostat folder using RStudio.
- Load the gcamdata package: devtools::load_all()
3. Modify configurations
- To export csv output files, in constants.R:
  - set Process_Raw_FAO_Data to TRUE if raw data have been downloaded; PREBUILT_DATA will be used otherwise.
  - specify the modules to be excluded (DISABLED_MODULES) if needed.
  - modify data years. Most FAOSTAT datasets are available for 1960 - 2022.
4. Run the driver
driver_drake()
- With driver_drake(write_csv_model = "GCAM"), the related CSV files for GCAM are exported to output/gcamfaostat_GCAM.
- With driver_drake(write_csv_model = "Traceable_FBS"), the related CSV files for generating traceable FBS (T-FBS) are exported to output/gcamfaostat_Traceable_FBS.
- Users can add and design data flows for other models.
5. Use data and package functions
- Data saved in output can be used in downstream models.
- Once driver_drake has been run, all the intermediate data are saved and can be explored (see examples in Use Cases and Visualization).
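The steps above can be collected into a short R script. This is a sketch under assumptions rather than an official workflow: it assumes the working directory is the root of the cloned repository (as it is after opening gcamfaostat.Rproj in RStudio), that devtools and the packages the driver needs are installed, and that the output folder is the one named in step 4. The guard variables `in_repo` and `has_devtools` are introduced here only for illustration.

```r
# Assumption: the working directory is the root of the cloned gcamfaostat repository.
in_repo      <- file.exists("gcamfaostat.Rproj")
has_devtools <- requireNamespace("devtools", quietly = TRUE)

if (in_repo && has_devtools) {
  # Step 2: load the package code from the repository
  devtools::load_all()

  # Step 4: run the driver and export the CSV files for traceable FBS
  driver_drake(write_csv_model = "Traceable_FBS")

  # Step 5: list the exported files (folder name taken from step 4)
  print(list.files(file.path("output", "gcamfaostat_Traceable_FBS")))
} else {
  message("Run this from the gcamfaostat project root with devtools installed.")
}
```

The guard makes the script safe to source from elsewhere: outside the project root it prints a message instead of failing partway through.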
Package structure
- gcamfaostat processes input data into output data in the format needed for downstream processing and modeling, e.g., data used in gcamdata-aglu-FAO (see the schematic below).
- Input data are stored in the Prebuilt Data of the package. The raw data are archived on Zenodo (see Zhao (2024) and the URL in the FF_download_RemoteArchive function) to ensure the processing is 100% replicable. Users can also download the latest data using FF_download_FAOSTAT.
- All intermediate processing and data flows are transparent and traceable. See Processing Flow for data-tracing examples.
Schematic: module (data processing chunk) structure of gcamfaostat
Future work and contribution
Data development is never a once-and-for-all task, and continued efforts are needed to sustain and improve the processing procedures. Further improvements might include:
- Sustain processing functions for updated raw data: ensuring that our processing functions remain up-to-date when raw data undergoes revisions is imperative.
- Evaluate and enhance assumptions: a critical examination of the assumptions utilized in processes like interpolation, extrapolation, aggregation, disaggregation, and mapping is essential and should be an ongoing endeavor.
- Revise assumptions in low-quality data zones: regions and sectors with little or low-quality data require careful consideration. We will need to adjust our assumptions when improved data becomes available.
- Promoting broader applications: leveraging data processed by gcamfaostat can significantly contribute to harmonizing input data in global agroeconomic modeling. Encouraging the utilization of this data and fostering collaboration to enhance data processing is crucial.
- Assess sensitivity in downstream applications: understanding the sensitivity of downstream data applications, e.g., global agroeconomic projections, to upstream data processing assumptions is crucial. This awareness empowers us to make informed decisions and refinements.
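To make the point about gap-filling assumptions concrete, the sketch below fills the same gappy series in two ways using base R's `approx()`. It is an illustrative toy, not the method used in gcamfaostat: the series, the years, and the choice of the two fill rules are assumptions made up for this example.

```r
# Hypothetical annual series with missing observations
years <- 2010:2016
value <- c(NA, 10, NA, NA, 16, NA, NA)
obs   <- !is.na(value)

# Assumption A: linear interpolation between observations; outside the
# observed range, hold the nearest observed value constant (rule = 2)
fill_linear <- approx(years[obs], value[obs], xout = years,
                      method = "linear", rule = 2)$y

# Assumption B: carry the previous observation forward (step function)
fill_step <- approx(years[obs], value[obs], xout = years,
                    method = "constant", rule = 2)$y

# The two assumptions agree at observed years but differ inside the gap
comparison <- data.frame(year = years, observed = value,
                         linear = fill_linear, step = fill_step)
print(comparison)
```

Here the two fills differ only in 2012 and 2013, the years inside the gap, which is exactly where a downstream result would inherit the assumption.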
We welcome and value community contributions to gcamfaostat. Please read our Contributing Guidelines for information on how to contribute to this package. Through collective and collaborative efforts, we hope to improve the interface between raw data, the modeling community, and a broader audience. We would be grateful for feedback and suggestions on potential improvements to the data processing framework.
Related publications
- Bond-Lamberty, Ben, Kalyn Dorheim, Ryna Cui, Russell Horowitz, Abigail Snyder, Katherine Calvin, Leyang Feng et al. "gcamdata: An R package for preparation, synthesis, and tracking of input data for the GCAM integrated human-earth systems model." Journal of Open Research Software 7, no. 1 (2019). https://doi.org/10.5334/jors.232
- Calvin, Katherine V., Abigail Snyder, Xin Zhao, and Marshall Wise. "Modeling land use and land cover change: using a hindcast to estimate economic parameters in gcamland v2.0." Geoscientific Model Development 15, no. 2 (2022): 429-447. https://doi.org/10.5194/gmd-15-429-2022
- Chepeliev, Maksym. "Incorporating nutritional accounts to the GTAP Data Base." Journal of Global Economic Analysis 7, no. 1 (2022): 1-43. https://doi.org/10.21642/JGEA.070101AF
- Narayan et al., (2021). ambrosia: An R package for calculating and analyzing food demand that is responsive to changing incomes and prices. Journal of Open Source Software, 6(59), 2890. https://doi.org/10.21105/joss.02890
- Zhao, Xin, Katherine V. Calvin, Marshall A. Wise, and Gokul Iyer. "The role of global agricultural market integration in multiregional economic modeling: Using hindcast experiments to validate an Armington model." Economic Analysis and Policy 72 (2021): 1-17. https://doi.org/10.1016/j.eap.2021.07.007
- Zhao, Xin and Marshall Wise. "Core Model Proposal# 360: GCAM agriculture and land use (AgLU) data and method updates: connecting land hectares to food calories." PNNL https://jgcri.github.io/gcam-doc/cmp/CMP360-AgLUdatamethodupdates.pdf
- Zhao, Xin (2024). FAOSTAT AgLU data Archive GCAMv7 (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13941470
Copyright 2023 Battelle Memorial Institute; see the LICENSE file.
Owner
- Name: Joint Global Change Research Institute
- Login: JGCRI
- Kind: organization
- Location: College Park, MD, USA
- Website: https://www.pnnl.gov/projects/jgcri
- Repositories: 129
- Profile: https://github.com/JGCRI
Advancing fundamental understanding of human and Earth systems
JOSS Publication
gcamfaostat: An R package to prepare, process, and synthesize FAOSTAT data for global agroeconomic and multisector dynamic modeling
Authors
Joint Global Change Research Institute, Pacific Northwest National Laboratory, College Park, MD, USA
Center for Global Trade Analysis, Department of Agricultural Economics, Purdue University, West Lafayette, IN, USA
Joint Global Change Research Institute, Pacific Northwest National Laboratory, College Park, MD, USA
Joint Global Change Research Institute, Pacific Northwest National Laboratory, College Park, MD, USA
Joint Global Change Research Institute, Pacific Northwest National Laboratory, College Park, MD, USA
Tags
GCAM faostat global economic modeling
GitHub Events
Total
- Create event: 7
- Issues event: 6
- Release event: 2
- Watch event: 7
- Delete event: 4
- Issue comment event: 2
- Push event: 25
- Pull request event: 4
- Fork event: 3
Last Year
- Create event: 7
- Issues event: 6
- Release event: 2
- Watch event: 7
- Delete event: 4
- Issue comment event: 2
- Push event: 25
- Pull request event: 4
- Fork event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Zhao, Xin | x****o@p****v | 120 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 16
- Total pull requests: 7
- Average time to close issues: 3 months
- Average time to close pull requests: 2 days
- Total issue authors: 3
- Total pull request authors: 2
- Average comments per issue: 1.69
- Average comments per pull request: 0.29
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- realxinzhao (11)
- klau506 (2)
- HenriKajasilta (2)
Pull Request Authors
- realxinzhao (9)
- kanishkan91 (1)
Dependencies
- R >= 3.1.2 depends
- assertthat >= 0.2 imports
- data.table >= 1.10.4 imports
- dplyr >= 0.8.2 imports
- magrittr >= 1.5 imports
- methods * imports
- readr >= 1.3.1 imports
- rlang * imports
- tibble >= 1.1 imports
- tidyr >= 0.7.1 imports
- R.utils >= 2.6.0 suggests
- drake >= 6.2.1 suggests
- gcamdata.compdata * suggests
- igraph >= 1.0.1 suggests
- knitr * suggests
- mockr >= 0.1 suggests
- rmarkdown * suggests
- stringr * suggests
- testthat >= 1.0.2 suggests
- tidyselect * suggests
- usethis >= 1.4.0 suggests
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- actions/checkout v3 composite
- r-lib/actions/setup-r f57f1301a053485946083d7a45022b278929a78a composite
- actions/cache v1 composite
- actions/checkout v1 composite
- r-lib/actions/setup-pandoc v2-branch composite
- r-lib/actions/setup-r v2-branch composite
- r-lib/actions/setup-tinytex v2-branch composite
- actions/checkout v4 composite
- actions/upload-artifact v1 composite
- openjournals/openjournals-draft-action master composite