beethoven
BEETHOVEN is: Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 12 of 21 committers (57.1%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: NIEHS
- License: other
- Language: R
- Default Branch: main
- Homepage: https://niehs.github.io/beethoven/
- Size: 650 MB
Statistics
- Stars: 6
- Watchers: 6
- Forks: 2
- Open Issues: 8
- Releases: 0
Metadata Files
README.md
Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality 
[check-standard](https://github.com/NIEHS/beethoven/actions/workflows/check-standard.yaml) [test-coverage](https://github.com/NIEHS/beethoven/actions/workflows/test-coverage.yaml) [lint](https://github.com/NIEHS/beethoven/actions/workflows/lint.yaml) [lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)

Group Project for the Spatiotemporal Exposures and Toxicology group with help from friends :smiley: :cowboy_hat_face: :earth_americas:
Installation
```r
remotes::install_github("NIEHS/beethoven")
```
Workflow
beethoven is a targets reproducible analysis pipeline with the following workflow.

Version 0.4.4 of beethoven has stable targets for downloading data files, calculating features at AQS sites, and merging them into a base learner-ready data.table (`dt_feat_calc_xyt`). Ongoing changes relate to calculating features for the prediction grid, managing the prediction grid computationally, tuning base learner hyperparameters, and developing the meta learner function.
```r
targets::tar_visnetwork()
```
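For readers new to `targets`, the build-then-inspect cycle can be tried on a toy pipeline before touching beethoven itself. The sketch below assumes the `targets` package is installed; the target names `num_values` and `num_squared` are made up for the demo and are not beethoven targets.

```r
library(targets)

# Work in a throwaway directory so no files are written to a real project
toy_dir <- file.path(tempdir(), "toy_targets")
dir.create(toy_dir, showWarnings = FALSE)
old_wd <- setwd(toy_dir)

# Write a two-target _targets.R; the second target uses dynamic branching,
# creating one branch per element of num_values
tar_script({
  list(
    tar_target(num_values, c(1, 2, 3)),
    tar_target(num_squared, num_values^2, pattern = map(num_values))
  )
}, ask = FALSE)

tar_make()                        # build every target
squared <- tar_read(num_squared)  # branches are combined on read
outdated <- tar_outdated()        # names of targets that still need a rebuild

setwd(old_wd)
print(unname(squared))
print(outdated)
```

In a real beethoven checkout, the same `tar_read()` and `tar_outdated()` calls would be run from the project root against the existing pipeline rather than a generated script.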

Organization
Here, we describe the structure of the repository, important files, and the targets object naming conventions.
Folder Structure
- `R/` is where the `beethoven` functions are stored. Only ".R" files should be in this folder (i.e. `targets` helpers, post-processing, model fitting functions).
- `inst/` is a directory for arbitrary files outside of the main `R/` directory.
  - `targets/` is a sub-directory within `inst/` which contains the pipeline files (i.e. "targets_aqs.R"). These files declare the `targets::tar_target` objects which constitute the `beethoven` pipeline.
- `tests/` stores unit and integration tests (`testthat/`) and test data (`testdata/`) according to the testthat package's standard structure for unit testing. `testthat.R` is created and maintained by `testthat`, and is not to be edited manually.
- `container/` stores definition files and build scripts to build covariate- and model-specific Apptainer container images (`container_covariates.def` and `container_models.def`).
- `man/` contains function documentation files (".Rd") which are generated by the roxygen2 package. These files are not to be edited manually.
- `vignettes/` contains ".Rmd" narrative text and code files. These are rendered by pkgdown into the Articles section of the `beethoven` webpage.
- `.github/workflows/` is a hidden directory which stores the GitHub CI/CD "yaml" files.
- `tools/` is dedicated to educational or demonstration material (e.g. Rshiny), but is not excluded from the package build.
Important Files
- `_targets.R` configures `targets` settings, creates computational resource controllers, and structures the `beethoven` pipeline.
  - To run `beethoven`, users must review and update the following parameters for their user profile and computing system:
    - `controller_*`: Ensure the local controllers do not request more CPUs than are available on your machine or high performance system.
    - `#SBATCH --partition`: Utilization of NVIDIA GPUs (within `glue::glue` command)
    - `--bind /USER_PATH_TO_INPUT/input:/input` (within `glue::glue` command)
- `_targets.yaml` is created and updated by running `targets::tar_make` and is not to be edited manually.
- `run.sh` submits separate `SBATCH` jobs for the covariate, cpu- and gpu-enabled base learner, and the meta learner `targets` (see `/inst/scripts/`). This setup ensures that each stage utilizes the proper container image and computational resources. To run `beethoven`, users must review and update the following parameters for their user profile and computing system in each of the `inst/scripts/run_*` files:
  - `#SBATCH --mail-user`
  - `#SBATCH --partition`
  - `#SBATCH --mem`
  - `#SBATCH --cpus-per-task`
  - `--bind /USER_PATH_TO_INPUT/input:/input`
  - `--bind /USER_PATH_TO_SLURM/slurm:/USER_PATH_TO_SLURM/slurm`
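The advice about controller CPU requests can be checked before launching anything. The following is a minimal sketch using only base R; `cap_workers()` and the controller names are hypothetical and not part of beethoven. Note that `parallel::detectCores()` reports the cores of the current machine, which on an HPC node may differ from what a SLURM allocation actually grants.

```r
# Hypothetical pre-flight check; not part of beethoven.
# `requested` maps illustrative controller names to the CPUs they ask for.
cap_workers <- function(requested, available = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fail loudly in that case
  stopifnot(is.numeric(requested), !is.na(available), available >= 1)
  over <- requested > available
  if (any(over)) {
    warning(
      "Capping controllers that exceed ", available, " CPUs: ",
      paste(names(requested)[over], collapse = ", ")
    )
  }
  # Element-wise minimum; names of `requested` are preserved
  pmin(requested, available)
}

# Example with made-up controller names and an assumed 8-core machine
requested <- c(controller_small = 4, controller_large = 64)
capped <- suppressWarnings(cap_workers(requested, available = 8))
print(capped)
```

Passing `available` explicitly, as in the example, is useful when the limit comes from a job allocation rather than from the hardware.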
Running beethoven Pipeline
User settings
The beethoven pipeline is configured for SLURM, with defaults for the NIEHS HPC. To adapt the settings to your environment, consult your platform's documentation and edit the requested resources in the stage-specific run files in `/inst/scripts/` (lines 3-11) and in `_targets.R` (lines 41-45; the individual `crew` and `crew.cluster` controller workers).
Critical targets
There are 5 "critical" targets that users may want to change to run beethoven.
- `chr_daterange`
  - Controls all time-related targets for the entire pipeline. This is the only `target` that needs to be changed to update the pipeline with a new temporal range. Month- and year-specific arguments are derived from the time range defined by `chr_daterange`.
- `chr_nasa_token`
  - Sets the file path to the user's NASA Earthdata account credentials. These credentials expire at ~90 day intervals and therefore must be updated regularly.
- `chr_mod06_links`
  - The file path to the MOD06 links file. These links must be manually downloaded per the `amadeus::download_modis` function. The links are then stored in a CSV file that is read by the function. The links file must be updated to match any new date range.
- `chr_input_dir`
  - The file path to the input directory. This target controls where the raw data files are downloaded to and imported from. This file path must be mounted to the container at run time in the `run.sh` script.
- `num_dates_split`
  - Controls the size of temporal splits. Splitting the temporal range into smaller chunks allows for parallel processing across multiple workers. It also allows for dispatching new dynamic branches when the temporal range is updated.
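To make the interaction between `chr_daterange` and `num_dates_split` concrete, here is a minimal base R sketch of splitting a date range into fixed-size chunks. `split_date_range()` is a hypothetical illustration, not beethoven's implementation, and the date range and split size are example values.

```r
# Hypothetical helper; NOT beethoven's implementation.
# Split an inclusive date range into consecutive chunks of at most `n` days.
split_date_range <- function(daterange, n) {
  stopifnot(length(daterange) == 2, n >= 1)
  dates <- seq(as.Date(daterange[1]), as.Date(daterange[2]), by = "day")
  # Integer chunk id for each date: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_along(dates) / n)
  chunks <- split(dates, chunk_id)
  # Keep only the first and last date of each chunk, as character strings
  lapply(chunks, function(d) format(range(d)))
}

chr_daterange <- c("2018-01-01", "2018-01-31")  # example values
num_dates_split <- 10                            # example value
chunks <- split_date_range(chr_daterange, num_dates_split)
print(length(chunks))
print(chunks[[1]])
```

Each chunk can then be handed to a separate worker. Extending the end date only adds chunks at the tail, which is consistent with the idea that only new branches need to be dispatched.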
Apptainer
The current implementation of beethoven utilizes Apptainer images to run the pipeline with consistent package versions and custom installations. Users must build these images before running beethoven.
```sh
cd container/ # must be working in the `container/` directory
sh build_container_covariates.sh # build "covariates" stage image
sh build_container_models.sh # build "models" image
mv *sif ../ # move images to `beethoven/` root directory
```
> [!NOTE]
> `.sif` files are omitted from GitHub due to size (>5 GB each)
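Because the images are built locally, it can help to confirm they are in place before submitting a job. The sketch below is a hypothetical base R check, not part of beethoven; the `.sif` file names are assumptions derived from the `.def` file names and may differ from what the build scripts actually produce.

```r
# Hypothetical helper; not part of beethoven.
# Report which expected Apptainer images are missing from `dir`.
missing_images <- function(expected, dir = ".") {
  found <- list.files(dir, pattern = "\\.sif$")
  setdiff(expected, found)
}

# Demo in a temporary directory containing one empty placeholder file.
# The file names below are assumptions, not confirmed build outputs.
demo_dir <- file.path(tempdir(), "beethoven_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "container_covariates.sif"))
expected <- c("container_covariates.sif", "container_models.sif")
still_missing <- missing_images(expected, demo_dir)
print(still_missing)
```

In practice, `dir` would be the beethoven root directory, and a non-empty result means a build step still has to be run.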
Run
After switching back to the project root directory, users can run the pipeline with the `run.sh` shell script. The following lines of `/inst/scripts/run_*.sh` must be updated with user-specific settings before running the pipeline:
```sh
#SBATCH --mail-user=[USER_EMAIL] # email address for job notifications
#SBATCH --partition=[PARTITION_NAME] # HPC partition to run on
#SBATCH --mem=[###G] # Total memory for the job
#SBATCH --cpus-per-task=[###] # Total CPUs for the job
...
  --bind [USERINPUTDIRECTORY]/input:/input \
...
  --bind [USERSYSTEMPATH/munge]:/run/munge \
  --bind [USERSYSTEMPATH/slurm]:[USERSYSTEMPATH/slurm] \
```
Once configured, the pipeline can be run with a SLURM batch job.
```sh
cd ../ # assuming still in the `container/` directory
sbatch run.sh
```
The SLURM batch job can also be submitted from an R session with the `batch` helper function.
```r
source("R/helpers.R")
batch()
```
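The body of `batch()` is not shown here, so the following is only a hypothetical stand-in illustrating one way to submit a SLURM job from R with base `system2()`. The function name `submit_pipeline()` and its `dry_run` argument are assumptions made for this sketch; the real helper in `R/helpers.R` may work differently.

```r
# Hypothetical stand-in for a submission helper; the real batch() in
# R/helpers.R may work differently.
submit_pipeline <- function(script = "run.sh", dry_run = FALSE) {
  cmd <- paste("sbatch", shQuote(script))
  if (dry_run) {
    return(cmd)  # show the command without submitting anything
  }
  if (!nzchar(Sys.which("sbatch"))) {
    stop("`sbatch` was not found on the PATH; is SLURM available here?")
  }
  if (!file.exists(script)) {
    stop("Submission script not found: ", script)
  }
  # system2() returns the exit status of sbatch (0 on success)
  system2("sbatch", args = shQuote(script))
}

# A dry run is safe on any machine because nothing is submitted
cmd <- submit_pipeline(dry_run = TRUE)
print(cmd)
```

Checking for `sbatch` and the script up front gives a clearer error on machines without SLURM than a failed system call would.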
Contribution
The Developer's Guide provides detailed instructions for how to develop or update beethoven settings or individual targets objects.
To contribute developments or modifications, open a Pull request into the dev branch with a detailed description of the proposed changes. Pull requests must pass all status checks, and then will be approved or rejected by beethoven's authors.
Utilize Issues to notify the authors of bugs, questions, or recommendations. Identify each issue with the appropriate label to help ensure a timely response.
Owner
- Name: National Institute of Environmental Health Sciences
- Login: NIEHS
- Kind: organization
- Location: Durham, NC
- Website: https://www.niehs.nih.gov/
- Repositories: 55
- Profile: https://github.com/NIEHS
The mission of the National Institute of Environmental Health Sciences is to discover how the environment affects people in order to promote healthier lives.
GitHub Events
Total
- Create event: 22
- Issues event: 71
- Watch event: 6
- Delete event: 20
- Issue comment event: 102
- Push event: 246
- Pull request review comment event: 6
- Pull request review event: 15
- Gollum event: 1
- Pull request event: 40
- Fork event: 2
Last Year
- Create event: 22
- Issues event: 71
- Watch event: 6
- Delete event: 20
- Issue comment event: 102
- Push event: 246
- Pull request review comment event: 6
- Pull request review event: 15
- Gollum event: 1
- Pull request event: 40
- Fork event: 2
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Insang Song | i****g@n****v | 331 |
| mitchellmanware | m****e@g****m | 243 |
| {SET}group | 1****y | 194 |
| Kyle Messier | m****p@e****v | 83 |
| Eva Marques | m****l@e****v | 40 |
| Insang Song | s****x@h****m | 36 |
| kyle-messier | k****r@n****v | 28 |
| Spatiotemporal-Exposures-and-Toxicology | m****p@a****v | 15 |
| Mitchell Manware | m****e@M****l | 13 |
| Spatiotemporal-Exposures-and-Toxicology | m****p@a****v | 8 |
| Eva Marques | m****l@c****v | 7 |
| Eva Marques | e****s@g****m | 4 |
| Messier | m****p@a****v | 4 |
| Mariana Kassien | k****a@e****v | 4 |
| Ranadeep Daw | 3****p | 3 |
| Eva Marques | m****l@g****v | 2 |
| dzilber | d****r@g****m | 2 |
| Daniel Zilber | d****r@n****v | 1 |
| Mitchell Manware | m****e@e****v | 1 |
| Spatiotemporal-Exposures-and-Toxicology | m****p@a****l | 1 |
| Spatiotemporal-Exposures-and-Toxicology | m****p@a****n | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 99
- Total pull requests: 117
- Average time to close issues: 4 months
- Average time to close pull requests: 3 days
- Total issue authors: 7
- Total pull request authors: 6
- Average comments per issue: 2.44
- Average comments per pull request: 0.97
- Merged pull requests: 90
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 39
- Pull requests: 64
- Average time to close issues: about 1 month
- Average time to close pull requests: 2 days
- Issue authors: 3
- Pull request authors: 3
- Average comments per issue: 1.9
- Average comments per pull request: 0.52
- Merged pull requests: 47
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kyle-messier (40)
- sigmafelix (26)
- mitchellmanware (21)
- eva0marques (7)
- MAKassien (3)
- Sanisha003 (1)
- dawranadeep (1)
Pull Request Authors
- mitchellmanware (46)
- kyle-messier (32)
- sigmafelix (30)
- eva0marques (7)
- dawranadeep (1)
- MAKassien (1)
Dependencies
- actions/cache v2 composite
- actions/checkout v2 composite
- codecov/codecov-action v3 composite
- r-lib/actions/setup-r v2 composite
- actions/checkout v3 composite
- actions/setup-node v3 composite
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- covr * suggests
- knitr * suggests
- rmarkdown * suggests
- sf * suggests
- sftime * suggests
- terra * suggests
- testthat >= 3.0.0 suggests