automation
Modular, Scalable Phenomic Data Processing Pipelines
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 12 of 28 committers (42.9%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Repository
Statistics
- Stars: 5
- Watchers: 1
- Forks: 4
- Open Issues: 2
- Releases: 0
README.md
PhytoOracle | Modular, Scalable Phenomic Data Processing Pipeline
PhytoOracle Automation (POA) is a general-use, distributed computing pipeline for phenomic data. POA can be run on local or HPC resources and can process large phenomic datasets, such as those collected by the Field Scanner at the University of Arizona's Maricopa Agricultural Center (pictured below; photo: Jesse Rieser for The Wall Street Journal).

POA's distributed framework, built on CCTools' Makeflow and Work Queue, lets users harness hundreds to thousands of computing cores to process large datasets in parallel. The pipeline is driven by a YAML file, which specifies the processing steps that the pipeline wrapper script (distributed_pipeline_wrapper.py) runs.
Comprehensive instructions for gantry field operations, from field preparation to phenotype information extraction, can be found here.
Required Dependencies
- Linux-based computer, cluster, or server
- Singularity
- iRODS
- Python
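Before a first run, it can help to confirm that the dependencies listed above are reachable from the shell. The following is a minimal sketch, not part of POA. The executable names are assumptions: `singularity` for Singularity, `iinit` (from the iRODS iCommands client) for iRODS, and `python3` for Python. Adjust them to match your installation.

```python
import shutil

# Assumed executable names; change these if your system uses different ones.
REQUIRED_TOOLS = ("singularity", "iinit", "python3")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out for testing; in normal use the default `shutil.which` searches the current PATH.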
YAML File
For more information on YAML file key/value pairs, click here.
Arguments/Flags
For more information on arguments/flags, click here.
Required setup
iRODS
The POA workflow requires iRODS. Follow the documentation here to install iRODS.
If you are running POA on the UA HPC, iRODS is already installed, so there is no need to reinstall it. Skip to bullet #3 of the section "Linux & Windows Subsystem for Linux 2 (WSL2) users".
Data transfer node
If you are running POA on the UA HPC, you will need to set up SSH keys to gain access to data transfer nodes (DTNs). To set up SSH keys, follow the steps here.
Running POA
The script distributed_pipeline_wrapper.py is used to run the pipeline. This script downloads and extracts bundled test data, runs containers, and bundles output data.
Local computer
On your computer/server, run the following command:
./distributed_pipeline_wrapper.py -d 2020-02-14 -y yaml_files/example_machinelearning_workflow.yaml
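To process several scan dates in sequence, the command above can be wrapped in a small loop. This is a sketch under assumptions, not a POA feature: it invokes the wrapper once per date using only the `-d` and `-y` flags shown above, and the second date in the example is hypothetical. By default it only prints the commands (`dry_run=True`).

```python
import subprocess

WRAPPER = "./distributed_pipeline_wrapper.py"


def build_command(date, yaml_file):
    """Build the wrapper's argument list for one scan date."""
    return [WRAPPER, "-d", date, "-y", yaml_file]


def run_dates(dates, yaml_file, dry_run=True):
    """Invoke the wrapper once per date, stopping at the first failure."""
    for date in dates:
        command = build_command(date, yaml_file)
        if dry_run:
            # Show what would be executed without running anything.
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_dates(
        ["2020-02-14", "2020-02-15"],  # second date is a hypothetical example
        "yaml_files/example_machinelearning_workflow.yaml",
    )
```

Passing `dry_run=False` executes each command from the current directory, so the script should be started from the repository root.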
HPC cluster
There are three options when running POA on HPC clusters: interactive, non-interactive, and cron.
Interactive
The pipeline can use a data transfer node to download data, which speeds up processing.
Interactive jobs should be run on tmux to enable a persistent connection. To install tmux on the UA HPC head node, follow the directions here.
You must first launch an interactive node using the following command on UA HPC Puma:
./shell_scripts/interactive_node.sh
Once the resources are allocated, run the following command to process data:
./distributed_pipeline_wrapper.py -hpc -d 2020-02-14 -y yaml_files/example_machinelearning_workflow.yaml
Data will be downloaded and workflows will be launched. You can view progress information for a specific workflow using the mf_monitor.sh script. For example, to view progress information for the first workflow, run:
./shell_scripts/mf_monitor.sh 1
Non-interactive
To submit a date for processing on a non-interactive node, run:
sbatch shell_scripts/slurm_submission.sh <yaml_file>
For example:
sbatch shell_scripts/slurm_submission.sh yaml_files/example_machinelearning_workflow.yaml
Make sure to change the account and partition values as needed in the YAML file.
For modules requiring a larger number of cores (e.g., Megastitch in stereoTop and flirIrCamera, and ps2Top), slurm_submission_large.sh should be used instead.
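When submitting several YAML files, it can be useful to record the Slurm job ID of each submission. The sketch below is an illustration rather than part of POA: it calls `sbatch` with the submission script shown above and parses the `Submitted batch job <id>` line that `sbatch` prints on success. The helper names `parse_job_id` and `submit` are hypothetical.

```python
import re
import subprocess

SUBMIT_SCRIPT = "shell_scripts/slurm_submission.sh"


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def submit(yaml_file, script=SUBMIT_SCRIPT):
    """Submit one YAML file with sbatch and return the Slurm job ID."""
    result = subprocess.run(
        ["sbatch", script, yaml_file],
        check=True,           # raise if sbatch exits with an error
        capture_output=True,  # collect stdout so the job ID can be parsed
        text=True,
    )
    return parse_job_id(result.stdout)
```

For example, `submit("yaml_files/example_machinelearning_workflow.yaml")` run from the repository root on a login node would return the new job's ID, which can then be passed to `squeue -j` or `scancel`.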
Cron
To schedule Cron jobs, follow the directions here.
Owner
- Name: PhytoOracle
- Login: phytooracle
- Kind: organization
- Email: phytooracle@gmail.com
- Location: Tucson, AZ
- Repositories: 17
- Profile: https://github.com/phytooracle
Code for extracting phenotypic insights from plant phenomics data. Developed by the labs of Drs. Duke Pauli and Eric Lyons at the University of Arizona.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Gonzalez"
    given-names: "Emmanuel"
    orcid: "https://orcid.org/0000-0002-3021-9842"
  - family-names: "Zarei"
    given-names: "Ariyan"
    orcid: "https://orcid.org/0000-0002-3670-2472"
  - family-names: "Hendler"
    given-names: "Nathanial"
    orcid: "https://orcid.org/0000-0002-3164-0428"
  - family-names: "Simmons"
    given-names: "Travis"
    orcid: "https://orcid.org/0000-0002-6915-7087"
  - family-names: "Zarei"
    given-names: "Arman"
    orcid: ""
  - family-names: "Cosi"
    given-names: "Michele"
    orcid: "https://orcid.org/0000-0001-7609-1939"
  - family-names: "Ellingson"
    given-names: "Holly"
    orcid: "https://orcid.org/0000-0002-3358-6496"
  - family-names: "Calleja"
    given-names: "Sebastian"
    orcid: "https://orcid.org/0000-0001-9401-4494"
  - family-names: "Demieville"
    given-names: "Jeffrey"
    orcid: "https://orcid.org/0000-0002-7725-7379"
  - family-names: "Lyons"
    given-names: "Eric"
    orcid: "https://orcid.org/0000-0002-3348-8845"
  - family-names: "Pauli"
    given-names: "Duke"
    orcid: "https://orcid.org/0000-0002-8292-2388"
title: "automation"
version: 1.0.0
doi:
date-released: 2022-04-12
url: "https://github.com/phytooracle/automation"
GitHub Events
Total
- Issues event: 3
- Watch event: 1
- Delete event: 4
- Issue comment event: 4
- Push event: 66
- Pull request review event: 3
- Pull request review comment event: 2
- Pull request event: 10
- Create event: 3
Last Year
- Issues event: 3
- Watch event: 1
- Delete event: 4
- Issue comment event: 4
- Push event: 66
- Pull request review event: 3
- Pull request review comment event: 2
- Pull request event: 10
- Create event: 3
Committers
Last synced: almost 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Emmanuel Gonzalez | e****z@e****u | 1,270 |
| Jeffrey Demieville | 1****e | 112 |
| Nathanial Hendler | e****t@l****u | 70 |
| emilyc02 | 8****2 | 39 |
| Travis-Simmons | 9****4@s****u | 37 |
| beyzabozdag | n****g@g****m | 34 |
| equant | n****r@g****m | 19 |
| reys7899 | 6****9 | 9 |
| jep236 | 4****6 | 7 |
| Aditya Kumar | 6****s | 6 |
| Emmanuel Gonzalez | e****z@g****g | 6 |
| Sarthak | s****l@g****m | 5 |
| PAULI-LAB | p****b@P****l | 3 |
| James Lee | h****b | 2 |
| Travis Simmons | t****s@w****u | 2 |
| Zeussssssss | a****2@y****n | 2 |
| h38464874b | s****1@a****u | 2 |
| Emmanuel Gonzalez | eg@m****n | 1 |
| Emmanuel Gonzalez | eg@m****s | 1 |
| Nimet Beyza Bozdag | n****g@w****u | 1 |
| PAULI-LAB | p****b@d****u | 1 |
| Travis Simmons | 6****s | 1 |
| James Lee | 1****b | 1 |
| Travis Simmons | t****s@j****u | 1 |
| Travis Simmons | t****s@r****u | 1 |
| Travis Simmons | t****s@r****u | 1 |
| Travis Simmons | t****s@r****u | 1 |
| emilyc02 | e****2@e****u | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 61
- Total pull requests: 47
- Average time to close issues: 3 months
- Average time to close pull requests: 21 days
- Total issue authors: 5
- Total pull request authors: 6
- Average comments per issue: 0.74
- Average comments per pull request: 0.04
- Merged pull requests: 40
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 5
- Average time to close issues: 5 months
- Average time to close pull requests: 41 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 2.0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- equant (25)
- emmanuelgonz (23)
- Travis-Simmons (11)
- beyzabozdag (1)
- jeffreydemieville (1)
Pull Request Authors
- emmanuelgonz (22)
- jeffreydemieville (7)
- equant (7)
- Travis-Simmons (5)
- beyzabozdag (4)
- emilyc02 (2)
Dependencies
- pyyaml *
- typing_extensions *
- ubuntu 18.04 build