automation

Modular, Scalable Phenomic Data Processing Pipelines

https://github.com/phytooracle/automation

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    12 of 28 committers (42.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Keywords

distributed-computing hpc lettuce makeflow phenomics phenotyping pipeline plant plant-phenotyping python sorghum workqueue
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: phytooracle
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.97 MB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 4
  • Open Issues: 2
  • Releases: 0
Created over 4 years ago · Last pushed 4 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

PhytoOracle | Modular, Scalable Phenomic Data Processing Pipeline

PhytoOracle Automation (POA) is a general-use, distributed computing pipeline for phenomic data. POA can be run on local or HPC resources and can process large phenomic datasets such as those collected by the Field Scanner at the University of Arizona's Maricopa Agricultural Center (pictured below, Photo: Jesse Rieser for The Wall Street Journal).

POA's distributed framework, built on CCTools' Makeflow and Work Queue, lets users harness hundreds to thousands of computing cores to process large datasets in parallel. The pipeline is configured with a YAML file, which specifies the processing steps run by the pipeline wrapper script (distributed_pipeline_wrapper.py).

Comprehensive instructions for gantry field operations, from field preparation to phenotype information extraction, can be found here.

Required Dependencies

YAML File

For more information on YAML file key/value pairs, click here.

Arguments/Flags

For more information on arguments/flags, click here.

Required setup

iRODS

The POA workflow requires iRODS. Follow the documentation here to install iRODS.

If you are running POA on the UA HPC, iRODS is already installed, so there is no need to reinstall it. Skip to the section "Linux & Windows Subsystem for Linux 2 (WSL2) users", bullet 3.

Data transfer node

If you are running POA on the UA HPC, you will need to set up SSH keys to gain access to data transfer nodes (DTNs). To set up SSH keys, follow the steps here.

Running POA

The script distributed_pipeline_wrapper.py is used to run the pipeline. This script downloads and extracts bundled test data, runs containers, and bundles output data.

Local computer

On your computer or server, run the following command:

./distributed_pipeline_wrapper.py -d 2020-02-14 -y yaml_files/example_machinelearning_workflow.yaml
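When several scan dates need processing, the local command above can be generated rather than typed by hand. The following is a minimal sketch, not part of POA: the helper name `build_local_command` is hypothetical, the second date in the usage loop is made up for illustration, and the assumption that `-d` expects a `YYYY-MM-DD` date is inferred only from the example above.

```python
import shlex
from datetime import datetime
from typing import List

# Path to the wrapper script, as given in the README.
WRAPPER = "./distributed_pipeline_wrapper.py"


def build_local_command(scan_date: str, yaml_file: str) -> List[str]:
    """Return the argument list for one local POA run (hypothetical helper)."""
    # Assumption: -d takes a YYYY-MM-DD date, as in the README example.
    # strptime raises ValueError if the string does not parse as such a date.
    datetime.strptime(scan_date, "%Y-%m-%d")
    return [WRAPPER, "-d", scan_date, "-y", yaml_file]


if __name__ == "__main__":
    yaml_file = "yaml_files/example_machinelearning_workflow.yaml"
    # The second date is illustrative only.
    for scan_date in ["2020-02-14", "2020-02-15"]:
        print(shlex.join(build_local_command(scan_date, yaml_file)))
```

The sketch only prints the commands. To execute one, the returned list could be passed to `subprocess.run(command, check=True)` from the repository root.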

HPC cluster

There are three options when running POA on HPC clusters: interactive, non-interactive, and Cron.

Interactive

The pipeline can use a data transfer node to download data, which speeds up processing.

Interactive jobs should be run in tmux so that the session persists if the connection drops. To install tmux on the UA HPC head node, follow the directions here.

You must first launch an interactive node using the following command on UA HPC Puma:

./shell_scripts/interactive_node.sh

Once the resources are allocated, run the following command to process data:

./distributed_pipeline_wrapper.py -hpc -d 2020-02-14 -y yaml_files/example_machinelearning_workflow.yaml

Data will be downloaded and workflows will be launched. You can view progress information for a specific workflow using the mf_monitor.sh script. For example, to view progress information for the first workflow, run:

./shell_scripts/mf_monitor.sh 1
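If several workflows are running, the monitor command can be generated for each one. This is a small sketch under stated assumptions: the helper `build_monitor_command` is hypothetical, and workflow numbering is assumed to start at 1 because the README refers to the first workflow as `1`.

```python
import shlex
from typing import List

# Path to the monitor script, as given in the README.
MONITOR_SCRIPT = "./shell_scripts/mf_monitor.sh"


def build_monitor_command(workflow_number: int) -> List[str]:
    """Return the argument list that monitors one workflow (hypothetical helper)."""
    # Assumption: workflows are numbered from 1.
    if workflow_number < 1:
        raise ValueError("workflow numbers are assumed to start at 1")
    return [MONITOR_SCRIPT, str(workflow_number)]


if __name__ == "__main__":
    # Print the monitor command for the first three workflows.
    for number in range(1, 4):
        print(shlex.join(build_monitor_command(number)))
```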

Non-interactive

To submit a date for processing as a non-interactive job, run:

sbatch shell_scripts/slurm_submission.sh <yaml_file>

For example:

sbatch shell_scripts/slurm_submission.sh yaml_files/example_machinelearning_workflow.yaml

Make sure to change the account and partition values as needed in the YAML file. For modules requiring a larger number of cores (e.g., Megastitch in stereoTop and flirIrCamera, and ps2Top), use slurm_submission_large.sh instead.
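Submitting several YAML files can follow the same pattern as the single `sbatch` call above. The sketch below is an illustration rather than part of POA: the helper `build_sbatch_commands` is hypothetical, and the file-extension check is an added safeguard, not a documented requirement of the submission script.

```python
import shlex
from pathlib import Path
from typing import Iterable, List

# Default submission script, as given in the README.
SUBMISSION_SCRIPT = "shell_scripts/slurm_submission.sh"


def build_sbatch_commands(
    yaml_files: Iterable[str], script: str = SUBMISSION_SCRIPT
) -> List[List[str]]:
    """Return one sbatch argument list per YAML file (hypothetical helper)."""
    commands = []
    for yaml_file in yaml_files:
        # Added safeguard (an assumption): reject paths that are not YAML files.
        if Path(yaml_file).suffix not in {".yaml", ".yml"}:
            raise ValueError(f"not a YAML file: {yaml_file}")
        commands.append(["sbatch", script, yaml_file])
    return commands


if __name__ == "__main__":
    files = ["yaml_files/example_machinelearning_workflow.yaml"]
    for command in build_sbatch_commands(files):
        print(shlex.join(command))
```

For modules that need more cores, the `script` argument can point at the larger submission script instead. The commands are only printed here; `subprocess.run(command, check=True)` would submit them on a system where `sbatch` is available.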

Cron

To schedule Cron jobs, follow the directions here.

Owner

  • Name: PhytoOracle
  • Login: phytooracle
  • Kind: organization
  • Email: phytooracle@gmail.com
  • Location: Tucson, AZ

Code for extracting phenotypic insights from plant phenomics data. Developed by the labs of Drs. Duke Pauli and Eric Lyons at the University of Arizona.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Gonzalez"
  given-names: "Emmanuel"
  orcid: "https://orcid.org/0000-0002-3021-9842"
- family-names: "Zarei"
  given-names: "Ariyan"
  orcid: "https://orcid.org/0000-0002-3670-2472"
- family-names: "Hendler"
  given-names: "Nathanial"
  orcid: "https://orcid.org/0000-0002-3164-0428"
- family-names: "Simmons"
  given-names: "Travis"
  orcid: "https://orcid.org/0000-0002-6915-7087"
- family-names: "Zarei"
  given-names: "Arman"
  orcid: ""
- family-names: "Cosi"
  given-names: "Michele"
  orcid: "https://orcid.org/0000-0001-7609-1939"
- family-names: "Ellingson"
  given-names: "Holly"
  orcid: "https://orcid.org/0000-0002-3358-6496"
- family-names: "Calleja"
  given-names: "Sebastian"
  orcid: "https://orcid.org/0000-0001-9401-4494"
- family-names: "Demieville"
  given-names: "Jeffrey"
  orcid: "https://orcid.org/0000-0002-7725-7379"
- family-names: "Lyons"
  given-names: "Eric"
  orcid: "https://orcid.org/0000-0002-3348-8845"
- family-names: "Pauli"
  given-names: "Duke"
  orcid: "https://orcid.org/0000-0002-8292-2388"

title: "automation"
version: 1.0.0
doi: 
date-released: 2022-04-12
url: "https://github.com/phytooracle/automation"

GitHub Events

Total
  • Issues event: 3
  • Watch event: 1
  • Delete event: 4
  • Issue comment event: 4
  • Push event: 66
  • Pull request review event: 3
  • Pull request review comment event: 2
  • Pull request event: 10
  • Create event: 3
Last Year
  • Issues event: 3
  • Watch event: 1
  • Delete event: 4
  • Issue comment event: 4
  • Push event: 66
  • Pull request review event: 3
  • Pull request review comment event: 2
  • Pull request event: 10
  • Create event: 3

Committers

Last synced: almost 2 years ago

All Time
  • Total Commits: 1,636
  • Total Committers: 28
  • Avg Commits per committer: 58.429
  • Development Distribution Score (DDS): 0.224
Past Year
  • Commits: 437
  • Committers: 13
  • Avg Commits per committer: 33.615
  • Development Distribution Score (DDS): 0.437
Top Committers
Name Email Commits
Emmanuel Gonzalez e****z@e****u 1,270
Jeffrey Demieville 1****e 112
Nathanial Hendler e****t@l****u 70
emilyc02 8****2 39
Travis-Simmons 9****4@s****u 37
beyzabozdag n****g@g****m 34
equant n****r@g****m 19
reys7899 6****9 9
jep236 4****6 7
Aditya Kumar 6****s 6
Emmanuel Gonzalez e****z@g****g 6
Sarthak s****l@g****m 5
PAULI-LAB p****b@P****l 3
James Lee h****b 2
Travis Simmons t****s@w****u 2
Zeussssssss a****2@y****n 2
h38464874b s****1@a****u 2
Emmanuel Gonzalez eg@m****n 1
Emmanuel Gonzalez eg@m****s 1
Nimet Beyza Bozdag n****g@w****u 1
PAULI-LAB p****b@d****u 1
Travis Simmons 6****s 1
James Lee 1****b 1
Travis Simmons t****s@j****u 1
Travis Simmons t****s@r****u 1
Travis Simmons t****s@r****u 1
Travis Simmons t****s@r****u 1
emilyc02 e****2@e****u 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 61
  • Total pull requests: 47
  • Average time to close issues: 3 months
  • Average time to close pull requests: 21 days
  • Total issue authors: 5
  • Total pull request authors: 6
  • Average comments per issue: 0.74
  • Average comments per pull request: 0.04
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 5
  • Average time to close issues: 5 months
  • Average time to close pull requests: 41 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • equant (25)
  • emmanuelgonz (23)
  • Travis-Simmons (11)
  • beyzabozdag (1)
  • jeffreydemieville (1)
Pull Request Authors
  • emmanuelgonz (22)
  • jeffreydemieville (7)
  • equant (7)
  • Travis-Simmons (5)
  • beyzabozdag (4)
  • emilyc02 (2)
Top Labels
Issue Labels
enhancement (25) bug (19) wontfix (19) Season 11 (14) yaml (11) help wanted (10) S11_postprocessing (9) RFC (7) cctools (4) 3D (4) Season 12 (3) Season 10 (2) slack (2) psii (2) question (1) level_0 (1)
Pull Request Labels
enhancement (1) Season 12 (1) Season 11 (1) yaml (1)

Dependencies

requirements.txt pypi
  • pyyaml *
  • typing_extensions *
Dockerfile docker
  • ubuntu 18.04 build