submitdatairidanext

Pipeline for submitting data to INSDC databases for IRIDA Next

https://github.com/phac-nml/submitdatairidanext

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

data-sharing data-submission file-transfer microbial-genomics next-generation-sequencing

Repository

Basic Info
  • Host: GitHub
  • Owner: phac-nml
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Homepage:
  • Size: 2.64 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 4
  • Releases: 0
Topics
data-sharing data-submission file-transfer microbial-genomics next-generation-sequencing
Created 12 months ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

Data Submission Pipeline for IRIDA Next

Pipeline for submitting data to INSDC databases for IRIDA Next.

Development Status

This pipeline is currently in early stage development. Several aspects of core functionality are missing, untested and subject to change. This pipeline is incomplete and not ready for production use.

Input

The input to the pipeline is a standard sample sheet (passed as --input samplesheet.csv) that looks like:

| sample  | fastq1         | fastq2         | taxonid | collectiondate | country |
| ------- | -------------- | -------------- | ------: | -------------: | ------- |
| SampleA | file1.fastq.gz | file2.fastq.gz |     562 |           2025 | Canada  |

The structure of this file is defined in assets/schema_input.json. Validation of the sample sheet is performed by nf-validation.
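As a rough illustration of what a pre-flight check on the sample sheet could look like, the sketch below verifies the header and a couple of fields using only the Python standard library. It is not the pipeline's validation (that is done by nf-validation against assets/schema_input.json). The function name `check_samplesheet` and the rule that taxonid must be an integer are assumptions made for this example.

```python
import csv
import io

# Column names taken from the example sample sheet above.
EXPECTED_COLUMNS = ["sample", "fastq1", "fastq2", "taxonid", "collectiondate", "country"]


def check_samplesheet(text):
    """Return a list of human-readable problems found in a sample sheet."""
    reader = csv.DictReader(io.StringIO(text))
    problems = []
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
        return problems
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not (row["sample"] or "").strip():
            problems.append(f"line {line_no}: empty sample name")
        # Assumption: taxonid, when present, is an integer taxonomy ID.
        taxonid = (row["taxonid"] or "").strip()
        if taxonid and not taxonid.isdigit():
            problems.append(f"line {line_no}: taxonid is not an integer: {taxonid!r}")
    return problems


example = (
    "sample,fastq1,fastq2,taxonid,collectiondate,country\n"
    "SampleA,file1.fastq.gz,file2.fastq.gz,562,2025,Canada\n"
)
print(check_samplesheet(example))
```

An empty list means the sheet passed these minimal checks. The authoritative rules remain those in assets/schema_input.json.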

Parameters

The main parameters are --input, as defined above, and --outdir, which specifies the output results directory. You may also wish to provide -profile singularity to run with Singularity containers, and -r [branch] to select which GitHub branch to run.

Other parameters (defaults from nf-core) are defined in nextflow_schema.json.
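To get a quick overview of the available parameters and their defaults, one option is to read the schema file programmatically. The sketch below assumes the usual nf-core layout, in which parameter groups sit under a top-level `definitions` (or `$defs`) key and each group has a `properties` mapping. That layout has not been checked against this repository. The helper `list_parameters` and the schema fragment are hypothetical.

```python
import json


def list_parameters(schema):
    """Map each parameter name to its default (None when no default is set)."""
    params = {}
    # Assumption: parameter groups live under "definitions" or "$defs".
    groups = schema.get("definitions") or schema.get("$defs") or {}
    for group in groups.values():
        for name, spec in group.get("properties", {}).items():
            params[name] = spec.get("default")
    return params


# Hypothetical schema fragment for illustration; not the pipeline's real schema.
example_schema = json.loads("""
{
  "definitions": {
    "input_output_options": {
      "properties": {
        "input": {"type": "string"},
        "outdir": {"type": "string", "default": "results"}
      }
    }
  }
}
""")
print(list_parameters(example_schema))
```

For the real file, replace the inline fragment with `json.load(open("nextflow_schema.json"))` run from the repository root.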

Running

To run the pipeline, please do:

nextflow run phac-nml/submitdatairidanext -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results

Here, samplesheet.csv is structured as described in the Input section. For more information, see the usage documentation.
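When the pipeline is launched from a script rather than by hand, it can help to assemble the command as an argument list. The following is a minimal sketch that uses only the flags shown in this README. The function name `build_run_command` and its default values are illustrative choices, not part of the pipeline.

```python
import shlex


def build_run_command(samplesheet, outdir, profile="singularity", revision="main"):
    """Assemble the nextflow invocation as an argument list."""
    return [
        "nextflow", "run", "phac-nml/submitdatairidanext",
        "-profile", profile,
        "-r", revision,
        "-latest",
        "--input", samplesheet,
        "--outdir", outdir,
    ]


cmd = build_run_command("assets/samplesheet.csv", "results")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)`. Passing it as a list, without `shell=True`, avoids quoting problems with paths that contain spaces.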

Output

A JSON file for loading metadata into IRIDA Next is output by this pipeline. The format of this JSON file is specified in our Pipeline Standards for the IRIDA Next JSON. This JSON file is written directly within the --outdir provided to the pipeline with the name iridanext.output.json.gz (ex: [outdir]/iridanext.output.json.gz).

An example of the contents of the IRIDA Next JSON file for this pipeline is as follows:

{
  "files": {
    "global": [
      { "path": "summary/summary.txt.gz" }
    ],
    "samples": {
      "SAMPLE1": [
        { "path": "assembly/SAMPLE1.assembly.fa.gz" }
      ],
      "SAMPLE2": [
        { "path": "assembly/SAMPLE2.assembly.fa.gz" }
      ],
      "SAMPLE3": [
        { "path": "assembly/SAMPLE3.assembly.fa.gz" }
      ]
    }
  },
  "metadata": {
    "samples": {
      "SAMPLE1": {
        "reads.1": "sample1_R1.fastq.gz",
        "reads.2": "sample1_R2.fastq.gz"
      },
      "SAMPLE2": {
        "reads.1": "sample2_R1.fastq.gz",
        "reads.2": "sample2_R2.fastq.gz"
      },
      "SAMPLE3": {
        "reads.1": "sample1_R1.fastq.gz",
        "reads.2": "null"
      }
    }
  }
}

Within the files section of this JSON file, all of the output paths are relative to the outdir. Therefore, "path": "assembly/SAMPLE1.assembly.fa.gz" refers to a file located within outdir/assembly/SAMPLE1.assembly.fa.gz.
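The relative-path rule above can be sketched in a few lines of Python. The helpers below read the gzipped JSON and join each listed path onto the output directory. The function names `load_iridanext_json` and `resolve_output_paths` are hypothetical, and the sample document is a shortened copy of the example shown in this section.

```python
import gzip
import json
from pathlib import Path


def load_iridanext_json(outdir):
    """Read and parse [outdir]/iridanext.output.json.gz."""
    path = Path(outdir) / "iridanext.output.json.gz"
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def resolve_output_paths(document, outdir):
    """Return (scope, path) pairs with each path joined onto outdir."""
    outdir = Path(outdir)
    files = document.get("files", {})
    # Global files first, then the per-sample files in document order.
    resolved = [("global", outdir / entry["path"]) for entry in files.get("global", [])]
    for sample, entries in files.get("samples", {}).items():
        resolved.extend((sample, outdir / entry["path"]) for entry in entries)
    return resolved


# Shortened copy of the example document from this section.
doc = {
    "files": {
        "global": [{"path": "summary/summary.txt.gz"}],
        "samples": {"SAMPLE1": [{"path": "assembly/SAMPLE1.assembly.fa.gz"}]},
    }
}
for scope, path in resolve_output_paths(doc, "results"):
    print(scope, path)
```

After a real run, `resolve_output_paths(load_iridanext_json("results"), "results")` would list every file the pipeline reported, assuming the run wrote its output to `results`.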

There is also a pipeline execution summary output file provided (specified in the above JSON as "global": [{"path":"summary/summary.txt.gz"}]). However, there is no formatting specification for this file.

For more information, see the output documentation.

Test profile

To run with the test profile, please do:

nextflow run phac-nml/submitdatairidanext -profile docker,test -r main -latest --outdir results

Legal

Copyright 2023 Government of Canada

Licensed under the MIT License (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at:

https://opensource.org/license/mit/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Owner

  • Name: National Microbiology Laboratory
  • Login: phac-nml
  • Kind: organization

Citation (CITATIONS.md)

# phac-nml/submitdatairidanext: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

## Upload Tools

- [ena-webin-cli](https://github.com/enasequence/webin-cli)

  > The European Bioinformatics Institute (EMBL-EBI)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Issues event: 28
  • Delete event: 19
  • Issue comment event: 8
  • Push event: 120
  • Pull request review event: 2
  • Pull request review comment event: 5
  • Pull request event: 37
  • Create event: 15
Last Year
  • Issues event: 28
  • Delete event: 19
  • Issue comment event: 8
  • Push event: 120
  • Pull request review event: 2
  • Pull request review comment event: 5
  • Pull request event: 37
  • Create event: 15

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 2
  • Total pull requests: 1
  • Average time to close issues: 20 days
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 1
  • Average time to close issues: 20 days
  • Average time to close pull requests: about 1 hour
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dfornika (17)
Pull Request Authors
  • dfornika (19)
Top Labels
Issue Labels
enhancement (2)
Pull Request Labels

Dependencies

.github/workflows/branch.yml actions
  • mshick/add-pr-comment b8f338c590a895d50bcbfa6c5859251edc8952fc composite
.github/workflows/ci.yml actions
  • actions/cache v3 composite
  • actions/checkout b4ffde65f46336ab88eb53be808477a3936bae11 composite
  • nf-core/setup-nextflow v1 composite
.github/workflows/linting.yml actions
  • actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
  • actions/setup-python 0b93645e9fea7318ecaed2b359559ac225c90a2b composite
  • actions/upload-artifact b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 composite
  • nf-core/setup-nextflow v2 composite
  • pietrobolcato/action-read-yaml 1.1.0 composite
.github/workflows/linting_comment.yml actions
  • dawidd6/action-download-artifact 20319c5641d495c8a52e688b7dc5fada6c3a9fbc composite
  • marocchino/sticky-pull-request-comment 331f8f5b4215f0445d3c07b4967662a32a2d3e31 composite
modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
pyproject.toml pypi
modules/local/upload_to_ena/environment.yml conda
  • ena-webin-cli 8.1.1.*
modules/nf-core/custom/dumpsoftwareversions/environment.yml pypi