Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: sternp
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Size: 352 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 4
Created almost 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

TranscriptM

A metatranscriptome bioinformatics pipeline including metagenome contamination correction.

Installation (conda)

```
git clone git@github.com:sternp/transcriptm.git
cd transcriptm
conda env create -n transcriptm -f transcriptm.yaml
conda activate transcriptm
pip install -e .
```

Usage (QC, mapping, gDNA decontamination, and read counting pipeline)

```
transcriptm count -1 readsR1.fastq.gz \
    -2 readsR2.fastq.gz \
    -n 24 \
    --ref combinedreference.fna \
    --gff combinedreference.gff \
    -m 128 \
    -db /dir/to/bowtie21 /dir/to/bowtie22 \
    -o output_directory
```

Specifying -g will concatenate a directory of .fna genomes into a single reference sequence and annotate it with Prokka (time intensive). Alternatively, you can use pre-constructed files via --ref and --gff.

Please note that contig FASTA headers currently must follow the format shown in these examples: >Ardenticatenaceae-ID1234_00001, >Ardenticatenaceae-ID1234_00002, etc.

```
positional arguments:
  {count}

optional arguments:
  -h, --help            show this help message and exit
  --version             Show version information.
  --verbosity VERBOSITY
                        1 = critical, 2 = error, 3 = warning, 4 = info,
                        5 = debug. Default = 4 (logging) (default: 4)
  --log LOG             Output logging information to file
  -1                    Forward FASTQ reads (default: none)
  -2                    Reverse FASTQ reads (default: none)
  --conda-prefix        Path to the location of installed conda environments,
                        or where to install new environments (default:
                        /work/microbiome/users/sternesp/conda/envs/transcriptm-dev/envs/)
  -n, --n-cores         Maximum number of cores available for use. (default: 8)
  -m, --max-memory      Maximum memory for available usage in gigabytes, GB
                        (default: 64)
  -o, --output          Output directory (default: ./)
  --dry-run [DRYRUN]    Perform snakemake dry run, tests workflow order and
                        conda environments
  --conda-frontend      Which conda frontend to use (default: mamba)
  -db [ ...]            Location of one or more Bowtie2-formatted databases
                        for contamination filtering during read QC (i.e.
                        human, rRNA..etc) (default: none)
  --trimmomatic         Apply custom trimmomatic values (default:
                        SLIDINGWINDOW:4:20\ MINLEN:50)
  --sequencer-source    NexteraPE, TruSeq2, TruSeq3, none (default: TruSeq3)
  --skip-qc             Skip the read QC step
  -g, --genome-dir      Directory containing FASTA files of each genome
                        (default: none)
  --ref                 A single reference .fna file of contigs/genomes
                        (default: none)
  --gff                 GFF annotation of the reference sequence specified by
                        --ref (default: none)
  -x, --genome-fasta-extension
                        File extension of genomes in the genome directory
                        (default: fna)
  --kingdom             For use in Prokka when constructing & annotating a new
                        reference from .fna files (default: Bacteria)
  --min-read-aligned-percent
                        Minimum read alignment percent for CoverM filtering
                        (scale from 0-1) (default: 0.75)
  --min-read-percent-identity
                        Minimum read percent identity for CoverM filtering
                        (scale from 0-1) (default: 0.95)
  --gDNA                Median x-fold gDNA coverage to enable gDNA
                        contamination correction. (default: 1)
```
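To illustrate what the -g option's concatenation step amounts to, here is a minimal Python sketch that joins every genome file in a directory into a single reference FASTA. It is not TranscriptM's own implementation: the function name `concatenate_genomes` and its parameters are hypothetical, and the Prokka annotation step is not covered.

```python
from pathlib import Path


def concatenate_genomes(genome_dir, output_fasta, extension="fna"):
    """Concatenate every *.<extension> file in genome_dir into one FASTA file.

    Hypothetical helper for illustration only; returns the number of
    genome files that were written to output_fasta.
    """
    genome_dir = Path(genome_dir)
    output_fasta = Path(output_fasta)
    # Sort for a reproducible contig order, and skip the output file itself
    # in case it lives in the same directory with the same extension.
    genome_files = sorted(
        path for path in genome_dir.glob(f"*.{extension}")
        if path.resolve() != output_fasta.resolve()
    )
    with output_fasta.open("w") as out:
        for path in genome_files:
            text = path.read_text()
            out.write(text)
            # Ensure the next genome's header starts on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(genome_files)
```

The `extension` argument mirrors the default of -x (fna). A combined file produced this way would still need a matching GFF annotation before it could be passed via --ref and --gff.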

Notes

You may need to download databases to filter out contaminating reads from rRNA genes, the human genome, etc. Alternatively, you can build your own Bowtie2-formatted database. Multiple Bowtie2 databases can be passed to the transcriptm command, e.g. -db /dir/to/db1 /dir/to/db2

Pre-made databases can be downloaded like so:

```
conda activate transcriptm

kneaddata_database --download human_genome bowtie2 /work/microbiome/db/humangrch37bowtie2

Available pre-made databases:
KneadData Databases ( database : build = location )
human_genome : bowtie2 = http://huttenhower.sph.harvard.edu/kneadDatadatabases/Homosapienshg37andhumancontaminationBowtie2v0.1.tar.gz
human_genome : bmtagger = http://huttenhower.sph.harvard.edu/kneadDatadatabases/HomosapiensBMTaggerv0.1.tar.gz
human_transcriptome : bowtie2 = http://huttenhower.sph.harvard.edu/kneadDatadatabases/Homosapienshg38transcriptomeBowtie2v0.1.tar.gz
ribosomal_RNA : bowtie2 = http://huttenhower.sph.harvard.edu/kneadDatadatabases/SILVA128LSUParcSSUParcribosomalRNAv0.2.tar.gz
mouse_C57BL : bowtie2 = http://huttenhower.sph.harvard.edu/kneadDatadatabases/mouseC57BL6NJBowtie2_v0.1.tar.gz
```
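When several contamination databases are in play, it can help to assemble the command programmatically. The sketch below builds an argument list for `transcriptm count` using only flags listed in the help text above. The function name `build_count_command` and its parameter names are hypothetical, and the sketch only constructs the command without running it.

```python
import shlex


def build_count_command(reads_1, reads_2, ref, gff, output_dir,
                        databases=(), n_cores=8, max_memory=64):
    """Return the argument list for a `transcriptm count` run.

    Hypothetical helper for illustration; the n_cores and max_memory
    defaults mirror the defaults shown in the help text.
    """
    command = [
        "transcriptm", "count",
        "-1", str(reads_1),
        "-2", str(reads_2),
        "-n", str(n_cores),
        "-m", str(max_memory),
        "--ref", str(ref),
        "--gff", str(gff),
        "-o", str(output_dir),
    ]
    if databases:
        # -db accepts one or more database paths after a single flag.
        command.append("-db")
        command.extend(str(db) for db in databases)
    return command


if __name__ == "__main__":
    cmd = build_count_command(
        "readsR1.fastq.gz", "readsR2.fastq.gz",
        "combinedreference.fna", "combinedreference.gff",
        "output_directory",
        databases=["/dir/to/db1", "/dir/to/db2"],
        n_cores=24, max_memory=128,
    )
    # Print a shell-quoted version for inspection before running it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result could be handed to `subprocess.run` without shell quoting concerns, assuming `transcriptm` is on the PATH of the active conda environment.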

Owner

  • Login: sternp
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sternes
    given-names: Peter R.
    orcid: https://orcid.org/0000-0002-4456-150X
  - family-names: Tyson
    given-names: Gene W.
    orcid: https://orcid.org/0000-0001-8559-9427
title: "TranscriptM: A metatranscriptome bioinformatics pipeline including metagenome contamination correction"
version: 0.3.1
doi: 10.5281/zenodo.11090118
date-released: 2024-04-30

GitHub Events

Total
  • Release event: 1
  • Push event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

setup.py pypi