transcriptm

https://github.com/sternp/transcriptm

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: sternp
License: bsd-3-clause
Language: Python
Default Branch: master
Size: 352 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 4

Created almost 5 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

TranscriptM

A metatranscriptome bioinformatics pipeline including metagenome contamination correction.

Installation (conda)

```
git clone git@github.com:sternp/transcriptm.git
cd transcriptm
conda env create -n transcriptm -f transcriptm.yaml
conda activate transcriptm
pip install -e .
```

Usage (QC, mapping, gDNA decontamination, and read counting pipeline)

```
transcriptm count -1 readsR1.fastq.gz \
    -2 readsR2.fastq.gz \
    -n 24 \
    --ref combinedreference.fna \
    --gff combinedreference.gff \
    -m 128 \
    -db /dir/to/bowtie21 /dir/to/bowtie22 \
    -o output_directory

Specifying -g will concatenate a directory of .fna genomes into a single reference sequence and annotate it with Prokka (time-intensive).
Alternatively, you can use pre-constructed files via --ref and --gff.

Please note: contig FASTA headers currently must follow the format shown in these examples:
>Ardenticatenaceae-ID1234_00001, >Ardenticatenaceae-ID1234_00002, etc.

positional arguments:
  {count}

optional arguments:
  -h, --help                    show this help message and exit
  --version                     Show version information.
  --verbosity VERBOSITY         1 = critical, 2 = error, 3 = warning, 4 = info, 5 = debug. Default = 4 (logging) (default: 4)
  --log LOG                     Output logging information to file
  -1                            Forward FASTQ reads (default: none)
  -2                            Reverse FASTQ reads (default: none)
  --conda-prefix                Path to the location of installed conda environments, or where to install new environments (default: /work/microbiome/users/sternesp/conda/envs/transcriptm-dev/envs/)
  -n, --n-cores                 Maximum number of cores available for use. (default: 8)
  -m, --max-memory              Maximum memory available for use, in gigabytes (GB) (default: 64)
  -o, --output                  Output directory (default: ./)
  --dry-run [DRYRUN]            Perform snakemake dry run, tests workflow order and conda environments
  --conda-frontend              Which conda frontend to use (default: mamba)
  -db [ ...]                    Location of one or more Bowtie2-formatted databases for contamination filtering during read QC (e.g. human, rRNA, etc.) (default: none)
  --trimmomatic                 Apply custom trimmomatic values (default: SLIDINGWINDOW:4:20\ MINLEN:50)
  --sequencer-source            NexteraPE, TruSeq2, TruSeq3, none (default: TruSeq3)
  --skip-qc                     Skip the read QC step
  -g, --genome-dir              Directory containing FASTA files of each genome (default: none)
  --ref                         A single reference .fna file of contigs/genomes (default: none)
  --gff                         GFF annotation of the reference sequence specified by --ref (default: none)
  -x, --genome-fasta-extension  File extension of genomes in the genome directory (default: fna)
  --kingdom                     For use in Prokka when constructing & annotating a new reference from .fna files (default: Bacteria)
  --min-read-aligned-percent    Minimum read alignment percent for CoverM filtering (scale from 0-1) (default: 0.75)
  --min-read-percent-identity   Minimum read percent identity for CoverM filtering (scale from 0-1) (default: 0.95)
  --gDNA                        Median x-fold gDNA coverage to enable gDNA contamination correction. (default: 1)
```
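For scripted or batch runs, the invocation above can be assembled programmatically. The following is a minimal sketch and not part of TranscriptM: `build_count_command` is a hypothetical helper, it uses only flags that appear in the help text, and it assumes the reference is supplied either as a genome directory (`-g`) or as a `--ref`/`--gff` pair, never both. The file names are the placeholders from the example command.

```python
import shlex
from typing import List, Optional, Sequence


def build_count_command(
    forward: str,
    reverse: str,
    output_dir: str,
    ref: Optional[str] = None,
    gff: Optional[str] = None,
    genome_dir: Optional[str] = None,
    databases: Sequence[str] = (),
    n_cores: int = 8,
    max_memory_gb: int = 64,
    dry_run: bool = False,
) -> List[str]:
    """Assemble a `transcriptm count` argument list (hypothetical helper)."""
    # Assumption: exactly one reference source is given, either a genome
    # directory (-g) or a pre-constructed --ref/--gff pair.
    has_pair = ref is not None and gff is not None
    if (ref is None) != (gff is None) or (genome_dir is not None) == has_pair:
        raise ValueError("give either genome_dir, or both ref and gff")

    cmd = ["transcriptm", "count", "-1", forward, "-2", reverse,
           "-n", str(n_cores), "-m", str(max_memory_gb), "-o", output_dir]
    if genome_dir is not None:
        cmd += ["-g", genome_dir]
    else:
        cmd += ["--ref", ref, "--gff", gff]
    if databases:
        # -db accepts one or more Bowtie2 database locations.
        cmd += ["-db", *databases]
    if dry_run:
        # --dry-run asks snakemake to check the workflow without running it.
        cmd.append("--dry-run")
    return cmd


cmd = build_count_command(
    "readsR1.fastq.gz", "readsR2.fastq.gz", "output_directory",
    ref="combinedreference.fna", gff="combinedreference.gff",
    databases=["/dir/to/bowtie21", "/dir/to/bowtie22"],
    n_cores=24, max_memory_gb=128, dry_run=True,
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it. Starting with `dry_run=True` is a cheap way to check the workflow before committing cores and memory to a full run.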

Notes

You may need to download databases to filter out contaminating reads from rRNA genes, the human genome, etc. Alternatively, you can build your own Bowtie2-formatted database. Multiple Bowtie2 databases can be specified in the transcriptm command, e.g. -db /dir/to/db1 /dir/to/db2
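Before launching a long run, it can help to confirm that every path passed to -db actually holds a Bowtie2 index. The sketch below is not part of TranscriptM: `has_bowtie2_index` and `missing_databases` are hypothetical helpers, and they assume each -db value is a directory containing the index files, as the /dir/to/db1 placeholders suggest. Bowtie2 index files end in `.bt2`, or `.bt2l` for large indexes.

```python
from pathlib import Path
from typing import Iterable, List


def has_bowtie2_index(db_dir: str) -> bool:
    """Return True if db_dir contains at least one Bowtie2 index file."""
    path = Path(db_dir)
    if not path.is_dir():
        return False
    # Bowtie2 index files end in .bt2 (small index) or .bt2l (large index).
    return any(p.suffix in {".bt2", ".bt2l"} for p in path.iterdir())


def missing_databases(db_dirs: Iterable[str]) -> List[str]:
    """List the given directories that do not look like Bowtie2 databases."""
    return [d for d in db_dirs if not has_bowtie2_index(d)]
```

Calling `missing_databases(["/dir/to/db1", "/dir/to/db2"])` returns the paths that need attention; an empty list means every directory holds at least one index file.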

Pre-made databases can be downloaded like so:
```
conda activate transcriptm

kneaddata_database --download human_genome bowtie2 /work/microbiome/db/humangrch37bowtie2

Available pre-made databases:
KneadData Databases ( database : build = location )
human_genome : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_hg37_and_human_contamination_Bowtie2_v0.1.tar.gz
human_genome : bmtagger = http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_BMTagger_v0.1.tar.gz
human_transcriptome : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_hg38_transcriptome_Bowtie2_v0.1.tar.gz
ribosomal_RNA : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/SILVA_128_LSUParc_SSUParc_ribosomal_RNA_v0.2.tar.gz
mouse_C57BL : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/mouse_C57BL_6NJ_Bowtie2_v0.1.tar.gz
```
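The database listing above follows the pattern `database : build = location`. If a script needs to pick one entry from such a listing, the pattern can be parsed directly. This is a minimal sketch: `parse_database_listing` is a hypothetical helper, and the `demo_db` names and example.org URLs in the sample are placeholders, not real databases.

```python
from typing import Dict, Tuple


def parse_database_listing(text: str) -> Dict[Tuple[str, str], str]:
    """Map (database, build) pairs to download locations."""
    entries: Dict[Tuple[str, str], str] = {}
    for line in text.splitlines():
        # Entry lines look like "name : build = location".
        left, sep, location = line.partition(" = ")
        if not sep:
            continue
        name, sep, build = left.partition(" : ")
        # The header line also matches the pattern, so only keep
        # lines whose location looks like a URL.
        if not sep or "://" not in location:
            continue
        entries[(name.strip(), build.strip())] = location.strip()
    return entries


# Placeholder listing in the same layout as the output shown above.
listing = """KneadData Databases ( database : build = location )
demo_db : bowtie2 = http://example.org/demo_db_bowtie2.tar.gz
demo_db : bmtagger = http://example.org/demo_db_bmtagger.tar.gz
"""
urls = parse_database_listing(listing)
print(urls[("demo_db", "bowtie2")])
```

Keying on the (database, build) pair matters because the same database name can appear with more than one build, as the human genome does in the listing.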

Owner

Login: sternp
Kind: user

Repositories: 1
Profile: https://github.com/sternp

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sternes
    given-names: Peter R.
    orcid: https://orcid.org/0000-0002-4456-150X
  - family-names: Tyson
    given-names: Gene W.
    orcid: https://orcid.org/0000-0001-8559-9427
title: "TranscriptM: A metatranscriptome bioinformatics pipeline including metagenome contamination correction"
version: 0.3.1
doi: 10.5281/zenodo.11090118
date-released: 2024-04-30

GitHub Events

Total

Release event: 1
Push event: 1
Create event: 1

Last Year

Release event: 1
Push event: 1
Create event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
