https://github.com/broadinstitute/vectreeid

Amplicon taxonomic identification pipeline for Neafsey Lab

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.1%) to scientific vocabulary

Keywords

phylogenetics python taxonomy

Last synced: 7 months ago

Repository


Basic Info

Host: GitHub
Owner: broadinstitute
Language: Python
Default Branch: main
Homepage:
Size: 26.8 MB

Statistics

Stars: 1
Watchers: 4
Forks: 0
Open Issues: 0
Releases: 0

Topics

phylogenetics python taxonomy

Created almost 5 years ago · Last pushed about 1 year ago

Metadata Files

Readme

VecTreeID : Taxonomic Identification Pipeline

Created for Neafsey Lab @ Harvard School of Public Health

Maintained by Genomic Center for Infectious Diseases @ Broad Institute of MIT & Harvard

Contact: Jason Travis Mohabir (jmohabir@broadinstitute.org)

Public repository for amplicon taxonomic identification in the Neafsey Lab.

See the README in each corresponding pipeline for usage.

Installation

Install Anaconda3

The online documentation for installing Anaconda3 is here: https://docs.anaconda.com/anaconda/install/linux/

Follow the installation instructions specific to your operating system.

Create conda environment for running the tool

Use the TaxonomyAssignmentPipeline.yml file to create a conda virtual environment:

conda env create --file TaxonomyAssignmentPipeline.yml -p /path/to/env/<name-of-environment>/

To activate the conda environment:

source activate <name-of-environment>

A detailed description of creating a conda environment from a .yml file is given here: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file

Important Note

Source: https://github.com/etetoolkit/ete/pull/636

The version of ete3 used is unable to parse the jplace output from EPA-ng. Users will need to manually update the `newick.py` file in the conda environment after the environment has been created.

Replace the /lib/python3.10/site-packages/ete3/parser/newick.py file inside the conda environment with the `newick.py` file provided.

The development team is working on this issue.
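The manual replacement described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper name `patch_newick` is hypothetical and not part of the repository, it assumes the script is run with the Python interpreter of the activated conda environment, and it locates site-packages through the standard-library `sysconfig` module rather than hard-coding the python3.10 path.

```python
import shutil
import sysconfig
from pathlib import Path


def patch_newick(replacement, site_packages=None):
    """Overwrite ete3's newick.py with the provided file, keeping a backup.

    `patch_newick` is a hypothetical helper, not part of VecTreeID.
    """
    # Default to the site-packages directory of the running interpreter,
    # which is the conda environment's if that environment is active.
    site_packages = Path(site_packages or sysconfig.get_paths()["purelib"])
    target = site_packages / "ete3" / "parser" / "newick.py"
    if not target.is_file():
        raise FileNotFoundError(f"ete3 parser not found at {target}")

    # Keep the original file once, so that the patch can be undone.
    backup = target.with_name(target.name + ".orig")
    if not backup.exists():
        shutil.copy2(target, backup)

    shutil.copy2(replacement, target)
    return target
```

With the environment active, calling `patch_newick("newick.py")` from the directory that holds the provided `newick.py` would replace the installed copy and leave the original next to it as `newick.py.orig`.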

Arguments

```


[Created on π Day 2023]
[Authors: Jason Travis Mohabir, Aina Zurita Martinez]
[Created for Neafsey Lab @ Harvard School of Public Health]
[Maintained by Genomic Center for Infectious Diseases @ Broad Institute of MIT & Harvard]

usage: VecTreeID: Taxonomy Assignment Pipeline for VectorSeq [-h] [--name NAME] --amplicon AMPLICON
       [--dada2directory DADA2DIRECTORY] [--workingdirectory WORKINGDIRECTORY]
       [--minasvreadcount MINASVREADCOUNT] [--minsamplereadcount MINSAMPLEREADCOUNT]
       [--maxtargetseq MAXTARGETSEQ] [--artefactcutoff ARTEFACTCUTOFF]
       [--mincoverage MINCOVERAGE] [--minidentity MINIDENTITY] [--lwrcutoff LWRCUTOFF]
       [--maxhaplotypespersample MAXHAPLOTYPESPERSAMPLE]
       [--minabundanceassignment MINABUNDANCEASSIGNMENT] [--tempdir TEMPDIR]
       [--referencetree REFERENCETREE] [--referencemsa REFERENCEMSA]
       [--referencedatabase REFERENCEDATABASE] [--blastonly] [--runblast] [--runmsa]
       [--runtree]

options:
  -h, --help            show this help message and exit
  --name NAME           name of batch
  --amplicon AMPLICON   amplicon name
  --dada2directory DADA2DIRECTORY
                        DADA2 directory with inputs
  --workingdirectory WORKINGDIRECTORY
                        working directory
  --minasvreadcount MINASVREADCOUNT
                        asv total read count threshold
  --minsamplereadcount MINSAMPLEREADCOUNT
                        sample total read count threshold
  --maxtargetseq MAXTARGETSEQ
                        blastn maxtargetseq
  --artefactcutoff ARTEFACTCUTOFF
                        artefact filter (coverage & identity)
  --mincoverage MINCOVERAGE
                        percent coverage filter
  --minidentity MINIDENTITY
                        percent identity filter
  --lwrcutoff LWRCUTOFF
                        Like Weight Ratio cutoff
  --maxhaplotypespersample MAXHAPLOTYPESPERSAMPLE
                        maximum number of ASVs for batch-level
  --minabundanceassignment MINABUNDANCEASSIGNMENT
                        minimum ASV read count abundance
  --tempdir TEMPDIR     temporary directory
  --referencetree REFERENCETREE
                        reference tree
  --referencemsa REFERENCEMSA
                        reference msa
  --referencedatabase REFERENCEDATABASE
                        reference BLAST database
  --blastonly           only run blastn
  --runblast            run blast
  --runmsa              run msa
  --runtree             run tree
```
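To make the shape of this interface concrete, the sketch below declares an `argparse` parser with the same flags and help strings. It is not the pipeline's actual source: the flag names, help text, and the fact that only `--amplicon` is required are taken from the help output above, while the argument types (left as plain strings), the absence of defaults, the option ordering, and the amplicon name `COI` in the example call are assumptions made for illustration.

```python
import argparse

# (flag, help text) pairs copied from the help output above.
# Types and defaults are not shown there, so every value stays a string.
VALUE_OPTIONS = [
    ("--name", "name of batch"),
    ("--dada2directory", "DADA2 directory with inputs"),
    ("--workingdirectory", "working directory"),
    ("--minasvreadcount", "asv total read count threshold"),
    ("--minsamplereadcount", "sample total read count threshold"),
    ("--maxtargetseq", "blastn maxtargetseq"),
    ("--artefactcutoff", "artefact filter (coverage & identity)"),
    ("--mincoverage", "percent coverage filter"),
    ("--minidentity", "percent identity filter"),
    ("--lwrcutoff", "Like Weight Ratio cutoff"),
    ("--maxhaplotypespersample", "maximum number of ASVs for batch-level"),
    ("--minabundanceassignment", "minimum ASV read count abundance"),
    ("--tempdir", "temporary directory"),
    ("--referencetree", "reference tree"),
    ("--referencemsa", "reference msa"),
    ("--referencedatabase", "reference BLAST database"),
]

# Flags that take no value and act as on/off switches.
SWITCHES = [
    ("--blastonly", "only run blastn"),
    ("--runblast", "run blast"),
    ("--runmsa", "run msa"),
    ("--runtree", "run tree"),
]


def build_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="VecTreeID: Taxonomy Assignment Pipeline for VectorSeq"
    )
    # --amplicon is the only flag shown without brackets in the usage line.
    parser.add_argument("--amplicon", required=True, help="amplicon name")
    for flag, text in VALUE_OPTIONS:
        parser.add_argument(flag, help=text)
    for flag, text in SWITCHES:
        parser.add_argument(flag, action="store_true", help=text)
    return parser


# "COI" is a hypothetical amplicon name used only for this example.
args = build_parser().parse_args(["--amplicon", "COI", "--runblast"])
print(args.amplicon, args.runblast, args.blastonly)  # COI True False
```

Omitting `--amplicon` makes the parser exit with an error, which matches the usage line where it is the only unbracketed option.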

Owner

Name: Broad Institute
Login: broadinstitute
Kind: organization
Location: Cambridge, MA

Website: http://www.broadinstitute.org/
Twitter: broadinstitute
Repositories: 1,083
Profile: https://github.com/broadinstitute

Broad Institute of MIT and Harvard

GitHub Events

Total

Push event: 3

Last Year

Push event: 3

Committers

Last synced: 11 months ago

All Time

Total Commits: 42
Total Committers: 2
Avg Commits per committer: 21.0
Development Distribution Score (DDS): 0.405

Past Year

Commits: 11
Committers: 1
Avg Commits per committer: 11.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Jason Travis Mohabir	j**r@b**g	25
amzurita	a**3@g**m	17
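
The Development Distribution Score values listed above are consistent with the commit counts in this table if DDS is taken to be one minus the top committer's share of all commits. That definition is an assumption here, since the page does not state the formula; the sketch below only checks that it reproduces both listed values.

```python
def development_distribution_score(commits_per_committer):
    """Return 1 - (top committer's commits / total commits).

    Assumed definition; the page itself does not give the formula.
    """
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


# All time: 25 and 17 commits (42 in total) from the table above.
all_time = development_distribution_score([25, 17])
# Past year: a single committer with 11 commits.
past_year = development_distribution_score([11])
print(round(all_time, 3), round(past_year, 3))  # 0.405 0.0
```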

Committer Domains (Top 20 + Academic)

broadinstitute.org: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 1 minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
