unassigner

Type strain identification for 16S reads

https://github.com/PennChopMicrobiomeProgram/unassigner

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Type strain identification for 16S reads

Basic Info
  • Host: GitHub
  • Owner: PennChopMicrobiomeProgram
  • Language: Python
  • Default Branch: master
  • Size: 342 KB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 5
  • Open Issues: 8
  • Releases: 11
Created over 11 years ago · Last pushed 11 months ago
Metadata Files
Readme

README.md

Unassigner


Evaluate consistency with named bacterial species for short 16S rRNA marker gene sequences.

Summary

The 16S rRNA gene is found in all bacteria, and its gene sequence is highly conserved. Amplifying and sequencing bacterial 16S rRNA genes is a common way to survey bacterial communities in microbiome research. However, high-throughput instruments are unable to sequence the entire gene, so a short region of the gene is selected for amplification and sequencing.

The resultant sequences, spanning part of the 16S gene, can be used to identify the types of bacteria present in a specimen. For example, one sequence might be assigned to the Streptococcus genus based on sequence similarity. Many programs are available to carry out such taxonomic assignment.

It is generally thought that the 16S rRNA gene is not suitable for assignment of bacterial species. We agree, but with a catch: the gene sequence is suitable for ruling out assignment to many bacterial species. This software is designed to rule out all the species designations that are inconsistent with a partial 16S rRNA gene sequence. For those species that are not definitively ruled out, we assign a probability that the sequence is inconsistent with the species.

Because the software is geared towards ruling out species rather than deciding on the best assignment, we call it the unassigner. It's a cheesy joke, but we've decided to roll with it.

The unassigner library provides a command-line program, unassign, that takes a FASTA file of DNA sequences from a 16S gene region and gives, for each sequence, the probability that it is inconsistent with nearby bacterial species.

Installation

Install with conda using:

    conda create --name unassigner -c conda-forge -c bioconda unassigner

Or run with Docker using:

    docker run --rm -it ctbushman/unassigner:latest unassign --help

Alternative Installation

Unassigner can be installed using pip:

    pip install unassigner

A pip installation requires vsearch to be installed separately. Unassigner can also be installed from GitHub:

    git clone https://github.com/PennChopMicrobiomeProgram/unassigner.git
    pip install unassigner/

Usage

The unassign program requires one argument, a FASTA-formatted file of short 16S sequences:

    unassign my_sequences.fasta

If the program has not been run before, it will automatically download the bacterial species data it needs, format its reference files, create an output directory named my_sequences_unassigned, and write a table of results there, along with some auxiliary output files. Note that the output directory will be in the same directory as my_sequences.fasta.
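As a rough illustration of the naming rule described above, the following sketch computes where the output directory would be expected for a given input file. The helper name `expected_output_dir` is hypothetical and not part of unassigner, and the way the real program treats unusual extensions (for example `.fa.gz`) is an assumption here; the sketch simply drops the final extension.

```python
from pathlib import Path


def expected_output_dir(fasta_path):
    """Guess the output directory for an input FASTA file.

    Hypothetical helper, not part of unassigner: it only mirrors the
    README's description (input name without extension + "_unassigned",
    placed next to the input file).
    """
    path = Path(fasta_path)
    # Assumption: only the final extension is removed.
    return path.with_name(path.stem + "_unassigned")


print(expected_output_dir("data/my_sequences.fasta"))
```

A wrapper script could use a helper like this to check for existing results before launching a run.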

Please see the output of unassign --help for a list of the available options.

Trim ragged

The trimragged program takes a primer sequence to search for and trim, plus an input FASTA file (or it can read from stdin):

    trimragged AGAGTTTGATCCTGGCTCAG --input_file my_sequences.fasta

Trimragged is included to extract different regions from the full-length 16S rRNA gene. It is designed to handle full-length 16S rRNA sequences in which only part of the primer is present, which can happen when quality is low at the beginning or end of a sequence because of sequencing platform limitations.

The software operates in three steps: 1) matching the full length of the primer, 2) matching a partial primer, and 3) aligning reads to other sequences with a known primer location. The sequence of the primer to search for and trim is required. Only one primer is accepted at a time, so to trim both primers the user needs to run the software twice, once with each primer sequence.

Step 1: The software first searches for the full length of the primer sequence. If mismatches are allowed, the software expands all possible mutated versions of the primer sequence into a list and searches for each one. Once a hit is found, the start and end indices are stored as a PrimerMatch object.
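The expansion described in Step 1 can be sketched as follows. This is a minimal illustration, not the trimragged implementation: the function names are hypothetical, only plain A/C/G/T substitutions are considered (no IUPAC ambiguity codes, insertions, or deletions), and a plain tuple stands in for the PrimerMatch object.

```python
from itertools import combinations, product

BASES = "ACGT"


def expand_mismatches(primer, max_mismatches):
    """Return every sequence within max_mismatches substitutions of primer."""
    variants = {primer}
    for n in range(1, max_mismatches + 1):
        # Choose which positions to mutate, then which bases to put there.
        for positions in combinations(range(len(primer)), n):
            for replacement in product(BASES, repeat=n):
                seq = list(primer)
                for pos, base in zip(positions, replacement):
                    seq[pos] = base
                variants.add("".join(seq))
    return variants


def find_full_primer(read, primer, max_mismatches=0):
    """Return (start, end) of the leftmost primer variant in read, or None."""
    best = None
    for variant in expand_mismatches(primer, max_mismatches):
        start = read.find(variant)
        if start != -1 and (best is None or start < best[0]):
            best = (start, start + len(variant))
    return best
```

The number of variants grows quickly with primer length and the number of allowed mismatches, so exhaustive expansion of this kind is only practical for small mismatch counts.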

Step 2: If the minpartial argument is greater than 0, the software then searches for partial matches of the primer in the remaining sequences. It builds a list of partial primers by removing nucleotides from the beginning of the primer sequence, one at a time, until the minimum length specified by minpartial is reached. The software then searches for each of these partial primer sequences. Once a hit is found, the start and end indices are stored as a PrimerMatch object.
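A small sketch of the partial-primer list from Step 2 is shown below. The function names are hypothetical, and the choice to test only the start of the read (where a truncated primer would be expected) and to try the longest partial primer first are assumptions of this sketch rather than documented behavior.

```python
def partial_primers(primer, min_partial):
    """List primer suffixes from longest to shortest, down to min_partial bases."""
    if min_partial <= 0:
        return []
    # Dropping i leading bases leaves a suffix of length len(primer) - i.
    return [primer[i:] for i in range(1, len(primer) - min_partial + 1)]


def find_partial_primer(read, primer, min_partial):
    """Return (start, end) of a partial primer at the read start, or None."""
    for partial in partial_primers(primer, min_partial):
        # Assumption: a truncated primer sits at the very start of the read.
        if read.startswith(partial):
            return (0, len(partial))
    return None
```

For example, with the primer `ACGTAC` and a minimum partial length of 4, the candidates are `CGTAC` and `GTAC`, so a read beginning with `GTAC` would be matched over its first four bases.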

Step 3: The last part of the software relies on building a database of the sequences whose primer locations were identified in the previous two steps. The rest of the reads are then aligned against this database of sequences with known primer locations using vsearch. Once a hit is found, the positions of the primers are estimated by extending the aligned region.
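One plausible reading of "extending the aligned region" in Step 3 is to project the known primer coordinates from the reference sequence onto the query read using the offset between the two alignment start positions. The sketch below illustrates that arithmetic only; it assumes an ungapped alignment, the function and parameter names are hypothetical, and it does not call vsearch or reproduce the actual trimragged logic.

```python
def estimate_primer_span(ref_primer, ref_aln_start, query_aln_start, query_len):
    """Project a reference primer span (start, end) onto the query read.

    Assumes an ungapped alignment, so reference and query coordinates
    differ by a constant offset. Positions that fall outside the query
    are clamped to its boundaries.
    """
    offset = query_aln_start - ref_aln_start
    start = ref_primer[0] + offset
    end = ref_primer[1] + offset
    start = max(0, min(start, query_len))
    end = max(0, min(end, query_len))
    return (start, end)


# Reference primer at 0..20; the alignment begins at reference position 25
# and query position 10, so only the last 5 primer bases fall in the query.
print(estimate_primer_span((0, 20), 25, 10, 100))
```

An alignment with gaps would need the projection to walk the alignment columns instead of applying a single offset.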

Please see the output of trimragged --help for a list of the available options.

Contributing

We welcome ideas from our users about how to improve this software. Please open an issue if you have a question or would like to suggest a feature.

Owner

  • Name: PennChopMicrobiomeProgram
  • Login: PennChopMicrobiomeProgram
  • Kind: organization
  • Location: United States of America

GitHub Events

Total
  • Create event: 10
  • Release event: 1
  • Issues event: 8
  • Watch event: 1
  • Delete event: 6
  • Issue comment event: 3
  • Push event: 20
  • Pull request review event: 3
  • Pull request review comment event: 5
  • Pull request event: 10
  • Fork event: 1
Last Year
  • Create event: 10
  • Release event: 1
  • Issues event: 8
  • Watch event: 1
  • Delete event: 6
  • Issue comment event: 3
  • Push event: 20
  • Pull request review event: 3
  • Pull request review comment event: 5
  • Pull request event: 10
  • Fork event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 11
  • Total pull requests: 24
  • Average time to close issues: almost 2 years
  • Average time to close pull requests: 14 days
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 0.27
  • Average comments per pull request: 0.21
  • Merged pull requests: 20
  • Bot issues: 0
  • Bot pull requests: 8
Past Year
  • Issues: 6
  • Pull requests: 21
  • Average time to close issues: 23 days
  • Average time to close pull requests: 9 days
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.19
  • Merged pull requests: 17
  • Bot issues: 0
  • Bot pull requests: 8
Top Authors
Issue Authors
  • Ulthran (9)
  • cchehoud (1)
  • eclarke (1)
Pull Request Authors
  • Ulthran (16)
  • dependabot[bot] (8)
Top Labels
Issue Labels
enhancement (6) bug (1)
Pull Request Labels
dependencies (8) github_actions (8) codex (3)

Dependencies

requirements.txt pypi
  • biopython *
  • scipy *
setup.py pypi
  • biopython *
  • scipy *
.github/workflows/codecov.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v3 composite
  • s-weigand/setup-conda v1.1.0 composite
.github/workflows/linter.yml actions
  • actions/checkout v3 composite
  • github/super-linter v4 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
.github/workflows/tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • s-weigand/setup-conda v1.1.0 composite