https://github.com/biomeds/metabarcoding_pipeline

Repository

Basic Info
  • Host: GitHub
  • Owner: BioMeDS
  • License: mit
  • Default Branch: main
  • Size: 85 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of chiras/metabarcoding_pipeline
Created about 2 years ago · Last pushed about 2 years ago

# Metabarcoding processing pipeline
by Alexander Keller (LMU Munich)

A simple script to process metabarcoding data (e.g. 16S V4), with amplicons generated following:
* 16S: Kozich et al. 2013 AEM
* ITS2: Sickel et al. 2015 BMC Ecology

If you use this script, please kindly cite this article: https://doi.org/10.1098/rstb.2021.0171

# Dependencies
* VSEARCH https://github.com/torognes/vsearch
* SeqFilter https://github.com/BioInf-Wuerzburg/SeqFilter
* (The USEARCH python scripts are deprecated and a workaround is now integrated: https://drive5.com/python/ )
* Also check the _DBs folder for databases

# What will the script do?

* Un-gzipping files
* Individual sample preparation
  * Merging forward and reverse reads
  * Quality filtering
  * Backup Option: Forward read only use in case of bad quality reverse reads
* Community level processing
  * Dereplication
  * Denoising
  * ASV generation
  * Chimera (de novo) removal
  * Taxonomic classification
    - allows for multiple reference databases (iterative) with decreasing priority
    - all unclassified reads are hierarchically classified
  * Creation of a community table
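
The steps listed above map roughly onto standard VSEARCH subcommands. The following is a minimal sketch under stated assumptions, not the commands used by the script itself: the file names, the thresholds (`--fastq_maxee 1.0`, `--id 0.97`, `--sintax_cutoff 0.8`), the `DRY_RUN` switch, and the helper functions `run`, `process_sample`, and `process_community` are all hypothetical. The iterative classification against multiple databases and the forward-read-only backup option are not reproduced here.

```sh
# Illustrative sketch only: NOT the commands of the pipeline script.
# File names, thresholds, and the DRY_RUN switch are assumptions.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = also execute them

# Print a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Per-sample preparation: merge read pairs, then quality-filter.
process_sample() {
  sample="$1"
  run vsearch --fastq_mergepairs "${sample}_R1.fastq" \
    --reverse "${sample}_R2.fastq" \
    --fastqout "${sample}.merged.fastq"
  run vsearch --fastq_filter "${sample}.merged.fastq" \
    --fastq_maxee 1.0 \
    --relabel "${sample}." \
    --fastaout "${sample}.filtered.fasta"
}

# Community-level processing on the pooled, filtered reads of all samples.
process_community() {
  pooled="$1"   # pooled filtered reads (FASTA)
  db="$2"       # reference database (assumed to be in SINTAX format)
  # Dereplication
  run vsearch --derep_fulllength "$pooled" --sizeout --output derep.fasta
  # Denoising / ASV generation
  run vsearch --cluster_unoise derep.fasta --sizein --sizeout \
    --centroids denoised.fasta
  # De novo chimera removal
  run vsearch --uchime3_denovo denoised.fasta --nonchimeras asvs.fasta
  # Hierarchical taxonomic classification
  run vsearch --sintax asvs.fasta --db "$db" --sintax_cutoff 0.8 \
    --tabbedout taxonomy.tsv
  # Community table: map pooled reads back to the ASVs
  run vsearch --usearch_global "$pooled" --db asvs.fasta --id 0.97 \
    --otutabout community_table.tsv
}
```

With `DRY_RUN=1` (the default in this sketch), calling `process_sample S1` only prints the two commands it would run, which makes it easy to inspect the order of steps without VSEARCH installed.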

# Usage:
1) Put all your raw sequencing files (```.fastq``` or ```.fastq.gz```) into a subfolder of the directory that contains this script (do not use full paths).
2) Copy a ```config.txt``` from the resources folder, adapt it to your needs, and place it in your data folder. Consider checking the paths to the binaries in the script file.
3) You also need to add a ```config.txt``` file, in which information about the databases is stored. An example is in the example directory.
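
The preparation steps above could be scripted along the following lines. This is a sketch only: the helper `prepare_data_folder`, its arguments, and the assumption that the template lives at `resources/config.txt` inside the pipeline directory are illustrative and not part of the repository.

```sh
# Hypothetical helper, not part of the pipeline: collects raw reads and a
# config template into a data subfolder next to the processing script.
prepare_data_folder() {
  pipeline_dir="$1"   # directory containing the processing script
  raw_dir="$2"        # where the raw .fastq / .fastq.gz files currently are
  data_name="$3"      # name of the new data subfolder (assumption)
  data_dir="${pipeline_dir}/${data_name}"

  mkdir -p "$data_dir" || return 1

  # Copy raw reads; unmatched glob patterns are skipped.
  found=0
  for f in "$raw_dir"/*.fastq "$raw_dir"/*.fastq.gz; do
    [ -e "$f" ] || continue
    cp "$f" "$data_dir"/ || return 1
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no .fastq or .fastq.gz files found in $raw_dir" >&2
    return 1
  fi

  # Copy the config template only if none exists yet, so that an
  # already adapted config.txt is never overwritten.
  if [ ! -e "$data_dir/config.txt" ]; then
    cp "$pipeline_dir/resources/config.txt" "$data_dir/config.txt" || return 1
  fi

  echo "$data_dir"
}
```

For example, `prepare_data_folder . ~/sequencing/run1 run1` would create `./run1`, copy the reads into it, and place a `config.txt` there that still needs to be adapted by hand.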

Then you are ready to run:
```sh
bash _processing_MB_0.2a.sh 
```

Results will be in a new subfolder of your current directory called ```.```

If the analysis needs to be reverted, run the following script, which removes files and restores the folder structure to its original state:

```sh
bash _revert_analysis_1.sh 
```

# Import into R
In the ```.``` folder, there will be an R script for data import and basic ecological analyses.

Owner

  • Name: BioMedical Data Science
  • Login: BioMeDS
  • Kind: organization

Group at the Center for Computational and Theoretical Biology, University of Würzburg
