metpipe

A metagenomic pipeline from proprocessing to annotations and analysis for metagenomic Reads

https://github.com/psikon/metpipe

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (13.9%) to scientific vocabulary

Last synced: 10 months ago · JSON representation ·

Repository

A metagenomic pipeline from proprocessing to annotations and analysis for metagenomic Reads

Basic Info

Host: GitHub
Owner: psikon
Language: Shell
Default Branch: master
Homepage:
Size: 15.1 MB

Statistics

Stars: 0
Watchers: 3
Forks: 1
Open Issues: 0
Releases: 0

Created over 13 years ago · Last pushed about 12 years ago

Metadata Files

Readme Citation

metPipe

The metagenomic pipeline metPipe bundles a number of bioinformatical tools for analyising metagenomic short read datasets from RAW data to complete taxonomically annotated data.

1. Requirements

a) Hardware

The metPipe Pipeline was designed to run on s standard 64-bit Linux computer. For the analysis of short datasets and tutorial purposes (without a run of MetaCV) a minimum of 8 GB RAM is required. To analyse larger datasets and include a run of MetaCV 32 GB RAM and a multiple CPU-Cores are recommended. For the installation of the pipeline and the required databases a disk space 77 GB will be used.

b) Software

The Pipeline was developed with Python 2.7 and R 2.15 for standard Linux 64-bit workstations. Before running the install script please check the following dependencies:

Python >= 2.6
R >= 2.15 (for R >= 3.0 useDevel() - Parameter for Bioconductor - biocLite must be assigned; see: http://www.bioconductor.org/developers/useDevel/)
Java Runtime Environment (z.B. open-jre)
gcc >= 4.8.0
git
libboost-dev
libboost-regex-dev
libxerces-c-dev
libsqlite3-dev

2.) Installation

After downloading the software from https://github.com/psikon/metpipe, unpack the files and run the installation script with following command :

bash ./installer.sh

All external dependencies will be downloaded and installed in a local folder. The installation process may take some minutes or hours, depending on the connection speed and the databases.

3.) Usage:

``` usage: metpipe.py [-h] [--version] [-v] [-t THREADS] [-p PARAM] [-s {preprocessing,assembly,annotation,analysis}] [-o OUTPUT] [-a {metavelvet,flash,both}] [-c {metacv,blastn,both}] [--use_contigs] [--notrimming] [--noquality] [--noreport] [--merge] input [input ...]

positional arguments: input single or paired input files in format

optional arguments: -h, --help show this help message and exit --version show program's version number and exit -v more detailed output (default = False) -t THREADS number of threads to use (default = 7) -p PARAM use alternative config file (default = parameter.conf) -s {preprocessing,assembly,annotation,analysis} skip steps in the pipeline (default = None) -o OUTPUT use alternative output folder -a {metavelvet,flash,both} assembling program to use (default = MetaVelvet) -c {metacv,blastn,both} classifier to use for annotation (default = both) --use_contigs should MetaCV use assembled Reads or RAW Reads (default = RAW --notrimming trim and filter input reads? (default = True) --noquality create no quality report (default = True) --noreport create no pie chart with the annotated taxonomical data (default = True) --merge merge concatinated reads with not concatinated (default = False) ```

All step specific settings can be found in the parameter.conf file in the root dir of this program.

4. Quickstart

For testing/tutorial purporses a little test dataset is included in the root folder. This data include 15.000 paired-end MiSeq reads with a length of 250bp.

run the pipeline with the following command:

bash ./metpipe.sh -t 7 -p parameter.conf -o ../example -a flash -c both sequences/forward.fastq sequences/reverse/fastq

After processing you will get 5 files in the analysis folder:

- blastn.db       - blast XML results parsed in SQL Lite DB
- annotated.db    - taxonomical annotated SQL Lite DB
- bacteria.db     - seperated bacteria from SQL Lite DB
- eukaryota.db    - seperated eukaryota from SQL Lite DB
- metpipe.html    - interactic HTML5 piechart of taxonomies

5. Contact

If you encounter a problem/bug, please first check the wiki page: https://github.com/psikon/metpipe/wiki and the known issues pages: https://github.com/psikon/metpipe/issues to see if it has already been documented.

If not, please report the issue either using the contact information below or by submitting a new issue online. Please include information on your run, and every log file produced by your run.

Philipp Sehnert: philipp.sehnert@gmail.com

6. Citing

Owner

Name: Philipp Hennersdorf
Login: psikon
Kind: user
Location: Bremen

Repositories: 11
Profile: https://github.com/psikon

Citation (citations.txt)

[Preprocessing]


[Concatination of the Reads]

Stitch, Austin G. Davis-Richardson, https://github.com/audy/stitch 

[Assembler]

ABySS: A parallel assembler for short read sequence data. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. Genome Research, 2009-June. (Genome Research, PubMed)

Namiki T*, Hachiya T*, Tanaka H, Sakakibara Y. (2012) MetaVelvet : An extension of Velvet assembler to de novo metagenome assembly from short sequence reads, Nucleic Acids Res., in press. (Summplementary figures and tables)

Velvet: algorithms for de novo short read assembly using de Bruijn graphs. D.R. Zerbino and E. Birney. Genome Research 18:821-829. 

[Annotate]

[Misc]

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science