chloroExtractor

chloroExtractor: extraction and assembly of the chloroplast genome from whole genome shotgun data - Published in JOSS (2018)

https://github.com/chloroextractorteam/chloroextractor

Repository

Basic Info

Host: GitHub
Owner: chloroExtractorTeam
License: mit
Language: Perl
Default Branch: master
Size: 31.1 MB

Statistics

Stars: 4
Watchers: 7
Forks: 8
Open Issues: 17
Releases: 12

Created over 9 years ago · Last pushed almost 7 years ago

Metadata Files

Readme Contributing License Code of conduct Codemeta

chloroExtractor

News

chloroExtractor was published in the Journal of Open Source Software (https://doi.org/10.21105/joss.00464).

Introduction

The chloroExtractor is a Perl-based program which provides a pipeline for extracting chloroplast DNA from whole genome plant data. Excessive amounts of chloroplast DNA can cause problems for the assembly of whole genome data. One solution to this problem is a nuclei extraction before sequencing, but this can be expensive. The chloroExtractor takes your whole genome data and extracts the chloroplast DNA, so the different kinds of DNA can be separated easily. Furthermore, the chloroExtractor tries to assemble the extracted chloroplast DNA. This is possible because of the conserved nature of the chloroplast's primary and secondary structure. Through k-mer filtering, the sequences originating from the chloroplast are extracted and can then be used to assemble the chloroplast in a guided assembly with several other chloroplasts.
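The k-mer idea can be illustrated with a toy sketch. It is not the pipeline's implementation; it only shows how sequence that is present in many copies stands out through high k-mer counts. The function name `abundant_kmers`, the k-mer length of 4 and the threshold of 2 are assumptions chosen for readability.

```shell
# Count all k-mers of length K in the sequences read from stdin (one per line)
# and print "kmer count" for every k-mer seen at least MIN times.
# Usage: abundant_kmers K MIN
abundant_kmers() {
    awk -v k="$1" -v min="$2" '
        {
            # slide a window of width k over the current sequence
            for (i = 1; i + k - 1 <= length($0); i++)
                count[substr($0, i, k)]++
        }
        END {
            for (kmer in count)
                if (count[kmer] >= min)
                    print kmer, count[kmer]
        }' | sort
}

# Three toy "reads"; the first two are identical, mimicking a high-copy sequence.
printf 'ACGTAC\nACGTAC\nTTGGCA\n' | abundant_kmers 4 2
```

Only the k-mers of the duplicated read pass the threshold; the k-mers of the single-copy read are dropped.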

Requirements

The version numbers given in parentheses are tested and known to work in this combination. If you do a local install you can try to use other versions of some programs or modules but they are not guaranteed to work. The docker container we provide will always contain a working combination of programs and modules.

Required Software

Jellyfish (2.2.4)
Spades (v3.10.1)
bowtie2 (2.2.6)
NCBI-Blast+ (2.2.31+)
Samtools (0.1.19-96b5f2294a)
Bedtools (v2.25.0)
GNU R (3.2.3)
Ghostscript (9.18)
Python (2.7.12)
Perl (v5.22.1)

Required Perl modules

Moose (2.1604)
Log::Log4Perl (1.44)
Term::ProgressBar (2.17)
Graph (0.96)
IPC::Run (0.94)
File::Which (1.19)
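Before a local install it can help to see which of the required programs are already available. The following is a minimal sketch; the executable names (for example `spades.py`, `blastn`, `Rscript`, `gs`) are assumptions about how the tools listed above are usually installed and may differ on your system. It does not check version numbers or Perl modules.

```shell
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; adjust them to your installation.
REQUIRED_TOOLS="jellyfish spades.py bowtie2 blastn samtools bedtools Rscript gs python perl"

# Word splitting of the unquoted variable is intended here.
missing=$(missing_tools $REQUIRED_TOOLS)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All required tools were found on PATH."
fi
```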

Installation

Install the requirements, then clone the repository recursively:

```shell
git clone --recursive https://github.com/chloroExtractorTeam/chloroExtractor
```

Docker

Our chloroExtractor is also available as a docker image.

The docker image for the current release is available on Docker Hub as chloroextractorteam/chloroextractor.

We also provide rolling-release docker images for our master and develop branches.

Running chloroExtractor using that image requires an installation of docker and the permission to execute docker commands. Additionally, the docker container needs to be able to allocate enough memory (5 GB are sufficient for the demo dataset). On Ubuntu, memory for docker is usually not limited, but on Mac OS X it is, so the memory limit may need to be increased in the docker settings. The data are mapped into the container as a volume under /data. Our chloroExtractor will run with /data as its working directory. Therefore, the output files will be stored inside the directory which was mapped into the container. If you are not using a user mapping, chloroExtractor will run with root privileges and all created files will belong to the root user. For further information about docker and its security implications, please visit the docker website.

```shell
docker pull chloroextractorteam/chloroextractor
docker run -v /location-of-input-data:/data --rm chloroextractorteam/chloroextractor -1 first_read.fq -2 second_read.fq [other options]
```
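To avoid output files owned by root, a user mapping can be passed to docker. The sketch below only builds and prints the command line instead of running it, so it can be inspected first. It relies on docker's standard `--user` option; the helper name `ptx_docker_cmd` is hypothetical, and whether the image runs without problems under a non-root user is an assumption that has not been verified here.

```shell
# Print (but do not run) a docker command line that maps the current user
# into the container. Arguments: data folder, then the options passed to ptx.
ptx_docker_cmd() {
    datadir=$1
    shift
    printf '%s\n' "docker run --user $(id -u):$(id -g) -v ${datadir}:/data --rm chloroextractorteam/chloroextractor $*"
}

# Show the command for the placeholder paths used above.
ptx_docker_cmd /location-of-input-data -1 first_read.fq -2 second_read.fq
```

With the user mapping in place, files written to /data should belong to the calling user rather than to root.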

Usage

To use the chloroExtractor, run the ptx executable in the bin/ folder:

```shell
./ptx --help
```

or use the docker container:

```shell
docker run -v /location-of-input-data:/data --rm chloroextractorteam/chloroextractor --help
```

It returns a list of all mandatory parameters and optional settings.

```shell
$ ./ptx [OPTIONS] -1 FQ1 -2 FQ2 -d DIR

Options:
-1|--reads
    Input reads file, first of pair.

-2|--mates
    Input reads file, second of pair.

-d|--dir [ptx]
    Path to a working directory. Will be created. If exists, needs to be
    empty.

--create-config
    Create a config file with default settings for user customization.

-c|--config
    Use user customized config file. Supersedes default config.

--continue=[TASKID TASKID ...] [TRUE]
    By default, the pipeline will check for an incomplete previous run
    and if possible continue after the last successful task of that run.
    Additionally you may provide task ids to specify a specific task -
    instead of the last task - to continue from.

--redo [FALSE]
    Force pipeline to restart from the beginning, ignoring and
    overwriting previous results. Supersedes --continue.

--stop-after=<TASKID>
    Stop the pipeline after the specified task.

--skip=<TASKID/PATTERN TASKID/PATTERN ...>
    Skip specified tasks or tasks matching specified patterns (perl
    regex). If other tasks request results from skipped tasks, the
    pipeline will try to reuse results from previous runs. You need to
    take care that these results still make sense in the current run.

-V|--version
    Display version.

-h|--help
    Display this help.

```

All options can, and should, be set in the configuration file ptx.cfg, which is located in the main folder. With this config file you can set the options for each step and task individually. By default, the chloroExtractor uses this config file; you can edit it, or create your own and pass it with the -c parameter.

```shell

$ ./ptx -c ownptx.cfg -1 FQ1 -2 FQ2

```

Input data

The chloroExtractor uses unsorted FASTQ files with paired-end reads. Please make sure your reads are not sorted in any way; otherwise there could be problems or even wrong results.
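A quick sanity check for paired input is to confirm that both files contain the same number of reads. This is a sketch that is not part of chloroExtractor; it assumes the common four-lines-per-record FASTQ layout and does not detect whether reads are sorted. The function names are hypothetical.

```shell
# Count FASTQ records, assuming four lines per record.
fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Succeed only if both files hold the same number of records.
same_read_count() {
    [ "$(fastq_records "$1")" -eq "$(fastq_records "$2")" ]
}

# Usage (FQ1 and FQ2 are placeholders for your own files):
# same_read_count FQ1 FQ2 && echo "read counts match"
```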

Example

An example data set can be downloaded from Zenodo. As an example, we download the dataset into a folder and run chloroExtractor with the input files.

For preparation, a folder will be created and an example dataset will be downloaded:

```shell
# create a folder for the testrun, adjust this to your needs or use the current folder DATAFOLDER=$(pwd)
DATAFOLDER=/tmp/chloroExtractor-testrun
mkdir -p ${DATAFOLDER}
cd ${DATAFOLDER}

# download the example set and extract the sequencing reads
wget 'https://zenodo.org/record/884449/files/SRR5216995_1M.tar.bz2' -O - | tar xjf -
```

Afterwards, chloroExtractor can be run in command line mode:

```shell
# run chloroExtractor via command line (assuming all dependencies are installed and ptx folder is in PATH)
ptx -1 SRR5216995_1M_1.fastq -2 SRR5216995_1M_2.fastq
[17-09-21 13:42:42] [PipeWrap] Running ptx from the beginning, no previous runs detected.
[17-09-21 13:42:42] [PipeWrap] Running 'jf0': jellyfish count -t 8 -m 31 -s 500M -C -o jf0.jf /data/SRR5216995_1M_1.fastq /data/SRR5216995_1M_2.fastq
[...]
```

or using the docker container:

```shell
# other possibility is docker container based chloroExtractor (assuming that the user is allowed to run docker)
docker pull chloroextractorteam/chloroextractor # ensure the latest version from docker hub

# this binds the DATAFOLDER from above into the docker container
# you can also use the path directly instead of the variable
docker run -v ${DATAFOLDER}:/data --rm chloroextractorteam/chloroextractor -1 SRR5216995_1M_1.fastq -2 SRR5216995_1M_2.fastq
[17-09-21 13:52:30] [PipeWrap] Running ptx from the beginning, no previous runs detected.
[17-09-21 13:52:30] [PipeWrap] Running 'jf0': jellyfish count -t 8 -m 31 -s 500M -C -o jf0.jf /data/SRR5216995_1M_1.fastq /data/SRR5216995_1M_2.fastq
[...]
```

Both runs result in a final chloroplast assembly in the file fcg.fa.
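To get a first impression of the result, the number of sequences and the total assembly length in fcg.fa can be summarized with a few lines of awk. This is a sketch that is not part of chloroExtractor; the function name `fasta_stats` is hypothetical, and it assumes a plain FASTA file with header lines starting with `>`.

```shell
# Print the number of sequences and the total number of bases in a FASTA file.
fasta_stats() {
    awk '/^>/ { n++; next }
         { gsub(/[ \t\r]/, ""); bases += length($0) }
         END { printf "sequences=%d bases=%d\n", n, bases }' "$1"
}

# Usage after a finished run:
# fasta_stats fcg.fa
```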

Another more detailed example is available at our demo.

Changelog

Version v1.0.9 2019-05-01

The strange behavior of fcg is fixed now (Fix #135)

Version v1.0.8 2019-04-16

Avoiding read coverage as hard filter. Using kmer coverage instead, but printing warning message.

Version v1.0.7 2019-04-12

Solved a progress bar issue which sometimes occurred (Fix #128). Updated to kmer_filter_reads.pl version 0.05.

Version v1.0.6 2019-04-08

Target coverage is reduced to 50 (instead of 200). Testset was updated to avoid strange ARRAY(0x...) messages.

Version v1.0.5 2018-07-11

It updates the fastg-parser to version v0.6.3, therefore it finally solves the bug caused by SPAdes' fastg files (Fix #101). Moreover, added some documentation about docker images and changed format of our changelog section.

Version v1.0.4 2018-06-14

Archived at Zenodo. It updates the fastg-parser to version v0.6.0, thereby fixing #101, and adds citation information.

Version v1.0.3 2018-02-20

Archived at Zenodo. It includes a test set and a patch for the divide-by-zero bug.

Version v1.0.2 2018-01-16

Archived at Zenodo. Created after the review process in The Journal of Open Source Software; we added Daniel Amsel to the acknowledgement section.

Version v1.0.1 2018-01-15

Archived at Zenodo. Created after the review process in The Journal of Open Source Software.

Version v1.0.0 2018-01-15

Archived at Zenodo and used for submission to The Journal of Open Source Software.

Howto cite

The software chloroExtractor was published in JOSS (https://doi.org/10.21105/joss.00464).

If you are using chloroExtractor, please cite: Ankenbrand et al., (2018). chloroExtractor: extraction and assembly of the chloroplast genome from whole genome shotgun data. Journal of Open Source Software, 3(21), 464, https://doi.org/10.21105/joss.00464

Repository archive

The releases of this repository are archived at Zenodo (https://doi.org/10.5281/zenodo.883594).

License

For license information, please refer to the LICENSE file.

Owner

Name: chloroExtractorTeam
Login: chloroExtractorTeam
Kind: organization

Repositories: 18
Profile: https://github.com/chloroExtractorTeam

JOSS Publication

chloroExtractor: extraction and assembly of the chloroplast genome from whole genome shotgun data

Published

January 16, 2018

DOI

10.21105/joss.00464

Volume 3, Issue 21, Page 464

Authors

Markus J. Ankenbrand

Department of Animal Ecology and Tropical Biology (Zoology III), University of Würzburg, Germany (these authors contributed equally to this work)

Simon Pfaff

Center for Computational and Theoretical Biology, University of Würzburg (these authors contributed equally to this work)

Niklas Terhoeven

Center for Computational and Theoretical Biology, University of Würzburg, Department of Bioinformatics, University of Würzburg

Musga Qureischi

Department of Bioinformatics, University of Würzburg, Centre for Experimental Molecular Medicine, University Clinics Würzburg, Germany

Maik Gündel

Department of Bioinformatics, University of Würzburg

Clemens L. Weiß

Research Group for Ancient Genomics and Evolution, Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany

Thomas Hackl

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology

Frank Förster

Center for Computational and Theoretical Biology, University of Würzburg, Department of Bioinformatics, University of Würzburg, Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Applied Ecology and Bioresources, Gießen, Germany

Editor

Pjotr Prins

CodeMeta (codemeta.json)

{
  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
  "@type": "Code",
  "author": [
    {
      "@id": "https://orcid.org/0000-0002-6620-807X",
      "@type": "Person",
      "email": "markus.ankenbrand@uni-wuerzburg.de",
      "name": "Markus J. Ankenbrand",
      "affiliation": "Department of Animal Ecology and Tropical Biology (Zoology III), University of Würzburg, Germany"
    },
    {
      "@id": "https://orcid.org/0000-0001-8505-9439",
      "@type": "Person",
      "email": "simon.pfaff@stud-mail.uni-wuerzburg.de",
      "name": "Simon Pfaff",
      "affiliation": "Center for Computational and Theoretical Biology, University of Würzburg"
    },
    {
      "@id": "",
      "@type": "Person",
      "email": "",
      "name": "Niklas Terhoeven",
      "affiliation": "Center for Computational and Theoretical Biology, University of Würzburg"
    },
    {
      "@id": "https://orcid.org/0000-0001-9661-8494",
      "@type": "Person",
      "email": "qureischi_m@ukw.de",
      "name": "Musga Qureischi",
      "affiliation": "Centre for Experimental Molecular Medicine, University Clinics Würzburg, Germany"
    },
    {
      "@id": "https://orcid.org/0000-0002-0022-320X",
      "@type": "Person",
      "email": "",
      "name": "Maik Gündel",
      "affiliation": ""
    },
    {
      "@id": "",
      "@type": "Person",
      "email": "",
      "name": "Clemens L. Weiss",
      "affiliation": "Research Group for Ancient Genomics and Evolution, Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany"
    },
    {
      "@id": "https://orcid.org/0000-0002-0022-320X",
      "@type": "Person",
      "email": "thackl@mit.edu",
      "name": "Thomas Hackl",
      "affiliation": "Department of Civil and Environmental Engineering, Massachusetts Institute of Technology"
    },
    {
      "@id": "https://orcid.org/0000-0003-4166-5423",
      "@type": "Person",
      "email": "frank.foerster@ime.fraunhofer.de",
      "name": "Frank Förster",
      "affiliation": "Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Applied Ecology and Bioresources, Gießen, Germany"
    }
  ],
  "identifier": "https://doi.org/10.5281/zenodo.883594",
  "codeRepository": "https://github.com/chloroExtractorTeam/chloroExtractor",
  "datePublished": "2017-09-28",
  "dateModified": "2019-04-08",
  "dateCreated": "2017-09-28",
  "description": "The chloroExtractor is a perl based program which provides a pipeline for DNA extraction of chloroplast DNA from whole genome plant data.",
  "keywords": "chloroplast, genome, assembly, k-mer",
  "license": "MIT",
  "title": "chloroExtractor",
  "version": "v1.0.6"
}

Committers

All Time

Total Commits: 593
Total Committers: 13
Avg Commits per committer: 45.615
Development Distribution Score (DDS): 0.55

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Frank Förster	f**r@b**e	267
Thomas Hackl	t**l@u**e	124
Markus Ankenbrand	m**d@s**e	96
PfaffS	s**f@s**e	47
Maik Guendel	m**l@s**e	44
Musga Qureischi	m**i@s**e	8
Niklas Terhoeven	n**n@u**e	1
Thomas Hackl	t**l@l**e	1
qmusga	3****a	1
Clemens Weiss	s**2@w**e	1
Maik Guendel	s**2@w**e	1
Thomas Hackl	s**2@w**e	1
Thomas Hackl	t**l@S**)	1

Committer Domains (Top 20 + Academic)

stud-mail.uni-wuerzburg.de: 4
uni-wuerzburg.de: 2
schlappi-iii.(none): 1
wbbi170.biozentrum.uni-wuerzburg.de: 1
wbbi121.biozentrum.uni-wuerzburg.de: 1
wrzh089.rzhousing.uni-wuerzburg.de: 1
lim4.de: 1
biozentrum.uni-wuerzburg.de: 1

Issues and Pull Requests

All Time

Total issues: 28
Total pull requests: 72
Average time to close issues: about 1 month
Average time to close pull requests: about 20 hours
Total issue authors: 11
Total pull request authors: 6
Average comments per issue: 1.75
Average comments per pull request: 1.43
Merged pull requests: 68
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

greatfireball (8)
iimog (5)
nterhoeven (3)
jdeligt (3)
PfaffS (2)
Joyvalley (2)
lychen83 (1)
classic-mcfly (1)
jdarias93 (1)
nsmt89 (1)
Tmesipteris (1)

Pull Request Authors

greatfireball (56)
iimog (8)
PfaffS (4)
thackl (2)
qmusga (1)
nterhoeven (1)

Top Labels

Issue Labels

enhancement (10)
bug (6)
help wanted (2)
Workaround applied (2)
question (1)
Answered (1)
Maybe new feature (1)
MAC-specific (1)
JOSS (1)

Pull Request Labels

enhancement (2)

chloroExtractor

Science Score: 95.0%
