https://github.com/databio/dnameth_pipelines

PEP-compatible pipelines for DNA methylation data (RRBS, WGBS)

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 16.7%, to scientific vocabulary)

Last synced: 9 months ago · JSON representation

Repository


Basic Info

Host: GitHub
Owner: databio
Language: Python
Default Branch: master
Homepage:
Size: 556 KB

Statistics

Stars: 2
Watchers: 6
Forks: 1
Open Issues: 21
Releases: 3

Created about 9 years ago · Last pushed almost 3 years ago

Metadata Files

Readme Changelog

DNA methylation pipelines

This repository contains pipelines to process DNA methylation data from RRBS and WGBS experiments. The pipelines perform adapter trimming, read mapping, and methylation calling, and produce additional outputs. You can download the latest version from the releases page; a history of version changes is in the CHANGELOG.

Pipeline features at-a-glance

These features are explained in more detail later in this README.

Description pending.

Installing

Prerequisite Python packages. This pipeline uses pypiper to run a single sample, looper to handle multi-sample projects (for either local or cluster computation), and pararead for parallel processing of sequence reads. You can do a user-specific install of these like this:

pip install --user https://github.com/databio/pypiper/zipball/master
pip install --user https://github.com/pepkit/looper/zipball/master
pip install --user https://github.com/databio/pararead/zipball/master
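After installing, it can be useful to confirm that the three packages are visible to the Python interpreter you plan to use. The following is a minimal sketch, not part of the pipeline; it assumes that the import names match the package names given above (pypiper, looper, pararead), and the helper name find_missing is made up for illustration.

```python
import importlib.util

# Assumption: the import names are the same as the package names in this README.
REQUIRED_PACKAGES = ("pypiper", "looper", "pararead")


def find_missing(packages=REQUIRED_PACKAGES):
    """Return the names in `packages` that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisite packages were found.")
```

The check only locates the packages; it does not import them or verify their versions.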

Required executables. You will need some common bioinformatics tools installed. The list is specified in the pipeline configuration files (.yaml files in src/).

Genome resources. This pipeline requires genome assemblies produced by refgenie. You may download pre-indexed references or index your own (see refgenie instructions). Any prealignments you want to use will also require refgenie assemblies. Some common examples are provided by ref_decoy.

Clone the pipeline. Clone this repository using one of these methods:
- using SSH: git clone git@github.com:databio/dnameth_pipelines.git
- using HTTPS: git clone https://github.com/databio/dnameth_pipelines.git

Configuring

There are two configuration options: You can either set up environment variables to fit the default configuration, or change the configuration file to fit your environment. Choose one:

Option 1: Default configuration (recommended; e.g. src/rrbs.yaml).
- Make sure the executable tools (java, samtools, bowtie2, etc.) are in your PATH.
- Set up environment variables to point to jar files for the java tools (picard and trimmomatic):

export PICARD="/path/to/picard.jar"
export TRIMMOMATIC="/path/to/trimmomatic.jar"

- Define environment variable GENOMES for refgenie genomes:

export GENOMES="/path/to/genomes/folder/"
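The Option 1 checklist can be verified before launching a run. The sketch below is an illustration only, not something shipped with the pipeline: it checks the three tools named above (the "etc." is not expanded, so the full list in the .yaml files may be longer) and the PICARD, TRIMMOMATIC, and GENOMES variables. The function name check_environment and its parameters are hypothetical.

```python
import os
import shutil

# Names taken from this README; the configuration files may list more tools.
REQUIRED_TOOLS = ("java", "samtools", "bowtie2")
JAR_VARIABLES = ("PICARD", "TRIMMOMATIC")
GENOMES_VARIABLE = "GENOMES"


def check_environment(environ=None, which=shutil.which):
    """Return a list of problems found; an empty list means none were found."""
    environ = os.environ if environ is None else environ
    problems = []

    # Each executable must be discoverable on PATH.
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # Each jar variable must be set and point to an existing file.
    for var in JAR_VARIABLES:
        path = environ.get(var)
        if not path:
            problems.append(f"{var} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{var} does not point to a file: {path}")

    # GENOMES must be set and point to an existing folder.
    genomes = environ.get(GENOMES_VARIABLE)
    if not genomes:
        problems.append(f"{GENOMES_VARIABLE} is not set")
    elif not os.path.isdir(genomes):
        problems.append(f"{GENOMES_VARIABLE} is not a folder: {genomes}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Passing environ and which as arguments keeps the function easy to try out without touching the real environment.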

Option 2: Custom configuration. Alternatively, you can put absolute paths to each tool or resource in the pipeline configuration file (src/rrbs.yaml) to fit your local setup.

Running the pipeline

You never need to interface with the pipeline directly, but you can if you want: run python src/rrbs.py -h to see usage. The best way to use this pipeline, however, is to run it using looper. You will need to tell looper about your project. Example project data are in the examples/test_project folder. Run the pipeline across all samples in the test project with this command:

looper run examples/test_project/test_config.yaml

If the looper executable is not in your $PATH, add the following line to your .bashrc or .profile:

export PATH=$PATH:~/.local/bin

Now, adapt the example project to your project. Here's a quick start: You need to build two files for your project (follow examples in the examples/test_project folder):

- project config file: describes output locations, pointers to data, etc.
- sample annotation file: comma-separated value (CSV) list of your samples.

Your annotation file must specify these columns:
- sample_name
- library (must be 'RRBS' or 'WGBS')
- organism (may be 'human' or 'mouse')
- read1
- read2
- whatever else you want
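The column requirements above can be checked mechanically before handing the project to looper. This is a minimal sketch under stated assumptions, not looper's own validation: the function name validate_annotation is hypothetical, the sample row in the demo is invented, and only the library value is enforced because the README says organism "may be" human or mouse rather than "must be".

```python
import csv
import io

# Columns and allowed library values as listed in this README.
REQUIRED_COLUMNS = ("sample_name", "library", "organism", "read1", "read2")
ALLOWED_LIBRARIES = {"RRBS", "WGBS"}


def validate_annotation(handle):
    """Check an open CSV handle; return a list of problems (empty if none)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []

    # Stop early if any required column is absent from the header row.
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    for row in reader:
        if row["library"] not in ALLOWED_LIBRARIES:
            problems.append(
                f"{row['sample_name']}: library must be 'RRBS' or 'WGBS', "
                f"got {row['library']!r}"
            )
    return problems


if __name__ == "__main__":
    # Invented example content; replace with open("your_annotation.csv", newline="").
    demo = io.StringIO(
        "sample_name,library,organism,read1,read2\n"
        "sample1,RRBS,human,s1_R1.fastq.gz,s1_R2.fastq.gz\n"
    )
    print(validate_annotation(demo))
```

Extra columns are accepted without complaint, matching the "whatever else you want" item.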

Run your project as above, by passing your project config file to looper run. More detailed instructions and advanced options for how to define your project are in the Looper documentation on defining a project. Of particular interest may be the section on using looper derived columns.

Using a cluster

Once you've specified your project to work with this pipeline, you will also inherit all the power of looper for your project. You can submit these jobs to a cluster with a simple change to your configuration file. Follow instructions in configuring looper to use a cluster.

Looper can also summarize your results, monitor your runs, clean intermediate files to save disk space, and more. You can find additional details on what you can do with this in the looper docs.

Contributing

Pull requests welcome. Active development should occur in a development or feature branch.

Owner

Name: Databio
Login: databio
Kind: organization
Location: University of Virginia

Website: https://databio.org
Repositories: 88
Profile: https://github.com/databio

Solving problems in computational biology


Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 65
Total pull requests: 29
Average time to close issues: 11 days
Average time to close pull requests: 5 days
Total issue authors: 3
Total pull request authors: 3
Average comments per issue: 1.65
Average comments per pull request: 2.21
Merged pull requests: 25
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
