chipseq_workflows

Workflows for processing of ChIP-seq, ChIPmentation, and Cut&Run data based on the Common Workflow Language

https://github.com/compepigen/chipseq_workflows

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Workflows for processing of ChIP-seq, ChIPmentation, and Cut&Run data based on the Common Workflow Language

Basic Info
  • Host: GitHub
  • Owner: CompEpigen
  • License: apache-2.0
  • Language: Common Workflow Language
  • Default Branch: master
  • Size: 1.16 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 4
Created over 7 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Zenodo

Readme.md

CWL-based Workflows for ChIP-seq, Cut&Run, ChIPmentation, and ACTseq

This repository includes workflows for upstream processing of ChIP-seq and Cut&Run data. The workflows are written in CWL and they are ready to run with a containerization solution like docker or sigularity.

CWL Workflows:

Following workflows are available in the directory ./CWL/workflows: * ChIPseq.cwl - works with single or paired end standard ChIPseq data * ChIPseq_spike_in.cwl - same as above but with support for spike-in normalization * CUTnRUN.cwl - specialized version of the ChIPseq spike-in pipeline for CUT&RUN * ChIPmenmentaion.cwl - works with single or paired end ChIPmentation data * ACTseq.cwl - works with single or paired end ACTseq data

They depend on subworkflows located in ./CWL/workflow_modules and CommandLineTool wrapper located in ./CWL/tools.

How to use:

0. How to set up:

Clone this repository, or download and extract a specific release.

You need to install a cwlrunner. We recommend cwltool for execution on a single machine or toil if you would like to use an HPC cluster or cloud infrastructure. A snapshot of a dedicated conda environment can be found at conda_toil_env.yml. However, every other runner that is fully compatible with CWL v1.0 should work, too.

To use the docker containers specified in the workflows, you need to set up docker or singularity.

Moreover, you need to install node.js for evaluation of the built-in java script expressions.

1. How to run:

For workflow execution with cwltool and toil, you can find example shell scripts in the top directory (run_*.sh). They can be use in the following fashion (however, you likely need to slightly adjust the cwlrunner command line specified in the scripts):
bash ./run_cwltool.sh <cwl_workflow> <input_file_or_dir> <output_dir>
Or: bash ./run_cwltool.sh <cwl_workflow> <input_file_or_dir> <output_dir>

  • <cwl_workflow> is a path to a cwl workflow.
  • <input_file_or_dir> is the path to a YAML file containing input parameters, or the path to a directory with multiple YAML jobs that should be executed
  • <output_dir> is the directory where final output files will be collected

Monitoring:
A log file (the name is ending with .yaml.log) will be created for each sample/run. You can open them to check the progress.

Example/test run:
Please try to run the following tiny test example to check whether your system is correctly configured. bash ./run_cwltool.sh ./CWL/workflows/ChIPseq.cwl ./tiny_test/test_main.chip.yml ~/cwl_test

Terminating a job:
If you like to terminate a specific run and all its subprocessed, you can find the corresponding job-ids in the file cwl_background_job_ids.log. Run: kill <job_id>.

2. Check the output

Once the processing finished, you can find one QC report per workflow iteration (name ends with *_report.html). These reports were created using multiqc.

They can give you a deep understanding of your data's quality. However, you should also keep in mind that unexpected results can come from bugs in the custom scripts or the tools which are applied by the workflow. Therefore, check the report carefully. Especially, examine the "read flow" through the pipeline using the fastqc output: pay attention whether you encounter an unexpected loss of reads.

Testing and debugging

There are tiny synthetic tests located in ./tiny_test/. They are perfect for initial testing and debugging.

Owner

  • Name: Computational Cancer Epigenomics @ DKFZ
  • Login: CompEpigen
  • Kind: organization

GitHub Events

Total
  • Watch event: 1
  • Push event: 3
Last Year
  • Watch event: 1
  • Push event: 3

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • suhrig (1)
Pull Request Authors
  • KerstenBreuer (1)
Top Labels
Issue Labels
Pull Request Labels