https://github.com/bartongroup/two_pass_alignment_pipeline

Snakemake pipeline for two pass alignment

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

* ○ CITATION.cff file
* ○ codemeta.json file
* ○ .zenodo.json file
* ✓ DOI references: found 1 DOI reference(s) in README
* ✓ Academic publication links: links to zenodo.org
* ○ Academic email domains
* ○ Institutional organization owner
* ○ JOSS paper metadata
* ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: bartongroup
Language: Jupyter Notebook
Default Branch: master
Size: 16.2 MB

Statistics

Stars: 2
Watchers: 6
Forks: 0
Open Issues: 1
Releases: 0

Created over 6 years ago · Last pushed about 6 years ago

https://github.com/bartongroup/two_pass_alignment_pipeline/blob/master/

# Two-pass benchmarking pipeline:

[![two_pass_pipeline_doi](https://zenodo.org/badge/DOI/10.5281/zenodo.3778868.svg)](https://zenodo.org/record/3778868)

A Snakemake pipeline for processing the data used in the two-pass alignment paper.

Workflow without read simulation: [workflow diagram not reproduced]

Workflow with read simulation: [workflow diagram not reproduced]

## Install and run:

To run the pipeline you first need conda. Snakemake creates the conda environment for each rule automatically; the environment YAML files are in `rules/env_yamls`. By default the environments are built separately for each pipeline run. To reuse the same environments across all runs instead, use Snakemake's `--conda-prefix` parameter to set a shared location in which to store them.
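As a minimal sketch of preparing such a shared location, the helper below creates a directory and prints its absolute path, which can then be passed to `--conda-prefix` on every run. The function name `shared_conda_prefix` is hypothetical and not part of the repository; resolving to an absolute path is a precaution so the same location is used regardless of the working directory set with `-d`.

```shell
# Hypothetical helper (not part of the repository): create a directory for
# snakemake's conda environments and print its absolute, symlink-free path.
shared_conda_prefix() {
  local dir="${1:?usage: shared_conda_prefix DIR}"
  # create the directory if needed, then resolve it in a subshell so the
  # caller's working directory is left unchanged
  mkdir -p -- "$dir" && (cd -- "$dir" && pwd -P)
}
```

For example, `SHARED_CONDA_ENVS=$(shared_conda_prefix ~/snakemake_conda_envs)` followed by `--conda-prefix "$SHARED_CONDA_ENVS"` in each `snakemake` call (the directory name here is only an illustration).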

Running the benchmarking pipeline on the Arabidopsis DRS data requires about 75 GB of disk space.
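Before starting a run, it may be worth checking that the target filesystem has enough free space. The sketch below is a hypothetical pre-flight helper (`has_free_space_gb` is not part of the repository); it reads the available space from POSIX-format `df` output and treats 1 GB as 1024 × 1024 KiB, so the threshold is approximate.

```shell
# Hypothetical pre-flight check (not part of the repository): succeed if the
# filesystem holding DIR has at least MIN_GB (GiB) of free space.
has_free_space_gb() {
  local dir="${1:?usage: has_free_space_gb DIR MIN_GB}"
  local min_gb="${2:?usage: has_free_space_gb DIR MIN_GB}"
  local avail_kb
  # -P gives one line per filesystem, -k reports sizes in 1024-byte blocks;
  # column 4 of the data line is the available space
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}
```

For example: `has_free_space_gb . 75 || echo "less than 75 GB free here" >&2`.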

The steps look something like this:

```
# need to clone recursively to get submodules
git clone --recursive https://github.com/bartongroup/two_pass_alignment_pipeline.git
cd two_pass_alignment_pipeline

# a specific minimap2 bugfix is required which is not yet on bioconda
# this is contained as a submodule, which needs building
cd scripts/minimap2 && make
cd ../..

# make snakemake environment
conda env create -f 2passpipeline.yml
conda activate 2passpipeline

# download data and annotations for e.g. Arabidopsis DRS benchmarking
cd annotations/arabidopsis
./fetch_annotations.sh
cd ../../pipeline/arabidopsis_drs
./fetch_data.sh
cd ..

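# directory in which to keep the conda environments so they can be reused
# between pipeline runs (placeholder path: set this for your system)
SHARED_CONDA_ENVS=/path/to/shared/conda_envs

# set up the conda environments required to run the pipeline
snakemake -d arabidopsis_drs \
  --use-conda \
  --create-envs-only \
  --conda-prefix "$SHARED_CONDA_ENVS"

# run the pipeline on Arabidopsis DRS data, using an SGE cluster
# note that the cluster settings may need to be altered for your setup
snakemake -d arabidopsis_drs \
  --use-conda \
  --conda-prefix "$SHARED_CONDA_ENVS" \
  --cluster "qsub -V -cwd -pe smp {threads}" \
  -j 999

# to run the pipeline without a cluster, i.e. in serial (one job at a time):
snakemake -d arabidopsis_drs \
  --use-conda \
  --conda-prefix "$SHARED_CONDA_ENVS" \
  -j 1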
```
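The SGE submission command above may need adapting to other schedulers. As a sketch, assuming a SLURM cluster, the hypothetical wrapper below (not part of the repository) submits each job with `sbatch`, using Snakemake's `{threads}` placeholder to request one CPU per rule thread. Partition, memory and time-limit options that your cluster may require are deliberately left out.

```shell
# Hypothetical wrapper (not part of the repository): run the pipeline on a
# SLURM cluster instead of SGE. Extra arguments are passed on to snakemake.
run_pipeline_slurm() {
  local workdir="${1:?usage: run_pipeline_slurm WORKDIR CONDA_PREFIX [snakemake args]}"
  local conda_prefix="${2:?usage: run_pipeline_slurm WORKDIR CONDA_PREFIX [snakemake args]}"
  shift 2
  # snakemake replaces {threads} with each rule's thread count at submission
  snakemake -d "$workdir" \
    --use-conda \
    --conda-prefix "$conda_prefix" \
    --cluster "sbatch --cpus-per-task={threads}" \
    -j 999 \
    "$@"
}
```

For example, `run_pipeline_slurm arabidopsis_drs /path/to/shared/conda_envs -n` performs a dry run (`-n`) that shows which jobs would be submitted without running them.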

## Requirements:

### Before running:

* conda
* snakemake
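A quick way to confirm both prerequisites are available is to check that they are on `PATH`. The sketch below uses a hypothetical helper, `require_commands`, that is not part of the repository. Since `snakemake` is installed into the `2passpipeline` environment created with `conda env create -f 2passpipeline.yml`, run the check after `conda activate 2passpipeline`.

```shell
# Hypothetical pre-flight check (not part of the repository): report every
# named command missing from PATH and return non-zero if any are missing.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `require_commands conda snakemake || exit 1`.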

### Installed using conda/snakemake

Snakemake should install all of these in their own conda environments, so you shouldn't need to install them yourself:

For 2passtools:
* 2passtools
* python 3 (tested with 3.6)
* numpy
* scipy
* pysam
* ncls
* scikit-learn
* click
* click-log

For yanosim:
* yanosim
* python 3
* numpy
* scipy
* pysam
* click

For pipeline:
* minimap2 >= 2.17
* samtools
* stringtie >= 2.0
* bedtools
* gffcompare
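If you use your own tool installations instead of the Snakemake-managed environments, the minimum versions above can be checked with a version-aware comparison. This is a minimal sketch with hypothetical helper names (`version_ge`, `normalize_version`), relying on `sort -V` (version sort, available in GNU coreutils and recent BSD `sort`).

```shell
# Hypothetical helpers (not part of the repository).
# version_ge A B: succeed if version A >= version B. Version sort orders
# "2.9" before "2.17", unlike a plain string sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Drop a build suffix such as "-r941" from a version string like "2.17-r941".
normalize_version() {
  printf '%s\n' "${1%%-*}"
}
```

For example, assuming `minimap2 --version` prints a string such as `2.17-r941`: `version_ge "$(normalize_version "$(minimap2 --version)")" 2.17 || echo "minimap2 is older than 2.17" >&2`.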

To run FLAIR:
* flair
* kerneltree

## TODO:
  * Make annotation optional so pipeline can be run on organisms without annotations.
