https://github.com/bartongroup/two_pass_alignment_pipeline
Snakemake pipeline for two pass alignment
Repository
Basic Info
- Host: GitHub
- Owner: bartongroup
- Language: Jupyter Notebook
- Default Branch: master
- Size: 16.2 MB
Statistics
- Stars: 2
- Watchers: 6
- Forks: 0
- Open Issues: 1
- Releases: 0
Created over 6 years ago
· Last pushed about 6 years ago
https://github.com/bartongroup/two_pass_alignment_pipeline/blob/master/
# Two pass benchmarking pipeline:

Zenodo record: https://zenodo.org/record/3778868

A snakemake pipeline to process the data for the two-pass alignment paper.

Workflow without read simulation: [figure]

Workflow with read simulation: [figure]

## Install and run:

To run the pipeline you first need conda. Conda environments for each rule will be created automatically by snakemake. The environment YAML files are in `rules/env_yamls`. If you want to reuse the same conda environments across pipeline runs, rather than rebuilding them for each run, use the `--conda-prefix` parameter of snakemake to set a shared location where the environments are stored.

Running the benchmarking pipeline on the Arabidopsis DRS data requires approximately 75 GB of disk space.

For example:

```
# need to clone recursively to get submodules
git clone --recursive https://github.com/bartongroup/two_pass_alignment_pipeline.git
cd two_pass_alignment_pipeline

# a specific minimap2 bugfix is required which is not yet on bioconda
# this is contained as a submodule, which needs building
cd scripts/minimap2 && make
cd ../..

# make snakemake environment
conda env create -f 2passpipeline.yml
conda activate 2passpipeline

# download data and annotations for e.g. Arabidopsis DRS benchmarking
cd annotations/arabidopsis
./fetch_annotations.sh
cd ../../pipeline/arabidopsis_drs
./fetch_data.sh
cd ..

# set up the conda environments required to run the pipeline
# (pass your shared environment directory after --conda-prefix)
snakemake -d arabidopsis_drs \
    --use-conda \
    --create-envs-only \
    --conda-prefix

# run the pipeline on Arabidopsis DRS data, using SGE cluster
# note that the cluster settings may need to be altered for your setup
snakemake -d arabidopsis_drs \
    --use-conda \
    --conda-prefix \
    --cluster "qsub -V -cwd -pe smp {threads}" \
    -j 999

# to run the pipeline without a cluster, i.e. on the local machine
# (lower -j to limit the number of cores used):
snakemake -d arabidopsis_drs \
    --use-conda \
    --conda-prefix \
    -j 999
```

## Requirements:

### Before running:

* conda
* snakemake

### Installed using conda/snakemake

All of these should be installed in their own environments by snakemake, so you shouldn't have to worry about them:

For 2passtools:

* 2passtools
* python 3 (tested with 3.6)
* numpy
* scipy
* pysam
* ncls
* scikit-learn
* click
* click-log

For yanosim:

* yanosim
* python 3
* numpy
* scipy
* pysam
* click

For pipeline:

* minimap2 >= 2.17
* samtools
* stringtie >= 2.0
* bedtools
* gffcompare

To run FLAIR:

* flair
* kerneltree

## TODO:

* Make annotation optional so the pipeline can be run on organisms without annotations.
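
## Pre-flight check (optional sketch):

The "Before running" requirements (conda, snakemake) and the roughly 75 GB disk-space estimate given above can be checked before starting a run. The helpers below are a minimal sketch and are not part of the repository: the function names `have_tools`, `free_gb` and `enough_space` are hypothetical, and the free-space figure is taken from `df -Pk`, so it describes the filesystem holding the given directory.

```shell
# Hypothetical pre-flight helpers; not part of the pipeline repository.

# Return 0 if every named command is on PATH; otherwise report the missing ones.
have_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Print the free space, in whole GiB, of the filesystem holding directory $1
# (defaults to the current directory).
free_gb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Return 0 if directory $1 has at least $2 GiB free, 1 otherwise.
enough_space() {
    local dir=$1
    local need_gb=$2
    local have_gb
    have_gb=$(free_gb "$dir")
    if [ -z "$have_gb" ]; then
        echo "could not determine free space in ${dir}" >&2
        return 1
    fi
    if [ "$have_gb" -lt "$need_gb" ]; then
        echo "only ${have_gb} GB free in ${dir}, need about ${need_gb} GB" >&2
        return 1
    fi
    return 0
}
```

After sourcing these functions, something like `have_tools conda snakemake && enough_space . 75` could be run from the cloned repository before invoking snakemake; the value 75 is the estimate for the Arabidopsis DRS data and would need adjusting for other datasets.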
Owner
- Name: Geoff Barton's Computational Biology Group
- Login: bartongroup
- Kind: organization
- Location: Dundee, Scotland, UK
- Website: https://www.compbio.dundee.ac.uk
- Twitter: bartongrp
- Repositories: 57
- Profile: https://github.com/bartongroup