https://github.com/bihealth/swibrid
Repository
Basic Info
- Host: GitHub
- Owner: bihealth
- Language: Python
- Default Branch: main
- Size: 9.03 MB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Created almost 4 years ago
· Last pushed 11 months ago
Metadata Files
Readme
README.rst
swibrid
#######
SWIBRID (SWItch joint Breakpoint Repertoire IDentification) is a computational pipeline to analyze long-read sequencing data of switch joints occurring during class switch recombination.
.. image:: docs/source/_static/swibrid_scheme.png
   :width: 800
   :alt: SWIBRID scheme
documentation
=============
read the `documentation `_.
quick start guide
=================
installation
------------
#. clone the github repo and change into the source folder::

      git clone git@github.com:bihealth/swibrid.git
      cd swibrid

#. create a conda environment::

      conda env create -f swibrid_env.yaml
      conda activate swibrid_env

#. install ``swibrid``::

      pip install .

alternatively, use the docker image::

    docker run -v $(pwd):/home/swibriduser -u $(id -u):$(id -g) ghcr.io/bihealth/swibrid:latest -h
testing
-------
for a simple and (relatively) quick end-to-end test, run::

    swibrid test

this will create two samples with about 1000 synthetic reads in ``input`` and run the pipeline on this data,
using a reduced hg38 genome in ``index`` with only the switch region (chr14:105000000-106000000).
it will probably take about 5 minutes and produce plots in ``output/read_plots`` and
a table of summary statistics in ``output/summary``.
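to confirm that the test run produced something, the two output folders named above can be inspected. The following is a minimal sketch, not part of ``swibrid``: the helper name ``summarize_outputs`` is hypothetical, and it only assumes that plots and summary tables are written as regular files into ``output/read_plots`` and ``output/summary`` below the directory where ``swibrid test`` was run.

```python
from pathlib import Path


def summarize_outputs(run_dir=".", subdirs=("output/read_plots", "output/summary")):
    """List the files in each expected output folder (None if the folder is missing).

    Hypothetical helper: the folder names come from the README text,
    the file names inside them are not assumed.
    """
    report = {}
    for sub in subdirs:
        path = Path(run_dir) / sub
        if path.is_dir():
            report[sub] = sorted(p.name for p in path.iterdir() if p.is_file())
        else:
            report[sub] = None
    return report


# Report on the current working directory.
for sub, files in summarize_outputs().items():
    if files is None:
        print(f"{sub}: missing")
    else:
        print(f"{sub}: {len(files)} file(s)")
```

an empty or missing folder after the run finishes suggests that the pipeline stopped early; the snakemake log is the place to look in that case.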
running your own data
---------------------
this assumes you have a ``fastq.gz`` file with sequencing output from MinION or PacBio.
If samples were multiplexed (e.g., with ONT barcodes), you should set up a sample sheet like so::

    BC01 sample1
    BC02 sample2
    ...

and a file with barcode and primer sequences like so::

    >BC01
    AAGAAAGTTGTCGGTGTCTTTGTG
    >BC02
    TCGATTCCGTTTGTAGTCGTCTGT
    ...
    >primer_mu_fw
    CACCCTTGAAAGTAGCCCATGCCTTCC
    >primer_alpha_rv
    CTCAGTCCAACACCCACCACTCC
    >primer_gamma_rv
    CTGCCTCCCAGTGTCCTGCATTACTTCTG
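since every barcode in the sample sheet needs a matching record in the barcode/primer file, a quick consistency check before running the pipeline can save a failed run. The sketch below is not part of ``swibrid``; the function names are hypothetical, and it assumes that the sample sheet has the barcode in the first column and the sample name in the second, separated by whitespace (adjust the split if your sheet uses another delimiter).

```python
import io


def read_sample_sheet(handle):
    """Map barcode -> sample name, assuming whitespace-separated columns."""
    mapping = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        mapping[fields[0]] = fields[1]
    return mapping


def read_fasta(handle):
    """Map record name -> sequence for a small FASTA file."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def missing_barcodes(sheet, records):
    """Barcodes listed in the sample sheet without a FASTA record."""
    return sorted(bc for bc in sheet if bc not in records)


# Example with the values shown above (in-memory instead of real files).
sheet = read_sample_sheet(io.StringIO("BC01 sample1\nBC02 sample2\n"))
records = read_fasta(io.StringIO(
    ">BC01\nAAGAAAGTTGTCGGTGTCTTTGTG\n"
    ">BC02\nTCGATTCCGTTTGTAGTCGTCTGT\n"
    ">primer_mu_fw\nCACCCTTGAAAGTAGCCCATGCCTTCC\n"
))
print(missing_barcodes(sheet, records))  # prints []
```

with real files, pass ``open("path/to/sample_sheet.csv")`` and ``open("path/to/barcodes_primers.fa")`` instead of the ``io.StringIO`` objects.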
#. set up snakemake and config files in a new directory::

      mkdir results
      cd results
      swibrid setup

#. provide the genome (+ LAST index) and annotation files in ``index``::

      mkdir index
      cd index
      # get hg38 genome from UCSC (or elsewhere)
      wget http://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz
      gunzip hg38.fa.gz
      # create LAST index
      lastdb hg38db hg38.fa
      # download gene annotation from GENCODE (or elsewhere)
      wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.annotation.gtf.gz
      gunzip gencode.v33.annotation.gtf.gz
      swibrid get_annotation -i gencode.v33.annotation.gtf -o gencode.v33.annotation.exon.gene_shorted.bed

#. create a bed file with switch region definitions::

      chr14 105588700 105591700 SA2
      chr14 105603000 105603500 SE
      chr14 105626500 105629000 SG4
      chr14 105645400 105647900 SG2
      chr14 105708900 105712900 SA1
      chr14 105743700 105747700 SG1
      chr14 105772100 105775600 SG3
      chr14 105856100 105861100 SM

#. edit (at least) the following entries in the ``config.yaml`` file (make sure that sample names in ``SAMPLES`` all appear in the sample sheet)::

      INPUT: "path/to/input.fastq.gz"
      SAMPLE_SHEET: "path/to/sample_sheet.csv"
      BARCODES_PRIMERS: "path/to/barcodes_primers.fa"
      SAMPLES: ["sample1","sample2", ...]
      SWITCH_ANNOTATION: "path/to/switch_regions.bed"

#. run the pipeline::

      swibrid run -np # for a dry-run
      swibrid run # for an actual run
      swibrid run --slurm # submit to slurm
      swibrid run --unlock # unlock snakemake before restarting an interrupted/killed instance
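two of the requirements in the steps above are easy to check before starting a run: every name in ``SAMPLES`` has to appear in the sample sheet, and every line of the switch region bed file should hold a chromosome, start, end and name with start smaller than end. The following is a minimal sketch under stated assumptions, not code from ``swibrid``: both function names are hypothetical, columns are assumed to be whitespace-separated, and the ``SAMPLES`` list is passed in directly rather than read from ``config.yaml`` (which would need a YAML parser).

```python
def samples_missing_from_sheet(samples, sheet_lines):
    """Return the entries of SAMPLES that have no row in the sample sheet."""
    in_sheet = set()
    for line in sheet_lines:
        fields = line.split()
        if len(fields) >= 2:
            in_sheet.add(fields[1])  # second column: sample name
    return [s for s in samples if s not in in_sheet]


def parse_switch_bed(lines):
    """Parse switch region lines into {name: (chrom, start, end)} with basic checks."""
    regions = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 4:
            raise ValueError(f"line {lineno}: expected chrom, start, end, name")
        chrom, name = fields[0], fields[3]
        start, end = int(fields[1]), int(fields[2])
        if start >= end:
            raise ValueError(f"line {lineno}: start must be smaller than end")
        if name in regions:
            raise ValueError(f"line {lineno}: duplicate region name {name}")
        regions[name] = (chrom, start, end)
    return regions


sheet = ["BC01 sample1", "BC02 sample2"]
print(samples_missing_from_sheet(["sample1", "sample3"], sheet))  # prints ['sample3']

bed = """\
chr14 105588700 105591700 SA2
chr14 105603000 105603500 SE
chr14 105626500 105629000 SG4
chr14 105645400 105647900 SG2
chr14 105708900 105712900 SA1
chr14 105743700 105747700 SG1
chr14 105772100 105775600 SG3
chr14 105856100 105861100 SM
"""
regions = parse_switch_bed(bed.splitlines())
print(len(regions), regions["SM"])  # prints 8 ('chr14', 105856100, 105861100)
```

running ``swibrid run -np`` afterwards remains the more complete check, since the dry-run exercises the actual workflow definition.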
Owner
- Name: Berlin Institute of Health
- Login: bihealth
- Kind: organization
- Website: https://www.cubi.bihealth.org/
- Repositories: 215
- Profile: https://github.com/bihealth
BIH Core Unit Bioinformatics & BIH HPC IT
GitHub Events
Total
- Push event: 4
- Public event: 1
Last Year
- Push event: 4
- Public event: 1
Dependencies
.github/workflows/publish-docs.yml
actions
- actions/checkout v2 composite
- actions/setup-python v2 composite
- peaceiris/actions-gh-pages v3 composite
Dockerfile
docker
- continuumio/miniconda3 latest build
setup.py
pypi