https://github.com/bihealth/swibrid

https://github.com/bihealth/swibrid

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: bihealth
  • Language: Python
  • Default Branch: main
  • Size: 9.03 MB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed 11 months ago
Metadata Files
Readme

README.rst

swibrid
#######

SWIBRID (SWItch joint Breakpoint Repertoire IDentification) is a computational pipeline to analyze long-read sequencing data of switch joints occurring during class switch recombination.

.. image:: docs/source/_static/swibrid_scheme.png
    :width: 800
    :alt: SWIBRID scheme

documentation
=============

read the `documentation `_.


quick start guide
=================


installation
------------

#. clone the github repo and change into the source folder::

        git clone git@github.com:bihealth/swibrid.git
        cd swibrid

#. create a conda environment::

        conda env create -f swibrid_env.yaml
        conda activate swibrid_env

#. install ``swibrid``::

        pip install .


alternatively, use the docker image::

        docker run -v $(pwd):/home/swibriduser -u $(id -u):$(id -g) ghcr.io/bihealth/swibrid:latest -h 


testing
-------

for a simple and (relatively) quick end-to-end test, run::

   swibrid test

this will create two samples with about 1000 synthetic reads in ``input`` and run the pipeline on this data,
using a reduced hg38 genome in ``index`` with only the switch region (chr14:105000000-106000000).
it will probably take about 5 minutes and produce plots in ``output/read_plots`` and 
table of summary statistics in ``output/summary``


running your own data
---------------------

this assumes you have a ``fastq.gz`` file with sequencing output from minION or PacBio.
If samples were multiplexed (e.g., with ONT barcodes), you should set up a sample sheet like so::

   BC01         sample1
   BC02         sample2
   ...

and a file with barcode and primer sequences like so::

   >BC01
   AAGAAAGTTGTCGGTGTCTTTGTG
   >BC02
   TCGATTCCGTTTGTAGTCGTCTGT
   ...
   >primer_mu_fw
   CACCCTTGAAAGTAGCCCATGCCTTCC
   >primer_alpha_rv
   CTCAGTCCAACACCCACCACTCC
   >primer_gamma_rv
   CTGCCTCCCAGTGTCCTGCATTACTTCTG

#. set up snakemake and config files in a new directory::

        mkdir results
        cd results
        swibrid setup

#. provide genome (+ index), annotation files in ``index``::

        mkdir index
        cd index
        # get hg38 genome from UCSC (or elsewhere)
        wget http://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz 
        gunzip hg38.fa.gz
        # create LAST index
        lastdb hg38db hg38.fa 
        # download gene annotation from ENCODE (or elsewhere)
        wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.annotation.gtf.gz
        gunzip gencode.v33.annotation.gtf.gz
        swibrid get_annotation -i gencode.v33.annotation.gtf -o gencode.v33.annotation.exon.gene_shorted.bed

#. create bed file with switch region definitions::

	chr14	105588700	105591700	SA2
	chr14	105603000	105603500	SE
	chr14	105626500	105629000	SG4
	chr14	105645400	105647900	SG2
	chr14	105708900	105712900	SA1
	chr14	105743700	105747700	SG1
	chr14	105772100	105775600	SG3
	chr14	105856100	105861100	SM

#. edit (at least) the following entries in the ``config.yaml`` file (make sure that sample names in ``SAMPLES`` all appear in the sample sheet)::
   
        INPUT: "path/to/input.fastq.gz"
        SAMPLE_SHEET: "path/to/sample_sheet.csv"
        BARCODES_PRIMERS: "path/to/barcodes_primers.fa" 
        SAMPLES: ["sample1","sample2", ...]
        SWITCH_ANNOTATION: "path/to/switch_regions.bed"
    
   
#. run the pipeline::

        swibrid run -np        # for a dry-run
        swibrid run            # for an actual run
        swibrid run --slurm    # submit to slurm
        swibrid run --unlock   # unlock snakemake before restarting an interrupted/killed instance

Owner

  • Name: Berlin Institute of Health
  • Login: bihealth
  • Kind: organization

BIH Core Unit Bioinformatics & BIH HPC IT

GitHub Events

Total
  • Push event: 4
  • Public event: 1
Last Year
  • Push event: 4
  • Public event: 1

Dependencies

.github/workflows/publish-docs.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • peaceiris/actions-gh-pages v3 composite
Dockerfile docker
  • continuumio/miniconda3 latest build
setup.py pypi