https://github.com/brentp/bwa-meth

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome


Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    4 of 18 committers (22.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary

Keywords from Contributors

bioinformatics genomics ngs htslib dna vizualisation seqera reporting quality-control pypi
Last synced: 7 months ago

Repository

fast and accurate alignment of BS-Seq reads using bwa-mem and a 3-letter genome

Basic Info
Statistics
  • Stars: 150
  • Watchers: 12
  • Forks: 58
  • Open Issues: 18
  • Releases: 0
Created over 12 years ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

bwa-meth

Fast and accurate alignment of BS-Seq reads.

NOTE!!!

As of 2016-08-18, bwa-meth outputs SAM to stdout. It is up to the user to convert it to BAM. This means that the --prefix and --calmd flags are gone.

Update 2016

bwa-meth is still among the best (if not the best) aligners for BS-Seq. While it is fairly stable, I will continue to support the alignment part of bwa-meth, fixing any bugs or updating as needed.

There are now several (likely better) alternatives to the tabulation and SNP calling provided here, so I will not develop those further.

For tabulation, bias, and plotting, use MethylDackel

For SNP calling (a more modern BisSNP), use biscuit

Intro

This works for single-end reads and for paired-end reads from the directional protocol (most common).

Uses the method employed by methylcoder and Bismark: in silico conversion of all C's to T's in both the reference and the reads.

Recovers the original read (needed to tabulate methylation) by attaching it as a comment, which bwa appends as a tag to the read.
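The read-side conversion and the comment trick can be sketched in Python as below. This is a minimal illustration, not bwa-meth's own code: the function name `c2t_record` and the `YS:Z:`/`YC:Z:` tag layout of the comment are assumptions. It relies on `bwa mem -C`, which copies the FASTQ header comment into the SAM record, so comment text written as SAM `TAG:TYPE:VALUE` fields ends up as tags. The `convert` parameter exists because, for paired-end directional libraries, read 2 is commonly converted G to A instead of C to T.

```python
def c2t_record(name, seq, qual, convert=("C", "T")):
    """Return a 4-line FASTQ record carrying the converted sequence.

    The original (unconverted) sequence is kept in the header comment as a
    SAM-style tag so that `bwa mem -C` copies it into the SAM output.
    The tag names YS/YC are hypothetical, chosen for illustration only.
    """
    src, dst = convert
    converted = seq.upper().replace(src, dst)
    comment = "YS:Z:%s\tYC:Z:%s%s" % (seq, src, dst)
    return "@%s %s\n%s\n+\n%s\n" % (name, comment, converted, qual)


print(c2t_record("read1", "ACGTCCGA", "IIIIIIII"), end="")
```

Only the sequence line changes (`ACGTCCGA` becomes `ATGTTTGA`); the quality string is untouched because conversion never changes read length.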

Performs favorably compared to existing aligners, as gauged by the number of on- and off-target reads for a capture method that targets CpG-rich regions. Some off-target regions may be enriched, but all aligners are subject to the same assumptions. See the manuscript at http://arxiv.org/abs/1401.1129 for details. Optimal alignment is the upper-left corner of the plot. Curves are drawn by varying the mapping quality cutoff for aligners that use it.

This image is on real reads and represents an attempt to find good parameters for all aligners tested.
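Curves of this kind can be computed by sweeping a MAPQ cutoff over per-read results. The sketch below is hypothetical and is not taken from the compare scripts: it assumes each read has already been labelled on-target or off-target (for example by overlap with the capture regions) and simply counts what survives each cutoff.

```python
def mapq_curve(reads, cutoffs):
    """Return (cutoff, on_target, off_target) counts for each MAPQ cutoff.

    `reads` is an iterable of (mapq, is_on_target) pairs; a read is kept only
    if its MAPQ is >= the cutoff. Raising the cutoff can only discard reads,
    so both counts are non-increasing along the curve.
    """
    reads = list(reads)
    points = []
    for cutoff in sorted(cutoffs):
        kept = [on for mapq, on in reads if mapq >= cutoff]
        on_target = sum(1 for on in kept if on)
        points.append((cutoff, on_target, len(kept) - on_target))
    return points


example = [(60, True), (60, True), (3, False), (20, True), (0, False)]
for cutoff, on, off in mapq_curve(example, [0, 10, 30]):
    print(cutoff, on, off)
```

Plotting off-target counts on the x axis and on-target counts on the y axis puts the ideal aligner in the upper-left corner: many on-target reads, few off-target ones.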

Untrimmed reads comparison

Note that bwa-meth and Last perform well without trimming.

The run.sh scripts for each method are here: https://github.com/brentp/bwa-meth/tree/master/compare

I have done my best to make each method perform optimally, but there could no doubt be improvements.

QuickStart

Without installation, you can run it as python bwameth.py; once installed, the command is bwameth.py.

The commands:

```bash
bwameth.py index $REF  # Indexes with BWA-MEM (default)
# OR
bwameth.py index-mem2 $REF  # Indexes with BWA-MEM2

bwameth.py --reference $REF someR1.fastq.gz someR2.fastq.gz > some.output.sam
```

will index the reference and then write the alignments, in SAM format, to some.output.sam. To align single-end reads, specify only one file.
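Because bwa-meth writes SAM to stdout, producing a sorted, indexed BAM is left to the user. One way to do that from Python is sketched below. The helper names `build_commands` and `run_pipeline` are hypothetical, and piping into `samtools sort -o out.bam -` followed by `samtools index` is a common samtools pattern, not something bwa-meth prescribes.

```python
import subprocess


def build_commands(reference, fastqs, out_bam, threads=4):
    """Return argv lists for bwameth.py, samtools sort and samtools index.

    Pass one FASTQ path for single-end reads or two for paired-end reads.
    """
    if len(fastqs) not in (1, 2):
        raise ValueError("expected 1 (single-end) or 2 (paired-end) FASTQ files")
    align = ["bwameth.py", "--threads", str(threads), "--reference", reference]
    align += list(fastqs)
    sort = ["samtools", "sort", "-o", out_bam, "-"]  # "-": read SAM from stdin
    index = ["samtools", "index", out_bam]  # writes out_bam + ".bai"
    return align, sort, index


def run_pipeline(reference, fastqs, out_bam, threads=4):
    """Stream bwameth.py's SAM output into samtools sort, then index the BAM."""
    align, sort, index = build_commands(reference, fastqs, out_bam, threads)
    aligner = subprocess.Popen(align, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort, stdin=aligner.stdout)
    aligner.stdout.close()  # so bwameth.py sees a broken pipe if samtools exits early
    sort_rc, align_rc = sorter.wait(), aligner.wait()
    if align_rc != 0:
        raise subprocess.CalledProcessError(align_rc, align)
    if sort_rc != 0:
        raise subprocess.CalledProcessError(sort_rc, sort)
    subprocess.check_call(index)
```

With bwameth.py and samtools on the $PATH, `run_pipeline("ref.fa", ["R1.fq.gz", "R2.fq.gz"], "sample.bam", threads=16)` would leave `sample.bam` and `sample.bam.bai` behind without ever writing the intermediate SAM to disk.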

See the full example at: https://github.com/brentp/bwa-meth/tree/master/example/

Installation

The following snippet should work on most systems that have samtools and bwa installed and can install python packages (or you can send it to your sys-admin). See the Dependencies section below for further instructions:

```Shell

# only needed if you don't have toolshed installed (bwa-meth requires toolshed >= 0.4.5)
sudo pip install "toolshed>=0.4.5"

wget https://github.com/brentp/bwa-meth/archive/master.zip
unzip master.zip
cd bwa-meth-master/
sudo python setup.py install

```

After this, you should be able to run bwameth.py and see the help message.

Dependencies

bwa-meth depends on

  • python 2.7+ (including python3)

    • toolshed library, which can be installed with:
      • easy_install toolshed or
      • pip install toolshed
    • if you don't have root or sudo privileges, you can run python setup.py install --user from this directory, and the bwameth.py executable will be at: ~/.local/bin/bwameth.py
    • if you do have root or sudo run: [sudo] python setup.py install from this directory
    • users unaccustomed to installing their own python packages should download anaconda: https://store.continuum.io/cshop/anaconda/ and then install the toolshed module with pip as described above.
  • samtools command on the $PATH (https://github.com/samtools/samtools)

  • bwa mem from: https://github.com/lh3/bwa OR bwa-mem2 from: https://github.com/bwa-mem2/bwa-mem2

Usage

Index

One time only, you need to index a reference sequence.

```bash
bwameth.py index $REF  # Indexes with BWA-MEM (default)
# OR
bwameth.py index-mem2 $REF  # Indexes with BWA-MEM2
```

If your reference is some.fasta, this will create some.c2t.fasta and all of the bwa indexes associated with it.
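The 3-letter reference built by the index step can be pictured as below: each FASTA record is written twice, once with C converted to T and once with G converted to A, so reads derived from either original strand have a converted target. This is a hedged sketch, not the file bwa-meth actually writes; the function names and the `f`/`r` name prefixes are assumptions made for illustration.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def three_letter_records(records):
    """Yield a C->T copy and a G->A copy of each reference sequence.

    Both copies keep the forward orientation, so coordinates are unchanged;
    the hypothetical 'f'/'r' name prefix records which conversion was applied.
    """
    for name, seq in records:
        seq = seq.upper()
        yield "f" + name, seq.replace("C", "T")
        yield "r" + name, seq.replace("G", "A")


fasta = [">chr1 test", "ACGTCG", "GGCC", ">chr2", "cgcg"]
for name, seq in three_letter_records(read_fasta(fasta)):
    print(">%s\n%s" % (name, seq))
```

Each converted copy uses only three letters, which is why the approach is called a 3-letter genome; the reference the aligner sees is twice the size of the original.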

Align

```bash
bwameth.py --threads 16 \
     --reference $REFERENCE \
     $FQ1 $FQ2 > some.sam
```

The output will have the reads in the correct location (flipped back from the G => A reference).
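Mapping a record back from the converted reference can be sketched on a single SAM line as follows. This is a simplified, hypothetical illustration rather than bwa-meth's post-processing: it assumes reference names carry a one-letter `f`/`r` prefix, that the original read travelled in a `YS:Z:` tag, and it records the hit strand in a made-up `YD:Z:` tag. A real implementation must also rewrite the header and handle hard-clipped records, whose SEQ is shorter than the original read.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def restore_record(sam_line):
    """Undo the conversion on one tab-separated SAM alignment line.

    Assumes 'f'/'r'-prefixed reference names and a YS:Z: tag holding the
    original read (both hypothetical conventions for this sketch).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, rnext = int(fields[1]), fields[2], fields[6]
    if rname != "*":
        fields.append("YD:Z:" + rname[0])  # which converted strand was hit
        fields[2] = rname[1:]
    if rnext not in ("*", "="):
        fields[6] = rnext[1:]
    for i, tag in enumerate(fields[11:], start=11):
        if tag.startswith("YS:Z:"):
            original = tag[5:]
            if flag & 0x10:  # SAM stores SEQ reverse-complemented for reverse hits
                original = original.translate(COMPLEMENT)[::-1]
            fields[9] = original  # put the unconverted bases back
            del fields[i]
            break
    return "\t".join(fields)
```

The reverse-complement step matters: for a read aligned to the reverse strand (flag 0x10), SAM stores SEQ reverse-complemented, so the stored original must be flipped the same way before it replaces the converted bases.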

Handles clipped alignments and indels correctly. Fastqs can be gzipped or not.

Under the hood, the command above is run through BWA-MEM or BWA-MEM2 as something like:

```bash
bwa mem -L 25 -pCM -t 15 $REFERENCE.c2t.fa \
    '<python bwameth.py c2t $FQ1 $FQ2'

# OR

bwa-mem2 mem -L 25 -pCM -t 15 $REFERENCE.c2t.fa \
    '<python bwameth.py c2t $FQ1 $FQ2'
```

The index type (BWA-MEM or BWA-MEM2) is auto-detected and the corresponding aligner is chosen.
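Picking the aligner and assembling the command can be sketched as follows. This is an assumption-laden illustration, not bwa-meth's detection logic: it treats a `.bwt.2bit.64` file (written by `bwa-mem2 index`) as the sign of a BWA-MEM2 index and a `.bwt` file as a classic BWA index. The function names are hypothetical; the flags are the ones shown in the command above.

```python
import os
import shlex


def detect_aligner(c2t_ref):
    """Return the aligner argv prefix for the index found next to `c2t_ref`.

    Assumption: bwa-mem2 indexes are recognised by a .bwt.2bit.64 file and
    classic bwa indexes by a .bwt file.
    """
    if os.path.exists(c2t_ref + ".bwt.2bit.64"):
        return ["bwa-mem2", "mem"]
    if os.path.exists(c2t_ref + ".bwt"):
        return ["bwa", "mem"]
    raise FileNotFoundError("no bwa or bwa-mem2 index found for " + c2t_ref)


def build_bwa_command(c2t_ref, fq1, fq2, threads=15):
    """Assemble a shell command shaped like the bwa invocation shown above."""
    # The '<...' argument asks bwa to run the converter and read its output.
    converter = "<python bwameth.py c2t %s %s" % (shlex.quote(fq1), shlex.quote(fq2))
    args = detect_aligner(c2t_ref) + ["-L", "25", "-pCM", "-t", str(threads), c2t_ref, converter]
    return " ".join(shlex.quote(a) for a in args)
```

Checking for the bwa-mem2 file first means a reference indexed by both tools would use BWA-MEM2; that preference is itself a design assumption of this sketch.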

So the converted reads are streamed directly to bwa and never written to disk. The output from bwa is then modified by bwa-meth and streamed straight to stdout as SAM.
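The streaming side can be sketched with generators: records are read lazily from both FASTQ files (gzip-compressed or not), converted, and written interleaved, which is the layout `bwa mem -p` expects. The function names are hypothetical, detecting gzip by its magic bytes is one choice among several, and the original-read comment is omitted for brevity. Converting read 1 C to T and read 2 G to A follows the usual convention for directional libraries and is an assumption here.

```python
import gzip
import sys


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path)


def fastq_records(handle):
    """Yield (name, seq, qual) tuples lazily, one 4-line record at a time."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:].split()[0], seq, qual


def stream_interleaved(fq1, fq2, out=None):
    """Write converted read pairs, interleaved, without touching the disk."""
    out = out or sys.stdout
    with open_fastq(fq1) as r1, open_fastq(fq2) as r2:
        for (n1, s1, q1), (n2, s2, q2) in zip(fastq_records(r1), fastq_records(r2)):
            out.write("@%s\n%s\n+\n%s\n" % (n1, s1.upper().replace("C", "T"), q1))
            out.write("@%s\n%s\n+\n%s\n" % (n2, s2.upper().replace("G", "A"), q2))
```

Since only one pair is held in memory at a time, memory use stays flat regardless of input size, and the output can be piped straight into the aligner.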

Owner

  • Name: Brent Pedersen
  • Login: brentp
  • Kind: user
  • Location: Oregon, USA

Doing genomics

GitHub Events

Total
  • Issues event: 1
  • Watch event: 8
  • Issue comment event: 8
  • Push event: 2
  • Fork event: 2
Last Year
  • Issues event: 1
  • Watch event: 8
  • Issue comment event: 8
  • Push event: 2
  • Fork event: 2

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 334
  • Total Committers: 18
  • Avg Commits per committer: 18.556
  • Development Distribution Score (DDS): 0.075
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Brent Pedersen (b****e@g****m): 309 commits
  • Brad Langhorst (l****t@n****m): 8 commits
  • Roman Chernyatchik (r****k@j****m): 2 commits
  • jklughammer (j****r@c****t): 1 commit
  • Anand Mayakonda (a****3@g****m): 1 commit
  • Chris Cheshire (c****e@g****m): 1 commit
  • Graham Gower (g****r@g****m): 1 commit
  • John Didion (g****b@d****t): 1 commit
  • Kenneth Hoste (k****e@u****e): 1 commit
  • Nicola Soranzo (n****o@e****k): 1 commit
  • Paul Menzel (p****l@m****e): 1 commit
  • astatham (a****m@g****m): 1 commit
  • dpryan79 (d****9@g****m): 1 commit
  • nchernia (n****a@b****g): 1 commit
  • swingingsimian (n****n@g****m): 1 commit
  • tsy (9****1@q****m): 1 commit
  • ttriche (t****e@g****m): 1 commit
  • xfengnefx (6****x): 1 commit
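As a worked check of the committer statistics, and assuming the Development Distribution Score is defined as one minus the top committer's share of all commits, the reported figures follow from the Top Committers list:

$$
\mathrm{DDS}_{\text{all time}} = 1 - \frac{309}{334} = \frac{25}{334} \approx 0.075,
\qquad
\mathrm{DDS}_{\text{past year}} = 1 - \frac{1}{1} = 0,
\qquad
\frac{334 \text{ commits}}{18 \text{ committers}} \approx 18.556 .
$$

A score this close to zero means nearly all commits come from a single developer.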

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 136
  • Total pull requests: 35
  • Average time to close issues: 8 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 59
  • Total pull request authors: 17
  • Average comments per issue: 3.93
  • Average comments per pull request: 2.14
  • Merged pull requests: 33
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 7
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 18 minutes
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • superbobry (8)
  • avilella (3)
  • crazyhottommy (3)
  • nchernia (3)
  • Shicheng-Guo (2)
  • xie186 (2)
  • bwlang (2)
  • JohnLonginotto (2)
  • WRui (2)
  • anusurendra (2)
  • cb4github (1)
  • jvhaarst (1)
  • alexpcheng (1)
  • yangruialex (1)
  • iromeo (1)
Pull Request Authors
  • bwlang (2)
  • jklughammer (2)
  • chris-cheshire (1)
  • ttriche (1)
  • boegel (1)
  • swingingsimian (1)
  • grahamgower (1)
  • dpryan79 (1)
  • iromeo (1)
  • nchernia (1)
  • tsy19900929 (1)
  • nsoranzo (1)
  • jdidion (1)
  • PoisonAlien (1)
  • xfengnefx (1)
Top Labels
Issue Labels
wontfix (1)

Dependencies

requirements.txt pypi
  • toolshed >=0.4.5
setup.py pypi
  • toolshed >=0.4.5