crisprBwa

BWA-based alignment of CRISPR gRNA spacer sequences

https://github.com/crisprverse/crisprbwa

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

aligner bioconductor bioconductor-package bwa crispr crispr-analysis crispr-cas9 crispr-design crispr-target grna grna-sequence grna-sequences sgrna sgrna-design

Keywords from Contributors

gene genomics sequencing ontology genomics-analysis proteomics

Repository

Basic Info
  • Host: GitHub
  • Owner: crisprVerse
  • License: mit
  • Language: R
  • Default Branch: devel
  • Homepage:
  • Size: 105 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed over 1 year ago

https://github.com/crisprVerse/crisprBwa/blob/devel/

crisprBwa: alignment of gRNA spacer sequences using BWA
================

-   Overview of crisprBwa
-   Installation and getting started
    -   Software requirements
        -   OS Requirements
    -   Installation from Bioconductor
-   Building a bwa index
-   Alignment using runCrisprBwa
-   Applications beyond CRISPR
    -   Example using RNAi (siRNA design)
-   Reproducibility
-   References

Authors: Jean-Philippe Fortin

Date: July 13, 2022

# Overview of crisprBwa

`crisprBwa` provides two main functions, `runBwa` and `runCrisprBwa`, to
align short DNA sequences to a reference genome using the short read
aligner BWA-backtrack (Li and Durbin 2009) and return the alignments as
R objects. It uses the Bioconductor package `Rbwa` to access the BWA
program in a platform-independent manner, so users do not need to
install BWA before using `crisprBwa`.

`runCrisprBwa` is specifically designed to map and annotate CRISPR guide
RNA (gRNA) spacer sequences using the CRISPR nuclease objects and CRISPR
genomic arithmetic defined in the Bioconductor package
[crisprBase](https://github.com/crisprVerse/crisprBase). This enables a
fast and accurate on-target and off-target search of gRNA spacer
sequences for virtually any type of CRISPR nuclease. It also serves as
an off-target search engine for
[crisprDesign](https://github.com/crisprVerse/crisprDesign), the main
gRNA design package of the [crisprVerse](https://github.com/crisprVerse)
ecosystem. See the `addSpacerAlignments` function in `crisprDesign` for
more details.

# Installation and getting started

## Software requirements

### OS Requirements

This package is supported on macOS and Linux only. It was developed and
tested on R version 4.2.1.
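
The platform requirement above can be checked from within R before installing. The following is a minimal sketch in base R; `isSupportedOs` is a hypothetical helper written for this illustration and is not part of `crisprBwa`.

``` r
# Hypothetical helper (not part of crisprBwa): TRUE if the operating
# system name corresponds to one of the supported platforms listed above.
isSupportedOs <- function(sysname = Sys.info()[["sysname"]]) {
    # Sys.info() reports "Darwin" on macOS and "Linux" on Linux
    sysname %in% c("Darwin", "Linux")
}

if (!isSupportedOs()) {
    warning("crisprBwa is documented as supported on macOS and Linux only.")
}

# Print the R version of the current session for comparison with 4.2.1
message("Running R ", getRversion())
```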

## Installation from Bioconductor

`crisprBwa` can be installed from the Bioconductor devel branch using
the following commands in a fresh R session:

``` r
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(version="devel")
BiocManager::install("crisprBwa")
```

# Building a bwa index

To use `runBwa` or `runCrisprBwa`, users need to first build a BWA
genome index. For a given genome, this step has to be done only once.
The `Rbwa` package conveniently provides the function `bwa_build_index`
to build a BWA index from any custom genome from a FASTA file.

As an example, we build a BWA index for a small portion of the human
chromosome 12 (`chr12.fa` file provided in the `crisprBwa` package) and
save the index files under the prefix `chr12` in a temporary directory:

``` r
library(Rbwa)
fasta <- system.file(package="crisprBwa", "example/chr12.fa")
outdir <- tempdir()
index <- file.path(outdir, "chr12")
Rbwa::bwa_build_index(fasta,
                      index_prefix=index)
```

To learn how to create a BWA index for a complete genome or
transcriptome, please visit our [tutorial
page](https://github.com/crisprVerse/Tutorials/tree/master/Building_Genome_Indices).
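
Because the index only needs to be built once per genome, it can be useful to check whether it already exists before rebuilding. The sketch below assumes the standard BWA index layout, in which five files (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) share the index prefix; `bwaIndexExists` is a hypothetical helper, not a function from `Rbwa` or `crisprBwa`.

``` r
# Hypothetical helper (not part of Rbwa or crisprBwa).
# Assumption: a complete BWA index consists of five files sharing a prefix.
bwaIndexExists <- function(index_prefix) {
    extensions <- c("amb", "ann", "bwt", "pac", "sa")
    all(file.exists(paste(index_prefix, extensions, sep = ".")))
}

index <- file.path(tempdir(), "chr12")
if (!bwaIndexExists(index)) {
    message("No complete BWA index found at prefix: ", index)
    # The index would be built here, for example with:
    # Rbwa::bwa_build_index(fasta, index_prefix = index)
}
```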

# Alignment using `runCrisprBwa`

As an example, we align 5 spacer sequences (of length 20bp) to the
custom genome built above, allowing a maximum of 3 mismatches between
the spacer and protospacer sequences.

We specify that the search is for the wildtype Cas9 (SpCas9) nuclease by
providing the `CrisprNuclease` object `SpCas9` available through the
`crisprBase` package. The argument `canonical=FALSE` specifies that
non-canonical PAM sequences (NAG and NGA for SpCas9) are also
considered, in addition to the canonical NGG PAM. The function
`getAvailableCrisprNucleases` in `crisprBase` returns a character vector
of the `CrisprNuclease` objects available in `crisprBase`.
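
To make the canonical versus non-canonical distinction concrete, here is a small base R sketch that classifies 3-nucleotide SpCas9 PAM sequences by pattern. It is for illustration only: `classifySpCas9Pam` is a hypothetical helper, and `runCrisprBwa` derives its `canonical` column from the `CrisprNuclease` object rather than from code like this.

``` r
# Hypothetical helper for illustration only (not part of crisprBase).
# Canonical SpCas9 PAM: NGG. Non-canonical PAMs named above: NAG and NGA.
classifySpCas9Pam <- function(pam) {
    pam <- toupper(pam)
    out <- rep("other", length(pam))
    out[grepl("^[ACGT](AG|GA)$", pam)] <- "non-canonical"
    out[grepl("^[ACGT]GG$", pam)] <- "canonical"
    out
}

classifySpCas9Pam(c("AGG", "CGG", "AGA", "TAG", "TTT"))
```

With `canonical=TRUE`, only hits whose PAM falls in the first class would be expected in the output.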

We also need to provide a `BSgenome` object corresponding to the
reference genome used for alignment to extract protospacer and PAM
sequences of the target sequences.

``` r
library(crisprBwa)
```

    ## Warning: multiple methods tables found for 'aperm'

    ## Warning: replacing previous import 'BiocGenerics::aperm' by
    ## 'DelayedArray::aperm' when loading 'SummarizedExperiment'

``` r
library(BSgenome.Hsapiens.UCSC.hg38)
```

    ## Loading required package: BSgenome

    ## Loading required package: BiocGenerics

    ## 
    ## Attaching package: 'BiocGenerics'

    ## The following objects are masked from 'package:stats':
    ## 
    ##     IQR, mad, sd, var, xtabs

    ## The following objects are masked from 'package:base':
    ## 
    ##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    ##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    ##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    ##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    ##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    ##     table, tapply, union, unique, unsplit, which.max, which.min

    ## Loading required package: S4Vectors

    ## Loading required package: stats4

    ## 
    ## Attaching package: 'S4Vectors'

    ## The following objects are masked from 'package:base':
    ## 
    ##     expand.grid, I, unname

    ## Loading required package: IRanges

    ## Loading required package: GenomeInfoDb

    ## Loading required package: GenomicRanges

    ## Loading required package: Biostrings

    ## Loading required package: XVector

    ## 
    ## Attaching package: 'Biostrings'

    ## The following object is masked from 'package:base':
    ## 
    ##     strsplit

    ## Loading required package: rtracklayer

``` r
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
bsgenome <- BSgenome.Hsapiens.UCSC.hg38
spacers <- c("AGCTGTCCGTGGGGGTCCGC",
             "CCCCTGCTGCTGTGCCAGGC",
             "ACGAACTGTAAAAGGCTTGG",
             "ACGAACTGTAACAGGCTTGG",
             "AAGGCCCTCAGAGTAATTAC")
runCrisprBwa(spacers,
             bsgenome=bsgenome,
             crisprNuclease=crisprNuclease,
             n_mismatches=3,
             canonical=FALSE,
             bwa_index=index)
```

    ## [runCrisprBwa] Using BSgenome.Hsapiens.UCSC.hg38 
    ## [runCrisprBwa] Searching for SpCas9 protospacers

    ##                 spacer          protospacer pam   chr pam_site strand
    ## 1 AAGGCCCTCAGAGTAATTAC AAGGCCCTCAGAGTAATTAC AGA chr12   170636      +
    ## 2 ACGAACTGTAAAAGGCTTGG ACGAACTGTAAAAGGCTTGG AGG chr12   170815      -
    ## 3 ACGAACTGTAACAGGCTTGG ACGAACTGTAAAAGGCTTGG AGG chr12   170815      -
    ## 4 AGCTGTCCGTGGGGGTCCGC AGCTGTCCGTGGGGGTCCGC AGG chr12   170585      +
    ## 5 CCCCTGCTGCTGTGCCAGGC CCCCTGCTGCTGTGCCAGGC CGG chr12   170609      +
    ##   n_mismatches canonical
    ## 1            0     FALSE
    ## 2            0      TRUE
    ## 3            1      TRUE
    ## 4            0      TRUE
    ## 5            0      TRUE
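
The `n_mismatches` column counts the positions at which the spacer differs from the aligned protospacer. As a sanity check, the sketch below recomputes that count in base R from rows copied out of the output above; `countMismatches` is a hypothetical helper and the data frame is retyped by hand rather than produced by `runCrisprBwa`.

``` r
# Rows copied from the example output above (subset of columns)
aln <- data.frame(
    spacer = c("AAGGCCCTCAGAGTAATTAC", "ACGAACTGTAAAAGGCTTGG",
               "ACGAACTGTAACAGGCTTGG", "AGCTGTCCGTGGGGGTCCGC",
               "CCCCTGCTGCTGTGCCAGGC"),
    protospacer = c("AAGGCCCTCAGAGTAATTAC", "ACGAACTGTAAAAGGCTTGG",
                    "ACGAACTGTAAAAGGCTTGG", "AGCTGTCCGTGGGGGTCCGC",
                    "CCCCTGCTGCTGTGCCAGGC"),
    n_mismatches = c(0, 0, 1, 0, 0),
    stringsAsFactors = FALSE
)

# Hypothetical helper: position-wise comparison of equal-length sequences
countMismatches <- function(a, b) {
    unname(mapply(function(x, y) sum(x != y),
                  base::strsplit(a, ""), base::strsplit(b, "")))
}

recomputed <- countMismatches(aln$spacer, aln$protospacer)
all(recomputed == aln$n_mismatches)

# Hits with at least one mismatch are the putative off-targets
aln[aln$n_mismatches > 0, ]
```

The third spacer differs from its protospacer at a single position, which is why it is reported at the same locus as the second spacer with `n_mismatches` equal to 1.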

# Applications beyond CRISPR

The function `runBwa` is similar to `runCrisprBwa`, but does not impose
constraints on PAM sequences. It can be used to search for any short
read sequence in a genome.

## Example using RNAi (siRNA design)

Seed-related off-targeting, caused by mismatch tolerance outside of the
seed region, is a well-studied and well-characterized problem in RNA
interference (RNAi) experiments. `runBwa` can be used to map shRNA/siRNA
seed sequences to reference genomes to predict putative off-targets:

``` r
seeds <- c("GTAAGCGGAGTGT", "AACGGGGAGATTG")
runBwa(seeds,
       n_mismatches=2,
       bwa_index=index)
```

    ##           query   chr    pos strand n_mismatches
    ## 1 AACGGGGAGATTG chr12  68337      -            2
    ## 2 AACGGGGAGATTG chr12   1666      -            2
    ## 3 AACGGGGAGATTG chr12 123863      +            2
    ## 4 AACGGGGAGATTG chr12 151731      -            2
    ## 5 AACGGGGAGATTG chr12 110901      +            2
    ## 6 GTAAGCGGAGTGT chr12 101550      -            2

# Reproducibility

``` r
sessionInfo()
```

    ## R version 4.2.1 (2022-06-23)
    ## Platform: x86_64-apple-darwin17.0 (64-bit)
    ## Running under: macOS Catalina 10.15.7
    ## 
    ## Matrix products: default
    ## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
    ## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
    ## 
    ## locale:
    ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    ## 
    ## attached base packages:
    ## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
    ## [8] base     
    ## 
    ## other attached packages:
    ##  [1] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.65.2                  
    ##  [3] rtracklayer_1.57.0                Biostrings_2.65.3                
    ##  [5] XVector_0.37.1                    GenomicRanges_1.49.1             
    ##  [7] GenomeInfoDb_1.33.7               IRanges_2.31.2                   
    ##  [9] S4Vectors_0.35.3                  BiocGenerics_0.43.4              
    ## [11] crisprBwa_1.1.3                   Rbwa_1.1.0                       
    ## 
    ## loaded via a namespace (and not attached):
    ##  [1] SummarizedExperiment_1.27.2 tidyselect_1.1.2           
    ##  [3] xfun_0.32                   purrr_0.3.4                
    ##  [5] lattice_0.20-45             vctrs_0.4.1                
    ##  [7] htmltools_0.5.3             yaml_2.3.5                 
    ##  [9] utf8_1.2.2                  XML_3.99-0.10              
    ## [11] rlang_1.0.5                 pillar_1.8.1               
    ## [13] glue_1.6.2                  BiocParallel_1.31.12       
    ## [15] bit64_4.0.5                 matrixStats_0.62.0         
    ## [17] GenomeInfoDbData_1.2.8      lifecycle_1.0.1            
    ## [19] stringr_1.4.1               zlibbioc_1.43.0            
    ## [21] MatrixGenerics_1.9.1        codetools_0.2-18           
    ## [23] evaluate_0.16               restfulr_0.0.15            
    ## [25] Biobase_2.57.1              knitr_1.40                 
    ## [27] tzdb_0.3.0                  fastmap_1.1.0              
    ## [29] parallel_4.2.1              fansi_1.0.3                
    ## [31] crisprBase_1.1.8            readr_2.1.2                
    ## [33] DelayedArray_0.23.1         vroom_1.5.7                
    ## [35] bit_4.0.4                   Rsamtools_2.13.4           
    ## [37] rjson_0.2.21                hms_1.1.2                  
    ## [39] digest_0.6.29               stringi_1.7.8              
    ## [41] BiocIO_1.7.1                grid_4.2.1                 
    ## [43] cli_3.4.0                   tools_4.2.1                
    ## [45] bitops_1.0-7                magrittr_2.0.3             
    ## [47] RCurl_1.98-1.8              tibble_3.1.8               
    ## [49] crayon_1.5.1                pkgconfig_2.0.3            
    ## [51] ellipsis_0.3.2              Matrix_1.4-1               
    ## [53] rmarkdown_2.16              rstudioapi_0.14            
    ## [55] R6_2.5.1                    GenomicAlignments_1.33.1   
    ## [57] compiler_4.2.1

# References

Li, Heng, and Richard Durbin. 2009. "Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform." *Bioinformatics* 25 (14): 1754-60.

Owner

  • Name: crisprVerse
  • Login: crisprVerse
  • Kind: organization
  • Email: fortin946@gmail.com

Bioconductor ecosystem for CRISPR gRNA design

GitHub Events

Total
  • Create event: 1
Last Year
  • Create event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 28
  • Total Committers: 4
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.5
Past Year
  • Commits: 4
  • Committers: 2
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
fortinj2 f****e@g****m 14
Jean-Philippe Fortin f****6@g****m 10
J Wokaty j****y@s****u 2
Nitesh Turaga n****a@g****m 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 6,700 total
  • Total dependent packages: 1
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
bioconductor.org: crisprBwa

  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 6,700 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 29.9%
Downloads: 89.6%
Last synced: 6 months ago