strandCheckR

strandCheckR: An R package for quantifying and removing double strand sequences for strand-specific RNA-seq - Published in JOSS (2019)

https://github.com/uofabioinformaticshub/strandcheckr

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    3 of 11 committers (27.3%) from academic institutions
  • Institutional organization owner
    Organization uofabioinformaticshub has institutional domain (www.adelaide.edu.au)
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords from Contributors

bioconductor-package motif-analysis motif-enrichment-analysis sequence-logo bioconductor
Last synced: 6 months ago

Repository

Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 1
  • Open Issues: 2
  • Releases: 1
Created over 9 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE)
```

[![Build Status](https://travis-ci.org/UofABioinformaticsHub/strandCheckR.svg?branch=master)](https://travis-ci.org/UofABioinformaticsHub/strandCheckR)
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
[![DOI](https://zenodo.org/badge/70646093.svg)](https://zenodo.org/badge/latestdoi/70646093)

strandCheckR
---

This package checks the strandedness of reads in a BAM file, 
enabling easy detection of contaminating genomic DNA or other
unexpected sources of contamination.
It can be used to quantify and remove reads that correspond to putative 
double-stranded DNA within a strand-specific RNA sample. 
The package uses a sliding window to scan a BAM file and count the 
positive- and negative-strand reads in each window.
It then provides methods to plot the proportions of positive/negative stranded 
alignments across all windows, which allows users to assess how heavily the
sample is contaminated and to choose an appropriate threshold for filtering. 
Finally, users can filter putative DNA contamination from any strand-specific 
RNA-seq sample using their selected threshold.
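
The sliding-window idea can be illustrated without the package. The following is a minimal base-R sketch under stated assumptions, not the package's implementation: the helper `countStrandsByWindow()`, the toy read positions, and the window width and step are all hypothetical, and reads are assigned to windows by start position only, which is a simplification.

```r
# Hypothetical helper (not part of strandCheckR): count +/- reads whose start
# position falls inside each sliding window along a single sequence.
countStrandsByWindow <- function(pos, strand, winWidth, winStep) {
    starts <- seq(1, max(pos), by = winStep)
    ends <- starts + winWidth - 1
    countIn <- function(i, s) {
        sum(pos >= starts[i] & pos <= ends[i] & strand == s)
    }
    data.frame(
        Start = starts,
        End = ends,
        NbPos = vapply(seq_along(starts), countIn, integer(1), s = "+"),
        NbNeg = vapply(seq_along(starts), countIn, integer(1), s = "-")
    )
}

# Toy data: six reads on one sequence (made up for illustration)
pos <- c(10, 20, 30, 120, 130, 250)
strand <- c("+", "+", "-", "-", "-", "+")
toyWin <- countStrandsByWindow(pos, strand, winWidth = 100, winStep = 100)
toyWin
```

With a step equal to the width, the windows do not overlap; a smaller step would make neighbouring windows share reads.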

## Installation

To install the release version from Bioconductor:

```{r installBioc, eval = FALSE}
install.packages("BiocManager")
BiocManager::install("strandCheckR")
```

To install the development version from GitHub (i.e. this version):

```{r installGit, eval=FALSE}
# remotes is used by BiocManager to install from GitHub
install.packages(c("BiocManager", "remotes"))
BiocManager::install("UofABioinformaticsHub/strandCheckR")
```


## Quick Usage Guide

The following are the main functions of the package.

- `getStrandFromBamFile()`

To get the number of +/- stranded reads of all sliding windows across a bam 
file:

```{r, message=FALSE}
# Load the package and example bam files
library(strandCheckR)
files <- system.file(
	"extdata", c("s1.sorted.bam", "s2.sorted.bam"),
	package = "strandCheckR"
)

# Find the read proportions from chromosome 10 for the two files
win <- getStrandFromBamFile(files, sequences = "10")

# Tidy up the file name for prettier output
win$File <- basename(as.character(win$File))
win
```
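
Once per-window counts are available, the proportion of positive-strand reads follows directly. The sketch below uses a made-up data frame rather than the real output; the column names `NbPos` and `NbNeg` are assumptions and should be checked against the object printed above.

```r
# Made-up stand-in for the window table; column names are assumptions
mockWin <- data.frame(
    File = c("s1.sorted.bam", "s1.sorted.bam", "s2.sorted.bam", "s2.sorted.bam"),
    NbPos = c(95, 3, 48, 60),
    NbNeg = c(5, 97, 52, 40)
)

# Proportion of positive-strand reads in each window
mockWin$PropPos <- mockWin$NbPos / (mockWin$NbPos + mockWin$NbNeg)

# Mean distance from 0.5 per file: values near 0 suggest mixed-strand windows
strandBias <- tapply(abs(mockWin$PropPos - 0.5), mockWin$File, mean)
strandBias
```

In this toy data the second file sits much closer to 0.5 on average, which is the pattern expected when reads come from both strands in similar numbers.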


- `plotHist()`

The histogram plot shows you the proportion of +/- stranded reads across all 
windows.

```{r plotHist, message=FALSE}
plotHist(
    windows = win, 
    groupBy = "File", 
    normalizeBy = "File", 
    scales = "free_y"
)
```


In this example, *s2.sorted.bam* seems to be contaminated with double-stranded 
DNA, as evidenced by many windows containing a roughly equal proportion of 
reads on both strands, whilst *s1.sorted.bam* is cleaner.

- `plotWin()`

The output from `plotWin()` represents each window as a point. 
The plot also shows threshold lines, which help in choosing a suitable 
threshold for filtering windows. 
Given a suitable threshold, reads from a positive (resp. negative) window are 
kept if and only if the proportion of positive-strand reads is above 
(resp. below) the corresponding threshold line.

```{r plotWin, message=FALSE, warning=FALSE}
plotWin(win, groupBy = "File")
```
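
The keep/discard rule can be approximated as a simple comparison. This is a simplified sketch only: `keepWindow()` is a hypothetical helper, it uses a flat cutoff, and it makes no claim about how the package computes the threshold lines drawn by `plotWin()`.

```r
# Hypothetical helper (not part of strandCheckR): flat-cutoff version of the
# keep/discard rule. propPos is the proportion of positive-strand reads.
keepWindow <- function(propPos, threshold = 0.7) {
    isPositive <- propPos > 0.5
    # Positive windows must lie above the cutoff, negative windows below
    # its mirror image (1 - threshold)
    ifelse(isPositive, propPos > threshold, propPos < 1 - threshold)
}

# Made-up proportions for six windows
propPos <- c(0.95, 0.75, 0.55, 0.45, 0.20, 0.02)
kept <- keepWindow(propPos)
kept
```

Windows with proportions near 0.5 fail both comparisons and are dropped, while strongly one-sided windows are kept.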


- `filterDNA()`

The function `filterDNA()` removes potential double-stranded DNA from a BAM
file using a selected threshold.


```{r win2, message=FALSE}
win2 <- filterDNA(
	file = files[2], 
	destination = "s2.filtered.bam", 
	sequences = "10", 
	threshold = 0.7, 
	getWin = TRUE
)
```


Comparing the histogram plot of the file before and after filtering shows that 
reads from the windows with roughly equal proportions of +/- stranded reads 
have been removed.

```{r plotHistAfterFilter, message=FALSE}
win2$File <- basename(as.character(win2$File))
win2$File <- factor(win2$File, levels = c("s2.sorted.bam", "s2.filtered.bam"))
library(ggplot2)
plotHist(win2, groupBy = "File", normalizeBy = "File", scales = "free_y") 
```
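
The same before/after comparison can be expressed as a number rather than a plot. The sketch below uses made-up proportions, and the 0.3 to 0.7 band used to call a window "ambiguous" is an arbitrary choice for illustration, not a package default.

```r
# Made-up window proportions before and after filtering (illustrative only)
mock <- data.frame(
    File = rep(c("before", "after"), times = c(6, 4)),
    PropPos = c(0.97, 0.52, 0.48, 0.03, 0.55, 0.92, 0.97, 0.03, 0.92, 0.05)
)

# Call a window "ambiguous" if its proportion lies strictly between 0.3 and 0.7
mock$Ambiguous <- mock$PropPos > 0.3 & mock$PropPos < 0.7

# Fraction of ambiguous windows per file
fracAmbiguous <- tapply(mock$Ambiguous, mock$File, mean)
fracAmbiguous
```

A drop in this fraction after filtering corresponds to the disappearance of the central mass in the histogram.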


A more comprehensive vignette is available at https://bioconductor.org/packages/release/bioc/vignettes/strandCheckR/inst/doc/strandCheckR.html

## Support

We recommend that questions seeking support in using the software are posted to 
the Bioconductor support forum - https://support.bioconductor.org/ - where they 
will attract not only our attention but that of the wider Bioconductor community.

Code contributions, bug reports and feature requests are most welcome. 
Please make any pull requests against the master branch at https://github.com/UofABioinformaticsHub/strandCheckR and file issues at https://github.com/UofABioinformaticsHub/strandCheckR/issues

## Author Contributions

- *Thu-Hien To* authored the vast majority of code within the package along with unit tests
- *Thu-Hien To* and *Stevie Pederson* worked closely together on the package design and 
methodology

## License

`strandCheckR` is licensed under [GPL >= 2.0](https://www.r-project.org/Licenses/GPL-2)

```{r, echo=FALSE, results='hide'}
## Clean up the files generated by the above 
file.remove("s2.filtered.bam", "s2.filtered.bam.bai", "out.stat")
```


## Session Info

```{r session-info, echo = FALSE}
sessionInfo()
```

Owner

  • Name: University of Adelaide, Bioinformatics Hub
  • Login: UofABioinformaticsHub
  • Kind: organization
  • Location: Adelaide, South Australia

JOSS Publication

strandCheckR: An R package for quantifying and removing double strand sequences for strand-specific RNA-seq
Published
February 17, 2019
Volume 4, Issue 34, Page 1145
Authors
Thu-Hien To ORCID
Bioinformatics Hub - University of Adelaide
Stephen M. Pederson ORCID
Bioinformatics Hub - University of Adelaide
Editor
Melissa Gymrek ORCID
Tags
Bioinformatics RNA-seq strand specific DNA contamination

GitHub Events

Total
  • Issues event: 1
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 5
  • Pull request event: 1
  • Create event: 4
Last Year
  • Issues event: 1
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 5
  • Pull request event: 1
  • Create event: 4

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 278
  • Total Committers: 11
  • Avg Commits per committer: 25.273
  • Development Distribution Score (DDS): 0.345
Past Year
  • Commits: 9
  • Committers: 3
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.444
Top Committers
Name Email Commits
tothuhien t****n@g****m 182
steveped s****u@g****m 31
Thu Hien t****t@c****o 19
Nitesh Turaga n****a@g****m 14
J Wokaty j****y@s****u 10
Thu-Hien To h****n@T****l 10
Steve Pederson s****d 6
vobencha v****a@g****m 2
A Wokaty a****y@s****u 2
rmflight r****9@g****m 1
LiNk-NY m****9@g****m 1

Issues and Pull Requests


All Time
  • Total issues: 12
  • Total pull requests: 4
  • Average time to close issues: 7 days
  • Average time to close pull requests: almost 3 years
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • rmflight (9)
  • smped (3)
Pull Request Authors
  • smped (3)
  • rmflight (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
bioconductor.org: strandCheckR

Calculate strandness information of a bam file

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Forks count: 18.4%
Average: 24.2%
Stargazers count: 33.2%
Downloads: 69.6%