Bedtoolsr

Bedtoolsr: An R package for genomic data analysis and manipulation - Published in JOSS (2019)

https://github.com/phanstiellab/bedtoolsr

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, zenodo.org
  • Committers with academic emails
    5 of 9 committers (55.6%) from academic institutions
  • Institutional organization owner
    Organization phanstiellab has institutional domain (phanstiel-lab.med.unc.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.6%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

R package wrapping bedtools

Basic Info
  • Host: GitHub
  • Owner: PhanstielLab
  • License: other
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 769 KB
Statistics
  • Stars: 42
  • Watchers: 2
  • Forks: 4
  • Open Issues: 1
  • Releases: 10
Created over 7 years ago · Last pushed 12 months ago
Metadata Files
Readme License

README.md

R-CMD-check <!-- badges: end -->

DOI

Overview

The bedtools suite of programs is a widely used set of various utilities for genomic analysis. This R package provides a convenient wrapper for bedtools functions allowing for the documentation and use of them from within the R environment. This includes manual pages for all functions as well as key added features including the ability to provide either file paths or R objects as inputs and outputs.

Installation

bedtoolsr can be installed directly from GitHub using the following commands:

install.packages("devtools") devtools::install_github("PhanstielLab/bedtoolsr")

Note that if bedtools is not found in R's PATH, or you want to use a specific version, you can manually specify the desired directory in R with:

options(bedtools.path = "[bedtools path]")

You can also install a specific release for a particular version of bedtools. Download the corresponding zip file (not the source code), decompress, and install from R:

install.packages("/path/to/bedtoolsr_v2.28.0-7", type="source", repos=NULL)

Operating System Support

bedtoolsr should work on any system with R and bedtools installed. It has been tested on macOS (version 10.14 "Mojave") and Linux (Ubuntu version 18.04). bedtools is not available for Windows; however, you can either use a virtual machine or Windows Subsystem for Linux.

Example Usage

``` A.bed <- data.frame(chrom=c("chr1", "chr1"), start=c(10, 30), end=c(20, 40)) B.bed <- data.frame(chrom=c("chr1"), start=15, end=20)

bedtoolsr::bt.intersect(A.bed, B.bed) V1 V2 V3 1 chr1 15 20 ```

Complex Example

In this more complex example, loop calls in bedpe format are downloaded from Phanstiel et al., 2017 and CTCF ChIP-seq peak calls are downloaded from Van Bortle et al., 2017. bedtoolsr is used to add 5kb on either side of the CTCF peaks with the bedtoolsr::bt.slop function before the bedtoolsr::bt.pairtobed function computes overlap of CTCF peaks with either both, one, or neither loop anchor from the loop call bedpe file. The total and percentage of loops found in each case is calculated then plotted with ggplot2.

```

Download and unzip loop calls bedpe file from Phanstiel et. al, 2017

download.file(url = "https://www.cell.com/cms/10.1016/j.molcel.2017.08.006/attachment/0a5229f1-46bb-4aae-aa42-33651377e633/mmc3.zip", destfile = "LoopsPMA.txt.zip") unzip(zipfile = "LoopsPMA.txt.zip", files = "molcel6338TableS2Loops_PMA.txt")

Download CTCF Peak bed file from Van Bortle et. al, 2017

download.file(url = "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE96800&format=file&file=GSE96800%5FCTCF%5Fpeak%5FedgeR%5Fraw%2Etxt%2Egz", destfile = "CTCF_peaks.txt.gz")

Read in bed and bedpe files (optional but helfpul for subsequent R operations)

loopsBedpe <- read.table(file = "molcel6338TableS2LoopsPMA.txt", header = T) ctcfPeaks <- read.table(file = gzfile("CTCFpeaks.txt.gz"), header = T)

Add 5kb up/downstream from each ctcf peak with the bedtoolsr slop function

ctcfPeaks <- bedtoolsr::bt.slop(i = ctcfPeaks, g = "hg19", b = 5000, header = T)

Compute overlaps with bedtoolsr (find loops that have CTCF bound at both, one, or no ends?)

bothEnds <- bedtoolsr::bt.pairtobed(a = loopsBedpe, b = ctcfPeaks, type = "both") oneEnd <- bedtoolsr::bt.pairtobed(a = loopsBedpe, b = ctcfPeaks, type = "xor") neitherEnd <- bedtoolsr::bt.pairtobed(a = loopsBedpe, b = ctcfPeaks, type = "neither")

Count the number of loops found in each case

totalN <- nrow(unique(loopsBedpe)) bothN <- nrow(unique(bothEnds[,1:10])) oneN <- nrow(unique(oneEnd[,1:10])) neitherN <- nrow(unique(neitherEnd[,1:10]))

Calculate percentages of the total number of loops and round

bothP <- round(bothN/totalN100) oneP <- round(oneN/totalN100) neitherP <- round(neitherN/totalN*100)

Create data frame to plot results

df <- data.frame( group = c("Both", "One", "Neither"), number = c(bothN, oneN, neitherN), percent = c(bothP, oneP, neitherP) )

Plot results

library(ggplot2) ggplot(data = df, aes(x = 1, y = percent, fill = group))+ geomcol(col = "white")+ coordpolar("y") + scalefillmanual(values = c("#2171b5", "#bdd7e7", "#6baed6"))+ labs(title = "Loop anchors with bound CTCF")+ annotate(geom = "text", x = c(1.0, 1.25, 1.75), y = c(60, 8, 17.25), label = paste0(df$group, " (", df$percent, "%)"), col = c("white", "white", "black"), size = 4.5)+ themevoid()+ theme( plot.title = elementtext(hjust = 0.5, vjust = -15, face = "bold"), legend.position = "none", text = element_text(face = "bold") ) ```

Building for a different version of bedtools

In order to more easily support past and future versions of bedtools we adopted a metaprogramming approach. A single python script reads bedtools --help output and automatically generates the entire R package. It was designed to be generic so that it can be rebuilt quickly for any version of bedtools.

To generate a new version of bedtoolsr, run makePackage.py. There are command-line arguments for the location of bedtools, where the output package should go, and the package version suffix. Special cases are specified in anomalies.json.

Testing

bedtoolsr uses continuous integration made possible by unit tests using the testthat R package. Once installed you can perform unit tests for most of the bedtoolsr functions using the following code:

First, install testthat if not already installed: install.packages('testthat') `

Load bedtoolsr and testthat: library('testthat') library('bedtoolsr')

Perform tests: testthat::test_package("bedtoolsr")

Expected results: ══ testthat results ══════════════════════════════════════════════════════ OK: 24 SKIPPED: 0 FAILED: 0

Contributions

We welcome user feedback and contributions on this package. If you have a question or a problem, the best approach is to report it is through GitHub's issue tracker. If you want to propose a change to the source code, either to fix a bug or make an improvement, use a pull request.

Website

For more information, please see the bedtoolsr website.

Authors

Contact

douglas_phanstiel@med.unc.edu

Owner

  • Name: The Phanstiel Lab
  • Login: PhanstielLab
  • Kind: organization

Code repo for the Phanstiel Lab at UNC

GitHub Events

Total
  • Issues event: 3
  • Watch event: 4
  • Issue comment event: 12
  • Push event: 3
Last Year
  • Issues event: 3
  • Watch event: 4
  • Issue comment event: 12
  • Push event: 3

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 157
  • Total Committers: 9
  • Avg Commits per committer: 17.444
  • Development Distribution Score (DDS): 0.363
Past Year
  • Commits: 4
  • Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Craig Wenger c****r@g****m 100
Doug Phanstiel d****i@s****u 18
Douglas Phanstiel d****i@w****u 16
Eric Davis e****s@o****m 7
Mayura Patwardhan m****n@M****l 5
Mayura Patwardhan m****n@M****m 4
Mayura Patwardhan m****n@m****u 3
mayurapatwardhan m****a@e****u 2
Daniel S. Katz d****z@i****g 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 15
  • Total pull requests: 3
  • Average time to close issues: 4 months
  • Average time to close pull requests: 35 minutes
  • Total issue authors: 14
  • Total pull request authors: 2
  • Average comments per issue: 2.47
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 2 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 4.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nhejazi (2)
  • ppolonen (1)
  • agduncan94 (1)
  • Adnanhashim (1)
  • yaaminiv (1)
  • rongxinzh (1)
  • wanghlv (1)
  • RegnerM2015 (1)
  • jordimaggi (1)
  • blc49 (1)
  • HuGang-c (1)
  • zhang919 (1)
  • 14stutzmanav (1)
  • amitjavilaventura (1)
Pull Request Authors
  • danielskatz (2)
  • mayurapatwardhan (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

DESCRIPTION cran
  • utils * imports
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v1 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite