Bedtoolsr
Bedtoolsr: An R package for genomic data analysis and manipulation - Published in JOSS (2019)
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov, zenodo.org
- ✓ Committers with academic emails: 5 of 9 committers (55.6%) from academic institutions
- ✓ Institutional organization owner: Organization phanstiellab has institutional domain (phanstiel-lab.med.unc.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.6%) to scientific vocabulary
Repository
R package wrapping bedtools
Basic Info
Statistics
- Stars: 42
- Watchers: 2
- Forks: 4
- Open Issues: 1
- Releases: 10
Metadata Files
README.md
Overview
The bedtools suite is a widely used set of utilities for genomic analysis. This R package provides a convenient wrapper for the bedtools functions, so that they can be documented and used from within the R environment. It includes manual pages for all functions, and it adds key features such as the ability to provide either file paths or R objects as inputs and outputs.
Installation
bedtoolsr can be installed directly from GitHub using the following commands:
```
install.packages("devtools")
devtools::install_github("PhanstielLab/bedtoolsr")
```
Note that if bedtools is not found in R's PATH, or you want to use a specific version, you can manually specify the desired directory in R with:
```
options(bedtools.path = "[bedtools path]")
```
You can also install a specific release for a particular version of bedtools. Download the corresponding zip file (not the source code), decompress, and install from R:
```
install.packages("/path/to/bedtoolsr_v2.28.0-7", type="source", repos=NULL)
```
Operating System Support
bedtoolsr should work on any system with R and bedtools installed. It has been tested on macOS (version 10.14 "Mojave") and Linux (Ubuntu version 18.04). bedtools is not available for Windows; however, you can either use a virtual machine or Windows Subsystem for Linux.
Example Usage
```
A.bed <- data.frame(chrom=c("chr1", "chr1"), start=c(10, 30), end=c(20, 40))
B.bed <- data.frame(chrom=c("chr1"), start=15, end=20)

bedtoolsr::bt.intersect(A.bed, B.bed)
#     V1 V2 V3
# 1 chr1 15 20
```
Complex Example
In this more complex example, loop calls in bedpe format are downloaded from Phanstiel et al., 2017 and CTCF ChIP-seq peak calls are downloaded from Van Bortle et al., 2017. The bedtoolsr::bt.slop function adds 5kb on either side of each CTCF peak, and the bedtoolsr::bt.pairtobed function then finds the loops in the bedpe file whose anchors overlap a CTCF peak at both ends, at exactly one end, or at neither end. The number and percentage of loops in each case are calculated and then plotted with ggplot2.
```
# Download and unzip the loop calls bedpe file from Phanstiel et al., 2017
download.file(url = "https://www.cell.com/cms/10.1016/j.molcel.2017.08.006/attachment/0a5229f1-46bb-4aae-aa42-33651377e633/mmc3.zip",
              destfile = "Loops_PMA.txt.zip")
unzip(zipfile = "Loops_PMA.txt.zip", files = "molcel_6338_TableS2_Loops_PMA.txt")

# Download the CTCF peak bed file from Van Bortle et al., 2017
download.file(url = "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE96800&format=file&file=GSE96800%5FCTCF%5Fpeak%5FedgeR%5Fraw%2Etxt%2Egz",
              destfile = "CTCF_peaks.txt.gz")

# Read in the bed and bedpe files (optional but helpful for subsequent R operations)
loopsBedpe <- read.table(file = "molcel_6338_TableS2_Loops_PMA.txt", header = TRUE)
ctcfPeaks <- read.table(file = gzfile("CTCF_peaks.txt.gz"), header = TRUE)

# Add 5kb up/downstream of each CTCF peak with the bedtoolsr slop function
ctcfPeaks <- bedtoolsr::bt.slop(i = ctcfPeaks, g = "hg19", b = 5000, header = TRUE)

# Compute overlaps with bedtoolsr (find loops that have CTCF bound at both, one, or no ends)
bothEnds <- bedtoolsr::bt.pairtobed(a = loopsBedpe, b = ctcfPeaks, type = "both")
oneEnd <- bedtoolsr::bt.pairtobed(a = loopsBedpe, b = ctcfPeaks, type = "xor")
neitherEnd <- bedtoolsr::bt.pairtobed(a = loopsBedpe, b = ctcfPeaks, type = "neither")

# Count the number of unique loops found in each case
totalN <- nrow(unique(loopsBedpe))
bothN <- nrow(unique(bothEnds[,1:10]))
oneN <- nrow(unique(oneEnd[,1:10]))
neitherN <- nrow(unique(neitherEnd[,1:10]))

# Calculate percentages of the total number of loops and round
bothP <- round(bothN/totalN*100)
oneP <- round(oneN/totalN*100)
neitherP <- round(neitherN/totalN*100)

# Create data frame to plot results
df <- data.frame(
  group = c("Both", "One", "Neither"),
  number = c(bothN, oneN, neitherN),
  percent = c(bothP, oneP, neitherP)
)

# Plot results
library(ggplot2)
ggplot(data = df, aes(x = 1, y = percent, fill = group)) +
  geom_col(col = "white") +
  coord_polar("y") +
  scale_fill_manual(values = c("#2171b5", "#bdd7e7", "#6baed6")) +
  labs(title = "Loop anchors with bound CTCF") +
  annotate(geom = "text", x = c(1.0, 1.25, 1.75), y = c(60, 8, 17.25),
           label = paste0(df$group, " (", df$percent, "%)"),
           col = c("white", "white", "black"), size = 4.5) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, vjust = -15, face = "bold"),
    legend.position = "none",
    text = element_text(face = "bold")
  )
```

Building for a different version of bedtools
To support past and future versions of bedtools more easily, we adopted a metaprogramming approach. A single Python script reads the bedtools --help output and automatically generates the entire R package. It was designed to be generic so that the package can be rebuilt quickly for any version of bedtools.
To generate a new version of bedtoolsr, run makePackage.py. There are command-line arguments for the location of bedtools, where the output package should go, and the package version suffix. Special cases are specified in anomalies.json.
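The generator itself is not reproduced here. As a rough illustration of the idea only, the following Python sketch parses option lines from a help text and renders an R function stub. The sample help text is abridged and illustrative, and the names `parse_options` and `render_r_stub` are hypothetical. None of this is taken from makePackage.py, which may work quite differently and also handles special cases through anomalies.json.

```python
import re

# Abridged, illustrative help text (an assumption for this sketch);
# real `bedtools intersect --help` output is much longer.
SAMPLE_HELP = (
    "Usage:   bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>\n"
    "\n"
    "Options:\n"
    "\t-wa\tWrite the original entry in A for each overlap.\n"
    "\t-wb\tWrite the original entry in B for each overlap.\n"
    "\t-f\tMinimum overlap required as a fraction of A.\n"
)

# An indented line that starts with "-flag" followed by a description.
OPTION_RE = re.compile(r"^\s+-(\w+)\s+(.*)$")


def parse_options(help_text):
    """Return (name, description) pairs for lines that look like option entries."""
    options = []
    for line in help_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            options.append((match.group(1), match.group(2).strip()))
    return options


def render_r_stub(tool, options):
    """Render a hypothetical R function stub with one NULL-default argument per option."""
    args = ", ".join(f"{name} = NULL" for name, _ in options)
    docs = "\n".join(f"#' @param {name} {desc}" for name, desc in options)
    return f"{docs}\nbt.{tool} <- function({args}) {{\n  # body omitted in this sketch\n}}\n"


options = parse_options(SAMPLE_HELP)
stub = render_r_stub("intersect", options)
print(stub)
```

Because both the documentation lines and the function signature are derived from the same parsed help text, regenerating the output for a different bedtools version only requires feeding in that version's help output.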
Testing
bedtoolsr is tested under continuous integration with unit tests written using the testthat R package. Once bedtoolsr is installed, you can run the unit tests, which cover most of the bedtoolsr functions, using the following code:
First, install testthat if not already installed:
```
install.packages('testthat')
```
Load bedtoolsr and testthat:
```
library('testthat')
library('bedtoolsr')
```
Perform tests:
```
testthat::test_package("bedtoolsr")
```
Expected results:
```
══ testthat results ══════════════════════════════════════════════════════
OK: 24 SKIPPED: 0 FAILED: 0
```
Contributions
We welcome user feedback and contributions to this package. If you have a question or a problem, the best approach is to report it through GitHub's issue tracker. If you want to propose a change to the source code, either to fix a bug or to make an improvement, use a pull request.
Website
For more information, please see the bedtoolsr website.
Authors
Contact
douglas_phanstiel@med.unc.edu
Owner
- Name: The Phanstiel Lab
- Login: PhanstielLab
- Kind: organization
- Website: http://phanstiel-lab.med.unc.edu/
- Repositories: 15
- Profile: https://github.com/PhanstielLab
Code repo for the Phanstiel Lab at UNC
GitHub Events
Total
- Issues event: 3
- Watch event: 4
- Issue comment event: 12
- Push event: 3
Last Year
- Issues event: 3
- Watch event: 4
- Issue comment event: 12
- Push event: 3
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Craig Wenger | c****r@g****m | 100 |
| Doug Phanstiel | d****i@s****u | 18 |
| Douglas Phanstiel | d****i@w****u | 16 |
| Eric Davis | e****s@o****m | 7 |
| Mayura Patwardhan | m****n@M****l | 5 |
| Mayura Patwardhan | m****n@M****m | 4 |
| Mayura Patwardhan | m****n@m****u | 3 |
| mayurapatwardhan | m****a@e****u | 2 |
| Daniel S. Katz | d****z@i****g | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 15
- Total pull requests: 3
- Average time to close issues: 4 months
- Average time to close pull requests: 35 minutes
- Total issue authors: 14
- Total pull request authors: 2
- Average comments per issue: 2.47
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: 2 days
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 4.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- nhejazi (2)
- ppolonen (1)
- agduncan94 (1)
- Adnanhashim (1)
- yaaminiv (1)
- rongxinzh (1)
- wanghlv (1)
- RegnerM2015 (1)
- jordimaggi (1)
- blc49 (1)
- HuGang-c (1)
- zhang919 (1)
- 14stutzmanav (1)
- amitjavilaventura (1)
Pull Request Authors
- danielskatz (2)
- mayurapatwardhan (1)
Dependencies
- utils * imports
- testthat * suggests
- actions/checkout v2 composite
- r-lib/actions/check-r-package v1 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
- r-lib/actions/setup-r-dependencies v1 composite