methrix

An R package for fast and flexible DNA methylation analysis

https://github.com/compepigen/methrix

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary

Keywords

bedgraph bioinformatics dna-methylation

Keywords from Contributors

bioconductor-package gene
Last synced: 7 months ago · JSON representation

Repository

An R package for fast and flexible DNA methylation analysis

Basic Info
Statistics
  • Stars: 33
  • Watchers: 7
  • Forks: 12
  • Open Issues: 3
  • Releases: 0
Topics
bedgraph bioinformatics dna-methylation
Created about 7 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License

README.md


Fast and efficient summarization of generic bedGraph files from Bisulfite sequencing


Introduction

Bedgraph files generated by bisulfite sequencing (BS) pipelines come in various flavors. A critical downstream step is the aggregation of these files into methylation and coverage matrices. Methrix performs this aggregation and also provides many other useful downstream functions.

Package documentation

  • For short and quick documentation, see the Bioconductor vignette.

  • An exemplary complete data analysis, with steps from reading in the data to annotation and differential methylation calling, can be found in our WGBS best practices workflow.

Citation

Mayakonda A, Schönung M, Hey J, Batra RN, Feuerstein-Akgoz C, Köhler K, Lipka DB, Sotillo R, Plass C, Lutsik P, Toth R. Methrix: an R/bioconductor package for systematic aggregation and analysis of bisulfite sequencing data. Bioinformatics. 2020 Dec 21:btaa1048. doi: 10.1093/bioinformatics/btaa1048. Epub ahead of print. PMID: 33346800.

Installation

```r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Installing stable version from Bioconductor
BiocManager::install("methrix")

# Installing developmental version from GitHub
BiocManager::install("CompEpigen/methrix")
```

Ideally, use the newest versions of R and Bioconductor. With older versions (e.g., R < 4.0), installing from Bioconductor might install an older version of the package. In that case, installing from GitHub might be easier, since it is more lenient about versions.
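Before choosing an installation route, it can help to check which R version is running and, after installation, which methrix version was actually obtained. The following is a minimal sketch using only base R; the 4.0 threshold is the example mentioned above, not a documented hard requirement, and the variable names are illustrative.

```r
# Compare the running R version against the example threshold mentioned above.
r_version <- getRversion()
r_is_recent <- r_version >= "4.0.0"
message("R version: ", as.character(r_version),
        if (r_is_recent) " (recent)" else " (older; Bioconductor may serve an older methrix)")

# Report the installed methrix version, if any, without triggering an install.
if (requireNamespace("methrix", quietly = TRUE)) {
    message("methrix version: ", as.character(utils::packageVersion("methrix")))
} else {
    message("methrix is not installed yet")
}
```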

Features

  • Faster summarization of generic bedGraph files with data.table back-end
  • Fills missing CpGs from reference genome
  • Vectorized code (faster, memory expensive) and non-vectorized code (slower, minimal memory)
  • Built upon SummarizedExperiment with custom methods for CpG extraction, sub-setting, and filtering
  • Easy conversion to bsseq object for downstream analysis
  • Extensive one-click interactive HTML report generation. See here for an example
  • Supports serialized arrays with HDF5Array and saveHDF5SummarizedExperiment
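To illustrate what "filling missing CpGs from the reference genome" means, here is a toy sketch in base R. It is not methrix's implementation (which uses a data.table back-end); the coordinates, column names, and counts are hypothetical. The idea is a left join of the sample's calls onto the full list of reference CpGs, so that uncovered loci appear as explicit `NA` rows.

```r
# Toy reference: every CpG position known from the genome (hypothetical coordinates).
ref_cpgs <- data.frame(chr = "chr1", start = c(100L, 250L, 400L, 575L))

# Toy sample: only the CpGs that received reads
# (M = methylated read count, U = unmethylated read count).
sample_calls <- data.frame(chr = "chr1", start = c(100L, 400L),
                           M = c(8L, 1L), U = c(2L, 9L))

# Left join onto the reference so uncovered CpGs show up as NA rows.
filled <- merge(ref_cpgs, sample_calls, by = c("chr", "start"), all.x = TRUE)
filled <- filled[order(filled$chr, filled$start), ]

# Number of reference CpGs without any data in this sample.
n_missing <- sum(is.na(filled$M))
print(filled)
```

Because every sample is expanded to the same set of reference loci, the per-sample columns can be bound side by side into methylation and coverage matrices with identical row order.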

Updates:

see here

Quick usage:

Usage is simple: generate a methrix object using the read_bedgraphs() command, then pass it to all downstream analyses.

Below is the two-step procedure to import WGBS bedGraph files.

Step-1: Extract all CpG loci from the reference genome

```r
hg19_cpgs = methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19")
# -Extracting CpGs
# -Done. Extracted 29,891,155 CpGs from 298 contigs.
# There were 50 or more warnings (use warnings() to see the first 50)
```

Step-2: Read in bedgraphs and generate a methrix object

The example data of the methrix package is used.

```r
# Example bedgraph files
bdg_files = list.files(path = system.file('extdata', package = 'methrix'),
                       pattern = "*bedGraph\\.gz$", full.names = TRUE)

meth = methrix::read_bedgraphs(files = bdg_files, ref_cpgs = hg19_cpgs,
                               chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4,
                               stranded = TRUE, collapse_strands = TRUE)
# -Preset: Custom
# --Missing beta and coverage info. Estimating them from M and U values
# -CpGs raw: 29,891,155 (total reference CpGs)
# -CpGs retained: 28,217,448 (reference CpGs from contigs of interest)
# -CpGs stranded: 56,434,896 (reference CpGs from both strands)
# -Processing: C1.bedGraph.gz
# --CpGs missing: 56,434,219 (from known reference CpGs)
# -Processing: C2.bedGraph.gz
# --CpGs missing: 56,434,207 (from known reference CpGs)
# -Processing: N1.bedGraph.gz
# --CpGs missing: 56,434,194 (from known reference CpGs)
# -Processing: N2.bedGraph.gz
# --CpGs missing: 56,434,195 (from known reference CpGs)
# -Finished in: 00:02:00 elapsed (00:02:23 cpu)

meth
# An object of class methrix
#    n_CpGs: 28,217,448
# n_samples: 4
#     is_h5: FALSE
# Reference: hg19
```
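The log above mentions two steps that are easy to picture with a toy example: estimating beta and coverage from M and U counts, and collapsing the two strands of each CpG. The sketch below uses the conventional definitions (coverage = M + U, beta = M / (M + U)) and assumes that the minus-strand record sits one base downstream of its plus-strand partner. It is written in base R with hypothetical data and is not taken from methrix's source, whose handling may differ in detail.

```r
# Hypothetical stranded calls: each CpG appears once per strand.
# The minus-strand record is assumed to sit one base downstream of its partner.
calls <- data.frame(
    chr    = "chr1",
    start  = c(100L, 101L, 400L, 401L),
    strand = c("+", "-", "+", "-"),
    M      = c(8L, 6L, 1L, 0L),   # methylated read counts
    U      = c(2L, 4L, 9L, 10L)   # unmethylated read counts
)

# Shift minus-strand records onto the plus-strand coordinate,
# then sum the counts of both strands per CpG.
calls$cpg_start <- ifelse(calls$strand == "-", calls$start - 1L, calls$start)
collapsed <- aggregate(cbind(M, U) ~ chr + cpg_start, data = calls, FUN = sum)

# Coverage is the total read count; beta is the methylated fraction.
collapsed$cov  <- collapsed$M + collapsed$U
collapsed$beta <- collapsed$M / collapsed$cov
print(collapsed)
```

The same doubling is visible in the log: 28,217,448 retained reference CpGs correspond to 56,434,896 stranded positions (two per CpG), and collapsing returns the object to 28,217,448 loci.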

Methrix operations

What can be done with a methrix object? The key functions are listed below.

```r
# reading and writing:
read_bedgraphs()    #Reads in bedgraph files into methrix
write_bedgraphs()   #Writes bedGraphs from methrix object
write_bigwigs()     #Writes bigWigs from methrix object

# operations
order_by_sd()       #Orders methrix object by SD
region_filter()     #Filters matrices by region
mask_methrix()      #Masks lowly covered CpGs
coverage_filter()   #Filters methrix object based on coverage
subset_methrix()    #Subsets methrix object based on given conditions.
remove_uncovered()  #Removes loci that are uncovered across all samples
remove_snps()       #Removes loci overlapping with possible SNPs

# Visualization and QC
methrix_report()    #Creates a detailed interactive html summary report from methrix object
methrix_pca()       #Principal Component Analysis
plot_pca()          #Plots the result of PCA
plot_coverage()     #Plots coverage statistics
plot_density()      #Plots the density distribution of the beta values
plot_violin()       #Plots the distribution of the beta values on a violin plot
plot_stats()        #Plot descriptive statistics
get_stats()         #Estimate descriptive statistics of the object

# Other
methrix2bsseq()     #Convert methrix to bsseq object
```
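As a rough illustration of how a few of these functions might be chained, the sketch below filters a methrix object and summarizes it. It makes several assumptions that should be checked against the package help pages: that the package ships an example object named `methrix_data`, and that `coverage_filter()` accepts `cov_thr` and `min_samples` arguments. The code is guarded so that it does nothing when methrix is not installed.

```r
# Runs only when methrix is installed; `methrix_data` is assumed to be the
# example dataset bundled with the package.
if (requireNamespace("methrix", quietly = TRUE)) {
    library(methrix)
    data("methrix_data", package = "methrix")

    # Drop loci that have no coverage in any sample.
    meth <- remove_uncovered(methrix_data)

    # Keep loci covered by at least 2 reads in at least 2 samples
    # (argument names are assumptions; see ?coverage_filter).
    meth <- coverage_filter(m = meth, cov_thr = 2, min_samples = 2)

    # Descriptive statistics of the filtered object.
    meth_stats <- get_stats(meth)
    print(head(meth_stats))
}
```

Filtering before computing statistics or running a PCA keeps uncovered or sparsely covered loci from dominating the summaries; the thresholds shown are arbitrary and depend on sequencing depth.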

Owner

  • Name: Computational Cancer Epigenomics @ DKFZ
  • Login: CompEpigen
  • Kind: organization

GitHub Events

Total
  • Issues event: 4
  • Watch event: 7
  • Issue comment event: 7
  • Push event: 3
  • Fork event: 1
Last Year
  • Issues event: 4
  • Watch event: 7
  • Issue comment event: 7
  • Push event: 3
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 219
  • Total Committers: 18
  • Avg Commits per committer: 12.167
  • Development Distribution Score (DDS): 0.621
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
tkike r****1@g****m 83
poisonalien a****3@g****m 83
Rajbir N Batra r****a@d****e 11
Toth t****r@a****e 7
Nitesh Turaga n****a@g****m 6
Maximilian Schönung m****g@c****e 5
HeyLifeHD 3****D 4
Anand Mayakonda a****d@p****n 3
Hervé Pagès h****b@g****m 3
ClarissaFeuersteinAkgoz 4****z 3
Maximilian Schönung m****g@c****l 2
HeyLifeHD j****y@g****m 2
Maximilian Schönung 3****g 2
Mayakonda Thippeswamy m****d@v****e 1
Valentin Maurer 4****v 1
Pavlo Lutsik p****k@g****m 1
mayakond m****d@c****e 1
rnbatra R****a@d****e 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 29
  • Total pull requests: 6
  • Average time to close issues: 10 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 11
  • Total pull request authors: 3
  • Average comments per issue: 1.93
  • Average comments per pull request: 1.83
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: 4 days
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 1.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • lutsik (10)
  • ClarissaFeuersteinAkgoz (5)
  • PoisonAlien (4)
  • YoannPa (2)
  • ClarissaFeuerstein (2)
  • richardheery (1)
  • jamespeapen (1)
  • questcof (1)
  • pangjli (1)
  • jarcasariego (1)
  • Christian-Heyer (1)
Pull Request Authors
  • ClarissaFeuersteinAkgoz (4)
  • rnbatra (1)
  • maurerv (1)
Top Labels
Issue Labels
enhancement (2)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 11,785 total
  • Total dependent packages: 1
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
bioconductor.org: methrix

Fast and efficient summarization of generic bedGraph files from Bisulfite sequencing

  • Versions: 6
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 11,785 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Forks count: 6.5%
Stargazers count: 6.9%
Average: 17.3%
Downloads: 72.9%
Maintainers (1)
Last synced: 7 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.6 depends
  • SummarizedExperiment * depends
  • data.table >= 1.12.4 depends
  • BSgenome * imports
  • DelayedArray * imports
  • DelayedMatrixStats * imports
  • GenomicRanges * imports
  • HDF5Array * imports
  • IRanges * imports
  • ggplot2 * imports
  • graphics * imports
  • matrixStats * imports
  • methods * imports
  • parallel * imports
  • rtracklayer * imports
  • stats * imports
  • utils * imports
  • BSgenome.Mmusculus.UCSC.mm9 * suggests
  • Biostrings * suggests
  • DSS * suggests
  • GenomeInfoDb * suggests
  • GenomicScores * suggests
  • MafDb.1Kgenomes.phase3.GRCh38 * suggests
  • MafDb.1Kgenomes.phase3.hs37d5 * suggests
  • RColorBrewer * suggests
  • bsseq * suggests
  • knitr * suggests
  • plotly * suggests
  • rmarkdown * suggests
  • testthat >= 2.1.0 suggests