HiCcompare

Joint normalization of two Hi-C matrices, visualization and detection of differential chromatin interactions. See multiHiCcompare for the analysis of multiple Hi-C matrices

https://github.com/dozmorovlab/hiccompare

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    3 of 12 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

difference-detection hi-c hic normalization visualization

Keywords from Contributors

bioconductor-package gene genomics transcriptomics proteomics rna-seq tumor-heterogeneity tumor-mutational-burden tumor-purity core-package
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 15
  • Watchers: 3
  • Forks: 3
  • Open Issues: 1
  • Releases: 0
Created over 8 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

HiCcompare

Stansfield, John C., Kellen G. Cresswell, Vladimir I. Vladimirov, and Mikhail G. Dozmorov. “HiCcompare: An R-Package for Joint Normalization and Comparison of HI-C Datasets.” BMC Bioinformatics 19, no. 1 (December 2018).

Overview

HiCcompare provides functions for joint normalization and difference detection in multiple Hi-C datasets. HiCcompare operates on processed Hi-C data in the form of chromosome-specific chromatin interaction matrices. HiCcompare is available as an R package; the major releases can be found on Bioconductor.

If you need to normalize or compare more than two Hi-C datasets, please see our other package, multiHiCcompare, which is also available on Bioconductor.

HiCcompare accepts three-column tab-separated text files that store chromatin interaction matrices in a sparse matrix format. Such files are available from several sources, such as http://aidenlab.org/data.html and http://cooler.readthedocs.io/en/latest/index.html. HiCcompare is designed to let the user perform a comparative analysis of the 3-dimensional structure of the genomes of cells in different biological states. HiCcompare first jointly normalizes two Hi-C datasets to remove biases between them. It can then detect significant differences between the datasets using a genomic distance based permutation test. These methods build on the novel concept of the MD plot, which is based on the commonly used MA plot, or Bland-Altman plot. The log Minus (M, the difference between the log interaction frequencies of the two datasets) is plotted on the y-axis, while the genomic Distance (D) is plotted on the x-axis. The MD plot allows for visualization of the differences between the Hi-C datasets.
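As a rough illustration of the MD plot idea, the sketch below computes M and D for two toy sparse matrices in base R. It does not use HiCcompare itself; the column names (`region1`, `region2`, `IF`), the toy values, and the choice of a log2 ratio for M are assumptions made for this example.

```r
# Two toy Hi-C matrices in 3-column sparse upper triangular format:
# start of region 1, start of region 2, interaction frequency (IF).
# All values are made up for illustration.
resolution <- 100000
mat1 <- data.frame(region1 = c(0, 0, 100000, 100000),
                   region2 = c(0, 100000, 100000, 300000),
                   IF = c(40, 10, 50, 4))
mat2 <- data.frame(region1 = c(0, 0, 100000, 100000),
                   region2 = c(0, 100000, 100000, 300000),
                   IF = c(80, 10, 25, 8))

# Keep only interactions observed in both datasets
md <- merge(mat1, mat2, by = c("region1", "region2"), suffixes = c("1", "2"))

# M: log2 ratio of the interaction frequencies (assumed definition)
md$M <- log2(md$IF2 / md$IF1)
# D: genomic distance between the two regions, in units of the resolution
md$D <- (md$region2 - md$region1) / resolution

# Draw the MD plot only when running interactively
if (interactive()) {
  plot(md$D, md$M, xlab = "Distance (bins)", ylab = "M", pch = 19)
  abline(h = 0, lty = 2)
}
```

Points far from the horizontal line at M = 0 correspond to interactions whose frequencies differ between the two datasets at that distance.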

The main functions are:

  • hic_loess(), which performs joint loess normalization on the Hi-C datasets
  • hic_compare(), which performs the difference detection process to detect significant changes between Hi-C datasets and assist in comparative analysis
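To give a feel for what joint loess normalization does, here is a minimal base R sketch on simulated data. It is not the implementation inside hic_loess(); the simulated bias, the `span` and `degree` values, and the even split of the correction between the two datasets are assumptions chosen for illustration.

```r
set.seed(1)
# Simulated MD data: 40 unit distances with 25 interactions each (made up)
D <- rep(1:40, each = 25)
IF1 <- rpois(length(D), lambda = 200 / D) + 1
# The second dataset carries a distance-dependent bias relative to the first
IF2 <- IF1 * 2^(0.3 + 0.02 * D + rnorm(length(D), sd = 0.2))
M <- log2(IF2 / IF1)

# Fit a loess curve f(D) through the MD plot
fit <- loess(M ~ D, degree = 1, span = 0.5)
f <- predict(fit, data.frame(D = D))

# Split the correction evenly between the two datasets on the log2 scale
adj.IF1 <- 2^(log2(IF1) + f / 2)
adj.IF2 <- 2^(log2(IF2) - f / 2)
adj.M <- log2(adj.IF2 / adj.IF1)   # equals M - f

# The average M should move close to zero after the adjustment
c(before = mean(M), after = mean(adj.M))
```

Because the correction is applied to both datasets symmetrically, neither one is treated as the reference.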

Several Hi-C datasets are also included in the package.

Read the full paper describing the methods behind HiCcompare here

Installation

First make sure you have all dependencies installed in R.

```
# 'parallel' ships with base R, so it does not need to be installed
install.packages(c('dplyr', 'data.table', 'ggplot2', 'gridExtra', 'mgcv', 'devtools'))

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("InteractionSet", "GenomicRanges", "IRanges", "BiocParallel", "QDNAseq", "GenomeInfoDbData"))
```

To install HiCcompare from Bioconductor, open R and enter the following commands. Currently, it is recommended to use the GitHub release or the development version of the Bioconductor release.

```
# Bioconductor development version and GitHub release contain major changes for difference detection
# It is recommended to use the GitHub release until the next Bioconductor update
# Try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HiCcompare")
library(HiCcompare)
```

Or, to install the latest version of HiCcompare directly from the GitHub release, open R and enter the following commands.

    library(devtools)
    install_github('dozmorovlab/HiCcompare', build_vignettes = TRUE)
    library(HiCcompare)

Usage

First, you will need to obtain some Hi-C data. Data is available from the sources listed in the overview, along with many others. You will need to extract the data and read it into R as either a 3-column sparse upper triangular matrix or a 7-column BEDPE file. For more details on data extraction, see the vignette included with HiCcompare.
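The 3-column sparse upper triangular layout can be sketched in base R as follows. The file contents, column names, and bin coordinates are made up for this example; a real file would come from your own Hi-C processing pipeline.

```r
# Write a small tab-separated sparse matrix to a temporary file so the
# example is self-contained. Values are made up.
path <- tempfile(fileext = ".txt")
writeLines(c("0\t0\t40", "0\t100000\t10", "100000\t100000\t50"), path)

# Read the three columns: region 1 start, region 2 start, IF
sparse <- read.table(path, header = FALSE, sep = "\t",
                     col.names = c("region1", "region2", "IF"))

# Upper triangular means region1 <= region2 in every row
stopifnot(ncol(sparse) == 3, all(sparse$region1 <= sparse$region2))

# A full (square) contact matrix can be converted to the same layout
full <- matrix(c(40, 10,  0,
                 10, 50,  4,
                  0,  4, 30), nrow = 3, byrow = TRUE)
bins <- c(0, 100000, 200000)   # bin start coordinates (assumed)

# Keep non-zero cells on or above the diagonal
idx <- which(upper.tri(full, diag = TRUE) & full != 0, arr.ind = TRUE)
sparse2 <- data.frame(region1 = bins[idx[, "row"]],
                      region2 = bins[idx[, "col"]],
                      IF = full[idx])
sparse2
```

Storing only the non-zero upper triangle avoids duplicating the symmetric half of the contact matrix.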

Below is an example analysis using HiCcompare. The data, in 3-column sparse upper triangular matrix format, is loaded first, and a hic.table object is created with the create.hic.table() function. Next, the two Hi-C matrices are jointly normalized using the hic_loess() function. Finally, difference detection can be performed using the hic_compare() function. The hic_loess() and hic_compare() functions will also produce an MD plot for visualizing the differences between the datasets.

```
# load data
library(HiCcompare)
data("HMEC.chr22")
data("NHEK.chr22")

# create the hic.table object
chr22.table = create.hic.table(HMEC.chr22, NHEK.chr22, chr = 'chr22')
head(chr22.table)

# Jointly normalize data for a single chromosome
hic.table = hic_loess(chr22.table, Plot = TRUE)
head(hic.table)

# input hic.table object into hic_compare
hic.table = hic_compare(hic.table, Plot = TRUE)
head(hic.table)
```
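A possible follow-up step is to keep only the interactions that pass a significance cutoff. The sketch below assumes that the table returned by hic_compare() contains an adjusted p-value column named `p.adj`, and the 0.05 cutoff is an arbitrary choice for illustration; check the vignette for the exact column names in your installed version. The code is skipped when HiCcompare is not installed.

```r
alpha <- 0.05   # arbitrary cutoff chosen for this example

if (requireNamespace("HiCcompare", quietly = TRUE)) {
  library(HiCcompare)
  data("HMEC.chr22")
  data("NHEK.chr22")

  # Same workflow as in the usage example, without plotting
  tab <- create.hic.table(HMEC.chr22, NHEK.chr22, chr = "chr22")
  tab <- hic_loess(tab, Plot = FALSE)
  tab <- hic_compare(tab, Plot = FALSE)

  # Keep rows whose adjusted p-value (assumed column `p.adj`) is below alpha
  sig <- tab[which(tab$p.adj < alpha), ]
  print(nrow(sig))
}
```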

Refer to the HiCcompare vignette for full usage instructions. For a full explanation of the methods used in HiCcompare see the manuscript here.

To view the usage vignette:

browseVignettes("HiCcompare")

Tutorial for Differential Analysis of Hi-C Data

For more detailed instructions and examples on how to perform differential analyses on Hi-C data please see our tutorial paper "R Tutorial: Detection of Differentially Interacting Chromatin Regions From Multiple Hi‐C Dataset" published in Current Protocols in Bioinformatics. https://doi.org/10.1002/cpbi.76

Branches

  • Master: contains the current stable release of HiCcompare
  • supplemental: contains supplementary files and data, see Additional Vignettes section below
  • manuscript_bioinformatics: contains write up for submission to Bioinformatics
  • test_version: contains versions of HiCcompare currently in development. This version of the software may be unstable and is not recommended for users.

Additional Vignettes

The HiCcompare paper included several supplemental files that showcase some of the usage and reasoning behind the methods. Below are the titles and brief descriptions of each of these vignettes along with links to the compiled .pdf and the source .Rmd files.

S1 File. Normalization method comparison.

Comparison of several Hi-C normalization techniques to display the persistence of bias in individually normalized chromatin interaction matrices, and its effect on the detection of differential chromatin interactions.

Compiled

Source

S2 File. Estimation of the IF power-law dependence.

Estimation of the power-law dependence between the $\log_{10}$-$\log_{10}$ interaction frequencies and distance between interacting regions. This vignette displays the reasoning behind using a power-law function for the simulation of the signal portion of Hi-C matrices.
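A power law becomes a straight line on the log10-log10 scale, so its parameters can be estimated with an ordinary linear fit. The following base R sketch shows this on simulated data; the constant, exponent, and noise level are made up and are not taken from the vignette.

```r
set.seed(42)
# Simulated interaction frequencies following IF = C * distance^(-alpha)
# with multiplicative noise; C, alpha and the noise level are made up.
distance <- rep(1:100, each = 10)
IF <- 1000 * distance^(-1.2) * 10^rnorm(length(distance), sd = 0.05)

# On the log10-log10 scale the power law is linear:
# log10(IF) = log10(C) - alpha * log10(distance)
fit <- lm(log10(IF) ~ log10(distance))

alpha_hat <- -unname(coef(fit)[2])   # estimated exponent
C_hat <- 10^unname(coef(fit)[1])     # estimated constant
c(alpha = alpha_hat, C = C_hat)
```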

Compiled

Source

S3 File. Estimation of the SD power-law dependence.

Estimation of the power-law dependence between the $\log_{10}$-$\log_{10}$ SD of interaction frequencies and distance between interacting regions. This vignette displays the reasoning behind using a power-law function for the simulation of the noise component of Hi-C matrices.

Compiled

Source

S4 File. Estimation of proportion of zeros.

Estimation of the dependence between the proportion of zeros and distance between interacting regions. This vignette shows the distribution of zeros in real Hi-C data. The results were used for modeling the proportion of zeros in simulated Hi-C matrices with a linear function.
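The idea of modeling the proportion of zeros with a linear function of distance can be sketched in base R as below. The data are simulated, and the intercept and slope are made-up values rather than estimates from real Hi-C matrices.

```r
set.seed(7)
# Simulated cells: the chance that a cell is zero grows linearly with
# distance (intercept and slope are made up).
distance <- rep(1:80, each = 200)
p_zero <- 0.05 + 0.01 * distance
is_zero <- rbinom(length(distance), size = 1, prob = p_zero) == 1

# Observed proportion of zeros at each distance
prop <- tapply(is_zero, distance, mean)
d <- as.numeric(names(prop))

# Linear fit of the proportion of zeros against distance
fit <- lm(as.numeric(prop) ~ d)
coef(fit)
```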

Compiled

Source

S5 File. Evaluation of difference detection in simulated data.

Extended evaluation of differential chromatin interaction detection analysis using simulated Hi-C data. Many different classifier performance measures are presented. Note: the source .Rmd takes a long time to knit.

Compiled

Source

S6 File. Evaluation of difference detection in real data.

Extended evaluation of differential chromatin interaction detection analysis using real Hi-C data. Many different classifier performance measures are presented. Note: the source .Rmd takes a long time to knit.

Compiled

Source

S7 File. loess at varying resolution.

Visualization of the loess joint normalization over varying resolutions. This vignette shows that increasing sparsity of Hi-C matrices with increasing resolution causes loess to become less useful for normalization at high resolutions.
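The link between resolution and sparsity can be illustrated with a toy matrix in base R. The sketch below simulates counts at a fine resolution and pools bins to mimic a coarser one; the matrix size, decay of the expected counts, and pooling factor are made up, and symmetry of the contact matrix is ignored for simplicity.

```r
set.seed(3)
n <- 200
# Toy contact matrix at fine resolution: expected counts decay with distance
dist <- abs(outer(1:n, 1:n, "-"))
fine <- matrix(rpois(n * n, lambda = 5 / (1 + dist)), n, n)

# Pool k x k blocks of bins to obtain a coarser (lower-resolution) matrix
pool <- function(m, k) {
  g <- (seq_len(nrow(m)) - 1) %/% k + 1    # coarse bin of each fine bin
  t(rowsum(t(rowsum(m, g)), g))            # sum over rows, then over columns
}
coarse <- pool(fine, 4)

# Proportion of zero cells at each resolution
c(fine = mean(fine == 0), coarse = mean(coarse == 0))
```

The finer matrix has a larger share of zero cells, which leaves fewer informative points at each distance for a loess fit.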

Compiled

Source

Citation

Please cite HiCcompare if you use it in your analysis.

John C. Stansfield, Kellen G. Cresswell, Vladimir I. Vladimirov, Mikhail G. Dozmorov, HiCcompare: an R-package for joint normalization and comparison of HI-C datasets. BMC Bioinformatics. 2018 Jul 31;19(1):279. doi: 10.1186/s12859-018-2288-x.

Contributions & Support

Suggestions for new features and bug reports are welcome. Please create a new issue for any of these or contact the author directly: @jstansfield0 (stansfieldjc@vcu.edu)

Contributors

Authors: @jstansfield0 (stansfieldjc@vcu.edu) & @mdozmorov (mikhail.dozmorov@vcuhealth.org)

Owner

  • Name: Dozmorov Lab
  • Login: dozmorovlab
  • Kind: organization

Genomics, bioinformatics, computational biology, 3D genome, Hi-C

GitHub Events

Total
  • Watch event: 4
  • Issue comment event: 4
Last Year
  • Watch event: 4
  • Issue comment event: 4

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 192
  • Total Committers: 12
  • Avg Commits per committer: 16.0
  • Development Distribution Score (DDS): 0.214
Past Year
  • Commits: 8
  • Committers: 3
  • Avg Commits per committer: 2.667
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
jstansfield0 s****c@v****u 151
Nitesh Turaga n****a@g****m 14
Mikhail Dozmorov m****v@g****m 12
Hervé Pagès h****s@f****g 3
J Wokaty j****y@s****u 2
vobencha v****n@r****g 2
vobencha v****a@g****m 2
J Wokaty j****y 2
Hervé Pagès h****b@g****m 1
cresswellkg c****g@v****u 1
LiNk-NY m****9@g****m 1
Martin Morgan m****n@f****g 1
Committer Domains (Top 20 + Academic)

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 28,409 total
  • Total dependent packages: 3
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
bioconductor.org: HiCcompare

HiCcompare: Joint normalization and comparative analysis of multiple Hi-C datasets

  • Versions: 5
  • Dependent Packages: 3
  • Dependent Repositories: 0
  • Downloads: 28,409 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 14.9%
Downloads: 44.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.4.0 depends
  • dplyr * depends
  • BiocParallel * imports
  • GenomicRanges * imports
  • IRanges * imports
  • InteractionSet * imports
  • KernSmooth * imports
  • S4Vectors * imports
  • data.table * imports
  • ggplot2 * imports
  • graphics * imports
  • gridExtra * imports
  • gtools * imports
  • methods * imports
  • mgcv * imports
  • pheatmap * imports
  • rhdf5 * imports
  • stats * imports
  • utils * imports
  • knitr * suggests
  • multiHiCcompare * suggests
  • rmarkdown * suggests
  • testthat * suggests