HiCParser

R package to parse HiC data into R

https://github.com/emaigne/hicparser



Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog Contributing Code of conduct Support

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%"
)
```

# HiCParser



[![GitHub
issues](https://img.shields.io/github/issues/emaigne/HiCParser)](https://github.com/emaigne/HiCParser/issues)
[![GitHub
pulls](https://img.shields.io/github/issues-pr/emaigne/HiCParser)](https://github.com/emaigne/HiCParser/pulls)
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![Bioc release
status](http://www.bioconductor.org/shields/build/release/bioc/HiCParser.svg)](https://bioconductor.org/checkResults/release/bioc-LATEST/HiCParser)
[![Bioc devel
status](http://www.bioconductor.org/shields/build/devel/bioc/HiCParser.svg)](https://bioconductor.org/checkResults/devel/bioc-LATEST/HiCParser)
[![Bioc downloads
rank](https://bioconductor.org/shields/downloads/release/HiCParser.svg)](http://bioconductor.org/packages/stats/bioc/HiCParser/)
[![Bioc
support](https://bioconductor.org/shields/posts/HiCParser.svg)](https://support.bioconductor.org/tag/HiCParser)
[![Bioc
history](https://bioconductor.org/shields/years-in-bioc/HiCParser.svg)](https://bioconductor.org/packages/release/bioc/html/HiCParser.html#since)
[![Bioc last
commit](https://bioconductor.org/shields/lastcommit/devel/bioc/HiCParser.svg)](http://bioconductor.org/checkResults/devel/bioc-LATEST/HiCParser/)
[![Bioc
dependencies](https://bioconductor.org/shields/dependencies/release/HiCParser.svg)](https://bioconductor.org/packages/release/bioc/html/HiCParser.html#since)
[![R-CMD-check-bioc](https://github.com/emaigne/HiCParser/actions/workflows/check-bioc.yml/badge.svg)](https://github.com/emaigne/HiCParser/actions/workflows/check-bioc.yml)
[![Codecov test
coverage](https://codecov.io/gh/emaigne/HiCParser/branch/devel/graph/badge.svg)](https://app.codecov.io/gh/emaigne/HiCParser?branch=devel)


The goal of `HiCParser` is to parse Hi-C data stored in several common formats
and import them into R as an `InteractionSet` object.

## Installation instructions

Get the latest stable `R` release from
[CRAN](http://cran.r-project.org/). Then install `HiCParser` from
[Bioconductor](http://bioconductor.org/) using the following code:

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("HiCParser")
```

You can install the development version from
[GitHub](https://github.com/emaigne/HiCParser) with:

```{r, eval = FALSE}
BiocManager::install("emaigne/HiCParser")
```

Then load the package:

```{r}
library("HiCParser")
```

## Supported formats

So far, `HiCParser` supports:

  - [cool and mcool](https://github.com/open2c/cooler) formats
  - [hic](https://github.com/aidenlab/hictools) format
  - [HiC-Pro](https://github.com/nservant/HiC-Pro) format
  - A tabular format, where

      - the first column is named "chromosome"
      - the second column is named "position 1" or "position.1"
      - the third column is named "position 2" or "position.2"
      - the fourth column is named "*x*.R*y*", and *x* is the id of the condition ("1", or "2", usually), *y* is the id of the replicate ("1", "2", "3", etc.); it should contain matrix counts
      - the remaining columns are optional, and should be formatted like the fourth column
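
The naming rules above can be checked before parsing. The following is a minimal sketch in base R; `checkTabularHeader` is a hypothetical helper that is not part of `HiCParser`, and it assumes that condition ids contain no dot and that replicate ids are numeric.

```r
# Hypothetical helper (not part of HiCParser): returns TRUE when the column
# names follow the tabular layout described above.
checkTabularHeader <- function(columnNames) {
    if (length(columnNames) < 4) {
        return(FALSE)
    }
    fixedOk <- columnNames[1] == "chromosome" &&
        columnNames[2] %in% c("position 1", "position.1") &&
        columnNames[3] %in% c("position 2", "position.2")
    # Count columns: condition id, then ".R", then replicate id ("1.R2").
    # Assumption: no dot in the condition id, numeric replicate id.
    countOk <- all(grepl("^[^.]+\\.R[0-9]+$", columnNames[-(1:3)]))
    fixedOk && countOk
}

checkTabularHeader(c("chromosome", "position 1", "position 2", "1.R1", "2.R1"))
checkTabularHeader(c("chr", "start", "end", "count"))
```

Both spellings of the position columns are accepted because `read.table()` replaces spaces with dots unless `check.names = FALSE` is set.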

## Example

### hic format

We show here how to parse one hic format file.

```{r, eval = FALSE}
hicFilePath <- system.file("extdata", "hicsample_21.hic", package = "HiCParser")
data <- parseHiC(
    paths = hicFilePath,
    binSize = 5000000,
    conditions = 1,
    replicates = 1
)
```
Note that a hic file can include several matrices, with different bin sizes.
This is why the bin size should be provided.

We show here how to parse several files (actually, the same file, 
several times). We suppose here that we have 2 conditions, with 3 replicates 
for each condition.

```{r, eval = FALSE}
data <- parseHiC(
    paths = rep(hicFilePath, 6),
    binSize = 5000000,
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
```
Currently, `HiCParser` supports the hic format up to version 9.
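
To see which condition and replicate each file receives in the multi-file call, the two vectors can be laid out side by side. This is a base R sketch; the `design` data frame is only an illustration, and it assumes that files, conditions and replicates are matched by position.

```r
# Same vectors as in the multi-file call.
conditions <- rep(seq(2), each = 3)
replicates <- rep(seq(3), 2)

# One row per input file (assumption: matching is done by position).
design <- data.frame(
    file = seq_along(conditions),
    condition = conditions,
    replicate = replicates
)
design

# Each (condition, replicate) pair should appear only once.
stopifnot(!anyDuplicated(design[, c("condition", "replicate")]))
```

Files 1 to 3 are the three replicates of condition 1, and files 4 to 6 are the three replicates of condition 2.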

### HiC-Pro format

A HiC-Pro data set consists of a matrix file and a bed file.
A different bed file can be used for each matrix file,
but a single bed file can also be shared by all matrix files.

```{r, eval = FALSE}
matrixFilePath <- 
    system.file("extdata", "hicsample_21.matrix", package = "HiCParser")
bedFilePath <- 
    system.file("extdata", "hicsample_21.bed", package = "HiCParser")
data <- parseHiCPro(
    matrixPaths = rep(matrixFilePath, 6),
    bedPaths = bedFilePath,
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
```
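
To illustrate how the two HiC-Pro files relate to each other, here is a minimal base R sketch. It assumes the usual HiC-Pro layout: a bed file with chromosome, start, end and bin identifier, and a matrix file with two bin identifiers and a count. The toy values are invented, and this is not how `parseHiCPro` is implemented.

```r
# Toy data, invented for illustration only.
bedLines <- c(
    "chr21\t0\t5000000\t1",
    "chr21\t5000000\t10000000\t2",
    "chr21\t10000000\t15000000\t3"
)
matrixLines <- c("1\t1\t120", "1\t2\t35", "2\t3\t18")

bins <- read.table(
    text = bedLines, sep = "\t",
    col.names = c("chromosome", "start", "end", "bin")
)
counts <- read.table(
    text = matrixLines, sep = "\t",
    col.names = c("bin1", "bin2", "count")
)

# Replace each bin identifier by the start coordinate of that bin.
interactions <- data.frame(
    chromosome = bins$chromosome[match(counts$bin1, bins$bin)],
    position1 = bins$start[match(counts$bin1, bins$bin)],
    position2 = bins$start[match(counts$bin2, bins$bin)],
    count = counts$count
)
interactions
```

The bed file only defines the bins, which is why one bed file can serve several matrix files built with the same bin size.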

### cool and mcool formats

Please note that the cool and mcool formats store data in HDF5 format.
The [rhdf5 package](https://bioconductor.org/packages/release/bioc/html/rhdf5.html)
is not installed by default, because it takes a substantial time to
compile, and many users will not need the cool/mcool parser.
To use the cool/mcool parser, you should therefore install the `rhdf5`
package yourself.

A cool file includes only one bin size.

```{r, eval = FALSE}
# rhdf5 is distributed through Bioconductor, not CRAN.
if (!requireNamespace("rhdf5", quietly = TRUE)) {
    BiocManager::install("rhdf5")
}
coolFilePath <- system.file("extdata",
    "hicsample_21.cool",
    package = "HiCParser"
)
data <- parseCool(
    paths = rep(coolFilePath, 6),
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
```

An mcool file may include several bin sizes,
so the `binSize` parameter must be provided.
The same function is used for both the cool and mcool formats.

```{r, eval = FALSE}
mcoolFilePath <- system.file("extdata",
    "hicsample_21.mcool",
    package = "HiCParser"
)
data <- parseCool(
    paths = rep(mcoolFilePath, 6),
    binSize = 5000000,
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
```
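
If you do not know which bin sizes an mcool file contains, they can be listed with `rhdf5`. The sketch below is a hypothetical helper, not part of `HiCParser`; it assumes the standard multi-resolution cooler layout, where each matrix is stored in a group named `/resolutions/<binSize>`.

```r
# Hypothetical helper (not part of HiCParser): lists the bin sizes stored
# in an mcool file, assuming the "/resolutions/<binSize>" layout.
listMcoolBinSizes <- function(path) {
    if (!requireNamespace("rhdf5", quietly = TRUE)) {
        stop("The 'rhdf5' package is required.")
    }
    if (!file.exists(path)) {
        stop("File not found: ", path)
    }
    # List the first two levels of the HDF5 hierarchy only.
    content <- rhdf5::h5ls(path, recursive = 2)
    binSizes <- content$name[content$group == "/resolutions"]
    sort(as.numeric(binSizes))
}

# Usage, with the path defined in the mcool example:
# listMcoolBinSizes(mcoolFilePath)
```

Any value returned by this helper could then be passed as `binSize` to `parseCool`.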

### Tabular format

A tabular file is a tab-separated multi-replicate sparse matrix with a header:

```
chromosome    position 1    position 2    C1.R1    C1.R2    C1.R3    ...
Y             1500000       7500000       145      184      72       ...
```

The number of interactions between `position 1` and `position 2` of
`chromosome` is reported in each `condition.replicate` column. There is no
limit to the number of conditions and replicates.

To load Hi-C data in this format:

```{r tabFormat, eval = FALSE}
hic.experiment <- parseTabular(
    system.file("extdata",
        "hicsample_21.tsv",
        package = "HiCParser"
    ),
    sep = "\t"
)
```
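
To prepare a file in this layout from your own counts, a minimal sketch using base R only is given below. The counts and the object names are invented for illustration, and the count columns follow the `x.Ry` naming described in the list of supported formats.

```r
# Toy counts, invented for illustration only.
# Positions are stored as integers, so they are written as plain digits.
counts <- data.frame(
    chromosome = c("Y", "Y"),
    position1 = c(1500000L, 1500000L),
    position2 = c(7500000L, 9000000L),
    c1r1 = c(145L, 12L),
    c1r2 = c(184L, 9L)
)
# Rename the columns to match the tabular layout.
colnames(counts) <- c("chromosome", "position 1", "position 2", "1.R1", "1.R2")

tabFilePath <- tempfile(fileext = ".tsv")
write.table(counts, tabFilePath, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the header back to check the layout.
readLines(tabFilePath, n = 1)
```

The resulting path could then be given to `parseTabular` with `sep = "\t"`.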

## Output

The output is an [InteractionSet](https://bioconductor.org/packages/release/bioc/html/InteractionSet.html) object.
This object can store one or several samples.
Please read the [corresponding vignette](https://bioconductor.org/packages/devel/bioc/vignettes/InteractionSet/inst/doc/interactions.html) to learn more about this format.

```{r}
library("HiCParser")
hicFilePath <- system.file("extdata", "hicsample_21.hic", package = "HiCParser")
hic.experiment <- parseHiC(
    paths = rep(hicFilePath, 6),
    binSize = 5000000,
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
hic.experiment
```

The conditions and replicates are reported in the `colData` slot:

```{r}
SummarizedExperiment::colData(hic.experiment)
```

They correspond to the columns of the `assays` matrix (which contains the interaction values):

```{r}
head(SummarizedExperiment::assay(hic.experiment))
```

The positions of interactions are in the `interactions` slot of the object:

```{r}
InteractionSet::interactions(hic.experiment)
```


## Citation

Below is the citation output from using `citation('HiCParser')` in R.
Please run this yourself to check for any updates on how to cite
**HiCParser**.

To cite the ‘HiCParser’ package in a publication, use:

  Maigné E, Zytnicki M (2024). _A multiple format Hi-C data parser_.
  doi:10.18129/B9.bioc.HiCParser,
  https://github.com/emaigne/HiCParser/HiCParser - R package version 0.1.0.

As a BibTeX entry:

    @Manual{hicparser,
      title = {A multiple format Hi-C data parser},
      author = {Elise Maigné and Matthias Zytnicki},
      year = {2024},
      url = {http://www.bioconductor.org/packages/HiCParser},
      note = {https://github.com/emaigne/HiCParser/HiCParser - R package version 0.1.0},
      doi = {10.18129/B9.bioc.HiCParser},
    }

Please note that `HiCParser` was only made possible thanks to many
other R and bioinformatics software authors, who are cited in
the vignettes and/or the paper(s) describing this package.

## Code of Conduct

Please note that the `HiCParser` project is released with a [Contributor
Code of Conduct](http://bioconductor.org/about/code-of-conduct/). By
contributing to this project, you agree to abide by its terms.

## Development tools

- Continuous code testing is possible thanks to [GitHub
  actions](https://www.tidyverse.org/blog/2020/04/usethis-1-6-0/)
  through *[usethis](https://CRAN.R-project.org/package=usethis)*,
  *[remotes](https://CRAN.R-project.org/package=remotes)*, and
  *[rcmdcheck](https://CRAN.R-project.org/package=rcmdcheck)* customized
  to use [Bioconductor’s docker
  containers](https://www.bioconductor.org/help/docker/) and
  *[BiocCheck](https://bioconductor.org/packages/3.17/BiocCheck)*.
- Code coverage assessment is possible thanks to
  [codecov](https://codecov.io/gh) and
  *[covr](https://CRAN.R-project.org/package=covr)*.
- The [documentation website](http://emaigne.github.io/HiCParser) is
  automatically updated thanks to
  *[pkgdown](https://CRAN.R-project.org/package=pkgdown)*.
- The code is styled automatically thanks to
  *[styler](https://CRAN.R-project.org/package=styler)*.
- The documentation is formatted thanks to
  *[devtools](https://CRAN.R-project.org/package=devtools)* and
  *[roxygen2](https://CRAN.R-project.org/package=roxygen2)*.

For more details, check the `dev` directory.

This package was developed using
*[biocthis](https://bioconductor.org/packages/3.17/biocthis)*.

Owner

  • Name: Elise Maigné
  • Login: emaigne
  • Kind: user
  • Company: INRAE, FR

I'm more active here: https://forgemia.inra.fr/elise.maigne

GitHub Events

Total
  • Push event: 32
Last Year
  • Push event: 32

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 86
  • Total Committers: 2
  • Avg Commits per committer: 43.0
  • Development Distribution Score (DDS): 0.244
Past Year
  • Commits: 86
  • Committers: 2
  • Avg Commits per committer: 43.0
  • Development Distribution Score (DDS): 0.244
Top Committers
Name Email Commits
Elise Maigné e****e@i****r 65
mzytnicki 3****i 21

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 1,385 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
bioconductor.org: HiCParser

Parser for HiC data in R

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,385 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 30.3%
Average: 40.9%
Downloads: 92.4%
Maintainers (1)
Last synced: 10 months ago

Dependencies

.github/workflows/check-bioc.yml actions
  • JamesIves/github-pages-deploy-action releases/v4 composite
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/upload-artifact master composite
  • docker/build-push-action v4 composite
  • docker/login-action v2 composite
  • docker/setup-buildx-action v2 composite
  • docker/setup-qemu-action v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
DESCRIPTION cran
  • Rcpp >= 1.0.12 imports
  • BiocStyle * suggests
  • RefManageR * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • sessioninfo * suggests
  • testthat >= 3.0.0 suggests