https://github.com/bioinfomachinelearning/hicdiff

Diffusion models for denoising Hi-C chromosome conformation capturing data

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
○
DOI references
✓
Academic publication links
Links to: ncbi.nlm.nih.gov
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.4%) to scientific vocabulary

Last synced: 10 months ago

Repository

Diffusion models for denoising Hi-C chromosome conformation capturing data

Basic Info

Host: GitHub
Owner: BioinfoMachineLearning
License: mit
Language: Python
Default Branch: main
Size: 1.52 MB

Statistics

Stars: 2
Watchers: 2
Forks: 0
Open Issues: 2
Releases: 0

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme License

README.md

# HiCDiff

Diffusion models for denoising Hi-C chromosome conformation capturing data

![showing.png](./showing.png)

Description

This repository contains the code used to run the experiments and the models described in the paper.

Hi-C datasets used in the paper

The Human cell dataset in Cooler format (GEO accession GSE130711) can be downloaded from https://salkinstitute.app.box.com/s/fp63a4j36m5k255dhje3zcj5kfuzkyj1, and more detailed Human single-cell data are available at https://salkinstitute.app.box.com/s/fp63a4j36m5k255dhje3zcj5kfuzkyj1/folder/82405563291. The Drosophila dataset in Cooler format was obtained from GEO (accession GSE131811) and can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131811.

Dependencies and Installation

HiCDiff is written in Python 3 and uses PyTorch. The dependencies can be installed with the following commands:

```bash
# Create the conda environment
conda env create -f HiCDiff.yml

# Activate the environment
conda activate HiCDiff
```
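The inference and training steps below both start by checking that this environment is active. A minimal Python sketch of such a check, assuming that `conda activate` exports the `CONDA_DEFAULT_ENV` variable (standard conda behavior) and that the environment is named `HiCDiff` as in the commands above; `hicdiff_env_active` is a hypothetical helper, not part of the repository:

```python
import os


def hicdiff_env_active(environ=None, env_name="HiCDiff"):
    """Return True if the named conda environment appears to be active.

    Heuristic: relies on CONDA_DEFAULT_ENV, which `conda activate` sets.
    This is an illustrative check, not an official HiCDiff utility.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == env_name


if __name__ == "__main__":
    if hicdiff_env_active():
        print("HiCDiff environment is active.")
    else:
        print("HiCDiff environment is not active; run: conda activate HiCDiff")
```

Passing an explicit dictionary as `environ` makes the check easy to test without touching the real shell environment.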

Preparing datasets

```bash
# First, create a folder for the datasets. The 'Datasets' folder should be at the same level as the 'TrainingYourData' folder.
mkdir -p Datasets/Human
# or
mkdir -p Datasets/Drosophila

# Second, download the dataset to Datasets/Human or Datasets/Drosophila from the links given above.

# Third, check the extension of the downloaded files. If they are not .mcool files, zoomify them to get the resolution you want.
cooler zoomify --balance filename.cool

# Fourth, rename the zoomified file following the naming pattern below.
mv filename.mcool cell1_name.mcool

# Note: you can replace the number with any integer and change 'name' as you like.
```
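Before running HiCDiff, it can help to confirm that the prepared files follow the `cell<integer>_<name>.mcool` convention described above. The sketch below assumes that pattern is exactly what the loaders expect (the README does not spell out the matching rule), and `parse_mcool_name` and `list_prepared_files` are hypothetical helpers:

```python
import re
from pathlib import Path

# Assumed naming convention from the steps above: cell<integer>_<name>.mcool
MCOOL_NAME = re.compile(r"^cell(\d+)_(.+)\.mcool$")

# Dataset folders created in the first step
SPECIES_DIRS = ("Human", "Drosophila")


def parse_mcool_name(filename):
    """Return (cell_number, name) for a conforming file name, else None."""
    match = MCOOL_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def list_prepared_files(datasets_root="Datasets"):
    """Collect conforming .mcool files under Datasets/Human and Datasets/Drosophila."""
    found = []
    for species in SPECIES_DIRS:
        folder = Path(datasets_root) / species
        if not folder.is_dir():
            continue  # folder not created for this species
        for path in sorted(folder.glob("*.mcool")):
            parsed = parse_mcool_name(path.name)
            if parsed is None:
                print(f"Skipping {path}: does not match cell<N>_<name>.mcool")
            else:
                found.append((species, parsed[0], parsed[1], path))
    return found


if __name__ == "__main__":
    for species, cell, name, path in list_prepared_files():
        print(species, cell, name, path)
```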

Inference through HiCDiff

To evaluate the model and obtain the predicted results, run HiCDiff with the following commands:

```bash
# First, check whether the environment is active; if not, activate it.
conda activate HiCDiff

# Second, run the inference script.
python inference.py -u [booleanvalue] -b [batchsize] -n [cellnumber] -l [cellline] -s [sigma]
```
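The bracketed placeholders map one-to-one onto command-line values. As an illustration, the sketch below builds the argument list for `inference.py` from Python so it can be passed to `subprocess.run`; the function name `inference_command` and the example values (batch size 64, cell 1, sigma 0.1) are hypothetical, so check `inference.py` for the values it actually accepts:

```python
import sys


def inference_command(unsupervised, batch_size, cell_number, cell_line, sigma):
    """Build the argv list for inference.py from the placeholders above.

    -u takes '1' (unsupervised model) or '0' (supervised model).
    """
    return [
        sys.executable, "inference.py",
        "-u", "1" if unsupervised else "0",
        "-b", str(batch_size),
        "-n", str(cell_number),
        "-l", cell_line,
        "-s", str(sigma),
    ]


if __name__ == "__main__":
    cmd = inference_command(False, 64, 1, "Human", 0.1)
    print(" ".join(cmd))
    # To launch it from the repository root with the HiCDiff env active:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Using a list of arguments rather than one shell string avoids quoting problems if a cell-line name ever contains spaces.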

Training HiCDiff by Yourself

To retrain HiCDiff on your own dataset, run the following commands:

```bash
# First, check whether the environment is active; if not, activate it.
conda activate HiCDiff

# Second, run the training script.
python train.py -u [booleanvalue] -e [epochnumber] -b [batchsize] -n [cellnumber] -l [cell_line] -s [sigma]
```
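For reference, the flags used by `train.py` and `inference.py` could be parsed with Python's standard `argparse` module roughly as follows. This is a hypothetical reconstruction from the option list in this README, not the repository's actual parser: flag spellings (including `--unspervised` and `--celline`) are kept as documented, only the `'Human'` default for `-l` comes from the README, and the other options are left without defaults:

```python
import argparse


def sigma_value(text):
    """Parse the noise level and enforce the documented range [0.0, 1.0]."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(f"sigma must be in [0.0, 1.0], got {value}")
    return value


def build_parser():
    """Sketch of a parser mirroring the documented HiCDiff options."""
    parser = argparse.ArgumentParser(description="HiCDiff options (sketch)")
    parser.add_argument("-u", "--unspervised", type=int, choices=[0, 1],
                        help="1 = unsupervised, 0 = supervised")
    parser.add_argument("-e", "--epoch", type=int, help="number of training epochs")
    parser.add_argument("-b", "--batch_size", type=int, help="batch size")
    parser.add_argument("-n", "--celln", type=int, help="cell number in the dataset")
    parser.add_argument("-l", "--celline", choices=["Human", "Dros"], default="Human",
                        help="cell line of the dataset")
    parser.add_argument("-s", "--sigma", type=sigma_value,
                        help="Gaussian noise level in [0.0, 1.0]")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Validating `sigma` in a `type=` callable makes `argparse` report an out-of-range value as a normal usage error instead of failing later during training.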

Optional Parameters:

```bash
-u, --unspervised  # Selects the model: '1' trains the model in an unsupervised way, '0' in a supervised way.
-e, --epoch        # Number of training epochs.
-b, --batch_size   # Batch size used by the model.
-n, --celln        # Cell number in the dataset fed to the model.
-l, --celline      # Cell line of the dataset; one of ['Human', 'Dros'], default 'Human'.
-s, --sigma        # Gaussian noise level for the raw dataset, from 0.0 to 1.0 inclusive; 1.0 adds the most noise.
```
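To make the meaning of `-s` concrete, the sketch below adds Gaussian noise of standard deviation `sigma` to a contact matrix. It is an illustration under stated assumptions, not necessarily how HiCDiff builds its noisy inputs: it assumes the matrix is square, symmetric, and normalized to [0, 1], draws noise for the upper triangle and mirrors it to preserve symmetry, and clips results back to [0, 1]. `add_gaussian_noise` is a hypothetical helper:

```python
import random


def add_gaussian_noise(matrix, sigma, seed=None):
    """Return a noisy copy of a square, symmetric contact matrix.

    Assumes entries are normalized to [0, 1]. Noise ~ N(0, sigma^2) is drawn
    for the upper triangle, mirrored to keep the matrix symmetric, and the
    result is clipped to [0, 1]. The input matrix is not modified.
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must be in [0.0, 1.0]")
    rng = random.Random(seed)
    n = len(matrix)
    noisy = [row[:] for row in matrix]
    for i in range(n):
        for j in range(i, n):
            value = matrix[i][j] + rng.gauss(0.0, sigma)
            value = min(1.0, max(0.0, value))  # clip to the normalized range
            noisy[i][j] = value
            noisy[j][i] = value
    return noisy


if __name__ == "__main__":
    contacts = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]]
    for row in add_gaussian_noise(contacts, sigma=0.1, seed=0):
        print([round(v, 3) for v in row])
```

With `sigma=0.0` the matrix comes back unchanged; larger values perturb the entries more strongly, matching the README's description of 1.0 as the largest noise level.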

Developer

Yanli Wang, Department of Computer Science, University of Missouri, Columbia, MO 65211, USA. Email: yw7bh@missouri.edu

Contact

Jianlin (Jack) Cheng, PhD, AAAS Fellow, Curators' Distinguished Professor, William and Nancy Thompson Distinguished Professor, Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA. Email: chengji@missouri.edu

License

This project is covered under the MIT License.

Reference

Yanli Wang and Jianlin Cheng. HiCDiff: single-cell Hi-C data denoising with diffusion models. Published in Briefings in Bioinformatics.

Owner

Name: BioinfoMachineLearning
Login: BioinfoMachineLearning
Kind: organization

Repositories: 29
Profile: https://github.com/BioinfoMachineLearning

GitHub Events

Total

Issues event: 1
Watch event: 1
Push event: 11
Fork event: 1

Last Year

Issues event: 1
Watch event: 1
Push event: 11
Fork event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
