aric

Accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data

https://github.com/xwanglabthu/aric

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

- CITATION.cff file: not found
- codemeta.json file: found
- .zenodo.json file: found
- DOI references: not found
- Academic publication links: found (links to nature.com)
- Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- Institutional organization owner: no
- JOSS paper metadata: not found
- Scientific vocabulary similarity: low (12.4%)

Keywords

deconvolution methylation rna-seq

Last synced: 10 months ago

Repository

Accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data

Basic Info

Host: GitHub
Owner: XWangLabTHU
License: gpl-3.0
Language: Python
Default Branch: main
Homepage: https://xwanglabthu.github.io/ARIC/
Size: 820 KB

Statistics

Stars: 2
Watchers: 1
Forks: 3
Open Issues: 0
Releases: 1

Topics

deconvolution methylation rna-seq

Created about 5 years ago · Last pushed over 2 years ago

Metadata Files

Readme License

ARIC

Section 1: Introduction
Section 2: Installation Tutorial
- Section 2.1: System requirement
- Section 2.2: Installation
Section 3: A Quick Tutorial for Demo data Deconvolution
- Section 3.1: Quick Start
- Section 3.2: Function Introduction
Section 4: Applications on TCGA Ovarian Cancer
- Section 4.1: Deconvolution for All Patients
- Section 4.2: Survival Analysis
Section 5: Computational Efficiency Comparison
Citation

Section 1: Introduction

ARIC is a bioinformatics software for deconvolving bulk gene expression and DNA methylation data. ARIC uses a novel two-step marker selection strategy, consisting of component-wise condition number-based elimination of collinear features and adaptive removal of outlier markers. This strategy systematically obtains effective markers that ensure robust and precise weighted ν-SVR-based rare proportion prediction.
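The collinearity-elimination idea can be illustrated with a simplified sketch. It is not ARIC's implementation: instead of ARIC's component-wise condition number, it greedily drops whichever marker (row) most lowers the ordinary 2-norm condition number of the whole reference matrix, and stops when no single removal helps. The function name `greedy_condition_number_filter` and the stopping rule are assumptions made for illustration.

```python
import numpy as np


def greedy_condition_number_filter(ref, min_markers=None):
    """Hypothetical, simplified collinearity filter (not ARIC's algorithm).

    ref: (n_markers, n_cell_types) reference signature matrix.
    Repeatedly removes the marker whose removal gives the lowest condition
    number of the remaining matrix, while that removal improves it.
    Returns the indices of the retained markers.
    """
    n_markers, n_types = ref.shape
    if min_markers is None:
        min_markers = n_types  # keep at least as many markers as cell types
    keep = list(range(n_markers))
    current = np.linalg.cond(ref[keep])
    while len(keep) > min_markers:
        # condition number after removing each remaining marker in turn
        scores = [np.linalg.cond(ref[[k for k in keep if k != drop]]) for drop in keep]
        best = int(np.argmin(scores))
        if scores[best] >= current:
            break  # no single removal improves the conditioning
        current = scores[best]
        del keep[best]
    return keep
```

For example, with rows `[1, 0]`, `[0, 1]` and `[10, 0.01]`, the third row is nearly parallel to the first and is the one dropped, leaving `[0, 1]`. Recomputing the condition number after every removal is what makes this kind of selection slower than a single matrix operation.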

Section 2: Installation Tutorial

Section 2.1: System requirement

ARIC is implemented in Python and can be installed on Windows, UNIX/Linux, and macOS. ARIC requires Python 3 or later, and all dependencies are installed automatically by pip.

Section 2.2: Installation

ARIC can be installed from PyPI with the following command. The source code can also be downloaded from PyPI.

```Shell
pip install ARIC
```

Section 3: A Quick Tutorial for Demo data Deconvolution

In this section, we will demonstrate how to perform bulk data deconvolution using the demo data.

Section 3.1: Quick Start

We provide a small demo dataset here. It consists of two main files in CSV format: one holds the mixture (bulk) data and the other holds the external reference data. Just pass the file paths to the function "ARIC", and the program does everything else.

```python
from ARIC import *

ARIC(mix_path="mix.csv", ref_path="ref.csv")
```
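The README only says that both inputs must be CSV files with column names and row names. The sketch below writes toy input files, assuming a CIBERSORT LM22-style layout: markers (genes or CpG sites) as rows, with cell types as columns in the reference and samples as columns in the mixture. The function `write_demo_csvs`, the cell-type names, and the noise-free mixing are hypothetical choices for illustration only.

```python
import csv
import os
import tempfile

import numpy as np


def write_demo_csvs(folder, n_markers=20, n_samples=4, seed=0):
    """Write hypothetical toy ref.csv and mix.csv files into `folder`.

    Assumed layout: first row = column names, first column = row names;
    ref.csv is markers x cell types, mix.csv is markers x samples.
    Returns the true proportions (cell types x samples).
    """
    rng = np.random.default_rng(seed)
    cell_types = ["TypeA", "TypeB", "TypeC"]  # hypothetical cell-type names
    markers = [f"marker{i}" for i in range(n_markers)]
    samples = [f"sample{j}" for j in range(n_samples)]
    ref = rng.uniform(0.0, 10.0, size=(n_markers, len(cell_types)))
    # each sample's proportions are drawn from a flat Dirichlet and sum to 1
    props = rng.dirichlet(np.ones(len(cell_types)), size=n_samples).T
    mix = ref @ props  # noise-free linear mixtures
    for name, cols, values in (("ref.csv", cell_types, ref), ("mix.csv", samples, mix)):
        with open(os.path.join(folder, name), "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow([""] + cols)  # header row with column names
            for marker, row in zip(markers, values):
                writer.writerow([marker] + [f"{v:.6f}" for v in row])
    return props


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_demo_csvs(tmp)
        print(sorted(os.listdir(tmp)))
        # With ARIC installed, the files could then be passed as in the Quick Start:
        # from ARIC import *
        # ARIC(mix_path=os.path.join(tmp, "mix.csv"), ref_path=os.path.join(tmp, "ref.csv"))
```

Because the mixtures are generated without noise, the returned proportions give a known ground truth to compare against the estimated proportion file.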

Section 3.2: Function Introduction

The main function of the package is "ARIC".

```Python
ARIC(mix_path, ref_path, save_path=None, marker_path=None, selected_marker=False, scale=0.1, delcol_factor=10, iter_num=10, confidence=0.75, w_thresh=10, unknown=False, is_methylation=False)
```

- 'mix_path': Path to the mixture data. Must be a CSV file with column names and row names.
- 'ref_path': Path to the reference data. Must be a CSV file with column names and row names.
- 'save_path': Where to save the deconvolution results. Default: <mix_path prefix>_prop.csv.
- 'marker_path': Path to user-specified markers. Must be a CSV file. Default: None.
- 'selected_marker': Whether to output the markers selected for every sample. The marker files ("sample_marker.csv") are saved in a folder named after the input mixture file. Default: False.
- 'scale': Controls the convergence of the SVR; a smaller value makes convergence much faster. Default: 0.1.
- 'delcol_factor': Controls the extent of collinearity removal. Default: 10.
- 'iter_num': Number of outlier-detection iterations. Default: 10.
- 'confidence': Fraction of markers retained in each outlier-detection loop. Default: 0.75.
- 'w_thresh': Threshold used to cut off the designed weights. Default: 10.
- 'unknown': Whether to estimate the proportion of unknown content. Default: False.
- 'is_methylation': Whether the input is DNA methylation data. If True, a preliminary marker selection is performed. Default: False.
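To make the roles of 'iter_num' and 'confidence' more concrete, here is a hedged sketch of an iterative trimming loop. It assumes that in each round the 'confidence' fraction is applied to the markers left from the previous round, and that markers with the largest fitting residuals are treated as outliers. Ordinary least squares (with negative coefficients clipped and the result renormalized) stands in for ARIC's weighted ν-SVR, and the function name `trimmed_least_squares` is hypothetical.

```python
import numpy as np


def trimmed_least_squares(mix, ref, iter_num=10, confidence=0.75, min_markers=None):
    """Hypothetical illustration of iterative outlier-marker trimming.

    mix: (n_markers,) bulk profile of one sample.
    ref: (n_markers, n_cell_types) reference matrix.
    Ordinary least squares stands in for ARIC's weighted nu-SVR.
    Returns (proportions, indices of the markers kept in the final fit).
    """
    n_types = ref.shape[1]
    if min_markers is None:
        min_markers = n_types  # never trim below a square system
    keep = np.arange(ref.shape[0])
    for _round in range(iter_num):
        coef = np.linalg.lstsq(ref[keep], mix[keep], rcond=None)[0]
        resid = np.abs(mix[keep] - ref[keep] @ coef)
        # retain the `confidence` fraction of current markers with the smallest residuals
        n_keep = max(min_markers, int(np.ceil(confidence * keep.size)))
        if n_keep >= keep.size:
            break  # nothing left to trim
        keep = keep[np.argsort(resid)[:n_keep]]
    coef = np.linalg.lstsq(ref[keep], mix[keep], rcond=None)[0]
    prop = np.clip(coef, 0.0, None)  # proportions cannot be negative
    total = prop.sum()
    return (prop / total if total > 0 else prop), keep
```

Under this assumption the marker set shrinks geometrically: starting from 40 markers with the defaults, the rounds keep 30, 23, 18, and so on, so 'iter_num' and 'confidence' together bound how aggressively markers are discarded.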

Section 4: Applications on TCGA Ovarian Cancer

In this part, we demonstrate how to use ARIC to classify ovarian cancer patients. Users can follow the instructions below to reproduce the results in our article.

Ovarian cancer patient data with survival information can be downloaded directly from LinkedOmics. The LM22 reference data can be downloaded from CIBERSORT. The survival information is stored in the file "Human__TCGA_OV__MS__Clinical__Clinical__01_28_2016__BI__Clinical__Firehose.tsi".

We provide the scaled data and survival information here.

Section 4.1: Deconvolution for All Patients

First, put "mix_scaled.csv" and "ref_scaled.csv" in your working folder.

```Python
from ARIC import *

ARIC(mix_path="mix_scaled.csv", ref_path="ref_scaled.csv", save_path="ov_ARIC.csv", selected_marker=True)
```

Then wait for the deconvolution to finish:

```
--------------WELCOME TO ARIC----------------

Data reading finished!
ARIC Engines Start, Please Wait......
100%|█████████████████████████████████████████████████████████████| 514/514 [01:14<00:00, 6.89it/s]
Deconvo Results Saving!
Finished!
```

There will be two main outputs. The first is the estimated proportion file "ov_ARIC.csv". The second is a folder named "mix_scaled" (the same name as the input mixture file), in which all the markers selected by ARIC for each sample are saved.

Section 4.2: Survival Analysis

Then we perform survival analysis with the R packages "survival" and "survminer".

```R
library(survival)
library(survminer)
library(tidyr)
library(gridExtra)

# import survival information
sur_info <- read.table(file = "Human__TCGA_OV__MS__Clinical__Clinical__01_28_2016__BI__Clinical__Firehose.tsi",
                       header = TRUE, row.names = 1)
tmp_rowname <- rownames(sur_info)

data <- read.csv(file = "ov_ARIC.csv", header = TRUE, row.names = 1)

selected_celltype <- c("T.cells.CD8", "T.cells.gamma.delta", "Macrophages.M1",
                       "NK.cells.resting", "NK.cells.activated")

# sum the proportions of the selected immune cell types for each patient
data <- data[selected_celltype, ]
data <- colSums(x = data)
prop_median <- median(data)

# patients at or below the median are labeled high risk, the rest low risk
high_risk <- names(data)[which(data <= prop_median)]
low_risk <- names(data)[which(data > prop_median)]

label <- rep(x = "tumor", times = ncol(sur_info))
names(label) <- colnames(sur_info)
idx_high <- which(names(label) %in% high_risk)
label[idx_high] <- "high"
idx_low <- which(names(label) %in% low_risk)
label[idx_low] <- "low"

# append the risk label as a new row of the clinical table
sur_info <- rbind(sur_info, label)
rownames(sur_info) <- c(tmp_rowname, "risk")

# keep deconvolved patients only and transpose to one row per patient
sur_info <- as.data.frame(t(sur_info[, which(colnames(sur_info) %in% names(data))]))

sur_info <- drop_na(data = sur_info, c("overall_survival", "status"))
sur_info <- transform(sur_info, overall_survival = as.numeric(overall_survival))
sur_info <- transform(sur_info, status = as.numeric(status))

fit <- survfit(Surv(overall_survival, status) ~ risk, data = sur_info)
ggsurvplot(fit, pval = TRUE, conf.int = TRUE)

res_cox <- coxph(Surv(overall_survival, status) ~ risk, data = sur_info)
summary(res_cox)$conf.int
```
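For readers working in Python, the risk-grouping step of the R script (summing the selected immune-cell proportions per patient and splitting at the median) can be sketched as follows. It assumes the proportion table is already loaded as an array with cell types as rows and patients as columns, matching how the R code indexes "ov_ARIC.csv"; the function name `median_risk_groups` is hypothetical.

```python
import numpy as np


def median_risk_groups(props, cell_types, selected):
    """Label each patient "high" or "low" risk by a median split.

    props: (n_cell_types, n_patients) proportion array whose rows follow `cell_types`.
    selected: cell types whose proportions are summed per patient.
    Patients at or below the median sum are "high" risk, as in the R script.
    """
    rows = [cell_types.index(ct) for ct in selected]
    immune = props[rows].sum(axis=0)  # summed proportions of the selected cell types
    cutoff = np.median(immune)
    return np.where(immune <= cutoff, "high", "low")
```

With an even number of patients, the median is the mean of the two middle values, so the split yields two groups of equal size whenever the sums are distinct.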

This produces the survival curve and the hazard ratio shown below.

| | exp(coef) | exp(-coef) | lower .95 | upper .95 |
|---|---|---|---|---|
| risklow | 0.7424766 | 1.346844 | 0.593249 | 0.9292413 |
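The columns are related by simple arithmetic. Because "high" is the reference level of the risk factor, exp(coef) is the hazard ratio of the low-risk group relative to the high-risk group, and exp(-coef) is its reciprocal (high relative to low). A quick check using only the numbers printed above:

```python
import math

# Values copied from the summary(res_cox)$conf.int output above.
hr_low_vs_high = 0.7424766  # exp(coef) for "risklow"
ci_lower, ci_upper = 0.593249, 0.9292413

coef = math.log(hr_low_vs_high)  # Cox regression coefficient (log hazard ratio)
hr_high_vs_low = math.exp(-coef)  # equals 1 / hr_low_vs_high, the exp(-coef) column
ci_excludes_one = ci_upper < 1.0 or ci_lower > 1.0

print(f"coef = {coef:.4f}")
print(f"HR (high vs low) = {hr_high_vs_low:.6f}")
print(f"95% CI excludes 1: {ci_excludes_one}")
```

Since the whole 95% interval lies below 1, the lower hazard of the low-risk group is significant at the 5% level; equivalently, the high-risk group has roughly a 35% higher hazard.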

[Figure: survival curves of OV patients stratified by ARIC-predicted proportions]

Section 5: Computational Efficiency Comparison

Computational efficiency is largely determined by the number of markers. Therefore, we compared the computation time across different methods and different marker numbers.

We generated in silico mixed gene expression data with different marker numbers (100, 500, 1000, 2000, 5000, 7000 and 10000). To obtain reliable results, we generated 10 datasets of 50 samples each for every marker number, compared the mean computation time for 50 samples, and summarized the results in the following table.

[Table: Computational Efficiency Comparison]
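A minimal timing harness in the spirit of this benchmark is sketched below. It assumes each method can be called as `deconvolve(mix, ref)` on a whole dataset and reports the mean wall-clock time per dataset of 50 samples for each marker number. The simulation settings (five cell types, Dirichlet proportions, Gaussian noise) and the placeholder `lstsq_deconvolve` are hypothetical; the placeholder is not one of the compared methods.

```python
import time

import numpy as np


def mean_runtime(deconvolve, marker_numbers, n_datasets=10, n_samples=50, n_types=5, seed=0):
    """Mean wall-clock seconds per simulated dataset, for each marker number."""
    rng = np.random.default_rng(seed)
    results = {}
    for n_markers in marker_numbers:
        times = []
        for _ in range(n_datasets):
            ref = rng.uniform(0.0, 10.0, size=(n_markers, n_types))
            props = rng.dirichlet(np.ones(n_types), size=n_samples).T  # types x samples
            mix = ref @ props + rng.normal(0.0, 0.1, size=(n_markers, n_samples))
            start = time.perf_counter()
            deconvolve(mix, ref)  # only the deconvolution call is timed
            times.append(time.perf_counter() - start)
        results[n_markers] = float(np.mean(times))
    return results


def lstsq_deconvolve(mix, ref):
    """Hypothetical placeholder method: least squares for all samples at once."""
    return np.linalg.lstsq(ref, mix, rcond=None)[0]


if __name__ == "__main__":
    print(mean_runtime(lstsq_deconvolve, [100, 500, 1000]))
```

Timing only the deconvolution call keeps data simulation out of the measurement, so methods are compared on the same inputs.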

ARIC recomputes the component-wise condition number after removing each collinear marker, so its computation time is longer than that of matrix operation-based methods such as dtangle and DeconRNASeq. ARIC, EPIC and FARDEEP have comparable computational efficiency. In addition, the computation time of CIBERSORT grows drastically as the number of markers increases. We therefore strongly recommend filtering out low-quality markers before deconvolution.
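The README does not define what counts as a low-quality marker. One possible pre-filter, purely an assumption for illustration, drops markers with low mean signal in the reference and then keeps only the markers that vary most across cell types. The function `filter_markers` and its thresholds are hypothetical.

```python
import numpy as np


def filter_markers(ref, min_mean=1.0, top_n=None):
    """Hypothetical marker pre-filter on a (n_markers, n_cell_types) reference.

    1. Drop markers whose mean signal across cell types is below `min_mean`.
    2. Optionally keep the `top_n` remaining markers with the largest
       variance across cell types (the most cell-type-specific ones).
    Returns the sorted indices of the retained markers.
    """
    means = ref.mean(axis=1)
    idx = np.flatnonzero(means >= min_mean)
    if top_n is not None and top_n < idx.size:
        var = ref[idx].var(axis=1)
        idx = idx[np.argsort(var)[::-1][:top_n]]  # highest variance first
        idx.sort()  # restore the original marker order
    return idx
```

The retained indices can be used to subset both the reference and the mixture rows before deconvolution, which shrinks the marker count that drives the run time discussed above.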

Citation

Zhang, Wei, Hanwen Xu, Rong Qiao, Bixi Zhong, Xianglin Zhang, Jin Gu, Xuegong Zhang, Lei Wei, and Xiaowo Wang. "ARIC: accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data." Briefings in Bioinformatics 23, no. 1 (2022): bbab362.

Owner

Name: XWangLabTHU
Login: XWangLabTHU
Kind: organization

Repositories: 4
Profile: https://github.com/XWangLabTHU

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 18
Total Committers: 2
Avg Commits per committer: 9.0
Development Distribution Score (DDS): 0.111

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

| Name | Email | Commits |
|---|---|---|
| ZweiTHU | s**2@g**m | 16 |
| Honchkrow | zw@s****n | 2 |

Committer Domains (Top 20 + Academic)

sdu.edu.cn: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 2
Total pull requests: 0
Average time to close issues: 3 days
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 1.5
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

ryrl9703 (2)


Packages

Total packages: 1
Total downloads:
- pypi 10 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

pypi.org: aric

ARIC: Accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data

Homepage: https://xwanglabthu.github.io/ARIC/
Documentation: https://aric.readthedocs.io/
License: GPL V3
Latest release: 1.0.1
published over 2 years ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 10 Last month

Rankings

Dependent packages count: 10.0%

Forks count: 15.3%

Stargazers count: 31.9%

Average: 40.4%

Dependent repos count: 67.6%

Downloads: 77.3%

Maintainers (1)

Shaway

Last synced: 11 months ago

Dependencies

setup.py pypi

numpy *
