srfd-bayes
[Nature Communications] Tumor fractions deciphered from circulating cell-free DNA methylation for cancer early diagnosis
Repository
Statistics
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Tumor fractions deciphered from circulating cell-free DNA methylation for cancer early diagnosis
Official MATLAB code for the paper "Tumor fractions deciphered from circulating cell-free DNA methylation for cancer early diagnosis".
Prerequisites
MATLAB R2018b
Experimental settings
The experiments consist of two parts, deconvolution and diagnosis, each performed on both the simulation dataset and the real dataset.
In the deconvolution step, semi-reference-free deconvolution (SRFD) is first applied to cfDNA methylation data to obtain a reference database, which is then used to deconvolve the test samples and recover their fraction vectors. To estimate the tumor fractions of real samples, the reference database learned from the simulation dataset is used directly to deconvolve real cfDNA methylation data from cancer patients.
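The second half of the deconvolution step, recovering a fraction vector once a reference database is available, can be illustrated with a generic constrained least-squares fit. The sketch below is written in Python for illustration only and is not the SRFD algorithm or the repository's MATLAB code. It assumes that the reference database is a matrix with one column per component, that a sample is a non-negative mixture of those columns, and that the fractions sum to 1. The names project_to_simplex, deconvolve, reference and sample are hypothetical, and the toy numbers are made up.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {t : t >= 0, sum(t) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    shift = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        candidate = (cumulative - 1.0) / i
        if ui - candidate > 0:
            shift = candidate  # keep the shift from the last index that qualifies
    return [max(vi - shift, 0.0) for vi in v]


def deconvolve(reference, sample, steps=500):
    """Estimate mixing fractions by projected gradient descent.

    reference: list of rows (one per methylation marker), each row holding
               one value per component (assumed layout, not from the paper).
    sample:    list of observed methylation levels, one per marker.
    """
    n_components = len(reference[0])
    # 1 / ||R||_F^2 is a conservative step size for the least-squares gradient.
    step = 1.0 / sum(r * r for row in reference for r in row)
    theta = [1.0 / n_components] * n_components
    for _ in range(steps):
        residual = [sum(r * t for r, t in zip(row, theta)) - x
                    for row, x in zip(reference, sample)]
        gradient = [sum(row[c] * res for row, res in zip(reference, residual))
                    for c in range(n_components)]
        theta = project_to_simplex([t - step * g for t, g in zip(theta, gradient)])
    return theta


# Toy data: 4 markers, 2 components, sample mixed with fractions 0.3 and 0.7.
reference = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9]]
true_theta = [0.3, 0.7]
sample = [sum(r * t for r, t in zip(row, true_theta)) for row in reference]
estimate = deconvolve(reference, sample)
print(estimate)
```

Projecting onto the simplex after every gradient step keeps the estimate interpretable as fractions. How SRFD itself learns the reference database is described in the paper and is not reproduced here.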
In the diagnosis step, a diagnostic prior is first obtained from machine-learning-based classifiers, and a conditional probability distribution is then computed from the tumor components in each test sample's fraction vector. The prior and the conditional probability distribution are combined to make the final Bayesian diagnostic decision.
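Combining a prior with a conditional probability follows Bayes' rule: the posterior for each class is proportional to the prior times the likelihood of the observation under that class. The Python sketch below shows only that combination step and is not the repository's MATLAB code. The README does not say how the conditional probabilities are derived from the tumor components, so they are passed in as plain numbers. The function names, class labels and probabilities are all hypothetical.

```python
def bayes_posterior(prior, likelihood):
    """Return the normalized posterior: prior[c] * likelihood[c] / evidence."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    if evidence == 0:
        raise ValueError("prior and likelihood have no overlapping support")
    return [u / evidence for u in unnormalized]


def diagnose(prior, likelihood, labels):
    """Pick the label with the highest posterior probability."""
    posterior = bayes_posterior(prior, likelihood)
    best = max(range(len(labels)), key=lambda i: posterior[i])
    return labels[best], posterior


# Hypothetical three-class example.
labels = ["normal", "cancer_A", "cancer_B"]
prior = [0.5, 0.3, 0.2]        # assumed classifier output for one sample
likelihood = [0.1, 0.6, 0.3]   # assumed P(tumor fraction | class)
decision, posterior = diagnose(prior, likelihood, labels)
print(decision, posterior)
```

In this toy case the classifier alone would favor the first class, but the likelihood term shifts the decision to the second, which is the kind of correction the combined decision is meant to allow.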
Usage
Run main.m in MATLAB.
File description
The 'data' directory contains two types of datasets: a simulation dataset and a real dataset.
- simulation_dataset
The simulation data contains four datasets with CNV event probabilities of 0%, 10%, 30%, and 50%. Each dataset consists of 2400 training samples and 2400 test samples, with 400 samples per category in each split.
train_data is a $Ks \times N$ matrix, where $N=2400$ is the number of samples and $Ks=350$ is the number of methylation levels ($\beta$ values) per sample. test_data is likewise a $Ks \times N$ matrix. train_theta and test_theta hold the simulated tumor fractions of the training and test samples, respectively.
- real_dataset
The real data mainly contains three datasets: chip-based methylation data from GSE122126, GSE108462, and GSE129374; the Xu et al. data; and the Chen et al. data. The chip-based data is stored in the file validationrealdata.mat.
References
If you find this work or code useful, please cite this study. If you have any questions about this code, please contact zhouxiao17@mails.tsinghua.edu.cn
Zhou X, Cheng Z, Dong M, et al. Tumor fractions deciphered from circulating cell-free DNA methylation for cancer early diagnosis[J]. Nature Communications, 2022, 13(1): 1-13.
Owner
- Login: Astaxanthin
- Kind: user
- Repositories: 2
- Profile: https://github.com/Astaxanthin
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Zhou
    given-names: Xiao
    orcid: https://orcid.org/0000-0001-5121-5640
title: Astaxanthin/SRFD-Bayes: 1.0.0
version: 1.0.0
date-released: 2022-11-24