bloodbased-pancancer-diagnosis
Benchmarking study of feature extraction methods for cancer diagnosis using blood-based biomarkers. Feature extraction methods are compared in terms of both their performance and their robustness.
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.0%) to scientific vocabulary
Repository
Statistics
- Stars: 2
- Watchers: 0
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Blood-based transcriptomic signature panel identification for cancer diagnosis: Benchmarking of feature extraction methods
Citation
If you use this repository, please cite our publication in Briefings in Bioinformatics: Blood-based transcriptomic signature panel identification for cancer diagnosis: benchmarking of feature extraction methods
Problem
Compare feature extraction methods for binary classification of cancer types and subtypes using blood-based biomarkers.
Approach
Build a generic pipeline that runs any biomarker dataset through multiple feature extraction methods and classification models.
Type of data used
- microRNAs from extracellular vesicles
- Total RNA from tumour-educated platelets
- microRNAs from blood
- microRNAs from serum
Pipeline
The Feature Extraction Method comparison pipeline code is made available as an R package, inside the directory FEMPipeline.
To use it in your project:
devtools::install_github("abhivij/bloodbased-pancancer-diagnosis/FEMPipeline")
Then, within R:
library(FEMPipeline)
The function to call the pipeline is execute_pipeline.
For details on its arguments, run the following within R:
?execute_pipeline
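As a complement to the help page, the argument names can also be listed programmatically. The sketch below is a minimal illustration, not part of the package: `argument_names` is a hypothetical helper, and the call on `execute_pipeline` only runs if FEMPipeline is already installed.

```r
# Hypothetical helper (not part of FEMPipeline): return the argument
# names of any R closure using base R's formals().
argument_names <- function(fn) {
  names(formals(fn))
}

# Only inspect the pipeline function when the package is available,
# so this snippet also runs on machines without FEMPipeline.
if (requireNamespace("FEMPipeline", quietly = TRUE)) {
  print(argument_names(FEMPipeline::execute_pipeline))
} else {
  message("FEMPipeline is not installed; see the install_github call above.")
}
```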
The main inputs to the pipeline are:
* Read count file in (transcripts x samples) format. Other omics datasets can also be used.
* Phenotype file: a tab-separated file with a column named 'Sample' that lists each sample in the read count file, together with the corresponding metadata, including a classification criteria column
* Classification criteria column name
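The input layout described above can be illustrated with toy data. This is a minimal sketch under stated assumptions: the sample and transcript names, the `Condition` and `Age` columns, the output file names, and the `check_inputs` helper are all hypothetical, and writing the read counts as a tab-separated file is an assumption (the README only specifies tab separation for the phenotype file). It does not call `execute_pipeline`; consult `?execute_pipeline` for the actual argument names.

```r
# Toy read counts: rows are transcripts, columns are samples (made-up names).
set.seed(1)
samples <- paste0("S", 1:6)
read_counts <- matrix(
  rpois(4 * 6, lambda = 50),
  nrow = 4,
  dimnames = list(paste0("transcript_", 1:4), samples)
)

# Phenotype table: a 'Sample' column plus metadata. 'Condition' plays the
# role of the classification criteria column (an assumed column name).
phenotype <- data.frame(
  Sample = samples,
  Condition = rep(c("Cancer", "Control"), each = 3),
  Age = c(61, 58, 70, 55, 63, 49)
)

# Hypothetical sanity check: every sample in the read counts must appear
# in the phenotype table, and the criteria column must exist.
check_inputs <- function(read_counts, phenotype, criteria_column) {
  if (!"Sample" %in% colnames(phenotype)) {
    stop("Phenotype table needs a column named 'Sample'.")
  }
  if (!criteria_column %in% colnames(phenotype)) {
    stop("Classification criteria column not found: ", criteria_column)
  }
  missing_samples <- setdiff(colnames(read_counts), phenotype$Sample)
  if (length(missing_samples) > 0) {
    stop("Samples missing from phenotype table: ",
         paste(missing_samples, collapse = ", "))
  }
  invisible(TRUE)
}

check_inputs(read_counts, phenotype, "Condition")

# Write both tables as tab-separated files in a temporary directory.
read_count_path <- file.path(tempdir(), "read_counts.txt")
phenotype_path <- file.path(tempdir(), "phenotype.txt")
write.table(read_counts, read_count_path, sep = "\t",
            quote = FALSE, col.names = NA)
write.table(phenotype, phenotype_path, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The classification criteria column name passed to the pipeline would then be `"Condition"` in this toy setup.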
Code & Directory Structure
The R script files outside the FEMPipeline directory call the FEMPipeline package for the datasets relevant to this study:
- pipeline_executor.R : starting point to call pipeline
- datasetpipelinearguments.R : list of datasets and their metadata, used by pipeline_executor.R as arguments to call pipeline
- katanascripts/ : scripts to call pipeline_executor.R on the Katana computational cluster
- data/ : contains source data, extracted data and preprocessed data
- phenotype_info/ : contains currently used phenotype files and the script used in some steps of phenotype file creation
- data_extraction/ : data extraction step in the pipeline
- results_processing/ : scripts to generate plots from results, statistically analyze results, compute the pairwise Jaccard Index, combine results, and analyze results specific to the Ranger feature selection method
- install.R : list of packages to be installed to run this pipeline
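The pairwise Jaccard Index mentioned for results_processing/ measures the overlap between two selected feature sets: the size of their intersection divided by the size of their union. The sketch below is a generic base R illustration of that measure, not the repository's own script; the function names and the feature names in the example are hypothetical.

```r
# Jaccard index of two feature sets: |A intersect B| / |A union B|.
# Returns NA when both sets are empty, since the ratio is undefined.
jaccard_index <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  if (union_size == 0) {
    return(NA_real_)
  }
  length(intersect(a, b)) / union_size
}

# Symmetric matrix of Jaccard indices for a named list of feature sets.
pairwise_jaccard <- function(feature_sets) {
  n <- length(feature_sets)
  result <- matrix(
    NA_real_, nrow = n, ncol = n,
    dimnames = list(names(feature_sets), names(feature_sets))
  )
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      result[i, j] <- jaccard_index(feature_sets[[i]], feature_sets[[j]])
    }
  }
  result
}

# Hypothetical features selected in three runs of one method.
selected <- list(
  run1 = c("miR-21", "miR-155", "miR-16"),
  run2 = c("miR-21", "miR-16", "miR-223"),
  run3 = c("miR-223")
)
jaccard_matrix <- pairwise_jaccard(selected)
jaccard_matrix["run1", "run2"]  # 2 shared features out of 4 distinct: 0.5
```

Values close to 1 across runs suggest that a method selects a stable set of features; values close to 0 suggest that the selection varies between runs.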
Owner
- Name: Abhishek Vijayan
- Login: abhivij
- Kind: user
- Location: Sydney
- Company: UNSW
- Repositories: 9
- Profile: https://github.com/abhivij
Research Associate, BABS, UNSW
Citation (CITATION.cff)
authors:
  - family-names: Vijayan
    given-names: Abhishek
cff-version: 1.0.0
message: "If you use this software, please cite both the article from preferred-citation and the software itself."
title: "Blood-based transcriptomic signature panel identification for cancer diagnosis: Benchmarking of feature extraction methods"
version: 1.0.0
doi: 10.5281/zenodo.6300985
preferred-citation:
  authors:
    - family-names: Vijayan
      given-names: Abhishek
      orcid: https://orcid.org/0000-0001-9877-9080
    - family-names: Fatima
      given-names: Shadma
      orcid: https://orcid.org/0000-0002-3583-1301
    - family-names: Sowmya
      given-names: Arcot
      orcid: https://orcid.org/0000-0001-9236-5063
    - family-names: Vafaee
      given-names: Fatemeh
      orcid: https://orcid.org/0000-0002-7521-2417
  title: "Blood-based transcriptomic signature panel identification for cancer diagnosis: benchmarking of feature extraction methods"
  type: article
  doi: 10.1093/bib/bbac315
  url: https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbac315/6658855