https://github.com/bblodfon/pdac-efs-bench2024
Feature Selection benchmark and analysis on multi-omics PDAC datasets
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
pdac-efs-bench2024
This repository contains code and data for benchmarking multi-omics feature selection methods in pancreatic ductal adenocarcinoma (PDAC), with a focus on survival prediction.
Citation
Zobolas, J., George, A.-M., López, A., Fischer, S., Becker, M., & Aittokallio, T. (2025). Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data. https://arxiv.org/pdf/2509.02648
Environment Setup
Restore the R packages required to run the benchmark with renv (packages listed in exclude are skipped during the restore):
renv::restore(exclude = c("BiocGenerics", "BiocManager", "BiocVersion", "ComplexHeatmap", "IRanges", "S4Vectors", "MASS", "ipred", "class"))
Data and Preprocessing
We use 3 multi-omics PDAC datasets. Preprocessing steps include:
- Patient and feature filtering
- Metadata curation
- Resampling definition for benchmark
- Creation of mlr3 survival tasks per omic data type
All related code and metadata are found in the data/ directory.
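As a rough illustration of the resampling definition step, the sketch below draws repeated train/test subsamples over patient indices in base R. The helper name make_subsampling, the 80/20 split ratio, the patient count, and the seed are assumptions made for illustration; only the 100 iterations come from this README, and the repository's own definitions in data/ may differ.

```r
# Hypothetical helper: define repeated subsampling splits over patient indices.
# `ratio` (share of patients used for training) is an assumed value.
make_subsampling = function(n, repeats = 100, ratio = 0.8, seed = 42) {
  set.seed(seed)  # note: this sets the global RNG state
  lapply(seq_len(repeats), function(i) {
    train = sort(sample.int(n, size = round(ratio * n)))
    list(train = train, test = setdiff(seq_len(n), train))
  })
}

splits = make_subsampling(n = 100)
length(splits)        # number of subsampling iterations
lengths(splits[[1]])  # train / test sizes of the first iteration
```

Storing the splits once and reusing them across all methods keeps the comparison between feature selection methods paired on identical train/test partitions.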
Multi-omics Benchmark Overview
The benchmarking pipeline consists of three main steps, located in the bench/ directory.
Hybrid Ensemble Feature Selection (hEFS)
Script: bench/run_efs.sh. Runs the Ensemble Feature Selection procedure (bench/efs.R) across:
- All datasets
- All omics
- All resampling iterations
Stores results as EnsembleFSResult objects in bench/efs/.
These intermediate results are not included in the repository; contact the main author to request them.
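To give a feel for what an ensemble of feature selection runs produces, the sketch below ranks features by how often they were selected across several runs. This is a deliberately simplified frequency-based vote in base R, not the procedure implemented in bench/efs.R; the feature names and the helper rank_by_frequency are hypothetical.

```r
# Toy feature sets, e.g. one per learner and resampling iteration (hypothetical).
feature_sets = list(
  c("g1", "g2", "g5"),
  c("g1", "g5"),
  c("g2", "g5", "g7"),
  c("g1", "g5", "g7")
)

# Rank features by the fraction of runs in which they were selected.
rank_by_frequency = function(sets) {
  counts = table(unlist(lapply(sets, unique)))
  freq = as.numeric(counts) / length(sets)
  names(freq) = names(counts)
  sort(freq, decreasing = TRUE)
}

ranking = rank_by_frequency(feature_sets)
ranking
```

Features that appear near the top of such a ranking are the ones selected consistently, which is the stability property an ensemble approach is meant to provide.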
Omic-wise Feature Selection
Script: bench/run_fs.sh. Performs feature selection per omic and per subsampling iteration (100 total).
Two methods are available:
- Cox Lasso
- Pre-computed Ensemble Feature Selection (loaded from step 1). The number of features is selected automatically via the Pareto front method.
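The idea behind choosing the number of features from a Pareto front can be sketched in base R as follows. The code builds an empirical front over (number of features, score) pairs and picks a knee point as the front point farthest from the line joining the two extremes. This is a generic heuristic written for illustration; the function names, the toy numbers, and the exact knee criterion are assumptions and may not match what the benchmark scripts do.

```r
# Empirical Pareto front: keep a point only if its score is strictly higher
# than that of every point with fewer (or equally many) features.
pareto_front = function(n_features, score) {
  ord = order(n_features, -score)
  nf = n_features[ord]
  sc = score[ord]
  keep = sc > cummax(c(-Inf, sc[-length(sc)]))
  data.frame(n_features = nf[keep], score = sc[keep])
}

# Knee point: after scaling both axes to [0, 1], the front runs from (0, 0)
# to (1, 1); pick the point with the largest distance to the line y = x.
knee_point = function(front) {
  if (nrow(front) < 3) return(front[1, ])  # no knee to find; fall back to fewest features
  x = (front$n_features - min(front$n_features)) / diff(range(front$n_features))
  y = (front$score - min(front$score)) / diff(range(front$score))
  front[which.max(abs(y - x) / sqrt(2)), ]
}

# Toy values (hypothetical): score could be a C-index estimated on validation data.
n_features = c(1, 2, 3, 5, 8, 10, 4)
score = c(0.55, 0.62, 0.70, 0.72, 0.73, 0.735, 0.60)

front = pareto_front(n_features, score)
knee = knee_point(front)
knee$n_features
```

In this toy example the 4-feature point is dominated by the 3-feature point and drops off the front; the knee then marks where adding more features stops paying off in score.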
Output: bench/fs.rds, a table with:
- dataset_id, omic_id, rsmp_id
- Selected features for each method
Multi-omics Integration & Benchmarking
Script: bench/run_mm_bench.R. Combines selected features across omics (via late integration/fusion) per subsampling iteration, then trains and evaluates survival models on training/test splits.
Available models are Cox Proportional Hazards, Cox Lasso, Random Survival Forests and BlockForest.
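One plausible way to assemble a multi-omics design matrix from per-omic selected features is shown below in base R. The omic names, feature names, and the helper combine_selected are hypothetical, and the sketch only illustrates the column-binding step for patients shared across omics; how bench/run_mm_bench.R actually performs the integration is not shown in this README.

```r
# Toy per-omic matrices (rows = patients, columns = features); all names are hypothetical.
set.seed(1)
patients = paste0("P", 1:5)
omics = list(
  mrna   = matrix(rnorm(15), nrow = 5, dimnames = list(patients, c("g1", "g2", "g3"))),
  methyl = matrix(rnorm(10), nrow = 5, dimnames = list(patients, c("cpg1", "cpg2")))
)

# Features selected per omic for one subsampling iteration (hypothetical).
selected = list(mrna = c("g1", "g3"), methyl = "cpg2")

# Keep only the selected columns of each omic, prefix them with the omic id
# to avoid name clashes, and bind them side by side.
combine_selected = function(omics, selected) {
  parts = lapply(names(selected), function(id) {
    x = omics[[id]][, selected[[id]], drop = FALSE]
    colnames(x) = paste(id, colnames(x), sep = ".")
    x
  })
  do.call(cbind, parts)
}

combined = combine_selected(omics, selected)
colnames(combined)
```

The resulting matrix, together with survival time and status, would then be split by the predefined train/test indices before fitting any of the models listed above.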
Output: bench/result.rds, a table with:
- dataset_id: Identifier of the dataset used
- fs_method_id: Feature selection method applied
- rsmp_id: Subsampling (resampling) iteration identifier
- model_data_config: Configuration used for model training, indicating the model type and which omics and/or clinical data were included (all means clinical + all omics)
- task_nfeats: Number of selected features used in the task
- task_feats: The specific features selected
- Performance scores for the test sets (Harrell's C-index, etc.)
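For readers unfamiliar with Harrell's C-index, the following base R sketch computes it directly from its definition on a toy test set. It is a minimal illustration that ignores tied event times and is not the scoring code used by the benchmark; the function name harrell_c and the data are made up for the example.

```r
# Harrell's C-index: among comparable pairs (patient i has an observed event
# strictly before patient j's time), the fraction where i also has the higher
# predicted risk. Ties in predicted risk count as half-concordant.
harrell_c = function(time, status, risk) {
  concordant = 0
  comparable = 0
  n = length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (status[i] == 1 && time[i] < time[j]) {
        comparable = comparable + 1
        if (risk[i] > risk[j]) {
          concordant = concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant = concordant + 0.5
        }
      }
    }
  }
  concordant / comparable
}

# Toy test set: status 1 = event observed, 0 = censored.
time   = c(2, 4, 6, 8)
status = c(1, 1, 0, 1)
risk   = c(0.9, 0.3, 0.7, 0.4)

cindex = harrell_c(time, status, risk)
cindex
```

Here five pairs are comparable and three of them are ordered correctly by the predicted risk, giving a C-index of 0.6; a value of 0.5 corresponds to random ordering and 1 to perfect ranking.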
Owner
- Name: John Zobolas
- Login: bblodfon
- Kind: user
- Repositories: 13
- Profile: https://github.com/bblodfon
GitHub Events
Total
- Watch event: 1
- Member event: 1
- Push event: 118
- Public event: 1
Last Year
- Watch event: 1
- Member event: 1
- Push event: 118
- Public event: 1