https://github.com/bblodfon/pdac-efs-bench2024

Feature Selection benchmark and analysis on multi-omics PDAC datasets

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository


Basic Info

Host: GitHub
Owner: bblodfon
License: mit
Language: R
Default Branch: main
Homepage:
Size: 95 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed 10 months ago

Metadata Files

Readme License

pdac-efs-bench2024

This repository contains code and data for benchmarking multi-omics feature selection methods in pancreatic ductal adenocarcinoma (PDAC), with a focus on survival prediction.

Citation

Zobolas, J., George, A.-M., López, A., Fischer, S., Becker, M., & Aittokallio, T. (2025). Optimizing Prognostic Biomarker Discovery in Pancreatic Cancer Through Hybrid Ensemble Feature Selection and Multi-Omics Data. https://arxiv.org/pdf/2509.02648

Environment Setup

Restore the R libraries required to run the benchmark using renv:

renv::restore(exclude = c("BiocGenerics", "BiocManager", "BiocVersion", "ComplexHeatmap", "IRanges", "S4Vectors", "MASS", "ipred", "class"))

Data and Preprocessing

We use 3 multi-omics PDAC datasets. Preprocessing steps include:

Patient and feature filtering
Metadata curation
Resampling definition for benchmark
Creation of mlr3 survival tasks per omic data type

All related code and metadata are found in the data/ directory.
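As an illustration of the "resampling definition" step, the following base R sketch draws repeated subsampling splits with a fixed seed. It is not the code from the data/ directory: the patient count and the 80/20 train/test ratio are assumptions made for the example, and only the number of iterations (100) comes from the README.

```r
# Sketch: define repeated subsampling splits for a benchmark (base R only).
# n_patients and train_ratio are illustrative assumptions.
set.seed(42)
n_patients  <- 50
n_iters     <- 100
train_ratio <- 0.8

n_train <- round(train_ratio * n_patients)
splits <- lapply(seq_len(n_iters), function(i) {
  train <- sort(sample.int(n_patients, size = n_train))
  list(
    rsmp_id = i,
    train   = train,
    test    = setdiff(seq_len(n_patients), train)
  )
})

# Each iteration partitions the patients into disjoint train/test sets
str(splits[[1]])
```

Fixing the splits once and reusing them for every omic and every method keeps the comparisons paired, since all methods then see the same training and test patients in each iteration.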

Multi-omics Benchmark Overview

The benchmarking pipeline consists of three main steps, located in the bench/ directory.

Hybrid Ensemble Feature Selection (hEFS)

Script: bench/run_efs.sh. Runs the Ensemble Feature Selection procedure (bench/efs.R) across:

All datasets
All omics
All resampling iterations

Results are stored as EnsembleFSResult objects in bench/efs/. Contact the main author to obtain these intermediate results.
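The general idea behind ensemble feature selection can be sketched in base R: collect the feature subsets chosen by several learners over several resampling iterations, then rank features by how often they were selected. This is a toy illustration, not the logic of bench/efs.R. The feature names, learner labels and the frequency-based ranking rule are all assumptions.

```r
# Toy feature subsets, one per (learner, resampling iteration) run.
# All names below are hypothetical.
runs <- list(
  list(learner = "learner_a", iter = 1, features = c("g1", "g2", "g3")),
  list(learner = "learner_a", iter = 2, features = c("g1", "g3")),
  list(learner = "learner_b", iter = 1, features = c("g1", "g4")),
  list(learner = "learner_b", iter = 2, features = c("g1", "g2"))
)

# Selection frequency: fraction of runs in which each feature was selected
all_feats <- unlist(lapply(runs, function(r) unique(r$features)))
freq <- table(all_feats) / length(runs)

# Rank features from most to least frequently selected
ranking <- sort(freq, decreasing = TRUE)
ranking_df <- data.frame(
  feature = names(ranking),
  score   = as.numeric(ranking),
  stringsAsFactors = FALSE
)
print(ranking_df)
```

Here g1 is selected in every run and therefore ranks first with a score of 1.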

Omic-wise Feature Selection

Script: bench/run_fs.sh. Performs feature selection per omic and per subsampling iteration (100 total).

Two methods are available:

Cox Lasso
Pre-computed Ensemble Feature Selection (loaded from step 1). We automatically select the number of features via the Pareto front method.
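One common way to pick a number of features from a Pareto front is to take its "knee", the point after which adding features yields little further gain. The base R sketch below shows that idea on made-up numbers. The README only names "the Pareto front method", so the knee rule used here (largest distance from the line joining the two ends of the front) is an assumption and may differ from what the benchmark does.

```r
# Toy results: number of features vs. performance score.
# Values are made up for illustration.
res <- data.frame(
  n_features = c(1, 2, 3, 5, 8, 10, 15, 20),
  score      = c(0.55, 0.62, 0.60, 0.70, 0.72, 0.71, 0.73, 0.73)
)

# Pareto front: keep a point only if it beats every point that uses
# fewer (or equally many) features
res <- res[order(res$n_features, -res$score), ]
prev_best <- c(-Inf, head(cummax(res$score), -1))
front <- res[res$score > prev_best, ]

# Normalise both axes to [0, 1] so they are comparable
# (assumes the front has at least two points)
x <- (front$n_features - min(front$n_features)) / diff(range(front$n_features))
y <- (front$score - min(front$score)) / diff(range(front$score))

# After scaling, the front runs from (0, 0) to (1, 1); the knee is the
# point farthest from that diagonal
dist_to_diag <- abs(y - x) / sqrt(2)
n_selected <- front$n_features[which.max(dist_to_diag)]
n_selected  # 5
```

On this toy data the front contains the points with 1, 2, 5, 8 and 15 features, and the knee falls at 5 features.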

Output: bench/fs.rds - a table with:

dataset_id, omic_id, rsmp_id
Selected features for each method

Multi-omics Integration & Benchmarking

Script: bench/run_mm_bench.R. Combines the selected features across omics (via late integration/fusion) per subsampling iteration, then trains and evaluates survival models on the training/test splits. Available models are Cox Proportional Hazards, Cox Lasso, Random Survival Forests and BlockForest.
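The feature-combination step can be pictured with a small base R sketch that merges the per-omic selections into one feature set per subsampling iteration. The table below is a toy stand-in, not bench/fs.rds. The list column name `selected`, the omic labels and the omic-prefix naming scheme are hypothetical.

```r
# Toy stand-in for a per-omic selection table; the list column name
# `selected` and all values are hypothetical.
fs <- data.frame(
  dataset_id = "toy",
  omic_id    = c("mrna", "mirna", "mrna", "mirna"),
  rsmp_id    = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)
fs$selected <- list(
  c("geneA", "geneB"), c("mir1"),
  c("geneA"),          c("mir1", "mir2")
)

# Prefix features with their omic so names stay unique after merging
fs$selected <- unname(Map(
  function(omic, feats) paste(omic, feats, sep = "."),
  fs$omic_id, fs$selected
))

# One combined feature set per (dataset, subsampling iteration)
key <- paste(fs$dataset_id, fs$rsmp_id, sep = "|")
combined <- lapply(split(seq_len(nrow(fs)), key), function(idx) {
  unique(unlist(fs$selected[idx], use.names = FALSE))
})
print(combined)
```

A survival model would then be trained on the columns named in each combined set, using the training patients of the matching iteration.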

Output: bench/result.rds - a table with:

dataset_id: Identifier of the dataset used
fs_method_id: Feature selection method applied
rsmp_id: Subsampling (resampling) iteration identifier
model_data_config: Configuration used for model training, indicating the model type and which omics and/or clinical data were included (all means clinical + all omics)
task_nfeats: Number of selected features used in the task
task_feats: The specific features selected
Performance scores for the test sets (Harrell's C-index, etc.)
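For readers unfamiliar with Harrell's C-index, here is a minimal base R sketch of how it can be computed for right-censored data. It is a quadratic-time illustration on made-up values, not the scoring code used by the benchmark, and it ignores pairs with tied times.

```r
# Harrell's C-index for right-censored data (base R, O(n^2) sketch).
# time:   observed follow-up times
# status: 1 = event observed, 0 = censored
# risk:   predicted risk score (higher = earlier event expected)
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (status[i] != 1) next            # pair must start from an observed event
    for (j in seq_len(n)) {
      if (time[i] < time[j]) {          # i had the event before j's observed time
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # tied risk scores count as half
        }
      }
    }
  }
  concordant / comparable
}

# Toy data (made up)
time   <- c(5, 8, 12, 20, 25)
status <- c(1, 1, 0, 1, 0)
risk   <- c(0.9, 0.4, 0.6, 0.3, 0.1)
harrell_c(time, status, risk)  # 0.875
```

In this example 7 of the 8 comparable pairs are ordered correctly by the risk score. A value of 0.5 corresponds to random ordering and 1 to perfect ranking.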

Owner

Name: John Zobolas
Login: bblodfon
Kind: user

Repositories: 13
Profile: https://github.com/bblodfon

GitHub Events

Total

Watch event: 1
Member event: 1
Push event: 118
Public event: 1

Last Year

Watch event: 1
Member event: 1
Push event: 118
Public event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
