neuco-bench
Welcome to NeuCo-Bench, a benchmarking framework for evaluating compressed embeddings on downstream tasks.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: ieee.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 15.1%, to scientific vocabulary)
Repository
Basic Info
Statistics
- Stars: 3
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
NeuCo-Bench
Licence: Apache-2.0
Originally developed to evaluate challenge submissions for the 2025 EARTHVISION Challenge at CVPR (competition details), NeuCo-Bench is now released for local benchmarking and evaluation.
NeuCo-Bench is a benchmarking framework designed to evaluate how effectively compressed embeddings preserve information for downstream tasks.
In domains like Earth Observation (EO), pipelines typically handle large volumes of image data used primarily for analytical tasks. Traditional compression techniques focus on pixel-level reconstruction, while Foundation Model (FM) research does not explicitly consider embedding size. NeuCo-Bench addresses this gap by enforcing strict size constraints and evaluating embeddings directly on real-world EO tasks.
NeuCo-Bench provides an initial set of EO tasks and invites community contributions of additional tasks and datasets from EO and other domains.
Key Features
- Model-agnostic: Supports evaluation of any fixed-size embedding (e.g. 1024‑dim feature vectors), which enables comparison among compression and representation learning methods.
- Task-Driven Evaluation: Utilizes linear probes across diverse EO tasks, including land-cover proportion estimation, cloud detection, and biomass estimation.
- Metrics: Incorporates signal-to-noise scores and dynamic rank aggregation to compare methods.
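NeuCo-Bench's exact signal-to-noise formula lives in its evaluation code; as a rough illustration only (this is an assumption, not the repository's definition), a common signal-to-noise aggregate divides the mean cross-validation fold score by its standard deviation, rewarding methods that score both high and consistently:

```python
import numpy as np

def signal_to_noise(fold_scores):
    """Illustrative signal-to-noise aggregate over cross-validation folds:
    mean fold score divided by its sample standard deviation.
    The metric actually used by NeuCo-Bench may differ; see the
    repository's evaluation code for the authoritative definition."""
    scores = np.asarray(fold_scores, dtype=float)
    return scores.mean() / scores.std(ddof=1)

# A stable method (low variance across folds) gets a high score.
snr = signal_to_noise([0.81, 0.79, 0.83, 0.80, 0.82])
```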
Quickstart
```bash
# start from a fresh environment (skip if not needed)
micromamba create -n neuco-bench -c conda-forge python=3.12
micromamba activate neuco-bench

# clone NeuCo-Bench and install requirements
git clone https://github.com/embed2scale/NeuCo-Bench.git
cd NeuCo-Bench/benchmark
pip install -r ../requirements.txt

# run the standalone NeuCo-Bench evaluation script
python main.py \
    --annotation_path path/to/annotation_folder \
    --submission_file path/to/submission_file.csv \
    --output_dir path/to/results \
    --config path/to/config.yaml \
    --method_name your-method-name \
    --phase phase-name
```
- `--annotation_path`: Directory containing CSV label files for each task.
- `--submission_file`: CSV file with your embeddings.
- `--output_dir`: Destination for per-task reports, plots, and aggregated benchmark results.
- `--config`: YAML file specifying cross-validation settings and logging options (see provided sample).
- `--method_name`: Identifier for your method, used in filenames and leaderboard entries.
- `--phase`: Groups evaluation runs under a specified phase name for ranking, creating a subfolder within `output_dir`.
To disable GPU utilization, set `CUDA_VISIBLE_DEVICES=''` before running the script.
Overview
NeuCo-Bench emphasizes task-oriented semantic evaluation rather than pixel-level reconstruction, measuring how effectively compressed embeddings retain information relevant to EO tasks.
To evaluate embeddings:
1. Download the SSL4EO-S12-downstream dataset from Hugging Face (see Data).
2. Encode images into fixed-size embeddings, save as CSV (see Creating Embeddings).
3. Run NeuCo-Bench locally to evaluate and aggregate scores, generating a leaderboard (see Evaluation and Ranking).
Data
The SSL4EO-S12-downstream dataset includes:
- `data/`: Subfolders for the modalities (`s1/`, `s2l1c/`, `s2l2a/`), each split into subsets of 1000 `zarr.zip` files.
- `labels/`: Annotation files for each downstream task.
Both `data/` and `labels/` are required. See `examples/data` for a TorchDataset loader; if you experience data-loading errors, verify that `zarr==2.18.0` is used.
Data format aligns with SSL4EOS12 v1.1, recommended as a pretraining dataset.
Creating Embeddings
Generate embeddings and save them as CSV files. Example scripts in `examples/` illustrate the required format and provide two baseline methods: an averaging baseline (bilinear interpolation and averaging of the modalities) and downsampled embeddings from a pretrained FM (DINO ViT pretrained on SSL4EO).
To ensure consistent benchmarking, all methods should use the same embedding dimension. We fixed the embedding size at 1024 dimensions for the 2025 CVPR EARTHVISION data challenge.
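As a sketch of what producing a submission might look like (the `average_baseline` function, the id/column layout, and the `submission_file.csv` name are illustrative assumptions; the authoritative format and baselines are in the `examples/` scripts):

```python
import numpy as np
import pandas as pd

EMBED_DIM = 1024  # dimension used in the 2025 CVPR EARTHVISION challenge

def average_baseline(image, embed_dim=EMBED_DIM):
    """Toy averaging baseline: flatten an image cube (C, H, W) and
    mean-pool it into embed_dim equal chunks. Illustration only;
    the repo's examples/ scripts define the actual baselines."""
    flat = np.asarray(image, dtype=np.float32).ravel()
    # Pad with zeros so the flattened pixels split evenly into embed_dim chunks.
    pad = (-flat.size) % embed_dim
    flat = np.pad(flat, (0, pad))
    return flat.reshape(embed_dim, -1).mean(axis=1)

# One embedding per row: a sample id followed by 1024 values.
# Column names and the exact schema are assumptions; check examples/.
rng = np.random.default_rng(0)
rows = {f"sample_{i}": average_baseline(rng.normal(size=(4, 32, 32)))
        for i in range(3)}
df = pd.DataFrame.from_dict(rows, orient="index")
df.index.name = "id"
df.to_csv("submission_file.csv")
```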
For reference, we provide a selection of CSV files from the 2025 CVPR EARTHVISION data challenge in the repo's top-level `data/` directory. More details in `data/README.md`.
The https://github.com/embed2scale/NeuCo-Bench/tree/main/data folder is tracked by Git LFS to keep initial clones of this repo slim. If you would like to download the approximately 500 MB of embeddings, run:
```bash
git lfs install
git pull
```
Evaluation and Ranking
Run the benchmark on your embeddings with:
```bash
python main.py \
    --annotation_path path/to/annotation_folder \
    --submission_file path/to/submission_file.csv \
    --output_dir path/to/results \
    --config path/to/config.yaml \
    --method_name "your-method-name" \
    --phase "phase-name"
```
Configuration
A sample config file (benchmark/config.yaml) specifies:
- `batch_size`, `epochs`, `learning_rate`, `k_folds`: Cross-validation settings.
- `standardize_embeddings`: Standardize embeddings using global mean and std (recommended).
- `normalize_labels`: Normalize target labels to [0, 1].
- `enable_plots`: Generate per-fold plots (e.g., parity plots for regression).
- `update_leaderboard`: Aggregate and update the leaderboard after evaluation.
- `task_filter`: Tasks to evaluate (default: all tasks available in `annotation_path`).
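A minimal sketch of what such a config file might look like, assuming the keys listed above; the values are placeholders, not the repository's defaults:

```yaml
# Illustrative config.yaml; keys follow the options documented above,
# values are placeholders rather than NeuCo-Bench's shipped defaults.
batch_size: 64
epochs: 20
learning_rate: 0.001
k_folds: 5
standardize_embeddings: true   # recommended: global mean/std standardization
normalize_labels: true         # scale targets to [0, 1]
enable_plots: true             # e.g. parity plots for regression folds
update_leaderboard: false      # set true on the final run to aggregate
task_filter: []                # empty = evaluate all tasks in annotation_path
```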
Results
Results saved under `output_dir/<phase-name>/` include:
- Task-specific metrics and loss curves
- `results_summary.json` with per-task signal-to-noise scores and overall scores
Aggregation
Aggregate scores for the leaderboard by setting `update_leaderboard` to `True` during the last evaluation, or run manually:
```python
from evaluation.results import summarize_runs

summarize_runs(output_dir=output_dir, phase=phase)
```
Future Work & Contributing
All downstream tasks and labels are published on Hugging Face. We plan to extend the framework to further tasks (e.g., spatial and temporal downstream tasks).
We invite the community to collaborate and appreciate contributions, including but not limited to:
- Benchmarking and contributing new compression techniques
- Incorporating additional downstream tasks and metrics
- Extending the framework to further input modalities
Check out CONTRIBUTING.md.
Owner
- Name: embed2scale
- Login: embed2scale
- Kind: organization
- Repositories: 1
- Profile: https://github.com/embed2scale
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Wittmann
given-names: Isabelle
orcid: https://orcid.org/0009-0005-2137-6167
- family-names: Vinge
given-names: Rikard
orcid: https://orcid.org/0000-0002-7306-3403
- family-names: Albrecht
given-names: Conrad M.
orcid: https://orcid.org/0009-0009-2422-7289
- family-names: Schneider
given-names: Jannik
title: "NeuCo-Bench"
version: 1.0
date-released: 2025-05-12
url: https://github.com/embed2scale/benchmark
GitHub Events
Total
- Issues event: 1
- Watch event: 9
- Delete event: 3
- Public event: 1
- Push event: 13
- Pull request review event: 1
- Pull request event: 6
- Fork event: 1
- Create event: 4
Last Year
- Issues event: 1
- Watch event: 9
- Delete event: 3
- Public event: 1
- Push event: 13
- Pull request review event: 1
- Pull request event: 6
- Fork event: 1
- Create event: 4
Dependencies
- PyYAML ==6.0.2
- matplotlib ==3.10.1
- numpy ==2.2.5
- pandas ==2.2.3
- scikit_learn ==1.6.1
- scipy ==1.15.2
- timm ==1.0.15
- torch ==2.6.0
- torchgeo ==0.7.0
- torchmetrics ==1.7.1
- torchvision ==0.21.0
- tqdm ==4.67.1
- xarray ==2024.3.0