neuco-bench

Welcome to NeuCo-Bench, a benchmarking framework for evaluating compressed embeddings on downstream tasks.

https://github.com/embed2scale/neuco-bench

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: ieee.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: embed2scale
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.63 MB
Statistics
  • Stars: 3
  • Watchers: 4
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License Citation

README.md

NeuCo-Bench

License: Apache-2.0

Originally developed to evaluate challenge submissions for the 2025 EARTHVISION Challenge at CVPR (competition details), NeuCo-Bench is now released for local benchmarking and evaluation.

NeuCo-Bench is a benchmarking framework designed to evaluate how effectively compressed embeddings preserve information for downstream tasks.

In domains like Earth Observation (EO), pipelines typically handle large volumes of image data used primarily for analytical tasks. Traditional compression techniques focus on pixel-level reconstruction, while Foundation Model (FM) research does not explicitly consider embedding size. NeuCo-Bench addresses this gap by enforcing strict size constraints and evaluating embeddings directly on real-world EO tasks.

NeuCo-Bench provides an initial set of EO tasks and invites community contributions of additional tasks and datasets from EO and other domains.

Framework overview

Key Features

  • Model-agnostic: Supports evaluation of any fixed-size embedding (e.g. 1024‑dim feature vectors), which enables comparison among compression and representation learning methods.
  • Task-Driven Evaluation: Utilizes linear probes across diverse EO tasks, including land-cover proportion estimation, cloud detection, and biomass estimation.
  • Metrics: Incorporates signal-to-noise scores and dynamic rank aggregation to compare methods.
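The linear-probe evaluation above can be sketched in a few lines. This is an illustrative, numpy-only version: the fold layout, the closed-form least-squares probe, and the mean/std "signal-to-noise" aggregate are simplifying assumptions for exposition, not NeuCo-Bench's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 "embeddings" of dimension 16 and a linear-ish target.
X = rng.normal(size=(200, 16))
w_true = rng.normal(size=16)
y = X @ w_true + 0.1 * rng.normal(size=200)

def linear_probe_cv(X, y, k_folds=5):
    """Fit a least-squares linear probe per fold; return held-out R^2 per fold."""
    idx = np.arange(len(X))
    folds = np.array_split(idx, k_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        # Closed-form least squares with an appended bias column.
        X_tr = np.c_[X[train_idx], np.ones(len(train_idx))]
        X_te = np.c_[X[test_idx], np.ones(len(test_idx))]
        w, *_ = np.linalg.lstsq(X_tr, y[train_idx], rcond=None)
        pred = X_te @ w
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)

scores = linear_probe_cv(X, y)
# Illustrative signal-to-noise style aggregate: mean over fold variability.
snr = scores.mean() / (scores.std() + 1e-9)
```

The point of scoring across folds rather than on a single split is that an embedding which only sometimes supports a good probe gets penalized by its fold-to-fold variability.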

Quickstart

```bash
# start from a fresh environment (skip if not needed)
micromamba create -n neuco-bench -c conda-forge python=3.12
micromamba activate neuco-bench

# clone NeuCo-Bench and install requirements
git clone https://github.com/embed2scale/NeuCo-Bench.git
cd NeuCo-Bench/benchmark
pip install -r ../requirements.txt

# run the standalone NeuCo-Bench evaluation script
python main.py \
  --annotation_path path/to/annotation_folder \
  --submission_file path/to/submission_file.csv \
  --output_dir path/to/results \
  --config path/to/config.yaml \
  --method_name your-method-name \
  --phase phase-name
```

  • --annotation_path Directory containing CSV label files for each task.
  • --submission_file CSV file with your embeddings.
  • --output_dir Destination for per-task reports, plots, and aggregated benchmark results.
  • --config YAML file specifying cross-validation settings and logging options (see provided sample).
  • --method_name Identifier for your method used in filenames and leaderboard entries.
  • --phase Groups evaluation runs under a specified phase name for ranking, creating a subfolder within output_dir.

To disable GPU utilization, set `CUDA_VISIBLE_DEVICES=''` before execution (e.g. prefix the command with `CUDA_VISIBLE_DEVICES=''`).

Overview

NeuCo-Bench emphasizes task-oriented semantic evaluation rather than pixel-level reconstruction, measuring how effectively compressed embeddings retain information relevant to EO tasks.

To evaluate embeddings:

1. Download the SSL4EO-S12-downstream dataset from Hugging Face (see Data).
2. Encode images into fixed-size embeddings and save them as CSV (see Creating Embeddings).
3. Run NeuCo-Bench locally to evaluate and aggregate scores, generating a leaderboard (see Evaluation and Ranking).


Data

The SSL4EO-S12-downstream dataset includes:

  • data/
    Subfolders for modalities (s1/, s2l1c/, s2l2a/) with subsets of 1000 zarr.zip files each.
  • labels/
    Annotation files for each downstream task.

Both data/ and labels/ are required. See examples/data for a TorchDataset loader; if you experience data-loading errors, verify that zarr==2.18.0 is used.

The data format aligns with SSL4EO-S12 v1.1, which we recommend as a pretraining dataset.


Creating Embeddings

Generate embeddings and save them as CSV files. Example scripts in examples/ illustrate the required format and provide two baseline methods: an averaging baseline (bilinear interpolation and averaging of the modalities) and downsampled embeddings from a pretrained FM (DINO ViT pretrained on SSL4EO).

To ensure consistent benchmarking, all methods should use the same embedding dimension; we fixed it at 1024 for the 2025 CVPR EARTHVISION data challenge. As a reference, we provide a selection of CSV files from that challenge in the repo's top-level data/ directory (more details in data/README.md). The https://github.com/embed2scale/NeuCo-Bench/tree/main/data folder is tracked by Git LFS to keep initial clones of this repo slim. If you would like to download the approx. 500 MB of embeddings, run:

```bash
git lfs install
git pull
```
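A submission CSV can be sketched as below using only the standard library. The `id` column name, the `dim_i` headers, and the sample identifiers are hypothetical placeholders; check the scripts in examples/ for the authoritative column layout.

```python
import csv
import random

random.seed(0)
EMBED_DIM = 1024  # embedding dimension used in the 2025 challenge

# Hypothetical layout: one row per sample, an id column followed by
# 1024 embedding values. Verify against examples/ before submitting.
with open("submission_file.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id"] + [f"dim_{i}" for i in range(EMBED_DIM)])
    for sample_id in ["sample_0000", "sample_0001"]:
        row = [round(random.gauss(0.0, 1.0), 6) for _ in range(EMBED_DIM)]
        writer.writerow([sample_id] + row)
```

Keeping every method at the same fixed dimension is what makes the scores comparable across compression approaches.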


Evaluation and Ranking

Run the benchmark on your embeddings with:

```bash
python main.py \
  --annotation_path path/to/annotation_folder \
  --submission_file path/to/submission_file.csv \
  --output_dir path/to/results \
  --config path/to/config.yaml \
  --method_name "your-method-name" \
  --phase "phase-name"
```

Configuration

A sample config file (benchmark/config.yaml) specifies:

  • batch_size, epochs, learning_rate, k_folds: Cross-validation settings.
  • standardize_embeddings: Standardize embeddings using global mean and std (recommended).
  • normalize_labels: Normalize target labels to [0, 1].
  • enable_plots: Generate per-fold plots (e.g., parity plots for regression).
  • update_leaderboard: Aggregate and update leaderboard after evaluation.
  • task_filter: Tasks to evaluate (default: all tasks available in annotation_path).
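Put together, a config.yaml covering the options above might look like the following. The values are illustrative assumptions, not the repository defaults; see benchmark/config.yaml for the authoritative sample.

```yaml
# Hypothetical NeuCo-Bench config; values are placeholders.
batch_size: 64
epochs: 20
learning_rate: 0.001
k_folds: 5
standardize_embeddings: true   # recommended: global mean/std standardization
normalize_labels: true         # map targets to [0, 1]
enable_plots: false
update_leaderboard: true
task_filter: []                # empty = all tasks found in annotation_path
```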

Results

Results saved under output_dir/<phase-name>/ include:

  • Task-specific metrics and loss curves
  • results_summary.json with per-task signal-to-noise scores and overall scores

Aggregation

Aggregate scores for the leaderboard by setting update_leaderboard to True during the last evaluation run, or aggregate manually with:

```python
from evaluation.results import summarize_runs

summarize_runs(output_dir=output_dir, phase=phase)
```


Future Work & Contributing

All downstream tasks and labels are published on Hugging Face. We plan to extend the framework with further tasks (e.g. spatial and temporal downstream tasks).

We invite the community to collaborate and appreciate contributions, including but not limited to the following:

  • Benchmark and contribute new compression techniques
  • Incorporate additional downstream tasks and metrics
  • Extend the framework to further input modalities

Check out CONTRIBUTING.md.

Owner

  • Name: embed2scale
  • Login: embed2scale
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Wittmann
    given-names: Isabelle
    orcid: https://orcid.org/0009-0005-2137-6167
  - family-names: Vinge
    given-names: Rikard
    orcid: https://orcid.org/0000-0002-7306-3403
  - family-names: Albrecht
    given-names: Conrad M.
    orcid: https://orcid.org/0009-0009-2422-7289
  - family-names: Schneider
    given-names: Jannik
title: "NeuCo-Bench"
version: 1.0
date-released: 2025-05-12
url: https://github.com/embed2scale/benchmark

GitHub Events

Total
  • Issues event: 1
  • Watch event: 9
  • Delete event: 3
  • Public event: 1
  • Push event: 13
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 1
  • Create event: 4
Last Year
  • Issues event: 1
  • Watch event: 9
  • Delete event: 3
  • Public event: 1
  • Push event: 13
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 1
  • Create event: 4

Dependencies

requirements.txt pypi
  • PyYAML ==6.0.2
  • matplotlib ==3.10.1
  • numpy ==2.2.5
  • pandas ==2.2.3
  • scikit_learn ==1.6.1
  • scipy ==1.15.2
  • timm ==1.0.15
  • torch ==2.6.0
  • torchgeo ==0.7.0
  • torchmetrics ==1.7.1
  • torchvision ==0.21.0
  • tqdm ==4.67.1
  • xarray ==2024.3.0