https://github.com/calgo-lab/bgr

Repository

Basic Info
  • Host: GitHub
  • Owner: calgo-lab
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 93.6 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 1
  • Releases: 2
Created over 1 year ago · Last pushed 6 months ago

README.md

SoilNet: A Multimodal Multitask Model for Hierarchical Classification of Soil Horizons

Authors: Teodor Chiaburu, Vipin Singh, Frank Hausser, Felix Biessmann

Citation: If you use this repository, please consider citing our paper:

```bibtex
@misc{chiaburu2025soilnetmultimodalmultitaskmodel,
  title={SoilNet: A Multimodal Multitask Model for Hierarchical Classification of Soil Horizons},
  author={Teodor Chiaburu and Vipin Singh and Frank Haußer and Felix Bießmann},
  year={2025},
  eprint={2508.03785},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.03785},
}
```



Abstract

While recent advances in foundation models have improved the state of the art in many domains, some problems in empirical sciences could not benefit from this progress yet. Soil horizon classification, for instance, remains challenging because of its multimodal and multitask characteristics and a complex hierarchically structured label taxonomy. Accurate classification of soil horizons is crucial for monitoring soil health, which directly impacts agricultural productivity, food security, ecosystem stability and climate resilience. In this work, we propose SoilNet - a multimodal multitask model to tackle this problem through a structured modularized pipeline. Our approach integrates image data and geotemporal metadata to first predict depth markers, segmenting the soil profile into horizon candidates. Each segment is characterized by a set of horizon-specific morphological features.

Finally, horizon labels are predicted based on the multimodal concatenated feature vector, leveraging a graph-based label representation to account for the complex hierarchical relationships among soil horizons. Our method is designed to address complex hierarchical classification, where the number of possible labels is very large, imbalanced and non-trivially structured. We demonstrate the effectiveness of our approach on a real-world soil profile dataset.


Project Overview

SoilNet Tasks

  1. Task 1: Segmentation: Soil profile images are segmented based on features extracted from the full images concatenated with features extracted from the geotemporal data.

  2. Task 2: Tabular Prediction: Tabular morphological features are predicted based on (visual) features extracted from the segments concatenated with the geotemporal features (one set of tabular features per segment).

  3. Task 3: Classification: Horizon labels are predicted based on concatenated visual segment features, geotemporal features and tabular morphological features (one label per segment).

An illustration of our proposed modularized multimodal multitask architecture for solving the three tasks (SoilNet):

[Figure: SoilNet architecture]
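As a rough illustration of how the three tasks chain together, the following minimal sketch turns predicted depth markers into horizon candidates and builds the concatenated feature vector used for classification. It is not the repository's implementation: the function names `split_into_segments` and `concat_features`, the normalized marker format (fractions of the profile height) and the toy feature values are all assumptions made for this example.

```python
from typing import List, Tuple


def split_into_segments(depth_markers: List[float], image_height: int) -> List[Tuple[int, int]]:
    """Convert normalized depth markers into pixel row ranges [start, stop).

    Assumption: markers are fractions of the profile height in (0, 1).
    """
    # Map markers to pixel rows, drop duplicates and sort from top to bottom
    cuts = sorted({round(m * image_height) for m in depth_markers if 0.0 < m < 1.0})
    # Discard cuts that collapse onto the image borders after rounding
    cuts = [c for c in cuts if 0 < c < image_height]
    bounds = [0] + cuts + [image_height]
    return list(zip(bounds[:-1], bounds[1:]))


def concat_features(*feature_vectors: List[float]) -> List[float]:
    """Concatenate per-segment feature vectors from several modalities."""
    merged: List[float] = []
    for vector in feature_vectors:
        merged.extend(vector)
    return merged


# Task 1 output (hypothetical markers) defines the horizon candidates
segments = split_into_segments([0.5, 0.25], image_height=1000)
print(segments)  # [(0, 250), (250, 500), (500, 1000)]

# Task 3 input for one segment: visual + geotemporal + tabular features (toy values)
classifier_input = concat_features([0.1, 0.2], [0.3], [0.4, 0.5])
print(classifier_input)  # [0.1, 0.2, 0.3, 0.4, 0.5]
```

Two markers yield three segments, and each segment then receives one set of tabular features (Task 2) and one horizon label (Task 3).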


Installation

  1. Clone the repository:

     ```bash
     git clone https://github.com/calgo-lab/BGR.git
     ```

  2. Create a virtual environment (recommended):

     ```bash
     # Using venv
     python -m venv env
     source env/bin/activate  # On Windows use env\Scripts\activate
     ```

  3. Install dependencies:

     ```bash
     # Using pip
     pip install -r requirements.txt
     ```

     Note: The key dependency for running the models is PyTorch version 2.1.0 with CUDA 12.1.


Dataset

To carry out the experiments and train our models, we used an image-tabular dataset built and curated by our partner geological institute (Reference anonymized).

For the time being, the full dataset cannot be made publicly available.

The following figure demonstrates the data structure of the dataset we used for training and evaluation:

[Figure: SoilNet data structure]

Additionally, the dataset contained geotemporal metadata for each soil profile image. For further details on the dataset, please refer to the paper.


Usage

The code is used by running the main.py script, which trains the SoilNet model and its modules or runs inference with them.

To see all available command line arguments, run:

```bash
python main.py --help
```

Training

To train a model, use the following command:

```bash
python main.py --data_folder_path=<path_to_data_folder> --target=<target> --experiment_type=<experiment_name>
```

Where:
- `<path_to_data_folder>`: Path to the folder containing the dataset.
- `<target>`: The target task to train the model on.
- `<experiment_name>`: Name of the experiment, see experiments

Evaluation

To evaluate a trained model, provide a path to the model checkpoint and the data folder:

```bash
python main.py --data_folder_path=<path_to_data_folder> --target=<target> --experiment_type=<experiment_name> --inference_model_file=<path_to_model_checkpoint>
```

Where:
- `<path_to_data_folder>`: Path to the folder containing the dataset.
- `<target>`: The target task to evaluate the model on.
- `<experiment_name>`: Name of the experiment, see experiments
- `<path_to_model_checkpoint>`: Path to the model checkpoint for the specified experiment.
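When launching several runs from a script, the documented flags can be assembled programmatically. The sketch below is a hypothetical helper (`build_command` is not part of the repository); it only uses the four flags shown above and makes no assumption about which values `--target` or `--experiment_type` accept.

```python
import sys
from typing import List, Optional


def build_command(
    data_folder_path: str,
    target: str,
    experiment_type: str,
    inference_model_file: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for main.py from the documented flags."""
    cmd = [
        sys.executable,
        "main.py",
        f"--data_folder_path={data_folder_path}",
        f"--target={target}",
        f"--experiment_type={experiment_type}",
    ]
    # Passing a checkpoint switches from training to evaluation
    if inference_model_file is not None:
        cmd.append(f"--inference_model_file={inference_model_file}")
    return cmd


# Placeholder values; replace them with real paths and names
train_cmd = build_command("<path_to_data_folder>", "<target>", "<experiment_name>")
eval_cmd = build_command(
    "<path_to_data_folder>", "<target>", "<experiment_name>",
    inference_model_file="<path_to_model_checkpoint>",
)
print(" ".join(train_cmd[1:]))
print(" ".join(eval_cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.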

Results

Here, we summarize the results for the SoilNet (SN) model in four different configurations. Metrics are computed on the test set and given in percent (%). The full images for the depth module were encoded with the MaskedResNet image encoder. PatchCNN and ResNet refer to the segment encoders. LSTM refers to all three task predictors (depth, tabulars, horizons). Emb and CE refer to the horizon loss. Acc.agg. is the accuracy aggregated over main symbols. Main symbols are the horizon symbols at a higher level of the hierarchy, so this metric reflects how geologically coherent the predictions are.

| Model name        | IoU   | Acc.  | F1   | Prec. | Rec.  | Acc.@5 | Prec.@5 | Rec.@5 | Acc.agg. |
| :---------------- | :---- | :---- | :--- | :---- | :---- | :----- | :------ | :----- | :------- |
| SNPatchCNNLSTMEmb | 51.25 | 36.27 | 7.55 | 10.40 | 10.48 | 60.21  | 40.27   | 33.84  | 71.42    |
| SNResNetLSTMEmb   | 51.47 | 35.40 | 6.58 | 8.51  | 9.48  | 59.65  | 33.75   | 33.08  | 68.70    |
| SNPatchCNNLSTMCE  | 49.52 | 43.99 | 7.20 | 9.03  | 8.26  | 72.02  | 50.33   | 30.33  | 68.61    |
| SNResNetLSTMCE    | 49.91 | 45.70 | 7.99 | 9.22  | 8.62  | 76.25  | 49.85   | 35.03  | 69.88    |
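To make the Acc.@5 and Acc.agg. columns more concrete, here is a minimal sketch of a top-k accuracy and of an accuracy aggregated over main symbols. It is an illustration only and not the code in metrics.py: the function names, the toy labels (`A1`, `A2`, `B1`) and the `to_main` mapping from horizon symbol to main symbol are assumptions.

```python
from typing import Dict, Sequence


def top_k_accuracy(scores: Sequence[Dict[str, float]], targets: Sequence[str], k: int = 5) -> float:
    """Percentage of samples whose true label is among the k highest-scored labels."""
    hits = 0
    for sample_scores, target in zip(scores, targets):
        top_k = sorted(sample_scores, key=sample_scores.get, reverse=True)[:k]
        hits += target in top_k
    return 100.0 * hits / len(targets)


def aggregated_accuracy(preds: Sequence[str], targets: Sequence[str], to_main: Dict[str, str]) -> float:
    """Percentage of samples whose predicted and true labels share the same main symbol."""
    hits = sum(to_main[p] == to_main[t] for p, t in zip(preds, targets))
    return 100.0 * hits / len(targets)


# Toy labels and a hypothetical symbol-to-main-symbol mapping
to_main = {"A1": "A", "A2": "A", "B1": "B"}
preds = ["A1", "A2", "B1", "A1"]
targets = ["A1", "A1", "B1", "B1"]
print(aggregated_accuracy(preds, targets, to_main))  # 75.0 (exact-match accuracy would be 50.0)

scores = [
    {"A1": 0.5, "A2": 0.3, "B1": 0.2},
    {"A1": 0.1, "A2": 0.2, "B1": 0.7},
]
print(top_k_accuracy(scores, ["A2", "A1"], k=2))  # 50.0
```

In the toy example, predicting `A2` for a true `A1` counts as wrong for exact accuracy but as correct once both are mapped to the main symbol `A`, which is why aggregated accuracy can sit well above exact accuracy.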

Conformalization

We conducted preliminary experiments conformalizing our model (currently, only the version trained with a ResNet backbone and cross entropy). We were particularly interested in building an uncertainty-driven annotation pipeline, where SoilNet would defer annotations to an expert whenever its uncertainty in any task was too high. To this end, we tested multiple uncertainty ranking methods (Monte Carlo Dropout vs. conformal intervals for Task 1 and softmax entropies vs. conformal sets for Task 3).
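For readers unfamiliar with the two Task 3 uncertainty signals, the sketch below shows a textbook split-conformal prediction set next to a softmax entropy, plus a simple deferral rule. This is a generic recipe under stated assumptions, not necessarily the procedure used in our experiments: the nonconformity score `1 - p(true label)`, the function names, the `max_set_size` rule and all numbers are hypothetical.

```python
import math
from typing import Dict, List


def softmax_entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


def conformal_threshold(cal_scores: List[float], alpha: float) -> float:
    """Split-conformal quantile of calibration scores (score = 1 - p_true)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is kept
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> List[str]:
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= qhat]


def should_defer(probs: Dict[str, float], qhat: float, max_set_size: int = 1) -> bool:
    """Defer to an expert when the conformal set is larger than allowed."""
    return len(prediction_set(probs, qhat)) > max_set_size


# Hypothetical calibration scores from a held-out calibration split
qhat = conformal_threshold([0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9], alpha=0.25)
print(qhat)  # 0.6

ambiguous = {"A": 0.5, "B": 0.45, "C": 0.05}
confident = {"A": 0.9, "B": 0.07, "C": 0.03}
print(prediction_set(ambiguous, qhat), should_defer(ambiguous, qhat))  # ['A', 'B'] True
print(prediction_set(confident, qhat), should_defer(confident, qhat))  # ['A'] False
```

Ranking samples by conformal set size or by `softmax_entropy` gives two alternative orderings for deciding which segments to hand to an expert first.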

Paper under review.

Repository Structure

The repository is structured as follows (only relevant files displayed):

  • bgr/:
    • soil/
      • data/
        • datasets.py: PyTorch datasets used in the experiments. Images are loaded here using file paths stored in the tabular data.
        • horizon_tabular_data.py: Loading and processing of the dataset in a tabular format.
      • experiments/
        • simple_depth/: Experiments for the depth marker prediction task (Task 1).
        • simple_tabulars/: Experiments for the tabular horizon feature prediction task (Task 2).
        • simple_horizon/: Experiments for the horizon classification task (Task 3).
        • end2end/: Experiments for the end-to-end SoilNet model (Task 1, 2 and 3).
      • modelling/
        • depth/: Models and modules for the depth marker prediction.
        • tabulars/: Models and modules for the tabular horizon feature prediction.
        • horizon/: Models and modules for the horizon classification.
        • geotemp_modules.py: Modules for the processing / encoding of the geotemporal data.
        • image_modules.py: Modules for the processing of the images and image segments.
        • soilnet.py: The SoilNet model, which integrates all the modules for the three tasks.
      • experiment_runner.py: The experiment runner, which handles the training and evaluation of the models.
      • metrics.py: The custom metrics used in the experiments.
      • training_args.py: The training arguments for the experiments, including hyperparameters for the models.
  • notebooks/
    • soil/: Jupyter notebooks for the experiments, including data exploration and visualization.
  • main.py: The main script and entry point for the repository.
  • requirements.txt: The requirements file for the repository, including all dependencies.
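The note on datasets.py describes a common pattern: the tabular data stores file paths, and images are only read when a sample is requested. The following minimal sketch shows that pattern with the map-style protocol (`__len__` and `__getitem__`) that PyTorch datasets follow. It is not the repository's code: the class name, the `file` column, the `depth_cm` feature and the injected `load_image` callable are assumptions chosen so that the example runs without PyTorch or an image library.

```python
from typing import Any, Callable, Dict, List, Tuple


class LazyImageTabularDataset:
    """Map-style dataset that loads an image only when its row is accessed."""

    def __init__(
        self,
        rows: List[Dict[str, Any]],
        load_image: Callable[[str], Any],
        path_column: str = "file",  # hypothetical column holding the image path
    ) -> None:
        self.rows = rows
        self.load_image = load_image
        self.path_column = path_column

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Tuple[Any, Dict[str, Any]]:
        row = self.rows[idx]
        # The image is read here, not in __init__, so memory use stays low
        image = self.load_image(row[self.path_column])
        features = {k: v for k, v in row.items() if k != self.path_column}
        return image, features


# Toy rows and a stand-in loader; a real loader would open the file from disk
rows = [
    {"file": "profiles/p1.jpg", "depth_cm": 30},
    {"file": "profiles/p2.jpg", "depth_cm": 55},
]
dataset = LazyImageTabularDataset(rows, load_image=lambda path: f"<image:{path}>")
print(len(dataset))  # 2
print(dataset[0])    # ('<image:profiles/p1.jpg>', {'depth_cm': 30})
```

Injecting the loader keeps the sketch testable; in practice it would be replaced by a function that opens and transforms the image.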

Owner

  • Name: Cognitive Algorithms Lab
  • Login: calgo-lab
  • Kind: organization
  • Location: Germany

GitHub Events

Total
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Watch event: 1
  • Push event: 26
  • Public event: 1
  • Pull request event: 8
Last Year
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Watch event: 1
  • Push event: 26
  • Public event: 1
  • Pull request event: 8