https://github.com/biodt/bfm-model
Multi-modal Foundation Model for Biodiversity dynamics forecasting
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.5%) to scientific vocabulary
Keywords
Repository
Multi-modal Foundation Model for Biodiversity dynamics forecasting
Basic Info
Statistics
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 1
Topics
Metadata Files
README.md
BioAnalyst: A Foundation Model for Biodiversity
BioAnalyst Foundation Model (BFM) for biodiversity dynamics forecasting.
This repository contains the implementation of the architecture, training, evaluation and finetuning workflows of the BFM.
Installation
There are two ways to install the software. It is tested to work with Python 3.10 and 3.12.
1) With pip
```bash
python -m venv venv
source venv/bin/activate
pip install -U pip setuptools wheel
# from setuptools 61 onwards, it's possible to install with pip from a pyproject.toml
pip install -e .
# OPTIONAL: for CUDA-capable machines
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```
OR
2) With Poetry (make sure you have Poetry installed).

To install Poetry, you can simply run

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

Afterwards, just run in a terminal

```bash
poetry install
```

To run the scripts, activate the virtual env

```bash
poetry shell
```
Run experiments
Training
```bash
salloc -p gpu_h100 --nodes 1 --gpus-per-node 2 -t 02:00:00
source venv/bin/activate
python bfm_model/bfm/train_lighting.py
```

Testing

```bash
python bfm_model/bfm/test_lighting.py
```

Rollout Predictions

```bash
python bfm_model/bfm/rollouts.py
```
Rollout Finetuning
We offer two Parameter-Efficient Finetuning techniques, namely LoRA and VeRA. Each of them can be enabled or disabled interchangeably in the finetune section of train_config.yaml. A conceptual sketch of the LoRA idea follows the command below.

```bash
python bfm_model/bfm/rollout_finetuning.py
```
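For intuition: LoRA freezes the pretrained weights and learns a low-rank additive update, while VeRA shares frozen random projections across layers and trains only small per-layer scaling vectors. The following is a minimal, self-contained sketch of the LoRA idea in PyTorch; it is illustrative only and does not mirror the actual BFM finetuning code (the class and parameter names are ours).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank correction; only lora_a/lora_b receive gradients.
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

# Usage: wrap a layer, then train only the adapter parameters.
layer = LoRALinear(nn.Linear(512, 512), r=8)
out = layer(torch.randn(4, 512))
```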
On the cluster

```bash
sbatch snellius_train.sh
# or
sbatch snellius_finetune.sh
```
Analysing results
We use Hydra to store all the artifacts from all the runs. This way we can configure runs with YAML files, override options from the CLI, launch multiruns with multiple parameters, and have all the results stored in the outputs folder. There, all the data from the runs (configs, checkpoints, metrics, ...) can be found organised by date and time. A minimal sketch of the pattern follows.
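As a rough illustration of the Hydra pattern (the config path and the finetune field here are assumptions based on the config mentioned in this README, not a verbatim excerpt of the BFM code):

```python
import hydra
from omegaconf import DictConfig

# Assumed layout: a train_config.yaml with a `finetune` section, as referenced above.
@hydra.main(version_base=None, config_path="conf", config_name="train_config")
def main(cfg: DictConfig) -> None:
    # With version_base=None, Hydra switches the working directory to
    # outputs/<date>/<time>/, so files written here land in the per-run folder.
    print(cfg.finetune.prediction)

if __name__ == "__main__":
    main()
```

Overrides then come from the CLI (e.g. `finetune.prediction=True`), and `--multirun` sweeps over comma-separated values.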
MLflow
MLflow is used to log all the runs, and we configure it to save its internal files in the mlruns folder. Logging is done via the filesystem, so you don't need an MLflow server running during training.
You can run the MLflow server whenever you want (after or during training) to inspect the runs with the command:

```bash
# run in the root of the repository, where the mlruns folder is located
mlflow server --host 0.0.0.0 --port 8082
```
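For reference, file-based MLflow logging looks roughly like this; a minimal sketch using the standard MLflow API rather than the BFM code's own Lightning integration (the experiment name is a placeholder):

```python
import mlflow

# Point MLflow at the local mlruns folder instead of a tracking server.
mlflow.set_tracking_uri("file:./mlruns")
mlflow.set_experiment("bfm-training")  # placeholder experiment name

with mlflow.start_run():
    mlflow.log_param("learning_rate", 1e-4)
    mlflow.log_metric("val_loss", 0.42, step=1)
```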
On Snellius:
- run the mlflow command above on the same node where your VS Code interface is executing (login node or OnDemand)
- VS Code will detect the port and forward a local port to it (a popup appears, or go to the "PORTS" tab to open it)

If you are not using VS Code, or want a manual connection:
- forward a local port to it: `ssh -L 0.0.0.0:<LOCAL_PORT>:<node_id>:8082 <USER>@snellius.surf.nl` (example: `ssh -L 0.0.0.0:8899:int6:8082 snellius`)
- open http://localhost:<LOCAL_PORT>/ (example: http://localhost:8899/)
Visualisation
This repository contains various visualisation functions that are applicable to every stage of the workflow. More specifically:

- Batch level: inspect and visualise the RAW data (2 timesteps) from the Batches along with their MAE. Run the notebook `documentation/batch_visualisation.ipynb`. You need to change the `DATA_PATH` to the directory holding the batches you want to visualise. The code plots only a single batch, but it can be configured to visualise all of them and save them with the appropriate flag.

> [!NOTE]
> You need to produce predictions first, either by running `bfm_model/bfm/test_lighting.py` or by running `bfm_model/bfm/rollout_finetuning.py` with `finetune.prediction: True` enabled in the train_config. These will create export folders with the predictions and the ground truths in a compact tensor format. A sketch of loading these exports follows this list.

- Prediction level: to visualise them, simply run `streamlit run prediction_viewer.py`. You can navigate the different tabs and variable groups to inspect each and every one of them.
- Rollout level: to visualise them, simply run `streamlit run rollout_viewer.py` and visit the localhost. There you can inspect the different Variable Groups with their respective Variables and Levels.
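The exact export layout is defined by the test and finetuning scripts; as a loose sketch (the file names below are hypothetical, check the export folder your run actually creates), the exported tensors can be inspected with plain PyTorch:

```python
import torch

# Hypothetical file names; inspect the export folder for the actual naming scheme.
predictions = torch.load("exports/predictions.pt", map_location="cpu")
ground_truth = torch.load("exports/ground_truth.pt", map_location="cpu")

# Depending on the export format, these may be tensors or dicts of
# variable-group tensors; print the structure to find out.
print(type(predictions))
```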
Examples
Inside the documentation folder you can find various notebooks that inspect and interact with BioAnalyst for different tasks.
The most straightforward is `example_prediction.ipynb`, where you can run a one-timestep-ahead prediction.

> [!NOTE]
> It requires producing at least one Batch and supplying it via the dataloader! The model weights available are from the Small model. We will update the scripts and the results with the Medium model weights when they become available.
Prepare and upload model weights
First you need to convert the weights to the safetensors format.
Use the notebook `documentation/prepare_checkpoint.ipynb` to do so.
Then just follow the Model card tab and upload the weights either with the CLI or with a short Python script, roughly as sketched below.
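A rough sketch of both steps using the standard safetensors and huggingface_hub APIs (the checkpoint path and repo ID are placeholders, not the actual published weights):

```python
import torch
from safetensors.torch import save_file
from huggingface_hub import HfApi

# 1) Convert a Lightning checkpoint's state dict to safetensors.
ckpt = torch.load("checkpoints/last.ckpt", map_location="cpu", weights_only=False)  # placeholder path
state_dict = {k: v.contiguous() for k, v in ckpt["state_dict"].items()}
save_file(state_dict, "bfm_small.safetensors")

# 2) Upload to the Hugging Face Hub (placeholder repo ID).
api = HfApi()
api.upload_file(
    path_or_fileobj="bfm_small.safetensors",
    path_in_repo="bfm_small.safetensors",
    repo_id="BioDT/bfm-small",  # placeholder
)
```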
Citation
If you like our work, please consider citing us as follows:
```bibtex
@misc{trantas2025bioanalystfoundationmodelbiodiversity,
  title={BioAnalyst: A Foundation Model for Biodiversity},
  author={Athanasios Trantas and Martino Mensio and Stylianos Stasinos and Sebastian Gribincea and Taimur Khan and Damian Podareanu and Aliene van der Veen},
  year={2025},
  eprint={2507.09080},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.09080},
}
```
Resources
Good Trainer example: https://github.com/SeanNaren/min-LLM/blob/fsdp/train.py#L235-L243
Interesting addition for CLI args generation: https://github.com/google/python-fire
TODOs
[x] Codebase cleanup
[x] Hugging Face weights upload, loading and tutorial notebook.
[x] Finetune routine implementation with LoRA and optionally VeRA
[x] Finetune dataset setup
[x] Rollout Finetune modes: Monthly (x1), Yearly (x12)
[x] Investigate if a (Prioritized) Buffer for Rollout Finetune is required - No need
[x] Investigate effect of batch_size on finetuning - currently low memory usage but slow execution
[x] Safe tensors storage
[x] Validate distributed training strategy
[ ] Make clear the data structure throughout the whole codebase. Currently we have interchanged dicts & Batch Tuples
Owner
- Name: BioDT
- Login: BioDT
- Kind: organization
- Website: https://biodt.eu
- Twitter: BiodiversityDT
- Repositories: 1
- Profile: https://github.com/BioDT
Horizon EU Biodiversity Digital Twin
GitHub Events
Total
- Release event: 1
- Watch event: 2
- Delete event: 1
- Public event: 1
- Fork event: 1
- Create event: 2
Last Year
- Release event: 1
- Watch event: 2
- Delete event: 1
- Public event: 1
- Fork event: 1
- Create event: 2
Dependencies
- 171 dependencies
- flake8-pyproject >=1.2.3 develop
- fvcore >=0.1.5 develop
- hydra-core >=1.3.2 develop
- openpyxl >=3.1.5 develop
- optuna >=4.0.0 develop
- optuna-integration ^4.0.0 develop
- pandas >=2.2.2 develop
- pre-commit >=3.7.0 develop
- pytest >=8.3.4 develop
- pytorch-forecasting >=1.2.0 develop
- scikit-learn >=1.5.1 develop
- seaborn >=0.13.2 develop
- torchmetrics >=1.4.0 develop
- xarray >=2024.10 develop
- cartopy >=0.24.1
- einops >=0.8.0
- fvcore 0.1.5.post20221221
- hydra-core >=1.3.2
- lightning >=2.5.2
- mlflow >=3.1.1
- nbformat >=5.10.4
- numpy >=1.26
- plotly >=5.24.1
- psutil >=6.1.1
- pynvml >=12.0.0
- python >=3.11,<3.14
- timm >=1.0.9
- torch >=2.7.1
- torchaudio >=2.7.1
- torchvision >=0.22.1
- typer >=0.15.1