wildlife-datasets

WildlifeDatasets: An open-source toolkit for animal re-identification

https://github.com/WildlifeDatasets/wildlife-datasets

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary

Keywords

dataset datasets deep-learning ecology ecology-modelling machine-learning
Last synced: 6 months ago · JSON representation

Repository


Basic Info
Statistics
  • Stars: 124
  • Watchers: 2
  • Forks: 14
  • Open Issues: 0
  • Releases: 7
Topics
dataset datasets deep-learning ecology ecology-modelling machine-learning
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md


Wildlife datasets

Pipeline for wildlife re-identification including dataset zoo, training tools and trained models. Usage includes classifying new images in labelled databases and clustering individuals in unlabelled databases.



| WildlifeReID-10k | MegaDescriptor | Wildlife tools |
|:--------------:|:-----------:|:------------:|
| Dataset for identification of individual animals | Trained model for individual re-identification | Tools for training re-identification models |


Wildlife Re-Identification (Re-ID) Datasets

The aim of the project is to provide a comprehensive overview of datasets for wildlife individual re-identification and an easy-to-use package for developers of machine learning methods. The core functionality includes:

  • overview of 46 publicly available wildlife re-identification datasets.
  • utilities to mass download and convert them into a unified format and fix some wrong labels.
  • default splits for several machine learning tasks, including the ability to create additional splits.
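To make the idea of a split concrete, here is a minimal sketch of a closed-set split, in which every individual with at least two images appears in both the training and the test set. It uses only the Python standard library and is not the package's own implementation; the function name `closed_set_split`, the `(path, identity)` record layout and the 80/20 ratio are assumptions made for illustration.

```python
import random
from collections import defaultdict


def closed_set_split(records, train_ratio=0.8, seed=0):
    """Split (path, identity) records so that each identity with at least
    two images lands in both the training and the test set."""
    rng = random.Random(seed)

    # Group record indices by identity.
    by_identity = defaultdict(list)
    for index, (_, identity) in enumerate(records):
        by_identity[identity].append(index)

    train_idx, test_idx = [], []
    for identity in sorted(by_identity):
        indices = by_identity[identity]
        rng.shuffle(indices)
        if len(indices) == 1:
            # A single image cannot be shared; keep it for training.
            train_idx.extend(indices)
            continue
        # Keep at least one image on each side of the split.
        n_train = min(max(1, round(train_ratio * len(indices))), len(indices) - 1)
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical records: ten images for each of three individuals.
records = [(f"img_{i}.jpg", f"animal_{i % 3}") for i in range(30)]
train_idx, test_idx = closed_set_split(records)
```

Splitting per individual rather than over the whole table is what keeps rare individuals from ending up only in the test set.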

An introductory example is provided in a Jupyter notebook. The package pairs naturally with WildlifeTools, which provides our MegaDescriptor model and tools for training neural networks.

Do you know of an animal re-identification dataset that is not included? Please post it to the discussion forum.

Changelog

[01/07/2025] Added BristolGorillas2020 (primates) and CzechLynx (lynxes).
[14/04/2025] Added AnimalCLEF2025, WildlifeReID-10k (unifications of multiple datasets), MultiCamCows2024 (cows) and PrimFace (primates).
[31/10/2024] Added AmvrakikosTurtles, ReunionTurtles, SouthernProvinceTurtles, ZakynthosTurtles (sea turtles), ELPephants (elephants) and Chicks4FreeID (chickens).
[09/05/2024] Added CatIndividualImages (cats), CowDataset (cows) and DogFaceNet (dogs).
[28/02/2024] Added MPDD (dogs), PolarBearVidID (polar bears) and SeaStarReID2023 (sea stars).
[04/01/2024] Received Best paper award at WACV 2024.

Summary of datasets

An overview of the provided datasets is available in the documentation, and a more detailed numerical summary is located in a Jupyter notebook. Because the notebook is large, it may be necessary to view it via nbviewer.

We include basic characteristics such as publication year, number of images, number of individuals and dataset time span (the difference between the last and the first image taken), together with additional information such as the source, the number of poses, the inclusion of timestamps, whether the animals were captured in the wild and whether the dataset contains multiple species.
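As an illustration of the time span characteristic, the following sketch computes the number of whole days between the earliest and the latest image timestamp. It is a standalone example using the standard library, not the code used to build the summary; the function name `time_span_days` and the sample timestamps are hypothetical.

```python
from datetime import datetime


def time_span_days(timestamps):
    """Return the number of whole days between the earliest and latest timestamp."""
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    return (max(parsed) - min(parsed)).days


# Hypothetical capture times of three images (ISO 8601 strings).
timestamps = ["2014-03-01 10:15:00", "2015-07-19 08:00:00", "2014-11-30 17:45:00"]
span = time_span_days(timestamps)
```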

Dataset summary

Installation

Install the package with pip: `pip install wildlife-datasets`

Basic functionality

We show an example of downloading, extracting and processing the MacaqueFaces dataset.

```
from wildlife_datasets import analysis, datasets

datasets.MacaqueFaces.get_data('data/MacaqueFaces')
dataset = datasets.MacaqueFaces('data/MacaqueFaces')
```

The `dataset` object contains a summary of the dataset, and its content depends on the dataset. Every dataset provides the identity and the path of each image. This particular dataset also records the date taken and the contrast. Other datasets store bounding boxes, segmentation masks, the position from which the image was taken, keypoints, or other information such as age or gender.

dataset.df

Overview of the MacaqueFaces dataset
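A table with one row per image, holding an identity and a path, lends itself to simple sanity checks. The sketch below counts images per individual and lists individuals with a single image. It works on a plain list of dictionaries rather than on `dataset.df`; the column names `image_id`, `identity` and `path` and the sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows in the spirit of a unified table: one row per image.
rows = [
    {"image_id": 0, "identity": "Dan", "path": "Dan/0.jpg"},
    {"image_id": 1, "identity": "Dan", "path": "Dan/1.jpg"},
    {"image_id": 2, "identity": "Eve", "path": "Eve/0.jpg"},
    {"image_id": 3, "identity": "Kim", "path": "Kim/0.jpg"},
    {"image_id": 4, "identity": "Kim", "path": "Kim/1.jpg"},
    {"image_id": 5, "identity": "Kim", "path": "Kim/2.jpg"},
]

# Number of images available for each individual.
images_per_identity = Counter(row["identity"] for row in rows)

# Individuals with a single image cannot appear in both a training and a test set.
singletons = sorted(name for name, count in images_per_identity.items() if count == 1)
```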

The dataset also contains basic metadata, including the number of individuals, the time span, the licence and the publication year.

dataset.summary

Metadata of the MacaqueFaces dataset

This particular dataset already contains cropped images of faces. Other datasets may contain uncropped images with bounding boxes or even segmentation masks.

dataset.plot_grid()

Additional functionality

For additional functionality, including mass loading, dataset splitting and evaluation metrics, we refer to the documentation or the notebooks.
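To give a feel for the kind of evaluation metric used in re-identification, here is a minimal standard-library sketch of accuracy and balanced accuracy (the mean of per-individual recall). It is not the package's implementation, and the function names and sample labels are assumptions for illustration.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true identity."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-identity recall, so rare individuals count as much as common ones."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: individual "a" has four images, individual "b" has two.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "b", "a"]
acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
```

Balanced accuracy is lower than plain accuracy here because the single mistake falls on the rarer individual, which is a common situation in wildlife datasets with uneven numbers of images per animal.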

Additional datasets

For a list of additional datasets not included in WildlifeDatasets, see this webpage.

Citation

If you like our package, please cite our paper. You may also be interested in our SeaTurtleID2022 dataset, published in another paper.

@InProceedings{Cermak_2024_WACV,
    author    = {\v{C}erm\'ak, Vojt\v{e}ch and Picek, Luk\'a\v{s} and Adam, Luk\'a\v{s} and Papafitsoros, Kostas},
    title     = {{WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification}},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {5953-5963}
}

Owner

  • Name: WildlifeDatasets
  • Login: WildlifeDatasets
  • Kind: organization

GitHub Events

Total
  • Create event: 4
  • Release event: 3
  • Issues event: 3
  • Watch event: 48
  • Delete event: 2
  • Member event: 1
  • Issue comment event: 6
  • Push event: 90
  • Pull request event: 1
  • Fork event: 9
Last Year
  • Create event: 4
  • Release event: 3
  • Issues event: 3
  • Watch event: 48
  • Delete event: 2
  • Member event: 1
  • Issue comment event: 6
  • Push event: 90
  • Pull request event: 1
  • Fork event: 9

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 799
  • Total Committers: 5
  • Avg Commits per committer: 159.8
  • Development Distribution Score (DDS): 0.511
Past Year
  • Commits: 214
  • Committers: 2
  • Avg Commits per committer: 107.0
  • Development Distribution Score (DDS): 0.009
Top Committers
| Name | Email | Commits |
|:--|:--|--:|
| sadda | l****r@g****m | 391 |
| adamluk3 | a****3@l****z | 337 |
| cermavo3 | c****3@l****z | 46 |
| Vojtech Cermak | c****h@s****z | 23 |
| Lukas Picek | l****k@g****m | 2 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 1
  • Average time to close issues: 4 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 4
  • Total pull request authors: 1
  • Average comments per issue: 2.4
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: 2 days
  • Average time to close pull requests: about 1 month
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 5.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mfruhner (2)
  • zhoumu53 (1)
  • VojtechCermak (1)
  • MatthiasZuerl (1)
Pull Request Authors
  • picekl (2)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 966 last-month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 69
  • Total maintainers: 2
proxy.golang.org: github.com/wildlifedatasets/wildlife-datasets
  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
proxy.golang.org: github.com/WildlifeDatasets/wildlife-datasets
  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
pypi.org: wildlife-datasets

Library for easier access and research of wildlife re-identification datasets

  • Versions: 53
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 966 Last month
Rankings
Downloads: 5.0%
Dependent packages count: 6.6%
Average: 20.2%
Stargazers count: 28.2%
Forks count: 30.5%
Dependent repos count: 30.6%
Maintainers (2)
Last synced: 6 months ago