awesome-forests

🌳 A curated list of ground-truth forest datasets for the machine learning and forestry community.

https://github.com/blutjens/awesome-forests

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ✓
    DOI references
    Found 26 DOI reference(s) in README
  • ✓
    Academic publication links
    Links to: sciencedirect.com, science.org, ieee.org, zenodo.org
  • ✓
    Committers with academic emails
    1 of 9 committers (11.1%) from academic institutions
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary

Keywords

biodiversity carbon climate-change datasets deep-learning ecosystems forestry machine-learning
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: blutjens
  • License: cc0-1.0
  • Default Branch: main
  • Homepage:
  • Size: 764 KB
Statistics
  • Stars: 332
  • Watchers: 9
  • Forks: 41
  • Open Issues: 5
  • Releases: 0
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

awesome-forests

Awesome-forests is a curated list of ground-truth/validation/in situ forest datasets for the forest-interested machine learning community. The list targets data-based analysis of biodiversity, carbon, wildfire, ecosystem services, you name it! The list does NOT contain data products, such as algorithm-generated global maps.

Getting started with data science in forests is TOUGH. The lack of organized datasets is one reason why. So, this list of datasets intends to get you started with building machine learning models for analysing your forests.

If you know of a dataset that you like, please create an issue or email me (lutjens at mit [dot] edu) and I'll add it! Thank you :)

Photo of a dog in a forest, by Jamie Street on Unsplash

Content

Tree species classification

Processed

Raw

Tree detection

Processed

  • DeepForest WeEcology NEON (Weecology, NEON, UofFlorida, 2018) \ A tree detection dataset from ≈22 National Forest sites, USA with >15k labeled and >400k unlabeled trees with airborne RGB, Hyperspectral, and Lidar imagery.

  • Kaggle Aerial Cactus Identification (CONACYT, 2019) \ A cactus detection dataset from Mexico with 17k cacti with airborne RGB imagery.

  • Swedish National Forest Data Lab: Forest Damages – Larch Casebearer 1.0 (Swedish Forest Agency, 2021) \ A tree detection and classification dataset from 10 sites with RGB drone imagery. In total, ~102k annotated bounding boxes are labeled "Lark" or "other", of which ~44.5k are also labeled with tree damage in four categories.

  • Norlab – PercepTree (Northern laboratory, 2022) \ This repository contains two datasets: one with 43k synthetic forest images and one with 100 real images. Both include high-definition RGB images with depth information, bounding boxes, instance segmentation masks, and keypoint annotations.

Raw

Tree traits, damage, and health classification

  • Spectra to functional traits (Cherif et al., 2023) \ A functional trait analysis dataset with 42 sets across the US, Asia, and Europe with multi-sensor airborne 1D canopy spectra (400-2500 nm) and in-situ trait measurements (AI, LMA, pigments, ...).

  • Forest Damages – Larch Casebearer (Swedish Forest Agency, 2021) \ A tree damage classification dataset from 5 areas in Sweden with 1.5k airborne RGB images and >100k labeled trees.

Raw

  • UAV data of standing deadwood (Schiefer et al., 2023) \ A dataset of aerial UAV imagery of standing deadwood as time-series over four years (2018-2021), ~700ha, at 10m resolution, in Germany and Finland

Navigation in forests

  • FinnWoodlands Dataset (Tampere University, Finland, 2023) \ A dataset for autonomous navigation inside forests with ~5K RGB stereo images, point clouds, and sparse depth maps, as well as 300 annotated frames for semantic, instance, or panoptic segmentation of tree trunks, paths, and more.

Biodiversity flora

  • Kaggle iNaturalist (iNaturalist, FGVC8, 2021) \ A flora and fauna species classification dataset from global sites with 2.7M labeled images of 10k species with smartphone imagery.

  • Kaggle GeoLifeCLEF 2021 (ImageCLEF, 2021) \ A flora and fauna location-based species recommendation dataset from France with 1.9M labeled images of 31k species with satellite imagery and cartographic variables.

Aboveground carbon quantification

Processed

Raw

Belowground carbon quantification

  • todo: add ground-truth datasets on belowground carbon inventories

Tree crown segmentation

Processed

  • FOR-instance (Puliti et al., 2023) \ ML-ready benchmark dataset for 3D semantic and instance segmentation of 1130 individual trees in 5 classes, acquired with a UAV-based Riegl LiDAR sensor and covering over 2.79 ha in 5 countries.

  • Quebec Trees Dataset (Cloutier et al., 2023) \ Tree crown segmentation dataset with 14 classes and over 23,000 labeled tree crowns. The dataset is composed of high-resolution RGB orthomosaics for seven dates in 2021, and associated photogrammetric point clouds.

Raw

Forest type and land cover classification

  • coastTrain (Murray et al., 2022) \ A dataset with over 190K point observations of coastal ecosystem classes (tidal flat, mangrove, coral reef, saltmarsh, seagrass, intertidal, kelp, ...) including geolocation and relevant metadata, but no satellite imagery.

  • BigEarthNet: large-scale Sentinel-2 benchmark (TU Berlin, 2019) \ A landcover multi-classification dataset from 10 European countries with ≈600k labeled images with CORINE land cover labels with Sentinel-2 L2A (10m res.) satellite imagery.

  • Chesapeake land cover (Chesapeake Conservancy, Microsoft, NAIP, USGS, 2013-2017) \ A land cover classification dataset from the Chesapeake Bay, USA, of a 6x7 km² area with high- and low-resolution (NLCD) land cover labels with high- (NAIP, RGB-NIR) and low-resolution (Landsat 8, 13-band) satellite imagery.

  • Kaggle Planet: Understanding the Amazon from Space (SCCON, Planet, 2017) \ A land cover classification dataset from the Amazon with deforestation, mining, cloud labels with RGB-NIR (5m res.) satellite imagery.

  • WiDS Datathon 2019: detection of oil palm plantations (Global WiDS Team & West Big Data Innovation Hub, 2019) \ Binary palm oil plantation classification with 20k images with Planet RGB (3m res.) satellite imagery

  • UC Merced land use dataset (UC Merced, 2010) \ A small land cover classification dataset with 2100 images and 21 balanced classes with airborne (0.3m res.) imagery.

  • See Awesome satellite imagery datasets for more satellite imagery datasets.

  • See SustainBench for more UN SDG-related satellite imagery datasets.

Change detection and deforestation

  • Dynamic EarthNet challenge (Planet, DLR, TUM, 2021) \ A time-series prediction and multi-class change detection dataset of Europe over 2 years with 75 image time-series with 7 land-cover labels and weekly Planet RGB (3m res.) imagery.

  • Semantic change detection dataset (SECOND) (Yang et al., 2020) \ A land cover change detection dataset over cities and suburbs in China with ≈5k image-pairs with 6 land cover classes and airborne imagery.

  • ForestNet deforestation driver (Jeremy Irvin, Hao Sheng et al., 2020) \ A dataset that consists of 2,756 LANDSAT-8 satellite images of forest loss events with deforestation driver annotations. The driver annotations were grouped into Plantation, Smallholder Agriculture, Grassland/shrubland, and Other.

  • Global Forest Change (University of Maryland, 2013) \ Different layers of global forest loss, extracted from Landsat satellite imagery, todo: this is a data product, find ground-truth data

  • Awesome remote sensing change detection \ A list with more change detection datasets.

Wildfire

  • todo: add datasets for fire detection, fuel moisture quantification, wildfire spread prediction, etc.
  • todo: Add https://mlhub.earth/data/susarmoisturecontentmain, https://www.sciencedirect.com/science/article/pii/S003442572030167X?via%3Dihub

Wildlife

  • iWildCam \ A species classification dataset from 414 global locations with >200k labeled images with wildlife camera trap imagery, Landsat-8 multispectral imagery, and GPS coordinates.

  • iNaturalist \ Multiple species classification datasets from global imagery of animals and plants with >2.7M images from 10k species.

  • See LILA.science for more processed conservation datasets

  • See Awesome-deep-ecology for more ecology datasets

Bioacoustics

  • todo: add bioacoustics datasets

Raw geospatial imagery

Awesome-awesome

  • Awesome satellite imagery datasets \ A list of more satellite imagery datasets with annotations for deep learning and computer vision.

  • Awesome GIS \ A list of GIS resources.

  • OpenForest \ A list of over 88 in-situ datasets and data products in forestry that are open-access and focused on understanding the composition of forests at the tree level

  • todo: add link to dataset list on conservationtech.directory

Excluded data products

These datasets were excluded because we could not find a source for the validation dataset. If you know the source, please create an issue or pull request.

  • Forest biomass in China

Attributions

Owner

  • Name: Björn Lütjens (he/him)
  • Login: blutjens
  • Kind: user
  • Company: MIT

Postdoctoral Associate in tackling climate change with AI @ MIT. Project overview at https://blutjens.github.io/

GitHub Events

Total
  • Issues event: 6
  • Watch event: 51
  • Issue comment event: 4
  • Push event: 8
  • Pull request event: 4
  • Fork event: 6
Last Year
  • Issues event: 6
  • Watch event: 51
  • Issue comment event: 4
  • Push event: 8
  • Pull request event: 4
  • Fork event: 6

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 50
  • Total Committers: 9
  • Avg Commits per committer: 5.556
  • Development Distribution Score (DDS): 0.18
Past Year
  • Commits: 17
  • Committers: 4
  • Avg Commits per committer: 4.25
  • Development Distribution Score (DDS): 0.235
Top Committers
Name Email Commits
blutjens b****s@g****m 41
hoeoek 6****k 2
omahs 7****s 1
melisandeteng 3****g 1
han16nah h****h@m****g 1
dwddao c****o@g****m 1
Vincent Grondin 5****n 1
Konstantin Klemmer k****r@g****e 1
Gyri g****n@t****e 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 10
  • Total pull requests: 10
  • Average time to close issues: 10 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 2
  • Total pull request authors: 8
  • Average comments per issue: 0.4
  • Average comments per pull request: 0.5
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 5
  • Average time to close issues: 4 months
  • Average time to close pull requests: 4 days
  • Issue authors: 2
  • Pull request authors: 3
  • Average comments per issue: 0.25
  • Average comments per pull request: 0.4
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • blutjens (9)
  • dbailleul (1)
Pull Request Authors
  • melisandeteng (2)
  • hoeoek (2)
  • omahs (2)
  • gyrrei (1)
  • konstantinklemmer (1)
  • daviddao (1)
  • han16nah (1)
  • VGrondin (1)