s4a

Sen4AgriNet: A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

https://github.com/orion-ai-lab/s4a

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file: found
  • DOI references: found 1 DOI reference in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary

Keywords

crop-classification deep-learning segmentation sentinel-2
Last synced: 6 months ago

Repository

Sen4AgriNet: A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

Basic Info
  • Host: GitHub
  • Owner: Orion-AI-Lab
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 6.73 MB
Statistics
  • Stars: 101
  • Watchers: 6
  • Forks: 20
  • Open Issues: 5
  • Releases: 0
Topics
crop-classification deep-learning segmentation sentinel-2
Created about 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Sen4AgriNet

A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

Contributors: Sykas D., Zografakis D., Sdraka M.


Supplementary repo with DL experiments using the Sen4AgriNet dataset: Sen4AgriNet-Models.


This repository provides a native PyTorch Dataset class for the Sen4AgriNet dataset (patches_dataset.py). It should work with any recent version of PyTorch (1.7.1+) and Python (3.8.5+).

The dataset heavily relies on the COCO API (pycocotools) for data loading and indexing, so make sure it is installed:

```
pip3 install pycocotools
```

Then make sure every other requirement is installed:

```
pip3 install -r requirements.txt
```

Instructions

In order to use the provided PyTorch Dataset class, the required netCDF files of Sen4AgriNet must be downloaded and placed inside the dataset/netcdf/ folder. These files are available for download on Dropbox, Google Drive and the HuggingFace Hub.

Then, three separate COCO files must be created: one for training, one for validation and one for testing. Alternatively, the predefined COCO files for the 3 Scenarios can be downloaded from here.
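If you build the split yourself, one possible approach is to partition the image entries of a single COCO file and write one COCO JSON per split. This is a minimal sketch only, assuming you start from one COCO file covering all patches; the file names and the 60/20/20 ratio below are hypothetical, not part of the official tooling:

```python3
import json
import random
from pycocotools.coco import COCO

# Hypothetical input: one COCO file describing every patch in the dataset.
coco = COCO('coco_files/coco_all.json')

img_ids = coco.getImgIds()
random.seed(0)
random.shuffle(img_ids)

n = len(img_ids)
splits = {
    'coco_train.json': img_ids[:int(0.6 * n)],
    'coco_val.json': img_ids[int(0.6 * n):int(0.8 * n)],
    'coco_test.json': img_ids[int(0.8 * n):],
}

for fname, ids in splits.items():
    # Keep only the images/annotations belonging to this split.
    subset = {
        'images': coco.loadImgs(ids),
        'annotations': coco.loadAnns(coco.getAnnIds(imgIds=ids)),
        'categories': coco.dataset.get('categories', []),
    }
    with open(f'coco_files/{fname}', 'w') as f:
        json.dump(subset, f)
```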

After this initial setup, patches_dataset.py can be used in a PyTorch deep learning pipeline to load, prepare and return patches from the dataset according to the split dictated by the COCO files. This Dataset class has the following features:
- Reads the netCDF files of the dataset containing the Sentinel-2 observations over time and the corresponding labels.
- Isolates the Sentinel-2 bands requested by the user.
- Computes the median Sentinel-2 image at a given frequency, e.g. monthly (or loads precomputed medians, if any).
- Returns the timeseries of median images inside a predefined window.
- Normalizes the images.
- Returns Hollstein masks for clouds, cirrus, shadow or snow.
- Returns a parcel mask: 1 for parcel, 0 for non-parcel.
- Can alternatively return binary labels: 1 for crops, 0 for non-crops (a toy sketch of these last two mask types follows below).
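As a toy illustration of the last two items only (not the actual implementation in patches_dataset.py), the two mask types can be thought of as simple thresholdings of the label and parcel rasters, assuming 0 encodes background / no parcel:

```python3
import numpy as np

# Toy 3x3 stand-ins for the (H, W) rasters stored in a patch:
# crop-type codes (0 assumed to be background) and parcel ids (0 assumed to be no parcel).
labels = np.array([[0, 12, 12], [0, 12, 7], [0, 0, 7]])
parcels = np.array([[0, 101, 101], [0, 101, 205], [0, 0, 205]])

parcel_mask = (parcels > 0).astype(np.uint8)   # 1 for parcel, 0 for non-parcel
binary_labels = (labels > 0).astype(np.uint8)  # 1 for crops, 0 for non-crops
print(parcel_mask)
print(binary_labels)
```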

Dataset exploration

This is roughly the way that our patches_dataset.py works. The whole procedure is also described in the provided notebook.

  1. Open a netCDF file for exploration.

```python3
import netCDF4
from pathlib import Path

patch = netCDF4.Dataset(Path('data/2020_31TCG_patch_14_14.nc'), 'r')
patch
```

Outputs:

```python3
"""
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    title: S4A Patch Dataset
    authors: Papoutsis I., Sykas D., Zografakis D., Sdraka M.
    patch_full_name: 2020_31TCG_patch_14_14
    patch_year: 2020
    patch_name: patch_14_14
    patch_country_code: ES
    patch_tile: 31TCG
    creation_date: 27 Apr 2021
    references: Documentation available at .
    institution: National Observatory of Athens.
    version: 21.03
    _format: NETCDF4
    _nco_version: netCDF Operators version 4.9.1 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
    _xarray_version: 0.17.0
    dimensions(sizes):
    variables(dimensions):
    groups: B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B8A, labels, parcels
"""
```

  2. Visualize a single timestamp.

```python3
import xarray as xr

band_data = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['B02']))
band_data.B02.isel(time=0).plot()
```

Single Month

  3. Visualize the labels:

```python3
labels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['labels']))
labels.labels.plot()
```

Labels

  4. Visualize the parcels:

```python3
parcels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['parcels']))
parcels.parcels.plot()
```

Parcels

  5. Plot the median of observations for each month:

```python3
import pandas as pd

# Or maybe aggregate based on a given frequency. Refer to
# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
group_freq = '1MS'

# Grab year from netCDF4's global attribute
year = patch.patch_year

# Output intervals
date_range = pd.date_range(start=f'{year}-01-01', end=f'{int(year) + 1}-01-01', freq=group_freq)

# Aggregate based on given frequency
band_data = band_data.groupby_bins(
    'time',
    bins=date_range,
    right=True,
    include_lowest=False,
    labels=date_range[:-1]
).median(dim='time')
```

If you plot right now, you might notice that some months are empty:

Single Month
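For instance, a quick faceted plot over the aggregated months makes the gaps visible. This uses xarray's built-in faceting on the time_bins dimension created above; any other plotting approach works just as well:

```python3
import matplotlib.pyplot as plt

# Facet the monthly medians computed above; months without observations show up as blank panels.
band_data.B02.plot(col='time_bins', col_wrap=4)
plt.show()
```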

(Optional) Fill in empty months:

```python3
import matplotlib.pyplot as plt

band_data = band_data.interpolate_na(dim='time_bins', method='linear', fill_value='extrapolate')

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(18, 12))

for i, season in enumerate(band_data.B02):
    ax = axes.flat[i]
    cax = band_data.B02.isel(time_bins=i).plot(ax=ax)

for i, ax in enumerate(axes.flat):
    ax.axes.get_xaxis().set_ticklabels([])
    ax.axes.get_yaxis().set_ticklabels([])
    ax.axes.axis('tight')
    ax.set_xlabel('')
    ax.set_ylabel('')
    ax.set_title(f'Month: {i+1}')

plt.tight_layout()
plt.show()
```

Per Month

PatchesDataset usage example

Please refer to the provided notebook for a detailed usage example of the PatchesDataset.

  1. Read the COCO file to be used.

```python3
from pathlib import Path
from pycocotools.coco import COCO

root_path_coco = Path('coco_files/')
coco_train = COCO(root_path_coco / 'coco_example.json')
```

  2. Initialize the PatchesDataset.

```python3
from torch.utils.data import DataLoader
from patches_dataset import PatchesDataset
from utils.config import LINEAR_ENCODER

root_path_netcdf = Path('dataset/netcdf')  # Path to the netCDF files

dataset_train = PatchesDataset(
    root_path_netcdf=root_path_netcdf,
    coco=coco_train,
    group_freq='1MS',
    prefix='test_patchesdataset',
    bands=['B02', 'B03', 'B04'],
    linear_encoder=LINEAR_ENCODER,
    saved_medians=False,
    window_len=6,
    requires_norm=False,
    return_masks=False,
    clouds=False,
    cirrus=False,
    shadow=False,
    snow=False,
    output_size=(183, 183)
)
```

  3. Initialize the DataLoader.

```python3
dataloader_train = DataLoader(
    dataset_train,
    batch_size=1,
    shuffle=True,
    num_workers=4,
    pin_memory=True
)
```

  4. Get a batch.

```python3
batch = next(iter(dataloader_train))
```

The batch variable is a dictionary containing the keys medians, labels and idx. batch['medians'] contains a PyTorch tensor of size [1, 6, 3, 183, 183] where:
- batch size: 1
- timestamps: 6
- bands: 3
- height: 183
- width: 183

Batch Medians

batch['labels'] contains the corresponding labels of the medians, a PyTorch tensor of size [1, 183, 183] where:
- batch size: 1
- height: 183
- width: 183

Batch Labels

batch['idx'] contains the index of the returned timeseries.
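Putting it together, here is a minimal sketch of consuming the loader in a training-style loop. The shapes follow from the configuration above; the model itself is left out since it depends on your pipeline:

```python3
for batch in dataloader_train:
    medians = batch['medians']  # [1, 6, 3, 183, 183]: batch x timestamps x bands x height x width
    labels = batch['labels']    # [1, 183, 183]: batch x height x width
    print(medians.shape, labels.shape, batch['idx'])
    break  # inspect a single batch only
```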

Webpage

Dataset Webpage: https://www.sen4agrinet.space.noa.gr/

Experiments

Please visit Sen4AgriNet-Models for a complete experimentation pipeline using the Sen4AgriNet dataset.

Citation

To cite please use:

```
@ARTICLE{9749916,
  author={Sykas, Dimitrios and Sdraka, Maria and Zografakis, Dimitrios and Papoutsis, Ioannis},
  journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
  title={A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning},
  year={2022},
  doi={10.1109/JSTARS.2022.3164771}
}
```

Owner

  • Name: Orion Lab
  • Login: Orion-AI-Lab
  • Kind: organization
  • Email: ipapoutsis@noa.gr
  • Location: Greece

Orion Lab research group: Deep Learning in Earth Observation at the National Observatory of Athens

GitHub Events

Total
  • Issues event: 4
  • Watch event: 11
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 2
Last Year
  • Issues event: 4
  • Watch event: 11
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 2

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 18
  • Total Committers: 3
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.444
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • masdra (p****s@g****m): 10 commits
  • Maria Sdraka (p****s@u****m): 6 commits
  • dimzog (z****3@g****m): 2 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 1
  • Average time to close issues: about 7 hours
  • Average time to close pull requests: 15 days
  • Total issue authors: 7
  • Total pull request authors: 1
  • Average comments per issue: 1.78
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • StorywithLove (2)
  • VSainteuf (2)
  • Spiruel (1)
  • PeterKKan (1)
  • Multihuntr (1)
  • nilsleh (1)
  • saqibzia-dev (1)
Pull Request Authors
  • paren8esis (1)