s4a

Sen4AgriNet: A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

https://github.com/orion-ai-lab/s4a

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file: found
  • DOI references: found 1 DOI reference in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary

Keywords

crop-classification deep-learning segmentation sentinel-2
Last synced: 6 months ago

Repository

Sen4AgriNet: A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

Basic Info
  • Host: GitHub
  • Owner: Orion-AI-Lab
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 6.73 MB
Statistics
  • Stars: 101
  • Watchers: 6
  • Forks: 20
  • Open Issues: 5
  • Releases: 0
Topics
crop-classification deep-learning segmentation sentinel-2
Created about 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Sen4AgriNet

A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning

Contributors: Sykas D., Zografakis D., Sdraka M.


Supplementary repo with DL experiments using the Sen4AgriNet dataset: Sen4AgriNet-Models.


This repository provides a native PyTorch Dataset class for the Sen4AgriNet dataset (patches_dataset.py). It should work with any recent version of PyTorch (1.7.1+) and Python (3.8.5+).

The dataset heavily relies on the COCO API (pycocotools) for data loading and indexing, so make sure it is installed:

```
pip3 install pycocotools
```

Then make sure every other requirement is installed:

```
pip3 install -r requirements.txt
```

Instructions

In order to use the provided PyTorch Dataset class, the required netCDF files of Sen4AgriNet must be downloaded and placed inside the dataset/netcdf/ folder. These files are available for download on Dropbox, Google Drive and the HuggingFace Hub.

Then, three separate COCO files must be created: one for training, one for validation and one for testing. Alternatively, the predefined COCO files for the 3 Scenarios can be downloaded from here.
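If you build the split yourself, one possible approach is to partition the image entries of a single COCO file and write one COCO JSON per split. This is a minimal sketch only, assuming you start from one COCO file covering all patches; the file names and the 60/20/20 ratio below are hypothetical, not part of the official tooling:

```python3
import json
import random
from pycocotools.coco import COCO

# Hypothetical input: one COCO file describing every patch in the dataset.
coco = COCO('coco_files/coco_all.json')

img_ids = coco.getImgIds()
random.seed(0)
random.shuffle(img_ids)

n = len(img_ids)
splits = {
    'coco_train.json': img_ids[:int(0.6 * n)],
    'coco_val.json': img_ids[int(0.6 * n):int(0.8 * n)],
    'coco_test.json': img_ids[int(0.8 * n):],
}

for fname, ids in splits.items():
    # Keep only the images/annotations belonging to this split.
    subset = {
        'images': coco.loadImgs(ids),
        'annotations': coco.loadAnns(coco.getAnnIds(imgIds=ids)),
        'categories': coco.dataset.get('categories', []),
    }
    with open(f'coco_files/{fname}', 'w') as f:
        json.dump(subset, f)
```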

After this initial setup, patches_dataset.py can be used in a PyTorch deep learning pipeline to load, prepare and return patches from the dataset according to the split dictated by the COCO files. This Dataset class has the following features:
- Reads the netCDF files of the dataset containing the Sentinel-2 observations over time and the corresponding labels.
- Isolates the Sentinel-2 bands requested by the user.
- Computes the median Sentinel-2 image at a given frequency, e.g. monthly (or loads precomputed medians, if any).
- Returns the timeseries of median images inside a predefined window.
- Normalizes the images.
- Returns Hollstein masks for clouds, cirrus, shadow or snow.
- Returns a parcel mask: 1 for parcel, 0 for non-parcel.
- Can alternatively return binary labels: 1 for crops, 0 for non-crops (a toy sketch of these last two mask types follows below).
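As a toy illustration of the last two items only (not the actual implementation in patches_dataset.py), the two mask types can be thought of as simple thresholdings of the label and parcel rasters, assuming 0 encodes background / no parcel:

```python3
import numpy as np

# Toy 3x3 stand-ins for the (H, W) rasters stored in a patch:
# crop-type codes (0 assumed to be background) and parcel ids (0 assumed to be no parcel).
labels = np.array([[0, 12, 12], [0, 12, 7], [0, 0, 7]])
parcels = np.array([[0, 101, 101], [0, 101, 205], [0, 0, 205]])

parcel_mask = (parcels > 0).astype(np.uint8)   # 1 for parcel, 0 for non-parcel
binary_labels = (labels > 0).astype(np.uint8)  # 1 for crops, 0 for non-crops
print(parcel_mask)
print(binary_labels)
```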

Dataset exploration

This is roughly the way that our patches_dataset.py works. The whole procedure is also described in the provided notebook.

  1. Open a netCDF file for exploration.

```python3
import netCDF4
from pathlib import Path

patch = netCDF4.Dataset(Path('data/2020_31TCG_patch_14_14.nc'), 'r')
patch
```

Outputs:

```python3
"""
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    title: S4A Patch Dataset
    authors: Papoutsis I., Sykas D., Zografakis D., Sdraka M.
    patch_full_name: 2020_31TCG_patch_14_14
    patch_year: 2020
    patch_name: patch_14_14
    patch_country_code: ES
    patch_tile: 31TCG
    creation_date: 27 Apr 2021
    references: Documentation available at .
    institution: National Observatory of Athens.
    version: 21.03
    _format: NETCDF4
    _nco_version: netCDF Operators version 4.9.1 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
    _xarray_version: 0.17.0
    dimensions(sizes):
    variables(dimensions):
    groups: B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B8A, labels, parcels
"""
```

  2. Visualize a single timestamp.

```python3
import xarray as xr

band_data = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['B02']))
band_data.B02.isel(time=0).plot()
```

Single Month

  3. Visualize the labels:

```python3
labels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['labels']))
labels.labels.plot()
```

Labels

  4. Visualize the parcels:

```python3
parcels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['parcels']))
parcels.parcels.plot()
```

Parcels

  5. Plot the median of observations for each month:

```python3
import pandas as pd

# Or maybe aggregate based on a given frequency. Refer to
# https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
group_freq = '1MS'

# Grab year from netCDF4's global attribute
year = patch.patch_year

# Output intervals
date_range = pd.date_range(start=f'{year}-01-01', end=f'{int(year) + 1}-01-01', freq=group_freq)

# Aggregate based on given frequency
band_data = band_data.groupby_bins(
    'time',
    bins=date_range,
    right=True,
    include_lowest=False,
    labels=date_range[:-1]
).median(dim='time')
```

If you plot right now, you might notice that some months are empty:

Single Month
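For instance, a quick faceted plot over the aggregated months makes the gaps visible. This uses xarray's built-in faceting on the time_bins dimension created above; any other plotting approach works just as well:

```python3
import matplotlib.pyplot as plt

# Facet the monthly medians computed above; months without observations show up as blank panels.
band_data.B02.plot(col='time_bins', col_wrap=4)
plt.show()
```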

(Optional) Fill in empty months:

```python3
import matplotlib.pyplot as plt

band_data = band_data.interpolate_na(dim='time_bins', method='linear', fill_value='extrapolate')

fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(18, 12))

for i, season in enumerate(band_data.B02):
    ax = axes.flat[i]
    cax = band_data.B02.isel(time_bins=i).plot(ax=ax)

for i, ax in enumerate(axes.flat):
    ax.axes.get_xaxis().set_ticklabels([])
    ax.axes.get_yaxis().set_ticklabels([])
    ax.axes.axis('tight')
    ax.set_xlabel('')
    ax.set_ylabel('')
    ax.set_title(f'Month: {i+1}')

plt.tight_layout()
plt.show()
```

Per Month

PatchesDataset usage example

Please refer to the provided notebook for a detailed usage example of the PatchesDataset.

  1. Read the COCO file to be used.

```python3
from pathlib import Path
from pycocotools.coco import COCO

root_path_coco = Path('coco_files/')
coco_train = COCO(root_path_coco / 'coco_example.json')
```

  2. Initialize the PatchesDataset.

```python3
from torch.utils.data import DataLoader
from patches_dataset import PatchesDataset
from utils.config import LINEAR_ENCODER

root_path_netcdf = Path('dataset/netcdf')  # Path to the netCDF files

dataset_train = PatchesDataset(
    root_path_netcdf=root_path_netcdf,
    coco=coco_train,
    group_freq='1MS',
    prefix='test_patchesdataset',
    bands=['B02', 'B03', 'B04'],
    linear_encoder=LINEAR_ENCODER,
    saved_medians=False,
    window_len=6,
    requires_norm=False,
    return_masks=False,
    clouds=False,
    cirrus=False,
    shadow=False,
    snow=False,
    output_size=(183, 183)
)
```

  3. Initialize the DataLoader.

```python3
dataloader_train = DataLoader(
    dataset_train,
    batch_size=1,
    shuffle=True,
    num_workers=4,
    pin_memory=True
)
```

  4. Get a batch.

```python3
batch = next(iter(dataloader_train))
```

The batch variable is a dictionary containing the keys medians, labels and idx. batch['medians'] contains a PyTorch tensor of size [1, 6, 3, 183, 183] where:
- batch size: 1
- timestamps: 6
- bands: 3
- height: 183
- width: 183

Batch Medians

batch['labels'] contains the corresponding labels of the medians, a PyTorch tensor of size [1, 183, 183] where:
- batch size: 1
- height: 183
- width: 183

Batch Labels

batch['idx'] contains the index of the returned timeseries.
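Putting it together, here is a minimal sketch of consuming the loader in a training-style loop. The shapes follow from the configuration above; the model itself is left out since it depends on your pipeline:

```python3
for batch in dataloader_train:
    medians = batch['medians']  # [1, 6, 3, 183, 183]: batch x timestamps x bands x height x width
    labels = batch['labels']    # [1, 183, 183]: batch x height x width
    print(medians.shape, labels.shape, batch['idx'])
    break  # inspect a single batch only
```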

Webpage

Dataset Webpage: https://www.sen4agrinet.space.noa.gr/

Experiments

Please visit Sen4AgriNet-Models for a complete experimentation pipeline using the Sen4AgriNet dataset.

Citation

To cite please use:

```
@ARTICLE{9749916,
  author={Sykas, Dimitrios and Sdraka, Maria and Zografakis, Dimitrios and Papoutsis, Ioannis},
  journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
  title={A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning},
  year={2022},
  doi={10.1109/JSTARS.2022.3164771}
}
```

Owner

  • Name: Orion Lab
  • Login: Orion-AI-Lab
  • Kind: organization
  • Email: ipapoutsis@noa.gr
  • Location: Greece

Orion Lab research group: Deep Learning in Earth Observation at the National Observatory of Athens

GitHub Events

Total
  • Issues event: 4
  • Watch event: 11
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 2
Last Year
  • Issues event: 4
  • Watch event: 11
  • Issue comment event: 1
  • Push event: 1
  • Fork event: 2

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 18
  • Total Committers: 3
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.444
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • masdra (p****s@g****m): 10 commits
  • Maria Sdraka (p****s@u****m): 6 commits
  • dimzog (z****3@g****m): 2 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 1
  • Average time to close issues: about 7 hours
  • Average time to close pull requests: 15 days
  • Total issue authors: 7
  • Total pull request authors: 1
  • Average comments per issue: 1.78
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • StorywithLove (2)
  • VSainteuf (2)
  • Spiruel (1)
  • PeterKKan (1)
  • Multihuntr (1)
  • nilsleh (1)
  • saqibzia-dev (1)
Pull Request Authors
  • paren8esis (1)