cultionet

Image segmentation of cultivated land

https://github.com/jgrss/cultionet

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com, mdpi.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

agriculture cropland crops deep-learning fields land-cover pytorch pytorch-lightning remote-sensing satellite
Last synced: 6 months ago

Repository

Image segmentation of cultivated land

Basic Info
  • Host: GitHub
  • Owner: jgrss
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 55.1 MB
Statistics
  • Stars: 30
  • Watchers: 1
  • Forks: 5
  • Open Issues: 7
  • Releases: 25
Created almost 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License

README.md


Cultionet

Cultionet is a library for semantic segmentation of cultivated land with a neural network. The base architecture is a UNet variant, inspired by UNet 3+ and Psi-Net, with convolution blocks following ResUNet-a. The library is built on PyTorch Lightning and the segmentation objectives (class targets and losses) were designed following previous work in the remote sensing community.

Key features of Cultionet:

Install Cultionet

If PyTorch is already installed:

```commandline
pip install git+https://github.com/jgrss/cultionet.git
```

See the installation section for more detailed instructions.


Data format

The model inputs are satellite time series (e.g., bands or spectral indices). Data are stored in a PyTorch Data object. For example, Cultionet datasets will have data that look something like the following.

```python
Data(
    x=[1, 3, 12, 100, 100],
    y=[1, 100, 100],
    bdist=[1, 100, 100],
    start_year=torch.tensor([2020]),
    end_year=torch.tensor([2021]),
    left=torch.tensor([<longitude>]),
    bottom=torch.tensor([<latitude>]),
    right=torch.tensor([<longitude>]),
    top=torch.tensor([<latitude>]),
    res=torch.tensor([10.0]),
    batch_id=['{site id}_2021_1_none'],
)
```

where

  • x = input features = torch.Tensor of (batch x channels/bands x time x height x width)
  • y = labels = torch.Tensor of (batch x height x width)
  • bdist = distance transform = torch.Tensor of (batch x height x width)
  • left = image left coordinate bounds = torch.Tensor
  • bottom = image bottom coordinate bounds = torch.Tensor
  • right = image right coordinate bounds = torch.Tensor
  • top = image top coordinate bounds = torch.Tensor
  • res = image spatial resolution = torch.Tensor
  • batch_id = image id = list
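The shapes above determine how much memory a single sample occupies. The following is a minimal sketch, not part of Cultionet, that estimates the size of a dense tensor and checks that the label dimensions match the feature dimensions. The helper names and the float32 assumption (4 bytes per value) are illustrative only.

```python
from math import prod


def tensor_nbytes(shape, bytes_per_value=4):
    """Estimate the memory of a dense tensor (float32 assumed by default)."""
    return prod(shape) * bytes_per_value


def spatial_dims_match(x_shape, y_shape):
    """Check that x (B x C x T x H x W) and y (B x H x W) share batch, height, width."""
    return x_shape[0] == y_shape[0] and tuple(x_shape[-2:]) == tuple(y_shape[-2:])


# Shapes taken from the example Data object above.
x_shape = (1, 3, 12, 100, 100)
y_shape = (1, 100, 100)

print(tensor_nbytes(x_shape))  # 1440000 bytes, i.e. about 1.4 MB per sample
print(spatial_dims_match(x_shape, y_shape))  # True
```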

Datasets

Create the vector training dataset

Training data pairs should consist of two files per grid/year. One file is a polygon vector file (stored as a GeoPandas-compatible format like GeoPackage) of the training grid for a region. The other file is a polygon vector file (stored in the same format) of the training labels for a grid.

What is a grid?

A grid defines an area to be labeled. For example, a grid could be 1 km x 1 km. A grid should be small enough to be combined with other grids in batches in GPU memory. Thus, 1 km x 1 km is a good size with, say, Sentinel-2 imagery at 10 m spatial resolution.

Note: grids across a study area should all be of equal dimensions.
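As a quick sanity check, the grid size in pixels follows from the grid extent and the image resolution. The sketch below uses a hypothetical helper (not a Cultionet function) to show that a 1 km grid at 10 m resolution corresponds to 100 x 100 pixels.

```python
def grid_pixels(grid_size_m, resolution_m):
    """Return the number of pixels along one side of a square grid."""
    pixels, remainder = divmod(grid_size_m, resolution_m)
    if remainder:
        raise ValueError("Grid size should be a multiple of the resolution.")
    return int(pixels)


# 1 km x 1 km grid with Sentinel-2 imagery at 10 m resolution
side = grid_pixels(1000, 10)
print(side, "x", side)  # 100 x 100
```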

What is a training label?

Training labels are polygons of delineated cropland (i.e., crop fields). The training labels will be clipped to the training grid (described above). Thus, it is important to digitize all crop fields within a grid unless data are to be used for partial labels.

Configuration file

The configuration file is used to create training datasets. Copy the config template and modify it accordingly.

Training data requirements

The polygon vector file should have a field with values for crop fields set equal to 1. Other crop classes are allowed and can be recoded during the data creation step. However, the current version of cultionet expects the final data to be binary (i.e., 0=non-cropland; 1=cropland). For grids with all null data (i.e., non-crop), simply create a grid file with no intersecting crop polygons.
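To illustrate what a binary target means here, the sketch below recodes a list of crop class values to 0/1. This is only an illustration of the idea; `recode_to_binary` and its `crop_values` argument are hypothetical and do not reflect how Cultionet performs recoding internally.

```python
def recode_to_binary(crop_classes, crop_values=frozenset({1})):
    """Map every value in `crop_values` to 1 (cropland) and all others to 0."""
    return [1 if value in crop_values else 0 for value in crop_classes]


# Example: classes 1, 2, and 5 are crop types; 0 is non-cropland.
labels = [1, 2, 0, 5]
print(recode_to_binary(labels, crop_values={1, 2, 5}))  # [1, 1, 0, 1]
```

With geopandas, the same mapping could be applied to the crop class column of the polygon file before running the data creation step.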

Training name requirements

There are no requirements. Simply specify the paths in the configuration file.

Example directory structure and format for training data. For each region, there is a grid file and a polygon file. The number of grid/polygon pairs within the region is unlimited.

```yaml
region_id_file:
  - /user_data/training/grid_REGION_A_YEAR.gpkg
  - /user_data/training/grid_REGION_B_YEAR.gpkg
  - ...

polygon_file:
  - /user_data/training/crop_polygons_REGION_A_YEAR.gpkg
  - /user_data/training/crop_polygons_REGION_B_YEAR.gpkg
  - ...
```

The grid file should contain polygons of the AOIs. The AOIs represent the area that imagery will be clipped and masked to (only 1 km x 1 km has been tested). Required columns include 'geo_id' and 'year', which are a unique identifier and the sampling year, respectively.

```python
import geopandas as gpd

grid_df = gpd.read_file("/user_data/training/grid_REGION_A_YEAR.gpkg")
grid_df.head(2)

#                                       geo_id  year        geometry
# 0  REGION_A_e3a4f2346f50984d87190249a5def1d0  2021  POLYGON ((...
# 1  REGION_A_18485a3271482f2f8a10bb16ae59be74  2021  POLYGON ((...
```
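Before creating datasets, it can help to confirm that a grid file has the required columns and that the identifiers are unique. The sketch below uses only the standard library; the helper names are hypothetical and not part of Cultionet.

```python
from collections import Counter

# Columns the grid file is required to contain.
REQUIRED_GRID_COLUMNS = {"geo_id", "year"}


def missing_grid_columns(columns):
    """Return the required column names that are absent, sorted by name."""
    return sorted(REQUIRED_GRID_COLUMNS - set(columns))


def duplicated_ids(geo_ids):
    """Return identifiers that occur more than once, sorted by name."""
    counts = Counter(geo_ids)
    return sorted(geo_id for geo_id, n in counts.items() if n > 1)


print(missing_grid_columns(["geo_id", "geometry"]))  # ['year']
print(duplicated_ids(["a", "b", "a"]))  # ['a']
```

With a GeoDataFrame, these could be called as `missing_grid_columns(grid_df.columns)` and `duplicated_ids(grid_df["geo_id"])`.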

The polygon file should contain polygons of field boundaries, with a column for the crop class. Any number of other columns can be included. Note that polygons do not need to be clipped to the grids.

```python
import geopandas as gpd

poly_df = gpd.read_file("/user_data/training/crop_polygons_REGION_A_YEAR.gpkg")
poly_df.head(2)

#    crop_class        geometry
# 0           1  POLYGON ((...
# 1           1  POLYGON ((...
```

Create the image time series

This must be done outside of Cultionet. Essentially, a directory with band or VI time series must be generated before using Cultionet.

  • The raster files should be stored as GeoTiffs with names that follow a date format (e.g., yyyyddd.tif or yyyymmdd.tif).
    • The date format can be specified at the CLI.
  • There is no maximum requirement on the temporal frequency (i.e., daily, weekly, bi-weekly, monthly, etc.).
    • Just note that a higher frequency will result in larger memory footprints for the GPU, plus slower training and inference.
  • While there is no requirement for the time series frequency, time series must have different start and end years.
    • For example, a northern hemisphere time series might consist of (1 Jan 2020 to 1 Jan 2021) whereas a southern hemisphere time series might range from (1 July 2020 to 1 July 2021). In either case, note that something like (1 Jan 2020 to 1 Dec 2020) will not work.
  • Time series should align with the training data files. More specifically, the training data year (year in the grid vector file) should correspond to the time series start year.
    • For example, a training grid 'year' column equal to 2022 should be trained on a 2022-2023 image time series.
  • The image time series footprints (bounding box) can be of any size, but should encompass the training data bounds. During data creation (next step below), only the relevant bounds of the image are extracted and matched with the training data using the training grid bounds.
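The naming and year rules above can be checked before running Cultionet. The following sketch parses dates from file names and verifies that the series spans different start and end years and that the start year matches the grid year. The function names are hypothetical; the default format `%Y%j` (year plus day of year) is an assumption matching names like `2022001.tif`, and `%Y%m%d` can be passed for yyyymmdd names.

```python
from datetime import datetime
from pathlib import Path


def parse_image_date(filename, date_format="%Y%j"):
    """Parse the acquisition date from a file name such as '2022001.tif'."""
    return datetime.strptime(Path(filename).stem, date_format)


def check_time_series(filenames, grid_year, date_format="%Y%j"):
    """Return (start_year, end_year) or raise ValueError if a rule is violated."""
    dates = sorted(parse_image_date(name, date_format) for name in filenames)
    start_year, end_year = dates[0].year, dates[-1].year
    if start_year == end_year:
        raise ValueError("Time series must have different start and end years.")
    if start_year != grid_year:
        raise ValueError("Grid year should equal the time series start year.")
    return start_year, end_year


files = ["2022001.tif", "2022014.tif", "2023001.tif"]
print(check_time_series(files, grid_year=2022))  # (2022, 2023)
```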

Example time series directory with bi-weekly cadence for three VIs (i.e., evi2, gcvi, kndvi)

```yaml
project_dir:
  time_series_vars:
    grid_id_a:
      evi2:
        2022001.tif
        2022014.tif
        ...
        2023001.tif
      gcvi:
        <repeat of above>
      kndvi:
        <repeat of above>
    grid_id_b:
      <repeat of above>
```

Create the time series training dataset

After training data and image time series have been created, the training data PyTorch files (.pt) can be generated using the commands below.

Note: Modify a copy of the config template as needed and save in the project directory. The command below assumes image time series are saved under /project_dir/time_series_vars. The training polygon and grid paths are taken from the config.yml file.

This command would generate .pt files with image time series of 100 x 100 height/width and a spatial resolution of 10 meters.

```commandline
# Activate your virtual environment. See installation section below for environment details.
pyenv activate venv.cultionet

# Create the training dataset.
(venv.cultionet) cultionet create --project-path /project_dir --grid-size 100 100 --destination train -r 10.0 --max-crop-class 1 --crop-column crop_class --image-date-format %Y%m%d --num-workers 8 --config-file config.yml
```

The output .pt data files will be stored in /project_dir/data/train/processed. Each .pt data file will consist of all the information needed to train the segmentation model.
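A quick way to confirm that the dataset was created is to list the generated files. This is a small sketch using only the standard library; `list_training_files` is a hypothetical helper, and the directory layout is the one stated above.

```python
from pathlib import Path


def list_training_files(project_dir):
    """Return the sorted .pt files under <project_dir>/data/train/processed."""
    processed_dir = Path(project_dir) / "data" / "train" / "processed"
    return sorted(processed_dir.glob("*.pt"))


pt_files = list_training_files("/project_dir")
print(f"Found {len(pt_files)} training files")
```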

Training a model

To train a model on a dataset, use (as an example):

```commandline
(venv.cultionet) cultionet train --val-frac 0.2 --augment-prob 0.5 --epochs 100 --hidden-channels 32 --processes 8 --load-batch-workers 8 --batch-size 4 --accumulate-grad-batches 4 --dropout 0.2 --deep-sup --dilations 1 2 --pool-by-max --learning-rate 0.01 --weight-decay 1e-4 --attention-weights natten
```

For more CLI options, see:

```commandline
(venv.cultionet) cultionet train -h
```

After a model has been fit, the best/last checkpoint file can be found at /project_dir/ckpt/last.ckpt.
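When choosing `--batch-size` and `--accumulate-grad-batches`, it can be useful to compute the effective batch size per optimizer step. The sketch below assumes these options are passed through to PyTorch Lightning's gradient accumulation unchanged, which this README does not confirm. The helper is hypothetical.

```python
def effective_batch_size(batch_size, accumulate_grad_batches, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return batch_size * accumulate_grad_batches * num_devices


# Values from the example training command: --batch-size 4 --accumulate-grad-batches 4
print(effective_batch_size(4, 4))  # 16
```

Accumulating gradients keeps per-batch GPU memory at the size of 4 samples while each update is computed from 16.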

Predicting on an image with a trained model

First, a prediction dataset is needed:

```commandline
(venv.cultionet) cultionet create-predict --project-path /project_dir --year 2022 --ts-path /features --num-workers 4 --config-file project_config.yml
```

Apply inference over the prediction dataset:

```commandline
(venv.cultionet) cultionet predict --project-path /project_dir --out-path predictions.tif --grid-id 1 --window-size 100 --config-file project_config.yml --device gpu --processes 4
```

Installation

Install Cultionet (assumes a working CUDA installation)

  1. Create a new virtual environment (example using pyenv)

     ```commandline
     pyenv virtualenv 3.10.14 venv.cultionet
     pyenv activate venv.cultionet
     ```

  2. Update pip, then install numpy and Python GDAL (assumes GDAL binaries are already installed)

     ```commandline
     (venv.cultionet) pip install -U pip
     (venv.cultionet) pip install -U setuptools wheel
     (venv.cultionet) pip install -U numpy==1.24.4
     (venv.cultionet) pip install setuptools==57.5.0
     (venv.cultionet) GDAL_VERSION=$(gdal-config --version | awk -F'[.]' '{print $1"."$2"."$3}')
     (venv.cultionet) pip install GDAL==$GDAL_VERSION --no-binary=gdal
     ```

  3. Install PyTorch 2.2.2 for CUDA 11.4 and 11.8

     ```commandline
     (venv.cultionet) pip install -U --no-cache-dir 'setuptools>=65.5.1'
     (venv.cultionet) pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
     ```

     The command below should print True if PyTorch can access a GPU.

     ```commandline
     python -c "import torch;print(torch.cuda.is_available())"
     ```

  4. Install natten for CUDA 11.8 if using neighborhood attention.

     ```commandline
     (venv.cultionet) pip install natten==0.17.1+torch220cu118 -f https://shi-labs.com/natten/wheels
     ```

  5. Install cultionet

     ```commandline
     (venv.cultionet) pip install git+https://github.com/jgrss/cultionet.git
     ```

Installing CUDA on Ubuntu

See CUDA installation

Owner

  • Name: Jordan Graesser
  • Login: jgrss
  • Kind: user

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 112
  • Total Committers: 5
  • Avg Commits per committer: 22.4
  • Development Distribution Score (DDS): 0.277
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Jordan Graesser j****s 81
jgrss j****r@g****m 26
Michael Mann m****3@g****m 2
MatthewPierson90 7****0 2
nnguyen622 6****2 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 19
  • Total pull requests: 72
  • Average time to close issues: 6 months
  • Average time to close pull requests: 12 days
  • Total issue authors: 4
  • Total pull request authors: 4
  • Average comments per issue: 7.11
  • Average comments per pull request: 0.08
  • Merged pull requests: 63
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 16
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 hour
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MatthewPierson90 (10)
  • mmann1123 (4)
  • jgrss (3)
  • varunshah111 (1)
Pull Request Authors
  • jgrss (72)
  • MatthewPierson90 (5)
  • mmann1123 (2)
  • nnguyen622 (1)
Top Labels
Issue Labels
documentation (2) bug (1) enhancement (1)
Pull Request Labels
bug (17) enhancement (11) documentation (8)

Dependencies

docs/requirements.txt pypi
  • PyYAML >=5.1
  • attrs >=21.
  • decorator ==4.4.2
  • frozendict >=2.2.
  • frozenlist >=1.3.
  • future >=0.17.1
  • geopandas >=0.10.
  • graphviz >=0.19.
  • numpy <=1.21.0
  • numpydoc *
  • opencv-python >=4.5.5.
  • pandas <=1.3.5
  • pyDeprecate ==0.3.1
  • pytorch_lightning >=1.5.9
  • rasterio *
  • rtree >=0.9.7
  • scikit-image >=0.19.
  • scipy >=1.2.
  • setuptools ==59.5.0
  • shapely >=1.8.
  • sphinx *
  • sphinx-automodapi *
  • sphinxcontrib-apidoc *
  • sphinxcontrib-bibtex ==1.0.0
  • sphinxcontrib-napoleon *
  • tensorboard >=2.2.0
  • torch *
  • torch-geometric >=2.0.2
  • torch-geometric-temporal >=0.40
  • torchmetrics >=0.7.0
  • tqdm >=4.62.
  • xarray >=0.21.
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • syphar/restore-pip-download-cache v1 composite
  • syphar/restore-virtualenv v1 composite
Dockerfile docker
  • nvidia/cuda 11.3.0-base-ubuntu20.04 build
environment.yml pypi
pyproject.toml pypi
setup.py pypi