Keywords
Repository
Image segmentation of cultivated land
Basic Info
Statistics
- Stars: 30
- Watchers: 1
- Forks: 5
- Open Issues: 7
- Releases: 25
Topics
Metadata Files
README.md
Cultionet
Cultionet is a library for semantic segmentation of cultivated land with a neural network. The base architecture is a UNet variant, inspired by UNet 3+ and Psi-Net, with convolution blocks following ResUNet-a. The library is built on PyTorch Lightning and the segmentation objectives (class targets and losses) were designed following previous work in the remote sensing community.
Key features of Cultionet:
- uses satellite image time series instead of individual dates for training and inference
- uses Transformer time series embeddings
- uses a UNet architecture with skip connections and deep supervision similar to UNet 3+
- uses multi-stream outputs inspired by Psi-Net
- uses residual ResUNet-a blocks with Dilated Neighborhood Attention
- uses the Tanimoto loss
Install Cultionet
If PyTorch is installed
```commandline
pip install git+ssh://git@github.com/jgrss/cultionet.git
```
See the installation section for more detailed instructions.
Data format
The model inputs are satellite time series (e.g., bands or spectral indices). Data are stored in a PyTorch Data object. For example, Cultionet datasets will have data that look something like the following.
```python
Data(
    x=[1, 3, 12, 100, 100], y=[1, 100, 100], bdist=[1, 100, 100],
    start_year=torch.tensor([2020]), end_year=torch.tensor([2021]),
    left=torch.tensor([<longitude>]), bottom=torch.tensor([<latitude>]),
    right=torch.tensor([<longitude>]), top=torch.tensor([<latitude>]),
    res=torch.tensor([10.0]), batch_id=['{site id}_2021_1_none'],
)
```
where
x = input features = torch.Tensor of (batch x channels/bands x time x height x width)
y = labels = torch.Tensor of (batch x height x width)
bdist = distance transform = torch.Tensor of (batch x height x width)
left = image left coordinate bounds = torch.Tensor
bottom = image bottom coordinate bounds = torch.Tensor
right = image right coordinate bounds = torch.Tensor
top = image top coordinate bounds = torch.Tensor
res = image spatial resolution = torch.Tensor
batch_id = image id = list
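The shape conventions above imply that `y` and `bdist` share the batch, height, and width of `x`. The following is a minimal sketch of that consistency check using plain tuples; `SampleShapes` and `is_consistent` are hypothetical names used for illustration and are not part of Cultionet.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SampleShapes:
    """Shapes of the main tensors in one sample (hypothetical helper)."""

    x: Tuple[int, int, int, int, int]  # (batch, channels/bands, time, height, width)
    y: Tuple[int, int, int]  # (batch, height, width)
    bdist: Tuple[int, int, int]  # (batch, height, width)

    def is_consistent(self) -> bool:
        """Return True if y and bdist match the batch/height/width of x."""
        batch, _channels, _time, height, width = self.x
        expected = (batch, height, width)
        return self.y == expected and self.bdist == expected


# Shapes taken from the example Data object above.
shapes = SampleShapes(x=(1, 3, 12, 100, 100), y=(1, 100, 100), bdist=(1, 100, 100))
print(shapes.is_consistent())  # True
```

With real data, the same comparison could be made on the `.shape` attribute of each tensor.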
Datasets
Create the vector training dataset
Training data pairs should consist of two files per grid/year. One file is a polygon vector file (stored in a GeoPandas-compatible format like GeoPackage) of the training grid for a region. The other file is a polygon vector file (stored in the same format) of the training labels for a grid.
What is a grid?
A grid defines an area to be labeled. For example, a grid could be 1 km x 1 km. A grid should be small enough to be combined with other grids in batches in GPU memory. Thus, 1 km x 1 km is a good size with, say, Sentinel-2 imagery at 10 m spatial resolution.
Note: grids across a study area should all be of equal dimensions
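As a quick check on the numbers above, a 1 km grid at 10 m resolution corresponds to a 100 x 100 pixel array, which matches the height and width in the data format example. The helper below is a small sketch of that arithmetic; `grid_size_in_pixels` is a hypothetical name, not a Cultionet function.

```python
def grid_size_in_pixels(grid_length_m: float, resolution_m: float) -> int:
    """Return the number of pixels along one side of a square grid."""
    if grid_length_m <= 0 or resolution_m <= 0:
        raise ValueError("grid length and resolution must be positive")
    pixels = grid_length_m / resolution_m
    if pixels != int(pixels):
        raise ValueError("grid length should be a whole multiple of the resolution")
    return int(pixels)


print(grid_size_in_pixels(1000.0, 10.0))  # 100
```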
What is a training label?
Training labels are polygons of delineated cropland (i.e., crop fields). The training labels will be clipped to the training grid (described above). Thus, it is important to digitize all crop fields within a grid unless data are to be used for partial labels.
Configuration file
The configuration file is used to create training datasets. Copy the config template and modify it accordingly.
Training data requirements
The polygon vector file should have a field with values for crop fields set equal to 1. Other crop classes are allowed and can be recoded during the data creation step. However, the current version of cultionet expects the final data to be binary (i.e., 0=non-cropland; 1=cropland). For grids with all null data (i.e., non-crop), simply create a grid file with no intersecting crop polygons.
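To illustrate the binary target described above (0 = non-cropland, 1 = cropland), here is a minimal sketch that collapses integer crop classes to binary labels. It is an assumption-laden illustration only: it treats any positive class as cropland, and it is not Cultionet's recoding logic, which is applied during the data creation step. `to_binary_labels` is a hypothetical name.

```python
from typing import Iterable, List, Optional


def to_binary_labels(crop_classes: Iterable[Optional[int]]) -> List[int]:
    """Map any positive crop class to 1 (cropland) and everything else to 0."""
    return [1 if value is not None and value > 0 else 0 for value in crop_classes]


print(to_binary_labels([1, 3, 0, None, 2]))  # [1, 1, 0, 0, 1]
```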
Training name requirements
There are no requirements. Simply specify the paths in the configuration file.
Example directory structure and format for training data. For each region, there is a grid file and a polygon file. The number of grid/polygon pairs within the region is unlimited.
```yaml
region_id_file:
  - /user_data/training/grid_REGION_A_YEAR.gpkg
  - /user_data/training/grid_REGION_B_YEAR.gpkg
  - ...

polygon_file:
  - /user_data/training/crop_polygons_REGION_A_YEAR.gpkg
  - /user_data/training/crop_polygons_REGION_B_YEAR.gpkg
  - ...
```
The grid file should contain polygons of the AOIs. The AOIs represent the area that imagery will be clipped and masked to (only 1 km x 1 km has been tested). Required columns include 'geo_id' and 'year', which are a unique identifier and the sampling year, respectively.
```python
import geopandas as gpd
grid_df = gpd.read_file("/user_data/training/grid_REGION_A_YEAR.gpkg")
grid_df.head(2)

                                      geo_id  year  geometry
0  REGION_A_e3a4f2346f50984d87190249a5def1d0  2021  POLYGON ((...
1  REGION_A_18485a3271482f2f8a10bb16ae59be74  2021  POLYGON ((...
```
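Because 'geo_id' and 'year' are required in the grid file, it can help to verify the columns before creating a dataset. The sketch below works on any iterable of column names; `missing_grid_columns` is a hypothetical helper, not part of Cultionet.

```python
from typing import Iterable, List

# Required grid columns, as stated above.
REQUIRED_GRID_COLUMNS = ("geo_id", "year")


def missing_grid_columns(columns: Iterable[str]) -> List[str]:
    """Return the required grid columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_GRID_COLUMNS if name not in present]


print(missing_grid_columns(["geo_id", "year", "geometry"]))  # []
print(missing_grid_columns(["year", "geometry"]))  # ['geo_id']
```

With a GeoDataFrame, the column names could be passed as `missing_grid_columns(grid_df.columns)`.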
The polygon file should contain polygons of field boundaries, with a column for the crop class. Any number of other columns can be included. Note that polygons do not need to be clipped to the grids.
```python
import geopandas as gpd
poly_df = gpd.read_file("/user_data/training/crop_polygons_REGION_A_YEAR.gpkg")
poly_df.head(2)

   crop_class  geometry
0           1  POLYGON ((...
1           1  POLYGON ((...
```
Create the image time series
This must be done outside of Cultionet. Essentially, a directory with band or vegetation index (VI) time series must be generated before using Cultionet.
- The raster files should be stored as GeoTIFFs with names that follow a date format (e.g., yyyyddd.tif or yyyymmdd.tif).
  - The date format can be specified at the CLI.
- There is no maximum requirement on the temporal frequency (e.g., daily, weekly, bi-weekly, monthly, etc.).
  - Just note that a higher frequency will result in larger memory footprints for the GPU, plus slower training and inference.
- While there is no requirement for the time series frequency, time series must have different start and end years.
  - For example, a northern hemisphere time series might consist of (1 Jan 2020 to 1 Jan 2021) whereas a southern hemisphere time series might range from (1 July 2020 to 1 July 2021). In either case, note that something like (1 Jan 2020 to 1 Dec 2020) will not work.
- Time series should align with the training data files. More specifically, the training data year (year in the grid vector file) should correspond to the time series start year.
  - For example, a training grid 'year' column equal to 2022 should be trained on a 2022-2023 image time series.
- The image time series footprints (bounding box) can be of any size, but should encompass the training data bounds. During data creation (next step below), only the relevant bounds of the image are extracted and matched with the training data using the training grid bounds.
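The naming and year rules in the list above can be checked before running Cultionet. The following is a minimal sketch, using only the standard library, that parses dates from file names and verifies that the series spans two different years and starts in the grid year. The function names are hypothetical, and the default format `%Y%j` (year plus day of year) is an assumption chosen to match names like yyyyddd.tif.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Iterable, Tuple


def parse_image_date(filename: str, date_format: str = "%Y%j") -> datetime:
    """Parse the date encoded in a raster file name such as 2022001.tif."""
    stem = PurePosixPath(filename).stem
    return datetime.strptime(stem, date_format)


def check_time_series(
    filenames: Iterable[str], grid_year: int, date_format: str = "%Y%j"
) -> Tuple[datetime, datetime]:
    """Return (start, end) dates after validating the rules listed above."""
    dates = sorted(parse_image_date(name, date_format) for name in filenames)
    if not dates:
        raise ValueError("no image files were given")
    start, end = dates[0], dates[-1]
    if start.year == end.year:
        raise ValueError("time series must have different start and end years")
    if start.year != grid_year:
        raise ValueError("grid year should equal the time series start year")
    return start, end


start, end = check_time_series(["2022001.tif", "2022014.tif", "2023001.tif"], 2022)
print(start.date(), end.date())  # 2022-01-01 2023-01-01
```

For names in the yyyymmdd.tif style, `date_format="%Y%m%d"` would be passed instead.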
Example time series directory with bi-weekly cadence for three VIs (i.e., evi2, gcvi, kndvi)
```yaml
project_dir:
   time_series_vars:
      grid_id_a:
         evi2:
            2022001.tif
            2022014.tif
            ...
            2023001.tif
         gcvi:
            <repeat of above>
         kndvi:
            <repeat of above>
      grid_id_b:
         <repeat of above>
```
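To confirm that a project directory follows this layout, the files can be grouped by grid id and variable. The sketch below uses only the standard library and builds a throwaway directory to demonstrate the idea; `list_time_series` is a hypothetical helper, not a Cultionet function.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def list_time_series(project_dir: str) -> Dict[Tuple[str, str], List[str]]:
    """Group GeoTIFF names by (grid id, variable) under time_series_vars."""
    root = Path(project_dir) / "time_series_vars"
    series = defaultdict(list)
    # Expected layout: time_series_vars/<grid id>/<variable>/<date>.tif
    for tif in sorted(root.glob("*/*/*.tif")):
        grid_id, variable = tif.parent.parent.name, tif.parent.name
        series[(grid_id, variable)].append(tif.name)
    return dict(series)


# Build a tiny temporary layout to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    for variable in ("evi2", "gcvi"):
        var_dir = Path(tmp) / "time_series_vars" / "grid_id_a" / variable
        var_dir.mkdir(parents=True)
        for name in ("2022001.tif", "2022014.tif", "2023001.tif"):
            (var_dir / name).touch()
    demo = list_time_series(tmp)

print(demo[("grid_id_a", "evi2")])  # ['2022001.tif', '2022014.tif', '2023001.tif']
```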
Create the time series training dataset
After training data and image time series have been created, the training data PyTorch files (.pt) can be generated using the commands below.
Note: Modify a copy of the config template as needed and save in the project directory. The command below assumes image time series are saved under /project_dir/time_series_vars. The training polygon and grid paths are taken from the config.yml file.
This command would generate .pt files with image time series of 100 x 100 height/width and a spatial resolution of 10 meters.
```commandline
# Activate your virtual environment. See installation section below for environment details.
pyenv activate venv.cultionet
# Create the training dataset.
(venv.cultionet) cultionet create --project-path /project_dir --grid-size 100 100 --destination train -r 10.0 --max-crop-class 1 --crop-column crop_class --image-date-format %Y%m%d --num-workers 8 --config-file config.yml
```
The output .pt data files will be stored in /project_dir/data/train/processed. Each .pt data file contains all the information needed to train the segmentation model.
Training a model
To train a model on a dataset, use (as an example):
```commandline
(venv.cultionet) cultionet train --val-frac 0.2 --augment-prob 0.5 --epochs 100 --hidden-channels 32 --processes 8 --load-batch-workers 8 --batch-size 4 --accumulate-grad-batches 4 --dropout 0.2 --deep-sup --dilations 1 2 --pool-by-max --learning-rate 0.01 --weight-decay 1e-4 --attention-weights natten
```
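The example command combines `--batch-size 4` with `--accumulate-grad-batches 4`. Assuming this flag maps to gradient accumulation as implemented in PyTorch Lightning (gradients are summed over several batches before each optimizer step), the effective batch size per device is the product of the two. The sketch below shows that arithmetic; the function names and the sample count of 800 are hypothetical.

```python
import math


def effective_batch_size(batch_size: int, accumulate_grad_batches: int) -> int:
    """Samples contributing to each optimizer step (per device)."""
    return batch_size * accumulate_grad_batches


def optimizer_steps_per_epoch(
    num_samples: int, batch_size: int, accumulate_grad_batches: int
) -> int:
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    return math.ceil(num_batches / accumulate_grad_batches)


print(effective_batch_size(4, 4))  # 16
print(optimizer_steps_per_epoch(800, 4, 4))  # 50
```

Accumulation trades more iterations per optimizer step for lower peak GPU memory, which is useful when larger batches of image time series do not fit in memory.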
For more CLI options, see:
```commandline
(venv.cultionet) cultionet train -h
```
After a model has been fit, the best/last checkpoint file can be found at /project_dir/ckpt/last.ckpt.
Predicting on an image with a trained model
First, create a prediction dataset:
```commandline
(venv.cultionet) cultionet create-predict --project-path /project_dir --year 2022 --ts-path /features --num-workers 4 --config-file project_config.yml
```
Then, apply inference over the prediction dataset:
```commandline
(venv.cultionet) cultionet predict --project-path /project_dir --out-path predictions.tif --grid-id 1 --window-size 100 --config-file project_config.yml --device gpu --processes 4
```
Installation
Install Cultionet (assumes a working CUDA installation)
Create a new virtual environment (example using pyenv)
```commandline
pyenv virtualenv 3.10.14 venv.cultionet
pyenv activate venv.cultionet
```
Update pip, then install numpy and Python GDAL (assumes GDAL binaries are already installed)
```commandline
(venv.cultionet) pip install -U pip
(venv.cultionet) pip install -U setuptools wheel
(venv.cultionet) pip install -U numpy==1.24.4
(venv.cultionet) pip install setuptools==57.5.0
(venv.cultionet) GDAL_VERSION=$(gdal-config --version | awk -F'[.]' '{print $1"."$2"."$3}')
(venv.cultionet) pip install GDAL==$GDAL_VERSION --no-binary=gdal
```
Install PyTorch 2.2.2 for CUDA 11.4 and 11.8
```commandline
(venv.cultionet) pip install -U --no-cache-dir "setuptools>=65.5.1"
(venv.cultionet) pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
```
The command below should print True if PyTorch can access a GPU.
```commandline
python -c "import torch;print(torch.cuda.is_available())"
```
Install natten for CUDA 11.8 if using neighborhood attention.
```commandline
(venv.cultionet) pip install natten==0.17.1+torch220cu118 -f https://shi-labs.com/natten/wheels
```
Install cultionet
```commandline
(venv.cultionet) pip install git+ssh://git@github.com/jgrss/cultionet.git
```
Installing CUDA on Ubuntu
Owner
- Name: Jordan Graesser
- Login: jgrss
- Kind: user
- Website: https://jgrss.github.io
- Repositories: 19
- Profile: https://github.com/jgrss
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jordan Graesser | j****s | 81 |
| jgrss | j****r@g****m | 26 |
| Michael Mann | m****3@g****m | 2 |
| MatthewPierson90 | 7****0 | 2 |
| nnguyen622 | 6****2 | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 19
- Total pull requests: 72
- Average time to close issues: 6 months
- Average time to close pull requests: 12 days
- Total issue authors: 4
- Total pull request authors: 4
- Average comments per issue: 7.11
- Average comments per pull request: 0.08
- Merged pull requests: 63
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 16
- Average time to close issues: N/A
- Average time to close pull requests: about 1 hour
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 14
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- MatthewPierson90 (10)
- mmann1123 (4)
- jgrss (3)
- varunshah111 (1)
Pull Request Authors
- jgrss (72)
- MatthewPierson90 (5)
- mmann1123 (2)
- nnguyen622 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- PyYAML >=5.1
- attrs >=21.
- decorator ==4.4.2
- frozendict >=2.2.
- frozenlist >=1.3.
- future >=0.17.1
- geopandas >=0.10.
- graphviz >=0.19.
- numpy <=1.21.0
- numpydoc *
- opencv-python >=4.5.5.
- pandas <=1.3.5
- pyDeprecate ==0.3.1
- pytorch_lightning >=1.5.9
- rasterio *
- rtree >=0.9.7
- scikit-image >=0.19.
- scipy >=1.2.
- setuptools ==59.5.0
- shapely >=1.8.
- sphinx *
- sphinx-automodapi *
- sphinxcontrib-apidoc *
- sphinxcontrib-bibtex ==1.0.0
- sphinxcontrib-napoleon *
- tensorboard >=2.2.0
- torch *
- torch-geometric >=2.0.2
- torch-geometric-temporal >=0.40
- torchmetrics >=0.7.0
- tqdm >=4.62.
- xarray >=0.21.
- actions/checkout v2 composite
- actions/setup-python v2 composite
- syphar/restore-pip-download-cache v1 composite
- syphar/restore-virtualenv v1 composite
- nvidia/cuda 11.3.0-base-ubuntu20.04 build