hls-foundation-os
This repository contains examples of fine-tuning the Harmonized Landsat and Sentinel-2 (HLS) Prithvi foundation model.
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ✓ Committers with academic emails: 1 of 9 committers (11.1%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.1%) to scientific vocabulary
Statistics
- Stars: 363
- Watchers: 12
- Forks: 94
- Open Issues: 33
- Releases: 0
README.md
Image segmentation by foundation model finetuning
This repository shows three examples of how Prithvi can be finetuned for downstream tasks. The examples include flood detection using Sentinel-2 data from the Sen1Floods11 dataset, burn scars detection using the NASA HLS fire scars dataset and multi-temporal crop classification using the NASA HLS multi-temporal crop classification dataset.
:mega: Update: We have built TerraTorch to facilitate finetuning diverse geospatial deep learning models. It significantly expands the MMSegmentation-based implementation below and improves usability.
The approach
Background
To finetune for these tasks in this repository, we make use of MMSegmentation, which provides an extensible framework for segmentation tasks.
MMSegmentation allows us to concatenate necks and heads appropriate for any segmentation downstream task to the encoder, and then perform the finetuning. This only requires setting up a config file detailing the desired model architecture, dataset setup and training strategy.
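As a rough illustration of what such a config file contains, the abridged skeleton below follows the MMSegmentation 0.x convention of plain Python dictionaries. The backbone and neck type names, channel sizes, paths and schedule values are placeholders chosen for illustration; they are not the values used in this repository's configs.

```python
# Illustrative MMSegmentation-style config skeleton (all values are placeholders).
num_classes = 2   # assumption: binary segmentation, e.g. flood / no flood
embed_dim = 768   # assumption: embedding size of the encoder

# Desired model architecture: encoder + neck + segmentation head.
model = dict(
    type="EncoderDecoder",                 # MMSegmentation's generic segmentor
    backbone=dict(
        type="MyPretrainedEncoder",        # hypothetical name for the encoder class
        pretrained="/path/to/weights.pt",  # placeholder path
    ),
    neck=dict(
        type="MyTokensToEmbeddingNeck",    # hypothetical name for the neck class
        embed_dim=embed_dim,
    ),
    decode_head=dict(
        type="FCNHead",                    # a standard MMSegmentation head
        in_channels=embed_dim,
        channels=256,
        num_classes=num_classes,
    ),
)

# Dataset setup: where the images and split files live (placeholder paths).
data = dict(
    samples_per_gpu=4,
    train=dict(data_root="/path/to/data", split="/path/to/train_split.txt"),
    val=dict(data_root="/path/to/data", split="/path/to/val_split.txt"),
)

# Training strategy.
optimizer = dict(type="Adam", lr=1e-5)
runner = dict(type="EpochBasedRunner", max_epochs=50)
```

The three groups (model, data, training strategy) mirror the three things the config is said to describe; a complete config needs more fields, such as the dataset type and the data pipelines.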
We build extensions on top of MMSegmentation to support our encoder and provide classes to read and augment remote sensing data (from .tiff files) using MMSegmentation data pipelines. These extensions can be found in the `geospatial_fm` directory, and they are installed as a package on top of MMSegmentation for ease of use. If more advanced functionality is necessary, it should be added there.
The pretrained backbone
The pretrained model we work with is a ViT operating as a Masked Autoencoder, trained on HLS data. The encoder from this model is made available as the backbone and the weights can be downloaded from Hugging Face here.
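For readers unfamiliar with masked autoencoding, the general idea is that a random subset of image patches is hidden and the model learns to reconstruct them from the visible ones. The sketch below shows only the random masking step using NumPy. The patch count (196) and mask ratio (0.75) are assumptions for illustration and are not taken from this repository.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask where True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    # Choose the hidden patches uniformly at random, without replacement.
    hidden = rng.choice(num_patches, size=num_masked, replace=False)
    mask[hidden] = True
    return mask

rng = np.random.default_rng(0)
mask = random_patch_mask(num_patches=196, mask_ratio=0.75, rng=rng)
visible_patches = np.arange(196)[~mask]  # only these would be fed to the encoder
```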
The architectures
We use a simple architecture that adds a neck and segmentation head to the backbone. The neck concatenates and processes the transformer's token-based embeddings into an embedding that can be fed into convolutional layers. The head processes this embedding into a segmentation mask. The code for the architecture can be found in this file.
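The core reshaping idea behind such a neck can be sketched with NumPy: a sequence of patch tokens is rearranged into a 2D feature map with the embedding dimension as the channel axis. This is a simplified illustration, not the repository's neck. It assumes a single time step, no class token, and tokens stored in row-major patch order; the function name and sizes are hypothetical.

```python
import numpy as np

def tokens_to_feature_map(tokens, grid_h, grid_w):
    """Reshape (batch, num_tokens, embed_dim) into (batch, embed_dim, grid_h, grid_w)."""
    batch, num_tokens, embed_dim = tokens.shape
    if num_tokens != grid_h * grid_w:
        raise ValueError("num_tokens must equal grid_h * grid_w")
    # Tokens are assumed to be in row-major patch order.
    grid = tokens.reshape(batch, grid_h, grid_w, embed_dim)
    # Move the embedding axis to the channel position used by convolutions.
    return grid.transpose(0, 3, 1, 2)

tokens = np.arange(2 * 196 * 8, dtype=np.float32).reshape(2, 196, 8)
feature_map = tokens_to_feature_map(tokens, grid_h=14, grid_w=14)
```

The resulting array has the layout that convolutional layers expect, which is what allows a convolutional segmentation head to follow the transformer encoder.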
The pipeline
Additionally, we provide extra components for data loading pipelines in geospatial_pipelines.py. These are documented in the file.
We observe the MMCV convention that all operations assume a channel-last format.
However, we also introduce some components with the prefix Torch, such as TorchNormalize. These components assume the torch convention of channel-first.
At some point during the pipeline, before feeding the data to the model, it is necessary to change to channel-first format.
We recommend implementing the change after the ToTensor operation (which is also necessary at some point), using the TorchPermute operation.
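The difference between the two layouts can be shown with a plain NumPy transpose. This only illustrates the axis reordering that a permute step performs; it is not the TorchPermute component itself, and the image size and band count are assumptions.

```python
import numpy as np

# Channel-last layout: (height, width, channels).
image_hwc = np.zeros((224, 224, 6), dtype=np.float32)

# Channel-first layout expected by torch modules: (channels, height, width).
image_chw = np.transpose(image_hwc, (2, 0, 1))

print(image_hwc.shape, "->", image_chw.shape)
```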
Tutorial
Check out the exploration notebook for a more in-depth example of the usage of the model.
Setup
Dependencies
- Clone this repository
- `conda create -n <environment-name> python==3.9`
- `conda activate <environment-name>`
- Install torch (tested for >=1.7.1 and <=1.11.0) and torchvision (tested for >=0.8.2 and <=0.12). May vary with your system. Please check at: https://pytorch.org/get-started/previous-versions/.
  - e.g.: `pip install torch==1.11.0+cu115 torchvision==0.12.0+cu115 --extra-index-url https://download.pytorch.org/whl/cu115`
- `cd` into the cloned repo
- `pip install -e .`
- `pip install -U openmim`
- `mim install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/{cuda_version}/{torch_version}/index.html`. Note that pre-built wheels (fast installs without needing to build) only exist for some versions of torch and CUDA. Check compatibilities here: https://mmcv.readthedocs.io/en/v1.6.2/get_started/installation.html
  - e.g.: `mim install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/cu115/torch1.11.0/index.html`
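Before installing the remaining packages, it can help to confirm that a torch or torchvision version falls inside the tested ranges quoted above. The helper below is a minimal sketch with hypothetical function names. It handles plain release versions with an optional local suffix such as `+cu115`, and it does not handle pre-release tags.

```python
def parse_version(version):
    """Turn '1.11.0+cu115' into (1, 11, 0), ignoring the local build suffix."""
    core = version.split("+")[0]
    parts = [int(part) for part in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '0.12' to (0, 12, 0)
    return tuple(parts)

def in_tested_range(version, lowest, highest):
    """Check lowest <= version <= highest using numeric comparison."""
    return parse_version(lowest) <= parse_version(version) <= parse_version(highest)

print(in_tested_range("1.11.0+cu115", "1.7.1", "1.11.0"))  # torch
print(in_tested_range("0.12.0+cu115", "0.8.2", "0.12"))    # torchvision
```

In an existing environment the installed version string can be read from `torch.__version__` and passed to `in_tested_range`.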
Alternate Setup (Windows Users - Tested for Windows 10)
- `conda create -n <environment-name> python=3.9`
- `conda activate <environment-name>`
- Install torch (tested for >=1.7.1 and <=1.11.0) and torchvision (tested for >=0.8.2 and <=0.12). May vary with your system. Please check at: https://pytorch.org/get-started/previous-versions/.
  - e.g.: `pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113`
- `git clone https://github.com/NASA-IMPACT/hls-foundation-os.git <your-local-path>\hls-foundation-os`
- `git clone https://github.com/open-mmlab/mmsegmentation.git <your-local-path>\mmsegmentation`
- `cd <your-local-path>\mmsegmentation`
- Checkout mmsegmentation version compatible with hls-foundation: `git checkout 186572a3ce64ac9b6b37e66d58c76515000c3280`
- Modify setup.py so it installs from the cloned mmsegmentation. Change line `mmsegmentation @ git+https://github.com/open-mmlab/mmsegmentation.git@186572a3ce64ac9b6b37e66d58c76515000c3280` to `mmsegmentation @ file:///<your-local-path>/mmsegmentation`
- `cd <your-local-path>\hls-foundation-os`
- `pip install -e .`
- `pip install -U openmim`
- `mim install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/{cuda_version}/{torch_version}/index.html`. Note that pre-built wheels (fast installs without needing to build) only exist for some versions of torch and CUDA. Check compatibilities here: https://mmcv.readthedocs.io/en/v1.6.2/get_started/installation.html
  - e.g.: `mim install mmcv-full==1.6.2 -f https://download.openmmlab.com/mmcv/dist/cu115/torch1.11.0/index.html`
- `conda install -c conda-forge opencv`
- `pip install datasets`
Data
The flood detection dataset can be downloaded from Sen1Floods11. Splits in the mmsegmentation format are available in the data_splits folders.
The NASA HLS fire scars dataset can be downloaded from Hugging Face.
The NASA HLS multi-temporal crop classification dataset can be downloaded from Hugging Face.
Using git-lfs you can download the data as in the following example:
```sh
# from: https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification

# extract files
cd multi-temporal-crop-classification
tar -xvf training_chips.tgz && tar -xvf validation_chips.tgz
```
Without git-lfs (Credit @robmarkcole):
```sh
mkdir data
cd data

mkdir multi-temporal-crop-classification
cd multi-temporal-crop-classification

# note: this can take some time and appear to hang, be patient
wget "https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification/resolve/main/training_chips.tgz?download=true" -O training_chips.tgz
tar -xvzf training_chips.tgz

wget "https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification/resolve/main/validation_chips.tgz?download=true" -O validation_chips.tgz
tar -xvzf validation_chips.tgz

# delete some mac-os added files
find . -name '._*' -delete

# the following are NOT required (TBC)
# https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification/resolve/main/training_data.txt
# https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification/resolve/main/validation_data.txt

# instead copy over the files from the splits directory to the location of the images

cd ..
mkdir hls_burn_scars
cd hls_burn_scars
wget "https://huggingface.co/datasets/ibm-nasa-geospatial/hls_burn_scars/resolve/main/hls_burn_scars.tar.gz?download=true" -O hls_burn_scars.tar.gz
tar -xvf hls_burn_scars.tar.gz
```
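On systems without `tar` or `find` (for example a plain Windows shell), the extraction and cleanup steps can be approximated with the Python standard library. This is a minimal sketch under the assumption that the archive has already been downloaded; the function name is hypothetical.

```python
import tarfile
from pathlib import Path

def extract_and_clean(archive_path, target_dir):
    """Extract a tar archive and delete macOS '._*' files; return how many were removed."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as archive:  # compression is auto-detected
        archive.extractall(target)
    removed = 0
    # Collect matches first so files are not deleted while the tree is being walked.
    for stray in list(target.rglob("._*")):
        if stray.is_file():
            stray.unlink()
            removed += 1
    return removed
```

For example, `extract_and_clean("training_chips.tgz", "multi-temporal-crop-classification")` would mirror the `tar -xvzf` and `find . -name '._*' -delete` steps for one archive.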
Running the finetuning
In the `configs` folder there are three config examples for the three segmentation tasks. Complete the configs with your setup specifications. Parts that must be completed are marked with `#TO BE DEFINED BY USER`. They relate to the location where you downloaded the dataset, pretrained model weights, the test set (e.g. regular one or Bolivia out of bag data) and where you are going to save the experiment outputs.

a. With the conda env created above activated, run: `mim train mmsegmentation configs/sen1floods11_config.py` or `mim train mmsegmentation configs/burn_scars.py` or `mim train mmsegmentation configs/multi_temporal_crop_classification.py`

b. Multi-gpu training can be run by adding `--launcher pytorch --gpus <number of gpus>`

c. To run testing: `mim test mmsegmentation configs/sen1floods11_config.py --checkpoint /path/to/best/checkpoint/model.pth --eval "mIoU"` or `mim test mmsegmentation configs/burn_scars.py --checkpoint /path/to/best/checkpoint/model.pth --eval "mIoU"` or `mim test mmsegmentation configs/multi_temporal_crop_classification.py --checkpoint /path/to/best/checkpoint/model.pth --eval "mIoU"`
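When launching several runs from a script, the commands above can be assembled programmatically. The sketch below only builds the argument lists from the flags shown in this section and does not execute anything. The helper names are hypothetical.

```python
import shlex

def build_train_command(config_path, num_gpus=1):
    """Assemble the `mim train` call as an argument list."""
    command = ["mim", "train", "mmsegmentation", config_path]
    if num_gpus > 1:
        # Multi-GPU flags as given in step b.
        command += ["--launcher", "pytorch", "--gpus", str(num_gpus)]
    return command

def build_test_command(config_path, checkpoint_path, metric="mIoU"):
    """Assemble the `mim test` call as an argument list."""
    return ["mim", "test", "mmsegmentation", config_path,
            "--checkpoint", checkpoint_path, "--eval", metric]

print(shlex.join(build_train_command("configs/burn_scars.py", num_gpus=4)))
```

The returned lists can be passed to `subprocess.run(command, check=True)` inside the activated conda environment.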
Checkpoints on Hugging Face
We also provide checkpoints on Hugging Face for the burn scars detection and the multi-temporal crop classification tasks.
Running the inference
We provide a script to run inference on new data in GeoTIFF format. The data can be of any shape (e.g. height and width) as long as it follows the bands/channels of the original dataset. An example is shown below.
python model_inference.py -config /path/to/config/config.py -ckpt /path/to/checkpoint/checkpoint.pth -input /input/folder/ -output /output/folder/ -input_type tif -bands 0 1 2 3 4 5
The bands parameter is useful in case the files used to run inference have the data in different orders/indexes than the original dataset.
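Conceptually, a list of band indexes selects and reorders channels so that they line up with the order the model was trained on. The NumPy sketch below illustrates that idea on a synthetic array. It assumes zero-based indexes into a channel-first array and is not the code used by `model_inference.py`; the function name and the 8-band example are hypothetical.

```python
import numpy as np

def select_bands(image_chw, bands):
    """Pick and reorder channels of a (channels, height, width) array."""
    return image_chw[list(bands)]

# Hypothetical file with 8 bands, where band i is filled with the value i.
image = np.stack([np.full((4, 4), i, dtype=np.float32) for i in range(8)])

# Keep six of the eight bands, in the order the model expects.
model_input = select_bands(image, bands=[1, 2, 3, 7, 5, 6])
```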
Additional documentation
This project builds on MMSegmentation and MMCV. For additional documentation, consult their docs (please note this is currently version 0.30.0 of MMSegmentation and version 1.5.0 of MMCV, not latest).
Citation
If this repository helped your research, please cite HLS foundation in your publications. Here is an example BibTeX entry:
@software{HLS_Foundation_2023,
author = {Jakubik, Johannes and Chu, Linsong and Fraccaro, Paolo and Bangalore, Ranjini and Lambhate, Devyani and Das, Kamal and Oliveira Borges, Dario and Kimura, Daiki and Simumba, Naomi and Szwarcman, Daniela and Muszynski, Michal and Weldemariam, Kommy and Zadrozny, Bianca and Ganti, Raghu and Costa, Carlos and Watson, Campbell and Mukkavilli, Karthik and Roy, Sujit and Phillips, Christopher and Ankur, Kumar and Ramasubramanian, Muthukumaran and Gurung, Iksha and Leong, Wei Ji and Avery, Ryan and Ramachandran, Rahul and Maskey, Manil and Olofossen, Pontus and Fancher, Elizabeth and Lee, Tsengdar and Murphy, Kevin and Duffy, Dan and Little, Mike and Alemohammad, Hamed and Cecil, Michael and Li, Steve and Khallaghi, Sam and Godwin, Denys and Ahmadi, Maryam and Kordi, Fatemeh and Saux, Bertrand and Pastick, Neal and Doucette, Peter and Fleckenstein, Rylie and Luanga, Dalton and Corvin, Alex and Granger, Erwan},
doi = {10.57967/hf/0952},
month = aug,
title = {{HLS Foundation}},
repository-code = {https://github.com/nasa-impact/hls-foundation-os},
year = {2023}
}
Owner
- Name: Inter Agency Implementation and Advanced Concepts
- Login: NASA-IMPACT
- Kind: organization
- Email: esds.dsig@gmail.com
- Repositories: 88
- Profile: https://github.com/NASA-IMPACT
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Fraccaro"
    given-names: "Paolo"
    affiliation: "IBM Research"
  - family-names: "Gomes"
    given-names: "Carlos"
    affiliation: "IBM Research"
  - family-names: "Jakubik"
    given-names: "Johannes"
    affiliation: "IBM Research"
  - family-names: "Chu"
    given-names: "Linsong"
    affiliation: "IBM Research"
  - family-names: "Gabby"
    given-names: "Nyirjesy"
    affiliation: "IBM Research"
  - family-names: "Bangalore"
    given-names: "Ranjini"
    affiliation: "IBM Research"
  - family-names: "Lambhate"
    given-names: "Devyani"
    affiliation: "IBM Research"
  - family-names: "Das"
    given-names: "Kamal"
    affiliation: "IBM Research"
  - family-names: "Oliveira Borges"
    given-names: "Dario"
    affiliation: "IBM Research"
  - family-names: "Kimura"
    given-names: "Daiki"
    affiliation: "IBM Research"
  - family-names: "Simumba"
    given-names: "Naomi"
    affiliation: "IBM Research"
  - family-names: "Szwarcman"
    given-names: "Daniela"
    affiliation: "IBM Research"
  - family-names: "Muszynski"
    given-names: "Michal"
    affiliation: "IBM Research"
  - family-names: "Weldemariam"
    given-names: "Kommy"
  - family-names: "Edwards"
    given-names: "Blair"
    affiliation: "IBM Research"
  - family-names: "Schmude"
    given-names: "Johannes"
    affiliation: "IBM Research"
  - family-names: "Hamann"
    given-names: "Hendrik"
    affiliation: "IBM Research"
  - family-names: "Zadrozny"
    given-names: "Bianca"
    affiliation: "IBM Research"
  - family-names: "Ganti"
    given-names: "Raghu"
    affiliation: "IBM Research"
  - family-names: "Costa"
    given-names: "Carlos"
    affiliation: "IBM Research"
  - family-names: "Watson"
    given-names: "Campbell"
    affiliation: "IBM Research"
  - family-names: "Mukkavilli"
    given-names: "Karthik"
    affiliation: "IBM Research"
  - family-names: "Parkin"
    given-names: "Rob"
    affiliation: "IBM Research"
  - family-names: "Roy"
    given-names: "Sujit"
    affiliation: "University of Alabama in Huntsville"
  - family-names: "Phillips"
    given-names: "Christopher"
    affiliation: "University of Alabama in Huntsville"
  - family-names: "Ankur"
    given-names: "Kumar"
    affiliation: "University of Alabama in Huntsville"
  - family-names: "Ramasubramanian"
    given-names: "Muthukumaran"
    affiliation: "University of Alabama in Huntsville"
  - family-names: "Gurung"
    given-names: "Iksha"
    affiliation: "University of Alabama in Huntsville"
  - family-names: "Leong"
    given-names: "Wei Ji"
    affiliation: "Development Seed"
  - family-names: "Avery"
    given-names: "Ryan"
    affiliation: "Development Seed"
  - family-names: "Ramachandran"
    given-names: "Rahul"
    affiliation: "NASA"
  - family-names: "Maskey"
    given-names: "Manil"
    affiliation: "NASA"
  - family-names: "Olofossen"
    given-names: "Pontus"
    affiliation: "NASA"
  - family-names: "Fancher"
    given-names: "Elizabeth"
    affiliation: "Barrios Technology"
  - family-names: "Lee"
    given-names: "Tsengdar"
    affiliation: "NASA"
  - family-names: "Murphy"
    given-names: "Kevin"
    affiliation: "NASA"
  - family-names: "Duffy"
    given-names: "Dan"
    affiliation: "NASA"
  - family-names: "Little"
    given-names: "Mike"
    affiliation: "NASA"
  - family-names: "Alemohammad"
    given-names: "Hamed"
    affiliation: "Clark University"
  - family-names: "Cecil"
    given-names: "Michael"
    affiliation: "Clark University"
  - family-names: "Li"
    given-names: "Steve"
    affiliation: "Clark University"
  - family-names: "Khallaghi"
    given-names: "Sam"
    affiliation: "Clark University"
  - family-names: "Godwin"
    given-names: "Denys"
    affiliation: "Clark University"
  - family-names: "Ahmadi"
    given-names: "Maryam"
    affiliation: "Clark University"
  - family-names: "Kordi"
    given-names: "Fatemeh"
    affiliation: "Clark University"
  - family-names: "Saux"
    given-names: "Bertrand"
    affiliation: "ESA"
  - family-names: "Pastick"
    given-names: "Neal"
    affiliation: "USGS"
  - family-names: "Doucette"
    given-names: "Peter"
    affiliation: "USGS"
  - family-names: "Fleckenstein"
    given-names: "Rylie"
    affiliation: "USGS"
  - family-names: "Luanga"
    given-names: "Dalton"
    affiliation: "DOE/ORNL"
  - family-names: "Corvin"
    given-names: "Alex"
    affiliation: "RedHat"
  - family-names: "Granger"
    given-names: "Erwan"
    affiliation: "RedHat"
title: "HLS Foundation"
doi: https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M
license: "Apache 2.0"
date-released: 2023-08-03
repository-code: "https://github.com/nasa-impact/hls-foundation-os"
GitHub Events
Total
- Issues event: 3
- Watch event: 56
- Member event: 1
- Issue comment event: 3
- Fork event: 16
Last Year
- Issues event: 3
- Watch event: 56
- Member event: 1
- Issue comment event: 3
- Fork event: 16
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Paolo | p****o@i****m | 44 |
| Carlos Gomes | c****s@i****m | 39 |
| xhagrg | g****a@h****m | 23 |
| muthukumaran R | m****1@u****u | 9 |
| paolofraccaro | 3****o | 5 |
| agraham9966 | a****6@g****m | 2 |
| Paolo Fraccaro | P****o@i****m | 1 |
| lchu | l****u@u****m | 1 |
| Lelouch vi' Britania | s****4@h****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 39
- Total pull requests: 16
- Average time to close issues: 22 days
- Average time to close pull requests: 5 days
- Total issue authors: 22
- Total pull request authors: 9
- Average comments per issue: 1.87
- Average comments per pull request: 0.44
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: about 2 months
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 0
- Average comments per issue: 0.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- robmarkcole (6)
- anumerico (4)
- mahrokh3409 (2)
- jakubsadel (2)
- aleksmirosh (2)
- hanxLi (2)
- sm-potter (2)
- danjac94 (2)
- mxyqsh (1)
- agustinavg (1)
- Godjobgerry (1)
- dialuser (1)
- widssaguenay (1)
- EllaJewison (1)
- Amirbn73 (1)
Pull Request Authors
- muthukumaranR (3)
- weiji14 (3)
- CarlosGomes98 (2)
- xhagrg (2)
- ahmedemam576 (1)
- paolofraccaro (1)
- hanxLi (1)
- agraham9966 (1)
- longdaothanh (1)
- boro128 (1)
- careeranalyser (1)
- Sahil1709 (1)
Dependencies
- einops *
- imagecodecs *
- mmsegmentation *
- rasterio *
- tensorboard *
- tifffile *
- timm ==0.4.12