bimae_seed_classification
BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification, CVPRW 2024
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: CITATION.cff file found
- ✓ codemeta.json file: codemeta.json file found
- ✓ .zenodo.json file: .zenodo.json file found
- ✓ DOI references: 1 DOI reference(s) found in README
- ○ Academic publication links: none found
- ○ Academic email domains: none found
- ○ Institutional organization owner: none found
- ○ JOSS paper metadata: none found
- ○ Scientific vocabulary similarity: low similarity (9.6%) to scientific vocabulary
Keywords
Repository
BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification, CVPRW 2024
Basic Info
- Host: GitHub
- Owner: max-kuk
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://openaccess.thecvf.com/content/CVPR2024W/PBVS/papers/Kukushkin_BiMAE_-_A_Bimodal_Masked_Autoencoder_Architecture_for_Single-Label_Hyperspectral_CVPRW_2024_paper.pdf
- Size: 298 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification
Maksim Kukushkin, Martin Bogdan, Thomas Schmid
Input Data
The input data should be provided as TFRecords containing the following features:
- 'id': tf.string
- 'rgb_image': tf.float32
- 'hs_image': tf.uint8
- 'label': tf.string
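The feature schema above can be sketched as a parsing function for a `tf.data` pipeline. This is a minimal illustration, not the repository's actual input code: the underscored key names (`rgb_image`, `hs_image`) and the assumption that the image tensors are stored as serialized-tensor byte strings (decoded with `tf.io.parse_tensor`) are mine.

```python
import tensorflow as tf

# Assumed schema: key names follow the README; storing each image as a
# serialized-tensor byte string is an assumption of this sketch.
FEATURE_DESCRIPTION = {
    "id": tf.io.FixedLenFeature([], tf.string),
    "rgb_image": tf.io.FixedLenFeature([], tf.string),  # serialized float32 tensor
    "hs_image": tf.io.FixedLenFeature([], tf.string),   # serialized uint8 tensor
    "label": tf.io.FixedLenFeature([], tf.string),
}

def parse_example(serialized):
    """Decode one serialized tf.train.Example into its four features."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_DESCRIPTION)
    rgb = tf.io.parse_tensor(parsed["rgb_image"], out_type=tf.float32)
    hs = tf.io.parse_tensor(parsed["hs_image"], out_type=tf.uint8)
    return parsed["id"], rgb, hs, parsed["label"]
```

A `tf.data.TFRecordDataset` can then be mapped with `parse_example` to build the input pipeline.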
Pretraining
To pretrain the model, run the following command:

```bash
nohup python mae_trainer.py --model=mae_vit_tiny_patch24 --scr_dir=path/to/tfrecord \
    --batch_size=512 --epochs=300 --patch_size=24 --hs_image_size=24 --hs_num_patches=300 \
    --hs_mask_proportion=0.9 --rgb_image_size=192 --rgb_num_patches=64 \
    --rgb_mask_proportion=0.75 > mae_trainer.log &
```
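As a sanity check on the flag values, the RGB patch count follows from the image and patch sizes: (192 / 24)² = 64, matching `--rgb_num_patches=64`; the hyperspectral branch (24×24 spatial extent with patch size 24) contributes one spatial patch per channel, so 300 spectral channels give `--hs_num_patches=300`. A quick sketch of that arithmetic, with an illustrative helper name:

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Patches per 2-D image when tiled with non-overlapping square patches."""
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    return (image_size // patch_size) ** 2

# RGB branch: 192x192 image, 24x24 patches -> (192/24)^2 = 64 patches
assert num_patches(192, 24) == 64
# HS branch: 24x24 per channel -> one spatial patch per channel;
# 300 channels therefore yield the 300 patches passed via --hs_num_patches
assert num_patches(24, 24) * 300 == 300
```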
Finetuning

To finetune the model, run the following command:

```bash
nohup python mae_trainer_finetuning.py --model=mae_vit_tiny_patch24 --select_channels_strategy=step_60 \
    --scr_dir=path/to/tfrecord --batch_size=512 --epochs=50 --patch_size=24 \
    --hs_image_size=24 --hs_num_patches=300 --hs_mask_proportion=0.9 --rgb_image_size=192 \
    --rgb_num_patches=64 --rgb_mask_proportion=0.75 --num_classes=19 --from_scratch=False \
    --target_modalities=bimodal > mae_trainer_finetuning.log &
```
The following models are available:
- mae_vit_tiny_patch24
- mae_vit_small_patch24
- mae_vit_base_patch24
The following channel-selection strategies are available:
- step_60 - select every 60th channel
- step_30 - select every 30th channel
- top_10 - select the first 10 channels (1,10)
- top_5 - select the first 5 channels (1,5)
- bottom_10 - select the last 10 channels (290,300)
- bottom_5 - select the last 5 channels (295,300)
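The strategy names map to simple index selections over the 300 spectral channels. A minimal NumPy sketch, assuming 0-based indexing and a channels-last cube; the function name and signature are illustrative, not the repository's API:

```python
import numpy as np

def select_channels(cube, strategy):
    """Select spectral channels from an (H, W, C) hyperspectral cube."""
    n = cube.shape[-1]
    if strategy == "step_60":
        idx = np.arange(0, n, 60)       # every 60th channel
    elif strategy == "step_30":
        idx = np.arange(0, n, 30)       # every 30th channel
    elif strategy == "top_10":
        idx = np.arange(0, 10)          # first 10 channels
    elif strategy == "top_5":
        idx = np.arange(0, 5)           # first 5 channels
    elif strategy == "bottom_10":
        idx = np.arange(n - 10, n)      # last 10 channels
    elif strategy == "bottom_5":
        idx = np.arange(n - 5, n)       # last 5 channels
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return cube[..., idx]
```

For a 300-channel cube, `step_60` keeps 5 channels and `step_30` keeps 10.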
License
This project is licensed under CC-BY-NC 4.0. See LICENSE for details.
Citation
If you find this code useful in your research, please consider citing:
@inproceedings{kukushkin2024bimae,
  author={Kukushkin, Maksim and Bogdan, Martin and Schmid, Thomas},
  booktitle={2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  title={BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification},
  year={2024},
  pages={2987-2996},
  keywords={Manifolds;Visualization;Costs;Scalability;Conferences;Self-supervised learning;Pattern recognition;masked autoencoder;hyperspectral imaging;seed purity testing;hyperspectral classification;multimodal masked autoencoder;masked modeling;self-supervised learning},
  doi={10.1109/CVPRW63382.2024.00304}
}
Owner
- Name: Max Kukuškin
- Login: max-kuk
- Kind: user
- Location: Leipzig
- Company: University of Leipzig
- Website: max-kuk.github.io
- Repositories: 2
- Profile: https://github.com/max-kuk
Ph.D. Student @ Machine Learning Group, University of Leipzig
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this work, please cite it as follows:"
authors:
  - family-names: Kukushkin
    given-names: Maksim
  - family-names: Bogdan
    given-names: Martin
  - family-names: Schmid
    given-names: Thomas
title: "BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification"
year: 2024
conference:
  name: "2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)"
pages: "2987-2996"
doi: "10.1109/CVPRW63382.2024.00304"
keywords:
  - Manifolds
  - Visualization
  - Costs
  - Scalability
  - Conferences
  - Self-supervised learning
  - Pattern recognition
  - masked autoencoder
  - hyperspectral imaging
  - seed purity testing
  - hyperspectral classification
  - multimodal masked autoencoder
  - masked modeling
license: "https://spdx.org/licenses/CC-BY-4.0.html"
GitHub Events
Total
- Push event: 13
- Fork event: 1
- Create event: 2
Last Year
- Push event: 13
- Fork event: 1
- Create event: 2
Dependencies
- Flask ==2.3.2
- GitPython ==3.1.32
- Jinja2 ==3.1.2
- Keras-Preprocessing ==1.1.2
- Mako ==1.2.4
- Markdown ==3.4.4
- MarkupSafe ==2.1.1
- Pillow ==10.0.0
- PyJWT ==2.8.0
- PyYAML ==6.0.1
- SQLAlchemy ==2.0.19
- SciencePlots ==2.1.1
- Werkzeug ==2.3.7
- absl-py ==1.4.0
- alembic ==1.11.2
- antlr4-python3-runtime ==4.9.3
- appdirs ==1.4.4
- array-record ==0.5.0
- astunparse ==1.6.3
- backcall ==0.2.0
- blinker ==1.6.2
- cachetools ==5.3.1
- click ==8.1.6
- cloudpickle ==2.2.1
- cmake ==3.26.3
- contourpy ==1.0.7
- cycler ==0.11.0
- databricks-cli ==0.17.7
- decorator ==5.1.1
- dm-tree ==0.1.8
- docker ==6.1.3
- docker-pycreds ==0.4.0
- entrypoints ==0.4
- etils ==1.6.0
- filelock ==3.12.0
- flatbuffers ==23.5.26
- fonttools ==4.39.3
- fsspec ==2023.12.2
- gast ==0.4.0
- gitdb ==4.0.10
- google-auth ==2.22.0
- google-auth-oauthlib ==1.0.0
- google-pasta ==0.2.0
- googleapis-common-protos ==1.62.0
- greenlet ==2.0.2
- grpcio ==1.57.0
- gunicorn ==21.2.0
- h5py ==3.10.0
- hydra-core ==1.3.2
- importlib-resources ==6.1.1
- install ==1.3.5
- ipywidgets ==8.1.1
- itsdangerous ==2.1.2
- jax ==0.4.20
- jaxlib ==0.4.14
- jedi ==0.18.1
- joblib ==1.2.0
- jupyterlab-widgets ==3.0.9
- keras ==3.1.1
- keras-core ==0.1.4
- keras-cv ==0.7.2
- keras-nlp ==0.6.4
- kiwisolver ==1.4.4
- libclang ==16.0.6
- lit ==16.0.1
- markdown-it-py ==3.0.0
- matplotlib ==3.8.3
- mdurl ==0.1.2
- ml-dtypes ==0.3.2
- mlflow ==2.6.0
- mpmath ==1.3.0
- mypy-extensions ==1.0.0
- namex ==0.0.7
- numpy ==1.24.3
- nvidia-cublas-cu11 ==11.11.3.6
- nvidia-cublas-cu12 ==12.3.4.1
- nvidia-cuda-cupti-cu11 ==11.8.87
- nvidia-cuda-cupti-cu12 ==12.3.101
- nvidia-cuda-nvcc-cu11 ==11.8.89
- nvidia-cuda-nvcc-cu12 ==12.3.107
- nvidia-cuda-nvrtc-cu11 ==11.7.99
- nvidia-cuda-nvrtc-cu12 ==12.3.107
- nvidia-cuda-runtime-cu11 ==11.8.89
- nvidia-cuda-runtime-cu12 ==12.3.101
- nvidia-cudnn-cu11 ==8.7.0.84
- nvidia-cudnn-cu12 ==8.9.7.29
- nvidia-cufft-cu11 ==10.9.0.58
- nvidia-cufft-cu12 ==11.0.12.1
- nvidia-curand-cu11 ==10.3.0.86
- nvidia-curand-cu12 ==10.3.4.107
- nvidia-cusolver-cu11 ==11.4.1.48
- nvidia-cusolver-cu12 ==11.5.4.101
- nvidia-cusparse-cu11 ==11.7.5.86
- nvidia-cusparse-cu12 ==12.2.0.103
- nvidia-nccl-cu11 ==2.16.5
- nvidia-nccl-cu12 ==2.19.3
- nvidia-nvjitlink-cu12 ==12.3.101
- nvidia-nvtx-cu11 ==11.7.91
- oauthlib ==3.2.2
- omegaconf ==2.3.0
- opencv-python ==4.9.0.80
- opt-einsum ==3.3.0
- optree ==0.10.0
- pandas ==2.0.3
- parso ==0.8.3
- pathspec ==0.11.0
- pickleshare ==0.7.5
- platformdirs ==3.0.0
- plotly ==5.19.0
- promise ==2.3
- protobuf ==3.20.3
- pure-eval ==0.2.2
- pyarrow ==12.0.1
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- pydot ==1.4.2
- pyparsing ==3.0.9
- python-dateutil ==2.8.2
- pytz ==2023.3
- querystring-parser ==1.2.4
- regex ==2023.10.3
- requests-oauthlib ==1.3.1
- rich ==13.5.2
- rsa ==4.9
- scikit-learn ==1.2.2
- scipy ==1.10.1
- seaborn ==0.13.2
- sentry-sdk ==1.40.5
- setproctitle ==1.3.3
- smmap ==5.0.0
- spectral ==0.23.1
- sqlparse ==0.4.4
- sympy ==1.11.1
- tabulate ==0.9.0
- tenacity ==8.2.3
- tensorboard ==2.16.2
- tensorboard-data-server ==0.7.1
- tensorboard-plugin-wit ==1.8.1
- tensorflow ==2.16.1
- tensorflow-datasets ==4.9.3
- tensorflow-estimator ==2.15.0
- tensorflow-hub ==0.15.0
- tensorflow-io-gcs-filesystem ==0.33.0
- tensorflow-metadata ==1.14.0
- tensorrt ==8.5.3.1
- termcolor ==2.3.0
- threadpoolctl ==3.1.0
- tokenize-rt ==5.2.0
- toml ==0.10.2
- tomli ==2.0.1
- torchvision4ad ==0.1.2
- tqdm ==4.66.1
- typing_extensions ==4.5.0
- tzdata ==2023.3
- urllib3 ==1.26.16
- wandb ==0.16.3
- wcwidth ==0.2.5
- websocket-client ==1.6.1
- widgetsnbextension ==4.0.9
- wrapt ==1.14.1