attention-based-models-for-hyper-kvasir
Automatic and accurate analysis of medical images is a subject of great importance in our current society. In particular, this work focuses on gastrointestinal endoscopy images, as the study of these images helps to detect possible health conditions in those regions.
https://github.com/richardesp/attention-based-models-for-hyper-kvasir
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Attention-based Models for Hyper-Kvasir

Automatic and accurate analysis of medical images is a subject of great importance in our current society. In particular, this work focuses on gastrointestinal endoscopy images, as the study of these images helps to detect possible health conditions in those regions.
Installation
To set up the environment and install the necessary dependencies, follow these steps:
- Clone the repository:
bash
git clone https://github.com/your-username/Attention-based-models-for-Hyper-Kvasir.git
cd Attention-based-models-for-Hyper-Kvasir
- Create a Conda environment:
bash
conda create --name hyperkvasir_env python=3.8
conda activate hyperkvasir_env
- Install the required packages:
bash
conda install --file requirements.txt
Project Structure
bash
.
├── LICENSE.txt
├── README.md
├── main.py
├── notebooks
│   ├── Pre-processing notebooks...
│   ├── Model training notebooks...
│   └── Visualization and analysis notebooks...
├── requirements.txt
├── scripts
│   └── download_dataset.sh
└── src
    ├── base
    │   └── base_make_dataset.py
    ├── data
    │   └── make_dataset.py
    ├── features
    │   └── build_features.py
    ├── models
    │   ├── predict_model.py
    │   └── train_model.py
    ├── utils
    │   ├── check_gpu_arm.py
    │   └── get_vprint.py
    └── visualization
        └── visualize.py
Usage
Any training or prediction on images can be performed by executing the provided notebooks in the notebooks directory. Navigate to the specific notebook that matches your desired operation:
- For preprocessing tasks, refer to the notebooks prefixed with `1.x-rep`.
- For model training, especially with Vision Transformers (ViT), refer to the notebooks prefixed with `2.x-rep`.
- For initial runs with pre-trained Vision Transformers, use the notebook `3.0-rep-pre-trained-vit-initial-run.ipynb`.
- For visualization and analysis, including attention extraction, UMAP, and importance correlation plots, refer to the notebooks prefixed with `5.x-rep`.
Dataset
The dataset used in this project can be found at Hyper-Kvasir Dataset.
Dataset Details
The dataset can be split into four distinct parts:
- Labeled image data
- Unlabeled image data
- Segmented image data
- Annotated video data
Each part is further described below:
Labeled Images
In total, the dataset contains 10,662 labeled images stored in JPEG format. The images can be found in the images folder. The class of each image corresponds to the folder it is stored in (e.g., the 'polyp' folder contains all polyp images, the 'barretts' folder contains all images of Barrett's esophagus, etc.). The number of images per class is not balanced, which is a general challenge in the medical field because some findings occur more often than others. This adds a challenge for researchers, since methods applied to the data should also be able to learn from a small amount of training data. The labeled images represent 23 different classes of findings.
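Because each class is a folder, the class distribution can be read directly from the directory tree. The following is a minimal sketch, assuming the images carry a `.jpg` extension and sit somewhere beneath class-named folders. The function names `count_images_per_class` and `inverse_frequency_weights` are hypothetical and not part of this repository. Inverse-frequency weighting is shown as one common way to counter class imbalance, not necessarily the approach used in this project.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(root, suffix=".jpg"):
    """Count images per class, taking the parent folder name as the class label."""
    counts = Counter()
    for path in Path(root).rglob(f"*{suffix}"):
        counts[path.parent.name] += 1
    return counts


def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * count), so rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}
```

Calling `count_images_per_class` on the folder holding the labeled images returns a mapping from folder name to image count. If the root passed in also contains the unlabeled folder, its images are counted under that folder's name, so point the function at the labeled folders only.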
Unlabeled Images
In total, the dataset contains 99,417 unlabeled images. The unlabeled images can be found in the unlabeled folder, which is a subfolder of the images folder, alongside the labeled image folders. In addition to the unlabeled image files, we also provide the extracted global features and cluster assignments in the Hyper-Kvasir GitHub repository as Attribute-Relation File Format (ARFF) files. ARFF files can be opened and processed using, for example, the WEKA machine learning library, or they can easily be converted into comma-separated values (CSV) files.
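The ARFF-to-CSV conversion mentioned above can be sketched with the Python standard library alone. This is an illustrative converter, not a tool shipped with the dataset or this repository: the name `arff_to_csv` is hypothetical, and the sketch assumes dense (non-sparse) ARFF data and attribute names without spaces.

```python
import csv


def arff_to_csv(arff_path, csv_path):
    """Convert a dense ARFF file to CSV; returns the list of attribute names."""
    header, rows, in_data = [], [], False
    with open(arff_path, newline="") as src:
        for raw in src:
            line = raw.strip()
            if not line or line.startswith("%"):  # skip blank lines and comments
                continue
            lower = line.lower()
            if in_data:
                # Data rows are comma-separated; ARFF allows single-quoted values.
                rows.append(next(csv.reader([line], quotechar="'")))
            elif lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name.
                header.append(line.split(None, 2)[1].strip("'\""))
            elif lower.startswith("@data"):
                in_data = True
    with open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return header
```

The attribute names become the CSV header row, and every line after `@data` becomes one CSV row.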
Segmented Images
We provide the original image, a segmentation mask, and a bounding box for 1,000 images from the polyp class. In the mask, the pixels depicting polyp tissue, the region of interest, are represented by the foreground (white mask), while the background (in black) does not contain polyp pixels. The bounding box is defined as the outermost pixels of the found polyp. For this segmentation set, we have two folders, one for images and one for masks, each containing 1,000 JPEG-compressed images. The bounding boxes for the corresponding images are stored in a JavaScript Object Notation (JSON) file. The image and its corresponding mask have the same filename. The images and files are stored in the segmented images folder. It is important to point out that the segmented images have duplicates in the images folder of polyps since the images were taken from there.
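The relationship between a mask and its bounding box can be illustrated with a short sketch. It assumes the mask has already been loaded as a 2D grid of grayscale values. The function name `mask_to_bbox` and the `threshold` parameter are hypothetical. The threshold is there because JPEG compression may leave mask pixels that are not exactly 0 or 255. The sketch returns a single box enclosing all foreground pixels, which may differ from the dataset's JSON file if it lists several boxes for one image.

```python
def mask_to_bbox(mask, threshold=127):
    """Return (xmin, ymin, xmax, ymax) of the foreground pixels, or None if there are none.

    `mask` is a 2D sequence of grayscale values (a list of pixel rows).
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value > threshold:  # treat bright pixels as polyp tissue
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Tiny example mask: a 2x2 foreground region with one corner missing.
example_mask = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 255, 0],
    [0, 0, 0, 0],
]
example_bbox = mask_to_bbox(example_mask)  # (1, 1, 2, 2)
```

The coordinates are pixel indices with the origin at the top-left corner, and both corners are inclusive.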
Annotated Videos
The dataset contains a total of 373 videos containing different findings and landmarks. This corresponds to approximately 11.62 hours of video and 1,059,519 video frames that can be converted to images if needed. Each video has been manually assessed by a medical professional working in the field of gastroenterology, resulting in a total of 171 annotated findings.
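The figures above imply an average frame rate of about 25 frames per second. The short calculation below derives that and two other averages. The inputs come from the text, the variable names are illustrative, and the results are approximate because the total duration is itself rounded.

```python
N_VIDEOS = 373
TOTAL_HOURS = 11.62          # approximate total duration from the description
TOTAL_FRAMES = 1_059_519

total_seconds = TOTAL_HOURS * 3600
avg_fps = TOTAL_FRAMES / total_seconds
avg_frames_per_video = TOTAL_FRAMES / N_VIDEOS
avg_minutes_per_video = TOTAL_HOURS * 60 / N_VIDEOS

print(f"average frame rate:       {avg_fps:.1f} fps")
print(f"average frames per video: {avg_frames_per_video:.0f}")
print(f"average video length:     {avg_minutes_per_video:.1f} min")
```

These are averages over the whole collection; individual videos may differ in length and frame rate.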
License
The license for the Hyper-Kvasir dataset is Creative Commons Attribution 4.0 International (CC BY 4.0).
More information can be found here.
Acknowledgments
We would like to extend our sincere gratitude to:
- Isabel Jiménez-Velasco
- Manuel J. Marín-Jiménez
- Rafael Muñoz-Salinas
for their invaluable contributions and insights that greatly benefited this project.
Citation
If you use this work, please cite it as follows:
bibtex
@inproceedings{espantaleon2023caip,
author = {Ricardo Espantale{\'{o}}n{-}P{\'{e}}rez and
Isabel Jim{\'{e}}nez{-}Velasco and
Rafael Mu{\~{n}}oz{-}Salinas and
Manuel J. Mar{\'{\i}}n{-}Jim{\'{e}}nez},
title = {Empirical Study of Attention-Based Models for Automatic Classification
of Gastrointestinal Endoscopy Images},
booktitle = {Computer Analysis of Images and Patterns - 20th International Conference,
{CAIP}},
series = {Lecture Notes in Computer Science},
volume = {14185},
pages = {98--108},
year = {2023},
doi = {10.1007/978-3-031-44240-7\_10}
}
Owner
- Login: richardesp
- Kind: user
- Repositories: 17
- Profile: https://github.com/richardesp
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite the following work:"
authors:
  - family-names: Espantaleón-Pérez
    given-names: Ricardo
  - family-names: Jiménez-Velasco
    given-names: Isabel
  - family-names: Muñoz-Salinas
    given-names: Rafael
  - family-names: Marín-Jiménez
    given-names: Manuel J.
title: "Empirical Study of Attention-Based Models for Automatic Classification of Gastrointestinal Endoscopy Images"
type: article
year: 2023
doi: "10.1007/978-3-031-44240-7_10"
conference:
  name: "Computer Analysis of Images and Patterns - 20th International Conference, CAIP"
series: "Lecture Notes in Computer Science"
volume: "14185"
pages: "98-108"
GitHub Events
Total
- Release event: 1
- Watch event: 2
- Push event: 2
- Create event: 1
Last Year
- Release event: 1
- Watch event: 2
- Push event: 2
- Create event: 1
Dependencies
- absl-py =1.4.0=pypi_0
- anyio =3.6.2=pyhd8ed1ab_0
- appdirs =1.4.4=pyh9f0ad1d_0
- appnope =0.1.3=pyhd8ed1ab_0
- argon2-cffi =21.3.0=pyhd8ed1ab_0
- argon2-cffi-bindings =21.2.0=py310h1a28f6b_0
- asttokens =2.2.1=pyhd8ed1ab_0
- astunparse =1.6.3=pypi_0
- attrs =22.2.0=pyh71513ae_0
- babel =2.11.0=py310hca03da5_0
- backcall =0.2.0=pyh9f0ad1d_0
- backports =1.0=pyhd8ed1ab_3
- backports.functools_lru_cache =1.6.4=pyhd8ed1ab_0
- bayesian-optimization =1.4.2=pypi_0
- beautifulsoup4 =4.11.2=pyha770c72_0
- blas =1.0=openblas
- bleach =6.0.0=pyhd8ed1ab_0
- boto3 =1.26.64=pyhd8ed1ab_0
- botocore =1.29.64=pyhd8ed1ab_0
- bottleneck =1.3.5=py310h96f19d2_0
- brotli =1.0.9=h1a8c8d9_8
- brotli-bin =1.0.9=h1a8c8d9_8
- brotlipy =0.7.0=py310h1a28f6b_1002
- bzip2 =1.0.8=h3422bc3_4
- c-ares =1.18.1=h3422bc3_0
- ca-certificates =2023.01.10=hca03da5_0
- cached-property =1.5.2=hd8ed1ab_1
- cached_property =1.5.2=pyha770c72_1
- cachetools =5.3.0=pypi_0
- cairo =1.16.0=h302bd0f_3
- certifi =2022.12.7=py310hca03da5_0
- cffi =1.15.1=py310h80987f9_3
- charset-normalizer =2.1.1=pyhd8ed1ab_0
- click =8.1.3=unix_pyhd8ed1ab_2
- cloudpickle =2.2.1=pypi_0
- colorama =0.4.6=pyhd8ed1ab_0
- comm =0.1.2=pyhd8ed1ab_0
- cryptography =38.0.4=py310h834c97f_0
- cycler =0.11.0=pyhd8ed1ab_0
- debugpy =1.5.1=py310hc377ac9_0
- decorator =5.1.1=pyhd8ed1ab_0
- defusedxml =0.7.1=pyhd8ed1ab_0
- eigen =3.3.7=h525c30c_1
- entrypoints =0.4=pyhd8ed1ab_0
- executing =1.2.0=pyhd8ed1ab_0
- expat =2.4.9=hc377ac9_0
- fftw =3.3.9=h1a28f6b_1
- flask =2.2.2=pyhd8ed1ab_0
- flatbuffers =23.1.21=pypi_0
- flit-core =3.8.0=pyhd8ed1ab_0
- fontconfig =2.14.1=h6b8db82_1
- fonttools =4.25.0=pyhd3eb1b0_0
- freetype =2.12.1=h1192e45_0
- gast =0.4.0=pypi_0
- gettext =0.21.0=h826f4ad_0
- giflib =5.2.1=h80987f9_1
- glib =2.69.1=h514c7bf_2
- google-auth =2.16.0=pypi_0
- google-auth-oauthlib =0.4.6=pypi_0
- google-pasta =0.2.0=pypi_0
- graphite2 =1.3.14=hc377ac9_1
- grpcio =1.51.1=pypi_0
- gst-plugins-base =1.14.1=hf0a386a_0
- gstreamer =1.14.1=he09cfb7_0
- gym =0.26.2=pypi_0
- gym-notices =0.0.8=pypi_0
- h5py =3.7.0=py310h181c318_0
- harfbuzz =4.3.0=hb1b0ec1_0
- hdf5 =1.12.1=h160e8cb_2
- icu =68.1=hc377ac9_0
- idna =3.4=pyhd8ed1ab_0
- importlib-metadata =6.0.0=pyha770c72_0
- importlib_metadata =6.0.0=hd8ed1ab_0
- importlib_resources =5.10.2=pyhd8ed1ab_0
- ipykernel =6.19.2=py310h33ce5c2_0
- ipython =8.9.0=pyhd1c38e8_0
- ipython_genutils =0.2.0=py_1
- ipywidgets =8.0.4=pyhd8ed1ab_0
- itsdangerous =2.1.2=pyhd8ed1ab_0
- jedi =0.18.2=pyhd8ed1ab_0
- jinja2 =3.1.2=pyhd8ed1ab_1
- jmespath =1.0.1=pyhd8ed1ab_0
- joblib =1.2.0=pyhd8ed1ab_0
- jpeg =9e=he4db4b2_2
- json5 =0.9.6=pyhd3eb1b0_0
- jsonschema =4.17.3=pyhd8ed1ab_0
- jupyter =1.0.0=py310hca03da5_8
- jupyter_client =8.0.2=pyhd8ed1ab_0
- jupyter_console =6.4.4=pyhd8ed1ab_0
- jupyter_core =5.1.1=py310hca03da5_0
- jupyter_events =0.6.3=pyhd8ed1ab_0
- jupyter_server =1.23.4=py310hca03da5_0
- jupyter_server_terminals =0.4.4=pyhd8ed1ab_1
- jupyterlab =3.5.3=py310hca03da5_0
- jupyterlab_pygments =0.2.2=pyhd8ed1ab_0
- jupyterlab_server =2.16.5=py310hca03da5_0
- jupyterlab_widgets =3.0.5=pyhd8ed1ab_0
- kaggle =1.5.12=pypi_0
- keras =2.11.0=pypi_0
- keras-preprocessing =1.1.2=pyhd3eb1b0_0
- kiwisolver =1.4.4=py310h313beb8_0
- krb5 =1.19.4=h8380606_0
- lcms2 =2.14=h481adae_1
- lerc =3.0=hc377ac9_0
- libaec =1.0.6=hb7217d7_1
- libblas =3.9.0=16_osxarm64_openblas
- libbrotlicommon =1.0.9=h1a8c8d9_8
- libbrotlidec =1.0.9=h1a8c8d9_8
- libbrotlienc =1.0.9=h1a8c8d9_8
- libcblas =3.9.0=16_osxarm64_openblas
- libclang =15.0.6.1=pypi_0
- libcurl =7.87.0=h0f1d93c_0
- libcxx =14.0.6=h2692d47_0
- libdeflate =1.8=h1a28f6b_5
- libedit =3.1.20221030=h80987f9_0
- libev =4.33=h642e427_1
- libffi =3.4.2=h3422bc3_5
- libgfortran =5.0.0=11_3_0_hd922786_27
- libgfortran5 =11.3.0=hdaf2cc0_27
- libiconv =1.17=he4db4b2_0
- liblapack =3.9.0=16_osxarm64_openblas
- libllvm12 =12.0.0=h12f7ac0_4
- libnghttp2 =1.46.0=h95c9599_0
- libopenblas =0.3.21=openmp_hc731615_3
- libpng =1.6.37=hb8d0fd4_0
- libpq =12.9=h65cfe13_3
- libsodium =1.0.18=h27ca646_1
- libssh2 =1.10.0=hf27765b_0
- libtiff =4.5.0=h313beb8_1
- libwebp =1.2.4=h68602c7_0
- libwebp-base =1.2.4=h57fd34a_0
- libxcb =1.13=h9b22ae9_1004
- libxml2 =2.9.14=h8c5e841_0
- libxslt =1.1.35=h9833966_0
- llvm-openmp =15.0.7=h7cfbb63_0
- lxml =4.9.1=py310h2fae87d_0
- lz4-c =1.9.4=h313beb8_0
- markdown =3.4.1=pypi_0
- markupsafe =2.1.1=py310h1a28f6b_0
- matplotlib =3.5.2=py310hca03da5_0
- matplotlib-base =3.5.2=py310hc377ac9_0
- matplotlib-inline =0.1.6=pyhd8ed1ab_0
- mistune =2.0.4=pyhd8ed1ab_0
- munkres =1.1.4=pyh9f0ad1d_0
- nbclassic =0.5.1=pyhd8ed1ab_0
- nbclient =0.7.2=pyhd8ed1ab_0
- nbconvert =7.2.9=pyhd8ed1ab_0
- nbconvert-core =7.2.9=pyhd8ed1ab_0
- nbconvert-pandoc =7.2.9=pyhd8ed1ab_0
- nbformat =5.7.3=pyhd8ed1ab_0
- ncurses =6.3=h07bb92c_1
- nest-asyncio =1.5.6=pyhd8ed1ab_0
- notebook =6.5.2=pyha770c72_1
- notebook-shim =0.2.2=pyhd8ed1ab_0
- nspr =4.33=hc377ac9_0
- nss =3.74=h142855e_0
- numexpr =2.8.4=py310hecc3335_0
- numpy =1.23.5=py310hb93e574_0
- numpy-base =1.23.5=py310haf87e8b_0
- oauthlib =3.2.2=pypi_0
- opencv =4.6.0=py310he2359d5_2
- openssl =1.1.1s=h1a28f6b_0
- opt-einsum =3.3.0=pypi_0
- packaging =23.0=pyhd8ed1ab_0
- pandas =1.5.2=py310h46d7db6_0
- pandas-datareader =0.10.0=pyh6c4a22f_0
- pandoc =2.19.2=hce30654_1
- pandocfilters =1.5.0=pyhd8ed1ab_0
- parso =0.8.3=pyhd8ed1ab_0
- pcre =8.45=hc377ac9_0
- pexpect =4.8.0=pyh1a96a4e_2
- pickleshare =0.7.5=py_1003
- pillow =9.3.0=py310h313beb8_2
- pip =23.0=pyhd8ed1ab_0
- pixman =0.40.0=h1a28f6b_0
- pkgutil-resolve-name =1.3.10=pyhd8ed1ab_0
- platformdirs =2.6.2=pyhd8ed1ab_0
- ply =3.11=py310hca03da5_0
- pooch =1.6.0=pyhd8ed1ab_0
- prometheus_client =0.16.0=pyhd8ed1ab_0
- prompt-toolkit =3.0.36=pyha770c72_0
- prompt_toolkit =3.0.36=hd8ed1ab_0
- protobuf =3.19.6=pypi_0
- psutil =5.9.0=py310h1a28f6b_0
- pthread-stubs =0.4=h27ca646_1001
- ptyprocess =0.7.0=pyhd3deb0d_0
- pure_eval =0.2.2=pyhd8ed1ab_0
- pyasn1 =0.4.8=pypi_0
- pyasn1-modules =0.2.8=pypi_0
- pycparser =2.21=pyhd8ed1ab_0
- pygments =2.14.0=pyhd8ed1ab_0
- pyopenssl =23.0.0=pyhd8ed1ab_0
- pyparsing =3.0.9=pyhd8ed1ab_0
- pyqt =5.15.7=py310hc377ac9_0
- pyqt5-sip =12.11.0=pypi_0
- pyrsistent =0.18.0=py310h1a28f6b_0
- pysocks =1.7.1=pyha2e5f31_6
- python =3.10.9=hc0d8a6c_0
- python-dateutil =2.8.2=pyhd8ed1ab_0
- python-fastjsonschema =2.16.2=pyhd8ed1ab_0
- python-json-logger =2.0.4=pyhd8ed1ab_0
- python-slugify =8.0.0=pypi_0
- pytz =2022.7.1=pyhd8ed1ab_0
- pyyaml =6.0=py310h80987f9_1
- pyzmq =23.2.0=py310hc377ac9_0
- qt-main =5.15.2=ha2d02b5_7
- qt-webengine =5.15.9=h2903aaf_4
- qtconsole =5.4.0=py310hca03da5_0
- qtpy =2.2.0=py310hca03da5_0
- qtwebkit =5.212=h0f11f3c_4
- readline =8.1.2=h46ed386_0
- requests =2.28.2=pyhd8ed1ab_0
- requests-oauthlib =1.3.1=pypi_0
- rfc3339-validator =0.1.4=pyhd8ed1ab_0
- rfc3986-validator =0.1.1=pyh9f0ad1d_0
- rsa =4.9=pypi_0
- s3transfer =0.6.0=pyhd8ed1ab_0
- scikit-learn =1.2.0=py310h313beb8_1
- scipy =1.9.3=py310h20cbe94_0
- send2trash =1.8.0=pyhd8ed1ab_0
- setuptools =67.1.0=pyhd8ed1ab_0
- sip =6.6.2=py310hc377ac9_0
- six =1.16.0=pyh6c4a22f_0
- sniffio =1.3.0=pyhd8ed1ab_0
- soupsieve =2.3.2.post1=pyhd8ed1ab_0
- sqlite =3.40.1=h7a7dc30_0
- stack_data =0.6.2=pyhd8ed1ab_0
- tensorboard =2.11.2=pypi_0
- tensorboard-data-server =0.6.1=pypi_0
- tensorboard-plugin-wit =1.8.1=pypi_0
- tensorflow-addons =0.19.0=pypi_0
- tensorflow-estimator =2.11.0=pypi_0
- tensorflow-macos =2.11.0=pypi_0
- tensorflow-metal =0.7.0=pypi_0
- termcolor =2.2.0=pypi_0
- terminado =0.17.1=pyhd1c38e8_0
- text-unidecode =1.3=pypi_0
- threadpoolctl =3.1.0=pyh8a188c0_0
- tinycss2 =1.2.1=pyhd8ed1ab_0
- tk =8.6.12=hb8d0fd4_0
- toml =0.10.2=pyhd3eb1b0_0
- tomli =2.0.1=py310hca03da5_0
- tornado =6.2=py310h1a28f6b_0
- tqdm =4.64.1=pyhd8ed1ab_0
- traitlets =5.9.0=pyhd8ed1ab_0
- typeguard =2.13.3=pypi_0
- typing-extensions =4.4.0=hd8ed1ab_0
- typing_extensions =4.4.0=pyha770c72_0
- tzdata =2022g=h191b570_0
- urllib3 =1.26.14=pyhd8ed1ab_0
- wcwidth =0.2.6=pyhd8ed1ab_0
- webencodings =0.5.1=py_1
- websocket-client =1.5.1=pyhd8ed1ab_0
- werkzeug =2.2.2=pyhd8ed1ab_0
- wheel =0.38.4=pyhd8ed1ab_0
- widgetsnbextension =4.0.5=pyhd8ed1ab_0
- wrapt =1.14.1=pypi_0
- xorg-libxau =1.0.9=h27ca646_0
- xorg-libxdmcp =1.1.3=h27ca646_0
- xz =5.2.10=h80987f9_1
- yaml =0.2.5=h3422bc3_2
- zeromq =4.3.4=hbdafb3b_1
- zipp =3.12.1=pyhd8ed1ab_0
- zlib =1.2.13=h5a0b063_0
- zstd =1.5.2=h8574219_0