attention-based-models-for-hyper-kvasir

Automatic and accurate analysis of medical images is a subject of great importance in our current society. In particular, this work focuses on gastrointestinal endoscopy images, as the study of these images helps to detect possible health conditions in those regions.

https://github.com/richardesp/attention-based-models-for-hyper-kvasir

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: richardesp
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 77.2 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Attention-based Models for Hyper-Kvasir

Automatic and accurate analysis of medical images is a subject of great importance in our current society. In particular, this work focuses on gastrointestinal endoscopy images, as the study of these images helps to detect possible health conditions in those regions.

Table of Contents

  1. Installation
  2. Project Structure
  3. Usage
  4. Dataset
  5. License
  6. Acknowledgments

Installation

To set up the environment and install the necessary dependencies, follow these steps:

  1. Clone the repository:

     git clone https://github.com/richardesp/attention-based-models-for-hyper-kvasir.git
     cd attention-based-models-for-hyper-kvasir

  2. Create and activate a Conda environment. requirements.txt pins python 3.10.9 and py310 package builds, so the environment uses Python 3.10:

     conda create --name hyperkvasir_env python=3.10
     conda activate hyperkvasir_env

  3. Install the required packages:

     conda install --file requirements.txt
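After installation, a quick check that the main libraries can be found may save a failed notebook run later. The following is a minimal sketch, not part of the repository: the module list is an assumption derived from the pinned packages (tensorflow-macos is imported as `tensorflow`, opencv as `cv2`, scikit-learn as `sklearn`), and `find_missing` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed import names for packages pinned in requirements.txt
# (tensorflow-macos -> tensorflow, opencv -> cv2, scikit-learn -> sklearn).
REQUIRED_MODULES = ["tensorflow", "numpy", "pandas", "cv2", "sklearn", "matplotlib"]


def find_missing(modules):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

The check only confirms that each module can be located; it does not import TensorFlow or test GPU access. The repository's own src/utils/check_gpu_arm.py is not reproduced here.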

Project Structure

.
├── LICENSE.txt
├── README.md
├── main.py
├── notebooks
│   ├── Pre-processing notebooks...
│   ├── Model training notebooks...
│   └── Visualization and analysis notebooks...
├── requirements.txt
├── scripts
│   └── download_dataset.sh
└── src
    ├── base
    │   └── base_make_dataset.py
    ├── data
    │   └── make_dataset.py
    ├── features
    │   └── build_features.py
    ├── models
    │   ├── predict_model.py
    │   └── train_model.py
    ├── utils
    │   ├── check_gpu_arm.py
    │   └── get_vprint.py
    └── visualization
        └── visualize.py

Usage

Training and prediction are run from the notebooks in the notebooks directory. Open the notebook that matches the task:

  • For preprocessing tasks, refer to the notebooks prefixed with 1.x-rep.
  • For model training, especially with Vision Transformers (ViT), refer to the notebooks prefixed with 2.x-rep.
  • For initial runs with pre-trained Vision Transformers, use the notebook 3.0-rep-pre-trained-vit-initial-run.ipynb.
  • For visualization and analysis, including attention extraction, UMAP, and importance correlation plots, refer to the notebooks prefixed with 5.x-rep.
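Because the notebooks are organized by numeric prefix, a short script can list them by stage. This is a sketch under assumptions: it expects filenames that begin with a number followed by a dot (as in 3.0-rep-pre-trained-vit-initial-run.ipynb), and `group_notebooks` is a hypothetical helper, not something the repository provides.

```python
import re
from collections import defaultdict
from pathlib import Path


def group_notebooks(notebook_dir):
    """Group *.ipynb files by the major number at the start of their filename."""
    groups = defaultdict(list)
    for path in sorted(Path(notebook_dir).glob("*.ipynb")):
        match = re.match(r"(\d+)\.", path.name)
        # Files without a numeric prefix are collected under "other".
        key = match.group(1) if match else "other"
        groups[key].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    for stage, names in sorted(group_notebooks("notebooks").items()):
        print(f"stage {stage}:")
        for name in names:
            print(f"  {name}")
```

Run from the repository root, this prints one block per prefix, so stage 1 would hold the preprocessing notebooks and stage 5 the visualization notebooks.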

Dataset

The dataset used in this project can be found at Hyper-Kvasir Dataset.

Dataset Details

The dataset can be split into four distinct parts:

  • Labeled image data
  • Unlabeled image data
  • Segmented image data
  • Annotated video data

Each part is further described below:

Labeled Images

In total, the dataset contains 10,662 labeled images stored in JPEG format. The images can be found in the images folder. Each image's class corresponds to the folder it is stored in (e.g., the 'polyp' folder contains all polyp images, the 'barretts' folder contains all images of Barrett's esophagus, etc.). The number of images per class is not balanced, which is a general challenge in the medical field because some findings occur more often than others. This poses an additional challenge for researchers, since methods applied to the data should also be able to learn from a small amount of training data. The labeled images represent 23 different classes of findings.
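Since the class label is the folder name and the classes are imbalanced, one common first step is to count images per folder and derive per-class weights. The sketch below is illustrative and not taken from the repository: the function names and the `exclude` parameter are hypothetical, and the weighting is the "balanced" heuristic total / (n_classes * count), the same formula scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(images_root, exclude=()):
    """Count JPEG files per class, taking the parent folder name as the label.

    Folder names listed in `exclude` are skipped (for example an unlabeled folder).
    """
    counts = Counter()
    for path in Path(images_root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        label = path.parent.name
        if label not in exclude:
            counts[label] += 1
    return counts


def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * count), so rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}
```

For example, `inverse_frequency_weights(count_images_per_class("images", exclude=("unlabeled",)))` would return a dictionary keyed by class name; the folder names passed here are assumptions about the local layout. Such weights are one option among several (oversampling and focal losses are others), and the text above does not state which, if any, the project uses.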

Unlabeled Images

In total, the dataset contains 99,417 unlabeled images. The unlabeled images can be found in the unlabeled folder, which is a subfolder of the images folder alongside the labeled image folders. In addition to the unlabeled image files, we also provide the extracted global features and cluster assignments in the Hyper-Kvasir GitHub repository as Attribute-Relation File Format (ARFF) files. ARFF files can be opened and processed using, for example, the WEKA machine learning library, or they can easily be converted into comma-separated values (CSV) files.
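The ARFF-to-CSV conversion mentioned above can be sketched with the Python standard library alone. This is a simplified illustration, not a full ARFF parser: it assumes dense (non-sparse) data rows and attribute names without whitespace, and `arff_to_csv` is a hypothetical helper name.

```python
import csv
import io


def arff_to_csv(arff_text):
    """Convert a dense ARFF document to CSV text (header row plus data rows)."""
    attributes = []
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lowered = line.lower()
        if not in_data:
            if lowered.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name
                name = line.split(None, 2)[1]
                attributes.append(name.strip("'\""))
            elif lowered.startswith("@data"):
                in_data = True
        else:
            # ARFF data rows are comma-separated; single quotes may wrap values
            rows.append(next(csv.reader([line], quotechar="'", skipinitialspace=True)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(attributes)
    writer.writerows(rows)
    return out.getvalue()
```

A typical call reads the file and writes the result, for example `Path("features.csv").write_text(arff_to_csv(Path("features.arff").read_text()))`, where both filenames are placeholders. For files that use sparse rows or quoted attribute names, a dedicated ARFF reader is the safer choice.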

Segmented Images

We provide the original image, a segmentation mask, and a bounding box for 1,000 images from the polyp class. In the mask, the pixels depicting polyp tissue, the region of interest, are represented by the foreground (white mask), while the background (in black) does not contain polyp pixels. The bounding box is defined as the outermost pixels of the found polyp. For this segmentation set, we have two folders, one for images and one for masks, each containing 1,000 JPEG-compressed images. The bounding boxes for the corresponding images are stored in a JavaScript Object Notation (JSON) file. The image and its corresponding mask have the same filename. The images and files are stored in the segmented images folder. Note that the segmented images duplicate images already present in the polyp class of the images folder, since they were taken from there.
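The definition of the bounding box as the outermost polyp pixels can be made concrete with a small function that scans a mask. This is a sketch under assumptions: the mask is given as a 2D list of grayscale values, and `mask_bounding_box` and its `threshold` parameter are hypothetical. A threshold is used because JPEG compression can leave mask pixels that are not exactly black or white. The layout of the dataset's JSON file is not reproduced here.

```python
def mask_bounding_box(mask, threshold=127):
    """Return (xmin, ymin, xmax, ymax) of the foreground pixels, or None if there are none.

    `mask` is a 2D sequence of rows; values above `threshold` count as foreground.
    Coordinates are zero-based and inclusive.
    """
    rows = [y for y, row in enumerate(mask) if any(v > threshold for v in row)]
    if not rows:
        return None
    width = len(mask[0])
    cols = [x for x in range(width) if any(row[x] > threshold for row in mask)]
    return cols[0], rows[0], cols[-1], rows[-1]


if __name__ == "__main__":
    demo = [
        [0, 12, 0, 0],     # 12 stands in for compression noise, below the threshold
        [0, 0, 255, 0],
        [0, 0, 250, 240],
        [0, 0, 0, 0],
    ]
    print(mask_bounding_box(demo))  # (2, 1, 3, 2)
```

With an image library, the same result is usually obtained by loading the mask as an array and taking the minimum and maximum indices of the foreground pixels along each axis.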

Annotated Videos

The dataset contains a total of 373 videos showing different findings and landmarks. This corresponds to approximately 11.62 hours of video and 1,059,519 video frames that can be converted to images if needed. Each video has been manually assessed by a medical professional working in the field of gastroenterology, resulting in a total of 171 annotated findings.
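The three totals above imply an average frame rate and an average video length, which can help when planning frame extraction. The calculation below uses only the figures quoted in this section; the results are approximate because the duration is rounded to 11.62 hours, and individual videos may differ from the average.

```python
TOTAL_VIDEOS = 373
TOTAL_HOURS = 11.62
TOTAL_FRAMES = 1_059_519

total_seconds = TOTAL_HOURS * 3600
avg_fps = TOTAL_FRAMES / total_seconds              # frames per second across all videos
avg_frames_per_video = TOTAL_FRAMES / TOTAL_VIDEOS
avg_seconds_per_video = total_seconds / TOTAL_VIDEOS

print(f"average frame rate:     {avg_fps:.1f} fps")
print(f"average frames / video: {avg_frames_per_video:.0f}")
print(f"average video length:   {avg_seconds_per_video:.0f} s")
```

The average works out to roughly 25 frames per second and a little under two minutes per video.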

License

The license for the Hyper-Kvasir dataset is Creative Commons Attribution 4.0 International (CC BY 4.0).

More information can be found here.

Acknowledgments

We would like to extend our sincere gratitude to:

  • Isabel Jiménez-Velasco
  • Manuel J. Marín-Jiménez
  • Rafael Muñoz-Salinas

for their invaluable contributions and insights that greatly benefited this project.

Citation

If you use this work, please cite it as follows:

@inproceedings{espantaleon2023caip,
  author    = {Ricardo Espantale{\'{o}}n{-}P{\'{e}}rez and
               Isabel Jim{\'{e}}nez{-}Velasco and
               Rafael Mu{\~{n}}oz{-}Salinas and
               Manuel J. Mar{\'{\i}}n{-}Jim{\'{e}}nez},
  title     = {Empirical Study of Attention-Based Models for Automatic Classification of Gastrointestinal Endoscopy Images},
  booktitle = {Computer Analysis of Images and Patterns - 20th International Conference, {CAIP}},
  series    = {Lecture Notes in Computer Science},
  volume    = {14185},
  pages     = {98--108},
  year      = {2023},
  doi       = {10.1007/978-3-031-44240-7\_10}
}

Owner

  • Login: richardesp
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the following work:"
authors:
  - family-names: Espantaleón-Pérez
    given-names: Ricardo
  - family-names: Jiménez-Velasco
    given-names: Isabel
  - family-names: Muñoz-Salinas
    given-names: Rafael
  - family-names: Marín-Jiménez
    given-names: Manuel J.
title: "Empirical Study of Attention-Based Models for Automatic Classification of Gastrointestinal Endoscopy Images"
type: article
year: 2023
doi: "10.1007/978-3-031-44240-7_10"
conference:
  name: "Computer Analysis of Images and Patterns - 20th International Conference, CAIP"
  series: "Lecture Notes in Computer Science"
  volume: "14185"
pages: "98-108"

GitHub Events

Total
  • Release event: 1
  • Watch event: 2
  • Push event: 2
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 2
  • Push event: 2
  • Create event: 1

Dependencies

requirements.txt pypi
  • absl-py =1.4.0=pypi_0
  • anyio =3.6.2=pyhd8ed1ab_0
  • appdirs =1.4.4=pyh9f0ad1d_0
  • appnope =0.1.3=pyhd8ed1ab_0
  • argon2-cffi =21.3.0=pyhd8ed1ab_0
  • argon2-cffi-bindings =21.2.0=py310h1a28f6b_0
  • asttokens =2.2.1=pyhd8ed1ab_0
  • astunparse =1.6.3=pypi_0
  • attrs =22.2.0=pyh71513ae_0
  • babel =2.11.0=py310hca03da5_0
  • backcall =0.2.0=pyh9f0ad1d_0
  • backports =1.0=pyhd8ed1ab_3
  • backports.functools_lru_cache =1.6.4=pyhd8ed1ab_0
  • bayesian-optimization =1.4.2=pypi_0
  • beautifulsoup4 =4.11.2=pyha770c72_0
  • blas =1.0=openblas
  • bleach =6.0.0=pyhd8ed1ab_0
  • boto3 =1.26.64=pyhd8ed1ab_0
  • botocore =1.29.64=pyhd8ed1ab_0
  • bottleneck =1.3.5=py310h96f19d2_0
  • brotli =1.0.9=h1a8c8d9_8
  • brotli-bin =1.0.9=h1a8c8d9_8
  • brotlipy =0.7.0=py310h1a28f6b_1002
  • bzip2 =1.0.8=h3422bc3_4
  • c-ares =1.18.1=h3422bc3_0
  • ca-certificates =2023.01.10=hca03da5_0
  • cached-property =1.5.2=hd8ed1ab_1
  • cached_property =1.5.2=pyha770c72_1
  • cachetools =5.3.0=pypi_0
  • cairo =1.16.0=h302bd0f_3
  • certifi =2022.12.7=py310hca03da5_0
  • cffi =1.15.1=py310h80987f9_3
  • charset-normalizer =2.1.1=pyhd8ed1ab_0
  • click =8.1.3=unix_pyhd8ed1ab_2
  • cloudpickle =2.2.1=pypi_0
  • colorama =0.4.6=pyhd8ed1ab_0
  • comm =0.1.2=pyhd8ed1ab_0
  • cryptography =38.0.4=py310h834c97f_0
  • cycler =0.11.0=pyhd8ed1ab_0
  • debugpy =1.5.1=py310hc377ac9_0
  • decorator =5.1.1=pyhd8ed1ab_0
  • defusedxml =0.7.1=pyhd8ed1ab_0
  • eigen =3.3.7=h525c30c_1
  • entrypoints =0.4=pyhd8ed1ab_0
  • executing =1.2.0=pyhd8ed1ab_0
  • expat =2.4.9=hc377ac9_0
  • fftw =3.3.9=h1a28f6b_1
  • flask =2.2.2=pyhd8ed1ab_0
  • flatbuffers =23.1.21=pypi_0
  • flit-core =3.8.0=pyhd8ed1ab_0
  • fontconfig =2.14.1=h6b8db82_1
  • fonttools =4.25.0=pyhd3eb1b0_0
  • freetype =2.12.1=h1192e45_0
  • gast =0.4.0=pypi_0
  • gettext =0.21.0=h826f4ad_0
  • giflib =5.2.1=h80987f9_1
  • glib =2.69.1=h514c7bf_2
  • google-auth =2.16.0=pypi_0
  • google-auth-oauthlib =0.4.6=pypi_0
  • google-pasta =0.2.0=pypi_0
  • graphite2 =1.3.14=hc377ac9_1
  • grpcio =1.51.1=pypi_0
  • gst-plugins-base =1.14.1=hf0a386a_0
  • gstreamer =1.14.1=he09cfb7_0
  • gym =0.26.2=pypi_0
  • gym-notices =0.0.8=pypi_0
  • h5py =3.7.0=py310h181c318_0
  • harfbuzz =4.3.0=hb1b0ec1_0
  • hdf5 =1.12.1=h160e8cb_2
  • icu =68.1=hc377ac9_0
  • idna =3.4=pyhd8ed1ab_0
  • importlib-metadata =6.0.0=pyha770c72_0
  • importlib_metadata =6.0.0=hd8ed1ab_0
  • importlib_resources =5.10.2=pyhd8ed1ab_0
  • ipykernel =6.19.2=py310h33ce5c2_0
  • ipython =8.9.0=pyhd1c38e8_0
  • ipython_genutils =0.2.0=py_1
  • ipywidgets =8.0.4=pyhd8ed1ab_0
  • itsdangerous =2.1.2=pyhd8ed1ab_0
  • jedi =0.18.2=pyhd8ed1ab_0
  • jinja2 =3.1.2=pyhd8ed1ab_1
  • jmespath =1.0.1=pyhd8ed1ab_0
  • joblib =1.2.0=pyhd8ed1ab_0
  • jpeg =9e=he4db4b2_2
  • json5 =0.9.6=pyhd3eb1b0_0
  • jsonschema =4.17.3=pyhd8ed1ab_0
  • jupyter =1.0.0=py310hca03da5_8
  • jupyter_client =8.0.2=pyhd8ed1ab_0
  • jupyter_console =6.4.4=pyhd8ed1ab_0
  • jupyter_core =5.1.1=py310hca03da5_0
  • jupyter_events =0.6.3=pyhd8ed1ab_0
  • jupyter_server =1.23.4=py310hca03da5_0
  • jupyter_server_terminals =0.4.4=pyhd8ed1ab_1
  • jupyterlab =3.5.3=py310hca03da5_0
  • jupyterlab_pygments =0.2.2=pyhd8ed1ab_0
  • jupyterlab_server =2.16.5=py310hca03da5_0
  • jupyterlab_widgets =3.0.5=pyhd8ed1ab_0
  • kaggle =1.5.12=pypi_0
  • keras =2.11.0=pypi_0
  • keras-preprocessing =1.1.2=pyhd3eb1b0_0
  • kiwisolver =1.4.4=py310h313beb8_0
  • krb5 =1.19.4=h8380606_0
  • lcms2 =2.14=h481adae_1
  • lerc =3.0=hc377ac9_0
  • libaec =1.0.6=hb7217d7_1
  • libblas =3.9.0=16_osxarm64_openblas
  • libbrotlicommon =1.0.9=h1a8c8d9_8
  • libbrotlidec =1.0.9=h1a8c8d9_8
  • libbrotlienc =1.0.9=h1a8c8d9_8
  • libcblas =3.9.0=16_osxarm64_openblas
  • libclang =15.0.6.1=pypi_0
  • libcurl =7.87.0=h0f1d93c_0
  • libcxx =14.0.6=h2692d47_0
  • libdeflate =1.8=h1a28f6b_5
  • libedit =3.1.20221030=h80987f9_0
  • libev =4.33=h642e427_1
  • libffi =3.4.2=h3422bc3_5
  • libgfortran =5.0.0=11_3_0_hd922786_27
  • libgfortran5 =11.3.0=hdaf2cc0_27
  • libiconv =1.17=he4db4b2_0
  • liblapack =3.9.0=16_osxarm64_openblas
  • libllvm12 =12.0.0=h12f7ac0_4
  • libnghttp2 =1.46.0=h95c9599_0
  • libopenblas =0.3.21=openmp_hc731615_3
  • libpng =1.6.37=hb8d0fd4_0
  • libpq =12.9=h65cfe13_3
  • libsodium =1.0.18=h27ca646_1
  • libssh2 =1.10.0=hf27765b_0
  • libtiff =4.5.0=h313beb8_1
  • libwebp =1.2.4=h68602c7_0
  • libwebp-base =1.2.4=h57fd34a_0
  • libxcb =1.13=h9b22ae9_1004
  • libxml2 =2.9.14=h8c5e841_0
  • libxslt =1.1.35=h9833966_0
  • llvm-openmp =15.0.7=h7cfbb63_0
  • lxml =4.9.1=py310h2fae87d_0
  • lz4-c =1.9.4=h313beb8_0
  • markdown =3.4.1=pypi_0
  • markupsafe =2.1.1=py310h1a28f6b_0
  • matplotlib =3.5.2=py310hca03da5_0
  • matplotlib-base =3.5.2=py310hc377ac9_0
  • matplotlib-inline =0.1.6=pyhd8ed1ab_0
  • mistune =2.0.4=pyhd8ed1ab_0
  • munkres =1.1.4=pyh9f0ad1d_0
  • nbclassic =0.5.1=pyhd8ed1ab_0
  • nbclient =0.7.2=pyhd8ed1ab_0
  • nbconvert =7.2.9=pyhd8ed1ab_0
  • nbconvert-core =7.2.9=pyhd8ed1ab_0
  • nbconvert-pandoc =7.2.9=pyhd8ed1ab_0
  • nbformat =5.7.3=pyhd8ed1ab_0
  • ncurses =6.3=h07bb92c_1
  • nest-asyncio =1.5.6=pyhd8ed1ab_0
  • notebook =6.5.2=pyha770c72_1
  • notebook-shim =0.2.2=pyhd8ed1ab_0
  • nspr =4.33=hc377ac9_0
  • nss =3.74=h142855e_0
  • numexpr =2.8.4=py310hecc3335_0
  • numpy =1.23.5=py310hb93e574_0
  • numpy-base =1.23.5=py310haf87e8b_0
  • oauthlib =3.2.2=pypi_0
  • opencv =4.6.0=py310he2359d5_2
  • openssl =1.1.1s=h1a28f6b_0
  • opt-einsum =3.3.0=pypi_0
  • packaging =23.0=pyhd8ed1ab_0
  • pandas =1.5.2=py310h46d7db6_0
  • pandas-datareader =0.10.0=pyh6c4a22f_0
  • pandoc =2.19.2=hce30654_1
  • pandocfilters =1.5.0=pyhd8ed1ab_0
  • parso =0.8.3=pyhd8ed1ab_0
  • pcre =8.45=hc377ac9_0
  • pexpect =4.8.0=pyh1a96a4e_2
  • pickleshare =0.7.5=py_1003
  • pillow =9.3.0=py310h313beb8_2
  • pip =23.0=pyhd8ed1ab_0
  • pixman =0.40.0=h1a28f6b_0
  • pkgutil-resolve-name =1.3.10=pyhd8ed1ab_0
  • platformdirs =2.6.2=pyhd8ed1ab_0
  • ply =3.11=py310hca03da5_0
  • pooch =1.6.0=pyhd8ed1ab_0
  • prometheus_client =0.16.0=pyhd8ed1ab_0
  • prompt-toolkit =3.0.36=pyha770c72_0
  • prompt_toolkit =3.0.36=hd8ed1ab_0
  • protobuf =3.19.6=pypi_0
  • psutil =5.9.0=py310h1a28f6b_0
  • pthread-stubs =0.4=h27ca646_1001
  • ptyprocess =0.7.0=pyhd3deb0d_0
  • pure_eval =0.2.2=pyhd8ed1ab_0
  • pyasn1 =0.4.8=pypi_0
  • pyasn1-modules =0.2.8=pypi_0
  • pycparser =2.21=pyhd8ed1ab_0
  • pygments =2.14.0=pyhd8ed1ab_0
  • pyopenssl =23.0.0=pyhd8ed1ab_0
  • pyparsing =3.0.9=pyhd8ed1ab_0
  • pyqt =5.15.7=py310hc377ac9_0
  • pyqt5-sip =12.11.0=pypi_0
  • pyrsistent =0.18.0=py310h1a28f6b_0
  • pysocks =1.7.1=pyha2e5f31_6
  • python =3.10.9=hc0d8a6c_0
  • python-dateutil =2.8.2=pyhd8ed1ab_0
  • python-fastjsonschema =2.16.2=pyhd8ed1ab_0
  • python-json-logger =2.0.4=pyhd8ed1ab_0
  • python-slugify =8.0.0=pypi_0
  • pytz =2022.7.1=pyhd8ed1ab_0
  • pyyaml =6.0=py310h80987f9_1
  • pyzmq =23.2.0=py310hc377ac9_0
  • qt-main =5.15.2=ha2d02b5_7
  • qt-webengine =5.15.9=h2903aaf_4
  • qtconsole =5.4.0=py310hca03da5_0
  • qtpy =2.2.0=py310hca03da5_0
  • qtwebkit =5.212=h0f11f3c_4
  • readline =8.1.2=h46ed386_0
  • requests =2.28.2=pyhd8ed1ab_0
  • requests-oauthlib =1.3.1=pypi_0
  • rfc3339-validator =0.1.4=pyhd8ed1ab_0
  • rfc3986-validator =0.1.1=pyh9f0ad1d_0
  • rsa =4.9=pypi_0
  • s3transfer =0.6.0=pyhd8ed1ab_0
  • scikit-learn =1.2.0=py310h313beb8_1
  • scipy =1.9.3=py310h20cbe94_0
  • send2trash =1.8.0=pyhd8ed1ab_0
  • setuptools =67.1.0=pyhd8ed1ab_0
  • sip =6.6.2=py310hc377ac9_0
  • six =1.16.0=pyh6c4a22f_0
  • sniffio =1.3.0=pyhd8ed1ab_0
  • soupsieve =2.3.2.post1=pyhd8ed1ab_0
  • sqlite =3.40.1=h7a7dc30_0
  • stack_data =0.6.2=pyhd8ed1ab_0
  • tensorboard =2.11.2=pypi_0
  • tensorboard-data-server =0.6.1=pypi_0
  • tensorboard-plugin-wit =1.8.1=pypi_0
  • tensorflow-addons =0.19.0=pypi_0
  • tensorflow-estimator =2.11.0=pypi_0
  • tensorflow-macos =2.11.0=pypi_0
  • tensorflow-metal =0.7.0=pypi_0
  • termcolor =2.2.0=pypi_0
  • terminado =0.17.1=pyhd1c38e8_0
  • text-unidecode =1.3=pypi_0
  • threadpoolctl =3.1.0=pyh8a188c0_0
  • tinycss2 =1.2.1=pyhd8ed1ab_0
  • tk =8.6.12=hb8d0fd4_0
  • toml =0.10.2=pyhd3eb1b0_0
  • tomli =2.0.1=py310hca03da5_0
  • tornado =6.2=py310h1a28f6b_0
  • tqdm =4.64.1=pyhd8ed1ab_0
  • traitlets =5.9.0=pyhd8ed1ab_0
  • typeguard =2.13.3=pypi_0
  • typing-extensions =4.4.0=hd8ed1ab_0
  • typing_extensions =4.4.0=pyha770c72_0
  • tzdata =2022g=h191b570_0
  • urllib3 =1.26.14=pyhd8ed1ab_0
  • wcwidth =0.2.6=pyhd8ed1ab_0
  • webencodings =0.5.1=py_1
  • websocket-client =1.5.1=pyhd8ed1ab_0
  • werkzeug =2.2.2=pyhd8ed1ab_0
  • wheel =0.38.4=pyhd8ed1ab_0
  • widgetsnbextension =4.0.5=pyhd8ed1ab_0
  • wrapt =1.14.1=pypi_0
  • xorg-libxau =1.0.9=h27ca646_0
  • xorg-libxdmcp =1.1.3=h27ca646_0
  • xz =5.2.10=h80987f9_1
  • yaml =0.2.5=h3422bc3_2
  • zeromq =4.3.4=hbdafb3b_1
  • zipp =3.12.1=pyhd8ed1ab_0
  • zlib =1.2.13=h5a0b063_0
  • zstd =1.5.2=h8574219_0
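The pins above follow the `name=version=build` form produced by `conda list --export`. Entries whose build string is `pypi_0` were installed with pip, and conda may not be able to resolve them from its channels; many of the remaining build strings are specific to macOS arm64. One way to handle this is to split the file into conda specs and pip specs, as in the sketch below. `split_requirements` is a hypothetical helper, not part of the repository.

```python
def split_requirements(lines):
    """Split `name=version=build` lines into conda specs and pip specs.

    Lines whose build string starts with "pypi" are treated as pip installs.
    """
    conda_specs, pip_specs = [], []
    for raw in lines:
        line = raw.replace(" ", "").strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header conda writes
        parts = line.split("=")
        if len(parts) != 3:
            continue  # skip anything that is not name=version=build
        name, version, build = parts
        if build.startswith("pypi"):
            pip_specs.append(f"{name}=={version}")
        else:
            conda_specs.append(f"{name}={version}={build}")
    return conda_specs, pip_specs


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        conda_specs, pip_specs = split_requirements(handle)
    print(len(conda_specs), "conda specs,", len(pip_specs), "pip specs")
```

The two lists can then be written to separate files and passed to `conda install --file` and `pip install -r` respectively. On platforms other than macOS arm64, dropping the build string from the conda specs is likely to be necessary as well.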