
In-Vehicle Driver State Analysis Using Image Segmentation

https://github.com/matejfric/in-vehicle-driver-state-analysis

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary

Keywords

cnn
Last synced: 6 months ago

Repository

In-Vehicle Driver State Analysis Using Image Segmentation

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
cnn
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme · License · Citation

README.md

In-Vehicle Driver State Analysis

This repository contains supplementary source code for the diploma thesis In-Vehicle Driver State Analysis Using Image Segmentation.

(Pipeline animation: model results on the Driver Monitoring Dataset [1].)


1. Repository Organization

  • model/ - Library code.
  • notebooks/ - Runnable code, primarily Jupyter notebooks, and Python scripts used to orchestrate notebook execution with varying parameters.
    • notebooks/example/ - Demonstrates an inference pipeline on sample data. Instructions for running the pipeline are provided here.
    • notebooks/datasets/ - Dataset preprocessing.
    • notebooks/eda/ - Exploratory data analysis.
    • notebooks/sam/ - Segment Anything Model 2 (SAM 2) inference (generation of ground truth masks). This module uses a different virtual environment than the rest of the project, as described here.
    • notebooks/semantic_segmentation/ - Semantic segmentation model training, evaluation, and inference.
    • notebooks/tae/ - Training and evaluation of the Temporal Autoencoder (TAE).
    • notebooks/stae/ - Training and evaluation of the Spatio-Temporal Autoencoder (STAE).
    • notebooks/memory_map/ - Exports training data to memory-mapped files (np.memmap) to speed up training and reduce CPU usage, at the cost of increased disk space.
    • notebooks/clip/ - OpenAI CLIP model (this topic is not covered in the thesis).
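The `notebooks/memory_map/` step trades disk space for faster, lower-CPU data loading. A minimal sketch of that idea with plain NumPy (the file name and array shapes are illustrative, not the project's actual layout):

```python
import os
import tempfile

import numpy as np

# Illustrative shapes: a small stack of grayscale frames.
n_frames, h, w = 8, 64, 64
path = os.path.join(tempfile.gettempdir(), "frames.dat")

# Write: create the memory-mapped file, fill it, flush to disk.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_frames, h, w))
mm[:] = np.random.rand(n_frames, h, w).astype(np.float32)
mm.flush()

# Read: reopen without loading everything into RAM; slicing reads lazily,
# so a DataLoader can pull batches straight from disk via the page cache.
frames = np.memmap(path, dtype=np.float32, mode="r", shape=(n_frames, h, w))
batch = np.asarray(frames[:4])  # materialize only the slice we need
print(batch.shape)  # (4, 64, 64)
```

Because `np.memmap` stores raw bytes, the dtype and shape must be recorded elsewhere (e.g. in a sidecar file) to reopen the array correctly.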

2. Development Environment

  • Ubuntu 24.04
  • CUDA 12.5
  • Python 3.12

```bash
conda env create -f environment.yml -n driver-state-analysis
pip install -e .
pre-commit install
```

3. What can we do with an RGB image?

The following diagram illustrates a (non-exhaustive) list of different tasks that can be performed using an RGB image as input:

```mermaid
mindmap
  root)RGB Image)
    (GT Mask)
      YOLOv8x
      Segment Anything 2
    (Semantic Segmentation)
      EfficientNet
      ResNet
      U-Net
      UNet++
    (Monocular Depth Estimation)
      Image
        MiDaS
        Marigold
        Depth Anything
        Depth Anything 2
      Video
        Video Depth Anything
        DepthCrafter
    (Edge Detection)
      Canny
      Sobel
      Laplacian
    (Pose Estimation)
      Graph NN
      YOLOv11
    LBP
    HOG
    CLIP
      Text
    Skin Segmentation
```

4. References

  1. Ortega, J., Kose, N., Cañas, P., Chao, M. A., Unnervik, A., Nieto, M., Otaegui, O., & Salgado, L. (2020). DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis. In: A. Bartoli & A. Fusiello (eds), Computer Vision - ECCV 2020 Workshops (pp. 387–405). Springer International Publishing.
  2. Ansel, J., Yang, E., He, H., Gimelshein, N., Jain, A., Voznesensky, M., Bao, B., Bell, P., Berard, D., Burovski, E., Chauhan, G., Chourdia, A., Constable, W., Desmaison, A., DeVito, Z., Ellison, E., Feng, W., Gong, J., Gschwind, M., Hirsh, B., Huang, S., Kalambarkar, K., Kirsch, L., Lazos, M., Lezcano, M., Liang, Y., Liang, J., Lu, Y., Luk, C., Maher, B., Pan, Y., Puhrsch, C., Reso, M., Saroufim, M., Siraichi, M. Y., Suk, H., Suo, M., Tillet, P., Wang, E., Wang, X., Wen, W., Zhang, S., Zhao, X., Zhou, K., Zou, R., Mathews, A., Chanan, G., Wu, P., & Chintala, S. (2024). PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation [Conference paper]. 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS '24). https://doi.org/10.1145/3620665.3640366
  3. Iakubovskii, P. (2019). Segmentation Models PyTorch (Version 0.4.0) [Computer software]. GitHub. https://github.com/qubvel/segmentation_models.pytorch
  4. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., & Rush, A. M. (2020). Transformers: State-of-the-Art Natural Language Processing [Conference paper]. 38–45. https://www.aclweb.org/anthology/2020.emnlp-demos.6
  5. Ravi, N., Gabeur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., Mintun, E., Pan, J., Alwala, K. V., Carion, N., Wu, C.-Y., Girshick, R., Dollár, P., & Feichtenhofer, C. (2024). SAM 2: Segment anything in images and videos (arXiv:2408.00714). arXiv. https://arxiv.org/abs/2408.00714
  6. Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., & Zhao, H. (2024). Depth Anything V2 (arXiv:2406.09414). arXiv. https://arxiv.org/abs/2406.09414
  7. Jocher, G., Qiu, J., & Chaurasia, A. (2023). Ultralytics YOLO (Version 8.0.0) [Computer software]. https://github.com/ultralytics/ultralytics
  8. ONNX Runtime Contributors. (2021). ONNX Runtime (Version 1.21.0) [Computer software]. https://onnxruntime.ai/
  9. ONNX Contributors. (2019). ONNX: Open Neural Network Exchange (Version 1.17.0) [Computer software]. https://onnx.ai/
  10. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision (arXiv:2103.00020). arXiv. https://arxiv.org/abs/2103.00020

Owner

  • Name: Matěj Frič
  • Login: matejfric
  • Kind: user
  • Location: Ostrava
  • Company: VSB-TUO

I am a hardworking university student pursuing a degree in Computation and Applied Mathematics. I am open to various internship opportunities.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Frič
    given-names: Matěj
    orcid: https://orcid.org/0009-0000-9885-1521
title: "Supplementary Material: In-Vehicle Driver State Analysis Using Image Segmentation"
version: 1.0.0
license: MIT
date-released: 2025-04-30
repository-code: "https://github.com/matejfric/in-vehicle-driver-state-analysis"
doi: 10.5281/zenodo.15242002

GitHub Events

Total
  • Release event: 1
  • Public event: 1
  • Push event: 14
  • Create event: 1
Last Year
  • Release event: 1
  • Public event: 1
  • Push event: 14
  • Create event: 1

Dependencies

notebooks/example/Dockerfile docker
  • python 3.12-slim build
environment.yml pypi
  • accelerate ==1.3.0
  • albucore ==0.0.23
  • albumentations ==2.0.5
  • ansicolors ==1.1.8
  • anyio ==4.6.2.post1
  • appdirs ==1.4.4
  • argon2-cffi ==23.1.0
  • argon2-cffi-bindings ==21.2.0
  • arrow ==1.3.0
  • async-lru ==2.0.5
  • babel ==2.17.0
  • backoff ==2.2.1
  • beautifulsoup4 ==4.13.3
  • bleach ==6.2.0
  • boto3 ==1.35.44
  • botocore ==1.35.44
  • build ==1.2.2.post1
  • cachecontrol ==0.14.2
  • cleo ==2.1.0
  • clip ==1.0
  • coloredlogs ==15.0.1
  • crashtest ==0.4.1
  • dacite ==1.6.0
  • dagshub ==0.3.39
  • dagshub-annotation-converter ==0.1.1
  • dataclasses-json ==0.6.7
  • decord ==0.6.0
  • defusedxml ==0.7.1
  • diffusers ==0.32.2
  • dulwich ==0.22.8
  • easydict ==1.13
  • efficientnet-pytorch ==0.7.1
  • einops ==0.8.0
  • fastjsonschema ==2.20.0
  • ffmpeg-python ==0.2.0
  • findpython ==0.6.2
  • flatbuffers ==25.2.10
  • fqdn ==1.5.1
  • ftfy ==6.3.1
  • fusepy ==3.0.1
  • future ==1.0.0
  • gql ==3.5.0
  • h11 ==0.14.0
  • httpcore ==1.0.6
  • httpx ==0.27.2
  • huggingface-hub ==0.26.0
  • humanfriendly ==10.0
  • hypothesis ==6.118.6
  • imageio-ffmpeg ==0.6.0
  • iniconfig ==2.0.0
  • installer ==0.7.0
  • ipywidgets ==8.1.5
  • isoduration ==20.11.0
  • jaraco-classes ==3.4.0
  • jaraco-context ==6.0.1
  • jaraco-functools ==4.1.0
  • jeepney ==0.9.0
  • jmespath ==1.0.1
  • json5 ==0.10.0
  • jsonpointer ==3.0.0
  • jsonschema ==4.23.0
  • jsonschema-specifications ==2024.10.1
  • jupyter ==1.1.1
  • jupyter-console ==6.6.3
  • jupyter-events ==0.12.0
  • jupyter-lsp ==2.2.5
  • jupyter-server ==2.15.0
  • jupyter-server-terminals ==0.5.3
  • jupyterlab ==4.3.6
  • jupyterlab-pygments ==0.3.0
  • jupyterlab-server ==2.27.3
  • jupyterlab-widgets ==3.0.13
  • jupytext ==1.17.0
  • keyring ==25.6.0
  • lxml ==5.3.0
  • markdown-it-py ==3.0.0
  • marshmallow ==3.23.0
  • mdit-py-plugins ==0.4.2
  • mdurl ==0.1.2
  • mistune ==3.1.3
  • ml-dtypes ==0.5.0
  • model ==0.1.0
  • more-itertools ==10.6.0
  • msgpack ==1.1.0
  • munch ==4.0.0
  • mypy-extensions ==1.0.0
  • nbclient ==0.10.0
  • nbconvert ==7.16.6
  • nbformat ==5.10.4
  • notebook ==7.3.3
  • notebook-shim ==0.2.4
  • nvidia-cublas-cu12 ==12.4.5.8
  • nvidia-cuda-cupti-cu12 ==12.4.127
  • nvidia-cuda-nvrtc-cu12 ==12.4.127
  • nvidia-cuda-runtime-cu12 ==12.4.127
  • nvidia-cudnn-cu12 ==9.1.0.70
  • nvidia-cufft-cu12 ==11.2.1.3
  • nvidia-curand-cu12 ==10.3.5.147
  • nvidia-cusolver-cu12 ==11.6.1.9
  • nvidia-cusparse-cu12 ==12.3.1.170
  • nvidia-nccl-cu12 ==2.21.5
  • nvidia-nvjitlink-cu12 ==12.4.127
  • nvidia-nvtx-cu12 ==12.4.127
  • onnx ==1.17.0
  • onnxruntime-gpu ==1.21.0
  • onnxscript ==0.1.0.dev20241110
  • overrides ==7.7.0
  • pandocfilters ==1.5.1
  • papermill ==2.6.0
  • pathvalidate ==3.2.1
  • pbs-installer ==2025.2.12
  • pkginfo ==1.12.1.2
  • pluggy ==1.5.0
  • poetry ==2.1.1
  • poetry-core ==2.1.1
  • pre-commit ==4.1.0
  • pretrainedmodels ==0.7.4
  • pycocotools ==2.0.8
  • pydantic ==2.10.6
  • pydantic-core ==2.27.2
  • pyproject-hooks ==1.2.0
  • pytest ==8.3.3
  • python-graphviz ==0.20.3
  • python-json-logger ==3.3.0
  • rapidfuzz ==3.12.2
  • referencing ==0.35.1
  • regex ==2024.9.11
  • requests-toolbelt ==1.0.0
  • rfc3339-validator ==0.1.4
  • rfc3986-validator ==0.1.1
  • rich ==13.9.2
  • rpds-py ==0.20.0
  • ruff ==0.9.7
  • s3transfer ==0.10.3
  • safetensors ==0.4.5
  • sam2util ==1.1.3
  • seaborn ==0.13.2
  • secretstorage ==3.3.3
  • segmentation-models-pytorch ==0.4.0
  • send2trash ==1.8.3
  • shellingham ==1.5.4
  • simsimd ==6.2.1
  • sniffio ==1.3.1
  • sortedcontainers ==2.4.0
  • soupsieve ==2.6
  • stringzilla ==3.12.1
  • supervision ==0.25.1
  • sympy ==1.13.1
  • tabulate ==0.9.0
  • tenacity ==9.0.0
  • terminado ==0.18.1
  • timm ==0.9.7
  • tinycss2 ==1.4.0
  • tokenizers ==0.20.1
  • tomlkit ==0.13.2
  • torch ==2.5.1
  • torchinfo ==1.8.0
  • torchview ==0.2.6
  • transformers ==4.46.0
  • treelib ==1.7.0
  • trove-classifiers ==2025.3.3.18
  • types-python-dateutil ==2.9.0.20241206
  • typing-extensions ==4.12.2
  • typing-inspect ==0.9.0
  • uri-template ==1.3.0
  • virtualenv ==20.29.2
  • webcolors ==24.11.1
  • webencodings ==0.5.1
  • widgetsnbextension ==4.0.13
  • xformers ==0.0.29.post1
  • zstandard ==0.23.0
model/depth_anything/requirements.txt pypi
  • decord *
  • easydict *
  • einops *
  • imageio *
  • imageio-ffmpeg *
  • matplotlib *
  • numpy *
  • opencv-python *
  • pillow *
  • torch *
  • torchvision *
  • tqdm *
  • xformers *
model/setup.py pypi
notebooks/example/requirements.txt pypi
  • jupyterlab ==4.4.0
  • matplotlib ==3.10.1
  • mlflow ==2.21.3
  • numpy ==2.2.4
  • onnxruntime ==1.21.1
  • opencv-python-headless ==4.11.0.86
  • pandas ==2.2.3
  • pillow ==11.2.1
  • tqdm ==4.67.1
  • transformers ==4.51.3
notebooks/sam/requirements.txt pypi
  • ffmpeg-python *
  • ipykernel *
  • matplotlib *
  • ninja *
  • papermill *
  • sam2util *
  • setuptools *
  • supervision *
  • ultralytics *
  • wheel *
pyproject.toml pypi