in-vehicle-driver-state-analysis
In-Vehicle Driver State Analysis Using Image Segmentation
https://github.com/matejfric/in-vehicle-driver-state-analysis
Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.2%) to scientific vocabulary
Keywords
Repository
In-Vehicle Driver State Analysis Using Image Segmentation
Basic Info
- Host: GitHub
- Owner: matejfric
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://doi.org/10.5281/zenodo.15242002
- Size: 10.6 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Topics
Metadata Files
README.md
In-Vehicle Driver State Analysis
This repository contains supplementary source code for the diploma thesis In-Vehicle Driver State Analysis Using Image Segmentation.
(Model results on the Driver Monitoring Dataset [1].)
- 1. Repository Organization
- 2. Development Environment
- 3. What can we do with an RGB image?
- 4. References
1. Repository Organization
- model/ - Library code.
- notebooks/ - Runnable code, primarily Jupyter notebooks, plus Python scripts used to orchestrate notebook execution with varying parameters.
- notebooks/example/ - Demonstrates an inference pipeline on sample data. Instructions for running the pipeline are provided here.
- notebooks/datasets/ - Dataset preprocessing.
- notebooks/eda/ - Exploratory data analysis.
- notebooks/sam/ - Segment Anything Model 2 (SAM 2) inference (generation of ground-truth masks). This module uses a different virtual environment than the rest of the project, as described here.
- notebooks/semantic_segmentation/ - Semantic segmentation model training, evaluation, and inference.
- notebooks/tae/ - Training and evaluation of the Temporal Autoencoder (TAE).
- notebooks/stae/ - Training and evaluation of the Spatio-Temporal Autoencoder (STAE).
- notebooks/memory_map/ - Exports training data to memory-mapped files (np.memmap) to speed up training and reduce CPU usage, at the cost of increased disk space.
- notebooks/clip/ - OpenAI CLIP model (this topic is not covered in the thesis).
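The idea behind the notebooks/memory_map/ export can be sketched as follows: frames are written once into an np.memmap file, and at training time they are paged in on demand, so a data loader indexes raw arrays instead of decoding video on the CPU. This is a minimal illustration with a hypothetical file name and frame dimensions, not the thesis code itself.

```python
import numpy as np

# Hypothetical parameters: 100 grayscale frames of 64x64 pixels
# (the real notebooks use the actual dataset dimensions).
n_frames, h, w = 100, 64, 64
path = "frames.dat"

# Export: write frames one by one into a memory-mapped file on disk.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_frames, h, w))
for i in range(n_frames):
    frame = np.random.rand(h, w).astype(np.float32)  # stand-in for a decoded video frame
    mm[i] = frame
mm.flush()  # persist buffered writes to disk
del mm

# Training time: reopen read-only; only the indexed pages are read,
# so slicing a batch touches just those frames on disk.
mm = np.memmap(path, dtype=np.float32, mode="r", shape=(n_frames, h, w))
batch = mm[0:8]  # lazily reads only these 8 frames
```

The trade-off matches the README's note: disk usage grows (frames are stored uncompressed), but per-epoch CPU cost drops because no decoding happens during training.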
2. Development Environment
- Ubuntu 24.04
- CUDA 12.5
- Python 3.12
```bash
conda env create -f environment.yml -n driver-state-analysis
pip install -e .
pre-commit install
```
3. What can we do with an RGB image?
The following diagram illustrates a non-exhaustive set of tasks that can be performed with an RGB image as input:
```mermaid
mindmap
  root)RGB Image)
    (GT Mask)
      YOLOv8x
      Segment Anything 2
    (Semantic Segmentation)
      EfficientNet
      ResNet
      U-Net
      UNet++
    (Monocular Depth Estimation)
      Image
        MiDaS
        Marigold
        Depth Anything
        Depth Anything 2
      Video
        Video Depth Anything
        DepthCrafter
    (Edge Detection)
      Canny
      Sobel
      Laplacian
    (Pose Estimation)
      Graph NN
      YOLOv11
    LBP
    HOG
    CLIP
      Text
    Skin Segmentation
```
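Of the branches above, the classical edge detectors (Canny, Sobel, Laplacian) need no learned model. As a minimal sketch in pure NumPy (the hypothetical helper `sobel_edges` is for illustration only; in practice one would call e.g. OpenCV), the Sobel gradient magnitude of a grayscale image can be computed as:

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude of a 2-D grayscale image via 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                         # vertical gradient
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Correlate each pixel's 3x3 neighbourhood with both kernels.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

# A vertical step edge: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)  # responds only along the step at columns 3-4
```

A strong response appears only at the two columns straddling the intensity step; flat regions yield zero gradient magnitude.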
4. References
1. Ortega, J., Kose, N., Cañas, P., Chao, M. A., Unnervik, A., Nieto, M., Otaegui, O., & Salgado, L. (2020). DMD: A Large-Scale Multi-Modal Driver Monitoring Dataset for Attention and Alertness Analysis. In A. Bartoli & A. Fusiello (Eds.), Computer Vision - ECCV 2020 Workshops (pp. 387–405). Springer International Publishing.
2. Ansel, J., Yang, E., He, H., Gimelshein, N., Jain, A., Voznesensky, M., Bao, B., Bell, P., Berard, D., Burovski, E., Chauhan, G., Chourdia, A., Constable, W., Desmaison, A., DeVito, Z., Ellison, E., Feng, W., Gong, J., Gschwind, M., Hirsh, B., Huang, S., Kalambarkar, K., Kirsch, L., Lazos, M., Lezcano, M., Liang, Y., Liang, J., Lu, Y., Luk, C., Maher, B., Pan, Y., Puhrsch, C., Reso, M., Saroufim, M., Siraichi, M. Y., Suk, H., Suo, M., Tillet, P., Wang, E., Wang, X., Wen, W., Zhang, S., Zhao, X., Zhou, K., Zou, R., Mathews, A., Chanan, G., Wu, P., & Chintala, S. (2024). PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation [Conference paper]. 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS '24). https://doi.org/10.1145/3620665.3640366
3. Iakubovskii, P. (2019). Segmentation Models PyTorch (Version 0.4.0) [Computer software]. GitHub. https://github.com/qubvel/segmentation_models.pytorch
4. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., & Rush, A. M. (2020). Transformers: State-of-the-Art Natural Language Processing [Conference paper]. 38–45. https://www.aclweb.org/anthology/2020.emnlp-demos.6
5. Ravi, N., Gabeur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., Mintun, E., Pan, J., Alwala, K. V., Carion, N., Wu, C.-Y., Girshick, R., Dollár, P., & Feichtenhofer, C. (2024). SAM 2: Segment Anything in Images and Videos (arXiv:2408.00714). arXiv. https://arxiv.org/abs/2408.00714
6. Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., & Zhao, H. (2024). Depth Anything V2 (arXiv:2406.09414). arXiv. https://arxiv.org/abs/2406.09414
7. Jocher, G., Qiu, J., & Chaurasia, A. (2023). Ultralytics YOLO (Version 8.0.0) [Computer software]. https://github.com/ultralytics/ultralytics
8. ONNX Runtime Contributors. (2021). ONNX Runtime (Version 1.21.0) [Computer software]. https://onnxruntime.ai/
9. ONNX Contributors. (2019). ONNX: Open Neural Network Exchange (Version 1.17.0) [Computer software]. https://onnx.ai/
10. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision (arXiv:2103.00020). arXiv. https://arxiv.org/abs/2103.00020
Owner
- Name: Matěj Frič
- Login: matejfric
- Kind: user
- Location: Ostrava
- Company: VSB-TUO
- Repositories: 1
- Profile: https://github.com/matejfric
I am a hardworking university student pursuing a degree in Computation and Applied Mathematics. I am open to various internship opportunities.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Frič
given-names: Matěj
orcid: https://orcid.org/0009-0000-9885-1521
title: "Supplementary Material: In-Vehicle Driver State Analysis Using Image Segmentation"
version: 1.0.0
license: MIT
date-released: 2025-04-30
repository-code: "https://github.com/matejfric/in-vehicle-driver-state-analysis"
doi: 10.5281/zenodo.15242002
GitHub Events
Total
- Release event: 1
- Public event: 1
- Push event: 14
- Create event: 1
Last Year
- Release event: 1
- Public event: 1
- Push event: 14
- Create event: 1
Dependencies
- python 3.12-slim build
- accelerate ==1.3.0
- albucore ==0.0.23
- albumentations ==2.0.5
- ansicolors ==1.1.8
- anyio ==4.6.2.post1
- appdirs ==1.4.4
- argon2-cffi ==23.1.0
- argon2-cffi-bindings ==21.2.0
- arrow ==1.3.0
- async-lru ==2.0.5
- babel ==2.17.0
- backoff ==2.2.1
- beautifulsoup4 ==4.13.3
- bleach ==6.2.0
- boto3 ==1.35.44
- botocore ==1.35.44
- build ==1.2.2.post1
- cachecontrol ==0.14.2
- cleo ==2.1.0
- clip ==1.0
- coloredlogs ==15.0.1
- crashtest ==0.4.1
- dacite ==1.6.0
- dagshub ==0.3.39
- dagshub-annotation-converter ==0.1.1
- dataclasses-json ==0.6.7
- decord ==0.6.0
- defusedxml ==0.7.1
- diffusers ==0.32.2
- dulwich ==0.22.8
- easydict ==1.13
- efficientnet-pytorch ==0.7.1
- einops ==0.8.0
- fastjsonschema ==2.20.0
- ffmpeg-python ==0.2.0
- findpython ==0.6.2
- flatbuffers ==25.2.10
- fqdn ==1.5.1
- ftfy ==6.3.1
- fusepy ==3.0.1
- future ==1.0.0
- gql ==3.5.0
- h11 ==0.14.0
- httpcore ==1.0.6
- httpx ==0.27.2
- huggingface-hub ==0.26.0
- humanfriendly ==10.0
- hypothesis ==6.118.6
- imageio-ffmpeg ==0.6.0
- iniconfig ==2.0.0
- installer ==0.7.0
- ipywidgets ==8.1.5
- isoduration ==20.11.0
- jaraco-classes ==3.4.0
- jaraco-context ==6.0.1
- jaraco-functools ==4.1.0
- jeepney ==0.9.0
- jmespath ==1.0.1
- json5 ==0.10.0
- jsonpointer ==3.0.0
- jsonschema ==4.23.0
- jsonschema-specifications ==2024.10.1
- jupyter ==1.1.1
- jupyter-console ==6.6.3
- jupyter-events ==0.12.0
- jupyter-lsp ==2.2.5
- jupyter-server ==2.15.0
- jupyter-server-terminals ==0.5.3
- jupyterlab ==4.3.6
- jupyterlab-pygments ==0.3.0
- jupyterlab-server ==2.27.3
- jupyterlab-widgets ==3.0.13
- jupytext ==1.17.0
- keyring ==25.6.0
- lxml ==5.3.0
- markdown-it-py ==3.0.0
- marshmallow ==3.23.0
- mdit-py-plugins ==0.4.2
- mdurl ==0.1.2
- mistune ==3.1.3
- ml-dtypes ==0.5.0
- model ==0.1.0
- more-itertools ==10.6.0
- msgpack ==1.1.0
- munch ==4.0.0
- mypy-extensions ==1.0.0
- nbclient ==0.10.0
- nbconvert ==7.16.6
- nbformat ==5.10.4
- notebook ==7.3.3
- notebook-shim ==0.2.4
- nvidia-cublas-cu12 ==12.4.5.8
- nvidia-cuda-cupti-cu12 ==12.4.127
- nvidia-cuda-nvrtc-cu12 ==12.4.127
- nvidia-cuda-runtime-cu12 ==12.4.127
- nvidia-cudnn-cu12 ==9.1.0.70
- nvidia-cufft-cu12 ==11.2.1.3
- nvidia-curand-cu12 ==10.3.5.147
- nvidia-cusolver-cu12 ==11.6.1.9
- nvidia-cusparse-cu12 ==12.3.1.170
- nvidia-nccl-cu12 ==2.21.5
- nvidia-nvjitlink-cu12 ==12.4.127
- nvidia-nvtx-cu12 ==12.4.127
- onnx ==1.17.0
- onnxruntime-gpu ==1.21.0
- onnxscript ==0.1.0.dev20241110
- overrides ==7.7.0
- pandocfilters ==1.5.1
- papermill ==2.6.0
- pathvalidate ==3.2.1
- pbs-installer ==2025.2.12
- pkginfo ==1.12.1.2
- pluggy ==1.5.0
- poetry ==2.1.1
- poetry-core ==2.1.1
- pre-commit ==4.1.0
- pretrainedmodels ==0.7.4
- pycocotools ==2.0.8
- pydantic ==2.10.6
- pydantic-core ==2.27.2
- pyproject-hooks ==1.2.0
- pytest ==8.3.3
- python-graphviz ==0.20.3
- python-json-logger ==3.3.0
- rapidfuzz ==3.12.2
- referencing ==0.35.1
- regex ==2024.9.11
- requests-toolbelt ==1.0.0
- rfc3339-validator ==0.1.4
- rfc3986-validator ==0.1.1
- rich ==13.9.2
- rpds-py ==0.20.0
- ruff ==0.9.7
- s3transfer ==0.10.3
- safetensors ==0.4.5
- sam2util ==1.1.3
- seaborn ==0.13.2
- secretstorage ==3.3.3
- segmentation-models-pytorch ==0.4.0
- send2trash ==1.8.3
- shellingham ==1.5.4
- simsimd ==6.2.1
- sniffio ==1.3.1
- sortedcontainers ==2.4.0
- soupsieve ==2.6
- stringzilla ==3.12.1
- supervision ==0.25.1
- sympy ==1.13.1
- tabulate ==0.9.0
- tenacity ==9.0.0
- terminado ==0.18.1
- timm ==0.9.7
- tinycss2 ==1.4.0
- tokenizers ==0.20.1
- tomlkit ==0.13.2
- torch ==2.5.1
- torchinfo ==1.8.0
- torchview ==0.2.6
- transformers ==4.46.0
- treelib ==1.7.0
- trove-classifiers ==2025.3.3.18
- types-python-dateutil ==2.9.0.20241206
- typing-extensions ==4.12.2
- typing-inspect ==0.9.0
- uri-template ==1.3.0
- virtualenv ==20.29.2
- webcolors ==24.11.1
- webencodings ==0.5.1
- widgetsnbextension ==4.0.13
- xformers ==0.0.29.post1
- zstandard ==0.23.0
- decord *
- easydict *
- einops *
- imageio *
- imageio-ffmpeg *
- matplotlib *
- numpy *
- opencv-python *
- pillow *
- torch *
- torchvision *
- tqdm *
- xformers *
- jupyterlab ==4.4.0
- matplotlib ==3.10.1
- mlflow ==2.21.3
- numpy ==2.2.4
- onnxruntime ==1.21.1
- opencv-python-headless ==4.11.0.86
- pandas ==2.2.3
- pillow ==11.2.1
- tqdm ==4.67.1
- transformers ==4.51.3
- ffmpeg-python *
- ipykernel *
- matplotlib *
- ninja *
- papermill *
- sam2util *
- setuptools *
- supervision *
- ultralytics *
- wheel *