yolo-fpd
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: wtc0214
- License: agpl-3.0
- Language: Python
- Default Branch: main
- Size: 1.25 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Enhanced YOLO with Spectral Recalibration for Accurate and Real-Time Sign Language Detection
Abstract:
Sign language serves as a vital communication medium for individuals with hearing impairments, yet conventional convolutional architectures often suffer from significant feature degradation, particularly in high-frequency details and multi-scale feature representation. This paper introduces a novel method, YOLO-FPD, which leverages the Fast Fourier Transform (FFT) to construct a dual-domain decoupled feature representation framework. A Parallel Frequency-domain Attention Module (PFMLP) is integrated to dynamically enhance key responses in both the frequency and spatial domains, while a Dynamic Heterogeneous Multi-scale Cross-stage Fusion Module (DHMCS-FM) is proposed to improve the capture of multi-scale and high-frequency gesture features. Experimental results on public datasets demonstrate that YOLO-FPD achieves state-of-the-art accuracy (mAP@50 of 93.2% on the ASL dataset and 92.4% on the Expression dataset) while maintaining real-time performance, outperforming several mainstream models. Our approach not only addresses the challenges of high-frequency detail loss and multi-scale feature representation but also establishes a collaborative mechanism between frequency-domain and spatial-domain processing, paving the way for more robust and efficient sign language recognition systems.
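The dual-domain idea in the abstract can be illustrated with a minimal sketch: an FFT splits a feature map into complementary low- and high-frequency components that can be processed in separate branches and recombined. This uses NumPy rather than the repository's PyTorch code, and the hard circular cutoff mask is an assumption for illustration, not the paper's PFMLP design.

```python
import numpy as np

def frequency_split(feat, radius=4):
    """Split a 2-D feature map into low/high-frequency parts via FFT.

    Illustrative only: the cutoff `radius` and the hard circular mask
    are assumptions, not the PFMLP module described in the paper.
    """
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))   # centre the DC component
    yy, xx = np.mgrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)       # distance from spectrum centre
    low_mask = dist <= radius                       # keep coarse structure
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

feat = np.random.default_rng(0).standard_normal((16, 16))
low, high = frequency_split(feat)
# The two branches are complementary: low + high reconstructs the input.
assert np.allclose(low + high, feat)
```

Because the masks partition the spectrum, the two branches sum back to the original map, so each branch can be enhanced independently without losing information.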
🔧 Installation
This implementation is based on YOLOv5, a single-stage object detection network.
✅ Environment

- python 3.10
- pytorch 1.13.1
- torchvision 0.14.1
- cuda 11.6

Create and activate a new conda environment:

```bash
conda create -n signlang python=3.10
conda activate signlang
```

Install dependencies:

```bash
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
```

Train:

```bash
python train.py model=name.yaml data=data.yaml epoch=300 batch=8
```

Detect:

```bash
python detect.py mode=predict model=weightpath source=datasetpath
```
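The train command above references a `data.yaml` dataset description. Assuming this fork keeps the standard YOLOv5 data-file layout, a minimal file might look like the following; the paths and class names are placeholders, not taken from the repository:

```yaml
# Hypothetical dataset config in the usual YOLOv5 layout; adjust to your data.
train: datasets/asl/images/train   # directory of training images
val: datasets/asl/images/val       # directory of validation images

nc: 3                              # number of classes
names: ['A', 'B', 'C']             # one label per class, in index order
```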
Owner
- Login: wtc0214
- Kind: user
- Repositories: 1
- Profile: https://github.com/wtc0214
Citation (CITATION.cff)
cff-version: 1.2.0
preferred-citation:
type: software
message: If you use YOLOv5, please cite it as below.
authors:
- family-names: Jocher
given-names: Glenn
orcid: "https://orcid.org/0000-0001-5950-6979"
title: "YOLOv5 by Ultralytics"
version: 7.0
doi: 10.5281/zenodo.3908559
date-released: 2020-5-29
license: AGPL-3.0
url: "https://github.com/ultralytics/yolov5"
GitHub Events
Total
- Push event: 5
Last Year
- Push event: 5
Dependencies
- pytorch/pytorch 2.0.0-cuda11.7-cudnn8-runtime build
- gcr.io/google-appengine/python latest build
- Flask ==2.3.2
- gunicorn ==22.0.0
- pip ==23.3
- werkzeug >=3.0.1
- zipp >=3.19.1