yolo-fpd
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: wtc0214
- License: agpl-3.0
- Language: Python
- Default Branch: main
- Size: 1.25 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Enhanced YOLO with Spectral Recalibration for Accurate and Real-Time Sign Language Detection
Abstract:
Sign language serves as a vital communication medium for individuals with hearing impairments, yet conventional convolutional architectures often suffer from significant feature degradation, particularly in high-frequency details and multi-scale feature representation. This paper introduces a novel method, YOLO-FPD, which leverages the Fast Fourier Transform (FFT) to construct a dual-domain decoupled feature representation framework. A Parallel Frequency-domain Attention Module (PFMLP) is integrated to dynamically enhance key responses in both the frequency and spatial domains, while a Dynamic Heterogeneous Multi-scale Cross-stage Fusion Module (DHMCS-FM) is proposed to improve the capture of multi-scale and high-frequency gesture features. Experimental results on public datasets demonstrate that YOLO-FPD achieves state-of-the-art accuracy (mAP@50 of 93.2% on the ASL dataset and 92.4% on the Expression dataset) while maintaining real-time performance, outperforming several mainstream models. Our approach not only addresses the challenges of high-frequency detail loss and multi-scale feature representation but also establishes a collaborative mechanism between frequency-domain and spatial-domain processing, paving the way for more robust and efficient sign language recognition systems.
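The dual-domain idea in the abstract can be illustrated with a minimal sketch: an FFT splits a feature map into complementary low- and high-frequency components that can be processed in separate branches and recombined. This uses NumPy rather than the repository's PyTorch code, and the hard circular cutoff mask is an assumption for illustration, not the paper's PFMLP design.

```python
import numpy as np

def frequency_split(feat, radius=4):
    """Split a 2-D feature map into low/high-frequency parts via FFT.

    Illustrative only: the cutoff `radius` and the hard circular mask
    are assumptions, not the PFMLP module described in the paper.
    """
    h, w = feat.shape
    spectrum = np.fft.fftshift(np.fft.fft2(feat))   # centre the DC component
    yy, xx = np.mgrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)       # distance from spectrum centre
    low_mask = dist <= radius                       # keep coarse structure
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

feat = np.random.default_rng(0).standard_normal((16, 16))
low, high = frequency_split(feat)
# The two branches are complementary: low + high reconstructs the input.
assert np.allclose(low + high, feat)
```

Because the masks partition the spectrum, the two branches sum back to the original map, so each branch can be enhanced independently without losing information.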
🔧 Installation
This implementation is based on YOLOv5, a single-stage object detection network.
✅ Environment

- python 3.10
- pytorch 1.13.1
- torchvision 0.14.1
- cuda 11.6

Create and activate a new conda environment:

```bash
conda create -n signlang python=3.10
conda activate signlang
```

Install dependencies:

```bash
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
```

Train:

```bash
python train.py model=name.yaml data=data.yaml epoch=300 batch=8
```

Detect:

```bash
python detect.py mode=predict model=weightpath source=datasetpath
```
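The train command above references a `data.yaml` dataset description. Assuming this fork keeps the standard YOLOv5 data-file layout, a minimal file might look like the following; the paths and class names are placeholders, not taken from the repository:

```yaml
# Hypothetical dataset config in the usual YOLOv5 layout; adjust to your data.
train: datasets/asl/images/train   # directory of training images
val: datasets/asl/images/val       # directory of validation images

nc: 3                              # number of classes
names: ['A', 'B', 'C']             # one label per class, in index order
```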
Owner
- Login: wtc0214
- Kind: user
- Repositories: 1
- Profile: https://github.com/wtc0214
Citation (CITATION.cff)
cff-version: 1.2.0
preferred-citation:
type: software
message: If you use YOLOv5, please cite it as below.
authors:
- family-names: Jocher
given-names: Glenn
orcid: "https://orcid.org/0000-0001-5950-6979"
title: "YOLOv5 by Ultralytics"
version: 7.0
doi: 10.5281/zenodo.3908559
date-released: 2020-5-29
license: AGPL-3.0
url: "https://github.com/ultralytics/yolov5"
GitHub Events
Total
- Push event: 5
Last Year
- Push event: 5
Dependencies
- pytorch/pytorch 2.0.0-cuda11.7-cudnn8-runtime build
- gcr.io/google-appengine/python latest build
- Flask ==2.3.2
- gunicorn ==22.0.0
- pip ==23.3
- werkzeug >=3.0.1
- zipp >=3.19.1