Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: wtc0214
  • License: agpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 1.25 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme Contributing License Citation

README.md

Enhanced YOLO with Spectral Recalibration for Accurate and Real-Time Sign Language Detection

Abstract:
Sign language serves as a vital communication medium for individuals with hearing impairments, yet conventional convolutional architectures often suffer from significant feature degradation, particularly in high-frequency details and multi-scale feature representation. This paper introduces a novel method, YOLO-FPD, which leverages the Fast Fourier Transform (FFT) to construct a dual-domain decoupled feature representation framework. A Parallel Frequency-domain Attention Module (PFMLP) is integrated to dynamically enhance key responses in both frequency and spatial domains, while a Dynamic Heterogeneous Multi-scale Cross-stage Fusion Module (DHMCS-FM) is proposed to improve multi-scale and high-frequency gesture feature capture. Experimental results on public datasets demonstrate that YOLO-FPD achieves state-of-the-art accuracy (mAP@50 of 93.2% on the ASL dataset and 92.4% on the Expression dataset) while maintaining real-time performance, outperforming several mainstream models. Our approach not only addresses the challenges of high-frequency detail loss and multi-scale feature representation but also establishes a collaborative mechanism between frequency-domain and spatial-domain processing, paving the way for more robust and efficient sign language recognition systems.
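The core idea of the dual-domain decoupling is that an FFT lets a feature map be split into complementary low- and high-frequency parts that can be processed separately and recombined. The sketch below illustrates that decomposition with NumPy; it is an illustrative example only, not the paper's PFMLP or DHMCS-FM modules, and the `cutoff` radius is an arbitrary assumption.

```python
import numpy as np

def dual_domain_split(feat, cutoff=0.25):
    """Split a 2-D feature map into low- and high-frequency components
    via complementary FFT masks (illustrative sketch, not YOLO-FPD itself)."""
    h, w = feat.shape
    # Centered 2-D spectrum of the feature map
    F = np.fft.fftshift(np.fft.fft2(feat))
    # Normalized radial distance from the spectrum center
    yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)
    low_mask = (radius <= cutoff).astype(float)
    # Complementary masks guarantee low + high reconstructs the input
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(F * (1.0 - low_mask))).real
    return low, high

feat = np.random.rand(32, 32)
low, high = dual_domain_split(feat)
# The two branches sum back to the original map
assert np.allclose(low + high, feat)
```

In a learned module such as PFMLP, each branch would pass through its own attention or MLP weighting before fusion, rather than being recombined directly as here.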


🔧 Installation

This implementation is based on YOLOv5, a single-stage object detection network.

✅ Environment

python 3.10  
pytorch 1.13  
torchvision 0.14.1  
cuda 11.6  

Create and activate a new conda environment

```bash
conda create -n signlang python=3.10
conda activate signlang
```

Install dependencies

```bash
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
```

Train

```bash
python train.py model=name.yaml data=data.yaml epoch=300 batch=8
```

Detect

```bash
python detect.py mode=predict model=weightpath source=datasetpath
```
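The `data=data.yaml` argument refers to a YOLOv5-style dataset config. A minimal sketch is shown below; the paths, class count, and class names are placeholders for your own dataset, not values from this repository.

```yaml
# data.yaml (placeholder values - adjust to your dataset layout)
train: datasets/signs/images/train   # training images
val: datasets/signs/images/val       # validation images
nc: 3                                # number of classes
names: [hello, thanks, yes]          # class names, one per class index
```

YOLOv5 expects label files to sit in a parallel `labels/` directory mirroring the `images/` structure.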

Owner

  • Login: wtc0214
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
preferred-citation:
  type: software
  message: If you use YOLOv5, please cite it as below.
  authors:
  - family-names: Jocher
    given-names: Glenn
    orcid: "https://orcid.org/0000-0001-5950-6979"
  title: "YOLOv5 by Ultralytics"
  version: 7.0
  doi: 10.5281/zenodo.3908559
  date-released: 2020-5-29
  license: AGPL-3.0
  url: "https://github.com/ultralytics/yolov5"

GitHub Events

Total
  • Push event: 5
Last Year
  • Push event: 5

Dependencies

utils/docker/Dockerfile docker
  • pytorch/pytorch 2.0.0-cuda11.7-cudnn8-runtime build
utils/google_app_engine/Dockerfile docker
  • gcr.io/google-appengine/python latest build
utils/google_app_engine/additional_requirements.txt pypi
  • Flask ==2.3.2
  • gunicorn ==22.0.0
  • pip ==23.3
  • werkzeug >=3.0.1
  • zipp >=3.19.1