cv-islr

WWW25@CV-ISLR

https://github.com/jiafei127/cv-islr

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

WWW25@CV-ISLR

Basic Info
  • Host: GitHub
  • Owner: Jiafei127
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.86 MB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

WWW25@CV-ISLR

This repository contains our implementation for the Cross-View Isolated Sign Language Recognition (CV-ISLR) task submitted to the WWW 2025 competition. Our approach combines Ensemble Learning and Video Swin Transformer (VST) modules to address the challenges of cross-view sign language recognition. The framework is built on top of the MMAction2 v1.2.0 library.


Main Contributions

  1. Ensemble Learning Integration:
    We integrate Ensemble Learning into the CV-ISLR framework, enhancing robustness and generalization to effectively handle viewpoint variability.

  2. Multi-Dimensional VST Blocks:
    We utilize VST blocks of varying sizes (Small, Base, Large) for both RGB and Depth videos, capturing features at multiple levels of granularity to improve recognition accuracy.


Installation

To set up the environment, follow these steps:

  1. Clone the repository:

```bash
git clone https://github.com/Jiafei127/CV-ISLR.git
cd CV-ISLR
```

  2. Install dependencies:

```bash
conda create -n cvislr python=3.8 -y
conda activate cvislr
# This installs the latest PyTorch and cudatoolkit; check that they match your environment.
conda install pytorch torchvision -c pytorch
pip install -U openmim
mim install mmengine
mim install mmcv
mim install mmdet
mim install mmpose
```

  3. Install MMAction2 v1.2.0:

```bash
pip install -v -e .
```



Training

To train the models for RGB and Depth inputs:

  1. Prepare the dataset: Download and preprocess the MM-WLAuslan dataset following the instructions provided in the dataset/README.md.

  2. Train the backbone models:

```bash
python tools/train.py configs/recognition/swin/swin-<file_name>_rgb.py
python tools/train.py configs/recognition/swin/swin-<file_name>_depth.py
```

  3. Save model checkpoints: After training, checkpoints will be saved in the work_dirs/ folder.

  4. Inference:

```bash
PORT=29500 bash tools/dist_test.sh configs/recognition/swin/swin-<file_name>_rgb.py ./work_dirs/swin-<checkpoint_name>.pth --dump result.pkl
```
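Once inference dumps `result.pkl`, you can score it offline. Below is a minimal, hypothetical sketch of computing top-1 accuracy from dumped per-clip results; it assumes each entry is a dict with `pred_score` (per-class scores) and `gt_label` keys, which you should verify against the dump format of your MMAction2 version:

```python
import numpy as np

def top1_accuracy(results):
    """Fraction of clips whose highest-scoring class matches the ground truth.

    `results` is assumed to be a list of dicts with 'pred_score' (per-class
    scores) and 'gt_label' keys, as loaded from a dumped result.pkl.
    """
    correct = sum(int(np.argmax(r['pred_score']) == r['gt_label'])
                  for r in results)
    return correct / len(results)

# Toy stand-in for pickle.load(open('result.pkl', 'rb')):
toy = [{'pred_score': [0.1, 0.9], 'gt_label': 1},
       {'pred_score': [0.8, 0.2], 'gt_label': 1}]
print(top1_accuracy(toy))  # 0.5
```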


Ensemble Learning

After training the individual models, apply the ensemble strategy (our trained model checkpoints can be downloaded from [huggingface]):

  1. Merge predictions from multiple backbones:

```bash
cd ./ENSEMBLE
python ensemble.py
```

  2. Submit the resulting answer.zip to CodaLab.
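The merging step is typically a late fusion of per-class scores across backbones. The sketch below is not the repository's `ensemble.py`; it is a hypothetical NumPy illustration of weighted score averaging followed by a top-1 decision:

```python
import numpy as np

def ensemble_scores(score_lists, weights=None):
    """Late-fuse per-class scores from several backbones.

    score_lists: list of arrays, each (num_clips, num_classes), one per model.
    weights: optional per-model weights; uniform averaging if omitted.
    Returns the fused top-1 class index for every clip.
    """
    stacked = np.stack(score_lists)                   # (num_models, N, C)
    w = np.ones(len(score_lists)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()                                   # normalize model weights
    fused = np.tensordot(w, stacked, axes=1)          # weighted mean -> (N, C)
    return fused.argmax(axis=1)

# Toy example: two "models", three clips, four classes.
a = np.array([[0.1, 0.7, 0.1, 0.1],
              [0.6, 0.2, 0.1, 0.1],
              [0.2, 0.2, 0.5, 0.1]])
b = np.array([[0.2, 0.5, 0.2, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.6, 0.2]])
print(ensemble_scores([a, b]))  # fused top-1 class per clip -> [1 2 2]
```

Score averaging like this needs no retraining, which is why it is a common way to combine VST backbones of different sizes.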

Performance

Top-1 Accuracy Results

| Team          | RGB Acc@1 | RGB-D Acc@1 |
|---------------|-----------|-------------|
| VIPL-SLR      | 56.87%    | 57.97%      |
| tonicemerald  | 40.30%    | 33.97%      |
| gkdx2 (Ours)  | 20.29%    | 24.53%      |

Table 1: The top-3 results for CV-ISLR on RGB and RGB-D tracks.

| Backbone    | RGB-based | Depth-based | RGB-D-based |
|-------------|-----------|-------------|-------------|
| VST-Small   | 14.84%    | 14.01%      | -           |
| VST-Base    | 17.51%    | 16.46%      | -           |
| VST-Large   | 17.04%    | 17.58%      | -           |
| Ensemble    | 20.29%    | -           | 24.53%      |

Table 2: Experimental results for RGB and RGB-D tracks on different backbones.


Acknowledgements

This project is built on the MMAction2 framework and utilizes the MM-WLAuslan dataset. We thank the developers for their contributions to open-source tools and datasets.


Owner

  • Name: Fei Wang
  • Login: Jiafei127
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMAction2 Contributors"
title: "OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark"
date-released: 2020-07-21
url: "https://github.com/open-mmlab/mmaction2"
license: Apache-2.0

GitHub Events

Total
  • Watch event: 5
  • Push event: 11
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Watch event: 5
  • Push event: 11
  • Pull request event: 2
  • Fork event: 1

Dependencies

requirements/build.txt pypi
  • Pillow *
  • decord >=0.4.1
  • einops *
  • matplotlib *
  • numpy *
  • opencv-contrib-python *
  • scipy *
  • torch >=1.3
requirements/docs.txt pypi
  • docutils ==0.18.1
  • einops *
  • modelindex *
  • myst-parser *
  • opencv-python *
  • scipy *
  • sphinx ==6.1.3
  • sphinx-notfound-page *
  • sphinx-tabs *
  • sphinx_copybutton *
  • sphinx_markdown_tables *
  • sphinxcontrib-jquery *
  • tabulate *
requirements/mminstall.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
requirements/multimodal.txt pypi
  • transformers >=4.28.0
requirements/optional.txt pypi
  • PyTurboJPEG *
  • av >=9.0
  • future *
  • imgaug *
  • librosa *
  • lmdb *
  • moviepy *
  • openai-clip *
  • packaging *
  • pims *
  • soundfile *
  • tensorboard *
  • wandb *
requirements/readthedocs.txt pypi
  • mmcv *
  • titlecase *
  • torch *
  • torchvision *
requirements/tests.txt pypi
  • coverage * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • parameterized * test
  • pytest * test
  • pytest-runner * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
setup.py pypi
tools/data/activitynet/environment.yml conda
  • ca-certificates 2020.1.1.*
  • certifi 2020.4.5.1.*
  • ffmpeg 2.8.6.*
  • libcxx 10.0.0.*
  • libedit 3.1.20181209.*
  • libffi 3.3.*
  • ncurses 6.2.*
  • openssl 1.1.1g.*
  • pip 20.0.2.*
  • python 3.7.7.*
  • readline 8.0.*
  • setuptools 46.4.0.*
  • sqlite 3.31.1.*
  • tk 8.6.8.*
  • wheel 0.34.2.*
  • xz 5.2.5.*
  • zlib 1.2.11.*
tools/data/hvu/environment.yml conda
  • ca-certificates 2020.1.1.*
  • certifi 2020.4.5.1.*
  • ffmpeg 2.8.6.*
  • libcxx 10.0.0.*
  • libedit 3.1.20181209.*
  • libffi 3.3.*
  • ncurses 6.2.*
  • openssl 1.1.1g.*
  • pip 20.0.2.*
  • python 3.7.7.*
  • readline 8.0.*
  • setuptools 46.4.0.*
  • sqlite 3.31.1.*
  • tk 8.6.8.*
  • wheel 0.34.2.*
  • xz 5.2.5.*
  • zlib 1.2.11.*
tools/data/kinetics/environment.yml conda
  • ca-certificates 2020.1.1.*
  • certifi 2020.4.5.1.*
  • ffmpeg 2.8.6.*
  • libcxx 10.0.0.*
  • libedit 3.1.20181209.*
  • libffi 3.3.*
  • ncurses 6.2.*
  • openssl 1.1.1g.*
  • pip 20.0.2.*
  • python 3.7.7.*
  • readline 8.0.*
  • setuptools 46.4.0.*
  • sqlite 3.31.1.*
  • tk 8.6.8.*
  • wheel 0.34.2.*
  • xz 5.2.5.*
  • zlib 1.2.11.*