cv-islr

WWW25@CV-ISLR

https://github.com/jiafei127/cv-islr

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

WWW25@CV-ISLR

Basic Info
  • Host: GitHub
  • Owner: Jiafei127
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.86 MB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

WWW25@CV-ISLR

This repository contains our implementation for the Cross-View Isolated Sign Language Recognition (CV-ISLR) task submitted to the WWW 2025 competition. Our approach combines Ensemble Learning and Video Swin Transformer (VST) modules to address the challenges of cross-view sign language recognition. The framework is built on top of the MMAction2 v1.2.0 library.


Main Contributions

  1. Ensemble Learning Integration:
    We integrate Ensemble Learning into the CV-ISLR framework, enhancing robustness and generalization to effectively handle viewpoint variability.

  2. Multi-Dimensional VST Blocks:
    We utilize VST blocks of varying sizes (Small, Base, Large) for both RGB and Depth videos, capturing features at multiple levels of granularity to improve recognition accuracy.


Installation

To set up the environment, follow these steps:

  1. Clone the repository:

```bash
git clone https://github.com/Jiafei127/CV-ISLR.git
cd CV-ISLR
```

  2. Install dependencies:

```bash
conda create -n cvislr python=3.8 -y
conda activate cvislr
# This installs the latest PyTorch and cudatoolkit; check that they match your environment.
conda install pytorch torchvision -c pytorch
pip install -U openmim
mim install mmengine
mim install mmcv
mim install mmdet
mim install mmpose
```

  3. Install MMAction2 v1.2.0:

```bash
pip install -v -e .
```



Training

To train the models for RGB and Depth inputs:

  1. Prepare the dataset: Download and preprocess the MM-WLAuslan dataset following the instructions provided in the dataset/README.md.

  2. Train the backbone models:

```bash
python tools/train.py configs/recognition/swin/swin-<file_name>_rgb.py
python tools/train.py configs/recognition/swin/swin-<file_name>_depth.py
```

  3. Save model checkpoints: After training, checkpoints will be saved in the work_dirs/ folder.

  4. Inference:

```bash
PORT=29500 bash tools/dist_test.sh configs/recognition/swin/swin-<file_name>_rgb.py ./work_dirs/swin-<checkpoint_name>.pth --dump result.pkl
```
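Once inference dumps `result.pkl`, you can score it offline. Below is a minimal, hypothetical sketch of computing top-1 accuracy from dumped per-clip results; it assumes each entry is a dict with `pred_score` (per-class scores) and `gt_label` keys, which you should verify against the dump format of your MMAction2 version:

```python
import numpy as np

def top1_accuracy(results):
    """Fraction of clips whose highest-scoring class matches the ground truth.

    `results` is assumed to be a list of dicts with 'pred_score' (per-class
    scores) and 'gt_label' keys, as loaded from a dumped result.pkl.
    """
    correct = sum(int(np.argmax(r['pred_score']) == r['gt_label'])
                  for r in results)
    return correct / len(results)

# Toy stand-in for pickle.load(open('result.pkl', 'rb')):
toy = [{'pred_score': [0.1, 0.9], 'gt_label': 1},
       {'pred_score': [0.8, 0.2], 'gt_label': 1}]
print(top1_accuracy(toy))  # 0.5
```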


Ensemble Learning

After training the individual models, apply the ensemble strategy (our trained model checkpoints can be downloaded from [huggingface]):

  1. Merge predictions from multiple backbones:

```bash
cd ./ENSEMBLE
python ensemble.py
```

  2. Submit the resulting answer.zip to CodaLab.
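The merging step is typically a late fusion of per-class scores across backbones. The sketch below is not the repository's `ensemble.py`; it is a hypothetical NumPy illustration of weighted score averaging followed by a top-1 decision:

```python
import numpy as np

def ensemble_scores(score_lists, weights=None):
    """Late-fuse per-class scores from several backbones.

    score_lists: list of arrays, each (num_clips, num_classes), one per model.
    weights: optional per-model weights; uniform averaging if omitted.
    Returns the fused top-1 class index for every clip.
    """
    stacked = np.stack(score_lists)                   # (num_models, N, C)
    w = np.ones(len(score_lists)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()                                   # normalize model weights
    fused = np.tensordot(w, stacked, axes=1)          # weighted mean -> (N, C)
    return fused.argmax(axis=1)

# Toy example: two "models", three clips, four classes.
a = np.array([[0.1, 0.7, 0.1, 0.1],
              [0.6, 0.2, 0.1, 0.1],
              [0.2, 0.2, 0.5, 0.1]])
b = np.array([[0.2, 0.5, 0.2, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.6, 0.2]])
print(ensemble_scores([a, b]))  # fused top-1 class per clip -> [1 2 2]
```

Score averaging like this needs no retraining, which is why it is a common way to combine VST backbones of different sizes.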

Performance

Top-1 Accuracy Results

| Team          | RGB Acc@1 | RGB-D Acc@1 |
|---------------|-----------|-------------|
| VIPL-SLR      | 56.87%    | 57.97%      |
| tonicemerald  | 40.30%    | 33.97%      |
| gkdx2 (Ours)  | 20.29%    | 24.53%      |

Table 1: The top-3 results for CV-ISLR on RGB and RGB-D tracks.

| Backbone    | RGB-based | Depth-based | RGB-D-based |
|-------------|-----------|-------------|-------------|
| VST-Small   | 14.84%    | 14.01%      | -           |
| VST-Base    | 17.51%    | 16.46%      | -           |
| VST-Large   | 17.04%    | 17.58%      | -           |
| Ensemble    | 20.29%    | -           | 24.53%      |

Table 2: Experimental results for RGB and RGB-D tracks on different backbones.


Acknowledgements

This project is built on the MMAction2 framework and utilizes the MM-WLAuslan dataset. We thank the developers for their contributions to open-source tools and datasets.


Owner

  • Name: Fei Wang
  • Login: Jiafei127
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMAction2 Contributors"
title: "OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark"
date-released: 2020-07-21
url: "https://github.com/open-mmlab/mmaction2"
license: Apache-2.0

GitHub Events

Total
  • Watch event: 5
  • Push event: 11
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Watch event: 5
  • Push event: 11
  • Pull request event: 2
  • Fork event: 1

Dependencies

requirements/build.txt pypi
  • Pillow *
  • decord >=0.4.1
  • einops *
  • matplotlib *
  • numpy *
  • opencv-contrib-python *
  • scipy *
  • torch >=1.3
requirements/docs.txt pypi
  • docutils ==0.18.1
  • einops *
  • modelindex *
  • myst-parser *
  • opencv-python *
  • scipy *
  • sphinx ==6.1.3
  • sphinx-notfound-page *
  • sphinx-tabs *
  • sphinx_copybutton *
  • sphinx_markdown_tables *
  • sphinxcontrib-jquery *
  • tabulate *
requirements/mminstall.txt pypi
  • mmcv >=2.0.0rc4,<2.2.0
  • mmengine >=0.7.1,<1.0.0
requirements/multimodal.txt pypi
  • transformers >=4.28.0
requirements/optional.txt pypi
  • PyTurboJPEG *
  • av >=9.0
  • future *
  • imgaug *
  • librosa *
  • lmdb *
  • moviepy *
  • openai-clip *
  • packaging *
  • pims *
  • soundfile *
  • tensorboard *
  • wandb *
requirements/readthedocs.txt pypi
  • mmcv *
  • titlecase *
  • torch *
  • torchvision *
requirements/tests.txt pypi
  • coverage * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • parameterized * test
  • pytest * test
  • pytest-runner * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
setup.py pypi
tools/data/activitynet/environment.yml conda
  • ca-certificates 2020.1.1.*
  • certifi 2020.4.5.1.*
  • ffmpeg 2.8.6.*
  • libcxx 10.0.0.*
  • libedit 3.1.20181209.*
  • libffi 3.3.*
  • ncurses 6.2.*
  • openssl 1.1.1g.*
  • pip 20.0.2.*
  • python 3.7.7.*
  • readline 8.0.*
  • setuptools 46.4.0.*
  • sqlite 3.31.1.*
  • tk 8.6.8.*
  • wheel 0.34.2.*
  • xz 5.2.5.*
  • zlib 1.2.11.*
tools/data/hvu/environment.yml conda
  • ca-certificates 2020.1.1.*
  • certifi 2020.4.5.1.*
  • ffmpeg 2.8.6.*
  • libcxx 10.0.0.*
  • libedit 3.1.20181209.*
  • libffi 3.3.*
  • ncurses 6.2.*
  • openssl 1.1.1g.*
  • pip 20.0.2.*
  • python 3.7.7.*
  • readline 8.0.*
  • setuptools 46.4.0.*
  • sqlite 3.31.1.*
  • tk 8.6.8.*
  • wheel 0.34.2.*
  • xz 5.2.5.*
  • zlib 1.2.11.*
tools/data/kinetics/environment.yml conda
  • ca-certificates 2020.1.1.*
  • certifi 2020.4.5.1.*
  • ffmpeg 2.8.6.*
  • libcxx 10.0.0.*
  • libedit 3.1.20181209.*
  • libffi 3.3.*
  • ncurses 6.2.*
  • openssl 1.1.1g.*
  • pip 20.0.2.*
  • python 3.7.7.*
  • readline 8.0.*
  • setuptools 46.4.0.*
  • sqlite 3.31.1.*
  • tk 8.6.8.*
  • wheel 0.34.2.*
  • xz 5.2.5.*
  • zlib 1.2.11.*