wtpose
WTPose: Waterfall Transformer for Multi-person Pose Estimation
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary
Repository
WTPose: Waterfall Transformer for Multi-person Pose Estimation
Basic Info
- Host: GitHub
- Owner: navinranjan7
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 43.9 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
WTPose: Waterfall Transformer for Multi-Person Pose Estimation
T4V@CVPR 2023 (Poster) | WACV Workshop 2025 (Oral)
Description
This repository contains the implementation and experiments related to our paper: "WTPose: Waterfall Transformer for Pose Estimation". WTPose introduces a novel Waterfall Transformer Module (WTM) that enhances pose estimation by improving the performance of vision transformers like the Shifted Window (Swin) transformer.
Figure 1: Waterfall transformer framework for multi-person pose estimation. The input color image is fed through the modified Swin
Transformer backbone and WTM module to obtain 128 feature channels at reduced resolution by a factor of 4. The decoder module
generates K heatmaps, one per joint.
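The shape bookkeeping described in the Figure 1 caption can be sketched as follows. This is only an illustration, not the repository's code: the 128 channels and the factor-of-4 reduction come from the caption, while `NUM_JOINTS = 17` (the COCO keypoint count) and the assumption that the decoder keeps the feature resolution are ours. The helper name `pipeline_shapes` is hypothetical.

```python
FEATURE_CHANNELS = 128  # from the Figure 1 caption
DOWNSAMPLE = 4          # from the Figure 1 caption
NUM_JOINTS = 17         # assumption: K equals the COCO keypoint count


def pipeline_shapes(height, width, num_joints=NUM_JOINTS):
    """Return (channels, h, w) for the WTM features and for the heatmaps."""
    if height % DOWNSAMPLE or width % DOWNSAMPLE:
        raise ValueError("input size must be divisible by the downsample factor")
    feat_h, feat_w = height // DOWNSAMPLE, width // DOWNSAMPLE
    features = (FEATURE_CHANNELS, feat_h, feat_w)
    # Assumption: the decoder outputs heatmaps at the feature resolution.
    heatmaps = (num_joints, feat_h, feat_w)
    return features, heatmaps


# The two input resolutions used in the results tables (height x width).
for h, w in [(256, 192), (384, 288)]:
    features, heatmaps = pipeline_shapes(h, w)
    print(f"{h}x{w}: features={features}, heatmaps={heatmaps}")
```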
The WTM processes feature maps from multiple levels of the backbone through its waterfall branches. It applies filtering operations based on a **dilated attention mechanism** to expand the Field-of-View (FOV) and capture both local and global context. In the results below, WTPose improves AP over the corresponding Swin baselines by roughly 1.1 to 1.8 points.
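To make the effect of dilation on the field-of-view concrete, here is a minimal 1-D sketch. It assumes the dilated attention samples key positions the way a dilated convolution does (every `dilation`-th token around the query). The text above does not spell out the exact scheme used in the WTM, so the function names, the window size of 7, and the dilation rates are illustrative assumptions.

```python
def dilated_window_indices(center, window_size, dilation, length):
    """Positions a query at `center` would attend to in a 1-D sequence.

    Assumption: keys are sampled every `dilation` tokens around the query,
    and positions that fall outside the sequence are dropped.
    """
    half = window_size // 2
    positions = [center + (k - half) * dilation for k in range(window_size)]
    return [p for p in positions if 0 <= p < length]


def field_of_view(window_size, dilation):
    """Extent covered by a dilated window, as for a dilated convolution."""
    return (window_size - 1) * dilation + 1


# Same number of attended tokens, progressively wider coverage.
for rate in (1, 2, 3):
    print(rate, field_of_view(7, rate), dilated_window_indices(24, 7, rate, 48))
```

The number of attended tokens stays fixed at the window size, so the cost per query does not grow with the dilation rate, while the covered extent grows linearly with it.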
Figure 2: The proposed waterfall transformer module. The inputs are multi-scale feature maps from all four stages of the Swin backbone
and low-level features from the ResNet bottleneck. The waterfall module creates a waterfall flow, initially processing the input and then
creating a new branch. The feature dimensions (spatial and channel dimensions) output by various blocks are shown in parentheses.
Pose estimation examples using WTPose, showcasing the effectiveness of our approach.
Results on MS COCO Val set
Results use person detections from a detector that obtains 56 mAP on the person class. The configs listed here are used for both training and testing.
256x192 resolution
| Model | Resolution | Params (M) | GFLOPs | AP | AR | config | log | weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| Swin-T | 256x192 | 32.8 | 6.3 | 72.44 | 78.20 | config | log | - |
| WTPose-T | 256x192 | 30.0 | 12.8 | 74.23 | 79.43 | config | log | - |
| Swin-B | 256x192 | 93.0 | 19.0 | 73.72 | 79.32 | config | log | - |
| WTPose-B | 256x192 | 89.3 | 25.6 | 74.96 | 80.51 | config | log | - |
| Swin-L | 256x192 | 32.8 | 41.0 | 74.30 | 79.82 | config | log | - |
| WTPose-L | 256x192 | 32.8 | 47.9 | 75.40 | 80.81 | config | log | - |
384x288 resolution
| Model | Resolution | Params (M) | GFLOPs | AP | AR | config | log | weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| Swin-T | 384x288 | 32.8 | 13.8 | 74.89 | 80.93 | config | log | - |
| WTPose-T | 384x288 | 30.0 | 28.3 | 76.36 | 81.43 | config | log | - |
| Swin-B | 384x288 | 93.0 | 41.6 | 75.81 | 80.99 | config | log | - |
| WTPose-B | 384x288 | 89.3 | 55.8 | 77.18 | 82.07 | config | log | - |
| Swin-L | 384x288 | 32.8 | 88.2 | 76.30 | 81.44 | config | log | - |
| WTPose-L | 384x288 | 32.8 | 104.2 | 77.56 | 82.61 | config | log | - |
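The AP gain of each WTPose variant over its Swin baseline can be read off the two tables with a few lines of arithmetic. The snippet below copies the AP values from the tables above; the dictionary layout and the function name `ap_gains` are just one convenient, hypothetical way to organize them.

```python
# AP values copied from the tables above: (Swin baseline, WTPose).
RESULTS = {
    "256x192": {"T": (72.44, 74.23), "B": (73.72, 74.96), "L": (74.30, 75.40)},
    "384x288": {"T": (74.89, 76.36), "B": (75.81, 77.18), "L": (76.30, 77.56)},
}


def ap_gains(results):
    """AP improvement of WTPose over Swin, per resolution and model size."""
    return {
        resolution: {
            size: round(wtpose - swin, 2) for size, (swin, wtpose) in rows.items()
        }
        for resolution, rows in results.items()
    }


for resolution, gains in ap_gains(RESULTS).items():
    for size, gain in gains.items():
        print(f"{resolution} {size}: +{gain:.2f} AP")
```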
Installation and Setup
To use this code, clone the repository and see installation.md for detailed installation and dataset preparation instructions. Alternatively, create an environment from environment.yaml.
Usage
Run the following command to start training or evaluation:
bash
python tools/train.py --config
Citation
If you use our work, please consider citing:
bibtex
@article{ranjan2024waterfall,
author = {Ranjan, Navin and Artacho, Bruno and Savakis, Andreas},
title = {Waterfall Transformer for Multi-person Pose Estimation},
journal = {arXiv preprint arXiv:2411.18944},
year = {2024}
}
Acknowledgement
We acknowledge the excellent implementation from mmpose.
Contact
For any questions or discussions, feel free to open an issue or contact us at: 📧 nr4325@g.rit.edu
⭐ If you find this repository useful, please consider giving it a star!
Owner
- Name: Navin Ranjan
- Login: navinranjan7
- Kind: user
- Location: New York, USA
- Company: Rochester Institute of Technology
- Website: https://navinranjan7.github.io/
- Repositories: 1
- Profile: https://github.com/navinranjan7
Researcher at Vision and Image Processing Lab, Rochester Institute of Technology (RIT), New York, USA.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMPose Contributors"
title: "OpenMMLab Pose Estimation Toolbox and Benchmark"
date-released: 2020-08-31
url: "https://github.com/open-mmlab/mmpose"
license: Apache-2.0
GitHub Events
Total
- Watch event: 2
- Push event: 57
- Create event: 2
Last Year
- Watch event: 2
- Push event: 57
- Create event: 2
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v1.0.14 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v1.0.14 composite
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- loguru ==0.6.0
- numpy ==1.21.6
- onnxruntime ==1.14.1
- onnxruntime-gpu ==1.8.1
- albumentations >=0.3.2
- numpy *
- torch >=1.8
- docutils ==0.16.0
- markdown *
- myst-parser *
- sphinx ==4.5.0
- sphinx_copybutton *
- sphinx_markdown_tables *
- urllib3 <2.0.0
- mmcv >=2.0.0,<2.1.0
- mmdet >=3.0.0,<3.2.0
- mmengine >=0.4.0,<1.0.0
- requests *
- shapely ==1.8.4
- mmcv >=2.0.0rc4
- mmengine >=0.6.0,<1.0.0
- munkres *
- regex *
- scipy *
- titlecase *
- torch >1.6
- torchvision *
- xtcocotools >=1.13
- chumpy *
- json_tricks *
- matplotlib *
- munkres *
- numpy *
- opencv-python *
- pillow *
- scipy *
- torchvision *
- xtcocotools >=1.12
- coverage * test
- flake8 * test
- interrogate * test
- isort ==4.3.21 test
- parameterized * test
- pytest * test
- pytest-runner * test
- xdoctest >=0.10.0 test
- yapf * test
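The OpenMMLab pins in the list above (`mmcv >=2.0.0,<2.1.0`, `mmdet >=3.0.0,<3.2.0`, `mmengine >=0.4.0,<1.0.0`) are easy to violate when installing into an existing environment. The following is a rough, standard-library-only sketch of a range check. The helper names and the example versions are hypothetical, and the parser deliberately ignores pre-release suffixes such as `rc4`, so it is less strict than a full PEP 440 comparison.

```python
import re

# Lower (inclusive) and upper (exclusive) bounds taken from the list above.
REQUIRED = {
    "mmcv": ("2.0.0", "2.1.0"),
    "mmdet": ("3.0.0", "3.2.0"),
    "mmengine": ("0.4.0", "1.0.0"),
}


def parse_version(text):
    """Turn '2.0.1' into (2, 0, 1). Simplification: suffixes like 'rc4' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_range(version, lower, upper):
    """True if lower <= version < upper under the simplified parsing above."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def check(installed):
    """Map each pinned package found in `installed` to whether it is in range."""
    return {
        name: in_range(installed[name], lower, upper)
        for name, (lower, upper) in REQUIRED.items()
        if name in installed
    }


# Hypothetical installed versions, for illustration only.
print(check({"mmcv": "2.0.1", "mmdet": "3.2.0", "mmengine": "0.8.4"}))
```

In a real environment the installed versions could be read with `importlib.metadata.version`, and `packaging.specifiers.SpecifierSet` would give an exact PEP 440 comparison if the `packaging` library is available.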