wtpose

WTPose: Waterfall Transformer for Multi-person Pose Estimation

https://github.com/navinranjan7/wtpose

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

WTPose: Waterfall Transformer for Multi-person Pose Estimation

Basic Info

Host: GitHub
Owner: navinranjan7
License: apache-2.0
Language: Python
Default Branch: main
Size: 43.9 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Code of conduct Citation

WTPose: Waterfall Transformer for Multi-Person Pose Estimation

T4V@CVPR 2023 (Poster) | WACV Workshop 2025 (Oral)

Description

This repository contains the implementation and experiments for our paper "WTPose: Waterfall Transformer for Pose Estimation". WTPose introduces a novel Waterfall Transformer Module (WTM) that improves the pose estimation performance of vision transformer backbones such as the Shifted Window (Swin) Transformer.

Figure 1: Waterfall transformer framework for multi-person pose estimation. The input color image passes through the modified Swin Transformer backbone and the WTM to obtain 128-channel feature maps at 1/4 of the input resolution. The decoder module then generates K heatmaps, one per joint.
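The decoder output described in Figure 1 (K heatmaps at 1/4 of the input resolution) can be turned into keypoint coordinates with a simple per-joint argmax. The sketch below is a minimal illustration under that assumption; `decode_heatmaps` is a hypothetical helper, not part of this repository or of MMPose, and it omits the sub-pixel refinement that practical decoders usually apply.

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray, stride: int = 4):
    """Hypothetical helper: argmax decoding of K joint heatmaps.

    heatmaps: array of shape (K, H, W), one heatmap per joint, at 1/stride
    of the input resolution. Returns (K, 2) (x, y) coordinates in
    input-image pixels and (K,) peak scores.
    """
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)            # position of the peak in each heatmap
    scores = flat.max(axis=1)            # peak value used as the joint confidence
    ys, xs = np.unravel_index(idx, (H, W))
    # Map heatmap cells back to input pixels (no sub-pixel refinement here)
    coords = np.stack([xs, ys], axis=1).astype(np.float32) * stride
    return coords, scores


# Example: 17 COCO joints, 256x192 input -> 64x48 heatmaps
hm = np.zeros((17, 64, 48), dtype=np.float32)
hm[0, 10, 20] = 1.0                      # peak for joint 0 at row 10, column 20
coords, scores = decode_heatmaps(hm)
```

With a stride of 4, the peak at heatmap cell (x=20, y=10) maps to input pixel (80, 40).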

The WTM processes feature maps from multiple backbone levels through its waterfall branches. It applies filtering based on a **dilated attention mechanism** to expand the Field-of-View (FOV) and capture both local and global context. In the tables below, WTPose improves AP over the corresponding Swin baseline by 1.1 to 1.8 points.
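To illustrate how dilation expands the field of view without adding keys, here is a minimal single-head sketch of generic dilated neighbourhood attention in NumPy. It is an assumption-laden illustration, not the WTM: there are no learned query/key/value projections, no multi-head split, and the function names `window_span` and `dilated_local_attention` are hypothetical. Each pixel attends to a `kernel x kernel` grid of neighbours spaced `dilation` pixels apart, so a 3x3 window with dilation 2 covers a 5x5 span using only 9 keys.

```python
import numpy as np


def window_span(kernel: int, dilation: int) -> int:
    """Pixels covered along one axis by a dilated kernel x kernel window."""
    return (kernel - 1) * dilation + 1


def dilated_local_attention(x: np.ndarray, kernel: int = 3, dilation: int = 2) -> np.ndarray:
    """Hypothetical sketch: each pixel attends to a dilated neighbourhood.

    x has shape (H, W, C) and serves as query, key and value.
    """
    H, W, C = x.shape
    r = (kernel // 2) * dilation                      # window radius in pixels
    xp = np.pad(x, ((r, r), (r, r), (0, 0)))          # zero-pad the borders
    valid = np.pad(np.ones((H, W), dtype=bool), r)    # False on padded positions
    offsets = [(i * dilation, j * dilation) for i in range(kernel) for j in range(kernel)]
    # Gather the dilated neighbours of every pixel: (H, W, kernel*kernel, C)
    keys = np.stack([xp[dy:dy + H, dx:dx + W] for dy, dx in offsets], axis=2)
    mask = np.stack([valid[dy:dy + H, dx:dx + W] for dy, dx in offsets], axis=2)
    # Scaled dot-product scores between each pixel and its neighbours
    logits = np.einsum("hwc,hwnc->hwn", x, keys) / np.sqrt(C)
    logits = np.where(mask, logits, -np.inf)          # ignore padded neighbours
    logits -= logits.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the window
    return np.einsum("hwn,hwnc->hwc", weights, keys)


feat = np.random.rand(8, 8, 16).astype(np.float32)
out = dilated_local_attention(feat, kernel=3, dilation=2)
```

Padded positions are masked out rather than treated as zero-valued keys, so border pixels only average over real neighbours.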

Figure 2: The proposed waterfall transformer module. Its inputs are multi-scale feature maps from all four stages of the Swin backbone and low-level features from the ResNet bottleneck. The module creates a waterfall flow that first processes the input and then creates a new branch. The feature dimensions (spatial and channel) output by each block are shown in parentheses.
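The waterfall flow in Figure 2 differs from a parallel multi-branch design: each branch consumes the previous branch's output, and every intermediate result is also tapped off and kept. The sketch below shows only that cascading pattern with stand-in branch functions; `waterfall` is a hypothetical helper, and the real WTM's attention blocks, multi-scale inputs, and ResNet low-level features are not modelled.

```python
from typing import Callable, List

import numpy as np


def waterfall(x: np.ndarray, branches: List[Callable[[np.ndarray], np.ndarray]]) -> np.ndarray:
    """Hypothetical sketch of a waterfall cascade.

    Each branch processes the previous branch's output, and all branch
    outputs are concatenated along the channel (last) axis.
    """
    outputs = []
    feat = x
    for branch in branches:
        feat = branch(feat)       # flow: feed the previous result forward
        outputs.append(feat)      # branch: also keep this result as an output
    return np.concatenate(outputs, axis=-1)


# Stand-in "filters" so the data flow is easy to follow
x = np.ones((4, 4, 3))
out = waterfall(x, [lambda f: f * 2, lambda f: f + 1])
```

Here the second branch sees the first branch's output (value 2), not the original input, so the two concatenated halves hold 2 and 3 respectively.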

Qualitative pose estimation examples produced by WTPose:

Pose Estimation Result 1

Pose Estimation Result 2

Results on MS COCO Val set

Results use person detections from a detector that achieves 56 mAP on the person category. The configs listed are used for both training and testing.
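Because evaluation starts from detector boxes, each person box has to be mapped to a crop matching the model input (for example 256x192). A common top-down recipe, assumed here rather than taken from this repository's configs, enlarges the box to the input aspect ratio and then pads it by a factor such as 1.25. `box_to_input_region` is a hypothetical helper illustrating that step.

```python
def box_to_input_region(x, y, w, h, input_w=192, input_h=256, padding=1.25):
    """Hypothetical helper: map an (x, y, w, h) person box to the crop region.

    Returns (cx, cy, region_w, region_h); the region is later resized to
    input_w x input_h. The padding factor of 1.25 is an assumption.
    """
    cx, cy = x + w / 2.0, y + h / 2.0
    aspect = input_w / input_h
    # Grow the shorter side so the region matches the input aspect ratio
    if w > aspect * h:
        h = w / aspect
    else:
        w = h * aspect
    return cx, cy, w * padding, h * padding


region = box_to_input_region(100, 50, 60, 200)
```

For a tall 60x200 box, the width is widened to 150 to reach the 3:4 aspect ratio before padding, giving a 187.5x250 region centred at (130, 150).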

256x192 resolution

| Model | Resolution | Params (M) | GFLOPs | AP | AR | config | log | weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| Swin-T | 256x192 | 32.8 | 6.3 | 72.44 | 78.20 | config | log | - |
| WTPose-T | 256x192 | 30.0 | 12.8 | 74.23 | 79.43 | config | log | - |
| Swin-B | 256x192 | 93.0 | 19.0 | 73.72 | 79.32 | config | log | - |
| WTPose-B | 256x192 | 89.3 | 25.6 | 74.96 | 80.51 | config | log | - |
| Swin-L | 256x192 | 32.8 | 41.0 | 74.30 | 79.82 | config | log | - |
| WTPose-L | 256x192 | 32.8 | 47.9 | 75.40 | 80.81 | config | log | - |

384x288 resolution

| Model | Resolution | Params (M) | GFLOPs | AP | AR | config | log | weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| Swin-T | 384x288 | 32.8 | 13.8 | 74.89 | 80.93 | config | log | - |
| WTPose-T | 384x288 | 30.0 | 28.3 | 76.36 | 81.43 | config | log | - |
| Swin-B | 384x288 | 93.0 | 41.6 | 75.81 | 80.99 | config | log | - |
| WTPose-B | 384x288 | 89.3 | 55.8 | 77.18 | 82.07 | config | log | - |
| Swin-L | 384x288 | 32.8 | 88.2 | 76.30 | 81.44 | config | log | - |
| WTPose-L | 384x288 | 32.8 | 104.2 | 77.56 | 82.61 | config | log | - |
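The accuracy/cost trade-off in the two tables can be made explicit with a little arithmetic. The snippet below copies the AP and GFLOPs values from the tables and computes, for each backbone size and resolution, the AP gain of WTPose over Swin and the GFLOPs ratio. `RESULTS` and `summarize` are illustrative names, not part of the repository.

```python
# (AP, GFLOPs) for each (Swin baseline, WTPose) pair, copied from the tables
RESULTS = {
    ("T", "256x192"): ((72.44, 6.3), (74.23, 12.8)),
    ("B", "256x192"): ((73.72, 19.0), (74.96, 25.6)),
    ("L", "256x192"): ((74.30, 41.0), (75.40, 47.9)),
    ("T", "384x288"): ((74.89, 13.8), (76.36, 28.3)),
    ("B", "384x288"): ((75.81, 41.6), (77.18, 55.8)),
    ("L", "384x288"): ((76.30, 88.2), (77.56, 104.2)),
}


def summarize(results=RESULTS):
    """Return {(size, resolution): (AP gain, GFLOPs ratio)} for WTPose vs. Swin."""
    summary = {}
    for key, ((ap_swin, gflops_swin), (ap_wt, gflops_wt)) in results.items():
        summary[key] = (ap_wt - ap_swin, gflops_wt / gflops_swin)
    return summary


for (size, res), (gain, ratio) in summarize().items():
    print(f"{size} @ {res}: AP +{gain:.2f}, GFLOPs x{ratio:.2f}")
```

The AP gains range from 1.10 (L, 256x192) to 1.79 (T, 256x192), while GFLOPs grow by roughly 1.17x (L, 256x192) to 2.05x (T, 384x288); the relative compute overhead is largest for the smallest backbone.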

Installation and Setup

To use this code, clone the repository and follow installation.md for detailed installation and dataset preparation instructions. Alternatively, create an environment from environment.yaml.
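After installation, a quick sanity check is to confirm that the OpenMMLab packages fall inside the ranges pinned in requirements/mminstall.txt (mmcv >=2.0.0,<2.1.0; mmdet >=3.0.0,<3.2.0; mmengine >=0.4.0,<1.0.0). The sketch below uses only the standard library; `parse_version` and `check_mm_packages` are hypothetical helpers, and the simplified parser ignores pre-release tags such as `rc4`.

```python
from importlib.metadata import PackageNotFoundError, version

# Ranges from requirements/mminstall.txt: lower bound inclusive, upper exclusive
MM_REQUIREMENTS = {
    "mmcv": ((2, 0, 0), (2, 1, 0)),
    "mmdet": ((3, 0, 0), (3, 2, 0)),
    "mmengine": ((0, 4, 0), (1, 0, 0)),
}


def parse_version(text):
    """Parse the leading numeric X.Y.Z of a version string (pre-release tags ignored)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_mm_packages():
    """Return {package: 'ok' | 'out of range' | 'missing'}."""
    report = {}
    for name, (low, high) in MM_REQUIREMENTS.items():
        try:
            installed = parse_version(version(name))
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if low <= installed < high else "out of range"
    return report


print(check_mm_packages())
```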

Usage

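Run the following commands to train a model (first line) or evaluate a trained checkpoint (second line); the config file is passed as a positional argument:

```bash
python tools/train.py ${CONFIG_FILE}
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE}
```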

Citation

If you use our work, please consider citing:

```bibtex
@article{ranjan2024waterfall,
  author  = {Ranjan, Navin and Artacho, Bruno and Savakis, Andreas},
  title   = {Waterfall Transformer for Multi-person Pose Estimation},
  journal = {arXiv preprint arXiv:2411.18944},
  year    = {2024}
}
```

Acknowledgement

This code builds on the excellent MMPose implementation.

Contact

For any questions or discussions, feel free to open an issue or contact us at: 📧 nr4325@g.rit.edu

⭐ If you find this repository useful, please consider giving it a star!

Owner

Name: Navin Ranjan
Login: navinranjan7
Kind: user
Location: New York, USA
Company: Rochester Institute of Technology

Website: https://navinranjan7.github.io/
Repositories: 1
Profile: https://github.com/navinranjan7

Researcher at Vision and Image Processing Lab, Rochester Institute of Technology (RIT), New York, USA.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMPose Contributors"
title: "OpenMMLab Pose Estimation Toolbox and Benchmark"
date-released: 2020-08-31
url: "https://github.com/open-mmlab/mmpose"
license: Apache-2.0

GitHub Events

Total

Watch event: 2
Push event: 57
Create event: 2

Last Year

Watch event: 2
Push event: 57
Create event: 2

Dependencies

.github/workflows/deploy.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/lint.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/merge_stage_test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
codecov/codecov-action v1.0.14 composite

.github/workflows/pr_stage_test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
codecov/codecov-action v1.0.14 composite

.circleci/docker/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

docker/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

docker/serve/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

logs/log1/err/setup.py pypi

projects/rtmpose/examples/onnxruntime/requirements.txt pypi

loguru ==0.6.0
numpy ==1.21.6
onnxruntime ==1.14.1
onnxruntime-gpu ==1.8.1

requirements/albu.txt pypi

albumentations >=0.3.2

requirements/build.txt pypi

numpy *
torch >=1.8

requirements/docs.txt pypi

docutils ==0.16.0
markdown *
myst-parser *
sphinx ==4.5.0
sphinx_copybutton *
sphinx_markdown_tables *
urllib3 <2.0.0

requirements/mminstall.txt pypi

mmcv >=2.0.0,<2.1.0
mmdet >=3.0.0,<3.2.0
mmengine >=0.4.0,<1.0.0

requirements/optional.txt pypi

requests *

requirements/poseval.txt pypi

shapely ==1.8.4

requirements/readthedocs.txt pypi

mmcv >=2.0.0rc4
mmengine >=0.6.0,<1.0.0
munkres *
regex *
scipy *
titlecase *
torch >1.6
torchvision *
xtcocotools >=1.13

requirements/runtime.txt pypi

chumpy *
json_tricks *
matplotlib *
munkres *
numpy *
opencv-python *
pillow *
scipy *
torchvision *
xtcocotools >=1.12

requirements/tests.txt pypi

coverage * test
flake8 * test
interrogate * test
isort ==4.3.21 test
parameterized * test
pytest * test
pytest-runner * test
xdoctest >=0.10.0 test
yapf * test

requirements.txt pypi
