wtpose

WTPose: Waterfall Transformer for Multi-person Pose Estimation

https://github.com/navinranjan7/wtpose

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

WTPose: Waterfall Transformer for Multi-person Pose Estimation

Basic Info
  • Host: GitHub
  • Owner: navinranjan7
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 43.9 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Code of conduct Citation

README.md

WTPose: Waterfall Transformer for Multi-Person Pose Estimation

T4V@CVPR 2023 (Poster) | WACV Workshop 2025 (Oral)

Description

This repository contains the implementation and experiments related to our paper: "WTPose: Waterfall Transformer for Pose Estimation". WTPose introduces a novel Waterfall Transformer Module (WTM) that enhances pose estimation by improving the performance of vision transformers like the Shifted Window (Swin) transformer.

Figure 1: Waterfall transformer framework for multi-person pose estimation. The input color image is fed through the modified Swin Transformer backbone and WTM module to obtain 128 feature channels at reduced resolution by a factor of 4. The decoder module generates K heatmaps, one per joint.
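
As a rough illustration of the last step in Figure 1, the sketch below turns K heatmaps into keypoint coordinates by taking each heatmap's peak and scaling it back by the stride of 4. The function name `decode_heatmaps`, the nested-list heatmap format, and plain argmax decoding are assumptions made for illustration; the repository's actual decoder and post-processing may differ.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # one heatmap: rows x columns of scores


def decode_heatmaps(heatmaps: List[Heatmap], stride: int = 4) -> List[Tuple[int, int, float]]:
    """Return one (x, y, score) triple per joint, in input-image pixels.

    Assumes plain argmax decoding; `stride` is the factor by which the
    heatmaps are smaller than the input image (4 in Figure 1).
    """
    keypoints = []
    for heatmap in heatmaps:
        best_score = float("-inf")
        best_x = best_y = 0
        for y, row in enumerate(heatmap):
            for x, score in enumerate(row):
                if score > best_score:
                    best_score, best_x, best_y = score, x, y
        # Map the heatmap cell back to input-image coordinates.
        keypoints.append((best_x * stride, best_y * stride, best_score))
    return keypoints


# Two toy 3x4 heatmaps (K = 2 joints), made up for this example.
toy = [
    [[0.0, 0.1, 0.0, 0.0], [0.0, 0.9, 0.2, 0.0], [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.3, 0.8]],
]
print(decode_heatmaps(toy))  # [(4, 4, 0.9), (12, 8, 0.8)]
```

Production pipelines usually refine the peak to sub-pixel accuracy and map coordinates back through the person bounding box; both steps are left out here to keep the sketch short.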


The WTM processes feature maps from multiple levels of the backbone through its waterfall branches. It applies filtering operations based on a **dilated attention mechanism** to expand the Field-of-View (FOV) and capture both local and global context. In the MS COCO results below, WTPose improves AP by 1.10 to 1.79 points over the corresponding Swin baselines.
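
To make the field-of-view idea concrete, the sketch below lists which positions a 1-D window touches at different dilation rates. It illustrates dilation in general, not the WTM's attention implementation; the helper names and the example dilation rates are assumptions.

```python
from typing import List


def dilated_window(center: int, window: int, dilation: int) -> List[int]:
    """Positions touched by a 1-D window of odd size `window` around `center`."""
    half = window // 2
    return [center + dilation * offset for offset in range(-half, half + 1)]


def field_of_view(window: int, dilation: int) -> int:
    """Span covered by the window: dilation * (window - 1) + 1 positions."""
    return dilation * (window - 1) + 1


for rate in (1, 2, 4):  # hypothetical dilation rates
    print(rate, dilated_window(center=10, window=3, dilation=rate), field_of_view(3, rate))
# 1 [9, 10, 11] 3
# 2 [8, 10, 12] 5
# 4 [6, 10, 14] 9
```

The window always reads three positions, so the cost per query stays the same while the covered span grows with the dilation rate.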

Figure 2: The proposed waterfall transformer module. The inputs are multi-scale feature maps from all four stages of the Swin backbone and low-level features from the ResNet bottleneck. The waterfall module creates a waterfall flow, initially processing the input and then creating a new branch. The feature dimensions (spatial and channel dimensions) output by various blocks are shown in parentheses.


Pose estimation examples using WTPose, showcasing the effectiveness of our approach.

Results on MS COCO Val set

Results use person detections from a detector that achieves 56 mAP on the person class. The configs listed here are used for both training and testing.

256x192 resolution

| Model | Resolution | Params (M) | GFLOPs | AP | AR | config | log | weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| Swin-T | 256x192 | 32.8 | 6.3 | 72.44 | 78.20 | config | log | - |
| WTPose-T | 256x192 | 30.0 | 12.8 | 74.23 | 79.43 | config | log | - |
| Swin-B | 256x192 | 93.0 | 19.0 | 73.72 | 79.32 | config | log | - |
| WTPose-B | 256x192 | 89.3 | 25.6 | 74.96 | 80.51 | config | log | - |
| Swin-L | 256x192 | 32.8 | 41.0 | 74.30 | 79.82 | config | log | - |
| WTPose-L | 256x192 | 32.8 | 47.9 | 75.40 | 80.81 | config | log | - |

384x288 resolution

| Model | Resolution | Params (M) | GFLOPs | AP | AR | config | log | weight |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| Swin-T | 384x288 | 32.8 | 13.8 | 74.89 | 80.93 | config | log | - |
| WTPose-T | 384x288 | 30.0 | 28.3 | 76.36 | 81.43 | config | log | - |
| Swin-B | 384x288 | 93.0 | 41.6 | 75.81 | 80.99 | config | log | - |
| WTPose-B | 384x288 | 89.3 | 55.8 | 77.18 | 82.07 | config | log | - |
| Swin-L | 384x288 | 32.8 | 88.2 | 76.30 | 81.44 | config | log | - |
| WTPose-L | 384x288 | 32.8 | 104.2 | 77.56 | 82.61 | config | log | - |
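
The trade-off in the two tables can be read off with a short script. The sketch below copies the AP and GFLOPs columns from the tables above and prints the AP gain and extra compute of each WTPose model over its Swin baseline; the `RESULTS` layout and the `compare` helper are illustrative names, not part of the repository.

```python
# (resolution, model size) -> ((Swin AP, Swin GFLOPs), (WTPose AP, WTPose GFLOPs))
RESULTS = {
    ("256x192", "T"): ((72.44, 6.3), (74.23, 12.8)),
    ("256x192", "B"): ((73.72, 19.0), (74.96, 25.6)),
    ("256x192", "L"): ((74.30, 41.0), (75.40, 47.9)),
    ("384x288", "T"): ((74.89, 13.8), (76.36, 28.3)),
    ("384x288", "B"): ((75.81, 41.6), (77.18, 55.8)),
    ("384x288", "L"): ((76.30, 88.2), (77.56, 104.2)),
}


def compare(results):
    """Return (resolution, size, AP gain, extra GFLOPs) for each model pair."""
    rows = []
    for (resolution, size), ((base_ap, base_flops), (wt_ap, wt_flops)) in results.items():
        ap_gain = round(wt_ap - base_ap, 2)
        extra_gflops = round(wt_flops - base_flops, 1)
        rows.append((resolution, size, ap_gain, extra_gflops))
    return rows


for resolution, size, ap_gain, extra_gflops in compare(RESULTS):
    print(f"{resolution} {size}: +{ap_gain:.2f} AP, +{extra_gflops:.1f} GFLOPs")
```

On these numbers the AP gain ranges from 1.10 to 1.79 points, at a cost of 6.5 to 16.0 additional GFLOPs.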

Installation and Setup

To use this code, clone the repository and follow installation.md for detailed installation and dataset preparation instructions. Alternatively, create an environment from environment.yaml.

Usage

Run the following command to start training or evaluation:

```bash
python tools/train.py --config
```

Citation

If you use our work, please consider citing:

```bibtex
@article{ranjan2024waterfall,
  author  = {Ranjan, Navin and Artacho, Bruno and Savakis, Andreas},
  title   = {Waterfall Transformer for Multi-person Pose Estimation},
  journal = {arXiv preprint arXiv:2411.18944},
  year    = {2024}
}
```

Acknowledgement

We acknowledge the excellent implementation from mmpose.

Contact

For any questions or discussions, feel free to open an issue or contact us at: 📧 nr4325@g.rit.edu


⭐ If you find this repository useful, please consider giving it a star!

Owner

  • Name: Navin Ranjan
  • Login: navinranjan7
  • Kind: user
  • Location: New York, USA
  • Company: Rochester Institute of Technology

Researcher at Vision and Image Processing Lab, Rochester Institute of Technology (RIT), New York, USA.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMPose Contributors"
title: "OpenMMLab Pose Estimation Toolbox and Benchmark"
date-released: 2020-08-31
url: "https://github.com/open-mmlab/mmpose"
license: Apache-2.0

GitHub Events

Total
  • Watch event: 2
  • Push event: 57
  • Create event: 2
Last Year
  • Watch event: 2
  • Push event: 57
  • Create event: 2

Dependencies

.github/workflows/deploy.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/lint.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/merge_stage_test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v1.0.14 composite
.github/workflows/pr_stage_test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v1.0.14 composite
.circleci/docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
logs/log1/err/setup.py pypi
projects/rtmpose/examples/onnxruntime/requirements.txt pypi
  • loguru ==0.6.0
  • numpy ==1.21.6
  • onnxruntime ==1.14.1
  • onnxruntime-gpu ==1.8.1
requirements/albu.txt pypi
  • albumentations >=0.3.2
requirements/build.txt pypi
  • numpy *
  • torch >=1.8
requirements/docs.txt pypi
  • docutils ==0.16.0
  • markdown *
  • myst-parser *
  • sphinx ==4.5.0
  • sphinx_copybutton *
  • sphinx_markdown_tables *
  • urllib3 <2.0.0
requirements/mminstall.txt pypi
  • mmcv >=2.0.0,<2.1.0
  • mmdet >=3.0.0,<3.2.0
  • mmengine >=0.4.0,<1.0.0
requirements/optional.txt pypi
  • requests *
requirements/poseval.txt pypi
  • shapely ==1.8.4
requirements/readthedocs.txt pypi
  • mmcv >=2.0.0rc4
  • mmengine >=0.6.0,<1.0.0
  • munkres *
  • regex *
  • scipy *
  • titlecase *
  • torch >1.6
  • torchvision *
  • xtcocotools >=1.13
requirements/runtime.txt pypi
  • chumpy *
  • json_tricks *
  • matplotlib *
  • munkres *
  • numpy *
  • opencv-python *
  • pillow *
  • scipy *
  • torchvision *
  • xtcocotools >=1.12
requirements/tests.txt pypi
  • coverage * test
  • flake8 * test
  • interrogate * test
  • isort ==4.3.21 test
  • parameterized * test
  • pytest * test
  • pytest-runner * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
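
The version ranges in requirements/mminstall.txt are the ones most likely to cause installation problems, so a quick check can help before training. The sketch below tests candidate versions against those ranges using plain tuple comparison; the helper names and the candidate versions are hypothetical, and the parser only handles plain dotted releases (no `rc` or `dev` suffixes).

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '2.0.1' into (2, 0, 1)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Ranges copied from requirements/mminstall.txt: (inclusive lower, exclusive upper).
MMINSTALL = {
    "mmcv": ("2.0.0", "2.1.0"),
    "mmdet": ("3.0.0", "3.2.0"),
    "mmengine": ("0.4.0", "1.0.0"),
}

# Hypothetical candidate versions, chosen only to exercise the check.
candidate = {"mmcv": "2.0.1", "mmdet": "3.1.0", "mmengine": "0.8.4"}

for name, (lower, upper) in MMINSTALL.items():
    print(name, candidate[name], satisfies(candidate[name], lower, upper))
```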