bboxmaskpose
[ICCV 25] The official repository of paper 'Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle'
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.0%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: MiraPurkrabek
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://MiraPurkrabek.github.io/BBox-Mask-Pose/
- Size: 6.58 MB
Statistics
- Stars: 40
- Watchers: 7
- Forks: 4
- Open Issues: 1
- Releases: 1
Topics
Metadata Files
README.md
Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
ICCV 2025
[arXiv](https://arxiv.org/abs/2412.01562)
[Project website](https://mirapurkrabek.github.io/BBox-Mask-Pose/)
[License](LICENSE)
[Video](https://youtu.be/U05yUP4b2LQ)
Papers with code:
[2D human pose estimation on OCHuman](https://paperswithcode.com/sota/2d-human-pose-estimation-on-ochuman?p=detection-pose-estimation-and-segmentation-1)
[Human instance segmentation on OCHuman](https://paperswithcode.com/sota/human-instance-segmentation-on-ochuman?p=detection-pose-estimation-and-segmentation-1)
📋 Overview
The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other. This approach enhances all three tasks simultaneously. Using segmentation masks instead of bounding boxes improves performance in crowded scenarios, making top-down methods competitive with bottom-up approaches.
Key contributions:
1. MaskPose: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters.
   - Download pre-trained weights below
2. BBox-MaskPose (BMP): a method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation.
   - Try the demo!
3. Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes').
   - Download pre-trained weights below
4. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
For more details, please visit our project website.
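The self-improving loop from the overview can be pictured with a minimal sketch. It is not the repository's implementation: `Instance`, `bmp_loop`, and the three callables (`detect`, `estimate_pose`, `refine_mask`) are hypothetical names, and the assumption that the detector receives the masks found so far and returns only new boxes is an interpretation of "iterative detection (ignoring 'holes')".

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Instance:
    """One detected body with the three linked outputs (hypothetical container)."""
    bbox: BBox
    keypoints: Optional[Any] = None
    mask: Optional[Any] = None


def bmp_loop(
    image: Any,
    detect: Callable[[Any, Sequence[Any]], List[BBox]],
    estimate_pose: Callable[[Any, BBox], Any],
    refine_mask: Callable[[Any, BBox, Any], Any],
    num_iterations: int = 2,
) -> List[Instance]:
    """Run a BBox -> Pose -> Mask loop for a fixed number of iterations.

    Assumption: `detect` is given the masks found so far and returns only
    boxes for bodies that those masks do not already explain.
    """
    instances: List[Instance] = []
    for _ in range(num_iterations):
        known_masks = [inst.mask for inst in instances]
        new_boxes = detect(image, known_masks)
        if not new_boxes:
            break  # nothing new was found, so further iterations add nothing
        for bbox in new_boxes:
            keypoints = estimate_pose(image, bbox)
            mask = refine_mask(image, bbox, keypoints)
            instances.append(Instance(bbox, keypoints, mask))
    return instances
```

Each stage is passed in as a callable, so the sketch can be exercised with stubs before any real detector, pose estimator, or SAM2 wrapper is plugged in.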
📢 News
- Aug 2025: HuggingFace Image Demo is out! 🎮
- Jul 2025: Version 1.1 with easy-to-run image demo released
- Jun 2025: Paper accepted to ICCV 2025! 🎉
- Dec 2024: The code is available
- Nov 2024: The project website is on
🚀 Installation
This project is built on top of MMPose and SAM 2.1. Please refer to the MMPose installation guide or SAM installation guide for detailed setup instructions.
Basic installation steps:
```bash
# Clone the repository
git clone https://github.com/mirapurkrabek/BBoxMaskPose.git BBoxMaskPose/
cd BBoxMaskPose

# Install your version of torch, torchvision, OpenCV and NumPy
pip install torch==2.1.2+cu121 torchvision==0.16.2+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip install numpy==1.25.1 opencv-python==4.9.0.80

# Install MMLibrary
pip install -U openmim
mim install mmengine "mmcv==2.1.0" "mmdet==3.3.0" "mmpretrain==1.2.0"

# Install dependencies
pip install -r requirements.txt
pip install -e .
```
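After installation, it can help to confirm that the environment matches the versions pinned above. The following is a small sketch using only the standard library; the helper names `installed_version` and `find_mismatches` are made up for this example, and the pins are copied from the commands above with the local `+cu121` tag dropped.

```python
from importlib import metadata

# Versions copied from the installation commands (local build tags removed).
PINNED = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "numpy": "1.25.1",
    "opencv-python": "4.9.0.80",
    "mmcv": "2.1.0",
    "mmdet": "3.3.0",
    "mmpretrain": "1.2.0",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map package name -> (wanted, found) for every pin that is not met."""
    problems = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore a local build tag such as "+cu121" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

If you deliberately installed different versions of torch or NumPy, adjust `PINNED` accordingly; the check only reports differences and does not change the environment.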
🎮 Demo
Step 1: Download SAM2 weights using the enclosed script.
Step 2: Run the full BBox-Mask-Pose pipeline on an input image:
```bash
python demo/bmp_demo.py configs/bmp_D3.yaml data/004806.jpg
```
It takes the image 004806.jpg from OCHuman and runs (1) the detector, (2) the pose estimator and (3) SAM2 refinement. Details are in the configuration file bmp_D3.yaml.
Options:
- configs/bmp_D3.yaml: BMP configuration file
- data/004806.jpg: Input image
- --device: (Optional) Inference device (default: cuda:0)
- --output-root: (Optional) Directory to save outputs (default: demo/outputs)
- --create-gif: (Optional) Generate an animated GIF of all iterations (default False)
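To run the demo over several images, the options listed above can be wrapped in a short script. This is only a sketch: `build_command` and `run_folder` are hypothetical helpers, the script is assumed to be launched from the repository root, and `--create-gif` is assumed to be a plain on/off flag (the list above gives only its default).

```python
import subprocess
import sys
from pathlib import Path


def build_command(image, config="configs/bmp_D3.yaml", device="cuda:0",
                  output_root="demo/outputs", create_gif=False):
    """Assemble the demo command line from the documented options."""
    cmd = [
        sys.executable, "demo/bmp_demo.py", config, str(image),
        "--device", device,
        "--output-root", output_root,
    ]
    if create_gif:
        # Assumption: the flag takes no value.
        cmd.append("--create-gif")
    return cmd


def run_folder(folder, **options):
    """Run the demo once per .jpg file in `folder`, stopping on the first failure."""
    for image in sorted(Path(folder).glob("*.jpg")):
        subprocess.run(build_command(image, **options), check=True)
```

For example, `run_folder("data", device="cpu", create_gif=True)` would process every JPEG in `data/` in turn.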
After running, outputs are in outputs/004806/. The expected output should look like this:
📦 Pre-trained Models
Pre-trained models are available on VRG Hugging Face 🤗. To run the demo, you only need to download the SAM weights with the enclosed script. Our detector and pose estimator are downloaded automatically at runtime.
If you want to download our weights yourself, here are the links to our HuggingFace:
- ViTPose-b trained on COCO+MPII+AIC -- download weights
- MaskPose-b -- download weights
- Fine-tuned RTMDet-L -- download weights
🙏 Acknowledgments
The code combines MMDetection, MMPose 2.0, ViTPose and SAM 2.1.
📝 Citation
The code was implemented by Miroslav Purkrábek. If you use this work, kindly cite it using the reference provided below.
For questions, please use the Issues or Discussions.
@InProceedings{Purkrabek2025ICCV,
author={Purkrabek, Miroslav and Matas, Jiri},
title={Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2025},
month={October},
}
Owner
- Name: Miroslav Purkrábek
- Login: MiraPurkrabek
- Kind: user
- Location: Prague, Czech Republic
- Repositories: 1
- Profile: https://github.com/MiraPurkrabek
AI Researcher @ Visual Recognition Group, FEE CTU in Prague
Citation (CITATION.cff)
# CITATION.cff file for Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
# This file provides metadata for the software and its preferred citation format.
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Purkrabek
    given-names: Miroslav
  - family-names: Matas
    given-names: Jiri
title: "Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle"
version: 1.0.0
date-released: 2025-06-20
preferred-citation:
  type: conference-paper
  authors:
    - family-names: Purkrabek
      given-names: Miroslav
    - family-names: Matas
      given-names: Jiri
  collection-title: "Proceedings of the IEEE/CVF International Conference on Computer Vision"
  month: 10
  start: 1 # First page number
  end: 8 # Last page number
  title: "Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle"
  year: 2025
GitHub Events
Total
- Issues event: 8
- Watch event: 34
- Delete event: 3
- Issue comment event: 8
- Push event: 19
- Pull request event: 2
- Pull request review event: 6
- Pull request review comment event: 4
- Fork event: 3
- Create event: 5
Last Year
- Issues event: 8
- Watch event: 34
- Delete event: 3
- Issue comment event: 8
- Push event: 19
- Pull request event: 2
- Pull request review event: 6
- Pull request review comment event: 4
- Fork event: 3
- Create event: 5
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 4
- Total pull requests: 2
- Average time to close issues: about 1 month
- Average time to close pull requests: 2 minutes
- Total issue authors: 4
- Total pull request authors: 1
- Average comments per issue: 2.25
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 2
- Average time to close issues: about 1 month
- Average time to close pull requests: 2 minutes
- Issue authors: 4
- Pull request authors: 1
- Average comments per issue: 2.25
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- NielsRogge (1)
- JunghoYoo (1)
- ousandada (1)
- neptune4year (1)
Pull Request Authors
- MiraPurkrabek (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- albumentations >=0.3.2
- numpy *
- torch >=1.8
- docutils ==0.16.0
- markdown *
- myst-parser *
- sphinx ==4.5.0
- sphinx_copybutton *
- sphinx_markdown_tables *
- urllib3 <2.0.0
- mmcv >=2.0.0,<2.2.0
- mmdet >=3.0.0,<3.3.0
- mmengine >=0.4.0,<1.0.0
- requests *
- shapely ==1.8.4
- mmcv >=2.0.0rc4
- mmengine >=0.6.0,<1.0.0
- munkres *
- regex *
- scipy *
- titlecase *
- torch >1.6
- torchvision *
- xtcocotools >=1.13
- chumpy *
- json_tricks *
- matplotlib *
- munkres *
- numpy *
- opencv-python *
- pillow *
- scipy *
- torchvision *
- xtcocotools >=1.12
- coverage * test
- flake8 * test
- interrogate * test
- isort ==4.3.21 test
- parameterized * test
- pytest * test
- pytest-runner * test
- xdoctest >=0.10.0 test
- yapf * test