awesome-robotics-3d
A curated list of 3D Vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision; includes papers, code, and related websites.
Science Score: 36.0%
This score indicates how likely this project is to be science-related, based on the indicators below (a hypothetical scoring sketch follows the list):
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org, ieee.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 11.3%, to scientific vocabulary)
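The 36.0% figure above comes from an external metadata analysis whose exact formula is not documented here. Purely as a hypothetical illustration, the sketch below shows one way such a score could be derived from the listed indicators; the names (Indicators, science_score) and the equal weighting are assumptions, and it does not reproduce the real computation.

```python
# Hypothetical sketch only: one way a "science score" could be computed from
# boolean repository indicators plus a vocabulary-similarity signal. The names
# (Indicators, science_score) and the equal weighting are assumptions; the real
# tool's formula is not documented in this repository.
from dataclasses import dataclass


@dataclass
class Indicators:
    citation_cff: bool = False         # CITATION.cff file present
    codemeta_json: bool = False        # codemeta.json file present
    zenodo_json: bool = False          # .zenodo.json file present
    doi_references: bool = False       # DOI references found
    academic_links: bool = False       # links to arxiv.org, ieee.org, etc.
    academic_emails: bool = False      # academic email domains
    institutional_owner: bool = False  # institutional organization owner
    joss_metadata: bool = False        # JOSS paper metadata
    vocab_similarity: float = 0.0      # scientific vocabulary similarity in [0, 1]


def science_score(ind: Indicators) -> float:
    """Return a score in [0, 1] as the mean of all indicator signals."""
    booleans = [
        ind.citation_cff, ind.codemeta_json, ind.zenodo_json,
        ind.doi_references, ind.academic_links, ind.academic_emails,
        ind.institutional_owner, ind.joss_metadata,
    ]
    signals = [1.0 if b else 0.0 for b in booleans] + [ind.vocab_similarity]
    return sum(signals) / len(signals)


# Indicators reported for this repository:
score = science_score(Indicators(
    codemeta_json=True, zenodo_json=True, academic_links=True,
    vocab_similarity=0.113,
))
print(f"{score:.1%}")  # 34.6% with equal weights; the page reports 36.0%, so the real weighting differs
```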
Keywords
Repository
A curated list of 3D Vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision; includes papers, code, and related websites.
Statistics
- Stars: 754
- Watchers: 14
- Forks: 38
- Open Issues: 6
- Releases: 0
Topics
Metadata Files
README.md
Awesome-Robotics-3D
✨ About
This repo contains a curated list of 3D Vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision.
Please feel free to send pull requests or email me to add papers!
If you find this repository useful, please consider citing 📝 and STARing ⭐ this list.
Feel free to share this list with others! List curated and maintained by Zubair Irshad. If you have any questions, please get in touch!
🔥 Other relevant survey papers:
"Neural Fields in Robotics", arXiv, Oct 2024. [Paper]
"When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models", arXiv, May 2024. [Paper]
"3D Gaussian Splatting in Robotics: A Survey", arXiv, Oct 2024. [Paper]
"A Comprehensive Study of 3-D Vision-Based Robot Manipulation", TCYB 2021. [Paper]
🏠 Overview
- Policy Learning
- Pretraining
- VLM and LLM
- Representations
- Simulations, Datasets and Benchmarks
- Citation
Policy Learning
3D Diffuser Actor: "Policy diffusion with 3d scene representations", arXiv Feb 2024. [Paper] [Webpage] [Code]
3D Diffusion Policy: "Generalizable Visuomotor Policy Learning via Simple 3D Representations", RSS 2024. [Paper] [Webpage] [Code]
DNAct: "Diffusion Guided Multi-Task 3D Policy Learning", arXiv Mar 2024. [Paper] [Webpage]
ManiCM: "Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation", arXiv Jun 2024. [Paper] [Webpage] [Code]
HDP: "Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation", CVPR 2024. [Paper] [Webpage] [Code]
Imagination Policy: "Using Generative Point Cloud Models for Learning Manipulation Policies", arXiv Jun 2024. [Paper] [Webpage]
PCWM: "Point Cloud Models Improve Visual Robustness in Robotic Learners", ICRA 2024. [Paper] [Webpage]
RVT: "Robotic View Transformer for 3D Object Manipulation", CORL 2023. [Paper] [Webpage] [Code]
Act3D: "3D Feature Field Transformers for Multi-Task Robotic Manipulation", CORL 2023. [Paper] [Webpage] [Code]
VIHE: "Transformer-Based 3D Object Manipulation Using Virtual In-Hand View", arXiv, Mar 2024. [Paper] [Webpage] [Code]
SGRv2: "Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation", arXiv, Jun 2024. [Paper] [Webpage]
Sigma-Agent: "Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation", arXiv June 2024. [Paper]
RVT-2: "Learning Precise Manipulation from Few Demonstrations", RSS 2024. [Paper] [Webpage] [Code]
SAM-E: "Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation", ICML 2024. [Paper] [Webpage] [Code]
RISE: "3D Perception Makes Real-World Robot Imitation Simple and Effective", arXiv, Apr 2024. [Paper] [Webpage] [Code]
Polarnet: "3D Point Clouds for Language-Guided Robotic Manipulation", CORL 2023. [Paper] [Webpage] [Code]
Chaineddiffuser: "Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation", CORL 2023. [Paper] [Webpage] [Code]
Pointcloud_RL: "On the Efficacy of 3D Point Cloud Reinforcement Learning", arXiv, June 2023. [Paper] [Code]
Perceiver-Actor: "A Multi-Task Transformer for Robotic Manipulation", CORL 2022. [Paper] [Webpage] [Code]
CLIPort: "What and Where Pathways for Robotic Manipulation", CORL 2021. [Paper] [Webpage] [Code]
Pretraining
3D-MVP: "3D Multiview Pretraining for Robotic Manipulation", arXiv, June 2024. [Paper] [Webpage]
DexArt: "Benchmarking Generalizable Dexterous Manipulation with Articulated Objects", CVPR 2023. [Paper] [Webpage] [Code]
RoboUniView: "Visual-Language Model with Unified View Representation for Robotic Manipulation", arXiv, Jun 2024. [Paper] [Website] [Code]
SUGAR: "Pre-training 3D Visual Representations for Robotics", CVPR 2024. [Paper] [Webpage] [Code]
DPR: "Visual Robotic Manipulation with Depth-Aware Pretraining", arXiv, Jan 2024. [Paper]
MV-MWM: "Multi-View Masked World Models for Visual Robotic Manipulation", ICML 2023. [Paper] [Code]
Point Cloud Matters: "Rethinking the Impact of Different Observation Spaces on Robot Learning", arXiv, Feb 2024. [Paper] [Code]
RL3D: "Visual Reinforcement Learning with Self-Supervised 3D Representations", IROS 2023. [Paper] [Website] [Code]
VLM and LLM
RoboRefer: "Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics", arXiv 2025. [Paper] [Website]
AHA: "A Vision-Language-Model for Detecting and Reasoning over Failures in Robotic Manipulation", arXiv 2024. [Paper] [Website]
ShapeLLM: "Universal 3D Object Understanding for Embodied Interaction", ECCV 2024. [Paper/PDF] [Code] [Website]
3D-VLA: "3D Vision-Language-Action Generative World Model", ICML 2024. [Paper] [Website] [Code]
RoboPoint: "A Vision-Language Model for Spatial Affordance Prediction for Robotics", CORL 2024. [Paper] [Website] [Demo]
Open6DOR: "Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach", IROS 2024. [Paper] [Website] [Code]
ReasoningGrasp: "Reasoning Grasping via Multimodal Large Language Model", CORL 2024. [Paper]
SpatialVLM: "Endowing Vision-Language Models with Spatial Reasoning Capabilities", CVPR 2024. [Paper] [Website] [Code]
SpatialRGPT: "Grounded Spatial Reasoning in Vision Language Model", arXiv, June 2024. [Paper] [Website]
Scene-LLM: "Extending Language Model for 3D Visual Understanding and Reasoning", arXiv, Mar 2024. [Paper]
ManipLLM: "Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation", CVPR 2024. [Paper] [Website] [Code]
Manipulate-Anything: "Automating Real-World Robots using Vision-Language Models", CORL 2024. [Paper] [Website]
MOKA: "Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting", RSS 2024. [Paper] [Website] [Code]
Agent3D-Zero: "An Agent for Zero-shot 3D Understanding", arXiv, Mar 2024. [Paper] [Website] [Code]
MultiPLY: "A Multisensory Object-Centric Embodied Large Language Model in 3D World", CVPR 2024. [Paper] [Website] [Code]
ThinkGrasp: "A Vision-Language System for Strategic Part Grasping in Clutter", arXiv, Jul 2024. [Paper] [Website]
VoxPoser: "Composable 3D Value Maps for Robotic Manipulation with Language Models", CORL 2023. [Paper] [Website] [Code]
Dream2Real: "Zero-Shot 3D Object Rearrangement with Vision-Language Models", ICRA 2024. [Paper] [Website] [Code]
LEO: "An Embodied Generalist Agent in 3D World", ICML 2024. [Paper] [Website] [Code]
SpatialPIN: "Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors", arXiv, Mar 2024. [Paper] [Website]
SpatialBot: "Precise Spatial Understanding with Vision Language Models", arXiv, Jun 2024. [Paper] [Code]
COME-robot: "Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V", arXiv, Apr 2024. [Paper] [Website]
3D-LLM: "Injecting the 3D World into Large Language Models", NeurIPS 2023. [Paper] [Website] [Code]
VLMaps: "Visual Language Maps for Robot Navigation", ICRA 2023. [Paper] [Website] [Code]
MoMa-LLM: "Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation", RA-L 2024. [Paper] [Website] [Code]
LGrasp6D: "Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance", ECCV 2024. [Paper] [Website]
OpenAD: "Open-Vocabulary Affordance Detection in 3D Point Clouds", IROS 2023. [Paper] [Website] [Code]
3DAPNet: "Language-Conditioned Affordance-Pose Detection in 3D Point Clouds", ICRA 2024. [Paper] [Website] [Code]
OpenKD: "Open-Vocabulary Affordance Detection using Knowledge Distillation and Text-Point Correlation", ICRA 2024. [Paper] [Code]
PARIS3D: "Reasoning Based 3D Part Segmentation Using Large Multimodal Model", ECCV 2024. [Paper] [Code]
Representations
RoVi-Aug: "Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning", CORL 2024. [Paper] [Webpage]
Vista: "View-Invariant Policy Learning via Zero-Shot Novel View Synthesis", CORL 2024. [Paper] [Webpage] [Code]
GraspSplats: "Efficient Manipulation with 3D Feature Splatting", CORL 2024. [Paper] [Webpage] [Code]
RAM: "Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation", CORL 2024. [Paper] [Webpage] [Code]
Language-Embedded Gaussian Splats (LEGS): "Incrementally Building Room-Scale Representations with a Mobile Robot", IROS 2024. [Paper] [Webpage]
Splat-MOVER: "Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting", arXiv May 2024. [Paper] [Webpage]
GNFactor: "Multi-Task Real Robot Learning with Generalizable Neural Feature Fields", CORL 2023. [Paper] [Webpage] [Code]
ManiGaussian: "Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", ECCV 2024. [Paper] [Webpage] [Code]
GaussianGrasper: "3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping", arXiv Mar 2024. [Paper] [Webpage] [Code]
ORION: "Vision-based Manipulation from Single Human Video with Open-World Object Graphs", arXiv May 2024. [Paper] [Webpage]
ConceptGraphs: "Open-Vocabulary 3D Scene Graphs for Perception and Planning", ICRA 2024. [Paper] [Webpage] [Code]
SparseDFF: "Sparse-View Feature Distillation for One-Shot Dexterous Manipulation", ICLR 2024. [Paper] [Webpage]
GROOT: "Learning Generalizable Manipulation Policies with Object-Centric 3D Representations", CORL 2023. [Paper] [Webpage] [Code]
Distilled Feature Fields: "Enable Few-Shot Language-Guided Manipulation", CORL 2023. [Paper] [Webpage] [Code]
SGR: "A Universal Semantic-Geometric Representation for Robotic Manipulation", CORL 2023. [Paper] [Webpage] [Code]
OVMM: "Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps", arXiv, Jun 2024. [Paper]
CLIP-Fields: "Weakly Supervised Semantic Fields for Robotic Memory", RSS 2023. [Paper] [Webpage] [Code]
NeRF in the Palm of Your Hand: "Corrective Augmentation for Robotics via Novel-View Synthesis", CVPR 2023. [Paper] [Webpage]
JCR: "Unifying Scene Representation and Hand-Eye Calibration with 3D Foundation Models", arXiv, Apr 2024. [Paper] [Code]
D3Fields: "Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation", arXiv, Sep 2023. [Paper] [Webpage] [Code]
SayPlan: "Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning", CORL 2023. [Paper] [Webpage]
Dex-NeRF: "Using a Neural Radiance field to Grasp Transparent Objects", CORL 2021. [Paper] [Webpage]
Simulations, Datasets and Benchmarks
RoboRefer: "Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics", arXiv 2025. [Paper] [Website]
The Colosseum: "A Benchmark for Evaluating Generalization for Robotic Manipulation", RSS 2024. [Paper] [Website] [Code]
OpenEQA: "Embodied Question Answering in the Era of Foundation Models", CVPR 2024. [Paper] [Website] [Code]
DROID: "A Large-Scale In-the-Wild Robot Manipulation Dataset", RSS 2024. [Paper] [Website] [Code]
RH20T: "A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot", ICRA 2024. [Paper] [Website] [Code]
Gen2Sim: "Scaling up Robot Learning in Simulation with Generative Models", ICRA 2024. [Paper] [Website] [Code]
BEHAVIOR Vision Suite: "Customizable Dataset Generation via Simulation", CVPR 2024. [Paper] [Website] [Code]
RoboCasa: "Large-Scale Simulation of Everyday Tasks for Generalist Robots", RSS 2024. [Paper] [Website] [Code]
ARNOLD: "A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes", ICCV 2023. [Paper] [Webpage] [Code]
VIMA: "General Robot Manipulation with Multimodal Prompts", ICML 2023. [Paper] [Website] [Code]
ManiSkill2: "A Unified Benchmark for Generalizable Manipulation Skills", ICLR 2023. [Paper] [Website] [Code]
Robo360: "A 3D Omnispective Multi-Material Robotic Manipulation Dataset", arXiv, Dec 2023. [Paper]
AR2-D2: "Training a Robot Without a Robot", CORL 2023. [Paper] [Website] [Code]
Habitat 2.0: "Training Home Assistants to Rearrange their Habitat", NeurIPS 2021. [Paper] [Website] [Code]
VL-Grasp: "A 6-DoF Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes", IROS 2023. [Paper] [Code]
OCID-Ref: "A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding", NAACL 2021. [Paper] [Code]
ManipulaTHOR: "A Framework for Visual Object Manipulation", CVPR 2021. [Paper] [Website] [Code]
RoboTHOR: "An Open Simulation-to-Real Embodied AI Platform", CVPR 2020. [Paper] [Website] [Code]
HabiCrowd: "A High Performance Simulator for Crowd-Aware Visual Navigation", IROS 2024. [Paper] [Website] [Code]
Citation
If you find this repository useful, please consider citing this list:
@misc{irshad2024roboticd3D,
  title = {Awesome Robotics 3D - A curated list of resources on 3D vision papers relating to robotics},
  author = {Muhammad Zubair Irshad},
  journal = {GitHub repository},
  url = {https://github.com/zubair-irshad/Awesome-Robotics-3D},
  year = {2024},
}
Owner
- Name: Zubair Irshad
- Login: zubair-irshad
- Kind: user
- Location: Silicon Valley, CA, USA
- Company: @GeorgiaTech @TRI-ML @GT-RIPL
- Website: https://zubairirshad.com
- Twitter: mzubairirshad
- Repositories: 60
- Profile: https://github.com/zubair-irshad
🔭 Machine Learning Research Scientist @ToyotaResearch 🔭 Researching 3D Vision | Scene Understanding | Embodied AI 🎓 PhD in AI and ML from @GeorgiaTech
GitHub Events
Total
- Watch event: 234
- Issue comment event: 1
- Push event: 2
- Pull request event: 7
- Fork event: 13
Last Year
- Watch event: 234
- Issue comment event: 1
- Push event: 2
- Pull request event: 7
- Fork event: 13
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 5
- Total pull requests: 18
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Total issue authors: 3
- Total pull request authors: 11
- Average comments per issue: 0.4
- Average comments per pull request: 0.56
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 18
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Issue authors: 3
- Pull request authors: 11
- Average comments per issue: 0.4
- Average comments per pull request: 0.56
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- stillonearth (1)
- Hlings (1)
- babycommando (1)
Pull Request Authors
- jiafei1224 (6)
- hq-fang (4)
- AnjieCheng (2)
- toannguyen1904 (2)
- qizekun (2)
- AmrinKareem (2)
- pvskand (2)
- huangjy-pku (2)
- yxKryptonite (2)
- GuanxingLu (2)
- rafaymhddn (1)
- RealSuSeven (1)
- Anjingkun (1)