Mingtong Zhang
I am building general-purpose robots.
My research interests are in computer vision and robotics, with a focus on the generalization and scalability of AI.
I explore how to represent our physical world and develop general models and algorithms that enable artificial intelligence to perceive and interact with it.
Email / GScholar / Github / Twitter
News
- Jan. 2025: Gave a talk at Peking University and met old and new friends!
I am into engineering and mechanics.
Publications
Learning from Massive Human Videos for Universal Humanoid Pose Control
Jiageng Mao*, Siheng Zhao*, Siqi Song*, Tianheng Shi, Junjie Ye, Mingtong Zhang, Haoran Geng, Jitendra Malik, Vitor Campagnolo Guizilini, Yue Wang
[Website]
[Paper]
[Code]
We introduce Humanoid-X, a large-scale dataset, and UH-1, a model that translates text instructions into humanoid robot actions, leveraging internet-scale data, video captions, motion retargeting, and policy learning for real-world deployment.
Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling
Mingtong Zhang*,
Kaifeng Zhang*,
Yunzhu Li
Conference on Robot Learning (CoRL), 2024
[Website]
[Paper]
[Code]
[Demo]
We build a structured world model for deformable objects. Our approach creates a neural real-to-sim digital twin of real objects from real-world interactions and observations.
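For readers curious about the mechanics, here is a minimal, self-contained sketch of graph-based neural dynamics under my own assumptions (not the paper's implementation): tracked 3D particles are connected by a k-NN graph, and a small message-passing network predicts their next-frame positions. The GraphDynamics module, its layer sizes, and the k-NN construction are illustrative.

```python
# Illustrative sketch of graph-based neural dynamics (assumed structure, not
# the released code): a k-NN graph over tracked particles plus a small
# message-passing network that predicts per-particle displacements.
import torch
import torch.nn as nn


class GraphDynamics(nn.Module):
    def __init__(self, hidden=64, k=5):
        super().__init__()
        self.k = k
        self.edge_mlp = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(nn.Linear(hidden + 3, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 3))  # per-particle displacement

    def forward(self, pos):                          # pos: (N, 3) particle positions
        dist = torch.cdist(pos, pos)                 # pairwise distances, (N, N)
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]  # (N, k), skip self
        neighbors = pos[knn]                         # (N, k, 3)
        rel = neighbors - pos.unsqueeze(1)           # relative offsets to neighbors
        edge_feat = torch.cat([rel, pos.unsqueeze(1).expand_as(rel)], dim=-1)
        msg = self.edge_mlp(edge_feat).mean(dim=1)   # aggregate messages per particle
        delta = self.node_mlp(torch.cat([msg, pos], dim=-1))
        return pos + delta                           # predicted next-frame positions


model = GraphDynamics()
particles = torch.rand(128, 3)     # e.g. centers of tracked 3D Gaussians
next_particles = model(particles)  # one-step rollout
```

In the actual work, these particles come from dynamic 3D Gaussian tracking of real-world interactions, per the title and summary above.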
D3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement
Yixuan Wang*,
Mingtong Zhang*,
Zhuoran Li*,
Tarik Kelestemur,
Katherine Driggs-Campbell,
Jiajun Wu,
Li Fei-Fei,
Yunzhu Li
Conference on Robot Learning (CoRL), 2024
Oral
[Website]
[Paper]
[Code]
We propose an implicit representation that incorporates the 3D geometry, dynamics, and semantics of scenes to enable generalizable robotic manipulation.
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Zixian Liu*,
Mingtong Zhang*,
Yunzhu Li
Conference on Robot Learning (CoRL) 2024, LangRob Workshop (Spotlight)
International Conference on Robotics and Automation (ICRA), 2025
[Website]
We propose keypoints as a unified interface: they serve as the state representation for learning object dynamics and as visual prompts for vision-language models, enabling zero-shot, open-vocabulary robotic manipulation.
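As a rough illustration of the keypoint interface described above (my paraphrase, not the released code): detected keypoints are labeled and placed into a prompt for a vision-language model, which returns target positions for them, and a learned dynamics model then scores candidate actions against those targets. build_prompt, plan_action, and the dummy dynamics below are hypothetical placeholders.

```python
# Hypothetical sketch of keypoints as a shared interface between a VLM and a
# learned dynamics model; names and the toy dynamics are assumptions.
import numpy as np


def build_prompt(instruction, keypoints):
    # Label each detected keypoint so the VLM can refer back to it by index.
    lines = [f"P{i}: keypoint at pixel {tuple(int(v) for v in kp)}"
             for i, kp in enumerate(keypoints)]
    return (instruction + "\n" + "\n".join(lines)
            + "\nReturn a target pixel for each keypoint, e.g. 'P0 -> (120, 80)'.")


def plan_action(keypoints, targets, dynamics, candidate_actions):
    # Choose the action whose predicted keypoint motion best matches the targets.
    costs = [np.linalg.norm(dynamics(keypoints, a) - targets)
             for a in candidate_actions]
    return candidate_actions[int(np.argmin(costs))]


# Toy usage with a dummy dynamics model that shifts all keypoints by the action.
keypoints = np.array([[100.0, 60.0], [140.0, 90.0]])
targets = np.array([[120.0, 80.0], [160.0, 110.0]])
actions = [np.array([0.0, 0.0]), np.array([20.0, 20.0]), np.array([-10.0, 5.0])]
print(build_prompt("Push the cube to the right.", keypoints))
print("chosen action:", plan_action(keypoints, targets, lambda kps, a: kps + a, actions))
```

Keeping both the VLM's output and the dynamics model's state in the same keypoint space is what lets the two components be unified, as the title suggests.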
Neural Dynamics Augmented Diffusion Policy
Ruihai Wu*,
Haozhe Chen*,
Mingtong Zhang*,
Haoran Lu,
Yitong Li,
Yunzhu Li
International Conference on Robotics and Automation (ICRA), 2025
[Website]
We propose neural-dynamics-augmented imitation learning that covers a wide range of scene configurations from only a few demonstrations.
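As a hedged sketch of the augmentation idea (inferred from the one-line summary above, not the actual pipeline): perturb a demonstration's initial object state and replay its actions through a learned neural dynamics model to synthesize additional trajectories for policy training. augment_demo and the toy dynamics are placeholders.

```python
# Illustrative data augmentation with a learned dynamics model (assumed scheme):
# perturb the demo's initial state and re-roll the demonstrated actions to get
# extra (state, action) pairs covering more scene configurations.
import numpy as np


def augment_demo(demo_states, demo_actions, dynamics, n_aug=10, noise=0.02, seed=None):
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(n_aug):
        state = demo_states[0] + rng.normal(scale=noise, size=demo_states[0].shape)
        traj = []
        for action in demo_actions:            # replay the demonstrated actions
            traj.append((state.copy(), action))
            state = dynamics(state, action)    # model-predicted next state
        augmented.append(traj)
    return augmented                           # extra trajectories for policy training


# Toy usage: 2-D object state, dynamics that simply applies the action as a shift.
demo_states = np.zeros((5, 2))
demo_actions = [np.array([0.1, 0.0])] * 4
extra = augment_demo(demo_states, demo_actions, lambda s, a: s + a, n_aug=3)
print(len(extra), "augmented trajectories of length", len(extra[0]))
```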
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration
International Conference on Robotics and Automation (ICRA), 2024
Best Paper Award
[Project]
[Paper]
[Blogpost]
[Code]
[Data]
Scaling up learning across many different robot types.
Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields
Mingtong Zhang*,
Shuhong Zheng*,
Zhipeng Bao,
Martial Hebert,
Yu-Xiong Wang
European Conference on Computer Vision (ECCV) Workshop, 2022
[arXiv]
We propose a new approach to connect generative learning and discriminative learning through neural fields.
Service
- Reviewer: CoRL, ICLR, CVPR
Simulately: Handy information and resources for physics simulators for robot learning research.
Haoran Geng,
Yuyang Li,
Yuzhe Qin,
Ran Gong,
Wensi Ai,
Yuanpei Chen,
Puhao Li,
Junfeng Ni,
Zhou Xian,
Songlin Wei,
Yang You,
Yufei Ding,
Jialiang Zhang,
Mingtong Zhang
Open-source Project
Selected as course material for CMU 16-831
[Website]