Mingtong Zhang
I am building general-purpose robots.
My research interests are in computer vision and robotics, with a focus on the generalization and scalability of AI.
I explore how to represent our physical world and develop general models and algorithms that enable artificial intelligence to perceive and interact with it.
Email / GScholar / Github / Twitter
News
- Jan. 2025: Gave a talk at Peking University and met old and new friends!
I am into engineering and mechanics.
Publications
Learning from Massive Human Videos for Universal Humanoid Pose Control
Jiageng Mao*, Siheng Zhao*, Siqi Song*, Tianheng Shi, Junjie Ye, Mingtong Zhang, Haoran Geng, Jitendra Malik, Vitor Campagnolo Guizilini, Yue Wang
[Website]
[Paper]
[Code]
We introduce Humanoid-X, a large-scale dataset, and UH-1, a model that translates text instructions into humanoid robot actions, leveraging internet-scale data, video captions, motion retargeting, and policy learning for real-world deployment.
Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling
Mingtong Zhang*,
Kaifeng Zhang*,
Yunzhu Li
Conference on Robot Learning (CoRL), 2024
[Website]
[Paper]
[Code]
[Demo]
We build a structured world model for deformable objects. Our approach creates a neural real-to-sim digital twin of real objects from real-world interactions and observations.
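For readers curious about the mechanics, here is a minimal, self-contained sketch of graph-based neural dynamics under my own assumptions (not the paper's implementation): tracked 3D particles are connected by a k-NN graph, and a small message-passing network predicts their next-frame positions. The GraphDynamics module, its layer sizes, and the k-NN construction are illustrative.

```python
# Illustrative sketch of graph-based neural dynamics (assumed structure, not
# the released code): a k-NN graph over tracked particles plus a small
# message-passing network that predicts per-particle displacements.
import torch
import torch.nn as nn


class GraphDynamics(nn.Module):
    def __init__(self, hidden=64, k=5):
        super().__init__()
        self.k = k
        self.edge_mlp = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(nn.Linear(hidden + 3, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 3))  # per-particle displacement

    def forward(self, pos):                          # pos: (N, 3) particle positions
        dist = torch.cdist(pos, pos)                 # pairwise distances, (N, N)
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]  # (N, k), skip self
        neighbors = pos[knn]                         # (N, k, 3)
        rel = neighbors - pos.unsqueeze(1)           # relative offsets to neighbors
        edge_feat = torch.cat([rel, pos.unsqueeze(1).expand_as(rel)], dim=-1)
        msg = self.edge_mlp(edge_feat).mean(dim=1)   # aggregate messages per particle
        delta = self.node_mlp(torch.cat([msg, pos], dim=-1))
        return pos + delta                           # predicted next-frame positions


model = GraphDynamics()
particles = torch.rand(128, 3)     # e.g. centers of tracked 3D Gaussians
next_particles = model(particles)  # one-step rollout
```

In the actual work, these particles come from dynamic 3D Gaussian tracking of real-world interactions, per the title and summary above.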
D3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement
Yixuan Wang*,
Mingtong Zhang*,
Zhuoran Li*,
Tarik Kelestemur,
Katherine Driggs-Campbell,
Jiajun Wu,
Li Fei-Fei,
Yunzhu Li
Conference on Robot Learning (CoRL), 2024
Oral
[Website]
[Paper]
[Code]
We propose an implicit representation that incorporates the 3D geometry, dynamics, and semantics of scenes to enable generalizable robotic manipulation.
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Zixian Liu*,
Mingtong Zhang*,
Yunzhu Li
Conference on Robot Learning (CoRL) 2024, LangRob Workshop (Spotlight)
International Conference on Robotics and Automation (ICRA), 2025
[Website]
We propose keypoints as a unified interface: they serve as the state representation for learning object dynamics and as visual prompts for vision-language models, enabling zero-shot, open-vocabulary robotic manipulation.
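As a rough illustration of the keypoint interface described above (my paraphrase, not the released code): detected keypoints are labeled and placed into a prompt for a vision-language model, which returns target positions for them, and a learned dynamics model then scores candidate actions against those targets. build_prompt, plan_action, and the dummy dynamics below are hypothetical placeholders.

```python
# Hypothetical sketch of keypoints as a shared interface between a VLM and a
# learned dynamics model; names and the toy dynamics are assumptions.
import numpy as np


def build_prompt(instruction, keypoints):
    # Label each detected keypoint so the VLM can refer back to it by index.
    lines = [f"P{i}: keypoint at pixel {tuple(int(v) for v in kp)}"
             for i, kp in enumerate(keypoints)]
    return (instruction + "\n" + "\n".join(lines)
            + "\nReturn a target pixel for each keypoint, e.g. 'P0 -> (120, 80)'.")


def plan_action(keypoints, targets, dynamics, candidate_actions):
    # Choose the action whose predicted keypoint motion best matches the targets.
    costs = [np.linalg.norm(dynamics(keypoints, a) - targets)
             for a in candidate_actions]
    return candidate_actions[int(np.argmin(costs))]


# Toy usage with a dummy dynamics model that shifts all keypoints by the action.
keypoints = np.array([[100.0, 60.0], [140.0, 90.0]])
targets = np.array([[120.0, 80.0], [160.0, 110.0]])
actions = [np.array([0.0, 0.0]), np.array([20.0, 20.0]), np.array([-10.0, 5.0])]
print(build_prompt("Push the cube to the right.", keypoints))
print("chosen action:", plan_action(keypoints, targets, lambda kps, a: kps + a, actions))
```

Keeping both the VLM's output and the dynamics model's state in the same keypoint space is what lets the two components be unified, as the title suggests.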
Neural Dynamics Augmented Diffusion Policy
Ruihai Wu*,
Haozhe Chen*,
Mingtong Zhang*,
Haoran Lu,
Yitong Li,
Yunzhu Li
International Conference on Robotics and Automation (ICRA), 2025
[Website]
We propose neural-dynamics-augmented imitation learning that covers a wide range of scene configurations from only a few demonstrations.
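As a hedged sketch of the augmentation idea (inferred from the one-line summary above, not the actual pipeline): perturb a demonstration's initial object state and replay its actions through a learned neural dynamics model to synthesize additional trajectories for policy training. augment_demo and the toy dynamics are placeholders.

```python
# Illustrative data augmentation with a learned dynamics model (assumed scheme):
# perturb the demo's initial state and re-roll the demonstrated actions to get
# extra (state, action) pairs covering more scene configurations.
import numpy as np


def augment_demo(demo_states, demo_actions, dynamics, n_aug=10, noise=0.02, seed=None):
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(n_aug):
        state = demo_states[0] + rng.normal(scale=noise, size=demo_states[0].shape)
        traj = []
        for action in demo_actions:            # replay the demonstrated actions
            traj.append((state.copy(), action))
            state = dynamics(state, action)    # model-predicted next state
        augmented.append(traj)
    return augmented                           # extra trajectories for policy training


# Toy usage: 2-D object state, dynamics that simply applies the action as a shift.
demo_states = np.zeros((5, 2))
demo_actions = [np.array([0.1, 0.0])] * 4
extra = augment_demo(demo_states, demo_actions, lambda s, a: s + a, n_aug=3)
print(len(extra), "augmented trajectories of length", len(extra[0]))
```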
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration
International Conference on Robotics and Automation (ICRA), 2024
Best Paper Award
[Project]
[Paper]
[Blogpost]
[Code]
[Data]
Scaling up learning across many different robot types.
Beyond RGB: Scene-Property Synthesis with Neural Radiance Fields
Mingtong Zhang*,
Shuhong Zheng*,
Zhipeng Bao,
Martial Hebert,
Yu-Xiong Wang
European Conference on Computer Vision (ECCV) Workshop, 2022
[arXiv]
We propose a new approach to connect generative learning and discriminative learning through neural fields.
Service
- Reviewer: CoRL, ICLR, CVPR
Simulately: Handy information and resources for physics simulators for robot learning research.
Haoran Geng,
Yuyang Li,
Yuzhe Qin,
Ran Gong,
Wensi Ai,
Yuanpei Chen,
Puhao Li,
Junfeng Ni,
Zhou Xian,
Songlin Wei,
Yang You,
Yufei Ding,
Jialiang Zhang,
Mingtong Zhang
Open-source Project
Selected as course material for CMU 16-831
[Website]