My current research interests lie in 3D Computer Vision, Multi-Modal Learning, and Robot Learning. Specifically, I am interested in category-level 6-DoF pose estimation and tracking, 3D shape reconstruction, and multi-modal learning (3D video + language). I am also interested in diffusion-based generative models for 3D vision, generalist robot policies, and their applications to real-world robotic tasks.
Our method achieves real-time, causal 6-DoF pose tracking while reconstructing the object's 3D shape from the current observation. It enables zero-shot inference on unseen objects from known categories and also demonstrates zero-shot capability on unseen objects from unknown categories.
Developing a real-time category-level 6-DoF object pose tracker that can be applied to aerial manipulation without using any pre-defined object CAD models.
We focus on category-level multi-object 9-dimensional (9D) state tracking from a point cloud stream, and propose a novel 9D state estimation network with Kalman-based state optimization to estimate the 6-DoF pose and 3D size of each instance in the scene.
Introducing a diffusion-driven self-supervised network for multi-object shape reconstruction and categorical pose estimation that leverages only shape priors.
We propose a pixel-level noise mining framework for robust salient object detection that exploits the model's own knowledge, without the need for external models.
We propose a dynamic-static parallel network for dynamic body gestures and a spatiotemporal graph attention module to improve graph data fusion in the dynamic-static network. Finally, we implement a command module that combines body and hand information into complete commands for interacting with and controlling the mobile robot.
We propose a novel gravitational discriminative optimization (GDO) method based on a multiview reconstruction framework for shape surface reconstruction. It consists of a training phase and a reconstruction phase.