Empowering Robots to Assist Humans
Dr. Yu Xiang Works with UTD Students and Summer Campers
to Advance Robots’ Autonomy
Dr. Yu Xiang leads the Intelligent Robotics and Vision Lab (IRVL) in the Department of Computer Science in the Erik Jonsson School of Engineering and Computer Science at the University of Texas at Dallas.
The lab focuses on fundamental research in intelligent robotics and computer vision, and Dr. Xiang considers the scientific question: How can robots conduct tasks autonomously in the physical world to assist humans? For example, he is interested in building robots that can cook a meal or clean a kitchen table for people.
To deploy robots that perform complex tasks autonomously, Dr. Xiang conducts original research that enables robots to acquire skills in perception, planning, control, and learning. His approach uses vision as the robots' main sensing modality and builds robotic systems that integrate these skills. To complement this approach, he studies mechanisms that allow robots to keep improving their skills over a lifetime of operation.
Inside Dr. Xiang’s lab is a storage bin full of toy packages of common foods, such as spaghetti, ketchup, and carrots, which are used to train the lab’s robot, Ramp. Ramp is a Fetch Robotics mobile manipulator that stands about four feet tall on a round mobile base and carries a long mechanical arm with seven joints. At the end of the arm is a square “hand” with two fingers for grasping objects.
To teach Ramp how to use objects to perform tasks, Dr. Xiang’s team is exploring a technique called “learning from human demonstrations”: a human demonstrates a task to the robot, and the robot then reproduces it. A critical component of learning from demonstration is enabling the robot to understand and, ultimately, emulate the human’s behavior.
Recently, Dr. Xiang and his research students, Jikai Wang and Qifan Zhang, introduced a data capture system and a new dataset named HO-Cap, which can be used to study 3D reconstruction and pose tracking of hands and objects in videos. The capture system uses multiple RGB-D cameras and a HoloLens headset for data collection, avoiding the need for expensive 3D scanners or motion-capture systems.
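To give a sense of how such a multi-camera setup works, the sketch below shows how color and depth frames from several calibrated RGB-D cameras could be fused into a single 3D point cloud using the open-source Open3D library. This is a minimal illustration, not code from HO-Cap; the file names, camera intrinsics, and camera poses are placeholders.

```python
# Minimal sketch (not from HO-Cap): fuse frames from several calibrated
# RGB-D cameras into one point cloud with Open3D. File names, intrinsics,
# and camera poses below are placeholders for illustration only.
import numpy as np
import open3d as o3d

# Per-camera data: (color image, depth image, 4x4 camera-to-world pose).
cameras = [
    ("cam0_color.png", "cam0_depth.png", np.eye(4)),
    ("cam1_color.png", "cam1_depth.png", np.eye(4)),
]

# Placeholder pinhole intrinsics: width, height, fx, fy, cx, cy.
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 600.0, 600.0, 320.0, 240.0)

fused = o3d.geometry.PointCloud()
for color_file, depth_file, cam_to_world in cameras:
    color = o3d.io.read_image(color_file)
    depth = o3d.io.read_image(depth_file)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=1.5,
        convert_rgb_to_intensity=False)
    # Back-project each depth pixel into a 3D point in the camera frame...
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
    # ...then move the points into the shared world frame using the camera pose.
    pcd.transform(cam_to_world)
    fused += pcd

# Downsample the merged cloud before any hand/object reconstruction or pose fitting.
fused = fused.voxel_down_sample(voxel_size=0.005)
o3d.io.write_point_cloud("fused_scene.ply", fused)
```

A fused point cloud like this is the kind of intermediate representation on which the shapes and poses of hands and objects can then be estimated.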
“We introduce a semi-automatic method to obtain annotations of the shape and pose of hands and objects in the collected videos, which significantly reduces the required annotation time compared to manual labeling,” Dr. Xiang said. “With this system, we captured a video dataset of humans using objects to perform different tasks, as well as simple pick-and-place and handover of an object from one hand to the other, which can be used as human demonstrations for embodied AI and robot manipulation research.”
In summer 2024, Dr. Xiang’s lab participated in the STEM Bridge Summer Camp at UT Dallas, hosting a team of six high school students for seven weeks of hands-on research experience. Supervised by Wang and Zhang, the students collected images of hands manipulating objects in the lab and then applied the computer vision algorithms from HO-Cap to detect hands and objects and estimate their poses in the collected images. The camp concluded with a closing ceremony in which sixteen groups of students, one from each research lab, wrapped up their projects and presented their work through three-minute videos followed by five-minute group presentations (below, a video of Dr. Xiang’s lab’s presentation). Dr. Xiang’s lab placed second among the sixteen groups.