
AI model translates text commands into motion for diverse robots and avatars

2025-05-08 19:13:04

By Brown University

Researchers develop AI motion 'translation' model for controlling different kinds of robots
MotionGlot is a model that can generate motion trajectories that obey user instructions across multiple embodiments with different action dimensions, such as (a) quadruped robots and (b) humans. Figures (a, b) show a qualitative comparison of MotionGlot against the adapted templates (A.T.) of [1] on the text-to-robot motion (Section IV-A.1) and Q&A with human motion (Section IV-C) tasks, respectively. The overall quantitative performance across tasks is shown in (c). In (a, b), increasing opacity indicates forward time. Credit: arXiv (2024). DOI: 10.48550/arxiv.2410.16623

Brown University researchers have developed an artificial intelligence model that can generate movement in robots and animated figures in much the same way that AI models like ChatGPT generate text.

A paper describing this work is published on the arXiv preprint server.

The model, called MotionGlot, enables users to simply type an action, such as "walk forward a few steps and take a right," and the model generates an accurate representation of that motion to command a robot or animated avatar.

The model's key advance, according to the researchers, is its ability to "translate" motion across robot and figure types, from humanoids to quadrupeds and beyond. That enables the generation of motion for a wide range of robotic embodiments and in all kinds of spatial configurations and contexts.

"We're treating motion as simply another language," said Sudarshan Harithas, a Ph.D. student in computer science at Brown, who led the work. "And just as we can translate languages—from English to Chinese, for example—we can now translate language-based commands to corresponding actions across multiple embodiments. That enables a broad set of new applications."

The research will be presented later this month at the 2025 International Conference on Robotics and Automation in Atlanta. The work was co-authored by Harithas and his advisor, Srinath Sridhar, an assistant professor of computer science at Brown.

Large language models like ChatGPT generate text through a process called "next token prediction," which breaks language down into a series of tokens, or small chunks, like individual words or characters. Given a single token or a string of tokens, the language model makes a prediction about what the next token might be.
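
To make that concrete, here is a minimal, self-contained sketch of next-token prediction (a toy stand-in, not MotionGlot's or ChatGPT's actual code): a tiny vocabulary, a placeholder scoring function in place of a trained network, and a greedy decoding loop.

    import random

    VOCAB = ["walk", "forward", "turn", "left", "right", "stop"]

    def toy_next_token_scores(context):
        # Stand-in for a trained language model: a real model (e.g., a
        # transformer) would compute these scores from the actual tokens
        # in `context`; here they only depend on its length.
        random.seed(len(context))
        return {tok: random.random() for tok in VOCAB}

    def generate(prompt_tokens, max_new_tokens=5):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            scores = toy_next_token_scores(tokens)
            next_tok = max(scores, key=scores.get)  # greedy: pick the top-scoring token
            tokens.append(next_tok)
            if next_tok == "stop":                  # treat "stop" as end-of-sequence
                break
        return tokens

    print(generate(["walk"]))

A real model replaces the toy scorer with a neural network trained on large corpora, but the generation loop is essentially this one.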

These models have been incredibly successful in generating text, and researchers have begun using similar approaches for motion. The idea is to break down the components of motion—the discrete position of legs during the process of walking, for example—into tokens. Once the motion is tokenized, fluid movements can be generated through next token prediction.
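
As a rough sketch of what "tokenizing" motion can mean (illustrative only; MotionGlot's actual tokenizer is presumably learned, for example a vector-quantized codebook, which the article does not detail), one simple approach is to bin each continuous joint angle into a discrete level so that a pose becomes a short sequence of integers:

    import numpy as np

    NUM_BINS = 256
    ANGLE_MIN, ANGLE_MAX = -np.pi, np.pi

    def tokenize_pose(joint_angles):
        # Map continuous joint angles (radians) to integer tokens in [0, NUM_BINS).
        clipped = np.clip(joint_angles, ANGLE_MIN, ANGLE_MAX)
        scaled = (clipped - ANGLE_MIN) / (ANGLE_MAX - ANGLE_MIN)  # -> [0, 1]
        return np.minimum((scaled * NUM_BINS).astype(int), NUM_BINS - 1)

    def detokenize_pose(tokens):
        # Map integer tokens back to the center of their bin.
        centers = (tokens + 0.5) / NUM_BINS
        return centers * (ANGLE_MAX - ANGLE_MIN) + ANGLE_MIN

    pose = np.array([0.1, -1.2, 0.7, 2.0])   # one frame of joint angles
    tokens = tokenize_pose(pose)
    print(tokens, detokenize_pose(tokens))   # reconstruction is accurate to one bin

Once every frame is expressed as tokens like these, the same next-token loop shown above can generate motion instead of text.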

One challenge with this approach is that motions for one embodiment can look very different for another. For example, when a person is walking a dog down the street, the person and the dog are both doing something called "walking," but their actual motions are very different. One is upright on two legs; the other is on all fours.

According to Harithas, MotionGlot can translate the meaning of walking from one embodiment to another. So a user commanding a figure to "walk forward in a straight line" will get the correct motion output whether they happen to be commanding a humanoid figure or a robot dog.

To train their model, the researchers used two datasets, each containing hours of annotated motion data. QUAD-LOCO features dog-like quadruped robots performing a variety of actions along with rich text describing those movements. A similar dataset called QUES-CAP contains real human movement, along with detailed captions and annotations appropriate to each movement.
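
The article does not describe the datasets' exact schema, so the field names below are hypothetical, but a paired text-and-motion training example plausibly looks something like this:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class MotionSample:
        embodiment: str               # e.g. "quadruped" or "human"
        caption: str                  # natural-language description of the clip
        joint_trajectory: np.ndarray  # shape: (num_frames, action_dim)

    sample = MotionSample(
        embodiment="quadruped",
        caption="the robot walks forward, then turns right",
        joint_trajectory=np.zeros((120, 12)),  # 120 frames, 12 joint values
    )
    print(sample.embodiment, sample.caption, sample.joint_trajectory.shape)

Pairing every motion clip with rich text like this is what lets the model learn a mapping from language to motion tokens.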

Using that training data, the model reliably generates appropriate actions from text prompts, even actions it has never specifically seen before. In testing, the model was able to recreate specific instructions, like "a robot walks backwards, turns left and walks forward," as well as more abstract prompts like "a robot walks happily."

It can even use motion to answer questions. When asked, "Can you show me movement in cardio activity?" the model generates a person jogging.

"These models work best when they're trained on lots and lots of data," Sridhar said. "If we could collect large-scale data, the model can be easily scaled up."

The model's current functionality and its adaptability across embodiments make for promising applications in human-robot collaboration, gaming and virtual reality, and digital animation, the researchers say. They plan to make the model and its source code publicly available so other researchers can use it and expand on it.

More information: Sudarshan Harithas et al, MotionGlot: A Multi-Embodied Motion Generation Model, arXiv (2024). DOI: 10.48550/arxiv.2410.16623

Journal information: arXiv

Citation: AI model translates text commands into motion for diverse robots and avatars (2025, May 8) retrieved 9 May 2025 from https://techxplore.com/news/2025-05-ai-motion-kinds-robots.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.

Summary

Researchers at Brown University have developed MotionGlot, an AI model that translates text commands into motion trajectories for various robotic and animated embodiments, including humanoids and quadrupeds. This model uses next token prediction to generate fluid movements across different body types based on annotated datasets. It demonstrates the ability to create accurate motions from text prompts and can adapt to new scenarios it hasn't seen before. Potential applications include human-robot collaboration, gaming, virtual reality, and digital animation. The paper is available on arXiv (DOI: 10.48550/arxiv.2410.16623).