by Ingrid Fadelli, Phys.org
In recent years, computer scientists have introduced increasingly sophisticated generative AI models that can produce personalized content from specific inputs or instructions. While image generation models are now widely used, many remain unpredictable, and precisely controlling the images they create is still a challenge.
In a recent paper presented at this year's Conference on Computer Vision and Pattern Recognition (CVPR 2025), held in Nashville, June 11–15, researchers at NVIDIA introduced DiffusionRenderer, a new machine learning approach that could advance the generation and editing of images, allowing users to precisely adjust specific image attributes.
"Generative AI has made huge strides in visual creation, but it introduces an entirely new creative workflow that differs from classic graphics and still struggles with controllability," Sanja Fidler, VP of AI Research at NVIDIA and head of the Spatial Intelligence lab, told Tech Xplore.
"With DiffusionRenderer, we wanted to bridge that gap by combining the precision of traditional graphics pipelines with the flexibility of AI. Our goal is to explore and design the next generation of rendering to be more accessible, controllable, and easily integrated with existing tools."
The new approach introduced by Fidler and her colleagues can convert individual two-dimensional (2D) videos into graphics-compatible scene representations. Notably, it also allows users to adjust the lighting and materials in the representations, producing new content aligned with their needs and preferences.
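For intuition, here is a minimal sketch of what such a graphics-compatible scene representation might contain: per-frame G-buffers for geometry and material attributes, plus an editable light source. The field names, array shapes, and the `relight` helper are illustrative assumptions for this article, not NVIDIA's actual data format.

```python
# Hypothetical sketch of a graphics-compatible scene representation of the
# kind DiffusionRenderer estimates from 2D video. Names and shapes are
# illustrative assumptions only.
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameGBuffers:
    normals: np.ndarray    # (H, W, 3) per-pixel surface normals
    albedo: np.ndarray     # (H, W, 3) base color with lighting factored out
    roughness: np.ndarray  # (H, W) microfacet roughness
    metallic: np.ndarray   # (H, W) metalness
    depth: np.ndarray      # (H, W) per-pixel depth

@dataclass
class EditableScene:
    frames: list           # one FrameGBuffers per video frame
    env_light: np.ndarray  # (He, We, 3) HDR environment map for the lighting

    def relight(self, new_env: np.ndarray) -> "EditableScene":
        # A lighting edit just swaps the environment map; a forward
        # renderer then re-synthesizes the video from the edited scene.
        return EditableScene(self.frames, new_env)
```

Because lighting and materials live in explicit, separate buffers rather than being baked into pixels, edits like relighting reduce to swapping one component and re-rendering.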
"DiffusionRenderer is a huge breakthrough because it solves two longtime challenges in computer graphics simultaneously — inverse rendering for pulling the geometry and materials from real-world videos, and forward rendering for generating photorealistic images and videos from scene representations," said Fidler.
"One of the most exciting achievements of DiffusionRenderer is that it brings generative AI to the core of graphics workflows and complements it by making traditionally time-consuming tasks like asset creation, relighting, and material editing more efficient."
The new neural rendering approach introduced by the researchers relies on diffusion models, a class of deep learning algorithms that generate images by progressively refining random noise into coherent graphics. Unlike earlier image generation techniques, DiffusionRenderer works by first producing G-buffers (intermediate image representations that encode per-pixel scene attributes such as geometry and materials) and then using these representations to create new, realistic images.
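As a rough illustration of that two-stage design, the sketch below first runs an inverse-rendering diffusion pass to estimate G-buffers from the input frames, then a forward-rendering pass that denoises random noise into new frames conditioned on those G-buffers and a chosen lighting. Every name and parameter here (`denoise`, `render_pipeline`, the 9-channel G-buffer shape) is a hypothetical stand-in with a deliberately simplified denoising update, not the released model's API.

```python
import torch

def denoise(model, condition, steps=50, shape=(3, 512, 512)):
    """Generic diffusion sampling loop: start from random noise and
    progressively refine it into an image, guided by a condition."""
    x = torch.randn(1, *shape)
    for t in reversed(range(steps)):
        predicted_noise = model(x, t, condition)  # model predicts residual noise
        x = x - predicted_noise / steps           # simplified denoising update
    return x

def render_pipeline(inverse_model, forward_model, frames, target_lighting):
    # Stage 1 -- inverse rendering: estimate G-buffers (e.g., normals,
    # albedo, roughness, metallic, depth) from each input video frame.
    gbuffers = [denoise(inverse_model, condition=f, shape=(9, 512, 512))
                for f in frames]
    # Stage 2 -- forward rendering: denoise fresh noise into photorealistic
    # frames, conditioned on the G-buffers and the user-chosen lighting.
    return [denoise(forward_model, condition=(g, target_lighting))
            for g in gbuffers]
```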
"We're also proud of the breakthrough we made in building a high-quality synthetic dataset with accurate lighting and materials to help the model learn to realistically decompose and reconstruct scenes," explained Fidler. "We found that the quality scales with the size of the underlying video diffusion model—meaning when we integrated with NVIDIA Cosmos, the results become even sharper and more consistent."
In the future, DiffusionRenderer could be used by both robotics researchers and creative professionals. For instance, it could prove valuable for content creators developing video games, producing films, or making advertisements, as it would allow them to add, remove, or edit specific attributes with high precision. Computer scientists could also use it to create photorealistic data for training robotics or image classification algorithms.
"Its other big impact could be in simulation and physical AI — robotics and AV training need the most diverse possible datasets, and DiffusionRenderer can generate new lighting conditions from new scenes," added Fidler. "We're excited to keep pushing the boundaries in this space.
"Our future work focuses on generating even higher-quality results, improving runtime efficiency, and adding more powerful features like semantic control, object compositing, and more advanced editing tools."
More information: DiffusionRenderer: Neural inverse and forward rendering with video diffusion models. arXiv:2501.18590 [cs.CV]. arxiv.org/abs/2501.18590
© 2025 Science X Network
Citation: NVIDIA's new AI tool enables precise editing of 3D scenes and photorealistic images (2025, July 14) retrieved 15 July 2025 from https://techxplore.com/news/2025-07-nvidia-ai-tool-enables-precise.html