NVIDIA's new AI tool enables precise editing of 3D scenes and photorealistic images

2025-07-14 10:40:22

By Ingrid Fadelli, Phys.org

NVIDIA develops new AI tool to infer attributes of 3D scenes and generate photorealistic images based on specific inputs
Image showing examples of DiffusionRenderer's geometry estimates and photorealistic images with specific lighting conditions generated by the model. Credit: Liang et al, NVIDIA

Over the past few years, computer scientists have introduced increasingly sophisticated generative AI models that can produce personalized content following specific inputs or instructions. While image generation models are now widely used, many of them are unpredictable, and precisely controlling the images they create remains a challenge.

In a recent paper presented at this year's Conference on Computer Vision and Pattern Recognition (CVPR 2025), held in Nashville, June 11–15, researchers at NVIDIA introduced DiffusionRenderer, a new machine learning approach that could advance the generation and editing of images, allowing users to precisely adjust specific image attributes.

"Generative AI has made huge strides in visual creation, but it introduces an entirely new creative workflow that differs from classic graphics and still struggles with controllability," Sanja Fidler, VP of AI Research at NVIDIA and head of the Spatial Intelligence lab, told Tech Xplore.

"With DiffusionRenderer, we wanted to bridge that gap by combining the precision of traditional graphics pipelines with the flexibility of AI. Our goal is to explore and design the next generation of rendering to be more accessible, controllable, and easily integrated with existing tools."

The new approach introduced by Fidler and her colleagues can convert individual two-dimensional (2D) videos into graphics-compatible scene representations. Notably, it also allows users to adjust the lighting and materials in the representations, producing new content aligned with their needs and preferences.

Credit: NVIDIA

"DiffusionRenderer is a huge breakthrough because it solves two longtime challenges simultaneously: inverse rendering for pulling the geometry and materials from real-world videos, and forward rendering for generating photorealistic images and videos from scene representations," said Fidler.

"One of the most exciting achievements of DiffusionRenderer is that it brings generative AI to the core of graphics workflows and complements it by making traditionally time-consuming tasks like asset creation, relighting, and material editing more efficient."

The new neural rendering approach introduced by the researchers relies on diffusion models, a class of deep learning algorithms that can generate images by progressively refining random noise into coherent graphics. In contrast with other image generation techniques introduced in the past, DiffusionRenderer works by first producing G-buffers (i.e., intermediate image representations outlining specific attributes) and then using these representations to create new and realistic images.
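To make the role of G-buffers more concrete, here is a minimal sketch of classical deferred shading, assuming a G-buffer that stores only a base color (albedo) and a surface normal per pixel. It is not DiffusionRenderer's neural renderer, which the article describes only at a high level; it uses a simple Lambertian lighting model to show why separating scene attributes from lighting makes relighting straightforward. All names (`GBufferPixel`, `shade`, `render`) are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GBufferPixel:
    """Per-pixel scene attributes (hypothetical, illustrative layout)."""
    albedo: Vec3  # base color, each channel in [0, 1]
    normal: Vec3  # unit-length surface normal


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def shade(pixel: GBufferPixel, light_dir: Vec3,
          light_color: Vec3 = (1.0, 1.0, 1.0)) -> Vec3:
    """Lambertian shading: albedo * light color * max(0, n . l)."""
    l = normalize(light_dir)
    n_dot_l = max(0.0, sum(n * c for n, c in zip(pixel.normal, l)))
    return tuple(a * c * n_dot_l for a, c in zip(pixel.albedo, light_color))


def render(gbuffer: List[List[GBufferPixel]], light_dir: Vec3,
           light_color: Vec3 = (1.0, 1.0, 1.0)) -> List[List[Vec3]]:
    """Forward pass: turn a G-buffer into an image under the given light."""
    return [[shade(p, light_dir, light_color) for p in row] for row in gbuffer]


# A 1x2 "scene": a red surface facing the camera, a green surface facing right.
gbuffer = [[
    GBufferPixel(albedo=(0.8, 0.2, 0.2), normal=(0.0, 0.0, 1.0)),
    GBufferPixel(albedo=(0.2, 0.8, 0.2), normal=(1.0, 0.0, 0.0)),
]]

# Relighting reuses the same G-buffer; only the light direction changes.
front_lit = render(gbuffer, light_dir=(0.0, 0.0, 1.0))
side_lit = render(gbuffer, light_dir=(1.0, 0.0, 0.0))
```

In this toy setup the red pixel is lit only by the front light and the green pixel only by the side light. Editing a material amounts to changing an `albedo` value and rendering again, which is the kind of control the article attributes to working through intermediate scene representations.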

"We're also proud of the breakthrough we made in building a high-quality synthetic dataset with accurate lighting and materials to help the model learn to realistically decompose and reconstruct scenes," explained Fidler. "We found that the quality scales with the size of the underlying video diffusion model—meaning when we integrated with NVIDIA Cosmos, the results become even sharper and more consistent."

In the future, DiffusionRenderer could be used by both robotics researchers and creative professionals. For instance, it could prove valuable for those who are developing video games or advertisements, or producing films, as it would allow them to add, remove or edit specific attributes with high precision. It could also be used to create photorealistic data to train algorithms for robotics or image classification.

"Its other big impact could be in simulation and physical AI — robotics and AV training need the most diverse possible datasets, and DiffusionRenderer can generate new lighting conditions from new scenes," added Fidler. "We're excited to keep pushing the boundaries in this space.

"Our future work focuses on generating even higher-quality results, improving runtime efficiency, and adding more powerful features like semantic control, object compositing, and more advanced editing tools."

Written by Ingrid Fadelli, edited by Lisa Lock, and fact-checked and reviewed by Andrew Zinin.

More information: DiffusionRenderer: Neural inverse and forward rendering with video diffusion models. arXiv:2501.18590 [cs.CV]. arxiv.org/abs/2501.18590

research.nvidia.com/labs/toron … i/DiffusionRenderer/

© 2025 Science X Network

Citation: NVIDIA's new AI tool enables precise editing of 3D scenes and photorealistic images (2025, July 14) retrieved 15 July 2025 from https://techxplore.com/news/2025-07-nvidia-ai-tool-enables-precise.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.


Summary

Researchers at NVIDIA introduced DiffusionRenderer, a new machine learning approach presented at CVPR 2025, aimed at advancing image generation and editing with precise control over attributes like lighting and materials. This method combines traditional graphics precision with AI flexibility, enabling users to convert 2D videos into graphics-compatible scenes and adjust elements for photorealistic outcomes. The technique uses diffusion models to generate images from G-buffers, addressing longstanding challenges in computer graphics such as inverse and forward rendering. Future applications include enhancing datasets for robotics training and improving efficiency in creative industries like videogame development and film production.