Despite the growing momentum around 3D reconstruction and generative AI in computer vision, a critical gap remains: how do we create photorealistic, fully functional, and physically accurate 3D digital twins that are indistinguishable from their real-world counterparts and enable practical downstream applications? Bridging this gap is essential for unlocking the full potential of immersive technologies and intelligent systems. This workshop aims to address this pressing challenge by spotlighting the latest advances in 3D digital twin creation—spanning geometry, appearance, functionality, and semantic understanding—and exploring their transformative impact across Spatial and Contextual AI, Robotics, AR/VR, and Digital Content Creation. Attendees will gain insights from distinguished speakers across diverse disciplines, who will share cutting-edge research, system innovations, and real-world deployment experiences.
Alongside this year’s talks and panel discussions, we are thrilled to introduce updates to our recently released Digital Twin Catalog (DTC) dataset, including:
• A dataset explorer for intuitive, in-browser exploration of every 3D object.
• Open-source, state-of-the-art reconstruction baselines that let researchers benchmark—and build upon—the latest methods.
Our aim is to create a shared foundation for rigorous evaluation, fresh ideas, and active collaboration in 3D-digital-twin research. By uniting an easily browsable dataset with strong baseline code and clear metrics, we hope to spark innovation, cross-pollinate projects, and speed the adoption of digital-twin technology across academia and industry.
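As a concrete illustration, a downloaded DTC object can be inspected locally with off-the-shelf mesh tooling before turning to the in-browser explorer or the baseline code. The sketch below uses the trimesh library; the directory layout, folder names, and file format are assumptions for illustration only, so please consult the DTC release for the actual structure.

```python
# Minimal sketch of inspecting a downloaded DTC object locally.
# The paths and file names below are hypothetical; check the DTC
# release notes for the real directory layout and formats.
from pathlib import Path

import trimesh  # pip install trimesh

DTC_ROOT = Path("~/datasets/dtc").expanduser()      # hypothetical download location
object_dir = DTC_ROOT / "objects" / "example_mug"   # hypothetical object folder

# Load the reconstructed mesh (file name and format assumed).
mesh = trimesh.load(object_dir / "mesh.glb", force="mesh")

print(f"vertices: {len(mesh.vertices)}, faces: {len(mesh.faces)}")
print(f"watertight: {mesh.is_watertight}, bounds: {mesh.bounds.tolist()}")

# Quick local preview, analogous to the in-browser explorer.
mesh.show()
```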
Invited Speakers
Schedule
Morning Session
09:00 - 09:15
Opening Remarks and Welcome
09:15 - 09:45
Keynote Talk:
Building Models of Reality for Mixed Reality, Contextual AI and Robotics
Dr. Richard Newcombe
09:45 - 10:15
Project Aria and the Digital Twin Catalog
James Fort
10:15 - 10:30
Coffee Break
10:30 - 11:00
Learning Structured CAD Representations for 3D Digital Twins
Recent advances in neural fields have propelled the development of low-level rendering primitives such as NeRFs and 3D Gaussian splats, but these are not effective primitives for modeling, design, or interaction. To enable functional (not merely visualizable) 3D digital twins, we need structured reasoning, semantic manipulation, controllability, and reusability, all of which can be afforded by CAD models. The constructive nature of CAD models makes it natural to build and reason about them in terms of higher-level primitives and their structures/relations rather than over unstructured points/voxels. In this talk, I will first go over a series of our works that evolved from learning coordinate-based neural implicit fields to differentiable assembly of planar, and then quadric, primitives via constructive solid geometry (CSG). However, CSG assemblies are non-unique when subjected to reconstruction losses. To this end, I will present a method that first learns to construct a unique spatial (e.g., Voronoi) partitioning before primitive fitting, in order to produce boundary representations (B-Reps) from point clouds. I will then show how B-Rep generation can take a variety of inputs, including text, single-/multi-view images, and sketches, by learning a unified latent representation. Finally, to bring 3D digital twins to city scale, I will present our latest work on using architectural programs for structured 3D abstraction of buildings.
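To make the constructive idea concrete, the following minimal sketch (not the speaker's method) represents primitives as signed distance functions and combines them with min/max Booleans, the basic mechanism behind CSG assemblies; the specific primitives, CSG expression, and sampling are illustrative.

```python
# Toy CSG with signed distance functions (SDFs): primitives are combined
# with min/max Booleans, the constructive operation behind CSG assemblies.
import numpy as np

def plane(p, n, d):
    """Signed distance to the half-space n·x + d <= 0."""
    n = np.asarray(n, dtype=float)
    return p @ (n / np.linalg.norm(n)) + d

def sphere(p, center, radius):
    """Signed distance to a sphere (a simple quadric primitive)."""
    return np.linalg.norm(p - np.asarray(center, dtype=float), axis=-1) - radius

def union(a, b):     return np.minimum(a, b)
def intersect(a, b): return np.maximum(a, b)
def subtract(a, b):  return np.maximum(a, -b)

# Evaluate a CSG expression on random query points:
# a unit sphere with its top cap (z >= 0.5) cut away.
pts = np.random.uniform(-1.0, 1.0, size=(10_000, 3))
sdf = subtract(sphere(pts, (0, 0, 0), 1.0), plane(pts, (0, 0, -1), 0.5))

inside = sdf < 0.0
print(f"{inside.mean():.1%} of sampled points lie inside the CSG solid")
```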
Prof. Richard (Hao) Zhang
11:00 - 11:30
Capturing Reality End-to-End: Fast, Texture-Ready Digital Twins with Dynamics and Articulation
Building usable 3D digital twins demands methods that work with realistic inputs and produce simulation-ready assets. This talk traces a practical pipeline that starts with Dynamic Gaussians Mesh (DG-Mesh)—the first approach to extract time-consistent, high-fidelity meshes with cross-frame correspondences from a casual video of a non-rigid scene. We then move to LARM, which needs only a handful of viewpoints to recover detailed geometry, textures, and joint skeletons for complex articulated objects, and to PartUV, which turns noisy reconstructions into clean, part-aligned UV maps for texturing. Finally, we show how ManiSkill 3 ingests these assets and drives GPU-parallel physics at tens of thousands of frames per second, closing the loop from monocular capture to large-scale simulation.
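As a rough illustration of the final step, the snippet below steps a GPU-parallel ManiSkill environment following the pattern of its quickstart documentation; the environment id, keyword arguments, and step count are assumptions that may differ in the current release.

```python
# Hedged sketch of running many GPU-parallel simulation environments with
# ManiSkill; verify the env id and kwargs against the current ManiSkill docs.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (registers the ManiSkill environments)

env = gym.make(
    "PickCube-v1",                     # assumed built-in task id
    num_envs=4096,                     # many parallel envs on one GPU
    obs_mode="state",
    control_mode="pd_joint_delta_pos",
)

obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # random-policy placeholder
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```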
Dr. Xiaoshuai Zhang
11:30 - 12:00
Learning Generalizable Manipulation from Sim and Real Data
Abstract: TBD
Prof. Xiaolong Wang
12:00 - 14:00
Lunch Break
Afternoon Session
14:00 - 14:30
4D Digital Twins from Videos in the Wild
Creating 3D digital twins from real-world visual data remains a major challenge, particularly in dynamic environments or when observations are sparse. In this talk, I’ll present our recent work on a streaming 3D reconstruction framework that recovers geometry from sparse-view inputs, in-paints missing regions, and handles the reconstruction of dynamic scenes online. I’ll then introduce approaches that jointly perform reconstruction and tracking, moving toward more complete and coherent digital twins of complex, real-world environments.
Dr. Qianqian Wang
14:30 - 15:00
Feed-Forward Digital Twins
Traditional 3D reconstruction and photogrammetry systems make use of iterative optimisation techniques such as bundle adjustment and fitting neural radiance fields or 3D Gaussian Splatting. In this talk, I will discuss new, powerful neural networks that can solve these tasks in a feed-forward manner. I will first introduce VGGT, a transformer network that can perform 3D reconstruction similar to COLMAP, but do so quickly, reliably, and, most importantly, by utilising only off-the-shelf components, with no optimisation involved in post-processing. I will then discuss Twinner, a network that can reconstruct not only the 3D shape of objects, but also their materials and scene illumination. The latter allows the model to be trained on real multi-view data.
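For orientation, a feed-forward reconstruction call with VGGT looks roughly like the sketch below, based on the public facebookresearch/vggt repository; the module paths, checkpoint name, helper function, and output keys are assumptions to verify against the current code.

```python
# Hedged sketch of feed-forward reconstruction with VGGT: one forward pass,
# no bundle adjustment or per-scene optimisation.
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images

device = "cuda" if torch.cuda.is_available() else "cpu"
model = VGGT.from_pretrained("facebook/VGGT-1B").to(device).eval()

# A handful of unordered photos of the same scene; no camera poses needed.
images = load_and_preprocess_images(
    ["view_00.jpg", "view_01.jpg", "view_02.jpg"]
).to(device)

with torch.no_grad():
    predictions = model(images)  # single forward pass

# Predictions typically include camera parameters, depth maps, and point maps.
for name, value in predictions.items():
    print(name, tuple(value.shape) if hasattr(value, "shape") else type(value))
```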
Prof. Andrea Vedaldi
15:00 - 15:15
Coffee Break
15:15 - 15:45
Neural Relightable Assets
Representing, capturing and generating 3D objects has been a very active research area. However, many of the methods either bake the lighting (i.e., only solve the "novel view synthesis" problem) or assume restrictive shading models. I will offer some ways to capture and represent fully relightable 3D assets that do not make assumptions on specific shading models, and can be inserted and rendered in new scenes under new lighting conditions.
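The distinction the abstract draws can be summarized with the standard reflection equation (a textbook formulation, not specific to the talk): a baked asset stores outgoing radiance directly, whereas a relightable asset factors it into a recovered BRDF and incident lighting that can be swapped when the object is inserted into a new scene.

```latex
% Baked ("novel view synthesis" only): outgoing radiance is stored as-is.
L_o(\mathbf{x}, \omega_o)

% Relightable: radiance is recomputed from a recovered BRDF f_r under any
% new incident illumination L_i, enabling insertion under new lighting.
L_o(\mathbf{x}, \omega_o) =
  \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\,
  L_i(\mathbf{x}, \omega_i)\,
  (\mathbf{n} \cdot \omega_i)\, \mathrm{d}\omega_i
```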
Dr. Milos Hasan
15:45 - 16:15
Physics-Based Inverse Rendering
Inferring the shape and material of an object is a crucial ingredient for building digital twins of real-world objects. To this end, recent advances in physics-based differentiable rendering have enabled analysis-by-synthesis solutions capable of taking complex light transport effects (e.g., soft shadows and inter-reflections) into consideration. In this talk, I will present some of our recent works in this direction and discuss many remaining challenges.
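At its core, analysis-by-synthesis optimizes unknown scene parameters so that a differentiable rendering of them matches the observed images. The sketch below shows that loop in PyTorch; differentiable_render is a hypothetical toy stand-in, and a real physics-based renderer would simulate effects such as soft shadows and inter-reflections in its place.

```python
# Minimal analysis-by-synthesis loop: optimize material parameters so the
# rendered image matches an observation, with gradients flowing through
# the (here, toy) differentiable renderer.
import torch

def differentiable_render(albedo, roughness):
    """Placeholder: a real renderer would simulate light transport here."""
    return albedo.unsqueeze(0) * (1.0 - roughness) + 0.1

target_image = torch.full((1, 3), 0.45)               # observed photo (toy)
albedo = torch.nn.Parameter(torch.full((3,), 0.5))    # unknown material
roughness = torch.nn.Parameter(torch.tensor(0.5))     # unknown material

optimizer = torch.optim.Adam([albedo, roughness], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    rendered = differentiable_render(albedo, roughness)
    loss = torch.nn.functional.mse_loss(rendered, target_image)
    loss.backward()   # gradients flow through the differentiable renderer
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
```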
Prof. Shuang Zhao
16:15 - 17:00
Discussion Panel - 3D Digital Twin R&D across Robotics, Vision, and Graphics
This session brings together experts from all three disciplines to explore current challenges and future directions in digital-twin research. Audience questions will be collected in advance via Google Form, allowing the panel to address the most pressing topics. We’ll reserve the final 15 minutes for spontaneous, on-site questions to encourage lively, real-time interaction.
17:00 - 17:05
Conclusion
Organizers
Research Lead, Meta Reality Labs Research
Research Scientist, Meta Reality Labs Research
Research Scientist, Meta Reality Labs Research
Associate Professor, UCSD
Assistant Professor, Stanford
Professor, UCSD
Principal Scientist, Adobe
Associate Professor, Princeton
Associate Professor, UC Irvine
Assistant Professor, UPenn
Principal Scientist, Naver Labs Europe
PhD Student, Stanford
PhD Student, Stanford
Professor, Stanford