
3D Digital Twin: Progress, Challenges, and Future Directions. CVPR 2025 Workshop, Nashville, TN: June 12, 2025, 09:00–17:00

3D Digital Twin: Progress, Challenges, and Future Directions

06/12/2025 9:00am - 5:00pm. Room: 102 B

Despite the growing momentum around 3D reconstruction and generative AI in computer vision, a critical gap remains: how do we create photorealistic, fully functional, and physically accurate 3D digital twins that are indistinguishable from their real-world counterparts and enable practical downstream applications? Bridging this gap is essential for unlocking the full potential of immersive technologies and intelligent systems. This workshop aims to address this pressing challenge by spotlighting the latest advances in 3D digital twin creation—spanning geometry, appearance, functionality, and semantic understanding—and exploring their transformative impact across Spatial and Contextual AI, Robotics, AR/VR, and Digital Content Creation. Attendees will gain insights from distinguished speakers across diverse disciplines, who will share cutting-edge research, system innovations, and real-world deployment experiences.

Alongside this year’s talks and panel discussions, we are thrilled to introduce updates to our recently released Digital Twin Catalog (DTC) dataset, including:

• A dataset explorer for intuitive, in-browser exploration of every 3D object.
• Open-source, state-of-the-art reconstruction baselines that let researchers benchmark—and build upon—the latest methods.

Our aim is to create a shared foundation for rigorous evaluation, fresh ideas, and active collaboration in 3D-digital-twin research. By uniting an easily browsable dataset with strong baseline code and clear metrics, we hope to spark innovation, cross-pollinate projects, and speed the adoption of digital-twin technology across academia and industry.

Schedule


Morning Session


09:00 - 09:15

Opening Remarks and Welcome



09:15 - 09:45

Keynote Talk:

Building Models of Reality for Mixed Reality, Contextual AI and Robotics


Dr. Richard Newcombe


09:45 - 10:15


Project Aria and the Digital Twin Catalog

  • Introduction of DTC dataset
    • Dataset composition
    • New online dataset explorer
    • Open-source SOTA 3D object reconstruction baseline methods
  • Introduction of Aria Gen 2

James Fort



10:15 - 10:30

Coffee Break


10:30 - 11:00


Learning Structured CAD Representations for 3D Digital Twins

Recent advances in neural fields have propelled the development of low-level rendering primitives such as NeRFs and 3D Gaussian splats, but these are not effective primitives for modeling, design, or interaction. To enable functional (not merely visualizable) 3D digital twins, we need structured reasoning, semantic manipulation, controllability, and reusability, all of which CAD models can afford. The constructive nature of CAD models makes it natural to build and reason about them in terms of higher-level primitives and their structures/relations rather than over unstructured points/voxels. In this talk, I will first go over a series of our works that evolved from learning coordinate-based neural implicit fields to differentiable assembly of plane, and then quadric, primitives via constructive solid geometry (CSG). However, CSG assemblies are non-unique when subjected to reconstruction losses. To this end, I will present a method that first learns to construct a unique spatial (e.g., Voronoi) partitioning before primitive fitting, to produce boundary representations (B-Reps) from point clouds. I will then show how B-Rep generation can take on a variety of inputs, including text, single-/multi-view images, and sketches, by learning a unified latent representation. Finally, to bring 3D digital twins to city scale, I will present our latest work on using architectural programs for structured 3D abstraction of buildings.


Prof. Richard (Hao) Zhang
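
To make the constructive idea concrete, here is a minimal, generic CSG sketch (an illustration of the paradigm, not the speaker's method): primitives are modeled as signed distance functions, so Boolean assembly reduces to pointwise min/max, and soft relaxations of exactly these min/max operations are what make the assembly differentiable.

```python
# Minimal CSG-on-SDFs sketch: each primitive is a signed distance
# function (negative inside, positive outside); Boolean assembly
# becomes pointwise min/max. Illustrative only; CAD/B-Rep systems
# operate on structured parametric surfaces, not sampled SDFs.
import numpy as np

def sphere(center, radius):
    return lambda p: np.linalg.norm(p - center, axis=-1) - radius

def box(center, half_size):
    def sdf(p):
        q = np.abs(p - center) - half_size
        return (np.linalg.norm(np.maximum(q, 0.0), axis=-1)
                + np.minimum(q.max(axis=-1), 0.0))
    return sdf

union        = lambda a, b: (lambda p: np.minimum(a(p), b(p)))
intersection = lambda a, b: (lambda p: np.maximum(a(p), b(p)))
difference   = lambda a, b: (lambda p: np.maximum(a(p), -b(p)))

# A classic assembly: unit box with a sphere carved out of it.
shape = difference(box(np.zeros(3), np.full(3, 1.0)),
                   sphere(np.zeros(3), 1.1))

# Occupancy query on a coarse grid (inside wherever the SDF < 0).
xs = np.linspace(-1.5, 1.5, 32)
grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
inside = shape(grid.reshape(-1, 3)) < 0
print(f"occupied cells: {inside.sum()} / {inside.size}")
```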


11:00 - 11:30


Capturing Reality End-to-End: Fast, Texture-Ready Digital Twins with Dynamics and Articulation

Building usable 3D digital twins demands methods that work with realistic inputs and produce simulation-ready assets. This talk traces a practical pipeline that starts with Dynamic Gaussians Mesh (DG-Mesh)—the first approach to extract time-consistent, high-fidelity meshes with cross-frame correspondences from a casual video of a non-rigid scene. We then move to LARM, which needs only a handful of viewpoints to recover detailed geometry, textures, and joint skeletons for complex articulated objects, and to PartUV, which turns noisy reconstructions into clean, part-aligned UV maps for texturing. Finally, we show how ManiSkill 3 ingests these assets and drives GPU-parallel physics at tens of thousands of frames per second, closing the loop from monocular capture to large-scale simulation.


Dr. Xiaoshuai Zhang
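
As a taste of the final stage of this pipeline, ManiSkill 3 exposes GPU-parallel simulation through a standard Gymnasium interface. The sketch below follows ManiSkill's public examples; the environment ID and keyword arguments are assumptions that may differ across releases.

```python
# Hedged sketch of GPU-parallel simulation in ManiSkill 3: many
# environments stepped in one batched call. The env ID and kwargs
# follow the project's public examples and may change between versions.
import gymnasium as gym
import mani_skill.envs  # noqa: F401  (importing registers the environments)

env = gym.make("PickCube-v1", num_envs=1024, obs_mode="state")
obs, info = env.reset(seed=0)
for _ in range(100):
    actions = env.action_space.sample()  # batched actions, one per env
    obs, reward, terminated, truncated, info = env.step(actions)
env.close()
```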


11:30 - 12:00

Learning Generalizable Manipulation from Sim and Real Data

Abstract: TBD

Prof. Xiaolong Wang



12:00 - 14:00

Lunch Break



Afternoon Session


14:00 - 14:30


4D Digital Twins from Videos in the Wild

Creating 3D digital twins from real-world visual data remains a major challenge, particularly in dynamic environments or when observations are sparse. In this talk, I’ll present our recent work on a streaming 3D reconstruction framework that recovers geometry from sparse-view inputs, inpaints missing regions, and handles the reconstruction of dynamic scenes online. I’ll then introduce approaches that jointly perform reconstruction and tracking, moving toward more complete and coherent digital twins of complex, real-world environments.


Dr. Qianqian Wang


14:30 - 15:00


Feed-Forward Digital Twins

Traditional 3D reconstruction and photogrammetry systems make use of iterative optimisation techniques such as bundle adjustment and fitting neural radiance fields or 3D Gaussian Splatting. In this talk, I will discuss new, powerful neural networks that can solve these tasks in a feed-forward manner. I will first introduce VGGT, a transformer network that can perform 3D reconstruction similar to COLMAP, but do so quickly, reliably, and, most importantly, by utilising only off-the-shelf components, with no optimisation involved in post-processing. I will then discuss Twinner, a network that can reconstruct not only the 3D shape of objects, but also their materials and scene illumination. The latter allows the model to be trained on real multi-view data.


Prof. Andrea Vedaldi
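
At the usage level, "feed-forward" means a single batched forward pass replaces per-scene optimization. The sketch below follows the example usage published with the VGGT repository; the module paths and checkpoint name should be treated as assumptions rather than a stable API.

```python
# Sketch of feed-forward reconstruction with VGGT: one forward pass,
# no per-scene optimization. Module paths and the checkpoint name
# follow the public repository's example and may change.
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images

device = "cuda" if torch.cuda.is_available() else "cpu"
model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)

images = load_and_preprocess_images(["view1.png", "view2.png"]).to(device)
with torch.no_grad():
    predictions = model(images)  # cameras, depth maps, and 3D points in one pass
```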



15:00 - 15:15

Coffee Break


15:15 - 15:45


Neural Relightable Assets

Representing, capturing, and generating 3D objects has been a very active research area. However, many existing methods either bake in the lighting (i.e., they only solve the "novel view synthesis" problem) or assume restrictive shading models. I will offer some ways to capture and represent fully relightable 3D assets that make no assumptions about specific shading models and can be inserted and rendered in new scenes under new lighting conditions.

Dr. Milos Hasan
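
For context (standard rendering background rather than a result from the talk), relighting means re-evaluating the reflection equation under new incident illumination, which requires recovering the surface reflectance instead of baking outgoing radiance:

```latex
% Reflected radiance at surface point x, viewing direction \omega_o.
% f_r is the BRDF to be recovered; L_i is the incident illumination,
% which relighting replaces with a new environment.
L_o(x, \omega_o) = \int_{\Omega} f_r(x, \omega_i, \omega_o)\,
    L_i(x, \omega_i)\,(n_x \cdot \omega_i)\,\mathrm{d}\omega_i
```

Novel-view synthesis only needs L_o along new rays, so baked-lighting representations suffice there; changing L_i for a new scene is precisely what they cannot support.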


15:45 - 16:15


Physics-Based Inverse Rendering

Inferring the shape and material of an object is a crucial ingredient for building digital twins of real-world objects. To this end, recent advances in physics-based differentiable rendering have enabled analysis-by-synthesis solutions capable of taking complex light transport effects (e.g., soft shadows and inter-reflections) into consideration. In this talk, I will present some of our recent works in this direction and discuss many remaining challenges.

Prof. Shuang Zhao
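
To make analysis-by-synthesis concrete, here is a deliberately tiny sketch in which a Lambertian shading model stands in for a full physics-based differentiable renderer (so no soft shadows or inter-reflections): render with the current parameter guesses, compare against the observation, and descend the gradient through the renderer.

```python
# Toy analysis-by-synthesis: recover an albedo and light direction by
# gradient descent through a differentiable (Lambertian) shading model.
# A minimal stand-in for a full physics-based differentiable renderer.
import torch

def render(albedo, light_dir, normals):
    # Lambertian shading per pixel: albedo * max(0, n . l).
    l = light_dir / light_dir.norm()
    return albedo * torch.clamp((normals * l).sum(-1), min=0.0)

# Synthetic observation with known ground-truth parameters.
normals = torch.nn.functional.normalize(torch.randn(64, 64, 3), dim=-1)
target = render(torch.tensor(0.7), torch.tensor([0.3, 0.5, 0.8]), normals)

# Unknowns, initialized to a poor guess, refined by gradient descent.
albedo = torch.tensor(0.2, requires_grad=True)
light = torch.tensor([0.0, 0.0, 1.0], requires_grad=True)
opt = torch.optim.Adam([albedo, light], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    loss = ((render(albedo, light, normals) - target) ** 2).mean()
    loss.backward()  # gradients flow through the renderer
    opt.step()

print(f"recovered albedo: {albedo.item():.3f} (ground truth 0.7)")
```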


16:15 - 17:00


Discussion Panel - 3D Digital Twin R&D across Robotics, Vision, and Graphics

This session brings together experts from all three disciplines to explore current challenges and future directions in digital-twin research. Audience questions will be collected in advance via Google Form, allowing the panel to address the most pressing topics. We’ll reserve the final 15 minutes for spontaneous, on-site questions to encourage lively, real-time interaction.


17:00 - 17:05

Conclusion


Organizers

Dr. Zhao Dong

Research Lead, Meta Reality Labs Research

Dr. Zhaoyang Lv

Research Scientist, Meta Reality Labs Research

Dr. Zhengqin Li

Research Scientist, Meta Reality Labs Research

Prof. Hao Su

Associate Professor, UCSD

Prof. Jiajun Wu

Assistant Professor, Stanford

Dr. Kalyan Sunkavalli

Principal Scientist, Adobe

Prof. Jia Deng

Associate Professor, Princeton

Prof. Shuang Zhao

Associate Professor, UC Irvine

Prof. Lingjie Liu

Assistant Professor, UPenn

Dr. Jérôme Revaud

Principal Scientist, Naver Labs Europe

Hong-Xing “Koven” Yu

PhD Student, Stanford

Prof. Leonidas Guibas

Professor, Stanford