Third Hands-on Egocentric Research Tutorial with Project Aria, from Meta

Held in Conjunction with ECCV 2024

14:00 - 18:00, 29th September 2024, Room Amber 3

Overview

In 2022, Meta Reality Labs Research hosted the first Project Aria Workshop at CVPR, introducing researchers to Project Aria, a research device from Reality Labs Research that is worn like a regular pair of glasses and used to accelerate research in egocentric perception. Since then, the Project Aria academic program has grown to include over 100 university partners from around the world, providing researchers with devices, datasets, tools, and services to accelerate research in always-on egocentric perception.


In this third tutorial, we will introduce attendees to new features of the Aria Research Kit (ARK) and Open Science Initiatives (OSI), share novel research from academic partner program members, describe how researchers can gain access to the Project Aria academic program, and introduce new open datasets for accelerating machine perception research.


The tutorial will take place in Room Amber 3 on the afternoon of September 29th from 14:00 - 18:00, with a 30-minute break at 15:30. See the detailed agenda below.

Agenda


14:00 - 14:50

Surreal's Update to the Aria Research Kit and Open Science Initiatives: In this talk, we will set the context for egocentric research and show how Meta's Project Aria will enable a new wave of AI research. We will also present updates to the Aria Research Kit and the Open Science Initiatives that will help you get started with egocentric research quickly, and improve your workflow if you are already using Aria. We will show how recordings are improving and how data can be consumed more easily for downstream tasks (see the short data-loading sketch following the agenda).

James Fort, Research Product Manager, Reality Labs Research, Meta

14:50 - 15:30

Digital Twin Catalog: The field of 3D reconstruction is central to the computer vision community and aims to solve fundamental problems around reconstructing 3D objects from 2D images. State-of-the-art techniques today fall short of the needs of real-world applications, largely because of the limitations of available datasets. Project Aria's Digital Twin Catalog aims to motivate the 3D reconstruction community to reach a new level of quality and realism by providing a large and highly detailed set of 3D object models, corresponding source capture data, and reconstruction algorithms for researchers to use.

Dr. Zhao Dong, Research Science Manager, Reality Labs Research, Meta


15:30 - 16:00

30-minute break

16:00 - 16:20

Surreal's HOT3D dataset for hand-object interaction understanding: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects; multi-modal signals such as eye gaze and scene point clouds; and comprehensive ground-truth annotations, including 3D poses of objects, hands, and cameras, as well as 3D models of hands and objects.

Dr. Prithviraj Banerjee, Research Scientist, Reality Labs Research, Meta


16:20 - 16:40

Surreal's Nymeria dataset for egocentric human motion understanding: Future AR/VR technology will usher in an era of human-centric, contextualized AI computing. A cornerstone of this paradigm is understanding one's own body motion and action. To accelerate research in this field, this talk introduces the Nymeria dataset. Nymeria is the world's largest collection of human motion in the wild, capturing diverse people performing diverse activities across diverse locations. It is the first dataset of its kind to record body motion with multiple egocentric multimodal devices, all accurately synchronized and localized in a single metric 3D world. Nymeria is also the world's largest dataset with motion-language descriptions, featuring hierarchical in-context narration. To demonstrate its potential, the talk also discusses how we build state-of-the-art algorithms for egocentric body tracking, motion synthesis, and action recognition.

Dr. Lingni Ma, Research Scientist, Reality Labs Research, Meta


16:40 - 16:50

Surreal's EFM3D benchmark for 3D egocentric foundation models: The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs), we establish EFM3D, a benchmark with two core 3D egocentric perception tasks. EFM3D is the first benchmark for 3D object detection and surface regression on high-quality annotated egocentric data from Project Aria. We propose Egocentric Voxel Lifting (EVL), a baseline for 3D EFMs. EVL leverages all available egocentric modalities and inherits foundational capabilities from 2D foundation models. Trained on a large simulated dataset, this model outperforms existing methods on the EFM3D benchmark.

Dr. Julian Straub, Research Scientist, Reality Labs Research, Meta

16:50 - 17:15

Ego-Exo4D Overview: This session provides an overview of the Ego-Exo4D dataset. Ego-Exo4D is a diverse, large-scale, multimodal, multiview video benchmark dataset centered around simultaneously captured egocentric and exocentric video of skilled human activities. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions, including novel "expert commentary" provided by coaches and teachers and tailored to the skilled-activity domain.

Dr. Antonino Furnari, University of Catania

17:15 - 17:30

Egocentric Recording in the Home: How to collect and annotate unscripted recordings using the Aria Research Kit. This short talk will focus on new capabilities when using the Aria toolkit for unscripted, long-term recordings in participants' homes.

Prof. Dima Damen, University of Bristol

17:30 - 17:45

Building a Procedural Wearable Assistant with Aria: Current egocentric vision benchmarks lack a principled benchmark that includes real dual-agent conversation. A personal assistant should be able to see what the user is doing and relate vision to language in order to contextualize questions such as "what is this?" or "what should I do now?". In this talk, I will discuss the acquisition of a novel dual-agent multimodal dataset using Aria glasses and the Aria Research Kit. I will present a data acquisition and annotation pipeline that leverages the Aria SDK and the MPS services. Finally, I will provide an overview of the acquired dataset, which includes procedural videos captured in various scenarios with the collaboration of both trainees and experts.

Dr. Francesco Ragusa, University of Catania

17:45 - 18:00

LaMAR Aria: We benchmark crowd-sourced 3D localization and mapping for AR, a core technology for the grounding and persistence of digital content in the real world. Devices record egocentric multi-modal data that exhibits specific challenges and opportunities not found in existing benchmarks. Compared to existing mobile and AR devices, Project Aria makes it possible to capture such data at large scale with accurate ground truth.

Dr. Paul-Edouard Sarlin, ETH Zurich
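
Several of the talks above touch on consuming Aria recordings (VRS files) for downstream tasks. Purely as an illustrative sketch, and not part of the tutorial material itself, the snippet below shows how a single RGB frame might be read with the open-source projectaria_tools Python package; the file name recording.vrs is a placeholder, and the calls follow that package's documented data-provider API, which may differ across versions.

    # Illustrative sketch: reading one RGB frame from an Aria VRS recording
    # with projectaria_tools (pip install projectaria-tools).
    # "recording.vrs" is a placeholder path, not a file shipped with the tutorial.
    from projectaria_tools.core import data_provider

    provider = data_provider.create_vrs_data_provider("recording.vrs")

    # Look up the RGB camera stream by its label and count its frames.
    stream_id = provider.get_stream_id_from_label("camera-rgb")
    print("camera-rgb frames:", provider.get_num_data(stream_id))

    # Fetch the first frame: the returned pair holds the decoded image and its
    # record (capture timestamp, exposure, etc.).
    image, record = provider.get_image_data_by_index(stream_id, 0)
    print(image.to_numpy_array().shape, record.capture_timestamp_ns)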

Organizers

VP of Research Science, Reality Labs Research, Meta

James Fort, Research Product Manager, Reality Labs Research, Meta

Director of Research Science, Meta AI, Meta