Nymeria Dataset


A massive dataset of multimodal egocentric daily motion in the wild

Nymeria is the world's largest dataset of human motion in the wild, capturing diverse people engaging in diverse activities across diverse locations. It is the first of its kind to record body motion using multiple egocentric multimodal devices, all accurately synchronized and localized in a single metric 3D world. Nymeria is also the world's largest dataset of motion-language descriptions, featuring hierarchical in-context narration.

The dataset is designed to accelerate research in egocentric human motion understanding and to pose exciting new challenges beyond body tracking, motion synthesis and action recognition. It aims to advance contextualized computing and pave the way for future AR/VR technology.


Dataset Highlights

  • 300 hours of daily activity
  • 3600 hours of video data
  • 1200 sequences
  • 264 participants
  • 50 indoor and outdoor locations
  • 20 scenarios
  • 230 hours of motion with natural language descriptions
  • 310.5 K sentences
  • 8.64 M words
  • 400 km traveling trajectory
  • 1053 km wrist motion
  • Parametric human model powered by Meta Momentum

300 hours of daily activities

Project Aria and Machine Perception Services

Project Aria glasses were used as a lightweight headset to record multimodal data, including 1 RGB video, 2 grayscale videos, 2 eye-tracking videos, 2 IMUs, 1 magnetometer, 1 barometer and audio. The recordings were processed by Aria Machine Perception Services (MPS) to obtain accurate 6-DoF device trajectories, semi-dense point clouds, and eye gaze estimates with depth.
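The MPS trajectory gives a 6-DoF pose per timestamp, and a highlight such as the 400 km traveling trajectory can be read as a summed path length. The sketch below is a minimal illustration, assuming device positions have already been extracted into a hypothetical (N, 3) NumPy array in meters; it does not read the actual MPS file format, and `path_length_m` is an illustrative name.

```python
import numpy as np


def path_length_m(positions) -> float:
    """Sum of Euclidean distances between consecutive 3D positions (meters).

    `positions` is a hypothetical (N, 3) array of device translations taken
    from a 6-DoF trajectory; this is an assumption, not the MPS file format.
    """
    positions = np.asarray(positions, dtype=float)
    if positions.ndim != 2 or positions.shape[1] != 3:
        raise ValueError("expected an (N, 3) array of positions")
    if len(positions) < 2:
        return 0.0
    steps = np.diff(positions, axis=0)  # (N-1, 3) displacements between samples
    return float(np.linalg.norm(steps, axis=1).sum())


if __name__ == "__main__":
    # Toy trajectory at head height: 3 m along x, then 4 m along y.
    traj = np.array([[0.0, 0.0, 1.6], [3.0, 0.0, 1.6], [3.0, 4.0, 1.6]])
    print(path_length_m(traj))  # 7.0
```

High-rate poses contain small jitter that inflates a raw sum, so in practice one might smooth or subsample first; how the published figure was computed is not stated here.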

A screenshot of a single Aria device with MPS output

Novel 'miniAria' wristbands to resemble future wearables

The miniAria wristbands were developed by repackaging the electronics and sensors of Project Aria into a wristband form factor to capture egocentric data from the wrist. This setup is motivated by the potential of future wearable devices and by the need for accurate wrist motion to improve body tracking algorithms.

A screenshot of miniAria wristband with MPS output

Inertial-based motion capture for full-body kinematics

The XSens MVN Link mocap suit was used to record high-quality body motion in the wild. The suit estimates full-body kinematics using 17 inertial trackers and a magnetometer. An optimization was developed to register the XSens global motion into the same coordinate frame as the Project Aria headset and miniAria wristbands.
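The registration optimization itself is not described on this page. To illustrate only the underlying idea of bringing two motion sources into one frame, the sketch below uses a standard least-squares rigid alignment (the Kabsch algorithm) to estimate a rotation R and translation t mapping corresponding 3D points from one frame (for example, XSens head positions) to another (for example, Aria device positions). This is a swapped-in textbook technique, not Nymeria's method, and it assumes time-synchronized point correspondences; `kabsch_align` is a hypothetical name.

```python
import numpy as np


def kabsch_align(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points (N >= 3, not collinear).
    Illustrative only; it ignores drift, scale and timing offsets.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(50, 3))
    a = np.deg2rad(30.0)
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([1.0, -2.0, 0.5])
    R, t = kabsch_align(src, src @ R_true.T + t_true)
    print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

A real registration across body-worn devices would also need to handle sensor drift and per-device offsets, which this sketch deliberately leaves out.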

A screenshot of XSens mocap with motion kinematics

Motion retargeting to a parametric human model powered by Momentum

Leveraging the Meta Momentum library, an optimization was developed to retarget skeleton motion from XSens to a full parametric human model.

A screenshot of Momentum retargeting

An observer with Project Aria for third-person perspective

Each recording includes an observer wearing Project Aria, who follows the participant as a moving camera and interacts with them as needed, providing a holistic view of the action. All recording devices are aligned in a single metric 3D world through optimization and accurately synchronized via hardware.
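Because the devices share hardware-synchronized time, samples from different streams (for example, the observer's view and a participant's wristband) can be paired by timestamp. The sketch below is a minimal illustration, assuming each stream exposes a sorted, non-empty list of integer timestamps in nanoseconds on a common clock; this representation and the name `nearest_index` are hypothetical, not the dataset's API.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_index(timestamps_ns: Sequence[int], query_ns: int) -> int:
    """Index of the sample closest in time to query_ns.

    Assumes timestamps_ns is sorted ascending and shares a clock with query_ns.
    """
    if not timestamps_ns:
        raise ValueError("empty timestamp sequence")
    i = bisect_left(timestamps_ns, query_ns)  # first index with ts >= query
    if i == 0:
        return 0
    if i == len(timestamps_ns):
        return len(timestamps_ns) - 1
    before, after = timestamps_ns[i - 1], timestamps_ns[i]
    # Ties go to the earlier sample.
    return i - 1 if query_ns - before <= after - query_ns else i


if __name__ == "__main__":
    wrist_ts = [0, 33_000_000, 66_000_000, 100_000_000]  # toy timestamps
    print(nearest_index(wrist_ts, 40_000_000))  # 1
    print(nearest_index(wrist_ts, 50_000_000))  # 2
```

Binary search keeps each lookup at O(log N), which matters when matching every frame of one long recording against another stream.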

A screenshot of observer with MPS output

In-context motion-language descriptions

To connect motion understanding with natural language, annotators added multi-level text descriptions while watching a synchronized playback of the egocentric view, the third-person view and the rendered body motion. Videos are densely segmented into short clips, which describe the participant's body poses, focus of attention, interactions and atomic actions, and into long clips, which summarize the activity.
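One way to picture the multi-level descriptions is as time-stamped text segments at several granularities that can be queried together. The sketch below uses a hypothetical schema: the level names, field names and example sentences are illustrative, not the released annotation format or text from the dataset.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical granularity order, coarse to fine (illustrative only).
LEVELS = ("summary", "atomic_action", "narration")


@dataclass(frozen=True)
class Segment:
    level: str      # one of LEVELS
    start_s: float  # segment start in seconds (inclusive)
    end_s: float    # segment end in seconds (exclusive)
    text: str


def describe_at(segments: List[Segment], t_s: float) -> List[Segment]:
    """All segments covering time t_s, ordered from coarse to fine."""
    hits = [s for s in segments if s.start_s <= t_s < s.end_s]
    return sorted(hits, key=lambda s: LEVELS.index(s.level))


if __name__ == "__main__":
    segs = [
        Segment("narration", 10.0, 14.0, "Bends forward and reaches for a pot with the right hand."),
        Segment("summary", 0.0, 300.0, "Cooks a meal in the kitchen."),
        Segment("atomic_action", 8.0, 16.0, "Picks up a pot."),
    ]
    for s in describe_at(segs, 12.0):
        print(s.level, "|", s.text)
```

At t = 12 s all three levels overlap, so the query returns the summary, then the atomic action, then the fine-grained narration.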

Demo of in-context motion-language annotation

Representing the rich diversity of everyday life

Diverse scenarios

To capture natural, authentic motion and interactions, 20 scenarios of common daily activities were designed, each defined by a high-level description. Each recording is 15 minutes long. The following demo shows one subject performing 6 different scenarios.
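As a consistency check against the highlights, 1200 sequences of nominally 15 minutes each account for the reported 300 hours of daily activity:

```latex
1200 \times 15\,\text{min} = 18\,000\,\text{min} = \frac{18\,000}{60}\,\text{h} = 300\,\text{h}
```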

Diverse participants

A total of 264 participants were recruited to capture how different people perform the same activities in different ways. The demographics are balanced in terms of gender, ethnicity, weight, height and age. The following demo shows 6 subjects performing 3 sets of activities: badminton, cooking and party decorations.

Diverse locations


In total, 47 single-family houses with different layouts were rented, comprising 201 rooms and 45 gardens; 37 of the houses are multi-story. Each location contributed 4 to 15 hours of recordings. In the examples, all device trajectories are overlaid to show the density of actions. As expected, the trajectory clusters of the head (red), left wrist (green) and right wrist (blue) merge naturally.

diverse locations

Additionally, 3 locations on an open-space campus were captured: a cafeteria with an outdoor patio, a multi-level office building, and a parking lot connected to multiple hiking/biking trails. The video shows different subjects performing various activities at these locations.

Natural language descriptions

The dataset uses 3 annotation tasks to add natural language descriptions to motion, providing context from coarse to fine levels. These tasks range from detail-oriented narration of body poses, to simplified atomic actions, to high-level activity summarization. The hierarchical schema aims to support motion understanding at different granularities. Word cloud visualizations offer insight into the narrations.

motion narration word cloud

Motion narration

39 hours, 117.2 K sentences, 2.72 M words, 3739-word vocabulary

activity summarization word cloud

Activity summarization

196 hours, 22.6 K sentences, 0.45 M words, 3168-word vocabulary

atomic action word cloud

Atomic action

207 hours, 170.6 K sentences, 5.47 M words, 5129-word vocabulary

combined word cloud


Combined

230 hours, 310.5 K sentences, 8.64 M words, 6545-word vocabulary
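The per-task figures can be checked against the combined totals. Sentence and word counts add up (within rounding of the published figures), whereas hours and vocabulary do not: the three tasks sum to 442 hours, well above the combined 230 hours, which, assuming all hour figures measure the same motion timeline, implies the tasks annotate overlapping stretches of motion; likewise the combined vocabulary is a union of overlapping word sets. The snippet below only reproduces this arithmetic from the numbers listed above; `TASKS`, `COMBINED` and `column_sum` are illustrative names.

```python
# Per-task statistics as published: (hours, K sentences, M words, vocabulary size).
TASKS = {
    "motion_narration": (39, 117.2, 2.72, 3739),
    "activity_summarization": (196, 22.6, 0.45, 3168),
    "atomic_action": (207, 170.6, 5.47, 5129),
}
COMBINED = (230, 310.5, 8.64, 6545)


def column_sum(idx: int) -> float:
    """Sum one statistic across the three annotation tasks."""
    return sum(stats[idx] for stats in TASKS.values())


if __name__ == "__main__":
    hours, sentences_k, words_m, vocab = (column_sum(i) for i in range(4))
    # Hours and vocabulary exceed the combined totals: overlap, not additivity.
    print(f"hours: {hours} summed vs {COMBINED[0]} combined")
    print(f"sentences: {sentences_k:.1f} K summed vs {COMBINED[1]} K combined")
    print(f"words: {words_m:.2f} M summed vs {COMBINED[2]} M combined")
    print(f"vocabulary: {vocab} summed vs {COMBINED[3]} combined")
```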

Created with privacy and ethics in mind

The Nymeria dataset was collected under rigorous privacy and ethics policies. We strictly follow the Project Aria research guidelines. Prior to data collection, formal consent was obtained from participants and homeowners regarding data recording and usage. Data was collected and stored with de-identification. EgoBlur was used to blur faces and license plates in all videos.

face blur example

Learn more about Nymeria



CODE (coming soon)
DATA (coming soon)
A screenshot of the Nymeria research paper
