Introducing “Reading Recognition in the Wild”: A dataset for understanding human behaviors during reading from an egocentric sensor suite

June 4, 2025 · 5 min read

Key Takeaways

    • Reading in the Wild is a large-scale multimodal dataset comprising 100 hours of reading and non-reading videos captured from 111 participants in diverse and realistic scenarios using Project Aria glasses.
    • The dataset features video, eye gaze, and head pose sensor outputs, created to help solve the task of reading recognition from wearable devices. Notably, this is the first egocentric dataset to feature high-frequency eye-tracking data collected at 60 Hz.
    • We have developed a lightweight and flexible model for reading recognition with high precision and recall by utilizing RGB, eye gaze, and head pose data. Detailed performance analysis and capabilities of the model are available in our technical report.

Reading Recognition for Advancing Contextual AI

Reading is a fundamental human activity that forms the basis of communication, entertainment, and learning. It spans a wide range of mediums, from handwritten notes and printed books to digital screens and environmental signage. The future of AI relies heavily on understanding the user's physical context. Smart glasses, like Ray-Ban Meta glasses, offer a promising form factor that bridges visual AI with the real world, enabling personalized, contextual experiences. In this context, the ability to recognize when the user is reading is essential for developing AI that is truly personalized and contextually aware: it allows a device to understand what the user has read and where it can assist, enhancing the user's interaction with the world.

A significant challenge in this domain is the scarcity of data for large-scale, real-world reading recognition. Previous reading datasets were collected in controlled settings, where participants read lengthy texts on large screens with front-facing eye-tracking cameras. This approach limits the scale, diversity, and realism of the data. Moreover, effective reading recognition requires leveraging multimodal information. Simply having text in the field of view does not guarantee that the user is reading it, making the task difficult to solve using visual information alone. Similarly, a horizontal eye gaze pattern does not always indicate reading activity. Additionally, any solution must be efficient enough for real-time, always-on operation, given the practical constraints of all-day wearable devices. Current Optical Character Recognition (OCR) technology demands high-resolution images, resulting in significant power consumption. Reading recognition, however, can act as an efficient proxy, triggering OCR only when the user is actively engaged with reading material.
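
To make the proxy idea concrete, here is a minimal sketch of that gating pattern. The class, function names, and threshold are illustrative placeholders, not the released code; only the control flow matters:

```python
from typing import Iterable, Iterator

class ReadingRecognizer:
    """Stand-in for a lightweight multimodal reading classifier."""

    def predict(self, rgb_frame, gaze, head_pose) -> float:
        # Placeholder: a real model would fuse the three modalities
        # and return the probability that the user is reading.
        return 0.0

def run_ocr(rgb_frame) -> str:
    # Stand-in for an expensive, high-resolution OCR pass.
    return ""

def process_stream(frames: Iterable, recognizer: ReadingRecognizer,
                   threshold: float = 0.8) -> Iterator[str]:
    for rgb_frame, gaze, head_pose in frames:
        # Cheap, always-on check on low-rate sensor data.
        if recognizer.predict(rgb_frame, gaze, head_pose) >= threshold:
            # Only now pay for the costly OCR path.
            yield run_ocr(rgb_frame)
```

Because the classifier runs continuously on low-rate sensor streams, the power-hungry OCR path stays idle until reading is actually detected.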

Today, we are excited to announce the release of Reading in the Wild, a dataset created with Project Aria glasses. It features 100 hours of egocentric video, along with eye gaze, head pose, audio, IMU, magnetometer, and barometer data, all captured in real-world scenarios using the Project Aria sensor suite. The release reflects our commitment to advancing research in the AI community, particularly in the realm of egocentric AI.

A First-of-Its-Kind Dataset

The Reading in the Wild dataset is the first to feature high-frequency (60 Hz) eye-tracking data captured with Project Aria glasses. It encompasses 100 hours of reading and non-reading activities recorded in diverse and realistic scenarios from 111 participants. It includes three key modalities: egocentric RGB video, eye gaze, and head pose, along with other outputs from the Project Aria sensor suite, including audio, IMU, magnetometer, and barometer data.
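
Because the gaze stream is sampled faster than the video, working with the modalities together typically means aligning them in time. The snippet below is an illustrative sketch rather than the dataset's actual API; the timestamps and the 15 fps frame rate are made-up stand-ins:

```python
import numpy as np

# Illustrative stand-in timestamps; a real pipeline would read these
# from the recording rather than generating them.
gaze_ts = np.arange(0.0, 10.0, 1 / 60)    # 60 Hz gaze samples (seconds)
frame_ts = np.arange(0.0, 10.0, 1 / 15)   # assumed 15 fps RGB frames

# For each video frame, find the gaze sample nearest in time.
idx = np.searchsorted(gaze_ts, frame_ts)
idx = np.clip(idx, 1, len(gaze_ts) - 1)
prev_is_closer = (frame_ts - gaze_ts[idx - 1]) < (gaze_ts[idx] - frame_ts)
nearest = np.where(prev_is_closer, idx - 1, idx)

# nearest[i] now indexes the gaze sample to pair with frame i.
```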

New, High-Performance Model for Reading Recognition

In conjunction with the dataset, we have developed a lightweight, flexible model for reading recognition that achieves high precision and recall. This model leverages RGB, eye gaze, and head pose data, either individually or in combination. A comprehensive analysis of its performance and capabilities is provided in our technical report.
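
As a rough illustration of what "individually or in combination" can look like architecturally, here is a hedged sketch of a classifier that accepts any subset of the three modalities. The encoders, feature averaging, and layer sizes are our assumptions for the example, not the model described in the report:

```python
import torch
import torch.nn as nn

class ReadingClassifier(nn.Module):
    """Toy multimodal classifier; any subset of modalities may be given."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # One small encoder per modality; LazyLinear infers input size.
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
            for name in ("rgb", "gaze", "head_pose")
        })
        self.head = nn.Linear(dim, 1)  # binary reading / not-reading logit

    def forward(self, inputs: dict) -> torch.Tensor:
        # Encode whichever modalities are present and average the features,
        # so one set of weights runs with one, two, or all three inputs.
        feats = [self.encoders[name](x) for name, x in inputs.items()]
        fused = torch.stack(feats, dim=0).mean(dim=0)
        return self.head(fused)

model = ReadingClassifier()
logit = model({"gaze": torch.randn(2, 60, 2)})  # gaze-only example
```

Averaging features over whichever encoders are active is one simple way to keep a single model usable across sensor configurations.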

Unlocking New Applications

Reliable reading recognition makes it feasible to keep a record of a user's reading interactions, a building block for contextually aware AI. This capability opens the door to several applications, such as reading-assistance tools for children with learning difficulties and people with low vision. It can also help track whether a user has read crucial information, like signs while driving, and measure attention and distraction during tasks.
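
As a sketch of what such a record might look like in application code (the fields here are our own illustration, not part of the dataset or any shipped API):

```python
from dataclasses import dataclass

@dataclass
class ReadingEvent:
    start_s: float           # episode start, seconds into the session
    end_s: float             # episode end
    medium: str = "unknown"  # e.g. "print", "digital", "signage"
    text: str = ""           # OCR output, if OCR was triggered

# A downstream assistant could answer "what did I read today?" from such a log.
reading_log: list[ReadingEvent] = []
reading_log.append(ReadingEvent(start_s=12.4, end_s=31.0, medium="print"))
```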

Additionally, our dataset and method can be extended to classify different types of reading, a topic of interest in cognitive studies of reading comprehension. Unlike previous studies limited to controlled environments, our dataset allows reading mode and medium to be classified in unconstrained settings, and we report initial experimental results in this direction.

Conclusion

The Reading in the Wild dataset is more than just data; it is a catalyst for innovation in the field of egocentric AI. We are excited to see how the research community will utilize this resource to push the boundaries of what is possible with AI. Together, we can create AI systems that are not only more intelligent but also more attuned to the nuances of human interaction and perception.