reading-in-the-wild-columbus
Reading in the Wild - Columbus Subset
https://github.com/aiot-mlsys-lab/reading-in-the-wild-columbus
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
✓ CITATION.cff file
Found CITATION.cff file
✓ codemeta.json file
Found codemeta.json file
✓ .zenodo.json file
Found .zenodo.json file
○ DOI references
✓ Academic publication links
Links to: arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity
Low similarity (9.7%) to scientific vocabulary
Repository
Reading in the Wild - Columbus Subset
Basic Info
- Host: GitHub
- Owner: AIoT-MLSys-Lab
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://www.projectaria.com/datasets/reading-in-the-wild/
- Size: 44.3 MB
Statistics
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Reading in the Wild - Columbus Subset
Introduction
The Reading in the Wild dataset is a first-of-its-kind, large-scale multimodal dataset collected using Meta's smart glasses under Project Aria. It contains 100 hours of reading and non-reading egocentric videos in diverse and realistic scenarios, along with eye gaze and head pose data recorded during both reading and non-reading activities. The dataset can be used to develop models that not only detect reading but also classify different types of reading activities in real-world scenarios.
The dataset contains two subsets: the Seattle subset and the Columbus subset. This repository is for the Columbus subset; the Seattle subset is maintained separately here.
Overview of Columbus Subset
The Columbus subset contains around 20 hours of data from 31 subjects performing reading and non-reading activities in indoor scenarios. It was collected for zero-shot experiments and contains hard negatives (text is present but not being read), searching/browsing (which produces confusing gaze patterns), and reading of non-English text (where the reading direction differs).
As summarized in the following chart, the Columbus subset covers reading across three medium types: digital, print, and objects. It also covers three content types: paragraphs (long continuous text), short texts such as posters and nutrition labels, and non-textual content such as illustrative diagrams.
Comparison to Existing Datasets
Compared with existing egocentric video datasets and reading datasets, ours is the first reading dataset that contains high-frequency eye gaze, diverse and realistic egocentric videos, and hard negative (HN) samples.
Models
A base model (v1_default) trained on the training data of the Seattle subset can be found here.
The model takes three inputs: a 5° FoV RGB crop (64x64) from the glasses' RGB camera, centered on the wearer's eye gaze; 3D gaze velocities from the eye-tracking cameras, sampled at 60Hz over a 2s span; and 3D head orientation and velocity from the IMU sensors, sampled at 60Hz over a 2s span. The model can work with any combination of these three modalities.
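The gaze-centered RGB crop can be sketched as below. This is a minimal illustration, not the repository's implementation: it assumes the gaze has already been projected to a pixel location (u, v) in the RGB frame and that 64 px at the working resolution spans roughly 5° (neither the projection step nor the pixel-per-degree mapping is specified here). The helper name `gaze_centered_crop` is hypothetical, and zero-padding out-of-frame regions is just one reasonable choice.

```python
import numpy as np


def gaze_centered_crop(frame: np.ndarray, gaze_uv: tuple, size: int = 64) -> np.ndarray:
    """Hypothetical helper: cut a size x size patch of `frame` centered on the gaze.

    Assumes `gaze_uv` is already a pixel location (u = column, v = row) in `frame`.
    Parts of the patch that fall outside the image are zero-padded, so the
    output shape is always (size, size, channels).
    """
    h, w, c = frame.shape
    half = size // 2
    u, v = int(round(gaze_uv[0])), int(round(gaze_uv[1]))
    out = np.zeros((size, size, c), dtype=frame.dtype)
    # Patch window in image coordinates (may extend past the borders).
    top, left = v - half, u - half
    # Clip the window to the image.
    src_top, src_left = max(top, 0), max(left, 0)
    src_bot, src_right = min(top + size, h), min(left + size, w)
    if src_top < src_bot and src_left < src_right:
        out[src_top - top:src_bot - top, src_left - left:src_right - left] = (
            frame[src_top:src_bot, src_left:src_right]
        )
    return out


if __name__ == "__main__":
    frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
    patch = gaze_centered_crop(frame, (320.0, 240.0))
    print(patch.shape)  # (64, 64, 3)
```

Padding rather than shifting the window keeps the gaze point at the patch center even near the frame edges.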
Besides the base model, the following variants are also provided here:
+ v1_1s: uses a shorter 1s span for gaze data
+ v1_15Hz: uses a lower 15Hz sampling frequency for gaze data
+ v1_large: uses a larger RGB crop size of 128x128
+ v1_medium: outputs categorical predictions for the reading medium (no-read, print, digital, and objects)
+ v1_mode: outputs categorical predictions for the reading mode (no-read, walk, out-loud, engaged, scan, write/type, and skim)
For details on the base model and its variants, please see here.
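The input sizes implied by the base model and the gaze/crop variants follow from span × rate: 2s at 60Hz gives 120 gaze samples, 1s at 60Hz gives 60, and 2s at 15Hz gives 30. The sketch below tabulates this. `InputSpec` and `VARIANTS` are hypothetical names, and it assumes each variant changes only the property listed above (IMU input stays at 60Hz over 2s). v1_medium and v1_mode are omitted because only their outputs are described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSpec:
    """Hypothetical description of one model variant's input windows."""
    rgb_crop: int          # side length of the gaze-centered RGB crop, in pixels
    gaze_span_s: float     # temporal span of the gaze window, in seconds
    gaze_hz: int           # gaze sampling rate, in Hz
    imu_span_s: float = 2.0  # assumed unchanged across variants
    imu_hz: int = 60         # assumed unchanged across variants

    @property
    def gaze_samples(self) -> int:
        return round(self.gaze_span_s * self.gaze_hz)

    @property
    def imu_samples(self) -> int:
        return round(self.imu_span_s * self.imu_hz)


VARIANTS = {
    "v1_default": InputSpec(rgb_crop=64, gaze_span_s=2.0, gaze_hz=60),
    "v1_1s": InputSpec(rgb_crop=64, gaze_span_s=1.0, gaze_hz=60),
    "v1_15Hz": InputSpec(rgb_crop=64, gaze_span_s=2.0, gaze_hz=15),
    "v1_large": InputSpec(rgb_crop=128, gaze_span_s=2.0, gaze_hz=60),
}

if __name__ == "__main__":
    for name, spec in VARIANTS.items():
        print(f"{name}: rgb {spec.rgb_crop}x{spec.rgb_crop}, "
              f"gaze {spec.gaze_samples} samples, imu {spec.imu_samples} samples")
```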
Getting Started
Setup
Use conda to create a new environment and install the required packages. The codebase has been tested with Python 3.12 and PyTorch 2.4.
```commandline
conda env create -f environment.yml
```
Once the environment is created, activate it:
```commandline
conda activate ritw-osu
```
Download
Dataset
The dataset is hosted on the Hugging Face Hub at this page. To download it, run:
```commandline
python -m ritw.download --config-name config.yaml
```
An example config file is provided in `/config/download.yaml`. The dataset will be downloaded to the folder specified by `local_dir`. The config file also lets you specify filters to download only a subset of the dataset.
Models
Download the models from here and put them inside the models/ folder.
Inference
The inference pipeline is configurable via a config file. An example config is shown below:
```yaml
# Example: ../config/config.yaml
# conf/config.yaml
start_time: 0.0
snippet_gap: 0.01667 # roughly 1/60 seconds
mode: "folder" # folder/single. If mode is single, please provide inputfilename in config. If mode is folder, inference will be done on all files in root dir.
modalities:
  - "gaze"
  - "imu"
  - "rgb"
  - "gaze+rgb"
  - "gaze+imu"
  - "imu+rgb"
  - "gaze+imu+rgb"
output_save_path: "output/"
root_dir: "/path/to/ritw/dataset/"
model_name:
  - "v1_default"
  - "v0"
num_workers: 4 # adjust based on available CPU cores
```
The config selects the modalities and models to run inference with. To add a new model, put it in the `ritw-osu/models` directory and add its name to the config file.
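The `modalities` list in the example config contains all 2^3 - 1 = 7 non-empty combinations of gaze, IMU, and RGB. The sketch below shows how such entries could be enumerated and validated. `all_modality_combinations` and `parse_modality` are hypothetical helpers, not part of `ritw`, and treating entries as order-insensitive sets (so "rgb+gaze" equals "gaze+rgb") is an assumption about how the pipeline interprets them.

```python
from itertools import combinations

BASE_MODALITIES = ("gaze", "imu", "rgb")


def all_modality_combinations():
    """All 2**3 - 1 = 7 non-empty subsets of the modalities, joined with '+'."""
    return [
        "+".join(subset)
        for k in range(1, len(BASE_MODALITIES) + 1)
        for subset in combinations(BASE_MODALITIES, k)
    ]


def parse_modality(entry):
    """Turn a config entry such as 'gaze+imu' into a frozenset, rejecting unknown names."""
    parts = frozenset(p.strip() for p in entry.split("+") if p.strip())
    unknown = parts - set(BASE_MODALITIES)
    if not parts or unknown:
        raise ValueError(f"invalid modality entry: {entry!r}")
    return parts


if __name__ == "__main__":
    config_modalities = ["gaze", "imu", "rgb", "gaze+rgb", "gaze+imu", "imu+rgb", "gaze+imu+rgb"]
    parsed = {parse_modality(m) for m in config_modalities}
    expected = {parse_modality(m) for m in all_modality_combinations()}
    print(parsed == expected)  # True: the example config covers every combination
```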
To run prediction, create a `config.yaml` file (see also `predict.yaml` for reference) and save it to `ritw-osu/config`. Then run the following command:
```bash
python -m ritw.predict --config-name config.yaml
```
The command runs each combination of input file and model in a separate process. The output is saved as CSV files in the directory `<output_save_path>/<model_name>`.
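Running "each file and model combination" amounts to taking the Cartesian product of input files and model names and dispatching the pairs to worker processes, with the pool size plausibly tied to `num_workers`. This is a hedged sketch, not `ritw.predict` itself: `plan_jobs` and `run_all` are hypothetical, the per-file CSV name `<stem>.csv` is an assumption, and only the `<output_save_path>/<model_name>` directory layout comes from the text above. The demo only prints the job plan.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product
from pathlib import Path


def plan_jobs(input_files, model_names, output_save_path):
    """Return one (input_file, model_name, output_csv) tuple per file/model pair.

    Hypothetical layout: <output_save_path>/<model_name>/<input stem>.csv
    """
    jobs = []
    for f, model in product(input_files, model_names):
        out_csv = Path(output_save_path) / model / f"{Path(f).stem}.csv"
        jobs.append((Path(f), model, out_csv))
    return jobs


def run_all(jobs, worker, num_workers=4):
    """Dispatch worker(job) for every job to a pool of worker processes.

    `worker` must be a module-level function so it can be pickled.
    """
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, jobs))


if __name__ == "__main__":
    # Made-up input names, for illustration only.
    for src, model, out in plan_jobs(["recording_01", "recording_02"], ["v1_default", "v0"], "output/"):
        print(f"{src} + {model} -> {out}")
```

Isolating each (file, model) pair in its own task means one failing recording does not abort the remaining runs.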
Evaluation
The evaluation module allows you to benchmark the performance of the models based on the inference results. It utilizes metadata from the recordings, applies configurable filters to focus on a specific subset of the dataset, computes various performance metrics for each modality, and outputs a summary table in Markdown format.
Below is an example configuration file (conf/config.yaml) for the evaluation module:
```yaml
# Example: ../config/config.yaml
metadata_file: "data/metadata.csv"
result_dir: "output/v1_default"
target_recall: 0.9
metrics:
  - "F1"
  - "Acc"
  - "P@R=0.9"
  - "T@R=0.9"
  - "Acc@R=0.9"
  - "F1@R=0.9"
  - "AUC"
modalities:
  - "gaze"
  - "rgb"
  - "imu"
  - "gaze+imu"
  - "imu+rgb"
  - "gaze+rgb"
  - "gaze+imu+rgb"
filters:
  ContainsNonText:
    - "images"
  ShortTextOrPara:
    - "paragraphs"
  Medium:
    - "digital"
  Platform:
    - "laptop"
```
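The `filters` block restricts evaluation to recordings whose metadata match the listed values. A minimal sketch of that selection follows, assuming OR semantics within a column and AND across columns; this is an assumption, since `ritw.evaluate` may combine filters differently or store several values per cell. `apply_filters` is a hypothetical helper, and the `recording` column and rows in the demo are made up; only the column names `Medium` and `Platform` come from the config.

```python
import csv
import io


def apply_filters(rows, filters):
    """Keep rows whose value in every filtered column is one of the allowed values.

    Assumed semantics: OR within a column, AND across columns.
    An empty `filters` dict keeps every row.
    """
    allowed = {col: set(values) for col, values in filters.items()}
    return [row for row in rows if all(row.get(col) in vals for col, vals in allowed.items())]


if __name__ == "__main__":
    # Made-up metadata rows for illustration.
    metadata = io.StringIO(
        "recording,Medium,Platform\n"
        "rec1,digital,laptop\n"
        "rec2,print,none\n"
        "rec3,digital,phone\n"
    )
    rows = list(csv.DictReader(metadata))
    kept = apply_filters(rows, {"Medium": ["digital"], "Platform": ["laptop"]})
    print([r["recording"] for r in kept])  # ['rec1']
```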
After setting up your environment and ensuring that the metadata and prediction CSV files are available, run the evaluation module from the project’s root directory:
```bash
python -m ritw.evaluate --config-name config.yaml
```
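The metric names `P@R=0.9`, `T@R=0.9`, `Acc@R=0.9`, and `F1@R=0.9` are not defined above. A common reading is precision, threshold, accuracy, and F1 at the operating point where recall first reaches `target_recall`. The sketch below implements that reading under stated assumptions: scores are per-snippet reading probabilities, label 1 means reading, and the chosen threshold is the highest one that meets the recall target. `metrics_at_recall` is hypothetical, and `ritw.evaluate` may pick thresholds differently (for example, by interpolation).

```python
import numpy as np


def metrics_at_recall(scores, labels, target_recall=0.9):
    """Operating-point metrics at the highest threshold reaching `target_recall`.

    Returns the threshold (T), precision (P), recall (R), accuracy (Acc) and F1.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    if n_pos == 0:
        raise ValueError("need at least one positive (reading) label")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    # Sweep distinct scores from high to low; recall can only grow as t drops.
    for t in np.unique(scores)[::-1]:
        pred = scores >= t
        tp = int(np.sum(pred & labels))
        recall = tp / n_pos
        if recall >= target_recall:
            fp = int(np.sum(pred & ~labels))
            precision = tp / (tp + fp)
            acc = float(np.mean(pred == labels))
            f1 = 2 * precision * recall / (precision + recall)
            return {"T": float(t), "P": precision, "R": recall, "Acc": acc, "F1": f1}
    # The lowest threshold predicts every snippet as reading, so recall is 1.0.
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    m = metrics_at_recall([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0, 0])
    print(m)  # threshold 0.4, precision 0.75, recall 1.0
```

Fixing recall and reporting the other metrics makes models comparable at the same miss rate, which matters when missing a reading event is costlier than a false alarm.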
License
The Reading in the Wild - Columbus Subset dataset and code are released by The Ohio State University under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). The data and code may not be used for commercial purposes. For more information, please refer to the LICENSE file included in this repository.
Attribution
When using the dataset and code, please attribute it as follows:
@inproceedings{yang25reading,
  title={Reading Recognition in the Wild},
  author={Charig Yang and Samiul Alam and Shakhrul Iman Siam and Michael Proulx and Lambert Mathias and Kiran Somasundaram and Luis Pesqueira and James Fort and Sheroze Sheriffdeen and Omkar Parkhi and Carl Ren and Mi Zhang and Yuning Chai and Richard Newcombe and Hyo Jin Kim},
  booktitle={arXiv Preprint},
  year={2025},
  url={https://arxiv.org/abs/2505.24848},
}
Owner
- Name: OSU AIoT-MLSys Lab
- Login: AIoT-MLSys-Lab
- Kind: organization
- Location: United States of America
- Website: https://aiot-mlsys-lab.github.io/
- Repositories: 15
- Profile: https://github.com/AIoT-MLSys-Lab
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Reading Recognition in the Wild
message: 'If you use this software, please cite it as below.'
type: dataset
authors:
  - given-names: Charig
    family-names: Yang
  - given-names: Samiul
    family-names: Alam
    email: alam.140@osu.edu
    affiliation: OSU
    orcid: 'https://orcid.org/0000-0002-8458-4642'
  - given-names: Shakhrul Iman
    family-names: Siam
  - given-names: Michael J.
    family-names: Proulx
  - given-names: Lambert
    family-names: Mathias
  - given-names: Kiran
    family-names: Somasundaram
  - given-names: Luis
    family-names: Pesqueira
  - given-names: James
    family-names: Fort
  - given-names: Sheroze
    family-names: Sheriffdeen
  - given-names: Omkar
    family-names: Parkhi
  - given-names: Carl
    family-names: Ren
  - given-names: Mi
    family-names: Zhang
  - given-names: Yuning
    family-names: Chai
  - given-names: Richard
    family-names: Newcombe
  - given-names: Hyo Jin
    family-names: Kim
identifiers:
  - type: doi
    value: 10.48550/arXiv.2505.24848
repository-code: >-
  https://github.com/AIoT-MLSys-Lab/Reading-in-the-Wild-Columbu
url: 'https://www.projectaria.com/datasets/reading-in-the-wild/'
repository-artifact: 'https://huggingface.co/datasets/OSU-AIoT-MLSys-Lab'
abstract: >-
  To enable egocentric contextual AI in always-on smart
  glasses, it is crucial to be able to keep a record of the
  user's interactions with the world, including during
  reading. In this paper, we introduce a new task of reading
  recognition to determine when the user is reading. We
  first introduce the first-of-its-kind large-scale
  multimodal Reading in the Wild dataset, containing 100
  hours of reading and non-reading videos in diverse and
  realistic scenarios. We then identify three modalities
  (egocentric RGB, eye gaze, head pose) that can be used to
  solve the task, and present a flexible transformer model
  that performs the task using these modalities, either
  individually or combined. We show that these modalities
  are relevant and complementary to the task, and
  investigate how to efficiently and effectively encode each
  modality. Additionally, we show the usefulness of this
  dataset towards classifying types of reading, extending
  current reading understanding studies conducted in
  constrained settings to larger scale, diversity and
  realism.
GitHub Events
Total
- Watch event: 2
- Push event: 12
- Public event: 1
Last Year
- Watch event: 2
- Push event: 12
- Public event: 1