mediacontentatlas

Code for Media Content Atlas

https://github.com/mediacontentatlas/mediacontentatlas

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.4%) to scientific vocabulary

Keywords

digitalmedia interactive-visualizations multimodal-large-language-models
Last synced: 6 months ago

Repository

Code for Media Content Atlas

Basic Info
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
digitalmedia interactive-visualizations multimodal-large-language-models
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

Media Content Atlas (MCA) 📱🗺️

A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs

Media Content Atlas (MCA) is a first-of-its-kind pipeline that enables large-scale, AI-driven analysis of digital media experiences using multimodal LLMs. It combines recent advances in machine learning and visualization to support both open-ended and hypothesis-driven research into screen content and behavior.

🔗 Website & Demo: mediacontentatlas.github.io
🎥 Quick Video Explanation: Watch on YouTube
📄 Paper: Preprint
See the Quickstart Tutorial here

📎 Citation: Cerit, M., Zelikman, E., Cho, M., Robinson, T. N., Reeves, B., Ram, N., & Haber, N. (2025). Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’25). ACM. https://doi.org/10.1145/3706599.3720055

🔍 Overview

Built on 1.12 million smartphone screenshots collected from 112 adults over a month, MCA enables researchers to:

  • Perform content-based clustering and topic modeling using semantic and visual signals
  • Automatically generate descriptions of screen content
  • Search and retrieve content across individuals and moments
  • Visualize digital media behavior with an interactive dashboard

Expert reviewers rated MCA's clustering results 96% relevant and AI-generated descriptions 83% accurate.
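The content-based clustering step above groups screenshot embeddings by similarity. A minimal sketch of the idea, using scikit-learn's KMeans on synthetic vectors as a stand-in for the repo's actual BERTopic + LLaMA2 pipeline (the embedding dimensions and group structure below are illustrative, not from the paper):

```python
# Minimal sketch: clustering screenshot embeddings by content similarity.
# MCA itself uses BERTopic + LLaMA2; KMeans on synthetic vectors is used
# here purely to illustrate the grouping step.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for CLIP/GTE embeddings: 200 "screenshots" drawn from
# three well-separated synthetic content groups in 64 dimensions.
centers = rng.normal(size=(3, 64))
embeddings = np.vstack(
    [c + 0.1 * rng.normal(size=(70, 64)) for c in centers]
)[:200]

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
labels = km.labels_  # one cluster id per screenshot

# In MCA, each resulting cluster is then described and labeled by an LLM.
print(len(set(labels.tolist())))
```

The same shape of computation applies whether the vectors come from CLIP (visual) or GTE-Large (textual) embeddings; only the input matrix changes.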

MCA Pipeline

🗂️ Code Structure

The pipeline is fully modular, with standalone scripts and notebooks for each stage:

1. ⏩ Check out the Quickstart Tutorial on Google Colab with a free T4 GPU.

2. 📦 mca_pipeline/ – Core Components

| Stage | Script | Description |
|-------|--------|-------------|
| 🖼️ Embedding | anonymized_clip_embedding_generation.py | Generate visual embeddings using CLIP |
| 📝 Captioning | anonymized_description_generation.py | Generate descriptions using LLaVA-OneVision |
| 🔠 Embedding | anonymized_description_embedding_generation.py | Generate sentence embeddings using GTE-Large |
| 🧵 Clustering | anonymized_clustering_topicmodeling_example.py | Cluster and label screenshots using BERTopic + LLaMA2 |
| 📊 Visualization | anonymized_create_interactive_visualizations.ipynb | Create an interactive dashboard using DataMapPlot |
| 🔍 Retrieval | anonymized_image_retrieval_app.py | Retrieve screenshots using visual or textual similarity |
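The retrieval stage amounts to a nearest-neighbor search over embeddings. A minimal sketch, assuming image and query embeddings are already computed by the earlier pipeline stages (the arrays below are synthetic stand-ins, not real CLIP outputs):

```python
# Minimal sketch of similarity-based retrieval over precomputed embeddings.
import numpy as np

def retrieve(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k images most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    ims = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = ims @ q                 # cosine similarity of the query to each image
    return np.argsort(-sims)[:k]  # indices of the top-k matches

# Synthetic stand-ins for CLIP embeddings: 1000 "screenshots", 512-dim.
rng = np.random.default_rng(1)
image_embs = rng.normal(size=(1000, 512))
query = image_embs[42] + 0.01 * rng.normal(size=512)  # near-duplicate of image 42

top = retrieve(query, image_embs, k=5)
print(top[0])  # the near-duplicate ranks first, i.e. 42
```

Because CLIP places images and text in a shared space, the same function covers both visual and textual queries: embed a text query with the text encoder and pass it as `query_emb`.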

3. 🧪 expert_surveys/ – Evaluation Instruments

| File | Description |
|------|-------------|
| anonymized_survey1.py | Survey for cluster label relevance |
| anonymized_survey2.py | Survey for description accuracy |
| anonymized_survey3.py | Survey for retrieval performance |

🙋‍♀️ Questions or Feedback?

We’d love to hear from you!

🛠️ Roadmap

Here’s what’s next for MCA; let us know if you’d like to collaborate:

  • 🔁 Reproducibility updates for easier setup
  • 🧩 Customization utilities (label editing, filters, user tagging)
  • 📈 Longitudinal visualizations to explore media patterns over time

Stay tuned! ⭐ Star this repo to keep up with updates.

📚 Citation

If you use MCA in your research, please cite the CHI 2025 paper:

```bibtex
@inproceedings{cerit2025mca,
  author    = {Merve Cerit and Eric Zelikman and Mu-Jung Cho and Thomas N. Robinson and Byron Reeves and Nilam Ram and Nick Haber},
  title     = {Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs},
  booktitle = {Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)},
  year      = {2025},
  month     = {April},
  location  = {Yokohama, Japan},
  publisher = {ACM},
  address   = {New York, NY, USA},
  pages     = {19},
  doi       = {10.1145/3706599.3720055}
}
```

Owner

  • Login: mediacontentatlas
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software or data, please cite the CHI EA 2025 paper below."
title: "Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs"
authors:
  - family-names: "Cerit"
    given-names: "Merve"
  - family-names: "Zelikman"
    given-names: "Eric"
  - family-names: "Cho"
    given-names: "Mu-Jung"
  - family-names: "Robinson"
    given-names: "Thomas N."
  - family-names: "Reeves"
    given-names: "Byron"
  - family-names: "Ram"
    given-names: "Nilam"
  - family-names: "Haber"
    given-names: "Nick"
date-released: 2025-01-24
version: "1.0.0"
doi: "10.1145/3706599.3720055"
repository-code: "https://github.com/mediacontentatlas/mediacontentatlas"
conference: "Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25)"

GitHub Events

Total
  • Watch event: 3
  • Push event: 44
  • Create event: 2
Last Year
  • Watch event: 3
  • Push event: 44
  • Create event: 2