masc-dataset

Code and data for the MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning.

https://github.com/ali-aahmed/masc-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 6 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.1%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: Ali-Aahmed
License: mit
Language: Python
Default Branch: main
Size: 159 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

🚀 MASC Dataset: Mobile Application Screen Classification

📌 Overview

MASC (Mobile Application Screen Classification) is a manually curated dataset containing 7,065 mobile UI screens classified into 10 distinct categories. Designed for UI/UX research and ML applications, it enables:

- 📱 Accurate screen type classification
- 🤖 Automated UI testing
- 🎨 Design pattern analysis

🌟 Key Features

Multi-modal Data: Screenshots + JSON hierarchies + Semantic annotations
High Quality: 3-step manual validation process ✅
ML-Ready: Pre-extracted feature vectors for 11 UI characteristics

Design topics overview

| Topic name | Num. UIs | Description |
| :--- | ---: | :--- |
| Chat | 329 | Chat functionality |
| List | 960 | Elements organized in a column |
| Login | 889 | Input fields for logging in |
| Maps | 500 | Geographic display |
| Menu | 557 | Items listed in an overlay or aside |
| Profile | 526 | Info on a user profile or product |
| Search | 725 | Search engine functionality |
| Settings | 629 | Controls to change app settings |
| Welcome | 1084 | First-run experience |
| Home | 163 | Home screen |
| Total UIs | 7065 | |
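The per-topic counts above are uneven (Home has 163 screens, Welcome has 1084), which can matter when training a classifier. The sketch below shows one common way to compensate, using the inverse-frequency heuristic `n_samples / (n_classes * count)`. It is only an illustration: the source does not say whether the MASC scripts apply class weighting, and the function name `balanced_class_weights` is hypothetical.

```python
# Per-topic screen counts copied from the table above.
COUNTS = {
    "Chat": 329, "List": 960, "Login": 889, "Maps": 500, "Menu": 557,
    "Profile": 526, "Search": 725, "Settings": 629, "Welcome": 1084,
    "Home": 163,
}


def balanced_class_weights(counts):
    """Return weight = n_samples / (n_classes * count) for each class.

    Rare classes get weights above 1, frequent classes below 1.
    (Hypothetical helper; not part of the MASC repository.)
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {name: n_samples / (n_classes * n) for name, n in counts.items()}


weights = balanced_class_weights(COUNTS)

# Print classes from most up-weighted to most down-weighted.
for name, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10}{weight:.2f}")
```

A dictionary like `weights` can typically be passed to estimators that accept per-class weights; whether that helps on this dataset would need to be checked empirically.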

📂MASC Dataset

Full Dataset: Download from Kaggle
Samples:
📸 Raw Screenshot | 📝 Raw JSON | 📊 semantic_JSON

📂 Dataset Structure

The dataset is organized into multiple components, each representing a different aspect of the UI:

Screenshot Images: High-resolution images (JPG, 540x960 px) capturing the visual design of mobile UIs.
UI Semantic Annotations (JSON): A JSON file describing all UI components, including buttons, text fields, and icons.
View Hierarchies (JSON): A DOM-like structure representing parent-child relationships between UI components.
MASC_Features.csv (CSV): File containing extracted features for each UI.
Labels.csv (CSV): File containing a (Screen Id, class) pair for each UI.
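As a rough illustration of how these components fit together, the sketch below reads a labels file and walks a view hierarchy to count component types. Several details are assumptions rather than facts from the dataset: the CSV header names `Screen Id` and `class`, the Rico-style `class` and `children` keys in the hierarchy JSON, and the helper names `load_labels` and `count_components`. The inputs are tiny made-up samples, not real MASC records.

```python
import csv
import io
import json
from collections import Counter


def load_labels(fileobj):
    """Map each screen id to its class label.

    Assumes header columns named "Screen Id" and "class".
    """
    reader = csv.DictReader(fileobj)
    return {row["Screen Id"]: row["class"] for row in reader}


def count_components(node, counts=None):
    """Recursively count component class names in a view hierarchy.

    Assumes each node is a dict with an optional "class" string and an
    optional "children" list; non-dict entries (e.g. null) are skipped.
    """
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        name = node.get("class")
        if name:
            counts[name] += 1
        for child in node.get("children") or []:
            count_components(child, counts)
    return counts


# Made-up stand-ins for Labels.csv and one view hierarchy file.
labels_csv = io.StringIO("Screen Id,class\n101,Login\n102,Chat\n")
hierarchy = json.loads("""
{"class": "android.widget.FrameLayout",
 "children": [
   {"class": "android.widget.EditText"},
   {"class": "android.widget.EditText"},
   {"class": "android.widget.Button", "children": []},
   null
 ]}
""")

labels = load_labels(labels_csv)
component_counts = count_components(hierarchy)
print(labels)
print(component_counts.most_common())
```

With real files, `load_labels` would be given an open file handle (for example `open("Labels.csv", newline="")`), and the hierarchy would come from `json.load` on one of the view hierarchy files; the actual key names should be checked against a sample first.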

📁 Repository Structure

    MASC-Dataset/
    ├── code/
    │   ├── masc_classification.py   # Main script for data preprocessing and classification
    │   ├── requirements.txt         # List of dependencies
    │   ├── feature_extraction.py    # Script for extracting UI features
    │   ├── README.md                # Documentation
    ├── data/
    │   ├── raw/                     # Original, unprocessed UI data
    │   ├── processed/               # Cleaned and structured dataset
    ├── README.md                    # Project documentation
    ├── LICENSE                      # Usage license

📥 Accessing the Data

1. Public Datasets Used

| Dataset | Description | Link |
|---------|-------------|------|
| Rico | 72k Android UI screens | Download |
| Enrico | 1,460 curated screens | GitHub |
| Screen2Words | 112k UI descriptions | Download |

📥 Installation & Setup

Ensure you have Python installed, then install the required dependencies:

    pip install -r code/requirements.txt

🚀 Usage

To preprocess data and train the classification model, run:

    python code/feature_extraction.py
    python code/masc_classification.py
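If you prefer to drive both steps from Python, the following is a minimal sketch that runs the two scripts in the order given above and stops at the first failure. The helper names `build_commands` and `run_pipeline` are hypothetical, and the assumption that the classification script depends on output from feature extraction is inferred from the ordering, not stated by the source.

```python
import subprocess
import sys

# Script paths as given in the usage instructions, relative to the repo root.
SCRIPTS = ["code/feature_extraction.py", "code/masc_classification.py"]


def build_commands(python=sys.executable, scripts=SCRIPTS):
    """Return one argv list per script, in execution order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in turn; check=True raises on a non-zero exit code,
    so later steps are skipped if an earlier one fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# From the repository root, the pipeline would be started with:
# run_pipeline(build_commands())
```

Using `sys.executable` keeps both steps on the same interpreter (and therefore the same installed dependencies) as the calling process.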

🛠 Dependencies

The project uses the following Python libraries:

    numpy==1.23.5
    pandas==1.5.3
    scikit-learn==1.2.2
    xgboost==1.7.6
    matplotlib==3.7.1
    seaborn==0.12.2
    nltk==3.8.1
    joblib==1.2.0
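Because every dependency is pinned to an exact version, it can be useful to confirm that the active environment matches before running the scripts. The sketch below does this with the standard library's `importlib.metadata`; the helper name `find_mismatches` is hypothetical and is not part of the repository.

```python
from importlib import metadata

# Exact pins copied from the dependency list above.
PINS = {
    "numpy": "1.23.5",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "xgboost": "1.7.6",
    "matplotlib": "3.7.1",
    "seaborn": "0.12.2",
    "nltk": "3.8.1",
    "joblib": "1.2.0",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for every package whose
    installed version differs from its pin; installed is None if missing."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in find_mismatches(PINS).items():
    print(f"{package}: pinned {wanted}, found {installed or 'not installed'}")
```

No output means the environment matches all eight pins.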

Citation

If you use this dataset or code in your research, please cite it as follows:

Ahmed, A. (2025). "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning."
Available at: GitHub Repository
DOI: 10.5281/zenodo.14783065

📜 License

This dataset and source code are licensed under the MIT License.

📧 Contact

For questions or collaborations, contact: Ali Ahmed – ali.ahmed.@mu.edu.eg

Owner

Login: Ali-Aahmed
Kind: user

Repositories: 1
Profile: https://github.com/Ali-Aahmed

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this dataset or code, please cite the following paper (currently under review at The Visual Computer):"
authors:
  - family-names: Ahmed
    given-names: Ali
    orcid: "0009-0007-6020-4477"
title: "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning"
version: "1.0"
repository-code: "https://github.com/username/MASC-Dataset"
doi: "10.5281/zenodo.14783065"
preferred-citation:
  type: article
  title: "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning"
  authors:
    - family-names: Ahmed
      given-names: Ali
  journal: "The Visual Computer (Under Review)"
  year: 2025
  note: "This paper is currently under review at The Visual Computer."
  doi: "10.5281/zenodo.14783065"

GitHub Events

Total

Push event: 21
Create event: 5

Last Year

Push event: 21
Create event: 5

Dependencies

code/requirements.txt pypi

joblib ==1.2.0
matplotlib ==3.7.1
nltk ==3.8.1
numpy ==1.23.5
pandas ==1.5.3
scikit-learn ==1.2.2
seaborn ==0.12.2
xgboost ==1.7.6
