masc-dataset

Code and data for the MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning.

https://github.com/ali-aahmed/masc-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: Ali-Aahmed
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 159 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

🚀 MASC Dataset: Mobile Application Screen Classification


📌 Overview

MASC (Mobile Application Screen Classification) is a manually curated dataset containing 7,065 mobile UI screens classified into 10 distinct categories. Designed for UI/UX research and ML applications, it enables:

  • 📱 Accurate screen type classification
  • 🤖 Automated UI testing
  • 🎨 Design pattern analysis

🌟 Key Features

  • Multi-modal Data: Screenshots + JSON hierarchies + Semantic annotations
  • High Quality: 3-step manual validation process ✅
  • ML-Ready: Pre-extracted feature vectors for 11 UI characteristics

Design topics overview

| Topic name | Num. UIs | Description |
| :--- | ---: | :--- |
| Chat | 329 | Chat functionality |
| List | 960 | Elements organized in a column |
| Login | 889 | Input fields for logging in |
| Maps | 500 | Geographic display |
| Menu | 557 | Items list in an overlay or aside |
| Profile | 526 | Info on a user profile or product |
| Search | 725 | Search engine functionality |
| Settings | 629 | Controls to change app settings |
| Welcome | 1084 | First-run experience |
| Home | 163 | Home screen |
| Total UIs | 7065 | |
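The topic sizes are uneven (Home has 163 screens, Welcome has 1,084), so a classifier trained on these labels may benefit from class weighting. The following is a minimal sketch and not part of the repository's scripts: it copies the per-topic counts from the table above and applies the common "balanced" heuristic, weight = n_samples / (n_classes × count). The names `TOPIC_COUNTS` and `balanced_class_weights` are hypothetical.

```python
# Per-topic screen counts copied from the design topics table.
TOPIC_COUNTS = {
    "Chat": 329,
    "List": 960,
    "Login": 889,
    "Maps": 500,
    "Menu": 557,
    "Profile": 526,
    "Search": 725,
    "Settings": 629,
    "Welcome": 1084,
    "Home": 163,
}


def balanced_class_weights(counts):
    """Return n_samples / (n_classes * count) for each class.

    Rare classes receive weights above 1, frequent classes below 1.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {name: n_samples / (n_classes * count) for name, count in counts.items()}


if __name__ == "__main__":
    weights = balanced_class_weights(TOPIC_COUNTS)
    # Print the classes from most up-weighted to most down-weighted.
    for name, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<10}{weight:.2f}")
```

A dictionary like this could be handed to a classifier that accepts per-class weights; whether weighting actually helps on MASC is something to check empirically.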

📂MASC Dataset

📂 Dataset Structure

The dataset is organized into multiple components, each representing a different aspect of the UI:

  • Screenshot Images: High-resolution images (JPG, 540x960 px) capturing the visual design of mobile UIs.
  • UI Semantic Annotations (JSON): A JSON file describing all UI components, including buttons, text fields, and icons.
  • View Hierarchies (JSON): A DOM-like structure representing parent-child relationships between UI components.
  • MASC_Features.csv (CSV): File containing extracted features for each UI.
  • Labels.csv (CSV): File containing a (Screen Id, class) pair for each UI.
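The two CSV files can be combined by screen id to obtain labelled feature rows. The sketch below shows one way to do that with the standard library only. It assumes that `Labels.csv` has header columns named `Screen Id` and `class`, and that `MASC_Features.csv` uses the same `Screen Id` column as its key; the feature column names and sample values are invented for illustration, and `read_labels` and `join_features` are hypothetical helpers.

```python
import csv
import io


def read_labels(label_file, id_column="Screen Id", class_column="class"):
    """Map each screen id to its class label."""
    reader = csv.DictReader(label_file)
    return {row[id_column].strip(): row[class_column].strip() for row in reader}


def join_features(feature_file, labels, id_column="Screen Id"):
    """Yield (screen_id, feature_dict, label) for every screen that has a label."""
    for row in csv.DictReader(feature_file):
        screen_id = row.pop(id_column).strip()
        if screen_id in labels:
            yield screen_id, row, labels[screen_id]


# Tiny in-memory stand-ins for the real files; all values are invented.
SAMPLE_LABELS = "Screen Id,class\n101,Login\n102,Chat\n"
SAMPLE_FEATURES = (
    "Screen Id,num_buttons,num_text_fields\n"
    "101,2,2\n"
    "102,1,1\n"
    "103,0,0\n"  # no label for this screen, so it is skipped
)

labels = read_labels(io.StringIO(SAMPLE_LABELS))
rows = list(join_features(io.StringIO(SAMPLE_FEATURES), labels))
```

To run this on the real data, pass file objects opened with `open(path, newline="")` in place of the `io.StringIO` stand-ins and adjust the column names to match the actual headers.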

📁 Repository Structure

    MASC-Dataset/
    ├── code/
    │   ├── masc_classification.py   # Main script for data preprocessing and classification
    │   ├── requirements.txt         # List of dependencies
    │   ├── feature_extraction.py    # Script for extracting UI features
    │   ├── README.md                # Documentation
    ├── data/
    │   ├── raw/                     # Original, unprocessed UI data
    │   ├── processed/               # Cleaned and structured dataset
    ├── README.md                    # Project documentation
    ├── LICENSE                      # Usage license

📥 Accessing the Data

1. Public Datasets Used

| Dataset | Description | Link |
|---------|-------------|------|
| Rico | 72k Android UI screens | Download |
| Enrico | 1,460 curated screens | GitHub |
| Screen2Words | 112k UI descriptions | Download |

📥 Installation & Setup

Ensure you have Python installed, then install the required dependencies:

    pip install -r code/requirements.txt

🚀 Usage

To preprocess data and train the classification model, run:

    python code/feature_extraction.py
    python code/masc_classification.py

🛠 Dependencies

The project uses the following Python libraries:

    numpy==1.23.5
    pandas==1.5.3
    scikit-learn==1.2.2
    xgboost==1.7.6
    matplotlib==3.7.1
    seaborn==0.12.2
    nltk==3.8.1
    joblib==1.2.0
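Because every dependency is pinned to an exact version, it can be useful to confirm that the active environment matches before running the scripts. The sketch below is one possible check using the standard library's `importlib.metadata` (Python 3.8+). It is not part of the repository; `README_PINS`, `parse_pins`, and `find_mismatches` are hypothetical names, and the pin list is copied from the list above.

```python
from importlib import metadata

# Pins copied from the dependency list in this README.
README_PINS = """\
numpy==1.23.5
pandas==1.5.3
scikit-learn==1.2.2
xgboost==1.7.6
matplotlib==3.7.1
seaborn==0.12.2
nltk==3.8.1
joblib==1.2.0
"""


def parse_pins(lines):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (expected, installed)}; installed is None if missing."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    pins = parse_pins(README_PINS.splitlines())
    for name, (expected, installed) in find_mismatches(pins).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version. The same `parse_pins` function also accepts an open file handle, so it could be pointed at `code/requirements.txt` directly.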

Citation

If you use this dataset or code in your research, please cite it as follows:

Ahmed, A. (2025). "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning."
Available at: GitHub Repository
DOI: 10.5281/zenodo.14783065

📜 License

This dataset and source code are licensed under the MIT License.

📧 Contact

For questions or collaborations, contact: Ali Ahmed – ali.ahmed.@mu.edu.eg

Owner

  • Login: Ali-Aahmed
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this dataset or code, please cite the following paper (currently under review at The Visual Computer):"
authors:
  - family-names: Ahmed
    given-names: Ali
    orcid: "0009-0007-6020-4477"
title: "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning"
version: "1.0"
repository-code: "https://github.com/username/MASC-Dataset"
doi: "10.5281/zenodo.14783065"
preferred-citation:
  type: article
  title: "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning"
  authors:
    - family-names: Ahmed
      given-names: Ali
  journal: "The Visual Computer (Under Review)"
  year: 2025
  note: "This paper is currently under review at The Visual Computer."
  doi: "10.5281/zenodo.14783065"

GitHub Events

Total
  • Push event: 21
  • Create event: 5
Last Year
  • Push event: 21
  • Create event: 5

Dependencies

code/requirements.txt pypi
  • joblib ==1.2.0
  • matplotlib ==3.7.1
  • nltk ==3.8.1
  • numpy ==1.23.5
  • pandas ==1.5.3
  • scikit-learn ==1.2.2
  • seaborn ==0.12.2
  • xgboost ==1.7.6