masc-dataset
Code and data for the MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Ali-Aahmed
- License: mit
- Language: Python
- Default Branch: main
- Size: 159 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🚀 MASC Dataset: Mobile Application Screen Classification
📌 Overview
MASC (Mobile Application Screen Classification) is a manually curated dataset containing 7,065 mobile UI screens classified into 10 distinct categories. Designed for UI/UX research and ML applications, it enables:
- 📱 Accurate screen type classification
- 🤖 Automated UI testing
- 🎨 Design pattern analysis
🌟 Key Features
- Multi-modal Data: Screenshots + JSON hierarchies + Semantic annotations
- High Quality: 3-step manual validation process ✅
- ML-Ready: Pre-extracted feature vectors for 11 UI characteristics
Design topics overview
| Topic name | Num. UIs | Description |
| :--- | ---: | :--- |
| Chat | 329 | Chat functionality |
| List | 960 | Elements organized in a column |
| Login | 889 | Input fields for logging in |
| Maps | 500 | Geographic display |
| Menu | 557 | Items list in an overlay or aside |
| Profile | 526 | Info on a user profile or product |
| Search | 725 | Search engine functionality |
| Settings | 629 | Controls to change app settings |
| Welcome | 1084 | First-run experience |
| Home | 163 | Home screen |
| **Total UIs** | **7065** | |
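To check the per-topic counts against a downloaded copy of the dataset, you can tally the labels file directly. This is a minimal sketch that assumes Labels.csv has a header row with a column named `class`. The README only says the file holds (Screen Id, class) pairs, so the exact header text is an assumption, and `class_distribution` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Assumed header name for the label column in Labels.csv.
LABEL_COLUMN = "class"


def class_distribution(csv_file: TextIO, label_column: str = LABEL_COLUMN) -> Counter:
    """Count how many screens carry each class label."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column].strip() for row in reader)


if __name__ == "__main__":
    # Tiny inline stand-in for Labels.csv; replace with open("Labels.csv", newline="").
    sample = io.StringIO("Screen Id,class\n101,Login\n102,Chat\n103,Login\n")
    counts = class_distribution(sample)
    for label, n in counts.most_common():
        print(f"{label}: {n}")
    print("Total:", sum(counts.values()))
```

Summing the counter gives the total number of labeled screens, which can be compared with the total row of the table.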
📂 MASC Dataset
- Full Dataset: Download from Kaggle
- Samples:
📸 Raw Screenshot | 📝 Raw JSON | 📊 semantic_JSON
📂 Dataset Structure
The dataset is organized into multiple components, each representing a different aspect of the UI:
- Screenshot Images: High-resolution images (JPG, 540x960 px) capturing the visual design of mobile UIs.
- UI Semantic Annotations (JSON): A JSON file describing all UI components, including buttons, text fields, and icons.
- View Hierarchies (JSON): A DOM-like structure representing parent-child relationships between UI components.
- MASC_Features.csv (CSV): Extracted feature vectors, one row per UI.
- Labels.csv (CSV): The class label of each UI, stored as (Screen Id, class) pairs.
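The JSON components are trees of UI elements, so most per-screen features start with a walk over that tree. The sketch below counts component labels in one file. It assumes Rico-style conventions: nodes are dicts with a `componentLabel` key and a `children` list, and raw view hierarchies wrap the tree as `{"activity": {"root": ...}}`. These key names, and the helpers `unwrap_root` and `count_components`, are assumptions for illustration, not necessarily what `feature_extraction.py` does.

```python
import json
from collections import Counter
from typing import Any, Dict


def unwrap_root(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return the tree root, assuming Rico's raw layout {"activity": {"root": ...}}.

    Semantic-annotation files are assumed to already be a bare tree.
    """
    activity = document.get("activity")
    if isinstance(activity, dict) and isinstance(activity.get("root"), dict):
        return activity["root"]
    return document


def count_components(root: Dict[str, Any], key: str = "componentLabel") -> Counter:
    """Count how often each component label occurs anywhere in a UI tree.

    Walks the tree depth-first with an explicit stack, so deeply nested
    layouts cannot hit Python's recursion limit.
    """
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        label = node.get(key)
        if label is not None:
            counts[label] += 1
        # Leaf nodes may omit "children" or set it to null.
        for child in node.get("children") or []:
            if isinstance(child, dict):
                stack.append(child)
    return counts


if __name__ == "__main__":
    # Inline stand-in for one semantic JSON file; swap in json.load(open(path)).
    doc = json.loads(
        '{"componentLabel": "Toolbar", "children": ['
        '{"componentLabel": "Text Button"},'
        '{"componentLabel": "Input", "children": [{"componentLabel": "Text Button"}]}]}'
    )
    print(count_components(unwrap_root(doc)))
```

Counts like these (for example, how many input fields or buttons a screen contains) are the kind of signal that separates a Login screen from a List screen.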
📁 Repository Structure
```text
MASC-Dataset/
├── code/
│   ├── masc_classification.py  # Main script for data preprocessing and classification
│   ├── requirements.txt        # List of dependencies
│   ├── feature_extraction.py   # Script for extracting UI features
│   └── README.md               # Documentation
├── data/
│   ├── raw/                    # Original, unprocessed UI data
│   └── processed/              # Cleaned and structured dataset
├── README.md                   # Project documentation
└── LICENSE                     # Usage license
```
📥 Accessing the Data
1. Public Datasets Used
| Dataset | Description | Link |
|---------|-------------|------|
| Rico | 72k Android UI screens | Download |
| Enrico | 1,460 curated screens | GitHub |
| Screen2Words | 112k UI descriptions | Download |
📥 Installation & Setup
Ensure you have Python installed, then install the required dependencies:
```bash
pip install -r code/requirements.txt
```
🚀 Usage
To preprocess data and train the classification model, run:
```bash
python code/feature_extraction.py
python code/masc_classification.py
```
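The README does not say which model `masc_classification.py` trains (both scikit-learn and xgboost are pinned). As a rough sketch of a comparable setup, assume you have already joined MASC_Features.csv with Labels.csv into a numeric feature matrix `X` (11 columns, one per UI characteristic) and a label vector `y`. The random forest here is a swapped-in scikit-learn model, not the authors' method, and `train_screen_classifier` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_screen_classifier(X, y, test_size=0.2, seed=0):
    """Fit a classifier on per-screen feature vectors and return (model, hold-out accuracy)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Stratify so the smallest classes (e.g. Home, Chat) appear in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    # Synthetic stand-in for the real data: 11 features per screen, 10 classes.
    X, y = make_classification(
        n_samples=500, n_features=11, n_informative=8, n_classes=10,
        n_clusters_per_class=1, random_state=0,
    )
    model, acc = train_screen_classifier(X, y)
    print(f"hold-out accuracy: {acc:.3f}")
```

Stratified splitting matters here because the topic table is imbalanced (Welcome has more than six times as many screens as Home); a plain random split could leave a small class underrepresented in the test set.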
🛠 Dependencies
The project uses the following Python libraries:
```text
numpy==1.23.5
pandas==1.5.3
scikit-learn==1.2.2
xgboost==1.7.6
matplotlib==3.7.1
seaborn==0.12.2
nltk==3.8.1
joblib==1.2.0
```
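Because every dependency is pinned to an exact version, results may differ if your environment has other versions installed. A small sketch using only the standard library's `importlib.metadata` can report which pins are not satisfied; it handles only exact `name==version` lines, and `check_pins` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def check_pins(requirements: str) -> Dict[str, Optional[str]]:
    """Map each unsatisfied exact pin to its installed version (None if not installed)."""
    mismatches: Dict[str, Optional[str]] = {}
    for line in requirements.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Only exact "name==version" pins are handled in this sketch.
        if "==" not in line:
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    from pathlib import Path

    req = Path("code/requirements.txt")
    if req.exists():
        for pkg, got in check_pins(req.read_text()).items():
            print(f"{pkg}: pin not satisfied, found {got or 'not installed'}")
```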
Citation
If you use this dataset or code in your research, please cite it as follows:
Ahmed, A. (2025). "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning."
Available at: GitHub Repository
DOI: 10.5281/zenodo.14783065
📜 License
This dataset and source code are licensed under the MIT License.
📧 Contact
For questions or collaborations, contact: Ali Ahmed – ali.ahmed.@mu.edu.eg
Owner
- Login: Ali-Aahmed
- Kind: user
- Repositories: 1
- Profile: https://github.com/Ali-Aahmed
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this dataset or code, please cite the following paper (currently under review at The Visual Computer):"
authors:
  - family-names: Ahmed
    given-names: Ali
    orcid: "0009-0007-6020-4477"
title: "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning"
version: "1.0"
repository-code: "https://github.com/username/MASC-Dataset"
doi: "10.5281/zenodo.14783065"
preferred-citation:
  type: article
  title: "MASC Dataset: A Novel Resource for Classifying Mobile Application Screens using Machine Learning"
  authors:
    - family-names: Ahmed
      given-names: Ali
  journal: "The Visual Computer (Under Review)"
  year: 2025
  note: "This paper is currently under review at The Visual Computer."
  doi: "10.5281/zenodo.14783065"
```
GitHub Events
Total
- Push event: 21
- Create event: 5
Last Year
- Push event: 21
- Create event: 5
Dependencies
- joblib ==1.2.0
- matplotlib ==3.7.1
- nltk ==3.8.1
- numpy ==1.23.5
- pandas ==1.5.3
- scikit-learn ==1.2.2
- seaborn ==0.12.2
- xgboost ==1.7.6