ntu-ureca-code-readability

https://github.com/a-alviento/ntu-ureca-code-readability

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.7%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: A-Alviento
Language: R
Default Branch: main
Size: 1.29 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed about 2 years ago

Metadata Files

Readme Citation

URECA Code Readability Tool

This repository contains the files needed to train a logistic regression model for code readability assessment, together with a simple rule-based feedback tool built on the trained model.

The project assesses code readability using machine learning: a logistic regression model is trained on a dataset of code snippets and then applied to any given snippet to provide readability feedback.
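
For reference, a logistic regression model of this kind maps a snippet's feature vector to a probability of being readable. The notation below is generic (the symbols x_i for features and β_i for learned coefficients are not taken from the repository), so treat it as a sketch of the general form rather than the project's exact model:

```latex
P(\text{readable} \mid \mathbf{x}) = \sigma\!\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```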

Getting Started

Navigate Project

r/ - Contains the R files used to train the logistic regression model for the readability tool.
features-generator.ipynb - A Jupyter Notebook to generate features from the code snippets in the dataset folder.
feedback_tool.ipynb - A Jupyter Notebook that provides readability feedback on given code snippets.
feedback_tool.py - A Python script version of feedback_tool.ipynb, used for the Streamlit deployment.
streamlit_deploy.py - A Python file for deploying the feedback tool as a Streamlit web application.

Installations

Before getting started, you'll need to install the following Python packages. You can do this by running pip install <package-name> for each one:

spacy
pyspellchecker
streamlit
indexer

Next, download the en_core_web_sm model for spaCy, a natural language processing library, by running:

python -m spacy download en_core_web_sm
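
To confirm the installation before running the tool, a quick check along these lines may help. It is a minimal sketch, not part of the repository: the helper name missing_modules is hypothetical, the list uses import names rather than pip names (pyspellchecker is imported as spellchecker), and indexer is left out because the README does not state its import name.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Import names (not pip names): pyspellchecker is imported as `spellchecker`.
# `indexer` is omitted because the README does not state its import name.
REQUIRED = ["spacy", "spellchecker", "streamlit", "en_core_web_sm"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

The check only looks the modules up; it does not import them, so it runs quickly even when spaCy is installed.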

Run

Clone the repository:

git clone <repository-url>

Navigate to the project directory:

cd ntu-ureca-code-readability

Run the Streamlit application:

streamlit run streamlit_deploy.py

Open the provided local URL in your web browser to interact with the application.

Recommended Workflow

The features-generator.ipynb notebook extracts features from the code snippets in the dataset folder, outputting them to a CSV file (feature_matrix_x in the r/ folder) for training the logistic regression model.
The logistic regression model is trained using the R files in the r/ folder, taking the generated CSV file as input.
The feedback_tool.ipynb notebook uses the trained model to provide readability feedback on given code snippets.
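
The three steps above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not the repository's implementation: the features (average line length, comment ratio, maximum indentation), the coefficients, and the feedback thresholds are all hypothetical, whereas the real features come from features-generator.ipynb and the real coefficients from the model trained with the R files.

```python
import math

# Hypothetical coefficients, for illustration only; the real values come
# from the model trained with the R files in r/.
INTERCEPT = 2.0
WEIGHTS = {
    "avg_line_length": -0.03,
    "comment_ratio": 1.5,
    "max_indent": -0.05,
}


def extract_features(code: str) -> dict:
    """Compute a few simple, illustrative features of a code snippet."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    if not lines:
        return {name: 0.0 for name in WEIGHTS}
    comment_lines = sum(1 for ln in lines if ln.lstrip().startswith(("#", "//")))
    return {
        "avg_line_length": sum(len(ln) for ln in lines) / len(lines),
        "comment_ratio": comment_lines / len(lines),
        "max_indent": float(max(len(ln) - len(ln.lstrip()) for ln in lines)),
    }


def readability_probability(features: dict) -> float:
    """Apply the logistic function to the weighted sum of the features."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def feedback(features: dict) -> list:
    """Turn features into messages using simple (hypothetical) threshold rules."""
    messages = []
    if features["avg_line_length"] > 80:
        messages.append("Lines are long on average; consider splitting them.")
    if features["comment_ratio"] < 0.1:
        messages.append("Few comments found; consider documenting key steps.")
    if features["max_indent"] > 16:
        messages.append("Deep nesting detected; consider extracting helper functions.")
    return messages


snippet = "# add two numbers\ndef add(a, b):\n    return a + b\n"
feats = extract_features(snippet)
print(round(readability_probability(feats), 3))
for msg in feedback(feats):
    print(msg)
```

Keeping feature extraction, scoring, and feedback in separate functions mirrors the split described above between the feature notebook, the trained model, and the feedback tool.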

References

S. Scalabrino, M. Linares-Vásquez, R. Oliveto, and D. Poshyvanyk, “A comprehensive model for code readability,” J. Softw. Evol. Process, vol. 30, no. 6, p. e1958, 2018, doi: 10.1002/smr.1958.

Owner

Name: Adrian Alviento
Login: A-Alviento
Kind: user

Repositories: 10
Profile: https://github.com/A-Alviento

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using these metadata."
authors:
  - family-names: "Alviento"
    given-names: "Adrian"
    affiliation: "Nanyang Technological University"
title: "Improving Code Readability for Novice Coders: A Tool for Actionable Feedback"
version: 1.0.0
date-released: "2023-04-07"

repository-code:
  type: "Software"
  url: "https://github.com/your_username/ureca-code-readability-new"
