https://github.com/alicerunsonfedora/gh-twilight
Predict a repository's size based on a contributor's commit history.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: alicerunsonfedora
- License: mpl-2.0
- Language: Python
- Default Branch: master
- Homepage: https://marquiskurt.net/gh-twilight/
- Size: 111 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Project Twilight
Project Twilight is a machine learning experiment for my DMC345 (Intro to Machine Learning) course that tries to predict a Git repository's size from a list of numbers representing the commits made on each day of the week. The tool uses scikit-learn, matplotlib, and NumPy for data manipulation and analysis, and PyGithub to fetch data from GitHub.
Installation
Install via PyPI (TBD)
Run pip install gh-twilight to install the tool to your Python environment.
Build from source
Requirements
- Python 3.7 or higher
- Poetry package manager
Clone the repository and run poetry install in the project root to set up the environment and install dependencies.
Creating a configuration file
To create the config file that Project Twilight uses, run gh-twilight --generate in your terminal. The configuration utility will help you set up some details about what models you want to use for training, your GitHub access token, and how you want to predict your data.
Running the utility
Configuration arguments
- --config CONFIG: The path to the configuration file to use.
- --generate: Runs the interactive configuration utility.
Analysis and prediction arguments
- --plot: Creates plot graphs of predicted and testing data from training the network.
- --predict: Run predictions on the data provided in the configuration file.
Extra arguments
- --log-file LOG_FILE: The path where the logs should be stored. Omitting this argument disables logging.
- --csv: Exports the raw dataset to a CSV file before analysis.
- --json: Exports the raw dataset to a JSON file before analysis.
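As a rough illustration of how the flags listed above fit together, here is a minimal `argparse` sketch. It is not the project's actual parser: the group titles, help strings, and the `sparkle.toml` file name are assumptions made for this example, and only the flag names come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="gh-twilight")

    # Configuration arguments
    config = parser.add_argument_group("configuration")
    config.add_argument("--config", metavar="CONFIG",
                        help="path to the configuration file to use")
    config.add_argument("--generate", action="store_true",
                        help="run the interactive configuration utility")

    # Analysis and prediction arguments
    analysis = parser.add_argument_group("analysis and prediction")
    analysis.add_argument("--plot", action="store_true",
                          help="plot predicted and testing data")
    analysis.add_argument("--predict", action="store_true",
                          help="run predictions from the configuration file")

    # Extra arguments
    extra = parser.add_argument_group("extra")
    extra.add_argument("--log-file", metavar="LOG_FILE", default=None,
                       help="where to store logs; omitted means no logging")
    extra.add_argument("--csv", action="store_true",
                       help="export the raw dataset to CSV before analysis")
    extra.add_argument("--json", action="store_true",
                       help="export the raw dataset to JSON before analysis")
    return parser


# "sparkle.toml" is a hypothetical file name used only for this example.
args = build_parser().parse_args(["--config", "sparkle.toml", "--predict", "--csv"])
print(args.config, args.predict, args.log_file)
```

In this sketch, leaving out `--log-file` yields `None`, which a caller could treat as "logging disabled", matching the behavior described above.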
Sparkle configuration file
The configuration file (in TOML syntax) contains important information on how to collect data, what data to collect, and how to run analysis and predictions. There are three important keys in the configuration file:
- config.account: Includes GitHub personal token and Git username.
- config.activities: Includes what repository to use as training data and what models to use.
- config.predictions: Includes what model to use to make predictions and inputs to predict.
Account information
The config.account section includes the following keys:
- git_name: The Git username that made the commits to the repository.
- token: The GitHub personal token with the repo permission.
Activity configuration
The config.activities section includes the following keys:
- models: A list of strings containing what models to use. Valid options are forest, neural, and linear.
- repos: A list of strings containing the repositories on GitHub to use as training data.
Prediction configuration
The config.predictions section includes the following keys:
- method: The model to use to make predictions. Valid options are forest, neural, linear, and best.
  - Using best automatically selects the model with the highest R² accuracy score.
- inputs: A list of dictionaries that contain the input values to predict. Each dictionary should have the following keys:
  - name: The name of the repository. This does not need to point to a real repository on GitHub.
  - commits: A list of seven integers giving the number of commits made on each day of the week, summed over all weeks. For example, if a user makes two commits to a repository every day for two weeks, the commits list should be [4, 4, 4, 4, 4, 4, 4].
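The commits list can be thought of as a per-weekday tally over the whole history. The sketch below shows one way to build it from a list of commit dates; the helper name `weekday_totals` is hypothetical, and the ordering (index 0 is Monday, following Python's `date.weekday()`) is an assumption, since the README does not say which day the list starts on.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_totals(commit_dates):
    """Sum commits per weekday across all weeks.

    Index 0 is Monday here (an assumption; the tool may order days differently).
    """
    counts = Counter(d.weekday() for d in commit_dates)
    return [counts.get(day, 0) for day in range(7)]


# Two commits a day for two weeks (14 consecutive days), as in the example above.
start = date(2020, 1, 6)  # an arbitrary starting day
dates = [start + timedelta(days=offset) for offset in range(14) for _ in range(2)]
print(weekday_totals(dates))  # [4, 4, 4, 4, 4, 4, 4]
```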
An example configuration may look like the following:

```toml
[config.account]
git_name = "Twilight Sparkle"
token = "githash"

[config.activities]
models = [ "forest", "linear" ]
repos = [ "equestria/friendship.equ", "equestria/governance" ]

[config.predictions]
method = "best"
inputs = [{ name = "equestria/journal", commits = [1, 13, 9, 8, 7, 12, 8] }]
```
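The example sets method to best, which the README describes as choosing the model with the highest R² score. The following is a minimal pure-Python sketch of that idea, not the project's implementation: the function names and the numbers are made up for illustration, and the real tool presumably relies on scikit-learn's scoring instead.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes `actual` is not constant, otherwise SS_tot would be zero.
    """
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pick_best(actual, predictions_by_model):
    """Return the model name with the highest R² along with all scores."""
    scores = {name: r2_score(actual, preds)
              for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up repository sizes and made-up test-set predictions per model.
actual = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "forest": [110.0, 190.0, 310.0, 390.0],
    "linear": [150.0, 150.0, 350.0, 350.0],
}
best, scores = pick_best(actual, predictions)
print(best)  # forest
```

With these made-up values, forest scores about 0.992 and linear scores 0.8, so forest would be used for the predictions.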
License
Project Twilight is free and open-source software licensed under the Mozilla Public License, v2.0.
Owner
- Name: Marquis Kurt
- Login: alicerunsonfedora
- Kind: user
- Location: Bear, DE
- Website: http://www.marquiskurt.net
- Repositories: 100
- Profile: https://github.com/alicerunsonfedora
[mar.kɪs kɚrt] He/him. iOS app and game developer.
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Marquis Kurt | s****e@m****t | 12 |
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0