patchtrack

PatchTrack: A Comprehensive Analysis of ChatGPT’s Influence on Pull Request Outcomes

https://github.com/unlv-evol/patchtrack

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 3 DOI reference(s) in README
✓
Academic publication links
Links to: zenodo.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.8%) to scientific vocabulary

Last synced: 6 months ago · JSON representation ·

Repository

PatchTrack: A Comprehensive Analysis of ChatGPT’s Influence on Pull Request Outcomes

Basic Info

Host: GitHub
Owner: unlv-evol
License: mit
Language: HTML
Default Branch: main
Size: 38 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed 6 months ago

Metadata Files

Readme License Citation

PatchTrack: A Comprehensive Analysis of ChatGPT’s Influence on Pull Request Outcomes

GitHub last commit (branch) GitHub License

Abstract (Short)

The rapid adoption of large language models (LLMs) like ChatGPT has introduced new dynamics in software development, particularly within pull request workflows. While prior research has examined the quality of AI-generated code, little is known about how developers actually use these suggestions in real-world collaboration. We analyze 338 pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage, including 645 AI-generated snippets and 3,486 developer-authored patches. We introduce PatchTrack, a tool that classifies whether ChatGPT patches were applied, not applied, or not suggested, enabling fine-grained analysis of AI-assisted decisions. Full adoption of ChatGPT code is rare: the median integration rate was 25%. A qualitative analysis of 89 pull requests with integrated patches revealed recurring patterns of structural integration, selective extraction, and iterative refinement, showing that developers typically treat ChatGPT’s output as a starting point rather than a final implementation.

Directory Structure and Description

. ├── LICENSE # The License for the tool - MIT License ├── PatchTrack.py # Main entrypoint of the tool ├── README.md # Readme file to describe how the tool work ├── RQ1_2_3 # Contains all the results for RQ1, RQ2 and RQ3 ├── analyzer # Directory for core modules of the tool │ ├── __init__.py │ ├── analysis.py # Plotting the classification result │ ├── classifier.py # Classifies the dataset as PA, PN or NE │ ├── common.py # Setting n-grams, file types, etc │ ├── constant.py # All the constant variables used in the tool │ ├── dataDict.py # Keep track of extracted pr-project-pair information │ ├── helpers.py # All helper functions e.g. api requests, normalization, etc. │ ├── main.py # Main PatchTrack class │ ├── patchLoader.py # Parse and tokenize PR patches. The file should be a diff format │ ├── sourceLoader.py # Parse and tokenize ChatGPT code snippet │ └── totals.py # Compute total number of PA, PN or NE classified ├── bin │ └── os-packages.sh # OS-specific dependencies ├── dataprep │ ├── __init__.py │ ├── allPullRequestSharings.zip # DevGPT and Extended json dataset │ ├── load.py # Functions for loading datasets │ ├── manual # Procedures on how to generate the extended dataset │ └── patches.zip # Extracted ChatGPT code snippest and PR patches ├── notebooks # Notebooks containing code for running the experiments │ ├── __init__.py │ └── run_experiment.ipynb ├── output # figures and csv files of all the results of running the tool ├── requirements.txt # List of python dependencies required by the tool ├── tests # Testing different component of the tool. This is still a WIP └── tokens-example.txt # Example file containing GitHub API tokens seperated by a comma (,). This should be renamed to `tokens.txt`

Setting up

To setup and test PatchTrack tool on your local computer, following the steps below:

Get the code

The easiest way is using the git clone command:

bash git clone https://github.com/replication-pack/PatchTrack.git

Minimum System Requirements

Operating System: Mac0SX, Linux, Windows
RAM: >= 4 GB
Storage: >= 1 GB
Processor: CPU 1.18 GHz or greater #### Other tools
Git, Python >= 3.10 ### Python Virtual Environment Let's set python virtual environment;

```bash cd PatchTrack/

python3 -m venv venv ``` Activate the virtual environment

bash source venv/bin/activate

Dependencies and Dataset

PatchTrack consist of two categories of depencies i.e. (i) OS specific dependencies and (ii) development dependencies. The OS specific dependency is libmagic. To dependencis will be installed automatically when you start the tool.

Datasets are stored in the dataprep directory in zipped files. This will be automatically extracted and placed in the right directory using the step below. Now, let us install the dependencies and load the required datasets.

bash python PatchTrack.py --init The above command will install all the required packages, set directories and unzip datasets for the smooth execution of the tool. Note: PatchTrack has been tested on python >= 3.10

You can also install the OS-specific libraries manually on Ubuntu/Debian or MacOS X, by runing the shell script in the bin directory.

bash cd bin/ chmod +x os-package.sh ./os-package.sh The above code will automatically detect the OS (Linux or MacOS X) and install the libraries. Note: This is required before installing development specific dependencies.

Running the tool

Notebook - Reproducing the results

This is the easiest approach to test the tool and reproduce the classification results presented in the paper. In the notebooks directory, simply run the run_experiment.ipynb file.

Console

If you wish to play with PatchTrack, there are a number of command line arguments for configuring different part of the tool. Some of those option are actively being improved. We are developing a comprehensive documentation of the tool which will be uploaded soon.

List of Commands

Information of the other command line options are provided by using -h or --help ``` usage: PatchTrack.py [-h] [-i] [-n NUM] [-c NUM] [-v] [-p STR] [-s SOURCE_PATH] [-r]

options: -h, --help show this help message and exit -i, --init setup required datasets and directories. This command should be executed at least once -n NUM, --ngram NUM use n-gram of NUM lines (default: 1) -c NUM, --context NUM print NUM lines of context (default: 10) -v, --verbose enable verbose mode (default: False) -p STR, --patchpath STR path to ChatGPT and PR patch files (default: data/patches) -s SOURCEPATH, --sourcepath SOURCEPATH path to json files of extracted ChatGPT conversations and code snippets (default: data/extracted) -r, --restore restore default setting, files and directories ```

GitHub Tokens

We use GitHub tokens when extracting PR patches. This allows for higher rate limit because of the high number of requests to the GitHub API. Tokens can be set in the tokens.txt file seperated by a comman. The user can add as many tokens as needed. A minimal of 2 tokens can be used to safely execute code and to make sure that the rate limit is not reached for a token.

License

This repository is MIT licensed. See the LICENSE file for more information.

Owner

Name: UNLV EVOL
Login: unlv-evol
Kind: organization
Email: john.businge@unlv.edu

Website: https://johnxu21.github.io/evol/
Repositories: 1
Profile: https://github.com/unlv-evol

Advancing Empirical Software Engineering Research

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ogenrwot"
  given-names: "Daniel"
  orcid: "https://orcid.org/0000-0002-0133-8164"
- family-names: "Businge"
  given-names: "John"
  orcid: "https://orcid.org/0000-0003-3206-7085"
title: "Replication Package for PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes"
version: 1.0.0
doi: 10.5281/zenodo.14978625
date-released: 2025-03-6
url: "https://github.com/evol-lab/PatchTrack"

GitHub Events

Total

Delete event: 1
Push event: 4
Create event: 4

Last Year

Delete event: 1
Push event: 4
Create event: 4

Dependencies

requirements.txt pypi

PyDriller *
beautifulsoup4 *
bitarray *
ipykernel *
ipython *
matplotlib *
mkdocs *
mkdocs-material *
mkdocstrings *
numpy *
pandas *
plotly *
pytest *
python-magic *
python-magic-bin *
requests *
unidiff *

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

patchtrack

Science Score: 67.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

PatchTrack: A Comprehensive Analysis of ChatGPT’s Influence on Pull Request Outcomes

Abstract (Short)

Directory Structure and Description

Setting up

Get the code

Minimum System Requirements

Dependencies and Dataset

Running the tool

Notebook - Reproducing the results

Console

List of Commands

GitHub Tokens

License

Owner

Citation (CITATION.cff)

GitHub Events

Total

Last Year

Dependencies