codeassist

CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++, and other languages.

https://github.com/shibing624/codeassist

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary

Keywords

auto-completion code-autocomplete code-generation gpt-4 gpt2 starcoder wizardcoder
Last synced: 4 months ago

Repository

CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++, and other languages.

Basic Info
  • Host: GitHub
  • Owner: shibing624
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.06 MB
Statistics
  • Stars: 58
  • Watchers: 3
  • Forks: 8
  • Open Issues: 2
  • Releases: 4
Topics
auto-completion code-autocomplete code-generation gpt-4 gpt2 starcoder wizardcoder
Created almost 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Citation

README.md

🇨🇳Chinese | 🌐English | 📖Docs | 🤖Models


CodeAssist: Advanced Code Completion Tool


Introduction

CodeAssist is an advanced code completion tool that intelligently provides high-quality code completions for Python, Java, C++, and other languages.

Features

  • GPT-based code completion
  • Code completion for Python, Java, C++, JavaScript, and more
  • Line and block code completion
  • Train (fine-tune) and predict with the model on your own data

Release Models

| Arch | BaseModel         | Model                                          | Model Size |
|:-----|:------------------|:-----------------------------------------------|:----------:|
| GPT  | gpt2              | shibing624/code-autocomplete-gpt2-base         | 487MB      |
| GPT  | distilgpt2        | shibing624/code-autocomplete-distilgpt2-python | 319MB      |
| GPT  | bigcode/starcoder | WizardLM/WizardCoder-15B-V1.0                  | 29GB       |

Demo

HuggingFace Demo: https://huggingface.co/spaces/shibing624/code-autocomplete

backend model: shibing624/code-autocomplete-gpt2-base

Install

```shell
pip install torch  # or: conda install pytorch
pip install -U codeassist
```

or

```shell
git clone https://github.com/shibing624/codeassist.git
cd codeassist
python setup.py install
```

Usage

WizardCoder model

WizardCoder-15B is bigcode/starcoder fine-tuned with Alpaca code data. You can use the following code to generate code:

example: examples/wizardcoder_demo.py

```python
import sys

sys.path.append('..')
from codeassist import WizardCoder

m = WizardCoder("WizardLM/WizardCoder-15B-V1.0")
print(m.generate('def load_csv_file(file_path):')[0])
```

output:

```python
import csv

def load_csv_file(file_path):
    """
    Load data from a CSV file and return a list of dictionaries.
    """
    # Open the file in read mode
    with open(file_path, 'r') as file:
        # Create a CSV reader object
        csv_reader = csv.DictReader(file)
        # Initialize an empty list to store the data
        data = []
        # Iterate over each row of data
        for row in csv_reader:
            # Append the row of data to the list
            data.append(row)
        # Return the list of data
        return data
```

The model output is impressively effective. It currently supports English and Chinese input; you can enter instructions or code prefixes as required.
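
For example, a minimal sketch of instruction-style input, reusing the `m` instance from the demo above (the instruction text itself is only illustrative):

```python
# Instruction-style prompt instead of a code prefix; generate() is the same
# call as in the demo above, and this particular instruction is illustrative.
print(m.generate('Write a Python function that checks whether a number is prime.')[0])
```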

distilgpt2 model

A code autocomplete model fine-tuned from distilgpt2; you can use it with the following code:

example: examples/distilgpt2_demo.py

```python
import sys

sys.path.append('..')
from codeassist import GPT2Coder

m = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")
print(m.generate('import torch.nn as')[0])
```

output:

```python
import torch.nn as nn
import torch.nn.functional as F
```

Use with huggingface/transformers:

example: examples/use_transformers_gpt2.py
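
For reference, a minimal sketch of loading the checkpoint directly with huggingface/transformers (the sampling parameters are illustrative assumptions, not the example script's exact settings):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")

inputs = tokenizer("import torch.nn as", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=64,          # illustrative generation budget
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```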

Train Model

Train WizardCoder model

example: examples/training_wizardcoder_mydata.py

```shell
cd examples
CUDA_VISIBLE_DEVICES=0,1 python training_wizardcoder_mydata.py --do_train --do_predict --num_epochs 1 --output_dir outputs-wizard --model_name WizardLM/WizardCoder-15B-V1.0
```

  • GPU memory: 31GB
  • Fine-tuning needs 2 × V100 (32GB)
  • Inference needs 1 × V100 (32GB)

Train distilgpt2 model

example: examples/training_gpt2_mydata.py

```shell
cd examples
python training_gpt2_mydata.py --do_train --do_predict --num_epochs 15 --output_dir outputs-gpt2 --model_name gpt2
```

PS: the resulting fine-tuned model is GPT2-python (shibing624/code-autocomplete-gpt2-base); fine-tuning took about 24 hours on a V100.

Server

Start the FastAPI server:

example: examples/server.py

```shell
cd examples
python server.py
```

Open http://0.0.0.0:8001/docs in your browser.
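
A hedged sketch of calling the service with requests; the route and payload below are hypothetical placeholders, so check the /docs page for the actual schema that examples/server.py exposes:

```python
import requests

# Hypothetical route and payload, for illustration only; the real endpoint and
# request schema are listed on the interactive docs page at /docs.
resp = requests.post(
    "http://0.0.0.0:8001/code_completion",
    json={"prompt": "import torch.nn as"},
)
print(resp.json())
```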


Dataset

This allows you to customize dataset building. Below is an example of the building process.

Let's use Python code from Awesome-pytorch-list:

  1. We want the model to help auto-complete code at a general level, and the code of The Algorithms suits that need.
  2. The code from this project is well written (high quality).

dataset tree:

```shell
examples/download/python
├── train.txt
├── valid.txt
└── test.txt
```

There are three ways to build the dataset (a file-writing sketch follows these options):

  1. Use the huggingface/datasets library to load the dataset: https://huggingface.co/datasets/shibing624/source_code

```python
from datasets import load_dataset

dataset = load_dataset("shibing624/source_code", "python")  # python, java, or cpp
print(dataset)
print(dataset['test'][0:10])
```

output:

```shell
DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 5215412
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 10000
    })
    test: Dataset({
        features: ['text'],
        num_rows: 10000
    })
})
{'text': [" {'max_epochs': [1, 2]},\n", ' refit=False,\n', ' cv=3,\n', " scoring='roc_auc',\n", ' )\n', ' search.fit(*data)\n', '', ' def test_module_output_not_1d(self, net_cls, data):\n', ' from skorch.toy import make_classifier\n', ' module = make_classifier(\n']}
```

  2. Download the dataset from the cloud

| Name                        | Source                                    | Download               | Size |
| :-------------------------- | :---------------------------------------- | :--------------------: | :--: |
| Python+Java+CPP source code | Awesome-pytorch-list (5.22 million lines) | github_source_code.zip | 105M |

Download the dataset, unzip it, and put it under examples/.

  3. Get source code from scratch and build the dataset

prepare_code_data.py

```shell
cd examples
python prepare_code_data.py --num_repos 260
```
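
As mentioned above, a minimal sketch that dumps the huggingface dataset splits into the train/valid/test text files from the dataset tree (the directory layout comes from that tree; the split-to-filename mapping is an assumption):

```python
import os

from datasets import load_dataset

dataset = load_dataset("shibing624/source_code", "python")
os.makedirs("examples/download/python", exist_ok=True)

# Assumed mapping from huggingface split names to the filenames in the tree above.
split_to_file = {"train": "train.txt", "validation": "valid.txt", "test": "test.txt"}
for split, filename in split_to_file.items():
    with open(f"examples/download/python/{filename}", "w", encoding="utf-8") as f:
        for row in dataset[split]:
            f.write(row["text"])  # rows carry their own line endings
```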

Contact

  • Issues (suggestions): GitHub issues
  • Email: xuming624@qq.com
  • WeChat: add WeChat ID xuming624 with the note "name-company-NLP" to join the NLP discussion group

Citation

If you use codeassist in your research, please cite it in the following format:

APA:

```latex
Xu, M. codeassist: Code AutoComplete with GPT model (Version 1.0.0) [Computer software]. https://github.com/shibing624/codeassist
```

BibTeX:

```latex
@software{Xu_codeassist,
  author = {Ming Xu},
  title = {CodeAssist: Code AutoComplete with Generation model},
  url = {https://github.com/shibing624/codeassist},
  version = {1.0.0}
}
```

License

This repository is licensed under the Apache License 2.0.

Please follow the Attribution-NonCommercial 4.0 International license when using the WizardCoder model.

Contribute

The project code is still rough. If you have improvements, you are welcome to submit them back to this project. Before submitting, note the following two points:

  • Add corresponding unit tests in tests
  • Run python setup.py test to execute all unit tests and make sure they all pass

After that, you can submit a PR.


Owner

  • Name: xuming
  • Login: shibing624
  • Kind: user
  • Location: Beijing, China
  • Company: @tencent

Senior Researcher, Machine Learning Developer, Advertising Risk Control.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Xu"
  given-names: "Ming"
title: "code-autocomplete: Code AutoComplete with GPT2 model"
url: "https://github.com/shibing624/code-autocomplete"
date-released: 2022-03-01
version: 0.0.4

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 100
  • Total Committers: 2
  • Avg Commits per committer: 50.0
  • Development Distribution Score (DDS): 0.02
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|:-----|:------|:-------:|
| shibing624 | s****4@1****m | 98 |
| flemingxu | f****u@t****m | 2 |
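
The DDS figures above are consistent with the standard definition (an assumption about the convention used here: one minus the top committer's share of commits); for the all-time numbers:

```latex
\mathrm{DDS} = 1 - \frac{\text{commits by top committer}}{\text{total commits}} = 1 - \frac{98}{100} = 0.02
```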
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 4
  • Total pull requests: 0
  • Average time to close issues: 12 days
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mrT23 (1)
  • donghaiwang (1)
  • ChiYeungLaw (1)
  • fade-color (1)
Pull Request Authors
Top Labels
Issue Labels
wontfix (2) question (2) enhancement (1)
Pull Request Labels

Dependencies

requirements.txt pypi
  • loguru *
  • pandas *
  • transformers >=4.6.0
setup.py pypi
  • loguru *
  • pandas *
  • transformers >=4.6.0