codeassist
CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++, and more.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.5%) to scientific vocabulary
Keywords
Repository
CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++, and more.
Basic Info
Statistics
- Stars: 58
- Watchers: 3
- Forks: 8
- Open Issues: 2
- Releases: 4
Topics
Metadata Files
README.md
🇨🇳Chinese | 🌐English | 📖Docs | 🤖Models
CodeAssist: Advanced Code Completion Tool
Introduction
CodeAssist is an advanced code completion tool that intelligently provides high-quality code completions for Python, Java, C++, and more.
Features
- GPT-based code completion
- Code completion for Python, Java, C++, JavaScript, and more
- Line and block code completion
- Train (fine-tune) and predict with your own data
Release Models
| Arch | BaseModel | Model | Model Size |
|:-----|:----------|:------|:----------:|
| GPT | gpt2 | shibing624/code-autocomplete-gpt2-base | 487MB |
| GPT | distilgpt2 | shibing624/code-autocomplete-distilgpt2-python | 319MB |
| GPT | bigcode/starcoder | WizardLM/WizardCoder-15B-V1.0 | 29GB |
Demo
HuggingFace Demo: https://huggingface.co/spaces/shibing624/code-autocomplete
Backend model: shibing624/code-autocomplete-gpt2-base
Install
```shell
pip install torch  # conda install pytorch
pip install -U codeassist
```
or
```shell
git clone https://github.com/shibing624/codeassist.git
cd CodeAssist
python setup.py install
```
Usage
WizardCoder model
WizardCoder-15B is bigcode/starcoder fine-tuned on Alpaca-format code data. You can use the following code to generate code:
example: examples/wizardcoder_demo.py
```python
import sys

sys.path.append('..')
from codeassist import WizardCoder

m = WizardCoder("WizardLM/WizardCoder-15B-V1.0")
print(m.generate('def load_csv_file(file_path):')[0])
```
output:
```python
import csv

def load_csv_file(file_path):
    """
    Load data from a CSV file and return a list of dictionaries.
    """
    # Open the file in read mode
    with open(file_path, 'r') as file:
        # Create a CSV reader object
        csv_reader = csv.DictReader(file)
        # Initialize an empty list to store the data
        data = []
        # Iterate over each row of data
        for row in csv_reader:
            # Append the row of data to the list
            data.append(row)
        # Return the list of data
        return data
```
The model output is impressively effective. It currently supports English and Chinese input; you can enter instructions or code prefixes as required.
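For illustration, here is a short sketch of both prompt styles, reusing the `generate` call from the snippet above (the prompts themselves are made up for this example):

```python
# Usage sketch reusing the WizardCoder API shown above; prompts are illustrative.
from codeassist import WizardCoder

m = WizardCoder("WizardLM/WizardCoder-15B-V1.0")

# Instruction-style prompt (English or Chinese, per the note above)
print(m.generate('Write a Python function that reverses a string.')[0])

# Code-prefix prompt: the model continues the code
print(m.generate('def reverse_string(s):')[0])
```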
distilgpt2 model
distilgpt2 is a fine-tuned code autocomplete model; you can use the following code:
example: examples/distilgpt2_demo.py
```python
import sys

sys.path.append('..')
from codeassist import GPT2Coder

m = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")
print(m.generate('import torch.nn as')[0])
```
output:
```shell
import torch.nn as nn
import torch.nn.functional as F
```
Use with huggingface/transformers:
example: examples/use_transformers_gpt2.py
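The example file above is the repo's own; as a rough sketch of what direct huggingface/transformers usage could look like (assuming the released checkpoint is a standard GPT-2 causal LM on the Hugging Face Hub; the sampling parameters are illustrative):

```python
# Minimal sketch, not the repo's examples/use_transformers_gpt2.py.
# Assumes the checkpoint is a standard GPT-2 causal LM on the Hugging Face Hub.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-distilgpt2-python")

prompt = "import torch.nn as"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=64,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```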
Train Model
Train WizardCoder model
example: examples/training_wizardcoder_mydata.py
```shell
cd examples
CUDA_VISIBLE_DEVICES=0,1 python training_wizardcoder_mydata.py --do_train --do_predict --num_epochs 1 --output_dir outputs-wizard --model_name WizardLM/WizardCoder-15B-V1.0
```
- GPU memory: 31GB
- Fine-tuning needs 2 x V100 (32GB)
- Inference needs 1 x V100 (32GB)
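After training, loading the fine-tuned checkpoint for prediction might look like the sketch below; this assumes the WizardCoder constructor also accepts a local output directory, which this README does not confirm:

```python
# Sketch only: assumes WizardCoder can load a local checkpoint directory
# (outputs-wizard is the --output_dir from the training command above).
from codeassist import WizardCoder

m = WizardCoder("outputs-wizard")
print(m.generate('import numpy as np')[0])
```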
Train distilgpt2 model
example: examples/training_gpt2_mydata.py
```shell
cd examples
python training_gpt2_mydata.py --do_train --do_predict --num_epochs 15 --output_dir outputs-gpt2 --model_name gpt2
```
PS: the resulting fine-tuned model is GPT2-python (shibing624/code-autocomplete-gpt2-base); fine-tuning took about 24 hours on a V100.
Server
Start the FastAPI server:
example: examples/server.py
```shell
cd examples
python server.py
```
Open the URL: http://0.0.0.0:8001/docs
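examples/server.py is the repo's implementation; a minimal sketch of such a completion service (assuming FastAPI and uvicorn plus the GPT2Coder API shown earlier; the /complete endpoint name is illustrative) might look like:

```python
# Minimal sketch of a completion service, not the repo's examples/server.py.
# Assumes FastAPI/uvicorn and the GPT2Coder API shown earlier; /complete is illustrative.
import uvicorn
from fastapi import FastAPI
from codeassist import GPT2Coder

app = FastAPI()
model = GPT2Coder("shibing624/code-autocomplete-distilgpt2-python")

@app.get("/complete")
def complete(prompt: str):
    # Return the first completion for the given code prefix
    return {"completion": model.generate(prompt)[0]}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8001)
```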

Dataset
Dataset building can be customized; below is an example of the process.
Let's use Python code from Awesome-pytorch-list:
- We want the model to help auto-complete code at a general level; the code of The Algorithms suits the need.
- The code from this project is well written (high-quality code).
dataset tree:
```shell
examples/download/python
├── train.txt
├── valid.txt
└── test.txt
```
There are three ways to build the dataset:
1. Load the dataset with the huggingface/datasets library: https://huggingface.co/datasets/shibing624/source_code
```python
from datasets import load_dataset

dataset = load_dataset("shibing624/source_code", "python")  # python, java, or cpp
print(dataset)
print(dataset['test'][0:10])
```
output:
```shell
DatasetDict({
train: Dataset({
features: ['text'],
num_rows: 5215412
})
validation: Dataset({
features: ['text'],
num_rows: 10000
})
test: Dataset({
features: ['text'],
num_rows: 10000
})
})
{'text': [
" {'max_epochs': [1, 2]},\n",
' refit=False,\n', ' cv=3,\n',
" scoring='roc_auc',\n", ' )\n',
' search.fit(*data)\n',
'',
' def test_module_output_not_1d(self, net_cls, data):\n',
' from skorch.toy import make_classifier\n',
' module = make_classifier(\n'
]}
```
2. Download the dataset from the cloud:

| Name | Source | Download | Size |
|:-----|:-------|:--------:|:----:|
| Python+Java+CPP source code | Awesome-pytorch-list (5.22 million lines) | github_source_code.zip | 105M |

Download the dataset, unzip it, and put it under examples/.
3. Get the source code from scratch and build the dataset yourself (a splitting sketch follows this list):
```shell
cd examples
python prepare_code_data.py --num_repos 260
```
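As referenced in item 3, here is a hypothetical sketch of the final step: writing collected .py files into the train/valid/test layout shown in the dataset tree above (this is not the repo's prepare_code_data.py; the source directory and split ratios are illustrative):

```python
# Hypothetical sketch, not the repo's prepare_code_data.py: write collected .py
# sources into the train/valid/test layout from the dataset tree above.
import glob
import os
import random

files = glob.glob("download/raw/**/*.py", recursive=True)  # illustrative source dir
random.seed(42)
random.shuffle(files)

n = len(files)
splits = {
    "train.txt": files[:int(0.9 * n)],               # 90% train (illustrative ratio)
    "valid.txt": files[int(0.9 * n):int(0.95 * n)],  # 5% validation
    "test.txt":  files[int(0.95 * n):],              # 5% test
}

os.makedirs("download/python", exist_ok=True)
for name, split_files in splits.items():
    with open(os.path.join("download/python", name), "w", encoding="utf-8") as out:
        for path in split_files:
            with open(path, encoding="utf-8", errors="ignore") as src:
                out.write(src.read() + "\n")
```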
Contact

Citation
If you use codeassist in your research, please cite it in the following format:
APA:
```latex
Xu, M. codeassist: Code AutoComplete with GPT model (Version 1.0.0) [Computer software]. https://github.com/shibing624/codeassist
```
BibTeX:
```latex
@software{Xu_codeassist,
  author = {Ming Xu},
  title = {CodeAssist: Code AutoComplete with Generation model},
  url = {https://github.com/shibing624/codeassist},
  version = {1.0.0}
}
```
License
This repository is licensed under the Apache License 2.0.
Please follow the Attribution-NonCommercial 4.0 International license when using the WizardCoder model.
Contribute
The project code is still rough. If you can improve it, you are welcome to submit your changes back to this project. Before submitting, please note two points:
- Add corresponding unit tests in `tests`
- Run `python setup.py test` and make sure all unit tests pass

After that, you can submit a PR.
Reference
Owner
- Name: xuming
- Login: shibing624
- Kind: user
- Location: Beijing, China
- Company: @tencent
- Website: https://blog.csdn.net/mingzai624
- Repositories: 32
- Profile: https://github.com/shibing624
Senior Researcher, Machine Learning Developer, Advertising Risk Control.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Xu"
    given-names: "Ming"
title: "code-autocomplete: Code AutoComplete with GPT2 model"
url: "https://github.com/shibing624/code-autocomplete"
date-released: 2022-03-01
version: 0.0.4
```
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| shibing624 | s****4@1****m | 98 |
| flemingxu | f****u@t****m | 2 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 4
- Total pull requests: 0
- Average time to close issues: 12 days
- Average time to close pull requests: N/A
- Total issue authors: 4
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mrT23 (1)
- donghaiwang (1)
- ChiYeungLaw (1)
- fade-color (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- loguru *
- pandas *
- transformers >=4.6.0