https://github.com/big-data-lab-umbc/chatgpt-comparison-detection

Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥

Basic Info

Host: GitHub
Owner: big-data-lab-umbc
Default Branch: main
Homepage: https://arxiv.org/abs/2301.07597
Size: 27.3 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of Hello-SimpleAI/chatgpt-comparison-detection

Created about 3 years ago · Last pushed over 3 years ago

# ChatGPT-Comparison-Detection Project 

![](https://img.shields.io/badge/Languages-%20English%2C%20Chinese-brightgreen) 
![](https://img.shields.io/badge/ChatGPT-Corpus%2C%20Detector-blue)

Official repository of the paper ["How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection"](https://arxiv.org/abs/2301.07597). Please star, watch, and fork our repo to follow active updates!

See also the [Feedback Space for Detectors](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection/discussions/2); please feel free to leave your feedback there!





---
### Human ChatGPT Comparison Corpus (HC3)
We propose the first **Human vs. ChatGPT** comparison corpus, named **HC3**.



The first version of the HC3 datasets is now available on Hugging Face Datasets:
- [HC3-English](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
- [HC3-Chinese](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese)


The HC3 datasets are also available on ModelScope:
- [HC3-English](https://www.modelscope.cn/datasets/simpleai/HC3)
- [HC3-Chinese](https://www.modelscope.cn/datasets/simpleai/HC3-Chinese)


> For the train/test splits and the filtered versions used in the paper, refer to the Google Drive links in [HC3/README.md](HC3/README.md).
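
To train a detector on the corpus, each question usually has to be paired with its individual answers and a label. The sketch below shows one way to do that. It assumes each record carries a `question` string plus `human_answers` and `chatgpt_answers` lists; these field names come from the dataset card, not from this README, and the helper name `flatten_hc3` and the 0/1 label convention are purely illustrative.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record layout (from the dataset card, not from this README):
# a "question" string plus the lists "human_answers" and "chatgpt_answers".
Record = Dict[str, object]


def flatten_hc3(records: Iterable[Record]) -> List[Tuple[str, str, int]]:
    """Turn HC3-style records into (question, answer, label) triples.

    Label 0 marks a human answer and label 1 a ChatGPT answer
    (an illustrative convention, not one defined by the project).
    """
    samples = []
    for rec in records:
        question = rec["question"]
        for answer in rec.get("human_answers", []):
            samples.append((question, answer, 0))
        for answer in rec.get("chatgpt_answers", []):
            samples.append((question, answer, 1))
    return samples


# Tiny made-up record so the sketch runs without downloading anything.
toy = [{
    "question": "What is a neural network?",
    "human_answers": ["A model loosely inspired by the brain."],
    "chatgpt_answers": ["A neural network is a computational model ..."],
}]
pairs = flatten_hc3(toy)
```

With the Hugging Face `datasets` library installed, the records returned by `load_dataset` for `Hello-SimpleAI/HC3` could be passed to the same helper. The configuration and split names to use are not stated here, so check the dataset card first.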

### Dataset Copyright

If a source dataset used in this corpus has a license that is stricter than CC-BY-SA, our products follow that license.
Otherwise, they follow the CC-BY-SA license.

| English Split | Source | Source License | Note |
|----------|-------------|--------|-------------|
| reddit_eli5 | [ELI5](https://github.com/facebookresearch/ELI5) | BSD License | |
| open_qa | [WikiQA](https://www.microsoft.com/en-us/download/details.aspx?id=52419) | [PWC Custom](https://paperswithcode.com/datasets/license) | |
| wiki_csai | Wikipedia | CC-BY-SA | [Wiki FAQ](https://en.wikipedia.org/wiki/Wikipedia:FAQ/Copyright) |
| medicine | [Medical Dialog](https://github.com/UCSD-AI4H/Medical-Dialogue-System) | Unknown | [Asking](https://github.com/UCSD-AI4H/Medical-Dialogue-System/issues/10) |
| finance | [FiQA](https://paperswithcode.com/dataset/fiqa-1) | Unknown | Asking |

| Chinese Split | Source | Source License | Note |
|----------|-------------|-----------|-------------|
| open_qa | [WebTextQA & BaikeQA](https://github.com/brightmart/nlp_chinese_corpus) | MIT license | |
| baike | Baidu Baike | None | |
| nlpcc_dbqa | [NLPCC-DBQA](https://github.com/msra-nlc/ChineseDBQA) | Unknown | [Asking](https://github.com/UCSD-AI4H/Medical-Dialogue-System/issues/10) |
| medicine | [Chinese Medical Dialogue](https://tianchi.aliyun.com/dataset/90163) | CC-BY-NC 4.0 | |
| finance | [FinanceZhidao](https://www.heywhale.com/mw/dataset/5e9588f8e7ec38002d0331b1/content) | CC-BY 4.0 | |
| psychology | [On Baidu AI Studio](https://aistudio.baidu.com/aistudio/datasetdetail/38489) | CC0 | |
| law | [LegalQA](https://github.com/siatnlp/LegalQA) | Unknown | [Asking](https://github.com/siatnlp/LegalQA/issues/2) |


---

### ChatGPT detectors
![image](https://user-images.githubusercontent.com/37113676/211677236-d7c028f5-b9a5-4d88-baee-8b86dc942ff7.png)
(Hosted on Hugging Face Spaces)


We provide three kinds of detectors, all bilingual (English and Chinese):
- [QA version](https://huggingface.co/spaces/Hello-SimpleAI/chatgpt-detector-qa): detects whether an **answer** to a given **question** was generated by ChatGPT, using PLM-based classifiers;
- [Single-text version](https://huggingface.co/spaces/Hello-SimpleAI/chatgpt-detector-single): detects whether a piece of text was generated by ChatGPT, using PLM-based classifiers;
- [Linguistic version](https://huggingface.co/spaces/Hello-SimpleAI/chatgpt-detector-ling): detects whether a piece of text was generated by ChatGPT, using linguistic features.


The detectors are also hosted on ModelScope:
- [QA version](https://www.modelscope.cn/studios/simpleai/chatgpt-detector-qa)
- [Single-text version](https://www.modelscope.cn/studios/simpleai/chatgpt-detector-single)
- [Linguistic version](https://www.modelscope.cn/studios/simpleai/chatgpt-detector-ling)


The model weights are all available on Hugging Face Models:

| Model Checkpoints              | Comment      |
|-----------------------|------------|
|[chatgpt-detector-roberta](https://huggingface.co/Hello-SimpleAI/chatgpt-detector-roberta)|To detect a single piece of text|
|[chatgpt-qa-detector-roberta](https://huggingface.co/Hello-SimpleAI/chatgpt-qa-detector-roberta)|To detect a question-answer pair|
|[chatgpt-detector-roberta-chinese](https://huggingface.co/Hello-SimpleAI/chatgpt-detector-roberta-chinese)|Chinese version, to detect a single piece of text|
|[chatgpt-qa-detector-roberta-chinese](https://huggingface.co/Hello-SimpleAI/chatgpt-qa-detector-roberta-chinese)|Chinese version, to detect a question-answer pair|

The English models are based on [roberta-base](https://huggingface.co/roberta-base).
The Chinese models are based on [hfl/chinese-roberta-wwm-ext](https://huggingface.co/hfl/chinese-roberta-wwm-ext).
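
The checkpoints above could be loaded with the `transformers` text-classification pipeline. The following is a minimal sketch rather than the project's own inference code. The helper names `build_input` and `load_detector` are hypothetical, and passing the question as `text` and the answer as `text_pair` to the QA detector is an assumption about how those models expect their input.

```python
from typing import Dict, Union

DetectorInput = Union[str, Dict[str, str]]


def build_input(answer: str, question: str = "") -> DetectorInput:
    """Build the pipeline input for a detector.

    Without a question, the plain string suits the single-text detector.
    With a question, a text pair is returned for the QA detector
    (question first, answer second: an assumption, not confirmed here).
    """
    if question:
        return {"text": question, "text_pair": answer}
    return answer


def load_detector(model_id: str = "Hello-SimpleAI/chatgpt-detector-roberta"):
    """Create a text-classification pipeline for the given checkpoint.

    The import is deferred so this sketch runs without `transformers`
    installed; calling the function downloads the weights on first use.
    """
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


single = build_input("Some text to check.")
qa = build_input("It is a language model.", question="What is ChatGPT?")

# Example (not run here because it downloads the model):
# detector = load_detector()
# print(detector(single))  # predicted label(s) with confidence scores
```

For question-answer pairs, `load_detector("Hello-SimpleAI/chatgpt-qa-detector-roberta")` would be the matching checkpoint from the table, with `qa` as its input.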


---

### Important Dates

| Events                | Dates      |
|-----------------------|------------|
| Project Launch | 2022-12-09 |
| Comparison Data Collection | 2022-12-11 to Now |
| Release ChatGPT Detector (Demo) | 2023-01-11 |
| Models Release | 2023-01-18 |
| Comparison Corpus Release | 2023-01-18 |
| Research Paper | 2023-01-19 |
|...|...|



---

### Citation

Check out the paper: [arXiv:2301.07597](https://arxiv.org/abs/2301.07597)

```
@article{guo-etal-2023-hc3,
    title = "How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection",
    author = "Guo, Biyang  and
      Zhang, Xin  and
      Wang, Ziyuan  and
      Jiang, Minqi  and
      Nie, Jinran  and
      Ding, Yuxuan  and
      Yue, Jianwei  and
      Wu, Yupeng",
    journal = "arXiv preprint arXiv:2301.07597",
    year = "2023",
}
```



---
### Our Story...

On December 9, 2022, 10 days after the launch of [ChatGPT](https://openai.com/blog/chatgpt/), we started this project with two goals:
1. To create **open-source models** for efficiently detecting ChatGPT-generated content;
2. To collect a valuable **human-ChatGPT comparison Q&A corpus** to facilitate related research.

Feel free to follow our project! We have released a preview of our ChatGPT detectors, and the **models and dataset will be open-sourced** in about a week. We look forward to feedback from the community to help improve the models and to contribute to **open** academic research together :)

### About Us

We are a group of insignificant researchers (in the shadow of ChatGPT) hoping to do some significant work for the community. The team for this project consists of PhD students and engineers from 6 universities/companies.

|   |   |   |   |
|:-:|:-:|:-:|:-:|
| [Biyang Guo](https://github.com/beyondguo) | [Minqi Jiang](https://github.com/Minqi824) | [Ziyuan Wang](https://github.com/SUFEHeisenberg) | [Xin Zhang](https://github.com/izhx) |
| [Jinran Nie](https://github.com/NJRBarry) | [Yuxuan Ding](https://github.com/yxding95) | [Jianwei Yue](https://github.com/TurquoiseA) | [Yupeng Wu](https://github.com/realRoc) |