emnlp22-transforming

The official implementation of the EMNLP 2022 paper "How Large Language Models are Transforming Machine-Paraphrased Plagiarism".

https://github.com/jpwahle/emnlp22-transforming

Science Score: 41.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

machine-learning natural-language-processing nlp paraphrase-generation plagiarism
Last synced: 6 months ago

Repository

The official implementation of the EMNLP 2022 paper "How Large Language Models are Transforming Machine-Paraphrased Plagiarism".

Basic Info
Statistics
  • Stars: 9
  • Watchers: 4
  • Forks: 0
  • Open Issues: 7
  • Releases: 0
Topics
machine-learning natural-language-processing nlp paraphrase-generation plagiarism
Created over 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme · Citation

README.md

How Large Language Models Are Transforming Machine Paraphrase Generation

arXiv · Hugging Face Dataset

Quick Start

Install

poetry install

Run

To generate paraphrases using T5, run the following command:

Note: T5 benefits from more few-shot examples because it actually performs gradient steps. However, to keep it comparable to GPT-3, we don't recommend exceeding 50 examples.

poetry run python -m paraphrase.generate --model_name t5 --num_prompts 4 --num_examples 32
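For orientation, the sketch below shows roughly what a T5-based paraphrase step looks like with the pinned transformers/torch versions. The checkpoint name, the "paraphrase: " prompt prefix, and the decoding settings are illustrative assumptions, not necessarily the repository's exact configuration.

# Minimal sketch of T5 paraphrase generation (assumptions noted inline).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-base"  # assumption; the experiments may use a different T5 size
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Large language models can rewrite text in many different ways."
inputs = tokenizer("paraphrase: " + text, return_tensors="pt", truncation=True)

# Nucleus sampling; the repository's actual decoding strategy may differ.
outputs = model.generate(**inputs, max_length=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))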

To generate paraphrases using GPT-3, run the following command:

Warning: Using GPT-3 requires a paid account and can quickly run up a bill if you don't have credits. Reducing the number of prompts and/or the number of samples can help reduce costs.

OPENAI_API_KEY={YOUR_KEY} poetry run python -m paraphrase.generate --model_name gpt3 --num_prompts 4 --num_examples 32
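As a rough illustration of what a single GPT-3 request could look like with the pinned openai 0.25.0 client (legacy Completion API), here is a minimal sketch; the model name, prompt wording, and decoding parameters are assumptions, not the repository's exact settings.

# Minimal sketch of a GPT-3 paraphrase request with openai==0.25.0 (legacy API).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

text = "Large language models can rewrite text in many different ways."
response = openai.Completion.create(
    model="text-davinci-002",  # assumption; the paper's engine may differ
    prompt=f"Paraphrase the following sentence:\n\n{text}\n\nParaphrase:",
    max_tokens=128,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())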

For help, run the following command:

poetry run python -m paraphrase.generate --help

Dataset

The dataset generated for our study is available on 🤗 Hugging Face Datasets.
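To load it programmatically with the datasets library pinned in this project, a typical pattern looks like the sketch below; the dataset identifier and split name are placeholders, so check the Hugging Face page linked from the repository for the actual ID.

# Sketch of loading the paraphrase dataset from the Hugging Face Hub.
from datasets import load_dataset

# Placeholder ID: the exact dataset name is not stated in this README;
# use the identifier from the Hugging Face Datasets page linked above.
dataset = load_dataset("jpwahle/<dataset-id>")
print(dataset)              # shows the available splits and columns
print(dataset["train"][0])  # assumes a "train" split exists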

Detection

For the detection code, please refer to this repository and paper.

For all models except GPT-3 and T5, we used the versions trained on MPC. For PlagScan, we embedded the text in the same way as in the paper above.
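As a rough sketch of how one of the MPC-trained neural detectors could be applied at inference time, a standard transformers text-classification pipeline would look like the following; the checkpoint name is hypothetical, and the actual models live in the linked detection repository.

# Hypothetical detection sketch; the real MPC-trained checkpoints are in the
# detection repository referenced above and are not named in this README.
from transformers import pipeline

detector = pipeline("text-classification", model="<org>/<mpc-trained-detector>")
print(detector("This sentence may or may not be a machine paraphrase."))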

Citation

@inproceedings{wahle-etal-2022-large,
    title = "How Large Language Models are Transforming Machine-Paraphrased Plagiarism",
    author = "Wahle, Jan Philip  and
      Ruas, Terry  and
      Kirstein, Frederic  and
      Gipp, Bela",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.62",
    pages = "952--963",
    abstract = "The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work. However, the role of large autoregressive models in generating machine-paraphrased plagiarism and their detection is still incipient in the literature. This work explores T5 and GPT3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software and perform a human study with 105 participants regarding their detection performance and the quality of generated examples. Our results suggest that large language models can rewrite text humans have difficulty identifying as machine-paraphrased (53{\%} mean acc.). Human experts rate the quality of paraphrases generated by GPT-3 as high as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves 66{\%} F1-score in detecting paraphrases. We make our code, data, and findings publicly available to facilitate the development of detection solutions.",
}

License

This repository is licensed under the Apache License 2.0 - see the LICENSE file for details. Use the code for any of your research projects, but be nice and give credit where credit is due. Any use for plagiarism or other illegal purposes is prohibited.

Owner

  • Name: Jan Philip Wahle
  • Login: jpwahle
  • Kind: user
  • Location: Göttingen
  • Company: @gipplab

👨🏼‍💻 Computer Science Researcher | 📍Göttingen, Germany

Citation (CITATION.bib)

@inproceedings{wahle-etal-2022-large,
    title = "How Large Language Models are Transforming Machine-Paraphrase Plagiarism",
    author = "Wahle, Jan Philip  and
      Ruas, Terry  and
      Kirstein, Frederic  and
      Gipp, Bela",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.62",
    pages = "952--963",
    abstract = "The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work.However, the role of large autoregressive models in generating machine-paraphrased plagiarism and their detection is still incipient in the literature.This work explores T5 and GPT3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia.We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software and perform a human study with 105 participants regarding their detection performance and the quality of generated examples.Our results suggest that large language models can rewrite text humans have difficulty identifying as machine-paraphrased (53{\%} mean acc.).Human experts rate the quality of paraphrases generated by GPT-3 as high as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5).The best-performing detection model (GPT-3) achieves 66{\%} F1-score in detecting paraphrases.We make our code, data, and findings publicly available to facilitate the development of detection solutions.",
}

Dependencies

poetry.lock pypi
  • absl-py 1.3.0
  • accelerate 0.14.0
  • aiohttp 3.8.3
  • aiosignal 1.3.1
  • async-timeout 4.0.2
  • attrs 22.1.0
  • bert-score 0.3.12
  • bleu 0.3
  • blis 0.7.9
  • catalogue 2.0.8
  • certifi 2022.12.7
  • charset-normalizer 2.1.1
  • click 8.1.3
  • colorama 0.4.6
  • confection 0.0.3
  • contourpy 1.0.6
  • cycler 0.11.0
  • cymem 2.0.7
  • datasets 2.6.1
  • dill 0.3.5.1
  • efficiency 1.1
  • et-xmlfile 1.1.0
  • evaluate 0.3.0
  • filelock 3.8.0
  • fonttools 4.38.0
  • frozenlist 1.3.3
  • fsspec 2022.11.0
  • huggingface-hub 0.10.1
  • idna 3.4
  • jinja2 3.1.2
  • joblib 1.2.0
  • kiwisolver 1.4.4
  • langcodes 3.3.0
  • markupsafe 2.1.1
  • matplotlib 3.6.2
  • multidict 6.0.2
  • multiprocess 0.70.13
  • murmurhash 1.0.9
  • nltk 3.7
  • numpy 1.23.4
  • openai 0.25.0
  • openpyxl 3.0.10
  • packaging 21.3
  • pandas 1.5.1
  • pandas-stubs 1.2.0.62
  • pathy 0.7.1
  • pillow 9.3.0
  • preshed 3.0.8
  • psutil 5.9.4
  • pyarrow 10.0.0
  • pydantic 1.10.2
  • pyparsing 3.0.9
  • python-dateutil 2.8.2
  • pytz 2022.6
  • pyyaml 6.0
  • regex 2022.10.31
  • requests 2.28.1
  • responses 0.18.0
  • rouge-score 0.1.2
  • sentencepiece 0.1.97
  • setuptools 65.5.1
  • setuptools-scm 7.0.5
  • six 1.16.0
  • smart-open 5.2.1
  • spacy 3.4.3
  • spacy-legacy 3.0.10
  • spacy-loggers 1.0.3
  • srsly 2.4.5
  • thinc 8.1.5
  • tokenizers 0.13.2
  • tomli 2.0.1
  • torch 1.12.1
  • tqdm 4.64.1
  • transformers 4.24.0
  • typer 0.7.0
  • typing-extensions 4.4.0
  • urllib3 1.26.12
  • wasabi 0.10.1
  • xxhash 3.1.0
  • yarl 1.8.1
pyproject.toml pypi
  • accelerate ^0.14.0
  • bert-score ^0.3.12
  • bleu ^0.3
  • datasets ^2.6.1
  • evaluate ^0.3.0
  • nltk ^3.7
  • numpy ^1.23.4
  • openai ^0.25.0
  • pandas ^1.5.1
  • python ^3.9
  • rouge-score ^0.1.2
  • sentencepiece ^0.1.97
  • torch 1.12.1
  • tqdm ^4.64.1
  • transformers ^4.24.0