paraphrase-generator

A paraphrase generator built using the T5 model which produces paraphrased English sentences.

https://github.com/vamsi995/paraphrase-generator

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, ieee.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

A paraphrase generator built using the T5 model which produces paraphrased English sentences.

Basic Info
  • Host: GitHub
  • Owner: Vamsi995
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 528 KB
Statistics
  • Stars: 313
  • Watchers: 6
  • Forks: 66
  • Open Issues: 15
  • Releases: 1
Created over 5 years ago · Last pushed 10 months ago
Metadata Files
Readme · License · Citation · Security

README.md

Paraphrase Generator with T5

DOI

A Paraphrase-Generator built using transformers, which takes an English sentence as input and produces a set of paraphrased sentences. This is the NLP task of conditional text generation. The model used here is T5ForConditionalGeneration from the Hugging Face transformers library. It is trained on Google's PAWS dataset, and the trained model is saved in the Hugging Face model hub under the name Vamsi/T5_Paraphrase_Paws.

List of publications using Paraphrase-Generator (please open a pull request to add missing entries):

DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models

Sports Narrative Enhancement with Natural Language Generation

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

Knowledge Generation for German-Language Chatbots (Wissensgenerierung für deutschsprachige Chatbots)

Causal Document-Grounded Dialogue Pre-training

Creativity Evaluation Method for Procedural Content Generated Game Items via Machine Learning

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

  • Streamlit library
  • Huggingface transformers library
  • Pytorch
  • Tensorflow

Installing

  • Streamlit

$ pip install streamlit

  • Huggingface transformers library

$ pip install transformers

  • Tensorflow

$ pip install --upgrade tensorflow

  • Pytorch: head to the docs at https://pytorch.org/ and install a compatible version.

Running the web app

  • Clone the repository

```
$ git clone [repolink]
```

  • Running the streamlit app

```
$ cd Streamlit
$ streamlit run paraphrase.py
```

  • Running the flask app

```
$ cd Server
$ python server.py
```

The initial server call will take some time, as it downloads the model parameters. Subsequent calls will be relatively faster, since the model parameters are stored in the cache.

General Usage

PyTorch and TF models are available.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

sentence = "This is something which I cannot understand at all"

text = "paraphrase: " + sentence + " </s>"

encoding = tokenizer.encode_plus(text, pad_to_max_length=True, return_tensors="pt")

input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_masks,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    early_stopping=True,
    num_return_sequences=5,
)

for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
```
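The `top_k` and `top_p` arguments above control sampling-based decoding. As a rough illustration (over a made-up toy distribution, not the model's actual logits), top-k keeps only the k most probable tokens, while top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p:

```python
# Toy sketch of top-k and top-p (nucleus) filtering.
# The token distribution below is invented for illustration only.
def top_k_filter(probs, k):
    """Keep the k most probable tokens, then renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest top-probability set whose cumulative mass reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "axolotl": 0.05}
print(top_k_filter(probs, 2))   # keeps 'cat' and 'dog', renormalized
print(top_p_filter(probs, 0.9)) # keeps 'cat', 'dog', 'fish'
```

With `top_k=200, top_p=0.95` as in the README's call, the model samples from a broad but still truncated slice of the vocabulary, which is why repeated runs return different paraphrases.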

Dockerfile

The repository also contains a minimal reproducible Dockerfile that can be used to spin up a server with the API endpoints to perform text paraphrasing.

Note: The Dockerfile uses the built-in Flask development server, hence it's not recommended for production usage. It should be replaced with a production-ready WSGI server.
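One common replacement is gunicorn. The fragment below is a hypothetical sketch only: it assumes the Flask application object inside Server/server.py is named `app`, which is not confirmed by this page.

```dockerfile
# Hypothetical sketch: swap Flask's dev server for gunicorn.
# Assumes the Flask application object in Server/server.py is named `app`.
RUN pip install gunicorn
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "2", "server:app"]
```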

After cloning the repository, starting the local server is a two-line script:

```
docker build -t paraphrase .
docker run -p 5000:5000 paraphrase
```

and then the API is available on localhost:5000

```
curl -XPOST localhost:5000/run_forward \
  -H 'content-type: application/json' \
  -d '{"sentence": "What is the best paraphrase of a long sentence that does not say much?", "decoding_params": {"tokenizer": "", "max_len": 512, "strategy": "", "top_k": 168, "top_p": 0.95, "return_sen_num": 3}}'
```
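The same call can be made from Python. This is a minimal sketch using only the standard library; the endpoint and payload fields mirror the curl example above, while the response shape is not documented here, so the function simply returns the parsed JSON.

```python
import json
from urllib import request

# Same payload as the curl example.
payload = {
    "sentence": "What is the best paraphrase of a long sentence that does not say much?",
    "decoding_params": {
        "tokenizer": "",
        "max_len": 512,
        "strategy": "",
        "top_k": 168,
        "top_p": 0.95,
        "return_sen_num": 3,
    },
}

def run_forward(payload, url="http://localhost:5000/run_forward"):
    """POST the payload to the paraphrase endpoint; return the parsed JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```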

Built With

Authors

Citing

```bibtex
@misc{alisetti2021paraphrase,
  title={Paraphrase generator with t5},
  author={Alisetti, Sai Vamsi},
  year={2021}
}
```

Owner

  • Name: Sai Vamsi Alisetti
  • Login: Vamsi995
  • Kind: user
  • Location: Hyderabad
  • Company: IIT Palakkad

Educating, Learning, Revolutionizing Tech

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Alisetti"
  given-names: "Sai Vamsi"
title: "Paraphrase Generator with T5"
doi: 10.5281/zenodo.10731518
date-released: 2020
url: "https://github.com/Vamsi995/Paraphrase-Generator"

GitHub Events

Total
  • Issues event: 2
  • Watch event: 6
  • Delete event: 5
  • Issue comment event: 3
  • Push event: 3
  • Pull request event: 9
  • Fork event: 3
  • Create event: 5
Last Year
  • Issues event: 2
  • Watch event: 6
  • Delete event: 5
  • Issue comment event: 3
  • Push event: 3
  • Pull request event: 9
  • Fork event: 3
  • Create event: 5

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 7
  • Average time to close issues: N/A
  • Average time to close pull requests: about 5 hours
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.14
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 7
Past Year
  • Issues: 2
  • Pull requests: 7
  • Average time to close issues: N/A
  • Average time to close pull requests: about 5 hours
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.14
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 7
Top Authors
Issue Authors
  • um1ty (1)
  • jagadeesh-rajarajan-ss (1)
  • manisek87 (1)
Pull Request Authors
  • dependabot[bot] (13)
Top Labels
Issue Labels
Pull Request Labels
dependencies (12) python (6)

Dependencies

requirements.txt pypi
  • Flask ==1.1.2
  • httplib2 ==0.19.0
  • nltk ==3.6.6
  • numpy ==1.21.0
  • pandas ==1.0.5
  • pytorch_lightning ==1.6.0
  • requests ==2.24.0
  • seaborn ==0.10.1
  • streamlit ==0.67.1
  • tensorflow ==2.5.3
  • tensorflow_hub ==0.9.0
  • torch ==1.6.0
  • transformers ==3.3.1
Dockerfile docker
  • tensorflow/tensorflow latest build