text-paraphraser
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org, ieee.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AdarshAgrawal7007
- License: other
- Language: Python
- Default Branch: main
- Size: 9.77 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Paraphrase Generator with T5
A paraphrase generator built using transformers, which takes an English sentence as input and produces a set of paraphrased sentences. This is a conditional text-generation NLP task. The model used here is T5ForConditionalGeneration from the Hugging Face transformers library, fine-tuned on Google's PAWS dataset and saved in the Hugging Face model hub under the name Vamsi/T5ParaphrasePaws.
List of publications using Paraphrase-Generator (please open a pull request to add missing entries):
Sports Narrative Enhancement with Natural Language Generation
EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
Wissensgenerierung für deutschprachige Chatbots (Knowledge Generation for German-Language Chatbots)
Causal Document-Grounded Dialogue Pre-training
Creativity Evaluation Method for Procedural Content Generated Game Items via Machine Learning
Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites
- Streamlit library
- Huggingface transformers library
- Pytorch
- Tensorflow
Installing
- Streamlit
```
$ pip install streamlit
```
- Hugging Face transformers library
```
$ pip install transformers
```
- TensorFlow
```
$ pip install --upgrade tensorflow
```
- PyTorch
Head to the docs and install a compatible version: https://pytorch.org/
Running the web app
- Clone the repository
```
$ git clone [repolink]
```
- Run the Streamlit app
```
$ cd Streamlit
$ streamlit run paraphrase.py
```
- Run the Flask app
```
$ cd Server
$ python server.py
```
The initial server call will take some time, as it downloads the model parameters. Later calls are noticeably faster because the model parameters are stored in the cache.


General Usage
PyTorch and TF models are available:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5ParaphrasePaws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5ParaphrasePaws")

# Move the model to GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

sentence = "This is something which I cannot understand at all"
text = "paraphrase: " + sentence + " </s>"

encoding = tokenizer.encode_plus(text, padding="max_length", return_tensors="pt")
input_ids = encoding["input_ids"].to(device)
attention_mask = encoding["attention_mask"].to(device)

outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_length=256,
    do_sample=True,
    top_k=200,
    top_p=0.95,
    early_stopping=True,
    num_return_sequences=5,
)

for output in outputs:
    line = tokenizer.decode(
        output, skip_special_tokens=True, clean_up_tokenization_spaces=True
    )
    print(line)
```
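The `generate` call above combines top-k and nucleus (top-p) sampling to keep generation diverse. As a minimal illustrative sketch (not the transformers implementation), the two filters narrow a next-token distribution like this: first keep only the `top_k` most probable tokens, then keep the smallest prefix of those whose cumulative probability reaches `top_p`:

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return the indices of tokens that survive top-k then top-p filtering.

    probs: list of token probabilities (assumed to sum to ~1.0).
    """
    # Rank tokens by probability and keep only the top_k most likely
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for idx, p in ranked:
        kept.append(idx)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept


# Toy distribution over 5 tokens: top_k=4 drops the rarest token,
# then top_p=0.9 trims the cumulative tail.
print(top_k_top_p_filter([0.5, 0.25, 0.15, 0.07, 0.03], top_k=4, top_p=0.9))
# → [0, 1, 2]
```

The surviving tokens would then be renormalized and sampled from, which is why a larger `top_k` or `top_p` yields more varied paraphrases.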
Dockerfile
The repository also contains a minimal reproducible Dockerfile that can be used to spin up a server with the API endpoints to perform text paraphrasing.
Note: The Dockerfile uses the built-in Flask development server, hence it's not recommended for production usage. It should be replaced with a production-ready WSGI server.
After cloning the repository, the local server can be started with two commands:

```
docker build -t paraphrase .
docker run -p 5000:5000 paraphrase
```

The API is then available on localhost:5000:

```
curl -XPOST localhost:5000/run_forward \
  -H 'content-type: application/json' \
  -d '{"sentence": "What is the best paraphrase of a long sentence that does not say much?", "decoding_params": {"tokenizer": "", "max_len": 512, "strategy": "", "top_k": 168, "top_p": 0.95, "return_sen_num": 3}}'
```
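For a Python client, the same request body can be assembled programmatically. This is a sketch: the field names are copied from the curl example above, and the `run_forward` endpoint and exact server-side schema are assumed, not verified here.

```python
import json

# Build the same payload the curl example sends to the Flask endpoint.
payload = {
    "sentence": "What is the best paraphrase of a long sentence "
                "that does not say much?",
    "decoding_params": {
        "tokenizer": "",
        "max_len": 512,
        "strategy": "",
        "top_k": 168,
        "top_p": 0.95,
        "return_sen_num": 3,
    },
}
body = json.dumps(payload)

# To send it (assuming the server from the docker run step is up):
#   import requests
#   resp = requests.post("http://localhost:5000/run_forward",
#                        data=body,
#                        headers={"content-type": "application/json"})

# Sanity-check the serialized payload round-trips cleanly
print(json.loads(body)["decoding_params"]["return_sen_num"])  # → 3
```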
Built With
- Streamlit - Fastest way for building data apps
- Flask - Backend framework
- Transformers-Huggingface - On a mission to solve NLP, one commit at a time. Transformers Library.
Authors
Citing
```bibtex
@misc{alisetti2021paraphrase,
  title={Paraphrase generator with t5},
  author={Alisetti, Sai Vamsi},
  year={2021}
}
```
Owner
- Login: AdarshAgrawal7007
- Kind: user
- Repositories: 1
- Profile: https://github.com/AdarshAgrawal7007
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Alisetti"
    given-names: "Sai Vamsi"
title: "Paraphrase Generator with T5"
doi: 10.5281/zenodo.10731518
date-released: 2020
url: "https://github.com/Vamsi995/Paraphrase-Generator"
```
GitHub Events
Total
- Push event: 1
- Create event: 1
Last Year
- Push event: 1
- Create event: 1
Dependencies
- tensorflow/tensorflow latest build
- Flask ==2.3.2
- httplib2 ==0.19.0
- nltk ==3.9
- numpy ==1.22.0
- pandas ==1.0.5
- pytorch_lightning ==2.4.0
- requests ==2.32.0
- seaborn ==0.10.1
- streamlit ==1.37.0
- tensorflow ==2.12.1
- tensorflow_hub ==0.9.0
- torch ==2.2.0
- transformers ==4.50.0