https://github.com/astorfi/punctuation-restoration

A TensorFlow Implementation of Punctuation Restoration.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (11.4%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

A TensorFlow Implementation of Punctuation Restoration.

Basic Info

Host: GitHub
Owner: astorfi
License: mit
Default Branch: main
Homepage:
Size: 4.64 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of k9luo/Punctuation-Restoration

Created over 5 years ago · Last pushed over 5 years ago

https://github.com/astorfi/Punctuation-Restoration/blob/main/

Punctuation Restoration
====================================================================
![](https://img.shields.io/badge/linux-ubuntu-red.svg)

![](https://img.shields.io/badge/cuda-10.0.130-green.svg)
![](https://img.shields.io/badge/python-3.7.6-green.svg)

![](https://img.shields.io/badge/tensorflow-1.14.0-blue.svg)
![](https://img.shields.io/badge/numpy-1.19.1-blue.svg)
![](https://img.shields.io/badge/ujson-4.0.1-blue.svg)
![](https://img.shields.io/badge/jupyter-1.0.0-blue.svg)
![](https://img.shields.io/badge/ipython-7.18.1-blue.svg)
![](https://img.shields.io/badge/pandas-1.1.3-blue.svg)
![](https://img.shields.io/badge/tqdm-4.50.2-blue.svg)

## Requirements

Imagine that you are building a software for transcribing speech to text. The speech transcription part works perfectly, but cannot transcribe punctuations. The task is to train a predictive model to ingest a sequence of text and add punctuation (period, comma or question mark) in the appropriate locations. This task is important for all downstream data processing jobs.

**Example input:**
 
```this is a string of text with no punctuation this is a new sentence```
 
**Example output:**
 
```this is a string of text with no punctuation  this is a new sentence ```

## Solution

My solution is largely based on [Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration](https://www.isca-speech.org/archive/Interspeech_2016/pdfs/1517.PDF).

The architecture is defined as follows:
1. Obtain words embeddings from [GloVe](https://nlp.stanford.edu/projects/glove/).
2. The word embeddings are then processed by densely connected [Bi-LSTM](https://arxiv.org/pdf/1303.5778.pdf) layers.
3. These Bi-LSTM layers are followed by a RNN with an attention mechanism and [conditional random field (CRF)](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers) log likelihood loss.

The experiments are performed on the IWSLT dataset which consists of TED Talks transcript.

The detailed analysis can be found in this [notebook](https://github.com/k9luo/Punctuation-Restoration/blob/main/main.ipynb).

## Setup and Installation

First step, clone the repo:

```https://github.com/k9luo/Punctuation-Restoration.git```

Second step, you can download pretrained [GloVe](https://nlp.stanford.edu/projects/glove/) word embeddings and create a new conda virutal environment with `setup.sh`. Or you can manually do these steps yourself. Note that the running `setup.sh` will install the GPU version of TensorFlow:

```sh setup.sh```

Third step, activate the virtual environment:

```conda activate restore_punct```

Fourth step, add the new virutal environment to Jupyter Notebook:

```python -m ipykernel install --user --name=restore_punct```

## Training and Inference

Please run `python main.py`.

Owner

Name: Sina Torfi
Login: astorfi
Kind: user
Location: San Jose
Company: Meta

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/astorfi/punctuation-restoration

Science Score: 10.0%

Repository

Basic Info

Statistics

https://github.com/astorfi/Punctuation-Restoration/blob/main/

Owner

GitHub Events

Total

Last Year