text_summarization_lstm_gru

# Comparing the performance of LSTM and GRU for Text Summarization using Pointer Generator Networks

https://github.com/sajithm/text_summarization_lstm_gru

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary

Keywords

gru lstm pointer-generator tensorflow2 text-summarization
Last synced: 6 months ago

Repository

# Comparing the performance of LSTM and GRU for Text Summarization using Pointer Generator Networks

Basic Info
  • Host: GitHub
  • Owner: sajithm
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 35.2 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
gru lstm pointer-generator tensorflow2 text-summarization
Created over 3 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

Comparing the performance of LSTM and GRU for Text Summarization using Pointer Generator Networks

Compare the performance of LSTM and GRU in Text Summarization using Pointer Generator Networks as discussed in https://github.com/abisee/pointer-generator

Based on code at https://github.com/steph1793/PointerGeneratorSummarizer

Prerequisites

- Python 3.7+
- TensorFlow 2.8+ (any 2.x release should work, but has not been tested)
- rouge 1.0.1 (`pip install rouge`)
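
To set up these dependencies in one step, something like the following should work in a fresh Python 3.7+ environment (the exact TensorFlow pin is an assumption; per the note above, any 2.x release is expected to work):

~~~
pip install "tensorflow>=2.8" rouge==1.0.1
~~~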

Dataset

We use the CNN-DailyMail dataset. The application reads data from files in the TFRecord format.

The dataset can be created and processed following the instructions at https://github.com/abisee/cnn-dailymail

Alternatively, a pre-processed dataset can be downloaded from https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail

Easiest of all, download the dataset zip file and extract it into the dataset directory (by default, "./dataset/").
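
To sanity-check the extracted data before training, a short script along these lines can be used. It assumes the chunked files are standard TFRecords whose records are serialized tf.train.Example protos with "article" and "abstract" byte features (as produced by the abisee preprocessing); the glob pattern below is illustrative and may need adjusting to your actual file names.

~~~python
# Minimal sketch (not part of this repo) for peeking at the chunked data.
import glob
import tensorflow as tf

files = sorted(glob.glob("./dataset/chunked_train/*"))
print(f"Found {len(files)} chunked files")

# Parse the first couple of records of the first chunk and print a preview.
for raw in tf.data.TFRecordDataset(files[:1]).take(2):
    example = tf.train.Example()
    example.ParseFromString(raw.numpy())
    feats = example.features.feature
    article = feats["article"].bytes_list.value[0].decode("utf-8", "ignore")
    abstract = feats["abstract"].bytes_list.value[0].decode("utf-8", "ignore")
    print("ARTICLE :", article[:200], "...")
    print("ABSTRACT:", abstract[:200], "...")
~~~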

Pre-Trained Models (Optional)

If you want to skip training and only evaluate the models, you can download the checkpoint zip file and extract it into the checkpoint directory (by default, "./checkpoint/").

Usage

By default, training and evaluation run on all four models. To train or evaluate a single model, pass its name via --model_name.

Training

~~~ python main.py --mode="train" --vocabpath="./dataset/vocab" --datadir="./dataset/chunked_train" ~~~

Evaluation

~~~ python main.py --mode="eval" --vocabpath="./dataset/vocab" --datadir="./dataset/chunked_val" ~~~

Parameters

Most of the parameters have defaults and can be skipped. Here are the parameters that you can tweak along with their defaults.

| Parameter | Default | Description |
| ----------------------- | --------------- | ----------------------------------------------------------------------------------------------------------------------- |
| max_enc_len | 512 | Encoder input max sequence length |
| max_dec_len | 128 | Decoder input max sequence length |
| max_dec_steps | 128 | Maximum number of words of the predicted abstract |
| min_dec_steps | 32 | Minimum number of words of the predicted abstract |
| batch_size | 4 | Batch size |
| beam_size | 4 | Beam size for beam search decoding (must be equal to batch size in decode mode) |
| vocab_size | 50000 | Vocabulary size |
| embed_size | 128 | Word embeddings dimension |
| enc_units | 256 | Encoder LSTM/GRU cell units number |
| dec_units | 256 | Decoder LSTM/GRU cell units number |
| attn_units | 512 | [context vector, decoder state, decoder input] feedforward result dimension - used to compute the attention weights |
| learning_rate | 0.15 | Learning rate |
| adagrad_init_acc | 0.1 | Adagrad optimizer initial accumulator value. Please refer to the Adagrad optimizer API documentation on the TensorFlow site |
| max_grad_norm | 0.8 | Gradient norm above which gradients must be clipped |
| checkpoints_save_steps | 1000 | Save checkpoints every N steps |
| max_checkpoints | 10 | Maximum number of checkpoints to keep. Older ones are deleted |
| max_steps | 50000 | Max number of iterations |
| max_num_to_eval | 100 | Max number of examples to evaluate |
| checkpoint_dir | "./checkpoint" | Checkpoint directory |
| log_dir | "./log" | Directory in which to write logs |
| results_dir | None | Directory in which we write the intermediate results (Article, Actual Summary and Predicted Summary) during evaluation |
| data_dir | None | Data folder |
| vocab_path | None | Path to vocab file |
| mode | None | Should be "train" or "eval" |
| model_name | None | Name of a specific model. If empty, all models are used |
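
As an illustration of how these flags combine, the command below overrides a few of the defaults for a smaller training run; the values are arbitrary and only meant to show the flag syntax implied by the table.

~~~
python main.py --mode="train" \
  --vocab_path="./dataset/vocab" \
  --data_dir="./dataset/chunked_train" \
  --batch_size=8 \
  --max_enc_len=400 \
  --max_steps=20000 \
  --checkpoint_dir="./checkpoint"
~~~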

Owner

  • Name: Sajith Madhavan
  • Login: sajithm
  • Kind: user
  • Location: Bangalore, India

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Comparing the performance of LSTM and GRU for Text
  Summarization using Pointer Generator Networks
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Sajith
    email: sajithm@msn.com
    family-names: Madhavan
    orcid: 'https://orcid.org/0000-0002-8227-6144'
repository-code: >-
  https://github.com/sajithm/text_summarization_lstm_gru
license: Apache-2.0
