https://github.com/apachecn-archive/planning-based-hierarchical-variational-model

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: apachecn-archive
  • Language: Python
  • Default Branch: master
  • Size: 18.6 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed about 3 years ago
Metadata Files
Readme

README.md

Long and Diverse Text Generation with Planning-based Hierarchical Variational Model

Introduction

Existing neural methods for data-to-text generation still struggle to produce long and diverse texts: they fail to model input data dynamically during generation, to capture inter-sentence coherence, or to generate diverse expressions. To address these issues, we propose a Planning-based Hierarchical Variational Model (PHVM). Our model first plans a sequence of groups (each group is a subset of the input items to be covered by one sentence) and then realizes each sentence conditioned on the planning result and the previously generated context, thereby decomposing long text generation into a series of dependent sentence generation sub-tasks. To capture expression diversity, we devise a hierarchical latent structure in which a global planning latent variable models the diversity of reasonable plans and a sequence of local latent variables controls sentence realization.

This project is a TensorFlow implementation of our work.
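The plan-then-realize decomposition described above can be illustrated with a toy sketch. This is not the PHVM code from this repository: there are no neural networks or learned latent variables here, and every name (`plan_groups`, `realize_sentence`, `generate`) is hypothetical. A seeded random shuffle stands in for the global planning latent variable, a random template choice stands in for a local latent variable, and groups are assumed to partition the input, which is a simplification.

```python
import random
from typing import List, Tuple

Item = Tuple[str, str]  # (attribute, value) pair from the input specification


def plan_groups(items: List[Item], group_size: int, rng: random.Random) -> List[List[Item]]:
    """Toy planner: shuffle the items and cut them into consecutive groups.

    The shuffle is a stand-in for sampling a global planning variable:
    different seeds give different, equally valid plans.
    """
    shuffled = items[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]


def realize_sentence(group: List[Item], context: List[str], rng: random.Random) -> str:
    """Toy realizer: verbalize one group, conditioned on the sentences so far.

    The template choice is a stand-in for a local latent variable.
    """
    templates = ["It features {}.", "You get {}."]
    opener = "First, " if not context else "Also, "
    body = " and ".join(f"{attr} {value}" for attr, value in group)
    sentence = rng.choice(templates).format(body)
    return opener + sentence[0].lower() + sentence[1:]


def generate(items: List[Item], group_size: int = 2, seed: int = 0):
    """Plan all groups first, then realize one sentence per group in order."""
    rng = random.Random(seed)
    plan = plan_groups(items, group_size, rng)
    sentences: List[str] = []
    for group in plan:
        sentences.append(realize_sentence(group, sentences, rng))
    return plan, sentences


if __name__ == "__main__":
    spec = [("material", "cotton"), ("color", "blue"), ("fit", "slim"),
            ("collar", "round"), ("season", "summer")]
    plan, sentences = generate(spec)
    print(" ".join(sentences))
```

Changing `seed` changes both the grouping and the wording, which mirrors, in a very loose way, how the two levels of latent variables are meant to produce diverse outputs for the same input.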

Requirements

  • Python 3.6
  • NumPy
  • TensorFlow 1.4.0
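Before training, it can help to confirm that the environment matches the versions listed above. The script below is a minimal sketch and not part of this repository; the helper names (`version_tuple`, `matches`, `check_environment`) are hypothetical, and treating the listed versions as exact pins is an assumption.

```python
import importlib
import re
import sys


def version_tuple(version):
    """Turn '1.4.0' into (1, 4, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def matches(found, required):
    """True when `found` starts with the same components as `required`."""
    want = version_tuple(required)
    return version_tuple(found)[:len(want)] == want


def check_environment():
    """Report, per requirement, whether the installed version matches."""
    report = {"python": matches("%d.%d" % sys.version_info[:2], "3.6")}
    # NumPy has no version listed in the requirements, so only its presence is checked.
    for name, required in (("numpy", None), ("tensorflow", "1.4.0")):
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = False
            continue
        report[name] = required is None or matches(module.__version__, required)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "OK" if ok else "missing or version mismatch")
```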

Quick Start

  • Dataset

    Our dataset contains 119K pairs of product specifications and the corresponding advertising text. For more information, please refer to our paper.

  • Preprocess

    • Download the data from https://drive.google.com/open?id=1vB0fT1ex2Tsid-i5s-jqdz9QUFbCh0CO and unzip the file; this creates a new directory named data. The dataset itself is at ./data/data.jsonl.
    • Most of the preprocessed data is provided under ./data/processed/. The exception is the pre-trained word embeddings, which can be generated with the following command:

    bash preprocess.sh
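    Once the archive is unpacked, the raw records in ./data/data.jsonl can be inspected with a short script such as the one below. This is a sketch that assumes the file follows the usual JSON Lines layout (one JSON value per line); the record field names are not documented here, so the script only lists whatever keys it finds. The helper `read_jsonl` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def read_jsonl(path, limit=None):
    """Return up to `limit` JSON records from a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if limit is not None and len(records) >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    data_path = Path("./data/data.jsonl")  # path given in the README
    if data_path.exists():
        for record in read_jsonl(data_path, limit=3):
            # Field names are an unknown here, so just show what is present.
            print(sorted(record) if isinstance(record, dict) else type(record).__name__)
    else:
        print("data file not found:", data_path)
```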

  • Train

    ./run.sh

  • Test

    ./test.sh

Citation

Our paper is available at https://arxiv.org/abs/1908.06605v2.

Please cite our paper if you find the paper or the code helpful.

Owner

  • Name: ApacheCN 归档
  • Login: apachecn-archive
  • Kind: organization
  • Email: wizard.z@qq.com

An archive set up to prevent important projects from being lost.
