german-generated-poetic-texts

This repository publishes poetic texts in German generated by character-based recurrent neural networks

https://github.com/nevmenandr/german-generated-poetic-texts

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: wiley.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary

Keywords

dataset german-language poetry-generation
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: nevmenandr
  • License: cc0-1.0
  • Default Branch: main
  • Homepage:
  • Size: 808 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
dataset german-language poetry-generation
Created over 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

German Generated Poetic Texts - GGPT

Goal and content

This repository publishes poetic texts in German generated by character-based recurrent neural networks.

At first sight, it may seem pointless to publish computer-generated poetic texts, since a computer can generate such texts in unlimited numbers. In practice, however, publishing them proves useful. Researchers may need generated texts for analysis, and such texts are hard to obtain quickly. The models that generate them require specific software configuration and particular versions of deep learning frameworks, and the trained models themselves may not be available at all. These problems can cause a lot of headaches. In addition, the generation process may require special technical skills, which limits what humanities scholars can do with such texts.

This repository contains ready-to-use texts.

The models were trained on German hexameter verse and on poetry by Friedrich Hölderlin, Theodor Fontane and Paul Celan.

One model was trained on the Celan texts, two each on the Fontane and Hölderlin texts, and three on the hexameter texts. Each model was trained for its own number of epochs and reached its own loss value.

Ten samples, each at least 28,000 characters long, were generated for each model and are presented in this repository.

Neural network architecture

The models were trained with Andrej Karpathy's char-rnn code for multi-layer character-level recurrent neural networks (LSTM) in Torch.
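A character-level model treats each distinct character of the training corpus as one token. The minimal Python sketch below builds such a vocabulary and round-trips a Hölderlin line through it. The function names `build_vocab`, `encode` and `decode` are hypothetical and do not come from Karpathy's code. The sketch works on Unicode code points. The original Torch implementation is not reproduced here and may split text differently, for example into bytes.

```python
def build_vocab(text):
    """Assign an integer id to every distinct character, in sorted order."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode(text, vocab):
    """Turn a string into the list of character ids the network would see."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Map ids back to characters using the inverted vocabulary."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


line = "Wo aber Gefahr ist, wächst das Rettende auch."
vocab = build_vocab(line)
ids = encode(line, vocab)
print(len(vocab), ids[:5])
print(decode(ids, vocab))
```

At each step the network predicts a probability distribution over the next character id. That distribution has one entry per vocabulary item.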

Train sets

| Train corpus | Characters | Lines |
| ----- | ---------- | ----- |
| Hölderlin | 415,516 | 10,677 |
| Fontane | 365,360 | 10,327 |
| Celan | 267,521 | 9,757 |
| Hexameter | 605,627 | 12,516 |
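The train-set figures also give a rough sense of line length in each corpus. The quick Python check below divides characters by lines. It assumes that the character counts include every character of the file, newlines among them; the source does not specify this.

```python
# Figures copied from the train-set table: (characters, lines).
train_sets = {
    "Hölderlin": (415_516, 10_677),
    "Fontane": (365_360, 10_327),
    "Celan": (267_521, 9_757),
    "Hexameter": (605_627, 12_516),
}

# Average number of characters per line for each training corpus.
avg_line_length = {name: chars / lines for name, (chars, lines) in train_sets.items()}

for name, avg in sorted(avg_line_length.items(), key=lambda kv: kv[1]):
    print(f"{name:10} {avg:5.1f} characters per line")
```

Under that assumption, the hexameter lines come out longest (about 48 characters) and the Celan lines shortest (about 27). This fits the long six-foot hexameter line.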

Hölderlin's poems were crawled from this web site.

The hexameter lines were extracted from a large collection of German verse maintained by Thomas Haider.

Data

Ten samples were generated for each model, each at a different sampling temperature (0.1 to 1.0 in steps of 0.1). For an explanation of the temperature concept, see the original Karpathy repository.
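Temperature rescales the model's output scores before the next character is drawn:

- Low values let the most likely character dominate, which gives more conservative, repetitive text.
- Values near 1.0 keep the model's own distribution, which gives more varied but more error-prone text.

The Python sketch below illustrates the idea. It is not the char-rnn sampling code; the function names and the three example scores are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide scores by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng=random):
    """Draw one character id from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores the network might assign to three candidate characters.
logits = [2.0, 1.0, 0.1]
for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print(sample_index(logits, 0.5, random.Random(42)))
```

At temperature 0.1, nearly all probability mass falls on the top-scoring character. This is why low-temperature samples tend to repeat frequent words and phrases.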

| Train corpus | Epoch | Loss | Temperature |
| ----- | ----- | ---- | ----------- |
| Hölderlin | 43.75 | 1.3026 | 0.1 |
| Hölderlin | 43.75 | 1.3026 | 0.2 |
| Hölderlin | 43.75 | 1.3026 | 0.3 |
| Hölderlin | 43.75 | 1.3026 | 0.4 |
| Hölderlin | 43.75 | 1.3026 | 0.5 |
| Hölderlin | 43.75 | 1.3026 | 0.6 |
| Hölderlin | 43.75 | 1.3026 | 0.7 |
| Hölderlin | 43.75 | 1.3026 | 0.8 |
| Hölderlin | 43.75 | 1.3026 | 0.9 |
| Hölderlin | 43.75 | 1.3026 | 1.0 |
| Hölderlin | 50.00 | 1.3049 | 0.1 |
| Hölderlin | 50.00 | 1.3049 | 0.2 |
| Hölderlin | 50.00 | 1.3049 | 0.3 |
| Hölderlin | 50.00 | 1.3049 | 0.4 |
| Hölderlin | 50.00 | 1.3049 | 0.5 |
| Hölderlin | 50.00 | 1.3049 | 0.6 |
| Hölderlin | 50.00 | 1.3049 | 0.7 |
| Hölderlin | 50.00 | 1.3049 | 0.8 |
| Hölderlin | 50.00 | 1.3049 | 0.9 |
| Hölderlin | 50.00 | 1.3049 | 1.0 |
| Fontane | 42.25 | 1.4736 | 0.1 |
| Fontane | 42.25 | 1.4736 | 0.2 |
| Fontane | 42.25 | 1.4736 | 0.3 |
| Fontane | 42.25 | 1.4736 | 0.4 |
| Fontane | 42.25 | 1.4736 | 0.5 |
| Fontane | 42.25 | 1.4736 | 0.6 |
| Fontane | 42.25 | 1.4736 | 0.7 |
| Fontane | 42.25 | 1.4736 | 0.8 |
| Fontane | 42.25 | 1.4736 | 0.9 |
| Fontane | 42.25 | 1.4736 | 1.0 |
| Fontane | 80.00 | 1.5189 | 0.1 |
| Fontane | 80.00 | 1.5189 | 0.2 |
| Fontane | 80.00 | 1.5189 | 0.3 |
| Fontane | 80.00 | 1.5189 | 0.4 |
| Fontane | 80.00 | 1.5189 | 0.5 |
| Fontane | 80.00 | 1.5189 | 0.6 |
| Fontane | 80.00 | 1.5189 | 0.7 |
| Fontane | 80.00 | 1.5189 | 0.8 |
| Fontane | 80.00 | 1.5189 | 0.9 |
| Fontane | 80.00 | 1.5189 | 1.0 |
| Celan | 46.30 | 1.5115 | 0.1 |
| Celan | 46.30 | 1.5115 | 0.2 |
| Celan | 46.30 | 1.5115 | 0.3 |
| Celan | 46.30 | 1.5115 | 0.4 |
| Celan | 46.30 | 1.5115 | 0.5 |
| Celan | 46.30 | 1.5115 | 0.6 |
| Celan | 46.30 | 1.5115 | 0.7 |
| Celan | 46.30 | 1.5115 | 0.8 |
| Celan | 46.30 | 1.5115 | 0.9 |
| Celan | 46.30 | 1.5115 | 1.0 |
| hexameter | 14.34 | 1.3988 | 0.1 |
| hexameter | 14.34 | 1.3988 | 0.2 |
| hexameter | 14.34 | 1.3988 | 0.3 |
| hexameter | 14.34 | 1.3988 | 0.4 |
| hexameter | 14.34 | 1.3988 | 0.5 |
| hexameter | 14.34 | 1.3988 | 0.6 |
| hexameter | 14.34 | 1.3988 | 0.7 |
| hexameter | 14.34 | 1.3988 | 0.8 |
| hexameter | 14.34 | 1.3988 | 0.9 |
| hexameter | 14.34 | 1.3988 | 1.0 |
| hexameter | 43.01 | 1.3479 | 0.1 |
| hexameter | 43.01 | 1.3479 | 0.2 |
| hexameter | 43.01 | 1.3479 | 0.3 |
| hexameter | 43.01 | 1.3479 | 0.4 |
| hexameter | 43.01 | 1.3479 | 0.5 |
| hexameter | 43.01 | 1.3479 | 0.6 |
| hexameter | 43.01 | 1.3479 | 0.7 |
| hexameter | 43.01 | 1.3479 | 0.8 |
| hexameter | 43.01 | 1.3479 | 0.9 |
| hexameter | 43.01 | 1.3479 | 1.0 |
| hexameter | 80.00 | 1.3702 | 0.1 |
| hexameter | 80.00 | 1.3702 | 0.2 |
| hexameter | 80.00 | 1.3702 | 0.3 |
| hexameter | 80.00 | 1.3702 | 0.4 |
| hexameter | 80.00 | 1.3702 | 0.5 |
| hexameter | 80.00 | 1.3702 | 0.6 |
| hexameter | 80.00 | 1.3702 | 0.7 |
| hexameter | 80.00 | 1.3702 | 0.8 |
| hexameter | 80.00 | 1.3702 | 0.9 |
| hexameter | 80.00 | 1.3702 | 1.0 |
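The Loss column can be read more intuitively as perplexity or as bits per character. This reading rests on an assumption the source does not confirm: that each value is char-rnn's mean per-character cross-entropy on held-out data, measured in nats (natural logarithm). Under that assumption, perplexity is exp(loss) and bits per character is loss / ln 2.

```python
import math

# (corpus, epoch, loss) for each model, copied from the data table.
checkpoints = [
    ("Hölderlin", 43.75, 1.3026),
    ("Hölderlin", 50.00, 1.3049),
    ("Fontane", 42.25, 1.4736),
    ("Fontane", 80.00, 1.5189),
    ("Celan", 46.30, 1.5115),
    ("hexameter", 14.34, 1.3988),
    ("hexameter", 43.01, 1.3479),
    ("hexameter", 80.00, 1.3702),
]


def perplexity(loss_nats):
    """Effective number of equally likely next characters (assumes loss in nats)."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats):
    """Convert a cross-entropy value from nats to bits."""
    return loss_nats / math.log(2)


for corpus, epoch, loss in checkpoints:
    print(f"{corpus:10} epoch {epoch:5.2f}  loss {loss:.4f}  "
          f"perplexity {perplexity(loss):.2f}  bits/char {bits_per_char(loss):.2f}")
```

Losses are only meaningfully comparable between models trained on the same corpus. Across corpora, differences in vocabulary and style also affect the figure.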

See also metadata in TSV format.

Papers

The Hölderlin texts were generated for the poet's 250th anniversary in 2020. See the paper.

Models

The models are published on Hugging Face:

Citation

If you find this repository useful, please cite it, including the URL:

@misc{orekhovboris2020ggpt,
  author = {Orekhov, Boris},
  month = sep,
  title = {{German Generated Poetic Texts - GGPT}},
  url = {https://github.com/nevmenandr/german-generated-poetic-texts},
  year = {2022}
}

Owner

  • Name: Boris Orekhov
  • Login: nevmenandr
  • Kind: user
  • Location: Moscow

Digital humanities researcher

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: German Generated Poetic Texts - GGPT
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: Boris
    family-names: Orekhov
    email: nevmenandr@gmail.com
    affiliation: HSE University
    orcid: 'https://orcid.org/0000-0002-9099-0436'
identifiers:
  - type: doi
    value: 10.5281/zenodo.7114237
repository-code: 'https://github.com/nevmenandr/german-generated-poetic-texts'
url: >-
  https://github.com/nevmenandr/german-generated-poetic-texts
abstract: >-
  This repository publishes poetic
  texts in German generated by character-based recurrent
  neural networks. At first sight, it may seem pointless to
  publish computer-generated poetic texts, since a
  computer can generate such texts in unlimited numbers. In
  practice, however, publishing them proves useful.
  Researchers may need generated texts for analysis, and
  such texts are hard to obtain quickly. The models that
  generate them require specific software configuration and
  particular versions of deep learning frameworks, and the
  trained models themselves may not be available at all.
  These problems can cause a lot of headaches. In addition,
  the generation process may require special technical
  skills, which limits what humanities scholars can do with
  such texts.
keywords:
  - german language
  - poetry generation
  - recurrent neural networks
license: CC0-1.0
commit: 6f9588cf69835e34d96fefd6d9fab04e667801b3
version: 1.0.0
date-released: '2022-09-26'

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0