german-generated-poetic-texts
This repository publishes poetic texts in German generated by character-based recurrent neural networks.
README.md
German Generated Poetic Texts - GGPT
Goal and content
This repository publishes poetic texts in German generated by character-based recurrent neural networks.
At first sight, it seems pointless to publish computer-generated poetic texts, since a computer can generate such texts in unlimited numbers. In practice, however, publishing them proves useful. Researchers may need generated texts for analysis, and such texts are difficult to obtain quickly. The models that generate them require software customization and specific versions of deep learning frameworks. The trained models themselves may not be available. These problems can cost a lot of time. In addition, the generation process may require special technical skills. All of this makes it harder for scholars of the humanities to work with such texts.
This repository contains ready-to-use texts.
The models were trained on German hexameter verse and on poetry by Friedrich Hölderlin, Theodor Fontane, and Paul Celan.
One model was trained on the Celan texts, two each on the Fontane and Hölderlin texts, and three on the Hexameter texts. Each model was trained for a different number of epochs and has its own loss value.
Ten samples, each at least 28,000 characters long, were generated for each model and are presented in this repository.
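The size of the collection follows from the figures above. The short Python sketch below only does this arithmetic. The variable and function names are illustrative and are not part of the repository.

```python
# Models per corpus and samples per model, as stated in the text above.
MODELS_PER_CORPUS = {"Celan": 1, "Fontane": 2, "Hölderlin": 2, "Hexameter": 3}
SAMPLES_PER_MODEL = 10
MIN_CHARS_PER_SAMPLE = 28_000  # each sample is at least this long


def collection_size(models_per_corpus, samples_per_model, min_chars_per_sample):
    """Return (models, samples, lower bound on total characters)."""
    total_models = sum(models_per_corpus.values())
    total_samples = total_models * samples_per_model
    return total_models, total_samples, total_samples * min_chars_per_sample


total_models, total_samples, min_total_chars = collection_size(
    MODELS_PER_CORPUS, SAMPLES_PER_MODEL, MIN_CHARS_PER_SAMPLE
)
print(f"{total_models} models, {total_samples} samples, "
      f"at least {min_total_chars:,} characters in total")
```

The character total is only a lower bound, since the text gives a minimum sample length rather than exact lengths.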
Neural network architecture
Models were trained with the code developed by Andrej Karpathy for character-based multi-layer Recurrent Neural Networks (LSTM) in Torch.
Train sets
| Train corpus | Characters | Lines |
| ----- | ---------- | ----- |
| Hölderlin | 415,516 | 10,677 |
| Fontane | 365,360 | 10,327 |
| Celan | 267,521 | 9,757 |
| Hexameter | 605,627 | 12,516 |
Hölderlin's poems were crawled from this web site.
Hexameter lines were extracted from a large collection of German verse maintained by Thomas Haider.
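As a rough way to compare the four corpora, the table above can be reduced to an average number of characters per line. The Python sketch below does only that division. It assumes the character counts may include line breaks, so the results are approximate, and the names used are illustrative.

```python
# (characters, lines) per training corpus, copied from the table above.
TRAIN_SETS = {
    "Hölderlin": (415_516, 10_677),
    "Fontane": (365_360, 10_327),
    "Celan": (267_521, 9_757),
    "Hexameter": (605_627, 12_516),
}


def mean_line_length(characters, lines):
    """Average number of characters per line (approximate, see the note above)."""
    return characters / lines


for corpus, (characters, lines) in TRAIN_SETS.items():
    print(f"{corpus}: {mean_line_length(characters, lines):.1f} characters per line")
```

On these figures the Celan corpus has the shortest lines on average and the Hexameter corpus the longest.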
Data
Ten samples, each with a different temperature, were generated for each model. For an explanation of the temperature concept, see the original Karpathy repository.
| Train corpus | Epoch | Loss | Temperature |
| ----- | ----- | ---- | ----------- |
| Hölderlin | 43.75 | 1.3026 | 0.1 |
| Hölderlin | 43.75 | 1.3026 | 0.2 |
| Hölderlin | 43.75 | 1.3026 | 0.3 |
| Hölderlin | 43.75 | 1.3026 | 0.4 |
| Hölderlin | 43.75 | 1.3026 | 0.5 |
| Hölderlin | 43.75 | 1.3026 | 0.6 |
| Hölderlin | 43.75 | 1.3026 | 0.7 |
| Hölderlin | 43.75 | 1.3026 | 0.8 |
| Hölderlin | 43.75 | 1.3026 | 0.9 |
| Hölderlin | 43.75 | 1.3026 | 1.0 |
| Hölderlin | 50.00 | 1.3049 | 0.1 |
| Hölderlin | 50.00 | 1.3049 | 0.2 |
| Hölderlin | 50.00 | 1.3049 | 0.3 |
| Hölderlin | 50.00 | 1.3049 | 0.4 |
| Hölderlin | 50.00 | 1.3049 | 0.5 |
| Hölderlin | 50.00 | 1.3049 | 0.6 |
| Hölderlin | 50.00 | 1.3049 | 0.7 |
| Hölderlin | 50.00 | 1.3049 | 0.8 |
| Hölderlin | 50.00 | 1.3049 | 0.9 |
| Hölderlin | 50.00 | 1.3049 | 1.0 |
| Fontane | 42.25 | 1.4736 | 0.1 |
| Fontane | 42.25 | 1.4736 | 0.2 |
| Fontane | 42.25 | 1.4736 | 0.3 |
| Fontane | 42.25 | 1.4736 | 0.4 |
| Fontane | 42.25 | 1.4736 | 0.5 |
| Fontane | 42.25 | 1.4736 | 0.6 |
| Fontane | 42.25 | 1.4736 | 0.7 |
| Fontane | 42.25 | 1.4736 | 0.8 |
| Fontane | 42.25 | 1.4736 | 0.9 |
| Fontane | 42.25 | 1.4736 | 1.0 |
| Fontane | 80.00 | 1.5189 | 0.1 |
| Fontane | 80.00 | 1.5189 | 0.2 |
| Fontane | 80.00 | 1.5189 | 0.3 |
| Fontane | 80.00 | 1.5189 | 0.4 |
| Fontane | 80.00 | 1.5189 | 0.5 |
| Fontane | 80.00 | 1.5189 | 0.6 |
| Fontane | 80.00 | 1.5189 | 0.7 |
| Fontane | 80.00 | 1.5189 | 0.8 |
| Fontane | 80.00 | 1.5189 | 0.9 |
| Fontane | 80.00 | 1.5189 | 1.0 |
| Celan | 46.30 | 1.5115 | 0.1 |
| Celan | 46.30 | 1.5115 | 0.2 |
| Celan | 46.30 | 1.5115 | 0.3 |
| Celan | 46.30 | 1.5115 | 0.4 |
| Celan | 46.30 | 1.5115 | 0.5 |
| Celan | 46.30 | 1.5115 | 0.6 |
| Celan | 46.30 | 1.5115 | 0.7 |
| Celan | 46.30 | 1.5115 | 0.8 |
| Celan | 46.30 | 1.5115 | 0.9 |
| Celan | 46.30 | 1.5115 | 1.0 |
| hexameter | 14.34 | 1.3988 | 0.1 |
| hexameter | 14.34 | 1.3988 | 0.2 |
| hexameter | 14.34 | 1.3988 | 0.3 |
| hexameter | 14.34 | 1.3988 | 0.4 |
| hexameter | 14.34 | 1.3988 | 0.5 |
| hexameter | 14.34 | 1.3988 | 0.6 |
| hexameter | 14.34 | 1.3988 | 0.7 |
| hexameter | 14.34 | 1.3988 | 0.8 |
| hexameter | 14.34 | 1.3988 | 0.9 |
| hexameter | 14.34 | 1.3988 | 1.0 |
| hexameter | 43.01 | 1.3479 | 0.1 |
| hexameter | 43.01 | 1.3479 | 0.2 |
| hexameter | 43.01 | 1.3479 | 0.3 |
| hexameter | 43.01 | 1.3479 | 0.4 |
| hexameter | 43.01 | 1.3479 | 0.5 |
| hexameter | 43.01 | 1.3479 | 0.6 |
| hexameter | 43.01 | 1.3479 | 0.7 |
| hexameter | 43.01 | 1.3479 | 0.8 |
| hexameter | 43.01 | 1.3479 | 0.9 |
| hexameter | 43.01 | 1.3479 | 1.0 |
| hexameter | 80.00 | 1.3702 | 0.1 |
| hexameter | 80.00 | 1.3702 | 0.2 |
| hexameter | 80.00 | 1.3702 | 0.3 |
| hexameter | 80.00 | 1.3702 | 0.4 |
| hexameter | 80.00 | 1.3702 | 0.5 |
| hexameter | 80.00 | 1.3702 | 0.6 |
| hexameter | 80.00 | 1.3702 | 0.7 |
| hexameter | 80.00 | 1.3702 | 0.8 |
| hexameter | 80.00 | 1.3702 | 0.9 |
| hexameter | 80.00 | 1.3702 | 1.0 |
See also metadata in TSV format.
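The effect of the temperature values in the table can be illustrated with a minimal Python sketch of temperature-scaled sampling. It follows the common formulation: divide the log-probabilities of the next character by the temperature, renormalize, and draw from the result. It is not the Torch code used to generate the samples. The three-character distribution and all function names are assumptions made for illustration.

```python
import math
import random


def apply_temperature(log_probs, temperature):
    """Divide log-probabilities by the temperature and renormalize to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [lp / temperature for lp in log_probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_char(chars, log_probs, temperature, rng):
    """Draw one character from the temperature-adjusted distribution."""
    probs = apply_temperature(log_probs, temperature)
    return rng.choices(chars, weights=probs, k=1)[0]


# Hypothetical next-character distribution over three characters.
chars = ["e", "n", "x"]
log_probs = [math.log(0.6), math.log(0.3), math.log(0.1)]

rng = random.Random(0)  # fixed seed so repeated runs draw the same characters
for temperature in (0.1, 0.5, 1.0):
    probs = apply_temperature(log_probs, temperature)
    drawn = "".join(sample_char(chars, log_probs, temperature, rng) for _ in range(20))
    print(temperature, [round(p, 3) for p in probs], drawn)
```

At a temperature of 1.0 the distribution is unchanged. At 0.1 almost all of the probability mass moves to the most likely character, so low-temperature samples tend to be more repetitive and high-temperature samples more varied.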
Papers
The Hölderlin texts were generated for the 250th anniversary of the poet's birth in 2020. See the following papers:
- Der digitale Superdichter. Vor 250 Jahren wurde Friedrich Hölderlin geboren. Heute kann Computertechnik neue Gedichte im Hölderlin-Sound generieren. Ein Werkstattbericht // Die Literarische Welt, 14 March 2020, p. 29.
- Neural reading. Insights from the analysis of poetry generated by artificial neural networks // Orbis Litterarum. 2020. Vol. 75. Number 5. P. 230-246. DOI: 10.1111/oli.12274
Models
The models are published on Hugging Face:
- Celan doi: 10.57967/hf/2278
- Fontane doi: 10.57967/hf/2279
- Hexameter doi: 10.57967/hf/2281
- Hölderlin doi: 10.57967/hf/2280
Citation
If you found this repository useful, please cite it with the URL.
@misc{orekhovboris2020ggpt,
author = {Orekhov, Boris},
month = sep,
title = {{German Generated Poetic Texts - GGPT}},
url = {https://github.com/nevmenandr/german-generated-poetic-texts},
year = {2022}
}
Owner
- Name: Boris Orekhov
- Login: nevmenandr
- Kind: user
- Location: Moscow
- Website: https://nevmenandr.github.io
- Twitter: nevmenandr
- Repositories: 42
- Profile: https://github.com/nevmenandr
Digital humanities researcher
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: German Generated Poetic Texts - GGPT
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: Boris
    family-names: Orekhov
    email: nevmenandr@gmail.com
    affiliation: HSE University
    orcid: 'https://orcid.org/0000-0002-9099-0436'
identifiers:
  - type: doi
    value: 10.5281/zenodo.7114237
repository-code: 'https://github.com/nevmenandr/german-generated-poetic-texts'
url: >-
  https://github.com/nevmenandr/german-generated-poetic-texts
abstract: >-
  This repository publishes poetic texts in German generated
  by character-based recurrent neural networks. At first
  sight, it seems pointless to publish computer-generated
  poetic texts, since a computer can generate such texts in
  unlimited numbers. In practice, however, publishing them
  proves useful. Researchers may need generated texts for
  analysis, and such texts are difficult to obtain quickly.
  The models that generate them require software
  customization and specific versions of deep learning
  frameworks. The trained models themselves may not be
  available. These problems can cost a lot of time. In
  addition, the generation process may require special
  technical skills. All of this makes it harder for scholars
  of the humanities to work with such texts.
keywords:
  - german language
  - poetry generation
  - recurrent neural networks
license: GPL-1.0
commit: 6f9588cf69835e34d96fefd6d9fab04e667801b3
version: 1.0.0
date-released: '2022-09-26'