dh2025
Repository containing the data and code for our short paper on the novel beginning study presented at DH2025 in Lisbon
Science Score: 75.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization canspinproject has institutional domain (www.canspin.uni-rostock.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CANSpiNproject
- Language: HTML
- Default Branch: main
- Size: 28.5 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
dh2025
This repository contains the data and code for our short paper "They crossed the valley of Catamarca: A study of narrative space in novel openings" presented at DH2025 in Lisbon.
Content
Folders and files
- cs1 annotation data as .tsv files:
  - /canspin-deu-19
  - /canspin-deu-20 (for legal reasons, this data from the 20th century is only available as shuffled tsv)
  - /canspin-spa-19
  - /canspin-lat-19
- cs1 annotation data as CATMA project: /CATMA_4AA4ADC0-4C28-54F9-B6A1-5DCEFF34B90B_DH2025_CANSpiN
- data and documentation of the novel beginning analysis: /novel_beginning_analysis
  - categories.md: documents our definitions of the novel beginning analysis categories and their application to the texts
  - categorization.tsv: contains the novel beginning analysis data
- data and visualizations derived from the analysis:
  - /results
    - annotation_distribution__<chapter_id>.html.json: contains data on the distribution of cs1 annotations over a chapter in text units of 200 tokens (spa-19 and lat-19) or 300 tokens (deu-19 and deu-20)
    - annotation_statistics__first_1000_token.json: documents the relative and absolute numbers of cs1 annotations and the most frequent tokens per annotation class for the first 1000 tokens of the first chapters of all texts
    - annotation_statistics__whole_chapters.json: documents the relative and absolute numbers of cs1 annotations and the most frequent tokens per annotation class for the whole first chapters of all texts
  - /visualizations
    - annotation_distribution__<chapter_id>.html/.png: visualizes the distribution data of cs1 annotations for each chapter
    - cs1_annotation_amounts__1000_tokens.html/.png: shows the ratio of annotations to tokens in the first 1000 tokens of each text
    - cs1_annotation_amounts__all_tokens.html/.png: shows the ratio of annotations to tokens in the whole first chapter of each text
    - first_character_event_overview.png: shows the token position at which the first character event occurs in each text
    - first-character-event-cs1-relation__<chapter_id>.png: combines the data on the cs1 annotation distribution of each chapter with the first character event data of that chapter
- bibliography of the short paper: bibliography.bib
- notebook to recreate the analysis results that are already saved in the /results folder: perform_analysis.ipynb
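The distribution files listed above group cs1 annotations into text units of a fixed token size. The following is a minimal sketch of that kind of binning, not the notebook's actual implementation: the function name, the `(token_index, label)` input format, and the placeholder labels are assumptions made for illustration.

```python
from collections import Counter


def annotation_distribution(annotations, n_tokens, unit_size):
    """Count annotations per label in consecutive units of `unit_size` tokens.

    `annotations` is an iterable of (token_index, label) pairs; this input
    format is an assumption, not the format of the repository's .tsv files.
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    # Ceiling division: the last unit may be shorter than unit_size.
    n_units = -(-n_tokens // unit_size)
    units = [Counter() for _ in range(n_units)]
    for token_index, label in annotations:
        if not 0 <= token_index < n_tokens:
            raise ValueError(f"token index {token_index} lies outside the chapter")
        units[token_index // unit_size][label] += 1
    return units


# Hypothetical annotations for a chapter of 1074 tokens, binned in units of 200.
example = [(0, "label_a"), (199, "label_a"), (200, "label_b"), (1073, "label_a")]
distribution = annotation_distribution(example, n_tokens=1074, unit_size=200)
print(len(distribution))
print(dict(distribution[0]))
```

Counting a trailing partial unit as its own unit is a design choice of this sketch; the published data may treat the chapter end differently.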
Corpus overview
The corpus consists of the first chapters of eight German, Spanish, and Latin American novels from the 19th and 20th centuries. The data originates from the European Literary Text Collection (ELTeC), the Corpus de novelas hispanoamericanas del siglo XIX (conha19), the Complete Works of Uwe Johnson project (CWUJ), and e-books.
| Corpus | ID | Title | Author | Year | Tokens | Source |
|--------|----|-------|--------|------|--------|--------|
| DEU19 | DEU19001 | Weisse Sclaven oder die Leiden des Volkes | Willkomm, Ernst Adolf | 1845 | 5491 | ELTeC-deu |
| DEU19 | DEU19030 | Die verlorene Handschrift | Freytag, Gustav | 1864 | 7179 | ELTeC-deu |
| DEU20 | DEU20002 | Ansichten eines Clowns | Böll, Heinrich | 1963 | 2689 | E-Book: Kiepenheuer & Witsch 2009 (restricted) |
| DEU20 | DEU20021 | Zwei Ansichten | Johnson, Uwe | 1965 | 744 | CWUJ (restricted) |
| SPA19 | SPA19001 | El Señor de Bembibre | Gil y Carrasco, Enrique | 1855 | 1883 | ELTeC-spa |
| SPA19 | SPA19008 | Los templarios | Mora, Juan de Dios | 1856 | 4309 | ELTeC-spa |
| LAT19 | LAT19004 | El falso Inca. Cronicón de la conquista | Payró, Roberto | 1905 | 1210 | conha19 |
| LAT19 | LAT19041 | El pozo del Yocci | Gorriti, Juana Manuela | 1876 | 1074 | conha19 |
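As a quick orientation, the token counts in the table can be summed per corpus and converted into the number of text units per chapter. This is a small worked example, not part of the repository's code; it uses the unit sizes named for the distribution data (300 tokens for deu-19 and deu-20, 200 tokens for spa-19 and lat-19) and assumes that a trailing partial unit counts as a unit of its own.

```python
import math
from collections import defaultdict

# (corpus, chapter ID, token count), copied from the corpus table.
CHAPTERS = [
    ("DEU19", "DEU19001", 5491),
    ("DEU19", "DEU19030", 7179),
    ("DEU20", "DEU20002", 2689),
    ("DEU20", "DEU20021", 744),
    ("SPA19", "SPA19001", 1883),
    ("SPA19", "SPA19008", 4309),
    ("LAT19", "LAT19004", 1210),
    ("LAT19", "LAT19041", 1074),
]

# Text unit sizes in tokens, as stated for the annotation distribution data.
UNIT_SIZE = {"DEU19": 300, "DEU20": 300, "SPA19": 200, "LAT19": 200}

tokens_per_corpus = defaultdict(int)
units_per_chapter = {}
for corpus, chapter_id, tokens in CHAPTERS:
    tokens_per_corpus[corpus] += tokens
    # Assumption: a trailing partial unit is counted as a unit of its own.
    units_per_chapter[chapter_id] = math.ceil(tokens / UNIT_SIZE[corpus])

total_tokens = sum(tokens_per_corpus.values())
print(dict(tokens_per_corpus))
print(total_tokens)
print(units_per_chapter)
```

Note that DEU20021, with 744 tokens, is the only chapter shorter than 1000 tokens.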
Annotation overview
Classes
The annotation system CANSpiN.CS1 (v1.1.0) is defined in the respective guideline.
Amount

Usage
To use the notebook perform_analysis.ipynb, install the gitma-canspin package (v1.6.5) following the instructions in its README. The notebook lets you reproduce the analysis steps we performed. You do not need to run it if you only want to see the analysis results; in that case, see our paper and the contents of the /results folder.
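If you only want to look at the saved results, the JSON files in /results can be inspected with the Python standard library alone. The sketch below makes no assumption about the internal structure of those files; it only reports the top-level type of each one. The function name and the relative path `results` (repository root as working directory) are assumptions.

```python
import json
from pathlib import Path


def summarize_results(results_dir):
    """Map each .json file name in `results_dir` to a short top-level summary."""
    summary = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, dict):
            summary[path.name] = ("dict", sorted(data.keys()))
        elif isinstance(data, list):
            summary[path.name] = ("list", len(data))
        else:
            summary[path.name] = (type(data).__name__, None)
    return summary


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name, info in summarize_results("results").items():
        print(name, info)
```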
Licenses
The original texts are in the public domain, with the exception of the German-language novels from the 20th century, which are protected by copyright. Accordingly, the latter data is published here in a derived format as shuffled .tsv only.
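To illustrate what a shuffled derived format means in practice, here is a minimal sketch that randomizes the order of the data rows of a TSV file, so that per-row information survives while the running text can no longer be read off. It is not the procedure used to produce the published files, which is not documented here; row-level shuffling, the presence of a header row, and the sample columns are all assumptions.

```python
import random


def shuffle_tsv_rows(lines, has_header=True, seed=None):
    """Return the TSV lines with their data rows in random order.

    The header row (if any) stays in first position; only data rows move.
    """
    lines = list(lines)
    header, body = (lines[:1], lines[1:]) if has_header else ([], lines)
    random.Random(seed).shuffle(body)
    return header + body


# Hypothetical two-column input; the real column layout may differ.
sample = [
    "token\tlabel",
    "They\t-",
    "crossed\t-",
    "the\t-",
    "valley\tlabel_a",
]
shuffled = shuffle_tsv_rows(sample, seed=0)
print("\n".join(shuffled))
```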
We publish the annotations under the Creative Commons Attribution 4.0 International license (CC BY 4.0) and the Jupyter notebook under the GNU General Public License, version 3.
The Aspekta font used for the creation of visualizations with the Pillow package in the notebook is licensed under the Open Font License 1.1.
Owner
- Name: Computational Approaches to Narrative Space in 19th and 20th Century Novels
- Login: CANSpiNproject
- Kind: organization
- Location: Germany
- Website: https://www.canspin.uni-rostock.de/en
- Repositories: 1
- Profile: https://github.com/CANSpiNproject
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: dh2025
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: Nils
    family-names: Kellner
    affiliation: University of Rostock
    orcid: 'https://orcid.org/0009-0002-3966-5635'
  - given-names: Marc
    family-names: Lemke
    affiliation: University of Rostock
    orcid: 'https://orcid.org/0009-0004-8065-8191'
  - given-names: Ulrike
    family-names: Henny-Krahmer
    affiliation: University of Rostock
    orcid: 'https://orcid.org/0000-0003-2852-065X'
  - given-names: Julián C.
    family-names: Spinelli
    affiliation: University of Buenos Aires
    orcid: 'https://orcid.org/0009-0003-0895-815X'
  - given-names: Erik
    family-names: Renz
    affiliation: University of Rostock
    orcid: 'https://orcid.org/0009-0005-8288-7470'
  - given-names: Anika
    family-names: Piotraschke
    affiliation: University of Rostock
    orcid: 'https://orcid.org/0009-0004-3076-5781'
identifiers:
  - type: doi
    value: 10.5281/zenodo.15423438
repository-code: 'https://github.com/CANSpiNproject/dh2025'
url: 'https://www.canspin.uni-rostock.de/en'
abstract: >-
  This is a repository containing the data and code for our short paper
  "They crossed the valley of Catamarca: A study of narrative space in novel openings"
  presented at DH2025 in Lisbon.
keywords:
  - CANSpiN
  - SPP 2207
  - Digital Humanities
  - Computational Literary Studies
  - DH2025
license: CC-BY-4.0
commit: b4715de59e57a3646c9e48e84cd98c6faf797ad6
version: 1.0.4
date-released: '2025-05-23'
GitHub Events
Total
- Release event: 3
- Delete event: 2
- Push event: 40
- Create event: 7
Last Year
- Release event: 3
- Delete event: 2
- Push event: 40
- Create event: 7