littleprince_dataset
The little prince translations line by line - El principito traducido linea a linea - Le Petit Prince traduit ligne par ligne
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: jwackito
- License: mit
- Language: Python
- Default Branch: main
- Size: 446 KB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Aligned Multilingual "The Little Prince" Dataset
This repository provides a line-aligned trilingual (English, Spanish, French) version of The Little Prince (Le Petit Prince) by Antoine de Saint-Exupéry. It is intended for research and applications involving machine translation, parallel corpora, and cross-lingual language modeling.
Dataset Structure
The dataset is organized into three folders, one for each language:
.
├── english/
│   └── thelittleprince.txt
├── spanish/
│   └── elprincipito.txt
└── french/
    └── lepetitprince.txt
Each file contains the full text of The Little Prince in its respective language. The files are line-aligned: line n of the English file corresponds to line n of the Spanish and French files.
This alignment makes the files directly usable in multilingual NLP tasks, for example as parallel sentence pairs for machine translation.
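A minimal Python sketch of loading the three files as aligned records, assuming the paths from the directory tree, UTF-8 encoding, and that it runs from the repository root. The helper names `align` and `load_aligned` are hypothetical and not part of the repository. The sketch refuses input whose line counts differ, since a single missing line would silently shift every later pair.

```python
from __future__ import annotations

from pathlib import Path

# Relative paths taken from the README's directory tree (assumes the repo root as cwd).
FILES = {
    "en": Path("english/thelittleprince.txt"),
    "es": Path("spanish/elprincipito.txt"),
    "fr": Path("french/lepetitprince.txt"),
}


def align(texts: dict[str, str]) -> list[dict[str, str]]:
    """Zip line n of every text into one record keyed by language code."""
    split = {lang: text.splitlines() for lang, text in texts.items()}
    counts = {lang: len(lines) for lang, lines in split.items()}
    # A differing line count means line n no longer matches line n, so fail loudly.
    if len(set(counts.values())) > 1:
        raise ValueError(f"files are not line-aligned: {counts}")
    langs = list(split)
    return [dict(zip(langs, row)) for row in zip(*split.values())]


def load_aligned(root: str | Path = ".") -> list[dict[str, str]]:
    """Read the three UTF-8 files under `root` and return aligned records."""
    root = Path(root)
    texts = {lang: (root / rel).read_text(encoding="utf-8") for lang, rel in FILES.items()}
    return align(texts)
```

For example, `load_aligned()[0]["fr"]` would return the first line of the French file. Keeping the pure `align` step separate from file reading makes the alignment check easy to reuse on other inputs.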
Example
| English | Spanish | French |
|----------------------------------|------------------------------------|----------------------------------|
| Once when I was six years old... | Cuando tenía seis años... | Lorsque j'avais six ans... |
| I saw a magnificent image... | Vi una magnífica imagen... | J'ai vu une magnifique image... |
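For machine-translation work, rows like these can be exported as a bilingual tab-separated file with one sentence pair per line. This is a sketch under stated assumptions: the hypothetical `pairs_tsv` and `export_tsv` helpers, the `pairs.tsv` output name, and the choice to drop pairs where either side is blank are illustrative, not something the repository ships.

```python
from __future__ import annotations

from pathlib import Path

# Relative paths taken from the README's directory tree (assumes the repo root as cwd).
FILES = {
    "en": Path("english/thelittleprince.txt"),
    "es": Path("spanish/elprincipito.txt"),
    "fr": Path("french/lepetitprince.txt"),
}


def pairs_tsv(src_text: str, tgt_text: str) -> str:
    """Pair line n of the source with line n of the target as 'src<TAB>tgt' rows."""
    src, tgt = src_text.splitlines(), tgt_text.splitlines()
    if len(src) != len(tgt):
        raise ValueError(f"line counts differ: {len(src)} vs {len(tgt)}")
    rows = []
    for s, t in zip(src, tgt):
        # Strip whitespace and replace stray tabs so each row has exactly two columns.
        s = s.strip().replace("\t", " ")
        t = t.strip().replace("\t", " ")
        if s and t:  # drop pairs where either side is blank
            rows.append(f"{s}\t{t}")
    return "".join(row + "\n" for row in rows)


def export_tsv(src: str, tgt: str, root: str | Path = ".", out: str | Path = "pairs.tsv") -> int:
    """Write the src->tgt pairs to `out` and return how many pairs were written."""
    root = Path(root)
    text = pairs_tsv(
        (root / FILES[src]).read_text(encoding="utf-8"),
        (root / FILES[tgt]).read_text(encoding="utf-8"),
    )
    Path(out).write_text(text, encoding="utf-8")
    return text.count("\n")
```

For example, `export_tsv("en", "fr")` would write English-French pairs to `pairs.tsv`. Filtering happens only after the line-count check, so blank lines are dropped in pairs and never shift the alignment.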
Usage
You are welcome to use this dataset for academic and research purposes. Please cite the dataset as described below if you use it in your work.
License
MIT License
Citation
If you use this dataset, please cite it using the metadata below.
BibTeX:
@software{Bogado_Aligned_Multilingual_The_2025,
  author = {Bogado, Joaquin and Giuliodoro, Germán},
  license = {MIT License},
  month = jun,
  title = {{Aligned Multilingual 'The Little Prince' Dataset}},
  url = {https://github.com/jwackito/littleprince_dataset},
  version = {1.0.0},
  year = {2025}
}
Owner
- Name: Joaquin Bogado
- Login: jwackito
- Kind: user
- Location: La Plata, Buenos Aires, Argentina
- Repositories: 11
- Profile: https://github.com/jwackito
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite it using the metadata below."
title: "Aligned Multilingual 'The Little Prince' Dataset"
authors:
  - family-names: Bogado
    given-names: Joaquin
  - family-names: Giuliodoro
    given-names: Germán
date-released: 2025-06-28
version: "1.0.0"
license: "MIT License"
url: "https://github.com/jwackito/littleprince_dataset"
repository-code: "https://github.com/jwackito/littleprince_dataset"
abstract: >
  A line-aligned trilingual dataset of Antoine de Saint-Exupéry's 'The Little Prince',
  available in English, Spanish, and French. Useful for training and evaluating multilingual
  language models and translation systems.
GitHub Events
Total
- Watch event: 1
- Push event: 21
- Pull request event: 5
- Fork event: 1
- Create event: 2
Last Year
- Watch event: 1
- Push event: 21
- Pull request event: 5
- Fork event: 1
- Create event: 2