littleprince_dataset

The little prince translations line by line - El principito traducido linea a linea - Le Petit Prince traduit ligne par ligne

https://github.com/jwackito/littleprince_dataset

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: jwackito
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 446 KB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 7 months ago
Metadata Files
Readme · License · Citation

README.md

Aligned Multilingual "The Little Prince" Dataset

This repository provides a trilingual aligned version of The Little Prince (Le Petit Prince) by Antoine de Saint-Exupéry. It is intended for use in research and applications involving machine translation, parallel corpora, and cross-lingual language modeling.

Dataset Structure

The dataset is organized into three folders, one for each language:

    .
    ├── english/
    │   └── thelittleprince.txt
    ├── spanish/
    │   └── elprincipito.txt
    ├── french/
    │   └── lepetitprince.txt

Each file contains the full text of The Little Prince in its respective language. The files are line-aligned, meaning that:

- Line n in the English file corresponds to line n in the Spanish and French files.

This alignment enables straightforward use in multilingual NLP tasks.
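Because the alignment is purely positional, the three files can be read in parallel and zipped together. The following is a minimal sketch, not part of the repository: the helper name `load_aligned`, the `FILES` mapping, and the UTF-8 encoding are assumptions, and the relative paths are taken from the directory tree above.

```python
from pathlib import Path

# Relative paths as listed in the README's directory tree.
FILES = {
    "english": "english/thelittleprince.txt",
    "spanish": "spanish/elprincipito.txt",
    "french": "french/lepetitprince.txt",
}


def load_aligned(root="."):
    """Return a list of {language: line} dicts, one per aligned line.

    Hypothetical helper: `root` is the directory holding the three
    language folders (for example, a local clone of the repository).
    """
    columns = {}
    for lang, rel_path in FILES.items():
        # UTF-8 is an assumption; the README does not state the encoding.
        with open(Path(root) / rel_path, encoding="utf-8") as handle:
            # Iterating the file splits on newlines only, so line n here
            # is line n of the file on disk.
            columns[lang] = [line.rstrip("\n") for line in handle]

    # Positional alignment only holds if every file has the same length.
    lengths = {lang: len(lines) for lang, lines in columns.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"line counts differ: {lengths}")

    langs = list(columns)
    return [dict(zip(langs, row)) for row in zip(*columns.values())]
```

Calling `load_aligned("path/to/clone")` would give one dictionary per line, so `rows[n]["english"]` and `rows[n]["french"]` refer to the same sentence. The explicit length check is a design choice: a silent `zip` would truncate to the shortest file and hide a misalignment.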

Example

| English                          | Spanish                            | French                           |
|----------------------------------|------------------------------------|----------------------------------|
| Once when I was six years old... | Cuando tenía seis años...          | Lorsque j'avais six ans...       |
| I saw a magnificent image...     | Vi una magnífica imagen...         | J'ai vu une magnifique image...  |

Usage

You are welcome to use this dataset for academic and research purposes. Please cite the dataset as described below if you use it in your work.

License

MIT License

Citation

If you use this dataset, please cite it using the metadata below.

    @software{Bogado_Aligned_Multilingual_The_2025,
      author = {Bogado, Joaquin and Giuliodoro, Germán},
      license = {MIT License},
      month = jun,
      title = {{Aligned Multilingual 'The Little Prince' Dataset}},
      url = {https://github.com/jwackito/littleprince_dataset},
      version = {1.0.0},
      year = {2025}
    }

Owner

  • Name: Joaquin Bogado
  • Login: jwackito
  • Kind: user
  • Location: La Plata, Buenos Aires, Argentina

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this dataset, please cite it using the metadata below."
title: "Aligned Multilingual 'The Little Prince' Dataset"
authors:
  - family-names: Bogado
    given-names: Joaquin
  - family-names: Giuliodoro
    given-names: Germán
date-released: 2025-06-28
version: "1.0.0"
license: "MIT License"
url: "https://github.com/jwackito/littleprince_dataset"
repository-code: "https://github.com/jwackito/littleprince_dataset"
abstract: >
  A line-aligned trilingual dataset of Antoine de Saint-Exupéry's 'The Little Prince',
  available in English, Spanish, and French. Useful for training and evaluating multilingual
  language models and translation systems.

GitHub Events

Total
  • Watch event: 1
  • Push event: 21
  • Pull request event: 5
  • Fork event: 1
  • Create event: 2
Last Year
  • Watch event: 1
  • Push event: 21
  • Pull request event: 5
  • Fork event: 1
  • Create event: 2