latin-2nd-century-to-thomas-more
This repository contains annotations of non-classical Latin data.
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: chartes
- License: cc-by-4.0
- Language: Jupyter Notebook
- Default Branch: master
- Size: 2.06 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Presentation
This corpus was designed by Anthony Glaise and Thibault Clérice to test the robustness of our morpho-syntactic description (MSD) and lemmatization models, which were trained on classical Latin data. It was built in three phases, whose data can be found in the raw folder.
Anthony Glaise corrected and checked all annotations and chose most of the texts, following guidelines from Thibault Clérice. Thibault Clérice provided data engineering, technical support, annotation guidelines and data analysis (publication forthcoming).
All data were annotated with the LASLA Latin model (the version may vary) through Pie-Extended, then corrected in Pyrrha, on the École nationale des chartes' instance.
Cite it with:

```tex
@misc{Glaise_Du_IIeme_siecle_2021,
  author = {Glaise, Anthony and Clérice, Thibault},
  doi = {10.5281/zenodo.6383162},
  month = {12},
  title = {{Du IIème siècle à Thomas More, un corpus gold de latin lemmatisé et annoté en morpho-syntaxe}},
  url = {https://github.com/chartes/latin-2nd-century-to-thomas-more},
  year = {2021}
}
```
or
Glaise, A., & Clérice, T. (2021). Du IIème siècle à Thomas More, un corpus gold de latin lemmatisé et annoté en morpho-syntaxe (Version 0.0.1) [Data set]. https://doi.org/10.5281/zenodo.6383162
Data
The raw folder contains the data grouped by phase, in their raw form as exported from Pyrrha.
The lasla-model-ready folder contains each text as a separate file, preprocessed to be compatible with the current Pie-Extended LASLA models.
lasla-model-ready.tsv gathers the whole content of lasla-model-ready into a single file.
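The TSV files can be inspected with Python's standard library alone. The sketch below tallies POS tags in a file such as lasla-model-ready.tsv. It assumes a tab-separated file with a header row containing a column named `POS`; that column name is an assumption (not stated in this README), so check the real header and pass the right name through the hypothetical `pos_column` parameter. The sample rows are invented for illustration only.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# ASSUMPTION: tab-separated file with a header row that includes a "POS" column.
POS_COLUMN = "POS"


def count_pos(handle: TextIO, pos_column: str = POS_COLUMN) -> Counter:
    """Count how many tokens carry each POS tag in a TSV stream."""
    # QUOTE_NONE: Latin tokens may contain quote characters that must stay literal.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts: Counter = Counter()
    for row in reader:  # DictReader already skips completely empty lines
        tag = (row.get(pos_column) or "").strip()
        if tag:  # ignore rows that have no value in the POS column
            counts[tag] += 1
    return counts


def count_pos_in_file(path: str, pos_column: str = POS_COLUMN) -> Counter:
    """Open a UTF-8 TSV file on disk and count its POS tags."""
    with open(path, encoding="utf-8", newline="") as handle:
        return count_pos(handle, pos_column)


if __name__ == "__main__":
    # Hypothetical in-memory sample, only to show the call.
    sample = (
        "form\tlemma\tPOS\tmorph\n"
        "Gallia\tGallia\tNOMpro\t_\n"
        "est\tsum\tVER\t_\n"
        ".\t.\tPUNC\t_\n"
    )
    print(count_pos(io.StringIO(sample)).most_common())
```

Applied to lasla-model-ready.tsv with the correct column name, the resulting counts can be compared with the POS table below.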
Total
The corpus contains 57,123 tokens.
| POS | Number of tokens |
|------------|--------------------|
| NOMcom | 11310 |
| VER | 10641 |
| PUNC | 9292 |
| CONcoo | 3903 |
| PRE | 3458 |
| ADJqua | 3296 |
| ADV | 3094 |
| PROdem | 2305 |
| CONsub | 1775 |
| PROrel | 1478 |
| NOMpro | 1396 |
| PROind | 1036 |
| PROper | 674 |
| ADVneg | 665 |
| ADVrel | 430 |
| VERaux | 407 |
| PROpos | 393 |
| PROref | 338 |
| PROpos.ref | 316 |
| ADJcar | 285 |
| PROint | 147 |
| ADJord | 121 |
| ADVint | 110 |
| OUT | 110 |
| INJ | 56 |
| ADJadv.ord | 30 |
| ADJdis | 26 |
| ADJadv.mul | 13 |
| ADJmul | 11 |
| ADVint.neg | 7 |
File Phase-1.tsv
| First token | Text title | Number of tokens |
|----------------------------------------------------------|------------------------------------------------|--------------------|
| urn:cts:latinLit:stoa0275.stoa022.opp-lat1:3 | Tertullien, De pallio | 684 |
| urn:cts:latinLit:stoa0275.stoa027.opp-lat2:9-10 | Tertullien, De spectaculis | 717 |
| urn:cts:latinLit:stoa0040.stoa003.opp-lat4:17.4 | Augustin, De civitate Dei | 2990 |
| urn:cts:latinLit:stoa0040.stoa011.opp-lat4:262.1-262.4 | Augustin, Lettre CCLXII | 670 |
| urn:cts:latinLit:stoa0270.stoa002.opp-lat2:9-10 | Sulpice Sévère, Vita Martini | 598 |
| urn:cts:latinLit:stoa0238.stoa002.perseus-lat2:pr.1-1.20 | Prudence, Psychomachie | 555 |
| urn:cts:latinLit:stoa0096.stoa003.opp-lat2:1.35-1.37 | Commodien, Instructiones | 506 |
| urn:cts:latinLit:stoa0104a.stoa010.opp-lat1:6-8 | Cyprien de Carthage, De unitate Ecclesiae | 751 |
| urn:cts:latinLit:stoa0249a.stoa002.opp-lat1:6.53-6.60 | Salvien de Marseille, De gubernatione Dei | 619 |
| urn:cts:latinLit:stoa0076c.stoa002.opp-lat2:8.8-8.10 | Jean Cassien, Institutiones | 625 |
| urn:cts:latinLit:stoa0022.stoa044.opp-lat1:1-8 | Ambroise de Milan, De Tobia | 653 |
| urn:cts:latinLit:stoa0054.stoa001a.opp-lat1:1-2 | Bède le Vénérable, De locis sanctis | 1073 |
| urn:cts:latinLit:stoa0149b.stoa001.opp-lat1:2 | Hilaire de Poitiers, Tractatus super psalmos | 9931 |
| urn:cts:latinLit:stoa0171.stoa002.opp-lat1:26-27 | Lactance, De mortibus persecutorum | 656 |
| urn:cts:latinLit:stoa0162.stoa024.opp-lat1:1.1.1-1.2.3 | Jérôme, Commentaire sur Jérémie | 604 |
| urn:cts:latinLit:stoa0058.stoa023.perseus-lat1:3 | Boèce, Contra Eutychen et Nestorium | 811 |
| urn:cts:latinLit:stoa0143.stoa001:@.30-2.31 | Grégoire de Tours, Historia Francorum | 685 |
| urn:cts:latinLit:stoa0261.stoa002:4.3 | Sidoine Apollinaire, Lettres | 860 |
| urn:cts:latinLit:stoa0112.stoa001:1-3 | Eginhard, Vie de Charlemagne | 723 |
File Phase-2.tsv
| First token | Text title | Number of tokens |
|---------------------------------------------------------------------|--------------------------------------------------------|--------------------|
| [REF:urn:cts:latinLit:stoa0187b.stoa002:1-4] | Mamertin, Panegyricus dicto Juliano imperatori | 790 |
| [REF:urn:cts:latinLit:stoa0287.stoa001:1.20] | Végèce, Epitome de re militari | 637 |
| [REF:urn:cts:latinLit:stoa0186-stoa001:3.14] | Macrobe, Saturnales | 858 |
| [REF:urn:cts:latinLit:phi2331.phi004.perseus-lat1:25-27] | Histoire Auguste, Marc Aurèle | 660 |
| [REF:urn:cts:latinLit:stoa0041a.stoa005:1.18-1.28] | Caelius Aurelianus, Gynaeciorum Sorani | 761 |
| [REF:urn:cts:latinLit:stoa0110.stoa002a:47-63] | Donat, In Bucolicis Vergilii commentarium, praefatio | 817 |
| [REF:urn:cts:latinLit:stoa0285c.stoa001] | Vacca, Vita M. Annaei Lucani | 571 |
| [REF:urn:cts:latinLit:stoa0116c.stoa001:1-2] | Euantius, De comoedia uel de fabula | 892 |
| [REF:urn:cts:latinLit:stoa0107.stoa001:38-40] | Darès de Phrygie, De excidio Troiae historia | 562 |
| [REF:urn:cts:latinLit:stoa0023.stoa001.perseus-lat2:20.3.1-20.3.12] | Ammien Marcellin, Res gestae | 881 |
| [REF:urn:cts:latinLit:stoa0146b.stoa001:5.2] | Hégésippe, Histoires | 447 |
File Phase-3.tsv
| First token | Text title | Number of tokens |
|-----------------------------|--------------------------------------------------------------|--------------------|
| [REF:More] | Thomas More, Utopia | 11187 |
| [REF:LegendeAndre] | Jacques de Voragine, Saint André (Légende dorée) | 3618 |
| [REF:LegendeAntoine] | Jacques de Voragine, Saint Antoine (Légende dorée) | 1590 |
| [REF:LegendeAlexis] | Jacques de Voragine, Saint Alexis (Légende dorée) | 1488 |
| [REF:LegendeSylvestre] | Jacques de Voragine, Saint Sylvestre (Légende dorée) | 1632 |
| [REF:LegendeLucie] | Jacques de Voragine, Sainte Lucie (Légende dorée) | 1188 |
| [REF:LegendeMarieMadeleine] | Jacques de Voragine, Sainte Marie-Madeleine (Légende dorée) | 1228 |
| [REF:LegendeThomas] | Jacques de Voragine, Saint Thomas (Légende dorée) | 1824 |
| [REF:LegendeFrancois] | Jacques de Voragine, Saint François (Légende dorée) | 781 |
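As a consistency check on the figures above, the per-text counts of the three phase tables and the per-tag counts of the POS table can be summed in a few lines of Python. This is only a sketch: the numbers are copied by hand from the tables (nothing is read from the TSV files), and both sums are expected to equal the 57,123 tokens declared for the corpus.

```python
from typing import Dict, List

# Per-text token counts copied by hand from the three phase tables (nothing read from disk).
PHASE_COUNTS: Dict[str, List[int]] = {
    "Phase-1.tsv": [684, 717, 2990, 670, 598, 555, 506, 751, 619, 625,
                    653, 1073, 9931, 656, 604, 811, 685, 860, 723],
    "Phase-2.tsv": [790, 637, 858, 660, 761, 817, 571, 892, 562, 881, 447],
    "Phase-3.tsv": [11187, 3618, 1590, 1488, 1632, 1188, 1228, 1824, 781],
}

# POS table, in the order listed (NOMcom, VER, PUNC, ..., ADVint.neg).
POS_COUNTS: List[int] = [11310, 10641, 9292, 3903, 3458, 3296, 3094, 2305, 1775, 1478,
                         1396, 1036, 674, 665, 430, 407, 393, 338, 316, 285,
                         147, 121, 110, 110, 56, 30, 26, 13, 11, 7]

DECLARED_TOTAL = 57123  # total stated in the "Total" section


def phase_totals(counts: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the per-text token counts of each phase file."""
    return {name: sum(values) for name, values in counts.items()}


totals = phase_totals(PHASE_COUNTS)
grand_total = sum(totals.values())

print(totals)
print("phases:", grand_total, "| POS table:", sum(POS_COUNTS), "| declared:", DECLARED_TOTAL)
```

Running it gives phase totals of 24,711, 7,876 and 24,536 tokens; these, like the POS counts, add up to 57,123.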
Owner
- Name: École nationale des chartes
- Login: chartes
- Kind: organization
- Location: 65 rue de Richelieu, 75002 Paris
- Website: http://www.chartes.psl.eu/
- Repositories: 12
- Profile: https://github.com/chartes
A higher-education institution (grand établissement) dedicated to historical research
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:
  - family-names: "Glaise"
    given-names: "Anthony"
    orcid: "https://orcid.org/0000-0003-4715-5184"
  - family-names: "Clérice"
    given-names: "Thibault"
    orcid: "https://orcid.org/0000-0003-1852-9204"
title: "Du IIème siècle à Thomas More, un corpus gold de latin lemmatisé et annoté en morpho-syntaxe"
version: 1.0.2
doi: 10.5281/zenodo.6383162
date-released: 2021-12-09
type: dataset
url: "https://github.com/chartes/latin-2nd-century-to-thomas-more"