https://github.com/chartes/of3c
Old French Collective Corpus of the École des chartes
Statistics
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
OF3C - Old French Collective Corpus of the École des chartes
Cite this corpus
- Camps, Jean-Baptiste, Clérice, Thibault, Duval, Frédéric, Kanaoka, Naomi & Pinche, Ariane (2021). Corpus and Models for Lemmatisation and POS-tagging of Old French, arXiv preprint arXiv:2109.11442, https://arxiv.org/abs/2109.11442.
Sources
- [Chrestien]: Kunstmann, Pierre (éd), Chrétien de Troyes: Cligès, Erec, Lancelot, Perceval, Yvain – manuscrit P (BnF fr. 794), 2009, http://www.atilf.fr/dect.
- [Code]: Duval and Pastore, in progress.
- [DocLing]: Gleßgen, Martin Dietrich (dir.), et al., Les plus anciens documents linguistiques de la France, 2016, http://www.rose.uzh.ch/docling/, 3e édition.
- [Geste]: Camps, Jean-Baptiste (dir.), Geste: un corpus de chansons de geste, 2016-… (v02), École nationale des chartes, Paris, 2019, http://doi.org/10.5281/zenodo.2630574, textes du domaine public, développements CC-BY-SA.
- [Lancelot]: Ing, Lucence, Disparitions lexicales en diachronie: traitements automatiques sur le Lancelot en prose, thèse de doct. en préparation, dir. F. Duval, codir. J.B. Camps, École nationale des chartes, Université PSL, Paris.
- [WauchierSConf]: Pinche, Ariane, Édition nativement numérique du recueil hagiographique ‘Li Seint Confessor’ de Wauchier de Denain d’après le manuscrit fr. 412 de la Bibliothèque nationale de France, thèse de doctorat dir. C. Pierreville et B. Bureau, Université de Lyon, Lyon, 2021.
The [Varia] texts are short excerpts drawn from the work of students at the École des chartes. They were annotated in 2020 as part of the assessment for the course "Initiation à la philologie romane: introduction au moyen français", taught by Lucence Ing and Jean-Baptiste Camps (thematic dossier on the plague and medicine, during the first lockdown of the COVID-19 pandemic in 2020).
Texts from:
- Chroniques de Froissart, after Paris ms. fr. 2663, 168v.-169r (Online Froissart: P63, SHF 1-318)
- Chroniques de Froissart, after London Arundel 67 (vol. 1), 360r-360v (Online Froissart: L67, SHF 1-330)
- Great Surgery by Guy de Chauliac, from the edition by Nicaise, Edouard (1890), p. 167 ff.
- Poésies de Gilles li Muisis, published for the first time from the manuscript of Lord Ashburnham by Baron Kervyn de Lettenhove, Louvain, 1882, https://archive.org/details/posiesdegilles01lemuuoft/page/78/mode/2up
Statistics (2023-04-26)
Token, Lemma and POS counts
| Category | Different | Total | Values with 1 occurrence only |
|----------|-----------|-----------|-------------------------------|
| Forms | 47,661 | 1,183,960 | 23,851 |
| Lemma | 11,295 | 1,183,960 | 3,852 |
| POS | 66 | 1,183,960 | 6 |
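The three columns of this table can be read as the number of distinct values, the number of annotated tokens, and the number of values that occur exactly once. A minimal sketch of that computation follows, assuming the annotations are available as a flat list of values; the `summarize` helper and the toy tag list are hypothetical and are not taken from the corpus or its tooling.

```python
from collections import Counter


def summarize(values):
    """Return (different, total, single_occurrence) for a list of annotation values."""
    counts = Counter(values)
    different = len(counts)            # number of distinct values
    total = sum(counts.values())       # number of annotated tokens
    single = sum(1 for n in counts.values() if n == 1)  # values seen exactly once
    return different, total, single


# Hypothetical toy data, for illustration only (not corpus figures).
pos_tags = ["NOMcom", "VERcjg", "NOMcom", "PRE", "DETdef", "NOMcom", "PRE"]
print(summarize(pos_tags))  # (4, 7, 2)
```

Applied to the full POS column of the corpus, the same logic would be expected to yield the 66 distinct tags and 6 single-occurrence tags reported above.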
Morphology counts
A non-x value means that the category actually applies to the token. For example, a verb has a DEGRE annotation of x, because verbs cannot carry a degree.

| Category | Different | Total | Non-x values |
|----------|-----------|---------|--------------|
| Mode | 6 | 478,657 | 60,740 |
| Temps | 5 | 478,657 | 57,367 |
| Personne | 5 | 478,657 | 106,566 |
| Nombre | 3 | 478,657 | 290,326 |
| Genre | 4 | 478,657 | 226,996 |
| Cas | 4 | 478,657 | 229,586 |
| Degre | 5 | 478,657 | 42,949 |
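The "Non-x values" column appears to be the total minus the count of the `x` value in each category. The short check below, using only figures from this README, is a sketch under that assumption; the variable names are illustrative.

```python
# Total number of tokens carrying morphology annotations (from the table above).
TOTAL = 478_657

# Count of the "x" value per category, as listed in the per-category tables
# of this README (MODE=x, TEMPS=x, PERS.=x, NOMB.=x, GENRE=x, CAS=x, DEGRE=x).
x_counts = {
    "Mode": 417_917,
    "Temps": 421_290,
    "Personne": 372_091,
    "Nombre": 188_331,
    "Genre": 251_661,
    "Cas": 249_071,
    "Degre": 435_708,
}

# Assumption: non-x = total - x.
non_x = {category: TOTAL - n for category, n in x_counts.items()}

for category, n in non_x.items():
    print(f"{category}: {n:,}")
```

The results match the "Non-x values" column for all seven categories, which supports reading that column as total minus `x`.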
POS
| Value | Count |
|---------------|---------|
| NOMcom | 160,410 |
| VERcjg | 156,630 |
| PROper | 96,533 |
| PRE | 91,586 |
| PONfbl | 79,784 |
| ADVgen | 79,578 |
| CONcoo | 66,658 |
| DETdef | 57,655 |
| PONfrt | 42,489 |
| CONsub | 40,120 |
| VERppe | 35,647 |
| ADJqua | 31,675 |
| VERinf | 28,218 |
| NOMpro | 27,872 |
| ADVneg | 25,947 |
| PROrel | 25,542 |
| DETpos | 22,367 |
| PROadv | 15,003 |
| PRE.DETdef | 14,836 |
| PROdem | 14,327 |
| PROind | 11,661 |
| DETind | 10,985 |
| PONpga | 7,707 |
| DETndf | 7,076 |
| DETdem | 6,057 |
| PONpdr | 4,842 |
| DETcar | 3,229 |
| VERppa | 2,784 |
| ADJind | 2,575 |
| PROimp | 2,036 |
| PROcar | 1,855 |
| ADJcar | 1,277 |
| ADJpos | 1,049 |
| PROint | 1,014 |
| PONpxx | 1,012 |
| ADVneg.PROper | 952 |
| PROpos | 669 |
| ADJord | 636 |
| ADVsub | 592 |
| INJ | 549 |
| ADVint | 506 |
| DETrel | 448 |
| PROord | 327 |
| PROper.PROper | 311 |
| ADVgen.PROper | 271 |
| DETint | 225 |
| PRE.PROdem | 151 |
| DETcom | 52 |
| PRE.PROper | 47 |
| PROrel.PROper | 46 |
| RED | 34 |
| ETR | 33 |
| CONsub.PROper | 18 |
| ADVgen.CONsub | 16 |
| PRE.DETcom | 12 |
| DETord | 8 |
| ADJqua.NOMcom | 7 |
| PRE.PROrel | 4 |
| ADVing | 2 |
| ADVneg.PROadv | 2 |
| PROint.PROper | 1 |
| CONsubs | 1 |
| ADVgen.PROadv | 1 |
| NomPro | 1 |
| PRE.DETrel | 1 |
| CONsub.DETdef | 1 |
Mode
| Value | Count |
|-----------|---------|
| MODE=x | 417,917 |
| MODE=ind | 51,951 |
| MODE=sub | 5,416 |
| MODE=imp | 2,061 |
| MODE=con | 1,311 |
| MODE=cond | 1 |
Temps
| Value | Count |
|-----------|---------|
| TEMPS=x | 421,290 |
| TEMPS=pst | 29,150 |
| TEMPS=psp | 14,882 |
| TEMPS=ipf | 9,012 |
| TEMPS=fut | 4,323 |
Personne
| Value | Count |
|---------|---------|
| PERS.=x | 372,091 |
| PERS.=3 | 76,497 |
| PERS.=1 | 18,377 |
| PERS.=2 | 11,455 |
| PERS.=0 | 237 |
Nombre
| Value | Count |
|---------|---------|
| NOMB.=s | 218,952 |
| NOMB.=x | 188,331 |
| NOMB.=p | 71,374 |
Genre
| Value | Count |
|---------|---------|
| GENRE=x | 251,661 |
| GENRE=m | 155,955 |
| GENRE=f | 63,962 |
| GENRE=n | 7,079 |
Cas
| Value | Count |
|---------|---------|
| CAS=x | 249,071 |
| CAS=r | 145,693 |
| CAS=n | 75,652 |
| CAS=i | 8,241 |
Degre
| Value | Count |
|---------|---------|
| DEGRE=x | 435,708 |
| DEGRE=- | 24,947 |
| DEGRE=p | 16,622 |
| DEGRE=c | 910 |
| DEGRE=s | 470 |
Owner
- Name: École nationale des chartes
- Login: chartes
- Kind: organization
- Location: 65 rue de Richelieu, 75002 Paris
- Website: http://www.chartes.psl.eu/
- Repositories: 12
- Profile: https://github.com/chartes
Major higher-education institution ("grand établissement") dedicated to historical research