Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Jean-Baptiste-Camps
  • Language: XSLT
  • Default Branch: main
  • Size: 929 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 1
  • Open Issues: 5
  • Releases: 0
Created almost 5 years ago · Last pushed almost 4 years ago
Metadata Files

README.md

Word Segmentation Datasets

Datasets for training word segmentation models, in particular with Boudams (Clérice, 2020).

They come from various sources, documented in the paper, and from the datasets published by Oriflamms (Stutzmann et al., https://github.com/oriflamms/).

/!\ Because of their size, not all train/dev/test files are included. They can be regenerated using the bash scripts:

  • fro/src/generate_denorm_noapos.bash for the fro dataset
  • lat/src/*/generate.sh for each Latin corpus (especially the largest one, Patrologia Latina).
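The regeneration step above can be sketched as a small shell helper. This is a minimal sketch, not the repository's documented procedure. The function names `list_generators` and `regenerate` and the `DRY_RUN` variable are hypothetical. It also assumes that each script is meant to be run from its own directory, which the README does not state.

```shell
# Sketch only: helper names and DRY_RUN are hypothetical, not part of the repository.

# Print the generation scripts found under the repository root given as $1.
list_generators() {
  local root="$1" script
  for script in "$root"/fro/src/generate_denorm_noapos.bash "$root"/lat/src/*/generate.sh; do
    # Skip unmatched globs and missing files.
    if [ -f "$script" ]; then
      printf '%s\n' "$script"
    fi
  done
}

# Run every generator; with DRY_RUN=1, only print what would be run.
# Assumption: each script expects to be started from its own directory.
regenerate() {
  local root="$1" script
  list_generators "$root" | while IFS= read -r script; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'would run: %s\n' "$script"
    else
      # stdin is redirected so a script cannot consume the list being read.
      (cd "$(dirname "$script")" && bash "$(basename "$script")" </dev/null)
    fi
  done
}
```

From the repository root, `DRY_RUN=1 regenerate .` lists the scripts without running them, which is worth doing first given the size of the Patrologia Latina corpus. `regenerate .` then runs them.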

Datasets

Old French (fro)

BFM

  • Guillot-Barbance, Céline, Heiden, Serge et Lavrentiev, Alexei (2017), « Base de français médiéval : une base de référence de sources médiévales ouverte et libre au service de la communauté scientifique », Diachroniques, n° 7, pp. 168-184. halshs-01809581

Geste

Maritem

  • Camps, J.-B., Chaillou, C., Mariotti, V. and Saviotti, F. (2021). Editing and Attributing Musical Texts: the Chansonnier du Roi and the MARITEM Project. EADH2021: Interdisciplinary Perspectives on Data, 2nd International Conference of the European Association for Digital Humanities, Krasnoyarsk, 2021 https://halshs.archives-ouvertes.fr/halshs-03260116/document.

Nouveau corpus d'Amsterdam

  • Stein, Achim et al. (2006): Nouveau Corpus d'Amsterdam. Corpus informatique de textes littéraires d'ancien français (ca 1150-1350), établi par Anthonij Dees (Amsterdam 1987), remanié par Achim Stein, Pierre Kunstmann et Martin-D. Gleßgen. Stuttgart: Institut für Linguistik/Romanistik, version 3.

OF3C: Old French Collective Corpus of the École des chartes

  • Camps, Jean-Baptiste, Clérice, Thibault, Duval, Frédéric, Kanaoka, Naomi & Pinche, Ariane (2021). Corpus and Models for Lemmatisation and POS-tagging of Old French, arXiv preprint arXiv:2109.11442, https://arxiv.org/abs/2109.11442.

OpenMedFr

Oriflamms

Oriflamms projects, dir. Dominique Stutzmann, available at:

  • https://github.com/oriflamms/Pelerinage
  • https://github.com/oriflamms/AlbumMssFrXIII
  • https://github.com/oriflamms/ECMEN
  • https://github.com/oriflamms/Graal

Latin (lat)

btv1b9080806r

  • Vernet, M. Un Manuscrit victorin au service de la pastorale du XIIIe siècle. Master's thesis, Université PSL, Paris (2021).

Oriflamms

Oriflamms projects, dir. Dominique Stutzmann, available at:

  • https://github.com/oriflamms/Fontenay/
  • https://github.com/oriflamms/Dated-and-Datable-Manuscripts_AI2A
  • https://github.com/oriflamms/PsautierIMS

PatrologieLatine

  • Migne, J.P. (ed), Patrologia Latina: cursus completus. 221 vols. Paris, 1844-64.

Boudams

Clérice, T. (2020). Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin. Journal of Data Mining & Digital Humanities, 2020.

Owner

  • Name: Jean-Baptiste Camps
  • Login: Jean-Baptiste-Camps
  • Kind: user
  • Location: Paris
  • Company: École nationale des chartes (@chartes) | PSL

Assoc. Prof. in Computational Philology @chartes | Head of MA @Humanites-Numeriques-PSL

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 5
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 23 hours
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.2
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • PonteIneptique (4)
  • Jean-Baptiste-Camps (1)
Pull Request Authors
  • Jean-Baptiste-Camps (2)
  • MargueriteVernet (1)
Top Labels
Issue Labels
  • latin (4)
Pull Request Labels