word_segmentation_data
https://github.com/jean-baptiste-camps/word_segmentation_data
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Jean-Baptiste-Camps
- Language: XSLT
- Default Branch: main
- Size: 929 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
Word Segmentation Datasets
Datasets to be used for training word segmentation, in particular with Boudams (Clérice, 2020).
They come from various sources, documented in the paper, and from the datasets published by Oriflamms (Stutzmann et al., https://github.com/oriflamms/).
/!\ Because of their size, not all train/dev/test files are included. They can be regenerated using the bash scripts:
- fro/src/generate_denorm_noapos.bash for the fro dataset
- lat/src/*/generate.sh for each Latin corpus (especially the largest one, Patrologia Latina).
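The regeneration step can be sketched as a small wrapper that runs every generation script named in the README. This is only an illustration: the `regenerate` function is hypothetical and not part of the repository, and it assumes each script is meant to be run from its own directory and takes no arguments, which the README does not state.

```shell
#!/usr/bin/env bash
# Sketch only: run the generation scripts named in the README.
# Assumption: each script expects to be run from its own directory, without arguments.
set -euo pipefail

# regenerate ROOT
#   Runs fro/src/generate_denorm_noapos.bash and every lat/src/*/generate.sh
#   found under ROOT (defaults to the current directory).
regenerate() {
  local root="${1:-.}"
  local script
  for script in "$root"/fro/src/generate_denorm_noapos.bash "$root"/lat/src/*/generate.sh; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    [ -f "$script" ] || continue
    echo "Running $script"
    # Use a subshell so the directory change does not leak into the caller.
    ( cd "$(dirname "$script")" && bash "./$(basename "$script")" )
  done
}
```

After cloning, calling `regenerate path/to/word_segmentation_data` would run each script in turn. Because of `set -e`, the run stops at the first script that fails.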
Datasets
Old French (fro)
BFM
- Guillot-Barbance, Céline, Heiden, Serge et Lavrentiev, Alexei (2017), « Base de français médiéval : une base de référence de sources médiévales ouverte et libre au service de la communauté scientifique », Diachroniques, n° 7, pp. 168-184. halshs-01809581
Geste
- Geste: un corpus de chansons de geste, dir. Jean-Baptiste Camps, avec la collab. d'Elena Albarran, Alice Cochet & Lucence Ing, Paris, 2016-, DOI: 10.5281/zenodo.1744918, https://github.com/Jean-Baptiste-Camps/Geste/.
Maritem
- Camps, J.-B., Chaillou, C., Mariotti, V. and Saviotti, F. (2021). Editing and Attributing Musical Texts: the Chansonnier du Roi and the MARITEM Project. EADH2021: Interdisciplinary Perspectives on Data, 2nd International Conference of the European Association for Digital Humanities, Krasnoyarsk, 2021 https://halshs.archives-ouvertes.fr/halshs-03260116/document.
Nouveau corpus d'Amsterdam
- Stein, Achim et al. (2006): Nouveau Corpus d'Amsterdam. Corpus informatique de textes littéraires d'ancien français (ca 1150-1350), établi par Anthonij Dees (Amsterdam 1987), remanié par Achim Stein, Pierre Kunstmann et Martin-D. Gleßgen. Stuttgart: Institut für Linguistik/Romanistik, version 3.
OF3C: Old French Collective Corpus of the École des chartes
- Camps, Jean-Baptiste, Clérice, Thibault, Duval, Frédéric, Kanaoka, Naomi & Pinche, Ariane (2021). Corpus and Models for Lemmatisation and POS-tagging of Old French, arXiv preprint arXiv:2109.11442, https://arxiv.org/abs/2109.11442.
OpenMedFr
- Wrisley, D. J., Fernandez Riva, G., Open Medieval French. https://github.com/OpenMedFr/.
Oriflamms
Oriflamms projects, dir. Dominique Stutzmann, available at:
- https://github.com/oriflamms/Pelerinage
- https://github.com/oriflamms/AlbumMssFrXIII
- https://github.com/oriflamms/ECMEN
- https://github.com/oriflamms/Graal
Latin (lat)
btv1b9080806r
- Vernet, M. Un Manuscrit victorin au service de la pastorale du XIIIe siècle. Master's thesis, Université PSL, Paris (2021).
Oriflamms
Oriflamms projects, dir. Dominique Stutzmann, available at:
- https://github.com/oriflamms/Fontenay/
- https://github.com/oriflamms/Dated-and-Datable-Manuscripts_AI2A
- https://github.com/oriflamms/PsautierIMS
PatrologieLatine
- Migne, J.P. (ed), Patrologia Latina: cursus completus. 221 vols. Paris, 1844–64.
Boudams
Clérice, T. (2020). Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin. Journal of Data Mining & Digital Humanities, 2020.
Owner
- Name: Jean-Baptiste Camps
- Login: Jean-Baptiste-Camps
- Kind: user
- Location: Paris
- Company: École nationale des chartes (@chartes) | PSL
- Website: www.chartes.psl.eu/jean-baptiste-camps
- Twitter: jbcamps
- Repositories: 11
- Profile: https://github.com/Jean-Baptiste-Camps
Assoc. Prof. in Computational Philology @chartes | Head of MA @Humanites-Numeriques-PSL
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 5
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 23 hours
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.2
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- PonteIneptique (4)
- Jean-Baptiste-Camps (1)
Pull Request Authors
- Jean-Baptiste-Camps (2)
- MargueriteVernet (1)