freemnorm

Parallel corpus for Early Modern French

https://github.com/freem-corpora/freemnorm

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: FreEM-corpora
  • Language: Python
  • Default Branch: master
  • Size: 3.92 MB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 4 years ago · Last pushed about 1 year ago

README.md

FreEM Norm corpus

WARNING: This repository replaces [PARALLEL17](https://github.com/e-ditiones/PARALLEL17), which is no longer maintained.


Parallel corpus (diplomatic vs normalised) of 17th c. French texts.

For more information about FreEM corpora, cf. our website.

Corpus

The corpus is available in the corpus folder.

A detailed list of the content is available here.

Transcriptions

Transcriptions are near-diplomatic. The long ſ is preserved (plaiſir, not plaisir). Ligatures that have disappeared from modern usage (ſt, st, ct) are not kept, but those still used in contemporary French (œ, æ) are.
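
As an illustration of the ligature convention above, here is a minimal Python sketch that decomposes precomposed historic ligatures while leaving ſ, œ and æ untouched. It is not part of this repository: the function name `decompose_historic_ligatures` and the mapping are hypothetical, and the sketch assumes the ligatures are encoded as the Unicode characters U+FB05 and U+FB06. The ct ligature is not handled here.

```python
# Hypothetical helper, not taken from the FreEM norm repository.
# Assumption: historic ligatures appear as the precomposed Unicode
# characters U+FB05 and U+FB06.
HISTORIC_LIGATURES = {
    "\ufb05": "\u017ft",  # long s + t ligature -> "ſt" (long s is kept)
    "\ufb06": "st",       # s + t ligature -> "st"
}

_TABLE = str.maketrans(HISTORIC_LIGATURES)


def decompose_historic_ligatures(text: str) -> str:
    """Split historic ligatures; leave long s, œ and æ unchanged."""
    return text.translate(_TABLE)
```

An explicit mapping is used rather than Unicode compatibility normalisation (NFKC), because NFKC would also replace ſ with s, which contradicts the convention of keeping the long s.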

Use the normaliser

[TO DO]

Contribute

If you want to contribute, you can clone the repository and send us a pull request, or send an email to simon.gabay[at]unige.ch.

Acknowledgments

Additional data and corrections have been provided by Philippe Gambette (GitHub) and Jonathan Poinhos.

Cite this repository

If you use the data:

@software{gabay_simon_2022_6481179,
  author    = {Gabay, Simon and Gambette, Philippe},
  title     = {{FreEM-corpora/FreEMnorm: FreEM norm Parallel (original vs. normalised) corpus for Early Modern French}},
  month     = jan,
  year      = 2022,
  note      = {If you use this software, please cite it as below.},
  publisher = {Zenodo},
  version   = {1.0.1},
  doi       = {10.5281/zenodo.6481179},
  url       = {https://doi.org/10.5281/zenodo.6481179}
}

You can additionally cite one of our latest publications:

@inproceedings{gabay:hal-02276150,
  TITLE = {{A Workflow For On The Fly Normalisation Of 17th c. French}},
  AUTHOR = {Gabay, Simon and Riguet, Marine and Barrault, Lo{\"i}c},
  URL = {https://hal.archives-ouvertes.fr/hal-02276150},
  BOOKTITLE = {{DH2019}},
  ADDRESS = {Utrecht, Netherlands},
  ORGANIZATION = {{ADHO}},
  YEAR = {2019},
  MONTH = Jul,
  KEYWORDS = {17th Century France ; Parallel corpus building},
  PDF = {https://hal.archives-ouvertes.fr/hal-02276150/file/DH2019_final.pdf},
  HAL_ID = {hal-02276150},
  HAL_VERSION = {v1},
}

@inproceedings{gabay:hal-02596669,
  TITLE = {{Traduction automatique pour la normalisation du fran{\c c}ais du XVII e si{\`e}cle}},
  AUTHOR = {Gabay, Simon and Barrault, Lo{\"i}c},
  URL = {https://hal.archives-ouvertes.fr/hal-02596669},
  BOOKTITLE = {{TALN 2020}},
  ADDRESS = {Nancy, France},
  ORGANIZATION = {{ATALA}},
  SERIES = {27{\`e}me Conf{\'e}rence sur le Traitement Automatique des Langues Naturelles},
  YEAR = {2020},
  MONTH = Jun,
  KEYWORDS = {Normalisation ; 17th c French ; Neural Machine Translation (NMT) ; Statistical Machine Translation (SMT) ; Digital humanities ; Humanit{\'e}s num{\'e}riques ; Fran{\c c}ais classique ; Traduction automatique neuronale ; Traduction automatique statistique},
  PDF = {https://hal.archives-ouvertes.fr/hal-02596669/file/main.pdf},
  HAL_ID = {hal-02596669},
  HAL_VERSION = {v1},
}

@inproceedings{gabay:hal-03596653,
  TITLE = {{Automatic Normalisation of Early Modern French}},
  AUTHOR = {Bawden, Rachel and Poinhos, Jonathan and Kogkitsidou, Eleni and Gambette, Philippe and Sagot, Beno{\^i}t and Gabay, Simon},
  URL = {https://hal.inria.fr/hal-03596653},
  BOOKTITLE = {{Proceedings of the 13th Language Resources and Evaluation Conference}},
  ADDRESS = {Marseille, France},
  ORGANIZATION = {{European Language Resources Association}},
  YEAR = {2022},
  MONTH = Jun,
  HAL_ID = {hal-03540226},
  HAL_VERSION = {v1},
}

Please keep me posted if you use this data!

Contact

simon.gabay[at]unige.ch

Licence

This work is licensed under a Creative Commons Attribution 4.0 International Licence.

Owner

  • Name: FreEM-corpora
  • Login: FreEM-corpora
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Gabay
    given-names: Simon
    orcid: https://orcid.org/0000-0001-9094-4475
  - family-names: Gambette
    given-names: Philippe
    orcid: https://orcid.org/0000-0001-7062-0262
title: "FreEM-corpora/FreEMnorm: FreEM norm Parallel (original vs. normalised) corpus for Early Modern French"
version: "1.0.1"
doi: "10.5281/zenodo.5865428"
license: cc-by-4.0
date-released: 2022-01-17

GitHub Events

Total
  • Watch event: 1
  • Push event: 11
Last Year
  • Watch event: 1
  • Push event: 11