corsican-stylometry
Stylometry analysis and topic modeling on a Corsican historical corpus
https://github.com/vincentsarbachpulicani/corsican-stylometry
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.2%) to scientific vocabulary
Repository
Stylometry analysis and topic modeling on a Corsican historical corpus
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
Stylometry and topic modeling in the Corsican language
Organisation of the repository
data -> XML files used for this research
previous_works -> folder containing the earlier thesis written by Vincent Sarbach-Pulicani on the subject
ressources -> folders with the scripts written for the study; only the most important have been kept
results -> results and visualizations of the topic modeling and stylometry analyses
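As an illustration of how the XML files in data could be turned into plain text for analysis, here is a minimal sketch using only the Python standard library. The function name `load_xml_texts` is hypothetical, and the sketch assumes nothing about the XML schema: it simply concatenates every text node in each file. The scripts in ressources may proceed differently.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def load_xml_texts(data_dir):
    """Return {file name: plain text} for every .xml file directly in data_dir.

    Hypothetical helper: the actual schema of the corpus files is not assumed.
    """
    texts = {}
    for path in sorted(Path(data_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # itertext() yields every text node, whatever the tag names are.
        raw = "".join(root.itertext())
        # Collapse the runs of whitespace left over from the markup layout.
        texts[path.name] = " ".join(raw.split())
    return texts
```

Note that this also keeps any text found in header or metadata elements; a real pipeline would likely restrict extraction to the elements that hold the article body.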
Description
With the emergence of nationalism in the 19th century came regionalist movements that asserted and claimed cultural particularities. Corsica fitted this dynamic well and even presented itself as favourable ground for the development of such ideas. The centralisation of the State around a strong capital, together with policies of assimilation directed at the indigenous populations on France's borders, led certain actors to defend these particularisms. It was in this context that the Corsican autonomist newspaper A Muvra was born in Paris in May 1920, under the impetus of Petru and Matteu Rocca. For almost 19 years, hundreds of authors contributed to this massive dialectal work. The aim of this dissertation is to carry out author profiling, i.e. to determine an author's style and the subjects they cover. To do this, we apply authorship attribution stylometry to texts published under pseudonyms, then complement these analyses with topic modelling, which identifies latent topics in a corpus of texts. The goal is to gain a better understanding of the complex sociology behind this rich and varied newspaper through computational methods.
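To make the idea of authorship attribution concrete, here is a minimal sketch of one classic stylometric measure, Burrows's Delta, written with the Python standard library only. It is an assumption chosen for illustration: the description above does not say which measure or tool the study uses. The names `relative_freqs` and `burrows_delta` and the whitespace tokenisation are likewise hypothetical simplifications.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in a text."""
    tokens = text.lower().split()  # naive whitespace tokenisation
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def burrows_delta(known, disputed, n_words=50):
    """Rank candidate authors by Burrows's Delta (lower = closer in style).

    known: {author: text of known authorship}
    disputed: text signed with a pseudonym
    """
    # Vocabulary: the most frequent words across all known texts.
    all_tokens = Counter()
    for text in known.values():
        all_tokens.update(text.lower().split())
    vocab = [w for w, _ in all_tokens.most_common(n_words)]

    profiles = {a: relative_freqs(t, vocab) for a, t in known.items()}
    # Mean and standard deviation of each word's frequency across authors.
    columns = list(zip(*profiles.values()))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against zero spread

    def zscores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, mus, sigmas)]

    target = zscores(relative_freqs(disputed, vocab))
    deltas = {}
    for author, freqs in profiles.items():
        z = zscores(freqs)
        # Delta: mean absolute difference between the two z-score profiles.
        deltas[author] = mean(abs(a - b) for a, b in zip(z, target))
    return sorted(deltas.items(), key=lambda kv: kv[1])
```

Calling `burrows_delta({"author_a": text_a, "author_b": text_b}, pseudonymous_text)` returns the candidates sorted from the closest to the most distant style. With short newspaper articles, such a ranking is only indicative and would need validation on texts of known authorship.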
Owner
- Name: Vincent Sarbach-Pulicani
- Login: vincentsarbachpulicani
- Kind: user
- Location: Paris - Corte
- Repositories: 2
- Profile: https://github.com/vincentsarbachpulicani
Digital Humanities student at @chartes. Italian and Corsican studies, late XIXth century and interwar period.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sarbach-Pulicani
    given-names: Vincent
title: "Corsican stylometry : ressources and dataset for corsican NLP"
version: 2.0.4
date-released: 2022-06-02