corsican-stylometry

Stylometry analysis and topic modeling on a Corsican historical corpus

https://github.com/vincentsarbachpulicani/corsican-stylometry

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: vincentsarbachpulicani
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 189 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 4
  • Releases: 0
Created about 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme, Citation

README.md

Stylometry and topic modeling in Corsican language

Organisation of the repository

data -> location of the XML files used for this research

previous_works -> folder containing the previous theses written by Vincent Sarbach-Pulicani on the subject

ressources -> folder containing the scripts written for the study (only the most important ones have been kept)

results -> results and visualizations of the topic modeling and stylometry
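A minimal sketch of how the XML files in the data folder might be turned into plain text before analysis. The README does not describe the XML schema, so this assumes either generic XML or TEI (in which case only the body is kept and the header metadata is skipped). The function load_corpus is hypothetical and is not one of the scripts in ressources.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: files may use the TEI namespace; generic XML also works.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def load_corpus(data_dir="data"):
    """Hypothetical helper: map each XML file's stem to its whitespace-normalised text."""
    corpus = {}
    for path in sorted(Path(data_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # If the file is TEI, keep only the transcribed <body>, not the <teiHeader>.
        body = root.find(f".//{TEI_NS}body")
        node = body if body is not None else root
        # itertext() yields every text fragment below the node, in document order.
        text = " ".join(node.itertext())
        corpus[path.stem] = " ".join(text.split())
    return corpus
```

Joining fragments with a space can split a word whose letters are wrapped in inline markup; if the files use such markup, joining with an empty string and handling spacing per element is the safer choice.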

Description

With the emergence of nationalism in the 19th century came regionalist movements that asserted and claimed cultural particularities. Corsica fitted well into this dynamic and even proved favourable ground for the development of such ideas. The centralisation of the State around a strong capital, together with its policies of assimilating the indigenous populations on France's borders, led certain actors to defend these particularisms. It was in this context that the Corsican autonomist newspaper A Muvra was founded in Paris in May 1920, at the initiative of Petru and Matteu Rocca. Over almost 19 years, hundreds of authors contributed to this massive dialectal work.

The aim of this dissertation is to carry out author profiling, i.e. to determine an author's style and the subjects they write about. To do this, we apply authorship attribution stylometry to texts signed with pseudonyms, and then complement these analyses with topic modelling, i.e. the identification of latent topics in a corpus of texts. The goal is to gain a better understanding, through computational methods, of the complex sociology behind this rich and varied newspaper.
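Authorship attribution can be illustrated with Burrows's Delta, a widely used stylometric distance. The README does not say which measure the study relies on, so Delta is an assumed, swapped-in technique here. Each text is reduced to the relative frequencies of the most frequent words, these are converted to z-scores, and Delta is the mean absolute difference of z-scores between the disputed text and each candidate author (lower means closer). The function burrows_delta and the toy strings are hypothetical, not the repository's code or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; in Python 3, \w also matches accented letters such as "ì".
    return re.findall(r"\w+", text.lower())


def relative_frequencies(tokens, vocabulary):
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocabulary]


def burrows_delta(candidates, disputed, n_mfw=100):
    """Hypothetical helper (not from the repository).

    candidates: dict mapping an author name to a sample of their writing.
    disputed: the text whose author is unknown (e.g. a pseudonymous article).
    Returns a dict mapping each author to Delta; lower means stylistically closer.
    """
    tokenized = {author: tokenize(text) for author, text in candidates.items()}
    # The n_mfw most frequent words across all candidate texts form the feature set.
    pooled = Counter()
    for tokens in tokenized.values():
        pooled.update(tokens)
    vocabulary = [w for w, _ in pooled.most_common(n_mfw)]
    if not vocabulary:
        raise ValueError("candidate texts contain no words")

    profiles = {a: relative_frequencies(t, vocabulary) for a, t in tokenized.items()}
    target = relative_frequencies(tokenize(disputed), vocabulary)

    # Mean and population standard deviation of each word's frequency across candidates.
    n = len(profiles)
    means = [sum(p[i] for p in profiles.values()) / n for i in range(len(vocabulary))]
    stds = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in profiles.values()) / n)
        for i in range(len(vocabulary))
    ]

    def z_scores(freqs):
        # Words with zero variance cannot discriminate between authors, so they score 0.
        return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(freqs, means, stds)]

    target_z = z_scores(target)
    return {
        author: sum(abs(x - y) for x, y in zip(z_scores(p), target_z)) / len(vocabulary)
        for author, p in profiles.items()
    }


# Toy example: two candidate "authors" and one disputed text.
deltas = burrows_delta(
    {"A": "u a di u a di u a e", "B": "e chì per e chì per e chì u"},
    "u a di u di a",
)
print(min(deltas, key=deltas.get))  # prints "A"
```

The disputed text is attributed to the candidate with the smallest Delta. With real articles, n_mfw is usually set much higher (often a few hundred words), and the means and standard deviations are better estimated from many text samples than from a single profile per author.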
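Topic modelling is commonly done with Latent Dirichlet Allocation (LDA); the README does not name the model used in the study, so LDA is an assumption here. The sketch fits LDA by collapsed Gibbs sampling in pure Python: each token's topic is resampled with probability proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V * beta), and the topic-word (phi) and document-topic (theta) distributions are read off the final counts. The functions lda_gibbs and top_words, their defaults, and the toy documents are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical helper: LDA fitted by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, phi, theta) where phi[k][v] is
    p(word v | topic k) and theta[d][k] is p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)

    # Count tables: n_dk (doc-topic), n_kw (topic-word), n_k (tokens per topic).
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * V for _ in topics]
    n_k = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in topics]
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


def top_words(vocab, phi, n=3):
    # The n highest-probability words of each topic.
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]] for row in phi]


# Toy documents (already tokenised); not taken from the corpus.
docs = [
    ["mare", "barca", "pesciu", "mare"],
    ["barca", "mare", "pesciu"],
    ["capra", "pastore", "muntagna"],
    ["pastore", "capra", "capra"],
]
vocab, phi, theta = lda_gibbs(docs, n_topics=2, n_iter=100, seed=1)
print(top_words(vocab, phi))
```

In a real setting each document would be a tokenised article, usually with stop words removed. For a corpus of this size, an established implementation such as the LDA models in gensim or scikit-learn would be far faster than this plain loop.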

Owner

  • Name: Vincent Sarbach-Pulicani
  • Login: vincentsarbachpulicani
  • Kind: user
  • Location: Paris - Corte

Digital Humanities student at @chartes. Italian and Corsican studies, late XIXth century and interwar period.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sarbach-Pulicani
    given-names: Vincent
title: "Corsican stylometry : ressources and dataset for corsican NLP"
version: 2.0.4
date-released: 2022-06-02
