colombian-congress-nlp

An analysis of who is targeted by disgust-related language in Colombian congressional sessions, using word embeddings.

https://github.com/alejandro-sarria-morales/colombian-congress-nlp

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.9%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alejandro-sarria-morales
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 7.03 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme Citation

README.md

Shades of disdain: an analysis of disgust in colombian congressional speech through word embeddings

This study investigates the use of disgust-related language in official sessions of the Colombian Senate, focusing on the current legislative period following the 2022 presidential elections. It examines whether senators' demographic characteristics (coalition membership, gender, and ethnicity) predict their association with disgust language. Using a corpus of official transcripts from 15 Senate sessions and Word2Vec semantic embeddings, the analysis identifies a significant semantic relationship between disgust-related terms and certain groups of senators. An ordinary least squares regression model incorporating these characteristics shows that coalition affiliation and gender significantly predict the association with disgust language, while ethnicity does not. Notably, female members of the government coalition are especially targeted. These findings provide evidence of the systematic use of disgust language against historically marginalized groups during Senate sessions. The implications of these results for the prevention of political violence, as well as the methodological contributions of this study, are discussed. This research deepens the understanding of political discourse dynamics and suggests the need for mechanisms to protect democratic discourse and vulnerable political actors. Future directions for work exploring this phenomenon, or adapting the proposed method to other areas, are laid out.
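
The abstract does not spell out how a senator's association with disgust language is computed from the embeddings. One common approach, shown here as a minimal sketch and not as the author's confirmed method, is the mean cosine similarity between a senator's name vector and the vectors of a disgust lexicon. The toy vectors, the senator tokens, and the Spanish disgust terms below are all hypothetical; in practice the vectors would come from a trained Word2Vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disgust_association(target, lexicon, vectors):
    """Mean cosine similarity between `target` and each lexicon word that has a vector."""
    sims = [cosine(vectors[target], vectors[w]) for w in lexicon if w in vectors]
    if not sims:
        raise ValueError("no lexicon word has a vector")
    return sum(sims) / len(sims)


# Hypothetical 3-dimensional vectors standing in for Word2Vec output.
toy_vectors = {
    "senator_a": [0.9, 0.1, 0.0],
    "senator_b": [0.0, 0.2, 0.9],
    "asco": [1.0, 0.0, 0.0],
    "repugnante": [0.8, 0.2, 0.0],
}
# Hypothetical lexicon; the last term has no vector and is skipped.
disgust_lexicon = ["asco", "repugnante", "nauseabundo"]

scores = {
    name: disgust_association(name, disgust_lexicon, toy_vectors)
    for name in ("senator_a", "senator_b")
}
```

A per-senator score of this kind is what could then serve as the dependent variable in the regression described above.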

Files in this repository

  • pdf to clean text.ipynb: converts PDF files to text files and cleans them
  • analyzer.ipynb: preprocesses the text, trains the word embeddings model, and runs the statistical analysis and some visualizations
  • data: folder with data for the project
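
The cleaning step performed by the first notebook is not described in detail here. The following is a minimal sketch of what cleaning text extracted from a PDF transcript might involve, assuming the usual extraction artifacts (form feeds at page breaks, words hyphenated across lines, stray page numbers). The function name `clean_extracted_text` and the sample string are hypothetical and are not taken from the notebook.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize text extracted from a PDF transcript."""
    # Form feeds mark page breaks in the output of many PDF extractors.
    text = raw.replace("\x0c", "\n")
    # Rejoin words hyphenated across a line break (this also joins genuine
    # hyphenated compounds that happen to break at the hyphen).
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain only digits, typically page numbers.
    text = re.sub(r"^\s*\d+\s*$", "", text, flags=re.MULTILINE)
    # Collapse all runs of whitespace, including newlines, into one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


sample = "La sesión del Sena-\ndo de la República\n\n12\n\x0ccontinúa   hoy."
cleaned = clean_extracted_text(sample)
print(cleaned)
```

Lowercasing at this stage is a design choice that keeps the embedding vocabulary small; it would need to be skipped if a later step relies on capitalization, for example named-entity recognition.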

External links

  • Overleaf project: https://www.overleaf.com/project/65ff0d77d77de566e3712366
  • Zotero library: https://www.zotero.org/groups/5461283/macs-3200-sarria-alejandro/library

Requirements

  • Python 3.11.5
  • spacy 3.7.2
  • pandas 2.0.3
  • requests 2.31.0
  • bs4 4.12.3
  • gensim 4.3.0
  • numpy 1.24.3
  • sklearn 1.4.2
  • matplotlib 3.7.2
  • seaborn 0.12.2
  • scipy 1.11.1
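
To compare an environment against the versions listed above, a small standard-library check can help. This is a sketch, assuming that `bs4` and `sklearn` in the list refer to the PyPI distributions `beautifulsoup4` and `scikit-learn` (the listed names are the import names). The `PINNED` mapping and `check_versions` helper are hypothetical and not part of the repository.

```python
from importlib import metadata

# Distribution names on PyPI mapped to the versions listed in the README.
PINNED = {
    "spacy": "3.7.2",
    "pandas": "2.0.3",
    "requests": "2.31.0",
    "beautifulsoup4": "4.12.3",  # listed as bs4 (assumed mapping)
    "gensim": "4.3.0",
    "numpy": "1.24.3",
    "scikit-learn": "1.4.2",  # listed as sklearn (assumed mapping)
    "matplotlib": "3.7.2",
    "seaborn": "0.12.2",
    "scipy": "1.11.1",
}


def check_versions(pinned):
    """Return {distribution: status}; status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for dist, expected in pinned.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "missing"
            continue
        if installed == expected:
            report[dist] = "ok"
        else:
            report[dist] = f"found {installed}, expected {expected}"
    return report


report = check_versions(PINNED)
for dist, status in report.items():
    print(f"{dist}: {status}")
```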

Replication materials

The datasets for replication are in the data folder of this repository. Analyzer_replicati.ipynb and reggresion models.qmd contain the code to replicate this project.
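
The regression code itself lives in the files named above and is not reproduced here. As a rough illustration of the ordinary least squares step mentioned in the abstract, the sketch below fits a model with an intercept using NumPy. The data are synthetic and generated from a known linear rule, the predictor names are hypothetical, and the helper `fit_ols` is not taken from the repository.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficients as [intercept, b1, b2, ...].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic rows, not real senators.
# Columns (hypothetical): government_coalition (0/1), female (0/1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1]]
# Outcome built from a known rule so the fit can be checked by eye.
y = [0.10 + 0.05 * g + 0.03 * f for g, f in X]

coef = fit_ols(X, y)
print(coef)
```

This returns point estimates only. Significance claims such as those in the abstract also require standard errors, which a dedicated statistics package would normally report.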

How to cite

Cite this Repository as:

Sarria-Morales, Alejandro, Shades of disdain: an analysis of disgust in colombian congressional speech through word embeddings (2024) GitHub Repository. https://github.com/alejandrosarria0296/colombian-congress-nlp/

Owner

  • Name: Alejandro Sarria-Morales
  • Login: alejandro-sarria-morales
  • Kind: user

MA student in Computational Social Science at the University of Chicago. Trying to understand Latin American politics through text.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Shades of disdain: an analysis of disgust in colombian
  congressional speech through word embeddings
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: Alejandro
    family-names: Sarria-Morales
    email: asarria@uchicago.edu
    affiliation: University of Chicago

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1