colombian-congress-nlp
An analysis of who the targets of disgust-related language are in Colombian congress sessions, using word embeddings.
https://github.com/alejandro-sarria-morales/colombian-congress-nlp
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.9%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Shades of disdain: an analysis of disgust in colombian congressional speech through word embeddings
This study investigates the use of disgust-related language in official sessions of the Colombian Senate, focusing on the current legislative period following the 2022 presidential elections. The research examines whether demographic characteristics of senators, such as coalition membership, gender, and ethnicity, predict their association with disgust language. Using a corpus of official transcripts from 15 Senate sessions and Word2Vec to create semantic embeddings, the analysis identifies a significant semantic relationship between disgust-related terms and certain groups of senators. An ordinary least squares regression model incorporating senators' demographic characteristics shows that coalition affiliation and gender significantly predict the association with disgust language, while ethnicity does not. Notably, female members of the government coalition are especially targeted. These findings provide evidence of systematic use of disgust language against historically marginalized groups during Senate sessions. The implications of these results for the prevention of political value, as well as the methodological contributions of this study, are discussed. This research deepens the understanding of political discourse dynamics and suggests the need for mechanisms to protect democratic discourse and vulnerable political actors. Future directions for work exploring this phenomenon, or adapting the proposed method to other areas, are laid out.
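As a rough illustration of what "association with disgust language" can mean in an embedding space, the sketch below scores a target word by its mean cosine similarity to a set of disgust-related terms. This is an assumption about the general technique, not the exact scoring used in the notebooks. The vectors, the two Spanish seed terms, and the names `senator_a` and `senator_b` are hypothetical toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def disgust_association(target_vec, disgust_vecs):
    """Mean cosine similarity between a target vector and the disgust-term vectors."""
    return sum(cosine(target_vec, d) for d in disgust_vecs) / len(disgust_vecs)

# Hypothetical 3-dimensional vectors; real Word2Vec vectors have many more dimensions.
toy_vectors = {
    "asco": [1.0, 0.0, 0.0],
    "repugnante": [0.0, 1.0, 0.0],
    "senator_a": [1.0, 1.0, 0.0],
    "senator_b": [0.0, 0.0, 1.0],
}
disgust_terms = ["asco", "repugnante"]  # hypothetical seed list
disgust_vecs = [toy_vectors[t] for t in disgust_terms]

scores = {
    name: disgust_association(toy_vectors[name], disgust_vecs)
    for name in ("senator_a", "senator_b")
}
print(scores)
```

With a trained gensim model, the toy dictionary would be replaced by lookups such as `model.wv[word]`, and the per-senator scores would then serve as the dependent variable in the regression.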
Files in this repository
- pdf to clean text.ipynb: converts PDF files to text files and cleans them
- analyzer.ipynb: preprocesses the text, trains the word embeddings model, and runs the statistical analysis and some visualizations
- data: folder with data for the project
External links
- Overleaf project: https://www.overleaf.com/project/65ff0d77d77de566e3712366
- Zotero library: https://www.zotero.org/groups/5461283/macs-3200-sarria-alejandro/library
Requirements
- Python 3.11.5
- spacy 3.7.2
- pandas 2.0.3
- requests 2.31.0
- beautifulsoup4 4.12.3 (imported as bs4)
- gensim 4.3.0
- numpy 1.24.3
- scikit-learn 1.4.2 (imported as sklearn)
- matplotlib 3.7.2
- seaborn 0.12.2
- scipy 1.11.1
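A quick way to confirm that an environment matches the versions listed above is to compare them against the installed distributions. The sketch below uses the standard library's `importlib.metadata`. It assumes that the entries for bs4 and sklearn correspond to the PyPI distributions `beautifulsoup4` and `scikit-learn`. The helper name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the Requirements list, keyed by PyPI distribution name.
# Assumption: "bs4" and "sklearn" map to beautifulsoup4 and scikit-learn.
PINNED = {
    "spacy": "3.7.2",
    "pandas": "2.0.3",
    "requests": "2.31.0",
    "beautifulsoup4": "4.12.3",
    "gensim": "4.3.0",
    "numpy": "1.24.3",
    "scikit-learn": "1.4.2",
    "matplotlib": "3.7.2",
    "seaborn": "0.12.2",
    "scipy": "1.11.1",
}

def check_versions(pinned, get_version=version):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # distribution is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at the listed version. The Python interpreter version itself (3.11.5) is not checked here.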
Replication materials
The datasets for replication are in the data folder of this repository. Analyzer_replicati.ipynb and reggresion models.qmd contain the code to replicate this project.
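For readers who want a feel for the regression step before opening the replication files, the following is a minimal sketch of ordinary least squares on dummy-coded predictors, written in plain Python. It is not the project's model specification: the predictor names, the four rows of data, and the helper `ols` are hypothetical, and the sketch returns coefficients only (no standard errors or significance tests).

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Assumes X'X is invertible (no perfectly collinear columns).
    """
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical toy data: columns are [intercept, government_coalition, female].
X = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
# Hypothetical association scores, constructed as 0.1 + 0.2*coalition + 0.3*female.
y = [0.1, 0.3, 0.4, 0.6]

coefficients = ols(X, y)
print(coefficients)
```

Because the toy outcome is an exact linear function of the predictors, the fit recovers the constructing coefficients up to floating-point error. Real data would leave residuals, and inference on the coefficients would need a statistics package.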
How to cite
Cite this repository as:
Sarria-Morales, Alejandro (2024). Shades of disdain: an analysis of disgust in colombian congressional speech through word embeddings. GitHub repository. https://github.com/alejandrosarria0296/colombian-congress-nlp/
Owner
- Name: Alejandro Sarria-Morales
- Login: alejandro-sarria-morales
- Kind: user
- Repositories: 1
- Profile: https://github.com/alejandro-sarria-morales
MA student in Computational Social Science at the University of Chicago. Trying to understand Latin American politics through text.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Shades of disdain: an analysis of disgust in colombian
  congressional speech through word embeddings
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: Alejandro
    family-names: Sarria-Morales
    email: asarria@uchicago.edu
    affiliation: University of Chicago
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1