emolex

https://github.com/marcoscardenasmancilla/emolex

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 8 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (5.5%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: marcoscardenasmancilla
License: agpl-3.0
Language: Python
Default Branch: main
Size: 59.6 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 1

Created 12 months ago · Last pushed 12 months ago

Metadata Files

Readme License Citation

EmoLex (ES)

Author / Autor: Dr. Marcos H. Cárdenas Mancilla
E-mail: marcoscardenasmancilla@gmail.com
Creation date / Fecha de creación: 2025-07-25
License / Licencia: AGPL V3
Copyright (c) 2025 Marcos Hugo Cárdenas Mancilla

Description

This Python script implements a Random Forest classifier to automatically assign Spanish words to affective-semantic subgroups based on psycholinguistic and emotional variables. It integrates unsupervised clustering results (sub-clusters) with supervised classification to improve scalability and accuracy in lexical profiling.

Key Features:

Data Input: Loads preprocessed data from long_format_sub-clustering.csv, containing affective ratings and sub-cluster labels.
Data Cleaning: Removes rows with missing values in predictor or target variables.
Label Encoding: Encodes string labels (if necessary) for classification.
Training/Testing Split: 80% training, 20% testing.
Model Training: Trains a RandomForestClassifier with 100 estimators.
Evaluation: Prints precision, recall, f1-score, and confusion matrix.
Model Export: Saves the trained model with a timestamp using joblib.
Visualization: Plots feature importances using matplotlib and seaborn.
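
The steps listed above can be sketched end to end as follows. This is a minimal illustration and not the repository's actual script: it runs on synthetic data instead of long_format_sub-clustering.csv, and the label names, the random_state values, and the rf_model_<timestamp>.joblib file name are assumptions made for the example. Feature importances are printed rather than plotted so the sketch needs only scikit-learn, NumPy, and joblib.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for the real CSV: 200 rows, 6 numeric predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Hypothetical string labels derived from the first feature so the task is learnable.
raw_labels = np.where(X[:, 0] > 0.5, "high", np.where(X[:, 0] < -0.5, "low", "mid"))

# Label encoding: map string labels to integers.
encoder = LabelEncoder()
y = encoder.fit_transform(raw_labels)

# 80% training / 20% testing (random_state is an assumption, for reproducibility).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Random Forest with 100 estimators.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation: precision, recall, f1-score, and the confusion matrix.
print(classification_report(y_test, y_pred, target_names=list(encoder.classes_)))
print(confusion_matrix(y_test, y_pred))

# Export with a timestamp in the file name (the naming pattern is hypothetical).
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
model_path = Path(tempfile.gettempdir()) / f"rf_model_{timestamp}.joblib"
joblib.dump(model, model_path)

# Feature importances, highest first.
for index in np.argsort(model.feature_importances_)[::-1]:
    print(f"feature_{index}: {model.feature_importances_[index]:.3f}")
```

A saved model can later be restored with joblib.load(model_path) and applied to new rows that have the same six columns in the same order.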

Predictors:

Valence_Mean
Arousal_Mean
Concreteness_Mean
Emotionality
Zipf_EsPal
Balanced_Integration_Score
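
As an illustration of how these predictor columns feed the cleaning step, the sketch below reads a small in-memory CSV and drops every row with a missing predictor or target value. It uses only the standard library and is not the repository's code. The target column name Sub_Cluster, the Word column, and all numeric values are made up for the example.

```python
import csv
import io

PREDICTORS = [
    "Valence_Mean",
    "Arousal_Mean",
    "Concreteness_Mean",
    "Emotionality",
    "Zipf_EsPal",
    "Balanced_Integration_Score",
]
TARGET = "Sub_Cluster"  # hypothetical name for the sub-cluster label column

# Toy data with invented values; two rows are deliberately incomplete.
SAMPLE_CSV = """Word,Valence_Mean,Arousal_Mean,Concreteness_Mean,Emotionality,Zipf_EsPal,Balanced_Integration_Score,Sub_Cluster
alegría,8.1,6.2,2.9,3.1,4.5,0.42,A
miedo,2.0,7.1,3.0,3.0,4.8,,B
mesa,5.1,2.3,6.5,0.1,5.0,-0.10,
calma,7.0,2.1,3.2,2.0,4.1,0.15,C
"""


def load_clean_rows(handle):
    """Return (features, labels), skipping rows with a missing predictor or target."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        values = [(row.get(name) or "").strip() for name in PREDICTORS]
        label = (row.get(TARGET) or "").strip()
        if not label or any(value == "" for value in values):
            continue  # incomplete row: drop it
        features.append([float(value) for value in values])
        labels.append(label)
    return features, labels


features, labels = load_clean_rows(io.StringIO(SAMPLE_CSV))
print(len(features), labels)  # 2 ['A', 'C']
```

The row for "miedo" is dropped because Balanced_Integration_Score is empty, and the row for "mesa" because its label is empty.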

Objective:

To automate and enhance the classification of emotional words in Spanish by leveraging machine learning techniques that combine quantitative, affective and psycholinguistic cues.

Descripción

Este script en Python implementa un clasificador Random Forest para asignar automáticamente palabras en español a subgrupos afectivo-semánticos, basándose en variables psicolingüísticas y emocionales. Integra resultados de clasificación no supervisada (subclústeres) con aprendizaje supervisado para mejorar la escalabilidad y precisión del perfilamiento léxico.

Características principales:

Entrada de datos: Carga el archivo long_format_sub-clustering.csv con etiquetas de subagrupamiento y puntuaciones afectivas.
Limpieza: Elimina filas con valores faltantes en predictores o variable objetivo.
Codificación de etiquetas: Convierte etiquetas no numéricas en enteros si es necesario.
División del conjunto: 80% entrenamiento, 20% prueba.
Entrenamiento del modelo: Utiliza RandomForestClassifier con 100 árboles.
Evaluación: Imprime métricas de precisión, recall, f1-score y matriz de confusión.
Exportación del modelo: Guarda el modelo entrenado con joblib y timestamp.
Visualización: Grafica la importancia de los atributos predictivos con matplotlib y seaborn.

Predictores utilizados:

Valence_Mean
Arousal_Mean
Concreteness_Mean
Emotionality
Zipf_EsPal
Balanced_Integration_Score

Objetivo:

Automatizar y mejorar la clasificación de palabras emocionales en español utilizando técnicas de aprendizaje automático que combinan información cuantitativa, afectiva y psicolingüística.

How to cite this repository / Cómo citar este repositorio

Cárdenas-Mancilla, M. H. (2025). EmoLex: A Random Forest classifier for emotional lexica in Spanish (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.16467496

Web App

https://marcoscardenasmancilla.github.io/EmoLex/

References / Referencias

Liesefeld, H. R., & Janczyk, M. (2019). Combining speed and accuracy to control for speed–accuracy trade-offs. Behavior Research Methods, 51(1), 40–60. https://doi.org/10.3758/s13428-018-1076-x
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pérez-Sánchez, M. Á., Stadthagen-Gonzalez, H., Guasch, M., Hinojosa, J. A., Fraga, I., Marín, J., & Ferré, P. (2021). EmoPro: Emotional prototypicality for 1,286 Spanish words: Relationships with affective and psycholinguistic variables. Behavior Research Methods, 53(5), 1857–1875. https://doi.org/10.3758/s13428-020-01519-9
Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191–1207. https://doi.org/10.3758/s13428-012-0314-x

Cross-validation output log / Registro de salida de validación cruzada

[Image: cross-validation output log]

Owner

Name: Marcos H. Cárdenas-Mancilla
Login: marcoscardenasmancilla
Kind: user

Repositories: 2
Profile: https://github.com/marcoscardenasmancilla

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Cárdenas-Mancilla"
  given-names: "Marcos Hugo"
  orcid: "https://orcid.org/0000-0002-6942-6232"
title: "EmoLex: A Random Forest classifier for emotional lexica in Spanish"
version: 1.0.0
doi: 10.5281/zenodo.16467496
date-released: 2025-07-25
url: "https://github.com/marcoscardenasmancilla/EmoLex"
