Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: marcoscardenasmancilla
  • License: agpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 59.6 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

EmoLex (ES)

Author / Autor: Dr. Marcos H. Cárdenas Mancilla
E-mail: marcoscardenasmancilla@gmail.com
Creation date / Fecha de creación: 2025-07-25
License / Licencia: AGPL V3
Copyright (c) 2025 Marcos Hugo Cárdenas Mancilla


Description

This Python script implements a Random Forest classifier to automatically assign Spanish words to affective-semantic subgroups based on psycholinguistic and emotional variables. It integrates unsupervised clustering results (sub-clusters) with supervised classification to improve scalability and accuracy in lexical profiling.

Key Features:

  1. Data Input: Loads preprocessed data from long_format_sub-clustering.csv, containing affective ratings and sub-cluster labels.
  2. Data Cleaning: Removes rows with missing values in predictor or target variables.
  3. Label Encoding: Encodes string labels (if necessary) for classification.
  4. Training/Testing Split: 80% training, 20% testing.
  5. Model Training: Trains a RandomForestClassifier with 100 estimators.
  6. Evaluation: Prints precision, recall, f1-score, and confusion matrix.
  7. Model Export: Saves the trained model with a timestamp using joblib.
  8. Visualization: Plots feature importances using matplotlib and seaborn.
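
Steps 1 to 6 above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the target column name `Sub_Cluster`, the fixed random seed, the stratified split, and the synthetic demo frame are all assumptions made here so the example runs without `long_format_sub-clustering.csv`. With the real file, the frame would presumably come from `pd.read_csv("long_format_sub-clustering.csv")`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

PREDICTORS = [
    "Valence_Mean", "Arousal_Mean", "Concreteness_Mean",
    "Emotionality", "Zipf_EsPal", "Balanced_Integration_Score",
]
TARGET = "Sub_Cluster"  # hypothetical name; the README does not state the label column


def train_subcluster_classifier(df, predictors=PREDICTORS, target=TARGET, seed=42):
    """Clean, encode, split 80/20, fit a 100-tree Random Forest, and evaluate it."""
    # Step 2: drop rows with a missing predictor or a missing label.
    clean = df.dropna(subset=list(predictors) + [target])
    X = clean[list(predictors)]

    # Step 3: map string labels to integers.
    encoder = LabelEncoder()
    y = encoder.fit_transform(clean[target])

    # Step 4: 80% train, 20% test (stratification is an assumption of this sketch).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )

    # Step 5: Random Forest with 100 estimators.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)

    # Step 6: precision, recall, f1-score, and the confusion matrix.
    y_pred = model.predict(X_test)
    labels = np.arange(len(encoder.classes_))
    report = classification_report(
        y_test, y_pred, labels=labels,
        target_names=[str(c) for c in encoder.classes_], zero_division=0,
    )
    matrix = confusion_matrix(y_test, y_pred, labels=labels)
    return model, encoder, report, matrix


def make_demo_frame(n_rows=200, seed=0):
    """Build synthetic stand-in data; it is NOT the EmoLex dataset."""
    rng = np.random.default_rng(seed)
    demo = pd.DataFrame(rng.normal(size=(n_rows, len(PREDICTORS))), columns=PREDICTORS)
    demo[TARGET] = np.where(demo["Valence_Mean"] > 0, "positive", "negative")
    demo.loc[0, "Arousal_Mean"] = np.nan  # one incomplete row to exercise the cleaning step
    return demo


model, encoder, report, matrix = train_subcluster_classifier(make_demo_frame())
print(report)
print(matrix)
```

The function returns the fitted encoder alongside the model so that integer predictions can be mapped back to sub-cluster names with `encoder.inverse_transform`.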

Predictors:

  • Valence_Mean
  • Arousal_Mean
  • Concreteness_Mean
  • Emotionality
  • Zipf_EsPal
  • Balanced_Integration_Score
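
The predictors listed above are also the features whose importances the script visualizes, and the fitted model is exported with a timestamp. A hedged sketch of both steps follows; the file prefix `emolex_rf`, the timestamp format, and the helper names are hypothetical choices for illustration, not taken from the repository.

```python
from datetime import datetime
from pathlib import Path

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def save_with_timestamp(model, directory=".", prefix="emolex_rf"):
    """Save a fitted model as <prefix>_<YYYYmmdd_HHMMSS>.joblib and return its path."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(directory) / f"{prefix}_{stamp}.joblib"
    joblib.dump(model, path)
    return path


def rank_importances(model, feature_names):
    """Return the model's impurity-based feature importances, largest first."""
    importances = pd.Series(model.feature_importances_, index=list(feature_names))
    return importances.sort_values(ascending=False)
```

A saved model can be restored with `joblib.load(path)`. The ranked series can be passed to a bar chart, for example `seaborn.barplot(x=ranked.values, y=ranked.index)`, to obtain a feature-importance plot of the kind the README describes.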

Objective:

To automate and enhance the classification of emotional words in Spanish by leveraging machine learning techniques that combine quantitative, affective and psycholinguistic cues.


Descripción

Este script en Python implementa un clasificador Random Forest para asignar automáticamente palabras en español a subgrupos afectivo-semánticos, basándose en variables psicolingüísticas y emocionales. Integra resultados de clasificación no supervisada (subclústeres) con aprendizaje supervisado para mejorar la escalabilidad y precisión del perfilamiento léxico.

Características principales:

  1. Entrada de datos: Carga el archivo long_format_sub-clustering.csv con etiquetas de subagrupamiento y puntuaciones afectivas.
  2. Limpieza: Elimina filas con valores faltantes en predictores o variable objetivo.
  3. Codificación de etiquetas: Convierte etiquetas no numéricas en enteros si es necesario.
  4. División del conjunto: 80% entrenamiento, 20% prueba.
  5. Entrenamiento del modelo: Utiliza RandomForestClassifier con 100 árboles.
  6. Evaluación: Imprime métricas de precisión, recall, f1-score y matriz de confusión.
  7. Exportación del modelo: Guarda el modelo entrenado con joblib y timestamp.
  8. Visualización: Grafica la importancia de los atributos predictivos con matplotlib y seaborn.

Predictores utilizados:

  • Valence_Mean
  • Arousal_Mean
  • Concreteness_Mean
  • Emotionality
  • Zipf_EsPal
  • Balanced_Integration_Score

Objetivo:

Automatizar y mejorar la clasificación de palabras emocionales en español utilizando técnicas de aprendizaje automático que combinan información cuantitativa, afectiva y psicolingüística.


How to cite this repository / Cómo citar este repositorio

  • Cárdenas-Mancilla, M. H. (2025). EmoLex: A Random Forest classifier for emotional lexica in Spanish (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.16467496

Web App

https://marcoscardenasmancilla.github.io/EmoLex/


References / Referencias

  • Liesefeld, H. R., & Janczyk, M. (2019). Combining speed and accuracy to control for speed–accuracy trade-offs. Behavior Research Methods, 51(1), 40–60. https://doi.org/10.3758/s13428-018-1076-x
  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  • Pérez-Sánchez, M. Á., Stadthagen-Gonzalez, H., Guasch, M., Hinojosa, J. A., Fraga, I., Marín, J., & Ferré, P. (2021). EmoPro: Emotional prototypicality for 1,286 Spanish words: Relationships with affective and psycholinguistic variables. Behavior Research Methods, 53(5), 1857–1875. https://doi.org/10.3758/s13428-020-01519-9
  • Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191–1207. https://doi.org/10.3758/s13428-012-0314-x

Cross-validation output log / Registro de salida de validación cruzada

[Image: cross-validation output log]

Owner

  • Name: Marcos H. Cárdenas-Mancilla
  • Login: marcoscardenasmancilla
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Cárdenas-Mancilla"
  given-names: "Marcos Hugo"
  orcid: "https://orcid.org/0000-0002-6942-6232"
title: "EmoLex: A Random Forest classifier for emotional lexica in Spanish"
version: 1.0.0
doi:  10.5281/zenodo.16467496
date-released: 2025-07-25
url: "https://github.com/marcoscardenasmancilla/EmoLex"

GitHub Events

Total
  • Delete event: 1
  • Push event: 32
  • Pull request event: 2
Last Year
  • Delete event: 1
  • Push event: 32
  • Pull request event: 2