boardgames-aspect-extraction

Project "What do you like in boardgames?" of NLP Unimi 2023/2024

https://github.com/ubriacopo/boardgames-aspect-extraction

Keywords

aspect-extraction board-game nlp

Repository

Basic Info
  • Host: GitHub
  • Owner: Ubriacopo
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 155 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 0
Created about 2 years ago · Last pushed over 1 year ago

readme.md

Project #4 of the NLP: What do you like in boardgames?

For this project I chose proposal number 4, as I have a personal interest in its domain: boardgames.

All proposals are present in the repository under resources/

How to run

To run the solution, first install the dependencies listed in requirements.txt.
Then install the required spaCy models:

python -m spacy download en_core_web_md
python -m spacy download en_core_web_sm
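
Before opening the notebooks, it can help to confirm that both models are importable. The sketch below is not part of the repository: it assumes that `spacy download` installs each model as a regular Python package, and the helper name `missing_models` is hypothetical.

```python
from importlib.util import find_spec

# Model packages requested by the install commands above.
REQUIRED_MODELS = ["en_core_web_md", "en_core_web_sm"]


def missing_models(names):
    """Return the names that cannot be found as installed Python packages."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models(REQUIRED_MODELS)
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All spaCy models are installed.")
```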

No standalone script is provided, as I found notebooks better suited to guiding the thought process.

First, run main/dataset/bgg_corpus_service.ipynb to build the dataset, or download it directly from: https://www.kaggle.com/datasets/jacopofichera/bgg-scrapped-reviews

To run preprocessing, open main/dataset/pre_processing.ipynb. It generates several pre-processed datasets from the original one.

For LDA, run main/lda/final_model.ipynb to launch training with the best hyperparameter configuration found. The model is saved under \output in the same directory as a Gensim LdaMulticore instance that can be reloaded.
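
Reloading the saved model outside the notebook could look like the sketch below. It is an illustration, not code from the repository: the file name `lda_model` is hypothetical (use whatever the notebook wrote to its output folder), and `load_lda` is a helper made up for this example. `LdaMulticore.load` and `print_topics` are the standard Gensim calls.

```python
from pathlib import Path


def load_lda(model_path):
    """Reload a saved Gensim LdaMulticore model from disk."""
    path = Path(model_path)
    if not path.exists():
        raise FileNotFoundError(f"No saved model at {path}")
    # Imported here so the path check above runs before Gensim is loaded.
    from gensim.models import LdaMulticore
    return LdaMulticore.load(str(path))


if __name__ == "__main__":
    # Hypothetical file name inside the notebook's output folder.
    lda = load_lda(Path("main/lda/output") / "lda_model")
    # Print the ten highest-weighted words of every topic.
    for topic_id, words in lda.print_topics(num_topics=-1, num_words=10):
        print(topic_id, words)
```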

ABAE works the same way, but in the abae folder. It produces more files: one for the word embeddings model, one for the initial aspect weight matrix (before training), and one for the Keras model. To load and manipulate the model, refer to the ABAEManager class, which provides methods depending on the output needed (classification or loss evaluation).

Inference is left to be done by hand, but using class #todo you can save it as part of the model output definition, to be reloaded and used with the #todo class to infer the correct labels.

References

My reference paper I think:

Paper: https://aclanthology.org/P17-1036.pdf
Repo: https://github.com/ruidan/Unsupervised-Aspect-Extraction/blob/master/code/train.py

Another useful reference for an in-depth application:

https://www.kaggle.com/code/nkitgupta/aspect-based-sentiment-analysis
It explains the whole process well, with nice insights on emoji and Unicode normalization.

Approach?

In an unsupervised paradigm for aspect extraction, you don't rely on labeled data. Instead, you can use clustering and topic modeling techniques to identify and extract aspects. Here's how you can approach it:

Data Collection and Preprocessing:
    Collect Data: Gather a large corpus of text related to your domain.
    Preprocess Text: Tokenize the text, remove stop words, and perform other cleaning steps.
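
The preprocessing step can be sketched with the standard library alone. This is a simplified illustration, not the pipeline in pre_processing.ipynb: the stop-word list is a tiny made-up sample and `preprocess` is a hypothetical helper.

```python
import re

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "the", "is", "it", "of", "to", "but", "this"}


def preprocess(text):
    """Lowercase, tokenize on letter runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("The artwork is great, but the rules are confusing."))
# ['artwork', 'great', 'rules', 'confusing']
```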

Text Representation:
    Word Embeddings: Use pre-trained embeddings like Word2Vec, GloVe, or contextual embeddings like BERT embeddings.
    Document Embeddings: Represent each document as a vector, for instance by averaging word embeddings or using sentence embeddings from models like Sentence-BERT.
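
Averaging word embeddings into a document vector can be sketched as follows. The two-dimensional vectors are made up for illustration, and `average_embedding` is a hypothetical helper; real vectors would come from Word2Vec, GloVe, or a similar model.

```python
def average_embedding(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Toy 2-dimensional "embeddings", made up for illustration only.
toy_vectors = {"artwork": [1.0, 0.0], "rules": [0.0, 1.0]}
print(average_embedding(["artwork", "rules", "meeple"], toy_vectors))  # [0.5, 0.5]
```

Tokens without a vector ("meeple" here) are skipped, which is one common way to handle out-of-vocabulary words.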

Aspect Extraction Techniques: ABAE, LDA
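
For ABAE, the attention step described in the reference paper linked under References can be sketched in plain Python. This is a toy illustration under stated assumptions, not the repository's Keras model or its ABAEManager class: the matrix `M` is learned in the real model (an identity matrix stands in for it here), and the aspect reconstruction and training loss are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_sentence_embedding(word_vecs, M):
    """Weight each word by how well it fits the sentence average."""
    n, dim = len(word_vecs), len(word_vecs[0])
    # y_s: plain average of the word vectors (global sentence context).
    y_s = [sum(vec[i] for vec in word_vecs) / n for i in range(dim)]
    # Compute M y_s once, then d_i = e_i^T (M y_s) for every word i.
    My = [dot(row, y_s) for row in M]
    weights = softmax([dot(vec, My) for vec in word_vecs])
    # z_s: attention-weighted sum of the word vectors.
    z_s = [sum(w * vec[i] for w, vec in zip(weights, word_vecs)) for i in range(dim)]
    return weights, z_s


# Toy inputs: three 2-d word vectors and an identity matrix standing in for M.
words = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, z_s = attention_sentence_embedding(words, identity)
print(weights, z_s)
```

In this toy input the two identical words agree with the sentence average more than the third one does, so they receive the larger attention weights.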

Project Setup and Installation

python -m spacy download en_core_web_trf

Owner

  • Name: Jacopo Fichera
  • Login: Ubriacopo
  • Kind: user
  • Location: Bergamo, Italy

SW dev @ Team Quality Srl / CS Major Student @ UNIMI

Dependencies

requirements.txt pypi
  • jupyter *
  • pandas *