fareviews

Formal Argumentation Analysis of Product Reviews

https://github.com/davideceolin/fareviews

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: davideceolin
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Size: 44.6 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 2
  • Open Issues: 4
  • Releases: 0
Created about 6 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Citation

README.md


FAReviews

Prerequisites

FAReviews uses Python 3 for argumentation mining and Prolog (SWI-Prolog) for argument reasoning. The required Prolog scripts are in the "argue" folder.

To install the required Python 3 packages use:

    pip3 install -r requirements.txt
    pip3 install --upgrade spacy
    pip3 install pytextrank

Download the pre-trained word vectors trained on part of the Google News dataset (about 100 billion words), GoogleNews-vectors-negative300.bin.gz, and save the file in the FAREVIEWS folder.

Download the Amazon reviews data set:

    wget -c "http://deepyeti.ucsd.edu/jianmo/amazon/categoryFilesSmall/AMAZON_FASHION_5.json.gz"

Install the spaCy en_core_web_md pipeline and the NLTK stopwords:

    python -m spacy download en_core_web_md
    python -m nltk.downloader stopwords
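Before running the pipeline, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of FAReviews: the function name `check_prerequisites` and the assumption that the vectors file sits directly in the folder passed as `project_dir` are illustrative.

```python
import importlib.util
import shutil
from pathlib import Path

# Importable module names for the packages and model named in the README.
REQUIRED_MODULES = ["spacy", "pytextrank", "nltk", "gensim", "en_core_web_md"]
# File name of the pre-trained vectors as given in the README.
VECTORS_FILE = "GoogleNews-vectors-negative300.bin.gz"


def check_prerequisites(project_dir="."):
    """Return a dict mapping each prerequisite to True (found) or False.

    Hypothetical helper: it only checks presence, not versions.
    """
    status = {}
    for name in REQUIRED_MODULES:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    # SWI-Prolog is started with the `swipl` command.
    status["swipl"] = shutil.which("swipl") is not None
    # Assumption: the vectors file is stored directly in project_dir.
    status[VECTORS_FILE] = (Path(project_dir) / VECTORS_FILE).is_file()
    return status


if __name__ == "__main__":
    for item, found in check_prerequisites().items():
        print(f"{'ok' if found else 'MISSING':8} {item}")
```

Run from the folder that contains the vectors file, the script lists each prerequisite and marks any that were not found.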

Run method 1:

Argument Mining

Perform feature extraction:

    python3 ./utils/compute_scores.py

The script asks for the number of jobs, chunks, batch size, textrank threshold, and output folder. It creates the following files in the output folder:

    [datafile name]_prods.pkl
    [datafile name]_reviews.csv

Create the matrix with distance metrics:

    python3 ./utils/graph_creation.py

The script asks for the csv file with review data ([datafile name]_reviews.csv), the pkl file with the product list ([datafile name]_prods.pkl), the number of cores to use, and the output folder. It creates the following file in the output folder:

    [datafile name]_prods_mc.pkl
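Because each step consumes files written by the previous one, a small helper that builds the expected file names can make it easier to check that a step has finished. This is only a sketch: `mining_outputs` is a hypothetical name, and how the scripts derive "[datafile name]" from the input file is not stated here, so the caller passes it explicitly.

```python
from pathlib import Path


def mining_outputs(datafile_name, output_dir="Output"):
    """Build the paths of the files the argument mining steps are described as writing.

    `datafile_name` stands for the "[datafile name]" placeholder used above.
    """
    out = Path(output_dir)
    return {
        "products": out / f"{datafile_name}_prods.pkl",
        "reviews": out / f"{datafile_name}_reviews.csv",
        "products_with_matrices": out / f"{datafile_name}_prods_mc.pkl",
    }


if __name__ == "__main__":
    paths = mining_outputs("mydata", "MyOutputFolder")
    missing = [str(p) for p in paths.values() if not p.exists()]
    print("missing files:", missing or "none")
```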

Argument Reasoning

Download the argue folder, then run the following to start the server:

    cd argue
    swipl server.pl
    ?- server(3333).

While the server is running, solve the argumentation graph:

    python3 ./utils/graph_solver.py

The script asks for the csv file with review data ([datafile name]_reviews.csv), the pkl file with the product list, matrices, and clusters ([datafile name]_prods_mc.pkl), the number of cores to use, the output folder, and whether to save figures of the created graphs as png files. It creates the following files in the output folder:

    [datafile name]_reviews_results.csv
    [product asin].png
    [product asin_labels].png
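Since the solver depends on the Prolog server, a quick reachability check before starting a long run can save time. The sketch below assumes the server accepts TCP connections on localhost at the port passed to `server(3333)`; the host and the function name `server_is_up` are assumptions, not part of FAReviews.

```python
import socket


def server_is_up(host="localhost", port=3333, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    if server_is_up():
        print("Prolog server reachable on port 3333")
    else:
        print("No server on port 3333; start it with swipl first")
```

The check only confirms that something is listening on the port. It does not verify that the listener is the argue server.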

Run method 2:

The three scripts described above can be run sequentially using FAReviews.py, which lets the user provide the input data and several input parameters. Because FAReviews.py also runs the argument reasoning part, make sure the Prolog server is running before you start it.
FAReviews.py accepts the following arguments (only -f is required; the others are optional):

  • -f: Provide the location of the input data file (csv expected). Required argument.
  • -nc: Number of cores to use for the various processes that are run in parallel. Default = 8.
  • -cs: Chunk size used in compute scores. Default = 100.
  • -bs: Batch size used in compute scores. Default = 20.
  • -trt: Minimum textrank score threshold for the tokens to be used. Tokens with a textrank score below the threshold are not used. The threshold is used in compute scores, and the resulting output is passed to the scripts that follow. Default = 0.0.
  • -sn: Name of the output folder (within the current folder) where you want to save the output. If it does not yet exist, it will be created. Default is Output.
  • -si: True/False. If true, also save the output of compute scores and rungraph. If false, only the output of graphcreation_3 is saved. Default is False.
  • -sf: True/False. Option to save the constructed graphs to png per product. Default is False.

Run the script, for example, as follows after you have started the Prolog server:

    python3 FAReviews.py -f Data/mydata.csv -nc 4 -cs 50 -bs 40 -trt 0.10 -sn MyOutputFolder
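To make the flags and defaults listed above concrete, here is a minimal `argparse` sketch that mirrors them. It is not the parser in FAReviews.py: the `dest` names, the `str2bool` helper, and the assumption that -si and -sf take a literal True/False value are illustrative.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' (case-insensitive) into a bool."""
    text = str(value).strip().lower()
    if text in ("true", "t", "yes", "1"):
        return True
    if text in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Build a parser with the documented flags and defaults (dest names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of the FAReviews.py options")
    parser.add_argument("-f", dest="file", required=True, help="input data file (csv expected)")
    parser.add_argument("-nc", dest="num_cores", type=int, default=8, help="number of cores")
    parser.add_argument("-cs", dest="chunk_size", type=int, default=100, help="chunk size")
    parser.add_argument("-bs", dest="batch_size", type=int, default=20, help="batch size")
    parser.add_argument("-trt", dest="textrank_threshold", type=float, default=0.0,
                        help="minimum textrank score for a token to be used")
    parser.add_argument("-sn", dest="savename", default="Output", help="output folder name")
    parser.add_argument("-si", dest="save_intermediate", type=str2bool, default=False,
                        help="also save intermediate output")
    parser.add_argument("-sf", dest="save_figures", type=str2bool, default=False,
                        help="save the constructed graphs as png")
    return parser


if __name__ == "__main__":
    example = "-f Data/mydata.csv -nc 4 -cs 50 -bs 40 -trt 0.10 -sn MyOutputFolder"
    print(build_parser().parse_args(example.split()))
```

Parsing the example command line leaves -si and -sf at their default of False, so only the final results would be saved.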

The script outputs the final results of graphcreation3 (plus intermediate results and graphs if the respective arguments are set) and prints which part it is currently working on and how long each finished part took to complete. Note that if you set a textrank threshold, tokens with a textrank score below the threshold are not used and therefore do not appear in any of the output files.
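The effect of the textrank threshold can be illustrated with a few lines of Python. This is a sketch of the filtering rule as described, not the code used in compute scores: the function name, the (token, score) pair layout, and the sample scores are made up for illustration.

```python
def filter_by_textrank(scored_tokens, threshold=0.0):
    """Keep (token, score) pairs whose score is at or above the threshold.

    Tokens scoring below the threshold are dropped, matching the rule that
    such tokens are not used in later steps.
    """
    return [(token, score) for token, score in scored_tokens if score >= threshold]


if __name__ == "__main__":
    # Hypothetical tokens and scores, for illustration only.
    sample = [("zipper", 0.21), ("the", 0.02), ("fabric", 0.10)]
    print(filter_by_textrank(sample, threshold=0.10))
```

With the default threshold of 0.0, every token with a non-negative score is kept, so the filter has no effect on such input.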

Contributing

If you want to contribute to the development of FAReviews, have a look at the contribution guidelines.

Owner

  • Name: Davide Ceolin
  • Login: davideceolin
  • Kind: user
  • Location: Amsterdam
  • Company: CWI

Citation (CITATION.cff)

# YAML 1.2
---
authors:
  -
    family-names: Ceolin
    given-names: Davide
    orcid: "https://orcid.org/0000-0002-3357-9130"
  -
    family-names: Ootes
    given-names: Laura
    orcid: "https://orcid.org/0000-0002-2800-8309"
cff-version: "1.2.0"
title: "FAReviews"
license: Apache-2.0
# date-released: 
# doi: 10.0000/FIXME
version: "0.1.0"
repository-code: "https://github.com/davideceolin/FAReviews"
keywords:
  - Argumentation reasoning
message: "If you use this software, please cite it using these metadata."


Dependencies

requirements.txt pypi
  • gensim *
  • networkx *
  • nltk *
  • numpy *
  • pandas *
  • pytextrank *
  • sklearn *
  • spacy *
  • spacy_readability *
.github/workflows/code-style.yml actions
  • actions/checkout v3 composite
.github/workflows/markdown-link-check.yml actions
  • actions/checkout main composite
  • gaurav-nelson/github-action-markdown-link-check v1 composite
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite