kpa-hierarchy

kpa_hierarchy code, to share code of key point analysis hierarchy.

https://github.com/ibm/kpa-hierarchy

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
✓ Academic publication links: arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: IBM
License: apache-2.0
Language: Python
Default Branch: main
Size: 751 KB

Statistics

Stars: 4
Watchers: 5
Forks: 1
Open Issues: 2
Releases: 0

Created over 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License Citation

kpa-hierarchy

Scope

This repository contains:

(1) The ThinkP dataset: a high-quality benchmark dataset of key point hierarchies (KPHs) for business and product reviews.
(2) NEW: Code for KPH construction and evaluation.

Data

Information about using the data can be found here.

Setup

To run this repo, you need a Python Anaconda environment with all the requirements installed:

    conda create --name kpa_hierarchy python=3.9
    conda activate kpa_hierarchy
    pip install -r requirements.txt

Creating Pairwise Scores:

    python create_pairwise_scores.py --output_path "./out/pairwise_scores.csv"

To generate pairwise scores for the ThinkP dataset, use the create_pairwise_scores.py script. It loads the ThinkP dataset and converts it to a dataframe with the predicted scores for all key point pairs in each topic. Replace lines 17-19 with your code for computing the pairwise scores: add a column to the dataframe with a unique method name and the scores for each row. The scores for the methods reported in the paper are in eval/pairwise_scores/all_pairwise_scores.csv.

Arguments:
- gold_path: path to the gold data jsonl (defaults to the path of ThinkP).
- output_path: path of the csv file in which to save the pairwise scores dataframe.
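As an illustration of the kind of code that could replace those lines, here is a minimal sketch of a custom scorer. The function name `token_overlap_score`, the method name `TokenOverlap`, and the row keys `topic`, `kp_a` and `kp_b` are assumptions made for this example; the actual column names are defined in create_pairwise_scores.py. Plain dictionaries stand in for the dataframe so the sketch stays self-contained.

```python
def token_overlap_score(kp_a: str, kp_b: str) -> float:
    """Toy directional score: fraction of kp_b's tokens that also occur in kp_a."""
    tokens_a = set(kp_a.lower().split())
    tokens_b = set(kp_b.lower().split())
    if not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_b)


# Hypothetical rows standing in for the dataframe built by the script.
rows = [
    {"topic": "t1", "kp_a": "The staff is very friendly", "kp_b": "Friendly staff"},
    {"topic": "t1", "kp_a": "The rooms are clean", "kp_b": "Friendly staff"},
]

# Add one "column" keyed by a unique method name, with one score per row.
for row in rows:
    row["TokenOverlap"] = token_overlap_score(row["kp_a"], row["kp_b"])
```

With a real dataframe, the same function would typically be applied row by row and the result assigned to a new column named after the method.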

Evaluating Pairwise scores:

    python eval_pairwise_scores.py --output_path "./out/eval_pairwise.csv" --methods APInc NLI BinInc KPA-Match NLI_BinInc_WL

This script evaluates the scores computed in the previous section, and outputs:
1. The Precision-Recall graphs per domain.
2. The AUC (for recall > 0.1) and the best F1 score for each domain, where the classification threshold is chosen using leave-one-topic-out.

Arguments:
- output_path: path of the output .png file. The table with the scores is saved to a csv file in the same path.
- pairwise_scores_file: path to a csv file with pairwise scores (defaults to our provided pairwise scores).
- methods: space-separated list of methods to evaluate, i.e. columns in the dataframe in pairwise_scores_file.
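To make the leave-one-topic-out idea concrete, here is a minimal sketch of one plausible reading: for each held-out topic, the threshold that maximizes F1 on the remaining topics is selected and then applied to the held-out topic. The function names, the `(score, label)` pair format and the toy data are assumptions for illustration; eval_pairwise_scores.py may aggregate differently.

```python
def f1_at_threshold(pairs, threshold):
    """F1 when predicting 'related' for score >= threshold; pairs are (score, label)."""
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_topic_out_f1(pairs_by_topic, thresholds):
    """For each held-out topic, tune the threshold on the other topics, then score it."""
    results = {}
    for held_out in pairs_by_topic:
        train = [p for t, ps in pairs_by_topic.items() if t != held_out for p in ps]
        best = max(thresholds, key=lambda th: f1_at_threshold(train, th))
        results[held_out] = (best, f1_at_threshold(pairs_by_topic[held_out], best))
    return results


# Toy data: topic id -> list of (predicted score, gold label).
pairs_by_topic = {
    "topic_a": [(0.9, 1), (0.6, 1), (0.4, 0), (0.2, 0)],
    "topic_b": [(0.8, 1), (0.7, 0), (0.3, 0), (0.55, 1)],
    "topic_c": [(0.95, 1), (0.5, 1), (0.45, 0), (0.1, 0)],
}
results = leave_one_topic_out_f1(pairs_by_topic, thresholds=[0.3, 0.5, 0.7])
```

Each entry of `results` holds the threshold chosen without looking at the held-out topic and the F1 obtained on that topic, so the reported score is not tuned on the data it is measured on.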

Constructing KPH

    python predict_kph.py --topic "AV6weBrZFFBfRGCbcRGO4g_neg" --viz --output_dir "./out/build_kph/" --pairwise_method "NLI_BinInc_WL" --tree_method "tncf"

Tree construction is performed in two steps: first, computing the pairwise scores for each pair of key points, and then using the pairwise scores to construct the hierarchy. The first step is described under Creating Pairwise Scores above and results in the pairwise scores dataframe. To run KPH construction from the pairwise scores, run the predict_kph.py script. This script constructs a single KPH for a given classification threshold and prints its evaluation measures against the gold data.

Arguments:
- gold_path: path to the gold data jsonl (defaults to the path of ThinkP).
- pairwise_scores_file: path to a csv file with pairwise scores (defaults to our provided pairwise scores).
- pairwise_methods: the pairwise method to use, i.e. a column in the dataframe in pairwise_scores_file (defaults to NLI_BinInc_WL).
- threshold: the decision threshold for counting two key points as related (default 0.5).
- topic: the (string) topic id of the business or product to build the KPH for.
- output_dir: path to the output directory, in which a directory is saved with the jsonl file of the hierarchy and a .txt file for visualization.
- viz: whether to create a user-friendly visualization of the generated tree.
- tree_methods: which KPH method to use for tree construction; must be a key in kph_method_to_predictor_class, the dictionary in predict_kph.py. The construction methods described in the paper are available.
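The second step, turning thresholded pairwise scores into a tree, can be sketched with a simple greedy heuristic. This is an illustration only and is not claimed to match any of the construction methods in the repository; the function name `build_hierarchy` and the `(child, parent) -> score` input format are assumptions.

```python
def build_hierarchy(scores, threshold=0.5):
    """Attach each key point to its best-scoring parent, skipping cycle-closing edges.

    scores maps (child, parent) -> score. Returns a dict child -> parent,
    with None for key points that remain roots.
    """
    nodes = sorted({kp for pair in scores for kp in pair})
    parent = {kp: None for kp in nodes}

    def creates_cycle(child, candidate):
        # Walk up from the candidate parent; reaching the child means a cycle.
        node = candidate
        while node is not None:
            if node == child:
                return True
            node = parent[node]
        return False

    # Consider the strongest edges first and stop once scores drop below the threshold.
    for (child, candidate), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break
        if parent[child] is None and not creates_cycle(child, candidate):
            parent[child] = candidate
    return parent


scores = {
    ("Friendly staff", "Good service"): 0.9,
    ("Fast check-in", "Good service"): 0.8,
    ("Good service", "Friendly staff"): 0.6,  # would close a cycle, so it is skipped
    ("Clean rooms", "Good service"): 0.2,     # below the threshold
}
tree = build_hierarchy(scores, threshold=0.5)
```

In this toy example "Good service" and "Clean rooms" stay roots, while the other two key points are attached under "Good service". Raising or lowering the threshold changes how many edges survive, which is why the evaluation sweeps over thresholds.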

Adding a new hierarchy construction method

KPH construction is done using a class that extends TreePredictor: its constructor receives a decision threshold and a dataframe which contains all the rows for a certain topic in the pairwise scores df, with the relevant pairwise scores column named "score". The class has a method called get_hierarchy that returns a KPH object. Both TreePredictor and KPH are documented in KPH.py. Once the class is ready, add an entry to kph_method_to_predictor_class with a unique name as key and the class name as value, and run predict_kph.py as explained in the previous section.
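A new method following this pattern might look roughly like the sketch below. The `TreePredictor` and `KPH` classes here are simplified stand-ins written for this example, not the real classes from KPH.py, and the row keys `child` and `parent` are hypothetical; only the "score" column name and the `get_hierarchy` method name come from the description above. Rows are plain dictionaries rather than a dataframe to keep the example self-contained.

```python
class KPH:
    """Stand-in for the KPH class in KPH.py (hypothetical, simplified)."""

    def __init__(self, edges):
        self.edges = list(edges)  # (child, parent) pairs


class TreePredictor:
    """Stand-in for the TreePredictor base class in KPH.py (hypothetical)."""

    def __init__(self, threshold, topic_rows):
        self.threshold = threshold
        self.topic_rows = topic_rows  # the real class receives a dataframe

    def get_hierarchy(self):
        raise NotImplementedError


class BestParentPredictor(TreePredictor):
    """Toy method: link each child to its single highest-scoring parent."""

    def get_hierarchy(self):
        best = {}
        for row in self.topic_rows:
            if row["score"] < self.threshold:
                continue
            child = row["child"]
            if child not in best or row["score"] > best[child]["score"]:
                best[child] = row
        return KPH((r["child"], r["parent"]) for r in best.values())


# Register the class under a unique name, mirroring the dictionary in predict_kph.py.
kph_method_to_predictor_class = {"best_parent": BestParentPredictor}

rows = [
    {"child": "A", "parent": "B", "score": 0.9},
    {"child": "A", "parent": "C", "score": 0.7},
    {"child": "C", "parent": "B", "score": 0.3},
]
predictor = kph_method_to_predictor_class["best_parent"](0.5, rows)
kph = predictor.get_hierarchy()
```

Note that this toy method does not guard against cycles; a real predictor would need to ensure that the result is a valid hierarchy.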

Evaluating KPH constructions

    python eval_kph.py --output_dir ./out/eval_kph --tree_methods reduced_tree greedy_local_score greedy_best_edge tncf --pairwise_methods NLI_BinInc_WL

This script first creates and saves all KPHs with all thresholds for all the combinations of the construction methods and pairwise methods. Then it performs the evaluation, computes the best f1_score (using leave-one-topic-out in each domain) and saves a visualization of the best tree for each combination of methods and topic.

Arguments:
- output_dir: required, directory in which to save trees and evaluation results. Previous evaluations in the same directory will be overwritten. The generated trees are saved during the run, so if the execution was terminated, or if you want to add more methods to the evaluation, you can use the same output directory and continue from where you left off.
- gold_path: path to the gold data jsonl (defaults to the path of ThinkP).
- pairwise_scores_file: path to a csv file with pairwise scores (defaults to our provided pairwise scores).
- tree_methods: space-separated list of KPH construction methods to evaluate; must be keys in kph_method_to_predictor_class (as explained in the previous section).
- pairwise_methods: space-separated list of pairwise methods to evaluate, i.e. columns in the dataframe in pairwise_scores_file.
- domains: list of domains to evaluate (by default, all domains are run).
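The resume behaviour described for output_dir can be pictured as a skip-if-already-saved loop over all combinations of methods and thresholds. The sketch below is an assumption about how such a loop could work, not the script's actual logic; the function name `pending_runs` and the file naming scheme are hypothetical.

```python
import itertools
import tempfile
from pathlib import Path


def pending_runs(output_dir, tree_methods, pairwise_methods, thresholds):
    """Yield (tree_method, pairwise_method, threshold, path) for trees not yet saved."""
    out = Path(output_dir)
    for tree_m, pair_m, th in itertools.product(tree_methods, pairwise_methods, thresholds):
        path = out / f"{tree_m}__{pair_m}__{th:.2f}.jsonl"  # hypothetical naming scheme
        if not path.exists():
            yield tree_m, pair_m, th, path


with tempfile.TemporaryDirectory() as tmp:
    grid = (["tncf", "reduced_tree"], ["NLI_BinInc_WL"], [0.25, 0.5, 0.75])
    first_pass = list(pending_runs(tmp, *grid))
    # Simulate an interrupted run that saved only the first two trees.
    for *_, path in first_pass[:2]:
        path.write_text("{}\n")
    # A second invocation with the same output directory only sees the remaining work.
    remaining = list(pending_runs(tmp, *grid))
```

Adding another tree method to the grid later would yield only that method's combinations, since everything already saved is skipped.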

Citing

If you are using ThinkP in a publication, please cite the following paper:

From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization
Arie Cattan, Lilach Eden, Yoav Kantor and Roy Bar-Haim.
ACL 2023.

Changelog

Major changes are documented here

Owner

Name: International Business Machines
Login: IBM
Kind: organization
Email: awesome@ibm.com
Location: United States of America

Website: https://www.ibm.com/opensource/
Twitter: ibmdeveloper
Repositories: 3,152
Profile: https://github.com/IBM

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this data or software, please cite the paper below."
title: "From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization"
authors:
- family-names: Cattan
  given-names: Arie
- family-names: Eden
  given-names: Lilach
- family-names: Kantor
  given-names: Yoav
- family-names: Bar-Haim
  given-names: Roy
version: 1.0.0
date-released: 2023-06-05
license: Apache-2.0
url: "https://arxiv.org/abs/2306.03853"
repository-code: "https://github.com/IBM/kpa-hierarchy"

GitHub Events

Total

Pull request event: 1
Create event: 1

Last Year

Pull request event: 1
Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 2
Total pull requests: 2
Average time to close issues: 2 months
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 0.5
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 2

Top Authors

Issue Authors

1mAlbert (1)
rudra0713 (1)

Pull Request Authors

renovate[bot] (2)

Dependencies

requirements.txt pypi

jsonlines *
matplotlib *
networkx *
numpy *
pandas *
scikit_learn *
tabulate *
torch *
tqdm *
