https://github.com/ighina/arp_score

Average Relative Proximity metrics and experiments used in the paper "When Cohesion Lies in the Embedding Space: New Framework and Methodologies for Embedding-Based Reference-Free Metrics for Topic Segmentation".

Last synced: 4 months ago · JSON representation

Repository

Average Relative Proximity metrics and experiments used in the paper "When Cohesion Lies in the Embedding Space: New Framework and Methodologies for Embedding-Based Reference-Free Metrics for Topic Segmentation".

Basic Info

Host: GitHub
Owner: Ighina
Language: Python
Default Branch: main
Size: 986 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme

README.md

Embedding-based Reference-Free Evaluation Metrics for Topic Segmentation

The repository includes the metrics and experiments used in the paper "When Cohesion Lies in the Embedding Space: New Framework and Methodologies for Embedding-Based Reference-Free Metrics for Topic Segmentation".

From this repository both the experiments presented in the paper and the embedding-based metrics on your own segmentations can be used.

Requirements

To start, create a conda environment

conda create -n refree_topseg

and install all the dependencies

pip install -r requirements.txt

Run the experiments

To run the experiments presented in the original paper, first you need to download the individual datasets in the data folder. All the datasets but SBBC-RadioNews are publicly available and can be found here:

Wikisection Datasets: https://github.com/sebastianarnold/WikiSection
QMSum Dataset: https://github.com/Yale-LILY/QMSum

Specifically, download all the json files from the Wikisection dataset into data/wikisection and clone the original QMSum repository into the data folder, as well. Ince done that, the experiments are executed by the main.py function as following:

main.py -d dataset_name -e model_name

Where datasetname should be one of "QMSUM", "wikisectionencity" or "wikisectionendisease", while modelname should be one of "roberta-base", "falcon" or "MPNET".

All the results will be stored as figures (for the synthetic experiments) in "outputfigures" folder and as tables (for the real world experiments) in the "outputcorrelations" folder.

Running Experiments With Falcon Embeddings

As Falcon is a very computationally expensive model, we have divided the process of extracting embeddings from the model in a separate script that should be run before using the main.py script with falcon as an encoder.

For each of the three datasets described above you should run then run the following:

extract_embeddings_with_falcon.py dataset_name

with datasetname being one of "wikisectionencity", "wikisectionen_disease" or "QMSUM".

Run Our Embedding-based metrics on your own segmentations

You can use the embedding-based scores for reference-free evaluation on your own segmentations as well.

To use ARP method, for example, you can use our ARP class by first instantiating:

``` from embedding_metrics import ARPMetric

metric = ARPMetric(score_function, encoder) ```

Where scorefunction is one of "pairwisecosine", "standarddeviation" or "averagecosine", as described in the paper, while encoder should be either a sentence-transformer model or a viable huggingface model (transformer encoders like BERT and RoBERTa are the only ones supported for now). In both cases, the encoder argument should be the name of the model as it appears on its official model card on huggingface hub. For example:

metric = ARPMetric(score_function="pairwise_cosine", encoder="all-mpnet-base-v2")

which is also the default configuration and, therefore, equivalent to: metric = ARPMetric()

At this point, we need a corpus and a segmentation for it in the following form: ``` corpus = [["document 1 sentence 1", "document 1 sentence 2 (end of topic segment)", "document 1 sentence 3"], ["document 2 sentence 1", "document 2 sentence 2"]]

segmentation = [[0, 1, 0], [0, 0]] ```

In the above minimal example, the corpus is a list of 2 documents each including sentences which are separated in a list. The segmentation, then it is also a list of lists, where each top-level list corresponds to the document having the same position in the corpus and includes a list of binary integers, 1 if the sentence at the given document and at the given index in the document ends a topic segment and 0 otherwise. This is also the same format described in the popular nltk library.

Having our corpus and hypothesised segmentation, then, we can obtain the ARP score for the corpus by running the following: ``` from embedding_metrics import ARPMetric

metric = ARPMetric() # using default settings

corpus = {your corpus}

segmentation = {your hypothesised segmentation}

ARPscore = metric.evaluatesegmentation(corpus, segmentation) ```

Additionally, we can pass custom embeddings instead of using our encoder to extract them in the following way: ARP_score = metric.evaluate_segmentation(corpus, segmentation, embeddings=custom_embeddings) Where customembeddings should be a list of numpy arrays (embeddings) corresponding to the same document-sentence structure (e.g. in the previous example it would be a list of 2 embedding arrays, one of size 3xembeddingdimension and the second of size 2xembedding_dimension).

Finally, if you want to return all the scores for each input document together with the aggregate average of such scores: aggregated_average_score, document_level_scores = metric.evaluate_segmentation(corpus, segmentation, output_all_scores=True)

To use SegReFree, just change the instantiation to: metric = SegReFreeMetric() And equally, for Silhouette: metric = SilhouetteMetric() More specific parameters for these two metrics can be found in the code. Use them as described for ARP metric.

Owner

Name: Iacopo Ghinassi
Login: Ighina
Kind: user
Location: London
Company: Queen Mary University of London

Repositories: 5
Profile: https://github.com/Ighina

PhD Candidate in Speech and Language Processing, passionate about Digital Humanities and building things from scratch.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/ighina/arp_score

Science Score: 13.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

Embedding-based Reference-Free Evaluation Metrics for Topic Segmentation

Requirements

Run the experiments

Running Experiments With Falcon Embeddings

Run Our Embedding-based metrics on your own segmentations

Owner

GitHub Events

Total

Last Year