https://github.com/apmoore1/example_augmentation

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: apmoore1
  • Language: Python
  • Default Branch: master
  • Size: 4.82 MB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 7 years ago · Last pushed almost 7 years ago
Metadata Files
Readme

README.md

Requirements

  1. python >= 3.6.1
  2. pip install -r requirements.txt
  3. To create the augmented data exactly as we did in the paper, you need the Stanford tokeniser. The open-source Bella package requires this tokeniser to run as a Docker service, so Docker is also required.
  4. Run the following to start the Stanford tokeniser as a service for the Bella package: `docker run -p 9000:9000 --rm mooreap/corenlp`
  5. The 300-dimensional, 840B-token GloVe vectors, available from the Stanford GloVe project page. We assume this file is located at ./embeddings/glove.840B.300d.txt

Getting the data and converting it into Train, Validation, and Test sets

Download the following datasets:

  1. The SemEval 2014 Restaurant and Laptop [1] train and test data. Put the training files (Laptop_Train_v2.xml and Restaurants_Train_v2.xml) in ./data, and do the same for the test files (Laptops_Test_Gold.xml and Restaurants_Test_Gold.xml).
  2. The Election dataset [2] from the following link. Extract all of the data into ./data/election; the election folder should then contain annotations.tar.gz, test_id.txt, train_id.txt, and tweets.tar.gz. Then extract both annotations.tar.gz and tweets.tar.gz.

Then run the following command to create the deterministic train, validation, and test splits, which will be stored in ./data/splits:

```bash
python generate_datasets.py ./data ./data/splits
```

This creates all of the splits used in the standard (non-augmented) experiments, which provide the baseline values for all of our augmentation experiments. It also prints some statistics for each split so that you can check the splits are relatively similar.

Create the augmented data

As stated in the paper, the data is augmented by exchanging the target of each sample in the dataset with each of its K (5) nearest words, creating a dataset up to five times larger than the original. It is only up to five times larger because the embedding does not contain all of the target words.

These augmented samples are then sub-sampled: we randomly sample from the augmented dataset until we have a dataset of the same size as the original. This random sub-sampling is repeated for each epoch so that the model sees more unique examples. An embedding is therefore required; we use a custom, domain-specific embedding for each dataset, which can be downloaded from the following links:

  1. Restaurant
  2. Laptop
  3. Twitter Election
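A minimal sketch of this augmentation and per-epoch sub-sampling, assuming a toy in-memory embedding (a dict from word to vector), cosine similarity as the similarity measure, and samples stored as (sentence, target) pairs where the target appears verbatim in the sentence. `nearest_targets`, `augment` and `epoch_sample` are hypothetical helpers, not functions from this repository.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_targets(target: str, embedding: Dict[str, Vector], k: int) -> List[str]:
    """The k words most similar to `target`, or [] if it is out of vocabulary."""
    if target not in embedding:
        return []  # why the augmented data is only *up to* k times larger
    scored = [(cosine(embedding[target], vec), word)
              for word, vec in embedding.items() if word != target]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


def augment(samples: List[Tuple[str, str]], embedding: Dict[str, Vector],
            k: int = 5) -> List[Tuple[str, str]]:
    """One new sample per neighbour, swapping the first occurrence of the target."""
    augmented = []
    for sentence, target in samples:
        for alt in nearest_targets(target, embedding, k):
            augmented.append((sentence.replace(target, alt, 1), alt))
    return augmented


def epoch_sample(augmented: List[Tuple[str, str]], size: int,
                 rng: random.Random) -> List[Tuple[str, str]]:
    """Draw a fresh subset the size of the original dataset for one epoch."""
    return rng.sample(augmented, min(size, len(augmented)))
```

Calling `epoch_sample` once per epoch with the same `rng` gives a different subset each time, which is how the model gets to see more unique examples. The naive string replacement would need a proper target span in real data (for example, "bar" also matches inside "bartender").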

All of the embeddings are trained with n-gram phrases of up to 6 grams, where phrases are detected using normalised pointwise mutual information (NPMI). This lets us find targets that are more than one word long, an idea that also comes from Mikolov. The embeddings were trained with the Word2Vec skip-gram algorithm: all words were tokenised with the Stanford tokeniser and lower-cased, phrases of up to 6 n-grams were found, and finally 300-dimensional vectors were created. None of these parameters were optimised (e.g. we did not tune the dimensionality to create more semantically meaningful embeddings); we used a window of 5, a minimum word count of 5, and 5 training epochs, which are basically the standard settings in Gensim. The Restaurant and Laptop embeddings were trained on the 2018 Yelp dataset and the Amazon electronics review dataset (related paper) respectively. The Twitter Election embeddings were trained on Tweets that we collected from the 2nd of February to the 8th of March 2018, where Tweets were collected if they mentioned one of a list of MP account names.
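For reference, the NPMI of a word pair (a, b) is ln(p(a,b) / (p(a)p(b))) divided by -ln p(a,b). It lies in [-1, 1] and reaches 1 when a and b only ever occur together, which is what makes it useful for spotting multi-word targets. A minimal standard-library sketch that scores adjacent word pairs this way; it illustrates the measure only and is not Gensim's implementation (Gensim's `Phrases` class offers an NPMI scorer via `scoring='npmi'`). The token-level probabilities and the `min_count` filter are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def npmi(count_a: int, count_b: int, count_ab: int, total: int) -> float:
    """Normalised PMI of a bigram from raw counts; ranges from -1 to 1."""
    p_a, p_b, p_ab = count_a / total, count_b / total, count_ab / total
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def score_bigrams(tokens: List[str], min_count: int = 1) -> Dict[Tuple[str, str], float]:
    """NPMI for every adjacent word pair seen at least `min_count` times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    return {
        pair: npmi(unigrams[pair[0]], unigrams[pair[1]], count, total)
        for pair, count in bigrams.items()
        if count >= min_count
    }


tokens = "the ice cream was good the ice cream was cold".split()
scores = score_bigrams(tokens)
print(round(scores[("ice", "cream")], 3))  # → 1.0, the pair always co-occurs
```

Pairs scoring above a chosen threshold would be merged into a single token (e.g. `ice_cream`). Repeating the pass over the merged text is one way to build longer phrases, such as the up-to-6-gram phrases described above.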

The effects of data augmentation

Restaurant and using the same targets in the training dataset

To get a list of target words, run the following:

```bash
python target_word_list.py data/splits/Restaurant\ Train target_words/restaurant/restaurant_train.json
```

This creates a list of the unique target words in the Restaurant training dataset and saves it to target_words/restaurant/restaurant_train.json. This list can then be used to expand the current training dataset by switching similar targets within the same sentence.

Expanding through language model

The following command will create a new JSON-lines file, where each line is a JSON dictionary corresponding to one sample in the original TDSA training dataset but with 3 new fields: alternative_targets, alternative_perplexity, and original_perplexity.

```bash
python augment_transformer.py data/splits/Restaurant\ Train ../yelp_language_model_save_large/model.tar.gz ./target_words/restaurant/restaurant_train.json ./original_augmentation_datasets/restaurant/yelp_lm.json 'spacy' --cuda --batch_size 15
```
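Perplexity, as used for the original_perplexity and alternative_perplexity fields, is conventionally the exponential of the average negative log-likelihood per token. A minimal sketch, assuming you already have per-token natural-log probabilities from a language model (the probabilities below are made up for illustration):

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) over the tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token probabilities for the same sentence with two targets
original = perplexity([math.log(p) for p in (0.4, 0.2, 0.5, 0.3)])
alternative = perplexity([math.log(p) for p in (0.4, 0.1, 0.5, 0.3)])
print(alternative > original)  # → True: a less likely target raises perplexity
```

Lower perplexity means the language model finds the sentence more plausible, which is why an alternative target can be judged by comparing its sentence's perplexity against the original sentence's.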

Expanding by using the Domain Specific Embedding

```bash
python augment_embedding.py data/splits/Restaurant\ Train ./embeddings/yelp/lower\ case\ phrase\ stanford\ 300D ./target_words/restaurant/restaurant_train.json ./original_augmentation_datasets/restaurant/embedding.json 'spacy' --lower
```

Plotting the domain specific embeddings similarity scores

Here we want to know the distribution of the similarity scores so that we can choose a threshold value. Choosing a threshold is much easier for the language model, as we can use the perplexity of the original sentence and only choose targets whose sentences have a perplexity equal to or lower than that original perplexity.

The similarity scores used to create this distribution come from the expanded dataset above. Although this biases the distribution towards targets that occur more frequently, that bias comes from the training data, so we keep it.

```bash
python embedding_similarity_dist.py original_augmentation_datasets/restaurant/embedding.json ./images/embedding_similarity_dist/restaurant.png 10.0

python embedding_similarity_dist.py original_augmentation_datasets/restaurant/embedding.json ./images/embedding_similarity_dist/restaurant.png 5.0
```

This shows that a similarity value of 0.36 (0.418) covers the top 10% (5%) of the similarity values within the augmented dataset. The plot returned by this command shows that the data is not normally distributed, which is confirmed by D'Agostino and Pearson's normality test.
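A minimal sketch of how a "top p%" similarity cut-off can be derived from a list of similarity scores, using only the standard library (`statistics.quantiles` needs Python 3.8+). The scores in the usage example are synthetic. D'Agostino and Pearson's test is available as `scipy.stats.normaltest` if SciPy is installed; it is left out here to keep the sketch dependency-free.

```python
import statistics
from typing import Sequence


def top_percent_threshold(scores: Sequence[float], percent: int) -> float:
    """Value at the (100 - percent)th percentile, so roughly `percent`% of
    the scores are greater than or equal to it."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    # quantiles(n=100) returns the 1st..99th percentiles; index i is the (i+1)th
    cut_points = statistics.quantiles(scores, n=100, method="inclusive")
    return cut_points[100 - percent - 1]


def share_at_or_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores that would pass the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


scores = [i / 100 for i in range(101)]  # synthetic similarity scores 0.00..1.00
print(round(top_percent_threshold(scores, 10), 3))  # → 0.9
```

Because the cut-off is a percentile rather than, say, mean plus two standard deviations, it does not rely on the scores being normally distributed. That matters here, because the normality test rejects normality.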

Creating new Training datasets

Here we show how we create K best alternative target datasets and K Threshold alternative datasets:

K

This is where we choose the K most similar targets based on either the language model or the embedding. Below are the commands to create each of these datasets respectively:

```bash
python create_datasets.py original_augmentation_datasets/restaurant/yelp_lm.json augmented_data/restaurant/no_additional_targets/lm_10_no_threshold.json 10 --lm
python create_datasets.py original_augmentation_datasets/restaurant/embedding.json augmented_data/restaurant/no_additional_targets/embedding_10_no_threshold.json 10 --embedding
```

In both cases K is 10. We repeat the same process for K values of 2, 3, and 5, which can be done with the following script:

```bash
./create_datasets.sh /home/andrew/Envs/example_augmentation/bin/python restaurant
```

K Threshold

This is the same as above, except that we restrict the K most similar targets to only those that pass a threshold. For the language model, an alternative target passes if the sentence containing it has a perplexity lower than or equal to that of the same sentence with the original target. For the embedding, the threshold is not context/sentence specific; instead, we define up front a similarity score, and each alternative target's similarity to the original target must be greater than or equal to it. To choose this similarity threshold we look at the similarity plot produced in the section above, and from it we chose 0.418, as this only lets the top 5% of the most similar targets through and should increase precision when K is large. The commands to produce the threshold datasets for K equal to 10 are shown below:

```bash
python create_datasets.py original_augmentation_datasets/restaurant/yelp_lm.json augmented_data/restaurant/no_additional_targets/lm_10.json 10 --lm --threshold 1
python create_datasets.py original_augmentation_datasets/restaurant/embedding.json augmented_data/restaurant/no_additional_targets/embedding_10.json 10 --embedding --threshold 0.418
```

We repeat the same process for K values of 2, 3, and 5, without changing the embedding threshold of 0.418, which can be done with the following script:
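The "K" and "K Threshold" selection rules can be sketched as follows. This assumes each sample's alternatives are available as (target, score) pairs, where the score is a sentence perplexity (lower is better) for the language model and a similarity to the original target (higher is better) for the embedding. `select_lm` and `select_embedding` are hypothetical names. `create_datasets.py` may implement `--threshold` differently; in particular, this sketch assumes the LM threshold acts as a ratio to the original perplexity, so that 1 means "no worse than the original".

```python
from typing import List, Optional, Sequence, Tuple

Scored = Sequence[Tuple[str, float]]


def select_lm(alternatives: Scored, original_perplexity: float, k: int,
              ratio: Optional[float] = None) -> List[str]:
    """K alternatives with the lowest perplexity; with `ratio`, keep only those
    whose perplexity is <= ratio * original_perplexity."""
    ranked = sorted(alternatives, key=lambda pair: pair[1])
    if ratio is not None:
        ranked = [(t, p) for t, p in ranked if p <= ratio * original_perplexity]
    return [t for t, _ in ranked[:k]]


def select_embedding(alternatives: Scored, k: int,
                     min_similarity: Optional[float] = None) -> List[str]:
    """K most similar alternatives; with `min_similarity`, drop any below it."""
    ranked = sorted(alternatives, key=lambda pair: pair[1], reverse=True)
    if min_similarity is not None:
        ranked = [(t, s) for t, s in ranked if s >= min_similarity]
    return [t for t, _ in ranked[:k]]


# Similarity scores from the "bar" example in the Restaurant section
bar = [("bars", 0.611), ("pub", 0.5102), ("bartender", 0.4999),
       ("bartenders", 0.4885), ("counter", 0.4596)]
print(select_embedding(bar, 2))  # → ['bars', 'pub']
```

The threshold only removes candidates, so with a large K a threshold-limited dataset can contain fewer than K alternatives per sample. This is the intended precision-over-quantity trade-off.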

```bash
./create_datasets.sh /home/andrew/Envs/example_augmentation/bin/python restaurant 0.418
```

The effects this has on modelling

First, to ensure that the learning rates selected in the model configurations are suitable, we can run the following to plot learning rate against loss for the first 100 batches of the training data. One current problem with this method is that when it is run for several models at the same time, the plots are drawn over each other.

```bash
python find_lr_models.py ./data/splits/Restaurant\ Train results/learning_rates/ ./model_configs/ Restaurant /tmp/find_lr.log
```
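Plotting learning rate against loss over the first batches follows the general idea of a learning-rate range test: the learning rate is increased geometrically from batch to batch while the loss is recorded. A minimal sketch of such a schedule for 100 batches; the start and end learning rates are illustrative assumptions, and `find_lr_models.py` may use a different schedule.

```python
from typing import List


def lr_range(start_lr: float, end_lr: float, num_batches: int = 100) -> List[float]:
    """Geometrically spaced learning rates from start_lr to end_lr (inclusive)."""
    if num_batches < 2:
        raise ValueError("need at least two batches")
    growth = (end_lr / start_lr) ** (1 / (num_batches - 1))
    return [start_lr * growth ** i for i in range(num_batches)]


rates = lr_range(1e-5, 1.0)  # one learning rate per training batch
```

Each rate is used for one batch and paired with that batch's loss. A suitable learning rate is usually read off the plot (with a log-scaled x axis) a little before the point where the loss starts to rise sharply.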

Here we show the effects that data augmentation has on the sentiment models. The models that we use are the following:

  1. IAN
  2. TDLSTM

```bash
./restaurant_run_script.sh /home/andrew/Envs/example_augmentation/bin/python ./model_configs/standard
```

To plot the results for the Validation and Test sets with both the Macro F1 and Accuracy metrics, use the following commands:

```bash
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Restaurant\ Test 'Macro F1' ./images/results/restaurant/augmentation/no_additional_targets_macro_f1_test.png Restaurant
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Restaurant\ Val 'Macro F1' ./images/results/restaurant/augmentation/no_additional_targets_macro_f1_val.png Restaurant --val
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Restaurant\ Test 'Accuracy' ./images/results/restaurant/augmentation/no_additional_targets_accuracy_test.png Restaurant
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Restaurant\ Val 'Accuracy' ./images/results/restaurant/augmentation/no_additional_targets_accuracy_val.png Restaurant --val
```

If we open ./augmentation_sentence_examples/restaurant/embedding.tsv, we can see that the sentence on line 24 is a problem with regard to its suggested target replacements:

sentence: It's also attached to Angel's Share, which is a cool, more romantic [bar]... related targets: bars(0.611), pub(0.5102), bartender(0.4999), bartenders(0.4885), counter(0.4596)

"dining experience", "date spot", "all you can eat deal", "icing on the cake", "place", "spot", "setting"

As we can see, the first is wrong because it is plural rather than singular, the second and fifth are plausible, and the third and fourth are completely wrong, although they are related by topic.

Laptop and using the same targets in the training dataset (the same as the Restaurant section above; the commands are shown here for reproducibility)

To get a list of target words, run the following:

```bash
python target_word_list.py data/splits/Laptop\ Train target_words/laptop/laptop_train.json
```

This creates a list of the unique target words in the Laptop training dataset and saves it to target_words/laptop/laptop_train.json. This list can then be used to expand the current training dataset by switching similar targets within the same sentence.

Expanding through language model

The following command will create a new JSON-lines file, where each line is a JSON dictionary corresponding to one sample in the original TDSA training dataset but with 3 new fields: alternative_targets, alternative_perplexity, and original_perplexity.

```bash
python augment_transformer.py data/splits/Laptop\ Train ../amazon_language_model_save_large/model.tar.gz ./target_words/laptop/laptop_train.json ./original_augmentation_datasets/laptop/amazon_lm.json 'spacy' --cuda --batch_size 15
```

Expanding by using the Domain Specific Embedding

```bash
python augment_embedding.py data/splits/Laptop\ Train ./embeddings/amazon/lower\ case\ phrase\ stanford\ 300D ./target_words/laptop/laptop_train.json ./original_augmentation_datasets/laptop/embedding.json 'spacy' --lower
```

Plotting the domain specific embeddings similarity scores

Here we want to know the distribution of the similarity scores so that we can choose a threshold value. Choosing a threshold is much easier for the language model, as we can use the perplexity of the original sentence and only choose targets whose sentences have a perplexity equal to or lower than that original perplexity.

The similarity scores used to create this distribution come from the expanded dataset above. Although this biases the distribution towards targets that occur more frequently, that bias comes from the training data, so we keep it.

```bash
python embedding_similarity_dist.py original_augmentation_datasets/laptop/embedding.json ./images/embedding_similarity_dist/laptop.png 10.0

python embedding_similarity_dist.py original_augmentation_datasets/laptop/embedding.json ./images/embedding_similarity_dist/laptop.png 5.0
```

This shows that a similarity value of 0.336 (0.404) covers the top 10% (5%) of the similarity values within the augmented dataset. The plot returned by this command shows that the data is not normally distributed, which is confirmed by D'Agostino and Pearson's normality test.

Creating new Training datasets

Here we show how we create K best alternative target datasets and K Threshold alternative datasets:

K

This is where we choose the K most similar targets based on either the language model or the embedding. Below are the commands to create each of these datasets respectively:

```bash
python create_datasets.py original_augmentation_datasets/laptop/amazon_lm.json augmented_data/laptop/no_additional_targets/lm_10_no_threshold.json 10 --lm
python create_datasets.py original_augmentation_datasets/laptop/embedding.json augmented_data/laptop/no_additional_targets/embedding_10_no_threshold.json 10 --embedding
```

In both cases K is 10. We repeat the same process for K values of 2, 3, and 5, which can be done with the following script:

```bash
./create_datasets.sh /home/andrew/Envs/example_augmentation/bin/python laptop
```

K Threshold

This is the same as above, except that we restrict the K most similar targets to only those that pass a threshold. For the language model, an alternative target passes if the sentence containing it has a perplexity lower than or equal to that of the same sentence with the original target. For the embedding, the threshold is not context/sentence specific; instead, we define up front a similarity score, and each alternative target's similarity to the original target must be greater than or equal to it. To choose this similarity threshold we look at the similarity plot produced in the section above, and from it we chose 0.404, as this only lets the top 5% of the most similar targets through and should increase precision when K is large. The commands to produce the threshold datasets for K equal to 10 are shown below:

```bash
python create_datasets.py original_augmentation_datasets/laptop/amazon_lm.json augmented_data/laptop/no_additional_targets/lm_10.json 10 --lm --threshold 1
python create_datasets.py original_augmentation_datasets/laptop/embedding.json augmented_data/laptop/no_additional_targets/embedding_10.json 10 --embedding --threshold 0.404
```

We repeat the same process for K values of 2, 3, and 5, without changing the embedding threshold of 0.404, which can be done with the following script:

```bash
./create_datasets.sh /home/andrew/Envs/example_augmentation/bin/python laptop 0.404
```

The effects this has on modelling

First, to ensure that the learning rates selected in the model configurations are suitable, we can run the following to plot learning rate against loss for the first 100 batches of the training data. One current problem with this method is that when it is run for several models at the same time, the plots are drawn over each other.

```bash
python find_lr_models.py ./data/splits/Laptop\ Train results/learning_rates/ ./model_configs/ Laptop /tmp/find_laptop_lr.log
```

Here we show the effects that data augmentation has on the sentiment models. The models that we use are the following:

  1. IAN
  2. TDLSTM

```bash
./laptop_run_script.sh /home/andrew/Envs/example_augmentation/bin/python ./model_configs/standard
```

To plot the results for the Validation and Test sets with both the Macro F1 and Accuracy metrics, use the following commands:

```bash
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Laptop\ Test 'Macro F1' ./images/results/laptop/augmentation/no_additional_targets_macro_f1_test.png Laptop
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Laptop\ Val 'Macro F1' ./images/results/laptop/augmentation/no_additional_targets_macro_f1_val.png Laptop --val
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Laptop\ Test 'Accuracy' ./images/results/laptop/augmentation/no_additional_targets_accuracy_test.png Laptop
python vis_results.py 5 results/augmentation/no_additional_targets/ data/splits/Laptop\ Val 'Accuracy' ./images/results/laptop/augmentation/no_additional_targets_accuracy_val.png Laptop --val
```

Analysing the results of K

Here we want to know whether K is significant or not, which we explore through the following four questions:

  1. Is the best K for each model and augmentation technique significantly better than the worst K?
  2. Is the best K for each model and augmentation technique significantly better than the next best K?
  3. Is there a trend of best Ks and significantly worse Ks? This is shown through the number of times each K is best for each metric and data split, and the number of times each K is significantly worse than the best K.
  4. Given all of the significantly best and worst K pairs, is there an overall best and worst K across all of the model and augmentation pairs?

We break down the commands that generate these scores by metric and data split (Validation or Test). For each run, significance is calculated using a one-tailed paired bootstrap test with 10,000 bootstrap samples. As each model and augmentation technique has been run 5 times to account for the random seed problem, we take the run with the median score for each when comparing significance.
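A minimal sketch of a one-tailed paired bootstrap test. It assumes per-sample correctness (1 or 0) for two models on the same evaluation set and uses accuracy as the metric. It uses one common formulation, where the p-value is the share of resamples in which model A's advantage disappears; `is_k_significant.sh` may use a different variant. `paired_bootstrap_p` is a hypothetical helper.

```python
import random
from typing import Sequence


def paired_bootstrap_p(correct_a: Sequence[int], correct_b: Sequence[int],
                       n_samples: int = 10_000, seed: int = 0) -> float:
    """One-tailed p-value for 'model A has higher accuracy than model B'.

    Each bootstrap sample draws evaluation items with replacement, keeping
    the A and B predictions for an item paired. The p-value is the share of
    samples where A's accuracy is not higher than B's (difference <= 0).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty, equally long result lists")
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff <= 0:
            not_better += 1
    return not_better / n_samples
```

For Macro F1, the same loop applies, but each resample must recompute F1 from the resampled gold labels and predictions instead of summing per-item differences, because F1 does not decompose per item.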

Validation

Accuracy

We break this down for both the validation and test sets. For the validation sets, the Accuracy scores for the Laptop and Restaurant datasets are generated with:

```bash
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Laptop\ Val 'Accuracy' Laptop 10000 true true
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Restaurant\ Val 'Accuracy' Restaurant 10000 true true
```

The results can be found in the following PDF and LaTeX file (the PDF is a rendering of the LaTeX), as can the number of times each K was best and worst (PDF and LaTeX file).

Macro F1

```bash
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Laptop\ Val 'Macro F1' Laptop 10000 true true
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Restaurant\ Val 'Macro F1' Restaurant 10000 true true
```

The results for this can be found in the following pdf and latex file.

Test

Accuracy

```bash
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Laptop\ Test 'Accuracy' Laptop 10000 false true
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Restaurant\ Test 'Accuracy' Restaurant 10000 false true
```

Macro F1

```bash
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Laptop\ Test 'Macro F1' Laptop 10000 false true
./is_k_significant.sh ~/Envs/example_augmentation/bin/python ./results/augmentation/no_additional_targets/ ./data/splits/Restaurant\ Test 'Macro F1' Restaurant 10000 false true
```

Converting the word vectors from binary file to text file

```bash
python from_vector_to_txt.py ./embeddings/yelp/lower\ case\ phrase\ stanford\ 300D ./embeddings/yelp/ds_embedding.txt
python from_vector_to_txt.py ./embeddings/amazon/lower\ case\ phrase\ stanford\ 300D ./embeddings/amazon/ds_embedding.txt
```
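If you want to do this conversion without the repository script, here is a minimal standard-library sketch. It assumes the vectors are stored in the original word2vec C binary format: a `vocab_size dim` header line, then each word followed by a space and `dim` little-endian float32 values. That is an assumption. If the embeddings were instead saved with Gensim's native `save()`, load them with Gensim and call `KeyedVectors.save_word2vec_format(path, binary=False)` rather than using this sketch.

```python
import io
import struct
from typing import BinaryIO, TextIO


def _read_word(stream: BinaryIO) -> bytes:
    """Read bytes up to the next space, skipping newlines between entries."""
    chars = bytearray()
    while True:
        ch = stream.read(1)
        if ch in (b" ", b""):
            return bytes(chars)
        if ch != b"\n":  # some writers end each vector with a newline
            chars.extend(ch)


def word2vec_bin_to_txt(src: BinaryIO, dst: TextIO) -> int:
    """Convert word2vec binary vectors to the text format; returns the vocab size."""
    vocab_size, dim = map(int, src.readline().split())
    dst.write(f"{vocab_size} {dim}\n")
    for _ in range(vocab_size):
        word = _read_word(src).decode("utf-8", errors="replace")
        values = struct.unpack(f"<{dim}f", src.read(4 * dim))
        dst.write(word + " " + " ".join(f"{v:.6f}" for v in values) + "\n")
    return vocab_size


def convert_bytes(data: bytes) -> str:
    """Convenience wrapper for in-memory data."""
    out = io.StringIO()
    word2vec_bin_to_txt(io.BytesIO(data), out)
    return out.getvalue()
```

For real files, open the source with `open(path, "rb")` and the destination with `open(path, "w", encoding="utf-8")`. The vectors are read by length rather than by scanning for delimiters, so float bytes that happen to equal a space or newline are handled correctly.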

Are language model embeddings better than word embeddings, and is domain-specific data required?

```bash
./lm_embedding_run_script.sh /home/andrew/Envs/example_augmentation/bin/python
```

and to visualise the results:

```bash
python vis_domain_results.py 5 ./results/ data/splits/Laptop\ Test 'Macro F1' images/results/laptop/domain_specific/macro_f1_test_val.png Laptop --val_fp data/splits/Laptop\ Val
python vis_domain_results.py 5 ./results/ data/splits/Laptop\ Test 'Accuracy' images/results/laptop/domain_specific/accuracy_test_val.png Laptop --val_fp data/splits/Laptop\ Val
python vis_domain_results.py 5 ./results/ data/splits/Restaurant\ Test 'Macro F1' images/results/restaurant/domain_specific/macro_f1_test_val.png Restaurant --val_fp data/splits/Restaurant\ Val
python vis_domain_results.py 5 ./results/ data/splits/Restaurant\ Test 'Accuracy' images/results/restaurant/domain_specific/accuracy_test_val.png Restaurant --val_fp data/splits/Restaurant\ Val
```

What happens when augmentation meets domain specific LM?

ATAE on Laptop with the domain-specific LM and GloVe:

```bash
./laptop_run_script.sh /home/andrew/Envs/example_augmentation/bin/python ./model_configs/Laptop_ds_lm_embedding 'atae' 'atae_ds_lm_embedding'
```

Extra baselines

python runmodels.py 5 ./data/splits/ ./results/baseline ./modelconfigs/standard Laptop ./logdir/Laptopbaselineextra.log --modelnames "lstm" "lstmrandom" "dselmotembeddingtunelaptop" "dselmotembeddinglaptop" "dselmotlaptop" "elmot" --modelnamesavenames "lstm" "lstmrandom" "dselmotembeddingtune" "dselmotembedding" "dselmot" "elmot" python runmodels.py 5 ./data/splits/ ./results/baseline ./modelconfigs/standard Restaurant ./logdir/Restaurantbaselineextra.log --modelnames "lstm" "lstmrandom" "dselmotembeddingtunerestaurant" "dselmotembeddingrestaurant" "dselmotrestaurant" "elmot" --modelnamesavenames "lstm" "lstmrandom" "dselmotembeddingtune" "dselmotembedding" "dselmot" "elmot"

python runmodels.py 1 ./data/splits/ ./results/baseline ./modelconfigs/standard Laptop ./logdir/Laptopbaselineextrafine.log --modelnames "dselmotfinetunelaptop" --modelnamesavenames "dselmotfine_tune" - Acc val, test 74.46, 77, macro f1 val, test 66.85, 71 epoch 30

References

  1. SemEval-2014 Task 4: Aspect Based Sentiment Analysis
  2. TDParse: Multi-target-specific sentiment recognition on Twitter

Number of instances

  • Laptop Train: 1851 samples, 58 batches with a batch size of 32
  • Restaurant Train: 2882 samples, 91 batches with a batch size of 32
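These batch counts follow from rounding up the number of training samples divided by the batch size, because the final, partial batch still counts as a batch:

```python
import math


def num_batches(num_samples: int, batch_size: int = 32) -> int:
    """Number of batches per epoch when the last, partial batch is kept."""
    return math.ceil(num_samples / batch_size)


print(num_batches(1851))  # Laptop Train → 58
print(num_batches(2882))  # Restaurant Train → 91
```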

Measuring how much of the unlabelled Amazon, Yelp, and election data has bad Unicode

To do this we use the ftfy package. First, we measure the scale of the problem using the following script:

```bash
python text_encoding_issues_ftfy.py ../MP-Tweets/filtered_split_train.txt
python text_encoding_issues_ftfy.py ../amazon/filtered_split_train.txt
python text_encoding_issues_ftfy.py ../yelp/splits/filtered_split_train.txt
```

However, the problem is more difficult than I realised, as it needs to be fixed before tokenisation happens, as shown below:

```python
# Text after ftfy has been applied to the already-tokenised sentence
ftfy_text = 'All mounts have different tv \'s on them , One is a 50&#34 ; , 40&#34 ; and a 32". Product comes with multiple screw for different tvs .\n'
# The same tokenised text before ftfy was applied
text_before_hand = "All mounts have different tv 's on them , One is a 50&#34 ; , 40&#34 ; and a 32&#34;. Product comes with multiple screw for different tvs .\n"
```

As we can see, 50 and 40 should have become 50" and 40", just as 32 was corrected. This has not happened because tokenisation split those entities apart (&#34 ;).
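One possible workaround, not taken from this repository, is to re-attach entities that the tokeniser split (e.g. `&#34 ;`) before unescaping them. The following minimal sketch uses a regular expression and the standard library's `html.unescape` in place of ftfy. The regex only handles the "entity, space, semicolon" pattern shown above, which is an assumption about how the tokeniser breaks entities.

```python
import html
import re

# An HTML entity whose closing semicolon was split off by tokenisation, e.g. "&#34 ;"
SPLIT_ENTITY = re.compile(r"(&#?[0-9A-Za-z]+) ;")


def fix_split_entities(text: str) -> str:
    """Rejoin split entities, then decode all HTML entities."""
    return html.unescape(SPLIT_ENTITY.sub(r"\1;", text))


print(fix_split_entities("a 50&#34 ; , 40&#34 ; and a 32&#34;."))
# → a 50" , 40" and a 32".
```

This only works around the symptom. As noted above, the cleaner fix is to repair the encoding before tokenisation.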

Training a Target Extraction method:

We want to find new targets within large samples of text so that we can then use them to help augmentation. To do so, we first train our Target Extraction models. We use a standard LSTM-based approach with the domain-specific ELMo Transformer models providing the word representations, and we do this for each of the datasets.

Amazon (SemEval 2014 Laptop domain)

```bash
python target_extraction_train_predict.py semeval_2014 --train_fp ../../Music/original_target_datasets/semeval_2014/SemEval\'14-ABSA-TrainData_v2\ \&\ AnnotationGuidelines/Laptop_Train_v2.xml --test_fp ../../Music/original_target_datasets/semeval_2014/ABSA_Gold_TestData/Laptops_Test_Gold.xml --number_to_predict_on 1000000 --batch_size 256 target_extraction_configs/amazon.jsonnet ./target_extract_models/amazon ../amazon/filtered_split_train.txt ../extra_target_data/amazon_predicted_targets.txt
```

This should produce a Test F1 score of around 0.85 (0.85423197492163), which is around state-of-the-art performance (0.8426). Making predictions for all 1,000,000 sentences takes around 82 minutes.

Yelp (SemEval 2014 Restaurant domain)

```bash
python target_extraction_train_predict.py --train_fp ../../Music/original_target_datasets/semeval_2014/SemEval\'14-ABSA-TrainData_v2\ \&\ AnnotationGuidelines/Restaurants_Train_v2.xml --test_fp ../../Music/original_target_datasets/semeval_2014/ABSA_Gold_TestData/Restaurants_Test_Gold.xml --number_to_predict_on 1000000 --batch_size 256 semeval_2014 ./target_extraction_configs/yelp.jsonnet ./target_extract_models/yelp ../yelp/splits/filtered_split_train.txt ../extra_target_data/yelp_predicted_targets.txt
```

This should produce a Test F1 score of around 0.88 (0.882843352347521), which beats the state of the art on this dataset (0.8561). Making predictions for all 1,000,000 sentences takes around 69 minutes.

MP Tweets (Twitter Election dataset)

```bash
python target_extraction_train_predict.py --number_to_predict_on 1000000 --batch_size 256 election_twitter ./target_extraction_configs/mp.jsonnet ./target_extract_models/mp ../MP-Tweets/filtered_split_train.txt ../extra_target_data/mp_predicted_targets.txt
```

This should produce a Test F1 score of around 0.8778 (0.8778369844089204); there is no baseline paper to compare to. Making predictions for all 1,000,000 sentences takes around 104 minutes.

Analysing the predicted targets

As all of the predicted target data is in ../extra_target_data, we now want to analyse it to see how it differs from the original gold standard datasets. In all cases, when extracting the predicted targets we only retrieve those that the model is 90% confident are targets.

Yelp (SemEval Restaurant)

```bash
python predicted_target_extraction.py ../extra_target_data/yelp_predicted_targets.txt ../../Music/original_target_datasets/semeval_2014/SemEval\'14-ABSA-TrainData_v2\ \&\ AnnotationGuidelines/Restaurants_Train_v2.xml ../../Music/original_target_datasets/semeval_2014/ABSA_Gold_TestData/Restaurants_Test_Gold.xml 0.9
```

Output:

```
Percentage of targets that have been predicted that are in train: 43.84937238493724
Percentage of targets that have been predicted that are in test: 51.73076923076923
Number of new predicted targets that are in the whole gold datasets: 632 compared to that are not: 896
Number of new predicted and training targets that are in the test datasets: 287 compared to that are not: 233
Number of new predicted targets that are in the test datasets: 269 compared to that are not: 251
Number of new training targets that are in the test datasets: 187 compared to that are not: 333
Total number of predicted targets: 12042
Number of targets in train: 1195
Number of targets in test: 520
Number of targets in train and test: 1528
```

Amazon (SemEval Laptop)

```bash
python predicted_target_extraction.py ../extra_target_data/amazon_predicted_targets.txt ../../Music/original_target_datasets/semeval_2014/SemEval\'14-ABSA-TrainData_v2\ \&\ AnnotationGuidelines/Laptop_Train_v2.xml ../../Music/original_target_datasets/semeval_2014/ABSA_Gold_TestData/Laptops_Test_Gold.xml 0.9
```

Output:

```
Percentage of targets that have been predicted that are in train: 44.12698412698413
Percentage of targets that have been predicted that are in test: 50.128534704370175
Number of new predicted targets that are in the whole gold datasets: 484 compared to that are not: 697
Number of new predicted and training targets that are in the test datasets: 214 compared to that are not: 175
Number of new predicted targets that are in the test datasets: 195 compared to that are not: 194
Number of new training targets that are in the test datasets: 153 compared to that are not: 236
Total number of predicted targets: 6716
Number of targets in train: 945
Number of targets in test: 389
Number of targets in train and test: 1181
```

MP (Election Twitter)

```bash
python predicted_target_extraction.py --election ../extra_target_data/mp_predicted_targets.txt . . 0.9
```

Output:

```
Percentage of targets that have been predicted that are in train: 42.40129799891834
Percentage of targets that have been predicted that are in test: 53.12916111850865
Number of new predicted targets that are in the whole gold datasets: 886 compared to that are not: 1293
Number of new predicted and training targets that are in the test datasets: 523 compared to that are not: 228
Number of new predicted targets that are in the test datasets: 399 compared to that are not: 352
Number of new training targets that are in the test datasets: 421 compared to that are not: 330
Total number of predicted targets: 125954
Number of targets in train: 1849
Number of targets in test: 751
Number of targets in train and test: 2179
```
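The statistics above appear to be simple set overlaps between unique target strings. For example, 269 of the 520 Restaurant test targets are predicted, and 269 / 520 ≈ 51.73%, which matches the reported test percentage. A minimal sketch of computing them under that interpretation with toy target sets (`overlap_stats` is a hypothetical helper, not the repository's code):

```python
from typing import Dict, Set


def overlap_stats(predicted: Set[str], train: Set[str], test: Set[str]) -> Dict[str, float]:
    """Overlap between predicted targets and the gold train/test target sets."""
    gold = train | test
    return {
        # share of gold train/test targets that the extractor also predicted
        "pct_train_predicted": 100 * len(train & predicted) / len(train),
        "pct_test_predicted": 100 * len(test & predicted) / len(test),
        # gold targets (train or test) covered / not covered by predictions
        "gold_predicted": len(gold & predicted),
        "gold_not_predicted": len(gold - predicted),
        # test targets covered by predictions, by training targets, or by either
        "test_predicted": len(test & predicted),
        "test_in_train": len(test & train),
        "test_predicted_or_train": len(test & (predicted | train)),
    }


stats = overlap_stats(predicted={"food", "wine", "price", "staff"},
                      train={"food", "service", "staff", "bar"},
                      test={"food", "wine", "bar", "view"})
```

Under this reading, the pair counts in the output always add up to the gold totals (e.g. 632 + 896 = 1528 and 269 + 251 = 520 for Restaurant), which is a quick sanity check on the interpretation.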

Owner

  • Name: Andrew Moore
  • Login: apmoore1
  • Kind: user
  • Location: Lancaster
  • Company: Lancaster University

PhD student and researcher. Main interests: Target/Aspect based sentiment analysis, Semi-Supervised Learning.

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 30
  • Total Committers: 1
  • Avg Commits per committer: 30.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Andrew Moore a****4@g****m 30

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0