https://github.com/apmoore1/bella-allennlp

Allen NLP models and datasets for Bella

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (8.8%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: apmoore1
Language: Python
Default Branch: master
Size: 5.79 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 2

Created over 7 years ago · Last pushed almost 7 years ago

Metadata Files

Readme

TODO

Add tests for the augmented data iterator

Target Extraction

We treat this problem as a sequence labelling problem. As the given datasets are not pre-tokenised, they must first be tokenised; the tokeniser we use is spaCy. However, because the text is not pre-tokenised, we first want to see how many of the tokens line up with the span offsets that are set as the target word(s) to be predicted in this task. To do this we created the ./tokens_and_targets.py script, which prints the number of targets (samples) whose target word(s) do not fit neatly within the tokens created by the tokeniser; we call these tokenisation errors. An example of this error can be seen below:

Turned on BBCQT and thought I was watching a translation of a Greek political show.Anti austerity from the SNP obviously funded on oil#bbcqt

Here the target is bbcqt, but the token that spaCy found was oil#bbcqt, so predicting that token as the target would be incorrect because it incorporates more than just the target. Running the following commands:

    python tokens_and_targets.py ~/.Bella/Datasets/ Laptop
    python tokens_and_targets.py ~/.Bella/Datasets/ Restaurant
    python tokens_and_targets.py ~/.Bella/Datasets/ Election

we find that 0.94%, 0.18%, and 1.53% of the test datasets, respectively, have tokenisation errors, which is minimal but noteworthy. We also found that in the Laptop training dataset one of the targets (in sentence id 1436) includes a space.

Here we only report errors on the test sets, because for the training and validation sets we can force a space between the target word and the surrounding text and change the span offsets accordingly, which removes the tokenisation errors, as shown by the following commands:

    python tokens_and_targets.py ~/.Bella/Datasets/ Laptop --force_space
    python tokens_and_targets.py ~/.Bella/Datasets/ Restaurant --force_space
    python tokens_and_targets.py ~/.Bella/Datasets/ Election --force_space

We do not do this for the test datasets because we want a fair comparison with previous work, and thus do not change those datasets in any way to avoid the tokenisation errors.
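The alignment check and the forced-space fix described above can be sketched roughly as follows. This is only an illustration and not the code in tokens_and_targets.py: the function names are hypothetical, and a simple whitespace tokeniser stands in for spaCy so that the example has no dependencies.

```python
import re
from typing import List, Tuple


def whitespace_token_spans(text: str) -> List[Tuple[int, int]]:
    """Stand-in tokeniser (assumption, not spaCy): character offsets
    of each whitespace-separated token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def target_aligns(token_spans: List[Tuple[int, int]], start: int, end: int) -> bool:
    """True if the target span begins at a token start and ends at a token end."""
    starts = {s for s, _ in token_spans}
    ends = {e for _, e in token_spans}
    return start in starts and end in ends


def force_space(text: str, start: int, end: int) -> Tuple[str, int, int]:
    """Insert a space on either side of the target where one is missing and
    return the new text together with the shifted target offsets."""
    before, target, after = text[:start], text[start:end], text[end:]
    if before and not before[-1].isspace():
        before += " "
    if after and not after[0].isspace():
        after = " " + after
    new_start = len(before)
    return before + target + after, new_start, new_start + len(target)


# The tail of the example sentence, with the target "bbcqt" glued to "oil#".
text = "Anti austerity from the SNP obviously funded on oil#bbcqt"
start = text.index("bbcqt")
end = start + len("bbcqt")

aligned_before = target_aligns(whitespace_token_spans(text), start, end)
new_text, new_start, new_end = force_space(text, start, end)
aligned_after = target_aligns(whitespace_token_spans(new_text), new_start, new_end)
```

In this sketch the target is not aligned in the original text but is aligned once the space has been forced. With spaCy itself, a similar check could likely be made with `Doc.char_span`, which returns `None` by default when the character offsets do not fall on token boundaries.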

Owner

Name: Andrew Moore
Login: apmoore1
Kind: user
Location: Lancaster
Company: Lancaster University

Website: https://apmoore1.github.io/
Repositories: 55
Profile: https://github.com/apmoore1

PhD student and researcher. Main interests: Target/Aspect based sentiment analysis, Semi-Supervised Learning.

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 2
Total pull requests: 0
Average time to close issues: about 21 hours
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
