https://github.com/camel-lab/conllx_evaluation
Evaluate accuracy of CoNLL-X annotations performed by annotators
Repository
Basic Info
- Host: GitHub
- Owner: CAMeL-Lab
- License: mit
- Language: Python
- Default Branch: main
- Size: 104 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Camel-depeval
Compare two CoNLL-X files or directories, to obtain the tokenization F-score and POS tag accuracy, as well as the LAS, UAS, and label scores.
Since comparison usually occurs between gold and parsed files, the two files/directories will be differentiated using gold and parsed keywords. In other words, you do not need to have gold and parsed files to compare; any two will do.
The tree alignment part of the code uses ced_word_alignment.
Note: the evaluator is also CoNLL-U compatible.
Methodology
- Two files or directories are passed to the evaluator. If two directories are passed, the directories must have matching file names.
- The files are read, and the trees in each pair of files are compared.
- Align trees using ced_word_alignment
  - involves inserting null alignment tokens
- The evaluation scores are then calculated
  - tokenization F-score is calculated on all aligned tokens, while the remaining metrics are calculated after removing insertions (null alignment tokens added to the gold tree)
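The scoring step above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `Token` record, the `Pair` type, and both function names are hypothetical, a token is assumed to match when the two surface forms are equal, and head indices are assumed to refer to positions in the aligned sequence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical token record; not the repository's dataclass."""
    form: str
    pos: str
    head: int     # index of the head in the aligned sequence, 0 for root
    deprel: str


# One aligned position: (gold, parsed). None stands for a null alignment token.
Pair = Tuple[Optional[Token], Optional[Token]]


def tokenization_f_score(pairs: List[Pair]) -> float:
    """F-score over all aligned positions, as a percentage."""
    gold_total = sum(1 for g, _ in pairs if g is not None)
    parsed_total = sum(1 for _, p in pairs if p is not None)
    matches = sum(
        1 for g, p in pairs
        if g is not None and p is not None and g.form == p.form
    )
    if matches == 0:
        return 0.0
    precision = matches / parsed_total
    recall = matches / gold_total
    return 100 * 2 * precision * recall / (precision + recall)


def attachment_scores(pairs: List[Pair]) -> Dict[str, float]:
    """POS, UAS, label and LAS accuracy after dropping insertions."""
    # An insertion is a position where the gold side is the null token.
    kept = [(g, p) for g, p in pairs if g is not None]
    if not kept:
        return {"pos": 0.0, "uas": 0.0, "label": 0.0, "las": 0.0}

    def pct(agree) -> float:
        # A missing parsed token (deletion) counts as a miss.
        hits = sum(1 for g, p in kept if p is not None and agree(g, p))
        return 100 * hits / len(kept)

    return {
        "pos": pct(lambda g, p: g.pos == p.pos),
        "uas": pct(lambda g, p: g.head == p.head),
        "label": pct(lambda g, p: g.deprel == p.deprel),
        "las": pct(lambda g, p: g.head == p.head and g.deprel == p.deprel),
    }


# Toy data: three gold tokens, one extra parsed token aligned to a null gold token.
gold = [
    Token("A", "NOUN", 2, "SBJ"),
    Token("B", "VERB", 0, "ROOT"),
    Token("C", "NOUN", 2, "OBJ"),
]
pairs = [
    (gold[0], gold[0]),
    (gold[1], gold[1]),
    (None, Token("x", "PUNC", 2, "PNX")),
    (gold[2], Token("C", "ADJ", 2, "MOD")),
]
f_score = tokenization_f_score(pairs)
scores = attachment_scores(pairs)
```

In this toy case the extra parsed token lowers tokenization precision only; it is dropped before the POS and attachment scores are computed.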
Assumptions
Since ced_word_alignment is used, the second and third assumptions are the same.
- No words are added to either the parsed or gold files.
- No changes to the word order.
- Text is in the same script and encoding.
Contents
- align_trees.py: aligns trees using the ced_word_alignment algorithm
- class_conllx: used to read CoNLL-X files
- classes: dataclasses used throughout the code
- conllx_counts: gets different statistics after comparing 2 CoNLL-X files
- conllx_scores: calculates scores given counts
- evaluate_conllx_driver: main script
- handle_args: simplifies use of the argparse library
- requirements.txt: necessary dependencies needed to run the scripts
- ced_word_alignment/: the ced alignment library
- README.md: this document
Requirements
- Python 3.8 and above.
To use, you need to first install the necessary dependencies by running the following command:
```bash
pip install -r requirements.txt
```
Usage
```text
usage: evaluate_conllx_driver.py [-h] [-g] [-p] [-gd] [-pd]

This script takes 2 CoNLL-X files or 2 directories of CoNLL-X files and evaluates the scores.

required arguments:
  -g , --gold          the gold CoNLL-X file
  -p , --parsed        the parsed CoNLL-X file

or:
  -gd , --gold_dir     the gold directory containing CoNLL-X files
  -pd , --parsed_dir   the parsed directory containing CoNLL-X files
```
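For readers who want to see how an interface like this can be declared, here is a minimal argparse sketch. It is not the repository's handle_args module: `build_parser` and `parse_args` are hypothetical names, and the rule that exactly one of the two modes must be given is an assumption based on the usage text above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="This script takes 2 CoNLL-X files or 2 directories "
                    "of CoNLL-X files and evaluates the scores."
    )
    parser.add_argument("-g", "--gold", help="the gold CoNLL-X file")
    parser.add_argument("-p", "--parsed", help="the parsed CoNLL-X file")
    parser.add_argument("-gd", "--gold_dir",
                        help="the gold directory containing CoNLL-X files")
    parser.add_argument("-pd", "--parsed_dir",
                        help="the parsed directory containing CoNLL-X files")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    files = [args.gold, args.parsed]
    dirs = [args.gold_dir, args.parsed_dir]
    # Assumption: a run is either file mode or directory mode, never a mix.
    file_mode = all(v is not None for v in files) and not any(v is not None for v in dirs)
    dir_mode = all(v is not None for v in dirs) and not any(v is not None for v in files)
    if not (file_mode or dir_mode):
        parser.error("pass either -g and -p, or -gd and -pd")
    return args


# Parse an explicit argument list instead of sys.argv, for illustration.
file_args = parse_args(["--gold", "gold.conllx", "--parsed", "parsed.conllx"])
dir_args = parse_args(["--gold_dir", "gold", "--parsed_dir", "parsed"])
```

Validating the two modes after parsing keeps all four flags optional in the help text, which matches the bracketed usage line.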
Examples
The sentences used are taken from CamelTB1001introduction1.conllx and CamelTB1001night1_1.conllx (data can be obtained from The Camel Treebank).
Sample 1:
The tokenization is the same, so the F-score is 100% and the insertion/deletion counts are both 0.
```text
python src/main.py -g data/samples_gold/sample_1.conllx -p data/samples_parsed/sample_1.conllx
```
|||
|- |- |
| tokenizationfscore | 100.0 |
| tokenizationprecision | 100.0 |
| tokenizationrecall | 100.0 |
| wordaccuracy | 100.0 |
| pos | 81.579 |
| uas | 55.263 |
| label | 65.789 |
| las | 44.737 |
| ppuasscore | 0 |
| pplabelscore | 0 |
| pplasscore | 0 |
Sample 2:
```text
python src/main.py -g data/samples_gold/sample_2.conllx -p data/samples_parsed/sample_2.conllx
```
|||
|- |- |
| tokenizationfscore | 90.385 |
| tokenizationprecision | 90.385 |
| tokenizationrecall | 90.385 |
| wordaccuracy | 97.222 |
| pos | 86.538 |
| uas | 65.385 |
| label | 75.0 |
| las | 57.692 |
| ppuasscore | 0.0 |
| pplabelscore | 0.0 |
| pplas_score | 0.0 |
Sample normalization:
Using the arguments x (punctuation), n (number), and a (alef, yeh, and ta marbuta), the evaluation will ignore differences in tokenization. When using the arguments, the following comparisons will be equal:
- 1 and ١
- , and ،
- ي and ى
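A normalization step of this kind can be sketched with translation tables. This is an illustration under assumptions, not the evaluator's implementation: the `normalize` function is hypothetical, and the exact character sets and mapping directions are guesses beyond the three pairs listed above.

```python
# Arabic punctuation mapped to its Latin counterpart (assumed set).
PUNCT = str.maketrans({"\u060C": ",", "\u061B": ";", "\u061F": "?"})

# Arabic-Indic digits U+0660..U+0669 mapped to ASCII digits.
DIGITS = str.maketrans({chr(0x0660 + i): str(i) for i in range(10)})

# Alef variants to bare alef, alef maksura to yeh, ta marbuta to heh
# (assumed directions).
LETTERS = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
    "\u0622": "\u0627",  # alef with madda
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # ta marbuta -> heh
})


def normalize(form: str, flags: str = "") -> str:
    """Apply the normalizations selected by the letters in `flags`."""
    if "x" in flags:
        form = form.translate(PUNCT)
    if "n" in flags:
        form = form.translate(DIGITS)
    if "a" in flags:
        form = form.translate(LETTERS)
    return form


# With the flags set, both sides of each pair compare equal.
same_digit = normalize("1", "xn") == normalize("\u0661", "xn")
same_comma = normalize(",", "xn") == normalize("\u060C", "xn")
same_yeh = normalize("\u064A", "a") == normalize("\u0649", "a")
```

Normalizing both the gold and the parsed form before comparing them means neither file has to be treated as the reference spelling.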
Without normalization
```text
python src/main.py -g data/samples_gold/sample_4_norm.conllx -p data/samples_parsed/sample_4_norm.conllx
```
|||
|- |- |
| tokenizationfscore | 80.0 |
| tokenizationprecision | 80.0 |
| tokenizationrecall | 80.0 |
| wordaccuracy | 75.0 |
| pos | 80.0 |
| uas | 80.0 |
| label | 80.0 |
| las | 80.0 |
| ppuasscore | 50.0 |
| pplabelscore | 50.0 |
| pplas_score | 50.0 |
With normalization of punctuation and numbers (you can also add a to make the argument -xna)
```text
python src/main.py -g data/samples_gold/sample_4_norm.conllx -p data/samples_parsed/sample_4_norm.conllx -xn
```
|||
|- |- |
| tokenizationfscore | 100.0 |
| tokenizationprecision | 100.0 |
| tokenizationrecall | 100.0 |
| wordaccuracy | 100.0 |
| pos | 100.0 |
| uas | 100.0 |
| label | 100.0 |
| las | 100.0 |
| ppuasscore | 100.0 |
| pplabelscore | 100.0 |
| pplas_score | 100.0 |
Run evaluation on a folder
```text
python src/main.py --gold_dir=data/samples_gold --parsed_dir=data/samples_parsed
```
| | tokenizationfscore | tokenizationprecision | tokenizationrecall | wordaccuracy | pos | uas | label | las | ppuasscore | pplabelscore | pplas_score |
|- |- |- |- |- |- |- |- |- |- |- |- |
| sample_4_norm | 80.0 | 80.0 | 80.0 | 75.0 | 80.0 | 80.0 | 80.0 | 80.0 | 50.0 | 50.0 | 50.0 |
| sample_2 | 90.385 | 90.385 | 90.385 | 97.222 | 86.538 | 65.385 | 75.0 | 57.692 | 0.0 | 0.0 | 0.0 |
| sample_1 | 100.0 | 100.0 | 100.0 | 100.0 | 81.579 | 55.263 | 65.789 | 44.737 | 0.0 | 0.0 | 0.0 |
| sample_3 | 80.0 | 80.0 | 80.0 | 75.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
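Directory mode depends on the two directories having matching file names. One way to pair the files could look like the sketch below; `pair_files` is a hypothetical helper, and raising an error on unmatched names is an assumption about the desired behavior.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_files(gold_dir: str, parsed_dir: str) -> List[Tuple[Path, Path]]:
    """Pair files that share a name; fail if either side has extras."""
    gold = {p.name: p for p in Path(gold_dir).iterdir() if p.is_file()}
    parsed = {p.name: p for p in Path(parsed_dir).iterdir() if p.is_file()}
    unmatched = sorted(set(gold) ^ set(parsed))
    if unmatched:
        raise ValueError(f"files without a counterpart: {unmatched}")
    return [(gold[name], parsed[name]) for name in sorted(gold)]


# Build two throwaway directories with matching file names and pair them.
with tempfile.TemporaryDirectory() as root:
    for sub in ("gold", "parsed"):
        (Path(root) / sub).mkdir()
        for name in ("sample_1.conllx", "sample_2.conllx"):
            (Path(root) / sub / name).write_text("")
    pairs = pair_files(str(Path(root) / "gold"), str(Path(root) / "parsed"))
    pair_names = [(g.name, p.name) for g, p in pairs]
```

Each resulting pair can then be evaluated like a single gold/parsed file pair, giving one row per file as in the table above.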
License
conllx_evaluator is available under the MIT license. See the LICENSE file for more info.
Owner
- Name: CAMeL Lab
- Login: CAMeL-Lab
- Kind: organization
- Location: Abu Dhabi, UAE
- Website: http://camel-lab.com
- Repositories: 22
- Profile: https://github.com/CAMeL-Lab
The Computational Approaches to Modeling Language (CAMeL) Lab at New York University Abu Dhabi
GitHub Events
Total
- Delete event: 1
- Push event: 6
- Pull request event: 2
- Create event: 2
Last Year
- Delete event: 1
- Push event: 6
- Pull request event: 2
- Create event: 2
Dependencies
- docopt ==0.6.2
- editdistance ==0.5.3
- editdistance ==0.6.0
- numpy ==1.22.4
- pandas ==1.4.2
- python-dateutil ==2.8.2
- pytz ==2022.1
- six ==1.16.0