https://github.com/ausgerechnet/bratutils

A collection of utilities for manipulating data and calculating inter-annotator agreement in brat annotation files.

Repository

Basic Info
  • Host: GitHub
  • Owner: ausgerechnet
  • License: mit
  • Default Branch: master
  • Size: 76.2 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of jeanphilippegoldman/bratutils
Created about 5 years ago · Last pushed over 6 years ago

bratutils
=========
[![CircleCI](https://circleci.com/gh/savkov/bratutils.svg?style=svg&circle-token=9a7bdcb066c87c45017fe2214c71f2e2f9672c94)](https://circleci.com/gh/savkov/bratutils)
[![Maintainability](https://api.codeclimate.com/v1/badges/4c8fccbe0c29026c90bd/maintainability)](https://codeclimate.com/github/savkov/bratutils/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/4c8fccbe0c29026c90bd/test_coverage)](https://codeclimate.com/github/savkov/bratutils/test_coverage)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A collection of utilities for manipulating data and calculating inter-annotator 
agreement in brat annotation files.

### Installation

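Install the package with pip, or run `pip install .` from the source directory
to install a local checkout.

```bash
$ pip install bratutils   # released package
$ pip install .           # from the root of a source checkout
```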


### Agreement Definition

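Agreement in multi-token annotations is commonly evaluated using [f-score][fsc],
largely because the traditional chance-corrected measures, [Krippendorff's alpha][al]
and [Cohen's kappa][ka], require a count of units that neither annotator marked,
and that count is ill-defined for spans in free text. [Hripcsak and Rothschild][hripcsak]
show that kappa approaches the f-score as the number of such negative cases grows
large, which justifies the metric for very large populations, i.e. for unrestricted
text annotation.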
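
The convergence argument can be checked numerically. This sketch computes Cohen's kappa for two annotators from a 2x2 agreement table and compares it with the f-score on the same counts. The counts are made-up illustrative values: `a` is the number of units both annotators marked, `b` and `c` the units only one of them marked, and `d` the units neither marked. The helper names are hypothetical and not part of bratutils.

```python
def f_score(a: int, b: int, c: int) -> float:
    """Pairwise f-score (symmetric in the two annotators); needs no negative count."""
    return 2 * a / (2 * a + b + c)


def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table; d counts units neither annotator marked."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # chance agreement from each annotator's marginal positive/negative rates
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical counts: 100 shared spans, 10 and 12 spans marked by one annotator only
for d in (10, 1_000, 100_000, 10_000_000):
    print(d, round(cohens_kappa(100, 10, 12, d), 4), round(f_score(100, 10, 12), 4))
```

As `d` grows, kappa climbs toward the f-score (about 0.90 for these counts), so the f-score is a reasonable stand-in when the number of negatives is effectively unbounded.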

This library roughly follows the precision and recall definitions of the
[MUC-7 test scoring][muc]. The basic definitions, along with some additional
restrictions, are laid out below:

* `CORRECT` - when annotation tags and indices match completely
* `INCORRECT` - when annotation tags do not match, but the indices coincide
* `PARTIAL` - when the annotation tags are the same and both annotations share
the same end index but have different start indices
* `MISSING` - annotations existing only in the gold standard annotation set
* `SPURIOUS` - annotations existing only in the candidate annotation set
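
The category rules above can be sketched as a pairwise classifier plus a greedy pairing step. This is a simplified illustration under stated assumptions: `Span`, `classify` and `muc_counts` are hypothetical names, annotations are reduced to `(tag, start, end)` triples, and the first-found pairing does not reproduce the library's own rule for choosing among several partial matches.

```python
from typing import Dict, List, NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical stand-in for a brat text-bound annotation."""
    tag: str
    start: int
    end: int


def classify(gold: Span, cand: Span) -> Optional[str]:
    """Label one gold/candidate pair using the rules above, or None if unrelated."""
    if gold.start == cand.start and gold.end == cand.end:
        return "CORRECT" if gold.tag == cand.tag else "INCORRECT"
    if gold.tag == cand.tag and gold.end == cand.end:
        return "PARTIAL"  # same tag and end index, different start index
    return None


def muc_counts(gold: List[Span], cand: List[Span]) -> Dict[str, int]:
    """Greedily pair gold and candidate spans and tally the five categories."""
    counts = dict.fromkeys(["CORRECT", "INCORRECT", "PARTIAL", "MISSING", "SPURIOUS"], 0)
    unmatched = list(cand)
    for g in gold:
        best = None
        for c in unmatched:
            label = classify(g, c)
            if label in ("CORRECT", "INCORRECT"):
                best = (c, label)  # identical boundaries win outright
                break
            if label == "PARTIAL" and best is None:
                best = (c, label)  # keep the first partial match as a fallback
        if best is None:
            counts["MISSING"] += 1  # gold span with no counterpart
        else:
            unmatched.remove(best[0])
            counts[best[1]] += 1
    counts["SPURIOUS"] = len(unmatched)  # candidate spans never paired
    return counts


gold = [Span("NP", 0, 5), Span("VP", 6, 10), Span("NP", 11, 20), Span("PP", 21, 25)]
cand = [Span("NP", 0, 5), Span("NP", 6, 10), Span("NP", 14, 20), Span("ADJP", 30, 35)]
print(muc_counts(gold, cand))
# {'CORRECT': 1, 'INCORRECT': 1, 'PARTIAL': 1, 'MISSING': 1, 'SPURIOUS': 1}
```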

_Note_: the gold standard is the collection or document from which the
comparison is invoked, while the supplied parallel annotation is the
candidate set.
 
_*Disclaimer:*_ the current definition of the `PARTIAL` category accommodates
working with syntactic chunks. A different arrangement (e.g. picking the largest
contained tag as the partial match instead of the rightmost one) might be more
suitable for other tasks, such as some types of semantic annotation.


### Examples

Simple example:

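```python
from bratutils import agreement as a

# Load two annotation files for the same text, one per annotator
doc = a.Document('res/samples/A/data-sample-1.ann')
doc2 = a.Document('res/samples/B/data-sample-1.ann')

# Mark the first annotation set as the gold standard
doc.make_gold()
# Compare the second annotation set against the gold standard
statistics = doc2.compare_to_gold(doc)

print(statistics)
```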

Output:

```text
-------------------MUC-Table--------------------
------------------------------------------------
pos:135
act:134
cor:115
par:5
inc:4
mis:11
spu:10
------------------------------------------------
pre:0.858208955224
rec:0.851851851852
fsc:0.855018587361
------------------------------------------------
und:0.0814814814815
ovg:0.0746268656716
sub:0.0725806451613
------------------------------------------------
bor:119
ibo:15
------------------------------------------------
------------------------------------------------
```
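
The summary rows of the table can be recomputed from the raw counts. The formulas in this sketch were inferred by matching the printed numbers, not taken from the bratutils source: `pos` (possible) and `act` (actual) are the sizes of the gold and candidate sets, precision and recall use `cor` alone, and `und`, `ovg` and `sub` are the MUC undergeneration, overgeneration and substitution rates. Classic MUC scoring gives half credit to partial matches (`cor + 0.5 * par`); the sample output matches the variant without that credit. `muc_scores` is a hypothetical helper, and `bor` and `ibo` are left out because this README does not define them.

```python
def muc_scores(cor: int, par: int, inc: int, mis: int, spu: int) -> dict:
    """Recompute the MUC-table summary rows from the raw category counts."""
    pos = cor + par + inc + mis  # annotations in the gold standard
    act = cor + par + inc + spu  # annotations in the candidate set
    pre = cor / act              # no partial credit, matching the sample output
    rec = cor / pos
    fsc = 2 * pre * rec / (pre + rec)
    return {
        "pos": pos,
        "act": act,
        "pre": pre,
        "rec": rec,
        "fsc": fsc,
        "und": mis / pos,                        # undergeneration
        "ovg": spu / act,                        # overgeneration
        "sub": (inc + par) / (cor + par + inc),  # substitution
    }


scores = muc_scores(cor=115, par=5, inc=4, mis=11, spu=10)
for key, value in scores.items():
    print(f"{key}:{value}")
```

With the counts from the sample output, every recomputed value agrees with the table to the printed precision.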


[fsc]: 
[al]: 
[ka]: 
[hripcsak]: 
[muc]: 

Owner

  • Name: Philipp Heinrich
  • Login: ausgerechnet
  • Kind: user
  • Location: Erlangen
  • Company: @fau-klue
