https://github.com/acdh-oeaw/acdh-prodigy-utils

custom loaders for spaCy's prodigy

Basic Info
  • Host: GitHub
  • Owner: acdh-oeaw
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 6.65 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Archived
Topics
prodigy spacy
Created over 6 years ago · Last pushed almost 4 years ago

# prodigy_utils

A collection of custom loaders for Prodigy. Supported data sources:

* dsebaseapp
* transkribus
* sketch-engine
* django-rest-framework based APIs

# install

* clone the repo
* install the package in development mode (inside your virtual environment): `python setup.py develop`
* add the required API credentials to your `prodigy.json` config file, for example:
```json
"api_keys": {
    "ske_user": "someusername",
    "ske_pw": "somepassword",
    "transkribus_user": "someusername",
    "transkribus_pw": "somepassword"
}
```

Also install `lxml` and `requests`.
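
The loaders need these credentials at run time. A minimal sketch of how such keys could be read from a `prodigy.json`-style file with the standard library only; the helper name `read_api_keys` and its error handling are assumptions for illustration and not part of this package:

```python
import json
from pathlib import Path


def read_api_keys(config_path, required=("ske_user", "ske_pw")):
    """Return the "api_keys" mapping from a prodigy.json-style file.

    Hypothetical helper: raises KeyError if a required key is missing.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    api_keys = config.get("api_keys", {})
    missing = [key for key in required if key not in api_keys]
    if missing:
        raise KeyError(f"missing api_keys in {config_path}: {', '.join(missing)}")
    return api_keys
```

Checking for the required keys up front gives a readable error before any request to Sketch Engine or Transkribus is attempted.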

## example dsebaseapp

annotate TEI documents stored in a dsebaseapp instance

### create dataset

`python -m prodigy dataset asbw "ASBW-Retro for gold annotations"`

### Make NER-Gold-Data

`python -m prodigy ner.make-gold asbw de_core_news_sm https://asbw-retro.acdh-dev.oeaw.ac.at::asbw-retro::editions --loader from_dsebaseapp --label PER,ORG,LOC -U`
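
To make the data flow concrete: Prodigy consumes a stream of task dictionaries that carry at least a `"text"` key. The following is a rough sketch of turning one TEI document into such tasks, one per paragraph of the TEI body, using the standard library rather than `lxml`. The function name `tei_to_tasks`, the per-paragraph split and the `meta` layout are assumptions for illustration; the `from_dsebaseapp` loader in this package may work differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_tasks(tei_xml, doc_id):
    """Yield one Prodigy-style task per non-empty <p> in the TEI body."""
    root = ET.fromstring(tei_xml)
    paragraphs = root.iterfind(".//tei:body//tei:p", TEI_NS)
    for number, para in enumerate(paragraphs, start=1):
        # itertext() flattens inline markup such as <persName> or <hi>;
        # split()/join() then normalises whitespace and line breaks.
        text = " ".join("".join(para.itertext()).split())
        if text:
            yield {"text": text, "meta": {"doc": doc_id, "paragraph": number}}
```

Paragraphs in the `teiHeader` are skipped because the search is limited to the `body` element.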

## example django-rest-framework

`python -m prodigy ner.make-gold drf de_core_news_sm https://annotator.acdh-dev.oeaw.ac.at/api/nersampletodo/?format=json::text::50 --loader from_drf --label PER,ORG,LOC,MISC -U`
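
The source argument packs several values into one string, separated by `::`. The sketch below assumes the three parts are the API URL, the name of the JSON field holding the text, and a maximum number of records; that reading is inferred from the example command and is not confirmed by the package. It also assumes the usual django-rest-framework response shape, where paginated endpoints wrap the records in a `results` list. Both function names are hypothetical, and the code works on already decoded JSON so that it runs without network access.

```python
from itertools import islice


def parse_drf_source(source):
    """Split 'url::field::limit' into its parts (assumed meaning)."""
    # rsplit keeps the '://' of the URL scheme intact
    url, field, limit = source.rsplit("::", 2)
    return url, field, int(limit)


def drf_page_to_tasks(page, field):
    """Yield Prodigy-style tasks from one decoded DRF list response."""
    # Paginated endpoints return a dict with "results";
    # unpaginated ones return a bare list of records.
    records = page["results"] if isinstance(page, dict) else page
    for record in records:
        text = record.get(field)
        if text:
            yield {"text": text, "meta": {"id": record.get("id")}}


url, field, limit = parse_drf_source(
    "https://annotator.acdh-dev.oeaw.ac.at/api/nersampletodo/?format=json::text::50"
)
page = {"count": 1, "next": None, "results": [{"id": 1, "text": "Wien ist schön."}]}
tasks = list(islice(drf_page_to_tasks(page, field), limit))
```

In a real loader the `page` dictionary would come from an HTTP request to `url`, following the `next` link until `limit` records have been collected.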

## example transkribus

### Make NER-Gold-Data

`python -m prodigy ner.make-gold asbw de_core_news_sm 44688::181839 --loader from_transkribus --label PER,ORG,LOC,MISC -U`

### text classifier

#### make a dataset

`python -m prodigy dataset mpr_retro_ungarn_textcat "MPR-Ungarn for text classification"`

#### start prodigy

`python -m prodigy textcat.manual mpr_retro_ungarn_textcat de_core_news_sm 45410::187485 --loader from_transkribus_regions --label PB,P,REGEST,NOTE,MINUTEH,OTHER`
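
Text classification here works on layout regions rather than on whole pages. A sketch of how region-level tasks could be built from a single PAGE XML page as exported by Transkribus, assuming the 2013-07-15 PAGE namespace and that each `TextRegion` carries its own region-level `TextEquiv/Unicode` element. The function name `page_regions_to_tasks` and the `meta` layout are hypothetical; `from_transkribus_regions` may be implemented differently.

```python
import xml.etree.ElementTree as ET

PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}


def page_regions_to_tasks(page_xml):
    """Yield one Prodigy-style task per non-empty TextRegion."""
    root = ET.fromstring(page_xml)
    for region in root.iterfind(".//pc:TextRegion", PAGE_NS):
        # Only the TextEquiv that is a direct child of the region is used;
        # line-level TextEquiv elements sit inside <TextLine> and are ignored.
        unicode_el = region.find("pc:TextEquiv/pc:Unicode", PAGE_NS)
        text = (unicode_el.text or "").strip() if unicode_el is not None else ""
        if text:
            yield {"text": text, "meta": {"region": region.get("id")}}
```

The labels themselves are not part of the tasks in this sketch, since they are passed to the recipe through `--label`.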

## example sketch-engine


### text classifier

#### make a dataset

`python -m prodigy dataset ske-amc "AMC for text classification"`

#### start prodigy

`python -m prodigy textcat.manual ske-amc de_core_news_sm amc_3.1 --loader from_ske_docs --label SPORT,CHRONIK,SONST`


### NER

`python -m prodigy ner.make-gold ske-amc de_core_news_sm amc3_demo --loader from_ske_docs`


### stand-alones

The folder `example_prodigy_standalones` contains additional examples of Prodigy usage in which all configuration is done within the Python code itself. See the README in that subfolder for details.

Owner

  • Name: Austrian Centre for Digital Humanities & Cultural Heritage
  • Login: acdh-oeaw
  • Kind: organization
  • Email: acdh@oeaw.ac.at
  • Location: Vienna, Austria

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 27
  • Total Committers: 2
  • Avg Commits per committer: 13.5
  • Development Distribution Score (DDS): 0.185
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Peter Andorfer P****r@o****t 22
steff-vm s****h@o****t 5
Dependencies

setup.py pypi
  • lxml >=4.6.1