pie-extended

Extension for pie to include taggers with their models and pre/postprocessors

https://github.com/hipster-philology/nlp-pie-taggers

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: hipster-philology
  • License: mpl-2.0
  • Language: Python
  • Default Branch: master
  • Size: 257 KB
Statistics
  • Stars: 10
  • Watchers: 5
  • Forks: 3
  • Open Issues: 11
  • Releases: 4
Created over 6 years ago · Last pushed about 2 years ago
Metadata Files
Readme Changelog License Citation

README.md

Pie Extended

Warning: This software is only compatible with up to Python 3.7 for the moment.
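A script that depends on pie-extended can check the interpreter version up front. The sketch below assumes that "up to Python 3.7" means any 3.x release with a minor version of 7 or lower; the README does not state a lower bound, and the helper name is_supported is purely illustrative.

```python
import sys
from typing import Tuple

# Upper bound taken from the warning above; no lower bound is documented.
MAX_SUPPORTED = (3, 7)


def is_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if the (major, minor) version is at most MAX_SUPPORTED."""
    return tuple(version) <= MAX_SUPPORTED


if not is_supported():
    print("Warning: pie-extended is documented as compatible with Python <= 3.7 only.")
```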

Extension for pie to include taggers with their models and pre/postprocessors.

Pie is a wonderful tool for training models, and most of the time it will be enough. What pie_extended adds is the tooling you need to share your models together with customized pre- and post-processing.

The current system provides easier access to adding customized:

  • normalization of your text,
  • sentence tokenization,
  • word tokenization,
  • disambiguation,
  • output formatting.
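To make these stages concrete, here is a toy, standard-library-only sketch of how normalization, sentence tokenization, word tokenization and output formatting can chain together. It is not pie_extended's implementation: every function name is hypothetical, the rules are deliberately naive, and the tagging and disambiguation steps are left out.

```python
import re
import unicodedata
from typing import Dict, List


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", unicodedata.normalize("NFC", text)).strip()


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def split_words(sentence: str) -> List[str]:
    """Return words and single punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def format_output(tokens: List[str]) -> List[Dict[str, str]]:
    """Wrap each token in a dictionary, keeping a lowercased 'treated' form."""
    return [{"form": tok, "treated": tok.lower()} for tok in tokens]


text = "Lorem  ipsum dolor.\nSit amet!"
result = [format_output(split_words(s)) for s in split_sentences(normalize(text))]
print(result)
```

Each stage is a plain function, which is the point of the sketch: a shared model can swap any one of them for a language-specific version without touching the others.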

Cite as

```
@software{thibault_clerice_2020_3883590,
  author    = {Clérice, Thibault},
  title     = {Pie Extended, an extension for Pie with pre-processing and post-processing},
  month     = jun,
  year      = 2020,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.3883589},
  url       = {https://doi.org/10.5281/zenodo.3883589}
}
```

Current supported languages

  • Classical Latin (Model: lasla)
  • Ancient Greek (Model: grc)
  • Old French (Model: fro)
  • Early Modern French (Model: freem)
  • Classical French (Model: fr)
  • Old Dutch (Model: dum)

If you trained models and want some help sharing them with Pie Extended, open an issue :)

Install

To install, run pip install pie-extended. Then have a look at the available models.

WARNING: if you don't have a GPU or CUDA

If in doubt, run pip install pie-extended --extra-index-url https://download.pytorch.org/whl/cpu so that pip can pick up the CPU-only PyTorch wheels.
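If you are unsure whether CUDA is usable on your machine, a small check along the following lines can help. It is a sketch: the helper name pick_device is hypothetical, and it relies only on torch.cuda.is_available() from PyTorch, falling back to "cpu" when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # installed as a dependency of the tagger
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string can be passed as the device argument when a tagger is created from Python.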

Run on terminal

On top of that, it provides a quick and easy way to use other people's models. For example, in a shell:

```bash
pie-extended download lasla
pie-extended install-addons lasla
pie-extended tag lasla your_file.txt
```

will give you access to all you need!
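The same three commands can also be driven from a Python script. The sketch below only uses the sub-commands shown above (download, install-addons, tag) and the standard subprocess module; the helper names build_commands and run_pipeline are hypothetical.

```python
import subprocess
from typing import List


def build_commands(model: str, path: str) -> List[List[str]]:
    """Build the download, add-on installation and tagging commands, in order."""
    return [
        ["pie-extended", "download", model],
        ["pie-extended", "install-addons", model],
        ["pie-extended", "tag", model, path],
    ]


def run_pipeline(model: str, path: str) -> None:
    """Run each command in turn and stop at the first failure."""
    for command in build_commands(model, path):
        subprocess.run(command, check=True)


# Example call (requires pie-extended to be installed and on the PATH):
# run_pipeline("lasla", "your_file.txt")
```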

Python API

You can run the lemmatizer in your own scripts and retrieve token annotations as dictionaries:

```python
from typing import List
from pie_extended.cli.utils import get_tagger, get_model, download

# In case you need to download
do_download = False
if do_download:
    for dl in download("lasla"):
        pass  # consume the iterator returned by download()

# model_path allows you to override the model loaded by another .tar
model_name = "lasla"
tagger = get_tagger(model_name, batch_size=256, device="cpu", model_path=None)

sentences: List[str] = ["Lorem ipsum dolor sit amet, consectetur adipiscing elit. "]

# Get the main objects from the model: data iterator + postprocessor
from pie_extended.models.lasla.imports import get_iterator_and_processor
for sentence_group in sentences:
    iterator, processor = get_iterator_and_processor()
    print(tagger.tag_str(sentence_group, iterator=iterator, processor=processor))
```

will result in

```python
[{'form': 'lorem', 'lemma': 'lor', 'POS': 'NOMcom', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'lorem'},
 {'form': 'ipsum', 'lemma': 'ipse', 'POS': 'PROdem', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'ipsum'},
 {'form': 'dolor', 'lemma': 'dolor', 'POS': 'NOMcom', 'morph': 'Case=Nom|Numb=Sing', 'treated': 'dolor'},
 {'form': 'sit', 'lemma': 'sum1', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3', 'treated': 'sit'},
 {'form': 'amet', 'lemma': 'amo', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Act|Person=3', 'treated': 'amet'},
 {'form': ',', 'lemma': ',', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': ','},
 {'form': 'consectetur', 'lemma': 'consector2', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Sub|Tense=Pres|Voice=Dep|Person=3', 'treated': 'consectetur'},
 {'form': 'adipiscing', 'lemma': 'adipiscor', 'POS': 'VER', 'morph': 'Tense=Pres|Voice=Dep', 'treated': 'adipiscing'},
 {'form': 'elit', 'lemma': 'elio', 'POS': 'VER', 'morph': 'Numb=Sing|Mood=Ind|Tense=Pres|Voice=Act|Person=3', 'treated': 'elit'},
 {'form': '.', 'lemma': '.', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': '.'}]
```
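Because the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below turns such a list into tab-separated text. The helper names token_pos and to_tsv are hypothetical; the only assumption about the data is what the sample output shows, namely that word tokens carry a 'POS' key while punctuation tokens carry a lowercase 'pos' key.

```python
from typing import Dict, List


def token_pos(token: Dict[str, str]) -> str:
    """Return the part-of-speech tag, whichever key casing the token uses."""
    return token.get("POS", token.get("pos", ""))


def to_tsv(tokens: List[Dict[str, str]]) -> str:
    """Serialize annotated tokens as tab-separated lines with a header row."""
    header = "form\tlemma\tPOS\tmorph"
    rows = [
        "\t".join([tok["form"], tok["lemma"], token_pos(tok), tok["morph"]])
        for tok in tokens
    ]
    return "\n".join([header] + rows)


# Two tokens copied from the output above
sample = [
    {'form': 'lorem', 'lemma': 'lor', 'POS': 'NOMcom', 'morph': 'Case=Acc|Numb=Sing', 'treated': 'lorem'},
    {'form': ',', 'lemma': ',', 'pos': 'PUNC', 'morph': 'MORPH=empty', 'treated': ','},
]
print(to_tsv(sample))
```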

Add a model

  • Create a package in ./pie_extended/models/. Example: foo.
  • Add the name of the package to the variable modules in ./pie_extended/models/__init__.py.
  • The module pie_extended.models.foo should define the following variables:
    • Models: a string with filenames and tasks for Pie.
    • DESC: a Metadata object that carries information about the model.
    • DOWNLOADS: a list of files to download.

```python
from pie_extended.utils import Metadata, File, get_path

DESC = Metadata(
    "Foo",
    "language",
    ["Author 1", "Author 2"],
    "A readable description",
    "A link to more information"
)

DOWNLOADS = [
    File("/a/link/to/a/file", "local_name_of_the_file.tar")
]

# {0} is reused so that both task groups point to the same archive;
# pass a second path to format() if they live in separate files.
Models = "<{0},task1,task2><{0},lemma,pos>".format(
    get_path("foo", "local_name_of_the_file.tar")
)
```

  • The module pie_extended.models.foo.imports should provide the following:
    1. get_iterator_and_processor: a function that returns a DataIterator and a Processor
    2. (optionally) addons: a function that installs add-ons
    3. (optionally) Disambiguator: a disambiguator instance (or an object creator that returns one)

See pie_extended.models.fro.imports for a simple example and pie_extended.models.lasla.imports for a more complex one.
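Before opening a pull request, it can help to verify that a new model package exposes the names listed above. The following is a minimal sketch using only the standard library. The helper missing_attributes is hypothetical, the attribute names are written with underscores on the assumption that this is how the package spells them, and SimpleNamespace objects stand in for real modules so that the example runs without pie_extended installed.

```python
from types import SimpleNamespace
from typing import Iterable, List

# Names the steps above expect in pie_extended.models.<name>
REQUIRED_MODEL_ATTRS = ("Models", "DESC", "DOWNLOADS")
# Names expected in pie_extended.models.<name>.imports
REQUIRED_IMPORTS_ATTRS = ("get_iterator_and_processor",)
OPTIONAL_IMPORTS_ATTRS = ("addons", "Disambiguator")


def missing_attributes(module: object, required: Iterable[str]) -> List[str]:
    """Return the required names that the module does not define."""
    return [name for name in required if not hasattr(module, name)]


# Stand-ins for real modules
complete = SimpleNamespace(Models="<foo.tar,lemma,pos>", DESC=None, DOWNLOADS=[])
incomplete = SimpleNamespace(Models="<foo.tar,lemma,pos>")

print(missing_attributes(complete, REQUIRED_MODEL_ATTRS))    # []
print(missing_attributes(incomplete, REQUIRED_MODEL_ATTRS))  # ['DESC', 'DOWNLOADS']
```

With the package installed, the stand-ins could be replaced by the result of importlib.import_module("pie_extended.models.foo").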

Install development version (⚠ for development only)

Clone the repository, create an environment, and then run:

```bash
python setup.py develop
```

Warning

This is an extremely early build, subject to change here and there. But it is functional!

Owner

  • Name: hipster-philology
  • Login: hipster-philology
  • Kind: organization

Citation (CITATION.CFF)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Clérice
    given-names: Thibault
    orcid: https://orcid.org/0000-0003-1852-9204
title: "Pie Extended, an extension for Pie with pre-processing and post-processing"
doi: 10.5281/zenodo.3883589
version: 0.0.39
date-released: 2021-06-04

GitHub Events

Total
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Pull request event: 1
  • Fork event: 1

Committers

Last synced: over 3 years ago

All Time
  • Total Commits: 139
  • Total Committers: 4
  • Avg Commits per committer: 34.75
  • Development Distribution Score (DDS): 0.094
Top Committers
| Name | Email | Commits |
|---|---|---|
| Thibault Clérice | l****e@g****m | 126 |
| Thibault Clérice | 1****e@u****m | 8 |
| Jean-Baptiste Camps | j****s@h****m | 4 |
| Simon Gabay | g****n@g****m | 1 |
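The summary figures above can be reproduced from the per-committer counts. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page, but it matches the reported value. The dictionary keys are illustrative labels only, since the same name appears under two email addresses.

```python
# Commit counts copied from the Top Committers table
commits = {
    "Thibault Clérice (first address)": 126,
    "Thibault Clérice (second address)": 8,
    "Jean-Baptiste Camps": 4,
    "Simon Gabay": 1,
}

total = sum(commits.values())            # total commits
average = total / len(commits)           # average commits per committer
dds = 1 - max(commits.values()) / total  # assumed DDS definition

print(total, average, round(dds, 3))  # 139 34.75 0.094
```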

Issues and Pull Requests

Last synced: almost 2 years ago

All Time
  • Total issues: 23
  • Total pull requests: 21
  • Average time to close issues: 4 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 6
  • Total pull request authors: 3
  • Average comments per issue: 2.17
  • Average comments per pull request: 0.52
  • Merged pull requests: 20
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • PonteIneptique (13)
  • Jean-Baptiste-Camps (4)
  • glorieux-f (2)
  • emanjavacas (2)
  • alexbartz (1)
  • Lucaterre (1)
  • Ctaaffe (1)
Pull Request Authors
  • PonteIneptique (17)
  • Jean-Baptiste-Camps (4)
  • gabays (1)
Top Labels
Issue Labels
  • bug (4)
  • enhancement (2)
  • good first issue (2)
  • model:fr (2)
  • model:lasla (1)
  • documentation (1)
  • technical-debt (1)
  • model:freem (1)
Pull Request Labels
  • model:grc (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 79 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 3
  • Total versions: 45
  • Total maintainers: 1
pypi.org: pie-extended

Extension for nlp-pie package

  • Versions: 45
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 79 Last month
Rankings
Dependent repos count: 9.0%
Dependent packages count: 10.1%
Average: 15.8%
Stargazers count: 17.7%
Forks count: 19.1%
Downloads: 23.1%
Maintainers (1)
Last synced: 10 months ago

Dependencies

requirements.txt pypi
  • PaPie ==0.3.9
  • autodisambiguator >=0.0.1,<1.0.0
  • click <8.0,>=7.0
  • colorama >=0.4.4
  • numpy <1.18.0
  • regex *
  • requests >=2.25.0
  • scipy <1.6.0
  • unidecode >=1.1.1
.github/workflows/test.yml actions
  • AndreMiras/coveralls-python-action develop composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
setup.py pypi