spacy-wrap

spaCy-wrap is a wrapper library for including fine-tuned transformers from Huggingface in your spaCy pipeline, allowing you to use existing fine-tuned models within your spaCy workflow.

https://github.com/kennethenevoldsen/spacy-wrap

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.8%) to scientific vocabulary

Keywords

deep-learning huggingface huggingface-transformers language-model machine-learning natural-language-processing nlp pytorch spacy spacy-extension spacy-extensions spacy-models spacy-nlp spacy-pipeline spacy-transformers text-classification transformers

Keywords from Contributors

augmentation nlproc text-augmentation training-data dependency-distance descriptive-statistics readability readability-scores syntactic-analysis data-profilers
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 46
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 7
Created about 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License Citation

readme.md

spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines

spaCy-wrap is a minimal library for wrapping fine-tuned transformers from the Huggingface model hub in your spaCy pipeline, allowing you to include existing models within spaCy workflows.

As far as possible, it follows an API similar to that of spacy-transformers.

NOTE: Since the release of spaCy-wrap, Explosion has released spacy-huggingface-pipelines, which wraps the Huggingface pipeline rather than the transformer itself. As a result, token aggregation and conversion into spans happen inside the Huggingface pipeline, whereas in spaCy-wrap they happen on the model's logits, which can sometimes lead to unfortunate differences in results. I generally recommend spacy-huggingface-pipelines for most use cases, but if you need to use the transformer output more directly, spaCy-wrap can still be useful.
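The sketch below shows how aggregation choices can change a result. It uses one word split into two wordpieces and compares two hypothetical strategies over the same per-wordpiece probabilities. The first strategy takes the label of the first wordpiece. The second averages the probabilities before picking a label. The labels, numbers, and function names are made up for illustration. They do not describe the exact strategy used by spaCy-wrap or by the Huggingface pipeline.

```python
from typing import List

# Hypothetical per-wordpiece probabilities for one word split into two
# wordpieces (e.g. "Wolf" + "##gang"); labels and numbers are illustrative.
LABELS = ["O", "B-PER"]
wordpiece_probs: List[List[float]] = [
    [0.6, 0.4],  # first wordpiece leans towards "O"
    [0.1, 0.9],  # second wordpiece strongly favours "B-PER"
]


def argmax_label(probs: List[float], labels: List[str]) -> str:
    """Return the label with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


def first_strategy(pieces: List[List[float]], labels: List[str]) -> str:
    """Label the word using only its first wordpiece."""
    return argmax_label(pieces[0], labels)


def average_strategy(pieces: List[List[float]], labels: List[str]) -> str:
    """Average the wordpiece probabilities, then pick the best label."""
    n = len(pieces)
    averaged = [sum(p[i] for p in pieces) / n for i in range(len(labels))]
    return argmax_label(averaged, labels)


print("first:  ", first_strategy(wordpiece_probs, LABELS))    # O
print("average:", average_strategy(wordpiece_probs, LABELS))  # B-PER
```

The word is labelled `O` under one rule and `B-PER` under the other, even though the model output is the same.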

Installation

Installing spacy-wrap is simple using pip:

```
pip install spacy_wrap
```

Examples

The following examples show how you can quickly add a fine-tuned transformer model from the Huggingface model hub for text classification, token classification, or named entity recognition.

Sequence Classification

In this example, we will use a model fine-tuned for sentiment classification on SST2. This model classifies whether a text is positive or negative. We will add this model to a blank English pipeline:

```python
import spacy
import spacy_wrap  # makes the spacy-wrap components available to spaCy

nlp = spacy.blank("en")

config = {
    "doc_extension_trf_data": "clf_trf_data",  # document extension for the forward pass
    "doc_extension_prediction": "sentiment",  # document extension for the prediction
    "model": {
        # the model name or path of huggingface model
        "name": "distilbert-base-uncased-finetuned-sst-2-english",
    },
}

transformer = nlp.add_pipe("sequence_classification_transformer", config=config)

doc = nlp("spaCy is a wonderful tool")

print(doc.cats)
# {'NEGATIVE': 0.001, 'POSITIVE': 0.999}

print(doc._.sentiment)
# 'POSITIVE'

print(doc._.clf_trf_data)
# TransformerData(wordpieces=...
```

These pipelines can also easily be applied to multiple documents using `nlp.pipe`, as one would expect from a spaCy component:

```python
docs = nlp.pipe(
    [
        "I hate wrapping my own models",
        "Isn't there a tool for this?!",
        "spacy-wrap is great for wrapping models",
    ]
)

for doc in docs:
    print(doc._.sentiment)
# 'NEGATIVE'
# 'NEGATIVE'
# 'POSITIVE'
```
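`nlp.pipe` streams texts lazily and processes them in batches. spaCy exposes a `batch_size` argument for this. The dependency-free sketch below illustrates that pattern. `minibatch` is a hypothetical helper, and `fake_sentiment` is a toy keyword rule standing in for the model. In a real pipeline, each batch would go through one forward pass of the wrapped transformer.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def minibatch(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def pipe(
    texts: Iterable[str],
    predict_batch: Callable[[List[str]], List[str]],
    batch_size: int = 2,
) -> Iterator[str]:
    """Lazily apply `predict_batch` to texts, one batch at a time."""
    for batch in minibatch(texts, batch_size):
        # One call per batch, mirroring one forward pass per batch of docs.
        yield from predict_batch(batch)


def fake_sentiment(batch: List[str]) -> List[str]:
    """Stand-in classifier: a toy keyword rule, not a real model."""
    return ["POSITIVE" if "great" in text else "NEGATIVE" for text in batch]


texts = [
    "I hate wrapping my own models",
    "Isn't there a tool for this?!",
    "spacy-wrap is great for wrapping models",
]
for label in pipe(texts, fake_sentiment):
    print(label)
```

Because `pipe` is a generator, nothing is computed until the results are iterated. This matches how `nlp.pipe` can handle long streams of text without loading them all at once.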


More Examples

It is always nice to have more than one example. Here is another one, where we add a Danish hate speech model to a blank Danish pipeline:

```python
import spacy
import spacy_wrap

nlp = spacy.blank("da")

config = {
    "doc_extension_trf_data": "clf_trf_data",  # document extension for the forward pass
    "doc_extension_prediction": "hate_speech",  # document extension for the prediction
    # choose custom labels
    "labels": ["Not hate Speech", "Hate speech"],
    "model": {
        "name": "DaNLP/da-bert-hatespeech-detection",  # the model name or path of huggingface model
    },
}

transformer = nlp.add_pipe("classification_transformer", config=config)

doc = nlp("Senile gamle idiot")  # old senile idiot

doc._.clf_trf_data
# TransformerData(wordpieces=...
doc._.hate_speech
# "Hate speech"
doc._.hate_speech_prob
# {'prob': array([0.013, 0.987], dtype=float32), 'labels': ['Not hate Speech', 'Hate speech']}
```
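The probabilities in `doc.cats` and in `doc._.hate_speech_prob` sum to one, which suggests a softmax over the model's logits. The sketch below assumes that softmax. It shows how raw logits and a label list could become three outputs: a `cats`-style dictionary, a probability record, and a single predicted label. The label list can be the model's own or the custom `labels` from the config. The helper `logits_to_predictions`, the generic model labels, and the logit values are all hypothetical. This is not spaCy-wrap's source code.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_predictions(
    logits: List[float],
    model_labels: List[str],
    custom_labels: Optional[List[str]] = None,
) -> Tuple[Dict[str, float], Dict[str, list], str]:
    """Map logits to (cats, probability record, predicted label).

    Hypothetical helper: custom labels, if given, replace the model's labels
    position by position, so they must have the same length as the logits.
    """
    labels = custom_labels if custom_labels is not None else model_labels
    if len(labels) != len(logits):
        raise ValueError("number of labels must match number of logits")
    probs = softmax(logits)
    cats = dict(zip(labels, probs))
    prob_record = {"prob": probs, "labels": labels}
    predicted = max(cats, key=cats.get)
    return cats, prob_record, predicted


# Made-up logits for a binary classifier whose model labels are generic.
cats, record, label = logits_to_predictions(
    [-2.0, 2.2], ["LABEL_0", "LABEL_1"], ["Not hate Speech", "Hate speech"]
)
print(label)  # Hate speech
print(record["labels"])
```

Keeping the label list separate from the model makes it cheap to swap in human-readable names, as the `labels` config key does in the example above.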


Token Classification

spacy-wrap can also be used for token classification, here with a model fine-tuned for part-of-speech tagging:

```python
import spacy
import spacy_wrap

nlp = spacy.blank("en")

config = {
    "model": {"name": "vblagoje/bert-english-uncased-finetuned-pos"},
    # "predictions_to": ["pos"]  # optional, can be "pos", "tag" or "ents"
}

nlp.add_pipe("token_classification_transformer", config=config)

text = "My name is Wolfgang and I live in Berlin"

doc = nlp(text)
print(doc._.tok_clf_predictions)
# ['PRON', 'NOUN', 'AUX', 'PROPN', 'CCONJ', 'PRON', 'VERB', 'ADP', 'PROPN']
```

By default, spacy-wrap will also automatically detect whether the labels follow the Universal POS tags. If so, it will assign them to `token.pos`, similar to regular spaCy pipelines:

```python
print(doc[0].pos_)
# 'PRON'
```
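Here is one way the label-based detection described above could work. This is a minimal sketch under stated assumptions, not spaCy-wrap's actual logic. It treats a label set as Universal POS tags when every label is one of the 17 UPOS tags. It treats the set as a named entity scheme when every label is `O` or starts with `B-` or `I-`. The helper name `infer_predictions_to` is hypothetical. Its return values mirror the `"pos"` and `"ents"` options of the `predictions_to` config key, and returning an empty list for unknown schemes is also an assumption.

```python
from typing import Iterable, List

# The 17 Universal Dependencies part-of-speech tags.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def is_upos(labels: Iterable[str]) -> bool:
    """True if every label is a Universal POS tag."""
    return all(label in UPOS_TAGS for label in labels)


def is_iob(labels: Iterable[str]) -> bool:
    """True if every label is "O" or has a "B-"/"I-" prefix."""
    return all(label == "O" or label[:2] in ("B-", "I-") for label in labels)


def infer_predictions_to(labels: List[str]) -> List[str]:
    """Guess where token predictions should be written (hypothetical)."""
    if labels and is_iob(labels):
        return ["ents"]
    if labels and is_upos(labels):
        return ["pos"]
    return []  # unknown scheme: keep predictions in the doc extension only


print(infer_predictions_to(["PRON", "NOUN", "AUX", "PROPN"]))  # ['pos']
print(infer_predictions_to(["O", "B-PER", "I-PER", "B-LOC"]))  # ['ents']
print(infer_predictions_to(["NN", "VBZ"]))                     # []
```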

Named Entity Recognition

In this example, we use a model fine-tuned for named entity recognition. Here, spacy-wrap infers from the IOB tags that the model is intended for named entity recognition and assigns the predictions to `doc.ents`.

```python
import spacy
import spacy_wrap

nlp = spacy.blank("en")

# specify model from the hub
config = {
    "model": {"name": "dslim/bert-base-NER"},
    # forced to be named entity recognition; if left out, it will be estimated from the labels
    "predictions_to": ["ents"],
}

# add it to the pipe
nlp.add_pipe("token_classification_transformer", config=config)

doc = nlp("My name is Wolfgang and I live in Berlin.")

print(doc.ents)
# (Wolfgang, Berlin)
```
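Populating `doc.ents` requires turning IOB tags into entity spans, which follows a standard decoding rule. A `B-` tag opens a span, consecutive `I-` tags of the same type extend it, and anything else closes it. The sketch below implements that rule over plain tag lists. The tags are hand-written for illustration, not actual `dslim/bert-base-NER` output. The function name `iob_to_spans` is hypothetical. Treating a stray `I-` tag as the start of a new span is an assumed, lenient design choice.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode IOB tags into (start, end, label) token spans.

    An "I-" tag that does not continue a span of the same type is treated
    as the start of a new span.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and start is not None and label == ent_type:
            continue  # extend the currently open span
        # Close any open span before handling the current tag.
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if prefix in ("B", "I"):
            start, label = i, ent_type
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = "My name is Wolfgang and I live in Berlin .".split()
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC", "O"]
spans = iob_to_spans(tags)
print([(" ".join(tokens[s:e]), lab) for s, e, lab in spans])
# [('Wolfgang', 'PER'), ('Berlin', 'LOC')]
```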

Documentation

| Documentation              | Description                                 |
| -------------------------- | ------------------------------------------- |
| Installation               | Installation instructions for spacy-wrap.   |
| News and changelog         | New additions, changes and version history. |
| Documentation              | The reference for spacy-wrap's API.         |

Where to ask questions

| Type                           | Resource               |
| ------------------------------ | ---------------------- |
| FAQ                            | FAQ                    |
| Bug Reports                    | GitHub Issue Tracker   |
| Feature Requests & Ideas       | GitHub Issue Tracker   |
| Usage Questions                | GitHub Discussions     |
| General Discussion             | GitHub Discussions     |

Owner

  • Name: Kenneth Enevoldsen
  • Login: KennethEnevoldsen
  • Kind: user
  • Location: Aarhus
  • Company: Center for Humanities Computing Aarhus

Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 261
  • Total Committers: 8
  • Avg Commits per committer: 32.625
  • Development Distribution Score (DDS): 0.709
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Kenneth Enevoldsen k****n@g****m 76
pre-commit-ci[bot] 6****] 69
KennethEnevoldsen k****n@g****m 66
dependabot[bot] 4****] 41
github-actions g****s@g****m 5
github-actions a****n@g****m 2
Will Frey j****9@g****m 1
Adriane Boyd a****d@g****m 1
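The summary figures above follow from simple arithmetic on the commit counts. The page does not define DDS. Here it is assumed to be one minus the top committer's share of all commits, a definition that reproduces the reported value. With 261 total commits, 8 committers, and 76 commits for the top entry:

```latex
\text{Avg commits per committer} = \frac{261}{8} = 32.625,
\qquad
\mathrm{DDS} = 1 - \frac{c_{\text{top}}}{c_{\text{total}}} = 1 - \frac{76}{261} \approx 0.709
```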

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 8
  • Total pull requests: 180
  • Average time to close issues: 2 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 6
  • Total pull request authors: 5
  • Average comments per issue: 3.38
  • Average comments per pull request: 0.75
  • Merged pull requests: 122
  • Bot issues: 0
  • Bot pull requests: 161
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • KennethEnevoldsen (2)
  • Dris101 (2)
  • nthomsencph (1)
  • MWMWM (1)
  • espoirMur (1)
  • mazuzic-conga (1)
Pull Request Authors
  • dependabot[bot] (92)
  • pre-commit-ci[bot] (73)
  • KennethEnevoldsen (17)
  • willfrey (1)
  • nickprock (1)
Top Labels
Issue Labels
enhancement (4) bug (4) Stale (1) good first issue (1)
Pull Request Labels
dependencies (92) github_actions (48) python (14)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 633 last-month
  • Total dependent packages: 3
  • Total dependent repositories: 1
  • Total versions: 21
  • Total maintainers: 1
pypi.org: spacy-wrap

Wrappers for including pre-trained transformers in spaCy pipelines

  • Documentation: https://spacy-wrap.readthedocs.io/
  • License: MIT License (Copyright (c) 2021 Kenneth Enevoldsen)
  • Latest release: 1.4.5
    published over 2 years ago
  • Versions: 21
  • Dependent Packages: 3
  • Dependent Repositories: 1
  • Downloads: 633 Last month
Rankings
Dependent packages count: 2.3%
Downloads: 10.3%
Average: 11.4%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/dependabot_automerge.yml actions
  • hmarr/auto-approve-action v3.1.0 composite
.github/workflows/documentation.yml actions
  • actions/checkout v3 composite
  • ad-m/github-push-action v0.6.0 composite
  • sphinx-notes/pages 2.1 composite
.github/workflows/release.yml actions
  • actions/checkout v3 composite
  • relekang/python-semantic-release v7.33.1 composite
.github/workflows/stale.yml actions
  • actions/stale v6 composite
.github/workflows/tests.yml actions
  • MishaKav/pytest-coverage-comment v1.1.43 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
pyproject.toml pypi
  • spacy >=3.2.1,<3.6.0
  • spacy_transformers >=1.2.1,<1.3.0
  • thinc >=8.0.13,<8.2.0