https://github.com/explosion/spacy-transformers

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (18.6%) to scientific vocabulary

Keywords

bert google gpt-2 huggingface language-model machine-learning natural-language-processing natural-language-understanding nlp openai pytorch pytorch-model spacy spacy-extension spacy-pipeline transfer-learning xlnet

Keywords from Contributors

tokenization cython entity-linking named-entity-recognition text-classification jax machine-learning-library mxnet transformer type-checking

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: explosion
License: mit
Language: Python
Default Branch: master
Homepage: https://spacy.io/usage/embeddings-transformers
Size: 1.15 MB

Statistics

Stars: 1,393
Watchers: 30
Forks: 172
Open Issues: 1
Releases: 46

Created over 6 years ago · Last pushed 9 months ago

README.md

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

This package provides spaCy components and architectures to use transformer models via Hugging Face's transformers in spaCy. The result is convenient access to state-of-the-art transformer architectures such as BERT, GPT-2 and XLNet.

This release requires spaCy v3. For the previous version of this library, see the v0.6.x branch.

Features

Use pretrained transformer models like BERT, RoBERTa and XLNet to power your spaCy pipeline.
Easy multi-task learning: backprop to one transformer model from several pipeline components.
Train using spaCy v3's powerful and extensible config system.
Automatic alignment of transformer output to spaCy's tokenization.
Easily customize what transformer data is saved in the Doc object.
Easily customize how long documents are processed.
Out-of-the-box serialization and model packaging.
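
As one illustration of the long-document feature in the list above: the package registers span getters (for example `spacy-transformers.strided_spans.v1`, which takes `window` and `stride` settings) that cut a Doc into overlapping slices before they reach the transformer. The sketch below is not the library's implementation. It is a minimal pure-Python version of the idea, working on token counts only, and the function name `strided_windows` is made up for this example.

```python
from typing import List, Tuple


def strided_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets of overlapping windows over a document.

    Hypothetical helper for illustration only. Each window covers at most
    `window` tokens and starts `stride` tokens after the previous one.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        # A choice made for this sketch: a stride larger than the window
        # would leave tokens uncovered, so it is rejected here.
        raise ValueError("stride must not exceed window")

    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            # The last window reached the end of the document.
            break
        start += stride
    return spans


if __name__ == "__main__":
    print(strided_windows(10, window=4, stride=3))
```

With `window=4` and `stride=3`, a 10-token document is covered by the slices (0, 4), (3, 7) and (6, 10), so neighbouring windows share one token of context.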

🚀 Installation

Installing the package from pip will automatically install all dependencies, including PyTorch and spaCy. Make sure you install this package before you install the models. Also note that this package requires Python 3.6+, PyTorch v1.5+ and spaCy v3.0+.

pip install 'spacy[transformers]'

For GPU installation, find your CUDA version using nvcc --version and add the version in brackets, e.g. spacy[transformers,cuda92] for CUDA 9.2 or spacy[transformers,cuda100] for CUDA 10.0.
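
The naming pattern for the CUDA extras can be sketched in a few lines of Python. This is an illustrative helper, not part of spaCy or spacy-transformers: the function names are hypothetical, the parser assumes that `nvcc --version` output contains a phrase of the form `release X.Y`, and not every CUDA version necessarily has a matching extra in a given spaCy release.

```python
import re
from typing import Optional


def cuda_extra(version: str) -> str:
    """Turn a CUDA version such as '9.2' into an extra name such as 'cuda92'."""
    match = re.fullmatch(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"expected a version like '10.0', got {version!r}")
    return "cuda" + match.group(1) + match.group(2)


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Pull 'X.Y' out of nvcc output (assumes a 'release X.Y' phrase is present)."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def pip_requirement(cuda_version: Optional[str] = None) -> str:
    """Build the requirement string to pass to `pip install`."""
    extras = ["transformers"]
    if cuda_version is not None:
        extras.append(cuda_extra(cuda_version))
    return "spacy[" + ",".join(extras) + "]"


if __name__ == "__main__":
    print(pip_requirement())        # CPU-only install
    print(pip_requirement("9.2"))   # CUDA 9.2
    print(pip_requirement("10.0"))  # CUDA 10.0
```

The resulting string is what goes inside the quotes of the `pip install` command.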

If you are having trouble installing PyTorch, follow the instructions on the official website for your specific operating system and requirements.

📖 Documentation

⚠️ Important note: This package has been extensively refactored to take advantage of spaCy v3.0. Previous versions that were built for spaCy v2.x worked considerably differently. Please see previous tagged versions of this README for documentation on prior versions.

📘 Embeddings, Transformers and Transfer Learning: How to use transformers in spaCy
📘 Training Pipelines and Models: Train and update components on your own data and integrate custom models
📘 Layers and Model Architectures: Power spaCy components with custom neural networks
📗 Transformer: Pipeline component API reference
📗 Transformer architectures: Architectures and registered functions

Applying pretrained text and token classification models

Note that the transformer component from spacy-transformers does not support task-specific heads like token or text classification. A task-specific transformer model can be used as a source of features to train spaCy components like ner or textcat, but the transformer component does not provide access to task-specific heads for training or inference.

Alternatively, if you only want to use the predictions from an existing Hugging Face text or token classification model, you can use the wrappers from spacy-huggingface-pipelines to incorporate task-specific transformer models into your spaCy pipelines.
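
For the first approach, where the transformer only supplies features to a spaCy component such as ner, the wiring lives in the training config. The sketch below holds an excerpt of such a config as a Python string and reads it with the standard-library `configparser`, purely to show which component listens to the transformer. The registered names follow the spaCy documentation as far as I know them, and they should be checked against the installed versions. The model name `bert-base-cased` is only a placeholder, other settings the ner model needs are omitted, and the helper functions are hypothetical. A real project would load the full file with spaCy's own config tooling.

```python
import configparser
from typing import List

# Excerpt of a training config (assumed layout, incomplete on purpose).
CONFIG_SKETCH = """
[nlp]
lang = "en"
pipeline = ["transformer","ner"]

[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
# Placeholder model name; any Hugging Face model name could go here.
name = "bert-base-cased"

[components.ner]
factory = "ner"

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
# Further parser settings are omitted in this excerpt.

[components.ner.model.tok2vec]
# The listener receives the shared transformer's output as features.
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.ner.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
"""


def load_sketch(text: str) -> configparser.ConfigParser:
    """Parse the excerpt; values stay raw strings, quotes included."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def listening_components(parser: configparser.ConfigParser) -> List[str]:
    """Names of components whose tok2vec layer is a transformer listener."""
    names = []
    for section in parser.sections():
        if section.endswith(".model.tok2vec"):
            arch = parser[section].get("@architectures", "").strip('"')
            if arch.startswith("spacy-transformers.TransformerListener"):
                # Section looks like "components.<name>.model.tok2vec".
                names.append(section.split(".")[1])
    return names


if __name__ == "__main__":
    print(listening_components(load_sketch(CONFIG_SKETCH)))
```

In this layout the ner component has no transformer of its own. It trains its own output layers on top of the features that the shared transformer component produces.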

Bug reports and other issues

Please use spaCy's issue tracker to report a bug, or open a new thread on the discussion board for any other issue.

Owner

Name: Explosion
Login: explosion
Kind: organization
Email: contact@explosion.ai
Location: Berlin, Germany

Website: https://explosion.ai
Twitter: explosion_ai
Repositories: 61
Profile: https://github.com/explosion

A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing

GitHub Events

Total

Release event: 4
Watch event: 49
Delete event: 4
Member event: 1
Issue comment event: 5
Push event: 11
Pull request event: 2
Fork event: 7
Create event: 7

Last Year

Release event: 4
Watch event: 49
Delete event: 4
Member event: 1
Issue comment event: 5
Push event: 11
Pull request event: 2
Fork event: 7
Create event: 7

Committers

Last synced: 9 months ago

All Time

Total Commits: 1,324
Total Committers: 22
Avg Commits per committer: 60.182
Development Distribution Score (DDS): 0.371

Past Year

Commits: 8
Committers: 3
Avg Commits per committer: 2.667
Development Distribution Score (DDS): 0.25

Top Committers

Name	Email	Commits
Matthew Honnibal	h**h@g**m	833
Ines Montani	i**s@i**o	215
Adriane Boyd	a**d@g**m	137
svlandeg	s**m@g**m	81
Daniël de Kok	me@d****u	18
Yohei Tamura	t**y@g**m	15
Madeesh Kannan	s****e	4
Santiago Castro	b**t@m**y	4
Ryn Daniels	r**n@e**i	3
Kenneth Enevoldsen	k**n@g**m	2
Osman Baskaya	t**e@g**m	1
Basile Dura	b****a	1
Masoud Kazemi	m**3@g**m	1
Nirant	N****K	1
Paul O'Leary McCann	p**m@d**m	1
Peter B	5****r	1
Ryn Daniels	3****s	1
Timothy J Laurent	t**t@g**m	1
kayvane	k**2@h**m	1
ssavvi	5****i	1
tsoernes	t**s@g**m	1
zlin	z**n@b**m	1

Committer Domains (Top 20 + Academic)

bombora.com: 1
dampfkraft.com: 1
explosion.ai: 1
montevideo.com.uy: 1
danieldk.eu: 1
ines.io: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 130
Average time to close issues: N/A
Average time to close pull requests: 8 days
Total issue authors: 0
Total pull request authors: 13
Average comments per issue: 0
Average comments per pull request: 0.92
Merged pull requests: 114
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: 1 day
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Pull Request Authors

adrianeboyd (89)
danieldk (15)
shadeMe (6)
honnibal (5)
svlandeg (5)
ryndaniels (3)
hume-brian (2)
polm (2)
bryant1410 (1)
KennethEnevoldsen (1)
bdura (1)
hiroshi-matsuda-rit (1)
cherbel (1)

Top Labels

Issue Labels

Pull Request Labels

bug (10)
enhancement (7)
tests (6)
v1.1 (5)
feat / pipeline (4)
perf / speed (3)
feat / alignment (3)
perf / memory (2)
feat / serialize (2)
docs (1)
pytorch (1)
install (1)

Dependencies

.github/workflows/explosionbot.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/issue-manager.yml actions

tiangolo/issue-manager 0.2.1 composite

requirements.txt pypi

cython >=0.25
dataclasses >=0.6,<1.0
mypy >=0.990,<0.1000
numpy >=1.15.0
pytest >=5.2.0
pytest-cov >=2.7.0,<2.8.0
spacy >=3.5.0,<4.0.0
spacy-alignments >=0.7.2,<1.0.0
srsly >=2.4.0,<3.0.0
torch >=1.8.0
transformers >=3.4.0,<4.27.0
types-contextvars >=0.1.2
types-dataclasses >=0.1.3

.github/workflows/tests.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

pyproject.toml pypi

setup.py pypi