https://github.com/explosion/spacy-transformers
Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file
- codemeta.json file: Found codemeta.json file
- .zenodo.json file
- DOI references
- Academic publication links
- Committers with academic emails
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: Low similarity (18.6%) to scientific vocabulary
Repository
Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Basic Info
- Host: GitHub
- Owner: explosion
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://spacy.io/usage/embeddings-transformers
- Size: 1.15 MB
Statistics
- Stars: 1,393
- Watchers: 30
- Forks: 172
- Open Issues: 1
- Releases: 46
Topics
Metadata Files
README.md
spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
This package provides spaCy components and
architectures to use transformer models via
Hugging Face's transformers in
spaCy. The result is convenient access to state-of-the-art transformer
architectures, such as BERT, GPT-2, XLNet, etc.
This release requires spaCy v3. For the previous version of this library, see the
v0.6.x branch.
Features
- Use pretrained transformer models like BERT, RoBERTa and XLNet to power your spaCy pipeline.
- Easy multi-task learning: backprop to one transformer model from several pipeline components.
- Train using spaCy v3's powerful and extensible config system.
- Automatic alignment of transformer output to spaCy's tokenization.
- Easily customize what transformer data is saved in the Doc object.
- Easily customize how long documents are processed.
- Out-of-the-box serialization and model packaging.
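As an illustration of what customizing long-document processing can involve, the sketch below splits a token sequence into overlapping windows so that each piece fits a model's input limit. It is a simplified, standalone example in plain Python. The function name `strided_windows` and the parameters `window` and `stride` are hypothetical, chosen for this sketch, and the code does not reproduce the package's own implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def strided_windows(tokens: Sequence[T], window: int, stride: int) -> List[List[T]]:
    """Split `tokens` into windows of at most `window` items.

    A new window starts every `stride` items, so consecutive windows
    overlap by `window - stride` items when stride < window.
    (Hypothetical helper for illustration only.)
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans: List[List[T]] = []
    start = 0
    while start < len(tokens):
        spans.append(list(tokens[start:start + window]))
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(tokens):
            break
        start += stride
    return spans


if __name__ == "__main__":
    print(strided_windows(list(range(10)), window=4, stride=3))
    # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Choosing a stride smaller than the window gives each boundary token some context from both sides, at the cost of processing the overlapping tokens more than once.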
Installation
Installing the package from pip will automatically install all dependencies, including PyTorch and spaCy. Make sure you install this package before you install the models. Also note that this package requires Python 3.6+, PyTorch v1.5+ and spaCy v3.0+.
```bash
pip install 'spacy[transformers]'
```
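To confirm that an environment meets the minimum versions stated above, a small check along the following lines may help. This is a sketch under assumptions: the helper names are hypothetical, the distribution names `torch` and `spacy` are assumed to be the ones pip installs, and `importlib.metadata` requires Python 3.8 or later, which is newer than the minimum Python version listed.

```python
import re
import sys
from importlib import metadata  # available from Python 3.8

# Minimum versions as stated in the installation notes.
MIN_PYTHON = (3, 6)
MINIMUMS = {"torch": (1, 5), "spacy": (3, 0)}


def release_tuple(version):
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version, minimum):
    """Compare a version string with a minimum tuple, padding with zeros."""
    found = release_tuple(version)
    minimum = tuple(minimum)
    width = max(len(found), len(minimum))
    found += (0,) * (width - len(found))
    minimum += (0,) * (width - len(minimum))
    return found >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python is older than 3.6")
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            wanted = ".".join(str(part) for part in minimum)
            problems.append(f"{name} {installed} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Only the leading numeric components are compared, so local build suffixes such as `+cu117` are ignored.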
For GPU installation, find your CUDA version using nvcc --version and add the
version in brackets, e.g.
spacy[transformers,cuda92] for CUDA 9.2 or spacy[transformers,cuda100] for
CUDA 10.0.
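The mapping from a CUDA version to the bracketed extra can be sketched as follows. The naming pattern is inferred only from the two examples above (9.2 to cuda92, 10.0 to cuda100), so it is an assumption that other versions follow it or that a matching extra exists for them. The helper names are hypothetical, and the `nvcc --version` parsing assumes the output contains text of the form "release 10.0".

```python
import re


def cuda_extra(cuda_version):
    """Turn a 'major.minor' CUDA version into an extra name, e.g. '9.2' -> 'cuda92'."""
    match = re.fullmatch(r"(\d+)\.(\d+)", cuda_version.strip())
    if match is None:
        raise ValueError(f"expected 'major.minor', got {cuda_version!r}")
    return f"cuda{match.group(1)}{match.group(2)}"


def version_from_nvcc(output):
    """Extract 'major.minor' from the text printed by `nvcc --version`.

    Assumes the output includes a fragment like 'release 10.0'.
    """
    match = re.search(r"release (\d+\.\d+)", output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    return match.group(1)


def pip_requirement(cuda_version):
    """Build the requirement string to pass to pip install."""
    return f"spacy[transformers,{cuda_extra(cuda_version)}]"


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.0, V10.0.130"
    print(pip_requirement(version_from_nvcc(sample)))
    # spacy[transformers,cuda100]
```

Before relying on a generated name, check it against the extras that the installed spaCy release actually defines.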
If you are having trouble installing PyTorch, follow the instructions on the official website for your specific operating system and requirements.
Documentation
Important note: This package has been extensively refactored to take advantage of spaCy v3.0. Previous versions that were built for spaCy v2.x worked considerably differently. Please see previous tagged versions of this README for documentation on prior versions.
- Embeddings, Transformers and Transfer Learning: How to use transformers in spaCy
- Training Pipelines and Models: Train and update components on your own data and integrate custom models
- Layers and Model Architectures: Power spaCy components with custom neural networks
- Transformer: Pipeline component API reference
- Transformer architectures: Architectures and registered functions
Applying pretrained text and token classification models
Note that the transformer component from spacy-transformers does not support
task-specific heads like token or text classification. A task-specific
transformer model can be used as a source of features to train spaCy components
like ner or textcat, but the transformer component does not provide access
to task-specific heads for training or inference.
Alternatively, if you only want to use the predictions from an existing
Hugging Face text or token classification model, you can use the wrappers from
spacy-huggingface-pipelines
to incorporate task-specific transformer models into your spaCy pipelines.
Bug reports and other issues
Please use spaCy's issue tracker to report a bug, or open a new thread on the discussion board for any other issue.
Owner
- Name: Explosion
- Login: explosion
- Kind: organization
- Email: contact@explosion.ai
- Location: Berlin, Germany
- Website: https://explosion.ai
- Twitter: explosion_ai
- Repositories: 61
- Profile: https://github.com/explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
GitHub Events
Total
- Release event: 4
- Watch event: 49
- Delete event: 4
- Member event: 1
- Issue comment event: 5
- Push event: 11
- Pull request event: 2
- Fork event: 7
- Create event: 7
Last Year
- Release event: 4
- Watch event: 49
- Delete event: 4
- Member event: 1
- Issue comment event: 5
- Push event: 11
- Pull request event: 2
- Fork event: 7
- Create event: 7
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Matthew Honnibal | h****h@g****m | 833 |
| Ines Montani | i****s@i****o | 215 |
| Adriane Boyd | a****d@g****m | 137 |
| svlandeg | s****m@g****m | 81 |
| Daniël de Kok | me@d****u | 18 |
| Yohei Tamura | t****y@g****m | 15 |
| Madeesh Kannan | s****e | 4 |
| Santiago Castro | b****t@m****y | 4 |
| Ryn Daniels | r****n@e****i | 3 |
| Kenneth Enevoldsen | k****n@g****m | 2 |
| Osman Baskaya | t****e@g****m | 1 |
| Basile Dura | b****a | 1 |
| Masoud Kazemi | m****3@g****m | 1 |
| Nirant | N****K | 1 |
| Paul O'Leary McCann | p****m@d****m | 1 |
| Peter B | 5****r | 1 |
| Ryn Daniels | 3****s | 1 |
| Timothy J Laurent | t****t@g****m | 1 |
| kayvane | k****2@h****m | 1 |
| ssavvi | 5****i | 1 |
| tsoernes | t****s@g****m | 1 |
| zlin | z****n@b****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 130
- Average time to close issues: N/A
- Average time to close pull requests: 8 days
- Total issue authors: 0
- Total pull request authors: 13
- Average comments per issue: 0
- Average comments per pull request: 0.92
- Merged pull requests: 114
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- adrianeboyd (89)
- danieldk (15)
- shadeMe (6)
- honnibal (5)
- svlandeg (5)
- ryndaniels (3)
- hume-brian (2)
- polm (2)
- bryant1410 (1)
- KennethEnevoldsen (1)
- bdura (1)
- hiroshi-matsuda-rit (1)
- cherbel (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- tiangolo/issue-manager 0.2.1 composite
- cython >=0.25
- dataclasses >=0.6,<1.0
- mypy >=0.990,<0.1000
- numpy >=1.15.0
- pytest >=5.2.0
- pytest-cov >=2.7.0,<2.8.0
- spacy >=3.5.0,<4.0.0
- spacy-alignments >=0.7.2,<1.0.0
- srsly >=2.4.0,<3.0.0
- torch >=1.8.0
- transformers >=3.4.0,<4.27.0
- types-contextvars >=0.1.2
- types-dataclasses >=0.1.3
- actions/checkout v3 composite
- actions/setup-python v4 composite