https://github.com/argilla-io/spacy-wordnet
spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: argilla-io
- License: mit
- Language: Python
- Default Branch: master
- Size: 139 MB
Statistics
- Stars: 260
- Watchers: 12
- Forks: 19
- Open Issues: 6
- Releases: 5
Topics
Metadata Files
README.md
spaCy WordNet
spaCy WordNet is a simple custom component for using WordNet, MultiWordNet and WordNet domains with spaCy.
The component combines the NLTK wordnet interface with WordNet domains to allow users to:
- Get all synsets for a processed token. For example, getting all the synsets (word senses) of the word `bank`.
- Get and filter synsets by domain. For example, getting synonyms of the verb `withdraw` in the financial domain.
Getting started
The spaCy WordNet component can be easily integrated into spaCy pipelines. You just need the following:
Prerequisites
- Python 3.X
- spaCy
You also need to install the following NLTK wordnet data:
```bash
python -m nltk.downloader wordnet
python -m nltk.downloader omw
```
Install
```bash
pip install spacy-wordnet
```
Supported languages
Almost all Open Multi Wordnet languages are supported.
Usage
Once you have chosen a language (from the supported ones above), you need to manually download a spaCy model for it. Check the list of available models for each language in the spaCy 2.x or spaCy 3.x documentation.
English example
Download example model:
```bash
python -m spacy download en_core_web_sm
```
Run:

```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('en_core_web_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger')
# spaCy 2.x (use this line instead of the one above)
# nlp.add_pipe(WordnetAnnotator(nlp, name="spacy_wordnet"), after='tagger')
token = nlp('prices')[0]

# The wordnet object links the spaCy token with the NLTK WordNet interface,
# giving access to synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()

# It also automatically tags the token with WordNet domains
token._.wordnet.wordnet_domains()
```
spaCy WordNet lets you find synonyms by domain of interest, for example economy:

```python
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp('I want to withdraw 5,000 euros')

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names()]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros
```
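The enriched sentence above groups the variants of a token as `(a|b|c)`. Because the example builds each group from a `set`, the order of variants inside a group is not guaranteed and may differ between runs. The following is a minimal sketch of the same formatting step with a stable order. It does not call spaCy or WordNet: the `enrich` function and the `variants` dictionary are hypothetical stand-ins for the lemma names you would collect per token.

```python
from typing import Dict, List


def enrich(tokens: List[str], variants: Dict[str, List[str]]) -> str:
    """Join tokens, replacing each token that has variants with a (a|b|c) group."""
    enriched = []
    for text in tokens:
        lemmas = variants.get(text)
        if not lemmas:
            # No variants found: keep the original token text
            enriched.append(text)
        else:
            # dict.fromkeys removes duplicates while keeping first-seen order
            unique = list(dict.fromkeys(lemmas))
            enriched.append('({})'.format('|'.join(unique)))
    return ' '.join(enriched)


# Hypothetical stand-in for the lemma names collected per token
variants = {
    'want': ['need', 'want', 'require', 'want'],
    'withdraw': ['draw', 'withdraw', 'draw_off', 'take_out'],
}

print(enrich('I want to withdraw 5,000 euros'.split(), variants))
```

In the real pipeline, the list passed for each token would come from the `lemma_names()` calls shown in the example above.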
Portuguese example
Download example model:
```bash
python -m spacy download pt_core_news_sm
```
Run:

```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('pt_core_news_sm')
# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger', config={'lang': nlp.lang})
# spaCy 2.x (use this line instead of the one above)
# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')
text = "Eu quero retirar 5.000 euros"
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp(text)

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names('por')]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> Eu (querer|desejar|esperar) retirar 5.000 euros
```
Owner
- Name: Argilla
- Login: argilla-io
- Kind: organization
- Email: contact@argilla.io
- Website: https://argilla.io
- Twitter: argilla_io
- Repositories: 12
- Profile: https://github.com/argilla-io
Building the open-source tool for data-centric NLP
GitHub Events
Total
- Watch event: 8
Last Year
- Watch event: 8
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Francisco Aranda | f****o@r****i | 9 |
| Dani | d****l@r****i | 3 |
| Francisco Aranda | f****n@g****m | 2 |
| Vinícius | v****2@g****m | 1 |
| Samuel Frazee | f****l@g****m | 1 |
| Ian Thompson | i****1@g****m | 1 |
| Antonio Carlos Falcão Petri | f****i | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 6
- Total pull requests: 13
- Average time to close issues: 28 days
- Average time to close pull requests: 4 months
- Total issue authors: 5
- Total pull request authors: 9
- Average comments per issue: 2.5
- Average comments per pull request: 1.38
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- gleidsonh (2)
- carschno (1)
- KTRosenberg (1)
- falcaopetri (1)
- polm (1)
Pull Request Authors
- frascuchon (3)
- dvsrepo (2)
- dependabot[bot] (2)
- it176131 (2)
- Vnicius (1)
- Daniel-R-Armstrong (1)
- falcaopetri (1)
- Hydroptix (1)
- m0canu1 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/download-artifact v2 composite
- actions/upload-artifact v2 composite
- pypa/gh-action-pypi-publish master composite
- nltk >=3.3,<3.6
- spacy >=2.0,<4.0
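
The version ranges listed above (`nltk >=3.3,<3.6` and `spacy >=2.0,<4.0`) can be checked against a local environment. The sketch below is one possible way to do that with the standard library only. The helper names `parse`, `in_range` and `check_installed` are illustrative and not part of spacy-wordnet, and the version parsing is deliberately simplified: it keeps the leading digits of each dotted component, so a pre-release such as `3.6.0rc1` is treated as `3.6.0`.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Ranges copied from the dependency list above: (inclusive lower, exclusive upper)
RANGES = {'nltk': ('3.3', '3.6'), 'spacy': ('2.0', '4.0')}


def parse(version: str) -> Tuple[int, ...]:
    """Turn '3.5' or '3.5.2' into a tuple of ints (simplified parsing)."""
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_range(version: str, low: str, high: str) -> bool:
    """Return True when low <= version < high."""
    return parse(low) <= parse(version) < parse(high)


def check_installed() -> Dict[str, Optional[bool]]:
    """Map each package to True/False, or None when it is not installed."""
    report: Dict[str, Optional[bool]] = {}
    for name, (low, high) in RANGES.items():
        try:
            report[name] = in_range(metadata.version(name), low, high)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


print(check_installed())
```

A `False` entry in the report means the installed version falls outside the listed range. For stricter handling of pre-release and post-release versions, a dedicated version-parsing library would be a better fit than this tuple comparison.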