https://github.com/argilla-io/spacy-wordnet

spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary

Keywords

pipeline spacy wordnet

Keywords from Contributors

named-entity-recognition cython entity-linking text-classification tokenization

Last synced: 9 months ago

Repository


Basic Info

Host: GitHub
Owner: argilla-io
License: mit
Language: Python
Default Branch: master
Size: 139 MB

Statistics

Stars: 260
Watchers: 12
Forks: 19
Open Issues: 6
Releases: 5

Topics

pipeline spacy wordnet

Created over 7 years ago · Last pushed almost 2 years ago

Metadata Files

Readme Changelog License Authors

spaCy WordNet

spaCy WordNet is a simple custom component for using WordNet, MultiWordNet and WordNet Domains with spaCy.

The component combines the NLTK WordNet interface with WordNet Domains to allow users to:

Get all synsets for a processed token, for example all the synsets (word senses) of the word "bank".
Get and filter synsets by domain, for example the synonyms of the verb "withdraw" in the financial domain.
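Filtering by domain amounts to keeping the synsets whose domain labels intersect the domains you ask for. The sketch below illustrates only that idea: `ToySynset` and `synsets_for_domains` are hypothetical names, and the "bank" senses and their domain labels are made up rather than taken from WordNet or WordNet Domains. In the README's examples, the library call used for this is `token._.wordnet.wordnet_synsets_for_domain(...)`, whose internals are not shown here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ToySynset:
    """Hypothetical stand-in for a WordNet synset tagged with domain labels."""
    name: str
    lemma_names: Tuple[str, ...]
    domains: FrozenSet[str]


def synsets_for_domains(synsets: Iterable[ToySynset], wanted: Iterable[str]) -> List[ToySynset]:
    """Keep only the synsets tagged with at least one of the wanted domains."""
    wanted_set = set(wanted)
    return [s for s in synsets if s.domains & wanted_set]


# Made-up senses of "bank": names and domain labels are illustrative only.
bank_senses = [
    ToySynset("bank_sense_a", ("bank", "financial_institution"), frozenset({"banking"})),
    ToySynset("bank_sense_b", ("bank",), frozenset({"geography"})),
]

for s in synsets_for_domains(bank_senses, ["finance", "banking"]):
    print(s.name, s.lemma_names)
```

Using a set intersection means a synset qualifies as soon as it carries any one of the requested domains, which matches how the README passes a list such as `['finance', 'banking']`.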

Getting started

The spaCy WordNet component can be easily integrated into spaCy pipelines. You just need the following:

Prerequisites

Python 3.x
spaCy

You also need to install the following NLTK WordNet data:

```bash
python -m nltk.downloader wordnet
python -m nltk.downloader omw
```
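If you are unsure whether this data is already installed, a small pre-flight check can look it up before you build a pipeline. This is a sketch, not part of spacy-wordnet: `missing_nltk_data` is a hypothetical helper, and it relies on NLTK's `nltk.data.find` raising `LookupError` for resources that are not present. When NLTK itself is missing, every corpus is reported as missing.

```python
import importlib.util

# Corpora the install instructions ask for, mapped to their nltk_data resource paths.
REQUIRED_CORPORA = {"wordnet": "corpora/wordnet", "omw": "corpora/omw"}


def missing_nltk_data(required=REQUIRED_CORPORA):
    """Return the sorted names of required NLTK corpora that are not installed."""
    if importlib.util.find_spec("nltk") is None:
        return sorted(required)  # no NLTK at all, so nothing can be found
    import nltk  # imported lazily so the check also works without NLTK

    missing = []
    for name, resource in required.items():
        try:
            nltk.data.find(resource)
        except LookupError:
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    for name in missing_nltk_data():
        print(f"missing: run  python -m nltk.downloader {name}")
```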

Install

```bash
pip install spacy-wordnet
```

Supported languages

Almost all Open Multilingual Wordnet (OMW) languages are supported.

Usage

Once you have chosen a supported language, you need to download a spaCy model for it manually. The available models for each language are listed in the spaCy 2.x and spaCy 3.x model documentation.

English example

Download an example model:

```bash
python -m spacy download en_core_web_sm
```

Run:

```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('en_core_web_sm')

# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger')
# spaCy 2.x (use this line instead of the one above)
# nlp.add_pipe(WordnetAnnotator(nlp, name="spacy_wordnet"), after='tagger')

token = nlp('prices')[0]

# The wordnet object links a spaCy token with the NLTK WordNet interface,
# giving access to synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()

# and automatically tags the token with WordNet domains
token._.wordnet.wordnet_domains()
```
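The two `add_pipe` calls differ because spaCy 3.x adds a component by its registered factory name (a string), whereas spaCy 2.x expects the component object itself. A hypothetical helper, `add_pipe_style`, that picks the call style from a version string could look like this; it reads the installed version through the standard library's `importlib.metadata` so spaCy does not have to be imported.

```python
from importlib import metadata


def add_pipe_style(spacy_version: str) -> str:
    """Return which nlp.add_pipe() call style a spaCy version expects.

    spaCy 3.x: nlp.add_pipe("spacy_wordnet", ...)          -> pass the factory name
    spaCy 2.x: nlp.add_pipe(WordnetAnnotator(...), ...)    -> pass the component object
    """
    major = int(spacy_version.split(".")[0])
    if major >= 3:
        return "factory-name"
    if major == 2:
        return "component-instance"
    raise ValueError(f"spaCy {spacy_version} is older than the spacy>=2.0 requirement")


try:
    installed = metadata.version("spacy")
    print(installed, "->", add_pipe_style(installed))
except metadata.PackageNotFoundError:
    print("spaCy is not installed")
```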

spaCy WordNet lets you find synonyms by domain of interest, for example the economy:

```python
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp('I want to withdraw 5,000 euros')

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names()]
        # If we found a synset in the economy domains,
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros
# (the order inside each group can vary between runs because it comes from a set)
```
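Because the example above joins a `set`, the order inside each group can change from run to run. If you need reproducible output (for tests or diffs), sorting the lemmas fixes that. The sketch below isolates that formatting step from spaCy: `enrich` is a hypothetical helper, and the per-token lemma lists are the ones shown in the README output, passed in as plain data instead of being looked up in WordNet.

```python
from typing import Dict, List, Sequence


def enrich(tokens: Sequence[str], domain_lemmas: Dict[str, List[str]]) -> str:
    """Replace each token that has in-domain lemmas with a sorted (a|b|c) group."""
    parts = []
    for text in tokens:
        lemmas = domain_lemmas.get(text)
        if lemmas:
            # sorted() gives the same order on every run, unlike iterating a set
            parts.append("({})".format("|".join(sorted(set(lemmas)))))
        else:
            parts.append(text)
    return " ".join(parts)


tokens = ["I", "want", "to", "withdraw", "5,000", "euros"]
# Lemma lists copied from the README's example output
lemmas = {
    "want": ["need", "want", "require"],
    "withdraw": ["draw", "withdraw", "draw_off", "take_out"],
}
result = enrich(tokens, lemmas)
print(result)  # I (need|require|want) to (draw|draw_off|take_out|withdraw) 5,000 euros
```

Multiword WordNet lemmas keep their underscores (`draw_off`, `take_out`); replace `_` with a space afterwards if you want readable phrases.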

Portuguese example

Download an example model:

```bash
python -m spacy download pt_core_news_sm
```

Run:

```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('pt_core_news_sm')

# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger', config={'lang': nlp.lang})
# spaCy 2.x (use this line instead of the one above)
# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')

text = "Eu quero retirar 5.000 euros"
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp(text)

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names('por')]
        # If we found a synset in the economy domains,
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> Eu (querer|desejar|esperar) retirar 5.000 euros
```
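Note the two language codes in the Portuguese example: spaCy's `nlp.lang` is the two-letter ISO 639-1 code (`pt`), while NLTK's `lemma_names()` takes the three-letter ISO 639-3 code used by the Open Multilingual Wordnet (`por`). A small lookup keeps the two in sync. The table and the `omw_lang` helper are hypothetical and deliberately partial; extend the table for the languages you actually use.

```python
# ISO 639-1 (spaCy's nlp.lang) -> ISO 639-3 (NLTK OMW lang argument).
# Hypothetical, partial table: add entries for other languages as needed.
ISO_639_1_TO_3 = {
    "en": "eng",
    "es": "spa",
    "fr": "fra",
    "it": "ita",
    "pt": "por",
}


def omw_lang(spacy_lang: str) -> str:
    """Map a spaCy language code to the code NLTK's lemma_names() expects."""
    try:
        return ISO_639_1_TO_3[spacy_lang]
    except KeyError:
        raise ValueError(f"no OMW code known for spaCy language {spacy_lang!r}") from None


print(omw_lang("pt"))  # por
```

In the loop above, `s.lemma_names('por')` could then be written as `s.lemma_names(omw_lang(nlp.lang))`, so the same code works for any language in the table.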

Owner

Name: Argilla
Login: argilla-io
Kind: organization
Email: contact@argilla.io

Website: https://argilla.io
Twitter: argilla_io
Repositories: 12
Profile: https://github.com/argilla-io

Building the open-source tool for data-centric NLP

GitHub Events

Total

Watch event: 8

Last Year

Watch event: 8

Committers

Last synced: about 1 year ago

All Time

Total Commits: 18
Total Committers: 7
Avg Commits per committer: 2.571
Development Distribution Score (DDS): 0.5

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Francisco Aranda	f**o@r**i	9
Dani	d**l@r**i	3
Francisco Aranda	f**n@g**m	2
Vinícius	v**2@g**m	1
Samuel Frazee	f**l@g**m	1
Ian Thompson	i**1@g**m	1
Antonio Carlos Falcão Petri	f****i	1
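The All Time figures can be reproduced from this table. The average is total commits divided by committers. For the Development Distribution Score, the sketch assumes DDS is one minus the top committer's share of all commits; that definition is an assumption here, though it does reproduce the reported 0.5.

```python
# Commit counts per committer, copied from the Top Committers table
# (the two "Francisco Aranda" rows have different emails and are counted separately).
commits = [9, 3, 2, 1, 1, 1, 1]

total_commits = sum(commits)                     # 18
committers = len(commits)                        # 7
avg_per_committer = total_commits / committers   # 18 / 7 = 2.571...
# Assumed definition: 1 - (share of all commits made by the top committer)
dds = 1 - max(commits) / total_commits           # 1 - 9/18 = 0.5

print(total_commits, committers, round(avg_per_committer, 3), dds)  # 18 7 2.571 0.5
```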

Committer Domains (Top 20 + Academic)

recogn.ai: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 6
Total pull requests: 13
Average time to close issues: 28 days
Average time to close pull requests: 4 months
Total issue authors: 5
Total pull request authors: 9
Average comments per issue: 2.5
Average comments per pull request: 1.38
Merged pull requests: 9
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 1


Top Authors

Issue Authors

gleidsonh (2)
carschno (1)
KTRosenberg (1)
falcaopetri (1)
polm (1)

Pull Request Authors

frascuchon (3)
dvsrepo (2)
dependabot[bot] (2)
it176131 (2)
Vnicius (1)
Daniel-R-Armstrong (1)
falcaopetri (1)
Hydroptix (1)
m0canu1 (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (2) github_actions (2)

Dependencies

.github/workflows/ci.yml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/download-artifact v2 composite
actions/upload-artifact v2 composite
pypa/gh-action-pypi-publish master composite

requirements.txt pypi

nltk >=3.3,<3.6
spacy >=2.0,<4.0
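To see whether an existing environment satisfies these ranges, a small standalone check can compare the installed versions against them. This is a sketch with hypothetical helper names: it uses `importlib.metadata` from the standard library and a simplified numeric comparison that ignores pre-release and other suffixes. The `packaging` library's `SpecifierSet` implements the full PEP 440 rules if you need them.

```python
import re
from importlib import metadata
from typing import Tuple

# Ranges from requirements.txt: (lower bound inclusive, upper bound exclusive).
REQUIREMENTS = {"nltk": ("3.3", "3.6"), "spacy": ("2.0", "4.0")}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '3.5' or '3.7.2rc1' into a 3-tuple of ints; trailing suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    return tuple(parts + [0] * (3 - len(parts)))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)


def check_installed() -> None:
    for name, (lower, upper) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if in_range(installed, lower, upper) else f"outside >={lower},<{upper}"
        print(f"{name} {installed}: {status}")


check_installed()
```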

setup.py pypi
