https://github.com/argilla-io/spacy-wordnet

spacy-wordnet creates annotations that make WordNet and WordNet Domains easy to use from spaCy, building on the NLTK WordNet interface.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

pipeline spacy wordnet

Keywords from Contributors

named-entity-recognition cython entity-linking text-classification tokenization
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: argilla-io
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 139 MB
Statistics
  • Stars: 260
  • Watchers: 12
  • Forks: 19
  • Open Issues: 6
  • Releases: 5
Topics
pipeline spacy wordnet
Created over 7 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License Authors

README.md

spaCy WordNet

spaCy WordNet is a simple custom component for using WordNet, MultiWordNet and WordNet Domains with spaCy.

The component combines the NLTK WordNet interface with WordNet Domains to allow users to:

  • Get all synsets for a processed token. For example, getting all the synsets (word senses) of the word bank.
  • Get and filter synsets by domain. For example, getting synonyms of the verb withdraw in the financial domain.
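
The second capability can be pictured as a set intersection between each synset's domain labels and the domains of interest. The following is a minimal, library-free sketch of that idea. The `TOY_SYNSET_DOMAINS` table and the `synsets_for_domains` helper are hypothetical and hand-written for illustration; they are not data or functions from spaCy WordNet, NLTK or WordNet Domains.

```python
# Hypothetical toy data: sense label -> domain labels (made up for illustration).
TOY_SYNSET_DOMAINS = {
    "withdraw#money": {"banking"},
    "withdraw#retreat": {"military"},
    "bank#river": {"geography"},
    "bank#institution": {"finance", "banking"},
}


def synsets_for_domains(synsets, domains, table=TOY_SYNSET_DOMAINS):
    """Keep the senses that carry at least one of the requested domains."""
    wanted = set(domains)
    return [s for s in synsets if table.get(s, set()) & wanted]


candidates = ["withdraw#retreat", "withdraw#money"]
print(synsets_for_domains(candidates, ["finance", "banking"]))
# ['withdraw#money']
```

With real data the sense labels would be NLTK synsets and the domain labels would come from WordNet Domains, as in the usage examples further down.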

Getting started

The spaCy WordNet component can be easily integrated into spaCy pipelines. You just need the following:

Prerequisites

  • Python 3.X
  • spaCy

You also need to install the following NLTK wordnet data:

    python -m nltk.downloader wordnet
    python -m nltk.downloader omw

Install

    pip install spacy-wordnet
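
To confirm that the prerequisites and the component are importable before building a pipeline, a small standard-library check along these lines may help. The helper name `missing_packages` is hypothetical, and the check covers Python packages only; it does not verify that the NLTK data or a spaCy model has been downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for spaCy, NLTK and this component.
required = ["spacy", "nltk", "spacy_wordnet"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All prerequisites are importable.")
```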

Supported languages

Almost all Open Multi Wordnet languages are supported.

Usage

Once you have chosen a language from the supported ones above, you need to download a spaCy model for it manually. Check the list of available models for each language for spaCy 2.x or spaCy 3.x.

English example

Download example model:

    python -m spacy download en_core_web_sm

Run:

```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('en_core_web_sm')

# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger')
# spaCy 2.x (use this line instead of the one above)
# nlp.add_pipe(WordnetAnnotator(nlp, name="spacy_wordnet"), after='tagger')

token = nlp('prices')[0]

# The wordnet object links the spaCy token with the NLTK WordNet interface
# by giving access to synsets and lemmas
token._.wordnet.synsets()
token._.wordnet.lemmas()

# It also tags the token automatically with WordNet domains
token._.wordnet.wordnet_domains()
```
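
The example registers the component in two different ways depending on the spaCy major version. If one script has to run under both, a version switch along the following lines is one option. This is only a sketch: `pipe_style` is a hypothetical helper, not part of spaCy or spaCy WordNet, and the commented usage simply repeats the two `add_pipe` calls from the example.

```python
def pipe_style(spacy_version):
    """Return 'name' for spaCy 3.x and later, 'instance' for spaCy 2.x."""
    major = int(spacy_version.split(".")[0])
    return "name" if major >= 3 else "instance"


print(pipe_style("3.7.2"))  # name
print(pipe_style("2.3.9"))  # instance

# Intended use (not executed here; requires spaCy and spaCy WordNet):
# if pipe_style(spacy.__version__) == "name":
#     nlp.add_pipe("spacy_wordnet", after="tagger")
# else:
#     nlp.add_pipe(WordnetAnnotator(nlp, name="spacy_wordnet"), after="tagger")
```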

spaCy WordNet lets you find synonyms by domain of interest, for example economy:

```python
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp('I want to withdraw 5,000 euros')

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names()]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> I (need|want|require) to (draw|withdraw|draw_off|take_out) 5,000 euros
```
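
Because the example joins a `set`, the order of the variants inside each pair of parentheses can differ between runs, so your output may list the same words in another order. The sketch below isolates the formatting step and sorts the variants to make the result reproducible. The `enrich` helper and the `STUB_VARIANTS` table are hypothetical; the table is a hand-written stand-in for the WordNet lookup, modelled on the variants in the example output.

```python
def enrich(tokens, variants_for):
    """Replace each token that has variants with '(a|b|c)', else keep it."""
    enriched = []
    for text in tokens:
        variants = variants_for(text)
        if variants:
            # sorted() removes the run-to-run ordering differences of set()
            enriched.append("({})".format("|".join(sorted(set(variants)))))
        else:
            enriched.append(text)
    return " ".join(enriched)


# Hypothetical stand-in for the WordNet lookup.
STUB_VARIANTS = {
    "want": ["need", "want", "require"],
    "withdraw": ["draw", "withdraw", "draw_off", "take_out"],
}

sentence = ["I", "want", "to", "withdraw", "5,000", "euros"]
print(enrich(sentence, lambda text: STUB_VARIANTS.get(text, [])))
# I (need|require|want) to (draw|draw_off|take_out|withdraw) 5,000 euros
```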

Portuguese example

Download example model:

    python -m spacy download pt_core_news_sm

Run:

```python
import spacy

from spacy_wordnet.wordnet_annotator import WordnetAnnotator

# Load a spaCy model
nlp = spacy.load('pt_core_news_sm')

# spaCy 3.x
nlp.add_pipe("spacy_wordnet", after='tagger', config={'lang': nlp.lang})
# spaCy 2.x (use this line instead of the one above)
# nlp.add_pipe(WordnetAnnotator(nlp.lang), after='tagger')

text = "Eu quero retirar 5.000 euros"
economy_domains = ['finance', 'banking']
enriched_sentence = []
sentence = nlp(text)

# For each token in the sentence
for token in sentence:
    # We get those synsets within the desired domains
    synsets = token._.wordnet.wordnet_synsets_for_domain(economy_domains)
    if not synsets:
        enriched_sentence.append(token.text)
    else:
        lemmas_for_synset = [lemma for s in synsets for lemma in s.lemma_names('por')]
        # If we found a synset in the economy domains
        # we get the variants and add them to the enriched sentence
        enriched_sentence.append('({})'.format('|'.join(set(lemmas_for_synset))))

# Let's see our enriched sentence
print(' '.join(enriched_sentence))
# >> Eu (querer|desejar|esperar) retirar 5.000 euros
```
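
Note that the Portuguese example uses two different language codes: the spaCy pipeline reports `nlp.lang` as a two-letter code, while `lemma_names('por')` takes the three-letter code used by the Open Multilingual Wordnet data in NLTK. When adapting the example to another language, a small lookup like the one below can keep the two in step. `SPACY_TO_OMW` and `omw_code` are hypothetical names for this sketch, and the table lists only a few languages.

```python
# Hypothetical lookup table; extend it with the languages you need.
SPACY_TO_OMW = {
    "en": "eng",
    "es": "spa",
    "fr": "fra",
    "it": "ita",
    "pt": "por",
}


def omw_code(spacy_lang):
    """Translate a two-letter spaCy language code to a three-letter OMW code."""
    try:
        return SPACY_TO_OMW[spacy_lang]
    except KeyError:
        raise ValueError("No OMW code known for language {!r}".format(spacy_lang))


print(omw_code("pt"))  # por
```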

Owner

  • Name: Argilla
  • Login: argilla-io
  • Kind: organization
  • Email: contact@argilla.io

Building the open-source tool for data-centric NLP

GitHub Events

Total
  • Watch event: 8
Last Year
  • Watch event: 8

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 18
  • Total Committers: 7
  • Avg Commits per committer: 2.571
  • Development Distribution Score (DDS): 0.5
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Francisco Aranda f****o@r****i 9
Dani d****l@r****i 3
Francisco Aranda f****n@g****m 2
Vinícius v****2@g****m 1
Samuel Frazee f****l@g****m 1
Ian Thompson i****1@g****m 1
Antonio Carlos Falcão Petri f****i 1
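
The all-time figures above can be reproduced from the commit counts in the table. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits; that definition is an assumption, not something stated on this page, although it matches the reported value.

```python
# Commit counts from the "Top Committers" table.
commits = [9, 3, 2, 1, 1, 1, 1]

total = sum(commits)             # total commits
average = total / len(commits)   # average commits per committer
dds = 1 - max(commits) / total   # assumed definition of DDS

print(total, round(average, 3), dds)
# 18 2.571 0.5
```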

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 6
  • Total pull requests: 13
  • Average time to close issues: 28 days
  • Average time to close pull requests: 4 months
  • Total issue authors: 5
  • Total pull request authors: 9
  • Average comments per issue: 2.5
  • Average comments per pull request: 1.38
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • gleidsonh (2)
  • carschno (1)
  • KTRosenberg (1)
  • falcaopetri (1)
  • polm (1)
Pull Request Authors
  • frascuchon (3)
  • dvsrepo (2)
  • dependabot[bot] (2)
  • it176131 (2)
  • Vnicius (1)
  • Daniel-R-Armstrong (1)
  • falcaopetri (1)
  • Hydroptix (1)
  • m0canu1 (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (2) github_actions (2)

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/upload-artifact v2 composite
  • pypa/gh-action-pypi-publish master composite
requirements.txt pypi
  • nltk >=3.3,<3.6
  • spacy >=2.0,<4.0
setup.py pypi