https://github.com/avivyaish/x2vec
We have implemented, expanded and reviewed the paper "Sense2Vec - A Fast and Accurate Method For Word Sense Disambiguation In Neural Word Embeddings" by Andrew Trask, Phil Michalak and John Liu.
Keywords
machine-learning
machinelearning
ml
natural-language-processing
nlp
word-embedding
word-embeddings
wordembedding
wordembeddings
https://github.com/AvivYaish/X2VEC/blob/master/
# X2VEC
### Authors: Aviv Yaish, Dan Kufra
We have implemented, expanded and reviewed the paper "Sense2Vec - A Fast and Accurate Method For Word Sense Disambiguation In Neural Word Embeddings" by Andrew Trask, Phil Michalak and John Liu.
The paper can be found at:
https://arxiv.org/abs/1511.06388
#### Installation
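1. Install the required libraries:
   - textblob
   - pywsd
   - nltk (preferably with all associated add-ons and corpora downloaded)
   - matplotlib
   - scikit-learn (provides sklearn.manifold)
   - numpy

   The code also imports string and time, which are part of the Python standard library and need no installation.
2. Download ex2's corpus, move it to the project's directory and rename it to corpus.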
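Before training, it can help to confirm that the libraries are importable. The following is a minimal sketch and not part of the repository: the function name `missing_modules` and the list `REQUIRED_MODULES` are assumptions made for illustration.

```python
from importlib.util import find_spec

# Import names of the third-party libraries this README asks for.
# scikit-learn is imported as `sklearn`; `string` and `time` ship with Python.
REQUIRED_MODULES = ["textblob", "pywsd", "nltk", "matplotlib", "sklearn", "numpy"]


def missing_modules(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

`find_spec` only looks a top-level module up, it does not import it, so the check is quick even for heavy packages.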
#### Usage
The code is split up into multiple classes:
- X2Vec: The main class that the other models inherit from. Implements many of the general functions our models need.
- Pos2Vec: A model that implements POS tagging as its tokenization method.
- Sense2Vec: A model that implements Word Sense Disambiguation as its tokenization method.
- Sentiment2Vec: A model that implements Sentiment Analysis as its tokenization method.
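The idea shared by these models is that each word is paired with a label (a POS tag, a word sense or a sentiment) before embeddings are trained, so that the same surface word can receive several vectors. The sketch below illustrates this under stated assumptions: the function `label_tokens` and the `|` separator are hypothetical and are not taken from the repository's code. The (word, label) pairs are written by hand here; in practice they could come from a tagger such as `nltk.pos_tag`.

```python
def label_tokens(tagged_words, sep="|"):
    """Turn (word, label) pairs into combined tokens such as 'duck|NOUN'."""
    return [f"{word.lower()}{sep}{label}" for word, label in tagged_words]


# Hand-written (word, label) pairs; a real tagger would produce these.
noun_use = label_tokens([("the", "DET"), ("duck", "NOUN"), ("swims", "VERB")])
verb_use = label_tokens([("they", "PRON"), ("duck", "VERB"), ("quickly", "ADV")])

print(noun_use)  # ['the|DET', 'duck|NOUN', 'swims|VERB']
print(verb_use)  # ['they|PRON', 'duck|VERB', 'quickly|ADV']
```

Because `duck|NOUN` and `duck|VERB` are distinct tokens, an embedding model trained on such sequences learns a separate vector for each use.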
In addition, we have provided the run_models.py file, which can be used to easily train, save and evaluate new models.
To run it, call "python run_models.py" in your shell.
To change the model types you wish to train, alter 'model_types_to_train'.
To change the models and methods you wish to evaluate, alter the evaluation if statement.
For evaluation, we have provided 8 files of manually chosen words as examples (in the evaluation_files directory). They are meant only to give a feel for how the evaluation can be done when given a tagged corpus or label.
To evaluate new words, create files in the same format and pass them to the evaluation function instead.
#### Notes
1. Sense2Vec training is very slow.
2. The evaluation is simply a proof of concept of how it can be done when given a properly labeled dataset.
3. The TSNE visualization should be given a reasonable number of words (say, 1000-2000).
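One way to keep the t-SNE input small is to plot only the most frequent tokens. The sketch below is not the repository's code: it assumes that the "reasonable number" refers to how many words are plotted, and the helpers `most_frequent` and `project_2d` are hypothetical names.

```python
from collections import Counter


def most_frequent(tokens, limit):
    """Return up to `limit` distinct tokens, most frequent first."""
    return [token for token, _ in Counter(tokens).most_common(limit)]


def project_2d(vectors, perplexity=30.0, seed=0):
    """Project an (n_words, dim) array of embeddings to 2-D with t-SNE.

    Needs at least two rows.
    """
    # Imported here so that `most_frequent` works without scikit-learn.
    import numpy as np
    from sklearn.manifold import TSNE

    data = np.asarray(vectors, dtype=float)
    # scikit-learn requires the perplexity to be below the number of samples.
    perplexity = min(perplexity, data.shape[0] - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(data)


corpus_tokens = ["duck|NOUN", "duck|VERB", "duck|NOUN", "swim|VERB", "duck|NOUN", "duck|VERB"]
print(most_frequent(corpus_tokens, 2))  # ['duck|NOUN', 'duck|VERB']
```

The rows passed to `project_2d` would be the embedding vectors of the selected tokens, in the same order, so that each 2-D point can be labeled with its token when plotted with matplotlib.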
Owner
- Name: Aviv Yaish
- Login: AvivYaish
- Kind: user
- Company: The Hebrew University
- Website: https://avivyaish.com/
- Twitter: yaish_aviv
- Repositories: 6
- Profile: https://github.com/AvivYaish
Computer science PhD candidate, with an interest in cryptocurrencies, distributed ledgers, game theory and artificial intelligence.