https://github.com/avivyaish/x2vec

We have implemented, expanded and reviewed the paper “Sense2Vec - A Fast and Accurate Method For Word Sense Disambiguation In Neural Word Embeddings” by Andrew Trask, Phil Michalak and John Liu.



Keywords

machine-learning machinelearning ml natural-language-processing nlp word-embedding word-embeddings wordembedding wordembeddings
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: AvivYaish
  • Language: Python
  • Default Branch: master
  • Size: 793 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 8 years ago · Last pushed about 4 years ago


# X2VEC
### Authors: Aviv Yaish, Dan Kufra

We have implemented, expanded and reviewed the paper "Sense2Vec - A Fast and Accurate Method For Word Sense Disambiguation In Neural Word Embeddings" by Andrew Trask, Phil Michalak and John Liu.

The paper can be found at:  
https://arxiv.org/abs/1511.06388

#### Installation

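    1. Install the required libraries:
        textblob
        pywsd
        nltk (preferably with all of its associated add-ons and corpora downloaded)
        matplotlib
        scikit-learn (for sklearn.manifold)
        numpy
       The string and time modules are part of the Python standard library and need no separate installation.
    2. Download ex2's corpus, move it into the project's directory and rename it to "corpus".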
    
#### Usage
    The code is split into several classes:
        X2Vec:
            The base class that the other models inherit from. It implements the general functionality shared by all models.
        Pos2Vec:
            A model that uses part-of-speech (POS) tagging as its tokenization method.
        Sense2Vec:
            A model that uses Word Sense Disambiguation as its tokenization method.
        Sentiment2Vec:
            A model that uses Sentiment Analysis as its tokenization method.
    We also provide run_models.py, which makes it easy to train, save and evaluate new models.
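
The class layout described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the repository's code: the method names `tokenize` and `build_vocab`, the `word|TAG` token format, the injected `tagger`/`scorer` callables and the `toy_tagger` helper are all hypothetical, standing in for the taggers the real models build on (the installation list suggests NLTK, pywsd and TextBlob). It shows the shared idea: each subclass only changes how a sentence becomes tagged tokens, so the same word with different tags becomes a separate vocabulary entry. A Sense2Vec subclass would follow the same pattern, tagging each word with a disambiguated sense.

```python
from abc import ABC, abstractmethod
from collections import Counter


class X2Vec(ABC):
    """Hypothetical base class holding logic shared by all models."""

    def __init__(self):
        self.vocab = Counter()

    @abstractmethod
    def tokenize(self, sentence):
        """Turn a raw sentence into a list of tagged tokens."""

    def build_vocab(self, sentences):
        # Shared step: count every token the subclass produces. A real model
        # would go on to train embeddings (e.g. word2vec) on these tokens.
        for sentence in sentences:
            self.vocab.update(self.tokenize(sentence))
        return self.vocab


class Pos2Vec(X2Vec):
    def __init__(self, tagger):
        super().__init__()
        # tagger: callable mapping a list of words to (word, tag) pairs,
        # e.g. nltk.pos_tag (assumption about how a real tagger plugs in).
        self.tagger = tagger

    def tokenize(self, sentence):
        words = sentence.lower().split()
        return [f"{word}|{tag}" for word, tag in self.tagger(words)]


class Sentiment2Vec(X2Vec):
    def __init__(self, scorer):
        super().__init__()
        # scorer: callable returning a polarity in [-1, 1] for a sentence,
        # e.g. lambda s: TextBlob(s).sentiment.polarity (assumption).
        self.scorer = scorer

    def tokenize(self, sentence):
        # Label every word with the sentiment of its whole sentence.
        polarity = self.scorer(sentence)
        label = "POS" if polarity > 0 else "NEG" if polarity < 0 else "NEU"
        return [f"{word}|{label}" for word in sentence.lower().split()]


def toy_tagger(words):
    """Hypothetical toy tagger: a word right after "to" is a VERB, else a NOUN."""
    return [(w, "VERB" if i > 0 and words[i - 1] == "to" else "NOUN")
            for i, w in enumerate(words)]


model = Pos2Vec(toy_tagger)
vocab = model.build_vocab(["I saw a duck", "I had to duck"])
print(vocab["duck|NOUN"], vocab["duck|VERB"])  # → 1 1
```

Injecting the tagger as a callable keeps the toy example runnable without NLTK while leaving room to swap in a real tagger.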

    To run it, execute "python run_models.py" in your shell.
    To change which model types are trained, edit 'model_types_to_train'.
    To change which models and methods are evaluated, edit the if statement that controls evaluation.

    As an example of evaluation, we provide 8 files of manually chosen words (in the evaluation_files directory).
    They are only meant to give a feel for how we envision evaluation being done given a tagged corpus or labels.
    To evaluate other words, create files in the same format and pass them to the evaluation function instead.
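
Evaluation with a list of chosen words can be sketched as a nearest-neighbour check: for each word, list the tokens whose vectors are most cosine-similar and see whether they match expectations. This is a minimal sketch under assumptions, not the repository's evaluation function: the one-token-per-line file format, the `{token: vector}` embedding dictionary, the toy vectors and the names `load_words`, `nearest_neighbors` and `evaluate` are all hypothetical.

```python
import math


def load_words(path):
    """Read one word (or word|TAG token) per line, skipping blank lines.

    The one-per-line format is an assumption, not the evaluation_files format.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(embeddings, query, k=3):
    """Return the k tokens most cosine-similar to `query`, excluding itself."""
    target = embeddings[query]
    scored = [(cosine(target, vec), tok)
              for tok, vec in embeddings.items() if tok != query]
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [tok for _, tok in scored[:k]]


def evaluate(embeddings, words, k=3):
    """Map each in-vocabulary evaluation word to its k nearest neighbours."""
    return {w: nearest_neighbors(embeddings, w, k) for w in words if w in embeddings}


# Toy 2-d vectors (made up for illustration): the two senses of "duck"
# should land near different neighbours.
embeddings = {
    "duck|NOUN": [1.0, 0.1],
    "goose|NOUN": [0.9, 0.2],
    "duck|VERB": [0.1, 1.0],
    "dodge|VERB": [0.2, 0.9],
}
print(evaluate(embeddings, ["duck|NOUN", "duck|VERB"], k=1))
# → {'duck|NOUN': ['goose|NOUN'], 'duck|VERB': ['dodge|VERB']}
```

Words missing from the vocabulary are skipped rather than raising, so one evaluation file can be reused across models with different tag sets.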


#### Notes
    1. Sense2Vec training is very slow.
    2. The evaluation is only a proof of concept of how it can be done given a properly labeled dataset.
    3. The t-SNE visualization should be given a reasonable number of words to plot (say, 1000-2000).
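
One way to keep the t-SNE input to a reasonable size is to plot only the most frequent tokens. This is a sketch under assumptions, not the repository's code: the function name `select_for_tsne`, frequency-based selection and the `{token: vector}` embedding dictionary are hypothetical. Presumably the limit exists because t-SNE becomes slow on large inputs and a scatter plot with many thousands of labels is unreadable.

```python
from collections import Counter


def select_for_tsne(tokenized_sentences, embeddings, n=1000):
    """Pick the n most frequent in-vocabulary tokens and their vectors.

    tokenized_sentences: iterable of token lists (e.g. "duck|NOUN" tokens).
    embeddings: hypothetical {token: vector} mapping.
    """
    counts = Counter(tok for sent in tokenized_sentences
                     for tok in sent if tok in embeddings)
    labels = [tok for tok, _ in counts.most_common(n)]
    vectors = [embeddings[tok] for tok in labels]
    return labels, vectors


sentences = [["duck|NOUN", "duck|VERB", "duck|NOUN"], ["goose|NOUN", "oov"]]
embeddings = {"duck|NOUN": [1.0, 0.0], "duck|VERB": [0.0, 1.0], "goose|NOUN": [1.0, 1.0]}
labels, vectors = select_for_tsne(sentences, embeddings, n=2)
print(labels)
```

The returned `vectors` could then be converted to a NumPy array and passed to `sklearn.manifold.TSNE(n_components=2).fit_transform(...)`, with `labels` used to annotate the matplotlib scatter plot.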

Owner

  • Name: Aviv Yaish
  • Login: AvivYaish
  • Kind: user
  • Company: The Hebrew University

Computer science PhD candidate, with an interest in cryptocurrencies, distributed ledgers, game theory and artificial intelligence.

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2