node2vec

Implementation of the node2vec algorithm.

https://github.com/eliorc/node2vec

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    2 of 16 committers (12.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary

Keywords

deep-learning embeddings machine-learning-algorithms

Keywords from Contributors

distribution transformers autograding parallel interactive cryptocurrencies ecosystem-modeling observability hacking shellcodes
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: eliorc
  • License: MIT
  • Language: Python
  • Default Branch: master
  • Size: 88.9 KB
Statistics
  • Stars: 1,279
  • Watchers: 20
  • Forks: 255
  • Open Issues: 0
  • Releases: 10
Created about 8 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Node2Vec

Python3 implementation of the node2vec algorithm by Aditya Grover, Jure Leskovec and Vid Kocijan. Reference: node2vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.

Maintenance

I no longer have time to maintain this project. If someone wants to pick up the baton, let me know.

Installation

pip install node2vec

Usage

```python
import networkx as nx
from node2vec import Node2Vec

# Placeholder output paths (not defined in the original snippet); adjust as needed
EMBEDDING_FILENAME = "embeddings.emb"
EMBEDDING_MODEL_FILENAME = "embeddings.model"
EDGES_EMBEDDING_FILENAME = "edges_embeddings.emb"

# Create a graph
graph = nx.fast_gnp_random_graph(n=100, p=0.5)

# Precompute probabilities and generate walks - ON WINDOWS ONLY WORKS WITH workers=1
node2vec = Node2Vec(graph, dimensions=64, walk_length=30, num_walks=200, workers=4)  # Use temp_folder for big graphs

# Embed nodes
# Any keywords acceptable by gensim.Word2Vec can be passed; dimensions and workers
# are passed automatically (from the Node2Vec constructor)
model = node2vec.fit(window=10, min_count=1, batch_words=4)

# Look for most similar nodes
model.wv.most_similar('2')  # Output node names are always strings

# Save embeddings for later use
model.wv.save_word2vec_format(EMBEDDING_FILENAME)

# Save model for later use
model.save(EMBEDDING_MODEL_FILENAME)

# Embed edges using Hadamard method
from node2vec.edges import HadamardEmbedder

edges_embs = HadamardEmbedder(keyed_vectors=model.wv)

# Look for embeddings on the fly - here we pass normal tuples
edges_embs[('1', '2')]
''' OUTPUT
array([ 5.75068220e-03, -1.10937878e-02,  3.76693785e-01,  2.69105062e-02,
       ... ... ....
       ..................................................................],
      dtype=float32)
'''

# Get all edges in a separate KeyedVectors instance - use with caution, could be huge for big networks
edges_kv = edges_embs.as_keyed_vectors()

# Look for most similar edges - this time tuples must be sorted and as str
edges_kv.most_similar(str(('1', '2')))

# Save embeddings for later use
edges_kv.save_word2vec_format(EDGES_EMBEDDING_FILENAME)
```

Parameters

node2vec.Node2Vec

  • Node2Vec constructor:

    1. graph: The first positional argument has to be a networkx graph. Node names must be all integers or all strings. On the output model they will always be strings.
    2. dimensions: Embedding dimensions (default: 128)
    3. walk_length: Number of nodes in each walk (default: 80)
    4. num_walks: Number of walks per node (default: 10)
    5. p: Return hyperparameter (default: 1)
    6. q: In-out hyperparameter (default: 1)
    7. weight_key: On weighted graphs, this is the key for the weight attribute (default: 'weight')
    8. workers: Number of workers for parallel execution (default: 1)
    9. sampling_strategy: Node-specific sampling strategies. Supports setting node-specific 'q', 'p', 'num_walks' and 'walk_length'. Use these keys exactly. If not set, the global values passed at object initialization are used.
    10. quiet: Boolean controlling the verbosity. (default: False)
    11. temp_folder: Path to a folder in which to save a shared-memory copy of the graph. Supply it when working on graphs that are too big to fit in memory during algorithm execution.
    12. seed: Seed for the random number generator (default: None). Deterministic results can be obtained if seed is set and workers=1.
  • Node2Vec.fit method: Accepts any keyword argument acceptable by gensim.Word2Vec
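
To illustrate what p, q and sampling_strategy control, here is a minimal pure-Python sketch of the second-order walk bias described in the node2vec paper. It is not this package's code: the helper names (`transition_weights`, `node_setting`) and the toy graph are hypothetical, and the per-node fallback is only an assumption about how "use the global ones" behaves.

```python
def transition_weights(prev, cur, adj, p=1.0, q=1.0):
    """Unnormalized weights for the next step from `cur`, having arrived from `prev`."""
    weights = {}
    for nxt, edge_weight in adj[cur].items():
        if nxt == prev:
            bias = 1.0 / p   # return to the node we just came from
        elif nxt in adj[prev]:
            bias = 1.0       # stay at distance 1 from the previous node
        else:
            bias = 1.0 / q   # move outward, to distance 2 from the previous node
        weights[nxt] = edge_weight * bias
    return weights


def node_setting(node, key, sampling_strategy, default):
    """Per-node override if present, otherwise the global default (assumed behavior)."""
    return sampling_strategy.get(node, {}).get(key, default)


# Toy undirected graph as a dict of dicts: adj[u][v] is the edge weight (hypothetical data)
adj = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0, "d": 1.0},
    "c": {"a": 1.0, "b": 1.0},
    "d": {"b": 1.0},
}

# Walker moved a -> b; a high p discourages returning, a low q encourages exploring
weights = transition_weights("a", "b", adj, p=2.0, q=0.5)

# Only node "b" overrides p and num_walks; every other node uses the global values
sampling_strategy = {"b": {"p": 0.25, "num_walks": 20}}
p_for_b = node_setting("b", "p", sampling_strategy, default=1.0)
p_for_a = node_setting("a", "p", sampling_strategy, default=1.0)
```

Dividing each weight by the sum of all weights gives the probability of moving to that neighbor.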

node2vec.EdgeEmbedder

EdgeEmbedder is an abstract class from which all the concrete edge embedding classes inherit. The classes are AverageEmbedder, HadamardEmbedder, WeightedL1Embedder and WeightedL2Embedder; their definitions can be found in Table 1 of the paper. Note that edge embeddings are defined for any pair of nodes, connected or not, and even for a node paired with itself.
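
As a rough guide to what the four embedders compute, the sketch below implements the binary operators from Table 1 of the paper on plain Python lists. The function names are illustrative only and are not the package's API; the real classes operate on the vectors stored in a gensim KeyedVectors instance.

```python
def average(u, v):
    """Element-wise mean of the two node vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def hadamard(u, v):
    """Element-wise product of the two node vectors."""
    return [a * b for a, b in zip(u, v)]


def weighted_l1(u, v):
    """Element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(u, v)]


def weighted_l2(u, v):
    """Element-wise squared difference."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


# Two hypothetical 3-dimensional node embeddings
u = [1.0, -2.0, 0.5]
v = [3.0, 4.0, 0.5]

avg = average(u, v)
had = hadamard(u, v)
l1 = weighted_l1(u, v)
l2 = weighted_l2(u, v)

# A node paired with itself is also a valid "edge"
self_pair = hadamard(u, u)
```

All four operators are symmetric in their arguments, so the order of the two nodes does not change the resulting edge embedding.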

  • Constructor:

    1. keyed_vectors: A gensim.models.KeyedVectors instance containing the node embeddings
    2. quiet: Boolean controlling the verbosity. (default: False)
  • EdgeEmbedder.__getitem__(item) method, better known as EdgeEmbedder[item]:

    1. item - A tuple consisting of 2 nodes from the keyed_vectors passed in the constructor. Will return the embedding of the edge.
  • EdgeEmbedder.as_keyed_vectors method: Returns a gensim.models.KeyedVectors instance with all possible node pairs in a sorted manner as string. For example, for nodes ['1', '2', '3'] we will have as keys "('1', '1')", "('1', '2')", "('1', '3')", "('2', '2')", "('2', '3')" and "('3', '3')".
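
The key format described above can be reproduced with the standard library. This is a sketch of the naming scheme only, assuming keys are built by sorting each pair and calling `str` on the tuple; `edge_keys` and `edge_key` are hypothetical helpers, not part of node2vec.

```python
from itertools import combinations_with_replacement


def edge_keys(nodes):
    """All unordered node pairs (including self-pairs) as sorted-tuple strings."""
    return [str(tuple(sorted(pair))) for pair in combinations_with_replacement(nodes, 2)]


def edge_key(u, v):
    """Key for a single pair, regardless of the order the nodes are given in."""
    return str(tuple(sorted((u, v))))


keys = edge_keys(['1', '2', '3'])
lookup = edge_key('2', '1')  # "('1', '2')"
```

A helper like `edge_key` is handy when querying the edge KeyedVectors, since the pair has to be sorted and converted to a string first.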

Caveats

  • Node names in the input graph must be all strings, or all ints
  • Parallel execution does not work on Windows (known joblib issue). To run non-parallel on Windows, pass workers=1 to the Node2Vec constructor
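
Both caveats can be checked before building the model. The helpers below are a hypothetical sketch (neither function exists in node2vec): one validates that node names are all ints or all strings, the other falls back to a single worker on Windows.

```python
import platform


def check_node_names(nodes):
    """Raise TypeError unless node names are all ints or all strings."""
    kinds = {type(n) for n in nodes}
    if not (kinds <= {int} or kinds <= {str}):
        names = sorted(k.__name__ for k in kinds)
        raise TypeError(f"node names must be all int or all str, got: {names}")


def effective_workers(requested, system=None):
    """Number of workers to pass to Node2Vec: always 1 on Windows."""
    system = system or platform.system()
    return 1 if system == "Windows" else max(1, requested)


check_node_names([0, 1, 2])        # passes silently
check_node_names(["a", "b", "c"])  # passes silently
workers = effective_workers(4)     # 1 on Windows, 4 elsewhere
```

With a networkx graph, one would call `check_node_names(graph.nodes)` and pass `workers=effective_workers(4)` to the Node2Vec constructor.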

TODO

  • [x] Parallel implementation for walk generation
  • [ ] Parallel implementation for probability precomputation

Owner

  • Name: Elior Cohen
  • Login: eliorc
  • Kind: user
  • Company: @datascienceisrael

GitHub Events

Total
  • Issues event: 3
  • Watch event: 58
  • Pull request event: 1
  • Fork event: 7
Last Year
  • Issues event: 3
  • Watch event: 58
  • Pull request event: 1
  • Fork event: 7

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 72
  • Total Committers: 16
  • Avg Commits per committer: 4.5
  • Development Distribution Score (DDS): 0.417
Past Year
  • Commits: 2
  • Committers: 2
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.5
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Elior Cohen | e****p@g****m | 42 |
| pg2455 | p****5@c****u | 5 |
| Jadesola Bejide | 5****e | 4 |
| dependabot[bot] | 4****] | 3 |
| Gerrit-Jan de Bruin | g****n@g****m | 3 |
| raminqaf | r****b@g****m | 2 |
| Roman Shaptala | r****a@g****m | 2 |
| Komal Kumar | k****u@g****m | 2 |
| Elior Cohen | e****r@d****l | 2 |
| ninpnin | n****n | 1 |
| ndrus-softserve | n****s@s****m | 1 |
| Luca Cappelletti | c****4@g****m | 1 |
| Furkan Akkurt | 7****5 | 1 |
| Frenzel, David | d****d@g****m | 1 |
| Aleksandar Despotovski | d****1@g****m | 1 |
| Jadesola Bejide | j****1@b****k | 1 |
Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 90
  • Total pull requests: 23
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 79
  • Total pull request authors: 19
  • Average comments per issue: 3.36
  • Average comments per pull request: 1.17
  • Merged pull requests: 17
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 months
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 1.33
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • amjass12 (6)
  • shartoo (2)
  • BSharmi (2)
  • Mahmedturk (2)
  • ChoYoungSeo (2)
  • davidfstein (2)
  • Floriangarcia7 (2)
  • zixiliuUSC (1)
  • sodre (1)
  • Sanchita333 (1)
  • Sandy4321 (1)
  • chanlevan (1)
  • ricew4ng (1)
  • Lihengwannafly (1)
  • horvathr (1)
Pull Request Authors
  • instabaines (4)
  • dependabot[bot] (4)
  • jade-bejide (2)
  • imvrusso (2)
  • Crosswind (2)
  • eliorc (2)
  • Vincent-Ustach (1)
  • Neronuser (1)
  • MohammadHeydari (1)
  • raminqaf (1)
  • kkteru (1)
  • ninpnin (1)
  • despotovski01 (1)
  • jinhangjiang (1)
  • ndrus-softserve (1)
Top Labels
Issue Labels
question (6) bug (2) enhancement (1)
Pull Request Labels
dependencies (4)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 22,986 last-month
  • Total docker downloads: 54
  • Total dependent packages: 10
    (may contain duplicates)
  • Total dependent repositories: 89
    (may contain duplicates)
  • Total versions: 30
  • Total maintainers: 1
pypi.org: node2vec

Implementation of the node2vec algorithm

  • Versions: 17
  • Dependent Packages: 10
  • Dependent Repositories: 88
  • Downloads: 22,986 Last month
  • Docker Downloads: 54
Rankings
Dependent packages count: 1.1%
Dependent repos count: 1.6%
Downloads: 1.6%
Stargazers count: 1.9%
Average: 2.2%
Forks count: 3.2%
Docker downloads count: 3.8%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/eliorc/node2vec
  • Versions: 10
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
conda-forge.org: node2vec

The node2vec algorithm learns continuous representations for nodes in any (un)directed, (un)weighted graph.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Forks count: 11.7%
Stargazers count: 12.5%
Dependent repos count: 24.4%
Average: 25.0%
Dependent packages count: 51.6%
Last synced: 6 months ago