knowalk

An algorithm that produces embeddings from a knowledge graph using random walks biased by user-specified weights. The algorithm runs in parallel on multiple CPUs, by default the number of CPUs on the machine minus one.

https://github.com/freh-g/knowalk

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: freh-g
  • License: agpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 28.4 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 3 years ago · Last pushed over 2 years ago

README.md

KnoWalk

KnoWalk is an algorithm that produces node embeddings from a knowledge graph. It is optimized to run in parallel on multiple CPUs, by default the number of available CPUs minus one. KnoWalk takes as input a knowledge graph supplied as a .csv edge list without an index column and with the following headers:

  • source: id of the source
  • target: id of the target
  • rel_type: type of the interaction
  • source_type: type of the source
  • target_type: type of the target

An example is available in data/kg_edgelist.csv. This knowledge graph is composed of 41k nodes of 4 types (function, phenotype, drug and protein) and ~60 types of relationships.
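As an illustration of the expected layout, the following minimal sketch reads an edge list and checks that the five headers listed above are present. The helper name `read_edgelist` and the two sample rows are hypothetical and are not taken from the repository or from data/kg_edgelist.csv; KnoWalk itself may load the file differently.

```python
import csv
import io

# Headers required by the README.
REQUIRED_COLUMNS = ["source", "target", "rel_type", "source_type", "target_type"]


def read_edgelist(handle):
    """Read edge rows from a CSV file object and check the required headers.

    Hypothetical helper for illustration; not part of KnoWalk.
    """
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"edge list is missing columns: {missing}")
    return list(reader)


# Made-up sample rows in the same layout (no index column).
sample = io.StringIO(
    "source,target,rel_type,source_type,target_type\n"
    "D1,P1,targets,drug,protein\n"
    "P1,F1,involved_in,protein,function\n"
)
edges = read_edgelist(sample)
```

To check a real file, the same function can be called on `open("data/kg_edgelist.csv", newline="")`.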

KnoWalk needs several dependencies. An environment named KnoWalk can be created by running

conda env create -f KnoWalk.yml

Then activate the environment with

conda activate KnoWalk

The algorithm builds an nx.MultiDiGraph() from the edge list file and generates biased walks, which are used as input to the Word2Vec algorithm.
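The idea of a type-biased walk can be sketched as follows. This is not KnoWalk's implementation: a plain adjacency dictionary stands in for the nx.MultiDiGraph so the example needs only the standard library, and the function names are hypothetical. It also assumes that the next node is drawn with probability proportional to the weight of the (current node type, neighbour type) pair, that unlisted pairs get weight 1, and that a walk stops when no neighbour has a positive weight. The README does not confirm any of these details.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to its outgoing (target, rel_type) pairs.

    Parallel edges are kept, as they would be in a multigraph.
    """
    adjacency = defaultdict(list)
    node_type = {}
    for source, target, rel_type, source_type, target_type in edges:
        adjacency[source].append((target, rel_type))
        node_type[source] = source_type
        node_type[target] = target_type
    return adjacency, node_type


def biased_walk(adjacency, node_type, weights, start, length, rng):
    """Return one walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = adjacency.get(current, [])
        # Assumption: type pairs missing from `weights` default to weight 1.
        w = [weights.get((node_type[current], node_type[t]), 1) for t, _ in neighbours]
        if sum(w) == 0:  # dead end, or every jump has weight 0
            break
        target, _ = rng.choices(neighbours, weights=w, k=1)[0]
        walk.append(target)
    return walk


# Made-up toy graph: the drug -> protein jump is disabled by weight 0.
edges = [
    ("D1", "P1", "targets", "drug", "protein"),
    ("D1", "F1", "affects", "drug", "function"),
    ("P1", "F1", "involved_in", "protein", "function"),
]
adjacency, node_type = build_adjacency(edges)
weights = {("drug", "protein"): 0, ("drug", "function"): 5}
walk = biased_walk(adjacency, node_type, weights, "D1", 3, random.Random(0))
```

Each walk is a list of node ids, which is the token-list shape that a Word2Vec implementation expects as a sentence.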

To tune the algorithm, the script accepts several parameters, which can be listed by running

python3 KW2VEC.py -h

An important component is the weight dictionary. It has to be passed in the call as a Python dictionary in which each key is a tuple naming the jump from one node type to another (the node types must match those in the source_type and target_type columns of the edge list) and each value is the weight assigned to that jump. For kg_edgelist.csv, a correct call specifying weights could be:

python KW2VEC.py -e data/kg_edgelist.csv -w "{('drug','protein'): 0,('protein','function'): 10,('function','phenotype'):100}" -s True -o outputs/kg_embeddings.pickle
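The string passed to -w is a Python dictionary literal. One safe way to turn such a string into a dictionary is `ast.literal_eval`, sketched below with some basic validation. How KW2VEC.py actually parses this argument is not stated in the README, so the helper `parse_weights` and its checks are assumptions made for illustration.

```python
import ast


def parse_weights(text):
    """Parse a weight dictionary literal such as the -w argument above.

    Hypothetical helper; KW2VEC.py may parse the argument differently.
    """
    weights = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    if not isinstance(weights, dict):
        raise ValueError("weights must be a dictionary")
    for key, value in weights.items():
        # Each key is a (source node type, target node type) pair.
        if not (isinstance(key, tuple) and len(key) == 2
                and all(isinstance(part, str) for part in key)):
            raise ValueError(f"invalid key: {key!r}")
        # Assumption: weights are non-negative numbers.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            raise ValueError(f"invalid weight for {key!r}: {value!r}")
    return weights


weights = parse_weights(
    "{('drug','protein'): 0,('protein','function'): 10,('function','phenotype'):100}"
)
```

Using `ast.literal_eval` rather than `eval` means a malformed or malicious argument raises an error instead of executing code.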

Owner

  • Name: Francesco Gualdi
  • Login: freh-g
  • Kind: user
  • Location: Barcelona
  • Company: Institut Hospital del Mar d'investigacions mediques (IMIM)

BSc in Biology at UNIFE, MSc in Medical Biotechnology at UNIVR. Currently working in the medical bioinformatics group at IMIM, Barcelona, doing my PhD in Biomedicine.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Gualdi
  given-names: Francesco
  orcid: https://orcid.org/0000-0003-0449-9884
- family-names: Piñero
  given-names: Janet
  orcid: https://orcid.org/0000-0003-1244-7654
- family-names: Oliva
  given-names: Baldomero
  orcid: https://orcid.org/0000-0003-0702-0250
title: "knowalk"
version: 1.0.0
doi: 10.5281/zenodo.8233718
date-released: 2023-08-10
url: https://github.com/freh-g/knowalk
