knowalk

An algorithm that produces embeddings from a knowledge graph using biased random walks driven by user-specified weights. The algorithm runs in parallel on multiple CPUs; by default it uses the number of CPUs on the machine minus one.

https://github.com/freh-g/knowalk

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: freh-g
License: agpl-3.0
Language: Python
Default Branch: main
Homepage:
Size: 28.4 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme License Citation

README.md

KnoWalk

KnoWalk is an algorithm that produces node embeddings from a knowledge graph. It is optimized to run in parallel on multiple CPUs, using by default the number of available CPUs minus one. KnoWalk takes as input a knowledge graph, supplied as a .csv file without an index column and with the following headers:

source: id of the source
target: id of the target
rel_type: type of the interaction
source_type: type of the source
target_type: type of the target

An example is available in data/kg_edgelist.csv. This knowledge graph contains 41k nodes of 4 types (function, phenotype, drug and protein) and ~60 relationship types.
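As a quick sanity check before running the algorithm, the expected headers can be verified with a few lines of Python. This is a minimal sketch using only the standard library; the helper name read_edgelist and the sample rows are made up for illustration and are not part of KnoWalk or taken from data/kg_edgelist.csv.

```python
import csv
import io

# Column names listed in the README.
REQUIRED_COLUMNS = ["source", "target", "rel_type", "source_type", "target_type"]


def read_edgelist(handle):
    """Read edge rows from an open CSV handle, checking the required headers.

    Hypothetical helper for illustration; KnoWalk's own loader may differ.
    """
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in present]
    if missing:
        raise ValueError(f"edge list is missing columns: {missing}")
    # Keep only the required columns, in a fixed order.
    return [{col: row[col] for col in REQUIRED_COLUMNS} for row in reader]


# Tiny made-up edge list (illustrative values only).
sample = io.StringIO(
    "source,target,rel_type,source_type,target_type\n"
    "D1,P1,binds,drug,protein\n"
    "P1,F1,involved_in,protein,function\n"
)
edges = read_edgelist(sample)
print(len(edges), "edges read")
```

For a real file, the same function could be called with open("data/kg_edgelist.csv", newline="") in place of the in-memory sample.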

KnoWalk needs several dependencies. An environment named KnoWalk can be created by running

conda env create -f KnoWalk.yml

Then activate the environment with

conda activate KnoWalk

The algorithm builds an nx.MultiDiGraph() from the edge list file and generates biased walks that are used as input to the Word2Vec algorithm.
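The idea of a type-biased walk can be illustrated with a small sketch. It is not the KW2VEC.py implementation: it uses a plain adjacency dictionary in place of nx.MultiDiGraph so that it runs with the standard library only, and the function names, the default weight of 1.0 for unlisted type pairs, and the treatment of weight 0 as "never taken" are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build outgoing-neighbor lists and a node-type lookup.

    `edges` holds (source, target, source_type, target_type) tuples.
    Parallel edges are kept as repeated neighbors, loosely mimicking a multigraph.
    """
    adjacency = defaultdict(list)
    node_type = {}
    for source, target, source_type, target_type in edges:
        adjacency[source].append(target)
        node_type[source] = source_type
        node_type[target] = target_type
    return adjacency, node_type


def biased_walk(start, adjacency, node_type, weights, length, rng, default_weight=1.0):
    """Walk up to `length` nodes, choosing each step in proportion to the
    weight of the (current type, neighbor type) pair.

    default_weight for unlisted pairs is an assumption of this sketch.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = adjacency.get(current, [])
        if not neighbors:
            break  # dead end: no outgoing edges
        step_weights = [
            weights.get((node_type[current], node_type[n]), default_weight)
            for n in neighbors
        ]
        if sum(step_weights) <= 0:
            break  # every possible jump has weight 0
        walk.append(rng.choices(neighbors, weights=step_weights, k=1)[0])
    return walk


# Made-up toy graph and weights.
toy_edges = [
    ("D1", "P1", "drug", "protein"),
    ("P1", "F1", "protein", "function"),
    ("P1", "D1", "protein", "drug"),
]
toy_weights = {("protein", "function"): 10, ("protein", "drug"): 0}
adjacency, node_type = build_adjacency(toy_edges)
walk = biased_walk("D1", adjacency, node_type, toy_weights, length=4, rng=random.Random(0))
print(walk)
```

Each walk is a list of node ids, which is the kind of token sequence a Word2Vec implementation accepts as a "sentence".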

To tune the algorithm, the script accepts several parameters, which can be listed by running

python3 KW2VEC.py -h

An important component is the weight dictionary. It has to be passed in the call as a Python dictionary in which each key is a tuple giving the jump to weight, from source node type to target node type (node types have to match those in the source_type and target_type columns of the edge list), and each value is the assigned weight. For kg_edgelist.csv, a correct call specifying weights could be:

python KW2VEC.py -e data/kg_edgelist.csv -w "{('drug','protein'): 0,('protein','function'): 10,('function','phenotype'):100}" -s True -o outputs/kg_embeddings.pickle
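The string passed with -w has the shape of a Python dictionary literal. How KW2VEC.py parses it is not stated here; the sketch below shows one safe way to turn such a string into a dictionary and check its shape, using ast.literal_eval from the standard library. The helper name parse_weights and the validation rules (two-string tuple keys, non-negative numeric values) are assumptions for illustration.

```python
import ast


def parse_weights(text):
    """Parse a weight dictionary given as a Python literal string.

    Hypothetical helper; checks that keys are (source_type, target_type)
    tuples of strings and that values are non-negative numbers.
    """
    weights = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    if not isinstance(weights, dict):
        raise ValueError("weights must be a dictionary")
    for key, value in weights.items():
        if not (isinstance(key, tuple) and len(key) == 2
                and all(isinstance(part, str) for part in key)):
            raise ValueError(f"invalid key {key!r}: expected (source_type, target_type)")
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            raise ValueError(f"invalid weight {value!r} for {key!r}")
    return weights


weights = parse_weights(
    "{('drug','protein'): 0,('protein','function'): 10,('function','phenotype'):100}"
)
print(weights[("protein", "function")])
```

Checking the keys against the node types present in the edge list before a long run can catch misspelled type names early.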

Owner

Name: Francesco Gualdi
Login: freh-g
Kind: user
Location: Barcelona
Company: Institut Hospital del Mar d'investigacions mediques (IMIM)

Repositories: 0
Profile: https://github.com/freh-g

BSc in Biology at UNIFE, MSc in Medical Biotechnology at UNIVR. Currently working in the medical bioinformatics group at IMIM, Barcelona, doing my PhD in Biomedicine.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Gualdi
  given-names: Francesco
  orcid: https://orcid.org/0000-0003-0449-9884
- family-names: Piñero
  given-names: Janet
  orcid: https://orcid.org/0000-0003-1244-7654
- family-names: Oliva
  given-names: Baldomero
  orcid: https://orcid.org/0000-0003-0702-0250
title: "knowalk"
version: 1.0.0
doi: 10.5281/zenodo.8233718
date-released: 2023-08-10
url: https://github.com/freh-g/knowalk
