knowalk
Algorithm that produces embeddings from a knowledge graph using biased random walks driven by user-specified weights. The algorithm runs in parallel on multiple CPUs; by default it uses the number of CPUs on the machine minus one.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.2%) to scientific vocabulary
Repository
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
KnoWalk
KnoWalk is an algorithm that produces node embeddings from a knowledge graph. It is optimized to run in parallel on multiple CPUs; by default it uses the number of available CPUs minus one. KnoWalk takes as input a knowledge graph in .csv format, without an index column and with the following headers:
- source: id of the source
- target: id of the target
- rel_type: type of the interaction
- source_type: type of the source
- target_type: type of the target
An example is available in data/kg_edgelist.csv. This knowledge graph is composed of 41k nodes of 4 types (function, phenotype, drug and protein) and ~60 types of relationships.
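As an illustration of the expected layout, here is a minimal sketch that reads an edgelist and checks that the five headers listed above are present. The helper name `read_edgelist` and the three sample rows (including the relationship names) are hypothetical and are not taken from data/kg_edgelist.csv; only the column names come from this README.

```python
import csv
import io

# Column names required by the README; everything else here is illustrative.
REQUIRED_COLUMNS = ["source", "target", "rel_type", "source_type", "target_type"]


def read_edgelist(handle):
    """Read edge rows from an open CSV handle and check the required headers."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"edgelist is missing columns: {missing}")
    return list(reader)


# Made-up rows, only to show the shape of the file (no index column).
SAMPLE = """source,target,rel_type,source_type,target_type
D1,P1,targets,drug,protein
P1,F1,involved_in,protein,function
F1,PH1,associated_with,function,phenotype
"""

edges = read_edgelist(io.StringIO(SAMPLE))
```

To check a real file, the same function can be called on `open("data/kg_edgelist.csv", newline="")`.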
KnoWalk needs several dependencies. An environment named KnoWalk can be created by running
conda env create -f KnoWalk.yml
Then activate the environment with
conda activate KnoWalk
The algorithm builds an nx.MultiDiGraph() from the edgelist file and generates biased walks that are used as input to the Word2Vec algorithm.
To tune the algorithm, the script accepts several parameters, which can be listed by running
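The idea of a walk biased by node-type weights can be sketched as follows. This is not the code in KW2VEC.py: it uses a plain adjacency dictionary in place of nx.MultiDiGraph() so that it runs with the standard library alone, and it assumes that the probability of following an edge is proportional to the weight of its (source_type, target_type) pair, that pairs missing from the dictionary get a default weight of 1, and that a weight of 0 forbids the jump. The function names and the sample edges are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to its outgoing (target, source_type, target_type) edges."""
    adjacency = defaultdict(list)
    for e in edges:
        adjacency[e["source"]].append((e["target"], e["source_type"], e["target_type"]))
    return adjacency


def biased_walk(adjacency, start, length, weights, default_weight=1.0, rng=random):
    """Walk up to `length` nodes, choosing each edge in proportion to its type-pair weight."""
    walk = [start]
    while len(walk) < length:
        out_edges = adjacency.get(walk[-1], [])
        # Assumption: unlisted type pairs fall back to default_weight.
        edge_weights = [
            weights.get((s_type, t_type), default_weight)
            for _, s_type, t_type in out_edges
        ]
        if not out_edges or sum(edge_weights) == 0:
            break  # dead end: no outgoing edge with a positive weight
        target, _, _ = rng.choices(out_edges, weights=edge_weights, k=1)[0]
        walk.append(target)
    return walk


# Made-up edges; node ids are illustrative.
sample_edges = [
    {"source": "P1", "target": "F1", "source_type": "protein", "target_type": "function"},
    {"source": "P1", "target": "D1", "source_type": "protein", "target_type": "drug"},
    {"source": "F1", "target": "PH1", "source_type": "function", "target_type": "phenotype"},
]
sample_weights = {
    ("protein", "function"): 10,
    ("protein", "drug"): 0,
    ("function", "phenotype"): 100,
}
walk = biased_walk(build_adjacency(sample_edges), "P1", 5, sample_weights)
```

Each walk is a list of node ids, which is the "sentence" shape that Word2Vec implementations generally expect as training input.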
python3 KW2VEC.py -h
An important component is the weight dictionary. It has to be passed in the call as a Python dictionary in which each key is a tuple describing the jump to weight, from one node type to another (the node types must match those used in the source_type and target_type columns of the edgelist), and each value is the weight assigned to that jump. For kg_edgelist.csv, a correct call specifying weights could be:
python KW2VEC.py -e data/kg_edgelist.csv -w "{('drug','protein'): 0,('protein','function'): 10,('function','phenotype'):100}" -s True -o outputs/kg_embeddings.pickle
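To make the expected shape of the -w argument concrete, the sketch below turns such a string into a dictionary and validates it. Whether KW2VEC.py parses the argument this way is an assumption; `parse_weights` is a hypothetical helper, and `ast.literal_eval` is used here because it accepts only Python literals rather than arbitrary code.

```python
import ast


def parse_weights(text):
    """Parse a weight dictionary string into {(source_type, target_type): weight}."""
    weights = ast.literal_eval(text)
    if not isinstance(weights, dict):
        raise ValueError("weights must be a dictionary")
    for key, value in weights.items():
        is_type_pair = (
            isinstance(key, tuple)
            and len(key) == 2
            and all(isinstance(part, str) for part in key)
        )
        if not is_type_pair:
            raise ValueError(f"invalid key {key!r}: expected (source_type, target_type)")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"invalid weight {value!r} for {key!r}: expected a number")
    return weights


# The same string used in the example call above.
weights = parse_weights(
    "{('drug','protein'): 0,('protein','function'): 10,('function','phenotype'):100}"
)
```

Validating the keys up front makes a mistyped node type or a malformed tuple fail at start-up instead of silently producing unweighted walks.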
Owner
- Name: Francesco Gualdi
- Login: freh-g
- Kind: user
- Location: Barcelona
- Company: Institut Hospital del Mar d'investigacions mediques (IMIM)
- Repositories: 0
- Profile: https://github.com/freh-g
BSc in Biology at UNIFE, MSc in Medical Biotechnology at UNIVR. Currently working in the medical bioinformatics group at IMIM, Barcelona, doing my PhD in Biomedicine.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Gualdi
    given-names: Francesco
    orcid: https://orcid.org/0000-0003-0449-9884
  - family-names: Piñero
    given-names: Janet
    orcid: https://orcid.org/0000-0003-1244-7654
  - family-names: Oliva
    given-names: Baldomero
    orcid: https://orcid.org/0000-0003-0702-0250
title: "knowalk"
version: 1.0.0
doi: 10.5281/zenodo.8233718
date-released: 2023-08-10
url: https://github.com/freh-g/knowalk