https://github.com/ammar257ammar/favlib

Fact Validation Library

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: science.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.7%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

Fact Validation Library

Basic Info

Host: GitHub
Owner: ammar257ammar
License: mit
Language: Python
Default Branch: master
Size: 68.1 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of MaastrichtU-IDS/FaVLib

Created over 6 years ago · Last pushed about 4 years ago

https://github.com/ammar257ammar/FaVLib/blob/master/

## FaVLib

To provide an integrated solution to the fact validation problem, we developed the Fact Validation Library (FaVLib), with a workflow implementation that uses Docker and the Common Workflow Language (CWL). The library trains a classifier on embedding features to predict the truthfulness of a given fact based on other facts already present in the knowledge graph.

FaVLib uses Docker and a CWL representation to configure the software components and execute the workflow. The workflow, depicted in the figure below, has three main steps:
1. data generation 
2. embedding learning
3. triple classification

![FaVLib Workflow](workflow_factvalid.png)

In the first step, data generation takes a knowledge graph (KG) and a set of parameters (e.g., the negative sampling strategy and the test fraction) and produces the data sets needed for the embedding learning and machine learning steps. Next, in the embedding learning step, the training positives are used to learn entity embeddings with the configured embedding method and embedding parameters. In the final step, basic classifiers are trained on the training data and evaluated on the test data using embedding features, and the feature vectors are exported so that they can be reused with other machine learning algorithms and platforms.

The library integrates several tools, including AYNEC (https://github.com/tdg-seville/AYNEC), which generates positive and negative samples and splits the data into training and test sets, and PyKEEN (https://github.com/SmartDataAnalytics/PyKEEN), which is used to learn embeddings with multiple embedding methods, together with machine learning methods for triple classification.

## How to run it

## Using Docker 
* First make sure [Docker](https://docs.docker.com/install/) is installed! ([see below](https://github.com/MaastrichtU-IDS/FaVLib#install-docker))
* Clone this repository
```shell
git clone https://github.com/MaastrichtU-IDS/FaVLib.git
```
* Move to the cloned repository
```shell
cd FaVLib
```

* Pull the container image (or build it if you prefer)

```shell
docker pull umids/favlib
```

* Run the container on port `8888`

```shell
docker run -d --rm --name favlib -p 8888:8888 -v "$(pwd)":/jupyter -v /tmp:/tmp umids/favlib
```

* Run a workflow:

```shell
docker exec -it favlib cwltool --outdir=/jupyter/output/ workflow/main-workflow-pykeen.cwl workflow/workflow-pykeen.yml
```
In this command:
* `workflow/main-workflow-pykeen.cwl`: the workflow description, defined in CWL
* `workflow/workflow-pykeen.yml`: the configuration file, where you define your inputs and parameters
* `--outdir=/jupyter/output/`: the output of the workflow is stored in `/jupyter/output/`

The configuration file, `workflow/workflow-pykeen.yml`, contains the following settings:
### Data Generation
* inputFile:  /jupyter/data/input/sample_kg.tsv
* minNumRel: 1
* negStrategy: 'change_source'
* fractionTest: 0.1
* predict: 1    # to make predictions
* predicate: '\'  # make predictions only for the given predicate
* numTrainNegatives: 1   # number of negatives for each triple in the training set
* numTestNegatives: 10   # number of negatives for each triple in the test set
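
To make the data generation parameters more concrete, here is a minimal sketch of what `fractionTest`, `negStrategy: 'change_source'`, `numTrainNegatives` and `numTestNegatives` could mean in practice. It is not AYNEC's implementation: the helper names `split_triples` and `corrupt_source`, the toy triples, and the reading of `change_source` as "replace the source entity of a positive triple" are all assumptions made for illustration.

```python
import random


def split_triples(triples, fraction_test, rng):
    """Shuffle the positive triples and split them into (train, test)."""
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * fraction_test))
    return shuffled[n_test:], shuffled[:n_test]


def corrupt_source(triple, entities, known, num_negatives, rng):
    """Build negatives by replacing the source entity of a positive triple.

    Candidates that would reproduce a known positive triple are skipped.
    """
    source, relation, target = triple
    candidates = [
        e for e in entities
        if e != source and (e, relation, target) not in known
    ]
    rng.shuffle(candidates)
    return [(e, relation, target) for e in candidates[:num_negatives]]


# Toy knowledge graph, made up for this example only.
kg = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "headache"),
    ("aspirin", "treats", "fever"),
    ("paracetamol", "treats", "fever"),
    ("insulin", "treats", "diabetes"),
    ("metformin", "treats", "diabetes"),
    ("amoxicillin", "treats", "infection"),
    ("penicillin", "treats", "infection"),
    ("ibuprofen", "treats", "inflammation"),
    ("naproxen", "treats", "inflammation"),
]

rng = random.Random(0)  # fixed seed so the split is reproducible
known = set(kg)
entities = sorted({s for s, _, _ in kg} | {t for _, _, t in kg})

# Values mirror fractionTest, numTrainNegatives and numTestNegatives above.
train, test = split_triples(kg, fraction_test=0.1, rng=rng)
train_negatives = [
    n for t in train for n in corrupt_source(t, entities, known, 1, rng)
]
test_negatives = [
    n for t in test for n in corrupt_source(t, entities, known, 10, rng)
]
```

With ten positive triples and a test fraction of 0.1, this yields nine training triples and one test triple, with one negative per training triple and ten negatives for the test triple.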
### Embedding Learning
* embedding_model_name: 'TransE'
* embedding_dim: 50
* normalization_of_entities: 2
* seed: 0
* scoring_function: 1
* margin_loss: 1
* learning_rate: 0.01
* batch_size: 64
* num_epochs: 50
* filter_negative_triples: True
* preferred_device: 'cpu'
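
As a rough illustration of what the `TransE` settings control, the sketch below computes the TransE distance between `head + relation` and `tail`, and the margin-based ranking loss for one positive/negative pair. It is not PyKEEN's code. Reading `scoring_function: 1` as the L1 norm and `margin_loss: 1` as the margin is an assumption, and the function names and embedding values are hypothetical.

```python
def transe_distance(head, relation, tail, norm=1):
    """Distance between head + relation and tail (L1 if norm == 1, else L2)."""
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return sum(d * d for d in diffs) ** 0.5


def margin_ranking_loss(pos_distance, neg_distance, margin=1.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(0.0, margin + pos_distance - neg_distance)


# Made-up 2-dimensional embeddings (the configuration above uses 50 dimensions).
head = [0.5, 0.25]
relation = [0.25, 0.5]
tail = [0.75, 0.75]            # head + relation lands exactly on this tail
corrupted_tail = [0.5, 0.5]    # tail of a negative triple

pos = transe_distance(head, relation, tail, norm=1)
neg = transe_distance(head, relation, corrupted_tail, norm=1)
loss = margin_ranking_loss(pos, neg, margin=1.0)
```

Here the positive triple has distance 0.0 and the negative has distance 0.5, so the loss is 0.5: the negative is not yet a full margin farther away than the positive, and training would keep pushing it away.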
### Triple Classification
* output_train: 0  # to output the training feature matrix (to test with a different model)
* output_test: 0  # to output the test feature matrix
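
The sketch below shows one way a feature matrix could be built from entity embeddings and exported for use with another model. The README does not specify how FaVLib constructs its features or which file format it writes, so concatenating the head and tail embeddings and writing tab-separated rows are both assumptions. `triple_features` and the embedding values are hypothetical.

```python
import csv
import io


def triple_features(triple, entity_embeddings):
    """Concatenate the head and tail entity embeddings into one feature vector."""
    head, _relation, tail = triple
    return list(entity_embeddings[head]) + list(entity_embeddings[tail])


# Made-up 2-dimensional embeddings; a real run would use the learned ones.
entity_embeddings = {
    "aspirin": [0.5, 0.25],
    "headache": [0.75, 0.75],
    "insulin": [0.0, 1.0],
}

# Each triple is paired with its label: 1 for a positive, 0 for a negative.
labelled_triples = [
    (("aspirin", "treats", "headache"), 1),
    (("insulin", "treats", "headache"), 0),
]

X = [triple_features(t, entity_embeddings) for t, _ in labelled_triples]
y = [label for _, label in labelled_triples]

# Write one row per triple: the feature values followed by the label.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
for features, label in zip(X, y):
    writer.writerow(features + [label])
feature_tsv = buffer.getvalue()
```

A matrix in this shape can be loaded by most machine learning tools, which is the kind of reuse the `output_train` and `output_test` options are meant to support.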


* That's it!

## Install Docker

https://docs.docker.com/install/

On Windows, Docker requires:

* Windows 10 64-bit: Pro, Enterprise, or Education (Build 15063 or later).

* Hyper-V and Containers Windows features must be enabled.

See [this page](https://d2s.semanticscience.org/docs/guide-docker) for basic information about how to run Docker, connect it to your laptop's storage, and deploy applications locally.

Owner

Name: Ammar Ammar
Login: ammar257ammar
Kind: user
Location: The Netherlands
Company: Maastricht University

Repositories: 14
Profile: https://github.com/ammar257ammar
