ontoclue_docker
This repository contains the code for the Docker version of the OntoClue Project.
Docker Container for OntoClue
This repository contains the code for the Docker container for OntoClue.
About OntoClue
OntoClue is a project that explores various embedding approaches to assess document-to-document similarity using the RELISH Corpus. RELISH is an expert-curated database designed for benchmarking document similarity in biomedical literature. It consists of PubMed IDs (PMIDs) paired with expert-curated document-to-document relevance assessments with respect to other PMIDs. Relevance is categorized as "relevant", "partial" or "irrelevant".
This project consists of several approaches, each with detailed explanations and documentation. These approaches can be executed individually by following the instructions provided in their respective repositories.
- Word2doc2vec
- Doc2vec
- fastText
- WMD-Word2vec
- BERT
- Hybrid-pre-word2doc2vec
- Hybrid-pre-doc2vec
- Hybrid-pre-fasttext
- Hybrid-pre-wmd-word2vec
- Hybrid-post-word2doc2vec
- Hybrid-post-fasttext
- Hybrid-post-wmd-word2vec
- Hybrid-postreduction-word2doc2vec
- Hybrid-postreduction-fasttext
- Hybrid-postreduction-wmd-word2vec
- Word2doc2vec using Pre-trained Word2Vec model
- fastText using Pre-trained fastText model
- WMD-Word2vec using Pre-trained Word2Vec model
Dockerized version of OntoClue
The Docker container for this project ensures reproducibility of the runs, allowing for consistent training and evaluation of different neural network models on the document-to-document similarity within the RELISH Corpus. The entire pipeline can be executed with a few commands. This pipeline includes:
- Cloning the corresponding repository for the selected approach, which contains all the necessary code.
- Downloading the appropriate pre-processed datasets based on the chosen approach (normal text vs annotated text).
- Running tests to verify dataset integrity and reproducibility of runs.
- Evaluating the models.
NOTE: The downloaded datasets have already been preprocessed with the pipeline described in the relish-preprocessing repository, and they include annotated datasets as well. Documentation on the datasets, data preprocessing, and annotation is also available.
Requirements
To get started with the pipeline, you must have Docker installed. Follow the instructions below to install it.
Setting up Docker on Linux
For apt-based Linux distributions such as Ubuntu, execute the following. The commands use Docker's Ubuntu repository for the focal (20.04) release; other distributions and releases need a different repository line.
Update your existing list of packages:
sudo apt update
Install a few prerequisite packages which let apt use packages over HTTPS:
sudo apt install apt-transport-https ca-certificates curl software-properties-common
Add the GPG key for the official Docker repository:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Add the Docker repository to APT sources:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
Update the package database with the Docker packages:
sudo apt update
Install Docker:
sudo apt install docker-ce
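As a quick sanity check after installation, the following sketch tests whether the docker CLI can be found before you move on to the later steps. The function name docker_available is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): report whether the
# docker CLI can be found on the current PATH.
docker_available() {
    if command -v docker >/dev/null 2>&1; then
        echo "docker found"
    else
        echo "docker not found"
        return 1
    fi
}
```

If the check passes, running `sudo docker run hello-world` (Docker's official test image) is a common way to confirm that the daemon can also start containers.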
Getting Started
1. Clone the Repository
First, clone the repository to your local machine using the following command:
Using HTTPS:
git clone https://github.com/zbmed-semtec/private-ontoclue-project.git
Using SSH:
Ensure you have set up SSH keys in your GitHub account.
git clone git@github.com:zbmed-semtec/private-ontoclue-project.git
You will also need to set up your SSH agent so that it works with the Docker container:
eval $(ssh-agent) > /dev/null
ssh-add -k /path/to/your/key
Example with the default key name:
eval $(ssh-agent) > /dev/null
ssh-add -k /home/user/.ssh/id_ed25519
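Before starting the container, it can help to confirm that the agent is actually reachable from the current shell. The sketch below checks that SSH_AUTH_SOCK is set and points at a socket; the function name agent_socket_ready is hypothetical.

```shell
# Hypothetical helper: succeed only if SSH_AUTH_SOCK is set and points at an
# existing socket, which is what the container needs to reach the agent.
agent_socket_ready() {
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}
```

For example, `agent_socket_ready && ssh-add -l` lists the keys currently loaded in the agent, so you can see whether your key was added.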
2. Building the Docker Image:
Change the directory to private-ontoclue-project and execute the following command:
sudo docker build -t ontoclue .
3. Running the Docker Container:
sudo docker run -it ontoclue
If you are using the SSH agent, you will need to pass the SSH agent socket to the container:
docker run --mount type=bind,source=$SSH_AUTH_SOCK,target=/ssh-agent --env SSH_AUTH_SOCK=/ssh-agent -it ontoclue
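The two run commands above differ only in the agent-forwarding options, so they can be folded into one wrapper. This is a minimal sketch, assuming your user can run docker without sudo (as in the agent-forwarding command) and that the image was tagged ontoclue in the build step; the function name run_ontoclue is hypothetical.

```shell
# Hypothetical wrapper: start the ontoclue container, forwarding the SSH
# agent socket only when SSH_AUTH_SOCK is set in the current shell.
run_ontoclue() {
    if [ -n "${SSH_AUTH_SOCK:-}" ]; then
        # Same options as the agent-forwarding command in this README.
        set -- --mount "type=bind,source=$SSH_AUTH_SOCK,target=/ssh-agent" \
               --env SSH_AUTH_SOCK=/ssh-agent
    else
        # No agent available: run without any extra options.
        set --
    fi
    docker run "$@" -it ontoclue
}
```

Calling `run_ontoclue` then behaves like the plain command when no agent is running and like the bind-mount command when one is.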
4. Selecting Embedding Approach:
After running the container, you will be prompted to select an embedding approach:

Upon selecting an approach, the corresponding repository will be cloned from GitHub, and the appropriate datasets will be downloaded based on the chosen approach.
5. Cloning the Repository
After you select an approach, you will be prompted to choose the cloning method for that approach's repository.

6. Running Tests [Optional]:
Once the datasets are downloaded, you will have the option to run tests. This is an optional step. These tests verify:
- If the data was downloaded to the correct directory.
- If the correct data corresponding to the selected approach was downloaded.
- Quick reproducibility checks between runs.
Depending on your preference, you can select y (yes) or n (no). You will see a prompt like this:

7. Selecting Class Distribution:
After the tests are completed, you will be prompted to select the class distribution. Depending on your preference, you can select 3 (three class distribution) or 2 (two class distribution).

Following this, you will see a message indicating that the pipeline is being initiated. The pipeline runs 100 iterations, so it will take a while to complete.
8. Accessing and Viewing Log Files:
The progress of the run is logged into files named Optuna_trials_{class_distribution}.log. Follow the steps below to view these log files and copy output files from the Docker container to your local system.
1. List all containers:
First, list all running and stopped containers to find the one you need to access:
sudo docker ps -a
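If many containers exist on the machine, the listing can be narrowed to those created from the ontoclue image. The sketch below uses the --filter and --format options of docker ps; the function name list_ontoclue_containers is hypothetical, and the image name assumes the tag from the build step.

```shell
# Hypothetical helper: list only containers created from the "ontoclue"
# image, printing the container ID, status and name for each one.
list_ontoclue_containers() {
    sudo docker ps -a \
        --filter "ancestor=ontoclue" \
        --format '{{.ID}}  {{.Status}}  {{.Names}}'
}
```

The first column of the output is the container ID used in the following steps.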
2. Enter the container:
Access the running Docker container using its container ID:
sudo docker exec -it <container_id> /bin/bash
3. View log files:
Once inside the container, view the log file associated with your specific run:
cat <name_of_the_approach>/output_{3/2}/Optuna_trials_{3/2}.log
Here, replace {3/2} with 3 for a three-class distribution or 2 for a two-class distribution.
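To avoid typing the path by hand, the naming pattern above can be wrapped in a small function. This is a sketch only; the function name optuna_log_path is hypothetical, and the approach directory name must be the one created inside the container for your run.

```shell
# Hypothetical helper: build the log path from the approach directory name
# and the class distribution (3 or 2), following the pattern shown above.
optuna_log_path() {
    approach=$1
    classes=$2
    case "$classes" in
        2|3) echo "${approach}/output_${classes}/Optuna_trials_${classes}.log" ;;
        *)   echo "class distribution must be 2 or 3" >&2; return 1 ;;
    esac
}
```

Inside the container, `cat "$(optuna_log_path <name_of_the_approach> 3)"` prints the three-class log; replacing cat with `tail -f` follows the log while the run is in progress, assuming tail is available in the image.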
4. Output Directory
The output files are stored in:
- <name_of_the_approach>/output_3 for a three-class distribution.
- <name_of_the_approach>/output_2 for a two-class distribution.
5. Copying files to your local system
To copy files from the Docker container to your local system, use the following command:
sudo docker cp <container_id>:/<name_of_the_approach>/output_{3/2} <path_to_local_dir>
Replace the first path with the appropriate path inside the container and the second path with the destination directory on your local machine.
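The copy command can be parameterized in the same way as the paths above. The following is a minimal sketch, assuming the output directory sits at the container root as in the command shown; the function name copy_ontoclue_output is hypothetical.

```shell
# Hypothetical helper: copy the output directory of one run out of a
# container. Arguments: container ID, approach directory name,
# class distribution (3 or 2), local destination directory.
copy_ontoclue_output() {
    container=$1
    approach=$2
    classes=$3
    dest=$4
    sudo docker cp "${container}:/${approach}/output_${classes}" "$dest"
}
```

For example, `copy_ontoclue_output <container_id> <name_of_the_approach> 3 ./results` copies the three-class output into a local results directory.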
Owner
- Name: zbmed-semtec
- Login: zbmed-semtec
- Kind: organization
- Repositories: 12
- Profile: https://github.com/zbmed-semtec
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Ravinder"
    given-names: "Rohitha"
    orcid: "https://orcid.org/0009-0004-4484-6283"
  - family-names: "Rebholz-Schuhmann"
    given-names: "Dietrich"
    orcid: "https://orcid.org/0000-0002-1018-0370"
  - family-names: "Castro"
    given-names: "Leyla Jael"
    orcid: "https://orcid.org/0000-0003-3986-0510"
title: "ontoclue_docker"
version: 1.0.0
license: GPL-3.0 license
date-released: 2025-02-13
repository-code: "https://github.com/zbmed-semtec/ontoclue_docker"