biosynrescl
Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.7%) to scientific vocabulary
Repository
Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL
Basic Info
- Host: GitHub
- Owner: hajhouj
- License: gpl-3.0
- Language: Java
- Default Branch: main
- Size: 4.09 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
BioSynResCL
BioSynResCL is a command-line tool for resolving biomedical terms to their corresponding concepts. It uses a dataset of abbreviations and a medical vocabulary to perform the resolution, and the entire computation can be done on either the CPU or on a GPU using OpenCL.
This tool was designed with the aim to improve speed, scalability and performance of resolving biomedical terms especially in large-scale datasets.
Background
This work, BioSynResCL, is the product of a research paper to be presented at the 6th IEEE International Conference on Computing and Artificial Intelligence: Technologies and Applications (IEEE CloudTech'23).
How to Build
- Clone this repository.
- Navigate to the root directory of the project.
- Run
mvn clean package.
How to Use
Vocabulary data
BioSynResCL uses vocabulary datasets for term resolution. The datasets can be downloaded from the following links:
- UMLS Terms vocabulary with all languages : Download from here.
- UMLS Terms vocabulary for English only : Download from here.
- MeSH Terms vocabulary : Download from here.
After downloading the desired dataset, decompress its contents and use the "terms.txt" file as a parameter when running either the benchmark or the resolution tool.
Run Benchmark
java -cp "./target/libs/*" com.hajhouj.biosynres.cl.benchmark.BioSynResBenchmark benchmark "queryTerm" "filename" "topN" "platform"
Where:
- queryTerm is the term to resolve.
- filename is the filename of the input file.
- topN is the number of top results to retrieve.
- platform is the platform to use (CPU or OPENCL).
Selecting OpenCL Device
If you opt to use OpenCL as the computational platform, you will need to specify the OpenCL device ID for computation. You can determine the available devices and their IDs by running the following command:
java -cp "./target/libs/*" com.hajhouj.biosynres.cl.benchmark.BioSynResBenchmark devices
The output should look something like this:
Use the desired device by adding its query as a JVM parameter using -Duse-device, for example if we want to use Intel(R) UHD Graphics 630 from the result above, we should use -Duse-device=0.1 :
DEVICE QUERY | DEVICE NAME
-------------+-------------
0.0 | Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
0.1 | Intel(R) UHD Graphics 630
0.2 | AMD Radeon Pro 555X Compute Engine
To use a specific device, include its query ID as a JVM parameter using -Duse-device. For example, to use the Intel(R) UHD Graphics 630 from the output above, your command should look like this:
java -Duse-device=0.1 -cp "./target/libs/*" com.hajhouj.biosynres.cl.benchmark.BioSynResBenchmark benchmark "Autism disorder" "vocabulary/all/terms.txt" 10 OPENCL
Run Resolution Tool
java -cp "./target/libs/*" com.hajhouj.biosynres.cl.resolve.BioSynResolutionTool resolve "queryTerm" "acronymsFileName" "vocabularyFileName" "topN" "platform"
Where:
- queryTerm is the term to resolve.
- acronymsFileName is the filename of the input file containing abbreviations.
- vocabularyFileName is the filename of the input file containing vocabulary.
- topN is the number of top results to retrieve.
- platform is the platform to use (CPU or OPENCL).
Dependencies
To run BioSynResCL, you will need:
- Java 8 or later
- An OpenCL compatible GPU (for OpenCL platform)
Contributing
Contributions are very welcome. Feel free to fork this project, make your changes, and submit a pull request. If you have found a bug or have a feature request, please open an issue.
Citation
If you find the code in this repository useful in your research, we kindly request that you cite our work. Here is the BibTeX entry for our paper:
@inproceedings{hajhouj2023enhancing,
title={Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL: A Use Case Study},
author={Hajhouj, Mohammed and Abik, Mounia and Zarnoufi, Randa},
booktitle={2023 IEEE 6th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)},
pages={1--8},
year={2023},
organization={IEEE}
}
License
BioSynResCL is released under GNU General Public License v3.0 .
UMLS Data Usage and Compliance Information
This project utilizes concept names derived from the UMLS Metathesaurus, a comprehensive database managed by the National Library of Medicine (NLM). The inclusion of UMLS data is specifically for benchmarking purposes in the context of accelerated syntactic resolution using OpenCL within this open-source project.
Please Note: The usage of UMLS data is governed by the terms of the UMLS Metathesaurus License Agreement. Users of this repository are responsible for ensuring their use of the UMLS data complies with these terms. It is important to acknowledge that redistribution of the UMLS data, even in part, may be subject to restrictions as per the license agreement.
For full details on the licensing terms and conditions, please refer to the UMLS Metathesaurus License Agreement.
We encourage all users to review the license terms and adhere to them strictly to ensure compliance with copyright laws and regulations.
Owner
- Login: hajhouj
- Kind: user
- Repositories: 1
- Profile: https://github.com/hajhouj
Citation (CITATION.cff)
# Citation for this repository
If you use the code from this repository in your research, please cite the following paper:
@inproceedings{hajhouj2023enhancing,
title={Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL: A Use Case Study},
author={Hajhouj, Mohammed and Abik, Mounia and Zarnoufi, Randa},
booktitle={2023 IEEE 6th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)},
pages={1--8},
year={2023},
organization={IEEE}
}
GitHub Events
Total
Last Year
Dependencies
- com.hajhouj.fastsico:fastsico 1.0.2
- mysql:mysql-connector-java 8.0.26
- org.apache.commons:commons-text 1.10.0