biosynrescl

Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL

https://github.com/hajhouj/biosynrescl

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.7%) to scientific vocabulary

Last synced: 9 months ago

Repository

Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL

Basic Info

Host: GitHub
Owner: hajhouj
License: gpl-3.0
Language: Java
Default Branch: main
Size: 4.09 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme License Citation

BioSynResCL

BioSynResCL is a command-line tool for resolving biomedical terms to their corresponding concepts. It uses a dataset of abbreviations and a medical vocabulary to perform the resolution, and the entire computation can be done on either the CPU or on a GPU using OpenCL.

The tool is designed to make biomedical term resolution faster and more scalable, especially on large-scale datasets.

Background

BioSynResCL accompanies a research paper to be presented at the 6th IEEE International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (IEEE CloudTech'23).

How to Build

Clone this repository.
Navigate to the root directory of the project.
Run mvn clean package.

How to Use

Vocabulary data

BioSynResCL uses vocabulary datasets for term resolution. The datasets can be downloaded from the following links:

UMLS Terms vocabulary with all languages: Download from here.
UMLS Terms vocabulary for English only: Download from here.
MeSH Terms vocabulary: Download from here.

After downloading the desired dataset, decompress its contents and use the "terms.txt" file as a parameter when running either the benchmark or the resolution tool.

Run Benchmark

java -cp "./target/libs/*" com.hajhouj.biosynres.cl.benchmark.BioSynResBenchmark benchmark "queryTerm" "filename" "topN" "platform"

Where:

queryTerm is the term to resolve.
filename is the path to the vocabulary file (the "terms.txt" file from the downloaded dataset).
topN is the number of top results to retrieve.
platform is the platform to use (CPU or OPENCL).
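The four parameters above can be validated before any work starts. The following is a minimal sketch, not the tool's actual argument handling: the class BenchmarkArgs and its Platform enum are hypothetical names, it expects only the four values that follow the "benchmark" keyword, and it accepts the platform in any letter case, which the real tool may or may not do.

```java
import java.util.Locale;

/** Hypothetical holder for the four benchmark parameters described above. */
final class BenchmarkArgs {
    /** The two platform values named in the README. */
    enum Platform { CPU, OPENCL }

    final String queryTerm;
    final String filename;
    final int topN;
    final Platform platform;

    private BenchmarkArgs(String queryTerm, String filename, int topN, Platform platform) {
        this.queryTerm = queryTerm;
        this.filename = filename;
        this.topN = topN;
        this.platform = platform;
    }

    /** Parses {queryTerm, filename, topN, platform}; throws IllegalArgumentException on bad input. */
    static BenchmarkArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException("Expected: queryTerm filename topN platform");
        }
        int topN;
        try {
            topN = Integer.parseInt(args[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("topN must be an integer: " + args[2], e);
        }
        if (topN < 1) {
            throw new IllegalArgumentException("topN must be at least 1: " + topN);
        }
        // Enum.valueOf throws IllegalArgumentException for anything other than CPU or OPENCL.
        Platform platform = Platform.valueOf(args[3].toUpperCase(Locale.ROOT));
        return new BenchmarkArgs(args[0], args[1], topN, platform);
    }

    public static void main(String[] args) {
        BenchmarkArgs parsed = parse(args);
        System.out.println(parsed.queryTerm + " | " + parsed.filename
                + " | top " + parsed.topN + " | " + parsed.platform);
    }
}
```

Failing early on a malformed topN or an unknown platform avoids loading a large vocabulary file only to stop afterwards.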

Selecting OpenCL Device

If you opt to use OpenCL as the computational platform, you will need to specify the OpenCL device ID for computation. You can determine the available devices and their IDs by running the following command:

java -cp "./target/libs/*" com.hajhouj.biosynres.cl.benchmark.BioSynResBenchmark devices

The output should look something like this:

DEVICE QUERY | DEVICE NAME
-------------+-------------
0.0          | Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
0.1          | Intel(R) UHD Graphics 630
0.2          | AMD Radeon Pro 555X Compute Engine

To use a specific device, include its query ID as a JVM parameter using -Duse-device. For example, to use the Intel(R) UHD Graphics 630 from the output above, your command should look like this:

java -Duse-device=0.1 -cp "./target/libs/*" com.hajhouj.biosynres.cl.benchmark.BioSynResBenchmark benchmark "Autism disorder" "vocabulary/all/terms.txt" 10 OPENCL
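A -D option on the java command line sets a JVM system property, so -Duse-device=0.1 can be read with System.getProperty("use-device"). The sketch below shows one way such a value could be split into two indices. It is an illustration only: the class DeviceQuery is hypothetical, the reading of the query as platform index, then device index, is an assumption based on the listing above, and the "0.0" fallback is chosen for the example rather than taken from the tool.

```java
/** Hypothetical helper: splits a device query such as "0.1" into two indices. */
final class DeviceQuery {
    final int platformIndex; // assumed meaning of the number before the dot
    final int deviceIndex;   // assumed meaning of the number after the dot

    private DeviceQuery(int platformIndex, int deviceIndex) {
        this.platformIndex = platformIndex;
        this.deviceIndex = deviceIndex;
    }

    /** Parses "<platform>.<device>"; throws IllegalArgumentException for anything else. */
    static DeviceQuery parse(String query) {
        if (query == null) {
            throw new IllegalArgumentException("Device query is missing");
        }
        String[] parts = query.trim().split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <platform>.<device>, got: " + query);
        }
        try {
            int platform = Integer.parseInt(parts[0]);
            int device = Integer.parseInt(parts[1]);
            if (platform < 0 || device < 0) {
                throw new IllegalArgumentException("Indices must not be negative: " + query);
            }
            return new DeviceQuery(platform, device);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Non-numeric device query: " + query, e);
        }
    }

    public static void main(String[] args) {
        // -Duse-device=0.1 becomes the system property "use-device".
        // The "0.0" default is an assumption made for this sketch.
        String raw = System.getProperty("use-device", "0.0");
        DeviceQuery q = parse(raw);
        System.out.println("platform=" + q.platformIndex + " device=" + q.deviceIndex);
    }
}
```

Note that the -D option must appear before -cp and the class name, as in the command above. Anything placed after the class name is passed to the program as an ordinary argument instead.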

Run Resolution Tool

java -cp "./target/libs/*" com.hajhouj.biosynres.cl.resolve.BioSynResolutionTool resolve "queryTerm" "acronymsFileName" "vocabularyFileName" "topN" "platform"

Where:

queryTerm is the term to resolve.
acronymsFileName is the filename of the input file containing abbreviations.
vocabularyFileName is the filename of the input file containing vocabulary.
topN is the number of top results to retrieve.
platform is the platform to use (CPU or OPENCL).
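To make the role of queryTerm and topN concrete, here is a minimal CPU-only sketch of top-N resolution. It assumes that similarity is measured with Levenshtein edit distance over lower-cased strings, which this README does not state, and it uses a small in-memory vocabulary because the layout of "terms.txt" and of the abbreviations file is not documented here. The class TopNResolver and its methods are hypothetical names, not part of the tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical CPU-only illustration of ranking vocabulary terms against a query. */
final class TopNResolver {

    /** Levenshtein edit distance using two rolling rows of the DP table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns up to topN vocabulary terms, closest first; ties keep vocabulary order. */
    static List<String> resolve(String query, List<String> vocabulary, int topN) {
        String q = query.toLowerCase(Locale.ROOT);
        int n = vocabulary.size();
        int[] dist = new int[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            // Each distance depends only on the query and one term.
            dist[i] = levenshtein(q, vocabulary.get(i).toLowerCase(Locale.ROOT));
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> Integer.compare(dist[x], dist[y]));
        List<String> best = new ArrayList<>();
        for (int k = 0; k < Math.min(topN, n); k++) {
            best.add(vocabulary.get(order[k]));
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample terms, standing in for a real vocabulary file.
        List<String> vocabulary = Arrays.asList(
                "Asthma", "Autistic Disorder", "Diabetes Mellitus", "Autism");
        System.out.println(resolve("Autism disorder", vocabulary, 2));
    }
}
```

Because every distance is computed independently of the others, the scoring loop is the kind of work that can be spread across many GPU threads. How the repository's OpenCL code actually does this is not shown in this README.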

Dependencies

To run BioSynResCL, you will need:

Java 8 or later
An OpenCL-compatible GPU (for the OPENCL platform)

Contributing

Contributions are very welcome. Feel free to fork this project, make your changes, and submit a pull request. If you have found a bug or have a feature request, please open an issue.

Citation

If you find the code in this repository useful in your research, we kindly request that you cite our work. Here is the BibTeX entry for our paper:

@inproceedings{hajhouj2023enhancing,
  title={Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL: A Use Case Study},
  author={Hajhouj, Mohammed and Abik, Mounia and Zarnoufi, Randa},
  booktitle={2023 IEEE 6th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}

License

BioSynResCL is released under the GNU General Public License v3.0.

UMLS Data Usage and Compliance Information

This project utilizes concept names derived from the UMLS Metathesaurus, a comprehensive database managed by the National Library of Medicine (NLM). The inclusion of UMLS data is specifically for benchmarking purposes in the context of accelerated syntactic resolution using OpenCL within this open-source project.

Please Note: The usage of UMLS data is governed by the terms of the UMLS Metathesaurus License Agreement. Users of this repository are responsible for ensuring their use of the UMLS data complies with these terms. It is important to acknowledge that redistribution of the UMLS data, even in part, may be subject to restrictions as per the license agreement.

For full details on the licensing terms and conditions, please refer to the UMLS Metathesaurus License Agreement.

We encourage all users to review the license terms and adhere to them strictly to ensure compliance with copyright laws and regulations.

Owner

Login: hajhouj
Kind: user

Repositories: 1
Profile: https://github.com/hajhouj

Citation (CITATION.cff)

# Citation for this repository
If you use the code from this repository in your research, please cite the following paper:

@inproceedings{hajhouj2023enhancing,
  title={Enhancing Syntactic Resolution in Biomedical Data Processing with OpenCL: A Use Case Study},
  author={Hajhouj, Mohammed and Abik, Mounia and Zarnoufi, Randa},
  booktitle={2023 IEEE 6th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)},
  pages={1--8},
  year={2023},
  organization={IEEE}
}
