https://github.com/abdcelikkanat/revisitingkmers
This is the repository for the project entitled "Revisiting K-mer Profile for Effective and Scalable Genome Representation Learning"
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: abdcelikkanat
- Language: Python
- Default Branch: main
- Size: 87.9 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Revisiting K-mer Profile for Effective and Scalable Genome Representation Learning
Overview
This project explores effective and scalable approaches to genome representation learning that rely on k-mer features, applied to the metagenomic binning task.
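As a rough illustration of what a k-mer profile is, the sketch below counts every overlapping substring of length k in a DNA sequence and normalizes the counts into a fixed-length frequency vector. It is not taken from this repository: the function name `kmer_profile`, the default `k=4`, and the choice to skip windows containing ambiguous bases are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(sequence: str, k: int = 4) -> List[float]:
    """Return a normalized k-mer frequency vector for a DNA sequence.

    Hypothetical helper for illustration only; the repository's own
    feature extraction may differ.
    """
    # Fixed lexicographic ordering of all 4**k k-mers over A, C, G, T,
    # so that profiles of different sequences are directly comparable.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]

    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Only windows made purely of A/C/G/T contribute; windows with
    # ambiguous bases such as N are ignored (an assumption of this sketch).
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=2)
    print(len(profile))  # one entry per possible 2-mer
```

The vector has 4**k entries regardless of sequence length, which is what makes k-mer profiles convenient as fixed-size inputs for downstream models.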
Installation
- Clone this repository:
git clone https://github.com/abdcelikkanat/revisitingkmers.git
cd revisitingkmers
- Install dependencies: Make sure you have Python 3.8 installed. You can install the required Python packages using pip:
pip install -r requirements.txt
- Install gdown (if you don't already have it) for downloading the datasets:
pip install gdown
Datasets
To download and prepare the training dataset, run the following commands:
gdown 1p59ch_MO-9DXh3LUIvorllPJGLEAwsUp
unzip dnabert-s_train.zip
To download the evaluation datasets, use the following commands:
gdown 1I44T2alXrtXPZrhkuca6QP3tFHxDW98c
unzip dnabert-s_eval.zip
Usage
To view detailed usage instructions for each model, run its script with the --help flag:
Poisson Model
python poisson_model.py --help
Nonlinear Model
python nonlinear.py --help
Citation
If you find the work useful for your research, please consider citing the following paper:
@article{celikkanat2024revisiting,
title={Revisiting K-mer Profile for Effective and Scalable Genome Representation Learning},
author={Celikkanat, Abdulkadir and Masegosa, Andres R. and Nielsen, Thomas D.},
journal={Advances in Neural Information Processing Systems},
volume={37},
year={2024}
}
Owner
- Name: Abdulkadir Çelikkanat
- Login: abdcelikkanat
- Kind: user
- Location: İstanbul
- Repositories: 3
- Profile: https://github.com/abdcelikkanat
GitHub Events
Total
- Watch event: 6
- Push event: 1
Last Year
- Watch event: 6
- Push event: 1
Dependencies
- nvcr.io/nvidia/pytorch 23.04-py3 build
- numpy ==1.22.2
- scikit-learn ==1.2.0
- scipy ==1.10.1
- torch ==2.3.1
- tqdm ==4.65.0
- transformers ==4.42.3
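To check whether a local environment matches the pinned Python packages listed above, a small script along the following lines may help. This is a sketch, not part of the repository: the `PINNED` dictionary is copied by hand from the list, and `find_mismatches` is a hypothetical helper name. It relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata
from typing import Callable, Dict

# Pinned versions copied from the dependency list above.
PINNED = {
    "numpy": "1.22.2",
    "scikit-learn": "1.2.0",
    "scipy": "1.10.1",
    "torch": "2.3.1",
    "tqdm": "4.65.0",
    "transformers": "4.42.3",
}


def find_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return a {package: message} dict for packages that differ from the pins."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            # Drop local build suffixes such as "+cu121" before comparing.
            found = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = "not installed"
        if found != wanted:
            mismatches[name] = f"wanted {wanted}, found {found}"
    return mismatches


if __name__ == "__main__":
    for name, problem in find_mismatches(PINNED).items():
        print(f"{name}: {problem}")
```

The `get_version` parameter exists only so the comparison logic can be exercised without the packages installed; in normal use the default lookup is enough.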