https://github.com/aehrc/insider
Detecting foreign inserted DNA segments in the genome
Repository
Basic Info
- Host: GitHub
- Owner: aehrc
- Language: Python
- Default Branch: master
- Size: 4.78 MB
Statistics
- Stars: 9
- Watchers: 9
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
INSIDER
This repository contains scripts for detecting foreign DNA sequences in genomes.
Requirements
python >= 3.7.0, pyspark >= 3.0.0, scikit-learn >= 0.24.0, scipy >= 1.6.0, statsmodels >= 0.12.0
Alternatively, create the conda environment: conda env create -f environment.yml.
Quick start
To run the INSIDER pipeline: sh INSIDER_Pipeline.sh.
Usage
Calculate K-mer frequencies
python bin/calculate_kmer_frequencies.py \
split \
-f test_file.fa \
-k 2 \
-n \
-o test_2mer
For each sequence, extract 2-mers and count their frequencies. Ambiguous bases (i.e., N's) are ignored.
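The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in calculate_kmer_frequencies.py: the function name `kmer_frequencies`, the `normalise` argument, and the choice to skip every window that contains a non-ACGT base are assumptions made for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=2, normalise=True):
    """Return a dict mapping every possible k-mer to its count or frequency.

    Hypothetical helper: windows containing an ambiguous base (anything
    other than A, C, G, T) are skipped rather than counted.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1

    total = sum(counts.values())
    freqs = {}
    # Emit all 4**k k-mers so every sequence yields a vector of the same length.
    for kmer in ("".join(p) for p in product(ALPHABET, repeat=k)):
        count = counts.get(kmer, 0)
        freqs[kmer] = count / total if (normalise and total) else count
    return freqs


# Valid windows in "ACGTNAC" are AC, CG, GT, AC; "TN" and "NA" are skipped.
freqs = kmer_frequencies("ACGTNAC", k=2)
print(freqs["AC"], freqs["CG"], freqs["AA"])
```

Emitting a fixed-length vector (16 entries for k = 2) keeps the per-sequence profiles directly comparable in the later clustering step.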
Cluster K-mer frequencies
python insider_cluster.py \
consensus \
--freqDir test_2mer \
--params params.json \
-o test_2mer_cIds.txt
Cluster sequences based on their K-mer frequencies. Hyperparameters can be specified in the JSON file passed via --params.
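To make the idea of clustering frequency vectors with JSON-supplied hyperparameters concrete, here is a small stand-in sketch. It uses a plain Lloyd's k-means written with the standard library only; it is not the consensus method implemented by insider_cluster.py, and the JSON keys `n_clusters` and `max_iter` are hypothetical, since the README does not describe the schema of params.json.

```python
import json


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters, max_iter=100):
    """Plain Lloyd's k-means (illustrative stand-in, not INSIDER's algorithm).

    Initial centroids are the first n_clusters distinct vectors, which keeps
    the sketch deterministic.
    """
    centroids = []
    for v in vectors:
        if list(v) not in centroids:
            centroids.append(list(v))
        if len(centroids) == n_clusters:
            break
    if len(centroids) < n_clusters:
        raise ValueError("fewer distinct vectors than requested clusters")

    labels = None
    for _ in range(max_iter):
        # Assign each vector to its nearest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical hyperparameters, as they might appear in a JSON file.
params = json.loads('{"n_clusters": 2, "max_iter": 50}')

# Toy 2-dimensional "frequency" vectors: two sequences per profile type.
vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = kmeans(vectors, **params)
print(labels)
```

In practice a library implementation (scikit-learn is listed in the requirements) would replace the hand-written loop; the point here is only how per-sequence vectors and external hyperparameters fit together.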
Analyse K-mer frequencies
python insider_analyse.py \
main \
--freqDir test_2mer \
--cIdFile test_2mer_cIds.txt \
-o test_2mer_output.txt
Assess the similarity between each cluster and the genome based on their K-mer frequencies.
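One simple way to picture this comparison is to average the K-mer profiles within each cluster and score that average against a genome-wide profile. The sketch below uses cosine similarity purely as an illustrative measure; the README does not say which statistic insider_analyse.py computes, and the names `mean_profile`, `cosine_similarity`, `profiles`, and `cluster_ids` are assumptions for the example.

```python
import math


def mean_profile(profiles):
    """Element-wise mean of a list of equal-length frequency vectors."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data: three sequences with 3-dimensional profiles and their cluster IDs.
profiles = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.1, 0.9]]
cluster_ids = [0, 0, 1]

# Genome-wide profile, taken here as the mean over all sequences (an assumption).
genome = mean_profile(profiles)

similarity = {}
for cid in sorted(set(cluster_ids)):
    members = [p for p, c in zip(profiles, cluster_ids) if c == cid]
    similarity[cid] = cosine_similarity(mean_profile(members), genome)

for cid, score in similarity.items():
    print(cid, round(score, 3))
```

Under this toy scoring, the cluster whose profile is least similar to the genome would be the first candidate to inspect for foreign sequence.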
Reference
For more information, please refer to the following article:
INSIDER: alignment-free detection of foreign DNA sequences
Aidan P. Tay, Brendan Hosking, Cameron Hosking, Denis C. Bauer, and Laurence O.W. Wilson
Computational and Structural Biotechnology Journal, 2021, 19, 3810-3816
Owner
- Name: The Australian e-Health Research Centre
- Login: aehrc
- Kind: organization
- Website: https://aehrc.com
- Twitter: ehealthresearch
- Repositories: 101
- Profile: https://github.com/aehrc
The Australian e-Health Research Centre (AEHRC) is CSIRO’s digital health research program.
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0