kmeRs

Similarity Score Matrix and HeatMap for Nucleic and Amino acid K-mers

https://github.com/urniaz/kmers

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
○
.zenodo.json file
✓
DOI references
Found 6 DOI reference(s) in README
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (12.6%) to scientific vocabulary

Keywords

amino-acids bioinformatics nucleic-acids similarity-matrix

Last synced: 9 months ago · JSON representation

Repository

Similarity Score Matrix and HeatMap for Nucleic and Amino acid K-mers

Basic Info

Host: GitHub
Owner: urniaz
License: gpl-3.0
Language: R
Default Branch: master
Homepage: https://urniaz.github.io/kmeRs/
Size: 802 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 2

Topics

amino-acids bioinformatics nucleic-acids similarity-matrix

Created over 7 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

README.md

CRAN/METACRAN License CRAN/METACRAN Version CRAN/METACRAN Total downloads

kmeRs: K-mers Similarity Score Matrix and HeatMap

Similarity Score Matrix and HeatMap for nucleic and amino acid k-mers. Similarity score is evaluated by Point Accepted Mutation (PAM) and BLOcks SUbstitution Matrix (BLOSUM). The 30, 40, 70, 120, 250 and 62, 45, 50, 62, 80, 100 matrix versions are available for PAM and BLOSUM, respectively. Alignment is evaluated by local ¹ and global ² alignment.

[1. Installation](#1-installation) [2. Documentation](#2-documentation) - [2.1 Functions](#21-functions) - [2.2 Examples](#22-examples) - [2.3.1 How to display BLOSUM matrix used for amino acides calculation?](#231-how-to-display-blosum-matrix-used-for-amino-acides-calculation) - [2.3.2 How to find the most *'different'* k-mer from the *'given'* set of k-mers?](#232-how-to-find-the-most-different-k-mer-from-the-given-set-of-k-mers) - [2.3.3 How to find the most *'different'* k-mer to *'whole'* given set of k-mers? ](#233-how-to-find-the-most-different-k-mer-to-whole-given-set-of-k-mers) - [2.3.4 How to calculate basic statistics for the matrix?](#234-how-to-calculate-basic-statistics-for-the-matrix) - [2.3.5 How to display a similarity matrix as a heatmap?](#235-how-to-display-a-similarity-matrix-as-a-heatmap) - [2.3 Tutorials](#23-tutorials) [3. Tests](#3-tests) [4. How to cite](#4-how-to-cite) [5. Acknowledgement](#5-acknowledgement) [6. References](#6-references) ## 1. Installation **Bioconductor Packages** need to be installed first, ```r if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Biostrings") BiocManager::install("pwalign") BiocManager::install("BiocGenerics") ``` **then install from CRAN** ```r install.packages("kmeRs") ``` **or development version from GitHub** ```r install.packages("devtools") devtools::install_github("urniaz/kmeRs/kmeRs/") ``` ## 2. Documentation ### 2.1 Functions The documentation for package including user guides, package vignettes, descriptions of the functions and other documentation is simply accessible by running command below. ```r # To show the manual and functions' descriptions help(package = "kmeRs") ``` ### 2.2 Examples #### 2.3.1 How to display BLOSUM matrix used for amino acides calculation? Simply apply the **kmeRs_similarity_matrix** function and mark the appropriate matrix, here BLOSUM62. ```r # Simple BLOSUM62 similarity matrix for all amino acid nucleotides BLOSUM62 <- kmeRs_similarity_matrix(submat = "BLOSUM62") # Fancy knitr table knitr::kable(BLOSUM62) ```

	G	A	L	M	F	W	K	Q	E	S	P	V	I	C	Y	H	R	N	D	T
G	6	0	-4	-3	-3	-2	-2	-2	-2	0	-2	-3	-4	-3	-3	-2	-2	0	-1	-2
A	0	4	-1	-1	-2	-3	-1	-1	-1	1	-1	0	-1	0	-2	-2	-1	-2	-2	0
L	-4	-1	4	2	0	-2	-2	-2	-3	-2	-3	1	2	-1	-1	-3	-2	-3	-4	-1
M	-3	-1	2	5	0	-1	-1	0	-2	-1	-2	1	1	-1	-1	-2	-1	-2	-3	-1
F	-3	-2	0	0	6	1	-3	-3	-3	-2	-4	-1	0	-2	3	-1	-3	-3	-3	-2
W	-2	-3	-2	-1	1	11	-3	-2	-3	-3	-4	-3	-3	-2	2	-2	-3	-4	-4	-2
K	-2	-1	-2	-1	-3	-3	5	1	1	0	-1	-2	-3	-3	-2	-1	2	0	-1	-1
Q	-2	-1	-2	0	-3	-2	1	5	2	0	-1	-2	-3	-3	-1	0	1	0	0	-1
E	-2	-1	-3	-2	-3	-3	1	2	5	0	-1	-2	-3	-4	-2	0	0	0	2	-1
S	0	1	-2	-1	-2	-3	0	0	0	4	-1	-2	-2	-1	-2	-1	-1	1	0	1
P	-2	-1	-3	-2	-4	-4	-1	-1	-1	-1	7	-2	-3	-3	-3	-2	-2	-2	-1	-1
V	-3	0	1	1	-1	-3	-2	-2	-2	-2	-2	4	3	-1	-1	-3	-3	-3	-3	0
I	-4	-1	2	1	0	-3	-3	-3	-3	-2	-3	3	4	-1	-1	-3	-3	-3	-3	-1
C	-3	0	-1	-1	-2	-2	-3	-3	-4	-1	-3	-1	-1	9	-2	-3	-3	-3	-3	-1
Y	-3	-2	-1	-1	3	2	-2	-1	-2	-2	-3	-1	-1	-2	7	2	-2	-2	-3	-2
H	-2	-2	-3	-2	-1	-2	-1	0	0	-1	-2	-3	-3	-3	2	8	0	1	-1	-2
R	-2	-1	-2	-1	-3	-3	2	1	0	-1	-2	-3	-3	-3	-2	0	5	0	-2	-1
N	0	-2	-3	-2	-3	-4	0	0	0	1	-2	-3	-3	-3	-2	1	0	6	1	0
D	-1	-2	-4	-3	-3	-4	-1	0	2	0	-1	-3	-3	-3	-3	-1	-2	1	6	-1
T	-2	0	-1	-1	-2	-2	-1	-1	-1	1	-1	0	-1	-1	-2	-2	-1	0	-1	5

2.3.2 How to find the most 'different' k-mer from the 'given' set of k-mers?

In this example, the most ‘different’ k-mer to “GATTACA” sequence will be indicated from given set of heptamers. Here, 7 heptamer (being an anagram of the movie title “GATTACA”) are given, as follow:

```r

Given hexamers

kmers_given <- c("GATTACA", "ACAGATT", "GAATTAC", "GAAATCT", "CTATAGA", "GTACATA", "AACGATT")

Matrix calculation

kmersmat <- kmeRssimilaritymatrix(q = c("GATTACA"), x = kmersgiven , submat = "BLOSUM62")

Fancy knitr table

knitr::kable(kmers_mat) ```

	GATTACA
GATTACA	37
ACAGATT	1
GAATTAC	15
GAAATCT	19
CTATAGA	7
GTACATA	12
AACGATT	4

Now, applying kmeRs_score function the total score is calculated and the matrix is sorted by increasing score value. The lowest value (in case of BLOSUM) indicates the most ‘different’ sequence from given k-mers, in contrast to the highest value which indicates the most similar one.

```r

Score and sort the matrix

kmersres <- kmeRsscore(kmers_mat)

Fancy knitr table

knitr::kable(kmers_res) ```

	GATTACA	Sum
ACAGATT	1	1
AACGATT	4	4
CTATAGA	7	7
GTACATA	12	12
GAATTAC	15	15
GAAATCT	19	19
GATTACA	37	37

As can be observed, the most ‘different’ sequence to GATTACA is ACAGATT with total score equal to 1 and the most similar to GATTACA sequence is of course GATTACA sequence with the highest score equal to 37.

2.3.3 How to find the most 'different' k-mer to 'whole' given set of k-mers?

In this example, the most ‘different’ k-mer to whole given set of heptamers will be indicated. The same heptamers as in example 2 are used.

```r

Given hexamers

kmers_given <- c("GATTACA", "ACAGATT", "GAATTAC", "GAAATCT", "CTATAGA", "GTACATA", "AACGATT")

Matrix calculation

kmersmat <- kmeRssimilaritymatrix(q = kmersgiven, submat = "BLOSUM62")

Score the matrix and sort by decreasing score

kmersres <- kmeRsscore(kmers_mat)

Fancy knitr table

knitr::kable(kmers_res) ```

	GATTACA	ACAGATT	GAATTAC	GAAATCT	CTATAGA	GTACATA	AACGATT	Sum
CTATAGA	7	3	6	-2	37	11	0	62
AACGATT	4	24	1	8	0	6	37	80
ACAGATT	1	37	1	8	3	9	24	83
GAATTAC	15	1	37	18	6	9	1	87
GTACATA	12	9	9	9	11	37	6	93
GATTACA	37	1	15	19	7	12	4	95
GAAATCT	19	8	18	37	-2	9	8	97

As can be observed, the most ‘different’ sequence to all given heptamers is CTATAGA with score equal to 62 and the most similar sequence is GAAATCT with the highest score equal to 97.

2.3.4 How to calculate basic statistics for the matrix?

Applying function kmeRs_statistics to the result matrix the basic statistics can be calculated as additional rows. When summary_statistics_only is set to TRUE only summary table is returned. It is much more elegant way to present results, especially in case of ‘big data’ output.

```r

Calculate stats

kmersstats <- kmeRsstatistics(kmers_res)

Fancy knitr table

knitr::kable(kmersstats[ ,1:(dim(kmersstats)[2] - 4) ]) ```

	GATTACA	ACAGATT	GAATTAC	GAAATCT	CTATAGA	GTACATA	AACGATT	Sum
CTATAGA	7.00	3.00	6.00	-2.00	37.00	11.00	0.00	62.00
AACGATT	4.00	24.00	1.00	8.00	0.00	6.00	37.00	80.00
ACAGATT	1.00	37.00	1.00	8.00	3.00	9.00	24.00	83.00
GAATTAC	15.00	1.00	37.00	18.00	6.00	9.00	1.00	87.00
GTACATA	12.00	9.00	9.00	9.00	11.00	37.00	6.00	93.00
GATTACA	37.00	1.00	15.00	19.00	7.00	12.00	4.00	95.00
GAAATCT	19.00	8.00	18.00	37.00	-2.00	9.00	8.00	97.00
Min	1.00	1.00	1.00	-2.00	-2.00	6.00	0.00	62.00
Max	37.00	37.00	37.00	37.00	37.00	37.00	37.00	97.00
Mean	13.57	11.86	12.43	13.86	8.86	13.29	11.43	85.29
SD	12.08	13.64	12.62	12.40	13.16	10.63	13.83	12.04

2.3.5 How to display a similarity matrix as a heatmap?

Simply applying function kmeRs_heatmap to the result matrix.

```r

Heatmap without sum column

kmeRsheatmap(kmersres[, -8])
```

2.3 Tutorials

Package vignettes are avaliable in /kmeRs/vignettes/ folder of the repository or directly from the package by command:

```r

To show the introduction vignette

vignette("IntroductiontokmeRs_package", package="kmeRs") ```

3. Tests

Test are implemented as continuous integration by CircleCI. Continuous integration is a DevOps software development practice where code changes are regularly merge into a central repository, after which automated builds and tests are run.

The source of tests covering the code of the package are available in /kmeRs/tests/ folder of the repository.

4. How to cite

When applying kmeRs please cite both:

CRAN repository : https://doi.org/10.32614/cran.package.kmers
Original paper : TBA

5. Acknowledgement

Special thanks to Jason Lin, PhD from Chiba Cancer Center Research Institute, Chiba, Japan for contribution in 2021 as implementing of heatmap function and update deprecated functions. The kmeRsgeneratekmers and kmeRs_twoSeqSim base on orginal source from tcR and rDNAse, respectively.

6. References

Needleman, Saul B., and Christian D. Wunsch. 1970. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48 (3): 443–53. https://doi.org/10.1016/0022-2836(70)90057-4.
Smith, T. F., and M. S. Waterman. 1981. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147 (1): 195–97. https://doi.org/10.1016/0022-2836(81)90087-5.

Owner

Name: Rafal Urniaz, PhD
Login: urniaz
Kind: user

Repositories: 1
Profile: https://github.com/urniaz

GitHub Events

Total

Watch event: 9

Last Year

Watch event: 9

Packages

Total packages: 1
Total downloads:
- cran 171 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

cran.r-project.org: kmeRs

K-Mers Similarity Score Matrix and HeatMap

Homepage: https://github.com/urniaz/kmeRs
Documentation: http://cran.r-project.org/web/packages/kmeRs/kmeRs.pdf
License: GPL-3
Status: removed
Latest release: 2.1.0
published almost 2 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 171 Last month

Rankings

Dependent packages count: 28.5%

Dependent repos count: 35.2%

Average: 50.1%

Downloads: 86.5%

Maintainers (1)

rafal.urniaz@cantab.net

Last synced: about 1 year ago

Dependencies

kmeRs/DESCRIPTION cran

R >= 3.4.0 depends
BiocGenerics * imports
Biostrings * imports
pwalign * imports
stats * imports
utils * imports
RColorBrewer * suggests
knitr * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests
unittest * suggests

	G	A	L	M	F	W	K	Q	E	S	P	V	I	C	Y	H	R	N	D	T
G	6	0	-4	-3	-3	-2	-2	-2	-2	0	-2	-3	-4	-3	-3	-2	-2	0	-1	-2
A	0	4	-1	-1	-2	-3	-1	-1	-1	1	-1	0	-1	0	-2	-2	-1	-2	-2	0
L	-4	-1	4	2	0	-2	-2	-2	-3	-2	-3	1	2	-1	-1	-3	-2	-3	-4	-1
M	-3	-1	2	5	0	-1	-1	0	-2	-1	-2	1	1	-1	-1	-2	-1	-2	-3	-1
F	-3	-2	0	0	6	1	-3	-3	-3	-2	-4	-1	0	-2	3	-1	-3	-3	-3	-2
W	-2	-3	-2	-1	1	11	-3	-2	-3	-3	-4	-3	-3	-2	2	-2	-3	-4	-4	-2
K	-2	-1	-2	-1	-3	-3	5	1	1	0	-1	-2	-3	-3	-2	-1	2	0	-1	-1
Q	-2	-1	-2	0	-3	-2	1	5	2	0	-1	-2	-3	-3	-1	0	1	0	0	-1
E	-2	-1	-3	-2	-3	-3	1	2	5	0	-1	-2	-3	-4	-2	0	0	0	2	-1
S	0	1	-2	-1	-2	-3	0	0	0	4	-1	-2	-2	-1	-2	-1	-1	1	0	1
P	-2	-1	-3	-2	-4	-4	-1	-1	-1	-1	7	-2	-3	-3	-3	-2	-2	-2	-1	-1
V	-3	0	1	1	-1	-3	-2	-2	-2	-2	-2	4	3	-1	-1	-3	-3	-3	-3	0
I	-4	-1	2	1	0	-3	-3	-3	-3	-2	-3	3	4	-1	-1	-3	-3	-3	-3	-1
C	-3	0	-1	-1	-2	-2	-3	-3	-4	-1	-3	-1	-1	9	-2	-3	-3	-3	-3	-1
Y	-3	-2	-1	-1	3	2	-2	-1	-2	-2	-3	-1	-1	-2	7	2	-2	-2	-3	-2
H	-2	-2	-3	-2	-1	-2	-1	0	0	-1	-2	-3	-3	-3	2	8	0	1	-1	-2
R	-2	-1	-2	-1	-3	-3	2	1	0	-1	-2	-3	-3	-3	-2	0	5	0	-2	-1
N	0	-2	-3	-2	-3	-4	0	0	0	1	-2	-3	-3	-3	-2	1	0	6	1	0
D	-1	-2	-4	-3	-3	-4	-1	0	2	0	-1	-3	-3	-3	-3	-1	-2	1	6	-1
T	-2	0	-1	-1	-2	-2	-1	-1	-1	1	-1	0	-1	-1	-2	-2	-1	0	-1	5

	G	A	L	M	F	W	K	Q	E	S	P	V	I	C	Y	H	R	N	D	T
G	6	0	-4	-3	-3	-2	-2	-2	-2	0	-2	-3	-4	-3	-3	-2	-2	0	-1	-2
A	0	4	-1	-1	-2	-3	-1	-1	-1	1	-1	0	-1	0	-2	-2	-1	-2	-2	0
L	-4	-1	4	2	0	-2	-2	-2	-3	-2	-3	1	2	-1	-1	-3	-2	-3	-4	-1
M	-3	-1	2	5	0	-1	-1	0	-2	-1	-2	1	1	-1	-1	-2	-1	-2	-3	-1
F	-3	-2	0	0	6	1	-3	-3	-3	-2	-4	-1	0	-2	3	-1	-3	-3	-3	-2
W	-2	-3	-2	-1	1	11	-3	-2	-3	-3	-4	-3	-3	-2	2	-2	-3	-4	-4	-2
K	-2	-1	-2	-1	-3	-3	5	1	1	0	-1	-2	-3	-3	-2	-1	2	0	-1	-1
Q	-2	-1	-2	0	-3	-2	1	5	2	0	-1	-2	-3	-3	-1	0	1	0	0	-1
E	-2	-1	-3	-2	-3	-3	1	2	5	0	-1	-2	-3	-4	-2	0	0	0	2	-1
S	0	1	-2	-1	-2	-3	0	0	0	4	-1	-2	-2	-1	-2	-1	-1	1	0	1
P	-2	-1	-3	-2	-4	-4	-1	-1	-1	-1	7	-2	-3	-3	-3	-2	-2	-2	-1	-1
V	-3	0	1	1	-1	-3	-2	-2	-2	-2	-2	4	3	-1	-1	-3	-3	-3	-3	0
I	-4	-1	2	1	0	-3	-3	-3	-3	-2	-3	3	4	-1	-1	-3	-3	-3	-3	-1
C	-3	0	-1	-1	-2	-2	-3	-3	-4	-1	-3	-1	-1	9	-2	-3	-3	-3	-3	-1
Y	-3	-2	-1	-1	3	2	-2	-1	-2	-2	-3	-1	-1	-2	7	2	-2	-2	-3	-2
H	-2	-2	-3	-2	-1	-2	-1	0	0	-1	-2	-3	-3	-3	2	8	0	1	-1	-2
R	-2	-1	-2	-1	-3	-3	2	1	0	-1	-2	-3	-3	-3	-2	0	5	0	-2	-1
N	0	-2	-3	-2	-3	-4	0	0	0	1	-2	-3	-3	-3	-2	1	0	6	1	0
D	-1	-2	-4	-3	-3	-4	-1	0	2	0	-1	-3	-3	-3	-3	-1	-2	1	6	-1
T	-2	0	-1	-1	-2	-2	-1	-1	-1	1	-1	0	-1	-1	-2	-2	-1	0	-1	5

	G	A	L	M	F	W	K	Q	E	S	P	V	I	C	Y	H	R	N	D	T
G	6	0	-4	-3	-3	-2	-2	-2	-2	0	-2	-3	-4	-3	-3	-2	-2	0	-1	-2
A	0	4	-1	-1	-2	-3	-1	-1	-1	1	-1	0	-1	0	-2	-2	-1	-2	-2	0
L	-4	-1	4	2	0	-2	-2	-2	-3	-2	-3	1	2	-1	-1	-3	-2	-3	-4	-1
M	-3	-1	2	5	0	-1	-1	0	-2	-1	-2	1	1	-1	-1	-2	-1	-2	-3	-1
F	-3	-2	0	0	6	1	-3	-3	-3	-2	-4	-1	0	-2	3	-1	-3	-3	-3	-2
W	-2	-3	-2	-1	1	11	-3	-2	-3	-3	-4	-3	-3	-2	2	-2	-3	-4	-4	-2
K	-2	-1	-2	-1	-3	-3	5	1	1	0	-1	-2	-3	-3	-2	-1	2	0	-1	-1
Q	-2	-1	-2	0	-3	-2	1	5	2	0	-1	-2	-3	-3	-1	0	1	0	0	-1
E	-2	-1	-3	-2	-3	-3	1	2	5	0	-1	-2	-3	-4	-2	0	0	0	2	-1
S	0	1	-2	-1	-2	-3	0	0	0	4	-1	-2	-2	-1	-2	-1	-1	1	0	1
P	-2	-1	-3	-2	-4	-4	-1	-1	-1	-1	7	-2	-3	-3	-3	-2	-2	-2	-1	-1
V	-3	0	1	1	-1	-3	-2	-2	-2	-2	-2	4	3	-1	-1	-3	-3	-3	-3	0
I	-4	-1	2	1	0	-3	-3	-3	-3	-2	-3	3	4	-1	-1	-3	-3	-3	-3	-1
C	-3	0	-1	-1	-2	-2	-3	-3	-4	-1	-3	-1	-1	9	-2	-3	-3	-3	-3	-1
Y	-3	-2	-1	-1	3	2	-2	-1	-2	-2	-3	-1	-1	-2	7	2	-2	-2	-3	-2
H	-2	-2	-3	-2	-1	-2	-1	0	0	-1	-2	-3	-3	-3	2	8	0	1	-1	-2
R	-2	-1	-2	-1	-3	-3	2	1	0	-1	-2	-3	-3	-3	-2	0	5	0	-2	-1
N	0	-2	-3	-2	-3	-4	0	0	0	1	-2	-3	-3	-3	-2	1	0	6	1	0
D	-1	-2	-4	-3	-3	-4	-1	0	2	0	-1	-3	-3	-3	-3	-1	-2	1	6	-1
T	-2	0	-1	-1	-2	-2	-1	-1	-1	1	-1	0	-1	-1	-2	-2	-1	0	-1	5