biomarkovchains.jl

Representing biological sequences as Markov chains

https://github.com/biojulia/biomarkovchains.jl

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary

Keywords

bioinformatics biology dna julia julialang markov-chain

Keywords from Contributors

finite-volume numeric mesh interpretability interactive matrix-exponential pde projection optim controllers
Last synced: 6 months ago

Repository

Representing biological sequences as Markov chains

Basic Info
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 22
Topics
bioinformatics biology dna julia julialang markov-chain
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License Citation

README.md


Representing biological sequences as Markov chains

[![Documentation](https://img.shields.io/badge/documentation-online-blue.svg?logo=Julia&logoColor=white)](https://biojulia.dev/BioMarkovChains.jl/dev/) [![Latest Release](https://img.shields.io/github/release/BioJulia/BioMarkovChains.jl.svg)](https://github.com/BioJulia/BioMarkovChains.jl/releases/latest) [![DOI](https://zenodo.org/badge/665161607.svg)](https://zenodo.org/badge/latestdoi/665161607)
[![CI Workflow](https://github.com/BioJulia/BioMarkovChains.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/BioJulia/BioMarkovChains.jl/actions/workflows/CI.yml) [![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/BioJulia/BioMarkovChains.jl/blob/main/LICENSE) [![Work in Progress](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip) [![Downloads](https://img.shields.io/badge/dynamic/json?url=http%3A%2F%2Fjuliapkgstats.com%2Fapi%2Fv1%2Fmonthly_downloads%2FBioMarkovChains&query=total_requests&suffix=%2Fmonth&label=Downloads)](http://juliapkgstats.com/pkg/BioMarkovChains) [![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl) [![JET](https://img.shields.io/badge/%E2%9C%88%EF%B8%8F%20tested%20with%20-%20JET.jl%20-%20red)](https://github.com/aviatesk/JET.jl)

BioMarkovChains

A Julia package to represent biological sequences as Markov chains

Installation

BioMarkovChains is a Julia Language package. To install BioMarkovChains, open Julia's interactive session (known as the REPL), press the ] key to enter package mode, and then type the following command:

    pkg> add BioMarkovChains

Creating Markov chain out of DNA sequences

An important step in developing gene-finding algorithms is having a Markov chain representation of the DNA. To that end, we implemented the BioMarkovChain method, which captures the initial and transition probabilities of a DNA sequence (LongSequence) and creates a dedicated object storing the relevant information of a DNA Markov chain. Here is an example:

Let's find one ORF in a random LongDNA:

```julia
using BioSequences, BioMarkovChains, GeneFinder

# > 180195.SAMN03785337.LFLS01000089 -> finds only 1 gene in Prodigal (from Pyrodigal tests)

seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCAATCTGACTGTGGGCGGTGTTACCAACGGCACTGCTACTACTGGCAACATCGCACTGACCGGTAACAATGCGCTGAGCGGTCCGGTCAATCTGAATGCGTCGAATGGCACGGTGACCTTGAACACGACCGGCAATACCACGCTCGGTAACGTGACGGCACAAGGCAATGTGACGACCAATGTGTCCAACGGCAGTCTGACGGTTACCGGCAATACGACAGGTGCCAACACCAACCTCAGTGCCAGCGGCAACCTGACCGTGGGTAACCAGGGCAATATCAGTACCGCAGGCAATGCAACCCTGACGGCCGGCGACAACCTGACGAGCACTGGCAATCTGACTGTGGGCGGCGTCACCAACGGCACGGCCACCACCGGCAACATCGCGCTGACCGGTAACAATGCACTGGCTGGTCCTGTCAATCTGAACGCGCCGAACGGCACCGTGACCCTGAACACAACCGGCAATACCACGCTGGGTAATGTCACCGCACAAGGCAATGTGACGACTAATGTGTCCAACGGCAGCCTGACAGTCGCTGGCAATACCACAGGTGCCAACACCAACCTGAGTGCCAGCGGCAATCTGACCGTGGGCAACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGC"

orfseq = findorfs(seq)[3] |> sequence

21nt DNA Sequence:
ATGCGTCGAATGGCACGGTGA
```

If we translate it, we get a 7aa sequence:

```julia
translate(orfseq)

7aa Amino Acid Sequence:
MRRMAR*
```

Now suppose I want to see how transitions occur in this ORF sequence. I can then use the BioMarkovChain method and tune it to a 2nd-order Markov chain:

```julia
BioMarkovChain(orfseq, 2)

BioMarkovChain of DNA alphabet and order 1:
  - Transition Probability Matrix -> Matrix{Float64}(4 × 4):
   0.25  0.25  0.0   0.5
   0.25  0.0   0.75  0.0
   0.25  0.25  0.25  0.25
   0.0   0.25  0.75  0.0
  - Initial Probabilities -> Vector{Float64}(4 × 1):
   0.2  0.2  0.4  0.2
```
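To make the printed numbers easier to interpret, here is a minimal plain-Julia sketch that counts adjacent-nucleotide transitions in the ORF and normalizes each row. It does not use BioMarkovChains.jl: the helper name `transition_estimates`, the string input, and the A, C, G, T row/column order are assumptions made for illustration. Under those assumptions, the counts reproduce the matrix and initial probabilities shown above.

```julia
# Hypothetical helper, not part of BioMarkovChains.jl.
# Estimates first-order transition frequencies from a plain string.
function transition_estimates(s::AbstractString; alphabet = ['A', 'C', 'G', 'T'])
    idx = Dict(c => i for (i, c) in enumerate(alphabet))  # symbol -> row/column index
    n = length(alphabet)
    counts = zeros(Int, n, n)
    chars = collect(s)
    for k in 1:length(chars)-1
        counts[idx[chars[k]], idx[chars[k+1]]] += 1       # count the pair (current, next)
    end
    rowsums = sum(counts, dims = 2)                       # n×1 totals per starting symbol
    tpm = counts ./ max.(rowsums, 1)                      # row-normalize, guarding against 0/0
    initials = vec(rowsums) ./ sum(rowsums)               # share of transitions starting at each symbol
    return tpm, initials
end

tpm, initials = transition_estimates("ATGCGTCGAATGGCACGGTGA")

tpm[1, :]   # → [0.25, 0.25, 0.0, 0.5]  (transitions out of A)
initials    # → [0.2, 0.2, 0.4, 0.2]
```

Each row of `tpm` sums to 1 because it is divided by the number of transitions leaving that nucleotide. The last nucleotide has no outgoing transition, so in this sketch it does not contribute to `initials`.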

But I can also create a BioMarkovChain instance from the amino acid sequence:

```julia
BioMarkovChain(translate(orfseq), 2)

BioMarkovChain of AminoAcid alphabet and order 1:
  - Transition Probability Matrix -> Matrix{Float64}(20 × 20):
   0.0  1.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.333  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.5  0.5    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
   0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  - Initial Probabilities -> Vector{Float64}(20 × 1):
   0.167  0.5  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
```

This is useful for later creating HMMs and calculating the probability of a sequence under a given model. For instance, the E. coli CDS and No-CDS transition models (Markov chains) are now implemented:

```julia
ECOLICDS

BioMarkovChain of DNA alphabet and order 1:
  - Transition Probability Matrix -> Matrix{Float64}(4 × 4):
   0.31   0.224  0.199  0.268
   0.251  0.215  0.313  0.221
   0.236  0.308  0.249  0.207
   0.178  0.217  0.338  0.267
  - Initial Probabilities -> Vector{Float64}(4 × 1):
   0.245  0.243  0.273  0.239
```

What is then the probability of the previous DNA sequence given this model?

```julia
markovprobability(orfseq, model=ECOLICDS, logscale=true)

-39.71754773536592
```

This is of course not very informative on its own, but we can later use different criteria to classify new ORFs. For a more detailed explanation, see the docs.
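The log-scale value reported for the ORF can be approximated by hand. The sketch below is plain Julia and does not call BioMarkovChains.jl. It assumes that the score is the base-2 logarithm of the initial probability of the first nucleotide multiplied by the first-order transition probabilities along the sequence, and that rows and columns are ordered A, C, G, T. The helper name `log2_chain_probability` is hypothetical, and the model values are the rounded ones printed for ECOLICDS.

```julia
# Hypothetical helper, not part of BioMarkovChains.jl.
# Base-2 log-probability of a sequence under a first-order Markov chain.
function log2_chain_probability(s::AbstractString, tpm::AbstractMatrix, initials::AbstractVector;
                                alphabet = ['A', 'C', 'G', 'T'])
    idx = Dict(c => i for (i, c) in enumerate(alphabet))  # symbol -> row/column index
    states = [idx[c] for c in s]
    lp = log2(initials[states[1]])                        # probability of the first symbol
    for k in 1:length(states)-1
        lp += log2(tpm[states[k], states[k+1]])           # one term per adjacent pair
    end
    return lp
end

# Rounded values copied from the printed ECOLICDS model (assumed A, C, G, T order).
ecoli_tpm = [0.31  0.224 0.199 0.268;
             0.251 0.215 0.313 0.221;
             0.236 0.308 0.249 0.207;
             0.178 0.217 0.338 0.267]
ecoli_initials = [0.245, 0.243, 0.273, 0.239]

lp = log2_chain_probability("ATGCGTCGAATGGCACGGTGA", ecoli_tpm, ecoli_initials)
```

With these rounded inputs, `lp` comes out close to the reported value of about -39.7. This suggests, but does not confirm, that the package's log scale is base 2. Small differences are expected because the printed model is rounded to three decimals.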

Owner

  • Name: BioJulia
  • Login: BioJulia
  • Kind: organization

Bioinformatics and Computational Biology in Julia

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find useful this repo, please cite it as below."
authors:
- family-names: "García-Botero"
  given-names: "Camilo"
  orcid: "https://orcid.org/0000-0002-0426-7007"
title: "BioMarkovChains.jl: a Julia package to represent DNA as markov chains"
version: 0.0.2
type: software
date-released: 2022-12-13
url: "https://github.com/camilogarciabotero/BioMarkovChains.jl"
doi: 10.5281/zenodo.8157635
preferred-citation:
  type: software
  authors:
  - family-names: "García-Botero"
    given-names: "Camilo"
    orcid: "https://orcid.org/0000-0002-0426-7007"
  title: "BioMarkovChains.jl: a Julia package to represent DNA as markov chains"
  version: 0.0.3
  date-released: 2022-12-13
  url: "https://github.com/camilogarciabotero/BioMarkovChains.jl"
  doi: 10.5281/zenodo.8157635

GitHub Events

Total
  • Watch event: 3
  • Issue comment event: 1
  • Push event: 5
  • Pull request event: 1
  • Create event: 1
Last Year
  • Watch event: 3
  • Issue comment event: 1
  • Push event: 5
  • Pull request event: 1
  • Create event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 262
  • Total Committers: 3
  • Avg Commits per committer: 87.333
  • Development Distribution Score (DDS): 0.015
Past Year
  • Commits: 23
  • Committers: 1
  • Avg Commits per committer: 23.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Camilo García c****2@u****o 258
dependabot[bot] 4****] 2
CompatHelper Julia c****y@j****g 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 hour
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (2)
  • camilogarciabotero (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (2)

Dependencies

.github/workflows/CI.yml actions
  • actions/checkout v4 composite
  • codecov/codecov-action v3 composite
  • julia-actions/cache v1 composite
  • julia-actions/julia-buildpkg v1 composite
  • julia-actions/julia-docdeploy v1 composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest v1 composite
  • julia-actions/setup-julia v1 composite
.github/workflows/CompatHelper.yml actions
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
.github/workflows/register.yml actions
  • julia-actions/RegisterAction latest composite