biomarkovchains.jl
Representing biological sequences as Markov chains
Science Score: 64.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: BioJulia
- License: mit
- Language: Julia
- Default Branch: main
- Homepage: https://biojulia.dev/BioMarkovChains.jl/dev/
- Size: 3.57 MB
Statistics
- Stars: 9
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Releases: 22
Metadata Files
README.md
BioMarkovChains
A Julia package to represent biological sequences as Markov chains
Installation
BioMarkovChains is a Julia Language package. To install BioMarkovChains, please open Julia's interactive session (known as the REPL), press the ] key to enter package mode, then type the following command:

    pkg> add BioMarkovChains
Creating Markov chains out of DNA sequences
An important step in developing gene-finding algorithms is having a Markov chain representation of the DNA. To that end, we implemented the BioMarkovChain method, which captures the initial and transition probabilities of a DNA sequence (LongSequence) and creates a dedicated object storing the relevant information of a DNA Markov chain. Here is an example:
Let's find one ORF in a random LongDNA:
```julia
using BioSequences, BioMarkovChains, GeneFinder

# > 180195.SAMN03785337.LFLS01000089 -> finds only 1 gene in Prodigal (from Pyrodigal tests)
seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCAATCTGACTGTGGGCGGTGTTACCAACGGCACTGCTACTACTGGCAACATCGCACTGACCGGTAACAATGCGCTGAGCGGTCCGGTCAATCTGAATGCGTCGAATGGCACGGTGACCTTGAACACGACCGGCAATACCACGCTCGGTAACGTGACGGCACAAGGCAATGTGACGACCAATGTGTCCAACGGCAGTCTGACGGTTACCGGCAATACGACAGGTGCCAACACCAACCTCAGTGCCAGCGGCAACCTGACCGTGGGTAACCAGGGCAATATCAGTACCGCAGGCAATGCAACCCTGACGGCCGGCGACAACCTGACGAGCACTGGCAATCTGACTGTGGGCGGCGTCACCAACGGCACGGCCACCACCGGCAACATCGCGCTGACCGGTAACAATGCACTGGCTGGTCCTGTCAATCTGAACGCGCCGAACGGCACCGTGACCCTGAACACAACCGGCAATACCACGCTGGGTAATGTCACCGCACAAGGCAATGTGACGACTAATGTGTCCAACGGCAGCCTGACAGTCGCTGGCAATACCACAGGTGCCAACACCAACCTGAGTGCCAGCGGCAATCTGACCGTGGGCAACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAGC"
orfseq = findorfs(seq)[3] |> sequence

# 21nt DNA Sequence:
# ATGCGTCGAATGGCACGGTGA
```
If we translate it, we get a 7aa sequence:
```julia
translate(orfseq)

# 7aa Amino Acid Sequence:
# MRRMAR*
```
Now suppose I want to see how transitions occur in this ORF sequence. I can use the BioMarkovChain method, whose second argument tunes the order of the Markov chain:
```julia
BioMarkovChain(orfseq, 2)

# BioMarkovChain of DNA alphabet and order 1:
#   - Transition Probability Matrix -> Matrix{Float64}(4 × 4):
#    0.25  0.25  0.0   0.5
#    0.25  0.0   0.75  0.0
#    0.25  0.25  0.25  0.25
#    0.0   0.25  0.75  0.0
#   - Initial Probabilities -> Vector{Float64}(4 × 1):
#    0.2  0.2  0.4  0.2
```
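As a rough illustration of what these numbers mean, the sketch below recomputes the transition matrix and initial probabilities for the ORF by plain counting, using only Base Julia. The helper names (`ntindex`, `transition_matrix`, `initial_probabilities`) are hypothetical and this is not the package's implementation. It assumes that rows and columns follow the order A, C, G, T and that the initial probabilities are the frequencies of each nucleotide as a source state (every position except the last). Under those assumptions it reproduces the values printed above, which are one-step (first-order) transition frequencies.

```julia
# Minimal sketch in plain Julia (no packages); helper names are hypothetical.

# Map a nucleotide to its row/column index, assuming the order A, C, G, T.
ntindex(c::Char) = findfirst(==(c), "ACGT")

# Count i -> j transitions between adjacent positions, then normalize each row.
function transition_matrix(seq::AbstractString)
    counts = zeros(4, 4)
    for k in 1:(length(seq) - 1)
        counts[ntindex(seq[k]), ntindex(seq[k + 1])] += 1
    end
    rowsums = sum(counts, dims = 2)
    return counts ./ max.(rowsums, 1.0)   # guard against rows with no counts
end

# Frequency of each nucleotide as a source state (all positions but the last).
function initial_probabilities(seq::AbstractString)
    counts = zeros(4)
    for k in 1:(length(seq) - 1)
        counts[ntindex(seq[k])] += 1
    end
    return counts ./ sum(counts)
end

orf = "ATGCGTCGAATGGCACGGTGA"   # the 21nt ORF found above, as a plain string
tpm = transition_matrix(orf)
inits = initial_probabilities(orf)
```

Each row of `tpm` sums to 1 because it is the empirical distribution of the nucleotide that follows a given nucleotide in the ORF.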
But I can also have a BioMarkovChain instance of the amino acid sequence:
```julia
BioMarkovChain(translate(orfseq), 2)

# BioMarkovChain of AminoAcid alphabet and order 1:
#   - Transition Probability Matrix -> Matrix{Float64}(20 × 20):
#    0.0  1.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.333  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.5  0.5    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#    0.0  0.0    0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
#   - Initial Probabilities -> Vector{Float64}(20 × 1):
#    0.167  0.5  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
```
This is useful for later creating HMMs and for calculating the probability of a sequence under a given model. For instance, the E. coli CDS and non-CDS transition models (Markov chains) are now implemented:
```julia
ECOLICDS

# BioMarkovChain of DNA alphabet and order 1:
#   - Transition Probability Matrix -> Matrix{Float64}(4 × 4):
#    0.31   0.224  0.199  0.268
#    0.251  0.215  0.313  0.221
#    0.236  0.308  0.249  0.207
#    0.178  0.217  0.338  0.267
#   - Initial Probabilities -> Vector{Float64}(4 × 1):
#    0.245  0.243  0.273  0.239
```
What is then the probability of the previous DNA sequence given this model?
```julia
markovprobability(orfseq, model=ECOLICDS, logscale=true)

# -39.71754773536592
```
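To make the number above less opaque, here is a minimal sketch in plain Julia of the usual first-order Markov chain probability: the initial probability of the first nucleotide multiplied by the transition probability of every adjacent pair, accumulated in log space. It is not the package's implementation. The function name `log2_chain_probability` is hypothetical, the matrix holds the rounded ECOLICDS values printed earlier (rows and columns assumed to follow A, C, G, T), and the base-2 logarithm is an assumption that happens to be consistent with the reported value.

```julia
# Rounded ECOLICDS values copied from the output shown earlier (assumed order A, C, G, T).
cds_tpm = [0.310 0.224 0.199 0.268;
           0.251 0.215 0.313 0.221;
           0.236 0.308 0.249 0.207;
           0.178 0.217 0.338 0.267]
cds_inits = [0.245, 0.243, 0.273, 0.239]

# Map a nucleotide to its row/column index.
ntindex(c::Char) = findfirst(==(c), "ACGT")

# log2 P(seq | model) = log2 π(x₁) + Σₖ log2 P(xₖ → xₖ₊₁)
function log2_chain_probability(seq::AbstractString, tpm, inits)
    logp = log2(inits[ntindex(seq[1])])
    for k in 1:(length(seq) - 1)
        logp += log2(tpm[ntindex(seq[k]), ntindex(seq[k + 1])])
    end
    return logp
end

orf = "ATGCGTCGAATGGCACGGTGA"
logp = log2_chain_probability(orf, cds_tpm, cds_inits)   # close to -39.72
```

Because the matrix entries are rounded to three decimals, the result agrees with the package output only to about two decimal places.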
On its own this value is of course not very informative, but we can later use different criteria to classify new ORFs. For a more detailed explanation, see the docs.
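One possible criterion, offered here only as an illustration and not as something the package prescribes, is a log-odds score: the log-probability of the ORF under the CDS model minus its log-probability under a background model. The sketch below uses a hypothetical uniform background (every initial and transition probability equal to 0.25) because no other model values are listed in this README. The helper names are hypothetical and the CDS matrix again uses the rounded ECOLICDS values (assumed order A, C, G, T).

```julia
# Rounded ECOLICDS values from the output shown earlier (assumed order A, C, G, T).
cds_tpm = [0.310 0.224 0.199 0.268;
           0.251 0.215 0.313 0.221;
           0.236 0.308 0.249 0.207;
           0.178 0.217 0.338 0.267]
cds_inits = [0.245, 0.243, 0.273, 0.239]

# Hypothetical background model: all nucleotides and transitions equally likely.
bg_tpm = fill(0.25, 4, 4)
bg_inits = fill(0.25, 4)

ntindex(c::Char) = findfirst(==(c), "ACGT")

# Base-2 log-probability of a sequence under a first-order Markov chain.
function log2_chain_probability(seq::AbstractString, tpm, inits)
    logp = log2(inits[ntindex(seq[1])])
    for k in 1:(length(seq) - 1)
        logp += log2(tpm[ntindex(seq[k]), ntindex(seq[k + 1])])
    end
    return logp
end

# Log-odds score: positive when the sequence fits the CDS model better than the background.
function log_odds(seq::AbstractString)
    return log2_chain_probability(seq, cds_tpm, cds_inits) -
           log2_chain_probability(seq, bg_tpm, bg_inits)
end

score = log_odds("ATGCGTCGAATGGCACGGTGA")   # about 2.28 bits in favor of the CDS model
```

A threshold on such a score (for example, keeping ORFs with a positive value) would be one way to turn the raw probability into a classification rule; the right background model and threshold depend on the organism and are left open here.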
Owner
- Name: BioJulia
- Login: BioJulia
- Kind: organization
- Website: https://biojulia.dev
- Repositories: 79
- Profile: https://github.com/BioJulia
Bioinformatics and Computational Biology in Julia
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you find useful this repo, please cite it as below."
authors:
  - family-names: "García-Botero"
    given-names: "Camilo"
    orcid: "https://orcid.org/0000-0002-0426-7007"
title: "BioMarkovChains.jl: a Julia package to represent DNA as markov chains"
version: 0.0.2
type: software
date-released: 2022-12-13
url: "https://github.com/camilogarciabotero/BioMarkovChains.jl"
doi: 10.5281/zenodo.8157635
preferred-citation:
  type: software
  authors:
    - family-names: "García-Botero"
      given-names: "Camilo"
      orcid: "https://orcid.org/0000-0002-0426-7007"
  title: "BioMarkovChains.jl: a Julia package to represent DNA as markov chains"
  version: 0.0.3
  date-released: 2022-12-13
  url: "https://github.com/camilogarciabotero/BioMarkovChains.jl"
  doi: 10.5281/zenodo.8157635
GitHub Events
Total
- Watch event: 3
- Issue comment event: 1
- Push event: 5
- Pull request event: 1
- Create event: 1
Last Year
- Watch event: 3
- Issue comment event: 1
- Push event: 5
- Pull request event: 1
- Create event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Camilo García | c****2@u****o | 258 |
| dependabot[bot] | 4****] | 2 |
| CompatHelper Julia | c****y@j****g | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: about 1 hour
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.5
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: about 1 hour
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.5
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
Pull Request Authors
- dependabot[bot] (2)
- camilogarciabotero (1)
Dependencies
- actions/checkout v4 composite
- codecov/codecov-action v3 composite
- julia-actions/cache v1 composite
- julia-actions/julia-buildpkg v1 composite
- julia-actions/julia-docdeploy v1 composite
- julia-actions/julia-processcoverage v1 composite
- julia-actions/julia-runtest v1 composite
- julia-actions/setup-julia v1 composite
- JuliaRegistries/TagBot v1 composite
- julia-actions/RegisterAction latest composite