https://github.com/akikuno/midsv

a Python module to translate SAM into MIDSV format.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: plos.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

bioinformatics cstag long-read-sequencing sequence-analysis
Last synced: 5 months ago · JSON representation

Repository

Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 22
Created over 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

midsv

midsv is a Python module to convert SAM to MIDSV format.

MIDSV (Match, Insertion, Deletion, Substitution, and inVersion) is a comma-separated format that represents the differences between a reference and a query. It uses one field per reference position, so a MIDSV string always has the same length as the reference.

⚠️ MIDSV is for the target amplicon sequence (10-100 kbp). It may crash when whole chromosomes are used as reference due to running out of memory.

midsv provides three outputs: MIDSV, CSSPLIT, and QSCORE.

  • MIDSV is a simple representation focusing on mutations
  • CSSPLIT keeps original nucleotides
  • QSCORE provides Phred quality score on each nucleotide

MIDSV (formerly named MIDS) details are described in our paper.

Installation

From PyPI:

    pip install midsv

From Bioconda:

    conda install -c bioconda midsv

Usage

    midsv.transform(sam: list[list], midsv: bool = True, cssplit: bool = True, qscore: bool = True) -> list[dict]

  • midsv.transform() returns a list of dictionaries including QNAME, RNAME, MIDSV, CSSPLIT, and QSCORE.
  • MIDSV, CSSPLIT, and QSCORE are comma-separated strings, each with as many fields as the reference sequence length.

```python
import midsv

# Perfect match

sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['match', '0', 'example', '1', '60', '10M', '*', '0', '0', 'ACGTACGTAC', '0123456789', 'cs:Z:=ACGTACGTAC'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'match',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,M,M,M,M,M,M,M,M',
#   'CSSPLIT': '=A,=C,=G,=T,=A,=C,=G,=T,=A,=C',
#   'QSCORE': '15,16,17,18,19,20,21,22,23,24'
# }]

# Insertion, deletion and substitution

sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['indel_sub', '0', 'example', '1', '60', '5M3I1M2D2M', '*', '0', '0', 'ACGTGTTTCGT', '01234!!!567', 'cs:Z:=ACGT*ag+ttt=C-aa=GT'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'indel_sub',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,M,M,S,3M,D,D,M,M',
#   'CSSPLIT': '=A,=C,=G,=T,*AG,+T|+T|+T|=C,-A,-A,=G,=T',
#   'QSCORE': '15,16,17,18,19,0|0|0|20,-1,-1,21,22'
# }]

# Large deletion

sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['large-deletion', '0', 'example', '1', '60', '2M', '*', '0', '0', 'AC', '01', 'cs:Z:=AC'],
    ['large-deletion', '0', 'example', '9', '60', '2M', '*', '0', '0', 'AC', '89', 'cs:Z:=AC'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'large-deletion',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,D,D,D,D,D,D,M,M',
#   'CSSPLIT': '=A,=C,N,N,N,N,N,N,=A,=C',
#   'QSCORE': '15,16,-1,-1,-1,-1,-1,-1,23,24'
# }]

# Inversion

sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['inversion', '0', 'example', '1', '60', '5M', '*', '0', '0', 'ACGTA', '01234', 'cs:Z:=ACGTA'],
    ['inversion', '16', 'example', '6', '60', '3M', '*', '0', '0', 'CGT', '567', 'cs:Z:=CGT'],
    ['inversion', '2048', 'example', '9', '60', '2M', '*', '0', '0', 'AC', '89', 'cs:Z:=AC'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'inversion',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,M,M,M,m,m,m,M,M',
#   'CSSPLIT': '=A,=C,=G,=T,=A,=c,=g,=t,=A,=C',
#   'QSCORE': '15,16,17,18,19,20,21,22,23,24'
# }]
```
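The "same reference sequence length" property can be checked directly on the output. The following is a minimal sketch that does not call midsv itself: it copies the dictionary shown in the "Insertion, deletion and substitution" example above and counts the comma-separated fields. The names `record`, `reference_length`, and `field_counts` are illustrative only.

```python
# Output dictionary copied from the "Insertion, deletion and substitution" example.
record = {
    "QNAME": "indel_sub",
    "RNAME": "example",
    "MIDSV": "M,M,M,M,S,3M,D,D,M,M",
    "CSSPLIT": "=A,=C,=G,=T,*AG,+T|+T|+T|=C,-A,-A,=G,=T",
    "QSCORE": "15,16,17,18,19,0|0|0|20,-1,-1,21,22",
}

# Reference length taken from the @SQ header of that example (LN:10).
reference_length = 10

# Count comma-separated fields in each output string.
field_counts = {
    key: len(record[key].split(","))
    for key in ("MIDSV", "CSSPLIT", "QSCORE")
}

for key, count in field_counts.items():
    print(key, count, count == reference_length)
```

Each string has ten fields even though the query carries a three-nucleotide insertion and a two-nucleotide deletion, because insertions are folded into the following field and deletions keep their own field.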

Operators

MIDSV

| Op          | Description                 |
| ----------- | --------------------------- |
| M           | Identical sequence          |
| [1-9][0-9]* | Insertion to the reference  |
| D           | Deletion from the reference |
| S           | Substitution                |
| N           | Unknown                     |
| [mdsn]      | Inversion                   |

MIDSV represents an insertion as an integer (the number of inserted nucleotides) prepended to the operator that follows it.

For example, if a five-nucleotide insertion is followed by three matches, MIDSV returns 5M,M,M (not 5,M,M,M), because 5M,M,M keeps the number of comma-separated fields equal to the reference sequence length.
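To make the insertion encoding concrete, here is a small sketch that splits one MIDSV field into an insertion length and an operator. It is not part of midsv; `parse_midsv_token` and the regular expression are assumptions based only on the operator table above.

```python
import re

# Assumed field shape: an optional insertion length followed by one operator letter.
MIDSV_TOKEN = re.compile(r"(\d*)([MDSNmdsn])")


def parse_midsv_token(token: str) -> tuple[int, str]:
    """Return (number of inserted nucleotides, operator) for one MIDSV field."""
    match = MIDSV_TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"unrecognized MIDSV field: {token!r}")
    digits, operator = match.groups()
    return (int(digits) if digits else 0, operator)


fields = "M,M,M,M,S,3M,D,D,M,M".split(",")
parsed = [parse_midsv_token(field) for field in fields]
total_inserted = sum(length for length, _ in parsed)

print(parsed[5])       # the field that carries the insertion
print(total_inserted)  # inserted nucleotides across the whole read
```

Keeping the insertion length on the following operator means the list index of each field is still the reference position.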

CSSPLIT

| Op  | Regex          | Description                  |
| --- | -------------- | ---------------------------- |
| =   | [ACGTN]        | Identical sequence           |
| +   | [ACGTN]        | Insertion to the reference   |
| -   | [ACGTN]        | Deletion from the reference  |
| *   | [ACGTN][ACGTN] | Substitution                 |
|     | [acgtn]        | Inversion                    |
| \|  |                | Separator of insertion sites |

CSSPLIT uses | to separate nucleotides in insertion sites.

Therefore, +A|+C|+G|+T|=A can easily be split into [+A, +C, +G, +T, =A] with "+A|+C|+G|+T|=A".split("|") in Python.
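Because CSSPLIT keeps the original nucleotides, the query sequence can be rebuilt from it. The sketch below is not a midsv function: `cssplit_to_query` is a hypothetical helper, and it assumes that the second letter of a substitution such as *AG is the query base, which matches the example above (cs tag `*ag`, query base G). Inversions are only upper-cased here; their strand is not restored.

```python
def cssplit_to_query(cssplit: str) -> str:
    """Rebuild the query nucleotides from a CSSPLIT string (illustrative only)."""
    bases = []
    for field in cssplit.split(","):      # one field per reference position
        for token in field.split("|"):    # insertion sites hold several tokens
            operator, payload = token[0], token[1:]
            if operator in ("=", "+"):
                bases.append(payload)     # matched or inserted base
            elif operator == "*":
                bases.append(payload[1])  # assumed: second letter is the query base
            # "-" (deletion) and a bare "N" (unknown) add nothing to the query
    return "".join(bases).upper()


cssplit = "=A,=C,=G,=T,*AG,+T|+T|+T|=C,-A,-A,=G,=T"
print(cssplit_to_query(cssplit))
```

For this input the result equals the SEQ field of the `indel_sub` read in the usage example.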

QSCORE

| Op  | Description                  |
| --- | ---------------------------- |
| -1  | Unknown                      |
| \|  | Separator at insertion sites |

QSCORE uses -1 for deleted or unknown nucleotides.

As with CSSPLIT, QSCORE uses | to separate quality scores in insertion sites.
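The QSCORE values in the usage examples are consistent with the standard SAM Phred+33 encoding of the QUAL field. The following sketch shows that conversion and one way to parse a QSCORE string into per-position lists. `phred33` and `parse_qscore` are hypothetical helper names, not part of midsv.

```python
def phred33(qual: str) -> list[int]:
    """Decode a SAM QUAL string (Phred+33) into integer quality scores."""
    return [ord(char) - 33 for char in qual]


def parse_qscore(qscore: str) -> list[list[int]]:
    """Split a QSCORE string into one list of scores per reference position."""
    return [
        [int(score) for score in field.split("|")]
        for field in qscore.split(",")
    ]


print(phred33("0123456789"))
print(parse_qscore("15,16,17,18,19,0|0|0|20,-1,-1,21,22"))
```

Positions with an insertion yield a list with several scores (the inserted bases followed by the reference-aligned base), and deleted positions yield `[-1]`.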

Helper functions

Read SAM file

    midsv.read_sam(path_of_sam: str | Path) -> list[list]

midsv.read_sam reads a SAM file into a list of lists.
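For readers unfamiliar with the list-of-lists layout, the sketch below shows one way such a structure could be produced from a tab-separated SAM file. It is an assumption about the shape of the data, not midsv's actual implementation, and `read_sam_sketch` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def read_sam_sketch(path) -> list[list[str]]:
    """Read a SAM file into one list of tab-separated fields per line."""
    rows = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:  # skip empty lines
                rows.append(line.split("\t"))
    return rows


# Write a two-line SAM file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sam_path = Path(tmp) / "example.sam"
    sam_path.write_text(
        "@SQ\tSN:example\tLN:10\n"
        "match\t0\texample\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t0123456789\tcs:Z:=ACGTACGTAC\n"
    )
    rows = read_sam_sketch(sam_path)

print(rows[0])
print(len(rows[1]))
```

Header lines (starting with `@`) and alignment lines end up in the same list, which mirrors the `sam` input used in the usage examples.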

Read/Write JSON Line (JSONL)

    midsv.write_jsonl(dict: list[dict], path_of_jsonl: str | Path)

    midsv.read_jsonl(path_of_jsonl: str | Path) -> list[dict]

Since midsv.transform returns a list of dictionaries, midsv.write_jsonl writes that list to a file in JSONL format, one dictionary per line.

Conversely, midsv.read_jsonl reads JSONL as a list of dictionaries.
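The JSONL round trip can be illustrated with the standard library alone. This is a sketch of the file format, not midsv's own code: `write_jsonl_sketch` and `read_jsonl_sketch` are hypothetical stand-ins for the helpers described above.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl_sketch(records: list[dict], path) -> None:
    """Write one JSON object per line."""
    with open(path, "w") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_sketch(path) -> list[dict]:
    """Read a JSONL file back into a list of dictionaries."""
    with open(path) as handle:
        return [json.loads(line) for line in handle if line.strip()]


# A record shaped like the output of the "Large deletion" example.
records = [{
    "QNAME": "large-deletion",
    "RNAME": "example",
    "MIDSV": "M,M,D,D,D,D,D,D,M,M",
    "CSSPLIT": "=A,=C,N,N,N,N,N,N,=A,=C",
    "QSCORE": "15,16,-1,-1,-1,-1,-1,-1,23,24",
}]

with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "example.jsonl"
    write_jsonl_sketch(records, jsonl_path)
    restored = read_jsonl_sketch(jsonl_path)

print(restored == records)
```

Since every value is a plain string, the records survive the round trip unchanged.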

Owner

  • Name: Akihiro Kuno
  • Login: akikuno
  • Kind: user
  • Location: Tsukuba, Ibaraki, Japan
  • Company: University of Tsukuba

Bioinformatician working at the Laboratory Animal Resource Center

GitHub Events

Total
  • Create event: 4
  • Release event: 1
  • Issues event: 6
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 38
  • Pull request event: 6
Last Year
  • Create event: 4
  • Release event: 1
  • Issues event: 6
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 38
  • Pull request event: 6

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 7
  • Total pull requests: 3
  • Average time to close issues: 8 months
  • Average time to close pull requests: 4 minutes
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.57
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 3
  • Average time to close issues: about 1 hour
  • Average time to close pull requests: 4 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.2
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • akikuno (7)
Pull Request Authors
  • akikuno (6)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 815 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 24
  • Total maintainers: 1
pypi.org: midsv

Python module to convert SAM to MIDSV format.

  • Versions: 24
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 815 Last month
Rankings
Dependent packages count: 4.7%
Downloads: 13.7%
Average: 19.0%
Dependent repos count: 21.7%
Stargazers count: 25.0%
Forks count: 29.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/deploy_ghpages.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/deploy_pypi.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish master composite
setup.py pypi