https://github.com/akikuno/midsv
a Python module to translate SAM into MIDSV format.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to plos.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: akikuno
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://akikuno.github.io/midsv/midsv/
- Size: 14.2 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 3
- Releases: 22
Metadata Files
README.md
midsv
midsv is a Python module to convert SAM to MIDSV format.
MIDSV (Match, Insertion, Deletion, Substitution, and inVersion) is a comma-separated format representing the difference between a reference and a query with the same length as the reference.
⚠️ MIDSV is intended for target amplicon sequences (10-100 kbp). It may run out of memory and crash when whole chromosomes are used as the reference.
midsv provides three output formats: MIDSV, CSSPLIT, and QSCORE.
- `MIDSV` is a simple representation focusing on mutations
- `CSSPLIT` keeps the original nucleotides
- `QSCORE` provides the Phred quality score of each nucleotide
MIDSV (formerly named MIDS) details are described in our paper.
Installation
From PyPI:
```bash
pip install midsv
```
From Bioconda:
```bash
conda install -c bioconda midsv
```
Usage
```python
midsv.transform(
    sam: list[list],
    midsv: bool = True,
    cssplit: bool = True,
    qscore: bool = True) -> list[dict]
```
`midsv.transform()` returns a list of dictionaries including `QNAME`, `RNAME`, `MIDSV`, `CSSPLIT`, and `QSCORE`. `MIDSV`, `CSSPLIT`, and `QSCORE` are comma-separated strings, each with one field per nucleotide of the reference sequence.
```python
import midsv

# Perfect match
sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['match', '0', 'example', '1', '60', '10M', '*', '0', '0', 'ACGTACGTAC', '0123456789', 'cs:Z:=ACGTACGTAC'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'match',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,M,M,M,M,M,M,M,M',
#   'CSSPLIT': '=A,=C,=G,=T,=A,=C,=G,=T,=A,=C',
#   'QSCORE': '15,16,17,18,19,20,21,22,23,24'
# }]

# Insertion, deletion and substitution
# (QUAL has the same length as SEQ: 11 characters)
sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['indel_sub', '0', 'example', '1', '60', '5M3I1M2D2M', '*', '0', '0', 'ACGTGTTTCGT', '01234!!!567', 'cs:Z:=ACGT*ag+ttt=C-aa=GT'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'indel_sub',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,M,M,S,3M,D,D,M,M',
#   'CSSPLIT': '=A,=C,=G,=T,*AG,+T|+T|+T|=C,-A,-A,=G,=T',
#   'QSCORE': '15,16,17,18,19,0|0|0|20,-1,-1,21,22'
# }]

# Large deletion (two alignments of the same read)
sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['large-deletion', '0', 'example', '1', '60', '2M', '*', '0', '0', 'AC', '01', 'cs:Z:=AC'],
    ['large-deletion', '0', 'example', '9', '60', '2M', '*', '0', '0', 'AC', '89', 'cs:Z:=AC'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'large-deletion',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,D,D,D,D,D,D,M,M',
#   'CSSPLIT': '=A,=C,N,N,N,N,N,N,=A,=C',
#   'QSCORE': '15,16,-1,-1,-1,-1,-1,-1,23,24'
# }]

# Inversion (the middle alignment is on the reverse strand, FLAG 16)
sam = [
    ['@SQ', 'SN:example', 'LN:10'],
    ['inversion', '0', 'example', '1', '60', '5M', '*', '0', '0', 'ACGTA', '01234', 'cs:Z:=ACGTA'],
    ['inversion', '16', 'example', '6', '60', '3M', '*', '0', '0', 'CGT', '567', 'cs:Z:=CGT'],
    ['inversion', '2048', 'example', '9', '60', '2M', '*', '0', '0', 'AC', '89', 'cs:Z:=AC'],
]

midsv.transform(sam)
# [{
#   'QNAME': 'inversion',
#   'RNAME': 'example',
#   'MIDSV': 'M,M,M,M,M,m,m,m,M,M',
#   'CSSPLIT': '=A,=C,=G,=T,=A,=c,=g,=t,=A,=C',
#   'QSCORE': '15,16,17,18,19,20,21,22,23,24'
# }]
```
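Because each of the three fields carries one comma-separated entry per reference position, the fields can be aligned position by position. The following is a minimal sketch that works on a result dictionary copied from the insertion/deletion/substitution example above; it does not call midsv itself, and the variable names (`record`, `columns`, `rows`) are illustrative only.

```python
# Result dictionary copied from the insertion/deletion/substitution example.
record = {
    "QNAME": "indel_sub",
    "RNAME": "example",
    "MIDSV": "M,M,M,M,S,3M,D,D,M,M",
    "CSSPLIT": "=A,=C,=G,=T,*AG,+T|+T|+T|=C,-A,-A,=G,=T",
    "QSCORE": "15,16,17,18,19,0|0|0|20,-1,-1,21,22",
}

reference_length = 10  # taken from the @SQ header field "LN:10"

# Split each field on commas: one entry per reference position.
columns = {key: record[key].split(",") for key in ("MIDSV", "CSSPLIT", "QSCORE")}

# Every field should contain exactly reference_length entries.
lengths = {key: len(entries) for key, entries in columns.items()}

# Build a position-wise view: (1-based position, MIDSV, CSSPLIT, QSCORE).
rows = [
    (position, m, c, q)
    for position, (m, c, q) in enumerate(
        zip(columns["MIDSV"], columns["CSSPLIT"], columns["QSCORE"]), start=1
    )
]

for row in rows:
    print(*row, sep="\t")
```

The sixth row holds the insertion: all three inserted nucleotides are attached to reference position 6, so the row count still equals the reference length.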
Operators
MIDSV
| Op | Description |
| ----------- | --------------------------- |
| M | Identical sequence |
| [1-9][0-9]* | Insertion to the reference |
| D | Deletion from the reference |
| S | Substitution |
| N | Unknown |
| [mdsn] | Inversion |
MIDSV represents an insertion as an integer (the number of inserted nucleotides) prepended to the operator that follows it.
If five inserted nucleotides are followed by three matches, MIDSV returns `5M,M,M` (not `5,M,M,M`), since `5M,M,M` keeps the number of comma-separated fields equal to the reference sequence length.
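The insertion encoding can be unpacked with a small regular expression. This is a sketch only: `parse_midsv_field` is a hypothetical helper, not part of midsv, and it assumes that an insertion count may precede any of the operators listed in the table, including the lowercase inversion operators.

```python
import re

# Optional insertion count followed by exactly one operator.
# Assumption: counts may precede lowercase (inversion) operators as well.
MIDSV_FIELD = re.compile(r"^([1-9][0-9]*)?([MDSNmdsn])$")


def parse_midsv_field(field: str) -> tuple[int, str]:
    """Split one MIDSV field into (inserted nucleotides, operator)."""
    match = MIDSV_FIELD.match(field)
    if match is None:
        raise ValueError(f"unexpected MIDSV field: {field!r}")
    count, operator = match.groups()
    return (int(count) if count else 0, operator)


# MIDSV string from the insertion/deletion/substitution example.
fields = "M,M,M,M,S,3M,D,D,M,M".split(",")
parsed = [parse_midsv_field(field) for field in fields]

# Total number of inserted nucleotides across the read.
total_inserted = sum(count for count, _ in parsed)
print(parsed[5], total_inserted)
```

Keeping the count and the operator in one field is what lets `len(fields)` stay equal to the reference length regardless of how many nucleotides are inserted.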
CSSPLIT
| Op | Regex | Description |
| --- | -------------- | ---------------------------- |
| = | [ACGTN] | Identical sequence |
| + | [ACGTN] | Insertion to the reference |
| - | [ACGTN] | Deletion from the reference |
| * | [ACGTN][ACGTN] | Substitution |
|  | [acgtn] | Inversion |
| \| |  | Separator of insertion sites |
CSSPLIT uses | to separate nucleotides in insertion sites.
Therefore, `+A|+C|+G|+T|=A` can easily be split into `[+A, +C, +G, +T, =A]` with `"+A|+C|+G|+T|=A".split("|")` in Python.
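Splitting on `,` and then on `|` is enough to walk a CSSPLIT string nucleotide by nucleotide. As an illustration, the sketch below rebuilds the query sequence from CSSPLIT. `query_from_cssplit` is a hypothetical helper, not part of midsv; it assumes that the second letter of a substitution such as `*AG` is the query base (as in the example above, where the read carries G at that position) and it keeps the letter case as-is.

```python
def query_from_cssplit(cssplit: str) -> str:
    """Rebuild the query nucleotides encoded in a CSSPLIT string."""
    bases = []
    for field in cssplit.split(","):       # one field per reference position
        for token in field.split("|"):     # inserted bases, then the anchor
            if not token:
                continue
            operator = token[0]
            if operator in "=+":           # match or inserted nucleotide
                bases.append(token[1])
            elif operator == "*":          # "*AG": reference A, query G
                bases.append(token[2])
            # "-" (deletion) and a bare "N" (unknown) add no query base
    return "".join(bases)


cssplit = "=A,=C,=G,=T,*AG,+T|+T|+T|=C,-A,-A,=G,=T"
query = query_from_cssplit(cssplit)
print(query)
```

For this input the result equals the SEQ field of the SAM record in the insertion/deletion/substitution example.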
QSCORE
| Op | Description |
| --- | ---------------------------- |
| -1 | Unknown |
| \| | Separator at insertion sites |
QSCORE uses -1 for deleted or unknown nucleotides.
As with CSSPLIT, QSCORE uses | to separate quality scores in insertion sites.
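The numbers in the examples are consistent with the standard Phred+33 encoding of the SAM QUAL field, where the score is the ASCII code of the character minus 33. The sketch below shows that conversion and one way to collect the observed scores from a QSCORE string; both function names are hypothetical and not part of midsv.

```python
def phred33_to_scores(qual: str) -> list[int]:
    """Decode a Phred+33 QUAL string into integer quality scores."""
    return [ord(character) - 33 for character in qual]


def observed_scores(qscore: str) -> list[int]:
    """Flatten a QSCORE string and drop the -1 placeholders."""
    scores = []
    for field in qscore.split(","):        # one field per reference position
        for value in field.split("|"):     # inserted bases, then the anchor
            if value != "-1":              # -1 marks deleted/unknown bases
                scores.append(int(value))
    return scores


decoded = phred33_to_scores("0123456789")
scores = observed_scores("15,16,17,18,19,0|0|0|20,-1,-1,21,22")
mean_quality = sum(scores) / len(scores)
print(decoded, scores, round(mean_quality, 2))
```

Dropping the -1 entries before averaging avoids counting deleted positions, which have no sequenced nucleotide, as low-quality bases.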
Helper functions
Read SAM file
```python
midsv.read_sam(path_of_sam: str | Path) -> list[list]
```
`midsv.read_sam` reads a SAM file into a list of lists.
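The list-of-lists shape matches the `sam` variables in the Usage examples: one inner list per SAM line, with every field kept as a string. The following stand-in illustrates that shape using only the standard library. It assumes plain tab splitting of each non-empty line; `read_sam_sketch` is a hypothetical name, and the real `midsv.read_sam` may behave differently.

```python
import tempfile
from pathlib import Path


def read_sam_sketch(path_of_sam):
    """Read a SAM file into a list of lists, one inner list per line.

    Illustration only: header and alignment lines are both split on tabs.
    """
    with open(path_of_sam) as handle:
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]


# Write a tiny SAM file matching the "Perfect match" example, then read it.
sam_text = (
    "@SQ\tSN:example\tLN:10\n"
    "match\t0\texample\t1\t60\t10M\t*\t0\t0\tACGTACGTAC\t0123456789\tcs:Z:=ACGTACGTAC\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    sam_path = Path(tmp_dir) / "example.sam"
    sam_path.write_text(sam_text)
    sam = read_sam_sketch(sam_path)

print(sam)
```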
Read/Write JSON Line (JSONL)
```python
midsv.write_jsonl(dict: list[dict], path_of_jsonl: str | Path)
```
```python
midsv.read_jsonl(path_of_jsonl: str | Path) -> list[dict]
```
Since `midsv.transform` returns a list of dictionaries, `midsv.write_jsonl` writes it to a file in JSONL format, with one dictionary per line.
Conversely, `midsv.read_jsonl` reads a JSONL file as a list of dictionaries.
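JSONL stores one JSON object per line, so a round trip needs nothing beyond the standard `json` module. The sketch below shows the idea; `write_jsonl_sketch` and `read_jsonl_sketch` are hypothetical stand-ins and are not midsv's implementation.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl_sketch(records, path_of_jsonl):
    """Write each dictionary as one JSON object per line."""
    with open(path_of_jsonl, "w") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_sketch(path_of_jsonl):
    """Read a JSONL file back into a list of dictionaries."""
    with open(path_of_jsonl) as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Result copied from the "Perfect match" example, with the QNAME of the read.
records = [
    {
        "QNAME": "match",
        "RNAME": "example",
        "MIDSV": "M,M,M,M,M,M,M,M,M,M",
        "CSSPLIT": "=A,=C,=G,=T,=A,=C,=G,=T,=A,=C",
        "QSCORE": "15,16,17,18,19,20,21,22,23,24",
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_path = Path(tmp_dir) / "example.jsonl"
    write_jsonl_sketch(records, jsonl_path)
    line_count = len(jsonl_path.read_text().splitlines())
    restored = read_jsonl_sketch(jsonl_path)

print(restored == records)
```

Since every read occupies its own line, a large file can also be processed one line at a time instead of being loaded in full.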
Owner
- Name: Akihiro Kuno
- Login: akikuno
- Kind: user
- Location: Tsukuba, Ibaraki, Japan
- Company: University of Tsukuba
- Website: https://researchmap.jp/7000027584/?lang=en
- Twitter: akikuno_sh
- Repositories: 12
- Profile: https://github.com/akikuno
Bioinformatician working at the Laboratory Animal Resource Center
GitHub Events
Total
- Create event: 4
- Release event: 1
- Issues event: 6
- Delete event: 3
- Issue comment event: 1
- Push event: 38
- Pull request event: 6
Last Year
- Create event: 4
- Release event: 1
- Issues event: 6
- Delete event: 3
- Issue comment event: 1
- Push event: 38
- Pull request event: 6
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 7
- Total pull requests: 3
- Average time to close issues: 8 months
- Average time to close pull requests: 4 minutes
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.57
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 3
- Average time to close issues: about 1 hour
- Average time to close pull requests: 4 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.2
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- akikuno (7)
Pull Request Authors
- akikuno (6)
Packages
- Total packages: 1
- Total downloads: pypi 815 last-month
- Total dependent packages: 1
- Total dependent repositories: 1
- Total versions: 24
- Total maintainers: 1
pypi.org: midsv
Python module to convert SAM to MIDSV format.
- Homepage: https://github.com/akikuno/midsv
- Documentation: https://midsv.readthedocs.io/
- License: MIT
- Latest release: 0.11.1 (published about 1 year ago)
Rankings
Maintainers (1)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- peaceiris/actions-gh-pages v3 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- pypa/gh-action-pypi-publish master composite