variant

python utils for genomic variant (WIP)

https://github.com/y9c/variant

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.0%) to scientific vocabulary
Last synced: 9 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: y9c
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 623 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

Python package for genomic variant analysis


How to install?

pip install variant

How to use?

🧬 The variant motif subcommand fetches the motif sequence around a given site.

```
Usage: variant motif [OPTIONS]

Fetch genomic motif.

╭─ Options ─────────────────────────────────────────────────────────────────╮
│   --input        -i  TEXT  Input position file.                           │
│   --output       -o  TEXT  Output annotation file.                        │
│ * --fasta        -f  TEXT  reference fasta file. [required]               │
│   --npad         -n  TEXT  Number of padding base to call motif. If you   │
│                            want to set different left and right pads,     │
│                            use comma to separate them. (eg. 2,3)          │
│   --padding      -p  TEXT  Padding base to use for motif. 'N' by default  │
│                            but can be set to any single letter            │
│   --with-header  -H        With header line in input file.                │
│   --columns      -c  TEXT  Sets columns for site info.                    │
│                            (Chrom,Pos,Strand)                             │
│                            [default: 1,2,3]                               │
│   --to-upper     -u        Convert motif to upper case.                   │
│   --wrap-site    -w        Wrap motif site.                               │
│   --help         -h        Show this message and exit.                    │
╰───────────────────────────────────────────────────────────────────────────╯
```

demo:

To get the 2 bases before and the 3 bases after each given site, with the site itself wrapped in brackets and the strand information taken into account, use -n 2,3 -w.
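As a rough illustration of what -n 2,3 -w asks for, the Python sketch below extracts a padded motif from an in-memory sequence. It is not the package's implementation: the function name fetch_motif, the 1-based position convention, the square brackets, and the choice to report minus-strand motifs as reverse complements are all assumptions made for this example.

```python
# Hypothetical helper, not part of the variant package.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fetch_motif(chrom_seq, pos, strand="+", left=2, right=3, pad="N", wrap=False):
    """Return `left` bases before and `right` bases after a 1-based site.

    Positions that fall outside the sequence are filled with `pad`.
    For strand "-", the window is taken on the reverse strand (assumption).
    """
    if strand == "-":
        # Upstream on the minus strand lies to the right on the forward strand.
        left, right = right, left
    idx = pos - 1  # convert the 1-based position to a 0-based index
    start = idx - left
    before = pad * max(0, -start) + chrom_seq[max(0, start):idx]
    site = chrom_seq[idx]
    after = chrom_seq[idx + 1:idx + 1 + right]
    after += pad * (right - len(after))
    if strand == "-":
        before, site, after = (
            reverse_complement(after),
            reverse_complement(site),
            reverse_complement(before),
        )
    if wrap:
        site = f"[{site}]"
    return before + site + after


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    print(fetch_motif(seq, 5, "+", 2, 3, wrap=True))
    print(fetch_motif(seq, 1, "+", 2, 3, wrap=True))
```

The padding character plays the role of the --padding option described in the help text: sites close to either end of the sequence still yield a motif of fixed length.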

🧫 The variant effect subcommand infers the effect of a mutation.

```
Usage: variant effect [OPTIONS]

Annotation genomic variant effect.

╭─ Options ────────────────────────────────────────────────────────────────╮
│ --input                 -i  TEXT     Input position file.                │
│ --output                -o  TEXT     Output annotation file              │
│ --reference             -r  TEXT     reference species                   │
│ --reference-gtf             TEXT     Customized reference gtf file.      │
│ --reference-transcript      TEXT     Customized reference transcript     │
│                                      fasta file.                         │
│ --reference-protein         TEXT     Customized reference protein fasta  │
│                                      file.                               │
│ --release               -e  INTEGER  ensembl release                     │
│ --strandness            -s           Use strand infomation or not?       │
│ --pU-mode               -u           Make rRNA, tRNA, snoRNA into top    │
│                                      priority.                           │
│ --npad                  -n  INTEGER  Number of padding base to call      │
│                                      motif.                              │
│ --all-effects           -a           Output all effects.                 │
│ --with-header           -H           With header line in input file.     │
│ --columns               -c  TEXT     Sets columns for site info.         │
│                                      (Chrom,Pos,Strand,Ref,Alt)          │
│                                      [default: 1,2,3,4,5]                │
│ --help                  -h           Show this message and exit.         │
╰──────────────────────────────────────────────────────────────────────────╯
```

demo:

Store the following table in a file (sites.tsv).

| Chrom | Position  | Strand | Ref | Alt |
| ----- | --------- | ------ | --- | --- |
| chr1  | 230703034 | -      | C   | T   |
| chr12 | 69353439  | +      | A   | T   |
| chr14 | 23645352  | +      | G   | T   |
| chr2  | 215361150 | -      | A   | T   |
| chr2  | 84906537  | +      | C   | T   |
| chr22 | 39319077  | -      | T   | A   |
| chr22 | 39319095  | -      | T   | A   |
| chr22 | 39319098  | -      | T   | A   |

Run command:

variant-effect -i sites.tsv -H -r human -e 108 -t RNA -H -c 1,2,3

  • -i specifies the input file.
  • -H indicates that the input file has a header line, so the first row is skipped.
  • -r selects the reference genome (species); the default is human.
  • -e specifies the Ensembl release version.
  • -c selects which columns of the input file to use; by default the first 5 columns are used.
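To make the meaning of -H and -c concrete, here is a minimal Python sketch of how a header line and a column selection could be applied to a sites file. It is an illustration only, not the package's parser: the function name read_sites is hypothetical, and it assumes a tab-separated file with 1-based column numbers, as the default 1,2,3,4,5 suggests.

```python
import csv
import io


def read_sites(handle, columns="1,2,3,4,5", with_header=False):
    """Yield the selected columns of each row as a tuple of strings.

    `columns` is a comma-separated list of 1-based column numbers.
    """
    indices = [int(c) - 1 for c in columns.split(",")]
    reader = csv.reader(handle, delimiter="\t")
    if with_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        yield tuple(row[i] for i in indices)


if __name__ == "__main__":
    data = (
        "Chrom\tPosition\tStrand\tRef\tAlt\n"
        "chr1\t230703034\t-\tC\tT\n"
        "chr12\t69353439\t+\tA\tT\n"
    )
    for site in read_sites(io.StringIO(data), columns="1,2,3", with_header=True):
        print(site)
```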

This produces the following output:

| Chrom | Position | Strand | Ref | Alt | muttype | genetype | genename | genepos | transcriptname | transcriptpos | transcriptmotif | codingpos | codonref | aapos | aaref | distance2splice |
| :---- | :-------- | :----- | :-- | :-- | :------------ | :------------- | :---------------------- | :------- | :-------------------------- | :------------- | :-------------------- | :--------- | :-------- | :----- | :----- | --------------- |
| chr1 | 230703034 | - | C | T | ThreePrimeUTR | protein_coding | ENSG00000135744(AGT) | 42543 | ENST00000680041(AGT-208) | 1753 | TGTGTCACCCCCAGTCTCCCA | None | None | None | None | 295 |
| chr12 | 69353439 | + | A | T | ThreePrimeUTR | protein_coding | ENSG00000090382(LYZ) | 5059 | ENST00000261267(LYZ-201) | 695 | TAGAACTAATACTGGTGAAAA | None | None | None | None | 286 |
| chr14 | 23645352 | + | G | T | ThreePrimeUTR | protein_coding | ENSG00000100867(DHRS2) | 15238 | ENST00000344777(DHRS2-202) | 1391 | CTGCCATTCTGCCAGACTAGC | None | None | None | None | 210 |
| chr2 | 215361150 | - | A | T | ThreePrimeUTR | protein_coding | ENSG00000115414(FN1) | 74924 | ENST00000323926(FN1-201) | 8012 | GGCCCGCAATACTGTAGGAAC | None | None | None | None | 476 |
| chr2 | 84906537 | + | C | T | ThreePrimeUTR | protein_coding | ENSG00000034510(TMSB10) | 882 | ENST00000233143(TMSB10-201) | 327 | CCTGGGCACTCCGCGCCGATG | None | None | None | None | 148 |
| chr22 | 39319077 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1313 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
| chr22 | 39319095 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1295 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |
| chr22 | 39319098 | - | T | A | Intronic | protein_coding | ENSG00000100316(RPL3) | 1292 | ENST00000216146(RPL3-201) | None | None | None | None | None | None | None |

🧫 The variant coordinate subcommand maps chromosome names and positions between different reference coordinate systems.

```
Usage: variant coordinate [OPTIONS]

Fetch genomic motif.

╭─ Options ───────────────────────────────────────────────────────────────────╮
│ --input              -i  TEXT  Input position file.                         │
│ --output             -o  TEXT  Output annotation file.                      │
│ --reference-mapping  -m  TEXT  Mapping file for chrom name, first column is │
│                                chrom in the input, second column is chrom   │
│                                in the reference db (sep by tab)             │
│ --buildin-mapping    -M  TEXT  Build-in mapping for chrom name: U2E (UCSC   │
│                                to Ensembl), E2U (Ensembl to UCSC)           │
│ --with-header        -H        With header line in input file.              │
│ --columns            -c  TEXT  Sets columns for site info. (Chrom)          │
│                                [default: 1]                                 │
│ --help               -h        Show this message and exit.                  │
╰─────────────────────────────────────────────────────────────────────────────╯
```
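The chromosome-name mapping described by --reference-mapping and --buildin-mapping can be pictured with the Python sketch below. It is a simplified illustration rather than the package's code: the function names are hypothetical, and the U2E/E2U rules shown (dropping or adding the chr prefix, and treating chrM and MT as equivalent) are an assumption that covers the primary chromosomes only, not unplaced scaffolds or alternate contigs.

```python
def ucsc_to_ensembl(chrom: str) -> str:
    """Simplified U2E rule: chr1 -> 1, chrX -> X, chrM -> MT."""
    if chrom == "chrM":
        return "MT"
    return chrom[3:] if chrom.startswith("chr") else chrom


def ensembl_to_ucsc(chrom: str) -> str:
    """Simplified E2U rule: 1 -> chr1, X -> chrX, MT -> chrM."""
    if chrom == "MT":
        return "chrM"
    return chrom if chrom.startswith("chr") else "chr" + chrom


def load_mapping(lines):
    """Parse tab-separated lines: input chrom name, then reference chrom name."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        source, target = line.split("\t")[:2]
        mapping[source] = target
    return mapping


def convert_chrom(chrom, mapping=None, builtin=None):
    """Apply a custom mapping first, then an optional built-in rule."""
    if mapping and chrom in mapping:
        return mapping[chrom]
    if builtin == "U2E":
        return ucsc_to_ensembl(chrom)
    if builtin == "E2U":
        return ensembl_to_ucsc(chrom)
    return chrom


if __name__ == "__main__":
    custom = load_mapping(["chrUn_1\tscaffold_1\n"])
    for name in ["chr1", "chrM", "chrUn_1"]:
        print(name, "->", convert_chrom(name, mapping=custom, builtin="U2E"))
```

Giving the custom mapping precedence over the built-in rule is a design choice of this sketch; the source does not say how the tool behaves when both options are supplied.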

⏳⏳⏳ more functions will be supported in the future

TODO:

Owner

  • Name: Chang Y
  • Login: y9c
  • Kind: user

(yec)

GitHub Events

Total
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 37
Last Year
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 37

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • yangli04 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

pyproject.toml pypi
  • pytest ^5.2 develop
  • setuptools 57.4.0 develop
  • click ^8.1.3
  • pyensembl ^2.0.0
  • python ^3.7
  • varcode *
.github/workflows/docs.yaml actions
  • actions/checkout v3 composite
  • actions/configure-pages v2 composite
  • actions/deploy-pages v1 composite
  • actions/jekyll-build-pages v1 composite
  • actions/upload-pages-artifact v1 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite