https://github.com/anton-bushuiev/mutils

Mutation utilities for protein design

https://github.com/anton-bushuiev/mutils

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.8%) to scientific vocabulary
Last synced: 9 months ago · JSON representation

Repository

Mutation utilities for protein design

Basic Info
  • Host: GitHub
  • Owner: anton-bushuiev
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 29.9 MB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Mutation utilities for protein design

A very raw version, work in progress. To install, run pip install git+https://github.com/anton-bushuiev/mutils.git.

Mutation class

```python from mutils.mutations import Mutation

Parse a double-point mutation

mutation = Mutation('TC13A,GC13aA') mutation

Mutation(muts=[PointMutation(wt='T', chain='C', pos=13, ins=None, m='A'), PointMutation(wt='G', chain='C', pos=13, ins='a', m='A')])

Revert

mutation.revert()

Mutation(muts=[PointMutation(wt='A', chain='C', pos=13, ins=None, m='T'), PointMutation(wt='A', chain='C', pos=13, ins='a', m='G')])

Revert and convert back to string

str(mut.revert())

'AC13T,AC13aG'

Check if the wild type is present in a PDB file

mutation.wtinpdb(mypdbfile_path)

True

Convert wild types to Graphein format

mutation.wttographein()

['THR:C:13', 'GLY:C:13:a'] ```

MutationSpace class

```python from mutils.mutations import MutationSpace

Define possible substitutions

space = MutationSpace({'A10': 'AG', 'A11': 'A', 'A12': ''}) space

MutationSpace({'A10': 'AG', 'A11': 'A', 'A12': ''})

Construct all single-point mutations in the space

space.construct(d=1)

['A10A', 'A10G', 'A11A']

Construct all double-point mutations in the space

space.construct(d=2)

['A10A,A11A', 'A10G,A11A']

Construct all mutations in the space

space.construct()

['A10A,A11A', 'A10A', 'A10G,A11A', 'A10G', 'A11A']

Get the size of the space without constructing

space.size(d=2)

2 ```

Utilities for .pdb and .fasta files

```python from mutils.pdb import downloadpdb, pdb2fasta from mutils.proteins import parsefasta

Download structure from PDB

download_pdb('1bui')

Convert sequences to FASTA

fasta = pdb2fasta('1bui.pdb', verbose=False) fasta

'>1BUI:A\nPSFDCGKPQVEPKKCXPGXVVGGCVAHPHSWPWQVSLRTRFGMHFCGGTLISPEWVLTAAHCLEKSPRPSSYKVILGAHQEVNLEPHVQEIEVSRLFLEPTRXXXXXXKDIALLKLSSPAVITDKVIPACLPSPNYVVADRTECFITGWGETQGXXTFGAGLLKEAQLPVIENKVCNRYEFLNGRVQSTELCAGHLAGGTDSCQGDSGGPLVCFEKDKYILQGVTSWGLXGCARPNKPGVYVRVSRFVTWIEGVMRNN\n>1BUI:B\nAPSFDCGKPQVEPKKCXPGXVVGGCVAHPHSWPWQVSLRTRFGMHFCGGTLISPEWVLTAAHCLEKSPRPSSYKVILGAHQEVNLEPHVQEIEVSRLFLEPTRXXXXXXKDIALLKLSSPAVITDKVIPACLPSPNYVVADRTECFITGWGETQGXXTFGAGLLKEAQLPVIENKVCNRYEFLNGRVQSTELCAGHLAGGTDSCQGDSGGPLVCFEKDKYILQGVTSWGLXGCARPNKPGVYVRVSRFVTWIEGVMRNN\n>1BUI:C\nASYFEPTGPYLMVNVTGVDSKGNELLSPHYVEFPIKPGTTLTKEKIEYYVEWALDATAYKEFRVVELDPSAKIEVTYYDKNKKKEETKSFPITEKGFVVPDLSEHIKNPGFNLITKVVIEKK\n'

Get sequences as a dict

parse_fasta(fasta)

{'A': 'PSFDCGKPQVEPKKCPGVVGGCVAHPHSWPWQVSLRTRFGMHFCGGTLISPEWVLTAAHCLEKSPRPSSYKVILGAHQEVNLEPHVQEIEVSRLFLEPTRKDIALLKLSSPAVITDKVIPACLPSPNYVVADRTECFITGWGETQGTFGAGLLKEAQLPVIENKVCNRYEFLNGRVQSTELCAGHLAGGTDSCQGDSGGPLVCFEKDKYILQGVTSWGLGCARPNKPGVYVRVSRFVTWIEGVMRNN', 'B': 'APSFDCGKPQVEPKKCPGVVGGCVAHPHSWPWQVSLRTRFGMHFCGGTLISPEWVLTAAHCLEKSPRPSSYKVILGAHQEVNLEPHVQEIEVSRLFLEPTRKDIALLKLSSPAVITDKVIPACLPSPNYVVADRTECFITGWGETQGTFGAGLLKEAQLPVIENKVCNRYEFLNGRVQSTELCAGHLAGGTDSCQGDSGGPLVCFEKDKYILQGVTSWGLGCARPNKPGVYVRVSRFVTWIEGVMRNN', 'C': 'ASYFEPTGPYLMVNVTGVDSKGNELLSPHYVEFPIKPGTTLTKEKIEYYVEWALDATAYKEFRVVELDPSAKIEVTYYDKNKKKEETKSFPITEKGFVVPDLSEHIKNPGFNLITKVVIEKK'} ```

Reading preprocessed datasets

Please cite the corresponding papers if you find datasets useful (see mutils/datasets/README.md).

```python from mutils.data import load_SKEMPI2

Load and preprocess SKEMPI2 dataset

df = loadSKEMPI2()[0] df #Pdb Mutation(s)PDB Mutation(s)cleaned iMutationLocation(s) Holdouttype Holdoutproteins ... dGmut dGwt ddG PDB Id Partner 1 Partner 2 0 1CSEEI LI45G LI38G COR Pr/PI Pr/PI ... -14.022334 -16.302911 2.280577 1CSE E I 1 1CSEEI LI45S LI38S COR Pr/PI Pr/PI ... -15.114136 -16.302911 1.188776 1CSE E I 2 1CSEEI LI45P LI38P COR Pr/PI Pr/PI ... -9.537466 -16.302911 6.765446 1CSE E I 3 1CSEEI LI45I LI38I COR Pr/PI Pr/PI ... -13.320410 -16.302911 2.982502 1CSE E I 4 1CSEEI LI45D LI38D COR Pr/PI Pr/PI ... -11.891069 -16.302911 4.411843 1CSE E I ... ... ... ... ... ... ... ... ... ... ... ... ... ... 7080 3QIBABPCD KP9R KP8R COR TCR/pMHC TCR/pMHC,1JCKAB ... -4.938011 -7.175045 2.237034 3QIB ABP CD 7081 3QIBABPCD TP12A TP11A COR TCR/pMHC TCR/pMHC,1JCKAB ... -4.036047 -7.175045 3.138999 3QIB ABP CD 7082 3QIBABPCD TP12S TP11S COR TCR/pMHC TCR/pMHC,1JCKAB ... -6.099323 -7.175045 1.075723 3QIB ABP CD 7083 3QIBABPCD TP12N TP11N COR TCR/pMHC TCR/pMHC,1JCKAB ... -5.951210 -7.175045 1.223835 3QIB ABP CD 7084 3QIBABPCD YP7F,TP12S YP6F,TP11S COR,COR TCR/pMHC TCR/pMHC,1JCKAB ... -5.958076 -7.175045 1.216970 3QIB ABP CD

[6706 rows x 35 columns] ```

Owner

  • Name: Anton Bushuiev
  • Login: anton-bushuiev
  • Kind: user
  • Location: Prague
  • Company: Czech Technical University in Prague

PhD student. Machine learning / computational biology 🤖🌱

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
Last Year
  • Watch event: 1
  • Push event: 2

Dependencies

requirements.txt pypi
setup.py pypi