minsepie

Modelling insertion efficiency for Prime Insertion Experiments

https://github.com/julianeweller/minsepie

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 5 DOI reference(s) in README
✓
Academic publication links
Links to: biorxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (13.6%) to scientific vocabulary

Keywords

bioinformatics crispr-cas9 gene-editing

Last synced: 11 months ago

Repository

Modelling insertion efficiency for Prime Insertion Experiments

Basic Info

Host: GitHub
Owner: julianeweller
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 27.7 MB

Statistics

Stars: 13
Watchers: 1
Forks: 3
Open Issues: 0
Releases: 2

Created over 4 years ago · Last pushed over 2 years ago

Metadata Files

Readme License Citation

MinsePIE :pie:

To predict insertion rates, you can also use the MinsePIE online tool at elixir.ut.ee/minsepie/.

Modelling insertion efficiency for Prime Insertion Experiments

Writing short sequences into the genome with prime editing facilitates protein tagging, correction of pathogenic deletions, and many more exciting applications. We studied the features that influence insertion efficiency and built a model to predict insertion rates based on the insert sequence. This helps users choose optimal constructs for DNA insertion with prime editing.

The provided model "MinsePIE.sav" was trained on 22,974 events: a library of 2,666 insert sequences up to 69 nt in length in four genomic sites (CLYBL, EMX1, FANCF, HEK3) in three human cell lines, using the PE2 prime editing system.
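Because the model was trained on inserts of up to 69 nt, it can be useful to screen candidate inserts before requesting predictions. The following is a minimal sketch of such a pre-check. `check_insert` and its constants are hypothetical helpers for illustration and are not part of the minsepie package; the 69 nt limit is taken from the training description above, and restricting inserts to A, C, G and T is an assumption.

```python
# Hypothetical pre-check, not part of the minsepie package.
MAX_TRAINED_INSERT_LEN = 69  # longest insert in the training library (see text above)
VALID_BASES = set("ACGT")    # assumption: only unambiguous DNA bases are accepted


def check_insert(seq, max_len=MAX_TRAINED_INSERT_LEN):
    """Return a list of warnings for one insert sequence (empty list = no issues)."""
    warnings = []
    seq = seq.upper()
    if not seq:
        warnings.append("empty sequence")
    bad = sorted(set(seq) - VALID_BASES)
    if bad:
        warnings.append("non-ACGT characters: " + "".join(bad))
    if len(seq) > max_len:
        warnings.append(
            f"length {len(seq)} exceeds training range of {max_len} nt"
        )
    return warnings


if __name__ == "__main__":
    for insert in ["TGTCA", "ACGTN", "A" * 70]:
        print(insert[:10], check_insert(insert))
```

Inserts that produce warnings may still be accepted by the model; the check only flags sequences that fall outside the range described for the training data.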

System requirements

Python 3.8
Python packages: argparse (1.4.0), more_itertools (8.12.0), biopython (1.79), scikit-learn (0.24.2), scipy (1.5.3), XGBoost (1.5.0), pandas (1.3.4), pandarallel (1.5.4), regex (2021.8.3), RNAlib (2.4.18)
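To compare an existing environment against the versions listed above, a short script along these lines may help. It is a sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8). The PyPI distribution names in `EXPECTED` are assumptions, and RNAlib is left out because it is not necessarily installed as a Python distribution under that name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the system requirements above; distribution names are assumed.
EXPECTED = {
    "more-itertools": "8.12.0",
    "biopython": "1.79",
    "scikit-learn": "0.24.2",
    "scipy": "1.5.3",
    "xgboost": "1.5.0",
    "pandas": "1.3.4",
    "pandarallel": "1.5.4",
    "regex": "2021.8.3",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(expected=EXPECTED):
    """Map each distribution name to a (listed, installed) version pair."""
    return {name: (want, installed_version(name)) for name, want in expected.items()}


if __name__ == "__main__":
    for name, (want, have) in report().items():
        status = "ok" if have == want else "differs"
        print(f"{name}: listed {want}, installed {have} ({status})")
```

A "differs" status is not necessarily a problem, since newer versions may also work; it only marks where the environment departs from the listed versions.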

If you encounter problems setting up the environment or packages, please check out the detailed description for installing the packages in the scripts folder.

Usage guide

The MinsePIE tools are constantly improving. Therefore, it is recommended to clone the GitHub repository and update it frequently:

```
# clone
git clone https://github.com/julianeweller/MinsePIE.git

# update
git pull

# install MinsePIE: go into the folder with setup.py
pip install .
```

Here is an example of how to use minsepie in Python:

```
import minsepie

minsepie.predict(['TGTCA'], pbs = 'CAGACTGAGCACG', ha = 'TGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCA', spacer = 'GGCCCAGACTGAGCACGTGA', mmr = 0, outdir = "./")
```

Python API

Prediction

minsepie.predict(insert, fasta = None, pbs = None, ha = None, spacer = None, halen = 15, pbslen = 13, spclen = 20, mmr = 0, inputmode = None, cellline = None, outdir = None, mean = None, std = None, model = None)

Predicts editing outcomes for insert sequences based on pegRNA features, which are either given individually or determined from a fasta sequence. Provide either fasta (optionally with halen, pbslen, spclen) or pbs + ha + spacer.

| Parameter | Type | Description |
| ------------- | ------------- | ------------- |
| insert | list | Insert sequences to be tested |
| fasta | file | Fasta file with target sequences |
| pbs | str | Primer binding site sequence for the pegRNA |
| ha | str | Homology arm for reverse transcriptase template covering the homology sequence. This does not include the new sequence to be inserted |
| spacer | str | pegRNA spacer |
| halen | int | Length of the RTT. Only needed if target site is provided as fasta. |
| pbslen | int | Length of the PBS. Only needed if target site is provided as fasta. |
| spclen | int | Length of the spacer. Only needed if target site is provided as fasta. |
| mmr | int | Mismatch repair proficiency of cell line. 0: MMR deficient. 1: MMR proficient |
| inputmode | "dna", "protein", or None | Insert sequence can either be nucleotides or amino acids. If None, default is DNA. |
| cellline | str, None | Instead of providing the MMR status directly, cell line can be provided and MMR status is determined based on reference file. |
| outdir | dir | Output directory |
| mean | int, None | Expected mean editing rate for the prime editing screen used to scale the z-factor to an insertion rate. |
| std | int, None | Expected standard deviation for the prime editing screen used to scale the z-factor to an insertion rate. |
| model | str, None | Model used to predict editing efficiency. |
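The mean and std parameters are described as scaling the z-factor to an insertion rate. One plausible reading is the usual inverse of z-score standardisation, sketched below. This is an assumption about the scaling and not a confirmed description of what `minsepie.predict` does internally; `scale_z_to_rate` is a hypothetical helper.

```python
def scale_z_to_rate(z, mean, std):
    """Invert a z-score standardisation: rate = mean + z * std.

    Assumption: the predicted z-factor is a standard score relative to the
    screen's mean and standard deviation of editing rates.
    """
    return mean + z * std


if __name__ == "__main__":
    # A z-factor of 0 maps to the screen mean; positive values lie above it.
    for z in (-1.0, 0.0, 1.5):
        print(z, scale_z_to_rate(z, mean=10.0, std=4.0))
```

Under this reading, a screen with a low expected mean and a large standard deviation can yield negative scaled values for strongly negative z-factors, so the result is best treated as a relative ranking aid.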

Returns the request as a dataframe with features and predictions.

Example: predict(["TGTCA"], pbs = "CAGACTGAGCACG", ha = "TGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCA", spacer = "GGCCCAGACTGAGCACGTGA", mmr = 0)
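The rule stated above, that the target site comes either from a fasta file or from pbs, ha and spacer given together, can be checked before calling `predict`. The helper below is a hypothetical sketch and not part of the package. In particular, giving fasta precedence when both kinds of input are supplied is an assumption, and the package may behave differently.

```python
def target_source(fasta=None, pbs=None, ha=None, spacer=None):
    """Return "fasta" or "components" depending on how the target site is given.

    Raises ValueError if neither a fasta file nor the full set of
    pbs, ha and spacer is provided.
    """
    if fasta is not None:
        return "fasta"  # assumption: fasta takes precedence if both are given
    given = (("pbs", pbs), ("ha", ha), ("spacer", spacer))
    missing = [name for name, value in given if value is None]
    if missing:
        raise ValueError(
            "provide fasta, or all of pbs, ha and spacer; missing: "
            + ", ".join(missing)
        )
    return "components"


if __name__ == "__main__":
    print(target_source(
        pbs="CAGACTGAGCACG",
        ha="TGATGGCAGAGGAAAGGAAGCCCTGCTTCCTCCA",
        spacer="GGCCCAGACTGAGCACGTGA",
    ))
```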

Reference

Prediction of prime editing insertion efficiencies using sequence features and DNA repair determinants
Jonas Koeppel, Juliane Weller, Elin Madli Peets, Ananth Pallaseni, Ivan Kuzmin, Uku Raudvere, Hedi Peterson, Fabio Giuseppe Liberante & Leopold Parts
Nat Biotechnol (2023)
doi (bioRxiv preprint): https://doi.org/10.1101/2021.11.10.468024

Owner

Name: Juliane Weller
Login: julianeweller
Kind: user
Location: Cambridge, UK
Company: Wellcome Sanger Institute

Twitter: JulianeWeller
Repositories: 5
Profile: https://github.com/julianeweller

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Weller
    given-names: Juliane
    orcid: https://orcid.org/0000-0002-1310-6168
  - family-names: Pallaseni
    given-names: Ananth
    orcid: https://orcid.org/0000-0002-4840-195X
  - family-names: Koeppel
    given-names: Jonas
    orcid: https://orcid.org/0000-0003-1306-3994
  - family-names: Peets
    given-names: Elin Madli
  - family-names: Kuzmin
    given-names: Ivan
  - family-names: Raudvere
    given-names: Uku
  - family-names: Peterson
    given-names: Hedi
    orcid: https://orcid.org/0000-0001-9951-5116
  - family-names: Liberante
    given-names: Fabio
    orcid: https://orcid.org/0000-0002-0192-5385
  - family-names: Parts
    given-names: Leopold
    orcid: https://orcid.org/0000-0002-2618-670X
title: "MinsePIE: Modelling insertion efficiency for Prime Insertion Experiments"
version: 3.0
doi: 10.5281/zenodo.7505833
date-released: 2022-12-03

Dependencies

setup.py pypi

biopython >=1.79
datetime *
more-itertools >=8.12
more_itertools >=8.12.0
numpy *
pandarallel ==1.5.4
pandas >=1.3
psutil ==5.9.0
regex >=2021.8
scikit-learn >=0.24
scipy >=1.5
viennarna >=2.5.0a1
xgboost ==1.5.0
