preselantest_pipeline

Python 3 pipeline to predict T-cell epitopes or parse B-cell BepiPred results, filter epitopes, and perform a multiple-model antigenicity test.

https://github.com/yascoma/preselantest_pipeline

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.8%) to scientific vocabulary

Last synced: 7 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: YasCoMa
License: mit
Language: Python
Default Branch: master
Size: 66.7 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License Code of conduct Citation

PreSelAnTest - Epitope prediction, selection and multiple model based antigenicity test

Summary

We have developed a pipeline for epitope prediction, simple curation, and antigenicity prediction using multiple models. It provides the following functions:
(1) Prediction of T-cell epitopes with netmhcpanII for the desired HLA alleles. The input FASTA is filtered to keep only the amino acid sequences that meet the tool's specifications, and the results are parsed to organize the input for the other steps of the pipeline. Alternatively, the pipeline parses FASTA files already predicted by the B-cell bepipred3 predictor for linear epitopes.
(2) Curation of these epitopes according to the percentile rank of binding affinity (T-cell only), promiscuity across MHC alleles (T-cell only), overlap with IEDB epitopes (T-cell only), overlap with bacterial protein sequences in the Protegen database, and overlap with human proteins.
(3) Antigenicity prediction for these epitopes according to the paprec workflow.

Requirements:

Python packages needed:
- pip3 install numpy
- pip3 install scikit-learn==1.2.2
- pip3 install pandas
- pip3 install matplotlib
- pip3 install statistics
- pip3 install python-Levenshtein
- pip3 install boruta
- pip3 install joblib

Usage Instructions

Preparation:

git clone https://github.com/YasCoMa/preselantest_pipeline.git
cd preselantest_pipeline
pip3 install -r requirements.txt
Uncompress trained_models.tar.xz (the provided pickle models were generated with Python 3.9 and scikit-learn 1.2.2)
The main input file is a configuration file in JSON. There is an example for B-cell (config_carb.json) and one for T-cell (config.json). The second command-line argument of each step script selects the sub-step to run in (1) and (2). Only prediction_s1.py takes a third command-line argument, the path to the netmhcpanII tool, which is required only for T-cell epitope prediction.
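The uncompress step can also be done from Python. The following is a minimal sketch, assuming the archive sits in the repository root; the helper names `extract_models` and `python_matches` are hypothetical and not part of the pipeline. It uses only the standard library `tarfile` module and rejects archive members that would be written outside the destination folder.

```python
import sys
import tarfile
from pathlib import Path


def extract_models(archive, dest="."):
    """Extract an xz-compressed tar archive into dest and return its member names."""
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, mode="r:xz") as tar:
        names = tar.getnames()
        for name in names:
            # Refuse members that would land outside the destination folder.
            target = (dest_path / name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        tar.extractall(dest_path)
    return names


def python_matches(expected=(3, 9)):
    """Return True when the running interpreter has the expected major.minor version."""
    return sys.version_info[:2] == tuple(expected)
```

A call such as `extract_models("trained_models.tar.xz")` unpacks the models in place. Since the models were pickled on Python 3.9 with scikit-learn 1.2.2, checking `python_matches()` first may help explain loading errors on other versions.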

(1) Run Prediction of epitopes or parsing of prediction results:

Configuration variables:
- folder_in: path to the working directory
- cell_type: cell type for the epitopes (t-cell or b-cell), value is b or t
- hlas: it is used only for t-cell, list of mhc alleles
- bepipred3_output: it is used only for b-cell, bepipred3 fasta output (must be in the working directory)
Run all:
- python3 prediction_s1.py config_carb.json 0
Run only prediction:
- python3 prediction_s1.py config.json 1 /path/to/netmhcpanII
Run bepipred3 parsing:
- python3 prediction_s1.py config_carb.json 1
Run t-cell results parsing:
- python3 prediction_s1.py config.json 2 /path/to/netmhcpanII
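As an illustration of the step (1) configuration variables, the sketch below builds a configuration dictionary and serializes it to JSON. The key names come from the list above; the helper names, the allele identifiers, and the file names used in the usage note are invented for the example. Whether the pipeline expects `hlas` as a JSON list is an assumption based on the description "list of mhc alleles".

```python
import json


def build_config(folder_in, cell_type, hlas=None, bepipred3_output=None):
    """Build a step (1) configuration dictionary for T-cell ('t') or B-cell ('b') runs."""
    if cell_type not in ("t", "b"):
        raise ValueError("cell_type must be 't' or 'b'")
    config = {"folder_in": str(folder_in), "cell_type": cell_type}
    if cell_type == "t":
        if not hlas:
            raise ValueError("hlas is needed for T-cell prediction")
        # Assumption: the alleles are stored as a JSON list of strings.
        config["hlas"] = list(hlas)
    else:
        if not bepipred3_output:
            raise ValueError("bepipred3_output is needed for B-cell parsing")
        # The bepipred3 FASTA output must be located inside folder_in.
        config["bepipred3_output"] = bepipred3_output
    return config


def to_json(config):
    """Serialize the configuration as indented JSON text."""
    return json.dumps(config, indent=4)
```

For example, `to_json(build_config("/data/run1/", "b", bepipred3_output="bepipred_out.fasta"))` returns text that can be saved as a configuration file; the paths here are placeholders.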

(2) Run epitope curation:

Configuration variables:
- folder_in: path to the working directory
- cell_type: cell type for the epitopes (t-cell or b-cell), value is b or t
- threshold_sim_iedb: it is used only for t-cell, threshold of similarity between iedb and predicted epitopes
- threshold_alleles: it is used only for t-cell, threshold of promiscuity of an epitope with affinity predicted for multiple mhc alleles
- threshold_rank: it is used only for t-cell, threshold of percentile rank to filter results table from netmhcpanII
Run selection:
- python3 curation_s2.py config_carb.json 1
Run vaxijen2 summary output table parsing. In the web server, check only Summary Mode, copy the results from the web page, and save them as vaxijen_{epitope|protein}_results.txt in {folder_in}/. This step must be run after the selection process:
- python3 curation_s2.py config_carb.json 2
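To make the roles of threshold_rank and threshold_alleles concrete, here is a minimal sketch of a rank and promiscuity filter. It is not the code in curation_s2.py: the function name, the input layout (peptide, allele, percentile rank), and the comparison directions are assumptions, and the example rows are invented.

```python
from collections import defaultdict


def select_promiscuous_epitopes(predictions, threshold_rank, threshold_alleles):
    """Keep peptides that pass the rank filter for enough distinct alleles.

    predictions: iterable of (peptide, allele, percentile_rank) tuples.
    """
    alleles_by_peptide = defaultdict(set)
    for peptide, allele, rank in predictions:
        # Assumption: a lower percentile rank means a stronger predicted binder.
        if rank <= threshold_rank:
            alleles_by_peptide[peptide].add(allele)
    # Assumption: promiscuity is the number of distinct alleles passing the filter.
    return {
        peptide: sorted(alleles)
        for peptide, alleles in alleles_by_peptide.items()
        if len(alleles) >= threshold_alleles
    }


# Invented example rows, not real predictions.
rows = [
    ("AAAKLLPEV", "DRB1_0101", 1.5),
    ("AAAKLLPEV", "DRB1_0301", 4.0),
    ("AAAKLLPEV", "DRB1_0401", 30.0),
    ("GGGTTSSRR", "DRB1_0101", 2.0),
]
selected = select_promiscuous_epitopes(rows, threshold_rank=10, threshold_alleles=2)
print(selected)  # {'AAAKLLPEV': ['DRB1_0101', 'DRB1_0301']}
```

Filtering by rank before counting alleles means that weak predictions do not contribute to an epitope's promiscuity count.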

(3) Run epitope antigenicity prediction:

Configuration variables:
- folder_in: path to the working directory
- cell_type: cell type for the epitopes (t-cell or b-cell), value is b or t
- control_epitope: optional, use it only if you want to compare the results with vaxijen (if you ran step 2 of curation_s2.py, the value should be vaxijen_epitope_prediction.tsv)
- control_protein: optional, use it only if you want to compare the results with vaxign-ml
- For the control files, you may compare with any other tool, as long as the file is formatted according to the model, using the columns:
  - pepid: identifier for epitope/protein
  - sequence: aa sequence
  - value: original score of the tool
  - class: classification in binary number (1 or 0)
  - class_name: antigen or non-antigen
Run antigenicity prediction:
- python3 antigenicity_prediction_s3.py config_carb.json
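A control file for another tool could be produced along these lines. This is a sketch under stated assumptions: the column names come from the list above, but the tab delimiter, the header row, the column order, and the `cutoff` parameter used to derive the binary class are guesses, and `write_control_rows` is a hypothetical helper.

```python
import csv

CONTROL_COLUMNS = ["pepid", "sequence", "value", "class", "class_name"]


def write_control_rows(rows, handle, cutoff):
    """Write (pepid, sequence, score) rows as a tab-separated control table.

    Assumption: scores at or above cutoff are labelled as antigens.
    """
    writer = csv.DictWriter(
        handle, fieldnames=CONTROL_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for pepid, sequence, value in rows:
        is_antigen = value >= cutoff
        writer.writerow(
            {
                "pepid": pepid,
                "sequence": sequence,
                "value": value,  # original score reported by the tool
                "class": 1 if is_antigen else 0,
                "class_name": "antigen" if is_antigen else "non-antigen",
            }
        )
```

To write a file, pass a handle opened with `open(path, "w", newline="")`; the cutoff should be whatever threshold the compared tool recommends.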

Run all steps

There is an example shell script that executes all steps for the config_carb.json file. Adapt it to your configuration file name and to the steps you want to run with PreSelAnTest:
chmod a+x run_all.sh
bash run_all.sh
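The same sequence can be driven from Python. The sketch below only mirrors the commands documented in steps (1) to (3); it does not reproduce the contents of run_all.sh, which are not shown here. The function names are hypothetical, and passing the netmhcpanII path together with mode 0 for T-cell runs is an assumption.

```python
import subprocess


def build_commands(config, netmhcpan_path=None):
    """Return the documented step commands, in order, for one configuration file."""
    step1 = ["python3", "prediction_s1.py", config, "0"]
    if netmhcpan_path:
        # Assumption: the tool path is the third argument for T-cell runs.
        step1.append(netmhcpan_path)
    return [
        step1,
        ["python3", "curation_s2.py", config, "1"],
        ["python3", "antigenicity_prediction_s3.py", config],
    ]


def run_pipeline(config, netmhcpan_path=None):
    """Run each step in turn and stop at the first failing command."""
    for command in build_commands(config, netmhcpan_path):
        subprocess.run(command, check=True)
```

Calling `run_pipeline("config_carb.json")` from the repository root would run the three steps for the B-cell example; `check=True` raises an exception as soon as one step exits with an error, so later steps never run on incomplete input.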

Reference

Bug Report

Please, use the Issues tab to report any bug.

Owner

Name: Yasmmin Côrtes Martins
Login: YasCoMa
Kind: user
Location: Rio de Janeiro, Brasil

Repositories: 6
Profile: https://github.com/YasCoMa

I am a scientist who likes and works mainly in the following topics: bioinformatics, semantic web, machine learning.

Dependencies

requirements.txt pypi

boruta *
joblib *
matplotlib *
numpy *
pandas *
python-Levenshtein *
sklearn *
statistics *
