preselantest_pipeline
Python3 pipeline to predict t-cell epitopes or parse b-cell bepipred results, filter epitopes and perform multiple-model antigenicity test.
Repository
Basic Info
- Host: GitHub
- Owner: YasCoMa
- License: mit
- Language: Python
- Default Branch: master
- Size: 66.7 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
readme.md
PreSelAnTest - Epitope prediction, selection and multiple model based antigenicity test
Python3 pipeline to predict t-cell epitopes or parse b-cell bepipred results, filter epitopes and perform multiple-model antigenicity test.
Summary
We have developed a pipeline for epitope prediction, simple curation, and antigenicity prediction using multiple models. It provides the following functions: (1) Prediction of T-cell epitopes with netmhcpanII for the desired HLA alleles, filtering the input FASTA to select amino acid sequences according to that tool's specifications; it also parses the results to organize the input for the other steps of the pipeline. Alternatively, it parses the FASTA files already predicted by the B-cell bepipred3 predictor for linear epitopes. (2) Curation of these epitopes according to the rank percentile of binding affinity (T-cell only), promiscuity across MHC alleles (T-cell only), overlap with IEDB epitopes (T-cell only), overlap with bacterial protein sequences in the Protegen database, and overlap with human proteins. (3) Antigenicity prediction for these epitopes according to the paprec workflow.
Requirements:
- Python packages needed:
- pip3 install numpy
- pip3 install scikit-learn==1.2.2
- pip3 install pandas
- pip3 install matplotlib
- pip3 install statistics
- pip3 install python-Levenshtein
- pip3 install boruta
- pip3 install joblib
Usage Instructions
Preparation:
git clone https://github.com/YasCoMa/preselantest_pipeline.git
cd preselantest_pipeline
pip3 install -r requirements.txt
- Uncompress trained_models.tar.xz (the provided pickle models were generated with Python 3.9 and sklearn 1.2.2)
- The main input file is a JSON configuration file; there is an example for b-cell (config_carb.json) and one for t-cell (config.json). The second command-line argument of the scripts in (1) and (2) selects the step to run. Only prediction_s1.py takes a third command-line argument, the path to the netmhcpanII tool, which is required only for t-cell epitope prediction.
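Because the provided pickle models were generated with sklearn 1.2.2, it can help to confirm the installed version before running the pipeline. The following is a minimal sketch, not part of the repository: the helper names `same_release` and `installed_sklearn_version` are hypothetical, and the comparison simply checks the first three version components.

```python
from importlib import metadata

# Version stated in the README for the provided pickle models.
EXPECTED_SKLEARN = "1.2.2"


def same_release(installed, expected=EXPECTED_SKLEARN):
    """Return True when the first three version components agree."""
    def key(version):
        return tuple(version.split(".")[:3])
    return key(installed) == key(expected)


def installed_sklearn_version():
    """Return the installed scikit-learn version, or None if it is missing."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sklearn_version()
    if found is None:
        print("scikit-learn is not installed")
    elif not same_release(found):
        print(f"scikit-learn {found} found; models were pickled with {EXPECTED_SKLEARN}")
    else:
        print("scikit-learn version matches the pickled models")
```

A mismatch does not necessarily mean the models will fail to load, but scikit-learn warns when a pickle is loaded with a different version than the one that produced it.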
(1) Run Prediction of epitopes or parsing of prediction results:
- Configuration variables:
- folder_in: path to the working directory
- cell_type: cell type for the epitopes (t-cell or b-cell), value is b or t
- hlas: it is used only for t-cell, list of mhc alleles
- bepipred3_output: it is used only for b-cell, bepipred3 fasta output (must be in the working directory)
- Run all:
python3 prediction_s1.py config_carb.json 0
- Run only prediction:
python3 prediction_s1.py config.json 1 /path/to/netmhcpanII
- Run bepipred3 parsing:
python3 prediction_s1.py config_carb.json 1
- Run t-cell results parsing:
python3 prediction_s1.py config.json 2 /path/to/netmhcpanII
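As an illustration of the step (1) configuration variables, the sketch below assembles a configuration dictionary and serializes it to JSON. It is an assumption-laden example rather than the repository's own code: the helper `build_config` is hypothetical, `hlas` is assumed to be a JSON array, and the paths, allele names, and file names in the usage part are placeholders. The example files config.json and config_carb.json in the repository remain the authoritative reference for the expected layout.

```python
import json


def build_config(folder_in, cell_type, hlas=None, bepipred3_output=None):
    """Assemble a step (1) configuration dict with the documented keys."""
    if cell_type not in ("t", "b"):
        raise ValueError("cell_type must be 't' or 'b'")
    config = {"folder_in": str(folder_in), "cell_type": cell_type}
    if cell_type == "t":
        # hlas is only used for t-cell runs (assumed here to be a JSON array).
        if not hlas:
            raise ValueError("t-cell runs need a list of MHC alleles in 'hlas'")
        config["hlas"] = list(hlas)
    else:
        # bepipred3_output is only used for b-cell runs.
        if not bepipred3_output:
            raise ValueError("b-cell runs need 'bepipred3_output'")
        config["bepipred3_output"] = bepipred3_output
    return config


if __name__ == "__main__":
    # Placeholder values: replace with your own working directory and inputs.
    t_cell = build_config("/path/to/workdir", "t", hlas=["ALLELE_1", "ALLELE_2"])
    b_cell = build_config("/path/to/workdir", "b", bepipred3_output="bepipred3_out.fasta")
    print(json.dumps(t_cell, indent=4))
    print(json.dumps(b_cell, indent=4))
```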
(2) Run epitope curation:
- Configuration variables:
- folder_in: path to the working directory
- cell_type: cell type for the epitopes (t-cell or b-cell), value is b or t
- threshold_sim_iedb: it is used only for t-cell, threshold of similarity between iedb and prediction epitopes
- threshold_alleles: it is used only for t-cell, threshold of promiscuity of an epitope with affinity predicted for multiple mhc alleles
- threshold_rank: it is used only for t-cell, threshold of percentile rank to filter results table from netmhcpanII
- Run selection:
python3 curation_s2.py config_carb.json 1
- Run vaxijen2 summary output table parsing (in the web server, check only Summary Mode, copy the results from the web page and save them as vaxijen_{epitope|protein}_results.txt in {folder_in}/); this step must be run after the selection process:
python3 curation_s2.py config_carb.json 2
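To make the roles of threshold_rank and threshold_alleles concrete, here is a minimal sketch of a rank-and-promiscuity filter. It is not the code in curation_s2.py: the function `select_promiscuous` and the tuple layout are hypothetical, and it assumes that a lower percentile rank means a stronger predicted binder, that rows are kept when the rank is at or below threshold_rank, and that an epitope passes when it is kept for at least threshold_alleles distinct alleles.

```python
from collections import defaultdict


def select_promiscuous(rows, threshold_rank, threshold_alleles):
    """Return {peptide: sorted alleles} for peptides passing both thresholds.

    rows: iterable of (peptide, allele, percentile_rank) tuples.
    Assumption: rows are kept when percentile_rank <= threshold_rank.
    """
    alleles_by_peptide = defaultdict(set)
    for peptide, allele, rank in rows:
        if rank <= threshold_rank:
            alleles_by_peptide[peptide].add(allele)
    # Keep only peptides predicted to bind enough distinct alleles.
    return {
        peptide: sorted(alleles)
        for peptide, alleles in alleles_by_peptide.items()
        if len(alleles) >= threshold_alleles
    }


if __name__ == "__main__":
    # Made-up rows for illustration only.
    example = [
        ("PEPTIDEAAAAAAAA", "ALLELE_1", 0.5),
        ("PEPTIDEAAAAAAAA", "ALLELE_2", 1.8),
        ("PEPTIDECCCCCCCC", "ALLELE_1", 0.9),
        ("PEPTIDECCCCCCCC", "ALLELE_2", 7.0),
    ]
    print(select_promiscuous(example, threshold_rank=2.0, threshold_alleles=2))
```

With these made-up rows, only the first peptide is retained, since the second one passes the rank threshold for a single allele.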
(3) Run epitope antigenicity prediction:
- Configuration variables:
- folder_in: path to the working directory
- cell_type: cell type for the epitopes (t-cell or b-cell), value is b or t
- control_epitope: it is not mandatory, only use it if you want to compare the results with vaxijen (if you ran step 2 of curation_s2.py, the value should be vaxijen_epitope_prediction.tsv)
- control_protein: it is not mandatory, only use if you want to compare the results with vaxign-ml
- For the control files, you may compare with any other tool, you just have to be sure to format according to the model, using the columns:
- pepid: identifier for epitope/protein
- sequence: aa sequence
- value: original score of the tool
- class: classification in binary number (1 or 0)
- class_name: antigen or non-antigen
- Run antigenicity prediction:
python3 antigenicity_prediction_s3.py config_carb.json
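If you want to use another tool as a control, its output has to be converted to the five columns listed above. The sketch below shows one way to do that; it is an illustration under assumptions, not the repository's converter. The tab delimiter and header row are guesses based on the .tsv extension, the column order follows the list above, and the score cutoff (and the rule that scores at or above it count as antigen) is a hypothetical parameter you must set according to the tool you are comparing with.

```python
import csv
import io

COLUMNS = ["pepid", "sequence", "value", "class", "class_name"]


def to_control_rows(predictions, cutoff):
    """Yield one dict per (pepid, sequence, score) prediction.

    Assumption: a score >= cutoff is labelled as antigen (class 1).
    """
    for pepid, sequence, score in predictions:
        is_antigen = score >= cutoff
        yield {
            "pepid": pepid,
            "sequence": sequence,
            "value": score,
            "class": 1 if is_antigen else 0,
            "class_name": "antigen" if is_antigen else "non-antigen",
        }


def write_control(predictions, cutoff, handle):
    """Write the control table as tab-separated text to an open file handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(to_control_rows(predictions, cutoff))


if __name__ == "__main__":
    # Made-up identifiers, sequences and scores for illustration only.
    buffer = io.StringIO()
    write_control([("ep1", "ACDEFGHIK", 0.7), ("ep2", "LMNPQRSTV", 0.1)], 0.5, buffer)
    print(buffer.getvalue())
```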
Run all steps
- There is an example of shell file to execute all steps for the config_carb.json file, adapt it according to your configuration file name and the steps you want to run with PreSelAnTest
chmod a+x run_all.sh
bash run_all.sh
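For readers who prefer to drive the steps from Python, the sketch below chains the three scripts using the commands documented in the sections above. It does not reproduce run_all.sh, whose contents are not shown here; the function names `build_commands` and `run_all` are hypothetical, and the choice of step 0 for prediction and step 1 for curation is an assumption about which steps you want.

```python
import subprocess
import sys


def build_commands(config, tool_path=None, python=sys.executable):
    """Return the argv lists for the three pipeline scripts, in order."""
    prediction = [python, "prediction_s1.py", config, "0"]
    if tool_path:
        # Third argument: netmhcpanII path, only needed for t-cell prediction.
        prediction.append(tool_path)
    return [
        prediction,
        [python, "curation_s2.py", config, "1"],
        [python, "antigenicity_prediction_s3.py", config],
    ]


def run_all(config, tool_path=None):
    """Run each step in turn, stopping at the first one that fails."""
    for command in build_commands(config, tool_path):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all() to execute.
    for command in build_commands("config_carb.json"):
        print(" ".join(command))
```

Using `check=True` makes a failing step raise `subprocess.CalledProcessError`, so later steps do not run on incomplete output.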
Reference
Bug Report
Please, use the Issues tab to report any bug.
Owner
- Name: Yasmmin Côrtes Martins
- Login: YasCoMa
- Kind: user
- Location: Rio de Janeiro, Brasil
- Repositories: 6
- Profile: https://github.com/YasCoMa
I am a scientist who likes and works mainly in the following topics: bioinformatics, semantic web, machine learning.
Dependencies
- boruta *
- joblib *
- matplotlib *
- numpy *
- pandas *
- python-Levenshtein *
- sklearn *
- statistics *