paprec_pipeline
Python pipeline to prepare epitopes and protein sequence datasets, extract numerical features from sequence with alignment-free methods, perform model evaluation and test model performance upon feature selection.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: YasCoMa
- License: mit
- Language: Python
- Default Branch: main
- Size: 31.1 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
readme.md
PAPreC - Pipeline for Antigenicity Predictor Comparison
Python pipeline to prepare epitopes and protein sequence datasets, extract numerical features from sequence with alignment-free methods, perform model evaluation and test model performance upon feature selection.
Summary
We have developed a pipeline for comparing models used in antigenicity prediction. It runs a range of experiment configurations that systematically vary three key parameters: (1) the source dataset (Bcipep, HLA and Protegen (Yang et al. 2011)); (2) the alignment-free method used to generate numerical features; and (3) the classifier, chosen from nine distinct classifiers.
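As a rough illustration of the configuration space described above, the sketch below enumerates every (dataset, method, classifier) combination and computes amino acid composition, one common alignment-free descriptor. It is a minimal sketch under stated assumptions, not the pipeline's own code: the dataset labels are taken from the summary, while `amino_acid_composition`, `experiment_grid`, and the method and classifier labels are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each of the 20 standard residues."""
    sequence = sequence.upper()
    # Non-standard symbols (e.g. X, gaps) are ignored in this sketch.
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[residue] / total for residue in AMINO_ACIDS]


# Dataset labels follow the summary; method and classifier labels are
# placeholders, not the identifiers used by the actual pipeline.
DATASETS = ["bcipep", "hla", "protegen"]
METHODS = ["aac", "method_b"]
CLASSIFIERS = ["classifier_1", "classifier_2", "classifier_3"]


def experiment_grid(datasets, methods, classifiers):
    """Enumerate every (dataset, method, classifier) combination."""
    return [
        {"dataset": d, "method": m, "classifier": c}
        for d, m, c in product(datasets, methods, classifiers)
    ]


if __name__ == "__main__":
    grid = experiment_grid(DATASETS, METHODS, CLASSIFIERS)
    print(len(grid), "configurations")
    print(amino_acid_composition("MKTAYIAK"))
```

Keeping the grid as a list of plain dictionaries makes each configuration easy to log next to its evaluation metrics.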
Requirements:
- Python packages needed:
  - pip3 install numpy
  - pip3 install scikit-learn
  - pip3 install pandas
  - pip3 install matplotlib
  - pip3 install statistics
  - pip3 install boruta
  - pip3 install joblib
- Or run: conda env create --file paprec_env.yml
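Before running the pipeline, it can help to confirm that the packages listed above are importable. The following is a small sketch using only the standard library; the `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the repository. Note that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Maps pip distribution names (from the list above) to importable module names.
# Only scikit-learn differs: it is imported as "sklearn".
REQUIRED = {
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "statistics": "statistics",
    "boruta": "boruta",
    "joblib": "joblib",
}


def missing_packages(required=None):
    """Return the sorted pip names whose modules cannot be found."""
    required = REQUIRED if required is None else required
    return sorted(
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```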
Usage Instructions
Preparation:
git clone https://github.com/YasCoMa/paprec_pipeline.git
cd paprec_pipeline
pip3 install -r requirements.txt
Run Screening:
python3 multiple_method_dataset.py
- Compare the results obtained with those found in our article:
- Bcipep dataset: https://www.dropbox.com/s/8ezeup4xiwb9p7n/bcipep_dataset.zip?dl=0
- HLA dataset: https://www.dropbox.com/s/6vpfgvmsz9vd5r0/hla_dataset.zip?dl=0
- Gram+ dataset: https://www.dropbox.com/s/l5wqpcsp4qc6ret/gram%2B_dataset.zip?dl=0
- Gram- dataset: https://www.dropbox.com/s/cvzrhlselxj9sp5/gram-_dataset.zip?dl=0
Run Comparison in Gram-positive and Gram-negative bacteria (Optional):
- Download and uncompress the following folder: https://www.dropbox.com/s/27nnwhh1spl2038/gram_comparison.zip?dl=0
python3 comparison_gram.py
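The uncompress step above could also be scripted. The sketch below assumes the archive was downloaded manually and saved as gram_comparison.zip in the current directory; the `extract_archive` helper and the destination folder are assumptions, since the README does not state where comparison_gram.py expects the data.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, destination="."):
    """Extract a zip archive and return the sorted list of member names."""
    archive_path = Path(archive_path)
    # is_zipfile returns False for missing or corrupted files.
    if not zipfile.is_zipfile(archive_path):
        raise ValueError(f"{archive_path} is not a valid zip archive")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(destination)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Assumed file name and location; adjust to where the download was saved.
    for name in extract_archive("gram_comparison.zip"):
        print(name)
```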
Reference
Bug Report
Please use the Issues tab to report any bugs.
Owner
- Name: Yasmmin Côrtes Martins
- Login: YasCoMa
- Kind: user
- Location: Rio de Janeiro, Brasil
- Repositories: 6
- Profile: https://github.com/YasCoMa
I am a scientist who enjoys and works mainly on the following topics: bioinformatics, semantic web, and machine learning.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "PAPC - Pipeline for Antigenicity Predictor Comparison"
version: 1.0.0
date-released: 2023-07-12
url: "https://github.com/YasCoMa/papc_pipeline"
authors:
  - family-names: "Martins"
    given-names: "Yasmmin"
    orcid: "https://orcid.org/0000-0002-6830-1948"
Dependencies
- boruta *
- joblib *
- matplotlib *
- numpy *
- pandas *
- sklearn *
- statistics *