paprec_pipeline

Python pipeline to prepare epitope and protein sequence datasets, extract numerical features from sequences with alignment-free methods, evaluate models, and test model performance after feature selection.

https://github.com/yascoma/paprec_pipeline


Repository

Basic Info

Host: GitHub
Owner: YasCoMa
License: mit
Language: Python
Default Branch: main
Size: 31.1 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 3 years ago · Last pushed almost 2 years ago

readme.md

PAPreC - Pipeline for Antigenicity Predictor Comparison

Summary

We have developed a comprehensive pipeline for comparing models used in antigenicity prediction. The pipeline runs a range of experiment configurations that systematically vary the following key parameters: (1) the source dataset, one of Bcipep, HLA, or Protegen (Yang et al. 2011); (2) the alignment-free method used to generate numerical features; and (3) the classifier, chosen from nine distinct classifiers.
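To make the idea of an experiment configuration concrete, here is a minimal sketch of how such a grid could be enumerated, together with one simple alignment-free descriptor (amino acid composition). It is an illustration only, not the pipeline's own code: the function names, the method labels, and the classifier placeholders are assumptions, and amino acid composition is just one example of an alignment-free method, not necessarily one the pipeline uses.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Illustrative alignment-free descriptor: a fixed-length vector of
    20 values computed without aligning the sequence to anything.
    """
    sequence = sequence.upper()
    counts = Counter(res for res in sequence if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def experiment_grid(datasets, methods, classifiers):
    """Enumerate every (dataset, method, classifier) combination."""
    return list(product(datasets, methods, classifiers))


# Dataset names come from the summary above; the method labels and the
# nine classifier names are hypothetical placeholders.
datasets = ["bcipep", "hla", "protegen"]
methods = ["aac", "other_method"]
classifiers = ["classifier_%d" % i for i in range(1, 10)]

grid = experiment_grid(datasets, methods, classifiers)
print(len(grid), "configurations")
print(amino_acid_composition("ACCA")[:2])  # fractions of A and C
```

Each tuple in `grid` corresponds to one run: load the dataset, compute features with the chosen method, then train and evaluate the chosen classifier.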

Requirements:

Python packages needed:
- pip3 install numpy
- pip3 install scikit-learn
- pip3 install pandas
- pip3 install matplotlib
- pip3 install statistics
- pip3 install boruta
- pip3 install joblib
Or run: conda env create --file paprec_env.yml
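After installing, it can be useful to confirm that every package is importable before launching a long run. The snippet below is a small optional sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of the repository. Note that `scikit-learn` is imported as `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> importable module name
REQUIRED = {
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "statistics": "statistics",
    "boruta": "boruta",
    "joblib": "joblib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    )


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```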

Usage Instructions

Preparation:

git clone https://github.com/YasCoMa/paprec_pipeline.git
cd paprec_pipeline
pip3 install -r requirements.txt

Run Screening:

python3 multiple_method_dataset.py
Compare the results you obtain with those reported in our article:
- Bcipep dataset: https://www.dropbox.com/s/8ezeup4xiwb9p7n/bcipep_dataset.zip?dl=0
- HLA dataset: https://www.dropbox.com/s/6vpfgvmsz9vd5r0/hla_dataset.zip?dl=0
- Gram+ dataset: https://www.dropbox.com/s/l5wqpcsp4qc6ret/gram%2B_dataset.zip?dl=0
- Gram- dataset: https://www.dropbox.com/s/cvzrhlselxj9sp5/gram-_dataset.zip?dl=0

Run Comparison in Gram-positive and Gram-negative bacteria (Optional):

Download and uncompress the following folder: https://www.dropbox.com/s/27nnwhh1spl2038/gram_comparison.zip?dl=0
python3 comparison_gram.py
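The download and uncompress step can also be scripted. The following is a rough sketch, assuming the Dropbox share link serves the file directly when its `dl` query parameter is set to `1`; the helper names `direct_download_url` and `fetch_and_extract` are hypothetical, and the directory in which `comparison_gram.py` expects the data is not stated here, so the destination is left to the caller.

```python
import zipfile
from pathlib import Path
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlretrieve


def direct_download_url(share_url):
    """Rewrite a share link so that its `dl` query parameter equals 1."""
    parts = urlsplit(share_url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_and_extract(share_url, dest_dir="."):
    """Download the zip archive behind `share_url` and unpack it into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / Path(urlsplit(share_url).path).name
    urlretrieve(direct_download_url(share_url), str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


# Example call (requires network access), using the link given above:
# fetch_and_extract(
#     "https://www.dropbox.com/s/27nnwhh1spl2038/gram_comparison.zip?dl=0"
# )
```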

Reference

Bug Report

Please use the Issues tab to report any bugs.

Owner

Name: Yasmmin Côrtes Martins
Login: YasCoMa
Kind: user
Location: Rio de Janeiro, Brasil

Repositories: 6
Profile: https://github.com/YasCoMa

I am a scientist who enjoys and works mainly on the following topics: bioinformatics, semantic web, and machine learning.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "PAPC - Pipeline for Antigenicity Predictor Comparison"
version: 1.0.0
date-released: 2023-07-12
url: "https://github.com/YasCoMa/papc_pipeline"
authors:
- family-names: "Martins"
  given-names: "Yasmmin"
  orcid: "https://orcid.org/0000-0002-6830-1948"

Dependencies

requirements.txt pypi

boruta *
joblib *
matplotlib *
numpy *
pandas *
sklearn *
statistics *
