paprec_pipeline

Python pipeline to prepare epitope and protein sequence datasets, extract numerical features from sequences with alignment-free methods, evaluate models, and test model performance after feature selection.

https://github.com/yascoma/paprec_pipeline

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: YasCoMa
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 31.1 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Code of conduct Citation

readme.md

PAPreC - Pipeline for Antigenicity Predictor Comparison

Python pipeline to prepare epitope and protein sequence datasets, extract numerical features from sequences with alignment-free methods, evaluate models, and test model performance after feature selection.
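
To make "alignment-free numerical features" concrete, here is a minimal sketch of one common alignment-free descriptor, k-mer (amino acid) composition. It is an illustration only: the function name `kmer_composition` is hypothetical, and this README does not state which alignment-free methods the pipeline actually implements.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so every sequence
# maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=1):
    """Return relative k-mer frequencies as a fixed-length list.

    Hypothetical helper for illustration; not part of the pipeline's API.
    k-mers containing non-standard residues are ignored.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


features = kmer_composition("ACDA")
print(len(features), features[0])
```

Because no alignment is needed, sequences of different lengths all map to vectors of the same size (20 values for k=1, 400 for k=2), which can be passed directly to a classifier.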

Summary

We have developed a pipeline for comparing models used in antigenicity prediction. It runs a range of experiment configurations that systematically vary three key parameters: (1) the source dataset (Bcipep, HLA, or Protegen (Yang et al. 2011)); (2) the alignment-free method used to generate numerical features; and (3) the classifier, chosen from nine distinct options.
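
The systematic variation described above amounts to enumerating every combination of dataset, feature method, and classifier. A minimal sketch follows; the dataset names come from this summary, while the method and classifier names are placeholders (assumptions), since this README does not list them.

```python
from itertools import product

# Dataset names are taken from the summary above.
datasets = ["Bcipep", "HLA", "Protegen"]
# Placeholder names: the actual alignment-free methods and the nine
# classifiers are not listed in this README.
methods = ["method_a", "method_b"]
classifiers = [f"classifier_{i}" for i in range(1, 10)]


def build_experiment_grid(datasets, methods, classifiers):
    """Return one configuration dict per (dataset, method, classifier)."""
    return [
        {"dataset": d, "method": m, "classifier": c}
        for d, m, c in product(datasets, methods, classifiers)
    ]


grid = build_experiment_grid(datasets, methods, classifiers)
print(len(grid), grid[0])
```

Each configuration can then be evaluated independently, which makes the results comparable across datasets and methods.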

Requirements:

  • Python packages needed:
    • pip3 install numpy
    • pip3 install scikit-learn
    • pip3 install pandas
    • pip3 install matplotlib
    • pip3 install statistics
    • pip3 install boruta
    • pip3 install joblib
  • Or run: conda env create --file paprec_env.yml
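
After installing, the package list above can be checked from Python before running the pipeline. This is a small sketch, not part of the repository; the helper name `missing_packages` is hypothetical. Note that `scikit-learn` is imported as `sklearn`.

```python
from importlib.util import find_spec

# Maps the pip package name to the module name used in "import ...".
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "statistics": "statistics",
    "boruta": "boruta",
    "joblib": "joblib",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the sorted pip names whose modules cannot be found."""
    return sorted(pkg for pkg, mod in modules.items() if find_spec(mod) is None)


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```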

Usage Instructions

Preparation:

  1. git clone https://github.com/YasCoMa/paprec_pipeline.git
  2. cd paprec_pipeline
  3. pip3 install -r requirements.txt

Run Screening:

  1. python3 multiple_method_dataset.py
  2. Compare the results obtained with those reported in our article:
    • Bcipep dataset: https://www.dropbox.com/s/8ezeup4xiwb9p7n/bcipep_dataset.zip?dl=0
    • HLA dataset: https://www.dropbox.com/s/6vpfgvmsz9vd5r0/hla_dataset.zip?dl=0
    • Gram+ dataset: https://www.dropbox.com/s/l5wqpcsp4qc6ret/gram%2B_dataset.zip?dl=0
    • Gram- dataset: https://www.dropbox.com/s/cvzrhlselxj9sp5/gram-_dataset.zip?dl=0

Run Comparison in Gram-positive and Gram-negative bacteria (Optional):

  1. Download and uncompress the following folder: https://www.dropbox.com/s/27nnwhh1spl2038/gram_comparison.zip?dl=0
  2. python3 comparison_gram.py

Reference

Bug Report

Please use the Issues tab to report any bugs.

Owner

  • Name: Yasmmin Côrtes Martins
  • Login: YasCoMa
  • Kind: user
  • Location: Rio de Janeiro, Brasil

I am a scientist who enjoys and mainly works on the following topics: bioinformatics, semantic web, and machine learning.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "PAPC - Pipeline for Antigenicity Predictor Comparison"
version: 1.0.0
date-released: 2023-07-12
url: "https://github.com/YasCoMa/papc_pipeline"
authors:
- family-names: "Martins"
  given-names: "Yasmmin"
  orcid: "https://orcid.org/0000-0002-6830-1948"

Dependencies

requirements.txt pypi
  • boruta *
  • joblib *
  • matplotlib *
  • numpy *
  • pandas *
  • sklearn *
  • statistics *