https://github.com/bzhanglab/pepquerymhc

PepQueryMHC: Rapid and Comprehensive Tumor Antigen Prioritization from Immunopeptidomics Data

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary

Keywords

immunopeptidomics proteogenomics
Last synced: 6 months ago · JSON representation

Repository

PepQueryMHC: Rapid and Comprehensive Tumor Antigen Prioritization from Immunopeptidomics Data

Basic Info
  • Host: GitHub
  • Owner: bzhanglab
  • License: other
  • Language: Java
  • Default Branch: main
  • Homepage:
  • Size: 238 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 6
Topics
immunopeptidomics proteogenomics
Created almost 2 years ago · Last pushed 8 months ago
Metadata Files
Readme License

README.md

PepQueryMHC


About

The accurate prioritization of tumor antigens, including aberrant translational products, is critical for the development of personalized cancer immunotherapies. PepQueryMHC estimates a comprehensive repertoire of local RNA expression of tumor antigens within minutes per sample.

Usage

PepQueryMHC provides four main modes: 1) scan mode, 2) target mode, 3) FASTQ mode, and 4) annotate mode.
When using FASTQ mode, make sure that the FASTQ files do not contain artificial sequences such as adaptors and barcodes.

Quick start

Scan mode

```bash
java -Xmx2G -jar PepQueryMHC.jar \
--mode scan \
--input peptides.tsv \
--bam sample.sorted.bam \
--output sample \
--thread 16
```

Target mode

```bash
java -Xmx2G -jar PepQueryMHC.jar \
--mode target \
--input peptides_locations_strands.tsv \
--bam sample.sorted.bam \
--output sample \
--thread 16
```

FASTQ mode (for single-end)

```bash
java -Xmx2G -jar PepQueryMHC.jar \
--mode fastq \
--input peptides.tsv \
--0 sample.trimmed.fastq.gz \
--output sample \
--strand f \
--thread 16
```

FASTQ mode (for paired-end)

```bash
java -Xmx2G -jar PepQueryMHC.jar \
--mode fastq \
--input peptides.tsv \
--1 sample.trimmed.fastq.1.gz \
--2 sample.trimmed.fastq.2.gz \
--output sample \
--strand rf \
--thread 16
```

Annotate mode

```bash
java -Xmx2G -jar PepQueryMHC.jar \
--mode annotate \
--input locations_strands.tsv \
--gtf reference_annotation.gtf \
--output sample
```
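When many samples share one peptide list, the scan-mode invocation above can be assembled programmatically. The following is a minimal Python sketch and is not part of PepQueryMHC: the helper names `build_scan_command` and `run_scan` are hypothetical, the file names are placeholders, and only options shown in the quick start are used.

```python
import subprocess
from pathlib import Path


def build_scan_command(jar, peptides, bam, output, threads=16, max_heap="2G"):
    """Return the quick-start scan-mode command as an argument list."""
    return [
        "java", f"-Xmx{max_heap}", "-jar", str(jar),
        "--mode", "scan",
        "--input", str(peptides),
        "--bam", str(bam),
        "--output", str(output),
        "--thread", str(threads),
    ]


def run_scan(jar, peptides, bam_files, threads=16):
    """Run scan mode once per BAM file.

    Assumption: the output base name is taken from the BAM file name,
    e.g. "sample.sorted.bam" -> "sample".
    """
    for bam in bam_files:
        base = Path(bam).name.split(".")[0]
        command = build_scan_command(jar, peptides, bam, base, threads)
        subprocess.run(command, check=True)  # raises if java exits non-zero
```

Passing the arguments as a list avoids shell quoting problems with unusual file names; `run_scan("PepQueryMHC.jar", "peptides.tsv", ["a.sorted.bam", "b.sorted.bam"])` would process the two samples one after the other.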

Parameters

Y+: mandatory, Y: optional, N: not applicable

| Option | Description | Type | Default | Scan mode | Target mode | FASTQ mode | Annotate mode |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| m/mode | mode to use | scan\|target\|fastq\|annotate | | Y+ | Y+ | Y+ | Y+ |
| i/input | input file path | string | | Y+ | Y+ | Y+ | Y+ |
| o/output | output base name path | string | | Y+ | Y+ | Y+ | Y+ |
| b/bam | sorted bam/sam file path | bam\|sam | | Y+ | Y+ | N | N |
| 0/fastqsingle | fastq file path | fastq\|fastq.gz | | N | N | Y+ | N |
| 1/fastqpaired1 | fastq file path | fastq\|fastq.gz | | N | N | Y+ | N |
| 2/fastqparied2 | fastq file path | fastq\|fastq.gz | | N | N | Y+ | N |
| g/gtf | gtf file path | string | | N | N | N | Y+ |
| @/thread | the number of threads | int | 4 | Y | Y | Y | N |
| c/count | type of reads being processed | primary\|all | primary | Y | Y | N | N |
| l/libsize | tsv file including library size information | string | | Y | Y | Y | N |
| w/white_list | cell barcode list (tsv), only available for single-cell RNA-seq | string | | Y | Y | Y | N |
| p/prob | ignore regions of interest with error > p | [0,1] | 0.05 | Y | Y | Y | N |
| e/equal | specify isoleucine = leucine | none | | Y | Y | Y | N |
| u/union | specify the unit of the peptide read count | sum\|max | sum | Y | Y | N | N |
| s/strand | specify strandedness. non: non-stranded, fr: fr-second strand, rf: fr-first strand, f: forward strand for single-end, r: reverse strand for single-end, auto: auto-detection. Auto-detection is only available if there is an XS tag in the given bam file | non\|fr\|rf\|f\|r\|auto | auto | Y | Y | Y+ | N |
| s/stretch | output a single line per annotation | none | | N | N | N | Y |
| v/verbose | print every message being processed | none | | Y | Y | Y | Y |
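The `e/equal` option treats isoleucine (I) and leucine (L) as the same residue, a common choice in immunopeptidomics because the two amino acids have the same mass. The sketch below illustrates what that equivalence means when comparing sequences. It is a conceptual example only and does not describe how PepQueryMHC implements the option; `normalize_il` and `peptides_match` are hypothetical helpers.

```python
def normalize_il(peptide: str) -> str:
    """Collapse isoleucine (I) onto leucine (L) so the two compare as equal."""
    return peptide.upper().replace("I", "L")


def peptides_match(a: str, b: str, equal_il: bool = False) -> bool:
    """Compare two peptide sequences, optionally treating I and L as identical."""
    if equal_il:
        return normalize_il(a) == normalize_il(b)
    return a.upper() == b.upper()
```

With `equal_il=True`, `TKMQEPPAIY` and `TKMQEPPALY` are considered the same peptide; without it they are distinct.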

Scan mode

Input format

| Sequence | User-defined column 1 | ... | User-defined column N |
| :---: | :---: | :---: | :---: |
| AACTKLAKKM | any value | ... | any value |
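A peptide table in this layout can be produced from an in-memory list. The sketch below assumes the file is tab-delimited (suggested by the `.tsv` extension in the quick start) and starts with a header row whose first column is `Sequence`; both points are assumptions to check against a working input file. The function name `format_scan_input` and the `Sample` column are hypothetical.

```python
def format_scan_input(peptides, extra_columns=None):
    """Return peptide rows as tab-separated text.

    peptides: list of peptide sequences (first column).
    extra_columns: optional dict mapping a user-defined column name to a
    list of values, one per peptide.
    """
    extra_columns = extra_columns or {}
    lines = ["\t".join(["Sequence", *extra_columns])]  # header row (assumed)
    for i, peptide in enumerate(peptides):
        row = [str(peptide)] + [str(values[i]) for values in extra_columns.values()]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"
```

The returned text can be written with `open("peptides.tsv", "w").write(...)` and passed to `--input`.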

Target mode

Input format

| Sequence | Location | Strand | User-defined column 1 | ... | User-defined column N |
| :---: | :---: | :---: | :---: | :---: | :---: |
| AACTKLAKKM | chr1:1-30 | + | any value | ... | any value |
| TKMQEPPALY | chr1:31-50\|chr1:81-90 | - | any value | ... | any value |
| KEKRKAPPR | . | . | any value | ... | any value |
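The Location column may hold several segments joined by `|`, as in `chr1:31-50|chr1:81-90`. The sketch below shows one way to split such a value into its segments when preparing or checking a target-mode input file. It is a hypothetical helper, not PepQueryMHC code, and it treats `.` as "no location given", which is an interpretation of the third example row rather than documented behavior.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, int, int]  # (chromosome, start, end) as written in the file


def parse_location(location: str) -> Optional[List[Segment]]:
    """Split 'chr1:31-50|chr1:81-90' into [('chr1', 31, 50), ('chr1', 81, 90)]."""
    if location.strip() == ".":
        return None  # assumed to mean that no location is specified
    segments = []
    for part in location.split("|"):
        # rsplit keeps chromosome names that themselves contain ':' intact
        chrom, span = part.strip().rsplit(":", 1)
        start, end = span.split("-")
        segments.append((chrom, int(start), int(end)))
    return segments
```

A malformed value such as `chr1:31` raises `ValueError`, which makes the helper usable as a quick format check before launching a run.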

FASTQ mode

Input format

| Sequence | User-defined column 1 | ... | User-defined column N |
| :---: | :---: | :---: | :---: |
| AACTKLAKKM | any value | ... | any value |

The input format is exactly the same as the one used in scan mode.

Annotate mode

Input format

| Location | Strand | User-defined column 1 | ... | User-defined column N |
| :---: | :---: | :---: | :---: | :---: |
| chr1:1-30 | + | any value | ... | any value |
| chr1:31-50\|chr1:81-90 | - | any value | ... | any value |
| chr1:21-40\|chr1:87-90 | . | any value | ... | any value |

White list

A white list is a set of barcodes selected for inclusion in the analysis of single-cell RNA-seq data.
Input format

| Barcode |
| :---: |
| AAACCTGAGCAATCTC-1 |
| AAACCTGAGCGTTTAC-1 |
| AAACCTGAGCTGCAAG-1 |
| AAACCTGCAAACGTGG-1 |
| AAACCTGCAAACTGCT-1 |
| AAACCTGCAACTGGCC-1 |
| AAACCTGCAAGCCCAC-1 |
| AAACCTGCACTTAAGC-1 |
| AAACCTGCAGCCACCA-1 |
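A white list is often derived from the cells that passed quality control in an upstream single-cell analysis. The sketch below turns a list of selected barcodes into the one-column layout shown above, dropping blanks and duplicates. The `Barcode` header row is assumed from the table, and `format_white_list` is a hypothetical helper.

```python
def format_white_list(barcodes):
    """Return a one-column white list as text, one barcode per line."""
    seen = set()
    lines = ["Barcode"]  # header row (assumed from the input format table)
    for barcode in barcodes:
        barcode = barcode.strip()
        if barcode and barcode not in seen:  # skip blanks and repeats
            seen.add(barcode)
            lines.append(barcode)
    return "\n".join(lines) + "\n"
```

The resulting text can be saved to a file and supplied through the `w/white_list` option.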

Figures for the reviewers

Figures in the paper can be generated using: 1) the R code in the figR folder, 2) Supplementary Tables 1, 2, 3, and 5, and 3) the meta dataset available at https://doi.org/10.5281/zenodo.14984543.

License

All code is available under the Attribution-NonCommercial (CC BY-NC) 4.0 license.

Owner

  • Name: Zhang Lab
  • Login: bzhanglab
  • Kind: organization
  • Location: Houston, TX

Translating omics data into biological insights.

GitHub Events

Total
  • Create event: 5
  • Release event: 4
  • Issues event: 2
  • Watch event: 1
  • Delete event: 1
  • Push event: 35
Last Year
  • Create event: 5
  • Release event: 4
  • Issues event: 2
  • Watch event: 1
  • Delete event: 1
  • Push event: 35

Dependencies

pom.xml maven
  • com.github.samtools:htsjdk 4.1.0
  • commons-cli:commons-cli 1.6.0
  • org.ahocorasick:ahocorasick 0.6.3
  • junit:junit 3.8.1 test