https://github.com/bigbio/pgatk-io

High performance io library for proteogenomics

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

fileformats mass-spectrometry parquet proteogenomics proteomics spark

Repository

Basic Info
  • Host: GitHub
  • Owner: bigbio
  • License: apache-2.0
  • Language: Java
  • Default Branch: master
  • Homepage:
  • Size: 72.4 MB
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created almost 7 years ago · Last pushed over 2 years ago
README.md

pgatk-io

About pgatk-io

The pgatk-io library is a Java framework for manipulating mass spectrometry and proteomics file formats. It has a special focus on novel file formats for proteomics, such as Apache Spark Parquet and JSON.

Support Matrix

This table summarizes the current level of support for each feature across the different file formats. See the Main Features section below for details on the access modes.

| Feature | MGF | APL (Maxquant) | mzXML | mzML | PRIDE Json | Pep Avro |
|----------------------|--------------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| Random Access | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| Fast Iterable Access | :heavy_check_mark: | :white_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
| Gzip Support | :x: | :x: | :x: | :x: | :x: | |
| Numpress Support | :x: | :x: | :white_check_mark: | :white_check_mark: | :x: | |

File formats

  • MGF - http://www.matrixscience.com/help/datafilehelp.html
  • mzXML - http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML#:~:text=mzXML%20is%20an%20open%20data,foundation%20of%20our%20proteomic%20pipelines
  • mzML - https://www.psidev.info/mzML
  • ArchiveSpectrum (PRIDE Json) - http://bigbio.xyz/pgatk-io/io/github/bigbio/pgatk/io/pride/ArchiveSpectrum.html
  • AnnotatedSpectrum (Pep Avro) - http://bigbio.xyz/pgatk-io/io/github/bigbio/pgatk/io/pride/AnnotatedSpectrum.html

License

pgatk-io is licensed under Apache License 2.0.

Main Features

  • All parsers are built on a custom class that parses text files efficiently line by line, so they can handle arbitrarily large files with minimal memory. This allows easy and efficient processing of peak list files in Java.

  • Every implementation provides both a Random Access reader and an Iterable Access reader.

    • With Random Access, developers can retrieve any individual spectrum by its identifier or its index.
    • With Iterable Access, developers step through the spectra one at a time using the next function.
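
The two access modes can be illustrated with a small, self-contained sketch. It does not use the pgatk-io API: `SpectrumAccessSketch`, `BlockIterator`, `indexOffsets` and `readAt` are hypothetical names chosen for illustration, and the code assumes a plain ASCII MGF file in which each spectrum sits between `BEGIN IONS` and `END IONS` lines. The iterator keeps only one spectrum block in memory at a time, while the random access path first records the byte offset of each block and later seeks straight to it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch; none of these names come from pgatk-io. */
public class SpectrumAccessSketch {

    static final String SAMPLE =
        "BEGIN IONS\nTITLE=spectrum 1\nPEPMASS=400.5\n100.0 10.0\nEND IONS\n"
      + "BEGIN IONS\nTITLE=spectrum 2\nPEPMASS=500.5\n200.0 20.0\nEND IONS\n";

    /** Iterable access: holds a single BEGIN IONS ... END IONS block at a time. */
    public static final class BlockIterator implements Iterator<String> {
        private final BufferedReader reader;
        private String nextBlock;

        public BlockIterator(BufferedReader reader) {
            this.reader = reader;
            this.nextBlock = readBlock();
        }

        private String readBlock() {
            try {
                StringBuilder block = null;
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.startsWith("BEGIN IONS")) {
                        block = new StringBuilder();
                    }
                    if (block != null) {
                        block.append(line).append('\n');
                        if (line.startsWith("END IONS")) {
                            return block.toString();
                        }
                    }
                }
                return null; // end of file: no further complete block
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean hasNext() {
            return nextBlock != null;
        }

        @Override
        public String next() {
            if (nextBlock == null) throw new NoSuchElementException();
            String current = nextBlock;
            nextBlock = readBlock();
            return current;
        }
    }

    /** Random access, step 1: one pass that records where each block starts. */
    public static List<Long> indexOffsets(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long lineStart = raf.getFilePointer();
            String line;
            // RandomAccessFile.readLine is unbuffered; acceptable for a sketch.
            while ((line = raf.readLine()) != null) {
                if (line.startsWith("BEGIN IONS")) offsets.add(lineStart);
                lineStart = raf.getFilePointer();
            }
        }
        return offsets;
    }

    /** Random access, step 2: seek to a recorded offset and read one block. */
    public static String readAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            StringBuilder block = new StringBuilder();
            String line;
            while ((line = raf.readLine()) != null) {
                block.append(line).append('\n');
                if (line.startsWith("END IONS")) break;
            }
            return block.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sketch", ".mgf");
        Files.write(file, SAMPLE.getBytes(StandardCharsets.US_ASCII));
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.US_ASCII)) {
            Iterator<String> spectra = new BlockIterator(reader);
            while (spectra.hasNext()) {
                System.out.println("iterated: " + spectra.next().split("\n")[1]);
            }
        }
        List<Long> offsets = indexOffsets(file);
        System.out.println("random access: " + readAt(file, offsets.get(1)).split("\n")[1]);
        Files.delete(file);
    }
}
```

The offset list here plays the role of an index by position; a map from spectrum title to offset would give lookup by identifier in the same way. How pgatk-io builds and stores its own index is not described in this README, so treat the approach above as one possible design rather than the library's implementation.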

Getting Help

If you have questions or need additional help, please create an issue in the library repository on GitHub (https://github.com/bigbio/pgatk-io/issues). We welcome your feedback, including error reports, improvement suggestions, and new feature requests.

Similar libraries:

  • ms-data-core-api: Perez-Riverol Y, Uszkoreit J, Sanchez A, Ternent T, Del Toro N, Hermjakob H, Vizcaíno JA, Wang R. ms-data-core-api: an open-source, metadata-oriented library for computational proteomics. Bioinformatics. 2015 Sep 1;31(17):2903-5.

  • jmzReader: Griss J, Reisinger F, Hermjakob H, Vizcaíno JA. jmzReader: A Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats. Proteomics. 2012 Mar;12(6):795-8. doi: 10.1002/pmic.201100578.

Owner

  • Name: BigBio Stack
  • Login: bigbio
  • Kind: organization
  • Email: proteomicsstack@gmail.com
  • Location: Cambridge, UK

Provide big data solutions for bioinformatics

Dependencies

.github/workflows/maven-release-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-java v1 composite
.github/workflows/maven-snapshopt-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-java v1 composite
.github/workflows/maven.yml actions
  • actions/checkout v2 composite
  • actions/setup-java v1 composite
pom.xml maven
  • com.fasterxml.jackson.core:jackson-annotations 2.11.1
  • com.fasterxml.jackson.core:jackson-core 2.11.1
  • com.fasterxml.jackson.core:jackson-databind 2.11.1
  • com.fasterxml.jackson.module:jackson-module-paranamer 2.11.1
  • com.spotify.sparkey:sparkey 3.0.1
  • com.sun.xml.bind:jaxb-core 2.3.0.1
  • com.sun.xml.bind:jaxb-impl 2.3.2
  • io.github.bigbio.pgatk:pgatk-utilities 1.0.1-SNAPSHOT
  • junit:junit 4.13.1
  • net.openhft:chronicle-map 3.17.1
  • org.apache.avro:avro 1.10.2
  • org.ehcache:ehcache 3.8.1
  • org.hamcrest:hamcrest-core 1.3
  • org.hamcrest:hamcrest-library 1.3
  • org.iq80.leveldb:leveldb 0.10
  • org.mapdb:mapdb 3.0.8
  • org.mockito:mockito-core 1.10.19
  • org.projectlombok:lombok 1.18.2
  • org.slf4j:jcl-over-slf4j 1.7.25
  • org.slf4j:slf4j-api 1.7.25
  • org.zeroturnaround:zt-zip 1.13
  • org.zoodb:zoodb 0.5.2
  • uk.ac.ebi.jmzml:jmzml 1.7.11