pridepy

pridepy: A Python package to download and search data from PRIDE database - Published in JOSS (2025)

https://github.com/pride-archive/pridepy

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    2 of 6 committers (33.3%) from academic institutions
  • Institutional organization owner
    Organization pride-archive has institutional domain (www.ebi.ac.uk)
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

mass-spectrometry pride pride-database proteomics python python-client
Last synced: 4 months ago

Repository

Python client for PRIDE Archive Rest API.

Basic Info
  • Host: GitHub
  • Owner: PRIDE-Archive
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 35.1 MB
Statistics
  • Stars: 24
  • Watchers: 6
  • Forks: 4
  • Open Issues: 0
  • Releases: 7
Topics
mass-spectrometry pride pride-database proteomics python python-client
Created almost 7 years ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

pridepy: A Python package to download and search data from PRIDE database

Python Client library for PRIDE Rest API

Installation

From PyPI

To install, simply use pip:

```bash
$ pip install --upgrade pridepy
```

From Source

First, clone the repository on your local machine and then install the package using pip:

```bash
$ git clone https://github.com/PRIDE-Archive/pridepy
$ cd pridepy
$ poetry build
$ pip install dist/*.whl
```

Install with setup.py:

```bash
$ git clone https://github.com/PRIDE-Archive/pridepy
$ cd pridepy
$ poetry build
$ pip install dist/pridepy-{version}.tar.gz
```

Usage and Documentation

This Python CLI tool is built with the Click module, which provides detailed usage instructions for each command. To avoid duplicating them in this README, access the usage instructions directly from the CLI. Use the command below to list the available commands:

```bash
$ pridepy --help
Usage: pridepy [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  download-all-public-raw-files            Download all public raw files...
  download-file-by-name                    Download a single file from a...
  get-files-by-filter                      get paged files :return:
  get-files-by-project-accession           get files by project accession...
  get-private-files                        Get private files by project...
  get-projects                             get paged projects :return:
  get-projects-by-accession                get projects by accession...
  stream-files-metadata                    Stream all files metadata in...
  stream-projects-metadata                 Stream all projects metadata...
  search-projects-by-keywords-and-filters  Search all projects by keywords...
```

[!NOTE] Please make sure you are using Python 3, not Python 2.7.

Downloading a project from PRIDE Archive

The main purpose of this tool is to download data from the PRIDE Archive. Here is how to download all the raw files from a dataset (e.g. PXD012353):

```bash
$ pridepy download-all-public-raw-files -a PXD012353 -o /Users/yourname/Downloads/foldername/ -p aspera
```

  • -a flag is used to specify the project accession number.
  • -o flag is used to specify the output directory.
  • -p flag is used to specify the protocol (aspera, ftp, globus).

[!IMPORTANT] pridepy currently supports several download protocols: ftp, aspera, globus, and s3. With ftp and aspera, files are transferred over those protocols directly, and pridepy bundles the Aspera client. With globus and s3, files are downloaded over HTTPS from the endpoints of those services. See the whitepaper for more on the performance of each protocol.

Additional options:

  • -skip flag is used to skip the download of files that already exist in the output directory.
  • --aspera_maximum_bandwidth flag is used to specify the maximum bandwidth for the Aspera download. The default value is 100M.
  • --checksum_check flag is used to check the checksum of the downloaded files. The default value is False.
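When the download is driven from a script rather than typed by hand, the command can be assembled programmatically. The following is a minimal sketch and not part of pridepy itself: `build_raw_download_command` and `run_raw_download` are hypothetical helpers, the sketch assumes the `pridepy` executable is on the PATH, that `--aspera_maximum_bandwidth` takes its value as the next argument, and that the protocol names listed in this README are the literal values accepted by `-p`.

```python
import subprocess
from typing import List, Optional

# Protocol names mentioned in this README; whether each is accepted
# verbatim by -p is an assumption.
KNOWN_PROTOCOLS = {"ftp", "aspera", "globus", "s3"}


def build_raw_download_command(
    accession: str,
    output_dir: str,
    protocol: str = "ftp",
    aspera_maximum_bandwidth: Optional[str] = None,
) -> List[str]:
    """Assemble the argv for `pridepy download-all-public-raw-files`.

    Hypothetical helper: it only builds the argument list and does not
    contact PRIDE.
    """
    if protocol not in KNOWN_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    command = [
        "pridepy", "download-all-public-raw-files",
        "-a", accession,
        "-o", output_dir,
        "-p", protocol,
    ]
    if aspera_maximum_bandwidth is not None:
        # Assumption: the flag takes its value as the next argument.
        command += ["--aspera_maximum_bandwidth", aspera_maximum_bandwidth]
    return command


def run_raw_download(command: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when the output directory contains spaces.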

Download single file by name

Instead of downloading all the files of a project, users may want to download a single file whose name they already know. Here is how to download a single file by name:

```bash
$ pridepy download-file-by-name -a PXD022105 -o /Users/yourname/Downloads/foldername/ -f checksum.txt -p globus
```

The additional options are the same as those described under Downloading a project from PRIDE Archive.

Download project files by category

Files can also be downloaded by category. The following categories are available in the PRIDE Archive:

  • RAW: Raw data files
  • PEAK: Peak list files
  • SEARCH: Search engine output files
  • OTHER: Other files
  • RESULT: Result files
  • SPECTRUM LIBRARIES: Spectrum libraries
  • FASTA: FASTA files

```bash
$ pridepy download-files-by-category -a PXD022105 -o /Users/yourname/Downloads/foldername/ -c RAW -p ftp
```

The additional options are the same as those described under Downloading a project from PRIDE Archive.
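A script that accepts a category from user input may want to validate it before invoking the CLI. The sketch below is illustrative only: `build_category_download_command` is a hypothetical helper, and the category spellings are copied from the list in this README. Whether the CLI expects exactly these strings for `-c` (in particular "SPECTRUM LIBRARIES") is an assumption.

```python
from typing import List

# Categories as listed in this README. The exact value the CLI expects
# for each one is an assumption.
FILE_CATEGORIES = (
    "RAW",
    "PEAK",
    "SEARCH",
    "OTHER",
    "RESULT",
    "SPECTRUM LIBRARIES",
    "FASTA",
)


def build_category_download_command(
    accession: str,
    output_dir: str,
    category: str,
    protocol: str = "ftp",
) -> List[str]:
    """Assemble the argv for `pridepy download-files-by-category`.

    Hypothetical helper: rejects categories not listed in the README and
    returns the argument list without running anything.
    """
    normalized = category.strip().upper()
    if normalized not in FILE_CATEGORIES:
        allowed = ", ".join(FILE_CATEGORIES)
        raise ValueError(f"unknown category {category!r}; expected one of: {allowed}")
    return [
        "pridepy", "download-files-by-category",
        "-a", accession,
        "-o", output_dir,
        "-c", normalized,
        "-p", protocol,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` once pridepy is installed.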

[!IMPORTANT] Because downloading RAW files is the most common use case, a dedicated command (download-all-public-raw-files) is also available for it.

Download private files

Users and especially reviewers may be interested in downloading private files. Here is how to download private files.

First, the user can list the private files of a project:

```bash
$ pridepy list-private-files -a PXD022105 -u yourusername -p yourpassword
```

This command lists the private files of the project PXD022105, including the file name, file size, and download link.

Then the user can download the private files:

```bash
$ pridepy download-file-by-name -a PXD022105 -o /Users/yourname/Downloads/foldername/ --username yourusername --password yourpassword -f checksum.txt
```

[!WARNING] To download private files, use the same command as for downloading a single file by name, and additionally provide the username and password. The protocol option is unnecessary in this case because the tool uses HTTPS to download private files. At the moment this is the only protocol allowed, because of the infrastructure behind PRIDE private files (read the whitepaper for more information).
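In automated workflows it can be convenient to keep credentials out of the script itself. The following sketch reads them from environment variables and assembles the private-download command described above. The helper name and the variable names `PRIDE_USERNAME` and `PRIDE_PASSWORD` are assumptions made for this example, not something pridepy defines.

```python
import os
from typing import List, Mapping


def build_private_download_command(
    accession: str,
    output_dir: str,
    file_name: str,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Assemble the argv for downloading one private file by name.

    Hypothetical helper. Credentials are read from the PRIDE_USERNAME and
    PRIDE_PASSWORD variables (names chosen for this sketch). No protocol
    flag is added, since private files are fetched over HTTPS.
    """
    try:
        username = env["PRIDE_USERNAME"]
        password = env["PRIDE_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
    return [
        "pridepy", "download-file-by-name",
        "-a", accession,
        "-o", output_dir,
        "--username", username,
        "--password", password,
        "-f", file_name,
    ]
```

This keeps the password out of the source file and shell history, but it is still passed as a command-line argument and so remains visible in the process list while the command runs.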

Streaming metadata

One of the great features of PRIDE and pridepy is the ability to stream metadata of all projects and files. This is useful for users who want to analyze the metadata of all projects and files locally.

Stream metadata of all projects as JSON and write it to a file:

bash $ pridepy stream-projects-metadata -o all_pride_projects.json

Stream the metadata of all files as JSON and write it to a file:

```bash
$ pridepy stream-files-metadata -o all_pride_files_metadata.json
```

Stream the files metadata of a specific project as JSON and write it to a file:

```bash
$ pridepy stream-files-metadata -o PXD005011_files.json -a PXD005011
```
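Once the metadata has been written to disk, it can be analyzed locally with the standard library. The sketch below is a starting point under stated assumptions: this README does not specify the layout of the output file, so `load_streamed_records` (a hypothetical helper) accepts either a single JSON array of objects or one JSON object per line. The field name passed to `count_by_field` must be a key that actually occurs in the records.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List


def load_streamed_records(path: str) -> List[Dict[str, Any]]:
    """Load records from a file written by a stream-*-metadata command.

    Assumption: the file holds either one JSON array of objects or one
    JSON object per line; the real layout is not documented here.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def count_by_field(records: List[Dict[str, Any]], field: str) -> Counter:
    """Count records by the value of `field`; missing keys count as 'unknown'."""
    return Counter(str(record.get(field, "unknown")) for record in records)
```

Note that this reads the whole file into memory, which may be impractical for the metadata of all projects or all files; an incremental parser would be a better fit for very large dumps.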

Search projects by keywords and filters

Get project metadata by keywords and filters:

```bash
$ python -m pridepy.pridepy search-projects-by-keywords-and-filters -f projectTags==Proteometools,organismsPart==Pancreas -k human -sd DESC -sf accession -sf submissionDate
```
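The `-f` value in the command above is a comma-separated list of `field==value` pairs. A small sketch of how such a string could be built from a dictionary follows. Both helpers are hypothetical; rejecting commas inside fields and values is an assumption, since this README does not describe any escaping rules.

```python
from typing import Dict, List, Sequence


def build_filter_argument(filters: Dict[str, str]) -> str:
    """Join field/value pairs as `field==value`, separated by commas."""
    for field, value in filters.items():
        # Assumption: no escaping mechanism exists, so commas are rejected.
        if "," in field or "," in value:
            raise ValueError(f"comma not allowed in filter {field!r}={value!r}")
    return ",".join(f"{field}=={value}" for field, value in filters.items())


def build_search_command(
    keyword: str,
    filters: Dict[str, str],
    sort_direction: str = "DESC",
    sort_fields: Sequence[str] = (),
) -> List[str]:
    """Assemble the argv for search-projects-by-keywords-and-filters.

    Mirrors the flags of the example command: -f, -k, -sd and repeated -sf.
    """
    command = [
        "python", "-m", "pridepy.pridepy",
        "search-projects-by-keywords-and-filters",
        "-f", build_filter_argument(filters),
        "-k", keyword,
        "-sd", sort_direction,
    ]
    for sort_field in sort_fields:
        command += ["-sf", sort_field]
    return command
```

Calling `build_search_command("human", {"projectTags": "Proteometools", "organismsPart": "Pancreas"}, sort_fields=["accession", "submissionDate"])` reproduces the argument list of the example command.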

White paper

A white paper is available here. It can be built as a PDF with pandoc, here through the openjournals/inara Docker image:

```bash
$ docker run --rm --platform linux/amd64 -v /Users/yperez/work/pridepy/paper/:/data -w /data openjournals/inara:latest paper.md -p -o pdf
```

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement."

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Citation

Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563

Owner

  • Name: PRIDE-Resources
  • Login: PRIDE-Archive
  • Kind: organization
  • Email: pride-support@ebi.ac.uk

PRIDE: Proteomics resources

JOSS Publication

pridepy: A Python package to download and search data from PRIDE database
Published
March 24, 2025
Volume 10, Issue 107, Page 7563
Authors
Selvakumar Kamatchinathan
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Suresh Hewapathirana
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Chakradhar Bandla
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Santiago Insua
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Juan Antonio Vizcaíno
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Yasset Perez-Riverol
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Editor
Sehrish Kanwal
Tags
proteomics mass spectrometry pride archive big data

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as follows."
authors:
  - family-names: Kamatchinathan
    given-names: Selvakumar
  - family-names: Hewapathirana
    given-names: Suresh
  - family-names: Bandla
    given-names: Chakradhar
  - family-names: Insua
    given-names: Santiago
  - family-names: Vizcaíno
    given-names: Juan Antonio
  - family-names: Perez-Riverol
    given-names: Yasset
title: "pridepy: A Python package to download and search data from PRIDE database"
version: "10.0.107"
doi: "10.21105/joss.07563"
date-released: "2025-01-01"
repository-code: "https://github.com/your-repo/pridepy"
preferred-citation:
  type: article
  authors:
    - family-names: Kamatchinathan
      given-names: Selvakumar
    - family-names: Hewapathirana
      given-names: Suresh
    - family-names: Bandla
      given-names: Chakradhar
    - family-names: Insua
      given-names: Santiago
    - family-names: Vizcaíno
      given-names: Juan Antonio
    - family-names: Perez-Riverol
      given-names: Yasset
  title: "pridepy: A Python package to download and search data from PRIDE database"
  journal: "Journal of Open Source Software"
  volume: "10"
  issue: "107"
  year: "2025"
  pages: "7563"
  doi: "10.21105/joss.07563"
  url: "https://doi.org/10.21105/joss.07563"

GitHub Events

Total
  • Create event: 9
  • Release event: 3
  • Issues event: 7
  • Watch event: 7
  • Delete event: 7
  • Issue comment event: 30
  • Push event: 75
  • Pull request review event: 31
  • Pull request review comment event: 14
  • Pull request event: 39
Last Year
  • Create event: 9
  • Release event: 3
  • Issues event: 7
  • Watch event: 7
  • Delete event: 7
  • Issue comment event: 30
  • Push event: 75
  • Pull request review event: 31
  • Pull request review comment event: 14
  • Pull request event: 39

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 243
  • Total Committers: 6
  • Avg Commits per committer: 40.5
  • Development Distribution Score (DDS): 0.222
Past Year
  • Commits: 197
  • Committers: 4
  • Avg Commits per committer: 49.25
  • Development Distribution Score (DDS): 0.152
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yasset Perez-Riverol | y****l@g****m | 189 |
| selvakumar kamatchinathan | s****9@g****m | 22 |
| selva | s****a@e****k | 12 |
| Chakradhar Bandla | c****a@g****m | 10 |
| Suresh Hewapathirana | h****a@e****k | 8 |
| sureshhewa | s****i@g****m | 2 |

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 9
  • Total pull requests: 74
  • Average time to close issues: 10 months
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 6
  • Total pull request authors: 4
  • Average comments per issue: 1.78
  • Average comments per pull request: 1.93
  • Merged pull requests: 70
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 66
  • Average time to close issues: about 1 month
  • Average time to close pull requests: about 2 hours
  • Issue authors: 2
  • Pull request authors: 3
  • Average comments per issue: 2.5
  • Average comments per pull request: 2.15
  • Merged pull requests: 62
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gessulat (2)
  • ypriverol (2)
  • sureshhewabi (2)
  • Alexander-Sol (1)
  • cramacha (1)
  • MohmedSoudy (1)
Pull Request Authors
  • ypriverol (84)
  • selvaebi (17)
  • chakrabandla (2)
  • sureshhewabi (1)
Top Labels
Issue Labels
  • bug (2)
  • enhancement (2)
  • use-case (1)
Pull Request Labels
  • enhancement (31)
  • Review effort [1-5]: 2 (20)
  • documentation (19)
  • Review effort [1-5]: 1 (12)
  • dependencies (8)
  • Review effort [1-5]: 3 (8)
  • Review effort [1-5]: 4 (6)
  • other (5)
  • bug_fix (4)
  • configuration changes (2)
  • tests (2)
  • Review effort 1/5 (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 106 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 1
pypi.org: pridepy

Python Client library for PRIDE Rest API

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 106 Last month
Rankings
Dependent packages count: 10.1%
Forks count: 16.9%
Stargazers count: 17.1%
Dependent repos count: 21.6%
Average: 22.5%
Downloads: 46.9%
Maintainers (1)
Last synced: 4 months ago

Dependencies

requirements.txt pypi
  • click *
  • plotly *
  • pytest *
  • ratelimit *
  • requests *
  • setuptools *
setup.py pypi
  • click *
  • plotly *
  • pytest *
  • ratelimit *
  • requests *
  • setuptools *
.github/workflows/python-app.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/python-package.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
Dockerfile docker
  • python 3 build