rcsb-api

Python interface for RCSB.org API services

https://github.com/rcsb/py-rcsb-api

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Python interface for RCSB.org API services

Basic Info
Statistics
  • Stars: 45
  • Watchers: 9
  • Forks: 7
  • Open Issues: 4
  • Releases: 0
Created almost 2 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Codemeta

README.md


   rcsb-api: Python Toolkit for Accessing RCSB.org APIs

Python interface for RCSB Protein Data Bank API services at RCSB.org.

Installation

This package requires Python 3.8 or later.

Get it from PyPI:

```shell
pip install rcsb-api
```

Or, download from GitHub and install locally:

```shell
git clone https://github.com/rcsb/py-rcsb-api.git
cd py-rcsb-api
pip install .
```
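To confirm from Python that the package is installed, a minimal sketch using only the standard library follows. The helper name `installed_rcsb_api_version` is hypothetical. It looks up the distribution name `rcsb-api` used in the `pip install` command above.

```python
import sys


def installed_rcsb_api_version():
    """Return the installed rcsb-api version string, or None if it is not installed."""
    # rcsb-api requires Python 3.8+, which is also when importlib.metadata appeared.
    if sys.version_info < (3, 8):
        raise RuntimeError("rcsb-api requires Python 3.8 or later")
    from importlib.metadata import PackageNotFoundError, version

    try:
        return version("rcsb-api")
    except PackageNotFoundError:
        return None


print(installed_rcsb_api_version())
```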

Getting Started

Full documentation is available at readthedocs (https://rcsbapi.readthedocs.io/en/latest/index.html).

The RCSB PDB Search API supports RESTful requests according to a defined schema. This package provides an rcsbapi.search module that simplifies generating complex search queries.
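For comparison, here is a minimal sketch of the kind of raw RESTful request that rcsbapi.search saves you from writing by hand, using only the standard library. The endpoint URL and JSON layout follow our reading of the public Search API documentation and are assumptions, not taken from this package's source; `build_search_request` is a hypothetical helper. The request is built but never sent.

```python
import json
import urllib.request

# Assumed public Search API v2 endpoint; check the Search API docs before relying on it.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_search_request(text):
    """Build (but do not send) a POST request for a full-text search returning entry IDs."""
    payload = {
        "query": {
            "type": "terminal",          # a single (leaf) query node
            "service": "full_text",      # full-text search service
            "parameters": {"value": text},
        },
        "return_type": "entry",          # return PDB entry IDs
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("Hemoglobin")
print(req.get_method(), req.full_url)
# Sending it would require network access, e.g. urllib.request.urlopen(req).
```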

The RCSB PDB Data API supports requests using GraphQL, a language for API queries. This package provides an rcsbapi.data module that simplifies generating queries in GraphQL syntax.
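To show what "GraphQL syntax" means in practice, the sketch below writes the GraphQL query for the experimental method of entry 4HHB by hand and wraps it in a POST request, using only the standard library. The endpoint URL and the `entries(entry_ids: ...)` root field are assumptions based on the public Data API documentation; `build_graphql_request` is a hypothetical helper, and the request is not sent.

```python
import json
import urllib.request

# Assumed public Data API GraphQL endpoint.
DATA_URL = "https://data.rcsb.org/graphql"

# Hand-written GraphQL: root field "entries", then the fields to return.
GRAPHQL_QUERY = """
{
  entries(entry_ids: ["4HHB"]) {
    rcsb_id
    exptl {
      method
    }
  }
}
"""


def build_graphql_request(query):
    """Wrap a GraphQL query string in a JSON POST request (not sent)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        DATA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


gql_req = build_graphql_request(GRAPHQL_QUERY)
print(gql_req.get_method(), gql_req.full_url)
```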

Search API

The rcsbapi.search module supports all available Advanced Search services, as listed below. For more details on their usage, see Search Service Types.

| Search service                    | QueryType               |
|-----------------------------------|-------------------------|
| Full-text                         | TextQuery()             |
| Attribute (structure or chemical) | AttributeQuery()        |
| Sequence similarity               | SeqSimilarityQuery()    |
| Sequence motif                    | SeqMotifQuery()         |
| Structure similarity              | StructSimilarityQuery() |
| Structure motif                   | StructMotifQuery()      |
| Chemical similarity               | ChemSimilarityQuery()   |
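Each query type in the table above corresponds to a "service" in the underlying Search API JSON. The mapping below is our reading of the public Search API documentation, not something taken from rcsbapi's source, and `terminal_node` is a hypothetical helper that builds one raw query node.

```python
# Assumed Search API "service" names behind each query type in the table above.
SERVICE_BY_QUERY_TYPE = {
    "TextQuery": "full_text",
    "AttributeQuery": "text",  # chemical attributes use "text_chem" instead
    "SeqSimilarityQuery": "sequence",
    "SeqMotifQuery": "seqmotif",
    "StructSimilarityQuery": "structure",
    "StructMotifQuery": "strucmotif",
    "ChemSimilarityQuery": "chemical",
}


def terminal_node(query_type, parameters):
    """Build a raw Search API terminal node for the given query type name."""
    return {
        "type": "terminal",
        "service": SERVICE_BY_QUERY_TYPE[query_type],
        "parameters": parameters,
    }


print(terminal_node("TextQuery", {"value": "Hemoglobin"}))
```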

Search API Examples

To perform a search for all structures from humans associated with the term "Hemoglobin", you can combine a "full-text" query (TextQuery) with an "attribute" query (AttributeQuery):

```python
from rcsbapi.search import AttributeQuery, TextQuery
from rcsbapi.search import search_attributes as attrs

# Construct a "full-text" sub-query for structures associated with the term "Hemoglobin"
q1 = TextQuery(value="Hemoglobin")

# Construct an "attribute" sub-query to search for structures from humans
q2 = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)

# OR, build the same sub-query with a Python comparison operator on search_attributes:
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"

# Combine the sub-queries (sub-group with parentheses and the operators "&", "|", etc.)
query = q1 & q2

# Fetch the results by iterating over the query execution
for rId in query():
    print(rId)

# OR, capture them into a variable
results = list(query())
```
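One way to picture how `q1 & q2` in the example above combines sub-queries is Python operator overloading: `&` and `|` build a "group" node whose children are the sub-queries. The sketch below is hypothetical: `Node`, `Terminal`, and `Group` are illustrative classes, not rcsbapi's, and the JSON layout (`"type": "group"`, `"logical_operator"`, `"nodes"`) is our reading of the raw Search API format.

```python
class Node:
    """Hypothetical base class for a search query node (not rcsbapi's class)."""

    def to_dict(self):
        raise NotImplementedError

    def __and__(self, other):
        # q1 & q2 -> group node joined by "and"
        return Group("and", [self, other])

    def __or__(self, other):
        # q1 | q2 -> group node joined by "or"
        return Group("or", [self, other])


class Terminal(Node):
    """A single leaf query against one Search API service."""

    def __init__(self, service, **parameters):
        self.service = service
        self.parameters = parameters

    def to_dict(self):
        return {"type": "terminal", "service": self.service, "parameters": self.parameters}


class Group(Node):
    """A logical combination of child nodes."""

    def __init__(self, logical_operator, nodes):
        self.logical_operator = logical_operator
        self.nodes = nodes

    def to_dict(self):
        return {
            "type": "group",
            "logical_operator": self.logical_operator,
            "nodes": [n.to_dict() for n in self.nodes],
        }


q1 = Terminal("full_text", value="Hemoglobin")
q2 = Terminal(
    "text",
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens",
)
query = q1 & q2
print(query.to_dict()["logical_operator"])  # prints "and"
```

Parentheses work as usual, so `(q1 & q2) | q1` yields an "or" group whose first child is the nested "and" group.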

These examples are in operator syntax. You can also make queries in fluent syntax. Learn more about both syntaxes and implementation details in Query Syntax and Execution.

Data API

The rcsbapi.data module allows you to easily construct GraphQL queries to the RCSB.org Data API.

This is done by specifying the following input:
  • "input_type": the data hierarchy level you are starting from (e.g., "entry", "polymer_entity", etc.); see the full list in the documentation.
  • "input_ids": the list of IDs for which to fetch data (corresponding to the specified "input_type").
  • "return_data_list": the list of data items ("fields") to retrieve. Available fields can be explored in the documentation or via the GraphiQL editor's Documentation Explorer panel.
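The three inputs above are enough to assemble a GraphQL query. The sketch below shows one plausible way to turn dotted field paths into a nested GraphQL selection set; `build_selection` and `build_query` are hypothetical helpers, and the explicit `id_arg` parameter (e.g. "entry_ids") is an assumption, since the package presumably derives it from "input_type" itself.

```python
import json


def build_selection(paths):
    """Turn dotted field paths into a nested GraphQL selection-set string."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})  # merge shared prefixes

    def render(subtree):
        parts = []
        for name, children in subtree.items():
            parts.append(f"{name} {{ {render(children)} }}" if children else name)
        return " ".join(parts)

    return render(tree)


def build_query(input_type, id_arg, input_ids, paths):
    """Assemble a GraphQL query; id_arg is a hypothetical explicit argument name."""
    ids = json.dumps(input_ids)  # a JSON list of strings is also a valid GraphQL list
    return f"{{ {input_type}({id_arg}: {ids}) {{ {build_selection(paths)} }} }}"


print(build_query("entries", "entry_ids", ["4HHB"], ["rcsb_id", "exptl.method"]))
# prints { entries(entry_ids: ["4HHB"]) { rcsb_id exptl { method } } }
```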

Data API Examples

This is a simple query requesting the experimental method of a structure with PDB ID 4HHB (Hemoglobin).

The query must be executed using the .exec() method, which returns the JSON response and also stores it as an attribute of the DataQuery object. From the object, you can access the Data API response, get an interactive editor link, or access the arguments used to create the query. The package can also build queries automatically from the "input_type" and a partial path segment passed in "return_data_list". If you are using this package in code intended for long-term use, fully qualified paths are recommended; when autocompletion is used, a WARNING message is printed as a reminder.

```python
from rcsbapi.data import DataQuery as Query

query = Query(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=["exptl.method"]
)
print(query.exec())
```

Data is returned in JSON format:

```json
{
  "data": {
    "entries": [
      {
        "rcsb_id": "4HHB",
        "exptl": [
          {
            "method": "X-RAY DIFFRACTION"
          }
        ]
      }
    ]
  }
}
```
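Once you have the response, it is ordinary nested Python data. Assuming `query.exec()` returns the JSON shown above as a Python dict, a minimal sketch for pulling out the experimental method per entry could look like this; `methods_by_entry` is a hypothetical helper, and the sample response is copied from the JSON above rather than fetched.

```python
import json

# Sample response copied from the JSON shown above (not fetched live).
RESPONSE = """
{"data": {"entries": [{"rcsb_id": "4HHB", "exptl": [{"method": "X-RAY DIFFRACTION"}]}]}}
"""


def methods_by_entry(response):
    """Map each entry ID to its list of experimental methods."""
    entries = response.get("data", {}).get("entries") or []
    return {
        entry["rcsb_id"]: [x["method"] for x in (entry.get("exptl") or [])]
        for entry in entries
    }


print(methods_by_entry(json.loads(RESPONSE)))
# prints {'4HHB': ['X-RAY DIFFRACTION']}
```

The `or []` guards cover fields that come back as `null`, which GraphQL responses can contain.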

Here is a more complex query. Note that periods can be used to further specify requested data in return_data_list. Also note that multiple return data items and IDs can be requested in one query.

```python
from rcsbapi.data import DataQuery as Query

query = Query(
    input_type="polymer_entities",
    input_ids=["2CPK_1", "3WHM_1", "2D5Z_1"],
    return_data_list=[
        "polymer_entities.rcsb_id",
        "rcsb_entity_source_organism.ncbi_taxonomy_id",
        "rcsb_entity_source_organism.ncbi_scientific_name",
        "cluster_id",
        "identity"
    ]
)
print(query.exec())
```
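The autocompletion of partial paths described earlier can be pictured roughly as suffix matching against known field paths. This sketch is hypothetical: `resolve` and `KNOWN_PATHS` are illustrative names, the path list is a small hand-picked subset rather than the real schema, and the package's actual matching rules may differ. Short names like "cluster_id" and "identity" in the example above appear to be the kind of partial paths that get expanded.

```python
import warnings

# Illustrative subset of fully qualified polymer-entity paths (not the full schema).
KNOWN_PATHS = [
    "rcsb_id",
    "rcsb_entity_source_organism.ncbi_taxonomy_id",
    "rcsb_entity_source_organism.ncbi_scientific_name",
    "rcsb_cluster_membership.cluster_id",
    "rcsb_cluster_membership.identity",
]


def resolve(segment, known_paths=KNOWN_PATHS):
    """Expand a partial path to every known path ending with it, warning when it does."""
    if segment in known_paths:
        return [segment]  # already fully qualified
    matches = [p for p in known_paths if p.endswith("." + segment)]
    if not matches:
        raise ValueError(f"no field path ends with {segment!r}")
    warnings.warn(
        f"{segment!r} autocompleted to {matches}; prefer fully qualified paths",
        stacklevel=2,
    )
    return matches


print(resolve("rcsb_id"))
```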

Jupyter Notebooks

Several Jupyter notebooks with example use cases and workflows for all package modules are provided under notebooks.

For example, a notebook that uses both the Search and Data API modules for a COVID-19-related example is available at notebooks/search_data_workflow.ipynb, or online through Google Colab.

Citing

Please cite the rcsb-api package with the following reference:

Dennis W. Piehl, Brinda Vallat, Ivana Truong, Habiba Morsy, Rusham Bhatt, Santiago Blaumann, Pratyoy Biswas, Yana Rose, Sebastian Bittrich, Jose M. Duarte, Joan Segura, Chunxiao Bi, Douglas Myers-Turnbull, Brian P. Hudson, Christine Zardecki, Stephen K. Burley. rcsb-api: Python Toolkit for Streamlining Access to RCSB Protein Data Bank APIs, Journal of Molecular Biology, 2025. DOI: 10.1016/j.jmb.2025.168970

You should also cite the RCSB.org API services this package utilizes:

Yana Rose, Jose M. Duarte, Robert Lowe, Joan Segura, Chunxiao Bi, Charmi Bhikadiya, Li Chen, Alexander S. Rose, Sebastian Bittrich, Stephen K. Burley, John D. Westbrook. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive, Journal of Molecular Biology, 2020. DOI: 10.1016/j.jmb.2020.11.003

Documentation and Support

Please refer to the readthedocs page to learn more about package usage and other available features as well as to see more examples.

If you experience any issues installing or using the package, please submit an issue on GitHub and we will try to respond in a timely manner.

Owner

  • Name: RCSB PDB
  • Login: rcsb
  • Kind: organization
  • Location: United States

Github repository of the RCSB Protein Data Bank

CodeMeta (codemeta.json)

{
  "@context": "https://w3id.org/codemeta/3.0",
  "type": "SoftwareSourceCode",
  "applicationCategory": "Structural Biology, Bioinformatics",
  "codeRepository": "https://github.com/rcsb/py-rcsb-api",
  "dateCreated": "2024-04-15",
  "dateModified": "2025-03-20",
  "datePublished": "2024-07-22",
  "description": "Python interface for RCSB PDB API services at RCSB.org.",
  "downloadUrl": "https://github.com/rcsb/py-rcsb-api/archive/refs/heads/master.zip",
  "funder": {
    "type": "Organization",
    "name": "US National Science Foundation (DBI-2321666), US Department of Energy (DE-SC0019749), National Cancer Institute, National Institute of Allergy and Infectious Diseases, and National Institute of General Medical Sciences of the National Institutes of Health (R01GM157729)"
  },
  "keywords": [
    "structural biology",
    "bioinformatics",
    "protein structure",
    "application programming interface",
    "APIs"
  ],
  "license": "https://spdx.org/licenses/MIT",
  "name": "rcsb-api",
  "operatingSystem": [
    "Linux",
    "Windows",
    "MacOS"
  ],
  "programmingLanguage": "Python 3",
  "relatedLink": [
    "https://pypi.org/project/rcsb-api/",
    "https://rcsbapi.readthedocs.io/en/latest/index.html"
  ],
  "softwareRequirements": "https://github.com/rcsb/py-rcsb-api/blob/master/pyproject.toml",
  "version": "1.1.2",
  "codemeta:contIntegration": {
    "id": "https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_build/latest?definitionId=40&branchName=master"
  },
  "continuousIntegration": "https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_build/latest?definitionId=40&branchName=master",
  "developmentStatus": "active",
  "funding": "DBI-2321666, DE-SC0019749, R01GM157729",
  "issueTracker": "https://github.com/rcsb/py-rcsb-api/issues",
  "referencePublication": "https://doi.org/10.1016/j.jmb.2025.168970"
}

GitHub Events

Total
  • Issues event: 12
  • Watch event: 32
  • Delete event: 7
  • Issue comment event: 37
  • Push event: 209
  • Pull request review event: 83
  • Pull request review comment event: 71
  • Pull request event: 56
  • Fork event: 9
  • Create event: 11
Last Year
  • Issues event: 12
  • Watch event: 32
  • Delete event: 7
  • Issue comment event: 37
  • Push event: 209
  • Pull request review event: 83
  • Pull request review comment event: 71
  • Pull request event: 56
  • Fork event: 9
  • Create event: 11

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 21
  • Average time to close issues: 4 days
  • Average time to close pull requests: 13 days
  • Total issue authors: 9
  • Total pull request authors: 4
  • Average comments per issue: 0.78
  • Average comments per pull request: 0.1
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 21
  • Average time to close issues: 4 days
  • Average time to close pull requests: 13 days
  • Issue authors: 9
  • Pull request authors: 4
  • Average comments per issue: 0.78
  • Average comments per pull request: 0.1
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dmyersturnbull (2)
  • karenjgonzalez (1)
  • locitran (1)
  • edwardsnj (1)
  • HankerWu (1)
  • jamesl910 (1)
  • anantjiit2026 (1)
  • omgol411 (1)
  • piehld (1)
  • ivana-truong (1)
  • CatChenal (1)
Pull Request Authors
  • ivana-truong (29)
  • habiba-m (14)
  • piehld (11)
  • krish69212 (3)
  • sg-s (1)
  • dmyersturnbull (1)
Top Labels
Issue Labels
bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 2,059 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 15
  • Total maintainers: 1
pypi.org: rcsb-api

Python package interface for RCSB.org API services

  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2,059 Last month
Rankings
Dependent packages count: 10.6%
Average: 35.1%
Dependent repos count: 59.7%
Maintainers (1)
Last synced: 7 months ago