2025-biodata-wealth-inequality

Code to create figures showing wealth inequality of biological sequence data in the Protein Data Bank (PDB), UniProtKB/SwissProt, UniProtKB/TREMBL (AlphaFoldDB), and the predicted number of proteins on Earth.

https://github.com/seanome/2025-biodata-wealth-inequality

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.1%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: seanome
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 105 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 1
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation Codemeta

README.md

2025-biodata-wealth-inequality

Code to create figures showing wealth inequality of biological sequence data in the Protein Data Bank (PDB), UniProtKB/SwissProt, UniProtKB/TREMBL (AlphaFoldDB), and the predicted number of proteins on Earth.

Figures created

The figures created are the "packcircles" (packed-circle plots) that appear below the graph in the main figure of the README.

See figures/ for each individual figure.
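
As a rough illustration of how circle sizes in a packed-circle plot can encode database sizes, the sketch below scales radii so that each circle's area is proportional to its entry count. This is an assumption about one common convention, not a description of what the repository's notebooks do. The function name `radii_for_counts` is hypothetical, and the counts are placeholder values rather than real database sizes.

```python
import math


def radii_for_counts(counts, max_radius=1.0):
    """Return one radius per name so that circle AREA is proportional to count.

    The largest count is assigned `max_radius`; every other radius is scaled
    by the square root of its share of the largest count.
    """
    largest = max(counts.values())
    if largest <= 0:
        raise ValueError("at least one count must be positive")
    return {
        name: max_radius * math.sqrt(count / largest)
        for name, count in counts.items()
    }


# Placeholder counts for illustration only (NOT real database sizes).
example_counts = {"small_db": 1, "medium_db": 100, "large_db": 10_000}
example_radii = radii_for_counts(example_counts)
```

Scaling by the square root matters when counts span several orders of magnitude: mapping counts directly to radius would exaggerate the visual difference, because perceived size tracks area.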

Parsing UniProt TREMBL entries

We used the uniprot module from heuermh/dishevelled-bio to parse the gigantic 219 GB uniprot_trembl.xml.gz file into individual entries. See notebooks/01_parse_uniprot_trembl_xml.ipynb for details.
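
For readers who want a feel for what streaming a compressed UniProt XML file into entries involves, here is a minimal sketch using only the Python standard library. It is not the dishevelled-bio parser used above and is not taken from the notebook; the function name `iter_uniprot_entries` is hypothetical, and the choice to yield only the primary accession and sequence length is an assumption made for brevity. It relies on the UniProt XML namespace `http://uniprot.org/uniprot` and on each `<entry>` having `<accession>` children and a `<sequence>` child with a `length` attribute.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML documents, in ElementTree's {uri}tag form.
UNIPROT_NS = "{http://uniprot.org/uniprot}"
ENTRY_TAG = UNIPROT_NS + "entry"


def iter_uniprot_entries(xml_gz_path):
    """Yield (primary_accession, sequence_length) for each <entry>.

    The file is decompressed and parsed incrementally, so memory use stays
    roughly constant regardless of file size.
    """
    with gzip.open(xml_gz_path, "rb") as handle:
        context = ET.iterparse(handle, events=("start", "end"))
        # The first event is the start of the root element; keep a reference
        # so already-processed entries can be dropped from the tree.
        _event, root = next(context)
        for event, elem in context:
            if event != "end" or elem.tag != ENTRY_TAG:
                continue
            # The first <accession> child is the primary accession.
            accession = elem.findtext(UNIPROT_NS + "accession")
            sequence = elem.find(UNIPROT_NS + "sequence")
            length = None
            if sequence is not None and sequence.get("length"):
                length = int(sequence.get("length"))
            yield accession, length
            # Release the entries parsed so far.
            root.clear()
```

Iterating over `iter_uniprot_entries("uniprot_trembl.xml.gz")` would visit every entry in turn, though a full pass over a file of this size can be expected to take a long time.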

Owner

  • Name: Seanome
  • Login: seanome
  • Kind: organization
  • Email: info@seanome.org
  • Location: United States of America

Citation (CITATION.cff)

title: 2025-biodata-wealth-inequality
authors:
  - given-names: Olga
    family-names: Botvinnik
    email: olga@seanome.org
    affiliation: Seanome
cff-version: 1.2.0
message: If you use this software, please cite it using the metadata from this file.
type: software
abstract: Code to create figures showing wealth inequality of biological
  sequence data in the Protein Data Bank (PDB), UniProtKB/SwissProt,
  UniProtKB/TREMBL (AlphaFoldDB), and the predicted number of proteins on Earth.
keywords:
  - biodiversity
  - uniprot
license: MIT
repository-code: https://github.com/seanome/2025-biodata-wealth-inequality

GitHub Events

Total
  • Create event: 9
  • Release event: 1
  • Issues event: 2
  • Delete event: 2
  • Push event: 17
  • Pull request event: 11
Last Year
  • Create event: 9
  • Release event: 1
  • Issues event: 2
  • Delete event: 2
  • Push event: 17
  • Pull request event: 11

Issues and Pull Requests

All Time
  • Total issues: 2
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 11 days
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 1
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 11 days
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 1
  • Bot pull requests: 0
Top Authors
Issue Authors
  • codefair-io[bot] (1)
  • heuermh (1)
Pull Request Authors
  • olgabot (2)