2025-biodata-wealth-inequality
Code to create figures showing wealth inequality of biological sequence data in the Protein Data Bank (PDB), UniProtKB/SwissProt, UniProtKB/TREMBL (AlphaFoldDB), and the predicted number of proteins on Earth
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (4.1%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 3
- Releases: 1
Metadata Files
README.md
2025-biodata-wealth-inequality
Code to create figures showing wealth inequality of biological sequence data in the Protein Data Bank (PDB), UniProtKB/SwissProt, UniProtKB/TREMBL (AlphaFoldDB), and the predicted number of proteins on Earth.
Figures created
This code creates the "packcircles" plots that appear below the graph in the README's overview figure (image not reproduced here).
See figures/ for each individual figure.
Parsing UniProt TREMBL entries
We used the uniprot module from heuermh/dishevelled-bio to parse the 219 GB uniprot_trembl.xml.gz file into individual entries. See notebooks/01_parse_uniprot_trembl_xml.ipynb for details.
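For readers who want a feel for what "parsing into entries" involves, here is a minimal Python sketch of streaming a gzipped UniProt XML file one entry at a time. It is not the dishevelled-bio approach used in this repository, only an illustration using the standard library. The function names, the toy XML sample, and the choice to report only the primary accession and sequence length are assumptions made for the example.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Toy stand-in for UniProt XML (hypothetical data, not real entries).
TOY_XML = (
    b'<uniprot xmlns="http://uniprot.org/uniprot">'
    b"<entry><accession>A0A000</accession><accession>B1</accession>"
    b'<sequence length="4">MK\nTL</sequence></entry>'
    b"<entry><accession>Q00001</accession><sequence>MA</sequence></entry>"
    b"</uniprot>"
)


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_entries(xml_stream):
    """Yield (primary_accession, sequence_length) for each top-level <entry>."""
    context = ET.iterparse(xml_stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    depth = 0  # nesting depth below the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth != 0 or local_name(elem.tag) != "entry":
            continue
        accession = None
        seq_len = None
        for child in elem:
            name = local_name(child.tag)
            if name == "accession" and accession is None:
                accession = child.text  # first accession is the primary one
            elif name == "sequence" and child.text:
                seq_len = len("".join(child.text.split()))
        yield accession, seq_len
        root.clear()  # discard finished entries so memory use stays bounded


def iter_entries_from_gzip(path):
    """Stream entries from a gzipped XML file without decompressing to disk."""
    with gzip.open(path, "rb") as handle:
        yield from iter_entries(handle)


if __name__ == "__main__":
    for accession, seq_len in iter_entries(io.BytesIO(TOY_XML)):
        print(accession, seq_len)
```

To try this on the real file, pass a local path to uniprot_trembl.xml.gz to `iter_entries_from_gzip`. Clearing the root after each entry is what keeps memory flat on a file of this size. A single-threaded pass like this would still be slow on 219 GB, which is one reason to prefer a dedicated tool.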
Owner
- Name: Seanome
- Login: seanome
- Kind: organization
- Email: info@seanome.org
- Location: United States of America
- Repositories: 1
- Profile: https://github.com/seanome
Citation (CITATION.cff)
title: 2025-biodata-wealth-inequality
authors:
  - given-names: Olga
    family-names: Botvinnik
    email: olga@seanome.org
    affiliation: Seanome
cff-version: 1.2.0
message: If you use this software, please cite it using the metadata from this file.
type: software
abstract: Code to create figures showing wealth inequality of biological
  sequence data in the Protein Data Bank (PDB), UniProtKB/SwissProt,
  UniProtKB/TREMBL (AlphaFoldDB), and all predicted number of proteins on Earth.
keywords:
  - biodiversity
  - uniprot
license: MIT
repository-code: https://github.com/seanome/2025-biodata-wealth-inequality
GitHub Events
Total
- Create event: 9
- Release event: 1
- Issues event: 2
- Delete event: 2
- Push event: 17
- Pull request event: 11
Last Year
- Create event: 9
- Release event: 1
- Issues event: 2
- Delete event: 2
- Push event: 17
- Pull request event: 11
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 2
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 11 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 1
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 11 days
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 1
- Bot pull requests: 0
Top Authors
Issue Authors
- codefair-io[bot] (1)
- heuermh (1)
Pull Request Authors
- olgabot (2)