shpc-registry-cache

A cache of container executables (currently featuring the BioContainers) 🗃️

https://github.com/singularityhub/shpc-registry-cache

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

cache containers executables shpc singularity-hpc
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: singularityhub
  • License: mpl-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 14.7 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 35
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Shpc Registry Cache

This is a static cache of container executables discovered on the path. The cache is updated once a week (on Wednesdays), and identifiers are stored from the repository root, namespaced by their OCI or Docker registry. Since we primarily cache the set of BioContainers, the main set lives under quay.io. The counts are useful for research purposes, or for applied uses such as Singularity Registry HPC, which derives an "ideal" set of entrypoints per container. The cache is generated via the container-executable-discovery action. For details about how the cache algorithm works, treat the action as the source of truth; a brief description is included below.
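As a rough illustration of what a global executable count could look like, the sketch below tallies how many containers ship each executable. The `cache` mapping and the `count_executables` helper are hypothetical names used only for this example. The actual on-disk layout and counting logic live in the container-executable-discovery action, and the sketch assumes each executable is counted once per container.

```python
from collections import Counter
from typing import Dict, Iterable


def count_executables(cache: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each executable name, how many containers include it.

    Assumption: `cache` maps a container identifier to the executable
    names found in that container. This in-memory shape is hypothetical.
    """
    counts: Counter = Counter()
    for executables in cache.values():
        # set() so that a name listed twice in one container counts once
        counts.update(set(executables))
    return counts


# Hypothetical example data, not taken from the real cache
cache = {
    "quay.io/biocontainers/samtools": ["samtools", "htsfile", "ls"],
    "quay.io/biocontainers/bwa": ["bwa", "ls", "ls"],
}
counts = count_executables(cache)
print(counts["ls"])        # 2
print(counts["samtools"])  # 1
```

Common system tools such as `ls` end up with high counts, while tools specific to one container stay near 1, which is what makes the counts useful for ranking.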

Singularity Registry HPC

As an example of how this cache is used, its entries populate the Singularity HPC Registry. At a high level, shpc-registry provides install configuration files for containers. Docker or other OCI registry containers are installed to an HPC system via module software, and to make this work well, we need to know their aliases. This is where data from the cache comes in! Specifically, for this use case we:

  • Identify a new container, C, in the executable cache here that is not yet in the registry
  • Create a set of global executable counts, G
  • Define S as the set of counts from G for the executables found in C
  • Rank order S from least to greatest
  • Include any entries in S that have a frequency < 10
  • Include any entries in S that have any portion of the name matching the container identifier
  • Beyond those, add the next 10 executables with the lowest frequencies, provided each frequency is < 1,000

The frequencies are calculated across the cache here and are included in counts.json. This produces a container configuration file with a set of executables that are likely the most distinctive for that container, based on data from the cache.
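The selection steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code shpc-registry actually runs: `select_aliases` and `name_tokens` are hypothetical helpers, `global_counts` is assumed to be a flat mapping of executable name to frequency, and "any portion of the name matching the container identifier" is read here as "a token from the last path component of the identifier appears in the executable name".

```python
import re
from typing import Dict, Iterable, List


def name_tokens(identifier: str) -> List[str]:
    """Tokens from the last path component of an identifier, tag removed."""
    base = identifier.rsplit("/", 1)[-1].split(":", 1)[0].lower()
    # Drop one-character tokens, which would match almost anything
    return [tok for tok in re.split(r"[^a-z0-9]+", base) if len(tok) > 1]


def select_aliases(
    identifier: str,
    executables: Iterable[str],
    global_counts: Dict[str, int],
    rare_below: int = 10,
    extra_count: int = 10,
    extra_below: int = 1000,
) -> List[str]:
    # S: global frequency of each executable found in container C.
    # Assumption: a name missing from G is treated as frequency 0.
    s = {name: global_counts.get(name, 0) for name in set(executables)}
    # Rank from least to greatest frequency (ties broken by name)
    ranked = sorted(s, key=lambda name: (s[name], name))
    tokens = name_tokens(identifier)
    # Keep rare executables and those whose name matches the identifier
    chosen = [
        name for name in ranked
        if s[name] < rare_below or any(tok in name.lower() for tok in tokens)
    ]
    picked = set(chosen)
    # Then add up to `extra_count` more, lowest frequency first, each < extra_below
    extras = [n for n in ranked if n not in picked and s[n] < extra_below]
    return chosen + extras[:extra_count]


# Hypothetical counts, not taken from counts.json
G = {"samtools": 3, "htsfile": 5, "samtools.pl": 12, "bgzip": 40,
     "tabix": 45, "perl": 800, "ls": 9000, "bash": 9500}
aliases = select_aliases("quay.io/biocontainers/samtools:1.15", G.keys(), G)
print(aliases)
# ['samtools', 'htsfile', 'samtools.pl', 'bgzip', 'tabix', 'perl']
```

In this example `ls` and `bash` are excluded because their frequencies are above 1,000, while `samtools.pl` is kept despite a frequency of 12 because its name matches the container.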

To learn more about Singularity Registry HPC you can:

Manual Update

To update manually, install the updater:

    $ python -m pip install git+https://github.com/vsoch/pipelib@main
    $ python -m pip install git+https://github.com/singularityhub/guts@main
    $ python -m pip install git+https://github.com/singularityhub/singularity-hpc@main

    $ git clone --depth 1 https://github.com/singularityhub/container-executable-discovery
    $ cd container-executable-discovery/lib
    $ pip install -e .

Then generate the biocontainers listing file:

    $ pip install -r .github/scripts/dev-requirements.txt
    $ python .github/scripts/get_biocontainers.py /tmp/biocontainers.txt

And then run the update!

    $ container-discovery update-cache --root $(pwd) --repo-letter-prefix --namespace quay.io/biocontainers /tmp/biocontainers.txt

Running the update locally is sometimes useful for huge containers that cannot be extracted in a GitHub action.

Contribution

This registry showcases a container executable cache, and specifically includes over 8K containers from BioContainers. If you would like to add another source of container identifiers, contributions are very much welcome!

License

This code is licensed under the MPL 2.0 LICENSE.

Owner

  • Name: Container Tools
  • Login: singularityhub
  • Kind: organization

open source container hosting registry, tools, and clients

GitHub Events

Total
  • Release event: 9
  • Push event: 59
  • Create event: 9
Last Year
  • Release event: 9
  • Push event: 59
  • Create event: 9

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 4
  • Total pull requests: 12
  • Average time to close issues: 16 days
  • Average time to close pull requests: about 4 hours
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 3.5
  • Average comments per pull request: 0.33
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • marcodelapierre (3)
  • vsoch (1)
Pull Request Authors
  • github-actions[bot] (6)
  • vsoch (5)
  • marcodelapierre (1)

Dependencies

.github/workflows/release.yaml actions
  • actions/checkout v3 composite
  • avakar/tag-and-release 8f4b627f03fe59381267d3925d39191e27f44236 composite
.github/workflows/update-cache.yaml actions
  • actions/checkout v3 composite
  • singularityhub/container-executable-discovery main composite
.github/scripts/dev-requirements.txt pypi
  • beautifulsoup4 * development
  • packaging * development
  • pipelib * development
  • requests * development
  • singularity-hpc * development