https://github.com/cboettig/oras

R interface to the oras cli

https://github.com/cboettig/oras

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.5%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

R interface to the oras cli

Basic Info
  • Host: GitHub
  • Owner: cboettig
  • License: other
  • Language: R
  • Default Branch: main
  • Size: 38.1 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# oras

`oras` provides a robust way to read and write data objects to cloud storage systems container registries.
This can be an highly effective and provider-agnostic ("multi-cloud") mechanism to distribute reasonably version stable, content-addressed large data files publicly and at scale.
This approach also supports convenient private distribution (though the quotas available from the free-tier of most major providers are much less generous for private files). This package merely provides an R wrapper to the [oras](https://oras.land) command line interface tool, see original documentation for details or skip to the [Quickstart](#Quickstart) to get started from R. 



## Background & Motivation

Container registries were initial introduced to support the distribution of [Docker images](https://docker.com), which use a layered, content-addressed storage system.
The [Open Container Initiative](https://opencontainers.org/) later defined an [open standard specification](https://github.com/opencontainers/image-spec/blob/main/spec.md) for container registries, leading to a proliferation of open source standard implementations ([zot](https://zotregistry.io/), [GitLab](https://docs.gitlab.com/ee/user/packages/container_registry/), and [Harbor](https://goharbor.io/)) that can be self-hosted, while most major cloud providers including [Google Artifact Registry](https://cloud.google.com/artifact-registry), [Amazon's Elastic Container Registry](https://aws.amazon.com/blogs/containers/oci-artifact-support-in-amazon-ecr/), and Microsoft's [Azure Artifact Registry](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-oci-artifacts) offer container registries as a service. 
(The increasing adoption of the more general term "Artifact" registry reflects the growing recognition that these platforms are incredibly useful for arbitrary digital assets and are by no means limited to Docker containers).  

Of particular interest due to it's generous public data storage quotas and familiarity to many programmers is the [GitHub Container Registry (GHCR)](https://ghcr.io), now called "GitHub Packages," which we'll use as the focal point of examples here.  But it is worth bearing in mind that the software involved under the hood is not GitHub-specific, and could just as easily work against a self-hosted system like [zot](https://zotregistry.io/) or the major cloud provider registries. 

## Quickstart


```r
install_github("cboettig/oras")
```

```{r include=FALSE}
write.csv(mtcars, "mtcars.csv", row.names = FALSE)
write.csv(lynx, "lynx.csv", row.names = FALSE)

```


```{r}
library(oras)
```

After loading the library, the appropriate oras client binary must be installed by explicit request (as per CRAN policy). Batch scripts should always call `install_oras()`, which will check if a valid version is installed and only install if needed (or if an upgrade is requested with `force=TRUE`). In interactive use, functions will prompt if the oras binary is not installed.

```{r}
install_oras()
```

The default login is set to GitHub's container registry, GHCR.  Alternative providers can also be indicated. By default, this loads the `GITHUB_TOKEN` environmental variable -- ensure that your token includes the "packages" scope for it to be able to work with GHCR (best practice would be to use a dedicated token with _only_ packages scope).  Instead of using an environmental variable, the user name and password can be provided as function arguments (but be sure not to accidentally disclose your password in shared code that way.)  Alternatively, `--username` and `--password` flags are recognized by most `oras` commands.

```{r}
oras_login()
```

```{r}
oras_push("ghcr.io/cboettig/content-store/mtcars:v1", "mtcars.csv")
oras_blob_fetch("ghcr.io/cboettig/content-store/mtcars@sha256:450a97ba6b438c6ea5bdf2aaac7eab0ecbbf812b5ff74b56f62dcf0a0c7eb0e5",
                "cars.csv")
```
We can now download the asset using either it's tag (which can be overwritten) or it's hash (which is always unique to this particular content, and retained even after it is overwritten.)

```{r}
oras_pull("ghcr.io/cboettig/content-store/mtcars:v1")
```

What if we push a different dataset to this same namespace and tag?

```{r}
oras_push("ghcr.io/cboettig/content-store/mtcars:v1", "lynx.csv")
```

Note that because the registry is content-based, if we push an object that already exits, we get a simple note that the object already exists, as determined by its SHA256 hash. Also note that using the tag in `oras_pull`, we now receive the `lynx.csv` file, while using the mtcars hash, we receive the `mtcars.csv`.  
But the content is preserved and addressable by it's sha256 hash. Always use `oras_blob_fetch()` to fetch specific content addressed by hash.  

```{r}
oras_pull("ghcr.io/cboettig/content-store/mtcars:v1")

oras_blob_fetch("ghcr.io/cboettig/content-store/mtcars@sha256:450a97ba6b438c6ea5bdf2aaac7eab0ecbbf812b5ff74b56f62dcf0a0c7eb0e5",
                output = "cars.csv")
```
 Unlike other content-based storage systems like IPFS, content addresses in container registries are transparent and easily computed by the SHA256 standard (perhaps the most widely used and supported algorithm ever written). For instance, we can easily verify the hash quoted by the registry matches that produced independently:
 
```{r}
openssl::sha256(file("mtcars.csv", "rb")) |> as.character()
openssl::sha256(file("cars.csv", "rb")) |> as.character()

```

## Metadata and manfiests

Richer workflows that add metadata tags to blobs and connect content to other content in a DAG, allowing a tree of content to be pulled in a single command (just as a docker image pulls down all its component "layers", i.e. blobs), is a natural extension of this and well-supported.  **examples coming**. 


## Public addressing

By default, objects pushed to GHCR are marked private, and subject to quotas or billing (above a free storage tier of 500 MB). Those quotas do not apply to objects marked public.  After creating an object, set it to public as described in the [GitHub Documentation](https://docs.github.com/en/packages/learn-github-packages/configuring-a-packages-access-control-and-visibility) (this task is not part of the scope of OCI specification nor covered by the GitHub API.)

A public object can be addressed by hash using the `blobs` endpoint (common to any OCI standard registry, though GitHub requires this additional anonymous token  `QQ==` be specified in the header):


```{r eval=FALSE, include=FALSE}
## huh, not working
host <- "https://ghcr.io"
namespace <- "cboettig/content-store/mtcars"
sha <- "450a97ba6b438c6ea5bdf2aaac7eab0ecbbf812b5ff74b56f62dcf0a0c7eb0e5"
url <- glue::glue("{host}/v2/{namespace}/blobs/sha:{sha}")

library(httr)
resp <- GET(url,
    write_disk("cars.csv", overwrite = TRUE),
    add_headers(Authorization= "Bearer QQ==")
)
stop_for_status(resp)

library(curl)
h <- new_handle()
handle_setheaders(h, Authorization= "Bearer QQ==")
curl_download(url, "test.txt", handle = h)
```


```{bash}
curl -fL --header "Authorization: Bearer QQ==" "https://ghcr.io/v2/cboettig/content-store/mtcars/blobs/sha256:450a97ba6b438c6ea5bdf2aaac7eab0ecbbf812b5ff74b56f62dcf0a0c7eb0e5" -o cars.csv

# verify matching content 
sha256sum cars.csv
```


Finally, we can log out, to make our connection secure until an authentication token is again provided. (For non-GHCR use, remember to specify the registry address)

```{r}
oras_logout()
```


```{r include = FALSE}
unlink("mtcars.csv")
unlink("lynx.csv")
unlink("cars.csv")

```

Owner

  • Name: Carl Boettiger
  • Login: cboettig
  • Kind: user
  • Company: UC Berkeley

GitHub Events

Total
Last Year

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 6
  • Total Committers: 2
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.333
Past Year
  • Commits: 6
  • Committers: 2
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Carl Boettiger c****g@g****m 4
Carl c****g@g****m 2

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • archive * imports
  • fs * imports
  • glue * imports
  • processx * imports
  • tools * imports
  • utils * imports
  • spelling * suggests
  • testthat >= 3.0.0 suggests