s3

More efficient downloading and usage of files hosted on AWS S3

https://github.com/brokamp-group/s3

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 11
Created over 5 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# s3 


[![CRAN status](https://www.r-pkg.org/badges/version/s3)](https://CRAN.R-project.org/package=s3)
[![R-CMD-check](https://github.com/brokamp-group/s3/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/brokamp-group/s3/actions/workflows/R-CMD-check.yaml)


s3 is an R package designed to download files from AWS S3. Files are downloaded to the R user data directory (i.e., `tools::R_user_dir("s3", "data")`) so that they are cached across all of an R user's sessions and projects. To use an alternative download location, set the `R_USER_DATA_DIR` environment variable (see `?tools::R_user_dir`).
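
As a minimal sketch of how the download location is resolved, the snippet below uses only `tools::R_user_dir()` from base R (available in R 4.0.0 and later). The `my-s3-cache` directory name is a hypothetical choice for illustration, and the override lasts only for the current session.

```r
# Default per-user data directory for the s3 package
default_dir <- tools::R_user_dir("s3", "data")

# Point the user data directory somewhere else for this session only;
# "my-s3-cache" is a hypothetical directory name
Sys.setenv(R_USER_DATA_DIR = file.path(tempdir(), "my-s3-cache"))

# The same call now resolves relative to the new base directory
custom_dir <- tools::R_user_dir("s3", "data")
```

To make the change persist across sessions, the variable would instead be set before R starts, for example in a `.Renviron` file.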

A file on AWS S3 is specified by its URI and downloaded with the `s3_get()` or `s3_get_files()` function; e.g., `s3_get("s3://modis-aod-nasa/2020.05.22.tif")`. The get functions always (invisibly) return the paths to the downloaded files, making it straightforward to read them into R. Files already present in the download location are used instead of being downloaded again, so a single concise call both downloads a file when needed and supplies its path for reading within R.

## Installation

Install the latest CRAN release from within `R` with:

```r
install.packages("s3")
```

Install the development version from GitHub:

```r
# install.packages("remotes")
remotes::install_github("brokamp-group/s3")
```

## Usage

### Downloading Files

```{r}
library(s3)
```

Download a single file specified by its S3 URI with:

```{r, echo = FALSE}
Sys.unsetenv(c('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'))
Sys.setenv("R_USER_DATA_DIR" = tempdir())
```

```{r}
s3_get("s3://geomarker/testing_downloads/mtcars.rds")
```

If a file has already been downloaded, then it will not be re-downloaded:

```{r}
s3_get("s3://geomarker/testing_downloads/mtcars.rds")
```
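
The download-if-missing behavior can be illustrated with a small base R sketch. This is not the package's implementation: `get_cached()` and `fake_download()` are hypothetical names, and the stand-in downloader writes a local file so that the example runs without network access.

```r
# Hypothetical helper: fetch a file only when no local copy exists.
# `download` is any function that writes the file to `dest`.
get_cached <- function(dest, download) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    download(dest)
  }
  invisible(dest)
}

# Stand-in for a real download that counts how often it is called
n_downloads <- 0
fake_download <- function(dest) {
  n_downloads <<- n_downloads + 1
  saveRDS(mtcars, dest)
}

dest <- file.path(tempfile("cache-demo-"), "mtcars.rds")
get_cached(dest, fake_download) # no local copy yet, so the file is written
get_cached(dest, fake_download) # local copy found, download skipped

# The returned path can be passed straight to a reader
cars <- readRDS(get_cached(dest, fake_download))
```

Returning the path invisibly keeps the console quiet when the function is called for its side effect while still allowing the result to be piped into a reader.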

Download multiple files with:

```{r}
s3_get_files(
  c(
    "s3://geomarker/testing_downloads/mtcars.rds",
    "s3://geomarker/testing_downloads/mtcars_again.rds"
  ),
  confirm = FALSE
)
```

### Private Files

Downloading private files requires the name of the S3 bucket's region (this is determined automatically when the file is public):

```{r, eval = FALSE}
s3_get("s3://geomarker/testing_downloads/mtcars_private.rds", region = "us-east-2")
```

#### Setting up AWS credentials

You must have the appropriate AWS S3 credentials set to gain access to non-public files. As with other AWS command line tools and R packages, you can use the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to gain access to such files. 

It is highly recommended to set up your environment variables outside of your R script so that it never contains sensitive information. This can be done by exporting the environment variables before starting R (see the [AWS CLI documentation](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html)) or by defining them in a `.Renviron` file (see `?.Renviron` within `R`).

You can use the internal (unexported) helper function `check_for_aws_env_vars()` to check whether the AWS key environment variables are set:

```{r}
s3:::check_for_aws_env_vars()
```
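
A similar check can be sketched with base R alone, which avoids relying on an unexported function. This is an illustration only and makes no claim about how the package's helper works or what it returns.

```r
# Names of the credential variables used for private files
aws_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

# TRUE for each variable that is set to a non-empty value
is_set <- nzchar(Sys.getenv(aws_vars, unset = ""))
names(is_set) <- aws_vars

# Report only the variable names, never their values
if (!all(is_set)) {
  message("Not set: ", paste(aws_vars[!is_set], collapse = ", "))
}
```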

### Downloaded file paths

Files are saved within a directory structure matching that of the S3 URI. `s3_get()` and `s3_get_files()` both invisibly return the file path(s) of the downloaded files so that they can be passed on to other functions. This allows users with different operating systems, project file structures, or project locations to use a downloaded S3 file without changing their source code:

```{r}
s3_get("s3://geomarker/testing_downloads/mtcars.rds") |>
    readRDS()
```
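
To make the idea of a directory structure that mirrors the URI concrete, here is a rough sketch in base R. `uri_to_path()` is a hypothetical helper, and the layout it produces (bucket name followed by the object key) is an assumption; the package's actual on-disk layout may differ.

```r
# Hypothetical helper: map an S3 URI to a local path that mirrors its structure.
# The exact layout used by the s3 package may differ.
uri_to_path <- function(uri, data_dir = tools::R_user_dir("s3", "data")) {
  stopifnot(startsWith(uri, "s3://"))
  # Split "bucket/folder/file" into its components
  parts <- strsplit(sub("^s3://", "", uri), "/", fixed = TRUE)[[1]]
  do.call(file.path, as.list(c(data_dir, parts)))
}

uri_to_path("s3://geomarker/testing_downloads/mtcars.rds", data_dir = "cache")
#> [1] "cache/geomarker/testing_downloads/mtcars.rds"
```

Because the local path is derived from the URI and a per-user base directory, code that works with the returned path does not need any hard-coded, machine-specific locations.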

Owner

  • Name: brokamp-group
  • Login: brokamp-group
  • Kind: organization

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • cran 223 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 5
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: s3

Download Files from 'AWS S3'

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 5
  • Downloads: 223 Last month
Rankings
Dependent repos count: 13.1%
Forks count: 21.0%
Stargazers count: 25.5%
Dependent packages count: 28.7%
Average: 30.2%
Downloads: 62.8%
Maintainers (1)

Dependencies

DESCRIPTION cran
  • aws.signature * imports
  • cli * imports
  • digest * imports
  • dplyr * imports
  • fs * imports
  • glue * imports
  • httr * imports
  • magrittr * imports
  • prettyunits * imports
  • purrr >= 0.3.4 imports
  • tibble * imports
  • testthat * suggests
  • withr * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite