s3fs

Access Amazon Web Service 'S3' as if it were a file system. File system 'API' design around R package 'fs'

https://github.com/dyfanjones/s3fs

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.3%) to scientific vocabulary

Keywords

aws aws-s3 fs minio r r-package
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 45
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 5
Created over 3 years ago · Last pushed 11 months ago

README.Rmd

---
output: github_document
---



# s3fs


[![s3fs status badge](https://dyfanjones.r-universe.dev/badges/s3fs)](https://dyfanjones.r-universe.dev/s3fs)
[![R-CMD-check](https://github.com/DyfanJones/s3fs/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/DyfanJones/s3fs/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/DyfanJones/s3fs/branch/main/graph/badge.svg)](https://app.codecov.io/gh/DyfanJones/s3fs?branch=main)
[![CRAN status](https://www.r-pkg.org/badges/version/s3fs)](https://CRAN.R-project.org/package=s3fs)



`s3fs` provides a file-system-like interface to Amazon Web Services
for `R`. It uses the [`paws`](https://github.com/paws-r/paws) `SDK` and
[`R6`](https://github.com/r-lib/R6) for its core design. This repo was inspired by
Python’s [`s3fs`](https://github.com/fsspec/s3fs); however, its API and
implementation have been developed to follow `R`’s
[`fs`](https://github.com/r-lib/fs).

## Installation

You can install the released version of s3fs from [CRAN](https://cran.r-project.org/) with:
```r
install.packages('s3fs')
```

r-universe installation:
```r
# Enable repository from dyfanjones
options(repos = c(
  dyfanjones = 'https://dyfanjones.r-universe.dev',
  CRAN = 'https://cloud.r-project.org')
)

# Download and install s3fs in R
install.packages('s3fs')
```

GitHub installation:

```r
remotes::install_github("dyfanjones/s3fs")
```

### Dependencies

* [`paws`](https://github.com/paws-r/paws): connection with AWS S3
* [`R6`](https://github.com/r-lib/R6): set up the core class
* [`data.table`](https://github.com/Rdatatable/data.table): wrangle lists into data.frames
* [`fs`](https://github.com/r-lib/fs): file system on local files
* [`lgr`](https://github.com/s-fleck/lgr): set up logging
* [`future`](https://github.com/HenrikBengtsson/future): set up async functionality
* [`future.apply`](https://github.com/HenrikBengtsson/future.apply): set up parallel looping

# Comparison with `fs`

`s3fs` attempts to give the same interface as `fs` when handling files on AWS S3 from `R`.

- **Vectorization**. All `s3fs` functions are vectorized, accepting multiple path inputs similar to `fs`.
- **Predictable**. 
  - Non-async functions return values that convey a path.
  - Async functions return a `future` object of their non-async counterpart.
  - The only exception will be `s3_stream_in` which returns a list of raw objects.
- **Naming conventions**. `s3fs` functions follow the `fs` naming conventions `dir_*`, `file_*` and `path_*`, with the prefix `s3_` in front, i.e. `s3_dir_*`, `s3_file_*` and `s3_path_*`.
- **Explicit failure**. Similar to `fs`, if a failure happens, it is raised as an error and not masked with a warning.
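
Because failures are raised as errors rather than warnings, they can be handled with base R's `tryCatch()`. The sketch below is illustrative only: `try_s3()` is a hypothetical helper, not part of `s3fs`, and the bucket and key in the commented usage line are placeholders.

```r
# Hypothetical helper (not part of s3fs): evaluate an s3fs call and
# return NULL with a message instead of stopping the script.
try_s3 <- function(expr) {
  tryCatch(
    expr,  # the call is evaluated here, so its errors are caught
    error = function(e) {
      message("S3 operation failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage (placeholder bucket and key; requires AWS credentials):
# try_s3(s3fs::s3_file_copy("s3://mybucket/missing.csv", "missing.csv"))
```

Returning `NULL` on failure is just one design choice; re-raising with extra context via `stop()` would work equally well.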

# Extra features:

- **Scalable**. All `s3fs` functions are designed to have the option to run in parallel through the use of `future` and `future.apply`.

For example: copy a large file from one location to the next.
```r
library(s3fs)
library(future)

plan("multisession")

s3_file_copy("s3://mybucket/multipart/large_file.csv", "s3://mybucket/new_location/large_file.csv")
```

`s3fs` copies a large file (> 5GB) in multiple parts, and `future` allows the parts to run in parallel to speed up the process.

- **Async**. `s3fs` uses `future` to provide a few key async functions, mainly for operations that move large files between `R` and `AWS S3`.

For example: Copying a large file from `AWS S3` to `R`.
```r
library(s3fs)
library(future)

plan("multisession")

s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")
```
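
Since the async functions return a `future`, the result can be collected later with `future::value()`. A minimal sketch, assuming the same placeholder bucket as above and valid AWS credentials; `copy_then_wait()` is a hypothetical wrapper, not an `s3fs` function.

```r
# Hypothetical wrapper (not part of s3fs): start an async copy,
# leave room for other work, then block until the copy has finished.
copy_then_wait <- function(src, dest) {
  fut <- s3fs::s3_file_copy_async(src, dest)  # returns a future
  # ... other work could run here while the copy proceeds ...
  future::value(fut)  # blocks until resolved, then returns the result
}

# Usage (placeholder paths; set a plan such as plan("multisession") first):
# copy_then_wait("s3://mybucket/multipart/large_file.csv", "large_file.csv")
```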

## Usage

`fs` has a straightforward API with 4 core themes:

- `path_` for manipulating and constructing paths
- `file_` for files
- `dir_` for directories
- `link_` for links

`s3fs` follows these themes with the following:

- `s3_path_` for manipulating and constructing S3 URI paths
- `s3_file_` for s3 files
- `s3_dir_` for s3 directories

**NOTE:** `link_` is currently not supported.

``` r
library(s3fs)

# Construct a path to a file with `s3_path()`
s3_path("foo", "bar", letters[1:3], ext = "txt")
#> [1] "s3://foo/bar/a.txt" "s3://foo/bar/b.txt" "s3://foo/bar/c.txt"

# list buckets
s3_dir_ls()
#> [1] "s3://MyBucket1"
#> [2] "s3://MyBucket2"                                        
#> [3] "s3://MyBucket3"               
#> [4] "s3://MyBucket4"                            
#> [5] "s3://MyBucket5"

# list files in bucket
s3_dir_ls("s3://MyBucket5")
#> [1] "s3://MyBucket5/iris.json"     "s3://MyBucket5/athena-query/"
#> [3] "s3://MyBucket5/data/"         "s3://MyBucket5/default/"     
#> [5] "s3://MyBucket5/iris/"         "s3://MyBucket5/made-up/"     
#> [7] "s3://MyBucket5/test_df/"

# create a new directory
tmp <- s3_dir_create(s3_file_temp(tmp_dir = "MyBucket5"))
tmp
#> [1] "s3://MyBucket5/filezwkcxx9q5562"

# create new files in that directory
s3_file_create(s3_path(tmp, "my-file.txt"))
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"
s3_dir_ls(tmp)
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"

# remove files from the directory
s3_file_delete(s3_path(tmp, "my-file.txt"))
s3_dir_ls(tmp)
#> character(0)

# remove the directory
s3_dir_delete(tmp)
```

Created on 2022-06-21 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)

Similar to `fs`, `s3fs` is designed to work well with the pipe.

``` r
library(s3fs)
paths <- s3_file_temp(tmp_dir = "MyBucket") |>
 s3_dir_create() |>
 s3_path(letters[1:5]) |>
 s3_file_create()
paths
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"

paths |> s3_file_delete()
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"
```

Created on 2022-06-22 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)

**NOTE:** all examples have been developed from the `fs` examples.

### File systems that emulate S3

`s3fs` allows you to connect to file systems that provide an S3-compatible interface. For example, [MinIO](https://min.io/) offers high-performance, S3-compatible object storage.
You will be able to connect to your `MinIO` server using `s3fs::s3_file_system`:

``` r
library(s3fs)

s3_file_system(
  aws_access_key_id = "minioadmin",  
  aws_secret_access_key = "minioadmin",
  endpoint = "http://localhost:9000"
)

s3_dir_ls()
#> [1] ""

s3_bucket_create("s3://testbucket")
#> [1] "s3://testbucket"

# refresh cache
s3_dir_ls(refresh = TRUE)
#> [1] "s3://testbucket"

s3_bucket_delete("s3://testbucket")
#> [1] "s3://testbucket"

# refresh cache
s3_dir_ls(refresh = TRUE)
#> [1] ""
```

Created on 2022-12-14 with [reprex v2.0.2](https://reprex.tidyverse.org)

**NOTE:** if you want to change from AWS S3 to MinIO in the same R session, you will need to set the parameter `refresh = TRUE` when calling `s3_file_system` again.
You can use multiple sessions by using the R6 class `S3FileSystem` directly.
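
A sketch of holding two connections side by side through the R6 class. It assumes that `S3FileSystem$new()` accepts the same arguments as `s3_file_system()` shown above and that its methods mirror the `s3_*` functions without the prefix (for example `$dir_ls()`); check the package reference before relying on this. `make_connections()` is a hypothetical helper, and the MinIO credentials and endpoint are the placeholder values from the example above.

```r
# Hypothetical helper (not part of s3fs): build one connection per backend.
make_connections <- function() {
  list(
    # Assumed: with no arguments, the default AWS credentials are used
    aws = s3fs::S3FileSystem$new(),
    # Assumed: same argument names as s3_file_system()
    minio = s3fs::S3FileSystem$new(
      aws_access_key_id = "minioadmin",
      aws_secret_access_key = "minioadmin",
      endpoint = "http://localhost:9000"
    )
  )
}

# Usage (requires AWS credentials and a running MinIO server):
# con <- make_connections()
# con$aws$dir_ls()    # assumed method name, mirroring s3_dir_ls()
# con$minio$dir_ls()
```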

# Feedback wanted

Please open a GitHub issue to report problems or request features.

Owner

  • Name: Larefly
  • Login: DyfanJones
  • Kind: user
  • Location: United Kingdom

GitHub Events

Total
  • Issues event: 5
  • Watch event: 6
  • Delete event: 19
  • Issue comment event: 4
  • Push event: 4
  • Create event: 2
Last Year
  • Issues event: 5
  • Watch event: 6
  • Delete event: 19
  • Issue comment event: 4
  • Push event: 4
  • Create event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 165
  • Total Committers: 4
  • Avg Commits per committer: 41.25
  • Development Distribution Score (DDS): 0.382
Past Year
  • Commits: 10
  • Committers: 2
  • Avg Commits per committer: 5.0
  • Development Distribution Score (DDS): 0.2
Top Committers
Name Email Commits
dyfan.jones 2****s 102
dyfan.jones d****s@s****m 59
Scott Chamberlain s****r@f****g 2
Salim B g****t@s****e 2

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 18
  • Total pull requests: 36
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 7
  • Total pull request authors: 3
  • Average comments per issue: 4.61
  • Average comments per pull request: 0.28
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 7
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 4 days
  • Issue authors: 3
  • Pull request authors: 2
  • Average comments per issue: 3.5
  • Average comments per pull request: 0.43
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • DyfanJones (6)
  • pat-s (4)
  • tyner (2)
  • asitemade4u (2)
  • reisfe (1)
  • jwijffels (1)
  • salim-b (1)
Pull Request Authors
  • DyfanJones (40)
  • salim-b (2)
  • sckott (2)
Top Labels
Issue Labels
bug (3) enhancement (2) documentation (1) question (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 431 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
cran.r-project.org: s3fs

'Amazon Web Service S3' File System

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 431 Last month
Rankings
Stargazers count: 12.6%
Average: 28.1%
Forks count: 28.8%
Dependent packages count: 29.8%
Downloads: 33.6%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R6 * imports
  • curl * imports
  • data.table * imports
  • fs * imports
  • future * imports
  • future.apply * imports
  • lgr * imports
  • paws.storage * imports
  • utils * imports
  • testthat >= 3.1.4 suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/rhub.yaml actions
  • r-hub/actions/checkout v1 composite
  • r-hub/actions/platform-info v1 composite
  • r-hub/actions/run-check v1 composite
  • r-hub/actions/setup v1 composite
  • r-hub/actions/setup-deps v1 composite
  • r-hub/actions/setup-r v1 composite