Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (16.7%) to scientific vocabulary
Keywords
Repository
Connect R to Athena using Boto3 SDK (DBI Interface)
Basic Info
- Host: GitHub
- Owner: DyfanJones
- License: other
- Language: R
- Default Branch: master
- Homepage: https://dyfanjones.github.io/RAthena/
- Size: 1.9 MB
Statistics
- Stars: 37
- Watchers: 5
- Forks: 6
- Open Issues: 12
- Releases: 13
Topics
Metadata Files
README.md
RAthena
The goal of the RAthena package is to provide a DBI-compliant interface
to Amazon’s Athena (https://aws.amazon.com/athena/) using Boto3 SDK.
This allows for an efficient, easy setup connection to Athena using the
Boto3 SDK as a driver.
NOTE: Before using RAthena you must have an aws account or have
access to aws account with permissions allowing you to use Athena.
Installation:
Before installing RAthena ensure that Python 3+ is installed onto your
machine: https://www.python.org/downloads/. To install Boto3 either it
can installed the pip command or using RAthena installation function:
pip install boto3
RAthena Method (after RAthena has been installed this method can be used)
r
RAthena::install_boto()
To install RAthena you can get it from CRAN with:
r
install.packages("RAthena")
Or to get the development version from Github with:
r
remotes::install_github("dyfanjones/rathena")
Connection Methods
Hard Coding
The most basic way to connect to AWS Athena is to hard-code your access key and secret access key. However this method is not recommended as your credentials are hard-coded. ```r library(DBI)
con <- dbConnect(RAthena::athena(), awsaccesskeyid='YOURACCESSKEYID', awssecretaccesskey='YOURSECRETACCESSKEY', s3stagingdir='s3://path/to/query/bucket/', region_name='eu-west-1') ```
AWS Profile Name
The next method is to use profile names set up by AWS CLI or created manually
in the ~/.aws directory. To create the profile names manually please refer
to: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html.
Setting up AWS CLI
RAthena is compatible with AWS CLI. This allows your aws credentials to be stored and not be hard coded in your connection.
To install AWS CLI please refer to: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html, to configure AWS CLI please refer to: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
Once AWS CLI has been set up you will be able to connect to Athena by
only putting the s3_staging_dir.
Using default profile name:
r
library(DBI)
con <- dbConnect(RAthena::athena(),
s3_staging_dir = 's3://path/to/query/bucket/')
Connecting to Athena using profile name other than default.
r
library(DBI)
con <- dbConnect(RAthena::athena(),
profile_name = "your_profile",
s3_staging_dir = 's3://path/to/query/bucket/')
Temporary Credentials with MFA Account:
```r library(RAthena) getsessiontoken("YOURPROFILENAME", serialnumber='arn:aws:iam::123456789012:mfa/user', tokencode = "531602", set_env = TRUE)
Connect to Athena using temporary credentials
con <- dbConnect(athena(), s3stagingdir = 's3://path/to/query/bucket/') ```
Assuming ARN Role for connection
Another method in connecting to Athena is to use Amazon Resource Name (ARN) role.
Setting credentials in environmental variables: ```r library(RAthena) assumerole(profilename = "YOURPROFILENAME", rolearn = "arn:aws:sts::123456789012:assumed-role/rolename/rolesessionname", set_env = TRUE)
Connect to Athena using temporary credentials
con <- dbConnect(athena(), s3stagingdir = 's3://path/to/query/bucket/') ``` Connecting to Athena directly using ARN role:
r
library(DBI)
con <- dbConnect(athena(),
profile_name = "YOUR_PROFILE_NAME",
role_arn = "arn:aws:sts::123456789012:assumed-role/role_name/role_session_name",
s3_staging_dir = 's3://path/to/query/bucket/')
To change the duration of ARN role session please change the parameter duration_seconds.
By default duration_seconds is set to 3600 seconds (1 hour).
Usage
Basic Usage
Connect to athena, and send a query and return results back to R.
``` r library(DBI)
con <- dbConnect(RAthena::athena(), awsaccesskeyid='YOURACCESSKEYID', awssecretaccesskey='YOURSECRETACCESSKEY', s3stagingdir='s3://path/to/query/bucket/', region_name='eu-west-1')
res <- dbExecute(con, "SELECT * FROM one_row") dbFetch(res) dbClearResult(res) ```
To retrieve query in 1 step.
r
dbGetQuery(con, "SELECT * FROM one_row")
Intermediate Usage
To create a tables in athena, dbExecute will send the query to athena
and wait until query has been executed. This makes it and idea method to
create tables within athena.
``` r
query <-
"CREATE EXTERNAL TABLE impressions (
requestBeginTime string,
adId string,
impressionId string,
referrer string,
userAgent string,
userCookie string,
ip string,
number string,
processId string,
browserCookie string,
requestEndTime string,
timers struct
dbExecute(con, query) ```
RAthena has 2 extra function to return extra information around Athena
tables: dbGetParitiions and dbShow
dbGetPartitions will return all the partitions (returns data.frame):
r
RAthena::dbGetPartition(con, "impressions")
dbShow will return the table’s ddl, so you will able to see how the
table was constructed in Athena (returns SQL character):
r
RAthena::dbShow(con, "impressions")
Advanced Usage
r
library(DBI)
con <- dbConnect(RAthena::athena(),
s3_staging_dir = 's3://path/to/query/bucket/')
Sending data to Athena
RAthena has created a method to send data.frame from R to Athena.
``` r
Check existing tables
dbListTables(con)
Upload iris to Athena
dbWriteTable(con, "iris", iris, partition=c("TIMESTAMP" = format(Sys.Date(), "%Y%m%d")))
Read in iris from Athena
dbReadTable(con, "iris")
Check new existing tables in Athena
dbListTables(con)
Check if iris exists in Athena
dbExistsTable(con, "iris") ```
Please check out RAthena method for dbWriteTable for more information in how to upload data to AWS Athena and AWS S3.
For more information around how to get the most out of AWS Athena when uploading data please check out: Top 10 Performance Tuning Tips for Amazon Athena
Tidyverse Usage
Creating a connection to Athena and query and already existing table
iris that was created in previous example.
``` r library(DBI) library(dplyr)
con <- dbConnect(RAthena::athena(), awsaccesskeyid='YOURACCESSKEYID', awssecretaccesskey='YOURSECRETACCESSKEY', s3stagingdir='s3://path/to/query/bucket/', region_name='eu-west-1') tbl(con, sql("SELECT * FROM iris")) ```
# Source: SQL [?? x 5]
# Database: Athena 1.9.210 [eu-west-1/default]
sepal_length sepal_width petal_length petal_width species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# … with more rows
dplyr provides lazy querying with allows to short hand tbl(con, sql("SELECT * FROM iris"))
to tbl(con, "iris"). For more information please look at https://solutions.posit.co/connections/db/r-packages/dplyr/.
r
tbl(con, "iris")
# Source: table<iris> [?? x 5]
# Database: Athena 1.9.210 [eu-west-1/default]
sepal_length sepal_width petal_length petal_width species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
# … with more rows
Querying Athena with profile_name instead of hard coding
aws_access_key_id and aws_secret_access_key. By using profile_name
extra Meta Data is returned in the query to give users extra
information.
r
con <- dbConnect(RAthena::athena(),
profile_name = "your_profile",
s3_staging_dir='s3://path/to/query/bucket/')
tbl(con, "iris")) %>%
filter(petal_length < 1.3)
# Source: lazy query [?? x 5]
# Database: Athena 1.9.210 [your_profile@eu-west-1/default]
sepal_length sepal_width petal_length petal_width species
<dbl> <dbl> <dbl> <dbl> <chr>
1 4.7 3.2 1.3 0.2 setosa
2 4.3 3 1.1 0.1 setosa
3 5.8 4 1.2 0.2 setosa
4 5.4 3.9 1.3 0.4 setosa
5 4.6 3.6 1 0.2 setosa
6 5 3.2 1.2 0.2 setosa
7 5.5 3.5 1.3 0.2 setosa
8 4.4 3 1.3 0.2 setosa
9 5 3.5 1.3 0.3 setosa
10 4.5 2.3 1.3 0.3 setosa
# … with more rows
r
tbl(con, "iris") %>%
select(contains("sepal"), contains("petal"))
# Source: lazy query [?? x 4]
# Database: Athena 1.9.210 [your_profile@eu-west-1/default]
sepal_length sepal_width petal_length petal_width
<dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2
2 4.9 3 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
# … with more rows
Upload data using dplyr function copy_to and compute.
``` r library(DBI) library(dplyr)
con <- dbConnect(RAthena::athena(), profilename = "yourprofile", s3stagingdir='s3://path/to/query/bucket/') ```
Write data.frame to Athena table
r
copy_to(con, mtcars,
s3_location = "s3://mybucket/data/")
Write Athena table from tblsql ```r athenamtcars <- tbl(con, "mtcars") mtcarsfilter <- athenamtcars %>% filter(gear >=4) ```
Create athena with unique table name
r
mtcars_filer %>% compute()
Create athena with specified name and s3 location ```r mtcarsfiler %>% compute("mtcarsfiler", s3location = "s3://mybucket/mtcarsfiler/")
Disconnect from Athena
dbDisconnect(con) ```
Work Groups
Creating work group:
``` r library(RAthena) library(DBI)
con <- dbConnect(RAthena::athena(), profilename = "yourprofile", encryptionoption = "SSES3", s3stagingdir='s3://path/to/query/bucket/')
createworkgroup(con, "demoworkgroup", description = "This is a demo work group", tags = tagoptions(key= "demoworkgroup", value = "demo01")) ```
List work groups:
r
list_work_groups(con)
[[1]]
[[1]]$Name
[1] "demo_work_group"
[[1]]$State
[1] "ENABLED"
[[1]]$Description
[1] "This is a demo work group"
[[1]]$CreationTime
2019-09-06 18:51:28.902000+01:00
[[2]]
[[2]]$Name
[1] "primary"
[[2]]$State
[1] "ENABLED"
[[2]]$Description
[1] ""
[[2]]$CreationTime
2019-08-22 16:14:47.902000+01:00
Update work group:
r
update_work_group(con, "demo_work_group", description = "This is a demo work group update")
Return work group meta data:
r
get_work_group(con, "demo_work_group")
$Name
[1] "demo_work_group"
$State
[1] "ENABLED"
$Configuration
$Configuration$ResultConfiguration
$Configuration$ResultConfiguration$OutputLocation
[1] "s3://path/to/query/bucket/"
$Configuration$ResultConfiguration$EncryptionConfiguration
$Configuration$ResultConfiguration$EncryptionConfiguration$EncryptionOption
[1] "SSE_S3"
$Configuration$EnforceWorkGroupConfiguration
[1] FALSE
$Configuration$PublishCloudWatchMetricsEnabled
[1] FALSE
$Configuration$BytesScannedCutoffPerQuery
[1] 10000000
$Configuration$RequesterPaysEnabled
[1] FALSE
$Description
[1] "This is a demo work group update"
$CreationTime
2019-09-06 18:51:28.902000+01:00
Connect to Athena using work group:
r
con <- dbConnect(RAthena::athena(),
profile_name = "your_profile",
work_group = "demo_work_group")
Delete work group:
r
delete_work_group(con, "demo_work_group")
Similar Projects
Python:
pyAthena- A python wrapper of the python packageBoto3using the sqlAlchemy framework: https://github.com/laughingman7743/PyAthenapyAthenaJDBC- A python interface into AWS Athena’s JDBC drivers: https://github.com/laughingman7743/PyAthenaJDBC
R:
AWR.Athena- A R wrapper of RJDBC for the AWS Athena’s JDBC drivers: https://github.com/nfultz/AWR.Athenanoctua- A R wrapper of the R AWS SDKpawsto develop a DBI interface https://github.com/DyfanJones/noctuaawsathena- rJava Interface to AWS Athena SDK https://github.com/hrbrmstr/awsathenametis- Helpers for Accessing and Querying Amazon Athena using R, Including a lightweight RJDBC shim https://github.com/hrbrmstr/metismetisjars- JARs formetishttps://github.com/hrbrmstr/metis-jarsmetis.tidy- Access and Query Amazon Athena via the Tidyverse https://github.com/hrbrmstr/metis-tidy
awsathena and metis family of packages are currently used in production every day to analyze petabytes of internet scan and honeypot data.
Comparison:
The reason why RAthena stands slightly apart from AWR.Athena is that
AWR.Athena uses the Athena JDBC drivers and RAthena uses the Python
AWS SDK Boto3. The ultimate goal is to provide an extra method for R
users to interface with AWS Athena. As pyAthena is the most similar
project, this project has used an appropriate name to reflect this …
RAthena.
Owner
- Name: Larefly
- Login: DyfanJones
- Kind: user
- Location: United Kingdom
- Repositories: 14
- Profile: https://github.com/DyfanJones
GitHub Events
Total
- Issues event: 6
- Watch event: 1
- Issue comment event: 5
Last Year
- Issues event: 6
- Watch event: 1
- Issue comment event: 5
Committers
Last synced: over 2 years ago
Top Committers
| Name | Commits | |
|---|---|---|
| DyfanJones | d****s@s****m | 1,062 |
| Dyfan Jones | d****s@t****m | 125 |
| dyfan.jones | 2****s | 24 |
| Ossi Lehtinen | o****n@s****i | 9 |
| Larefly | d****s@h****m | 8 |
| ras44 | 9****4 | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 35
- Total pull requests: 73
- Average time to close issues: 3 months
- Average time to close pull requests: about 11 hours
- Total issue authors: 14
- Total pull request authors: 3
- Average comments per issue: 4.66
- Average comments per pull request: 0.25
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 0
- Average time to close issues: 3 months
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.8
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- DyfanJones (18)
- willshen99 (5)
- juhoautio (1)
- hrbrmstr (1)
- JonMerlevede (1)
- hadley (1)
- aoyh (1)
- luisccmm (1)
- CerebralMastication (1)
- OssiLehtinen (1)
- NanisTe (1)
- ozhyrenkov (1)
- DavidArenburg (1)
- brunocarlin (1)
Pull Request Authors
- DyfanJones (71)
- olivroy (2)
- ras44 (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- cran 667 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 26
- Total maintainers: 1
cran.r-project.org: RAthena
Connect to 'AWS Athena' using 'Boto3' ('DBI' Interface)
- Homepage: https://github.com/DyfanJones/RAthena
- Documentation: http://cran.r-project.org/web/packages/RAthena/RAthena.pdf
- License: MIT + file LICENSE
-
Latest release: 2.6.1
published about 3 years ago
Rankings
Maintainers (1)
Dependencies
- R >= 3.2.0 depends
- DBI >= 0.7 imports
- data.table >= 1.12.4 imports
- methods * imports
- reticulate >= 1.13 imports
- stats * imports
- utils * imports
- uuid >= 0.1 imports
- arrow * suggests
- bit64 * suggests
- covr * suggests
- dbplyr >= 1.4.3 suggests
- dplyr >= 0.8.0 suggests
- jsonify * suggests
- jsonlite * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat * suggests
- tibble * suggests
- vroom >= 1.2.0 suggests
- actions/checkout v3 composite
- actions/upload-artifact main composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- rstudio/r-base 4.0.2-centos7 build