connector.databricks

Expansion of the connector package for establishing connections to databricks

https://github.com/novonordisk-opensource/connector.databricks

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.5%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
Statistics
  • Stars: 5
  • Watchers: 7
  • Forks: 0
  • Open Issues: 6
  • Releases: 1
Created over 1 year ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# connector.databricks


[![Checks](https://github.com/novonordisk-opensource/connector.databricks/actions/workflows/check_and_co.yaml/badge.svg)](https://github.com/novonordisk-opensource/connector.databricks/actions/workflows/check_and_co.yaml)
[![Codecov test coverage](https://codecov.io/gh/NovoNordisk-OpenSource/connector.databricks/graph/badge.svg)](https://app.codecov.io/gh/NovoNordisk-OpenSource/connector.databricks)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)


The connector.databricks package provides a convenient interface for accessing and interacting with [Databricks](https://www.databricks.com/) volumes and tables directly from R. This README guides you through connecting to Databricks, retrieving data, and performing common operations with the package.

This package is meant to be used with the [connector](https://github.com/NovoNordisk-OpenSource/connector) package, which provides a common interface for interacting with various data sources. The connector.databricks package extends connector to support Databricks **volumes** and **tables**.

## Installation

You can install connector.databricks from CRAN using the following command:

```{r, eval=FALSE}
# Install from CRAN
install.packages("connector.databricks")
```

### Development version

To get a bug fix or to use a feature from the development version, you can install
the development version of connector.databricks from GitHub.

```{r, eval = FALSE}
pak::pak("novonordisk-opensource/connector.databricks")
```


## Usage

Here is an example of how to connect to Databricks tables and volumes:

```{r, eval=FALSE}
library(connector.databricks)

# Connect to databricks tables using DBI
con <- connector_databricks_table(
  http_path = "path-to-cluster",
  catalog = "my_catalog",
  schema = "my_schema"
)

# Connect to databricks volume
con <- connector_databricks_volume(
  catalog = "my_catalog",
  schema = "my_schema",
  path = "path-to-file-storage"
)
```

When connecting to **Databricks tables**, authentication to Databricks is handled by the `odbc::databricks()`
driver, which supports personal access tokens as well as credentials provided through Posit Workbench.
See the `odbc::databricks()` documentation for more information on how the connection to Databricks is established. Currently, most package functions rely on the [brickster](https://github.com/databrickslabs/brickster) package.

When connecting to **Databricks volumes**, authentication is handled by the `brickster` package. See this [vignette](https://databrickslabs.github.io/brickster/articles/setup-auth.html) for more information on how authentication is handled.
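
Before opening a connection, it can help to check that credentials are visible to the R session. The sketch below assumes token-based authentication through the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables, which both `odbc::databricks()` and `brickster` can read. The helper `missing_databricks_env()` is hypothetical and not part of connector.databricks. Other authentication methods (for example Posit Workbench credentials) do not need these variables, so a missing value is not necessarily an error.

```r
# Hypothetical helper (not part of connector.databricks): return the names of
# the given environment variables that are unset or empty in this session.
missing_databricks_env <- function(vars = c("DATABRICKS_HOST", "DATABRICKS_TOKEN")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# Report anything that is missing before trying to connect
missing <- missing_databricks_env()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
}
```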

In the future, we hope that the whole backend will rely solely on the `brickster` package.

Both types of connections share similar interfaces for reading and writing data.
Tables should be used with tabular types of data, while volumes should be used
with unstructured data.
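
As a rough illustration of that rule of thumb, the sketch below routes an object by its type. The helper `suggest_backend()` is hypothetical and not part of connector.databricks; it only encodes the guidance above and is not a rule the package enforces (a data frame can still be stored as a file in a volume).

```r
# Hypothetical helper: suggest a connector type for an R object.
# Data frames are tabular, so they map to tables; anything else is
# treated as unstructured and maps to volumes.
suggest_backend <- function(x) {
  if (is.data.frame(x)) "table" else "volume"
}

suggest_backend(iris)
suggest_backend(list(model = "fit", notes = "free text"))
```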

Example of how to use the connector object:
```{r, eval=FALSE}
# List content
con$list_content_cnt()

# Write a file
con$write_cnt(iris, "iris.rds")

# Read a file
con$read_cnt("iris.rds") |>
  head()

# Remove a file
con$remove_cnt("iris.rds")
```

## Usage with connector package

Here is an example of how it can be used with the connector package and a YAML configuration file (for more information, see the connector package documentation):
```{r, eval=FALSE}
# Connect using configuration file
connector <- connector::connect(
  config = system.file(
    "config",
    "example_yaml.yaml",
    package = "connector.databricks"
  )
)

# List contents in Volume
connector$volumes$list_content_cnt()

# Get databricks connection object from Tables
connector$tables$get_conn()

# Write a file
connector$volumes$write_cnt(iris, "Test/iris.csv")

# Read a file
connector$tables$read_cnt("example_data")
```
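
For orientation, the sketch below shows what such a configuration might contain, written as an R list rather than YAML. It assumes the connector configuration layout of a `datasources` list whose entries have a `name` and a `backend` with a `type`, and that `connector::connect()` also accepts a list. Check both against the connector documentation. The values are the placeholders from the Usage section, not a working setup.

```r
# Assumed configuration layout, mirroring the placeholders used above
config <- list(
  datasources = list(
    list(
      name = "tables",
      backend = list(
        type = "connector.databricks::connector_databricks_table",
        http_path = "path-to-cluster",
        catalog = "my_catalog",
        schema = "my_schema"
      )
    ),
    list(
      name = "volumes",
      backend = list(
        type = "connector.databricks::connector_databricks_volume",
        catalog = "my_catalog",
        schema = "my_schema",
        path = "path-to-file-storage"
      )
    )
  )
)

# Data source names; under the assumptions above these become the elements
# accessed as connector$tables and connector$volumes
datasource_names <- vapply(config$datasources, function(ds) ds$name, character(1))
datasource_names

# Connecting needs a reachable Databricks workspace, so it is left commented out:
# connector <- connector::connect(config = config)
```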

## Contributing
We welcome contributions to the connector.databricks package. If you have any
suggestions or find any issues, please open an issue or submit a pull request on GitHub.

## License
This package is licensed under the Apache License.

Owner

  • Name: NovoNordisk-OpenSource
  • Login: NovoNordisk-OpenSource
  • Kind: organization

GitHub Events

Total
  • Issues event: 31
  • Watch event: 5
  • Delete event: 16
  • Issue comment event: 126
  • Push event: 147
  • Pull request review comment event: 23
  • Pull request review event: 39
  • Pull request event: 33
  • Create event: 16
Last Year
  • Issues event: 31
  • Watch event: 5
  • Delete event: 16
  • Issue comment event: 126
  • Push event: 147
  • Pull request review comment event: 23
  • Pull request review event: 39
  • Pull request event: 33
  • Create event: 16

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 25
  • Total pull requests: 24
  • Average time to close issues: 3 months
  • Average time to close pull requests: 6 days
  • Total issue authors: 5
  • Total pull request authors: 6
  • Average comments per issue: 0.12
  • Average comments per pull request: 1.63
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 22
  • Pull requests: 24
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 6 days
  • Issue authors: 5
  • Pull request authors: 6
  • Average comments per issue: 0.14
  • Average comments per pull request: 1.63
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • vladimirobucina (16)
  • falgreen (3)
  • Cervangirard (3)
  • AEBilgrau (2)
  • akselthomsen (1)
Pull Request Authors
  • vladimirobucina (15)
  • Cervangirard (3)
  • SkanderMulder (2)
  • akselthomsen (2)
  • AEBilgrau (1)
  • Oli-nwl (1)
Top Labels
Issue Labels
bug (8) enhancement (7) documentation (5) good first issue (4)
Pull Request Labels
enhancement (7) documentation (4) bug (4) good first issue (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 35 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: connector.databricks

Expand 'connector' Package for 'Databricks' Tables and Volumes

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 35 Last month
Rankings
Stargazers count: 23.3%
Dependent packages count: 25.6%
Forks count: 29.0%
Dependent repos count: 31.4%
Average: 38.9%
Downloads: 85.3%
Maintainers (1)
Last synced: 9 months ago

Dependencies

.github/actions/setup/action.yaml actions
.github/workflows/check_and_co.yaml actions
DESCRIPTION cran
  • DBI * imports
  • R6 >= 2.4.0 imports
  • brickster * imports
  • checkmate * imports
  • cli * imports
  • connector >= 0.0.8 imports
  • dplyr * imports
  • httr * imports
  • jsonlite * imports
  • odbc >= 1.4.0 imports
  • purrr * imports
  • rlang * imports
  • withr * imports
  • glue * suggests
  • knitr * suggests
  • mockery >= 0.4.4 suggests
  • rmarkdown * suggests
  • testthat >= 3.0.0 suggests
  • tibble * suggests
  • whirl >= 0.1.8.9000 suggests