kibior

Interact easily with Elasticsearch-related backend in R

https://github.com/regisoc/kibior

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

data-science database datasets elasticsearch elasticsearch-client push-pull r search search-engine
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: regisoc
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 2.13 MB
Statistics
  • Stars: 3
  • Watchers: 4
  • Forks: 1
  • Open Issues: 2
  • Releases: 0
Created almost 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme

README.md

kibior: easy scientific data handling, searching and sharing with Elasticsearch

Project Status: Active

Version: 0.1.1

TL;DR

| | |
|-|-|
| What | kibior is an R package dedicated to easing the pain of data handling in science, most notably with biological data. |
| Where | kibior uses Elasticsearch as its database and search engine. |
| Who | kibior is built for data science and data manipulation, that is, for any data-related task, notably sharing data. It mainly targets bioinformaticians and, more broadly, data scientists. |
| When | Available now from this repository or from CRAN. |
| Public instances | Use the `$get_kibio_instance()` method to connect to Kibio and access known datasets. See Kibio datasets at the end of this document for a complete list. |
| Cite this package | In an R session, run `citation("kibior")` |
| Publication | 10.1093/bioinformatics/btab157 |

Main features

This package allows:

  • Pushing, pulling, joining, sharing and searching tabular data between an R session and one or more Elasticsearch instances/clusters.
  • Querying and filtering large datasets with the Elasticsearch engine.
  • Multiple live Elasticsearch connections to different addresses.
  • Method autocompletion in suitable environments (e.g. R CLI, RStudio).
  • Importing and exporting datasets from and to files.
  • Server-side execution for most operations (i.e. on Elasticsearch instances/clusters).

How

Install

```r
# Get from CRAN
install.packages("kibior")

# or get the latest from GitHub
devtools::install_github("regisoc/kibior")
```

Run

```r
# Load
library(kibior)

# Get a specific instance (replace with your own server address and port)
kc <- Kibior$new("server_or_address", port)

# Or try something bigger...
kibio <- Kibior$get_kibio_instance()
kibio$list()
```
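A script may want to fail gracefully when the package or the server is unavailable. The sketch below relies only on the `Kibior$new(host, port)` call shown above. The helper name `connect_or_null`, the host `"localhost"` and the port `9200` (Elasticsearch's default HTTP port) are assumptions made for illustration, as is the expectation that a failed connection raises an R error.

```r
# Hypothetical helper (not part of kibior): returns a Kibior connection,
# or NULL when the package is missing or the connection attempt errors.
connect_or_null <- function(host, port) {
  if (!requireNamespace("kibior", quietly = TRUE)) {
    message("kibior is not installed; see the Install section.")
    return(NULL)
  }
  tryCatch(
    kibior::Kibior$new(host, port),
    error = function(e) {
      message("Could not connect to ", host, ":", port,
              " (", conditionMessage(e), ")")
      NULL
    }
  )
}

# Assumed local instance on Elasticsearch's default HTTP port.
kc <- connect_or_null("localhost", 9200)
if (!is.null(kc)) print(kc$list())
```

Returning `NULL` instead of stopping lets the rest of a script decide whether a missing connection is fatal.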

Examples

Here is a sample of the features offered by KibioR. See the Introduction vignette for more advanced usage.

Example: push datasets

```r
# Push data (R memory -> Elasticsearch)
dplyr::starwars %>% kc$push("sw")
dplyr::storms %>% kc$push("st")
```

Example: pull datasets

```r
# Pull data with column selection (Elasticsearch -> R memory)
kc$pull("sw",
        query = "homeworld:(naboo || tatooine)",
        columns = c("name", "homeworld", "height", "mass", "species"))

# See the vignette for query syntax
```
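To make the effect of `query` and `columns` concrete, here is a rough local analogue in base R, with no server involved. The data frame is a made-up stand-in for the `sw` dataset. The case-insensitive comparison is an assumption about how Elasticsearch usually analyzes text fields, not a statement about kibior internals.

```r
# Toy stand-in for the "sw" dataset; rows and values are made up.
sw <- data.frame(
  name      = c("A", "B", "C", "D"),
  homeworld = c("Naboo", "Tatooine", "Alderaan", NA),
  height    = c(165, 172, 150, 180),
  mass      = c(45, 77, 49, 80),
  species   = c("Human", "Human", "Human", "Droid"),
  eye_color = c("brown", "blue", "brown", "red"),
  stringsAsFactors = FALSE
)

# Local equivalent of query = "homeworld:(naboo || tatooine)":
# keep rows whose homeworld is one of the two values, ignoring case.
wanted_worlds <- c("naboo", "tatooine")
keep_rows <- !is.na(sw$homeworld) & tolower(sw$homeworld) %in% wanted_worlds

# Local equivalent of the `columns` argument: keep only the listed columns.
wanted_cols <- c("name", "homeworld", "height", "mass", "species")
pulled <- sw[keep_rows, wanted_cols]
print(pulled)
```

With kibior the same selection runs on the Elasticsearch side, so only the matching rows and requested columns travel to the R session.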

Example: copy datasets

```r
# Copy dataset (Elasticsearch internal operation)
kc$copy("sw", "sw_copy")
```

Example: delete datasets

```r
# Delete datasets
kc$delete("sw_copy")
```

Example: list, match dataset names

```r
# List available datasets
kc$list()

# Search for index names starting with "s"
kc$match("s*")
```
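The `"s*"` pattern reads as a wildcard in which `*` stands for any run of characters. As a local illustration, base R's `glob2rx()` turns such a wildcard into a regular expression. The index names below are hypothetical, and whether `$match` supports every glob feature is not something this sketch asserts.

```r
# Hypothetical index names, as kc$list() might return them.
index_names <- c("sw", "st", "sw_copy", "genes")

# utils::glob2rx() converts a wildcard pattern into a regular expression:
# "s*" becomes "^s", i.e. "starts with s".
pattern <- glob2rx("s*")
matched <- index_names[grepl(pattern, index_names)]
print(matched)
```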

Example: get column names and list unique values of a column

```r
# Get columns of all datasets starting with "s"
kc$columns("s*")

# Get unique values of a column
kc$keys("sw", "homeworld")
```

Example: some basic Elasticsearch statistical methods

```r
# Count number of lines in dataset
kc$count("st")

# Count number of lines with query (name of the storm is Anita)
kc$count("st", query = "name:anita")

# Generic stats on two columns
kc$stats("sw", c("height", "mass"))

# Specific descriptive stats with query
kc$avg("sw", c("height", "mass"), query = "homeworld:naboo")
```
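For intuition, the count and average calls correspond roughly to the base R operations below, run on a made-up stand-in for `sw`. The sketch assumes that missing values are skipped when averaging, which is how Elasticsearch's `avg` aggregation treats documents lacking the field by default. The exact shape of kibior's return values is not reproduced here.

```r
# Toy stand-in for "sw"; values are made up for illustration.
sw <- data.frame(
  name      = c("A", "B", "C", "D"),
  homeworld = c("Naboo", "Naboo", "Tatooine", "Naboo"),
  height    = c(160, 180, 170, NA),
  mass      = c(50, 70, 80, 60),
  stringsAsFactors = FALSE
)

# Local analogue of a plain count: number of rows.
n_all <- nrow(sw)

# Local analogue of a count with query = "homeworld:naboo".
naboo <- sw[tolower(sw$homeworld) == "naboo", ]
n_naboo <- nrow(naboo)

# Local analogue of kc$avg("sw", c("height", "mass"), query = "homeworld:naboo"):
# per-column mean over the matching rows, skipping missing values.
avg_naboo <- colMeans(naboo[, c("height", "mass")], na.rm = TRUE)
print(avg_naboo)
```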

Example: join

```r
# Inner join between:
# 1/ an Elasticsearch-based dataset with query ("sw"),
# 2/ and an in-memory R dataset (dplyr::starwars)
kc$inner_join("sw", dplyr::starwars,
              left_query = "hair_color:black",
              left_columns = c("name", "mass", "height"),
              by = "name")
```
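The join above combines three steps: filter the left dataset, keep a subset of its columns, then match rows on a shared key. A minimal base R sketch of the same logic, using two made-up data frames instead of an Elasticsearch index, could look like this. It shows inner-join semantics only and makes no claim about how kibior executes the join.

```r
# Toy stand-ins; values are made up for illustration.
left <- data.frame(          # plays the role of the "sw" index
  name       = c("A", "B", "C"),
  mass       = c(50, 70, 80),
  height     = c(160, 180, 170),
  hair_color = c("black", "brown", "black"),
  stringsAsFactors = FALSE
)
right <- data.frame(         # plays the role of the in-memory data frame
  name    = c("A", "C", "D"),
  species = c("Human", "Droid", "Human"),
  stringsAsFactors = FALSE
)

# 1/ Filter the left side (like left_query = "hair_color:black") and
# 2/ keep only the requested left columns (like left_columns).
left_sel <- left[left$hair_color == "black", c("name", "mass", "height")]

# 3/ Inner join on the shared key: only names present on both sides survive.
joined <- merge(left_sel, right, by = "name")
print(joined)
```

Row "B" is dropped by the filter and row "D" has no partner on the left, so only "A" and "C" remain.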

Owner

  • Name: regis
  • Login: regisoc
  • Kind: user

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 129
  • Total Committers: 2
  • Avg Commits per committer: 64.5
  • Development Distribution Score (DDS): 0.031
Top Committers
Name Email Commits
regisoc r****1@u****a 125
regis r****c@u****m 4
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 4
  • Total pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.75
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • regisoc (4)
Pull Request Authors
Top Labels
Issue Labels
enhancement (4)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 236 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: kibior

A Simple Data Management and Sharing Tool

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 236 Last month
Rankings
Stargazers count: 28.5%
Forks count: 28.8%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 35.7%
Downloads: 56.1%
Last synced: 10 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.0 depends
  • Biostrings * imports
  • R6 >= 2.5.0 imports
  • Rsamtools * imports
  • data.table >= 1.13.2 imports
  • dplyr >= 1.0.2 imports
  • elastic >= 1.1.0 imports
  • jsonlite >= 1.7.1 imports
  • magrittr >= 1.5 imports
  • purrr >= 0.3.4 imports
  • rio >= 0.5.16 imports
  • rtracklayer * imports
  • stringr >= 1.4.0 imports
  • tibble >= 3.0.4 imports
  • tidyr >= 1.1.2 imports
  • ggplot2 >= 3.3.2 suggests
  • knitr >= 1.30 suggests
  • readr >= 1.4.0 suggests
  • rmarkdown >= 2.5 suggests
  • testthat >= 3.0.0 suggests
  • xml2 >= 1.3.2 suggests
  • yaml >= 2.2.1 suggests