bitfield

Handle Bit-Flags to record Quality for Modeled Data Products

https://github.com/bitfloat/bitfield



README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(bitfield)
library(dplyr)
library(knitr)
```

# bitfield 


[![CRAN\_Status\_Badge](http://www.r-pkg.org/badges/version/bitfield)](https://cran.r-project.org/package=bitfield)
[![](http://cranlogs.r-pkg.org/badges/grand-total/bitfield)](https://cran.r-project.org/package=bitfield)

[![R-CMD-check](https://github.com/bitfloat/bitfield/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/bitfloat/bitfield/actions/workflows/R-CMD-check.yaml)
[![codecov](https://codecov.io/gh/bitfloat/bitfield/graph/badge.svg?token=QZB36RION3)](https://app.codecov.io/gh/bitfloat/bitfield)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)



## Overview

This package is designed to capture the computational footprint of any model workflow or output. It achieves this by encoding computational decisions into sequences of bits (i.e., [bitfields](https://en.wikipedia.org/wiki/Bit_field)) that are transformed to integer values. This makes it possible to store a range of information in a single column of a table or a single raster layer, which can be useful when documenting

  - the metadata of any dataset by collecting information throughout the dataset creation process, 
  - intermediate data that accrue along a workflow, or 
  - a set of output metrics or parameters.
  - ...

Think of a bit as a switch representing off and on states. A combination of a pair of bits can store four states, and n bits can accommodate 2^n states. These states could be the outcomes of (simple) tests that document binary responses, cases or numeric values. The data produced in that way could be described as meta-analytic or meta-algorithmic data, because they can be re-used to extend an analysis pipeline or algorithm by downstream applications.
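
As a minimal sketch of this idea in base R (independent of the `bitfield` API; the flag names are hypothetical), three yes/no answers can be packed into one integer and read back:

```r
# Three hypothetical binary test outcomes for a single observation
flags <- c(is_missing = FALSE, out_of_range = TRUE, mislabelled = TRUE)

# n bits can distinguish 2^n states
n_states <- 2^length(flags)

# Pack the flags into one integer: the first flag becomes the highest bit
weights <- 2^rev(seq_along(flags) - 1)   # 4, 2, 1
packed  <- sum(flags * weights)

# Unpack again with a bitwise AND against each weight
unpacked <- bitwAnd(as.integer(packed), as.integer(weights)) > 0
names(unpacked) <- names(flags)
unpacked
```

The single value `packed` is all that needs to be stored; the meaning of each bit has to be documented separately, which is the role the registry plays in this package.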

## Installation

Install the official version from CRAN:

```{r, eval=FALSE}
install.packages("bitfield")
```

Install the latest development version from GitHub:

```{r, eval=FALSE}
devtools::install_github("bitfloat/bitfield")
```


## Examples

```{r}
library(bitfield)
library(dplyr, warn.conflicts = FALSE)
library(terra, warn.conflicts = FALSE)
```

Let's first look at an example dataset and its quality issues:

```{r}
bf_tbl$x                                       # invalid (259) and improbable (0) coordinate value

bf_tbl$y                                       # Inf and NaN value

bf_tbl$commodity                               # NA value or mislabelled term ("honey")

bf_tbl$yield                                   # correct range?!

bf_tbl$year                                    # flags (*r)

# and there is a set of valid commodity terms
validComm <- c("soybean", "maize")
```

The first step is to create what `bitfield` calls a `registry`. This registry captures all the information required to build the bitfield (note that this is merely a minimal tech demo; for additional information, check the documentation of the respective functions).

```{r}
reg <- bf_registry(name = "yield_QA",
                   description = "this example bitfield documents quality assessment of yield data.")
```

Then, individual bit flags are added to the registry by specifying the respective protocols. These protocols create flags for the most common applications, such as `na` (to test for missing values), `case` (to test which case/class the observations are part of), `nChar` (to count the number of characters of a variable), or `numeric` (to encode a numeric, floating-point variable as a bit sequence).
  
```{r}
# tests for longitude availability
reg <- bf_map(protocol = "na",                       # the protocol with which to build the bit flag
              data = bf_tbl,                         # specify where to determine flags
              x = x,                                 # ... and which variable to test
              pos = 1,                               # specify at which position to store the flag
              registry = reg)                        # provide the registry to update

# test which case an observation is part of
reg <- bf_map(protocol = "case", data = bf_tbl, registry = reg, na.val = 0,
              yield >= 11, yield < 11 & yield > 9, yield < 9 & commodity == "maize")

# test the length (number of characters) of values
reg <- bf_map(protocol = "nChar", data = bf_tbl, registry = reg, 
              x = y)

# store a simplified (e.g. rounded) numeric value
reg <- bf_map(protocol = "numeric", data = bf_tbl, registry = reg, 
              x = yield, format = "half")
```

The protocols used above represent the possible encoding types: boolean (`bool`), enumerated cases (`enum`), (signed) integers (`int`), and floating-point numbers (`num`). The encoding type determines various storage parameters of the resulting flags. This is, however, not yet the bitfield. The registry is merely the instruction manual, so to speak, for creating the bitfield and encoding it as an integer with the function `bf_encode()`.

```{r}
reg

(field <- bf_encode(registry = reg))
```
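
Conceptually, the encoding step concatenates the bits of all flags, in registry order, and reads the result as one integer per observation. The following base R sketch illustrates this for a hypothetical layout of a 1-bit `na` flag followed by a 2-bit case flag. The values and the helper name `pack_flags()` are made up for illustration and are not the internals of `bf_encode()`.

```r
# Hypothetical flag values for four observations
na_flag   <- c(0L, 0L, 1L, 0L)   # 1 bit: is the value missing?
case_flag <- c(1L, 3L, 0L, 2L)   # 2 bits: which case (0 = no case matched)

# Concatenate two flags: shift the earlier flag left by the width of the
# later one, then combine both with a bitwise OR
pack_flags <- function(first, second, width_second) {
  bitwOr(bitwShiftL(first, width_second), second)
}

encoded <- pack_flags(na_flag, case_flag, width_second = 2L)
encoded
```

Each element of `encoded` now carries both flags, so one integer column is enough to store them.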

The bitfield can be decoded based on the registry with the function `bf_decode()`, either at a later point in time or in another workflow, where the metadata contained in the bitfield can be used or extended by a downstream application.

```{r}
flags <- bf_decode(x = field, registry = reg, sep = "-")

# -> prints legend by default, which is also available in bf_legend

bf_tbl |>
  bind_cols(flags) |>
  kable()
```

The column `bf_bin`, in combination with the legend, can be read one flag at a time. For example, the first flag (a single bit) shows that no observation has an `NA` value, and the second flag shows that observations 4 and 6 have a `yield` smaller than 9 and the `commodity` value "maize" (case 3 with binary value `11`).
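
Reading a bit sequence flag by flag can be sketched in base R as follows. The layout (a 1-bit flag followed by a 2-bit flag) and the helper names `to_bits()` and `split_flags()` are hypothetical and only mirror the first two flags discussed above; `bf_decode()` takes care of this for the whole registry.

```r
# Convert a non-negative integer to a fixed-width string of 0s and 1s
to_bits <- function(x, width) {
  bits <- as.integer(intToBits(x))[seq_len(width)]  # least significant bit first
  paste(rev(bits), collapse = "")
}

# Cut a bit string into consecutive flags of the given widths
split_flags <- function(bit_string, widths) {
  ends   <- cumsum(widths)
  starts <- ends - widths + 1
  substring(bit_string, starts, ends)
}

bits  <- to_bits(3L, width = 3)        # "011"
flags <- split_flags(bits, c(1, 2))    # "0" "11"
strtoi(flags, base = 2)                # 0 3
```

Here the first flag decodes to 0 (no missing value) and the second to 3 (case 3), matching the reading described above.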

For further computation, it is more convenient to separate the bitfield into distinct columns per flag. The decoded values can then also be accessed in the global environment `.GlobalEnv`.

```{r}
bf_decode(x = field, registry = reg, verbose = FALSE)

# access values manually
ls(.GlobalEnv)

.GlobalEnv[["nChar_y"]]
```

Beware that numeric values encoded in this way likely have a lower precision than the input values (which may not be a problem in the frequent case where only rounded values are of interest). The precision can be adjusted by setting the respective parameters in the protocol that encodes numeric values (a vignette explaining this in detail will follow).

```{r}
old <- options(pillar.sigfig = 7)
tibble::tibble(original = bf_tbl$yield, 
               bitfield = .GlobalEnv$numeric_yield)
options(old)
```
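
To get a feeling for the size of this effect, the following base R sketch rounds numbers to 11 significant binary digits, which is the precision of IEEE 754 half-precision floats for values in their normal range. That `format = "half"` corresponds to this format is an assumption here, and the helper `quantise_mantissa()` is hypothetical; it is not how the package encodes values.

```r
# Round x to a given number of significant binary digits (11 = half precision)
quantise_mantissa <- function(x, digits = 11) {
  out <- x
  ok  <- is.finite(x) & x != 0              # leave 0, NA, NaN and Inf untouched
  exponent <- floor(log2(abs(x[ok])))       # position of the leading binary digit
  step     <- 2^(exponent - (digits - 1))   # spacing of representable values
  out[ok]  <- round(x[ok] / step) * step
  out
}

x <- c(0.1, 11.5, 1234.567)
data.frame(original = x,
           rounded  = quantise_mantissa(x),
           error    = x - quantise_mantissa(x))
```

Values that need few binary digits (such as 11.5) survive unchanged, while the absolute error grows with the magnitude of the value, because the spacing between representable numbers doubles with every power of two.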

## Bitfields for raster data

An interesting use case is encoding metadata for modelled gridded data. In the newest version of this package, this is possible simply by providing a raster instead of a table to the functions.

```{r}
library(terra)

# define an example dataset
bf_rst <- rast(nrows = 3, ncols = 3, vals = bf_tbl$commodity, 
               names = "commodity") # with levels
bf_rst$yield <- rast(nrows = 3, ncols = 3, vals = bf_tbl$yield) # numeric

# build the registry
reg <- bf_registry(name = "reg_raster",
                   description = "bitfield for rasters.")

reg <- bf_map(protocol = "na", data = bf_rst, registry = reg,
              x = commodity)
reg <- bf_map(protocol = "range", data = bf_rst, registry = reg, 
              x = yield, min = 5, max = 11)
reg <- bf_map(protocol = "category", data = bf_rst, registry = reg,
              x = commodity, na.val = 0)
              
# encode as bitfield (and make raster out of it)
field <- bf_encode(registry = reg)
rst_field <- rast(bf_rst$yield, vals = field, names = names(field))

# decode (gridded) bitfield somewhere downstream
flags <- bf_decode(x = values(rst_field, dataframe = TRUE), registry = reg)

bind_cols(flags, field)

plot(c(bf_rst, rst_field))
```

Owner

  • Name: bitfloat
  • Login: bitfloat
  • Kind: organization


Packages

  • Total packages: 1
  • Total downloads:
    • cran 214 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: bitfield

Handle Bitfields to Record Meta Data


Dependencies

.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • checkmate * imports
  • crayon * imports
  • dplyr * imports
  • magrittr * imports
  • methods * imports
  • purrr * imports
  • rlang * imports
  • stringr * imports
  • tibble * imports
  • tidyr * imports
  • knitr * suggests
  • rmarkdown * suggests