bitfield
Handle Bit-Flags to record Quality for Modeled Data Products
Repository
Basic Info
- Host: GitHub
- Owner: bitfloat
- License: gpl-3.0
- Language: R
- Default Branch: main
- Homepage: https://bitfloat.github.io/bitfield/
- Size: 3.02 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 2
Created about 4 years ago
· Last pushed 10 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(bitfield)
library(dplyr)
library(knitr)
```
# bitfield
## Overview
This package is designed to capture the computational footprint of any model workflow or output. It achieves this by encoding computational decisions into sequences of bits (i.e., [bitfields](https://en.wikipedia.org/wiki/Bit_field)) that are transformed to integer values. This allows storing a range of information in a single column of a table or a single raster layer, which can be useful when documenting
- the metadata of any dataset by collecting information throughout the dataset creation process,
- intermediate data that accrue along a workflow, or
- a set of output metrics or parameters.
- ...
Think of a bit as a switch representing off and on states. A combination of a pair of bits can store four states, and n bits can accommodate 2^n states. These states could be the outcomes of (simple) tests that document binary responses, cases or numeric values. The data produced in that way could be described as meta-analytic or meta-algorithmic data, because they can be re-used to extend an analysis pipeline or algorithm by downstream applications.
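The relation between bits and states can be illustrated with a few lines of base R. This is only a sketch of the general idea and does not use any function of this package; all object names are made up for the illustration.

```r
# number of distinct states that n bits can represent
n_bits <- 1:4
n_states <- 2^n_bits
n_states

# two bits enumerate four states: 00, 01, 10 and 11
two_bit_states <- expand.grid(low = 0:1, high = 0:1)
two_bit_states$value <- two_bit_states$high * 2 + two_bit_states$low
two_bit_states

# the bit pattern of the integer 5, least significant bit first
bits_of_5 <- as.integer(intToBits(5L))[1:8]
bits_of_5
```

Each additional bit doubles the number of states, which is why a handful of bits is often enough to record the outcome of several simple tests.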
## Installation
Install the official version from CRAN:
```{r, eval=FALSE}
install.packages("bitfield")
```
Install the latest development version from GitHub:
```{r, eval=FALSE}
devtools::install_github("bitfloat/bitfield")
```
## Examples
```{r}
library(bitfield)
library(dplyr, warn.conflicts = FALSE)
library(terra, warn.conflicts = FALSE)
```
Let's first look at an example dataset:
```{r}
bf_tbl$x # invalid (259) and improbable (0) coordinate value
bf_tbl$y # Inf and NaN value
bf_tbl$commodity # NA value or mislabelled term ("honey")
bf_tbl$yield # correct range?!
bf_tbl$year # flags (*r)
# and there is a set of valid commodity terms
validComm <- c("soybean", "maize")
```
The first step is to create what `bitfield` calls a `registry`. This registry captures all the information required to build the bitfield (note that this is merely a minimal tech demo; for additional information, check the documentation of the respective functions).
```{r}
reg <- bf_registry(name = "yield_QA",
                   description = "this example bitfield documents quality assessment of yield data.")
```
Then, individual bit flags need to be defined by specifying the respective protocols. These protocols create flags for the most common applications, such as `na` (to test for missing values), `case` (to test which case/class an observation is part of), `nChar` (to count the number of characters of a variable), or `numeric` (to encode a numeric, i.e., floating-point, variable as a bit sequence).
```{r}
# test for longitude availability
reg <- bf_map(protocol = "na",   # the protocol with which to build the bit flag
              data = bf_tbl,     # specify where to determine flags
              x = x,             # ... and which variable to test
              pos = 1,           # specify at which position to store the flag
              registry = reg)    # provide the registry to update

# test which case an observation is part of
reg <- bf_map(protocol = "case", data = bf_tbl, registry = reg, na.val = 0,
              yield >= 11, yield < 11 & yield > 9, yield < 9 & commodity == "maize")

# test the length (number of characters) of values
reg <- bf_map(protocol = "nChar", data = bf_tbl, registry = reg,
              x = y)

# store a simplified (e.g. rounded) numeric value
reg <- bf_map(protocol = "numeric", data = bf_tbl, registry = reg,
              x = yield, format = "half")
```
These protocols represent the possible encoding types: boolean (`bool`), enumerated cases (`enum`), (signed) integers (`int`), and numeric floating-point values (`num`). The encoding type determines various storage parameters of the resulting flags. This is, however, not yet the bitfield. The registry is merely the instruction manual, so to speak, from which the function `bf_encode()` creates the bitfield and encodes it as an integer.
```{r}
reg
(field <- bf_encode(registry = reg))
```
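To give an intuition for what encoding means, the following base R sketch packs two flags into a single integer by hand. It is not how `bf_encode()` is implemented; the data frame `obs`, the bit layout (one bit for the `NA` test followed by two bits for the case) and all object names are assumptions made for the illustration.

```r
# hypothetical observations (not the package's bf_tbl)
obs <- data.frame(
  x     = c(25.3, NA, 27.1, 26.8),
  yield = c(11.2, 9.5, 8.1, 12.0)
)

# flag 1 (1 bit): is the coordinate missing?
flag_na <- as.integer(is.na(obs$x))

# flag 2 (2 bits): which yield case applies (0 = no case matched)
flag_case <- ifelse(obs$yield >= 11, 1L,
             ifelse(obs$yield > 9, 2L,
             ifelse(obs$yield < 9, 3L, 0L)))

# pack both flags into one integer: multiplying by 4 shifts the NA flag
# two positions to the left, the case flag fills the two lowest bits
packed <- flag_na * 4L + flag_case
packed
```

The second observation, for instance, has a missing coordinate (bit `1`) and falls into case 2 (bits `10`), which gives the bit sequence `110` and thus the integer 6.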
The bitfield can be decoded based on the registry with the function `bf_decode()` at a later point in time or another workflow, where the metadata contained in the bitfield can be used or extended in a downstream application.
```{r}
flags <- bf_decode(x = field, registry = reg, sep = "-")
# -> prints legend by default, which is also available in bf_legend
bf_tbl |>
  bind_cols(flags) |>
  kable()
```
The column `bf_bin`, in combination with the legend, can be read one flag at a time. For example, the first bit shows that no observation has an `NA` value in `x`, and the following two bits show that observations 4 and 6 have a `yield` smaller than 9 and the `commodity` value "maize" (case 3 with binary value `11`).
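Reading a bit sequence flag by flag amounts to splitting an integer into groups of bits. The base R sketch below does this by hand for a hypothetical three-bit layout (one bit for an `NA` test, two bits for a case); it is not the implementation of `bf_decode()`, and the values and names are made up for the illustration.

```r
# hypothetical 3-bit field values: [NA flag][case flag, 2 bits]
field_value <- c(1L, 6L, 3L)

# NA flag: integer division by 4 drops the two lowest bits
flag_na <- (field_value %/% 4L) %% 2L

# case flag: the remainder modulo 4 keeps only the two lowest bits
flag_case <- field_value %% 4L

# helper that renders an integer as a bit string, most significant bit first
to_bin <- function(v, width) {
  vapply(v, function(i) {
    paste(rev(as.integer(intToBits(i))[seq_len(width)]), collapse = "")
  }, character(1))
}

bin <- paste(to_bin(flag_na, 1), to_bin(flag_case, 2), sep = "-")
data.frame(field_value, flag_na, flag_case, bin)
```

The separator in `bin` plays the same role as the `sep` argument used above: it makes the boundaries between flags visible.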
In a more computation-friendly way, we can also separate the bitfield into distinct columns per flag and load the decoded values from the environment `.GlobalEnv`.
```{r}
bf_decode(x = field, registry = reg, verbose = FALSE)
# access values manually
ls(.GlobalEnv)
.GlobalEnv[["nChar_y"]]
```
Beware that numeric values that have been encoded in this way likely have a lower precision than the input values (which may not be a problem in the frequent case where only rounded values are of interest). This can be adjusted by setting the respective parameters in the protocol that encodes numeric values (a vignette explaining this in detail will follow).
```{r}
old <- options(pillar.sigfig = 7)
tibble::tibble(original = bf_tbl$yield,
               bitfield = .GlobalEnv$numeric_yield)
options(old)
```
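The size of this precision loss can be estimated independently of the package. The following base R sketch rounds values to the 11 significant bits of an IEEE 754 half-precision number; it assumes that `format = "half"` refers to that format, ignores exponent limits and subnormal numbers, and uses hypothetical yield values. The function `to_half_precision()` is made up for the illustration and is not part of `bitfield`.

```r
# round a double to the 11 significant bits of IEEE 754 half precision
# (normal range only; exponent limits and subnormals are ignored)
to_half_precision <- function(x) {
  out <- x
  ok <- is.finite(x) & x != 0
  exponent <- floor(log2(abs(x[ok])))
  step <- 2^(exponent - 10)   # spacing between representable values
  out[ok] <- round(x[ok] / step) * step
  out
}

yield <- c(11.192915, 9.146650, 0.1)   # hypothetical values
as_half <- to_half_precision(yield)
data.frame(original = yield, half = as_half, error = abs(yield - as_half))
```

For values between 8 and 16 the spacing between representable numbers is 1/128, so the rounding error is at most about 0.004.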
## Bitfields for raster data
An interesting use case is encoding metadata for modelled gridded data. In the newest version of this package, this is possible simply by providing a raster instead of a table to the functions.
```{r}
library(terra)
# define an example dataset
bf_rst <- rast(nrows = 3, ncols = 3, vals = bf_tbl$commodity,
               names = "commodity") # with levels
bf_rst$yield <- rast(nrows = 3, ncols = 3, vals = bf_tbl$yield) # numeric
# build the registry
reg <- bf_registry(name = "reg_raster",
                   description = "bitfield for rasters.")
reg <- bf_map(protocol = "na", data = bf_rst, registry = reg,
              x = commodity)
reg <- bf_map(protocol = "range", data = bf_rst, registry = reg,
              x = yield, min = 5, max = 11)
reg <- bf_map(protocol = "category", data = bf_rst, registry = reg,
              x = commodity, na.val = 0)
# encode as bitfield (and make raster out of it)
field <- bf_encode(registry = reg)
rst_field <- rast(bf_rst$yield, vals = field, names = names(field))
# decode (gridded) bitfield somewhere downstream
flags <- bf_decode(x = values(rst_field, dataframe = TRUE), registry = reg)
bind_cols(flags, field)
plot(c(bf_rst, rst_field))
```
Owner
- Name: bitfloat
- Login: bitfloat
- Kind: organization
- Repositories: 1
- Profile: https://github.com/bitfloat
GitHub Events
Total
- Release event: 1
- Delete event: 1
- Push event: 41
- Create event: 1
Last Year
- Release event: 1
- Delete event: 1
- Push event: 41
- Create event: 1
Packages
- Total packages: 1
-
Total downloads:
- cran 214 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
cran.r-project.org: bitfield
Handle Bitfields to Record Meta Data
- Homepage: https://github.com/bitfloat/bitfield
- Documentation: http://cran.r-project.org/web/packages/bitfield/bitfield.pdf
- License: GPL (≥ 3)
-
Latest release: 0.6.1
published about 1 year ago
Rankings
Dependent packages count: 26.6%
Dependent repos count: 32.8%
Average: 48.7%
Downloads: 86.7%
Maintainers (1)
Last synced:
10 months ago
Dependencies
.github/workflows/check-standard.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.4.1 composite
- actions/checkout v3 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- checkmate * imports
- crayon * imports
- dplyr * imports
- magrittr * imports
- methods * imports
- purrr * imports
- rlang * imports
- stringr * imports
- tibble * imports
- tidyr * imports
- knitr * suggests
- rmarkdown * suggests