datawizard

datawizard: An R Package for Easy Data Preparation and Statistical Transformations - Published in JOSS (2022)

https://github.com/easystats/datawizard

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

data dplyr hacktoberfest janitor manipulation r-package reshape rstats tidyr wrangling

Keywords from Contributors

standardization correlation predict easystats gaussian-graphical-models bayes-factors bayesian-correlations biserial cor correlation-analysis

Scientific Fields

Engineering Computer Science - 80% confidence
Last synced: 4 months ago

Repository

Magic potions to clean and transform your data 🧙

Basic Info
Statistics
  • Stars: 230
  • Watchers: 8
  • Forks: 16
  • Open Issues: 33
  • Releases: 33
Created over 4 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Contributing Funding License Code of conduct Support

README.Rmd

---
output: github_document
---

# `datawizard`: Easy Data Wrangling and Statistical Transformations 

```{r, echo=FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  dpi = 300,
  out.width = "100%",
  fig.path = "man/figures/",
  comment = "#>"
)

set.seed(333)
library(datawizard)
```

[![DOI](https://joss.theoj.org/papers/10.21105/joss.04684/status.svg)](https://doi.org/10.21105/joss.04684)
[![downloads](https://cranlogs.r-pkg.org/badges/datawizard)](https://cran.r-project.org/package=datawizard)
[![total](https://cranlogs.r-pkg.org/badges/grand-total/datawizard)](https://cranlogs.r-pkg.org/)









`{datawizard}` is a lightweight package to easily manipulate, clean, transform, and prepare your data for analysis. It is part of the [easystats ecosystem](https://easystats.github.io/easystats/), a suite of R packages to deal with your entire statistical analysis, from cleaning the data to reporting the results.

It covers two aspects of data preparation:

- **Data manipulation**: `{datawizard}` offers a set of functions very similar to those of the *tidyverse* packages, such as `{dplyr}` and `{tidyr}`, to select, filter and reshape data, with a few key differences. 1) All data manipulation functions start with the prefix `data_*` (which makes them easy to identify). 2) Although most functions can be used exactly as their *tidyverse* equivalents, they are also string-friendly (which makes them easy to program with and use inside functions). Finally, `{datawizard}` is super lightweight (no dependencies, similar to [poorman](https://github.com/nathaneastwood/poorman)), which makes it awesome for developers to use in their packages.

- **Statistical transformations**: `{datawizard}` also has powerful functions to easily apply common data [transformations](https://easystats.github.io/datawizard/reference/index.html#statistical-transformations), including standardization, normalization, rescaling, rank-transformation, scale reversing, recoding, binning, etc.
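
As a minimal sketch of what "string-friendly" means in practice, the example below wraps two `{datawizard}` calls in a helper that receives column names as an ordinary character vector. The helper name `select_and_standardize()` and its arguments are hypothetical and only serve as illustration; they are not part of the package.

```r
library(datawizard)

# Hypothetical helper (not part of datawizard): keep the requested columns
# and z-score them. `cols` is a plain character vector, so the caller does
# not need any non-standard evaluation to pass column names around.
select_and_standardize <- function(data, cols) {
  selected <- data_select(data, select = cols)
  standardize(selected)
}

result <- select_and_standardize(iris, c("Sepal.Length", "Petal.Length"))
head(result)
```

Because the selection is just a character vector, it can be built programmatically (for example from a configuration file or another function's output) before being passed to the helper.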





# Installation

[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/datawizard)](https://cran.r-project.org/package=datawizard) [![datawizard status badge](https://easystats.r-universe.dev/badges/datawizard)](https://easystats.r-universe.dev) [![codecov](https://codecov.io/gh/easystats/datawizard/branch/main/graph/badge.svg)](https://app.codecov.io/gh/easystats/datawizard) [![R-CMD-check](https://github.com/easystats/datawizard/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/easystats/datawizard/actions)

Type | Source | Command
---|---|---
Release | CRAN | `install.packages("datawizard")`
Development | r-universe | `install.packages("datawizard", repos = "https://easystats.r-universe.dev")`
Development | GitHub | `remotes::install_github("easystats/datawizard")`

> **Tip**
>
> **Instead of `library(datawizard)`, use `library(easystats)`.**
> **This will make all features of the easystats-ecosystem available.**
>
> **To stay updated, use `easystats::install_latest()`.**

# Citation

To cite the package, run the following command:

```{r, comment=""}
citation("datawizard")
```

# Features

[![Documentation](https://img.shields.io/badge/documentation-datawizard-orange.svg?colorB=E91E63)](https://easystats.github.io/datawizard/) [![Blog](https://img.shields.io/badge/blog-easystats-orange.svg?colorB=FF9800)](https://easystats.github.io/blog/posts/) [![Features](https://img.shields.io/badge/features-datawizard-orange.svg?colorB=2196F3)](https://easystats.github.io/datawizard/reference/index.html)

Most courses and tutorials about statistical modeling assume that you are working with a clean and tidy dataset. In practice, however, a major part of doing statistical modeling is preparing your data--cleaning up values, creating new columns, reshaping the dataset, or transforming some variables. `{datawizard}` provides easy to use tools to perform these common, critical, and sometimes tedious data preparation tasks.

## Data wrangling

### Select, filter and remove variables

The package provides helpers to filter rows meeting certain conditions...

```{r}
data_match(mtcars, data.frame(vs = 0, am = 1))
```

... or logical expressions:

```{r}
data_filter(mtcars, vs == 0 & am == 1)
```

Finding columns in a data frame, or retrieving the data of selected columns, can be achieved using `extract_column_names()` or `data_select()`:

```{r}
# find column names matching a pattern
extract_column_names(iris, starts_with("Sepal"))

# return data columns matching a pattern
data_select(iris, starts_with("Sepal")) |> head()
```

It is also possible to extract one or more variables:

```{r}
# single variable
data_extract(mtcars, "gear")

# more variables
head(data_extract(iris, ends_with("Width")))
```

Due to the consistent API, removing variables is just as simple:

```{r}
head(data_remove(iris, starts_with("Sepal")))
```

### Reorder or rename

```{r}
head(data_relocate(iris, select = "Species", before = "Sepal.Length"))
```

```{r}
head(data_rename(iris, c("Sepal.Length", "Sepal.Width"), c("length", "width")))
```

### Merge

```{r}
x <- data.frame(a = 1:3, b = c("a", "b", "c"), c = 5:7, id = 1:3)
y <- data.frame(c = 6:8, d = c("f", "g", "h"), e = 100:102, id = 2:4)

x
y

data_merge(x, y, join = "full")

data_merge(x, y, join = "left")

data_merge(x, y, join = "right")

data_merge(x, y, join = "semi", by = "c")

data_merge(x, y, join = "anti", by = "c")

data_merge(x, y, join = "inner")

data_merge(x, y, join = "bind")
```

### Reshape

A common data wrangling task is to reshape data.

Data can be reshaped from wide/Cartesian to long/tidy format:

```{r}
wide_data <- data.frame(replicate(5, rnorm(10)))

head(data_to_long(wide_data))
```

or the other way around:

```{r}
long_data <- data_to_long(wide_data, rows_to = "Row_ID") # Save row number

data_to_wide(long_data,
  names_from = "name",
  values_from = "value",
  id_cols = "Row_ID"
)
```

### Empty rows and columns

```{r}
tmp <- data.frame(
  a = c(1, 2, 3, NA, 5),
  b = c(1, NA, 3, NA, 5),
  c = c(NA, NA, NA, NA, NA),
  d = c(1, NA, 3, NA, 5)
)

tmp

# indices of empty columns or rows
empty_columns(tmp)
empty_rows(tmp)

# remove empty columns or rows
remove_empty_columns(tmp)
remove_empty_rows(tmp)

# remove empty columns and rows
remove_empty(tmp)
```

### Recode or cut data

```{r}
set.seed(123)
x <- sample(1:10, size = 50, replace = TRUE)

table(x)

# cut into 3 groups, based on distribution (quantiles)
table(categorize(x, split = "quantile", n_groups = 3))
```

## Data Transformations

The package also contains multiple functions to help transform data.

### Standardize

For example, to standardize (*z*-score) data:

```{r}
# before
summary(swiss)

# after
summary(standardize(swiss))
```

### Winsorize

To winsorize data:

```{r}
# before
anscombe

# after
winsorize(anscombe)
```

### Center

To grand-mean center data:

```{r}
center(anscombe)
```

### Ranktransform

To rank-transform data:

```{r}
# before
head(trees)

# after
head(ranktransform(trees))
```

### Rescale

To rescale a numeric variable to a new range:

```{r}
change_scale(c(0, 1, 5, -5, -2))
```

### Rotate or transpose

```{r}
x <- mtcars[1:3, 1:4]

x

data_rotate(x)
```

## Data properties

`datawizard` provides a way to produce a comprehensive descriptive summary of all variables in a data frame:

```{r}
data(iris)
describe_distribution(iris)
```

Or even just a single variable:

```{r}
describe_distribution(mtcars$wt)
```

There are also some additional data properties that can be computed using this package.

```{r}
x <- (-10:10)^3 + rnorm(21, 0, 100)
smoothness(x, method = "diff")
```

## Function design and pipe-workflow

The `{datawizard}` functions follow a design principle that makes it easy for users to understand and remember how they work:

1. the first argument is the data
2. for methods that work on data frames, the next two arguments are `select` and `exclude`, which choose the variables to include or omit
3. the remaining arguments relate to the specific task of the function

Most importantly, functions that accept a data frame usually take it as their first argument and also return a (modified) data frame. Thus, `{datawizard}` integrates smoothly into a "pipe-workflow".

```{r}
iris |>
  # all rows where Species is "versicolor" or "virginica"
  data_filter(Species %in% c("versicolor", "virginica")) |>
  # select only columns with "." in names (i.e. drop Species)
  data_select(contains("\\.")) |>
  # move columns that end with "Length" to start of data frame
  data_relocate(ends_with("Length")) |>
  # remove fourth column
  data_remove(4) |>
  head()
```

# Contributing and Support

In case you want to file an issue or contribute in another way to the package, please follow [this guide](https://easystats.github.io/datawizard/CONTRIBUTING.html). For questions about the functionality, you may either contact us via email or file an issue.

# Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](https://easystats.github.io/datawizard/CODE_OF_CONDUCT.html). By participating in this project you agree to abide by its terms.
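
As a short addendum to the "Function design and pipe-workflow" section, here is a minimal sketch of the `select` / `exclude` convention, assuming the data-frame methods of `standardize()` and `center()` accept these arguments as described there. The object names `z_sepal` and `centered` are chosen only for illustration.

```r
library(datawizard)

# select: z-score only the two named columns, leave the rest untouched
z_sepal <- standardize(iris, select = c("Sepal.Length", "Sepal.Width"))

# exclude: center every numeric column except the one named here
centered <- center(iris, exclude = "Petal.Width")

# compare transformed and untouched columns
summary(z_sepal[c("Sepal.Length", "Petal.Length")])
summary(centered[c("Sepal.Length", "Petal.Width")])
```

In both calls the data frame comes first and a data frame comes back, so either step could be dropped into a pipe chain like the one shown in that section.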

Owner

  • Name: easystats
  • Login: easystats
  • Kind: organization
  • Location: worldwide

Make R stats easy!

JOSS Publication

datawizard: An R Package for Easy Data Preparation and Statistical Transformations
Published
October 09, 2022
Volume 7, Issue 78, Page 4684
Authors
Indrajeet Patil
cynkra Analytics GmbH, Germany
Dominique Makowski
Nanyang Technological University, Singapore
Mattan S. Ben-Shachar
Ben-Gurion University of the Negev, Israel
Brenton M. Wiernik
Independent Researcher
Etienne Bacher
Luxembourg Institute of Socio-Economic Research (LISER), Luxembourg
Daniel Lüdecke
University Medical Center Hamburg-Eppendorf, Germany
Editor
Øystein Sørensen
Tags
easystats

GitHub Events

Total
  • Create event: 55
  • Commit comment event: 1
  • Release event: 4
  • Issues event: 43
  • Watch event: 14
  • Delete event: 48
  • Issue comment event: 288
  • Push event: 488
  • Pull request review event: 165
  • Pull request review comment event: 156
  • Pull request event: 99
Last Year
  • Create event: 55
  • Commit comment event: 1
  • Release event: 4
  • Issues event: 43
  • Watch event: 14
  • Delete event: 48
  • Issue comment event: 289
  • Push event: 492
  • Pull request review event: 167
  • Pull request review comment event: 157
  • Pull request event: 100

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 1,248
  • Total Committers: 9
  • Avg Commits per committer: 138.667
  • Development Distribution Score (DDS): 0.504
Past Year
  • Commits: 103
  • Committers: 5
  • Avg Commits per committer: 20.6
  • Development Distribution Score (DDS): 0.417
Top Committers
Name Email Commits
Daniel m****l@d****e 619
Indrajeet Patil p****e@g****m 334
Etienne Bacher 5****r 199
Mattan S. Ben-Shachar m****b@m****o 26
Dominique Makowski d****9@g****m 24
etiennebacher y****u@e****m 15
github-actions[bot] 4****] 13
Brenton M. Wiernik b****k 12
Rémi Thériault 1****c 6

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 113
  • Total pull requests: 311
  • Average time to close issues: 3 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 17
  • Total pull request authors: 7
  • Average comments per issue: 4.18
  • Average comments per pull request: 3.09
  • Merged pull requests: 264
  • Bot issues: 0
  • Bot pull requests: 25
Past Year
  • Issues: 34
  • Pull requests: 127
  • Average time to close issues: 12 days
  • Average time to close pull requests: 3 days
  • Issue authors: 9
  • Pull request authors: 5
  • Average comments per issue: 3.35
  • Average comments per pull request: 2.2
  • Merged pull requests: 103
  • Bot issues: 0
  • Bot pull requests: 13
Top Authors
Issue Authors
  • etiennebacher (25)
  • IndrajeetPatil (24)
  • strengejacke (20)
  • mattansb (11)
  • DominiqueMakowski (11)
  • jmgirard (6)
  • rempsyc (4)
  • bwiernik (2)
  • profandyfield (2)
  • chuxinyuan (1)
  • Cal-Fang (1)
  • Cghlewis (1)
  • albaperis (1)
  • lewislehe (1)
  • BalbR (1)
Pull Request Authors
  • strengejacke (170)
  • etiennebacher (88)
  • github-actions[bot] (25)
  • IndrajeetPatil (16)
  • mattansb (6)
  • DominiqueMakowski (4)
  • rempsyc (2)
Top Labels
Issue Labels
enhancement :boom: (8) bug 🪲 (8) upkeep :broom: (6) Feature idea :fire: (4) feature idea :fire: (3) consistency 🍎🍏 (3) docs 📚 (3) Upkeep :broom: (3) Bug :bug: (3) breaking :skull_and_crossbones: (2) Enhancement :boom: (2) question (1) Docs 📚 (1) Discussion :parrot: (1) Consistency :green_apple: :apple: (1) High priority :running_man: (1) invalid (1) high priority :running_man: (1)
Pull Request Labels
Auto-update (17) auto-update (8) docs 📚 (1)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 115,922 last-month
  • Total docker downloads: 48,992
  • Total dependent packages: 24
    (may contain duplicates)
  • Total dependent repositories: 42
    (may contain duplicates)
  • Total versions: 49
  • Total maintainers: 1
cran.r-project.org: datawizard

Easy Data Wrangling and Statistical Transformations

  • Versions: 34
  • Dependent Packages: 18
  • Dependent Repositories: 41
  • Downloads: 115,922 Last month
  • Docker Downloads: 48,992
Rankings
Downloads: 1.3%
Stargazers count: 2.5%
Dependent packages count: 3.7%
Dependent repos count: 4.0%
Average: 6.4%
Forks count: 7.0%
Docker downloads count: 19.8%
Last synced: 4 months ago
conda-forge.org: r-datawizard
  • Versions: 15
  • Dependent Packages: 6
  • Dependent Repositories: 1
Rankings
Dependent packages count: 9.0%
Dependent repos count: 24.4%
Average: 26.9%
Stargazers count: 29.3%
Forks count: 44.9%
Last synced: 4 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.6 depends
  • insight >= 0.18.8 imports
  • stats * imports
  • utils * imports
  • bayestestR * suggests
  • boot * suggests
  • brms * suggests
  • data.table * suggests
  • dplyr >= 1.0 suggests
  • effectsize * suggests
  • gamm4 * suggests
  • ggplot2 * suggests
  • gt * suggests
  • haven * suggests
  • htmltools * suggests
  • httr * suggests
  • knitr * suggests
  • lme4 * suggests
  • mediation * suggests
  • parameters * suggests
  • poorman >= 0.2.6 suggests
  • psych * suggests
  • readr * suggests
  • readxl * suggests
  • rio * suggests
  • rmarkdown * suggests
  • rstanarm * suggests
  • see * suggests
  • testthat >= 3.1.0 suggests
  • tidyr * suggests
  • withr * suggests
.github/workflows/R-CMD-check-devel-easystats.yaml actions
.github/workflows/R-CMD-check-hard.yaml actions
.github/workflows/R-CMD-check-strict.yaml actions
.github/workflows/R-CMD-check.yaml actions
.github/workflows/check-all-examples.yaml actions
.github/workflows/check-link-rot.yaml actions
.github/workflows/check-random-test-order.yaml actions
.github/workflows/check-readme.yaml actions
.github/workflows/check-spelling.yaml actions
.github/workflows/check-styling.yaml actions
.github/workflows/check-test-warnings.yaml actions
.github/workflows/check-vignette-warnings.yaml actions
.github/workflows/html-5-check.yaml actions
.github/workflows/lint-changed-files.yaml actions
.github/workflows/lint.yaml actions
.github/workflows/pkgdown-no-suggests.yaml actions
.github/workflows/pkgdown.yaml actions
.github/workflows/revdepcheck.yaml actions
.github/workflows/test-coverage-examples.yaml actions
.github/workflows/test-coverage.yaml actions
.github/workflows/update-to-latest-easystats.yaml actions