unpivotr

Unpivot complex and irregular data layouts in R

https://github.com/nacnudus/unpivotr

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

excel pivot-tables r spreadsheet

Keywords from Contributors

parsing tidyverse unit-testing fwf csv visualisation factor strings string-interpolation s3-vectors
Last synced: 6 months ago · JSON representation

Repository

Unpivot complex and irregular data layouts in R

Basic Info
Statistics
  • Stars: 187
  • Watchers: 8
  • Forks: 19
  • Open Issues: 4
  • Releases: 5
Topics
excel pivot-tables r spreadsheet
Created over 9 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)
```

# unpivotr


[![Cran Status](http://www.r-pkg.org/badges/version/unpivotr)](https://CRAN.R-project.org/package=unpivotr)
![Cran Downloads](https://cranlogs.r-pkg.org/badges/unpivotr)
[![codecov](https://codecov.io/github/nacnudus/unpivotr/coverage.svg?branch=master)](https://app.codecov.io/gh/nacnudus/unpivotr)
[![R-CMD-check](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml)


[unpivotr](https://github.com/nacnudus/unpivotr) deals with non-tabular data,
especially from spreadsheets.  Use unpivotr when your source data has any of
these 'features':

* Multi-headered hydra
* Meaningful formatting
* Headers anywhere but at the top of each column
* Non-text headers e.g. dates
* Other stuff around the table
* Several similar tables in one sheet
* Sentinel values
* Superscript symbols
* Meaningful comments
* Nested HTML tables

If that list makes your blood boil, you'll enjoy the function names.

* `behead()` deals with multi-headered hydra tables one layer of headers at a
  time, working from the edge of the table inwards.  It's a bit like using
  `header = TRUE` in `read.csv()`, but because it's a function, you can apply it
  to as many layers of headers as you need.  You end up with all the headers in
  columns.
* `spatter()` is like `tidyr::spread()` but preserves mixed data types.  You get
  into a mixed-data-type situation by delaying type coercion until *after* the
  table is tidy (rather than before, like `read.csv()` et al).  And yes, it
  usually follows `behead()`.

More positive, corrective functions:

* `justify()` aligns column headers before `behead()`ing, and has deliberate
  moral overtones.
* `enhead()` attaches a header to the body of the data, *a la* Frankenstein.
  The effect is the same as `behead()`, but is more powerful because you can
  choose exactly which header cells you want, paying attention to formatting
  (which `behead()` doesn't understand).
* `isolate_sentinels()` separates meaningful symbols like `"N/A"` or
  `"confidential"` from the rest of the data, giving them some time alone think
  about what they've done.
* `partition()` takes a sheet with several tables on it, and slashes into pieces
  that each contain one table.  You can then unpivot each table in turn with
  `purrr::map()` or similar.

## Make cells tidy

Unpivotr uses data where each cells is represented by one row in a dataframe.
Like this.

![Gif of tidyxl converting cells into a tidy representation of one row per cell](./vignettes/tidy_xlsx.gif)

What can you do with tidy cells?  The best places to start are:

* [Spreadsheet Munging
  Strategies](https://nacnudus.github.io/spreadsheet-munging-strategies/), a
  free, online cookbook using [tidyxl](https://github.com/nacnudus/tidyxl/) and
  [unpivotr](https://github.com/nacnudus/unpivotr)
* [Screencasts](https://www.youtube.com/watch?v=1sinC7wsS5U) on YouTube.
* [Worked examples](https://github.com/nacnudus/ukfarm) on GitHub.

Otherwise the basic idea is:

1. Read the data with a specialist tool.
   * For spreadsheets, use [tidyxl](https://nacnudus.github.io/tidyxl/).
   * For plain text files, you might soon be able to use
     [readr](https://readr.tidyverse.org), but for now you'll have to install a
     pull-request on that package with
     `devtools::install_github("tidyverse/readr#760")`.
   * For tables in html pages, use `unpivotr::tidy_html()`
   * For data frames, use `unpivotr::as_cells()` -- this should be a last
     resort, because by the time the data is in a conventional data frame, it
     is often too late -- formatting has been lost, and most data types have
     been coerced to strings.
1. Either `behead()` straight away, else `dplyr::filter()` separately for the
   header cells and the data cells, and then recombine with `enhead()`.
1. `spatter()` so that each column has one data type.

```{r}
library(unpivotr)
library(tidyverse)
x <- purpose$`up-left left-up`
x # A pivot table in a conventional data frame.  Four levels of headers, in two
  # rows and two columns.

y <- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell
y

rectify(y) # useful for reviewing the melted form as though in a spreadsheet

y %>%
  behead("up-left", "sex") %>%               # Strip headers
  behead("up", "life-satisfication") %>%  # one
  behead("left-up", "qualification") %>%     # by
  behead("left", "age-band") %>%            # one.
  select(-row, -col, -data_type, count = chr) %>% # cleanup
  mutate(count = as.integer(count))
```

Note the compass directions in the code above, which hint to `behead()` where to
find the header cell for each data cell.

* `"up-left"` means the header (`Female`, `Male`) is positioned up and to the
  left of the columns of data cells it describes.
* `"up"` means the header (`0 - 6`, `7 - 10`) is positioned directly above the
  columns of data cells it describes.
* `"left-up"` means the header (`Bachelor's degree`, `Certificate`, etc.) is
  positioned to the left and upwards of the rows of data cells it describes.
* `"left"` means the header (`15 - 24`, `25 - 44`, etc.) is positioned directly to
  the left of the rows of data cells it describes.

## Installation

```{r, echo = TRUE, eval = FALSE}
# install.packages("devtools") # If you don't already have devtools
devtools::install_github("nacnudus/unpivotr", build_vignettes = TRUE)
```

The version 0.4.0 release had somee breaking changes.  See `NEWS.md` for
details.  The previous version can be installed as follow:

```r
devtools::install_version("unpivotr", version = "0.3.1", repos = "http://cran.us.r-project.org")
```

## Similar projects

[unpivotr](https://github.com/nacnudus/unpivotr) is inspired by
[Databaker](https://github.com/sensiblecodeio/databaker), a collaboration
between the [United Kingdom Office of National Statistics](https://www.ons.gov.uk/)
and [The Sensible Code Company](https://sensiblecode.io/).
[unpivotr](https://github.com/nacnudus/unpivotr).

[jailbreaker](https://github.com/rsheets/jailbreakr) attempts to extract
non-tabular data from spreadsheets into tabular structures automatically via
some clever algorithms.  [unpivotr](https://github.com/nacnudus/unpivotr)
differs by being less magic, and equipping you to express what you want to do.

Owner

  • Name: Duncan Garmonsway
  • Login: nacnudus
  • Kind: user
  • Location: London
  • Company: Government Digital Service

GitHub Events

Total
  • Issues event: 1
  • Watch event: 3
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request event: 2
  • Create event: 3
Last Year
  • Issues event: 1
  • Watch event: 3
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 9
  • Pull request event: 2
  • Create event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 630
  • Total Committers: 9
  • Avg Commits per committer: 70.0
  • Development Distribution Score (DDS): 0.022
Past Year
  • Commits: 6
  • Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Duncan Garmonsway n****s@g****m 616
Bill Denney w****y@h****m 4
Lionel Henry l****y@g****m 2
Ian Moran i****1@g****m 2
DavisVaughan d****s@r****m 2
wdkrnls k****s@g****m 1
Michael Kerber 4****R 1
Kirill Müller k****r 1
Greg g****k 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 33
  • Total pull requests: 29
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 21
  • Total pull request authors: 9
  • Average comments per issue: 2.39
  • Average comments per pull request: 0.76
  • Merged pull requests: 24
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 9 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nacnudus (10)
  • billdenney (3)
  • bedantaguru (2)
  • jnmaloof (1)
  • seanfryan (1)
  • martinzuba (1)
  • danstrobridge-Weston (1)
  • rtaph (1)
  • lepennec (1)
  • callumwebb (1)
  • JPennn (1)
  • erm-eanway (1)
  • romainfrancois (1)
  • f00baroo (1)
  • krlmlr (1)
Pull Request Authors
  • nacnudus (18)
  • billdenney (3)
  • krlmlr (3)
  • romainfrancois (1)
  • wdkrnls (1)
  • mtkerbeR (1)
  • DavisVaughan (1)
  • gregrs-uk (1)
  • lionel- (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • cran 471 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 19
  • Total maintainers: 1
proxy.golang.org: github.com/nacnudus/unpivotr
  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
cran.r-project.org: unpivotr

Unpivot Complex and Irregular Data Layouts

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 471 Last month
Rankings
Stargazers count: 2.4%
Forks count: 3.9%
Average: 14.8%
Dependent repos count: 19.3%
Downloads: 19.7%
Dependent packages count: 28.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.2.0 depends
  • cellranger * imports
  • dplyr * imports
  • forcats * imports
  • magrittr * imports
  • methods * imports
  • pillar * imports
  • purrr * imports
  • rlang * imports
  • tibble >= 2.1.1 imports
  • tidyr * imports
  • tidyselect * imports
  • xml2 * imports
  • DT * suggests
  • covr * suggests
  • here * suggests
  • htmltools * suggests
  • knitr * suggests
  • readr * suggests
  • readxl * suggests
  • rmarkdown * suggests
  • rvest * suggests
  • selectr * suggests
  • stringr * suggests
  • testthat * suggests
  • tidyxl * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite