unpivotr

Unpivot complex and irregular data layouts in R

https://github.com/nacnudus/unpivotr

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Committers with academic emails
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (14.5%) to scientific vocabulary

Keywords

excel pivot-tables r spreadsheet

Keywords from Contributors

parsing tidyverse unit-testing fwf csv visualisation factor strings string-interpolation s3-vectors

Last synced: 6 months ago · JSON representation

Repository

Unpivot complex and irregular data layouts in R

Basic Info

Host: GitHub
Owner: nacnudus
License: other
Language: R
Default Branch: main
Homepage: https://nacnudus.github.io/unpivotr/
Size: 6.93 MB

Statistics

Stars: 187
Watchers: 8
Forks: 19
Open Issues: 4
Releases: 5

Topics

excel pivot-tables r spreadsheet

Created over 9 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)
```

# unpivotr


[![Cran Status](http://www.r-pkg.org/badges/version/unpivotr)](https://CRAN.R-project.org/package=unpivotr)
![Cran Downloads](https://cranlogs.r-pkg.org/badges/unpivotr)
[![codecov](https://codecov.io/github/nacnudus/unpivotr/coverage.svg?branch=master)](https://app.codecov.io/gh/nacnudus/unpivotr)
[![R-CMD-check](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/nacnudus/unpivotr/actions/workflows/R-CMD-check.yaml)


[unpivotr](https://github.com/nacnudus/unpivotr) deals with non-tabular data,
especially from spreadsheets.  Use unpivotr when your source data has any of
these 'features':

* Multi-headered hydra
* Meaningful formatting
* Headers anywhere but at the top of each column
* Non-text headers e.g. dates
* Other stuff around the table
* Several similar tables in one sheet
* Sentinel values
* Superscript symbols
* Meaningful comments
* Nested HTML tables

If that list makes your blood boil, you'll enjoy the function names.

* `behead()` deals with multi-headered hydra tables one layer of headers at a
  time, working from the edge of the table inwards.  It's a bit like using
  `header = TRUE` in `read.csv()`, but because it's a function, you can apply it
  to as many layers of headers as you need.  You end up with all the headers in
  columns.
* `spatter()` is like `tidyr::spread()` but preserves mixed data types.  You get
  into a mixed-data-type situation by delaying type coercion until *after* the
  table is tidy (rather than before, like `read.csv()` et al).  And yes, it
  usually follows `behead()`.

More positive, corrective functions:

* `justify()` aligns column headers before `behead()`ing, and has deliberate
  moral overtones.
* `enhead()` attaches a header to the body of the data, *a la* Frankenstein.
  The effect is the same as `behead()`, but is more powerful because you can
  choose exactly which header cells you want, paying attention to formatting
  (which `behead()` doesn't understand).
* `isolate_sentinels()` separates meaningful symbols like `"N/A"` or
  `"confidential"` from the rest of the data, giving them some time alone think
  about what they've done.
* `partition()` takes a sheet with several tables on it, and slashes into pieces
  that each contain one table.  You can then unpivot each table in turn with
  `purrr::map()` or similar.

## Make cells tidy

Unpivotr uses data where each cells is represented by one row in a dataframe.
Like this.

![Gif of tidyxl converting cells into a tidy representation of one row per cell](./vignettes/tidy_xlsx.gif)

What can you do with tidy cells?  The best places to start are:

* [Spreadsheet Munging
  Strategies](https://nacnudus.github.io/spreadsheet-munging-strategies/), a
  free, online cookbook using [tidyxl](https://github.com/nacnudus/tidyxl/) and
  [unpivotr](https://github.com/nacnudus/unpivotr)
* [Screencasts](https://www.youtube.com/watch?v=1sinC7wsS5U) on YouTube.
* [Worked examples](https://github.com/nacnudus/ukfarm) on GitHub.

Otherwise the basic idea is:

1. Read the data with a specialist tool.
   * For spreadsheets, use [tidyxl](https://nacnudus.github.io/tidyxl/).
   * For plain text files, you might soon be able to use
     [readr](https://readr.tidyverse.org), but for now you'll have to install a
     pull-request on that package with
     `devtools::install_github("tidyverse/readr#760")`.
   * For tables in html pages, use `unpivotr::tidy_html()`
   * For data frames, use `unpivotr::as_cells()` -- this should be a last
     resort, because by the time the data is in a conventional data frame, it
     is often too late -- formatting has been lost, and most data types have
     been coerced to strings.
1. Either `behead()` straight away, else `dplyr::filter()` separately for the
   header cells and the data cells, and then recombine with `enhead()`.
1. `spatter()` so that each column has one data type.

```{r}
library(unpivotr)
library(tidyverse)
x <- purpose$`up-left left-up`
x # A pivot table in a conventional data frame.  Four levels of headers, in two
  # rows and two columns.

y <- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell
y

rectify(y) # useful for reviewing the melted form as though in a spreadsheet

y %>%
  behead("up-left", "sex") %>%               # Strip headers
  behead("up", "life-satisfication") %>%  # one
  behead("left-up", "qualification") %>%     # by
  behead("left", "age-band") %>%            # one.
  select(-row, -col, -data_type, count = chr) %>% # cleanup
  mutate(count = as.integer(count))
```

Note the compass directions in the code above, which hint to `behead()` where to
find the header cell for each data cell.

* `"up-left"` means the header (`Female`, `Male`) is positioned up and to the
  left of the columns of data cells it describes.
* `"up"` means the header (`0 - 6`, `7 - 10`) is positioned directly above the
  columns of data cells it describes.
* `"left-up"` means the header (`Bachelor's degree`, `Certificate`, etc.) is
  positioned to the left and upwards of the rows of data cells it describes.
* `"left"` means the header (`15 - 24`, `25 - 44`, etc.) is positioned directly to
  the left of the rows of data cells it describes.

## Installation

```{r, echo = TRUE, eval = FALSE}
# install.packages("devtools") # If you don't already have devtools
devtools::install_github("nacnudus/unpivotr", build_vignettes = TRUE)
```

The version 0.4.0 release had somee breaking changes.  See `NEWS.md` for
details.  The previous version can be installed as follow:

```r
devtools::install_version("unpivotr", version = "0.3.1", repos = "http://cran.us.r-project.org")
```

## Similar projects

[unpivotr](https://github.com/nacnudus/unpivotr) is inspired by
[Databaker](https://github.com/sensiblecodeio/databaker), a collaboration
between the [United Kingdom Office of National Statistics](https://www.ons.gov.uk/)
and [The Sensible Code Company](https://sensiblecode.io/).
[unpivotr](https://github.com/nacnudus/unpivotr).

[jailbreaker](https://github.com/rsheets/jailbreakr) attempts to extract
non-tabular data from spreadsheets into tabular structures automatically via
some clever algorithms.  [unpivotr](https://github.com/nacnudus/unpivotr)
differs by being less magic, and equipping you to express what you want to do.

Owner

Name: Duncan Garmonsway
Login: nacnudus
Kind: user
Location: London
Company: Government Digital Service

Website: https://www.linkedin.com/in/duncangarmonsway/
Repositories: 119
Profile: https://github.com/nacnudus

GitHub Events

Total

Issues event: 1
Watch event: 3
Delete event: 1
Issue comment event: 1
Push event: 9
Pull request event: 2
Create event: 3

Last Year

Issues event: 1
Watch event: 3
Delete event: 1
Issue comment event: 1
Push event: 9
Pull request event: 2
Create event: 3

Committers

Last synced: 9 months ago

All Time

Total Commits: 630
Total Committers: 9
Avg Commits per committer: 70.0
Development Distribution Score (DDS): 0.022

Past Year

Commits: 6
Committers: 1
Avg Commits per committer: 6.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Duncan Garmonsway	n**s@g**m	616
Bill Denney	w**y@h**m	4
Lionel Henry	l**y@g**m	2
Ian Moran	i**1@g**m	2
DavisVaughan	d**s@r**m	2
wdkrnls	k**s@g**m	1
Michael Kerber	4****R	1
Kirill Müller	k****r	1
Greg	g****k	1

Committer Domains (Top 20 + Academic)

rstudio.com: 1 humanpredictions.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 33
Total pull requests: 29
Average time to close issues: about 2 months
Average time to close pull requests: 2 months
Total issue authors: 21
Total pull request authors: 9
Average comments per issue: 2.39
Average comments per pull request: 0.76
Merged pull requests: 24
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 9 days
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

nacnudus (10)
billdenney (3)
bedantaguru (2)
jnmaloof (1)
seanfryan (1)
martinzuba (1)
danstrobridge-Weston (1)
rtaph (1)
lepennec (1)
callumwebb (1)
JPennn (1)
erm-eanway (1)
romainfrancois (1)
f00baroo (1)
krlmlr (1)

Pull Request Authors

nacnudus (18)
billdenney (3)
krlmlr (3)
romainfrancois (1)
wdkrnls (1)
mtkerbeR (1)
DavisVaughan (1)
gregrs-uk (1)
lionel- (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 2
Total downloads:
- cran 471 last-month

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 2
(may contain duplicates)
Total versions: 19
Total maintainers: 1

proxy.golang.org: github.com/nacnudus/unpivotr

Documentation: https://pkg.go.dev/github.com/nacnudus/unpivotr#section-documentation
License: other
Latest release: v0.6.0
published almost 6 years ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.5%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 6 months ago

cran.r-project.org: unpivotr

Unpivot Complex and Irregular Data Layouts

Homepage: https://github.com/nacnudus/unpivotr
Documentation: http://cran.r-project.org/web/packages/unpivotr/unpivotr.pdf
License: MIT + file LICENSE
Latest release: 0.6.4
published about 1 year ago

Versions: 13
Dependent Packages: 0
Dependent Repositories: 2
Downloads: 471 Last month

Rankings

Stargazers count: 2.4%

Forks count: 3.9%

Average: 14.8%

Dependent repos count: 19.3%

Downloads: 19.7%

Dependent packages count: 28.8%

Maintainers (1)

nacnudus@gmail.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.2.0 depends
cellranger * imports
dplyr * imports
forcats * imports
magrittr * imports
methods * imports
pillar * imports
purrr * imports
rlang * imports
tibble >= 2.1.1 imports
tidyr * imports
tidyselect * imports
xml2 * imports
DT * suggests
covr * suggests
here * suggests
htmltools * suggests
knitr * suggests
readr * suggests
readxl * suggests
rmarkdown * suggests
rvest * suggests
selectr * suggests
stringr * suggests
testthat * suggests
tidyxl * suggests

.github/workflows/R-CMD-check.yaml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/upload-artifact main composite
r-lib/actions/setup-pandoc v1 composite
r-lib/actions/setup-r v1 composite