smallsets

Visual documentation for data preprocessing in R and Python

https://github.com/lydialucchesi/smallsets

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    2 of 3 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Keywords

data-science data-visualization documentation-tool machine-learning preprocessing python r r-package visualization-tools
Last synced: 6 months ago

Repository

Statistics
  • Stars: 14
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 3
Created over 5 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog Contributing License

README.Rmd

---
output: github_document
---

```{r, echo=FALSE, out.width="17%", fig.align="right", out.extra='style="float:right; padding:15px"'}
knitr::include_graphics("man/figures/hex_sticker.png")
```

# smallsets: Visual Documentation for Data Preprocessing in R and Python

[![CRAN status](https://www.r-pkg.org/badges/version/smallsets)](https://CRAN.R-project.org/package=smallsets)
![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/smallsets)

**`smallsets` website: [lydialucchesi.github.io/smallsets/](https://lydialucchesi.github.io/smallsets/)**

Do you use R or Python to preprocess datasets for analyses? `smallsets` is an [R package](https://CRAN.R-project.org/package=smallsets) that transforms the preprocessing code in your R, R Markdown, Python, or Jupyter Notebook file into a Smallset Timeline. A Smallset Timeline is a static, compact visualisation composed of small data snapshots taken at different preprocessing steps. A full description of the Smallset Timeline can be found in the paper [**Smallset Timelines: A Visual Representation of Data Preprocessing Decisions**](https://doi.org/10.1145/3531146.3533175), published in the proceedings of ACM FAccT '22.

The [`smallsets` user guide](https://lydialucchesi.github.io/smallsets/articles/smallsets.html) is available online and, once the package is installed, via `vignette("smallsets")`. If you have questions or would like help building a Smallset Timeline, please [email Lydia](mailto:lydia.lucchesi@anu.edu.au).

**[Download the smallsets cheatsheet (1-page PDF)](https://lydialucchesi.github.io/smallsets_cheatsheet/smallsets_cheatsheet.pdf)**

## Install from CRAN

```{r, eval=FALSE}
install.packages("smallsets")
```

## Quick start example

Run this snippet of code to build your first Smallset Timeline! It is based on the synthetic dataset `s_data`, which has 100 observations and eight variables (`C1` to `C8`), and on the preprocessing script `s_data_preprocess.R`, which is discussed in the following section.

```{r quick-start-example, eval=FALSE}
library(smallsets)

set.seed(145)

Smallset_Timeline(data = s_data,
                  code = system.file("s_data_preprocess.R", package = "smallsets"))
```

![](man/figures/quick_start_figure.png)

## Structured comments

The Smallset Timeline above is based on the R preprocessing script below, `s_data_preprocess.R`. Structured comments were added to the script to tell `smallsets` what to do.

```{r, code=readLines(system.file("s_data_preprocess.R", package="smallsets")), eval=FALSE, class.source="view-only"}
```
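
The chunk above reads the script from the installed package at knit time, so its contents do not appear in this source file. As a rough illustration of the comment style, the sketch below applies structured comments to a hypothetical toy data frame `df` (not the package's `s_data`). The `# smallsets snap ... caption[...]caption` form and the `+1` offset follow my reading of the user guide and are assumptions here; please check the exact grammar in `vignette("smallsets")`.

```r
# Hypothetical toy data for illustration only; column names are made up.
df <- data.frame(
  keep = c(TRUE, FALSE, TRUE, TRUE),
  x    = c(1, 2, NA, 4),
  y    = c(10, 20, 30, 40)
)

# Assumed comment syntax: "snap" requests a snapshot of the named data object,
# and the caption text sits between caption[ and ]caption.

# smallsets snap df caption[Remove rows where keep is FALSE.]caption
df <- df[df$keep == TRUE, ]

# smallsets snap +1 df caption[Replace missing values in x with the column mean.]caption
df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)

# smallsets snap +1 df caption[Create a new column, total, by summing x and y.]caption
df$total <- df$x + df$y
```

Because the structured comments are ordinary R comments, the script still runs unchanged as plain R. To build a timeline from a script like this, you would save only the preprocessing lines to a file and pass its path as `code`, together with the unprocessed data frame as `data`, to `Smallset_Timeline()`, mirroring the quick start example.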

## Citing `smallsets`

If you use the `smallsets` software, please cite the Smallset Timeline paper.

Lydia R. Lucchesi, Petra M. Kuhnert, Jenny L. Davis, and Lexing Xie. 2022. Smallset Timelines: A Visual Representation of Data Preprocessing Decisions. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22). Association for Computing Machinery, New York, NY, USA, 1136–1153. https://doi.org/10.1145/3531146.3533175

```
@inproceedings{SmallsetTimelines, 
  author = {Lucchesi, Lydia R. and Kuhnert, Petra M. and Davis, Jenny L. and Xie, Lexing}, 
  title = {Smallset Timelines: A Visual Representation of Data Preprocessing Decisions}, 
  year = {2022}, 
  isbn = {9781450393522}, 
  publisher = {Association for Computing Machinery}, 
  address = {New York, NY, USA}, 
  url = {https://doi.org/10.1145/3531146.3533175}, 
  doi = {10.1145/3531146.3533175}, 
  location = {Seoul, Republic of Korea}, 
  series = {FAccT '22}
}
```

Owner

  • Name: Lydia Lucchesi
  • Login: lydialucchesi
  • Kind: user
  • Location: Canberra, Australia
  • Company: Australian National University

PhD candidate. Visualising data preprocessing decisions (smallsets R package) & uncertainty in spatial data (Vizumap R package)!

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
Last Year
  • Watch event: 1
  • Push event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 227
  • Total Committers: 3
  • Avg Commits per committer: 75.667
  • Development Distribution Score (DDS): 0.022
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Lucchesi | L****i@a****u | 222 |
| Petra Kuhnert | p****t@d****u | 3 |
| lexingxie | l****e@g****m | 2 |

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • pkuhnert (2)
  • lexingxie (1)

Packages

  • Total packages: 1
  • Total downloads: 207 last month (CRAN)
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: smallsets

Visual Documentation for Data Preprocessing

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 207 Last month
Rankings
  • Stargazers count: 17.9%
  • Forks count: 21.9%
  • Dependent packages count: 29.8%
  • Dependent repos count: 35.5%
  • Downloads: 76.6%
  • Average: 36.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • brew * imports
  • colorspace * imports
  • dplyr * imports
  • english * imports
  • flextable * imports
  • ggfittext * imports
  • ggforce * imports
  • ggplot2 * imports
  • ggtext * imports
  • gplots * imports
  • knitr * imports
  • magrittr * imports
  • patchwork * imports
  • plotrix * imports
  • plyr * imports
  • readr * imports
  • reshape2 * imports
  • reticulate * imports
  • stringr * imports
  • testthat * imports
  • tibble * imports
  • tools * imports
  • gurobi * suggests