smallsets
Visual documentation for data preprocessing in R and Python
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 7 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 3 committers (66.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.4%) to scientific vocabulary
Keywords
data-science
data-visualization
documentation-tool
machine-learning
preprocessing
python
r
r-package
visualization-tools
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: lydialucchesi
- License: gpl-3.0
- Language: R
- Default Branch: main
- Homepage: https://lydialucchesi.github.io/smallsets/
- Size: 16.1 MB
Statistics
- Stars: 14
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 3
Created over 5 years ago
· Last pushed about 1 year ago
Metadata Files
- Readme
- Changelog
- Contributing
- License
README.Rmd
---
output: github_document
---
```{r, echo=FALSE, out.width="17%", fig.align="right", out.extra='style="float:right; padding:15px"'}
knitr::include_graphics("man/figures/hex_sticker.png")
```
# smallsets: Visual Documentation for Data Preprocessing in R and Python

**`smallsets` website: [lydialucchesi.github.io/smallsets/](https://lydialucchesi.github.io/smallsets/)**
Do you use R or Python to preprocess datasets for analyses? `smallsets` is an R package (https://CRAN.R-project.org/package=smallsets) that transforms the preprocessing code in your R, R Markdown, Python, or Jupyter Notebook file into a Smallset Timeline. A Smallset Timeline is a static, compact visualisation composed of small data snapshots of different preprocessing steps. A full description of the Smallset Timeline can be found in the paper [**Smallset Timelines: A Visual Representation of Data Preprocessing Decisions**](https://doi.org/10.1145/3531146.3533175) in the proceedings of ACM FAccT '22.
The `smallsets` user guide is available [online](https://lydialucchesi.github.io/smallsets/articles/smallsets.html) and within the package via `vignette("smallsets")`. If you have questions or would like help building a Smallset Timeline, please [email Lydia](mailto:lydia.lucchesi@anu.edu.au).
**[Download the smallsets cheatsheet (1-page PDF)](https://lydialucchesi.github.io/smallsets_cheatsheet/smallsets_cheatsheet.pdf)**
## Install from CRAN
```{r, eval=FALSE}
install.packages("smallsets")
```
## Quick start example
Run this snippet to build your first Smallset Timeline! It is based on the synthetic dataset `s_data`, which has 100 observations and eight variables (C1-C8), and on the preprocessing script `s_data_preprocess.R`, discussed in the following section.
```{r quick-start-example, eval=FALSE}
library(smallsets)
set.seed(145)
Smallset_Timeline(data = s_data,
                  code = system.file("s_data_preprocess.R", package = "smallsets"))
```

## Structured comments
The Smallset Timeline from the quick start example is based on the R preprocessing script below, `s_data_preprocess.R`. Structured comments in the script tell `smallsets` what to do.
```{r, code=readLines(system.file("s_data_preprocess.R", package="smallsets")), eval=FALSE, class.source="view-only"}
```
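As a rough illustration of the idea, the sketch below applies three preprocessing steps to a small, made-up data frame (`toy_data`, which is hypothetical and not part of the package) and marks each step with a structured comment. The comment syntax shown (`# smallsets snap <data> caption[...]caption`, with an optional offset such as `+1`) follows the pattern described in the user guide but should be treated as an assumption here; check `vignette("smallsets")` for the authoritative form. The script runs in base R on its own, because the structured comments are ordinary comments to the R interpreter.

```r
# Hypothetical stand-in data; the real s_data ships with smallsets.
toy_data <- data.frame(
  C2 = c(TRUE, FALSE, TRUE, TRUE),
  C3 = c(1, 2, 3, 4),
  C4 = c(10, 20, 30, 40),
  C6 = c(0.5, NA, NA, 1.5)
)

# Assumed structured-comment syntax: request a snapshot of toy_data
# and attach a caption describing the step that follows.
# smallsets snap toy_data caption[Remove rows where C2 is FALSE.]caption
toy_data <- toy_data[toy_data$C2, ]

# The "+1" offset is assumed to delay the snapshot until after the
# next line of code has run.
# smallsets snap +1 toy_data caption[Replace missing values in C6
# with the column mean.]caption
toy_data$C6[is.na(toy_data$C6)] <- mean(toy_data$C6, na.rm = TRUE)

# smallsets snap +1 toy_data caption[Create a new column, C9,
# by summing C3 and C4.]caption
toy_data$C9 <- toy_data$C3 + toy_data$C4
```

To build a timeline from a script like this, it would be saved to a file and its path passed as the `code` argument of `Smallset_Timeline()`, with the unprocessed data frame passed as `data`, in the same way as the quick start example.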
## Citing `smallsets`
If you use the `smallsets` software, please cite the Smallset Timeline paper.
Lydia R. Lucchesi, Petra M. Kuhnert, Jenny L. Davis, and Lexing Xie. 2022. Smallset Timelines: A Visual Representation of Data Preprocessing Decisions. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22). Association for Computing Machinery, New York, NY, USA, 1136–1153. https://doi.org/10.1145/3531146.3533175
```
@inproceedings{SmallsetTimelines,
author = {Lucchesi, Lydia R. and Kuhnert, Petra M. and Davis, Jenny L. and Xie, Lexing},
title = {Smallset Timelines: A Visual Representation of Data Preprocessing Decisions},
year = {2022},
isbn = {9781450393522},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3531146.3533175},
doi = {10.1145/3531146.3533175},
location = {Seoul, Republic of Korea},
series = {FAccT '22}
}
```
Owner
- Name: Lydia Lucchesi
- Login: lydialucchesi
- Kind: user
- Location: Canberra, Australia
- Company: Australian National University
- Repositories: 2
- Profile: https://github.com/lydialucchesi
PhD candidate. Visualising data preprocessing decisions (smallsets R package) & uncertainty in spatial data (Vizumap R package)!
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Lucchesi | L****i@a****u | 222 |
| Petra Kuhnert | p****t@d****u | 3 |
| lexingxie | l****e@g****m | 2 |
Committer Domains (Top 20 + Academic)
data61.csiro.au: 1
anu.edu.au: 1
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- pkuhnert (2)
- lexingxie (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 207 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
cran.r-project.org: smallsets
Visual Documentation for Data Preprocessing
- Homepage: https://lydialucchesi.github.io/smallsets/
- Documentation: http://cran.r-project.org/web/packages/smallsets/smallsets.pdf
- License: GPL (≥ 3)
- Latest release: 2.0.0 (published about 2 years ago)
Rankings
Stargazers count: 17.9%
Forks count: 21.9%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 36.3%
Downloads: 76.6%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.5.0 depends
- brew * imports
- colorspace * imports
- dplyr * imports
- english * imports
- flextable * imports
- ggfittext * imports
- ggforce * imports
- ggplot2 * imports
- ggtext * imports
- gplots * imports
- knitr * imports
- magrittr * imports
- patchwork * imports
- plotrix * imports
- plyr * imports
- readr * imports
- reshape2 * imports
- reticulate * imports
- stringr * imports
- testthat * imports
- tibble * imports
- tools * imports
- gurobi * suggests