cassowaryr

Compute scagnostics on your scatterplots

https://github.com/numbats/cassowaryr

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.4%) to scientific vocabulary

Keywords

data-science data-visualization eda high-dimensional-data multivariate
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 4
  • Watchers: 5
  • Forks: 4
  • Open Issues: 12
  • Releases: 0
Created over 4 years ago · Last pushed 12 months ago
Metadata Files
Readme Code of conduct

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  eval = TRUE
)
```

# CassowaryR 


[![Codecov test coverage](https://codecov.io/gh/numbats/cassowaryr/branch/master/graph/badge.svg)](https://app.codecov.io/gh/numbats/cassowaryr?branch=master)
[![R-CMD-check](https://github.com/numbats/cassowaryr/workflows/R-CMD-check/badge.svg)](https://github.com/numbats/cassowaryr/actions)


The `cassowaryr` package provides functions to compute scagnostics on pairs of numeric variables in a data set.

The term __scagnostics__ refers to scatter plot diagnostics, originally described by John and Paul Tukey. Scagnostics are a collection of techniques for automatically extracting interesting visual features from pairs of variables. This package is a pure R implementation of the graph-theoretic scagnostics developed by Wilkinson, Anand, and Grossman (2005).

## Installation
The package can be installed from CRAN using 

> ```install.packages("cassowaryr")```

and from GitHub using 

> ```remotes::install_github("numbats/cassowaryr")```

to install the development version.

## Examples
```{r, sc}
library(cassowaryr)
library(dplyr)
# A single scagnostic on two vectors
# `anscombe` is the wide-format data set shipped with base R (columns x1..x4, y1..y4)
data("anscombe")
sc_outlying(anscombe$x1, anscombe$y1)
```

```{r calc_scags}
data("datasaurus_dozen")
datasaurus_dozen %>%
  dplyr::group_by(dataset) %>%
  dplyr::summarise(calc_scags(x, y, scags = c("clumpy2", "monotonic")))
```

## About the name

CAlculate Scagnostics on Scatterplots Over Wads of Associated Real numberYs in R

## About the calculations

### Graph-based measures

A 2-d scatter plot can be represented by a combination of three graphs
which are computed directly from the Delaunay-Voronoi tessellation.

1. A __minimum spanning tree__ weighted by the edge lengths of the Delaunay triangulation
2. The __convex hull__ of the points, i.e. the outer segments of the triangulation
3. The __alpha hull__ (also called concave hull), i.e. the shape formed by connecting the outer edges of triangles that are enclosed within a ball of radius _alpha_.

All graph-based scagnostic measures are computed with respect to these three graphs.
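
To make two of these graphs concrete, here is a minimal base R sketch. It is not the package's internal code. It builds the convex hull with `chull()` and a minimum spanning tree with Prim's algorithm on the full distance matrix (the Euclidean minimum spanning tree is a subgraph of the Delaunay triangulation, so this yields a valid spanning tree without building the triangulation first). The helper name `prim_mst` and the simulated points are assumptions made for illustration, and the alpha hull is left out because it needs an additional package.

```r
# Sketch only: base R, not the cassowaryr internals.
set.seed(1)
x <- runif(30)
y <- runif(30)

# Convex hull: indices of the points lying on the hull.
hull_idx <- grDevices::chull(x, y)

# Minimum spanning tree via Prim's algorithm on the full distance matrix.
# `prim_mst` is a hypothetical helper written for this example.
prim_mst <- function(x, y) {
  d <- as.matrix(stats::dist(cbind(x, y)))
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # start the tree at point 1
  best_dist <- unname(d[1, ])            # cheapest known link from each point to the tree
  best_from <- rep(1L, n)                # tree vertex that provides that link
  from <- integer(n - 1)
  to <- integer(n - 1)
  len <- numeric(n - 1)
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    j <- cand[which.min(best_dist[cand])]  # closest point not yet in the tree
    from[k] <- best_from[j]
    to[k] <- j
    len[k] <- best_dist[j]
    in_tree[j] <- TRUE
    # Points that are now closer to the tree through j get updated.
    closer <- !in_tree & unname(d[j, ]) < best_dist
    best_dist[closer] <- d[j, closer]
    best_from[closer] <- j
  }
  data.frame(from = from, to = to, length = len)
}

mst_edges <- prim_mst(x, y)
nrow(mst_edges)  # a spanning tree on n points has n - 1 edges
```

The edge lengths in `mst_edges$length` are the kind of quantity the MST-based measures summarise, while `hull_idx` gives the polygon that hull-based measures compare against.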

Prior to graph construction, decisions must be made about filtering outliers (done with respect to the distribution of edge lengths in the triangulation) and about thinning the graphs by binning (for computational speed). For the moment we can forge ahead without these, but it is worth keeping in mind that the package needs to be flexible enough to include them.  
There are also opportunities to experiment with the preprocessing here.

Two MST measures, "clumpy" and "outlying", are known to cause problems.

As all the graph-based measures rely on the triangulation, they could be computed lazily. More concretely, if you are only interested in computing "skinny", you don't need to compute the spanning tree. If you computed "skinny" but then wanted to compute "convex", you shouldn't need to reconstruct the alpha hull, and so on. To begin, let's not worry about that and focus on implementations of each measure.
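
One way the lazy idea could look is sketched below, under stated assumptions: `lazy_graphs` is a hypothetical helper, not a function in the package, and it uses the convex hull and the pairwise distances as simple stand-ins for the real graph structures. Each structure is built the first time it is requested and reused afterwards.

```r
# Hypothetical helper (not part of cassowaryr): build each structure at most once.
lazy_graphs <- function(x, y) {
  cache <- new.env(parent = emptyenv())
  get_or_build <- function(name, build) {
    if (!exists(name, envir = cache, inherits = FALSE)) {
      assign(name, build(), envir = cache)  # first request: compute and store
    }
    get(name, envir = cache, inherits = FALSE)
  }
  list(
    convex_hull = function() {
      get_or_build("convex_hull", function() grDevices::chull(x, y))
    },
    distances = function() {
      get_or_build("distances", function() stats::dist(cbind(x, y)))
    },
    built = function() ls(cache)  # names of the structures computed so far
  )
}

g <- lazy_graphs(datasets::anscombe$x1, datasets::anscombe$y1)
before <- g$built()       # nothing has been computed yet
hull <- g$convex_hull()   # only the hull is built here
after <- g$built()        # the distances are still untouched
```

A measure that needs only the hull would call `g$convex_hull()` and never pay for the distance matrix. A second measure using the hull would reuse the cached copy.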

### Association-based measures

These are computed directly from the 2-d point cloud and do not require any of the graphs to be constructed.
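
As an illustration of a measure that needs no graph, Wilkinson et al. define "monotonic" as the squared Spearman rank correlation. The sketch below follows that definition using only base R. The name `monotonic_sketch` is an assumption for this example, and the package's own `"monotonic"` scagnostic may be scaled differently.

```r
# Sketch of an association-based measure: squared Spearman rank correlation.
# `monotonic_sketch` is a hypothetical name, not a cassowaryr function.
monotonic_sketch <- function(x, y) {
  stats::cor(x, y, method = "spearman")^2
}

x <- datasets::anscombe$x1
y <- datasets::anscombe$y1
m <- monotonic_sketch(x, y)
m  # a value between 0 (no monotone trend) and 1 (perfectly monotone)
```

Because the value depends only on the ranks of `x` and `y`, any strictly increasing transformation of either variable leaves it unchanged.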


Owner

  • Name: NUMBATS: Non-Uniform Monash Business Analytics Team repo for joint projects
  • Login: numbats
  • Kind: organization
  • Email: buseco-numbats@monash.edu
  • Location: Melbourne, Australia

We are part of Monash University, Department of Econometrics and Business Statistics

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
  • Pull request event: 3
  • Fork event: 3
Last Year
  • Watch event: 1
  • Push event: 2
  • Pull request event: 3
  • Fork event: 3

Issues and Pull Requests


All Time
  • Total issues: 21
  • Total pull requests: 2
  • Average time to close issues: 4 months
  • Average time to close pull requests: 13 days
  • Total issue authors: 5
  • Total pull request authors: 2
  • Average comments per issue: 1.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 7
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 7 hours
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • harriet-mason (10)
  • dicook (5)
  • uschiLaa (3)
  • sa-lee (2)
  • TengMCing (1)
Pull Request Authors
  • huizezhang-sherry (2)
  • sa-lee (1)
  • Tinarj (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 1,019 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: cassowaryr

Compute Scagnostics on Pairs of Numeric Variables in a Data Set

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,019 Last month
Rankings
Forks count: 17.8%
Stargazers count: 26.2%
Dependent packages count: 29.8%
Average: 34.0%
Dependent repos count: 35.5%
Downloads: 60.7%
Maintainers (1)

Dependencies

DESCRIPTION cran
  • R >= 4.0.0 depends
  • alphahull >= 2.5 imports
  • dplyr * imports
  • energy * imports
  • ggplot2 * imports
  • igraph * imports
  • interp * imports
  • magrittr * imports
  • progress * imports
  • splancs * imports
  • stats * imports
  • tibble * imports
  • tidyselect * imports
  • GGally * suggests
  • covr * suggests
  • knitr * suggests
  • mgcv * suggests
  • rmarkdown * suggests
  • testthat >= 3.0.0 suggests
  • tidyr * suggests
.github/workflows/check-standard.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pr-commands.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/pr-fetch v1 composite
  • r-lib/actions/pr-push v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite