c2c

R package: compare two classifications with varying membership or class structure

https://github.com/mitchest/c2c

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

R package: compare two classifications with varying membership or class structure

Basic Info
  • Host: GitHub
  • Owner: mitchest
  • Language: HTML
  • Default Branch: master
  • Size: 52.7 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 9 years ago · Last pushed almost 9 years ago
Metadata Files
Readme

README.md

c2c

What is c2c?

An R package for comparing two classifications or clustering solutions that have different structures, i.e. the two classifications have a different number of classes, or one classification has soft membership and the other has hard membership. You can create a confusion matrix (error matrix) and then calculate various metrics to assess how the clusters compare to each other. The calculations are simple, but the package provides a handy tool for users unfamiliar with matrix multiplication. The helper functions also let you do things like turn a soft classification into a hard one, or turn a set of class labels into a binary classification matrix.
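To make the "class labels into a binary classification matrix" idea concrete, here is a minimal base-R sketch. c2c's own helper functions do this for you; `labels_to_binary` is a hypothetical name used only for illustration and is not part of the c2c API.

```r
# Hypothetical helper (not the c2c API): turn a vector of class labels
# into an n x k binary (0/1) matrix with one column per class.
labels_to_binary <- function(class_labels) {
  f <- factor(class_labels)
  # outer() compares every observation's label with every class level
  mat <- outer(as.character(f), levels(f), FUN = "==") * 1
  dimnames(mat) <- list(NULL, levels(f))
  mat
}

labels_to_binary(c("a", "b", "a", "c"))
```

Each row contains exactly one 1, which is what makes this a "hard" membership matrix that can be multiplied against a soft one.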

How to use c2c?

The basic premise is that you already have two (or perhaps more) classifications that you would like to compare. These could come from a clustering algorithm, be extracted from a remote sensing map, be a set of manually assigned classes, etc. A number of tools and packages already exist to calculate cluster diagnostics or accuracy metrics, but they usually focus on comparing clustering solutions that are hard (i.e. each observation has only one class) and have the same number of classes (e.g. a clustering solution vs. the 'truth'). c2c is designed to let you compare classifications that do not fit this scenario. The motivating problem was the need to compare a probabilistic clustering of vegetation data to an existing hard classification of that data (which had a hierarchy with varying numbers of classes), without losing the probabilistic component that the clustering algorithm produces.

This example uses silly fake data, but it's quick and will run without any additional data or package loads. Check out the c2c vignette for something a little more sensible.

First install and load c2c

```r
install.packages("c2c")
library(c2c)
```

Make up a silly soft classification matrix

```r
my_soft_mat <- matrix(runif(50, 0, 1), nrow = 10, ncol = 5)
```

and a made-up set of class labels, with a matching number of observations

```r
my_labels <- rep(c("a", "b", "c"), length.out = 10)
```
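The random values in `my_soft_mat` do not sum to 1 across each row, whereas membership probabilities from a probabilistic clustering usually do. If you want the toy matrix to look more like real soft memberships, one option (an assumption about your data, not a requirement stated in this README) is to normalise the rows in base R. The sketch recreates the toy matrix with a fixed seed so it runs on its own.

```r
set.seed(42)  # make the toy example reproducible
my_soft_mat <- matrix(runif(50, 0, 1), nrow = 10, ncol = 5)

# Assumption: each observation's soft memberships should sum to 1.
# sweep() divides every row by its own row total.
my_soft_mat <- sweep(my_soft_mat, 1, rowSums(my_soft_mat), FUN = "/")

rowSums(my_soft_mat)  # all (numerically) equal to 1
```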

The two main functions are get_conf_mat and calculate_clustering_metrics. First, generate the confusion matrix

```r
conf_mat <- get_conf_mat(my_soft_mat, my_labels)
conf_mat
```
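The "simple calculation" behind a soft-vs-hard confusion matrix is a matrix product. As a base-R sketch, assuming get_conf_mat does something equivalent for this case (check ?get_conf_mat for its actual behaviour): if `A` is an n x k soft membership matrix and `B` an n x m binary matrix built from the labels, then `t(A) %*% B` is a k x m matrix whose entry (i, j) sums the membership in cluster i of the observations labelled j.

```r
set.seed(1)
# n x k soft membership matrix, rows normalised to sum to 1
A <- matrix(runif(50), nrow = 10, ncol = 5)
A <- A / rowSums(A)

# n x m binary matrix built from hard labels (base R, no c2c needed)
lab <- rep(c("a", "b", "c"), length.out = 10)
B <- outer(lab, sort(unique(lab)), FUN = "==") * 1

# k x m confusion matrix: crossprod(A, B) computes t(A) %*% B
conf <- crossprod(A, B)
dim(conf)      # 5 rows (soft clusters) by 3 columns (labels)
colSums(conf)  # observations carrying each label: 4, 3, 3
```

Because each row of `A` sums to 1, every observation contributes a total weight of 1, so the column sums recover the label counts and the matrix total equals the number of observations.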

then calculate the metrics (see ?calculate_clustering_metrics for details)

```r
calculate_clustering_metrics(conf_mat)
```
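As an illustration of the kind of metric derived from a confusion matrix, here is overall purity in base R. This uses a generic textbook definition, assumed for illustration with rows as the classes of the first classification; c2c's own definitions and the full list of metrics are documented in ?calculate_clustering_metrics.

```r
# Overall purity: for each row, take the largest entry (the best-matching
# class in the other classification), sum those maxima, divide by the total.
overall_purity <- function(conf_mat) {
  sum(apply(conf_mat, 1, max)) / sum(conf_mat)
}

toy_conf <- matrix(c(5, 1,
                     2, 8), nrow = 2, byrow = TRUE)
overall_purity(toy_conf)  # (5 + 8) / 16 = 0.8125
```

The same function works on a soft confusion matrix, since it only needs the (possibly fractional) counts.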

You could also just pass in any confusion matrix that you have already generated elsewhere. Another thing you can do within get_conf_mat is turn a soft matrix into a hard one.
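Hardening a soft classification amounts to assigning each observation to its highest-membership class. A minimal base-R sketch of that idea: `soft_to_hard` is a hypothetical name, and the tie-breaking rule (the first tied column wins) is an assumption, not necessarily what get_conf_mat does.

```r
# Hypothetical helper (not the c2c API): convert a soft membership matrix
# into a hard 0/1 matrix by picking each row's highest-membership column.
soft_to_hard <- function(soft_mat) {
  winner <- max.col(soft_mat, ties.method = "first")
  hard <- matrix(0, nrow = nrow(soft_mat), ncol = ncol(soft_mat),
                 dimnames = dimnames(soft_mat))
  # matrix indexing: set one cell per row to 1
  hard[cbind(seq_len(nrow(soft_mat)), winner)] <- 1
  hard
}

soft <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6), nrow = 2, byrow = TRUE)
soft_to_hard(soft)
```

Note that hardening discards the membership strength, which is exactly the information c2c aims to keep when both classifications are compared directly.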

Installation

You can install directly from CRAN as above

```r
install.packages("c2c")
```

or, if you want the development version (which might have some new functionality), you can install it from GitHub. This is easy with Hadley Wickham's (excellent) devtools package

```r
install.packages("devtools")
```

then call

```r
devtools::install_github("mitchest/c2c")
```

Bugs

There are probably some. If you find any, please let me know, either directly on GitHub or via the contact details below.

Contact

  • Mitchell Lyons
  • mitchell.lyons@gmail.com / mitchell.lyons@unsw.edu.au

References

Lyons, Foster and Keith (2017). Simultaneous vegetation classification and mapping at large spatial scales. Journal of Biogeography.

Foster, Hill and Lyons (2017). Ecological grouping of survey sites when sampling artefacts are present. Journal of the Royal Statistical Society: Series C (Applied Statistics). DOI: http://dx.doi.org/10.1111/rssc.12211

Owner

  • Name: Mitchell Lyons
  • Login: mitchest
  • Kind: user
  • Location: Australia
  • Company: University of New South Wales

machine learner turned (statistical) ecologist

Committers

Last synced: over 3 years ago

All Time
  • Total Commits: 59
  • Total Committers: 2
  • Avg Commits per committer: 29.5
  • Development Distribution Score (DDS): 0.458
Top Committers
  • Mitchell Lyons (m****s@g****m): 32 commits
  • Mitchell Lyons (m****s@u****u): 27 commits
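Assuming the Development Distribution Score is defined as 1 minus the share of commits made by the most active committer (an assumption about how this page computes it), the summary figures above follow directly from the two commit counts:

```r
commits <- c(32, 27)  # commits per committer, as listed above
total <- sum(commits)                          # 59
avg_per_committer <- total / length(commits)   # 29.5
# Assumed DDS definition: 1 - (top committer's commits / total commits)
dds <- 1 - max(commits) / total                # 1 - 32/59
round(dds, 3)                                  # 0.458
```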

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • cran 230 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: c2c

Compare Two Classifications or Clustering Solutions of Varying Structure

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 230 Last month
Rankings
Forks count: 28.8%
Dependent packages count: 29.8%
Stargazers count: 35.2%
Dependent repos count: 35.5%
Average: 37.5%
Downloads: 58.2%
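The Average figure is consistent with the plain arithmetic mean of the five percentile rankings listed (an inference from the numbers, not a documented formula):

```r
rankings <- c(forks = 28.8, dependent_packages = 29.8, stargazers = 35.2,
              dependent_repos = 35.5, downloads = 58.2)
mean(rankings)  # 187.5 / 5 = 37.5
```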
Maintainers (1)
Last synced: 10 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.1.0 depends
  • e1071 * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests