Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 36 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.4%) to scientific vocabulary
Repository
Subset partitioning via anticlustering
Basic Info
Statistics
- Stars: 36
- Watchers: 5
- Forks: 8
- Open Issues: 9
- Releases: 35
Metadata Files
README.md
anticlust 
Anticlustering partitions a pool of elements into clusters (or
anticlusters) with the goal of achieving high between-cluster
similarity and high within-cluster heterogeneity. This is accomplished
by maximizing instead of minimizing a clustering objective function,
such as the intra-cluster variance (used in k-means clustering) or the
sum of pairwise distances within clusters. The package anticlust
implements anticlustering methods as described in Papenberg and Klau
(2021;
https://doi.org/10.1037/met0000301),
Brusco et al. (2020;
https://doi.org/10.1111/bmsp.12186),
Papenberg (2024;
https://doi.org/10.1111/bmsp.12315),
and Papenberg et al. (2025;
https://doi.org/10.1101/2025.03.03.641320).
Installation
The stable release of anticlust is available from
CRAN and can be
installed via:
``` r
install.packages("anticlust")
```
A (potentially more recent) version of anticlust can also be installed
via R Universe:
``` r
install.packages('anticlust', repos = c('https://m-py.r-universe.dev', 'https://cloud.r-project.org'))
```
or directly from GitHub:

``` r
library("remotes") # if not available: install.packages("remotes")
install_github("m-Py/anticlust")
```
Citation
If you use anticlust in your research, it would be courteous if you
cited the following reference:
- Papenberg, M., & Klau, G. W. (2021). Using anticlustering to partition data sets into equivalent parts. Psychological Methods, 26(2), 161–174. https://doi.org/10.1037/met0000301
Depending on which anticlust functions you are using, citing
additional references may also be appropriate. The package
documentation describes in detail how to cite anticlust.
Another great way of showing your appreciation of anticlust is to
leave a star on this Github repository.
How do I learn about anticlust?
This README contains some basic information on the R package
anticlust. More information is available via the following sources:
- Up until now, we have published three papers describing the
theoretical background of anticlust:
  - The initial presentation of the anticlust package is given in
Papenberg and Klau (2021; https://doi.org/10.1037/met0000301).
  - The k-plus anticlustering method is described in Papenberg
(2024; https://doi.org/10.1111/bmsp.12315).
  - A new paper describes the must-link feature and provides
additional comparisons to alternative methods, focusing on
categorical variables (Papenberg et al., 2025;
https://doi.org/10.1101/2025.03.03.641320).
- The R documentation of the main functions is quite rich and up to
date, so you should definitely check it out when using the
anticlust package. The most important background is provided in
?anticlustering.
- A German-language video is available in which I illustrate the main
functionalities of the anticlustering() function. My plan is to make
a similar video in English in the future.
- The package website gathers all documentation in one convenient
place. At the current time, the website also has four package
vignettes, and additional vignettes are planned.
A quick start
In this initial example, I use the main function anticlustering() to
create five similar sets of plants using the classical iris data set:
First, load the package via

``` r
library("anticlust")
```
Then call the anticlustering() function:

``` r
anticlusters <- anticlustering(
  iris[, -5],
  K = 5,
  objective = "kplus",
  method = "local-maximum",
  repetitions = 10
)
```
The output is a vector that assigns a group (i.e., a number between 1 and
K) to each input element:
``` r
anticlusters
#>   [1] 1 2 4 5 3 4 2 3 2 2 1 5 1 2 4 1 2 3 2 5 1 5 4 5 1 1 3 4 5 5 5 4 5 2 1 1 3
#>  [38] 4 3 3 4 2 3 5 2 5 3 4 3 1 2 2 5 1 2 3 3 4 4 1 5 1 2 3 3 1 2 4 4 4 4 1 3 4
#>  [75] 2 4 5 2 5 2 3 3 1 5 4 1 5 3 2 1 2 5 3 4 1 4 1 2 4 5 2 2 3 1 4 1 3 4 4 5 3
#> [112] 2 3 1 5 2 5 3 1 5 4 1 2 5 1 2 3 1 3 3 5 1 2 5 5 4 3 5 4 3 5 5 1 4 4 1 3 4
#> [149] 2 2
```
By default, each group has the same number of elements (but the argument
K can be adjusted to request different group sizes):
``` r
table(anticlusters)
#> anticlusters
#>  1  2  3  4  5
#> 30 30 30 30 30
```
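As a hedged sketch of the unequal-group-sizes option mentioned above (assuming, as described in ?anticlustering, that K may also be given as a vector of group sizes summing to the number of elements):

``` r
library("anticlust")

# Request two groups of unequal size; the sizes must sum to
# nrow(iris) = 150. Assumes K accepts a vector of group sizes,
# as documented in ?anticlustering.
anticlusters <- anticlustering(
  iris[, -5],
  K = c(100, 50),
  objective = "kplus"
)

# Group 1 should contain 100 plants and group 2 should contain 50.
table(anticlusters)
```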
Last, let’s compare the features’ means and standard deviations across groups to find out if the five groups are similar to each other:
``` r
knitr::kable(mean_sd_tab(iris[, -5], anticlusters), row.names = TRUE)
```
|   | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 1 | 5.84 (0.84) | 3.06 (0.44) | 3.76 (1.79) | 1.20 (0.77) |
| 2 | 5.84 (0.84) | 3.06 (0.45) | 3.76 (1.79) | 1.20 (0.77) |
| 3 | 5.84 (0.84) | 3.06 (0.44) | 3.75 (1.79) | 1.20 (0.77) |
| 4 | 5.85 (0.84) | 3.05 (0.45) | 3.76 (1.79) | 1.21 (0.77) |
| 5 | 5.84 (0.84) | 3.06 (0.44) | 3.76 (1.79) | 1.19 (0.78) |
As illustrated in the example, we can use the function
anticlustering() to create similar groups of plants. In this case
“similar” primarily means that the means and standard deviations (in
parentheses) of the variables are pretty much the same across the five
groups. The function anticlustering() takes as input a data table
describing the elements that should be assigned to sets. In the data
table, each row represents an element (here a plant, but it can be
anything; for example a person, word, or a photo). Each column is a
numeric variable describing one of the elements’ features. The number of
groups is specified through the argument K. The argument objective
specifies how between-group similarity is quantified; the argument
method specifies the algorithm by which this measure is optimized. See
the documentation ?anticlustering for more details.
Five anticlustering objectives are natively supported in
anticlustering():
- the “diversity” objective, setting objective = "diversity" (default)
- the “average diversity”, setting objective = "average-diversity", which normalizes the diversity by cluster size
- the k-means objective (i.e., the “variance”), setting objective = "variance"
- the “k-plus” objective, an extension of the k-means variance criterion, setting objective = "kplus"
- the “dispersion” objective (the minimum distance between any two elements within the same cluster), setting objective = "dispersion"
The anticlustering objectives are described in detail in the
documentation (?anticlustering, ?diversity_objective,
?variance_objective, ?kplus_anticlustering, ?dispersion_objective)
and the references therein. It is also possible to optimize user-defined
objectives, which is also described in the documentation
(?anticlustering).
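As a hedged sketch of the user-defined objective feature mentioned above (assuming, per ?anticlustering, that objective may be a function taking the data and the clustering vector and returning a single numeric value to be maximized; the objective below is a hypothetical example, not one the package recommends):

``` r
library("anticlust")

# Hypothetical custom objective: the sum of within-group column
# variances, which anticlustering() will attempt to maximize.
sum_of_variances <- function(data, clusters) {
  sum(sapply(
    split(as.data.frame(data), clusters),
    function(group) sum(apply(group, 2, var))
  ))
}

anticlusters <- anticlustering(
  iris[, -5],
  K = 5,
  objective = sum_of_variances
)
table(anticlusters)
```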
Categorical variables
Sometimes, it is required that sets are not only similar with regard to
some numeric variables, but we also want to ensure that each set
contains an equal number of elements of a certain category. Coming back
to the initial iris data set, we may want to require that each set has a
balanced number of plants of the three iris species. To this end, we can
use the argument categories as follows:
``` r
anticlusters <- anticlustering(
  iris[, -5],
  K = 3,
  categories = iris$Species
)

## The species are as balanced as possible across anticlusters:
table(anticlusters, iris$Species)
#>
#> anticlusters setosa versicolor virginica
#>            1     17         17        16
#>            2     17         16        17
#>            3     16         17        17
```
Matching and clustering
Anticlustering creates sets of dissimilar elements; the heterogeneity
within anticlusters is maximized. This is the opposite of clustering
problems that strive for high within-cluster similarity and good
separation between clusters. The anticlust package also provides
functions for “classical” clustering applications:
balanced_clustering() creates sets of elements that are similar while
ensuring that clusters are of equal size. This is an example:
``` r
# Generate random data, cluster the data set and visualize results
N <- 1400
lds <- data.frame(var1 = rnorm(N), var2 = rnorm(N))
cl <- balanced_clustering(lds, K = 7)
plot_clusters(lds, clusters = cl, show_axes = TRUE)
```

The function matching() is very similar, but is usually used to find
small groups of similar elements, e.g., triplets as in this example:
``` r
# Generate random data and find triplets of similar elements:
N <- 120
lds <- data.frame(var1 = rnorm(N), var2 = rnorm(N))
triplets <- matching(lds, p = 3)
plot_clusters(
  lds,
  clusters = triplets,
  within_connection = TRUE,
  show_axes = TRUE
)
```

Questions and suggestions
If you have any questions about the anticlust package or find a bug, I
encourage you to open an issue on the GitHub repository.
Owner
- Name: Martin Papenberg
- Login: m-Py
- Kind: user
- Location: Düsseldorf, Germany
- Company: Department of Experimental Psychology, University of Düsseldorf
- Website: https://m-py.github.io/about
- Repositories: 17
- Profile: https://github.com/m-Py
Post-doctoral researcher at the University of Duesseldorf.
GitHub Events
Total
- Create event: 10
- Release event: 5
- Issues event: 11
- Watch event: 7
- Delete event: 9
- Issue comment event: 17
- Push event: 38
- Pull request event: 2
- Fork event: 3
Last Year
- Create event: 10
- Release event: 5
- Issues event: 11
- Watch event: 7
- Delete event: 9
- Issue comment event: 17
- Push event: 38
- Pull request event: 2
- Fork event: 3
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Papenberg | m****g@h****e | 1,291 |
| einGlasRotwein | j****z@g****m | 4 |
| manalama | m****r@u****e | 2 |
| unDocUMeantIt | 1****t | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 30
- Total pull requests: 40
- Average time to close issues: 2 months
- Average time to close pull requests: 23 days
- Total issue authors: 5
- Total pull request authors: 8
- Average comments per issue: 2.17
- Average comments per pull request: 0.43
- Merged pull requests: 34
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 4
- Average time to close issues: 30 days
- Average time to close pull requests: about 1 month
- Issue authors: 2
- Pull request authors: 3
- Average comments per issue: 1.38
- Average comments per pull request: 2.25
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- m-Py (23)
- A-Pai (4)
- rbcavanaugh (1)
- uhkeller (1)
- viv-analytics (1)
Pull Request Authors
- m-Py (31)
- HanneyAI (2)
- unDocUMeantIt (2)
- ManaLama (2)
- Hanney100 (2)
- olivroy (2)
- einGlasRotwein (1)
- Dimitry-Wintermantel (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 1,059 last month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 18
- Total maintainers: 1
cran.r-project.org: anticlust
Subset Partitioning via Anticlustering
- Homepage: https://github.com/m-Py/anticlust
- Documentation: http://cran.r-project.org/web/packages/anticlust/anticlust.pdf
- License: MIT + file LICENSE
- Latest release: 0.8.10 (published about 1 year ago)
Rankings
Maintainers (1)
Dependencies
- R >= 3.6.0 depends
- Matrix * imports
- RANN >= 2.6.0 imports
- Rglpk * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat * suggests