Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets

Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets - Published in JOSS (2020)

https://github.com/manalytics/akmedoids

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README and JOSS metadata
  • Academic publication links
  • Committers with academic emails
    2 of 3 committers (66.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago · JSON representation

Repository

akmedoids for longitudinal clustering

Basic Info
  • Host: GitHub
  • Owner: MAnalytics
  • License: gpl-3.0
  • Language: HTML
  • Default Branch: master
  • Size: 3.24 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 1
Created about 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
#output: github_document
output: md_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# akmedoids




An R package for analyzing and clustering longitudinal data

### Description

The `akmedoids` package advances the clustering of longitudinal datasets in order to identify clusters of trajectories with similar long-term linear trends over time, providing an improved cluster identification as compared with the classic kmeans algorithm. The package also includes a set of functions for addressing common data issues, such as missing entries and outliers, prior to conducting advance longitudinal data analysis. One of the key objectives of this package is to facilitate easy replication of a recent paper which examined small area inequality in the crime drop (Adepeju et al. 2020). Many of the functions provided in the `akmedoids` package may be applied to longitudinal data in general. 

**For more information and usability, check out details on [CRAN](https://cran.r-project.org/web/packages/akmedoids/index.html).**


### Installation from `CRAN`

From an R console, type:

```{r, echo=TRUE, message=FALSE, eval=TRUE}
#install.packages("akmedoids")
library(akmedoids)

#Other libraries
library(tidyr)
library(ggplot2)
library(reshape)
library(readr)
```
To install the development version of the package, type `remotes::install_github("MAnalytics/akmedoids")`. Please, report any installation problems in the issues.

### Example usage:

Given a longitudinal datasets, the following is an example of how `akmedoids` could be used to extract clusters of trajectories with similar long-term trends over time. We will use a simulated dataset (named `simulated.rda`) stored in the `data/` directory.


### Generating artificial dataset

Simulating data set which comprised of three clusters with distinct mean directional change over time. Each group contains 50 trajectories.


```{r, echo=TRUE, message=FALSE, eval=TRUE}
dir.create("input") # create a folder

#function for creating longitudinal noise
noise_fn <- function(x=3, time){
  rnorm(length(time), mean=0, x)}

#function for simulating a trajectory group
sim_group <- function(gr_baseline, sd, time){
  intcp_errors <- rgamma(1, shape=2, scale=sd) #intercept error
  mean_traj = gr_baseline + intcp_errors
  traj = mean_traj + noise_fn(intcp_errors, time)
}

#time steps
t_steps <- c(0:20)

#increasing group
i_gr <- NULL
for(i in seq_len(50)){
  i_gr <- rbind(i_gr,
                sim_group(gr_baseline=(0.5*t_steps),
                          sd=1, time=t_steps))
}

#stable group
s_gr <- NULL
for(i in seq_len(50)){
  s_gr <- rbind(s_gr,
                sim_group(gr_baseline=rep(3,length(t_steps)),
                          sd=1, time=t_steps))
}

#decreasing group
d_gr <- NULL
for(i in seq_len(50)){
  d_gr <- rbind(d_gr,
                sim_group(gr_baseline=(10 - (0.5*t_steps)),
                          sd=1, time=t_steps))
}

#combine groups
simulated <- data.frame(rbind(i_gr, s_gr, d_gr))

#add group label
simulated <- data.frame(cbind(ID=1:nrow(simulated), simulated))

colnames(simulated) <- c("ID", 1:(ncol(simulated)-1))

#save data set
##simulated = readr::write_csv(simulated, "input/example-simulated.csv")

```


### Visualising artificial dataset

```{r, echo=TRUE, message=FALSE, eval=TRUE}

#import already save simulated data
Import_simulated = read_csv(file="./input/example-simulated.csv")

#preview the data
head(Import_simulated)


#convert wide-format into long
simulated_long <- melt(t(Import_simulated), id.vars=c("ID"))

 simulated_long <- simulated_long %>%
  dplyr::filter(X1!="ID") %>%
  dplyr::rename(Time=X1, ID=X2)
# 
simulated_long <- data.frame(cbind(simulated_long,
                                   Groups= c(rep("Increasing", 50*21),
                                   rep("Stable", 50*21),
                                   rep("Decreasing", 50*21))))

#re-order levels
simulated_long$Time <- factor(simulated_long$Time, 
                                levels = c(1:21))

ggplot(simulated_long, aes(x = Time, y = value, group=ID, color=Groups)) +
  geom_point(size=0.5) + 
  geom_line() +
  scale_color_manual(values=c('#999999','#E69F00', '#56B4E9')) +
  theme_light()

```

### Performing clustering using `akmedoids`

Performing clustering analysis using `akmedoids` package.

```{r, echo=TRUE, message=TRUE, eval=TRUE}

output <- akclustr(Import_simulated, id_field=TRUE, verbose = FALSE, k=c(3,12), crit = "Silhouette",
                  quality_plot=TRUE)

#Remark: The entire output can be printed out to the console by typing `output`

#To check the optimal cluster size, type:
output$optimal_k

#Show quality plot
#plot(output$QltyPlot)

#user may decide the examine the plot post clustering....
```

### Documentation

From an R console type `??akmedoids` for help on the package. The package page on CRAN is [here](https://cran.r-project.org/web/packages/akmedoids/index.html), package reference manual is [here](https://cran.r-project.org/web/packages/akmedoids/akmedoids.pdf), package vignette is [here](https://cran.r-project.org/web/packages/akmedoids/vignettes/akmedoids-vignette.html). 

### Support and Contributions:

For support and bug reports send an email to: monsuur2010@yahoo.com or open an issue [here](https://github.com/MAnalytics/akmedoids/issues). Code contributions to akmedoids are also very welcome.

### References:

Rousseeuw, P. J. 1987. “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis.” Journal of Computational and Applied Mathematics, no. 20: 53–6. [link](https://www.bibsonomy.org/bibtex/bc0f62c7895f91c787354d03f23da976)

Caliński, T., and J. Harabasz. 1974. “A Dendrite Method for Cluster Analysis.” Communications in Statistics-Theory and Methods, 3(1): 1–27. [link](https://www.tandfonline.com/doi/abs/10.1080/03610927408827101)

Adepeju, M., Langton, S. and Bannister, J. 2020. Anchored k-medoids: a novel adaptation of k-means further refined to measure instability in the exposure to crime. Journal of Computational Social Science, (revised).

Owner

  • Name: Monsuru Adepeju
  • Login: MAnalytics
  • Kind: user
  • Location: Manchester
  • Company: Manchester Metropolitan University

I specialize in Geocomputation, Big (Crime and Police) Data Analytics, and GIS Application Development.

JOSS Publication

Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets
Published
December 04, 2020
Volume 5, Issue 56, Page 2379
Authors
Monsuru Adepeju ORCID
Crime and Well-being Big Data Centre, Manchester Metropolitan University
Samuel Langton ORCID
Crime and Well-being Big Data Centre, Manchester Metropolitan University
Jon Bannister ORCID
Crime and Well-being Big Data Centre, Manchester Metropolitan University
Editor
Karthik Ram ORCID
Tags
Anchored k-medoids k-means crime longitudinal clustering long-term trends

GitHub Events

Total
Last Year

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 539
  • Total Committers: 3
  • Avg Commits per committer: 179.667
  • Development Distribution Score (DDS): 0.032
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
geoMADE g****d@l****k 522
Monsuru Adepeju m****u@m****k 10
geoMADE 7
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: 9 days
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 1.33
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • davidgohel (2)
  • andrewmaclachlan (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • Hmisc * imports
  • clusterCrit * imports
  • ggplot2 * imports
  • grDevices * imports
  • kml * imports
  • reshape2 * imports
  • signal * imports
  • stats * imports
  • utils * imports
  • devtools * suggests
  • digest * suggests
  • dplyr * suggests
  • gdtools * suggests
  • kableExtra * suggests
  • knitr * suggests
  • rgl * suggests
  • rmarkdown * suggests
  • testthat * suggests