Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets

Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets - Published in JOSS (2020)

https://github.com/manalytics/akmedoids

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 2 DOI reference(s) in README and JOSS metadata
○
Academic publication links
✓
Committers with academic emails
2 of 3 committers (66.7%) from academic institutions
○
Institutional organization owner
✓
JOSS paper metadata
Published in Journal of Open Source Software

Last synced: 9 months ago · JSON representation

Repository

akmedoids for longitudinal clustering

Basic Info

Host: GitHub
Owner: MAnalytics
License: gpl-3.0
Language: HTML
Default Branch: master
Size: 3.24 MB

Statistics

Stars: 2
Watchers: 1
Forks: 1
Open Issues: 2
Releases: 1

Created over 6 years ago · Last pushed almost 5 years ago

Metadata Files

Readme Changelog License

README.Rmd

---
#output: github_document
output: md_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# akmedoids




An R package for analyzing and clustering longitudinal data

### Description

The `akmedoids` package advances the clustering of longitudinal datasets in order to identify clusters of trajectories with similar long-term linear trends over time, providing an improved cluster identification as compared with the classic kmeans algorithm. The package also includes a set of functions for addressing common data issues, such as missing entries and outliers, prior to conducting advance longitudinal data analysis. One of the key objectives of this package is to facilitate easy replication of a recent paper which examined small area inequality in the crime drop (Adepeju et al. 2020). Many of the functions provided in the `akmedoids` package may be applied to longitudinal data in general. 

**For more information and usability, check out details on [CRAN](https://cran.r-project.org/web/packages/akmedoids/index.html).**


### Installation from `CRAN`

From an R console, type:

```{r, echo=TRUE, message=FALSE, eval=TRUE}
#install.packages("akmedoids")
library(akmedoids)

#Other libraries
library(tidyr)
library(ggplot2)
library(reshape)
library(readr)
```
To install the development version of the package, type `remotes::install_github("MAnalytics/akmedoids")`. Please, report any installation problems in the issues.

### Example usage:

Given a longitudinal datasets, the following is an example of how `akmedoids` could be used to extract clusters of trajectories with similar long-term trends over time. We will use a simulated dataset (named `simulated.rda`) stored in the `data/` directory.


### Generating artificial dataset

Simulating data set which comprised of three clusters with distinct mean directional change over time. Each group contains 50 trajectories.


```{r, echo=TRUE, message=FALSE, eval=TRUE}
dir.create("input") # create a folder

#function for creating longitudinal noise
noise_fn <- function(x=3, time){
  rnorm(length(time), mean=0, x)}

#function for simulating a trajectory group
sim_group <- function(gr_baseline, sd, time){
  intcp_errors <- rgamma(1, shape=2, scale=sd) #intercept error
  mean_traj = gr_baseline + intcp_errors
  traj = mean_traj + noise_fn(intcp_errors, time)
}

#time steps
t_steps <- c(0:20)

#increasing group
i_gr <- NULL
for(i in seq_len(50)){
  i_gr <- rbind(i_gr,
                sim_group(gr_baseline=(0.5*t_steps),
                          sd=1, time=t_steps))
}

#stable group
s_gr <- NULL
for(i in seq_len(50)){
  s_gr <- rbind(s_gr,
                sim_group(gr_baseline=rep(3,length(t_steps)),
                          sd=1, time=t_steps))
}

#decreasing group
d_gr <- NULL
for(i in seq_len(50)){
  d_gr <- rbind(d_gr,
                sim_group(gr_baseline=(10 - (0.5*t_steps)),
                          sd=1, time=t_steps))
}

#combine groups
simulated <- data.frame(rbind(i_gr, s_gr, d_gr))

#add group label
simulated <- data.frame(cbind(ID=1:nrow(simulated), simulated))

colnames(simulated) <- c("ID", 1:(ncol(simulated)-1))

#save data set
##simulated = readr::write_csv(simulated, "input/example-simulated.csv")

```


### Visualising artificial dataset

```{r, echo=TRUE, message=FALSE, eval=TRUE}

#import already save simulated data
Import_simulated = read_csv(file="./input/example-simulated.csv")

#preview the data
head(Import_simulated)


#convert wide-format into long
simulated_long <- melt(t(Import_simulated), id.vars=c("ID"))

 simulated_long <- simulated_long %>%
  dplyr::filter(X1!="ID") %>%
  dplyr::rename(Time=X1, ID=X2)
# 
simulated_long <- data.frame(cbind(simulated_long,
                                   Groups= c(rep("Increasing", 50*21),
                                   rep("Stable", 50*21),
                                   rep("Decreasing", 50*21))))

#re-order levels
simulated_long$Time <- factor(simulated_long$Time, 
                                levels = c(1:21))

ggplot(simulated_long, aes(x = Time, y = value, group=ID, color=Groups)) +
  geom_point(size=0.5) + 
  geom_line() +
  scale_color_manual(values=c('#999999','#E69F00', '#56B4E9')) +
  theme_light()

```

### Performing clustering using `akmedoids`

Performing clustering analysis using `akmedoids` package.

```{r, echo=TRUE, message=TRUE, eval=TRUE}

output <- akclustr(Import_simulated, id_field=TRUE, verbose = FALSE, k=c(3,12), crit = "Silhouette",
                  quality_plot=TRUE)

#Remark: The entire output can be printed out to the console by typing `output`

#To check the optimal cluster size, type:
output$optimal_k

#Show quality plot
#plot(output$QltyPlot)

#user may decide the examine the plot post clustering....
```

### Documentation

From an R console type `??akmedoids` for help on the package. The package page on CRAN is [here](https://cran.r-project.org/web/packages/akmedoids/index.html), package reference manual is [here](https://cran.r-project.org/web/packages/akmedoids/akmedoids.pdf), package vignette is [here](https://cran.r-project.org/web/packages/akmedoids/vignettes/akmedoids-vignette.html). 

### Support and Contributions:

For support and bug reports send an email to: monsuur2010@yahoo.com or open an issue [here](https://github.com/MAnalytics/akmedoids/issues). Code contributions to akmedoids are also very welcome.

### References:

Rousseeuw, P. J. 1987. “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis.” Journal of Computational and Applied Mathematics, no. 20: 53–6. [link](https://www.bibsonomy.org/bibtex/bc0f62c7895f91c787354d03f23da976)

Caliński, T., and J. Harabasz. 1974. “A Dendrite Method for Cluster Analysis.” Communications in Statistics-Theory and Methods, 3(1): 1–27. [link](https://www.tandfonline.com/doi/abs/10.1080/03610927408827101)

Adepeju, M., Langton, S. and Bannister, J. 2020. Anchored k-medoids: a novel adaptation of k-means further refined to measure instability in the exposure to crime. Journal of Computational Social Science, (revised).

Owner

Name: Monsuru Adepeju
Login: MAnalytics
Kind: user
Location: Manchester
Company: Manchester Metropolitan University

Website: https://mnalytics.com/
Twitter: AdepejuMonsuru
Repositories: 3
Profile: https://github.com/MAnalytics

I specialize in Geocomputation, Big (Crime and Police) Data Analytics, and GIS Application Development.

JOSS Publication

Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets

Published

December 04, 2020

DOI

10.21105/joss.02379

Volume 5, Issue 56, Page 2379

Authors

Monsuru Adepeju

Crime and Well-being Big Data Centre, Manchester Metropolitan University

Samuel Langton

Crime and Well-being Big Data Centre, Manchester Metropolitan University

Jon Bannister

Crime and Well-being Big Data Centre, Manchester Metropolitan University

Editor

Karthik Ram

GitHub Events

Total

Last Year

Committers

Last synced: 10 months ago

All Time

Total Commits: 539
Total Committers: 3
Avg Commits per committer: 179.667
Development Distribution Score (DDS): 0.032

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
geoMADE	g**d@l**k	522
Monsuru Adepeju	m**u@m**k	10
geoMADE		7

Committer Domains (Top 20 + Academic)

mmu.ac.uk: 1 leeds.ac.uk: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 3
Total pull requests: 0
Average time to close issues: 9 days
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 0
Average comments per issue: 1.33
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

davidgohel (2)
andrewmaclachlan (1)

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
Hmisc * imports
clusterCrit * imports
ggplot2 * imports
grDevices * imports
kml * imports
reshape2 * imports
signal * imports
stats * imports
utils * imports
devtools * suggests
digest * suggests
dplyr * suggests
gdtools * suggests
kableExtra * suggests
knitr * suggests
rgl * suggests
rmarkdown * suggests
testthat * suggests

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets

Science Score: 95.0%

Repository

Basic Info

Statistics

Metadata Files

README.Rmd

Owner

JOSS Publication

Akmedoids R package for generating directionally-homogeneous clusters of longitudinal data sets

Authors

Editor

Tags

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Dependencies