comparison-clustering-longitudinal-data
Supplementary materials for the manuscript "A comparison of methods for clustering longitudinal data with slowly changing trends" by N. G. P. Den Teuling, S.C. Pauws, and E.R. van den Heuvel, published in Communications in Statistics - Simulation and Computation (2021).
https://github.com/philips-labs/comparison-clustering-longitudinal-data
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: philips-labs
- License: gpl-2.0
- Language: R
- Default Branch: main
- Homepage: https://doi.org/10.1080/03610918.2020.1861464
- Size: 144 KB
Statistics
- Stars: 4
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
comparison-clustering-longitudinal-data
This repository contains all R code used in running and analyzing the simulation study and case study reported in the manuscript.
As the simulation study involves many simulation settings (over 27,000) and some methods have rather long estimation times, a custom parallel simulation framework was implemented for use on a computational cluster. A cluster is not strictly needed if you are only interested in replicating a subset of the simulation scenarios or methods, but you will need to configure a Redis database server (https://redis.io/) to run any simulations. Instructions are provided below.
The complete database of simulation results (600 MB) is available upon request.
Useful links
- MixTVEM source code used in the simulation study - https://github.com/dziakj1/MixTVEM
- lcmm R package, used for estimating GMM and GBTM - https://cran.r-project.org/package=lcmm
- kml R package, used for estimating KmL - https://cran.r-project.org/package=kml
- latrend R package: the longitudinal clustering framework that we created, building on the lessons learned from this work - https://github.com/philips-software/latrend
Getting started
- Either load the RStudio project file `comparison.Rproj`, or start an R session with the working directory set to the root repository directory.
- Install the required packages and dependencies:
```R
install.packages(
  c("assertthat", "data.table", "effects", "ggplot2", "igraph", "kml",
    "latex2exp", "lcmm", "lpSolve", "memoise", "mvnfast", "magrittr",
    "multcompView", "nlme", "polynom", "R.utils", "rredis", "scales", "weights"),
  dependencies = TRUE
)
```
- Create an `.Rprofile` file with the following content, changing the file and directory paths as needed:
```R
FIG_DIR <- 'figs' # directory to export figures to
TAB_DIR <- 'tabs' # directory to export model coefficient tables to
OSU_USAGE_DATA_FILE <- '../data/' # case study data
CASE_OSU_RESULTS_DIR <- '../caseresults' # directory where to store the models

REDIS_HOST_FILE <- 'redis/localhost.txt' # file specifying hostname and port
REDIS_PWD <- 'password' # server AUTH password

source('include.R')
```
- Restart the R session. The `.Rprofile` file is now run automatically, which you can tell by the output in the console on start-up. The `include.R` script loads all required packages and functions.
You should now be able to run all functions and scripts. Running simulation studies requires a Redis database server to be configured.
Redis database
The Redis database stores the open jobs as well as the results of completed jobs. Parallel workers fetch jobs from the Redis queue and store their results in the respective experiment set. Storing results in the database avoids the rather large file-system overhead of saving thousands of small result files.
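To make this pattern concrete, the sketch below illustrates the queue idea using the rredis package directly. It is not the repository's actual implementation, which wraps this logic in helpers such as experiment_submit() and redis/worker.R, and the key names and job structure used here are made up for illustration.

```R
library(rredis)

# Conceptual sketch only; key names and job contents are invented.
redisConnect(host = 'localhost', port = 6379)
redisAuth('password')

# Submitting side: push an open job onto a list acting as the queue
redisLPush('jobs:example', list(scenario = 'normal_known', dataseed = 1))

# Worker side: pop the next open job, evaluate it, and store the result
job <- redisRPop('jobs:example')
result <- list(converged = TRUE) # placeholder for the actual model output
redisSet(paste0('results:example:', job$dataseed), result)

redisClose()
```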
Installing Redis server
Windows
- Download the Redis binaries. Older Windows binaries are available at https://github.com/microsoftarchive/redis/
- Install Redis.
- Make sure Redis is added to your system's `PATH` environment variable.
- Let Redis use the default port (6379).
Unix
WIP:
1. Set `BASEDIR` in `redis.ksh`
Starting Redis server
You need to start the Redis server before you can run simulations or retrieve simulation results.
The Redis configuration file included in the repository (`redis/redis.conf`) configures a server on port 6379 with password "password" and a database saved to `redis/database.rdb`. A server password is required because the simulation R code connects to Redis using authentication.
Windows
In order to start the Redis server on Windows, run redis.bat. Alternatively, you can open the command line in the root repository directory and execute redis-server redis/redis.conf
If everything is configured correctly, a console window opens showing the Redis server start-up log.
If no window shows up, the Redis server failed to start. First check that the database directory path exists.
Unix
From the root directory of the repository, run
redis-server redis/redis.conf
Connect to Redis
After you have confirmed that the Redis server is running and you have opened an R session with all scripts loaded, connect to Redis in R by running redis_connect(). You should see the message "Connected to Redis at localhost:6379.".
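For example (the expected console message is shown as a comment):

```R
redis_connect()
#> Connected to Redis at localhost:6379.
```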
Running simulations
All simulation scenarios described in the manuscript are located inside the experiments folder. Simulation scenarios are defined in R scripts prefixed by exp_.
Generating simulation settings
As an example, the simulation settings for the scenario involving a known number of clusters are defined and generated in `exp_normal_known.R`.
Specifically, the scenario with a two-cluster dataset with quadratic trends and varying numbers of trajectories and observations, random-effect levels, and noise levels is generated using:
```R
cases_normal2 <- expand.grid(
  data = c('longdata_randquad2'),
  model = c('longmodel_kml', 'longmodel_gcm2km', 'longmodel_gbtm2', 'longmodel_gmm2', 'longmodel_mixtvem_nugget'),
  numtraj = c(200, 500, 1000),
  numobs = c(4, 10, 25),
  numclus = 2,
  re = c(RE_NORM_LOW, RE_NORM_MED, RE_NORM_HIGH),
  noise = c(.01, .1),
  dataseed = 1:100,
  seed = 1
) %>%
  as.data.table() %T>%
  print()
```
The model names passed through the model argument are names of the functions defined in the methods folder. This makes it relatively easy to define and evaluate new methods.
Providing dataseed = 1:100 results in 100 different datasets being generated.
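As an illustration of the method-function convention mentioned above, the sketch below shows the general shape such a function could take. It is hypothetical and not one of the methods used in the study; it only mirrors the idea that a method receives the case settings, fits a model, and returns a named list of scalar results (see "Retrieving results" below). kmeans is used purely as a stand-in estimator.

```R
# Hypothetical method sketch; the actual functions in the methods folder may
# use different argument names and return different measures.
longmodel_example <- function(data, numclus = 2, seed = 1) {
  # data is assumed here to be a trajectories-by-observations matrix
  set.seed(seed)
  fit <- stats::kmeans(data, centers = numclus) # stand-in for a longitudinal clustering method
  list(
    numclus = numclus,
    withinSS = fit$tot.withinss,   # example scalar outcome
    minClusterSize = min(fit$size) # example scalar outcome
  )
}

# Example usage on random data
longmodel_example(matrix(rnorm(200 * 10), nrow = 200), numclus = 2)
```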
Queueing simulation jobs
After generating the table of simulation settings, we can submit them to the job queue using the experiment_submit() function. Only jobs which have not been previously evaluated are added.
```R
redis_connect() # connect to Redis first
experiment_submit(name = 'normal_known', cases = cases_normal2)
```

Starting parallel workers
The submitted jobs now need to be evaluated. This evaluation is done by worker instances.
To start a simulation worker on Windows, run worker.bat.
However, for this to work, R needs to be in your PATH environment variable so Windows can locate the R executable file.
On Linux, run the following from the command line in the repository directory:
R --slave -f redis/worker.R
On computational clusters, you can start worker batch jobs in a similar manner.
You can start as many workers as your system allows. The workers will pull jobs from the queue and evaluate them. When no more jobs are open, the workers will terminate.
You can also evaluate jobs in the master R session by sourcing the redis/worker.R script.
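For example, in a session where the repository scripts have been loaded:

```R
# Process the open jobs in this R session; the worker stops once the queue is empty
source('redis/worker.R')
```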
Helper functions
Jobs
```R
job_monitor() # monitor the number of remaining jobs over time
job_count()   # returns the number of open jobs
job_clear()   # clear the job queue
```
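These helpers can also be combined into a simple wait loop, for example to block the master session until all submitted jobs have been evaluated (a sketch; adjust the polling interval as needed):

```R
# Poll the queue until no open jobs remain
while (job_count() > 0) {
  Sys.sleep(60) # check once per minute
}
```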
Experiments
```R
experiment_names()      # get the list of evaluated experiments
experiment_delete(name) # delete all results of the respective experiment
```
Evaluating simulation results
Simulation results can be retrieved and analyzed at any time; doing so returns all job results completed up to that moment. All simulation scenario analysis scripts are located inside the experiments folder, prefixed by analysis_.
Retrieving results
Methods output their results as a named list of scalar values. Results can therefore be easily combined into a table. All evaluated cases can be retrieved as a single data.table object using the experiment_getOutputTable() function.
```R
results_normal_all <- experiment_getOutputTable('normal_known')
head(results_normal_all)
```
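A typical next step is to summarize the outcomes per simulation setting using data.table. The sketch below assumes that the case settings (e.g., model, numtraj, numobs, noise) appear as columns in the output table and uses a hypothetical outcome column named ARI; substitute the columns actually present in your results.

```R
# Hypothetical aggregation; the ARI column name is an assumption
results_normal_all[,
  .(meanARI = mean(ARI, na.rm = TRUE), numResults = .N),
  keyby = .(model, numtraj, numobs, noise)
]
```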

Owner
- Name: Philips Labs
- Login: philips-labs
- Kind: organization
- Location: Netherlands
- Repositories: 131
- Profile: https://github.com/philips-labs
Philips Labs - Projects in development
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
preferred-citation:
  type: article
  authors:
    - family-names: "Den Teuling"
      given-names: "Niek G. P."
      orcid: "https://orcid.org/0000-0003-1026-5080"
    - family-names: "Pauws"
      given-names: "Steffen C."
      orcid: "https://orcid.org/0000-0003-2257-9239"
    - family-names: "van den Heuvel"
      given-names: "Edwin R."
      orcid: "https://orcid.org/0000-0001-9157-7224"
  doi: "10.1080/03610918.2020.1861464"
  journal: "Communications in Statistics - Simulation and Computation"
  start: 1 # First page number
  end: 28 # Last page number
  title: "A comparison of methods for clustering longitudinal data with slowly changing trends"
  year: 2021
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0