registr 2.0
registr 2.0: Incomplete Curve Registration for Exponential Family Functional Data - Published in JOSS (2021)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ✓ Committers with academic emails: 5 of 9 committers (55.6%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Last synced: 6 months ago
Repository
Register exponential family curves
Basic Info
- Host: GitHub
- Owner: julia-wrobel
- License: other
- Language: R
- Default Branch: master
- Size: 14.6 MB
Statistics
- Stars: 17
- Watchers: 4
- Forks: 3
- Open Issues: 3
- Releases: 2
Created almost 9 years ago · Last pushed about 2 years ago
Metadata Files
Readme
Changelog
License
README.Rmd
Owner
- Name: Julia Wrobel
- Login: julia-wrobel
- Kind: user
- Location: Denver, CO
- Company: University of Colorado School of Public Health
- Website: juliawrobel.com
- Twitter: julialwrobel
- Repositories: 25
- Profile: https://github.com/julia-wrobel
JOSS Publication
register: Registration for Exponential Family Functional Data
Published
February 19, 2018
Volume 3, Issue 22, Page 557
Tags
Statistical analysis, Functional data
GitHub Events
Total
- Watch event: 2
- Pull request event: 1
Last Year
- Watch event: 2
- Pull request event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| bauer-alex | a****r@s****e | 147 |
| Julia Wrobel | j****4@c****u | 132 |
| Erin McDonnell | e****7@c****u | 70 |
| muschellij2 | m****2@g****m | 40 |
| Julia Wrobel | j****1@g****m | 6 |
| julia-wrobel | j****l@c****u | 5 |
| Andrew Leroux | a****x@a****n | 3 |
| Jeff Goldsmith | j****h@c****u | 2 |
| fabian-s | f****l@g****m | 2 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 23
- Total pull requests: 17
- Average time to close issues: about 1 month
- Average time to close pull requests: 23 days
- Total issue authors: 6
- Total pull request authors: 5
- Average comments per issue: 1.17
- Average comments per pull request: 0.47
- Merged pull requests: 16
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- bauer-alex (12)
- corybrunson (4)
- julia-wrobel (3)
- eimcdonnell (2)
- qpmnguyen (1)
- fabian-s (1)
Pull Request Authors
- bauer-alex (10)
- muschellij2 (4)
- olivroy (2)
- fabian-s (1)
- julia-wrobel (1)
Top Labels
Issue Labels
enhancement (2)
Pull Request Labels
Dependencies
DESCRIPTION
cran
- R >= 3.2 depends
- MASS * imports
- Matrix * imports
- Rcpp >= 0.11.5 imports
- dplyr * imports
- gamm4 * imports
- lme4 * imports
- magrittr * imports
- mgcv * imports
- parallel * imports
- pbs * imports
- purrr * imports
- tidyr * imports
- utils * imports
- cowplot * suggests
- fastglm * suggests
- ggplot2 * suggests
- knitr * suggests
- pbapply * suggests
- rmarkdown * suggests
- testthat * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/cache v2 composite
- actions/checkout v2 composite
- actions/upload-artifact main composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v1 composite
- r-lib/actions/setup-r-dependencies v1 composite
[CRAN](https://cran.r-project.org/package=registr)
[Travis CI](https://travis-ci.org/julia-wrobel/registr)
[AppVeyor](https://ci.appveyor.com/project/julia-wrobel/registr)
[Codecov](https://codecov.io/gh/julia-wrobel/registr/coverage.svg?branch=master)
[DOI: 10.21105/joss.02964](https://doi.org/10.21105/joss.02964)
[GitHub Actions](https://github.com/julia-wrobel/registr/actions)
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE,
warning = FALSE,
message = FALSE,
collapse = TRUE)
```
Registration for incomplete exponential family functional data.
* Authors: [Julia Wrobel](http://juliawrobel.com), [Alexander Bauer](https://www.en.stablab.stat.uni-muenchen.de/people/doktoranden/bauer1/index.html), [Erin McDonnell](http://eimcdonnell.com/),
and [Jeff Goldsmith](https://jeffgoldsmith.com/)
* License: [MIT](https://opensource.org/licenses/MIT). See the [LICENSE](LICENSE) file for details
* Version: 2.1
### What it does
---------------
Functional data analysis is a set of tools for understanding patterns and variability in data where the basic unit of observation is a curve measured over some domain such as time or space. An example is an accelerometer study where intensity of physical activity was measured at each minute over 24 hours for 50 subjects. The data will contain 50 curves, where each curve is the 24-hour activity profile for a particular subject.
Classic functional data analysis assumes that each curve is continuous or comes from a Gaussian distribution. However, applications with exponential family functional data -- curves that arise from any exponential family distribution, and have a smooth latent mean -- are increasingly common. For example, take the accelerometer data just mentioned, but assume researchers are interested in *sedentary behavior* instead of *activity intensity*. At each minute over 24 hours they collect a binary measurement that indicates whether a subject was active or inactive (sedentary). Now we have a *binary curve* for each subject -- a trajectory where each time point can take on a value of 0 or 1. We assume the binary curve has a smooth latent mean, which in this case is interpreted as the probability of being active at each minute over 24 hours. This is an example of exponential family functional data.
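As a rough illustration of the binary-curve idea, the sketch below generates one subject's 0/1 activity trajectory from a smooth latent probability using base R only. The shape of the latent mean and all variable names are assumptions made for this example; they are not taken from the package.

```{r, eval = FALSE, echo = TRUE}
# Hypothetical illustration (not part of registr): one subject's binary
# activity curve generated from a smooth latent probability.
set.seed(1)
minutes <- 1:1440  # one observation per minute over 24 hours

# Assumed latent mean on the logit scale: lowest around 03:00, highest around 15:00
latent_logit <- -1 + 2 * sin(2 * pi * (minutes - 540) / 1440)
prob_active  <- plogis(latent_logit)  # smooth probability of being active

# Observed binary curve: 1 = active, 0 = inactive (sedentary)
active <- rbinom(length(minutes), size = 1, prob = prob_active)

table(active)
```

Only `active` would be observed in practice; `prob_active` plays the role of the smooth latent mean.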
Often in a functional dataset curves have similar underlying patterns but the main features of each curve, such as the minimum and maximum, have shifts such that the data appear misaligned. This misalignment can obscure patterns shared across curves and produce messy summary statistics. Registration methods reduce variability in functional data and clarify underlying patterns by aligning curves.
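The effect of misalignment on summary statistics can be sketched with a toy example in base R. Here 50 curves share the same peak shape but are shifted in time; the peak function, the shift range, and the variable names are all assumptions chosen for illustration.

```{r, eval = FALSE, echo = TRUE}
# Hypothetical illustration: 50 curves with a common peak shape, each shifted in time.
set.seed(2)
grid   <- seq(0, 1, length.out = 200)
shifts <- runif(50, min = -0.2, max = 0.2)  # assumed subject-specific peak shifts
peak   <- function(x) exp(-(x - 0.5)^2 / (2 * 0.05^2))

unregistered <- sapply(shifts, function(s) peak(grid - s))  # 200 x 50, misaligned peaks
registered   <- sapply(shifts, function(s) peak(grid))      # 200 x 50, aligned peaks

# The pointwise mean of the misaligned curves has a much lower, wider peak
# than the pointwise mean of the aligned curves.
max(rowMeans(unregistered))
max(rowMeans(registered))
```

The flattened mean of the misaligned curves no longer resembles any individual curve, which is the kind of distortion registration is meant to remove.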
This package implements statistical methods for registering exponential family functional data. The basic methods are described in more detail in our [paper](http://juliawrobel.com/Downloads/registration_ef.pdf) and were further adapted to (potentially) incomplete curve settings where (some) curves
are not observed from the very beginning and/or until the very end of the common domain.
For details on the incomplete curve methodology and how to use it see the corresponding package vignette.
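To make the notion of incomplete curves concrete, the following base R sketch builds a small long-format dataset in which each curve is observed only on part of the common domain. The column names mirror those used in the examples further down (`id`, `index`, `value`), but the data-generating choices are assumptions for illustration and this is not the package's own simulation code.

```{r, eval = FALSE, echo = TRUE}
# Hypothetical illustration: five curves on the common domain [0, 1],
# each observed only on a subject-specific sub-interval.
set.seed(3)
grid <- seq(0, 1, length.out = 100)

curves <- lapply(1:5, function(i) {
  start    <- runif(1, 0, 0.2)  # leading incompleteness: observation starts late
  end      <- runif(1, 0.8, 1)  # trailing incompleteness: observation stops early
  observed <- grid >= start & grid <= end
  data.frame(id = i, index = grid[observed], value = sin(2 * pi * grid[observed]))
})
incomplete_data <- do.call(rbind, curves)

# Observed range of the domain for each curve
aggregate(index ~ id, data = incomplete_data, FUN = range)
```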
Instructions for installing the software and using it to register simulated binary data are provided below.
### Installation
---------------
To install from `CRAN`, please use:
```{r, eval = FALSE, echo = TRUE}
install.packages("registr")
```
To install the latest version directly from GitHub, please use:
```{r, eval = FALSE, echo = TRUE}
install.packages("devtools")
devtools::install_github("julia-wrobel/registr")
```
The `registr` package includes vignettes with more details on package use and functionality. To install the latest version and pull up the vignettes please use:
```{r, eval = FALSE, echo = TRUE}
devtools::install_github("julia-wrobel/registr", build_vignettes = TRUE)
vignette(package = "registr")
```
### How to use it
---------------
This example registers simulated binary data. More details on the use of the package can be found in the vignettes mentioned above.
The code below uses `registr::simulate_unregistered_curves()` to simulate curves for 100 subjects with 200 timepoints each, observed over domain $(0, 1)$. All curves have similar structure but the location of the peak is shifted. On the observed domain $t^*$ the curves are unregistered (misaligned). On the domain $t$ the curves are registered (aligned).
```{r simulate_data, echo = TRUE}
library(registr)
registration_data = simulate_unregistered_curves(I = 100, D = 200, seed = 2018)
```
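Before plotting, it can help to look at the structure of the simulated data. The sketch below assumes the returned data frame contains the columns that the plotting code in this README relies on (`id`, `index`, `t`, `latent_mean`, `value`); other columns may be present as well.

```{r, eval = FALSE, echo = TRUE}
library(registr)
registration_data <- simulate_unregistered_curves(I = 100, D = 200, seed = 2018)

# Columns used in the plots of this README:
#   id          subject identifier
#   index       observed (unregistered) time, t_star
#   t           true registered time
#   latent_mean latent mean on the logit scale
#   value       observed binary value
head(registration_data[, c("id", "index", "t", "latent_mean", "value")])

# Number of simulated subjects
length(unique(registration_data$id))
```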
The plot below shows the unregistered curves and registered curves.
```{r plot_sim_data, fig.align='center', fig.height=3, fig.width=9}
library(tidyverse)
library(cowplot)
unreg = ggplot(registration_data, aes(x = index, y = boot::inv.logit(latent_mean),
group = id)) +
geom_path(alpha = .25) + theme_bw() +
labs(x = "t_star", y = "Prob(Y = 1)")
reg = ggplot(registration_data, aes(x = t, y = boot::inv.logit(latent_mean),
group = id)) +
geom_path(alpha = .25) + theme_bw() +
labs(x = "t", y = "Prob(Y = 1)")
cowplot::plot_grid(unreg, reg, ncol = 2)
```
Continuously observed curves are shown above in order to illustrate the misalignment problem and our simulated data; the simulated dataset also includes binary values which have been generated by using these continuous curves as probabilities. The unregistered and registered binary curves for two subjects are shown below.
```{r plot_2subjs, fig.align='center', fig.height=3, fig.width=9}
IDs = c(63, 85)
sub_data = registration_data %>% filter(id %in% IDs)
unreg = ggplot(sub_data, aes(x = index, y = boot::inv.logit(latent_mean),
group = id, color = factor(id))) +
geom_path() + theme_bw() + theme(legend.position = "none") +
geom_point(aes(y = value), alpha = 0.25, size = 0.25) +
labs(x = "t_star", y = "Prob(Y = 1)")
reg = ggplot(sub_data, aes(x = t, y = boot::inv.logit(latent_mean),
group = id, color = factor(id))) +
geom_path() + theme_bw() + theme(legend.position = "none") +
geom_point(aes(y = value), alpha = 0.25, size = 0.25) +
labs(x = "t", y = "Prob(Y = 1)")
cowplot::plot_grid(unreg, reg, ncol = 2)
```
Our software registers curves by estimating $t$. For this we use the function `register_fpca()`.
```{r register_data, echo = TRUE, message = TRUE}
binary_registration = register_fpca(Y = registration_data, family = "binomial",
Kt = 6, Kh = 4, npc = 1)
```
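Because the data are simulated, the estimated registered time can be compared with the true one. The sketch below is one possible check, not a diagnostic provided by the package; it assumes that `binary_registration$Y` carries the columns `id`, `t`, and `t_hat`, as used by the plotting code in this README, and the name `warp_error` is made up for this example.

```{r, eval = FALSE, echo = TRUE}
library(registr)
registration_data   <- simulate_unregistered_curves(I = 100, D = 200, seed = 2018)
binary_registration <- register_fpca(Y = registration_data, family = "binomial",
                                     Kt = 6, Kh = 4, npc = 1)

fitted <- binary_registration$Y

# Per-subject mean absolute gap between estimated (t_hat) and true (t) registered time
warp_error <- tapply(abs(fitted$t_hat - fitted$t), fitted$id, mean)
summary(warp_error)
```

Smaller values indicate that the estimated warping is closer to the true one for that subject.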
The plot below shows unregistered, true registered, and estimated registered binary curves for two subjects after fitting our method.
```{r plot_fit, fig.align='center', fig.height=3, fig.width=9}
sub_data = binary_registration$Y %>% filter(id %in% IDs)
unreg = ggplot(sub_data, aes(x = tstar, y = boot::inv.logit(latent_mean),
group = id, color = factor(id))) +
geom_path() + theme_bw() + theme(legend.position = "none") +
geom_point(aes(y = value), alpha = 0.25, size = 0.25) +
labs(x = "t_star", y = "Prob(Y = 1)")
reg = ggplot(sub_data, aes(x = t, y = boot::inv.logit(latent_mean),
group = id, color = factor(id))) +
geom_path() + theme_bw() + theme(legend.position = "none") +
geom_point(aes(y = value), alpha = 0.25, size = 0.25) +
labs(x = "t", y = "Prob(Y = 1)")
reg_hat = ggplot(sub_data, aes(x = t_hat, y = boot::inv.logit(latent_mean),
group = id, color = factor(id))) +
geom_path() + theme_bw() + theme(legend.position = "none") +
geom_point(aes(y = value), alpha = 0.25, size = 0.25) +
labs(x = "t_hat", y = "Prob(Y = 1)")
cowplot::plot_grid(unreg, reg, reg_hat, ncol = 3)
```
### Citation
If you like our software, please cite it in your work! To cite the latest `CRAN` version of the package with `BibTeX`, use
```{}
@Manual{,
title = {registr: Registration for Exponential Family Functional Data},
author = {Julia Wrobel and Alexander Bauer and Erin McDonnell and Jeff Goldsmith},
year = {2022},
note = {R package version 2.1.0},
url = {https://CRAN.R-project.org/package=registr},
}
```
To cite the 2021 Journal of Open Source Software paper, use
```{}
@article{wrobel2021registr,
title={registr 2.0: Incomplete Curve Registration for Exponential Family Functional Data},
author={Wrobel, Julia and Bauer, Alexander},
journal={Journal of Open Source Software},
volume={6},
number={61},
pages={2964},
year={2021}
}
```
To cite the 2018 Journal of Open Source Software paper, use
```{}
@article{wrobel2018regis,
title={registr: Registration for Exponential Family Functional Data},
author={Wrobel, Julia},
journal={The Journal of Open Source Software},
volume={3},
number={22},
pages={557},
year={2018}
}
```
### Contributions
---------------
If you find small bugs, larger issues, or have suggestions, please file them using the [issue tracker](https://github.com/julia-wrobel/registr/issues) or email the maintainer. Contributions (via pull requests or otherwise) are welcome.