codaredistlm

Functions to analyse compositional data and produce confidence intervals for relative increases and decreases in the compositional components

https://github.com/tystan/codaredistlm

Keywords

coda compositional-data-analysis isometric-log-ratio multiple-linear-regression plotting prediction

Last synced: 9 months ago

Repository


Basic Info

Host: GitHub
Owner: tystan
Language: R
Default Branch: main
Homepage:
Size: 150 KB

Statistics

Stars: 9
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

coda compositional-data-analysis isometric-log-ratio multiple-linear-regression plotting prediction

Created over 3 years ago · Last pushed over 3 years ago

Metadata Files

Readme Changelog

README.md

Please note that this package (codaredistlm, Compositional Data Analysis [CoDA] redistribution linear model) is the new, actively maintained version of the package previously known as deltacomp.

The `codaredistlm` package

Functions to analyse compositional data and produce predictions (with confidence intervals) for relative increases and decreases in the compositional parts.

1. Background

For an outcome variable Y, D compositional parts (x_1, ..., x_D) and C covariates (z_1, ..., z_C), this package fits the compositional data analysis model (notation inexact):

Y = b_0 + b_1 ilr_1 + ... + b_{D-1} ilr_{D-1} + a_1 z_1 + ... + a_C z_C + e

where ilr_i are the D-1 isometric log ratio variables derived from the D compositional parts (x_1, ..., x_D); b_0, ..., b_{D-1}, a_1, ..., a_C are the D+C parameters to be estimated; and e ~ N(0, sigma) is the error. The package then makes predictions for alterations of the time-use variables (the linearly dependent set of compositional parts) based on this model.

For a starting point to learn about compositional data analysis, please see Aitchison (1982) or van den Boogaart and Tolosana-Delgado (2013). However, the articles Dumuid et al. (2017a) and Dumuid et al. (2017b) may be more approachable introductions.
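The model above can be sketched in base R. This is only an illustration under stated assumptions: the helper `ilr_pivot()` is hypothetical (it is not a function of this package), it uses pivot coordinates as one valid orthonormal ilr basis (the package may use a different basis internally), and the data are simulated.

```R
# Hypothetical helper: pivot-coordinate ilr transform (one of many valid ilr bases)
ilr_pivot <- function(x) {
  x <- as.matrix(x)
  D <- ncol(x)
  lx <- log(x)
  z <- matrix(NA_real_, nrow = nrow(x), ncol = D - 1)
  for (i in seq_len(D - 1)) {
    # log of part i versus the geometric mean of the parts after it
    rest <- lx[, (i + 1):D, drop = FALSE]
    z[, i] <- sqrt((D - i) / (D - i + 1)) * (lx[, i] - rowMeans(rest))
  }
  colnames(z) <- paste0("ilr", seq_len(D - 1))
  z
}

# Simulated data: D = 3 compositional parts and C = 1 covariate
set.seed(1)
n <- 50
raw <- matrix(runif(n * 3, min = 1, max = 10), ncol = 3)
comp <- raw / rowSums(raw) # close each row so the parts sum to 1
z1 <- rnorm(n)
ilr <- ilr_pivot(comp)
y <- 1 + 0.5 * ilr[, 1] - 0.3 * ilr[, 2] + 0.2 * z1 + rnorm(n, sd = 0.1)

# Y = b_0 + b_1 ilr_1 + b_2 ilr_2 + a_1 z_1 + e
fit <- lm(y ~ ilr + z1)
coef(fit) # D + C = 4 estimated parameters
```

Because the ilr variables are built from log ratios, the transform gives the same coordinates whether the rows are closed to 1 or left in their raw units.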

2. Reallocation of time-use component options

Please note that 'mean composition' here means the geometric mean on the compositional simplex and not the arithmetic mean. If these words have little meaning to you, that is not a problem, as these differently calculated means are unlikely to differ much in your dataset.
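To make the distinction concrete, here is a minimal base R sketch comparing the two means. The three compositions are made up for illustration, and the calculation (column-wise geometric means, re-closed to sum to 1) is the textbook definition rather than code taken from the package.

```R
# Hypothetical compositions: each row sums to 1
comp <- rbind(
  c(0.50, 0.30, 0.20),
  c(0.60, 0.25, 0.15),
  c(0.40, 0.35, 0.25)
)
colnames(comp) <- c("sedentary", "sleep", "activity")

arith_mean <- colMeans(comp)

geo <- exp(colMeans(log(comp))) # column-wise geometric means
geo_mean <- geo / sum(geo)      # re-close so the parts sum to 1

round(rbind(arith_mean, geo_mean), 3)
```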

2.1. Option `comparisons = "prop-realloc"`

For information on outcome prediction with time-use exchange between one part and the remaining compositional parts proportionally (the comparisons = "prop-realloc" option of the predict_delta_comps() function), please see Dumuid et al. (2017a).

2.1.1. Example

Suppose you have three (predictor) parts that sum to 1 (e.g., a day) and are used to predict an outcome variable. The three parts are sedentary, sleep and activity. Let's assume the mean sampled composition is:

sedentary = 0.5 (i.e., half a day)
sleep = 0.3 (i.e., 30% of a day)
activity = 0.2 (i.e., 20% of a day)

If you wanted to predict the change in the outcome variable when delta = +0.05 (5% of the day) is added to sedentary in the above mean composition, the option comparisons = "prop-realloc" reduces the remaining parts by that 5% in proportion to their mean values, as illustrated below:

sedentary* = 0.5 + delta = 0.5 + 0.05 = 0.55
sleep* = 0.3 - delta * sleep / (sleep + activity) = 0.3 - 0.05 * 0.3 / (0.3 + 0.2) = 0.3 - 0.03 = 0.27
activity* = 0.2 - delta * activity / (sleep + activity) = 0.2 - 0.05 * 0.2 / (0.3 + 0.2) = 0.2 - 0.02 = 0.18

Note that the new composition still sums to 1: sedentary* + sleep* + activity* = 0.55 + 0.27 + 0.18 = 1.

Note that for the example above, the option comparisons = "prop-realloc" in predict_delta_comps() will automatically produce separate predictions for a delta = +0.05 on each of the parts against the remaining parts. That is, it covers not only the sedentary* = 0.5 + delta scenario illustrated above but also the sleep* = 0.3 + delta and activity* = 0.2 + delta cases.
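The arithmetic of this example can be reproduced with a few lines of base R. The function `prop_realloc()` below is a hypothetical helper written for illustration only; it is not part of the package and it only builds the reallocated compositions, without making any predictions.

```R
# Hypothetical helper: add delta to one part and remove it from the
# remaining parts in proportion to their mean values
prop_realloc <- function(mean_comp, part, delta) {
  others <- setdiff(names(mean_comp), part)
  new_comp <- mean_comp
  new_comp[part] <- mean_comp[part] + delta
  new_comp[others] <- mean_comp[others] -
    delta * mean_comp[others] / sum(mean_comp[others])
  new_comp
}

mean_comp <- c(sedentary = 0.5, sleep = 0.3, activity = 0.2)

# the scenario worked through above: sedentary 0.55, sleep 0.27, activity 0.18
prop_realloc(mean_comp, "sedentary", 0.05)

# one row per part receiving the delta, as with comparisons = "prop-realloc"
t(sapply(names(mean_comp), function(p) prop_realloc(mean_comp, p, 0.05)))
```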

2.2. Option `comparisons = "one-v-one"`

For information on outcome prediction with time-use exchange between two compositional parts (i.e., the comparisons = "one-v-one" option of the predict_delta_comps() function), please see Dumuid et al. (2017b).

2.2.1. Example

Similarly to the previous example, suppose you have three (predictor) parts that sum to 1 (e.g., a day) and are used to predict an outcome variable. The three parts are sedentary, sleep and activity. Let's assume the mean sampled composition is:

sedentary = 0.5 (i.e., half a day)
sleep = 0.3 (i.e., 30% of a day)
activity = 0.2 (i.e., 20% of a day)

If you wanted to predict the change in the outcome variable from the above mean composition with delta = +0.05 (5% of the day), the option comparisons = "one-v-one" looks at all pairwise exchanges between the parts (sedentary*, sleep*, activity*):

(0.5 + 0.05, 0.3 - 0.05, 0.2 )
(0.5 + 0.05, 0.3 , 0.2 - 0.05)
(0.5 , 0.3 + 0.05, 0.2 - 0.05)
(0.5 - 0.05, 0.3 + 0.05, 0.2 )
(0.5 - 0.05, 0.3 , 0.2 + 0.05)
(0.5 , 0.3 - 0.05, 0.2 + 0.05)
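The six exchanges listed above can be enumerated with a short base R sketch. The function `one_v_one()` is a hypothetical helper for illustration (not the package's implementation); it lists every ordered pair of distinct parts and moves delta from one part to the other.

```R
# Hypothetical helper: all pairwise exchanges of delta between two parts
one_v_one <- function(mean_comp, delta) {
  parts <- names(mean_comp)
  pairs <- expand.grid(to = parts, from = parts, stringsAsFactors = FALSE)
  pairs <- pairs[pairs$to != pairs$from, ] # drop self-exchanges
  # start every scenario at the mean composition (one row per scenario)
  new_comps <- matrix(
    mean_comp,
    nrow = nrow(pairs), ncol = length(parts),
    byrow = TRUE, dimnames = list(NULL, parts)
  )
  for (i in seq_len(nrow(pairs))) {
    new_comps[i, pairs$to[i]] <- new_comps[i, pairs$to[i]] + delta
    new_comps[i, pairs$from[i]] <- new_comps[i, pairs$from[i]] - delta
  }
  data.frame(pairs, new_comps, row.names = NULL)
}

mean_comp <- c(sedentary = 0.5, sleep = 0.3, activity = 0.2)
one_v_one(mean_comp, 0.05)
```

With D parts there are D * (D - 1) such exchanges, so the three-part example gives six rows, each still summing to 1.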

3. Datasets in package

Two datasets are supplied with the package:

fairclough and
fat_data.

The fairclough dataset was kindly provided by the authors of Fairclough et al. (2017). fat_data is a randomly generated test dataset that might roughly mimic a real dataset.

4. Example usage

```R
library(devtools) # see https://www.r-project.org/nosvn/pandoc/devtools.html
devtools::install_github('tystan/codaredistlm')
library(codaredistlm)

# see help file to run example
?predict_delta_comps

predict_delta_comps(
  dataf = fat_data,
  y = "fat",
  comps = c("sl", "sb", "lpa", "mvpa"),
  covars = c("sibs", "parents", "ed"),
  deltas = seq(-60, 60, by = 5) / (24 * 60),
  comparisons = "prop-realloc",
  alpha = 0.05
)

# OR

predict_delta_comps(
  dataf = fat_data,
  y = "fat",
  comps = c("sl", "sb", "lpa", "mvpa"),
  covars = c("sibs", "parents", "ed"),
  deltas = seq(-60, 60, by = 5) / (24 * 60),
  comparisons = "one-v-one",
  alpha = 0.05
)
```
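The deltas argument above is expressed as proportions of the day (minutes divided by 24 * 60). As a rough sketch, the base R code below converts minutes to proportions and applies a simple conservative check that no delta is as large as the smallest part. The mean composition used here is made up for illustration, and the check is an assumption about what keeps every reallocated part positive; it is not the package's own validation.

```R
# Hypothetical mean composition in minutes per day (sums to 1440)
mean_comp_min <- c(sleep = 540, sed = 600, lpa = 270, mvpa = 30)
mean_comp <- mean_comp_min / (24 * 60) # proportions of the day

# deltas from -60 to 60 minutes in 5 minute steps, as proportions of the day
deltas <- seq(-60, 60, by = 5) / (24 * 60)

# time taken from a part must be smaller than that part, so keeping
# |delta| below the smallest part is safe for every reallocation
feasible <- abs(deltas) < min(mean_comp)

# feasible deltas, converted back to minutes
round(deltas[feasible] * 24 * 60)
```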

5. Output and plotting results

Output is a data.frame that can be turned into a plot using the following code.

```R
pred_df <- predict_delta_comps(
  dataf = fairclough,
  y = "z_bmi",
  comps = c("sleep", "sed", "lpa", "mvpa"),
  covars = c("decimal_age", "sex"),
  # careful deltas greater than 25 min in magnitude induce negative compositions
  # predict_delta_comps() will warn you about this :-)
  deltas = seq(-20, 20, by = 5) / (24 * 60),
  comparisons = "prop-realloc", # or try "one-v-one"
  alpha = 0.05
)

plot_delta_comp(
  pred_df, # provide the returned object from predict_delta_comps()
  # x-axis can be converted from proportion of composition to meaningful units
  comp_total = 24 * 60, # minutes available in the composition
  units_lab = "min" # just a label for plotting
)
```

5.1. Prediction for the mean composition

The function predict_delta_comps() now also outputs the predicted outcome value for the mean composition (with 100 * (1 - alpha)% confidence interval). This is printed to the console but can also be extracted from the output of predict_delta_comps() as per the code below:

```R
# produces a 1 line data.frame that contains
# the (simplex/geometric) mean composition,
# the "average" covariates (the median of the factor variables in order of the levels are taken as default),
# the ilr coords of the (simplex/geometric) mean composition, and
# the predicted outcome value with 100*(1-alpha)% confidence interval
attr(pred_df, "mean_pred")
```

6. Release notes

See /change-notes.md.

Owner

Name: Ty Stanford
Login: tystan
Kind: user
Location: Adelaide, Australia

Website: https://people.unisa.edu.au/Ty.Stanford/
Repositories: 6
Profile: https://github.com/tystan

(Bio)statistician

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 7
Total Committers: 2
Avg Commits per committer: 3.5
Development Distribution Score (DDS): 0.143

Past Year

Commits: 2
Committers: 1
Avg Commits per committer: 2.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Ty Stanford	t**n@g**m	6
DrLundRasmussen	C**e@g**m	1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 7 days
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 8.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

codaredistlm

Science Score: 23.0%
