splithalf
splithalf: robust estimates of split half reliability - Published in JOSS (2021)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 6 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Scientific Fields
- Medicine (Life Sciences) - 40% confidence
- Neuroscience (Life Sciences) - 40% confidence
- Artificial Intelligence and Machine Learning (Computer Science) - 40% confidence
Last synced: 4 months ago
Repository
splithalf
Basic Info
- Host: GitHub
- Owner: sdparsons
- License: gpl-3.0
- Language: R
- Default Branch: master
- Size: 1010 KB
Statistics
- Stars: 12
- Watchers: 1
- Forks: 6
- Open Issues: 8
- Releases: 7
Created almost 9 years ago · Last pushed over 3 years ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
bibliography: inst/book.bib
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
```{r, echo = FALSE}
library("ggplot2")
```
[DOI: 10.21105/joss.03041](https://doi.org/10.21105/joss.03041)
[CRAN: splithalf](https://cran.r-project.org/package=splithalf)
# splithalf: robust estimates of split half reliability
The `R` package **splithalf** provides tools to estimate the internal consistency reliability of cognitive measures. In particular, the tools were developed for application to tasks that use difference scores as the main outcome measure, for instance the Stroop score or dot-probe attention bias index (average RT in incongruent trials minus average RT in congruent trials).
The methods in **splithalf** are built around split half reliability estimation. To increase the robustness of these estimates, the package implements a permutation approach that takes a large number of random (without replacement) split halves of the data. For each permutation the correlation between halves is calculated, with the Spearman-Brown correction applied [@spearman_proof_1904]. This process generates a distribution of reliability estimates from which we can extract summary statistics (e.g. average and 95% HDI).
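For intuition, here is a minimal sketch of a single random split and the Spearman-Brown correction; `rt_by_participant` is a hypothetical participants-by-trials matrix of RTs (not part of the package), and **splithalf** itself repeats this over thousands of splits in compiled code.

```{r eval = FALSE}
## one random (without replacement) split of 80 trials
half1 <- sample(1:80, size = 40)
half2 <- setdiff(1:80, half1)

## score each half per participant (rt_by_participant is a hypothetical
## participants-by-trials matrix of RTs)
score1 <- rowMeans(rt_by_participant[, half1])
score2 <- rowMeans(rt_by_participant[, half2])

r    <- cor(score1, score2)     ## split-half correlation
r_sb <- (2 * r) / (1 + r)       ## Spearman-Brown corrected estimate
```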
### Why should I estimate the reliability of my task measurement?
While many cognitive tasks yield robust effects (e.g. everybody shows a Stroop effect), they may not yield reliable individual differences [@hedge_reliability_2018]. As these measures are used in questions of individual differences, researchers need to have some psychometric information for the outcome measures. Recently, it was proposed that psychological science should set a standard expectation for the reporting of reliability information for cognitive and behavioural measures [@parsons_kruijt_fox_2019]. **splithalf** was developed to support this proposition by providing a tool to easily extract internal consistency reliability estimates from behavioural measures.
## Installation
The latest release version (`0.7.2` unofficial version name: Kitten Mittens) can be installed from CRAN:
```{r eval=FALSE}
install.packages("splithalf")
```
The current developmental version (`0.8.2` unofficial version name: I eat stickers all the time, dude!) can be installed from Github with:
```{r eval=FALSE}
devtools::install_github("sdparsons/splithalf")
```
**splithalf** requires the **tidyr** [@R-tidyr] and **dplyr** [@R-dplyr] packages for data handling within the functions. The **robustbase** package is used to extract median scores when applicable. The computationally heavy tasks (extracting many random half samples of the data) are written in `C++` via the `R` package **Rcpp** [@R-Rcpp]. Figures use the **ggplot2** package [@R-ggplot2], raincloud plots use code adapted from Allen et al. [@allen_raincloud_2019], and the **patchwork** package [@R-patchwork] is used for plotting the multiverse analyses.
```{r eval = FALSE, include = FALSE, echo = FALSE}
library("tidyr")
library("dplyr")
library("Rcpp")
library("robustbase")
```
### Citing the package
Citing packages is one way for developers to gain some recognition for the time spent maintaining the work. I would like to keep track of how the package is used so that I can solicit feedback and improve the package more generally. This would also help me track the uptake of reporting measurement reliability over time.
Please use the following reference for the code: Parsons, S., (2021). splithalf: robust estimates of split half reliability. _Journal of Open Source Software, 6_(60), 3041, https://doi.org/10.21105/joss.03041
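If you have the package installed, base R's `citation()` function will also print citation details:

```{r eval = FALSE}
citation("splithalf")
```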
### User feedback
Developing the splithalf package is a labour of love (and occasionally burning hatred). If you have any suggestions for improvement, additional functionality, or anything else, please contact me (sam.parsons\@psy.ox.ac.uk) or raise an issue on github (https://github.com/sdparsons/splithalf). Likewise, if you are having trouble using the package (e.g. scream fits at your computer screen – we’ve all been there), do contact me and I will do my best to help as quickly as possible. These kinds of help requests are super welcome. In fact, the package has seen several increases in performance and usability due to people asking for help.
## Latest updates:
**Version 0.8.1 now out! [unofficial version name: "Rum Ham"]** Lots of fixed issues in the multiverse functions, and lots more documentation/examples!
**Now on github and submitted to CRAN: VERSION 0.7.2 [unofficial version name: "Kitten Mittens"]** Featuring reliability multiverse analyses!!!
This update includes the addition of reliability multiverse functions. A _splithalf_ object can be inputted into _splithalf.multiverse_ to estimate reliability across a user-defined list of data-processing specifications (e.g. min/max RTs). Then, because sometimes science is more art than science, the multiverse output can be plotted with _multiverse.plot_. A brief tutorial can be found below, and a more comprehensive one can be found in a recent preprint [@parsons_exploring_2020].
Additionally, the output of _splithalf_ has been reworked: it now returns a list that includes, among other elements, the specific function call and the processed data.
Since version 0.6.2 a user can also set `plot = TRUE` in _splithalf_ to generate a raincloud plot of the distribution of reliability estimates.
## Examples
### A note on terminology used in this document
It is important that we have a similar understanding of the terminology I use in the package and documentation. Each is also discussed in reference to the functions later in the documentation.
* Trial – whatever happens in this task, e.g. a stimulus is presented. Importantly, participants give one response per trial
* Trial type – often trials can be split into different trial types (e.g. to compare congruent and incongruent trials)
* Condition - this might be different blocks of trials, or something to be assessed separately within the functions. e.g. a task might have a block of 'positive' trials and a block of 'negative' trials.
* Datatype - I use this to refer to the outcome of interest, specifically whether one is interested in average response times or accuracy rates
* Score - I use score to indicate how the final outcome is measured; e.g. the average RT, or the difference between two average RTs, or even the difference between two differences between two RTs (yes, the final one is confusing)
### A note on preprocessing
The core function _splithalf_ requires that the input dataset has already undergone preprocessing (e.g. removal of error trials, RT trimming, and participants with high error rates). Splithalf should therefore be used with the same data that will be used to calculate summary scores and outcome indices. The exception is in multiverse analyses, as described below.
For those unfamiliar with R, the following snippets may help with common data-processing steps. I also highly recommend the Software Carpentry course "R for Reproducible Scientific Analysis" (https://swcarpentry.github.io/r-novice-gapminder/).
Note: `==` indicates ‘is equal to’, and `::` indicates which package a function comes from; in the first case, the **dplyr** package [@R-dplyr].
```{r eval = FALSE}
dataset %>%
  dplyr::filter(accuracy == 1) %>%                  ## keep only trials with an accurate response
  dplyr::filter(RT >= 100, RT <= 2000) %>%          ## remove RTs below 100ms or above 2000ms
  dplyr::filter(!participant %in% c("p23", "p45"))  ## remove participants "p23" and "p45"
```
If, after the RT trims, you also trimmed by SD, use the following as well. Note that this trims at two standard deviations from the mean, within each participant and within each condition and trial type.
```{r eval = FALSE}
dataset %>%
  dplyr::group_by(participant, condition, compare) %>%
  dplyr::mutate(low  = mean(RT) - (2 * sd(RT)),
                high = mean(RT) + (2 * sd(RT))) %>%
  dplyr::filter(RT >= low & RT <= high)
```
If you want to save yourself effort in running splithalf, you could also rename your variable (column) names to the function defaults using the following
```{r eval = FALSE}
## replace the quoted names on the right with your own column names
dplyr::rename(dataset,
              RT = "latency",
              condition = "block",
              participant = "subject",
              correct = "correct",
              trialnum = "trialnum",
              compare = "congruency")
```
The following examples assume that you have already processed your data to remove outliers, apply any RT cutoffs, etc. A reminder: the data you input into _splithalf_ should be the same as that used to create your final scores - otherwise the resultant estimates will not accurately reflect the reliability of your data.
### Questions to ask before running splithalf
These questions should feed into which settings are appropriate for your needs, and are intended to make the _splithalf_ function easy to use.
1. **What type of data do you have?**
Are you interested in response times, or accuracy rates?
Knowing this, you can set `outcome = "RT"` or `outcome = "accuracy"` (an accuracy-based sketch follows this list).
2. **How is your outcome score calculated?**
Say that your response time based task has two trial types; "incongruent" and "congruent". When you analyse your data will you use the average RT in each trial type, or will you create a difference score (or bias) by e.g. subtracting the average RT in congruent trials from the average RT in incongruent trials. The first can be called with `score = "average"` and the second with `score = "difference"`.
3. **Which method would you like to use to estimate (split-half) reliability?**
A super common way is to split the data into odd and even trials. Another is to split by the first and second halves of the trials. Both approaches are implemented in the _splithalf_ function. However, I believe that the permutation splithalf approach is the most applicable in general, and so the default is `halftype = "random"`.
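Following up on point 1: an accuracy-based call is not shown in the worked examples below, but a sketch (using the simulated dataset created in the next section, where `ACC` is the accuracy column; the simulated accuracy is constant, so this is purely illustrative) might look like:

```{r eval = FALSE}
accuracy_example <- splithalf(data = sim_data,
                              outcome = "accuracy",
                              score = "difference",
                              conditionlist = c("A", "B"),
                              halftype = "random",
                              permutations = 5000,
                              var.ACC = "ACC",
                              var.condition = "block_name",
                              var.participant = "participant_number",
                              var.compare = "trial_type",
                              compare1 = "congruent",
                              compare2 = "incongruent")
```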
### An example dataset
For this brief example, we will simulate some data for 60 participants, who each completed a task with two blocks (A and B) of 80 trials. Trials are also evenly distributed between "congruent" and "incongruent" trials. For each trial we have RT data, and are assuming that participants were accurate in all trials.
```{r}
n_participants = 60 ## sample size
n_trials = 80
n_blocks = 2
sim_data <- data.frame(participant_number = rep(1:n_participants, each = n_blocks * n_trials),
                       trial_number = rep(1:n_trials, times = n_blocks * n_participants),
                       block_name = rep(c("A", "B"), each = n_trials, length.out = n_participants * n_trials * n_blocks),
                       trial_type = rep(c("congruent", "incongruent"), length.out = n_participants * n_trials * n_blocks),
                       RT = rnorm(n_participants * n_trials * n_blocks, 500, 200),
                       ACC = 1)
```
### Difference scores
This is by far the most common outcome measure I have come across, so let's start with that.
Our data will be analysed so that we have two 'bias' or 'difference score' outcomes. So, within each block, we will take the average RT in congruent trials and subtract the average RT in incongruent trials. Calculating the final scores for each participant and for each block separately could be done as follows:
```{r comment = NA, echo = TRUE, message = FALSE}
library("dplyr")
library("tidyr")
sim_data %>%
  dplyr::group_by(participant_number, block_name, trial_type) %>%
  dplyr::summarise(average = mean(RT)) %>%
  tidyr::spread(trial_type, average) %>%
  dplyr::mutate(bias = congruent - incongruent)
```
To estimate reliability with _splithalf_ we run the following.
```{r message = FALSE, comment = NA, results = 'hide', warning = FALSE, fig.keep='none'}
library("splithalf")
difference <- splithalf(data = sim_data,
                        outcome = "RT",
                        score = "difference",
                        conditionlist = c("A", "B"),
                        halftype = "random",
                        permutations = 5000,
                        var.RT = "RT",
                        var.condition = "block_name",
                        var.participant = "participant_number",
                        var.compare = "trial_type",
                        compare1 = "congruent",
                        compare2 = "incongruent",
                        average = "mean",
                        plot = TRUE)
```
```{r echo = FALSE, comment = NA}
difference$final_estimates
```
Specifying `plot = TRUE` also allows you to plot the distributions of reliability estimates. You can extract the plot from a saved object with e.g. `difference$plot`.
```{r plotting_distribution, echo = FALSE, comment = NA}
difference$plot
```
### Reading and reporting the output
The _splithalf_ output gives estimates separately for each condition defined (if no condition is defined, the function assumes that you have only a single condition, which it will call "all" to represent that all trials were included).
The second column (n) gives the number of participants analysed. If, for some reason, one participant has too few trials to analyse, or did not complete one condition, this will be reflected here. I suggest you compare this n to your expected n to check that everything is running correctly. If the ns don't match, we have a problem (more likely, R will give an error message, but it is useful to know).
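As a quick sanity check (assuming `final_estimates` is a data frame with an `n` column, as in the printed output above):

```{r eval = FALSE}
all(difference$final_estimates$n == n_participants)  ## TRUE if no participants were dropped
```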
Next are the estimates: the splithalf column and the associated 95% percentile intervals, and the Spearman-Brown corrected estimate with its own percentile intervals. Unsurprisingly, our simulated random data does not yield internally consistent measurements.
*What should I report?* My preference is to report everything. 95% percentiles of the estimates are provided to give a picture of the spread of internal consistency estimates. Also included are the Spearman-Brown corrected estimates, which take into account that the estimates are drawn from half of the trials that they could have been. Note that negative reliabilities are near uninterpretable, and the Spearman-Brown formula is not useful in that case.
> We estimated the internal consistency of bias A and B using a permutation-based splithalf approach [@R-splithalf] with 5000 random splits. The (Spearman-Brown corrected) splithalf internal consistency of bias A was _r_~SB~ = `r difference$final_estimates[1,6]`, 95%CI [`r difference$final_estimates[1,7]`,`r difference$final_estimates[1,8]`].
>
> --- Parsons, 2020
### Average scores
For some tasks the outcome measure may simply be the average RT. In this case, we will ignore the trial type option. We will extract separate outcome scores for each block of trials, but this time it is simply the average RT in each block. Note that the main difference in this code is that we have omitted the inputs about what trial types to 'compare', as this is irrelevant for the current task.
```{r message = FALSE, comment = NA, results = 'hide'}
average <- splithalf(data = sim_data,
                     outcome = "RT",
                     score = "average",
                     conditionlist = c("A", "B"),
                     halftype = "random",
                     permutations = 5000,
                     var.RT = "RT",
                     var.condition = "block_name",
                     var.participant = "participant_number",
                     average = "mean")
```
```{r echo = FALSE, comment = NA}
average$final_estimates
```
### Difference-of-difference scores
The difference of differences score is a bit more complex, and perhaps also less common. I programmed this aspect of the package initially because I had seen a few papers that used a change in bias score in their analysis, and I wondered how reliable that is as an individual difference measure. Be warned: difference scores are nearly always less reliable than raw averages, and differences-of-differences will be less reliable again.
Our difference-of-difference variable in our task is the difference between the bias observed in block A and the bias observed in block B. So our outcome is calculated something like this:
* BiasA = incongruent_A - congruent_A
* BiasB = incongruent_B - congruent_B
* Outcome = BiasB - BiasA
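Mirroring the earlier **dplyr** example, the outcome itself could be calculated by hand along these lines (a sketch using the simulated data):

```{r eval = FALSE}
sim_data %>%
  dplyr::group_by(participant_number, block_name, trial_type) %>%
  dplyr::summarise(average = mean(RT)) %>%
  tidyr::spread(trial_type, average) %>%
  dplyr::mutate(bias = incongruent - congruent) %>%   ## bias within each block
  dplyr::ungroup() %>%
  dplyr::select(participant_number, block_name, bias) %>%
  tidyr::spread(block_name, bias) %>%
  dplyr::mutate(outcome = B - A)                      ## difference between the two biases
```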
In our function, we specify this very similarly to the difference score example. The only change is setting `score = "difference_of_difference"`. Note that we will keep the condition list consisting of A and B; specifying that we are interested in the difference of differences leads the function to calculate the outcome scores appropriately.
```{r message = FALSE, comment = NA, results = 'hide'}
diff_of_diff <- splithalf(data = sim_data,
                          outcome = "RT",
                          score = "difference_of_difference",
                          conditionlist = c("A", "B"),
                          halftype = "random",
                          permutations = 5000,
                          var.RT = "RT",
                          var.condition = "block_name",
                          var.participant = "participant_number",
                          var.compare = "trial_type",
                          compare1 = "congruent",
                          compare2 = "incongruent",
                          average = "mean")
```
```{r echo = FALSE, comment = NA}
diff_of_diff$final_estimates
```
## Multiverse analysis extension
This example is simplified from Parsons [-@parsons_exploring_2020]. The process takes four steps. First, specify a list of data processing decisions. Here, we'll specify only minimum and maximum RT cutoffs and the averaging method. More options are available, such as total accuracy cutoff thresholds for participant removals.
```{r}
specifications <- list(RT_min = c(0, 100, 200),
                       RT_max = c(1000, 2000),
                       averaging_method = c("mean", "median"))
```
Second, perform `splithalf(...)`. The key difference here, compared to the earlier examples, is that we are no longer assuming that all data processing has already occurred. Instead, the data processing will be performed as part of `splithalf.multiverse`, next.
```{r message = FALSE, comment = NA, results = 'hide'}
difference <- splithalf(data = sim_data,
                        outcome = "RT",
                        score = "difference",
                        conditionlist = c("A"),
                        halftype = "random",
                        permutations = 5000,
                        var.RT = "RT",
                        var.condition = "block_name",
                        var.participant = "participant_number",
                        var.compare = "trial_type",
                        var.ACC = "ACC",
                        compare1 = "congruent",
                        compare2 = "incongruent",
                        average = "mean")
```
Third, perform `splithalf.multiverse` with the specification list and the splithalf object as inputs.
```{r message = FALSE, comment = NA, results = 'hide'}
multiverse <- splithalf.multiverse(input = difference,
                                   specifications = specifications)
```
Finally, plot the multiverse with `multiverse.plot`.
```{r multiverse_plot}
multiverse.plot(multiverse = multiverse,
                title = "README multiverse")
```
For more information, see Parsons (2020).
## Other considerations
### how many permutations?
To examine how many random splits are required to provide a precise estimate, a short simulation was performed, computing 20 Spearman-Brown reliability estimates for each of 1, 10, 50, 100, 1000, 2000, 5000, 10000, and 20000 random splits. This simulation was performed on data from one block of 80 trials. Based on this simulation, I recommend that 5000 (or more) random splits be used to calculate split-half reliability. 5000 permutations yielded a standard deviation of .002 and a total range of .008, indicating that the reliability estimates are stable to two decimal places with this number of random splits. Increasing the number of splits improved precision further; however, 20000 splits were required to reduce the standard deviation to .001.
### *splithalf* runtime?
The runtime of *splithalf* depends on the number of conditions, participants, trials, and permutations, as well as your machine speed. For relative times, I ran a simulation with a range of sample sizes, numbers of conditions, numbers of trials, and permutations. The data is contained within the package as `data/speedtestdata.rda`.
```{r speedtest, message = FALSE, comment = NA, results = 'hide', echo = FALSE}
load("data/speedtestdata.rda")
speed <- speedtestdata
speed2 <- speed
speed2$sample_size <- as.factor(speed2$sample_size)
speed2$Number_conditions <- as.factor(speed2$Number_conditions)
speed2$trials <- as.factor(speed2$trials)
speed2$permutations <- as.factor(speed2$permutations)
ggplot(data = speed2,
aes(x = permutations, y = runtime, colour = sample_size, group = interaction(trials, sample_size))) +
geom_point(size = 2) +
geom_line() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(y = "runtime (seconds)") +
guides(colour = guide_legend(reverse = TRUE)) +
facet_grid(vars(trials), vars(Number_conditions))
```
## Future development
The **splithalf** package is still under development. If you have suggestions for improvements to the package, or bugs to report, please raise an issue on github (https://github.com/sdparsons/splithalf). Currently, I have the following on my immediate to do list:
* error tracking
* I plan to develop a function that catches potential issues that could arise with the functions.
* I also plan to document common R warnings and errors that arise and why (as sometimes without knowing exactly how the functions work these issues can be harder to trace back).
* include other scoring methods:
* signal detection, e.g. d prime
* potentially customisable methods, e.g. where the outcome is scored in formats similar to A - B / A + B
## Comparison to other software
**splithalf** is the only package to implement all of these tools, in particular reliability multiverse analyses. Some other `R` packages offer a bootstrapped approach to split-half reliability: **multicon** [@R-multicon], **psych** [@R-psych], and **splithalfr** [@R-splithalfr].
## References
Owner
- Name: Sam Parsons
- Login: sdparsons
- Kind: user
- Location: Donders Institute, Netherlands
- Website: https://sdparsons.github.io/
- Twitter: sam_d_parsons
- Repositories: 3
- Profile: https://github.com/sdparsons
Postdoc Fellow @lcd_lab_donders | Reliability and psychometrics in cognitive neuroscience | Initiatives: @ReproducibiliT @FORRTproject | Trending towards bawbag
JOSS Publication
splithalf: robust estimates of split half reliability
Published
April 23, 2021
Volume 6, Issue 60, Page 3041
Authors
Sam Parsons
Department of Experimental Psychology, University of Oxford
Tags
reliability, cognitive tasks, multiverse, split half, methods
Papers & Mentions
Total mentions: 4
The Reliability of the Wisconsin Card Sorting Test in Clinical Practice
- DOI: 10.1177/1073191119866257
- OpenAlex ID: https://openalex.org/W2965721118
- Published: August 2019
Last synced: 3 months ago
The CogBIAS longitudinal study of adolescence: cohort profile and stability and change in measures across three waves
- DOI: 10.1186/s40359-019-0342-8
- OpenAlex ID: https://openalex.org/W2988702118
- Published: November 2019
Last synced: 3 months ago
Cognitive, Affective, and Feedback-Based Flexibility – Disentangling Shared and Different Aspects of Three Facets of Psychological Flexibility
- DOI: 10.5334/joc.120
- OpenAlex ID: https://openalex.org/W3084358907
- Published: September 2020
Last synced: 3 months ago
Combining Web-Based Attentional Bias Modification and Approach Bias Modification as a Self-Help Smoking Intervention for Adult Smokers Seeking Online Help: Double-Blind Randomized Controlled Trial
- DOI: 10.2196/16342
- OpenAlex ID: https://openalex.org/W3002955230
- Published: May 2020
Last synced: 3 months ago
GitHub Events
Total
- Issues event: 1
- Watch event: 1
Last Year
- Issues event: 1
- Watch event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Sam Parsons | s****s@p****k | 127 |
| Olivia Guest | o****t | 1 |
Committer Domains (Top 20 + Academic)
psy.ox.ac.uk: 1
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 12
- Total pull requests: 4
- Average time to close issues: about 2 months
- Average time to close pull requests: 3 months
- Total issue authors: 10
- Total pull request authors: 4
- Average comments per issue: 1.08
- Average comments per pull request: 0.75
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- rMassimiliano (2)
- Nathaniel-Haines (2)
- sdparsons (1)
- jpsnijder (1)
- kyla-mcconnell (1)
- franfrutos (1)
- blueja5 (1)
- juju-bowow (1)
- OliJimbo (1)
- Foxi1309 (1)
Pull Request Authors
- oliviaguest (1)
- sdparsons (1)
- belardi (1)
- OliJimbo (1)
Packages
- Total packages: 1
- Total downloads: 304 last-month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 7
- Total maintainers: 1
cran.r-project.org: splithalf
Calculate Task Split Half Reliability Estimates
- Homepage: https://github.com/sdparsons/splithalf
- Documentation: http://cran.r-project.org/web/packages/splithalf/splithalf.pdf
- License: GPL-3
- Latest release: 0.8.2 (published over 3 years ago)
Rankings
Forks count: 11.3%
Stargazers count: 16.3%
Average: 24.3%
Downloads: 28.6%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 5 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.5 depends
- Rcpp * imports
- dplyr * imports
- ggplot2 * imports
- grid * imports
- lme4 * imports
- methods * imports
- patchwork * imports
- plyr * imports
- psych * imports
- robustbase * imports
- stats * imports
- tidyr * imports
- knitr * suggests
- rmarkdown * suggests
- tools * suggests