r_package

https://github.com/behavioraldataanalysis/r_package

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (18.6%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: BehavioralDataAnalysis
License: other
Language: R
Default Branch: main
Size: 164 KB

Statistics

Stars: 2
Watchers: 1
Forks: 3
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Codemeta

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# BehavioralDataAnalysis


[![Codecov test coverage](https://codecov.io/gh/BehavioralDataAnalysis/R_package/branch/main/graph/badge.svg)](https://app.codecov.io/gh/BehavioralDataAnalysis/R_package?branch=main)
[![check-standard](https://github.com/BehavioralDataAnalysis/R_package/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/BehavioralDataAnalysis/R_package/actions/workflows/check-standard.yaml)
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)



**WORK IN PROGRESS! Please forgive the mess until the package is ready for release on CRAN.**


The goal of BehavioralDataAnalysis is to provide functions to help you analyze behavioral data, i.e., data that represents the behavior of human beings such as customers and employees. In particular, I believe that there are two aspects of behavioral data that are worth emphasizing: 
- It doesn't obey a normal distribution nearly as often as we assume. It can be asymmetrical (skewed), fat-tailed, kurtotic, multi-peaked, and what have you.
- We're generally interested in understanding what causes a behavior, so that we can affect it (e.g., increase customer spending or reduce employee churn). This requires experimental or quasi-experimental methods, many of which make our data even less "well-behaved", statistically speaking.

Both of these aspects call for dedicated analytical approaches, which is what this package is about. I describe this "Causal-Behavioral Framework", as I call it, in more detail in my book [Behavioral Data Analysis with R and Python](https://smile.amazon.com/Behavioral-Data-Analysis-Python-Customer-Driven-ebook/dp/B0979QYPWD/) (O'Reilly Media). But you can totally use this package without reading the book, and I've tried to make the documentation self-contained.

Please note that the package is designed to integrate nicely with the Tidyverse and therefore most functions will expect data formatted as a data.frame or a tibble. 

## Installation

You can install the development version of BehavioralDataAnalysis from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("BehavioralDataAnalysis/R_package")
```

## Examples

### Bootstrap confidence interval

The function that you're most likely to use is `boot_ci()`, which estimates a Bootstrap confidence interval for a function applied to a dataset. While the `boot.ci()` function of the [boot package](https://cran.r-project.org/web/packages/boot/index.html) offers more options and is more powerful, it often requires more memory and computation than my personal laptop can manage, and I find it somewhat cumbersome to use. Definitely check it out if you need a more serious implementation than the one here!

You can pass `boot_ci()` any function that takes a data frame as its argument and returns a single number. By default, it returns the 90% confidence interval:

```{r}
library(BehavioralDataAnalysis)
my_data <- data.frame(
  x = rnorm(100)
)

my_function <- function(df) { return(mean(df$x)) }

CI <- boot_ci(my_data, my_function)
print(CI)
```
However, the most common use case is probably running a regression, so you can also pass a linear regression formula directly as the second argument. For example, let's look at the relationship between mass and height in the `starwars` dataset.

```{r}
data(starwars, package = "dplyr")

CI <- boot_ci(starwars, 'mass~height')
print(CI)
```
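
For intuition about what a Bootstrap confidence interval involves, here is a minimal base-R sketch of a percentile bootstrap. It is not the implementation behind `boot_ci()`, whose internals are not described here. The helper name `percentile_boot_ci`, its arguments `n_boot` and `conf_level`, and the use of the built-in `mtcars` data are all assumptions made for illustration.

```r
# Illustrative percentile bootstrap (hypothetical helper, not part of the package).
# `stat_fun` takes a data frame and returns a single number.
percentile_boot_ci <- function(df, stat_fun, n_boot = 1000, conf_level = 0.90) {
  n <- nrow(df)
  # Recompute the statistic on n_boot resamples drawn with replacement
  boot_stats <- vapply(seq_len(n_boot), function(i) {
    resample <- df[sample.int(n, size = n, replace = TRUE), , drop = FALSE]
    stat_fun(resample)
  }, numeric(1))
  # Cut off equal tails on each side of the bootstrap distribution
  alpha <- 1 - conf_level
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(42)
# Statistic of interest: slope of a linear regression of fuel efficiency on weight
slope_fun <- function(df) unname(coef(lm(mpg ~ wt, data = df))["wt"])
ci <- percentile_boot_ci(mtcars, slope_fun)
print(ci)
```

The percentile method is only one of several ways to turn a bootstrap distribution into an interval; the boot package mentioned above offers others.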

### Matching subjects for experimentation

If you have access to your whole list of subjects ahead of time (as opposed to, e.g., users visiting your website at random), you can pair subjects who share similar characteristics to ensure that your experimental groups are as balanced as possible. This is also called stratified assignment; the function that does it is `paired_assign()`. Note, however, that it makes traditional statistical methods invalid, so you'll have to use the Bootstrap to build intervals around your central estimates.

```{r, message=FALSE, warning=FALSE}
library(dplyr)
library(BehavioralDataAnalysis)
set.seed(1)
dat <- starwars %>%
  na.omit() %>%
  dplyr::select(-films, -vehicles, -starships) %>%
  dplyr::filter(!grepl('Dooku', name))

paired_assigned_dat <- paired_assign(dat, id = 'name')
summ <- paired_assigned_dat %>% 
  group_by(grp) %>% 
  summarize(mean_height = mean(height, na.rm = TRUE))
print(summ)

```

As we can see, the mean heights of the two groups are pretty close. With pure randomization, on the other hand, the two values are farther apart:

```{r}
set.seed(1)
rnd_dat <- dat %>%
  mutate(grp = c(rep(0, 14), rep(1, 14))) %>%
  mutate(grp = sample(grp))
rnd_summ <- rnd_dat %>% 
  group_by(grp) %>% 
  summarize(mean_height = mean(height, na.rm = TRUE))
print(rnd_summ)

```
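
To make the idea of pairing concrete, here is a minimal base-R sketch of matched-pair assignment on a single numeric covariate: sort the subjects, group neighbors into pairs, and randomize within each pair. This is an illustration under simplifying assumptions, not the algorithm used by `paired_assign()`, which may well match on several variables at once. The helper `pair_and_assign` and the simulated data are hypothetical.

```r
# Hypothetical helper: matched-pair assignment on one numeric covariate.
pair_and_assign <- function(df, covariate) {
  # This sketch requires an even number of subjects
  stopifnot(nrow(df) %% 2 == 0)
  # After sorting, adjacent rows are the most similar on the covariate
  df <- df[order(df[[covariate]]), , drop = FALSE]
  n_pairs <- nrow(df) / 2
  df$pair <- rep(seq_len(n_pairs), each = 2)
  # Within each pair, randomly decide which member gets 1 and which gets 0
  df$grp <- as.vector(replicate(n_pairs, sample(c(0, 1))))
  df
}

set.seed(1)
subjects <- data.frame(id = 1:20, height = round(rnorm(20, mean = 170, sd = 15)))
assigned <- pair_and_assign(subjects, "height")

# Each group receives exactly one member of every pair
print(table(assigned$grp))
# Compare the mean of the matching covariate across groups
print(aggregate(height ~ grp, data = assigned, FUN = mean))
```

Because each pair contributes one subject to each group, the gap between the two group means cannot exceed the average within-pair difference, which is what keeps the groups balanced on the matching variable.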

Owner

Login: BehavioralDataAnalysis
Kind: user

Repositories: 1
Profile: https://github.com/BehavioralDataAnalysis

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "identifier": "BehavioralDataAnalysis",
  "description": "Based on the book Behavioral Data Analysis With R and Python. It provides robust functions to analyze behavioral data without relying on traditional statistics.",
  "name": "BehavioralDataAnalysis: Bootstrap and Sampling Functions For Behavioral Data Analysis",
  "codeRepository": "https://github.com/BehavioralDataAnalysis/R_package",
  "issueTracker": "https://github.com/BehavioralDataAnalysis/R_package/issues",
  "license": "https://spdx.org/licenses/MIT",
  "version": "0.1.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 4.2.2 (2022-10-31 ucrt)",
  "author": [
    {
      "@type": "Person",
      "givenName": "Florent",
      "familyName": "Buisson",
      "email": "florent.buisson.oreilly@maskedmails.com"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Florent",
      "familyName": "Buisson",
      "email": "florent.buisson.oreilly@maskedmails.com"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "version": ">= 3.0.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    }
  ],
  "softwareRequirements": {
    "1": {
      "@type": "SoftwareApplication",
      "identifier": "doParallel",
      "name": "doParallel",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=doParallel"
    },
    "2": {
      "@type": "SoftwareApplication",
      "identifier": "dplyr",
      "name": "dplyr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=dplyr"
    },
    "3": {
      "@type": "SoftwareApplication",
      "identifier": "foreach",
      "name": "foreach",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=foreach"
    },
    "4": {
      "@type": "SoftwareApplication",
      "identifier": "magrittr",
      "name": "magrittr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=magrittr"
    },
    "5": {
      "@type": "SoftwareApplication",
      "identifier": "methods",
      "name": "methods"
    },
    "6": {
      "@type": "SoftwareApplication",
      "identifier": "Rcpp",
      "name": "Rcpp",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=Rcpp"
    },
    "7": {
      "@type": "SoftwareApplication",
      "identifier": "scales",
      "name": "scales",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=scales"
    },
    "8": {
      "@type": "SoftwareApplication",
      "identifier": "stats",
      "name": "stats"
    },
    "9": {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 2.10"
    },
    "SystemRequirements": null
  },
  "fileSize": "6157.639KB",
  "readme": "https://github.com/BehavioralDataAnalysis/R_package/blob/main/README.md",
  "contIntegration": [
    "https://app.codecov.io/gh/BehavioralDataAnalysis/R_package?branch=main",
    "https://github.com/BehavioralDataAnalysis/R_package/actions/workflows/check-standard.yaml"
  ],
  "developmentStatus": "https://www.repostatus.org/#wip"
}

GitHub Events

Total

Push event: 5

Last Year

Push event: 5

Dependencies

.github/workflows/check-standard.yaml actions

actions/checkout v3 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/test-coverage.yaml actions

actions/checkout v3 composite
actions/upload-artifact v3 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

DESCRIPTION cran

R >= 2.10 depends
Rcpp * imports
doParallel * imports
dplyr * imports
foreach * imports
magrittr * imports
methods * imports
scales * imports
stats * imports
knitr * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests
