webtrackr
:telescope: R package for Preprocessing and Analyzing Web Tracking Data
Repository
Basic Info
- Host: GitHub
- Owner: gesistsa
- License: other
- Language: R
- Default Branch: main
- Homepage: https://gesistsa.github.io/webtrackR/
- Size: 15.6 MB
Statistics
- Stars: 9
- Watchers: 3
- Forks: 3
- Open Issues: 4
- Releases: 3
Topics
r-package
rstats-package
webtracking
Created almost 4 years ago
· Last pushed 10 months ago
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# webtrackR
webtrackR is an R package to preprocess and analyze web tracking data, i.e.,
web browsing histories of participants in an academic study. Web tracking data is
oftentimes collected and analyzed in conjunction with survey data of the same participants.
`webtrackR` is part of a series of R packages to analyse webtracking data:
- [webtrackR](https://github.com/schochastics/webtrackR): preprocess raw webtracking data
- [domainator](https://github.com/schochastics/domainator): classify domains
- [adaR](https://github.com/gesistsa/adaR): parse urls
## Installation
You can install the development version of webtrackR from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("schochastics/webtrackR")
```
The [CRAN](https://CRAN.R-project.org/package=webtrackR) version can be installed with:
```r
install.packages("webtrackR")
```
## S3 class `wt_dt`
The package defines an S3 class called `wt_dt`, which inherits most of its functionality from the `data.frame` class. `summary` and `print` methods are included in the package.
Each row in a web tracking data set represents a visit. Raw data need to have at least the following variables:
- `panelist_id`: the individual from whom the data was collected
- `url`: the URL of the visit
- `timestamp`: the time of the URL visit
The function `as.wt_dt` assigns the class `wt_dt` to a raw web tracking data set.
It also allows you to specify the name of the raw variables corresponding to `panelist_id`, `url` and `timestamp`.
Additionally, it turns the timestamp variable into `POSIXct` format.
All preprocessing functions check if these three variables are present. Otherwise an error is thrown.
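The renaming, timestamp conversion and variable check described above can be pictured with a small base R sketch. It is not the package's implementation and does not call `as.wt_dt`; the helper `standardize_tracking()`, its `varnames` argument and the toy column names (`user`, `link`, `time`) are hypothetical and exist only for illustration.

```r
# Hypothetical raw export with non-standard column names (illustration only).
raw <- data.frame(
  user = c("A", "A", "B"),
  link = c("https://example.com/a", "https://example.com/b", "https://example.org/"),
  time = c("2023-01-01 10:00:00", "2023-01-01 10:00:30", "2023-01-01 11:15:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT part of webtrackR: map raw columns onto the three
# required names, parse the timestamp, and stop early if anything is missing.
standardize_tracking <- function(x, varnames, tz = "UTC") {
  required <- c("panelist_id", "url", "timestamp")
  unmapped <- setdiff(required, names(varnames))
  if (length(unmapped) > 0) {
    stop("no mapping given for: ", paste(unmapped, collapse = ", "))
  }
  raw_names <- unname(varnames[required])
  absent <- setdiff(raw_names, names(x))
  if (length(absent) > 0) {
    stop("columns not found in data: ", paste(absent, collapse = ", "))
  }
  # rename the raw columns to the standard names
  names(x)[match(raw_names, names(x))] <- required
  # convert the character timestamp to POSIXct
  x$timestamp <- as.POSIXct(x$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  x
}

wt_raw <- standardize_tracking(
  raw,
  varnames = c(panelist_id = "user", url = "link", timestamp = "time")
)
str(wt_raw)
```

Failing early on a missing variable keeps later steps from silently producing wrong results, which is presumably why every preprocessing function repeats this check.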
## Preprocessing
Several other variables can be derived from the raw data with the following functions:
- `add_duration()` adds a variable called `duration` based on the sequence of timestamps. The basic logic is that the duration of a visit is set to the time difference to the subsequent visit, unless this difference exceeds a certain value (defined by argument `cutoff`), in which case the duration will be replaced by `NA` or some user-defined value (defined by `replace_by`).
- `add_session()` adds a variable called `session`, which groups subsequent visits into a session until the difference to the next visit exceeds a certain value (defined by `cutoff`).
- `extract_host()`, `extract_domain()` and `extract_path()` extract the host, domain and path of the raw URL and add variables named accordingly. See the function descriptions for definitions of these terms. `drop_query()` lets you drop the query and fragment components of the raw URL.
- `add_next_visit()` and `add_previous_visit()` add the next or the previous URL, domain, or host (defined by `level`) as a new variable.
- `add_referral()` adds a new variable indicating whether a visit was referred by a social media platform. It follows the logic of Schmidt et al. [(2023)](https://doi.org/10.31235/osf.io/cks68).
- `add_title()` downloads the title of a website (the text within the `<title>` tag of a web site's `<head>`) and adds it as a new variable.
- `add_panelist_data()` joins a data set containing information about participants, such as survey data.
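To make the duration and session logic concrete, here is a minimal base R sketch on toy data. It does not use the package functions; the data, the 600-second `cutoff` and the choice to leave a panelist's last visit as `NA` are assumptions made for illustration, and the package's own handling of edge cases may differ.

```r
# Toy visits for two panelists (illustration only).
visits <- data.frame(
  panelist_id = c("A", "A", "A", "B", "B"),
  timestamp = as.POSIXct(
    c("2023-01-01 10:00:00", "2023-01-01 10:00:20", "2023-01-01 10:30:00",
      "2023-01-01 09:00:00", "2023-01-01 09:05:00"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)
visits <- visits[order(visits$panelist_id, visits$timestamp), ]

cutoff <- 600  # assumed threshold in seconds
secs <- as.numeric(visits$timestamp)

# Duration: seconds until the same panelist's next visit.
# The last visit of each panelist has no successor, so it stays NA here.
gap_next <- ave(secs, visits$panelist_id, FUN = function(t) c(diff(t), NA))
# Gaps longer than the cutoff are replaced by NA.
visits$duration <- ifelse(gap_next > cutoff, NA, gap_next)

# Session: a new session starts at a panelist's first visit and whenever
# the gap to the previous visit exceeds the cutoff.
visits$session <- ave(
  secs, visits$panelist_id,
  FUN = function(t) cumsum(c(TRUE, diff(t) > cutoff))
)

print(visits)
```

In this sketch the third visit of panelist A opens a second session because it comes almost 30 minutes after the previous one, and the duration of the visit before it is set to `NA` for the same reason.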
## Classification
- `classify_visits()` categorizes website visits by either extracting the URL's domain or host and matching them to a list of domains or hosts, or by matching a list of regular expressions against the visit URL.
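The two matching strategies can be sketched in base R as follows. This is an illustration of the idea rather than the behavior of `classify_visits()`: the lookup tables, the `type` labels and the crude host extraction are all hypothetical, and matching is done on hosts because proper domain extraction needs a public suffix list.

```r
# Toy visits (illustration only).
visits <- data.frame(
  url = c("https://www.nytimes.com/section/world",
          "https://m.facebook.com/home",
          "https://example.org/news/today"),
  stringsAsFactors = FALSE
)

# Crude host extraction: drop the scheme, keep everything up to the first
# "/", ":", "?" or "#".
visits$host <- sub("^[a-z]+://([^/:?#]+).*$", "\\1", visits$url)

# Strategy 1: exact match of the host against a (hypothetical) class list.
host_classes <- data.frame(
  host = c("www.nytimes.com", "m.facebook.com"),
  type = c("news", "social"),
  stringsAsFactors = FALSE
)
visits$type_host <- host_classes$type[match(visits$host, host_classes$host)]

# Strategy 2: regular expressions matched against the full URL;
# the first matching pattern wins.
regex_classes <- data.frame(
  regex = c("/news/", "facebook"),
  type = c("news", "social"),
  stringsAsFactors = FALSE
)
visits$type_regex <- NA_character_
for (i in seq_len(nrow(regex_classes))) {
  hit <- is.na(visits$type_regex) & grepl(regex_classes$regex[i], visits$url)
  visits$type_regex[hit] <- regex_classes$type[i]
}

print(visits[, c("host", "type_host", "type_regex")])
```

The two strategies disagree on purpose here: list matching classifies the first visit but misses the third, while the regular expressions do the opposite, which shows why the choice of matching method matters.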
## Summarizing and aggregating
- `deduplicate()` flags or drops (as defined by argument `method`) consecutive visits to the same URL within a user-defined time frame (as set by argument `within`). As an alternative to dropping or flagging visits, the function can aggregate the durations of such duplicate visits.
- `sum_visits()` and `sum_durations()` aggregate the number or the durations of visits, by participant and by a time period (as set by argument `timeframe`). Optionally, these functions also aggregate the number or duration of visits by visit class.
- `sum_activity()` counts the number of active time periods (defined by `timeframe`) by participant.
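The following base R sketch mirrors these three steps on toy data: flagging consecutive duplicates, counting visits per participant and day, and counting active days. It is a simplified illustration under assumed inputs (a one-second `within` window, days as the time period) and not the package's implementation.

```r
# Toy visits (illustration only).
visits <- data.frame(
  panelist_id = c("A", "A", "A", "A", "B"),
  url = c("https://example.com/", "https://example.com/",
          "https://example.com/x", "https://example.com/",
          "https://example.org/"),
  timestamp = as.POSIXct(
    c("2023-01-01 10:00:00", "2023-01-01 10:00:01", "2023-01-01 10:05:00",
      "2023-01-02 08:00:00", "2023-01-01 09:00:00"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)
visits <- visits[order(visits$panelist_id, visits$timestamp), ]

within <- 1  # assumed window in seconds
n <- nrow(visits)

# A visit is a duplicate if the previous row belongs to the same panelist,
# has the same URL, and lies at most `within` seconds in the past.
same_panelist <- c(FALSE, visits$panelist_id[-1] == visits$panelist_id[-n])
same_url <- c(FALSE, visits$url[-1] == visits$url[-n])
gap <- c(Inf, diff(as.numeric(visits$timestamp)))
visits$duplicate <- same_panelist & same_url & gap <= within

# "Drop" variant: keep only non-duplicates.
deduped <- visits[!visits$duplicate, ]

# Number of visits per panelist and day.
deduped$date <- as.Date(deduped$timestamp, tz = "UTC")
n_visits <- aggregate(url ~ panelist_id + date, data = deduped, FUN = length)
names(n_visits)[3] <- "n_visits"

# Number of active days per panelist.
active_days <- tapply(
  as.character(deduped$date), deduped$panelist_id,
  function(d) length(unique(d))
)

print(n_visits)
print(active_days)
```

Sorting by participant and timestamp first is essential, because the duplicate check only compares each row with the one directly before it.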
## Example code
A typical workflow including preprocessing, classifying and aggregating web tracking data looks like this (using the in-built example data):
``` r
library(webtrackR)
# load example data and turn it into wt_dt
data("testdt_tracking")
wt <- as.wt_dt(testdt_tracking)
# add duration
wt <- add_duration(wt)
# extract domains
wt <- extract_domain(wt)
# drop duplicates (consecutive visits to the same URL within one second)
wt <- deduplicate(wt, within = 1, method = "drop")
# load example domain classification and classify domains
data("domain_list")
wt <- classify_visits(wt, classes = domain_list, match_by = "domain")
# load example survey data and join with web tracking data
data("testdt_survey_w")
wt <- add_panelist_data(wt, testdt_survey_w)
# aggregate number of visits by day and panelist, and by domain class
wt_summ <- sum_visits(wt, timeframe = "date", visit_class = "type")
```
Owner
- Name: Transparent Social Analytics
- Login: gesistsa
- Kind: organization
- Location: Germany
- Repositories: 2
- Profile: https://github.com/gesistsa
Open Science Tools maintained by Transparent Social Analytics Team, GESIS
Citation (CITATION.cff)
# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
cff-version: 1.2.0
message: 'To cite package "webtrackR" in publications use:'
type: software
license: MIT
title: 'webtrackR: Preprocessing and Analyzing Web Tracking Data'
version: 0.3.1.9000
identifiers:
- type: doi
value: 10.32614/CRAN.package.webtrackR
abstract: Data structures and methods to work with web tracking data. The functions
cover data preprocessing steps, enriching web tracking data with external information
and methods for the analysis of digital behavior as used in several academic papers
(e.g., Clemm von Hohenberg et al., 2023 <https://doi.org/10.17605/OSF.IO/M3U9P>;
Stier et al., 2022 <https://doi.org/10.1017/S0003055421001222>).
authors:
- family-names: Schoch
given-names: David
email: david@schochastics.net
orcid: https://orcid.org/0000-0003-2952-4812
- family-names: Hohenberg
given-names: Bernhard
name-particle: Clemm von
email: bernhard.clemm@gesis.org
orcid: https://orcid.org/0000-0002-6976-9745
- family-names: Mangold
given-names: Frank
email: frank.mangold@gesis.org
orcid: https://orcid.org/0000-0002-9776-3113
- family-names: Stier
given-names: Sebastian
email: sebastian.stier@gesis.org
orcid: https://orcid.org/0000-0002-1217-5778
preferred-citation:
type: manual
title: 'webtrackR: Preprocessing and Analyzing Web Tracking Data'
authors:
- family-names: Schoch
given-names: David
email: david@schochastics.net
orcid: https://orcid.org/0000-0003-2952-4812
- family-names: Hohenberg
given-names: Bernhard Clemm
name-particle: von
- family-names: Mangold
given-names: Frank
email: frank.mangold@gesis.org
orcid: https://orcid.org/0000-0002-9776-3113
- family-names: Stier
given-names: Sebastian
email: sebastian.stier@gesis.org
orcid: https://orcid.org/0000-0002-1217-5778
year: '2023'
repository: https://CRAN.R-project.org/package=webtrackR
repository-code: https://github.com/gesistsa/webtrackR
url: https://github.com/gesistsa/webtrackR
contact:
- family-names: Schoch
given-names: David
email: david@schochastics.net
orcid: https://orcid.org/0000-0003-2952-4812
keywords:
- r-package
- rstats-package
- webtracking
references:
- type: software
title: 'R: A Language and Environment for Statistical Computing'
notes: Depends
url: https://www.R-project.org/
authors:
- name: R Core Team
institution:
name: R Foundation for Statistical Computing
address: Vienna, Austria
year: '2024'
version: '>= 3.5.0'
- type: software
title: utils
abstract: 'R: A Language and Environment for Statistical Computing'
notes: Imports
authors:
- name: R Core Team
institution:
name: R Foundation for Statistical Computing
address: Vienna, Austria
year: '2024'
- type: software
title: stats
abstract: 'R: A Language and Environment for Statistical Computing'
notes: Imports
authors:
- name: R Core Team
institution:
name: R Foundation for Statistical Computing
address: Vienna, Austria
year: '2024'
- type: software
title: httr
abstract: 'httr: Tools for Working with URLs and HTTP'
notes: Imports
url: https://httr.r-lib.org/
repository: https://CRAN.R-project.org/package=httr
authors:
- family-names: Wickham
given-names: Hadley
email: hadley@posit.co
year: '2024'
doi: 10.32614/CRAN.package.httr
- type: software
title: data.table
abstract: 'data.table: Extension of `data.frame`'
notes: Imports
url: https://r-datatable.com
repository: https://CRAN.R-project.org/package=data.table
authors:
- family-names: Barrett
given-names: Tyson
email: t.barrett88@gmail.com
orcid: https://orcid.org/0000-0002-2137-1391
- family-names: Dowle
given-names: Matt
email: mattjdowle@gmail.com
- family-names: Srinivasan
given-names: Arun
email: asrini@pm.me
- family-names: Gorecki
given-names: Jan
- family-names: Chirico
given-names: Michael
orcid: https://orcid.org/0000-0003-0787-087X
- family-names: Hocking
given-names: Toby
orcid: https://orcid.org/0000-0002-3146-0865
- family-names: Schwendinger
given-names: Benjamin
orcid: https://orcid.org/0000-0003-3315-8114
year: '2024'
doi: 10.32614/CRAN.package.data.table
version: '>= 1.15.0'
- type: software
title: knitr
abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
notes: Suggests
url: https://yihui.org/knitr/
repository: https://CRAN.R-project.org/package=knitr
authors:
- family-names: Xie
given-names: Yihui
email: xie@yihui.name
orcid: https://orcid.org/0000-0003-0645-5666
year: '2024'
doi: 10.32614/CRAN.package.knitr
- type: software
title: rmarkdown
abstract: 'rmarkdown: Dynamic Documents for R'
notes: Suggests
url: https://pkgs.rstudio.com/rmarkdown/
repository: https://CRAN.R-project.org/package=rmarkdown
authors:
- family-names: Allaire
given-names: JJ
email: jj@posit.co
- family-names: Xie
given-names: Yihui
email: xie@yihui.name
orcid: https://orcid.org/0000-0003-0645-5666
- family-names: Dervieux
given-names: Christophe
email: cderv@posit.co
orcid: https://orcid.org/0000-0003-4474-2498
- family-names: McPherson
given-names: Jonathan
email: jonathan@posit.co
- family-names: Luraschi
given-names: Javier
- family-names: Ushey
given-names: Kevin
email: kevin@posit.co
- family-names: Atkins
given-names: Aron
email: aron@posit.co
- family-names: Wickham
given-names: Hadley
email: hadley@posit.co
- family-names: Cheng
given-names: Joe
email: joe@posit.co
- family-names: Chang
given-names: Winston
email: winston@posit.co
- family-names: Iannone
given-names: Richard
email: rich@posit.co
orcid: https://orcid.org/0000-0003-3925-190X
year: '2024'
doi: 10.32614/CRAN.package.rmarkdown
- type: software
title: testthat
abstract: 'testthat: Unit Testing for R'
notes: Suggests
url: https://testthat.r-lib.org
repository: https://CRAN.R-project.org/package=testthat
authors:
- family-names: Wickham
given-names: Hadley
email: hadley@posit.co
year: '2024'
doi: 10.32614/CRAN.package.testthat
version: '>= 3.0.0'
GitHub Events
Total
- Delete event: 2
- Issue comment event: 1
- Push event: 5
- Pull request review event: 1
- Pull request event: 7
- Create event: 1
Last Year
- Delete event: 2
- Issue comment event: 1
- Push event: 5
- Pull request review event: 1
- Pull request event: 7
- Create event: 1
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 58
- Total pull requests: 42
- Average time to close issues: about 1 month
- Average time to close pull requests: 2 days
- Total issue authors: 4
- Total pull request authors: 3
- Average comments per issue: 1.34
- Average comments per pull request: 0.0
- Merged pull requests: 41
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 3
- Average time to close issues: about 2 months
- Average time to close pull requests: 3 minutes
- Issue authors: 4
- Pull request authors: 1
- Average comments per issue: 3.2
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sebstier (1)
Pull Request Authors
- chainsawriot (2)
- ArthurMuehl (2)
- schochastics (2)
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 3.2.0 depends
- data.table * imports
- igraph * imports
- tibble * imports
- urltools * imports
- utils * imports
- backbone * suggests
- stats * suggests
- testthat >= 3.0.0 suggests