adaR
:computer: wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (18.1%) to scientific vocabulary
Keywords
r
rstats
rstats-package
url-parser
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: gesistsa
- License: other
- Language: C++
- Default Branch: main
- Homepage: https://gesistsa.github.io/adaR/
- Size: 7.14 MB
Statistics
- Stars: 26
- Watchers: 4
- Forks: 4
- Open Issues: 6
- Releases: 7
Topics
r
rstats
rstats-package
url-parser
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme
Changelog
License
Citation
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# adaR
adaR is a wrapper for [ada-url](https://github.com/ada-url/ada), a
[WHATWG](https://url.spec.whatwg.org/#url-parsing)-compliant and fast URL parser written in modern C++.
It implements several auxiliary functions to work with URLs:
- public suffix extraction (the top-level domain, excluding private domains), similar to [psl](https://github.com/hrbrmstr/psl)
- a fast C++ implementation of `utils::URLdecode()` (~40x speedup)
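As a quick illustration of the decoding helper, the sketch below decodes the same percent-encoded string with adaR's `url_decode2()` and with `utils::URLdecode()`. The example string is made up; for plain percent-escapes like these, both functions should return the same decoded URL.

```r
library(adaR)

# A made-up percent-encoded URL (%3A = ":", %2F = "/", %3F = "?", %3D = "=", %20 = " ")
encoded <- "https%3A%2F%2Fexample.org%2Fsearch%3Fq%3Dada%20url"

# adaR's C++ decoder ...
url_decode2(encoded)

# ... and the base R function it mirrors
utils::URLdecode(encoded)
```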
More general information on URL parsing can be found in the introductory vignette via `vignette("adaR")`.
`adaR` is part of a series of R packages to analyse webtracking data:
- [webtrackR](https://github.com/gesistsa/webtrackR): preprocess raw webtracking data
- [domainator](https://github.com/schochastics/domainator): classify domains
- [adaR](https://github.com/gesistsa/adaR): parse urls
## Installation
You can install the development version of adaR from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("gesistsa/adaR")
```
The version on CRAN can be installed with:
```r
install.packages("adaR")
```
## Example
This is a basic example which shows all the returned components of a URL.
```{r example}
library(adaR)
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
```
```c++
/*
 * https://user:pass@example.com:1234/foo/bar?baz#quux
 *       |     |    |          | ^^^^|       |   |
 *       |     |    |          | |   |       |   `----- hash_start
 *       |     |    |          | |   |       `--------- search_start
 *       |     |    |          | |   `----------------- pathname_start
 *       |     |    |          | `--------------------- port
 *       |     |    |          `----------------------- host_end
 *       |     |    `---------------------------------- host_start
 *       |     `--------------------------------------- username_end
 *       `--------------------------------------------- protocol_end
 */
```
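To make the offsets in the diagram concrete, here is a minimal base-R sketch of the idea: keep the whole URL as one string, record integer offsets into it, and slice components out with `substring()`. It assumes a URL in which every component is present (as in the diagram), finds each offset with a plain delimiter search, and treats offsets as 0-based, half-open ranges as is usual in C++. The helpers `offset_of()` and `slice()` are hypothetical and far simpler than ada's actual parser.

```r
href <- "https://user:pass@example.com:1234/foo/bar?baz#quux"

# Hypothetical helper: 0-based position of the first `pattern` at or after
# 0-based position `from` (assumes the delimiter is present)
offset_of <- function(pattern, from = 0) {
  pos <- regexpr(pattern, substring(href, from + 1), fixed = TRUE)
  from + as.integer(pos) - 1
}

protocol_end   <- offset_of(":") + 1             # just past "https:"
username_end   <- offset_of(":", protocol_end)   # ":" before the password
host_start     <- offset_of("@", username_end)   # "@" before the host
host_end       <- offset_of(":", host_start)     # ":" before the port
pathname_start <- offset_of("/", host_end)
search_start   <- offset_of("?", pathname_start)
hash_start     <- offset_of("#", search_start)

# Hypothetical helper: map a 0-based half-open range [from, to) onto
# R's 1-based, inclusive substring()
slice <- function(from, to) substring(href, from + 1, to)

components <- c(
  protocol = slice(0, protocol_end),
  username = slice(protocol_end + 2, username_end), # skip "//"
  password = slice(username_end + 1, host_start),   # skip ":"
  hostname = slice(host_start + 1, host_end),       # skip "@"
  port     = slice(host_end + 1, pathname_start),   # skip ":"
  pathname = slice(pathname_start, search_start),
  search   = slice(search_start, hash_start),
  hash     = slice(hash_start, nchar(href))
)
components
```

Storing one string plus a handful of integers, instead of a separate string per component, is the trade-off the diagram illustrates.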
It also avoids some problems that `urltools` has with more complex URLs.
```{r better}
maps_url <- "https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519"
urltools::url_parse(maps_url)
ada_url_parse(maps_url)
```
A "raw" URL parse using ada is extremely fast (see [ada-url.com](https://www.ada-url.com/)), but carrying this speed over to R is tricky.
Performance is nevertheless comparable to `urltools::url_parse`, with the noted advantage in accuracy in some
practical circumstances.
```{r faster}
bench::mark(
ada = ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE),
urltools = urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag"),
check = FALSE
)
```
For further benchmark results, see `benchmark.md` in `data_raw`.
There are four more groups of functions for working with URLs:
- `ada_get_*()`: get a specific component
- `ada_has_*()`: check whether a specific component is present
- `ada_set_*()`: set a specific component of URLs
- `ada_clear_*()`: remove a specific component from URLs
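A short sketch of the four groups, using one representative of each (`ada_get_hostname()`, `ada_has_port()`, `ada_set_pathname()`, `ada_clear_hash()`) on the URL from the example above. It assumes adaR is installed; the replacement path `"/v2/api"` is an arbitrary illustrative value.

```r
library(adaR)

url <- "https://user_1:password_1@example.org:8080/dir/../api?q=1#frag"

# ada_get_*(): extract a single component
ada_get_hostname(url)
ada_get_port(url)

# ada_has_*(): logical check for a component
ada_has_port(url)
ada_has_search(url)

# ada_set_*(): return the URL with one component replaced
ada_set_pathname(url, "/v2/api")

# ada_clear_*(): return the URL with one component removed
ada_clear_hash(url)
```

All of these take a character vector, so the same calls work on a whole column of URLs.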
## Public Suffix extraction
`public_suffix()` extracts the top-level domain of each URL, as defined by the [public suffix list](https://publicsuffix.org/), **excluding** private domains.
```{r public_suffix}
urls <- c(
"https://subsub.sub.domain.co.uk",
"https://domain.api.gov.uk",
"https://thisisnotpart.butthisispartoftheps.kawasaki.jp"
)
public_suffix(urls)
```
If you are wondering about the last URL: the list also contains wildcard suffixes such as `*.kawasaki.jp`, which need to be matched.
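The wildcard case can be illustrated with a simplified base-R sketch of the public suffix list's matching rule: a candidate suffix matches either exactly or via a rule in which `*` stands for exactly one label, the longest match wins, and if nothing matches the last label is used (the list's default `*` rule). The tiny rule set and `match_suffix()` are hypothetical and not taken from adaR, and the sketch ignores the list's `!` exception rules.

```r
# Hypothetical miniature rule set for illustration only
rules <- c("uk", "co.uk", "gov.uk", "jp", "*.kawasaki.jp")

# Simplified longest-match lookup for a single hostname
match_suffix <- function(host, rules) {
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  n <- length(labels)
  best <- labels[n] # default rule "*": fall back to the last label
  for (k in seq_len(n)) {
    cand <- labels[(n - k + 1):n] # the last k labels
    exact <- paste(cand, collapse = ".")
    wildcard <- paste(c("*", cand[-1]), collapse = ".") # "*" replaces one label
    if (exact %in% rules || (k > 1 && wildcard %in% rules)) {
      best <- exact # longer matches overwrite shorter ones
    }
  }
  best
}

hosts <- c(
  "subsub.sub.domain.co.uk",
  "domain.api.gov.uk",
  "thisisnotpart.butthisispartoftheps.kawasaki.jp"
)
suffixes <- vapply(hosts, match_suffix, character(1), rules = rules, USE.NAMES = FALSE)
suffixes
# "co.uk" "gov.uk" "butthisispartoftheps.kawasaki.jp"
```

Because `*.kawasaki.jp` consumes one extra label, `butthisispartoftheps` becomes part of the suffix while `thisisnotpart` does not.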
## Acknowledgement
The logo was created from [this portrait](https://commons.wikimedia.org/wiki/File:Ada_Lovelace_portrait.jpg) of [Ada Lovelace](https://de.wikipedia.org/wiki/Ada_Lovelace), a very early pioneer of computer science.
Owner
- Name: Transparent Social Analytics
- Login: gesistsa
- Kind: organization
- Location: Germany
- Repositories: 2
- Profile: https://github.com/gesistsa
Open Science Tools maintained by Transparent Social Analytics Team, GESIS
Citation (CITATION.cff)
# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
cff-version: 1.2.0
message: 'To cite package "adaR" in publications use:'
type: software
license: MIT
title: 'adaR: A Fast ''WHATWG'' Compliant URL Parser'
version: 0.3.2
abstract: A wrapper for 'ada-url', a 'WHATWG' compliant and fast URL parser written
  in modern 'C++'. Also contains auxiliary functions such as a public suffix extractor.
authors:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
- family-names: Chan
  given-names: Chung-hong
  email: chainsawtiney@gmail.com
  orcid: https://orcid.org/0000-0002-6232-7530
repository: https://CRAN.R-project.org/package=adaR
repository-code: https://github.com/gesistsa/adaR
url: https://gesistsa.github.io/adaR/
contact:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
keywords:
- r
- rstats
- rstats-package
- url-parser
references:
- type: software
  title: Rcpp
  abstract: 'Rcpp: Seamless R and C++ Integration'
  notes: LinkingTo
  url: https://www.rcpp.org
  repository: https://CRAN.R-project.org/package=Rcpp
  authors:
  - family-names: Eddelbuettel
    given-names: Dirk
  - family-names: Francois
    given-names: Romain
  - family-names: Allaire
    given-names: JJ
  - family-names: Ushey
    given-names: Kevin
  - family-names: Kou
    given-names: Qiang
  - family-names: Russell
    given-names: Nathan
  - family-names: Ucar
    given-names: Inaki
  - family-names: Bates
    given-names: Douglas
  - family-names: Chambers
    given-names: John
  year: '2024'
- type: software
  title: triebeard
  abstract: 'triebeard: ''Radix'' Trees in ''Rcpp'''
  notes: Imports
  url: https://github.com/Ironholds/triebeard/
  repository: https://CRAN.R-project.org/package=triebeard
  authors:
  - family-names: Keyes
    given-names: Os
  - family-names: Schmidt
    given-names: Drew
  - family-names: Takano
    given-names: Yuuki
  year: '2024'
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  url: https://yihui.org/knitr/
  repository: https://CRAN.R-project.org/package=knitr
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2024'
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Suggests
  url: https://pkgs.rstudio.com/rmarkdown/
  repository: https://CRAN.R-project.org/package=rmarkdown
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@posit.co
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Dervieux
    given-names: Christophe
    email: cderv@posit.co
    orcid: https://orcid.org/0000-0003-4474-2498
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@posit.co
  - family-names: Luraschi
    given-names: Javier
  - family-names: Ushey
    given-names: Kevin
    email: kevin@posit.co
  - family-names: Atkins
    given-names: Aron
    email: aron@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Cheng
    given-names: Joe
    email: joe@posit.co
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  - family-names: Iannone
    given-names: Richard
    email: rich@posit.co
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2024'
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  version: '>= 3.0.0'
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2024'
  version: '>= 4.2'
GitHub Events
Total
- Create event: 2
- Release event: 1
- Issues event: 4
- Watch event: 1
- Issue comment event: 2
- Push event: 15
- Pull request review event: 2
- Pull request event: 8
- Fork event: 2
Last Year
- Create event: 2
- Release event: 1
- Issues event: 4
- Watch event: 1
- Issue comment event: 2
- Push event: 15
- Pull request review event: 2
- Pull request event: 8
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 40
- Total pull requests: 42
- Average time to close issues: 4 days
- Average time to close pull requests: about 4 hours
- Total issue authors: 7
- Total pull request authors: 4
- Average comments per issue: 2.5
- Average comments per pull request: 1.29
- Merged pull requests: 39
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 7
- Average time to close issues: about 7 hours
- Average time to close pull requests: about 4 hours
- Issue authors: 3
- Pull request authors: 3
- Average comments per issue: 0.67
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- schochastics (25)
- chainsawriot (5)
- DyfanJones (1)
- ArthurMuehl (1)
- Fluke95 (1)
- cbpuschmann (1)
- JBGruber (1)
Pull Request Authors
- chainsawriot (22)
- schochastics (19)
- DyfanJones (2)
- ArthurMuehl (1)
Top Labels
Issue Labels
0.2.0 (6)
0.3.0 (4)
feature? (3)
bug (3)
0.4.0 (1)
0.1.0 (1)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 352 last month
- Total dependent packages: 1
- Total dependent repositories: 1
- Total versions: 6
- Total maintainers: 1
cran.r-project.org: adaR
A Fast 'WHATWG' Compliant URL Parser
- Homepage: https://gesistsa.github.io/adaR/
- Documentation: http://cran.r-project.org/web/packages/adaR/adaR.pdf
- License: MIT + file LICENSE
- Latest release: 0.3.4 (published about 1 year ago)
Rankings
Stargazers count: 11.0%
Forks count: 17.1%
Average: 23.3%
Dependent repos count: 24.0%
Dependent packages count: 28.7%
Downloads: 35.7%
Maintainers (1)
Last synced: 6 months ago
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.4.1 composite
- actions/checkout v3 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 4.2 depends
- Rcpp * imports
- triebeard * imports
- knitr * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests