webbotparser
:mag: R package to parse search engine results
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.9%) to scientific vocabulary
Keywords
browser-extension
rstats
rstats-package
search-engine
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: gesistsa
- License: other
- Language: HTML
- Default Branch: main
- Homepage: https://gesistsa.github.io/webbotparseR/
- Size: 41.4 MB
Statistics
- Stars: 8
- Watchers: 3
- Forks: 2
- Open Issues: 2
- Releases: 0
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# webbotparseR
[Codecov test coverage](https://app.codecov.io/gh/gesistsa/webbotparseR?branch=main)
[R-CMD-check](https://github.com/gesistsa/webbotparseR/actions/workflows/R-CMD-check.yaml)
webbotparseR parses search engine results that were scraped with the [WebBot](https://github.com/gesiscss/WebBot) browser extension. A similar Python library is [also available](https://github.com/gesiscss/WebBot-tutorials).
## Installation
You can install the development version of webbotparseR like so:
``` r
# install.packages("remotes")  # run first if remotes is not installed
remotes::install_github("gesistsa/webbotparseR")
```
The package ships with an example HTML file from a Google search on climate change.
```{r ex_file}
library(webbotparseR)
ex_file <- system.file("www.google.com_climatechange_text_2023-03-16_08_16_11.html", package = "webbotparseR")
```
Such search results can be parsed with the function `parse_search_results()`. The `engine` argument specifies both the search engine and the search type, for example `"google text"`.
```{r parse}
output <- parse_search_results(path = ex_file, engine = "google text")
output
```
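When several result pages have been saved, the same call can be applied to each file in turn. The following is a minimal sketch, not part of the package: `parse_many()` and the `source_file` column are hypothetical names introduced here, and the sketch assumes that every file yields a table with the same columns.

```r
# Hypothetical helper, not part of webbotparseR: parse several saved result
# pages and stack them into one data frame with a `source_file` column.
# `parser` defaults to the package function shown above; the default is only
# looked up when the helper is called without an explicit `parser`.
parse_many <- function(paths, engine,
                       parser = webbotparseR::parse_search_results) {
  parsed <- lapply(paths, function(p) {
    res <- as.data.frame(parser(path = p, engine = engine))
    # Record which file each row came from.
    res$source_file <- rep(basename(p), nrow(res))
    res
  })
  # Assumes every file yields the same columns; rbind() errors otherwise.
  do.call(rbind, parsed)
}

# Usage with the bundled example file, guarded so the sketch also runs
# on machines where webbotparseR is not installed.
if (requireNamespace("webbotparseR", quietly = TRUE)) {
  ex_file <- system.file(
    "www.google.com_climatechange_text_2023-03-16_08_16_11.html",
    package = "webbotparseR"
  )
  all_results <- parse_many(c(ex_file, ex_file), engine = "google text")
  print(nrow(all_results))
}
```

Passing the parser as an argument keeps the helper easy to test with a stub and leaves the choice of `engine` to the caller.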
Note that images are always returned base64 encoded.
```{r image}
output$image[1]
```
The function `base64_to_img()` can be used to decode the image and save it in an appropriate format.
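The signature of `base64_to_img()` is not shown here, so the sketch below does not call it. Instead it illustrates the underlying decoding step with `base64enc`, which the package lists as an import. `decode_base64_image()` is a hypothetical helper, and the optional `data:image/...;base64,` prefix is an assumption about how the strings may look, not a documented format.

```r
library(base64enc)

# Hypothetical helper, not part of webbotparseR: decode one base64 string
# and write the resulting bytes to `path`.
decode_base64_image <- function(x, path) {
  # Assumption: the string may start with a data URI header; drop it if so.
  payload <- sub("^data:[^;]+;base64,", "", x)
  bytes <- base64enc::base64decode(payload)
  writeBin(bytes, path)
  invisible(path)
}

# Round trip with a stand-in payload (the first four bytes of a PNG header)
# instead of a real image from a search result.
raw_in <- as.raw(c(0x89, 0x50, 0x4e, 0x47))
encoded <- paste0("data:image/png;base64,", base64enc::base64encode(raw_in))
out_file <- decode_base64_image(encoded, tempfile(fileext = ".png"))
raw_out <- readBin(out_file, what = "raw", n = length(raw_in))
identical(raw_in, raw_out)
```

For real results, the file extension should match the image type of the decoded data.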
Owner
- Name: Transparent Social Analytics
- Login: gesistsa
- Kind: organization
- Location: Germany
- Repositories: 2
- Profile: https://github.com/gesistsa
Open Science Tools maintained by Transparent Social Analytics Team, GESIS
Citation (CITATION.cff)
# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
cff-version: 1.2.0
message: 'To cite package "webbotparseR" in publications use:'
type: software
license: MIT
title: 'webbotparseR: Parse html files containing search engine results'
version: 0.1.0.9000
abstract: Parse search engine results which have been scraped with the 'WebBot' browser
  extension <https://github.com/gesiscss/WebBot>.
authors:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
- family-names: Chan
  given-names: Chung-hong
  email: chainsawtiney@gmail.com
  orcid: https://orcid.org/0000-0002-6232-7530
repository-code: https://github.com/gesistsa/webbotparseR
url: https://gesistsa.github.io/webbotparseR/
contact:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
keywords:
- browser-extension
- rstats
- rstats-package
- search-engine
references:
- type: software
  title: rvest
  abstract: 'rvest: Easily Harvest (Scrape) Web Pages'
  notes: Imports
  url: https://rvest.tidyverse.org/
  repository: https://CRAN.R-project.org/package=rvest
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  doi: 10.32614/CRAN.package.rvest
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Imports
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2024'
  doi: 10.32614/CRAN.package.tibble
- type: software
  title: fastmap
  abstract: 'fastmap: Fast Data Structures'
  notes: Imports
  url: https://r-lib.github.io/fastmap/
  repository: https://CRAN.R-project.org/package=fastmap
  authors:
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  year: '2024'
  doi: 10.32614/CRAN.package.fastmap
- type: software
  title: base64enc
  abstract: 'base64enc: Tools for base64 encoding'
  notes: Imports
  url: http://www.rforge.net/base64enc
  repository: https://CRAN.R-project.org/package=base64enc
  authors:
  - family-names: Urbanek
    given-names: Simon
    email: Simon.Urbanek@r-project.org
  year: '2024'
  doi: 10.32614/CRAN.package.base64enc
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2024'
  version: '>= 3.5'
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  doi: 10.32614/CRAN.package.testthat
  version: '>= 3.0.0'
GitHub Events
Total
- Delete event: 1
- Push event: 2
- Pull request event: 5
- Create event: 1
Last Year
- Delete event: 1
- Push event: 2
- Pull request event: 5
- Create event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 7
- Total pull requests: 8
- Average time to close issues: 4 months
- Average time to close pull requests: about 1 hour
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 2.0
- Average comments per pull request: 0.63
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- schochastics (6)
- jobreu (1)
Pull Request Authors
- schochastics (4)
- chainsawriot (3)
- ArthurMuehl (2)
- wanLo (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.4.1 composite
- actions/checkout v3 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v3 composite
- actions/upload-artifact v3 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 3.5 depends
- base64enc * imports
- fastmap * imports
- rvest * imports
- tibble * imports
- covr * suggests
- testthat >= 3.0.0 suggests