webbotparser

:mag: R package to parse search engine results

https://github.com/gesistsa/webbotparser

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

browser-extension rstats rstats-package search-engine

Repository

Basic Info
Statistics
  • Stars: 8
  • Watchers: 3
  • Forks: 2
  • Open Issues: 2
  • Releases: 0
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Citation

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%"
)
```

# webbotparseR  


[![Codecov test coverage](https://codecov.io/gh/schochastics/webbotparseR/branch/main/graph/badge.svg)](https://app.codecov.io/gh/gesistsa/webbotparseR?branch=main)
[![R-CMD-check](https://github.com/schochastics/webbotparseR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gesistsa/webbotparseR/actions/workflows/R-CMD-check.yaml)


webbotparseR parses search engine results that were scraped with the [WebBot](https://github.com/gesiscss/WebBot) browser extension. A similar Python library is [also available](https://github.com/gesiscss/WebBot-tutorials).

## Installation

You can install the development version of webbotparseR like so:

``` r
# install.packages("remotes")  # if remotes is not yet installed
remotes::install_github("gesistsa/webbotparseR")
```

The package ships with an example HTML file from a Google search on climate change.
```{r ex_file}
library(webbotparseR)
ex_file <- system.file("www.google.com_climatechange_text_2023-03-16_08_16_11.html", package = "webbotparseR")
```

Such search results can be parsed with `parse_search_results()`. The `engine` argument specifies both the search engine and the search type, for example `"google text"` for a Google text search.

```{r parse}
output <- parse_search_results(path = ex_file, engine = "google text")
output
```
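
The example file name appears to encode the search engine, the query, the search type, and a timestamp. The sketch below shows one way such a name could be split into an `engine` string and a query. It rests on assumptions: the pattern `<domain>_<query>_<type>_<date>_<HH>_<MM>_<SS>.html` is inferred from the single bundled example, and `describe_webbot_file()` is a hypothetical helper that is not part of webbotparseR.

```r
# Hypothetical helper (not part of webbotparseR). It assumes file names look like
# <domain>_<query>_<type>_<date>_<HH>_<MM>_<SS>.html, as the bundled example does.
describe_webbot_file <- function(path) {
    stem <- sub("\\.html$", "", basename(path))
    parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
    n <- length(parts)
    stopifnot(n >= 7)

    # The query might itself contain underscores, so the fixed fields
    # are read from both ends and the query is whatever lies between.
    domain <- parts[1]
    type <- parts[n - 4]
    query <- paste(parts[2:(n - 5)], collapse = "_")
    scraped_at <- paste(parts[n - 3], paste(parts[(n - 2):n], collapse = ":"))

    # "www.google.com" -> "google"
    engine_name <- sub("\\..*$", "", sub("^www\\.", "", domain))

    list(
        engine = paste(engine_name, type),
        query = query,
        scraped_at = scraped_at
    )
}

info <- describe_webbot_file(
    "www.google.com_climatechange_text_2023-03-16_08_16_11.html"
)
info$engine
info$query
```

If the pattern holds for your own files, `info$engine` could be passed as the `engine` argument of `parse_search_results()`. Check the value against the engines the package actually supports first.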

Note that images are always returned as base64-encoded strings.
```{r image}
output$image[1]
```

The function `base64_to_img()` can be used to decode the image and save it in an appropriate format.
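
To illustrate what decoding involves, here is a minimal sketch that decodes a base64 string by hand with base64enc, one of the package's declared imports. It does not show how `base64_to_img()` is implemented or what arguments it takes. The sample string is a stand-in built in the code, and the optional `data:<mime>;base64,` prefix is an assumption about how image strings may look.

```r
# Minimal sketch only: this is NOT the implementation of base64_to_img().
library(base64enc)

# Stand-in for one element of output$image: a data URI whose payload is the
# base64 encoding of the eight-byte PNG file signature.
png_signature <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
img_string <- paste0("data:image/png;base64,", base64encode(png_signature))

# Strip a "data:<mime>;base64," prefix if one is present (assumed format),
# then decode the remaining payload into raw bytes.
payload <- sub("^data:[^;]+;base64,", "", img_string)
decoded <- base64decode(payload)

# Write the raw bytes to a temporary file.
out_file <- tempfile(fileext = ".png")
writeBin(decoded, out_file)
```

For real results, prefer `base64_to_img()`, which also takes care of saving the image in an appropriate format.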

Owner

  • Name: Transparent Social Analytics
  • Login: gesistsa
  • Kind: organization
  • Location: Germany

Open Science Tools maintained by Transparent Social Analytics Team, GESIS

Citation (CITATION.cff)

# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "webbotparseR" in publications use:'
type: software
license: MIT
title: 'webbotparseR: Parse html files containing search engine results'
version: 0.1.0.9000
abstract: Parse search engine results which have been scraped with the 'WebBot' browser
  extension <https://github.com/gesiscss/WebBot>.
authors:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
- family-names: Chan
  given-names: Chung-hong
  email: chainsawtiney@gmail.com
  orcid: https://orcid.org/0000-0002-6232-7530
repository-code: https://github.com/gesistsa/webbotparseR
url: https://gesistsa.github.io/webbotparseR/
contact:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
keywords:
- browser-extension
- rstats
- rstats-package
- search-engine
references:
- type: software
  title: rvest
  abstract: 'rvest: Easily Harvest (Scrape) Web Pages'
  notes: Imports
  url: https://rvest.tidyverse.org/
  repository: https://CRAN.R-project.org/package=rvest
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  doi: 10.32614/CRAN.package.rvest
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Imports
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2024'
  doi: 10.32614/CRAN.package.tibble
- type: software
  title: fastmap
  abstract: 'fastmap: Fast Data Structures'
  notes: Imports
  url: https://r-lib.github.io/fastmap/
  repository: https://CRAN.R-project.org/package=fastmap
  authors:
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  year: '2024'
  doi: 10.32614/CRAN.package.fastmap
- type: software
  title: base64enc
  abstract: 'base64enc: Tools for base64 encoding'
  notes: Imports
  url: http://www.rforge.net/base64enc
  repository: https://CRAN.R-project.org/package=base64enc
  authors:
  - family-names: Urbanek
    given-names: Simon
    email: Simon.Urbanek@r-project.org
  year: '2024'
  doi: 10.32614/CRAN.package.base64enc
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2024'
  version: '>= 3.5'
- type: software
  title: testthat
  abstract: 'testthat: Unit Testing for R'
  notes: Suggests
  url: https://testthat.r-lib.org
  repository: https://CRAN.R-project.org/package=testthat
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  doi: 10.32614/CRAN.package.testthat
  version: '>= 3.0.0'

GitHub Events

Total
  • Delete event: 1
  • Push event: 2
  • Pull request event: 5
  • Create event: 1
Last Year
  • Delete event: 1
  • Push event: 2
  • Pull request event: 5
  • Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 7
  • Total pull requests: 8
  • Average time to close issues: 4 months
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.63
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • schochastics (6)
  • jobreu (1)
Pull Request Authors
  • schochastics (4)
  • chainsawriot (3)
  • ArthurMuehl (2)
  • wanLo (2)

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • R >= 3.5 depends
  • base64enc * imports
  • fastmap * imports
  • rvest * imports
  • tibble * imports
  • covr * suggests
  • testthat >= 3.0.0 suggests