paperwizard
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.1%) to scientific vocabulary
Keywords from Contributors
Repository
Basic Info
Statistics
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 3
- Releases: 2
Metadata Files
README.md
paperwizard 
paperwizard is an R package that extracts readable content (such as news articles) from webpages using Readability.js. It uses Node.js to parse a webpage and identify the main content of an article, so you can work with cleaner, structured text.
The package is intended as an add-on for paperboy.
Installation
You can install the development version of paperwizard like so:
r
remotes::install_github("schochastics/paperwizard")
Setup
To use paperwizard, you need to have Node.js installed. Download and install Node.js from the official website, which offers instructions for all major operating systems. After installing Node.js, you can confirm the installation by running the following command in your terminal.
bash
node -v
This should return the version of Node.js installed.
To make sure that the package knows where the command node is found, set
r
options(paperwizard.node_path = "/path/to/node")
if it is not installed in a standard location.
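As a minimal sketch of this step, the snippet below looks up the node binary on the PATH with base R and sets the option only when something is found. The option name paperwizard.node_path comes from the text above; the lookup logic is an illustration and not part of the package.

```r
# Look for the Node.js binary on the PATH.
# Sys.which() returns an empty string when the command is not found.
node_path <- unname(Sys.which("node"))

if (nzchar(node_path)) {
  # Tell paperwizard where node lives (option name taken from the README).
  options(paperwizard.node_path = node_path)
  # Print the version reported by this binary, as `node -v` would.
  system2(node_path, "--version")
} else {
  message("Node.js was not found on the PATH; set paperwizard.node_path manually.")
}
```

If the lookup fails, fall back to setting the option by hand with the full path to your Node.js installation.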
Once Node.js is installed, install the required JavaScript libraries (linkedom, Readability.js, puppeteer, and axios) by running
r
pw_npm_install()
Use
You can use it either by supplying a URL
r
pw_deliver(url)
or a data.frame that was created by paperboy::pb_collect()
r
x <- paperboy::pb_collect(list_of_urls)
pw_deliver(x)
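The single-URL call above can be sketched end to end as follows. The URL is a hypothetical placeholder, and the structure of the returned object is not documented in this README, so the example only inspects it with str(). The guard and the error handling are illustrative additions, assuming that pw_deliver() signals an R error when a page cannot be fetched or parsed.

```r
# Hypothetical article URL, used for illustration only.
url <- "https://www.example.com/news/some-article"

# Run only when paperwizard is installed, so the sketch fails gracefully.
if (requireNamespace("paperwizard", quietly = TRUE)) {
  article <- tryCatch(
    paperwizard::pw_deliver(url),
    error = function(e) {
      # Assumption: failures (no Node.js, network error, unparsable page)
      # surface as regular R errors.
      message("Could not process ", url, ": ", conditionMessage(e))
      NULL
    }
  )
  # Inspect whatever came back; the exact columns are not specified here.
  if (!is.null(article)) str(article)
}
```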
Known sites with issues
Owner
- Name: David Schoch
- Login: schochastics
- Kind: user
- Location: Germany
- Company: cynkra
- Website: mr.schochastics.net
- Repositories: 131
- Profile: https://github.com/schochastics
Data Scientist/DevOps Engineer at cynkra and #RStats developer
Citation (CITATION.cff)
# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
cff-version: 1.2.0
message: 'To cite package "paperwizard" in publications use:'
type: software
license: MIT
title: 'paperwizard: Scrape News Sites using ''readability.js'''
version: 0.2.0.9000
abstract: uses Mozillas readability.js to scrape text from websites.
authors:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
repository-code: https://github.com/schochastics/paperwizard
url: https://github.com/schochastics/paperwizard
contact:
- family-names: Schoch
  given-names: David
  email: david@schochastics.net
  orcid: https://orcid.org/0000-0003-2952-4812
references:
- type: software
  title: cli
  abstract: 'cli: Helpers for Developing Command Line Interfaces'
  notes: Imports
  url: https://cli.r-lib.org
  repository: https://CRAN.R-project.org/package=cli
  authors:
  - family-names: Csárdi
    given-names: Gábor
    email: csardi.gabor@gmail.com
  year: '2024'
  doi: 10.32614/CRAN.package.cli
- type: software
  title: jsonlite
  abstract: 'jsonlite: A Simple and Robust JSON Parser and Generator for R'
  notes: Imports
  url: https://jeroen.r-universe.dev/jsonlite
  repository: https://CRAN.R-project.org/package=jsonlite
  authors:
  - family-names: Ooms
    given-names: Jeroen
    email: jeroenooms@gmail.com
    orcid: https://orcid.org/0000-0002-4035-0289
  year: '2024'
  identifiers:
  - type: url
    value: https://arxiv.org/abs/1403.2805
  doi: 10.32614/CRAN.package.jsonlite
- type: software
  title: processx
  abstract: 'processx: Execute and Control System Processes'
  notes: Imports
  url: https://processx.r-lib.org
  repository: https://CRAN.R-project.org/package=processx
  authors:
  - family-names: Csárdi
    given-names: Gábor
    email: csardi.gabor@gmail.com
    orcid: https://orcid.org/0000-0001-7098-9676
  - family-names: Chang
    given-names: Winston
  year: '2024'
  doi: 10.32614/CRAN.package.processx
- type: software
  title: tibble
  abstract: 'tibble: Simple Data Frames'
  notes: Imports
  url: https://tibble.tidyverse.org/
  repository: https://CRAN.R-project.org/package=tibble
  authors:
  - family-names: Müller
    given-names: Kirill
    email: kirill@cynkra.com
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Wickham
    given-names: Hadley
    email: hadley@rstudio.com
  year: '2024'
  doi: 10.32614/CRAN.package.tibble
GitHub Events
Total
- Create event: 3
- Release event: 4
- Issues event: 11
- Watch event: 6
- Delete event: 2
- Issue comment event: 2
- Public event: 1
- Push event: 44
- Pull request event: 3
Last Year
- Create event: 3
- Release event: 4
- Issues event: 11
- Watch event: 6
- Delete event: 2
- Issue comment event: 2
- Public event: 1
- Push event: 44
- Pull request event: 3
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| schochastics | d****d@s****t | 60 |
| dependabot[bot] | 4****] | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 8
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- schochastics (8)
Pull Request Authors
- ArthurMuehl (2)
- dependabot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- adaR * imports
- jsonlite * imports
- lubridate * imports
- processx * imports
- tibble * imports
- actions/checkout v4 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- 164 dependencies
- @mozilla/readability ^0.5.0
- axios ^1.7.7
- jsdom ^25.0.1
- linkedom ^0.18.5
- puppeteer ^23.6.0