Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 32 committers (3.1%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.5%) to scientific vocabulary
Keywords
html
r
web-scraping
Keywords from Contributors
grammar
data-manipulation
tidy-data
rmarkdown
curl
pandoc
parsing
fwf
csv
package-creation
Last synced: 6 months ago
Repository
Simple web scraping for R
Basic Info
- Host: GitHub
- Owner: tidyverse
- License: other
- Language: R
- Default Branch: main
- Homepage: https://rvest.tidyverse.org
- Size: 12.8 MB
Statistics
- Stars: 1,506
- Watchers: 88
- Forks: 348
- Open Issues: 30
- Releases: 14
Topics
html
r
web-scraping
Created over 11 years ago · Last pushed 6 months ago
Metadata Files
Readme
Changelog
Contributing
License
Code of conduct
Codeowners
Support
README.Rmd
---
output: github_document
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```
# rvest
[CRAN status](https://cran.r-project.org/package=rvest)
[R-CMD-check](https://github.com/tidyverse/rvest/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/tidyverse/rvest)
## Overview
rvest helps you scrape (or harvest) data from web pages.
It is designed to work with [magrittr](https://github.com/tidyverse/magrittr) to make it easy to express common web scraping tasks, and is inspired by libraries like [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) and [RoboBrowser](http://robobrowser.readthedocs.io/en/latest/readme.html).
If you're scraping multiple pages, I highly recommend using rvest in concert with [polite](https://dmi3kno.github.io/polite/).
The polite package ensures that you're respecting the [robots.txt](https://en.wikipedia.org/wiki/Robots_exclusion_standard) and not hammering the site with too many requests.
## Installation
```{r, eval = FALSE}
# The easiest way to get rvest is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just rvest:
install.packages("rvest")
```
## Usage
```{r, message = FALSE}
library(rvest)
# Start by reading an HTML page with read_html():
starwars <- read_html("https://rvest.tidyverse.org/articles/starwars.html")

# Then find elements that match a CSS selector or XPath expression
# using html_elements(). In this example, each <section> corresponds
# to a different film
films <- starwars |> html_elements("section")
films

# Then use html_element() to extract one element per film. Here
# the title is given by the text inside <h2>
title <- films |>
  html_element("h2") |>
  html_text2()
title

# Or use html_attr() to get data out of attributes. html_attr() always
# returns a string so we convert it to an integer using a readr function
episode <- films |>
  html_element("h2") |>
  html_attr("data-id") |>
  readr::parse_integer()
episode
```
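The difference between `html_elements()` and `html_element()` is easiest to see on a page where some elements are missing. The following is a minimal offline sketch: the two-film page is made up for illustration and built with `minimal_html()`, and base R's `as.integer()` stands in for the readr function used above.

```r
library(rvest)

# Made-up page for illustration: the second film has no <p> element
html <- minimal_html('
  <section>
    <h2 data-id="1">Film A</h2>
    <p>Has a description</p>
  </section>
  <section>
    <h2 data-id="2">Film B</h2>
  </section>
')

films <- html |> html_elements("section")

# html_element() returns exactly one result per film,
# so the film without a <p> yields NA
description <- films |>
  html_element("p") |>
  html_text2()

# html_elements() returns every match, so the film without a <p>
# simply contributes nothing and the alignment with `films` is lost
all_descriptions <- films |>
  html_elements("p") |>
  html_text2()

# html_attr() returns character strings; convert them explicitly
episode <- films |>
  html_element("h2") |>
  html_attr("data-id") |>
  as.integer()
```

Because `description` has the same length as `films`, it can be combined with `episode` into a data frame without misaligned rows, which is usually why you want `html_element()` for per-item fields.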
If the page contains tabular data you can convert it directly to a data frame with `html_table()`:
```{r}
html <- read_html("https://en.wikipedia.org/w/index.php?title=The_Lego_Movie&oldid=998422565")
html |>
  html_element(".tracklist") |>
  html_table()
```
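The same call can be tried without a network connection. This is a small sketch on a made-up table (the class name mirrors the example above, but the rows are invented) built with `minimal_html()`:

```r
library(rvest)

# Made-up table, used only to illustrate the call
html <- minimal_html('
  <table class="tracklist">
    <tr><th>No.</th><th>Title</th></tr>
    <tr><td>1</td><td>First song</td></tr>
    <tr><td>2</td><td>Second song</td></tr>
  </table>
')

# Selecting a single element gives a single data frame;
# the header row supplies the column names
tracks <- html |>
  html_element(".tracklist") |>
  html_table()

tracks
```

If you select with `html_elements()` instead, `html_table()` returns a list with one data frame per matched table.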
Owner
- Name: tidyverse
- Login: tidyverse
- Kind: organization
- Website: http://tidyverse.org
- Repositories: 43
- Profile: https://github.com/tidyverse
The tidyverse is a collection of R packages that share common principles and are designed to work together seamlessly
GitHub Events
Total
- Issues event: 30
- Watch event: 30
- Delete event: 5
- Issue comment event: 46
- Push event: 16
- Pull request review event: 1
- Pull request event: 13
- Fork event: 12
- Create event: 2
Last Year
- Issues event: 30
- Watch event: 30
- Delete event: 5
- Issue comment event: 46
- Push event: 16
- Pull request review event: 1
- Pull request event: 13
- Fork event: 12
- Create event: 2
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Hadley Wickham | h****m@g****m | 373 |
| Mara Averick | m****k@g****m | 6 |
| john collins | j****s@g****m | 5 |
| Dmytro Perepolkin | d****n@g****m | 4 |
| Benjamin Skov Kaas-Hansen | e****n@h****m | 3 |
| Will May | w****y@l****m | 2 |
| Hiroaki Yutani | y****i@g****m | 2 |
| jrnold | j****d@g****m | 2 |
| jjchern | j****n@g****m | 2 |
| Kun Ren | k****n@r****e | 2 |
| Jim Hester | j****r@g****m | 2 |
| Jamie Lendrum | j****m@g****m | 2 |
| Eduardo Ariño de la Rubia | e****o@g****m | 2 |
| David Holstius | d****s@g****m | 2 |
| vtroost | 3****t | 1 |
| moody_mudskipper | a****i@g****m | 1 |
| leledavid | l****d@g****m | 1 |
| Z_Wael | z****s@g****m | 1 |
| William Doane | w****l@D****m | 1 |
| Sam | s****e | 1 |
| Raymond | 3****t | 1 |
| Michael Chirico | m****4@g****m | 1 |
| Matt Cowgill | m****l@g****m | 1 |
| Marcin Kosiński | k****m@s****l | 1 |
| Luis Verde Arregoitia | l****d@c****x | 1 |
| Brent Brewington | b****n@g****m | 1 |
| Charlotte Wickham | c****m@g****m | 1 |
| Craig Citro | c****o@g****m | 1 |
| Daniel Possenriede | p****e@g****m | 1 |
| Josh Duncan | j****d@g****m | 1 |
| and 2 more... | | |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 134
- Total pull requests: 62
- Average time to close issues: 10 months
- Average time to close pull requests: 5 months
- Total issue authors: 90
- Total pull request authors: 22
- Average comments per issue: 1.5
- Average comments per pull request: 1.24
- Merged pull requests: 40
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 18
- Pull requests: 13
- Average time to close issues: about 2 hours
- Average time to close pull requests: 1 day
- Issue authors: 17
- Pull request authors: 6
- Average comments per issue: 0.17
- Average comments per pull request: 0.85
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hadley (36)
- geotheory (3)
- petrbouchal (3)
- davidrsch (2)
- kjschaudt (2)
- alireza5969 (2)
- epiben (2)
- OlexiyPukhov (2)
- qpmnguyen (1)
- jeroenjanssens (1)
- romainfrancois (1)
- MattCowgill (1)
- litao1105 (1)
- jubilee2 (1)
- cregouby (1)
Pull Request Authors
- hadley (22)
- jonthegeek (4)
- epiben (4)
- MichaelChirico (3)
- jeroen (3)
- luisDVA (3)
- jrosell (2)
- SermetPekin (2)
- shikokuchuo (2)
- david-jankoski (2)
- MattCowgill (2)
- VisruthSK (2)
- ZWael (2)
- vtroost (1)
- HayesJohnD (1)
Top Labels
Issue Labels
- feature (18)
- table 🏓 (11)
- bug (7)
- documentation (6)
- form 🧾 (5)
- upkeep (4)
- live 🐤 (3)
- reprex (2)
- help wanted ❤️ (1)
Pull Request Labels
Packages
- Total packages: 2
- Total downloads: cran 660,454 last month
- Total docker downloads: 45,956,794
- Total dependent packages: 284 (may contain duplicates)
- Total dependent repositories: 1,350 (may contain duplicates)
- Total versions: 30
- Total maintainers: 1
cran.r-project.org: rvest
Easily Harvest (Scrape) Web Pages
- Homepage: https://rvest.tidyverse.org/
- Documentation: http://cran.r-project.org/web/packages/rvest/rvest.pdf
- License: MIT + file LICENSE
- Latest release: 1.0.5 (published 6 months ago)
Rankings
Forks count: 0.1%
Stargazers count: 0.1%
Dependent repos count: 0.3%
Downloads: 0.4%
Dependent packages count: 0.4%
Average: 3.1%
Docker downloads count: 17.3%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/tidyverse/rvest
- Documentation: https://pkg.go.dev/github.com/tidyverse/rvest#section-documentation
- License: other
- Latest release: v1.0.5 (published 6 months ago)
Rankings
Dependent packages count: 5.5%
Average: 5.7%
Dependent repos count: 5.9%
Last synced: 6 months ago
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action 4.1.4 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pr-commands.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/pr-fetch v2 composite
- r-lib/actions/pr-push v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 3.2 depends
- cli * imports
- glue * imports
- httr >= 0.5 imports
- lifecycle >= 1.0.3 imports
- magrittr * imports
- rlang >= 1.0.0 imports
- selectr * imports
- tibble * imports
- withr * imports
- xml2 >= 1.3 imports
- covr * suggests
- knitr * suggests
- readr * suggests
- repurrrsive * suggests
- rmarkdown * suggests
- spelling * suggests
- stringi >= 0.3.1 suggests
- testthat >= 3.0.2 suggests
- webfakes * suggests