readthat

Read Text Data

https://github.com/mkearney/readthat

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary

Keywords

http-get r r-package read read-file read-url readsource readtextfile rstats text
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mkearney
  • License: other
  • Language: R
  • Default Branch: master
  • Size: 1.5 MB
Statistics
  • Stars: 26
  • Watchers: 3
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 6 years ago · Last pushed over 6 years ago
Metadata Files
Readme License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(readthat)
```

# readthat 


[![CRAN status](https://www.r-pkg.org/badges/version/readthat)](https://CRAN.R-project.org/package=readthat)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![Travis build status](https://travis-ci.org/mkearney/readthat.svg?branch=master)](https://travis-ci.org/mkearney/readthat)
[![Codecov test coverage](https://codecov.io/gh/mkearney/readthat/branch/master/graph/badge.svg)](https://codecov.io/gh/mkearney/readthat?branch=master)


Quickly read text/source from local files and web pages.

## Installation

You can install the development version of readthat from GitHub with:

``` r
remotes::install_github("mkearney/readthat")
```

## Examples

Suppose we want to read in the source of the following websites:

```{r}
## a vector of URLs
urls <- c(
  "https://mikewk.com",
  "https://cnn.com",
  "https://www.cnn.com/us"
)
```

Use `readthat::read()` to read the text/source of a single file or URL:

```{r}
## read a single web page or file (returns a text vector)
x <- read(urls[1])

## preview output
substr(x, 1, 60)

## use apply functions to read multiple pages
xx <- sapply(urls, read)

## preview output
lapply(xx, substr, 1, 60)
```
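
When reading many sources, one unreachable URL or missing file would otherwise stop the whole `sapply()` call. The sketch below shows one way to guard against that. `read_all()` is a hypothetical helper, not part of readthat, and its default `reader` is a base R stand-in so the example runs without the package or a network connection; passing `reader = readthat::read` is assumed to work the same way.

```r
## Hypothetical helper, not part of readthat: read several sources and
## return NA for any that cannot be read instead of stopping.
## `reader` defaults to a base R stand-in; pass readthat::read to use the package.
read_all <- function(paths, reader = function(p) readLines(p, warn = FALSE)) {
  vapply(paths, function(p) {
    tryCatch(
      ## collapse to one string so every result has length 1
      paste(suppressWarnings(reader(p)), collapse = "\n"),
      error = function(e) NA_character_
    )
  }, character(1))
}

## usage with local files (no network needed)
ok <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), ok)
bad <- tempfile(fileext = ".txt")  # never created, so reading it fails

res <- read_all(c(ok, bad))
unname(res)
```

Using `vapply()` with `character(1)` instead of `sapply()` keeps the return type predictable: the result is always a character vector named by input path, with `NA` marking the sources that failed.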

## Comparisons

Benchmark comparison for reading a text file:

```{r}
## save a text file
x <- tempfile()
writeLines(read(urls[1]), x)

## compare read times
bm_file <- bench::mark(
  readr = readr::read_lines(x),
  readthat = read(x),
  readLines = readLines(x),
  check = FALSE
)

## view results
bm_file
```
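
The benchmark above passes `check = FALSE`, which turns off `bench::mark()`'s default check that all expressions return equal results. One plausible reason is that different readers return differently shaped values, for example one element per line versus a single string. The following sketch illustrates this with base R readers only; the synthetic file and all variable names are assumptions made for the example, and it makes no claim about what `readthat::read()` itself returns.

```r
## build a reproducible local test file (no network needed)
set.seed(1)
lines <- vapply(
  seq_len(1000),
  function(i) paste(sample(letters, 60, replace = TRUE), collapse = ""),
  character(1)
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

## two base R readers that return differently shaped results
by_line <- readLines(path)                   # one element per line
as_one  <- readChar(path, file.size(path))   # a single string

## the raw results differ, so an equality check across readers would fail
same_raw <- identical(by_line, as_one)

## normalise line endings and the trailing newline before comparing
as_one_norm <- sub("\n$", "", gsub("\r\n", "\n", as_one, fixed = TRUE))
same_norm <- identical(paste(by_line, collapse = "\n"), as_one_norm)

c(raw = same_raw, normalised = same_norm)
```

If the readers being compared can be normalised like this, the normalisation could be moved inside each benchmarked expression so that the default check stays enabled, at the cost of timing the extra work.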


```{r, include=FALSE}
## build the plot, then save it explicitly (ggsave() is not a plot layer)
p1 <- ggplot2::autoplot(bm_file)
ggplot2::ggsave("man/figures/README-bm_file.png", plot = p1,
    width = 9, height = 5, units = "in")
```


![](man/figures/README-bm_file.png)

Benchmark comparison for reading a web page:

```{r}
x <- "https://www.espn.com/nfl/scoreboard"
bm_html <- bench::mark(
  httr = httr::content(httr::GET(x), as = "text", encoding = "UTF-8"),
  xml2 = xml2::read_html(x),
  readthat = read(x),
  readLines = readLines(x, warn = FALSE),
  readr = readr::read_lines(x),
  check = FALSE,
  iterations = 25,
  filter_gc = TRUE
)
bm_html
```

```{r, include=FALSE}
## build the plot, then save it explicitly (ggsave() is not a plot layer)
p2 <- ggplot2::autoplot(bm_html)
ggplot2::ggsave("man/figures/README-bm_html.png", plot = p2,
    width = 9, height = 5, units = "in")
```


![](man/figures/README-bm_html.png)

Owner

  • Name: Michael W. Kearney
  • Login: mkearney
  • Kind: user
  • Location: United States
  • Company: @AwareHQ

📊🧑‍💻📊 Senior Data Scientist

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 30
  • Total Committers: 1
  • Avg Commits per committer: 30.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • mkearney (k****w@m****u): 30 commits

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jimhester (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
cran.r-project.org: readthat

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 10.3%
Average: 26.1%
Forks count: 28.8%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Last synced: 10 months ago

Dependencies

DESCRIPTION cran
  • Rcpp * imports
  • curl * imports
  • covr * suggests
  • testthat >= 2.1.0 suggests