readr

Read flat files (csv, tsv, fwf) into R

https://github.com/tidyverse/readr

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
○
.zenodo.json file
○
DOI references
○
Academic publication links
✓
Committers with academic emails
14 of 113 committers (12.4%) from academic institutions
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (20.5%) to scientific vocabulary

Keywords

csv fwf parsing r

Keywords from Contributors

grammar data-manipulation tidy-data visualisation rmarkdown package-creation tidyverse documentation-tool curl latex

Last synced: 6 months ago · JSON representation

Repository

Read flat files (csv, tsv, fwf) into R

Basic Info

Host: GitHub
Owner: tidyverse
License: other
Language: R
Default Branch: main
Homepage: https://readr.tidyverse.org
Size: 11.8 MB

Statistics

Stars: 1,018
Watchers: 51
Forks: 287
Open Issues: 117
Releases: 18

Topics

csv fwf parsing r

Created over 12 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog Contributing License Code of conduct Codeowners Support

README.Rmd

---
output: github_document
---



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# readr 


[![CRAN status](https://www.r-pkg.org/badges/version/readr)](https://CRAN.R-project.org/package=readr)
[![R-CMD-check](https://github.com/tidyverse/readr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/tidyverse/readr/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/tidyverse/readr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/tidyverse/readr?branch=main)


## Overview

The goal of readr is to provide a fast and friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV).
It is designed to parse many types of data found in the wild, while providing an informative problem report when parsing leads to unexpected results.
If you are new to readr, the best place to start is the [data import chapter](https://r4ds.hadley.nz/data-import) in R for Data Science.

## Installation

```{r, eval = FALSE}
# The easiest way to get readr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just readr:
install.packages("readr")
```

::: .pkgdown-devel
```{r, eval = FALSE}
# Or you can install the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/readr")
```
:::

## Cheatsheet



## Usage

readr is part of the core tidyverse, so you can load it with:

```{r}
library(tidyverse)
```

Of course, you can also load readr as an individual package:

```{r eval = FALSE}
library(readr)
```

To read a rectangular dataset with readr, you combine two pieces: a function that parses the lines of the file into individual fields and a column specification.

readr supports the following file formats with these `read_*()` functions:

* `read_csv()`: comma-separated values (CSV)
* `read_tsv()`: tab-separated values (TSV)
* `read_csv2()`: semicolon-separated values with `,` as the decimal mark
* `read_delim()`: delimited files (CSV and TSV are important special cases)
* `read_fwf()`: fixed-width files
* `read_table()`: whitespace-separated files
* `read_log()`: web log files

A column specification describes how each column should be converted from a character vector to a specific data type (e.g. character, numeric, datetime, etc.).
In the absence of a column specification, readr will guess column types from the data.
`vignette("column-types")` gives more detail on how readr guesses the column types.
Column type guessing is very handy, especially during data exploration, but it's important to remember these are *just guesses*.
As any data analysis project matures past the exploratory phase, the best strategy is to provide explicit column types.

The following example loads a sample file bundled with readr and guesses the column types:

```{r}
(chickens <- read_csv(readr_example("chickens.csv")))
```

Note that readr prints the column types -- the *guessed* column types, in this case.
This is useful because it allows you to check that the columns have been read in as you expect.
If they haven't, that means you need to provide the column specification.
This sounds like a lot of trouble, but luckily readr affords a nice workflow for this.
Use `spec()` to retrieve the (guessed) column specification from your initial effort.

```{r}
spec(chickens)
```

Now you can copy, paste, and tweak this, to create a more explicit readr call that expresses the desired column types.
Here we express that `sex` should be a factor with levels `rooster` and `hen`, in that order, and that `eggs_laid` should be integer.

```{r}
chickens <- read_csv(
  readr_example("chickens.csv"),
  col_types = cols(
    chicken   = col_character(),
    sex       = col_factor(levels = c("rooster", "hen")),
    eggs_laid = col_integer(),
    motto     = col_character()
  )
)
chickens
```

`vignette("readr")` gives an expanded introduction to readr.

## Editions

readr got a new parsing engine in version 2.0.0 (released July 2021).
In this so-called second edition, readr calls `vroom::vroom()`, by default.

The parsing engine in readr versions prior to 2.0.0 is now called the first edition.
If you’re using readr >= 2.0.0, you can still access first edition parsing via the functions `with_edition(1, ...)` and `local_edition(1)`.
And, obviously, if you're using readr < 2.0.0, you will get first edition parsing, by definition, because that's all there is.

We will continue to support the first edition for a number of releases, but the overall goal is to make the second edition uniformly better than the first.
Therefore the plan is to eventually deprecate and then remove the first edition code.
New code and actively-maintained code should use the second edition.
The workarounds `with_edition(1, ...)` and `local_edition(1)` are offered as a pragmatic way to patch up legacy code or as a temporary solution for infelicities identified as the second edition matures.

## Alternatives

There are two main alternatives to readr: base R and data.table's `fread()`.
The most important differences are discussed below.

### Base R

Compared to the corresponding base functions, readr functions:

* Use a consistent naming scheme for the parameters (e.g. `col_names` and 
 `col_types` not `header` and `colClasses`).
 
* Are generally much faster (up to 10x-100x) depending on the dataset.

* Leave strings as is by default, and automatically parse common 
  date/time formats.

* Have a helpful progress bar if loading is going to take a while.

* All functions work exactly the same way regardless of the current locale.
  To override the US-centric defaults, use `locale()`.

### data.table and `fread()`

[data.table](https://github.com/Rdatatable/data.table) has a function similar to `read_csv()` called `fread()`. Compared to `fread()`, readr functions:

* Are sometimes slower, particularly on numeric heavy data.

* Can automatically guess some parameters, but basically encourage explicit
  specification of, e.g., the delimiter, skipped rows, and the header row.
  
* Follow tidyverse-wide conventions, such as returning a tibble, a standard
  approach for column name repair, and a common mini-language for column
  selection.

## Acknowledgements

Thanks to:

* [Joe Cheng](https://github.com/jcheng5) for showing me the beauty of
  deterministic finite automata for parsing, and for teaching me why I 
  should write a tokenizer.
  
* [JJ Allaire](https://github.com/jjallaire) for helping me come up with a
  design that makes very few copies, and is easy to extend.
  
* [Dirk Eddelbuettel](http://dirk.eddelbuettel.com) for coming up with the
  name!

Owner

Name: tidyverse
Login: tidyverse
Kind: organization

Website: http://tidyverse.org
Repositories: 43
Profile: https://github.com/tidyverse

The tidyverse is a collection of R packages that share common principles and are designed to work together seamlessly

GitHub Events

Total

Issues event: 17
Watch event: 16
Issue comment event: 27
Pull request event: 2
Fork event: 5

Last Year

Issues event: 17
Watch event: 16
Issue comment event: 27
Pull request event: 2
Fork event: 5

Committers

Last synced: 8 months ago

All Time

Total Commits: 1,774
Total Committers: 113
Avg Commits per committer: 15.699
Development Distribution Score (DDS): 0.636

Past Year

Commits: 2
Committers: 2
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.5

Top Committers

Name	Email	Commits
Jim Hester	j**r@g**m	646
hadley	h**m@g**m	633
Romain François	r**n@r**m	121
Jenny Bryan	j**n@g**m	101
Nicolas Coutin	n**n@g**m	32
jrnold	j**d@g**m	24
Kirill Müller	k**r@m**g	13
Shelby Bearrows	3****s	13
Mara Averick	m**k@g**m	12
Edwin de Jonge	e**e@g**m	8
Hiroaki Yutani	y**i@g**m	6
Oliver Keyes	I****s	6
Kirill Müller	k**r@i**h	6
andres-s	a**i@g**m	5
jennybc	j**y@s**a	5
Aymeric Stamm	a**m@c**u	5
Will Beasley	w**y@h**m	4
Mikhail Popov	m**l@m**m	4
John Blischak	j**k@g**m	4
Christophe Dervieux	c**x@g**m	4
ironholds	i**s@g**m	3
Duncan Murdoch	m**n@g**m	3
Sam Brady	s**3@g**m	3
Michael Quinn	m**n@a**u	3
Greg Freedman Ellis	g**n@g**m	3
Abraham Neuwirth	Y****e	3
Peter DeWitt (optiplex)	p**t@u**u	3
Lionel Henry	l**y@g**m	2
Kenneth Benoit	k**t@l**k	2
Jeroen Ooms	j**s@g**m	2
and 83 more...

Committer Domains (Top 20 + Academic)

rstudio.com: 2 r-enthusiasts.com: 1 mailbox.org: 1 ivt.baug.ethz.ch: 1 stat.ubc.ca: 1 childrens.harvard.edu: 1 mpopov.com: 1 aya.yale.edu: 1 ucdenver.edu: 1 lse.ac.uk: 1 manchester.ac.uk: 1 usc.edu: 1 stanford.edu: 1 missouri.edu: 1 nexleaf.org: 1 clarkberg.org: 1 ecohealthalliance.org: 1 brattle.com: 1 medisin.uio.no: 1 cmu.edu: 1 wit.edu.pl: 1 caltech.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 173
Total pull requests: 25
Average time to close issues: 5 months
Average time to close pull requests: about 1 month
Total issue authors: 137
Total pull request authors: 16
Average comments per issue: 1.76
Average comments per pull request: 1.76
Merged pull requests: 13
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 24
Pull requests: 3
Average time to close issues: about 8 hours
Average time to close pull requests: N/A
Issue authors: 24
Pull request authors: 2
Average comments per issue: 0.38
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

jennybc (13)
jxu (4)
dpprdan (4)
hadley (4)
sbearrows (3)
ggrothendieck (3)
chainsawriot (2)
presnell (2)
abalter (2)
zackw (2)
mine-cetinkaya-rundel (2)
nacnudus (2)
twest820 (2)
hidekoji (2)
charliejhadley (2)

Pull Request Authors

sbearrows (6)
jennybc (3)
drvictorvs (2)
cgiachalis (2)
bastistician (2)
hadley (2)
drmowinckels (2)
dpprdan (2)
spaette (2)
peterdesmet (1)
keesdeschepper (1)
malcolmbarrett (1)
salim-b (1)
zekiakyol (1)
eitsupi (1)

Top Labels

Issue Labels

bug (19) read :book: (17) feature (16) documentation (10) upkeep (8) datetime 📆 (6) reprex (6) write :pencil2: (4) encoding � (3) locale :earth_asia: (3) col_types :hospital: (3) collector (2) Windows 🪟 (2) write (1) read (1) lifecycle :butterfly: (1)

Pull Request Labels

Packages

Total packages: 2
Total downloads:
- cran 915,989 last-month
Total docker downloads: 46,961,679

Total dependent packages: 860
(may contain duplicates)
Total dependent repositories: 4,931
(may contain duplicates)
Total versions: 43
Total maintainers: 1

cran.r-project.org: readr

Read Rectangular Text Data

Homepage: https://readr.tidyverse.org
Documentation: http://cran.r-project.org/web/packages/readr/readr.pdf
License: MIT + file LICENSE
Latest release: 2.1.5
published about 2 years ago

Versions: 22
Dependent Packages: 860
Dependent Repositories: 4,931
Downloads: 915,989 Last month
Docker Downloads: 46,961,679

Rankings

Dependent repos count: 0.1%

Downloads: 0.1%

Dependent packages count: 0.1%

Forks count: 0.2%

Stargazers count: 0.3%

Average: 3.0%

Docker downloads count: 17.3%

Maintainers (1)

jenny@posit.co

Last synced: 6 months ago

proxy.golang.org: github.com/tidyverse/readr

Documentation: https://pkg.go.dev/github.com/tidyverse/readr#section-documentation
License: other
Latest release: v2.1.5+incompatible
published about 2 years ago

Versions: 21
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.4 depends
R6 * imports
cli >= 3.2.0 imports
clipr * imports
crayon * imports
hms >= 0.4.1 imports
lifecycle >= 0.2.0 imports
methods * imports
rlang * imports
tibble * imports
utils * imports
vroom >= 1.5.6 imports
covr * suggests
curl * suggests
datasets * suggests
knitr * suggests
rmarkdown * suggests
spelling * suggests
stringi * suggests
testthat >= 3.1.2 suggests
tzdb >= 0.1.1 suggests
waldo * suggests
withr * suggests
xml2 * suggests

.github/workflows/R-CMD-check-against-dev-vroom.yaml actions

actions/checkout v2 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/R-CMD-check.yaml actions

actions/checkout v3 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/pkgdown.yaml actions

JamesIves/github-pages-deploy-action v4.4.1 composite
actions/checkout v3 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/pr-commands.yaml actions

actions/checkout v3 composite
r-lib/actions/pr-fetch v2 composite
r-lib/actions/pr-push v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/test-coverage.yaml actions

actions/checkout v3 composite
actions/upload-artifact v3 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite