tidyfst

tidyfst: Tidy Verbs for Fast Data Manipulation - Published in JOSS (2020)

https://github.com/hope-data-science/tidyfst

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org, zenodo.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Last synced: 6 months ago

Repository

Tidy Verbs for Fast Data Manipulation

Basic Info

Host: GitHub
Owner: hope-data-science
License: other
Language: R
Default Branch: master
Homepage: https://hope-data-science.github.io/tidyfst/
Size: 18.7 MB

Statistics

Stars: 106
Watchers: 5
Forks: 7
Open Issues: 0
Releases: 12

Created about 6 years ago · Last pushed 10 months ago

Metadata Files

Readme Contributing License Code of conduct Support

tidyfst: Tidy Verbs for Fast Data Manipulation

Overview

tidyfst is a toolkit of tidy data manipulation verbs with data.table as the backend. Combining the elegant syntax of dplyr with the computing performance of data.table, tidyfst aims to provide state-of-the-art data manipulation tools with the least pain. The package is an extension of data.table: besides offering a tidy syntax, it wraps combinations of efficient functions to simplify frequently used data operations. tidyfst will also introduce more tidy data verbs from other packages, including but not limited to the tidyverse and data.table. If you are a dplyr user who has to use data.table for fast computation, or a data.table user looking for more readable syntax, tidyfst is designed for you (and me, of course). For further details and tutorials, see the vignettes, which are available in both Chinese and English.

By now, tidyfst has an API that may even go beyond its predecessors (e.g. select_dt accepts nearly any kind of column specification). Enjoy the efficient data operations in tidyfst!
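As a brief illustration of the flexible column selection mentioned above, the sketch below calls select_dt with several kinds of selectors on the built-in iris data. The selector forms shown (bare names, a name range, integer positions, a regular expression string, and a predicate function) follow the package documentation as I understand it; treat them as assumptions and check `?select_dt` for the authoritative list.

```R
library(tidyfst)

# Bare column names
by_name <- iris %>% select_dt(Sepal.Length, Species)

# A contiguous range of columns, given by name
by_range <- iris %>% select_dt(Sepal.Length:Petal.Length)

# Integer positions
by_index <- iris %>% select_dt(1:3)

# A regular expression matched against column names
by_regex <- iris %>% select_dt("Pe")

# A predicate function applied to each column
by_type <- iris %>% select_dt(is.numeric)

names(by_regex)
```

Each call returns a new data.table and leaves iris itself unchanged.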

PS: For extreme performance in tidy syntax, try tidyfst's mirror package tidyft.

Features

Receives any data.frame (tibble/data.table/data.frame) and returns a data.table.
Shows the variable class of each data.table column by default.
Never uses in-place replacement (also known as modification by reference), so the original variable is never modified without notice.
Uses a suffix ("_dt") rather than a prefix, which makes typing more efficient (especially in an IDE with automatic code completion).
Offers more flexible verbs (e.g. pairwise_count_dt) for big data manipulation.
Supports data importing and parsing with fst, which saves both time and memory. For details, see parse_fst/select_fst/filter_fst and import_fst/export_fst.
Low and stable dependency on mature packages (data.table, fst, stringr).
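Two of the features above can be sketched in a few lines: that the verbs return a new data.table without touching their input, and that fst files can be parsed and queried without loading the whole table first. This is a minimal sketch, not a benchmark; the temporary file path and the `ratio` column are made up for illustration, and the function signatures are assumed from the package documentation.

```R
library(tidyfst)

# 1. No modification by reference: the input keeps its original columns.
dt <- data.table::as.data.table(head(iris))
res <- dt %>% mutate_dt(ratio = Sepal.Length / Sepal.Width)
data.table::is.data.table(res)
"ratio" %in% names(dt)   # expected FALSE per the feature list above

# 2. fst workflow: write once, then read only what is needed.
path <- tempfile(fileext = ".fst")   # hypothetical location
export_fst(iris, path)

ft <- parse_fst(path)                # a handle to the file, not the data itself
wide <- ft %>% filter_fst(Sepal.Width > 3)
cols <- ft %>% select_fst(Sepal.Length, Species)

full <- import_fst(path)             # read the entire file back
unlink(path)
```

Parsing first and then selecting or filtering keeps memory use low because only the requested rows or columns are materialised.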

Installation

install.packages("tidyfst")

Example

```R
library(tidyfst)

# Rename, select, filter, sort, de-duplicate, then summarise by group
iris %>%
  mutate_dt(group = Species, sl = Sepal.Length, sw = Sepal.Width) %>%
  select_dt(group, sl, sw) %>%
  filter_dt(sl > 5) %>%
  arrange_dt(group, sl) %>%
  distinct_dt(sl, .keep_all = TRUE) %>%
  summarise_dt(sw = max(sw), by = group)
#>         group  sw
#> 1:     setosa 4.4
#> 2: versicolor 3.4
#> 3:  virginica 3.8

# Count rows per species and add proportion columns
iris %>%
  count_dt(Species) %>%
  add_prop()
#>       Species  n      prop prop_label
#> 1:     setosa 50 0.3333333      33.3%
#> 2: versicolor 50 0.3333333      33.3%
#> 3:  virginica 50 0.3333333      33.3%

# Update columns only in rows that meet a condition
iris[3:8, ] %>%
  mutate_when(Petal.Width == .2, one = 1, Sepal.Length = 2)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species one
#> 1:          2.0         3.2          1.3         0.2  setosa   1
#> 2:          2.0         3.1          1.5         0.2  setosa   1
#> 3:          2.0         3.6          1.4         0.2  setosa   1
#> 4:          5.4         3.9          1.7         0.4  setosa  NA
#> 5:          4.6         3.4          1.4         0.3  setosa  NA
#> 6:          2.0         3.4          1.5         0.2  setosa   1
```

Future plans

tidyfst will keep up with updates to data.table. The next step is to introduce new features that improve performance and flexibility, facilitating fast data manipulation in tidy syntax.

Vignettes

Cheat sheet

Suggested citation

Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388, https://doi.org/10.21105/joss.02388

Related work

Acknowledgement

The author of maditr, Gregory Demin, and the author of fst, Marcus Klik, have helped me a lot in the development of this work. I am lucky to have them (and many other selfless contributors) in the same open source community of R.

Owner

Name: Hope
Login: hope-data-science
Kind: user
Location: Beijing
Company: Chinese Academy of Sciences

Repositories: 3
Profile: https://github.com/hope-data-science

Use R to change the world!

JOSS Publication

tidyfst: Tidy Verbs for Fast Data Manipulation

Published

August 21, 2020

DOI

10.21105/joss.02388

Volume 5, Issue 52, Page 2388

Authors

Tian-Yuan Huang

School of Life Science, Fudan University

Bin Zhao

School of Life Science, Fudan University

Editor

Mikkel Meyer Andersen

GitHub Events

Total

Watch event: 12
Push event: 2

Last Year

Watch event: 12
Push event: 2

Committers

Last synced: 7 months ago

All Time

Total Commits: 349
Total Committers: 3
Avg Commits per committer: 116.333
Development Distribution Score (DDS): 0.02

Past Year

Commits: 21
Committers: 1
Avg Commits per committer: 21.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Hope	3****e	342
Hadley Wickham	h**m@g**m	6
Michael Chirico	m**4@g**m	1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 24
Total pull requests: 2
Average time to close issues: 3 months
Average time to close pull requests: 3 days
Total issue authors: 19
Total pull request authors: 2
Average comments per issue: 3.88
Average comments per pull request: 3.5
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

markfairbanks (6)
michaelaoash (1)
XianglinZhang-risker (1)
rcannood (1)
lssb (1)
fc-ibb105 (1)
B-1991-ing (1)
hwanghan (1)
kongdd (1)
acpguedes (1)
jfdesomzee (1)
maskegger (1)
xiaoluolorn (1)
xiaodaigh (1)
hope-data-science (1)

Pull Request Authors

MichaelChirico (2)
hadley (1)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 3,414 last-month
Total docker downloads: 9

Total dependent packages: 2
Total dependent repositories: 2
Total versions: 27
Total maintainers: 1

cran.r-project.org: tidyfst

Tidy Verbs for Fast Data Manipulation

Homepage: https://github.com/hope-data-science/tidyfst
Documentation: http://cran.r-project.org/web/packages/tidyfst/tidyfst.pdf
License: MIT + file LICENSE
Latest release: 1.8.2
published 10 months ago

Versions: 27
Dependent Packages: 2
Dependent Repositories: 2
Downloads: 3,414 Last month
Docker Downloads: 9

Rankings

Stargazers count: 4.1%

Forks count: 7.9%

Downloads: 13.1%

Dependent packages count: 13.7%

Average: 14.0%

Dependent repos count: 19.3%

Docker downloads count: 25.8%

Maintainers (1)

huang.tian-yuan@qq.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.3.0 depends
data.table >= 1.13.0 imports
fst >= 0.9.0 imports
stringr >= 1.4.0 imports
bench * suggests
dplyr * suggests
ggplot2 * suggests
knitr * suggests
nycflights13 * suggests
pryr * suggests
rmarkdown * suggests
testthat * suggests
tidyr * suggests
