tidyfst
tidyfst: Tidy Verbs for Fast Data Manipulation - Published in JOSS (2020)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 5 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Tidy Verbs for Fast Data Manipulation
Basic Info
- Host: GitHub
- Owner: hope-data-science
- License: other
- Language: R
- Default Branch: master
- Homepage: https://hope-data-science.github.io/tidyfst/
- Size: 18.7 MB
Statistics
- Stars: 106
- Watchers: 5
- Forks: 7
- Open Issues: 0
- Releases: 12
Metadata Files
README.md
tidyfst: Tidy Verbs for Fast Data Manipulation
Overview
tidyfst is a toolkit of tidy data manipulation verbs with data.table as the backend. Combining the elegant syntax of dplyr with the computing performance of data.table, tidyfst aims to give users state-of-the-art data manipulation tools with the least pain. The package is an extension of data.table: it offers a tidy syntax and also wraps combinations of efficient functions to simplify frequently used data operations. tidyfst will also introduce more tidy data verbs from other packages, including but not limited to tidyverse and data.table. If you are a dplyr user who has to use data.table for speedy computation, or a data.table user looking for a more readable coding syntax, tidyfst is designed for you (and me of course). For further details and tutorials, see the vignettes, which are available in both Chinese and English.
So far, tidyfst has an API that might even transcend its predecessors (e.g. select_dt accepts nearly any kind of column specification). Enjoy the efficient data operations in tidyfst!
PS: For extreme performance in tidy syntax, try tidyfst's mirror package tidyft.
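As a minimal sketch of the flexible column selection mentioned above, the calls below show several ways of passing columns to `select_dt`. It assumes tidyfst is installed and uses the built-in `iris` data; the variable names on the left-hand side are only illustrative.

```R
library(tidyfst)

# Select by bare column names
by_name <- iris %>% select_dt(Sepal.Length, Sepal.Width)

# Select by integer position
by_position <- iris %>% select_dt(1:2)

# Select by a regular expression matched against column names
by_pattern <- iris %>% select_dt("Pe")

# Select by a predicate function applied to each column
by_type <- iris %>% select_dt(is.factor)

# Drop a column with a minus sign
without_species <- iris %>% select_dt(-Species)

names(by_pattern)
names(by_type)
```

All five calls return a data.table, so they can be chained with the other `_dt` verbs.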
Features
- Receives any data.frame (tibble/data.table/data.frame) and returns a data.table.
- Shows the variable class of each data.table column by default.
- Never uses in-place replacement (also known as modification by reference), so the original variable is never modified without notice.
- Uses a suffix ("_dt") rather than a prefix to make typing more efficient (especially in an IDE with automatic code completion).
- Offers more flexible verbs (e.g. pairwise_count_dt) for big data manipulation.
- Supports data importing and parsing with fst, which saves both time and memory. For details, see parse_fst/select_fst/filter_fst and import_fst/export_fst.
- Has a low and stable dependency on mature packages (data.table, fst, stringr).
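The following sketch illustrates three of the features listed above: data.table output from a plain data.frame, no modification by reference, and querying an fst file without loading it fully. It assumes tidyfst and its dependencies are installed; the temporary file path and the variable names are hypothetical and chosen only for this example.

```R
library(tidyfst)

# A plain data.frame goes in; a data.table comes out.
long_sepals <- filter_dt(iris, Sepal.Length > 7)
class(long_sepals)

# The input is not modified by reference: the new column
# should appear only in the returned object.
dt <- data.table::as.data.table(iris)
with_ratio <- mutate_dt(dt, ratio = Sepal.Length / Sepal.Width)
"ratio" %in% names(dt)
"ratio" %in% names(with_ratio)

# Write the data to an fst file (hypothetical temporary path).
path <- tempfile(fileext = ".fst")
export_fst(iris, path)

# Parse the file without reading its contents into memory,
# then pull only the rows or columns that are needed.
ft <- parse_fst(path)
big_rows <- ft %>% filter_fst(Sepal.Length > 7)
widths <- ft %>% select_fst(Sepal.Width, Petal.Width)

# Read the whole file back when everything is needed.
full <- import_fst(path)

unlink(path)
```

Working through `parse_fst` first is useful when a file is larger than what should be held in memory, since only the selected rows or columns are read.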
Installation
```R
install.packages("tidyfst")
```
Example
```R
library(tidyfst)

# Rename, select, filter, sort, de-duplicate, then summarise by group
iris %>%
  mutate_dt(group = Species, sl = Sepal.Length, sw = Sepal.Width) %>%
  select_dt(group, sl, sw) %>%
  filter_dt(sl > 5) %>%
  arrange_dt(group, sl) %>%
  distinct_dt(sl, .keep_all = TRUE) %>%
  summarise_dt(sw = max(sw), by = group)
#>         group    sw
#>        <fctr> <num>
#> 1:     setosa   4.4
#> 2: versicolor   3.4
#> 3:  virginica   3.8

# Count rows per species and add proportions
iris %>%
  count_dt(Species) %>%
  add_prop()
#>       Species     n      prop prop_label
#>        <fctr> <int>     <num>     <char>
#> 1:     setosa    50 0.3333333      33.3%
#> 2: versicolor    50 0.3333333      33.3%
#> 3:  virginica    50 0.3333333      33.3%

# Update columns only in the rows where the condition holds
iris[3:8,] %>%
  mutate_when(Petal.Width == .2, one = 1, Sepal.Length = 2)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   one
#>           <num>       <num>        <num>       <num>  <fctr> <num>
#> 1:          2.0         3.2          1.3         0.2  setosa     1
#> 2:          2.0         3.1          1.5         0.2  setosa     1
#> 3:          2.0         3.6          1.4         0.2  setosa     1
#> 4:          5.4         3.9          1.7         0.4  setosa    NA
#> 5:          4.6         3.4          1.4         0.3  setosa    NA
#> 6:          2.0         3.4          1.5         0.2  setosa     1
```
Future plans
tidyfst will keep up with updates to data.table and, as a next step, will introduce new features that improve performance and flexibility for fast data manipulation in tidy syntax.
Vignettes
- Example 1: Basic usage
- Example 2: Join tables
- Example 3: Reshape
- Example 4: Nest
- Example 5: Fst
- Example 6: Dt
Cheat sheet
Suggested citation
Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388, https://doi.org/10.21105/joss.02388
Related work
Acknowledgement
Gregory Demin, the author of maditr, and Marcus Klik, the author of fst, have helped me a lot in the development of this work. I am lucky to have them (and many other selfless contributors) in the same R open source community.
Owner
- Name: Hope
- Login: hope-data-science
- Kind: user
- Location: Beijing
- Company: Chinese Academy of Sciences
- Repositories: 3
- Profile: https://github.com/hope-data-science
Use R to change the world!
JOSS Publication
tidyfst: Tidy Verbs for Fast Data Manipulation
Authors
Tags
data.table, data aggregation, data manipulation, dplyr, tidyfst
GitHub Events
Total
- Watch event: 12
- Push event: 2
Last Year
- Watch event: 12
- Push event: 2
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Hope | 3****e | 342 |
| Hadley Wickham | h****m@g****m | 6 |
| Michael Chirico | m****4@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 24
- Total pull requests: 2
- Average time to close issues: 3 months
- Average time to close pull requests: 3 days
- Total issue authors: 19
- Total pull request authors: 2
- Average comments per issue: 3.88
- Average comments per pull request: 3.5
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- markfairbanks (6)
- michaelaoash (1)
- XianglinZhang-risker (1)
- rcannood (1)
- lssb (1)
- fc-ibb105 (1)
- B-1991-ing (1)
- hwanghan (1)
- kongdd (1)
- acpguedes (1)
- jfdesomzee (1)
- maskegger (1)
- xiaoluolorn (1)
- xiaodaigh (1)
- hope-data-science (1)
Pull Request Authors
- MichaelChirico (2)
- hadley (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 3,414 last month (cran)
- Total docker downloads: 9
- Total dependent packages: 2
- Total dependent repositories: 2
- Total versions: 27
- Total maintainers: 1
cran.r-project.org: tidyfst
Tidy Verbs for Fast Data Manipulation
- Homepage: https://github.com/hope-data-science/tidyfst
- Documentation: http://cran.r-project.org/web/packages/tidyfst/tidyfst.pdf
- License: MIT + file LICENSE
- Latest release: 1.8.2 (published 10 months ago)
Rankings
Maintainers (1)
Dependencies
- R >= 3.3.0 depends
- data.table >= 1.13.0 imports
- fst >= 0.9.0 imports
- stringr >= 1.4.0 imports
- bench * suggests
- dplyr * suggests
- ggplot2 * suggests
- knitr * suggests
- nycflights13 * suggests
- pryr * suggests
- rmarkdown * suggests
- testthat * suggests
- tidyr * suggests

