data.table

R's data.table package extends data.frame:

https://github.com/rdatatable/data.table

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    5 of 156 committers (3.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary

Keywords from Contributors

visualisation data-manipulation grammar pandoc rmarkdown literate-programming date-time reproducibility curl coverage-report
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Rdatatable
  • License: mpl-2.0
  • Language: R
  • Default Branch: master
  • Homepage: http://r-datatable.com
  • Size: 58.1 MB
Statistics
  • Stars: 3,769
  • Watchers: 172
  • Forks: 1,015
  • Open Issues: 972
  • Releases: 0
Created over 11 years ago · Last pushed 6 months ago
Metadata Files
Readme, Changelog, Contributing, License, Code of conduct, Codeowners, Governance

README.md

data.table

data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.

The data.table project uses a custom governance agreement and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


Why data.table?

  • concise syntax: fast to type, fast to read
  • fast speed
  • memory efficient
  • careful API lifecycle management
  • community
  • feature rich

Features

  • fast and friendly delimited file reader: ?fread, see also convenience features for small data
  • fast and feature rich delimited file writer: ?fwrite
  • low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
  • fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
  • fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to IRanges::findOverlaps), non-equi joins (i.e. joins using operators >, >=, <, <=), aggregate on join (by=.EACHI), update on join
  • fast add/update/delete columns by reference by group using no copies at all
  • fast and feature rich reshaping data: ?dcast (pivot/wider/spread) and ?melt (unpivot/longer/gather)
  • any R function from any R package can be used in queries, not just the subset of functions made available by a database backend; columns of type list are also supported
  • has no dependencies at all other than base R itself, for simpler production/maintenance
  • the R dependency is as old as possible for as long as possible, dated April 2014, and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0
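A few of the features listed above can be sketched on toy data. The tables `trades` and `quotes` and their column names are hypothetical, invented for illustration only; the calls themselves (`fwrite`, `fread`, `:=`, `roll = TRUE`, `dcast`, `melt`) are part of data.table.

```r
library(data.table)

# Hypothetical toy data, invented for illustration only
trades = data.table(
  sym  = c("A", "A", "B", "B"),
  time = c(2L, 7L, 3L, 9L),
  qty  = c(10, 20, 30, 40)
)
quotes = data.table(
  sym   = c("A", "A", "B"),
  time  = c(1L, 5L, 4L),
  price = c(100, 101, 200)
)

# Add a column by reference, by group: `trades` is modified in place
trades[, total_qty := sum(qty), by = sym]

# Rolling join: for each trade, take the latest quote at or before its time
# (roll = TRUE rolls the previous observation forward; no earlier quote gives NA)
joined = quotes[trades, on = .(sym, time), roll = TRUE]

# Reshape: long to wide with dcast, then back to long with melt
wide = dcast(trades, sym ~ time, value.var = "qty")
long = melt(wide, id.vars = "sym", variable.name = "time",
            value.name = "qty", na.rm = TRUE)

# Write to and read back from a delimited file
tmp = tempfile(fileext = ".csv")
fwrite(trades, tmp)
back = fread(tmp)
```

Because `:=` updates `trades` in place, no assignment such as `trades = ...` is needed for the new column to persist.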

Installation

```r
install.packages("data.table")

# latest development version (only if newer available)
data.table::update_dev_pkg()

# latest development version (force install)
install.packages("data.table", repos="https://rdatatable.gitlab.io/data.table")
```

See the Installation wiki for more details.

Usage

Use the data.table subset operator [ the same way you would use the data.frame one, but with these differences:

  • there is no need to prefix each column with DT$ (like subset() and with(), but built in)
  • any R expression using any package is allowed in the j argument, not just a list of columns
  • an extra argument, by, computes the j expression by group

```r
library(data.table)
DT = as.data.table(iris)

# FROM[WHERE, SELECT, GROUP BY]
# DT  [i, j, by]

DT[Petal.Width > 1.0, mean(Petal.Length), by = Species]
#      Species       V1
#1: versicolor 4.362791
#2:  virginica 5.552000
```
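As a further sketch of the points above, j can return several named summaries at once or call a function from another package. The result names `summary_by_species` and `fits`, and the choice of `stats::lm` as the example function, are illustrative assumptions rather than anything prescribed by the README.

```r
library(data.table)
DT = as.data.table(iris)

# j as a list of arbitrary R expressions, evaluated once per group;
# columns are referenced directly, without a DT$ prefix
summary_by_species = DT[, .(
  n         = .N,                                # rows in the group
  med_sepal = median(Sepal.Length),
  max_ratio = max(Petal.Length / Petal.Width)
), by = Species]

# j can call any function; here a linear model is fitted per group
# and only its slope coefficient is kept
fits = DT[, .(slope = coef(lm(Petal.Length ~ Petal.Width))[2]), by = Species]
```

Both results are themselves data.tables with one row per species, so they can be queried further with the same [i, j, by] form.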

Getting started

Cheatsheets

Community

data.table is widely used by the R community. It is used directly by hundreds of CRAN and Bioconductor packages, and indirectly by thousands. It is one of the most starred R packages on GitHub, and was highly rated by the Depsy project. If you need help, the data.table community is active on StackOverflow.

A list of packages that significantly support, extend, or make use of data.table can be found in the Seal of Approval document.

Stay up-to-date

Contributing

Guidelines for filing issues / pull requests: Contribution Guidelines.

Owner

  • Name: Rdatatable
  • Login: Rdatatable
  • Kind: organization

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 5,429
  • Total Committers: 156
  • Avg Commits per committer: 34.801
  • Development Distribution Score (DDS): 0.578
Past Year
  • Commits: 578
  • Committers: 44
  • Avg Commits per committer: 13.136
  • Development Distribution Score (DDS): 0.519
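The all-time figures can be reproduced with simple arithmetic. The sketch below assumes that DDS is defined as one minus the top committer's share of all commits; this page does not state the definition, but the assumption reproduces the reported all-time value. The variable names are illustrative.

```r
total_commits    = 5429  # "Total Commits" (all time) reported on this page
total_committers = 156   # "Total Committers" (all time) reported on this page
top_commits      = 2292  # commits by the most active committer listed on this page

# Average commits per committer
avg_per_committer = total_commits / total_committers

# Assumed definition: DDS = 1 - (top committer's commits / total commits)
dds = 1 - top_commits / total_commits

round(avg_per_committer, 3)  # 34.801
round(dds, 3)                # 0.578
```

Under this assumed definition, a value near 0 would mean a single committer wrote almost everything, while higher values indicate that work is spread across more people.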
Top Committers
Name Email Commits
Matt Dowle m****e@g****m 2,292
arunsrinivasan a****b@g****m 986
Michael Chirico c****m@g****m 699
Jan Gorecki j****i 424
Benjamin Schwendinger 5****n 109
Toby Dylan Hocking t****5@g****m 71
Pasha Stetsenko p****a@h****i 70
Ani b****6@g****m 59
aitap k****t@g****m 57
Michael Chirico m****o@g****m 48
badasahog 5****g 39
Tom Short t****t@e****m 38
Steve Lianoglou s****u@g****m 37
Eduard Antonyan e****n@g****m 32
Xianying Tan s****n@1****m 30
nitish jha n****a@n****l 29
Joshua Wu j****4@g****m 29
Nitish Jha 1****2 25
HughParsonage h****e@g****m 20
venom1204 v****4@g****m 16
Cole Miller 5****1 13
MarkusBonsch m****h@p****e 13
Scott Ritchie s****3@g****m 11
Václav Tlapák 5****k 10
Mukul 1****4 10
Kelly N. Bodwin k****y@b****s 9
Rafael Fontenelle r****e 8
Tyson Barrett t****8@g****m 8
Philippe Chataignon p****n 7
Rick Saporta R****a@g****m 7
and 126 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1,241
  • Total pull requests: 1,776
  • Average time to close issues: over 2 years
  • Average time to close pull requests: 4 months
  • Total issue authors: 427
  • Total pull request authors: 91
  • Average comments per issue: 4.22
  • Average comments per pull request: 3.34
  • Merged pull requests: 1,191
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 272
  • Pull requests: 799
  • Average time to close issues: 15 days
  • Average time to close pull requests: 8 days
  • Issue authors: 89
  • Pull request authors: 46
  • Average comments per issue: 1.67
  • Average comments per pull request: 2.72
  • Merged pull requests: 526
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MichaelChirico (230)
  • jangorecki (94)
  • tdhock (79)
  • arunsrinivasan (56)
  • mattdowle (41)
  • ben-schwen (21)
  • TysonStanley (19)
  • badasahog (17)
  • renkun-ken (16)
  • aitap (15)
  • franknarf1 (14)
  • iago-pssjd (11)
  • OfekShilon (10)
  • iagogv3 (10)
  • shrektan (9)
Pull Request Authors
  • MichaelChirico (696)
  • ben-schwen (141)
  • jangorecki (136)
  • aitap (109)
  • badasahog (95)
  • venom1204 (77)
  • tdhock (66)
  • Nj221102 (59)
  • joshhwuu (47)
  • Anirban166 (33)
  • Mukulyadav2004 (30)
  • TysonStanley (19)
  • DorisAmoakohene (14)
  • rffontenelle (13)
  • KyleHaynes (11)
Top Labels
Issue Labels
feature request (136), fread (111), bug (71), documentation (67), top request (51), consistency (45), performance (39), internals (34), revdep (34), tests (33), enhancement (31), joins (29), by-reference (28), beginner-task (28), ci (27), reshape (24), translation (23), regression (22), openmp (21), programming (19), non-atomic column (19), platform-specific (19), question (19), dev (18), print (17), rbindlist (17), help-wanted (16), GForce (16), fwrite (16), idate/itime (15)
Pull Request Labels
ci (91), translation (91), froll (22), documentation (15), tests (11), internals (10), code-quality (10), governance (10), graphite-ready (9), reshape (8), dev (7), atime (7), High (5), breaking-change (5), rbindlist (5), hi_IN (4), fread (4), platform-specific (4), release (3), openmp (3), R-devel (3), consistency (2), help-wanted (2), regression (2), non-atomic column (2), programming (2), encoding (1), enhancement (1), www (1), segfault (1)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 842,742 last-month
  • Total docker downloads: 124,987,508
  • Total dependent packages: 1,850
    (may contain duplicates)
  • Total dependent repositories: 8,346
    (may contain duplicates)
  • Total versions: 89
  • Total maintainers: 1
cran.r-project.org: data.table

Extension of 'data.frame'

  • Versions: 72
  • Dependent Packages: 1,686
  • Dependent Repositories: 8,228
  • Downloads: 842,742 Last month
  • Docker Downloads: 124,987,508
Rankings
Dependent repos count: 0.0%
Stargazers count: 0.1%
Forks count: 0.1%
Dependent packages count: 0.1%
Downloads: 0.1%
Average: 2.9%
Docker downloads count: 17.3%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-data.table
  • Versions: 17
  • Dependent Packages: 164
  • Dependent Repositories: 118
Rankings
Dependent packages count: 0.4%
Dependent repos count: 3.1%
Average: 3.9%
Forks count: 5.0%
Stargazers count: 7.0%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.1.0 depends
  • methods * imports
  • R.utils * suggests
  • bit >= 4.0.4 suggests
  • bit64 >= 4.0.0 suggests
  • curl * suggests
  • knitr * suggests
  • markdown * suggests
  • nanotime * suggests
  • rmarkdown * suggests
  • xts * suggests
  • yaml * suggests
  • zoo >= 1.8 suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
.github/workflows/test-coverage.yaml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite