sgd
An R package for large scale estimation with stochastic gradient descent
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 9 committers (22.2%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.7%) to scientific vocabulary
Statistics
- Stars: 62
- Watchers: 11
- Forks: 19
- Open Issues: 41
- Releases: 2
Metadata Files
README.md
sgd
sgd is an R package for large scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.
Features
At the core of the package is the function
```{R}
sgd(formula, data, model, model.control, sgd.control)
```
It estimates parameters for a given data set and model using stochastic gradient
descent. The optional arguments model.control and sgd.control specify
attributes about the model and stochastic gradient method. Taking advantage of
the bigmemory package, sgd also operates on data sets which are too large to fit
in RAM as well as streaming data.
Example of large-scale linear regression:
```{R}
library(sgd)

# Dimensions
N <- 1e5  # number of data points
d <- 1e2  # number of features

# Generate data.
X <- matrix(rnorm(N*d), ncol=d)
theta <- rep(5, d+1)
eps <- rnorm(N)
y <- cbind(1, X) %*% theta + eps
dat <- data.frame(y=y, x=X)

sgd.theta <- sgd(y ~ ., data=dat, model="lm")
```
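As a rough sanity check, the estimates can be compared with the true parameters used to simulate the data. The sketch below uses a smaller problem so it runs quickly. It assumes that the fitted object stores its estimates in a `coefficients` element and that `sgd.control` accepts `method` and `npasses` entries; neither is stated in this README, so check `?sgd` before relying on them.

```{R}
library(sgd)

set.seed(42)
N <- 1e4  # number of data points
d <- 5    # number of features

# Simulate a linear model with intercept and all coefficients equal to 5.
X <- matrix(rnorm(N * d), ncol = d)
theta <- rep(5, d + 1)
y <- cbind(1, X) %*% theta + rnorm(N)
dat <- data.frame(y = y, x = X)

# Assumption: `method` and `npasses` are valid sgd.control entries.
fit <- sgd(y ~ ., data = dat, model = "lm",
           sgd.control = list(method = "implicit", npasses = 5))

# Assumption: the estimates live in fit$coefficients.
est <- as.numeric(fit$coefficients)

# Mean squared error between true and estimated parameters.
mean((theta - est)^2)
```

A small mean squared error suggests the fit recovered the simulated parameters; the exact value depends on the seed and the control settings.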
Any loss function may be specified. For convenience the following are built-in:
* Linear models
* Generalized linear models
* Method of moments
* Generalized method of moments
* Cox proportional hazards model
* M-estimation
The following stochastic gradient methods exist:
* (Standard) stochastic gradient descent
* Implicit stochastic gradient descent
* Averaged stochastic gradient descent
* Averaged implicit stochastic gradient descent
* Classical momentum
* Nesterov's accelerated gradient
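To make the difference between some of these methods concrete, here is a minimal base R sketch of the standard, implicit, and averaged updates for least-squares regression. It is an illustration only and not the package's implementation: the learning rate schedule `1 / (10 + n)` and all variable names are assumptions chosen for the example, and the momentum variants are omitted.

```{R}
set.seed(1)
N <- 5000
d <- 3
X <- cbind(1, matrix(rnorm(N * d), ncol = d))  # design matrix with intercept
theta_true <- rep(5, d + 1)
y <- as.numeric(X %*% theta_true + rnorm(N))

theta_sgd <- rep(0, d + 1)  # standard SGD iterate
theta_imp <- rep(0, d + 1)  # implicit SGD iterate
theta_avg <- rep(0, d + 1)  # running average of the standard SGD iterates

for (n in seq_len(N)) {
  x <- X[n, ]
  a <- 1 / (10 + n)  # hypothetical decaying learning rate

  # Standard SGD: the gradient is evaluated at the current iterate.
  theta_sgd <- theta_sgd + a * (y[n] - sum(x * theta_sgd)) * x

  # Implicit SGD: the gradient is evaluated at the next iterate. For squared
  # error this has a closed form that shrinks the step by 1 + a * ||x||^2.
  theta_imp <- theta_imp +
    a / (1 + a * sum(x^2)) * (y[n] - sum(x * theta_imp)) * x

  # Averaged SGD: report the running mean of the standard SGD iterates.
  theta_avg <- theta_avg + (theta_sgd - theta_avg) / n
}

round(rbind(sgd = theta_sgd, implicit = theta_imp, averaged = theta_avg), 2)
```

The implicit update divides the step by `1 + a * ||x||^2`, which keeps it stable even when the learning rate is set too large; that is the usual motivation for preferring it over the standard update.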
Check out the vignette in vignettes/ or examples in demo/.
In R, the equivalent commands are vignette(package="sgd") and
demo(package="sgd").
Installation
To install the latest version from CRAN:
```{R}
install.packages("sgd")
```
To install the latest development version from GitHub:
```{R}
install.packages("devtools")
devtools::install_github("airoldilab/sgd")
```
Authors
sgd is written by Dustin Tran, Junhyung Lyle Kim and Panos Toulis. Please feel free to contribute by submitting issues or requests, or by solving any current issues!
We thank all other members of the Airoldi Lab (led by Prof. Edo Airoldi) for their feedback and contributions.
Citation
@article{tran2015stochastic,
author = {Tran, Dustin and Toulis, Panos and Airoldi, Edoardo M},
title = {Stochastic gradient descent methods for estimation with large data sets},
journal = {arXiv preprint arXiv:1509.06459},
year = {2015}
}
Owner
- Name: Airoldi Lab
- Login: airoldilab
- Kind: organization
- Location: Harvard University, Cambridge, MA
- Website: http://applied.stat.harvard.edu
- Repositories: 15
- Profile: https://github.com/airoldilab
Harvard Laboratory for Applied Statistical Methodology & Data Science
GitHub Events
Total
- Watch event: 2
- Fork event: 1
Last Year
- Watch event: 2
- Fork event: 1
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Dustin Tran | d****n@g****m | 211 |
| Tian Lan | t****7@g****m | 63 |
| Ye Kuang | k****j@g****m | 48 |
| J. Lyle Kim | j****m@r****u | 4 |
| ptoulis | p****s@g****m | 4 |
| hxd1011 | h****1@g****m | 2 |
| Nick Rittler | 3****g | 2 |
| ye | y****g@g****u | 1 |
| Lyle Kim | l****m@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 98
- Total pull requests: 2
- Average time to close issues: 2 months
- Average time to close pull requests: about 3 hours
- Total issue authors: 19
- Total pull request authors: 1
- Average comments per issue: 2.08
- Average comments per pull request: 1.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- dustinvtran (60)
- ptoulis (11)
- lantian2012 (7)
- MarcinKosinski (2)
- jeffwong-nflx (2)
- k-ye (2)
- karldw (2)
- fortisil (1)
- jonlachmann (1)
- acdec (1)
- deaneckles (1)
- donboyd5 (1)
- alexanderchernyakovgithub (1)
- ikosmidis (1)
- dselivanov (1)
Pull Request Authors
- hxd1011 (2)
Packages
- Total packages: 1
- Total downloads: 287 last month (cran)
- Total docker downloads: 20,358
- Total dependent packages: 1
- Total dependent repositories: 5
- Total versions: 5
- Total maintainers: 1
cran.r-project.org: sgd
Stochastic Gradient Descent for Scalable Estimation
- Homepage: https://github.com/airoldilab/sgd
- Documentation: http://cran.r-project.org/web/packages/sgd/sgd.pdf
- License: GPL-2
- Latest release: 1.1.2 (published about 2 years ago)
Dependencies
- MASS * imports
- Rcpp >= 0.11.3 imports
- ggplot2 * imports
- methods * imports
- stats * imports
- R.rsp * suggests
- bigmemory * suggests
- glmnet * suggests
- gridExtra * suggests
- testthat * suggests