sgd

An R package for large scale estimation with stochastic gradient descent

https://github.com/airoldilab/sgd

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 9 committers (22.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

big-data data-analysis gradient-descent r statistics
Last synced: 6 months ago

Repository

An R package for large scale estimation with stochastic gradient descent

Basic Info
  • Host: GitHub
  • Owner: airoldilab
  • Language: C++
  • Default Branch: master
  • Homepage:
  • Size: 2.03 MB
Statistics
  • Stars: 62
  • Watchers: 11
  • Forks: 19
  • Open Issues: 41
  • Releases: 2
Topics
big-data data-analysis gradient-descent r statistics
Created about 11 years ago · Last pushed about 2 years ago
Metadata Files
Readme Changelog

README.md

sgd

sgd is an R package for large-scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.

Features

At the core of the package is the function

```{R}
sgd(formula, data, model, model.control, sgd.control)
```

It estimates parameters for a given data set and model using stochastic gradient descent. The optional arguments model.control and sgd.control specify attributes of the model and of the stochastic gradient method, respectively. Taking advantage of the bigmemory package, sgd also operates on data sets that are too large to fit in RAM, as well as on streaming data.
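
Why stochastic gradient methods suit streaming data can be seen in a few lines of base R: each update touches a single observation, so only the chunk currently being read has to be in memory. The sketch below is an illustration under stated assumptions, not the package's implementation (which relies on bigmemory, not used here): `make_stream` is a hypothetical stand-in for a real data source, and the learning-rate schedule 0.05 / (1 + 0.001 n) is an arbitrary choice.

```{R}
# Streaming SGD sketch for least squares (not the sgd package's code).
# make_stream() is a hypothetical stand-in for a real source such as a file reader
# or database cursor; here it simulates chunks from a linear model.
make_stream <- function(n.chunks, chunk.size, theta.true) {
  p <- length(theta.true)
  i <- 0
  function() {
    if (i >= n.chunks) return(NULL)  # stream exhausted
    i <<- i + 1
    X <- cbind(1, matrix(rnorm(chunk.size * (p - 1)), ncol = p - 1))
    list(X = X, y = drop(X %*% theta.true + rnorm(chunk.size)))
  }
}

set.seed(11)
next_chunk <- make_stream(n.chunks = 20, chunk.size = 1000, theta.true = rep(5, 4))
theta <- rep(0, 4)  # current estimate
n <- 0              # number of observations processed so far
while (!is.null(chunk <- next_chunk())) {
  for (j in seq_len(nrow(chunk$X))) {
    n <- n + 1
    x <- chunk$X[j, ]
    gamma <- 0.05 / (1 + 1e-3 * n)                              # decreasing learning rate
    theta <- theta + gamma * (chunk$y[j] - sum(x * theta)) * x  # one SGD step
  }
  # the next call replaces `chunk`, so only about one chunk is referenced at a time
}
print(round(theta, 2))
```

With 20 chunks of 1,000 rows, the estimate should end up near the true coefficients rep(5, 4) even though the full data set is never held in memory at once.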

Example of large-scale linear regression:

```{R}
library(sgd)

# Dimensions
N <- 1e5  # number of data points
d <- 1e2  # number of features

# Generate data.
X <- matrix(rnorm(N*d), ncol=d)
theta <- rep(5, d+1)
eps <- rnorm(N)
y <- cbind(1, X) %*% theta + eps
dat <- data.frame(y=y, x=X)

sgd.theta <- sgd(y ~ ., data=dat, model="lm")
```

Any loss function may be specified. For convenience, the following are built in:

* Linear models
* Generalized linear models
* Method of moments
* Generalized method of moments
* Cox proportional hazards model
* M-estimation
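
In SGD the model enters only through the gradient of its per-observation loss, so one loop can serve many of the models listed above. The following base-R sketch is not the package's interface: `sgd_fit`, `grad_logistic` and `grad_huber` are hypothetical helpers, and k = 1.345 is a conventional Huber tuning constant assumed here. It fits a logistic regression, as an example of a generalized linear model, and a Huber M-estimator on data with heavy-tailed errors.

```{R}
# Generic SGD loop: the model enters only through its per-observation gradient.
# sgd_fit() and the grad_* helpers are hypothetical names for this illustration.
sgd_fit <- function(X, y, grad, gamma0 = 0.2, decay = 1e-3) {
  theta <- rep(0, ncol(X))
  for (n in seq_len(nrow(X))) {
    gamma <- gamma0 / (1 + decay * n)                   # decreasing learning rate
    theta <- theta - gamma * grad(theta, X[n, ], y[n])  # one SGD step
  }
  theta
}

# Logistic regression (GLM with logit link): gradient of the negative log-likelihood
grad_logistic <- function(theta, x, yn) -(yn - plogis(sum(x * theta))) * x

# Huber M-estimation: the psi function clips residuals larger than k in absolute value
grad_huber <- function(theta, x, yn, k = 1.345) {
  r <- yn - sum(x * theta)
  -max(-k, min(k, r)) * x
}

set.seed(3)
N <- 2e4
X <- cbind(1, matrix(rnorm(N * 2), ncol = 2))
theta.true <- c(0.5, 1, -1)
eta <- drop(X %*% theta.true)

y.bin <- rbinom(N, 1, plogis(eta))  # binary responses
y.t <- eta + rt(N, df = 2)          # heavy-tailed t(2) errors

theta.logit <- sgd_fit(X, y.bin, grad_logistic)
theta.huber <- sgd_fit(X, y.t, grad_huber)
print(round(rbind(theta.logit, theta.huber), 2))
```

Both gradients are bounded in the residual, which keeps the early, large-step iterations from blowing up; for an unbounded loss such as squared error a smaller initial learning rate would be the safer choice.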

The following stochastic gradient methods are available:

* (Standard) stochastic gradient descent
* Implicit stochastic gradient descent
* Averaged stochastic gradient descent
* Averaged implicit stochastic gradient descent
* Classical momentum
* Nesterov's accelerated gradient
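
For least squares, the methods listed above differ only in how a single stochastic gradient is turned into an update. The base-R sketch below is an illustration, not the package's C++ implementation or its sgd.control interface; the helper `fit_lm_sgd`, the schedule gamma_n = gamma0 / (1 + decay * n) and the momentum constant mu = 0.9 are assumptions. For squared loss the implicit update, which evaluates the gradient at the new iterate, has a closed form whose effective step gamma / (1 + gamma * ||x||^2) never exceeds 1 / ||x||^2; this is why the averaged implicit fit below can start from the larger gamma0 = 1.

```{R}
# Least-squares versions of the update rules listed above (an illustration only,
# not the package's C++ implementation). fit_lm_sgd() is a hypothetical helper.
grad_sq <- function(th, x, yn) -(yn - sum(x * th)) * x  # gradient of 0.5 * (yn - x'th)^2

fit_lm_sgd <- function(X, y, method = c("sgd", "implicit", "momentum", "nesterov"),
                       average = FALSE, gamma0 = 0.05, decay = 1e-3, mu = 0.9) {
  method <- match.arg(method)
  theta <- rep(0, ncol(X))
  v <- theta          # velocity (momentum methods only)
  theta.bar <- theta  # running Polyak-Ruppert average of the iterates
  for (n in seq_len(nrow(X))) {
    x <- X[n, ]
    gamma <- gamma0 / (1 + decay * n)  # assumed learning-rate schedule
    if (method == "sgd") {
      theta <- theta - gamma * grad_sq(theta, x, y[n])
    } else if (method == "implicit") {
      # Closed-form solution of theta_n = theta_{n-1} - gamma * grad(theta_n)
      theta <- theta - gamma / (1 + gamma * sum(x^2)) * grad_sq(theta, x, y[n])
    } else {
      # Classical momentum: gradient at theta; Nesterov: at the look-ahead theta + mu * v
      at <- if (method == "nesterov") theta + mu * v else theta
      v <- mu * v - gamma * grad_sq(at, x, y[n])
      theta <- theta + v
    }
    theta.bar <- theta.bar + (theta - theta.bar) / n
  }
  if (average) theta.bar else theta
}

set.seed(2)
N <- 2e4
d <- 3
X <- cbind(1, matrix(rnorm(N * d), ncol = d))
y <- drop(X %*% rep(5, d + 1) + rnorm(N))

fits <- list(
  sgd      = fit_lm_sgd(X, y, "sgd"),
  asgd     = fit_lm_sgd(X, y, "sgd", average = TRUE),
  implicit = fit_lm_sgd(X, y, "implicit"),
  ai.sgd   = fit_lm_sgd(X, y, "implicit", average = TRUE, gamma0 = 1),
  momentum = fit_lm_sgd(X, y, "momentum", gamma0 = 0.005),
  nesterov = fit_lm_sgd(X, y, "nesterov", gamma0 = 0.005)
)
print(sapply(fits, round, digits = 2))
```

Averaging (asgd, ai.sgd) returns the running mean of the iterates instead of the last one, which smooths out the noise that remains while the learning rate is still fairly large. The momentum variants get a smaller gamma0 because momentum roughly multiplies the effective step by 1 / (1 - mu).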

Check out the vignette in vignettes/ or examples in demo/. In R, the equivalent commands are vignette(package="sgd") and demo(package="sgd").

Installation

To install the latest version from CRAN:

```{R}
install.packages("sgd")
```

To install the latest development version from GitHub:

```{R}
# install.packages("devtools")
devtools::install_github("airoldilab/sgd")
```

Authors

sgd is written by Dustin Tran, Junhyung Lyle Kim, and Panos Toulis. Please feel free to contribute by submitting any issues or requests, or by solving any current issues!

We thank all other members of the Airoldi Lab (led by Prof. Edo Airoldi) for their feedback and contributions.

Citation

```
@article{tran2015stochastic,
  author  = {Tran, Dustin and Toulis, Panos and Airoldi, Edoardo M},
  title   = {Stochastic gradient descent methods for estimation with large data sets},
  journal = {arXiv preprint arXiv:1509.06459},
  year    = {2015}
}
```

Owner

  • Name: Airoldi Lab
  • Login: airoldilab
  • Kind: organization
  • Location: Harvard University, Cambridge, MA

Harvard Laboratory for Applied Statistical Methodology & Data Science

GitHub Events

Total
  • Watch event: 2
  • Fork event: 1
Last Year
  • Watch event: 2
  • Fork event: 1

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 336
  • Total Committers: 9
  • Avg Commits per committer: 37.333
  • Development Distribution Score (DDS): 0.372
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Dustin Tran (d****n@g****m): 211 commits
  • Tian Lan (t****7@g****m): 63 commits
  • Ye Kuang (k****j@g****m): 48 commits
  • J. Lyle Kim (j****m@r****u): 4 commits
  • ptoulis (p****s@g****m): 4 commits
  • hxd1011 (h****1@g****m): 2 commits
  • Nick Rittler (3****g): 2 commits
  • ye (y****g@g****u): 1 commit
  • Lyle Kim (l****m@g****m): 1 commit

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 98
  • Total pull requests: 2
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 19
  • Total pull request authors: 1
  • Average comments per issue: 2.08
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dustinvtran (60)
  • ptoulis (11)
  • lantian2012 (7)
  • MarcinKosinski (2)
  • jeffwong-nflx (2)
  • k-ye (2)
  • karldw (2)
  • fortisil (1)
  • jonlachmann (1)
  • acdec (1)
  • deaneckles (1)
  • donboyd5 (1)
  • alexanderchernyakovgithub (1)
  • ikosmidis (1)
  • dselivanov (1)
Pull Request Authors
  • hxd1011 (2)
Top Labels
Issue Labels
  • feature (21)
  • bug (10)
  • interface (6)
  • project (6)
  • model examples (5)
  • algorithm (3)
  • testing (2)
  • windows (1)
  • code cleanup (1)
  • documentation (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 287 last-month
  • Total docker downloads: 20,358
  • Total dependent packages: 1
  • Total dependent repositories: 5
  • Total versions: 5
  • Total maintainers: 1
cran.r-project.org: sgd

Stochastic Gradient Descent for Scalable Estimation

  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 5
  • Downloads: 287 Last month
  • Docker Downloads: 20,358
Rankings
Forks count: 4.1%
Stargazers count: 5.9%
Docker downloads count: 12.6%
Dependent repos count: 13.1%
Average: 13.7%
Dependent packages count: 18.1%
Downloads: 28.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • MASS * imports
  • Rcpp >= 0.11.3 imports
  • ggplot2 * imports
  • methods * imports
  • stats * imports
  • R.rsp * suggests
  • bigmemory * suggests
  • glmnet * suggests
  • gridExtra * suggests
  • testthat * suggests