lolR

package for dimensionality reduction under supervised data scenarios

https://github.com/neurodata/lol

Science Score: 51.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, zenodo.org
✓ Committers with academic emails: 4 of 7 committers (57.1%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (17.4%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: neurodata
License: gpl-3.0
Language: R
Default Branch: master
Size: 39.6 MB

Statistics

Stars: 20
Watchers: 7
Forks: 34
Open Issues: 3
Releases: 0

Created over 8 years ago · Last pushed over 5 years ago

Metadata Files

Readme License Citation

Linear Optimal Low Rank Projection (lolR)

Overview
Repo Contents
System Requirements
Installation Guide
Demo
Results
License
Issues
Citation

Overview

Supervised learning techniques tend to overfit when the dimensionality of the data exceeds the sample size, and the problem worsens as the dimensionality increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we first learn a lower-dimensional representation of the data and then learn a classifier. That is, we project the data into a space where the dimensionality is more manageable, so that standard classification or clustering techniques can be applied with fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities that simplify cross-validation to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigations into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.
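The workflow described above (project to a lower-dimensional space, then classify) can be sketched in base R. This is a minimal illustration under stated assumptions and does not use the lolR API. The helper names (simulate_hdlss, pca_projection, nearest_centroid), the simulated data, and the choice of PCA with a nearest-centroid classifier are ours, chosen only for illustration.

```r
# Illustrative sketch only: none of these helper names come from lolR.
set.seed(123)

# Simulate two Gaussian classes with n << d; the class means differ in the
# first 10 coordinates only (an assumption made for this example).
simulate_hdlss <- function(n, d, shift = 1.5) {
  Y <- rep(c(0, 1), length.out = n)
  X <- matrix(rnorm(n * d), nrow = n, ncol = d)
  X[Y == 1, 1:10] <- X[Y == 1, 1:10] + shift
  list(X = X, Y = Y)
}

# Unsupervised projection: top-r right singular vectors of the centered data.
pca_projection <- function(X, r) {
  Xc <- sweep(X, 2, colMeans(X), "-")
  svd(Xc, nu = 0, nv = r)$v
}

# Nearest-centroid classifier applied in the embedded space.
nearest_centroid <- function(Ztrain, Ytrain, Ztest) {
  classes <- sort(unique(Ytrain))
  centroids <- do.call(rbind, lapply(classes, function(k) {
    colMeans(Ztrain[Ytrain == k, , drop = FALSE])
  }))
  # Squared Euclidean distance from every test point to every centroid.
  d2 <- vapply(seq_along(classes), function(k) {
    rowSums(sweep(Ztest, 2, centroids[k, ], "-")^2)
  }, numeric(nrow(Ztest)))
  d2 <- matrix(d2, nrow = nrow(Ztest))
  classes[apply(d2, 1, which.min)]
}

train <- simulate_hdlss(n = 60, d = 500)   # fewer samples than dimensions
test  <- simulate_hdlss(n = 200, d = 500)

r <- 5
A <- pca_projection(train$X, r)   # d x r projection matrix learned on training data
Ztrain <- train$X %*% A           # embed the training data
Ztest  <- test$X %*% A            # embed held-out data with the same projection
pred <- nearest_centroid(Ztrain, train$Y, Ztest)
error_rate <- mean(pred != test$Y)
cat("Embedding dimension:", ncol(A), " held-out error:", error_rate, "\n")
```

The projection is learned on the training data only and then reused on the held-out data, so the reported error is not contaminated by the test set.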

Repo Contents

R: R package code.
docs: package documentation, and usage of the lolR package on many real and simulated data examples.
man: package manual for help in R session.
tests: R unit tests written using the testthat package.
vignettes: R vignettes for R session html help pages.

System Requirements

Hardware Requirements

The lolR package requires only a standard computer with enough RAM to support the operations defined by a user. For minimal performance, this will be a computer with about 2 GB of RAM. For optimal performance, we recommend a computer with the following specs:

RAM: 16+ GB
CPU: 4+ cores, 3.3+ GHz/core

The runtimes below were generated using a computer with the recommended specs (16 GB RAM, 4 cores@3.3 GHz) and an internet connection speed of 25 Mbps.

Software Requirements

OS Requirements

The development version of the package has been tested on the following systems:

Linux: Ubuntu 16.04
Mac OSX:
Windows:

The CRAN package should be compatible with Windows, Mac, and Linux operating systems.

Before setting up the lolR package, users should have R version 3.4.0 or higher, and several packages set up from CRAN.

Installing R version 3.4.2 on Ubuntu 16.04

The latest version of R can be installed by adding the latest repository to apt:

sudo echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base r-base-dev

which should install in about 20 seconds.

Installation Guide

Stable Release

lolR is available in a stable release on CRAN:

install.packages('lolR')

Development Version

Package dependencies

Users should install the following packages prior to installing lolR, from an R terminal:

install.packages(c('ggplot2', 'abind', 'irlba', 'knitr', 'rmarkdown', 'latex2exp', 'MASS', 'randomForest'))

which will install in about 30 seconds on a machine with the recommended specs.

The lolR package functions with all packages in their latest versions as they appear on CRAN on December 13, 2017. Users can check the CRAN snapshot for details. The versions of software are, specifically:

abind_1.4-5
latex2exp_0.4.0
ggplot2_2.2.1
irlba_2.3.1
Matrix_1.2-3
MASS_7.3-47
randomForest_4.6-12

If you encounter a problem that you believe is tied to software versioning, please open an Issue.
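When diagnosing a suspected versioning problem, it can help to compare the installed versions against the tested versions listed above. The following is a small sketch in base R. The names tested, installed_version, and report are ours, not part of lolR, and the check only reports versions without changing anything.

```r
# Versions the README reports as tested (CRAN, December 13, 2017).
tested <- c(abind = "1.4-5", latex2exp = "0.4.0", ggplot2 = "2.2.1",
            irlba = "2.3.1", Matrix = "1.2-3", MASS = "7.3-47",
            randomForest = "4.6-12")

# Return the installed version as a string, or NA if the package is absent.
# system.file() returns "" for a package that is not installed.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

report <- data.frame(
  package   = names(tested),
  tested    = unname(tested),
  installed = vapply(names(tested), installed_version, character(1)),
  stringsAsFactors = FALSE,
  row.names = NULL
)

# The README states a minimum R version of 3.4.0.
r_ok <- getRversion() >= "3.4.0"

print(report)
cat("R version:", as.character(getRversion()),
    if (r_ok) "(meets the 3.4.0 minimum)" else "(older than the 3.4.0 minimum)", "\n")
```

Pasting the printed table into an Issue gives maintainers the version information they need to reproduce the problem.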

Package Installation

From an R session, type:

require(devtools)
install_github('neurodata/lol', build_vignettes=TRUE, force=TRUE)  # install lol with the vignettes
require(lolR)
vignette("lol", package="lolR")  # view one of the basic vignettes

The package should take approximately 40 seconds to install with vignettes on a recommended computer.

Demo

Functions

For interactive demos of the functions, please check out the vignettes built into the package. They can be accessed as follows:

require(lolR)
vignette('lol')
vignette('pca')
vignette('cpca')
vignette('lrcca')
vignette('mdp')
vignette('xval')
vignette('qoq')
vignette('simulations')
vignette('nearestCentroid')

Extending the lolR Package

The lolR package makes many useful resources available (such as embedding and cross-validation) for simple extension.

To extend the lolR package, check out the vignettes:

require(lolR)
vignette('extend_embedding')
vignette('extend_classification')

Results

In this benchmark comparison, we show that LOL does better than all other linear embedding techniques compared in supervised HDLSS settings when dimensionality is high (d > 100, ntrain <= d) on 20 benchmark problems from the UCI and PMLB datasets. LOL provides a good tradeoff between preserving the class-conditional difference (low misclassification rate) and using a small number of embedding dimensions.
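The tradeoff between misclassification rate and number of embedding dimensions can be explored with a sketch like the one below, written in base R on simulated two-class data. It is not the package's implementation and does not reproduce the benchmark. The label-aware projection follows our reading of the cited manuscript (the normalized difference of class means concatenated with leading singular vectors of the class-centered data). All helper names, the simulation, and the nearest-centroid classifier are assumptions made for illustration.

```r
# Illustrative sketch only: helper names are hypothetical, not lolR functions.
set.seed(42)

# Two Gaussian classes (labels 0/1) whose means differ in the first 10 coordinates.
simulate_two_class <- function(n, d, shift = 1) {
  Y <- rep(c(0, 1), length.out = n)
  X <- matrix(rnorm(n * d), nrow = n, ncol = d)
  X[Y == 1, 1:10] <- X[Y == 1, 1:10] + shift
  list(X = X, Y = Y)
}

# Unsupervised baseline: top-r right singular vectors of the centered data.
pca_projection <- function(X, r) {
  Xc <- sweep(X, 2, colMeans(X), "-")
  svd(Xc, nu = 0, nv = r)$v
}

# Label-aware projection: normalized mean difference, followed by the top
# (r - 1) right singular vectors of the class-centered data.
label_aware_projection <- function(X, Y, r) {
  centroids <- rbind(colMeans(X[Y == 0, , drop = FALSE]),
                     colMeans(X[Y == 1, , drop = FALSE]))
  delta <- centroids[2, ] - centroids[1, ]
  delta <- delta / sqrt(sum(delta^2))
  if (r == 1) return(matrix(delta, ncol = 1))
  Xc <- X - centroids[Y + 1, , drop = FALSE]  # subtract each sample's class mean
  V <- svd(Xc, nu = 0, nv = r - 1)$v
  unname(cbind(delta, V))
}

# Held-out misclassification rate of a nearest-centroid rule after projecting by A.
centroid_error <- function(A, train, test) {
  Ztr <- train$X %*% A
  Zte <- test$X %*% A
  c0 <- colMeans(Ztr[train$Y == 0, , drop = FALSE])
  c1 <- colMeans(Ztr[train$Y == 1, , drop = FALSE])
  d0 <- rowSums(sweep(Zte, 2, c0, "-")^2)
  d1 <- rowSums(sweep(Zte, 2, c1, "-")^2)
  pred <- ifelse(d1 < d0, 1, 0)
  mean(pred != test$Y)
}

train <- simulate_two_class(n = 60, d = 500)   # ntrain <= d
test  <- simulate_two_class(n = 200, d = 500)

dims <- c(1, 2, 5, 10)
results <- data.frame(
  r = dims,
  pca_error = vapply(dims, function(r) {
    centroid_error(pca_projection(train$X, r), train, test)
  }, numeric(1)),
  label_aware_error = vapply(dims, function(r) {
    centroid_error(label_aware_projection(train$X, train$Y, r), train, test)
  }, numeric(1))
)
print(results)
```

The printed table lists the held-out error for each embedding dimension r. The numbers depend on the simulated data and seed, so they should be read as an illustration of how such a comparison is set up rather than as evidence for the benchmark claims.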

Citation

For usage of the package and associated manuscript, please cite according to the enclosed citation.bib.

Owner

Name: neurodata
Login: neurodata
Kind: organization
Email: admin@neurodata.io
Location: everywhere

Website: https://neurodata.io
Repositories: 175
Profile: https://github.com/neurodata

Citation (citation.bib)

@article{Vogelstein2017,
    abstract = {Classifying samples into categories becomes intractable when a single sample can have millions to billions of features, such as in genetics or imaging data. Principal Components Analysis (PCA) is widely used to identify a low-dimensional representation of such features for further analysis. However, PCA ignores class labels, such as whether or not a subject has cancer, thereby discarding information that could substantially improve downstream classification performance. We describe an approach, "Linear Optimal Low-rank" projection (LOL), which extends PCA by incorporating the class labels in a fashion that is advantageous over existing supervised dimensionality reduction techniques. We prove, and substantiate with synthetic experiments, that LOL leads to a better representation of the data for subsequent classification than other linear approaches, while adding negligible computational cost. We then demonstrate that LOL substantially outperforms PCA in differentiating cancer patients from healthy controls using genetic data, and in differentiating gender using magnetic resonance imaging data with {\$}{\textgreater}{\$}500 million features and 400 gigabytes of data. LOL therefore allows the solution of previous intractable problems, yet requires only a few minutes to run on a desktop computer.},
    archivePrefix = {arXiv},
    arxivId = {1709.01233},
    author = {Vogelstein, Joshua T. and Tang, Minh and Bridgeford, Eric and Zheng, Da and Burns, Randal and Maggioni, Mauro},
    eprint = {1709.01233},
    month = {sep},
    title = {{Linear Optimal Low Rank Projection for High-Dimensional Multi-Class Data}},
    url = {http://arxiv.org/abs/1709.01233},
    year = {2017}
}

@manual{Bridgeford2018,
    author = {Bridgeford, Eric W and Tang, Minh and Yim, Jason and Vogelstein, Joshua T},
    doi = {10.5281/ZENODO.1246979},
    month = {may},
    title = {{Linear Optimal Low-Rank Projection}},
    url = {https://zenodo.org/record/1246979},
    year = {2018},
    note = {CRAN Package}
}

GitHub Events

Total

Watch event: 1
Fork event: 5

Last Year

Watch event: 1
Fork event: 5

Committers

Last synced: over 3 years ago

All Time

Total Commits: 338
Total Committers: 7
Avg Commits per committer: 48.286
Development Distribution Score (DDS): 0.036

Top Committers

Name	Email	Commits
Eric Bridgeford	e**2@j**u	326
Richard Chen	r**0@j**u	5
Alex Loftus	a**4@g**m	2
jyim6	j**m@g**m	2
Ubuntu	u**u@i**l	1
Eric Bridgeford	e**2@b**u	1
Ran Liu	r**4@j**u	1

Committer Domains (Top 20 + Academic)

jhu.edu: 3
brainviz1.cs.jhu.edu: 1
ip-172-31-83-54.ec2.internal: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 5
Total pull requests: 3
Average time to close issues: 24 days
Average time to close pull requests: less than a minute
Total issue authors: 3
Total pull request authors: 1
Average comments per issue: 0.6
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

ebridge2 (3)
jovo (1)
sbhattacharyay (1)

Pull Request Authors

ebridge2 (3)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 238 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

cran.r-project.org: lolR

Linear Optimal Low-Rank Projection

Homepage: https://github.com/neurodata/lol
Documentation: http://cran.r-project.org/web/packages/lolR/lolR.pdf
License: GPL-2
Latest release: 1.0.1
published over 8 years ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 238 Last month

Rankings

Forks count: 3.2%

Stargazers count: 12.6%

Average: 29.6%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Downloads: 66.9%

Maintainers (1)

ericwb95@gmail.com

Last synced: 10 months ago

Dependencies

DESCRIPTION cran

R >= 3.4.0 depends
MASS * imports
abind * imports
ggplot2 * imports
irlba * imports
pls * imports
robust * imports
robustbase * imports
covr * suggests
knitr * suggests
latex2exp * suggests
parallel * suggests
randomForest * suggests
rmarkdown * suggests
testthat * suggests
