modeLLtest

modeLLtest: An R Package for Unbiased Model Comparison using Cross Validation - Published in JOSS (2019)

https://github.com/shanascogin/modelltest

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 18 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org
○ Committers with academic emails
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 40% confidence

Engineering Computer Science - 40% confidence

Last synced: 6 months ago

Repository

An R Package for tests comparing cross validated log likelihood (CVLL) estimates

Basic Info

Host: GitHub
Owner: ShanaScogin
License: gpl-3.0
Language: R
Default Branch: main
Homepage:
Size: 310 KB

Statistics

Stars: 9
Watchers: 1
Forks: 1
Open Issues: 4
Releases: 3

Created almost 8 years ago · Last pushed almost 4 years ago

Metadata Files

Readme Changelog Contributing License Code of conduct

modeLLtest

An R Package which implements model comparison tests using cross-validated log-likelihood (CVLL) values.

Introduction

modeLLtest is an R package that implements model comparison tests. It includes functions for the cross-validated difference in means (CVDM) test and the cross-validated median fit (CVMF) test. The CVDM and CVMF tests assist researchers and students in selecting among models describing the same process. Selection among estimation methods describing the same process is a crucial methodological step within the social sciences. Other tools in modeLLtest include a function to output a vector of cross-validated log-likelihood (CVLL) values and a function to perform the CVDM test on an input of two CVLL vectors. Details can be found in the following papers:

Harden, J. J., & Desmarais, B. A. (2011). Linear Models with Outliers: Choosing between Conditional-Mean and Conditional-Median Methods. State Politics & Policy Quarterly, 11(4), 371-389. https://doi.org/10.1177/1532440011408929
Desmarais, B. A., & Harden, J. J. (2012). Comparing partial likelihood and robust estimation methods for the Cox regression model. Political Analysis, 20(1), 113-135. https://doi.org/10.1093/pan/mpr042
Desmarais, B. A., & Harden, J. J. (2014). An unbiased model comparison test using cross-validation. Quality & Quantity, 48(4), 2155-2173. https://doi.org/10.1007/s11135-013-9884-7

Installing The Package

The easiest way to install modeLLtest is to use install.packages() and download it from CRAN.

install.packages("modeLLtest")

Installing from CRAN should avoid any compilation issues that can arise and is the quickest option. However, the newest version of the package can also be downloaded with the devtools package in R from GitHub. To do this, install devtools by calling:

install.packages("devtools")

Now we can install from GitHub with the following line:

devtools::install_github("ShanaScogin/modeLLtest")

Once you have installed the package from either CRAN or GitHub, you can access it by calling:

library(modeLLtest)

After the package is loaded, run ?modeLLtest to see the help file. You can also see the documentation for the functions with ?cvdm, ?cvll, ?cvlldiff, or ?cvmf. If you have issues or questions, please email me at sscogin@nd.edu.

Note on installation failure

Some users might experience gfortran errors caused by the combination of Rcpp, RcppArmadillo, and macOS. To fix this problem, consider installing gfortran 6.1 from https://cran.r-project.org/bin/macosx/tools/. (Also check out Yiqing Xu and Licheng Liu's note on this type of error in the gsynth package for some helpful links.)

Basic Usage

This package has four main functions: cvdm(), cvll(), cvmf(), and cvlldiff(). The function cvdm() deploys the CVDM test, which uses a bias-corrected Johnson's t-test to choose between the leave-one-out cross-validated log-likelihood outputs of two non-nested models. The function cvll() outputs a vector of leave-one-out cross-validated log-likelihoods for a given method. Currently, these functions accommodate linear regression, median regression (from the package quantreg), and two methods of robust regression (from the package MASS). The cvlldiff() function performs the bias-corrected Johnson's t-test on two vectors of cross-validated log-likelihoods. Finally, the cvmf() function tests between the partial likelihood maximization (PLM) and the iteratively reweighted robust (IRR) methods of estimation for a given application of the Cox model.
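To make the idea of a leave-one-out cross-validated log-likelihood concrete, here is a minimal base R sketch for a linear model. It is an illustration only, not the package's implementation: the helper name loo_cvll_ols() is hypothetical, and using the maximum-likelihood residual standard deviation together with a normal density is an assumption made for this example.

```r
# Hypothetical helper: leave-one-out CVLL values for a linear model.
# Assumption: normal errors, with the error SD estimated from the
# training residuals (the package may use a different estimator).
loo_cvll_ols <- function(formula, data) {
  n <- nrow(data)
  cvll <- numeric(n)
  for (i in seq_len(n)) {
    train <- data[-i, , drop = FALSE]   # all rows except observation i
    test  <- data[i, , drop = FALSE]    # the held-out observation
    fit <- lm(formula, data = train)
    sigma_hat <- sqrt(mean(residuals(fit)^2))
    y_i  <- model.response(model.frame(formula, test))
    mu_i <- predict(fit, newdata = test)
    # Log-density of the held-out outcome under the fitted model
    cvll[i] <- dnorm(y_i, mean = mu_i, sd = sigma_hat, log = TRUE)
  }
  cvll
}

set.seed(123456)
n <- 200
X <- runif(n, -1, 1)
Y <- 0.2 + 0.5 * X + rnorm(n)
dat <- data.frame(Y, X)

cvll_ols <- loo_cvll_ols(Y ~ X, dat)
length(cvll_ols)  # one value per observation
sum(cvll_ols)     # overall out-of-sample fit
```

Each element measures how well a model fitted without observation i predicts that observation, which is why two methods can be compared value by value.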

After loading the package, you can find the documentation for the functions with ?cvdm, ?cvll, ?cvlldiff, or ?cvmf. For the output, type ?cvdm_object, ?cvll_object, ?cvlldiff_object, or ?cvmf_object to view the documentation. If you encounter a bug or have comments or questions, please email me at sscogin@nd.edu.

Examples

Here are some examples of the functions in this package. First, we'll look at cvdm(), which applies the cross-validated difference in means (CVDM) test to compare two methods of estimating a formula. The example compares ordinary least squares (OLS) estimation to median regression (MR).

```
library(modeLLtest)

set.seed(123456)
b0 <- .2 # True value for the intercept
b1 <- .5 # True value for the slope
n <- 500 # Sample size
X <- runif(n, -1, 1)

Y <- b0 + b1 * X + rnorm(n, 0, 1) # N(0, 1) error

obj_cvdm <- cvdm(Y ~ X, data.frame(cbind(Y, X)), method1 = "OLS", method2 = "MR")

obj_cvdm
```

Next, let's do the same as we did above, but with cvll() and cvlldiff(). These are more general functions: cvll() creates a vector of cross-validated log-likelihood values, and cvlldiff() computes the bias-corrected Johnson's t-test on two such vectors.

```
library(modeLLtest)

set.seed(123456)
b0 <- .2 # True value for the intercept
b1 <- .5 # True value for the slope
n <- 500 # Sample size
X <- runif(n, -1, 1)

Y <- b0 + b1 * X + rnorm(n, 0, 1) # N(0, 1) error

# CVLL vectors for each estimation method
obj_cvll_ols <- cvll(Y ~ X, data.frame(cbind(Y, X)), method = "OLS")

obj_cvll_mr <- cvll(Y ~ X, data.frame(cbind(Y, X)), method = "MR")

# Test on the two CVLL vectors
obj_cvlldiff <- cvlldiff(obj_cvll_ols$cvll, obj_cvll_mr$cvll, obj_cvll_ols$df)

obj_cvlldiff
```
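For intuition about the test statistic, the sketch below computes one common form of Johnson's (1978) skewness-adjusted t statistic on the differences between two CVLL vectors. This is an assumption-laden illustration in base R: the function name johnson_t() is hypothetical, the two input vectors are simulated stand-ins rather than real CVLL output, and the package's exact bias correction and degrees of freedom may differ.

```r
# Hypothetical helper: Johnson's skewness-adjusted t statistic for H0: mean = 0.
# Assumption: this is a textbook form of the statistic, not necessarily
# the exact computation performed by cvlldiff().
johnson_t <- function(d) {
  n  <- length(d)
  m  <- mean(d)
  s2 <- var(d)
  m3 <- mean((d - m)^3)  # third central moment (skewness term)
  (m + m3 / (6 * s2 * n) + (m3 / (3 * s2^2)) * m^2) * sqrt(n / s2)
}

set.seed(42)
# Simulated stand-ins for two vectors of CVLL values (hypothetical data)
cvll_a <- rnorm(300, mean = -1.40, sd = 0.8)
cvll_b <- rnorm(300, mean = -1.45, sd = 0.8)

stat <- johnson_t(cvll_a - cvll_b)
# Two-sided p-value; df = n - 1 is an assumption made for this sketch
p_two_sided <- 2 * pt(-abs(stat), df = length(cvll_a) - 1)
stat
p_two_sided
```

When the differences are symmetric, the skewness term vanishes and the statistic reduces to the ordinary one-sample t statistic.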

Finally, let's look at the cvmf() function. This function compares the partial likelihood maximization (PLM) and the iteratively reweighted robust (IRR) methods of estimation for a given application of the Cox model. Note: This function currently runs slowly (approximately 3 seconds for one run). Future developments look to optimize this function.

```
library(modeLLtest)
library(survival)

set.seed(12345)
x1 <- rnorm(100)
x2 <- rnorm(100)
x2e <- x2 + rnorm(100, 0, 0.5)

y <- rexp(100, exp(x1 + x2))
y <- Surv(y) # Changing y into a survival obj with survival package

dat <- data.frame(y, x1, x2e)
form <- y ~ x1 + x2e

obj_cvmf <- cvmf(formula = form, data = dat)

obj_cvmf
```
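One way to think about a median-fit comparison is as a sign test on observation-level differences in cross-validated fit. The base R sketch below illustrates that idea under stated assumptions: the vectors cvpl_plm and cvpl_irr are simulated, hypothetical stand-ins for per-observation cross-validated log partial likelihood contributions, and framing the comparison as a binomial sign test is an assumption of this example rather than a description of how cvmf() is implemented.

```r
set.seed(2024)
# Hypothetical per-observation cross-validated fit values for two estimators
cvpl_plm <- rnorm(100, mean = -4.0, sd = 1)
cvpl_irr <- rnorm(100, mean = -4.3, sd = 1)

# Observation-level differences; drop exact ties before the sign test
diffs <- cvpl_plm - cvpl_irr
diffs <- diffs[diffs != 0]

# Count observations where the first estimator fits better
n_plm_better <- sum(diffs > 0)

# Under H0 (median difference of zero), the count is Binomial(n, 0.5)
sign_test <- binom.test(n_plm_better, length(diffs), p = 0.5)
sign_test$p.value
```

A small p-value with most differences positive would favor the first estimator; with most differences negative, the second.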

Data

This package includes two datasets from real-world analyses to facilitate examples. These datasets are publicly available and have been included in this package with the endorsement of the authors. More on the data and original analysis can be found in the following papers:

Joshi, M., & Mason, T. D. (2008). Between democracy and revolution: peasant support for insurgency versus democracy in Nepal. Journal of Peace Research, 45(6), 765-782. https://doi.org/10.1177/0022343308096155
Golder, S. N. (2010). Bargaining delays in the government formation process. Comparative Political Studies, 43(1), 3-32. https://doi.org/10.1177/0010414009341714

Examples with Replication Data

For an example of the CVDM test applied to a real-world analysis, we can look at a study by Joshi and Mason (2008, Journal of Peace Research 45(6): 765-782). This study employs robust regression to analyze district-level election turnout among peasants in Nepal. Specifically, Joshi and Mason hypothesize that peasant dependence on the landed elite for survival will result in higher voter turnout. Using their model of the 1999 parliamentary elections, we can see how the use of robust regression is supported by the CVDM test. These data are available on the Journal of Peace Research Replication Datasets website and have been included in the package for ease of replication. For full replication and discussion of the CVDM test, see Desmarais and Harden (2014, Quality & Quantity 48(4): 2155-2173).

```
library(MASS)
library(modeLLtest)

data(nepaldem)

set.seed(978)

# CVDM test comparing OLS with robust regression
obj_cvdm_jm <- cvdm(percent_regvote1999 ~ landless_gap +
    below1pa_gap + sharecrop_gap + service_gap + fixmoney_gap +
    fixprod_gap + per_without_instcredit + totoalkilled_1000 + hdi_gap1 +
    ln_pop2001 + totalcontestants1999 + cast_eth_fract,
    data = nepaldem, method1 = "OLS", method2 = "RLM")

obj_cvdm_jm

# Robust regression fit of the same model
model_1999 <- rlm(percent_regvote1999 ~ landless_gap +
    below1pa_gap + sharecrop_gap + service_gap + fixmoney_gap +
    fixprod_gap + per_without_instcredit + totoalkilled_1000 + hdi_gap1 +
    ln_pop2001 + totalcontestants1999 + cast_eth_fract,
    data = nepaldem)

model_1999
```

Next, we can look at a study by Golder (2010, Comparative Political Studies 43(1): 3-32) to see an example of the CVMF test. Golder employs the PLM method of estimating a Cox model to investigate Western European government formation duration. She hypothesizes that bargaining complexity leads to increasing delays in government formation as uncertainty increases. The CVMF test indicates that her choice of PLM is the better performing estimator (p < .05) compared to IRR. These data are available on the Harvard Dataverse page (https://doi.org/10.7910/DVN/BUWZBA) and have been included in the package for ease of replication. For full replication and discussion of the CVMF test, see Desmarais and Harden (2012, Political Analysis 20(1): 113-135).

```
library(survival)
library(coxrobust)
library(modeLLtest)

data(govtform)

# Survival outcome: days of bargaining
golder_surv <- Surv(govtform$bargainingdays)

# Covariate matrix
golder_x <- cbind(govtform$postelection, govtform$legislative_parties,
    govtform$polarization, govtform$positive_parl,
    govtform$post_legislative_parties, govtform$post_polariz,
    govtform$post_positive, govtform$continuation,
    govtform$singleparty_majority)

colnames(golder_x) <- c("govtform$postelection", "govtform$legislative_parties",
    "govtform$polarization", "govtform$positive_parl",
    "govtform$post_legislative_parties", "govtform$post_polariz",
    "govtform$post_positive", "govtform$continuation",
    "govtform$singleparty_majority")

# CVMF test comparing PLM and IRR
obj_cvmf_golder <- cvmf(golder_surv ~ golder_x, method = "efron")

obj_cvmf_golder

# PLM fit of the same model
govtform_plm <- coxph(golder_surv ~ golder_x, method = "efron")

govtform_plm
```

What's Happening

Next steps for this package include adding more methods to the cvdm() and cvll() functions and optimizing functions - especially cvmf() - to improve speed.

Contact

Please submit an issue if you encounter any bugs or problems with the package.

Owner

Name: Shana Scogin
Login: ShanaScogin
Kind: user
Company: University of Notre Dame

Website: https://shanascogin.com/
Twitter: ShanaScogin
Repositories: 3
Profile: https://github.com/ShanaScogin

Political scientist, researcher, educator

JOSS Publication

modeLLtest: An R Package for Unbiased Model Comparison using Cross Validation

Published

September 01, 2019

DOI

10.21105/joss.01542

Volume 4, Issue 41, Page 1542

Authors

Shana Scogin

University of Notre Dame, Department of Political Science

Sarah Petersen

University of Notre Dame, Department of Mathematics

Jeffrey J. Harden

University of Notre Dame, Department of Political Science

Bruce A. Desmarais

Pennsylvania State University, Department of Political Science

Editor

Bruce E. Wilson

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: 7 months ago

All Time

Total Commits: 251
Total Committers: 2
Avg Commits per committer: 125.5
Development Distribution Score (DDS): 0.004

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Shana Scogin	s**s@g**m	250
sarahllpetersen	s**n@g**m	1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 5
Total pull requests: 1
Average time to close issues: over 1 year
Average time to close pull requests: less than a minute
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.2
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

ShanaScogin (5)

Pull Request Authors

ShanaScogin (1)

Top Labels

Issue Labels

enhancement (2)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 238 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

cran.r-project.org: modeLLtest

Compare Models with Cross-Validated Log-Likelihood

Homepage: https://github.com/ShanaScogin/modeLLtest
Documentation: http://cran.r-project.org/web/packages/modeLLtest/modeLLtest.pdf
License: GPL-3
Latest release: 1.0.4
published almost 4 years ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 238 Last month

Rankings

Stargazers count: 17.0%

Forks count: 21.9%

Dependent packages count: 29.8%

Average: 31.9%

Dependent repos count: 35.5%

Downloads: 55.6%

Maintainers (1)

shanarscogin@gmail.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.2.3 depends
MASS * imports
Rcpp * imports
coxrobust * imports
quantreg * imports
stats * imports
survival * imports
knitr * suggests
rmarkdown * suggests
testthat * suggests
