ebrahim.gof

Ebrahim-Farrington Binary logistic regression goodness of fit test

https://github.com/ebrahimkhaled/ebrahim.gof

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.9%) to scientific vocabulary

Last synced: 6 months ago

Repository

Ebrahim-Farrington Binary logistic regression goodness of fit test

Basic Info

Host: GitHub
Owner: ebrahimkhaled
License: other
Language: R
Default Branch: main
Size: 460 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 8 months ago · Last pushed 8 months ago

Metadata Files

Readme Changelog License

Ebrahim-Farrington Goodness of Fit test

Overview

The ebrahim.gof package implements the Ebrahim-Farrington goodness-of-fit test for logistic regression models. This test is particularly effective for binary data and sparse datasets, providing an improved alternative to the traditional Hosmer-Lemeshow test.

Key Features

Ebrahim-Farrington Test: Simplified implementation for binary data with automatic grouping
Original Farrington Test: Full implementation for grouped data
Robust Performance: Particularly effective with sparse data and binary outcomes
Easy to Use: Simple function interface similar to other goodness-of-fit tests
Well Documented: Comprehensive documentation with examples

Installation

From GitHub (Development Version)

Copy and paste this into R or RStudio:

```r
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install ebrahim.gof from GitHub
devtools::install_github("ebrahimkhaled/ebrahim.gof")
```

From CRAN (Stable Version) (NOT AVAILABLE YET)

This is an alternative way to install the R package, but it is not available yet.

```r
# Will be available after CRAN submission
install.packages("ebrahim.gof")
```

Quick Start

```r
library(ebrahim.gof)

# Example with binary data
set.seed(123)
n <- 500
x <- rnorm(n)
linpred <- 0.5 + 1.2 * x
prob <- 1 / (1 + exp(-linpred))
y <- rbinom(n, 1, prob)

# Fit logistic regression
model <- glm(y ~ x, family = binomial())
predicted_probs <- fitted(model)

# Perform Ebrahim-Farrington test
result <- ef.gof(y, predicted_probs, G = 10)
print(result)
```

Main Functions

`ef.gof()`

The main function that performs the goodness-of-fit test:

```r
ef.gof(y, predicted_probs, G = 10, model = NULL, m = NULL)
```

Parameters:
- y: Binary response vector (0/1), or success counts for grouped data
- predicted_probs: Vector of predicted probabilities from the logistic model
- G: Number of groups for binary data (default: 10)
- model: Optional glm object (required only for the original Farrington test, not for the Ebrahim-Farrington test)
- m: Optional vector of trial counts for grouped data (required only for the original Farrington test, not for the Ebrahim-Farrington test)

Returns: A data frame with test name, test statistic, and p-value.
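The G argument controls how many groups the binary observations are pooled into. The README does not say how ef.gof() forms these groups. One common scheme for binary logistic models, used by the Hosmer-Lemeshow test, sorts observations by predicted probability and cuts them into G groups of roughly equal size. The sketch below shows that scheme with a hypothetical helper, `group_by_risk`. Treat it as an illustration of the idea under that assumption, not as the package's internal grouping.

```r
# Hypothetical helper (not part of ebrahim.gof): pool binary outcomes into
# G groups by quantiles of the predicted probability, then tabulate the
# observed and expected number of successes in each group.
group_by_risk <- function(y, predicted_probs, G = 10) {
  breaks <- unique(quantile(predicted_probs, probs = seq(0, 1, length.out = G + 1)))
  group <- cut(predicted_probs, breaks = breaks, include.lowest = TRUE)
  data.frame(
    n        = as.vector(table(group)),                    # observations per group
    observed = as.vector(tapply(y, group, sum)),           # observed successes
    expected = as.vector(tapply(predicted_probs, group, sum)) # sum of fitted probs
  )
}

# Usage on simulated data, mirroring the Quick Start model
set.seed(123)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial())
group_by_risk(y, fitted(fit), G = 10)
```

Comparing the observed and expected columns group by group is what a grouped Pearson-type statistic summarizes.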

Examples

Example 1: Basic Usage with Binary Data

```r
library(ebrahim.gof)

# Simulate binary data
set.seed(42)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
linpred <- -0.5 + 0.8 * x1 + 0.6 * x2
prob <- plogis(linpred)
y <- rbinom(n, 1, prob)

# Fit logistic regression
model <- glm(y ~ x1 + x2, family = binomial())
predicted_probs <- fitted(model)

# Test goodness of fit
result <- ef.gof(y, predicted_probs, G = 10)
print(result)
#>                 Test Test_Statistic p_value
#> 1 Ebrahim-Farrington        -0.8944  0.8143
```

Example 2: Compare Different Group Numbers

```r
# Test with different numbers of groups
results <- data.frame(
  Groups = c(4, 10, 20),
  P_value = c(
    ef.gof(y, predicted_probs, G = 4)$p_value,
    ef.gof(y, predicted_probs, G = 10)$p_value,
    ef.gof(y, predicted_probs, G = 20)$p_value
  )
)
print(results)
```

Example 3: Comparison with Hosmer-Lemeshow Test

```r
library(ResourceSelection)

# Ebrahim-Farrington test
ef_result <- ef.gof(y, predicted_probs, G = 10)

# Hosmer-Lemeshow test
hl_result <- hoslem.test(y, predicted_probs, g = 10)

# Compare results
comparison <- data.frame(
  Test = c("Ebrahim-Farrington", "Hosmer-Lemeshow"),
  P_value = c(ef_result$p_value, hl_result$p.value)
)
print(comparison)
```

Example 4: Power Analysis

```r
# Function to simulate power against a misspecified model
simulate_power <- function(n, beta_quad = 0.1, n_sims = 100) {
  rejections <- 0

  for (i in seq_len(n_sims)) {
    x <- runif(n, -2, 2)
    # True model has a quadratic term
    linpred_true <- 0 + x + beta_quad * x^2
    prob_true <- plogis(linpred_true)
    y <- rbinom(n, 1, prob_true)

    # Fit misspecified linear model
    model_mis <- glm(y ~ x, family = binomial())
    pred_probs <- fitted(model_mis)

    # Test goodness of fit
    test_result <- ef.gof(y, pred_probs, G = 10)

    if (test_result$p_value < 0.05) {
      rejections <- rejections + 1
    }
  }

  return(rejections / n_sims)
}

# Calculate power for different sample sizes
sample_sizes <- c(100, 200, 500, 1000)
power_results <- data.frame(
  n = sample_sizes,
  power = sapply(sample_sizes, simulate_power)
)
print(power_results)
```

Methodology

The Ebrahim-Farrington test is based on Farrington's (1996) theoretical framework but simplified for practical implementation with binary data. It uses a modified Pearson chi-square statistic, T_EF, which for binary data with automatic grouping is standardized as:

Z_EF = (T_EF - (G - 2)) / sqrt(2(G - 2))

Where:
- T_EF is the modified Pearson chi-square statistic
- G is the number of groups
- Under H₀, Z_EF asymptotically follows a standard normal distribution
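The standardization step can be sketched in a few lines of base R. This is an illustration, not the package's code. It assumes you already have a value of T_EF; computing the modified Pearson statistic itself is not shown, because the README does not spell out the modification. It also assumes the p-value is the upper-tail probability 1 - Φ(Z_EF). That assumption is roughly consistent with Example 1, which reports Z = -0.8944 with p = 0.8143. The function name `ef_standardize` is hypothetical.

```r
# Hypothetical helper (not part of ebrahim.gof): standardize a given T_EF
# value for G groups and return Z_EF with an upper-tail p-value.
ef_standardize <- function(T_EF, G) {
  stopifnot(G > 2)                         # G - 2 must be positive
  centre <- G - 2                          # centering constant in the formula
  z <- (T_EF - centre) / sqrt(2 * centre)  # Z_EF = (T_EF - (G - 2)) / sqrt(2(G - 2))
  # Assumption: large T_EF signals lack of fit, so the upper tail is used
  p <- pnorm(z, lower.tail = FALSE)
  c(z = z, p_value = p)
}

# With G = 10 the denominator is sqrt(16) = 4, so T_EF = 4.4224 gives Z = -0.8944
ef_standardize(4.4224, G = 10)
```

Because the centering and scaling use G - 2, the statistic is only defined for G > 2. This is why the helper stops early for smaller G.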

Advantages over Hosmer-Lemeshow Test

Better Power: More sensitive to model misspecification
Sparse Data Handling: Specifically designed for sparse data situations
Computational Efficiency: Simplified calculations for binary data
Theoretical Foundation: Based on rigorous asymptotic theory

Superior Performance at G=10

Simulation results consistently show that, with G = 10 groups, the Ebrahim-Farrington test outperforms the Hosmer-Lemeshow test. This holds even when the model misspecification is minimal, such as a missing interaction or an omitted quadratic term (Ebrahim, 2025).

Asymptotic Standard Normal Distribution

The following two figures show that, under the null hypothesis, the Ebrahim-Farrington test statistic is asymptotically standard normal for both single-predictor and multiple-predictor logistic regression models. This property holds even in sparse data settings. It supports the theoretical foundation of the test and its use for model assessment (see Ebrahim, 2025).

Figure 1: Empirical cumulative distribution function (CDF) of the Ebrahim-Farrington test statistic under the null for a single predictor, compared to the standard normal CDF.
Figure 2: Empirical CDF of the test statistic under the null for a scenario with multiple independent predictors, again compared to the standard normal CDF.

These results demonstrate that the Ebrahim-Farrington test maintains the correct type I error rate and its statistic converges to the standard normal distribution as sample size increases, validating its asymptotic properties.

[Figure images: "Farrington CDF Comparison (U-3_3)" and "Farrington CDF Comparison (multi_indep)"]
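The type I error claim can be checked with a small null simulation. Generate data from a correctly specified logistic model, fit that model, and record how often the test rejects at the 5% level; the rate should be close to 0.05. This sketch rests on its own assumptions and is not the simulation design used in the thesis. `estimate_size` is a hypothetical helper. The test is passed in as a function so that any p-value-returning test can be plugged in. With ebrahim.gof installed, `function(y, p) ef.gof(y, p, G = 10)$p_value` uses the package's test, called the same way as in Example 4.

```r
# Hypothetical helper: Monte Carlo estimate of a goodness-of-fit test's
# rejection rate when the fitted logistic model is correctly specified.
# `gof_pvalue` must be a function(y, predicted_probs) returning one p-value.
estimate_size <- function(gof_pvalue, n = 500, n_sims = 200, alpha = 0.05) {
  rejections <- 0
  for (i in seq_len(n_sims)) {
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(-0.5 + x))    # true model is linear in x
    fit <- glm(y ~ x, family = binomial()) # so this fit is correctly specified
    if (gof_pvalue(y, fitted(fit)) < alpha) {
      rejections <- rejections + 1
    }
  }
  rejections / n_sims                      # empirical type I error rate
}

# Usage with the package's test (requires ebrahim.gof to be installed):
# library(ebrahim.gof)
# set.seed(2025)
# estimate_size(function(y, p) ef.gof(y, p, G = 10)$p_value)
```

Increasing `n_sims` narrows the Monte Carlo error of the estimate, at the cost of more model fits.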

References

Farrington, C. P. (1996). On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data. Journal of the Royal Statistical Society. Series B (Methodological), 58(2), 349-360.
Ebrahim, K. E. (2025). Goodness-of-Fits Tests and Calibration Machine Learning Algorithms for Logistic Regression Model with Sparse Data. Master's Thesis, Alexandria University.
Hosmer, D. W., & Lemeshow, S. (2000). Applied Logistic Regression, Second Edition. New York: Wiley.

Citation

If you use this package in your research, please cite:

Ebrahim, K. E. (2025). ebrahim.gof: Ebrahim-Farrington Goodness-of-Fit Test for Logistic Regression. R package version 1.0.0. https://github.com/ebrahimkhaled/ebrahim.gof

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the GPL-3 License - see the LICENSE file for details.

Author

Ebrahim Khaled Ebrahim
Alexandria University
Email: ebrahim.khaled@alexu.edu.eg

Acknowledgments

Prof. Osama Abd ElAziz Hussien (Alexandria University) for supervision
Dr. Ahmed El-Kotory (Alexandria University) for guidance and supervision
The R community for continuous support and feedback

Owner

Login: ebrahimkhaled
Kind: user

Repositories: 2
Profile: https://github.com/ebrahimkhaled

GitHub Events

Total

Push event: 3
Create event: 1

Last Year

Push event: 3
Create event: 1

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

cran.r-project.org: ebrahim.gof

Ebrahim-Farrington Goodness-of-Fit Test for Logistic Regression

Homepage: https://github.com/ebrahimkhaled/ebrahim.gof
Documentation: http://cran.r-project.org/web/packages/ebrahim.gof/ebrahim.gof.pdf
License: GPL-3
Latest release: 1.0.0
published 7 months ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 0 Last month

Rankings

Dependent packages count: 25.6%

Dependent repos count: 31.5%

Average: 47.5%

Downloads: 85.4%

Maintainers (1)

ebrahimkhaled@alexu.edu.eg

Last synced: 7 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions

actions/checkout v3 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

DESCRIPTION cran

R >= 3.5.0 depends
stats * imports
utils * imports
ResourceSelection * suggests
ggplot2 * suggests
knitr * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests
