edoif

EDOIF is a nonparametric framework based on the estimation statistics principle. Its main purpose is to infer orders of empirical distributions from different categories based on the probability of finding a value in one distribution that is greater than the expectation of another distribution.

https://github.com/darkeyes/edoif

Keywords

bootstrapping-statistics data-science estimation-statistics nonparametric-framework

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: DarkEyes
License: other
Language: R
Default Branch: master
Homepage:
Size: 13.3 MB

Statistics

Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created over 6 years ago · Last pushed 8 months ago

Metadata Files

Readme License

README.md

Empirical Distribution Ordering Inference Framework (EDOIF)

Given a dataset of careers and incomes, how large is the difference in income between any pair of careers? Given a dataset of travel time records, how much longer does a trip take when choosing public transportation mode A instead of B? We developed a framework named "EDOIF" to answer such questions.

EDOIF is a nonparametric framework based on the "Estimation Statistics" principle. Its main purpose is to infer orders of empirical distributions from different categories based on the probability of finding a value in one distribution that is greater than the expectation of another distribution. Given a set of ordered pairs of real values and categories, the framework is capable of:

1) inferring orders of domination of categories and representing the orders in the form of a graph;
2) estimating the magnitude of difference between a pair of categories in the form of mean-difference confidence intervals; and
3) visualizing domination orders and magnitudes of difference between categories.
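To make the two quantities above concrete, here is a minimal base-R sketch of a percentile bootstrap confidence interval for a mean difference, together with the empirical probability that a value from one distribution exceeds the mean of another. It is an illustration under stated assumptions, not the EDOIF implementation: the helper names `boot_mean_diff_ci` and `prob_exceeds_mean` are hypothetical, and the simulated data are made up for the example.

```r
# Illustrative sketch only; the helper names below are hypothetical
# and are not part of the EDOIF package.
set.seed(42)  # fixed seed so the sketch is reproducible

# Percentile bootstrap CI for mean(y) - mean(x)
boot_mean_diff_ci <- function(x, y, bootT = 1000, alpha = 0.05) {
  diffs <- replicate(bootT, {
    # resample each group with replacement and take the mean difference
    mean(sample(y, replace = TRUE)) - mean(sample(x, replace = TRUE))
  })
  ci <- quantile(diffs, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(mean_diff = mean(y) - mean(x), lower = ci[1], upper = ci[2])
}

# Fraction of values in y that exceed the mean of x
prob_exceeds_mean <- function(x, y) mean(y > mean(x))

# Two simulated categories (assumed data, for illustration)
a <- rnorm(150, mean = 10, sd = 8)
b <- rnorm(150, mean = 50, sd = 8)

res <- boot_mean_diff_ci(a, b)
p <- prob_exceeds_mean(a, b)
print(res)
print(p)
```

A confidence interval that lies entirely above zero, combined with a probability close to one, suggests that the second category dominates the first in this simplified setting.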

Installation

You can install our package from CRAN:

`install.packages("EDOIF")`

For the newest version on GitHub, run the following command in the R console:

`remotes::install_github("DarkEyes/EDOIF")`

This requires the "remotes" package to be installed before installing EDOIF.

Example: Inferring orders of categories based on their empirical distributions

```r
library(EDOIF)

# == simulation: Generating distributions of five categories:
# Category5 dominates Category4
# Category4 dominates Category3
# Category3 dominates Category2
# Category1 and Category2 share the same mean, so neither is expected to dominate the other
nInv=150 # number of samples per category
initMean=10
stepMean=20
std=8

simData1<-c()
simData1$Values<-rnorm(nInv,mean=initMean,sd=std)
simData1$Group<-rep(c("Category1"),times=nInv)
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("Category2"),times=nInv))
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+2*stepMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("Category3"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+3*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("Category4"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+4*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("Category5"),times=nInv) )

# == parameter setting
bootT=1000 # number of bootstrap resamples (sampling with replacement)
alpha=0.05 # significance level

# == Calling the class constructor
A1<-EDOIF(simData1$Values,simData1$Group, bootT=bootT, alpha=alpha, methodType ="perc")

# == Visualizing results
print(A1) # print the results in text mode
plot(A1, fontSize=10) # print the results in graphic mode
```

Graphic mode results

1. An alpha-confidence-interval of mean plot for five categories. The horizontal axis represents categories and the vertical axis represents values within distributions of categories.

2. A dominant-distribution network of five categories. A node represents categories and an edge represents a dominant-distribution relation between categories. If there is an edge from category A to B, then A dominates B. A larger node size implies a higher mean value of a category.

3. An alpha-confidence-interval of mean difference plot for five categories.
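The dominant-distribution network can be sketched in base R as an adjacency matrix built from one-sided Mann-Whitney tests. This is a simplified illustration rather than the package's code: the three groups are simulated for the example, no multiple-testing correction is applied, and the density formula (number of edges divided by the number of category pairs) is an assumption.

```r
# Illustrative sketch only; not the EDOIF implementation.
set.seed(1)

# Three simulated categories (assumed data): A and B share a mean, C is larger
groups <- list(
  A = rnorm(100, mean = 10, sd = 8),
  B = rnorm(100, mean = 10, sd = 8),
  C = rnorm(100, mean = 50, sd = 8)
)
alpha <- 0.05
k <- length(groups)

# adj[i, j] = 1 means "category i dominates category j" (edge from i to j)
adj <- matrix(0, k, k, dimnames = list(names(groups), names(groups)))
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    if (i != j) {
      # one-sided Mann-Whitney test: are values of i shifted above those of j?
      p <- wilcox.test(groups[[i]], groups[[j]], alternative = "greater")$p.value
      if (p < alpha) adj[i, j] <- 1
    }
  }
}

# Assumed density definition: edges divided by the number of unordered pairs
net_density <- sum(adj) / choose(k, 2)
print(adj)
print(net_density)
```

With this convention, a density of 1 means every pair of categories is ordered, while a lower value indicates that some pairs could not be distinguished.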

Text mode results

```
EDOIF (Empirical Distribution Ordering Inference Framework)

Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
Using Mann-Whitney test to report whether A ≺ B
A dominant-distribution network density:0.900000
Distribution: Category1
Mean:10.840671 95CI:[ 9.706981,12.014179]
Distribution: Category2
Mean:11.044785 95CI:[ 9.806991,12.446037]
Distribution: Category3
Mean:50.462935 95CI:[ 49.208005,51.757706]
Distribution: Category4
Mean:70.299726 95CI:[ 69.103924,71.502505]
Distribution: Category5
Mean:91.190505 95CI:[ 89.895480,92.518455]

Mean difference of Category2 (n=150) minus Category1 (n=150): Category1 ⊀ Category2 :p-val 0.4463 Mean Diff:0.204114 95CI:[ -1.545130,1.930609]

Mean difference of Category3 (n=150) minus Category1 (n=150): Category1 ≺ Category3 :p-val 0.0000 Mean Diff:39.622264 95CI:[ 37.984831,41.378232]

Mean difference of Category4 (n=150) minus Category1 (n=150): Category1 ≺ Category4 :p-val 0.0000 Mean Diff:59.459055 95CI:[ 57.921328,61.127817]

Mean difference of Category5 (n=150) minus Category1 (n=150): Category1 ≺ Category5 :p-val 0.0000 Mean Diff:80.349835 95CI:[ 78.620391,82.133270]

Mean difference of Category3 (n=150) minus Category2 (n=150): Category2 ≺ Category3 :p-val 0.0000 Mean Diff:39.418150 95CI:[ 37.543210,41.241722]

Mean difference of Category4 (n=150) minus Category2 (n=150): Category2 ≺ Category4 :p-val 0.0000 Mean Diff:59.254941 95CI:[ 57.304359,61.098774]

Mean difference of Category5 (n=150) minus Category2 (n=150): Category2 ≺ Category5 :p-val 0.0000 Mean Diff:80.145720 95CI:[ 78.313321,82.040234]

Mean difference of Category4 (n=150) minus Category3 (n=150): Category3 ≺ Category4 :p-val 0.0000 Mean Diff:19.836791 95CI:[ 18.047421,21.762239]

Mean difference of Category5 (n=150) minus Category3 (n=150): Category3 ≺ Category5 :p-val 0.0000 Mean Diff:40.727570 95CI:[ 39.004372,42.627946]

Mean difference of Category5 (n=150) minus Category4 (n=150): Category4 ≺ Category5 :p-val 0.0000 Mean Diff:20.890780 95CI:[ 19.079287,22.625807]
```

For more examples, please see the package vignettes.
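As a small reading aid for the text output, the following sketch re-derives two of the reported mean differences from the reported category means and checks which confidence intervals contain zero. All numbers are copied from the output above; the variable names are illustrative only.

```r
# Values copied from the text-mode output; names are illustrative.
means <- c(Category1 = 10.840671, Category2 = 11.044785, Category3 = 50.462935)

# Mean differences recomputed from the reported means
diff_21 <- means[["Category2"]] - means[["Category1"]]
diff_31 <- means[["Category3"]] - means[["Category1"]]

# Reported 95% confidence intervals of the two mean differences
ci <- data.frame(
  pair  = c("Category2 - Category1", "Category3 - Category1"),
  lower = c(-1.545130, 37.984831),
  upper = c(1.930609, 41.378232)
)

# An interval that contains zero is compatible with "no difference"
ci$contains_zero <- ci$lower <= 0 & ci$upper >= 0
print(c(diff_21, diff_31))
print(ci)
```

The interval for Category2 minus Category1 contains zero, which agrees with the reported non-domination (Category1 ⊀ Category2), while the interval for Category3 minus Category1 lies well above zero.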

Citation

Amornbunchornvej, Chainarong, Navaporn Surasvadi, Anon Plangprasopchok, Suttipong Thajchayapong. "A nonparametric framework for inferring orders of categorical data from category-real pairs." Heliyon 6, no. 11 (2020): e05435, ISSN 2405-8440, https://doi.org/10.1016/j.heliyon.2020.e05435. arXiv

Contact

Developer: C. Amornbunchornvej
https://orcid.org/0000-0003-3131-0370
Strategic Analytics Networks with Machine Learning and AI (SAI), NECTEC, Thailand

GitHub Events

Total

Push event: 1
Fork event: 1

Last Year

Push event: 1
Fork event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 99
Total Committers: 1
Avg Commits per committer: 99.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Chainarong Amornbunchornvej	g**a@g**m	99

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: about 17 hours
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: about 17 hours
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Science Score: 23.0%
