https://github.com/dmetivie/robustmeans.jl

Implement some Robust Mean Estimators

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.6%) to scientific vocabulary

Keywords

confidence-intervals julia julia-language robust robust-statistics statistics
Last synced: 4 months ago

Repository

Implement some Robust Mean Estimators

Basic Info
  • Host: GitHub
  • Owner: dmetivie
  • License: mit
  • Language: Julia
  • Default Branch: master
  • Homepage:
  • Size: 179 KB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 4
Topics
confidence-intervals julia julia-language robust robust-statistics statistics
Created almost 4 years ago · Last pushed 8 months ago
Metadata Files
Readme License Authors

README.md

RobustMeans

The aim of this package is to implement in Julia some robust mean estimators (one-dimensional for now). See *Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey* or *The Robust Randomized Quasi Monte Carlo method, applications to integrating singular functions* for recent surveys.

> [!NOTE]
> Computing the empirical mean over a data set is one of the most common operations in data analysis. However, this operation is not robust to outliers or contaminated samples. Robust mean estimators are estimators of the mean that remain reliable (in some sense) in the presence of such outliers or contamination.
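As a minimal, self-contained illustration of this point (not from the package; it uses a hand-rolled median-of-means rather than the estimators implemented in RobustMeans):

```julia
using Statistics, Random

# Toy data: a clean Gaussian sample with one gross outlier appended.
x = [randn(999); 1_000.0]

mean(x)   # the single outlier drags the empirical mean far from the true mean 0

# Median-of-means with 10 blocks of 100 points: the outlier corrupts only one
# block, so the median of the block means stays close to 0.
block_means = map(mean, Iterators.partition(shuffle(x), 100))
median(block_means)
```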

I am currently experimenting in this package with a "robust moving average".

Example: Comparing robust estimators vs. the empirical mean

```julia
using Distributions
using RobustMeans
```

Generate samples

```julia
n = 8 * 7
M = 10^5 # M = 10^7 is used for the plot
α = 3.1
distribution = Pareto(α)
μ = mean(distribution) # True mean
σ = std(distribution)  # True std
x = rand(distribution, M, n) # M realizations of samples of size n
```

Estimate the mean with different estimators

```julia
# Store all the realizations into a Dictionary
p = 1 # Parameter of Minsker-Ndaoud
δ = 3exp(-8) # ≈ 0.001
estimators = [EmpiricalMean(), Catoni(σ), Huber(σ), LeeValiant(), MinskerNdaoud(p)]
short_names = ["EM", "CA", "HU", "LV", "MN"]
estimates = Dict{MeanEstimator,Vector}()
for estimator in estimators
    estimates[estimator] = [mean(r, δ, estimator) for r in eachrow(x)]
end
```
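As a supplementary check (not in the original example), one can summarize how far each estimator typically lands from the true mean, reusing `estimates`, `estimators`, `short_names`, `μ`, and `M` from the blocks above:

```julia
# Root-mean-square deviation of each estimator from the true mean μ
# over the M replications stored in `estimates`.
for (estimator, name) in zip(estimators, short_names)
    rmse = sqrt(sum(abs2, estimates[estimator] .- μ) / M)
    println(name, ": RMSE ≈ ", round(rmse, digits = 4))
end
```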

Results

Code for the plot

```julia
using StatsPlots, LaTeXStrings
gr()
plot_font = "Computer Modern" # To have nice LaTeX font plots.
default(
    fontfamily = plot_font,
    linewidth = 2,
    label = nothing,
    grid = true,
    framestyle = :default
)
```

```julia
begin
    plot(thickness_scaling = 2, size = (1000, 600))
    plot!(Normal(), label = L"\mathcal{N}(0,1)", c = :black, alpha = 0.6)
    for (ns, s) in enumerate(estimators)
        W = √(n) * (estimates[s] .- μ) / σ
        stephist!(W, alpha = 0.6, norm = :pdf, label = short_names[ns], c = ns)
        vline!([quantile(W, 1 - δ)], s = :dot, c = ns)
    end
    vline!([0], label = :none, c = :black, lw = 1, alpha = 0.9)
    yaxis!(:log10, yminorticks = 9, minorgrid = :y, legend = :topright, minorgridlinewidth = 1.2)
    ylims!((1 / M * 10, 2))
    xlabel!(L"\sqrt{n}(\hat{\mu}_n-\mu)/\sigma", tickfonthalign = :center)
    ylabel!("PDF")
    xlims!((-5, 10))
    ylims!((1e-5, 2))
    yticks!(10.0 .^ (-7:0))
end
```

Figure: distribution of the normalized estimation errors (robust_n_56_alpha_3p1_delta_0p001_1000000_EMCAHULVMN.svg)

Example: Robust nonlinear regression

Let's say you have a nonlinear regression problem $Y = f(u, X) + \epsilon$, where the noise $\epsilon$ can be heavy-tailed or the samples corrupted. The function $f$ is parametrized by the vector $u$ of parameters you want to adjust.

Traditionally, one would try to solve the following optimization problem

```math
u^\ast_{\mathrm{EM}} = \mathrm{argmin}_u \dfrac{1}{N}\sum_{i=1}^N (y_i - f(u, x_i))^2
```

However, this empirical mean could be heavily influenced by data outliers. To perform robust regression, one could use

```math
u^\ast_{\mathrm{robust}} = \mathrm{argmin}_u\, \text{RobustMean}\left(\left\{(y_i - f(u, x_i))^2\right\}_{i\in [\![1, N]\!]}\right)
```

> [!NOTE]
> When $f$ is linear, you can use the dedicated package RobustModels.jl, which has many more robust estimators and a better interface. However, it lacks some of the more theoretical estimators implemented here and, most importantly, it is currently limited to linear models.

In the following example, we use the Minsker-Ndaoud robust estimator and $f$ is a $\mathrm{relu}$ function. We choose Minsker-Ndaoud because it is compatible with automatic differentiation. Note that Catoni/Huber should also be easily differentiable; for the MoM-based estimator, I am not sure...

First, here is the setup:

```julia
using RobustMeans
using Plots

using Optimization
using ForwardDiff

relu(x, a, b) = x > b ? a * (x - b) : zero(x)
N = 8 * 5
X = 100rand(N)
a_true = 1
b_true = 20
Y = abs.(relu.(X, a_true, b_true) + 2randn(N))

# We manually corrupt the dataset
percentage_outliers = 0.17
n_outliers = round(Int, percentage_outliers * length(X))
Y[1:n_outliers] = maximum(Y) * rand(n_outliers) .+ minimum(Y)

# just so δ is a multiple of the number of data points
δ = 3exp(-8)

u0 = [0.2, 18]
p = [X, Y]
```

For comparison, let's try the regular regression

```julia
ferr_EM(u, p) = mean((relu.(p[1], u[1], u[2]) - p[2]).^2, δ, EmpiricalMean())

optf_EM = OptimizationFunction(ferr_EM, AutoForwardDiff())
prob_EM = OptimizationProblem(optf_EM, u0, p)

sol_EM = solve(prob_EM, Optimization.LBFGS())
```

Now the robust regression

```julia
ferr_R(u, p) = mean((relu.(p[1], u[1], u[2]) - p[2]).^2, δ, MinskerNdaoud(2))

optf_R = OptimizationFunction(ferr_R, AutoForwardDiff())
prob_R = OptimizationProblem(optf_R, u0, p)

sol_R = solve(prob_R, Optimization.LBFGS())
```
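As a quick sanity check of the automatic-differentiation claim above (a sketch, not from the original README, reusing `ferr_R`, `u0`, and `p` from the previous blocks):

```julia
# Gradient of the robust (Minsker-Ndaoud) objective at the initial guess,
# computed directly with ForwardDiff; AutoForwardDiff() relies on the same machinery.
ForwardDiff.gradient(u -> ferr_R(u, p), u0)
```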

```julia
Xl = 0:0.1:100
scatter(X, Y, label = "Data")
plot!(Xl, relu.(Xl, a_true, b_true), label = "True function", lw = 2, s = :dash, c = :black)
plot!(Xl, relu.(Xl, sol_EM.u...), lw = 2, label = "Fit EM")
plot!(Xl, relu.(Xl, sol_R.u...), lw = 2, label = "Fit Minsker-Ndaoud")
```

Figure: robust relu regression (data, true function, empirical-mean fit, and Minsker-Ndaoud fit)

Owner

  • Name: David Métivier
  • Login: dmetivie
  • Kind: user
  • Location: Montpellier, France
  • Company: INRAe, MISTEA

I am a research scientist with a physics background. I now do statistics to tackle environmental and climate-change problems. Julia enthusiast!

GitHub Events

Total
  • Create event: 1
  • Commit comment event: 1
  • Release event: 1
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 3
Last Year
  • Create event: 1
  • Commit comment event: 1
  • Release event: 1
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 3

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 43
  • Total Committers: 2
  • Avg Commits per committer: 21.5
  • Development Distribution Score (DDS): 0.093
Past Year
  • Commits: 9
  • Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| David Métivier | 4****e@u****m | 39 |
| CompatHelper Julia | c****y@j****g | 4 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 5
  • Average time to close issues: less than a minute
  • Average time to close pull requests: 2 months
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 4.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 4
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • JuliaTagBot (1)
Pull Request Authors
  • github-actions[bot] (4)
  • dmetivie (2)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • julia: 3 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
juliahub.com: RobustMeans

Implement some Robust Mean Estimators

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3 Total
Rankings
Dependent repos count: 9.9%
Dependent packages count: 38.9%
Average: 41.6%
Forks count: 53.5%
Stargazers count: 64.2%
Last synced: 5 months ago

Dependencies

.github/workflows/CompatHelper.yml actions
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite