https://github.com/dmetivie/robustmeans.jl
Implement some Robust Mean Estimators
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 8.6%, to scientific vocabulary)
Keywords
Repository
Implement some Robust Mean Estimators
Basic Info
Statistics
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 4
Topics
Metadata Files
README.md
RobustMeans
The aim of this package is to implement in Julia some robust mean estimators (one-dimensional for now). See *Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey* or *The Robust Randomized Quasi-Monte Carlo method, applications to integrating singular functions* for recent surveys.
> [!NOTE]
> Computing the empirical mean over a data set is one of the most common operations in data analysis. However, this operation is not robust to outliers or contaminated samples. Robust mean estimators are mean estimators that are robust (in some sense) against such outliers or contaminated samples.

I am currently experimenting in the package with a "robust moving average".
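To illustrate the non-robustness the note describes, here is a minimal self-contained sketch (not from the package; it uses only Julia's `Statistics` standard library) comparing the empirical mean with a hand-rolled median-of-means, one of the classic robust estimators discussed in the surveys above:

```julia
using Statistics

# Median-of-means: split the sample into k blocks, average each block,
# and return the median of the block means. A single corrupted point can
# only spoil the one block it lands in.
function median_of_means(x::AbstractVector, k::Integer)
    blocks = Iterators.partition(x, cld(length(x), k))
    return median(mean(b) for b in blocks)
end

x = [fill(1.0, 99); 1000.0]  # 99 clean points and one large outlier

mean(x)                 # empirical mean is dragged towards the outlier (10.99)
median_of_means(x, 10)  # the corrupted block's mean is discarded by the median (1.0)
```

A single outlier shifts the empirical mean by an order of magnitude, while the median-of-means stays at the true value; the estimators in this package refine this idea with finite-sample deviation guarantees.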
Example: Comparing robust estimators vs the empirical mean

```julia
using Distributions
using RobustMeans
```

Generate samples:

```julia
n = 8 * 7
M = 10^5 # M = 10^7 is used for the plot
α = 3.1
distribution = Pareto(α)
μ = mean(distribution) # True mean
σ = std(distribution) # True std
x = rand(distribution, M, n) # M realizations of samples of size n
```
Estimate the mean with different estimators, storing all the realizations in a dictionary:

```julia
p = 1 # Parameter of Minsker-Ndaoud
δ = 3exp(-8) # ≈ 0.001
estimators = [EmpiricalMean(), Catoni(σ), Huber(σ), LeeValiant(), MinskerNdaoud(p)]
short_names = ["EM", "CA", "HU", "LV", "MN"]
estimates = Dict{MeanEstimator,Vector}()
for estimator in estimators
    estimates[estimator] = [mean(r, δ, estimator) for r in eachrow(x)]
end
```
Results

Code for the plot:

```julia
using StatsPlots, LaTeXStrings
gr()
plot_font = "Computer Modern" # To have nice LaTeX font plots
default(
    fontfamily = plot_font,
    linewidth = 2,
    label = nothing,
    grid = true,
    framestyle = :default
)
```

```julia
begin
    plot(thickness_scaling = 2, size = (1000, 600))
    plot!(Normal(), label = L"\mathcal{N}(0,1)", c = :black, alpha = 0.6)
    for (ns, s) in enumerate(estimators)
        W = √(n) * (estimates[s] .- μ) / σ
        stephist!(W, alpha = 0.6, norm = :pdf, label = short_names[ns], c = ns)
        vline!([quantile(W, 1 - δ)], s = :dot, c = ns)
    end
    vline!([0], label = :none, c = :black, lw = 1, alpha = 0.9)
    yaxis!(:log10, yminorticks = 9, minorgrid = :y, legend = :topright, minorgridlinewidth = 1.2)
    xlabel!(L"\sqrt{n}(\hat{\mu}_n-\mu)/\sigma", tickfonthalign = :center)
    ylabel!("PDF")
    xlims!((-5, 10))
    ylims!((1e-5, 2))
    yticks!(10.0 .^ (-7:0))
end
```

Example: Robust nonlinear regression
Let's say you have a nonlinear regression problem $Y = f(u, X) + \epsilon$, where $\epsilon$ can be heavy-tailed or corrupted samples. The function is parametrized by the vector $u$ of parameters you want to adjust.
Traditionally, one would try to solve the following optimization problem:

```math
u^\ast_{\mathrm{EM}} = \mathrm{argmin}_u \dfrac{1}{N}\sum_{i=1}^N (y_i - f(u, x_i))^2
```
However, this empirical mean can be heavily influenced by data outliers. To perform robust regression, one could instead use

```math
u^\ast_{\mathrm{robust}} = \mathrm{argmin}_u \text{RobustMean}\left(\left\{(y_i - f(u, x_i))^2\right\}_{i\in [\![1, N]\!]}\right)
```
> [!NOTE]
> When $f$ is linear, you can use the dedicated package RobustModels.jl, which has many more robust estimators and a better interface. However, it lacks some of the more theoretical estimators implemented here and, most importantly, it is currently limited to linear models.
In the following example, we use the Minsker-Ndaoud robust estimator, and $f$ is a $\mathrm{relu}$ function. We choose Minsker-Ndaoud because it is compatible with automatic differentiation. Catoni/Huber should also be easily differentiable; for the MoM-based estimators, I am not sure.
First, here is the setup:
```julia
using RobustMeans
using Plots
using Optimization
using ForwardDiff

relu(x, a, b) = x > b ? a * (x - b) : zero(x)
N = 8 * 5
X = 100rand(N)
a_true = 1
b_true = 20
Y = abs.(relu.(X, a_true, b_true) + 2randn(N))

# We manually corrupt the dataset
percentage_outliers = 0.17
n_outliers = round(Int, percentage_outliers * length(X))
Y[1:n_outliers] = maximum(Y) * rand(n_outliers) .+ minimum(Y)

# Just so δ is compatible with the number of data points
δ = 3exp(-8)

u0 = [0.2, 18]
p = [X, Y]
```
For comparison, let's try the regular regression:

```julia
ferr_EM(u, p) = mean((relu.(p[1], u[1], u[2]) - p[2]).^2, δ, EmpiricalMean())
optf_EM = OptimizationFunction(ferr_EM, AutoForwardDiff())
prob_EM = OptimizationProblem(optf_EM, u0, p)
sol_EM = solve(prob_EM, Optimization.LBFGS())
```
Now the robust regression:

```julia
ferr_R(u, p) = mean((relu.(p[1], u[1], u[2]) - p[2]).^2, δ, MinskerNdaoud(2))
optf_R = OptimizationFunction(ferr_R, AutoForwardDiff())
prob_R = OptimizationProblem(optf_R, u0, p)
sol_R = solve(prob_R, Optimization.LBFGS())
```
```julia
Xl = 0:0.1:100
scatter(X, Y, label = "Data")
plot!(Xl, relu.(Xl, a_true, b_true), label = "True function", lw = 2, s = :dash, c = :black)
plot!(Xl, relu.(Xl, sol_EM.u...), lw = 2, label = "Fit EM")
plot!(Xl, relu.(Xl, sol_R.u...), lw = 2, label = "Fit Minsker-Ndaoud")
```

Owner
- Name: David Métivier
- Login: dmetivie
- Kind: user
- Location: Montpellier, France
- Company: INRAe, MISTEA
- Website: http://www.cmap.polytechnique.fr/~david.metivier/
- Repositories: 5
- Profile: https://github.com/dmetivie
I am a research scientist with a physics background. I now do statistics to tackle environmental and climate-change problems. Julia enthusiast!
GitHub Events
Total
- Create event: 1
- Commit comment event: 1
- Release event: 1
- Watch event: 1
- Issue comment event: 1
- Push event: 3
Last Year
- Create event: 1
- Commit comment event: 1
- Release event: 1
- Watch event: 1
- Issue comment event: 1
- Push event: 3
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| David Métivier | 4****e@u****m | 39 |
| CompatHelper Julia | c****y@j****g | 4 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 5
- Average time to close issues: less than a minute
- Average time to close pull requests: 2 months
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 4.0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 4
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JuliaTagBot (1)
Pull Request Authors
- github-actions[bot] (4)
- dmetivie (2)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 3 (julia)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
juliahub.com: RobustMeans
Implement some Robust Mean Estimators
- Documentation: https://docs.juliahub.com/General/RobustMeans/stable/
- License: MIT
- Latest release: 0.1.4 (published 9 months ago)
Rankings
Dependencies
- JuliaRegistries/TagBot v1 composite