eudirscms
Python3 Implementation of Euclidean and Directional Subspace Constrained Mean Shift (SCMS) Algorithm
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 2 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (10.6%) to scientific vocabulary
Repository
Python3 Implementation of Euclidean and Directional Subspace Constrained Mean Shift (SCMS) Algorithm
Basic Info
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Euclidean and Directional Subspace Constrained Mean Shift (SCMS) Algorithms
This repository implements both the classical SCMS algorithm (Ozertem and Erdogmus, 2011) with Euclidean data and our proposed SCMS algorithm under the directional data setting via Python3.
- Paper Reference: Y. Zhang and Y.-C. Chen (2023). Linear Convergence of the Subspace Constrained Mean Shift Algorithm: From Euclidean to Directional Data Information and Inference: A Journal of the IMA, 12(1), 210-311. [Arxiv Version]
Requirements
- Python >= 3.6 (earlier version might be applicable).
- NumPy, Matplotlib (especially the Basemap toolkit), pandas, seaborn, SciPy (A speical function scipy.special.iv is used to compute the modified Bessel function of the first kind of real order).
- (Optional) Ray (Fast and simple distributed computing API for Python and Java)
- We provide an guideline of installing the Basemap toolkit on Ubuntu.
Descriptions
Some high-level descriptions of our Python scripts are as follows:
- Density_Example.py: This script contains the code for plotting the contour lines and (principal/subspace constrained) gradient flows of the example function (Figure 2 in the paper).
- DirSCMS_fun.py: This script implements the functions of directional KDE and subspace constrained mean shift (SCMS) algorithm with the von Mises kernel.
- Drawback_Eu.py: This script contains code for comparing Euclidean KDE with directional KDE as well as comparing Euclidean subspace constrained mean shift (SCMS) with our proposed directional SCMS algorithm on simulated datasets in order to illustrate the drawbacks of Euclidean KDE and SCMS algorithm in handling directional data (Figures 9 and 10 in the paper).
- EarthquakeRidgesOrg.py: This script contains code for our applications of Euclidean and directional SCMS algorithms to the earthquake data (Figure 6 in the paper) and their comparisons with the real boundaries of tectonic plates. This script may take around half an hour to execute, depending on the actual computing platform.
- EarthquakeRidgesFast.py: This script contains code for our applications of Euclidean and directional SCMS algorithms to the earthquake data (Figure 6 in the paper) and their comparisons with the real boundaries of tectonic plates. Different from the corresponding script EarthquakeRidgesOrg.py, the two SCMS algorithms are parallelized by Ray. This script takes only several minutes to run on my laptop with 8 CPU cores.
- EuDirRidges.py: This script contains code for applying the Euclidean and directional subspace constrained mean shift (SCMS) algorithm to the simulated datasets (Figure 1 in the paper).
- LC_plots.py: This script contains code for empirically verifying the linear convergence of the Euclidean and directional SCMS algorithms (Figures 4 and 5 in the paper).
- MSSCMSRay.py: This script contains code for the parallel implementations of Euclidean/directional mean shift and SCMS algorithms.
- SCMSLogDensity_Comp.py: This script contains code for empirically demonstrating that the (Euclidean/directional) SCMS algorithms with the logarithm of the estimated densities are faster than their counterparts with the original densities (Figure 7 in the paper).
- SCMS_fun.py: This script contains code for Euclidean KDE and subspace constrained mean shift (SCMS) algorithm with Gaussian kernel.
- Utility_fun.py: This script contains all the utility functions for our experiments.
Euclidean Mean Shift and SCMS Algorithms
Given a random sample , the (Euclidean) kernel density estimator (KDE) is defined as
where
is a normalizing constant,
is the smoothing bandwidth parameter, and
is called the profile of a radially symmetric kernel. Some well-known choices of the profile function are
for the multivariate gaussian kernel,
for the Epanechnikov kernel, etc. The Euclidean mean shift algorithm has the following iteration formula:
with , and the Euclidean SCMS algorithm iterates the following equation:
with , where
has its columns equal to the orthonormal eigenvectors of
associated with its
smallest eigenvalues. Here,
is the intrinsic dimension of the estimated Euclidean density ridge.
Directional Mean Shift and SCMS Algorithms
While the above Euclidean SCMS algorithm has been widely used in various fields, it exhibits some salient drawbacks in dealing with directional data , where
; see Fig 1 below for the ridge-finding case and our paper for more details. Under the directional data scenario, the directional KDE is formulated as:
where is a directional kernel function,
is the smoothing bandwidth parameter,
is a normalizing constant to ensure that
is a probability density function. One famous example of the directional kernel function is the so-called von Mises kernel
. Then, the directional mean shift algorithm has a fixed-point iteration formula:
with , where
is the total gradient of the directional KDE computed in the ambient Euclidean space
; see Zhang and Chen (2020) for its detailed derivation and https://github.com/zhangyk8/DirMS for its Python3 implementation. Our proposed directional SCMS algorithm is built upon the directional mean shift formula and iterates the following procedure:
with , where
has its columns equal to the orthonormal eigenvectors of the (estimated) Riemannian Hessian
associated with its
smallest eigenvalues with the tangent space
. Here,
is the intrinsic dimension of the estimated directional density ridge as a submanifold of the unit hypersphere
.
The implementations of Euclidean and directional SCMS algorithms are encapsulated into two Python function called SCMS_KDE in the script SCMS_fun.py and SCMS_DirKDE in the script DirSCMS_fun.py, respectively. However, in our applications of these two algorithms, we use their log-density versions SCMS_Log_KDE and SCMS_Log_DirKDE in the corresponding scripts. The input arguments for the functions SCMS_Log_KDE and SCMS_Log_DirKDE are essentially the same; thus, we combine the descriptions of their arguments as follows:
def SCMS_Log_KDE(mesh_0, data, d=1, h=None, eps=1e-7, max_iter=1000, stop_cri='proj_grad')
def SCMS_Log_DirKDE(mesh_0, data, d=1, h=None, eps=1e-7, max_iter=1000, stop_cri='proj_grad')
- Parameters:
- mesh0: a (m,D)-array
---- The Euclidean coordinates of m directional initial points in the D-dimensional Euclidean space.
- data: a (n,D)-array
---- The Euclidean coordinates of n directional data sample points in the D-dimensional Euclidean space.
- d: int
---- The order of the density ridge. (Default: d=1.)
- h: float
---- The bandwidth parameter. (Default: h=None. Then a rule of thumb for directional KDEs with the von Mises kernel in Garcia-Portugues (2013) is applied.)
- eps: float
---- The precision parameter. (Default: eps=1e-7.)
- maxiter: int
---- The maximum number of iterations for the directional SCMS algorithm on each initial point. (Default: maxiter=1000.)
- stopcri: string ('projgrad'/'ptsdiff')
---- The indicator of which stopping criteria that will be used to terminate the SCMS algorithm. (When stopcri='ptsdiff', the errors between two consecutive iteration points need to be smaller than 'eps' for terminating the algorithm. When stopcri='projgrad' or others, the projected/principal (Riemannian) gradient of the current point need to be smaller than 'eps' for terminating the algorithm.) (Default: stopcri='projgrad'.)
- Return:
- SCMS_path: (m,D,T)-array ---- The entire iterative SCMS sequence for each initial point.
Example code: ```bash import matplotlib.pyplot as plt import numpy as np from mpltoolkits.basemap import Basemap from Utilityfun import cart2sph, CirSphsamp from SCMSfun import SCMSLogKDE from DirSCMSfun import SCMSLogDirKDE
np.random.seed(111) ## Set an arbitrary seed for reproducibility
Sampling the points on a circle that crosses through the north and south poles
cirsamp = CirSphsamp(1000, latc=0, sigma=0.2, pvax=np.array([1,0,0])) lonc, latc, r = cart2sph(*cirsamp.T) cirsampang = np.concatenate((lonc.reshape(len(lonc),1), latc.reshape(len(latc),1)), axis=1) bwDir = None bwEu = None
Apply the directional and Euclidean SCMS algorithms
SCMSDirlog2 = SCMSLogDirKDE(cirsamp, cirsamp, d=1, h=bwDir, eps=1e-7, maxiter=5000) Dirridgelog2 = SCMSDirlog2[:,:,SCMSDirlog2.shape[2]-1]
SCMSEulog2 = SCMSLogKDE(cirsampang, cirsampang, d=1, h=bwEu, eps=1e-7, maxiter=5000) Euridgelog2 = SCMSEulog2[:,:,SCMSEulog2.shape[2]-1]
fig = plt.figure(figsize=(14,8)) lont = np.concatenate([90np.ones(50,), -90np.ones(50,)]) latt = np.concatenate([np.linspace(-90, 90, 50), np.linspace(90, -90, 50)]) lonc, latc, r = cart2sph(cirsamp.T) lonrEu = Euridgelog2[:,0] latrEu = Euridgelog2[:,1] lonrDir, latr_Dir, r = cart2sph(Dirridgelog2.T)
Set up map projection
m1 = Basemap(projection='hammer', llcrnrlon=-180, urcrnrlon=180, llcrnrlat=-90, urcrnrlat=90, resolution='c', lon_0=0)
Draw lat/lon grid lines every 30 degrees.
m1.drawmeridians(np.arange(-180, 180, 30)) m1.drawparallels(np.arange(-90, 90, 30))
Compute native map projection coordinates of lat/lon grid.
x, y = m1(lonc, latc) xt, yt = m1(lont, latt) xEu, yEu = m1(lonrEu, latrEu) xDir, yDir = m1(lonrDir, latrDir)
Scatter plots over the map.
cs = m1.scatter(x, y, facecolors='none', edgecolors='black', s=20) cs = m1.plot(xt, yt, color='blue', linewidth=4, alpha=0.5) cs = m1.scatter(xEu, yEu, color='darkgreen', s=25, alpha=1) cs = m1.scatter(xDir, yDir, color='red', s=35, alpha=0.7) fig.savefig('./Figures/Output.png') ```
Fig 1. An illustration of Euclidean and directional SCMS algorithms applied to a simulated dataset with an underlying circular structure on the sphere. (Here, The red points represent the estimated directional ridge identified by our directional SCMS algorithm. The green points indicate the estimated ridge obtained by the Euclidean SCMS algorithm. And the blue curve exhibits the true circular structure.)
Additional References
- U. Ozertem and D. Erdogmus (2011). Locally Defined Principal Curves and Surfaces. Journal of Machine Learning Research 12 1249-1286.
- Y. Zhang and Y.-C. Chen (2021). Kernel Smoothing, Mean Shift, and Their Learning Theory with Directional Data. Journal of Machine Learning Research 22(154), 1-92.
- E. Garcı́a-Portugués (2013). Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electronic Journal of Statistics 7 1655–1685.
- Y.-C. Chen, C. Genovese, and L. Wasserman (2016). A comprehensive approach to mode clustering. Electronic Journal of Statistics 10(1) 210-241.
Owner
- Name: Yikun Zhang
- Login: zhangyk8
- Kind: user
- Location: Guangzhou, China / Seattle, USA
- Company: University of Washington, Seattle
- Repositories: 4
- Profile: https://github.com/zhangyk8
Citation (citation.bib)
@article{zhang2023linear,
title={Linear convergence of the subspace constrained mean shift algorithm: from Euclidean to directional data},
author={Zhang, Yikun and Chen, Yen-Chi},
journal={Information and Inference: A Journal of the IMA},
volume={12},
number={1},
pages={210--311},
year={2023},
publisher={Oxford University Press}
}
GitHub Events
Total
- Push event: 5
Last Year
- Push event: 5