NMFMerge

Merging components in nonnegative matrix factorization

https://github.com/holylab/nmfmerge.jl

Science Score: 72.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: ieee.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
    Organization holylab has institutional domain (holylab.wustl.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary

Keywords from Contributors

hack meshing standardization ode bridges pipeline-testing datacleaner data-profilers interpretability pinn
Last synced: 6 months ago · JSON representation ·

Repository

Merging components in nonnegative matrix factorization

Basic Info
  • Host: GitHub
  • Owner: HolyLab
  • License: mit
  • Language: Julia
  • Default Branch: main
  • Size: 1.38 MB
Statistics
  • Stars: 6
  • Watchers: 3
  • Forks: 0
  • Open Issues: 3
  • Releases: 2
Created almost 2 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

NMFMerge

Build Status Coverage

This package implements the technique in the paper An optimal pairwise merge algorithm improves the quality and consistency of nonnegative matrix factorization. It is used to project Non-negative matrix factorization(NMF) solutions from a high-dimensional space to lower dimensional space by optimally and sequentially merging NMF component pairs.

This approach is motivated by the idea that convergence of NMF becomes poor when one is forced to make difficult tradeoffs in describing different features of the data matrix; thus, performing an initial factorization with an excessive number of components grants the opportunity to escape such constraints and reliably describe the full behavior of the data matrix. Later, any redundant or noisy components are identified and merged together.

The concept of NMF-Merge in an illustrative example: Sample Figure

The data matrix is $\mathbf{X}=\mathbf{WH}+\mathbf{N}$, where $\mathbf{W}$ and $\mathbf{H}$ are rank-5 and $\mathbf{N}$ denotes the noise. The 'Good' and 'Bad' factorizations (blue box) represent two local minima reached by rank-5 NMF across 1000 different initializations, while the factorizations in the green boxes denote minima identified by NMF with $r\ge 6$. (These minima were found from every random initialization tested, but uniqueness is not required.) The higher-rank solutions can be merged to produce a rank 5 factorization using NMFMerge. Thus, by first identifying a higher-rank NMF, and then merging to lower rank, you can more reliably identify high-quality solutions. We demonstrate this experimentally in the linked manuscript.

Let's start with a simple demo:

Install the package: type ] at the julia> prompt to enter pkg> mode, and type julia pkg> add NMFMerge;

We'll use the following ground truth

math \begin{align} \begin{aligned} \mathbf{W} = \begin{pmatrix} 6 & 0 & 4 & 9 \\ 0 & 4 & 8 & 3 \\ 4 & 4 & 0 & 7 \\ 9 & 1 & 1 & 1 \\ 0 & 3 & 0 & 4 \\ 8 & 1 & 4 & 0 \\ 0 & 0 & 4 & 2 \\ 0 & 9 & 5 & 5 \end{pmatrix}, \quad \mathbf{H}^{\mathrm{T}} = \begin{pmatrix} 6 & 0 & 3 & 4 \\ 10 & 10 & 5 & 9 \\ 8 & 2 & 0 & 10 \\ 2 & 9 & 2 & 7 \\ 0 & 10 & 4 & 7 \\ 1 & 6 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 10 & 0 & 8 & 0 \end{pmatrix} \end{aligned} \end{align} julia using NMF, GsvdInitialization using NMFMerge

Packages: NMF, GsvdInitialization

julia julia> X = W*H 8×8 Matrix{Int64}: 84 161 138 83 79 6 12 92 36 107 38 73 93 24 0 64 52 143 110 93 89 28 8 40 61 114 84 36 21 15 18 98 16 66 46 55 58 18 0 0 60 110 66 33 26 14 16 112 20 38 20 22 30 0 0 32 35 160 68 126 145 54 0 40 Running NMF (HALS algorithm) on $\mathbf{X}$ with NNDSVD initialization

julia julia> f = svd(X); julia> result_hals = nnmf(float(X), 4; init=:nndsvd, alg=:cd, initdata=f, maxiter = 10^12, tol = 1e-4); julia> result_hals.objvalue/sum(abs2, X) 0.00019519131697246967

Running NMF Merge on $\mathbf{X}$ with NNDSVD initialization julia julia> result_renmf = nmfmerge(float(X), 5=>4; alg = :cd, maxiter = max_iter); julia> result_renmf.objvalue/sum(abs2, X); 0.00010318497977267333 The relative fitting error between NMF solution and ground truth of NMFMerge is about half that of standard NMF. Thus, NMFMerge helps NMF converge to a better local minimum.

The comparison between standard NMF(HALS) and NMFMerge: Sample Figure

Consistent with the conclusion from the comparision of ralative fitting error, the figure suggests that the results of NMFMerge(Brown) fits the ground truth(Green) better than standard NMF(Magenta). (At 44 points out of 64 points, NMFMerge results are closer to the ground truth.)


Functions

nmfmerge(X, ncomponents; tolfinal=1e-4, tolintermediate=sqrt(tol_final), W0=nothing, H0=nothing, kwargs...) This function performs "NMF-Merge" on 2D data matrix X.

Arguments:

ncomponents::Pair{Int,Int}: in the form of n1 => n2, merging from n1 components to n2components, where n1 is the number of components for overcomplete NMF, and n2 is the number of components for initial and final NMF.

Alternatively, ncomponents can be an integer denoting the final number of components. In this case, nmfmerge defaults to an approximate 20% component excess before merging.

Keyword arguments:

tol_final: The tolerence of final NMF, default: $10^{-4}$

tol_intermediate: The tolerence of initial and overcomplete NMF, default: $\sqrt{\mathrm{tol\_final}}$

W0: initialization of initial NMF, default: nothing

H0: initialization of initial NMF, default: nothing

If one of W0 and H0 is nothing, NNDSVD is used for initialization.

Other keywords arguments are passed to NMF.nnmf.


Suppose you have the NMF solution W and H with r componenents, colmerge2to1pq function can merge r components to ncomponents. The details of this function is:

colmerge2to1pq(W, H, n)

This function merges components in W and H (columns in W and rows in H) from original number of components to n components (ncolumns and rows left in W and H respectively).

To use this function: Wmerge, Hmerge, mergeseq = colmerge2to1pq(W, H, n), where Wmerge and Hmerge are the merged results with n components. mergeseq is the sequence of merge pair ids (id1, id2), which is the components id of single merge.


Before merging components, the columns in W are required to be normalized to 1. The normalization can be realized by colnormalize function or any other method you like.

colnormalize(W, H, p=2)

This function normalize ||W[:, i]||_p = 1 for i in 1:size(W, 2). Our manuscript uses p=2 throughout.

To use this function: Wnormalized, Hnormalized = colnormalize(W, H, p)


If you already have a merge sequence and want to merge from size(W, 2) components to n components, you can use the function:

mergecolumns(W, H, mergeseq; tracemerge)

keyword argurment tracemerge: save Wmerge and Hmerge at each merge stage if tracemerge=true. default tracemerge=false.

To use this function:

Wmerge, Hmerge, WHstage, Err = mergecolumns(W, H, mergeseq; tracemerge), where Wmerge and Hmerge are the merged results. WHstage::Vector{Tuple{Matrix, Matrix}} includes the results of each merge stage. WHstage=[] if tracemerge=false. Err::Vector includes merge penalty of each merge stage.

Citation

Thanks for citing this work! See the "Cite this repository" link in the "About" bar for format options.

Owner

  • Name: Holy Lab
  • Login: HolyLab
  • Kind: organization
  • Email: holy@wustl.edu
  • Location: Washington University in St. Louis

None

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  An optimal pairwise merge algorithm improves the quality
  and consistency of nonnegative matrix factorization
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Youdong
    family-names: Guo
    orcid: 'https://orcid.org/0009-0007-7787-3722'
  - given-names: Timothy E.
    family-names: Holy
    orcid: 'https://orcid.org/0000-0002-2429-1071'
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2408.09013'
    description: The ArXiv deposit of the encompassing paper
  - type: doi
    value: 10.48550/arXiv.2408.09013
repository-code: 'https://github.com/HolyLab/NMFMerge.jl'
abstract: >-
  Non-negative matrix factorization (NMF) is a key technique
  for feature extraction and widely used in source
  separation. However, existing algorithms may converge to
  poor local minima, or to one of several minima with
  similar objective value but differing feature
  parametrizations. Additionally, the performance of NMF
  greatly depends on the number of components, but choosing
  the optimal count remains a challenge. Here we show that
  some of these weaknesses may be mitigated by performing
  NMF in a higher-dimensional feature space and then
  iteratively combining components with an
  analytically-solvable pairwise merge strategy.
  Experimental results demonstrate our method helps NMF
  achieve better local optima and greater consistency of the
  solutions. Iterative merging also provides an efficient
  and informative framework for choosing the number of
  components. Surprisingly, despite these extra steps, our
  approach often improves computational performance by
  reducing the occurrence of ``convergence stalling'' near
  saddle points. This can be recommended as a preferred
  approach for most applications of NMF.

GitHub Events

Total
  • Create event: 6
  • Commit comment event: 5
  • Issues event: 2
  • Release event: 1
  • Watch event: 6
  • Delete event: 5
  • Issue comment event: 5
  • Push event: 17
  • Pull request review comment event: 5
  • Pull request review event: 9
  • Pull request event: 15
Last Year
  • Create event: 6
  • Commit comment event: 5
  • Issues event: 2
  • Release event: 1
  • Watch event: 6
  • Delete event: 5
  • Issue comment event: 5
  • Push event: 17
  • Pull request review comment event: 5
  • Pull request review event: 9
  • Pull request event: 15

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 32
  • Total Committers: 6
  • Avg Commits per committer: 5.333
  • Development Distribution Score (DDS): 0.625
Past Year
  • Commits: 19
  • Committers: 5
  • Avg Commits per committer: 3.8
  • Development Distribution Score (DDS): 0.579
Top Committers
Name Email Commits
youdongguo 4****o 12
youdongguo 1****7@q****m 7
Dae Woo Kim k****3@w****u 5
dependabot[bot] 4****] 3
Tim Holy t****y@g****m 3
github-actions[bot] 4****] 2
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 3
  • Total pull requests: 34
  • Average time to close issues: about 4 hours
  • Average time to close pull requests: 8 days
  • Total issue authors: 2
  • Total pull request authors: 5
  • Average comments per issue: 1.33
  • Average comments per pull request: 0.5
  • Merged pull requests: 25
  • Bot issues: 0
  • Bot pull requests: 11
Past Year
  • Issues: 1
  • Pull requests: 15
  • Average time to close issues: N/A
  • Average time to close pull requests: 16 days
  • Issue authors: 1
  • Pull request authors: 5
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.13
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 4
Top Authors
Issue Authors
  • timholy (2)
  • JuliaTagBot (1)
Pull Request Authors
  • youdongguo (19)
  • github-actions[bot] (7)
  • timholy (6)
  • dependabot[bot] (5)
  • kdw503 (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (5) github_actions (1)

Packages

  • Total packages: 1
  • Total downloads:
    • julia 1 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
juliahub.com: NMFMerge

Merging components in nonnegative matrix factorization

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1 Total
Rankings
Dependent repos count: 3.2%
Downloads: 3.8%
Average: 7.8%
Dependent packages count: 16.3%
Last synced: 7 months ago

Dependencies

.github/workflows/CI.yml actions
  • actions/checkout v4 composite
  • codecov/codecov-action v4 composite
  • julia-actions/cache v2 composite
  • julia-actions/julia-buildpkg v1 composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest v1 composite
  • julia-actions/setup-julia v2 composite
.github/workflows/CompatHelper.yml actions
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite