t-elf

Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of latent factors - rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.

https://github.com/lanl/t-elf

Science Score: 85.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Committers with academic emails
    6 of 8 committers (75.0%) from academic institutions
  • Institutional organization owner
    Organization lanl has institutional domain (www.lanl.gov)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.2%) to scientific vocabulary

Keywords

blind-source-separation dimensionality-reduction feature-extraction gpu high-performance-computing hpc latent-variables machine-learning matrix matrix-completion matrix-factorization non-negative-matrix-factorization pattern-extraction semi-supervised-learning tensor-decomposition tensor-factorization tensors text-preprocessing unsupervised-learning
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 20
  • Watchers: 4
  • Forks: 6
  • Open Issues: 10
  • Releases: 29
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Tensor Extraction of Latent Features (T-ELF)

[![Build Status](https://github.com/lanl/T-ELF/actions/workflows/ci_tests.yml/badge.svg?branch=main)](https://github.com/lanl/T-ELF/actions/workflows/ci_tests.yml/badge.svg?branch=main) [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg) [![Python Version](https://img.shields.io/badge/python-v3.11.10-blue)](https://img.shields.io/badge/python-v3.11.10-blue) [![DOI](https://zenodo.org/badge/703212457.svg)](https://zenodo.org/doi/10.5281/zenodo.10257896)

### [:information_source: Documentation](https://lanl.github.io/T-ELF/)   [:orange_book: Examples](examples/)   [:page_with_curl: Publications](https://smart-tensors.lanl.gov/publications/)   [:link: Website](https://smart-tensors.LANL.gov)

T-ELF is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL). T-ELF provides an array of customizable software solutions for analyzing datasets. As a comprehensive toolbox, T-ELF specializes in data pre-processing, extraction of latent features, and structuring results to facilitate informed decision-making. Leveraging high-performance computing and GPU architectures, the toolbox is optimized for analyzing large datasets from a diverse set of problems.

At the core of T-ELF are non-negative matrix and tensor factorization solutions for discovering multi-faceted hidden details in data, with automated model determination that estimates the number of latent factors (the rank). This functionality supports precise data modeling and the extraction of hidden patterns. In addition, the software suite includes modules for pre-processing and post-processing of data, covering tasks such as text mining and Natural Language Processing, along with tools for matrix and tensor analysis and construction.
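As a point of reference, one common way to state the non-negative matrix factorization problem is the Frobenius-norm formulation below. The symbols X, W, H, m, n, and k are notation assumed here rather than taken from this README, and individual T-ELF methods may optimize different objectives.

$$
\min_{W \ge 0,\; H \ge 0} \; \lVert X - W H \rVert_F^2,
\qquad X \in \mathbb{R}_{\ge 0}^{m \times n},\;
W \in \mathbb{R}_{\ge 0}^{m \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times n}
$$

In this notation, k is the number of latent factors (the rank) that automatic model determination aims to estimate.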

T-ELF has been applied across many disciplines as an AI and data analytics solution, including Large-scale Text Mining, High Performance Computing, Computer Security, Applied Mathematics, Dynamic Networks and Ranking, Biology, Material Science, Medicine, Chemistry, Data Compression, Climate Studies, Relational Databases, Data Privacy, Economy, and Agriculture.

Installation

Step 1: Install Poetry to your system

This step is optional. Use Pip or Conda if Poetry is not available.

Step 2: Install the Library

Option 1: Install via Poetry or Pip

```shell
conda create --name TELF python=3.11.10
source activate TELF # or <conda activate TELF>
poetry install # or <pip install .>
```

Option 2: Install via Conda

```shell
git clone https://gitlab.lanl.gov/maksim/telf_internal
cd telf_internal
conda env create --file environment_gpu.yml # use <conda env create --file environment_cpu.yml> for CPU only
conda activate TELF_conda
conda develop .
```

Step 3: Post-installation Dependencies

Next, we need to install the optional and additional dependencies. These include optional dependencies for GPU and HPC capabilities, as well as required dependencies like the SpaCy language models. To view all available options, please run:

```shell
python post_install.py --help
```

Install the additional dependencies:

```shell
python post_install.py # use the following, for example, for GPU system: <python post_install.py --gpu>
```
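After the installation steps, a quick sanity check can confirm that the packages are importable from the active environment. The sketch below is not part of T-ELF; it assumes the import name `TELF` (as used in the module headings such as TELF.factorization) and checks `spacy` because it appears in requirements.txt. The helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "TELF" is assumed to be the import name; "spacy" is listed in requirements.txt.
missing = missing_packages(["TELF", "spacy"])
print("missing packages:", missing if missing else "none")
```

An empty list suggests the environment is set up; any names printed point to a step that may need to be repeated.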

Jupyter Setup Tutorial for using the examples (Link)

Capabilities

### Please see our [:page_with_curl: Publications](https://smart-tensors.lanl.gov/publications/) for the capabilities

Modules

TELF.factorization

| Method | Dense | Sparse | GPU | CPU | Multiprocessing | HPC | Description | Example |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| NMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | NMF with Automatic Model Determination | Link |
| Custom NMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | Use Custom NMF Functions with NMFk | Link |
| TriNMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | NMF with Automatic Model Determination for Clusters and Patterns | Link |
| RESCALk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | RESCAL with Automatic Model Determination | Link |
| RNMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | Recommender NMFk | Link |
| SymNMFk | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | NMFk with Symmetric Clustering | Link |
| WNMFk | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | NMFk with weighting - used for recommendation system | Link |
| HNMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | Hierarchical NMFk | Link |
| BNMFk | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | Boolean NMFk | Link |
| LMF | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: | | | Logistic Matrix Factorization | Link |
| SPLIT | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | Joint NMFk factorization of multiple data via SPLIT | Link |
| SPLITTransfer | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | Supervised transfer learning method via SPLIT and NMFk | Link |
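To give a feel for what "automatic model determination" involves, the sketch below sweeps candidate ranks on a synthetic matrix and reports, for each rank, the relative reconstruction error and a crude stability score obtained from perturbed copies of the data. It is a simplified stand-in written with NumPy, not T-ELF's algorithm or API; the function names (`nmf`, `rank_sweep`), the noise level, and the stability measure are all assumptions made for illustration.

```python
import numpy as np


def nmf(X, k, n_iters=300, eps=1e-9, seed=0):
    """Frobenius-norm NMF via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def unit_columns(W):
    """Scale each column of W to (approximately) unit Euclidean norm."""
    return W / (np.linalg.norm(W, axis=0, keepdims=True) + 1e-12)


def rank_sweep(X, ks, n_perturbs=4, noise=0.02, seed=0):
    """For each k, report relative error and a crude stability score in [0, 1]."""
    rng = np.random.default_rng(seed)
    report = {}
    for k in ks:
        W, H = nmf(X, k, seed=seed)
        error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
        reference = unit_columns(W)
        scores = []
        for p in range(n_perturbs):
            # Multiplicative uniform noise keeps the perturbed matrix non-negative.
            Xp = X * rng.uniform(1 - noise, 1 + noise, size=X.shape)
            Wp = unit_columns(nmf(Xp, k, seed=seed + p + 1)[0])
            # Best cosine match for each reference column (no one-to-one assignment).
            scores.append((reference.T @ Wp).max(axis=1).mean())
        report[k] = {"error": float(error), "stability": float(np.mean(scores))}
    return report


rng = np.random.default_rng(1)
X = rng.random((40, 3)) @ rng.random((3, 25))  # synthetic data with true rank 3
for k, stats in rank_sweep(X, ks=range(1, 6)).items():
    print(k, round(stats["error"], 4), round(stats["stability"], 3))
```

A rank is a plausible candidate when the error is low and the factors stay similar under perturbation. Production methods typically use more careful matching and scoring than this best-match cosine heuristic.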

TELF.pre_processing

| Method | Multiprocessing | HPC | Description | Example |
|:---:|:---:|:---:|:---:|:---:|
| Vulture | :heavy_check_mark: | :heavy_check_mark: | Advanced text processing tool for cleaning and NLP | Link |
| Beaver | :heavy_check_mark: | :heavy_check_mark: | Fast matrix and tensor building tool for text mining | Link |
| iPenguin | :heavy_check_mark: | | Online information retrieval tool for Scopus, SemanticScholar, and OSTI | Link |
| Orca | :heavy_check_mark: | | Duplicate author detector for text mining and information retrieval | Link |
| Squirrel | | | Dataset pruning tool for documents | Link |
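The general idea of cleaning text and then building a matrix for factorization can be sketched with the standard library alone. The example below is illustrative and does not use the Vulture or Beaver APIs; the stop-word list, the function names `clean` and `build_matrix`, and the term-by-document layout are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "with"}


def clean(text):
    """Lowercase, keep alphabetic tokens only, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    cleaned = [clean(d) for d in documents]
    vocabulary = sorted({t for doc in cleaned for t in doc})
    counts = [Counter(doc) for doc in cleaned]
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


docs = [
    "Tensor factorization of the data.",
    "Matrix factorization with GPU, matrix tools!",
]
vocab, X = build_matrix(docs)
for term, row in zip(vocab, X):
    print(term, row)
```

The resulting non-negative count matrix is the kind of input a factorization method can consume.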

TELF.post_processing

| Method | Description | Example |
|:---:|:---:|:---:|
| Wolf | Graph centrality and ranking tool | Link |
| Peacock | Data visualization and generation of actionable statistics | Link |
| SeaLion | Generic report generation tool | Link |
| Fox | Report generation tool for text data from NMFk using OpenAI | Link |
| ArcticFox | Report generation tool for text data from HNMFk using local LLMs | Link |
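As one example of what graph centrality and ranking can mean, the sketch below computes PageRank by power iteration on a small directed graph. PageRank is chosen here only as a familiar illustration; this README does not state which measures Wolf implements, and the function name and parameters are assumptions.

```python
def pagerank(edges, damping=0.85, n_iters=100):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(n_iters):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Nodes without outgoing links spread their rank uniformly.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: A and B cite C, and C cites A.
scores = pagerank([("A", "C"), ("B", "C"), ("C", "A")])
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In this toy graph, C ranks highest because it receives links from both other nodes.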

TELF.applications

| Method | Description | Example |
|:---:|:---:|:---:|
| Cheetah | Fast search by keywords and phrases | Link |
| Bunny | Dataset generation tool for documents and their citations/references | Link |
| Penguin | Text storage tool | Link |
| Lynx | Streamlit UI | Link |
| Termite | Knowledge graph building tool | :soon: |
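Keyword search over a document collection is commonly built on an inverted index. The sketch below shows that idea with the standard library; it is not the Cheetah API, it matches whole words with AND semantics rather than exact phrases, and the names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


docs = [
    "Non-negative matrix factorization",
    "Tensor factorization on GPU",
    "Text mining",
]
index = build_index(docs)
print(search(index, "factorization"))
print(search(index, "tensor factorization"))
```

Building the index once makes each later query a set intersection instead of a scan over every document.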

Use Cases

| Example | Description | Link |
|:---:|:---:|:---:|
| NM Law Data | Domain-specific data for the AI and RAG system described in our paper about New Mexico law, which uses the TELF pipeline | Link |
| Full TELF Pipeline | An end-to-end pipeline demonstration, from collection to analysis | Link |

How to Cite T-ELF?

If you use T-ELF, please cite it.

APA:

```latex
Eren, M., Solovyev, N., Barron, R., Bhattarai, M., Truong, D., Boureima, I., Skau, E., Rasmussen, K., & Alexandrov, B. (2023). Tensor Extraction of Latent Features (T-ELF) [Computer software]. https://doi.org/10.5281/zenodo.10257897
```

BibTeX:

```latex
@software{TELF,
  author = {Eren, Maksim and Solovyev, Nick and Barron, Ryan and Bhattarai, Manish and Truong, Duc and Boureima, Ismael and Skau, Erik and Rasmussen, Kim and Alexandrov, Boian},
  month = oct,
  title = {{Tensor Extraction of Latent Features (T-ELF)}},
  url = {https://github.com/lanl/T-ELF},
  doi = {10.5281/zenodo.10257897},
  year = {2023}
}
```

Authors

  • Maksim Ekin Eren: Information Systems and Modeling Group, Los Alamos National Laboratory (Website)
  • Nicholas Solovyev: Theoretical Division, Los Alamos National Laboratory
  • Ryan Barron: Theoretical Division, Los Alamos National Laboratory
  • Manish Bhattarai: Theoretical Division, Los Alamos National Laboratory
  • Duc Truong: Theoretical Division, Los Alamos National Laboratory
  • Ismael Boureima: Theoretical Division, Los Alamos National Laboratory
  • Erik Skau: Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory
  • Kim Rasmussen: Theoretical Division, Los Alamos National Laboratory
  • Boian S. Alexandrov: Theoretical Division, Los Alamos National Laboratory

Patents

Boian ALEXANDROV, of Santa Fe, New Mexico, Maksim Ekin EREN, of Santa Fe, New Mexico, Manish BHATTARAI, of Albuquerque, New Mexico, Kim Orskov RASMUSSEN, of Santa Fe, New Mexico, and Charles K. NICHOLAS, of Columbia, Maryland, (“Assignor”) DATA IDENTIFICATION AND CLASSIFICATION METHOD, APPARATUS, AND SYSTEM. No. 63/472,188. Triad National Security, LLC. (June 9, 2023).

BS. Alexandrov, LB. Alexandrov, and VG. Stanev et al. 2020. Source identification by non-negative matrix factorization combined with semi-supervised clustering. US Patent S10,776,718 (2020).

Copyright Notice

© 2022. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

LANL C Number: C22048

License

This program is open source under the BSD-3 License. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Developer Test Suite

Developer test suites are located under the tests/ directory. Tests can be run from this folder using `python -m pytest *`.

LANL HPC Installation Notes

Chicoma

```shell
# replace <path> with your own path below.
conda create --prefix=<path> python=3.11.10
source activate <path> # or use conda activate <...>
pip install .
python post_install.py --gpu --hpc-conda
```

Darwin

```shell
salloc -n 1 -p shared-gpu
module load openmpi
module load miniconda3
conda create --name TELF python=3.11.10
conda activate TELF # or <source activate TELF>
pip install .
python post_install.py --gpu --hpc
```

Owner

  • Name: Los Alamos National Laboratory
  • Login: lanl
  • Kind: organization
  • Email: github-register@lanl.gov
  • Location: Los Alamos, New Mexico, USA

Citation (CITATION.cff)

version: 0.0.43
message: "If you use this software, please cite it as below."
authors:
  - family-names: Eren
    given-names: Maksim
  - family-names: Solovyev
    given-names: Nick
  - family-names: Barron
    given-names: Ryan
  - family-names: Bhattarai
    given-names: Manish
  - family-names: Truong
    given-names: Duc
  - family-names: Boureima
    given-names: Ismael
  - family-names: Skau
    given-names: Erik
  - family-names: Rasmussen
    given-names: Kim
  - family-names: Alexandrov
    given-names: Boian
title: "Tensor Extraction of Latent Features (T-ELF)"
version: 0.0.43
url: https://github.com/lanl/T-ELF
doi: 10.5281/zenodo.10257897
date-released: 2023-12-04

GitHub Events

Total
  • Create event: 15
  • Release event: 9
  • Issues event: 12
  • Watch event: 7
  • Delete event: 7
  • Member event: 1
  • Push event: 21
  • Pull request review event: 3
  • Pull request event: 30
  • Fork event: 3
Last Year
  • Create event: 15
  • Release event: 9
  • Issues event: 12
  • Watch event: 7
  • Delete event: 7
  • Member event: 1
  • Push event: 21
  • Pull request review event: 3
  • Pull request event: 30
  • Fork event: 3

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 276
  • Total Committers: 8
  • Avg Commits per committer: 34.5
  • Development Distribution Score (DDS): 0.457
Past Year
  • Commits: 62
  • Committers: 4
  • Avg Commits per committer: 15.5
  • Development Distribution Score (DDS): 0.29
Top Committers
| Name | Email | Commits |
|:---:|:---:|:---:|
| MaksimEkin | m****1@u****u | 150 |
| maksim | m****m@l****v | 45 |
| Nicholas Solovyev | n****s@.****v | 36 |
| Ryan Barron | 6****4 | 17 |
| Ryan Barron | b****n@l****v | 15 |
| Nick Solovyev | n****s@l****v | 9 |
| Nick Solovyev | 5****k | 3 |
| Maksim Ekin Eren | m****m@p****v | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 56
  • Total pull requests: 61
  • Average time to close issues: 4 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 3
  • Total pull request authors: 4
  • Average comments per issue: 0.18
  • Average comments per pull request: 0.11
  • Merged pull requests: 57
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 15
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 minutes
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MaksimEkin (67)
  • ryancb4 (19)
  • SoloNick (5)
  • osumpcheng (1)
Pull Request Authors
  • MaksimEkin (71)
  • ryancb4 (12)
  • SoloNick (6)
  • barronlanl (6)
Top Labels
Issue Labels
enhancement (26) bug (22) wontfix (4) documentation (3) testing (2) hot-fix (1)
Pull Request Labels
enhancement (29) bug (26) hot-fix (9) documentation (5)

Dependencies

.github/workflows/ci_tests.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2 composite
requirements.txt pypi
  • gputil *
  • h5py *
  • joblib *
  • mat73 *
  • matplotlib *
  • networkx *
  • nltk *
  • numpy *
  • pandas *
  • pathos *
  • psutil *
  • pytest *
  • scikit-learn *
  • scipy ==1.10.1
  • spacy ==3.7.2
  • sparse *
  • tqdm *
  • treelib *
setup.py pypi