Recent Releases of t-elf

t-elf - v0.0.43

v0.0.43 adds Ocelot, a Cheetah sub-module for filtering documents using n-grams; updates AutoBunny to support simpler block-style usage within pipelines; introduces a block for summarizing term matches when applying Ocelot filtering; and includes accompanying example notebooks.
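The notes do not describe how Ocelot matches n-grams, so the following is only a minimal sketch of n-gram document filtering with a per-term match summary. `ngrams` and `ngram_filter` are hypothetical names, not Ocelot's API, and the lowercase whitespace tokenization is an assumption.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return the contiguous word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_filter(docs: list[str], terms: list[str]) -> tuple[list[str], Counter]:
    """Keep documents containing at least one search term and tally matches per term.

    Hypothetical helper; tokenization is a lowercase whitespace split (punctuation is kept).
    """
    wanted = {t.lower() for t in terms}
    sizes = {len(t.split()) for t in wanted}  # n-gram lengths needed to cover every term
    kept: list[str] = []
    summary: Counter = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        grams = Counter(g for n in sizes for g in ngrams(tokens, n))
        hits = {t: grams[t] for t in wanted if grams[t]}
        if hits:
            kept.append(doc)
            summary.update(hits)  # running count of matches per term
    return kept, summary
```

The returned `Counter` plays the role of a term-match summary; a real pipeline block would likely report more detail per document.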

- Python
Published by barronlanl 6 months ago

t-elf - v0.0.42

Version 0.0.42 introduces a new block-based pipeline framework that allows users to design and execute custom data flows across TELF modules, enabling greater flexibility and experimentation with pipeline configurations. This release also includes Lynx, a lightweight Streamlit-based frontend for visualizing post-processing results, offering a way to explore outcomes from scientific literature analysis and link prediction.
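The notes do not show the block framework's interface. As a rough illustration of the idea only, here is a generic sketch in which each block reads named inputs from a shared context and writes one named output; `Block` and `run_pipeline` are hypothetical and are not the TELF pipeline API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Block:
    """Hypothetical block: reads `needs` keys from a shared context and writes `provides`."""
    name: str
    fn: Callable[..., Any]
    needs: tuple[str, ...]
    provides: str

    def run(self, context: dict[str, Any]) -> None:
        missing = [k for k in self.needs if k not in context]
        if missing:
            raise KeyError(f"{self.name} is missing inputs: {missing}")
        context[self.provides] = self.fn(*(context[k] for k in self.needs))


def run_pipeline(blocks: list[Block], context: dict[str, Any]) -> dict[str, Any]:
    """Execute blocks in order, threading one shared context through them."""
    for block in blocks:
        block.run(context)
    return context
```

Declaring inputs and outputs by name lets blocks be reordered or swapped, and a missing dependency fails early with a clear error.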

- Python
Published by barronlanl 6 months ago

t-elf - v0.0.41

New Pre-Processing Module: Squirrel 🚀

Version 0.0.41 introduces a new pre-processing module called Squirrel, designed for automated document pruning. Squirrel streamlines the process of accepting or rejecting documents by applying predefined rules and thresholds, eliminating the need for manual review. Squirrel supports multiple pruning strategies; this release includes both embedding-based pruning and LLM-based pruning:

  • Embedding-Based Pruning: This method filters documents based on their distance from a reference centroid in embedding space. Only documents within a specified threshold are retained, ensuring higher data quality.
  • LLM-Based Pruning: Squirrel leverages large language models to further refine the pruning process. It conducts multiple voting trials using LLM evaluations to determine whether a document should be accepted or rejected.
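The two strategies can be sketched roughly as follows. The distance metric (Euclidean), the strict-majority rule, and the `judge` callable standing in for an LLM call are all assumptions. `centroid`, `embedding_prune`, and `llm_vote` are hypothetical names, not Squirrel's API.

```python
import math
from typing import Callable, Optional, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def embedding_prune(
    vectors: Sequence[Sequence[float]],
    threshold: float,
    reference: Optional[Sequence[float]] = None,
) -> list[int]:
    """Indices of documents whose Euclidean distance to the reference point is <= threshold."""
    ref = centroid(vectors) if reference is None else reference
    return [i for i, v in enumerate(vectors) if math.dist(v, ref) <= threshold]


def llm_vote(document: str, judge: Callable[[str], bool], trials: int = 5) -> bool:
    """Accept a document when a strict majority of `trials` judge calls accept it.

    `judge` is a hypothetical stand-in for one LLM evaluation returning accept/reject.
    """
    accepts = sum(1 for _ in range(trials) if judge(document))
    return 2 * accepts > trials
```

Repeating the judge call and taking a majority reduces the effect of a single noisy LLM answer, at the cost of more calls per document.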

- Python
Published by MaksimEkin 10 months ago

t-elf - v0.0.40

🔧 Vulture Enhancements

  • Expanded Standard Cleaning Functions:
    Added the following to Vulture’s standard text cleaning pipeline:
    • remove_numbers: Removes stand-alone numbers.
    • remove_alphanumeric: Removes mixed alphanumeric terms (e.g., abc123).
    • remove_roman_numerals: Removes Roman numeral listings.
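The three cleaning rules can be approximated with regular expressions. These are illustrative approximations, not Vulture's implementation: the `strip_*` names are hypothetical, and treating a "Roman numeral listing" as a numeral followed by `.` or `)` is an assumption.

```python
import re

# Stand-alone integers or decimals; "abc123" is untouched because no word boundary precedes "1".
NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
# Whole tokens that contain at least one letter and at least one digit, e.g. "abc123".
ALNUM = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")
# Roman numerals used as list markers, e.g. "i.", "ii)", "IV." (case-insensitive).
ROMAN_ITEM = re.compile(
    r"(?<!\S)(?=[ivxlcdm])m{0,3}(?:cm|cd|d?c{0,3})(?:xc|xl|l?x{0,3})(?:ix|iv|v?i{0,3})[.)](?=\s|$)",
    re.IGNORECASE,
)


def _squash(text: str) -> str:
    """Collapse the whitespace left behind by removals."""
    return " ".join(text.split())


def strip_numbers(text: str) -> str:
    return _squash(NUMBER.sub(" ", text))


def strip_alphanumeric(text: str) -> str:
    return _squash(ALNUM.sub(" ", text))


def strip_roman_numerals(text: str) -> str:
    return _squash(ROMAN_ITEM.sub(" ", text))
```

Regex rules like these have edge cases. For example, an ordinary word that happens to be a valid numeral, such as "mix.", would also be removed by the Roman-numeral pattern.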

🐆 Cheetah Additions

  • term_generator:

    • Extracts top keywords from cleaned text using TF-IDF.
    • Pairs keywords with nearby support terms based on a co-occurrence matrix.
    • Saves output as a structured markdown file of search terms.
  • CheetahTermFormatter:

    • Parses markdown search term files into structured blocks with optional filters (e.g., positives, negatives).
    • Supports plain string output or category-based filtering.
    • Can generate substitution maps to convert multi-word phrases into underscored versions and back.
  • convert_txt_to_cheetah_markdown:

    • Converts plain .txt files or structured term dictionaries into Cheetah-compatible markdown format.
    • Facilitates easier programmatic creation and editing of search term files.
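Two of the ideas above are sketched below under stated assumptions: TF-IDF keyword extraction with co-occurring support terms, and phrase/underscore substitution maps. The code assumes summed TF-IDF with idf = log(n/df), co-occurrence counted per document rather than in a sliding window, and a simple `## keyword` / `- support` markdown layout. `tfidf_scores`, `generate_terms`, `substitution_maps`, and `apply_map` are hypothetical names; this is not Cheetah's `term_generator` or `CheetahTermFormatter`.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tfidf_scores(docs: list[list[str]]) -> Counter:
    """Sum each term's TF-IDF weight (tf = count/len, idf = log(n/df)) over all documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores: Counter = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            scores[term] += (count / len(doc)) * math.log(n / df[term])
    return scores


def generate_terms(docs: list[list[str]], top_n: int = 2, support_n: int = 2) -> str:
    """Markdown list of the top TF-IDF keywords, each followed by its most co-occurring terms."""
    scores = tfidf_scores(docs)
    keywords = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    pairs: Counter = Counter()  # document-level co-occurrence counts of unordered term pairs
    for doc in docs:
        pairs.update(combinations(sorted(set(doc)), 2))
    lines = []
    for kw in keywords:
        support = Counter({(b if a == kw else a): c for (a, b), c in pairs.items() if kw in (a, b)})
        lines.append(f"## {kw}")
        lines.extend(f"- {t}" for t in sorted(support, key=lambda t: (-support[t], t))[:support_n])
    return "\n".join(lines)


def substitution_maps(phrases: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Map multi-word phrases to underscored tokens, plus the inverse map."""
    forward = {p: p.replace(" ", "_") for p in phrases if " " in p}
    return forward, {v: k for k, v in forward.items()}


def apply_map(text: str, mapping: dict[str, str]) -> str:
    """Replace whole-word matches, longest keys first so longer phrases take priority."""
    for key in sorted(mapping, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(key)}\b", lambda _m, r=mapping[key]: r, text)
    return text
```

Replacing longer phrases first prevents "neural network" from splitting "neural network model" before the longer phrase is protected.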

🧹 Refactoring and Fixes

  • Code Refactoring:

    • Consolidated several duplicated functions across modules into shared helper utilities at a higher level.
  • Bug Fixes:

    • Vulture:
      • Fixed path-saving logic in operator pipelines.
      • Fixed bugs in the NER and Vocabulary Consolidator operators.
    • Beaver:
      • Resolved a file-saving issue that also affected Wolf’s visualization routines.
    • Fixed the README under examples/ so that it uses the correct module links.
  • .gitignore Updates:

    • Added more output files and example notebook directories to .gitignore.

📁 New Example: NM Law Data Pipeline

Added the NM Law Data/ folder, containing the data processing pipeline used in the paper:
“Legal Document Analysis with HNMFk” (arXiv:2502.20364)

  • 00_data_collection/:
    Scrapes and formats legal documents (statutes, constitution, court cases) from Justia.

  • 01_hnmfk_operation/:
    Constructs document-word matrices and runs Hierarchical Nonnegative Matrix Factorization (HNMFk).

  • 02_benchmarking/:
    Evaluates LLM-generated content using factual accuracy, entailment, and summarization metrics.

  • 03_visualizations/:
    Visualizes legal trends, knowledge graphs, and model evaluation results.

- Python
Published by MaksimEkin 10 months ago

t-elf - v0.0.39

🚀 New Features

New Modules Added

  • Fox: Report generation tool for text data from NMFk using OpenAI
  • ArcticFox: Report generation tool for text data from HNMFk using local LLMs
  • SPLIT: Joint NMFk factorization of multiple data via SPLIT
  • SPLITTransfer: Supervised transfer learning method via SPLIT and NMFk

Beaver Enhancements

  • Added support for automatically creating the directory specified by save_path when saving objects.

🐛 Bug Fixes

Beaver: Highlighting & Vocabulary Logic

  • Fixed an issue where tokens used in highlighting but not present in the provided vocabulary would fail to trigger re-vectorization.
  • The logic now ensures that the vocabulary is properly expanded and documents are re-vectorized accordingly.

Beaver: Trailing Newline in Output Files

  • Fixed a bug where an extra newline was added at the end of output text files such as Vocabulary.txt.

HNMFk: Model Loading Path

  • Fixed a bug with incorrect handling of the model path and name when loading an existing model.

Vulture: Module Imports

  • Resolved inconsistent module imports.

Conda Installation

  • Fixed .yml files by adding missing dependencies for proper conda installation.

- Python
Published by MaksimEkin 11 months ago

t-elf - v0.0.38

New Modules

Adds Penguin, Bunny, Peacock, and SeaLion modules:

  • Penguin: Text storage tool.
  • Bunny: Dataset generation tool for documents and their citations/references.
  • Peacock: Data visualization and generation of actionable statistics.
  • SeaLion: Generic report generation tool.

Bugs

  • Fixes a query index issue in Cheetah.

- Python
Published by MaksimEkin 11 months ago

t-elf - v0.0.37

Adds three new modules for pre-processing text (Orca and iPenguin) and post-processing text (Wolf):

  • Wolf: Graph centrality and ranking tool.
  • iPenguin: Online information retrieval tool for Scopus, SemanticScholar, and OSTI.
  • Orca: Duplicate author detector for text mining and information retrieval.

- Python
Published by MaksimEkin 11 months ago

t-elf - v0.0.36

HNMFk graph post-processing & root node naming

  • Added the ability to post-process HNMFk graphs based on the number of documents in leaf nodes.
  • New functions:
    • model.traverse_tiny_leaf_topics(threshold: int): Identifies outlier clusters where the number of documents is below the given threshold.
    • model.get_tiny_leaf_topics(): Retrieves tiny leaf nodes (processed separately).
    • model.process_tiny_leaf_topics(threshold: int): Processes the graph to separate tiny nodes based on the given threshold.
    • Resetting the graph by setting threshold=None restores the tiny nodes.

  • Added option to specify a root node name in HNMFk using root_node_name="Root".
    • Default is now "Root" instead of "*" to resolve Windows compatibility issues.
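Finding tiny leaf topics amounts to a traversal that collects leaves whose document count falls below a threshold. The following is a minimal sketch over a hypothetical `Node` tree; it does not use HNMFk's graph classes. Returning nothing for `threshold=None` loosely mirrors the reset behaviour described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an HNMFk graph node."""
    name: str
    num_docs: int
    children: list["Node"] = field(default_factory=list)


def tiny_leaves(root: Node, threshold: Optional[int]) -> list[str]:
    """Depth-first (left to right) names of leaf nodes with fewer than `threshold` documents."""
    if threshold is None:
        return []  # no threshold: nothing is treated as tiny
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(reversed(node.children))  # reversed so the leftmost child is popped first
        elif node.num_docs < threshold:
            found.append(node.name)
    return found
```

Only leaves are tested. An internal node with few documents is not reported, because its children carry the finer split.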

Bugs

  • Fixed a bug in Beaver where mismatched indexes caused incorrect highlighting.

- Python
Published by MaksimEkin 12 months ago

t-elf - v0.0.35

  • Fixes a bug in Cheetah related to setting the default to an empty string.
  • Adds Logistic Matrix Factorization (LMF).
  • Adds developer script to change versioning automatically.
  • Updates documentation.

- Python
Published by MaksimEkin about 1 year ago

t-elf - v0.0.34

Fast-tracking to v0.0.34 from v0.0.20

Enhancements

Pruning Support:

  • Enabled pruning in bnmf, wnmf, and nmf_recommender.
  • Added pruning of additional matrices, e.g., MASK, based on X.
  • Included pruned_cols and pruned_rows in saved outputs.

Matrix Factorization:

  • Introduced new submodule BNMFk under NMFk with nmf_method='bnmf'.
  • Added WEIGHT and MASK keys for WNMFk and BNMFk.
  • Implemented matrix deletion in subroutines to reduce memory consumption.
  • Added factor_thresholding parameter to perform thresholding over NMFk factors, making them boolean. Options include:
    • coord_desc_thresh
    • WH_thresh
  • Introduced factor_thresholding_obj_params for configuring thresholding subroutines.
  • Added clustering_method parameter with options:
    • kmeans
    • bool or boolean (both are equivalent).
  • Introduced clustering_obj_params to configure clustering subroutines.
  • Added new perturbation type for boolean matrices: perturb_type='boolean' or perturb_type='bool'.
  • Updated examples to reflect new boolean-specific features.
  • Improved path compatibility by using os.path.join.

Thresholding and Clustering:

  • Added factor_thresholding_H_regression with options:
    • otsu_thresh
    • coord_desc_thresh
    • kmeans_thresh
  • Default factor_thresholding_H_regression set to kmeans_thresh.
  • Default factor_thresholding set to otsu_thresh.
  • Introduced factor_thresholding_H_regression_obj_params to configure parameters.
  • Added K-means-based boolean thresholding for W and H matrices:
    • Clusters the values in each row of W and H into two groups and sets the boolean threshold at the midpoint of the two cluster centroids.
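The K-means thresholding described above can be sketched as a one-dimensional two-means clustering per row. Initializing the centroids at the row minimum and maximum and treating values strictly above the midpoint as `True` are assumptions. `two_means_threshold` and `booleanize_rows` are hypothetical names, not T-ELF functions.

```python
def two_means_threshold(values: list[float], iters: int = 100) -> float:
    """1-D k-means with k=2; returns the midpoint between the two final centroids."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # constant row: no split possible
    c1, c2 = lo, hi  # start centroids at the extremes so both groups are non-empty
    for _ in range(iters):
        mid = (c1 + c2) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        new1, new2 = sum(low) / len(low), sum(high) / len(high)
        if (new1, new2) == (c1, c2):
            break  # assignments stopped changing
        c1, c2 = new1, new2
    return (c1 + c2) / 2


def booleanize_rows(matrix: list[list[float]]) -> list[list[bool]]:
    """Threshold each row of a factor matrix at its own two-means midpoint."""
    result = []
    for row in matrix:
        t = two_means_threshold(row)
        result.append([v > t for v in row])
    return result
```

Because each row gets its own threshold, rows with very different scales are binarized independently.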

Hardware and Device Management:

  • Added device parameter to NMFk for GPU management:
    • device=-1: Use all GPUs.
    • device=0: Use the GPU with ID 0.
    • device=[0,1,...]: Use a specific list of GPUs.
    • Negative values other than -1: Use (number of GPUs + device + 1).
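The device rules above can be read as a small resolution function. Interpreting "number of GPUs + device + 1" as a count of GPUs, starting at ID 0, is an assumption. It is consistent with `device=-1` meaning all GPUs. `resolve_devices` is a hypothetical helper, not NMFk's code.

```python
from typing import Sequence, Union


def resolve_devices(device: Union[int, Sequence[int]], num_gpus: int) -> list[int]:
    """Turn an NMFk-style `device` value into a list of GPU IDs (hypothetical interpretation)."""
    if not isinstance(device, int):
        return list(device)  # explicit list of GPU IDs
    if device == -1:
        return list(range(num_gpus))  # all GPUs
    if device < -1:
        count = num_gpus + device + 1  # e.g. -2 with 4 GPUs -> use 3 GPUs
        if count < 1:
            raise ValueError(f"device={device} leaves no GPUs out of {num_gpus}")
        return list(range(count))
    if device >= num_gpus:
        raise ValueError(f"GPU {device} does not exist (only {num_gpus} available)")
    return [device]  # a single GPU ID
```

Under this reading, `-1` gives all GPUs (count equals `num_gpus`), and each further step down leaves out one more GPU.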

Hierarchical NMFk (HNMFk) Improvements:

  • Added new variables for nodes:
    • parent_node_factors_path
    • parent_node_k
    • factors_path
  • Enabled dynamic renaming of paths when loading HNMFk models from different directories.
  • Improved decomposition behavior:
    • Nodes with fewer samples than the sample threshold no longer decompose unnecessarily.
  • Added signature, centroid, and probabilities from parent nodes to child nodes.
  • Introduced graph iterator methods for navigating to specific nodes by name.
  • Updated node naming conventions to use ancestor-based indexing.

Result Storage:

  • Added W_all to saved outputs of NMFk.

Installation and Documentation

  • Migrated to a new installation system using pip and Poetry.
  • Added a post-installation script for simplifying setup on different systems.
  • Updated documentation for:
    • New installation methods on Chicoma and Darwin.

Bug Fixes

  • Corrected HNMFk behavior to return total data indices instead of indices of indices.
  • Corrected naming inconsistencies in pruning variables in NMFk.
  • Fixed error calculation to consider only known locations when masking is applied.
  • Resolved GPU transfer conflicts when using MASK.
  • Fixed default device parameter in NMFk to be -1 (use all devices).
  • Addressed issues in WNMFk and BNMFk examples.
  • Fixed checkpointing bugs:
    • Made saving checkpoints true by default.
    • Resolved issues when loading an HNMFk model during an ongoing process.
  • Fixed scalar addition error with sparse matrices in kl_mu.
  • Resolved dependency conflicts with numpy and numba.
  • Updated HPC documentation for T-ELF installation.
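The fix that restricts error calculation to known locations can be illustrated with a masked relative Frobenius error. The choice of relative Frobenius error, and the convention that a nonzero mask entry marks a known value, are assumptions. `masked_relative_error` is a hypothetical name.

```python
import math


def masked_relative_error(X, X_hat, mask) -> float:
    """||M * (X - X_hat)||_F / ||M * X||_F, summing only over known entries (mask != 0)."""
    num = den = 0.0
    for row_x, row_h, row_m in zip(X, X_hat, mask):
        for x, xh, m in zip(row_x, row_h, row_m):
            if m:  # unknown entries contribute nothing to either norm
                num += (x - xh) ** 2
                den += x ** 2
    if den == 0:
        raise ValueError("no non-zero known entries to normalize by")
    return math.sqrt(num / den)
```

Excluding masked entries keeps the reconstruction from being penalized at positions whose true values are unknown.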

- Python
Published by MaksimEkin about 1 year ago

t-elf - v0.0.20

Fixes a bug in HNMFk where the original indices were incorrect.

- Python
Published by MaksimEkin over 1 year ago

t-elf - v0.0.19

  • Fixes an HNMFk checkpointing bug where, when continuing from a checkpoint on an HPC system, not all nodes would become free on the job queue.
  • Fixes a bug with BST post-order search where the order was incorrect.
  • Adds BST in-order search capability. The NMFk hyper-parameter is changed accordingly:

k_search_method : str, optional Which approach to use when searching for the rank or k. The default is "linear".

* ``k_search_method='linear'`` will linearly visit each K given in ``Ks`` hyper-parameter of the ``fit()`` function.
* ``k_search_method='bst_post'`` will perform post-order binary search. When an ideal rank is found, determined by the selected ``predict_k_method``, all lower ranks are pruned from the search space.
* ``k_search_method='bst_pre'`` will perform pre-order binary search. When an ideal rank is found, determined by the selected ``predict_k_method``, all lower ranks are pruned from the search space.
* ``k_search_method='bst_in'`` will perform in-order binary search. When an ideal rank is found, determined by the selected ``predict_k_method``, all lower ranks are pruned from the search space.
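A minimal sketch of the three traversal orders over a sorted `Ks` list, treated as an implicit balanced BST (the middle element is the root), together with the rule that lower ranks are pruned once an ideal rank is found. The midpoint-rooted tree and the `is_ideal` predicate (standing in for `predict_k_method`) are assumptions. `bst_order` and `bst_search` are hypothetical helpers, and NMFk's actual visiting order may differ.

```python
from typing import Callable, Iterable, Optional


def bst_order(ks: Iterable[int], order: str = "pre") -> list[int]:
    """Visit a sorted Ks list as an implicit balanced BST in pre-, in-, or post-order."""
    ks = sorted(ks)

    def walk(lo: int, hi: int) -> list[int]:
        if lo > hi:
            return []
        mid = (lo + hi) // 2
        left, right = walk(lo, mid - 1), walk(mid + 1, hi)
        if order == "pre":
            return [ks[mid]] + left + right
        if order == "in":
            return left + [ks[mid]] + right
        if order == "post":
            return left + right + [ks[mid]]
        raise ValueError(f"unknown order: {order}")

    return walk(0, len(ks) - 1)


def bst_search(ks: Iterable[int], is_ideal: Callable[[int], bool], order: str = "pre") -> tuple[list[int], Optional[int]]:
    """Evaluate ranks in BST order, skipping ranks below the largest ideal rank found so far."""
    best: Optional[int] = None
    visited = []
    for k in bst_order(ks, order):
        if best is not None and k < best:
            continue  # pruned: a higher rank was already found ideal
        visited.append(k)
        if is_ideal(k):
            best = k if best is None else max(best, k)
    return visited, best
```

With Ks = 1..7 and ranks up to 5 being ideal, the pre-order search evaluates only 4, 6, 5, and 7 instead of all seven ranks.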

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.18

  • Fixes a bug where Ks were not organized correctly for BST post and pre order.
  • Fixes a bug in H_sill_thresh so that the threshold can now also be set to negative values.
  • Adds the option to use the W silhouette, the H silhouette, or both for k prediction. The selected predict_k_method also changes how the BST search is performed with k_search_method. The NMFk hyper-parameters below are modified accordingly:

predict_k_method : str, optional Method to use when performing automatic k prediction. Default is "WH_sill".

```python
predict_k_method='pvalue'   # will use L-Statistics with column-wise error for automatically estimating the number of latent factors.
predict_k_method='WH_sill'  # will use Silhouette scores from minimum of W and H latent factors for estimating the number of latent factors.
predict_k_method='W_sill'   # will use Silhouette scores from W latent factor for estimating the number of latent factors.
predict_k_method='H_sill'   # will use Silhouette scores from H latent factor for estimating the number of latent factors.
predict_k_method='sill'     # will default to ``predict_k_method='WH_sill'``.
```

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.17

New Features

  • Introduces a new Vulture subclass VocabularyConsolidator, under TELF.pre_processing.Vulture.tokens_analysis, designed to consolidate vocabularies and textual terms.
  • Refactors NMFk, RESCALk, HNMFk, and SymNMFk to enhance modularity. Helper functions are created under TELF.factorization.utilities to modularize the code.
  • Adds a new search criterion for identifying the optimal rank, or K, to NMFk, HNMFk, WNMFk, and RNMFk. This enhancement introduces a significant speedup to each algorithm. The new criterion utilizes a Binary Search Tree to streamline the process of determining the optimal rank, drastically reducing the search space and the time needed for factorization. Additionally, this K search feature is compatible with High Performance Computing (HPC) systems, ensuring that changes in the K search space by any node are synchronized across all nodes. NMFk has been updated to include new hyper-parameters tailored to these search settings.

k_search_method : str, optional Which approach to use when searching for the rank or k. The default is "linear".

* k_search_method='linear' will linearly visit each K given in the Ks hyper-parameter of the fit() function.
* k_search_method='bst_post' will perform post-order binary search. When an ideal rank is found with min(W silhouette, H silhouette) >= sill_thresh, all lower ranks are pruned from the search space.
* k_search_method='bst_pre' will perform pre-order binary search. When an ideal rank is found with min(W silhouette, H silhouette) >= sill_thresh, all lower ranks are pruned from the search space.

H_sill_thresh : float, optional Setting for removing higher ranks from the search space. The default is -1.

When searching for the optimal rank with binary search using k_search_method='bst_post' or k_search_method='bst_pre', this hyper-parameter can be used to cut off higher ranks from the search space. The cut-off is based on a threshold for the H silhouette: when an H silhouette below H_sill_thresh is found for a given rank K, all higher ranks are removed from the search space. If H_sill_thresh=-1, it is not used.
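The H_sill_thresh cut-off can be sketched on top of any visiting order: once a rank shows an H silhouette below the threshold, every higher rank is skipped. The visiting order is passed in, and the example uses a pre-order over Ks = 1..7. `apply_h_cutoff` is a hypothetical name, not NMFk's implementation.

```python
def apply_h_cutoff(visit_order: list[int], h_sils: dict[int, float], h_sill_thresh: float = -1) -> list[int]:
    """Return the ranks actually evaluated when low H silhouettes cut off higher ranks."""
    upper = float("inf")  # no cut-off until a low H silhouette is seen
    evaluated = []
    for k in visit_order:
        if k > upper:
            continue  # removed from the search space by an earlier cut-off
        evaluated.append(k)
        if h_sill_thresh != -1 and h_sils[k] < h_sill_thresh:
            upper = k  # every rank above k is dropped
    return evaluated
```

With the default `h_sill_thresh=-1`, the cut-off is disabled and every rank in the visiting order is evaluated.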

Bugs

  • Fixes a bug in RESCALk plotting where the plotting function expected W and H silhouettes.
  • Fixes a bug where k prediction would fail if none of the W or H silhouettes were above the sill_thresh hyper-parameter. In that case, a new sill_thresh is now selected with the rule: self.sill_thresh = min([max(sils_min_W), max(sils_min_H)]).
  • Fixes a bug in Vulture document substitutions where an error was raised if no corpus substitutions were passed.
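The fallback rule above can be written as a small function. This sketch assumes the fallback applies whenever either the W or the H silhouettes never reach the threshold, and that "reaching" means `>=`; under that reading, the new threshold guarantees at least one W and one H silhouette qualify. `adjust_sill_thresh` is a hypothetical name.

```python
def adjust_sill_thresh(sils_min_W: list[float], sils_min_H: list[float], sill_thresh: float) -> float:
    """Lower sill_thresh to min(max(W sils), max(H sils)) when it is otherwise unreachable."""
    floor = min(max(sils_min_W), max(sils_min_H))
    # If either silhouette series never reaches sill_thresh, fall back to the best common level.
    return floor if floor < sill_thresh else sill_thresh
```

If both series already reach the threshold, it is left unchanged.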

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.16

  • Fixes a bug in the HPC HNMFk capability where checkpoints would not be saved when using the custom callback functionality.
  • Fixes a bug in the stop-words option of Vulture Clean that excludes hyphens from stop-word checks (a boolean was used in place of an iterable).
  • Fixes a dictionary-iteration bug when flattening the output dictionary in the Vulture Acronyms module.
  • Fixes a missing itertools import in Vulture material permutations.
  • Fixes a bug in Vulture materials permutations for the save_path definition.
  • Adds Ks range and X shape checks for HNMFk to ensure the decomposition can still be performed when using the callback functionality.
  • Adds a feature to include lowercased materials in permutations.
  • Adds future for material permutations.
  • Adds multithreaded string consolidation in Levenshtein.
  • Changes the Levenshtein consolidation criterion from the shortest string to the most common string.
  • Moves HNMFk leaf node termination, based on the sample threshold, to after factorization so that the latent factors W and H are obtained even for nodes where the number of samples is below the threshold.
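Consolidating with "most common string" as the criterion can be sketched as follows. Strings within a Levenshtein distance of each other are grouped, and the most frequent one becomes the group's representative. The greedy grouping, the `max_dist=1` default, and alphabetical tie-breaking are assumptions. This single-threaded sketch (`levenshtein`, `consolidate`) is not Vulture's multithreaded implementation.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def consolidate(tokens: list[str], max_dist: int = 1) -> dict[str, str]:
    """Map each distinct string to the most common string within `max_dist` edits of it."""
    counts = Counter(tokens)
    mapping: dict[str, str] = {}
    # Most common first, so each group's representative is its most frequent spelling.
    for term, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if term in mapping:
            continue
        mapping[term] = term
        for other in counts:
            if other not in mapping and levenshtein(term, other) <= max_dist:
                mapping[other] = term
    return mapping
```

Choosing the most frequent variant tends to keep the spelling actually used in the corpus, whereas "shortest string" could favour a truncated typo.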

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.15

  • Fixes an edge case where the Vulture Acronym Operator produced wrong results when using substitutions.
  • Fixes a bug where Vulture cleaning operations for stop words would not remove hyphenated words if they contain a stop word.
  • Fixes minor bugs where conda environment activation was done incorrectly in the HPC example scripts.
  • Reorganizes the Vulture Acronym Operator example notebook to show when the cleaning is done and when the acronym operation is done with substitutions.
  • Fixes the acronym warning message, which printed a class attribute instead of the data.
  • Adds HPC capability to HNMFk.
  • Adds checkpointing capability for HNMFk.
  • Adds online node operations for HNMFk, reducing the space taken by graph nodes.
  • Adds a per-document substitutions operator to Vulture.
  • Adds Levenshtein distance-based acronym consolidation for post-processing of acronyms.

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.14

  • Adds callback functionality to HNMFk for generating a new data matrix X at each NMFk application. This enables Semantic HNMFk by regenerating the TF-IDF matrix at each node.
  • Adds capability to HNMFk for saving custom user data in each node when using generate_X_callback.
  • Records the X shape and Ks range after pruning, and notes the prune status if decomposition is no longer possible after pruning.
  • HNMFk now uses the Path library to generate sub-directories automatically.
  • Fixed a bug in NMFk where max(Ks) could exceed min(X.shape) after pruning.
  • Fixed a bug where HNMFk loaded the wrong factors when k=2 is True.
  • Fixed a bug where NMFk would try to decompose data after pruning even if not possible (for example, if the number of samples left is 1, or the K range is empty based on the rule k < min(X.shape)).
  • Fixed a bug where Beaver.get_vocabulary() was not consistent with the vocabulary that is generated in the other matrix creation routines.
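The post-pruning feasibility rule (k < min(X.shape), and more than one sample left) can be expressed as two small checks. The assumption that samples lie along the columns of X is exposed as the `sample_axis` parameter. `feasible_ks` and `can_decompose` are hypothetical helpers, not NMFk methods.

```python
def feasible_ks(ks, shape: tuple[int, int]) -> list[int]:
    """Keep only ranks satisfying k < min(X.shape)."""
    limit = min(shape)
    return [k for k in ks if k < limit]


def can_decompose(ks, shape: tuple[int, int], sample_axis: int = 1) -> bool:
    """More than one sample must remain and at least one rank must still be feasible."""
    return shape[sample_axis] > 1 and len(feasible_ks(ks, shape)) > 0
```

Running these checks after pruning, before any factorization starts, avoids launching a decomposition that cannot succeed.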

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.13

  • Adds HNMFk: hierarchical non-negative matrix factorization with automatic model determination and custom settings, including missing-value prediction. HNMFk has multiprocessing capabilities for both CPU and GPU systems. HPC capabilities for HNMFk are planned for a later release.
  • Fixes a bug in the HPC example for WNMFk where the number of nodes was incorrect in the hyper-parameters.

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.12

  • Added the ability to plot both the silhouettes of latent patterns (W matrix) and of latent clusters (H matrix) to assist in selecting the number of hidden patterns and the corresponding number of hidden clusters.
  • predict_k_method default is changed to "sill".
  • NMFk plot will no longer include the blue relative error line when calculate_error=False.
  • New predict_k_method="sill" will predict k based on:
    • The maximum k where the W silhouette is above the threshold sill_thresh: Wk
    • The maximum k where the H silhouette is above the threshold sill_thresh: Hk
    • Final k, or number of hidden signals, will be k=min(Wk, Hk).
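The "sill" rule above translates almost directly into code. Using `>=` for "above the threshold" (matching the `min(W silhouette, H silhouette) >= sill_thresh` wording in later releases) and returning `None` when no rank qualifies are assumptions. `predict_k_sill` is a hypothetical name.

```python
from typing import Optional, Sequence


def predict_k_sill(ks: Sequence[int], w_sils: Sequence[float], h_sils: Sequence[float], sill_thresh: float) -> Optional[int]:
    """k = min(Wk, Hk), where Wk/Hk are the largest ranks whose W/H silhouette reaches sill_thresh."""
    wk = max((k for k, s in zip(ks, w_sils) if s >= sill_thresh), default=None)
    hk = max((k for k, s in zip(ks, h_sils) if s >= sill_thresh), default=None)
    if wk is None or hk is None:
        return None  # no rank passes for W or for H
    return min(wk, hk)
```

Taking the minimum of Wk and Hk means the selected k is supported by both the patterns (W) and the clusters (H).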

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.11

  • Adds acronym identification and acronym substitution capabilities to Vulture.
  • Fixes the dependency list in .yml files for installation.

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.10

Adds a new text mining tool named Cheetah for fast search by keywords and phrases.

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.9

  • Fixed a bug where masking in NMFk was not passed properly to the NMF optimization.
  • Fixed a bug where Vulture did not check cleaning steps in dataframe cleaning.
  • Adds an operator-based module to Vulture. Vulture now supports both cleaning and operator modules.
  • Adds an NER operator module to Vulture.

- Python
Published by MaksimEkin almost 2 years ago

t-elf - v0.0.8

  • Fixes a bug where the consensus matrix would not fit in MPI communication for large matrices when multi-node factorization is performed.
  • Fixes a bug where the consensus matrix calculation would not unprune matrices that were pruned.

- Python
Published by MaksimEkin about 2 years ago

t-elf - v0.0.7

Hot fix for a bug where WNMFk was not updating H latent factor with non-negativity constraint.

- Python
Published by MaksimEkin about 2 years ago

t-elf - v0.0.6

  • Fixes a bug in WNMFk that caused issues when using GPUs
  • Several Vulture bug fixes:
    • fixed a bug where case sensitivity would affect stopwords removal
    • fixed a bug where the SubstitutionCleaner.lower attribute being set to True would cause the entire document to be converted to lowercase instead of just ignoring case in substitution matching
    • fixed a bug where input substitutions dictionary would be modified by reference in SubstitutionCleaner
    • fixed a bug where empty strings would be output by Vulture.dataframe_clean() (in the case of invalid input documents such as non-English text). Now these values are set to np.nan in the output DataFrame

- Python
Published by MaksimEkin about 2 years ago

t-elf - v0.0.5

New Features

  • Adds ability to run TriNMFk without having to run NMFk first.
  • Adds WNMFk for recommendation systems.

Bugs

  • Fixes bug with n_jobs when using perturb multi-processing.

- Python
Published by MaksimEkin about 2 years ago

t-elf - v0.0.4

- Python
Published by MaksimEkin about 2 years ago

t-elf - v0.0.3

Fixes bug with RESCALk.

- Python
Published by MaksimEkin about 2 years ago

t-elf - v0.0.2

Tensor Extraction of Latent Features (T-ELF)

[![Build Status](https://github.com/lanl/T-ELF/actions/workflows/ci_tests.yml/badge.svg?branch=main)](https://github.com/lanl/T-ELF/actions/workflows/ci_tests.yml/badge.svg?branch=main) [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg) [![Python Version](https://img.shields.io/badge/python-v3.11.5-blue)](https://img.shields.io/badge/python-v3.8.5-blue)
### [:information_source: Documentation](https://lanl.github.io/telf/)   [:orange_book: Examples](examples/)   [:page_with_curl: Publications](https://smart-tensors.lanl.gov/publications/)   [:link: Website](https://smart-tensors.LANL.gov)

T-ELF is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL). T-ELF presents an array of customizable software solutions crafted for the analysis of datasets. Acting as a comprehensive toolbox, T-ELF specializes in data pre-processing, extraction of latent features, and structuring results to facilitate informed decision-making. Leveraging high-performance computing and cutting-edge GPU architectures, our toolbox is optimized for analyzing large datasets from a diverse set of problems.

At the core of T-ELF's capabilities are non-negative matrix and tensor factorization solutions for discovering multi-faceted hidden details in data, featuring automated model determination that estimates the number of latent factors, or rank. This pivotal functionality ensures precise data modeling and the extraction of concealed patterns. Additionally, our software suite incorporates cutting-edge modules for both pre-processing and post-processing of data, tailored for diverse tasks including text mining, Natural Language Processing, and robust tools for matrix and tensor analysis and construction.

T-ELF's adaptability spans a multitude of disciplines, positioning it as a robust AI and data analytics solution. Its proven efficacy extends across various fields such as Large-scale Text Mining, High Performance Computing, Computer Security, Applied Mathematics, Dynamic Networks and Ranking, Biology, Material Science, Medicine, Chemistry, Data Compression, Climate Studies, Relational Databases, Data Privacy, Economy, and Agriculture.

Installation

Step 1: Install the Library

Option 1: Install via PIP

```shell
conda create --name TELF python=3.11.5
source activate TELF  # or <conda activate TELF>
pip install git+https://github.com/lanl/T-ELF.git
```

Option 2: Install from Source

```shell
git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda create --name TELF python=3.11.5
source activate TELF  # or <conda activate TELF>
pip install -e .  # or <python setup.py install>
```

Option 3: Install via Conda

```shell
git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda env create --file environment_gpu.yml  # use <conda env create --file environment_cpu.yml> for CPU only
conda activate TELF_conda
conda develop .
```

Step 2: Install the spaCy NLP Model and NLTK Packages

```shell
python -m spacy download en_core_web_lg
python -m nltk.downloader wordnet omw-1.4
```

Step 3: Install CuPy if using a GPU (optional; skip if you used Option 3 in Step 1)

```shell
conda install -c conda-forge cupy
```

Step 4: Install MPI if using HPC (Optional)

```shell
module load <openmpi>  # On an HPC node
pip install mpi4py  # or <conda install -c conda-forge mpi4py> depending on the system
```

Jupyter Setup Tutorial for using the examples (Link)

Other Considerations

On some Linux devices, depending on how CUDA was configured, you may get an error when using a GPU. Install cudatoolkit and cudnn to resolve the error:

```shell
conda install cudatoolkit
conda install cudnn
```

Capabilities

Please see our :page_with_curl: Publications for the capabilities.

Modules

TELF.factorization

| Method | Dense | Sparse | GPU | CPU | Multiprocessing | HPC | Description | Example | Release Status |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| NMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | NMF with Automatic Model Determination | Link | :white_check_mark: |
| Custom NMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | Use Custom NMF Functions with NMFk | Link | :white_check_mark: |
| TriNMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | | NMF with Automatic Model Determination for Clusters and Patterns | Link | :white_check_mark: |
| RESCALk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | RESCAL with Automatic Model Determination | Link | :white_check_mark: |
| RNMFk | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | Recommender NMFk | Link | :white_check_mark: |
| SymNMFk | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | NMFk with Symmetric Clustering | Link | :white_check_mark: |
| BNMFk | | | | | | | Boolean NMFk | | :soon: |
| HNMFk | | | | | | | Hierarchical NMFk | | :soon: |
| SPLIT NMFk | | | | | | | Joint NMFk factorization of multiple data via SPLIT | | :soon: |
| SPLIT Transfer Classifier | | | | | | | Supervised transfer learning method via SPLIT and NMFk | | :soon: |
| CP-ALS | | | | | | | Alternating least squares algorithm for canonical polyadic decomposition | | :soon: |
| CP-APR | | | | | | | Alternating Poisson regression algorithm for canonical polyadic decomposition | | :soon: |
| NTDS_FAPG | | | | | | | Non-negative Tucker Tensor Decomposition | | :soon: |

TELF.pre_processing

| Method | Multiprocessing | HPC | Description | Example | Release Status |
|:---:|:---:|:---:|:---:|:---:|:---:|
| Vulture | :heavy_check_mark: | :heavy_check_mark: | Advanced text processing tool for cleaning and NLP | Link | :white_check_mark: |
| Beaver | :heavy_check_mark: | :heavy_check_mark: | Fast matrix and tensor building tool for text mining | Link | :white_check_mark: |
| iPenguin | | | Online Semantic Scholar information retrieval tool | | :soon: |
| Orca | | | Duplicate author detector for text mining and information retrieval | | :soon: |

TELF.post_processing

| Method | Description | Example | Release Status |
|:---:|:---:|:---:|:---:|
| Peacock | Data visualization and generation of actionable statistics | | :soon: |
| Wolf | Graph centrality and ranking tool | | :soon: |

TELF.applications

| Method | Description | Example | Release Status |
|:---:|:---:|:---:|:---:|
| Cheetah | Fast search by keywords | | :soon: |
| Bunny | Dataset generation tool for documents and their citations/references | | :soon: |

How to Cite T-ELF?

If you use T-ELF, please cite:

APA:

```latex
Eren, M., Solovyev, N., Barron, R., Bhattarai, M., Boureima, I., Skau, E., Rasmussen, K., & Alexandrov, B. (2023). Tensor Extraction of Latent Features (T-ELF) (Version 0.0.2) [Computer software]. https://github.com/lanl/T-ELF
```

BibTeX:

```latex
@software{Tensor_Extraction_of_2023,
  author = {Eren, Maksim and Solovyev, Nick and Barron, Ryan and Bhattarai, Manish and Boureima, Ismael and Skau, Erik and Rasmussen, Kim and Alexandrov, Boian},
  month = oct,
  title = {{Tensor Extraction of Latent Features (T-ELF)}},
  url = {https://github.com/lanl/T-ELF},
  version = {0.0.2},
  year = {2023}
}
```

Authors

  • Maksim Ekin Eren: Advanced Research in Cyber Systems, Los Alamos National Laboratory (Website)
  • Nicholas Solovyev: Theoretical Division, Los Alamos National Laboratory
  • Ryan Barron: Theoretical Division, Los Alamos National Laboratory
  • Manish Bhattarai: Theoretical Division, Los Alamos National Laboratory
  • Ismael Boureima: Theoretical Division, Los Alamos National Laboratory
  • Erik Skau: Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory
  • Kim Rasmussen: Theoretical Division, Los Alamos National Laboratory
  • Boian S. Alexandrov: Theoretical Division, Los Alamos National Laboratory

Patents

Boian ALEXANDROV, of Santa Fe, New Mexico, Maksim Ekin EREN, of Santa Fe, New Mexico, Manish BHATTARAI, of Albuquerque, New Mexico, Kim Orskov RASMUSSEN, of Santa Fe, New Mexico, and Charles K. NICHOLAS, of Columbia, Maryland, (“Assignor”) DATA IDENTIFICATION AND CLASSIFICATION METHOD, APPARATUS, AND SYSTEM. No. 63/472,188. Triad National Security, LLC. (June 9, 2023).

BS. Alexandrov, LB. Alexandrov, and VG. Stanev et al. 2020. Source identification by non-negative matrix factorization combined with semi-supervised clustering. US Patent 10,776,718 (2020).

Copyright Notice

© 2022. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.

LANL C Number: C22048

License

This program is open source under the BSD-3 License. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Developer Test Suite

Developer test suites are located under the tests/ directory. Tests can be run from this folder using python -m pytest *.

- Python
Published by MaksimEkin over 2 years ago