Updated 6 months ago

kitsune • Rank 4.2 • Science 85%

Kitsune is a next-generation data steward and harmonization tool.

Updated 6 months ago

text2vec • Rank 17.9 • Science 54%

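text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.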

Updated 6 months ago

https://github.com/lancedb/lance • Rank 28.1 • Science 36%

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.

Updated 6 months ago

knrscore • Rank 4.9 • Science 54%

KNRScore is a Python package for computing K-Nearest-Rank Similarity, a metric that quantifies local structural similarity between two maps or embeddings.

Updated 6 months ago

word2vecelastic • Rank 0.7 • Science 54%

Collect sentences from ElasticSearch, preprocess and train diachronic Word2Vec models

Updated 6 months ago

ragoon • Rank 9.3 • Science 44%

High-level library for batched embedding generation, blazingly fast web-based RAG, and quantized index processing ⚡

Updated 6 months ago

node2vec • Rank 20.0 • Science 33%

Implementation of the node2vec algorithm.

Updated 6 months ago

graph • Rank 8.9 • Science 44%

GPU-accelerated force graph layout and rendering

Updated 6 months ago

llama_ros • Rank 7.2 • Science 44%

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

Updated 6 months ago

marqo-fashionclip • Rank 6.4 • Science 44%

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

Updated 6 months ago

questea • Rank 1.4 • Science 39%

QuestionnaireEmbeddingsAnalysis - an innovative approach to extracting richer information from clinical surveys

Updated 6 months ago

word2vec • Rank 15.1 • Science 23%

Distributed Representations of Words using word2vec

Updated 6 months ago

https://github.com/amazon-science/supervised-intent-clustering • Rank 2.1 • Science 26%

This is a package to fine-tune language models in order to create clustering-friendly embeddings.

Updated 6 months ago

https://github.com/dru-mara/evalne-gui • Rank 3.6 • Science 23%

EvalNE-GUI: The Graphical User Interface for EvalNE

Updated 6 months ago

doc2vec • Rank 10.8 • Science 10%

Distributed Representations of Sentences and Documents

Updated 6 months ago

jupyter-scatter-tutorial • Science 44%

Jupyter Scatter Tutorial (that was first presented at SciPy '23)

Updated 6 months ago

lorann • Science 36%

Approximate Nearest Neighbor search using reduced-rank regression, with extremely fast queries, tiny memory usage, and rapid indexing on modern vector embeddings.

Updated 6 months ago

https://github.com/cuc-zihang-liu/text-based-rag-framework • Science 26%

Text-based RAG framework (combining multiple encoders)

Updated 6 months ago

tsde • Science 41%

TSDE is a novel SSL framework for TSRL, the first of its kind, effectively harnessing a diffusion process, conditioned on an innovative dual-orthogonal Transformer encoder architecture with a crossover mechanism, and employing a unique IIF mask strategy (KDD 2024, main research track).

Updated 6 months ago

https://github.com/aida-ugent/debayes • Science 26%

DeBayes: a Bayesian Method for Debiasing Network Embeddings (ICML 2020).

Updated 6 months ago

geospatial-rag • Science 26%

AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries, ChatGPT-like interface

Updated 6 months ago

pkgmatch • Science 26%

Find R packages matching either descriptions or other R packages

Updated 6 months ago

clep • Science 67%

🤖 A Python Package for generating new patient representations driven by data and prior knowledge

Scientific Software
Updated 6 months ago

DiRe - JAX — Peer-reviewed • Science 93%

DiRe - JAX: A JAX based Dimensionality Reduction Algorithm for Large-scale Data - Published in JOSS (2025)

Updated 6 months ago

most-different-text-selection • Science 54%

Use embedding data from LLMs to determine the most different text in a given corpus.

Updated 6 months ago

model • Science 26%

The Clay Foundation Model - An open source AI model and interface for Earth

Updated 5 months ago

https://github.com/d1egoprog/synthetictriples • Science 13%

Paper showcase for the initial version of the Synthetic Triple generation approach

Updated 6 months ago

https://github.com/0xibra/linux-tower-gpt-embeddings-experiment • Science 13%

This project is a work-in-progress and serves as an experiment for context injection with GPT and code embeddings. The goal is to use GPT to develop the remaining features of the project.

Updated 6 months ago

https://github.com/amberlee2427/nancy-brain • Science 36%

Nancy's RAG backend and HTTP API/MCP server connectors.

Updated 6 months ago

midi2vec • Science 67%

MIDI2vec computes embeddings for representing MIDI data in vector space

Updated 6 months ago

comparative-embedding-visualization • Science 57%

A Jupyter widget for comparing two embeddings with shared labels by their confusion, neighborhoods, and size.