Updated 10 months ago

kitsune • Rank 4.2 • Science 85%

Kitsune is a next-generation data steward and harmonization tool.

Updated 10 months ago

text2vec • Rank 17.9 • Science 54%

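text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.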

Updated 10 months ago

https://github.com/lancedb/lance • Rank 28.1 • Science 36%

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.

Updated 10 months ago

knrscore • Rank 4.9 • Science 54%

KNRScore is a Python package for computing K-Nearest-Rank Similarity, a metric that quantifies local structural similarity between two maps or embeddings.

Updated 10 months ago

word2vecelastic • Rank 0.7 • Science 54%

Collect sentences from ElasticSearch, preprocess and train diachronic Word2Vec models

Updated 10 months ago

ragoon • Rank 9.3 • Science 44%

High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡

Updated 10 months ago

node2vec • Rank 20.0 • Science 33%

Implementation of the node2vec algorithm.

Updated 10 months ago

graph • Rank 8.9 • Science 44%

GPU-accelerated force graph layout and rendering

Updated 10 months ago

llama_ros • Rank 7.2 • Science 44%

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

Updated 10 months ago

marqo-fashionclip • Rank 6.4 • Science 44%

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

Updated 10 months ago

questea • Rank 1.4 • Science 39%

QuestionnaireEmbeddingsAnalysis - innovative approach to extracting richer information from clinical survey

Updated 10 months ago

word2vec • Rank 15.1 • Science 23%

Distributed Representations of Words using word2vec

Updated 10 months ago

https://github.com/amazon-science/supervised-intent-clustering • Rank 2.1 • Science 26%

This is a package to fine-tune language models in order to create clustering-friendly embeddings.

Updated 10 months ago

https://github.com/dru-mara/evalne-gui • Rank 3.6 • Science 23%

EvalNE-GUI: The Graphical User Interface for EvalNE

Updated 10 months ago

doc2vec • Rank 10.8 • Science 10%

Distributed Representations of Sentences and Documents

Updated 10 months ago

lorann • Science 36%

Approximate Nearest Neighbor search using reduced-rank regression, with extremely fast queries, tiny memory usage, and rapid indexing on modern vector embeddings.

Updated 10 months ago

clep • Science 67%

🤖 A Python Package for generating new patient representations driven by data and prior knowledge

Updated 10 months ago

comparative-embedding-visualization • Science 57%

A Jupyter widget for comparing two embeddings with shared labels by their confusion, neighborhoods, and size.

Updated 10 months ago

jupyter-scatter-tutorial • Science 44%

Jupyter Scatter Tutorial (that was first presented at SciPy '23)

Updated 10 months ago

pkgmatch • Science 26%

Find R packages matching either descriptions or other R packages

Updated 10 months ago

https://github.com/d1egoprog/synthetictriples • Science 13%

Paper showcase for the initial version of the Synthetic Triple generation approach

Updated 10 months ago

most-different-text-selection • Science 54%

Use embedding data from LLMs to determine the most different text in a given corpus.

Scientific Software
Updated 10 months ago

DiRe - JAX — Peer-reviewed • Science 93%

DiRe - JAX: A JAX based Dimensionality Reduction Algorithm for Large-scale Data - Published in JOSS (2025)

Updated 10 months ago

geospatial-rag • Science 26%

AI Framework for Remote Sensing Image Analysis using RAG - 88%+ accuracy, multi-modal queries, ChatGPT-like interface

Updated 10 months ago

midi2vec • Science 67%

MIDI2vec computes embeddings for representing MIDI data in vector space

Updated 10 months ago

tsde • Science 41%

TSDE is a novel SSL framework for TSRL, the first of its kind, effectively harnessing a diffusion process, conditioned on an innovative dual-orthogonal Transformer encoder architecture with a crossover mechanism, and employing a unique IIF mask strategy (KDD 2024, main research track).

Updated 10 months ago

https://github.com/0xibra/linux-tower-gpt-embeddings-experiment • Science 13%

This project is a work-in-progress and serves as an experiment for context injection with GPT and code embeddings. The goal is to use GPT to develop the remaining features of the project.

Updated 10 months ago

https://github.com/cuc-zihang-liu/text-based-rag-framework • Science 26%

Text-based RAG framework (combining multiple encoders)

Updated 10 months ago

https://github.com/amberlee2427/nancy-brain • Science 36%

Nancy's RAG backend and HTTP API/MCP server connectors.

Updated 10 months ago

model • Science 26%

The Clay Foundation Model - An open source AI model and interface for Earth

Updated 10 months ago

https://github.com/aida-ugent/debayes • Science 26%

DeBayes: a Bayesian Method for Debiasing Network Embeddings (ICML 2020).