cntext
Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).
text2vec
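text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.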
scattertext
Beautiful visualizations of how language differs among document types.
word2vecelastic
Collects sentences from ElasticSearch, preprocesses them, and trains diachronic Word2Vec models.
align
Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.
https://github.com/alibaba/alink
Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of the Alibaba computing platform.
https://github.com/hidadeng/chinese-pretrained-word-embeddings
A collection of resources for Chinese text analysis: tools, corpora, and pretrained models.
dutch-word-embeddings
Dutch word embeddings, trained on a large collection of Dutch social media messages and news/blog/forum posts.
word-embeddings-repository-for-turkish
Code for "A Comprehensive Analysis of Static Word Embeddings for Turkish". Expert Systems with Applications 2024.
https://github.com/alexeyev/kyrgyz-embedding-evaluation
A benchmark for evaluating embeddings for the Kyrgyz language.
word2vec-russian-novels
Inspired by word2vec-pride-vis: replaces the words in the texts of the most valuable Russian novels with their closest words in a word2vec model. By Boris Orekhov 📚