Recent Releases of txtai
txtai - v8.6.0
This release fixes a number of integration issues with downstream libraries and adds other performance improvements.
See below for full details on the new features, improvements and bug fixes.
Improvements
- Handling truncation for the Similarity pipeline (#882)
- Update tagline to the all-in-one AI framework (#901)
Bug Fixes
- Encoding issue with latest version of LiteLLM (#902)
- Fix bug with latest version of smolagents (#906)
- Import error with latest version of onnx (#907)
- Upcoming breaking GrandCypher API change (#909)
- Max Length parameter is ignored in LLM and Summary pipelines with latest version of Transformers (#912)
- Fix issue with latest version of smolagents (#913)
- Python
Published by davidmezzetti 9 months ago
txtai - v8.5.0
This release migrates from Transformers Agents to smolagents, adds Model Context Protocol (MCP) support and now requires Python 3.10+
See below for full details on the new features, improvements and bug fixes.
New Features
- Migrate to smolagents (#890)
- Add Model Context Protocol (MCP) Support (#892)
- Add support for MCP servers to Agent Framework (#898)
- Require Python 3.10 (#897)
Improvements
- Lazy load list of translation models (#896)
Bug Fixes
- Fix issue with MessageRole Enums and LLM pipeline (#888)
- Transformers 4.50 modified cached_file behavior (#889)
- Add test vision model compatible with Transformers 4.50 (#891)
- Fix bug introduced with Pillow 11.2 (#895)
Published by davidmezzetti 11 months ago
txtai - v8.4.0
This release adds support for vision LLMs, graph vector search, embeddings checkpoints, observability and an OpenAI-compatible API
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for vision models to HF LLM pipeline (#884)
- Add similar query clause to graph queries (#875)
- Feature Request: Embeddings index checkpointing (#695)
- Feature Request: Enhance observability and tracing capabilities (#869)
- Add OpenAI API compatible endpoint to API (#883)
- Add example notebook showing how to use OpenAI compatible API (#887)
- Add texttospeech pipeline to API (#552)
- Add upload endpoint to API (#659)
Improvements
- Add encoding parameter to TextToSpeech pipeline (#885)
- Add support for input streams to Transcription pipeline (#886)
Bug Fixes
- Fix bug with latest version of Transformers and model registry (#878)
Published by davidmezzetti 12 months ago
txtai - v8.3.0
This release adds support for GLiNER, Chonkie, Kokoro TTS and Static Vectors
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for GLiNER models (#862) Thank you @urchade
- Add semantic chunking pipeline (#812) Thank you @bhavnicksm
- Add Kokoro TTS support to TextToSpeech pipeline (#854) Thank you @hexgrad
- Add staticvectors inference (#859)
- Add example notebook for Entity Extraction with GLiNER (#873)
- Add example notebook for RAG Chunking (#874)
- Add notebook that analyzes NeuML LinkedIn posts (#851)
Improvements
- Add new methods for audio signal processing (#855)
- Remove fasttext dependency (#857)
- Remove WordVectors.build method (#858)
- Detect graph queries and route to graph index (#865)
- Replace python-louvain library with networkx equivalent (#867)
- Word vector model improvements (#868)
- Improve parsing of table text in HTML to Markdown pipeline (#872)
Bug Fixes
- Update build script to workaround breaking change with latest version of Transformers and Python 3.9 (#852)
- Incorrect Endpoint Description in Swagger UI (FastAPI) (#860)
- Handle empty token list with word vectorization (#861)
Published by davidmezzetti about 1 year ago
txtai - v8.2.0
This release simplifies LLM chat messages, adds attribute filtering to Graph RAG and enables multi-cpu/gpu vector encoding
See below for full details on the new features, improvements and bug fixes.
New Features
- Add defaultrole to LLM pipeline (#841)
- Feature Request: Graph RAG - Add extra attributes (#684)
- Support graph=True in embeddings config (#848)
- Support pulling attribute data in graph.scan (#849)
- Encoding using multiple-GPUs (#541)
- Add vectors argument to Model2Vec vectors (#846)
- Enhanced Docs: LLM Embedding Examples (#843, #844) Thank you @igorlima!
Improvements
- Pin build script to pillow==10.4.0 (#800)
- Ensure generated datetimes are in UTC (#840)
- Update RAG notebooks to add clarifying notes on LLM inference (#847)
Published by davidmezzetti about 1 year ago
txtai - v8.1.0
This release adds Docling integration, Embeddings context managers and significant database component enhancements
See below for full details on the new features, improvements and bug fixes.
New Features
- Add text extraction with Docling (#814)
- Add Embeddings context manager (#832)
- Add support for halfvec and bit vector types with PGVector ANN (#839)
- Persist embeddings components to specified schema (#829)
- Add example notebook that analyzes the Hugging Face Posts dataset (#817)
- Add an example notebook for autonomous agents (#820)
Improvements
- Cloud storage improvements (#821)
- Autodetect Model2Vec model paths (#822)
- Add parameter to disable text cleaning in Segmentation pipeline (#823)
- Refactor vectors package (#826)
- Refactor Textractor pipeline into multiple pipelines (#828)
- RDBMS graph.delete tests and upgrade graph dependency (#837)
- Bound ANN hamming scores between 0.0 and 1.0 (#838)
Bug Fixes
- Fix error with inferring function parameters in agents (#816)
- Add programmatic workaround for Faiss + macOS (#818) Thank you @yukiman76!
- docs: update 49_External_database_integration.ipynb (#819) Thank you @eltociear!
- Fix memory issue with llama.cpp LLM pipeline (#824)
- Fix issue with calling cached_file for local directories (#825)
- Fix resource issues with embeddings indexing components backed by databases (#831)
- Fix bug with NetworkX.hasedge method (#834)
Published by davidmezzetti about 1 year ago
txtai - v8.0.0
We're excited to announce the release of txtai 8.0
If you like txtai, please remember to give it a ⭐!
8.0 introduces agents. Agents automatically create workflows to answer multi-faceted user requests. Agents iteratively prompt and/or interface with tools to step through a process and ultimately come to an answer for a request.
This release also adds support for Model2Vec vectorization. See below for more.
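The agent loop described above (prompt, call a tool, repeat until an answer is reached) can be sketched in plain Python. This is a toy illustration only: `run_agent`, `scripted_model` and `calculator` are hypothetical names, the "model" is a hard-coded stand-in for an LLM, and none of it reflects the txtai agents API.

```python
# Toy agent loop: every name here is hypothetical and none of it is txtai's API.

def calculator(expression):
    """Tool: add integers written as 'a + b + ...'."""
    return sum(int(term) for term in expression.split("+"))

def scripted_model(history):
    """Stand-in for an LLM: decides the next step from the history so far."""
    role, content = history[-1]
    if role == "user":
        # First step: request a tool call
        return "calculator", "2 + 3"
    # A tool result is available, so finish with an answer
    return "answer", f"The result is {content}"

def run_agent(request, model, tools, max_steps=5):
    """Prompt the model, run the tool it picks, repeat until it answers."""
    history = [("user", request)]
    for _ in range(max_steps):
        action, argument = model(history)
        if action == "answer":
            return argument
        history.append((action, tools[action](argument)))
    return None  # no answer within the step budget

answer = run_agent("What is 2 + 3?", scripted_model, {"calculator": calculator})
print(answer)  # The result is 5
```

The `max_steps` bound is a design choice for the sketch: it keeps a model that never produces an answer from looping forever.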
New Features
- Add txtai agents (#804)
- Add agents package to txtai (#808)
- Add documentation for txtai agents (#809)
- Add agents to Application and API interfaces (#810)
- Add agents example notebook (#811)
- Add model2vec vectorization (#801)
Improvements
- Update BASE_IMAGE in Dockerfile (#799)
- Cleanup vectors package (#802)
- Build script improvements (#805)
Bug Fixes
- ImportError: cannot import name 'DuckDuckGoSearchTool' from 'transformers.agents' (#807)
Published by davidmezzetti over 1 year ago
txtai - v7.5.0
This release adds Speech to Speech RAG, new TTS models and Generative Audio features
See below for full details on the new features, improvements and bug fixes.
New Features
- Add Speech to Speech example notebook (#789)
- Add streaming speech generation (#784)
- Add a microphone pipeline (#785)
- Add an audio playback pipeline (#786)
- Add Text to Audio pipeline (#792)
- Add support for SpeechT5 ONNX exports with Text to Speech pipeline (#793)
- Add audio signal processing and mixing methods (#795)
- Add Generative Audio example notebook (#798)
- Add example notebook covering open data access (#782)
Improvements
- Issue with Language Specific Transcription Using txtai and Whisper (#593)
- Update TextToSpeech pipeline to support speaker parameter (#787)
- Update Text to Speech Generation Notebook (#790)
- Update hf_hub_download methods to use cached_file (#794)
- Require Python >= 3.9 (#796)
- Upgrade pylint and black (#797)
Published by davidmezzetti over 1 year ago
txtai - v7.4.0
This release adds the SQLite ANN, new text extraction features and a programming language neutral embeddings index format
See below for full details on the new features, improvements and bug fixes.
New Features
- Add SQLite ANN (#780)
- Enhance markdown support for Textractor (#758)
- Update txtai index format to remove Python-specific serialization (#769)
- Add new functionality to RAG application (#753)
- Add bm25s library to benchmarks (#757) Thank you @a0346f102085fe9f!
- Add serialization package for handling supported data serialization methods (#770)
- Add MessagePack serialization as a top level dependency (#771)
Improvements
- Support <pre> blocks with Textractor (#749)
- Update HF LLM to reduce noisy warnings (#752)
- Update NLTK model downloads (#760)
- Refactor benchmarks script (#761)
- Update documentation to use base imports (#765)
- Update examples to use RAG pipeline instead of Extractor when paired with LLMs (#766)
- Modify NumPy and Torch ANN components to use np.load/np.save (#772)
- Persist Embeddings index ids (only used when content storage is disabled) with MessagePack (#773)
- Persist Reducer component with skops library (#774)
- Persist NetworkX graph component with MessagePack (#775)
- Persist Scoring component metadata with MessagePack (#776)
- Modify vector transforms to load/save data using np.load/np.save (#777)
- Refactor embeddings configuration into separate component (#778)
- Document txtai index format (#779)
Bug Fixes
- Translation: AttributeError: 'ModelInfo' object has no attribute 'modelId' (#750)
- Change RAGTask to RagTask (#763)
- Notebook 42 error (#768)
Published by davidmezzetti over 1 year ago
txtai - v7.3.0
This release adds a new RAG front-end application template, streaming LLM and streaming RAG support along with significant text extraction improvements
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for streaming LLM generation (#680)
- Add RAG API endpoint (#735)
- Add RAG deepdive notebook (#737)
- Add RAG example application (#743)
Improvements
- Improve textractor pipeline (#748)
- Can't specify embedding model via API? (#632)
- Configuration documentation update request (#705)
- RAG alias for Extractor (#732)
- Rename Extractor pipeline to RAG (#736)
- Support maxseqlength parameter with model pooling (#746)
Bug Fixes
- Fix issue with max tokens for llama.cpp components (#733)
- Fix issue with loading non-transformer LLM models in Extractor/RAG pipeline (#734)
- Fix issue with setting quantize=False in HFTrainer pipeline (#747)
Published by davidmezzetti over 1 year ago
txtai - v7.2.0
This release adds Postgres integration for all components, LLM Chat Messages and vectorization with llama.cpp/LiteLLM
See below for full details on the new features, improvements and bug fixes.
New Features
- Add pgvector ANN backend (#698)
- Add RDBMS Graph (#699)
- Add notebook covering txtai integration with Postgres (#701)
- Add Postgres Full Text Scoring (#713)
- Add support for chat messages in LLM pipeline (#718)
- Add support for LiteLLM vector backend (#725)
- Add support for llama.cpp vector backend (#726)
- Add notebook showing how to run RAG with llama.cpp and LiteLLM (#728)
Improvements
- Split similarity extras install (#696)
- Ensure config.path = None and config.path missing mean the same thing (#704)
- Add close methods to ANN and Graph (#711)
- Update finalizers to check object attributes haven't already been cleared (#722)
- Update LLM pipeline to support GPU parameter with llama.cpp backend (#724)
- Refactor vector module to support additional backends (#727)
Bug Fixes
- Fix issue with database.search and empty scores (#712)
- Update HFOnnx pipeline to default to opset 14 (#719)
- Fix incompatibility with ONNX models and transformers>=4.41.0 (#720)
- Fix incompatibility between latest skl2onnx and txtai (#729)
Published by davidmezzetti over 1 year ago
txtai - v7.1.0
This release adds dynamic embeddings vector support along with semantic graph and RAG improvements
See below for full details on the new features, improvements and bug fixes.
New Features
- Add support for dynamic vector dimensions (#674)
- Add batch node and edge creation for graphs (#693)
- Add notebook on Retrieval Augmented and Guided Generation (#694)
Improvements
- Pass options to underlying vector models (#675)
- Move vector model caching from Embeddings to Vectors (#678)
- Add indexids only search (#691)
- Create temporary tables once per database session (#692)
Bug Fixes
- Fix token storage auth error (#676)
- TypeError: 'NoneType' object is not iterable (#683)
- Fix issue with hardcoded autoawq version in example notebooks (#686)
- API deps missing Pillow (#690)
Published by davidmezzetti almost 2 years ago
txtai - v7.0.0
We're excited to announce the release of txtai 7.0
If you like txtai, please remember to give it a ⭐!
7.0 introduces the next generation of the semantic graph. This release adds support for graph search, advanced graph traversal and graph RAG. It also adds binary support to the API, index format improvements and training LoRA/QLoRA models. See below for more.
New Features
- Add indexing of embeddings graph relationships (#525)
- Expand the graph capabilities to enable advanced graph traversal (#534, #540)
- Add feature to return embeddings search results as graph (#644)
- Add RAG with Semantic Graphs notebook (#645)
- Graph search results via API (#670)
- Add knowledge graphs via LLM-driven entity extraction notebook (#671)
- Add advanced RAG with graph path traversal notebook (#672)
- Add support for binary content via API (#630)
- Add MessagePack encoding to API (#658)
- Add documentation for API security (#627)
- Add notebook that covers API authorization and authentication (#628)
- Add top level import for LLM (#648)
- Add external vectorization notebook (#651)
- Add configuration override to embeddings.load (#657)
- Add what's new in txtai 7.0 notebook (#673)
Improvements
- Benchmark script improvements (#641)
- ImportError: Textractor pipeline is not available - install "pipeline" extra to enable (#646)
- Resolve external vector transform functions (#650)
- Change default embeddings config format to json (#652)
- Store index ids outside of configuration when content is disabled (#653)
- Update HFTrainer to add PEFT support (#654)
- Update 40_Text_to_Speech_Generation.ipynb (#666) - Thank you @babinux
- Adding training dependencies to notebooks (#669)
Bug Fixes
- Fix various issues with subindex reloading (#618)
- Fix benchmarks script (#636)
- Set tokenizer.pad_token when empty for all training paths (#649)
- Fix documentation code filters (#656)
- Issues with NetworkX when using graph subindex (#664)
A big thank you goes to Jordan Matelsky (@j6k4m8) for his help in integrating the GrandCypher library into txtai!
Published by davidmezzetti about 2 years ago
txtai - v6.3.0
This release adds new LLM inference methods, API Authorization and RAG improvements
New LLM methods. llama.cpp and LiteLLM support added. The LLM pipeline now supports Hugging Face models, GGUF files and LLM API inference, all with one line of code.
API Authorization. Adds support for API keys and pluggable authentication methods when running through the txtai API.
See below for full details on the new features, improvements and bug fixes.
New Features
- Add llama.cpp support to LLM (#611)
- Integrate with Litellm (#554)
- Add API route dependencies (#623)
- Add API Authorization (#263, #624)
- Add notebook on how to build RAG pipelines (#605)
- Add notebook showing how to use llama.cpp, LiteLLM and custom generation models (#615)
Improvements
- Enhance textractor to better support RAG use cases (#603)
- Update text extraction notebook (#604)
- Extractor (RAG) pipeline improvements (#613)
- Refactor LLM pipeline to support multiple framework methods (#614)
- Change API startup event to lifespan event (#625)
Bug Fixes
- Handle None input properly in Tokenizer (#607)
- Issue with subdirectories and ZIP compression (#609)
- Error in 52_Build_RAG_pipelines_with_txtai.ipynb (#620)
- Add missing skl2onnx dependency (#622)
Published by davidmezzetti about 2 years ago
txtai - v6.2.0
This release adds binary quantization, SQL bind parameters and performance improvements
⚡ Scalar quantization. Supports 1 bit (binary) through 8 bit quantization. Can dramatically reduce vector storage requirements.
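To make the storage claim concrete, here is a minimal sketch of 1 bit (binary) quantization in plain Python. It assumes a simple sign-based scheme and a made-up 384-dimension vector; `binary_quantize` and `hamming` are illustrative helpers, not txtai functions, and txtai's own implementation may differ.

```python
import struct

def binary_quantize(vector):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    packed = bytearray((len(vector) + 7) // 8)
    for index, value in enumerate(vector):
        if value > 0:
            # Most significant bit first within each byte
            packed[index // 8] |= 1 << (7 - index % 8)
    return bytes(packed)

def hamming(a, b):
    """Count differing bits between two packed vectors of equal length."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Made-up 384-dimension vector (8 values repeated 48 times)
vector = [0.12, -0.5, 0.33, -0.01, 0.9, 0.0, -0.7, 0.25] * 48

float_bytes = len(struct.pack(f"{len(vector)}f", *vector))  # 4 bytes per dimension
binary = binary_quantize(vector)

print(float_bytes, len(binary))  # 1536 48
```

Under these assumptions the packed form is 32 times smaller than float32 storage, and vectors are compared by Hamming distance over the packed bits rather than by a dot product.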
SQL bind parameters. Enables searching binary content with SQL statements, along with being a standard best practice.
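As a general illustration of why bind parameters matter for binary content, the sketch below uses Python's standard `sqlite3` module rather than txtai itself. The table and column names are invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical table holding text and binary data
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sections (id TEXT, text TEXT, data BLOB)")

# Named placeholders: values are passed separately from the SQL string
connection.execute(
    "INSERT INTO sections VALUES (:id, :text, :data)",
    {"id": "1", "text": "it's a test", "data": b"\x00\x01\xff"},
)

# Binary values cannot be safely spliced into SQL text, but bind cleanly
rows = connection.execute(
    "SELECT id, text FROM sections WHERE data = :data",
    {"data": b"\x00\x01\xff"},
).fetchall()

print(rows)  # [('1', "it's a test")]
connection.close()
```

Because the values never become part of the statement text, quoting problems (such as the apostrophe above) and SQL injection are avoided as well.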
See below for full details on the new features, improvements and bug fixes.
New Features
- Add scalar quantization support to vectors (#583)
- Feature request: Bind variable support when searching with SQL using Content=True mode (#564)
- Add cls pooling option (#565)
- Add prefix parameter for object storage (#568)
- Add parameter to RetrieveTask to disable directory flattening (#569)
- Add support for binary indexes to Faiss ANN (#585)
- Add support for scalar data to torch and numpy ANN backends (#587)
- Add quantization notebook (#588)
- Add API extensions notebook (#591)
- Add env variable to disable macOS MPS devices (#592)
Improvements
- Allow searching for images (#404)
- Update LLM pipeline to support template parameter (#566)
- Update recommended models (#573)
- Is it possible to add chat history to extractor workflow? (#575)
- Extractor pipeline improvements (#577)
- Update documentation (#582)
- Move vector normalization to vectors module (#584)
- Update benchmarks to read configuration (#586)
- Update torch version in Dockerfile (#589)
- Update Faiss ANN to support IVF strings without number of cells (#594)
- Update documentation to note SQL bind parameters (#596)
Bug Fixes
- Inconsistency in Embeddings behavior in Applications (#571)
Published by davidmezzetti over 2 years ago
txtai - v6.1.0
This release adds metadata support for client-server databases and custom scoring implementations
Client-server database integration. Store index metadata in Postgres, MariaDB/MySQL, MSSQL and more.
Custom scoring implementations. Store keyword index data in systems such as Elasticsearch. Similar to functionality already available in the vector index component.
See below for full details on the new features, improvements and bug fixes.
New Features
- Add metadata support for client-server databases (#532)
- Add support for custom scoring instances (#544)
- Add benchmark script (#522)
- Add sparse keyword benchmark notebook (#523)
- Add hybrid search notebook (#526)
- Add way to load database connection URL via environment variable (#548)
- Add external database integration notebook (#549)
- Add weights and index to Application methods (#561)
Improvements
- Refresh introducing txtai notebook (#520)
- Calling .reindex() on Application instance (#547)
- Document how to run API via HTTPS (#553)
- Update reindex action to support new 6.x configuration (#557)
Bug Fixes
- ValueError: dictionary update sequence element #0 has length X; 2 is required (#529)
- Add build script workaround for DuckDB and Pandas 2.1.0 incompatibility (#542)
- Align API parameter data type to translation pipeline (#550)
- Summary pipeline error when gpu enabled on mps device (#551)
- Remove deprecated option from quantize_dynamic (#562)
- Dates fail in example (#563)
Published by davidmezzetti over 2 years ago
txtai - v6.0.0
🥳 We're excited to announce the release of txtai 6.0 🥳
This significant milestone release marks txtai's 3 year birthday. If you like txtai, please remember to give it a ⭐!
6.0 adds sparse, hybrid and subindexes to the embeddings interface. It also makes significant improvements to the LLM pipeline workflow. See below for more.
Breaking changes
The vast majority of changes are fully backwards compatible. New features are only enabled when specified. The only breaking change is with the Scoring terms interface, where the index format changed. The main Scoring interface used for word vectors weighting is unchanged.
New Features
- Better BM25 (#508)
- Hybrid Search (#509)
- Add additional indexes for embeddings (#515)
- Refactor Sequences and Generator pipeline into single LLM pipeline (#494)
- Support passing model parameters in pipelines (#500)
- Add "auto-id" capability to Embeddings (#502)
- Add UUID auto-id (#505)
- Add keyword arguments to Embeddings constructor (#503)
- Add top level imports (#514)
Improvements
- Add NumPy ANN Backend (#468)
- Add PyTorch ANN Backend (#469)
- Add notebook covering embeddings configuration options (#470)
- make data - No such file or directory (#473)
- Improve derivation of default embeddings model path (#476)
- Add accelerate dependency (#477)
- Add baseball example application (#484)
- Update minimum Python version to 3.8 (#485)
- Add WAL option for SQLite (#488)
- Add support for alternative acceleration devices (#489)
- Add support for passing torch devices to embeddings and pipelines (#492)
- Documentation updates (#495)
- Improve Pooling tokenizer load method (#499)
- Add ability for extractor to reference another pipeline in applications (#501)
- Reorganize embeddings configuration documentation (#504)
- Support Unicode Text Segmentation in Tokenizer (#507)
- ANN improvements (#510)
- Add multilingual graph topic modeling (#511)
- Add support for configurable text/object fields (#512)
- Update documentation for 6.0 (#513)
- Add count method to database (#517)
- Improvements when indexing through Applications (#518)
- Add what's new in txtai 6.0 notebook (#519)
Bug Fixes
- OpenMP issues with torch 1.13+ on macOS (#377)
- Unique constraint violation issue with DuckDB (#475)
- Incorrect results can be returned by embedding search when Content storage enabled (#496)
- Fix issues with graph.infertopics (#516)
Published by davidmezzetti over 2 years ago
txtai - v5.5.0
This release adds workflow streams and DuckDB as a database backend
Workflow streams enable server-side processing of large datasets. Streams iteratively pass content to workflows, so there is no need to pass bulk data through the API.
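The idea of iteratively feeding a workflow, instead of passing one bulk payload, can be sketched with plain Python generators. The `stream`, `workflow` and `rows` functions below are hypothetical stand-ins for illustration and are not txtai's workflow stream classes.

```python
from itertools import islice

def stream(source, batch=2):
    """Yield fixed-size batches from any iterable without loading it all."""
    iterator = iter(source)
    while True:
        chunk = list(islice(iterator, batch))
        if not chunk:
            return
        yield chunk

def workflow(elements):
    """Hypothetical workflow: a single task that upper-cases each element."""
    return [element.upper() for element in elements]

def rows():
    """Stand-in for a large server-side dataset."""
    for index in range(5):
        yield f"row {index}"

# Only one batch is held in memory at a time while the workflow runs
results = [output for chunk in stream(rows()) for output in workflow(chunk)]
print(results)  # ['ROW 0', 'ROW 1', 'ROW 2', 'ROW 3', 'ROW 4']
```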
DuckDB is a new database backend. Certain larger non-vector driven queries and aggregations will now run significantly faster than with SQLite.
See below for full details on the new features, improvements and bug fixes.
New Features
- Add workflow streams (#461)
- Add DuckDB support (#462)
Improvements
- Modify translation pipeline langdetect parameter to accept language detection function - Thank you @saucam! (#423, #444)
- Pass generation keyword arguments to underlying text generation pipeline (#457)
- Replace original prompt in text generation pipeline (#459)
Bug Fixes
- Issue with upsert and graph (#421)
- Upsert API fails with graph config while performing after /delete (#435)
- Build errors with latest onnxmltools package (#449)
- Fix issue with embeddings reindex and stale function references (#453)
- Problem with the workflow builder (#454)
- Check for empty queue before attempting to convert inputs to dictionaries (#456)
- Fix issue with latest version of Transformers and TokenDetection.save_pretrained (#458)
Published by davidmezzetti almost 3 years ago
txtai - v5.4.0
This release adds prompt templates, conversational task chaining and Hugging Face Hub integration
Prompt templates dynamically generate text using workflow task inputs. This enables chaining multiple prompts and models together.
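A rough sketch of template-driven chaining, using only `str.format` and a fake model so that it runs anywhere. `template_task`, `fake_model` and `chain` are hypothetical helpers and the placeholder name `{text}` is an assumption for the example; this is not txtai's template task.

```python
def template_task(template):
    """Return a task that fills the template with its input."""
    def task(text):
        return template.format(text=text)
    return task

def fake_model(prompt):
    """Stand-in for a model call: wraps the prompt so the flow is visible."""
    return f"<output for: {prompt}>"

def chain(value, tasks):
    """Feed each task's output into the next task."""
    for task in tasks:
        value = task(value)
    return value

steps = [
    template_task("Translate to French: {text}"),
    fake_model,
    template_task("Summarize: {text}"),
]

final_prompt = chain("hello", steps)
print(final_prompt)
# Summarize: <output for: Translate to French: hello>
```

Each template only knows about its own input, which is what lets prompts and models be rearranged or extended without changing the other steps.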
🤗 Embeddings now integrate with the Hugging Face Hub! Easily share and load embeddings indexes. There is a full embeddings index available for English Wikipedia.
See below for full details on the new features, improvements and bug fixes.
New Features
- Add translation pipeline parameter to return selected models and detected language - Thank you @saucam! (#383, #424)
- Add sample parameter to Faiss ANN (#427)
- Add support for instruction-based embeddings (#428)
- Add Hugging Face Hub integration (#430)
- Add cloud object storage support for uncompressed embeddings indexes (#431)
- Add support for custom cloud providers (#432)
- Add support for storing embeddings config as JSON (#433)
- Add notebook for syncing embeddings with the cloud (#434)
- Add terms method to embeddings (#445)
- Add extractor reference output format (#446)
- Add template task (#448)
- Add prompt template and task chaining example notebook (#451)
Improvements
- Mention the default storage engine - Thank you @hsm207! (#422)
- Refactor archive module into separate package (#429)
- Resolve application references in pipelines (#441)
- Extractor pipeline improvements (#443)
- Allow task action arguments to be dictionaries in addition to tuples (#447)
- Automatically mark embeddings index files for lfs tracking with Hugging Face Hub (#450)
Bug Fixes
- Pin onnxruntime for macOS in build script (#425)
Published by davidmezzetti almost 3 years ago
txtai - v5.3.0
This release adds embeddings-guided and prompt-driven search along with a number of methods to train language models
Prompt-driven search is a big step forward towards conversational search in txtai. With this release, complex prompts can now be passed to txtai to customize how search results are returned. Lots of exciting possibilities on this front, stay tuned.
The trainer pipeline now has support for training language models from scratch. It supports masked language modeling (MLM), causal language modeling (CLM) and replaced token detection (ELECTRA-style). This is part of the micromodels effort.
See below for full details on the new features, improvements and bug fixes.
New Features
- Add language modeling task to HFTrainer (#403)
- Add language modeling example notebook (#408)
- Add FAQ section to documentation (#413)
- Add language generation task to HFTrainer (#414)
- Add replaced token detection task to HFTrainer (#415)
- Add generator pipeline for text generation (#416)
- Add notebook for embeddings-guided and prompt-driven search with LLMs (#418)
Improvements
- Normalize BM25 and TF-IDF scores (#401)
- Add note to restart kernel if running in Google Colab - Thank you @hsm207! (#410)
- Add clear error when starting API and config file not found (#412)
- Extractor pipeline 2.0 (#417)
- Make texts parameter optional for extractor pipeline in applications (#420)
Bug Fixes
- Fix issue with ORDER BY case sensitivity (#405)
Published by davidmezzetti about 3 years ago
txtai - v5.2.0
This release adds TextToSpeech and Cross-Encoder pipelines. The performance of the embeddings.batchtransform method was significantly improved, enabling a speed up in building semantic graphs. Default configuration is now available for Embeddings, allowing an Embeddings instance to be created with no arguments like Pipelines.
See below for full details on the new features, improvements and bug fixes.
New Features
- Add Cross-Encoder support to Similarity pipeline (#372)
- Create compression package (#376)
- Add TextToSpeech pipeline (#389)
- Add TextToSpeech Notebook (#391)
- Add default configuration for Embeddings (#393)
Improvements
- Filter HF API list models request (#381)
- Split pipeline extras by function area (#387)
- Update data package to handle label arrays (#388)
- Modify transcription pipeline to accept raw waveform data (#390)
- Transcription pipeline improvements (#392)
- Allow searching by embedding (#396)
- Modified logger configuration in __init__.py (libraries shouldn't modify root logger) - Thank you @adin786! (#397)
- Pass evaluation metrics to underlying Trainer (#398)
- Improve batchtransform performance (#399)
Bug Fixes
- Example 31 - Duplicate image detection not working (#357)
- All sorts of issues with Example 18 - Export and run models with ONNX (#369)
- Fix issue with select distinct (#379)
- Update build script and tests to address issues with latest version of FastAPI (#380)
- Fix issue with similar and bracket SQL expressions embedded in functions (#382)
- Fix bug with embeddings functions and application config (#400)
Published by davidmezzetti about 3 years ago
txtai - v5.1.0
This release adds new model support for the translation pipeline, OpenAI Whisper support in the transcription pipeline and ARM Docker images. Topic modeling was also updated with improvements, including how to use BM25/TF-IDF indexes to drive topic models.
See below for full details on the new features, improvements and bug fixes.
New Features
- Multiarch docker image (#324)
- Add notebook covering classic topic modeling with BM25 (#360)
Improvements
- Read authentication parameters from storage task (#332)
- Update scoring algorithms (#351)
- Add config option for list of stopwords to ignore with topic generation (#352)
- Allow for setting custom translation model path (#355)
- Update caption pipeline to call image-to-text pipeline (#361)
- Update transcription pipeline to call automatic-speech-recognition pipeline (#362)
- Only pass tokenizer to pipeline when necessary (#363)
- Improve default max length logic for text generation (#364)
- Update transcription notebook (#365)
- Update translation notebook (#366)
- Move mkdocs dependencies from docs.yml to setup.py (#368)
Bug Fixes
- GitHub Actions build error with torch 1.12 on macOS (#300)
- SQLite JSON support not built into Python Windows builds < 3.9 (#356)
- Use tags field in application.add (#359)
- Fix issue with Application autosequencing (#367)
Published by davidmezzetti over 3 years ago
txtai - v5.0.0
🥳 We're excited to announce the release of txtai 5.0! 🥳
Thank you to the txtai community! Please remember to ⭐ txtai!
txtai 5.0 is a major new release. This release adds the semantic graph along with enabling external integrations. It also adds a number of improvements and bug fixes.
New Features
- Add scoring-based search (#327)
- Add notebook demonstrating functionality of individual embeddings components (#328)
- Add SQL expression columns (#338)
- Add semantic graph component (#339)
- Add notebook covering Semantic Graphs (#341)
- Add graph documentation (#343)
- Allow custom ann, database and graph instances (#344)
Improvements
- Clarify embeddings.save documentation (#325)
- Modify embeddings search candidate default logic (#326)
- Update console to conditionally import library (#333)
- Update ANN package to make terminology more consistent (#334)
- Support non-text document elements in Applications (#335)
- Update workflow documentation to note generator execution (#336)
- Update audio transcription notebook to include example with OpenAI Whisper (#345)
Bug Fixes
- Calling scoring.index with no tokens parsed results in error (#337)
- Fix cached_path error with transformers v4.22 (#340)
- Fix docker command "--it". Thank you to @lipusz! (#346)
- Error loading compressed indexes in console (#347)
Published by davidmezzetti over 3 years ago
txtai - v4.6.0
🥳 txtai turns 2 🥳
We're excited to release the 25th version of txtai, marking its 2 year anniversary. Thank you to the txtai community. Please remember to ⭐ txtai!
txtai 4.6 is a large but backwards compatible release! This release adds better integration between embeddings and workflows. It also adds a number of significant performance improvements and bug fixes.
New Features
- Add transform workflow action to application (#281)
- Add ability to resolve workflows within applications (#290)
- OFFSET in sql query statement (#293)
- Add webpage summary image generation notebook (#299)
- Add notebook on running txtai with native code (#304)
- Add mmap parameter to Faiss (#308)
- Add indexing guide to docs (#312)
Improvements
- Consume generator outputs in workflow tasks (#291)
- Update pipeline workflow notebook (#292)
- Update tabular notebook (#297)
- Lower required version of Pillow library to prevent unnecessary upgrades (#303)
- Embeddings vector batch improvements (#309)
- Use single constant for current pickle protocol (#310)
- Move quantize config param to Faiss (#311)
- Update documentation with new demo and diagrams (#313)
- Improve embeddings performance with large query limits (#318)
Bug Fixes
- ModuleNotFoundError: No module named 'transformers.hf_api' (#274)
- Dependency issue with ONNX and Protobuf (#285)
- The key should be writable instead of path. Thank you to @csnelsonchu! (#287)
- Fix breaking change in build script from mkdocstrings bug (#289)
- Index id sync issue when inserting multiple data types (text, documents, objects) into Embeddings (#294)
- Labels pipeline outputs changed with transformers 4.20.0 (#295)
- Tabular pipeline throws error when processing list fields (#296)
- txtai load testing (#305)
- Add cloud config to application.upsert method (#306)
Published by davidmezzetti over 3 years ago
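The OFFSET support listed above (#293) makes it possible to page through query results. As a rough illustration of the semantics only, the sketch below uses Python's built-in sqlite3 module rather than txtai itself; the `sections` table, its columns and the `page` helper are made up for the example.

```python
import sqlite3

# In-memory database standing in for a content store (hypothetical schema)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sections (id INTEGER PRIMARY KEY, text TEXT)")
connection.executemany(
    "INSERT INTO sections (id, text) VALUES (?, ?)",
    [(i, f"document {i}") for i in range(10)],
)

def page(number, size=3):
    """Return one page of rows, skipping the rows that belong to earlier pages."""
    cursor = connection.execute(
        "SELECT id, text FROM sections ORDER BY id LIMIT ? OFFSET ?",
        (size, number * size),
    )
    return cursor.fetchall()

first = page(0)   # rows 0-2
second = page(1)  # rows 3-5
```

Pairing OFFSET with an explicit ORDER BY keeps pages stable between calls; without an ordering, the rows skipped by OFFSET are not guaranteed to be the same each time.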
txtai - v4.5.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add scripts to train bashsql query translation model (#271)
- Add QA database example notebook (#272)
- Add CITATION file (#273)
Improvements
- Improve efficiency of external vectors (#275)
- Refactor vectors package to improve code reuse (#276)
- Add logic to detect external vectors method (#277)
Bug Fixes
- Fix summary pipeline issue with transformers>=4.19.0 (#278)
Published by davidmezzetti almost 4 years ago
txtai - v4.4.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add semantic search explainability (#248)
- Add notebook covering model explainability (#249)
- Add txtai console (#252)
- Add sequences pipeline (#261)
- Add scripts to train query translation models (#265)
- Add query translation logic in embeddings searches (#266)
- Add notebook for query translation (#269)
Improvements
- Update HFTrainer to support sequence-sequence models (#262)
Bug Fixes
- Unit tests failing with tokenizers>= 0.12 (#253)
- Running default.config.yml returns TypeError: register() got an unexpected keyword argument 'ids' (#256)
- Unit tests failing with transformers==4.18.0 (#258)
- Update precommit to use latest version of psf black (#259)
Published by davidmezzetti almost 4 years ago
txtai - v4.3.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add notebook covering txtai embeddings index file structure (#237)
- Add Image Hash pipeline (#240)
- Add support for custom SQL functions in embeddings queries (#241)
- Add notebook for Embeddings SQL functions (#243)
- Add notebook for near-duplicate image detection (#244)
Improvements
- Rename SQLException to SQLError (#232)
- Refactor API instance into a separate package (#233)
- API should raise an error if attempting to modify a read-only index (#235)
- Add last update field to index metadata (#236)
- Update transcription pipeline to use AutoModelForCTC (#238)
Bug Fixes
- Ensure limit always set in embeddings search/batchsearch (#234)
- Fix issue with parsing multiline SQL statements bug (#242)
Published by davidmezzetti almost 4 years ago
txtai - v4.2.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add notebook for workflow notifications (#225)
- Add default and custom docker configurations (#226)
- Create docker configuration for AWS Lambda (#228)
- Add support for loading/storing embedding indexes on cloud storage (#229)
Improvements
- Add support for SQL || operator (#223)
- Add flag to disable loading index data in API (#230)
Bug Fixes
- Modify database decoder methods to check for None (#220)
- Modify embeddings search to make return type consistent when index initialized and not initialized (#221)
- Embeddings index returning malformed JSON errors in certain situations (#222)
- Check for empty documents input before indexing (#224)
Published by davidmezzetti about 4 years ago
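For readers unfamiliar with the SQL `||` operator mentioned in #223, it concatenates string values. The sketch below shows the standard behavior using Python's built-in sqlite3 module; it does not call txtai, and the literal values are placeholders.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# || joins the operands into a single string
row = connection.execute("SELECT 'txtai' || ' ' || '4.2'").fetchone()

# Concatenating with NULL yields NULL (None in Python)
empty = connection.execute("SELECT 'txtai' || NULL").fetchone()
```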
txtai - v4.1.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add entity extraction pipeline (#203)
- Add workflow scheduling (#206)
- Add workflow search task to API (#210)
- Add Console Task (#215)
- Add Export Task (#216)
- Add notebook for workflow scheduling (#218)
Improvements
- Default documentation theme using system preference (#197)
- Improve multi-user experience for workflow application (#198)
- Documentation improvements (#200)
- Add social preview image for documentation (#201)
- Add links to txtai in all example notebooks (#202)
- Add limit parameter to API search method (#208)
- Add documentation on local API instances (#209)
- Add shorthand syntax for creating workflow tasks in API (#211)
- Accept functions as workflow task actions in API (#213)
Bug Fixes
- Object detection model fails to load additional models (#204)
- Update unit tests to limit cpu usage for word vector tests (#207)
- Add better error handling around unindexed embedding instances (#212)
- Fix issue when workflow task generates no output (#214)
- Add lock to API search methods (#217)
Published by davidmezzetti about 4 years ago
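The lock added to the API search methods (#217) follows a common pattern: serialize access to a shared resource that may not be safe to call from several threads at once. A minimal sketch of that pattern, assuming a hypothetical `GuardedIndex` class that is not part of txtai:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class GuardedIndex:
    """Hypothetical stand-in for an index that is not safe to call concurrently."""

    def __init__(self):
        self.lock = threading.Lock()
        self.calls = 0

    def search(self, query):
        # Only one thread at a time runs the code inside the lock
        with self.lock:
            self.calls += 1
            return query.lower()

index = GuardedIndex()

# Simulate concurrent API requests hitting the same index
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(index.search, ["A", "B", "C", "D"]))
```

The trade-off is throughput: requests queue behind the lock, which is usually acceptable when correctness of the shared index matters more than parallel reads.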
txtai - v4.0.0
🥳 We're excited to announce the release of txtai 4.0! 🥳
Thank you to the growing txtai community. This couldn't be done without you. Please remember to ⭐ txtai if it has been helpful.
txtai 4.0 is a major release with a significant number of new features. This release adds content storage, querying with sql, object storage, reindexing, index compression, external vectors and more!
To quantify the changes, the code base increased by 50% with 36 resolved issues, making this by far the biggest release of txtai. These changes were designed to be fully backward compatible, but keep in mind it is a new major release.
What's new in txtai 4.0 covers all the changes with detailed examples. The documentation site has also been refreshed.
New Features
- Store text content (#168)
- Add option to index dictionaries of content (#169)
- Add SQL support for generating combined embeddings + database queries (#170)
- Add reindex method to embeddings (#171)
- Add index archive support (#172)
- Add close method to embeddings (#173)
- Update API to work with embeddings + database search (#176)
- Add content option to tabular pipeline (#177)
- Update workflow example to support embeddings content (#179)
- Add index metadata to embeddings config (#180)
- Add object storage (#183)
- Aggregate partial query results when clustering (#184)
- Add function parameter to embeddings reindex (#185)
- Add support for user defined column aliases (#186)
- Use SQL bracket notation to support multi word and more complex JSON path expressions (#187)
- Support SQLite 3.22+ (#190)
- Add pre-computed vector support (#192)
- Change document/object inserts to only keep latest record (#193)
- Update documentation with 4.0 changes (#196)
Improvements
- Modify workflow to select batches with slices (#158)
- Add tensor support to workflows (#159)
- Read YAML config if provided as a file path (#162)
- Make adding pipelines to API easier (#163)
- Process task actions concurrently (#164)
- Add tensor workflow notebook (#167)
- Update default ANN parameters (#174)
- Require Python 3.7+ (#175)
- Consistently name embeddings id fields (#178)
- Add txtai version attribute (#181)
- Refresh notebooks for 4.0 (#188)
- Modify embeddings to only iterate over input documents once (#189)
- Improve efficiency of vector transformations (#191)
Bug Fixes
- Add thread lock around API write calls (#160)
- Expose caption and objects pipeline via API (#161)
- Change pickle calls to use protocol supporting lowest Python version (#182)
- HFOnnx expects ORT provider bug (#195)
Published by davidmezzetti about 4 years ago
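One item above, #182, pins pickle calls to a protocol that the lowest supported Python version can read. With Python 3.7+ required (#175), protocol 4 is the highest protocol that version understands, since protocol 5 arrived in Python 3.8. A minimal sketch of the idea; the constant and helper names are hypothetical and not taken from the txtai source.

```python
import pickle

# Hypothetical constant: protocol 4 is readable by Python 3.4 and later,
# while protocol 5 requires Python 3.8+
PICKLE_PROTOCOL = 4

def serialize(obj):
    """Pickle with a fixed protocol so older interpreters can load the data."""
    return pickle.dumps(obj, protocol=PICKLE_PROTOCOL)

def deserialize(data):
    """Load pickled data; the protocol is detected from the stream itself."""
    return pickle.loads(data)

config = {"path": "model", "content": True}
data = serialize(config)
restored = deserialize(data)
```

Fixing the protocol matters when an index is written on a newer interpreter and read on an older one; relying on `pickle.HIGHEST_PROTOCOL` would tie the file to the writer's Python version.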
txtai - v3.7.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add object detection pipeline (#148)
- Add image caption pipeline (#149)
- Add retrieval task (#150)
- Add no-op pipeline (#152)
- Add new workflow functionality (#155)
Improvements
- Add Korean translation to README.md. Thank you @0206pdh! (#138)
- Add links to external articles (#139)
- Update example applications to be consistent (#140)
- Add an article summarization example (#144)
- Add fallback mode for textractor (#145)
- Reorganize pipeline package (#147)
- Update optional package tests to simulate missing packages (#154)
- Add parameter to flatten labels output (#153)
- Update documentation with latest changes (#156)
Bug Fixes
- Fix bug with importing service task when workflow extra not installed (#146)
- Fix inconsistencies with url based tasks (#151)
Published by davidmezzetti over 4 years ago
txtai - v3.6.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add post workflow action to API (#129)
- Add tabular pipeline (#134)
- Enhance ServiceTask to support additional use cases (#135)
- Add notebook for tabular pipeline (#136)
- Add topn option to extractor pipeline (#137)
Improvements
- Refactor registering new auto models to use methods in Transformers library (#128)
- Update workflow example application (#130)
Bug Fixes
- No issues this release
Published by davidmezzetti over 4 years ago
txtai - v3.5.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add scikit-learn to ONNX export pipeline (#124)
- Add registry methods for auto models (#126)
- Add notebook to demonstrate loading scikit-learn and PyTorch models (#127)
Improvements
- Add parameter to return raw model outputs for labels pipeline (#123)
- Add parameter to use standard pooling for TransformersVectors (#125)
Bug Fixes
- Pass model configuration to ONNX Models (#121)
- Fix incorrect import in Notebooks (#122)
Published by davidmezzetti over 4 years ago
txtai - v3.4.0
This release adds the following new features, improvements and bug fixes.
New Features
- Create notebook using extractive qa to build structured data (#117)
- Modify extractor pipeline to support similarity pipeline backed context (#119)
Improvements
- Improve performance of extractor context queries (#120)
Bug Fixes
- Update labels pipeline to filter text classification output (#116)
- Fix issues with Transformers 4.11.2 (#118)
Published by davidmezzetti over 4 years ago
txtai - v3.3.0
This release adds the following new features, improvements and bug fixes.
New Features
- Add ONNX export pipeline (#107)
- Add notebook for ONNX pipeline (#108)
- Add ONNX support for Embeddings and Pipelines (#109)
- Support QA models in Trainer pipeline (#111)
- Add notebook for training QA models (#115)
Improvements
- Remove deprecated packages (#114)
Bug Fixes
- Fix issues with latest Transformers version (#110)
Published by davidmezzetti over 4 years ago
txtai - v3.2.0
This release adds the following new features, improvements and bug fixes.
New Features
- Enhance Labels pipeline to support standard text classification models (#95)
- Add Trainer pipeline (#96)
- Modularize txtai install (#97)
- Evaluate if faiss-cpu can be used as default across all platforms (#98)
- Add vector method for sentence-transformers (#101)
Improvements
- Add book search example application (#91)
- Add wiki search example application (#92)
- Change tokenization to default to false for TransformerVectors (#99)
- Infer vector method using path (#100)
- Improve performance when running models through transformers (#102)
- Update notebooks and example applications (#103)
Bug Fixes
- Clear workflow batch during processing bug (#90)
Published by davidmezzetti over 4 years ago
txtai - v3.1.0
This release adds the following new features:
- Add support for update/delete embeddings index operations (#86)
- Add Embeddings Cluster component (#87)
- Switch default backend on Windows to Hnswlib (#88)
- Add notebook covering distributed embedding clusters (#89)
Published by davidmezzetti almost 5 years ago
txtai - v3.0.0
txtai 3.0.0 is a major release with a significant number of new features. This release overhauls the project structure, consolidates logic into pipelines and introduces workflows.
Summary of txtai features:
- Large-scale similarity search with multiple index backends (Faiss, Annoy, Hnswlib)
- Create embeddings for text snippets, documents, audio and images. Supports transformers and word vectors.
- Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction
- Workflows that join pipelines together to aggregate business logic. txtai processes can be microservices or full-fledged indexing workflows.
- API bindings for JavaScript, Java, Rust and Go
- Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes)
New Features
- Add Docker file for API (#59)
- Require Faiss 1.7.0 (#60)
- Add summary pipeline (#65)
- Add text extraction pipeline (#66)
- Add transcription pipeline (#67)
- Add translation pipeline (#68)
- Add workflow framework (#69)
- Add additional pipeline abstraction layer for tensor frameworks (#70)
- Add tests for new v3 functionality (#71)
- Add notebooks covering new v3 functionality (#73)
- Add Pipeline Factory (#76)
- Add API extensions (#77)
- Add workflow builder application (#80)
- Add text segmentation pipeline (#81)
- Add workflow to API (#82)
- Add service workflow task (#83)
- Add object storage workflow task (#84)
- Add URL workflow task (#85)
Improvements
- Refactor code into smaller components and modules (#63)
- Modify pipeline to accept GPU device id (#64)
- Allow direct download of sentence-transformer models (#72)
- Update documentation, add site through GitHub pages (#75)
- Modularize the API (#78)
- Add default truncation to pipelines (#79)
Bug Fixes
- Non intuitive behaviour of Tokenizer (#61)
- [Python 3.9, Mac OS] Code hangs while building embedding index (#62)
- embeddings.index Truncation RuntimeError: The size of tensor a (889) must match the size of tensor b (512) at non-singleton dimension 1 (#74)
Published by davidmezzetti almost 5 years ago
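The workflow idea introduced in 3.0, joining pipelines so that the output of one feeds the next, can be sketched in plain Python. This is a conceptual illustration only and not txtai's workflow API; the `batches` and `run` helpers and the two toy tasks are hypothetical.

```python
from itertools import islice

def batches(elements, size):
    """Yield successive lists of at most `size` elements."""
    iterator = iter(elements)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def run(tasks, elements, batch=2):
    """Pass each batch through every task in order and stream the results."""
    for chunk in batches(elements, batch):
        for task in tasks:
            chunk = task(chunk)
        yield from chunk

# Each task takes a list of elements and returns a list of transformed elements
tasks = [
    lambda texts: [text.strip() for text in texts],
    lambda texts: [text.upper() for text in texts],
]

results = list(run(tasks, ["  hello ", "world  ", " txtai"]))
```

Because `run` is a generator, nothing executes until the results are consumed, and input is processed one batch at a time rather than loaded all at once.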
txtai - v2.0.0
txtai 2.0.0 is a major release with a significant number of new features. This release brings a new zero-shot similarity pipeline, a more streamlined and consistent API, batch support for all modules and integration with Hugging Face Datasets.
In addition to Python, txtai has API support for JavaScript, Java, Rust and Go.
New Features
- [BREAKING CHANGES] Make API definitions consistent (#54)
- Zero-shot similarity pipeline (#21, #49)
- Add batch support for all modules (#18, #53)
- Add example notebook integrating Hugging Face Datasets (#26)
- Add example notebook that adds semantic search to existing system (#57)
Improvements
- Add API tests, increase test coverage (#42)
- Refactor pipeline component (#44)
- Upgrade to Transformers 4.x (#45)
- Review, organize and update example notebooks (#52)
- Allow setting ANN index parameters (#55)
- Modify API add method to stream data (#56)
Bug Fixes
- Fix language support issues (#39, #43)
Published by davidmezzetti about 5 years ago
txtai - v1.3.0
This release adds the following enhancements and bug fixes:
- Added FastAPI interface (#12)
- Fix tokenization error in notebook (#28)
- Added text labeling interface using zero shot classifier (#30)
- Update macOS version in Travis CI script
Published by davidmezzetti over 5 years ago
txtai - v1.2.0
This release adds the following enhancements and bug fixes:
- Add unit tests and integrate Travis CI (#7)
- Add documentation for Embeddings settings to README (#11)
- Compatibility issues with transformers 3.1 and sentence-transformers (#20)
- Add batch indexing for transformer indices (#23)
- Add option to store word vectors with embeddings model (#24)
Published by davidmezzetti over 5 years ago
txtai - v1.1.0
This release adds the following enhancements and bug fixes:
- Fully support Windows and macOS installs (#1, #2, #8, #9)
- Add support for additional index backends, Annoy (#4) and hnswlib (#5)
- Support string ids (#6)
- Enable flag to enable/disable Faiss SQ8 quantization (#10)
Published by davidmezzetti over 5 years ago