Recent Releases of contextgem

contextgem - v0.18.0

Added

  • Chat: Added an optional chat_session parameter (accepting a ChatSession) to DocumentLLM.chat() to preserve message history across turns. When the parameter is omitted, the chat is single-turn, with no message history.

- Python
Published by SergiiShcherbak 6 months ago

contextgem - v0.17.1

Changed

  • DocxConverter: Conversion speed improved by ~2X, significantly reducing processing time for DOCX files.

Published by SergiiShcherbak 6 months ago

contextgem - v0.17.0

Added

  • Multimodal LLM roles ("extractor_multimodal" and "reasoner_multimodal") to support extraction of multimodal document-level concepts from both text and images. Previously, only text and vision roles were supported, which required choosing either text or image context for extraction, not both.

Published by SergiiShcherbak 6 months ago

contextgem - v0.16.1

Fixed

  • Added support for "minimal" reasoning effort for gpt-5 models.

Published by SergiiShcherbak 6 months ago

contextgem - v0.16.0

Added

  • Reasoning-aware extraction prompts: Automatically enables private chain-of-thought guidance on models that support reasoning, yielding higher-quality outputs (no change for other models).

Published by SergiiShcherbak 6 months ago

contextgem - v0.15.0

Added

  • Auto-pricing for LLMs: enable via auto_pricing=True to automatically estimate costs using Pydantic's genai-prices; optional auto_pricing_refresh=True refreshes cached price data at runtime.

Refactor

  • Public API made more consistent and stable: user-facing classes are now thin, well-documented facades over internal implementations. No behavior changes.
  • Internal reorganization for maintainability and future-proofing.

Docs

  • Added guidance for configuring auto-pricing for LLMs.

Published by SergiiShcherbak 7 months ago

contextgem - v0.14.4

Fixed

  • Suppressed noisy LiteLLM proxy missing-dependency error logs (which prompt to install litellm[proxy]) emitted by litellm>=1.75.2 during LLM API calls. ContextGem does not require LiteLLM proxy features. Suppression is scoped to LiteLLM loggers.
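On versions without this fix, the same noise can be silenced manually with the standard logging module; a sketch assuming the messages go through LiteLLM's own loggers (the exact logger names are assumptions):

```python
import logging

# Raise the threshold on LiteLLM's loggers so the missing-dependency
# errors about litellm[proxy] are not printed during LLM API calls.
for name in ("LiteLLM", "LiteLLM Proxy", "LiteLLM Router"):
    logging.getLogger(name).setLevel(logging.CRITICAL)
```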

Published by SergiiShcherbak 7 months ago

contextgem - v0.14.3

Fixed

  • Enabled reasoning_effort parameter for gpt-5 models by explicitly forwarding it via allowed_openai_params, since litellm.get_supported_openai_params() does not yet include this parameter for gpt-5 models.

Published by SergiiShcherbak 7 months ago

contextgem - v0.14.2

Added

  • Added a warning for gpt-oss models used with the lm_studio/ provider due to performance issues observed in tests, with a recommendation to use Ollama as a working alternative (e.g., ollama_chat/gpt-oss:20b).

Published by SergiiShcherbak 7 months ago

contextgem - v0.14.1

Added

  • Added step-by-step usage guide in README, with brief descriptions of core components.
  • Added new documentation on documents, extraction pipelines, and logging configuration.

Changed

  • Renamed DocumentPipeline to ExtractionPipeline to better reflect its purpose and scope. DocumentPipeline is maintained as a deprecated wrapper class for backwards compatibility until v1.0.0.
  • Simplified logging config to use a single environment variable.
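A sketch of the single-variable setup; the variable name CONTEXTGEM_LOGGER_LEVEL is an assumption (check the logging configuration docs for the exact name), and it must be set before contextgem is imported:

```python
import os

# Hypothetical variable name -- check the logging configuration docs.
# Set before importing contextgem so the logger picks it up at import time.
os.environ["CONTEXTGEM_LOGGER_LEVEL"] = "WARNING"
```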

Published by SergiiShcherbak 7 months ago

contextgem - v0.14.0

Added

  • Added utility function create_image() for flexible image creation from various sources (file paths, PIL objects, file-like objects, raw bytes) with automatic MIME type detection.

Changed

  • Updated image_to_base64() utility function to accept more image source types (file-like objects and raw bytes) in addition to file paths.
  • Made temperature and top_p parameters for DocumentLLM optional.

Published by SergiiShcherbak 7 months ago

contextgem - v0.13.0

Changed

  • Enhanced LLM prompts with XML tags for improved instruction clarity and higher-quality extraction outputs.
  • Updated LabelConcept documentation with clearer distinction between multi-label and multi-class classification types.

Fixed

  • Fixed a bug where LabelConcept with the multi-class classification type did not always return a label, even though multi-class classification is expected to always select one.

Published by SergiiShcherbak 7 months ago

contextgem - v0.12.1

Added

  • Explicit declaration of model vision capabilities: Added support for explicitly declaring vision capability when litellm.supports_vision() does not correctly identify a model's vision support. If an LLM is configured as a vision model and genuinely supports vision, but litellm fails to detect this capability, a warning is issued. Users can manually set _supports_vision=True on the model instance to declare the capability and allow the model to accept image inputs.
  • Warning for Ollama vision models: Added a warning prompting users to use the ollama/ prefix instead of ollama_chat/ for vision models, as ollama_chat/ does not currently support image inputs.

Published by SergiiShcherbak 7 months ago

contextgem - v0.12.0

Fixed

  • BooleanConcept extraction for valid False values: Improved instructions in the concept extraction prompt to fix a bug where no items were extracted for a BooleanConcept whose expected valid value was False. The concept could be incorrectly treated as "not addressed", resulting in empty extraction results.

Changed

  • Enhanced documentation: Added more details to parameter tables for Aspects API, Concepts API, and LLM config documentation.

Published by SergiiShcherbak 7 months ago

contextgem - v0.11.1

Fixed

  • Allow disabling the system message (e.g., for basic chat interactions): DocumentLLM now accepts an empty string as the system message, which prevents any system message from being sent to the LLM. A warning is now issued in llm.chat()/llm.chat_async() when the default system message (optimized for extraction tasks) is used. Initialization sets the default system message only when needed, keeping basic chat flexible.

Published by SergiiShcherbak 8 months ago

contextgem - v0.11.0

Added support for litellm versions >1.71.1: ContextGem now supports newer litellm versions that were previously incompatible with its tests due to underlying transport changes (removal of the httpx-aiohttp dependency) introduced after v1.71.1, which affected the VCR recording used in testing.

Published by SergiiShcherbak 8 months ago

contextgem - v0.10.0

Added

  • RatingConcept now supports tuple format for rating scales: Use (start, end) tuples instead of RatingScale objects for simpler API. Example: rating_scale=(1, 5) instead of rating_scale=RatingScale(start=1, end=5).

Deprecated

  • RatingScale class is deprecated and will be removed in v1.0.0. Use tuple format (start, end) instead for rating scales in RatingConcept.

Published by SergiiShcherbak 8 months ago

contextgem - v0.9.0

New exception handling for LLM extraction methods: Added raise_exception_on_extraction_error parameter (default is True) to LLM extraction methods. Controls whether to raise an exception when LLM returns invalid data (LLMExtractionError) or when there is an error calling LLM API (LLMAPIError). When False, warnings are issued and no extracted items are returned.

Published by SergiiShcherbak 8 months ago

contextgem - v0.8.2

Improved prompts.

Published by SergiiShcherbak 8 months ago

contextgem - v0.8.1

Added documentation on troubleshooting issues with small models.

Published by SergiiShcherbak 8 months ago

contextgem - v0.8.0

Deferred SaT segmentation: SaT segmentation is now performed only when actually needed, improving both document initialization and extraction performance, as some extraction workflows may not require SaT segmentation.

Published by SergiiShcherbak 8 months ago

contextgem - v0.7.0

Added

  • DocxConverter upgrade: migrated to high-performance lxml library for parsing DOCX document structure, added processing of links and inline formatting, improved conversion accuracy.
  • Integrated Bandit security scanning across development pipeline.

Changed

  • Updated documentation to reflect the above changes.

Published by SergiiShcherbak 9 months ago

contextgem - v0.6.1

Updated documentation for LM Studio models to clarify dummy API key requirement.

Published by SergiiShcherbak 9 months ago

contextgem - v0.6.0

Added LabelConcept - a classification concept type that categorizes content using predefined labels.

Published by SergiiShcherbak 9 months ago

contextgem - v0.5.0

Fixed params handling for reasoning (CoT-capable) models other than the OpenAI o-series. Enabled automatic retry of LLM calls, dropping unsupported params when such params were set for the model. Improved handling and validation of LLM call params.

Migrated to wtpsplit-lite - a lightweight version of wtpsplit that retains only the accelerated ONNX inference of SaT models, with minimal dependencies.

Published by SergiiShcherbak 9 months ago

contextgem - v0.4.1

Comprehensive docs on extracting aspects, extracting concepts, and LLM extraction methods.

Published by SergiiShcherbak 9 months ago

contextgem - v0.4.0

Support for local SaT model paths in Document's sat_model_id parameter.

Published by SergiiShcherbak 9 months ago

contextgem - v0.3.0

Expanded JsonObjectConcept to support nested class hierarchies, nested dictionary structures, lists containing objects, and literal types.

Published by SergiiShcherbak 9 months ago

contextgem - v0.2.4

Fixed

  • Removed 'think' tags and their content from LLM outputs (e.g. when using DeepSeek R1 via Ollama), which broke JSON parsing and validation.

Added

  • Documentation for cloud/local LLMs and LLM configuration guide

Published by SergiiShcherbak 10 months ago

contextgem - v0.2.3

Updated litellm dependency version after encoding bug has been fixed upstream. Updated README.

Published by SergiiShcherbak 10 months ago

contextgem - v0.2.2

Refactored DOCX converter internals for better maintainability. Updated README. Added CHANGELOG.

Published by SergiiShcherbak 10 months ago

contextgem - v0.2.1

Fix: encoding bug in litellm > v1.67.1. Docs update.

Published by SergiiShcherbak 10 months ago

contextgem - v0.2.0

DocxConverter - convert DOCX files to ContextGem Document objects, while preserving document structure with rich metadata for improved LLM analysis.

Published by SergiiShcherbak 10 months ago

contextgem - v0.1.2

Fixed Azure OpenAI params for o1/o3. Added reasoning_effort param for o1/o3. Added LLM chat. Updated docs & tests.

Published by SergiiShcherbak 11 months ago

contextgem - v0.1.1.post1

Updated docs and dev setup

Published by SergiiShcherbak 11 months ago

contextgem - Release 0.1.1

Release v0.1.1: Updated dependencies, README, docs. Added security info.

Published by SergiiShcherbak 11 months ago

contextgem - Initial release v0.1.0

Release version 0.1.0

Published by SergiiShcherbak 11 months ago