Recent Releases of kani
kani - v1.6.0
New feature: Multimodal inputs
kani-multimodal-core should be installed alongside the core kani install using an extra:
```shell
$ pip install "kani[multimodal]"
```
However, you can also explicitly specify a version and install the core package itself:
```shell
$ pip install kani-multimodal-core
```
Features
This package provides the core multimodal extensions that engine implementations can use -- it does not provide any engine implementations on its own.
The package adds support for:
- Images (`kani.ext.multimodal_core.ImagePart`)
- Audio (`kani.ext.multimodal_core.AudioPart`)
- Video (`kani.ext.multimodal_core.VideoPart`)
- Other binary files, such as PDFs (`kani.ext.multimodal_core.BinaryFilePart`)
When installed, these core kani engines will automatically use the multimodal parts:
- OpenAIEngine
- AnthropicEngine
- GoogleAIEngine
Additionally, the core kani chat_in_terminal method will support attaching multimodal data from a local drive or
from the internet using @/path/to/media or @https://example.com/media.
Message Parts
The main feature you need to be familiar with is the MessagePart, the core way of sending messages to the engine.
To do this, when you call the kani round methods (i.e. Kani.chat_round or Kani.full_round or their str variants),
pass a list of multimodal parts rather than a string:
```python
from kani import Kani
from kani.engines.openai import OpenAIEngine
from kani.ext.multimodal_core import ImagePart

engine = OpenAIEngine(model="gpt-4.1-nano")
ai = Kani(engine)

# notice how the arg is a list of parts rather than a single str!
msg = await ai.chat_round_str([
    "Please describe this image:",
    ImagePart.from_file("path/to/image.png"),
])
print(msg)
```
See the docs (https://kani-multimodal-core.readthedocs.io) for more information about the provided message parts.
Terminal Utility
When installed, kani-multimodal-core augments the chat_in_terminal utility provided by kani.
This utility allows you to provide multimodal media on your disk or on the internet inline by prepending it with an @ symbol:
```pycon
>>> from kani import chat_in_terminal
>>> chat_in_terminal(ai)
USER: Please describe this image: @path/to/image.png and also this one: @https://example.com/image.png
```
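As a rough illustration of the `@` syntax (not the parser that kani-multimodal-core actually uses), the sketch below splits one line of terminal input into an ordered list of text segments and attachment references. The names `Attachment`, `ATTACHMENT_RE`, and `split_parts`, as well as the regular expression itself, are assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """Hypothetical placeholder for a file path or URL referenced with @."""
    source: str


# Assumed rule: an "@" that starts a whitespace-delimited token marks an attachment.
ATTACHMENT_RE = re.compile(r"(?<!\S)@(\S+)")


def split_parts(query: str) -> list:
    """Split a query into text segments and Attachment objects, preserving order."""
    parts = []
    pos = 0
    for match in ATTACHMENT_RE.finditer(query):
        before = query[pos:match.start()].strip()
        if before:
            parts.append(before)
        parts.append(Attachment(match.group(1)))
        pos = match.end()
    rest = query[pos:].strip()
    if rest:
        parts.append(rest)
    return parts


parts = split_parts(
    "Please describe this image: @path/to/image.png and also this one: @https://example.com/image.png"
)
for part in parts:
    print(repr(part))
```

Because the pattern requires the `@` to begin a token, text such as an email address is left untouched in this sketch.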
- Added native support for multimodal (image, video, audio) models using the `kani-multimodal-core` package (https://github.com/zhudotexe/kani-multimodal-core)!
  - The AnthropicEngine, OpenAIEngine, and GoogleAIEngine will automatically support multimodal inputs when `kani-multimodal-core` is installed
New feature: Native Google Gemini support
```shell
$ pip install "kani[google]"
```
```python
from kani import Kani
from kani.engines.google import GoogleAIEngine

engine = GoogleAIEngine(model="gemini-2.5-flash")
```
This engine supports all Google AI models through the Google AI Studio API.
See https://ai.google.dev/gemini-api/docs/models for a list of available models.
Multimodal support: images, audio, video.
- Added the `GoogleAIEngine` for Google Gemini support - supports function calling & multimodal inputs
New feature: kani CLI tool
When kani is installed, you can run $ kani provider:model-id to begin chatting with a model in your terminal!
Examples:
```shell
$ kani openai:gpt-4.1-nano
$ kani huggingface:meta-llama/Meta-Llama-3-8B-Instruct
$ kani anthropic:claude-sonnet-4-0
$ kani google:gemini-2.5-flash
```
This CLI helper automatically creates an Engine and Kani instance, and calls chat_in_terminal() so you can test LLMs faster. Just as with chat_in_terminal(), you can use @/path/to/file or @https://example.com/file to attach multimodal parts to your CLI inputs.
- Added a `kani` CLI tool for easy chatting in terminal
  - Use `@/path/to/file` or `@https://example.com/file` to upload a multimodal file to your CLI
Additional features & fixes
- Added `Message.extras` to store arbitrary additional information with a Message object
  - Certain engines will set an extra to store the raw response returned by the API (see engine docs)
  - For example, to access the detailed usage object returned by an `OpenAIEngine`, you can use:
    ```python
    msg = await ai.chat_round(...)  # or .full_round
    openai_usage = msg.extra["openai_usage"]
    ```
- Added a `save_format` parameter to `Kani.save()` to allow saving to a `.kani` file instead of `.json`
  - Saving to a `.kani` file is used by default unless the filename given to `Kani.save()` ends with `.json`
  - A `.kani` file is a ZIP file containing the saved chat state of the Kani instance
  - Certain extensions (e.g., `kani-multimodal-core`) may save additional files to the `.kani` archive to save multimodal MessageParts without inflating the size of the saved JSON file
- Fixed the kani CLI not always quitting on ^C
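To make the idea of a ZIP-based save file concrete, here is a minimal standard-library sketch of storing JSON chat state next to binary side files in one archive. It is not kani's actual `.kani` layout: the member name `state.json`, the `media/` path, and the helpers `infer_save_format`, `save_state`, and `load_state` are all hypothetical.

```python
import io
import json
import zipfile

STATE_MEMBER = "state.json"  # hypothetical member name, not kani's real layout


def infer_save_format(filename: str) -> str:
    """Mirror the rule described above: JSON only when the name ends with .json."""
    return "json" if filename.endswith(".json") else "kani"


def save_state(fp, messages: list, extra_files: dict = None) -> None:
    """Write chat state as JSON inside a ZIP archive, plus optional binary files."""
    with zipfile.ZipFile(fp, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(STATE_MEMBER, json.dumps({"messages": messages}))
        for name, data in (extra_files or {}).items():
            # Binary media stays out of the JSON, so the JSON remains small.
            zf.writestr(name, data)


def load_state(fp):
    """Read back the JSON state and any additional files from the archive."""
    with zipfile.ZipFile(fp) as zf:
        state = json.loads(zf.read(STATE_MEMBER))
        extras = {n: zf.read(n) for n in zf.namelist() if n != STATE_MEMBER}
    return state, extras


buf = io.BytesIO()
save_state(buf, [{"role": "user", "content": "hi"}], {"media/image0.png": b"\x89PNG"})
buf.seek(0)
state, extras = load_state(buf)
print(state, sorted(extras))
```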
Engine-specific
- Anthropic: Handle PDF inputs using `kani.ext.multimodal_core.BinaryFilePart`
- Hugging Face: Load on MPS by default when detected on a macOS system
- Python
Published by zhudotexe 9 months ago
kani - v1.5.0
GPT-OSS and GPT-5
GPT-OSS and GPT-5 are now supported in kani>=1.5.0! You can use this code to get started with full function calling + reasoning capabilities:
```python
from kani import Kani, chat_in_terminal
from kani.engines.huggingface import HuggingEngine
from kani.model_specific.gpt_oss import GPTOSSParser

# this method is the same for the 20B and 120B variants - simply replace the model ID!
model = HuggingEngine(
    model_id="openai/gpt-oss-20b",
    chat_template_kwargs=dict(reasoning_effort="low"),  # set this to "low", "medium", or "high"
    eos_token_id=[200002, 199999, 200012],  # ensures the model stops correctly on tool calls
    temperature=1.0,  # suggested decoding parameter
    top_k=None,  # ensure we do not use top_k (transformers default =50)
)
engine = GPTOSSParser(model, show_reasoning_in_stream=True)
ai = Kani(engine)
chat_in_terminal(ai)
```
Full release notes
- Added support for GPT-OSS with a model-specific parser/pipeline
- Added support for GPT-5 in the OpenAIEngine
- Added automatic handwritten model pipelines: when using a HuggingEngine whose model requires slightly more logic than the provided chat template, Kani automatically selects a correct handwritten prompt pipeline (in `kani.model_specific`)
- Fixed some issues with the HF Chat Template based pipeline not sending tools in the correct schema
- Fixed an issue where certain argument names could not be passed to `ToolCall.from_function`
- Fixed an issue with importing the OpenAIEngine on `openai-python>=1.99.2`
- Made the HuggingEngine return non-EOS special tokens
- Optimized the throughput and memory usage of the HuggingEngine
- BREAKING: Moved existing handwritten model pipelines from `prompts/impl` to `model_specific`
- BREAKING: Moved existing tool parsers from `tool_parsers` to `model_specific`
- BREAKING: Removed Vicuna 1.3: this model was a very old fine-tune of Llama v1, and removed to reduce the maintenance burden of the library.
Published by zhudotexe 10 months ago
kani - v1.4.3
- Llama.cpp: Add `model_path` kwarg to allow loading local GGUF models (thanks @lawrenceakka!)
[!NOTE] This is technically a minor breaking change, as the position of arguments has changed. I recommend using keyword arguments to load any models.
- Hugging Face: Do not set `max_length` generation parameter if `max_new_tokens` is set to avoid a verbose warning
- OpenAI: Add default context lengths for o-series models, GPT-4.1, add warning for models without default context lengths
Published by zhudotexe 12 months ago
kani - v1.4.1
- Added better options for controlling the JSON Schema generated by an AIFunction
- Generated JSON Schema now includes a function's docstring by default as the top-level `description` key
- Generated JSON Schema's top-level `title` key is now a function's name instead of `_FunctionSpec` by default
- Generated JSON Schema's fields only include a `title` key if a `title` kwarg is explicitly passed to `AIParam` (fixing a regression introduced some time ago)
These changes should have no effect on OpenAI function calling; these changes are made to improve compatibility with open models that use raw JSON Schema to define functions (e.g., Step-Audio).
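To picture the schema shape these notes describe (function name as the top-level `title`, docstring as the top-level `description`, no per-field `title`), here is a small standard-library sketch. It is not kani's schema generator: `describe_function`, its type mapping, and the `get_weather` example function are assumptions for illustration only.

```python
import inspect
import json

# Simplified mapping from Python annotations to JSON Schema type names.
TYPE_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_function(fn) -> dict:
    """Build a minimal JSON Schema object describing fn's parameters (hypothetical helper)."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": TYPE_NAMES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "title": fn.__name__,  # the function's name
        "description": inspect.getdoc(fn),  # the function's docstring
        "type": "object",
        "properties": properties,  # note: no per-field "title" keys
        "required": required,
    }


def get_weather(location: str, days: int = 1):
    """Get the weather forecast for a location."""


schema = describe_function(get_weather)
print(json.dumps(schema, indent=2))
```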
Published by zhudotexe about 1 year ago
kani - v1.4.0
Mainly improvements to the llama.cpp engine in this release.
Improvements
- Update the `LlamaCppEngine` to not use the Llama 2 prompt pipeline by default. Prompt pipelines must now be explicitly passed.
- The `LlamaCppEngine` will now automatically download additional GGUF shards when a sharded model is given.
- Added `ChatTemplatePromptPipeline.from_pretrained` to create a prompt pipeline from the chat template of any model on the HF Hub, by ID.
- Added examples and documentation for using DeepSeek-R1 (quantized).
Fixes
- `chat_in_terminal_async` no longer blocks the asyncio event loop when waiting for input from the terminal.
- Fixed the `LlamaCppEngine` not passing functions to the provided prompt pipeline.
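For readers wondering what "not blocking the event loop" means in practice, one common approach is to run the blocking read in a worker thread. The sketch below shows that general technique with `asyncio.to_thread`; it is not necessarily how kani implements the fix. The helper `ainput` and the stand-in reader `slow_typist` are hypothetical, and the stand-in replaces `input()` so the example runs without a terminal.

```python
import asyncio
import time


async def ainput(prompt: str = "", reader=input) -> str:
    """Read a line without blocking the event loop (hypothetical helper)."""
    # The blocking call runs in a worker thread, so other tasks keep running.
    return await asyncio.to_thread(reader, prompt)


def slow_typist(prompt: str) -> str:
    """Stand-in for input(): pretends the user takes 0.1 s to type."""
    time.sleep(0.1)
    return "hello"


async def demo(reader=slow_typist):
    ticks = 0

    async def ticker():
        # Background work that should continue while we wait for input.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(ticker())
    line = await ainput("USER: ", reader)
    task.cancel()
    return line, ticks


line, ticks = asyncio.run(demo())
print(line, ticks)
```

If the reader were called directly inside the coroutine instead, the ticker would not advance at all while waiting.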
Published by zhudotexe over 1 year ago
kani - v1.3.0
Enhancements
- Added `ToolCallParser`s -- these classes are wrappers around KaniEngines that parse raw text generated by a model, and return Kani-format tool calls. This is an easy way to enable tool calling on open-source models!
Example:
```python
from kani.engines.huggingface import HuggingEngine
from kani.prompts.impl.mistral import MISTRAL_V3_PIPELINE
from kani.tool_parsers.mistral import MistralToolCallParser

model = HuggingEngine(model_id="mistralai/Mistral-Small-Instruct-2409", prompt_pipeline=MISTRAL_V3_PIPELINE)
engine = MistralToolCallParser(model)
```
- Added `NaiveJSONToolCallParser` (e.g., Llama 3)
- Added `MistralToolCallParser`
- Added `DeepseekR1ToolCallParser`
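The general idea of turning raw model text into a structured tool call can be sketched without kani at all. The example below assumes a model that emits a bare JSON object with `name` and `parameters` keys; that output format, the `ParsedToolCall` class, and `parse_json_tool_call` are assumptions for illustration and do not reproduce the parsers listed above.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedToolCall:
    """Hypothetical container for a parsed call; not a kani class."""
    name: str
    arguments: dict = field(default_factory=dict)


def parse_json_tool_call(text: str) -> Optional[ParsedToolCall]:
    """Return a ParsedToolCall if the whole text is a JSON object with a "name" key, else None."""
    try:
        data = json.loads(text.strip())
    except json.JSONDecodeError:
        return None  # ordinary prose: treat it as a normal text reply
    if not isinstance(data, dict) or "name" not in data:
        return None
    return ParsedToolCall(name=data["name"], arguments=data.get("parameters", {}))


call = parse_json_tool_call('{"name": "get_weather", "parameters": {"location": "Tokyo"}}')
print(call)
print(parse_json_tool_call("The weather is sunny."))
```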
Bug Fixes et al.
- Fix compatibility issues with Pydantic 2.10
- Update documentation to better reflect supported HF models
Published by zhudotexe over 1 year ago
kani - v1.2.3
- Fixes Anthropic tool calling being broken with anthropic-sdk>0.26.0
- Fixes an issue where Anthropic prompts were over-eagerly trimming prompts that did not start with a user message
- Added support for tool calling while streaming with Anthropic models
Published by zhudotexe over 1 year ago
kani - v1.2.2
- fix(mistral): ensure prompt and completion tokens are passed through in the MistralFunctionCallingAdapter when streaming
- fix(streaming): don't emit text in DummyStream if it is None
- feat: add standalone width formatters
- docs: gpt-3.5-turbo -> gpt-4o-mini defaults
- fix(streaming): potential line len miscount in format_stream
Published by zhudotexe over 1 year ago
kani - v1.2.1
- Fixes various issues in the `MistralFunctionCallingAdapter` wrapper engine for Mistral-Large and Mistral-Small function calling models.
- Fixes an issue in `PromptPipeline.explain()` where manual examples would not be explained.
- Fixes an issue in `PromptPipeline.ensure_bound_function_calls()` where passing an ID translator would mutate the ID of the underlying messages
Published by zhudotexe over 1 year ago
kani - v1.2.0
New Features
- Hugging Face: Models loaded through the `HuggingEngine` now use chat templates for conversational prompting and tool usage if available by default. This should make it much easier to get started with a Hugging Face model in Kani.
- Added the ability to supply a custom tokenizer to the `OpenAIEngine` (e.g., for using OpenAI-compatible APIs)
Fixes/Improvements
- Fixed a missing dependency in the `llama` extra
- The `HuggingEngine` will now automatically set `device_map="auto"` if the `accelerate` library is installed
Published by zhudotexe over 1 year ago
kani - v1.1.0
- Added `max_function_rounds` to `Kani.full_round`, `Kani.full_round_str`, and `Kani.full_round_stream`:
  > The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined.
  > Default unlimited (continues until model's response does not contain a function call).
- Added `__repr__` to engines
- Fixed an issue where Kani could underestimate the token usage for certain OpenAI models using parallel function calling
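The `max_function_rounds` behaviour quoted above can be pictured with a toy loop. This is only a sketch of the described semantics, not kani's implementation: `run_full_round`, the `(kind, value)` tuples, and `eager_model` are hypothetical stand-ins.

```python
from itertools import count


def run_full_round(model, max_function_rounds=None) -> list:
    """Call the model until it answers with text; return every output in order.

    `model(allow_functions)` must return ("call", name) or ("text", message),
    and is assumed never to return a call when functions are not allowed.
    """
    outputs = []
    for n in count():
        # Once the limit is reached, the model gets one final turn without functions.
        allow = max_function_rounds is None or n < max_function_rounds
        kind, value = model(allow_functions=allow)
        outputs.append((kind, value))
        if kind != "call":
            return outputs


def eager_model(allow_functions: bool):
    """Fake model that calls a function whenever it is allowed to."""
    if allow_functions:
        return ("call", "search")
    return ("text", "Here is my final answer.")


limited = run_full_round(eager_model, max_function_rounds=2)
print(limited)
```

With the default of `None`, this particular fake model would loop forever, which matches the "continues until the model's response does not contain a function call" wording.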
Published by zhudotexe almost 2 years ago
kani - v1.0.2
- Add `Kani.add_completion_to_history` (useful for token counting, see #29)
- Add an example of an AIFunction definition to `PromptPipeline.explain()` when a function-related step is included
- Add `id_translator` arg to `PromptPipeline.ensure_bound_function_calls()`
- Ensure that OpenAIEngine and HuggingEngine streams yield a completion including prompt and completion token usage
- Various Mistral-7B Instruct v0.3 prompt fixes
Published by zhudotexe about 2 years ago
kani - v1.0.0
New Features
Streaming
kani now supports streaming to print tokens from the engine as they are received! Streaming is designed to be a drop-in superset of the chat_round and full_round methods, allowing you to gradually refactor your code without ever leaving it in a broken state.
To request a stream from the engine, use Kani.chat_round_stream() or Kani.full_round_stream(). These methods will return a StreamManager, which you can use in different ways to consume the stream.
The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of str.
```py
# CHAT ROUND:
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND:
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()
```
After a stream finishes, its contents will be available as a `ChatMessage`. You can retrieve the final message or BaseCompletion with:
```py
msg = await stream.message()
completion = await stream.completion()
```
The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.
[!TIP] For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:
```py
msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
```
(note the await that is not present in the above examples). This allows you to refactor your code by changing chat_round to chat_round_stream without other changes.
```diff
- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
```
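An object can be both iterable with `async for` and directly awaitable by defining `__aiter__` and `__await__`. The toy class below shows that pattern using plain strings; it is a sketch of the idea only, not kani's `StreamManager`. `MiniStream` and `fake_tokens` are hypothetical names.

```python
import asyncio


class MiniStream:
    """Toy stream: async-iterable over tokens, awaitable for the final message."""

    def __init__(self, token_source):
        self._source = token_source  # an async iterator yielding str
        self._tokens = []
        self._done = False

    async def __aiter__(self):
        async for token in self._source:
            self._tokens.append(token)
            yield token
        self._done = True

    async def message(self) -> str:
        # If nobody has iterated yet, consume the remaining tokens first.
        if not self._done:
            async for _ in self:
                pass
        return "".join(self._tokens)

    def __await__(self):
        # Awaiting the stream itself behaves like awaiting .message().
        return self.message().__await__()


async def fake_tokens():
    for token in ["African ", "or ", "European?"]:
        await asyncio.sleep(0)
        yield token


async def demo():
    stream = MiniStream(fake_tokens())
    seen = [token async for token in stream]  # iterate token by token
    msg = await stream.message()  # then fetch the assembled message
    direct = await MiniStream(fake_tokens())  # or await without iterating
    return seen, msg, direct


seen, msg, direct = asyncio.run(demo())
print(seen, msg, direct)
```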
Issue: #30
New Models
kani now has bundled support for the following new models:
Hosted
- Claude 3 (including function calling)
Open Source
- Llama 3 (all sizes)
- Command R and Command R+ (including function calling)
- Mistral-7B and Mixtral-8x7B
- Gemma (all sizes)
Although these models have built-in support, kani supports every chat model available on Hugging Face through transformers or llama.cpp using the new Prompt Pipelines feature (see below)!
Issue: #34
llama.cpp
To use GGUF-quantized versions of models, kani now supports the LlamaCppEngine, which uses the llama-cpp-python library to interface with the llama.cpp library. Any model with a GGUF version is compatible with this engine!
Prompt Pipelines
A prompt pipeline creates a reproducible pipeline for translating a list of ChatMessage into an engine-specific format using fluent-style chaining.
To build a pipeline, create an instance of PromptPipeline() and add steps by calling the step methods documented below. Most pipelines will end with a call to one of the terminals, which translates the intermediate form into the desired output format.
Pipelines come with a built-in explain() method to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).
Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:
```py
from kani import PromptPipeline, ChatRole

LLAMA2_PIPELINE = (
    PromptPipeline()
    # System messages should be wrapped with this tag. We'll translate them to USER
    # messages since a system and user message go together in a single [INST] pair.
    .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
    .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)
    # If we see two consecutive USER messages, merge them together into one with a
    # newline in between.
    .merge_consecutive(role=ChatRole.USER, sep="\n")
    # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace from the ends of
    # generations).
    .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")
    # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
    # message list ends with an ASSISTANT message, don't add the EOS token
    # (we want the model to continue the generation).
    .conversation_fmt(
        user_prefix="<s>[INST] ",
        user_suffix=" [/INST]",
        assistant_prefix=" ",
        assistant_suffix=" </s>",
        assistant_suffix_if_last="",
    )
)

# We can see what this pipeline does by calling explain()...
LLAMA2_PIPELINE.explain()

# And use it in our engine to build a string prompt for the LLM.
prompt = LLAMA2_PIPELINE(ai.get_prompt())
```
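To show what the steps in the comments above amount to, here is a hand-written equivalent in plain Python that needs no kani install. It is a sketch under the assumption that messages are simple `(role, content)` tuples; `llama2_prompt` is a hypothetical function, and the real pipeline's output may differ in details.

```python
def llama2_prompt(messages: list) -> str:
    """Format (role, content) tuples, with roles "system"/"user"/"assistant", as a LLaMA 2 prompt."""
    # Step 1: wrap system messages in <<SYS>> tags and treat them as user messages.
    translated = []
    for role, content in messages:
        if role == "system":
            role, content = "user", f"<<SYS>>\n{content}\n<</SYS>>\n"
        translated.append((role, content))

    # Step 2: merge consecutive messages of the same role
    # (newline between user messages, space between assistant messages).
    separators = {"user": "\n", "assistant": " "}
    merged = []
    for role, content in translated:
        if merged and merged[-1][0] == role:
            merged[-1] = (role, merged[-1][1] + separators[role] + content)
        else:
            merged.append((role, content))

    # Step 3: wrap each message in the instruction tokens. A trailing assistant
    # message gets no EOS token so the model can continue it.
    out = []
    for i, (role, content) in enumerate(merged):
        is_last = i == len(merged) - 1
        if role == "user":
            out.append(f"<s>[INST] {content} [/INST]")
        else:
            out.append(f" {content}" + ("" if is_last else " </s>"))
    return "".join(out)


prompt = llama2_prompt([
    ("system", "You are helpful."),
    ("user", "Hi!"),
    ("assistant", "Hello!"),
    ("user", "How are you?"),
])
print(prompt)
```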
Integration with HuggingEngine and LlamaCppEngine
Previously, to use a model with a different prompt format than the ones bundled with the library, one had to create a subclass of the HuggingEngine to implement the prompting scheme. With the release of Prompt Pipelines, you can now supply a PromptPipeline in addition to the model ID to use the HuggingEngine directly!
For example, the LlamaEngine (huggingface) is now equivalent to the following:
```py
engine = HuggingEngine(
    "meta-llama/Llama-2-7b-chat-hf",
    prompt_pipeline=LLAMA2_PIPELINE
)
```
The engine will use the passed pipeline to automatically infer a model's token usage, making it easier than ever to implement new models.
Issue: #32
Improvements
- The `OpenAIEngine` now uses the official `openai-python` package. (#31)
  - This means that `aiohttp` is no longer a direct dependency, and the `HTTPClient` has been deprecated. For API-based models, we recommend using the `httpx` library.
- Added arguments to the `chat_in_terminal` helper to control maximum width, echo user inputs, show function call arguments and results, and other interactive utilities (#33)
- The `HuggingEngine` can now automatically determine a model's context length.
- Added a warning message if an `@ai_function` is missing a docstring. (#37)
- Added `WrapperEngine` to make writing wrapper extensions easier.
Breaking Changes
- All `kani` models (e.g. `ChatMessage`) are no longer immutable. This means that you can edit the chat history directly, and token counting will still work correctly.
- As the `ctransformers` library does not appear to be maintained, we have removed the `CTransformersEngine` and replaced it with the `LlamaCppEngine`.
- The arguments to `chat_in_terminal` (except the first) are now keyword-only.
- The arguments to `HuggingEngine` (except `model_id`, `max_context_size`, and `prompt_pipeline`) are now keyword-only.
- Generation arguments for OpenAI models now take dictionaries rather than `kani.engines.openai.models.*` models. (If you aren't sure if you're affected by this, you probably aren't.)
Bug Fixes
- Fixed an issue with Claude 3 and parallel function calling.
It should be a painless upgrade from kani v0.x to kani v1.0! We tried our best to ensure that we didn't break any existing code. If you encounter any issues, please reach out on our Discord.
Published by zhudotexe about 2 years ago
kani - v1 Release Candidate 0
New Features
Streaming
kani now supports streaming to print tokens from the engine as they are received! Streaming is designed to be a drop-in superset of the chat_round and full_round methods, allowing you to gradually refactor your code without ever leaving it in a broken state.
To request a stream from the engine, use Kani.chat_round_stream() or Kani.full_round_stream(). These methods will return a StreamManager, which you can use in different ways to consume the stream.
The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of str.
```py
# CHAT ROUND:
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND:
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()
```
After a stream finishes, its contents will be available as a `ChatMessage`. You can retrieve the final message or BaseCompletion with:
```py
msg = await stream.message()
completion = await stream.completion()
```
The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.
[!TIP] For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:
```py
msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
```
(note the await that is not present in the above examples). This allows you to refactor your code by changing chat_round to chat_round_stream without other changes.
```diff
- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
```
Issue: #30
New Models
kani now has bundled support for the following new models:
Hosted
- Claude 3 (including function calling)
Open Source
- Command R and Command R+ (including function calling)
- Mistral-7B and Mixtral-8x7B
- Gemma (all sizes)
Although these models have built-in support, kani supports every chat model available on Hugging Face through transformers or llama.cpp using the new Prompt Pipelines feature (see below)!
Issue: #34
llama.cpp
To use GGUF-quantized versions of models, kani now supports the LlamaCppEngine, which uses the llama-cpp-python library to interface with the llama.cpp library. Any model with a GGUF version is compatible with this engine!
Prompt Pipelines
A prompt pipeline creates a reproducible pipeline for translating a list of ChatMessage into an engine-specific format using fluent-style chaining.
To build a pipeline, create an instance of PromptPipeline() and add steps by calling the step methods documented below. Most pipelines will end with a call to one of the terminals, which translates the intermediate form into the desired output format.
Pipelines come with a built-in explain() method to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).
Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:
```py
from kani import PromptPipeline, ChatRole

LLAMA2_PIPELINE = (
    PromptPipeline()
    # System messages should be wrapped with this tag. We'll translate them to USER
    # messages since a system and user message go together in a single [INST] pair.
    .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
    .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)
    # If we see two consecutive USER messages, merge them together into one with a
    # newline in between.
    .merge_consecutive(role=ChatRole.USER, sep="\n")
    # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace from the ends of
    # generations).
    .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")
    # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
    # message list ends with an ASSISTANT message, don't add the EOS token
    # (we want the model to continue the generation).
    .conversation_fmt(
        user_prefix="<s>[INST] ",
        user_suffix=" [/INST]",
        assistant_prefix=" ",
        assistant_suffix=" </s>",
        assistant_suffix_if_last="",
    )
)

# We can see what this pipeline does by calling explain()...
LLAMA2_PIPELINE.explain()

# And use it in our engine to build a string prompt for the LLM.
prompt = LLAMA2_PIPELINE(ai.get_prompt())
```
Integration with HuggingEngine and LlamaCppEngine
Previously, to use a model with a different prompt format than the ones bundled with the library, one had to create a subclass of the HuggingEngine to implement the prompting scheme. With the release of Prompt Pipelines, you can now supply a PromptPipeline in addition to the model ID to use the HuggingEngine directly!
For example, the LlamaEngine (huggingface) is now equivalent to the following:
```py
engine = HuggingEngine(
    "meta-llama/Llama-2-7b-chat-hf",
    prompt_pipeline=LLAMA2_PIPELINE
)
```
Issue: #32
Improvements
- The `OpenAIEngine` now uses the official `openai-python` package. (#31)
  - This means that `aiohttp` is no longer a direct dependency, and the `HTTPClient` has been deprecated. For API-based models, we recommend using the `httpx` library.
- Added arguments to the `chat_in_terminal` helper to control maximum width, echo user inputs, show function call arguments and results, and other interactive utilities (#33)
- The `HuggingEngine` can now automatically determine a model's context length.
- Added a warning message if an `@ai_function` is missing a docstring. (#37)
Breaking Changes
- All `kani` models (e.g. `ChatMessage`) are no longer immutable. This means that you can edit the chat history directly, and token counting will still work correctly.
- As the `ctransformers` library does not appear to be maintained, we have removed the `CTransformersEngine` and replaced it with the `LlamaCppEngine`.
- The arguments to `chat_in_terminal` (except the first) are now keyword-only.
- The arguments to `HuggingEngine` (except `model_id`, `max_context_size`, and `prompt_pipeline`) are now keyword-only.
- Generation arguments for OpenAI models now take dictionaries rather than `kani.engines.openai.models.*` models. (If you aren't sure if you're affected by this, you probably aren't.)
It should be a painless upgrade from kani v0.x to kani v1.0! We tried our best to ensure that we didn't break any existing code. If you encounter any issues, please reach out on our Discord.
Published by zhudotexe about 2 years ago
kani - v0.8.0
Most likely the last release before v1.0! This update mostly contains improvements to chat_in_terminal to improve usability in interactive environments like Jupyter Notebook.
Possible Breaking Change
All arguments to chat_in_terminal except the Kani instance must now be keyword arguments; positional arguments are no longer accepted.
For example, chat_in_terminal(ai, 1, "!stop") must now be written chat_in_terminal(ai, rounds=1, stopword="!stop").
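Keyword-only parameters are declared in Python by placing a bare `*` in the signature. The stub below is a hypothetical stand-in (it does not start a chat, and its default values are assumptions) that shows how the old positional call now fails while the keyword form works.

```python
def chat_in_terminal_stub(kani, *, rounds=0, stopword=None):
    """Hypothetical stand-in: everything after the bare * must be passed by keyword."""
    return {"kani": kani, "rounds": rounds, "stopword": stopword}


# The keyword form works.
ok = chat_in_terminal_stub("ai", rounds=1, stopword="!stop")
print(ok)

# The old positional form raises a TypeError.
try:
    chat_in_terminal_stub("ai", 1, "!stop")
except TypeError as e:
    error = str(e)
    print("TypeError:", error)
```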
Improvements
- You may now specify `None` as the user query in `chat_round` and `full_round`. This will request a new ASSISTANT message without adding a USER message to the chat history (e.g. to continue an unfinished generation).
Added the following keyword args to chat_in_terminal to improve usability in interactive environments like Jupyter Notebook:
- `echo`: Whether to echo the user's input to stdout after they send a message (e.g. to save in interactive notebook outputs; default false)
- `ai_first`: Whether the user should send the first message (default) or the model should generate a completion before prompting the user for a message.
- `width`: The maximum width of the printed outputs (default unlimited).
- `show_function_args`: Whether to print the arguments the model is calling functions with for each call (default false).
- `show_function_returns`: Whether to print the results of each function call (default false).
- `verbose`: Equivalent to setting `echo`, `show_function_args`, and `show_function_returns` to True.
Published by zhudotexe about 2 years ago
kani - v0.7.0
New Features
- Added support for the Claude API through the `AnthropicEngine`
  - Currently, this is only for chat messages - we don't yet have access to the new function calling API. We plan to add Claude function calling to Kani as soon as we get access!
- Renamed `ToolCallError` to a more general `PromptError`
  - Technically a minor breaking change, though a search of GitHub shows that no one has used `ToolCallError` yet
Fixes
- Fixed an issue where parallel tool calls could not be validated (thanks @arturoleon!)
Published by zhudotexe over 2 years ago
kani - v0.6.0
As of Nov 6, 2023, OpenAI added the ability for a single assistant message to request calling multiple functions in
parallel, and wrapped all function calls in a ToolCall wrapper. In order to add support for this in kani while
maintaining backwards compatibility with OSS function calling models, a ChatMessage now actually maintains the
following internal representation:
ChatMessage.function_call is actually an alias for ChatMessage.tool_calls[0].function. If there is more
than one tool call in the message, when trying to access this property, kani will raise an exception.
To translate kani's FUNCTION message types to OpenAI's TOOL message types, the OpenAIEngine now performs a translation based on binding free tool call IDs to following FUNCTION messages deterministically.
Breaking Changes
To the kani end user, there should be no change to how functions are defined and called. One breaking change was necessary:
- `Kani.do_function_call` and `Kani.handle_function_call_exception` now take an additional `tool_call_id` parameter, which may break overriding functions. The documentation has been updated to encourage overriders to handle `*args, **kwargs` to prevent this happening again.
New Features
kani can now handle making multiple function calls in parallel if the model requests it. Rather than returning an ASSISTANT message with a single function_call, an engine can now return a list of tool_calls. kani will resolve these tool calls in parallel using asyncio, and add their results to the chat history in the order of the list provided.
Returning a single function_call will continue to work for backwards compatibility.
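The "resolve in parallel, keep the list order" behaviour is what `asyncio.gather` provides: it runs awaitables concurrently and returns results in the order they were passed, regardless of which finishes first. The sketch below shows that property with fake tools; `call_tool` and `resolve_tool_calls` are hypothetical names, not kani methods.

```python
import asyncio


async def call_tool(name: str, delay: float) -> str:
    """Fake tool that takes `delay` seconds to finish."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def resolve_tool_calls(calls: list) -> list:
    # gather() runs the calls concurrently but returns results in input order.
    return await asyncio.gather(*(call_tool(name, delay) for name, delay in calls))


# "slow" is requested first and finishes last, yet its result still comes first.
results = asyncio.run(resolve_tool_calls([("slow", 0.05), ("fast", 0.0)]))
print(results)
```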
Published by zhudotexe over 2 years ago
kani - v0.5.1
- OpenAI: The OpenAIClient (internal class used by OpenAIEngine) now expects `OpenAIChatMessage`s as input rather than `kani.ChatMessage` in order to better type-validate API requests
- OpenAI: Updated token estimation to better reflect current token counts returned by the API
Published by zhudotexe over 2 years ago
kani - v0.5.0
New Feature: Message Parts API
The Message Parts API is intended to provide a foundation for future multimodal LLMs and other engines that require engine-specific input without compromising kani's model-agnostic design. This is accomplished by allowing ChatMessage.content to be a list of MessagePart objects, in addition to a string.
This change is fully backwards-compatible and will not affect existing code.
When writing code with compatibility in mind, the ChatMessage class exposes ChatMessage.text (always a string or None) and ChatMessage.parts (always a list of message parts), which we recommend using instead of ChatMessage.content. These properties are dynamically generated based on the underlying content, and it is safe to mix messages with different content types in a single Kani.
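A minimal sketch of how `text` and `parts` views could be derived from a `content` field that is either a string, a list of parts, or `None`. The `Message` and `ImageRef` classes here are hypothetical illustrations, not kani's classes, and the choice to render parts with `str()` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ImageRef:
    """Hypothetical message part that points at an image."""
    path: str

    def __str__(self) -> str:
        return "[image]"


@dataclass
class Message:
    """Hypothetical message whose content is a string, a list of parts, or None."""
    content: Union[str, list, None]

    @property
    def text(self) -> Optional[str]:
        # Always a string or None, whatever the underlying content type is.
        if self.content is None:
            return None
        if isinstance(self.content, str):
            return self.content
        return "".join(str(part) for part in self.content)

    @property
    def parts(self) -> list:
        # Always a list, whatever the underlying content type is.
        if self.content is None:
            return []
        if isinstance(self.content, str):
            return [self.content]
        return self.content


plain = Message("hi")
mixed = Message(["Look: ", ImageRef("cat.png")])
print(plain.text, plain.parts)
print(mixed.text, mixed.parts)
```

Code that reads only `text` or `parts` can then handle both kinds of message without checking the type of `content`.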
Generally, message part classes are defined by an engine, and consumed by the developer. Message parts can be used in any role’s message - for example, you might use a message part in an assistant message to separate out a chain of thought from a user reply, or in a user message to supply an image to a multimodal model.
For more information, see the Message Parts documentation.
Up next: we're adding support for multimodal vision-language models like LLaVA and GPT-Vision through a kani extension!
Improvements
- LLaMA 2: Improved the prompting in non-strict mode to group consecutive user/system messages into a single `[INST]` wrapper. See the tests for how kani translates consecutive message types into the LLaMA prompt.
- Other documentation and minor improvements
Published by zhudotexe over 2 years ago
kani - v0.4.0
BREAKING CHANGES
- Kani.full_round now emits every message generated during the round, not just assistant messages
  - This means that you will need to handle FUNCTION messages, and potentially SYSTEM messages from a function exception handler.
  - Kani.full_round_str's default behaviour is unchanged.
- Kani.full_round_str now takes in a message_formatter rather than a function_call_formatter
  - By default, this handler only returns the contents of ASSISTANT messages.
- Kani.do_function_call now returns a FunctionCallResult rather than a bool
  - To migrate any overriding functions, you should change the following:
    - Rather than calling Kani.add_to_history in the override, save the ChatMessage to a variable
    - Update the return value from a boolean to FunctionCallResult(is_model_turn=<old return value>, message=<message from above>)
- Kani.handle_function_call_exception now returns an ExceptionHandleResult rather than a bool
  - To migrate any overriding functions, you should change the following:
    - Rather than calling Kani.add_to_history in the override, save the ChatMessage to a variable
    - Update the return value from a boolean to ExceptionHandleResult(should_retry=<old return value>, message=<message from above>)
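The do_function_call migration steps above can be sketched as a before/after pair. To stay runnable without kani installed, this example uses stand-in classes: SketchMessage, OldStyle, and NewStyle are hypothetical, the one-argument do_function_call signature is simplified, and the FunctionCallResult dataclass only mirrors the two fields named in the notes (real code would import it from kani).

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SketchMessage:
    """Hypothetical stand-in for kani's ChatMessage."""
    role: str
    content: str


@dataclass
class FunctionCallResult:
    """Stand-in with the two fields named in the notes; real code imports it from kani."""
    is_model_turn: bool
    message: SketchMessage


class OldStyle:
    """Pre-v0.4.0 shape: the override records the message itself and returns a bool."""

    def __init__(self):
        self.history = []

    async def add_to_history(self, message):
        self.history.append(message)

    async def do_function_call(self, name: str) -> bool:
        msg = SketchMessage("function", f"result of {name}")
        await self.add_to_history(msg)
        return True


class NewStyle:
    """v0.4.0 shape: the override returns the message instead of recording it."""

    async def do_function_call(self, name: str) -> FunctionCallResult:
        # Saved to a variable rather than passed to add_to_history.
        msg = SketchMessage("function", f"result of {name}")
        # The old boolean return value becomes is_model_turn.
        return FunctionCallResult(is_model_turn=True, message=msg)


old = OldStyle()
old_result = asyncio.run(old.do_function_call("get_weather"))
new_result = asyncio.run(NewStyle().do_function_call("get_weather"))
```

The same two steps apply to handle_function_call_exception, with ExceptionHandleResult(should_retry=..., message=...) as the return value.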
Improvements
- Added kani.utils.message_formatters
- Added kani.ExceptionHandleResult and kani.FunctionCallResult
- Documentation improvements
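To make the message_formatter idea concrete, here is a minimal sketch of two formatters applied to the messages of one round. It assumes a formatter is a function that takes a message and returns either a string to show or None to skip it. SketchMessage, assistant_only, and verbose are hypothetical names, not the helpers shipped in kani.utils.message_formatters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchMessage:
    """Hypothetical stand-in for kani's ChatMessage."""
    role: str
    content: Optional[str]


def assistant_only(msg: SketchMessage) -> Optional[str]:
    """Mirror the described default: only assistant content is returned."""
    if msg.role == "assistant":
        return msg.content
    return None


def verbose(msg: SketchMessage) -> Optional[str]:
    """Also surface function results, falling back to the default otherwise."""
    if msg.role == "function":
        return f"[function returned] {msg.content}"
    return assistant_only(msg)


# One round: a function call with no text, the function result, then the answer.
round_messages = [
    SketchMessage("assistant", None),
    SketchMessage("function", "72F"),
    SketchMessage("assistant", "It is 72F."),
]

# Assumption: messages for which the formatter returns None are skipped.
shown = [text for m in round_messages if (text := verbose(m)) is not None]
```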
Fixes
- Fixed an issue where ChatMessage.copy_with could cause unset values to appear in JSON serializations
Published by zhudotexe over 2 years ago
kani - v0.3.4
Improvements
- Updated dependencies to allow more recent versions
- The documentation now shows fully-qualified class names in reference sections
- Added .copy_with method to ChatMessage and FunctionCall to make updating chat history easier
- Various documentation updates
Published by zhudotexe over 2 years ago
kani - v0.3.3
Improvements
- Added a warning in Kani.chat_round to use Kani.full_round when AI functions are defined
- Added examples in Google Colab
- Other documentation improvements
Fixes
- Fixed an issue where the ctransformers engine could overrun its context length (e.g. see https://github.com/zhudotexe/kani/actions/runs/6152842183/job/16695721588)
Published by zhudotexe over 2 years ago
kani - v0.3.0
Improvements
- Added Kani.add_to_history, a method that is called whenever kani adds a new message to the chat context
- httpclient.BaseClient.request now returns a Response to aid low-level implementation
  - .get() and .post() are unchanged
- Add additional documentation about GPU support for local models
- Other documentation improvements
Published by zhudotexe over 2 years ago
kani - v0.2.0
Improvements
- Engines: Added Engine.function_token_reserve() to dynamically reserve a number of tokens for a function list
- OpenAI: The OpenAIEngine now reads the OPENAI_API_KEY environment variable by default if no api key or client is specified
- Documentation improvements (polymorphism, mixins, extension packages)
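The OPENAI_API_KEY fallback described above amounts to a small resolution rule. The sketch below shows one way to express it: resolve_api_key is a hypothetical helper, not part of kani, and what the engine does when the variable is also unset is not covered by these notes.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None, client: Optional[object] = None) -> Optional[str]:
    """Fall back to the OPENAI_API_KEY environment variable only when neither
    an explicit key nor a preconfigured client was supplied."""
    if api_key is None and client is None:
        # Returns None if the variable is unset; a real engine may raise instead.
        return os.environ.get("OPENAI_API_KEY")
    return api_key
```

With this rule, an explicit key always wins over the environment, and passing a client skips the lookup entirely.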
Published by zhudotexe almost 3 years ago
kani - v0.1.0
BREAKING CHANGES
These should hopefully be the last set of breaking changes until v1.0. We're finalizing some of the attribute names for clarity and publication.
- Renamed Kani.always_include_messages to Kani.always_included_messages
Features & Improvements
- @ai_functions with synchronous signatures now run in a thread pool in order to prevent blocking the asyncio event loop
- OpenAI: Added the ability to specify the API base and additional headers (e.g. for proxy APIs).
- Various documentation improvements
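The thread-pool behaviour for synchronous functions follows a common asyncio pattern, sketched below with the standard library only. This illustrates the general technique rather than kani's internal code: call_function, slow_sync_lookup, and fast_async_lookup are hypothetical names, and asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import inspect
import time


def slow_sync_lookup(city: str) -> str:
    """A blocking function, standing in for a synchronous @ai_function."""
    time.sleep(0.2)
    return f"weather for {city}"


async def fast_async_lookup(city: str) -> str:
    """A coroutine function, standing in for an async @ai_function."""
    await asyncio.sleep(0.2)
    return f"news for {city}"


async def call_function(func, *args):
    """Await coroutine functions directly; run plain functions in a worker thread."""
    if inspect.iscoroutinefunction(func):
        return await func(*args)
    # Offloading keeps time.sleep (or any blocking call) off the event loop.
    return await asyncio.to_thread(func, *args)


async def main():
    # Both calls make progress concurrently because the sync one runs in a thread.
    return await asyncio.gather(
        call_function(slow_sync_lookup, "Tokyo"),
        call_function(fast_async_lookup, "Tokyo"),
    )


results = asyncio.run(main())
```

Had slow_sync_lookup been called directly inside the event loop, the async lookup could not have started until the blocking sleep finished.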
Published by zhudotexe almost 3 years ago
kani - v0.0.3
BREAKING CHANGES
- Renamed Kani.get_truncated_chat_history to Kani.get_prompt
Additions & Improvements
- Added CTransformersEngine and LlamaCTransformersEngine (thanks @Maknee!)
- Added a lower-level Kani.get_model_completion to make a prediction at the current chat state (without modifying the chat history)
- Added the auto_truncate param to @ai_function to opt in to kani trimming long responses from a function (i.e., responses that do not fit in a model's context)
- Improved the internal handling of tokens when the chat history is directly modified
- ChatMessage.[role]() classmethods now pass kwargs to the constructor
- LLaMA: Improved the fidelity of non-strict-mode LLaMA prompting
- OpenAI: Added support for specifying an OpenAI organization and configuring retry
- Many documentation improvements
Fixes
- OpenAI: Fixed an issue where the message length could be reported as too short for messages with no content
- Other minor fixes and improvements
Published by zhudotexe almost 3 years ago