floneum

https://github.com/floneum/floneum - kalosm-0.4

Introducing Kalosm 0.4: A Local-First AI Meta-Framework for Rust

Kalosm is a simple, local first interface for pre-trained language, audio, and image models written in Rust. It is built on top of the candle library for ML inference. Kalosm makes it easy to run quantized models on your local machine accelerated by cuda or metal. Whether you’re building a chatbot, a transcription tool, or an image processing application, Kalosm 0.4 brings major improvements to help you get started quickly and efficiently.

In this release, we’re excited to announce:

A Simplified API for language models, tasks, and chat sessions
Extended support for remote models via OpenAI and Anthropic adapters with structured generation
Single-file model loading that consolidates model weights, tokenizer, and chat template metadata into one GGUF file
Improved whisper transcription with word-level timestamps using dynamic time warping

Simplified LLM API

One of the biggest improvements in Kalosm 0.4 is the streamlined API for interacting with local large language models. Developers can now configure chat sessions and tasks with a series of builder methods until the first message is added. This flexibility simplifies the code and makes your applications easier to maintain.

For example, here’s how you can set up a chat session with a system prompt:

```rust use kalosm::language::*;

// Create a new chat-capable Llama model let model = Llama::new_chat().await?;

// Configure a chat session with a system prompt let mut chat = model .chat() .withsystemprompt("The assistant will act like a pirate");

loop { // Get user input and stream the model’s response to stdout chat(&promptinput("\n> ")?) .tostd_out() .await?; } ```

Similarly, the task API now supports automatic constraint inference. By deriving parsing and schema traits on your data types, you can easily generate concrete types from your language model:

```rust use kalosm::language::*;

/// A fictional account holder

[derive(Schema, Parse, Clone, Debug)]

struct Account { /// A brief summary of the account holder #[parse(pattern = r"[a-zA-Z,.?!\d ]{1,80}")] summary: String, /// The account holder’s name (full name or pseudonym) #[parse(pattern = "[a-zA-Z ]{1,20}")] name: String, /// The account holder’s age #[parse(range = 1..=100)] age: u8, }

// Create a small reasoning-focused model let llm = Llama::phi_3().await?;

// Create a task for generating accounts, with automatic type-based constraint inference let create_account = llm .task("You generate accounts based on a description of the account holder") .typed();

// Generate an account from a natural language prompt let account: Account = create_account( "Candice is the CEO of a fortune 500 company. She is 30 years old." ).await?;

println!("Generated Account: {account:?}"); ```

OpenAI and Anthropic Support with Structured Generation

Kalosm 0.4 extends its support for remote models by integrating with the OpenAI and Anthropic chat APIs. Now you can use Kalosm’s structured generation even when working with remote chat models. Unfortunately Anthropic currently doesn't support structured generation, so support for tasks with Anthropic models is currently limited.

To work with OpenAI models, enable the openai feature and set your API key via the OPENAI_API_KEY environment variable. Then create a model adapter:

```rust // Create a chat model adapter for OpenAI-compatible models let llm = OpenAICompatibleChatModel::builder() .withgpt4o_mini() .build();

// Use the adapter just like any other chat model let generate_character = llm .task("You generate accounts based on a description of the account holder") .typed();

let account: Account = generate_character( "Candice is the CEO of a fortune 500 company. She is 30 years old." ).await?; println!("Generated Account: {account:?}"); ```

For Anthropic models, enable the anthropic feature and set your ANTHROPIC_API_KEY:

```rust // Create a chat model adapter for Anthropic-compatible models let llm = AnthropicCompatibleChatModel::builder() .withclaude35haiku() .build();

// Start a chat session with a custom system prompt let mut chat = llm .chat() .withsystemprompt("The assistant will act like a pirate");

loop { chat(&promptinput("\n> ")?) .tostd_out() .await?; } ```

These integrations allow you to combine local inference with remote model capabilities and structured generation features, making it easier to build hybrid AI applications where local models may not be enough.

Single-File LLM Loading

Previously, loading a custom model required managing separate files for the tokenizer, chat template, and model weights. With version 0.4, Kalosm supports loading all model metadata from a single GGUF file. This improvement simplifies model distribution and deployment.

To load a custom model, you can now just point to a local or huggingface GGUF file:

```rust let model = Llama::builder() // Specify a custom model source using a GGUF file .withsource(LlamaSource::new( FileSource::HuggingFace { modelid: "QuantFactory/SmolLM-1.7B-Instruct-GGUF".tostring(), revision: "main".tostring(), file: "SmolLM-1.7B-Instruct.Q4KM.gguf".to_string(), }, )) .build() .await?;

let mut chat = model .chat() .withsystemprompt("The assistant will act like a pirate");

loop { chat(&promptinput("\n> ")?) .tostd_out() .await?; } ```

This unified approach to model loading makes it much easier to switch between different models and update to new models over time.

Timestamped Transcription with Whisper

Kalosm 0.4 also brings improvements to audio transcription. The new Whisper API supports word-level timestamps, which can be critical for applications that require precise alignment between audio and text.

Below is an example that demonstrates how to transcribe audio and print each segment along with its corresponding timestamps:

```rust // Build a new Whisper model using a quantized variant let model = WhisperBuilder::default() .with_source(WhisperSource::QuantizedLargeV3Turbo) .build() .await?;

// Open and decode an audio file let file = BufReader::new(File::open("./samples_jfk.wav")?); let audio = Decoder::new(file)?;

// Transcribe the audio with timestamping enabled let mut text = model.transcribe(audio).timestamped();

// Print each transcribed segment with its start and end times while let Some(segment) = text.next().await { for chunk in segment.chunks() { let timestamp = chunk.timestamp().unwrap(); println!("{:0.2}..{:0.2}: {}", timestamp.start, timestamp.end, chunk); } } ```

This level of granularity in transcription is particularly useful for applications like video captioning, meeting transcription, or any context where knowing the exact timing of spoken words matters.

Community and Support

Kalosm is built by a passionate community dedicated to making local AI inference accessible and efficient. For help, discussion, or to share feedback, join the Kalosm Discord community. Whether you’re troubleshooting, looking to contribute, or simply curious about the latest updates, our Discord is the best place to connect with fellow developers and the Kalosm team.

Conclusion

Kalosm 0.4 brings a range of enhancements—from simplified APIs and unified model loading to extended remote support and precise transcription capabilities—that we hope will make your development process smoother and more enjoyable. We're truly excited to see the creative applications and projects you build with these tools.

Happy coding!

- Rust
Published by ealmloff about 1 year ago

https://github.com/floneum/floneum - kalosm-0.3.0

Kalosm 0.3

Kalosm 0.3 makes it significantly easier to use structured generation, improves transcription, and makes it possible to track model download progress. It also includes performance improvements for text generation and transcription models along with parser improvements

Performance Improvements

The new version of Kalosm includes significant performance improvements for llama, mistral, and phi models. We have also developed sampler aware structured generation which lets us skip parsing most tokens in loose structures. Performance should be between 2-4x as fast depending on your usecase:

| Demo | Kalosm 0.2 | Kalosm 0.3 | |-------|------------|------------| | Text generation |

Structured Generation Improvements

Structured generation is both easier and faster in 0.3. Many structured generation tasks can use json. If you just need a json parser, kalosm 0.3 lets you derive the parser from your data: ```rust use kalosm::language::*;

/// A fictional character

[derive(Parse, Schema, Clone, Debug)]

struct Character { /// The name of the character #[parse(pattern = "[A-Z][a-z]{2,10} [A-Z][a-z]{2,10}")] name: String, /// The age of the character #[parse(range = 1..=100)] age: u8, /// A description of the character #[parse(pattern = "[A-Za-z ]{40,200}")] description: String, } ```

Then you can build a task that generates the character:

```rust

[tokio::main]

async fn main() { // First create a model. Chat models tend to work best with structured generation let model = Llama::newchat().await.unwrap(); // Then create a task with the parser as constraints let task = Task::builderfor::("You generate realistic JSON placeholders for characters") .build(); // Finally, run the task let mut stream = task.run("Create a random character", &model); stream.tostdout().await.unwrap(); let character = stream.await.unwrap(); println!("{character:?}"); } ```

Along with the parser, you can also derive a json schema that matches the parser which is useful for function calling models.

You can read more about how structured generation works in kalosm in our last blog post.

Streaming Voice Transcription

Kalosm 0.3 adds support for transcribing audio streams like microphone in chunks based on voice activity. You can now read the audio stream directly from the microphone and transcribe it as voices are detected:

```rust // Create a new whisper model. let model = Whisper::new().await.unwrap();

// Stream audio from the microphone let mic = MicInput::default(); let stream = mic.stream().unwrap();

// Transcribe the audio into text in chunks based on voice activity. let mut text_stream = stream.transcribe(model);

// Finally, print the text to the console textstream.tostd_out().await.unwrap(); ```

Model Progress

Loading models is now async with a callback for loading progress: rust let model = Bert::builder() // build with loading handler lets you track the progress of the model loading .build_with_loading_handler(|loading| match loading { ModelLoadingProgress::Downloading { source, start_time, progress, } => { let elapsed = start_time.elapsed(); println!("Downloading model from {source}...{progress}% (elapsed {elapsed:?})"); } ModelLoadingProgress::Loading { progress } => { println!("Loading model into memory...{progress}%"); } }) .await .unwrap();

Whisper transcriptions and wuerstchen generations are also async with progress info thanks to @newfla:

```rust // Create a new whisper model let model = WhisperBuilder::default() .with_source(WhisperSource::QuantizedDistilLargeV3) .build() .await.unwrap();

let mic = MicInput::default(); let audio = mic.stream().unwrap();

// Transcribe the source audio into text let mut text = audio.transcribe(model);

// As the model transcribes the audio, print the text to the console while let Some(chunk) = text.next().await { let text = chunk.asref(); println!("{text}"); println!( "estimated time left to decode chunk: {}s", chunk.remainingtime().as_secs() ); } ```

Documentation improvements

The inline documentation has been significantly improved in 0.3. Common items now include inline guides to help you get started like the language page and concept explanations like embeddings

New models!

Along with the new release, kalosm supports a few new models:

Quantized whisper models are now supported with presets for distilled versions of whisper to run even faster
The Phi-3 series of models is supported by kalosm-llama. The Phi series performs above its weight for structured json generation tasks

Full changelog

Implement token healing by @ealmloff in https://github.com/floneum/floneum/pull/149
Decouple models from tasks by @ealmloff in https://github.com/floneum/floneum/pull/150
Update candle and add metal support by @ealmloff in https://github.com/floneum/floneum/pull/153
Improve sidebar UI and add categories by @ealmloff in https://github.com/floneum/floneum/pull/155
Bump mio from 0.8.10 to 0.8.11 by @dependabot in https://github.com/floneum/floneum/pull/156
Improve model loading API by @ealmloff in https://github.com/floneum/floneum/pull/157
Bump actions/checkout from 3 to 4 by @dependabot in https://github.com/floneum/floneum/pull/160
Bump actions/upload-artifact from 3 to 4 by @dependabot in https://github.com/floneum/floneum/pull/159
Fix linux support by @ealmloff in https://github.com/floneum/floneum/pull/161
Pin wasmtime rev by @ealmloff in https://github.com/floneum/floneum/pull/163
Improve support for mkl by @newfla in https://github.com/floneum/floneum/pull/165
Plugin calculate by @LafCorentin in https://github.com/floneum/floneum/pull/166
Support starling beta and speed up token generation by @ealmloff in https://github.com/floneum/floneum/pull/168
Whisper & Wuerstchen download progress by @newfla in https://github.com/floneum/floneum/pull/169
wuerstchen resolution warnings and accelerator support by @ealmloff in https://github.com/floneum/floneum/pull/173
rwhisper: progress, elapsed time. estimate remaining time by @newfla in https://github.com/floneum/floneum/pull/174
Fix loading chat sessions on accelerators by @ealmloff in https://github.com/floneum/floneum/pull/175
Add support for quantized whisper models by @ealmloff in https://github.com/floneum/floneum/pull/176
Add distil whisper v3 large quantized by @ealmloff in https://github.com/floneum/floneum/pull/178
rwuerstchen: async api by @newfla in https://github.com/floneum/floneum/pull/177
Reference count language models by @ealmloff in https://github.com/floneum/floneum/pull/180
Make structured generation faster by @ealmloff in https://github.com/floneum/floneum/pull/181
Add wizard lm 2 by @ealmloff in https://github.com/floneum/floneum/pull/182
Simplify Parsers by @ealmloff in https://github.com/floneum/floneum/pull/183
Improve Floneum UI by @ealmloff in https://github.com/floneum/floneum/pull/158
Add phi-3 by @ealmloff in https://github.com/floneum/floneum/pull/185
Add a menu item to clear the current workflow by @ealmloff in https://github.com/floneum/floneum/pull/187
fix the call to unsafe function error by @haoxins in https://github.com/floneum/floneum/pull/188
fix linking cuda kernels on windows by @newfla in https://github.com/floneum/floneum/pull/189
Clean up kalosm examples by @ealmloff in https://github.com/floneum/floneum/pull/190
Add snowflake embedding models by @ealmloff in https://github.com/floneum/floneum/pull/191
Add extra context methods to simplify adding documents with database integration by @ealmloff in https://github.com/floneum/floneum/pull/192
Fix Rwuerstchen example link in Readme by @newfla in https://github.com/floneum/floneum/pull/193
Improve Bert model by @ealmloff in https://github.com/floneum/floneum/pull/194
Implement smarter rule based sentence chunking by @ealmloff in https://github.com/floneum/floneum/pull/196
Semantic chunking by @ealmloff in https://github.com/floneum/floneum/pull/197
Use the in place kv cache for faster long context token generation by @ealmloff in https://github.com/floneum/floneum/pull/198
Slowly expand the llama cache as we need to by @ealmloff in https://github.com/floneum/floneum/pull/199
Save/load classifier heads, add dropout layer and expose the learning rate by @ealmloff in https://github.com/floneum/floneum/pull/201
Optimize large bert batch sizes by @ealmloff in https://github.com/floneum/floneum/pull/202
Fix memory usage and add batch sizes to classifier training by @ealmloff in https://github.com/floneum/floneum/pull/203
Expose classifier probabilities by @ealmloff in https://github.com/floneum/floneum/pull/205
HTML chunking and simplification by @ealmloff in https://github.com/floneum/floneum/pull/200
Fix windows CI by @ealmloff in https://github.com/floneum/floneum/pull/207
Improve node interface by @ealmloff in https://github.com/floneum/floneum/pull/208
Cache embeddings by @ealmloff in https://github.com/floneum/floneum/pull/209
Add a separate method for embedding queries by @ealmloff in https://github.com/floneum/floneum/pull/210
Reorganize and simplify examples by @ealmloff in https://github.com/floneum/floneum/pull/211
Improve kalosm-learning docs and lazily find the input size by @ealmloff in https://github.com/floneum/floneum/pull/212
Add more docs for embeddings by @ealmloff in https://github.com/floneum/floneum/pull/213
Read huggingface token by @ealmloff in https://github.com/floneum/floneum/pull/216
Expose a way to manually set the device for llama by @ealmloff in https://github.com/floneum/floneum/pull/215
Improve chat API and adding more examples for Chat and ChatBuilder by @ealmloff in https://github.com/floneum/floneum/pull/218
Add a voice activity and denoising helpers to kalosm audio by @ealmloff in https://github.com/floneum/floneum/pull/222
Simplify parse and add a derive macro by @ealmloff in https://github.com/floneum/floneum/pull/223
Bump the cargo group with 2 updates by @dependabot in https://github.com/floneum/floneum/pull/225
Improve kalosm feature flags by @newfla in https://github.com/floneum/floneum/pull/226
Fix regex constraints by @ealmloff in https://github.com/floneum/floneum/pull/227
Fix structured generation with non-prefix encodable tokenizers like phi by @ealmloff in https://github.com/floneum/floneum/pull/228
Faster structured generation with sampler aware token decoding by @ealmloff in https://github.com/floneum/floneum/pull/229
Derive parse for enums with data by @ealmloff in https://github.com/floneum/floneum/pull/230
Add attributes to modify unit, enum and struct parsing by @ealmloff in https://github.com/floneum/floneum/pull/231
Improve the ergonomics of the TextStream trait and remove async from a few model methods by @ealmloff in https://github.com/floneum/floneum/pull/232
Remove a bunch of unused dependencies by @ealmloff in https://github.com/floneum/floneum/pull/233
Add overviews for each core module by @ealmloff in https://github.com/floneum/floneum/pull/234
Create a compile time state machine for enum parsers by @ealmloff in https://github.com/floneum/floneum/pull/235
Add a llama 3.1 instruct preset by @ealmloff in https://github.com/floneum/floneum/pull/237
Bump openssl from 0.10.64 to 0.10.66 in the cargo group by @dependabot in https://github.com/floneum/floneum/pull/236
Fix the required next tokens for repeat parsers by @ealmloff in https://github.com/floneum/floneum/pull/239
Make cloning repeat partial state very cheap with an immutable Arc Linked List by @ealmloff in https://github.com/floneum/floneum/pull/240
Implement phi-3.1 support by @ealmloff in https://github.com/floneum/floneum/pull/241
Fix parsing signs and optimize separated parser by @ealmloff in https://github.com/floneum/floneum/pull/242
Fix constrained rust type performance by @ealmloff in https://github.com/floneum/floneum/pull/243
Derive a JSON schema by @ealmloff in https://github.com/floneum/floneum/pull/245
Implement prompt healing by @ealmloff in https://github.com/floneum/floneum/pull/246
Split floneum and kalosm in the workspace by @ealmloff in https://github.com/floneum/floneum/pull/247
Fix structured generation with the phi tokenizer by @ealmloff in https://github.com/floneum/floneum/pull/250
chore: update lib.rs by @eltociear in https://github.com/floneum/floneum/pull/249
Fix remaining doc tests by @ealmloff in https://github.com/floneum/floneum/pull/251
Fix CI checks by @ealmloff in https://github.com/floneum/floneum/pull/252
Add a tiny helper for tasks that implement parse and schema by @ealmloff in https://github.com/floneum/floneum/pull/253
Bump version by @ealmloff in https://github.com/floneum/floneum/pull/254
Improve the documentation for the entry point of each crate by @ealmloff in https://github.com/floneum/floneum/pull/255
Bump docs by @ealmloff in https://github.com/floneum/floneum/pull/256

New Contributors

@KerfuffleV2 made their first contribution in https://github.com/floneum/floneum/pull/77
@haoxins made their first contribution in https://github.com/floneum/floneum/pull/86
@Yevgnen made their first contribution in https://github.com/floneum/floneum/pull/93
@dependabot made their first contribution in https://github.com/floneum/floneum/pull/156
@newfla made their first contribution in https://github.com/floneum/floneum/pull/165
@LafCorentin made their first contribution in https://github.com/floneum/floneum/pull/166
@eltociear made their first contribution in https://github.com/floneum/floneum/pull/249

Full Git Diff: https://github.com/floneum/floneum/compare/v0.2.0...kalosm-0.3.0

- Rust
Published by ealmloff over 1 year ago

https://github.com/floneum/floneum - 0.2.0

Floneum 0.2 is here! Here are some major features in the new release:

You can read the full release notes in the 0.2 release post

- Rust
Published by ealmloff over 2 years ago

https://github.com/floneum/floneum - Nightly-7-23-2023

- Rust
Published by ealmloff over 2 years ago

https://github.com/floneum/floneum - 0.1.0

The first pre-release of Floneum. This is a pre release, there will be bugs. If you run into any bugs, you can report them on the discord or open a github issue

- Rust
Published by ealmloff over 2 years ago

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

Recent Releases of https://github.com/floneum/floneum

https://github.com/floneum/floneum - kalosm-0.4

Introducing Kalosm 0.4: A Local-First AI Meta-Framework for Rust

Simplified LLM API

[derive(Schema, Parse, Clone, Debug)]

OpenAI and Anthropic Support with Structured Generation

Single-File LLM Loading

Timestamped Transcription with Whisper

Community and Support

Conclusion

https://github.com/floneum/floneum - kalosm-0.3.0

Kalosm 0.3

Performance Improvements

Structured Generation Improvements

[derive(Parse, Schema, Clone, Debug)]

[tokio::main]

Streaming Voice Transcription

Model Progress

Documentation improvements

New models!

Full changelog

New Contributors

https://github.com/floneum/floneum - 0.2.0

https://github.com/floneum/floneum - Nightly-7-23-2023

https://github.com/floneum/floneum - 0.1.0