LlamaCppOutlines

Llama Cpp and Outlines wrapper in one tight package.

https://github.com/krishnaveti/llamacppoutlines.jl

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.9%) to scientific vocabulary

Last synced: 10 months ago

Repository

Llama Cpp and Outlines wrapper in one tight package.

Basic Info

Host: GitHub
Owner: krishnaveti
License: mit
Language: Julia
Default Branch: master
Homepage:
Size: 43.3 MB

Statistics

Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 2

Created 12 months ago · Last pushed 11 months ago

Metadata Files


LlamaCppOutlines.jl

A Julia package for LLaMA inference with structured output generation using Outlines constraints.

Features

LLaMA Model Inference: Basic text generation with sampling
Multimodal Support: Text + image generation with multimodal models
Constrained Generation: Structured output using JSON schema constraints
Enhanced Sampling: Multiple sampling strategies (greedy, top-k, top-p, temperature)
LoRA Support: Dynamic adapter loading and switching
GPU Support: Automatic CUDA detection

Installation

In the Julia package REPL, run `] add LlamaCppOutlines`, or install from the repository URL with `using Pkg; Pkg.add(url="https://github.com/krishnaveti/LlamaCppOutlines.jl")`.

Warning: Current support is only for Windows.

Quick Start

```julia
using LlamaCppOutlines

# Initialize the APIs
init_apis!()

# Load a model
model, model_context, vocab = load_and_initialize("path/to/model.gguf")

# Basic text generation
result = generate_with_sampling(
    "What is the capital of France?",
    model=model,
    model_context=model_context,
    vocab=vocab,
    max_new_tokens=50
)

# Constrained generation with JSON schema
schema = Dict(
    "type" => "object",
    "properties" => Dict(
        "city" => Dict("type" => "string"),
        "country" => Dict("type" => "string")
    ),
    "required" => ["city", "country"]
)

result = greedy_constrained_generation(
    "What is the capital of France?",
    schema,
    "google/gemma-2-2b-it",
    model_context=model_context,
    vocab=vocab
)
```
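As a rough picture of what greedy constrained generation does at each decoding step, the sketch below masks out tokens the constraint forbids and then takes the best remaining one. It is a toy under stated assumptions, not the package's implementation: `constrained_greedy_pick`, the four-token vocabulary, the scores, and the `allowed_toy` mask are all hypothetical stand-ins for what the model and an Outlines-style index would supply.

```julia
# Toy illustration of constrained greedy decoding (not LlamaCppOutlines internals).
# Tokens the constraint forbids get a score of -Inf, then the best remaining
# token is chosen.

"""
    constrained_greedy_pick(logits, allowed) -> Int

Return the index of the largest logit among positions where `allowed` is true.
"""
function constrained_greedy_pick(logits::Vector{Float32}, allowed::AbstractVector{Bool})
    length(logits) == length(allowed) || throw(ArgumentError("length mismatch"))
    any(allowed) || throw(ArgumentError("constraint allows no token"))
    masked = [ok ? l : -Inf32 for (l, ok) in zip(logits, allowed)]
    return argmax(masked)
end

# Hypothetical 4-token vocabulary and raw model scores.
vocab_toy = ["{", "hello", "\"city\"", "}"]
logits_toy = Float32[0.1, 2.5, 1.0, 0.3]

# Suppose the schema only permits "{" at the start of the output.
allowed_toy = [true, false, false, false]

picked = constrained_greedy_pick(logits_toy, allowed_toy)
println(vocab_toy[picked])
```

Without the mask, plain greedy decoding would pick "hello" (the highest raw score); with it, only schema-compatible tokens can ever be emitted.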

Multimodal Usage

```julia
# Load multimodal model
model, model_context, mtmd_context, vocab = load_and_initialize_mtmd(
    "path/to/model.gguf",
    "path/to/projection.gguf"
)

# Generate text from image
result = generate_mtmd_with_sampling(
    "Describe this image: <media>",
    ["path/to/image.jpg"],
    model=model,
    model_context=model_context,
    vocab=vocab,
    mtmd_context=mtmd_context
)
```
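A small pre-flight check can catch prompts whose media placeholders do not line up with the image list. The sketch below assumes that each image needs exactly one placeholder in the prompt, which is how llama.cpp's multimodal tokenizer generally behaves but is not stated in this README. `check_media_markers` is a hypothetical helper, and the default marker simply mirrors the placeholder used in the example above.

```julia
# Check that a prompt contains one marker per image before calling the model.
# Hypothetical helper; the one-marker-per-image rule is an assumption.
function check_media_markers(prompt::AbstractString, image_paths::Vector{String};
                             marker::AbstractString = "<media>")
    n_markers = length(findall(marker, prompt))
    n_images = length(image_paths)
    n_markers == n_images ||
        throw(ArgumentError("prompt has $n_markers marker(s) but $n_images image(s) were given"))
    return true
end

check_media_markers("Describe this image: <media>", ["path/to/image.jpg"])
```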

Enhanced Sampling

```julia
# Use different sampling strategies
creative_result = enhanced_constrained_generation(
    prompt,
    schema,
    tokenizer,
    model_context=model_context,
    vocab=vocab,
    sampling_params=creative_params()
)

# Or create custom sampling parameters
custom_params = SamplingParams(
    temperature=0.9f0,
    top_k=50,
    top_p=0.95f0,
    repeat_penalty=1.1f0
)
```
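For readers unfamiliar with the knobs above, here is a minimal, self-contained sketch of how temperature, top-k, and top-p are commonly defined. It follows the textbook definitions and is not the package's or llama.cpp's actual sampler chain; all function names and the toy logits are hypothetical.

```julia
using Random

# Toy sketch of temperature / top-k / top-p filtering (textbook definitions).

# Convert logits to probabilities after temperature scaling.
function softmax_with_temperature(logits::Vector{Float64}, temperature::Float64)
    scaled = logits ./ temperature
    shifted = exp.(scaled .- maximum(scaled))   # subtract max for numerical stability
    return shifted ./ sum(shifted)
end

# Keep only the k most probable tokens, then renormalize.
function top_k_filter(probs::Vector{Float64}, k::Int)
    keep = sortperm(probs; rev=true)[1:min(k, length(probs))]
    out = zeros(length(probs))
    out[keep] = probs[keep]
    return out ./ sum(out)
end

# Keep the smallest set of most probable tokens whose mass reaches p, then renormalize.
function top_p_filter(probs::Vector{Float64}, p::Float64)
    order = sortperm(probs; rev=true)
    cumulative = cumsum(probs[order])
    cutoff = findfirst(>=(p), cumulative)
    cutoff === nothing && (cutoff = length(order))
    out = zeros(length(probs))
    out[order[1:cutoff]] = probs[order[1:cutoff]]
    return out ./ sum(out)
end

# Draw one index from a discrete distribution.
function sample_index(rng::AbstractRNG, probs::Vector{Float64})
    r = rand(rng)
    idx = findfirst(c -> c > r, cumsum(probs))
    return idx === nothing ? findlast(>(0), probs) : idx
end

logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax_with_temperature(logits, 0.9)
filtered = top_p_filter(top_k_filter(probs, 3), 0.95)
token = sample_index(MersenneTwister(42), filtered)
println(filtered, " -> sampled index ", token)
```

Lower temperatures sharpen the distribution toward the top token, while smaller top-k or top-p values shrink the candidate set; greedy decoding is the limiting case where only the single best token survives.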

LoRA Training and Adaptation

```julia
# Train a LoRA adapter from scratch
gguf_path = train_lora_to_gguf(
    "google/gemma-2-2b-it",
    "your_hf_token_here",
    output_dir="my_lora_models"
)

# Load and use LoRA adapter
model, model_context, vocab = load_and_initialize("path/to/model.gguf")
adapter = load_lora_adapter(model, gguf_path)

# Apply adapter to context
result = set_adapter_lora(model_context, adapter, 1.0f0)
if result == 0
    println("LoRA adapter applied successfully")
end

# Generate with LoRA adaptation
response = generate_with_sampling(
    "Solve this economics problem: ...",
    model=model,
    model_context=model_context,
    vocab=vocab,
    max_new_tokens=100
)

# Remove adapter when done
rm_adapter_lora(model_context, adapter)
free_lora_adapter(adapter)
```
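The scale passed when applying an adapter (`1.0f0` above) controls how strongly the adapter modifies the base model. The sketch below illustrates this with the usual LoRA formulation, where a low-rank product is added to a base weight matrix. It is a conceptual toy, assuming the standard definition and ignoring the alpha/rank factor that real adapters also carry; `apply_lora` and the matrices are hypothetical and unrelated to how GGUF adapters are stored.

```julia
using LinearAlgebra

# Toy illustration of the usual LoRA update: W_eff = W + scale * (B * A).
# Shapes and values are made up for readability.
function apply_lora(W::Matrix{Float32}, A::Matrix{Float32}, B::Matrix{Float32}, scale::Float32)
    size(B, 2) == size(A, 1) || throw(DimensionMismatch("rank of A and B differ"))
    size(W) == (size(B, 1), size(A, 2)) || throw(DimensionMismatch("update does not match W"))
    return W .+ scale .* (B * A)
end

W = Matrix{Float32}(I, 4, 4)      # 4x4 base weight (identity for readability)
A = ones(Float32, 1, 4)           # rank-1 "down" projection
B = fill(0.5f0, 4, 1)             # rank-1 "up" projection

W_full = apply_lora(W, A, B, 1.0f0)   # adapter at full strength
W_soft = apply_lora(W, A, B, 0.8f0)   # same adapter, attenuated
W_off  = apply_lora(W, A, B, 0.0f0)   # scale 0 leaves the base weights unchanged
```

Because the update is additive, removing an adapter restores the base behaviour, and intermediate scales blend between the two.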

LoRA Manager for Multiple Adapters

```julia
# Create LoRA manager for handling multiple adapters
manager = LoRAManager(model)

# Load multiple adapters
load_adapter!(manager, "economics", "path/to/econadapter.gguf")
load_adapter!(manager, "creative", "path/to/creativeadapter.gguf")

# List available adapters
adapters = list_adapters(manager)
println("Available adapters: ", adapters)

# Switch between adapters
switch_adapter!(manager, model_context, "economics", scale=1.0f0)
# ... generate economics content

switch_adapter!(manager, model_context, "creative", scale=0.8f0)
# ... generate creative content

# Clear active adapter
clear_active_adapter!(manager, model_context)

# Clean up all adapters
free_all_adapters!(manager)
```

API Reference

Core Functions

init_apis!(): Initialize all API libraries (Windows)
init_apis_not_windows!(): Initialize all API libraries (Linux/macOS)
load_and_initialize(model_path; ...): Load LLaMA model for text generation
load_and_initialize_mtmd(model_path, proj_path; ...): Load multimodal model
generate_with_sampling(prompt; ...): Basic text generation
generate_mtmd_with_sampling(prompt, img_paths; ...): Multimodal generation
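Since the list above names one initializer for Windows and another for Linux/macOS, a script meant to run on several platforms could choose between them at startup. The sketch below is a hypothetical convenience, not part of the package: `initializer_name` returns the documented function name as a `Symbol`, so it runs even without the package installed.

```julia
# Pick the initializer name documented for the current platform.
# Hypothetical helper; returns a Symbol so this sketch has no dependencies.
initializer_name(; windows::Bool = Sys.iswindows()) =
    windows ? :init_apis! : :init_apis_not_windows!

println(initializer_name())

# With the package loaded, the chosen function could then be called as
# (assuming both names are reachable from the module, as the list suggests):
#   using LlamaCppOutlines
#   getfield(LlamaCppOutlines, initializer_name())()
```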

Constrained Generation

greedy_constrained_generation(prompt, schema, tokenizer; ...): Greedy constrained output
enhanced_constrained_generation(prompt, schema, tokenizer; ...): Enhanced sampling with constraints
greedy_mtmd_constrained_generation(prompt, img_paths, schema, tokenizer; ...): Multimodal constrained output
enhanced_mtmd_constrained_generation(prompt, img_paths, schema, tokenizer; ...): Enhanced multimodal sampling

Sampling Parameters

SamplingParams(; ...): Create custom sampling configuration
creative_params(): High creativity settings
balanced_params(): Balanced settings (default)
focused_params(): Low creativity, focused output
greedy_params(): Deterministic output

LoRA Functions

train_lora_to_gguf(model_name, hf_token; ...): Train LoRA adapter and convert to GGUF
load_lora_adapter(model, adapter_path): Load LoRA adapter from GGUF file
free_lora_adapter(adapter): Free LoRA adapter from memory
set_adapter_lora(ctx, adapter, scale): Set LoRA adapter on context
rm_adapter_lora(ctx, adapter): Remove LoRA adapter from context
clear_adapter_lora(ctx): Clear all LoRA adapters from context

LoRA Manager

LoRAManager(model): Create LoRA manager for multiple adapters
load_adapter!(manager, name, path): Load adapter into manager
switch_adapter!(manager, ctx, name; scale=1.0f0): Switch to specific adapter
clear_active_adapter!(manager, ctx): Clear active adapter
free_all_adapters!(manager): Free all managed adapters
list_adapters(manager): List all loaded adapters
get_active_adapter(manager): Get name of active adapter
list_lora_gguf_files(directory="lora_gguf"): List GGUF files in directory

Requirements

System Dependencies

LLaMA.cpp: Binary builds through artifact
Outlines-core: Binary builds through artifact

⚠️ Linux/macOS Users: This package currently includes Windows binaries only. Binaries for other platforms are planned.

Authentication Requirements

HuggingFace Token: Required for full API functionality
- Needed for constrained generation with HuggingFace tokenizers
- Required for LoRA training with gated models (e.g., Gemma, Llama)
- Required for vocabulary creation from HuggingFace models
- Get your token at: https://huggingface.co/settings/tokens
- Use with hf_token parameter in relevant functions
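Rather than pasting the token into source files, it can be read from the environment at run time. The sketch below is one way to do that; the helper `hf_token_from_env` and the variable name `HF_TOKEN` are assumptions for illustration, not something the package requires.

```julia
# Read a HuggingFace token from the environment instead of hard-coding it.
# The variable name HF_TOKEN is an assumption, not mandated by the package.
function hf_token_from_env(var::AbstractString = "HF_TOKEN")
    token = strip(get(ENV, var, ""))
    isempty(token) && error("Set the $var environment variable to your HuggingFace token")
    return String(token)
end

# Demonstration with a temporary, fake value so the sketch runs anywhere.
withenv("HF_TOKEN" => "hf_example_token") do
    println("token length: ", length(hf_token_from_env()))
end
```

The returned string can then be passed wherever a function takes an `hf_token` argument, for example as the second positional argument of `train_lora_to_gguf`.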

Optional Dependencies

For image processing (multimodal functionality), in the Julia package REPL: `] add Images FileIO ImageMagick`

For audio processing: `] add LibSndFile`

For LoRA training (automatically installed by train_lora_to_gguf): `pip install torch transformers peft datasets huggingface_hub python-dotenv accelerate`

Once initialized, all functionality works identically to Windows.

Directory Structure

LlamaCppOutlines/
├── src/
│   ├── LlamaCppOutlines.jl          # Main module
│   └── constrained_generation.jl    # Constrained generation functions
├── lib/
│   ├── llama_api.jl                 # LLaMA API bindings
│   ├── mtmd_api.jl                  # Multimodal API bindings
│   ├── outlines_api.jl              # Outlines API bindings
│   └── lora_to_gguf.py              # LoRA training script
├── test/
│   ├── runtests.jl                  # Test runner
│   ├── test_basic_api.jl            # Basic API tests
│   ├── test_multimodal_api.jl       # Multimodal tests
│   ├── test_outlines_api.jl         # Outlines tests
│   └── test_integration.jl          # Integration tests
├── Project.toml
└── Artifacts.toml

Testing

In the Julia package REPL: `] test LlamaCppOutlines`

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Run the test suite
Submit a pull request

License

This project is licensed under the MIT License.

Owner

Login: krishnaveti
Kind: user

Repositories: 1
Profile: https://github.com/krishnaveti

Citation (Citation.cff)

cff-version: 1.2.0
message: >
  If you use LlamaCppOutlines.jl in your work and find it helpful, please cite it using the following metadata.
title: LlamaCppOutlines.jl
version: 0.1.0
doi: 10.5281/zenodo.15857210
date-released: 2025-07-10
authors:
  - family-names: K S
    given-names: Vikram
    orcid: https://orcid.org/0009-0008-3118-411X
    affiliation: University of Cincinnati
repository-code: https://github.com/krishnaveti/llamacpp-outlines
license: MIT
keywords:
  - Julia
  - LLaMA
  - GGUF
  - multimodal
  - Outlines
  - LoRA
  - inference
  - structured generation
  - constrained decoding
  - large language models

GitHub Events

Total

Create event: 3
Commit comment event: 15
Release event: 3
Watch event: 1
Delete event: 5
Push event: 16

Last Year

Create event: 3
Commit comment event: 15
Release event: 3
Watch event: 1
Delete event: 5
Push event: 16

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 3

juliahub.com: LlamaCppOutlines

Llama Cpp and Outlines wrapper in one tight package.

Documentation: https://docs.juliahub.com/General/LlamaCppOutlines/stable/
License: MIT
Latest release: 1.0.0
published 12 months ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 0 Total

Rankings

Dependent repos count: 8.3%

Average: 21.9%

Dependent packages count: 35.5%

Last synced: 10 months ago
