LlamaCppOutlines
Llama Cpp and Outlines wrapper in one tight package.
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 2
LlamaCppOutlines.jl
A Julia package for LLaMA inference with structured output generation using Outlines constraints.
Features
- LLaMA Model Inference: Basic text generation with sampling
- Multimodal Support: Text + image generation with multimodal models
- Constrained Generation: Structured output using JSON schema constraints
- Enhanced Sampling: Multiple sampling strategies (greedy, top-k, top-p, temperature)
- LoRA Support: Dynamic adapter loading and switching
- GPU Support: Automatic CUDA detection
Installation
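In the Julia REPL, press `]` to enter package mode and run `add LlamaCppOutlines`. Alternatively, after `using Pkg`, install directly from the repository with `Pkg.add(url="https://github.com/krishnaveti/LlamaCppOutlines.jl")`.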
Warning: Only Windows is currently supported.
Quick Start
```julia
using LlamaCppOutlines

# Initialize the APIs
init_apis!()

# Load a model
model, model_context, vocab = load_and_initialize("path/to/model.gguf")

# Basic text generation
result = generate_with_sampling(
    "What is the capital of France?",
    model=model,
    model_context=model_context,
    vocab=vocab,
    max_new_tokens=50
)

# Constrained generation with JSON schema
schema = Dict(
    "type" => "object",
    "properties" => Dict(
        "city" => Dict("type" => "string"),
        "country" => Dict("type" => "string")
    ),
    "required" => ["city", "country"]
)

result = greedy_constrained_generation(
    "What is the capital of France?",
    schema,
    "google/gemma-2-2b-it",
    model_context=model_context,
    vocab=vocab
)
```
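Constrained generation is meant to produce output that matches the schema, but a quick sanity check on the parsed result can still be useful in tests. The sketch below assumes the generated text has already been parsed into a `Dict` with a JSON package of your choice; `missing_required` is a hypothetical helper and not part of LlamaCppOutlines.

```julia
# Hypothetical helper (not part of LlamaCppOutlines): list the keys that the
# schema marks as required but that are absent from the parsed object.
function missing_required(schema::AbstractDict, obj::AbstractDict)
    required = get(schema, "required", String[])
    return [key for key in required if !haskey(obj, key)]
end

# Same schema as in the Quick Start example.
schema = Dict(
    "type" => "object",
    "properties" => Dict(
        "city" => Dict("type" => "string"),
        "country" => Dict("type" => "string")
    ),
    "required" => ["city", "country"]
)

# Stand-in for a parsed generation result (assumed shape).
parsed = Dict("city" => "Paris", "country" => "France")

if isempty(missing_required(schema, parsed))
    println("All required fields are present")
end
```

This only checks for the presence of required keys; it does not validate value types or nested objects.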
Multimodal Usage
```julia
# Load multimodal model
model, model_context, mtmd_context, vocab = load_and_initialize_mtmd(
    "path/to/model.gguf",
    "path/to/projection.gguf"
)

# Generate text from image
result = generate_mtmd_with_sampling(
    "Describe this image: <media>",
    ["path/to/image.jpg"],
    model=model,
    model_context=model_context,
    vocab=vocab,
    mtmd_context=mtmd_context
)
```
Enhanced Sampling
```julia
# Use different sampling strategies
creative_result = enhanced_constrained_generation(
    prompt,
    schema,
    tokenizer,
    model_context=model_context,
    vocab=vocab,
    sampling_params=creative_params()
)

# Or create custom sampling parameters
custom_params = SamplingParams(
    temperature=0.9f0,
    top_k=50,
    top_p=0.95f0,
    repeat_penalty=1.1f0
)
```
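To build intuition for what temperature, top-k, and top-p do to a token distribution, the following is a minimal, self-contained sketch in plain Julia. It is illustrative only: `filtered_probs` and its keyword names are assumptions made for this example, and the package's actual sampler (provided through llama.cpp) may order or combine these steps differently.

```julia
# Illustrative only, not the package's sampler: turn raw logits into a
# probability vector after temperature scaling, top-k and top-p filtering.
function filtered_probs(logits::AbstractVector{<:Real};
                        temperature::Real=1.0f0,
                        top_k::Integer=0,
                        top_p::Real=1.0f0)
    temperature > 0 || error("temperature must be positive")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = Float32.(logits) ./ Float32(temperature)
    probs = exp.(scaled .- maximum(scaled))
    probs ./= sum(probs)

    # Token indices sorted from most to least likely.
    order = sortperm(probs; rev=true)

    # Top-k: keep at most k candidates (0 disables this filter).
    keep = top_k > 0 ? min(top_k, length(order)) : length(order)

    # Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    cumulative = cumsum(probs[order[1:keep]])
    cutoff = findfirst(>=(top_p), cumulative)
    keep = cutoff === nothing ? keep : cutoff

    # Zero out the rejected tokens and renormalize the survivors.
    kept = order[1:keep]
    out = zeros(Float32, length(probs))
    out[kept] = probs[kept]
    return out ./ sum(out)
end

logits = Float32[2, 1, 0, -1]
println(filtered_probs(logits; top_k=2))
println(filtered_probs(logits; temperature=2.0f0, top_p=0.8f0))
```

Lower temperatures sharpen the distribution toward the most likely token, while smaller `top_k` or `top_p` values shrink the candidate set. With `top_k=1` the result is a one-hot vector, which corresponds to greedy decoding.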
LoRA Training and Adaptation
```julia
# Train a LoRA adapter from scratch
gguf_path = train_lora_to_gguf(
    "google/gemma-2-2b-it",
    "your_hf_token_here",
    output_dir="my_lora_models"
)

# Load and use LoRA adapter
model, model_context, vocab = load_and_initialize("path/to/model.gguf")
adapter = load_lora_adapter(model, gguf_path)

# Apply adapter to context
result = set_adapter_lora(model_context, adapter, 1.0f0)
if result == 0
    println("LoRA adapter applied successfully")
end

# Generate with LoRA adaptation
response = generate_with_sampling(
    "Solve this economics problem: ...",
    model=model,
    model_context=model_context,
    vocab=vocab,
    max_new_tokens=100
)

# Remove adapter when done
rm_adapter_lora(model_context, adapter)
free_lora_adapter(adapter)
```
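If generation throws while an adapter is applied, the removal step above is skipped. One way to guard against that is a `try`/`finally` wrapper. The sketch below is generic and takes the apply and remove steps as closures; `with_adapter` is a hypothetical helper, not part of the package, and it assumes (as in the example above) that a status of `0` means the adapter was applied successfully.

```julia
# Hypothetical helper (not part of LlamaCppOutlines): run `work` with an
# adapter applied and always run `remove!` afterwards, even if `work` throws.
function with_adapter(work::Function, apply!::Function, remove!::Function)
    status = apply!()
    status == 0 || error("applying the adapter failed with status $status")
    try
        return work()
    finally
        remove!()
    end
end

# Stand-in closures so the sketch runs without a model. In real use they
# would wrap the package's apply, generate, and remove calls.
events = String[]
result = with_adapter(
    () -> (push!(events, "apply"); 0),
    () -> push!(events, "remove")
) do
    push!(events, "generate")
    "response text"
end

println(events)
```

Freeing the adapter itself is left to the caller, since the same adapter may be applied again later.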
LoRA Manager for Multiple Adapters
```julia
# Create LoRA manager for handling multiple adapters
manager = LoRAManager(model)

# Load multiple adapters
load_adapter!(manager, "economics", "path/to/econ_adapter.gguf")
load_adapter!(manager, "creative", "path/to/creative_adapter.gguf")

# List available adapters
adapters = list_adapters(manager)
println("Available adapters: ", adapters)

# Switch between adapters
switch_adapter!(manager, model_context, "economics", scale=1.0f0)
# ... generate economics content

switch_adapter!(manager, model_context, "creative", scale=0.8f0)
# ... generate creative content

# Clear active adapter
clear_active_adapter!(manager, model_context)

# Clean up all adapters
free_all_adapters!(manager)
```
API Reference
Core Functions
- `init_apis!()`: Initialize all API libraries (Windows)
- `init_apis_not_windows!()`: Initialize all API libraries (Linux/macOS)
- `load_and_initialize(model_path; ...)`: Load LLaMA model for text generation
- `load_and_initialize_mtmd(model_path, proj_path; ...)`: Load multimodal model
- `generate_with_sampling(prompt; ...)`: Basic text generation
- `generate_mtmd_with_sampling(prompt, img_paths; ...)`: Multimodal generation
Constrained Generation
- `greedy_constrained_generation(prompt, schema, tokenizer; ...)`: Greedy constrained output
- `enhanced_constrained_generation(prompt, schema, tokenizer; ...)`: Enhanced sampling with constraints
- `greedy_mtmd_constrained_generation(prompt, img_paths, schema, tokenizer; ...)`: Multimodal constrained output
- `enhanced_mtmd_constrained_generation(prompt, img_paths, schema, tokenizer; ...)`: Enhanced multimodal sampling
Sampling Parameters
- `SamplingParams(; ...)`: Create custom sampling configuration
- `creative_params()`: High creativity settings
- `balanced_params()`: Balanced settings (default)
- `focused_params()`: Low creativity, focused output
- `greedy_params()`: Deterministic output
LoRA Functions
- `train_lora_to_gguf(model_name, hf_token; ...)`: Train LoRA adapter and convert to GGUF
- `load_lora_adapter(model, adapter_path)`: Load LoRA adapter from GGUF file
- `free_lora_adapter(adapter)`: Free LoRA adapter from memory
- `set_adapter_lora(ctx, adapter, scale)`: Set LoRA adapter on context
- `rm_adapter_lora(ctx, adapter)`: Remove LoRA adapter from context
- `clear_adapter_lora(ctx)`: Clear all LoRA adapters from context
LoRA Manager
- `LoRAManager(model)`: Create LoRA manager for multiple adapters
- `load_adapter!(manager, name, path)`: Load adapter into manager
- `switch_adapter!(manager, ctx, name; scale=1.0f0)`: Switch to specific adapter
- `clear_active_adapter!(manager, ctx)`: Clear active adapter
- `free_all_adapters!(manager)`: Free all managed adapters
- `list_adapters(manager)`: List all loaded adapters
- `get_active_adapter(manager)`: Get name of active adapter
- `list_lora_gguf_files(directory="lora_gguf")`: List GGUF files in directory
Requirements
System Dependencies
- LLaMA.cpp: Binary builds through artifact
- Outlines-core: Binary builds through artifact
⚠️ Linux/macOS Users: This package currently includes Windows binaries only. Linux and macOS binaries are planned.
Authentication Requirements
- HuggingFace Token: Required for full API functionality
- Needed for constrained generation with HuggingFace tokenizers
- Required for LoRA training with gated models (e.g., Gemma, Llama)
- Required for vocabulary creation from HuggingFace models
- Get your token at: https://huggingface.co/settings/tokens
- Use with the `hf_token` parameter in relevant functions
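To avoid hardcoding the token in scripts, it can be read from an environment variable and passed on explicitly. The sketch below assumes the token is stored in `HF_TOKEN`; `hf_token_from_env` is a hypothetical helper, and nothing here implies that the package reads this variable on its own.

```julia
# Hypothetical helper (not part of LlamaCppOutlines): fetch the HuggingFace
# token from an environment variable and fail early if it is missing.
function hf_token_from_env(var::AbstractString="HF_TOKEN")
    token = strip(get(ENV, var, ""))
    isempty(token) && error("Set the $var environment variable to your HuggingFace token")
    return String(token)
end

# Example use (commented out because it needs a real token and model access):
# gguf_path = train_lora_to_gguf("google/gemma-2-2b-it", hf_token_from_env())
```

Failing early with a clear message is usually easier to debug than an authentication error raised deep inside a training or tokenizer download step.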
Optional Dependencies
For image processing (multimodal functionality), run `] add Images FileIO ImageMagick` in the Julia REPL.
For audio processing, run `] add LibSndFile`.
For LoRA training, the Python dependencies are installed automatically by `train_lora_to_gguf`. To install them manually, run `pip install torch transformers peft datasets huggingface_hub python-dotenv accelerate`.
Once initialized, all functionality works identically to Windows.
Directory Structure
LlamaCppOutlines/
├── src/
│ ├── LlamaCppOutlines.jl # Main module
│ └── constrained_generation.jl # Constrained generation functions
├── lib/
│ ├── llama_api.jl # LLaMA API bindings
│ ├── mtmd_api.jl # Multimodal API bindings
│ ├── outlines_api.jl # Outlines API bindings
│ └── lora_to_gguf.py # LoRA training script
├── test/
│ ├── runtests.jl # Test runner
│ ├── test_basic_api.jl # Basic API tests
│ ├── test_multimodal_api.jl # Multimodal tests
│ ├── test_outlines_api.jl # Outlines tests
│ └── test_integration.jl # Integration tests
├── Project.toml
└── Artifacts.toml
Testing
Run the test suite from package mode in the Julia REPL with `] test LlamaCppOutlines`.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
License
This project is licensed under the MIT License.
Owner
- Login: krishnaveti
- Kind: user
- Repositories: 1
- Profile: https://github.com/krishnaveti
Citation (Citation.cff)
cff-version: 1.2.0
message: >
  If you use LlamaCppOutlines.jl in your work and find it helpful, please cite it using the following metadata.
title: LlamaCppOutlines.jl
version: 0.1.0
doi: 10.5281/zenodo.15857210
date-released: 2025-07-10
authors:
  - family-names: K S
    given-names: Vikram
    orcid: https://orcid.org/0009-0008-3118-411X
    affiliation: University of Cincinnati
repository-code: https://github.com/krishnaveti/llamacpp-outlines
license: MIT
keywords:
  - Julia
  - LLaMA
  - GGUF
  - multimodal
  - Outlines
  - LoRA
  - inference
  - structured generation
  - constrained decoding
  - large language models
GitHub Events
Total
- Create event: 3
- Commit comment event: 15
- Release event: 3
- Watch event: 1
- Delete event: 5
- Push event: 16
Last Year
- Create event: 3
- Commit comment event: 15
- Release event: 3
- Watch event: 1
- Delete event: 5
- Push event: 16
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 3
juliahub.com: LlamaCppOutlines
- Documentation: https://docs.juliahub.com/General/LlamaCppOutlines/stable/
- License: MIT
- Latest release: 1.0.0 (published 12 months ago)