bayesian-lora

Bayesian Low-Rank Adaptation for Large Language Models

https://github.com/maximerobeyns/bayesian_lora

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file (found)
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.4%) to scientific vocabulary

Keywords

kfac laplace-approximation llm

Last synced: 11 months ago

Repository

Bayesian Low-Rank Adaptation for Large Language Models

Basic Info

Host: GitHub
Owner: MaximeRobeyns
License: apache-2.0
Language: Python
Default Branch: master
Homepage: https://maximerobeyns.github.io/bayesian_lora/index.html
Size: 1.74 MB

Statistics

Stars: 35
Watchers: 4
Forks: 6
Open Issues: 2
Releases: 0

Topics

kfac laplace-approximation llm

Created over 2 years ago · Last pushed about 2 years ago

Metadata Files

Readme License Citation

Bayesian LoRA

Code for the paper Bayesian Low-Rank Adaptation for Large Language Models.

See the explanatory blog post and documentation for more information.

Installation

```bash
pip install bayesian-lora
```

Example

We provide a comprehensive example in examples/example_usage.py, running through the main methods using Phi-2 on ARC-E.

Note that running this requires a local installation with a few extra dependencies. Run:

```bash
git clone https://github.com/MaximeRobeyns/bayesian_lora
cd bayesian_lora
pip install -e ".[examples]"
```

and then

```bash
python ./examples/example_usage.py
```

The main functions this library provides are for calculating Kronecker factors, the marginal likelihood, and the posterior predictive distribution. We show how to use these in the examples below.

Calculating (low-rank) Kronecker factors

First, wrap your model call in a function that takes a batch from your data loader, and returns the relevant logits. For a CausalLM from HuggingFace:

```python
from typing import Any

import torch as t
import torch.nn as nn


def fwd_call(model: nn.Module, batch_prompts: Any) -> t.Tensor:
    # `tokenizer` and `device` are assumed to be defined in the enclosing scope
    inputs = tokenizer(batch_prompts).to(device)
    outputs = model(**inputs)
    logits = outputs.logits[:, -1]  # Get the last token logits
    return logits
```

You can now call our `calculate_kronecker_factors` function:

```python
from bayesian_lora import calculate_kronecker_factors

factors = calculate_kronecker_factors(
    model,             # Your model (not necessarily PEFT)
    fwd_call,          # Model call wrapper, defined above
    train_loader,      # Your training data loader
    cfg.n_kfac,        # (Optional) rank to use
    cfg.lr_threshold,  # (Optional) threshold for low-rank approximation
    ["lora"],          # (Optional) modules to target; defaults to all modules
    use_tqdm=True,     # (Optional) use tqdm for progress bar
)
```

In the above, the `["lora"]` argument contains a case-insensitive list of keywords to identify modules to target. Since we're working with a LoRA model, we choose `"lora"` to target LoRA modules, for instance `layers.0.q_proj.lora_A`.
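To make the keyword argument concrete, here is a minimal sketch of case-insensitive substring matching over module names. The helper `select_modules` and the example names are hypothetical, introduced only for illustration; the library's actual matching logic may differ.

```python
from typing import Iterable, List


def select_modules(module_names: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return the names that contain any keyword, ignoring case (hypothetical helper)."""
    lowered = [k.lower() for k in keywords]
    return [
        name for name in module_names
        if any(k in name.lower() for k in lowered)
    ]


# Example module names, invented for illustration.
names = [
    "layers.0.q_proj.base_layer",
    "layers.0.q_proj.lora_A",
    "layers.0.q_proj.lora_B",
    "lm_head",
]

targets = select_modules(names, ["LoRA"])
print(targets)
```

With a real PyTorch model, the candidate names could be taken from `model.named_modules()`.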

The factors are a dictionary whose keys are the full names of the targeted modules and whose values are tuples of two tensors: the first is the (possibly low-rank) Kronecker factor corresponding to the input activations, and the second is the (possibly low-rank) factor corresponding to the output gradients.
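For reference, the standard K-FAC approximation for a single linear layer is sketched below. The symbols are assumptions introduced here for illustration: $\mathbf{a}_\ell$ denotes the layer's input activations, $\mathbf{g}_\ell$ the gradient of the log-likelihood with respect to the layer's outputs, and $\mathbf{F}_\ell$ the layer's block of the Fisher information. The library's exact normalisation and low-rank treatment may differ.

```latex
\begin{align}
  \mathbf{F}_\ell
    &= \mathbb{E}\!\left[
         \left(\mathbf{a}_\ell \mathbf{a}_\ell^\top\right) \otimes
         \left(\mathbf{g}_\ell \mathbf{g}_\ell^\top\right)
       \right] \\
    &\approx \underbrace{\mathbb{E}\!\left[\mathbf{a}_\ell \mathbf{a}_\ell^\top\right]}_{\mathbf{A}_\ell}
       \otimes
       \underbrace{\mathbb{E}\!\left[\mathbf{g}_\ell \mathbf{g}_\ell^\top\right]}_{\mathbf{G}_\ell}
\end{align}
```

In this notation, the first tensor in each tuple plays the role of $\mathbf{A}_\ell$ and the second the role of $\mathbf{G}_\ell$. The order of the Kronecker product depends on the vectorisation convention.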

See the K-FAC docs for more detail.

Model Evidence

We provide a function called model_evidence which returns the evidence / marginal likelihood.

```python
from bayesian_lora import model_evidence

evidence = model_evidence(
    model,           # Your model
    log_likelihood,  # A Tensor with model's log likelihood on some eval dataset
    factors,         # Kronecker factors, as calculated above
    n_lora,          # rank used in the LoRA adapters
    n_kfac,          # rank used in the Kronecker factors
    prior_var,       # prior variance hyperparameter, as a tensor
)
```

You can then use evidence as the loss in a normal training loop, presuming your parameters (e.g. prior_var) have gradients.
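As background, the Laplace approximation to the log marginal likelihood takes the general form below. This is the textbook expression, not necessarily the exact quantity that `model_evidence` computes. The symbols are introduced here for illustration: $\theta^*$ is the MAP estimate of the parameters, $d$ is the number of parameters, and $\boldsymbol{\Lambda}$ is the posterior precision.

```latex
\begin{equation}
  \log p(\mathcal{D}) \approx
    \log p(\mathcal{D} \mid \theta^*) + \log p(\theta^*)
    + \frac{d}{2} \log 2\pi
    - \frac{1}{2} \log \det \boldsymbol{\Lambda},
  \qquad
  \boldsymbol{\Lambda} =
    -\nabla^2_\theta \log p(\mathcal{D}, \theta) \Big|_{\theta = \theta^*}
\end{equation}
```

With a zero-mean isotropic Gaussian prior, the prior variance enters through both $\log p(\theta^*)$ and $\boldsymbol{\Lambda}$, which is why it can be tuned by gradient-based optimisation of this quantity.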

Posterior Predictive Distribution

To get the parameters of the Gaussian over the logits, use the jacobian_mean and variance functions.

```python
with t.no_grad():
    for batch in validation_loader:
        prompts, classes = batch

        batch_inputs = tokenizer(prompts)

        # Predict the output logit locations
        # target_ids is a tensor containing the indices of the target tokens
        # e.g. [354, 355, 356].
        jacobian, f_mu = jacobian_mean(
            model, batch_inputs, target_ids
        )

        # Predict the output logit variances
        f_var = variance(
            batch_inputs,     # inputs
            jacobian,         # the Jacobian dictionary, obtained above
            factors,          # Kronecker factors, as calculated above
            prior_var,        # prior variance hyperparameter, as a tensor
            classes.size(-1), # number of classes to predict
            n_lora,           # rank of the LoRA adapters
            n_kfac,           # rank of the Kronecker factors
            device,           # device to use
        )

        # Now use the parameters to e.g. sample logits from the Gaussian
        # predictive, parametrised by f_mu, f_var
        L = t.linalg.cholesky(f_var)
        samples = 100_000
        f_mu = f_mu.expand(samples, *f_mu.shape)
        L = L.expand(samples, *L.shape)
        eps = t.randn_like(f_mu)
        logits = (f_mu + L @ eps).squeeze(-1).mean(0)
```

The above is a minimal example; see this section of the documentation for more detail.
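The sampling step can also be illustrated without any dependencies. The sketch below draws logits as `f = mu + L @ eps` with `eps ~ N(0, I)`, where `L` is the Cholesky factor of the covariance, for a single input with three classes. It is written against the standard library only; the function names and the numeric values are assumptions made for illustration and are not part of the library. Unlike the snippet above, it averages softmax probabilities rather than logits, which is a common way to form a Monte Carlo predictive estimate.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def cholesky(cov: Matrix) -> Matrix:
    """Lower-triangular L with L @ L.T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_probs(
    mu: List[float], cov: Matrix, n_samples: int = 10_000, seed: int = 0
) -> List[float]:
    """Average softmax over logit samples f = mu + L @ eps, with eps ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(mu)
    acc = [0.0] * n
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # L is lower triangular, so row i only needs eps[0..i]
        f = [mu[i] + sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]
        for i, p in enumerate(softmax(f)):
            acc[i] += p
    return [a / n_samples for a in acc]


# Illustrative mean and covariance of the logits for one input.
f_mu = [2.0, 0.5, -1.0]
f_var = [
    [1.0, 0.2, 0.0],
    [0.2, 0.5, 0.1],
    [0.0, 0.1, 0.3],
]

probs = predictive_probs(f_mu, f_var)
print(probs)
```

In practice the batched PyTorch version is far faster; this version only makes the arithmetic of each step explicit.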

Development

This library is intentionally very small and hackable. It has two main files and three dependencies (torch, tqdm and jaxtyping).

- main.py contains methods specific to the paper
- kfac.py contains relatively portable K-FAC methods

Feel free to directly copy the code into your projects and hack on it.

Owner

Name: Maxime Robeyns
Login: MaximeRobeyns
Kind: user
Location: London

Website: maximerobeyns.com
Twitter: maxime_robeyns
Repositories: 6
Profile: https://github.com/MaximeRobeyns

PhD student in probabilistic machine learning

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Robeyns
    given-names: Maxime
    orcid: https://orcid.org/0000-0001-9802-9597
title: "Bayesian LoRA"
version: 0.0.1
date-released: 2024-01-31
repository-code: "https://github.com/MaximeRobeyns/bayesian_lora"

GitHub Events

Total

Watch event: 9
Fork event: 5

Last Year

Watch event: 9
Fork event: 5

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 6
Total pull requests: 0
Average time to close issues: 21 days
Average time to close pull requests: N/A
Total issue authors: 4
Total pull request authors: 0
Average comments per issue: 0.83
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: 10 minutes
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 0.5
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

xuullin (3)
flyleeee (1)
sandylaker (1)
brooksniu (1)


Packages

Total packages: 1
Total downloads:
- pypi 113 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 6
Total maintainers: 1

pypi.org: bayesian-lora

Bayesian LoRA adapters for Language Models

Homepage: https://github.com/MaximeRobeyns/bayesian_lora
Documentation: https://maximerobeyns.github.io/bayesian_lora/
License: Apache-2.0
Latest release: 0.0.6
published over 2 years ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 113 Last month

Rankings

Dependent packages count: 9.9%

Average: 37.5%

Dependent repos count: 65.1%

Maintainers (1)

mrobeyns