Recent Releases of cappr
cappr - v0.9.6 - fix Llama 3 tokenizer
Breaking changes
None
New features
None
Bug fixes
- `cappr.huggingface` is compatible with Llama 3/3.1's tokenizer. It works around this issue using code from this PR (with small modifications). See the updated list of supported architectures here.
Published by kddubey over 1 year ago
cappr - v0.9.5 - address deprecation of HF KV tuple
Breaking changes
None
New features
- `cappr.huggingface.classify` internally passes in a `DynamicCache` object if possible. This change gets rid of a warning you might see when running previous versions of CAPPr:
```
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
```
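For reference, here's a rough sketch of the idea behind the change, written directly against the `transformers` API (the model name and texts are placeholders; this is not CAPPr's internal code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_name = "gpt2"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt_ids = tokenizer("A shared prompt prefix", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=prompt_ids, use_cache=True)

# Pass a Cache object on the next forward call instead of the deprecated tuple
cache = out.past_key_values
if isinstance(cache, tuple):
    cache = DynamicCache.from_legacy_cache(cache)

completion_ids = tokenizer(" and a completion", return_tensors="pt").input_ids
with torch.no_grad():
    _ = model(input_ids=completion_ids, past_key_values=cache, use_cache=True)
```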
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.4 - don't repeat KV if possible
Breaking changes
None
New features
- `cappr.huggingface.classify` doesn't copy data if `batch_size=1`. Instead, it repeats a view of the data. This change saves memory for tasks where there are many completions. For example, in the Banking 77 demo, peak VRAM usage decreases by ~5 GB.
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.3 - log-probs is an array when possible
Breaking changes
None
New features
- The `agg_log_probs` function can return a numpy array instead of a list if there's a constant number of completions.
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.2 - arbitrary token log-prob aggregation
Breaking changes
None
New features
- The `agg_log_probs` function applies `func` instead of `np.exp ∘ func`. So if you want average token log-probabilities, set `func=np.mean` (see the sketch below).
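For example, here's a minimal sketch of averaging token log-probabilities. The nested-list structure (prompts × completions × tokens) and the numbers are made up for illustration:

```python
import numpy as np
from cappr.utils.classify import agg_log_probs

# Token log-probs for 2 prompts x 2 completions (made-up numbers)
log_probs = [
    [[-1.0, -0.5], [-2.0, -0.1, -0.3]],
    [[-0.2, -0.4], [-1.5, -0.8, -0.9]],
]

# Average token log-probability for each (prompt, completion) pair.
# With a constant number of completions per prompt, this comes back as a
# numpy array as of v0.9.3.
avg_log_probs = agg_log_probs(log_probs, func=np.mean)
```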
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.1 - no setup.py
Breaking changes
- There's no `setup.py` file, in case you were relying on that.
New features
None
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.0 - don't require openai, tiktoken
Breaking changes
- `pip install cappr` will no longer install `openai`, `tiktoken`. Install them yourself, or install them using `pip install "cappr[openai]"`. For previous versions of `cappr`, if you needed to install `cappr` without these dependencies, you had to run:
```bash
python -m pip install \
  "numpy>=1.21.0" \
  "tqdm>=4.27.0" && \
python -m pip install --no-deps cappr
```
- `cappr.openai.api.Model` no longer includes the deprecated `text-*` models
New features
None
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.8 - default axis for posterior prob
Breaking changes
None
New features
- The axis of `posterior_prob` defaults to the last one
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.7 - Llama CPP no need for logits_all=True
Breaking changes
- None
New features
- You no longer need to instantiate your Llama CPP model with `logits_all=True` (see the sketch below)
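Here's a minimal sketch of what instantiation looks like now. The GGUF path is a placeholder, and the call signature is assumed to follow the usual prompts/completions/model pattern:

```python
from llama_cpp import Llama
from cappr.llama_cpp.classify import predict

# No more logits_all=True needed when creating the model
model = Llama("./path/to/model.gguf", verbose=False)

pred = predict(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model=model,
)
```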
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.6 - support LongLLaMA
Breaking changes
- Setting the internal `past` attribute of the cache to `None` will now cause an error to be raised if you try to use it again. Please use the original model instead
New features
- Support LongLLaMA
- `repr` for cached model
- Don't check logits from Llama CPP
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.5 - slightly better docstrings
Breaking changes
None
New features
- Slightly better docstrings
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.8.4 - cache_model
Breaking changes
None
New features
- For online applications, you can cache the model once and use it later. See `cappr.huggingface.classify.cache_model` and `cappr.llama_cpp.classify.cache_model`. See the demonstration in the Craigslist Bargains demo (and the sketch below)
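Here's a minimal sketch of the idea, assuming `cache_model` takes the `(model, tokenizer)` pair plus the shared text to cache and returns a cached pair (the model name and prompts are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache_model, predict

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

# Cache the shared instructions once, e.g. at server startup...
instructions = "Classify the sentiment of the following product review."
cached_model_and_tokenizer = cache_model(model_and_tokenizer, instructions)

# ...then reuse the cached model for every request that arrives later
pred = predict(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model_and_tokenizer=cached_model_and_tokenizer,
)
```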
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.8.3 - fixes for openai>=1.2, openai<1.0
Breaking changes
None
New features
None
Bug fixes
- Now works for `openai>=1.2.0`, plus a backwards compatibility fix for `openai<1.0.0`
Published by kddubey over 2 years ago
cappr - v0.8.2 - fix for older numpy versions
Breaking changes
None
New features
None
Bug fixes
- For older versions of numpy, `_examples` functions now work
Published by kddubey over 2 years ago
cappr - v0.8.1 - OpenAI v1.0.0 compatibility
Breaking changes
- `end_of_prompt` must be a whitespace or empty string. (This was intended since v0.4.7, but I forgot to add a check for it)
New features
- `cappr.openai` is now compatible with OpenAI v1.0.0 (and is backwards compatible with previous versions). You can input a `client` object (see the sketch below)
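Here's a minimal sketch of what that might look like; the `client` keyword name and the model string are assumptions for illustration:

```python
from openai import OpenAI
from cappr.openai.classify import predict

# Build the client yourself, or rely on the OPENAI_API_KEY environment variable
client = OpenAI(api_key="sk-...")

pred = predict(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model="a-completions-model",  # placeholder; must support echo=True, logprobs
    client=client,
)
```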
Bug fixes
- `cappr.huggingface.classify.token_logprobs` now has an option to add a BOS token or not. Previously, it always added it if applicable, which is wrong for the (still highly experimental) discount feature
Published by kddubey over 2 years ago
cappr - v0.8.0 - Llama CPP BPE compatibility
Breaking changes
None
New features
- `cappr.llama_cpp.classify` now takes an `end_of_prompt=" "` kwarg, which makes it compatible with models which use BPE tokenizers. (This change is backward compatible with SentencePiece tokenizers because they always separate the prompt and completion with a whitespace.)
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.7.0 - no more half measures
Breaking changes
- `cappr.huggingface.classify_no_batch` and `cappr.huggingface.classify_no_cache_no_batch` have been deprecated. Instead, set `batch_size_completions=1` in `cappr.huggingface.classify` and `cappr.huggingface.classify_no_cache`, respectively. Apologies for releasing these half measures
New features
- `cappr.huggingface.classify*` can batch over completions to save memory. Set the keyword argument `batch_size_completions`
- `cappr.huggingface.classify` now allows for sub-prompt caching to reduce runtime. Cache shared instructions or exemplars for prompts using the new context manager (see the sketch below). See this functionality in action in the Banking 77 demo
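Here's a minimal sketch of both features together. The context manager's name (`cache`) and argument order are assumptions, and the model name and prompts are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache, predict_proba

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

shared_instructions = "Classify the sentiment of the following product review."

# Cache the shared instructions so they're processed once, not once per prompt
with cache(model_and_tokenizer, shared_instructions) as cached:
    probs = predict_proba(
        prompts=[
            "This product is a total waste of money.",
            "Works great, would buy again.",
        ],
        completions=("positive", "negative"),
        model_and_tokenizer=cached,
        batch_size_completions=1,  # batch over completions to cap memory usage
    )
```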
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.6 - loosen prior and token_logprobs inputs
Breaking changes
None
New features
- The `prior` can now be any sort of sequence, not just a `Sequence` or numpy array. So you can supply something like the following (from a pandas DataFrame; a fuller end-to-end sketch follows after this list):
```python
prior = (
    df_tr["correct_completion"]
    .value_counts(normalize=True)
    [completions]
)
```
- `token_logprobs` functions now allow a str input for `texts`
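For instance, here's a sketch of building such a prior and passing it straight through to prediction. The data, model name, and prompts are made up for illustration:

```python
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

completions = ["positive", "negative"]

# Made-up labeled data, just to build a prior over the completions
df_tr = pd.DataFrame(
    {"correct_completion": ["positive", "positive", "negative", "positive"]}
)
prior = df_tr["correct_completion"].value_counts(normalize=True)[completions]

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

probs = predict_proba(
    prompts=["This product is a total waste of money."],
    completions=completions,
    model_and_tokenizer=model_and_tokenizer,
    prior=prior,  # a pandas Series works; any sequence of probabilities is fine
)
```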
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.5 - never modify model, tokenizer input
Breaking changes
None
New features
None
Bug fixes
- Previously, it was possible for an HF or Llama CPP function call to modify the inputted model/tokenizer if an exception were raised. (Technically, the cause is that my context managers didn't wrap the context in a try-finally block. Now they do.)
Published by kddubey over 2 years ago
cappr - v0.6.4 - HF more caching for no-batch module
Breaking changes
- The default `batch_size` in `cappr.huggingface` is now `2`, not `32`
- The implementation for `cappr.huggingface.classify_no_batch` is now in `cappr.huggingface.classify_no_batch_no_cache`
New features
- `cappr.huggingface.classify_no_batch` now caches the prompt, which makes it much faster. It can also cache shared instructions or exemplars for prompts using the new context manager. See this functionality in action in the Banking 77 demo
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.3 - Llama CPP more caching
Breaking changes
None
New features
- For `cappr.llama_cpp.classify`, cache shared instructions or exemplars for many prompts using the new context manager.
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.2 - all extras install
Breaking changes
None
New features
- Install all extra dependencies to run any model format using: `pip install "cappr[all]"`
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.1 - fix openai.token_logprobs
Breaking changes
- `cappr.openai.token_logprobs` now prepends a space to each text by default. Set `end_of_prompt=""` if you don't want that
New features
None
Bug fixes
- `cappr.openai`'s (still highly experimental) discount feature works for a wider range of `completions`
Published by kddubey over 2 years ago
cappr - v0.6.0 - HF no-batching module
Breaking changes
None
New features
- To minimize memory usage, use `cappr.huggingface.classify_no_batch`. See this section of the docs. I ended up needing this feature to demo Mistral 7B on a T4 GPU
Bug fixes
- `show_progress_bar=False` now works, my b
Published by kddubey over 2 years ago
cappr - v0.5.1 - allow installation with no dependencies
Breaking changes
None
New features
- See this section of the docs
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.5.0 - support GGUF models using llama-cpp-python
Breaking changes
- `completions` is not allowed to be an empty sequence
New features
- Use GGUF models via the `cappr.llama_cpp.classify` module. Install using `pip install "cappr[llama-cpp]"`. See this section of the docs, and see this demo for an example (a sketch follows below).
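Here's a minimal sketch of the workflow. The GGUF path is a placeholder, and the call signature is assumed to follow the usual prompts/completions/model pattern:

```python
from llama_cpp import Llama
from cappr.llama_cpp.classify import predict_proba

# Placeholder path to a GGUF model you've downloaded.
# As of this release, logits_all=True was required; v0.8.7 later removed that requirement.
model = Llama("./path/to/model.gguf", logits_all=True, verbose=False)

probs = predict_proba(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model=model,
)
```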
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.4.7 - breaking little things
Breaking changes
- `end_of_prompt` is restricted to be a whitespace, `" "`, or empty string, `""`. After much thought and experimentation, I realized that anything else is unnecessarily complicated
- The OpenAI API model `gpt-3.5-turbo-instruct` has been deprecated b/c their API won't allow setting `echo=True, logprobs=1` starting tomorrow
- The keyword argument for the (still highly experimental) discount feature, `log_marginal_probs_completions`, has been renamed to `log_marg_probs_completions`
New features
- You can input your OpenAI API key dynamically via the `api_key` keyword argument
- The User Guide is much better
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.4.6 - support more types of sequence inputs
Breaking changes
None
New features
- Input checks on `prompts` and `completions` are more accurate. You can now input, e.g., a polars or pandas Series of strings
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.4.5 - niceties
Breaking changes
- There are stronger input checks to avoid silent failures. `prompts` cannot be empty. `completions` cannot be empty or a pure string (it has to be a sequence of strings)
New features
- Pass `normalize=False` when you want raw, unnormalized probabilities for, e.g., multi-label classification applications (see the sketch below)
- You can input a single prompt string or `Example` object. You no longer have to wrap it in a list and then unwrap it
- You can disable progress bars using `show_progress_bar=False`
- `cappr.huggingface` type-hints the model as a `PreTrainedModelForCausalLM` for greater clarity
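Here's a minimal sketch tying a few of these together (the model name and prompt are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

# A single prompt string no longer needs to be wrapped in a list
probs = predict_proba(
    "This product is a waste of money, but shipping was fast.",
    completions=("positive", "negative", "fast shipping"),
    model_and_tokenizer=model_and_tokenizer,
    normalize=False,          # raw probabilities, e.g. for multi-label classification
    show_progress_bar=False,  # silence the progress bar
)
```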
Bug fixes
- `cappr.huggingface` doesn't modify the model or tokenizer anymore, sorry bout that
- The jagged/inhomogeneous numpy array warning from earlier numpy versions (when using `_examples` functions) is correctly handled
Published by kddubey over 2 years ago
cappr - v0.4.0 - HF single-token speedup, token_logprobs, discount feature
Breaking changes
None
New features
- `cappr.huggingface` is faster when all of the completions are single tokens. Specifically, we just do inference once on the prompts, and don't repeat data unnecessarily
- `cappr.huggingface` implements `token_logprobs` like `cappr.openai` did (see the sketch below)
- `cappr.huggingface` now supports the (highly experimental) discount feature (mentioned at the bottom of this answer) like `cappr.openai` did
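Here's a minimal sketch of the new `token_logprobs` function, assuming it mirrors the OpenAI version's texts-then-model signature (the model name and texts are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import token_logprobs

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

# One list of token log-probabilities per input text
log_probs = token_logprobs(
    ["The cat sat on the mat.", "Nothing rhymes with orange."],
    model_and_tokenizer,
)
```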
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.3.0 - support Llama and Llama 2
Breaking changes
None
New features
- `cappr.huggingface` now supports Llama and Llama 2 (chat, raw, GPTQd)
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.2.6 - deprecate model string as input to HF functions
Breaking changes
- `cappr.huggingface` functions only allow `model_and_tokenizer` input, not the string `model` input.
New features
None
Bug fixes
- Correct type hint for `predict_proba_examples` functions to reflect that the 2nd dimension is always an array.
Published by kddubey almost 3 years ago
cappr - v0.2.5 - add prior kwarg to HF no-cache functions
Breaking changes
None
New features
None
Bug fixes
- `cappr.huggingface.classify.predict_proba` and `cappr.huggingface.classify.predict` now accept a `prior` kwarg, as was intended (I just forgot to add it in).
Published by kddubey almost 3 years ago
cappr - v0.2.4 - fix token slicing
Breaking changes
None
New features
None
Bug fixes
- For OpenAI models, the completion token probabilities should actually be sliced based on the tokenization of `end_of_prompt + completion`, not just `completion`. Based on a few experiments, this change doesn't impact statistical performance. But it should be fixed ofc.
Published by kddubey almost 3 years ago
cappr - v0.2.3 - allow pre-computed completion log-probs for the discount feature
Breaking changes
None
New features
- Allow for pre-computed completion log-probs for the experimental discount feature. Use the newly surfaced function, `cappr.openai.token_logprobs`, to compute them once and re-use them.
Bug fixes
None
Published by kddubey almost 3 years ago
cappr - v0.2.2 - highly experimental discount feature
Breaking changes
- Deprecate `cappr.utils.classify.agg_log_probs_from_constant_completions`. I doubt anyone was using this. If you were, then use `cappr.utils.classify.agg_log_probs` from now on (it does the exact same thing).
New features
- Highly experimental feature which discounts completions by their marginal probability. See my updated answer here. The plan is to evaluate this method more thoroughly and discuss it in the user guide. For now, feel free to mess with it.
Bug fixes
- Fix type hint for tokenizer: `AutoTokenizer` to `PreTrainedTokenizer`.
Published by kddubey almost 3 years ago
cappr - v0.2.1 - allow prior to be a numpy array
Breaking changes
None
New features
None
Bug fixes
- Allow `prior` to be a numpy array
Published by kddubey almost 3 years ago
cappr - v0.2.0 - add HF no-cache module
Breaking changes
None
New features
- Adds `cappr.huggingface.classify_no_cache`, which appears to be faster for non-batch processing. This may be a bug tho lol. If it is and I fix it, I'm going to hide this module again, which will be a breaking change. Here's its documentation.
Bug fixes
None
Published by kddubey almost 3 years ago
cappr - v0.1.0 - first release
See the documentation
Installation
If you intend on using OpenAI models, sign up for the OpenAI API here, and then set the environment variable OPENAI_API_KEY. For zero-shot classification, OpenAI models are currently far ahead of others. But using them will cost ya 💰!
Install with pip:
```bash
python -m pip install cappr
```
(Optional) Install requirements for HuggingFace models
```bash
python -m pip install cappr[hf]
```
(Optional) Set up to run demos
```bash
python -m pip install cappr[demos]
```
Published by kddubey almost 3 years ago