Recent Releases of cappr
cappr - v0.9.6 - fix Llama 3 tokenizer
Breaking changes
None
New features
None
Bug fixes
- `cappr.huggingface` is compatible with Llama 3/3.1's tokenizer. It works around this issue using code from this PR (with small modifications). See the updated list of supported architectures here.
Published by kddubey over 1 year ago
cappr - v0.9.5 - address deprecation of HF KV tuple
Breaking changes
None
New features
- `cappr.huggingface.classify` internally passes in a `DynamicCache` object if possible. This change gets rid of a warning you might see when running previous versions of CAPPr:
```
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
```
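For reference, here's a rough sketch of the idea behind the change, written directly against the `transformers` API (the model name and texts are placeholders; this is not CAPPr's internal code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_name = "gpt2"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt_ids = tokenizer("A shared prompt prefix", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids=prompt_ids, use_cache=True)

# Pass a Cache object on the next forward call instead of the deprecated tuple
cache = out.past_key_values
if isinstance(cache, tuple):
    cache = DynamicCache.from_legacy_cache(cache)

completion_ids = tokenizer(" and a completion", return_tensors="pt").input_ids
with torch.no_grad():
    _ = model(input_ids=completion_ids, past_key_values=cache, use_cache=True)
```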
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.4 - don't repeat KV if possible
Breaking changes
None
New features
- `cappr.huggingface.classify` doesn't copy data if `batch_size=1`. Instead, it repeats a view of the data. This change saves memory for tasks where there are many completions. For example, in the Banking 77 demo, peak VRAM usage decreases by ~5 GB.
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.3 - log-probs is an array when possible
Breaking changes
None
New features
- The `agg_log_probs` function can return a numpy array instead of a list if there's a constant number of completions.
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.2 - arbitrary token log-prob aggregation
Breaking changes
None
New features
- The `agg_log_probs` function applies `func` instead of `np.exp ∘ func`. So if you want average token log-probabilities, set `func=np.mean` (see the sketch below).
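For example, here's a minimal sketch of averaging token log-probabilities. The nested-list structure (prompts × completions × tokens) and the numbers are made up for illustration:

```python
import numpy as np
from cappr.utils.classify import agg_log_probs

# Token log-probs for 2 prompts x 2 completions (made-up numbers)
log_probs = [
    [[-1.0, -0.5], [-2.0, -0.1, -0.3]],
    [[-0.2, -0.4], [-1.5, -0.8, -0.9]],
]

# Average token log-probability for each (prompt, completion) pair.
# With a constant number of completions per prompt, this comes back as a
# numpy array as of v0.9.3.
avg_log_probs = agg_log_probs(log_probs, func=np.mean)
```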
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.1 - no setup.py
Breaking changes
- There's no `setup.py` file, in case you were relying on that.
New features
None
Bug fixes
None
Published by kddubey over 1 year ago
cappr - v0.9.0 - don't require openai, tiktoken
Breaking changes
- `pip install cappr` will no longer install `openai`, `tiktoken`. Install them yourself, or install them using `pip install "cappr[openai]"`. For previous versions of `cappr`, if you needed to install `cappr` without these dependencies, you had to run:
```bash
python -m pip install \
  "numpy>=1.21.0" \
  "tqdm>=4.27.0" && \
python -m pip install --no-deps cappr
```
- `cappr.openai.api.Model` no longer includes the deprecated `text-*` models
New features
None
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.8 - default axis for posterior prob
Breaking changes
None
New features
- The axis of `posterior_prob` defaults to the last one
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.7 - Llama CPP no need for logits_all=True
Breaking changes
- None
New features
- You no longer need to instantiate your Llama CPP model with `logits_all=True` (see the sketch below)
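Here's a minimal sketch of what instantiation looks like now. The GGUF path is a placeholder, and the call signature is assumed to follow the usual prompts/completions/model pattern:

```python
from llama_cpp import Llama
from cappr.llama_cpp.classify import predict

# No more logits_all=True needed when creating the model
model = Llama("./path/to/model.gguf", verbose=False)

pred = predict(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model=model,
)
```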
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.6 - support LongLLaMA
Breaking changes
- Setting the internal `past` attribute of the cache to `None` will now cause an error to be raised if you try to use it again. Please use the original model instead
New features
- Support LongLLaMA
- `repr` for cached model
- Don't check logits from Llama CPP
Bug fixes
None
Published by kddubey about 2 years ago
cappr - v0.8.5 - slightly better docstrings
Breaking changes
None
New features
- Slightly better docstrings
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.8.4 - cache_model
Breaking changes
None
New features
- For online applications, you can cache the model once and use it later. See `cappr.huggingface.classify.cache_model` and `cappr.llama_cpp.classify.cache_model`. See the demonstration in the Craigslist Bargains demo (and the sketch below)
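Here's a minimal sketch of the idea, assuming `cache_model` takes the `(model, tokenizer)` pair plus the shared text to cache and returns a cached pair (the model name and prompts are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache_model, predict

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

# Cache the shared instructions once, e.g. at server startup...
instructions = "Classify the sentiment of the following product review."
cached_model_and_tokenizer = cache_model(model_and_tokenizer, instructions)

# ...then reuse the cached model for every request that arrives later
pred = predict(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model_and_tokenizer=cached_model_and_tokenizer,
)
```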
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.8.3 - fixes for openai>=1.2, openai<1.0
Breaking changes
None
New features
None
Bug fixes
- Now works for `openai>=1.2.0`, plus a backwards compatibility fix for `openai<1.0.0`
Published by kddubey over 2 years ago
cappr - v0.8.2 - fix for older numpy versions
Breaking changes
None
New features
None
Bug fixes
- For older versions of numpy, `_examples` functions now work
Published by kddubey over 2 years ago
cappr - v0.8.1 - OpenAI v1.0.0 compatibility
Breaking changes
- `end_of_prompt` must be a whitespace or empty string. (This was intended since v0.4.7, but I forgot to add a check for it)
New features
- `cappr.openai` is now compatible with OpenAI v1.0.0 (and is backwards compatible with previous versions). You can input a `client` object (see the sketch below)
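Here's a minimal sketch of what that might look like; the `client` keyword name and the model string are assumptions for illustration:

```python
from openai import OpenAI
from cappr.openai.classify import predict

# Build the client yourself, or rely on the OPENAI_API_KEY environment variable
client = OpenAI(api_key="sk-...")

pred = predict(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model="a-completions-model",  # placeholder; must support echo=True, logprobs
    client=client,
)
```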
Bug fixes
- `cappr.huggingface.classify.token_logprobs` now has an option to add a BOS token or not. Previously, it always added it if applicable, which is wrong for the (still highly experimental) discount feature
Published by kddubey over 2 years ago
cappr - v0.8.0 - Llama CPP BPE compatibility
Breaking changes
None
New features
- `cappr.llama_cpp.classify` now takes an `end_of_prompt=" "` kwarg, which makes it compatible with models which use BPE tokenizers. (This change is backward compatible with SentencePiece tokenizers because they always separate the prompt and completion with a whitespace.)
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.7.0 - no more half measures
Breaking changes
- `cappr.huggingface.classify_no_batch` and `cappr.huggingface.classify_no_cache_no_batch` have been deprecated. Instead, set `batch_size_completions=1` in `cappr.huggingface.classify` and `cappr.huggingface.classify_no_cache`, respectively. Apologies for releasing these half measures
New features
- `cappr.huggingface.classify*` can batch over completions to save memory. Set the keyword argument `batch_size_completions`
- `cappr.huggingface.classify` now allows for sub-prompt caching to reduce runtime. Cache shared instructions or exemplars for prompts using the new context manager (see the sketch below). See this functionality in action in the Banking 77 demo
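Here's a minimal sketch of both features together. The context manager's name (`cache`) and argument order are assumptions, and the model name and prompts are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache, predict_proba

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

shared_instructions = "Classify the sentiment of the following product review."

# Cache the shared instructions so they're processed once, not once per prompt
with cache(model_and_tokenizer, shared_instructions) as cached:
    probs = predict_proba(
        prompts=[
            "This product is a total waste of money.",
            "Works great, would buy again.",
        ],
        completions=("positive", "negative"),
        model_and_tokenizer=cached,
        batch_size_completions=1,  # batch over completions to cap memory usage
    )
```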
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.6 - loosen prior and token_logprobs inputs
Breaking changes
None
New features
- The `prior` can now be any sort of sequence, not just a `Sequence` or numpy array. So you can supply something like the following (from a pandas DataFrame; a fuller end-to-end sketch follows after this list):
```python
prior = (
    df_tr["correct_completion"]
    .value_counts(normalize=True)
    [completions]
)
```
- `token_logprobs` functions now allow a str input for `texts`
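For instance, here's a sketch of building such a prior and passing it straight through to prediction. The data, model name, and prompts are made up for illustration:

```python
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

completions = ["positive", "negative"]

# Made-up labeled data, just to build a prior over the completions
df_tr = pd.DataFrame(
    {"correct_completion": ["positive", "positive", "negative", "positive"]}
)
prior = df_tr["correct_completion"].value_counts(normalize=True)[completions]

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

probs = predict_proba(
    prompts=["This product is a total waste of money."],
    completions=completions,
    model_and_tokenizer=model_and_tokenizer,
    prior=prior,  # a pandas Series works; any sequence of probabilities is fine
)
```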
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.5 - never modify model, tokenizer input
Breaking changes
None
New features
None
Bug fixes
- Previously, it was possible for an HF or Llama CPP function call to modify the inputted model/tokenizer if an exception were raised. (Technically, the cause is that my context managers didn't wrap the context in a try-finally block. Now they do.)
Published by kddubey over 2 years ago
cappr - v0.6.4 - HF more caching for no-batch module
Breaking changes
- The default `batch_size` in `cappr.huggingface` is now `2`, not `32`
- The implementation for `cappr.huggingface.classify_no_batch` is now in `cappr.huggingface.classify_no_batch_no_cache`
New features
- `cappr.huggingface.classify_no_batch` now caches the prompt, which makes it much faster. It can also cache shared instructions or exemplars for prompts using the new context manager. See this functionality in action in the Banking 77 demo
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.3 - Llama CPP more caching
Breaking changes
None
New features
- For `cappr.llama_cpp.classify`, cache shared instructions or exemplars for many prompts using the new context manager.
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.2 - all extras install
Breaking changes
None
New features
- Install all extra dependencies to run any model format using: `pip install "cappr[all]"`
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.6.1 - fix openai.token_logprobs
Breaking changes
- `cappr.openai.token_logprobs` now prepends a space to each text by default. Set `end_of_prompt=""` if you don't want that
New features
None
Bug fixes
- `cappr.openai`'s (still highly experimental) discount feature works for a wider range of `completions`
Published by kddubey over 2 years ago
cappr - v0.6.0 - HF no-batching module
Breaking changes
None
New features
- To minimize memory usage, use `cappr.huggingface.classify_no_batch`. See this section of the docs. I ended up needing this feature to demo Mistral 7B on a T4 GPU
Bug fixes
- `show_progress_bar=False` now works, my b
Published by kddubey over 2 years ago
cappr - v0.5.1 - allow installation with no dependencies
Breaking changes
None
New features
- See this section of the docs
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.5.0 - support GGUF models using llama-cpp-python
Breaking changes
- `completions` is not allowed to be an empty sequence
New features
- Use GGUF models via the `cappr.llama_cpp.classify` module. Install using `pip install "cappr[llama-cpp]"`. See this section of the docs, and see this demo for an example (a sketch follows below).
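Here's a minimal sketch of the workflow. The GGUF path is a placeholder, and the call signature is assumed to follow the usual prompts/completions/model pattern:

```python
from llama_cpp import Llama
from cappr.llama_cpp.classify import predict_proba

# Placeholder path to a GGUF model you've downloaded.
# As of this release, logits_all=True was required; v0.8.7 later removed that requirement.
model = Llama("./path/to/model.gguf", logits_all=True, verbose=False)

probs = predict_proba(
    prompts=["This product is a total waste of money."],
    completions=("positive", "negative"),
    model=model,
)
```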
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.4.7 - breaking little things
Breaking changes
- `end_of_prompt` is restricted to be a whitespace, `" "`, or empty string, `""`. After much thought and experimentation, I realized that anything else is unnecessarily complicated
- The OpenAI API model `gpt-3.5-turbo-instruct` has been deprecated b/c their API won't allow setting `echo=True, logprobs=1` starting tomorrow
- The keyword argument for the (still highly experimental) discount feature, `log_marginal_probs_completions`, has been renamed to `log_marg_probs_completions`
New features
- You can input your OpenAI API key dynamically via the `api_key` keyword argument
- The User Guide is much better
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.4.6 - support more types of sequence inputs
Breaking changes
None
New features
- Input checks on `prompts` and `completions` are more accurate. You can now input, e.g., a polars or pandas Series of strings
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.4.5 - niceties
Breaking changes
- There are stronger input checks to avoid silent failures. `prompts` cannot be empty. `completions` cannot be empty or a pure string (it has to be a sequence of strings)
New features
- Pass `normalize=False` when you want raw, unnormalized probabilities for, e.g., multi-label classification applications (see the sketch below)
- You can input a single prompt string or `Example` object. You no longer have to wrap it in a list and then unwrap it
- You can disable progress bars using `show_progress_bar=False`
- `cappr.huggingface` type-hints the model as a `PreTrainedModelForCausalLM` for greater clarity
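Here's a minimal sketch tying a few of these together (the model name and prompt are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

# A single prompt string no longer needs to be wrapped in a list
probs = predict_proba(
    "This product is a waste of money, but shipping was fast.",
    completions=("positive", "negative", "fast shipping"),
    model_and_tokenizer=model_and_tokenizer,
    normalize=False,          # raw probabilities, e.g. for multi-label classification
    show_progress_bar=False,  # silence the progress bar
)
```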
Bug fixes
- `cappr.huggingface` doesn't modify the model or tokenizer anymore, sorry bout that
- The jagged/inhomogeneous numpy array warning from earlier numpy versions (when using `_examples` functions) is correctly handled
Published by kddubey over 2 years ago
cappr - v0.4.0 - HF single-token speedup, token_logprobs, discount feature
Breaking changes
None
New features
- `cappr.huggingface` is faster when all of the completions are single tokens. Specifically, we just do inference once on the prompts, and don't repeat data unnecessarily
- `cappr.huggingface` implements `token_logprobs` like `cappr.openai` did (see the sketch below)
- `cappr.huggingface` now supports the (highly experimental) discount feature (mentioned at the bottom of this answer) like `cappr.openai` did
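Here's a minimal sketch of the new `token_logprobs` function, assuming it mirrors the OpenAI version's texts-then-model signature (the model name and texts are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import token_logprobs

model_name = "gpt2"  # placeholder model for illustration
model_and_tokenizer = (
    AutoModelForCausalLM.from_pretrained(model_name),
    AutoTokenizer.from_pretrained(model_name),
)

# One list of token log-probabilities per input text
log_probs = token_logprobs(
    ["The cat sat on the mat.", "Nothing rhymes with orange."],
    model_and_tokenizer,
)
```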
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.3.0 - support Llama and Llama 2
Breaking changes
None
New features
- `cappr.huggingface` now supports Llama and Llama 2 (chat, raw, GPTQd)
Bug fixes
None
Published by kddubey over 2 years ago
cappr - v0.2.6 - deprecate model string as input to HF functions
Breaking changes
- `cappr.huggingface` functions only allow `model_and_tokenizer` input, not the string `model` input.
New features
None
Bug fixes
- Correct type hint for `predict_proba_examples` functions to reflect that the 2nd dimension is always an array.
Published by kddubey almost 3 years ago
cappr - v0.2.5 - add prior kwarg to HF no-cache functions
Breaking changes
None
New features
None
Bug fixes
- `cappr.huggingface.classify.predict_proba` and `cappr.huggingface.classify.predict` now accept a `prior` kwarg, as was intended (I just forgot to add it in).
Published by kddubey almost 3 years ago
cappr - v0.2.4 - fix token slicing
Breaking changes
None
New features
None
Bug fixes
- For OpenAI models, the completion token probabilities should actually be sliced based on the tokenization of `end_of_prompt + completion`, not just `completion`. Based on a few experiments, this change doesn't impact statistical performance. But it should be fixed ofc.
Published by kddubey almost 3 years ago
cappr - v0.2.3 - allow pre-computed completion log-probs for the discount feature
Breaking changes
None
New features
- Allow for pre-computed completion log-probs for the experimental discount feature. Use the newly surfaced function, `cappr.openai.token_logprobs`, to compute them once and re-use them.
Bug fixes
None
Published by kddubey almost 3 years ago
cappr - v0.2.2 - highly experimental discount feature
Breaking changes
- Deprecate `cappr.utils.classify.agg_log_probs_from_constant_completions`. I doubt anyone was using this. If you were, then use `cappr.utils.classify.agg_log_probs` from now on (it does the exact same thing).
New features
- Highly experimental feature which discounts completions by their marginal probability. See my updated answer here. The plan is to evaluate this method more thoroughly and discuss it in the user guide. For now, feel free to mess with it.
Bug fixes
- Fix type hint for tokenizer: `AutoTokenizer` to `PreTrainedTokenizer`.
Published by kddubey almost 3 years ago
cappr - v0.2.1 - allow prior to be a numpy array
Breaking changes
None
New features
None
Bug fixes
- Allow `prior` to be a numpy array
Published by kddubey almost 3 years ago
cappr - v0.2.0 - add HF no-cache module
Breaking changes
None
New features
- Adds `cappr.huggingface.classify_no_cache`, which appears to be faster for non-batch processing. This may be a bug tho lol. If it is and I fix it, I'm going to hide this module again, which will be a breaking change. Here's its documentation.
Bug fixes
None
Published by kddubey almost 3 years ago
cappr - v0.1.0 - first release
See the documentation
Installation
If you intend on using OpenAI models, sign up for the OpenAI API here, and then set the environment variable OPENAI_API_KEY. For zero-shot classification, OpenAI models are currently far ahead of others. But using them will cost ya 💰!
Install with pip:
```bash
python -m pip install cappr
```
(Optional) Install requirements for HuggingFace models
```bash
python -m pip install cappr[hf]
```
(Optional) Set up to run demos
```bash
python -m pip install cappr[demos]
```
Published by kddubey almost 3 years ago