Recent Releases of https://github.com/numbagg/numbagg
https://github.com/numbagg/numbagg - 0.9.0
0.9.0 implements our own "dynamic" compilation for grouped functions, in lieu of numba's (which currently doesn't work with parallel functions). This allows us to compile a function only for the types of the current function call's arguments, rather than all possible types allowed by the function. This speeds up the JIT compilation of a single function call by ~4x for the grouped functions.
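The idea of compiling only for the argument types actually seen can be sketched in plain Python. This is not numbagg's implementation: `LazyTypedDispatcher` and `build_add` are hypothetical names, and the "compile" step is a stand-in for an expensive JIT call.

```python
from typing import Callable, Dict, Tuple


class LazyTypedDispatcher:
    """Builds one specialised implementation per argument-type signature, on first use."""

    def __init__(self, build: Callable[[Tuple[type, ...]], Callable]):
        self._build = build  # stand-in for an expensive JIT compile step
        self._cache: Dict[Tuple[type, ...], Callable] = {}
        self.compile_count = 0

    def __call__(self, *args):
        signature = tuple(type(a) for a in args)
        impl = self._cache.get(signature)
        if impl is None:
            # Compile only for the types of this call, not for every allowed type.
            impl = self._build(signature)
            self._cache[signature] = impl
            self.compile_count += 1
        return impl(*args)


def build_add(signature):
    # Hypothetical builder: a real library would pass `signature` to a compiler here.
    return lambda x, y: x + y


add = LazyTypedDispatcher(build_add)
add(1, 2)      # first int call triggers one "compilation"
add(3, 4)      # same signature, served from the cache
add(1.0, 2.0)  # new signature, one more "compilation"
```

The trade-off is that the first call with each new type combination pays a compile cost, while types that are never used cost nothing.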
- Python
Published by max-sixty about 1 year ago
https://github.com/numbagg/numbagg - 0.8.2
0.8.2 reduces numerical instability in moving (aka rolling) functions with very short windows, as well as slightly improving their performance.
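The release note does not say where the instability came from. As a general illustration only, one well-known source of error in moving functions is an incrementally updated running total, where adding and subtracting values of very different magnitudes loses precision. The function names below are hypothetical.

```python
import math


def rolling_sum_incremental(values, window):
    """Rolling sum that adds the new value and subtracts the one leaving the window."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total if i >= window - 1 else math.nan)
    return out


def rolling_sum_recomputed(values, window):
    """Rolling sum that recomputes each window from scratch with math.fsum."""
    out = []
    for i in range(len(values)):
        if i < window - 1:
            out.append(math.nan)
        else:
            out.append(math.fsum(values[i - window + 1 : i + 1]))
    return out


data = [1e16, 1.0, 1.0, 1.0]
drifted = rolling_sum_incremental(data, window=2)
exact = rolling_sum_recomputed(data, window=2)
# The last two windows are both [1.0, 1.0]; only the recomputed version sums them to 2.0,
# because the incremental total lost the small values while 1e16 was in the window.
```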
- Python
Published by max-sixty over 1 year ago
https://github.com/numbagg/numbagg - 0.8.1
0.8.1 adds an experimental NUMBAGG_FASTMATH env var option (thanks @frazane) which increases performance in some routines at the cost of minor inaccuracy. Feel free to provide feedback in an issue if you find this helpful (or unhelpful!). There's also a change for numpy 2.0 compatibility (thanks @mathause), and some internal improvements.
- Python
Published by max-sixty almost 2 years ago
https://github.com/numbagg/numbagg - 0.8.0
0.8.0 includes nanmedian, a wrapper of nanquantile with one quantile of 0.5.
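A minimal pure-Python sketch of the relationship, assuming linear interpolation between neighbouring values. `nanquantile_1d` and `nanmedian_1d` are hypothetical stand-ins, not numbagg's functions, and they only handle flat sequences.

```python
import math


def nanquantile_1d(values, q):
    """Quantile of the non-NaN values, using linear interpolation."""
    if not 0.0 <= q <= 1.0:
        # Mirrors the range check described in the 0.7.2 note.
        raise ValueError("quantile must be in [0, 1]")
    valid = sorted(v for v in values if not math.isnan(v))
    if not valid:
        return math.nan
    position = q * (len(valid) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return valid[lower] + (valid[upper] - valid[lower]) * fraction


def nanmedian_1d(values):
    """The median is the quantile at q = 0.5."""
    return nanquantile_1d(values, 0.5)


odd = nanmedian_1d([3.0, math.nan, 1.0, 2.0])        # median of 1, 2, 3
even = nanmedian_1d([4.0, 1.0, math.nan, 2.0, 3.0])  # midpoint of 2 and 3
```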
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - 0.7.2
0.7.2 raises an error if values outside [0, 1] are passed to nanquantile
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - 0.7.1
0.7.1 removes a stray print statement from the code. Thanks to @mathause for raising and fixing the issue.
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - v0.7.0
0.7.0 adds a ddof argument to std & var aggregation & grouping functions. Internally, there are lots of new benchmarks, which are more clearly presented in the Readme, and some initial property tests have been added.
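For readers unfamiliar with ddof ("delta degrees of freedom"), the divisor of the variance is n - ddof. The sketch below is a hypothetical pure-Python stand-in (`nanvar_1d`), not numbagg's code; its default of ddof=1 follows the 0.5.0 note.

```python
import math


def nanvar_1d(values, ddof=1):
    """Variance of the non-NaN values, dividing by (n - ddof)."""
    valid = [v for v in values if not math.isnan(v)]
    n = len(valid)
    if n - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    mean = math.fsum(valid) / n
    return math.fsum((v - mean) ** 2 for v in valid) / (n - ddof)


data = [1.0, 2.0, 3.0, 4.0, math.nan]
sample_var = nanvar_1d(data, ddof=1)      # divides by n - 1
population_var = nanvar_1d(data, ddof=0)  # divides by n, numpy's default convention
```

With ddof=1 the result matches `statistics.variance` on the valid values; with ddof=0 it matches `statistics.pvariance`.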
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - 0.6.8
0.6.8 contains mostly internal changes — the initial benchmarking approach is expanded to all functions and displayed in the new Readme. The same framework is now used to test all functions. We also ensure the functions don't emit warnings when handling expected inputs in our tests.
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - 0.6.7
0.6.7 removes the temporary patch for the int8 issues we experienced previously in grouping functions, replacing it with something more robust. Specifically, when there are a very large number of items in a group and labels has a very small dtype, labels is cast to a higher dtype.
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - 0.6.6
Following closely on the heels of 0.6.5, 0.6.6 works around another rare but serious bug with int8 types. We now coerce all int8 label arrays to int16.
Many thanks to @dcherian for the report.
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - 0.6.5
0.6.5 works around a rare but serious bug — when a labels array with int8 type is used in a group function, numbagg can return an incorrect result. The bug requires the array to be a specific size. The currently implemented solution is a workaround rather than an understanding of the underlying issue. Check out https://github.com/numbagg/numbagg/issues/211 for more details.
- Python
Published by max-sixty about 2 years ago
https://github.com/numbagg/numbagg - 0.6.4
0.6.4 fixes a small bug — the value for the window argument for rolling methods couldn't be equal to the axis length.
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.6.3
Numbagg will now compile with mode="cpu" if it detects that it's being run in a ThreadPoolExecutor. Previously, the default mode="parallel" could cause numba to abort the python program within that context.
Note that running in a multi-process context retains mode="parallel", so the new behavior should only be slower in infrequent cases, such as a local dask multi-threaded executor.
I'm not completely confident this is the globally optimal solution, so this may evolve. https://github.com/numba/numba/issues/9288 has more context.
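The notes don't describe how the detection works. One plausible approach, shown here purely as an assumption, is to inspect the current thread's name, since `concurrent.futures.ThreadPoolExecutor` names its workers with a `ThreadPoolExecutor-` prefix by default. `choose_mode` is a hypothetical helper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def choose_mode():
    """Pick a compilation mode based on the calling thread (assumed heuristic)."""
    name = threading.current_thread().name
    # Default worker names look like "ThreadPoolExecutor-0_0"; a custom
    # thread_name_prefix would defeat this check.
    return "cpu" if name.startswith("ThreadPoolExecutor") else "parallel"


main_thread_mode = choose_mode()
with ThreadPoolExecutor(max_workers=1) as pool:
    worker_mode = pool.submit(choose_mode).result()
```

A name-based check is cheap, but it misses pools created with a custom prefix, which is one reason a heuristic like this may need to evolve.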
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.6.2
0.6.2 allows grouping functions to take a wider range of int types as labels. Thanks to @dcherian for the contribution.
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.6.1
0.6.1:
- Enables parallel mode in most functions. This radically improves performance in multi-core systems on multi-dimensional arrays (see benchmarks for details)
- Allows passing an array of alphas in the moving_exp functions, which lets us decay values by different amounts
- Improves nanquantile's compatibility with various axis values
- Extends benchmarks to different shapes, adds bottleneck as a comparison
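To illustrate what an array of alphas means, here is a hedged pure-Python sketch of an exponentially weighted mean in which each step decays the history by its own alpha. `move_exp_nanmean_1d` is a hypothetical stand-in that follows the usual "adjusted" weighting, not a copy of numbagg's routine.

```python
import math


def move_exp_nanmean_1d(values, alpha):
    """Exponentially weighted mean; alpha may be a scalar or one value per element."""
    if isinstance(alpha, (int, float)):
        alphas = [float(alpha)] * len(values)
    else:
        alphas = list(alpha)
        if len(alphas) != len(values):
            raise ValueError("alpha must be a scalar or match the length of values")
    out = []
    numer = 0.0  # decayed sum of values
    denom = 0.0  # decayed sum of weights
    for v, a in zip(values, alphas):
        decay = 1.0 - a
        numer *= decay
        denom *= decay
        if not math.isnan(v):
            numer += v
            denom += 1.0
        out.append(numer / denom if denom > 0 else math.nan)
    return out


constant = move_exp_nanmean_1d([1.0, 2.0, 3.0], 0.5)
# An alpha of 1.0 at the second step discards all earlier history at that point.
varying = move_exp_nanmean_1d([1.0, 2.0, 3.0], [0.5, 1.0, 0.5])
```

A per-element alpha is useful when observations are unevenly spaced, so that longer gaps decay the history more.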
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.6.0
- Add ffill & bfill, at ~2.7x pandas' performance
- Add standard moving window functions: move_corr, move_cov, move_std, move_sum, move_var, in addition to the existing move_mean. These have 3.5-20x pandas' performance.
- New benchmarks using pytest-benchmark. This includes a script which makes a nice output which we've added to the readme. It currently only covers the moving and moving_exp functions.
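As a reminder of what forward and backward fill do, here is a small pure-Python sketch on flat sequences. `ffill_1d` and `bfill_1d` are hypothetical names, and this shows only the semantics, not the compiled implementation.

```python
import math


def ffill_1d(values):
    """Replace each NaN with the most recent non-NaN value before it."""
    out = []
    last = math.nan  # leading NaNs have nothing to fill from
    for v in values:
        if not math.isnan(v):
            last = v
        out.append(last)
    return out


def bfill_1d(values):
    """Backward fill is a forward fill over the reversed sequence."""
    return ffill_1d(values[::-1])[::-1]


data = [math.nan, 1.0, math.nan, math.nan, 2.0, math.nan]
forward = ffill_1d(data)   # the leading NaN stays NaN
backward = bfill_1d(data)  # the trailing NaN stays NaN
```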
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.5.1
- Add a nanquantile function; approximately 4x faster than np.nanquantile when over 2 dimensions. It's slightly slower than np.quantile and pandas' .quantile
- Ensure we don't produce inf values for some exponential moving functions. Numerical values remain unchanged.
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.5.0
- Sets ddof=1 for std & var functions, mirroring the grouped & move_exp functions (but notably different from numpy)
- Adds a move_exp_nancount function, for exponentially weighted moving counts
- Adds nancount as an alias for count
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.4.5
0.4.5 fixes an issue with our new PyPI release workflow. 0.4.1-4 were not published to PyPI
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.4.4
0.4.4 fixes an issue with our new PyPI release workflow. 0.4.1-3 were not published to PyPI
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.4.3
0.4.3 fixes an issue with our new PyPI release workflow. 0.4.1 & 0.4.2 were not published to PyPI
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.4.2
0.4.2 fixes an issue with our new PyPI release workflow. 0.4.1 was not published to PyPI
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.4.1
0.4.1 fixes an issue with move_exp_nanstd not accepting an axis kwarg.
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.4.0
0.4.0 adds some more exponentially weighted functions:
- move_exp_nanstd
- move_exp_nanvar
- move_exp_nancorr
- move_exp_nancov
Because functions can now take more than one array, the signature of the moving exponential functions has changed slightly to require alpha to be a keyword argument. This is technically a breaking change, though most consumers will be passing alpha as a kwarg already (xarray included).
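The shape of that signature change can be sketched as follows. `ew_cov_sketch` is a hypothetical two-array function (a biased exponentially weighted covariance with no NaN handling), not numbagg's move_exp_nancov; the point is the bare `*`, which makes alpha keyword-only.

```python
def _ew_mean(values, alpha):
    """Exponentially weighted running mean (adjusted weights)."""
    out, numer, denom = [], 0.0, 0.0
    for v in values:
        numer = numer * (1.0 - alpha) + v
        denom = denom * (1.0 - alpha) + 1.0
        out.append(numer / denom)
    return out


def ew_cov_sketch(a1, a2, *, alpha):
    """Biased exponentially weighted covariance: E[xy] - E[x]E[y]."""
    if len(a1) != len(a2):
        raise ValueError("a1 and a2 must have the same length")
    mean_xy = _ew_mean([x * y for x, y in zip(a1, a2)], alpha)
    mean_x = _ew_mean(a1, alpha)
    mean_y = _ew_mean(a2, alpha)
    return [xy - x * y for xy, x, y in zip(mean_xy, mean_x, mean_y)]


result = ew_cov_sketch([1.0, 2.0], [1.0, 2.0], alpha=0.5)
# ew_cov_sketch([1.0, 2.0], [1.0, 2.0], 0.5) would raise TypeError:
# with two array arguments, a third positional value is ambiguous, so it is rejected.
```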
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.3.1
This release adds a min_weight parameter to the exponential moving functions, so that it's possible to output values only if there's a sufficient number of recent valid values, similar to the min_count parameter of the simple moving functions.
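One plausible reading of min_weight, offered as an assumption rather than the library's definition: track a decayed weight that gains alpha for every valid observation (so it approaches 1 for a long run of valid data), and emit NaN while that weight is below the threshold. `move_exp_nanmean_min_weight` is a hypothetical name.

```python
import math


def move_exp_nanmean_min_weight(values, alpha, min_weight=0.0):
    """Exponentially weighted mean that is masked until enough recent data is seen."""
    out = []
    numer = denom = weight = 0.0
    decay = 1.0 - alpha
    for v in values:
        numer *= decay
        denom *= decay
        weight *= decay  # older observations count for less
        if not math.isnan(v):
            numer += v
            denom += 1.0
            weight += alpha  # assumed normalisation: a full history tends to 1.0
        if denom > 0 and weight >= min_weight:
            out.append(numer / denom)
        else:
            out.append(math.nan)
    return out


data = [math.nan, 1.0, 2.0, math.nan]
unmasked = move_exp_nanmean_min_weight(data, alpha=0.5)
masked = move_exp_nanmean_min_weight(data, alpha=0.5, min_weight=0.6)
# With min_weight=0.6, only the third position has enough recent valid weight (0.75).
```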
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - 0.3.0
After a lengthy hiatus of development on numbagg, we're back with a big release:
Lots of new grouping functions, in an attempt to be an engine for flox, a library from @dcherian & others. The functions include:
- group_nancount
- group_nanargmax, group_nanargmin
- group_nanfirst, group_nanlast
- group_nansum_of_squares
- group_nanprod
- group_nanall, group_nanany
- group_nanvar
- group_nanstd
- group_nanmax, group_nanmin
Lots of performance improvements to existing grouping functions
- Initial benchmarking shows 2-5x the performance over pandas' equivalent functions (though mostly towards the lower end, and the benchmarks are not as robust as I'd like; feedback and verifications welcome).
Large test coverage expansion of grouping functions
Improvements to the exponentially weighted moving functions:
- A new move_exp_nanvar function
- Code simplification and modest performance improvements to existing functions
- Benchmarks show 1-5x the performance of pandas' equivalent functions.
A modest performance gain to existing moving functions.
Internally, we've removed some of the original hacks that were initially required. Thanks to numba for supporting many of these natively!
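For readers new to the grouping functions listed above, the general pattern is a single pass over the values, accumulating into one slot per integer label. The sketch below is a hypothetical pure-Python stand-in (`group_nanmean_1d`); treating negative labels as missing and the `num_labels` parameter are assumptions made for illustration.

```python
import math


def group_nanmean_1d(values, labels, num_labels=None):
    """Mean of the non-NaN values for each integer label 0..num_labels-1."""
    if num_labels is None:
        num_labels = max(labels) + 1
    sums = [0.0] * num_labels
    counts = [0] * num_labels
    for v, label in zip(values, labels):
        if label < 0 or math.isnan(v):
            continue  # assumed convention: negative labels mean "no group"
        sums[label] += v
        counts[label] += 1
    # Groups with no valid values get NaN rather than raising.
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


means = group_nanmean_1d(
    values=[1.0, 2.0, math.nan, 4.0, 5.0],
    labels=[0, 1, 0, 1, 2],
)
```

Because the loop touches each element once and needs no sorting, this access pattern lends itself to compiled code, which is where the speedups over a generic groupby would come from.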
The documentation needs a pass: the Readme could be reorganized, and the benchmarks could be more systematically measured and reported. It's possible that these large changes have introduced small bugs, particularly around edge cases such as unfamiliar dtypes. That said, the main use cases are quite well-tested, and we have pandas & numpy to thank for excellent comparisons to test against.
Please report any issues or questions. I (@max-sixty) am excited numbagg is back, and will gauge how much to add based on the extent to which folks find it useful. And ofc thanks to @shoyer for writing the original library!
- Python
Published by max-sixty over 2 years ago
https://github.com/numbagg/numbagg - v0.2.2
Fixes embedded version number
- Python
Published by max-sixty about 3 years ago