Recent Releases of datasketch
datasketch - v1.6.5
What's Changed
- Retrieve MinHash from LSHForest by @123epsilon in https://github.com/ekzhu/datasketch/pull/234
- Merging (Identically Specified) MinHashLSH objects by @rupeshkumaar in https://github.com/ekzhu/datasketch/pull/232
- Update bBitMinHash Benchmark by @123epsilon in https://github.com/ekzhu/datasketch/pull/238
New Contributors
- @123epsilon made their first contribution in https://github.com/ekzhu/datasketch/pull/234
- @rupeshkumaar made their first contribution in https://github.com/ekzhu/datasketch/pull/232
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.6.4...v1.6.5
- Python
Published by ekzhu over 1 year ago
datasketch - v1.6.4
What's Changed
- HNSW bug fixes by @ekzhu in https://github.com/ekzhu/datasketch/pull/230
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.6.3...v1.6.4
- Python
Published by ekzhu over 2 years ago
datasketch - v1.6.3
What's Changed
- Update docs by @ekzhu in https://github.com/ekzhu/datasketch/pull/224
- HNSW remove() point in-place. by @ekzhu in https://github.com/ekzhu/datasketch/pull/225
- Benchmark HNSW for Jaccard by @ekzhu in https://github.com/ekzhu/datasketch/pull/226
- HNSW support for soft-remove and hard-remove. by @ekzhu in https://github.com/ekzhu/datasketch/pull/227
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.6.2...v1.6.3
- Python
Published by ekzhu over 2 years ago
datasketch - v1.6.2
What's Changed
- HNSW as MutableMap by @ekzhu in https://github.com/ekzhu/datasketch/pull/223
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.6.1...v1.6.2
- Python
Published by ekzhu over 2 years ago
datasketch - v1.6.1
What's Changed
- simplify reshapes by @chris-ha458 in https://github.com/ekzhu/datasketch/pull/217
- HNSW Update Point by @ekzhu in https://github.com/ekzhu/datasketch/pull/220
- HNSW Dict interface by @ekzhu in https://github.com/ekzhu/datasketch/pull/221
- HNSW Doc by @ekzhu in https://github.com/ekzhu/datasketch/pull/222
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.6.0...v1.6.1
- Python
Published by ekzhu over 2 years ago
datasketch - v1.6.0
What's Changed
- Update MinHashLSH.query docstring detailing proximal nature of results by @micimize in https://github.com/ekzhu/datasketch/pull/199
- Fix doc with new template. by @ekzhu in https://github.com/ekzhu/datasketch/pull/202
- Update lsh.rst by @ekzhu in https://github.com/ekzhu/datasketch/pull/208
- Benchmark ANN index for Jaccard by @ekzhu in https://github.com/ekzhu/datasketch/pull/210
- Update hashfunc.py by @chris-ha458 in https://github.com/ekzhu/datasketch/pull/211
- HNSW Index by @ekzhu in https://github.com/ekzhu/datasketch/pull/218
New Contributors
- @micimize made their first contribution in https://github.com/ekzhu/datasketch/pull/199
- @chris-ha458 made their first contribution in https://github.com/ekzhu/datasketch/pull/211
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.5.9...v1.6.0
- Python
Published by ekzhu over 2 years ago
datasketch - v1.5.9
What's Changed
- Create python-publish.yml by @ekzhu in https://github.com/ekzhu/datasketch/pull/191
- Support numpy>=1.20.0 by @joehalliwell in https://github.com/ekzhu/datasketch/pull/192
- Add note to documentation to address #195 by @ekzhu in https://github.com/ekzhu/datasketch/pull/197
New Contributors
- @joehalliwell made their first contribution in https://github.com/ekzhu/datasketch/pull/192
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.5.8...v1.5.9
- Python
Published by ekzhu about 3 years ago
datasketch - v1.5.8
What's Changed
- Add GitHub URL for PyPi by @andriyor in https://github.com/ekzhu/datasketch/pull/179
- Support asyncio redis by @long2ice in https://github.com/ekzhu/datasketch/pull/185
- Fix name construction for all values of b by @SenadI in https://github.com/ekzhu/datasketch/pull/190
New Contributors
- @andriyor made their first contribution in https://github.com/ekzhu/datasketch/pull/179
- @long2ice made their first contribution in https://github.com/ekzhu/datasketch/pull/185
- @SenadI made their first contribution in https://github.com/ekzhu/datasketch/pull/190
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.5.7...v1.5.8
- Python
Published by ekzhu over 3 years ago
datasketch - v1.5.7
What's Changed
- Unable to create multiple lsh indices each one in its own keyspace - issue #171 by @ronassa in https://github.com/ekzhu/datasketch/pull/172
New Contributors
- @ronassa made their first contribution in https://github.com/ekzhu/datasketch/pull/172
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.5.6...v1.5.7
- Python
Published by ekzhu about 4 years ago
datasketch - Fixed broken packaging script for datasketch/experimental/aio
Fixed broken packaging setup.py that missed experimental/aio.
- Python
Published by ekzhu about 4 years ago
datasketch - v1.5.5
What's Changed
- Adding minhash_many to WeightedMinHashGenerator. by @jroose-jv in https://github.com/ekzhu/datasketch/pull/165
- Add query buffer by @hguhlich in https://github.com/ekzhu/datasketch/pull/167
New Contributors
- @jroose-jv made their first contribution in https://github.com/ekzhu/datasketch/pull/165
- @hguhlich made their first contribution in https://github.com/ekzhu/datasketch/pull/167
Full Changelog: https://github.com/ekzhu/datasketch/compare/v1.5.4...v1.5.5
- Python
Published by ekzhu about 4 years ago
datasketch - v1.5.4
What's Changed
- Fixes #146; MinhashLSH creates mongo index key. by @oisincar in https://github.com/ekzhu/datasketch/pull/148
- Add `redis_buffer` configuration. by @QthCN in https://github.com/ekzhu/datasketch/pull/152
- minhash: Get rid of deprecation warning by @xkubov in https://github.com/ekzhu/datasketch/pull/156
New Contributors
- @oisincar made their first contribution in https://github.com/ekzhu/datasketch/pull/148
- @QthCN made their first contribution in https://github.com/ekzhu/datasketch/pull/152
- @xkubov made their first contribution in https://github.com/ekzhu/datasketch/pull/156
Full Changelog: https://github.com/ekzhu/datasketch/compare/1.5.2...v1.5.4
- Python
Published by ekzhu about 4 years ago
datasketch - Improved performance for MinHash and MinHashLSH
- Performance improvement for MinHash's update method.
- Make MinHash updates 4.5X faster by using `update_batch` method for bulk update on MinHash. [See API doc](http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
- Further performance gain by using bulk generation of MinHash using `MinHash.bulk` or `MinHash.generator`. See API doc and pull request.
- Optional compression for MinHash LSH index by hashing the bucket key produced by `MinHashLSH._H`. See pull request. This leads to saving of memory/storage space used by the index.
Thank you @Sinusoidal36!
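To illustrate why a batch update can replace a loop of single updates, here is a minimal pure-Python sketch. `ToyMinHash`, `hash32`, and the permutation constants are hypothetical stand-ins and not datasketch's implementation; the library's speedup comes from vectorized array operations, which this toy does not reproduce. It only shows that both paths produce the same signature.

```python
import hashlib
import random

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the toy permutations
MAX_HASH = (1 << 32) - 1        # keep hash values within 32 bits


def hash32(data: bytes) -> int:
    """Hash bytes to a 32-bit integer (first 4 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "little")


class ToyMinHash:
    """Illustrative MinHash; a stand-in, not datasketch's implementation."""

    def __init__(self, num_perm: int = 16, seed: int = 1) -> None:
        rng = random.Random(seed)
        # One (a, b) pair per permutation: h -> ((a * h + b) mod p) & MAX_HASH
        self.perms = [
            (rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
            for _ in range(num_perm)
        ]
        self.hashvalues = [MAX_HASH] * num_perm

    @staticmethod
    def _permute(a: int, b: int, hv: int) -> int:
        return ((a * hv + b) % MERSENNE_PRIME) & MAX_HASH

    def update(self, item: bytes) -> None:
        """Fold a single item into the signature."""
        hv = hash32(item)
        self.hashvalues = [
            min(cur, self._permute(a, b, hv))
            for cur, (a, b) in zip(self.hashvalues, self.perms)
        ]

    def update_batch(self, items) -> None:
        """Hash all items first, then take the per-permutation minimum
        over the whole batch in one pass."""
        hvs = [hash32(item) for item in items]
        if not hvs:
            return
        self.hashvalues = [
            min(cur, min(self._permute(a, b, hv) for hv in hvs))
            for cur, (a, b) in zip(self.hashvalues, self.perms)
        ]


words = [b"minhash", b"lsh", b"sketch", b"jaccard"]

one_by_one = ToyMinHash()
for w in words:
    one_by_one.update(w)

batched = ToyMinHash()
batched.update_batch(words)

# Taking minimums is order-independent, so both signatures match.
same_signature = one_by_one.hashvalues == batched.hashvalues
```

Because the minimum is associative and commutative, batching changes only how the work is scheduled, not the resulting sketch.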
- Python
Published by ekzhu about 5 years ago
datasketch - Add Cassandra storage layer.
- Minor bug fixes
- Cassandra storage layer, thanks to @ostefano! Now you can specify the Cassandra config just like the Redis one.
```python
from datasketch import MinHashLSH

lsh = MinHashLSH(
    threshold=0.5,
    num_perm=128,
    storage_config={
        'type': 'cassandra',
        'cassandra': {
            'seeds': ['127.0.0.1'],
            'keyspace': 'lshtest',
            'replication': {
                'class': 'SimpleStrategy',
                'replication_factor': '1',
            },
            'drop_keyspace': False,
            'drop_tables': False,
        },
    },
)
```
- Python
Published by ekzhu about 6 years ago
datasketch - hashfunc to replace hashobj
MinHash and HyperLogLog now support the hashfunc parameter. The old hashobj parameter is removed.
```python
from datasketch import MinHash

# Let's use MurmurHash3.
import mmh3

# We need to define a new hash function that outputs an integer that
# can be encoded in 32 bits.
def hashfunc(d):
    return mmh3.hash32(d)

# Use this function in MinHash constructor.
m = MinHash(hashfunc=hashfunc)
```
- Python
Published by ekzhu about 7 years ago
datasketch - Better LSH Ensemble
Use dynamic programming to create an optimal partition, allowing the LSH Ensemble index to adapt to any set size distribution.
- Python
Published by ekzhu about 7 years ago
datasketch - Batch removal of keys from Async MinHashLSH index
- Adding batch removal functionality for Async MinHashLSH
- Because Redis does not support async operation, removed Redis support from Async MinHashLSH
For details see Pull #70. Thanks @aastafiev for the contribution.
- Python
Published by ekzhu over 7 years ago
datasketch - MongoDB replicas
Add support for MongoDB replica set
- Python
Published by ekzhu over 7 years ago
datasketch - Asynchronous MinHash LSH module and storage base name
- Added Asynchronous MinHash LSH module. Thanks @aastafiev!
- Added the ability to set the base name in the storage config. The base name is used as the prefix for generating keys in the underlying storage (e.g., Redis). This change allows a client to "reconnect" to an existing LSH index in the storage through its base name.
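The idea of a base name acting as a key prefix can be sketched in plain Python. `PrefixedStore`, its `:` separator, and the index names are hypothetical and chosen for illustration; datasketch's actual key layout may differ. A dict stands in for the external storage.

```python
class PrefixedStore:
    """Toy key-value view that namespaces keys with a base name.

    Hypothetical helper for illustration only.
    """

    def __init__(self, backend: dict, basename: bytes) -> None:
        self.backend = backend    # stands in for Redis or another store
        self.basename = basename  # prefix shared by all keys of one index

    def _full_key(self, key: bytes) -> bytes:
        return self.basename + b":" + key

    def put(self, key: bytes, value) -> None:
        self.backend[self._full_key(key)] = value

    def get(self, key: bytes, default=None):
        return self.backend.get(self._full_key(key), default)


shared_backend = {}

# First client writes under the base name "news_index".
writer = PrefixedStore(shared_backend, basename=b"news_index")
writer.put(b"doc1", "bucket-42")

# A later client "reconnects" by using the same base name.
reader = PrefixedStore(shared_backend, basename=b"news_index")
reconnected_value = reader.get(b"doc1")

# A different base name sees none of the first index's keys.
other = PrefixedStore(shared_backend, basename=b"other_index")
isolated_value = other.get(b"doc1")
```

The same prefixing keeps several indexes apart when they share one storage backend.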
- Python
Published by ekzhu over 7 years ago
datasketch - Fix bug in storage
Fix a bug with UnorderedStorage.get_many (#56)
- Python
Published by ekzhu over 7 years ago
datasketch - Fix bug in LSH Forest for Weighted MinHash
- Fix issue #35
- Test cases for checking consistency of hash value length in LSH.
- Python
Published by ekzhu over 8 years ago
datasketch - Optional redis storage requirement.
Thanks @vmarkovtsev
- Python
Published by ekzhu over 8 years ago
datasketch - Redis storage layer for MinHash LSH
- Introduced a Redis storage layer for MinHash LSH. Thanks to @ae-foster
- Added `__hash__` method for Lean MinHash.
- Python
Published by ekzhu over 8 years ago
datasketch - LSH Ensemble
- Added a slightly simplified version of LSH Ensemble that supports containment search with MinHash data sketches.
- An introduction on containment link.
- Updated documentation
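As a small illustration of why containment search differs from Jaccard search, the sketch below computes both measures exactly on plain Python sets. The function names and example sets are made up for illustration; LSH Ensemble itself works on MinHash sketches rather than on exact sets.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query: set, candidate: set) -> float:
    """Containment of the query in the candidate: |Q ∩ X| / |Q|."""
    return len(query & candidate) / len(query) if query else 0.0


query = {1, 2, 3}
candidate = set(range(1, 101))  # a much larger set that includes the query

# The query is fully contained in the candidate, yet Jaccard stays low
# because the union is dominated by the larger set.
c = containment(query, candidate)
j = jaccard(query, candidate)
```

Containment is asymmetric and insensitive to the size of the candidate set, which makes it a better fit when set sizes are heavily skewed.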
- Python
Published by ekzhu almost 9 years ago
datasketch - Consistent MinHash hash values across Python versions
MinHash now uses NumPy's random number generator instead of Python's built-in random. This makes MinHash generate consistent hash values across different Python versions.
The side effect is that MinHash objects created before version 1.1.3 will not work correctly (i.e., jaccard, merge, and union) with those created after.
- Python
Published by ekzhu almost 9 years ago
datasketch - Introduce Lean MinHash and better documentation
- `LeanMinHash` is a subclass of `MinHash`. It uses less memory and allows faster (de)serialization. See documentation for details.
- Removed `serialize`, `deserialize`, and `bytesize` methods from `MinHash`. These are supported in `LeanMinHash` instead.
- Serialized `MinHash` objects before this version will not be deserialized properly. To migrate see here.
- Documentation now has its own website!
- Python
Published by ekzhu almost 9 years ago
datasketch - First stable release
After nearly 2 years working on this project on-and-off, the API is now stable, and the features of MinHash-related sketches are completed.
I will continue to add more data sketches and indexes.
- Python
Published by ekzhu about 9 years ago
datasketch - MinHash LSH Forest
- MinHash LSH Forest implementation and benchmark using synthetic data
- Improve existing MinHash LSH benchmark using synthetic data for more tunable data distributions
- Improve MinHash and LSH performance
- Python
Published by ekzhu about 9 years ago
datasketch - Windows compatibility
- Fixed Issue #4 - int overflow error on Windows platform
- Use Python built-in random number generator for better MinHash accuracy
- Python
Published by ekzhu over 9 years ago
datasketch - Functionality for removing key from LSH index
- Add remove method for LSH index - `lsh.remove(key)`
- Add membership check for LSH - `key in lsh`
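A rough sketch of how removal and membership checks can work on a banded LSH index is shown below. `ToyLSH` and its methods are hypothetical and simplified; this is not datasketch's implementation, only an illustration of the bookkeeping that `lsh.remove(key)` and `key in lsh` imply.

```python
from collections import defaultdict


class ToyLSH:
    """Toy banded LSH index over fixed-length integer signatures."""

    def __init__(self, num_bands: int, rows_per_band: int) -> None:
        self.num_bands = num_bands
        self.rows = rows_per_band
        self.keys = {}  # key -> list of band tuples, needed for removal
        self.tables = [defaultdict(set) for _ in range(num_bands)]

    def _bands(self, signature):
        if len(signature) != self.num_bands * self.rows:
            raise ValueError("signature length must be num_bands * rows_per_band")
        return [
            tuple(signature[i * self.rows:(i + 1) * self.rows])
            for i in range(self.num_bands)
        ]

    def insert(self, key, signature) -> None:
        if key in self.keys:
            raise ValueError("key already exists")
        bands = self._bands(signature)
        self.keys[key] = bands
        for table, band in zip(self.tables, bands):
            table[band].add(key)

    def remove(self, key) -> None:
        if key not in self.keys:
            raise ValueError("key does not exist")
        # Delete the key from every bucket it was placed in.
        for table, band in zip(self.tables, self.keys.pop(key)):
            table[band].discard(key)
            if not table[band]:
                del table[band]  # drop empty buckets

    def __contains__(self, key) -> bool:
        return key in self.keys

    def query(self, signature) -> set:
        result = set()
        for table, band in zip(self.tables, self._bands(signature)):
            result |= table.get(band, set())
        return result


lsh = ToyLSH(num_bands=2, rows_per_band=2)
lsh.insert("a", [1, 2, 3, 4])
lsh.insert("b", [1, 2, 9, 9])

before = lsh.query([1, 2, 0, 0])  # both keys share the first band
lsh.remove("a")
after = lsh.query([1, 2, 0, 0])
still_member = "a" in lsh
```

Storing each key's band values alongside the buckets is what makes removal cheap: the index never has to scan every bucket to find where a key lives.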
- Python
Published by ekzhu almost 10 years ago
datasketch - Introduce Weighted MinHash and interface change
- Add Weighted MinHash data sketch
- Add Weighted MinHash LSH index
- Performance and accuracy benchmark for Weighted MinHash
- Rename digest to update for MinHash and HyperLogLog, and use bytes as input argument.
- Make hashobj customizable through data sketch constructors
- Add new methods for data sketches
- Bug fixes
- Python
Published by ekzhu almost 10 years ago