Recent Releases of https://github.com/bsc-wdc/dislib
https://github.com/bsc-wdc/dislib - v0.9.0
New features
- New RandomSVD algorithm
- New LanczosSVD algorithm
- New distributed versions of Random Forest Classifier and Random Forest Regressor
- New nested versions of Random Forest Classifier and Random Forest Regressor
- Included a version of the TeraSort algorithm
Changed
- New documentation for SVD algorithm, RF and TeraSort
Fixed
- Fix bugs & tests
- Python
Published by cTatu over 2 years ago
https://github.com/bsc-wdc/dislib - v0.8.0
New features
- save and load methods for all models
- Adding Multiclass CSVM
- Adding TS-QR (Tall Skinny QR)
- New in-place operations for ds-arrays: add, iadd, isub
- Matrix-Subtraction and Matrix-Addition
- Concatenating two ds-arrays by columns
- Save ds-array to npy file
- Load ds-array from several npy files
- Create ds-arrays from blocks
- GridSearch for simulations & improvements
- Inverse transformation in Scalers
- Train-Test-Split functionality
- Add KNN Classifier
- Better SVD columns pairing
- GPU Support using CUDA/CuPy for algorithms: Kmeans, KNN, SVD, PCA, Matmul, Addition, Subtraction, QR, Kronecker
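The in-place operations listed above update the left-hand array instead of allocating a new one. As a rough, library-independent illustration, the sketch below uses a hypothetical BlockedArray class built on plain nested lists (it is not dislib's ds-array) and assumes that iadd and isub correspond to Python's += and -= operators.

```python
# Hypothetical stand-in for a blocked array; this is NOT dislib's ds-array.
class BlockedArray:
    def __init__(self, blocks):
        # blocks: list of blocks, each block a list of rows (lists of numbers)
        self.blocks = blocks

    def _check_same_layout(self, other):
        same = len(self.blocks) == len(other.blocks) and all(
            len(a) == len(b) and all(len(r) == len(s) for r, s in zip(a, b))
            for a, b in zip(self.blocks, other.blocks)
        )
        if not same:
            raise ValueError("operands must share the same block layout")

    def _update(self, other, sign):
        self._check_same_layout(other)
        for mine, theirs in zip(self.blocks, other.blocks):
            for row, other_row in zip(mine, theirs):
                for j, value in enumerate(other_row):
                    row[j] += sign * value  # mutate the existing row
        return self  # returning self keeps `a += b` bound to the same object

    def __iadd__(self, other):
        return self._update(other, +1)

    def __isub__(self, other):
        return self._update(other, -1)


a = BlockedArray([[[1, 2]], [[3, 4]]])
b = BlockedArray([[[10, 20]], [[30, 40]]])
a += b  # a's blocks are modified; no new BlockedArray is created
```

Returning self from both hooks is what makes the update in place; a regular addition would build and return a new object instead.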
Changed
- New documentation for GPU, RandomForest, Scalers
Fixed
- Fix bug Scalers & tests
Published by cTatu over 3 years ago
https://github.com/bsc-wdc/dislib - v0.7.1
What's Changed
0.7.0 + documentation fix
Full Changelog: https://github.com/bsc-wdc/dislib/compare/v0.7.0...v0.7.1
Published by cTatu over 4 years ago
https://github.com/bsc-wdc/dislib - v0.7.0
New features
- QR decomposition
- Random Forest regressor
- MinMax scaler
- Matrix multiplication with transposed arguments
- Several utility functions to pad matrices or to remove the last rows/columns
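To make the MinMax scaler entry above concrete, here is a minimal single-machine sketch of min-max scaling and its inverse. The function names are hypothetical and the code works on plain lists of rows; it illustrates the arithmetic only and is not dislib's distributed implementation.

```python
def min_max_fit(rows):
    """Return a (min, max) pair for each column of a list of rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_transform(rows, stats, feature_range=(0.0, 1.0)):
    """Scale every column linearly into feature_range."""
    low, high = feature_range
    scaled_rows = []
    for row in rows:
        scaled = []
        for value, (col_min, col_max) in zip(row, stats):
            span = col_max - col_min
            # A constant column has span 0; map it to the lower bound.
            unit = 0.0 if span == 0 else (value - col_min) / span
            scaled.append(low + unit * (high - low))
        scaled_rows.append(scaled)
    return scaled_rows


def min_max_inverse(scaled_rows, stats, feature_range=(0.0, 1.0)):
    """Undo min_max_transform (constant columns map back to their value)."""
    low, high = feature_range
    restored = []
    for row in scaled_rows:
        restored.append([
            col_min + (value - low) / (high - low) * (col_max - col_min)
            for value, (col_min, col_max) in zip(row, stats)
        ])
    return restored


data = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
stats = min_max_fit(data)
scaled = min_max_transform(data, stats)
restored = min_max_inverse(scaled, stats)
```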
Improvements
- Improved performance of SVD
- Computing units for each task
Published by michal-choinski over 4 years ago
https://github.com/bsc-wdc/dislib - v0.6.4
Dependencies
- PyCOMPSs >= 2.7
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
- cvxpy >= 1.1.5
Improvements
- SVD doc example fixed.
- LR example fixed.
- Warn when cvxpy dependency missing (for mn4 installation).
- Added link to Contributing guide in docs
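The cvxpy warning mentioned above can be pictured as an optional-dependency check. The sketch below shows one common way to do this with the standard library; the helper names module_available and warn_if_missing are hypothetical, and this is an assumption about the general pattern rather than dislib's actual code.

```python
import importlib.util
import warnings


def module_available(name):
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def warn_if_missing(name, feature):
    """Emit a warning and return True when an optional dependency is absent."""
    if module_available(name):
        return False
    warnings.warn(
        f"{name} is not installed; {feature} will be unavailable. "
        f"Install it with: pip install {name}",
        UserWarning,
    )
    return True


# Example: check for cvxpy before using the algorithms that need it.
cvxpy_missing = warn_if_missing("cvxpy", "algorithms that depend on cvxpy")
```

Warning instead of failing at import time lets the rest of the library load on systems where the optional package cannot be installed.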
Published by salvisolamartinell over 5 years ago
https://github.com/bsc-wdc/dislib - v0.6.3
Dependencies
- PyCOMPSs >= 2.7
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
- cvxpy >= 1.1.5
Improvements
- PyPI long_description shortened.
Published by salvisolamartinell over 5 years ago
https://github.com/bsc-wdc/dislib - v0.6.2
Dependencies
- PyCOMPSs >= 2.7
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
- cvxpy >= 1.1.5
Improvements
- Added extra info for PyPI
Published by salvisolamartinell over 5 years ago
https://github.com/bsc-wdc/dislib - v0.6.1
Dependencies
- PyCOMPSs >= 2.7
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
- cvxpy >= 1.1.5
Improvements
- Documentation fixes.
Published by salvisolamartinell over 5 years ago
https://github.com/bsc-wdc/dislib - v0.6.0
Dependencies
- PyCOMPSs >= 2.7
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
- cvxpy >= 1.1.5
Upgrade Steps
If using docker, just use the new image.
If you have a local installation, upgrade to COMPSs 2.7 (see COMPSs doc) before upgrading to dislib 0.6.0. Also, install the Python cvxpy module in order to use the regression algorithms: pip install cvxpy.
Breaking Changes
- ds-array doesn't accept a chunk_size bigger than the array.
- Moved data loading routines to a different file as array.py was getting too big.
- apply_along_axis for sparse data now returns sparse ds-arrays.
- Some PyCOMPSs log messages have changed.
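The chunk_size restriction in the breaking changes above amounts to a per-dimension bounds check. A minimal sketch of such a check follows; validate_chunk_size is a hypothetical helper written for illustration, and the error messages are not the ones dislib produces.

```python
def validate_chunk_size(shape, chunk_size):
    """Raise ValueError if any chunk dimension exceeds the array dimension."""
    if len(shape) != len(chunk_size):
        raise ValueError("shape and chunk_size must have the same number of dimensions")
    for axis, (dim, chunk) in enumerate(zip(shape, chunk_size)):
        if chunk < 1:
            raise ValueError(f"chunk_size[{axis}] must be at least 1, got {chunk}")
        if chunk > dim:
            raise ValueError(
                f"chunk_size[{axis}]={chunk} is bigger than the array dimension {dim}"
            )
    return chunk_size


# A (10, 4) array accepts (5, 4) chunks; (20, 4) would raise ValueError.
accepted = validate_chunk_size((10, 4), (5, 4))
```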
New Features
- User guide and glossary
- Method to read from npy files
- Support for one-dimensional data in ds-array
- Parametrized ds-array tests
- identity, full and zeros methods that generate ds-arrays filled with a value
- ds-array operators: subtraction, division, conjugate, transpose, item setting, etc.
- matmul, kronecker product and rechunk methods for ds-arrays
- Automatic deletion of ds-arrays when the GC is called.
- Multivariate linear regression.
- SVD (Singular Value Decomposition)
- PCA using SVD
- ADMM Lasso algorithm
- Daura clustering algorithm
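For readers unfamiliar with the kronecker product listed among the new ds-array methods above, the following sketch computes it for small matrices stored as lists of rows. It shows the mathematical operation only; the function name kron is chosen here for illustration and the code is unrelated to how dislib distributes the computation.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Each entry a[i][j] is replaced by the block a[i][j] * b, so an
    (m x n) matrix and a (p x q) matrix yield an (m*p x n*q) matrix.
    """
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


a = [[1, 2],
     [3, 4]]
b = [[0, 1],
     [1, 0]]
product = kron(a, b)
# product == [[0, 1, 0, 2],
#             [1, 0, 2, 0],
#             [0, 3, 0, 4],
#             [3, 0, 4, 0]]
```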
Bug Fixes
- Some bugs in the ds-array
- Internal inconsistencies in transformed_array of PCA
Improvements
- Improved performance testing scripts and added new tests
- Allow executing applications with params using dislib exec
- Extended and improved the tutorial notebook
- Updated dislib-base docker image
- Replaced COLLECTION_INOUT parameters with COLLECTION_OUT when possible to improve performance
Published by salvisolamartinell over 5 years ago
https://github.com/bsc-wdc/dislib - v0.5.0
Dependencies
- PyCOMPSs == 2.5
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
New Features
- Added grid search and randomized search with cross-validation
- Added K-fold splitter
- dislib command line can now run jupyter notebooks
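The K-fold splitter listed above underlies cross-validated search: the samples are divided into k folds and each fold serves once as the test set. The sketch below generates such index splits with the standard library only. The name k_fold_indices is hypothetical, and the fold-sizing rule (earlier folds take the remainder) is an assumption borrowed from common practice, not a statement about dislib's splitter.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for each of n_splits contiguous folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


folds = list(k_fold_indices(5, 2))
# folds == [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
```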
Bug Fixes
- Fixed various bugs in fancy indexing of ds-arrays
- dislib command line now works on MacOS
- Fixed "source" links in the documentation to point to the appropriate version of the source code
- dislib command line now works even if PyCOMPSs is not installed
Improvements
- Added a new notebook and improved the existing one
- PCA now supports sparse data
- Estimators now extend scikit-learn's base estimator for greater integration
Published by javicid over 6 years ago
https://github.com/bsc-wdc/dislib - v0.4.3
Dependencies
- PyCOMPSs == 2.5
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
Improvements
- Installing dislib via pip now automatically places the dislib executable in the PATH.
Published by javicid over 6 years ago
https://github.com/bsc-wdc/dislib - v0.4.0
Dependencies
- PyCOMPSs == 2.5
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
Breaking Changes
- Most estimator methods, such as fit and predict, now expect one or two ds-arrays instead of a Dataset.
New Features
- This release introduces the distributed array as the main data structure in dislib. All estimators have been modified to accept ds-arrays instead of Datasets. The Dataset and Subset classes have been removed.
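Conceptually, a distributed array of this kind is a matrix partitioned into a grid of blocks that can be processed independently. The sketch below shows only that partitioning step on a plain list of rows. split_into_blocks is a hypothetical helper; the actual ds-array also handles distributing the blocks through PyCOMPSs, which is not modelled here.

```python
def split_into_blocks(rows, block_shape):
    """Partition a matrix (list of rows) into a 2-D grid of blocks.

    Blocks on the bottom and right edges may be smaller than block_shape
    when the matrix dimensions are not exact multiples of it.
    """
    block_rows, block_cols = block_shape
    n_cols = len(rows[0])
    grid = []
    for i in range(0, len(rows), block_rows):
        grid_row = []
        for j in range(0, n_cols, block_cols):
            block = [row[j:j + block_cols] for row in rows[i:i + block_rows]]
            grid_row.append(block)
        grid.append(grid_row)
    return grid


matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
grid = split_into_blocks(matrix, (2, 2))
# grid[0][0] == [[1, 2], [4, 5]] and grid[1][1] == [[9]]
```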
Bug Fixes
- Minor bug fixes in RandomForestClassifier and K-means
Improvements
- The performance of various algorithms has been improved by using PyCOMPSs COLLECTIONS.
- K-means now accepts an 'init' parameter.
Published by javicid over 6 years ago
https://github.com/bsc-wdc/dislib - v0.3.0
Dependencies
- PyCOMPSs == 2.5
- Scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- Scipy >= 1.0.0
New Features
- GaussianMixture now supports covariance types 'tied', 'diag', and 'spherical' in addition to 'full'.
- dislib now provides PCA and LinearRegression models.
Bug Fixes
- Fixed DBSCAN so that it can detect clusters with fewer than min_samples samples, as well as clusters that lie in the intersection of two regions.
Improvements
- The GaussianMixture documentation has been improved.
- Extra tests for GaussianMixture, C-SVM and DBSCAN have been added.
- The performance of K-means, DBSCAN and GaussianMixtures has been significantly improved.
- The performance of utils.shuffle has been improved by using PyCOMPSs collections.
- The performance of Dataset has been improved by removing the tracking of duplicates.
Published by javicid almost 7 years ago
https://github.com/bsc-wdc/dislib - v0.2.1
Dependencies
- PyCOMPSs >= 2.4-rc1902
- Scikit-learn >= 0.19.1
- NumPy >= 1.15.4
- Scipy >= 1.0.0
Bug Fixes
- DBSCAN now detects clusters with fewer than min_samples samples in certain situations
Improvements
- The performance of DBSCAN has been improved
Published by javicid almost 7 years ago
https://github.com/bsc-wdc/dislib -
Dependencies
- PyCOMPSs == 2.4-rc1902
- Scikit-learn >= 0.19.1
- NumPy >= 1.15.4
- Scipy >= 1.0.0
Upgrade Steps
Breaking Changes
- predict and fit_predict methods in K-means, DBSCAN and C-SVM now take a Dataset as argument and do not return anything
New Features
The following new algorithms have been implemented:
- Gaussian mixtures
- Nearest neighbors
- Alternating least squares
- Standard scaler
Added the following utility methods:
- resample
- shuffle
- as_grid
Bug Fixes
- Numerous bug fixes in DBSCAN.
- Fixed the reproducibility of results in C-SVM and random forests
- Several other minor bug fixes
Improvements
- Completely unified the interface of the different algorithms
- Improved the documentation
- Added a way to easily access Dataset samples and labels
- Implemented Dataset's transpose
- Implemented Dataset's apply function
Published by javicid over 7 years ago
https://github.com/bsc-wdc/dislib -
This release has been tested with COMPSs version rc1902.
Published by javicid over 7 years ago