pydata-wrangler - v0.4.0 (June, 2025)

🚀 High-Performance Polars Backend + Simplified Text API

## 🎯 Key Features

### ⚡ NEW: High-Performance Polars Backend (2-100x faster!)

- Dual DataFrame Support: Choose between pandas (default) or Polars backends
- Zero Code Changes: Add `backend='polars'` to any operation for instant speedups
- Comprehensive Coverage: All data types (arrays, text, files) work with both backends
- Smart Type Preservation: DataFrames maintain their type when no backend is specified
- Global Configuration: Set the default backend preference with `set_dataframe_backend('polars')`
- Cross-Backend Conversion: Seamlessly convert between pandas and Polars DataFrames
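The backend rules listed above (an explicit `backend=` argument, type preservation, and a global default) can be read as a precedence order. The sketch below is only an illustration of one plausible ordering, not the package's actual code: `resolve_backend` and `set_default_backend` are hypothetical names, and the assumption that the input's type outranks the global default is ours, not stated in these notes.

```python
from typing import Any, Optional

_VALID_BACKENDS = ("pandas", "polars")
_default_backend = "pandas"  # hypothetical module-level default


def set_default_backend(name: str) -> None:
    """Hypothetical stand-in for a global backend preference."""
    global _default_backend
    if name not in _VALID_BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    _default_backend = name


def resolve_backend(data: Any, backend: Optional[str] = None) -> str:
    """Pick a backend: explicit argument, then input type, then global default."""
    if backend is not None:
        if backend not in _VALID_BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        return backend
    # Top-level package that defines the input's type, e.g. "pandas" or "polars".
    package = type(data).__module__.split(".")[0]
    if package in _VALID_BACKENDS:
        return package  # assumed: a DataFrame keeps its own type
    return _default_backend
```

Under these assumptions, a plain list falls through to the global default, while a Polars DataFrame stays Polars unless `backend='pandas'` is passed explicitly.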

### 📊 Performance Gains with Polars

- Array Processing: 2-100x faster conversion for large datasets
- Text Embeddings: 3-10x faster document processing
- Memory Efficiency: 30-70% reduction in memory usage
- Parallel Processing: Built-in multi-core optimization

### 🎨 Simplified Text Model API (80% reduction in verbosity)

- Simple String Format: `{'model': 'all-MiniLM-L6-v2'}` now works everywhere
- Automatic Normalization: All model formats are converted to a unified dict internally
- List Support: Lists of models work with the simplified format
- Full Backward Compatibility: All existing verbose syntax continues to work
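To make the "automatic normalization" idea concrete, here is a minimal sketch of how a string, a verbose dict, or a list of either could be mapped to the unified `{'model': ..., 'args': [], 'kwargs': {}}` form. The function name `normalize_model_spec` is hypothetical and the logic is an assumption based on the formats shown in these notes; the package's internal implementation may differ.

```python
from typing import Any, Dict, List, Union

ModelSpec = Dict[str, Any]


def normalize_model_spec(spec: Any) -> Union[ModelSpec, List[Any]]:
    """Convert a model spec (str, dict, or list of these) to the verbose dict form."""
    if isinstance(spec, (list, tuple)):
        # Lists of models are normalized element by element.
        return [normalize_model_spec(item) for item in spec]
    if isinstance(spec, str):
        # Simplified format: just the model name.
        return {"model": spec, "args": [], "kwargs": {}}
    if isinstance(spec, dict) and "model" in spec:
        # Verbose format: fill in any missing 'args' / 'kwargs'.
        return {
            "model": spec["model"],
            "args": list(spec.get("args", [])),
            "kwargs": dict(spec.get("kwargs", {})),
        }
    raise TypeError(f"unsupported model spec: {spec!r}")
```

With this sketch, `normalize_model_spec('all-MiniLM-L6-v2')` and the equivalent verbose dict produce the same result, which is what allows older verbose syntax to keep working.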

## 📋 Quick Start Examples

### High-Performance Processing

```python
import datawrangler as dw
import numpy as np
from datawrangler.core.configurator import set_dataframe_backend

# Large dataset example
large_array = np.random.rand(50000, 20)

# Traditional pandas backend (the default)
pandas_df = dw.wrangle(large_array)

# High-performance Polars backend (2-100x faster!)
polars_df = dw.wrangle(large_array, backend='polars')

# Set a global preference
set_dataframe_backend('polars')  # All operations now use Polars
```

### Simplified Text Processing

```python
import datawrangler as dw

# Placeholder input: any list of strings
texts = ['a short document', 'another short document']

# Before v0.4.0 (verbose)
text_kwargs = {
    'model': {
        'model': 'all-MiniLM-L6-v2',
        'args': [],
        'kwargs': {}
    }
}

# After v0.4.0 (simplified!)
text_kwargs = {'model': 'all-MiniLM-L6-v2'}

# Works with Polars for 3-10x faster text processing
fast_embeddings = dw.wrangle(texts, text_kwargs=text_kwargs, backend='polars')
```

## 🔧 Additional Improvements

- Google Colab Fix: Eliminated installation warning popup
- Cleaner Dependencies: Removed redundant configparser
- Enhanced Documentation: All examples updated for both backends
- API Consistency: Fixed all docstring examples to use the public API

## 📈 When to Use Each Backend

- Use pandas for: Small datasets, complex index operations, maximum ecosystem compatibility
- Use Polars for: Large datasets, performance-critical applications, memory efficiency

## 🚀 Installation

```bash
pip install --upgrade pydata-wrangler

# For full ML capabilities including sentence-transformers
pip install --upgrade "pydata-wrangler[hf]"
```

## 🧪 Verified Quality

- ✅ All 45 tests passing
- ✅ Documentation builds successfully
- ✅ Full backward compatibility maintained
- ✅ Comprehensive API examples tested

This release maintains full backward compatibility while delivering significant performance improvements and API simplification. Upgrade today to experience the power of high-performance data wrangling!

- Python
Published by jeremymanning about 1 year ago

pydata-wrangler - v0.3.0 (June, 2025)

🎉 Major Release: NumPy 2.0+ Compatibility & Modern ML Libraries

This release brings full compatibility with NumPy 2.0+ and pandas 2.0+ while modernizing the text embedding infrastructure with sentence-transformers.

🚀 New Features

Full NumPy 2.0+ and pandas 2.0+ compatibility
Modern sentence-transformers integration for text embeddings
Support for latest scikit-learn, matplotlib, and scipy versions
Enhanced error handling for missing dependencies
Updated Python support (3.9-3.12)

🔧 Breaking Changes

Replaced Flair with sentence-transformers for text embeddings
Removed gensim dependency (eliminates NumPy version conflicts)
Updated text embedding API to use sentence-transformers models
Dropped Python 3.6-3.8 support in favor of modern Python versions

🐛 Bug Fixes

Fixed numpy.str_ deprecation that broke in NumPy 2.0+
Updated HuggingFace datasets import for API changes
Fixed sklearn model detection so that sklearn models are no longer mistakenly handled as sentence-transformers models
Fixed pandas iteritems deprecation for pandas 2.0+ compatibility
Replaced deprecated matplotlib.pyplot.imread
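For readers updating their own code alongside this release: `DataFrame.iteritems()` and `Series.iteritems()` were removed in pandas 2.0, and `items()` is the replacement that works on both 1.x and 2.x. A minimal illustration follows; the frame and variable names are made up for the example and are not taken from the package.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Before pandas 2.0 this loop was often written with df.iteritems();
# df.items() yields the same (column name, Series) pairs on 1.x and 2.x.
column_sums = {name: int(col.sum()) for name, col in df.items()}
```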

📚 Documentation & Examples

Updated all examples to use sentence-transformers syntax
Modernized installation instructions and model references
Comprehensive tutorial updates with new embedding approaches

🔄 Migration Guide

Old Flair syntax: {'model': 'TransformerDocumentEmbeddings', 'args': ['bert-base-uncased']}

New sentence-transformers syntax: {'model': 'all-mpnet-base-v2', 'args': [], 'kwargs': {}}

🛠️ Technical Changes

Sklearn models (CountVectorizer, etc.) now properly detected before sentence-transformers
Enhanced model detection prevents accidental model misclassification
Improved error messages for missing optional dependencies
Full compatibility with modern scientific Python stack
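The improved messages for missing optional dependencies follow a common pattern: import lazily and, on failure, tell the user which extra to install. The sketch below shows that pattern in general terms. `require_optional` is a hypothetical helper, not a function from the package, and the hint text is an assumption built from the `hf` extra named in the v0.4.0 installation notes.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or raise an ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f'install it with: pip install "pydata-wrangler[{extra}]"'
        ) from err
```

A call such as `require_optional('sentence_transformers', 'hf')` would then either return the module or fail with a message that points at the missing extra rather than a bare import traceback.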

- Python
Published by jeremymanning about 1 year ago

pydata-wrangler - v0.2.2 (July, 2022)

v0.2.1: Bug fixes for environments where the Hugging Face libraries aren't installed
v0.2.2: Better error handling when the Hugging Face libraries aren't installed and the user asks to embed text using Hugging Face models

- Python
Published by jeremymanning almost 4 years ago

pydata-wrangler - v0.2.0 (July, 2022)

Adds CUDA (GPU) support for PyTorch models
Streamlines the package by not installing Hugging Face (🤗) support by default
Adds Python 3.10 support (and associated tests)
Relaxes some tests to support a wider range of platforms (mostly this is relevant for GitHub CI)
Relaxes requirements.txt versioning to improve compatibility with other libraries when installing via pip

- Python
Published by jeremymanning almost 4 years ago

pydata-wrangler - v0.1.7 (August, 2021)

Updates model defaults to support more use cases

- Python
Published by jeremymanning almost 5 years ago

pydata-wrangler - v0.1.6 (August, 2021)

Note: this version will be replaced by 0.1.7 shortly; tagging for archival purposes.

More fixes to dw.unstack.

- Python
Published by jeremymanning almost 5 years ago

pydata-wrangler - v0.1.0 (July, 2021)

Initial release!

- Python
Published by jeremymanning almost 5 years ago

Recent Releases of pydata-wrangler

pydata-wrangler - v0.4.0 (June, 2025)

pydata-wrangler - v0.3.0 (June, 2025)

pydata-wrangler - v0.2.2 (July, 2022)

pydata-wrangler - v0.2.0 (July, 2022)

pydata-wrangler - v0.1.7 (August, 2021)

pydata-wrangler - v0.1.6 (August, 2021)

pydata-wrangler - v0.1.5

pydata-wrangler - v0.1.4 (August, 2021)

pydata-wrangler - v0.1.3 (August, 2021)

pydata-wrangler - v0.1.2 (August, 2021)

pydata-wrangler - v0.1.1 (July, 2021)

pydata-wrangler - v0.1.0 (July, 2021)