Recent Releases of ParquetDB
ParquetDB - Release 1.0.1
ParquetDB 1.0.1 Release Notes
We are thrilled to announce the 1.0.1 release of ParquetDB!
ParquetDB is a Python library designed to be a "middleware" solution, effectively bridging the gap between simple file-based storage (like CSV, JSON) and more complex, full-fledged database systems. It leverages the power and efficiency of Apache Parquet files while providing a user-friendly, database-like interface for managing and querying your data.
This release marks a significant milestone in providing a robust and streamlined solution for researchers and developers working with evolving, complex, and nested datasets, particularly in environments where traditional databases are overkill or impractical, such as HPC clusters with limited connectivity.
Why ParquetDB?
In many research and development workflows, data storage needs fall into a challenging middle ground:
- Traditional file formats (CSV, JSON) are simple but inefficient for numerical data, lack querying capabilities, and struggle with schema evolution and complex data types.
- Binary formats like HDF5 are more efficient for numerical data but act more like structured file containers, lacking rich querying APIs and easy management of data relationships.
- Full database systems (SQL or NoSQL) offer robust features but can be overly complex to set up and manage, introduce rigidity in schema management (SQL), or present consistency challenges (some NoSQL). They often require server configurations, making them less suitable for lightweight experimentation or "classically serverless" deployments.
- Directly using libraries like PyArrow with Parquet files provides efficiency but requires significant boilerplate for database-like operations (CRUD), schema consistency, and handling complex Python objects.
ParquetDB was born out of the need to address these limitations, specifically for iterative research workflows requiring:
- Schema Evolvability: Seamlessly adapt your data schema over time without upfront rigidity.
- Complex Nested Data Structures: Natively handle and manage intricate, evolving nested data.
- Table and Field-Level Metadata: Easily manage metadata associated with your datasets.
- "Classically Serverless" Operation: Ideal for environments like HPC clusters with no reliance on network-connected database servers.
- Performance: Efficient data storage, retrieval, and querying.
Key Features in 1.0.0:
- Simple, Database-like Interface: Intuitive methods for create, read, update, and delete operations.
- Leverages Apache Parquet: Benefits from columnar storage for efficient compression and read performance.
- Minimal Overhead: Achieves competitive read/write speeds without the complexity of traditional database setup.
- Handles Complex Data Types: Natively supports nested structures, arrays, and even Python objects (via pickling).
- Schema Evolution: Add new fields and update schemas without hassle.
- Efficient Querying: Utilizes predicate pushdown for optimized data retrieval.
- Normalization: Tools to balance data distribution across files for consistent performance.
- Batching: Efficiently process large datasets.
- Pandas DataFrame Integration: Easily add data from and read data into Pandas DataFrames.
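You can picture the schema-evolution behaviour listed above with plain dictionaries. The sketch below is only an illustration. merge_fields and align_records are hypothetical helpers, and ParquetDB itself works on PyArrow schemas rather than Python dicts. It shows the core idea: a batch that introduces a new field is accepted, and older rows get None for that field.

```python
from typing import Any, Dict, List


def merge_fields(existing: List[str], records: List[Dict[str, Any]]) -> List[str]:
    """Keep the existing field order and append any new fields in first-seen order."""
    fields = list(existing)
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)
    return fields


def align_records(fields: List[str], records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Give every record every field, filling missing values with None."""
    return [{name: record.get(name) for name in fields} for record in records]


# Illustrative data: the second batch introduces a field, "density", unseen so far.
table = [{"id": 0, "name": "Fe"}, {"id": 1, "name": "Cu"}]
fields = merge_fields([], table)
new_batch = [{"id": 2, "name": "Al", "density": 2.70}]
fields = merge_fields(fields, new_batch)
table = align_records(fields, table + new_batch)

print(fields)    # ['id', 'name', 'density']
print(table[0])  # {'id': 0, 'name': 'Fe', 'density': None}
```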
Performance Highlights:
Our benchmarks demonstrate that ParquetDB offers competitive performance:
- Write Performance: Competitive creation times, performing well against SQLite as dataset sizes increase.
- Read Performance: While initial reads on very small datasets might be comparable, ParquetDB significantly outperforms competitors like SQLite and MongoDB on larger datasets for bulk read operations, showcasing the efficiency of the underlying columnar Parquet format.
- Query Performance: Effectively uses predicate pushdown with Parquet's field-level statistics for efficient filtering, excelling when querying or returning substantial portions of wide datasets.
(For detailed benchmarks and comparisons, please refer to our documentation and forthcoming paper: Lang, ParquetDB: A Lightweight Python Library for Serverless Management of Complex, Evolving Datasets Using Apache Parquet, 2025).
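The query-performance point relies on predicate pushdown. A reader checks each row group's min/max statistics and skips any group that cannot contain a match. The stdlib-only model below illustrates that idea. The RowGroup class and the query_greater_than helper are hypothetical, and real Parquet readers such as PyArrow's dataset API apply this logic internally when they evaluate filters.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RowGroup:
    """A toy row group holding one integer column plus its min/max statistics."""

    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Parquet writers record statistics like these in the file footer at write
        # time, so a reader can consult them without decoding the column data.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def query_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return the values > threshold and how many row groups had to be scanned."""
    matches: List[int] = []
    scanned = 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # statistics prove no row can match, so skip the group unread
        scanned += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, scanned


groups = [RowGroup([1, 5, 9]), RowGroup([10, 12, 15]), RowGroup([20, 25, 30])]
matches, scanned = query_greater_than(groups, 18)
print(matches, scanned)  # [20, 25, 30] 1
```

Only one of the three groups is decoded here, which is why selective filters on wide datasets benefit most.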
When to Choose ParquetDB:
ParquetDB shines when you're dealing with:
- Complex and deeply nested data.
- Schemas that are expected to evolve over time.
- The need for a serverless solution that manages collections of Parquet files as a coherent, updatable dataset.
- Scenarios where full database systems are too heavy, but basic file I/O is insufficient.
If your data is simple, flat, and has a stable schema, tools like DuckDB or direct Parquet file management with Pandas/PyArrow might be sufficient. However, ParquetDB offers a streamlined approach for more intricate data management challenges.
Scientific Software - Peer-reviewed
- Python
Published by github-actions[bot] 10 months ago
ParquetDB - Release 0.28.0
Release v0.28.0 (15-05-2025)
This release delivers a critical bug fix for table updates and enriches the project documentation. The update ensures reliable row ordering in multi‐key joins, while the refined paper.md and new DuckDB reference improve clarity and completeness.
Bugs
- Fix index handling in updateflattendtable to correctly select and sort incoming and existing tables, preserving row order after right joins in multi-key updates
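This fix is about keeping row order when updates are matched on several key columns. The sketch below shows the intended behaviour using lists of dicts rather than PyArrow tables. update_rows is a hypothetical function, not ParquetDB's internal routine. It matches rows on a composite key and keeps the order of the existing data, whatever order the incoming rows arrive in.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def update_rows(existing: List[Row], incoming: List[Row], keys: Sequence[str]) -> List[Row]:
    """Patch existing rows whose composite key matches an incoming row.

    The output keeps the order of ``existing`` regardless of the order of
    ``incoming``. Incoming rows with no match are ignored here (an assumption
    made for the sketch, not a statement about ParquetDB).
    """

    def key_of(row: Row) -> Tuple[Any, ...]:
        return tuple(row[k] for k in keys)

    patches = {key_of(row): row for row in incoming}
    result: List[Row] = []
    for row in existing:
        patch = patches.get(key_of(row))
        # Copy rather than mutate so the caller's data is left untouched.
        result.append({**row, **patch} if patch is not None else dict(row))
    return result


existing = [
    {"material": "Fe", "phase": "bcc", "energy": -8.3},
    {"material": "Fe", "phase": "fcc", "energy": -8.1},
    {"material": "Cu", "phase": "fcc", "energy": -3.7},
]
incoming = [
    {"material": "Cu", "phase": "fcc", "energy": -3.8},
    {"material": "Fe", "phase": "bcc", "energy": -8.4},
]
updated = update_rows(existing, incoming, keys=("material", "phase"))
print([row["energy"] for row in updated])  # [-8.4, -8.1, -3.8]
```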
New features
- None identified
Documentation updates
- Refine paper.md wording and punctuation for greater clarity
- Add DuckDB entry to the bibliography
Maintenance
- None identified
Published by github-actions[bot] 10 months ago
ParquetDB - Release 0.27.1
Release v0.27.1 (13-05-2025)
This release focuses on streamlining documentation, enhancing type and path handling, and improving logging verbosity and metadata management. A critical bug in dataset renaming has been fixed, and dynamic node typing support has been added.
Bugs
- Fixed an issue in the rename_dataset method that previously prevented proper dataset renaming.
New features
- Introduced set_node_type for dynamic node type updates and simplified overall node_type management.
Documentation updates
- Renamed “GraphDB” to “ParquetGraphDB” across all docs and added a dedicated ParquetGraphDB guide.
- Moved tutorials to an examples directory and updated index.rst paths accordingly.
- Replaced all instances of “matgraphdb” with “parquetdb” and adjusted module references from parquetdb.core to parquetdb.graph in the API docs.
- Added a public section detailing ParquetDB’s motivation, advantages over pyarrow/pandas, schema-evolution and nested-data strategies, plus an internal overview of its complex-data management benefits.
- Enhanced the README with direct links to documentation, PyPI, GitHub, contributing guidelines, and license information.
Maintenance
- Standardized on pathlib.Path for file handling in ParquetDB and store constructors (which now accept str or Path).
- Removed built-in metadata storage; stores now rely on user-provided metadata types, and redundant initialization code/tests have been cleaned up.
- Defaulted setup_kwargs and initialize_kwargs to empty dictionaries only when not provided.
- Added type hints across ParquetDB methods for clearer interfaces.
- Tuned logging: key operations now use the debug level to reduce noise; error logs were added for unsupported or invalid load formats.
- Updated pyproject.toml to consolidate docs and testing dependencies under parquetdb[docs,tests] and added the myst_parser requirement.
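Two of the maintenance items above follow a common Python constructor pattern: accepting either a str or a pathlib.Path, and defaulting setup_kwargs/initialize_kwargs to empty dictionaries only when they are not provided. The class below is a hypothetical sketch of that pattern. ExampleStore and its storage_path parameter are invented for illustration and are not ParquetDB's store API.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Union


class ExampleStore:
    """Hypothetical store showing the constructor conventions described above."""

    def __init__(
        self,
        storage_path: Union[str, Path],
        setup_kwargs: Optional[Dict[str, Any]] = None,
        initialize_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Accept str or Path and work with a Path internally.
        self.storage_path = Path(storage_path)
        # Create a fresh dict per instance only when none is given; a literal `= {}`
        # default would be shared by every instance.
        self.setup_kwargs = {} if setup_kwargs is None else setup_kwargs
        self.initialize_kwargs = {} if initialize_kwargs is None else initialize_kwargs


a = ExampleStore("data/nodes")
b = ExampleStore(Path("data") / "nodes")
print(a.storage_path == b.storage_path)  # True
a.setup_kwargs["verbose"] = 2
print(b.setup_kwargs)  # {}
```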
Published by github-actions[bot] 10 months ago
ParquetDB - Release 0.27.0
Release v0.27.0 (07-05-2025)
This release introduces a new Parquet-based graph database API complete with data validation and adjustable logging, while streamlining version management via setuptools_scm. It also upgrades testing, CI, and documentation—expanding Python support to 3.8–3.13, refining example notebooks, and improving test robustness.
Bugs
- Enforce the required edge_type input in ParquetGraphDB tests to fix edge-case failures
- Suppress DeprecationWarning in ParquetGraphDB tests for cleaner output
New features
- Introduce core graph database components (NodeStore, EdgeStore, ParquetGraphDB) with DataFrame validation
- Add a verbose parameter to stores and ParquetGraphDB for adjustable logging
- Enhance release PR comments to include version and date for clearer CHANGELOG entries
Documentation updates
- Add and update ParquetGraphDB example notebooks and gallery demos
- Remove obsolete example notebook and revise index.rst accordingly
- Configure nbsphinx to skip execution in the graph generator notebook
- Add a link in the README to CONTRIBUTING.md to guide new contributors
Maintenance
- Remove the obsolete _version.py and adopt setuptools_scm for automatic version tracking
- Expand Python compatibility to 3.8–3.13, bump pyarrow to ≥17.0.0, and update GitHub Actions CI workflows
- Refactor PythonObjectPandasArray.__arrow_array__ for binary storage and add module logging
- Streamline path handling in stores and remove verbose debug logs
- Refactor and expand unit tests:
  - Use isolated fixtures with temp directories and assert array shapes, content, and ordering
  - Validate return types and update assertions with correct syntax
- Update .gitignore to exclude example GraphDB files and data directory patterns
Published by github-actions[bot] 10 months ago
ParquetDB - Release v0.26.0
This release enhances automation across CI/CD and documentation workflows, introduces new testing utilities and PyArrow joins, and fixes temporary file naming and notebook metadata issues. It also refines configuration management, cleans up dead code, and expands test coverage for greater reliability.
Bugs
- Renamed temporary Parquet files from tmp_{dataset_name} to tmp-{dataset_name} to prevent race conditions in tmp-prefixed directories
- Reset notebook execution counts and refreshed outputs for accurate metadata
- Updated CI to install test dependencies via the [tests] extras and pinned Python to 3.12 for compatibility
New features
- Added a GitHub Actions workflow for fully automated Python package releases and branch updates
- Implemented scripts to generate release notes from PR commits and comments
- Introduced a continuous pytest runner with timestamped failure logs, graceful shutdown, and iteration feedback
- Added join_tables in PyArrow to perform inner, outer, and other joins with suffix handling (includes a left_outer_join example)
- Auto-creates a user-specific configuration file when none is found
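This release adds a join_tables helper with suffix handling. The stdlib sketch below shows what a left outer join with suffixes does when rows are represented as dicts. left_outer_join and its ("_left", "_right") default suffixes are hypothetical choices made for the sketch and do not reflect the signature of join_tables. For real tables, PyArrow's own pyarrow.Table.join already supports join_type="left outer" together with left_suffix and right_suffix arguments.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def left_outer_join(
    left: List[Row],
    right: List[Row],
    key: str,
    suffixes: Tuple[str, str] = ("_left", "_right"),
) -> List[Row]:
    """Keep every left row and attach the matching right columns (None if unmatched).

    Non-key columns that appear on both sides are renamed with ``suffixes``.
    """
    left_cols = {c for row in left for c in row if c != key}
    right_cols = sorted({c for row in right for c in row if c != key})
    shared = left_cols.intersection(right_cols)

    def rename(col: str, suffix: str) -> str:
        return col + suffix if col in shared else col

    # Index the right side by key so each lookup is O(1).
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined: List[Row] = []
    for lrow in left:
        base = {key: lrow[key]}
        base.update({rename(c, suffixes[0]): v for c, v in lrow.items() if c != key})
        # An unmatched left row is paired with a single empty right row, giving None values.
        for rrow in index.get(lrow[key], [{}]):
            joined.append({**base, **{rename(c, suffixes[1]): rrow.get(c) for c in right_cols}})
    return joined


left = [{"id": 1, "name": "Fe"}, {"id": 2, "name": "Cu"}]
right = [{"id": 1, "name": "iron", "mass": 55.8}]
rows = left_outer_join(left, right, key="id")
print(rows[0])  # {'id': 1, 'name_left': 'Fe', 'mass': 55.8, 'name_right': 'iron'}
print(rows[1])  # {'id': 2, 'name_left': 'Cu', 'mass': None, 'name_right': None}
```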
Documentation updates
- Removed the initial project note from the README and updated its link to point to docs/index.html
- Created CONTRIBUTING.md with cloning, testing, and communication guidelines
- Relocated Sphinx sources into docs, removed built docs, Makefiles, and obsolete files, and updated .readthedocs.yaml
- Configured Read the Docs and added a Sphinx build workflow (Python setup, dependency install, build validation) that fails on errors
- Enhanced notebook formatting with custom CSS and added installation instructions for parquetdb and pymongo in examples
- Revised paper.md and benchmarks for clarity, updated performance comparisons and references, standardized notebook names, and corrected execution counts
Maintenance
- Removed dead code: TransactionManager class and methods
- Refactored config.py for flexible loading, added platformdirs, and fixed stray dependency formatting
- Cleaned up imports, applied code formatting, and upgraded model references to o4-mini
- Added pytest and pytest-cov to project dependencies and updated CI workflows accordingly
- Clarified test data comments, re-enabled test_update_multi_keys, and improved temp-directory management with existence checks and debug logs
- Replaced ad-hoc print statements with structured logger calls and added schema-merge logging while removing redundant logs
- Renamed CI jobs (run-tests), standardized Python to 3.10, simplified workflows by removing unnecessary steps, and added a test-build-package job for build/publish automation
- Refined _version.py with proper formatting, comments, public API (__all__), and clarified type-hint imports
- Expanded ParquetDB test suite to cover row counts, row groups, file sizes, column metadata, and dataset copying
Published by github-actions[bot] 10 months ago
ParquetDB - v0.25.1
0.25.1 (02-16-2025)
Bugs
- None identified
New Features
- Implemented methods to retrieve file and row group sizes in ParquetDB
Documentation updates
- Updated versioning information in _version.py and CHANGELOG.md for the new release
Maintenance
- Merged changes from the main branch of the repository
Published by lllangWV about 1 year ago
ParquetDB - v0.25.0
0.25.0 (02-15-2025)
Bugs
- None identified
New Features
- Added example notebooks for advanced 3D Alexandria and Jarvis datasets
- Introduced matplotlib utility functions and improved periodic table plotting
- Expanded test cases for new metadata retrieval functionality
Documentation
- Enhanced the README.md by adding a Documentation section to the table of contents and restructuring for clarity
- Created API Reference documentation for ParquetDB
- Updated documentation references to simplify class names in Core API documentation
- Prepared documentation for online hosting and integration with ReadTheDocs
- Streamlined documentation for improved readability
- Added JOSS paper metadata for ParquetDB publication
- Updated documentation configuration and static assets
Maintenance
- Modified dataset loading to ignore nested directories and improve filter handling
- Cleaned up and ensured consistent formatting in plotting scripts
- Revised the test runner configuration in test_parquetdb.py
- Removed the .nojekyll file from docs directory
- Merged branch 'main' from remote repository
- Updated _version.py and CHANGELOG.md for the new release
Published by lllangWV about 1 year ago
ParquetDB - v0.24.1
0.24.1 (01-26-2025)
Bugs
- None identified
New Features
- None identified
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Merged changes from the main branch of the remote repository
- Ensured the original dataset remains unchanged during transformation to a new path
Published by lllangWV about 1 year ago
ParquetDB - v0.24.0
0.24.0 (01-25-2025)
Bugs
- None identified
New Features
- Implemented a transform method in ParquetDB for flexible dataset transformations
Documentation updates
- Updated version information in _version.py and CHANGELOG.md due to a new release
Maintenance
- Merged latest changes from the main branch of the repository
Published by lllangWV about 1 year ago
ParquetDB - v0.23.7
0.23.7 (01-23-2025)
Bugs
- None identified
New features
- Enhanced dataset loading to ignore temporary files.
- Improved data type checks in loading methods.
Documentation updates
- Updated version number and changelog for new release.
Maintenance
- Refactored ParquetDB class methods for better dataset handling.
- Updated methods to accurately check for data presence.
Published by lllangWV about 1 year ago
ParquetDB - v0.23.6
0.23.6 (01-21-2025)
Bugs
- Fixed issue with batch deletion of columns by updating the schema before writing the dataset.
New features
- None identified
Documentation updates
- None identified
Maintenance
- Updated version information in _version.py and CHANGELOG.md for the new release.
Published by lllangWV about 1 year ago
ParquetDB - 0.23.5
0.23.5 (01-16-2025)
Bugs
- None identified
New features
- Enhanced the 'ParquetDB' class to improve handling of nested datasets.
- Introduced a dedicated directory for nested datasets within the database path.
Documentation updates
- Updated '_version.py' and 'CHANGELOG.md' to reflect the new release.
Maintenance
- Merged latest changes from the main branch of the remote repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.23.4
0.23.4 (01-09-2025)
Bugs
- None identified
New features
- Enhanced the PythonObjectPandasArray class with a new __setitem__ method for improved array manipulation.
Documentation updates
- Updated _version.py and CHANGELOG.md to reflect the new release.
Maintenance
- Merged updates from the main branch of the repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.23.3
0.23.3 (01-09-2025)
Bugs
- None identified
New Features
- Enhanced ParquetDB functionality by adding new dependencies: dask and distributed.
- Improved the equality check in the PythonObjectArrowScalar class for better accuracy.
Documentation updates
- Updated CHANGELOG.md to reflect the new release.
Maintenance
- Refactored pyarrow_utils.py for better readability and organization.
- Revised test cases to manage None values in structure fields.
- Updated pyproject.toml to include new dependencies.
Published by lllangWV about 1 year ago
ParquetDB - 0.23.2
0.23.2 (01-08-2025)
Bugs
- None identified
New features
- Enhanced the ParquetDB class to more robustly check for empty datasets and normalize incoming table schemas.
Documentation updates
- Updated the CHANGELOG.md and _version.py to reflect the new release.
Maintenance
- Merged the latest changes from the main branch of the repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.23.1
0.23.1 (01-08-2025)
Bugs
- None identified
New features
- Enhanced the parallel_apply function in mp_utils.py to accept a processes parameter, improving flexibility in multiprocessing for large datasets.
Documentation updates
- Updated _version.py and CHANGELOG.md to reflect the latest release.
Maintenance
- Merged changes from the main branch of the repository on GitHub.
Published by lllangWV about 1 year ago
ParquetDB - v0.23.0
0.23.0 (01-08-2025)
Bugs
- None identified
New Features
- None identified
Documentation
- Updated _version.py and CHANGELOG.md to reflect the new release.
Maintenance
- Adjusted unit tests for compatibility with new settings and functionality.
- Merged changes from the main branch of the remote repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.22.0
0.22.0 (01-07-2025)
Bugs
- None identified
New Features
- None identified
Documentation updates
- Updated _version.py and CHANGELOG.md to reflect the new release
Maintenance
- Ensured consistent formatting and spacing throughout the codebase for improved readability
- Merged updates from the main branch of the repository
Published by lllangWV about 1 year ago
ParquetDB - v0.21.0
0.21.0 (01-06-2025)
Bugs
- None identified
New Features
- Enhanced ParquetDB to support Python object serialization.
- Added new utility functions for multiprocessing and plotting.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Removed redundant utility files to improve maintainability.
- Updated unit tests to align with changes in rename_dataset functionality.
Published by lllangWV about 1 year ago
ParquetDB - v0.19.0
0.19.0 (01-03-2025)
Bugs
- None identified
New features
- Enhanced ParquetDB initialization to accept initial fields for schema definition.
- Refactored the get_field_metadata method to support retrieval of metadata for multiple field names.
Documentation updates
- Updated version information in _version.py and CHANGELOG.md following the new release.
Maintenance
- Merged changes from the main branch of the repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.18.0
0.18.0 (01-03-2025)
Bugs
- None identified
New features
- Enhanced metadata handling for ParquetDB
Documentation updates
- Updated _version.py and CHANGELOG.md in preparation for the new release
Maintenance
- Merged updates from the main branch of the repository
Published by lllangWV about 1 year ago
ParquetDB - v0.17.0
0.17.0 (01-03-2025)
Bugs
- None identified
New Features
- None identified
Documentation
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Merged changes from the main branch of the repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.16.0
0.16.0 (01-02-2025)
Bugs
- None identified
New Features
- Enhanced unit tests to validate schema updates and ensure correct handling of metadata updates with the new parameter.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Merged changes from the main branch of the repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.15.0
0.15.0 (01-02-2025)
Bugs
- None identified
New features
- Modified unit tests to validate new metadata setting behaviors, ensuring proper functionality for both update and replace actions.
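The tests in this release distinguish updating metadata from replacing it. Below is a minimal sketch of the two behaviours. It assumes metadata is a flat mapping of string keys to string values, which is how Parquet stores file-level key-value metadata. set_metadata and its update flag are hypothetical names, so check the ParquetDB API reference for the real methods.

```python
from typing import Dict


def set_metadata(current: Dict[str, str], new: Dict[str, str], update: bool = True) -> Dict[str, str]:
    """Return new metadata: merged into ``current`` when update=True, otherwise replacing it."""
    if update:
        merged = dict(current)  # copy so the original mapping is not mutated
        merged.update(new)      # new keys are added, existing keys are overwritten
        return merged
    return dict(new)            # replace: anything not in ``new`` is dropped


meta = {"source": "jarvis", "units": "eV"}
print(set_metadata(meta, {"units": "meV"}))                # {'source': 'jarvis', 'units': 'meV'}
print(set_metadata(meta, {"units": "meV"}, update=False))  # {'units': 'meV'}
```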
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Merged changes from the main branch of the repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.14.0
0.14.0 (01-01-2025)
Bugs
- None identified
New Features
- None identified
Documentation updates
- Updated the CHANGELOG.md to reflect the new release.
Maintenance
- Refactored the ParquetDB preprocessing for improved code structure.
- Removed outdated TODO comments from parquetdb.py to streamline the codebase.
- Refactored update handling in ParquetDB.
- Updated _version.py to align with the new release.
Published by lllangWV about 1 year ago
ParquetDB - v0.13.0
0.13.0 (01-01-2025)
Bugs
- None identified
New Features
- Enhanced the drop method to recreate the dataset directory and initialize an empty table when dropping an existing dataset.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Merged updates from the main branch of the repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.12.0
0.12.0 (01-01-2025)
Bugs
- None identified
New Features
- Developed early exit functionality for handling empty datasets in ParquetDB operations.
- Enhanced table handling and schema management features in ParquetDB.
Documentation updates
- Updated version information and CHANGELOG for the new release.
Maintenance
- Merged changes from the main branch of the remote repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.11.0
0.11.0 (12-30-2024)
Bugs
- None identified
New Features
- None identified
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Refactored schema merging and improved metadata handling in ParquetDB
- Merged changes from the main remote repository
Published by lllangWV about 1 year ago
ParquetDB - v0.10.0
0.10.0 (12-30-2024)
Bugs
- None identified
New Features
- Enhanced metadata handling in ParquetDB
Documentation updates
- Updated version information in _version.py and CHANGELOG.md for the new release
Maintenance
- Improved logging functionality in ParquetDB
Published by lllangWV about 1 year ago
ParquetDB - v0.9.0
0.9.0 (12-18-2024)
Bugs
- None identified
New features
- Enhanced the update functionality to support customizable update keys in ParquetDB.
Documentation updates
- Updated _version.py and CHANGELOG.md to reflect the new release.
Maintenance
- Refactored table update logic and improved schema alignment in ParquetDB.
- Merged updates from the main branch.
Published by lllangWV about 1 year ago
ParquetDB - v0.8.0
0.8.0 (12-18-2024)
Bugs
- None identified
New features
- Refactored handling of nested dataset directories in ParquetDB.
- Added unit tests for rebuilding nested structures.
Documentation updates
- Updated version information in _version.py and CHANGELOG.md for the new release.
Maintenance
- Merged updates from the main branch of the upstream repository.
Published by lllangWV about 1 year ago
ParquetDB - v0.7.0
0.7.0 (12-18-2024)
Bugs
- None identified
New features
- None identified
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Refactored ParquetDB to ensure consistent use of db_path for directory handling
- Merged latest changes from the main branch of the repository
Published by lllangWV about 1 year ago
ParquetDB - v0.6.0
0.6.0 (12-15-2024)
Bugs
- None identified
New features
- Enhanced ParquetDB with improved logging and support for shape tensors.
- Implemented sort_fields method in ParquetDB with unit tests.
- Implemented rename_fields method in ParquetDB with unit tests.
- Added a method to set field metadata.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Removed outdated TODO comments in parquetdb.py.
- Refactored ParquetDB initialization and directory handling.
Published by lllangWV about 1 year ago
ParquetDB - v0.5.14
0.5.14 (12-12-2024)
Bugs
- Improved handling for nested dictionaries within lists to prevent errors with empty nested lists containing structs.
New Features
- Introduced keyword arguments to enhance control over preprocessing of incoming data, allowing automatic conversion of lists of floats or ndarray-like structures to fixed shape tensors.
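A column can be converted to a fixed-shape tensor only when every row has the same regular shape. The stdlib sketch below shows one way to check that. infer_shape and common_shape are hypothetical helpers, not the keyword arguments this release refers to. In PyArrow, the resulting extension type is created with pyarrow.fixed_shape_tensor(value_type, shape).

```python
from typing import Any, Optional, Sequence, Tuple


def infer_shape(value: Any) -> Optional[Tuple[int, ...]]:
    """Return the shape of a regular nested list of numbers, or None if it is ragged."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return ()  # a scalar has an empty shape
    if not isinstance(value, (list, tuple)) or len(value) == 0:
        return None
    child_shapes = {infer_shape(item) for item in value}
    if len(child_shapes) != 1 or None in child_shapes:
        return None  # children disagree (ragged) or contain non-numeric items
    (child,) = child_shapes
    return (len(value),) + child


def common_shape(column: Sequence[Any]) -> Optional[Tuple[int, ...]]:
    """Shape shared by every row, or None if rows differ or are plain scalars."""
    shapes = {infer_shape(row) for row in column}
    if len(shapes) != 1:
        return None
    shape = shapes.pop()
    # None means ragged/non-numeric; () means scalars, which need no tensor type.
    return shape if shape else None


print(common_shape([[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]))  # (2, 2)
print(common_shape([[1.0, 2.0], [3.0]]))                                   # None
```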
Documentation updates
- Updated _version.py and CHANGELOG.md in preparation for a new release.
Maintenance
- Unified schema handling in pa.unify_schema to accommodate special cases.
Published by lllangWV about 1 year ago
ParquetDB - v0.5.13
0.5.13 (12-02-2024)
Bugs
- None identified
New Features
- Added additional authors to pyproject.toml
- Introduced new plotting scripts for benchmarks
- Added utility functions for increased functionality
- Included example for importing the Jarvis Alexandria 2D dataset
- Added new example for importing the Jarvis DFT 3D dataset
- Improved normalization process in benchmarks
Documentation updates
- Updated docstrings for clarity
- Revised the first example to enhance understanding
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Merged changes from the main branch of the ParquetDB repository
- Removed unnecessary logging level modifications
- Added new examples to demonstrate functionality
Published by lllangWV about 1 year ago
ParquetDB - v0.5.12
0.5.12 (10-25-2024)
Bugs
- Removed an unnecessary deletion that did not affect the outer-scope variable
- Removed ParquetDb_manager due to an issue
New features
- None identified
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Merged updates from the main branch of the repository
Published by lllangWV over 1 year ago
ParquetDB - 0.5.11
0.5.11 (10-25-2024)
Bugs
- None identified
New Features
- Introduced data classes for handling normalization and loading configurations
Documentation updates
- Updated _version.py and CHANGELOG.md for the latest release
Maintenance
- Reformatted code
- Removed the ParquetDBManager class
- Merged updates from the main branch of the repository https://github.com/lllangWV/ParquetDB
Published by lllangWV over 1 year ago
ParquetDB - v0.5.10
0.5.10 (10-25-2024)
Bugs
- Fixed a bug where some FixedListArrays were null.
- Resolved an issue where some rows were null in the method that enforces numeric and boolean list types.
New Features
- Introduced a new method for preprocessing incoming tables.
- Modified create and update methods to apply table_column_callbacks, allowing users to reconstruct ndarrays more easily.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
- Enhanced comments for better code readability.
Maintenance
- Updated tests to ensure functionality.
- Moved data generation logic to general_utils.
- Cleaned up the codebase for improved quality.
Published by lllangWV over 1 year ago
ParquetDB - v0.4.1
0.4.1 (10-23-2024)
Bugs
- Fixed an issue with normalization when handling batch updates.
- Updated the method to correctly handle updates to list fields.
New Features
- Reworked the modification process to keep main files untouched, generating and renaming new files as needed.
Documentation
- Enhanced documentation for the project.
Maintenance
- Updated _version.py and CHANGELOG.md for new releases.
- Updated example files.
- Updated config.yml.
- Updated .gitignore.
- Merged the latest updates from the main branch.
Published by lllangWV over 1 year ago
ParquetDB - v0.3.1
0.3.1 (10-16-2024)
Bugs
- Fixed bug in batch updates resulting in incorrect chunked arrays from columns. Updated record batches to ensure casting to the incoming schema is applied correctly.
New Features
- Introduced capability to delete columns.
- Added new data generation methods.
- Implemented new benchmarks for performance evaluation.
- Added utility functions for matplotlib.
- Included benchmark scripts for SQLite, MongoDB, and ParquetDB.
- Enabled rebuilding of nested tables from a flattened structure in the read method.
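Rebuilding nested tables from a flattened layout is essentially the inverse of flattening. The sketch below works on a single row stored as a dict. It assumes "." separates parent and child names, which is the convention pyarrow.Table.flatten uses for struct columns. unflatten is a hypothetical helper, not ParquetDB's read-path code. It also assumes no name is used both as a leaf and as a parent.

```python
from typing import Any, Dict


def unflatten(row: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Rebuild nested dicts from flat keys such as 'structure.lattice.a'."""
    nested: Dict[str, Any] = {}
    for flat_key, value in row.items():
        *parents, leaf = flat_key.split(sep)
        node = nested
        for part in parents:
            # Walk (creating as needed) one dict level per parent name.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {
    "id": 7,
    "structure.lattice.a": 3.61,
    "structure.lattice.b": 3.61,
    "structure.species": ["Cu"],
}
print(unflatten(flat))
# {'id': 7, 'structure': {'lattice': {'a': 3.61, 'b': 3.61}, 'species': ['Cu']}}
```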
Documentation Updates
- Updated the README to include a section for benchmark overview.
- Improved HTML for embedding PDFs in the README.
- Updated CHANGELOG.md and _version.py for the new release.
Maintenance
- Corrected default normalize_kwargs.
- Optimized the update table method for a 5x speed increase.
- Moved default normalization parameters to config.yml.
- Renamed directory to benchmarks and changed PDF files to PNG format.
- Updated development dependencies.
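Normalization, as used in these notes, means rewriting a dataset so that rows are spread evenly across files. The planning step below is a hypothetical sketch and not ParquetDB's algorithm. It picks the fewest files that respect a per-file cap, then balances the row counts. In practice, the rewrite itself could be delegated to pyarrow.dataset.write_dataset, whose max_rows_per_file and max_rows_per_group options cap file and row-group sizes.

```python
import math
from typing import List


def plan_file_sizes(total_rows: int, max_rows_per_file: int) -> List[int]:
    """Split total_rows into the fewest files allowed by max_rows_per_file,
    keeping per-file row counts within one row of each other."""
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")
    if total_rows == 0:
        return []
    n_files = math.ceil(total_rows / max_rows_per_file)
    base, extra = divmod(total_rows, n_files)
    # The first `extra` files take one additional row each.
    return [base + 1 if i < extra else base for i in range(n_files)]


# Illustrative, unbalanced starting point: four files of very different sizes.
current_file_rows = [10, 9000, 1000, 50]
plan = plan_file_sizes(sum(current_file_rows), max_rows_per_file=5000)
print(plan)  # [3354, 3353, 3353]
```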
Published by lllangWV over 1 year ago
ParquetDB - v0.2.7
0.2.7 (10-11-2024)
Bugs
- None identified
New features
- None identified
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Improved workflow scripts
- Merged latest changes from the main branch of the repository
Published by lllangWV over 1 year ago
ParquetDB - v0.2.6
0.2.6 (10-11-2024)
Bugs
- None identified
New Features
- Enhanced the logging mechanism in tests with separate loggers.
- Improved methods for creating, updating, and deleting schemas to support batch operations.
Documentation updates
- Updated example for clarity.
Maintenance
- Merged changes from the main branch of the repository.
- Removed unnecessary development scripts.
- Deleted obsolete file.
- Excluded dev_scripts from .gitignore.
- Updated _version.py and CHANGELOG.md for the new release.
Published by lllangWV over 1 year ago
ParquetDB - v0.2.5
0.2.5 (10-11-2024)
Bugs
- None identified
New features
- Introduced a new storage method that flattens all nested structures into a single table and sorts columns alphabetically, enhancing performance.
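The flattened storage layout described above can be illustrated at the dictionary level. The "." separator mirrors the column names that pyarrow.Table.flatten produces for struct columns. flatten and flatten_sorted are hypothetical helpers for this sketch, not ParquetDB's implementation.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, sep, name))
        else:
            # Leaves, lists, and empty dicts are stored as-is under the dotted name.
            flat[name] = value
    return flat


def flatten_sorted(record: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten, then order the columns alphabetically."""
    flat = flatten(record)
    return {name: flat[name] for name in sorted(flat)}


row = {"name": "Cu", "structure": {"species": ["Cu"], "lattice": {"b": 3.61, "a": 3.61}}}
flat_row = flatten_sorted(row)
print(list(flat_row))
# ['name', 'structure.lattice.a', 'structure.lattice.b', 'structure.species']
```

Sorting the columns gives every flattened table the same column order regardless of insertion order, which simplifies comparing and merging schemas.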
Documentation updates
- Updated _version.py and CHANGELOG.md to reflect the new release.
Maintenance
- Improved .gitignore to exclude dev_scripts.
- Refined tests.
- Revised example script for clarity.
Published by lllangWV over 1 year ago
ParquetDB - v0.2.4
0.2.4 (10-11-2024)
Bugs
- None identified
New Features
- Added __version__ import to the Parquet module
Documentation
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Merged changes from the main branch of the repository
Published by lllangWV over 1 year ago
ParquetDB - v0.2.3
0.2.3 (10-11-2024)
Bugs
- Fixed a bug in external_utils that prevented automatic creation of source and destination directories.
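The standard Python idiom for creating missing directories is `os.makedirs` with `exist_ok=True`, which creates any missing parents and does nothing if the directory already exists. The sketch below only illustrates that idiom; `ensure_dirs` is a hypothetical helper, not the actual external_utils code.

```python
import os
import tempfile


def ensure_dirs(*paths):
    """Create each directory (and any missing parents); existing ones are left untouched."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "data", "source")
    dest = os.path.join(root, "data", "dest", "nested")
    ensure_dirs(src, dest)
    ensure_dirs(src, dest)  # calling again is safe because exist_ok=True
    print(os.path.isdir(src), os.path.isdir(dest))  # True True
```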
New features
- Added a development script to handle ordering of nested fields.
- Updated dependencies to include 'requests'.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Improved the workflow script to include test building of the package before pushing the version and changelog.
- Updated the workflow script for better branch management after pushing changes.
Published by lllangWV over 1 year ago
ParquetDB - v0.2.2
0.2.2 (10-10-2024)
Bugs
- Fixed a bug related to the old method of aligning the table with the new schema, resulting in improved performance.
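Aligning existing data with an evolved schema typically means adding the columns the new schema introduces (filled with nulls) and reordering columns to match. A minimal dict-based sketch, assuming a schema is just an ordered list of field names; `align_to_schema` is hypothetical and does not reproduce either ParquetDB's old or new alignment method.

```python
def align_to_schema(rows, schema_fields):
    """Return rows reordered to schema_fields, filling fields absent from a row with None."""
    aligned = []
    for row in rows:
        unknown = set(row) - set(schema_fields)
        if unknown:
            # Reject rows carrying fields the target schema does not know about.
            raise KeyError(f"fields not in schema: {sorted(unknown)}")
        aligned.append({field: row.get(field) for field in schema_fields})
    return aligned


rows = [{"id": 1, "energy": -3.2}, {"energy": -1.0, "id": 2, "volume": 10.5}]
print(align_to_schema(rows, ["id", "energy", "volume"]))
# [{'id': 1, 'energy': -3.2, 'volume': None}, {'id': 2, 'energy': -1.0, 'volume': 10.5}]
```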
New features
- Updated example functionality to enhance usability.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
Maintenance
- Merged changes from the main branch to ensure project stays up-to-date.
Published by lllangWV over 1 year ago
ParquetDB - v0.2.1
0.2.1 (10-10-2024)
Bugs
- Fixed issue with database row order inconsistency in tests.
- Fixed an incorrect input passed during schema alignment in the create statement.
New Features
- Introduced a new utility function for table manipulation and empty table generation.
- Added methods for merging schemas with built-in functions.
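PyArrow provides `pyarrow.unify_schemas` for merging Arrow schemas by field name; the notes do not say whether these ParquetDB methods rely on it. The stdlib sketch below shows the same idea with schemas represented as hypothetical name-to-type-string dicts: new fields are appended in first-seen order, and a field that appears with two different types is treated as an error.

```python
def merge_schemas(*schemas):
    """Merge name->type mappings; later schemas add new fields, conflicting types raise."""
    merged = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype:
                raise TypeError(f"field {name!r}: {merged[name]} vs {dtype}")
            merged.setdefault(name, dtype)
    return merged


old = {"id": "int64", "energy": "double"}
new = {"id": "int64", "volume": "double"}
print(merge_schemas(old, new))
# {'id': 'int64', 'energy': 'double', 'volume': 'double'}
```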
Documentation updates
- Enhanced documentation with detailed docstrings for several functions.
- Updated the README with relevant information.
Maintenance
- Refactored multiple methods for optimization, including create, update, and delete.
- Improved developer scripts and added new scripts for schema merging.
- Updated .gitignore and revision files like _version.py and CHANGELOG.md.
Published by lllangWV over 1 year ago
ParquetDB - v0.1.2
This is a major release with backward-incompatible changes. The ParquetDB class has been renamed to ParquetDBManager, and ParquetDatasetDB has been renamed to ParquetDB. These changes were made to better align with the intended functionality of the classes. The newly named ParquetDB serves as the core class, managing a single dataset, while the newly named ParquetDBManager wraps around ParquetDB to manage multiple independent datasets.
0.1.2 (10-08-2024)
Bugs
- None identified
New Features
- Introduced a new example demonstrating the use of ParquetDatasetDB with 4 million structures and highlighted some reading capabilities.
- Added an option to create normalization methods that optimize performance by ensuring a consistent number of rows in dataset files.
- Implemented a script for running all tests in the test directory.
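Normalizing a dataset so each file holds a consistent number of rows amounts to re-chunking all rows into fixed-size batches. This sketch represents files as plain lists of rows; `normalize_rows` and `max_rows_per_file` are hypothetical names, since these notes do not describe the actual option's parameters.

```python
def normalize_rows(files, max_rows_per_file):
    """Regroup rows from uneven files into files of at most max_rows_per_file rows, preserving order."""
    if max_rows_per_file < 1:
        raise ValueError("max_rows_per_file must be >= 1")
    all_rows = [row for file_rows in files for row in file_rows]
    return [all_rows[i:i + max_rows_per_file] for i in range(0, len(all_rows), max_rows_per_file)]


uneven = [[1, 2, 3, 4, 5], [6], [7, 8]]
print([len(f) for f in normalize_rows(uneven, 3)])
# [3, 3, 2]
```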
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release.
- Consolidated logging and configuration management into a single config object.
Maintenance
- Added BeautifulSoup as a dependency for examples.
- Moved old examples to the dev_scripts/examples directory.
- Rearranged and improved the structure of dev_scripts.
- Updated changelog script and configuration for the timing logger.
- Merged updates from the main branch of the repository.
Published by lllangWV over 1 year ago
ParquetDB - v0.0.8
0.0.8 (10-08-2024)
Bugs
- The read method now always returns an empty table or batch generator when filtering or column selection fails.
- Removed a print statement.
New Features
- Added a config class. After importing parquetdb, users can change settings, e.g. parquetdb.config.rootdir='path/to/dir', or adjust logging with parquetdb.logging_config.loggers.parquetdb.level='Debug' followed by parquetdb.logging_config.apply().
- Added new dev scripts.
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release
Maintenance
- Removed logging from tests
- Merge branch 'main' of https://github.com/lllangWV/ParquetDB into main
Published by lllangWV over 1 year ago
ParquetDB - v0.0.7
This release improves the package's directory structure, refines the Conda .yml files, and adds better support for nested structures.
0.0.7 (10-07-2024)
Bugs
- Improved support for nested struct types to prevent issues with empty dictionaries.
- Enhanced the merge_tables function and added more utility functions for manipulating pa.struct types.
New Features
- [No changes]
Documentation updates
- Updated README.md for clarity.
- Changed table_name to dataset_name in the README.md.
- Made deployment workflow agnostic to repository and package name.
Maintenance
- Updated env_dev.yml and env.yml.
- Updated package directory structure by moving parquetdb and parquet_datasetdb to core and creating a utils folder with general_utils.
- Updated _version.py and CHANGELOG.md for the new release.
Published by lllangWV over 1 year ago
ParquetDB - v0.0.6
ParquetDB v0.0.6 - Bug Fixes
What's New:
This release includes critical bug fixes to improve the stability of the GitHub workflow and environment variable management.
Bug Fixes:
- Workflow Commit Issue: Fixed an issue where the workflow was pulling the wrong git commits for the changelog, ensuring accurate commit logs in future releases.
- Environment Variable Issue: Resolved a bug where the GITHUB_TOKEN and repo_name were not passed as environment variables during workflow execution. This fix ensures that the workflow runs as expected with proper authentication and repository details.
0.0.6 (10-03-2024)
Bugs
- Fixed the GitHub token and repo_name not being passed as environment variables
- Fixed the workflow pulling the wrong git commits
New Features
- None identified
Documentation updates
- Updated _version.py and CHANGELOG.md for the new release (noted twice)
Maintenance
- Merge branch 'main' of https://github.com/lllangWV/ParquetDB into main (noted twice)
Published by lllangWV over 1 year ago
ParquetDB - v0.0.2
ParquetDB v0.0.2
We’re excited to announce the second release of ParquetDB! This update introduces several key improvements:
- New Class for Single Parquet Dataset Management: A new class has been added to manage a single Parquet dataset as a database.
- Expanded Test Coverage: Additional tests have been implemented to ensure reliability and performance.
- Automated Deployment Workflow: A workflow has been integrated to automatically deploy the package to PyPI upon each GitHub release. It also synchronizes the package version with the corresponding release tag.
- CHANGELOG.md Introduced: This release introduces a CHANGELOG.md, which will provide a concise summary of all changes made since the previous release.
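Synchronizing the package version with the release tag usually comes down to deriving a version string from the tag name and writing it into the version file. A hypothetical sketch, assuming tags of the form vX.Y.Z and a _version.py that defines __version__; `version_from_tag` and `version_file_contents` are illustrative names, and the actual workflow's logic is not shown in these notes.

```python
import re

# Assumed tag format: optional leading "v", then MAJOR.MINOR.PATCH.
TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def version_from_tag(tag):
    """Turn a release tag such as 'v0.0.2' into the version string '0.0.2'."""
    match = TAG_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return ".".join(match.groups())


def version_file_contents(tag):
    """Render the contents of a _version.py file for the given tag."""
    return f'__version__ = "{version_from_tag(tag)}"\n'


print(version_file_contents("v0.0.2"), end="")
# __version__ = "0.0.2"
```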
0.0.2 (10-03-2024)
Bugs
- None identified
New features
- None identified
Documentation updates
- None identified
Maintenance
- No changes
Published by lllangWV over 1 year ago