Recent Releases of https://github.com/dathere/qsv

https://github.com/dathere/qsv - 7.1.0

🇮🇹 csv,conf,v9 edition 🍝

  |   :----|:---- |csvconfv9-flavor-small|Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN!

For this feature release, we polished describegpt a bit more for the occassion...

Towards the "People's API!"! Verso l'API del Popolo!
(Answering People/Policymaker Interface)|


🚀 Enhanced describegpt Command

  • Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
  • Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
  • Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
  • Conditional SQL Results: Implement conditional --sql-results format for more efficient "SQL RAG" processing - i.e. if the generated SQL query executes successfully - the results are saved to the specified file with a .csv extension. If a "SQL hallucination" fails, the file is saved with a .sql extension instead for the user to tweak and edit.
  • TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
  • Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
  • Disk Cache by Default: The disk cache is now enabled by default for better performance
  • TOML Configuration: Migrate from JSON to more readable TOML format for more easily modifiable prompt files. (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
  • Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)

partition Command Enhancements

  • New --limit Option: Implement --limit option to set the maximum number of open files
  • Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets

Added

  • describegpt: add configurable frequency limit https://github.com/dathere/qsv/pull/2950
  • describegpt: migrate prompt file from JSON to more easier to edit TOML format https://github.com/dathere/qsv/pull/2954
  • describegpt: refactor default prompt file; add --fewshot-examples option https://github.com/dathere/qsv/pull/2955
  • describegpt: add TogetherAI support for models endpoint https://github.com/dathere/qsv/pull/2965
  • partition: add --limit option https://github.com/dathere/qsv/pull/2960
  • added Windows ARM64 prebuilt binaries

Changed

  • describegpt: enable disk cache by default https://github.com/dathere/qsv/pull/2951
  • describegpt: Polars SQL generation tweaks https://github.com/dathere/qsv/pull/2958
  • python: replace deprecated with_gil with attach https://github.com/dathere/qsv/pull/2949. This sets the stage for "free-threaded" Python 3.14 support when its released in October 2025. Buh-bye GIL!
  • deps: bump embedded Luau from 0.688 to 0.690 https://github.com/dathere/qsv/pull/2967
  • deps: bump Polars to 0.50.0 at py-1.33.0 tag
  • build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2962
  • build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2963
  • build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2961
  • build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2948
  • build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2946
  • build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2956
  • build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2952
  • applied select clippy lints
  • updated indirect dependencies

Full Changelog: https://github.com/dathere/qsv/compare/7.0.1...7.1.0

- Rust
Published by jqnatividad 6 months ago

https://github.com/dathere/qsv - 7.0.1

[7.0.1] - 2025-08-28

A patch release with some minor bug fixes, benchmark tweaks and build system improvements.

Added

  • publish: add dedicated powerpc64le-unknown-linux-gnu publishing workflow (WIP)

Changed

  • docs: describegpt expanded error message about LLM URL or API key
  • deps: remove planus pinned dependency

Fixed

  • fix: geocode --batch 0 causes panic when polars feature is enabled
  • publish: remove luau feature from x86_64-pc-windows builds that was causing builds to fail
  • publish: remove powerpc64le from main publish workflow
  • benchmarks: updated to v6.8.0 with fixes to luau and clustered sample benchmarks

Full Changelog: https://github.com/dathere/qsv/compare/7.0.0...7.0.1

- Rust
Published by jqnatividad 6 months ago

https://github.com/dathere/qsv - 7.0.0

[7.0.0] - 2025-08-28

🥳 Open Weights with Open Data, Local LLM 🤖 edition 🚀

This is the biggest release yet - 470+ commits since v6.0.1! Packed with new AI-powered features, fixes and significant performance improvements suite-wide!

With the release of OpenAI's gpt-oss open-weight reasoning model earlier this month setting the stage, we continue on our "Automagical Metadata" journey by revamping describegpt.

🤖 Revamped describegpt - AI-Powered Metadata Inferencing and Data Analysis: - Intelligent Metadata Generation: Automatically generate comprehensive metadata - Data Dictionaries, Description and Tags for your Datasets using Large Language Models (LLM) prompted with summary statistics and frequency tables as detailed context - without sending your data to the cloud! - Chat with your Data: If your prompt can be answered using this high-quality, high-resolution Metadata, describegpt can answer it! If your prompt is not remotely related to the data, it will politely refuse - "I'm sorry, I can only answer questions about the Dataset." - Auto SQL RAG Mode: Should the LLM decide that it doesn't have the necessary information in the metadata it compiled to answer your prompt, it will automatically enter SQL Retrieval-Augmented Generation (RAG) mode - using the rich metadata instead as context to craft an expert-level, deterministic, reproducible, "hallucination-free" SQL query[^1] to respond to your prompt. - Database Engine Support: If DuckDB is installed or the Polars feature is enabled, and the --sql-results <ANSWER.CSV> is specified, an optimized SQL query will be automatically executed and the query results are saved to the specified file. As both are purpose-built OLAP engines that support direct queries (no database pre-loading required), you get answers in a few seconds[^2] - even for very large datasets. - Multi-LLM Support: Works with any OpenAI-API compatible LLM - with special support for local LLMs like Ollama, Jan and LM Studio and the ability to customize model behavior with the --addl-props option. - Advanced Caching: Disk and Redis caching support for performance and cost optimization. - Flexible Prompting: Custom prompt files and built-in intelligent templates for various analysis tasks.

Check out these examples using a 1 million row sample of NYC's 311 data! - --all option produces a Data Dictionary, Description and Tags - Markdown, JSON - --prompt "What are the top 10 complaint types per community board and borough?" - SQL result

On top of other improvements in Datapusher+ with its new Jinja-based "metadata suggestion engine" - we're using this AI-inferred metadata along with other precalcs to prepopulate DCATv3 (both US and European profiles) and Croissant metadata fields that are otherwise too hard and expensive to compile manually.

The inferred and precalculated metadata values are offered as "suggestions", using a UI/UX purpose-built to facilitate interactive metadata curation chats.

This allows Data Stewards to compile high-quality, high-resolution metadata catalogs with an accelerated "Data Steward in the Loop" data ingestion and metadata curation workflow.

If you want to see and learn more, we're Bologna-bound to attend csv,conf,v9 to present and share how we're using this to auto-infer metadata in CKAN.

Hope to see you there!

📊 Enhanced frequency Command: - Rank Column: Ranking of frequency results for better data insights - JSON Output Mode: New --json option not only provides structured output beyond the default CSV format - it also takes advantage of JSON's nested support to include 15 additional summary statistics per field - Performance Boost: Speed improvements with SIMD-accelerated number parsing, remaining performant even with the added functionality

stats Command Improvements: - Faster Still: Enabled by improvements in the underlying qsv-stats crate - Improved Precision: Faster, streamlined precision calculation - SIMD Number Parsing: Hardware-accelerated parsing for int/float values - Unix Epoch Support: Proper handling of Unix timestamp 0 as valid date - Enhanced Date Inference: Better date and boolean type inference capabilities

🔧 validate & schema Enhancements: - Fancy Regex Support: You can now use "advanced" regex features with your JSON Schema patterns with the --fancy-regex option. Previously, you can only use the standard Rust regex engine which does not support backreferences or look-arounds (for performance reasons) - JSON Schema Improvements: Better error handling and format validation options - Schema Validation Refinements: More granular validation control with --no-format-validation

🔄 rename Reverted and Improved:
When pairwise renaming was introduced in v6.0.0, it broke some some workflows. It's now fixed by introducing two modes: - Positional Mode: Renaming by position is now once again the default - Pairwise Mode: New --pairwise flag for column renaming by column pairs

🗂️ partition Improvements: - Case-Insensitive Safety: Improved case-aware partitioning algorithm. Previously, case insensitive file systems like macOS APFS and Windows NTFS was causing incorrect partitioning of case-sensitive values - Faster still: With better use of I/O bufferring - with deferred, batched, async writes instead of after every record

[^1]: LLMs can still hallucinate a syntactically wrong SQL query. But once a valid SQL query is produced, its fully reproducible.

[^2]: Depending on your LLM setup, SQL query generation may take some time, but once generated, the SQL query itself is blazing-fast.

Added

  • frequency add rank info to frequency table https://github.com/dathere/qsv/pull/2878
  • frequency add --json output option https://github.com/dathere/qsv/pull/2868
  • validate add --fancy-regex option https://github.com/dathere/qsv/pull/2845
  • add CPU-accelerated, mem-mapped, chunked sha256 file checksum helper https://github.com/dathere/qsv/pull/2909

Changed

  • apply use SIMD-accelerated base64-simd crate for Encode64 and Decode64 operations https://github.com/dathere/qsv/pull/2863
  • stats faster precision calculation https://github.com/dathere/qsv/pull/2852
  • perf: Use simdjson instead of serdejson to serialize to JSON https://github.com/dathere/qsv/pull/2884
  • refactor: create and use reqwest client helpers to eliminate redundant code https://github.com/dathere/qsv/pull/2888
  • perf: Faster parallelized sha256 hash file https://github.com/dathere/qsv/pull/2918
  • refactor: describegpt https://github.com/dathere/qsv/pull/2890
  • refactor: describegpt setting --timeout to 0 sets no timeout https://github.com/dathere/qsv/pull/2891
  • refactor: describegpt more refinements https://github.com/dathere/qsv/pull/2892
  • feat: describegpt refactor round3 https://github.com/dathere/qsv/pull/2893
  • feat: describegpt disk & redis caching https://github.com/dathere/qsv/pull/2895
  • refactor: describegpt https://github.com/dathere/qsv/pull/2896
  • refactor: describegpt create get_cache_key helper; customizable stats options https://github.com/dathere/qsv/pull/2902
  • feat: describegpt auto SQL RAG for --prompt https://github.com/dathere/qsv/pull/2904
  • feat: describegpt major refactor https://github.com/dathere/qsv/pull/2913
  • refactor: describegpt default promptfile is now embedded in qsv binary; fine-tune tests https://github.com/dathere/qsv/pull/2924
  • feat: describegpt returning reasoning with --json option https://github.com/dathere/qsv/pull/2926
  • feat: describegpt add DuckDB support in SQL RAG mode https://github.com/dathere/qsv/pull/2929
  • feat: describegpt various DuckDB improvements https://github.com/dathere/qsv/pull/2936
  • refactor: describegpt improved cache miss handling https://github.com/dathere/qsv/pull/2938
  • feat: describegpt --addl-props is now part of cachekey https://github.com/dathere/qsv/pull/2939
  • deps: bump cached to 0.56 and remove our patched fork https://github.com/dathere/qsv/pull/2853
  • deps: bump polars from 0.49 to 0.50 https://github.com/dathere/qsv/pull/2869
  • deps: bump polars to 0.50.0 at the py-1.32.2 tag https://github.com/dathere/qsv/pull/2877
  • deps: bump polars to 0.50.0 at py-1.32.3 tag https://github.com/dathere/qsv/pull/2889
  • build(deps): bump actions/checkout from 4 to 5 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2886
  • build(deps): bump arboard from 3.6.0 to 3.6.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2920
  • build(deps): bump base62 from 2.2.1 to 2.2.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2937
  • build(deps): bump bytemuck from 1.23.1 to 1.23.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2876
  • build(deps): bump calamine from 0.29.0 to 0.30.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2872
  • build(deps): bump criterion from 0.6.0 to 0.7.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2855
  • build(deps): bump dns-lookup from 2.1.0 to 3.0.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2915
  • build(deps): bump dynfmt2 from 0.2.0 to 0.3.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2850
  • build(deps): bump foldhash from 0.1.5 to 0.2.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2922
  • build(deps): bump file-format from 0.27.0 to 0.28.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2873
  • build(deps): bump filetime from 0.2.25 to 0.2.26 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2906
  • build(deps): bump governor from 0.10.0 to 0.10.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2871
  • build(deps): bump hashbrown from 0.15.4 to 0.15.5 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2874
  • build(deps): bump indexmap from 2.10.0 to 2.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2917
  • build(deps): bump jsonschema from 0.32.1 to 0.33.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2928
  • build(deps): bump libc from 0.2.174 to 0.2.175 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2882
  • build(deps): bump memmap2 from 0.9.7 to 0.9.8 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2914
  • build(deps): bump mimalloc from 0.1.47 to 0.1.48 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2935
  • build(deps): bump minijinja-contrib from 2.11.0 to 2.12.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2923
  • deps: bump mlua from 0.10.5 to 0.11.1 - upgrading Luau from 0.663 to 0.682 https://github.com/dathere/qsv/pull/2842
  • build(deps): bump mlua from 0.11.1 to 0.11.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2879
  • build(deps): bump phf from 0.12.1 to 0.13.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2921
  • build(deps): bump qsv-stats from 0.36.0 to 0.37.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2856
  • build(deps): bump rand from 0.9.1 to 0.9.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2851
  • build(deps): bump rayon from 1.10.0 to 1.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2887
  • build(deps): bump redis from 0.32.4 to 0.32.5 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2880
  • build(deps): bump regex from 1.11.1 to 1.11.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2925
  • build(deps): bump reqwest from 0.12.22 to 0.12.23 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2885
  • build(deps): bump serde_json from 1.0.140 to 1.0.141 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2847
  • build(deps): bump serde_json from 1.0.141 to 1.0.142 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2865
  • build(deps): bump serde_json from 1.0.142 to 1.0.143 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2898
  • build(deps): bump strum from 0.27.1 to 0.27.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2848
  • build(deps): bump strum_macros from 0.27.1 to 0.27.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2849
  • build(deps): bump sysinfo from 0.36.0 to 0.36.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2846
  • build(deps): bump sysinfo from 0.36.1 to 0.37.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2881
  • build(deps): bump tempfile from 3.20.0 to 3.21.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2900
  • build(deps): bump tokio from 1.46.1 to 1.47.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2857
  • build(deps): bump tokio from 1.47.0 to 1.47.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2866
  • build(deps): bump uuid from 1.17.0 to 1.18.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2883
  • build(deps): bump url from 2.5.4 to 2.5.6 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2912
  • build(deps): bump url from 2.5.6 to 2.5.7 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2919
  • build(deps): bump zip from 4.3.0 to 4.5.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2911
  • applied select clippy suggestions
  • updated indirect dependencies
  • bumped MSRV to Rust 1.89

Fixed

  • fix: json more robust error-handling of invalid JSON input; https://github.com/dathere/qsv/pull/2844
  • fix: template fix stdin regression https://github.com/dathere/qsv/pull/2907
  • fix:rename add --positional option https://github.com/dathere/qsv/pull/2930
  • fix: rename the real fix - positional is now the default and pairwise is the option https://github.com/dathere/qsv/pull/2931
  • fix: partition case insensitive filesystems https://github.com/dathere/qsv/pull/2934
  • docs: fix inconsistent formatting in command help examples by @abobov in https://github.com/dathere/qsv/pull/2862

New Contributors

  • @abobov made their first contribution in https://github.com/dathere/qsv/pull/2862

Full Changelog: https://github.com/dathere/qsv/compare/6.0.1...7.0.0

- Rust
Published by jqnatividad 6 months ago

https://github.com/dathere/qsv - 6.0.1

[6.0.1] - 2025-07-12

This is a patch release with bug fixes and minor improvements.


Changed

  • feat: updated completions for qsv v6.0.0 by @rzmk in #2838
  • docs: updated sample schema.json based on NYC311 1M row sample benchmark data
  • docs: updated sample stats output using NYC 311 1M row sample benchmark data
  • build(deps): bump chrono-tz from 0.10.3 to 0.10.4 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2839
  • build(deps): bump qsv-stats from 0.35.0 to 0.36.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2840
  • bumped indirect dependencies
  • Added benchmark_data.* to .gitignore

Fixed

  • geocode: make --batch=0 mode more robust by setting a minimum batch size of 1,000 rows https://github.com/dathere/qsv/commit/2fa90bcc7df57a338a4851bafb361e886cea97c5
  • jsonl: correct batchsize calculation to use input file instead of output file for line counting https://github.com/dathere/qsv/commit/742dc777a3d2d2f3d70e72078d69cfdc39c04b4b
  • benchmarks: fixed benchmarks with unescaped parameters with embedded spaces https://github.com/dathere/qsv/commit/ad95596b8400154b50042e2cb8352900d0198904

Removed

  • Removed retired publishing workflows (linux-glibc-231-musl-123 and wix-installer)

Full Changelog: https://github.com/dathere/qsv/compare/6.0.0...6.0.1

- Rust
Published by jqnatividad 8 months ago

https://github.com/dathere/qsv - 6.0.0

What's Changed

  • build(deps): bump libc from 0.2.173 to 0.2.174 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2794
  • feat: schema JSON schema description property set to cmdline used to generate the JSON schema by @jqnatividad in https://github.com/dathere/qsv/pull/2796
  • build(deps): bump phf from 0.11.3 to 0.12.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2797
  • deps: bump polars from 0.48 to 0.49 by @jqnatividad in https://github.com/dathere/qsv/pull/2798
  • sqlp & joinp: --decimal-comma option is not only for parsing input CSVs, it's also used when writing output CSVs by @jqnatividad in https://github.com/dathere/qsv/pull/2800
  • build(deps): bump flexi_logger from 0.31.0 to 0.31.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2801
  • build(deps): bump zip from 4.1.0 to 4.2.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2802
  • feat: validate improved JSON Schema schema validation by @jqnatividad in https://github.com/dathere/qsv/pull/2803
  • feat: update completions for qsv v5.1.0 by @rzmk in https://github.com/dathere/qsv/pull/2804
  • feat: lens add --wrap-mode option by @jqnatividad in https://github.com/dathere/qsv/pull/2805
  • feat: rename pair-based renaming by @jqnatividad in https://github.com/dathere/qsv/pull/2806
  • feat: sort add --natural sort option by @jqnatividad in https://github.com/dathere/qsv/pull/2808
  • build(deps): bump flexi_logger from 0.31.1 to 0.31.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2812
  • build(deps): bump arboard from 3.5.0 to 3.6.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2814
  • build(deps): bump minijinja from 2.10.2 to 2.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2815
  • build(deps): bump minijinja-contrib from 2.10.2 to 2.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2816
  • build(deps): bump reqwest from 0.12.20 to 0.12.21 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2817
  • build(deps): bump indicatif from 0.17.11 to 0.17.12 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2818
  • build(deps): bump tokio from 1.45.1 to 1.46.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2821
  • build(deps): bump reqwest from 0.12.21 to 0.12.22 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2820
  • dep: bump polars to latest upstream - adapt to PlPath api reqt by @jqnatividad in https://github.com/dathere/qsv/pull/2822
  • build(deps): bump qsv-stats from 0.33.0 to 0.34.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2823
  • build(deps): bump tokio from 1.46.0 to 1.46.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2825
  • deps: Remove similar-asserts and go back to std asserts by @jqnatividad in https://github.com/dathere/qsv/pull/2826
  • perf: transpose refactored for perf by @jqnatividad in https://github.com/dathere/qsv/pull/2827
  • build(deps): bump jaq-std from 2.1.1 to 2.1.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2830
  • build(deps): bump jaq-core from 2.2.0 to 2.2.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2831
  • build(deps): bump jaq-json from 1.1.2 to 1.1.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2832
  • build(deps): bump human-panic from 2.0.2 to 2.0.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2833
  • build(deps): bump zip from 4.2.0 to 4.3.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2835
  • build(deps): bump sysinfo from 0.35.2 to 0.36.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2836
  • perf: bump to faster geosuggest to 0.8 by @jqnatividad in https://github.com/dathere/qsv/pull/2837

Full Changelog: https://github.com/dathere/qsv/compare/5.1.0...6.0.0

- Rust
Published by jqnatividad 8 months ago

https://github.com/dathere/qsv - 5.1.0

[5.1.0] - 2025-06-17

Highlights

  • lens is now colorful by default, with a --monochrome option to turn it off:

    qsv lens /tmp/NYC311SR_2010-2020-sample-1M.csv Screenshot 2025-06-17 at 10 02 43 PM

  • lens can now have custom prompts with the --prompt option (with support for ANSI escape codes to format the prompt). Meant to be paired with the --echo-column <colname> option, e.g.:

    qsv lens --prompt $'\033[1;5;31mBlinking red, bold text\033[0m' --echo-column 'Unique Key' \ /tmp/NYC311SR_2010-2020-sample-1M.csv

qsvprompt * the qsv-stats crate - the underlying engine behind the central stats, frequency and "smart" commands, got a lot of love in this release * validate got a tad faster while decreasing its memory footprint. The new --no-format-validation option now also allows you to ignore all JSON Schema "format" keywords (e.g. date, email, url, currency, etc.) when validating CSVs.


Added

  • lens: add --prompt option, add examples to regex-enabled options https://github.com/dathere/qsv/pull/2772
  • lens: add --monochrome option, otherwise, columns displayed in different colors https://github.com/dathere/qsv/pull/2761
  • validate: add --no-format-validation option when in JSON Schema mode https://github.com/dathere/qsv/pull/2762
  • docs: add shell completions badges by @rzmk in https://github.com/dathere/qsv/pull/2760
  • feat: added criterion trim algorithm microbenchmarks https://github.com/dathere/qsv/pull/2789

Changed

  • frequency: performance microoptimizations - use stats cache column cardinality to pre-alloc & size frequency hash tables
  • geocode: refactor regex handling for performance & maintainability
  • json: preserve key order https://github.com/dathere/qsv/pull/2777
  • stats: performance microoptimizations - use unwrap_unchecked() instead of just unwrap() in hot sampling functions
  • validate: major refactoring for added performance/memory efficiency
  • chore: temporarily use qsv-calamine until a new calamine is released https://github.com/dathere/qsv/pull/2790
  • Bump cpc from 1.9 to 2 https://github.com/dathere/qsv/pull/2770
  • deps: bump criterion from 0.5 to 0.6 https://github.com/dathere/qsv/pull/2791
  • deps: use latest csvlens upstream with colorful columnshttps://github.com/dathere/qsv/commit/f2c9322e33a0ac335dafec10a490c871d3de0a6c
  • deps: temporarily use qsv-calamine until a new calamine is released https://github.com/dathere/qsv/pull/2790
  • deps: bump our patched forks of cached, csvs_convert, json-objects-to-csv, jsonschema, localzone, rfd, self_update until PRs are merged or new releases are made
  • deps: bump zip from 3 to 4 in https://github.com/dathere/qsv/commit/75909d2ca8835400bee5a90e18085c370939bb53
  • deps: bump polars to 0.48.1 at 49ce57a revision
  • build(deps): bump atoi_simd from 0.16.0 to 0.16.1 by @dependabot in https://github.com/dathere/qsv/pull/2766
  • build(deps): bump bytemuck from 1.23.0 to 1.23.1 by @dependabot in https://github.com/dathere/qsv/pull/2778
  • build(deps): bump flate2 from 1.1.1 to 1.1.2 by @dependabot in https://github.com/dathere/qsv/pull/2781
  • build(deps): bump flexi_logger from 0.30.1 to 0.30.2 by @dependabot in https://github.com/dathere/qsv/pull/2765
  • build(deps): bump flexi_logger from 0.30.2 to 0.31.0 by @dependabot in https://github.com/dathere/qsv/pull/2793
  • build(deps): bump hashbrown from 0.15.3 to 0.15.4 by @dependabot in https://github.com/dathere/qsv/pull/2779
  • build(deps): bump libc from 0.2.172 to 0.2.173 by @dependabot in https://github.com/dathere/qsv/pull/2787
  • build(deps): bump mimalloc from 0.1.46 to 0.1.47 by @dependabot in https://github.com/dathere/qsv/pull/2792
  • build(deps): bump mlua from 0.10.3 to 0.10.5 by @dependabot in https://github.com/dathere/qsv/pull/2758
  • build(deps): bump num_cpus from 1.16.0 to 1.17.0 by @dependabot in https://github.com/dathere/qsv/pull/2771
  • build(deps): bump parking_lot from 0.12.3 to 0.12.4 by @dependabot in https://github.com/dathere/qsv/pull/2768
  • build(deps): bump pyo3 from 0.25.0 to 0.25.1 by @dependabot in https://github.com/dathere/qsv/pull/2785
  • deps: upgrade qsv-stats from 0.32 to 0.33, which features major memory and performance optimizations behind the stats & frequency commands https://github.com/dathere/qsv/pull/2786
  • deps: bump redis from 0.29.5 to 0.32
  • build(deps): bump reqwest from 0.12.15 to 0.12.16 by @dependabot in https://github.com/dathere/qsv/pull/2764
  • build(deps): bump reqwest from 0.12.16 to 0.12.18 by @dependabot in https://github.com/dathere/qsv/pull/2767
  • build(deps): bump reqwest from 0.12.18 to 0.12.19 by @dependabot in https://github.com/dathere/qsv/pull/2773
  • build(deps): bump reqwest from 0.12.19 to 0.12.20 by @dependabot in https://github.com/dathere/qsv/pull/2782
  • build(deps): bump rust_decimal from 1.37.1 to 1.37.2 by @dependabot in https://github.com/dathere/qsv/pull/2788
  • build(deps): bump smallvec from 1.15.0 to 1.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2780
  • build(deps): bump sysinfo from 0.35.1 to 0.35.2 by @dependabot in https://github.com/dathere/qsv/pull/2774
  • build(deps): bump titlecase from 3.5.0 to 3.6.0 by @dependabot in https://github.com/dathere/qsv/pull/2775
  • build(deps): bump tokio from 1.45.0 to 1.45.1 by @dependabot in https://github.com/dathere/qsv/pull/2759
  • build(deps): bump uuid from 1.16.0 to 1.17.0 by @dependabot in https://github.com/dathere/qsv/pull/2757
  • applied select clippy suggestions
  • updated indirect dependencies
  • set Rust nightly to 2025-05-21, the same nightly Polars uses https://github.com/dathere/qsv/commit/872ade1b52cb0013fdb30aa2c4d83ce2081cf0c6

Fixed:

  • fix: frequency recover from non-fatal absence of stats cache, instead of panicking https://github.com/dathere/qsv/commit/b2821a0
  • fix: flaky json tests caused by hardcoding name of intermediate file - https://github.com/dathere/qsv/commit/62ca310f5942a3ffcf7334a5623db0c94c9fa8b3
  • fix: flaky reverse property tests by handling BOM characters https://github.com/dathere/qsv/commit/cefd490a899156735baf904b597b322e96b61f5d
  • fix: util::process_input helper does not honor QSV_SKIP_FORMAT_CHECK when processing dir input https://github.com/dathere/qsv/pull/2784

Full Changelog: https://github.com/dathere/qsv/compare/5.0.3...5.1.0

- Rust
Published by jqnatividad 9 months ago

https://github.com/dathere/qsv - 5.0.3

[5.0.3] - 2025-05-22 "The Geo Release" 🌍

qsv 5.0.3 represents a major milestone with significant enhancements to its geospatial data processing capabilities.
They're targeted to support the Datapusher+ Data Resource Upload First (DRUF) workflow for "automagical metadata inferencing" - focusing on DCAT-US v3 recommended spatial and temporal properties that would otherwise be too tedious to manually compile:

New Geocoding Capabilities - Added IP geolocation with new --iplookup and --iplookupnow subcommands in the geocode command - Integrated Maxmind GeoLite2 database support for accurate IP-to-location mapping - Enhanced geocoding performance (up to 5x faster) with rkyv serialization (contributed by @estin)

Enhanced geoconvert Command - Added CSV input support alongside existing geospatial formats - Introduced GeoJSONL output format for streaming workflows - Added stdin support for all formats except SHP input - New coordinate handling options: --latitude and --longitude parameters - Added --max-length option for output control - Comprehensive test coverage additions - all contributed by @rzmk!

🚀 Performance & Infrastructure Improvements

Polars Integration - Upgraded Polars from 0.46.0 to 0.48.1 with intermediate releases - Enhanced Polars schema support across multiple commands (schema, joinp, pivotp, sqlp) - Added --polars mode to the schema command to explicitly create a polars schema file on demand, rather than as a side-effect of the sqlp command using its --cache-schema option.

Core Performance - Microoptimizations in the sort command - Improved file handling with tempfile usage in edit --in-place - Enhanced auto-decompression support now available suite-wide for gz, zlib, and zst files

🛠️ New Features & Usability

Enhanced Commands - edit: New --in-place option for direct file modification with automatic backup (.bak) creation - foreach: Added "/" to splitter pattern for improved path handling - stats: New QSV_STATS_STRING_MAX_LENGTH environment variable for string analysis control - to: Added --all-strings option for simplified data type handling

Distribution & Installation - Added conda package support with installation instructions - New download badges and streamlined installation documentation - Retired older glibc-2.31 and musl-1.2.3 "prebuilt-older" binaries as Ubuntu 20.04 has been retired and no longer supported with GitHub Actions. - Discontinued MSI installer in favor of the easier qsv Windows Easy Installer (thanks @rzmk!)

Quality & Stability - Applied multiple clippy lint suggestions for code quality - Enhanced test coverage, particularly for geospatial functions - Improved documentation with better examples and clearer explanations - Fixed stdin handling issues in the split command

🎯 Default Feature Changes

The qsvdp variant now includes geocode and geoconvert commands by default, making geospatial functionality more accessible to Datapusher+ users with Jinja2-powered metadata formulas.

NOTE: * for qsv v5.0.3, cargo install will NOT worked as the calamine crate (which powers the excel command) is pinned to zip 2.5.0 which was yanked. * unfortunately, the broken zip dependency also prevents us from publishing qsv 5.0.3 to crates.io * for both cases, either install the prebuilts or compile from source with cargo build.


Added

  • edit: add --in-place (and test) which uses tempfile by @rzmk in https://github.com/dathere/qsv/pull/2744
  • foreach: add "/" to splitter pattern https://github.com/dathere/qsv/pull/2754
  • geoconvert: add CSV input and GeoJSONL output and use buf by @rzmk in https://github.com/dathere/qsv/pull/2690
  • geoconvert: add stdin support (except for SHP input) by @rzmk in https://github.com/dathere/qsv/pull/2699
  • geoconvert: add --latitude and --longitude options by @rzmk in https://github.com/dathere/qsv/pull/2707
  • geoconvert: add --max-length option https://github.com/dathere/qsv/pull/2711
  • geocode: add iplookup and iplookupnow subcommands https://github.com/dathere/qsv/pull/2741
  • tests: geoconvert - add basic tests and move tests to test_geoconvert.rs by @rzmk in https://github.com/dathere/qsv/pull/2717
  • qsvdp now include geocode & geoconvert commands by default https://github.com/dathere/qsv/pull/2697
  • stats: QSVSTATSSTRINGMAXLENGTH env var https://github.com/dathere/qsv/pull/2709
  • to: add --all-strings option https://github.com/dathere/qsv/pull/2746
  • docs: add conda install command by @rzmk in https://github.com/dathere/qsv/pull/2718
  • docs: add qsv download badges and update install instructions by @rzmk in https://github.com/dathere/qsv/pull/2721

Changed

  • geocode: bump geosuggest crate to use much faster rkyv serialization by @estin in https://github.com/dathere/qsv/pull/2734
  • sort: microoptimize https://github.com/dathere/qsv/pull/2748
  • feat: update completions for qsv v5.0 by @rzmk in https://github.com/dathere/qsv/pull/2752
  • Improved Polars Schema support https://github.com/dathere/qsv/pull/2703
  • Bump polars from 0.46.0 to 0.47.0 https://github.com/dathere/qsv/commit/87bf7b7f5e0b5af754afabf2939ced3914eb276f
  • Bump polars py-1.30.0-beta-1 https://github.com/dathere/qsv/pull/2747
  • Bump polars to 0.48.0 https://github.com/dathere/qsv/commit/5a037eeff1d353f3f4b8f16a7d6ec6b3074b2f8c
  • build(deps): bump polars from 0.48.0 to 0.48.1 by @dependabot in https://github.com/dathere/qsv/pull/2750
  • build(deps): bump polars-ops from 0.48.0 to 0.48.1 by @dependabot in https://github.com/dathere/qsv/pull/2751
  • build(deps): bump actions/setup-python from 5.5.0 to 5.6.0 by @dependabot in https://github.com/dathere/qsv/pull/2713
  • build(deps): bump actix-web from 4.10.2 to 4.11.0 by @dependabot in https://github.com/dathere/qsv/pull/2742
  • build(deps): bump bytemuck from 1.22.0 to 1.23.0 by @dependabot in https://github.com/dathere/qsv/pull/2719
  • build(deps): bump chrono from 0.4.40 to 0.4.41 by @dependabot in https://github.com/dathere/qsv/pull/2722
  • build(deps): bump ext-sort from 0.1.4 to 0.1.5 by @dependabot in https://github.com/dathere/qsv/pull/2736
  • build(deps): bump file-format from 0.26.0 to 0.27.0 by @dependabot in https://github.com/dathere/qsv/pull/2735
  • build(deps): bump pyo3 from 0.24.1 to 0.24.2 by @dependabot in https://github.com/dathere/qsv/pull/2708
  • build(deps): bump jaq-json from 1.1.1 to 1.1.2 by @dependabot in https://github.com/dathere/qsv/pull/2714
  • build(deps): bump jaq-std from 2.1.0 to 2.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2715
  • build(deps): bump jaq-core from 2.1.1 to 2.2.0 by @dependabot in https://github.com/dathere/qsv/pull/2716
  • build(deps): bump jsonschema from 0.29.1 to 0.30.0 by @dependabot in https://github.com/dathere/qsv/pull/2704
  • build(deps): bump libc from 0.2.171 to 0.2.172 by @dependabot in https://github.com/dathere/qsv/pull/2696
  • build(deps): bump sysinfo from 0.34.2 to 0.35.0 by @dependabot in https://github.com/dathere/qsv/pull/2724
  • build(deps): bump minijinja from 2.9.0 to 2.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2727
  • build(deps): bump minijinja from 2.10.1 to 2.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2732
  • build(deps): bump minijinja-contrib from 2.9.0 to 2.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2728
  • build(deps): bump minijinja-contrib from 2.10.1 to 2.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2733
  • build(deps): bump pyo3 from 0.24.2 to 0.25.0 by @dependabot in https://github.com/dathere/qsv/pull/2745
  • build(deps): bump rand from 0.9.0 to 0.9.1 by @dependabot in https://github.com/dathere/qsv/pull/2702
  • build(deps): bump simd-json from 0.15.0 to 0.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2701
  • build(deps): bump sysinfo from 0.35.0 to 0.35.1 by @dependabot in https://github.com/dathere/qsv/pull/2740
  • build(deps): bump tempfile from 3.19.1 to 3.20.0 by @dependabot in https://github.com/dathere/qsv/pull/2739
  • build(deps): bump tokio from 1.44.2 to 1.45.0 by @dependabot in https://github.com/dathere/qsv/pull/2731
  • bump indirect dependencies
  • apply select clippy lint suggestions
  • bump MRSV to 1.87.0

Fixed:

  • docs: fix typo in apply operations replace example by @HarrisonMc555 in https://github.com/dathere/qsv/pull/2743
  • fix: split save stdin to tempfile https://github.com/dathere/qsv/pull/2706

New Contributors

  • @estin made their first contribution in https://github.com/dathere/qsv/pull/2734
  • @HarrisonMc555 made their first contribution in https://github.com/dathere/qsv/pull/2743

Full Changelog: https://github.com/dathere/qsv/compare/4.0.0...5.0.3

- Rust
Published by jqnatividad 9 months ago

https://github.com/dathere/qsv - 4.0.0

[4.0.0] - 2025-04-13

Highlights:

This is a major release with numerous improvements! - qsv can now read additional file formats by leveraging the Polars engine: Arrow/IPC, Avro, Parquet, JSON (JSON array) and JSONL - Automatic decompression support for compressed CSV file dialects (csv, tsv/tab & csv) using gzip (.gz), zlib (.zlib), zstd (.zst) compression formats. (e.g. data.csv.gz, data.tsv.zst, data.ssv.zlib) qsv sample 1000 data.parquet | qsv stats | qsv lens qsv frequency data.csv.gz | qsv lens qsv search Waldo data.tsv.zlib | qsv lens qsv select 2-5 data.jsonl | qsv lens - New geoconvert command for converting spatial formats to CSV: - GeoJSON # convert TX_cities.geojson to CSV, filter out the geometry column and browse with lens qsv geoconvert TX_cities.geojson geojson csv | qsv select '!geometry' | qsv lens - Shapefile (SHP) - Enhanced split command with new --filter option: - Similar to GNU split - Spawns a subprocess for each chunk (e.g. compress each chunk with qsv split outdir input.csv --filter "gzip {}.csv") - Expanded to command: - added LibreOffice Calc (ODS) support - re-enabled parquet generation now that it's using Arrow instead of DuckDB (which made for very long compiles) - New uniqueCombinedWith JSON Schema custom keyword in validate command: - Allows validating uniqueness across multiple columns - Useful for composite key validation - QSVDOTENVPATH now supports the sentinel value "<NONE>" to disable dotenv processing altogether.


Added

  • geoconvert: new command to convert spatial formats to CSV by @rzmk in https://github.com/dathere/qsv/pull/2681 & https://github.com/dathere/qsv/pull/2688
  • split: add --filter options https://github.com/dathere/qsv/pull/2660
  • sqlp: add decimal type support https://github.com/dathere/qsv/pull/2646
  • to: add back to parquet support https://github.com/dathere/qsv/pull/2665
  • feat: Extended auto decompression support. In addition to snappy auto-decompression, auto-decompress CSV dialects (tsv/tab & ssv files) using gzip, zlib and zstd compression formats https://github.com/dathere/qsv/pull/2671
  • to: add ODS support https://github.com/dathere/qsv/pull/2674
  • validate: add uniqueCombinedWith custom JSON Schema Validation keyword https://github.com/dathere/qsv/pull/2636
  • feat: prompt add file formats supported to dialog box filter when polars feature is enabled https://github.com/dathere/qsv/pull/2667
  • feat: add QSV_POLARS_FLOAT_PRECISION env var https://github.com/dathere/qsv/pull/2678
  • tests: add tests for https://100.dathere.com/lessons/3 by @rzmk in https://github.com/dathere/qsv/pull/2638

Changed

  • qsvdp binary variant can now use the geocode & geoconvert commands https://github.com/dathere/qsv/commit/50f004608d396602b8f6eb048a72dcc56630d26a
  • geocode feature now gates the geocode & geoconvert command https://github.com/dathere/qsv/commit/9d046e8da107c088f1e0b3bb20e64ab79fde05d8
  • feat: setting QSVDOTENVPATH to sentinel value "<NONE>\" disables dotenv processing https://github.com/dathere/qsv/pull/2684
  • refactor: polars special formats support https://github.com/dathere/qsv/pull/2683
  • contrib(completions): update completions to v3.3.0 by @rzmk in https://github.com/dathere/qsv/pull/2626
  • contrib(completions): update completions for qsv v4.0.0 by @rzmk in https://github.com/dathere/qsv/pull/2677
  • deps: bump polars to 0.46.0 at py-1.27.1 tag https://github.com/dathere/qsv/pull/2675 and https://github.com/dathere/qsv/commit/e5d29d7f192bf10f6528d0d423347179d785e40f
  • build(deps): bump actions/setup-python from 5.4.0 to 5.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2627
  • build(deps): bump arboard from 3.4.1 to 3.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2653
  • build(deps): bump chrono-tz from 0.10.2 to 0.10.3 by @dependabot in https://github.com/dathere/qsv/pull/2623
  • build(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 by @dependabot in https://github.com/dathere/qsv/pull/2672
  • build(deps): bump csvs_convert from 0.11.0 to 0.11.1 by @dependabot in https://github.com/dathere/qsv/pull/2686
  • build(deps): bump data-encoding from 2.8.0 to 2.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2685
  • build(deps): bump flate2 from 1.1.0 to 1.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2649
  • build(deps): bump flexi_logger from 0.29.8 to 0.30.0 by @dependabot in https://github.com/dathere/qsv/pull/2650
  • build(deps): bump flexi_logger from 0.30.0 to 0.30.1 by @dependabot in https://github.com/dathere/qsv/pull/2651
  • build(deps): bump governor from 0.8.1 to 0.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2625
  • build(deps): bump governor from 0.9.0 to 0.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2631
  • build(deps): bump jsonschema from 0.29.0 to 0.29.1 by @dependabot in https://github.com/dathere/qsv/pull/2635
  • build(deps): bump log from 0.4.26 to 0.4.27 by @dependabot in https://github.com/dathere/qsv/pull/2622
  • build(deps): bump mimalloc from 0.1.44 to 0.1.45 by @dependabot in https://github.com/dathere/qsv/pull/2652
  • build(deps): bump minijinja from 2.8.0 to 2.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2643
  • build(deps): bump minijinja-contrib from 2.8.0 to 2.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2642
  • build(deps): bump pyo3 from 0.24.0 to 0.24.1 by @dependabot in https://github.com/dathere/qsv/pull/2645
  • build(deps): bump qsv-dateparser from 0.12.1 to 0.13.0 by @dependabot in https://github.com/dathere/qsv/pull/2639
  • build(deps): bump qsv-sniffer from 0.10.3 to 0.11.0 by @dependabot in https://github.com/dathere/qsv/pull/2640
  • build(deps): bump redis from 0.29.2 to 0.29.4 by @dependabot in https://github.com/dathere/qsv/pull/2663
  • build(deps): bump redis from 0.29.4 to 0.29.5 by @dependabot in https://github.com/dathere/qsv/pull/2666
  • build(deps): bump smallvec from 1.14.0 to 1.15.0 by @dependabot in https://github.com/dathere/qsv/pull/2656
  • build(deps): bump sysinfo from 0.34.0 to 0.34.1 by @dependabot in https://github.com/dathere/qsv/pull/2637
  • build(deps): bump sysinfo from 0.34.1 to 0.34.2 by @dependabot in https://github.com/dathere/qsv/pull/2648
  • build(deps): bump titlecase from 3.4.0 to 3.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2669
  • build(deps): bump tokio from 1.44.1 to 1.44.2 by @dependabot in https://github.com/dathere/qsv/pull/2662
  • applied select clippy lint suggestions
  • bumped indirect dependencies to latest version

Fixed

  • fix: select panic when idx is out of bounds https://github.com/dathere/qsv/pull/2670
  • fix: correct link to qsv-dateparser accepted date formats https://github.com/dathere/qsv/pull/2632
  • fix: reset SIGPIPE handling https://github.com/dathere/qsv/pull/2664
  • docs: fix typo it's -> its by @rzmk in https://github.com/dathere/qsv/pull/2680

Full Changelog: https://github.com/dathere/qsv/compare/3.3.0...4.0.0

- Rust
Published by jqnatividad 11 months ago

https://github.com/dathere/qsv - 3.3.0

[3.3.0] - 2025-03-23

Highlights:

  • stats got another round of improvements:
    • boolean inferencing is now configurable!
      Before, it was limited to a simple, English-centric heuristic:
    • When a column's cardinality is 2; and the 2 values' first characters are 0/1, t/f or y/n case-insensitive, the data type of the column is inferred as boolean
    • With the new --boolean-patterns <arg> option, we can now specify arbitrary true_pattern:false_pattern pattern pairs. Each pattern can be a string of length > 1, case-insensitive. If a pattern ends with "", it is treated as a prefix.
      For example, `t
      :fmatches "true", "Truthy", "T" as boolean true so long as the corresponding false pattern (e.g. "Fake, False, f") is also matched. Bear in mind that the cardinality needs to be 2, so multiple matches on the same column on different patterns will disqualify the field as boolean (e.g. If a column's domain is "True", "truthy" and "False", it doesn't qualify as it's cardinality is 3. On the other hand, if it's "True", "true", "False", "false", "FALSE" - it still qualifies as they resolve to just "true/false" case-insensitive.<br />For backwards compatibility, the default true/false pairs are1:0,t:f,y:n*`
    • percentiles can now be computed!
      By enabling the --percentiles flag, stats will now return the 5th, 10th, 40th, 60th, 90th and 95th percentile by default using the nearest-rank method for all numeric and date/datetime columns. The returned percentiles can be configured to return different percentiles using the --percentile-list <arg> option.
      Note that the method for computing quartiles (Method 3) is basically a specialized implementation of the nearest rank method for q1 (25th), q2 (50th or median) and q3 (75th percentile), thus the choice of non-overlapping defaults for --percentile-list.
  • frequency: got a performance boost now that we're using qsv-stats 0.32.0, which uses the faster foldhash crate
  • in the same vein, by replacing ahash with foldhash suite-wide, qsv got a tad faster when doing hash lookups
  • sample: "streaming" bernoulli sampling now works for any remotely hosted CSVs with servers that support chunked downloads, without requiring range request support.
  • we're now using the latest Polars engine - v0.46.0 at the py-1.26.0 tag.

Added

  • stats: add configurable boolean inferencing https://github.com/dathere/qsv/pull/2595
  • stats: add --percentiles option https://github.com/dathere/qsv/pull/2617

Changed

  • refactor: replace ahash with faster foldhash https://github.com/dathere/qsv/pull/2619
  • replace std assert_eq! macro with similar_asserts::assert_eq! macro for easier debugging https://github.com/dathere/qsv/pull/2605
  • deps: bump polars to 0.46.0 at py-1.25.2 tag https://github.com/dathere/qsv/pull/2604
  • deps: bump Polars to v0.46.0 at py-1.26.0 tag https://github.com/dathere/qsv/pull/2621
  • build(deps): bump actix-web from 4.9.0 to 4.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2591
  • build(deps): bump indexmap from 2.7.1 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2592
  • build(deps): bump mimalloc from 0.1.43 to 0.1.44 by @dependabot in https://github.com/dathere/qsv/pull/2608
  • build(deps): bump qsv-stats from 0.30.0 to 0.31.0 by @dependabot in https://github.com/dathere/qsv/pull/2603
  • build(deps): bump qsv-stats from 0.31.0 to 0.32.0 by @dependabot in https://github.com/dathere/qsv/pull/2620
  • build(deps): bump reqwest from 0.12.12 to 0.12.13 by @dependabot in https://github.com/dathere/qsv/pull/2593
  • build(deps): bump reqwest from 0.12.13 to 0.12.14 by @dependabot in https://github.com/dathere/qsv/pull/2596
  • build(deps): bump reqwest from 0.12.14 to 0.12.15 by @dependabot in https://github.com/dathere/qsv/pull/2609
  • build(deps): bump rfd from 0.15.2 to 0.15.3 by @dependabot in https://github.com/dathere/qsv/pull/2597
  • build(deps): bump rust_decimal from 1.37.0 to 1.37.1 by @dependabot in https://github.com/dathere/qsv/pull/2616
  • build(deps): bump simd-json from 0.14.3 to 0.15.0 by @dependabot in https://github.com/dathere/qsv/pull/2615
  • build(deps): bump tempfile from 3.18.0 to 3.19.0 by @dependabot in https://github.com/dathere/qsv/pull/2602
  • build(deps): bump tempfile from 3.19.0 to 3.19.1 by @dependabot in https://github.com/dathere/qsv/pull/2612
  • build(deps): bump uuid from 1.15.1 to 1.16.0 by @dependabot in https://github.com/dathere/qsv/pull/2601
  • build(deps): bump zip from 2.2.3 to 2.4.1 by @dependabot in https://github.com/dathere/qsv/pull/2607
  • apply select clippy lint suggestions
  • bumped indirect dependencies to latest version
  • set Rust nightly to 2025-03-07, the same version Polars uses https://github.com/dathere/qsv/commit/17f6bdb3f80c5798d154a133428f0ca6ff59fc79

Fixed

  • updated lock file, primarily to fix CVE-2025-29787 https://github.com/dathere/qsv/commit/e44e5df3fd296fcf85293d46a7afe08f40b86693
  • luau: fix flaky registerlookuptable CI test that only intermittently fails in Windows by using buffered writer in lookup write_cache_file helper https://github.com/dathere/qsv/commit/f494b46d334259d370c92cd8cc6b211bc81c244a
  • sample: refactor "streaming" Bernoulli sampling, so it actually works without requiring range requests support https://github.com/dathere/qsv/pull/2600

Full Changelog: https://github.com/dathere/qsv/compare/3.2.0...3.3.0

- Rust
Published by jqnatividad 11 months ago

https://github.com/dathere/qsv - 3.2.0

[3.2.0] - 2025-03-09

Added

  • sample: "streaming" bernoulli sampling of remote files when hosted on servers with range requests support https://github.com/dathere/qsv/pull/2588

Changed

  • Updated benchmarks.sh to add Homebrew installation prompt by @ondohotola in https://github.com/dathere/qsv/pull/2575
  • feat: migrate to Rust 2024 edition https://github.com/dathere/qsv/pull/2587
  • deps: bump luau from 0.660 to 0.663 https://github.com/dathere/qsv/pull/2567
  • deps: bump polars to 0.46.0 at py-1.24.0 tag https://github.com/dathere/qsv/commit/f70ce71ffa2d822aaa511e66bd11a2789786c82e
  • deps: replace deprecated simple-home-dir with directories crate https://github.com/dathere/qsv/commit/6768cd59baa20b23ac9152cc8a9ce176d9e2c362
  • deps: bump arrow from 54.2.0 to 54.2.1 https://github.com/dathere/qsv/commit/fc479b2b87843a370e072248e9b6331de690f0a2
  • build(deps): bump bytemuck from 1.21.0 to 1.22.0 by @dependabot in https://github.com/dathere/qsv/pull/2570
  • build(deps): bump console from 0.15.10 to 0.15.11 by @dependabot in https://github.com/dathere/qsv/pull/2569
  • build(deps): bump governor from 0.8.0 to 0.8.1 by @dependabot in https://github.com/dathere/qsv/pull/2562
  • build(deps): bump minijinja from 2.7.0 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2573
  • build(deps): bump minijinja-contrib from 2.7.0 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2571
  • build(deps): bump pyo3 from 0.23.4 to 0.23.5 by @dependabot in https://github.com/dathere/qsv/pull/2558
  • build(deps): bump pyo3 from 0.23.5 to 0.24.0 by @dependabot in https://github.com/dathere/qsv/pull/2590
  • build(deps): bump redis from 0.29.0 to 0.29.1 by @dependabot in https://github.com/dathere/qsv/pull/2568
  • build(deps): bump robinraju/release-downloader from 1.11 to 1.12 by @dependabot in https://github.com/dathere/qsv/pull/2580
  • build(deps): bump serde_json from 1.0.139 to 1.0.140 by @dependabot in https://github.com/dathere/qsv/pull/2572
  • build(deps): bump tempfile from 3.17.1 to 3.18.0 by @dependabot in https://github.com/dathere/qsv/pull/2581
  • build(deps): bump uuid from 1.14.0 to 1.15.0 by @dependabot in https://github.com/dathere/qsv/pull/2563
  • build(deps): bump uuid from 1.15.0 to 1.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2566
  • applied select clippy lint suggestions
  • bumped indirect dependencies to latest versions

Fixed

  • apply: fix currencytonum handling of "0.00" value by adding parsing strictness control with --formatstr option https://github.com/dathere/qsv/pull/2586
  • describegpt: fix panic by adding error handling when LLM API response is not in expected format https://github.com/dathere/qsv/pull/2577
  • tojsonl: fix display of floats as per the JSON spec https://github.com/dathere/qsv/pull/2583

New Contributors

  • @ondohotola made their first contribution in https://github.com/dathere/qsv/pull/2575

Full Changelog: https://github.com/dathere/qsv/compare/3.1.1...3.2.0

- Rust
Published by jqnatividad 12 months ago

https://github.com/dathere/qsv - 3.1.1

[3.1.1] - 2025-02-24

Highlights:

  • sample: is now a "smart" command that uses the stats cache to validate and make sampling faster.
  • With the QSVSTATSCACHEMODE env var, you can now control the stats cache behavior suite-wide, making sure "smart" commands use it when appropriate.
  • luau command's capabilities have been significantly expanded with:
    • New accumulate helper function for aggregating values across rows
    • Optional naming for cumulative helper functions
    • More robust error handling and improved docstrings
    • Enhanced scripting performance with fast-float parsing
    • new Wiki section with examples of using its helper functions
  • schema: now does type-aware sorting of enum lists, making JSON Schema enum list customization easier when fine-tuning it for JSON Schema validation with validate.
  • lens: adds --freeze-columns option with a default of 1, improving navigation of wide CSVs
  • stats: adds --dataset-stats option to explicitly compute dataset-level statistics. Starting with qsv 2.0.0, it was computed automatically to support Datapusher+ and the DRUF workflow, but it was causing confusion with some command-line users.

Added

  • lens: added --freeze-columns option https://github.com/dathere/qsv/pull/2552
  • luau: added accumulate helper function https://github.com/dathere/qsv/pull/2537 https://github.com/dathere/qsv/pull/2539
  • luau: added a new section in the Wiki with examples of using the new helper functions https://github.com/dathere/qsv/wiki/Luau-Helper-Functions-Examples
  • sample: is now "smart" - using the stats cache to validate and make sampling faster https://github.com/dathere/qsv/pull/2529 https://github.com/dathere/qsv/pull/2530 https://github.com/dathere/qsv/commit/71ec7ede121ef1e09fb19af9bac3f52aa67a7f54
  • schema: added type-aware sort of JSON Schema enum list https://github.com/dathere/qsv/pull/2551
  • stats: added --dataset-stats option https://github.com/dathere/qsv/pull/2555
  • python: added precompiled qsvpy binary for Python 3.13 https://github.com/dathere/qsv/commit/c4087788b6fee64f358047ea8ef44a5450604ec1
  • added QSVSTATSCACHEMODE env var to control stats cache suite-wide https://github.com/dathere/qsv/commit/4afb98d8729fa4c3c5f61e0a26347dad5aa1e9f8 https://github.com/dathere/qsv/commit/2adc313937ec8aa292976f8e5acf3a4e7756fd93 https://github.com/dathere/qsv/commit/ba75f0897e5a7e6579380a8a4c073a1af436648a
  • docs: updated PERFORMANCE docs and added a TLDR version https://github.com/dathere/qsv/commit/77ed167aef8f7307ec295616a8b96af2f3bb81fd https://github.com/dathere/qsv/commit/c61c249a8354ee7f4ab0d03464624f3dd3249d2b https://github.com/dathere/qsv/commit/db0bb3f147599ece48ca2e8ad1d54db83d7b897c
  • chore: added *.tab & *.ssv to typos config https://github.com/dathere/qsv/commit/523667520ac06a1c96942897aa9288fe7a9d1f5d

Changed

  • frequency: made error handling more robust https://github.com/dathere/qsv/commit/b195519ec04efcba7cfa7f99e153818d03f419d0
  • luau: refactored all cumulative helper functions (cum_) now have name as an optional argument https://github.com/dathere/qsv/pull/2540
  • schema: refactored to use QSVSTATSCACHEMODE env var https://github.com/dathere/qsv/commit/5771ff4892ab89f8ca7d6940aa02baaa0c9b1fa5
  • select: refactored select helper https://github.com/dathere/qsv/commit/bfbe64cc64a20006e4c93d8a3f6be3f326411fec
  • stats: optimized memory layout of central Stats struct https://github.com/dathere/qsv/commit/52f697e5828a5c3e059d7f25254e4aef840d8598
  • stats: optimized record_count functionality https://github.com/dathere/qsv/commit/0e3114a54a8340639c381a19251d03ab94496b04 https://github.com/dathere/qsv/commit/18791da0cc2972de2f5909fe1556d83c8b7e8f9f
  • contrib(completions): update qsv completions for qsv 3.1 by @rzmk in https://github.com/dathere/qsv/pull/2556
  • deps: bump arrow and tempfile https://github.com/dathere/qsv/commit/4cc267972622dfb703779b3d18b084006369b449
  • deps: bump cached and redis crates https://github.com/dathere/qsv/commit/e622d1447a9a8ff4ecdb22d000335fb2d129683a
  • deps: bump csvlens from 0.11 to 0.12 https://github.com/dathere/qsv/commit/b2fd985bf51fac4ec224b4664cc2fe91d8676101
  • deps: use our patched fork of csvlens with ability to freeze columns https://github.com/dathere/qsv/commit/d66ec6df0e768f29b1102108152f28028da0ec8b
  • deps: bump polars to 0.46.0 at py-1.23.0 tag https://github.com/dathere/qsv/commit/6072aa22bed211cafa2fe90be58386acd8869415
  • deps: bump flate2 from 1.0.35 to 1.1.0 https://github.com/dathere/qsv/commit/eed471a441f031d0311849a13ac3efb116baa33d
  • deps: bump gzp from 0.11 to 1.0.0 https://github.com/dathere/qsv/commit/43c8a4a414484b9a3d573cb41a713ce838a2d425
  • build(deps): bump jaq-json from 1.1.0 to 1.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2547
  • build(deps): bump jaq-core from 2.1.0 to 2.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2546
  • build(deps): bump log from 0.4.25 to 0.4.26 by @dependabot in https://github.com/dathere/qsv/pull/2545
  • build(deps): bump tempfile from 3.16.0 to 3.17.0 by @dependabot in https://github.com/dathere/qsv/pull/2532
  • build(deps): bump tempfile from 3.17.0 to 3.17.1 by @dependabot in https://github.com/dathere/qsv/pull/2535
  • build(deps): bump serde_json from 1.0.138 to 1.0.139 by @dependabot in https://github.com/dathere/qsv/pull/2541
  • build(deps): bump serde from 1.0.217 to 1.0.218 by @dependabot in https://github.com/dathere/qsv/pull/2542
  • build(deps): bump smallvec from 1.13.2 to 1.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2528
  • build(deps): bump strum from 0.27.0 to 0.27.1 by @dependabot in https://github.com/dathere/qsv/pull/2533
  • build(deps): bump strum_macros from 0.27.0 to 0.27.1 by @dependabot in https://github.com/dathere/qsv/pull/2534
  • build(deps): bump uuid from 1.13.1 to 1.13.2 by @dependabot in https://github.com/dathere/qsv/pull/2538
  • build(deps): bump uuid from 1.13.2 to 1.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2544
  • chore: we now have ~1,800 tests! https://github.com/dathere/qsv/commit/f5d09ed76d8e0acb9052f89b6688a047c756b053
  • applied select clippy lint suggestions
  • bumped indirect dependencies to latest versions
  • bumped MSRV to latest Rust stable - v1.85

Fixed

  • count: refactored to fall back to "regular" CSV reader when Polars counting returns a zero count https://github.com/dathere/qsv/commit/fd39bcbd9574d8d5ef1ddc5025eda4748f2a8652
  • schema: fixed off-by-one error https://github.com/dathere/qsv/commit/60de090bdf727dd0eaf79ba7058745fdacef07ef
  • ensured getstatsrecord helper returns field/stats correctly https://github.com/dathere/qsv/commit/ad86a373d01ea45902d764a46c19f26ad5b01029
  • Fixed RUSTSEC-2025-0007: ring is unmaintained https://github.com/dathere/qsv/issues/2548
  • stats: only add qsv__value column when --dataset-stats is enabled https://github.com/dathere/qsv/commit/64267d38c4161b8591a6f81e36bea6c7fdbddc70
  • skip format check when path starts with temp dir or is a snappy file https://github.com/dathere/qsv/commit/ff8957e77ae4c28a24f323328c58a2549ff43c0c

Removed

  • frequency: removed --stats-mode option now that we have a suite-wide QSVSTATSCACHEMODE env var https://github.com/dathere/qsv/commit/ba75f0897e5a7e6579380a8a4c073a1af436648a https://github.com/dathere/qsv/commit/416abb7ce73f406c2a605cdca87d50c12723698a
  • chore: removed simdutf8 conditional directive for aarch64 architecture, now that its no longer needed https://github.com/dathere/qsv/commit/ec1e16c7a20a7458b560e3c78dfbd83fba82de29
  • removed publish-linux-qsvpy-glibc-231-musl-123.yml workflow as it was getting cross compilation errors and we have another musl workflow that works https://github.com/dathere/qsv/commit/7c08617132e8d7df069b7b3be160d3b348f44d53

Full Changelog: https://github.com/dathere/qsv/compare/3.0.0...3.1.1

- Rust
Published by jqnatividad about 1 year ago

https://github.com/dathere/qsv - 3.0.0

[3.0.0] - 2025-02-13

Highlights:

  • sample: Five new sampling methods! In addition to reservoir & indexed - added bernoulli, systematic, stratified, weighted & cluster sampling. And they're all memory efficient so you should be able to sample arbitrarily large datasets!
  • stats: Added "sortiness" [-1 (Descending) to 1 (Ascending)] & "uniquenessratio"_ [0 (many repeated values) to 1 (All unique values)] stats (more info).
    The qsv-stats engine was also optimized to squeeze out more performance, with stats now 2.6x faster while using less memory despite the addition of these new stats.
  • diff: is now a "smart" command, so that it uses the stats cache to short-circuit diffs if files are identical per their fingerprint hashes, and to validate that the diff key column is all unique.
  • The stats cache has been refactored and improved performance for "smart" commands:
    • frequency is not only 3.3x faster, it uses far less memory as it now doesn't need to maintain hashmaps for columns with all unique values.
    • tojsonl is 2.25x faster
    • schema is 1.4x faster
  • luau got a major performance boost with the v0.660 engine upgrade, taking advantage of several compiler optimizations. luau is now up to 3.1x faster!
  • validate had a major performance regression - going down from 3.295 seconds in v2.1.0 to 13.159 seconds in v2.2.1 in the benchmarks. 4x slower! With the jsonschema 0.29 crate update, validate now clocks in 3.022 seconds!

* template also got a big boost and is now 2.9x faster with the minijinja 2.7 crate update.

Added

  • joinp: additional joinp asof join sort and match options https://github.com/dathere/qsv/pull/2486
  • stats: add "sortiness" statistic https://github.com/dathere/qsv/pull/2499
  • stats add uniqueness_ratio https://github.com/dathere/qsv/pull/2521
  • stats & frequency: add --vis-whitespace option. Fulfills #2501 https://github.com/dathere/qsv/pull/2503
  • sample: add more sampling methods (in addition to indexed and reservoir - added bernoulli, systematic, stratified, weighted & cluster sampling) and made them all memory efficient so we can sample arbitrarily large datasets: https://github.com/dathere/qsv/pull/2507 & https://github.com/dathere/qsv/pull/2511
  • diff: make diff a "smart" command. Fulfills #2493 and #2509 https://github.com/dathere/qsv/pull/2518
  • benchmarks : added new benchmarks for sample for new sampling methods https://github.com/dathere/qsv/commit/d758c54effcef31dbc1c1eb40e0c1789050eeb34

Changed

  • luau: bump from 0.653 to 0.660 and optimize for performance https://github.com/dathere/qsv/commit/4402df6788205341552b4f4e43220ea49924a28e https://github.com/dathere/qsv/commit/de429b4bb858a7872e30eccbdb3e526ad0ea322b https://github.com/dathere/qsv/commit/07ff8b8458a042987c9d11cae5b5b1dfaa934097 https://github.com/dathere/qsv/commit/3211f5c84fc23b652e4d7da83098e7db46829081
  • stats: compute string len stats only for string columns https://github.com/dathere/qsv/pull/2495
  • contrib(completions): update qsv completions for qsv 2.2.1 by @rzmk in https://github.com/dathere/qsv/pull/2494
  • deps: bump polars to latest upstream after its py-1.22.0 release
  • deps: backported csv-core 0.1.12 fix to our qsv-optimized csv-core fork https://github.com/dathere/rust-csv/commit/5d0916e243f365a377b1b0e7c84bcf9585e83f2d
  • build(deps): bump actions/setup-python from 5.3.0 to 5.4.0 by @dependabot in https://github.com/dathere/qsv/pull/2488
  • build(deps): bump bytes from 1.9.0 to 1.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2497
  • build(deps): bump data-encoding from 2.7.0 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2512
  • build(deps): bump geosuggest-core from 0.6.5 to 0.6.6 by @dependabot in https://github.com/dathere/qsv/pull/2520
  • build(deps): bump geosuggest-utils from 0.6.5 to 0.6.6 by @dependabot in https://github.com/dathere/qsv/pull/2519
  • build(deps): bump jsonschema from 0.28.3 to 0.29.0 by @dependabot in https://github.com/dathere/qsv/pull/2510
  • build(deps): bump minijinja from 2.6.0 to 2.7.0 by @dependabot in https://github.com/dathere/qsv/pull/2489
  • build(deps): bump mlua from 0.10.2 to 0.10.3 by @dependabot in https://github.com/dathere/qsv/pull/2485
  • build(deps): bump qsv-stats from 0.27.0 to 0.28.0 by @dependabot in https://github.com/dathere/qsv/pull/2496
  • build(deps): bump qsv-stats from 0.28.0 to 0.29.0 by @dependabot in https://github.com/dathere/qsv/pull/2498
  • build(deps): bump qsv-stats from 0.29.0 to 0.30.0 by @dependabot in https://github.com/dathere/qsv/pull/2505
  • chore: Bump rand to 0.9 https://github.com/dathere/qsv/pull/2504
  • build(deps): bump simple-home-dir from 0.4.6 to 0.4.7 by @dependabot in https://github.com/dathere/qsv/pull/2515
  • build(deps): bump uuid from 1.12.1 to 1.13.1 by @dependabot in https://github.com/dathere/qsv/pull/2500
  • bumped numerous indirect dependencies to latest versions
  • applied select clippy lint suggestions
  • bumped MSRV to latest Rust stable - v1.84.1

Fixed

  • docs: QSVAUTOINDEX => QSVAUTOINDEX_SIZE typo. Fixes #2479 https://github.com/dathere/qsv/pull/2484
  • fix: search & searchset off by 1 when using --flag option. Fixes #2508 https://github.com/dathere/qsv/pull/2513

Full Changelog: https://github.com/dathere/qsv/compare/2.2.1...3.0.0

- Rust
Published by jqnatividad about 1 year ago

https://github.com/dathere/qsv - 2.2.1

[2.2.1] - 2025-01-27

Changed

  • deps: bumped polars to 0.46.0. This will allow us to publish qsv to crates.io as qsv was using features that were not enabled in polars 0.45.1 https://github.com/dathere/qsv/commit/275b2b8bd3cb41d9ddf30ba721d393d446bd2b48

Fixed

  • stats: fix cache json processing bug. Fixes #2476 https://github.com/dathere/qsv/pull/2477
  • benchmarks: v6.1.0 - ensured all stats cache benchmarks actually used the stats cache even if the default --cache-threshold is 5 seconds - too high to trigger stats cache creation https://github.com/dathere/qsv/commit/ac33010260bf55c3424f8baa195f359f10ffe088

Full Changelog: https://github.com/dathere/qsv/compare/2.2.0...2.2.1

- Rust
Published by jqnatividad about 1 year ago

https://github.com/dathere/qsv - 2.2.0

[2.2.0] - 2025-01-26

Highlights:

  • stats - the :heart: of qsv, got a little tune-up:
    • It got a tad faster now that we only compute string length stats for string types. Previously, we were also computing length for numbers, thinking it'll be useful for storage sizing purposes (as everything is stored as string with CSV). But as performance is goal number 1, we're no longer doing so. Besides, this sizing info can be derived using other stats.
    • Fixed the problem with the stats cache being deleted/ignored even when not necessary.
      This bug snuck in while implementing the --cache-threshold cache suppression option. With stats getting its cache mojo back - expect near-instant cache-backed response not only for stats but also other "automagical" smart commands 🪄.
  • diff - @janriemer squashed some bugs without sacrificing diff's ludicrous speed! :wink:
  • validate: added dynamicEnum custom JSON Schema keyword column specifier support.
    You can now specify which column to validate against (by name or by 0-based column index), instead of always using the first column. This works for local & remote lookup files using the http/s://, ckan:// and dathere:// URL schemes.
  • extdedup now actually uses a proper memory-mapped backed on-disk hash table.
    Previously, it was only deduping in-memory as the odht crate was not properly wired to a memory mapped file :facepalm: (I took the name of the odht crate literally and thought it was handling it :shrug:). Thanks for the detailed bug report @Svenskunganka!
  • JSON query parsing overhaul.
    The fetch, fetchpost & json commands now use the latest jaq engine, making for faster performance especially now that we're precompiling and caching the jaq filter.
  • Polars engine upgraded. :polar_bear:
    By two versions! py-polars 1.20.0 and 1.21.0 - giving the sqlp, joinp, pivotp & count commands a little boost. :rocket:

NOTE: qsv v2.2.0 is not available on crates.io as it does not allow enabling unreleased features as we await a new version of Polars. As soon as Polars 0.46.0 is published, a new qsv patch release will be published to crates.io. This means that installation option 3 using cargo install will be limited to 1.0.0 - the last qsv version available on crates.io. All other installation and update options to install/update qsv 2.2.0 still work.


Added

  • diff: add --delimiter "convenience" option. Fulfills #2447 https://github.com/dathere/qsv/pull/2464
  • slice: add stdin and snappy compressed file support https://github.com/dathere/qsv/commit/ab34a623f32bd25d9ff761972f66faa85f510a5d
  • validate: add dynamicEnum column specifier support. Fulfills #2470 https://github.com/dathere/qsv/pull/2472

Changed

  • fetch, fetchpost & json: jaq dependency upgrade - from jaq-interpret & jaq-parse to jaq-core/jaq-json/jaq-std https://github.com/dathere/qsv/pull/2458
  • fetch & fetchpost: cache compiled jaq filter https://github.com/dathere/qsv/pull/2467
  • joinp: adjust asofby test to reflect Polars py-1.20.0 behavior https://github.com/dathere/qsv/commit/853a266c866aa54598b6b1a3faa253d151a6b472
  • stats: compute string length stats for string type only https://github.com/dathere/qsv/pull/2471
  • sqlp: wordsmith fastpath explanation https://github.com/dathere/qsv/commit/4e3f85397f67cbe20562e8a84c228b7dc61e4bd7
  • refactor: standardize -q and -Q shortcut options. Fulfills #2466 https://github.com/dathere/qsv/pull/2468
  • deps: bump polars to 0.45.1 at py-polars-1.20.0 tag https://github.com/dathere/qsv/pull/2448
  • deps: bump polars to 0.45.1 at py-polars-1.21.0 tag https://github.com/dathere/qsv/commit/4525d00ecd4845feaac2062d40bb7bc64c13688f
  • deps: Bump csv-diff to 0.1.1 by @janriemer in https://github.com/dathere/qsv/pull/2456
  • deps: Bump csvlens to latest upstream https://github.com/dathere/qsv/commit/27a723eee4af046920a022605ad6c3476c0962e4
  • deps: use latest strum upstream https://github.com/dathere/qsv/commit/2ca1b0d476a20b93c786d0839cc5077e26fd6d88
  • build(deps): bump base62 from 2.2.0 to 2.2.1 by @dependabot in https://github.com/dathere/qsv/pull/2440
  • build(deps): bump chrono-tz from 0.10.0 to 0.10.1 by @dependabot in https://github.com/dathere/qsv/pull/2449
  • build(deps): bump data-encoding from 2.6.0 to 2.7.0 by @dependabot in https://github.com/dathere/qsv/pull/2444
  • build(deps): bump indexmap from 2.7.0 to 2.7.1 by @dependabot in https://github.com/dathere/qsv/pull/2461
  • build(deps): bump jsonschema from 0.28.1 to 0.28.2 by @dependabot in https://github.com/dathere/qsv/pull/2469
  • build(deps): bump jsonschema from 0.28.2 to 0.28.3 by @dependabot in https://github.com/dathere/qsv/pull/2473
  • build(deps): bump log from 0.4.22 to 0.4.25 by @dependabot in https://github.com/dathere/qsv/pull/2439
  • build(deps): bump semver from 1.0.24 to 1.0.25 by @dependabot in https://github.com/dathere/qsv/pull/2459
  • build(deps): bump serde_json from 1.0.135 to 1.0.136 by @dependabot in https://github.com/dathere/qsv/pull/2455
  • build(deps): bump serde_json from 1.0.136 to 1.0.137 by @dependabot in https://github.com/dathere/qsv/pull/2460
  • build(deps): bump simple-home-dir from 0.4.5 to 0.4.6 by @dependabot in https://github.com/dathere/qsv/pull/2445
  • build(deps): bump uuid from 1.11.1 to 1.12.0 by @dependabot in https://github.com/dathere/qsv/pull/2441
  • build(deps): bump uuid from 1.12.0 to 1.12.1 by @dependabot in https://github.com/dathere/qsv/pull/2465
  • tests: enabled Windows CI caching for faster CI tests
  • bumped numerous indirect dependencies to latest versions
  • applied select clippy lint suggestions

Fixed

  • count: Sometimes, polars count returns zero even if there are rows. Fixed by doing a regular csv reader count when polars count returns zero https://github.com/dathere/qsv/commit/abcd36524d6c26a17a2ecfac54498ecab58fe87c
  • diff: Fix name to index conversion by @janriemer. Fixes #2443 https://github.com/dathere/qsv/pull/2457
  • extdedup: refactor/fix to actually have on-disk hash table backed by a mem-mapped file. Fixes #2462 https://github.com/dathere/qsv/pull/2475
  • stats: fix stats caching as it was inadvertently deleting the stats cache even when not necessary https://github.com/dathere/qsv/commit/96e6d289d31a2b22345524fb5cc71eca0d6ffae9

Removed

  • foreach: refactored to remove unmaintained local-encoding dependency https://github.com/dathere/qsv/pull/2454
  • remove polars feature from qsvdp binary variant. We'll use py-polars from DP+ directly.

Full Changelog: https://github.com/dathere/qsv/compare/2.1.0...2.2.0

- Rust
Published by jqnatividad about 1 year ago

https://github.com/dathere/qsv - 2.1.0

[2.1.0] - 2025-01-12

Highlights:

  • join & joinp fine-tuning continues, with several join key transformation options (--ignore-leading-zeros & --norm-unicode); join fixes for --right-anti and --right-semi joins; and reverting a join performance regression with 2.0.0.
  • pivotp uses more summary statistics for even smarter aggregation suggestions

NOTE: qsv v2.1.0 is not available on crates.io. This was caused by qsv's use of a brand new string_normalize Polars feature that is not yet available on the latest release of Polars - v0.45.1. Once a new version of Polars is published with this feature, a new qsv patch release will be published to crates.io. This means that installation option 3 using cargo install will be limited to 1.0.0 - the last qsv version available on crates.io. All other installation and update options to qsv 2.1.0 still work.


Added

  • join: add --ignore-leading-zeros option https://github.com/dathere/qsv/pull/2430
  • joinp add --norm-unicode option to unicode normalize join keys https://github.com/dathere/qsv/pull/2436
  • pivotp added more smart aggregation suggestions https://github.com/dathere/qsv/pull/2428
  • template: added to qsvdp binary variant https://github.com/dathere/qsv/commit/9df85e65dedf130981ab430764b4a4cdc9382dc8
  • benchmarks: added pivotp benchmark https://github.com/dathere/qsv/commit/92e4c51cb17e5511f668b4a2cc96d9cab28a4758

Changed

  • joinp: refactored --ignore-leading-zeros handling https://github.com/dathere/qsv/pull/2433
  • Migrate from unmaintained dynfmt to dynfmt2 https://github.com/dathere/qsv/pull/2421
  • deps: bump csvlens to latest upstream https://github.com/dathere/qsv/commit/52c766da43642c2eef6f35819d8e9fb0966700a3
  • deps: bump to latest csv qsv-optimized fork https://github.com/dathere/qsv/commit/58ac650abfa51b7b8deb23d1a8917b3983515755
  • deps: bumped MiniJinja to 2.6.0 https://github.com/dathere/qsv/commit/8176368434982ba6bd206762c524a3dc47370039
  • deps: bump to latest Polars upstream
  • deps: bump qsv-stats to 0.26.0
  • build(deps): bump azure/trusted-signing-action from 0.5.0 to 0.5.1 by @dependabot in https://github.com/dathere/qsv/pull/2420
  • build(deps): bump base62 from 2.0.3 to 2.1.0 by @dependabot in https://github.com/dathere/qsv/pull/2419
  • build(deps): bump base62 from 2.1.0 to 2.2.0 by @dependabot in https://github.com/dathere/qsv/pull/2426
  • build(deps): bump phf from 0.11.2 to 0.11.3 by @dependabot in https://github.com/dathere/qsv/pull/2417
  • build(deps): bump pyo3 from 0.23.3 to 0.23.4 by @dependabot in https://github.com/dathere/qsv/pull/2431
  • build(deps): bump serde_json from 1.0.134 to 1.0.135 by @dependabot in https://github.com/dathere/qsv/pull/2416
  • build(deps): bump tokio from 1.42.0 to 1.43.0 by @dependabot in https://github.com/dathere/qsv/pull/2423
  • build(deps): bump uuid from 1.11.0 to 1.11.1 by @dependabot in https://github.com/dathere/qsv/pull/2427
  • apply several clippy suggestions
  • bumped numerous indirect dependencies to latest versions
  • bumped Rust nightly from 2024-12-19 to 2025-01-05 (same version used by Polars)
  • bump MSRV to latest Rust stable - v1.84.0

Fixed

  • join: revert optimization that actually resulted in a performance regression https://github.com/dathere/qsv/commit/e42af2b4e9ab9ef4eed43b97e343e253c50a35a1
  • join: --right-anti and --right-semi joins didn't swap headers properly https://github.com/dathere/qsv/pull/2435
  • count: polars-powered count didn't use the right data type SQL count(*) https://github.com/dathere/qsv/commit/d8c1524ca0dff4ac19164ccb8090b01fd740b571

Full Changelog: https://github.com/dathere/qsv/compare/2.0.0...2.1.0

- Rust
Published by jqnatividad about 1 year ago

https://github.com/dathere/qsv - 2.0.0

qsv v2.0.0 is here! 🎉

It took 193 releases to get to v1.0.0, and we're already at v2.0.0 a month later!?!

Yes! We wanted a running start for 2025, and qsv 2.0.0 marks qsv's biggest release yet!

  • It fully enables the "Data Resource Upload First (DRUF)" workflow, allowing Datapusher+ to infer "automagical metadata" from the data itself. It exposes two Domain Specific Language (DSL) options - Luau and MiniJinja - to enable powerful data transformation and validation capabilities. This allows data stewards to upload data first, then use qsv's DSL capabilities inside DP+ to automatically generate rich metadata - including data dictionaries, field descriptions, data quality rules, and data validation schemas. This "automagical metadata" approach dramatically reduces the friction in compiling high-quality, high-resolution metadata (using the DCAT-US 3.0 specification as a reference) that would otherwise be a manual, laborious, and error-prone process.
    Under the hood, the fetchpost, template, stats, validate and luau commands now have the necessary scaffolding to fully support this workflow inside Datapusher+ and ckanext-scheming.
  • It adds a new "smart" pivotp command, powered by Polars, to enable fast pivot operations on large datasets. It's "smart" as it uses the stats cache to automatically suggest an aggregation based on a column's data type and summary statistics. You can now pivot your data in seconds by simply specifying the columns to pivot on while blowing past Excel's PivotTable limitations.
  • stats now computes geometric mean and harmonic mean and adds string length stats, all while getting a performance boost.
  • join and joinp got a lot of love in this release, with several new options:
    • joinp: non-equi join support! 🎉💯🥳
      See "Lightning Fast and Space Efficient Inequality Joins" paper and this Polars non-equi join tracking issue.
    • join & joinp: --right-anti and --right-semi joins
    • joinp: --ignore-leading-zeros option for join keys
    • joinp: --maintain-order option to maintain the order of the either the left or right dataset in the output
    • joinp: expanded --cache-schema options to make joinp smarter/faster by leveraging the stats cache
    • join: --keys-output option to write successfully joined keys to a separate output file.

This release lays the groundwork for the outliers "smart" command to quickly identify outliers using stats/frequency info.

It also sets the stage for an initial implementation of our "Data Concierge" that leverages all the high-quality, high-res metadata we automagically compile with DRUF to enable Metadata Gardening Agents to proactively link seemingly unrelated data and glean insights as it constantly grooms the Data Catalog - effectively making it a FAIR Data Factory.


Added

  • fetchpost: add --globals-json option https://github.com/dathere/qsv/pull/2357
  • fixlengths: add --remove-empty option; refactored for performance. Fulfills #2391. https://github.com/dathere/qsv/pull/2411
  • join: add --keys-output option. Fulfills #2407. https://github.com/dathere/qsv/pull/2408
  • join: add --right-anti and --right-semi options. Fulfills #2379. https://github.com/dathere/qsv/pull/2380
  • joinp: add non-equi join support! 🎉💯🥳 https://github.com/dathere/qsv/pull/2409
  • joinp: add --ignore-leading-zeros option. Fulfills #2398. https://github.com/dathere/qsv/pull/2400
  • joinp: add --maintain-order option https://github.com/dathere/qsv/pull/2338
  • joinp: add --right-anti and --right-semi options. Fulfills #2377. https://github.com/dathere/qsv/pull/2378
  • luau: addl helper functions. Fulfills #1782. https://github.com/dathere/qsv/pull/2362
  • luau: add qsv_writejson helper https://github.com/dathere/qsv/pull/2375
  • pivotp: new polars polars-powered command. Fulfills #799. https://github.com/dathere/qsv/pull/2364
  • pivotp: "smart" pivotp. https://github.com/dathere/qsv/pull/2367
  • stats: add geometric mean and harmonic mean. Fulfills #2227. https://github.com/dathere/qsv/pull/2342
  • stats: add string length stats to set stage for upcoming outliers "smart" command to quickly identify outliers using stats/frequency info https://github.com/dathere/qsv/pull/2390
  • template: add --globals-json option https://github.com/dathere/qsv/pull/2356
  • tojsonl: add --quiet option. Fulfills #2335. https://github.com/dathere/qsv/pull/2336
  • validate: add --validate-schema option to check if the JSON Schema itself is valid https://github.com/dathere/qsv/pull/2393
  • contrib(completions): add joinp --ignore-case and slice --invert by @rzmk in https://github.com/dathere/qsv/pull/2322
  • contrib(completions): add --quiet to tojsonl by @rzmk in https://github.com/dathere/qsv/pull/2337
  • ci: add qsvglibc2.31-headless to action by @rzmk in https://github.com/dathere/qsv/pull/2330
  • Add license to MSI installer by @rzmk in https://github.com/dathere/qsv/pull/2321

Changed

  • lens: optimized csvlens library usage, dropping clap dependency https://github.com/dathere/qsv/pull/2403
  • pivotp: an even smarter pivotp https://github.com/dathere/qsv/pull/2368
  • stats: performance boost https://github.com/dathere/qsv/commit/51349ba8f0121804a1a6766371f1e17c0da800b6
  • Update deb package by @tino097 in https://github.com/dathere/qsv/pull/2226
  • ci: attempt using files-folder instead of files by @rzmk in https://github.com/dathere/qsv/pull/2320
  • Setting QSVFREEMEMORYHEADROOM_PCT to 0 disables memory availability check https://github.com/dathere/qsv/pull/2353
  • build(deps): bump actix-governor from 0.7.0 to 0.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2351
  • build(deps): bump bytemuck from 1.20.0 to 1.21.0 by @dependabot in https://github.com/dathere/qsv/pull/2361
  • build(deps): bump chrono from 0.4.38 to 0.4.39 by @dependabot in https://github.com/dathere/qsv/pull/2345
  • build(deps): bump crossbeam-channel from 0.5.13 to 0.5.14 by @dependabot in https://github.com/dathere/qsv/pull/2354
  • build(deps): bump flexi_logger from 0.29.6 to 0.29.7 by @dependabot in https://github.com/dathere/qsv/pull/2348
  • build(deps): bump governor from 0.7.0 to 0.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2347
  • build(deps): bump itertools from 0.13.0 to 0.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2413
  • build(deps): bump jsonschema from 0.26.1 to 0.26.2 by @dependabot in https://github.com/dathere/qsv/pull/2355
  • build(deps): bump jsonschema from 0.26.2 to 0.27.0 by @dependabot in https://github.com/dathere/qsv/pull/2371
  • build(deps): bump jsonschema from 0.27.1 to 0.28.0 by @dependabot in https://github.com/dathere/qsv/pull/2389
  • build(deps): bump jsonschema from 0.28.0 to 0.28.1 by @dependabot in https://github.com/dathere/qsv/pull/2396
  • bump polars from 0.44.2 to 0.45 https://github.com/dathere/qsv/pull/2340
  • build(deps): bump polars from 0.45.0 to 0.45.1 by @dependabot in https://github.com/dathere/qsv/pull/2344
  • bump pyo3 from 0.22 to 0.23 now that Polars supports it https://github.com/dathere/qsv/pull/2352
  • build(deps): bump redis from 0.27.5 to 0.27.6 by @dependabot in https://github.com/dathere/qsv/pull/2331
  • build(deps): bump reqwest from 0.12.9 to 0.12.11 by @dependabot in https://github.com/dathere/qsv/pull/2385
  • build(deps): bump reqwest from 0.12.11 to 0.12.12 by @dependabot in https://github.com/dathere/qsv/pull/2395
  • build(deps): bump rfd from 0.15.1 to 0.15.2 by @dependabot in https://github.com/dathere/qsv/pull/2404
  • build(deps): bump serde from 1.0.215 to 1.0.216 by @dependabot in https://github.com/dathere/qsv/pull/2349
  • build(deps): bump serde from 1.0.216 to 1.0.217 by @dependabot in https://github.com/dathere/qsv/pull/2384
  • build(deps): bump serde_json from 1.0.133 to 1.0.134 by @dependabot in https://github.com/dathere/qsv/pull/2365
  • build(deps): bump sysinfo from 0.32.1 to 0.33.0 by @dependabot in https://github.com/dathere/qsv/pull/2334
  • build(deps): bump sysinfo from 0.33.0 to 0.33.1 by @dependabot in https://github.com/dathere/qsv/pull/2383
  • deps: bump tabwriter to 1.4.1 https://github.com/dathere/qsv/commit/bbcbeba193b7b1808bcd359c460fb688b49107f0
  • build(deps): bump tokio from 1.41.1 to 1.42.0 by @dependabot in https://github.com/dathere/qsv/pull/2333
  • build(deps): bump xxhash-rust from 0.8.12 to 0.8.13 by @dependabot in https://github.com/dathere/qsv/pull/2359
  • build(deps): bump xxhash-rust from 0.8.13 to 0.8.14 by @dependabot in https://github.com/dathere/qsv/pull/2372
  • build(deps): bump xxhash-rust from 0.8.14 to 0.8.15 by @dependabot in https://github.com/dathere/qsv/pull/2392
  • apply several clippy suggestions
  • bumped numerous indirect dependencies to latest versions
  • bumped Rust nightly from 2024-11-28 to 2024-12-19 (same version used by Polars)

Fixed

  • joinp: refactor --cache-schema option. Resolves #2369. https://github.com/dathere/qsv/pull/2370
  • extsort underflow in CSV mode. Resolves #2391. https://github.com/dathere/qsv/pull/2412
  • instantiate logger properly https://github.com/dathere/qsv/commit/9c0c1a7a63ef3773e599f6fa91e6fa3b734936df
  • fix util::get_stats_records() to no longer infer boolean in StatsMode::PolarsSchema. Resolves #2369. https://github.com/dathere/qsv/commit/cebb6642daf8b528ed8c95be9fc47709abe1c50a

Full Changelog: https://github.com/dathere/qsv/compare/1.0.0...2.0.0

- Rust
Published by jqnatividad about 1 year ago

https://github.com/dathere/qsv - 1.0.0

qsv v1.0.0 is here! 🎉

After over 3 years of development, nearly 200 releases, and 11,000+ commits, qsv has finally reached v1.0.0!

What started as a hobby project to learn Rust during COVID has evolved into a powerful data wrangling tool used in multiple datHere products, open source projects, and even in several mission-critical production environments!

To mark this major milestone, this larger than usual release includes major performance improvements, new features, and various optimizations!


Added

  • joinp: add --ignore-case option https://github.com/dathere/qsv/pull/2287
  • py: add ability to load python expression from file https://github.com/dathere/qsv/pull/2295
  • replace: add --not-one flag (resolves #2305) by @rzmk in https://github.com/dathere/qsv/pull/2307
  • slice: add --invert option https://github.com/dathere/qsv/pull/2298
  • stats: add dataset-level stats https://github.com/dathere/qsv/pull/2297
  • sqlp: auto-decompression of gzip, zstd & zlib compressed csv files with read_csv table function (implements suggestion from @wardi in #2301) https://github.com/dathere/qsv/pull/2315
  • template: add lookup support https://github.com/dathere/qsv/pull/2313
  • added ui feature to make it easier to make a headless build of qsv https://github.com/dathere/qsv/pull/2289
  • added better panic handling https://github.com/dathere/qsv/pull/2304
  • added new benchmark for template command https://github.com/dathere/qsv/commit/cd7e480de5ff1e2766a16b8d21767b76fbf10d35
  • added 📚 lookup support legend https://github.com/dathere/qsv/commit/b46de73f57ba35ee08581a4f20809a5f581d461b

Changed

  • move qsv from personal Github repo to datHere GitHub org https://github.com/dathere/qsv/pull/2317
  • template: parallelized template rendering for significant speedups https://github.com/dathere/qsv/pull/2273
  • simplify input format check https://github.com/dathere/qsv/pull/2309
  • bump embedded luau from 0.650 to 0.653 https://github.com/dathere/qsv/commit/986a1d3b4e60f15c25ef8a157c7e9e205ae8e7a9
  • deps: Switch back to simple-home-dir from simple-expand-tilde https://github.com/dathere/qsv/pull/2319
  • deps: Add minijinja contrib https://github.com/dathere/qsv/pull/2276
  • deps: bump pyo3 down to 0.21.2 because polars-mem-engine is not compatible with pyo3 0.23.x yet https://github.com/dathere/qsv/commit/7f9fc8a6cfe94a104d33e895ecae11e2f40274ee
  • build(deps): bump base62 from 2.0.2 to 2.0.3 by @dependabot in https://github.com/dathere/qsv/pull/2281
  • build(deps): bump bytemuck from 1.19.0 to 1.20.0 by @dependabot in https://github.com/dathere/qsv/pull/2299
  • build(deps): bump bytes from 1.8.0 to 1.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2314
  • build(deps): bump file-format from 0.25.0 to 0.26.0 by @dependabot in https://github.com/dathere/qsv/pull/2277
  • build(deps): bump hashbrown from 0.15.1 to 0.15.2 by @dependabot in https://github.com/dathere/qsv/pull/2310
  • build(deps): bump itoa from 1.0.11 to 1.0.12 by @dependabot in https://github.com/dathere/qsv/pull/2300
  • build(deps): bump itoa from 1.0.12 to 1.0.13 by @dependabot in https://github.com/dathere/qsv/pull/2302
  • build(deps): bump itoa from 1.0.13 to 1.0.14 by @dependabot in https://github.com/dathere/qsv/pull/2311
  • build(deps): bump mlua from 0.10.0 to 0.10.1 by @dependabot in https://github.com/dathere/qsv/pull/2280
  • build(deps): bump mlua from 0.10.1 to 0.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2316
  • build(deps): bump serial_test from 3.1.1 to 3.2.0 by @dependabot in https://github.com/dathere/qsv/pull/2279
  • build(deps): bump minijinja from 2.4.0 to 2.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2284
  • build(deps): bump minijinja-contrib from 2.3.1 to 2.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2283
  • build(deps): bump rfd from 0.15.0 to 0.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2291
  • build(deps): bump sanitize-filename from 0.5.0 to 0.6.0 by @dependabot in https://github.com/dathere/qsv/pull/2275
  • build(deps): bump serde from 1.0.214 to 1.0.215 by @dependabot in https://github.com/dathere/qsv/pull/2286
  • build(deps): bump serde_json from 1.0.132 to 1.0.133 by @dependabot in https://github.com/dathere/qsv/pull/2292
  • build(deps): bump tempfile from 3.13.0 to 3.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2278
  • build(deps): bump tokio from 1.41.0 to 1.41.1 by @dependabot in https://github.com/dathere/qsv/pull/2274
  • build(deps): bump url from 2.5.3 to 2.5.4 by @dependabot in https://github.com/dathere/qsv/pull/2306
  • applied several clippy suggestions
  • bumped numerous indirect dependencies to latest versions
  • bumped MSRV to latest Rust stable (1.83.0)
  • bumped Rust nightly from 2024-11-01 to 2024-11-28, the same version used by Polars

Fixed

  • fix get_stats_records() helper to handle input files with embedded spaces (fixes #2294) https://github.com/dathere/qsv/pull/2296
  • added better panic handling (fixes #2301) https://github.com/dathere/qsv/pull/2304
  • implement simple format check for input files (fixes #2301) https://github.com/dathere/qsv/pull/2308

Removed

  • removed simple-expand-tilde dependency in favor of simple-home-dir https://github.com/dathere/qsv/pull/2318
  • removed patched fork of indicatif now that 0.17.9 is released, fixing GH unmaintained advisory for instant https://github.com/dathere/qsv/commit/33fa54a1651ce29d286c0e1ff4f3d77bbbd2ffd5
  • removed clipboard command from qsvlite binary variant https://github.com/dathere/qsv/commit/9c663d84da49cbbe53d7c9df6bd747cad0d9ba24

Full Changelog: https://github.com/dathere/qsv/compare/0.138.0...1.0.0

- Rust
Published by jqnatividad about 1 year ago

https://github.com/dathere/qsv - 0.138.0

Highlights:

  • :star: New template command for rendering templates with CSV data.
    Generate complex documents from CSVs (Form letters, HTML, JSON, XML files, etc.) with the powerful MiniJinja template engine (Example template).

  • :star: New lookup module for fetching reference data from remote and local files.
    In addition to the typical http/https schemes for remote files, qsv adds two additional schemes - CKAN:// and datHere://, fetching lookup data from a CKAN site or datHere maintained reference data respectively. The lookup module has simple file-based caching as well to minimize repeated fetching of typically static reference data (default cache age: 600 seconds).
    The lookup module is now being used by the luau (for its qsv_register_lookup helper) and validate (for its dynamicEnum custom JSON Schema keyword) commands. More commands will take advantage of this module over time (e.g. apply, geocode, template, sqlp, etc.) to do extended lookups (e.g. lookup Census information given spatiotemporal data - like demographic info of a Census tract).

  • :sparkles: Enhanced fetchpost with MiniJinja templating for payload construction.
    Previously, fetchpost was limited to posting url-encoded HTML Form data with content type application/x-www-form-urlencoded. Now with the new --payload-tpl and --content-type options, users can post request bodies rendered with MiniJinja and specify other content types (typically application/json, text/plain, multipart/form-data) as well.

  • :sparkles: Improved Polars integration with automatic schema detection
    The joinp and sqlp commands now use qsv's stats cache to automatically determine column data types, rather than having Polars scan a sample of rows. This provides two key benefits:

    1. Faster execution by skipping Polars' schema inference step
    2. GUARANTEED data type inferencing since the stats cache analyzes the entire dataset, not just a sample
  • :running: fast-float2 crate for faster float parsing
    Casting string/bytes to float is now much faster (2 to 8x faster than Rust's standard library) with fast-float2.

  • :muscle: Major dependency updates including Polars 0.44.2, Luau 0.650, mlua 0.10.0 and jsonschema 0.26.1
    These core crates underpin qsv's advanced commands. Using the latest version of these crates allow qsv to stay true to its goal of being the fastest and most comprehensive data-wrangling toolkit.


Added

  • added lookup module - enabling fetching and caching of reference data from remote and local files https://github.com/jqnatividad/qsv/pull/2262
  • fetchpost: add --payload-tpl <file> and --content-type options to construct payload using MiniJinja with the appropriate content-type https://github.com/jqnatividad/qsv/pull/2268 https://github.com/jqnatividad/qsv/commit/592149867997da6ac56d20a7e7f84252b2baeb2a
  • joinp: derive polars schema from stats cache https://github.com/jqnatividad/qsv/commit/86fe22ee4e3677dc702eaf21175c60ceb8166001
  • sqlp: derive polars schema from stats cache https://github.com/jqnatividad/qsv/pull/2256
  • template: new command to render MiniJinja templates with CSV data https://github.com/jqnatividad/qsv/pull/2267
  • validate: add dynamicEnum lookup support https://github.com/jqnatividad/qsv/pull/2265
  • contrib(completions): add template command and update fetchpost by @rzmk in https://github.com/jqnatividad/qsv/pull/2269
  • add fast-float2 dependency for faster bytes to float conversion https://github.com/jqnatividad/qsv/commit/7590e4ed171eeb6804845e1b54bec0fa26cca706 https://github.com/jqnatividad/qsv/commit/3ca30aa878ed3c4dc58944d46f53fb0c4b955356
  • added more benchmarks for new/updated commands https://github.com/jqnatividad/qsv/commit/f8a1d4fff11d78860c102c1375653822ee95ca58 https://github.com/jqnatividad/qsv/commit/cd7e480de5ff1e2766a16b8d21767b76fbf10d35

Changed

  • luau: adapt to mlua 0.10 API changes https://github.com/jqnatividad/qsv/commit/268cb45a04a49360befb81af76cc1cddd6307286
  • luau: refactored stage management https://github.com/jqnatividad/qsv/commit/31ef58a82b8f80fe0b29260f9170f10220c73714
  • luau: now uses the lookup module https://github.com/jqnatividad/qsv/commit/2f4be3473a90252df4fd559a5f3b38246a3da696
  • stats: minor perf refactoring https://github.com/jqnatividad/qsv/commit/6cdd6ea94adbae063e7fb6d9da71dac0c86adc12
  • build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2243
  • build(deps): bump azure/trusted-signing-action from 0.4.0 to 0.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2239
  • build(deps): bump bytes from 1.7.2 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2231
  • build(deps): bump cached from 0.53.1 to 0.54.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2272
  • build(deps): bump flexi_logger from 0.29.3 to 0.29.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2229
  • build(deps): bump flexi_logger from 0.29.4 to 0.29.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2261
  • build(deps): bump flexi_logger from 0.29.5 to 0.29.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/2266
  • build(deps): bump hashbrown from 0.15.0 to 0.15.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2270
  • build(deps): bump jsonschema from 0.24.0 to 0.24.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2234
  • build(deps): bump jsonschema from 0.24.1 to 0.24.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2238
  • build(deps): bump jsonschema from 0.24.2 to 0.24.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2240
  • build(deps): bump jsonschema from 0.25.0 to 0.25.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2244
  • build(deps): bump jsonschema from 0.26.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2260
  • build(deps): bump regex from 1.11.0 to 1.11.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2242
  • build(deps): bump reqwest from 0.12.8 to 0.12.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/2258
  • build(deps): bump serde from 1.0.210 to 1.0.211 by @dependabot in https://github.com/jqnatividad/qsv/pull/2232
  • build(deps): bump serde from 1.0.211 to 1.0.213 by @dependabot in https://github.com/jqnatividad/qsv/pull/2236
  • build(deps): bump serde from 1.0.213 to 1.0.214 by @dependabot in https://github.com/jqnatividad/qsv/pull/2259
  • build(deps): bump simd-json from 0.14.1 to 0.14.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2235
  • build(deps): bump tokio from 1.40.0 to 1.41.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2237
  • deps: updated our fork of the csv crate with more perf optimizations https://github.com/jqnatividad/qsv/commit/eae7d764bd31d717bdf123646ea85c81ed829829
  • deps: use calamine upstream with unreleased fixes https://github.com/jqnatividad/qsv/commit/4cc7f37e9c34b712ae2c5f43c018b2d6a6655ebb
  • deps: use our csvlens fork untl PR removing unneeded arboard features is merged https://github.com/jqnatividad/qsv/commit/bb3232205b7a948848c2949bcaf3b54e54f3d49b
  • deps: bump jsonschema from 0.25 to 0.26 https://github.com/jqnatividad/qsv/pull/2251
  • deps: bump embedded Luau from 0.640 to 0.650 https://github.com/jqnatividad/qsv/commit/8c54b875bf8768849b128ab15d96c33b02be180b https://github.com/jqnatividad/qsv/commit/aca30b072ecb6bb22d7edbe8ddef348649a5d699
  • deps: bump mlua from 0.9 to 0.10 https://github.com/jqnatividad/qsv/pull/2249
  • deps: bump Polars from 0.43.1 at py-1.11.0 tag to latest 0.44.2 upstream https://github.com/jqnatividad/qsv/pull/2255 https://github.com/jqnatividad/qsv/commit/0e40a4429b4ef219ab7a11c91767e95778470ef2
  • apply select clippy lint suggestions
  • updated indirect dependencies
  • aligned Rust nightly to Polars nightly - 2024-10-28 - https://github.com/jqnatividad/qsv/commit/245bcb55af416960aa603c05de960289f6125c5c

Fixed

  • fix documentation typo: it's → its by @tmtmtmtm in https://github.com/jqnatividad/qsv/pull/2254

Removed

  • removed need to set RAYONNUMTHREADS env var and just call the Rayon API directly https://github.com/jqnatividad/qsv/commit/aa6ef89eceac89c3d1ed19068e0e23a451c4402d
  • removed unneeded create_dir_all_threadsafe helper now that std::createdirall is threadsafe https://github.com/jqnatividad/qsv/commit/d0af83bfbd0430fa22f039bd00615380110f456e

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.137.0...0.138.0

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.137.0

Highlights:

  • extdedup & extsort now support two modes - LINE mode and CSV mode. Previously, both commands only sorted on a line-by-line basis (LINE mode).
    With the addition of CSV mode, you can now deduplicate or sort CSV files on a column-by-column basis, with the powerful --select option to specify which columns to deduplicate or sort on. This is especially useful for large CSV files with many columns, where you only want to deduplicate or sort on a subset of columns. And since both commands use the disk and are streaming, they can handle files larger than memory.
  • sqlp now has a --cache-schema option that caches the schema of the input CSV file, which can significantly speed up subsequent queries on the same file.
  • fetch and fetchpost have been updated to use the jaq (a jq-like tool for parsing JSON) crate instead of the jql crate. This change was made to improve performance and to make the commands more consistent with the json command which also uses jaq. Furthermore, jaq is a clone of jq - which is widely used and has a large community, so it should be more familiar to users.
  • stats is a tad faster as we keep squeezing more performance from this central command.
  • validate is now faster and more memory efficient due to optimizations in the jsonschema crate and minor performance improvements in the validate command itself.

Added

  • extdedup: now supports two modes - LINE mode and CSV mode https://github.com/jqnatividad/qsv/pull/2208
  • extsort: now also has two modes - CSV mode and LINE mode https://github.com/jqnatividad/qsv/pull/2210
  • sqlp: add --cache-schema option https://github.com/jqnatividad/qsv/pull/2224
  • added sqlp --cache-schema benchmarks

Changed

  • apply & applydp: use smallvec for operations vector & other minor performance optimizations https://github.com/jqnatividad/qsv/pull/2219 & https://github.com/jqnatividad/qsv/commit/bc837ae698f3aee06ea9b846b98ea0c75820a22d
  • apply & applydp: specify min_length for parallel iterators https://github.com/jqnatividad/qsv/commit/7d6ce5ec9675755abd5942a5e9e731592961700d
  • fetch & fetchpost: replace jql with jaq https://github.com/jqnatividad/qsv/pull/2222
  • stats: performance optimizations https://github.com/jqnatividad/qsv/commit/f205809549ac275078a95bc2821a583611955ad0 https://github.com/jqnatividad/qsv/commit/e26c27f58df688d7bfb2185ad54d4fe010b1fccf https://github.com/jqnatividad/qsv/commit/4579c1bfba4eca21d7480694780e39f6966a88a0
  • validate: specify min_length for parallel iterators https://github.com/jqnatividad/qsv/commit/a5b818562d5db7d65f00e5acd2c8bf7d44bd869a
  • build(deps): bump calamine from 0.26.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2204
  • build(deps): bump csvs_convert from 0.8.14 to 0.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2215
  • build(deps): bump flexi_logger from 0.29.2 to 0.29.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2209
  • build(deps): bump jsonschema from 0.23.0 to 0.24.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2223
  • build(deps): bump pyo3 from 0.22.3 to 0.22.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2207
  • build(deps): bump pyo3 from 0.22.4 to 0.22.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2212
  • build(deps): bump redis from 0.27.3 to 0.27.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2202
  • build(deps): bump redis from 0.27.4 to 0.27.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2217
  • build(deps): bump serde_json from 1.0.129 to 1.0.130 by @dependabot in https://github.com/jqnatividad/qsv/pull/2218
  • build(deps): bump serde_json from 1.0.131 to 1.0.132 by @dependabot in https://github.com/jqnatividad/qsv/pull/2220
  • build(deps): bump uuid from 1.10.0 to 1.11.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2213
  • apply select clippy lints
  • bumped indirect dependencies
  • bumped MSRV to 1.82

Fixed:

  • fix performance regression in batched commands by refactoring optimal_batch_size to require indexed CSV files https://github.com/jqnatividad/qsv/pull/2206

Removed:

  • fetch & fetchpost: removed jql options; replaced with jaq https://github.com/jqnatividad/qsv/pull/2222

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.136.0...0.137.0

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.136.0

:tada: qsv pro is now available in the Microsoft Store! :tada:

It's Data Wrangling Democratized on the Desktop, featuring:

  • :bar_chart: Familiar Spreadsheet Interface
    tap the power of qsv to query, analyze, enrich, scrub and transform huge Excel files and multi-gigabyte CSV files in seconds, without having to deal with the command-line.
  • CKAN CKAN desktop client
    designed to make data publishing easier for portal operators and data stewards using the CKAN CKAN platform.
  • :inbox_tray: Flow
    allows you to build custom node-based flows and data pipelines using a visual interface.
  • :wrench: Toolbox
    features an ever-expanding library of reusable scripts for common data-wrangling use cases.
  • :star: and more!
    Natural Language Interface (RAG), Polars SQL query support, an API, Python/Luau support, automatic Data Dictionaries, DCAT 3 metadata profile inferencing, along with a retinue of other cloud-based services (e.g. customizable street-level geocoding, data feeds, reference data lookups, geo-ip lookups, cloud storage support, .qsv file format, etc.) that will be unveiled in future versions.

Like qsv, we're iterating rapidly with qsv pro, so your feedback is essential. Give it a try!

Get it from https://qsvpro.dathere.com or

Other highlights: * excel: new --table option for XLSX files; new --header-row option; expanded --range option, adding support for Named Ranges and absolute ranges (e.g. Sheet2!$A$1:$J$10); and expanded metadata export now including Named Ranges and Tables (for XLSX files) * Improved performance for several commands (apply, datefmt, tojsonl and validate) through automatic batch size optimization * validate: dynamicEnum custom JSON Schema keyword in validate command (renamed from dynenum) and enhanced email validation * schema: automatic JSON Schema const inferencing for columns with just one value * Significant dependency updates, including latest upstream versions of Polars, jsonschema, and serde_json with unreleased performance upgrades, new features and fixes

NOTE: You can see qsv & qsv pro in action in our "The Problem with Data Portals" webinar Wed, Oct 23, 2024. 1-2pm EDT

Added

  • :tada: qsv pro is now in the Microsoft Store!!! :tada:
  • apply, datefmt, tojsonl, validate: added logic to automatically determine optimal batch size for better parallelization https://github.com/jqnatividad/qsv/pull/2178
  • enum: added --new-column support for all enum modes, not just --increment https://github.com/jqnatividad/qsv/pull/2173
  • excel: new --table option for XLSX files https://github.com/jqnatividad/qsv/pull/2194
  • excel: new --header-row option https://github.com/jqnatividad/qsv/commit/458f79ad9f4da504c68d73b48e83ad53b9634027
  • excel: expanded range and metadata options https://github.com/jqnatividad/qsv/pull/2195
  • schema: added JSON Schema automatic const inferencing https://github.com/jqnatividad/qsv/pull/2180
  • Add signing step to qsv MSI installer GitHub Action by @rzmk in https://github.com/jqnatividad/qsv/pull/2182
  • contrib(completions): add --table option to qsv excel by @rzmk in https://github.com/jqnatividad/qsv/pull/2197
  • completions: add --header-row option to qsv excel https://github.com/jqnatividad/qsv/commit/e8794d569185245f857659cdc299ea86029dd841
  • added new apply operations sentiment benchmark https://github.com/jqnatividad/qsv/commit/b745e6438b64686810e4d1df4fa2e6748ba93ff8
  • docs: added indexing section to PERFORMANCE.md https://github.com/jqnatividad/qsv/commit/804145a5304091c36728a8cdde4d56f879f71c15

Changed

  • stats: various minor micro-optimizations https://github.com/jqnatividad/qsv/commit/62d95fc6db2c34916160db88e4235719749a5f23 https://github.com/jqnatividad/qsv/commit/2c2862a75d6c0b2651516da30a7e6207a0043670
  • validate: renamed custom keyword dynenum to dynamicEnum to be more consistent with JSON schema naming conventions https://github.com/jqnatividad/qsv/compare/0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
  • validate: optimizations for increased performance; replace serdejson with simdjson https://github.com/jqnatividad/qsv/compare/0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
  • apply new clippy::ref_option lint to Config::new API https://github.com/jqnatividad/qsv/pull/2192
  • Update debian package readme by @tino097 in https://github.com/jqnatividad/qsv/pull/2187
  • deps: bump calamine from 0.25 to 0.26 https://github.com/jqnatividad/qsv/commit/b42279a66144264bde9333068c47c530e3945f8c
  • deps: jsonschema use latest 0.22.3 upstream with unreleased features/fixes
  • deps: polars use latest 0.43.1 upstream with unreleased features/fixes
  • deps: created our own fork of unmaintained vader_sentiment crate https://github.com/jqnatividad/qsv/commit/b4267610f39d13eb8939c86f3b5e70033aa95a0c
  • deps: use serde_json upstream with unreleased perf improvement/fixes https://github.com/jqnatividad/qsv/blob/1c1174b3b8b65d9dfd9c841597366fb09d0a047c/Cargo.toml#L221
  • build(deps): bump flate2 from 1.0.33 to 1.0.34 by @dependabot in https://github.com/jqnatividad/qsv/pull/2171
  • build(deps): bump flexi_logger from 0.29.0 to 0.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2189
  • build(deps): bump flexi_logger from 0.29.1 to 0.29.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2196
  • build(deps): bump hashbrown from 0.14.5 to 0.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2186
  • build(deps): bump jsonschema from 0.20.0 to 0.21.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2177
  • build(deps): bump jsonschema from 0.22.1 to 0.22.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2191
  • build(deps): bump regex from 1.10.6 to 1.11.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2176
  • build(deps): bump reqwest from 0.12.7 to 0.12.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/2183
  • build(deps): bump simd-json from 0.14.0 to 0.14.1 https://github.com/jqnatividad/qsv/pull/2199
  • build(deps): bump simple-expand-tilde from 0.4.2 to 0.4.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2190
  • build(deps): bump sysinfo from 0.31.4 to 0.32.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2193
  • build(deps): bump tempfile from 3.12.0 to 3.13.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2175
  • apply select clippy lints
  • bumped indirect dependencies
  • aligned Rust nightly to Polars nightly - 2024-09-29 https://github.com/jqnatividad/qsv/commit/7cd2de1151b2299d9b75a9c8b1a3e21dc9c992e2

Fixed

  • schema: fix enum so it only adds a list when the number of unique values > --enum-threshold https://github.com/jqnatividad/qsv/pull/2180
  • Upload artifact fix for Debian package publishing by @tino097 in https://github.com/jqnatividad/qsv/pull/2168
  • fixed typos configuration https://github.com/jqnatividad/qsv/commit/627de891d8fd358aadf8c302552e8a99c54ed959
  • fixed various GitHub Actions publishing workflow issues

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.135.0...0.136.0

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.135.0

Highlights

JSON Schema validation just got a whole lot more powerful with the introduction of qsv's custom dynenum keyword! With dynenum, you can now dynamically lookup valid enum values from a CSV (on the filesystem or on a URL), allowing for more flexible and responsive data validation.

Unlike the standardenum keyword, dynenum does not require hardcoding valid values at schema definition time, and can be used to validate data against a changing set of valid values.

For an example, see https://github.com/jqnatividad/qsv/discussions/1872#discussioncomment-10725628.

In an upcoming qsv pro release, we're planning on making dynenum even more powerful by allowing you to easily specify high-value reference data (e.g. US Census data, World Bank data, data.gov, etc.) that is maintained at data.dathere.com and other CKAN instances.

This release also add the custom currency JSON Schema format, which enables currency validation according to the ISO 4217 standard.

The Polars engine was also upgraded to 0.43.1 at the py-1.81.1 tag - making for various under-the-hood improvements for the sqlp, joinp and count commands, as we set the stage for more Polars-powered features in future releases.


Added

  • foreach: enabled foreach command on Windows prebuilt binaries https://github.com/jqnatividad/qsv/commit/def9c8fa98cd214f0db839b64bcd12764dcfba43
  • lens: added support for QSVSNIFFDELIMITER env var and snappy auto-decompression https://github.com/jqnatividad/qsv/commit/8340e8949c4b60669bc95c432c661a8c374ca422
  • sample: add --max-size option https://github.com/jqnatividad/qsv/commit/e845a3cc1dcbbceda86bb7fe132c5040d23ce78b
  • validate: added dynenum custom JSON Schema keyword for dynamic validation lookups https://github.com/jqnatividad/qsv/pull/2166
  • tests: add tests for https://100.dathere.com/lessons/2 by @rzmk in https://github.com/jqnatividad/qsv/pull/2141
  • added stats_sorted and frequency_sorted benchmarks
  • added validate_dynenum benchmarks

Changed

  • json: add error for empty key and update usage text by @rzmk in https://github.com/jqnatividad/qsv/pull/2167
  • prompt: gate prompt command behind prompt feature https://github.com/jqnatividad/qsv/pull/2163
  • validate: expanded currency JSON Schema custom format to support ISO 4217 currency codes and alternate formats https://github.com/jqnatividad/qsv/commit/5202508e5c3969b279c20cf80bb1e37d89afd826
  • validate: migrate to new jsonschema crate api https://github.com/jqnatividad/qsv/commit/5d6505426c652e7db4bb602c1bf9d302e6a09214
  • Update ubuntu version for deb package by @tino097 in https://github.com/jqnatividad/qsv/pull/2126
  • contrib(completions): update completions for qsv v0.134.0 and fix subcommand options by @rzmk in https://github.com/jqnatividad/qsv/pull/2135
  • contrib(completions): add --max-size completion for sample by @rzmk in https://github.com/jqnatividad/qsv/pull/2142
  • deps: bump to polars 0.43.1 at py-1.81.1 https://github.com/jqnatividad/qsv/pull/2130
  • deps: switch back to calamine upstream instead of our fork https://github.com/jqnatividad/qsv/commit/677458faa4439b1b34c8a3556687a031ed184e4e
  • build(deps): bump actix-governor from 0.5.0 to 0.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2146
  • build(deps): bump anyhow from 1.0.87 to 1.0.88 by @dependabot in https://github.com/jqnatividad/qsv/pull/2132
  • build(deps): bump arboard from 3.4.0 to 3.4.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2137
  • build(deps): bump bytes from 1.7.1 to 1.7.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2148
  • build(deps): bump geosuggest-core from 0.6.3 to 0.6.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2153
  • build(deps): bump geosuggest-utils from 0.6.3 to 0.6.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2154
  • build(deps): bump jql-runner from 7.1.13 to 7.2.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2165
  • build(deps): bump jsonschema from 0.18.1 to 0.18.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2127
  • build(deps): bump jsonschema from 0.18.2 to 0.18.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2134
  • build(deps): bump jsonschema from 0.18.3 to 0.19.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2144
  • build(deps): bump jsonschema from 0.19.1 to 0.20.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2152
  • build(deps): bump pyo3 from 0.22.2 to 0.22.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2143
  • build(deps): bump rfd from 0.14.1 to 0.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2151
  • build(deps): bump simple-expand-tilde from 0.4.0 to 0.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2129
  • build(deps): bump qsv_currency from 0.6.0 to 0.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2159
  • build(deps): bump qsv_docopt from 1.7.0 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2136
  • build(deps): bump redis from 0.26.1 to 0.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2133
  • build(deps): bump simdutf8 from 0.1.4 to 0.1.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2164
  • bump indirect dependencies
  • apply select clippy lint suggestions
  • several usage text/documentation improvements
  • bump MSRV to 1.81.0

Fixed

  • validate: correct fail_validation_error! macro; reformat error messages to use hyphens as the JSONschema error message already starts with "error:" https://github.com/jqnatividad/qsv/commit/9a2552481a07759847efe6025b402297ecba7e19
  • moved --help output from stderr to stdout as per GNU CLI guidelines https://github.com/jqnatividad/qsv/pull/2138
  • lens: fixed parsing of lens options https://github.com/jqnatividad/qsv/commit/1cdd1bcac29fd2411521ac95fa87595de74cbb1b
  • searchset: fixed usage text for <regexset-file> https://github.com/jqnatividad/qsv/commit/9a60fb088a326ee97ed1b147c4c3686b6b8aaeeb
  • used patched forks of arrow, csvlens and xlsxwriter crates that replaces a dependency on an old version of lexical-core with known soundness issues - https://rustsec.org/advisories/RUSTSEC-2023-0086. Once those crates have updated their lexical-coredependency, we will revert to the original crates.

Removed

  • removed prompt command from qsvlite https://github.com/jqnatividad/qsv/pull/2163
  • publish: remove lens feature from i686 targets as it does not compile https://github.com/jqnatividad/qsv/commit/959ca7686f8656c98de9257d11f1f762852bdf9d
  • deps: remove anyhow dependency https://github.com/jqnatividad/qsv/pull/2150

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.134.0...0.135.0

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.134.0

Workflow demo Flow demo Toolbox demo

qsv pro command demo API demo Configurator demo

qsv pro v1 is here! 🎉

If you've been using qsv for a while, even if you're a command-line ninja, you'll find a lot of new capabilities in qsv pro that can make your data wrangling experience even better!

Apart from making qsv easier to use, qsv pro has a multitude of features including: view interactive data tables; browse stats/frequency/metadata; run recipes and tools (scripts); run Polars SQL queries; use Natural Language queries (using Retrieval Augmented Generation (RAG) techniques); regular expression search; export to multiple file formats; download/upload from/to compatible CKAN instances; design custom node-based flows and data pipelines; interact with a local API from external programs including the qsv pro command; run various qsv commands in a graphical user interface; and the list goes on!

And that's just the beginning, there's more to come! You just have to try it!

Download qsv pro v1 now at qsvpro.dathere.com.

Other highlights include:

  • pro: new command to allow qsv to interact with the qsv pro API to tap into qsv pro exclusive features.
  • lens: new command to interactively view CSVs using the csvlens crate.
  • The ludicrously fast diff command is now easier to use with its --drop-equal-fields option. @janriemer continues to work on his csv-diff crate, and there's more diff UX improvements coming soon!
  • stats adds sum_length and avg_length "streaming" statistics in addition to the existing min_length and max_length metrics. These are especially useful for datasets with a lot of "free text" columns.
  • stats also got "smarter" and "faster" by dog-fooding its own statistics to make it run faster!
    It's a little complicated, but the way stats works is that it compiles the "streaming" statistics on the fly first as it multiplex load the data across several threads, and the more expensive advanced statistics are "lazily" computed at the end.
    Since we now compile "sort order" in a streaming manner, we use this info when deriving cardinality at the end to see if we can skip sorting - an otherwise necessary step to get cardinality which is done by "scanning" all the sorted values of a column. Everytime two neighboring values differ in a sorted column, it increments the cardinality count.
    Apart from this "sort order" optimization, we also improved the "cardinality scan" algorithm - halving its memory footprint and making it faster still for larger datasets by parallelizing the computation. This in turn, makes the frequency command faster and more memory efficient.
    It's performance tweaks like these, that despite adding six metrics (is_ascii, sort_order, sum_length, avg_length, sem - standard error of the mean & cv - coefficient of variation) in recent releases, that stats is still able to compile 35 statistics and do GUARANTEED data type inferences of a million row, 41 column, 520 MB sample of NYC's 311 data in 1.327 seconds (753,580 records per second)![^1]
  • we now also use our own fork of the csv crate, featuring SIMD-accelerated UTF-8 validation and other minor perf tweaks, making the entire qsv suite faster still!

[^1]: see stats_everything_index benchmark


Added

  • pro: add qsv pro command to interact with qsv pro API by @rzmk in https://github.com/jqnatividad/qsv/pull/2039
  • lens: new command to interactively view CSVs using the csvlens crate https://github.com/jqnatividad/qsv/pull/2117
  • apply: add crc32 operation https://github.com/jqnatividad/qsv/pull/2121
  • count: add --delimiter option https://github.com/jqnatividad/qsv/pull/2120
  • diff: add flag --drop-equal-fields by @janriemer in https://github.com/jqnatividad/qsv/pull/2114
  • stats: add sum_length and avg_length columns https://github.com/jqnatividad/qsv/pull/2113
  • stats: smarter cardinality computation - added new parallel algorithm for large datasets (10,000+ rows) and updated sequential algorithm for smaller datasets https://github.com/jqnatividad/qsv/commit/4e63fec61a394ef2ddfa499c0cdd0958e677ad17

Changed

  • count: added comment to justify magic number https://github.com/jqnatividad/qsv/commit/5241e3972c05f024a0791be04632d03a06b2f9ce
  • stats: use simdjson for faster JSONL parsing; micro-optimize compute hot loop https://github.com/jqnatividad/qsv/commit/0e8b73451999a3e95bfd52246b1088aecd64b88f
  • stats: standardized OVERFLOW and UNDERFLOW messages https://github.com/jqnatividad/qsv/commit/38c61285704e5064a63c9dbb1ac866f18fa130fd
  • sort: renamed symbol so eliminate devskim lint false positive warning https://github.com/jqnatividad/qsv/commit/12db7397f68d3199e3311f402d5c7afed586b88c
  • enable lens feature in GH workflows https://github.com/jqnatividad/qsv/pull/2122
  • deps: bump polars 0.42.0 to latest upstream at time of release https://github.com/jqnatividad/qsv/commit/3c17ed12c3c763d644d9713afcc8442964f22de3
  • deps: use our own optimized fork of csv crate, with simdutf8 validation and other minor perf tweaks https://github.com/jqnatividad/qsv/commit/e4bcd7123172fa8d8094c305d7780e151c120db1
  • build(deps): bump serde from 1.0.209 to 1.0.210 by @dependabot in https://github.com/jqnatividad/qsv/pull/2111
  • build(deps): bump serde_json from 1.0.127 to 1.0.128 by @dependabot in https://github.com/jqnatividad/qsv/pull/2106
  • build(deps): bump qsv-stats from 0.19.0 to 0.22.0 https://github.com/jqnatividad/qsv/pull/2107 https://github.com/jqnatividad/qsv/pull/2112 https://github.com/jqnatividad/qsv/commit/cb1eb60a0a9fb3b9ba381183a2c29909f82efa42
  • apply select clippy lint suggestions
  • updated several indirect dependencies
  • made various doc and usage text improvements

Fixed

  • schema: Print an error if the qsv stats invocation fails by @abrauchli in https://github.com/jqnatividad/qsv/pull/2110

New Contributors

  • @abrauchli made their first contribution in https://github.com/jqnatividad/qsv/pull/2110

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.133.1...0.134.0

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.133.1

Highlights

| | | | ------------- | ------------- | | qsv-polars-0 133 0-relnotes[^1] | This release doubles down on Polars' capabilities, as we now, as a matter of policy track the latest polars upstream. If you think qsv has a torrid release schedule, you should see Polars. They're constantly fixing bugs, adding new features and optimizations!
To keep up, we've added Polars revision info to the --version output, and the --envlist option now includes Polars relevant env vars. We've also added support for the POLARS_BACKTRACE_IN_ERR env var to control whether Polars backtraces are included in error messages.
We also removed the to parquet subcommand as its redundant with the Polars-powered sqlp's ability to create parquet files. This removes the HUGE duckdb dependency, which should markedly make compile times shorter and binaries smaller. |

Other highlights include: - New edit command that allows you to edit CSV files. - The count command's --width option now includes record width stats beyond max length (avg, median, min, variance, stddev & MAD). - The fixlengths command now has --quote and --escape options. - The stats command adds a sort_order streaming statistic.

NOTE: 0.133.0 was skipped because of a dev dependency conflict with the csvs_convert crate, preventing us from publishing 0.133.0 to crates.io. This has been resolved in 0.133.1.

[^1]: ChatGPT prompt: Using the logos for the Polars project and the qsv project as a baseline, can you create a version with the cowboy riding a polar bear instead?

Added

  • count: expanded --width options, adding record width stats beyond max length (avg, median, min, variance, stddev & MAD). Also added --json output when using --width https://github.com/jqnatividad/qsv/pull/2099
  • edit: add qsv edit command by @rzmk in https://github.com/jqnatividad/qsv/pull/2074
  • fixlengths: added --quote and --escape options https://github.com/jqnatividad/qsv/pull/2104
  • stats: add sort_order streaming statistic https://github.com/jqnatividad/qsv/pull/2101
  • polars: add polars revision info to --version output https://github.com/jqnatividad/qsv/commit/e60e44f99061c37758bd53dfa8511c16d49ceed5
  • polars: added Polars relevant env vars to --envlist option https://github.com/jqnatividad/qsv/commit/0ad68fed94f7b5059cca6cf96cec4a3b55638e60
  • polars: add & document POLARS_BACKTRACE_IN_ERR env var https://github.com/jqnatividad/qsv/commit/f9cc5595664d4665f0b610fcbac93c30fa445056

Changed

  • Optimize polars optflags https://github.com/jqnatividad/qsv/pull/2089
  • deps: bump polars 0.42.0 to latest upstream at time of release https://github.com/jqnatividad/qsv/commit/3b7af519343f08919f114c7307f0f561d04f93e8
  • bump polars to latest upstream, removing smartstring https://github.com/jqnatividad/qsv/pull/2091
  • build(deps): bump actions/setup-python from 5.1.1 to 5.2.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2094
  • build(deps): bump flate2 from 1.0.32 to 1.0.33 by @dependabot in https://github.com/jqnatividad/qsv/pull/2085
  • build(deps): bump flexi_logger from 0.28.5 to 0.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2086
  • build(deps): bump indexmap from 2.4.0 to 2.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2096
  • build(deps): bump jsonschema from 0.18.0 to 0.18.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2084
  • build(deps): bump serde from 1.0.208 to 1.0.209 by @dependabot in https://github.com/jqnatividad/qsv/pull/2082
  • build(deps): bump serde_json from 1.0.125 to 1.0.127 by @dependabot in https://github.com/jqnatividad/qsv/pull/2079
  • build(deps): bump sysinfo from 0.31.2 to 0.31.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2077
  • build(deps): bump qsv-stats from 0.18.0 to 0.19.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2100
  • build(deps): bump tokio from 1.39.3 to 1.40.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2095
  • apply select clippy lint suggestions
  • updated several indirect dependencies
  • made various doc and usage text improvements
  • pin Rust nightly to 2024-08-26 from 2024-07-26, aligning with Polars pinned nightly

Fixed

  • Ensure portable binaries are "added" to the publish zip archive, instead of replacing all the binaries with just the portable version. Fixes #2083. https://github.com/jqnatividad/qsv/commit/34ad2067007a86ffad6355f7244163c4105a98f2

Removed

  • removed to parquet subcommand as its redundant with sqlp's ability to create parquet files. This also removes the HUGE duckdb dependency, which should markedly make compile times shorter and binaries much smaller https://github.com/jqnatividad/qsv/pull/2088
  • removed smartstring dependency now that Polars has its own compact inlined string type https://github.com/jqnatividad/qsv/commit/47f047e6ee10916b5caa19ee829471e9fb6f4bea
  • removed to parquet benchmark

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.132.0...0.133.1

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.132.0

Highlights

With this release, we finally finish the stats caching refactor started in 0.131.0, replacing the binary encoded stats cache with a simpler JSONL cache. The stats cache stores the necessary statistical metadata to make several key commands smarter & faster. Per the benchmarks:

  • frequency is 6x faster (frequency_index_stats_mode_auto).
    Not only is it faster, it now doesn't need to compile a hashmap for columns with ALL unique values (e.g. ID columns) - practically, making it able to handle "real-world" datasets of any size (that is, unless all the columns have ALL unique cardinalities. In that case, the entire CSV will have to fit into memory).
  • tojsonl is 2.67x faster (tojsonl_index)
  • schema is two orders of magnitude (100x) faster!!! (schema_index)

The stats cache also provides the foundation for even more "smart" features and commands in the future. It also has the side-benefit of adding a way to produce stats in JSONL format that can be used for other purposes beyond qsv.

The search, searchset, and replace commands now also have a --literal option that allows you to search for and replace strings with regex special/reserved characters. This makes it easier to search for and replace strings that contain otherwise reserved regex characters without having to escape them (especially useful with URL columns that often contain characters like ?,:,-,., etc.)


Added

  • search, searchset & replace: add --literal option https://github.com/jqnatividad/qsv/pull/2060 & https://github.com/jqnatividad/qsv/commit/7196053b36c8886092fe25fd030ccf1cf765ed6a
  • slice: added usage text examples https://github.com/jqnatividad/qsv/commit/04afaa3d5a6e51c75f3f9041515c1d7986c45777
  • publish: added workflow to build "portable" binaries with CPU features disabled
  • contrib(completions): add --literal for search and searchset by @rzmk in https://github.com/jqnatividad/qsv/pull/2061
  • contrib(completions): add --literal completion to replace by @rzmk in https://github.com/jqnatividad/qsv/pull/2062
  • add more polars metadata in --version info https://github.com/jqnatividad/qsv/pull/2073
  • docs: added more info to SECURITY.md https://github.com/jqnatividad/qsv/commit/609d4df61c93de6959f07e8d972009ae6cd12b78
  • docs: expanded Goals/Non-Goals https://github.com/jqnatividad/qsv/commit/54998e36eb4608a1fba7938836e5985b699e32ff
  • docs: added Installation "Option 0" quick start https://github.com/jqnatividad/qsv/commit/bf5bf82105397436d901de247398fce3e808b122
  • added search --literal benchmark

Changed

  • stats, schema, frequency & tojsonl: stats caching refactor, replacing binary encoded stats cache with a simpler JSONL cache https://github.com/jqnatividad/qsv/pull/2055
  • rename stats --stats-json option to stats --stats-jsonl https://github.com/jqnatividad/qsv/pull/2063
  • changed "broken pipe" error to a warning https://github.com/jqnatividad/qsv/commit/73532759a8dad2d643f283296aa402950765b648
  • docs: update multithreading and caching sections of PERFORMANCE.md https://github.com/jqnatividad/qsv/commit/5e6bc455bc544003535e18f99493cc1a20c4a2ce
  • deps: switch to our qsv-optimized fork of csv crate https://github.com/jqnatividad/qsv/commit/3fc1e82c83b5dec23d3ba610e3d0f9bbd2924788
  • deps: bump polars from 0.41.3 to 0.42.0 https://github.com/jqnatividad/qsv/pull/2051
  • build(deps): bump actix-web from 4.8.0 to 4.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2041
  • build(deps): bump flate2 from 1.0.31 to 1.0.32 by @dependabot in https://github.com/jqnatividad/qsv/pull/2071

  • build(deps): bump indexmap from 2.3.0 to 2.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2049

  • build(deps): bump reqwest from 0.12.6 to 0.12.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/2070

  • build(deps): bump rust_decimal from 1.35.0 to 1.36.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2068

  • build(deps): bump serde from 1.0.205 to 1.0.206 by @dependabot in https://github.com/jqnatividad/qsv/pull/2043

  • build(deps): bump serde from 1.0.206 to 1.0.207 by @dependabot in https://github.com/jqnatividad/qsv/pull/2047

  • build(deps): bump serde from 1.0.207 to 1.0.208 by @dependabot in https://github.com/jqnatividad/qsv/pull/2054

  • build(deps): bump serde_json from 1.0.122 to 1.0.124 by @dependabot in https://github.com/jqnatividad/qsv/pull/2045

  • build(deps): bump serde_json from 1.0.124 to 1.0.125 by @dependabot in https://github.com/jqnatividad/qsv/pull/2052

  • apply select clippy lint suggestions

  • updated several indirect dependencies

  • made various usage text improvements

Fixed

  • stats: fix --output delimiter inferencing based on file extension https://github.com/jqnatividad/qsv/pull/2065
  • make process_input helper handle stdin better https://github.com/jqnatividad/qsv/pull/2058
  • docs: fix completions for --stats-jsonl and qsv pro installation text update by @rzmk in https://github.com/jqnatividad/qsv/pull/2072
  • docs: added Note about why luau feature is disabled in musl binaries - https://github.com/jqnatividad/qsv/commit/ffa2bc5a3f397b406347d14d0d4fb4ead49cb470 & https://github.com/jqnatividad/qsv/commit/27d0f8e1c2e43c00b99abf98dfa01a4758cf9bad

Removed

  • Removed bincode dependency now that we're using JSONL stats cache https://github.com/jqnatividad/qsv/pull/2055 https://github.com/jqnatividad/qsv/commit/babd92bbae473ed63f44f593bc1ab0ad1bc17761

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.131.1...0.132.0

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.131.1

Changed

  • deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
  • build(deps): bump filetime from 0.2.23 to 0.2.24 by @dependabot in https://github.com/jqnatividad/qsv/pull/2038

Fixed

  • frequency: change --stats-mode default to none from auto.
    This is because of a big performance regression when using --stats-mode auto on datasets with columns with ALL unique values. See https://github.com/jqnatividad/qsv/issues/2040 for more info.

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.131.0...0.131.1

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.131.0

Highlights

  • Refactored frequency to make it smarter and faster.
    frequency's core algorithm essentially compiles an in-memory hashmap to determine the frequency of each unique value for each column. It does this using multi-threaded, multi-I/O techniques to make it blazing fast.
    However, for columns with ALL unique values (e.g. ID columns), this takes a comparatively long time and consumes a lot of memory as it essentially compiles a hashmap of the ENTIRE column, with a hashmap entry for each column value with a count of 1.
    Now, with the new --stats-mode option (enabled by default), frequency can compile the dataset in a more intelligent way by looking up a column's cardinality in the stats cache.
    If the cardinality of a column is equal to the CSV's rowcount (indicating a column with ALL unique values), it short-circuits frequency calculations for that column - dramatically reducing the time and memory requirements for the ID column as it eliminates the need to maintain a hashmap for it.
    Practically speaking, this makes frequency able to handle "real-world" datasets of any size.
    To ensure frequency is as fast as possible, be sure to index and compute stats for your datasets beforehand.
  • Setting the stage for Datapusher+ v1 and...
    The "itches we've been scratching" the past few months have been informed by our work at several clients towards the release of Datapusher+ 1.0 and qsv pro 1.0 (more info below) - both targeted for release this month.
    DP+ is our third-gen, high-speed data ingestion/registration tool for CKAN that uses qsv as its data wrangling/analysis engine. It will enable us to reinvent the way data is ingested into CKAN - with exponentially faster data ingestion, metadata inferencing, data validation, computed metadata fields, and more!
    We're particularly excited how qsv will allow us to compute and infer high-quality metadata for datasets (with a focus on inferring optional recommended DCAT-US v3 metadata fields) in "near real-time", while dataset publishers are still entering metadata. This will be a game-changer for CKAN administrators and data publishers!
  • ...qsv pro 1.0
    qsv pro is datHere's enterprise-grade data wrangling/curation workbench that’s planned for v1.0 release this month. Building the core functionality of qsv pro's Workflow feature is one of the primary reasons for a v1.0 release.
    We feel qsv pro may be a game-changer for data wranglers and data curators who need to work with spreadsheets and large datasets to view statistical data and metadata while also performing complex data wrangling operations in a user-friendly way without having to write code.

Added

  • docs: added Shell Completion section https://github.com/jqnatividad/qsv/commit/556a2ff48660d05f8e9a865ec427e98114f49b43
  • docs: add 🪄 emoji in legend to indicate "automagical" commands https://github.com/jqnatividad/qsv/commit/2753c90fcbd1cc1b41dae0a51d26bfe704029ee8
  • Add building deb package (WIP) by @tino097 in https://github.com/jqnatividad/qsv/pull/2029
  • Added GitHub workflow to test debian package (WIP) by @tino097 in https://github.com/jqnatividad/qsv/pull/2032
  • tests: added false positive to _typos.toml configuration https://github.com/jqnatividad/qsv/commit/d576af229bf76b7d0e1f40eb37b578a6b6691ed4
  • added more benchmarks
  • added more tests

Changed

  • fetch & fetchpost: remove expired diskcache entries on startup https://github.com/jqnatividad/qsv/commit/9b6ab5db91416f71577b8a1fc91e2e3189a1bd4b
  • frequency: smarter frequency compilation with new --stats-mode option https://github.com/jqnatividad/qsv/pull/2030
  • json: refactored for maintainability & performance https://github.com/jqnatividad/qsv/commit/62e92162a4aa446097736ec76834cf0e06d195b8 and https://github.com/jqnatividad/qsv/commit/4e44b1878b952c455c1922a66795b8c86a1b1dba
  • improved self-update messages https://github.com/jqnatividad/qsv/commit/5c874e09e15a274dccd8f83a322002032e65c2d0 and https://github.com/jqnatividad/qsv/commit/0aa0b13cf34103cfb75befc6480f31714d806aa2
  • contrib(completions): frequency updates & remove bashly/fish by @rzmk in https://github.com/jqnatividad/qsv/pull/2031
  • Debian package update by @tino097 in https://github.com/jqnatividad/qsv/pull/2017
  • publish: optimized enabled CPU features when building release binaries in all GitHub Actions "publishing" workflows
  • publish: ensure latest Python patch release is used when building qsvpy binary variants https://github.com/jqnatividad/qsv/commit/2ab03a097645a95b0d390f546ad9735c9a7e72b2 and https://github.com/jqnatividad/qsv/commit/ec6f486ef112cf942b2263b84b97d90cba1beb12
  • tests: also enabled CPU features in CI tests
  • docs: wordsmith qsv "elevator pitch" https://github.com/jqnatividad/qsv/commit/cc47fe688eeeb13b4deb3f3bf48d954924eee22e
  • docs: point to https://100.dathere.com in Whirlwind tour https://github.com/jqnatividad/qsv/commit/fc49aef826c1b1933ea1508cb687476936a147ff
  • deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
  • build(deps): bump bytes from 1.6.1 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2018
  • build(deps): bump bytes from 1.7.0 to 1.7.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2021
  • build(deps): bump flate2 from 1.0.30 to 1.0.31 by @dependabot in https://github.com/jqnatividad/qsv/pull/2027
  • build(deps): bump indexmap from 2.2.6 to 2.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2020
  • build(deps): bump jaq-parse from 1.0.2 to 1.0.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2016
  • build(deps): bump redis from 0.26.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2023
  • build(deps): bump regex from 1.10.5 to 1.10.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/2025
  • build(deps): bump serde_json from 1.0.121 to 1.0.122 by @dependabot in https://github.com/jqnatividad/qsv/pull/2022
  • build(deps): bump sysinfo from 0.30.13 to 0.31.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2019
  • build(deps): bump sysinfo from 0.31.0 to 0.31.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2024
  • build(deps): bump tempfile from 3.11.0 to 3.12.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2033
  • build(deps): bump serde from 1.0.204 to 1.0.205 by @dependabot in https://github.com/jqnatividad/qsv/pull/2036
  • apply select clippy suggestions
  • updated several indirect dependencies
  • made various usage text improvements
  • bumped MSRV to 1.80.1

Fixed

  • sqlp & joinp: fixed .ssv.sz output auto-compression support https://github.com/jqnatividad/qsv/commit/5397f6c7a3b083872bbb97d90db3a2fd2f8521e6 & https://github.com/jqnatividad/qsv/commit/d86ba6376d5819898187d5fa88eae19373022e5b
  • docs: fix link by @uncenter in https://github.com/jqnatividad/qsv/pull/2026
  • tests: correct misnamed test https://github.com/jqnatividad/qsv/commit/8ae600011ddb109e7993e54dae9b933d15eccd38
  • tests: fix flaky reverse property tests https://github.com/jqnatividad/qsv/commit/d86ba6376d5819898187d5fa88eae19373022e5b

Removed

  • docs: "Quicksilver" is the name of the logo horse, not how you pronounce "qsv" https://github.com/jqnatividad/qsv/commit/e4551ae4b62a3a635b7c351c5f28aa2a7d374958

New Contributors

  • @uncenter made their first contribution in https://github.com/jqnatividad/qsv/pull/2026

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.130.0...0.131.0

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.130.0

Following the 0.129.0 release - the largest release to date, 0.130.0 continues to polish qsv as a data-wrangling engine, packing new features, fixes, and improvements, previewing upcoming features in qsv pro 1.0. Here are a few highlights:

Highlights

  • Added .ssv (semicolon separated values) automatic support. Semicolon separated values are now automatically detected and supported by qsv. Though not as common as CSV, SSV is used in some regions and industries, so qsv now supports it.
  • Added cargo deb compatibility. In preparation for the release of DataPusher+ 1.0, we're now making it easier to upgrade qsvdp so CKAN administrators can install and upgrade it easily using apt-get install qsvdp or apt-get upgrade qsvdp. DP+ is our next-gen, high-speed data ingestion tool for CKAN that uses qsv as its analysis engine. Its not only a robust, fast, validating data pump that guarantees high quality data, it also does extended analysis to infer and automatically derive high-quality metadata - what we call "automagical metadata".
  • Upgraded to the latest Polars upstream at the py-polars-1.3.0 tag. Polars tops the TPC-H Benchmark and is several orders of magnitude faster than traditional dataframe libraries (cough - 🐼 pandas). qsv proudly rides the 🐻‍❄️ Polars bear to get subsecond response times even with very large datasets!
  • qsv v0.130.0 shell completions files are available for download here. With shell completions, pressing tab in a compatible shell provides suggestions for various qsv commands, subcommands, and options that you can choose from. Supported shells include bash, zsh, powershell, fish, nushell, fig, and elvish. View tips on how to install completions for the bash shell here.

Added

  • apply: add base62 encode/decode operations https://github.com/jqnatividad/qsv/pull/2013
  • headers: add --just-count option https://github.com/jqnatividad/qsv/pull/2004
  • json: add --select option https://github.com/jqnatividad/qsv/pull/1990
  • searchset: add --not-one flag by @rzmk in https://github.com/jqnatividad/qsv/pull/1994
  • Added .ssv (semicolon separated values) automatic support https://github.com/jqnatividad/qsv/pull/1987
  • Added cargo deb compatibility by @tino097 in https://github.com/jqnatividad/qsv/pull/1991
  • contrib(completions): add --just-count for headers by @rzmk in https://github.com/jqnatividad/qsv/pull/2006
  • contrib(completions): add --select for json by @rzmk in https://github.com/jqnatividad/qsv/pull/1992
  • added several benchmarks
  • added more tests

Changed

  • diff: allow selection of --key and --sort-columns by name, not just by index https://github.com/jqnatividad/qsv/pull/2010
  • fetch & fetchpost: replace deprecated Redis execute command https://github.com/jqnatividad/qsv/commit/75cbe2b76426591e4658fdcb7d29287a40a7db36
  • stats: more intelligent --infer-lenoption https://github.com/jqnatividad/qsv/commit/c6a0e641cd4c6ef87c070c8944f32a962a11c7e3
  • validate: return delimiter detected upon successful CSV validation https://github.com/jqnatividad/qsv/pull/1977
  • bump polars to latest upstream at py-polars-1.3.0 tag https://github.com/jqnatividad/qsv/pull/2009
  • deps: bump csvs_convert from 0.8.12 to 0.8.13 https://github.com/jqnatividad/qsv/commit/d1d08009deb0579fd4d6fe305097e00e92da4191
  • build(deps): bump cached from 0.52.0 to 0.53.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1983
  • build(deps): bump cached from 0.53.0 to 0.53.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1986
  • build(deps): bump postgres from 0.19.7 to 0.19.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1985
  • build(deps): bump pyo3 from 0.22.1 to 0.22.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1979
  • build(deps): bump redis from 0.25.4 to 0.26.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1995
  • build(deps): bump serde_json from 1.0.120 to 1.0.121 by @dependabot in https://github.com/jqnatividad/qsv/pull/2011
  • build(deps): bump simple-expand-tilde from 0.1.7 to 0.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1984
  • build(deps): bump tokio from 1.38.0 to 1.38.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1973
  • build(deps): bump tokio from 1.38.1 to 1.39.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1988
  • build(deps): bump xxhash-rust from 0.8.11 to 0.8.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1997
  • apply select clippy suggestions
  • updated several indirect dependencies
  • made various usage text improvements
  • pin Rust nightly to 2024-07-26

Fixed

  • diff: clarify --key usage examples, resolves #1998 by @rzmk in https://github.com/jqnatividad/qsv/pull/2001
  • json: refactored so it didn't need to use threads to spawn qsv select to order the columns. Had to do this as sometimes intermediate output was sent to stdout before the final output was ready https://github.com/jqnatividad/qsv/commit/0f25deff98139b574dfd61c6e9bf58d36ea16618
  • py: replace row with col in usage text by @allen-chin in https://github.com/jqnatividad/qsv/pull/2008
  • reverse: fix indexed bug https://github.com/jqnatividad/qsv/pull/2007
  • validate: properly auto-detect tab delimiter when file extension is TSV or TAB https://github.com/jqnatividad/qsv/pull/1975
  • fix panic when process_input helper fn receives unexpected input from stdin https://github.com/jqnatividad/qsv/commit/152fec486c0e7b16242f3967930e9654ff2bdf3c

Removed

  • docs: remove *nix only message for foreach by @rzmk in https://github.com/jqnatividad/qsv/pull/1972

New Contributors

  • @tino097 made their first contribution in https://github.com/jqnatividad/qsv/pull/1991
  • @allen-chin made their first contribution in https://github.com/jqnatividad/qsv/pull/2008

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.129.1...0.130.0


To stay updated with datHere's latest news and updates (including qsv pro, datHere's CKAN DMS, and analyze.dathere.com), subscribe to the newsletter here: dathere.com/newsletter

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.129.1

This is a small patch release to fix some publishing issues, update tab completion, and to fix minor CI errors. See 0.129.0 release notes to get the details on qsv's biggest release to date!

Changed

  • clipboard: add error handling based on clipboard::Error by @rzmk in https://github.com/jqnatividad/qsv/pull/1970
  • contrib(completions): add all commands (except applydp & generate) by @rzmk in https://github.com/jqnatividad/qsv/pull/1971
  • Temporarily suppressed some CI tests that were flaky on GH macOS Apple Silicon action runners. They previously worked fine on self-hosted macOS Apple Silicon action runners that are temporarily unavailable.

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.129.0...0.129.1

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.129.0

This release is the biggest one ever!

Packed with new features, improvements, and previews of upcoming qsv pro features, here are a few highlights:

📌 Highlights (click each dropdown for more info)

Meet @rzmk - qsv pro's software engineer now also co-maintains qsv! @rzmk has contributed to projects in the qsv ecosystem including qsv's [`describegpt`](https://github.com/jqnatividad/qsv/tree/master/src/main/describegpt.rs), [`prompt`](https://github.com/jqnatividad/qsv/tree/master/src/main/prompt.rs), [`json`](https://github.com/jqnatividad/qsv/tree/master/src/main/json.rs), and [`clipboard`](https://github.com/jqnatividad/qsv/tree/master/src/main/clipboard.rs) commands; qsv's tab completion support; [qsv.dathere.com](https://qsv.dathere.com) including its online configurator and benchmarks page; [100.dathere.com](https://100.dathere.com) with its qsv lessons and exercises; and [qsv pro](https://qsvpro.dathere.com) the spreadsheet data wrangling desktop app (along with its promo site). @rzmk now also co-maintains qsv! With @rzmk now also co-maintaining qsv, our data-wrangling portfolio's roadmap may get more intriguing as @rzmk's work on qsv pro, 100.dathere.com, and other initiatives can result in contributions to qsv as we've seen in this release. Perhaps some aims may be put towards AI; "[automagical](https://dathere.com/2023/11/automagical-metadata/)" metadata inferencing; DCAT 3; and expanded recipe support with the accelerated evolution of qsv pro as an enterprise-grade Data-Wrangling/Data Curation Workbench.
Polars v0.41.3 - numerous sqlp and joinp improvements * `sqlp`: expanded SQL support - Natural Join support - DuckDB-like `COLUMNS` SQL function to select columns that match a pattern - ORDER BY ALL support - Support POSTGRESQL `^@` ("starts with"), `~~`,`~~*`,`!~~`,`!~~*` ("like", "ilike") string-matching operators - Support for SQL `SELECT * ILIKE` wildcard syntax - Support SQL temporal functions `STRFTIME` and `STRPTIME` * `sqlp`: added `--streaming` option
New command qsv prompt - Use a file dialog for qsv file input and output Be more interactive with qsv by using a file dialog to select a file for input and output. ![qsv-prompt-0.129.0-demo](https://github.com/jqnatividad/qsv/assets/30333942/4ec1f6ef-3a82-41fb-91ab-a0ab15360d21) Here are a few key highlights: - Start with `qsv prompt` when piping commands to provide a file as input from an open file dialog and pipe it into another command, for example: `qsv prompt | qsv stats`. - End with `qsv prompt -f` when piping commands to save the output to a file you choose with a save file dialog. There are other options too, so feel free to explore more with `qsv prompt --help`. This will allow you to create qsv pipelines that are more "user-friendly" and distribute them to non-technical users. It's not as flexible as qsv pro's full-blown GUI, but it's a start!
New command qsv json - Convert JSON data to CSV and optionally provide a jq-like filter The new `json` command allows you to convert non-nested JSON data to CSV. If your data is not in the expected format, try using the `--jaq` option to provide a jq-like filter. See `qsv json --help` for more information and examples. ![qsv-json-demo](https://github.com/jqnatividad/qsv/assets/30333942/e8e5e39d-dc2a-45a5-895a-5ec4ec5b6e01) Here are a few key highlights: - Specify the path to a JSON file to attempt conversion to CSV with `qsv json `. - Attempt conversion of JSON to CSV data from `stdin`, for example: `qsv slice --json | qsv json`. - Write the output to a file with the `--output ` (or `-o` for short) option. - Use the `--jaq ` option to try converting nested or complex JSON data into the intended format before parsing to CSV. You may learn more by running `qsv json --help`. Along with the `jsonl` command, we now have more options to convert JSON to CSV with qsv!
New command qsv clipboard - Provide input from your clipboard and save output to your clipboard Provide your clipboard content using `qsv clipboard` and save output to your clipboard by piping into `qsv clipboard --save` (or `-s` for short). ![qsv-clipboard-demo](https://github.com/jqnatividad/qsv/assets/30333942/c3e3754a-8db0-4a28-84bd-ba88054cf9a6)
100.dathere.com - Try out lessons and exercises with qsv from your browser! You may run qsv commands from your browser without having to install it locally at [100.dathere.com](https://100.dathere.com). | Within the lesson (in-page) using Thebe | In a Jupyter Lab environment | | ----------------------------------- | ----------------------------------- | | ![qsv Thebe demo](https://github.com/jqnatividad/qsv/assets/30333942/f5315ad4-e73a-4fe1-b868-b2f950412ecc) | ![qsv Jupyter Lab demo](https://github.com/jqnatividad/qsv/assets/30333942/9acca4b1-3117-4222-8198-c751a74e6378) | Thanks to [Jupyter Book](https://jupyterbook.org), [datHere](https://dathere.com) has released a website available at [100.dathere.com](https://100.dathere.com) where you may explore lessons and exercises with qsv by running them within the web page, in a Jupyter Lab environment, or locally after following the provided installation instructions. There are multiple exercises planned, but feel free to try out the first few available lessons/exercises by visiting [100.dathere.com](https://100.dathere.com) and star the source code's repository [here](https://github.com/dathere/100.dathere.com).
New multi-shell completions draft (bash, zsh, powershell, fish, nushell, fig, elvish) There's a draft of more qsv shell completion support including 7 different shells! The plan is to add the rest of the commands in this implementation since we can use one codebase to generate the 7 shell completion script files. Feel free to try out the various shell completions in the `examples` folder from [`contrib/completions`](https://github.com/jqnatividad/qsv/tree/master/contrib/completions) to verify if the examples work (as of today's release date only `qsv count` and `qsv clipboard` may be available) and also contribute to adding the rest of the completions if you know a bit of Rust. The existing Bash shell completions for v0.129.0 and fish shell completions draft are available for now as the multi-shell completions draft is being developed. | Bash completions demo | Fish completions demo | | ----------------------------------- | ----------------------------------- | | ![qsv Bash completions demo](https://github.com/jqnatividad/qsv/assets/30333942/bec4b9ae-584a-49ad-8ced-c765174e8113) | ![qsv Fish completions demo](https://github.com/jqnatividad/qsv/assets/30333942/fafbf40b-9ea3-4ec4-ae22-9ae3319ce400) | With shell completions enabled, you may identify qsv commands more easily when pressing the `tab` key on your keyboard in certain positions using the relevant Bash or fish shell from your terminal. You may follow the instructions from 100.dathere.com [here](https://100.dathere.com/exercises-setup.html#bash) to learn how to install the Bash completions and under the Usage section [here](https://github.com/jqnatividad/qsv/tree/master/contrib/fish#usage) for fish shell completions. Note that the fish shell completions are incomplete and both of the implementations may be replaced by the multi-shell completions implementation once complete.
qsvpro.dathere.com - Preview: Download spreadsheets from a compatible CKAN instance into the qsv pro Workflow > This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released. ![qsv-pro-ckan-download-demo](https://github.com/jqnatividad/qsv/assets/30333942/9f4931ce-f51e-4266-9c22-e568d10ed811) In addition to importing local spreadsheet files and uploading to a CKAN instance, this new feature allows users to select a locally registered CKAN instance where they have the `create_dataset` permission to download a spreadsheet file from their CKAN instance and load the new local spreadsheet file into the Workflow. qsv pro's Workflow would therefore have both upload and download capability to and from a compatible CKAN instance.
qsvpro.dathere.com - Preview: Attempt SQL query generation from natural language with a compatible LLM API instance > This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released. > Also note that this video is sped up as you may see by the notes that pop up (you may pause the video to read them). https://github.com/jqnatividad/qsv/assets/30333942/e90893e6-3196-4fa6-bce0-f69a9f6347f2 Leveraging [`qsv describegpt`](https://github.com/jqnatividad/qsv/tree/master/src/cmd/describegpt.rs)'s AI integration capabilities along with multiple other qsv commands, qsv pro's Workflow's existing SQL query tab now has a generator that may ***attempt*** to generate a SQL query natural language using an LLM API compatible with OpenAI's API specification such as running an [Ollama](https://ollama.com/) (v0.2.0 or above) server locally and ***attempt*** to generate a SQL query by asking a question related to your spreadsheet data. Results may vary depending on your configuration and you may need to fix the generated output. For example in the demo we asked for ***who*** has the highest salary but extra information and only the highest salary was provided, though this does give a query we can modify and work with.
Note on Ask and qsv describegpt We mention ***attempt*** since LLMs can produce incorrect output, even output that *seems* correct but is not. We mention that "inaccurate information" may be produced within `qsv describegpt`'s usage text too along with AI-generated output potentially being incorrect within qsv pro, so make sure the output is fixed and verified before using it in production use cases.

🔁 Changelog

### Added * `clipboard`: add `qsv clipboard` command for clipboard input/output by @rzmk in https://github.com/jqnatividad/qsv/pull/1953 * `describegpt`: add `--prompt` for custom prompt & update prompt file + docs by @rzmk in https://github.com/jqnatividad/qsv/pull/1862 * `describegpt`: add base_url, model, ollama, & timeout to prompt file by @rzmk in https://github.com/jqnatividad/qsv/pull/1859 * `enum`: add `--hash` option to create a platform-independent deterministic id https://github.com/jqnatividad/qsv/pull/1902 * `enum`: add `--uuid7` option to create UUID v7 identifiers https://github.com/jqnatividad/qsv/pull/1914 * `freq`: add `--no-trim` option https://github.com/jqnatividad/qsv/pull/1944 * `foreach`: add sample Windows implementation by @rzmk in https://github.com/jqnatividad/qsv/pull/1847 * `joinp`: add `--right` outer join option https://github.com/jqnatividad/qsv/pull/1945 * `json`: change jsonp to json using new implementation by @rzmk in https://github.com/jqnatividad/qsv/pull/1924 * `json`: add `--jaq` option to allow jq-like filtering & test by @rzmk in https://github.com/jqnatividad/qsv/pull/1959 * `jsonp`: add `jsonp` command allowing non-nested JSON to CSV conversion with Polars by @rzmk in https://github.com/jqnatividad/qsv/pull/1880 * `prompt`: add `qsv prompt` to pick a file with a file dialog & write to stdout by @rzmk in https://github.com/jqnatividad/qsv/pull/1860 * `prompt`: add `--fd-output` (`-f`) & `--output` (`-o`) options by @rzmk in https://github.com/jqnatividad/qsv/pull/1861 * `select`: add `--sort`, `--random` & `--seed` options; also add 9999 sentinel value to indicate last column https://github.com/jqnatividad/qsv/pull/1867 * `select`: use underscore char (_) to indicate last column, replacing 9999 sentinel value https://github.com/jqnatividad/qsv/pull/1873 * `sqlp`: add `--streaming` option https://github.com/jqnatividad/qsv/commit/e8bee9a60dccc6ec5b5a43b91cb6f558915faa0e * `stats`: add Standard Error of the Mean (SEM) & Coefficient of Variation (CV) https://github.com/jqnatividad/qsv/pull/1857 * `validate`: added custom JSONschema format "currency" (decimal with 2 decimal places). Also, added check that only ascii characters are allowed in keys in JSONschema files. * added `--batch` zero option to all commands with batch processing. This sentinel value is used to indicate that the entire input should be processed in one batch https://github.com/jqnatividad/qsv/commit/feedbda4a3be9f8835eba0626e5fe01147831186 * added typos check to CI https://github.com/jqnatividad/qsv/commit/9fdf0662b6dc4fa6ebfed592a177d8539f264041 * `contrib(fish)`: add fish completions prototype with `qsv.fish` and docs by @rzmk in https://github.com/jqnatividad/qsv/pull/1884 * contrib(bashly): add `--hash ` option to `enum` by @rzmk in https://github.com/jqnatividad/qsv/pull/1905 * contrib(bashly): add `--uuid4` & `--uuid7` for `qsv enum` by @rzmk in https://github.com/jqnatividad/qsv/pull/1915 * `contrib(bashly)`: remove `--ollama` from `qsv describegpt` by @rzmk in https://github.com/jqnatividad/qsv/pull/1951 * `contrib(bashly)`: add `--no-trim` to `frequency` & `--right` to `joinp` by @rzmk in https://github.com/jqnatividad/qsv/pull/1952 * `tests`: add tests for 100.dathere.com/lessons/1 by @rzmk in https://github.com/jqnatividad/qsv/pull/1876 * `tests`: add test_100 for 100.dathere.com & tests for lesson/exercise 0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1848 * `docs`: add 👆 emoji to indicate commands with column selector support https://github.com/jqnatividad/qsv/commit/40ac8a7602315857ca529f43dd4fc45bec65c703 * Incorporate typos check in CI https://github.com/jqnatividad/qsv/pull/1930 ### Changed * `stats`: made several microoptimizations to Field Data Type inferencing https://github.com/jqnatividad/qsv/commit/35004541d25eb29d564ec60824da63d9f32344dd https://github.com/jqnatividad/qsv/commit/f829e0cfbc8a390570f85371e3d661ec33211405 * `select`: `--sort` & `--random` options now work with the initial selection, not just the entire CSV https://github.com/jqnatividad/qsv/pull/1875 * `contrib(bashly)`: update `contrib/bashly/completions.bash` (prep for qsv v0.129.0) by @rzmk in https://github.com/jqnatividad/qsv/pull/1885 * `jsonp`: use `print!` instead of `println!` & add `House.csv` + tests by @rzmk in https://github.com/jqnatividad/qsv/pull/1897 * `docs`: add column selector emoji - 👆 https://github.com/jqnatividad/qsv/pull/1906 * upgrade to polars 0.41.0 https://github.com/jqnatividad/qsv/pull/1907 * `describegpt`: update `dotenv.template` variable with `QSV_LLM_APIKEY` by @rzmk in https://github.com/jqnatividad/qsv/pull/1908 * `describegpt`: change min Ollama version from 0.1.49 to 0.2.0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1954 * `describegpt`: add `{headers}` replaced by `qsv slice ... --len 1 -n` by @rzmk in https://github.com/jqnatividad/qsv/pull/1941 * `validate`: validating against a JSONschema requires headers https://github.com/jqnatividad/qsv/pull/1931 * setting `--batch` to 0 loads all rows at once before parallel processing https://github.com/jqnatividad/qsv/pull/1928 * `deps`: add polars timezones support https://github.com/jqnatividad/qsv/pull/1898 * `tests`: update `test_100/exercise_0.rs` setup file data by @rzmk in https://github.com/jqnatividad/qsv/pull/1858 * build(deps): bump actions/setup-python from 5.1.0 to 5.1.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1961 * build(deps): bump actix-web from 4.6.0 to 4.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1866 * build(deps): bump actix-web from 4.7.0 to 4.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1901 * build(deps): bump atoi_simd from 0.15.6 to 0.16.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1844 * build(deps): bump cached from 0.51.3 to 0.51.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1874 * build(deps): bump cached from 0.51.4 to 0.52.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1938 * build(deps): bump csvs_convert from 0.8.10 to 0.8.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1891 * build(deps): bump csvs_convert from 0.8.11 to 0.8.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1948 * build(deps): bump curve25519-dalek from 4.1.2 to 4.1.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1893 * build(deps): bump flexi_logger from 0.28.0 to 0.28.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1853 * build(deps): bump flexi_logger from 0.28.1 to 0.28.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1868 * build(deps): bump flexi_logger from 0.28.2 to 0.28.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1870 * build(deps): bump flexi_logger from 0.28.3 to 0.28.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1881 * build(deps): bump flexi_logger from 0.28.4 to 0.28.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1904 * build(deps): bump geosuggest-core from 0.6.2 to 0.6.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1883 * build(deps): bump geosuggest-utils from 0.6.2 to 0.6.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1882 * build(deps): bump jql-runner from 7.1.9 to 7.1.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1845 * build(deps): bump jql-runner from 7.1.10 to 7.1.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1856 * build(deps): bump jql-runner from 7.1.11 to 7.1.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1903 * build(deps): bump jql-runner from 7.1.12 to 7.1.13 by @dependabot in https://github.com/jqnatividad/qsv/pull/1960 * build(deps): bump log from 0.4.21 to 0.4.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1925 * build(deps): bump mimalloc from 0.1.42 to 0.1.43 by @dependabot in https://github.com/jqnatividad/qsv/pull/1911 * build(deps): bump mlua from 0.9.8 to 0.9.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1894 * `deps`: apply latest polars upstream with unreleased fixes https://github.com/jqnatividad/qsv/commit/261ede59058a123c4cba62c0945a1fc4e1c77861 * `deps`: we now track py-polars release, instead of rust-polars https://github.com/jqnatividad/qsv/pull/1854 * `deps`: update polars engine to use py-polars-1.0.0-beta1 https://github.com/jqnatividad/qsv/pull/1896 * build(deps): bump polars from 0.41.0 to 0.41.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1909 * build(deps): bump polars from 0.41.1 to 0.41.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1916 * deps: bump polars from 0.41.2 to 0.41.3 https://github.com/jqnatividad/qsv/commit/dc0492ffe2669ddf8a7ff3f82fcd2db8daad83f9 * build(deps): bump pyo3 from 0.21.2 to 0.22.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1918 * build(deps): bump pyo3 from 0.22.0 to 0.22.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1950 * build(deps): bump regex from 1.10.4 to 1.10.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1865 * build(deps): bump redis from 0.25.3 to 0.25.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1846 * build(deps): bump reqwest from 0.12.4 to 0.12.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1889 * build(deps): bump self_update from 0.40.0 to 0.41.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1939 * build(deps): bump serde from 1.0.203 to 1.0.204 by @dependabot in https://github.com/jqnatividad/qsv/pull/1949 * build(deps): bump serde_json from 1.0.117 to 1.0.118 by @dependabot in https://github.com/jqnatividad/qsv/pull/1920 * build(deps): bump serde_json from 1.0.118 to 1.0.119 by @dependabot in https://github.com/jqnatividad/qsv/pull/1932 * build(deps): bump serde_json from 1.0.119 to 1.0.120 by @dependabot in https://github.com/jqnatividad/qsv/pull/1935 * build(deps): bump simple-expand-tilde from 0.1.6 to 0.1.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1886 * build(deps): bump strum from 0.26.2 to 0.26.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1913 * build(deps): bump strum_macros from 0.26.2 to 0.26.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1855 * build(deps): bump strum_macros from 0.26.3 to 0.26.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1863 * build(deps): bump sysinfo from 0.30.12 to 0.30.13 by @dependabot in https://github.com/jqnatividad/qsv/pull/1957 * build(deps): bump sysinfo from 0.30.12 to 0.30.13 by @dependabot in https://github.com/jqnatividad/qsv/pull/1965 * build(deps): bump titlecase from 3.2.0 to 3.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1963 * build(deps): bump tokio from 1.37.0 to 1.38.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1850 * build(deps): bump url from 2.5.0 to 2.5.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1869 * build(deps): bump url from 2.5.1 to 2.5.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1895 * build(deps): bump uuid from 1.8.0 to 1.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1912 * build(deps): bump uuid from 1.9.0 to 1.9.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1919 * build(deps): bump uuid from 1.9.1 to 1.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1964 * build(deps): bump xxhash-rust from 0.8.10 to 0.8.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1942 * apply select clippy suggestions * updated several indirect dependencies * made various usage text improvements * added several benchmarks * pin Rust nightly to 2024-06-23 ### Fixed * `frequency`: fix unique identifiers column detection https://github.com/jqnatividad/qsv/pull/1966 * `json`: add empty single JSON object logic & empty tests by @rzmk in https://github.com/jqnatividad/qsv/pull/1958 * `json`: fix typo in error message by @rzmk in https://github.com/jqnatividad/qsv/pull/1929 * `sniff`: fix doc typo `sanple` -> `sample` by @rzmk in https://github.com/jqnatividad/qsv/pull/1947 * `validate`: validating with a JSONSchema requires headers https://github.com/jqnatividad/qsv/commit/616438213de44e4377a98ea81a676a7900bd4ae9 * Fixed several typos https://github.com/jqnatividad/qsv/commit/9fdf0662b6dc4fa6ebfed592a177d8539f264041 ### Removed * `describegpt`: remove `--ollama` since Ollama v0.1.49 has endpoints by @rzmk in https://github.com/jqnatividad/qsv/pull/1946 * `json`: remove necessity for `polars` feature & fix `--list` formatting by @rzmk in https://github.com/jqnatividad/qsv/pull/1936 * `jsonp`: remove `jsonp` command in favor of `json` by @rzmk in https://github.com/jqnatividad/qsv/pull/1924 * `deps`: fine tune polars features and remove explicit polars-ops dependency https://github.com/jqnatividad/qsv/commit/ccfd000d129799f5a106a7d4c8edab88af37367b **Full Changelog**: https://github.com/jqnatividad/qsv/compare/0.128.0...0.129.0

To stay updated with datHere's latest news and updates (including qsv pro, datHere's CKAN DMS, and analyze.dathere.com), subscribe to the newsletter here: dathere.com/newsletter

- Rust
Published by jqnatividad over 1 year ago

https://github.com/dathere/qsv - 0.128.0

[0.128.0] - 2024-05-25

❤️ csv,conf,v8 Edition 🎉
🏇🏽 ¡Ándale! ¡Ándale! ¡Arriba! ¡Arriba! 💨

csv,conf,v8-qsv-ai-logo-previews

Yii-hah! We're Mexico bound as we head to csv,conf,v8 to present and share qsv with fellow data-makers and wranglers from all over!

And we've packed a lot into this release for the occasion: * search got a lot of love as it now powers qsv pro's new search feature to get near-instant search results even on large datasets. * stats - the ❤️ of qsv, now has several cache fine-tuning options with --cache-threshold. It now also computes max_precision for floats and is_ascii for strings. It also has a new --round 9999 sentinel value to suppress rounding of statistics. * schema & tojsonl are now faster thanks to stats --cache-threshold autoindex & cache creation/deletion logic. * We upgraded Polars to 0.40.0 to unlock additional capabilities in the count, joinp & sqlp commands. * count now has an additional blazing fast counting mode using Polars' read_csv() table function. * frequency gets some micro-optimizations for even faster frequency analysis. * luau is now bundled with luau 0.625 from 0.622. We also upgraded the bundled LuaDate library from 2.2.0 to 2.2.1. All of this, while making it ~10% faster!

Overall, qsv manages to keep its performance edge despite the addition of new capabilities and features. We'll give a whirlwind tour of qsv and these updates in our talk at csv,conf,v8.

We'll also preview what we've been calling the People's APPI - our "Answering People/Policymaker Interface" in qsv pro.

This is a new way to interact with qsv that's more conversational and less command-line-y using a natural language interface. It's a way to make qsv more accessible to more people, especially those who are not comfortable with the command line.

We're excited to share all these qsv innovations with the csv,conf,v8 community and the wider world! Nos vemos en Puebla!

¡Ándele! ¡Ándele! ¡Epa! ¡Epa! ¡Epa!


Added

  • count: additional Polars-powered counting mode using read_csv() SQL table function https://github.com/jqnatividad/qsv/commit/05c580912365356e9c5383654f351e0cc6ebaab6
  • input: add --quote-style option https://github.com/jqnatividad/qsv/commit/df3c8f14a4eaa2fba7237dfe30df2fef8c98eccd
  • joinp: add --coalesce option https://github.com/jqnatividad/qsv/commit/8d142e51d683ab425fc53b2dddfdeeff6a814ffa
  • search: add --preview-match option https://github.com/jqnatividad/qsv/pull/1785
  • search: add --json output option https://github.com/jqnatividad/qsv/pull/1790
  • search: add "match-only" --flag option mode https://github.com/jqnatividad/qsv/pull/1799
  • search: add --not-one flag for not using exit code 1 when no match by @rzmk in https://github.com/jqnatividad/qsv/pull/1810
  • sqlp: add --decimal-comma option https://github.com/jqnatividad/qsv/pull/1832
  • stats: add --cache-threshold option https://github.com/jqnatividad/qsv/pull/1795
  • stats: add --cache-threshold autoindex creation/deletion logic https://github.com/jqnatividad/qsv/pull/1809
  • stats: add additional mode to --cache-threshold https://github.com/jqnatividad/qsv/commit/63fdc55828ec55bf7545c37bd56a4d537aa0cf71
  • stats: now computes max_precision for floats https://github.com/jqnatividad/qsv/pull/1815
  • stats: add --round 9999 sentinel value support to suppress rounding https://github.com/jqnatividad/qsv/pull/1818
  • stats: add is_ascii column https://github.com/jqnatividad/qsv/pull/1824
  • added new benchmarks for search command https://github.com/jqnatividad/qsv/commit/58d73c3beb41071d6cd8532768f0991f0554b717

Changed

  • count: document three count modes https://github.com/jqnatividad/qsv/commit/3d5a333ca8aef3aeaf74ff9e153b5118eb6a605b
  • describegpt: update --max-tokens type for LLMs with larger context sizes by @rzmk https://github.com/jqnatividad/qsv/pull/1841
  • excel: use simpler range::headers() to get headers https://github.com/jqnatividad/qsv/commit/069acbf5a6e86132214521324720608f4258c20f
  • frequency: ensure --other-sorted works with --other-text https://github.com/jqnatividad/qsv/commit/7430ad76bda869be7729ea5000ad4d85a875433b
  • frequency: microoptimize hot loop https://github.com/jqnatividad/qsv/commit/d9c01e17fa6c4f853a501fe75c6a6b8a30c269d2, https://github.com/jqnatividad/qsv/commit/7c9f925184100f89f6f3a77ae4f7b93448103f38 and
  • luau: improve usage text https://github.com/jqnatividad/qsv/commit/cb6b4d9b7bfb60a10385057ca093453e3549e424
  • luau: we now bundle luau 0.625 from 0.622 https://github.com/jqnatividad/qsv/commit/40609751950a852f998fba41edb35aab31c74c20
  • luau: update vendored LuaDate library from 2.2.0 to 2.2.1 https://github.com/jqnatividad/qsv/pull/1840
  • schema: adjust to reflect stats --cache-threshold option https://github.com/jqnatividad/qsv/commit/92fed8696fd885d3721f07eeedcf67732febed4c
  • slice: move json output helpers to util https://github.com/jqnatividad/qsv/commit/1f44b488784fd0c1ef22786ab7aeacbf2f8cf976
  • tojsonl: refactor boolcheck helper https://github.com/jqnatividad/qsv/commit/74d5f5a8c934254e11ee611973cc10524a288a9e
  • docs: cross-reference split & partition commands https://github.com/jqnatividad/qsv/pull/1828
  • contrib(bashly): update completions.bash for qsv v0.127.0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1776
  • contrib(bashly): update completions.bash for qsv v0.128.0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1838
  • deps: upgrade to polars 0.40.0 https://github.com/jqnatividad/qsv/pull/1831
  • build(deps): bump actix-web from 4.5.1 to 4.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1825
  • build(deps): bump anyhow from 1.0.82 to 1.0.83 by @dependabot in https://github.com/jqnatividad/qsv/pull/1798
  • build(deps): bump anyhow from 1.0.83 to 1.0.85 by @dependabot in https://github.com/jqnatividad/qsv/pull/1823
  • build(deps): bump anyhow from 1.0.85 to 1.0.86 by @dependabot in https://github.com/jqnatividad/qsv/pull/1826
  • build(deps): bump cached from 0.50.0 to 0.51.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1789
  • build(deps): bump cached from 0.51.0 to 0.51.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1793
  • build(deps): bump cached from 0.51.1 to 0.51.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1802
  • build(deps): bump cached from 0.51.2 to 0.51.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1805
  • build(deps): bump crossbeam-channel from 0.5.12 to 0.5.13 by @dependabot in https://github.com/jqnatividad/qsv/pull/1827
  • build(deps): bump csvs_convert from 0.8.9 to 0.8.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1808
  • build(deps): bump data-encoding from 2.5.0 to 2.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1780
  • build(deps): bump file-format from 0.24.0 to 0.25.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1807
  • build(deps): bump flate2 from 1.0.28 to 1.0.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1778
  • build(deps): bump flate2 from 1.0.29 to 1.0.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1784
  • build(deps): bump hashbrown from 0.14.3 to 0.14.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1781
  • build(deps): bump itertools from 0.12.1 to 0.13.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1822
  • deps: bump forked jsonschema from 0.17.1 to 0.18.0 https://github.com/jqnatividad/qsv/commit/f02620fd170804b1995b070e8133522b98a8c443
  • build(deps): bump mimalloc from 0.1.41 to 0.1.42 by @dependabot in https://github.com/jqnatividad/qsv/pull/1829
  • build(deps): bump mlua from 0.9.7 to 0.9.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1821
  • build(deps): bump qsv-stats from 0.16.0 to 0.17.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1813
  • build(deps): bump qsv-stats from 0.17.1 to 0.17.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1814
  • build(deps): bump qsv-stats from 0.17.2 to 0.18.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1816
  • build(deps): bump ryu from 1.0.17 to 1.0.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/1801
  • build(deps): bump semver from 1.0.22 to 1.0.23 by @dependabot in https://github.com/jqnatividad/qsv/pull/1800
  • build(deps): bump serde from 1.0.198 to 1.0.199 by @dependabot in https://github.com/jqnatividad/qsv/pull/1777
  • build(deps): bump serde from 1.0.199 to 1.0.200 by @dependabot in https://github.com/jqnatividad/qsv/pull/1787
  • build(deps): bump serde from 1.0.200 to 1.0.201 by @dependabot in https://github.com/jqnatividad/qsv/pull/1804
  • build(deps): bump serde from 1.0.201 to 1.0.202 by @dependabot in https://github.com/jqnatividad/qsv/pull/1817
  • build(deps): bump serde_json from 1.0.116 to 1.0.117 by @dependabot in https://github.com/jqnatividad/qsv/pull/1806
  • build(deps): bump serial_test from 3.1.0 to 3.1.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1779
  • build(deps): bump simple-expand-tilde from 0.1.5 to 0.1.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1811
  • build(deps): bump sysinfo from 0.30.11 to 0.30.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1797
  • build(deps): bump titlecase from 3.0.0 to 3.1.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1791
  • build(deps): bump jql-runner from 7.1.8 to 7.1.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1839
  • apply select clippy suggestions
  • updated several indirect dependencies
  • pin Rust nightly to 2024-05-14
  • bump MSRV to 1.78

Fixed

  • luau: correct example when using --colindex https://github.com/jqnatividad/qsv/commit/cbbed21718324346031a3201407f274abfec5ee6
  • search: fix --json output https://github.com/jqnatividad/qsv/pull/1792
  • pass through docopt messages without a prefix https://github.com/jqnatividad/qsv/pull/1835
  • apply Polars SQL count(*) group by fix https://github.com/jqnatividad/qsv/pull/1837

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.127.0...0.128.0

- Rust
Published by jqnatividad almost 2 years ago

https://github.com/dathere/qsv - 0.127.0

📊 Enhanced Frequency Analysis 📊

This a quick release adding several frequency enhancements for more detailed frequency analysis. The frequency command now includes a percentage column, calculates other values, and supports limiting unique counts and negative limits. These options provides additional context for Datapusher+, qsv-pro and describegpt so their metadata inferences are more accurate and comprehensive.

Previously, for a 775-row CSV file containing one column named state with entries for all 50 states, frequency only showed[^1]:

qsv frequency freq_state_example.csv | qsv table field value count state NY 100 state NJ 70 state CA 60 state MA 55 state FL 45 state TX 43 state NM 40 state AZ 39 state NV 38 state MI 35

Now, there's a new percentage column and other values calculation, both of which have configurable options:

qsv frequency freq_state_example.csv | qsv table field value count percentage state NY 100 12.90323 state NJ 70 9.03226 state CA 60 7.74194 state MA 55 7.09677 state FL 45 5.80645 state TX 43 5.54839 state NM 40 5.16129 state AZ 39 5.03226 state NV 38 4.90323 state MI 35 4.51613 state Other (40) 250 32.25806

This release is also out of cycle to address a big performance regression in the excel command caused by unnecessary formula info retrieval for the --error-format option introduced in 0.126.0. This has been fixed, and the excel command is now back to its speedy self.


Added

  • frequency: added percentage column; other values calculation, implementing https://github.com/jqnatividad/qsv/issues/1774 https://github.com/jqnatividad/qsv/pull/1775
  • benchmarks: added new frequency and excel benchmarks https://github.com/jqnatividad/qsv/commit/b83ad3aae1cdf9a1750201cbf9b3ccd4ac3a4192

Changed

  • contrib(bashly): update completions.bash for qsv v0.126.0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1771
  • build(deps): bump mimalloc from 0.1.39 to 0.1.41 by @dependabot in https://github.com/jqnatividad/qsv/pull/1772
  • build(deps): bump qsv-stats from 0.14.0 to 0.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1773
  • updated several indirect dependencies
  • applied select clippy recommendations

Fixed

  • excel: fixed performance regression because qsv was unnecessarily getting formula info (an expensive operation) for --error-format option even when not required https://github.com/jqnatividad/qsv/commit/772af3420c44c864e06cd2cb61606900bff17947
  • renamed 0.126.0 sqlpvsduckdb benchmark results so they're next to each other for easy direct comparison. https://github.com/jqnatividad/qsv/commit/7bcd59e301965b9e8737a9230d1236e8d34ab4bf.
    Per the benchmarks, sqlp is 2.87 times faster than duckdb v0.10.2 for a simple aggregation (0.066 secs vs 0.19 secs), and 1.42 times faster for an "expensive" aggregation (0.143 secs vs 0.203 secs).

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.126.0...0.127.0

[^1]: with its default --limit setting of 10 only show the top 10 unique values in the column, sorted by occurence

- Rust
Published by jqnatividad almost 2 years ago

https://github.com/dathere/qsv - 0.126.0

🤖 Expanded Metadata Inferencing 🤖

describegpt headlines this release, with its new ability to support other local Large Language Models (LLMs) using popular tools that serve them through APIs such as Ollama and Jan. This broadens the tool's utility in diverse AI environments. Beyond OpenAI, qsv can now use other popular LLMs like Llama 3, Mistral, and Gemma. It also unlocks expanded metadata inferencing capabilities in qsv pro.

Several commands got additional options: cat with --no-headers support in the rowskey subcommand; excel with new options like --error-format and short --metadata mode; and foreach with a --dry-run option. frequency also got new options, including --unq-limit for limiting unique counts, support for negative limits, and a --lmt-threshold option for compiling comprehensive frequencies below a threshold. slice now supports negative indices and new JSON output options, providing more flexibility in data slicing.

This is all rounded out with sqlp improvements, including support for single-line comments in SQL scripts and a special SKIP_INPUT value to skip input preprocessing when using table functions directly in Polars SQL (e.g. read_csv() and read_parquet()) - all while increasing performance thanks to the Polars engine being upgraded to 0.39.2.


New Features

  • cat: Added --no-headers support to the rowskey subcommand.
  • describegpt: Added compatibility for other local Large Language Models (LLMs) such as Ollama and Jan, broadening the tool's utility in diverse AI environments.
  • excel: Introduced new options in the excel command: --error-format for better error handling and a short --metadata JSON mode.
  • foreach: added a --dry-run option, allowing users to preview the results of scripts without executing them.
  • frequency: New options added such as --unq-limit for limiting unique counts; support for negative limits to only show frequencies >= abs(negative limit); and a --lmt-threshold option to allow the compilation of comprehensive frequencies below the threshold - all providing more detailed control over frequency analysis.
  • slice: Support for negative indices to slice from the end and new JSON output options.
  • sqlp: sqlp now supports single-line comments and includes a special SKIP_INPUT value for more efficient data loading. The Polars engine has also been upgraded to 0.39.2, providing enhanced performance and stability.

Changes and Optimizations

  • Performance Enhancements: Microoptimizations in datefmt and validate commands, and increased default length for --infer-len in sqlp for improved performance.
  • Dependency Updates: Numerous updates including bumping Luau, jql-runner, pyo3, and other dependencies to enhance stability and security.
  • Benchmarks Added: New performance benchmarks for sqlp vs duckdb added to ensure there are no performance regressions between releases. Right now, sqlp is faster than duckdb in most cases (thanks to Polars - see the latest TPC-H benchmarks), but we want to make sure that we keep it that way.

Security and Robustness

  • Security Fixes: Updated rustls to fix a specific CVE, and other minor fixes to enhance the security and robustness of network and data processing features.
  • Bug Fixes: Various bug fixes including improvements in error formatting in excel and robustness in fetch and fetchpost commands.

Deprecated Features

  • fetch & fetchpost: Removal of the jsonxf crate from these commands to streamline JSON processing
  • reverse: Eliminate kludgy buffer expansions.

Added

  • cat: add --no-headers support to rowskey subcommand https://github.com/jqnatividad/qsv/pull/1762
  • describegpt: add compatibility for other (local) LLMs (Ollama, Jan, etc.) by @rzmk in https://github.com/jqnatividad/qsv/pull/1761
  • excel: add --error-format option https://github.com/jqnatividad/qsv/pull/1721
  • excel: add --metadata short JSON mode https://github.com/jqnatividad/qsv/pull/1738
  • foreach: add --dry-run option https://github.com/jqnatividad/qsv/pull/1740
  • frequency: add --unq-limit option https://github.com/jqnatividad/qsv/pull/1763
  • frequency: add support for negative --limits https://github.com/jqnatividad/qsv/pull/1765
  • frequency: add --lmt-threshold option https://github.com/jqnatividad/qsv/pull/1766
  • slice: add support for negative --index option values https://github.com/jqnatividad/qsv/pull/1726
  • slice: implement --json output option https://github.com/jqnatividad/qsv/pull/1729
  • sqlp: added support for single-line comments in SQL scripts https://github.com/jqnatividad/qsv/commit/bb52bcee61d8ea980a2ab093315ead0c153517a5
  • sqlp: added SKIPINPUT special value to short-circuit input processing if the user wants to load input files directly using table functions (e.g. readcsv(), read_parquet(), etc.) https://github.com/jqnatividad/qsv/commit/fe850adb47f1d7aa7f6c3981e350646e7b0c7476
  • validate: add --valid-output option https://github.com/jqnatividad/qsv/pull/1730
  • contrib: add sample Bashly completions implementation by @rzmk in https://github.com/jqnatividad/qsv/pull/1731
  • benchmarks: added sqlp vs duckdb benchmarks.

Changed

  • datefmt: microoptimize formatting https://github.com/jqnatividad/qsv/commit/0ee27e768fdc08b7381094842d22b45940fd0a26
  • joinp: adapt to breaking change in Polars 0.39 for lazyframe sort https://github.com/jqnatividad/qsv/commit/c625ca9f5aef59c736a837aaa4eeda7688403c37
  • sqlp: change --infer-len option default from 250 to 1000 for increased performance https://github.com/jqnatividad/qsv/commit/da1d215d803f8bfe400a7202feeecb8ae14239e9
  • validate: microoptimize to_json_instance() https://github.com/jqnatividad/qsv/commit/c2e4a1c696300eea04cccacca33f6872622ec086
  • bump Luau from 0.616 to 0.622 https://github.com/jqnatividad/qsv/commit/9216ec3a53767379662657f69c0076f4a52caaff
  • build(deps): bump jql-runner from 7.1.6 to 7.1.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1711
  • build(deps): bump pyo3 from 0.21.0 to 0.21.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1712
  • build(deps): bump pyo3 from 0.21.1 to 0.21.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1750
  • build(deps): bump strsim from 0.11.0 to 0.11.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1715
  • build(deps): bump sysinfo from 0.30.7 to 0.30.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1716
  • build(deps): bump sysinfo from 0.30.8 to 0.30.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1732
  • build(deps): bump sysinfo from 0.30.9 to 0.30.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1735
  • build(deps): bump sysinfo from 0.30.10 to 0.30.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1755
  • build(deps): bump redis from 0.25.2 to 0.25.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1720
  • build(deps): bump mlua from 0.9.6 to 0.9.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1724
  • build(deps): bump reqwest from 0.12.2 to 0.12.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1725
  • build(deps): bump reqwest from 0.12.3 to 0.12.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1759
  • build(deps): bump anyhow from 1.0.81 to 1.0.82 by @dependabot in https://github.com/jqnatividad/qsv/pull/1733
  • build(deps): bump robinraju/release-downloader from 1.9 to 1.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1734
  • build(deps): bump chrono from 0.4.37 to 0.4.38 by @dependabot in https://github.com/jqnatividad/qsv/pull/1744
  • bump polars from 0.38 to 0.39 https://github.com/jqnatividad/qsv/pull/1745
  • build(deps): bump polars from 0.39.0 to 0.39.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1746
  • build(deps): bump polars from 0.39.1 to 0.39.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1752
  • build(deps): bump qsv-dateparser from 0.12.0 to 0.12.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1747
  • build(deps): bump serde_json from 1.0.115 to 1.0.116 by @dependabot in https://github.com/jqnatividad/qsv/pull/1749
  • build(deps): bump serde from 1.0.197 to 1.0.198 by @dependabot in https://github.com/jqnatividad/qsv/pull/1751
  • build(deps): bump rustls from 0.22.3 to 0.22.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1758
  • build(deps): bump simple-expand-tilde from 0.1.4 to 0.1.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1767
  • build(deps): bump serial_test from 3.0.0 to 3.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1768
  • build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1769
  • applied select clippy recommendations
  • updated several indirect dependencies
  • added several benchmarks for new/changed commands
  • pin Rust nightly to 2024-04-15 - the same nightly that Polars 0.39 is pinned to
  • bumped MSRV to 1.77.2

Fixed

  • Make init_logger more robust https://github.com/jqnatividad/qsv/pull/1717
  • count: empty CSVs count as zero also for polars. Fixes #1741 https://github.com/jqnatividad/qsv/pull/1742
  • excel: fix #1682 by adding --error-format option https://github.com/jqnatividad/qsv/issues/1689
  • fetch & fetchpost: more robust JSON response validation https://github.com/jqnatividad/qsv/commit/ebc7287cd929cc23629ee53c7d82e0b8984bc2b0
  • slice: use write! macro to get rid of GH Advanced Security lint https://github.com/jqnatividad/qsv/commit/c739097e20d526cb6f49ca69d76fed8b28adc029
  • sqlp: fixed docopt defaults that were not being parsed correctly https://github.com/jqnatividad/qsv/commit/fe850adb47f1d7aa7f6c3981e350646e7b0c7476
  • deps: bump h2 from 0.4.3 to 0.4.4 to fix HTTP2 Continuation Flood vulnerability https://github.com/jqnatividad/qsv/commit/6af0da27f4e4a0bb6d5563701c07c89ad00f76b8
  • deps: bump rustls from 0.22.3 to 0.22.4 to fix https://nvd.nist.gov/vuln/detail/CVE-2024-32650 https://github.com/jqnatividad/qsv/pull/1758

Removed

  • fetch & fetch post: remove jsonxf crate; use serde_json to prettify JSON strings https://github.com/jqnatividad/qsv/pull/1727
  • reverse: remove kludgy expansion of read/write buffers https://github.com/jqnatividad/qsv/commit/46095cdf57f65c5380251c5d59317053ae1f80c3

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.125.0...0.126.0

- Rust
Published by jqnatividad almost 2 years ago

https://github.com/dathere/qsv - 0.125.0

In this release, we focused on the 🏎️ need for even more speed 🏎️ .

This was done primarily by tweaking several supporting qsv crates. qsv-docopt now parses command-line arguments slightly faster. qsv-stats, the crate behind commands like stats, schema, tojsonl, and frequency, has been further optimized for speed. qsv-dateparser has been updated to support new timezone handling options in datefmt. qsv-sniffer also got a speed boost.

Per the benchmark suite, stats is 25% faster (1.563 secs vs 2.067 secs) when computing the 13 "streaming" stats and 13% faster when computing --everything (17 columns of addl stats - 3.149 secs vs 3.656 secs) for the 1M row, 41 column, 520mb sample of NYC's 311 data.

The count command has been refactored to utilize Polars' SQLContext, which leverages LazyFrames evaluation to automagically count even very large files in just a few seconds. Previously, count was already using Polars, but it mistakenly fell back to a slower counting mode. Now, it consistently delivers fast performance, even without an index. On the same benchmark suite, it takes 0.052 secs vs 0.503 seconds - almost 10x faster!

As count is not just a top-level command, but also a widely used helper used by several qsv commands, this gives the entire suite a nice performance boost.

Continuing on the performance front, the excel command now has a new short --metadata mode, allowing users to just get a "shorter" version of the metadata report that only list the workbook's top level metadata (sheet index, sheet name, sheet type, visibility) instead of the full metadata report (which also has info like num rows, column metadata, etc.). On the benchmark suite, the short metadata report takes all of 0.005 secs vs 11.237 secs for the 1M row xlsx version of the same NYC 311 data - more than 3 orders of magnitude faster! (it may actually be faster since 0.005 secs is at the limits of what hyperfine can measure)

The datefmt command also got some major enhancements with new timezone handling and timestamp parsing options, though at the cost of a small 15% performance penalty.

Lastly, we are excited to announce that qsv will be featured at the CSV,Conf,V8 conference in Puebla, Mexico on May 28-29. I'll be presenting a talk titled "qsv: A Blazing Fast CSV Data-Wrangling Toolkit". Hope to see you there!.


Added

  • excel: added short mode to --metadata option https://github.com/jqnatividad/qsv/pull/1699
  • datefmt: added ts-resolution option to specify resolution to use when parsing unix timestamps https://github.com/jqnatividad/qsv/pull/1704
  • datefmt: added timezone handling options https://github.com/jqnatividad/qsv/pull/1706 https://github.com/jqnatividad/qsv/pull/1707 https://github.com/jqnatividad/qsv/pull/1642

Changed

  • count: refactored to use Polars SQLContext https://github.com/jqnatividad/qsv/commit/43a236f6a45c890d2bb6b4c43eb469bd627f82e1
  • stats: refactored stats_path helper function https://github.com/jqnatividad/qsv/commit/174c30e3b87470613ff34a98617d44e477a4296a
  • apply, applydp, datefmt, excel, geocode, py, validate: use std::mem::take to avoid clone https://github.com/jqnatividad/qsv/commit/1fd187f23262b51e0f431664895d49fd930d011a https://github.com/jqnatividad/qsv/commit/8402d3a8063ef161fc9ec68dd7f0f0601802d21d https://github.com/jqnatividad/qsv/commit/849615775505a25888a50b255ba0d544e878aeaf
  • excel: optimized workbook opening operation https://github.com/jqnatividad/qsv/commit/67f662eba501e543ec44e5daf5eb175f8a8ae7b1
  • build(deps): bump flexi_logger from 0.27.4 to 0.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1673
  • build(deps): bump polars from 0.38.2 to 0.38.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1674
  • build(deps): bump uuid from 1.7.0 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1675
  • build(deps): bump hashbrown from 0.14.3 to 0.14.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1680
  • build(deps): bump reqwest from 0.11.26 to 0.11.27 by @dependabot in https://github.com/jqnatividad/qsv/pull/1679
  • build(deps): bump bytes from 1.5.0 to 1.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1685
  • build(deps): bump regex from 1.10.3 to 1.10.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1686
  • build(deps): bump indexmap from 2.2.5 to 2.2.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1687
  • build(deps): bump rayon from 1.9.0 to 1.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1688
  • build(deps): bump qsv_docopt from 1.6.0 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1691
  • build(deps): bump reqwest from 0.12.1 to 0.12.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1693
  • build(deps): bump serde_json from 1.0.114 to 1.0.115 by @dependabot in https://github.com/jqnatividad/qsv/pull/1694
  • build(deps): bump itoa from 1.0.10 to 1.0.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1695
  • build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1700
  • build(deps): bump rust_decimal from 1.34.3 to 1.35.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1701
  • build(deps): bump chrono from 0.4.35 to 0.4.37 by @dependabot in https://github.com/jqnatividad/qsv/pull/1702
  • build(deps): bump tokio from 1.36.0 to 1.37.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1703
  • build(deps): bump qsv-sniffer from 0.10.2 to 0.10.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1708
  • build(deps): bump titlecase from 2.2.1 to 3.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1709
  • build(deps): bump qsv-stats from 0.13.0 to 0.14.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1710
  • applied select clippy recommendations
  • updated several indirect dependencies
  • added several benchmarks for new/changed commands
  • bumped MSRV to 1.77.1
  • use #[cfg(debug_assertions)] conditional compilation to avoid compiling debug code in release mode
  • use patched forks of jsonschema, cached, self_update and localzone crates to avoid old dependencies which was causing dependency bloat

Fixed

  • count: fixed polarscountinput helper, as it was always falling back to "slow" counting mode https://github.com/jqnatividad/qsv/commit/3484c89080d41d2e39457c918a893189aee64753

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.124.1...0.125.0

- Rust
Published by jqnatividad almost 2 years ago

https://github.com/dathere/qsv - 0.124.1

Datapusher+ "Speed of Insight" Release! 🚀🚀🚀

This release is all about speed, speed, speed! We've made qsv even faster by leveraging Polars' multithreaded, mem-mapped CSV reader to get near-instant row counts of large CSV files, and near instant SQL queries and aggregations with Datapusher+ - automagically inferring metadata and giving you quick insights into your data in seconds!

We're demoing our qsv-powered Datapusher+ at the March 2024 installment of CKAN Montly Live on March 20, 2024, 13:00-14:00 UTC. Join us!

Beyond pushing data reliably at speed into your CKAN Datastore (it pushes real good! 😉), DP+ does some extended analysis, processing and enrichment of the data so it can be readily Used.

Both fetch and fetchpost commands now also have a --disk-cache option and are fully synched - forming the foundation for high-speed data enrichment from Web Services - including datHere's forthcoming, fully-integrated Data Enrichment Service.

🏇🏽 Hi-ho Quicksilver, away! 🏇🏽


Added

  • count: automatically use Polars multithreaded, mem-mapped CSV reader when polars feature is enabled to get near-instant row counts of large CSV files even without an index https://github.com/jqnatividad/qsv/pull/1656
  • qsvdp: added polars support to Datapusher+-optimized binary variant, so we can do near instant SQL queries and aggregations during DP+ processing https://github.com/jqnatividad/qsv/pull/1664
  • fetchpost: added --disk-cache options and synced usage options with fetch https://github.com/jqnatividad/qsv/pull/1671
  • extended .infile-list to skip empty and commented lines, and to validate file paths https://github.com/jqnatividad/qsv/commit/20a45c80fa32ef8a8060bb32cc94b7934da23229 and https://github.com/jqnatividad/qsv/commit/26509303719ce29e900cb73b5000671a78db6b4a

Changed

  • sqlp: automatically disable read_csv() fast path optimization when a custom delimiter is specified https://github.com/jqnatividad/qsv/pull/1648
  • refactored util::count_rows() helper to also use polars if available https://github.com/jqnatividad/qsv/commit/1e09e17e440d3cdc11237d9d9e45cefb82da5a42 and https://github.com/jqnatividad/qsv/commit/8d321fe8ad4c288b72edc7e8d082fcd6ec304a32
  • publish: updated Windows MSI publish GH Action workflow to use Wix 3.14 from 3.11 https://github.com/jqnatividad/qsv/commit/75894ef4e894f521056a93b4f0a14d7469bac022
  • deps: bump polars from 0.38.1 to 0.38.2 https://github.com/jqnatividad/qsv/commit/5faf90ed830541a724768e808c7f07f0a418e2ab
  • deps: update Luau from 0.614 to 0.616 https://github.com/jqnatividad/qsv/commit/eb197fe81738b4ed15352f5f89d5d5d1b0fad604 and https://github.com/jqnatividad/qsv/commit/52331da939a3cd278c6a1f474179bef2207364a8
  • build(deps): bump sysinfo from 0.30.6 to 0.30.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1650
  • build(deps): bump chrono from 0.4.34 to 0.4.35 by @dependabot in https://github.com/jqnatividad/qsv/pull/1651
  • build(deps): bump strum from 0.26.1 to 0.26.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1658
  • build(deps): bump qsv-stats from 0.12.0 to 0.13.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1663
  • build(deps): bump anyhow from 1.0.80 to 1.0.81 by @dependabot in https://github.com/jqnatividad/qsv/pull/1662
  • build(deps): bump reqwest from 0.11.25 to 0.11.26 by @dependabot in https://github.com/jqnatividad/qsv/pull/1667
  • applied select clippy recommendations
  • updated several indirect dependencies
  • added several benchmarks for new/changed commands

Fixed

  • dedup: fixed #1665 dedup not handling numeric values properly by adding a --numeric option https://github.com/jqnatividad/qsv/pull/1666
  • joinp: reenable join validation tests now that Polars 0.38.2 join validation is working again https://github.com/jqnatividad/qsv/commit/5faf90ed830541a724768e808c7f07f0a418e2ab and https://github.com/jqnatividad/qsv/commit/fcfc75b855c615effb50f23c09a1d66ce70505e8
  • count: broken in unreleased 0.124.0. Polars-powered count require a "clean" CSV file as it infers the schema based on the first 1000 rows of a CSV. This will sometimes result in an invalid "error" (e.g. it infers a column is a number column, when its not). 0.124.1 fixes this by adding a fallback to the "regular" CSV reader if a Polars error occurs https://github.com/jqnatividad/qsv/commit/a2c086900d1c1f1ba8ed2b2d1eaf8e547e3ef740

Removed

  • gender_guesser 0.2.0 has been released. Remove patch.crates-io entry https://github.com/jqnatividad/qsv/commit/97873a5c496bfd559d7a7804db4d28b94915d536

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.123.0...0.124.1

- Rust
Published by jqnatividad almost 2 years ago

https://github.com/dathere/qsv - 0.123.0

OPEN DATA DAY 2024 Release! 🎉🎉🎉

In celebration of Open Data Day, we're releasing qsv 0.123.0 - the biggest release ever with 330+ commits! qsv 0.123.0 continues to focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.

We've been baking qsv pro for a while now, and it's almost ready for release. qsv pro is a cross-platform Desktop Data Wrangling tool marrying an Excel-like UI with the power of qsv, backed by cloud-based data cleaning, enrichment and enhancement service that's easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.

Stay tuned!

Highlights:

  • sqlp now has automatic read_csv() fast path optimization, often making optimized queries run dramatically faster - e.g what took 6.09 seconds for a non-trivial SQL aggregation on an 18 column, 657mb CSV with 7.43 million rows now takes just 0.14 seconds with the optimization - 🚀 43.5x FASTER 🚀 ! [^1] [^1]: measurements taken on an Apple Mac Mini 2023 model with an M2 Pro chip with 12 CPU cores & 32GB of RAM, running macOS Sonoma 14.4 ```bash # with fast path optimization turned off /usr/bin/time qsv sqlp taxi.csv --no-optimizations "select VendorID,sum(totalamount) from taxi group by VendorID order by VendorID" VendorID,totalamount 1,52377417.52985942 2,89959869.13054822 4,600584.610000027 (3, 2) 6.09 real 6.82 user 0.16 sys

with fast path optimization, fully exploiting Polars' multithreaded, mem-mapped CSV reader!

/usr/bin/time qsv sqlp taxi.csv "select VendorID,sum(totalamount) from taxi group by VendorID order by VendorID" VendorID,totalamount 1,52377417.52985942 2,89959869.13054822 4,600584.610000027 (3, 2) 0.14 real 1.09 user 0.09 sys

in contrast, csvq takes 72.46 seconds - 517.57x slower

/usr/bin/time csvq "select VendorID,sum(totalamount) from taxi group by VendorID order by VendorID" +----------+---------------------+ | VendorID | SUM(totalamount) | +----------+---------------------+ | 1 | 52377417.529256366 | | 2 | 89959869.1264675 | | 4 | 600584.6099999828 | +----------+---------------------+ 72.46 real 65.15 user 75.17 sys ```

"Traditional" SQL engines

qsv and csvq both operate on "bare" CSVs. For comparison, let's contrast qsv's performance against "traditional" SQL engines that require setup and import (aka ETL). Not counting setup and import time (which alone, takes several minutes), we get:

sqlite3.43.2 takes 2.910 seconds - 20.79x slower

sql sqlite> .timer on sqlite> select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID; 1,52377417.53 2,89959869.13 4,600584.61 Run Time: real 2.910 user 2.569494 sys 0.272972

PostgreSQL 15.6 using PgAdmin 4 v6.12 takes 18.527 seconds - 132.34x slower

Screenshot 2024-03-06 at 10 14 04 AM

even with an index, qsv sqlp is still 5.96x faster

Screenshot 2024-03-08 at 7 57 57 AM

  • sqlp now supports JSONL output format and adds compression support for Avro and Arrow output formats.
  • fetch now has a --disk-cache option, so you can cache web service responses to disk, complete with cache control and expiry handling!
  • jsonl is now multithreaded with additional --batch and --job options.
  • split now has three modes: split by record count, split by number of chunks and split by file size.
  • datefmt is a new top-level command for date formatting. We extracted it from apply to make it easier to use, and to set the stage for expanded date and timezone handling.
  • enum now has a --start option.
  • excel now has a --keep-zero-time option and now has improved datetime/duration parsing/handling with upgrade of calamine from 0.23 to 0.24.
  • tojsonl now has --trim and --no-boolean options and eliminated false positive boolean inferences.

Added

  • apply: add gender_guess operation https://github.com/jqnatividad/qsv/pull/1569
  • datefmt: new top-level command for date formatting. https://github.com/jqnatividad/qsv/pull/1638
  • enum: add --start option https://github.com/jqnatividad/qsv/pull/1631
  • excel: added --keep-zero-time option; improved datetime/duration parsing/handling with upgrade of calamine from 0.23 to 0.24 https://github.com/jqnatividad/qsv/pull/1595
  • fetch: add --disk-cache option https://github.com/jqnatividad/qsv/pull/1621
  • jsonl: major performance refactor! Now multithreaded with addl --batch and --job options https://github.com/jqnatividad/qsv/pull/1553
  • sniff: added addl mimetype/file formats detected by bumping file-format from 0.23 to 0.24 https://github.com/jqnatividad/qsv/pull/1589
  • split: add <outdir> error handling and add usage text examples https://github.com/jqnatividad/qsv/pull/1585
  • split: added --chunks option https://github.com/jqnatividad/qsv/pull/1587
  • split: add --kb-size option https://github.com/jqnatividad/qsv/pull/1613
  • sqlp: added JSONL output format and compression support for AVRO and Arrow output formats in https://github.com/jqnatividad/qsv/pull/1635
  • tojsonl: add --trim option https://github.com/jqnatividad/qsv/pull/1554
  • Add QSVDOTENVPATH env var https://github.com/jqnatividad/qsv/pull/1562
  • Add license scan report and status by @fossabot in https://github.com/jqnatividad/qsv/pull/1550
  • Added several benchmarks for new/changed commands

Changed

  • luau: bumped Luau from 0.606 to 0.614
  • freq: major performance refactor - https://github.com/jqnatividad/qsv/commit/1a3a4b4f54f7459ce120c2bc907385ad72d34d8e
  • split: migrate to rayon from threadpool https://github.com/jqnatividad/qsv/pull/1555
  • split: refactored to actually create chunks <= desired --kb-size, obviating need for hacky --sep-factor option https://github.com/jqnatividad/qsv/pull/1615
  • tojsonl: improved true/false boolean inferencing false positive handling https://github.com/jqnatividad/qsv/pull/1641
  • tojsonl: fine-tune boolean inferencing https://github.com/jqnatividad/qsv/pull/1643
  • schema: use parallel sort when sorting enums for fields https://github.com/jqnatividad/qsv/commit/523c60a36bf45b4df5e66f3951a91948c22d5261
  • Use array for rustflags to avoid conflicts with user flags by @clarfonthey in https://github.com/jqnatividad/qsv/pull/1548
  • Make it easier and more consistent to package for distros by @alerque in https://github.com/jqnatividad/qsv/pull/1549
  • Replace simple_home_dir with simple_expand_tilde crate https://github.com/jqnatividad/qsv/pull/1578
  • build(deps): bump rayon from 1.8.0 to 1.8.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1547
  • build(deps): bump rayon from 1.8.1 to 1.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1623
  • build(deps): bump uuid from 1.6.1 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1551
  • build(deps): bump jql-runner from 7.1.2 to 7.1.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1552
  • build(deps): bump jql-runner from 7.1.3 to 7.1.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1602
  • build(deps): bump jql-runner from 7.1.5 to 7.1.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1637
  • build(deps): bump flexi_logger from 0.27.3 to 0.27.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1556
  • build(deps): bump regex from 1.10.2 to 1.10.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1557
  • build(deps): bump cached from 0.47.0 to 0.48.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1558
  • build(deps): bump cached from 0.48.0 to 0.48.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1560
  • build(deps): bump cached from 0.48.1 to 0.49.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1618
  • build(deps): bump chrono from 0.4.31 to 0.4.32 by @dependabot in https://github.com/jqnatividad/qsv/pull/1559
  • build(deps): bump chrono from 0.4.32 to 0.4.33 by @dependabot in https://github.com/jqnatividad/qsv/pull/1566
  • build(deps): bump mlua from 0.9.4 to 0.9.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1565
  • build(deps): bump mlua from 0.9.5 to 0.9.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1632
  • build(deps): bump serde from 1.0.195 to 1.0.196 by @dependabot in https://github.com/jqnatividad/qsv/pull/1568
  • build(deps): bump serde from 1.0.196 to 1.0.197 by @dependabot in https://github.com/jqnatividad/qsv/pull/1612
  • build(deps): bump serde_json from 1.0.111 to 1.0.112 by @dependabot in https://github.com/jqnatividad/qsv/pull/1567
  • build(deps): bump serde_json from 1.0.112 to 1.0.113 by @dependabot in https://github.com/jqnatividad/qsv/pull/1576
  • build(deps): bump serde_json from 1.0.113 to 1.0.114 by @dependabot in https://github.com/jqnatividad/qsv/pull/1610
  • bump Polars from 0.36 to 0.37 https://github.com/jqnatividad/qsv/pull/1570
  • build(deps): bump polars from 0.37.0 to 0.38.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1629
  • build(deps): bump polars from 0.38.0 to 0.38.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1634
  • build(deps): bump strum from 0.25.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1572
  • build(deps): bump indexmap from 2.1.0 to 2.2.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1575
  • build(deps): bump indexmap from 2.2.1 to 2.2.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1579
  • build(deps): bump indexmap from 2.2.2 to 2.2.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1601
  • build(deps): bump indexmap from 2.2.4 to 2.2.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1633
  • build(deps): bump robinraju/release-downloader from 1.8 to 1.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1574
  • build(deps): bump itertools from 0.12.0 to 0.12.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1577
  • build(deps): bump rust_decimal from 1.33.1 to 1.34.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1580
  • build(deps): bump rust_decimal from 1.34.0 to 1.34.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1582
  • build(deps): bump rust_decimal from 1.34.2 to 1.34.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1597
  • build(deps): bump reqwest from 0.11.23 to 0.11.24 by @dependabot in https://github.com/jqnatividad/qsv/pull/1581
  • build(deps): bump tokio from 1.35.1 to 1.36.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1583
  • build(deps): bump tempfile from 3.9.0 to 3.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1590
  • build(deps): bump tempfile from 3.10.0 to 3.10.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1622
  • build(deps): bump indicatif from 0.17.7 to 0.17.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1598
  • build(deps): bump csvs_convert from 0.8.8 to 0.8.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1596
  • build(deps): bump ahash from 0.8.7 to 0.8.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1599
  • build(deps): bump ahash from 0.8.8 to 0.8.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1611
  • build(deps): bump ahash from 0.8.9 to 0.8.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1624
  • build(deps): bump ahash from 0.8.10 to 0.8.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1640
  • build(deps): bump governor from 0.6.0 to 0.6.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1603
  • build(deps): bump semver from 1.0.21 to 1.0.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1606
  • build(deps): bump ryu from 1.0.16 to 1.0.17 by @dependabot in https://github.com/jqnatividad/qsv/pull/1605
  • build(deps): bump anyhow from 1.0.79 to 1.0.80 by @dependabot in https://github.com/jqnatividad/qsv/pull/1604
  • build(deps): bump geosuggest-core from 0.6.0 to 0.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1607
  • build(deps): bump geosuggest-utils from 0.6.0 to 0.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1608
  • build(deps): bump pyo3 from 0.20.2 to 0.20.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1616
  • build(deps): bump crossbeam-channel from 0.5.11 to 0.5.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1627
  • build(deps): bump log from 0.4.20 to 0.4.21 by @dependabot in https://github.com/jqnatividad/qsv/pull/1628
  • build(deps): bump sysinfo from 0.30.5 to 0.30.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1636
  • build(deps): bump qsv-sniffer from 0.10.1 to 0.10.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1644
  • deps: bump halfbrown from 0.24 to 0.25 https://github.com/jqnatividad/qsv/commit/b32fc7161715fc0d3cc96b1566f89354bea36abf
  • apply select clippy suggestions
  • update several indirect dependencies
  • pin Rust nightly to 2024-02-23 - the nightly that Polars 0.38 can be built with

Fixed

  • fix: fix feature = "cargo-clippy" deprecation by @rex4539 in https://github.com/jqnatividad/qsv/pull/1626
  • stats: fixed cache.json file not being updated properly https://github.com/jqnatividad/qsv/commit/b9c43713b0943baf2d70eb7089e1d8f05b848b9d

Removed

  • Removed datefmt subcommand from apply https://github.com/jqnatividad/qsv/pull/1638

New Contributors

  • @clarfonthey made their first contribution in https://github.com/jqnatividad/qsv/pull/1548
  • @alerque made their first contribution in https://github.com/jqnatividad/qsv/pull/1549
  • @fossabot made their first contribution in https://github.com/jqnatividad/qsv/pull/1550
  • @rex4539 made their first contribution in https://github.com/jqnatividad/qsv/pull/1626

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.122.0...0.123.0

- Rust
Published by jqnatividad almost 2 years ago

https://github.com/dathere/qsv - 0.122.0

👉 REQUEST FOR USE CASES: 👈

Please help define the future of qsv. Add what you're currently using qsv for here - https://github.com/jqnatividad/qsv/discussions/1529

Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.

Highlights:

  • qsvpy is now available in the prebuilt binaries for select platforms! It's a new qsv binary variant with the python feature, enabling the py command. Three subvariants are available - qsvpy310, qsvpy311 and qsvpy312, corresponding to Python 3.10, 3.11 and 3.12 respectively.
  • Removed generate command as generate's main dependency is unmaintained and has old dependencies. generate was also not used much, as the test data it generated was not well suited for training models and it was too slow so we decided to remove it even before the synthesize (#235) command is ready.
  • reverse now has index support and can work in "streaming" mode and handle larger than memory CSV files.
  • sort and sample: users can now choose from three Random Number Generator (RNG) algorithms with the --rng option - standard, faster & cryptosecure.
  • pseudo now has --start, --increment & --formatstr options.
  • fmt now has a --no-final-newline option to suppress the final newline for better interoperability with other tools, specifically Excel. It also treats "T" as special value for tab character for the --out-delimiter option.

Added

  • reverse: now has index support and can work in "streaming" mode https://github.com/jqnatividad/qsv/pull/1531
  • sort: added --rng <kind> for different kinds of RNGs - standard, faster & cryptosecure https://github.com/jqnatividad/qsv/pull/1535
  • sample: added --rng <kind> option (standard, faster & cryptosecure) https://github.com/jqnatividad/qsv/pull/1532
  • pseudo: major refactor. Added --start, --increment & --formatstr options https://github.com/jqnatividad/qsv/pull/1541
  • fmt: add --no-final-newline option https://github.com/jqnatividad/qsv/pull/1545
  • added additional benchmarks
  • added additional test for new options. We now have ~1,300 tests!

Changed

  • fmt: --out-delimiter now treats "T" as special value for tab character https://github.com/jqnatividad/qsv/pull/1546
  • build(deps): bump whatlang from 0.16.3 to 0.16.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1525
  • build(deps): bump serde_json from 1.0.110 to 1.0.111 by @dependabot in https://github.com/jqnatividad/qsv/pull/1524
  • build(deps): bump pyo3 from 0.20.1 to 0.20.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1526
  • build(deps): bump sysinfo from 0.30.3 to 0.30.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1523
  • build(deps): bump sysinfo from 0.30.4 to 0.30.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1530
  • build(deps): bump serial_test from 2.0.0 to 3.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1534
  • build(deps): bump mlua from 0.9.2 to 0.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1540
  • build(deps): bump mlua from 0.9.3 to 0.9.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1542
  • build(deps): bump simple-home-dir from 0.2.1 to 0.2.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1544
  • apply select clippy suggestions
  • update several indirect dependencies

Removed

  • removed generate command https://github.com/jqnatividad/qsv/pull/1527
  • removed generate feature from GitHub Action workflows https://github.com/jqnatividad/qsv/pull/1528
  • sample: removed --faster RNG sampling option, replacing it with --rng https://github.com/jqnatividad/qsv/pull/1532

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.121.0...0.122.0

- Rust
Published by jqnatividad about 2 years ago

https://github.com/dathere/qsv - 0.121.0

Two days ago, qsv 0.120.0 was released. Hours later, significant updates occurred in our ecosystem: Polars upgraded to version 0.36, Homebrew rolled out support for Rust 1.75.0, and our pull request for 'cached' was merged.

In light of these developments, we're releasing 0.121.0 out of cycle to leverage the new features, fixes and performance enhancements in these key components integral to qsv.


👉 REQUEST FOR USE CASES: 👈 Please help define the future of qsv. Add what you're currently using qsv for here - https://github.com/jqnatividad/qsv/discussions/1529

Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.


Added

  • sqlp: with Polars 0.36, it now supports:
  • sqlp: now supports writing to Apache Avro format https://github.com/jqnatividad/qsv/commit/32f2fbb1b06dfbee4e7823521e9991a306e7eb44
  • sqlp: when writing to CSV --format, if the --output file has a TSV or TAB extension, it will automatically use the tab delimiter https://github.com/jqnatividad/qsv/commit/c97048cfc8c0fed01d7b32d3173be508135b9769

Changed

  • Bump polars from 0.35 to 0.36 https://github.com/jqnatividad/qsv/pull/1521
  • build(deps): bump serde from 1.0.193 to 1.0.194 by @dependabot in https://github.com/jqnatividad/qsv/pull/1520
  • build(deps): bump serde_json from 1.0.109 to 1.0.110 by @dependabot in https://github.com/jqnatividad/qsv/pull/1519
  • build(deps): bump semver from 1.0.20 to 1.0.21 by @dependabot in https://github.com/jqnatividad/qsv/pull/1518
  • build(deps): bump serde_stacker from 0.1.10 to 0.1.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1517
  • build(deps): bump cached from 0.46.1 to 0.47.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1522
  • bumped MSRV to 1.75.0

Fixed

  • cat: fixed performance regression in rowskey by moving unchanging variables out of hot loop - https://github.com/jqnatividad/qsv/commit/96a40e93b5ab09655aa90f8653014c96d3da652b
  • sqlp: Polars 0.36 fixed the SQL SUBSTR() function

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.120.0...0.121.0

- Rust
Published by jqnatividad about 2 years ago

https://github.com/dathere/qsv - 0.120.0

Happy New Year! 🎉🎉🎉 Here's the first release of 2024, the biggest ever with 280+ commits! qsv 0.120.0 continues to focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.

Apart from wrapping qsv with a User Interface, qsv pro also comes with a retinue of related cloud-based data cleaning, enrichment and enhancement services along with expanded metadata inferencing to make your Data Useful, Usable and Used!

qsv pro draws inspiration from OpenRefine, but reimagined without its file size and speed limitations, with qsv pro having the ability to process multi-gigabyte files in seconds.

It incorporates hard lessons we learned in the past 12 years deploying Data Portals and Data Pipelines to create a new Data/Metadata Wrangling and AI-assisted Data Publishing service that's easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.

But it's not quite ready for release yet, so stay tuned!

We're now taking signups for a preview release however, so if you're interested, please sign up here!

Excitingly, qsv was also mentioned on Hacker News in this thread Dec 23, 2023! As a result, we're now almost at 2,000+ stars on GitHub from 900 stars on Dec 22! 🎉🎉🎉

Stay tuned for more advancements in 2024 – it's set to be a landmark year for qsv! 🦄🦄🦄


Added

  • cat: add rowskey --group options; increased perf of rowskey https://github.com/jqnatividad/qsv/pull/1508
  • validate: add --trim and --quiet options https://github.com/jqnatividad/qsv/pull/1452
  • apply & applydp: operations regex_replace now supports empty --replacement with the "" special value https://github.com/jqnatividad/qsv/pull/1470 and https://github.com/jqnatividad/qsv/pull/1471
  • exclude: also consider rows with empty fields https://github.com/jqnatividad/qsv/pull/1498
  • extsort: add --tmp-dir option https://github.com/jqnatividad/qsv/commit/ca1f46145cf6a06ad4401e2bf30f82415cc2ef82

Changed

  • validate: Faster RFC4180 validation with byterecords and SIMD-accelerated utf8 validation https://github.com/jqnatividad/qsv/pull/1440
  • excel: minor performance tweaks https://github.com/jqnatividad/qsv/pull/1446
  • apply, applydp, explode, geocode, pseudo: consolidate redundant code and use one replace_column_value helper fn in util.rs https://github.com/jqnatividad/qsv/pull/1456
  • excel: bump calamine from 0.22 to 0.23 https://github.com/jqnatividad/qsv/pull/1473
  • excel & joinp: use atoi_simd for faster &[u8] to int conversion https://github.com/jqnatividad/qsv/commit/9521f3e3fb73f600e6691188a65e19eda6cd317e
  • cat, describegpt, headers, sqlp, to, tojsonl: refactor commands that accept multiple input files to use improved process_input helper https://github.com/jqnatividad/qsv/pull/1496
  • fetch & fetchpost: get_response refactor for maintainability and performance https://github.com/jqnatividad/qsv/pull/1507
  • luau: replaced --no-colindex option with --colindex option. --col-index slows down processing and is not often used, so make it an option, not the default. https://github.com/jqnatividad/qsv/commit/a0c856807c47f00f531837ae124d412fca834cd2
  • make thousands crate optional with apply feature in https://github.com/jqnatividad/qsv/pull/1453
  • build(deps): bump uuid from 1.6.0 to 1.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1430
  • build(deps): bump serde from 1.0.192 to 1.0.193 by @dependabot in https://github.com/jqnatividad/qsv/pull/1432
  • build(deps): bump data-encoding from 2.4.0 to 2.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1435
  • build(deps): bump mlua from 0.9.1 to 0.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1436
  • build(deps): bump url from 2.4.1 to 2.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1437
  • build(deps): bump jql-runner from 7.0.6 to 7.0.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1439
  • build(deps): bump jql-runner from 7.0.7 to 7.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1447
  • build(deps): bump jql-runner from 7.1.0 to 7.1.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1457
  • build(deps): bump jql-runner from 7.1.1 to 7.1.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1486
  • build(deps): bump hashbrown from 0.14.2 to 0.14.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1441
  • build(deps): bump redis from 0.23.3 to 0.23.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1442
  • build(deps): bump redis from 0.23.3 to 0.24.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1455
  • build(deps): bump atoi_simd from 0.15.3 to 0.15.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1444
  • build(deps): bump atoi_simd from 0.15.4 to 0.15.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1445
  • build(deps): bump atoi_simd from 0.15.5 to 0.15.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1512
  • build(deps): bump actions/setup-python from 4.7.1 to 4.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1454
  • build(deps): bump actions/setup-python from 4.8.0 to 5.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1459
  • build(deps): bump actions/stale from 8 to 9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1463
  • build(deps): bump itoa from 1.0.9 to 1.0.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1464
  • build(deps): bump tokio from 1.34.0 to 1.35.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1465
  • build(deps): bump tokio from 1.35.0 to 1.35.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1483
  • build(deps): bump ryu from 1.0.15 to 1.0.16 by @dependabot in https://github.com/jqnatividad/qsv/pull/1466
  • build(deps): bump file-format from 0.22.0 to 0.23.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1468
  • build(deps): bump github/codeql-action from 2 to 3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1476
  • build(deps): bump geosuggest-utils from 0.5.1 to 0.5.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1479
  • build(deps): bump geosuggest-core from 0.5.1 to 0.5.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1478
  • build(deps): bump reqwest from 0.11.22 to 0.11.23 by @dependabot in https://github.com/jqnatividad/qsv/pull/1480
  • build(deps): bump calamine from 0.23.0 to 0.23.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1481
  • build(deps): bump qsv-sniffer from 0.10.0 to 0.10.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1484
  • build(deps): bump anyhow from 1.0.75 to 1.0.76 by @dependabot in https://github.com/jqnatividad/qsv/pull/1485
  • build(deps): bump futures from 0.3.29 to 0.3.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1492
  • build(deps): bump futures-util from 0.3.29 to 0.3.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1491
  • build(deps): bump crossbeam-channel from 0.5.9 to 0.5.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1490
  • build(deps): bump sysinfo from 0.29.10 to 0.29.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1443
  • Bump sysinfo from 0.29.11 to 0.30 https://github.com/jqnatividad/qsv/pull/1489
  • build(deps): bump sysinfo from 0.30.0 to 0.30.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1495
  • build(deps): bump sysinfo from 0.30.1 to 0.30.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1504
  • build(deps): bump sysinfo from 0.30.2 to 0.30.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1509
  • build(deps): bump tabwriter from 1.3.0 to 1.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1500
  • build(deps): bump tempfile from 3.8.1 to 3.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1502
  • build(deps): bump qsv_docopt from 1.4.0 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1503
  • build(deps): bump ahash from 0.8.6 to 0.8.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1510
  • build(deps): bump serde_json from 1.0.108 to 1.0.109 by @dependabot in https://github.com/jqnatividad/qsv/pull/1511
  • apply select clippy suggestions
  • update several indirect dependencies
  • pin Rust nightly to 2023-12-23

Fixed

  • apply: Fix for dynfmt and calcconv subcommands not working in release mode https://github.com/jqnatividad/qsv/pull/1467
  • luau: fix check for excess mapped columns earlier. Otherwise, we'll get a CSV different field count error https://github.com/jqnatividad/qsv/commit/db1581159590205af9befaade5c047d316c9c8b3

Removed

  • luau: remove unneeded --jit option as we precompile luau scripts to bytecode https://github.com/jqnatividad/qsv/pull/1438

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.119.0...0.120.0

- Rust
Published by jqnatividad about 2 years ago

https://github.com/dathere/qsv - 0.119.0

Highlights:

As we prepare for version 1.0, we're focusing on performance, stability and reliability as we set the stage for qsv pro - a cloud-backed UI version of qsv powered by Tauri, set to be released in 2024. Stay tuned!

  • diff is now out of beta and blazingly fast! Give "the fastest CSV-diff in the world" a try :wink:!
  • joinp now supports snappy automatic compression/decompression!
  • sqlp & joinp now recognize the QSV_COMMENT_CHAR environment variable, allowing you to skip comment lines in your input CSV files. They're also faster with the upgrade to Polars 0.35.4.
  • sqlp now supports subqueries, table aliases, and more!
  • luau: upgraded embedded Luau from 0.599 to 0.604; refactored code to reduce unneeded allocations and increase performance (more than doubling it!) as we prepare for extended recipe support.
  • cat is now even faster with the --flexible option. If you know your CSV files are valid, you can use this option to skip CSV validation and make cat run twice as fast!
  • qsv can now add a Byte Order Mark (BOM) header sequence to produce Excel-friendly CSVs on Windows with the QSV_OUTPUT_BOM environment variable.
  • stats, sort, schema & validate are now faster with the use of atoi_simd to directly convert &[u8] to integer, skipping unnecessary utf8 validation, while also using SIMD CPU instructions for noticeably faster performance.

Added

  • diff: added option/flag for headers in output by @janriemer in https://github.com/jqnatividad/qsv/pull/1395
  • diff: added option/flag --delimiter-output by @janriemer in https://github.com/jqnatividad/qsv/pull/1402
  • cat: added --flexible option to make cat rows faster still https://github.com/jqnatividad/qsv/pull/1408
  • sqlp & joinp: both commands now recognize QSVCOMMENTCHAR env var https://github.com/jqnatividad/qsv/pull/1412
  • joinp: added snappy compression/decompression support https://github.com/jqnatividad/qsv/pull/1413
  • geocode: now automatically decompresses snappy-compressed index files https://github.com/jqnatividad/qsv/pull/1429
  • Add Byte Order Mark (BOM) output support https://github.com/jqnatividad/qsv/pull/1424
  • Added Codacy code quality badge https://github.com/jqnatividad/qsv/commit/99591297d59b3c45363592516d5ecb7d4d98d5c8

Changed

  • stats, sort, schema & validate: use atoi_simd to directly convert &[u8] to integer skipping unnecessary utf8 validation, while also using SIMD instructions for noticeably faster performance
  • cat: faster cat rows https://github.com/jqnatividad/qsv/pull/1407
  • count: optimize --width option https://github.com/jqnatividad/qsv/pull/1411
  • luau: upgrade embedded Luau from 0.603 to 0.604 https://github.com/jqnatividad/qsv/pull/1426
  • use ato_simd for fast &[u8] to int conversion https://github.com/jqnatividad/qsv/pull/1423
  • luau: performance refactor https://github.com/jqnatividad/qsv/commit/4cebd7c9a4b3f9f754fd2746484c24fa61ee1286
  • build(deps): bump csv-diff from 0.1.0-beta.4 to 0.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1394
  • build(deps): bump serde_json from 1.0.107 to 1.0.108 by @dependabot in https://github.com/jqnatividad/qsv/pull/1393
  • build(deps): bump indexmap from 2.0.2 to 2.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1397
  • build(deps): bump jql-runner from 7.0.4 to 7.0.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1399
  • build(deps): bump jql-runner from 7.0.5 to 7.0.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1400
  • build(deps): bump file-format from 0.21.0 to 0.22.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1401
  • build(deps): bump cached from 0.46.0 to 0.46.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1403
  • build(deps): bump serde from 1.0.190 to 1.0.192 by @dependabot in https://github.com/jqnatividad/qsv/pull/1404
  • build(deps): bump tokio from 1.33.0 to 1.34.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1409
  • build(deps): bump flexi_logger from 0.27.2 to 0.27.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1410
  • build(deps): bump qsv-stats from 0.11.0 to 0.12.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1415
  • build(deps): bump itertools from 0.11.0 to 0.12.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1418
  • build(deps): bump rust_decimal from 1.33.0 to 1.33.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1420
  • build(deps): bump polars from 0.35.2 to 0.35.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1425
  • build(deps): bump uuid from 1.5.0 to 1.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1428
  • bump MSRV to 1.74.0
  • apply select clippy suggestions
  • update several indirect dependencies
  • pin Rust nightly to 2023-11-18

Fixed

  • pseudo: detect when more than one column is selected for pseudonymization https://github.com/jqnatividad/qsv/commit/0b093726bb964c2a4a6eec15c0e30ed3660fdf97
  • dotenv (.env) tweaks/fixes https://github.com/jqnatividad/qsv/pull/1427
  • fix several typos https://github.com/jqnatividad/qsv/commit/723443eed4ac0f692cdd6ac4a1af4d82e22fda8b
  • fix several markdown lints

Removed

  • remove fast-float as std float parse is now also using Eisel-Lemire algorithm https://github.com/jqnatividad/qsv/pull/1414

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.118.0...0.119.0


NOTE:

To verify prebuilt binary zip archives - click here.

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.118.0

Highlights:

  • With the Polars upgrade to 0.34.2, the sqlp and joinp enjoy expanded capabilities and a noticeable performance boost. 🦄🏇
  • We now publish the 500, 1000, 5000 and 15000 Geonames cities indices for the geocode command, with users able to easily switch indices with the index-load subcommand. As the name implies, the 500 index contains cities with populations of 500 or more, the 1000 index contains cities with populations of 1000 or more, and so on.
    The 15000 index (default) is the smallest (13mb) and fastest with ~26k cities. The 500 index is the largest(56mb) and slowest, with ~200k cities. The 5000 index is 21mb with ~53k cities. The 1000 index is 44mb with ~140k cities. 🎠
  • The geocode command now returns US Census FIPS codes for US places with the %json and %pretty-json formats, returning both US State and US County FIPS codes, with upcoming support for Cities and other US Census geographies (School Districts, Voting Districts, Congressional Districts, etc.) 🎠
  • Improved performance for stats, schema and tojsonl commands with the stats cache bincode refactor. This is especially noticeable for large CSV files as stats previously created large bincode cache files by default.
    The bincode cache allows other commands (currently, only schema and tojsonl) to skip recomputing statistics and deserialize the saved stats data structures directly into memory. Now, it will only create a bincode file if the --stats-binout option is specified (typically, before using the schema an tojsonl commands). stats will still continue to create a stats CSV cache file by default, but it will be much smaller than the bincode file, and is universally applicable, unlike the bincode cache. 🏇
  • self-update will now verify updates. This is done by verifying the zipsign signature of the release zip archive before applying it. This should make it harder for malicious actors to compromise the self-update process. Version 0.118.0 has the verification code, and future releases will use this new verification process. Regardless, we will zipsign all zip archives starting with this release. Users can manually verify the signatures by downloading the zipsign public key and running the zipsign command line tool. See Verifying the Integrity of the Prebuilt Binaries Zip Archive for more info. 🦄
  • The frequency command now supports the --ignore-case option for case-insensitive frequency counts. 🦄🎠
  • The schema command can now compile case-insensitive enum constraints. 🦄
  • Improved performance for apply and applydp commands with faster compile-time perfect hash functions for operations lookups. 🏇
  • Several minor performance improvements and bug fixes with snappy, sniff & cat commands. 🏇

Added

  • frequency: added --ignore-case option https://github.com/jqnatividad/qsv/pull/1386
  • geocode: added 500, 1000, 5000, 15000 Geonames cities convenience shortcuts to index subcommands https://github.com/jqnatividad/qsv/commit/bd9f4c34b0a88cc6a446872ed4cda41e8a1ca102
  • schema: added --ignore-case option when compiling enum constraints; replaced Hashset with faster AHashset https://github.com/jqnatividad/qsv/commit/a16a1ca25f93699a5ee27327f4257e8e559bc5e8
  • snappy: added buf_size parm to compress helper fn https://github.com/jqnatividad/qsv/commit/e0c0d1f7eb22917d43f638121babe23e366c9dd8
  • sniff added --just-mime option https://github.com/jqnatividad/qsv/pull/1372
  • added zipsign signature verification to self-update https://github.com/jqnatividad/qsv/pull/1389

Changed

  • apply & applydp: replaced binary_search with faster compile-time perfect hash functions for operations lookups https://github.com/jqnatividad/qsv/pull/1371
  • stats, schema and tojsonl: stats cache bincode refactor https://github.com/jqnatividad/qsv/pull/1377
  • luau: replaced sanitise-file-name with more popular sanitize-filename crate https://github.com/jqnatividad/qsv/commit/8927cb70bc92e9e1360547e96d1ac10e6037e9e3
  • cat: minor optimization by preallocating with capacity https://github.com/jqnatividad/qsv/commit/c13c34120c47bb7ab603a97a0a7cae7f0de7b146
  • sqlp & joinp: expanded speed/functionality with upgrade to Polars 0.34.2 https://github.com/jqnatividad/qsv/pull/1385
  • tojsonl: improved boolean inferencing. Now correctly infers booleans, even if the enum domain range is more than 2, but has cardinality 2 case-insensitive https://github.com/jqnatividad/qsv/commit/6345f2dc01f6451075ba7f23c35d8ba8cced9293
  • build(deps): bump strum_macros from 0.25.2 to 0.25.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1368
  • build(deps): bump regex from 1.10.1 to 1.10.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1369
  • build(deps): bump uuid from 1.4.1 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1373
  • build(deps): bump hashbrown from 0.14.1 to 0.14.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1376
  • build(deps): bump self_update from 0.38.0 to 0.39.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1378
  • build(deps): bump ahash from 0.8.5 to 0.8.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1383
  • build(deps): bump serde from 1.0.189 to 1.0.190 by @dependabot in https://github.com/jqnatividad/qsv/pull/1388
  • build(deps): bump futures from 0.3.28 to 0.3.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1390
  • build(deps): bump futures-util from 0.3.28 to 0.3.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1391
  • build(deps): bump tempfile from 3.8.0 to 3.8.1 by @dependabot in https://github.com/jqnatividad/qsv/commit/4f6200cb57fdeb612aeb74d796b4b0c1fde7c243
  • apply select clippy suggestions
  • update several indirect dependencies
  • pin Rust nightly to 2023-10-26

Fixed

  • dedup: fixed --ignore-case not being honored during internal sort option https://github.com/jqnatividad/qsv/pull/1387
  • applydp: fixed wrong usage text using apply and not applydp https://github.com/jqnatividad/qsv/commit/c47ba86f305508a41e19ce39f2bd6323a0a60e1e
  • geocode: fixed index-update not honoring --timeout parameter https://github.com/jqnatividad/qsv/commit/3272a9e3ac75e8b8f2d9f13b0cec81a0c41c7ed4
  • geocode : fixed index-load to work properly with convenience shortcuts https://github.com/jqnatividad/qsv/commit/5097326ee41d39787b472b4eea95ddec76bb06b5

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.117.0...0.118.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.117.0

Highlights:

  • geocode: added Federal Information Processing Standards (FIPS) codes to results for US places, so we can derive GEOIDs. This paves the way to doing data enrichment lookups (starting with the US Census) in an upcoming release.
  • Added Goal/Non-goals, explicitly codifying what qsv is and isn't, and what we're trying to achieve with the toolkit.
  • excel: CSV output processing is now multi-threaded, making it a bit faster. The bottleneck is still the Excel/ODS library we're using (calamine), which is single-threaded. But there are active discussions underway to make it much faster in the future.
  • Upgrading the MSRV to 1.73.0 has allowed us to use LLVM 17, which has resulted in an overall performance boost.

Added:

  • geocode: added Federal Information Processing Standards (FIPS) codes to results for US places.
  • Added Goals/Non-goals to README.md

Changed

  • cat : minor optimization https://github.com/jqnatividad/qsv/commit/343bb668ae84fcf862883245382e7d8015da88c2
  • excel: CSV output processing is now multi-threaded https://github.com/jqnatividad/qsv/pull/1360
  • geocode: more efficient dynfmt ptocessing https://github.com/jqnatividad/qsv/pull/1367
  • frequency: optimize allocations before hot loop https://github.com/jqnatividad/qsv/commit/655bebcdec6d89f0ffa33d794069ee5eee0df3e5
  • luau: upgraded embedded Luau from 0.596 to 0.599
  • deps: bump calamine from 0.22.0 to 0.22.1 https://github.com/jqnatividad/qsv/commit/4c4ed7e25614bbfe4d7b16fe7619a5a874ef7591
  • docs: reorganized README, moving FEATURES and INTERPRETERS to their own markdown files.
  • build(deps): bump byteorder from 1.4.3 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1347
  • build(deps): bump tokio from 1.32.0 to 1.33.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1354
  • build(deps): bump regex from 1.9.6 to 1.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1356
  • build(deps): bump semver from 1.0.19 to 1.0.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/1358
  • build(deps): bump pyo3 from 0.19.2 to 0.20.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1359
  • build(deps): bump serde from 1.0.188 to 1.0.189 by @dependabot in https://github.com/jqnatividad/qsv/pull/1361
  • build(deps): bump flate2 from 1.0.27 to 1.0.28 by @dependabot in https://github.com/jqnatividad/qsv/pull/1363
  • build(deps): bump regex from 1.10.0 to 1.10.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1366
  • deps: update several indirect dependencies
  • pin Rust nightly to 2023-10-14
  • bump MSRV to 1.73.0

Removed

  • excel: removed --progressbar option as Excel/ODS maximum sheet size is just too small (1,048,576 rows) to make it useful.

Fixed

  • Fixed Jupyter Notebook Viewer Link by @a5dur in https://github.com/jqnatividad/qsv/pull/1349

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.116.0...0.117.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.116.0

Highlights: :tada: :rocket:

  • Benchmarks refinements galore with more benchmarks and more comprehensive benchmarking instructions. 🎠
  • geocode: The Geonames index's configuration metadata is now available with the geocode index-check subcommand. No need to maintain a separate metadata JSON file. This should make it even easier to maintain multiple Geonames index files with different configurations without having to worry if you're looking at the right metadata JSON file. 🎠
  • cat: rowskey subcommand is now 27% faster 🏇🏽
  • tojsonl: parallelized with rayon, making it 33% faster! 🏇🏽
  • smaller qsv binary size and faster compile times if the to_parquet feature is disabled. If you're good enough with sqlp's ability to create a parquet file from a SQL query, qsv's binary size and compile time will be markedly smaller/faster. 🏇🏽
  • minor perf tweaks & optimizations - count and luau commands 🏇🏽

Added

  • geocode: added Geonames index file metadata to index-check subcommand
  • tojsonl: parallelized with rayon https://github.com/jqnatividad/qsv/pull/1338
  • to: added to_parquet feature. https://github.com/jqnatividad/qsv/pull/1341
  • benchmarks: upgraded from 3.0.0 to 3.3.1
    • you can now specify a separate benchmarking binary as we dogfood qsv for the benchmarks and some features are required that may not be in the qsv binary variant being benchmarked
    • added additional count benchmarks with --width option
    • added additional luau benchmarks with single/multi filter options
    • added additional search benchmark with --unicode option
    • show absolute path of qsv binaries used (both the one we're dogfooding and the one being benchmarked) and their version info before running the benchmarks proper
    • ensured schema benchmark was not using the stats cache with the --force option

Changed

  • cat: use an empty byte_record var instead of repeatedly allocating a new one in a hot loop https://github.com/jqnatividad/qsv/commit/eddafd11acb8e8d9d8587f952ba8cd02d450b08e
  • count: minor optimization https://github.com/jqnatividad/qsv/commit/bb113c0f348d4903ebfdc893c09517e5a4b145ad
  • luau: minor perf tweaks https://github.com/jqnatividad/qsv/commit/c71cd16a22f729a074a2a8d59020eba4cc8d7281 and https://github.com/jqnatividad/qsv/commit/f9c1e3c755fdb847be8f7f54d21622fb0c8c747f
  • (deps): bump Geosuggest from 0.4.5 to 5.1 https://github.com/jqnatividad/qsv/pull/1333
  • (deps): use patched version of calamine which has unreleased fixes since 0.22.0
  • build(deps): bump flexi_logger from 0.27.0 to 0.27.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1328
  • build(deps): bump indexmap from 2.0.0 to 2.0.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1329
  • build(deps): bump hashbrown from 0.14.0 to 0.14.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1334
  • build(deps): bump file-format from 0.20.0 to 0.21.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1335
  • build(deps): bump indexmap from 2.0.1 to 2.0.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1336
  • build(deps): bump regex from 1.9.5 to 1.9.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1337
  • build(deps): bump jql-runner from 7.0.3 to 7.0.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1340
  • build(deps): bump csvs_convert from 0.8.7 to 0.8.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1339
  • build(deps): bump actions/setup-python from 4.7.0 to 4.7.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1342
  • build(deps): bump reqwest from 0.11.21 to 0.11.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1343
  • build(deps): bump csv from 1.2.2 to 1.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1344
  • build(deps): bump actix-governor from 0.4.1 to 0.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1346
  • applied select clippy suggestions
  • update several indirect dependencies
  • pin Rust nightly to 2023-10-04

Removed

  • geocode: removed separate metadata JSON file for Geonames index files. The metadata is now embedded in the index file itself and can be viewed with the index-check command.
  • removed redundant setting from profile.release-samply in Cargo.toml https://github.com/jqnatividad/qsv/commit/2a35be5bbae2fc6994c103acac37ea3559854a0a

Fixed

  • geocode: when producing JSON output with the now subcommands (suggestnow, reversenow, countryinfonow), we now produce valid JSON. We previously generated JSON with escaped/extra quotes as it was formatted to be included in CSV files, which is required for the suggest, reverse and countryinfo subcommands as they are designed to process CSVs with multiple rows, thus requiring escaped JSON. The now commands are only meant for one result so there's no need to escape quote the JSON. https://github.com/jqnatividad/qsv/pull/1345
  • schema: fixed --force flag not being honored

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.115.0...0.116.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.115.0

We continue to refine the benchmark suite, and have added a new setup argument to setup and install the required tools for the benchmark suite. We've also added more comprehensive checks to ensure that the required tools are installed before running the benchmarks. 🎠

For geocode, we've added a JSON file describing the Geonames index file configuration. This should help users maintain several Geonames index files with different configurations. 🎠

geocode should also be a tad faster now, thanks to cached crate making ahash its default hashing algorithm and upgrading hashbrown - microbenchmarks show a 33% performance improvement. 🏇🏽

We also added a release-samply profile so we can make it easier to squeeze more performance out of the toolkit with samply. 🏇🏽


Added

  • geocode: added a JSON file describing the Geonames index file configuration in https://github.com/jqnatividad/qsv/pull/1324
  • benchmarks: v3.0.0 release
    • added setup argument to setup and install required tools for the benchmark suite
    • added more comprehensive required tools check
    • added more realistic luau benchmarks, using helper luau scripts (dtformat.luau and turnaroundtime.luau)
    • added stats withcache and createcache benchmarks
    • added benchmark_aggregations.luau script for benchmark analysis
    • added binary, total_mean and qsv_env columns to benchmark results binary is the qsv binary variant used total_mean is the sum of all the mean run times of the benchmarks qsv_env are the qsv-relevant environment variables active while running the benchmarks
    • expanded README.md and benchmark suite usage instructions
  • added release-samply profile to Cargo.toml to facilitate continued performance optimization with samply

Changed

  • readme: move tab completion instructions/script to scripts/misc
  • geocode: updated bundled Geonames index to 2021-09-25
  • bump embedded luau from 0.594 to 0.596
  • build(deps): bump flexi_logger from 0.26.1 to 0.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1317
  • build(deps): bump indicatif from 0.17.6 to 0.17.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1318
  • build(deps): bump semver from 1.0.18 to 1.0.19 by @dependabot in https://github.com/jqnatividad/qsv/pull/1320
  • build(deps): bump cached from 0.45.1 to 0.46.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1322
  • build(deps): bump geosuggest-core from 0.4.3 to 0.4.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1323
  • build(deps): bump geosuggest-utils from 0.4.3 to 0.4.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1321
  • build(deps): bump fastrand from 2.0.0 to 2.0.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1325
  • bump MSRV from Rust 1.72.0 to 1.72.1
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-09-25

Fixed

  • benchmarks: fixed invalid luau benchmark that had invalid luau command

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.114.0...0.115.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.114.0

The long-overdue Benchmarks revamp is finally here! 🎉- https://qsv.dathere.com/benchmarks

The benchmarks have been completely rewritten to be more reproducible, and now use hyperfine instead of time. The new benchmarks are now run as part of the release process, and the results are compiled into a single page that is published on the new Quicksilver website.

The new benchmarks are also more comprehensive, and designed to be run on a variety of hardware and operating systems. This allows users to adapt the benchmarks to their own workloads and environments.

Other release highlights include: * geocode is now fully-featured and ready for production use! 🎉 Though it only currently features Geonames city-level lookup support, it provides a solid foundation on top of which we'll add more geocoding providers in the future (next up - OpenCage support with street-level geocoding). * Polars has been bumped from 0.32.1 to 0.33.2, which includes a number of performance improvements for the sqlp and joinp commands. * major performance increase on several regex/aho-corasick powered commands on Apple Silicon thanks to various under-the-hood improvements in the aho-corasick crate.

Big thanks to @rzmk , @a5dur, @minhajuddin2510 and @samibaig and helping me finally push out the revamped Benchmarks!


Added

  • Added autoindex size threshold, replacing QSV_AUTOINDEX env var with QSV_AUTOINDEX_SIZE. Resolves #1300. in https://github.com/jqnatividad/qsv/pull/1301 https://github.com/jqnatividad/qsv/commit/69e25aceb25d3bb20d8fdeeadf5504d8fe75fe37
  • diff: Added test for different delimiters by @janriemer in https://github.com/jqnatividad/qsv/pull/1297
  • benchmarks: Added qsv benchmark notebook. by @a5dur in https://github.com/jqnatividad/qsv/pull/1309
  • geocode: Added countryinfo/now subcommand made available in geosuggest 0.4.3 https://github.com/jqnatividad/qsv/pull/1311
  • geocode: Added --language option so users can specify the language of the geocoding results. This requires running the index-update subcommand with the --languages option to rebuild the index with the desired languages.
  • sqlp: add example of using columns with embedded spaces in SQL queries https://github.com/jqnatividad/qsv/commit/f7bf4f65edc2068f42712808aec7096ef7122dfe

Changed

  • benchmarks: Benchmarks revamped https://github.com/jqnatividad/qsv/pull/1298, https://github.com/jqnatividad/qsv/pull/1310 https://github.com/jqnatividad/qsv/commit/d8eeb949b8c846793941eb9c343e8598784b6207
  • build(deps): bump serde_json from 1.0.106 to 1.0.107 by @dependabot in https://github.com/jqnatividad/qsv/pull/1302
  • build(deps): bump mimalloc from 0.1.38 to 0.1.39 by @dependabot in https://github.com/jqnatividad/qsv/pull/1303
  • build(deps): bump simple-home-dir from 0.1.4 to 0.2.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1304
  • build(deps): bump chrono from 0.4.30 to 0.4.31 by @dependabot in https://github.com/jqnatividad/qsv/pull/1305
  • (deps): bump Polars from 0.32.1 to Polars 0.33.2 https://github.com/jqnatividad/qsv/pull/1308
  • build(deps): bump cpc from 1.9.2 to 1.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1313
  • build(deps): bump rayon from 1.7.0 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1315
  • (deps): update several indirect dependencies
  • pin Rust nightly to 2023-09-21

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.113.0...0.114.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.113.0

This is the first "Unicorn" 🦄 release, adding MAJOR new features to the toolkit!

  • geocode: adds high-speed, cache-backed, multi-threaded geocoding using a local, updateable copy of the GeoNames database. This is a major improvement over the previous geocode subcommand in the apply command thanks to the wonderful geosuggest crate.
  • guaranteed non-UTF8 input detection with the validate and input commands. Quicksilver REQUIRES UTF-8 encoded input. You can now use these commands to ensure you have valid UTF-8 input before using the rest of the toolkit.
  • New/expanded whirlwind tour & quick-start notebooks by @a5dur and @rzmk 🎠
  • Various performance improvements all-around: 🏇🏽
    • overall increase of ~5% now that mimalloc - the default allocator for qsv, is built without secure mode unnecessarily enabled.
    • flatten command is now ~10% faster
    • faster regex performance thanks to various under-the-hood improvements in the regex crate
    • and the benchmark scripts have been updated by @minhajuddin2510 to use hyperfine instead of time, and to use the same input file for all benchmarks to make them more reproducible. In upcoming releases, we'll start compiling the benchmark results into a single page as part of the release process, so we can track our progress over time.

and last but not least - Quicksilver now has a website! - https://qsv.dathere.com/ :unicorn: :tada: :rocket:

And its not just a static site with a few links - its a full-blown web app that lets you try out qsv commands in your browser! It's not just a demo site - you can use it as a configurator and save your commands to a gist and share them with others!

It's the first Beta release of the Quicksilver website, so there's still a lot of work to do, but we're excited to share it with you and get your feedback!

We have more exciting features planned for Quicksilver and the website, but we require your help to make it happen! For qsv, use GitHub issues. For the website, use the feedback form. And if you want to help out, please check out the contributing guide.

Big thanks to @rzmk for all the work on the website! To @a5dur for all the QA work on this release! And to @minhajuddin2510 for revamping the benchmark script!


Added

  • geocode: new high-speed geocoding command https://github.com/jqnatividad/qsv/pull/1231
    • major improvements using geosuggest upstream https://github.com/jqnatividad/qsv/pull/1269
    • add suggest --country filter https://github.com/jqnatividad/qsv/pull/1275
    • add --admin1 filter https://github.com/jqnatividad/qsv/pull/1276
    • automatic --country inferencing from --admin1 code https://github.com/jqnatividad/qsv/pull/1277
    • add --suggestnow and --reversenow subcommands https://github.com/jqnatividad/qsv/pull/1280
    • add "%dyncols:" special formatter to dynamically add geocoded columns to the output CSV https://github.com/jqnatividad/qsv/pull/1286
  • excel: add SheetType (Worksheet, DialogSheet, MacroSheet, ChartSheet, VBA) in metadata mode; log.info! headers; wordsmith comments https://github.com/jqnatividad/qsv/pull/1225
  • excel: moar metadata! moar examples! https://github.com/jqnatividad/qsv/pull/1271
  • add support ALL_PROXY env var https://github.com/jqnatividad/qsv/pull/1233
  • input: add --encoding-errors handling option https://github.com/jqnatividad/qsv/pull/1235
  • fixlengths: add --insert option https://github.com/jqnatividad/qsv/pull/1247
  • joinp: add --sql-filter option https://github.com/jqnatividad/qsv/pull/1287
  • luau: we now embed Luau 0.594 from 0.592
  • notebooks: add qsv-colab-quickstart by @rzmk in https://github.com/jqnatividad/qsv/pull/1253
  • notebooks: Added Whirlwindtour.ipynb by @a5dur in https://github.com/jqnatividad/qsv/pull/1223

Changed

  • flatten: refactor for performance https://github.com/jqnatividad/qsv/pull/1227
  • validate: improved utf8 error mesages https://github.com/jqnatividad/qsv/pull/1256
  • apply & applydp: improve usage text in relation to multi-column capabilites https://github.com/jqnatividad/qsv/pull/1257
  • qsv-cache now set to ~/.qsv-cache by default https://github.com/jqnatividad/qsv/pull/1265
  • Download file helper refactor https://github.com/jqnatividad/qsv/pull/1267
  • Benchmark Update by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1237
  • Improved error handling https://github.com/jqnatividad/qsv/pull/1238
  • Improved error handling - incorrect usage errors are now differentiated from other errors as well https://github.com/jqnatividad/qsv/pull/1239
  • build(deps): bump whatlang from 0.16.2 to 0.16.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1221
  • build(deps): bump serde_json from 1.0.104 to 1.0.105 by @dependabot in https://github.com/jqnatividad/qsv/pull/1220
  • build(deps): bump tokio from 1.31.0 to 1.32.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1222
  • build(deps): bump mlua from 0.9.0-rc.3 to 0.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1224
  • build(deps): bump tempfile from 3.7.1 to 3.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1226
  • build(deps): bump postgres from 0.19.5 to 0.19.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1229
  • build(deps): bump file-format from 0.18.0 to 0.19.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1228
  • build(deps): bump reqwest from 0.11.18 to 0.11.19 by @dependabot in https://github.com/jqnatividad/qsv/pull/1232
  • build(deps): bump rustls-webpki from 0.101.3 to 0.101.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1236
  • build(deps): bump reqwest from 0.11.19 to 0.11.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/1241
  • build(deps): bump rust_decimal from 1.31.0 to 1.32.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1242
  • build(deps): bump serde from 1.0.185 to 1.0.186 by @dependabot in https://github.com/jqnatividad/qsv/pull/1243
  • build(deps): bump jql-runner from 7.0.2 to 7.0.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1246
  • build(deps): bump grex from 1.4.2 to 1.4.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1245
  • build(deps): bump mlua from 0.9.0 to 0.9.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1244
  • build(deps): bump mimalloc from 0.1.37 to 0.1.38 by @dependabot in https://github.com/jqnatividad/qsv/pull/1249
  • build(deps): bump postgres from 0.19.6 to 0.19.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1251
  • build(deps): bump serde from 1.0.186 to 1.0.187 by @dependabot in https://github.com/jqnatividad/qsv/pull/1250
  • build(deps): bump serde from 1.0.187 to 1.0.188 by @dependabot in https://github.com/jqnatividad/qsv/pull/1252
  • build(deps): bump regex from 1.9.3 to 1.9.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1254
  • build(deps): bump url from 2.4.0 to 2.4.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1261
  • build(deps): bump tabwriter from 1.2.1 to 1.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1259
  • build(deps): bump sysinfo from 0.29.8 to 0.29.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1260
  • build(deps): bump actix-web from 4.3.1 to 4.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1262
  • build(deps): bump chrono from 0.4.26 to 0.4.27 by @dependabot in https://github.com/jqnatividad/qsv/pull/1264
  • build(deps): bump chrono from 0.4.27 to 0.4.28 by @dependabot in https://github.com/jqnatividad/qsv/pull/1266
  • build(deps): bump redis from 0.23.2 to 0.23.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1268
  • build(deps): bump regex from 1.9.4 to 1.9.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1272
  • build(deps): bump flexi_logger from 0.25.6 to 0.26.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1273
  • build(deps): bump geosuggest-core from 0.4.0 to 0.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1279
  • build(deps): bump geosuggest-utils from 0.4.0 to 0.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1278
  • build(deps): bump cached from 0.44.0 to 0.45.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1282
  • build(deps): bump self_update from 0.37.0 to 0.38.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1281
  • build(deps): bump actions/checkout from 3 to 4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1283
  • build(deps): bump chrono from 0.4.28 to 0.4.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1284
  • build(deps): bump cached from 0.45.0 to 0.45.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1285
  • build(deps): bump sysinfo from 0.29.9 to 0.29.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1288
  • build(deps): bump chrono from 0.4.29 to 0.4.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1290
  • build(deps): bump bytes from 1.4.0 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1289
  • build(deps): bump file-format from 0.19.0 to 0.20.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1291
  • cargo update bump several indirect dependencies
  • apply select clippy suggestions
  • pin Rust nightly to 2023-09-06

Removed

  • apply: remove geocode subcmd now that we have a dedicated geocode command https://github.com/jqnatividad/qsv/pull/1263

Fixed

  • excel: we can now open workbooks with formulas set to an empty string value https://github.com/jqnatividad/qsv/pull/1274
  • notebooks: fix qsv colab quickstart link by @rzmk in https://github.com/jqnatividad/qsv/pull/1255

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.112.0...0.113.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.112.0

This is the second in a series of "Giddy-up" 🏇🏽 releases, improving the performance of the following commands:

  • stats: by refactoring the code to detect empty cells more efficiently, and by removing unnecessary bounds checks in the main compute loop. (~10% performance improvement)
  • sample: by refactoring the code to use an index more effectively when available - not only making it faster, but also eliminating the need to load the entire dataset into memory. Also added a --faster option to use a faster random number generator. (~15% performance improvement)
  • frequency, schema, search & validate by amortizing/reducing allocations in hot loops
  • excel: by refactoring the main hot loop to convert Excel cells more efficiently

The prebuilt binaries are also built with CPU optimizations enabled for x86_64 and Apple Silicon (arm64) architectures.

0.112.0 is also a "Carousel" (i.e. increased usability) 🎠 release featuring new Jupyter notebooks in the contrib/notebooks directory to help users get started with qsv.


Added

  • sqlp: added CASE expression support with Polars 0.32 https://github.com/jqnatividad/qsv/commit/9d508e69cc4165b7adbe4b44b15c4c07001cf76b
  • sample: added --faster option to use a faster random number generator https://github.com/jqnatividad/qsv/pull/1210
  • jsonl: added --delimiter option https://github.com/jqnatividad/qsv/pull/1205
  • excel: added --delimiter option https://github.com/jqnatividad/qsv/commit/ab73067da1f498c7c64de9b87586d6998d36d042
  • notebook/describegpt: added describegpt QA Jupyter notebook by @a5dur in https://github.com/jqnatividad/qsv/pull/1215
  • notebook/count: added intro-to-count.ipynb by @rzmk in https://github.com/jqnatividad/qsv/pull/1207

Changed

  • stats: refactor hot compute function - https://github.com/jqnatividad/qsv/commit/35999c5dad996edcafe6094ff4b717f96d657832
  • stats: faster detection of empty samples https://github.com/jqnatividad/qsv/commit/b0548159ca8c8a35bab1dd196c72414f739c2fd8 and https://github.com/jqnatividad/qsv/commit/a7f0836bcebf947efb3cc7e7f6a884cc649196b5
  • sample: major refactor making it faster, but also eliminating need to load the entire dataset into memory when an index is available. https://github.com/jqnatividad/qsv/pull/1210
  • frequency: refactor primary ftables function https://github.com/jqnatividad/qsv/commit/57d660d6cf48be4b8845b5c09a46b16582f612c0
  • excel: refactor main loop for more performance - https://github.com/jqnatividad/qsv/commit/61f227b0120c8d20bfb5906536a0a0de7d9f82ad
  • rustfmt: matchblocktrailing_comma https://github.com/jqnatividad/qsv/pull/1206
  • bump MSRV to 1.71.1 https://github.com/jqnatividad/qsv/commit/1c993644992d1cf4d0985d100045821cb027c17d
  • apply clippy suggestions https://github.com/jqnatividad/qsv/pull/1209
  • build(deps): bump tokio from 1.29.1 to 1.30.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1204
  • build(deps): bump log from 0.4.19 to 0.4.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/1211
  • build(deps): bump redis from 0.23.1 to 0.23.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1213
  • build(deps): bump tokio from 1.30.0 to 1.31.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1212
  • build(deps): bump sysinfo from 0.29.7 to 0.29.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1214
  • upgrade to Polars 0.32.0 https://github.com/jqnatividad/qsv/pull/1217
  • build(deps): bump flate2 from 1.0.26 to 1.0.27 by @dependabot in https://github.com/jqnatividad/qsv/pull/1218
  • build(deps): bump polars from 0.32.0 to 0.32.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1219
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-08-13

Removed

  • stats: removed Debug derives from structs - https://github.com/jqnatividad/qsv/commit/2def136230ed2e9af727168d3a6329d660b65d4d

Fixed

  • notebook/count: fix Google Colab link by @rzmk in https://github.com/jqnatividad/qsv/pull/1208

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.111.0...0.112.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.111.0

This is the first in a series of "Giddy-up" 🏇🏽 releases.

As Quicksilver matures, we will continue to tweak it in our goal to be the 🚀 fastest general purpose CSV data-wrangling CLI toolkit available.

"Giddy-up" 🏇🏽 releases increase performance by: * taking advantage of new Rust features as they become available * using new libraries that are faster than the ones we currently use * optimizing our code to take advantage of new features in the libraries we use * using new algorithms that are faster than the ones we currently use * taking advantage of more hardware features (SIMD, multi-core, etc.) * adding reproducible benchmarks that are automatically updated on release to track our progress

As it is, Quicksilver has an aggressive release tempo - with more than 160 releases since its initial release in December 2020. This was made possible by the solid foundation of Rust and the xsv project from which qsv was forked. We will continue to build on this foundation by adding more CI tests and starting to track code coverage so we can continue to iterate aggressively with confidence.

Apart from "giddy-up" releases, Quicksilver will also have "carousel" 🎠 releases that will focus on making the toolkit more accessible to non-technical users.

"Carousel" 🎠 releases will include: * more documentation * more examples * more tutorials * more recipes in the Cookbook * multiple GUI wrappers around the CLI * integrations with common desktop tools like Excel, Google Sheets, Open Office, etc. * tighter integration with the CKAN ecosystem, with a focus on helping data publishers & data coordinators maintain a high quality data/metadata catalog

Hopefully, this will make qsv more accessible to non-technical users, and help them get more value out of their data. Special attention will be given to "open data" use cases - enabling non-profits, governments and regular citizens tap raw open data and convert it to actionable insight - making open data useful, usable and used.

Every now and then, we'll also have "Unicorn" 🦄 releases that will add MAJOR new features to the toolkit (e.g. 10x type features like the integration of Pola.rs into qsv).

We will also add a new Technical Documentation section to the wiki to document qsv's architecture and how each command works. The hope is doing so will lower the barrier to contributions and help us grow the community of qsv contributors.

Added

  • sort: add --faster option https://github.com/jqnatividad/qsv/pull/1190
  • describegpt: add -Q, --quiet option by @rzmk in https://github.com/jqnatividad/qsv/pull/1179

Changed

  • stats: refactor initdateinference https://github.com/jqnatividad/qsv/pull/1187
  • join: cache has_headers result in hot loop https://github.com/jqnatividad/qsv/commit/e53edafdc91493c61e9889c8004177f147483a45
  • search & searchset: amortize allocs https://github.com/jqnatividad/qsv/pull/1188
  • stats: use fast-float to convert string to float https://github.com/jqnatividad/qsv/pull/1191
  • sqlp: more examples, apply clippy::needless_borrow lint https://github.com/jqnatividad/qsv/commit/ff37a041da246101db03c51d22b498127a5d7ba7 and https://github.com/jqnatividad/qsv/commit/b8e1f7784cc6906745cdd43b61194e897a3666c4
  • use fast-float project-wide (apply, applydp, schema, sort, validate) https://github.com/jqnatividad/qsv/pull/1192
  • fine tune publishing workflows to enable universally available CPU features https://github.com/jqnatividad/qsv/commit/a1dccc74b480477acaa17e21dde706c159c56b48
  • build(deps): bump serde from 1.0.179 to 1.0.180 by @dependabot in https://github.com/jqnatividad/qsv/pull/1176
  • build(deps): bump pyo3 from 0.19.1 to 0.19.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1177
  • build(deps): bump qsv-dateparser from 0.9.0 to 0.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1178
  • build(deps): bump qsv-sniffer from 0.9.4 to 0.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1180
  • build(deps): bump indicatif from 0.17.5 to 0.17.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1182
  • Bump to qsv stats 0.11 https://github.com/jqnatividad/qsv/pull/1184
  • build(deps): bump serde from 1.0.180 to 1.0.181 by @dependabot in https://github.com/jqnatividad/qsv/pull/1185
  • build(deps): bump qsv_docopt from 1.3.0 to 1.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1186
  • build(deps): bump filetime from 0.2.21 to 0.2.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1193
  • build(deps): bump regex from 1.9.1 to 1.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1194
  • build(deps): bump regex from 1.9.2 to 1.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1195
  • build(deps): bump serde from 1.0.181 to 1.0.182 by @dependabot in https://github.com/jqnatividad/qsv/pull/1196
  • build(deps): bump tempfile from 3.7.0 to 3.7.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1199
  • build(deps): bump strum_macros from 0.25.1 to 0.25.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1200
  • build(deps): bump serde from 1.0.182 to 1.0.183 by @dependabot in https://github.com/jqnatividad/qsv/pull/1201
  • cargo update bump several indirect dependencies
  • apply select clippy lint suggestions
  • pin Rust nightly to 2023-08-07

Removed

  • temporarily remove rand/simd_support feature when building nightly as its causing the nightly build to fail https://github.com/jqnatividad/qsv/commit/0a66fdb454941052857f6458df38abe7730e0b4b

Fixed

  • fixed typos from documentation by @a5dur in https://github.com/jqnatividad/qsv/pull/1203

New Contributors

  • @a5dur made their first contribution in https://github.com/jqnatividad/qsv/pull/1203

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.110.0...0.111.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.110.0

Added

  • describegpt: Add jsonl to prompt file doc section & more clarification by @rzmk in https://github.com/jqnatividad/qsv/pull/1149
  • luau: add --no-jit option https://github.com/jqnatividad/qsv/pull/1170
  • sqlp: add CTE examples https://github.com/jqnatividad/qsv/commit/33f0218c6a78b9cef15e9bed6e227e5f17ef747a

Changed

  • frequency: minor optimizations https://github.com/jqnatividad/qsv/commit/ecac0be5777a50cef2bfe7937d80c5ffe071e4cd
  • join: performance optimizations https://github.com/jqnatividad/qsv/commit/4cb593783efc4e7c2026d632b8dc741cc2edc778 and https://github.com/jqnatividad/qsv/commit/4cb593783efc4e7c2026d632b8dc741cc2edc778
  • sqlp: reduce allocs in loop https://github.com/jqnatividad/qsv/commit/ae164b570c300845e75ce0fac3272221bdebfa66
  • Apple Silicon build now uses mimalloc allocator by default https://github.com/jqnatividad/qsv/commit/bfab24aba2d3b3f70f08ea407572d20feeda725d
  • build(deps): bump jql-runner from 7.0.1 to 7.0.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1151
  • build(deps): bump serde from 1.0.171 to 1.0.173 by @dependabot in https://github.com/jqnatividad/qsv/pull/1154
  • build(deps): bump tempfile from 3.6.0 to 3.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1155
  • build(deps): bump serde from 1.0.174 to 1.0.175 by @dependabot in https://github.com/jqnatividad/qsv/pull/1157
  • build(deps): bump redis from 0.23.0 to 0.23.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1164
  • build(deps): bump serde from 1.0.175 to 1.0.177 by @dependabot in https://github.com/jqnatividad/qsv/pull/1163
  • build(deps): bump serde_json from 1.0.103 to 1.0.104 by @dependabot in https://github.com/jqnatividad/qsv/pull/1160
  • build(deps): bump grex from 1.4.1 to 1.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1159
  • build(deps): bump sysinfo from 0.29.6 to 0.29.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1158
  • build(deps): bump mlua from 0.9.0-rc.1 to 0.9.0-rc.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1169
  • build(deps): bump flexi_logger from 0.25.5 to 0.25.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1168
  • build(deps): bump jemallocator from 0.5.0 to 0.5.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1167
  • build(deps): bump serde from 1.0.177 to 1.0.178 by @dependabot in https://github.com/jqnatividad/qsv/pull/1166
  • build(deps): bump rust_decimal from 1.30.0 to 1.31.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1172
  • build(deps): bump csvs_convert from 0.8.6 to 0.8.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1174
  • apply clippy:needless_pass_by_ref_mut lint in select and frequency https://github.com/jqnatividad/qsv/commit/ba6566e5ea73a1042d33c02035ed1736947b60d8 and https://github.com/jqnatividad/qsv/commit/83add7b30c6e32a49b412629acf60c4c7057df37
  • cargo update bump indirect dependencies
  • pin Rust nightly to 2023-07-29

Removed

  • excel: remove defunct dates-whitelist comments https://github.com/jqnatividad/qsv/commit/2a24d2dcd23c2ccd24dfef1055bf265085f10146

Fixed

  • join: fix left-semi join. Fixes #1150. https://github.com/jqnatividad/qsv/pull/1153
  • foreach: fix command argument token splitter pattern. Fixes #1171 https://github.com/jqnatividad/qsv/pull/1173

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.109.0...0.110.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.109.0

This is a monstrous👹 release with lots of new features and improvements!

The biggest new feature is the describegpt command which allows you to use OpenAI's Large Language Models to generate extended metadata from a CSV. We created this command primarily for CKAN and Datapusher+ so we can infer descriptions, tags and to automatically created annotated data dictionaries using the CSV's summary statistics and frequency tables. In that way, it works even for very large CSV files without consuming too many Open AI tokens. This is a very powerful feature and we are looking forward to seeing what people do with it. Thanks @rzmk for all the work on this!

This release also features major improvements in the sqlp and joinp commands thanks to all the new capabilities of Polars 0.31.1.

Polars SQL's capabilities have been vastly improved in 0.31.1 with numerous new SQL functions and operators, and they're all available with the sqlp command.

The joinp command has several new options for CSV parsing, for pre-join filtering (--filter-left and --filter-right), and pre-join validation with the --validate option. Two new asof join variants (--left_by and --right_by) were also added.

Added

  • describegpt command by @rzmk in https://github.com/jqnatividad/qsv/pull/1036
  • describegpt: minor refactoring in https://github.com/jqnatividad/qsv/pull/1104
  • describegpt: --key & QSVOPENAIAPI_KEY by @rzmk in https://github.com/jqnatividad/qsv/pull/1105
  • describegpt: add --user-agent in help message by @rzmk in https://github.com/jqnatividad/qsv/pull/1095
  • describegpt: json output format for redirection by @rzmk in https://github.com/jqnatividad/qsv/pull/1107
  • describegpt: add testing (resolves #1114) by @rzmk in https://github.com/jqnatividad/qsv/pull/1115
  • describegpt: add --model option (resolves #1101) by @rzmk in https://github.com/jqnatividad/qsv/pull/1117
  • describegpt: polishing https://github.com/jqnatividad/qsv/pull/1122
  • describegpt: add --jsonl option (resolves #1086) by @rzmk in https://github.com/jqnatividad/qsv/pull/1127
  • describegpt: add --prompt-file option (resolves #1085) by @rzmk in https://github.com/jqnatividad/qsv/pull/1120
  • joinp: added asof_by join variant; added CSV formatting options consistent with sqlp CSV format options https://github.com/jqnatividad/qsv/pull/1090
  • joinp: add --filter-left and --filter-right options https://github.com/jqnatividad/qsv/pull/1146
  • joinp: add --validate option https://github.com/jqnatividad/qsv/pull/1147
  • fetch & fetchpost: add --no-cache option https://github.com/jqnatividad/qsv/pull/1112
  • sniff: detect file kind along with mime type https://github.com/jqnatividad/qsv/pull/1137
  • user-agent metadata now contains the current command's name https://github.com/jqnatividad/qsv/pull/1093

Changed

  • fetch & fetchpost: --redis and --no-cache are mutually exclusive https://github.com/jqnatividad/qsv/pull/1113
  • luau: adapt to mlua 0.9.0-rc.1 API changes https://github.com/jqnatividad/qsv/pull/1129
  • upgrade to Polars 0.31.1 https://github.com/jqnatividad/qsv/pull/1139
  • Bump MSRV to latest Rust stable (1.71.0)
  • pin Rust nightly to 2023-07-15
  • Bump uuid from 1.3.4 to 1.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1073
  • Bump tokio from 1.28.2 to 1.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1077
  • Bump tokio from 1.29.0 to 1.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1087
  • Bump sysinfo from 0.29.2 to 0.29.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1088
  • build(deps): bump sysinfo from 0.29.4 to 0.29.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1148
  • Bump jql-runner from 6.0.9 to 7.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1092
  • build(deps): bump jql-runner from 7.0.0 to 7.0.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1132
  • Bump itoa from 1.0.6 to 1.0.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1091
  • Bump itoa from 1.0.7 to 1.0.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1098
  • build(deps): bump itoa from 1.0.8 to 1.0.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1142
  • Bump serde from 1.0.164 to 1.0.165 by @dependabot in https://github.com/jqnatividad/qsv/pull/1094
  • Bump serde from 1.0.165 to 1.0.166 by @dependabot in https://github.com/jqnatividad/qsv/pull/1100
  • Bump serde from 1.0.166 to 1.0.167 by @dependabot in https://github.com/jqnatividad/qsv/pull/1116
  • build(deps): bump serde from 1.0.167 to 1.0.171 by @dependabot in https://github.com/jqnatividad/qsv/pull/1118
  • Bump pyo3 from 0.19.0 to 0.19.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1099
  • Bump ryu from 1.0.13 to 1.0.14 by @dependabot in https://github.com/jqnatividad/qsv/pull/1096
  • build(deps): bump ryu from 1.0.14 to 1.0.15 by @dependabot in https://github.com/jqnatividad/qsv/pull/1144
  • Bump strum_macros from 0.25.0 to 0.25.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1097
  • Bump serde_json from 1.0.99 to 1.0.100 by @dependabot in https://github.com/jqnatividad/qsv/pull/1103
  • build(deps): bump serde_json from 1.0.100 to 1.0.101 by @dependabot in https://github.com/jqnatividad/qsv/pull/1123
  • build(deps): bump serde_json from 1.0.101 to 1.0.102 by @dependabot in https://github.com/jqnatividad/qsv/pull/1125
  • build(deps): bump serde_json from 1.0.102 to 1.0.103 by @dependabot in https://github.com/jqnatividad/qsv/pull/1143
  • Bump serde_stacker from 0.1.8 to 0.1.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1110
  • Bump regex from 1.8.4 to 1.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1109
  • build(deps): bump regex from 1.9.0 to 1.9.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1119
  • Bump jsonschema from 0.17.0 to 0.17.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1108
  • build(deps): bump cpc from 1.9.1 to 1.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1121
  • build(deps): bump governor from 0.5.1 to 0.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1128
  • build(deps): bump actions/setup-python from 4.6.1 to 4.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1134
  • build(deps): bump file-format from 0.17.3 to 0.18.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1136
  • build(deps): bump serde_stacker from 0.1.9 to 0.1.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1141
  • build(deps): bump semver from 1.0.17 to 1.0.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/1140
  • cargo update bump several indirect dependencies

Fixed

  • fmt: Quote ASCII format differently by @LemmingAvalanche in https://github.com/jqnatividad/qsv/pull/1075
  • apply: make dynfmt subcommand case sensitive. Fixes #1126 https://github.com/jqnatividad/qsv/pull/1130
  • applydp: make dynfmt case-sensitive https://github.com/jqnatividad/qsv/pull/1131
  • describegpt: docs/Describegpt.md: typo 'a' --> 'an' by @rzmk in https://github.com/jqnatividad/qsv/pull/1135
  • tojsonl: support snappy-compressed input. Fixes #1133 https://github.com/jqnatividad/qsv/pull/1145
  • security.md: fix mailto text by @rzmk in https://github.com/jqnatividad/qsv/pull/1079

New Contributors

  • @LemmingAvalanche made their first contribution in https://github.com/jqnatividad/qsv/pull/1075

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.108.0...0.109.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.108.0

Another big Quicksilver release with lots of new features and improvements!

The two Polars-powered commands - joinp and sqlp - have received significant attention. joinp now supports asof joins and the --try-parsedates option. sqlp now has several Parquet format options, along with a --low-memory option.

Other new features include:

  • A new cat rowskey --group option that emulates csvkit's csvstack command.
  • SIMD-accelerated UTF-8 validation for the input command.
  • A --field-separator option for the flatten command.
  • The sniff command now uses the excellent file-format crate for mime-type detection on ALL platforms, not just Linux, as was the case when we were using the libmagic library.

Also, QuickSilver now has optimized builds for Apple Silicon. These builds are created using native Apple Silicon self-hosted Action Runners, which means we can enable all qsv features without being constrained by cross-compilation limitations and GitHub’s Action Runner’s disk/memory constraints. Additionally, we compile Apple Silicon builds with M1/M2 chip optimizations enabled to maximize performance.

Finally, qsv startup should be noticeably faster, thanks to @vi’s PR to avoid sysinfo::System::new_all.

Added

  • joinp: added asof join & --try-parsedates option https://github.com/jqnatividad/qsv/pull/1059
  • cat: emulate csvkit's csvstack https://github.com/jqnatividad/qsv/pull/1067
  • input: SIMD-accelerated utf8 validation https://github.com/jqnatividad/qsv/commit/88e1df2757b4a9a6f9dbaf55a99b87fc15b18a65
  • sniff: replace magic with file-format crate, enabling mime-type detection on all platforms https://github.com/jqnatividad/qsv/pull/1069
  • sqlp: add --low-memory option https://github.com/jqnatividad/qsv/commit/d95048e7be1a9d34cc7a22feebbd792a5c27c604
  • sqlp: added parquet format options https://github.com/jqnatividad/qsv/commit/c179cf49e02343138b058d02783332394029a050 https://github.com/jqnatividad/qsv/commit/a861ebf246d22db0f4bcbce1b76788413cfdd1e7
  • flatten: add --field-separator option https://github.com/jqnatividad/qsv/pull/1068
  • Apple Silicon binaries built on native Apple Silicon self-hosted Action Runners, enabling all features and optimized for M1/M2 chips

Changed

  • input: minor improvements https://github.com/jqnatividad/qsv/commit/62cff74b4679e2ba207916392cab5de573ce0a59
  • joinp: align option names with join command https://github.com/jqnatividad/qsv/pull/1058
  • sqlp: minor improvements
  • changed all GitHub action workflows to account for the new Apple Silicon builds
  • Bump rust_decimal from 1.29.1 to 1.30.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1049
  • Bump serde_json from 1.0.96 to 1.0.97 by @dependabot in https://github.com/jqnatividad/qsv/pull/1051
  • Bump calamine from 0.21.0 to 0.21.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1052
  • Bump strum from 0.24.1 to 0.25.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1055
  • Bump actix-governor from 0.4.0 to 0.4.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1060
  • Bump csvs_convert from 0.8.5 to 0.8.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1061
  • Bump itertools from 0.10.5 to 0.11.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1062
  • Bump serde_json from 1.0.97 to 1.0.99 by @dependabot in https://github.com/jqnatividad/qsv/pull/1065
  • Bump indexmap from 1.9.3 to 2.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1066
  • Bump calamine from 0.21.1 to 0.21.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1071
  • cargo update bump various indirect dependencies
  • pin Rust nightly to 2021-06-23

Fixed

  • Avoid sysinfo::System::new_all by @vi in https://github.com/jqnatividad/qsv/pull/1064
  • correct typos project-wide https://github.com/jqnatividad/qsv/pull/1072

Removed

  • removed libmagic dependency from all GitHub action workflows

New Contributors

  • @vi made their first contribution in https://github.com/jqnatividad/qsv/pull/1064

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.107.0...0.108.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.107.0

We continue to improve the new sqlp command. It now supports SQL scripts and additional options to fine-tune Polars CSV parsing and formatting behavior.

We also added an _all_generic special value for the rename command which allows you to rename all columns in a CSV with generic names (e.g. col1, col2, colN). This was done to make it easier to prepare CSVs with no headers for use with sqlp.

This release also features a Windows MSI installer. This is a big step forward for qsv and we hope to make it easier for Windows users to install and use qsv. Thanks @minhajuddin2510 for all the work on pulling this together!

Added

  • sqlp: added script support https://github.com/jqnatividad/qsv/pull/1037
  • sqlp: added CSV format options https://github.com/jqnatividad/qsv/pull/1048
  • rename: add "_all_generic" special value for headers https://github.com/jqnatividad/qsv/pull/1031

Changed

  • excel: now supports Duration type with calamine upgrade to 0.21.0 https://github.com/jqnatividad/qsv/pull/1045
  • Update publish-wix-installer.yml by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1032
  • Bump mlua from 0.9.0-beta.2 to 0.9.0-beta.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1030
  • Bump serde from 1.0.163 to 1.0.164 by @dependabot in https://github.com/jqnatividad/qsv/pull/1029
  • Bump csvs_convert from 0.8.4 to 0.8.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1028
  • Bump sysinfo from 0.29.1 to 0.29.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1027
  • Bump log from 0.4.18 to 0.4.19 by @dependabot in https://github.com/jqnatividad/qsv/pull/1039
  • Bump uuid from 1.3.3 to 1.3.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1041
  • Bump jql-runner from 6.0.8 to 6.0.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1043
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2021-06-13

Fixed

  • Remove redundant registries protocol by @icp1994 in https://github.com/jqnatividad/qsv/pull/1034
  • fix typo in tojsonl.rs (optionns -> options) by @rzmk in https://github.com/jqnatividad/qsv/pull/1035
  • Fix eula by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1046

New Contributors

  • @icp1994 made their first contribution in https://github.com/jqnatividad/qsv/pull/1034
  • @rzmk made their first contribution in https://github.com/jqnatividad/qsv/pull/1035

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.106.0...0.107.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.106.0

This release features the new Polars-powered sqlp command which allows you to run SQL queries against CSVs.

Initial tests show that its performance is competitive with DuckDB and faster than DataFusion on identical SQL queries, and it just runs rings around pandas sql.

It converts Polars SQL (a subset of ANSI SQL) queries to multi-threaded LazyFrames expressions and then executes them. This is a very powerful feature and allows you to do things like joins, aggregations, group bys, etc. on larger than memory CSVs. The sqlp command is still experimental and we are looking for feedback on it. Please try it out and let us know what you think.

Added

  • sqlp: new command to allow Polars SQL queries against CSVs https://github.com/jqnatividad/qsv/pull/1015

Changed

  • Bump csv from 1.2.1 to 1.2.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1008
  • Bump pyo3 from 0.18.3 to 0.19.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1007
  • workflow for creating msi for qsv by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1009
  • migrate from once_cell to std::sync::oncelock https://github.com/jqnatividad/qsv/pull/1010
  • Bump qsv_docopt from 1.2.2 to 1.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1011
  • Bump self_update from 0.36.0 to 0.37.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1014
  • Bump indicatif from 0.17.4 to 0.17.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1013
  • Bump cached from 0.43.0 to 0.44.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1012
  • Bump url from 2.3.1 to 2.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1016
  • Wix changes by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1017
  • Bump actions/github-script from 5 to 6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1018
  • Bump regex from 1.8.3 to 1.8.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1019
  • Bump hashbrown from 0.13.2 to 0.14.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1020
  • Bump tempfile from 3.5.0 to 3.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1021
  • Bump sysinfo from 0.29.0 to 0.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1023
  • Bump qsv-dateparser from 0.8.2 to 0.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1022
  • Bump qsv-sniffer from 0.9.3 to 0.9.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1024
  • Bump qsv-stats from 0.9.0 to 0.10.0 https://github.com/jqnatividad/qsv/commit/38035793d2bb3bf4bee1d3e4cbfc62a6f0235fb6
  • Bump embedded luau from 0.577 to 0.579
  • Bump data-encoding from 2.3.3 to 2.4.0 https://github.com/jqnatividad/qsv/commit/2285a12eab6a7997f97cb39f908684c3adae3ec9
  • cargo update bump several indirect dependencies
  • change MSRV to 1.70.0
  • pin Rust nightly to 2023-06-06

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.105.1...0.106.0

- Rust
Published by jqnatividad over 2 years ago

https://github.com/dathere/qsv - 0.105.1

All "unsafe" code has been removed. By selectively using asserts, we obviate the need to use explicit unchecked logic to skip unnecessary bounds checking.

Changed

  • stats: remove all unsafes https://github.com/jqnatividad/qsv/commit/4a4c0107f98dcd3a2fac7a793101624ec46762df
  • fetch & fetchpost: remove unsafe https://github.com/jqnatividad/qsv/commit/1826bb3cbe24f731973d2e2ce8edc1927dc87d4b
  • validate: remove unsafe https://github.com/jqnatividad/qsv/commit/742ccb3b36fd6a0fb9690d9150bec5b2e4d44b0a
  • normalize --user-agent option across all of qsv https://github.com/jqnatividad/qsv/commit/feff90bba4d6840f7d2aa2100897cfaad7efe08f & https://github.com/jqnatividad/qsv/commit/839b3b71369f948135d403b7d30e8b26248a313b
  • bump qsv-dateparser from 0.8.1 to 0.8.2 which also uses chrono 0.4.26
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-05-29

Fixed

  • remove chrono pin to 0.4.24 and upgrade to 0.4.26 which fixed 0.4.25 CI test failures https://github.com/jqnatividad/qsv/commit/7636d82bdcf3428e59b800b6ff9f53dcd52cddd9

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.105.0...0.105.1

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.105.0

Added

  • sniff: added --harvest-mode convenience option https://github.com/jqnatividad/qsv/pull/997
  • sniff: added --quick option on Linux https://github.com/jqnatividad/qsv/commit/e16df6fbbad9318cc4efeb500409f80b76cd50e2
  • qsv (pronounced "Quicksilver") now has a tagline - "Hi ho, QuickSilver! Away!" :smile: https://github.com/jqnatividad/qsv/commit/d32aeb1afe7a90c4887b00a0c2a20481a91722fe

Changed

  • sniff: if --no-infer is enabled when sniffing a snappy file, just return the snappy mime type https://github.com/jqnatividad/qsv/pull/996
  • sniff: now returns filesize and last-modified date in errors. https://github.com/jqnatividad/qsv/commit/2162659bd574122e93e204cb14b5114bd7ca5344
  • stats: minor performance tweaks in hot compute loop https://github.com/jqnatividad/qsv/commit/f61198c2057545fb76a9b30bd12adfd3a3bbf8ba
  • qsv binary variants built using older glibc/musl libraries are now published with their respective glibc/musl version suffixes (glibc-2.31/musl-1.1.24) in the filename, instead of just the "older" suffix.
  • pin chrono to 0.4.24 as the new 0.4.25 is breaking CI tests https://github.com/jqnatividad/qsv/commit/cde3623b27fcb583a1248fc736aaf11f569f5085
  • Bump calamine from 0.19.1 to 0.20.0 https://github.com/jqnatividad/qsv/commit/ec7e2df70e33756d4ef49567bf4f5acba3eb19d4
  • Bump actions/setup-python from 4.6.0 to 4.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/991
  • Bump flexi_logger from 0.25.4 to 0.25.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/992
  • Bump regex from 1.8.2 to 1.8.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/993
  • Bump csvs_convert from 0.8.3 to 0.8.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/994
  • Bump log from 0.4.17 to 0.4.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/998
  • Bump polars from 0.29.0 to 0.30.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/999
  • Bump tokio from 1.28.1 to 1.28.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1000
  • Bump once_cell from 1.17.1 to 1.17.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1003
  • Bump indicatif from 0.17.3 to 0.17.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1001
  • cargo bump update several indirect dependencies
  • pin Rust nightly to 2023-05-28

Removed

  • excel: removed kludgy --dates-whitelist option https://github.com/jqnatividad/qsv/pull/1005

Fixed

  • sniff: fix inconsistent mime type detection https://github.com/jqnatividad/qsv/pull/995

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.104.1...0.105.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.104.1

Added

  • added new publishing workflow to build binary variants using older glibc 2.31 instead of glibc 2.35 and musl 1.1.24 instead of musl 1.2.2. This will allow users running on older Linux distros (e.g. Debian, Ubuntu 20.04) to run qsv prebuilt binaries with "older" glibc/musl versions. https://github.com/jqnatividad/qsv/commit/1a08b920240b39ff57282645cc92686b42e3c278

Changed

  • sniff: improved usage text https://github.com/jqnatividad/qsv/commit/d2b32ac6631589230484cb84506b5113c8f75192
  • sniff: if sniffing a URL, and server does not return content-length or last-modified headers, set filesize and last-modified to "Unknown" https://github.com/jqnatividad/qsv/commit/d4a64ac2e7147e7ab5452864fe6063a97f37f76b
  • frequency: use SIMD accelerated utf8 validation in hot loop https://github.com/jqnatividad/qsv/commit/33406a15f554d03ca117e0196efa6362f104e3cc
  • foreach: use simdut8 validation https://github.com/jqnatividad/qsv/commit/df6b4f8ae967bde8ca22bc6dd217938ae5238add
  • apply: use simdutf8 validation in decode operation; also tweak it to avoid panics (however unlikely) https://github.com/jqnatividad/qsv/commit/adf7052db39a08aeda2401774892a884be98223c
  • update install & build instructions with magic
  • Bump regex from 1.8.1 to 1.8.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/990
  • Bump bumpalo from 3.12.2 to 3.13.0
  • pin Rust nightly to 2021-05-22

Removed

  • sniff: disabled --progressbar option on qsvdp binary variant https://github.com/jqnatividad/qsv/commit/1a20edb7af7525fcb5c54daacf70e3381cf17e82

Fixed

  • updated publishing workflows to properly enable magic feature (for sniff mime type detection) https://github.com/jqnatividad/qsv/commit/136211fcd9134f3421223979a5272ff53d77f03b

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.104.0...0.104.1

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.104.0

Added

  • sniff: add --no-infer option only available on Linux. Using this option makes sniff work as a general mime type detector - retrieving detected mime type, file size (content-length when sniffing a URL), and last modified date.
    When sniffing a URL with --no-infer, it only sniffs the first downloaded chunk, making it very fast even for very large remote files. This option was designed to facilitate accelerated harvesting and broken/stale link checking on CKAN. https://github.com/jqnatividad/qsv/pull/987
  • excel: add canonical_filename to metadata https://github.com/jqnatividad/qsv/pull/985
  • snappy: now accepts url input https://github.com/jqnatividad/qsv/pull/986
  • sample: support url input https://github.com/jqnatividad/qsv/pull/989

Changed

  • Bump qsv-sniffer from 0.9.2 to 0.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/979
  • Bump console from 0.15.5 to 0.15.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/980
  • Bump jql-runner from 6.0.7 to 6.0.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/981
  • Bump console from 0.15.6 to 0.15.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/988
  • Bump embedded Luau from 0.576 to 0.577
  • apply select clippy recommendations
  • tweaked emojis used in Available Commands legend - 🗜️ to 🤯 to denote memory-intensive commands that load the entire CSV into memory; 🪗 to 😣 to denote commands that need addl memory proportional to the cardinality of the columns being processed; 🌐 to denote commands that have web-aware options
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2021-05-21

Fixed

  • excel: Handle ranges larger than the sheet by @bluepython508 in https://github.com/jqnatividad/qsv/pull/984

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.103.1...0.104.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.103.1

Changed

  • Bump reqwest from 0.11.17 to 0.11.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/978
  • cargo update bump indirect dependencies

Fixed

  • fix cargo install failing as it is trying to fetch cargo environment variables that are only set for cargo build, but not cargo install #977

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.103.0...0.103.1

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.103.0

Added

  • sniff: On Linux, short-circuit sniffing a remote file when we already know its not a CSV https://github.com/jqnatividad/qsv/pull/976
  • stats: now computes variance for dates https://github.com/jqnatividad/qsv/commit/e3e678298de59f2485d5e70f622218d849a2e2c9
  • stats: now automatically invalidates cached stats across qsv releases https://github.com/jqnatividad/qsv/commit/6e929dd1feac692be3f7e1883ad88f99b3abc5b2
  • add magic version to --version option https://github.com/jqnatividad/qsv/commit/455c0f26e237c812bf9d88d6a7906e34c5a9cbeb
  • added CKAN-aware (CKAN) legend to List of Available Commands

Changed

  • stats: improve usage text
  • stats: use extendfromslice for readability https://github.com/jqnatividad/qsv/commit/23275e2e8ef30bdc101293084bce71e651b3222a
  • validate: do not panic if the input is not UTF-8 https://github.com/jqnatividad/qsv/commit/532cd012de0866250be2dc19b6e02ffa27b3c9fb
  • sniff: simplify getting stdin last_modified property; on Linux, return detected mime type in JSON error response https://github.com/jqnatividad/qsv/commit/01975912ae99fe0a7b38cf741f3dfbcf2b9dc486
  • luau: update embedded Luau from 0.573 to 0.576
  • Update nightly build instructions
  • Bump qsv-sniffer from 0.9.1 to 0.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/972
  • Bump tokio from 1.28.0 to 1.28.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/973
  • Bump serde from 1.0.162 to 1.0.163 by @dependabot in https://github.com/jqnatividad/qsv/pull/974
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2021-05-13

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.102.1...0.103.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.102.1

0.102.1 is a small patch release to fix issues in publishing the pre-built binary variants with magic for sniff when cross-compiling.

Changed

  • stats: refine --infer-boolean option info & update test count https://github.com/jqnatividad/qsv/commit/de6390b21a21b67ae0dd3f3f6d0153f2c0736cff
  • tojsonl: refine boolcheckfirstlower_char() fn https://github.com/jqnatividad/qsv/commit/241115e4718c67cd8e701c435b91e02556875eac

Fixed

  • tweaked GitHub Actions publishing workflows to enable building magic-enabled sniff on Linux. Disabled magic when cross-compiling for non-x86_64 Linux targets.

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.102.0...0.102.1

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.102.0

A lot of work was done on sniff to make it not just a CSV dialect detector, but a general purpose file type detector leveraging :magic_wand: magic :sparkles: - able to detect mime types even for files on URLs.

sniff can now also use the same data types as stats with the --stats-types option. This was primarily done to support metadata collection when registering CKAN resources not only during data entry, but also when checking resource links for bitrot, and when harvesting metadata from other systems, so stats & sniff can be used interchangeably based on the response time requirement and the data quality of the data source.

For example, sniff can be used for quickly inferring metadata by just downloading a small sample from a very large data file DURING data entry ("Resource-first upload workflow"), with stats being used later on, when the data is actually being pushed to the Datastore with Datapusher+, when data type inferences need to be guaranteed, and the entire file will need to be scanned.

Added

  • stats: add --infer-boolean option https://github.com/jqnatividad/qsv/pull/967
  • sniff: add --stats-types option https://github.com/jqnatividad/qsv/pull/968
  • sniff: add magic mime-type detection on Linux https://github.com/jqnatividad/qsv/pull/970
  • sniff: add --user-agent option https://github.com/jqnatividad/qsv/commit/bd0bf788609c7dd5220cdab6061067170acf1ca2
  • sniff: add last_modified info https://github.com/jqnatividad/qsv/commit/ef68bff177ee7c9ce6bd45868488287c8114a91e

Changed

  • make --envlist option allocator-aware https://github.com/jqnatividad/qsv/commit/f3566dc0c4ab7c7236374cce936f5db7200e39de
  • Bump serde from 1.0.160 to 1.0.162 by @dependabot in https://github.com/jqnatividad/qsv/pull/962
  • Bump robinraju/release-downloader from 1.7 to 1.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/960
  • Bump flexi_logger from 0.25.3 to 0.25.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/965
  • Bump sysinfo from 0.28.4 to 0.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/966
  • Bump jql-runner from 6.0.6 to 6.0.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/969
  • Bump polars from 0.28.0 to 0.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/971
  • apply select clippy recommendations
  • cargo update bump indirect dependencies
  • change MSRV to 1.69.0
  • pin Rust nightly to 2023-05-07

Fixed

  • sniff: make sniff give more consistent results https://github.com/jqnatividad/qsv/pull/958. Fixes #956
  • Bump qsv-sniffer from 0.8.3 to 0.9.1. Replaced all assert with proper error-handling. https://github.com/jqnatividad/qsv/pull/961 https://github.com/jqnatividad/qsv/commit/a7c607a55be9bebca13148f5a0dddf1fea909df7 https://github.com/jqnatividad/qsv/commit/43d7eaf9201c72016682096e84400dba59b7cd95
  • sniff: fixed rowcount calculation when sniffing a URL and the entire file was actually downloaded - https://github.com/jqnatividad/qsv/commit/ef68bff177ee7c9ce6bd45868488287c8114a91e

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.101.0...0.102.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.101.0

We're back to the future! The qsv release train is back on track, as we jump to 0.101.0 over the yanked 0.100.0 release now that self-update logic has been fixed.

Added

  • stats: added more metadata to stats arg cache json - https://github.com/jqnatividad/qsv/commit/5767e5650690a8f39d537ccdc428a6688762cd77
  • added target-triple to user-agent string, and changed agent name to qsv binary variant https://github.com/jqnatividad/qsv/commit/063b08031e361b5c1f26ed504870f0bc1bfd7678, https://github.com/jqnatividad/qsv/commit/70f4ea3b2d0d88b54358c470dd8e964e89adf16d, https://github.com/jqnatividad/qsv/commit/f0fcb0591fcecaae9b8a9db192adbcdfeb402728

Changed

  • excel: performance, safety & documentation refinements https://github.com/jqnatividad/qsv/commit/e9a283d51fe84cc4c4e004c0e7b9b2ef12db683d, https://github.com/jqnatividad/qsv/commit/3800d250223619963bc9072ade9c43200ca1bdaf, https://github.com/jqnatividad/qsv/commit/252b01e2207bb995d09154af546a12174d532d6a, https://github.com/jqnatividad/qsv/commit/6a6df0f045cb4f1e58d07433e73a41579ca1262f, https://github.com/jqnatividad/qsv/commit/67ccd85cbe5441b1ad0188ae524b3e832c817d30, https://github.com/jqnatividad/qsv/commit/f2908ce020316087ed756d614c357373727f2664, https://github.com/jqnatividad/qsv/commit/6d5105deaa00f3b8e350d522b196ef4ed3676fc4, https://github.com/jqnatividad/qsv/commit/dbcea393cfba08b4ffe3b6b6d0acd364a59cb342, https://github.com/jqnatividad/qsv/commit/faa8ef9b3f9d6de6af47ddced0d80a5ad5b4e763
  • replace: clarify that it works on a field-by-field basis https://github.com/jqnatividad/qsv/commit/c0e2012dc011a6269359ed0ff2c7dc157bae5cd0
  • stats: use extendfromslice when possible - https://github.com/jqnatividad/qsv/commit/c71ad4ee3d7992f4ef1cdc37e32d740756340ba9
  • fetch & fetchpost: replace multiple push_fields with a csv from vec - https://github.com/jqnatividad/qsv/commit/f4e0479e508c845f49d320967af443fe5a247327
  • fetch & fetchpost: Migrate to jql 6 https://github.com/jqnatividad/qsv/pull/955
  • schema: made bincode reader buffer bigger - https://github.com/jqnatividad/qsv/commit/39b4bb5f89bab7ada2dda40d66d1e40bb51cbe0a
  • index: use increased default buffer size when creating index https://github.com/jqnatividad/qsv/commit/60fe7d64b7eeb322625d2cc44d196bd5633bd79c
  • standardized user_agent processing https://github.com/jqnatividad/qsv/commit/4c063015a8d664b9ef105243b2ea6541b3cc6b59, https://github.com/jqnatividad/qsv/commit/010c565912c6ae5ba09620cee7f90aeb294c4d14
  • User agent environment variable; standardized user agent processing https://github.com/jqnatividad/qsv/pull/951
  • more robust Environment Variables processing https://github.com/jqnatividad/qsv/pull/946
  • move Environment Variables to its own markdown file https://github.com/jqnatividad/qsv/commit/77c167fe3942ce464bc5a675b76b3371cf75e84b
  • Bump tokio from 1.27.0 to 1.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/945
  • Bump mimalloc from 0.1.36 to 0.1.37 by @dependabot in https://github.com/jqnatividad/qsv/pull/944
  • Bump mlua from 0.9.0-beta.1 to 0.9.0-beta.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/952
  • Bump flate2 from 1.0.25 to 1.0.26 by @dependabot in https://github.com/jqnatividad/qsv/pull/954
  • Bump reqwest from 0.11.16 to 0.11.17 by @dependabot in https://github.com/jqnatividad/qsv/pull/953
  • cargo update bump indirect dependencies
  • pin Rust nightly to 2023-04-30

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.99.1...0.101.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.99.1

Even though this is a patch release, it actually contains a lot of new features and improvements. This was done so that qsv version 0.99.0 and below can upgrade to this release, as the self-update logic in older versions compared versions as strings, and not as semvers, preventing the older versions from updating as the yanked 0.100.0 is less than anything 0.99.0 and below when compared as strings.

The changelog below is a combination of the changelog of the yanked 0.100.0 and the changes since 0.99.0.

Added

  • snappy: add validate subcommand https://github.com/jqnatividad/qsv/pull/920
  • sniff: can now sniff snappy-compressed files - on the local file system and on URLs https://github.com/jqnatividad/qsv/pull/925
  • schema & stats: stats now has a --stats-binout option which schema takes advantage of https://github.com/jqnatividad/qsv/pull/931
  • schema: added example NYC 311 JSON schema validation file generated by qsv schema https://github.com/jqnatividad/qsv/commit/c956212574ad0d800c3cf3bb1caa4e5722f0a393
  • to: added snappy auto-compression/decompression support https://github.com/jqnatividad/qsv/commit/09a7afd38fdf59703edf76fa492eed9747586b6c
  • to: added dirs as input source https://github.com/jqnatividad/qsv/commit/a31fb3b7499e1ed05136b32b3179d5713bec2106 and https://github.com/jqnatividad/qsv/commit/4d4dd548c44967c61493f1e1c2403f352dcfba34
  • to: added unit tests for sqlite, postgres, xslx and datapackage https://github.com/jqnatividad/qsv/commit/16f2b7ec35bc44093b90d4673e8c20a61f6263bb https://github.com/jqnatividad/qsv/commit/808b018d1f5b7f815897979e1bd67d663fe31c9c https://github.com/jqnatividad/qsv/commit/10739c55bdf66494e5f76028fb1bc67dbeb706cf
  • add dotenv file support https://github.com/jqnatividad/qsv/pull/936 and https://github.com/jqnatividad/qsv/pull/937

Changed

  • stats & schema: major performance improvement (30x faster) with stats binary format serialization/deserialization https://github.com/jqnatividad/qsv/commit/73b4b2075a7d9013f8b71a9109073e6d9b8ad9b4
  • snappy: misc improvements in https://github.com/jqnatividad/qsv/pull/921
  • stats: Refine stats binary format caching in https://github.com/jqnatividad/qsv/pull/932
  • bump embedded Luau from 0.5.71 to 0.5.73 https://github.com/jqnatividad/qsv/commit/d0ea7c8f926299c5d201609e4f3f11e11e3462d7
  • Better OOM checks. It now has two distinct modes - NORMAL and CONSERVATIVE, with NORMAL being the default. Previously, the CONSERVATIVE heuristic was the default and it was causing too many false positives https://github.com/jqnatividad/qsv/pull/935
  • Bump actions/setup-python from 4.5.0 to 4.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/934
  • Bump emdedded Luau from 0.5.67 to 0.5.71 https://github.com/jqnatividad/qsv/commit/a67bd3e274b1f73d64bb93e03c817cce583a8b02
  • Bump qsv-stats from 0.7 to 0.8 https://github.com/jqnatividad/qsv/commit/9a6812abff719b11e5b0c7e25009dfc81231757a
  • Bump serde from 1.0.159 to 1.0.160 by @dependabot in https://github.com/jqnatividad/qsv/pull/918
  • Bump cached from 0.42.0 to 0.43.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/919
  • Bump serde_json from 1.0.95 to 1.0.96 by @dependabot in https://github.com/jqnatividad/qsv/pull/922
  • Bump pyo3 from 0.18.2 to 0.18.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/923
  • Bump ext-sort from 0.1.3 to 0.1.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/929
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-23

Removed

  • snappy is even snappier when we removed 8-cpu cap for even faster compression - going from 1.75 gb/sec to 2.25 gb/sec for the NYC 311 test data :rocket: https://github.com/jqnatividad/qsv/commit/19acf2f23187dee5fd104e9e6eceb8fdc74d7a08

Fixed

  • excel: Float serialization correctness by @bluepython508 in https://github.com/jqnatividad/qsv/pull/933
  • luau: only create qsv_cache directory when needed https://github.com/jqnatividad/qsv/pull/930
  • luau: make qsv_shellcmd() helper function work with Windows https://github.com/jqnatividad/qsv/commit/f867158c4c7eaf10c18092b2a4c88ff67cc3a487 and https://github.com/jqnatividad/qsv/commit/cc24acba3c916184059e7e9d776dce9e35294d44
  • Self update semver parsing fixed so versions are compared as semvers, not as strings. This prevented self-update from updating from 0.99.0 to 0.100.0 as 0.99.0 > 0.100.0 when compared as strings. https://github.com/jqnatividad/qsv/pull/940
  • fixed werr macro to also format! messages https://github.com/jqnatividad/qsv/commit/c3ceaf713683ddb70e40a293f494f15144cc78fb

New Contributors

  • @bluepython508 made their first contribution in https://github.com/jqnatividad/qsv/pull/933

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.99.0...0.99.1

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.99.0

Added

  • added Snappy auto-compression/decompression support. The Snappy format was chosen primarily because it supported streaming compression/decompression and is designed for performance. https://github.com/jqnatividad/qsv/pull/911
  • added snappy command. Although files ending with the ".sz" extension are automatically compressed/decompressed by qsv, the snappy command offers 4-5x faster multi-threaded compression. It can also be used to check if a file is Snappy-compressed or not, and can be used to compress/decompress ANY file. https://github.com/jqnatividad/qsv/pull/911 and https://github.com/jqnatividad/qsv/pull/916
  • diff command added to qsvlite and qsvdp binary variants https://github.com/jqnatividad/qsv/pull/910
  • to: added stdin support https://github.com/jqnatividad/qsv/pull/913

Changed

  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-09

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.98.0...0.99.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.98.0

Added

  • stats: added stats caching and storing the computed stats as metadata. Doing so not only prevents unnecessary recomputation of stats, especially for very large files, it also sets the foundation for summary statistics to be used more widely across qsv to support new commands that leverages these stats - e.g. fixdata, outliers, describegpt, fake, statsviz and multi-pass stats, etc. https://github.com/jqnatividad/qsv/pull/902
  • stats: added --force option to force recomputation of stats https://github.com/jqnatividad/qsv/commit/2f91d0cd981ce9be6c36424cd946f3bcce42b909
  • luau: add qsv_loadcsv helper function https://github.com/jqnatividad/qsv/pull/908
  • added more info about regular expression syntax and link to https://regex101.com which now supports the Rust flavor of regex

Changed

  • logging is now buffered by default https://github.com/jqnatividad/qsv/pull/903
  • renamed features to be more easily understandable: "full" -> "featurecapable", "allfull" -> "all_features" https://github.com/jqnatividad/qsv/pull/906
  • changed GitHub Actions workflows to use the new feature names
  • Bump redis from 0.22.3 to 0.23.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/901
  • Bump filetime from 0.2.20 to 0.2.21 by @dependabot in https://github.com/jqnatividad/qsv/pull/904
  • reenabled fetch and fetchpost CI tests
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-06

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.97.0...0.98.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.97.0

Since 0.96.x was not published, 0.97.0 contains the changes from 0.96.x after fixing the mimalloc build errors on some platforms.

Added

  • excel: add --date-format option in https://github.com/jqnatividad/qsv/pull/897 and https://github.com/jqnatividad/qsv/commit/6a7db997c8d150854405a2cb2ac392479c3534b9
  • luau: add qsv_fileexists() helper fn https://github.com/jqnatividad/qsv/commit/f4cc60f87c3c7c85a7736260356daa3051d2a879

Changed

  • excel: speed up float conversion by using ryu and itoa together rather than going thru core::fmt::Formatter https://github.com/jqnatividad/qsv/commit/e722753c377e385ebdffca199557ab3cf848ce7b
  • joinp: --cross option does not require columns; added CI tests https://github.com/jqnatividad/qsv/pull/894
  • schema: better, more human-readable regex patterns are generated when inferring pattern attribute; more interactive messages https://github.com/jqnatividad/qsv/commit/1620477b752e64b6b2844aafeee4adf9256d4de8
  • schema & validate: improve usage text; added JSON Schema Validation info https://github.com/jqnatividad/qsv/commit/3da68474d0fa4b6ec2170bf69dbfb27ab0d5f8a3
  • Bump tokio from 1.26.0 to 1.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/887
  • Bump reqwest from 0.11.15 to 0.11.16 by @dependabot in https://github.com/jqnatividad/qsv/pull/888
  • Bump serde_json from 1.0.94 to 1.0.95 by @dependabot in https://github.com/jqnatividad/qsv/pull/889
  • Bump serde from 1.0.158 to 1.0.159 by @dependabot in https://github.com/jqnatividad/qsv/pull/890
  • Bump tempfile from 3.4.0 to 3.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/891
  • Bump polars from 0.27.2 to 0.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/893
  • Bump mlua from 0.8 to 0.9.0-beta.1 https://github.com/jqnatividad/qsv/commit/9b7e984cba4079f8e826f7e74209a90ce7856bc7
  • bump MSRV to Rust 1.68.2
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-04-02

Removed

  • luau: removed unnecessary --exec option https://github.com/jqnatividad/qsv/commit/0d4ccdaab95ab5471bb71d99aa7f9056dabf48c3

Fixed

  • Fixed build errors on non-Windows platforms #900 by bumping mimalloc from 0.1.34 to 0.1.36

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.95.1...0.97.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.95.1

Changed

  • count: add example/test add link from usage text https://github.com/jqnatividad/qsv/commit/9cd3c293eef0344c27693949f415850881211adf
  • diff: add examples link from usage text https://github.com/jqnatividad/qsv/commit/4250811d0d20284342ccd7efcc58cd7562d16636
  • Standardize --timeout option handling and exposed it with QSV_TIMEOUT env var https://github.com/jqnatividad/qsv/pull/886
  • improved self-update messages https://github.com/jqnatividad/qsv/commit/4027306f08aeca3b2ebe1e4243628a65c1307a9e
  • Bump qsv-dateparser from 0.6 to 0.7
  • Bump qsv-sniffer from 0.7 to 0.8
  • Bump actions/stale from 7 to 8 by @dependabot in https://github.com/jqnatividad/qsv/pull/876
  • Bump newline-converter from 0.2.2 to 0.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/877
  • Bump rust_decimal from 1.29.0 to 1.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/882
  • Bump regex from 1.7.2 to 1.7.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/881
  • Bump sysinfo from 0.28.3 to 0.28.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/883
  • Bump pyo3 from 0.18.1 to 0.18.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/885
  • Bump indexmap from 1.9.2 to 1.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/884
  • change MSRV to Rust 1.68.1
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-26

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.95.0...0.95.1

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.95.0

Added

  • luau: added qsvcmd() and qsvshellcmd() helpers, detailed map error messages to help with script development https://github.com/jqnatividad/qsv/pull/869
  • luau: added environment variable set/get helper functions - qsvsetenv() and qsvgetenv() https://github.com/jqnatividad/qsv/pull/872
  • luau: added smart qsvregisterlookup() caching so lookup tables need not be repeatedly downloaded and can be persisted/expired as required https://github.com/jqnatividad/qsv/pull/874
  • luau: added QSVCKANAPI, QSVCKANTOKEN and QSVCACHEDIR env vars https://github.com/jqnatividad/qsv/commit/9b7269e98fe004c6d2268d626777628af65dd45d

Changed

  • apply & applydp: expanded usage text to have arguments section; emptyreplace subcommand now supports column selectors https://github.com/jqnatividad/qsv/pull/868
  • luau: smarter script file processing. In addition to recognizing "file:" prefix, if the script argument ends with ".lua/luau" file extensions, its automatically processed as a file https://github.com/jqnatividad/qsv/pull/875
  • luau: qsvsleep() and qsvwritefile() improvements https://github.com/jqnatividad/qsv/commit/27358a26411f95f57acfd62aad8b92906fe82ced
  • partition: added arguments section to usage text; added NYC 311 example https://github.com/jqnatividad/qsv/commit/74aa37b1c138f1c010d338fb4f6c9b48a381532a
  • Bump reqwest from 0.11.14 to 0.11.15 by @dependabot in https://github.com/jqnatividad/qsv/pull/870
  • Bump regex from 1.7.1 to 1.7.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/873
  • apply select clippy lint recommendations
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-22

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.94.0...0.95.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.94.0

Added

  • luau: qsvregisterlookup now supports "ckan://" scheme. This allows the luau script developer to fetch lookup table resources from CKAN instances. https://github.com/jqnatividad/qsv/pull/864
  • luau: added detailed example for "dathere://" lookup scheme in https://github.com/dathere/qsv-lookup-tables repo. https://github.com/jqnatividad/qsv/commit/3074538a9ac1071ba6d6b6e85fdc0ca3c833ce4e
  • luau: added qsv_writefile helper function. This allows the luau script developer to write text files to the current working directory. Filenames are sanitized for safety. https://github.com/jqnatividad/qsv/pull/867
  • luau: random access mode now supports progressbars. The progressbar indicates the current record and the total number of records in the CSV file https://github.com/jqnatividad/qsv/commit/63150a0a0d885f5bd5b118524d802ff59b18f621
  • input: added --comment option which allows the user to specify the comment character. CSV rows that start with the comment character are skipped. https://github.com/jqnatividad/qsv/pull/866

Changed

  • luau: added additional logging messages to help with script debugging https://github.com/jqnatividad/qsv/commit/bcff8adc03ad398829f4874e948f5152bca04783
  • schema & tojsonl: refactor stdin handling https://github.com/jqnatividad/qsv/commit/6c923b19bfa3fbed918335b70b793a6d6011a960
  • bump jsonschema from 0.16 to 0.17
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-17

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.93.1...0.94.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.93.1

Fixed

  • Fixed publishing workflow so qsvdp luau is only enabled on platforms that support it

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.93.0...0.93.1

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.93.0

Added

  • luau: qsvregisterlookup helper function now works with CSVs on URLs https://github.com/jqnatividad/qsv/pull/860
  • luau: added support for "dathere://" lookup scheme, allowing users to conveniently load oft-used lookup tables from https://github.com/dathere/qsv-lookup-tables https://github.com/jqnatividad/qsv/pull/861
  • luau: added detailed API definitions for Luau Helper Functions https://github.com/jqnatividad/qsv/blob/605b38b5636382d45f96d3d9d3c404bb20efaf15/src/cmd/luau.rs#L1156-L1497
  • validate: added --timeout option when downloading JSON Schemas https://github.com/jqnatividad/qsv/commit/605b38b5636382d45f96d3d9d3c404bb20efaf15

Changed

  • remove all glob imports https://github.com/jqnatividad/qsv/pull/857 and https://github.com/jqnatividad/qsv/pull/858
  • qsvdp (Datapusher+-optimized qsv binary variant) now has an embedded luau interpreter https://github.com/jqnatividad/qsv/pull/859
  • validate: JSON Schema url now case-insensitive https://github.com/jqnatividad/qsv/commit/3123dc6da30370cae88c9e4bb9d387fed3d36507
  • Bump serde from 1.0.155 to 1.0.156 by @dependabot in https://github.com/jqnatividad/qsv/pull/862
  • applied select clippy lint recommendations
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-03-14

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.92.0

Added

  • excel: added option to specify range to extract by @EricSoroos in https://github.com/jqnatividad/qsv/pull/843
  • luau: added --remap option. This allows the user to only map specified columns to the output CSV https://github.com/jqnatividad/qsv/pull/841
  • luau: added several new helper functions:
    • qsv_skip: skips writing the current record to the output CSV https://github.com/jqnatividad/qsv/pull/854
    • qsv_break: stops processing the current CSV file https://github.com/jqnatividad/qsv/pull/846
    • qsv_insertrecord: inserts a new record to the output CSV https://github.com/jqnatividad/qsv/pull/845
    • qsv_register_lookup: loads a CSV that can be used as a lookup table in Luau https://github.com/jqnatividad/qsv/commit/38e7b7eb264d4b43b7f3039696ad918238f0a4c6

Changed

  • luau: reorganized code for readability/maintainability https://github.com/jqnatividad/qsv/pull/846
  • foreach: tweak usage text to say it works with shell commands, not just the bash shell https://github.com/jqnatividad/qsv/commit/78851b33e8482c1961e97c17c95ea316950355fd
  • split: added deeplink to examples/tests https://github.com/jqnatividad/qsv/commit/6f293b853b74505b7769e2972e7bc358506db34e
  • select: added deeplink to examples/tests https://github.com/jqnatividad/qsv/commit/72fa0942c5954b48236b6d137a8347e89e2f097c
  • Switch to qsv-optimized fork of docopt.rs - qsv_docopt. As docopt.rs is unmaintained and docopt parsing is an integral part of qsv as we embed each command's usage text in a way that cannot be done by either clap or structopt https://github.com/jqnatividad/qsv/pull/852
  • Bump embedded Luau from 0.566 to 0.567 https://github.com/jqnatividad/qsv/commit/d624e840802b51aae59cf5db0923f8f2605426c5
  • Bump csv from 1.2.0 to 1.2.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/839
  • Bump serde from 1.0.152 to 1.0.153 by @dependabot in https://github.com/jqnatividad/qsv/pull/842
  • Bump serde from 1.0.153 to 1.0.154 by @dependabot in https://github.com/jqnatividad/qsv/pull/844
  • Bump rust_decimal from 1.28.1 to 1.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/853
  • start using new crates.io sparse protocol
  • applied select clippy lint recommendations
  • cargo update bump several other dependencies
  • pin Rust nightly to 2021-03-12

Fixed

  • stats: fix stdin regression https://github.com/jqnatividad/qsv/pull/851
  • excel: Fix missing integer headers in excel transform. by @EricSoroos in https://github.com/jqnatividad/qsv/pull/840
  • luau: fix & improve comment remover https://github.com/jqnatividad/qsv/pull/845

New Contributors

  • @EricSoroos made their first contribution in https://github.com/jqnatividad/qsv/pull/840

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.91.0...0.92.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.91.0

Added

  • luau: map multiple new computed columns in one call https://github.com/jqnatividad/qsv/pull/829
  • luau: added qsv_autoindex() helper function https://github.com/jqnatividad/qsv/pull/834
  • luau: added qsv_coalesce() helper function https://github.com/jqnatividad/qsv/commit/3064ba2116ce5c96f3bd7e789616a3b0ffe9f63b
  • luau: added _LASTROW special variable to facilitate random access mode

Changed

  • diff: rename --primary-key-idx -> --key by @janriemer in https://github.com/jqnatividad/qsv/pull/826
  • diff: implement option to sort by columns by @janriemer in https://github.com/jqnatividad/qsv/pull/827
  • luau: parsing improvements https://github.com/jqnatividad/qsv/pull/835
  • luau: bump embedded luau version from 0.562 to 0.566 https://github.com/jqnatividad/qsv/commit/f4a08b4980201015dcba31dfae74d8b1045c0455
  • sniff: major refactoring. https://github.com/jqnatividad/qsv/pull/836
  • enable polars nightly feature when building nightly https://github.com/jqnatividad/qsv/pull/816
  • bump qsv-sniffer from 0.6.1 to 0.7.0 https://github.com/jqnatividad/qsv/commit/5027a64576f19792f917550f9146792d5c9e351a
  • Bump crossbeam-channel from 0.5.6 to 0.5.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/818
  • Bump flexi_logger from 0.25.1 to 0.25.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/824
  • Bump rayon from 1.6.1 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/831
  • Bump ryu from 1.0.12 to 1.0.13 by @dependabot in https://github.com/jqnatividad/qsv/pull/830
  • Bump itoa from 1.0.5 to 1.0.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/832
  • cargo update bump dependencies
  • pin Rust nightly to 2023-03-04

Fixed

  • stats: use utf8-aware truncate https://github.com/jqnatividad/qsv/pull/819
  • sniff: fix URL sniffing https://github.com/jqnatividad/qsv/commit/8d2c514fa2a173be626b5c36dbfb70d60335b81e
  • show polars version in qsv --version https://github.com/jqnatividad/qsv/commit/586a1ed987fa2efbfbc233bd82f84a52fa4c3859

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.90.1...0.91.0

- Rust
Published by jqnatividad almost 3 years ago

https://github.com/dathere/qsv - 0.90.1

Changed

  • joinp: Refactor to use LazyFrames instead of DataFrames for performance and ability to do streaming and process files larger than RAM. https://github.com/jqnatividad/qsv/pull/814 and https://github.com/jqnatividad/qsv/pull/815
  • luau: expanded example using qsv_log helper https://github.com/jqnatividad/qsv/commit/5c198e4bcb243005dace25d8aecbc58bb211cadc
  • handled new clippy lints https://github.com/jqnatividad/qsv/commit/e81a391bd675a2f4fb07169c1d6848340104b9fe
  • adjust publishing workflows to build binaries with as many features enabled. On some platforms, the to and polars(for joinp) features cannot be built.
  • cargo update bump indirect dependencies, notably arrow and duckdb
  • pin Rust nightly to 2023-02-27

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.90.0...0.90.1

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.90.0

Added

  • joinp: new join command powered by Pola.rs. This is just the first of more commands that will leverage the Pola.rs engine. https://github.com/jqnatividad/qsv/pull/798
  • luau: added random acess mode; major refactor as we prepare to use luau as qsv's DSL; added qsv_log helper that can be called from Luau scripts to facilitate development of full-fledged data-wrangling scripts. https://github.com/jqnatividad/qsv/pull/805 and https://github.com/jqnatividad/qsv/pull/806
  • sniff: added URL & re-enabled stdin support; URL support features sampling only the required number of rows to sniff the metadata without downloading the entire file; expanded sniff metadata returned; added --progressbar option for URL sniffing https://github.com/jqnatividad/qsv/pull/812
  • sniff: added --timeout option for URL inputs; now runs async from all the binary variants https://github.com/jqnatividad/qsv/pull/813

Changed

  • diff: sort by line when no other sort option is given by @janriemer in https://github.com/jqnatividad/qsv/pull/808
  • luau: rename --prologue/--epilogue options to --begin/--end; add embedded BEGIN/END block handling https://github.com/jqnatividad/qsv/pull/801
  • Update to csvs_convert 0.8 by @kindly in https://github.com/jqnatividad/qsv/pull/800
  • use simdutf8 when possible https://github.com/jqnatividad/qsv/commit/ae466cbffbc924cc5c1cc09509dd963c56dfc259
  • Bump self_update from 0.35.0 to 0.36.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/797
  • Bump sysinfo from 0.28.0 to 0.28.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/809
  • Bump actix-web from 4.3.0 to 4.3.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/811
  • improved conditional compilation of different variants https://github.com/jqnatividad/qsv/commit/9e636946504a09a1edeea4b0533d42a0bb658b7f
  • temporarily skip CI tests that use httpbin.org as it was causing intermittent failures https://github.com/jqnatividad/qsv/commit/bee160228794c26326baf569e5e7239206ae4314
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-26

Removed

  • Python 3.6 support removed https://github.com/jqnatividad/qsv/commit/86b29d487261fda7670072bfd5977dd9508ac0aa

Fixed

  • sniff: does not work with stdin which fixes #803; https://github.com/jqnatividad/qsv/pull/807
    Note that stdin support was shortly re-enabled in https://github.com/jqnatividad/qsv/pull/812

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.89.0...0.90.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.89.0

Added

  • cat: added new rowskey subcommand. Unlike the existing rows subcommand, it allows far more flexible concatenation of CSV files by row, even if the files have different number of columns and column order. https://github.com/jqnatividad/qsv/pull/795
  • added jemalloc support. As the current default mimalloc allocator is not supported in some platforms. Also, for certain workloads, jemalloc may be faster. See Memory Allocator for more info https://github.com/jqnatividad/qsv/pull/796
  • added --no-memcheck and related QSV_NO_MEMORY_CHECK env var. This relaxes the conservative Out-of-Memory prevention heuristic of qsv. See Memory Management for more info https://github.com/jqnatividad/qsv/pull/792

Changed

  • --version now returns "non-streaming" mode max input file size and detailed memory info. See Version details for more info https://github.com/jqnatividad/qsv/pull/780
  • exclude: expanded usage text and added 'input parameters' help by @tmtmtmtm in https://github.com/jqnatividad/qsv/pull/783
  • stats: performance tweaks in https://github.com/jqnatividad/qsv/commit/96e8168e6064469ab4489ed19c36aa595d5d119d, https://github.com/jqnatividad/qsv/commit/634d42a646dfb3bed2d34842bb3fa484cf641c7e and https://github.com/jqnatividad/qsv/commit/7e148cf78753aa60ef60f8efd6f1c7fea246b703
  • Use simdutf8 to do SIMD accelerated utf8 validation, replacing problematic utf8 screening. Together with https://github.com/jqnatividad/qsv/pull/782, completes utf8 validation revamp. https://github.com/jqnatividad/qsv/pull/784
  • Bump sysinfo from 0.27.7 to 0.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/786
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-18

Removed

  • Removed patched versions of csv crate optimized for performance. With the release of csv 1.2, switched back to csv crate upstream. https://github.com/jqnatividad/qsv/pull/794
  • removed utf8 first 8k screening. It was increasing code complexity and not very reliable. https://github.com/jqnatividad/qsv/pull/782

Fixed

  • dedup: refactored to use iterators to avoid out of bounds errors. https://github.com/jqnatividad/qsv/commit/f5e547b68410407851f217c706ad303bdbc5a583
  • exclude: don't screen for utf8. This bugfix spurred the utf8 validation revamp, where I realized, I just needed to pull out utf8 screening https://github.com/jqnatividad/qsv/pull/781
  • py: col, not row https://github.com/jqnatividad/qsv/pull/793

New Contributors

  • @tmtmtmtm made their first contribution in https://github.com/jqnatividad/qsv/pull/783

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.88.2...0.89.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.88.2

Changed

  • also show --update and --updatenow errors on stderr in addition to log file https://github.com/jqnatividad/qsv/pull/770
  • sortcheck: when a file is not sorted, dupecount is invalid. Set dupecount to -1 to make it plainly evident when file is not sorted. https://github.com/jqnatividad/qsv/pull/771
  • excel: added --quiet option https://github.com/jqnatividad/qsv/commit/99d88499df573f9f46992346f394d9372ceeffcc
  • extdedup: minimize allocations in hot loop https://github.com/jqnatividad/qsv/commit/62096fa84505b6de2c108d1f07707008e1c2d170
  • improved memfilecheck OOM-prevention helper function. Better error messages; clamp free memory headroom percentage between 10 and 90 percent https://github.com/jqnatividad/qsv/commit/6701ebfae58e942117378996ec6679544f620cbf and https://github.com/jqnatividad/qsv/commit/5cd8a95e7b36819f75f0d3bb8172dcff601b649b
  • improved utf8 check error messages to give more detail, and not just say there is an encoding error https://github.com/jqnatividad/qsv/commit/c9b5b075d31b9639958193db919683475c3e3ba5
  • improved README, adding Regular Expression Syntax section; reordered sections
  • modified CI workflows to also check qsvlite
  • Bump once_cell from 1.17.0 to 1.17.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/775
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-15

Fixed

  • dedup unnecessarily doing utf8 check; improve input usage text https://github.com/jqnatividad/qsv/pull/773
  • dedup: fix unstable dedup results caused by using par_sort_unstable_by https://github.com/jqnatividad/qsv/pull/776
  • sort: fix unstable sort results caused by using par_sort_unstable_by https://github.com/jqnatividad/qsv/commit/9f01df41a77dece75e434ee24b3ea0178d58deaf
  • removed mispublished 0.88.1 release

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.88.0...0.88.2

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.88.0

Added

  • extdedup: new command to deduplicate arbitrarily large CSV/text files using a memory-buffered, on-disk hash table. Not only does it dedup very large files using constant memory, it does so while retaining the file's original sort order, unlike dedup which loads the entire file into memory to sort it first before deduping by comparing neighboring rows https://github.com/jqnatividad/qsv/pull/762
  • Added Out-of-Memory (OOM) handling for "non-streaming" commands (i.e. commands that load the entire file into memory) using a heuristic that if an input file's size is lower than the free memory available minus a default headroom of 20 percent, qsv processing stops gracefully with a detailed message about the potential OOM condition. This headroom can be adjusted using the QSV_FREEMEMORY_HEADROOM_PCT environment variable, which has a minimum value of 10 percent https://github.com/jqnatividad/qsv/pull/767
  • add -Q, --quiet option to all commands that return counts to stderr (dedup, extdedup, search, searchset and replace) in https://github.com/jqnatividad/qsv/pull/768

Changed

  • sort & sortcheck: separate test suites and link from usage text https://github.com/jqnatividad/qsv/pull/756
  • frequency: amortize allocations, preallocate with_capacity. Informal benchmarking shows an improvement of ~30%! 🚀 https://github.com/jqnatividad/qsv/pull/761
  • extsort: refactor. Aligned options with extdedup; now also support stdin/stdout; added --memory-limit option https://github.com/jqnatividad/qsv/pull/763
  • safenames: minor optimization https://github.com/jqnatividad/qsv/commit/a7df378e0a755300e541dec0fef0b12d39b215f2
  • excel: minor optimization https://github.com/jqnatividad/qsv/commit/75eac7875e276b45e668cbe91271ad86cec8db49
  • stats: add date inferencing false positive warning, with a recommendation how to prevent false positives https://github.com/jqnatividad/qsv/commit/a84a4e614b5c14dd2e0d523bec4c6d9dbeb7c3ba
  • sortcheck: added note to usage text that dupe_count is only valid if file is sorted https://github.com/jqnatividad/qsv/commit/ab69f144fa2ac375255bf9fbd6dd08bf538c1dfa
  • reorganized Installation section to differentiate installation options https://github.com/jqnatividad/qsv/commit/9ef8bfc0b90574b41629c7c7bd463289dc1dcb62
  • bump MSRV to 1.67.1
  • applied select clippy recommendations
  • Bump flexi_logger from 0.25.0 to 0.25.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/755
  • Bump pyo3 from 0.18.0 to 0.18.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/757
  • Bump serde_json from 1.0.92 to 1.0.93 by @dependabot in https://github.com/jqnatividad/qsv/pull/760
  • Bump filetime from 0.2.19 to 0.2.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/759
  • Bump self_update from 0.34.0 to 0.35.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/765
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-02-12

Fixed

  • sortcheck: correct wrong progress message showing invalid dupe_count (as dupe count is only valid if the file is sorted) https://github.com/jqnatividad/qsv/commit/8eaa8240249c5c7eb1ece068764a8caa7e804414
  • py & luau: correct usage text about stderr https://github.com/jqnatividad/qsv/commit/1b56e72988e2dee1502517f8e2dbf036416efb8d

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.87.1...0.88.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.87.1

Changed

  • safenames: refactor in https://github.com/jqnatividad/qsv/pull/754
    • better handling of headers that start with a digit, instead of replacing the digit with a _, prepend the unsafe prefix
    • quoted identifiers are also considered unsafe, unless conditional mode is used
    • verbose modes now also return a list of duplicate header names
  • update MSRV to 1.67.0
  • cargo update bump depedencies
  • disable optimization on test profile for faster CI compilation, which was taking much longer than test run time
  • optimize prebuilt nightlies to compile with target-cpu=native
  • pin Rust nightly to 2023-02-01

Fixed

  • safenames: fixed mode behavior inconsistencies https://github.com/jqnatividad/qsv/pull/754 all modes now use the same safenames algorithm. Before, the verbose modes used a simpler one leading to inconsistencies between modes (resolves safenames handling inconsistent between modes #753)

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.87.0...0.87.1

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.87.0

Added

  • apply: add decimal separator --replacement option to thousands operation. This fully rounds out thousands formatting, as it will allow formatting numbers to support "euro-style" formats (e.g. 1.234.567,89 instead of 1,234,567.89) https://github.com/jqnatividad/qsv/pull/749
  • apply: add round operation; also refactored thousands operation to use more appropriate --formatstr option instead of --comparand option to specify "format" of thousands separator policy https://github.com/jqnatividad/qsv/pull/751
  • applydp: add round operation https://github.com/jqnatividad/qsv/pull/752

Changed

  • changed MSRV policy to track latest Rust version in Homebrew, instead of latest Rust stable
  • removed excess trailing whitespace in apply & applydp usage text
  • moved round_num function from stats.rs to util.rs so it can be used in round operation in apply and applydp
  • cargo update bump dependencies, notably tokio from 1.24.2 to 1.25.0
  • pin Rust nightly to 2023-01-28

Fixed

  • apply: corrected thousands operation usage text - hexfour not hex_four https://github.com/jqnatividad/qsv/commit/6545aa2b3ce470b5f6c039c998e9f6fc21a6ad84

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.86.0...0.87.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.86.0

Added

  • apply: added thousands operation which adds thousands separators to numeric values. Specify the separator policy with --comparand (default: comma). The valid policies are: comma, dot, space, underscore, hexfour (place a space every four hex digits) and indiancomma (place a comma every two digits, except the last three digits). https://github.com/jqnatividad/qsv/pull/748
  • searchset: added --unmatched-output option. This was done to allow Datapusher+ to screen for PIIs more efficiently. Writing PII candidate records in one CSV file, and the "clean" records in another CSV in just one pass. https://github.com/jqnatividad/qsv/pull/742

Changed

  • fetch & fetchpost: expanded usage text info on HTTP2 Adaptive Flow Control support
  • fetchpost: added more detail about --compress option
  • stats: added more tests
  • updated prebuilt zip archive READMEs https://github.com/jqnatividad/qsv/commit/072973efd7947a93773b2783d098eeace17d963d
  • Bump redis from 0.22.2 to 0.22.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/741
  • Bump ahash from 0.8.2 to 0.8.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/743
  • Bump jql from 5.1.4 to 5.1.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/747
  • applied select clippy recommendations
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-01-27

Fixed

  • stats: fixed antimodes null display. Use the literal NULL instead of just "" when listing NULL as an antimode. https://github.com/jqnatividad/qsv/pull/745
  • tojsonl: fixed invalid escaping of JSON values https://github.com/jqnatividad/qsv/pull/746

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.85.0...0.86.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.85.0

Added

  • Update csvs_convert by @kindly in https://github.com/jqnatividad/qsv/pull/736
  • sniff: added --delimiter option https://github.com/jqnatividad/qsv/pull/732
  • fetchpost: add --compress option in https://github.com/jqnatividad/qsv/pull/737
  • searchset: several tweaks for PII screening requirement of Datapusher+. --flag option now shows regex labels instead of just row number; new --flag-matches-only option sends only matching rows to output when used with --flag; --json option returns rowswithmatches, total_matches and rowcount as json to stderr. https://github.com/jqnatividad/qsv/pull/738

Changed

  • luau: minor tweaks to increase code readability https://github.com/jqnatividad/qsv/commit/31d01c8b9eb1fe85262e9bf5fd237ae4493d562c
  • stats: now normalizes after rounding. Normalizing strips trailing zeroes and converts -0.0 to 0.0. https://github.com/jqnatividad/qsv/commit/f838272b4deb79d25ca5704cf3c89652c0b9a3bb
  • safenames: mention CKAN-specific options https://github.com/jqnatividad/qsv/commit/f371ac25ba0c27e48b7b9b14a37dc47913cf0095
  • fetch & fetchpost: document decompression priority https://github.com/jqnatividad/qsv/commit/43ce13c4bf7eb23dc5d051d522d6d52d3cc255aa
  • Bump actix-governor from 0.3.2 to 0.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/728
  • Bump sysinfo from 0.27.6 to 0.27.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/730
  • Bump serial_test from 0.10.0 to 1.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/729
  • Bump pyo3 from 0.17.3 to 0.18.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/731
  • Bump reqwest from 0.11.13 to 0.11.14 by @dependabot in https://github.com/jqnatividad/qsv/pull/734
  • cargo update bump for other dependencies
  • pin Rust nightly to 2023-01-21

Fixed

  • sniff: now checks that --sample size is greater than zero https://github.com/jqnatividad/qsv/commit/cd4c390ce4322d7076866be27025d67800bc60e2

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.84.0...0.85.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.84.0

Added

  • headers: added --trim option to trim quote and spaces from headers https://github.com/jqnatividad/qsv/pull/726

Changed

  • input: --trim-headers option also removes excess quotes https://github.com/jqnatividad/qsv/pull/727
  • safenames: trim quotes and spaces from headers https://github.com/jqnatividad/qsv/commit/0260833bc8b36ea6e6ccb9e79687c76470a8a6b0
  • cargo update bump dependencies
  • pin Rust nightly to 2022-01-13

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.83.0...0.84.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.83.0

Added

  • stats: add sparsity to "streaming" statistics https://github.com/jqnatividad/qsv/pull/719
  • schema: also infer enum constraints for integer fields. Not only good for validation, this is also required by tojsonl for smarter boolean inferencing https://github.com/jqnatividad/qsv/pull/721

Changed

  • stats: change --typesonly so it will not automatically --infer-dates. Let the user decide. https://github.com/jqnatividad/qsv/pull/718
  • stats: if median is already known, use it to calculate Median Absolute Deviation https://github.com/jqnatividad/qsv/commit/08ed08da4651a96bf05372b34b670063fbcec14f
  • tojsonl: smarter boolean inferencing. It will infer a column as boolean if it only has a domain of two values, and the first character of the values are one of the following case-insensitive "truthy/falsy" combinations: t/f; t/null; 1/0; 1/null; y/n & y/null are treated as true/false. https://github.com/jqnatividad/qsv/pull/722 and https://github.com/jqnatividad/qsv/pull/723
  • safenames: process --reserved option before --prefix option. https://github.com/jqnatividad/qsv/commit/b333549199726a3e92b95fb1d501fbdbbeede34a
  • strum and strum-macros are no longer optional dependencies as we use it with all the binary variants now https://github.com/jqnatividad/qsv/commit/bea6e00fc400e8fafa2938832f8654d97c45fe34
  • Bump qsv-stats from 0.6.0 to 0.7.0
  • Bump sysinfo from 0.27.3 to 0.27.6
  • Bump hashbrown from 0.13.1 to 0.13.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/720
  • Bump actions/setup-python from 4.4.0 to 4.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/724
  • change MSRV from 1.66.0 to 1.66.1
  • cargo update bump indirect dependencies
  • pin Rust nightly to 2023-01-12

Fixed

  • safenames: fixed --prefix option. When checking for invalid underscore prefix, it was checking for hyphen, not underscore, causing a problem with Datapusher+ https://github.com/jqnatividad/qsv/commit/4fbbfd3a479b6678fa9d4c823fd00b592b326c7a

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.82.0...0.83.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.82.0

Added

  • diff: Find the difference between two CSVs ludicrously fast! by @janriemer in https://github.com/jqnatividad/qsv/pull/711
  • stats: added Median Absolute Deviation (MAD) https://github.com/jqnatividad/qsv/pull/715
  • added Testing section to README https://github.com/jqnatividad/qsv/commit/517d69b496aaa9535a2b23b05e44a5999d8ef994

Changed

  • validate: schema-less validation error improvements https://github.com/jqnatividad/qsv/pull/703
  • stats: faster date inferencing https://github.com/jqnatividad/qsv/pull/706
  • stats: minor performance tweaks https://github.com/jqnatividad/qsv/commit/15e6284c20cccf4a6b74498336d31b0d7ba03285 https://github.com/jqnatividad/qsv/commit/3f0ed2b314765a546e28b534d5e82bff892592c3
  • stats: refactored modes compilation, with antimodes no longer unnecessarily compiling more than 10 antimodes it won't show anyway. https://github.com/jqnatividad/qsv/commit/6e448b041a2c78b3ce1cc89aadaff4a8d1081472
  • stats: simplify if condition https://github.com/jqnatividad/qsv/commit/ae7cc85afe1dc4c3f87cbefe3b14dc93b28d94e9
  • luau: show luau version when invoking --version https://github.com/jqnatividad/qsv/commit/f7f9c4297fb3dea685b5d0f631932b6b2ca4a99a
  • excel: add "sheet" suffix to end msg for readability https://github.com/jqnatividad/qsv/commit/ae3a8e31784a24c8492de76c5074e477cc474063
  • cache util::count_rows result, so if a CSV without an index is queried, it caches the result and future calls to count_rows in the same session will be instantaneous https://github.com/jqnatividad/qsv/commit/e805dedf5674cfbc56d9948791419ac6fd51f2fd
  • Bump console from 0.15.3 to 0.15.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/704
  • Bump cached from 0.41.0 to 0.42.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/709
  • Bump mlua from 0.8.6 to 0.8.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/712
  • Bump qsv-stats from 0.5.2 to 0.6.0 with the new MAD statistic support and faster, more memory-efficient antimodes compilation
  • cargo update bump dependencies - notably mimalloc from 0.1.32 to 0.1.34, luau0-src from 0.4.1luau553 to 0.5.0luau555, csvs_convert from 0.7.9 to 0.7.11 and regex from 1.7.0 to 1.7.1
  • pin Rust nightly to 2023-01-08

Fixed

  • tojsonl: fix escaping of unicode string. Replace hand-rolled escape fn with built-in escape_default fn https://github.com/jqnatividad/qsv/pull/707. Fixes https://github.com/jqnatividad/qsv/issues/705
  • tojsonl: more robust boolean inferencing https://github.com/jqnatividad/qsv/pull/710. Fixes https://github.com/jqnatividad/qsv/issues/708

New Contributors

  • @janriemer made their first contribution in https://github.com/jqnatividad/qsv/pull/711

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.81.0...0.82.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.81.0

[0.81.0] - 2023-01-02

Added

  • stats: added range statistic https://github.com/jqnatividad/qsv/pull/691
  • stats: added additional mode stats. For mode, added modecount and modeoccurrences. Added "antimode" (opposite of mode - least frequently non-zero occurring value), antimodecount and antimodeoccurrences. https://github.com/jqnatividad/qsv/pull/694
  • qsv-dateparser now recognizes unix timestamp values with fractional seconds to nanosecond precision as dates. stats, sniff, apply datefmt and schema, which all use qsv-dateparser, now infer unix timestamps as dates - https://github.com/jqnatividad/qsv/commit/a29ff8ea255d5aed9992556a0a23ab76117c8340 https://github.com/jqnatividad/qsv/pull/702 > USAGE NOTE: As timestamps can be float or integer, and data type inferencing will guess dates last, preprocess timestamp columns with apply datefmt first to more date-like, non-timestamp formats, so they are recognized as dates by other qsv commands.

Changed

  • apply: document numtocurrency --comparand & --replacement behavior https://github.com/jqnatividad/qsv/commit/cc88fe921d8cdf7eedcb0008e16ebb5c46744f33
  • index: explicitly flush buffer after creating index https://github.com/jqnatividad/qsv/commit/ee5d790af1cde73dfc57b028bf52fa88e83cdaa4
  • sample: no longer requires an index to do percentage sampling https://github.com/jqnatividad/qsv/commit/45d4657713ebe2ae8388ce55f4cb1a733e727024
  • slice: removed unneeded utf8 check https://github.com/jqnatividad/qsv/commit/5a199f4442bd025cec31309bee44ac71bacbdfaa
  • schema: expand usage text regarding --strict-dates https://github.com/jqnatividad/qsv/commit/3d22829f3cf0441961e854555cd0c333bcb3ffb1
  • stats: date stats refactor. Date stats are returned in rfc3339 format. Dates are converted to timestamps with millisecond precision while calculating date stats. https://github.com/jqnatividad/qsv/pull/690 https://github.com/jqnatividad/qsv/commit/e7c297795ff5e82cf1dc242090be11ecced6da9a
  • filter out variance/stddev in tests as float precision issues are causing flaky CI tests https://github.com/jqnatividad/qsv/pull/696
  • Bump qsv-dateparser from 0.4.4 to 0.6.0
  • Bump qsv-stats from 0.4.6 to 0.5.2
  • Bump qsv-sniffer from 0.5.0 to 0.6.0
  • Bump serde from 1.0.151 to 1.0.152 by @dependabot in https://github.com/jqnatividad/qsv/pull/692
  • Bump csvs_convert from 0.7.7 to 0.7.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/693
  • Bump once_cell from 0.16.0 to 0.17.0 https://github.com/jqnatividad/qsv/commit/d3ac2556c74e2ddd66dcee00e5e836d284b662a7
  • Bump self-update from 0.32.0 to 0.34.0 https://github.com/jqnatividad/qsv/commit/5f95933f01e2e0c592b52d7424b6a832aafd3591
  • Bump cpc from 1.8 to 1.9; set csvs_convert dependency to minor version https://github.com/jqnatividad/qsv/commit/ee9164810559f5496dfafba0e789b9cd84000a17
  • applied select clippy recommendations
  • deeplink to Cookbook from Table of Contents
  • pin Rust nightly to 2023-01-01
  • implementation comments on stats, sample, sort & Python distribution

Fixed

  • stats: prevent premature rounding, and make sum statistic use the same rounding method https://github.com/jqnatividad/qsv/commit/879214a1f3032f140f0207fe8807e1bb641110d7 https://github.com/jqnatividad/qsv/commit/1a1362031de8973b623598748bea4bc5fc6e08d3
  • fix autoindex so we return the index path properly https://github.com/jqnatividad/qsv/commit/d3ce6a3918683d66bf0f3246c7d6e8518eead392
  • fetch & fetchpost: corrected typo https://github.com/jqnatividad/qsv/commit/684036bbc237d5b80ea060f9ee8b8d46c1a2ad88

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.80.0...0.81.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.80.0

Added

  • new to command. Converts CSVs "to" PostgreSQL, SQLite, XLSX, Parquet and Data Package by @kindly in https://github.com/jqnatividad/qsv/pull/656
  • apply: add numtocurrency operation https://github.com/jqnatividad/qsv/pull/670
  • sort: add --ignore-case option https://github.com/jqnatividad/qsv/pull/673
  • stats: now computes summary statistics for dates as well https://github.com/jqnatividad/qsv/pull/684
  • added --updatenow option, resolves https://github.com/jqnatividad/qsv/issues/661 https://github.com/jqnatividad/qsv/pull/662
  • replace footnotes in Available Commands list with emojis :smile:

Changed

  • apply & applydp: expose --batch size option https://github.com/jqnatividad/qsv/pull/679
  • validate: add last valid row to validation error https://github.com/jqnatividad/qsv/commit/7680011a2fcc459aa621414122ecaa869e98ae83
  • input: add last valid row to error message https://github.com/jqnatividad/qsv/commit/492e51f85ab5a0637c201d7020d7ac2fdb72be96
  • upgrade to csvs-convert 0.7.5 by @kindly in https://github.com/jqnatividad/qsv/pull/668
  • Bump serial_test from 0.9.0 to 0.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/671
  • Bump csvs_convert from 0.7.5 to 0.7.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/674
  • Bump num_cpus from 1.14.0 to 1.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/678
  • Bump robinraju/release-downloader from 1.6 to 1.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/677
  • Bump actions/stale from 6 to 7 by @dependabot in https://github.com/jqnatividad/qsv/pull/676
  • Bump actions/setup-python from 4.3.1 to 4.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/683
  • added concurrency check to CI tests so that redundant CI test are canceled when new ones are launched
  • instead of saying "descriptive statistics", use more understandable "summary statistics"
  • changed publishing workflows to enable to feature for applicable target platforms
  • cargo update bump dependencies, notably qsv-stats from 0.4.5 to 0.4.6 and qsv_currency from 0.5.0 to 0.6.0
  • pin Rust nightly to 2022-12-22

Fixed

  • stats: fix leading zero handling https://github.com/jqnatividad/qsv/pull/667
  • apply: fix currencytonum bug https://github.com/jqnatividad/qsv/pull/669

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.79.0...0.80.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.79.0

Added

  • safenames: add --reserved option, allowing user to specify additional "unsafe" names https://github.com/jqnatividad/qsv/pull/657
  • safenames: add --prefix option https://github.com/jqnatividad/qsv/pull/658
  • fetch & fetchpost: added simple retry backoff multiplier - https://github.com/jqnatividad/qsv/commit/e343398ddd9c804237e73bbc652cc9e51c657b78

Changed

  • excel: refactored --metadata processing; added more debug messages; minor perf tweaks https://github.com/jqnatividad/qsv/commit/f137bab42f81518acd3ef825cd223b9970d70b02
  • set MSRV to Rust 1.6.6
  • cargo update bump several dependencies, notably qsv-dateparser
  • pin Rust nightly to 2022-12-15

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.78.2...0.79.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.78.2

Changed

  • cargo update bump paste 1.0.9 to 1.0.10
  • pin Rust nightly to 2022-12-12

Removed

  • excel: remove --safenames option. If you need safenames, use the safenames command https://github.com/jqnatividad/qsv/commit/e5da73bcc64ef3a8c66c611fd6247fa331117544

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.78.1...0.78.2

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.78.1

Changed

  • qsvdp: apply now available in qsvdp asapplydp - removing the geocode and calconv subcommands, and removing all operations that require third-party crates EXCEPT dynfmt and datefmt which is needed for Datapusher+ https://github.com/jqnatividad/qsv/pull/652
  • excel: fine-tune --metadata processing https://github.com/jqnatividad/qsv/commit/09530d4f65b06060d24b7ed3948aeab25b2aa7c8
  • bump serde from 1.0.149 to 1.0.150
  • qsvdp in now included in CI tests

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.78.0...0.78.1

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.78.0

Added

  • stats: added leading zero handling when inferring types (e.g. zipcodes like "07094" are strings not integers) https://github.com/jqnatividad/qsv/pull/648
  • stats: added --typesonly option, which infers only data types with date inferencing enabled for all columns https://github.com/jqnatividad/qsv/pull/650
  • stats: added underflow handing to sum statistic https://github.com/jqnatividad/qsv/commit/1b5e5451f929ad1c7dc5fb7f17b2a3261809ab05
  • excel: expanded --metadata functionality, with the option to return workbook metadata as JSON as well https://github.com/jqnatividad/qsv/pull/651
  • added platform-specific README for prebuilt zip archives https://github.com/jqnatividad/qsv/commit/15e247e523dbc22a50ebff1b15d7d0c4eb668bd5

Changed

  • safenames: improved usage text
  • stats: minor performance tweaks https://github.com/jqnatividad/qsv/commit/88be38b542fc61470a7b0331e7be3a3cad62a7bb and https://github.com/jqnatividad/qsv/commit/8aa58c5ad733116d246e171bcea622c1378b8e48
  • join: minor performance tweaks https://github.com/jqnatividad/qsv/commit/92d41910077148f769ccf2c8a283be2c30d68bbf
  • exclude: minor performance tweaks https://github.com/jqnatividad/qsv/commit/f3cc0ac29c5f3e6cec5a08d3aac3371d32b5eb0f
  • sniff: minor performance tweak https://github.com/jqnatividad/qsv/commit/d2a4676fcb5189fc9232538e68854cfcf4ef808b
  • sortcheck: minor performance tweak https://github.com/jqnatividad/qsv/commit/83c22ae5a623a8b0740f7024aac9448ee809eabd
  • switch GitHub Actions to use ubuntu-20.04 so as not to link to too new glibc libraries, preventing older distros from running the linux-gnu prebuilts.
  • switch GitHub Actions to use macos-12 to minimize flaky CI tests
  • expanded qsvdp description in README
  • Bump actions/setup-python from 4.3.0 to 4.3.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/645
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2022-12-10

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.77.0...0.78.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.77.0

Added

  • safenames: added Verbose JSON options https://github.com/jqnatividad/qsv/pull/644

Changed

  • py & luau: improved usage text
  • opt-in self-update in https://github.com/jqnatividad/qsv/pull/640 and https://github.com/jqnatividad/qsv/pull/641
  • Create README in prebuilt zip archive with platform specific notes (Logic created but not implemented until next release) https://github.com/jqnatividad/qsv/pull/642
  • Simplify python map_datetime test so it works on older Python versions https://github.com/jqnatividad/qsv/commit/e85e4e7bf9bf379f8478b066a9f6dea21afbf0e8
  • include date.lua in qsv package so cargo install works https://github.com/jqnatividad/qsv/commit/11a0ff8edc5405afd9cc6637de026bf2138a7df0
  • Bump data-encoding from 2.3.2 to 2.3.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/638
  • cargo update bump several dependencies
  • pin Rust nightly to 2022-12-07

Fixed:

  • safenames: fixed calculation of unsafe headers as it was dupe-counting some unsafe headers - https://github.com/jqnatividad/qsv/pull/644

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.3...0.77.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.76.3

Changed

  • cargo update bump serde from 1.0.148 to 1.0.149
  • simplify python datetime test so it runs on Python 3.6 and above

Fixed

  • reverted not_luau_compatible introduced in 0.76.2 and 0.76.3. Adjusted Github Action publish workflow instead to properly build luau in qsvdp when the platform supports it.

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.2...0.76.3

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.76.2

Fixed

  • tweak not_luau_compatible feature so we can more easily disable luau feature when cross-compiling for some platforms where we cannot properly build luau.

NOTE: Not published on crates.io due to problems creating prebuilt binaries

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.1...0.76.2

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.76.1

Fixed

  • added not_luau_compatible feature so we can more easily disable luau feature when cross-compiling for some platforms where we cannot properly build luau.

NOTE: Not published on crates.io due to problems creating prebuilt binaries

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.0...0.76.1

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.76.0

Added

  • qsvdp: add luau in anticipation of Datapusher+ optional preprocessing https://github.com/jqnatividad/qsv/pull/634
  • luau: added ability to load libraries using "require"; preload LuaDate library https://github.com/jqnatividad/qsv/pull/633
  • luau: added more extensive debug logging support, adding _idx to debug log messages; trace log level support showing global vars and record values when an error occurs https://github.com/jqnatividad/qsv/pull/636 and https://github.com/jqnatividad/qsv/pull/637

Changed

  • py and luau: when errors encountered, return non-zero exit code, along with error count to stderr https://github.com/jqnatividad/qsv/pull/631
  • safenames and excel: Unsafe empty column/header names are replaced with "_blank" instead of "_" https://github.com/jqnatividad/qsv/pull/632
  • frequency: replace foreach iterator with regular for; remove unneeded assert https://github.com/jqnatividad/qsv/commit/74eb321defbf294675872a7dd891e8a7aedd31f1
  • bumped qsv-stats from 0.4.1 to 0.4.5 - fixing sum rounding and variance precision errors.
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2022-12-03

Fixed

  • stats: fix sum rounding and variance precision errors https://github.com/jqnatividad/qsv/pull/635

NOTE: Not published on crates.io due to problems creating prebuilt binaries

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.75.0...0.76.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.75.0

Added:

  • py: added python datetime module by default in https://github.com/jqnatividad/qsv/pull/629
  • qsvdp (Datapusher+ optimized binary variant): added self-update. However, unlike qsv and qsvlite binary variants, qsvdp will not automatically prompt for a self-update, and will only inform the user if there is a new release. The user will need to invoke the --update option explicitly. https://github.com/jqnatividad/qsv/pull/622

Changed:

  • stats: Speedup type checking by @kindly in https://github.com/jqnatividad/qsv/pull/625
  • validate: Added a useful note about validate output by @aborruso in https://github.com/jqnatividad/qsv/pull/624
  • luau: Now precompiles all scripts, including the --prologue & --epilogue scripts, into bytecode https://github.com/jqnatividad/qsv/commit/e97c2caf81316bcf655875a9bee4c78dac5a8b70
  • frequency: remove unsafe fromutf8unchecked https://github.com/jqnatividad/qsv/commit/16642e8ee3364309c1a774142976f6207ba5c594
  • More robust autoindexing in https://github.com/jqnatividad/qsv/pull/623
  • minor clippy performance tweaks to rust-csv fork
  • Bump serde from 1.0.147 to 1.0.148 by @dependabot in https://github.com/jqnatividad/qsv/pull/620
  • cargo update bump several indirect dependencies
  • improved README; use :sparkle: to indicate commands behind a feature flag
  • pin Rust nightly to 2022-11-30

New Contributors

  • @aborruso made their first contribution in https://github.com/jqnatividad/qsv/pull/624
  • @kindly made their first contribution in https://github.com/jqnatividad/qsv/pull/625

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.74.0...0.75.0

- Rust
Published by jqnatividad about 3 years ago

https://github.com/dathere/qsv - 0.74.0

Added:

  • safenames: added --verify and --verbose modes in https://github.com/jqnatividad/qsv/pull/610 and https://github.com/jqnatividad/qsv/pull/615

Changed:

  • excel: align --safenames option to safenames command in https://github.com/jqnatividad/qsv/pull/611 and https://github.com/jqnatividad/qsv/pull/616
  • luau: Now precompiles main script to bytecode; now allow loading luau script from file for main, prologue and epilogue scripts in https://github.com/jqnatividad/qsv/pull/619
  • sniff: increase default sample size from 100 to 1000 in https://github.com/jqnatividad/qsv/commit/40d52cf0c67e39d645a1c76a26ae234999317b0b
  • validate: applied various optimizations in https://github.com/jqnatividad/qsv/commit/bfed127f28c4ccf6e9a18a5998588396594831d2 and https://github.com/jqnatividad/qsv/commit/06c109a0335326f57d903211334b4f2fb1ab7ccc
  • updated Github Actions workflows to reflect removal of luajit feature
  • Bump sysinfo from 0.26.7 to 0.26.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/614
  • Bump rust_decimal from 1.26.1 to 1.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/617
  • cargo bump update several indirect dependencies
  • applied various clippy recommendations
  • pin Rust nightly to 2022-11-25

Removed:

  • luajit: removed as its been deprecated by optimized luau command which now support precompiling to bytecode, largely obviating the main feature of LuaJIT - Just-in-Time compilation in https://github.com/jqnatividad/qsv/pull/619

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.73.2...0.74.0

- Rust
Published by jqnatividad over 3 years ago

https://github.com/dathere/qsv - 0.73.2

Changed:

  • Link to tests as examples from usage text in https://github.com/jqnatividad/qsv/pull/608
  • Bump serde_json from 1.0.88 to 1.0.89 by @dependabot in https://github.com/jqnatividad/qsv/pull/607
  • cargo update bump to get latest crossbeam crates to replace yanked crates https://github.com/jqnatividad/qsv/commit/5108a87b0f5e2d5a7cfef3f60f4cd6b3659bce7d

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.73.1...0.73.2

- Rust
Published by jqnatividad over 3 years ago

https://github.com/dathere/qsv - 0.73.1

Changed:

  • rename safename command to safenames for consistency
  • cargo update bump indirect dependencies

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.73.0...0.73.1

- Rust
Published by jqnatividad over 3 years ago

https://github.com/dathere/qsv - 0.73.0

Added

  • safenames: new command to modify header names to db-safe names in https://github.com/jqnatividad/qsv/pull/606
  • apply: added censor-count operation in https://github.com/jqnatividad/qsv/pull/599
  • apply: added escape operation in https://github.com/jqnatividad/qsv/pull/600
  • excel: added --safe-names option in https://github.com/jqnatividad/qsv/pull/598

Changed

  • apply: refactored to use enums instead of strings for operations in https://github.com/jqnatividad/qsv/pull/601
  • fetch & fetchpost: --http-header -H shortcut in https://github.com/jqnatividad/qsv/pull/596
  • excel: smarter date parsing for XLSX files; rename --safe-column-names to --safe-names in https://github.com/jqnatividad/qsv/pull/603
  • Smarter safe names in https://github.com/jqnatividad/qsv/pull/605
  • Bump uuid from 1.2.1 to 1.2.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/594
  • Bump mimalloc from 0.1.31 to 0.1.32 by @dependabot in https://github.com/jqnatividad/qsv/pull/595
  • Bump censor from 0.2.0 to 0.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/597
  • Bump Swatinem/rust-cache from 1 to 2 by @dependabot in https://github.com/jqnatividad/qsv/pull/602
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2022-11-19

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.72.0...0.73.0

- Rust
Published by jqnatividad over 3 years ago