Recent Releases of https://github.com/dathere/qsv
https://github.com/dathere/qsv - 7.1.0
🇮🇹 csv,conf,v9 edition 🍝
Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN!
For this feature release, we polished describegpt a bit more for the occasion...
Towards the "People's API"! Verso l'API del Popolo!
(Answering People/Policymaker Interface)
🚀 Enhanced describegpt Command
- Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
- Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
- Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
- Conditional SQL Results: Implement a conditional --sql-results format for more efficient "SQL RAG" processing: if the generated SQL query executes successfully, the results are saved to the specified file with a .csv extension. If the query fails (a "SQL hallucination"), the file is saved with a .sql extension instead, so the user can tweak and edit the query.
- TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
- Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
- Disk Cache by Default: The disk cache is now enabled by default for better performance
- TOML Configuration: Migrate from JSON to the more readable TOML format for easier-to-modify prompt files (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
- Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)
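The conditional --sql-results behavior can be pictured as a small decision function. This is a hypothetical Rust sketch, not qsv's code: `save_sql_results` is an invented name, the `run` closure stands in for executing the query with DuckDB or Polars, and the function returns the path and contents instead of writing to disk so the logic is easy to check.

```rust
use std::path::{Path, PathBuf};

/// Pick the output path: `.csv` when the query ran, `.sql` when it did not.
fn sql_results_path(requested: &Path, query_succeeded: bool) -> PathBuf {
    requested.with_extension(if query_succeeded { "csv" } else { "sql" })
}

/// Hypothetical flow: try to execute the LLM-generated SQL with `run`
/// (a stand-in for DuckDB or Polars). On success, return the CSV results and
/// the `.csv` path; on failure, return the SQL text itself and the `.sql` path
/// so the user can fix the query by hand.
fn save_sql_results<F, E>(requested: &Path, sql: &str, run: F) -> (PathBuf, String)
where
    F: FnOnce(&str) -> Result<String, E>,
{
    match run(sql) {
        Ok(csv) => (sql_results_path(requested, true), csv),
        Err(_) => (sql_results_path(requested, false), sql.to_string()),
    }
}
```

Keeping the failed query around (rather than discarding it) is the point of the design: a nearly-correct "SQL hallucination" is often a one-line fix away from a working query.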
partition Command Enhancements
- New --limit Option: Implement --limit option to set the maximum number of open files
- Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets
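The --limit option caps how many output files partition keeps open at once. Below is a hypothetical, in-memory Rust illustration of that idea, not qsv's two-pass implementation: rows are bucketed by partition key in a single scan, and the buckets are then handed out in batches of at most `limit` keys, so a writer would only ever need `limit` files open. `partition_in_batches` is an invented name, and a real version for very large datasets would stream to disk rather than hold every row in memory.

```rust
use std::collections::BTreeMap;

type Row = Vec<String>;

/// Hypothetical sketch: group rows by the value in `key_col`, then return the
/// groups in batches of at most `limit` keys. A writer would open, fill and
/// close the files for one batch before starting the next, so no more than
/// `limit` output files are open at any moment.
fn partition_in_batches(rows: &[Row], key_col: usize, limit: usize) -> Vec<Vec<(String, Vec<Row>)>> {
    assert!(limit > 0, "limit must be at least 1");

    // One scan over the input: bucket rows by partition key.
    // BTreeMap keeps keys sorted, which makes the batching deterministic.
    let mut groups: BTreeMap<String, Vec<Row>> = BTreeMap::new();
    for row in rows {
        groups.entry(row[key_col].clone()).or_default().push(row.clone());
    }

    // Chunk the sorted groups so each batch touches at most `limit` files.
    let groups: Vec<(String, Vec<Row>)> = groups.into_iter().collect();
    groups.chunks(limit).map(|batch| batch.to_vec()).collect()
}
```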
Added
- describegpt: add configurable frequency limit https://github.com/dathere/qsv/pull/2950
- describegpt: migrate prompt file from JSON to the easier-to-edit TOML format https://github.com/dathere/qsv/pull/2954
- describegpt: refactor default prompt file; add --fewshot-examples option https://github.com/dathere/qsv/pull/2955
- describegpt: add TogetherAI support for models endpoint https://github.com/dathere/qsv/pull/2965
- partition: add --limit option https://github.com/dathere/qsv/pull/2960
- added Windows ARM64 prebuilt binaries
Changed
- describegpt: enable disk cache by default https://github.com/dathere/qsv/pull/2951
- describegpt: Polars SQL generation tweaks https://github.com/dathere/qsv/pull/2958
- python: replace deprecated with_gil with attach https://github.com/dathere/qsv/pull/2949. This sets the stage for "free-threaded" Python 3.14 support when it's released in October 2025. Buh-bye GIL!
- deps: bump embedded Luau from 0.688 to 0.690 https://github.com/dathere/qsv/pull/2967
- deps: bump Polars to 0.50.0 at py-1.33.0 tag
- build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2962
- build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2963
- build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2961
- build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2948
- build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2946
- build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2956
- build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2952
- applied select clippy lints
- updated indirect dependencies
Full Changelog: https://github.com/dathere/qsv/compare/7.0.1...7.1.0
- Rust
Published by jqnatividad 6 months ago
https://github.com/dathere/qsv - 7.0.1
[7.0.1] - 2025-08-28
A patch release with some minor bug fixes, benchmark tweaks and build system improvements.
Added
- publish: add dedicated powerpc64le-unknown-linux-gnu publishing workflow (WIP)
Changed
- docs: describegpt expanded error message about LLM URL or API key
- deps: remove planus pinned dependency
Fixed
- fix: geocode --batch 0 caused a panic when the polars feature was enabled
- publish: remove luau feature from x86_64-pc-windows builds that was causing builds to fail
- publish: remove powerpc64le from main publish workflow
- benchmarks: updated to v6.8.0 with fixes to luau and clustered sample benchmarks
Full Changelog: https://github.com/dathere/qsv/compare/7.0.0...7.0.1
- Rust
Published by jqnatividad 6 months ago
https://github.com/dathere/qsv - 7.0.0
[7.0.0] - 2025-08-28
🥳 Open Weights with Open Data, Local LLM 🤖 edition 🚀
This is the biggest release yet - 470+ commits since v6.0.1! Packed with new AI-powered features, fixes and significant performance improvements suite-wide!
With the release of OpenAI's gpt-oss open-weight reasoning model earlier this month setting the stage, we continue on our "Automagical Metadata" journey by revamping describegpt.
🤖 Revamped describegpt - AI-Powered Metadata Inferencing and Data Analysis:
- Intelligent Metadata Generation: Automatically generate comprehensive metadata - Data Dictionaries, Description and Tags for your Datasets using Large Language Models (LLM) prompted with summary statistics and frequency tables as detailed context - without sending your data to the cloud!
- Chat with your Data: If your prompt can be answered using this high-quality, high-resolution Metadata, describegpt can answer it! If your prompt is not remotely related to the data, it will politely refuse - "I'm sorry, I can only answer questions about the Dataset."
- Auto SQL RAG Mode: Should the LLM decide that it doesn't have the necessary information in the metadata it compiled to answer your prompt, it will automatically enter SQL Retrieval-Augmented Generation (RAG) mode - using the rich metadata instead as context to craft an expert-level, deterministic, reproducible, "hallucination-free" SQL query[^1] to respond to your prompt.
- Database Engine Support: If DuckDB is installed or the Polars feature is enabled, and the --sql-results <ANSWER.CSV> option is specified, an optimized SQL query will be automatically executed and its results saved to the specified file. As both are purpose-built OLAP engines that support direct queries (no database pre-loading required), you get answers in a few seconds[^2] - even for very large datasets.
- Multi-LLM Support: Works with any OpenAI-API compatible LLM - with special support for local LLMs like Ollama, Jan and LM Studio and the ability to customize model behavior with the --addl-props option.
- Advanced Caching: Disk and Redis caching support for performance and cost optimization.
- Flexible Prompting: Custom prompt files and built-in intelligent templates for various analysis tasks.
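Caching only works if every input that can change the LLM's answer is part of the cache key; otherwise, changing a model parameter via --addl-props would return a stale response (which is why --addl-props was later added to the key). Here is a hypothetical Rust sketch of that principle; the field names are assumptions, not qsv's actual key layout. One design caveat: std's DefaultHasher is not guaranteed to be stable across Rust releases, so a persistent disk cache would be better served by a stable digest such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cache-key inputs; qsv's real key composition may differ.
/// Every field that can change the LLM response must be included.
#[derive(Hash)]
struct CacheKeyInputs<'a> {
    model: &'a str,
    prompt: &'a str,
    stats_options: &'a str,
    addl_props: Option<&'a str>, // e.g. extra model parameters as JSON
}

/// Hash all inputs into a fixed-width hex key.
fn cache_key(inputs: &CacheKeyInputs<'_>) -> String {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}
```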
Check out these examples using a 1 million row sample of NYC's 311 data!
- --all option produces a Data Dictionary, Description and Tags - Markdown, JSON
- --prompt "What are the top 10 complaint types per community board and borough?" - SQL result
On top of other improvements in Datapusher+ with its new Jinja-based "metadata suggestion engine" - we're using this AI-inferred metadata along with other precalcs to prepopulate DCATv3 (both US and European profiles) and Croissant metadata fields that are otherwise too hard and expensive to compile manually.
The inferred and precalculated metadata values are offered as "suggestions", using a UI/UX purpose-built to facilitate interactive metadata curation chats.
This allows Data Stewards to compile high-quality, high-resolution metadata catalogs with an accelerated "Data Steward in the Loop" data ingestion and metadata curation workflow.
If you want to see and learn more, we're Bologna-bound to attend csv,conf,v9 to present and share how we're using this to auto-infer metadata in CKAN.
Hope to see you there!
📊 Enhanced frequency Command:
- Rank Column: Ranking of frequency results for better data insights
- JSON Output Mode: New --json option not only provides structured output beyond the default CSV format - it also takes advantage of JSON's nested support to include 15 additional summary statistics per field
- Performance Boost: Speed improvements with SIMD-accelerated number parsing, remaining performant even with the added functionality
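The notes don't spell out how frequency ranks ties, so this Rust sketch assumes standard competition ranking ("1224"): values with equal counts share a rank and the next rank skips ahead. `rank_frequencies` is a hypothetical helper; dense ranking ("1223") would be an equally plausible choice.

```rust
/// Sort (value, count) pairs by descending count and attach a rank.
/// Tie rule (an assumption): equal counts share a rank, and the next
/// distinct count gets rank = its 1-based position in the sorted list.
fn rank_frequencies(mut freqs: Vec<(String, u64)>) -> Vec<(u64, String, u64)> {
    // Descending by count; ties broken alphabetically for stable output.
    freqs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));

    let mut ranked: Vec<(u64, String, u64)> = Vec::with_capacity(freqs.len());
    for (i, (value, count)) in freqs.into_iter().enumerate() {
        // Reuse the previous rank on a tie; otherwise use position + 1.
        let rank = match ranked.last() {
            Some(&(prev_rank, _, prev_count)) if prev_count == count => prev_rank,
            _ => i as u64 + 1,
        };
        ranked.push((rank, value, count));
    }
    ranked
}
```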
⚡ stats Command Improvements:
- Faster Still: Enabled by improvements in the underlying qsv-stats crate
- Improved Precision: Faster, streamlined precision calculation
- SIMD Number Parsing: Hardware-accelerated parsing for int/float values
- Unix Epoch Support: Proper handling of Unix timestamp 0 as valid date
- Enhanced Date Inference: Better date and boolean type inference capabilities
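Unix timestamp 0 is a real instant, 1970-01-01T00:00:00Z, not a "missing" marker. As an illustration (not qsv's date code), this Rust sketch converts Unix seconds to a civil date using Howard Hinnant's well-known civil_from_days algorithm, handling 0 like any other timestamp. The function names are hypothetical.

```rust
/// Days since 1970-01-01 -> (year, month, day) in the proleptic Gregorian
/// calendar, following Howard Hinnant's `civil_from_days` algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097; // 400-year eras
    let doe = z - era * 146_097; // day of era [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year [0, 365]
    let mp = (5 * doy + 2) / 153; // month index, March = 0 [0, 11]
    let day = doy - (153 * mp + 2) / 5 + 1; // [1, 31]
    let month = if mp < 10 { mp + 3 } else { mp - 9 }; // [1, 12]
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month as u32, day as u32)
}

/// Unix seconds -> civil date. 0 is treated as a valid timestamp.
fn unix_secs_to_date(secs: i64) -> (i64, u32, u32) {
    // div_euclid floors toward negative infinity, so -1 s maps to 1969-12-31.
    civil_from_days(secs.div_euclid(86_400))
}
```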
🔧 validate & schema Enhancements:
- Fancy Regex Support: You can now use "advanced" regex features in your JSON Schema patterns with the --fancy-regex option. Previously, you could only use the standard Rust regex engine, which does not support backreferences or look-arounds (for performance reasons)
- JSON Schema Improvements: Better error handling and format validation options
- Schema Validation Refinements: More granular validation control with --no-format-validation
🔄 rename Reverted and Improved:
When pairwise renaming was introduced in v6.0.0, it broke some workflows. It's now fixed by introducing two modes:
- Positional Mode: Renaming by position is now once again the default
- Pairwise Mode: New --pairwise flag for column renaming by column pairs
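To make the two rename modes concrete, here is a hypothetical Rust sketch of their semantics as described above: positional mode replaces every header in order, while pairwise mode takes old/new name pairs. The function names and error messages are invented for illustration and do not mirror qsv's argument parsing.

```rust
/// Positional mode (the default again): new names replace the headers in order.
fn rename_positional(headers: &[&str], new_names: &[&str]) -> Result<Vec<String>, String> {
    if headers.len() != new_names.len() {
        return Err(format!(
            "expected {} new names, got {}",
            headers.len(),
            new_names.len()
        ));
    }
    Ok(new_names.iter().map(|s| s.to_string()).collect())
}

/// Pairwise mode: `pairs` is a flat list old1, new1, old2, new2, ...
fn rename_pairwise(headers: &[&str], pairs: &[&str]) -> Result<Vec<String>, String> {
    if pairs.len() % 2 != 0 {
        return Err("pairwise renaming needs an even number of names".to_string());
    }
    let mut renamed: Vec<String> = headers.iter().map(|s| s.to_string()).collect();
    for pair in pairs.chunks(2) {
        let (old, new) = (pair[0], pair[1]);
        // Look up the column by its original name.
        match headers.iter().position(|h| *h == old) {
            Some(i) => renamed[i] = new.to_string(),
            None => return Err(format!("column {old:?} not found")),
        }
    }
    Ok(renamed)
}
```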
🗂️ partition Improvements:
- Case-Insensitive Safety: Improved case-aware partitioning algorithm. Previously, case-insensitive file systems like macOS APFS and Windows NTFS were causing incorrect partitioning of case-sensitive values
- Faster still: Better use of I/O buffering, with deferred, batched, async writes instead of a write after every record
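On a case-insensitive filesystem, partition keys such as NY and ny would map to the same file. One way to stay safe, sketched here in Rust as an assumption (qsv's actual naming scheme may differ), is to compare candidate file names case-insensitively and append a numeric suffix on collision. `case_safe_stems` is hypothetical, and `to_lowercase` only approximates the case folding that APFS and NTFS actually use.

```rust
use std::collections::HashSet;

/// Derive output file stems for distinct partition keys so that no two
/// stems collide when compared case-insensitively.
fn case_safe_stems(keys: &[&str]) -> Vec<String> {
    let mut used: HashSet<String> = HashSet::new(); // lowercased names taken so far
    let mut stems = Vec::with_capacity(keys.len());
    for key in keys {
        let mut candidate = key.to_string();
        let mut n = 1;
        // Keep adding a numeric suffix until the case-folded name is unused.
        while used.contains(&candidate.to_lowercase()) {
            n += 1;
            candidate = format!("{key}_{n}");
        }
        used.insert(candidate.to_lowercase());
        stems.push(candidate);
    }
    stems
}
```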
[^1]: LLMs can still hallucinate a syntactically wrong SQL query. But once a valid SQL query is produced, it's fully reproducible.
[^2]: Depending on your LLM setup, SQL query generation may take some time, but once generated, the SQL query itself is blazing-fast.
Added
- frequency: add rank info to frequency table https://github.com/dathere/qsv/pull/2878
- frequency: add --json output option https://github.com/dathere/qsv/pull/2868
- validate: add --fancy-regex option https://github.com/dathere/qsv/pull/2845
- add CPU-accelerated, mem-mapped, chunked sha256 file checksum helper https://github.com/dathere/qsv/pull/2909
Changed
- apply: use SIMD-accelerated base64-simd crate for Encode64 and Decode64 operations https://github.com/dathere/qsv/pull/2863
- stats: faster precision calculation https://github.com/dathere/qsv/pull/2852
- perf: Use simdjson instead of serde_json to serialize to JSON https://github.com/dathere/qsv/pull/2884
- refactor: create and use reqwest client helpers to eliminate redundant code https://github.com/dathere/qsv/pull/2888
- perf: Faster parallelized sha256 hash file https://github.com/dathere/qsv/pull/2918
- refactor: describegpt https://github.com/dathere/qsv/pull/2890
- refactor: describegpt setting --timeout to 0 sets no timeout https://github.com/dathere/qsv/pull/2891
- refactor: describegpt more refinements https://github.com/dathere/qsv/pull/2892
- feat: describegpt refactor round3 https://github.com/dathere/qsv/pull/2893
- feat: describegpt disk & redis caching https://github.com/dathere/qsv/pull/2895
- refactor: describegpt https://github.com/dathere/qsv/pull/2896
- refactor: describegpt create get_cache_key helper; customizable stats options https://github.com/dathere/qsv/pull/2902
- feat: describegpt auto SQL RAG for --prompt https://github.com/dathere/qsv/pull/2904
- feat: describegpt major refactor https://github.com/dathere/qsv/pull/2913
- refactor: describegpt default promptfile is now embedded in qsv binary; fine-tune tests https://github.com/dathere/qsv/pull/2924
- feat: describegpt returning reasoning with --json option https://github.com/dathere/qsv/pull/2926
- feat: describegpt add DuckDB support in SQL RAG mode https://github.com/dathere/qsv/pull/2929
- feat: describegpt various DuckDB improvements https://github.com/dathere/qsv/pull/2936
- refactor: describegpt improved cache miss handling https://github.com/dathere/qsv/pull/2938
- feat: describegpt --addl-props is now part of cachekey https://github.com/dathere/qsv/pull/2939
- deps: bump cached to 0.56 and remove our patched fork https://github.com/dathere/qsv/pull/2853
- deps: bump polars from 0.49 to 0.50 https://github.com/dathere/qsv/pull/2869
- deps: bump polars to 0.50.0 at the py-1.32.2 tag https://github.com/dathere/qsv/pull/2877
- deps: bump polars to 0.50.0 at py-1.32.3 tag https://github.com/dathere/qsv/pull/2889
- build(deps): bump actions/checkout from 4 to 5 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2886
- build(deps): bump arboard from 3.6.0 to 3.6.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2920
- build(deps): bump base62 from 2.2.1 to 2.2.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2937
- build(deps): bump bytemuck from 1.23.1 to 1.23.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2876
- build(deps): bump calamine from 0.29.0 to 0.30.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2872
- build(deps): bump criterion from 0.6.0 to 0.7.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2855
- build(deps): bump dns-lookup from 2.1.0 to 3.0.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2915
- build(deps): bump dynfmt2 from 0.2.0 to 0.3.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2850
- build(deps): bump foldhash from 0.1.5 to 0.2.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2922
- build(deps): bump file-format from 0.27.0 to 0.28.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2873
- build(deps): bump filetime from 0.2.25 to 0.2.26 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2906
- build(deps): bump governor from 0.10.0 to 0.10.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2871
- build(deps): bump hashbrown from 0.15.4 to 0.15.5 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2874
- build(deps): bump indexmap from 2.10.0 to 2.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2917
- build(deps): bump jsonschema from 0.32.1 to 0.33.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2928
- build(deps): bump libc from 0.2.174 to 0.2.175 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2882
- build(deps): bump memmap2 from 0.9.7 to 0.9.8 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2914
- build(deps): bump mimalloc from 0.1.47 to 0.1.48 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2935
- build(deps): bump minijinja-contrib from 2.11.0 to 2.12.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2923
- deps: bump mlua from 0.10.5 to 0.11.1 - upgrading Luau from 0.663 to 0.682 https://github.com/dathere/qsv/pull/2842
- build(deps): bump mlua from 0.11.1 to 0.11.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2879
- build(deps): bump phf from 0.12.1 to 0.13.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2921
- build(deps): bump qsv-stats from 0.36.0 to 0.37.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2856
- build(deps): bump rand from 0.9.1 to 0.9.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2851
- build(deps): bump rayon from 1.10.0 to 1.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2887
- build(deps): bump redis from 0.32.4 to 0.32.5 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2880
- build(deps): bump regex from 1.11.1 to 1.11.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2925
- build(deps): bump reqwest from 0.12.22 to 0.12.23 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2885
- build(deps): bump serde_json from 1.0.140 to 1.0.141 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2847
- build(deps): bump serde_json from 1.0.141 to 1.0.142 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2865
- build(deps): bump serde_json from 1.0.142 to 1.0.143 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2898
- build(deps): bump strum from 0.27.1 to 0.27.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2848
- build(deps): bump strum_macros from 0.27.1 to 0.27.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2849
- build(deps): bump sysinfo from 0.36.0 to 0.36.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2846
- build(deps): bump sysinfo from 0.36.1 to 0.37.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2881
- build(deps): bump tempfile from 3.20.0 to 3.21.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2900
- build(deps): bump tokio from 1.46.1 to 1.47.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2857
- build(deps): bump tokio from 1.47.0 to 1.47.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2866
- build(deps): bump uuid from 1.17.0 to 1.18.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2883
- build(deps): bump url from 2.5.4 to 2.5.6 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2912
- build(deps): bump url from 2.5.6 to 2.5.7 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2919
- build(deps): bump zip from 4.3.0 to 4.5.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2911
- applied select clippy suggestions
- updated indirect dependencies
- bumped MSRV to Rust 1.89
Fixed
- fix: json more robust error-handling of invalid JSON input https://github.com/dathere/qsv/pull/2844
- fix: template fix stdin regression https://github.com/dathere/qsv/pull/2907
- fix: rename add --positional option https://github.com/dathere/qsv/pull/2930
- fix: rename the real fix - positional is now the default and pairwise is the option https://github.com/dathere/qsv/pull/2931
- fix: partition case insensitive filesystems https://github.com/dathere/qsv/pull/2934
- docs: fix inconsistent formatting in command help examples by @abobov in https://github.com/dathere/qsv/pull/2862
New Contributors
- @abobov made their first contribution in https://github.com/dathere/qsv/pull/2862
Full Changelog: https://github.com/dathere/qsv/compare/6.0.1...7.0.0
- Rust
Published by jqnatividad 6 months ago
https://github.com/dathere/qsv - 6.0.1
[6.0.1] - 2025-07-12
This is a patch release with bug fixes and minor improvements.
Changed
- feat: updated completions for qsv v6.0.0 by @rzmk in #2838
- docs: updated sample schema.json based on NYC311 1M row sample benchmark data
- docs: updated sample stats output using NYC 311 1M row sample benchmark data
- build(deps): bump chrono-tz from 0.10.3 to 0.10.4 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2839
- build(deps): bump qsv-stats from 0.35.0 to 0.36.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2840
- bumped indirect dependencies
- Added benchmark_data.* to .gitignore
Fixed
- geocode: make --batch=0 mode more robust by setting a minimum batch size of 1,000 rows https://github.com/dathere/qsv/commit/2fa90bcc7df57a338a4851bafb361e886cea97c5
- jsonl: correct batchsize calculation to use input file instead of output file for line counting https://github.com/dathere/qsv/commit/742dc777a3d2d2f3d70e72078d69cfdc39c04b4b
- benchmarks: fixed benchmarks with unescaped parameters with embedded spaces https://github.com/dathere/qsv/commit/ad95596b8400154b50042e2cb8352900d0198904
Removed
- Removed retired publishing workflows (linux-glibc-231-musl-123 and wix-installer)
Full Changelog: https://github.com/dathere/qsv/compare/6.0.0...6.0.1
- Rust
Published by jqnatividad 8 months ago
https://github.com/dathere/qsv - 6.0.0
What's Changed
- build(deps): bump libc from 0.2.173 to 0.2.174 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2794
- feat: schema JSON schema description property set to cmdline used to generate the JSON schema by @jqnatividad in https://github.com/dathere/qsv/pull/2796
- build(deps): bump phf from 0.11.3 to 0.12.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2797
- deps: bump polars from 0.48 to 0.49 by @jqnatividad in https://github.com/dathere/qsv/pull/2798
- sqlp & joinp: --decimal-comma option is not only for parsing input CSVs, it's also used when writing output CSVs by @jqnatividad in https://github.com/dathere/qsv/pull/2800
- build(deps): bump flexi_logger from 0.31.0 to 0.31.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2801
- build(deps): bump zip from 4.1.0 to 4.2.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2802
- feat: validate improved JSON Schema schema validation by @jqnatividad in https://github.com/dathere/qsv/pull/2803
- feat: update completions for qsv v5.1.0 by @rzmk in https://github.com/dathere/qsv/pull/2804
- feat: lens add --wrap-mode option by @jqnatividad in https://github.com/dathere/qsv/pull/2805
- feat: rename pair-based renaming by @jqnatividad in https://github.com/dathere/qsv/pull/2806
- feat: sort add --natural sort option by @jqnatividad in https://github.com/dathere/qsv/pull/2808
- build(deps): bump flexi_logger from 0.31.1 to 0.31.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2812
- build(deps): bump arboard from 3.5.0 to 3.6.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2814
- build(deps): bump minijinja from 2.10.2 to 2.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2815
- build(deps): bump minijinja-contrib from 2.10.2 to 2.11.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2816
- build(deps): bump reqwest from 0.12.20 to 0.12.21 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2817
- build(deps): bump indicatif from 0.17.11 to 0.17.12 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2818
- build(deps): bump tokio from 1.45.1 to 1.46.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2821
- build(deps): bump reqwest from 0.12.21 to 0.12.22 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2820
- dep: bump polars to latest upstream - adapt to PlPath api reqt by @jqnatividad in https://github.com/dathere/qsv/pull/2822
- build(deps): bump qsv-stats from 0.33.0 to 0.34.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2823
- build(deps): bump tokio from 1.46.0 to 1.46.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2825
- deps: Remove similar-asserts and go back to std asserts by @jqnatividad in https://github.com/dathere/qsv/pull/2826
- perf: transpose refactored for perf by @jqnatividad in https://github.com/dathere/qsv/pull/2827
- build(deps): bump jaq-std from 2.1.1 to 2.1.2 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2830
- build(deps): bump jaq-core from 2.2.0 to 2.2.1 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2831
- build(deps): bump jaq-json from 1.1.2 to 1.1.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2832
- build(deps): bump human-panic from 2.0.2 to 2.0.3 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2833
- build(deps): bump zip from 4.2.0 to 4.3.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2835
- build(deps): bump sysinfo from 0.35.2 to 0.36.0 by @dependabot[bot] in https://github.com/dathere/qsv/pull/2836
- perf: bump to faster geosuggest 0.8 by @jqnatividad in https://github.com/dathere/qsv/pull/2837
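The new sort --natural option orders strings the way humans expect, so file2 comes before file10. Here is a minimal Rust sketch of a natural comparator, assuming ASCII digits and byte-wise comparison for everything else; it illustrates the technique and is not the implementation qsv uses. Digit runs are compared by length after stripping leading zeros, which avoids integer overflow on very long numbers.

```rust
use std::cmp::Ordering;

/// Split `s` into its leading ASCII-digit run and the rest.
fn split_digits(s: &[u8]) -> (&[u8], &[u8]) {
    let n = s.iter().take_while(|c| c.is_ascii_digit()).count();
    s.split_at(n)
}

/// Drop leading '0's so "007" and "7" compare as the same number.
fn trim_leading_zeros(d: &[u8]) -> &[u8] {
    let n = d.iter().take_while(|&&c| c == b'0').count();
    &d[n..]
}

/// Natural ("human") order: digit runs compare by numeric value, everything
/// else compares byte by byte (so it is case-sensitive and ASCII-oriented).
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let (mut a, mut b) = (a.as_bytes(), b.as_bytes());
    loop {
        match (a.first(), b.first()) {
            (None, None) => return Ordering::Equal,
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                let (da, rest_a) = split_digits(a);
                let (db, rest_b) = split_digits(b);
                let (na, nb) = (trim_leading_zeros(da), trim_leading_zeros(db));
                // More significant digits means a bigger number; equal
                // lengths compare lexicographically. No integer parsing.
                let ord = na.len().cmp(&nb.len()).then_with(|| na.cmp(nb));
                if ord != Ordering::Equal {
                    return ord;
                }
                a = rest_a;
                b = rest_b;
            }
            (Some(x), Some(y)) => {
                if x != y {
                    return x.cmp(y);
                }
                a = &a[1..];
                b = &b[1..];
            }
        }
    }
}
```

Because "a007" and "a7" compare equal here, a production comparator would typically add a tiebreaker (such as a plain byte comparison) to get a strict total order.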
Full Changelog: https://github.com/dathere/qsv/compare/5.1.0...6.0.0
- Rust
Published by jqnatividad 8 months ago
https://github.com/dathere/qsv - 5.1.0
[5.1.0] - 2025-06-17
Highlights
* lens is now colorful by default, with a --monochrome option to turn it off:
qsv lens /tmp/NYC311SR_2010-2020-sample-1M.csv
* lens can now have custom prompts with the --prompt option (with support for ANSI escape codes to format the prompt). Meant to be paired with the --echo-column <colname> option, e.g.:
qsv lens --prompt $'\033[1;5;31mBlinking red, bold text\033[0m' --echo-column 'Unique Key' \
  /tmp/NYC311SR_2010-2020-sample-1M.csv
* the qsv-stats crate, the underlying engine behind the central stats, frequency and "smart" commands, got a lot of love in this release
* validate got a tad faster while decreasing its memory footprint. The new --no-format-validation option now also allows you to ignore all JSON Schema "format" keywords (e.g. date, email, url, currency, etc.) when validating CSVs.
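In the lens --prompt example, bash's $'...' quoting turns \033 (octal for ESC, 0x1b) into a real escape byte, and the sequence ESC[1;5;31m selects bold (1), blink (5) and red foreground (31) until ESC[0m resets it. This small Rust sketch builds the same kind of string; `ansi_wrap` is a hypothetical helper, and whether blink actually renders depends on the terminal.

```rust
/// Wrap `text` in an ANSI SGR escape sequence built from `codes`
/// (e.g. 1 = bold, 5 = blink, 31 = red), followed by the ESC[0m reset.
fn ansi_wrap(text: &str, codes: &[u8]) -> String {
    let joined: Vec<String> = codes.iter().map(|c| c.to_string()).collect();
    format!("\x1b[{}m{}\x1b[0m", joined.join(";"), text)
}
```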
Added
- lens: add --prompt option, add examples to regex-enabled options https://github.com/dathere/qsv/pull/2772
- lens: add --monochrome option; otherwise, columns are displayed in different colors https://github.com/dathere/qsv/pull/2761
- validate: add --no-format-validation option when in JSON Schema mode https://github.com/dathere/qsv/pull/2762
- docs: add shell completions badges by @rzmk in https://github.com/dathere/qsv/pull/2760
- feat: added criterion trim algorithm microbenchmarks https://github.com/dathere/qsv/pull/2789
Changed
- frequency: performance microoptimizations - use stats cache column cardinality to pre-alloc & size frequency hash tables
- geocode: refactor regex handling for performance & maintainability
- json: preserve key order https://github.com/dathere/qsv/pull/2777
- stats: performance microoptimizations - use unwrap_unchecked() instead of just unwrap() in hot sampling functions
- validate: major refactoring for added performance/memory efficiency
- chore: temporarily use qsv-calamine until a new calamine is released https://github.com/dathere/qsv/pull/2790
- Bump cpc from 1.9 to 2 https://github.com/dathere/qsv/pull/2770
- deps: bump criterion from 0.5 to 0.6 https://github.com/dathere/qsv/pull/2791
- deps: use latest csvlens upstream with colorful columns https://github.com/dathere/qsv/commit/f2c9322e33a0ac335dafec10a490c871d3de0a6c
- deps: temporarily use qsv-calamine until a new calamine is released https://github.com/dathere/qsv/pull/2790
- deps: bump our patched forks of cached, csvs_convert, json-objects-to-csv, jsonschema, localzone, rfd, self_update until PRs are merged or new releases are made
- deps: bump zip from 3 to 4 in https://github.com/dathere/qsv/commit/75909d2ca8835400bee5a90e18085c370939bb53
- deps: bump polars to 0.48.1 at 49ce57a revision
- build(deps): bump atoi_simd from 0.16.0 to 0.16.1 by @dependabot in https://github.com/dathere/qsv/pull/2766
- build(deps): bump bytemuck from 1.23.0 to 1.23.1 by @dependabot in https://github.com/dathere/qsv/pull/2778
- build(deps): bump flate2 from 1.1.1 to 1.1.2 by @dependabot in https://github.com/dathere/qsv/pull/2781
- build(deps): bump flexi_logger from 0.30.1 to 0.30.2 by @dependabot in https://github.com/dathere/qsv/pull/2765
- build(deps): bump flexi_logger from 0.30.2 to 0.31.0 by @dependabot in https://github.com/dathere/qsv/pull/2793
- build(deps): bump hashbrown from 0.15.3 to 0.15.4 by @dependabot in https://github.com/dathere/qsv/pull/2779
- build(deps): bump libc from 0.2.172 to 0.2.173 by @dependabot in https://github.com/dathere/qsv/pull/2787
- build(deps): bump mimalloc from 0.1.46 to 0.1.47 by @dependabot in https://github.com/dathere/qsv/pull/2792
- build(deps): bump mlua from 0.10.3 to 0.10.5 by @dependabot in https://github.com/dathere/qsv/pull/2758
- build(deps): bump num_cpus from 1.16.0 to 1.17.0 by @dependabot in https://github.com/dathere/qsv/pull/2771
- build(deps): bump parking_lot from 0.12.3 to 0.12.4 by @dependabot in https://github.com/dathere/qsv/pull/2768
- build(deps): bump pyo3 from 0.25.0 to 0.25.1 by @dependabot in https://github.com/dathere/qsv/pull/2785
- deps: upgrade qsv-stats from 0.32 to 0.33, which features major memory and performance optimizations behind the stats & frequency commands https://github.com/dathere/qsv/pull/2786
- deps: bump redis from 0.29.5 to 0.32
- build(deps): bump reqwest from 0.12.15 to 0.12.16 by @dependabot in https://github.com/dathere/qsv/pull/2764
- build(deps): bump reqwest from 0.12.16 to 0.12.18 by @dependabot in https://github.com/dathere/qsv/pull/2767
- build(deps): bump reqwest from 0.12.18 to 0.12.19 by @dependabot in https://github.com/dathere/qsv/pull/2773
- build(deps): bump reqwest from 0.12.19 to 0.12.20 by @dependabot in https://github.com/dathere/qsv/pull/2782
- build(deps): bump rust_decimal from 1.37.1 to 1.37.2 by @dependabot in https://github.com/dathere/qsv/pull/2788
- build(deps): bump smallvec from 1.15.0 to 1.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2780
- build(deps): bump sysinfo from 0.35.1 to 0.35.2 by @dependabot in https://github.com/dathere/qsv/pull/2774
- build(deps): bump titlecase from 3.5.0 to 3.6.0 by @dependabot in https://github.com/dathere/qsv/pull/2775
- build(deps): bump tokio from 1.45.0 to 1.45.1 by @dependabot in https://github.com/dathere/qsv/pull/2759
- build(deps): bump uuid from 1.16.0 to 1.17.0 by @dependabot in https://github.com/dathere/qsv/pull/2757
- applied select clippy suggestions
- updated indirect dependencies
- set Rust nightly to 2025-05-21, the same nightly Polars uses https://github.com/dathere/qsv/commit/872ade1b52cb0013fdb30aa2c4d83ce2081cf0c6
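The frequency microoptimization above uses column cardinality from the stats cache to pre-size its hash tables. A minimal Rust sketch of the idea: when the number of distinct values is known up front, HashMap::with_capacity avoids repeated rehashing as the table grows. `count_frequencies` and its `cardinality` parameter are hypothetical, and, in the spirit of the later fix that made frequency recover from a missing stats cache, it falls back to default growth instead of failing.

```rust
use std::collections::HashMap;

/// Count value frequencies for one column. `cardinality` is assumed to come
/// from a previously computed stats cache; `None` means no cache is available.
fn count_frequencies<'a>(values: &[&'a str], cardinality: Option<usize>) -> HashMap<&'a str, u64> {
    let mut counts = match cardinality {
        // Known distinct-value count: allocate once, no rehashing while counting.
        Some(n) => HashMap::with_capacity(n),
        // No stats cache: fall back to the default growth strategy.
        None => HashMap::new(),
    };
    for v in values {
        *counts.entry(*v).or_insert(0) += 1;
    }
    counts
}
```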
Fixed
- fix: frequency recover from non-fatal absence of stats cache, instead of panicking https://github.com/dathere/qsv/commit/b2821a0
- fix: flaky json tests caused by hardcoding name of intermediate file - https://github.com/dathere/qsv/commit/62ca310f5942a3ffcf7334a5623db0c94c9fa8b3
- fix: flaky reverse property tests by handling BOM characters https://github.com/dathere/qsv/commit/cefd490a899156735baf904b597b322e96b61f5d
- fix: util::process_input helper does not honor QSV_SKIP_FORMAT_CHECK when processing dir input https://github.com/dathere/qsv/pull/2784
Full Changelog: https://github.com/dathere/qsv/compare/5.0.3...5.1.0
- Rust
Published by jqnatividad 9 months ago
https://github.com/dathere/qsv - 5.0.3
[5.0.3] - 2025-05-22 "The Geo Release" 🌍
qsv 5.0.3 represents a major milestone with significant enhancements to its geospatial data processing capabilities.
They're targeted to support the Datapusher+ Data Resource Upload First (DRUF) workflow for "automagical metadata inferencing" - focusing on DCAT-US v3 recommended spatial and temporal properties that would otherwise be too tedious to manually compile:
New Geocoding Capabilities
- Added IP geolocation with new iplookup and iplookupnow subcommands in the geocode command
- Integrated Maxmind GeoLite2 database support for accurate IP-to-location mapping
- Enhanced geocoding performance (up to 5x faster) with rkyv serialization (contributed by @estin)
Enhanced geoconvert Command
- Added CSV input support alongside existing geospatial formats
- Introduced GeoJSONL output format for streaming workflows
- Added stdin support for all formats except SHP input
- New coordinate handling options: --latitude and --longitude parameters
- Added --max-length option for output control
- Comprehensive test coverage additions
- all contributed by @rzmk!
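GeoJSONL puts one complete GeoJSON Feature on each line, which is what makes it convenient for streaming. The Rust sketch below shows the shape of a CSV-to-GeoJSONL conversion driven by latitude/longitude columns, similar in spirit to geoconvert's --latitude and --longitude options. It is a hypothetical, std-only illustration (hand-rolled JSON escaping, all properties emitted as strings), not geoconvert's code. Note that GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].

```rust
/// Minimal JSON string escaping: quotes, backslashes and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Convert rows into GeoJSONL: one Point Feature per line, with the
/// non-coordinate columns as string properties.
fn to_geojsonl(headers: &[&str], rows: &[Vec<&str>], lat_col: &str, lon_col: &str) -> Result<String, String> {
    let lat_i = headers.iter().position(|h| *h == lat_col).ok_or("latitude column not found")?;
    let lon_i = headers.iter().position(|h| *h == lon_col).ok_or("longitude column not found")?;
    let mut out = String::new();
    for row in rows {
        let lat: f64 = row[lat_i].trim().parse().map_err(|_| format!("bad latitude: {}", row[lat_i]))?;
        let lon: f64 = row[lon_i].trim().parse().map_err(|_| format!("bad longitude: {}", row[lon_i]))?;
        let props: Vec<String> = headers
            .iter()
            .zip(row)
            .enumerate()
            .filter(|(i, _)| *i != lat_i && *i != lon_i)
            .map(|(_, (h, v))| format!("{}:{}", json_str(h), json_str(v)))
            .collect();
        // GeoJSON coordinate order is [longitude, latitude].
        out.push_str(&format!(
            "{{\"type\":\"Feature\",\"geometry\":{{\"type\":\"Point\",\"coordinates\":[{},{}]}},\"properties\":{{{}}}}}\n",
            lon,
            lat,
            props.join(",")
        ));
    }
    Ok(out)
}
```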
🚀 Performance & Infrastructure Improvements
Polars Integration
- Upgraded Polars from 0.46.0 to 0.48.1 with intermediate releases
- Enhanced Polars schema support across multiple commands (schema, joinp, pivotp, sqlp)
- Added --polars mode to the schema command to explicitly create a polars schema file on demand, rather than as a side-effect of the sqlp command using its --cache-schema option.
Core Performance
- Microoptimizations in the sort command
- Improved file handling with tempfile usage in edit --in-place
- Enhanced auto-decompression support now available suite-wide for gz, zlib, and zst files
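Auto-decompression needs some way to recognize compressed input. Whether qsv relies on file extensions, content sniffing, or both is not stated here; the Rust sketch below shows content sniffing via the standard magic bytes of gzip, zstd and zlib. `sniff_compression` is hypothetical. Zlib's two-byte header check is weak, so plain text that happens to start with bytes like "x^" would be misdetected, which is a good reason to prefer extensions when they are available.

```rust
/// Compression formats recognizable from leading "magic" bytes.
#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Zlib,
    Zstd,
    Plain,
}

fn sniff_compression(header: &[u8]) -> Compression {
    match header {
        // gzip: ID1 = 0x1f, ID2 = 0x8b (RFC 1952)
        [0x1f, 0x8b, ..] => Compression::Gzip,
        // zstd: frame magic number 0xFD2FB528, stored little-endian
        [0x28, 0xb5, 0x2f, 0xfd, ..] => Compression::Zstd,
        // zlib: CM (low nibble of CMF) is 8 = deflate, and CMF*256 + FLG
        // must be a multiple of 31 (RFC 1950)
        [cmf, flg, ..]
            if (*cmf & 0x0f) == 8 && ((u16::from(*cmf) << 8) | u16::from(*flg)) % 31 == 0 =>
        {
            Compression::Zlib
        }
        _ => Compression::Plain,
    }
}
```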
🛠️ New Features & Usability
Enhanced Commands
- edit: New --in-place option for direct file modification with automatic backup (.bak) creation
- foreach: Added "/" to splitter pattern for improved path handling
- stats: New QSV_STATS_STRING_MAX_LENGTH environment variable for string analysis control
- to: Added --all-strings option for simplified data type handling
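The edit --in-place behavior listed above (write the new content to a temporary file, keep a .bak backup, then replace the original) can be sketched with only the Rust standard library. This is a minimal illustration under stated assumptions, not qsv's implementation: qsv's edit changes a single cell and uses the tempfile crate, whereas the helpers `edit_in_place` and `with_suffix`, the `.tmp` sibling file name and the generic `transform` closure here are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Appends a suffix to the full file name, e.g. "data.csv" + ".bak" -> "data.csv.bak".
fn with_suffix(path: &Path, suffix: &str) -> PathBuf {
    let mut s = path.as_os_str().to_os_string();
    s.push(suffix);
    PathBuf::from(s)
}

/// Hypothetical in-place edit: transform the file's contents, keep a .bak copy,
/// and swap the new content in with a rename.
fn edit_in_place<F: Fn(&str) -> String>(path: &Path, transform: F) -> io::Result<()> {
    let original = fs::read_to_string(path)?;
    // Write the edited content to a sibling temp file first (name is an assumption).
    let tmp = with_suffix(path, ".tmp");
    fs::write(&tmp, transform(&original))?;
    // Preserve the pre-edit data before touching the original.
    fs::copy(path, with_suffix(path, ".bak"))?;
    // Replace the original with the fully written temp file.
    fs::rename(&tmp, path)?;
    Ok(())
}
```

Writing to a sibling file first means the original is only replaced once the new content is completely on disk, and the .bak copy is taken before the rename so it always holds the pre-edit data.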
Distribution & Installation
- Added conda package support with installation instructions
- New download badges and streamlined installation documentation
- Retired the older glibc-2.31 and musl-1.2.3 "prebuilt-older" binaries, as Ubuntu 20.04 has been retired and is no longer supported by GitHub Actions.
- Discontinued the MSI installer in favor of the easier qsv Windows Easy Installer (thanks @rzmk!)
Quality & Stability
- Applied multiple clippy lint suggestions for code quality
- Enhanced test coverage, particularly for geospatial functions
- Improved documentation with better examples and clearer explanations
- Fixed stdin handling issues in the split command
🎯 Default Feature Changes
The qsvdp variant now includes geocode and geoconvert commands by default, making geospatial functionality more accessible to Datapusher+ users with Jinja2-powered metadata formulas.
NOTE:
- For qsv v5.0.3, cargo install will NOT work, as the calamine crate (which powers the excel command) is pinned to zip 2.5.0, which was yanked.
- Unfortunately, the broken zip dependency also prevents us from publishing qsv 5.0.3 to crates.io.
- In both cases, either install the prebuilt binaries or compile from source with cargo build.
Added
- edit: add --in-place (and test), which uses tempfile, by @rzmk in https://github.com/dathere/qsv/pull/2744
- foreach: add "/" to splitter pattern https://github.com/dathere/qsv/pull/2754
- geoconvert: add CSV input and GeoJSONL output and use buf by @rzmk in https://github.com/dathere/qsv/pull/2690
- geoconvert: add stdin support (except for SHP input) by @rzmk in https://github.com/dathere/qsv/pull/2699
- geoconvert: add --latitude and --longitude options by @rzmk in https://github.com/dathere/qsv/pull/2707
- geoconvert: add --max-length option https://github.com/dathere/qsv/pull/2711
- geocode: add iplookup and iplookupnow subcommands https://github.com/dathere/qsv/pull/2741
- tests: geoconvert - add basic tests and move tests to test_geoconvert.rs by @rzmk in https://github.com/dathere/qsv/pull/2717
- qsvdp now includes geocode & geoconvert commands by default https://github.com/dathere/qsv/pull/2697
- stats: QSV_STATS_STRING_MAX_LENGTH env var https://github.com/dathere/qsv/pull/2709
- to: add --all-strings option https://github.com/dathere/qsv/pull/2746
- docs: add conda install command by @rzmk in https://github.com/dathere/qsv/pull/2718
- docs: add qsv download badges and update install instructions by @rzmk in https://github.com/dathere/qsv/pull/2721
Changed
- geocode: bump geosuggest crate to use much faster rkyv serialization by @estin in https://github.com/dathere/qsv/pull/2734
- sort: microoptimize https://github.com/dathere/qsv/pull/2748
- feat: update completions for qsv v5.0 by @rzmk in https://github.com/dathere/qsv/pull/2752
- Improved Polars Schema support https://github.com/dathere/qsv/pull/2703
- Bump polars from 0.46.0 to 0.47.0 https://github.com/dathere/qsv/commit/87bf7b7f5e0b5af754afabf2939ced3914eb276f
- Bump polars py-1.30.0-beta-1 https://github.com/dathere/qsv/pull/2747
- Bump polars to 0.48.0 https://github.com/dathere/qsv/commit/5a037eeff1d353f3f4b8f16a7d6ec6b3074b2f8c
- build(deps): bump polars from 0.48.0 to 0.48.1 by @dependabot in https://github.com/dathere/qsv/pull/2750
- build(deps): bump polars-ops from 0.48.0 to 0.48.1 by @dependabot in https://github.com/dathere/qsv/pull/2751
- build(deps): bump actions/setup-python from 5.5.0 to 5.6.0 by @dependabot in https://github.com/dathere/qsv/pull/2713
- build(deps): bump actix-web from 4.10.2 to 4.11.0 by @dependabot in https://github.com/dathere/qsv/pull/2742
- build(deps): bump bytemuck from 1.22.0 to 1.23.0 by @dependabot in https://github.com/dathere/qsv/pull/2719
- build(deps): bump chrono from 0.4.40 to 0.4.41 by @dependabot in https://github.com/dathere/qsv/pull/2722
- build(deps): bump ext-sort from 0.1.4 to 0.1.5 by @dependabot in https://github.com/dathere/qsv/pull/2736
- build(deps): bump file-format from 0.26.0 to 0.27.0 by @dependabot in https://github.com/dathere/qsv/pull/2735
- build(deps): bump pyo3 from 0.24.1 to 0.24.2 by @dependabot in https://github.com/dathere/qsv/pull/2708
- build(deps): bump jaq-json from 1.1.1 to 1.1.2 by @dependabot in https://github.com/dathere/qsv/pull/2714
- build(deps): bump jaq-std from 2.1.0 to 2.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2715
- build(deps): bump jaq-core from 2.1.1 to 2.2.0 by @dependabot in https://github.com/dathere/qsv/pull/2716
- build(deps): bump jsonschema from 0.29.1 to 0.30.0 by @dependabot in https://github.com/dathere/qsv/pull/2704
- build(deps): bump libc from 0.2.171 to 0.2.172 by @dependabot in https://github.com/dathere/qsv/pull/2696
- build(deps): bump sysinfo from 0.34.2 to 0.35.0 by @dependabot in https://github.com/dathere/qsv/pull/2724
- build(deps): bump minijinja from 2.9.0 to 2.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2727
- build(deps): bump minijinja from 2.10.1 to 2.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2732
- build(deps): bump minijinja-contrib from 2.9.0 to 2.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2728
- build(deps): bump minijinja-contrib from 2.10.1 to 2.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2733
- build(deps): bump pyo3 from 0.24.2 to 0.25.0 by @dependabot in https://github.com/dathere/qsv/pull/2745
- build(deps): bump rand from 0.9.0 to 0.9.1 by @dependabot in https://github.com/dathere/qsv/pull/2702
- build(deps): bump simd-json from 0.15.0 to 0.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2701
- build(deps): bump sysinfo from 0.35.0 to 0.35.1 by @dependabot in https://github.com/dathere/qsv/pull/2740
- build(deps): bump tempfile from 3.19.1 to 3.20.0 by @dependabot in https://github.com/dathere/qsv/pull/2739
- build(deps): bump tokio from 1.44.2 to 1.45.0 by @dependabot in https://github.com/dathere/qsv/pull/2731
- bump indirect dependencies
- apply select clippy lint suggestions
- bump MSRV to 1.87.0
Fixed
- docs: fix typo in apply operations replace example by @HarrisonMc555 in https://github.com/dathere/qsv/pull/2743
- fix: split: save stdin to tempfile https://github.com/dathere/qsv/pull/2706
New Contributors
- @estin made their first contribution in https://github.com/dathere/qsv/pull/2734
- @HarrisonMc555 made their first contribution in https://github.com/dathere/qsv/pull/2743
Full Changelog: https://github.com/dathere/qsv/compare/4.0.0...5.0.3
- Rust
Published by jqnatividad 9 months ago
https://github.com/dathere/qsv - 4.0.0
[4.0.0] - 2025-04-13
Highlights:
This is a major release with numerous improvements!
- qsv can now read additional file formats by leveraging the Polars engine:
Arrow/IPC, Avro, Parquet, JSON (JSON array) and JSONL
- Automatic decompression support for compressed CSV file dialects (csv, tsv/tab & ssv) using gzip (.gz), zlib (.zlib) and zstd (.zst) compression formats (e.g. data.csv.gz, data.tsv.zst, data.ssv.zlib)
qsv sample 1000 data.parquet | qsv stats | qsv lens
qsv frequency data.csv.gz | qsv lens
qsv search Waldo data.tsv.zlib | qsv lens
qsv select 2-5 data.jsonl | qsv lens
- New geoconvert command for converting spatial formats to CSV:
- GeoJSON
# convert TX_cities.geojson to CSV, filter out the geometry column and browse with lens
qsv geoconvert TX_cities.geojson geojson csv | qsv select '!geometry' | qsv lens
- Shapefile (SHP)
- Enhanced split command with new --filter option:
- Similar to GNU split
- Spawns a subprocess for each chunk
(e.g. compress each chunk with qsv split outdir input.csv --filter "gzip {}.csv")
- Expanded the to command:
- added LibreOffice Calc (ODS) support
- re-enabled parquet generation now that it's using Arrow instead of DuckDB (which made for very long compiles)
- New uniqueCombinedWith JSON Schema custom keyword in validate command:
- Allows validating uniqueness across multiple columns
- Useful for composite key validation
- QSV_DOTENV_PATH now supports the sentinel value "<NONE>" to disable dotenv processing altogether.
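To make the auto-decompression highlight concrete, here is a small Rust sketch of how a reader might pick the decompressor and the delimiter from a file name such as data.csv.gz or data.ssv.zlib. It only classifies the path: the `Compression` enum and `detect_format` function are hypothetical, actual decompression would be handled by crates such as flate2 (gzip/zlib) and zstd, and .ssv is assumed here to mean semicolon-separated values.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,
    Zlib,
    Zstd,
    Uncompressed,
}

/// Returns the compression (from the outer extension) and the delimiter
/// (from the dialect extension underneath it), if recognized.
fn detect_format(path: &str) -> (Compression, Option<u8>) {
    let p = Path::new(path);
    let outer = p
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    // For compressed files, the dialect extension is what remains after stripping ".gz" etc.
    let (compression, inner) = match outer.as_deref() {
        Some("gz") => (Compression::Gzip, p.file_stem()),
        Some("zlib") => (Compression::Zlib, p.file_stem()),
        Some("zst") => (Compression::Zstd, p.file_stem()),
        _ => (Compression::Uncompressed, p.file_name()),
    };
    let dialect = inner
        .and_then(|name| Path::new(name).extension())
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    let delimiter = match dialect.as_deref() {
        Some("csv") => Some(b','),
        Some("tsv") | Some("tab") => Some(b'\t'),
        Some("ssv") => Some(b';'), // assumption: semicolon-separated
        _ => None,
    };
    (compression, delimiter)
}
```

Keeping detection separate from decoding means the same dialect logic serves both plain and compressed inputs.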
Added
- geoconvert: new command to convert spatial formats to CSV by @rzmk in https://github.com/dathere/qsv/pull/2681 & https://github.com/dathere/qsv/pull/2688
- split: add --filter option https://github.com/dathere/qsv/pull/2660
- sqlp: add decimal type support https://github.com/dathere/qsv/pull/2646
- to: add back parquet support https://github.com/dathere/qsv/pull/2665
- feat: extended auto-decompression support. In addition to snappy auto-decompression, qsv now auto-decompresses CSV dialects (tsv/tab & ssv files) using the gzip, zlib and zstd compression formats https://github.com/dathere/qsv/pull/2671
- to: add ODS support https://github.com/dathere/qsv/pull/2674
- validate: add uniqueCombinedWith custom JSON Schema validation keyword https://github.com/dathere/qsv/pull/2636
- feat: prompt: add supported file formats to the dialog box filter when the polars feature is enabled https://github.com/dathere/qsv/pull/2667
- feat: add QSV_POLARS_FLOAT_PRECISION env var https://github.com/dathere/qsv/pull/2678
- tests: add tests for https://100.dathere.com/lessons/3 by @rzmk in https://github.com/dathere/qsv/pull/2638
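The uniqueCombinedWith keyword checks the uniqueness of a composite key rather than of a single column. The core check can be sketched as a single pass that records each row's key tuple in a HashSet. This illustrates the idea only, not qsv's validate code; the function name `duplicate_combined_keys` and the row representation are hypothetical.

```rust
use std::collections::HashSet;

/// Returns the 0-based indices of rows whose combined key (the values in
/// `key_cols`) repeats a key already seen in an earlier row.
fn duplicate_combined_keys(rows: &[Vec<&str>], key_cols: &[usize]) -> Vec<usize> {
    let mut seen: HashSet<Vec<&str>> = HashSet::new();
    let mut dups = Vec::new();
    for (i, row) in rows.iter().enumerate() {
        // Build the composite key from the selected columns, in order.
        let key: Vec<&str> = key_cols.iter().map(|&c| row[c]).collect();
        // insert() returns false when the key was already present.
        if !seen.insert(key) {
            dups.push(i);
        }
    }
    dups
}
```

A column can repeat values on its own and still pass, as long as the combination of the key columns is unique per row.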
Changed
- qsvdp binary variant can now use the geocode & geoconvert commands https://github.com/dathere/qsv/commit/50f004608d396602b8f6eb048a72dcc56630d26a
- geocode feature now gates the geocode & geoconvert commands https://github.com/dathere/qsv/commit/9d046e8da107c088f1e0b3bb20e64ab79fde05d8
- feat: setting QSV_DOTENV_PATH to the sentinel value "<NONE>" disables dotenv processing https://github.com/dathere/qsv/pull/2684
- refactor: polars special formats support https://github.com/dathere/qsv/pull/2683
- contrib(completions): update completions to v3.3.0 by @rzmk in https://github.com/dathere/qsv/pull/2626
- contrib(completions): update completions for qsv v4.0.0 by @rzmk in https://github.com/dathere/qsv/pull/2677
- deps: bump polars to 0.46.0 at py-1.27.1 tag https://github.com/dathere/qsv/pull/2675 and https://github.com/dathere/qsv/commit/e5d29d7f192bf10f6528d0d423347179d785e40f
- build(deps): bump actions/setup-python from 5.4.0 to 5.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2627
- build(deps): bump arboard from 3.4.1 to 3.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2653
- build(deps): bump chrono-tz from 0.10.2 to 0.10.3 by @dependabot in https://github.com/dathere/qsv/pull/2623
- build(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 by @dependabot in https://github.com/dathere/qsv/pull/2672
- build(deps): bump csvs_convert from 0.11.0 to 0.11.1 by @dependabot in https://github.com/dathere/qsv/pull/2686
- build(deps): bump data-encoding from 2.8.0 to 2.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2685
- build(deps): bump flate2 from 1.1.0 to 1.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2649
- build(deps): bump flexi_logger from 0.29.8 to 0.30.0 by @dependabot in https://github.com/dathere/qsv/pull/2650
- build(deps): bump flexi_logger from 0.30.0 to 0.30.1 by @dependabot in https://github.com/dathere/qsv/pull/2651
- build(deps): bump governor from 0.8.1 to 0.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2625
- build(deps): bump governor from 0.9.0 to 0.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2631
- build(deps): bump jsonschema from 0.29.0 to 0.29.1 by @dependabot in https://github.com/dathere/qsv/pull/2635
- build(deps): bump log from 0.4.26 to 0.4.27 by @dependabot in https://github.com/dathere/qsv/pull/2622
- build(deps): bump mimalloc from 0.1.44 to 0.1.45 by @dependabot in https://github.com/dathere/qsv/pull/2652
- build(deps): bump minijinja from 2.8.0 to 2.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2643
- build(deps): bump minijinja-contrib from 2.8.0 to 2.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2642
- build(deps): bump pyo3 from 0.24.0 to 0.24.1 by @dependabot in https://github.com/dathere/qsv/pull/2645
- build(deps): bump qsv-dateparser from 0.12.1 to 0.13.0 by @dependabot in https://github.com/dathere/qsv/pull/2639
- build(deps): bump qsv-sniffer from 0.10.3 to 0.11.0 by @dependabot in https://github.com/dathere/qsv/pull/2640
- build(deps): bump redis from 0.29.2 to 0.29.4 by @dependabot in https://github.com/dathere/qsv/pull/2663
- build(deps): bump redis from 0.29.4 to 0.29.5 by @dependabot in https://github.com/dathere/qsv/pull/2666
- build(deps): bump smallvec from 1.14.0 to 1.15.0 by @dependabot in https://github.com/dathere/qsv/pull/2656
- build(deps): bump sysinfo from 0.34.0 to 0.34.1 by @dependabot in https://github.com/dathere/qsv/pull/2637
- build(deps): bump sysinfo from 0.34.1 to 0.34.2 by @dependabot in https://github.com/dathere/qsv/pull/2648
- build(deps): bump titlecase from 3.4.0 to 3.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2669
- build(deps): bump tokio from 1.44.1 to 1.44.2 by @dependabot in https://github.com/dathere/qsv/pull/2662
- applied select clippy lint suggestions
- bumped indirect dependencies to latest version
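The "<NONE>" sentinel for QSV_DOTENV_PATH follows a simple pattern: one reserved value switches a feature off instead of naming a file. A hypothetical Rust sketch of that decision; the `DotenvChoice` enum and `dotenv_choice` function are illustrative names, and `UseDefault` stands in for whatever default .env lookup qsv performs.

```rust
#[derive(Debug, PartialEq)]
enum DotenvChoice {
    /// The sentinel "<NONE>" was given: skip dotenv processing entirely.
    Disabled,
    /// An explicit dotenv file path was given.
    Path(String),
    /// Unset or empty: fall back to the default lookup (hypothetical).
    UseDefault,
}

fn dotenv_choice(raw: Option<&str>) -> DotenvChoice {
    match raw {
        Some("<NONE>") => DotenvChoice::Disabled,
        Some(p) if !p.is_empty() => DotenvChoice::Path(p.to_string()),
        _ => DotenvChoice::UseDefault,
    }
}
```

In a real program the raw value would come from `std::env::var("QSV_DOTENV_PATH").ok()`, passed in with `.as_deref()`.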
Fixed
- fix: select panic when idx is out of bounds https://github.com/dathere/qsv/pull/2670
- fix: correct link to qsv-dateparser accepted date formats https://github.com/dathere/qsv/pull/2632
- fix: reset SIGPIPE handling https://github.com/dathere/qsv/pull/2664
- docs: fix typo it's -> its by @rzmk in https://github.com/dathere/qsv/pull/2680
Full Changelog: https://github.com/dathere/qsv/compare/3.3.0...4.0.0
- Rust
Published by jqnatividad 11 months ago
https://github.com/dathere/qsv - 3.3.0
[3.3.0] - 2025-03-23
Highlights:
- stats got another round of improvements:
- Boolean inferencing is now configurable!
Before, it was limited to a simple, English-centric heuristic: when a column's cardinality is 2 and the first characters of its 2 values are 0/1, t/f or y/n (case-insensitive), the column's data type is inferred as boolean.
With the new --boolean-patterns <arg> option, we can now specify arbitrary true_pattern:false_pattern pattern pairs. Each pattern can be a string of length > 1, case-insensitive. If a pattern ends with "*", it is treated as a prefix.
For example, t*:f* matches "true", "Truthy" and "T" as boolean true, so long as the corresponding false pattern (e.g. "Fake", "False", "f") is also matched. Bear in mind that the cardinality needs to be 2, so multiple matches on the same column on different patterns will disqualify the field as boolean (e.g. if a column's domain is "True", "truthy" and "False", it doesn't qualify, as its cardinality is 3). On the other hand, if it's "True", "true", "False", "false", "FALSE", it still qualifies, as these resolve to just "true"/"false" case-insensitively.
For backwards compatibility, the default true/false pairs are 1:0,t*:f*,y*:n*.
- Percentiles can now be computed!
By enabling the --percentiles flag, stats will now return the 5th, 10th, 40th, 60th, 90th and 95th percentiles by default, using the nearest-rank method, for all numeric and date/datetime columns. Different percentiles can be requested with the --percentile-list <arg> option.
Note that the method for computing quartiles (Method 3) is basically a specialized implementation of the nearest-rank method for q1 (25th), q2 (50th, or median) and q3 (75th percentile), hence the choice of non-overlapping defaults for --percentile-list.
- frequency got a performance boost now that we're using qsv-stats 0.32.0, which uses the faster foldhash crate.
- In the same vein, by replacing ahash with foldhash suite-wide, qsv got a tad faster when doing hash lookups.
- sample: "streaming" Bernoulli sampling now works for any remotely hosted CSV on servers that support chunked downloads, without requiring range request support.
- We're now using the latest Polars engine: v0.46.0 at the py-1.26.0 tag.
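The configurable boolean rule described above can be sketched in Rust. The sketch follows the stated behavior: values are compared case-insensitively, the column must have exactly two distinct values, a trailing "*" turns a pattern into a prefix match, and one value must match the true pattern while the other matches the false pattern of the same pair. The names `DEFAULT_PATTERNS`, `pattern_matches` and `infer_boolean` are hypothetical, and qsv's actual stats code may differ in details.

```rust
use std::collections::HashSet;

/// The documented default true:false pairs: 1:0, t*:f*, y*:n*.
const DEFAULT_PATTERNS: [(&str, &str); 3] = [("1", "0"), ("t*", "f*"), ("y*", "n*")];

/// A pattern ending in '*' is a prefix match; otherwise it must match exactly.
/// `value_lower` is expected to be lowercased already.
fn pattern_matches(pattern: &str, value_lower: &str) -> bool {
    let p = pattern.to_lowercase();
    match p.strip_suffix('*') {
        Some(prefix) => value_lower.starts_with(prefix),
        None => value_lower == p,
    }
}

fn infer_boolean(values: &[&str], pairs: &[(&str, &str)]) -> bool {
    // Cardinality is counted case-insensitively, so "True"/"true"/"TRUE" collapse to one value.
    let distinct: HashSet<String> = values.iter().map(|v| v.to_lowercase()).collect();
    if distinct.len() != 2 {
        return false;
    }
    let mut it = distinct.iter();
    let (a, b) = (it.next().unwrap(), it.next().unwrap());
    // Either value may be the "true" one; a single pair must cover both.
    pairs.iter().any(|(t, f)| {
        (pattern_matches(t, a) && pattern_matches(f, b))
            || (pattern_matches(t, b) && pattern_matches(f, a))
    })
}
```

With the defaults, a column of "True", "true", "False", "false", "FALSE" is inferred as boolean, while "True", "truthy", "False" is not, because it has three distinct values.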
Added
- stats: add configurable boolean inferencing https://github.com/dathere/qsv/pull/2595
- stats: add --percentiles option https://github.com/dathere/qsv/pull/2617
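A minimal Rust sketch of the nearest-rank method used for --percentiles: sort the values, take rank = ceil(p * n / 100), and return the value at that 1-based rank. The function name is hypothetical, and unlike qsv this f64-only sketch does not handle date/datetime columns.

```rust
/// Nearest-rank percentile: the smallest value such that at least p% of the
/// data is less than or equal to it. Returns None for empty input or p outside [0, 100].
fn nearest_rank_percentile(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    // Multiply before dividing so integer percentiles of integer counts stay exact.
    let rank = (p * n as f64 / 100.0).ceil() as usize;
    // p = 0 would give rank 0; clamp to the valid 1-based range.
    Some(sorted[rank.clamp(1, n) - 1])
}
```

For the values 1 through 20, the default list of 5, 10, 40, 60, 90 and 95 yields 1, 2, 8, 12, 18 and 19.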
Changed
- refactor: replace ahash with faster foldhash https://github.com/dathere/qsv/pull/2619
- replace std assert_eq! macro with similar_asserts::assert_eq! macro for easier debugging https://github.com/dathere/qsv/pull/2605
- deps: bump polars to 0.46.0 at py-1.25.2 tag https://github.com/dathere/qsv/pull/2604
- deps: bump Polars to v0.46.0 at py-1.26.0 tag https://github.com/dathere/qsv/pull/2621
- build(deps): bump actix-web from 4.9.0 to 4.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2591
- build(deps): bump indexmap from 2.7.1 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2592
- build(deps): bump mimalloc from 0.1.43 to 0.1.44 by @dependabot in https://github.com/dathere/qsv/pull/2608
- build(deps): bump qsv-stats from 0.30.0 to 0.31.0 by @dependabot in https://github.com/dathere/qsv/pull/2603
- build(deps): bump qsv-stats from 0.31.0 to 0.32.0 by @dependabot in https://github.com/dathere/qsv/pull/2620
- build(deps): bump reqwest from 0.12.12 to 0.12.13 by @dependabot in https://github.com/dathere/qsv/pull/2593
- build(deps): bump reqwest from 0.12.13 to 0.12.14 by @dependabot in https://github.com/dathere/qsv/pull/2596
- build(deps): bump reqwest from 0.12.14 to 0.12.15 by @dependabot in https://github.com/dathere/qsv/pull/2609
- build(deps): bump rfd from 0.15.2 to 0.15.3 by @dependabot in https://github.com/dathere/qsv/pull/2597
- build(deps): bump rust_decimal from 1.37.0 to 1.37.1 by @dependabot in https://github.com/dathere/qsv/pull/2616
- build(deps): bump simd-json from 0.14.3 to 0.15.0 by @dependabot in https://github.com/dathere/qsv/pull/2615
- build(deps): bump tempfile from 3.18.0 to 3.19.0 by @dependabot in https://github.com/dathere/qsv/pull/2602
- build(deps): bump tempfile from 3.19.0 to 3.19.1 by @dependabot in https://github.com/dathere/qsv/pull/2612
- build(deps): bump uuid from 1.15.1 to 1.16.0 by @dependabot in https://github.com/dathere/qsv/pull/2601
- build(deps): bump zip from 2.2.3 to 2.4.1 by @dependabot in https://github.com/dathere/qsv/pull/2607
- apply select clippy lint suggestions
- bumped indirect dependencies to latest version
- set Rust nightly to 2025-03-07, the same version Polars uses https://github.com/dathere/qsv/commit/17f6bdb3f80c5798d154a133428f0ca6ff59fc79
Fixed
- updated lock file, primarily to fix CVE-2025-29787 https://github.com/dathere/qsv/commit/e44e5df3fd296fcf85293d46a7afe08f40b86693
- luau: fix flaky registerlookuptable CI test that only intermittently fails in Windows by using a buffered writer in the lookupwrite_cache_file helper https://github.com/dathere/qsv/commit/f494b46d334259d370c92cd8cc6b211bc81c244a
- sample: refactor "streaming" Bernoulli sampling so it actually works without requiring range requests support https://github.com/dathere/qsv/pull/2600
Full Changelog: https://github.com/dathere/qsv/compare/3.2.0...3.3.0
- Rust
Published by jqnatividad 11 months ago
https://github.com/dathere/qsv - 3.2.0
[3.2.0] - 2025-03-09
Added
sample: "streaming" bernoulli sampling of remote files when hosted on servers with range requests support https://github.com/dathere/qsv/pull/2588
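Bernoulli sampling suits a "streaming" mode because each row is kept independently with probability p: the sampler needs a single pass and constant memory, and never has to know the total row count. A Rust sketch, using a small SplitMix64 generator as a dependency-free stand-in for an RNG crate such as rand; the generator, the `bernoulli_sample` name and the seed handling are assumptions, not qsv's code.

```rust
/// Minimal SplitMix64 PRNG, standing in for a real RNG crate in this sketch.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Keeps each row independently with probability `p`, lazily, in one pass.
fn bernoulli_sample<T, I>(rows: I, p: f64, seed: u64) -> impl Iterator<Item = T>
where
    I: Iterator<Item = T>,
{
    let mut rng = SplitMix64(seed);
    rows.filter(move |_| rng.next_f64() < p)
}
```

Because each keep/drop decision depends only on the RNG, rows can be handled as they arrive, which is presumably why the mode can work on a remote file read as a stream; the sample size is random, averaging p times the row count.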
Changed
- Updated benchmarks.sh to add Homebrew installation prompt by @ondohotola in https://github.com/dathere/qsv/pull/2575
- feat: migrate to Rust 2024 edition https://github.com/dathere/qsv/pull/2587
- deps: bump luau from 0.660 to 0.663 https://github.com/dathere/qsv/pull/2567
- deps: bump polars to 0.46.0 at py-1.24.0 tag https://github.com/dathere/qsv/commit/f70ce71ffa2d822aaa511e66bd11a2789786c82e
- deps: replace deprecated simple-home-dir with directories crate https://github.com/dathere/qsv/commit/6768cd59baa20b23ac9152cc8a9ce176d9e2c362
- deps: bump arrow from 54.2.0 to 54.2.1 https://github.com/dathere/qsv/commit/fc479b2b87843a370e072248e9b6331de690f0a2
- build(deps): bump bytemuck from 1.21.0 to 1.22.0 by @dependabot in https://github.com/dathere/qsv/pull/2570
- build(deps): bump console from 0.15.10 to 0.15.11 by @dependabot in https://github.com/dathere/qsv/pull/2569
- build(deps): bump governor from 0.8.0 to 0.8.1 by @dependabot in https://github.com/dathere/qsv/pull/2562
- build(deps): bump minijinja from 2.7.0 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2573
- build(deps): bump minijinja-contrib from 2.7.0 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2571
- build(deps): bump pyo3 from 0.23.4 to 0.23.5 by @dependabot in https://github.com/dathere/qsv/pull/2558
- build(deps): bump pyo3 from 0.23.5 to 0.24.0 by @dependabot in https://github.com/dathere/qsv/pull/2590
- build(deps): bump redis from 0.29.0 to 0.29.1 by @dependabot in https://github.com/dathere/qsv/pull/2568
- build(deps): bump robinraju/release-downloader from 1.11 to 1.12 by @dependabot in https://github.com/dathere/qsv/pull/2580
- build(deps): bump serde_json from 1.0.139 to 1.0.140 by @dependabot in https://github.com/dathere/qsv/pull/2572
- build(deps): bump tempfile from 3.17.1 to 3.18.0 by @dependabot in https://github.com/dathere/qsv/pull/2581
- build(deps): bump uuid from 1.14.0 to 1.15.0 by @dependabot in https://github.com/dathere/qsv/pull/2563
- build(deps): bump uuid from 1.15.0 to 1.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2566
- applied select clippy lint suggestions
- bumped indirect dependencies to latest versions
Fixed
- apply: fix currencytonum handling of "0.00" value by adding parsing strictness control with --formatstr option https://github.com/dathere/qsv/pull/2586
- describegpt: fix panic by adding error handling when LLM API response is not in expected format https://github.com/dathere/qsv/pull/2577
- tojsonl: fix display of floats as per the JSON spec https://github.com/dathere/qsv/pull/2583
New Contributors
- @ondohotola made their first contribution in https://github.com/dathere/qsv/pull/2575
Full Changelog: https://github.com/dathere/qsv/compare/3.1.1...3.2.0
- Rust
Published by jqnatividad 12 months ago
https://github.com/dathere/qsv - 3.1.1
[3.1.1] - 2025-02-24
Highlights:
- sample is now a "smart" command that uses the stats cache to validate and make sampling faster.
- With the QSV_STATSCACHE_MODE env var, you can now control the stats cache behavior suite-wide, making sure "smart" commands use it when appropriate.
- The luau command's capabilities have been significantly expanded with:
- New accumulate helper function for aggregating values across rows
- Optional naming for cumulative helper functions
- More robust error handling and improved docstrings
- Enhanced scripting performance with fast-float parsing
- new Wiki section with examples of using its helper functions
- schema now does type-aware sorting of enum lists, making it easier to customize JSON Schema enum lists when fine-tuning them for JSON Schema validation with validate.
- lens adds a --freeze-columns option with a default of 1, improving navigation of wide CSVs.
- stats adds a --dataset-stats option to explicitly compute dataset-level statistics. Starting with qsv 2.0.0, these were computed automatically to support Datapusher+ and the DRUF workflow, but that was causing confusion for some command-line users.
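Type-aware sorting matters because a plain string sort puts "10" before "9". A hedged Rust sketch of the idea, assuming the rule is "numeric order when every value parses as a number, otherwise lexical order"; `type_aware_sort` is a hypothetical name, and qsv's schema command may treat dates or mixed types differently.

```rust
/// Sorts enum candidates numerically when all parse as integers (or all as floats),
/// falling back to plain lexical (byte-wise) order otherwise.
fn type_aware_sort(values: &mut [String]) {
    if values.iter().all(|v| v.parse::<i64>().is_ok()) {
        values.sort_by_key(|v| v.parse::<i64>().unwrap());
    } else if values.iter().all(|v| v.parse::<f64>().is_ok()) {
        values.sort_by(|a, b| {
            let (x, y) = (a.parse::<f64>().unwrap(), b.parse::<f64>().unwrap());
            x.total_cmp(&y)
        });
    } else {
        values.sort();
    }
}
```

Trying integers before floats keeps integer enums exact, while the float branch still orders mixed values such as "-1" and "2.5" correctly.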
Added
- lens: added --freeze-columns option https://github.com/dathere/qsv/pull/2552
- luau: added accumulate helper function https://github.com/dathere/qsv/pull/2537 https://github.com/dathere/qsv/pull/2539
- luau: added a new section in the Wiki with examples of using the new helper functions https://github.com/dathere/qsv/wiki/Luau-Helper-Functions-Examples
- sample: is now "smart", using the stats cache to validate and make sampling faster https://github.com/dathere/qsv/pull/2529 https://github.com/dathere/qsv/pull/2530 https://github.com/dathere/qsv/commit/71ec7ede121ef1e09fb19af9bac3f52aa67a7f54
- schema: added type-aware sort of JSON Schema enum list https://github.com/dathere/qsv/pull/2551
- stats: added --dataset-stats option https://github.com/dathere/qsv/pull/2555
- python: added precompiled qsvpy binary for Python 3.13 https://github.com/dathere/qsv/commit/c4087788b6fee64f358047ea8ef44a5450604ec1
- added QSV_STATSCACHE_MODE env var to control stats cache suite-wide https://github.com/dathere/qsv/commit/4afb98d8729fa4c3c5f61e0a26347dad5aa1e9f8 https://github.com/dathere/qsv/commit/2adc313937ec8aa292976f8e5acf3a4e7756fd93 https://github.com/dathere/qsv/commit/ba75f0897e5a7e6579380a8a4c073a1af436648a
- docs: updated PERFORMANCE docs and added a TLDR version https://github.com/dathere/qsv/commit/77ed167aef8f7307ec295616a8b96af2f3bb81fd https://github.com/dathere/qsv/commit/c61c249a8354ee7f4ab0d03464624f3dd3249d2b https://github.com/dathere/qsv/commit/db0bb3f147599ece48ca2e8ad1d54db83d7b897c
- chore: added *.tab & *.ssv to typos config https://github.com/dathere/qsv/commit/523667520ac06a1c96942897aa9288fe7a9d1f5d
Changed
- frequency: made error handling more robust https://github.com/dathere/qsv/commit/b195519ec04efcba7cfa7f99e153818d03f419d0
- luau: refactored all cumulative helper functions (cum_) so they now take name as an optional argument https://github.com/dathere/qsv/pull/2540
- schema: refactored to use QSV_STATSCACHE_MODE env var https://github.com/dathere/qsv/commit/5771ff4892ab89f8ca7d6940aa02baaa0c9b1fa5
- select: refactored select helper https://github.com/dathere/qsv/commit/bfbe64cc64a20006e4c93d8a3f6be3f326411fec
- stats: optimized memory layout of central Stats struct https://github.com/dathere/qsv/commit/52f697e5828a5c3e059d7f25254e4aef840d8598
- stats: optimized record_count functionality https://github.com/dathere/qsv/commit/0e3114a54a8340639c381a19251d03ab94496b04 https://github.com/dathere/qsv/commit/18791da0cc2972de2f5909fe1556d83c8b7e8f9f
- contrib(completions): update qsv completions for qsv 3.1 by @rzmk in https://github.com/dathere/qsv/pull/2556
- deps: bump arrow and tempfile https://github.com/dathere/qsv/commit/4cc267972622dfb703779b3d18b084006369b449
- deps: bump cached and redis crates https://github.com/dathere/qsv/commit/e622d1447a9a8ff4ecdb22d000335fb2d129683a
- deps: bump csvlens from 0.11 to 0.12 https://github.com/dathere/qsv/commit/b2fd985bf51fac4ec224b4664cc2fe91d8676101
- deps: use our patched fork of csvlens with ability to freeze columns https://github.com/dathere/qsv/commit/d66ec6df0e768f29b1102108152f28028da0ec8b
- deps: bump polars to 0.46.0 at py-1.23.0 tag https://github.com/dathere/qsv/commit/6072aa22bed211cafa2fe90be58386acd8869415
- deps: bump flate2 from 1.0.35 to 1.1.0 https://github.com/dathere/qsv/commit/eed471a441f031d0311849a13ac3efb116baa33d
- deps: bump gzp from 0.11 to 1.0.0 https://github.com/dathere/qsv/commit/43c8a4a414484b9a3d573cb41a713ce838a2d425
- build(deps): bump jaq-json from 1.1.0 to 1.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2547
- build(deps): bump jaq-core from 2.1.0 to 2.1.1 by @dependabot in https://github.com/dathere/qsv/pull/2546
- build(deps): bump log from 0.4.25 to 0.4.26 by @dependabot in https://github.com/dathere/qsv/pull/2545
- build(deps): bump tempfile from 3.16.0 to 3.17.0 by @dependabot in https://github.com/dathere/qsv/pull/2532
- build(deps): bump tempfile from 3.17.0 to 3.17.1 by @dependabot in https://github.com/dathere/qsv/pull/2535
- build(deps): bump serde_json from 1.0.138 to 1.0.139 by @dependabot in https://github.com/dathere/qsv/pull/2541
- build(deps): bump serde from 1.0.217 to 1.0.218 by @dependabot in https://github.com/dathere/qsv/pull/2542
- build(deps): bump smallvec from 1.13.2 to 1.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2528
- build(deps): bump strum from 0.27.0 to 0.27.1 by @dependabot in https://github.com/dathere/qsv/pull/2533
- build(deps): bump strum_macros from 0.27.0 to 0.27.1 by @dependabot in https://github.com/dathere/qsv/pull/2534
- build(deps): bump uuid from 1.13.1 to 1.13.2 by @dependabot in https://github.com/dathere/qsv/pull/2538
- build(deps): bump uuid from 1.13.2 to 1.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2544
- chore: we now have ~1,800 tests! https://github.com/dathere/qsv/commit/f5d09ed76d8e0acb9052f89b6688a047c756b053
- applied select clippy lint suggestions
- bumped indirect dependencies to latest versions
- bumped MSRV to latest Rust stable - v1.85
Fixed
- count: refactored to fall back to the "regular" CSV reader when Polars counting returns a zero count https://github.com/dathere/qsv/commit/fd39bcbd9574d8d5ef1ddc5025eda4748f2a8652
- schema: fixed off-by-one error https://github.com/dathere/qsv/commit/60de090bdf727dd0eaf79ba7058745fdacef07ef
- ensured getstatsrecord helper returns field/stats correctly https://github.com/dathere/qsv/commit/ad86a373d01ea45902d764a46c19f26ad5b01029
- Fixed RUSTSEC-2025-0007: ring is unmaintained https://github.com/dathere/qsv/issues/2548
- stats: only add qsv__value column when --dataset-stats is enabled https://github.com/dathere/qsv/commit/64267d38c4161b8591a6f81e36bea6c7fdbddc70
- skip format check when path starts with temp dir or is a snappy file https://github.com/dathere/qsv/commit/ff8957e77ae4c28a24f323328c58a2549ff43c0c
Removed
- frequency: removed --stats-mode option now that we have a suite-wide QSV_STATSCACHE_MODE env var https://github.com/dathere/qsv/commit/ba75f0897e5a7e6579380a8a4c073a1af436648a https://github.com/dathere/qsv/commit/416abb7ce73f406c2a605cdca87d50c12723698a
- chore: removed simdutf8 conditional directive for aarch64 architecture, now that it's no longer needed https://github.com/dathere/qsv/commit/ec1e16c7a20a7458b560e3c78dfbd83fba82de29
- removed publish-linux-qsvpy-glibc-231-musl-123.yml workflow as it was getting cross compilation errors and we have another musl workflow that works https://github.com/dathere/qsv/commit/7c08617132e8d7df069b7b3be160d3b348f44d53
Full Changelog: https://github.com/dathere/qsv/compare/3.0.0...3.1.1
- Rust
Published by jqnatividad about 1 year ago
https://github.com/dathere/qsv - 3.0.0
[3.0.0] - 2025-02-13
Highlights:
- sample: Five new sampling methods! In addition to reservoir & indexed, we added bernoulli, systematic, stratified, weighted & cluster sampling. And they're all memory efficient, so you should be able to sample arbitrarily large datasets!
- stats: Added "sortiness" [-1 (Descending) to 1 (Ascending)] & "uniqueness_ratio" [0 (many repeated values) to 1 (All unique values)] stats (more info).
The qsv-stats engine was also optimized to squeeze out more performance, with stats now 2.6x faster while using less memory despite the addition of these new stats.
- diff: is now a "smart" command, so that it uses the stats cache to short-circuit diffs if files are identical per their fingerprint hashes, and to validate that the diff key column is all unique.
- The stats cache has been refactored, improving performance for "smart" commands:
- frequency is not only 3.3x faster, it also uses far less memory, as it no longer needs to maintain hashmaps for columns with all unique values.
- tojsonl is 2.25x faster
- schema is 1.4x faster
- luau got a major performance boost with the v0.660 engine upgrade, taking advantage of several compiler optimizations. luau is now up to 3.1x faster!
- validate had a major performance regression, going from 3.295 seconds in v2.1.0 to 13.159 seconds in v2.2.1 in the benchmarks: 4x slower! With the jsonschema 0.29 crate update, validate now clocks in at 3.022 seconds!
- template also got a big boost and is now 2.9x faster with the minijinja 2.7 crate update.
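The two new statistics can be illustrated with the following Rust sketch. The formulas are assumptions chosen to match the ranges given above, not qsv-stats' exact definitions: sortiness here is (ascending adjacent pairs minus descending adjacent pairs) divided by the number of adjacent pairs, so it runs from -1 to 1, and uniqueness_ratio is the number of distinct values divided by the number of values, so it approaches 0 for heavily repeated values and reaches 1 when every value is unique.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical sortiness: +1 for fully ascending, -1 for fully descending;
/// equal neighbours count toward neither direction.
fn sortiness<T: PartialOrd>(values: &[T]) -> Option<f64> {
    if values.len() < 2 {
        return None;
    }
    let (mut asc, mut desc) = (0i64, 0i64);
    for w in values.windows(2) {
        if w[1] > w[0] {
            asc += 1;
        } else if w[1] < w[0] {
            desc += 1;
        }
    }
    Some((asc - desc) as f64 / (values.len() - 1) as f64)
}

/// Hypothetical uniqueness ratio: distinct values / total values.
fn uniqueness_ratio<T: Eq + Hash>(values: &[T]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let distinct: HashSet<&T> = values.iter().collect();
    Some(distinct.len() as f64 / values.len() as f64)
}
```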
Added
- joinp: additional joinp asof join sort and match options https://github.com/dathere/qsv/pull/2486
- stats: add "sortiness" statistic https://github.com/dathere/qsv/pull/2499
- stats: add uniqueness_ratio https://github.com/dathere/qsv/pull/2521
- stats & frequency: add --vis-whitespace option. Fulfills #2501 https://github.com/dathere/qsv/pull/2503
- sample: add more sampling methods (in addition to indexed and reservoir, added bernoulli, systematic, stratified, weighted & cluster sampling) and made them all memory efficient so we can sample arbitrarily large datasets: https://github.com/dathere/qsv/pull/2507 & https://github.com/dathere/qsv/pull/2511
- diff: make diff a "smart" command. Fulfills #2493 and #2509 https://github.com/dathere/qsv/pull/2518
- benchmarks: added new benchmarks for sample for new sampling methods https://github.com/dathere/qsv/commit/d758c54effcef31dbc1c1eb40e0c1789050eeb34
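Of the new sampling methods, systematic sampling is the simplest to show: after skipping a starting offset, take every k-th row. A Rust sketch using only iterator adapters; the function names and parameters are hypothetical, and in practice the start offset is often drawn at random from [0, interval) with the interval derived from the desired sample size.

```rust
/// Interval needed to draw roughly `sample_size` rows from `population` rows.
fn interval_for(population: usize, sample_size: usize) -> usize {
    (population / sample_size.max(1)).max(1)
}

/// Systematic sample: skip `start` rows, then yield every `interval`-th row.
fn systematic_sample<T, I>(rows: I, start: usize, interval: usize) -> impl Iterator<Item = T>
where
    I: Iterator<Item = T>,
{
    assert!(interval > 0, "interval must be at least 1");
    // step_by yields the first remaining item, then every `interval`-th after it.
    rows.skip(start).step_by(interval)
}
```

Like Bernoulli sampling, this needs only one pass and constant memory, but it yields a predictable sample size once the row count is known.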
Changed
- luau: bump from 0.653 to 0.660 and optimize for performance https://github.com/dathere/qsv/commit/4402df6788205341552b4f4e43220ea49924a28e https://github.com/dathere/qsv/commit/de429b4bb858a7872e30eccbdb3e526ad0ea322b https://github.com/dathere/qsv/commit/07ff8b8458a042987c9d11cae5b5b1dfaa934097 https://github.com/dathere/qsv/commit/3211f5c84fc23b652e4d7da83098e7db46829081
- stats: compute string len stats only for string columns https://github.com/dathere/qsv/pull/2495
- contrib(completions): update qsv completions for qsv 2.2.1 by @rzmk in https://github.com/dathere/qsv/pull/2494
- deps: bump polars to latest upstream after its py-1.22.0 release
- deps: backported csv-core 0.1.12 fix to our qsv-optimized csv-core fork https://github.com/dathere/rust-csv/commit/5d0916e243f365a377b1b0e7c84bcf9585e83f2d
- build(deps): bump actions/setup-python from 5.3.0 to 5.4.0 by @dependabot in https://github.com/dathere/qsv/pull/2488
- build(deps): bump bytes from 1.9.0 to 1.10.0 by @dependabot in https://github.com/dathere/qsv/pull/2497
- build(deps): bump data-encoding from 2.7.0 to 2.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2512
- build(deps): bump geosuggest-core from 0.6.5 to 0.6.6 by @dependabot in https://github.com/dathere/qsv/pull/2520
- build(deps): bump geosuggest-utils from 0.6.5 to 0.6.6 by @dependabot in https://github.com/dathere/qsv/pull/2519
- build(deps): bump jsonschema from 0.28.3 to 0.29.0 by @dependabot in https://github.com/dathere/qsv/pull/2510
- build(deps): bump minijinja from 2.6.0 to 2.7.0 by @dependabot in https://github.com/dathere/qsv/pull/2489
- build(deps): bump mlua from 0.10.2 to 0.10.3 by @dependabot in https://github.com/dathere/qsv/pull/2485
- build(deps): bump qsv-stats from 0.27.0 to 0.28.0 by @dependabot in https://github.com/dathere/qsv/pull/2496
- build(deps): bump qsv-stats from 0.28.0 to 0.29.0 by @dependabot in https://github.com/dathere/qsv/pull/2498
- build(deps): bump qsv-stats from 0.29.0 to 0.30.0 by @dependabot in https://github.com/dathere/qsv/pull/2505
- chore: Bump rand to 0.9 https://github.com/dathere/qsv/pull/2504
- build(deps): bump simple-home-dir from 0.4.6 to 0.4.7 by @dependabot in https://github.com/dathere/qsv/pull/2515
- build(deps): bump uuid from 1.12.1 to 1.13.1 by @dependabot in https://github.com/dathere/qsv/pull/2500
- bumped numerous indirect dependencies to latest versions
- applied select clippy lint suggestions
- bumped MSRV to latest Rust stable - v1.84.1
Fixed
- docs: QSV_AUTOINDEX => QSV_AUTOINDEX_SIZE typo. Fixes #2479 https://github.com/dathere/qsv/pull/2484
- fix: search & searchset off by 1 when using --flag option. Fixes #2508 https://github.com/dathere/qsv/pull/2513
Full Changelog: https://github.com/dathere/qsv/compare/2.2.1...3.0.0
- Rust
Published by jqnatividad about 1 year ago
https://github.com/dathere/qsv - 2.2.1
[2.2.1] - 2025-01-27
Changed
- deps: bumped polars to 0.46.0. This will allow us to publish qsv to crates.io as qsv was using features that were not enabled in polars 0.45.1 https://github.com/dathere/qsv/commit/275b2b8bd3cb41d9ddf30ba721d393d446bd2b48
Fixed
- stats: fix cache json processing bug. Fixes #2476 https://github.com/dathere/qsv/pull/2477
- benchmarks: v6.1.0 - ensured all stats cache benchmarks actually use the stats cache, even though the default --cache-threshold of 5 seconds is too high to trigger stats cache creation https://github.com/dathere/qsv/commit/ac33010260bf55c3424f8baa195f359f10ffe088
Full Changelog: https://github.com/dathere/qsv/compare/2.2.0...2.2.1
- Rust
Published by jqnatividad about 1 year ago
https://github.com/dathere/qsv - 2.2.0
[2.2.0] - 2025-01-26
Highlights:
- stats - the :heart: of qsv - got a little tune-up:
  - It got a tad faster now that we only compute string length stats for string types. Previously, we also computed lengths for numbers, thinking it would be useful for storage sizing purposes (as everything is stored as a string in CSV). But as performance is goal number 1, we no longer do so. Besides, this sizing info can be derived from other stats.
  - Fixed the problem with the stats cache being deleted/ignored even when not necessary. This bug snuck in while implementing the --cache-threshold cache suppression option. With stats getting its cache mojo back, expect near-instant cache-backed responses not only for stats but also for other "automagical" smart commands 🪄.
- diff - @janriemer squashed some bugs without sacrificing diff's ludicrous speed! :wink:
- validate: added dynamicEnum custom JSON Schema keyword column specifier support. You can now specify which column to validate against (by name or by 0-based column index), instead of always using the first column. This works for local & remote lookup files using the http/s://, ckan:// and dathere:// URL schemes.
- extdedup now actually uses a proper on-disk hash table backed by a memory-mapped file. Previously, it was only deduping in-memory, as the odht crate was not properly wired to a memory-mapped file :facepalm: (I took the name of the odht crate literally and thought it was handling it :shrug:). Thanks for the detailed bug report @Svenskunganka!
- JSON query parsing overhaul. The fetch, fetchpost & json commands now use the latest jaq engine, making for faster performance, especially now that we're precompiling and caching the jaq filter.
- Polars engine upgraded. :polar_bear: By two versions! py-polars 1.20.0 and 1.21.0, giving the sqlp, joinp, pivotp & count commands a little boost. :rocket:
NOTE: qsv v2.2.0 is not available on crates.io, as crates.io does not allow enabling unreleased features, and we are awaiting a new version of Polars. As soon as Polars 0.46.0 is published, a new qsv patch release will be published to crates.io. This means that installation option 3 using cargo install will be limited to 1.0.0 - the last qsv version available on crates.io. All other installation and update options for qsv 2.2.0 still work.
Added
- diff: add --delimiter "convenience" option. Fulfills #2447 https://github.com/dathere/qsv/pull/2464
- slice: add stdin and snappy compressed file support https://github.com/dathere/qsv/commit/ab34a623f32bd25d9ff761972f66faa85f510a5d
- validate: add dynamicEnum column specifier support. Fulfills #2470 https://github.com/dathere/qsv/pull/2472
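The release notes say dynamicEnum can now target a lookup column by name or by 0-based index instead of always using the first column, but they don't show the keyword's syntax. The sketch below only illustrates resolving such a column specifier and collecting the allowed values from a lookup table. The function names, and the rule that an all-digit specifier is treated as an index, are assumptions.

```rust
use std::collections::HashSet;

/// Resolves a column specifier against a header row.
/// Assumption for this sketch: a specifier that parses as an integer is a
/// 0-based index; anything else is matched against header names.
fn resolve_column(spec: &str, headers: &[&str]) -> Option<usize> {
    match spec.parse::<usize>() {
        Ok(idx) if idx < headers.len() => Some(idx),
        Ok(_) => None, // numeric, but out of range
        Err(_) => headers.iter().position(|h| *h == spec),
    }
}

/// Builds the set of allowed enum values from the chosen lookup-table column.
/// Rows that are too short to contain the column are ignored.
fn allowed_values<'a>(
    headers: &[&str],
    rows: &[Vec<&'a str>],
    spec: &str,
) -> Option<HashSet<&'a str>> {
    let col = resolve_column(spec, headers)?;
    Some(rows.iter().filter_map(|row| row.get(col).copied()).collect())
}
```

Treating digits as an index first means a header literally named "2" cannot be selected by name under this rule; any real implementation has to choose some precedence between the two interpretations.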
Changed
- fetch, fetchpost & json: jaq dependency upgrade - from jaq-interpret & jaq-parse to jaq-core/jaq-json/jaq-std https://github.com/dathere/qsv/pull/2458
- fetch & fetchpost: cache compiled jaq filter https://github.com/dathere/qsv/pull/2467
- joinp: adjust asofby test to reflect Polars py-1.20.0 behavior https://github.com/dathere/qsv/commit/853a266c866aa54598b6b1a3faa253d151a6b472
- stats: compute string length stats for string type only https://github.com/dathere/qsv/pull/2471
- sqlp: wordsmith fastpath explanation https://github.com/dathere/qsv/commit/4e3f85397f67cbe20562e8a84c228b7dc61e4bd7
- refactor: standardize -q and -Q shortcut options. Fulfills #2466 https://github.com/dathere/qsv/pull/2468
- deps: bump polars to 0.45.1 at py-polars-1.20.0 tag https://github.com/dathere/qsv/pull/2448
- deps: bump polars to 0.45.1 at py-polars-1.21.0 tag https://github.com/dathere/qsv/commit/4525d00ecd4845feaac2062d40bb7bc64c13688f
- deps: Bump csv-diff to 0.1.1 by @janriemer in https://github.com/dathere/qsv/pull/2456
- deps: Bump csvlens to latest upstream https://github.com/dathere/qsv/commit/27a723eee4af046920a022605ad6c3476c0962e4
- deps: use latest strum upstream https://github.com/dathere/qsv/commit/2ca1b0d476a20b93c786d0839cc5077e26fd6d88
- build(deps): bump base62 from 2.2.0 to 2.2.1 by @dependabot in https://github.com/dathere/qsv/pull/2440
- build(deps): bump chrono-tz from 0.10.0 to 0.10.1 by @dependabot in https://github.com/dathere/qsv/pull/2449
- build(deps): bump data-encoding from 2.6.0 to 2.7.0 by @dependabot in https://github.com/dathere/qsv/pull/2444
- build(deps): bump indexmap from 2.7.0 to 2.7.1 by @dependabot in https://github.com/dathere/qsv/pull/2461
- build(deps): bump jsonschema from 0.28.1 to 0.28.2 by @dependabot in https://github.com/dathere/qsv/pull/2469
- build(deps): bump jsonschema from 0.28.2 to 0.28.3 by @dependabot in https://github.com/dathere/qsv/pull/2473
- build(deps): bump log from 0.4.22 to 0.4.25 by @dependabot in https://github.com/dathere/qsv/pull/2439
- build(deps): bump semver from 1.0.24 to 1.0.25 by @dependabot in https://github.com/dathere/qsv/pull/2459
- build(deps): bump serde_json from 1.0.135 to 1.0.136 by @dependabot in https://github.com/dathere/qsv/pull/2455
- build(deps): bump serde_json from 1.0.136 to 1.0.137 by @dependabot in https://github.com/dathere/qsv/pull/2460
- build(deps): bump simple-home-dir from 0.4.5 to 0.4.6 by @dependabot in https://github.com/dathere/qsv/pull/2445
- build(deps): bump uuid from 1.11.1 to 1.12.0 by @dependabot in https://github.com/dathere/qsv/pull/2441
- build(deps): bump uuid from 1.12.0 to 1.12.1 by @dependabot in https://github.com/dathere/qsv/pull/2465
- tests: enabled Windows CI caching for faster CI tests
- bumped numerous indirect dependencies to latest versions
- applied select clippy lint suggestions
Fixed
- count: Sometimes, polars count returns zero even if there are rows. Fixed by doing a regular csv reader count when polars count returns zero https://github.com/dathere/qsv/commit/abcd36524d6c26a17a2ecfac54498ecab58fe87c
- diff: Fix name to index conversion by @janriemer. Fixes #2443 https://github.com/dathere/qsv/pull/2457
- extdedup: refactor/fix to actually have an on-disk hash table backed by a mem-mapped file. Fixes #2462 https://github.com/dathere/qsv/pull/2475
- stats: fix stats caching as it was inadvertently deleting the stats cache even when not necessary https://github.com/dathere/qsv/commit/96e6d289d31a2b22345524fb5cc71eca0d6ffae9
Removed
- foreach: refactored to remove unmaintained local-encoding dependency https://github.com/dathere/qsv/pull/2454
- remove polars feature from qsvdp binary variant. We'll use py-polars from DP+ directly.
Full Changelog: https://github.com/dathere/qsv/compare/2.1.0...2.2.0
- Rust
Published by jqnatividad about 1 year ago
https://github.com/dathere/qsv - 2.1.0
[2.1.0] - 2025-01-12
Highlights:
- join & joinp fine-tuning continues, with several join key transformation options (--ignore-leading-zeros & --norm-unicode); join fixes for --right-anti and --right-semi joins; and a revert of a join performance regression introduced in 2.0.0.
- pivotp uses more summary statistics for even smarter aggregation suggestions
NOTE: qsv v2.1.0 is not available on crates.io. This was caused by qsv's use of a brand new string_normalize Polars feature that is not yet available in the latest release of Polars - v0.45.1. Once a new version of Polars is published with this feature, a new qsv patch release will be published to crates.io. This means that installation option 3 using cargo install will be limited to 1.0.0 - the last qsv version available on crates.io. All other installation and update options for qsv 2.1.0 still work.
Added
- join: add --ignore-leading-zeros option https://github.com/dathere/qsv/pull/2430
- joinp: add --norm-unicode option to unicode normalize join keys https://github.com/dathere/qsv/pull/2436
- pivotp: added more smart aggregation suggestions https://github.com/dathere/qsv/pull/2428
- template: added to qsvdp binary variant https://github.com/dathere/qsv/commit/9df85e65dedf130981ab430764b4a4cdc9382dc8
- benchmarks: added pivotp benchmark https://github.com/dathere/qsv/commit/92e4c51cb17e5511f668b4a2cc96d9cab28a4758
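To illustrate the join options above, here is a minimal hash-based sketch of right-semi and right-anti joins with optional leading-zero key normalization. It is not qsv's implementation (joinp is Polars-powered). The function names, and the choice to collapse an all-zero key such as "000" to "0", are assumptions.

```rust
use std::collections::HashSet;

/// Hypothetical key normalizer illustrating the idea behind --ignore-leading-zeros:
/// "00123" and "123" are treated as the same key.
fn normalize_key(key: &str, ignore_leading_zeros: bool) -> &str {
    if !ignore_leading_zeros {
        return key;
    }
    let trimmed = key.trim_start_matches('0');
    // Assumption: an all-zero key such as "000" collapses to "0", not "".
    if trimmed.is_empty() && !key.is_empty() {
        "0"
    } else {
        trimmed
    }
}

/// Right-semi join (`anti = false`): keep right rows whose key matches a left key.
/// Right-anti join (`anti = true`): keep right rows with NO matching left key.
/// Only right-table rows are returned, each at most once.
fn right_filter_join<'r>(
    left_keys: &[&str],
    right_rows: &[(&'r str, &'r str)],
    anti: bool,
    ignore_leading_zeros: bool,
) -> Vec<(&'r str, &'r str)> {
    let left: HashSet<&str> = left_keys
        .iter()
        .map(|k| normalize_key(k, ignore_leading_zeros))
        .collect();
    right_rows
        .iter()
        .filter(|(k, _)| left.contains(normalize_key(k, ignore_leading_zeros)) != anti)
        .copied()
        .collect()
}
```

Because a semi or anti join only filters one side, a set of left keys is enough: each right row is emitted at most once, no matter how many left rows match it.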
Changed
- joinp: refactored --ignore-leading-zeros handling https://github.com/dathere/qsv/pull/2433
- Migrate from unmaintained dynfmt to dynfmt2 https://github.com/dathere/qsv/pull/2421
- deps: bump csvlens to latest upstream https://github.com/dathere/qsv/commit/52c766da43642c2eef6f35819d8e9fb0966700a3
- deps: bump to latest csv qsv-optimized fork https://github.com/dathere/qsv/commit/58ac650abfa51b7b8deb23d1a8917b3983515755
- deps: bumped MiniJinja to 2.6.0 https://github.com/dathere/qsv/commit/8176368434982ba6bd206762c524a3dc47370039
- deps: bump to latest Polars upstream
- deps: bump qsv-stats to 0.26.0
- build(deps): bump azure/trusted-signing-action from 0.5.0 to 0.5.1 by @dependabot in https://github.com/dathere/qsv/pull/2420
- build(deps): bump base62 from 2.0.3 to 2.1.0 by @dependabot in https://github.com/dathere/qsv/pull/2419
- build(deps): bump base62 from 2.1.0 to 2.2.0 by @dependabot in https://github.com/dathere/qsv/pull/2426
- build(deps): bump phf from 0.11.2 to 0.11.3 by @dependabot in https://github.com/dathere/qsv/pull/2417
- build(deps): bump pyo3 from 0.23.3 to 0.23.4 by @dependabot in https://github.com/dathere/qsv/pull/2431
- build(deps): bump serde_json from 1.0.134 to 1.0.135 by @dependabot in https://github.com/dathere/qsv/pull/2416
- build(deps): bump tokio from 1.42.0 to 1.43.0 by @dependabot in https://github.com/dathere/qsv/pull/2423
- build(deps): bump uuid from 1.11.0 to 1.11.1 by @dependabot in https://github.com/dathere/qsv/pull/2427
- apply several clippy suggestions
- bumped numerous indirect dependencies to latest versions
- bumped Rust nightly from 2024-12-19 to 2025-01-05 (same version used by Polars)
- bump MSRV to latest Rust stable - v1.84.0
Fixed
- join: revert optimization that actually resulted in a performance regression https://github.com/dathere/qsv/commit/e42af2b4e9ab9ef4eed43b97e343e253c50a35a1
- join: --right-anti and --right-semi joins didn't swap headers properly https://github.com/dathere/qsv/pull/2435
- count: polars-powered count didn't use the right data type for SQL count(*) https://github.com/dathere/qsv/commit/d8c1524ca0dff4ac19164ccb8090b01fd740b571
Full Changelog: https://github.com/dathere/qsv/compare/2.0.0...2.1.0
- Rust
Published by jqnatividad about 1 year ago
https://github.com/dathere/qsv - 2.0.0
qsv v2.0.0 is here! 🎉
It took 193 releases to get to v1.0.0, and we're already at v2.0.0 a month later!?!
Yes! We wanted a running start for 2025, and qsv 2.0.0 marks qsv's biggest release yet!
- It fully enables the "Data Resource Upload First (DRUF)" workflow, allowing Datapusher+ to infer "automagical metadata" from the data itself. It exposes two Domain Specific Language (DSL) options - Luau and MiniJinja - to enable powerful data transformation and validation capabilities. This allows data stewards to upload data first, then use qsv's DSL capabilities inside DP+ to automatically generate rich metadata - including data dictionaries, field descriptions, data quality rules, and data validation schemas. This "automagical metadata" approach dramatically reduces the friction in compiling high-quality, high-resolution metadata (using the DCAT-US 3.0 specification as a reference) that would otherwise be a manual, laborious, and error-prone process.
Under the hood, the fetchpost, template, stats, validate and luau commands now have the necessary scaffolding to fully support this workflow inside Datapusher+ and ckanext-scheming.
- It adds a new "smart" pivotp command, powered by Polars, to enable fast pivot operations on large datasets. It's "smart" as it uses the stats cache to automatically suggest an aggregation based on a column's data type and summary statistics. You can now pivot your data in seconds by simply specifying the columns to pivot on, while blowing past Excel's PivotTable limitations.
- stats now computes geometric mean and harmonic mean and adds string length stats, all while getting a performance boost.
- join and joinp got a lot of love in this release, with several new options:
  - joinp: non-equi join support! 🎉💯🥳 See the "Lightning Fast and Space Efficient Inequality Joins" paper and this Polars non-equi join tracking issue.
  - join & joinp: --right-anti and --right-semi joins
  - joinp: --ignore-leading-zeros option for join keys
  - joinp: --maintain-order option to maintain the order of either the left or right dataset in the output
  - joinp: expanded --cache-schema options to make joinp smarter/faster by leveraging the stats cache
  - join: --keys-output option to write successfully joined keys to a separate output file.
This release lays the groundwork for the outliers "smart" command to quickly identify outliers using stats/frequency info.
It also sets the stage for an initial implementation of our "Data Concierge" that leverages all the high-quality, high-res metadata we automagically compile with DRUF to enable Metadata Gardening Agents to proactively link seemingly unrelated data and glean insights as it constantly grooms the Data Catalog - effectively making it a FAIR Data Factory.
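The release notes don't say how stats computes the new geometric and harmonic means. As a sketch of the standard definitions, both can be computed in one streaming pass by keeping a running sum of logarithms (geometric mean) and of reciprocals (harmonic mean). The struct is hypothetical, and skipping zero and negative values is an assumption, not documented qsv behavior.

```rust
/// Streaming accumulator for geometric and harmonic means.
/// Assumption: only strictly positive values contribute; zeros, negatives and NaN
/// are skipped, since ln(x) and 1/x are undefined or ill-behaved for them.
#[derive(Debug, Default)]
struct MeanAccumulator {
    count: u64,
    log_sum: f64,   // sum of ln(x); avoids overflowing a running product
    recip_sum: f64, // sum of 1/x
}

impl MeanAccumulator {
    fn add(&mut self, x: f64) {
        if x > 0.0 {
            self.count += 1;
            self.log_sum += x.ln();
            self.recip_sum += x.recip();
        }
    }

    /// Geometric mean: exp(mean(ln x)), i.e. (x1 * x2 * ... * xn)^(1/n).
    fn geometric_mean(&self) -> Option<f64> {
        (self.count > 0).then(|| (self.log_sum / self.count as f64).exp())
    }

    /// Harmonic mean: n / sum(1/x).
    fn harmonic_mean(&self) -> Option<f64> {
        (self.count > 0).then(|| self.count as f64 / self.recip_sum)
    }
}
```

For the values 1, 2 and 4, the geometric mean is 8^(1/3) = 2 and the harmonic mean is 3 / (1 + 0.5 + 0.25) = 12/7 ≈ 1.714.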
Added
- fetchpost: add --globals-json option https://github.com/dathere/qsv/pull/2357
- fixlengths: add --remove-empty option; refactored for performance. Fulfills #2391. https://github.com/dathere/qsv/pull/2411
- join: add --keys-output option. Fulfills #2407. https://github.com/dathere/qsv/pull/2408
- join: add --right-anti and --right-semi options. Fulfills #2379. https://github.com/dathere/qsv/pull/2380
- joinp: add non-equi join support! 🎉💯🥳 https://github.com/dathere/qsv/pull/2409
- joinp: add --ignore-leading-zeros option. Fulfills #2398. https://github.com/dathere/qsv/pull/2400
- joinp: add --maintain-order option https://github.com/dathere/qsv/pull/2338
- joinp: add --right-anti and --right-semi options. Fulfills #2377. https://github.com/dathere/qsv/pull/2378
- luau: add additional helper functions. Fulfills #1782. https://github.com/dathere/qsv/pull/2362
- luau: add qsv_writejson helper https://github.com/dathere/qsv/pull/2375
- pivotp: new Polars-powered command. Fulfills #799. https://github.com/dathere/qsv/pull/2364
- pivotp: "smart" pivotp. https://github.com/dathere/qsv/pull/2367
- stats: add geometric mean and harmonic mean. Fulfills #2227. https://github.com/dathere/qsv/pull/2342
- stats: add string length stats to set the stage for the upcoming outliers "smart" command to quickly identify outliers using stats/frequency info https://github.com/dathere/qsv/pull/2390
- template: add --globals-json option https://github.com/dathere/qsv/pull/2356
- tojsonl: add --quiet option. Fulfills #2335. https://github.com/dathere/qsv/pull/2336
- validate: add --validate-schema option to check if the JSON Schema itself is valid https://github.com/dathere/qsv/pull/2393
- contrib(completions): add joinp --ignore-case and slice --invert by @rzmk in https://github.com/dathere/qsv/pull/2322
- contrib(completions): add --quiet to tojsonl by @rzmk in https://github.com/dathere/qsv/pull/2337
- ci: add qsv glibc2.31-headless to action by @rzmk in https://github.com/dathere/qsv/pull/2330
- Add license to MSI installer by @rzmk in https://github.com/dathere/qsv/pull/2321
Changed
- lens: optimized csvlens library usage, dropping clap dependency https://github.com/dathere/qsv/pull/2403
- pivotp: an even smarter pivotp https://github.com/dathere/qsv/pull/2368
- stats: performance boost https://github.com/dathere/qsv/commit/51349ba8f0121804a1a6766371f1e17c0da800b6
- Update deb package by @tino097 in https://github.com/dathere/qsv/pull/2226
- ci: attempt using files-folder instead of files by @rzmk in https://github.com/dathere/qsv/pull/2320
- Setting QSV_FREEMEMORY_HEADROOM_PCT to 0 disables memory availability check https://github.com/dathere/qsv/pull/2353
- build(deps): bump actix-governor from 0.7.0 to 0.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2351
- build(deps): bump bytemuck from 1.20.0 to 1.21.0 by @dependabot in https://github.com/dathere/qsv/pull/2361
- build(deps): bump chrono from 0.4.38 to 0.4.39 by @dependabot in https://github.com/dathere/qsv/pull/2345
- build(deps): bump crossbeam-channel from 0.5.13 to 0.5.14 by @dependabot in https://github.com/dathere/qsv/pull/2354
- build(deps): bump flexi_logger from 0.29.6 to 0.29.7 by @dependabot in https://github.com/dathere/qsv/pull/2348
- build(deps): bump governor from 0.7.0 to 0.8.0 by @dependabot in https://github.com/dathere/qsv/pull/2347
- build(deps): bump itertools from 0.13.0 to 0.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2413
- build(deps): bump jsonschema from 0.26.1 to 0.26.2 by @dependabot in https://github.com/dathere/qsv/pull/2355
- build(deps): bump jsonschema from 0.26.2 to 0.27.0 by @dependabot in https://github.com/dathere/qsv/pull/2371
- build(deps): bump jsonschema from 0.27.1 to 0.28.0 by @dependabot in https://github.com/dathere/qsv/pull/2389
- build(deps): bump jsonschema from 0.28.0 to 0.28.1 by @dependabot in https://github.com/dathere/qsv/pull/2396
- bump polars from 0.44.2 to 0.45 https://github.com/dathere/qsv/pull/2340
- build(deps): bump polars from 0.45.0 to 0.45.1 by @dependabot in https://github.com/dathere/qsv/pull/2344
- bump pyo3 from 0.22 to 0.23 now that Polars supports it https://github.com/dathere/qsv/pull/2352
- build(deps): bump redis from 0.27.5 to 0.27.6 by @dependabot in https://github.com/dathere/qsv/pull/2331
- build(deps): bump reqwest from 0.12.9 to 0.12.11 by @dependabot in https://github.com/dathere/qsv/pull/2385
- build(deps): bump reqwest from 0.12.11 to 0.12.12 by @dependabot in https://github.com/dathere/qsv/pull/2395
- build(deps): bump rfd from 0.15.1 to 0.15.2 by @dependabot in https://github.com/dathere/qsv/pull/2404
- build(deps): bump serde from 1.0.215 to 1.0.216 by @dependabot in https://github.com/dathere/qsv/pull/2349
- build(deps): bump serde from 1.0.216 to 1.0.217 by @dependabot in https://github.com/dathere/qsv/pull/2384
- build(deps): bump serde_json from 1.0.133 to 1.0.134 by @dependabot in https://github.com/dathere/qsv/pull/2365
- build(deps): bump sysinfo from 0.32.1 to 0.33.0 by @dependabot in https://github.com/dathere/qsv/pull/2334
- build(deps): bump sysinfo from 0.33.0 to 0.33.1 by @dependabot in https://github.com/dathere/qsv/pull/2383
- deps: bump tabwriter to 1.4.1 https://github.com/dathere/qsv/commit/bbcbeba193b7b1808bcd359c460fb688b49107f0
- build(deps): bump tokio from 1.41.1 to 1.42.0 by @dependabot in https://github.com/dathere/qsv/pull/2333
- build(deps): bump xxhash-rust from 0.8.12 to 0.8.13 by @dependabot in https://github.com/dathere/qsv/pull/2359
- build(deps): bump xxhash-rust from 0.8.13 to 0.8.14 by @dependabot in https://github.com/dathere/qsv/pull/2372
- build(deps): bump xxhash-rust from 0.8.14 to 0.8.15 by @dependabot in https://github.com/dathere/qsv/pull/2392
- apply several clippy suggestions
- bumped numerous indirect dependencies to latest versions
- bumped Rust nightly from 2024-11-28 to 2024-12-19 (same version used by Polars)
Fixed
- joinp: refactor --cache-schema option. Resolves #2369. https://github.com/dathere/qsv/pull/2370
- extsort: fix underflow in CSV mode. Resolves #2391. https://github.com/dathere/qsv/pull/2412
- instantiate logger properly https://github.com/dathere/qsv/commit/9c0c1a7a63ef3773e599f6fa91e6fa3b734936df
- fix util::get_stats_records() to no longer infer boolean in StatsMode::PolarsSchema. Resolves #2369. https://github.com/dathere/qsv/commit/cebb6642daf8b528ed8c95be9fc47709abe1c50a
Full Changelog: https://github.com/dathere/qsv/compare/1.0.0...2.0.0
- Rust
Published by jqnatividad about 1 year ago
https://github.com/dathere/qsv - 1.0.0
qsv v1.0.0 is here! 🎉
After over 3 years of development, nearly 200 releases, and 11,000+ commits, qsv has finally reached v1.0.0!
What started as a hobby project to learn Rust during COVID has evolved into a powerful data wrangling tool used in multiple datHere products, open source projects, and even in several mission-critical production environments!
To mark this major milestone, this larger than usual release includes major performance improvements, new features, and various optimizations!
Added
- joinp: add --ignore-case option https://github.com/dathere/qsv/pull/2287
- py: add ability to load python expression from file https://github.com/dathere/qsv/pull/2295
- replace: add --not-one flag (resolves #2305) by @rzmk in https://github.com/dathere/qsv/pull/2307
- slice: add --invert option https://github.com/dathere/qsv/pull/2298
- stats: add dataset-level stats https://github.com/dathere/qsv/pull/2297
- sqlp: auto-decompression of gzip, zstd & zlib compressed csv files with read_csv table function (implements suggestion from @wardi in #2301) https://github.com/dathere/qsv/pull/2315
- template: add lookup support https://github.com/dathere/qsv/pull/2313
- added ui feature to make it easier to make a headless build of qsv https://github.com/dathere/qsv/pull/2289
- added better panic handling https://github.com/dathere/qsv/pull/2304
- added new benchmark for template command https://github.com/dathere/qsv/commit/cd7e480de5ff1e2766a16b8d21767b76fbf10d35
- added 📚 lookup support legend https://github.com/dathere/qsv/commit/b46de73f57ba35ee08581a4f20809a5f581d461b
Changed
- move qsv from personal Github repo to datHere GitHub org https://github.com/dathere/qsv/pull/2317
- template: parallelized template rendering for significant speedups https://github.com/dathere/qsv/pull/2273
- simplify input format check https://github.com/dathere/qsv/pull/2309
- bump embedded luau from 0.650 to 0.653 https://github.com/dathere/qsv/commit/986a1d3b4e60f15c25ef8a157c7e9e205ae8e7a9
- deps: Switch back to simple-home-dir from simple-expand-tilde https://github.com/dathere/qsv/pull/2319
- deps: Add minijinja contrib https://github.com/dathere/qsv/pull/2276
- deps: bump pyo3 down to 0.21.2 because polars-mem-engine is not compatible with pyo3 0.23.x yet https://github.com/dathere/qsv/commit/7f9fc8a6cfe94a104d33e895ecae11e2f40274ee
- build(deps): bump base62 from 2.0.2 to 2.0.3 by @dependabot in https://github.com/dathere/qsv/pull/2281
- build(deps): bump bytemuck from 1.19.0 to 1.20.0 by @dependabot in https://github.com/dathere/qsv/pull/2299
- build(deps): bump bytes from 1.8.0 to 1.9.0 by @dependabot in https://github.com/dathere/qsv/pull/2314
- build(deps): bump file-format from 0.25.0 to 0.26.0 by @dependabot in https://github.com/dathere/qsv/pull/2277
- build(deps): bump hashbrown from 0.15.1 to 0.15.2 by @dependabot in https://github.com/dathere/qsv/pull/2310
- build(deps): bump itoa from 1.0.11 to 1.0.12 by @dependabot in https://github.com/dathere/qsv/pull/2300
- build(deps): bump itoa from 1.0.12 to 1.0.13 by @dependabot in https://github.com/dathere/qsv/pull/2302
- build(deps): bump itoa from 1.0.13 to 1.0.14 by @dependabot in https://github.com/dathere/qsv/pull/2311
- build(deps): bump mlua from 0.10.0 to 0.10.1 by @dependabot in https://github.com/dathere/qsv/pull/2280
- build(deps): bump mlua from 0.10.1 to 0.10.2 by @dependabot in https://github.com/dathere/qsv/pull/2316
- build(deps): bump serial_test from 3.1.1 to 3.2.0 by @dependabot in https://github.com/dathere/qsv/pull/2279
- build(deps): bump minijinja from 2.4.0 to 2.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2284
- build(deps): bump minijinja-contrib from 2.3.1 to 2.5.0 by @dependabot in https://github.com/dathere/qsv/pull/2283
- build(deps): bump rfd from 0.15.0 to 0.15.1 by @dependabot in https://github.com/dathere/qsv/pull/2291
- build(deps): bump sanitize-filename from 0.5.0 to 0.6.0 by @dependabot in https://github.com/dathere/qsv/pull/2275
- build(deps): bump serde from 1.0.214 to 1.0.215 by @dependabot in https://github.com/dathere/qsv/pull/2286
- build(deps): bump serde_json from 1.0.132 to 1.0.133 by @dependabot in https://github.com/dathere/qsv/pull/2292
- build(deps): bump tempfile from 3.13.0 to 3.14.0 by @dependabot in https://github.com/dathere/qsv/pull/2278
- build(deps): bump tokio from 1.41.0 to 1.41.1 by @dependabot in https://github.com/dathere/qsv/pull/2274
- build(deps): bump url from 2.5.3 to 2.5.4 by @dependabot in https://github.com/dathere/qsv/pull/2306
- applied several clippy suggestions
- bumped numerous indirect dependencies to latest versions
- bumped MSRV to latest Rust stable (1.83.0)
- bumped Rust nightly from 2024-11-01 to 2024-11-28, the same version used by Polars
Fixed
- fix get_stats_records() helper to handle input files with embedded spaces (fixes #2294) https://github.com/dathere/qsv/pull/2296
- added better panic handling (fixes #2301) https://github.com/dathere/qsv/pull/2304
- implement simple format check for input files (fixes #2301) https://github.com/dathere/qsv/pull/2308
Removed
- removed simple-expand-tilde dependency in favor of simple-home-dir https://github.com/dathere/qsv/pull/2318
- removed patched fork of indicatif now that 0.17.9 is released, fixing GH unmaintained advisory for instant https://github.com/dathere/qsv/commit/33fa54a1651ce29d286c0e1ff4f3d77bbbd2ffd5
- removed clipboard command from qsvlite binary variant https://github.com/dathere/qsv/commit/9c663d84da49cbbe53d7c9df6bd747cad0d9ba24
Full Changelog: https://github.com/dathere/qsv/compare/0.138.0...1.0.0
- Rust
Published by jqnatividad about 1 year ago
https://github.com/dathere/qsv - 0.138.0
Highlights:
:star: New template command for rendering templates with CSV data.
Generate complex documents from CSVs (form letters, HTML, JSON, XML files, etc.) with the powerful MiniJinja template engine (Example template).
:star: New lookup module for fetching reference data from remote and local files.
In addition to the typical http/https schemes for remote files, qsv adds two additional schemes - CKAN:// and datHere:// - fetching lookup data from a CKAN site or datHere-maintained reference data, respectively. The lookup module also has simple file-based caching to minimize repeated fetching of typically static reference data (default cache age: 600 seconds).
The lookup module is now being used by the luau (for its qsv_register_lookup helper) and validate (for its dynamicEnum custom JSON Schema keyword) commands. More commands will take advantage of this module over time (e.g. apply, geocode, template, sqlp, etc.) to do extended lookups (e.g. look up Census information given spatiotemporal data, like demographic info of a Census tract).
:sparkles: Enhanced fetchpost with MiniJinja templating for payload construction.
Previously, fetchpost was limited to posting url-encoded HTML Form data with content type application/x-www-form-urlencoded. Now, with the new --payload-tpl and --content-type options, users can post request bodies rendered with MiniJinja and specify other content types as well (typically application/json, text/plain, multipart/form-data).
:sparkles: Improved Polars integration with automatic schema detection
The joinp and sqlp commands now use qsv's stats cache to automatically determine column data types, rather than having Polars scan a sample of rows. This provides two key benefits:
- Faster execution by skipping Polars' schema inference step
- GUARANTEED data type inferencing, since the stats cache analyzes the entire dataset, not just a sample
:running: fast-float2 crate for faster float parsing
Casting string/bytes to float is now much faster (2 to 8x faster than Rust's standard library) with fast-float2.
:muscle: Major dependency updates including Polars 0.44.2, Luau 0.650, mlua 0.10.0 and jsonschema 0.26.1
These core crates underpin qsv's advanced commands. Using the latest versions of these crates allows qsv to stay true to its goal of being the fastest and most comprehensive data-wrangling toolkit.
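The highlights above state that the lookup module caches fetched reference files with a default age of 600 seconds, but they don't say how freshness is checked. The sketch below assumes a modification-time check, one common way to build such a file cache. The function names are hypothetical and not part of qsv's API.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Default cache age stated in the release notes for the lookup module.
const DEFAULT_CACHE_AGE: Duration = Duration::from_secs(600);

/// Returns true if a cache entry last modified at `modified` is still fresh at `now`.
/// A modification time in the future (e.g. clock skew) is treated as fresh here.
fn is_cache_fresh(modified: SystemTime, now: SystemTime, max_age: Duration) -> bool {
    match now.duration_since(modified) {
        Ok(age) => age <= max_age,
        Err(_) => true,
    }
}

/// Checks a cached file on disk using its modification time.
/// A missing file is reported as "not fresh" so the caller re-fetches it.
fn cached_file_is_fresh(path: &Path, max_age: Duration) -> io::Result<bool> {
    match fs::metadata(path) {
        Ok(meta) => Ok(is_cache_fresh(meta.modified()?, SystemTime::now(), max_age)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

Relying on file modification times keeps the cache stateless (no separate index to maintain). The trade-off is a dependence on the filesystem clock, which is why the sketch treats future timestamps as fresh rather than failing.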
Added
- added lookup module - enabling fetching and caching of reference data from remote and local files https://github.com/jqnatividad/qsv/pull/2262
- fetchpost: add --payload-tpl <file> and --content-type options to construct payload using MiniJinja with the appropriate content-type https://github.com/jqnatividad/qsv/pull/2268 https://github.com/jqnatividad/qsv/commit/592149867997da6ac56d20a7e7f84252b2baeb2a
- joinp: derive polars schema from stats cache https://github.com/jqnatividad/qsv/commit/86fe22ee4e3677dc702eaf21175c60ceb8166001
- sqlp: derive polars schema from stats cache https://github.com/jqnatividad/qsv/pull/2256
- template: new command to render MiniJinja templates with CSV data https://github.com/jqnatividad/qsv/pull/2267
- validate: add dynamicEnum lookup support https://github.com/jqnatividad/qsv/pull/2265
- contrib(completions): add template command and update fetchpost by @rzmk in https://github.com/jqnatividad/qsv/pull/2269
- add fast-float2 dependency for faster bytes to float conversion https://github.com/jqnatividad/qsv/commit/7590e4ed171eeb6804845e1b54bec0fa26cca706 https://github.com/jqnatividad/qsv/commit/3ca30aa878ed3c4dc58944d46f53fb0c4b955356
- added more benchmarks for new/updated commands https://github.com/jqnatividad/qsv/commit/f8a1d4fff11d78860c102c1375653822ee95ca58 https://github.com/jqnatividad/qsv/commit/cd7e480de5ff1e2766a16b8d21767b76fbf10d35
Changed
- luau: adapt to mlua 0.10 API changes https://github.com/jqnatividad/qsv/commit/268cb45a04a49360befb81af76cc1cddd6307286
- luau: refactored stage management https://github.com/jqnatividad/qsv/commit/31ef58a82b8f80fe0b29260f9170f10220c73714
- luau: now uses the lookup module https://github.com/jqnatividad/qsv/commit/2f4be3473a90252df4fd559a5f3b38246a3da696
- stats: minor perf refactoring https://github.com/jqnatividad/qsv/commit/6cdd6ea94adbae063e7fb6d9da71dac0c86adc12
- build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2243
- build(deps): bump azure/trusted-signing-action from 0.4.0 to 0.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2239
- build(deps): bump bytes from 1.7.2 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2231
- build(deps): bump cached from 0.53.1 to 0.54.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2272
- build(deps): bump flexi_logger from 0.29.3 to 0.29.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2229
- build(deps): bump flexi_logger from 0.29.4 to 0.29.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2261
- build(deps): bump flexi_logger from 0.29.5 to 0.29.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/2266
- build(deps): bump hashbrown from 0.15.0 to 0.15.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2270
- build(deps): bump jsonschema from 0.24.0 to 0.24.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2234
- build(deps): bump jsonschema from 0.24.1 to 0.24.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2238
- build(deps): bump jsonschema from 0.24.2 to 0.24.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2240
- build(deps): bump jsonschema from 0.25.0 to 0.25.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2244
- build(deps): bump jsonschema from 0.26.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2260
- build(deps): bump regex from 1.11.0 to 1.11.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2242
- build(deps): bump reqwest from 0.12.8 to 0.12.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/2258
- build(deps): bump serde from 1.0.210 to 1.0.211 by @dependabot in https://github.com/jqnatividad/qsv/pull/2232
- build(deps): bump serde from 1.0.211 to 1.0.213 by @dependabot in https://github.com/jqnatividad/qsv/pull/2236
- build(deps): bump serde from 1.0.213 to 1.0.214 by @dependabot in https://github.com/jqnatividad/qsv/pull/2259
- build(deps): bump simd-json from 0.14.1 to 0.14.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2235
- build(deps): bump tokio from 1.40.0 to 1.41.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2237
- deps: updated our fork of the csv crate with more perf optimizations https://github.com/jqnatividad/qsv/commit/eae7d764bd31d717bdf123646ea85c81ed829829
- deps: use calamine upstream with unreleased fixes https://github.com/jqnatividad/qsv/commit/4cc7f37e9c34b712ae2c5f43c018b2d6a6655ebb
- deps: use our csvlens fork until the PR removing unneeded arboard features is merged https://github.com/jqnatividad/qsv/commit/bb3232205b7a948848c2949bcaf3b54e54f3d49b
- deps: bump jsonschema from 0.25 to 0.26 https://github.com/jqnatividad/qsv/pull/2251
- deps: bump embedded Luau from 0.640 to 0.650 https://github.com/jqnatividad/qsv/commit/8c54b875bf8768849b128ab15d96c33b02be180b https://github.com/jqnatividad/qsv/commit/aca30b072ecb6bb22d7edbe8ddef348649a5d699
- deps: bump mlua from 0.9 to 0.10 https://github.com/jqnatividad/qsv/pull/2249
- deps: bump Polars from 0.43.1 at py-1.11.0 tag to latest 0.44.2 upstream https://github.com/jqnatividad/qsv/pull/2255 https://github.com/jqnatividad/qsv/commit/0e40a4429b4ef219ab7a11c91767e95778470ef2
- apply select clippy lint suggestions
- updated indirect dependencies
- aligned Rust nightly to Polars nightly - 2024-10-28 - https://github.com/jqnatividad/qsv/commit/245bcb55af416960aa603c05de960289f6125c5c
Fixed
- fix documentation typo: it's → its by @tmtmtmtm in https://github.com/jqnatividad/qsv/pull/2254
Removed
- removed the need to set the RAYON_NUM_THREADS env var; we now call the Rayon API directly https://github.com/jqnatividad/qsv/commit/aa6ef89eceac89c3d1ed19068e0e23a451c4402d
- removed unneeded create_dir_all_threadsafe helper now that std::fs::create_dir_all is threadsafe https://github.com/jqnatividad/qsv/commit/d0af83bfbd0430fa22f039bd00615380110f456e
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.137.0...0.138.0
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.137.0
Highlights:
- extdedup & extsort now support two modes: LINE mode and CSV mode. Previously, both commands only worked on a line-by-line basis (LINE mode).
  With the addition of CSV mode, you can now deduplicate or sort CSV files on a column-by-column basis, with the powerful --select option to specify which columns to deduplicate or sort on. This is especially useful for large CSV files with many columns where you only want to deduplicate or sort on a subset of columns. And since both commands are disk-based and streaming, they can handle files larger than memory.
- sqlp now has a --cache-schema option that caches the schema of the input CSV file, which can significantly speed up subsequent queries on the same file.
- fetch and fetchpost now use the jaq crate (a jq clone for parsing JSON) instead of the jql crate. This change was made to improve performance and to make the commands more consistent with the json command, which also uses jaq. Furthermore, since jaq is a clone of jq, which is widely used and has a large community, its syntax should be more familiar to users.
- stats is a tad faster as we keep squeezing more performance out of this central command.
- validate is now faster and more memory efficient due to optimizations in the jsonschema crate and minor performance improvements in the validate command itself.
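To illustrate what CSV mode adds over LINE mode, here is a minimal in-memory Rust sketch in which rows are compared only on the selected columns, so two rows that differ elsewhere still count as duplicates and sort ties keep their input order. It is not qsv's implementation: the real extdedup/extsort commands are disk-backed and streaming, while this sketch keeps everything in memory and splits on commas naively (no quote handling). The helpers `select_key`, `dedup_on` and `sort_on` are hypothetical names.

```rust
use std::collections::HashSet;

/// Build a comparison key from the selected column indices
/// (a stand-in for qsv's --select option). Naive comma split: no quoting.
fn select_key(line: &str, cols: &[usize]) -> Vec<String> {
    let fields: Vec<&str> = line.split(',').collect();
    cols.iter()
        .map(|&i| fields.get(i).copied().unwrap_or("").to_string())
        .collect()
}

/// Keep the first row for each distinct key, preserving input order.
fn dedup_on(rows: &[&str], cols: &[usize]) -> Vec<String> {
    let mut seen: HashSet<Vec<String>> = HashSet::new();
    rows.iter()
        .filter(|row| seen.insert(select_key(row, cols)))
        .map(|row| row.to_string())
        .collect()
}

/// Stable sort of rows by the selected columns only.
fn sort_on(rows: &[&str], cols: &[usize]) -> Vec<String> {
    let mut out: Vec<String> = rows.iter().map(|r| r.to_string()).collect();
    // sort_by_cached_key computes each key once and keeps equal keys in input order.
    out.sort_by_cached_key(|r| select_key(r, cols));
    out
}

fn main() {
    let rows = ["id,city", "1,Bologna", "2,Rome", "3,Bologna"];
    let data = &rows[1..]; // skip the header row
    // Deduplicate on the "city" column (index 1): row 3 duplicates row 1.
    println!("{:?}", dedup_on(data, &[1]));
    // Sort on "city" only; rows with equal cities keep their input order.
    println!("{:?}", sort_on(data, &[1]));
}
```

The key is built only from the selected fields, which is what lets a column-aware mode treat rows as equal even when unselected columns differ.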
Added
- extdedup: now supports two modes - LINE mode and CSV mode https://github.com/jqnatividad/qsv/pull/2208
- extsort: now also has two modes - CSV mode and LINE mode https://github.com/jqnatividad/qsv/pull/2210
- sqlp: add --cache-schema option https://github.com/jqnatividad/qsv/pull/2224
- added sqlp --cache-schema benchmarks
Changed
- apply & applydp: use smallvec for operations vector & other minor performance optimizations https://github.com/jqnatividad/qsv/pull/2219 & https://github.com/jqnatividad/qsv/commit/bc837ae698f3aee06ea9b846b98ea0c75820a22d
- apply & applydp: specify min_length for parallel iterators https://github.com/jqnatividad/qsv/commit/7d6ce5ec9675755abd5942a5e9e731592961700d
- fetch & fetchpost: replace jql with jaq https://github.com/jqnatividad/qsv/pull/2222
- stats: performance optimizations https://github.com/jqnatividad/qsv/commit/f205809549ac275078a95bc2821a583611955ad0 https://github.com/jqnatividad/qsv/commit/e26c27f58df688d7bfb2185ad54d4fe010b1fccf https://github.com/jqnatividad/qsv/commit/4579c1bfba4eca21d7480694780e39f6966a88a0
- validate: specify min_length for parallel iterators https://github.com/jqnatividad/qsv/commit/a5b818562d5db7d65f00e5acd2c8bf7d44bd869a
- build(deps): bump calamine from 0.26.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2204
- build(deps): bump csvs_convert from 0.8.14 to 0.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2215
- build(deps): bump flexi_logger from 0.29.2 to 0.29.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2209
- build(deps): bump jsonschema from 0.23.0 to 0.24.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2223
- build(deps): bump pyo3 from 0.22.3 to 0.22.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2207
- build(deps): bump pyo3 from 0.22.4 to 0.22.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2212
- build(deps): bump redis from 0.27.3 to 0.27.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2202
- build(deps): bump redis from 0.27.4 to 0.27.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2217
- build(deps): bump serde_json from 1.0.129 to 1.0.130 by @dependabot in https://github.com/jqnatividad/qsv/pull/2218
- build(deps): bump serde_json from 1.0.131 to 1.0.132 by @dependabot in https://github.com/jqnatividad/qsv/pull/2220
- build(deps): bump uuid from 1.10.0 to 1.11.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2213
- apply select clippy lints
- bumped indirect dependencies
- bumped MSRV to 1.82
Fixed:
- fix performance regression in batched commands by refactoring optimal_batch_size to require indexed CSV files https://github.com/jqnatividad/qsv/pull/2206
Removed:
- fetch & fetchpost: removed jql options; replaced with jaq https://github.com/jqnatividad/qsv/pull/2222
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.136.0...0.137.0
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.136.0
:tada: qsv pro is now available in the Microsoft Store! :tada:
It's Data Wrangling Democratized on the Desktop, featuring:
- :bar_chart: Familiar Spreadsheet Interface: tap the power of qsv to query, analyze, enrich, scrub and transform huge Excel files and multi-gigabyte CSV files in seconds, without having to deal with the command line.
- CKAN desktop client: designed to make data publishing easier for portal operators and data stewards using the CKAN platform.
- :inbox_tray: Flow: allows you to build custom node-based flows and data pipelines using a visual interface.
- :wrench: Toolbox: features an ever-expanding library of reusable scripts for common data-wrangling use cases.
- :star: and more! Natural Language Interface (RAG), Polars SQL query support, an API, Python/Luau support, automatic Data Dictionaries, DCAT 3 metadata profile inferencing, along with a retinue of other cloud-based services (e.g. customizable street-level geocoding, data feeds, reference data lookups, geo-ip lookups, cloud storage support, .qsv file format, etc.) that will be unveiled in future versions.
Like qsv, we're iterating rapidly with qsv pro, so your feedback is essential. Give it a try!
Other highlights:
* excel: new --table option for XLSX files; new --header-row option; expanded --range option, adding support for Named Ranges and absolute ranges (e.g. Sheet2!$A$1:$J$10); and expanded metadata export now including Named Ranges and Tables (for XLSX files)
* Improved performance for several commands (apply, datefmt, tojsonl and validate) through automatic batch size optimization
* validate: dynamicEnum custom JSON Schema keyword in validate command (renamed from dynenum) and enhanced email validation
* schema: automatic JSON Schema const inferencing for columns with just one value
* Significant dependency updates, including latest upstream versions of Polars, jsonschema, and serde_json with unreleased performance upgrades, new features and fixes
NOTE: You can see qsv & qsv pro in action in our "The Problem with Data Portals" webinar on Wed, Oct 23, 2024, 1-2pm EDT.
Added
- :tada: qsv pro is now in the Microsoft Store!!! :tada:
- apply, datefmt, tojsonl, validate: added logic to automatically determine optimal batch size for better parallelization https://github.com/jqnatividad/qsv/pull/2178
- enum: added --new-column support for all enum modes, not just --increment https://github.com/jqnatividad/qsv/pull/2173
- excel: new --table option for XLSX files https://github.com/jqnatividad/qsv/pull/2194
- excel: new --header-row option https://github.com/jqnatividad/qsv/commit/458f79ad9f4da504c68d73b48e83ad53b9634027
- excel: expanded range and metadata options https://github.com/jqnatividad/qsv/pull/2195
- schema: added JSON Schema automatic const inferencing https://github.com/jqnatividad/qsv/pull/2180
- Add signing step to qsv MSI installer GitHub Action by @rzmk in https://github.com/jqnatividad/qsv/pull/2182
- contrib(completions): add --table option to qsv excel by @rzmk in https://github.com/jqnatividad/qsv/pull/2197
- completions: add --header-row option to qsv excel https://github.com/jqnatividad/qsv/commit/e8794d569185245f857659cdc299ea86029dd841
- added new apply operations sentiment benchmark https://github.com/jqnatividad/qsv/commit/b745e6438b64686810e4d1df4fa2e6748ba93ff8
- docs: added indexing section to PERFORMANCE.md https://github.com/jqnatividad/qsv/commit/804145a5304091c36728a8cdde4d56f879f71c15
Changed
- stats: various minor micro-optimizations https://github.com/jqnatividad/qsv/commit/62d95fc6db2c34916160db88e4235719749a5f23 https://github.com/jqnatividad/qsv/commit/2c2862a75d6c0b2651516da30a7e6207a0043670
- validate: renamed custom keyword dynenum to dynamicEnum to be more consistent with JSON Schema naming conventions https://github.com/jqnatividad/qsv/compare/0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
- validate: optimizations for increased performance; replace serde_json with simd-json https://github.com/jqnatividad/qsv/compare/0.135.0...master#diff-9783631cdad9e1f47f60266303dc2d56a6e7a486784b61c40961601e8192f7cf
- apply new clippy::ref_option lint to Config::new API https://github.com/jqnatividad/qsv/pull/2192
- Update debian package readme by @tino097 in https://github.com/jqnatividad/qsv/pull/2187
- deps: bump calamine from 0.25 to 0.26 https://github.com/jqnatividad/qsv/commit/b42279a66144264bde9333068c47c530e3945f8c
- deps: use latest jsonschema 0.22.3 upstream with unreleased features/fixes
- deps: use latest polars 0.43.1 upstream with unreleased features/fixes
- deps: created our own fork of unmaintained vader_sentiment crate https://github.com/jqnatividad/qsv/commit/b4267610f39d13eb8939c86f3b5e70033aa95a0c
- deps: use serde_json upstream with unreleased perf improvement/fixes https://github.com/jqnatividad/qsv/blob/1c1174b3b8b65d9dfd9c841597366fb09d0a047c/Cargo.toml#L221
- build(deps): bump flate2 from 1.0.33 to 1.0.34 by @dependabot in https://github.com/jqnatividad/qsv/pull/2171
- build(deps): bump flexi_logger from 0.29.0 to 0.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2189
- build(deps): bump flexi_logger from 0.29.1 to 0.29.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2196
- build(deps): bump hashbrown from 0.14.5 to 0.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2186
- build(deps): bump jsonschema from 0.20.0 to 0.21.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2177
- build(deps): bump jsonschema from 0.22.1 to 0.22.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2191
- build(deps): bump regex from 1.10.6 to 1.11.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2176
- build(deps): bump reqwest from 0.12.7 to 0.12.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/2183
- build(deps): bump simd-json from 0.14.0 to 0.14.1 https://github.com/jqnatividad/qsv/pull/2199
- build(deps): bump simple-expand-tilde from 0.4.2 to 0.4.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2190
- build(deps): bump sysinfo from 0.31.4 to 0.32.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2193
- build(deps): bump tempfile from 3.12.0 to 3.13.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2175
- apply select clippy lints
- bumped indirect dependencies
- aligned Rust nightly to Polars nightly - 2024-09-29 https://github.com/jqnatividad/qsv/commit/7cd2de1151b2299d9b75a9c8b1a3e21dc9c992e2
Fixed
- schema: fix enum so it only adds a list when the number of unique values > --enum-threshold https://github.com/jqnatividad/qsv/pull/2180
- Upload artifact fix for Debian package publishing by @tino097 in https://github.com/jqnatividad/qsv/pull/2168
- fixed typos configuration https://github.com/jqnatividad/qsv/commit/627de891d8fd358aadf8c302552e8a99c54ed959
- fixed various GitHub Actions publishing workflow issues
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.135.0...0.136.0
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.135.0
Highlights
JSON Schema validation just got a whole lot more powerful with the introduction of qsv's custom dynenum keyword!
With dynenum, you can now dynamically look up valid enum values from a CSV (on the filesystem or at a URL), allowing for more flexible and responsive data validation.
Unlike the standard enum keyword, dynenum does not require hardcoding valid values at schema definition time, and can be used to validate data against a changing set of valid values.
For an example, see https://github.com/jqnatividad/qsv/discussions/1872#discussioncomment-10725628.
In an upcoming qsv pro release, we're planning on making dynenum even more powerful by allowing you to easily specify high-value reference data (e.g. US Census data, World Bank data, data.gov, etc.) that is maintained at data.dathere.com and other CKAN instances.
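The core idea behind a dynamic enum check can be sketched in a few lines of Rust: instead of hardcoding the allowed values in the schema, load them at validation time from a lookup CSV, then flag any value not in that set. This is a simplified illustration, not how qsv's validate implements the dynenum JSON Schema keyword: the lookup CSV is passed as an in-memory string rather than a file path or URL, only its first column is read, the header row is skipped, and there is no quote handling. `load_enum_values` and `invalid_values` are hypothetical names.

```rust
use std::collections::HashSet;

/// Load the allowed values from the first column of a lookup CSV
/// (header row skipped; naive comma split, no quoting support).
fn load_enum_values(lookup_csv: &str) -> HashSet<String> {
    lookup_csv
        .lines()
        .skip(1)
        .filter_map(|line| line.split(',').next())
        .map(|v| v.trim().to_string())
        .filter(|v| !v.is_empty())
        .collect()
}

/// Return (1-based row number, value) pairs whose value is not in the allowed set.
fn invalid_values<'a>(values: &[&'a str], allowed: &HashSet<String>) -> Vec<(usize, &'a str)> {
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| !allowed.contains(**v))
        .map(|(i, v)| (i + 1, *v))
        .collect()
}

fn main() {
    // In practice the lookup table would live in a file or at a URL and could change over time.
    let lookup = "code,name\nEUR,Euro\nUSD,US Dollar\n";
    let allowed = load_enum_values(lookup);
    let column = ["EUR", "USD", "XYZ"];
    for (row, value) in invalid_values(&column, &allowed) {
        println!("row {row}: {value:?} is not an allowed value");
    }
}
```

Because the allowed set is rebuilt from the lookup data on each run, updating the reference CSV changes what validates without touching the schema.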
This release also adds the custom currency JSON Schema format, which enables currency validation according to the ISO 4217 standard.
The Polars engine was also upgraded to 0.43.1 at the py-1.81.1 tag - making for various under-the-hood improvements for the sqlp, joinp and count commands, as we set the stage for more Polars-powered features in future releases.
Added
- foreach: enabled foreach command on Windows prebuilt binaries https://github.com/jqnatividad/qsv/commit/def9c8fa98cd214f0db839b64bcd12764dcfba43
- lens: added support for QSV_SNIFF_DELIMITER env var and snappy auto-decompression https://github.com/jqnatividad/qsv/commit/8340e8949c4b60669bc95c432c661a8c374ca422
- sample: add --max-size option https://github.com/jqnatividad/qsv/commit/e845a3cc1dcbbceda86bb7fe132c5040d23ce78b
- validate: added dynenum custom JSON Schema keyword for dynamic validation lookups https://github.com/jqnatividad/qsv/pull/2166
- tests: add tests for https://100.dathere.com/lessons/2 by @rzmk in https://github.com/jqnatividad/qsv/pull/2141
- added stats_sorted and frequency_sorted benchmarks
- added validate_dynenum benchmarks
Changed
- json: add error for empty key and update usage text by @rzmk in https://github.com/jqnatividad/qsv/pull/2167
- prompt: gate prompt command behind prompt feature https://github.com/jqnatividad/qsv/pull/2163
- validate: expanded currency JSON Schema custom format to support ISO 4217 currency codes and alternate formats https://github.com/jqnatividad/qsv/commit/5202508e5c3969b279c20cf80bb1e37d89afd826
- validate: migrate to new jsonschema crate API https://github.com/jqnatividad/qsv/commit/5d6505426c652e7db4bb602c1bf9d302e6a09214
- Update ubuntu version for deb package by @tino097 in https://github.com/jqnatividad/qsv/pull/2126
- contrib(completions): update completions for qsv v0.134.0 and fix subcommand options by @rzmk in https://github.com/jqnatividad/qsv/pull/2135
- contrib(completions): add --max-size completion for sample by @rzmk in https://github.com/jqnatividad/qsv/pull/2142
- deps: bump to polars 0.43.1 at py-1.81.1 https://github.com/jqnatividad/qsv/pull/2130
- deps: switch back to calamine upstream instead of our fork https://github.com/jqnatividad/qsv/commit/677458faa4439b1b34c8a3556687a031ed184e4e
- build(deps): bump actix-governor from 0.5.0 to 0.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2146
- build(deps): bump anyhow from 1.0.87 to 1.0.88 by @dependabot in https://github.com/jqnatividad/qsv/pull/2132
- build(deps): bump arboard from 3.4.0 to 3.4.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2137
- build(deps): bump bytes from 1.7.1 to 1.7.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2148
- build(deps): bump geosuggest-core from 0.6.3 to 0.6.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2153
- build(deps): bump geosuggest-utils from 0.6.3 to 0.6.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/2154
- build(deps): bump jql-runner from 7.1.13 to 7.2.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2165
- build(deps): bump jsonschema from 0.18.1 to 0.18.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2127
- build(deps): bump jsonschema from 0.18.2 to 0.18.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2134
- build(deps): bump jsonschema from 0.18.3 to 0.19.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2144
- build(deps): bump jsonschema from 0.19.1 to 0.20.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2152
- build(deps): bump pyo3 from 0.22.2 to 0.22.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2143
- build(deps): bump rfd from 0.14.1 to 0.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2151
- build(deps): bump simple-expand-tilde from 0.4.0 to 0.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2129
- build(deps): bump qsv_currency from 0.6.0 to 0.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2159
- build(deps): bump qsv_docopt from 1.7.0 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2136
- build(deps): bump redis from 0.26.1 to 0.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2133
- build(deps): bump simdutf8 from 0.1.4 to 0.1.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/2164
- bump indirect dependencies
- apply select clippy lint suggestions
- several usage text/documentation improvements
- bump MSRV to 1.81.0
Fixed
- validate: correct fail_validation_error! macro; reformat error messages to use hyphens as the JSON Schema error message already starts with "error:" https://github.com/jqnatividad/qsv/commit/9a2552481a07759847efe6025b402297ecba7e19
- moved --help output from stderr to stdout as per GNU CLI guidelines https://github.com/jqnatividad/qsv/pull/2138
- lens: fixed parsing of lens options https://github.com/jqnatividad/qsv/commit/1cdd1bcac29fd2411521ac95fa87595de74cbb1b
- searchset: fixed usage text for <regexset-file> https://github.com/jqnatividad/qsv/commit/9a60fb088a326ee97ed1b147c4c3686b6b8aaeeb
- used patched forks of the arrow, csvlens and xlsxwriter crates that replace a dependency on an old version of lexical-core with known soundness issues (https://rustsec.org/advisories/RUSTSEC-2023-0086). Once those crates have updated their lexical-core dependency, we will revert to the original crates.
Removed
- removed prompt command from qsvlite https://github.com/jqnatividad/qsv/pull/2163
- publish: remove lens feature from i686 targets as it does not compile https://github.com/jqnatividad/qsv/commit/959ca7686f8656c98de9257d11f1f762852bdf9d
- deps: remove anyhow dependency https://github.com/jqnatividad/qsv/pull/2150
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.134.0...0.135.0
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.134.0
qsv pro v1 is here! 🎉
If you've been using qsv for a while, even if you're a command-line ninja, you'll find a lot of new capabilities in qsv pro that can make your data wrangling experience even better!
Apart from making qsv easier to use, qsv pro has a multitude of features including: view interactive data tables; browse stats/frequency/metadata; run recipes and tools (scripts); run Polars SQL queries; use Natural Language queries (using Retrieval Augmented Generation (RAG) techniques); regular expression search; export to multiple file formats; download/upload from/to compatible CKAN instances; design custom node-based flows and data pipelines; interact with a local API from external programs including the qsv pro command; run various qsv commands in a graphical user interface; and the list goes on!
And that's just the beginning; there's more to come! You just have to try it!
Download qsv pro v1 now at qsvpro.dathere.com.
Other highlights include:
- pro: new command to allow qsv to interact with the qsv pro API to tap into qsv pro exclusive features.
- lens: new command to interactively view CSVs using the csvlens crate.
- The ludicrously fast diff command is now easier to use with its --drop-equal-fields option. @janriemer continues to work on his csv-diff crate, and there are more diff UX improvements coming soon!
- stats adds sum_length and avg_length "streaming" statistics in addition to the existing min_length and max_length metrics. These are especially useful for datasets with a lot of "free text" columns.
- stats also got "smarter" and "faster" by dog-fooding its own statistics!
  It's a little complicated, but the way stats works is that it first compiles the "streaming" statistics on the fly as it multiplexes loading the data across several threads; the more expensive advanced statistics are then computed "lazily" at the end.
  Since we now compile "sort order" in a streaming manner, we use this info when deriving cardinality at the end to see if we can skip sorting. Sorting is otherwise a necessary step, because cardinality is computed by "scanning" all the sorted values of a column: every time two neighboring values differ, the cardinality count is incremented.
  Apart from this "sort order" optimization, we also improved the "cardinality scan" algorithm, halving its memory footprint and making it faster still for larger datasets by parallelizing the computation. This, in turn, makes the frequency command faster and more memory efficient.
  It's performance tweaks like these that let stats, despite adding six metrics in recent releases (is_ascii, sort_order, sum_length, avg_length, sem for standard error of the mean, and cv for coefficient of variation), still compile 35 statistics and do GUARANTEED data type inferences of a million-row, 41-column, 520 MB sample of NYC's 311 data in 1.327 seconds (753,580 records per second)![^1]
- We now also use our own fork of the csv crate, featuring SIMD-accelerated UTF-8 validation and other minor perf tweaks, making the entire qsv suite faster still!
[^1]: see stats_everything_index benchmark
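The sort-order shortcut for cardinality can be sketched in Rust. During the streaming pass, each new value is compared only with the previous one to track whether the column is still ascending, and length totals are accumulated; at the end, cardinality is computed by scanning sorted values and counting neighbors that differ, sorting a copy first only if the streaming pass could not prove the column was already sorted. This is a simplified single-threaded illustration, not qsv's stats code: `ColumnAccumulator` is a hypothetical name, descending order is not tracked, values are compared as raw strings, lengths are byte lengths, and the parallel cardinality algorithm mentioned above is not shown.

```rust
/// Hypothetical per-column accumulator: "streaming" stats are updated on every
/// value, while cardinality is computed lazily once all values have been seen.
struct ColumnAccumulator {
    values: Vec<String>, // retained for the end-of-run cardinality scan
    ascending: bool,     // streaming sort-order check (descending not tracked here)
    sum_length: usize,   // total byte length of all values
}

impl ColumnAccumulator {
    fn new() -> Self {
        Self { values: Vec::new(), ascending: true, sum_length: 0 }
    }

    /// Streaming update: compare only with the previous value, so the
    /// sort-order check costs O(1) per value.
    fn add(&mut self, value: &str) {
        if let Some(prev) = self.values.last() {
            if prev.as_str() > value {
                self.ascending = false;
            }
        }
        self.sum_length += value.len();
        self.values.push(value.to_string());
    }

    fn avg_length(&self) -> f64 {
        if self.values.is_empty() {
            0.0
        } else {
            self.sum_length as f64 / self.values.len() as f64
        }
    }

    /// Lazy step: sort a copy only if the streaming pass could not prove the
    /// column is already ascending, then count neighbouring values that differ.
    fn cardinality(&self) -> usize {
        let mut sorted_copy: Vec<String>;
        let sorted: &[String] = if self.ascending {
            &self.values // already sorted: the sort is skipped entirely
        } else {
            sorted_copy = self.values.clone();
            sorted_copy.sort_unstable();
            &sorted_copy
        };
        if sorted.is_empty() {
            return 0;
        }
        // The first value is always distinct; each change between neighbours adds one.
        1 + sorted.windows(2).filter(|w| w[0] != w[1]).count()
    }
}

fn main() {
    let mut col = ColumnAccumulator::new();
    for v in ["apple", "apple", "banana", "cherry"] {
        col.add(v);
    }
    // This column arrives in ascending order, so cardinality() does not sort.
    println!(
        "ascending={} cardinality={} sum_length={} avg_length={:.2}",
        col.ascending,
        col.cardinality(),
        col.sum_length,
        col.avg_length()
    );
}
```

The design point is that the O(n log n) sort is the expensive part of a cardinality scan, and an O(1)-per-value streaming check is enough to know when it can be skipped.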
Added
- pro: add qsv pro command to interact with qsv pro API by @rzmk in https://github.com/jqnatividad/qsv/pull/2039
- lens: new command to interactively view CSVs using the csvlens crate https://github.com/jqnatividad/qsv/pull/2117
- apply: add crc32 operation https://github.com/jqnatividad/qsv/pull/2121
- count: add --delimiter option https://github.com/jqnatividad/qsv/pull/2120
- diff: add flag --drop-equal-fields by @janriemer in https://github.com/jqnatividad/qsv/pull/2114
- stats: add sum_length and avg_length columns https://github.com/jqnatividad/qsv/pull/2113
- stats: smarter cardinality computation - added new parallel algorithm for large datasets (10,000+ rows) and updated sequential algorithm for smaller datasets https://github.com/jqnatividad/qsv/commit/4e63fec61a394ef2ddfa499c0cdd0958e677ad17
Changed
- count: added comment to justify magic number https://github.com/jqnatividad/qsv/commit/5241e3972c05f024a0791be04632d03a06b2f9ce
- stats: use simd-json for faster JSONL parsing; micro-optimize compute hot loop https://github.com/jqnatividad/qsv/commit/0e8b73451999a3e95bfd52246b1088aecd64b88f
- stats: standardized OVERFLOW and UNDERFLOW messages https://github.com/jqnatividad/qsv/commit/38c61285704e5064a63c9dbb1ac866f18fa130fd
- sort: renamed symbol to eliminate devskim lint false positive warning https://github.com/jqnatividad/qsv/commit/12db7397f68d3199e3311f402d5c7afed586b88c
- enable lens feature in GH workflows https://github.com/jqnatividad/qsv/pull/2122
- deps: bump polars 0.42.0 to latest upstream at time of release https://github.com/jqnatividad/qsv/commit/3c17ed12c3c763d644d9713afcc8442964f22de3
- deps: use our own optimized fork of csv crate, with simdutf8 validation and other minor perf tweaks https://github.com/jqnatividad/qsv/commit/e4bcd7123172fa8d8094c305d7780e151c120db1
- build(deps): bump serde from 1.0.209 to 1.0.210 by @dependabot in https://github.com/jqnatividad/qsv/pull/2111
- build(deps): bump serde_json from 1.0.127 to 1.0.128 by @dependabot in https://github.com/jqnatividad/qsv/pull/2106
- build(deps): bump qsv-stats from 0.19.0 to 0.22.0 https://github.com/jqnatividad/qsv/pull/2107 https://github.com/jqnatividad/qsv/pull/2112 https://github.com/jqnatividad/qsv/commit/cb1eb60a0a9fb3b9ba381183a2c29909f82efa42
- apply select clippy lint suggestions
- updated several indirect dependencies
- made various doc and usage text improvements
Fixed
- schema: Print an error if the qsv stats invocation fails by @abrauchli in https://github.com/jqnatividad/qsv/pull/2110
New Contributors
- @abrauchli made their first contribution in https://github.com/jqnatividad/qsv/pull/2110
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.133.1...0.134.0
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.133.1
Highlights
(Image: the qsv cowboy riding the Polars polar bear[^1])
This release doubles down on Polars' capabilities, as we now, as a matter of policy, track the latest Polars upstream. If you think qsv has a torrid release schedule, you should see Polars: they're constantly fixing bugs and adding new features and optimizations!
To keep up, we've added Polars revision info to the --version output, and the --envlist option now includes Polars-relevant env vars. We've also added support for the POLARS_BACKTRACE_IN_ERR env var to control whether Polars backtraces are included in error messages.
We also removed the to parquet subcommand as it's redundant with the Polars-powered sqlp's ability to create parquet files. This removes the HUGE duckdb dependency, which should markedly shorten compile times and make binaries smaller.
Other highlights include:
- New edit command that allows you to edit CSV files.
- The count command's --width option now includes record width stats beyond max length (avg, median, min, variance, stddev & MAD).
- The fixlengths command now has --quote and --escape options.
- The stats command adds a sort_order streaming statistic.
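For a feel of what the expanded count --width summary covers, here is a small Rust sketch that computes those statistics from a list of record widths. Several assumptions are worth flagging: widths are given as plain integers (how qsv measures a record's width is not described here), variance is the population variance, the median of an even-sized list averages the two middle values, and MAD is taken as the median absolute deviation from the median. `WidthStats` and `width_stats` are hypothetical names, and qsv's exact definitions may differ.

```rust
#[derive(Debug, PartialEq)]
struct WidthStats {
    max: usize,
    min: usize,
    avg: f64,
    median: f64,
    variance: f64,
    stddev: f64,
    mad: f64,
}

/// Median of a non-empty, already-sorted slice
/// (mean of the two middle values for even lengths).
fn median_sorted(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

fn width_stats(widths: &[usize]) -> Option<WidthStats> {
    if widths.is_empty() {
        return None;
    }
    let n = widths.len() as f64;
    let mut sorted: Vec<f64> = widths.iter().map(|&w| w as f64).collect();
    sorted.sort_by(f64::total_cmp);
    let avg = sorted.iter().sum::<f64>() / n;
    // Population variance: mean squared deviation from the average.
    let variance = sorted.iter().map(|x| (x - avg).powi(2)).sum::<f64>() / n;
    let median = median_sorted(&sorted);
    // MAD: median of the absolute deviations from the median.
    let mut deviations: Vec<f64> = sorted.iter().map(|x| (x - median).abs()).collect();
    deviations.sort_by(f64::total_cmp);
    Some(WidthStats {
        max: *widths.iter().max().unwrap(), // safe: widths is non-empty
        min: *widths.iter().min().unwrap(),
        avg,
        median,
        variance,
        stddev: variance.sqrt(),
        mad: median_sorted(&deviations),
    })
}

fn main() {
    let widths = [2, 4, 4, 4, 5, 5, 7, 9];
    if let Some(stats) = width_stats(&widths) {
        println!("{stats:?}");
    }
}
```

MAD is included alongside stddev because it is far less sensitive to a handful of unusually wide records.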
NOTE: 0.133.0 was skipped because of a dev dependency conflict with the csvs_convert crate, preventing us from publishing 0.133.0 to crates.io. This has been resolved in 0.133.1.
[^1]: ChatGPT prompt: Using the logos for the Polars project and the qsv project as a baseline, can you create a version with the cowboy riding a polar bear instead?
Added
- count: expanded --width options, adding record width stats beyond max length (avg, median, min, variance, stddev & MAD). Also added --json output when using --width https://github.com/jqnatividad/qsv/pull/2099
- edit: add qsv edit command by @rzmk in https://github.com/jqnatividad/qsv/pull/2074
- fixlengths: added --quote and --escape options https://github.com/jqnatividad/qsv/pull/2104
- stats: add sort_order streaming statistic https://github.com/jqnatividad/qsv/pull/2101
- polars: add polars revision info to --version output https://github.com/jqnatividad/qsv/commit/e60e44f99061c37758bd53dfa8511c16d49ceed5
- polars: added Polars relevant env vars to --envlist option https://github.com/jqnatividad/qsv/commit/0ad68fed94f7b5059cca6cf96cec4a3b55638e60
- polars: add & document POLARS_BACKTRACE_IN_ERR env var https://github.com/jqnatividad/qsv/commit/f9cc5595664d4665f0b610fcbac93c30fa445056
Changed
- Optimize polars optflags https://github.com/jqnatividad/qsv/pull/2089
- deps: bump polars 0.42.0 to latest upstream at time of release https://github.com/jqnatividad/qsv/commit/3b7af519343f08919f114c7307f0f561d04f93e8
- bump polars to latest upstream, removing smartstring https://github.com/jqnatividad/qsv/pull/2091
- build(deps): bump actions/setup-python from 5.1.1 to 5.2.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2094
- build(deps): bump flate2 from 1.0.32 to 1.0.33 by @dependabot in https://github.com/jqnatividad/qsv/pull/2085
- build(deps): bump flexi_logger from 0.28.5 to 0.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2086
- build(deps): bump indexmap from 2.4.0 to 2.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2096
- build(deps): bump jsonschema from 0.18.0 to 0.18.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2084
- build(deps): bump serde from 1.0.208 to 1.0.209 by @dependabot in https://github.com/jqnatividad/qsv/pull/2082
- build(deps): bump serde_json from 1.0.125 to 1.0.127 by @dependabot in https://github.com/jqnatividad/qsv/pull/2079
- build(deps): bump sysinfo from 0.31.2 to 0.31.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2077
- build(deps): bump qsv-stats from 0.18.0 to 0.19.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2100
- build(deps): bump tokio from 1.39.3 to 1.40.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2095
- apply select clippy lint suggestions
- updated several indirect dependencies
- made various doc and usage text improvements
- pin Rust nightly to 2024-08-26 from 2024-07-26, aligning with Polars pinned nightly
Fixed
- Ensure portable binaries are "added" to the publish zip archive, instead of replacing all the binaries with just the portable version. Fixes #2083. https://github.com/jqnatividad/qsv/commit/34ad2067007a86ffad6355f7244163c4105a98f2
Removed
- removed `to parquet` subcommand as it's redundant with `sqlp`'s ability to create parquet files. This also removes the HUGE duckdb dependency, which should markedly shorten compile times and make binaries much smaller https://github.com/jqnatividad/qsv/pull/2088
- removed `smartstring` dependency now that Polars has its own compact inlined string type https://github.com/jqnatividad/qsv/commit/47f047e6ee10916b5caa19ee829471e9fb6f4bea
- removed `to parquet` benchmark
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.132.0...0.133.1
- Rust
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.132.0
Highlights
With this release, we finally finish the stats caching refactor started in 0.131.0, replacing the binary encoded stats cache with a simpler JSONL cache. The stats cache stores the necessary statistical metadata to make several key commands smarter & faster. Per the benchmarks:
- `frequency` is 6x faster (`frequency_index_stats_mode_auto`).
Not only is it faster, it also no longer needs to compile a hashmap for columns with ALL unique values (e.g. ID columns), making it practically able to handle "real-world" datasets of any size (that is, unless all the columns have ALL unique values; in that case, the entire CSV will have to fit into memory).
- `tojsonl` is 2.67x faster (`tojsonl_index`)
- `schema` is two orders of magnitude (100x) faster!!! (`schema_index`)
The stats cache also provides the foundation for even more "smart" features and commands in the future. It also has the side-benefit of adding a way to produce stats in JSONL format that can be used for other purposes beyond qsv.
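The `frequency` speedup comes from the stats cache: when a column's cached cardinality equals the row count, every value is unique, so there is nothing to tally. A minimal single-threaded sketch of that short-circuit, assuming the rows are already in memory and using a hypothetical `ColumnStats` struct in place of qsv's JSONL stats cache:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-column metadata read from the stats cache.
pub struct ColumnStats {
    pub cardinality: usize,
}

/// Result of a frequency pass over one column.
#[derive(Debug)]
pub enum ColumnFrequency {
    /// Every value occurs exactly once, so no hashmap was built.
    AllUnique { count: usize },
    /// Full value -> occurrence count table.
    Counts(HashMap<String, u64>),
}

/// Tally each column, short-circuiting columns whose cached cardinality
/// equals the row count (i.e. columns with ALL unique values).
pub fn frequencies(rows: &[Vec<String>], stats: &[ColumnStats]) -> Vec<ColumnFrequency> {
    let row_count = rows.len();
    stats
        .iter()
        .enumerate()
        .map(|(col, s)| {
            if s.cardinality == row_count {
                // ID-like column: skip the hashmap entirely
                ColumnFrequency::AllUnique { count: row_count }
            } else {
                let mut counts: HashMap<String, u64> = HashMap::new();
                for row in rows {
                    *counts.entry(row[col].clone()).or_insert(0) += 1;
                }
                ColumnFrequency::Counts(counts)
            }
        })
        .collect()
}
```

The all-unique branch never allocates a hashmap, which is where the time and memory savings for ID-like columns come from; qsv itself uses multi-threaded, multi-I/O techniques rather than this simple loop.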
The `search`, `searchset`, and `replace` commands now also have a `--literal` option for searching for and replacing strings that contain regex special/reserved characters without having to escape them. This is especially useful with URL columns, which often contain characters like `?`, `:`, `-`, `.`, etc.
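Without `--literal`, a search for a URL such as `example.com/?q=1` is interpreted as a regex, where `.` matches any character and `?` makes the preceding `/` optional. One way to get literal matching is to escape the regex metacharacters before compiling the pattern. The sketch below is a standalone illustration of that idea (in real Rust code, the `regex` crate's `regex::escape` does this job); it is not necessarily how qsv implements `--literal`, and `escape_literal` is a hypothetical name:

```rust
/// Characters treated as special by common regex engines (a conservative set).
const REGEX_META: &[char] = &[
    '\\', '.', '+', '*', '?', '(', ')', '|', '[', ']', '{', '}', '^', '$', '#', '&', '-', '~',
];

/// Backslash-escape every regex metacharacter so the pattern matches `s` literally.
pub fn escape_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if REGEX_META.contains(&c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```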
Added
- `search`, `searchset` & `replace`: add `--literal` option https://github.com/jqnatividad/qsv/pull/2060 & https://github.com/jqnatividad/qsv/commit/7196053b36c8886092fe25fd030ccf1cf765ed6a
- `slice`: added usage text examples https://github.com/jqnatividad/qsv/commit/04afaa3d5a6e51c75f3f9041515c1d7986c45777
- publish: added workflow to build "portable" binaries with CPU features disabled
- contrib(completions): add `--literal` for `search` and `searchset` by @rzmk in https://github.com/jqnatividad/qsv/pull/2061
- contrib(completions): add `--literal` completion to `replace` by @rzmk in https://github.com/jqnatividad/qsv/pull/2062
- add more polars metadata in `--version` info https://github.com/jqnatividad/qsv/pull/2073
- docs: added more info to SECURITY.md https://github.com/jqnatividad/qsv/commit/609d4df61c93de6959f07e8d972009ae6cd12b78
- docs: expanded Goals/Non-Goals https://github.com/jqnatividad/qsv/commit/54998e36eb4608a1fba7938836e5985b699e32ff
- docs: added Installation "Option 0" quick start https://github.com/jqnatividad/qsv/commit/bf5bf82105397436d901de247398fce3e808b122
- added `search --literal` benchmark
Changed
- `stats`, `schema`, `frequency` & `tojsonl`: stats caching refactor, replacing binary encoded stats cache with a simpler JSONL cache https://github.com/jqnatividad/qsv/pull/2055
- rename `stats --stats-json` option to `stats --stats-jsonl` https://github.com/jqnatividad/qsv/pull/2063
- changed "broken pipe" error to a warning https://github.com/jqnatividad/qsv/commit/73532759a8dad2d643f283296aa402950765b648
- docs: update multithreading and caching sections of PERFORMANCE.md https://github.com/jqnatividad/qsv/commit/5e6bc455bc544003535e18f99493cc1a20c4a2ce
- deps: switch to our qsv-optimized fork of csv crate https://github.com/jqnatividad/qsv/commit/3fc1e82c83b5dec23d3ba610e3d0f9bbd2924788
- deps: bump polars from 0.41.3 to 0.42.0 https://github.com/jqnatividad/qsv/pull/2051
- build(deps): bump actix-web from 4.8.0 to 4.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2041
- build(deps): bump flate2 from 1.0.31 to 1.0.32 by @dependabot in https://github.com/jqnatividad/qsv/pull/2071
- build(deps): bump indexmap from 2.3.0 to 2.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2049
- build(deps): bump reqwest from 0.12.6 to 0.12.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/2070
- build(deps): bump rust_decimal from 1.35.0 to 1.36.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2068
- build(deps): bump serde from 1.0.205 to 1.0.206 by @dependabot in https://github.com/jqnatividad/qsv/pull/2043
- build(deps): bump serde from 1.0.206 to 1.0.207 by @dependabot in https://github.com/jqnatividad/qsv/pull/2047
- build(deps): bump serde from 1.0.207 to 1.0.208 by @dependabot in https://github.com/jqnatividad/qsv/pull/2054
- build(deps): bump serde_json from 1.0.122 to 1.0.124 by @dependabot in https://github.com/jqnatividad/qsv/pull/2045
- build(deps): bump serde_json from 1.0.124 to 1.0.125 by @dependabot in https://github.com/jqnatividad/qsv/pull/2052
- apply select clippy lint suggestions
- updated several indirect dependencies
- made various usage text improvements
Fixed
- `stats`: fix `--output` delimiter inferencing based on file extension https://github.com/jqnatividad/qsv/pull/2065
- make `process_input` helper handle stdin better https://github.com/jqnatividad/qsv/pull/2058
- docs: fix completions for `--stats-jsonl` and qsv pro installation text update by @rzmk in https://github.com/jqnatividad/qsv/pull/2072
- docs: added note about why `luau` feature is disabled in musl binaries - https://github.com/jqnatividad/qsv/commit/ffa2bc5a3f397b406347d14d0d4fb4ead49cb470 & https://github.com/jqnatividad/qsv/commit/27d0f8e1c2e43c00b99abf98dfa01a4758cf9bad
Removed
- Removed bincode dependency now that we're using JSONL stats cache https://github.com/jqnatividad/qsv/pull/2055 https://github.com/jqnatividad/qsv/commit/babd92bbae473ed63f44f593bc1ab0ad1bc17761
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.131.1...0.132.0
- Rust
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.131.1
Changed
- deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
- build(deps): bump filetime from 0.2.23 to 0.2.24 by @dependabot in https://github.com/jqnatividad/qsv/pull/2038
Fixed
- `frequency`: change `--stats-mode` default to `none` from `auto`.
This is because of a big performance regression when using `--stats-mode auto` on datasets with columns with ALL unique values. See https://github.com/jqnatividad/qsv/issues/2040 for more info.
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.131.0...0.131.1
- Rust
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.131.0
Highlights
- Refactored `frequency` to make it smarter and faster.
`frequency`'s core algorithm essentially compiles an in-memory hashmap to determine the frequency of each unique value for each column. It does this using multi-threaded, multi-I/O techniques to make it blazing fast.
However, for columns with ALL unique values (e.g. ID columns), this takes a comparatively long time and consumes a lot of memory, as it essentially compiles a hashmap of the ENTIRE column, with a hashmap entry for each column value with a count of 1.
Now, with the new `--stats-mode` option (enabled by default), `frequency` can compile the dataset in a more intelligent way by looking up a column's cardinality in the stats cache.
If the cardinality of a column is equal to the CSV's rowcount (indicating a column with ALL unique values), it short-circuits frequency calculations for that column, dramatically reducing the time and memory requirements for ID columns as it eliminates the need to maintain a hashmap for them.
Practically speaking, this makes `frequency` able to handle "real-world" datasets of any size.
To ensure `frequency` is as fast as possible, be sure to `index` and compute `stats` for your datasets beforehand.
- Setting the stage for Datapusher+ v1 and...
The "itches we've been scratching" the past few months have been informed by our work at several clients towards the release of Datapusher+ 1.0 and qsv pro 1.0 (more info below) - both targeted for release this month.
DP+ is our third-gen, high-speed data ingestion/registration tool for CKAN that uses qsv as its data wrangling/analysis engine. It will enable us to reinvent the way data is ingested into CKAN - with exponentially faster data ingestion, metadata inferencing, data validation, computed metadata fields, and more!
We're particularly excited about how qsv will allow us to compute and infer high-quality metadata for datasets (with a focus on inferring optional recommended DCAT-US v3 metadata fields) in "near real-time", while dataset publishers are still entering metadata. This will be a game-changer for CKAN administrators and data publishers!
- ...qsv pro 1.0
qsv pro is datHere's enterprise-grade data wrangling/curation workbench that’s planned for v1.0 release this month. Building the core functionality of qsv pro's Workflow feature is one of the primary reasons for a v1.0 release.
We feel qsv pro may be a game-changer for data wranglers and data curators who need to work with spreadsheets and large datasets to view statistical data and metadata while also performing complex data wrangling operations in a user-friendly way without having to write code.
Added
- docs: added Shell Completion section https://github.com/jqnatividad/qsv/commit/556a2ff48660d05f8e9a865ec427e98114f49b43
- docs: add 🪄 emoji in legend to indicate "automagical" commands https://github.com/jqnatividad/qsv/commit/2753c90fcbd1cc1b41dae0a51d26bfe704029ee8
- Add building deb package (WIP) by @tino097 in https://github.com/jqnatividad/qsv/pull/2029
- Added GitHub workflow to test debian package (WIP) by @tino097 in https://github.com/jqnatividad/qsv/pull/2032
- tests: added false positive to _typos.toml configuration https://github.com/jqnatividad/qsv/commit/d576af229bf76b7d0e1f40eb37b578a6b6691ed4
- added more benchmarks
- added more tests
Changed
- `fetch` & `fetchpost`: remove expired diskcache entries on startup https://github.com/jqnatividad/qsv/commit/9b6ab5db91416f71577b8a1fc91e2e3189a1bd4b
- `frequency`: smarter frequency compilation with new `--stats-mode` option https://github.com/jqnatividad/qsv/pull/2030
- `json`: refactored for maintainability & performance https://github.com/jqnatividad/qsv/commit/62e92162a4aa446097736ec76834cf0e06d195b8 and https://github.com/jqnatividad/qsv/commit/4e44b1878b952c455c1922a66795b8c86a1b1dba
- improved `self-update` messages https://github.com/jqnatividad/qsv/commit/5c874e09e15a274dccd8f83a322002032e65c2d0 and https://github.com/jqnatividad/qsv/commit/0aa0b13cf34103cfb75befc6480f31714d806aa2
- contrib(completions): `frequency` updates & remove bashly/fish by @rzmk in https://github.com/jqnatividad/qsv/pull/2031
- Debian package update by @tino097 in https://github.com/jqnatividad/qsv/pull/2017
- publish: optimized enabled CPU features when building release binaries in all GitHub Actions "publishing" workflows
- publish: ensure latest Python patch release is used when building `qsvpy` binary variants https://github.com/jqnatividad/qsv/commit/2ab03a097645a95b0d390f546ad9735c9a7e72b2 and https://github.com/jqnatividad/qsv/commit/ec6f486ef112cf942b2263b84b97d90cba1beb12
- tests: also enabled CPU features in CI tests
- docs: wordsmith qsv "elevator pitch" https://github.com/jqnatividad/qsv/commit/cc47fe688eeeb13b4deb3f3bf48d954924eee22e
- docs: point to https://100.dathere.com in Whirlwind tour https://github.com/jqnatividad/qsv/commit/fc49aef826c1b1933ea1508cb687476936a147ff
- deps: bump polars to latest upstream post py-1.41.1 release at the time of this release
- build(deps): bump bytes from 1.6.1 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2018
- build(deps): bump bytes from 1.7.0 to 1.7.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2021
- build(deps): bump flate2 from 1.0.30 to 1.0.31 by @dependabot in https://github.com/jqnatividad/qsv/pull/2027
- build(deps): bump indexmap from 2.2.6 to 2.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2020
- build(deps): bump jaq-parse from 1.0.2 to 1.0.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/2016
- build(deps): bump redis from 0.26.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/2023
- build(deps): bump regex from 1.10.5 to 1.10.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/2025
- build(deps): bump serde_json from 1.0.121 to 1.0.122 by @dependabot in https://github.com/jqnatividad/qsv/pull/2022
- build(deps): bump sysinfo from 0.30.13 to 0.31.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2019
- build(deps): bump sysinfo from 0.31.0 to 0.31.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/2024
- build(deps): bump tempfile from 3.11.0 to 3.12.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/2033
- build(deps): bump serde from 1.0.204 to 1.0.205 by @dependabot in https://github.com/jqnatividad/qsv/pull/2036
- apply select clippy suggestions
- updated several indirect dependencies
- made various usage text improvements
- bumped MSRV to 1.80.1
Fixed
- `sqlp` & `joinp`: fixed `.ssv.sz` output auto-compression support https://github.com/jqnatividad/qsv/commit/5397f6c7a3b083872bbb97d90db3a2fd2f8521e6 & https://github.com/jqnatividad/qsv/commit/d86ba6376d5819898187d5fa88eae19373022e5b
- docs: fix link by @uncenter in https://github.com/jqnatividad/qsv/pull/2026
- tests: correct misnamed test https://github.com/jqnatividad/qsv/commit/8ae600011ddb109e7993e54dae9b933d15eccd38
- tests: fix flaky `reverse` property tests https://github.com/jqnatividad/qsv/commit/d86ba6376d5819898187d5fa88eae19373022e5b
Removed
- docs: "Quicksilver" is the name of the logo horse, not how you pronounce "qsv" https://github.com/jqnatividad/qsv/commit/e4551ae4b62a3a635b7c351c5f28aa2a7d374958
New Contributors
- @uncenter made their first contribution in https://github.com/jqnatividad/qsv/pull/2026
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.130.0...0.131.0
- Rust
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.130.0
Following the 0.129.0 release (the largest release to date), 0.130.0 continues to polish qsv as a data-wrangling engine, packing in new features, fixes, and improvements and previewing upcoming features in qsv pro 1.0. Here are a few highlights:
Highlights
- Added `.ssv` (semicolon separated values) automatic support. Semicolon separated values are now automatically detected and supported by qsv. Though not as common as CSV, SSV is used in some regions and industries.
- Added cargo deb compatibility. In preparation for the release of DataPusher+ 1.0, we're making it easier to upgrade `qsvdp`, so CKAN administrators can install and upgrade it with `apt-get install qsvdp` or `apt-get upgrade qsvdp`. DP+ is our next-gen, high-speed data ingestion tool for CKAN that uses qsv as its analysis engine. It's not only a robust, fast, validating data pump that guarantees high quality data; it also does extended analysis to infer and automatically derive high-quality metadata - what we call "automagical metadata".
- Upgraded to the latest Polars upstream at the py-polars-1.3.0 tag. Polars tops the TPC-H Benchmark and is several orders of magnitude faster than traditional dataframe libraries (cough - 🐼 pandas). qsv proudly rides the 🐻‍❄️ Polars bear to get subsecond response times even with very large datasets!
- qsv v0.130.0 shell completions files are available for download here. With shell completions, pressing tab in a compatible shell provides suggestions for various qsv commands, subcommands, and options that you can choose from. Supported shells include bash, zsh, powershell, fish, nushell, fig, and elvish. View tips on how to install completions for the bash shell here.
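Automatic `.ssv` support, together with related fixes elsewhere in these notes (`.ssv.sz` auto-compression in `sqlp`/`joinp`, TSV/TAB detection in `validate`, and `stats --output` delimiter inferencing), relies on inferring the delimiter from the file extension. A minimal sketch of such inference; `infer_delimiter` is a hypothetical helper, and the comma fallback for unknown extensions is an assumption:

```rust
use std::path::Path;

/// Hypothetical helper: infer a delimiter byte from a file name,
/// ignoring a trailing `.sz` (Snappy) compression suffix.
pub fn infer_delimiter(path: &str) -> u8 {
    // case-insensitive: `DATA.TSV` behaves like `data.tsv`
    let lower = path.to_ascii_lowercase();
    // strip a compression suffix such as `data.ssv.sz` -> `data.ssv`
    let base = lower.strip_suffix(".sz").unwrap_or(lower.as_str());
    match Path::new(base).extension().and_then(|e| e.to_str()) {
        Some("tsv") | Some("tab") => b'\t',
        Some("ssv") => b';',
        // assumption: anything else is treated as comma-separated
        _ => b',',
    }
}
```

Stripping `.sz` before looking at the extension lets a Snappy-compressed `.ssv.sz` file keep its semicolon delimiter.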
Added
- `apply`: add base62 encode/decode operations https://github.com/jqnatividad/qsv/pull/2013
- `headers`: add `--just-count` option https://github.com/jqnatividad/qsv/pull/2004
- `json`: add `--select` option https://github.com/jqnatividad/qsv/pull/1990
- `searchset`: add `--not-one` flag by @rzmk in https://github.com/jqnatividad/qsv/pull/1994
- Added `.ssv` (semicolon separated values) automatic support https://github.com/jqnatividad/qsv/pull/1987
- Added cargo deb compatibility by @tino097 in https://github.com/jqnatividad/qsv/pull/1991
- contrib(completions): add `--just-count` for `headers` by @rzmk in https://github.com/jqnatividad/qsv/pull/2006
- contrib(completions): add `--select` for `json` by @rzmk in https://github.com/jqnatividad/qsv/pull/1992
- added several benchmarks
- added more tests
Changed
- `diff`: allow selection of `--key` and `--sort-columns` by name, not just by index https://github.com/jqnatividad/qsv/pull/2010
- `fetch` & `fetchpost`: replace deprecated Redis execute command https://github.com/jqnatividad/qsv/commit/75cbe2b76426591e4658fdcb7d29287a40a7db36
- `stats`: more intelligent `--infer-len` option https://github.com/jqnatividad/qsv/commit/c6a0e641cd4c6ef87c070c8944f32a962a11c7e3
- `validate`: return delimiter detected upon successful CSV validation https://github.com/jqnatividad/qsv/pull/1977
- bump polars to latest upstream at py-polars-1.3.0 tag https://github.com/jqnatividad/qsv/pull/2009
- deps: bump csvs_convert from 0.8.12 to 0.8.13 https://github.com/jqnatividad/qsv/commit/d1d08009deb0579fd4d6fe305097e00e92da4191
- build(deps): bump cached from 0.52.0 to 0.53.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1983
- build(deps): bump cached from 0.53.0 to 0.53.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1986
- build(deps): bump postgres from 0.19.7 to 0.19.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1985
- build(deps): bump pyo3 from 0.22.1 to 0.22.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1979
- build(deps): bump redis from 0.25.4 to 0.26.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1995
- build(deps): bump serde_json from 1.0.120 to 1.0.121 by @dependabot in https://github.com/jqnatividad/qsv/pull/2011
- build(deps): bump simple-expand-tilde from 0.1.7 to 0.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1984
- build(deps): bump tokio from 1.38.0 to 1.38.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1973
- build(deps): bump tokio from 1.38.1 to 1.39.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1988
- build(deps): bump xxhash-rust from 0.8.11 to 0.8.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1997
- apply select clippy suggestions
- updated several indirect dependencies
- made various usage text improvements
- pin Rust nightly to 2024-07-26
Fixed
- `diff`: clarify `--key` usage examples, resolves #1998 by @rzmk in https://github.com/jqnatividad/qsv/pull/2001
- `json`: refactored so it no longer needs threads to spawn `qsv select` to order the columns. This was necessary because intermediate output was sometimes sent to stdout before the final output was ready https://github.com/jqnatividad/qsv/commit/0f25deff98139b574dfd61c6e9bf58d36ea16618
- `py`: replace row with col in usage text by @allen-chin in https://github.com/jqnatividad/qsv/pull/2008
- `reverse`: fix indexed bug https://github.com/jqnatividad/qsv/pull/2007
- `validate`: properly auto-detect tab delimiter when file extension is TSV or TAB https://github.com/jqnatividad/qsv/pull/1975
- fix panic when `process_input` helper fn receives unexpected input from stdin https://github.com/jqnatividad/qsv/commit/152fec486c0e7b16242f3967930e9654ff2bdf3c
Removed
- docs: remove *nix only message for `foreach` by @rzmk in https://github.com/jqnatividad/qsv/pull/1972
New Contributors
- @tino097 made their first contribution in https://github.com/jqnatividad/qsv/pull/1991
- @allen-chin made their first contribution in https://github.com/jqnatividad/qsv/pull/2008
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.129.1...0.130.0
To stay updated with datHere's latest news and updates (including qsv pro, datHere's CKAN DMS, and analyze.dathere.com), subscribe to the newsletter here: dathere.com/newsletter
- Rust
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.129.1
This is a small patch release to fix some publishing issues, update tab completion, and to fix minor CI errors. See 0.129.0 release notes to get the details on qsv's biggest release to date!
Changed
- `clipboard`: add error handling based on `clipboard::Error` by @rzmk in https://github.com/jqnatividad/qsv/pull/1970
- contrib(completions): add all commands (except `applydp` & `generate`) by @rzmk in https://github.com/jqnatividad/qsv/pull/1971
- Temporarily suppressed some CI tests that were flaky on GH macOS Apple Silicon action runners. They previously worked fine on self-hosted macOS Apple Silicon action runners, which are temporarily unavailable.
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.129.0...0.129.1
- Rust
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.129.0
This release is the biggest one ever!
It's packed with new features, improvements, and previews of upcoming qsv pro features. Here are a few highlights:
📌 Highlights
Meet @rzmk - qsv pro's software engineer now also co-maintains qsv!
@rzmk has contributed to projects in the qsv ecosystem including qsv's [`describegpt`](https://github.com/jqnatividad/qsv/tree/master/src/main/describegpt.rs), [`prompt`](https://github.com/jqnatividad/qsv/tree/master/src/main/prompt.rs), [`json`](https://github.com/jqnatividad/qsv/tree/master/src/main/json.rs), and [`clipboard`](https://github.com/jqnatividad/qsv/tree/master/src/main/clipboard.rs) commands; qsv's tab completion support; [qsv.dathere.com](https://qsv.dathere.com), including its online configurator and benchmarks page; [100.dathere.com](https://100.dathere.com) with its qsv lessons and exercises; and [qsv pro](https://qsvpro.dathere.com), the spreadsheet data wrangling desktop app (along with its promo site). With @rzmk now also co-maintaining qsv, our data-wrangling portfolio's roadmap may get more intriguing, as @rzmk's work on qsv pro, 100.dathere.com, and other initiatives can result in contributions to qsv, as we've seen in this release. Some of that work may be aimed at AI; "[automagical](https://dathere.com/2023/11/automagical-metadata/)" metadata inferencing; DCAT 3; and expanded recipe support with the accelerated evolution of qsv pro as an enterprise-grade Data-Wrangling/Data Curation Workbench.
Polars v0.41.3 - numerous sqlp and joinp improvements
* `sqlp`: expanded SQL support
- Natural Join support
- DuckDB-like `COLUMNS` SQL function to select columns that match a pattern
- ORDER BY ALL support
  - Support PostgreSQL `^@` ("starts with"), `~~`, `~~*`, `!~~`, `!~~*` ("like", "ilike") string-matching operators
- Support for SQL `SELECT * ILIKE` wildcard syntax
- Support SQL temporal functions `STRFTIME` and `STRPTIME`
* `sqlp`: added `--streaming` option
New command qsv prompt - Use a file dialog for qsv file input and output
Be more interactive with qsv by using a file dialog to select a file for input and output.

Here are a few key highlights:
- Start with `qsv prompt` when piping commands to provide a file as input from an open file dialog and pipe it into another command, for example: `qsv prompt | qsv stats`.
- End with `qsv prompt -f` when piping commands to save the output to a file you choose with a save file dialog.
There are other options too, so feel free to explore more with `qsv prompt --help`.
This will allow you to create qsv pipelines that are more "user-friendly" and distribute them to non-technical users. It's not as flexible as qsv pro's full-blown GUI, but it's a start!
New command qsv json - Convert JSON data to CSV and optionally provide a jq-like filter
The new `json` command allows you to convert non-nested JSON data to CSV. If your data is not in the expected format, try using the `--jaq` option to provide a jq-like filter. See `qsv json --help` for more information and examples.

Here are a few key highlights:
- Specify the path to a JSON file to attempt conversion to CSV with `qsv json
New command qsv clipboard - Provide input from your clipboard and save output to your clipboard
Provide your clipboard content using `qsv clipboard` and save output to your clipboard by piping into `qsv clipboard --save` (or `-s` for short).

100.dathere.com - Try out lessons and exercises with qsv from your browser!
You may run qsv commands from your browser without having to install qsv locally at [100.dathere.com](https://100.dathere.com).

(Demo screenshots: "Within the lesson (in-page) using Thebe" and "In a Jupyter Lab environment")

Thanks to [Jupyter Book](https://jupyterbook.org), [datHere](https://dathere.com) has released a website at [100.dathere.com](https://100.dathere.com) where you may explore lessons and exercises with qsv by running them within the web page, in a Jupyter Lab environment, or locally after following the provided installation instructions. Multiple exercises are planned, but feel free to try out the first few available lessons/exercises at [100.dathere.com](https://100.dathere.com) and star the source code's repository [here](https://github.com/dathere/100.dathere.com).
New multi-shell completions draft (bash, zsh, powershell, fish, nushell, fig, elvish)
There's a draft of expanded qsv shell completion support covering 7 different shells! The plan is to add the rest of the commands to this implementation, since one codebase can generate all 7 shell completion script files. Feel free to try out the various shell completions in the `examples` folder from [`contrib/completions`](https://github.com/jqnatividad/qsv/tree/master/contrib/completions) to verify that the examples work (as of this release, only `qsv count` and `qsv clipboard` may be available), and contribute to adding the rest of the completions if you know a bit of Rust. The existing Bash shell completions for v0.129.0 and the fish shell completions draft remain available while the multi-shell completions draft is being developed.

(Demo recordings: "Bash completions demo" and "Fish completions demo")

With shell completions enabled, you can identify qsv commands more easily by pressing the `tab` key in certain positions when using Bash or fish in your terminal. Follow the instructions from 100.dathere.com [here](https://100.dathere.com/exercises-setup.html#bash) to install the Bash completions, and the Usage section [here](https://github.com/jqnatividad/qsv/tree/master/contrib/fish#usage) for the fish shell completions. Note that the fish shell completions are incomplete, and both implementations may be replaced by the multi-shell completions implementation once it is complete.
qsvpro.dathere.com - Preview: Download spreadsheets from a compatible CKAN instance into the qsv pro Workflow
> This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.

In addition to importing local spreadsheet files and uploading them to a CKAN instance, this new feature allows users to select a locally registered CKAN instance where they have the `create_dataset` permission, download a spreadsheet file from it, and load the new local spreadsheet file into the Workflow. qsv pro's Workflow would therefore have both upload and download capability to and from a compatible CKAN instance.
qsvpro.dathere.com - Preview: Attempt SQL query generation from natural language with a compatible LLM API instance
> This is a preview of a feature, meaning it is planned for an upcoming release but may change by the time it is released.
> Also note that this video is sped up, as you can see from the notes that pop up (you may pause the video to read them).

https://github.com/jqnatividad/qsv/assets/30333942/e90893e6-3196-4fa6-bce0-f69a9f6347f2

Leveraging [`qsv describegpt`](https://github.com/jqnatividad/qsv/tree/master/src/cmd/describegpt.rs)'s AI integration capabilities along with multiple other qsv commands, the existing SQL query tab in qsv pro's Workflow now has a generator that may ***attempt*** to generate a SQL query from natural language. It uses an LLM API compatible with OpenAI's API specification, such as a locally running [Ollama](https://ollama.com/) server (v0.2.0 or above), to ***attempt*** to generate a SQL query when you ask a question related to your spreadsheet data. Results may vary depending on your configuration, and you may need to fix the generated output. For example, in the demo we asked ***who*** has the highest salary, but the result included extra information and only the highest salary, though it does give us a query we can modify and work with.
Note on Ask and qsv describegpt
We say ***attempt*** because LLMs can produce incorrect output, even output that *seems* correct but is not. `qsv describegpt`'s usage text also warns that it may produce "inaccurate information", and AI-generated output within qsv pro may likewise be incorrect, so make sure any output is corrected and verified before using it in production.
🔁 Changelog
### Added
* `clipboard`: add `qsv clipboard` command for clipboard input/output by @rzmk in https://github.com/jqnatividad/qsv/pull/1953
* `describegpt`: add `--prompt` for custom prompt & update prompt file + docs by @rzmk in https://github.com/jqnatividad/qsv/pull/1862
* `describegpt`: add base_url, model, ollama, & timeout to prompt file by @rzmk in https://github.com/jqnatividad/qsv/pull/1859
* `enum`: add `--hash` option to create a platform-independent deterministic id https://github.com/jqnatividad/qsv/pull/1902
* `enum`: add `--uuid7` option to create UUID v7 identifiers https://github.com/jqnatividad/qsv/pull/1914
* `freq`: add `--no-trim` option https://github.com/jqnatividad/qsv/pull/1944
* `foreach`: add sample Windows implementation by @rzmk in https://github.com/jqnatividad/qsv/pull/1847
* `joinp`: add `--right` outer join option https://github.com/jqnatividad/qsv/pull/1945
* `json`: change jsonp to json using new implementation by @rzmk in https://github.com/jqnatividad/qsv/pull/1924
* `json`: add `--jaq` option to allow jq-like filtering & test by @rzmk in https://github.com/jqnatividad/qsv/pull/1959
* `jsonp`: add `jsonp` command allowing non-nested JSON to CSV conversion with Polars by @rzmk in https://github.com/jqnatividad/qsv/pull/1880
* `prompt`: add `qsv prompt` to pick a file with a file dialog & write to stdout by @rzmk in https://github.com/jqnatividad/qsv/pull/1860
* `prompt`: add `--fd-output` (`-f`) & `--output` (`-o`) options by @rzmk in https://github.com/jqnatividad/qsv/pull/1861
* `select`: add `--sort`, `--random` & `--seed` options; also add 9999 sentinel value to indicate last column https://github.com/jqnatividad/qsv/pull/1867
* `select`: use underscore char (_) to indicate last column, replacing 9999 sentinel value https://github.com/jqnatividad/qsv/pull/1873
* `sqlp`: add `--streaming` option https://github.com/jqnatividad/qsv/commit/e8bee9a60dccc6ec5b5a43b91cb6f558915faa0e
* `stats`: add Standard Error of the Mean (SEM) & Coefficient of Variation (CV) https://github.com/jqnatividad/qsv/pull/1857
* `validate`: added custom JSONschema format "currency" (decimal with 2 decimal places). Also, added check that only ascii characters are allowed in keys in JSONschema files.
* added `--batch` zero option to all commands with batch processing. This sentinel value is used to indicate that the entire input should be processed in one batch https://github.com/jqnatividad/qsv/commit/feedbda4a3be9f8835eba0626e5fe01147831186
* added typos check to CI https://github.com/jqnatividad/qsv/commit/9fdf0662b6dc4fa6ebfed592a177d8539f264041
* `contrib(fish)`: add fish completions prototype with `qsv.fish` and docs by @rzmk in https://github.com/jqnatividad/qsv/pull/1884
* contrib(bashly): add `--hash
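The two statistics added to `stats` in 0.129.0 have standard textbook definitions: the Standard Error of the Mean is SEM = s / √n, and the Coefficient of Variation is CV = s / mean (often reported as a percentage). A minimal sketch, assuming the population standard deviation and a percentage CV; these conventions are assumptions, so check `qsv stats --help` for exactly how qsv reports them:

```rust
/// Mean and population standard deviation (dividing by n, an assumption).
pub fn mean_and_stddev(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

/// Standard Error of the Mean: stddev / sqrt(n).
pub fn sem(xs: &[f64]) -> f64 {
    let (_, sd) = mean_and_stddev(xs);
    sd / (xs.len() as f64).sqrt()
}

/// Coefficient of Variation as a percentage: 100 * stddev / mean.
pub fn cv_percent(xs: &[f64]) -> f64 {
    let (mean, sd) = mean_and_stddev(xs);
    100.0 * sd / mean
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the population stddev is 2, so SEM = 2/√8 ≈ 0.707 and CV = 40%. An empty slice yields NaN in this sketch, and CV grows without bound as the mean approaches zero.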
Published by jqnatividad over 1 year ago
https://github.com/dathere/qsv - 0.128.0
[0.128.0] - 2024-05-25
❤️ csv,conf,v8 Edition 🎉
🏇🏽 ¡Ándale! ¡Ándale! ¡Arriba! ¡Arriba! 💨
Yii-hah! We're Mexico-bound as we head to csv,conf,v8 to present and share qsv with fellow data-makers and wranglers from all over!
And we've packed a lot into this release for the occasion:
* search got a lot of love as it now powers qsv pro's new search feature to get near-instant search results even on large datasets.
* stats - the ❤️ of qsv, now has several cache fine-tuning options with --cache-threshold. It now also computes max_precision for floats and is_ascii for strings. It also has a new --round 9999 sentinel value to suppress rounding of statistics.
* schema & tojsonl are now faster thanks to stats --cache-threshold autoindex & cache creation/deletion logic.
* We upgraded Polars to 0.40.0 to unlock additional capabilities in the count, joinp & sqlp commands.
* count now has an additional blazing fast counting mode using Polars' read_csv() table function.
* frequency gets some micro-optimizations for even faster frequency analysis.
* luau now bundles Luau 0.625 (up from 0.622). We also upgraded the bundled LuaDate library from 2.2.0 to 2.2.1, all while making luau ~10% faster!
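The three `stats` additions mentioned above (max_precision, is_ascii and the `--round 9999` sentinel) can be illustrated with a small Python sketch. It assumes max_precision is the largest number of digits after the decimal point among a float column's values, that is_ascii is true only when every value is pure ASCII, and that the 9999 sentinel simply skips rounding. The helper names are hypothetical and this is not qsv's code.

```python
ROUND_SENTINEL = 9999  # per the release notes, this --round value suppresses rounding


def max_precision(values):
    """Largest number of digits after the decimal point among numeric strings."""
    best = 0
    for v in values:
        v = v.strip()
        if "." in v:
            # count every fractional digit, including trailing zeros
            best = max(best, len(v.split(".", 1)[1]))
    return best


def is_ascii(values):
    """True when every value in the column contains only ASCII characters."""
    return all(v.isascii() for v in values)


def round_stat(x, places):
    """Round a computed statistic unless the sentinel asks for full precision."""
    return x if places == ROUND_SENTINEL else round(x, places)


print(max_precision(["3.14", "2.71828", "10", "1.5"]))
print(is_ascii(["Bologna", "Puebla"]), is_ascii(["Ciudad de México"]))
print(round_stat(2 / 3, 4), round_stat(2 / 3, ROUND_SENTINEL))
```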
Overall, qsv manages to keep its performance edge despite the addition of new capabilities and features. We'll give a whirlwind tour of qsv and these updates in our talk at csv,conf,v8.
We'll also preview what we've been calling the People's APPI - our "Answering People/Policymaker Interface" in qsv pro.
This is a new way to interact with qsv that's more conversational and less command-line-y using a natural language interface. It's a way to make qsv more accessible to more people, especially those who are not comfortable with the command line.
We're excited to share all these qsv innovations with the csv,conf,v8 community and the wider world! See you in Puebla!
¡Ándele! ¡Ándele! ¡Epa! ¡Epa! ¡Epa!
Added
- `count`: additional Polars-powered counting mode using `read_csv()` SQL table function https://github.com/jqnatividad/qsv/commit/05c580912365356e9c5383654f351e0cc6ebaab6
- `input`: add `--quote-style` option https://github.com/jqnatividad/qsv/commit/df3c8f14a4eaa2fba7237dfe30df2fef8c98eccd
- `joinp`: add `--coalesce` option https://github.com/jqnatividad/qsv/commit/8d142e51d683ab425fc53b2dddfdeeff6a814ffa
- `search`: add `--preview-match` option https://github.com/jqnatividad/qsv/pull/1785
- `search`: add `--json` output option https://github.com/jqnatividad/qsv/pull/1790
- `search`: add "match-only" `--flag` option mode https://github.com/jqnatividad/qsv/pull/1799
- `search`: add `--not-one` flag for not using exit code 1 when no match by @rzmk in https://github.com/jqnatividad/qsv/pull/1810
- `sqlp`: add `--decimal-comma` option https://github.com/jqnatividad/qsv/pull/1832
- `stats`: add `--cache-threshold` option https://github.com/jqnatividad/qsv/pull/1795
- `stats`: add `--cache-threshold` autoindex creation/deletion logic https://github.com/jqnatividad/qsv/pull/1809
- `stats`: add additional mode to `--cache-threshold` https://github.com/jqnatividad/qsv/commit/63fdc55828ec55bf7545c37bd56a4d537aa0cf71
- `stats`: now computes max_precision for floats https://github.com/jqnatividad/qsv/pull/1815
- `stats`: add `--round` 9999 sentinel value support to suppress rounding https://github.com/jqnatividad/qsv/pull/1818
- `stats`: add `is_ascii` column https://github.com/jqnatividad/qsv/pull/1824
- added new benchmarks for `search` command https://github.com/jqnatividad/qsv/commit/58d73c3beb41071d6cd8532768f0991f0554b717
Changed
- `count`: document three count modes https://github.com/jqnatividad/qsv/commit/3d5a333ca8aef3aeaf74ff9e153b5118eb6a605b
- `describegpt`: update `--max-tokens` type for LLMs with larger context sizes by @rzmk https://github.com/jqnatividad/qsv/pull/1841
- `excel`: use simpler `range::headers()` to get headers https://github.com/jqnatividad/qsv/commit/069acbf5a6e86132214521324720608f4258c20f
- `frequency`: ensure `--other-sorted` works with `--other-text` https://github.com/jqnatividad/qsv/commit/7430ad76bda869be7729ea5000ad4d85a875433b
- `frequency`: microoptimize hot loop https://github.com/jqnatividad/qsv/commit/d9c01e17fa6c4f853a501fe75c6a6b8a30c269d2, https://github.com/jqnatividad/qsv/commit/7c9f925184100f89f6f3a77ae4f7b93448103f38 and
- `luau`: improve usage text https://github.com/jqnatividad/qsv/commit/cb6b4d9b7bfb60a10385057ca093453e3549e424
- `luau`: we now bundle luau 0.625 from 0.622 https://github.com/jqnatividad/qsv/commit/40609751950a852f998fba41edb35aab31c74c20
- `luau`: update vendored LuaDate library from 2.2.0 to 2.2.1 https://github.com/jqnatividad/qsv/pull/1840
- `schema`: adjust to reflect `stats --cache-threshold` option https://github.com/jqnatividad/qsv/commit/92fed8696fd885d3721f07eeedcf67732febed4c
- `slice`: move json output helpers to util https://github.com/jqnatividad/qsv/commit/1f44b488784fd0c1ef22786ab7aeacbf2f8cf976
- `tojsonl`: refactor boolcheck helper https://github.com/jqnatividad/qsv/commit/74d5f5a8c934254e11ee611973cc10524a288a9e
- `docs`: cross-reference `split` & `partition` commands https://github.com/jqnatividad/qsv/pull/1828
- contrib(bashly): update completions.bash for qsv v0.127.0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1776
- contrib(bashly): update completions.bash for qsv v0.128.0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1838
- deps: upgrade to polars 0.40.0 https://github.com/jqnatividad/qsv/pull/1831
- build(deps): bump actix-web from 4.5.1 to 4.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1825
- build(deps): bump anyhow from 1.0.82 to 1.0.83 by @dependabot in https://github.com/jqnatividad/qsv/pull/1798
- build(deps): bump anyhow from 1.0.83 to 1.0.85 by @dependabot in https://github.com/jqnatividad/qsv/pull/1823
- build(deps): bump anyhow from 1.0.85 to 1.0.86 by @dependabot in https://github.com/jqnatividad/qsv/pull/1826
- build(deps): bump cached from 0.50.0 to 0.51.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1789
- build(deps): bump cached from 0.51.0 to 0.51.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1793
- build(deps): bump cached from 0.51.1 to 0.51.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1802
- build(deps): bump cached from 0.51.2 to 0.51.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1805
- build(deps): bump crossbeam-channel from 0.5.12 to 0.5.13 by @dependabot in https://github.com/jqnatividad/qsv/pull/1827
- build(deps): bump csvs_convert from 0.8.9 to 0.8.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1808
- build(deps): bump data-encoding from 2.5.0 to 2.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1780
- build(deps): bump file-format from 0.24.0 to 0.25.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1807
- build(deps): bump flate2 from 1.0.28 to 1.0.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1778
- build(deps): bump flate2 from 1.0.29 to 1.0.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1784
- build(deps): bump hashbrown from 0.14.3 to 0.14.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1781
- build(deps): bump itertools from 0.12.1 to 0.13.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1822
- deps: bump forked jsonschema from 0.17.1 to 0.18.0 https://github.com/jqnatividad/qsv/commit/f02620fd170804b1995b070e8133522b98a8c443
- build(deps): bump mimalloc from 0.1.41 to 0.1.42 by @dependabot in https://github.com/jqnatividad/qsv/pull/1829
- build(deps): bump mlua from 0.9.7 to 0.9.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1821
- build(deps): bump qsv-stats from 0.16.0 to 0.17.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1813
- build(deps): bump qsv-stats from 0.17.1 to 0.17.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1814
- build(deps): bump qsv-stats from 0.17.2 to 0.18.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1816
- build(deps): bump ryu from 1.0.17 to 1.0.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/1801
- build(deps): bump semver from 1.0.22 to 1.0.23 by @dependabot in https://github.com/jqnatividad/qsv/pull/1800
- build(deps): bump serde from 1.0.198 to 1.0.199 by @dependabot in https://github.com/jqnatividad/qsv/pull/1777
- build(deps): bump serde from 1.0.199 to 1.0.200 by @dependabot in https://github.com/jqnatividad/qsv/pull/1787
- build(deps): bump serde from 1.0.200 to 1.0.201 by @dependabot in https://github.com/jqnatividad/qsv/pull/1804
- build(deps): bump serde from 1.0.201 to 1.0.202 by @dependabot in https://github.com/jqnatividad/qsv/pull/1817
- build(deps): bump serde_json from 1.0.116 to 1.0.117 by @dependabot in https://github.com/jqnatividad/qsv/pull/1806
- build(deps): bump serial_test from 3.1.0 to 3.1.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1779
- build(deps): bump simple-expand-tilde from 0.1.5 to 0.1.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1811
- build(deps): bump sysinfo from 0.30.11 to 0.30.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1797
- build(deps): bump titlecase from 3.0.0 to 3.1.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1791
- build(deps): bump jql-runner from 7.1.8 to 7.1.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1839
- apply select clippy suggestions
- updated several indirect dependencies
- pin Rust nightly to 2024-05-14
- bump MSRV to 1.78
Fixed
- `luau`: correct example when using `--colindex` https://github.com/jqnatividad/qsv/commit/cbbed21718324346031a3201407f274abfec5ee6
- `search`: fix `--json` output https://github.com/jqnatividad/qsv/pull/1792
- pass through docopt messages without a prefix https://github.com/jqnatividad/qsv/pull/1835
- apply Polars SQL `count(*) group by` fix https://github.com/jqnatividad/qsv/pull/1837
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.127.0...0.128.0
Published by jqnatividad almost 2 years ago
https://github.com/dathere/qsv - 0.127.0
📊 Enhanced Frequency Analysis 📊
This is a quick release adding several frequency enhancements for more detailed frequency analysis. The frequency command now includes a percentage column, calculates an "Other" values row, and supports limiting unique counts and negative limits.
These options provide additional context for Datapusher+, qsv-pro and describegpt so their metadata inferences are more accurate and comprehensive.
Previously, for a 775-row CSV file containing one column named state with entries for all 50 states, frequency only showed[^1]:
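`qsv frequency freq_state_example.csv | qsv table`

| field | value | count |
|-------|-------|------:|
| state | NY | 100 |
| state | NJ | 70 |
| state | CA | 60 |
| state | MA | 55 |
| state | FL | 45 |
| state | TX | 43 |
| state | NM | 40 |
| state | AZ | 39 |
| state | NV | 38 |
| state | MI | 35 |
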
Now there's a new percentage column and an "Other" values calculation, both of which have configurable options:
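`qsv frequency freq_state_example.csv | qsv table`

| field | value | count | percentage |
|-------|-------|------:|-----------:|
| state | NY | 100 | 12.90323 |
| state | NJ | 70 | 9.03226 |
| state | CA | 60 | 7.74194 |
| state | MA | 55 | 7.09677 |
| state | FL | 45 | 5.80645 |
| state | TX | 43 | 5.54839 |
| state | NM | 40 | 5.16129 |
| state | AZ | 39 | 5.03226 |
| state | NV | 38 | 4.90323 |
| state | MI | 35 | 4.51613 |
| state | Other (40) | 250 | 32.25806 |
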
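The percentage and "Other" arithmetic in the table above can be reproduced with a short Python sketch: each percentage is the count divided by the 775-row total, times 100, rounded to five decimals, and "Other (40)" lumps together the 40 states outside the top 10 (775 - 525 = 250 rows). The `frequency_table` helper and the 40 "other" state counts are hypothetical; only the top-10 counts and the total come from the example, and qsv's own rounding and tie-breaking may differ.

```python
from collections import Counter


def frequency_table(counts, limit=10, places=5):
    """Top-`limit` values by count, plus an 'Other (n)' row for the remainder."""
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, descending
    top, rest = ranked[:limit], ranked[limit:]
    rows = [(value, n, round(100 * n / total, places)) for value, n in top]
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other ({len(rest)})", other, round(100 * other / total, places)))
    return rows


# Top-10 counts from the example; the remaining 40 states are synthetic
# placeholders (10 states x 7 rows + 30 states x 6 rows = 250 rows).
top10 = {"NY": 100, "NJ": 70, "CA": 60, "MA": 55, "FL": 45,
         "TX": 43, "NM": 40, "AZ": 39, "NV": 38, "MI": 35}
others = {f"S{i:02d}": (7 if i < 10 else 6) for i in range(40)}
state_counts = Counter({**top10, **others})

for value, n, pct in frequency_table(state_counts):
    print(f"state\t{value}\t{n}\t{pct}")
```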
This release is also out of cycle to address a big performance regression in the excel command caused by unnecessary formula info retrieval for the --error-format option introduced in 0.126.0. This has been fixed, and the excel command is now back to its speedy self.
Added
- `frequency`: added percentage column; `other` values calculation, implementing https://github.com/jqnatividad/qsv/issues/1774 https://github.com/jqnatividad/qsv/pull/1775
- `benchmarks`: added new `frequency` and `excel` benchmarks https://github.com/jqnatividad/qsv/commit/b83ad3aae1cdf9a1750201cbf9b3ccd4ac3a4192
Changed
- contrib(bashly): update completions.bash for qsv v0.126.0 by @rzmk in https://github.com/jqnatividad/qsv/pull/1771
- build(deps): bump mimalloc from 0.1.39 to 0.1.41 by @dependabot in https://github.com/jqnatividad/qsv/pull/1772
- build(deps): bump qsv-stats from 0.14.0 to 0.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1773
- updated several indirect dependencies
- applied select clippy recommendations
Fixed
- `excel`: fixed performance regression because qsv was unnecessarily getting formula info (an expensive operation) for the `--error-format` option even when not required https://github.com/jqnatividad/qsv/commit/772af3420c44c864e06cd2cb61606900bff17947
- renamed 0.126.0 `sqlp` vs `duckdb` benchmark results so they're next to each other for easy direct comparison. https://github.com/jqnatividad/qsv/commit/7bcd59e301965b9e8737a9230d1236e8d34ab4bf.
  Per the benchmarks, `sqlp` is 2.87 times faster than duckdb v0.10.2 for a simple aggregation (0.066 secs vs 0.19 secs), and 1.42 times faster for an "expensive" aggregation (0.143 secs vs 0.203 secs).
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.126.0...0.127.0
[^1]: with its default `--limit` setting of 10, frequency only shows the top 10 unique values in the column, sorted by number of occurrences.
Published by jqnatividad almost 2 years ago
https://github.com/dathere/qsv - 0.126.0
🤖 Expanded Metadata Inferencing 🤖
describegpt headlines this release, with its new ability to support other local Large Language Models (LLMs) using popular tools that serve them through APIs such as Ollama and Jan. This broadens the tool's utility in diverse AI environments. Beyond OpenAI, qsv can now use other popular LLMs like Llama 3, Mistral, and Gemma. It also unlocks expanded metadata inferencing capabilities in qsv pro.
Several commands got additional options: cat with --no-headers support in the rowskey subcommand; excel with new options like --error-format and short --metadata mode; and foreach with a --dry-run option. frequency also got new options, including --unq-limit for limiting unique counts, support for negative limits, and a --lmt-threshold option for compiling comprehensive frequencies below a threshold. slice now supports negative indices and new JSON output options, providing more flexibility in data slicing.
This is all rounded out with sqlp improvements, including support for single-line comments in SQL scripts and a special SKIP_INPUT value to skip input preprocessing when using table functions directly in Polars SQL (e.g. read_csv() and read_parquet()) - all while increasing performance thanks to the Polars engine being upgraded to 0.39.2.
New Features
- `cat`: Added `--no-headers` support to the `rowskey` subcommand.
- `describegpt`: Added compatibility with other local Large Language Models (LLMs) served through tools such as Ollama and Jan, broadening the tool's utility in diverse AI environments.
- `excel`: Introduced new options in the excel command: `--error-format` for better error handling and a short `--metadata` JSON mode.
- `foreach`: added a `--dry-run` option, allowing users to preview the results of scripts without executing them.
- `frequency`: New options such as `--unq-limit` for limiting unique counts; support for negative limits to only show frequencies >= abs(negative limit); and a `--lmt-threshold` option to allow the compilation of comprehensive frequencies below the threshold - all providing more detailed control over frequency analysis.
- `slice`: Support for negative indices to slice from the end and new JSON output options.
- `sqlp`: sqlp now supports single-line comments and includes a special SKIP_INPUT value for more efficient data loading. The Polars engine has also been upgraded to 0.39.2, providing enhanced performance and stability.
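The `frequency` limit semantics described above can be modelled in a few lines of Python. This sketch assumes a positive `--limit` keeps the N most frequent values, a negative one keeps every value whose count is at least abs(limit) (as stated in these notes), and 0 disables the limit (an assumption). `apply_limit` is a hypothetical helper, not a qsv API.

```python
from collections import Counter


def apply_limit(counts, limit):
    """Model of frequency's --limit handling, as described in the release notes.

    limit > 0:  keep the `limit` most frequent values
    limit < 0:  keep every value whose count is >= abs(limit)
    limit == 0: keep everything (assumption, not stated in these notes)
    """
    ranked = counts.most_common()
    if limit > 0:
        return ranked[:limit]
    if limit < 0:
        return [(value, n) for value, n in ranked if n >= abs(limit)]
    return ranked


counts = Counter({"NY": 100, "NJ": 70, "CA": 60, "MA": 55})
print(apply_limit(counts, 2))
print(apply_limit(counts, -60))
```

A negative limit is useful when you don't know how many distinct values to expect but only care about those that clear a frequency floor.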
Changes and Optimizations
- Performance Enhancements: Microoptimizations in `datefmt` and `validate` commands, and increased default length for `--infer-len` in `sqlp` for improved performance.
- Dependency Updates: Numerous updates including bumping Luau, jql-runner, pyo3, and other dependencies to enhance stability and security.
- Benchmarks Added: New performance benchmarks for `sqlp` vs duckdb added to ensure there are no performance regressions between releases. Right now, `sqlp` is faster than `duckdb` in most cases (thanks to Polars - see the latest TPC-H benchmarks), but we want to make sure that we keep it that way.
Security and Robustness
- Security Fixes: Updated rustls to fix a specific CVE, and other minor fixes to enhance the security and robustness of network and data processing features.
- Bug Fixes: Various bug fixes including improvements in error formatting in excel and robustness in fetch and fetchpost commands.
Deprecated Features
- `fetch` & `fetchpost`: Removal of the jsonxf crate from these commands to streamline JSON processing
- `reverse`: Eliminate kludgy buffer expansions.
Added
- `cat`: add `--no-headers` support to rowskey subcommand https://github.com/jqnatividad/qsv/pull/1762
- `describegpt`: add compatibility for other (local) LLMs (Ollama, Jan, etc.) by @rzmk in https://github.com/jqnatividad/qsv/pull/1761
- `excel`: add `--error-format` option https://github.com/jqnatividad/qsv/pull/1721
- `excel`: add `--metadata` short JSON mode https://github.com/jqnatividad/qsv/pull/1738
- `foreach`: add `--dry-run` option https://github.com/jqnatividad/qsv/pull/1740
- `frequency`: add `--unq-limit` option https://github.com/jqnatividad/qsv/pull/1763
- `frequency`: add support for negative `--limit`s https://github.com/jqnatividad/qsv/pull/1765
- `frequency`: add `--lmt-threshold` option https://github.com/jqnatividad/qsv/pull/1766
- `slice`: add support for negative `--index` option values https://github.com/jqnatividad/qsv/pull/1726
- `slice`: implement `--json` output option https://github.com/jqnatividad/qsv/pull/1729
- `sqlp`: added support for single-line comments in SQL scripts https://github.com/jqnatividad/qsv/commit/bb52bcee61d8ea980a2ab093315ead0c153517a5
- `sqlp`: added SKIP_INPUT special value to short-circuit input processing if the user wants to load input files directly using table functions (e.g. `read_csv()`, `read_parquet()`, etc.) https://github.com/jqnatividad/qsv/commit/fe850adb47f1d7aa7f6c3981e350646e7b0c7476
- `validate`: add `--valid-output` option https://github.com/jqnatividad/qsv/pull/1730
- contrib: add sample Bashly completions implementation by @rzmk in https://github.com/jqnatividad/qsv/pull/1731
- `benchmarks`: added `sqlp` vs `duckdb` benchmarks.
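`slice`'s negative `--index` support follows the familiar convention of counting back from the end of the data. Here is a minimal Python sketch of that index resolution, assuming -1 refers to the last record as in Python indexing; the `resolve_index` helper is hypothetical, and qsv's handling of out-of-range values may differ.

```python
def resolve_index(index, n_records):
    """Map a possibly negative record index to a zero-based position.

    Negative values count back from the end, so -1 is the last record.
    Raises IndexError when the resolved position falls outside the data.
    """
    pos = index + n_records if index < 0 else index
    if not 0 <= pos < n_records:
        raise IndexError(f"index {index} out of range for {n_records} records")
    return pos


records = ["r0", "r1", "r2", "r3", "r4"]
print(records[resolve_index(-1, len(records))])  # the last record
print(records[resolve_index(-5, len(records))])  # the first record
```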
Changed
- `datefmt`: microoptimize formatting https://github.com/jqnatividad/qsv/commit/0ee27e768fdc08b7381094842d22b45940fd0a26
- `joinp`: adapt to breaking change in Polars 0.39 for lazyframe sort https://github.com/jqnatividad/qsv/commit/c625ca9f5aef59c736a837aaa4eeda7688403c37
- `sqlp`: change `--infer-len` option default from 250 to 1000 for increased performance https://github.com/jqnatividad/qsv/commit/da1d215d803f8bfe400a7202feeecb8ae14239e9
- `validate`: microoptimize `to_json_instance()` https://github.com/jqnatividad/qsv/commit/c2e4a1c696300eea04cccacca33f6872622ec086
- bump Luau from 0.616 to 0.622 https://github.com/jqnatividad/qsv/commit/9216ec3a53767379662657f69c0076f4a52caaff
- build(deps): bump jql-runner from 7.1.6 to 7.1.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1711
- build(deps): bump pyo3 from 0.21.0 to 0.21.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1712
- build(deps): bump pyo3 from 0.21.1 to 0.21.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1750
- build(deps): bump strsim from 0.11.0 to 0.11.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1715
- build(deps): bump sysinfo from 0.30.7 to 0.30.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1716
- build(deps): bump sysinfo from 0.30.8 to 0.30.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1732
- build(deps): bump sysinfo from 0.30.9 to 0.30.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1735
- build(deps): bump sysinfo from 0.30.10 to 0.30.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1755
- build(deps): bump redis from 0.25.2 to 0.25.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1720
- build(deps): bump mlua from 0.9.6 to 0.9.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1724
- build(deps): bump reqwest from 0.12.2 to 0.12.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1725
- build(deps): bump reqwest from 0.12.3 to 0.12.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1759
- build(deps): bump anyhow from 1.0.81 to 1.0.82 by @dependabot in https://github.com/jqnatividad/qsv/pull/1733
- build(deps): bump robinraju/release-downloader from 1.9 to 1.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1734
- build(deps): bump chrono from 0.4.37 to 0.4.38 by @dependabot in https://github.com/jqnatividad/qsv/pull/1744
- bump polars from 0.38 to 0.39 https://github.com/jqnatividad/qsv/pull/1745
- build(deps): bump polars from 0.39.0 to 0.39.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1746
- build(deps): bump polars from 0.39.1 to 0.39.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1752
- build(deps): bump qsv-dateparser from 0.12.0 to 0.12.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1747
- build(deps): bump serde_json from 1.0.115 to 1.0.116 by @dependabot in https://github.com/jqnatividad/qsv/pull/1749
- build(deps): bump serde from 1.0.197 to 1.0.198 by @dependabot in https://github.com/jqnatividad/qsv/pull/1751
- build(deps): bump rustls from 0.22.3 to 0.22.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1758
- build(deps): bump simple-expand-tilde from 0.1.4 to 0.1.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1767
- build(deps): bump serial_test from 3.0.0 to 3.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1768
- build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1769
- applied select clippy recommendations
- updated several indirect dependencies
- added several benchmarks for new/changed commands
- pin Rust nightly to 2024-04-15 - the same nightly that Polars 0.39 is pinned to
- bumped MSRV to 1.77.2
Fixed
- Make init_logger more robust https://github.com/jqnatividad/qsv/pull/1717
- `count`: empty CSVs count as zero also for polars. Fixes #1741 https://github.com/jqnatividad/qsv/pull/1742
- `excel`: fix #1682 by adding `--error-format` option https://github.com/jqnatividad/qsv/issues/1689
- `fetch` & `fetchpost`: more robust JSON response validation https://github.com/jqnatividad/qsv/commit/ebc7287cd929cc23629ee53c7d82e0b8984bc2b0
- `slice`: use `write!` macro to get rid of GH Advanced Security lint https://github.com/jqnatividad/qsv/commit/c739097e20d526cb6f49ca69d76fed8b28adc029
- `sqlp`: fixed docopt defaults that were not being parsed correctly https://github.com/jqnatividad/qsv/commit/fe850adb47f1d7aa7f6c3981e350646e7b0c7476
- `deps`: bump h2 from 0.4.3 to 0.4.4 to fix HTTP2 Continuation Flood vulnerability https://github.com/jqnatividad/qsv/commit/6af0da27f4e4a0bb6d5563701c07c89ad00f76b8
- `deps`: bump rustls from 0.22.3 to 0.22.4 to fix https://nvd.nist.gov/vuln/detail/CVE-2024-32650 https://github.com/jqnatividad/qsv/pull/1758
Removed
- `fetch` & `fetchpost`: remove jsonxf crate; use serde_json to prettify JSON strings https://github.com/jqnatividad/qsv/pull/1727
- `reverse`: remove kludgy expansion of read/write buffers https://github.com/jqnatividad/qsv/commit/46095cdf57f65c5380251c5d59317053ae1f80c3
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.125.0...0.126.0
Published by jqnatividad almost 2 years ago
https://github.com/dathere/qsv - 0.125.0
In this release, we focused on the 🏎️ need for even more speed 🏎️.
This was done primarily by tweaking several supporting qsv crates. qsv-docopt now parses command-line arguments slightly faster. qsv-stats, the crate behind commands like stats, schema, tojsonl, and frequency, has been further optimized for speed. qsv-dateparser has been updated to support new timezone handling options in datefmt. qsv-sniffer also got a speed boost.
Per the benchmark suite, stats is 25% faster (1.563 secs vs 2.067 secs) when computing the 13 "streaming" stats and 13% faster when computing --everything (17 columns of additional stats - 3.149 secs vs 3.656 secs) for the 1M row, 41 column, 520 MB sample of NYC's 311 data.
The count command has been refactored to utilize Polars' SQLContext, which leverages LazyFrame evaluation to automagically count even very large files in just a few seconds. Previously, count was already using Polars, but it mistakenly fell back to a slower counting mode. Now, it consistently delivers fast performance, even without an index. On the same benchmark suite, it takes 0.052 secs vs 0.503 secs - almost 10x faster!
As count is not just a top-level command, but also a widely used helper used by several qsv commands, this gives the entire suite a nice performance boost.
Continuing on the performance front, the excel command now has a new short --metadata mode, allowing users to get just a "shorter" version of the metadata report that only lists the workbook's top-level metadata (sheet index, sheet name, sheet type, visibility) instead of the full metadata report (which also has info like num rows, column metadata, etc.). On the benchmark suite, the short metadata report takes all of 0.005 secs vs 11.237 secs for the 1M row xlsx version of the same NYC 311 data - more than 3 orders of magnitude faster! (It may actually be even faster, since 0.005 secs is at the limit of what hyperfine can measure.)
The datefmt command also got some major enhancements with new timezone handling and timestamp parsing options, though at the cost of a small 15% performance penalty.
Lastly, we are excited to announce that qsv will be featured at the CSV,Conf,V8 conference in Puebla, Mexico on May 28-29. I'll be presenting a talk titled "qsv: A Blazing Fast CSV Data-Wrangling Toolkit". Hope to see you there!
Added
- `excel`: added short mode to `--metadata` option https://github.com/jqnatividad/qsv/pull/1699
- `datefmt`: added `ts-resolution` option to specify resolution to use when parsing unix timestamps https://github.com/jqnatividad/qsv/pull/1704
- `datefmt`: added timezone handling options https://github.com/jqnatividad/qsv/pull/1706 https://github.com/jqnatividad/qsv/pull/1707 https://github.com/jqnatividad/qsv/pull/1642
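A timestamp resolution option matters because the same instant can be stored as seconds, milliseconds, microseconds or nanoseconds since the Unix epoch, and the parser cannot tell which from the integer alone. The Python sketch below shows the conversion; the resolution names ("sec", "milli", "micro", "nano") and the `parse_unix_timestamp` helper are assumptions for illustration and may not match the values `datefmt` actually accepts.

```python
from datetime import datetime, timedelta, timezone

# Ticks per second for each (assumed) resolution name.
TICKS_PER_SECOND = {"sec": 1, "milli": 1_000, "micro": 1_000_000, "nano": 1_000_000_000}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_unix_timestamp(ts, resolution="sec"):
    """Convert an integer Unix timestamp at the given resolution to a UTC datetime."""
    ticks = TICKS_PER_SECOND[resolution]
    # floor division keeps negative (pre-1970) timestamps correct
    seconds, remainder = divmod(ts, ticks)
    # integer math avoids float rounding; sub-microsecond digits are truncated
    micros = remainder * 1_000_000 // ticks
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


print(parse_unix_timestamp(1_700_000_000, "sec"))
print(parse_unix_timestamp(1_700_000_000_123, "milli"))
```

Using `divmod` on integers rather than dividing by a float keeps nanosecond inputs from losing precision before the conversion.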
Changed
- `count`: refactored to use Polars SQLContext https://github.com/jqnatividad/qsv/commit/43a236f6a45c890d2bb6b4c43eb469bd627f82e1
- `stats`: refactored stats_path helper function https://github.com/jqnatividad/qsv/commit/174c30e3b87470613ff34a98617d44e477a4296a
- `apply`, `applydp`, `datefmt`, `excel`, `geocode`, `py`, `validate`: use `std::mem::take` to avoid clone https://github.com/jqnatividad/qsv/commit/1fd187f23262b51e0f431664895d49fd930d011a https://github.com/jqnatividad/qsv/commit/8402d3a8063ef161fc9ec68dd7f0f0601802d21d https://github.com/jqnatividad/qsv/commit/849615775505a25888a50b255ba0d544e878aeaf
- `excel`: optimized workbook opening operation https://github.com/jqnatividad/qsv/commit/67f662eba501e543ec44e5daf5eb175f8a8ae7b1
- build(deps): bump flexi_logger from 0.27.4 to 0.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1673
- build(deps): bump polars from 0.38.2 to 0.38.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1674
- build(deps): bump uuid from 1.7.0 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1675
- build(deps): bump hashbrown from 0.14.3 to 0.14.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1680
- build(deps): bump reqwest from 0.11.26 to 0.11.27 by @dependabot in https://github.com/jqnatividad/qsv/pull/1679
- build(deps): bump bytes from 1.5.0 to 1.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1685
- build(deps): bump regex from 1.10.3 to 1.10.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1686
- build(deps): bump indexmap from 2.2.5 to 2.2.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1687
- build(deps): bump rayon from 1.9.0 to 1.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1688
- build(deps): bump qsv_docopt from 1.6.0 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1691
- build(deps): bump reqwest from 0.12.1 to 0.12.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1693
- build(deps): bump serde_json from 1.0.114 to 1.0.115 by @dependabot in https://github.com/jqnatividad/qsv/pull/1694
- build(deps): bump itoa from 1.0.10 to 1.0.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1695
- build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1700
- build(deps): bump rust_decimal from 1.34.3 to 1.35.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1701
- build(deps): bump chrono from 0.4.35 to 0.4.37 by @dependabot in https://github.com/jqnatividad/qsv/pull/1702
- build(deps): bump tokio from 1.36.0 to 1.37.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1703
- build(deps): bump qsv-sniffer from 0.10.2 to 0.10.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1708
- build(deps): bump titlecase from 2.2.1 to 3.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1709
- build(deps): bump qsv-stats from 0.13.0 to 0.14.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1710
- applied select clippy recommendations
- updated several indirect dependencies
- added several benchmarks for new/changed commands
- bumped MSRV to 1.77.1
- use `#[cfg(debug_assertions)]` conditional compilation to avoid compiling debug code in release mode
- use patched forks of `jsonschema`, `cached`, `self_update` and `localzone` crates to avoid old dependencies that were causing dependency bloat
Fixed
- `count`: fixed polars_count_input helper, as it was always falling back to "slow" counting mode https://github.com/jqnatividad/qsv/commit/3484c89080d41d2e39457c918a893189aee64753
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.124.1...0.125.0
Published by jqnatividad almost 2 years ago
https://github.com/dathere/qsv - 0.124.1
Datapusher+ "Speed of Insight" Release! 🚀🚀🚀
This release is all about speed, speed, speed! We've made qsv even faster by leveraging Polars' multithreaded, mem-mapped CSV reader to get near-instant row counts of large CSV files, and near-instant SQL queries and aggregations with Datapusher+ - automagically inferring metadata and giving you quick insights into your data in seconds!
We're demoing our qsv-powered Datapusher+ at the March 2024 installment of CKAN Monthly Live on March 20, 2024, 13:00-14:00 UTC. Join us!
Beyond pushing data reliably at speed into your CKAN Datastore (it pushes real good! 😉), DP+ does some extended analysis, processing and enrichment of the data so it can be readily Used.
Both fetch and fetchpost commands now also have a --disk-cache option and are fully synched - forming the foundation for high-speed data enrichment from Web Services - including datHere's forthcoming, fully-integrated Data Enrichment Service.
🏇🏽 Hi-ho Quicksilver, away! 🏇🏽
Added
- `count`: automatically use Polars' multithreaded, mem-mapped CSV reader when the `polars` feature is enabled to get near-instant row counts of large CSV files even without an index https://github.com/jqnatividad/qsv/pull/1656
- `qsvdp`: added polars support to the Datapusher+-optimized binary variant, so we can do near-instant SQL queries and aggregations during DP+ processing https://github.com/jqnatividad/qsv/pull/1664
- `fetchpost`: added `--disk-cache` options and synced usage options with `fetch` https://github.com/jqnatividad/qsv/pull/1671
- extended `.infile-list` to skip empty and commented lines, and to validate file paths https://github.com/jqnatividad/qsv/commit/20a45c80fa32ef8a8060bb32cc94b7934da23229 and https://github.com/jqnatividad/qsv/commit/26509303719ce29e900cb73b5000671a78db6b4a
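The extended `.infile-list` handling can be sketched in Python as a loader that skips blank lines and comments and rejects paths that don't exist. This assumes `#` is the comment marker and that relative paths resolve against the current directory; `read_infile_list` is a hypothetical helper, not qsv's implementation.

```python
import tempfile
from pathlib import Path


def read_infile_list(list_path):
    """Read one input path per line from an .infile-list file.

    Blank lines and lines starting with '#' (assumed comment marker) are skipped;
    every remaining entry must point to an existing file.
    """
    paths = []
    text = Path(list_path).read_text(encoding="utf-8")
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        path = Path(line)
        if not path.is_file():
            raise FileNotFoundError(f"{list_path}:{lineno}: {line} is not a file")
        paths.append(path)
    return paths


# Demo: a list containing a comment, a blank line and one real CSV file.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = Path(tmp, "a.csv")
    csv_file.write_text("x\n1\n")
    listing = Path(tmp, "inputs.infile-list")
    listing.write_text(f"# inputs for this run\n\n{csv_file}\n")
    print(read_infile_list(listing))
```

Validating every path up front means a typo in the list fails fast with a line number, instead of partway through processing.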
Changed
- `sqlp`: automatically disable `read_csv()` fast path optimization when a custom delimiter is specified https://github.com/jqnatividad/qsv/pull/1648
- refactored `util::count_rows()` helper to also use polars if available https://github.com/jqnatividad/qsv/commit/1e09e17e440d3cdc11237d9d9e45cefb82da5a42 and https://github.com/jqnatividad/qsv/commit/8d321fe8ad4c288b72edc7e8d082fcd6ec304a32
- publish: updated Windows MSI publish GH Action workflow to use Wix 3.14 from 3.11 https://github.com/jqnatividad/qsv/commit/75894ef4e894f521056a93b4f0a14d7469bac022
- deps: bump polars from 0.38.1 to 0.38.2 https://github.com/jqnatividad/qsv/commit/5faf90ed830541a724768e808c7f07f0a418e2ab
- deps: update Luau from 0.614 to 0.616 https://github.com/jqnatividad/qsv/commit/eb197fe81738b4ed15352f5f89d5d5d1b0fad604 and https://github.com/jqnatividad/qsv/commit/52331da939a3cd278c6a1f474179bef2207364a8
- build(deps): bump sysinfo from 0.30.6 to 0.30.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1650
- build(deps): bump chrono from 0.4.34 to 0.4.35 by @dependabot in https://github.com/jqnatividad/qsv/pull/1651
- build(deps): bump strum from 0.26.1 to 0.26.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1658
- build(deps): bump qsv-stats from 0.12.0 to 0.13.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1663
- build(deps): bump anyhow from 1.0.80 to 1.0.81 by @dependabot in https://github.com/jqnatividad/qsv/pull/1662
- build(deps): bump reqwest from 0.11.25 to 0.11.26 by @dependabot in https://github.com/jqnatividad/qsv/pull/1667
- applied select clippy recommendations
- updated several indirect dependencies
- added several benchmarks for new/changed commands
Fixed
- dedup: fixed #1665 dedup not handling numeric values properly by adding a --numeric option https://github.com/jqnatividad/qsv/pull/1666
- joinp: re-enabled join validation tests now that Polars 0.38.2 join validation is working again https://github.com/jqnatividad/qsv/commit/5faf90ed830541a724768e808c7f07f0a418e2ab and https://github.com/jqnatividad/qsv/commit/fcfc75b855c615effb50f23c09a1d66ce70505e8
- count: was broken in the unreleased 0.124.0. The Polars-powered count requires a "clean" CSV file, as it infers the schema from the first 1000 rows of a CSV. This sometimes results in a spurious error (e.g. it infers that a column is numeric when it's not). 0.124.1 fixes this by falling back to the "regular" CSV reader if a Polars error occurs https://github.com/jqnatividad/qsv/commit/a2c086900d1c1f1ba8ed2b2d1eaf8e547e3ef740
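The count fix follows a common "fast path with fallback" pattern. The Rust sketch below imitates it under stated simplifications, and neither function is qsv's or Polars' code:
- fast_count is a hypothetical stand-in for a schema-inferring reader. Like the inference described above, it samples the first 1000 rows. It splits naively on commas and does not handle quoting.
- plain_count stands in for the regular CSV reader.

```rust
/// Stand-in for a schema-inferring "fast" counter: each column's type is
/// inferred from the first `infer_len` data rows, and any later value that
/// contradicts an inferred numeric type is an error. (Naive comma split.)
fn fast_count(csv: &str, infer_len: usize) -> Result<usize, String> {
    let mut lines = csv.lines();
    lines.next().ok_or("empty input")?; // skip the header
    let rows: Vec<Vec<&str>> = lines.map(|l| l.split(',').collect()).collect();
    let ncols = rows.first().map_or(0, |r| r.len());
    // a column is "numeric" if every sampled value parses as f64
    let numeric: Vec<bool> = (0..ncols)
        .map(|c| {
            rows.iter()
                .take(infer_len)
                .all(|r| r.get(c).map_or(false, |v| v.parse::<f64>().is_ok()))
        })
        .collect();
    for (i, row) in rows.iter().enumerate() {
        for (c, &is_num) in numeric.iter().enumerate() {
            if is_num && row.get(c).map_or(true, |v| v.parse::<f64>().is_err()) {
                return Err(format!("data row {}: column {} is not numeric", i + 1, c));
            }
        }
    }
    Ok(rows.len())
}

/// The "regular" counter: no type inference, just count the data lines.
fn plain_count(csv: &str) -> usize {
    csv.lines().count().saturating_sub(1)
}

/// Try the fast path first; fall back to the plain counter on any error.
fn count_rows(csv: &str) -> usize {
    fast_count(csv, 1000).unwrap_or_else(|_| plain_count(csv))
}
```

A file whose first 1000 values in a column are numbers but whose 1001st is text makes fast_count fail. count_rows still returns the correct count.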
Removed
- gender_guesser 0.2.0 has been released, so we removed its patch.crates-io entry https://github.com/jqnatividad/qsv/commit/97873a5c496bfd559d7a7804db4d28b94915d536
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.123.0...0.124.1
- Rust
Published by jqnatividad almost 2 years ago
https://github.com/dathere/qsv - 0.123.0
OPEN DATA DAY 2024 Release! 🎉🎉🎉
In celebration of Open Data Day, we're releasing qsv 0.123.0 - the biggest release ever with 330+ commits! qsv 0.123.0 keeps its focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.
We've been baking qsv pro for a while now, and it's almost ready for release. qsv pro is a cross-platform Desktop Data Wrangling tool marrying an Excel-like UI with the power of qsv, backed by a cloud-based data cleaning, enrichment and enhancement service that's easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.
Stay tuned!
Highlights:
- sqlp now has automatic read_csv() fast path optimization, often making optimized queries run dramatically faster - e.g. what took 6.09 seconds for a non-trivial SQL aggregation on an 18-column, 657MB CSV with 7.43 million rows now takes just 0.14 seconds with the optimization - 🚀 43.5x FASTER 🚀! [^1]

[^1]: measurements taken on an Apple Mac Mini 2023 model with an M2 Pro chip with 12 CPU cores & 32GB of RAM, running macOS Sonoma 14.4

```bash
# with fast path optimization turned off
/usr/bin/time qsv sqlp taxi.csv --no-optimizations "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
6.09 real 6.82 user 0.16 sys

# with fast path optimization, fully exploiting Polars' multithreaded, mem-mapped CSV reader!
/usr/bin/time qsv sqlp taxi.csv "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
0.14 real 1.09 user 0.09 sys

# in contrast, csvq takes 72.46 seconds - 517.57x slower
/usr/bin/time csvq "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
+----------+---------------------+
| VendorID | SUM(total_amount)   |
+----------+---------------------+
| 1        | 52377417.529256366  |
| 2        | 89959869.1264675    |
| 4        | 600584.6099999828   |
+----------+---------------------+
72.46 real 65.15 user 75.17 sys
```
"Traditional" SQL engines
qsv and csvq both operate on "bare" CSVs. For comparison, let's contrast qsv's performance against "traditional" SQL engines that require setup and import (aka ETL). Not counting setup and import time (which alone takes several minutes), we get:
sqlite 3.43.2 takes 2.910 seconds - 20.79x slower
```sql
sqlite> .timer on
sqlite> select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID;
1,52377417.53
2,89959869.13
4,600584.61
Run Time: real 2.910 user 2.569494 sys 0.272972
```
PostgreSQL 15.6 using PgAdmin 4 v6.12 takes 18.527 seconds - 132.34x slower
even with an index on the PostgreSQL table, qsv sqlp is still 5.96x faster
- sqlp now supports JSONL output format and adds compression support for Avro and Arrow output formats.
- fetch now has a --disk-cache option, so you can cache web service responses to disk, complete with cache control and expiry handling!
- jsonl is now multithreaded with additional --batch and --job options.
- split now has three modes: split by record count, split by number of chunks and split by file size.
- datefmt is a new top-level command for date formatting. We extracted it from apply to make it easier to use, and to set the stage for expanded date and timezone handling.
- enum now has a --start option.
- excel now has a --keep-zero-time option and improved datetime/duration parsing/handling with the upgrade of calamine from 0.23 to 0.24.
- tojsonl now has --trim and --no-boolean options and eliminates false positive boolean inferences.
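The "x faster" and "x slower" figures in the sqlp highlight are ratios. Each one divides a run's wall-clock ("real") time by the optimized sqlp run's 0.14 seconds. A tiny Rust check reproduces them. The indexed-PostgreSQL figure (5.96x) can't be re-derived this way, because its raw timing isn't given.

```rust
/// Ratio of two wall-clock ("real") times, rounded to two decimals.
fn speedup(slow_secs: f64, fast_secs: f64) -> f64 {
    (slow_secs / fast_secs * 100.0).round() / 100.0
}
```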
Added
- apply: add gender_guess operation https://github.com/jqnatividad/qsv/pull/1569
- datefmt: new top-level command for date formatting. https://github.com/jqnatividad/qsv/pull/1638
- enum: add --start option https://github.com/jqnatividad/qsv/pull/1631
- excel: added --keep-zero-time option; improved datetime/duration parsing/handling with upgrade of calamine from 0.23 to 0.24 https://github.com/jqnatividad/qsv/pull/1595
- fetch: add --disk-cache option https://github.com/jqnatividad/qsv/pull/1621
- jsonl: major performance refactor! Now multithreaded with additional --batch and --job options https://github.com/jqnatividad/qsv/pull/1553
- sniff: added additional mimetype/file formats detected by bumping file-format from 0.23 to 0.24 https://github.com/jqnatividad/qsv/pull/1589
- split: add <outdir> error handling and add usage text examples https://github.com/jqnatividad/qsv/pull/1585
- split: added --chunks option https://github.com/jqnatividad/qsv/pull/1587
- split: add --kb-size option https://github.com/jqnatividad/qsv/pull/1613
- sqlp: added JSONL output format and compression support for AVRO and Arrow output formats in https://github.com/jqnatividad/qsv/pull/1635
- tojsonl: add --trim option https://github.com/jqnatividad/qsv/pull/1554
- Add QSV_DOTENV_PATH env var https://github.com/jqnatividad/qsv/pull/1562
- Add license scan report and status by @fossabot in https://github.com/jqnatividad/qsv/pull/1550
- Added several benchmarks for new/changed commands
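The --chunks and --kb-size modes boil down to simple arithmetic and greedy packing. This Rust sketch is illustrative only. The names rows_per_chunk and split_by_size are hypothetical, and records are treated as single lines. The sketch also assumes each chunk repeats the header, which counts toward the size limit.

```rust
/// Hypothetical sketch of the "--chunks" mode: records per output file so
/// that `total_rows` are spread over at most `chunks` files (ceiling division).
fn rows_per_chunk(total_rows: usize, chunks: usize) -> usize {
    assert!(chunks > 0, "need at least one chunk");
    (total_rows + chunks - 1) / chunks
}

/// Hypothetical sketch of the "--kb-size" mode: greedily pack records into
/// chunks whose size in bytes (header and newlines included) stays at or
/// below `max_bytes`; a record too big to fit on its own gets its own chunk.
fn split_by_size<'a>(header: &'a str, records: &[&'a str], max_bytes: usize) -> Vec<Vec<&'a str>> {
    let mut chunks = Vec::new();
    let mut current = vec![header];
    let mut size = header.len() + 1; // +1 for the newline
    for &rec in records {
        let rec_size = rec.len() + 1;
        // start a new chunk only if the current one already holds data rows
        if size + rec_size > max_bytes && current.len() > 1 {
            chunks.push(std::mem::replace(&mut current, vec![header]));
            size = header.len() + 1;
        }
        current.push(rec);
        size += rec_size;
    }
    if current.len() > 1 {
        chunks.push(current);
    }
    chunks
}
```

Checking the size before adding each record is what keeps every chunk at or below the limit, rather than merely close to it.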
Changed
- luau: bumped Luau from 0.606 to 0.614
- freq: major performance refactor - https://github.com/jqnatividad/qsv/commit/1a3a4b4f54f7459ce120c2bc907385ad72d34d8e
- split: migrate to rayon from threadpool https://github.com/jqnatividad/qsv/pull/1555
- split: refactored to actually create chunks <= desired --kb-size, obviating need for hacky --sep-factor option https://github.com/jqnatividad/qsv/pull/1615
- tojsonl: improved true/false boolean inferencing false positive handling https://github.com/jqnatividad/qsv/pull/1641
- tojsonl: fine-tune boolean inferencing https://github.com/jqnatividad/qsv/pull/1643
- schema: use parallel sort when sorting enums for fields https://github.com/jqnatividad/qsv/commit/523c60a36bf45b4df5e66f3951a91948c22d5261
- Use array for rustflags to avoid conflicts with user flags by @clarfonthey in https://github.com/jqnatividad/qsv/pull/1548
- Make it easier and more consistent to package for distros by @alerque in https://github.com/jqnatividad/qsv/pull/1549
- Replace simple_home_dir with simple_expand_tilde crate https://github.com/jqnatividad/qsv/pull/1578
- build(deps): bump rayon from 1.8.0 to 1.8.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1547
- build(deps): bump rayon from 1.8.1 to 1.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1623
- build(deps): bump uuid from 1.6.1 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1551
- build(deps): bump jql-runner from 7.1.2 to 7.1.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1552
- build(deps): bump jql-runner from 7.1.3 to 7.1.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1602
- build(deps): bump jql-runner from 7.1.5 to 7.1.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1637
- build(deps): bump flexi_logger from 0.27.3 to 0.27.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1556
- build(deps): bump regex from 1.10.2 to 1.10.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1557
- build(deps): bump cached from 0.47.0 to 0.48.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1558
- build(deps): bump cached from 0.48.0 to 0.48.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1560
- build(deps): bump cached from 0.48.1 to 0.49.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1618
- build(deps): bump chrono from 0.4.31 to 0.4.32 by @dependabot in https://github.com/jqnatividad/qsv/pull/1559
- build(deps): bump chrono from 0.4.32 to 0.4.33 by @dependabot in https://github.com/jqnatividad/qsv/pull/1566
- build(deps): bump mlua from 0.9.4 to 0.9.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1565
- build(deps): bump mlua from 0.9.5 to 0.9.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1632
- build(deps): bump serde from 1.0.195 to 1.0.196 by @dependabot in https://github.com/jqnatividad/qsv/pull/1568
- build(deps): bump serde from 1.0.196 to 1.0.197 by @dependabot in https://github.com/jqnatividad/qsv/pull/1612
- build(deps): bump serde_json from 1.0.111 to 1.0.112 by @dependabot in https://github.com/jqnatividad/qsv/pull/1567
- build(deps): bump serde_json from 1.0.112 to 1.0.113 by @dependabot in https://github.com/jqnatividad/qsv/pull/1576
- build(deps): bump serde_json from 1.0.113 to 1.0.114 by @dependabot in https://github.com/jqnatividad/qsv/pull/1610
- bump Polars from 0.36 to 0.37 https://github.com/jqnatividad/qsv/pull/1570
- build(deps): bump polars from 0.37.0 to 0.38.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1629
- build(deps): bump polars from 0.38.0 to 0.38.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1634
- build(deps): bump strum from 0.25.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1572
- build(deps): bump indexmap from 2.1.0 to 2.2.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1575
- build(deps): bump indexmap from 2.2.1 to 2.2.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1579
- build(deps): bump indexmap from 2.2.2 to 2.2.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1601
- build(deps): bump indexmap from 2.2.4 to 2.2.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1633
- build(deps): bump robinraju/release-downloader from 1.8 to 1.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1574
- build(deps): bump itertools from 0.12.0 to 0.12.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1577
- build(deps): bump rust_decimal from 1.33.1 to 1.34.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1580
- build(deps): bump rust_decimal from 1.34.0 to 1.34.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1582
- build(deps): bump rust_decimal from 1.34.2 to 1.34.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1597
- build(deps): bump reqwest from 0.11.23 to 0.11.24 by @dependabot in https://github.com/jqnatividad/qsv/pull/1581
- build(deps): bump tokio from 1.35.1 to 1.36.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1583
- build(deps): bump tempfile from 3.9.0 to 3.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1590
- build(deps): bump tempfile from 3.10.0 to 3.10.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1622
- build(deps): bump indicatif from 0.17.7 to 0.17.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1598
- build(deps): bump csvs_convert from 0.8.8 to 0.8.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1596
- build(deps): bump ahash from 0.8.7 to 0.8.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1599
- build(deps): bump ahash from 0.8.8 to 0.8.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1611
- build(deps): bump ahash from 0.8.9 to 0.8.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1624
- build(deps): bump ahash from 0.8.10 to 0.8.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1640
- build(deps): bump governor from 0.6.0 to 0.6.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1603
- build(deps): bump semver from 1.0.21 to 1.0.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1606
- build(deps): bump ryu from 1.0.16 to 1.0.17 by @dependabot in https://github.com/jqnatividad/qsv/pull/1605
- build(deps): bump anyhow from 1.0.79 to 1.0.80 by @dependabot in https://github.com/jqnatividad/qsv/pull/1604
- build(deps): bump geosuggest-core from 0.6.0 to 0.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1607
- build(deps): bump geosuggest-utils from 0.6.0 to 0.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1608
- build(deps): bump pyo3 from 0.20.2 to 0.20.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1616
- build(deps): bump crossbeam-channel from 0.5.11 to 0.5.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1627
- build(deps): bump log from 0.4.20 to 0.4.21 by @dependabot in https://github.com/jqnatividad/qsv/pull/1628
- build(deps): bump sysinfo from 0.30.5 to 0.30.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1636
- build(deps): bump qsv-sniffer from 0.10.1 to 0.10.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1644
- deps: bump halfbrown from 0.24 to 0.25 https://github.com/jqnatividad/qsv/commit/b32fc7161715fc0d3cc96b1566f89354bea36abf
- apply select clippy suggestions
- update several indirect dependencies
- pin Rust nightly to 2024-02-23 - the nightly that Polars 0.38 can be built with
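One plausible way to reduce false-positive boolean inferences is to require that a column holds exactly two distinct values forming a recognized true/false pair. The Rust sketch below shows that rule. The pair list and the infer_boolean function are assumptions, not necessarily what tojsonl does.

```rust
/// Hypothetical rule: a column is boolean only if it has exactly two distinct
/// (trimmed, case-insensitive) values that form a known true/false pair.
fn infer_boolean(values: &[&str]) -> bool {
    const PAIRS: [(&str, &str); 5] = [("true", "false"), ("yes", "no"), ("y", "n"), ("t", "f"), ("1", "0")];
    let mut distinct: Vec<String> = values.iter().map(|v| v.trim().to_lowercase()).collect();
    distinct.sort();
    distinct.dedup();
    if distinct.len() != 2 {
        return false;
    }
    let (a, b) = (distinct[0].as_str(), distinct[1].as_str());
    // accept the pair in either order
    PAIRS.iter().any(|&(t, f)| (a == t && b == f) || (a == f && b == t))
}
```

A naive "cardinality is two" check would wrongly flag a column like ["apple", "banana"]. Requiring a known pair avoids that.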
Fixed
- fix: fix feature = "cargo-clippy" deprecation by @rex4539 in https://github.com/jqnatividad/qsv/pull/1626
- stats: fixed cache.json file not being updated properly https://github.com/jqnatividad/qsv/commit/b9c43713b0943baf2d70eb7089e1d8f05b848b9d
Removed
- Removed datefmt subcommand from apply https://github.com/jqnatividad/qsv/pull/1638
New Contributors
- @clarfonthey made their first contribution in https://github.com/jqnatividad/qsv/pull/1548
- @alerque made their first contribution in https://github.com/jqnatividad/qsv/pull/1549
- @fossabot made their first contribution in https://github.com/jqnatividad/qsv/pull/1550
- @rex4539 made their first contribution in https://github.com/jqnatividad/qsv/pull/1626
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.122.0...0.123.0
- Rust
Published by jqnatividad almost 2 years ago
https://github.com/dathere/qsv - 0.122.0
👉 REQUEST FOR USE CASES: 👈
Please help define the future of qsv. Add what you're currently using qsv for here - https://github.com/jqnatividad/qsv/discussions/1529
Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.
Highlights:
- qsvpy is now available in the prebuilt binaries for select platforms! It's a new qsv binary variant with the python feature, enabling the py command. Three subvariants are available - qsvpy310, qsvpy311 and qsvpy312, corresponding to Python 3.10, 3.11 and 3.12 respectively.
- Removed the generate command, as generate's main dependency is unmaintained and has old dependencies. generate was also not used much, as the test data it generated was not well suited for training models and it was too slow, so we decided to remove it even before the synthesize (#235) command is ready.
- reverse now has index support and can work in "streaming" mode, handling larger-than-memory CSV files.
- sort and sample: users can now choose from three Random Number Generator (RNG) algorithms with the --rng option - standard, faster & cryptosecure.
- pseudo now has --start, --increment & --formatstr options.
- fmt now has a --no-final-newline option to suppress the final newline for better interoperability with other tools, specifically Excel. It also treats "T" as a special value for the tab character in the --out-delimiter option.
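Index-backed reversal works in two steps:
1. Record the byte offset where each line starts.
2. Seek to those offsets from last to first.

Only the offsets and one line are held in memory at a time. This Rust sketch uses a hypothetical in-memory offset list, not qsv's on-disk index format. It also assumes that no quoted field contains a newline.

```rust
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom, Write};

/// Hypothetical minimal "index": the byte offset where each line starts,
/// plus a final end-of-data sentinel.
fn build_index<R: Read>(reader: R) -> io::Result<Vec<u64>> {
    let mut reader = BufReader::new(reader);
    let mut offsets = Vec::new();
    let mut pos = 0u64;
    let mut line = Vec::new();
    loop {
        line.clear();
        let n = reader.read_until(b'\n', &mut line)?;
        if n == 0 {
            break;
        }
        offsets.push(pos);
        pos += n as u64;
    }
    offsets.push(pos); // sentinel: end of data
    Ok(offsets)
}

/// Read the bytes of one line, normalizing its ending to a single '\n'.
fn read_line_at<R: Read + Seek>(input: &mut R, start: u64, end: u64) -> io::Result<Vec<u8>> {
    input.seek(SeekFrom::Start(start))?;
    let mut buf = vec![0u8; (end - start) as usize];
    input.read_exact(&mut buf)?;
    while matches!(buf.last(), Some(&b'\n') | Some(&b'\r')) {
        buf.pop();
    }
    buf.push(b'\n');
    Ok(buf)
}

/// Write the header, then the data lines last-to-first.
fn write_reversed<R: Read + Seek, W: Write>(input: &mut R, index: &[u64], out: &mut W) -> io::Result<()> {
    if index.len() < 2 {
        return Ok(()); // empty input
    }
    out.write_all(&read_line_at(input, index[0], index[1])?)?; // header stays first
    for i in (1..index.len() - 1).rev() {
        out.write_all(&read_line_at(input, index[i], index[i + 1])?)?;
    }
    Ok(())
}
```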
Added
- reverse: now has index support and can work in "streaming" mode https://github.com/jqnatividad/qsv/pull/1531
- sort: added --rng <kind> for different kinds of RNGs - standard, faster & cryptosecure https://github.com/jqnatividad/qsv/pull/1535
- sample: added --rng <kind> option (standard, faster & cryptosecure) https://github.com/jqnatividad/qsv/pull/1532
- pseudo: major refactor. Added --start, --increment & --formatstr options https://github.com/jqnatividad/qsv/pull/1541
- fmt: add --no-final-newline option https://github.com/jqnatividad/qsv/pull/1545
- added additional benchmarks
- added additional tests for new options. We now have ~1,300 tests!
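Here is a sketch of what --start, --increment and --formatstr control in pseudo. Each distinct value gets the next number in the sequence, rendered through a format string. In this Rust sketch, the pseudonymize function and the "{}" placeholder syntax are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical pseudonymization: the first occurrence of each distinct value
/// gets the next number (start, start + increment, ...), substituted for "{}"
/// in `formatstr`; repeated values reuse their assigned ID.
fn pseudonymize(values: &[&str], start: u64, increment: u64, formatstr: &str) -> Vec<String> {
    let mut ids: HashMap<&str, String> = HashMap::new();
    let mut next = start;
    values
        .iter()
        .map(|&v| {
            ids.entry(v)
                .or_insert_with(|| {
                    let id = formatstr.replace("{}", &next.to_string());
                    next += increment;
                    id
                })
                .clone()
        })
        .collect()
}
```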
Changed
- fmt: --out-delimiter now treats "T" as a special value for the tab character https://github.com/jqnatividad/qsv/pull/1546
- build(deps): bump whatlang from 0.16.3 to 0.16.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1525
- build(deps): bump serde_json from 1.0.110 to 1.0.111 by @dependabot in https://github.com/jqnatividad/qsv/pull/1524
- build(deps): bump pyo3 from 0.20.1 to 0.20.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1526
- build(deps): bump sysinfo from 0.30.3 to 0.30.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1523
- build(deps): bump sysinfo from 0.30.4 to 0.30.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1530
- build(deps): bump serial_test from 2.0.0 to 3.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1534
- build(deps): bump mlua from 0.9.2 to 0.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1540
- build(deps): bump mlua from 0.9.3 to 0.9.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1542
- build(deps): bump simple-home-dir from 0.2.1 to 0.2.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1544
- apply select clippy suggestions
- update several indirect dependencies
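Here is a short Rust sketch of the two fmt behaviors above: "T" as shorthand for a tab output delimiter, and an option to omit the final newline. The function names are hypothetical. Fields are joined without CSV quoting, so this is illustrative rather than a faithful fmt implementation.

```rust
/// "T" is shorthand for tab; otherwise the delimiter must be one character.
fn parse_out_delimiter(arg: &str) -> Result<char, String> {
    if arg == "T" {
        return Ok('\t');
    }
    let mut chars = arg.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Ok(c),
        _ => Err(format!("delimiter must be a single character, got {:?}", arg)),
    }
}

/// Join rows with the output delimiter; optionally omit the final newline.
fn format_rows(rows: &[Vec<&str>], out_delimiter: &str, no_final_newline: bool) -> Result<String, String> {
    let delim = parse_out_delimiter(out_delimiter)?.to_string();
    let mut out = rows
        .iter()
        .map(|r| r.join(delim.as_str()))
        .collect::<Vec<_>>()
        .join("\n");
    if !no_final_newline && !rows.is_empty() {
        out.push('\n');
    }
    Ok(out)
}
```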
Removed
- removed generate command https://github.com/jqnatividad/qsv/pull/1527
- removed generate feature from GitHub Action workflows https://github.com/jqnatividad/qsv/pull/1528
- sample: removed --faster RNG sampling option, replacing it with --rng https://github.com/jqnatividad/qsv/pull/1532
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.121.0...0.122.0
- Rust
Published by jqnatividad about 2 years ago
https://github.com/dathere/qsv - 0.121.0
Two days ago, qsv 0.120.0 was released. Hours later, significant updates occurred in our ecosystem: Polars upgraded to version 0.36, Homebrew rolled out support for Rust 1.75.0, and our pull request for 'cached' was merged.
In light of these developments, we're releasing 0.121.0 out of cycle to leverage the new features, fixes and performance enhancements in these key components integral to qsv.
👉 REQUEST FOR USE CASES: 👈 Please help define the future of qsv. Add what you're currently using qsv for here - https://github.com/jqnatividad/qsv/discussions/1529
Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.
Added
- sqlp: with Polars 0.36, it now supports:
  - subqueries for JOIN and FROM (examples)
  - REGEXP and RLIKE pattern matching (examples)
  - common variant spelling STDEV in the SQL engine (in addition to STDDEV)
  - and more under the hood improvements!
- sqlp: now supports writing to Apache Avro format https://github.com/jqnatividad/qsv/commit/32f2fbb1b06dfbee4e7823521e9991a306e7eb44
- sqlp: when writing to CSV --format, if the --output file has a TSV or TAB extension, it will automatically use the tab delimiter https://github.com/jqnatividad/qsv/commit/c97048cfc8c0fed01d7b32d3173be508135b9769
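The extension check for sqlp's CSV output can be sketched in a few lines of Rust. delimiter_for_output is a hypothetical name, and matching the extension case-insensitively is an assumption of this sketch.

```rust
use std::path::Path;

/// Pick the CSV delimiter from the output file name: tab for .tsv/.tab,
/// comma otherwise (case-insensitive match is an assumption here).
fn delimiter_for_output(output: &str) -> u8 {
    let ext = Path::new(output)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("tsv") | Some("tab") => b'\t',
        _ => b',',
    }
}
```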
Changed
- Bump polars from 0.35 to 0.36 https://github.com/jqnatividad/qsv/pull/1521
- build(deps): bump serde from 1.0.193 to 1.0.194 by @dependabot in https://github.com/jqnatividad/qsv/pull/1520
- build(deps): bump serde_json from 1.0.109 to 1.0.110 by @dependabot in https://github.com/jqnatividad/qsv/pull/1519
- build(deps): bump semver from 1.0.20 to 1.0.21 by @dependabot in https://github.com/jqnatividad/qsv/pull/1518
- build(deps): bump serde_stacker from 0.1.10 to 0.1.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1517
- build(deps): bump cached from 0.46.1 to 0.47.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1522
- bumped MSRV to 1.75.0
Fixed
- cat: fixed performance regression in rowskey by moving unchanging variables out of the hot loop - https://github.com/jqnatividad/qsv/commit/96a40e93b5ab09655aa90f8653014c96d3da652b
- sqlp: Polars 0.36 fixed the SQL SUBSTR() function
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.120.0...0.121.0
- Rust
Published by jqnatividad about 2 years ago
https://github.com/dathere/qsv - 0.120.0
Happy New Year! 🎉🎉🎉 Here's the first release of 2024, the biggest ever with 280+ commits! qsv 0.120.0 continues to focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.
Apart from wrapping qsv with a User Interface, qsv pro also comes with a retinue of related cloud-based data cleaning, enrichment and enhancement services along with expanded metadata inferencing to make your Data Useful, Usable and Used!
qsv pro draws inspiration from OpenRefine, but reimagined without its file size and speed limitations, with qsv pro having the ability to process multi-gigabyte files in seconds.
It incorporates hard lessons we learned in the past 12 years deploying Data Portals and Data Pipelines to create a new Data/Metadata Wrangling and AI-assisted Data Publishing service that's easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.
But it's not quite ready for release yet, so stay tuned!
We're now taking signups for a preview release, however, so if you're interested, please sign up here!
Excitingly, qsv was also mentioned on Hacker News in this thread on Dec 23, 2023! As a result, we're now at almost 2,000 stars on GitHub, up from 900 stars on Dec 22! 🎉🎉🎉
Stay tuned for more advancements in 2024 – it's set to be a landmark year for qsv! 🦄🦄🦄
Added
- cat: add rowskey --group options; increased perf of rowskey https://github.com/jqnatividad/qsv/pull/1508
- validate: add --trim and --quiet options https://github.com/jqnatividad/qsv/pull/1452
- apply & applydp: operations regex_replace now supports empty --replacement with the "" special value https://github.com/jqnatividad/qsv/pull/1470 and https://github.com/jqnatividad/qsv/pull/1471
- exclude: also consider rows with empty fields https://github.com/jqnatividad/qsv/pull/1498
- extsort: add --tmp-dir option https://github.com/jqnatividad/qsv/commit/ca1f46145cf6a06ad4401e2bf30f82415cc2ef82
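Here is a Rust sketch of what concatenating rows "by key" involves:
- The output header is the union of all input headers, in first-seen order.
- Missing columns stay empty.
- An optional group column records each row's source.

The Table type, the cat_rowskey function and placing the group column first are assumptions for illustration. The per-file column mapping is computed once, outside the row loop, in the spirit of keeping unchanging work out of the hot loop.

```rust
use std::collections::HashMap;

/// A parsed input file (hypothetical type for this sketch).
struct Table<'a> {
    name: &'a str,
    headers: Vec<&'a str>,
    rows: Vec<Vec<&'a str>>,
}

fn cat_rowskey<'a>(tables: &[Table<'a>], group_col: Option<&'a str>) -> (Vec<&'a str>, Vec<Vec<&'a str>>) {
    // union of headers, in first-seen order
    let mut headers: Vec<&'a str> = Vec::new();
    for t in tables {
        for &h in &t.headers {
            if !headers.contains(&h) {
                headers.push(h);
            }
        }
    }
    let pos: HashMap<&str, usize> = headers.iter().enumerate().map(|(i, &h)| (h, i)).collect();
    let mut rows = Vec::new();
    for t in tables {
        // input column index -> output column index, computed once per file
        let mapping: Vec<usize> = t.headers.iter().map(|h| pos[h]).collect();
        for row in &t.rows {
            let mut out: Vec<&'a str> = vec![""; headers.len()];
            for (i, &value) in row.iter().enumerate() {
                if let Some(&p) = mapping.get(i) {
                    out[p] = value;
                }
            }
            if group_col.is_some() {
                out.insert(0, t.name);
            }
            rows.push(out);
        }
    }
    if let Some(g) = group_col {
        headers.insert(0, g);
    }
    (headers, rows)
}
```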
Changed
- validate: faster RFC4180 validation with byterecords and SIMD-accelerated utf8 validation https://github.com/jqnatividad/qsv/pull/1440
- excel: minor performance tweaks https://github.com/jqnatividad/qsv/pull/1446
- apply, applydp, explode, geocode, pseudo: consolidate redundant code and use one replace_column_value helper fn in util.rs https://github.com/jqnatividad/qsv/pull/1456
- excel: bump calamine from 0.22 to 0.23 https://github.com/jqnatividad/qsv/pull/1473
- excel & joinp: use atoi_simd for faster &[u8] to int conversion https://github.com/jqnatividad/qsv/commit/9521f3e3fb73f600e6691188a65e19eda6cd317e
- cat, describegpt, headers, sqlp, to, tojsonl: refactor commands that accept multiple input files to use improved process_input helper https://github.com/jqnatividad/qsv/pull/1496
- fetch & fetchpost: get_response refactor for maintainability and performance https://github.com/jqnatividad/qsv/pull/1507
- luau: replaced --no-colindex option with --colindex option. --colindex slows down processing and is not often used, so make it an option, not the default. https://github.com/jqnatividad/qsv/commit/a0c856807c47f00f531837ae124d412fca834cd2
- make thousands crate optional with apply feature in https://github.com/jqnatividad/qsv/pull/1453
- build(deps): bump uuid from 1.6.0 to 1.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1430
- build(deps): bump serde from 1.0.192 to 1.0.193 by @dependabot in https://github.com/jqnatividad/qsv/pull/1432
- build(deps): bump data-encoding from 2.4.0 to 2.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1435
- build(deps): bump mlua from 0.9.1 to 0.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1436
- build(deps): bump url from 2.4.1 to 2.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1437
- build(deps): bump jql-runner from 7.0.6 to 7.0.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1439
- build(deps): bump jql-runner from 7.0.7 to 7.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1447
- build(deps): bump jql-runner from 7.1.0 to 7.1.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1457
- build(deps): bump jql-runner from 7.1.1 to 7.1.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1486
- build(deps): bump hashbrown from 0.14.2 to 0.14.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1441
- build(deps): bump redis from 0.23.3 to 0.23.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1442
- build(deps): bump redis from 0.23.3 to 0.24.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1455
- build(deps): bump atoi_simd from 0.15.3 to 0.15.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1444
- build(deps): bump atoi_simd from 0.15.4 to 0.15.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1445
- build(deps): bump atoi_simd from 0.15.5 to 0.15.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1512
- build(deps): bump actions/setup-python from 4.7.1 to 4.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1454
- build(deps): bump actions/setup-python from 4.8.0 to 5.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1459
- build(deps): bump actions/stale from 8 to 9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1463
- build(deps): bump itoa from 1.0.9 to 1.0.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1464
- build(deps): bump tokio from 1.34.0 to 1.35.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1465
- build(deps): bump tokio from 1.35.0 to 1.35.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1483
- build(deps): bump ryu from 1.0.15 to 1.0.16 by @dependabot in https://github.com/jqnatividad/qsv/pull/1466
- build(deps): bump file-format from 0.22.0 to 0.23.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1468
- build(deps): bump github/codeql-action from 2 to 3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1476
- build(deps): bump geosuggest-utils from 0.5.1 to 0.5.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1479
- build(deps): bump geosuggest-core from 0.5.1 to 0.5.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1478
- build(deps): bump reqwest from 0.11.22 to 0.11.23 by @dependabot in https://github.com/jqnatividad/qsv/pull/1480
- build(deps): bump calamine from 0.23.0 to 0.23.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1481
- build(deps): bump qsv-sniffer from 0.10.0 to 0.10.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1484
- build(deps): bump anyhow from 1.0.75 to 1.0.76 by @dependabot in https://github.com/jqnatividad/qsv/pull/1485
- build(deps): bump futures from 0.3.29 to 0.3.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1492
- build(deps): bump futures-util from 0.3.29 to 0.3.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1491
- build(deps): bump crossbeam-channel from 0.5.9 to 0.5.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1490
- build(deps): bump sysinfo from 0.29.10 to 0.29.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1443
- Bump sysinfo from 0.29.11 to 0.30 https://github.com/jqnatividad/qsv/pull/1489
- build(deps): bump sysinfo from 0.30.0 to 0.30.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1495
- build(deps): bump sysinfo from 0.30.1 to 0.30.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1504
- build(deps): bump sysinfo from 0.30.2 to 0.30.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1509
- build(deps): bump tabwriter from 1.3.0 to 1.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1500
- build(deps): bump tempfile from 3.8.1 to 3.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1502
- build(deps): bump qsv_docopt from 1.4.0 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1503
- build(deps): bump ahash from 0.8.6 to 0.8.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1510
- build(deps): bump serde_json from 1.0.108 to 1.0.109 by @dependabot in https://github.com/jqnatividad/qsv/pull/1511
- apply select clippy suggestions
- update several indirect dependencies
- pin Rust nightly to 2023-12-23
Fixed
- apply: fix for dynfmt and calcconv subcommands not working in release mode https://github.com/jqnatividad/qsv/pull/1467
- luau: check for excess mapped columns earlier; otherwise, we get a CSV "different field count" error https://github.com/jqnatividad/qsv/commit/db1581159590205af9befaade5c047d316c9c8b3
Removed
- luau: remove unneeded --jit option as we precompile luau scripts to bytecode https://github.com/jqnatividad/qsv/pull/1438
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.119.0...0.120.0
- Rust
Published by jqnatividad about 2 years ago
https://github.com/dathere/qsv - 0.119.0
Highlights:
As we prepare for version 1.0, we're focusing on performance, stability and reliability as we set the stage for qsv pro - a cloud-backed UI version of qsv powered by Tauri, set to be released in 2024. Stay tuned!
- diff is now out of beta and blazingly fast! Give "the fastest CSV-diff in the world" a try 😉!
- joinp now supports snappy automatic compression/decompression!
- sqlp & joinp now recognize the QSV_COMMENT_CHAR environment variable, allowing you to skip comment lines in your input CSV files. They're also faster with the upgrade to Polars 0.35.4.
- sqlp now supports subqueries, table aliases, and more!
- luau: upgraded embedded Luau from 0.599 to 0.604; refactored code to reduce unneeded allocations and increase performance (more than doubling it!) as we prepare for extended recipe support.
- cat is now even faster with the --flexible option. If you know your CSV files are valid, you can use this option to skip CSV validation and make cat run twice as fast!
- qsv can now add a Byte Order Mark (BOM) header sequence to produce Excel-friendly CSVs on Windows with the QSV_OUTPUT_BOM environment variable.
- stats, sort, schema & validate are now faster with the use of atoi_simd to directly convert &[u8] to integer, skipping unnecessary utf8 validation, while also using SIMD CPU instructions for noticeably faster performance.
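The atoi_simd change parses integers straight from the raw field bytes, skipping the separate UTF-8 validation step. The scalar Rust sketch below shows that idea only. It does not use SIMD and is not the atoi_simd crate's API (parse_i64 is a hypothetical name).

```rust
/// Parse an optionally signed base-10 i64 directly from bytes, with no
/// UTF-8 validation step. Negative numbers are accumulated downward so
/// that i64::MIN parses without overflow.
fn parse_i64(bytes: &[u8]) -> Option<i64> {
    let (neg, digits) = match *bytes.first()? {
        b'-' => (true, &bytes[1..]),
        b'+' => (false, &bytes[1..]),
        _ => (false, bytes),
    };
    if digits.is_empty() {
        return None;
    }
    let mut acc: i64 = 0;
    for &b in digits {
        if !b.is_ascii_digit() {
            return None;
        }
        let d = i64::from(b - b'0');
        acc = if neg {
            acc.checked_mul(10)?.checked_sub(d)?
        } else {
            acc.checked_mul(10)?.checked_add(d)?
        };
    }
    Some(acc)
}
```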
Added
- diff: added option/flag for headers in output by @janriemer in https://github.com/jqnatividad/qsv/pull/1395
- diff: added option/flag --delimiter-output by @janriemer in https://github.com/jqnatividad/qsv/pull/1402
- cat: added --flexible option to make cat rows faster still https://github.com/jqnatividad/qsv/pull/1408
- sqlp & joinp: both commands now recognize QSV_COMMENT_CHAR env var https://github.com/jqnatividad/qsv/pull/1412
- joinp: added snappy compression/decompression support https://github.com/jqnatividad/qsv/pull/1413
- geocode: now automatically decompresses snappy-compressed index files https://github.com/jqnatividad/qsv/pull/1429
- Add Byte Order Mark (BOM) output support https://github.com/jqnatividad/qsv/pull/1424
- Added Codacy code quality badge https://github.com/jqnatividad/qsv/commit/99591297d59b3c45363592516d5ecb7d4d98d5c8
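The UTF-8 BOM is the three-byte sequence EF BB BF, which is U+FEFF encoded as UTF-8. Excel commonly relies on it to recognize a CSV file as UTF-8 rather than a legacy code page. The minimal Python sketch below shows what adding it amounts to. It does not read QSV_OUTPUT_BOM itself, and csv_bytes_with_bom is a hypothetical helper, not part of qsv.

```python
import csv
import io
from typing import List

# U+FEFF encoded as UTF-8: the three bytes placed at the very start of the file.
UTF8_BOM = b"\xef\xbb\xbf"


def csv_bytes_with_bom(rows: List[List[str]]) -> bytes:
    """Serialize rows as CSV and prefix a UTF-8 BOM (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    # The "utf-8-sig" codec emits the BOM before the encoded text.
    return buf.getvalue().encode("utf-8-sig")


data = csv_bytes_with_bom([["city"], ["Bogotá"]])
print(data[:3] == UTF8_BOM)  # True
```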
Changed
- stats, sort, schema & validate: use atoi_simd to directly convert &[u8] to integer, skipping unnecessary utf8 validation, while also using SIMD instructions for noticeably faster performance
- cat: faster cat rows https://github.com/jqnatividad/qsv/pull/1407
- count: optimize --width option https://github.com/jqnatividad/qsv/pull/1411
- luau: upgrade embedded Luau from 0.603 to 0.604 https://github.com/jqnatividad/qsv/pull/1426
- use atoi_simd for fast &[u8] to int conversion https://github.com/jqnatividad/qsv/pull/1423
- luau: performance refactor https://github.com/jqnatividad/qsv/commit/4cebd7c9a4b3f9f754fd2746484c24fa61ee1286
- build(deps): bump csv-diff from 0.1.0-beta.4 to 0.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1394
- build(deps): bump serde_json from 1.0.107 to 1.0.108 by @dependabot in https://github.com/jqnatividad/qsv/pull/1393
- build(deps): bump indexmap from 2.0.2 to 2.1.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1397
- build(deps): bump jql-runner from 7.0.4 to 7.0.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1399
- build(deps): bump jql-runner from 7.0.5 to 7.0.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1400
- build(deps): bump file-format from 0.21.0 to 0.22.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1401
- build(deps): bump cached from 0.46.0 to 0.46.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1403
- build(deps): bump serde from 1.0.190 to 1.0.192 by @dependabot in https://github.com/jqnatividad/qsv/pull/1404
- build(deps): bump tokio from 1.33.0 to 1.34.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1409
- build(deps): bump flexi_logger from 0.27.2 to 0.27.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1410
- build(deps): bump qsv-stats from 0.11.0 to 0.12.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1415
- build(deps): bump itertools from 0.11.0 to 0.12.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1418
- build(deps): bump rust_decimal from 1.33.0 to 1.33.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1420
- build(deps): bump polars from 0.35.2 to 0.35.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1425
- build(deps): bump uuid from 1.5.0 to 1.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1428
- bump MSRV to 1.74.0
- apply select clippy suggestions
- update several indirect dependencies
- pin Rust nightly to 2023-11-18
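The atoi_simd change parses integers straight from raw field bytes, without first validating and decoding them as UTF-8 text. The scalar Python sketch below illustrates that idea only. atoi_simd additionally processes several digits per SIMD instruction, which is not attempted here. parse_int_bytes is a hypothetical name. Unlike a fixed-width Rust parser, the sketch relies on Python's unbounded integers and therefore skips overflow checks.

```python
from typing import Optional


def parse_int_bytes(field: bytes) -> Optional[int]:
    """Parse an optionally signed ASCII decimal integer directly from bytes.

    No UTF-8 decoding happens: each byte is range-checked against the ASCII
    digits (0x30..0x39) and accumulated. Returns None for non-integers.
    """
    if not field:
        return None
    sign, start = 1, 0
    if field[0] in b"+-":
        sign = -1 if field[0] == 0x2D else 1  # 0x2D is "-"
        start = 1
    if start == len(field):
        return None  # a lone sign is not a number
    value = 0
    for byte in field[start:]:
        digit = byte - 0x30  # b"0" is 0x30
        if not 0 <= digit <= 9:
            return None
        value = value * 10 + digit
    return sign * value


print(parse_int_bytes(b"-42"), parse_int_bytes(b"4x2"))  # -42 None
```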
Fixed
- pseudo: detect when more than one column is selected for pseudonymization https://github.com/jqnatividad/qsv/commit/0b093726bb964c2a4a6eec15c0e30ed3660fdf97
- dotenv (.env) tweaks/fixes https://github.com/jqnatividad/qsv/pull/1427
- fix several typos https://github.com/jqnatividad/qsv/commit/723443eed4ac0f692cdd6ac4a1af4d82e22fda8b
- fix several markdown lints
Removed
- remove fast-float as std float parse is now also using Eisel-Lemire algorithm https://github.com/jqnatividad/qsv/pull/1414
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.118.0...0.119.0
NOTE:
To verify the prebuilt binary zip archives, see Verifying the Integrity of the Prebuilt Binaries Zip Archive.
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.118.0
Highlights:
- With the Polars upgrade to 0.34.2, sqlp and joinp enjoy expanded capabilities and a noticeable performance boost. 🦄🏇
- We now publish the 500, 1000, 5000 and 15000 Geonames cities indices for the geocode command, and users can easily switch indices with the index-load subcommand. As the names imply, the 500 index contains cities with populations of 500 or more, the 1000 index contains cities with populations of 1000 or more, and so on.
  The 15000 index (default) is the smallest (13mb) and fastest, with ~26k cities. The 5000 index is 21mb with ~53k cities, and the 1000 index is 44mb with ~140k cities. The 500 index is the largest (56mb) and slowest, with ~200k cities. 🎠
- The geocode command now returns US Census FIPS codes for US places with the %json and %pretty-json formats, returning both US State and US County FIPS codes, with upcoming support for Cities and other US Census geographies (School Districts, Voting Districts, Congressional Districts, etc.) 🎠
- Improved performance for the stats, schema and tojsonl commands with the stats cache bincode refactor. This is especially noticeable for large CSV files, as stats previously created large bincode cache files by default.
  The bincode cache allows other commands (currently, only schema and tojsonl) to skip recomputing statistics and deserialize the saved stats data structures directly into memory. Now, stats will only create a bincode file if the --stats-binout option is specified (typically, before using the schema and tojsonl commands). stats will continue to create a stats CSV cache file by default, but it is much smaller than the bincode file and, unlike the bincode cache, universally applicable. 🏇
- self-update will now verify updates by checking the zipsign signature of the release zip archive before applying it. This should make it harder for malicious actors to compromise the self-update process. Version 0.118.0 has the verification code, and future releases will use this new verification process.
  In addition, all zip archives will be zipsigned starting with this release.
  Users can manually verify the signatures by downloading the zipsign public key and running the zipsign command line tool. See Verifying the Integrity of the Prebuilt Binaries Zip Archive for more info. 🦄
- The frequency command now supports the --ignore-case option for case-insensitive frequency counts. 🦄🎠
- The schema command can now compile case-insensitive enum constraints. 🦄
- Improved performance for the apply and applydp commands with faster compile-time perfect hash functions for operations lookups. 🏇
- Several minor performance improvements and bug fixes for the snappy, sniff & cat commands. 🏇
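Each Geonames index includes every city at or above its population threshold, so the four indices are nested: the 500 index is a superset of the 1000 index, and so on. The small Python sketch below uses the thresholds from the notes to show which indices would contain a city of a given population. The function name is hypothetical, and this is not how qsv builds or selects its indices.

```python
from typing import List

# Population thresholds of the published Geonames cities indices.
THRESHOLDS = (500, 1000, 5000, 15000)


def indices_containing(population: int) -> List[int]:
    """Return the thresholds of the indices that would include a city of this population."""
    return [t for t in THRESHOLDS if population >= t]


print(indices_containing(3000))  # [500, 1000]
```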
Added
- frequency: added --ignore-case option https://github.com/jqnatividad/qsv/pull/1386
- geocode: added 500, 1000, 5000, 15000 Geonames cities convenience shortcuts to index subcommands https://github.com/jqnatividad/qsv/commit/bd9f4c34b0a88cc6a446872ed4cda41e8a1ca102
- schema: added --ignore-case option when compiling enum constraints; replaced HashSet with faster AHashSet https://github.com/jqnatividad/qsv/commit/a16a1ca25f93699a5ee27327f4257e8e559bc5e8
- snappy: added buf_size parameter to compress helper fn https://github.com/jqnatividad/qsv/commit/e0c0d1f7eb22917d43f638121babe23e366c9dd8
- sniff: added --just-mime option https://github.com/jqnatividad/qsv/pull/1372
- added zipsign signature verification to self-update https://github.com/jqnatividad/qsv/pull/1389
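The sketch below shows case-insensitive frequency counting in Python. It assumes values are folded with str.casefold() before counting. qsv's actual folding rule, and whether it reports the original or the folded spelling, may differ. The function name frequency is hypothetical.

```python
from collections import Counter
from typing import List


def frequency(values: List[str], ignore_case: bool = False) -> Counter:
    """Count how often each value occurs, optionally folding case first."""
    if ignore_case:
        # casefold() is a stronger lower() intended for caseless matching.
        values = [v.casefold() for v in values]
    return Counter(values)


states = ["NY", "ny", "Ny", "CA"]
print(len(frequency(states)))                              # 4
print(frequency(states, ignore_case=True).most_common(1))  # [('ny', 3)]
```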
Changed
- apply & applydp: replaced binary_search with faster compile-time perfect hash functions for operations lookups https://github.com/jqnatividad/qsv/pull/1371
- stats, schema and tojsonl: stats cache bincode refactor https://github.com/jqnatividad/qsv/pull/1377
- luau: replaced sanitise-file-name with the more popular sanitize-filename crate https://github.com/jqnatividad/qsv/commit/8927cb70bc92e9e1360547e96d1ac10e6037e9e3
- cat: minor optimization by preallocating with capacity https://github.com/jqnatividad/qsv/commit/c13c34120c47bb7ab603a97a0a7cae7f0de7b146
- sqlp & joinp: expanded speed/functionality with upgrade to Polars 0.34.2 https://github.com/jqnatividad/qsv/pull/1385
- tojsonl: improved boolean inferencing. It now correctly infers booleans even if a column has more than 2 distinct values, as long as its cardinality is 2 when compared case-insensitively https://github.com/jqnatividad/qsv/commit/6345f2dc01f6451075ba7f23c35d8ba8cced9293
- build(deps): bump strum_macros from 0.25.2 to 0.25.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1368
- build(deps): bump regex from 1.10.1 to 1.10.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1369
- build(deps): bump uuid from 1.4.1 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1373
- build(deps): bump hashbrown from 0.14.1 to 0.14.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1376
- build(deps): bump self_update from 0.38.0 to 0.39.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1378
- build(deps): bump ahash from 0.8.5 to 0.8.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1383
- build(deps): bump serde from 1.0.189 to 1.0.190 by @dependabot in https://github.com/jqnatividad/qsv/pull/1388
- build(deps): bump futures from 0.3.28 to 0.3.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1390
- build(deps): bump futures-util from 0.3.28 to 0.3.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1391
- build(deps): bump tempfile from 3.8.0 to 3.8.1 by @dependabot in https://github.com/jqnatividad/qsv/commit/4f6200cb57fdeb612aeb74d796b4b0c1fde7c243
- apply select clippy suggestions
- update several indirect dependencies
- pin Rust nightly to 2023-10-26
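The tojsonl boolean-inferencing change can be read as follows: a column qualifies as boolean if, after folding case, it has exactly two distinct values that form a recognized true/false pair. The Python sketch below works under that assumption. The accepted pairs are illustrative, not qsv's actual list. Blank values are skipped, which is also an assumption. looks_boolean is hypothetical.

```python
from typing import List

# Illustrative true/false pairs only; qsv's actual accepted pairs may differ.
BOOLEAN_PAIRS = [{"true", "false"}, {"yes", "no"}, {"y", "n"}, {"t", "f"}, {"1", "0"}]


def looks_boolean(values: List[str]) -> bool:
    """True if the case-folded distinct non-blank values form one known pair."""
    distinct = {v.strip().casefold() for v in values if v.strip()}
    return len(distinct) == 2 and distinct in BOOLEAN_PAIRS


# Four distinct raw values, but cardinality 2 once case is ignored.
print(looks_boolean(["True", "FALSE", "true", "false"]))  # True
```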
Fixed
- dedup: fixed --ignore-case not being honored during the internal sort https://github.com/jqnatividad/qsv/pull/1387
- applydp: fixed wrong usage text that referred to apply instead of applydp https://github.com/jqnatividad/qsv/commit/c47ba86f305508a41e19ce39f2bd6323a0a60e1e
- geocode: fixed index-update not honoring the --timeout parameter https://github.com/jqnatividad/qsv/commit/3272a9e3ac75e8b8f2d9f13b0cec81a0c41c7ed4
- geocode: fixed index-load to work properly with convenience shortcuts https://github.com/jqnatividad/qsv/commit/5097326ee41d39787b472b4eea95ddec76bb06b5
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.117.0...0.118.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.117.0
Highlights:
- geocode: added Federal Information Processing Standards (FIPS) codes to results for US places, so we can derive GEOIDs. This paves the way for data enrichment lookups (starting with the US Census) in an upcoming release.
- Added Goals/Non-goals, explicitly codifying what qsv is and isn't, and what we're trying to achieve with the toolkit.
- excel: CSV output processing is now multi-threaded, making it a bit faster. The bottleneck is still the Excel/ODS library we're using (calamine), which is single-threaded, but there are active discussions underway to make it much faster in the future.
- Upgrading the MSRV to 1.73.0 has allowed us to use LLVM 17, which has resulted in an overall performance boost.
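With separate state and county FIPS codes in hand, a county GEOID follows the Census Bureau convention: concatenate the 2-digit state code and the 3-digit county code. For example, 06 + 037 = 06037 for Los Angeles County, California. The Python sketch below assumes the two codes arrive as separate values, as the FIPS results above suggest. county_geoid is a hypothetical helper, and qsv's own GEOID derivation is not described in these notes.

```python
from typing import Union


def county_geoid(state_fips: Union[int, str], county_fips: Union[int, str]) -> str:
    """Build a 5-digit county GEOID: 2-digit state FIPS + 3-digit county FIPS.

    Codes are zero-padded because leading zeros are significant ("06", not "6").
    """
    state, county = int(state_fips), int(county_fips)
    if not (0 <= state <= 99 and 0 <= county <= 999):
        raise ValueError("state FIPS must fit in 2 digits and county FIPS in 3")
    return f"{state:02d}{county:03d}"


print(county_geoid(6, 37))  # 06037
```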
Added:
- geocode: added Federal Information Processing Standards (FIPS) codes to results for US places.
- Added Goals/Non-goals to README.md
Changed
- cat: minor optimization https://github.com/jqnatividad/qsv/commit/343bb668ae84fcf862883245382e7d8015da88c2
- excel: CSV output processing is now multi-threaded https://github.com/jqnatividad/qsv/pull/1360
- geocode: more efficient dynfmt processing https://github.com/jqnatividad/qsv/pull/1367
- frequency: optimize allocations before hot loop https://github.com/jqnatividad/qsv/commit/655bebcdec6d89f0ffa33d794069ee5eee0df3e5
- luau: upgraded embedded Luau from 0.596 to 0.599
- deps: bump calamine from 0.22.0 to 0.22.1 https://github.com/jqnatividad/qsv/commit/4c4ed7e25614bbfe4d7b16fe7619a5a874ef7591
- docs: reorganized README, moving FEATURES and INTERPRETERS to their own markdown files.
- build(deps): bump byteorder from 1.4.3 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1347
- build(deps): bump tokio from 1.32.0 to 1.33.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1354
- build(deps): bump regex from 1.9.6 to 1.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1356
- build(deps): bump semver from 1.0.19 to 1.0.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/1358
- build(deps): bump pyo3 from 0.19.2 to 0.20.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1359
- build(deps): bump serde from 1.0.188 to 1.0.189 by @dependabot in https://github.com/jqnatividad/qsv/pull/1361
- build(deps): bump flate2 from 1.0.27 to 1.0.28 by @dependabot in https://github.com/jqnatividad/qsv/pull/1363
- build(deps): bump regex from 1.10.0 to 1.10.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1366
- deps: update several indirect dependencies
- pin Rust nightly to 2023-10-14
- bump MSRV to 1.73.0
Removed
- excel: removed the --progressbar option, as the Excel/ODS maximum sheet size (1,048,576 rows) is just too small to make it useful.
Fixed
- Fixed Jupyter Notebook Viewer Link by @a5dur in https://github.com/jqnatividad/qsv/pull/1349
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.116.0...0.117.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.116.0
Highlights: :tada: :rocket:
- Benchmark refinements galore, with more benchmarks and more comprehensive benchmarking instructions. 🎠
- geocode: The Geonames index's configuration metadata is now available with the geocode index-check subcommand, so there's no need to maintain a separate metadata JSON file. This should make it even easier to maintain multiple Geonames index files with different configurations without having to worry whether you're looking at the right metadata JSON file. 🎠
- cat: the rowskey subcommand is now 27% faster 🏇🏽
- tojsonl: parallelized with rayon, making it 33% faster! 🏇🏽
- smaller qsv binary size and faster compile times if the to_parquet feature is disabled. If sqlp's ability to create a parquet file from a SQL query is good enough for you, qsv's binary will be markedly smaller and compile markedly faster. 🏇🏽
- minor perf tweaks & optimizations for the count and luau commands 🏇🏽
Added
- geocode: added Geonames index file metadata to the index-check subcommand
- tojsonl: parallelized with rayon https://github.com/jqnatividad/qsv/pull/1338
- to: added to_parquet feature https://github.com/jqnatividad/qsv/pull/1341
- benchmarks: upgraded from 3.0.0 to 3.3.1
  - you can now specify a separate benchmarking binary, as we dogfood qsv for the benchmarks and some of the features it requires may not be in the qsv binary variant being benchmarked
  - added additional count benchmarks with the --width option
  - added additional luau benchmarks with single/multi filter options
  - added additional search benchmark with the --unicode option
  - show the absolute path of the qsv binaries used (both the one we're dogfooding and the one being benchmarked) and their version info before running the benchmarks proper
  - ensured the schema benchmark does not use the stats cache by passing the --force option
Changed
- cat: use an empty byte_record var instead of repeatedly allocating a new one in a hot loop https://github.com/jqnatividad/qsv/commit/eddafd11acb8e8d9d8587f952ba8cd02d450b08e
- count: minor optimization https://github.com/jqnatividad/qsv/commit/bb113c0f348d4903ebfdc893c09517e5a4b145ad
- luau: minor perf tweaks https://github.com/jqnatividad/qsv/commit/c71cd16a22f729a074a2a8d59020eba4cc8d7281 and https://github.com/jqnatividad/qsv/commit/f9c1e3c755fdb847be8f7f54d21622fb0c8c747f
- (deps): bump Geosuggest from 0.4.5 to 0.5.1 https://github.com/jqnatividad/qsv/pull/1333
- (deps): use patched version of calamine which has unreleased fixes since 0.22.0
- build(deps): bump flexi_logger from 0.27.0 to 0.27.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1328
- build(deps): bump indexmap from 2.0.0 to 2.0.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1329
- build(deps): bump hashbrown from 0.14.0 to 0.14.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1334
- build(deps): bump file-format from 0.20.0 to 0.21.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1335
- build(deps): bump indexmap from 2.0.1 to 2.0.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1336
- build(deps): bump regex from 1.9.5 to 1.9.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1337
- build(deps): bump jql-runner from 7.0.3 to 7.0.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1340
- build(deps): bump csvs_convert from 0.8.7 to 0.8.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1339
- build(deps): bump actions/setup-python from 4.7.0 to 4.7.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1342
- build(deps): bump reqwest from 0.11.21 to 0.11.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1343
- build(deps): bump csv from 1.2.2 to 1.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1344
- build(deps): bump actix-governor from 0.4.1 to 0.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1346
- applied select clippy suggestions
- update several indirect dependencies
- pin Rust nightly to 2023-10-04
Removed
- geocode: removed separate metadata JSON file for Geonames index files. The metadata is now embedded in the index file itself and can be viewed with the index-check subcommand.
- removed redundant setting from profile.release-samply in Cargo.toml https://github.com/jqnatividad/qsv/commit/2a35be5bbae2fc6994c103acac37ea3559854a0a
Fixed
- geocode: the now subcommands (suggestnow, reversenow, countryinfonow) now produce valid JSON output. We previously generated JSON with escaped/extra quotes, because it was formatted for inclusion in CSV files. That escaping is required for the suggest, reverse and countryinfo subcommands, which are designed to process CSVs with multiple rows. The now subcommands only return one result, so there's no need to escape the JSON. https://github.com/jqnatividad/qsv/pull/1345
- schema: fixed --force flag not being honored
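The reason for the escaping is that a JSON value embedded in a CSV cell has to be wrapped in quotes, with each inner double quote doubled (the RFC 4180 convention). That is no longer valid JSON on its own. The Python sketch below shows the effect using the standard csv and json modules. The {"name": ...} object and json_as_csv_field are hypothetical and do not reflect geocode's actual output fields.

```python
import csv
import io
import json


def json_as_csv_field(obj: dict) -> str:
    """Render a JSON object as one CSV field, as a multi-row subcommand must."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([json.dumps(obj)])
    return buf.getvalue().rstrip("\n")


print(json_as_csv_field({"name": "Bologna"}))  # "{""name"": ""Bologna""}"
```

A CSV reader undoes the quoting, which is why the escaped form is right inside a CSV column and wrong for a single JSON result.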
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.115.0...0.116.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.115.0
We continue to refine the benchmark suite, and have added a new setup argument to set up and install the required tools for the benchmark suite. We've also added more comprehensive checks to ensure that the required tools are installed before running the benchmarks. 🎠
For geocode, we've added a JSON file describing the Geonames index file configuration. This should help users maintain several Geonames index files with different configurations. 🎠
geocode should also be a tad faster now, thanks to the cached crate making ahash its default hashing algorithm and to the hashbrown upgrade - microbenchmarks show a 33% performance improvement. 🏇🏽
We also added a release-samply profile to make it easier to squeeze more performance out of the toolkit with samply. 🏇🏽
Added
- geocode: added a JSON file describing the Geonames index file configuration in https://github.com/jqnatividad/qsv/pull/1324
- benchmarks: v3.0.0 release
  - added setup argument to set up and install required tools for the benchmark suite
  - added more comprehensive required tools check
  - added more realistic luau benchmarks, using helper luau scripts (dtformat.luau and turnaroundtime.luau)
  - added stats withcache and createcache benchmarks
  - added benchmark_aggregations.luau script for benchmark analysis
  - added binary, total_mean and qsv_env columns to benchmark results
    - binary is the qsv binary variant used
    - total_mean is the sum of all the mean run times of the benchmarks
    - qsv_env are the qsv-relevant environment variables active while running the benchmarks
  - expanded README.md and benchmark suite usage instructions
- added release-samply profile to Cargo.toml to facilitate continued performance optimization with samply
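The total_mean column is described as the sum of the mean run times of all benchmarks. The Python sketch below shows that arithmetic, assuming the individual run times of each benchmark are available. The benchmark names in the usage line are made up, and this is not the benchmark script's own code.

```python
from statistics import fmean
from typing import Dict, List


def total_mean(runs_by_benchmark: Dict[str, List[float]]) -> float:
    """Average each benchmark's run times (seconds), then sum those means."""
    return sum(fmean(runs) for runs in runs_by_benchmark.values())


print(total_mean({"count": [1.0, 3.0], "stats": [0.5]}))  # 2.5
```

A single summed figure is easy to compare across runs of the suite. The trade-off is that it weights long-running benchmarks more heavily than short ones.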
Changed
- readme: move tab completion instructions/script to scripts/misc
- geocode: updated bundled Geonames index to 2021-09-25
- bump embedded luau from 0.594 to 0.596
- build(deps): bump flexi_logger from 0.26.1 to 0.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1317
- build(deps): bump indicatif from 0.17.6 to 0.17.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1318
- build(deps): bump semver from 1.0.18 to 1.0.19 by @dependabot in https://github.com/jqnatividad/qsv/pull/1320
- build(deps): bump cached from 0.45.1 to 0.46.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1322
- build(deps): bump geosuggest-core from 0.4.3 to 0.4.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1323
- build(deps): bump geosuggest-utils from 0.4.3 to 0.4.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1321
- build(deps): bump fastrand from 2.0.0 to 2.0.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1325
- bump MSRV from Rust 1.72.0 to 1.72.1
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-09-25
Fixed
benchmarks: fixed invalid luau benchmark that had invalid luau command
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.114.0...0.115.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.114.0
The long-overdue Benchmarks revamp is finally here! 🎉 https://qsv.dathere.com/benchmarks
The benchmarks have been completely rewritten to be more reproducible, and now use hyperfine instead of time. The new benchmarks are now run as part of the release process, and the results are compiled into a single page that is published on the new Quicksilver website.
The new benchmarks are also more comprehensive, and designed to be run on a variety of hardware and operating systems. This allows users to adapt the benchmarks to their own workloads and environments.
Other release highlights include:
* geocode is now fully-featured and ready for production use! 🎉 Though it currently only supports Geonames city-level lookups, it provides a solid foundation on top of which we'll add more geocoding providers in the future (next up - OpenCage support with street-level geocoding).
* Polars has been bumped from 0.32.1 to 0.33.2, which includes a number of performance improvements for the sqlp and joinp commands.
* major performance increase on several regex/aho-corasick powered commands on Apple Silicon thanks to various under-the-hood improvements in the aho-corasick crate.
Big thanks to @rzmk, @a5dur, @minhajuddin2510 and @samibaig for helping me finally push out the revamped Benchmarks!
Added
- Added autoindex size threshold, replacing the QSV_AUTOINDEX env var with QSV_AUTOINDEX_SIZE. Resolves #1300 in https://github.com/jqnatividad/qsv/pull/1301 https://github.com/jqnatividad/qsv/commit/69e25aceb25d3bb20d8fdeeadf5504d8fe75fe37
- diff: Added test for different delimiters by @janriemer in https://github.com/jqnatividad/qsv/pull/1297
- benchmarks: Added qsv benchmark notebook by @a5dur in https://github.com/jqnatividad/qsv/pull/1309
- geocode: Added countryinfo/countryinfonow subcommands made available in geosuggest 0.4.3 https://github.com/jqnatividad/qsv/pull/1311
- geocode: Added --language option so users can specify the language of the geocoding results. This requires running the index-update subcommand with the --languages option to rebuild the index with the desired languages.
- sqlp: add example of using columns with embedded spaces in SQL queries https://github.com/jqnatividad/qsv/commit/f7bf4f65edc2068f42712808aec7096ef7122dfe
Changed
- benchmarks: Benchmarks revamped https://github.com/jqnatividad/qsv/pull/1298, https://github.com/jqnatividad/qsv/pull/1310 https://github.com/jqnatividad/qsv/commit/d8eeb949b8c846793941eb9c343e8598784b6207
- build(deps): bump serde_json from 1.0.106 to 1.0.107 by @dependabot in https://github.com/jqnatividad/qsv/pull/1302
- build(deps): bump mimalloc from 0.1.38 to 0.1.39 by @dependabot in https://github.com/jqnatividad/qsv/pull/1303
- build(deps): bump simple-home-dir from 0.1.4 to 0.2.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1304
- build(deps): bump chrono from 0.4.30 to 0.4.31 by @dependabot in https://github.com/jqnatividad/qsv/pull/1305
- (deps): bump Polars from 0.32.1 to Polars 0.33.2 https://github.com/jqnatividad/qsv/pull/1308
- build(deps): bump cpc from 1.9.2 to 1.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1313
- build(deps): bump rayon from 1.7.0 to 1.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1315
- (deps): update several indirect dependencies
- pin Rust nightly to 2023-09-21
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.113.0...0.114.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.113.0
This is the first "Unicorn" 🦄 release, adding MAJOR new features to the toolkit!
- geocode: adds high-speed, cache-backed, multi-threaded geocoding using a local, updateable copy of the GeoNames database. This is a major improvement over the previous geocode subcommand in the apply command, thanks to the wonderful geosuggest crate.
- guaranteed non-UTF8 input detection with the validate and input commands. Quicksilver REQUIRES UTF-8 encoded input. You can now use these commands to ensure you have valid UTF-8 input before using the rest of the toolkit.
- New/expanded whirlwind tour & quick-start notebooks by @a5dur and @rzmk 🎠
- Various performance improvements all-around: 🏇🏽
  - overall increase of ~5% now that mimalloc, the default allocator for qsv, is built without secure mode unnecessarily enabled.
  - the flatten command is now ~10% faster
  - faster regex performance thanks to various under-the-hood improvements in the regex crate
  - the benchmark scripts have been updated by @minhajuddin2510 to use hyperfine instead of time, and to use the same input file for all benchmarks to make them more reproducible. In upcoming releases, we'll start compiling the benchmark results into a single page as part of the release process, so we can track our progress over time.
and last but not least - Quicksilver now has a website! - https://qsv.dathere.com/ :unicorn: :tada: :rocket:
And it's not just a static site with a few links - it's a full-blown web app that lets you try out qsv commands in your browser! It's not just a demo site, either - you can use it as a configurator, save your commands to a gist and share them with others!
It's the first Beta release of the Quicksilver website, so there's still a lot of work to do, but we're excited to share it with you and get your feedback!
We have more exciting features planned for Quicksilver and the website, but we require your help to make it happen! For qsv, use GitHub issues. For the website, use the feedback form. And if you want to help out, please check out the contributing guide.
Big thanks to @rzmk for all the work on the website! To @a5dur for all the QA work on this release! And to @minhajuddin2510 for revamping the benchmark script!
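For the guaranteed non-UTF8 input detection described above, validate and input are the tools to use. The check underneath is simple: a strict UTF-8 decode either succeeds or fails at the first invalid byte sequence. The Python sketch below shows that check. It is not qsv's Rust implementation, and first_invalid_utf8_offset is hypothetical. The usage line uses a Latin-1 encoded string as a typical non-UTF-8 input.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")  # strict mode: raises on the first bad sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


print(first_invalid_utf8_offset("Bogotá".encode("latin-1")))  # 5
```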
Added
- geocode: new high-speed geocoding command https://github.com/jqnatividad/qsv/pull/1231
  - major improvements using geosuggest upstream https://github.com/jqnatividad/qsv/pull/1269
  - add suggest --country filter https://github.com/jqnatividad/qsv/pull/1275
  - add --admin1 filter https://github.com/jqnatividad/qsv/pull/1276
  - automatic --country inferencing from --admin1 code https://github.com/jqnatividad/qsv/pull/1277
  - add --suggestnow and --reversenow subcommands https://github.com/jqnatividad/qsv/pull/1280
  - add "%dyncols:" special formatter to dynamically add geocoded columns to the output CSV https://github.com/jqnatividad/qsv/pull/1286
- excel: add SheetType (Worksheet, DialogSheet, MacroSheet, ChartSheet, VBA) in metadata mode; log.info! headers; wordsmith comments https://github.com/jqnatividad/qsv/pull/1225
- excel: moar metadata! moar examples! https://github.com/jqnatividad/qsv/pull/1271
- add support for ALL_PROXY env var https://github.com/jqnatividad/qsv/pull/1233
- input: add --encoding-errors handling option https://github.com/jqnatividad/qsv/pull/1235
- fixlengths: add --insert option https://github.com/jqnatividad/qsv/pull/1247
- joinp: add --sql-filter option https://github.com/jqnatividad/qsv/pull/1287
- luau: we now embed Luau 0.594 (up from 0.592)
- notebooks: add qsv-colab-quickstart by @rzmk in https://github.com/jqnatividad/qsv/pull/1253
- notebooks: Added Whirlwindtour.ipynb by @a5dur in https://github.com/jqnatividad/qsv/pull/1223
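Geonames identifies first-level administrative divisions with codes of the form CC.XX. Here CC is the two-letter country code (for example, US.NY), so a country can be read straight off the admin1 code. The Python sketch below shows that inference, assuming --admin1 values arrive in this dotted form. Inputs in any other form are not handled. infer_country is hypothetical and not qsv's implementation.

```python
from typing import Optional


def infer_country(admin1_code: str) -> Optional[str]:
    """Infer the 2-letter country code from a Geonames-style admin1 code like "US.NY"."""
    country, sep, _division = admin1_code.partition(".")
    if not sep or len(country) != 2 or not country.isalpha():
        return None  # not in the assumed CC.XX form
    return country.upper()


print(infer_country("US.NY"))  # US
```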
Changed
- flatten: refactor for performance https://github.com/jqnatividad/qsv/pull/1227
- validate: improved utf8 error messages https://github.com/jqnatividad/qsv/pull/1256
- apply & applydp: improve usage text in relation to multi-column capabilities https://github.com/jqnatividad/qsv/pull/1257
- qsv-cache now set to ~/.qsv-cache by default https://github.com/jqnatividad/qsv/pull/1265
- Download file helper refactor https://github.com/jqnatividad/qsv/pull/1267
- Benchmark Update by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1237
- Improved error handling https://github.com/jqnatividad/qsv/pull/1238
- Improved error handling - incorrect usage errors are now differentiated from other errors as well https://github.com/jqnatividad/qsv/pull/1239
- build(deps): bump whatlang from 0.16.2 to 0.16.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1221
- build(deps): bump serde_json from 1.0.104 to 1.0.105 by @dependabot in https://github.com/jqnatividad/qsv/pull/1220
- build(deps): bump tokio from 1.31.0 to 1.32.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1222
- build(deps): bump mlua from 0.9.0-rc.3 to 0.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1224
- build(deps): bump tempfile from 3.7.1 to 3.8.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1226
- build(deps): bump postgres from 0.19.5 to 0.19.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1229
- build(deps): bump file-format from 0.18.0 to 0.19.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1228
- build(deps): bump reqwest from 0.11.18 to 0.11.19 by @dependabot in https://github.com/jqnatividad/qsv/pull/1232
- build(deps): bump rustls-webpki from 0.101.3 to 0.101.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1236
- build(deps): bump reqwest from 0.11.19 to 0.11.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/1241
- build(deps): bump rust_decimal from 1.31.0 to 1.32.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1242
- build(deps): bump serde from 1.0.185 to 1.0.186 by @dependabot in https://github.com/jqnatividad/qsv/pull/1243
- build(deps): bump jql-runner from 7.0.2 to 7.0.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1246
- build(deps): bump grex from 1.4.2 to 1.4.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1245
- build(deps): bump mlua from 0.9.0 to 0.9.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1244
- build(deps): bump mimalloc from 0.1.37 to 0.1.38 by @dependabot in https://github.com/jqnatividad/qsv/pull/1249
- build(deps): bump postgres from 0.19.6 to 0.19.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1251
- build(deps): bump serde from 1.0.186 to 1.0.187 by @dependabot in https://github.com/jqnatividad/qsv/pull/1250
- build(deps): bump serde from 1.0.187 to 1.0.188 by @dependabot in https://github.com/jqnatividad/qsv/pull/1252
- build(deps): bump regex from 1.9.3 to 1.9.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1254
- build(deps): bump url from 2.4.0 to 2.4.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1261
- build(deps): bump tabwriter from 1.2.1 to 1.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1259
- build(deps): bump sysinfo from 0.29.8 to 0.29.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1260
- build(deps): bump actix-web from 4.3.1 to 4.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1262
- build(deps): bump chrono from 0.4.26 to 0.4.27 by @dependabot in https://github.com/jqnatividad/qsv/pull/1264
- build(deps): bump chrono from 0.4.27 to 0.4.28 by @dependabot in https://github.com/jqnatividad/qsv/pull/1266
- build(deps): bump redis from 0.23.2 to 0.23.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1268
- build(deps): bump regex from 1.9.4 to 1.9.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1272
- build(deps): bump flexi_logger from 0.25.6 to 0.26.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1273
- build(deps): bump geosuggest-core from 0.4.0 to 0.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1279
- build(deps): bump geosuggest-utils from 0.4.0 to 0.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1278
- build(deps): bump cached from 0.44.0 to 0.45.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1282
- build(deps): bump self_update from 0.37.0 to 0.38.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1281
- build(deps): bump actions/checkout from 3 to 4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1283
- build(deps): bump chrono from 0.4.28 to 0.4.29 by @dependabot in https://github.com/jqnatividad/qsv/pull/1284
- build(deps): bump cached from 0.45.0 to 0.45.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1285
- build(deps): bump sysinfo from 0.29.9 to 0.29.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1288
- build(deps): bump chrono from 0.4.29 to 0.4.30 by @dependabot in https://github.com/jqnatividad/qsv/pull/1290
- build(deps): bump bytes from 1.4.0 to 1.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1289
- build(deps): bump file-format from 0.19.0 to 0.20.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1291
- cargo update bump several indirect dependencies
- apply select clippy suggestions
- pin Rust nightly to 2023-09-06
Removed
- apply: remove geocode subcmd now that we have a dedicated geocode command https://github.com/jqnatividad/qsv/pull/1263
Fixed
- excel: we can now open workbooks with formulas set to an empty string value https://github.com/jqnatividad/qsv/pull/1274
- notebooks: fix qsv colab quickstart link by @rzmk in https://github.com/jqnatividad/qsv/pull/1255
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.112.0...0.113.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.112.0
This is the second in a series of "Giddy-up" 🏇🏽 releases, improving the performance of the following commands:
- stats: by refactoring the code to detect empty cells more efficiently, and by removing unnecessary bounds checks in the main compute loop (~10% performance improvement).
- sample: by refactoring the code to use an index more effectively when available, which not only makes it faster but also eliminates the need to load the entire dataset into memory. Also added a --faster option to use a faster random number generator (~15% performance improvement).
- frequency, schema, search & validate: by amortizing/reducing allocations in hot loops.
- excel: by refactoring the main hot loop to convert Excel cells more efficiently.
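"Amortizing allocations in hot loops" generally means allocating a working buffer once and reusing it on every iteration, instead of creating a fresh String or Vec per record. A minimal sketch of the idea, assuming a hypothetical count_matching_fields function over in-memory rows; it is not qsv's actual code:

```rust
/// Counts fields that equal `needle` after trimming and ASCII-lowercasing.
/// The scratch `buf` is allocated once and reused for every field, so once it
/// has grown to fit the longest field, the hot loop performs no allocations.
fn count_matching_fields(rows: &[Vec<&str>], needle: &str) -> usize {
    let mut buf = String::new(); // allocated once, outside the loop
    let mut count = 0;
    for row in rows {
        for field in row {
            buf.clear(); // drops the contents but keeps the capacity
            buf.push_str(field.trim());
            buf.make_ascii_lowercase();
            if buf == needle {
                count += 1;
            }
        }
    }
    count
}
```

The naive alternative, `field.trim().to_ascii_lowercase() == needle`, allocates a new String for every field; on files with millions of cells that per-field allocation is often what dominates the loop.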
The prebuilt binaries are also built with CPU optimizations enabled for x86_64 and Apple Silicon (arm64) architectures.
0.112.0 is also a "Carousel" (i.e. increased usability) 🎠 release featuring new Jupyter notebooks in the contrib/notebooks directory to help users get started with qsv.
- intro-to-count.ipynb by @rzmk
- qsv-describegpt-qa.ipynb by @a5dur
Added
- sqlp: added CASE expression support with Polars 0.32 https://github.com/jqnatividad/qsv/commit/9d508e69cc4165b7adbe4b44b15c4c07001cf76b
- sample: added --faster option to use a faster random number generator https://github.com/jqnatividad/qsv/pull/1210
- jsonl: added --delimiter option https://github.com/jqnatividad/qsv/pull/1205
- excel: added --delimiter option https://github.com/jqnatividad/qsv/commit/ab73067da1f498c7c64de9b87586d6998d36d042
- notebook/describegpt: added describegpt QA Jupyter notebook by @a5dur in https://github.com/jqnatividad/qsv/pull/1215
- notebook/count: added intro-to-count.ipynb by @rzmk in https://github.com/jqnatividad/qsv/pull/1207
Changed
- stats: refactor hot compute function https://github.com/jqnatividad/qsv/commit/35999c5dad996edcafe6094ff4b717f96d657832
- stats: faster detection of empty samples https://github.com/jqnatividad/qsv/commit/b0548159ca8c8a35bab1dd196c72414f739c2fd8 and https://github.com/jqnatividad/qsv/commit/a7f0836bcebf947efb3cc7e7f6a884cc649196b5
- sample: major refactor making it faster, but also eliminating the need to load the entire dataset into memory when an index is available. https://github.com/jqnatividad/qsv/pull/1210
- frequency: refactor primary ftables function https://github.com/jqnatividad/qsv/commit/57d660d6cf48be4b8845b5c09a46b16582f612c0
- excel: refactor main loop for more performance https://github.com/jqnatividad/qsv/commit/61f227b0120c8d20bfb5906536a0a0de7d9f82ad
- rustfmt: match_block_trailing_comma https://github.com/jqnatividad/qsv/pull/1206
- bump MSRV to 1.71.1 https://github.com/jqnatividad/qsv/commit/1c993644992d1cf4d0985d100045821cb027c17d
- apply clippy suggestions https://github.com/jqnatividad/qsv/pull/1209
- build(deps): bump tokio from 1.29.1 to 1.30.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1204
- build(deps): bump log from 0.4.19 to 0.4.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/1211
- build(deps): bump redis from 0.23.1 to 0.23.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1213
- build(deps): bump tokio from 1.30.0 to 1.31.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1212
- build(deps): bump sysinfo from 0.29.7 to 0.29.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1214
- upgrade to Polars 0.32.0 https://github.com/jqnatividad/qsv/pull/1217
- build(deps): bump flate2 from 1.0.26 to 1.0.27 by @dependabot in https://github.com/jqnatividad/qsv/pull/1218
- build(deps): bump polars from 0.32.0 to 0.32.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1219
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-08-13
Removed
stats: removed Debug derives from structs - https://github.com/jqnatividad/qsv/commit/2def136230ed2e9af727168d3a6329d660b65d4d
Fixed
notebook/count: fix Google Colab link by @rzmk in https://github.com/jqnatividad/qsv/pull/1208
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.111.0...0.112.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.111.0
This is the first in a series of "Giddy-up" 🏇🏽 releases.
As Quicksilver matures, we will continue to tweak it in our goal to be the 🚀 fastest general purpose CSV data-wrangling CLI toolkit available.
"Giddy-up" 🏇🏽 releases increase performance by:
- taking advantage of new Rust features as they become available
- using new libraries that are faster than the ones we currently use
- optimizing our code to take advantage of new features in the libraries we use
- using new algorithms that are faster than the ones we currently use
- taking advantage of more hardware features (SIMD, multi-core, etc.)
- adding reproducible benchmarks that are automatically updated on release to track our progress
As it is, Quicksilver has an aggressive release tempo - with more than 160 releases since its initial release in December 2020. This was made possible by the solid foundation of Rust and the xsv project from which qsv was forked. We will continue to build on this foundation by adding more CI tests and starting to track code coverage so we can continue to iterate aggressively with confidence.
Apart from "giddy-up" releases, Quicksilver will also have "carousel" 🎠 releases that will focus on making the toolkit more accessible to non-technical users.
"Carousel" 🎠 releases will include:
- more documentation
- more examples
- more tutorials
- more recipes in the Cookbook
- multiple GUI wrappers around the CLI
- integrations with common desktop tools like Excel, Google Sheets, Open Office, etc.
- tighter integration with the CKAN ecosystem, with a focus on helping data publishers & data coordinators maintain a high quality data/metadata catalog
Hopefully, this will make qsv more accessible to non-technical users, and help them get more value out of their data. Special attention will be given to "open data" use cases, enabling non-profits, governments and regular citizens to tap raw open data and convert it to actionable insight, making open data useful, usable and used.
Every now and then, we'll also have "Unicorn" 🦄 releases that will add MAJOR new features to the toolkit (e.g. 10x type features like the integration of Pola.rs into qsv).
We will also add a new Technical Documentation section to the wiki to document qsv's architecture and how each command works. The hope is that doing so will lower the barrier to contributions and help us grow the community of qsv contributors.
Added
- sort: add --faster option https://github.com/jqnatividad/qsv/pull/1190
- describegpt: add -Q, --quiet option by @rzmk in https://github.com/jqnatividad/qsv/pull/1179
Changed
- stats: refactor init_date_inference https://github.com/jqnatividad/qsv/pull/1187
- join: cache has_headers result in hot loop https://github.com/jqnatividad/qsv/commit/e53edafdc91493c61e9889c8004177f147483a45
- search & searchset: amortize allocs https://github.com/jqnatividad/qsv/pull/1188
- stats: use fast-float to convert string to float https://github.com/jqnatividad/qsv/pull/1191
- sqlp: more examples, apply clippy::needless_borrow lint https://github.com/jqnatividad/qsv/commit/ff37a041da246101db03c51d22b498127a5d7ba7 and https://github.com/jqnatividad/qsv/commit/b8e1f7784cc6906745cdd43b61194e897a3666c4
- use fast-float project-wide (apply, applydp, schema, sort, validate) https://github.com/jqnatividad/qsv/pull/1192
- fine-tune publishing workflows to enable universally available CPU features https://github.com/jqnatividad/qsv/commit/a1dccc74b480477acaa17e21dde706c159c56b48
- build(deps): bump serde from 1.0.179 to 1.0.180 by @dependabot in https://github.com/jqnatividad/qsv/pull/1176
- build(deps): bump pyo3 from 0.19.1 to 0.19.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1177
- build(deps): bump qsv-dateparser from 0.9.0 to 0.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1178
- build(deps): bump qsv-sniffer from 0.9.4 to 0.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1180
- build(deps): bump indicatif from 0.17.5 to 0.17.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1182
- Bump to qsv stats 0.11 https://github.com/jqnatividad/qsv/pull/1184
- build(deps): bump serde from 1.0.180 to 1.0.181 by @dependabot in https://github.com/jqnatividad/qsv/pull/1185
- build(deps): bump qsv_docopt from 1.3.0 to 1.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1186
- build(deps): bump filetime from 0.2.21 to 0.2.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1193
- build(deps): bump regex from 1.9.1 to 1.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1194
- build(deps): bump regex from 1.9.2 to 1.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1195
- build(deps): bump serde from 1.0.181 to 1.0.182 by @dependabot in https://github.com/jqnatividad/qsv/pull/1196
- build(deps): bump tempfile from 3.7.0 to 3.7.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1199
- build(deps): bump strum_macros from 0.25.1 to 0.25.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1200
- build(deps): bump serde from 1.0.182 to 1.0.183 by @dependabot in https://github.com/jqnatividad/qsv/pull/1201
- cargo update bump several indirect dependencies
- apply select clippy lint suggestions
- pin Rust nightly to 2023-08-07
Removed
- temporarily remove rand/simd_support feature when building nightly as it's causing the nightly build to fail https://github.com/jqnatividad/qsv/commit/0a66fdb454941052857f6458df38abe7730e0b4b
Fixed
- fixed typos from documentation by @a5dur in https://github.com/jqnatividad/qsv/pull/1203
New Contributors
- @a5dur made their first contribution in https://github.com/jqnatividad/qsv/pull/1203
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.110.0...0.111.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.110.0
Added
- describegpt: Add jsonl to prompt file doc section & more clarification by @rzmk in https://github.com/jqnatividad/qsv/pull/1149
- luau: add --no-jit option https://github.com/jqnatividad/qsv/pull/1170
- sqlp: add CTE examples https://github.com/jqnatividad/qsv/commit/33f0218c6a78b9cef15e9bed6e227e5f17ef747a
Changed
- frequency: minor optimizations https://github.com/jqnatividad/qsv/commit/ecac0be5777a50cef2bfe7937d80c5ffe071e4cd
- join: performance optimizations https://github.com/jqnatividad/qsv/commit/4cb593783efc4e7c2026d632b8dc741cc2edc778 and https://github.com/jqnatividad/qsv/commit/4cb593783efc4e7c2026d632b8dc741cc2edc778
- sqlp: reduce allocs in loop https://github.com/jqnatividad/qsv/commit/ae164b570c300845e75ce0fac3272221bdebfa66
- Apple Silicon build now uses mimalloc allocator by default https://github.com/jqnatividad/qsv/commit/bfab24aba2d3b3f70f08ea407572d20feeda725d
- build(deps): bump jql-runner from 7.0.1 to 7.0.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1151
- build(deps): bump serde from 1.0.171 to 1.0.173 by @dependabot in https://github.com/jqnatividad/qsv/pull/1154
- build(deps): bump tempfile from 3.6.0 to 3.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1155
- build(deps): bump serde from 1.0.174 to 1.0.175 by @dependabot in https://github.com/jqnatividad/qsv/pull/1157
- build(deps): bump redis from 0.23.0 to 0.23.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1164
- build(deps): bump serde from 1.0.175 to 1.0.177 by @dependabot in https://github.com/jqnatividad/qsv/pull/1163
- build(deps): bump serde_json from 1.0.103 to 1.0.104 by @dependabot in https://github.com/jqnatividad/qsv/pull/1160
- build(deps): bump grex from 1.4.1 to 1.4.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1159
- build(deps): bump sysinfo from 0.29.6 to 0.29.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1158
- build(deps): bump mlua from 0.9.0-rc.1 to 0.9.0-rc.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1169
- build(deps): bump flexi_logger from 0.25.5 to 0.25.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1168
- build(deps): bump jemallocator from 0.5.0 to 0.5.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1167
- build(deps): bump serde from 1.0.177 to 1.0.178 by @dependabot in https://github.com/jqnatividad/qsv/pull/1166
- build(deps): bump rust_decimal from 1.30.0 to 1.31.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1172
- build(deps): bump csvs_convert from 0.8.6 to 0.8.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1174
- apply clippy::needless_pass_by_ref_mut lint in select and frequency https://github.com/jqnatividad/qsv/commit/ba6566e5ea73a1042d33c02035ed1736947b60d8 and https://github.com/jqnatividad/qsv/commit/83add7b30c6e32a49b412629acf60c4c7057df37
- cargo update bump indirect dependencies
- pin Rust nightly to 2023-07-29
Removed
excel: remove defunct dates-whitelist comments https://github.com/jqnatividad/qsv/commit/2a24d2dcd23c2ccd24dfef1055bf265085f10146
Fixed
- join: fix left-semi join. Fixes #1150. https://github.com/jqnatividad/qsv/pull/1153
- foreach: fix command argument token splitter pattern. Fixes #1171 https://github.com/jqnatividad/qsv/pull/1173
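For reference, a left-semi join keeps the left rows that have at least one match on the right, without adding any right-hand columns and without duplicating a left row when several right rows match. A minimal sketch of those semantics over hypothetical (key, value) pairs; it shows the expected output of such a join, not qsv's join implementation:

```rust
use std::collections::HashSet;

/// Left-semi join: keep each left row whose key appears in the right input.
/// Unlike an inner join, no right-hand columns are added and each left row is
/// emitted at most once, however many right rows share its key.
fn left_semi_join<'a>(
    left: &[(&'a str, &'a str)],
    right: &[(&str, &str)],
) -> Vec<(&'a str, &'a str)> {
    // Only the set of right-hand keys matters for a semi join.
    let right_keys: HashSet<&str> = right.iter().map(|&(k, _)| k).collect();
    left.iter()
        .filter(|&&(k, _)| right_keys.contains(k))
        .copied()
        .collect()
}
```

With left rows (1, alice), (2, bob), (3, carol) and right keys 1, 1, 3, the result is (1, alice) and (3, carol): alice appears once even though key 1 occurs twice on the right.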
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.109.0...0.110.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.109.0
This is a monstrous 👹 release with lots of new features and improvements!
The biggest new feature is the describegpt command, which allows you to use OpenAI's Large Language Models to generate extended metadata from a CSV. We created this command primarily for CKAN and Datapusher+ so we can infer descriptions and tags, and automatically create annotated data dictionaries, using the CSV's summary statistics and frequency tables. That way, it works even for very large CSV files without consuming too many OpenAI tokens. This is a very powerful feature and we are looking forward to seeing what people do with it. Thanks @rzmk for all the work on this!
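This approach stays cheap for large files because the model is sent aggregates (summary statistics and frequency tables) rather than raw rows, so prompt size grows with the number of columns and distinct values shown, not with the row count. A hypothetical sketch of building a bounded top-N frequency table for one column; the function name and the limit parameter are illustrative assumptions, not describegpt's actual code:

```rust
use std::collections::HashMap;

/// Returns the `limit` most frequent values with their counts, sorted by
/// descending count and then by value so the output is deterministic.
/// The result has at most `limit` entries regardless of how many rows are read.
fn top_n_frequencies<'a>(
    values: impl IntoIterator<Item = &'a str>,
    limit: usize,
) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    let mut table: Vec<(&'a str, usize)> = counts.into_iter().collect();
    // Highest count first; ties broken alphabetically.
    table.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    table.truncate(limit);
    table
}
```

For a state column containing NY, CA, NY, TX, CA, NY and a limit of 2, only ("NY", 3) and ("CA", 2) would be included in the prompt, however many rows the file has.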
This release also features major improvements in the sqlp and joinp commands thanks to all the new capabilities of Polars 0.31.1.
Polars SQL's capabilities have been vastly improved in 0.31.1 with numerous new SQL functions and operators, and they're all available with the sqlp command.
The joinp command has several new options for CSV parsing, for pre-join filtering (--filter-left and --filter-right), and pre-join validation with the --validate option. Two new asof join variants (--left_by and --right_by) were also added.
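An asof join matches each left row to the nearest right row by key instead of requiring an exact match, which is useful for aligning time series. A minimal sketch of the common "backward" strategy (take the last right key less than or equal to the left key), assuming both inputs are already sorted by an integer key. It illustrates the concept only: it is not Polars' implementation, and the --left_by/--right_by grouping is omitted:

```rust
/// Backward asof join on sorted integer keys.
/// For each left key, returns the index of the last right row whose key is
/// <= the left key, or None if every right key is larger.
/// Both slices must be sorted ascending; a single forward-moving pointer
/// makes this O(left.len() + right.len()).
fn asof_backward(left: &[i64], right: &[i64]) -> Vec<Option<usize>> {
    let mut out = Vec::with_capacity(left.len());
    let mut j = 0; // count of right keys known to be <= the current left key
    for &lk in left {
        while j < right.len() && right[j] <= lk {
            j += 1;
        }
        out.push(if j == 0 { None } else { Some(j - 1) });
    }
    out
}
```

For left keys [1, 5, 10] and right keys [2, 4, 8], key 1 has no earlier right row, 5 matches 4 (index 1), and 10 matches 8 (index 2).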
Added
- describegpt command by @rzmk in https://github.com/jqnatividad/qsv/pull/1036
- describegpt: minor refactoring in https://github.com/jqnatividad/qsv/pull/1104
- describegpt: --key & QSV_OPENAI_API_KEY by @rzmk in https://github.com/jqnatividad/qsv/pull/1105
- describegpt: add --user-agent in help message by @rzmk in https://github.com/jqnatividad/qsv/pull/1095
- describegpt: json output format for redirection by @rzmk in https://github.com/jqnatividad/qsv/pull/1107
- describegpt: add testing (resolves #1114) by @rzmk in https://github.com/jqnatividad/qsv/pull/1115
- describegpt: add --model option (resolves #1101) by @rzmk in https://github.com/jqnatividad/qsv/pull/1117
- describegpt: polishing https://github.com/jqnatividad/qsv/pull/1122
- describegpt: add --jsonl option (resolves #1086) by @rzmk in https://github.com/jqnatividad/qsv/pull/1127
- describegpt: add --prompt-file option (resolves #1085) by @rzmk in https://github.com/jqnatividad/qsv/pull/1120
- joinp: added asof_by join variant; added CSV formatting options consistent with sqlp CSV format options https://github.com/jqnatividad/qsv/pull/1090
- joinp: add --filter-left and --filter-right options https://github.com/jqnatividad/qsv/pull/1146
- joinp: add --validate option https://github.com/jqnatividad/qsv/pull/1147
- fetch & fetchpost: add --no-cache option https://github.com/jqnatividad/qsv/pull/1112
- sniff: detect file kind along with mime type https://github.com/jqnatividad/qsv/pull/1137
- user-agent metadata now contains the current command's name https://github.com/jqnatividad/qsv/pull/1093
Changed
- fetch & fetchpost: --redis and --no-cache are mutually exclusive https://github.com/jqnatividad/qsv/pull/1113
- luau: adapt to mlua 0.9.0-rc.1 API changes https://github.com/jqnatividad/qsv/pull/1129
- upgrade to Polars 0.31.1 https://github.com/jqnatividad/qsv/pull/1139
- Bump MSRV to latest Rust stable (1.71.0)
- pin Rust nightly to 2023-07-15
- Bump uuid from 1.3.4 to 1.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1073
- Bump tokio from 1.28.2 to 1.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1077
- Bump tokio from 1.29.0 to 1.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1087
- Bump sysinfo from 0.29.2 to 0.29.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1088
- build(deps): bump sysinfo from 0.29.4 to 0.29.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1148
- Bump jql-runner from 6.0.9 to 7.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1092
- build(deps): bump jql-runner from 7.0.0 to 7.0.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1132
- Bump itoa from 1.0.6 to 1.0.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/1091
- Bump itoa from 1.0.7 to 1.0.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1098
- build(deps): bump itoa from 1.0.8 to 1.0.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1142
- Bump serde from 1.0.164 to 1.0.165 by @dependabot in https://github.com/jqnatividad/qsv/pull/1094
- Bump serde from 1.0.165 to 1.0.166 by @dependabot in https://github.com/jqnatividad/qsv/pull/1100
- Bump serde from 1.0.166 to 1.0.167 by @dependabot in https://github.com/jqnatividad/qsv/pull/1116
- build(deps): bump serde from 1.0.167 to 1.0.171 by @dependabot in https://github.com/jqnatividad/qsv/pull/1118
- Bump pyo3 from 0.19.0 to 0.19.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1099
- Bump ryu from 1.0.13 to 1.0.14 by @dependabot in https://github.com/jqnatividad/qsv/pull/1096
- build(deps): bump ryu from 1.0.14 to 1.0.15 by @dependabot in https://github.com/jqnatividad/qsv/pull/1144
- Bump strum_macros from 0.25.0 to 0.25.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1097
- Bump serde_json from 1.0.99 to 1.0.100 by @dependabot in https://github.com/jqnatividad/qsv/pull/1103
- build(deps): bump serde_json from 1.0.100 to 1.0.101 by @dependabot in https://github.com/jqnatividad/qsv/pull/1123
- build(deps): bump serde_json from 1.0.101 to 1.0.102 by @dependabot in https://github.com/jqnatividad/qsv/pull/1125
- build(deps): bump serde_json from 1.0.102 to 1.0.103 by @dependabot in https://github.com/jqnatividad/qsv/pull/1143
- Bump serde_stacker from 0.1.8 to 0.1.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1110
- Bump regex from 1.8.4 to 1.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1109
- build(deps): bump regex from 1.9.0 to 1.9.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1119
- Bump jsonschema from 0.17.0 to 0.17.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1108
- build(deps): bump cpc from 1.9.1 to 1.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1121
- build(deps): bump governor from 0.5.1 to 0.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1128
- build(deps): bump actions/setup-python from 4.6.1 to 4.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1134
- build(deps): bump file-format from 0.17.3 to 0.18.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1136
- build(deps): bump serde_stacker from 0.1.9 to 0.1.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1141
- build(deps): bump semver from 1.0.17 to 1.0.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/1140
- cargo update bump several indirect dependencies
Fixed
- fmt: Quote ASCII format differently by @LemmingAvalanche in https://github.com/jqnatividad/qsv/pull/1075
- apply: make dynfmt subcommand case sensitive. Fixes #1126 https://github.com/jqnatividad/qsv/pull/1130
- applydp: make dynfmt case-sensitive https://github.com/jqnatividad/qsv/pull/1131
- describegpt: docs/Describegpt.md: typo 'a' --> 'an' by @rzmk in https://github.com/jqnatividad/qsv/pull/1135
- tojsonl: support snappy-compressed input. Fixes #1133 https://github.com/jqnatividad/qsv/pull/1145
- security.md: fix mailto text by @rzmk in https://github.com/jqnatividad/qsv/pull/1079
New Contributors
- @LemmingAvalanche made their first contribution in https://github.com/jqnatividad/qsv/pull/1075
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.108.0...0.109.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.108.0
Another big Quicksilver release with lots of new features and improvements!
The two Polars-powered commands - joinp and sqlp - have received significant attention. joinp now supports asof joins and the --try-parsedates option. sqlp now has several Parquet format options, along with a --low-memory option.
Other new features include:
- A new cat rowskey --group option that emulates csvkit's csvstack command.
- SIMD-accelerated UTF-8 validation for the input command.
- A --field-separator option for the flatten command.
- The sniff command now uses the excellent file-format crate for mime-type detection on ALL platforms, not just Linux, as was the case when we were using the libmagic library.
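csvkit's csvstack can add a grouping column recording which input each row came from. A minimal in-memory sketch of that behavior, in which the stack_with_group function, the "file" header and the use of input names as group values are illustrative assumptions rather than qsv's exact output format:

```rust
/// Stacks rows from several named inputs that share the same columns,
/// prepending a grouping column whose value is the name of the source.
/// Row order is preserved: all rows of the first input, then the second, etc.
fn stack_with_group<'a>(
    group_header: &'a str,
    header: &[&'a str],
    inputs: &[(&'a str, Vec<Vec<&'a str>>)],
) -> Vec<Vec<&'a str>> {
    let mut out = Vec::new();
    // Output header: the grouping column first, then the shared columns.
    let mut h = vec![group_header];
    h.extend_from_slice(header);
    out.push(h);
    for (name, rows) in inputs {
        for row in rows {
            let mut r = vec![*name];
            r.extend_from_slice(row);
            out.push(r);
        }
    }
    out
}
```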
Also, Quicksilver now has optimized builds for Apple Silicon. These builds are created using native Apple Silicon self-hosted Action Runners, which means we can enable all qsv features without being constrained by cross-compilation limitations and the disk/memory constraints of GitHub's hosted Action Runners. Additionally, we compile Apple Silicon builds with M1/M2 chip optimizations enabled to maximize performance.
Finally, qsv startup should be noticeably faster, thanks to @vi’s PR to avoid sysinfo::System::new_all.
Added
- joinp: added asof join & --try-parsedates option https://github.com/jqnatividad/qsv/pull/1059
- cat: emulate csvkit's csvstack https://github.com/jqnatividad/qsv/pull/1067
- input: SIMD-accelerated utf8 validation https://github.com/jqnatividad/qsv/commit/88e1df2757b4a9a6f9dbaf55a99b87fc15b18a65
- sniff: replace magic with file-format crate, enabling mime-type detection on all platforms https://github.com/jqnatividad/qsv/pull/1069
- sqlp: add --low-memory option https://github.com/jqnatividad/qsv/commit/d95048e7be1a9d34cc7a22feebbd792a5c27c604
- sqlp: added parquet format options https://github.com/jqnatividad/qsv/commit/c179cf49e02343138b058d02783332394029a050 https://github.com/jqnatividad/qsv/commit/a861ebf246d22db0f4bcbce1b76788413cfdd1e7
- flatten: add --field-separator option https://github.com/jqnatividad/qsv/pull/1068
- Apple Silicon binaries built on native Apple Silicon self-hosted Action Runners, enabling all features and optimized for M1/M2 chips
Changed
- input: minor improvements https://github.com/jqnatividad/qsv/commit/62cff74b4679e2ba207916392cab5de573ce0a59
- joinp: align option names with join command https://github.com/jqnatividad/qsv/pull/1058
- sqlp: minor improvements
- changed all GitHub action workflows to account for the new Apple Silicon builds
- Bump rust_decimal from 1.29.1 to 1.30.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1049
- Bump serde_json from 1.0.96 to 1.0.97 by @dependabot in https://github.com/jqnatividad/qsv/pull/1051
- Bump calamine from 0.21.0 to 0.21.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1052
- Bump strum from 0.24.1 to 0.25.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1055
- Bump actix-governor from 0.4.0 to 0.4.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1060
- Bump csvs_convert from 0.8.5 to 0.8.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1061
- Bump itertools from 0.10.5 to 0.11.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1062
- Bump serde_json from 1.0.97 to 1.0.99 by @dependabot in https://github.com/jqnatividad/qsv/pull/1065
- Bump indexmap from 1.9.3 to 2.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1066
- Bump calamine from 0.21.1 to 0.21.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1071
- cargo update bump various indirect dependencies
- pin Rust nightly to 2023-06-23
Fixed
- Avoid sysinfo::System::new_all by @vi in https://github.com/jqnatividad/qsv/pull/1064
- correct typos project-wide https://github.com/jqnatividad/qsv/pull/1072
Removed
- removed libmagic dependency from all GitHub action workflows
New Contributors
- @vi made their first contribution in https://github.com/jqnatividad/qsv/pull/1064
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.107.0...0.108.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.107.0
We continue to improve the new sqlp command. It now supports SQL scripts and additional options to fine-tune Polars CSV parsing and formatting behavior.
We also added an _all_generic special value for the rename command which allows you to rename all columns in a CSV with generic names (e.g. col1, col2, colN). This was done to make it easier to prepare CSVs with no headers for use with sqlp.
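Conceptually, _all_generic replaces every header with a positional name. A hypothetical sketch that follows the naming pattern given above (col1, col2, ...); qsv's actual generic names may use a different prefix or separator, so check rename --help for the exact format:

```rust
/// Builds positional header names ("col1", "col2", ...) for a CSV with
/// `num_columns` columns, ignoring whatever the original headers were.
fn all_generic_headers(num_columns: usize) -> Vec<String> {
    (1..=num_columns).map(|i| format!("col{i}")).collect()
}
```

Because the names depend only on column position, a headerless CSV gets stable, predictable column names that can then be referenced in a sqlp query.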
This release also features a Windows MSI installer. This is a big step forward for qsv and we hope to make it easier for Windows users to install and use qsv. Thanks @minhajuddin2510 for all the work on pulling this together!
Added
- sqlp: added script support https://github.com/jqnatividad/qsv/pull/1037
- sqlp: added CSV format options https://github.com/jqnatividad/qsv/pull/1048
- rename: add "_all_generic" special value for headers https://github.com/jqnatividad/qsv/pull/1031
Changed
- excel: now supports Duration type with calamine upgrade to 0.21.0 https://github.com/jqnatividad/qsv/pull/1045
- Update publish-wix-installer.yml by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1032
- Bump mlua from 0.9.0-beta.2 to 0.9.0-beta.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1030
- Bump serde from 1.0.163 to 1.0.164 by @dependabot in https://github.com/jqnatividad/qsv/pull/1029
- Bump csvs_convert from 0.8.4 to 0.8.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1028
- Bump sysinfo from 0.29.1 to 0.29.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1027
- Bump log from 0.4.18 to 0.4.19 by @dependabot in https://github.com/jqnatividad/qsv/pull/1039
- Bump uuid from 1.3.3 to 1.3.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1041
- Bump jql-runner from 6.0.8 to 6.0.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1043
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-06-13
Fixed
- Remove redundant registries protocol by @icp1994 in https://github.com/jqnatividad/qsv/pull/1034
- fix typo in tojsonl.rs (optionns -> options) by @rzmk in https://github.com/jqnatividad/qsv/pull/1035
- Fix eula by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1046
New Contributors
- @icp1994 made their first contribution in https://github.com/jqnatividad/qsv/pull/1034
- @rzmk made their first contribution in https://github.com/jqnatividad/qsv/pull/1035
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.106.0...0.107.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.106.0
This release features the new Polars-powered sqlp command which allows you to run SQL queries against CSVs.
Initial tests show that its performance is competitive with DuckDB and faster than DataFusion on identical SQL queries, and it just runs rings around pandas sql.
It converts Polars SQL (a subset of ANSI SQL) queries to multi-threaded LazyFrames expressions and then executes them. This is a very powerful feature and allows you to do things like joins, aggregations, group bys, etc. on larger than memory CSVs. The sqlp command is still experimental and we are looking for feedback on it. Please try it out and let us know what you think.
Added
sqlp: new command to allow Polars SQL queries against CSVs https://github.com/jqnatividad/qsv/pull/1015
Changed
- Bump csv from 1.2.1 to 1.2.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1008
- Bump pyo3 from 0.18.3 to 0.19.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1007
- workflow for creating msi for qsv by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1009
- migrate from once_cell to std::sync::OnceLock https://github.com/jqnatividad/qsv/pull/1010
- Bump qsv_docopt from 1.2.2 to 1.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1011
- Bump self_update from 0.36.0 to 0.37.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1014
- Bump indicatif from 0.17.4 to 0.17.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1013
- Bump cached from 0.43.0 to 0.44.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1012
- Bump url from 2.3.1 to 2.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1016
- Wix changes by @minhajuddin2510 in https://github.com/jqnatividad/qsv/pull/1017
- Bump actions/github-script from 5 to 6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1018
- Bump regex from 1.8.3 to 1.8.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1019
- Bump hashbrown from 0.13.2 to 0.14.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1020
- Bump tempfile from 3.5.0 to 3.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1021
- Bump sysinfo from 0.29.0 to 0.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1023
- Bump qsv-dateparser from 0.8.2 to 0.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1022
- Bump qsv-sniffer from 0.9.3 to 0.9.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1024
- Bump qsv-stats from 0.9.0 to 0.10.0 https://github.com/jqnatividad/qsv/commit/38035793d2bb3bf4bee1d3e4cbfc62a6f0235fb6
- Bump embedded luau from 0.577 to 0.579
- Bump data-encoding from 2.3.3 to 2.4.0 https://github.com/jqnatividad/qsv/commit/2285a12eab6a7997f97cb39f908684c3adae3ec9
- cargo update bump several indirect dependencies
- change MSRV to 1.70.0
- pin Rust nightly to 2023-06-06
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.105.1...0.106.0
- Rust
Published by jqnatividad over 2 years ago
https://github.com/dathere/qsv - 0.105.1
All "unsafe" code has been removed. By selectively using asserts, we obviate the need to use explicit unchecked logic to skip unnecessary bounds checking.
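The usual pattern behind this is to hoist a single length check above a hot loop. A minimal sketch, assuming a hypothetical sum_prefix function rather than qsv's actual stats code; whether the per-iteration checks are actually elided depends on the compiler version and optimization level, so it should be confirmed by benchmarking or by inspecting the generated assembly:

```rust
/// Sums the first `n` values of `data`.
/// The single up-front assert! tells the optimizer that every index in 0..n
/// is in bounds, which typically lets it drop the per-iteration bounds checks
/// on `data[i]` without resorting to `unsafe` get_unchecked calls.
/// Panics once, before the loop starts, if `n > data.len()`.
fn sum_prefix(data: &[u64], n: usize) -> u64 {
    assert!(n <= data.len());
    let mut total = 0;
    for i in 0..n {
        total += data[i];
    }
    total
}
```

The behavior stays fully safe: an out-of-range `n` still panics instead of reading past the slice, and the check is just performed once instead of on every access.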
Changed
- stats: remove all unsafes https://github.com/jqnatividad/qsv/commit/4a4c0107f98dcd3a2fac7a793101624ec46762df
- fetch & fetchpost: remove unsafe https://github.com/jqnatividad/qsv/commit/1826bb3cbe24f731973d2e2ce8edc1927dc87d4b
- validate: remove unsafe https://github.com/jqnatividad/qsv/commit/742ccb3b36fd6a0fb9690d9150bec5b2e4d44b0a
- normalize --user-agent option across all of qsv https://github.com/jqnatividad/qsv/commit/feff90bba4d6840f7d2aa2100897cfaad7efe08f & https://github.com/jqnatividad/qsv/commit/839b3b71369f948135d403b7d30e8b26248a313b
- bump qsv-dateparser from 0.8.1 to 0.8.2, which also uses chrono 0.4.26
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-29
Fixed
- remove chrono pin to 0.4.24 and upgrade to 0.4.26 which fixed 0.4.25 CI test failures https://github.com/jqnatividad/qsv/commit/7636d82bdcf3428e59b800b6ff9f53dcd52cddd9
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.105.0...0.105.1
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.105.0
Added
- sniff: added --harvest-mode convenience option https://github.com/jqnatividad/qsv/pull/997
- sniff: added --quick option on Linux https://github.com/jqnatividad/qsv/commit/e16df6fbbad9318cc4efeb500409f80b76cd50e2
- qsv (pronounced "Quicksilver") now has a tagline - "Hi ho, QuickSilver! Away!" :smile: https://github.com/jqnatividad/qsv/commit/d32aeb1afe7a90c4887b00a0c2a20481a91722fe
Changed
- sniff: if --no-infer is enabled when sniffing a snappy file, just return the snappy mime type https://github.com/jqnatividad/qsv/pull/996
- sniff: now returns filesize and last-modified date in errors https://github.com/jqnatividad/qsv/commit/2162659bd574122e93e204cb14b5114bd7ca5344
- stats: minor performance tweaks in hot compute loop https://github.com/jqnatividad/qsv/commit/f61198c2057545fb76a9b30bd12adfd3a3bbf8ba
- qsv binary variants built using older glibc/musl libraries are now published with their respective glibc/musl version suffixes (glibc-2.31/musl-1.1.24) in the filename, instead of just the "older" suffix.
- pin chrono to 0.4.24 as the new 0.4.25 is breaking CI tests https://github.com/jqnatividad/qsv/commit/cde3623b27fcb583a1248fc736aaf11f569f5085
- Bump calamine from 0.19.1 to 0.20.0 https://github.com/jqnatividad/qsv/commit/ec7e2df70e33756d4ef49567bf4f5acba3eb19d4
- Bump actions/setup-python from 4.6.0 to 4.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/991
- Bump flexi_logger from 0.25.4 to 0.25.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/992
- Bump regex from 1.8.2 to 1.8.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/993
- Bump csvs_convert from 0.8.3 to 0.8.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/994
- Bump log from 0.4.17 to 0.4.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/998
- Bump polars from 0.29.0 to 0.30.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/999
- Bump tokio from 1.28.1 to 1.28.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1000
- Bump once_cell from 1.17.1 to 1.17.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1003
- Bump indicatif from 0.17.3 to 0.17.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1001
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-28
Removed
excel: removed kludgy --dates-whitelist option https://github.com/jqnatividad/qsv/pull/1005
Fixed
sniff: fix inconsistent mime type detection https://github.com/jqnatividad/qsv/pull/995
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.104.1...0.105.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.104.1
Added
- added new publishing workflow to build binary variants using older glibc 2.31 instead of glibc 2.35 and musl 1.1.24 instead of musl 1.2.2. This will allow users running on older Linux distros (e.g. Debian, Ubuntu 20.04) to run qsv prebuilt binaries with "older" glibc/musl versions. https://github.com/jqnatividad/qsv/commit/1a08b920240b39ff57282645cc92686b42e3c278
Changed
- sniff: improved usage text https://github.com/jqnatividad/qsv/commit/d2b32ac6631589230484cb84506b5113c8f75192
- sniff: if sniffing a URL and the server does not return content-length or last-modified headers, set filesize and last-modified to "Unknown" https://github.com/jqnatividad/qsv/commit/d4a64ac2e7147e7ab5452864fe6063a97f37f76b
- frequency: use SIMD-accelerated UTF-8 validation in hot loop https://github.com/jqnatividad/qsv/commit/33406a15f554d03ca117e0196efa6362f104e3cc
- foreach: use simdutf8 validation https://github.com/jqnatividad/qsv/commit/df6b4f8ae967bde8ca22bc6dd217938ae5238add
- apply: use simdutf8 validation in decode operation; also tweak it to avoid panics (however unlikely) https://github.com/jqnatividad/qsv/commit/adf7052db39a08aeda2401774892a884be98223c
- update install & build instructions with magic
- Bump regex from 1.8.1 to 1.8.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/990
- Bump bumpalo from 3.12.2 to 3.13.0
- pin Rust nightly to 2023-05-22
Removed
sniff: disabled --progressbar option on qsvdp binary variant https://github.com/jqnatividad/qsv/commit/1a20edb7af7525fcb5c54daacf70e3381cf17e82
Fixed
- updated publishing workflows to properly enable magic feature (for sniff mime type detection) https://github.com/jqnatividad/qsv/commit/136211fcd9134f3421223979a5272ff53d77f03b
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.104.0...0.104.1
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.104.0
Added
- sniff: add --no-infer option, only available on Linux. Using this option makes sniff work as a general mime type detector, retrieving the detected mime type, file size (content-length when sniffing a URL), and last modified date.
  When sniffing a URL with --no-infer, it only sniffs the first downloaded chunk, making it very fast even for very large remote files. This option was designed to facilitate accelerated harvesting and broken/stale link checking on CKAN. https://github.com/jqnatividad/qsv/pull/987
- excel: add canonical_filename to metadata https://github.com/jqnatividad/qsv/pull/985
- snappy: now accepts url input https://github.com/jqnatividad/qsv/pull/986
- sample: support url input https://github.com/jqnatividad/qsv/pull/989
Changed
- Bump qsv-sniffer from 0.9.2 to 0.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/979
- Bump console from 0.15.5 to 0.15.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/980
- Bump jql-runner from 6.0.7 to 6.0.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/981
- Bump console from 0.15.6 to 0.15.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/988
- Bump embedded Luau from 0.576 to 0.577
- apply select clippy recommendations
- tweaked emojis used in Available Commands legend: 🗜️ changed to 🤯 to denote memory-intensive commands that load the entire CSV into memory; 🪗 changed to 😣 to denote commands that need additional memory proportional to the cardinality of the columns being processed; 🌐 denotes commands that have web-aware options
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-21
Fixed
excel: Handle ranges larger than the sheet by @bluepython508 in https://github.com/jqnatividad/qsv/pull/984
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.103.1...0.104.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.103.1
Changed
- Bump reqwest from 0.11.17 to 0.11.18 by @dependabot in https://github.com/jqnatividad/qsv/pull/978
- cargo update bump indirect dependencies
Fixed
- fix cargo install failing, as it was trying to fetch cargo environment variables that are only set for cargo build, but not for cargo install #977
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.103.0...0.103.1
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.103.0
Added
- sniff: on Linux, short-circuit sniffing a remote file when we already know it's not a CSV https://github.com/jqnatividad/qsv/pull/976
- stats: now computes variance for dates https://github.com/jqnatividad/qsv/commit/e3e678298de59f2485d5e70f622218d849a2e2c9
- stats: now automatically invalidates cached stats across qsv releases https://github.com/jqnatividad/qsv/commit/6e929dd1feac692be3f7e1883ad88f99b3abc5b2
- add magic version to --version option https://github.com/jqnatividad/qsv/commit/455c0f26e237c812bf9d88d6a7906e34c5a9cbeb
- added CKAN-aware legend to List of Available Commands
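Invalidating cached stats across releases can be sketched as a freshness check. This is a hypothetical illustration: `StatsCacheMeta`, its fields, and `cache_is_fresh` are assumptions, not qsv's actual cache format. In a Cargo-built binary the running version is commonly available via `env!("CARGO_PKG_VERSION")`.

```rust
/// Hypothetical metadata stored next to a stats cache; not qsv's actual format.
#[derive(Debug, Clone, PartialEq)]
struct StatsCacheMeta {
    /// qsv version that wrote the cache
    qsv_version: String,
    /// input file modification time (seconds since the Unix epoch) when stats were computed
    input_mtime_secs: u64,
}

/// Decides whether cached stats may be reused instead of recomputed.
fn cache_is_fresh(meta: &StatsCacheMeta, running_version: &str, current_input_mtime_secs: u64) -> bool {
    // A cache written by a different release is discarded, since stats output may have changed.
    let same_release = meta.qsv_version == running_version;
    // A cache computed for a different version of the input file is stale.
    let same_input = meta.input_mtime_secs == current_input_mtime_secs;
    same_release && same_input
}
```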
Changed
- stats: improve usage text
- stats: use extend_from_slice for readability https://github.com/jqnatividad/qsv/commit/23275e2e8ef30bdc101293084bce71e651b3222a
- validate: do not panic if the input is not UTF-8 https://github.com/jqnatividad/qsv/commit/532cd012de0866250be2dc19b6e02ffa27b3c9fb
- sniff: simplify getting stdin last_modified property; on Linux, return detected mime type in JSON error response https://github.com/jqnatividad/qsv/commit/01975912ae99fe0a7b38cf741f3dfbcf2b9dc486
- luau: update embedded Luau from 0.573 to 0.576
- Update nightly build instructions
- Bump qsv-sniffer from 0.9.1 to 0.9.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/972
- Bump tokio from 1.28.0 to 1.28.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/973
- Bump serde from 1.0.162 to 1.0.163 by @dependabot in https://github.com/jqnatividad/qsv/pull/974
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-05-13
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.102.1...0.103.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.102.1
0.102.1 is a small patch release to fix issues in publishing the pre-built binary variants with magic for sniff when cross-compiling.
Changed
- stats: refine --infer-boolean option info & update test count https://github.com/jqnatividad/qsv/commit/de6390b21a21b67ae0dd3f3f6d0153f2c0736cff
- tojsonl: refine boolcheck_first_lower_char() fn https://github.com/jqnatividad/qsv/commit/241115e4718c67cd8e701c435b91e02556875eac
Fixed
- tweaked GitHub Actions publishing workflows to enable building magic-enabled sniff on Linux. Disabled magic when cross-compiling for non-x86_64 Linux targets.
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.102.0...0.102.1
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.102.0
A lot of work was done on sniff to make it not just a CSV dialect detector, but a general purpose file type detector leveraging :magic_wand: magic :sparkles: - able to detect mime types even for files on URLs.
sniff can now also use the same data types as stats with the --stats-types option. This was primarily done to support metadata collection when registering CKAN resources, not only during data entry, but also when checking resource links for bitrot and when harvesting metadata from other systems, so that stats and sniff can be used interchangeably depending on the response-time requirements and the data quality of the data source.
For example, sniff can be used to quickly infer metadata by downloading just a small sample of a very large data file DURING data entry ("Resource-first upload workflow"), with stats used later, when the data is actually pushed to the Datastore with Datapusher+, where data type inferences need to be guaranteed and the entire file must be scanned.
Added
- stats: add --infer-boolean option https://github.com/jqnatividad/qsv/pull/967
- sniff: add --stats-types option https://github.com/jqnatividad/qsv/pull/968
- sniff: add magic mime-type detection on Linux https://github.com/jqnatividad/qsv/pull/970
- sniff: add --user-agent option https://github.com/jqnatividad/qsv/commit/bd0bf788609c7dd5220cdab6061067170acf1ca2
- sniff: add last_modified info https://github.com/jqnatividad/qsv/commit/ef68bff177ee7c9ce6bd45868488287c8114a91e
Changed
- make --envlist option allocator-aware https://github.com/jqnatividad/qsv/commit/f3566dc0c4ab7c7236374cce936f5db7200e39de
- Bump serde from 1.0.160 to 1.0.162 by @dependabot in https://github.com/jqnatividad/qsv/pull/962
- Bump robinraju/release-downloader from 1.7 to 1.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/960
- Bump flexi_logger from 0.25.3 to 0.25.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/965
- Bump sysinfo from 0.28.4 to 0.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/966
- Bump jql-runner from 6.0.6 to 6.0.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/969
- Bump polars from 0.28.0 to 0.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/971
- apply select clippy recommendations
- cargo update bump indirect dependencies
- change MSRV to 1.69.0
- pin Rust nightly to 2023-05-07
Fixed
- sniff: make sniff give more consistent results https://github.com/jqnatividad/qsv/pull/958. Fixes #956
- Bump qsv-sniffer from 0.8.3 to 0.9.1. Replaced all asserts with proper error handling. https://github.com/jqnatividad/qsv/pull/961 https://github.com/jqnatividad/qsv/commit/a7c607a55be9bebca13148f5a0dddf1fea909df7 https://github.com/jqnatividad/qsv/commit/43d7eaf9201c72016682096e84400dba59b7cd95
- sniff: fixed rowcount calculation when sniffing a URL and the entire file was actually downloaded - https://github.com/jqnatividad/qsv/commit/ef68bff177ee7c9ce6bd45868488287c8114a91e
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.101.0...0.102.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.101.0
We're back to the future! The qsv release train is back on track, as we jump to 0.101.0 over the yanked 0.100.0 release now that self-update logic has been fixed.
Added
- stats: added more metadata to stats arg cache json - https://github.com/jqnatividad/qsv/commit/5767e5650690a8f39d537ccdc428a6688762cd77
- added target-triple to user-agent string, and changed agent name to qsv binary variant https://github.com/jqnatividad/qsv/commit/063b08031e361b5c1f26ed504870f0bc1bfd7678, https://github.com/jqnatividad/qsv/commit/70f4ea3b2d0d88b54358c470dd8e964e89adf16d, https://github.com/jqnatividad/qsv/commit/f0fcb0591fcecaae9b8a9db192adbcdfeb402728
Changed
- excel: performance, safety & documentation refinements https://github.com/jqnatividad/qsv/commit/e9a283d51fe84cc4c4e004c0e7b9b2ef12db683d, https://github.com/jqnatividad/qsv/commit/3800d250223619963bc9072ade9c43200ca1bdaf, https://github.com/jqnatividad/qsv/commit/252b01e2207bb995d09154af546a12174d532d6a, https://github.com/jqnatividad/qsv/commit/6a6df0f045cb4f1e58d07433e73a41579ca1262f, https://github.com/jqnatividad/qsv/commit/67ccd85cbe5441b1ad0188ae524b3e832c817d30, https://github.com/jqnatividad/qsv/commit/f2908ce020316087ed756d614c357373727f2664, https://github.com/jqnatividad/qsv/commit/6d5105deaa00f3b8e350d522b196ef4ed3676fc4, https://github.com/jqnatividad/qsv/commit/dbcea393cfba08b4ffe3b6b6d0acd364a59cb342, https://github.com/jqnatividad/qsv/commit/faa8ef9b3f9d6de6af47ddced0d80a5ad5b4e763
- replace: clarify that it works on a field-by-field basis https://github.com/jqnatividad/qsv/commit/c0e2012dc011a6269359ed0ff2c7dc157bae5cd0
- stats: use extend_from_slice when possible - https://github.com/jqnatividad/qsv/commit/c71ad4ee3d7992f4ef1cdc37e32d740756340ba9
- fetch & fetchpost: replace multiple push_fields with a csv from vec - https://github.com/jqnatividad/qsv/commit/f4e0479e508c845f49d320967af443fe5a247327
- fetch & fetchpost: Migrate to jql 6 https://github.com/jqnatividad/qsv/pull/955
- schema: made bincode reader buffer bigger - https://github.com/jqnatividad/qsv/commit/39b4bb5f89bab7ada2dda40d66d1e40bb51cbe0a
- index: use increased default buffer size when creating index https://github.com/jqnatividad/qsv/commit/60fe7d64b7eeb322625d2cc44d196bd5633bd79c
- standardized user_agent processing https://github.com/jqnatividad/qsv/commit/4c063015a8d664b9ef105243b2ea6541b3cc6b59, https://github.com/jqnatividad/qsv/commit/010c565912c6ae5ba09620cee7f90aeb294c4d14
- User agent environment variable; standardized user agent processing https://github.com/jqnatividad/qsv/pull/951
- more robust Environment Variables processing https://github.com/jqnatividad/qsv/pull/946
- move Environment Variables to its own markdown file https://github.com/jqnatividad/qsv/commit/77c167fe3942ce464bc5a675b76b3371cf75e84b
- Bump tokio from 1.27.0 to 1.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/945
- Bump mimalloc from 0.1.36 to 0.1.37 by @dependabot in https://github.com/jqnatividad/qsv/pull/944
- Bump mlua from 0.9.0-beta.1 to 0.9.0-beta.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/952
- Bump flate2 from 1.0.25 to 1.0.26 by @dependabot in https://github.com/jqnatividad/qsv/pull/954
- Bump reqwest from 0.11.16 to 0.11.17 by @dependabot in https://github.com/jqnatividad/qsv/pull/953
- cargo update bump indirect dependencies
- pin Rust nightly to 2023-04-30
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.99.1...0.101.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.99.1
Even though this is a patch release, it actually contains a lot of new features and improvements. This was done so that qsv 0.99.0 and below can upgrade to this release: the self-update logic in older versions compared versions as strings rather than as semvers, and when compared as strings, the yanked 0.100.0 sorts below 0.99.0 and earlier releases, which prevented those older versions from updating.
The changelog below is a combination of the changelog of the yanked 0.100.0 and the changes since 0.99.0.
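Why string comparison fails, and what comparing as semvers means, can be shown with a small std-only sketch. The helpers `parse_version` and `is_newer` are hypothetical and only handle plain `MAJOR.MINOR.PATCH`; a real implementation would more likely use a dedicated semver crate.

```rust
/// Parses a "MAJOR.MINOR.PATCH" version (optionally prefixed with 'v') into numbers.
/// Pre-release and build-metadata suffixes are not handled in this sketch.
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim_start_matches('v').split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Returns Some(true) if `candidate` is a newer release than `current`.
/// Tuples compare component by component, as numbers rather than characters.
fn is_newer(candidate: &str, current: &str) -> Option<bool> {
    Some(parse_version(candidate)? > parse_version(current)?)
}
```

As strings, `"0.99.0" > "0.100.0"` holds because `'9' > '1'` at the first differing character; as numeric tuples, `(0, 100, 0) > (0, 99, 0)`.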
Added
- snappy: add validate subcommand https://github.com/jqnatividad/qsv/pull/920
- sniff: can now sniff snappy-compressed files - on the local file system and on URLs https://github.com/jqnatividad/qsv/pull/925
- schema & stats: stats now has a --stats-binout option which schema takes advantage of https://github.com/jqnatividad/qsv/pull/931
- schema: added example NYC 311 JSON schema validation file generated by qsv schema https://github.com/jqnatividad/qsv/commit/c956212574ad0d800c3cf3bb1caa4e5722f0a393
- to: added snappy auto-compression/decompression support https://github.com/jqnatividad/qsv/commit/09a7afd38fdf59703edf76fa492eed9747586b6c
- to: added dirs as input source https://github.com/jqnatividad/qsv/commit/a31fb3b7499e1ed05136b32b3179d5713bec2106 and https://github.com/jqnatividad/qsv/commit/4d4dd548c44967c61493f1e1c2403f352dcfba34
- to: added unit tests for sqlite, postgres, xlsx and datapackage https://github.com/jqnatividad/qsv/commit/16f2b7ec35bc44093b90d4673e8c20a61f6263bb https://github.com/jqnatividad/qsv/commit/808b018d1f5b7f815897979e1bd67d663fe31c9c https://github.com/jqnatividad/qsv/commit/10739c55bdf66494e5f76028fb1bc67dbeb706cf
- add dotenv file support https://github.com/jqnatividad/qsv/pull/936 and https://github.com/jqnatividad/qsv/pull/937
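To illustrate what dotenv file support involves, here is a deliberately simplified parser for `KEY=VALUE` lines. It is a hypothetical sketch, not qsv's implementation: it handles `#` comments, an optional `export ` prefix and one pair of surrounding double quotes, but no escapes, multi-line values or variable expansion.

```rust
use std::collections::HashMap;

/// Simplified .env parser (hypothetical): one KEY=VALUE per line.
fn parse_dotenv(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for raw in contents.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        let line = line.strip_prefix("export ").unwrap_or(line);
        if let Some((key, value)) = line.split_once('=') {
            let key = key.trim();
            let value = value.trim();
            // Remove one pair of surrounding double quotes, if present.
            let value = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .unwrap_or(value);
            if !key.is_empty() {
                vars.insert(key.to_string(), value.to_string());
            }
        }
    }
    vars
}
```

For instance, a file containing `QSV_TIMEOUT=60` yields the entry `QSV_TIMEOUT -> "60"`, which a caller could then apply to the process environment.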
Changed
- stats & schema: major performance improvement (30x faster) with stats binary format serialization/deserialization https://github.com/jqnatividad/qsv/commit/73b4b2075a7d9013f8b71a9109073e6d9b8ad9b4
- snappy: misc improvements in https://github.com/jqnatividad/qsv/pull/921
- stats: Refine stats binary format caching in https://github.com/jqnatividad/qsv/pull/932
- bump embedded Luau from 0.5.71 to 0.5.73 https://github.com/jqnatividad/qsv/commit/d0ea7c8f926299c5d201609e4f3f11e11e3462d7
- Better OOM checks. It now has two distinct modes - NORMAL and CONSERVATIVE, with NORMAL being the default. Previously, the CONSERVATIVE heuristic was the default and it was causing too many false positives https://github.com/jqnatividad/qsv/pull/935
- Bump actions/setup-python from 4.5.0 to 4.6.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/934
- Bump embedded Luau from 0.5.67 to 0.5.71 https://github.com/jqnatividad/qsv/commit/a67bd3e274b1f73d64bb93e03c817cce583a8b02
- Bump qsv-stats from 0.7 to 0.8 https://github.com/jqnatividad/qsv/commit/9a6812abff719b11e5b0c7e25009dfc81231757a
- Bump serde from 1.0.159 to 1.0.160 by @dependabot in https://github.com/jqnatividad/qsv/pull/918
- Bump cached from 0.42.0 to 0.43.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/919
- Bump serde_json from 1.0.95 to 1.0.96 by @dependabot in https://github.com/jqnatividad/qsv/pull/922
- Bump pyo3 from 0.18.2 to 0.18.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/923
- Bump ext-sort from 0.1.3 to 0.1.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/929
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-23
Removed
- snappy is even snappier now that we removed the 8-CPU cap, for even faster compression - going from 1.75 GB/sec to 2.25 GB/sec for the NYC 311 test data :rocket: https://github.com/jqnatividad/qsv/commit/19acf2f23187dee5fd104e9e6eceb8fdc74d7a08
Fixed
- excel: Float serialization correctness by @bluepython508 in https://github.com/jqnatividad/qsv/pull/933
- luau: only create qsv_cache directory when needed https://github.com/jqnatividad/qsv/pull/930
- luau: make qsv_shellcmd() helper function work with Windows https://github.com/jqnatividad/qsv/commit/f867158c4c7eaf10c18092b2a4c88ff67cc3a487 and https://github.com/jqnatividad/qsv/commit/cc24acba3c916184059e7e9d776dce9e35294d44
- Self-update semver parsing fixed so versions are compared as semvers, not as strings. The string comparison prevented self-update from updating from 0.99.0 to 0.100.0, as 0.99.0 > 0.100.0 when compared as strings. https://github.com/jqnatividad/qsv/pull/940
- fixed werr macro to also format! messages https://github.com/jqnatividad/qsv/commit/c3ceaf713683ddb70e40a293f494f15144cc78fb
New Contributors
- @bluepython508 made their first contribution in https://github.com/jqnatividad/qsv/pull/933
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.99.0...0.99.1
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.99.0
Added
- added Snappy auto-compression/decompression support. The Snappy format was chosen primarily because it supported streaming compression/decompression and is designed for performance. https://github.com/jqnatividad/qsv/pull/911
- added snappy command. Although files ending with the ".sz" extension are automatically compressed/decompressed by qsv, the snappy command offers 4-5x faster multi-threaded compression. It can also be used to check if a file is Snappy-compressed or not, and can be used to compress/decompress ANY file. https://github.com/jqnatividad/qsv/pull/911 and https://github.com/jqnatividad/qsv/pull/916
- diff command added to qsvlite and qsvdp binary variants https://github.com/jqnatividad/qsv/pull/910
- to: added stdin support https://github.com/jqnatividad/qsv/pull/913
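The two detection ideas above, the ".sz" extension rule and checking whether a file is Snappy-compressed, can be sketched in std-only Rust. The extension check mirrors the changelog text; the header check is an assumption based on the Snappy framing-format spec (every framed stream starts with a stream-identifier chunk) and is not taken from qsv's code, whose snappy check may work differently.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// First 10 bytes of every Snappy *framed* stream: a stream-identifier chunk
/// (type 0xff, 3-byte little-endian length 6) followed by "sNaPpY".
const SNAPPY_STREAM_ID: [u8; 10] = [0xff, 0x06, 0x00, 0x00, b's', b'N', b'a', b'P', b'p', b'Y'];

/// True if the path ends with the ".sz" extension (case-insensitive).
fn has_sz_extension(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext.eq_ignore_ascii_case("sz"))
}

/// True if the buffer starts with the Snappy framing-format stream identifier.
fn has_snappy_header(bytes: &[u8]) -> bool {
    bytes.starts_with(&SNAPPY_STREAM_ID)
}

/// Reads at most the first 10 bytes of a file and checks them for the header.
fn file_looks_snappy(path: &Path) -> io::Result<bool> {
    let mut header = Vec::with_capacity(SNAPPY_STREAM_ID.len());
    File::open(path)?
        .take(SNAPPY_STREAM_ID.len() as u64)
        .read_to_end(&mut header)?;
    Ok(has_snappy_header(&header))
}
```

Reading only the first few bytes keeps the check cheap regardless of file size, although a matching header does not by itself guarantee the rest of the stream is valid.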
Changed
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-09
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.98.0...0.99.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.98.0
Added
- stats: added stats caching and storing the computed stats as metadata. Doing so not only prevents unnecessary recomputation of stats, especially for very large files, it also sets the foundation for summary statistics to be used more widely across qsv to support new commands that leverage these stats - e.g. fixdata, outliers, describegpt, fake, statsviz and multi-pass stats, etc. https://github.com/jqnatividad/qsv/pull/902
- stats: added --force option to force recomputation of stats https://github.com/jqnatividad/qsv/commit/2f91d0cd981ce9be6c36424cd946f3bcce42b909
- luau: add qsv_loadcsv helper function https://github.com/jqnatividad/qsv/pull/908
- added more info about regular expression syntax and link to https://regex101.com which now supports the Rust flavor of regex
Changed
- logging is now buffered by default https://github.com/jqnatividad/qsv/pull/903
- renamed features to be more easily understandable: "full" -> "feature_capable", "all_full" -> "all_features" https://github.com/jqnatividad/qsv/pull/906
- changed GitHub Actions workflows to use the new feature names
- Bump redis from 0.22.3 to 0.23.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/901
- Bump filetime from 0.2.20 to 0.2.21 by @dependabot in https://github.com/jqnatividad/qsv/pull/904
- re-enabled fetch and fetchpost CI tests
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-06
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.97.0...0.98.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.97.0
Since 0.96.x was not published, 0.97.0 contains the changes from 0.96.x after fixing the mimalloc build errors on some platforms.
Added
- excel: add --date-format option in https://github.com/jqnatividad/qsv/pull/897 and https://github.com/jqnatividad/qsv/commit/6a7db997c8d150854405a2cb2ac392479c3534b9
- luau: add qsv_fileexists() helper fn https://github.com/jqnatividad/qsv/commit/f4cc60f87c3c7c85a7736260356daa3051d2a879
Changed
- excel: speed up float conversion by using ryu and itoa together rather than going through core::fmt::Formatter https://github.com/jqnatividad/qsv/commit/e722753c377e385ebdffca199557ab3cf848ce7b
- joinp: --cross option does not require columns; added CI tests https://github.com/jqnatividad/qsv/pull/894
- schema: better, more human-readable regex patterns are generated when inferring the pattern attribute; more interactive messages https://github.com/jqnatividad/qsv/commit/1620477b752e64b6b2844aafeee4adf9256d4de8
- schema & validate: improve usage text; added JSON Schema Validation info https://github.com/jqnatividad/qsv/commit/3da68474d0fa4b6ec2170bf69dbfb27ab0d5f8a3
- Bump tokio from 1.26.0 to 1.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/887
- Bump reqwest from 0.11.15 to 0.11.16 by @dependabot in https://github.com/jqnatividad/qsv/pull/888
- Bump serde_json from 1.0.94 to 1.0.95 by @dependabot in https://github.com/jqnatividad/qsv/pull/889
- Bump serde from 1.0.158 to 1.0.159 by @dependabot in https://github.com/jqnatividad/qsv/pull/890
- Bump tempfile from 3.4.0 to 3.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/891
- Bump polars from 0.27.2 to 0.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/893
- Bump mlua from 0.8 to 0.9.0-beta.1 https://github.com/jqnatividad/qsv/commit/9b7e984cba4079f8e826f7e74209a90ce7856bc7
- bump MSRV to Rust 1.68.2
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-04-02
Removed
luau: removed unnecessary --exec option https://github.com/jqnatividad/qsv/commit/0d4ccdaab95ab5471bb71d99aa7f9056dabf48c3
Fixed
- Fixed build errors on non-Windows platforms #900 by bumping mimalloc from 0.1.34 to 0.1.36
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.95.1...0.97.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.95.1
Changed
- count: add example/test; add link from usage text https://github.com/jqnatividad/qsv/commit/9cd3c293eef0344c27693949f415850881211adf
- diff: add examples link from usage text https://github.com/jqnatividad/qsv/commit/4250811d0d20284342ccd7efcc58cd7562d16636
- Standardize --timeout option handling and expose it with QSV_TIMEOUT env var https://github.com/jqnatividad/qsv/pull/886
- improved self-update messages https://github.com/jqnatividad/qsv/commit/4027306f08aeca3b2ebe1e4243628a65c1307a9e
- Bump qsv-dateparser from 0.6 to 0.7
- Bump qsv-sniffer from 0.7 to 0.8
- Bump actions/stale from 7 to 8 by @dependabot in https://github.com/jqnatividad/qsv/pull/876
- Bump newline-converter from 0.2.2 to 0.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/877
- Bump rust_decimal from 1.29.0 to 1.29.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/882
- Bump regex from 1.7.2 to 1.7.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/881
- Bump sysinfo from 0.28.3 to 0.28.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/883
- Bump pyo3 from 0.18.1 to 0.18.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/885
- Bump indexmap from 1.9.2 to 1.9.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/884
- change MSRV to Rust 1.68.1
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-26
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.95.0...0.95.1
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.95.0
Added
- luau: added qsv_cmd() and qsv_shellcmd() helpers, detailed map error messages to help with script development https://github.com/jqnatividad/qsv/pull/869
- luau: added environment variable set/get helper functions - qsv_setenv() and qsv_getenv() https://github.com/jqnatividad/qsv/pull/872
- luau: added smart qsv_register_lookup() caching so lookup tables need not be repeatedly downloaded and can be persisted/expired as required https://github.com/jqnatividad/qsv/pull/874
- luau: added QSV_CKAN_API, QSV_CKAN_TOKEN and QSV_CACHE_DIR env vars https://github.com/jqnatividad/qsv/commit/9b7269e98fe004c6d2268d626777628af65dd45d
Changed
- apply & applydp: expanded usage text to have arguments section; emptyreplace subcommand now supports column selectors https://github.com/jqnatividad/qsv/pull/868
- luau: smarter script file processing. In addition to recognizing the "file:" prefix, if the script argument ends with the ".lua" or ".luau" file extension, it's automatically processed as a file https://github.com/jqnatividad/qsv/pull/875
- luau: qsv_sleep() and qsv_writefile() improvements https://github.com/jqnatividad/qsv/commit/27358a26411f95f57acfd62aad8b92906fe82ced
- partition: added arguments section to usage text; added NYC 311 example https://github.com/jqnatividad/qsv/commit/74aa37b1c138f1c010d338fb4f6c9b48a381532a
- Bump reqwest from 0.11.14 to 0.11.15 by @dependabot in https://github.com/jqnatividad/qsv/pull/870
- Bump regex from 1.7.1 to 1.7.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/873
- apply select clippy lint recommendations
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-22
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.94.0...0.95.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.94.0
Added
- luau: qsv_register_lookup now supports "ckan://" scheme. This allows the luau script developer to fetch lookup table resources from CKAN instances. https://github.com/jqnatividad/qsv/pull/864
- luau: added detailed example for "dathere://" lookup scheme in https://github.com/dathere/qsv-lookup-tables repo. https://github.com/jqnatividad/qsv/commit/3074538a9ac1071ba6d6b6e85fdc0ca3c833ce4e
- luau: added qsv_writefile helper function. This allows the luau script developer to write text files to the current working directory. Filenames are sanitized for safety. https://github.com/jqnatividad/qsv/pull/867
- luau: random access mode now supports progressbars. The progressbar indicates the current record and the total number of records in the CSV file https://github.com/jqnatividad/qsv/commit/63150a0a0d885f5bd5b118524d802ff59b18f621
- input: added --comment option which allows the user to specify the comment character. CSV rows that start with the comment character are skipped. https://github.com/jqnatividad/qsv/pull/866
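The --comment behavior (rows starting with the comment character are skipped) can be sketched with a hypothetical line-based helper. It is a simplification that ignores CSV quoting; the Rust csv crate exposes comparable record-level behavior through `ReaderBuilder::comment`.

```rust
/// Returns the lines of `input` that do not start with `comment` (hypothetical helper).
/// Line-based simplification: it ignores CSV quoting, so a quoted field that
/// spans several lines is not handled.
fn skip_comment_rows(input: &str, comment: char) -> Vec<&str> {
    input
        .lines()
        .filter(|line| !line.starts_with(comment))
        .collect()
}
```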
Changed
- luau: added additional logging messages to help with script debugging https://github.com/jqnatividad/qsv/commit/bcff8adc03ad398829f4874e948f5152bca04783
- schema & tojsonl: refactor stdin handling https://github.com/jqnatividad/qsv/commit/6c923b19bfa3fbed918335b70b793a6d6011a960
- bump jsonschema from 0.16 to 0.17
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-17
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.93.1...0.94.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.93.1
Fixed
- Fixed publishing workflow so qsvdp luau is only enabled on platforms that support it
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.93.0...0.93.1
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.93.0
Added
- luau: qsv_register_lookup helper function now works with CSVs on URLs https://github.com/jqnatividad/qsv/pull/860
- luau: added support for "dathere://" lookup scheme, allowing users to conveniently load oft-used lookup tables from https://github.com/dathere/qsv-lookup-tables https://github.com/jqnatividad/qsv/pull/861
- luau: added detailed API definitions for Luau Helper Functions https://github.com/jqnatividad/qsv/blob/605b38b5636382d45f96d3d9d3c404bb20efaf15/src/cmd/luau.rs#L1156-L1497
- validate: added --timeout option when downloading JSON Schemas https://github.com/jqnatividad/qsv/commit/605b38b5636382d45f96d3d9d3c404bb20efaf15
Changed
- remove all glob imports https://github.com/jqnatividad/qsv/pull/857 and https://github.com/jqnatividad/qsv/pull/858
- qsvdp (Datapusher+-optimized qsv binary variant) now has an embedded luau interpreter https://github.com/jqnatividad/qsv/pull/859
- validate: JSON Schema URL is now case-insensitive https://github.com/jqnatividad/qsv/commit/3123dc6da30370cae88c9e4bb9d387fed3d36507
- Bump serde from 1.0.155 to 1.0.156 by @dependabot in https://github.com/jqnatividad/qsv/pull/862
- applied select clippy lint recommendations
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-03-14
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.92.0
Added
- excel: added option to specify range to extract by @EricSoroos in https://github.com/jqnatividad/qsv/pull/843
- luau: added --remap option. This allows the user to only map specified columns to the output CSV https://github.com/jqnatividad/qsv/pull/841
- luau: added several new helper functions:
  - qsv_skip: skips writing the current record to the output CSV https://github.com/jqnatividad/qsv/pull/854
  - qsv_break: stops processing the current CSV file https://github.com/jqnatividad/qsv/pull/846
  - qsv_insertrecord: inserts a new record into the output CSV https://github.com/jqnatividad/qsv/pull/845
  - qsv_register_lookup: loads a CSV that can be used as a lookup table in Luau https://github.com/jqnatividad/qsv/commit/38e7b7eb264d4b43b7f3039696ad918238f0a4c6
Changed
- luau: reorganized code for readability/maintainability https://github.com/jqnatividad/qsv/pull/846
- foreach: tweak usage text to say it works with shell commands, not just the bash shell https://github.com/jqnatividad/qsv/commit/78851b33e8482c1961e97c17c95ea316950355fd
- split: added deeplink to examples/tests https://github.com/jqnatividad/qsv/commit/6f293b853b74505b7769e2972e7bc358506db34e
- select: added deeplink to examples/tests https://github.com/jqnatividad/qsv/commit/72fa0942c5954b48236b6d137a8347e89e2f097c
- Switch to qsv_docopt, a qsv-optimized fork of docopt.rs, as docopt.rs is unmaintained and docopt parsing is integral to qsv: we embed each command's usage text in a way that cannot be done with either clap or structopt https://github.com/jqnatividad/qsv/pull/852
- Bump embedded Luau from 0.566 to 0.567 https://github.com/jqnatividad/qsv/commit/d624e840802b51aae59cf5db0923f8f2605426c5
- Bump csv from 1.2.0 to 1.2.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/839
- Bump serde from 1.0.152 to 1.0.153 by @dependabot in https://github.com/jqnatividad/qsv/pull/842
- Bump serde from 1.0.153 to 1.0.154 by @dependabot in https://github.com/jqnatividad/qsv/pull/844
- Bump rust_decimal from 1.28.1 to 1.29.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/853
- start using new crates.io sparse protocol
- applied select clippy lint recommendations
- cargo update bump several other dependencies
- pin Rust nightly to 2023-03-12
Fixed
- stats: fix stdin regression https://github.com/jqnatividad/qsv/pull/851
- excel: fix missing integer headers in excel transform by @EricSoroos in https://github.com/jqnatividad/qsv/pull/840
- luau: fix & improve comment remover https://github.com/jqnatividad/qsv/pull/845
New Contributors
- @EricSoroos made their first contribution in https://github.com/jqnatividad/qsv/pull/840
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.91.0...0.92.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.91.0
Added
- luau: map multiple new computed columns in one call https://github.com/jqnatividad/qsv/pull/829
- luau: added qsv_autoindex() helper function https://github.com/jqnatividad/qsv/pull/834
- luau: added qsv_coalesce() helper function https://github.com/jqnatividad/qsv/commit/3064ba2116ce5c96f3bd7e789616a3b0ffe9f63b
- luau: added _LASTROW special variable to facilitate random access mode
Changed
- diff: rename --primary-key-idx -> --key by @janriemer in https://github.com/jqnatividad/qsv/pull/826
- diff: implement option to sort by columns by @janriemer in https://github.com/jqnatividad/qsv/pull/827
- luau: parsing improvements https://github.com/jqnatividad/qsv/pull/835
- luau: bump embedded luau version from 0.562 to 0.566 https://github.com/jqnatividad/qsv/commit/f4a08b4980201015dcba31dfae74d8b1045c0455
- sniff: major refactoring. https://github.com/jqnatividad/qsv/pull/836
- enable polars nightly feature when building nightly https://github.com/jqnatividad/qsv/pull/816
- bump qsv-sniffer from 0.6.1 to 0.7.0 https://github.com/jqnatividad/qsv/commit/5027a64576f19792f917550f9146792d5c9e351a
- Bump crossbeam-channel from 0.5.6 to 0.5.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/818
- Bump flexi_logger from 0.25.1 to 0.25.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/824
- Bump rayon from 1.6.1 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/831
- Bump ryu from 1.0.12 to 1.0.13 by @dependabot in https://github.com/jqnatividad/qsv/pull/830
- Bump itoa from 1.0.5 to 1.0.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/832
- cargo update bump dependencies
- pin Rust nightly to 2023-03-04
Fixed
- stats: use utf8-aware truncate https://github.com/jqnatividad/qsv/pull/819
- sniff: fix URL sniffing https://github.com/jqnatividad/qsv/commit/8d2c514fa2a173be626b5c36dbfb70d60335b81e
- show polars version in qsv --version https://github.com/jqnatividad/qsv/commit/586a1ed987fa2efbfbc233bd82f84a52fa4c3859
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.90.1...0.91.0
- Rust
Published by jqnatividad almost 3 years ago
https://github.com/dathere/qsv - 0.90.1
Changed
- joinp: Refactor to use LazyFrames instead of DataFrames for performance and the ability to do streaming and process files larger than RAM. https://github.com/jqnatividad/qsv/pull/814 and https://github.com/jqnatividad/qsv/pull/815
- luau: expanded example using qsv_log helper https://github.com/jqnatividad/qsv/commit/5c198e4bcb243005dace25d8aecbc58bb211cadc
- handled new clippy lints https://github.com/jqnatividad/qsv/commit/e81a391bd675a2f4fb07169c1d6848340104b9fe
- adjust publishing workflows to build binaries with as many features enabled as possible. On some platforms, the to and polars (for joinp) features cannot be built.
- cargo update bump indirect dependencies, notably arrow and duckdb
- pin Rust nightly to 2023-02-27
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.90.0...0.90.1
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.90.0
Added
- joinp: new join command powered by Pola.rs. This is just the first of more commands that will leverage the Pola.rs engine. https://github.com/jqnatividad/qsv/pull/798
- luau: added random access mode; major refactor as we prepare to use luau as qsv's DSL; added qsv_log helper that can be called from Luau scripts to facilitate development of full-fledged data-wrangling scripts. https://github.com/jqnatividad/qsv/pull/805 and https://github.com/jqnatividad/qsv/pull/806
- sniff: added URL & re-enabled stdin support; URL support samples only the number of rows required to sniff the metadata, without downloading the entire file; expanded sniff metadata returned; added --progressbar option for URL sniffing https://github.com/jqnatividad/qsv/pull/812
- sniff: added --timeout option for URL inputs; now runs async in all the binary variants https://github.com/jqnatividad/qsv/pull/813
Changed
- diff: sort by line when no other sort option is given by @janriemer in https://github.com/jqnatividad/qsv/pull/808
- luau: rename --prologue/--epilogue options to --begin/--end; add embedded BEGIN/END block handling https://github.com/jqnatividad/qsv/pull/801
- Update to csvs_convert 0.8 by @kindly in https://github.com/jqnatividad/qsv/pull/800
- use simdutf8 when possible https://github.com/jqnatividad/qsv/commit/ae466cbffbc924cc5c1cc09509dd963c56dfc259
- Bump self_update from 0.35.0 to 0.36.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/797
- Bump sysinfo from 0.28.0 to 0.28.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/809
- Bump actix-web from 4.3.0 to 4.3.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/811
- improved conditional compilation of different variants https://github.com/jqnatividad/qsv/commit/9e636946504a09a1edeea4b0533d42a0bb658b7f
- temporarily skip CI tests that use httpbin.org as it was causing intermittent failures https://github.com/jqnatividad/qsv/commit/bee160228794c26326baf569e5e7239206ae4314
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-26
Removed
- Python 3.6 support removed https://github.com/jqnatividad/qsv/commit/86b29d487261fda7670072bfd5977dd9508ac0aa
Fixed
- sniff: disable stdin input, as sniff does not work with stdin; fixes #803 https://github.com/jqnatividad/qsv/pull/807
Note that stdin support was re-enabled shortly afterwards in https://github.com/jqnatividad/qsv/pull/812
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.89.0...0.90.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.89.0
Added
- cat: added new rowskey subcommand. Unlike the existing rows subcommand, it allows far more flexible concatenation of CSV files by row, even if the files have a different number of columns or a different column order. https://github.com/jqnatividad/qsv/pull/795
- added jemalloc support, as the current default mimalloc allocator is not supported on some platforms. Also, for certain workloads, jemalloc may be faster. See Memory Allocator for more info https://github.com/jqnatividad/qsv/pull/796
- added --no-memcheck and related QSV_NO_MEMORY_CHECK env var. This relaxes the conservative Out-of-Memory prevention heuristic of qsv. See Memory Management for more info https://github.com/jqnatividad/qsv/pull/792
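The difference between cat rows and cat rowskey can be pictured as aligning rows by column name rather than by position. The Python sketch below illustrates this under stated assumptions and is not qsv's implementation. It builds the union of all headers in first-seen order and fills any column a file lacks with an empty string. The function name cat_rowskey is hypothetical.

```python
import csv
import io


def cat_rowskey(*csv_texts: str) -> list:
    """Concatenate CSVs by row, matching columns by header name (sketch)."""
    tables = [csv.DictReader(io.StringIO(text)) for text in csv_texts]
    # Union of all headers in first-seen order (dicts keep insertion order).
    header = list(dict.fromkeys(col for t in tables for col in (t.fieldnames or [])))
    out = [header]
    for t in tables:
        for row in t:
            # Columns missing from this file (or from a short row) become "".
            out.append([row.get(col) or "" for col in header])
    return out


print(cat_rowskey("id,name\n1,ann\n", "name,id,city\nbob,2,Rome\n"))
# [['id', 'name', 'city'], ['1', 'ann', ''], ['2', 'bob', 'Rome']]
```

By contrast, a positional concatenation would put "bob" under id, because the second file lists name first.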
Changed
- --version now returns "non-streaming" mode max input file size and detailed memory info. See Version details for more info https://github.com/jqnatividad/qsv/pull/780
- exclude: expanded usage text and added 'input parameters' help by @tmtmtmtm in https://github.com/jqnatividad/qsv/pull/783
- stats: performance tweaks in https://github.com/jqnatividad/qsv/commit/96e8168e6064469ab4489ed19c36aa595d5d119d, https://github.com/jqnatividad/qsv/commit/634d42a646dfb3bed2d34842bb3fa484cf641c7e and https://github.com/jqnatividad/qsv/commit/7e148cf78753aa60ef60f8efd6f1c7fea246b703
- Use simdutf8 to do SIMD-accelerated utf8 validation, replacing problematic utf8 screening. Together with https://github.com/jqnatividad/qsv/pull/782, completes utf8 validation revamp. https://github.com/jqnatividad/qsv/pull/784
- Bump sysinfo from 0.27.7 to 0.28.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/786
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-18
Removed
- Removed patched versions of csv crate optimized for performance. With the release of csv 1.2, switched back to csv crate upstream. https://github.com/jqnatividad/qsv/pull/794
- removed utf8 first 8k screening. It was increasing code complexity and not very reliable. https://github.com/jqnatividad/qsv/pull/782
Fixed
- dedup: refactored to use iterators to avoid out of bounds errors. https://github.com/jqnatividad/qsv/commit/f5e547b68410407851f217c706ad303bdbc5a583
- exclude: don't screen for utf8. This bugfix spurred the utf8 validation revamp, when I realized I just needed to pull out utf8 screening https://github.com/jqnatividad/qsv/pull/781
- py: col, not row https://github.com/jqnatividad/qsv/pull/793
New Contributors
- @tmtmtmtm made their first contribution in https://github.com/jqnatividad/qsv/pull/783
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.88.2...0.89.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.88.2
Changed
- also show --update and --updatenow errors on stderr in addition to log file https://github.com/jqnatividad/qsv/pull/770
- sortcheck: when a file is not sorted, dupecount is invalid. Set dupecount to -1 to make it plainly evident when file is not sorted. https://github.com/jqnatividad/qsv/pull/771
- excel: added --quiet option https://github.com/jqnatividad/qsv/commit/99d88499df573f9f46992346f394d9372ceeffcc
- extdedup: minimize allocations in hot loop https://github.com/jqnatividad/qsv/commit/62096fa84505b6de2c108d1f07707008e1c2d170
- improved memfilecheck OOM-prevention helper function. Better error messages; clamp free memory headroom percentage between 10 and 90 percent https://github.com/jqnatividad/qsv/commit/6701ebfae58e942117378996ec6679544f620cbf and https://github.com/jqnatividad/qsv/commit/5cd8a95e7b36819f75f0d3bb8172dcff601b649b
- improved utf8 check error messages to give more detail, and not just say there is an encoding error https://github.com/jqnatividad/qsv/commit/c9b5b075d31b9639958193db919683475c3e3ba5
- improved README, adding Regular Expression Syntax section; reordered sections
- modified CI workflows to also check qsvlite
- Bump once_cell from 1.17.0 to 1.17.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/775
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-15
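The memfilecheck headroom clamp can be sketched as follows. This Python version is illustrative only and rests on three assumptions:

- the headroom is a percentage of free memory;
- an unset or unparseable QSV_FREEMEMORY_HEADROOM_PCT falls back to the documented default of 20;
- the helper names headroom_pct and exceeds_memory_budget are hypothetical.

qsv's actual check lives in its Rust source.

```python
import os

DEFAULT_HEADROOM_PCT = 20  # documented default headroom


def headroom_pct() -> int:
    """Read QSV_FREEMEMORY_HEADROOM_PCT and clamp it to [10, 90]."""
    try:
        pct = int(os.environ.get("QSV_FREEMEMORY_HEADROOM_PCT", ""))
    except ValueError:
        pct = DEFAULT_HEADROOM_PCT  # unset or not an integer (assumption)
    return max(10, min(90, pct))


def exceeds_memory_budget(file_size: int, free_memory: int, pct: int) -> bool:
    """True if a non-streaming command should stop before loading the file."""
    usable = free_memory * (100 - pct) // 100
    return file_size > usable


print(exceeds_memory_budget(900, 1000, 20))  # True: only 800 units are usable
```

Clamping keeps a misconfigured value from either disabling the safety margin (below 10) or refusing nearly every file (above 90).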
Fixed
- dedup: removed unnecessary utf8 check; improve input usage text https://github.com/jqnatividad/qsv/pull/773
- dedup: fix unstable dedup results caused by using par_sort_unstable_by https://github.com/jqnatividad/qsv/pull/776
- sort: fix unstable sort results caused by using par_sort_unstable_by https://github.com/jqnatividad/qsv/commit/9f01df41a77dece75e434ee24b3ea0178d58deaf
- removed mispublished 0.88.1 release
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.88.0...0.88.2
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.88.0
Added
- extdedup: new command to deduplicate arbitrarily large CSV/text files using a memory-buffered, on-disk hash table. Not only does it dedup very large files using constant memory, it does so while retaining the file's original sort order, unlike dedup, which loads the entire file into memory to sort it first before deduping by comparing neighboring rows https://github.com/jqnatividad/qsv/pull/762
- Added Out-of-Memory (OOM) handling for "non-streaming" commands (i.e. commands that load the entire file into memory) using a heuristic: if an input file's size is greater than the free memory available minus a default headroom of 20 percent, qsv processing stops gracefully with a detailed message about the potential OOM condition. This headroom can be adjusted using the QSV_FREEMEMORY_HEADROOM_PCT environment variable, which has a minimum value of 10 percent https://github.com/jqnatividad/qsv/pull/767
- add -Q, --quiet option to all commands that return counts to stderr (dedup, extdedup, search, searchset and replace) in https://github.com/jqnatividad/qsv/pull/768
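The key property of extdedup can be shown with a small in-memory Python sketch: it keeps the first occurrence of each row while preserving the original order. This is a simplification. A Python set stands in for qsv's memory-buffered, on-disk hash table, so unlike extdedup its memory use grows with the number of unique rows.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_preserving_order(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in original order (first occurrence wins)."""
    seen = set()
    for line in lines:
        # Store a fixed-size 128-bit digest instead of the full line.
        key = hashlib.blake2b(line.encode("utf-8"), digest_size=16).digest()
        if key not in seen:
            seen.add(key)
            yield line


rows = ["b,2", "a,1", "b,2", "c,3", "a,1"]
print(list(dedup_preserving_order(rows)))  # ['b,2', 'a,1', 'c,3']
print(sorted(set(rows)))                   # sort-based order: ['a,1', 'b,2', 'c,3']
```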
Changed
- sort & sortcheck: separate test suites and link from usage text https://github.com/jqnatividad/qsv/pull/756
- frequency: amortize allocations, preallocate with_capacity. Informal benchmarking shows an improvement of ~30%! 🚀 https://github.com/jqnatividad/qsv/pull/761
- extsort: refactor. Aligned options with extdedup; now also supports stdin/stdout; added --memory-limit option https://github.com/jqnatividad/qsv/pull/763
- safenames: minor optimization https://github.com/jqnatividad/qsv/commit/a7df378e0a755300e541dec0fef0b12d39b215f2
- excel: minor optimization https://github.com/jqnatividad/qsv/commit/75eac7875e276b45e668cbe91271ad86cec8db49
- stats: add date inferencing false positive warning, with a recommendation on how to prevent false positives https://github.com/jqnatividad/qsv/commit/a84a4e614b5c14dd2e0d523bec4c6d9dbeb7c3ba
- sortcheck: added note to usage text that dupe_count is only valid if file is sorted https://github.com/jqnatividad/qsv/commit/ab69f144fa2ac375255bf9fbd6dd08bf538c1dfa
- reorganized Installation section to differentiate installation options https://github.com/jqnatividad/qsv/commit/9ef8bfc0b90574b41629c7c7bd463289dc1dcb62
- bump MSRV to 1.67.1
- applied select clippy recommendations
- Bump flexi_logger from 0.25.0 to 0.25.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/755
- Bump pyo3 from 0.18.0 to 0.18.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/757
- Bump serde_json from 1.0.92 to 1.0.93 by @dependabot in https://github.com/jqnatividad/qsv/pull/760
- Bump filetime from 0.2.19 to 0.2.20 by @dependabot in https://github.com/jqnatividad/qsv/pull/759
- Bump self_update from 0.34.0 to 0.35.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/765
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-02-12
Fixed
- sortcheck: correct wrong progress message showing invalid dupe_count (as dupe count is only valid if the file is sorted) https://github.com/jqnatividad/qsv/commit/8eaa8240249c5c7eb1ece068764a8caa7e804414
- py & luau: correct usage text about stderr https://github.com/jqnatividad/qsv/commit/1b56e72988e2dee1502517f8e2dbf036416efb8d
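The reason dupe_count is only meaningful for sorted input is that duplicates are counted by comparing each row with its neighbor. That comparison finds every duplicate only when equal rows are adjacent. Below is a minimal Python sketch. It assumes string sort keys and the -1 convention that sortcheck later adopted for unsorted files, and the function itself is hypothetical.

```python
def sortcheck(keys: list) -> tuple:
    """Return (is_sorted, dupe_count); dupe_count is -1 when unsorted."""
    is_sorted = True
    dupe_count = 0
    for prev, cur in zip(keys, keys[1:]):
        if cur < prev:
            is_sorted = False
        elif cur == prev:
            dupe_count += 1  # only counts duplicates that happen to be adjacent
    return is_sorted, dupe_count if is_sorted else -1


print(sortcheck(["a", "a", "b", "c", "c"]))  # (True, 2)
print(sortcheck(["a", "c", "b", "a"]))       # (False, -1): the two "a" rows are never compared
```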
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.87.1...0.88.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.87.1
Changed
- safenames: refactor in https://github.com/jqnatividad/qsv/pull/754
  - better handling of headers that start with a digit: instead of replacing the digit with a _, prepend the unsafe prefix
  - quoted identifiers are also considered unsafe, unless conditional mode is used
  - verbose modes now also return a list of duplicate header names
- update MSRV to 1.67.0
- cargo update bump dependencies
- disable optimization on test profile for faster CI compilation, which was taking much longer than test run time
- optimize prebuilt nightlies to compile with target-cpu=native
- pin Rust nightly to 2023-02-01
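The digit-prefix and duplicate-reporting behavior listed above can be sketched in Python. Everything here is an assumption for illustration, not taken from qsv's source:

- the unsafe_ prefix value;
- the rule that any character other than an ASCII letter, digit or underscore becomes _;
- the _2, _3 suffixes for duplicates.

The sketch also ignores qsv's quoted-identifier and conditional-mode rules.

```python
import re

UNSAFE_PREFIX = "unsafe_"  # hypothetical prefix for illustration


def safe_name(header: str, prefix: str = UNSAFE_PREFIX) -> str:
    """Make one header a safe identifier (illustrative rules only)."""
    name = header.strip().strip('"').strip()
    name = re.sub(r"[^0-9A-Za-z_]", "_", name)
    if not name or name[0].isdigit():
        # Keep the leading digit and prepend the prefix instead of replacing it.
        name = prefix + name
    return name


def safe_headers(headers: list) -> tuple:
    """Return (safe names, duplicate names found), de-duplicating with suffixes."""
    used, out, dupes = set(), [], []
    for header in headers:
        base = safe_name(header)
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        if candidate != base:
            dupes.append(base)
        used.add(candidate)
        out.append(candidate)
    return out, dupes


print(safe_headers(["1st col", ' "Name" ', "Name"]))
# (['unsafe_1st_col', 'Name', 'Name_2'], ['Name'])
```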
Fixed
- safenames: fixed mode behavior inconsistencies https://github.com/jqnatividad/qsv/pull/754. All modes now use the same safenames algorithm. Before, the verbose modes used a simpler one, leading to inconsistencies between modes (resolves "safenames handling inconsistent between modes" #753)
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.87.0...0.87.1
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.87.0
Added
- apply: add decimal separator --replacement option to thousands operation. This fully rounds out thousands formatting, as it allows formatting numbers in "euro-style" formats (e.g. 1.234.567,89 instead of 1,234,567.89) https://github.com/jqnatividad/qsv/pull/749
- apply: add round operation; also refactored thousands operation to use the more appropriate --formatstr option instead of the --comparand option to specify the thousands separator policy https://github.com/jqnatividad/qsv/pull/751
- applydp: add round operation https://github.com/jqnatividad/qsv/pull/752
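To see what the decimal-separator --replacement enables, here is a Python sketch that produces the "euro-style" output format. It first formats with a comma thousands separator and a period decimal point, then swaps the two characters. It uses only the standard decimal module and the format-spec mini-language. It illustrates the output format, not how qsv's apply implements it, and the function name is hypothetical.

```python
from decimal import Decimal


def euro_style(value: str, places: int = 2) -> str:
    """Format a numeric string as 1.234.567,89 (sketch)."""
    us_style = f"{Decimal(value):,.{places}f}"  # e.g. "1,234,567.89"
    # Swap both separators in one pass so they don't clobber each other.
    return us_style.translate(str.maketrans({",": ".", ".": ","}))


print(euro_style("1234567.89"))  # 1.234.567,89
```

Using Decimal rather than float avoids binary rounding surprises when the input string already carries exact decimal digits.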
Changed
- changed MSRV policy to track latest Rust version in Homebrew, instead of latest Rust stable
- removed excess trailing whitespace in apply & applydp usage text
- moved round_num function from stats.rs to util.rs so it can be used in round operation in apply and applydp
- cargo update bump dependencies, notably tokio from 1.24.2 to 1.25.0
- pin Rust nightly to 2023-01-28
Fixed
- apply: corrected thousands operation usage text: hexfour, not hex_four https://github.com/jqnatividad/qsv/commit/6545aa2b3ce470b5f6c039c998e9f6fc21a6ad84
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.86.0...0.87.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.86.0
Added
- apply: added thousands operation, which adds thousands separators to numeric values. Specify the separator policy with --comparand (default: comma). The valid policies are: comma, dot, space, underscore, hexfour (place a space every four hex digits) and indiancomma (place a comma every two digits, except the last three digits). https://github.com/jqnatividad/qsv/pull/748
- searchset: added --unmatched-output option. This allows Datapusher+ to screen for PII more efficiently, writing PII candidate records to one CSV file and the "clean" records to another in just one pass. https://github.com/jqnatividad/qsv/pull/742
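The grouping rules behind the thousands separator policies can be sketched in Python for the integer part of a number. This illustration assumes digits only, with no sign or decimal part, and uses hypothetical function names; it is not qsv's implementation. It shows how indiancomma differs from the three-digit policies and how hexfour groups by four.

```python
SEPARATORS = {"comma": ",", "dot": ".", "space": " ", "underscore": "_"}


def chunks_from_right(s: str, size: int) -> list:
    """Split s into groups of `size` characters, counting from the right."""
    head = len(s) % size or size
    return [s[:head]] + [s[i:i + size] for i in range(head, len(s), size)]


def group_integer(digits: str, policy: str = "comma") -> str:
    if policy == "hexfour":
        return " ".join(chunks_from_right(digits, 4))
    if policy == "indiancomma":
        if len(digits) <= 3:
            return digits
        # The last three digits form one group; the rest are grouped in pairs.
        return ",".join(chunks_from_right(digits[:-3], 2) + [digits[-3:]])
    return SEPARATORS[policy].join(chunks_from_right(digits, 3))


print(group_integer("123456789"))                 # 123,456,789
print(group_integer("123456789", "indiancomma"))  # 12,34,56,789
print(group_integer("deadbeef", "hexfour"))       # dead beef
```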
Changed
- fetch & fetchpost: expanded usage text info on HTTP2 Adaptive Flow Control support
- fetchpost: added more detail about --compress option
- stats: added more tests
- updated prebuilt zip archive READMEs https://github.com/jqnatividad/qsv/commit/072973efd7947a93773b2783d098eeace17d963d
- Bump redis from 0.22.2 to 0.22.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/741
- Bump ahash from 0.8.2 to 0.8.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/743
- Bump jql from 5.1.4 to 5.1.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/747
- applied select clippy recommendations
- cargo update bump several indirect dependencies
- pin Rust nightly to 2023-01-27
Fixed
- stats: fixed antimodes null display. Use the literal NULL instead of just "" when listing NULL as an antimode. https://github.com/jqnatividad/qsv/pull/745
- tojsonl: fixed invalid escaping of JSON values https://github.com/jqnatividad/qsv/pull/746
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.85.0...0.86.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.85.0
Added
- Update csvs_convert by @kindly in https://github.com/jqnatividad/qsv/pull/736
- sniff: added --delimiter option https://github.com/jqnatividad/qsv/pull/732
- fetchpost: add --compress option in https://github.com/jqnatividad/qsv/pull/737
- searchset: several tweaks for the PII screening requirement of Datapusher+. --flag option now shows regex labels instead of just row number; new --flag-matches-only option sends only matching rows to output when used with --flag; --json option returns rows_with_matches, total_matches and rowcount as JSON to stderr. https://github.com/jqnatividad/qsv/pull/738
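The one-pass split behind --unmatched-output can be sketched in Python, together with the --flag labels and --json counts. All of the following are assumptions:

- a regex's label is its 1-based position in the pattern list;
- a row's flag column joins the labels of every regex it matches;
- total_matches counts (row, regex) hits;
- the function name is hypothetical.

qsv's searchset uses Rust's RegexSet and may count differently.

```python
import re


def searchset_split(rows: list, patterns: list) -> tuple:
    """Split rows into (matched, unmatched, stats) in a single pass (sketch)."""
    regexes = [re.compile(p) for p in patterns]
    matched, unmatched, total_matches = [], [], 0
    for row in rows:
        labels = [str(i) for i, rx in enumerate(regexes, start=1)
                  if any(rx.search(field) for field in row)]
        if labels:
            total_matches += len(labels)
            matched.append(row + [";".join(labels)])  # flag column with labels
        else:
            unmatched.append(row)  # the "clean" records
    stats = {"rows_with_matches": len(matched),
             "total_matches": total_matches,
             "rowcount": len(rows)}
    return matched, unmatched, stats


rows = [["alice", "alice@example.com"], ["bob", "n/a"], ["carol", "555-1234"]]
matched, clean, stats = searchset_split(rows, [r"@", r"\d{3}-\d{4}"])
```

Because each row is tested once and routed to exactly one list, both outputs come from a single read of the input.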
Changed
- luau: minor tweaks to increase code readability https://github.com/jqnatividad/qsv/commit/31d01c8b9eb1fe85262e9bf5fd237ae4493d562c
- stats: now normalizes after rounding. Normalizing strips trailing zeroes and converts -0.0 to 0.0. https://github.com/jqnatividad/qsv/commit/f838272b4deb79d25ca5704cf3c89652c0b9a3bb
- safenames: mention CKAN-specific options https://github.com/jqnatividad/qsv/commit/f371ac25ba0c27e48b7b9b14a37dc47913cf0095
- fetch & fetchpost: document decompression priority https://github.com/jqnatividad/qsv/commit/43ce13c4bf7eb23dc5d051d522d6d52d3cc255aa
- Bump actix-governor from 0.3.2 to 0.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/728
- Bump sysinfo from 0.27.6 to 0.27.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/730
- Bump serial_test from 0.10.0 to 1.0.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/729
- Bump pyo3 from 0.17.3 to 0.18.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/731
- Bump reqwest from 0.11.13 to 0.11.14 by @dependabot in https://github.com/jqnatividad/qsv/pull/734
- cargo update bump for other dependencies
- pin Rust nightly to 2023-01-21
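What "normalizes after rounding" means can be shown with a small Python sketch. It rounds once, then strips trailing zeroes and turns a negative zero into zero. The helper is hypothetical and uses Python floats for brevity. The changelog describes only the behavior, not qsv's exact numeric types.

```python
def round_and_normalize(x: float, places: int = 4) -> str:
    """Round to `places` decimals, then normalize the text form (sketch)."""
    rounded = round(x, places)
    if rounded == 0:
        rounded = 0.0  # -0.0 == 0 is True, so this maps -0.0 to 0.0
    text = f"{rounded:.{places}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # strip trailing zeroes
    return text


print(round_and_normalize(-0.00001))  # 0 (not -0.0000)
print(round_and_normalize(2.5))       # 2.5
```

The order matters. Normalizing before rounding would leave 2.50001 unchanged, and rounding it afterwards would produce "2.5000" with its trailing zeroes intact.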
Fixed
- sniff: now checks that --sample size is greater than zero https://github.com/jqnatividad/qsv/commit/cd4c390ce4322d7076866be27025d67800bc60e2
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.84.0...0.85.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.84.0
Added
- headers: added --trim option to trim quotes and spaces from headers https://github.com/jqnatividad/qsv/pull/726
Changed
- input: --trim-headers option also removes excess quotes https://github.com/jqnatividad/qsv/pull/727
- safenames: trim quotes and spaces from headers https://github.com/jqnatividad/qsv/commit/0260833bc8b36ea6e6ccb9e79687c76470a8a6b0
- cargo update bump dependencies
- pin Rust nightly to 2023-01-13
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.83.0...0.84.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.83.0
Added
- stats: add sparsity to "streaming" statistics https://github.com/jqnatividad/qsv/pull/719
- schema: also infer enum constraints for integer fields. Not only is this good for validation, it is also required by tojsonl for smarter boolean inferencing https://github.com/jqnatividad/qsv/pull/721
Changed
- stats: change --typesonly so it will not automatically --infer-dates. Let the user decide. https://github.com/jqnatividad/qsv/pull/718
- stats: if median is already known, use it to calculate Median Absolute Deviation https://github.com/jqnatividad/qsv/commit/08ed08da4651a96bf05372b34b670063fbcec14f
- tojsonl: smarter boolean inferencing. It will infer a column as boolean if it only has a domain of two values, and the first characters of the two values match one of the following case-insensitive "truthy/falsy" combinations: t/f; t/null; 1/0; 1/null; y/n; y/null. These are treated as true/false. https://github.com/jqnatividad/qsv/pull/722 and https://github.com/jqnatividad/qsv/pull/723
- safenames: process --reserved option before --prefix option. https://github.com/jqnatividad/qsv/commit/b333549199726a3e92b95fb1d501fbdbbeede34a
- strum and strum-macros are no longer optional dependencies, as we use them with all the binary variants now https://github.com/jqnatividad/qsv/commit/bea6e00fc400e8fafa2938832f8654d97c45fe34
- Bump qsv-stats from 0.6.0 to 0.7.0
- Bump sysinfo from 0.27.3 to 0.27.6
- Bump hashbrown from 0.13.1 to 0.13.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/720
- Bump actions/setup-python from 4.4.0 to 4.5.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/724
- change MSRV from 1.66.0 to 1.66.1
- cargo update bump indirect dependencies
- pin Rust nightly to 2023-01-12
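The tojsonl boolean inferencing rule can be sketched in Python. This assumes values are compared after trimming and lowercasing, an empty string stands in for null, and the function returns a mapping from each value to True/False. The name infer_boolean is hypothetical, and this is not tojsonl's Rust code.

```python
from typing import Dict, List, Optional

# (truthy, falsy) first characters; "" stands for a null/empty value.
BOOLEAN_PAIRS = [("t", "f"), ("t", ""), ("1", "0"), ("1", ""), ("y", "n"), ("y", "")]


def infer_boolean(values: List[str]) -> Optional[Dict[str, bool]]:
    """Map a two-value column domain to True/False, or return None (sketch)."""
    domain = {v.strip().lower() for v in values}
    if len(domain) != 2:
        return None
    first_chars = {v[:1] for v in domain}  # "" for an empty value
    for truthy, falsy in BOOLEAN_PAIRS:
        if first_chars == {truthy, falsy}:
            return {v: v[:1] == truthy for v in domain}
    return None
```

For example, a column holding "True", "false" and "TRUE" maps to true/false. A column holding "1" and "2" is not inferred as boolean, because "2" is not a falsy first character.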
Fixed
- safenames: fixed --prefix option. When checking for an invalid underscore prefix, it was checking for a hyphen, not an underscore, causing a problem with Datapusher+ https://github.com/jqnatividad/qsv/commit/4fbbfd3a479b6678fa9d4c823fd00b592b326c7a
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.82.0...0.83.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.82.0
Added
- diff: Find the difference between two CSVs ludicrously fast! by @janriemer in https://github.com/jqnatividad/qsv/pull/711
- stats: added Median Absolute Deviation (MAD) https://github.com/jqnatividad/qsv/pull/715
- added Testing section to README https://github.com/jqnatividad/qsv/commit/517d69b496aaa9535a2b23b05e44a5999d8ef994
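Median Absolute Deviation is the median of each value's absolute distance from the median. Below is a short Python sketch using the standard statistics module. The optional known_median argument mirrors the later stats change (0.83.0) that reuses an already computed median instead of recomputing it. The function name is hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def mad(values: Sequence[float], known_median: Optional[float] = None) -> float:
    """Median Absolute Deviation: median(|x - median(values)|)."""
    m = median(values) if known_median is None else known_median
    return median(abs(x - m) for x in values)


data = [1, 1, 2, 2, 4, 6, 9]
print(mad(data))                  # 1 (median is 2; deviations 1,1,0,0,2,4,7)
print(mad(data, known_median=2))  # 1, without recomputing the median
```

Unlike the standard deviation, MAD is barely affected by the outlier 9, which is why it is a useful robust spread statistic.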
Changed
- validate: schema-less validation error improvements https://github.com/jqnatividad/qsv/pull/703
- stats: faster date inferencing https://github.com/jqnatividad/qsv/pull/706
- stats: minor performance tweaks https://github.com/jqnatividad/qsv/commit/15e6284c20cccf4a6b74498336d31b0d7ba03285 https://github.com/jqnatividad/qsv/commit/3f0ed2b314765a546e28b534d5e82bff892592c3
- stats: refactored modes compilation, so antimodes no longer unnecessarily compiles more than the 10 antimodes it shows anyway. https://github.com/jqnatividad/qsv/commit/6e448b041a2c78b3ce1cc89aadaff4a8d1081472
- stats: simplify if condition https://github.com/jqnatividad/qsv/commit/ae7cc85afe1dc4c3f87cbefe3b14dc93b28d94e9
- luau: show luau version when invoking --version https://github.com/jqnatividad/qsv/commit/f7f9c4297fb3dea685b5d0f631932b6b2ca4a99a
- excel: add "sheet" suffix to end msg for readability https://github.com/jqnatividad/qsv/commit/ae3a8e31784a24c8492de76c5074e477cc474063
- cache util::count_rows result, so if a CSV without an index is queried, it caches the result and future calls to count_rows in the same session will be instantaneous https://github.com/jqnatividad/qsv/commit/e805dedf5674cfbc56d9948791419ac6fd51f2fd
- Bump console from 0.15.3 to 0.15.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/704
- Bump cached from 0.41.0 to 0.42.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/709
- Bump mlua from 0.8.6 to 0.8.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/712
- Bump qsv-stats from 0.5.2 to 0.6.0 with the new MAD statistic support and faster, more memory-efficient antimodes compilation
- cargo update bump dependencies - notably mimalloc from 0.1.32 to 0.1.34, luau0-src from 0.4.1luau553 to 0.5.0luau555, csvs_convert from 0.7.9 to 0.7.11 and regex from 1.7.0 to 1.7.1
- pin Rust nightly to 2023-01-08
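The mode and antimode statistics, including the cap on how many antimodes are listed, can be sketched with collections.Counter. The sketch assumes:

- mode_count is the number of modes;
- mode_occurrences is how often each mode occurs;
- antimode_count still counts every antimode, even though only the first 10 (in first-seen order here) are kept.

The dictionary keys mirror qsv's stat names, but the function itself is hypothetical.

```python
from collections import Counter
from itertools import islice


def mode_stats(values: list, max_antimodes: int = 10) -> dict:
    counts = Counter(values)  # a dict subclass, so it keeps first-seen order
    if not counts:
        return {}
    top, low = max(counts.values()), min(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    # islice stops pulling from the generator after max_antimodes items,
    # so the full antimode list is never materialized.
    antimodes = list(islice((v for v, c in counts.items() if c == low), max_antimodes))
    return {
        "mode": modes,
        "mode_count": len(modes),
        "mode_occurrences": top,
        "antimode": antimodes,
        "antimode_count": sum(1 for c in counts.values() if c == low),
        "antimode_occurrences": low,
    }
```

For ["a", "b", "b", "c", "c", "d"], the modes are b and c (2 occurrences each) and the antimodes are a and d (1 occurrence each).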
Fixed
- tojsonl: fix escaping of unicode string. Replace hand-rolled escape fn with built-in escape_default fn https://github.com/jqnatividad/qsv/pull/707. Fixes https://github.com/jqnatividad/qsv/issues/705
- tojsonl: more robust boolean inferencing https://github.com/jqnatividad/qsv/pull/710. Fixes https://github.com/jqnatividad/qsv/issues/708
New Contributors
- @janriemer made their first contribution in https://github.com/jqnatividad/qsv/pull/711
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.81.0...0.82.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.81.0
[0.81.0] - 2023-01-02
Added
- stats: added range statistic https://github.com/jqnatividad/qsv/pull/691
- stats: added additional mode stats. For mode, added mode_count and mode_occurrences. Added "antimode" (opposite of mode: the least frequently occurring value with a non-zero count), antimode_count and antimode_occurrences. https://github.com/jqnatividad/qsv/pull/694
- qsv-dateparser now recognizes unix timestamp values with fractional seconds, to nanosecond precision, as dates. stats, sniff, apply datefmt and schema, which all use qsv-dateparser, now infer unix timestamps as dates - https://github.com/jqnatividad/qsv/commit/a29ff8ea255d5aed9992556a0a23ab76117c8340 https://github.com/jqnatividad/qsv/pull/702
> USAGE NOTE: As timestamps can be float or integer, and data type inferencing guesses dates last, first convert timestamp columns with apply datefmt to more date-like, non-timestamp formats so they are recognized as dates by other qsv commands.
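Recognizing a unix timestamp with fractional seconds down to nanoseconds can be sketched in Python. The integer and fractional parts are parsed as text, so no precision is lost to floating point. The sketch accepts only non-negative timestamps with at most nine fractional digits, and the helper name is hypothetical. Python's datetime only holds microseconds, so the full nanosecond fraction is returned separately.

```python
from datetime import datetime, timedelta, timezone


def parse_unix_timestamp(text: str) -> tuple:
    """Return (UTC datetime truncated to microseconds, nanosecond fraction)."""
    whole, _, frac = text.strip().partition(".")
    if not whole.isdigit() or (frac and not frac.isdigit()) or len(frac) > 9:
        raise ValueError(f"not a non-negative unix timestamp: {text!r}")
    nanos = int(frac.ljust(9, "0")) if frac else 0
    dt = datetime.fromtimestamp(int(whole), tz=timezone.utc)
    return dt + timedelta(microseconds=nanos // 1000), nanos


dt, nanos = parse_unix_timestamp("1672531200.123456789")
print(dt.isoformat(), nanos)  # 2023-01-01T00:00:00.123456+00:00 123456789
```

A bare integer such as 1672531200 would equally pass an integer type check. This is why the usage note recommends converting timestamp columns with apply datefmt before relying on date inference.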
Changed
- apply: document numtocurrency --comparand & --replacement behavior https://github.com/jqnatividad/qsv/commit/cc88fe921d8cdf7eedcb0008e16ebb5c46744f33
- index: explicitly flush buffer after creating index https://github.com/jqnatividad/qsv/commit/ee5d790af1cde73dfc57b028bf52fa88e83cdaa4
- sample: no longer requires an index to do percentage sampling https://github.com/jqnatividad/qsv/commit/45d4657713ebe2ae8388ce55f4cb1a733e727024
- slice: removed unneeded utf8 check https://github.com/jqnatividad/qsv/commit/5a199f4442bd025cec31309bee44ac71bacbdfaa
- schema: expand usage text regarding --strict-dates https://github.com/jqnatividad/qsv/commit/3d22829f3cf0441961e854555cd0c333bcb3ffb1
- stats: date stats refactor. Date stats are returned in rfc3339 format. Dates are converted to timestamps with millisecond precision while calculating date stats. https://github.com/jqnatividad/qsv/pull/690 https://github.com/jqnatividad/qsv/commit/e7c297795ff5e82cf1dc242090be11ecced6da9a
- filter out variance/stddev in tests as float precision issues are causing flaky CI tests https://github.com/jqnatividad/qsv/pull/696
- Bump qsv-dateparser from 0.4.4 to 0.6.0
- Bump qsv-stats from 0.4.6 to 0.5.2
- Bump qsv-sniffer from 0.5.0 to 0.6.0
- Bump serde from 1.0.151 to 1.0.152 by @dependabot in https://github.com/jqnatividad/qsv/pull/692
- Bump csvs_convert from 0.7.7 to 0.7.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/693
- Bump once_cell from 0.16.0 to 0.17.0 https://github.com/jqnatividad/qsv/commit/d3ac2556c74e2ddd66dcee00e5e836d284b662a7
- Bump self-update from 0.32.0 to 0.34.0 https://github.com/jqnatividad/qsv/commit/5f95933f01e2e0c592b52d7424b6a832aafd3591
- Bump cpc from 1.8 to 1.9; set csvs_convert dependency to minor version https://github.com/jqnatividad/qsv/commit/ee9164810559f5496dfafba0e789b9cd84000a17
- applied select clippy recommendations
- deeplink to Cookbook from Table of Contents
- pin Rust nightly to 2023-01-01
- implementation comments on stats, sample, sort & Python distribution
Fixed
- stats: prevent premature rounding, and make sum statistic use the same rounding method https://github.com/jqnatividad/qsv/commit/879214a1f3032f140f0207fe8807e1bb641110d7 https://github.com/jqnatividad/qsv/commit/1a1362031de8973b623598748bea4bc5fc6e08d3
- fix autoindex so we return the index path properly https://github.com/jqnatividad/qsv/commit/d3ce6a3918683d66bf0f3246c7d6e8518eead392
- fetch & fetchpost: corrected typo https://github.com/jqnatividad/qsv/commit/684036bbc237d5b80ea060f9ee8b8d46c1a2ad88
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.80.0...0.81.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.80.0
Added
- new to command. Converts CSVs "to" PostgreSQL, SQLite, XLSX, Parquet and Data Package by @kindly in https://github.com/jqnatividad/qsv/pull/656
- apply: add numtocurrency operation https://github.com/jqnatividad/qsv/pull/670
- sort: add --ignore-case option https://github.com/jqnatividad/qsv/pull/673
- stats: now computes summary statistics for dates as well https://github.com/jqnatividad/qsv/pull/684
- added --updatenow option, resolves https://github.com/jqnatividad/qsv/issues/661 https://github.com/jqnatividad/qsv/pull/662
- replace footnotes in Available Commands list with emojis :smile:
Changed
- apply & applydp: expose --batch size option https://github.com/jqnatividad/qsv/pull/679
- validate: add last valid row to validation error https://github.com/jqnatividad/qsv/commit/7680011a2fcc459aa621414122ecaa869e98ae83
- input: add last valid row to error message https://github.com/jqnatividad/qsv/commit/492e51f85ab5a0637c201d7020d7ac2fdb72be96
- upgrade to csvs-convert 0.7.5 by @kindly in https://github.com/jqnatividad/qsv/pull/668
- Bump serial_test from 0.9.0 to 0.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/671
- Bump csvs_convert from 0.7.5 to 0.7.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/674
- Bump num_cpus from 1.14.0 to 1.15.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/678
- Bump robinraju/release-downloader from 1.6 to 1.7 by @dependabot in https://github.com/jqnatividad/qsv/pull/677
- Bump actions/stale from 6 to 7 by @dependabot in https://github.com/jqnatividad/qsv/pull/676
- Bump actions/setup-python from 4.3.1 to 4.4.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/683
- added concurrency check to CI tests so that redundant CI tests are canceled when new ones are launched
- instead of saying "descriptive statistics", use more understandable "summary statistics"
- changed publishing workflows to enable the to feature for applicable target platforms
- cargo update bump dependencies, notably qsv-stats from 0.4.5 to 0.4.6 and qsv_currency from 0.5.0 to 0.6.0
- pin Rust nightly to 2022-12-22
Fixed
- stats: fix leading zero handling https://github.com/jqnatividad/qsv/pull/667
- apply: fix currencytonum bug https://github.com/jqnatividad/qsv/pull/669
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.79.0...0.80.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.79.0
Added
- safenames: add --reserved option, allowing user to specify additional "unsafe" names https://github.com/jqnatividad/qsv/pull/657
- safenames: add --prefix option https://github.com/jqnatividad/qsv/pull/658
- fetch & fetchpost: added simple retry backoff multiplier https://github.com/jqnatividad/qsv/commit/e343398ddd9c804237e73bbc652cc9e51c657b78
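The notes above only say fetch and fetchpost gained a "simple retry backoff multiplier". As a hedged illustration of what such a multiplier usually means, the std-only Rust sketch below grows the wait geometrically (base × multiplier^attempt) and caps it. The function names, parameters and cap are hypothetical and are not qsv's actual options or defaults; the sleep function is injected so the logic can be exercised without real waiting.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based),
/// computed as base * multiplier^attempt and capped at `max_delay`.
fn backoff_delay(base: Duration, multiplier: u32, attempt: u32, max_delay: Duration) -> Duration {
    // Saturating arithmetic keeps very large attempt counts from overflowing.
    base.saturating_mul(multiplier.saturating_pow(attempt)).min(max_delay)
}

/// Calls `op` until it succeeds or `max_retries` retries have failed,
/// sleeping via `sleep` between attempts.
fn retry_with_backoff<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    max_retries: u32,
    base: Duration,
    multiplier: u32,
    max_delay: Duration,
    sleep: impl Fn(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(backoff_delay(base, multiplier, attempt, max_delay));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    // A stand-in for a flaky HTTP request: fails twice, then succeeds.
    let result: Result<u32, &str> = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 { Err("transient error") } else { Ok(calls) }
        },
        5,
        Duration::from_millis(10),
        2,
        Duration::from_secs(1),
        thread::sleep,
    );
    println!("{result:?} after {calls} calls");
}
```

With a multiplier of 2 and a 10 ms base, the waits in this sketch are 10 ms, then 20 ms, and so on, which gives a briefly overloaded server progressively more room before each retry.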
Changed
- excel: refactored --metadata processing; added more debug messages; minor perf tweaks https://github.com/jqnatividad/qsv/commit/f137bab42f81518acd3ef825cd223b9970d70b02
- set MSRV to Rust 1.66
- cargo update bump several dependencies, notably qsv-dateparser
- pin Rust nightly to 2022-12-15
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.78.2...0.79.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.78.2
Changed
- cargo update bump paste 1.0.9 to 1.0.10
- pin Rust nightly to 2022-12-12
Removed
- excel: remove --safenames option. If you need safe names, use the safenames command https://github.com/jqnatividad/qsv/commit/e5da73bcc64ef3a8c66c611fd6247fa331117544
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.78.1...0.78.2
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.78.1
Changed
- qsvdp: apply is now available in qsvdp as applydp, with the geocode and calconv subcommands removed, along with all operations that require third-party crates EXCEPT dynfmt and datefmt, which are needed for Datapusher+ https://github.com/jqnatividad/qsv/pull/652
- excel: fine-tune --metadata processing https://github.com/jqnatividad/qsv/commit/09530d4f65b06060d24b7ed3948aeab25b2aa7c8
- bump serde from 1.0.149 to 1.0.150
- qsvdp is now included in CI tests
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.78.0...0.78.1
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.78.0
Added
- stats: added leading zero handling when inferring types (e.g. zipcodes like "07094" are strings, not integers) https://github.com/jqnatividad/qsv/pull/648
- stats: added --typesonly option, which infers only data types, with date inferencing enabled for all columns https://github.com/jqnatividad/qsv/pull/650
- stats: added underflow handling to sum statistic https://github.com/jqnatividad/qsv/commit/1b5e5451f929ad1c7dc5fb7f17b2a3261809ab05
- excel: expanded --metadata functionality, with the option to return workbook metadata as JSON as well https://github.com/jqnatividad/qsv/pull/651
- added platform-specific README for prebuilt zip archives https://github.com/jqnatividad/qsv/commit/15e247e523dbc22a50ebff1b15d7d0c4eb668bd5
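A minimal Rust sketch of the leading-zero rule described above: a value that would otherwise parse as a number but starts with a zero (like the zipcode "07094") is kept as a string. The enum, its variant names and the exact rule (ignore a leading minus sign; let "0" and "0.5" stay numeric) are assumptions for illustration, not qsv's stats code.

```rust
/// Hypothetical type labels for this sketch (not qsv's internal types).
#[derive(Debug, PartialEq)]
enum InferredType {
    Null,
    Integer,
    Float,
    String,
}

/// Infers a field's type, treating numbers with a leading zero (e.g. zipcodes) as strings.
fn infer_type(field: &str) -> InferredType {
    if field.is_empty() {
        return InferredType::Null;
    }
    // Look past an optional minus sign; "0" and "0.5" stay numeric, "07094" does not.
    let digits = field.strip_prefix('-').unwrap_or(field);
    if digits.len() > 1 && digits.starts_with('0') && !digits.starts_with("0.") {
        return InferredType::String;
    }
    if field.parse::<i64>().is_ok() {
        InferredType::Integer
    } else if field.parse::<f64>().is_ok() {
        InferredType::Float
    } else {
        InferredType::String
    }
}

fn main() {
    for field in ["07094", "7094", "0", "0.5", "", "N/A"] {
        println!("{field:?} -> {:?}", infer_type(field));
    }
}
```

Note that Rust's f64 parser also accepts strings such as "inf" and "NaN", which a production type-inference routine would likely want to treat as strings.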
Changed
- safenames: improved usage text
- stats: minor performance tweaks https://github.com/jqnatividad/qsv/commit/88be38b542fc61470a7b0331e7be3a3cad62a7bb and https://github.com/jqnatividad/qsv/commit/8aa58c5ad733116d246e171bcea622c1378b8e48
- join: minor performance tweaks https://github.com/jqnatividad/qsv/commit/92d41910077148f769ccf2c8a283be2c30d68bbf
- exclude: minor performance tweaks https://github.com/jqnatividad/qsv/commit/f3cc0ac29c5f3e6cec5a08d3aac3371d32b5eb0f
- sniff: minor performance tweak https://github.com/jqnatividad/qsv/commit/d2a4676fcb5189fc9232538e68854cfcf4ef808b
- sortcheck: minor performance tweak https://github.com/jqnatividad/qsv/commit/83c22ae5a623a8b0740f7024aac9448ee809eabd
- switch GitHub Actions to use ubuntu-20.04 so the linux-gnu prebuilts don't link against a glibc too new for older distros to run them
- switch GitHub Actions to use macos-12 to minimize flaky CI tests
- expanded qsvdp description in README
- Bump actions/setup-python from 4.3.0 to 4.3.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/645
- cargo update bump several indirect dependencies
- pin Rust nightly to 2022-12-10
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.77.0...0.78.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.77.0
Added
safenames: added Verbose JSON options https://github.com/jqnatividad/qsv/pull/644
Changed
- py & luau: improved usage text
- opt-in self-update in https://github.com/jqnatividad/qsv/pull/640 and https://github.com/jqnatividad/qsv/pull/641
- Create README in prebuilt zip archive with platform-specific notes (logic created, but not implemented until the next release) https://github.com/jqnatividad/qsv/pull/642
- Simplify python map_datetime test so it works on older Python versions https://github.com/jqnatividad/qsv/commit/e85e4e7bf9bf379f8478b066a9f6dea21afbf0e8
- include date.lua in qsv package so cargo install works https://github.com/jqnatividad/qsv/commit/11a0ff8edc5405afd9cc6637de026bf2138a7df0
- Bump data-encoding from 2.3.2 to 2.3.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/638
- cargo update bump several dependencies
- pin Rust nightly to 2022-12-07
Fixed:
- safenames: fixed calculation of unsafe headers, as it was double-counting some unsafe headers https://github.com/jqnatividad/qsv/pull/644
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.3...0.77.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.76.3
Changed
- cargo update bump serde from 1.0.148 to 1.0.149
- simplify python datetime test so it runs on Python 3.6 and above
Fixed
- reverted the not_luau_compatible feature introduced in 0.76.1 and tweaked in 0.76.2. Instead, adjusted the GitHub Actions publish workflow to properly build luau in qsvdp when the platform supports it.
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.2...0.76.3
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.76.2
Fixed
- tweak not_luau_compatible feature so we can more easily disable the luau feature when cross-compiling for platforms where luau cannot be properly built.
NOTE: Not published on crates.io due to problems creating prebuilt binaries
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.1...0.76.2
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.76.1
Fixed
- added not_luau_compatible feature so we can more easily disable the luau feature when cross-compiling for platforms where luau cannot be properly built.
NOTE: Not published on crates.io due to problems creating prebuilt binaries
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.76.0...0.76.1
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.76.0
Added
- qsvdp: add luau in anticipation of Datapusher+ optional preprocessing https://github.com/jqnatividad/qsv/pull/634
- luau: added ability to load libraries using "require"; preload LuaDate library https://github.com/jqnatividad/qsv/pull/633
- luau: added more extensive debug logging support, adding _idx to debug log messages; added trace log level support, showing global vars and record values when an error occurs https://github.com/jqnatividad/qsv/pull/636 and https://github.com/jqnatividad/qsv/pull/637
Changed
- py and luau: when errors are encountered, return a non-zero exit code, along with the error count to stderr https://github.com/jqnatividad/qsv/pull/631
- safenames and excel: unsafe empty column/header names are replaced with "_blank" instead of "_" https://github.com/jqnatividad/qsv/pull/632
- frequency: replace foreach iterator with regular for; remove unneeded assert https://github.com/jqnatividad/qsv/commit/74eb321defbf294675872a7dd891e8a7aedd31f1
- bumped qsv-stats from 0.4.1 to 0.4.5, fixing sum rounding and variance precision errors.
- cargo update bump several indirect dependencies
- pin Rust nightly to 2022-12-03
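As a hedged illustration of the "_blank" change above, the Rust sketch below sanitizes header names. Only the rule that an empty header becomes "_blank" comes from these release notes; the other rules (replace characters other than ASCII letters, digits and "_" with "_", and prefix "_" when a name starts with a digit) are assumptions chosen for illustration and are not necessarily what the safenames command does.

```rust
/// Hypothetical header sanitizer. Only the empty-name -> "_blank" rule is taken
/// from the release notes; the other rules are illustrative assumptions.
fn safe_name(header: &str) -> String {
    let trimmed = header.trim();
    if trimmed.is_empty() {
        return "_blank".to_string();
    }
    // Replace anything other than ASCII letters, digits and '_' with '_'.
    let mut name: String = trimmed
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    // Many databases reject unquoted identifiers that begin with a digit.
    if name.starts_with(|c: char| c.is_ascii_digit()) {
        name.insert(0, '_');
    }
    name
}

fn main() {
    let headers = ["first name", "", "2022 sales", "col_1"];
    let safe: Vec<String> = headers.iter().map(|h| safe_name(h)).collect();
    // A header is "unsafe" if sanitizing it would change it.
    let unsafe_count = headers.iter().filter(|h| safe_name(h) != **h).count();
    println!("{safe:?} ({unsafe_count} unsafe)");
}
```

Replacing an empty name with a readable placeholder like "_blank" rather than a bare "_" makes the generated column easier to spot and less likely to collide with other sanitized names.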
Fixed
- stats: fix sum rounding and variance precision errors https://github.com/jqnatividad/qsv/pull/635
NOTE: Not published on crates.io due to problems creating prebuilt binaries
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.75.0...0.76.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.75.0
Added:
- py: added the Python datetime module by default in https://github.com/jqnatividad/qsv/pull/629
- qsvdp (Datapusher+ optimized binary variant): added self-update. However, unlike the qsv and qsvlite binary variants, qsvdp will not automatically prompt for a self-update; it only informs the user when a new release is available. The user must invoke the --update option explicitly. https://github.com/jqnatividad/qsv/pull/622
Changed:
- stats: speed up type checking by @kindly in https://github.com/jqnatividad/qsv/pull/625
- validate: added a useful note about validate output by @aborruso in https://github.com/jqnatividad/qsv/pull/624
- luau: now precompiles all scripts, including the --prologue & --epilogue scripts, into bytecode https://github.com/jqnatividad/qsv/commit/e97c2caf81316bcf655875a9bee4c78dac5a8b70
- frequency: remove unsafe from_utf8_unchecked https://github.com/jqnatividad/qsv/commit/16642e8ee3364309c1a774142976f6207ba5c594
- more robust autoindexing in https://github.com/jqnatividad/qsv/pull/623
- minor clippy performance tweaks to rust-csv fork
- Bump serde from 1.0.147 to 1.0.148 by @dependabot in https://github.com/jqnatividad/qsv/pull/620
- cargo update bump several indirect dependencies
- improved README; use :sparkle: to indicate commands behind a feature flag
- pin Rust nightly to 2022-11-30
New Contributors
- @aborruso made their first contribution in https://github.com/jqnatividad/qsv/pull/624
- @kindly made their first contribution in https://github.com/jqnatividad/qsv/pull/625
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.74.0...0.75.0
- Rust
Published by jqnatividad about 3 years ago
https://github.com/dathere/qsv - 0.74.0
Added:
- safenames: added --verify and --verbose modes in https://github.com/jqnatividad/qsv/pull/610 and https://github.com/jqnatividad/qsv/pull/615
Changed:
- excel: align --safenames option to safenames command in https://github.com/jqnatividad/qsv/pull/611 and https://github.com/jqnatividad/qsv/pull/616
- luau: now precompiles main script to bytecode; now allows loading luau scripts from file for main, prologue and epilogue scripts in https://github.com/jqnatividad/qsv/pull/619
- sniff: increase default sample size from 100 to 1000 in https://github.com/jqnatividad/qsv/commit/40d52cf0c67e39d645a1c76a26ae234999317b0b
- validate: applied various optimizations in https://github.com/jqnatividad/qsv/commit/bfed127f28c4ccf6e9a18a5998588396594831d2 and https://github.com/jqnatividad/qsv/commit/06c109a0335326f57d903211334b4f2fb1ab7ccc
- updated GitHub Actions workflows to reflect removal of luajit feature
- Bump sysinfo from 0.26.7 to 0.26.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/614
- Bump rust_decimal from 1.26.1 to 1.27.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/617
- cargo update bump several indirect dependencies
- applied various clippy recommendations
- pin Rust nightly to 2022-11-25
Removed:
- luajit: removed, as it has been superseded by the optimized luau command, which now supports precompiling to bytecode, largely obviating LuaJIT's main feature (Just-in-Time compilation), in https://github.com/jqnatividad/qsv/pull/619
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.73.2...0.74.0
- Rust
Published by jqnatividad over 3 years ago
https://github.com/dathere/qsv - 0.73.2
Changed:
- Link to tests as examples from usage text in https://github.com/jqnatividad/qsv/pull/608
- Bump serde_json from 1.0.88 to 1.0.89 by @dependabot in https://github.com/jqnatividad/qsv/pull/607
- cargo update bump to get latest crossbeam crates to replace yanked crates https://github.com/jqnatividad/qsv/commit/5108a87b0f5e2d5a7cfef3f60f4cd6b3659bce7d
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.73.1...0.73.2
- Rust
Published by jqnatividad over 3 years ago
https://github.com/dathere/qsv - 0.73.1
Changed:
- rename safename command to safenames for consistency
- cargo update bump indirect dependencies
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.73.0...0.73.1
- Rust
Published by jqnatividad over 3 years ago
https://github.com/dathere/qsv - 0.73.0
Added
- safenames: new command to modify header names to db-safe names in https://github.com/jqnatividad/qsv/pull/606
- apply: added censor-count operation in https://github.com/jqnatividad/qsv/pull/599
- apply: added escape operation in https://github.com/jqnatividad/qsv/pull/600
- excel: added --safe-names option in https://github.com/jqnatividad/qsv/pull/598
Changed
- apply: refactored to use enums instead of strings for operations in https://github.com/jqnatividad/qsv/pull/601
- fetch & fetchpost: added -H shortcut for --http-header in https://github.com/jqnatividad/qsv/pull/596
- excel: smarter date parsing for XLSX files; rename --safe-column-names to --safe-names in https://github.com/jqnatividad/qsv/pull/603
- smarter safe names in https://github.com/jqnatividad/qsv/pull/605
- Bump uuid from 1.2.1 to 1.2.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/594
- Bump mimalloc from 0.1.31 to 0.1.32 by @dependabot in https://github.com/jqnatividad/qsv/pull/595
- Bump censor from 0.2.0 to 0.3.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/597
- Bump Swatinem/rust-cache from 1 to 2 by @dependabot in https://github.com/jqnatividad/qsv/pull/602
- cargo update bump several indirect dependencies
- pin Rust nightly to 2022-11-19
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.72.0...0.73.0
- Rust
Published by jqnatividad over 3 years ago
