xlsx-to-usv-rust-crate

Microsoft Excel (XLSX) to Unicode Separated Values (USV) Rust crate

https://github.com/sixarm/xlsx-to-usv-rust-crate

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.6%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: SixArm
  • License: other
  • Language: Rust
  • Default Branch: main
  • Size: 110 KB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Citation

README.md

xlsx-to-usv

Convert Microsoft Excel (XLSX) to Unicode Separated Values (USV). Built with the USV Rust crate.

Syntax:

    stdin | xlsx-to-usv [options] | stdout

Example:

    cat example.xlsx | xlsx-to-usv

More examples below.

Options

Options for USV separators and modifiers:

  • -u, --unit-separator : Set the unit separator (US) string.

  • -r, --record-separator : Set the record separator (RS) string.

  • -g, --group-separator : Set the group separator (GS) string.

  • -f, --file-separator : Set the file separator (FS) string.

  • -e, --escape : Set the escape (ESC) string.

  • -z, --end-of-transmission : Set the end of transmission (EOT) string.

Options for USV style:

  • --style-braces : Set the style to use braces, such as "{US}" for Unit Separator.

  • --style-controls : Set the style to use controls, such as "\u{001F}" for Unit Separator.

  • --style-symbols : Set the style to use symbols, such as "␟" for Unit Separator.
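The three styles above amount to a lookup from separator kind to marker string. The following is a minimal sketch of that lookup, not the crate's actual API: the `Style` and `Separator` enums and the `marker` function are hypothetical names, and the `{FS}` brace marker is assumed by analogy with `{US}`.

```rust
/// The three marker styles described above (hypothetical names).
#[derive(Clone, Copy)]
enum Style {
    Braces,
    Controls,
    Symbols,
}

/// The four separator kinds, from smallest scope to largest.
#[derive(Clone, Copy)]
enum Separator {
    Unit,
    Record,
    Group,
    File,
}

/// Return the marker string for a separator in the given style.
fn marker(sep: Separator, style: Style) -> &'static str {
    match (style, sep) {
        // Brace style: readable ASCII placeholders ({FS} is assumed by analogy).
        (Style::Braces, Separator::Unit) => "{US}",
        (Style::Braces, Separator::Record) => "{RS}",
        (Style::Braces, Separator::Group) => "{GS}",
        (Style::Braces, Separator::File) => "{FS}",
        // Control style: the ASCII control characters U+001C..U+001F.
        (Style::Controls, Separator::Unit) => "\u{001F}",
        (Style::Controls, Separator::Record) => "\u{001E}",
        (Style::Controls, Separator::Group) => "\u{001D}",
        (Style::Controls, Separator::File) => "\u{001C}",
        // Symbol style: the visible Control Pictures U+241C..U+241F.
        (Style::Symbols, Separator::Unit) => "␟",
        (Style::Symbols, Separator::Record) => "␞",
        (Style::Symbols, Separator::Group) => "␝",
        (Style::Symbols, Separator::File) => "␜",
    }
}

fn main() {
    for style in [Style::Braces, Style::Controls, Style::Symbols] {
        for sep in [Separator::Unit, Separator::Record, Separator::Group, Separator::File] {
            // escape_default makes the invisible control characters printable.
            println!("{}", marker(sep, style).escape_default());
        }
    }
}
```

The control style produces characters that most terminals do not display, which is why the sketch escapes them before printing.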

Options for USV layout:

  • --layout-0: Show each item with no line around it. This is no layout, in other words one long line.

  • --layout-1: Show each item with one line around it. This is like single-space lines for long form text.

  • --layout-2: Show each item with two lines around it. This is like double-space lines for long form text.

  • --layout-units: Show each unit on one line. This can be helpful for line-oriented tools.

  • --layout-records: Show each record on one line. This is like a typical spreadsheet export.

  • --layout-groups: Show each group on one line. This can be helpful for folio-oriented tools.

  • --layout-files: Show one file on one line. This can be helpful for archive-oriented tools.

Options for command line tools:

  • -h, --help : Print help

  • -V, --version : Print version

  • -v, --verbose... : Set the verbosity level: 0=none, 1=error, 2=warn, 3=info, 4=debug, 5=trace. Example: --verbose …

  • --test : Print test output for debugging, verifying, tracing, and the like. Example: --test
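The verbosity levels listed above can be read as a simple mapping from a number to a logging level. The sketch below assumes the number is how many times the flag is given, which the trailing `...` in the help text suggests but the README does not state. The `Level` enum and `level_from_count` function are hypothetical, and the handling of values above 5 is an assumption.

```rust
/// Hypothetical stand-in for a logging level; a real tool would more
/// likely use the level type of its logging library.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Level {
    None,
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Map a verbosity number to a level: 0=none, 1=error, 2=warn,
/// 3=info, 4=debug, 5=trace.
fn level_from_count(count: u8) -> Level {
    match count {
        0 => Level::None,
        1 => Level::Error,
        2 => Level::Warn,
        3 => Level::Info,
        4 => Level::Debug,
        // Assumption: anything above 5 saturates at the most verbose level.
        _ => Level::Trace,
    }
}

fn main() {
    for count in 0..=6u8 {
        println!("{} -> {:?}", count, level_from_count(count));
    }
}
```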

Install

Install:

    cargo install xlsx-to-usv

Link: https://crates.io/crates/xlsx-to-usv

Example

Excel and USV have similar data concepts:

| Excel     | USV    |
|-----------|--------|
| Workbook  | File   |
| Worksheet | Group  |
| Row       | Record |
| Cell      | Unit   |
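The mapping in the table above can be modeled as nested lists: a file holds groups, a group holds records, and a record holds units. The following is a minimal sketch under that assumption, not the crate's implementation. The `render` function is a hypothetical name, it uses the symbol style only, and it puts each record on its own line.

```rust
/// Render nested data as USV text with symbol-style markers.
/// Outer Vec = groups (worksheets), middle Vec = records (rows),
/// inner Vec = units (cells).
fn render(file: &[Vec<Vec<&str>>]) -> String {
    let mut out = String::new();
    for group in file {
        for record in group {
            for unit in record {
                // Each unit is followed by a unit separator.
                out.push_str(unit);
                out.push('␟');
            }
            // Each record ends with a record separator and a newline.
            out.push('␞');
            out.push('\n');
        }
        // Each group ends with a group separator on its own line.
        out.push('␝');
        out.push('\n');
    }
    out
}

fn main() {
    // Two worksheets with two rows of two cells each.
    let workbook = vec![
        vec![vec!["a", "b"], vec!["c", "d"]],
        vec![vec!["e", "f"], vec!["g", "h"]],
    ];
    print!("{}", render(&workbook));
}
```

This sketch does no escaping, so it is only suitable for units that do not themselves contain separator characters.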

Suppose file example.xlsx contains this kind of data:

```xlsx
Worksheet 1
a,b
c,d

Worksheet 2
e,f
g,h
```

Run:

    cat example.xlsx | xlsx-to-usv

Output:

    Worksheet 1␟␞
    a␟b␟␞
    c␟d␟␞
    ␝
    Worksheet 2␟␞
    e␟f␟␞
    g␟h␟␞
    ␝

If you prefer ASCII Separated Values (ASV) with zero-width character controls:

Run:

    cat example.xlsx | xlsx-to-usv --style-controls

Output:

    Worksheet 1\u{001F}\u{001E}
    a\u{001F}b\u{001F}\u{001E}
    c\u{001F}d\u{001F}\u{001E}
    \u{001D}
    Worksheet 2\u{001F}\u{001E}
    e\u{001F}f\u{001F}\u{001E}
    g\u{001F}h\u{001F}\u{001E}
    \u{001D}

If you prefer to render markers with braces, to see the markers more easily:

    cat example.xlsx | xlsx-to-usv --style-braces

Output:

    Worksheet 1{US}{RS}
    a{US}b{US}{RS}
    c{US}d{US}{RS}
    {GS}
    Worksheet 2{US}{RS}
    e{US}f{US}{RS}
    g{US}h{US}{RS}
    {GS}

For more, see the official repository:
Unicode Separated Values (USV)

FAQ

What converters are available?

When to use this command?

Use this command when you want to convert from XLSX to USV.

A typical use case is when you have XLSX data, such as a spreadsheet file, and you want to convert it to USV to make the data easier to view in a terminal, edit in a text editor, or maintain in a text format.

Our real-world use case is converting a batch of XLSX spreadsheet exports from a variety of programs, including Excel, to USV so that we are better able to handle quoting, multi-line data units, and Unicode characters in a wide variety of human languages.
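To illustrate why quoting and multi-line units become simpler, here is a minimal sketch that splits symbol-style USV text back into records and units. It is not the USV crate's parser: `parse_records` is a hypothetical helper that assumes no layout whitespace between markers and ignores escapes, groups, and files.

```rust
/// Split symbol-style USV text into records of units.
/// Assumes no layout whitespace between markers and no escape sequences.
fn parse_records(text: &str) -> Vec<Vec<&str>> {
    text.split_terminator('␞') // one slice per record
        .map(|record| record.split_terminator('␟').collect()) // one slice per unit
        .collect()
}

fn main() {
    // Units contain a comma, double quotes, a newline, and non-Latin text,
    // none of which need quoting because they are not USV markers.
    let text = "Smith, \"Al\"␟line one\nline two␟日本語␟␞";
    for record in parse_records(text) {
        for unit in record {
            println!("{:?}", unit);
        }
    }
}
```

Because the markers are dedicated characters rather than commas or newlines, the split needs no quote-tracking state.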

Is USV aiming to become a standard?

Yes, USV is submitted to IETF.org as an Internet-Draft work in progress: link.

Can I build my own USV tools?

Yes, and you may freely use the USV RFC and USV Rust crate.

Help wanted

Constructive feedback welcome. Pull requests and feature requests welcome.

Tracking

  • Package: xlsx-to-usv-rust-crate
  • Version: 1.2.4
  • Created: 2024-03-09T13:33:20Z
  • Updated: 2024-04-11T18:32:40Z
  • License: MIT or Apache-2.0 or GPL-2.0 or GPL-3.0 or contact us for more
  • Contact: Joel Parker Henderson (joel@sixarm.com)

Owner

  • Name: SixArm
  • Login: SixArm
  • Kind: organization
  • Email: sixarm@sixarm.com
  • Location: San Francisco

SixArm Software

Citation (CITATION.cff)

cff-version: 1.2.0
title: Options
message: >-
  If you use this work and you want to cite it,
  then you can use the metadata from this file.
type: software
authors:
  - given-names: Joel Parker
    family-names: Henderson
    email: joel@joelparkerhenderson.com
    affiliation: joelparkerhenderson.com
    orcid: 'https://orcid.org/0009-0000-4681-282X'
identifiers:
  - type: url
    value: 'https://github.com/SixArm/xlsx-to-usv-rust-crate/'
    description: Options
repository-code: 'https://github.com/SixArm/xlsx-to-usv-rust-crate/'
abstract: >-
  Options
license: See license file

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 19
  • Total Committers: 1
  • Avg Commits per committer: 19.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Joel Parker Henderson j****l@j****m 19

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

Cargo.lock cargo
  • adler 1.0.2
  • aho-corasick 1.1.2
  • anes 0.1.6
  • anstream 0.6.13
  • anstyle 1.0.6
  • anstyle-parse 0.2.3
  • anstyle-query 1.0.2
  • anstyle-wincon 3.0.2
  • autocfg 1.1.0
  • bumpalo 3.15.4
  • byteorder 1.5.0
  • calamine 0.24.0
  • cast 0.3.0
  • cfg-if 1.0.0
  • ciborium 0.2.2
  • ciborium-io 0.2.2
  • ciborium-ll 0.2.2
  • clap 4.5.2
  • clap_builder 4.5.2
  • clap_lex 0.7.0
  • codepage 0.1.1
  • colorchoice 1.0.0
  • crc32fast 1.4.0
  • criterion 0.5.1
  • criterion-plot 0.5.0
  • crossbeam-deque 0.8.5
  • crossbeam-epoch 0.9.18
  • crossbeam-utils 0.8.19
  • crunchy 0.2.2
  • either 1.10.0
  • encoding_rs 0.8.33
  • env_filter 0.1.0
  • env_logger 0.11.3
  • flate2 1.0.28
  • getrandom 0.2.12
  • half 2.4.0
  • hermit-abi 0.3.9
  • humantime 2.1.0
  • is-terminal 0.4.12
  • itertools 0.10.5
  • itoa 1.0.10
  • js-sys 0.3.69
  • libc 0.2.153
  • log 0.4.21
  • memchr 2.7.1
  • miniz_oxide 0.7.2
  • num-traits 0.2.18
  • once_cell 1.19.0
  • oorandom 11.1.3
  • plotters 0.3.5
  • plotters-backend 0.3.5
  • plotters-svg 0.3.5
  • ppv-lite86 0.2.17
  • proc-macro2 1.0.79
  • quick-xml 0.31.0
  • quote 1.0.35
  • rand 0.8.5
  • rand_chacha 0.3.1
  • rand_core 0.6.4
  • rayon 1.9.0
  • rayon-core 1.12.1
  • regex 1.10.3
  • regex-automata 0.4.6
  • regex-syntax 0.8.2
  • ryu 1.0.17
  • same-file 1.0.6
  • serde 1.0.197
  • serde_derive 1.0.197
  • serde_json 1.0.114
  • strsim 0.11.0
  • syn 2.0.52
  • tinytemplate 1.2.1
  • unicode-ident 1.0.12
  • usv 0.10.3
  • utf8parse 0.2.1
  • walkdir 2.5.0
  • wasi 0.11.0+wasi-snapshot-preview1
  • wasm-bindgen 0.2.92
  • wasm-bindgen-backend 0.2.92
  • wasm-bindgen-macro 0.2.92
  • wasm-bindgen-macro-support 0.2.92
  • wasm-bindgen-shared 0.2.92
  • web-sys 0.3.69
  • winapi 0.3.9
  • winapi-i686-pc-windows-gnu 0.4.0
  • winapi-util 0.1.6
  • winapi-x86_64-pc-windows-gnu 0.4.0
  • windows-sys 0.52.0
  • windows-targets 0.52.4
  • windows_aarch64_gnullvm 0.52.4
  • windows_aarch64_msvc 0.52.4
  • windows_i686_gnu 0.52.4
  • windows_i686_msvc 0.52.4
  • windows_x86_64_gnu 0.52.4
  • windows_x86_64_gnullvm 0.52.4
  • windows_x86_64_msvc 0.52.4
  • zip 0.6.6
Cargo.toml cargo
  • criterion >= 0.5 development
  • once_cell 1.19.0 development
  • rand >= 0.8 development
  • calamine 0.24
  • clap 4.5.2
  • env_logger 0.11.3
  • log 0.4.21
  • usv 0.10.3