@stdlib/utils-dsv-base-parse

Parser for delimiter-separated values (DSV).

https://github.com/stdlib-js/utils-dsv-base-parse

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary

Keywords

base csv data delimiter dsv format javascript node node-js nodejs parse parser stdlib table tabular tsv util utilities utils
Last synced: 4 months ago

Repository

Parser for delimiter-separated values (DSV).

Basic Info
Statistics
  • Stars: 1
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
base csv data delimiter dsv format javascript node node-js nodejs parse parser stdlib table tabular tsv util utilities utils
Created over 3 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Security

README.md

About stdlib...

We believe in a future in which the web is a preferred environment for numerical computation. To help realize this future, we've built stdlib. stdlib is a standard library, with an emphasis on numerical and scientific computation, written in JavaScript (and C) for execution in browsers and in Node.js.

The library is fully decomposable, being architected in such a way that you can swap out and mix and match APIs and functionality to cater to your exact preferences and use cases.

When you use stdlib, you can be absolutely certain that you are using the most thorough, rigorous, well-written, studied, documented, tested, measured, and high-quality code out there.

To join us in bringing numerical computing to the web, get started by checking us out on GitHub, and please consider financially supporting stdlib. We greatly appreciate your continued support!

DSV Parser


Incremental parser for delimiter-separated values (DSV).

## Installation

```bash
npm install @stdlib/utils-dsv-base-parse
```

Alternatively,

- To load the package in a website via a `script` tag without installation and bundlers, use the [ES Module][es-module] available on the [`esm`][esm-url] branch (see [README][esm-readme]).
- If you are using Deno, visit the [`deno`][deno-url] branch (see [README][deno-readme] for usage instructions).
- For use in Observable, or in browser/node environments, use the [Universal Module Definition (UMD)][umd] build available on the [`umd`][umd-url] branch (see [README][umd-readme]).

The [branches.md][branches-url] file summarizes the available branches and displays a diagram illustrating their relationships.

To view installation and usage instructions specific to each branch build, be sure to explicitly navigate to the respective README files on each branch, as linked to above.

## Usage

```javascript
var Parser = require( '@stdlib/utils-dsv-base-parse' );
```

#### Parser( \[options] )

Returns an incremental parser for delimiter-separated values (DSV).

```javascript
var parse = new Parser();

// Parse a line of comma-separated values (CSV):
parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]

// ...

// Parse multiple lines of CSV:
parse.next( '4,5,6\r\n7,8,9\r\n' ); // => [ '4', '5', '6' ], [ '7', '8', '9' ]

// ...

// Parse partial lines:
parse.next( 'a,b' );
parse.next( ',c,d\r\n' ); // => [ 'a', 'b', 'c', 'd' ]

// ...

// Chain together invocations:
parse.next( 'e,f' ).next( ',g,h' ).next( '\r\n' ); // => [ 'e', 'f', 'g', 'h' ]
```

The constructor accepts the following `options`:

- **comment**: character sequence appearing at the beginning of a row which demarcates that the row content should be parsed as a commented line. A commented line ends upon encountering the first newline character sequence, regardless of whether that newline character sequence is preceded by an escape character sequence. Default: `''`.
- **delimiter**: character sequence separating record fields (e.g., `','` for comma-separated values (CSV) and `'\t'` for tab-separated values (TSV)). Default: `','`.
- **doublequote**: `boolean` flag indicating how quote sequences should be escaped within a quoted field. When `true`, a quote sequence must be escaped by another quote sequence. When `false`, a quote sequence must be escaped by the escape sequence. Default: `true`.
- **escape**: character sequence for escaping character sequences having special meaning (i.e., the delimiter, newline, and escape sequences outside of quoted fields; the comment sequence at the beginning of a record and outside of a quoted field; and the quote sequence inside a quoted field when `doublequote` is `false`). Default: `''`.
- **ltrim**: `boolean` indicating whether to trim leading whitespace from field values. If `false`, the parser does not trim leading whitespace (e.g., `a, b, c` parses as `[ 'a', ' b', ' c' ]`). If `true`, the parser trims leading whitespace (e.g., `a, b, c` parses as `[ 'a', 'b', 'c' ]`). Default: `false`.
- **maxRows**: maximum number of records to process (excluding skipped lines). By default, the maximum number of records is unlimited.
- **newline**: character sequence separating rows. Default: `'\r\n'` (see [RFC 4180][rfc-4180]).
- **onClose**: callback to be invoked upon closing the parser. If a parser has partially processed a record upon close, the callback is invoked with the following arguments:

    - **value**: unparsed partially processed **field** text.

    Otherwise, the callback is invoked without any arguments.

- **onColumn**: callback to be invoked upon processing a field. The callback is invoked with the following arguments:

    - **field**: field value.
    - **row**: row number (zero-based).
    - **col**: field (column) number (zero-based).
    - **line**: line number (zero-based).

- **onComment**: callback to be invoked upon processing a commented line. The callback is invoked with the following arguments:

    - **comment**: comment text.
    - **line**: line number (zero-based).

- **onError**: callback to be invoked upon encountering an unrecoverable parse error. By default, upon encountering a parse error, the parser throws an `Error`. When provided an error callback, the parser does **not** throw and, instead, invokes the provided callback. The callback is invoked with the following arguments:

    - **error**: an `Error` object.

- **onRow**: callback to be invoked upon processing a record. The callback is invoked with the following arguments:

    - **record**: an array-like object containing field values. If provided a `rowBuffer`, the `record` argument will be the **same** array-like object for each invocation.
    - **row**: row number (zero-based).
    - **ncols**: number of fields (columns).
    - **line**: line number (zero-based).

    If a parser is closed **before** fully processing the last record, the callback is invoked with field data for all fields which have been parsed. Any remaining field data is provided to the `onClose` callback. For example, if a parser has processed two fields and closes while attempting to process a third field, the parser invokes the `onRow` callback with field data for the first two fields and invokes the `onClose` callback with the partially processed data for the third field.

- **onSkip**: callback to be invoked upon processing a skipped line. The callback is invoked with the following arguments:

    - **record**: unparsed record text.
    - **line**: line number (zero-based).

- **onWarn**: when `strict` is `false`, a callback to be invoked upon encountering invalid DSV. The callback is invoked with the following arguments:

    - **error**: an `Error` object.

- **quote**: character sequence demarcating the beginning and ending of a quoted field. When `quoting` is `false`, a quote character sequence has no special meaning and is processed as normal text. Default: `'"'`.
- **quoting**: `boolean` flag indicating whether to enable special processing of quote character sequences (i.e., whether a quote sequence should demarcate a quoted field). Default: `true`.
- **rowBuffer**: array-like object for storing the field values of the most recently processed record. When provided, the row buffer is **reused** and is provided to the `onRow` callback for each processed record. If a provided row buffer is a generic array, the parser grows the buffer as needed. If a provided row buffer is a typed array, the buffer size is fixed and thus needs to be large enough to accommodate processed fields. Providing a fixed-length array is appropriate when the number of fields is known prior to parsing. When the number of fields is unknown, providing a fixed-length array may still be appropriate; however, one is advised to allocate a buffer having more elements than is reasonably expected in order to avoid buffer overflow.
- **rtrim**: `boolean` indicating whether to trim trailing whitespace from field values. If `false`, the parser does not trim trailing whitespace (e.g., `a ,b ,c` parses as `[ 'a ', 'b ', 'c' ]`). If `true`, the parser trims trailing whitespace (e.g., `a ,b ,c` parses as `[ 'a', 'b', 'c' ]`). Default: `false`.
- **skip**: character sequence appearing at the beginning of a row which demarcates that the row content should be parsed as a skipped record. Default: `''`.
- **skipBlankRows**: `boolean` flag indicating whether to skip over rows which are either empty or contain only whitespace. Default: `false`.
- **skipRow**: callback whose return value indicates whether to skip over a row. The callback is invoked with the following arguments:

    - **nrows**: number of processed rows (equivalent to the current row number).
    - **line**: line number (zero-based).

    If the callback returns a truthy value, the parser skips the row; otherwise, the parser attempts to process the row. Note, however, that, even if the callback returns a falsy value, a row may still be skipped depending on the presence of a `skip` character sequence.

- **strict**: `boolean` flag indicating whether to raise an exception upon encountering invalid DSV. When `false`, instead of throwing an `Error` or invoking the `onError` callback, the parser invokes an `onWarn` callback with an `Error` object specifying the encountered error. Default: `true`.
- **trimComment**: `boolean` flag indicating whether to trim leading whitespace in commented lines. Default: `true`.
- **whitespace**: list of characters to be interpreted as whitespace. Default: `[ ' ' ]`.

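Because a provided `rowBuffer` is reused, the `record` passed to `onRow` is the same array-like object on every invocation, so a consumer which wants to retain rows needs to copy them. The following is a minimal sketch of that pattern. The `createRowCollector` helper is hypothetical (it is not part of the package), and the parser is simulated by calling the callback by hand with a single reused buffer.

```javascript
// Hypothetical helper (not part of the package): returns an `onRow`-style
// callback which stores a copy of each record.
function createRowCollector() {
    var rows = [];
    function onRow( record, row, ncols ) {
        // Copy only the first `ncols` elements, as a fixed-length buffer may
        // be longer than the current record:
        rows.push( Array.prototype.slice.call( record, 0, ncols ) );
    }
    return {
        'rows': rows,
        'onRow': onRow
    };
}

var collector = createRowCollector();

// Simulate a parser which writes two records into one reused buffer:
var buf = [ '1', '2', '3' ];
collector.onRow( buf, 0, 3, 0 );

buf[ 0 ] = '4';
buf[ 1 ] = '5';
buf[ 2 ] = '6';
collector.onRow( buf, 1, 3, 1 );

console.log( collector.rows );
// => [ [ '1', '2', '3' ], [ '4', '5', '6' ] ]
```

With the actual parser, one would presumably pass `collector.onRow` as the `onRow` option alongside a `rowBuffer`. Storing `record` directly, without copying, would leave every stored row pointing at the most recently parsed values.
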
The parser does **not** perform field conversion/transformation and, instead, is solely responsible for incrementally identifying fields and records. Further processing of fields/records is the responsibility of parser consumers who are generally expected to provide either an `onColumn` callback, an `onRow` callback, or both.

```javascript
var format = require( '@stdlib/string-format' );

function onColumn( field, row, col ) {
    console.log( format( 'Row: %d. Column: %d. Value: %s', row, col, field ) );
}

function onRow( record, row, ncols ) {
    console.log( format( 'Row: %d. nFields: %d. Value: | %s |', row, ncols, record.join( ' | ' ) ) );
}

var opts = {
    'onColumn': onColumn,
    'onRow': onRow
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]
parse.next( '5,6,7,8\r\n' ); // => [ '5', '6', '7', '8' ]

// ...
```

Upon closing the parser, the parser invokes an `onClose` callback with any partially processed (i.e., incomplete) **field** data. Note, however, that the field data may **not** equal the original character sequence, as escape sequences may have already been removed.

```javascript
var format = require( '@stdlib/string-format' );

function onClose( v ) {
    console.log( format( 'Incomplete: %s', v ) );
}

var opts = {
    'onClose': onClose
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]

// ...

// Provide an incomplete record:
parse.next( '5,6,"foo' );

// Close the parser:
parse.close();
```

By default, the parser assumes [RFC 4180][rfc-4180]-compliant newline-delimited comma-separated values (CSV). To use alternative separators, specify the relevant options.

```javascript
var opts = {
    'delimiter': '--',
    'newline': '%%'
};
var parse = new Parser( opts );

parse.next( '1--2--3--4%%' ); // => [ '1', '2', '3', '4' ]
parse.next( '5--6--7--8%%' ); // => [ '5', '6', '7', '8' ]

// ...
```

By default, the parser interprets double (i.e., two consecutive) quote character sequences within quoted fields as an escaped quote sequence. To parse DSV in which quote character sequences are escaped by an escape character sequence within quoted fields, set `doublequote` to `false` and specify the escape character sequence.

```javascript
// Default parser:
var parse = new Parser();

// Parse DSV using double quoting:
parse.next( '1,"""2""",3,4\r\n' ); // => [ '1', '"2"', '3', '4' ]

// ...

// Create a parser which uses a custom escape sequence within quoted fields:
var opts = {
    'doublequote': false,
    'escape': '\\'
};
parse = new Parser( opts );

parse.next( '1,"\\"2\\"",3,4\r\n' ); // => [ '1', '"2"', '3', '4' ]
```

When `quoting` is `true`, the parser identifies a quote character sequence at the beginning of a field as the start of a quoted field. To process quote character sequences as normal field text, set `quoting` to `false`.

```javascript
// Default parser:
var parse = new Parser();

parse.next( '1,"2",3,4\r\n' ); // => [ '1', '2', '3', '4' ]

// ...

// Create a parser which treats quote sequences as normal field text:
var opts = {
    'quoting': false
};
parse = new Parser( opts );

parse.next( '1,"2",3,4\r\n' ); // => [ '1', '"2"', '3', '4' ]
```

To parse DSV containing commented lines, specify a comment character sequence which demarcates the beginning of a commented line.

```javascript
var opts = {
    'comment': '#'
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]
parse.next( '# This is a commented line.\r\n' ); // comment
parse.next( '9,10,11,12\r\n' ); // => [ '9', '10', '11', '12' ]
```

To parse DSV containing skipped lines, specify a skip character sequence which demarcates the beginning of a skipped line.

```javascript
var opts = {
    'skip': '//'
};
var parse = new Parser( opts );

parse.next( '1,2,3,4\r\n' ); // => [ '1', '2', '3', '4' ]
parse.next( '//5,6,7,8\r\n' ); // skipped line
parse.next( '9,10,11,12\r\n' ); // => [ '9', '10', '11', '12' ]
```

* * *

### Properties

#### Parser.prototype.done

**Read-only** property indicating whether a parser is able to process new chunks.

```javascript
var parse = new Parser();

parse.next( '1,2,3,4\r\n' );

// ...

var b = parse.done;
// returns false

// ...

parse.close();

// ...

b = parse.done;
// returns true
```

* * *

### Methods

#### Parser.prototype.next( chunk )

Incrementally parses the next chunk.

```javascript
var parse = new Parser();

parse.next( '1,2,3,4\r\n' );

// ...

parse.next( '5,6,7,8\r\n' );

// ...
```

#### Parser.prototype.close()

Closes the parser.

```javascript
var parse = new Parser();

parse.next( '1,2,3,4\r\n' );

// ...

parse.next( '5,6,7,8\r\n' );

// ...

parse.close();
```

After closing a parser, a parser raises an exception upon receiving any additional chunks.
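
Since a closed parser raises an exception on further chunks, a caller feeding data from several places may want to check `done` before each call to `next`. The sketch below shows one way to do this. The `feedAll` helper is hypothetical, and the `stub` object is only a stand-in exposing the `next`, `close`, and `done` members described above so that the snippet runs without the package installed.

```javascript
// Hypothetical helper (not part of the package): feeds chunks to a parser,
// stopping early if the parser can no longer accept chunks, and then closes it.
// Returns the number of chunks actually provided to the parser.
function feedAll( parser, chunks ) {
    var i;
    for ( i = 0; i < chunks.length; i++ ) {
        if ( parser.done ) {
            return i;
        }
        parser.next( chunks[ i ] );
    }
    if ( !parser.done ) {
        parser.close();
    }
    return i;
}

// Stand-in for a parser instance (an assumption made for illustration only):
var stub = {
    'done': false,
    'seen': [],
    'next': function next( chunk ) {
        this.seen.push( chunk );
        return this;
    },
    'close': function close() {
        this.done = true;
        return this;
    }
};

var n1 = feedAll( stub, [ '1,2', ',3\r\n' ] );
console.log( n1 );
// => 2

// The stand-in is now closed, so no further chunks are provided:
var n2 = feedAll( stub, [ '4,5\r\n' ] );
console.log( n2 );
// => 0
```
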

## Notes

- Special character sequences (i.e., delimiter, newline, quote, escape, skip, and comment sequences) **must** all be unique with respect to one another, and **no** special character sequence is allowed to be a subsequence of another special character sequence. Allowing common subsequences would lead to ambiguous parser states.

    For example, given the chunk `1,,3,4,,`, if `delimiter` is `','` and `newline` is `',,'`, is the first `,,` a field with no content or a newline? The parser cannot be certain, hence the prohibition.

- As specified in [RFC 4180][rfc-4180], special character sequences **must** be consistent across all provided chunks. Hence, providing chunks in which, e.g., line breaks vary between `\r`, `\n`, and `\r\n` is **not** supported.
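
The uniqueness rule in the first note can be checked before constructing a parser. The following is a rough sketch of such a check, not the package's own validation logic. The `findConflict` helper is hypothetical, "subsequence" is read here as "contiguous substring" (matching the `','` versus `',,'` example), and an empty string is assumed to mean that the corresponding option is unset.

```javascript
// Hypothetical helper (not part of the package): returns the names of the
// first pair of special sequences in which one sequence occurs inside the
// other (or the two are identical), or `null` if there is no conflict.
function findConflict( sequences ) {
    var names = Object.keys( sequences );
    var a;
    var b;
    var i;
    var j;
    for ( i = 0; i < names.length; i++ ) {
        a = sequences[ names[ i ] ];
        if ( a === '' ) {
            continue; // assumed to mean "option not set"
        }
        for ( j = 0; j < names.length; j++ ) {
            b = sequences[ names[ j ] ];
            if ( i === j || b === '' ) {
                continue;
            }
            // Check whether `a` occurs anywhere within `b`:
            if ( b.indexOf( a ) !== -1 ) {
                return [ names[ i ], names[ j ] ];
            }
        }
    }
    return null;
}

var bad = findConflict({
    'delimiter': ',',
    'newline': ',,'
});
console.log( bad );
// => [ 'delimiter', 'newline' ]

var ok = findConflict({
    'delimiter': ',',
    'newline': '\r\n',
    'quote': '"',
    'escape': '',
    'comment': '#',
    'skip': '//'
});
console.log( ok );
// => null
```
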

## Examples

```javascript
var format = require( '@stdlib/string-format' );
var Parser = require( '@stdlib/utils-dsv-base-parse' );

function onColumn( v, row, col ) {
    console.log( format( 'Row: %d. Column: %d. Value: %s', row, col, v ) );
}

function onRow( v, row, ncols ) {
    console.log( format( 'Row: %d. nFields: %d. Value: | %s |', row, ncols, v.join( ' | ' ) ) );
}

function onComment( str ) {
    console.log( format( 'Comment: %s', str ) );
}

function onSkip( str ) {
    console.log( format( 'Skipped line: %s', str ) );
}

function onWarn( err ) {
    console.log( format( 'Warning: %s', err.message ) );
}

function onError( err ) {
    console.log( format( 'Error: %s', err.message ) );
}

function onClose( v ) {
    console.log( format( 'End: %s', v || '(none)' ) );
}

var opts = {
    'strict': false,
    'newline': '\r\n',
    'delimiter': ',',
    'escape': '\\',
    'comment': '#',
    'skip': '//',
    'doublequote': true,
    'quoting': true,
    'onColumn': onColumn,
    'onRow': onRow,
    'onComment': onComment,
    'onSkip': onSkip,
    'onError': onError,
    'onWarn': onWarn,
    'onClose': onClose
};
var parse = new Parser( opts );

var str = [
    [ '1', '2', '3', '4' ],
    [ '5', '6', '7', '8' ],
    [ 'foo\\,', 'bar\\ ,', 'beep\\,', 'boop\\,' ],
    [ '""",1,"""', '""",2,"""', '""",3,"""', '""",4,"""' ],
    [ '# This is a "comment", including with commas.' ],
    [ '\\# Escaped comment', '# 2', '# 3', '# 4' ],
    [ '1', '2', '3', '4' ],
    [ '//A,Skipped,Line,!!!' ],
    [ '"foo"', '"bar\\ "', '"beep"', '"boop"' ],
    [ ' # 😃', ' # 🥳', ' # 😮', ' # 🤠' ]
];
var i;
for ( i = 0; i < str.length; i++ ) {
    str[ i ] = str[ i ].join( opts.delimiter );
}
str = str.join( opts.newline );

console.log( format( 'Input:\n\n%s\n', str ) );

parse.next( str ).close();
```

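The example above provides the whole input in a single call to `next`. To exercise the incremental behavior, the same input could instead be split into small, arbitrary chunks. The sketch below shows a simple splitter. The `toChunks` helper is hypothetical, and it splits by UTF-16 code units, so it can cut through a multi-character sequence such as `\r\n` or through an emoji's surrogate pair. Whether the parser handles every such split is worth verifying against the package itself.

```javascript
// Hypothetical helper (not part of the package): splits a string into
// consecutive chunks of at most `size` UTF-16 code units.
function toChunks( str, size ) {
    var out = [];
    var i;
    if ( size < 1 ) {
        throw new RangeError( 'invalid argument. Chunk size must be a positive integer.' );
    }
    for ( i = 0; i < str.length; i += size ) {
        out.push( str.substring( i, i+size ) );
    }
    return out;
}

var chunks = toChunks( '1,2,3,4\r\n5,6,7,8\r\n', 4 );
console.log( chunks );
// => [ '1,2,', '3,4\r', '\n5,6', ',7,8', '\r\n' ]
```

Each chunk could then be passed to `parse.next()` in order, followed by a single `parse.close()`. Note that the first newline sequence is split across the second and third chunks.
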
* * *

## Notice

This package is part of [stdlib][stdlib], a standard library for JavaScript and Node.js, with an emphasis on numerical and scientific computing. The library provides a collection of robust, high performance libraries for mathematics, statistics, streams, utilities, and more.

For more information on the project, filing bug reports and feature requests, and guidance on how to develop [stdlib][stdlib], see the main project [repository][stdlib].

#### Community

[![Chat][chat-image]][chat-url]

---

## License

See [LICENSE][stdlib-license].

## Copyright

Copyright © 2016-2025. The Stdlib [Authors][stdlib-authors].

Owner

  • Name: stdlib
  • Login: stdlib-js
  • Kind: organization

Standard library for JavaScript.

Citation (CITATION.cff)

cff-version: 1.2.0
title: stdlib
message: >-
  If you use this software, please cite it using the
  metadata from this file.

type: software

authors:
  - name: The Stdlib Authors
    url: https://github.com/stdlib-js/stdlib/graphs/contributors

repository-code: https://github.com/stdlib-js/stdlib
url: https://stdlib.io

abstract: |
  Standard library for JavaScript and Node.js.

keywords:
  - JavaScript
  - Node.js
  - TypeScript
  - standard library
  - scientific computing
  - numerical computing
  - statistical computing

license: Apache-2.0 AND BSL-1.0

date-released: 2016

GitHub Events

Total
  • Push event: 15
Last Year
  • Push event: 15

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 64
  • Total Committers: 1
  • Avg Commits per committer: 64.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 8
  • Committers: 1
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
stdlib-bot n****y@s****o 64

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • npm 4 last-month
  • Total dependent packages: 2
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 4
npmjs.org: @stdlib/utils-dsv-base-parse

Incremental parser for delimiter-separated values (DSV).

  • Homepage: https://stdlib.io
  • License: Apache-2.0
  • Latest release: 0.2.2
    published over 1 year ago
  • Versions: 7
  • Dependent Packages: 2
  • Dependent Repositories: 1
  • Downloads: 4 Last month
Rankings
Dependent packages count: 8.9%
Dependent repos count: 10.3%
Downloads: 11.4%
Average: 12.6%
Forks count: 15.4%
Stargazers count: 16.7%
Funding
  • type: opencollective
  • url: https://opencollective.com/stdlib
Last synced: 5 months ago

Dependencies

package.json npm
  • @stdlib/bench ^0.0.x development
  • @stdlib/math-base-special-pow ^0.0.x development
  • istanbul ^0.4.1 development
  • tap-spec 5.x.x development
  • tape git+https://github.com/kgryte/tape.git#fix/globby development
  • @stdlib/boolean-ctor ^0.0.x
  • @stdlib/string-base-replace ^0.0.x
  • @stdlib/string-format ^0.0.x
  • @stdlib/utils-define-nonenumerable-read-only-accessor ^0.0.x
  • @stdlib/utils-define-nonenumerable-read-only-property ^0.0.x
  • @stdlib/utils-escape-regexp-string ^0.0.x
  • @stdlib/utils-noop ^0.0.x
  • debug ^2.6.9
.github/workflows/benchmark.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
.github/workflows/cancel.yml actions
  • styfle/cancel-workflow-action 0.11.0 composite
.github/workflows/close_pull_requests.yml actions
  • superbrothers/close-pull-request v3 composite
.github/workflows/examples.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
.github/workflows/npm_downloads.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • actions/upload-artifact v3 composite
  • distributhor/workflow-webhook v3 composite
.github/workflows/productionize.yml actions
  • act10ns/slack v1 composite
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • stdlib-js/bundle-action main composite
  • stdlib-js/transform-errors-action main composite
.github/workflows/publish.yml actions
  • JS-DevTools/npm-publish v1 composite
  • act10ns/slack v1 composite
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • styfle/cancel-workflow-action 0.11.0 composite
.github/workflows/test.yml actions
  • act10ns/slack v1 composite
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
.github/workflows/test_bundles.yml actions
  • act10ns/slack v1 composite
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • denoland/setup-deno v1 composite
.github/workflows/test_coverage.yml actions
  • act10ns/slack v1 composite
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • codecov/codecov-action v3 composite
  • distributhor/workflow-webhook v3 composite
.github/workflows/test_install.yml actions
  • act10ns/slack v1 composite
  • actions/checkout v3 composite
  • actions/setup-node v3 composite