Recent Releases of https://github.com/vincentlaucsb/csv-parser

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.3.0: Race Condition Fix

What's Changed

  • CSVField: new member function try_parse_decimal() to specify one or more decimal symbols by @wilfz in https://github.com/vincentlaucsb/csv-parser/pull/226
  • Replace the includes of Windows.h with windows.h (#204) by @ludovicdelfau in https://github.com/vincentlaucsb/csv-parser/pull/235
  • Use const CSVFormat& in calculate_score by @rajgoel in https://github.com/vincentlaucsb/csv-parser/pull/236
  • Fix memory issues in CSVFieldList by @vincentlaucsb in https://github.com/vincentlaucsb/csv-parser/pull/237

Race Condition Notes

Background

The CSV Parser tries to perform as few allocations as possible. Instead of naively storing each CSV field as its own std::string in a std::vector, the parser keeps references to the raw input and uses lightweight RawCSVField objects to mark where a specific field starts and ends in that input (as well as a flag indicating whether an escaped quote is present). This has the benefits of:

  1. Avoiding the cost of constructing many std::string instances
  2. Avoiding the cost of constant std::vector reallocations
  3. Preserving locality of reference
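
The field-marking idea described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: FieldSpan, mark_fields() and field_text() are hypothetical names standing in for RawCSVField and the parser internals, and a span here includes any surrounding quote characters.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for RawCSVField: it records where a field lives in
// the shared input instead of owning a copy of its text.
struct FieldSpan {
    std::size_t start;       // offset of the field's first character
    std::size_t length;      // number of characters in the field
    bool has_escaped_quote;  // true if the field contains a doubled quote ("")
};

// Marks the fields of a single row. No strings are constructed; the only
// allocation is the vector of small, fixed-size spans.
std::vector<FieldSpan> mark_fields(std::string_view row, char delim = ',') {
    std::vector<FieldSpan> fields;
    std::size_t start = 0;
    bool in_quotes = false;
    bool escaped = false;
    for (std::size_t i = 0; i <= row.size(); ++i) {
        const bool at_end = (i == row.size());
        if (!at_end && row[i] == '"') {
            if (in_quotes && i + 1 < row.size() && row[i + 1] == '"') {
                escaped = true;  // doubled quote inside a quoted field
                ++i;             // skip the second quote of the pair
            } else {
                in_quotes = !in_quotes;
            }
        } else if (at_end || (row[i] == delim && !in_quotes)) {
            fields.push_back({start, i - start, escaped});
            start = i + 1;
            escaped = false;
        }
    }
    return fields;
}

// Produces a view of a field's text on demand, still without copying.
std::string_view field_text(std::string_view row, const FieldSpan& f) {
    return row.substr(f.start, f.length);
}
```

The views returned by field_text() are only valid while the original input buffer is alive, which is the trade-off that comes with avoiding per-field std::string copies.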

Furthermore, the CSV Parser also uses separate threads for parsing CSV and for iterating over the data. As CSV rows are parsed, they are made available to the user who may utilize them without interrupting the parsing of new rows.

The Race Condition

The RawCSVField objects mentioned previously were stored as contiguous blocks, and a std::vector of pointers to these blocks was used to keep track of them.

However, as @ludovicdelfau accurately diagnosed, if the reading thread attempted to access a RawCSVField (e.g. through reading a CSVField) at the same time that the parsing thread was pushing a new RawCSVField block pointer to an at-capacity std::vector, the parsing thread's push would cause the contents of the std::vector to be reallocated, thus causing the reading thread to access deallocated memory.

This issue was first reported in #217.

The Fix

The fix was simple: a std::deque replaced the std::vector that stores RawCSVField pointers, because appending to a std::deque does not reallocate or move the elements it already holds. This change even appears to improve the CSV Parser's performance, as the cost of repeated reallocations is avoided. The loss of memory locality typical of std::deque is not a concern here because, again, the CSV Parser stores pointers to RawCSVField[] blocks and not the RawCSVField objects themselves.
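
The difference between the two containers can be observed in a small single-threaded sketch. It does not reproduce the race itself; it only checks whether the first element changes address after many appends. The helper name first_element_moved() is hypothetical, and addresses are compared as integers so that no invalidated pointer is ever used.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Appends one element, remembers its address, appends `extra` more elements,
// and reports whether that first element now lives at a different address.
template <typename Container>
bool first_element_moved(Container& c, std::size_t extra) {
    c.push_back(0);
    const auto before = reinterpret_cast<std::uintptr_t>(&c.front());
    for (std::size_t i = 0; i < extra; ++i) {
        c.push_back(static_cast<int>(i));
    }
    const auto after = reinterpret_cast<std::uintptr_t>(&c.front());
    return before != after;
}
```

With a std::vector<int> the function returns true once growth forces a reallocation, which is the situation in which a concurrent reader could be left with a dangling pointer. With a std::deque<int> it returns false, because push_back on a deque leaves references to existing elements valid. Note that std::deque is not in itself a thread-safe container; the sketch only illustrates the address stability that the fix relies on.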

New Contributors

  • @wilfz made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/226
  • @ludovicdelfau made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/235
  • @rajgoel made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/236

Full Changelog: https://github.com/vincentlaucsb/csv-parser/compare/2.2.3...2.3.0

- C++
Published by vincentlaucsb almost 2 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.3

  • Fix n_rows() being off-by-one when the CSVReader iterator was used (reported in #173)
    • Note: This was due to a simple counting error where the iterator did not increment the row counter for the first row. All rows were still correctly read.
  • Implement ability to handle arbitrary combinations of \r and \n in line endings (#223)
  • Fix CSV writers incorrectly converting decimal values between 0 and -1 to positive numbers
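
As an illustration of tolerating mixed line endings, the sketch below treats any run of \r and \n characters as a single line break. That rule is an assumption made for this example rather than a description of the parser's exact behavior, split_lines() is a hypothetical helper, and quoted fields containing embedded newlines are deliberately ignored.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits text into lines, treating any run of '\r' and '\n' characters as
// one line break (assumption for this sketch). Empty lines are skipped.
std::vector<std::string> split_lines(std::string_view text) {
    std::vector<std::string> lines;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Skip over the line-break characters that precede the next line.
        const std::size_t start = text.find_first_not_of("\r\n", pos);
        if (start == std::string_view::npos) break;
        // The line runs until the next '\r' or '\n', or the end of the text.
        std::size_t end = text.find_first_of("\r\n", start);
        if (end == std::string_view::npos) end = text.size();
        lines.emplace_back(text.substr(start, end - start));
        pos = end;
    }
    return lines;
}
```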

- C++
Published by vincentlaucsb about 2 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.2

What's Changed

  • Allow parsing of numbers that begin with +, fixing #213
  • Fix compiler warnings in g++ from using abs and in try_parse_hex() https://github.com/vincentlaucsb/csv-parser/pull/227
  • Fix invalid memory access issue in g++ builds https://github.com/vincentlaucsb/csv-parser/pull/228
    • Issue was caused when using CSVField methods in conjunction with CSVRow reverse iterators
  • CMake options to disable programs building by @BaptisteLemarcis in https://github.com/vincentlaucsb/csv-parser/pull/148
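
To make the first item above concrete, here is a minimal sketch of an integer check that accepts a leading + or - sign. The function looks_like_integer() is hypothetical and is not the library's type-detection code; it only shows the rule that a single optional sign may precede the digits.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Returns true if `s` is an optional '+' or '-' followed by at least one
// decimal digit and nothing else.
bool looks_like_integer(std::string_view s) {
    if (s.empty()) return false;
    std::size_t i = (s[0] == '+' || s[0] == '-') ? 1 : 0;
    if (i == s.size()) return false;  // a lone sign is not a number
    for (; i < s.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    }
    return true;
}
```

Under this rule "+42" and "-7" are integers, while a string with a dash in the middle, such as "555-1234", is not.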

New Contributors

  • @BaptisteLemarcis made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/148

Full Changelog: https://github.com/vincentlaucsb/csv-parser/compare/2.2.1...2.2.2

- C++
Published by vincentlaucsb about 2 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.1

This is a simple CMake change that makes it easier to #include "csv.hpp" in a CMake project that grabs csv-parser using FetchContent_Declare().

What's Changed

  • Provide directory of library's header as the include directory by @grosscol in https://github.com/vincentlaucsb/csv-parser/pull/220

New Contributors

  • @grosscol made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/220

Full Changelog: https://github.com/vincentlaucsb/csv-parser/compare/2.2.0...2.2.1

- C++
Published by vincentlaucsb about 2 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.0

  • Fixed bug which caused inaccurate serialization of floating point values in CSV Writer as reported by #188
    • Bug affected numbers close to 10^n; was caused by usage of inaccurate std::log() function (see: https://stackoverflow.com/questions/1489830/efficient-way-to-determine-number-of-digits-in-an-integer)
  • Fixed issue where strings consisting of numbers and dashes (e.g. phone numbers) were inaccurately identified as integers
  • Silenced some compiler warnings
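
For the digit-counting problem mentioned in the first item, one way to avoid floating-point logarithms altogether is to count digits with integer division. This is a sketch of that general technique, with the hypothetical name count_digits(); the notes above do not say which approach the library actually adopted.

```cpp
#include <cstdint>

// Counts the decimal digits of n by repeated integer division, so the result
// is exact even at powers of ten such as 1000 or 10^15.
int count_digits(std::uint64_t n) {
    int digits = 1;  // zero still has one digit
    while (n >= 10) {
        n /= 10;
        ++digits;
    }
    return digits;
}
```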

- C++
Published by vincentlaucsb about 2 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.1.3

  • Fix various compatibility issues with g++ and clang
  • Added hex value parsing
  • Fixed a rare out-of-bounds condition

- C++
Published by vincentlaucsb almost 5 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.1.2

  • Fixed compilation issues with C++11 and 14.
    • CSV Parser should now be C++11 compatible once again with g++ 7.5 or up
  • Allowed users to customize decimal place precision when writing CSVs
  • Fixed floating point output
    • Arbitrarily large integers stored in doubles can now be output w/o limits
  • Fixed newlines not being escaped by CSVWriter
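
The newline-escaping item refers to the usual CSV convention that a field containing a delimiter, a quote character, or a line break must be wrapped in quotes, with embedded quotes doubled. A minimal sketch of that convention follows; escape_csv_field() is a hypothetical helper, not the CSVWriter code.

```cpp
#include <string>

// Quotes a field if it contains the delimiter, the quote character, or a
// line break. Embedded quote characters are doubled.
std::string escape_csv_field(const std::string& field,
                             char delim = ',', char quote = '"') {
    bool needs_quotes = false;
    std::string out;
    for (char c : field) {
        if (c == quote) {
            out += quote;  // emit the extra quote that escapes this one
            needs_quotes = true;
        } else if (c == delim || c == '\n' || c == '\r') {
            needs_quotes = true;
        }
        out += c;
    }
    if (!needs_quotes) return field;
    return quote + out + quote;
}
```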

- C++
Published by vincentlaucsb almost 5 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.1.1

  • Fixed CSVStats only processing first 5000 rows thanks to @TobyEalden
  • Fixed parsing """fields like this""" thanks to @rpadrela
  • Fixed CSVReader move semantics thanks to @artpaul

- C++
Published by vincentlaucsb about 5 years ago

https://github.com/vincentlaucsb/csv-parser - Minor Patch

Fixed #142 where decimal numbers were not being printed properly by CSVWriter, and incorporated #137 and #134

- C++
Published by vincentlaucsb over 5 years ago

https://github.com/vincentlaucsb/csv-parser - Better, faster, stronger

New Features

  • CSVReader can now parse from memory mapped files, std::stringstream, and std::ifstream
  • DelimWriter now supports writing rows encoded as std::tuple
  • DelimWriter automatically converts numbers and other data types stored in vectors, arrays, and tuples

Improvements

  • CSVReader is now a no-copy parser when memory-mapped IO is used
    • CSVRow and CSVField now refer to the original memory map
  • Significant performance improvements for some files

Bug Fixes

  • Fixed potential thread safety issues with internals::CSVFieldList

API Changes

  • CSVReader::feed() and CSVReader::end_feed() have been removed. In-memory parsing should be performed via the interface for std::stringstream.

- C++
Published by vincentlaucsb over 5 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.0.1

  • Made parsing CSV files without header rows more convenient
  • Fixed a compilation error with std::back_inserter on some systems

- C++
Published by vincentlaucsb over 5 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.0.0

  • Parser now uses memory-mapped IO for reading from disk thanks to mio
    • CSV files are read in smaller chunks to reduce memory footprint (but parsing is significantly faster)
  • CSVReader::read_row() (and CSVReader::iterator) no longer blocks CSVReader::read_csv(), i.e. we can now simultaneously work on CSV data while reading more rows
  • Parser internals completely rewritten to use more efficient and easier to maintain/debug data structures
  • Fixed bug where single column files could not be parsed
  • Fixed errors with parsing empty files
  • CSVWriter::write_row() now works with std::array

- C++
Published by vincentlaucsb over 5 years ago

https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.0 Beta: >300 MBps Edition

  • Parser now uses memory-mapped IO for reading from disk
    • On Windows, parser may map entire file into memory or mmap chunks of file iteratively based on available RAM (will extend to all OSes)
  • Parser internals completely rewritten to use more efficient and easier to maintain/debug data structures
    • New algorithm involves minimal copying
  • Fixed bug where single column files could not be parsed
  • Fixed errors with parsing empty files

- C++
Published by vincentlaucsb over 5 years ago

https://github.com/vincentlaucsb/csv-parser - Fixed memory errors when parsing large files

  • Fixed issue with incorrect usage of string_view that led to memory errors when parsing large files such as the 1.4GB Craigslist vehicles dataset #90
  • Added ability to have no quote character #83
  • Changed VariableColumnPolicy::IGNORE to IGNORE_ROW to avoid clashing with IGNORE macro as defined by WinBase.h #96

- C++
Published by vincentlaucsb about 6 years ago

https://github.com/vincentlaucsb/csv-parser - Fixed bug with parsing very long rows

  • Fixed bug with parsing very long rows (as reported in #92) when the length of the row exceeded the range of unsigned short (at most 2^16 - 1 on typical platforms)
    • All instances of unsigned short have been replaced by internals::StrBufferPos (size_t) thus giving this parser the theoretical capability of parsing rows that are 2^64 characters long
  • Fixed bug recognizing numbers in e-notation when the base did not have a decimal, e.g. 1E-06
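
The row-length limit in the first item comes down to what happens when a position is stored in a 16-bit integer. The sketch below uses std::uint16_t to make the width explicit (unsigned short is 16 bits on common platforms, but that is an assumption); narrow_position() is a hypothetical helper that mimics such a narrow index type.

```cpp
#include <cstddef>
#include <cstdint>

// Stores a buffer position in a 16-bit integer, as a narrow index type
// would. Positions past 65535 wrap around modulo 2^16.
std::uint16_t narrow_position(std::size_t pos) {
    return static_cast<std::uint16_t>(pos);
}
```

A position of 70000 wraps to 4464 (70000 - 65536), so an index into a long row would silently point at the wrong character. Keeping positions in size_t avoids the wrap-around.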

- C++
Published by vincentlaucsb about 6 years ago

https://github.com/vincentlaucsb/csv-parser - Fixed bug with whitespace trimming when a field is entirely whitespace

Fixes incorrect CSV parsing when whitespace trimming is enabled and a field is composed entirely of whitespace characters as reported in #85
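
The edge case can be illustrated with a small trimming helper. This is a sketch under assumptions, not the parser's code: trim_field() is a hypothetical function, and the default whitespace set of space and tab is chosen only for the example. The important branch is the one that handles a field with no non-whitespace characters at all.

```cpp
#include <string_view>

// Trims leading and trailing whitespace characters from a field without
// copying. A field made up entirely of whitespace becomes an empty view.
std::string_view trim_field(std::string_view field,
                            std::string_view ws = " \t") {
    const auto first = field.find_first_not_of(ws);
    if (first == std::string_view::npos) {
        return std::string_view{};  // nothing but whitespace (or empty)
    }
    const auto last = field.find_last_not_of(ws);
    return field.substr(first, last - first + 1);
}
```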

- C++
Published by vincentlaucsb about 6 years ago

https://github.com/vincentlaucsb/csv-parser - First class handling of variable column CSVs

  • The behavior for parsing variable-column CSV files can now be simply defined using CSVFormat::variable_columns()
    • Variable-column rows can be kept or silently dropped (default), or result in an error being thrown
    • CSVReader::bad_row_handler() has been removed
  • Many annoying clang/gcc warning messages fixed (thanks rpavlik!)
  • CSV guessing implementation has been simplified (CSVGuesser is also gone now)

- C++
Published by vincentlaucsb about 6 years ago

https://github.com/vincentlaucsb/csv-parser - Fixed bug where get<>() threw incorrect overflow errors with unsigned integers

Fixed bug reported in #73

- C++
Published by vincentlaucsb about 6 years ago

https://github.com/vincentlaucsb/csv-parser - Fixed Issue with Leading Comments

Fixed issue described by #67 where leading comments got concatenated to the first column name

- C++
Published by vincentlaucsb over 6 years ago

https://github.com/vincentlaucsb/csv-parser - Fixed clang++ compilation warnings and single header version

  • Fixed clang++ compilation warnings and UTF-8 BOM detection (with the help of @tamaskenez)
  • Fixed compilation errors that occurred when including single header csv.hpp in multiple files

- C++
Published by vincentlaucsb over 6 years ago

https://github.com/vincentlaucsb/csv-parser - Added JSON serialization

CSVRow objects now have to_json() and to_json_array() methods with proper string escaping and column slicing/rearranging. The CSVFormat interface is now also more robust.

- C++
Published by vincentlaucsb over 6 years ago

https://github.com/vincentlaucsb/csv-parser - Fixed bug where user provided column names are overwritten

Fixed bug #44 where user provided column names are overwritten if delimiter guessing is used.

- C++
Published by vincentlaucsb almost 7 years ago

https://github.com/vincentlaucsb/csv-parser - Added whitespace trimming

Added efficient whitespace trimming which can be enabled by calling trim({ … }) on CSVFormat.

- C++
Published by vincentlaucsb almost 7 years ago

https://github.com/vincentlaucsb/csv-parser - CSVWriter and CSVField::get<>() Improvements

  • Integrated Hedley library
    • Possible performance increase on older compilers due to use of restrict, pure, etc.
    • Better handling of integer types with get()
    • get<>() is now supported for all signed integer types
    • Removed complications regarding the (mostly) useless long int type
    • CSVWriter now accepts deque<string> and list<string> as inputs
  • Updated Catch to latest version
    • Refactored unit tests

- C++
Published by vincentlaucsb about 7 years ago

https://github.com/vincentlaucsb/csv-parser - Performance + API Improvements

Found a bug in the previous release

- C++
Published by vincentlaucsb about 7 years ago

https://github.com/vincentlaucsb/csv-parser - Performance + API Enhancements

  • Improved performance by storing all CSVRow data in contiguous memory regions
    • Numbers based on my computer
      • Disk parsing speed: 220 MB/s
      • In-memory parsing speed: 380 MB/s
  • Improved CSVFormat interface

- C++
Published by vincentlaucsb about 7 years ago

https://github.com/vincentlaucsb/csv-parser - Memory Fix

  • Fixed memory issues reported by Valgrind
  • Fixed segfaults that occurred in Microsoft Visual Studio debug builds

- C++
Published by vincentlaucsb about 7 years ago

https://github.com/vincentlaucsb/csv-parser - C++11 Compatibility

CSV library should now be compatible with C++11 by using a third-party string_view implementation. If C++17 is detected, then std::string_view will be used.

- C++
Published by vincentlaucsb about 7 years ago

https://github.com/vincentlaucsb/csv-parser - Floating Point Parsing Fix

- C++
Published by vincentlaucsb about 7 years ago

https://github.com/vincentlaucsb/csv-parser - Major performance increase

  • Parser went from 70MB/s (v1.0.0) to 110MB/s on my laptop
  • Added csv_data_types() helper function

- C++
Published by vincentlaucsb almost 8 years ago

https://github.com/vincentlaucsb/csv-parser - First Release

- C++
Published by vincentlaucsb almost 8 years ago