Recent Releases of https://github.com/vincentlaucsb/csv-parser
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.3.0: Race Condition Fix
What's Changed
- CSVField: new member function try_parse_decimal() to specify one or more decimal symbols by @wilfz in https://github.com/vincentlaucsb/csv-parser/pull/226
- Replace the includes of Windows.h with windows.h (#204) by @ludovicdelfau in https://github.com/vincentlaucsb/csv-parser/pull/235
- Use const CSVFormat& in calculate_score by @rajgoel in https://github.com/vincentlaucsb/csv-parser/pull/236
- Fix memory issues in CSVFieldList by @vincentlaucsb in https://github.com/vincentlaucsb/csv-parser/pull/237
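To illustrate what parsing with a caller-chosen decimal symbol involves, here is a minimal standalone sketch. The function `parse_decimal` and its signature are hypothetical and are not the library's API. It accepts a single decimal symbol, whereas the release note mentions one or more.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of csv-parser): parse an optionally signed
// decimal number whose fractional part is introduced by `decimal_symbol`.
// Returns false if the text is empty, has no digits, or has stray characters.
bool parse_decimal(const std::string& text, char decimal_symbol, long double& out) {
    std::size_t i = 0;
    bool negative = false;
    if (i < text.size() && (text[i] == '+' || text[i] == '-')) {
        negative = (text[i] == '-');
        ++i;
    }

    long double value = 0;   // all digits read so far, as an integer
    long double scale = 1;   // 10^(number of digits after the symbol)
    bool seen_digit = false;
    bool seen_symbol = false;

    for (; i < text.size(); ++i) {
        const char ch = text[i];
        if (ch >= '0' && ch <= '9') {
            seen_digit = true;
            value = value * 10 + (ch - '0');
            if (seen_symbol) scale *= 10;
        } else if (ch == decimal_symbol && !seen_symbol) {
            seen_symbol = true;
        } else {
            return false;  // unexpected character or a second decimal symbol
        }
    }

    if (!seen_digit) return false;
    out = (negative ? -value : value) / scale;
    return true;
}
```

With this sketch, "3,14" parses successfully when the symbol is ',' and is rejected when the symbol is '.'.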
Race Condition Notes
Background
The CSV Parser tries to perform as few allocations as possible. Instead of naively storing individual CSV fields as separate std::string objects in a std::vector, the parser keeps references to the raw input and uses lightweight RawCSVField objects to mark where a specific field starts and ends in that input (as well as a flag indicating whether an escaped quote is present). This has the benefits of:
- Avoiding the cost of constructing many std::string instances
- Avoiding the cost of constant std::vector reallocations
- Preserving locality of reference
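The idea of marking field boundaries instead of copying field text can be sketched as follows. `FieldSpan`, `index_fields`, and `field_text` are hypothetical names chosen for illustration and are not the library's internals. The sketch also ignores quoting to stay short, and assumes a C++17 compiler for std::string_view.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a RawCSVField-like record: it stores where a
// field lives in the raw input rather than owning a copy of the text.
struct FieldSpan {
    std::size_t start;        // offset of the field's first character
    std::size_t length;       // number of characters in the field
    bool has_escaped_quote;   // always false here; quoting is not handled
};

// Record the boundaries of every field in one unquoted row.
std::vector<FieldSpan> index_fields(std::string_view row, char delim = ',') {
    std::vector<FieldSpan> spans;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= row.size(); ++i) {
        // A delimiter or the end of the row closes the current field.
        if (i == row.size() || row[i] == delim) {
            spans.push_back({start, i - start, false});
            start = i + 1;
        }
    }
    return spans;
}

// Produce a view of the field on demand; no characters are copied.
std::string_view field_text(std::string_view row, const FieldSpan& span) {
    return row.substr(span.start, span.length);
}
```

Each span costs a few machine words regardless of field length, and the views remain valid only as long as the raw input buffer is kept alive.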
Furthermore, the CSV Parser also uses separate threads for parsing CSV and for iterating over the data. As CSV rows are parsed, they are made available to the user who may utilize them without interrupting the parsing of new rows.
The Race Condition
The RawCSVField objects mentioned previously were stored as contiguous blocks, and a std::vector of pointers to these blocks was used to keep track of them.
However, as @ludovicdelfau accurately diagnosed, if the reading thread attempted to access a RawCSVField (e.g. through reading a CSVField) at the same time that a parsing thread was pushing a new RawCSVField to an at-capacity std::vector, the parsing thread's push would cause the contents of the std::vector to be reallocated, thus causing the reading thread to access deallocated memory.
This issue was first reported in #217.
The Fix
The fix was simple. A std::deque was dropped in to replace std::vector to store RawCSVField pointers, as pushing to the end of a std::deque does not relocate the elements it already holds. This change appears to even improve the CSV Parser's performance as the cost of constant reallocations is avoided. The loss of memory locality typical in std::deque applications was avoided as, again, the CSV Parser is storing pointers to RawCSVField[] and not the RawCSVField objects themselves.
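The difference between the two containers can be observed in a single thread. The sketch below uses a hypothetical helper, `first_element_stays_put`, which records the address of the first element, appends many more, and checks whether that address is unchanged. It illustrates the container property only and is not the library's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical helper: push one element, remember where it lives, push
// `extra` more, and report whether the first element is still at the same
// address. The address is saved as an integer so that a possibly stale
// pointer is never compared or dereferenced.
template <typename Container>
bool first_element_stays_put(Container& c, std::size_t extra) {
    c.push_back(0);
    const auto before = reinterpret_cast<std::uintptr_t>(&c.front());
    for (std::size_t i = 0; i < extra; ++i) {
        c.push_back(static_cast<int>(i));
    }
    return before == reinterpret_cast<std::uintptr_t>(&c.front());
}
```

For a std::deque<int> the helper returns true, because push_back leaves references to existing elements valid. For a std::vector<int> that has to grow past its capacity, it returns false on typical implementations, which is exactly the window in which a concurrent reader would see freed memory.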
New Contributors
- @wilfz made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/226
- @ludovicdelfau made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/235
- @rajgoel made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/236
Full Changelog: https://github.com/vincentlaucsb/csv-parser/compare/2.2.3...2.3.0
- C++
Published by vincentlaucsb almost 2 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.3
- Fix n_rows() being off-by-one when the CSVReader iterator was used (reported in #173)
  - Note: This was due to a simple counting error where the iterator did not increment the row counter for the first row. All rows were still correctly read.
- Implement ability to handle arbitrary combinations of \r and \n in line endings (#223)
- Fix CSV writers incorrectly converting decimal values between 0 and -1 to positive numbers
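One way to tolerate mixed line endings is to treat any run of \r and \n characters as a single row break. The sketch below does that under two assumptions that may not match the library: empty lines are skipped, and newlines inside quoted fields are not handled. `split_rows` is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split raw text into rows, accepting \n, \r\n, \r,
// or any other mix of the two characters as a row terminator.
std::vector<std::string> split_rows(const std::string& text) {
    std::vector<std::string> rows;
    std::string current;
    for (char ch : text) {
        if (ch == '\r' || ch == '\n') {
            // Close the row in progress; consecutive terminators add nothing.
            if (!current.empty()) {
                rows.push_back(current);
                current.clear();
            }
        } else {
            current += ch;
        }
    }
    // The last row may not be followed by a terminator.
    if (!current.empty()) rows.push_back(current);
    return rows;
}
```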
- C++
Published by vincentlaucsb about 2 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.2
What's Changed
- Allow parsing of numbers that begin with +, fixing #213
- Fix compiler warnings in g++ from using abs and in try_parse_hex() https://github.com/vincentlaucsb/csv-parser/pull/227
- Fix invalid memory access issue in g++ builds https://github.com/vincentlaucsb/csv-parser/pull/228
  - Issue was caused when using CSVField methods in conjunction with CSVRow reverse iterators
- CMake options to disable programs building by @BaptisteLemarcis in https://github.com/vincentlaucsb/csv-parser/pull/148
New Contributors
- @BaptisteLemarcis made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/148
Full Changelog: https://github.com/vincentlaucsb/csv-parser/compare/2.2.1...2.2.2
- C++
Published by vincentlaucsb about 2 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.1
This is a simple CMake change that makes it easier to #include "csv.hpp" in a CMake project that grabs csv-parser using FetchContent_Declare().
What's Changed
- Provide directory of library's header as the include directory by @grosscol in https://github.com/vincentlaucsb/csv-parser/pull/220
New Contributors
- @grosscol made their first contribution in https://github.com/vincentlaucsb/csv-parser/pull/220
Full Changelog: https://github.com/vincentlaucsb/csv-parser/compare/2.2.0...2.2.1
- C++
Published by vincentlaucsb about 2 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.2.0
- Fixed bug which caused inaccurate serialization of floating point values in CSV Writer as reported by #188
  - Bug affected numbers close to 10^n; was caused by usage of inaccurate std::log() function (see: https://stackoverflow.com/questions/1489830/efficient-way-to-determine-number-of-digits-in-an-integer)
- Fixed issue where strings consisting of numbers and dashes (e.g. phone numbers) were inaccurately identified as integers
- Silenced some compiler warnings
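The digit-count problem behind the serialization bug can be avoided by not using logarithms at all. The sketch below counts digits by repeated integer division, which is exact at every power of ten. `count_digits` is a hypothetical name, and this is one common technique, not necessarily the fix applied in the library.

```cpp
#include <cstdint>

// Hypothetical helper: number of decimal digits in `value`, computed with
// integer arithmetic only, so there is no rounding error near 10^n.
int count_digits(std::uint64_t value) {
    int digits = 1;          // zero still takes one digit to print
    while (value >= 10) {
        value /= 10;
        ++digits;
    }
    return digits;
}
```

For example, 999 yields 3 and 1000 yields 4, a boundary where a floating-point logarithm that rounds the wrong way would report the wrong count.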
- C++
Published by vincentlaucsb about 2 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.1.3
- Fix various compatibility issues with g++ and clang
- Added hex value parsing
- Fixed a rare out-of-bounds condition
- C++
Published by vincentlaucsb almost 5 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.1.2
- Fixed compilation issues with C++11 and 14.
- CSV Parser should now be C++11 compatible once again with g++ 7.5 or up
- Allowed users to customize decimal place precision when writing CSVs
- Fixed floating point output
- Arbitrarily large integers stored in doubles can now be output w/o limits
- Fixed newlines not being escaped by CSVWriter
- C++
Published by vincentlaucsb almost 5 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.1.1
- Fixed CSVStats only processing first 5000 rows thanks to @TobyEalden
- Fixed parsing """fields like this""" thanks to @rpadrela
- Fixed CSVReader move semantics thanks to @artpaul
- C++
Published by vincentlaucsb about 5 years ago
https://github.com/vincentlaucsb/csv-parser - Minor Patch
Fixed #142 where decimal numbers were not being printed properly by CSVWriter, and incorporated #137 and #134
- C++
Published by vincentlaucsb over 5 years ago
https://github.com/vincentlaucsb/csv-parser - Better, faster, stronger
New Features
- CSVReader can now parse from memory mapped files, std::stringstream, and std::ifstream
- DelimWriter now supports writing rows encoded as std::tuple
- DelimWriter automatically converts numbers and other data types stored in vectors, arrays, and tuples
Improvements
- CSVReader is now a no-copy parser when memory-mapped IO is used
  - CSVRow and CSVField now refer to the original memory map
- Significant performance improvements for some files
Bug Fixes
- Fixed potential thread safety issues with internals::CSVFieldList
API Changes
- CSVReader::feed() and CSVReader::end_feed() have been removed. In-memory parsing should be performed via the interface for std::stringstream.
- C++
Published by vincentlaucsb over 5 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.0.1
- Made parsing CSV files without header rows more convenient
- Fixed a compilation error with std::back_inserter on some systems
- C++
Published by vincentlaucsb over 5 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.0.0
- Parser now uses memory-mapped IO for reading from disk thanks to mio
  - CSV files are read in smaller chunks to reduce memory footprint (but parsing is significantly faster)
- CSVReader::read_row() (and CSVReader::iterator) no longer blocks CSVReader::read_csv(), i.e. we can now simultaneously work on CSV data while reading more rows
- Parser internals completely rewritten to use more efficient and easier to maintain/debug data structures
- Fixed bug where single column files could not be parsed
- Fixed errors with parsing empty files
- CSVWriter::write_row() now works with std::array
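Working on rows while more are still being read is a producer/consumer arrangement. The sketch below shows the general pattern with a mutex-protected queue: one thread stands in for the parser and the calling thread consumes rows as they arrive. `RowQueue` and `read_all` are hypothetical names, and this is not how the library's internals are written.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hand-off queue between a parsing thread and a reading thread.
class RowQueue {
public:
    // Called by the producer for every finished row.
    void push(std::string row) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            rows_.push_back(std::move(row));
        }
        ready_.notify_one();
    }

    // Called by the producer once no more rows will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Waits for a row; returns false once the queue is closed and drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !rows_.empty() || closed_; });
        if (rows_.empty()) return false;
        out = std::move(rows_.front());
        rows_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> rows_;
    bool closed_ = false;
};

// Feed `source` through the queue from a second thread and collect the rows
// in the order the consumer receives them.
std::vector<std::string> read_all(const std::vector<std::string>& source) {
    RowQueue queue;
    std::thread parser([&] {
        for (const auto& row : source) queue.push(row);
        queue.close();
    });

    std::vector<std::string> seen;
    std::string row;
    while (queue.pop(row)) seen.push_back(row);

    parser.join();
    return seen;
}
```

With a single producer and a single consumer the rows come out in the order they were pushed, and the consumer can start processing before the producer has finished.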
- C++
Published by vincentlaucsb over 5 years ago
https://github.com/vincentlaucsb/csv-parser - CSV Parser 2.0 Beta: >300 MBps Edition
- Parser now uses memory-mapped IO for reading from disk
- On Windows, parser may map entire file into memory or mmap chunks of file iteratively based on available RAM (will extend to all OSes)
- Parser internals completely rewritten to use more efficient and easier to maintain/debug data structures
- New algorithm involves minimal copying
- Fixed bug where single column files could not be parsed
- Fixed errors with parsing empty files
- C++
Published by vincentlaucsb over 5 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed memory errors when parsing large files
- Fixed issue with incorrect usage of string_view that led to memory errors when parsing large files such as the 1.4GB Craigslist vehicles dataset #90
- Added ability to have no quote character #83
- Changed VariableColumnPolicy::IGNORE to IGNORE_ROW to avoid clashing with IGNORE macro as defined by WinBase.h #96
- C++
Published by vincentlaucsb about 6 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed bug with parsing very long rows
- Fixed bug with parsing very long rows (as reported in #92) when the length of the row was greater than 2^16 (the limit of unsigned short)
  - All instances of unsigned short have been replaced by internals::StrBufferPos (size_t) thus giving this parser the theoretical capability of parsing rows that are 2^64 characters long
- Fixed bug recognizing numbers in e-notation when the base did not have a decimal, e.g. 1E-06
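The long-row bug comes down to a position that no longer fits in its storage type. The sketch below shows what happens when a buffer position is narrowed to unsigned short, assuming the usual 16-bit width. `narrow_position` and `survives_narrowing` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Store a buffer position in an unsigned short. Unsigned narrowing is well
// defined in C++: the value wraps modulo 2^16 when the type is 16 bits wide.
unsigned short narrow_position(std::size_t pos) {
    return static_cast<unsigned short>(pos);
}

// True if the position can be read back unchanged after narrowing.
bool survives_narrowing(std::size_t pos) {
    return narrow_position(pos) == pos;
}
```

Position 65535 survives, but position 70000 comes back as 4464 (70000 - 65536), so a field boundary past that point would be recorded in the wrong place. Storing positions as size_t removes the wraparound.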
- C++
Published by vincentlaucsb about 6 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed bug with whitespace trimming when a field is entirely whitespace
Fixes incorrect CSV parsing when whitespace trimming is enabled and a field is composed entirely of whitespace characters as reported in #85
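The all-whitespace case is easy to get wrong because there is no first non-whitespace character to anchor on. A minimal sketch of a trim routine that handles it explicitly follows. `trim` and its default whitespace set are assumptions for illustration, and the library's own trimming works on its internal buffers rather than on std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strip leading and trailing characters found in `ws`.
std::string trim(const std::string& field, const std::string& ws = " \t") {
    const std::size_t first = field.find_first_not_of(ws);
    // No non-whitespace character at all: the trimmed field is empty.
    if (first == std::string::npos) return "";
    const std::size_t last = field.find_last_not_of(ws);
    return field.substr(first, last - first + 1);
}
```

Without the early return, `first` would be std::string::npos and the substr call would throw std::out_of_range for any non-empty all-whitespace field.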
- C++
Published by vincentlaucsb about 6 years ago
https://github.com/vincentlaucsb/csv-parser - First class handling of variable column CSVs
- The behavior for parsing variable-column CSV files can now be simply defined using CSVFormat::variable_columns()
  - Variable-column rows can be kept or silently dropped (default), or result in an error being thrown
  - CSVReader::bad_row_handler() has been removed
- Many annoying clang/gcc warning messages fixed (thanks rpavlik!)
- CSV guessing implementation has been simplified (CSVGuesser is also gone now)
- C++
Published by vincentlaucsb about 6 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed bug where get<>() threw incorrect overflow errors with unsigned integers
Fixed bug reported in #73
- C++
Published by vincentlaucsb about 6 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed Issue with Leading Comments
Fixed issue described by #67 where leading comments got concatenated to the first column name
- C++
Published by vincentlaucsb over 6 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed bug reading rows that begin with empty fields
- C++
Published by vincentlaucsb over 6 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed clang++ compilation warnings and single header version
- Fixed clang++ compilation warnings and UTF-8 BOM detection (with the help of @tamaskenez)
- Fixed compilation errors that occurred when including single header csv.hpp in multiple files
- C++
Published by vincentlaucsb over 6 years ago
https://github.com/vincentlaucsb/csv-parser - Added JSON serialization
CSVRow objects now have to_json() and to_json_array() methods with proper string escaping and column slicing/rearranging. The CSVFormat interface is now also more robust.
- C++
Published by vincentlaucsb over 6 years ago
https://github.com/vincentlaucsb/csv-parser - Fixed bug where user provided column names are overwritten
Fixed bug #44 where user provided column names are overwritten if delimiter guessing is used.
- C++
Published by vincentlaucsb almost 7 years ago
https://github.com/vincentlaucsb/csv-parser - Added whitespace trimming
Added efficient whitespace trimming which can be enabled by calling trim({ … }) on CSVFormat.
- C++
Published by vincentlaucsb almost 7 years ago
https://github.com/vincentlaucsb/csv-parser - CSVWriter and CSVField::get<>() Improvements
- Integrated Hedley library
  - Possible performance increase on older compilers due to use of restrict, pure, etc.
- Better handling of integer types with get()
  - get<>() is now supported for all signed integer types
  - Removed complications regarding the (mostly) useless long int type
- CSVWriter now accepts deque<string> and list<string> as inputs
- Updated Catch to latest version
- Refactored unit tests
- C++
Published by vincentlaucsb about 7 years ago
https://github.com/vincentlaucsb/csv-parser - Performance + API Improvements
Found a bug in the previous release
- C++
Published by vincentlaucsb about 7 years ago
https://github.com/vincentlaucsb/csv-parser - Performance + API Enhancements
- Improved performance by storing all CSVRow data in contiguous memory regions
  - Numbers based on my computer
    - Disk parsing speed: 220 MB/s
    - In-memory parsing speed: 380 MB/s
- Improved CSVFormat interface
- C++
Published by vincentlaucsb about 7 years ago
https://github.com/vincentlaucsb/csv-parser - Memory Fix
- Fixed memory issues reported by Valgrind
- Fixed segfaults that occurred in Microsoft Visual Studio debug builds
- C++
Published by vincentlaucsb about 7 years ago
https://github.com/vincentlaucsb/csv-parser - C++11 Compatibility
CSV library should now be compatible with C++11 by using a third-party string_view implementation. If C++17 is detected, then std::string_view will be used.
- C++
Published by vincentlaucsb about 7 years ago
https://github.com/vincentlaucsb/csv-parser - Floating Point Parsing Fix
- C++
Published by vincentlaucsb about 7 years ago
https://github.com/vincentlaucsb/csv-parser - Major performance increase
- Parser went from 70MB/s (v1.0.0) to 110MB/s on my laptop
- Added csv_data_types() helper function
- C++
Published by vincentlaucsb almost 8 years ago
https://github.com/vincentlaucsb/csv-parser - First Release
- C++
Published by vincentlaucsb almost 8 years ago