dwarfs - dwarfs-0.13.0

FreeBSD, big-endian, and many new architectures

This release finally includes FreeBSD in the list of supported operating systems. That should make it much easier to port DwarFS to other *BSDs as well. Big-endian platforms are also supported, and the file system images (which use little-endian) are fully portable between architectures. Binary releases are now available for a wide range of architectures: aarch64, arm, i386, loongarch64, ppc64, ppc64le, riscv64, s390x, and x86_64.
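
Images stay portable across architectures when on-disk integers are always decoded as little-endian, whatever the host byte order. The following is only a minimal sketch of that idea, not DwarFS code; `load_le32` is a hypothetical helper name.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the DwarFS API): decode a 32-bit
// little-endian value from raw bytes. The value is assembled with shifts
// rather than by reinterpreting memory, so the result is identical on
// little-endian and big-endian hosts.
inline std::uint32_t load_le32(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Shift-based decoding avoids any `#ifdef` on host endianness, at the cost of relying on the compiler to turn it into a single load on little-endian targets.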

Metadata manipulation

Before this release, file system metadata was immutable once a DwarFS image had been created, and the only way to manipulate metadata was to build a new image from scratch. This release adds two options to mkdwarfs:

--rebuild-metadata allows changes/upgrades to the metadata block. This makes it easy to change how the metadata is packed, or even perform manipulations such as --chmod after the fact.
--change-block-size allows you to change the physical block size of the file system image.

Bug fixes

The linker configuration for the release binaries was broken. The symptom was that, very occasionally, tests would fail in CI with std::terminate being called after the exception handling code failed to unwind the stack. The root cause was that, in clang builds, code from libunwind and libstdc++ was arbitrarily mixed, which—depending on the order in which individual threads were scheduled in the unit tests—could lead to stack unwinding working flawlessly or not at all. This was one of the hardest bugs to track down this year; fortunately, the fix was quite simple. It’s possible this issue is present in previously released binaries, although there have been no reports.
Made section index discovery more robust. Fixes #264.
A recent kernel change (LKML thread, 2025-05-05) caused tools_test to fail on Linux 6.14 and later. This has been fixed by accepting both EPERM and ENOSYS as valid error codes for link() calls.
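
The tools_test fix above amounts to treating two errno values as equally acceptable. A minimal sketch of such a check follows; `is_expected_link_error` is a hypothetical name, and the actual test code may be structured differently.

```cpp
#include <cerrno>

// Hypothetical test helper: returns true if `err` (the errno value left
// behind by a failed link() call) is one of the two error codes that the
// release notes describe as valid.
inline bool is_expected_link_error(int err) {
  return err == EPERM || err == ENOSYS;
}
```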

Features

FreeBSD support. Everything that works on Linux should also work on FreeBSD. No static binaries are provided for FreeBSD, but the build should work out of the box once all dependencies are installed.
Big-endian architectures. This is still experimental, even though all unit tests pass under QEMU and the benchmark suite runs on real hardware. This currently requires forked versions of folly and fsst. The changes are small, and the pull requests will hopefully be merged upstream soon. (https://github.com/facebook/folly/pull/2484, https://github.com/cwida/fsst/pull/36)
Experimental 32-bit support. While DwarFS should largely “just work” on 32-bit with small images (a few hundred megabytes), limited address space is a problem due to DwarFS’s extensive use of memory-mapped files. There will be changes to limit the use of mmap in the future (primarily due to other issues), which should improve 32-bit compatibility as a side effect. Fixes #268.
Wider CPU architecture coverage. Static binary releases (including universal binaries) are now available for x86_64, aarch64, i386, arm, ppc64, ppc64le, riscv64, s390x, and loongarch64. Building the new release binaries uncovered a few bugs in clang (https://github.com/llvm/llvm-project/issues/150913), binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=33223), mold (https://github.com/rui314/mold/issues/1490, https://github.com/rui314/mold/issues/1496, https://github.com/rui314/mold/issues/1497, https://github.com/rui314/mold/issues/1498), and UPX (https://github.com/upx/upx/issues/925), not all of which have been fixed. As a result, the binaries use slightly different toolchains and configurations depending on the architecture. Fixes #266, #268.
Custom self-extracting stub for universal binaries. It aims for simplicity and portability and should work on most Linux systems. It is used if UPX support for an architecture is unavailable, or if the binaries extract much faster than the UPX-compressed version. The stub also supports --extract-wrapped-binary <file> to extract the embedded binary.
Category metadata stored by default. The category metadata for categorized blocks is now stored in the metadata block by default. This allows recompressing blocks with a metadata-dependent algorithm (e.g., FLAC) even if they were previously compressed with a metadata-independent algorithm. You can disable this with --no-category-metadata. See the mkdwarfs man page for details.
Options for smaller metadata. The --no-category-names and --no-category-metadata options can be used to reduce metadata size. However, this makes it impossible to use metadata-dependent compression algorithms (e.g., FLAC) or to select category-specific compression when recompressing the image.
Metadata rebuilding in mkdwarfs. In addition to recompressing, it is now possible to change metadata packing and apply operations such as --set-owner, --set-group, --set-time, --time-resolution, --chmod, or --no-create-timestamp. Note that these operations are potentially lossy and may be irreversible. By default, the history of metadata rebuilds is tracked in the metadata itself; you can disable this with --no-metadata-version-history.
Change block size on existing images. You can now change the block size of an existing image using --change-block-size. This implies --rebuild-metadata and --recompress=all and can be useful for tuning performance without recreating the image from scratch.
Runtime memory display in mkdwarfs. mkdwarfs now shows its current memory usage while running. Note that -L/--memory-limit still only limits the memory used for the block queue, not overall memory usage. Fixing this is on the roadmap; there’s no need to file an issue.
dwarfsextract format controls. New options --format-options and --format-filters control the output format. There is also --format=auto to automatically “guess” the format and filters based on the output file name. (Thanks to @oxalica for the pull request.)
dwarfsck detail level. New frozen_details detail level shows frozen_analysis content ordered by memory location instead of memory usage, and also shows the address range of each section.
Lean LRU cache. Replaced folly’s EvictingCacheMap with a simple in-repo LRU cache implementation. This reduces external dependencies and binary size without sacrificing performance.
Windows extended attributes. The pxattr utility now supports all extended attribute operations on Windows, including setxattr() and removexattr(). Error handling and reporting for extended attributes on Windows has also been improved.
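
For readers unfamiliar with the data structure behind the "Lean LRU cache" item above, here is a minimal sketch of a list-plus-hash-map LRU cache. It is an illustration only and not the in-repo implementation; the class name `lru_cache` and its interface are assumptions.

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Illustrative LRU cache: a doubly linked list keeps entries in recency
// order (front = most recently used) and a hash map points into the list.
template <typename K, typename V>
class lru_cache {
 public:
  explicit lru_cache(std::size_t capacity) : capacity_{capacity} {}

  // Insert or update an entry; evicts the least recently used entry
  // when the cache is full.
  void put(const K& key, V value) {
    if (auto it = index_.find(key); it != index_.end()) {
      it->second->second = std::move(value);
      items_.splice(items_.begin(), items_, it->second);
      return;
    }
    if (capacity_ == 0) {
      return;
    }
    if (items_.size() == capacity_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, std::move(value));
    index_[key] = items_.begin();
  }

  // Look up an entry and mark it as most recently used.
  std::optional<V> get(const K& key) {
    auto it = index_.find(key);
    if (it == index_.end()) {
      return std::nullopt;
    }
    items_.splice(items_.begin(), items_, it->second);
    return it->second->second;
  }

  std::size_t size() const { return items_.size(); }

 private:
  using item_list = std::list<std::pair<K, V>>;
  std::size_t capacity_;
  item_list items_;
  std::unordered_map<K, typename item_list::iterator> index_;
};
```

Both `put` and `get` run in expected constant time because `std::list::splice` moves a node without invalidating the iterators stored in the map.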

Docs

Updated dwarfs-format.md with more information on section types, compression metadata, and the Frozen2 binary metadata layout. (Thanks to @oxalica for asking the right questions, reporting bugs, and ultimately releasing a Rust library to read/write DwarFS images.)
Added a notable users section to the README. (Thanks to Vitaly Zdanevich for the PRs.)
Updated mkdwarfs docs with more information on worker threads and the requirements for bit-identical images.
Major README overhaul: added a Quick Start section, added more links, and fixed typos and wording.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.12.4...v0.13.0

SHA-256 Checksums

```
eb4a21fe560721a17059eb26b14abb894420008cc0dc990b829bdfda08e97af8  dwarfs-0.13.0-Linux-aarch64.tar.xz
3781cf4e5dde77f4e7da9900dca4250d300b57bbf5ba6640ba7f61e2efb5782f  dwarfs-0.13.0-Linux-arm.tar.xz
d7157d7a2faedea61829835060edfb1614d75b8559252f237aae38a97a684e9c  dwarfs-0.13.0-Linux-i386.tar.xz
786351112f1659d041e40ef70a4376c55f68bb783fc710ff71a1cb612b6786f5  dwarfs-0.13.0-Linux-loongarch64.tar.xz
465c0a2c14f13612a0413ba4e79f853cefdcc25a82d8ea7e74e3a349ac887983  dwarfs-0.13.0-Linux-ppc64le.tar.xz
cb44187074793b4aca9e252fb0bee269725180448e7a47abbf17e38b0b32e6ac  dwarfs-0.13.0-Linux-ppc64.tar.xz
0d8eac724b5c00f22592c68c2f61a1ae7be92aef0b236ace208e73e759e6965f  dwarfs-0.13.0-Linux-riscv64.tar.xz
52dbd8e44deede8e237d4b8c2fe1388b1478c14b1f76b63b4045a7103110acfb  dwarfs-0.13.0-Linux-s390x.tar.xz
81d2c5178b51367207df819a3a754fc3141d0f74d4ac80dbf20b1f0bd9d1be44  dwarfs-0.13.0-Linux-x86_64.tar.xz
d0654fcc1219bfd11c96f737011d141c3ae5929620cd22928e49f25c37a15dc9  dwarfs-0.13.0.tar.xz
fbbdf50657caf6be3c6864768bd2f0c2f6ea955b66e07875408e0a78bed2f9b9  dwarfs-0.13.0-Windows-AMD64.7z
78c52f9ca120e11d4a4620c614d532425ab694c0f8b6c25f04b475f10b3e0b2e  dwarfs-fuse-extract-0.13.0-Linux-aarch64
451744a2be3312fcd2968aa7821d61d97ece24177cebdb4cb8cf463409f9de7d  dwarfs-fuse-extract-0.13.0-Linux-aarch64.upx
0079697de87e14ea5bbf3a0dd0e95488d8343bd423f2ca3643462520a6c95b89  dwarfs-fuse-extract-0.13.0-Linux-arm.upx
7d3fc8474c1372a92b4301c28124fac16f858d81686ee4cafe21e8b6a1cebc14  dwarfs-fuse-extract-0.13.0-Linux-i386.upx
da9558153264fd14ba49f02d9fb26a8ef5c090a3e99a643fbeb9e88b68c3fd93  dwarfs-fuse-extract-0.13.0-Linux-loongarch64
b42ad7d4a229d22ecf219fd1759f1f1ebb0ccd28b4be16e49cf307f05b154f63  dwarfs-fuse-extract-0.13.0-Linux-ppc64
a81001d3a195eab8cdfd2b29768718ff5eae64a69174d33c5114aa04b87bdb6a  dwarfs-fuse-extract-0.13.0-Linux-ppc64le
9c9f4dcee5c78466cda8d3f3b5dccaa57be0270774e373198ab5cdc6f46a8c5d  dwarfs-fuse-extract-0.13.0-Linux-riscv64
aeae39035e52632a013f318909b38e0525dc04ba4e52179ab678807b6a3755ea  dwarfs-fuse-extract-0.13.0-Linux-s390x
04c9587395dbbdfdb6562f945daf2aa83e5ff80691e1dfeb7854dfc2e0e6993e  dwarfs-fuse-extract-0.13.0-Linux-x86_64
4a31bcbf9145fa422d3cfaf3f1f58b40cd095ba9a4bee508123f97e208684f52  dwarfs-fuse-extract-0.13.0-Linux-x86_64.upx
74b730031dfe4d4aa0caaa411ea806d02ba44f32dc2618e52739ce7b52faeb4c  dwarfs-universal-0.13.0-Linux-aarch64
3ce4f428869f96c025487ffe0f6625e60bffe3768be19c21062faeaab30c8901  dwarfs-universal-0.13.0-Linux-aarch64.upx
6c911b5fbc891d9ae29f5ac20dca6748dc2ed9fc7cafb39f44163439d563431e  dwarfs-universal-0.13.0-Linux-arm.upx
da7e154272bc9efb47acbf57c5f065e0c45456fc9aa5c19e6a14395516dbaf78  dwarfs-universal-0.13.0-Linux-i386.upx
77d26cd3ada34f0ab846850b12c319c32c87ddb06940cc371014dd6d6f8c20e0  dwarfs-universal-0.13.0-Linux-loongarch64
c64d49d14043bebf72d374dbf332397acf242eb72caabd77c2d22c2529a22f03  dwarfs-universal-0.13.0-Linux-ppc64
83aa8161f9f305246e89830c15f382a3d7e5f88a7dd2dc256f2c4c5ddf2177d2  dwarfs-universal-0.13.0-Linux-ppc64le
ab51c1aa3f55b148b7360935df68d342c91c77859a0f62bb9facb6002fa15d6b  dwarfs-universal-0.13.0-Linux-riscv64
97037f5df06e12f705c54ad77da4b0defd835779b9378cb0196c6e0f97e4fe2f  dwarfs-universal-0.13.0-Linux-s390x
6e003ec681abc5542af54eb7dda3efb7c3742fc319e8973f53392b1e63de8ac6  dwarfs-universal-0.13.0-Linux-x86_64
df74d98efdce7e3deaef711fd65868a2e5afc9f0a653f326ea7b08c28e99709d  dwarfs-universal-0.13.0-Linux-x86_64.upx
17465a7a9df4bccc3bda86eeca66a85806f1b19de0a21d45d3486fa2bed3a4fa  dwarfs-universal-0.13.0-Windows-AMD64.exe
```

- C++
Published by mhx 6 months ago

dwarfs - dwarfs-0.12.4

This release mainly fixes a bug that was introduced with v0.12.0 (06f8728cc to be precise). When re-compressing a file system image where some blocks cannot be compressed using the selected algorithm because of a bad_compression_ratio_error (i.e. when the compressed size is actually larger than the uncompressed size), the resulting block object was left empty, which subsequently led to a segfault.
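
The failure mode described above is avoided if a block that does not shrink under compression is kept in its uncompressed form instead of being left empty. The sketch below illustrates that fallback under simplifying assumptions: it uses a plain size comparison where DwarFS signals the condition with bad_compression_ratio_error, and `block`, `bytes`, and `recompress_block` are hypothetical names rather than DwarFS types.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Hypothetical result type: the stored data plus a flag telling whether
// it is compressed.
struct block {
  bytes data;
  bool compressed{false};
};

// Try to compress `raw` with the caller-supplied `compress` function.
// If the output is not smaller than the input, fall back to storing the
// raw data, so the returned block is never left empty.
inline block recompress_block(
    const bytes& raw, const std::function<bytes(const bytes&)>& compress) {
  bytes out = compress(raw);
  if (out.size() >= raw.size()) {
    return block{raw, false};
  }
  return block{std::move(out), true};
}
```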

A few small bugs have also been fixed and a couple of features were added to the dwarfsck tool, mostly triggered by the discussion in #263.

Most other changes that made it into the release were related to how the static binaries are built. All the dependencies have been updated, in particular also for the Windows build. The Windows build was also switched from openssl to libressl. Overall, the size of the Windows universal binary was reduced by 30% and is now comparable in size to the Linux binaries.

Bugfixes

Segfault on bad_compression_ratio_error. When recompressing a filesystem where some blocks cannot be compressed using the selected algorithm because of a bad_compression_ratio_error, the resulting block was left empty after the refactoring done in 06f8728cc.
Add history unless --no-history is given when rewriting a file system image.
Allow dumping frozen_layout without frozen_analysis in dwarfsck.
Logging timestamps should show local time.
Work around a weird MSVC bug.
Remove useless cast causing compiler warning.

Features

More complete breakdown of metadata in dwarfsck.
Add schema_raw_dump flag to dwarfsck --detail.

Build

Switch static build to libressl on Windows.
Update static build libraries.
Update folly/fbthrift/fsst.

Other

Introduce and use safe_localtime() to prevent issues with fmt deprecating fmt::localtime.
Speed up a few slow tests on Windows.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.12.3...v0.12.4

SHA-256 Checksums

```
1492c0796ab3479a80e4e191e651f20c005e634b73ec684edd48465830f0aab3  dwarfs-0.12.4-Linux-aarch64.tar.xz
bff9da50918cf9272dbb9322733a629ccd84efc723e355a86eafcd075081f968  dwarfs-0.12.4-Linux-x86_64.tar.xz
352d13a3c7d9416e0a7d0d959306a25908b58d1ff47fb97e30a7c8490fcff259  dwarfs-0.12.4.tar.xz
5574d6aeb970c4cbabd50526994159c0551b2dc3e940edc774ac0952e1528c93  dwarfs-0.12.4-Windows-AMD64.7z
72f688800faf74acfdc75b6d77a456930db7088ee7c7e115c4b95414ab751c93  dwarfs-fuse-extract-0.12.4-Linux-aarch64
ea182efeb3ac55f8f79ee80009854c2ac8410b37d427da90e42b96eec73c470f  dwarfs-fuse-extract-0.12.4-Linux-x86_64
06c35c1dc99bc9c19f73ceda238866f9b4a631c08ab4ad19bbf8ea5c0a3ff9f0  dwarfs-fuse-extract-mimalloc-0.12.4-Linux-aarch64
afa48dfda2692e5b3cba6c523ab66696140d48dc8c4dfd57b114c73632e45326  dwarfs-fuse-extract-mimalloc-0.12.4-Linux-x86_64
7efdbd93954f1f88898b690466dca4b2ae3d20e799247c4ee4459acb35824389  dwarfs-universal-0.12.4-Linux-aarch64
48fc469bca251b932904b33d6161fc9a964cdefdb9d4d128828b4766b9c63eaa  dwarfs-universal-0.12.4-Linux-x86_64
094eaaf821df47daaa11899927f269f57f71843b39ecd53bae70692447cb3e65  dwarfs-universal-0.12.4-Windows-AMD64.exe
```

Published by mhx 9 months ago

dwarfs - dwarfs-0.12.3

This release provides a fix for cases where automatic image offset detection could fail to work correctly as well as further size optimizations of the release binaries. The dwarfs-universal binary now uses LibreSSL's libcrypto, whereas the binaries from the release tarball use OpenSSL's libcrypto. This is a trade-off favoring size for the universal binary and speed for the "regular" binaries. Note, however, that this will be imperceptible unless you use dwarfsck with either --check-integrity or --checksum.

| Binary (size in bytes)                        |   v0.11.3 |   v0.12.0 |   v0.12.1 |   v0.12.2 |   v0.12.3 |
| -------------------------------------------- | ---------:| ---------:| ---------:| ---------:| ---------:|
| Linux x86_64 universal binary                | 5,319,916 | 2,833,280 | 2,903,624 | 2,968,252 | 2,215,464 |
| Linux aarch64 universal binary               | 4,637,312 | 2,725,864 | 2,588,924 | 2,636,912 | 2,180,928 |
| Linux x86_64 fuse-extract binary (jemalloc)  |         - | 1,183,752 |         - |   906,016 |   845,984 |
| Linux aarch64 fuse-extract binary (jemalloc) |         - | 1,188,760 |         - |   913,260 |   885,416 |
| Linux x86_64 fuse-extract binary (mimalloc)  |         - |         - | 1,075,536 |   835,500 |   774,804 |
| Linux aarch64 fuse-extract binary (mimalloc) |         - |         - | 1,059,588 |   839,740 |   811,156 |
| Linux x86_64 binary tarball                  | 7,736,712 | 3,888,104 | 3,698,356 | 3,703,712 | 3,600,544 |
| Linux aarch64 binary tarball                 | 6,791,424 | 3,497,140 | 3,271,584 | 3,296,380 | 3,258,856 |

Bugfixes

Automatic image offset detection (for images using a custom header) did not work correctly if the header contained a string that would be identified as the start of a v1 section header (these were only used before dwarfs-0.3.0). If the header contained either "DWARFS\x02\x00" or "DWARFS\x02\x01", offset detection would fail. The check now peeks further into the data to ensure that this really is a v1 section header, and also checks whether the next section header position can be derived from the length field. It is still possible to construct a file system image where offset detection will ultimately fail, but this is much less likely with the change.

Build

The build process for the release binaries has been further tweaked to reduce binary size. The dwarfs-fuse-extract binary now again supports extracting files by pattern; I didn't realize that this was actually a widely used feature before dropping it in the last release. dwarfs-universal is now linked against LibreSSL's libcrypto instead of OpenSSL's. This significantly reduces the size at the expense of slightly slower cryptographic hash functions. However, this will likely only be noticeable when using --tool=dwarfsck with either --check-integrity or --checksum. The binaries from the release tarballs are still linked against libcrypto from OpenSSL.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.12.2...v0.12.3

SHA-256 Checksums

```
7e2c1d6f4bf8f19cedc8f050f405906e269d96197e238226f334c7ae2fb1f489  dwarfs-0.12.3-Linux-aarch64.tar.xz
9a2590deb3069d7e677604304d94226fa10b7385ccee46ba8c66f4e6c902168c  dwarfs-0.12.3-Linux-x86_64.tar.xz
bd2d54178c59e229f2280eea747479a569e6f6d38340e90360220d00988f5589  dwarfs-0.12.3.tar.xz
acbfbf5a48a8fa53dd6a39d7450c8d6849ef99d305563461596ac5c52767387a  dwarfs-0.12.3-Windows-AMD64.7z
16a450e6996ab59b89f69b0e30dab508d1e88c526586df5db5b7a43ede252398  dwarfs-fuse-extract-0.12.3-Linux-aarch64
e658a0513bc9168ff6f366b2dc19f6360408663fc2f0015653912075526c05a2  dwarfs-fuse-extract-0.12.3-Linux-x86_64
755599e8b52e36a87d6b90e860f38c7cf3c71bcebd8f63e11834cca0fbed9708  dwarfs-fuse-extract-mimalloc-0.12.3-Linux-aarch64
372c3090fd966e881978aaa7ccfcbb1476cf16a3cc3ddbaaddeac14cf141cb5d  dwarfs-fuse-extract-mimalloc-0.12.3-Linux-x86_64
9c66639a66a122d964ee297b93a638c61f25c917e9aee49339fde17320757f2a  dwarfs-universal-0.12.3-Linux-aarch64
c5e83388f41cddb59b3c490fc9fad8fbd73b134f8da5b8bba77297ab593d0efb  dwarfs-universal-0.12.3-Linux-x86_64
9dbc03ce6dfc9df25e8c9e77e5ee15f624ef244aaa9b4e235fb121761c698789  dwarfs-universal-0.12.3-Windows-AMD64.exe
```

Published by mhx 10 months ago

dwarfs - dwarfs-0.12.2

This release provides a fix for a performance regression, switches the default memory allocator back to jemalloc, and further reduces the size of the dwarfs-fuse-extract binary. The latter is available as both a jemalloc and a mimalloc version. jemalloc offers a lot more configuration options that can be crucial in optimizing the memory profile of e.g. the FUSE driver. If you don't need that flexibility, you can save a few bits by using the -mimalloc version.

| Binary (size in bytes)                        |   v0.11.3 |   v0.12.0 |   v0.12.1 |   v0.12.2 |
| -------------------------------------------- | ---------:| ---------:| ---------:| ---------:|
| Linux x86_64 universal binary                | 5,319,916 | 2,833,280 | 2,903,624 | 2,968,252 |
| Linux aarch64 universal binary               | 4,637,312 | 2,725,864 | 2,588,924 | 2,636,912 |
| Linux x86_64 fuse-extract binary (jemalloc)  |         - | 1,183,752 |         - |   906,016 |
| Linux aarch64 fuse-extract binary (jemalloc) |         - | 1,188,760 |         - |   913,260 |
| Linux x86_64 fuse-extract binary (mimalloc)  |         - |         - | 1,075,536 |   835,500 |
| Linux aarch64 fuse-extract binary (mimalloc) |         - |         - | 1,059,588 |   839,740 |
| Linux x86_64 binary tarball                  | 7,736,712 | 3,888,104 | 3,698,356 | 3,703,712 |
| Linux aarch64 binary tarball                 | 6,791,424 | 3,497,140 | 3,271,584 | 3,296,380 |

Bugfixes

The dwarfs-0.12.0 release introduced a performance regression where FLAC compression took more than twice as long as in the previous releases. This has been fixed. FLAC decompression was unaffected.

Build

A few small refactoring changes to further reduce the size of the fuse-extract binary. In particular, the performance monitor and the history feature are now fully removed. Also, the functionality to extract in different archive formats as well as to extract only files matching a pattern has been removed, so the image can only be fully extracted to disk.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.12.1...v0.12.2

SHA-256 Checksums

```
7d58b4125171befb5457a6318cda99607e32c2226db74de5f7449dee0e10764f  dwarfs-0.12.2-Linux-aarch64.tar.xz
61d239c0583d88443ca3e0080f1fe8bc97979a3ad67ed15ca3516e27ea7e7f53  dwarfs-0.12.2-Linux-x86_64.tar.xz
9b256d1f2bc17917cd63a1bee3bd5f505076b4d880fcf9daa18a6ca5bca35aeb  dwarfs-0.12.2.tar.xz
1ffbe8bbf44c5168aba5d0132705ad46837e4925ff57798333efcc6312bd7441  dwarfs-0.12.2-Windows-AMD64.7z
2aea873299cecc68dc0d8028d55cb00dc5c3289d12896e70e2038ea09d780c4e  dwarfs-fuse-extract-0.12.2-Linux-aarch64
88559806c8f2a98108e9ecf24926a317ea0afe655ef45f76c476307c4e71d971  dwarfs-fuse-extract-0.12.2-Linux-x86_64
99202109637d4d49d3b3945b35487deada758358b4f185869975c7c4be9870fa  dwarfs-fuse-extract-mimalloc-0.12.2-Linux-aarch64
8af6f46b5c39fa7fa9294b652bd023302599f2723f33571b3e2bf2376f420770  dwarfs-fuse-extract-mimalloc-0.12.2-Linux-x86_64
79a8e5d729650d8f26e1759228a10c2ea49ae88c1a491741f9196ce2937b4e2e  dwarfs-universal-0.12.2-Linux-aarch64
29d3195831c8ff3aca46b2a731eee7899d3735a71b870e9510adeaeb34dd135c  dwarfs-universal-0.12.2-Linux-x86_64
1232104b7c44dda3da46fdc6f7667e542e2e916ac926e2395965bcc035ae8046  dwarfs-universal-0.12.2-Windows-AMD64.exe
```

Published by mhx 10 months ago

dwarfs - dwarfs-0.12.1

A quick update to v0.12.0 that addresses a few issues and improves the performance of the release binaries while mostly making them even smaller. The universal x86_64 binary is slightly bigger, but that's a different story [1].

| Binary (size in bytes)             |   v0.11.3 |   v0.12.0 |   v0.12.1 |
| --------------------------------- | ---------:| ---------:| ---------:|
| Linux x86_64 universal binary     | 5,319,916 | 2,833,280 | 2,903,624 |
| Linux aarch64 universal binary    | 4,637,312 | 2,725,864 | 2,588,924 |
| Linux x86_64 fuse-extract binary  |         - | 1,183,752 | 1,075,536 |
| Linux aarch64 fuse-extract binary |         - | 1,188,760 | 1,059,588 |
| Linux x86_64 binary tarball       | 7,736,712 | 3,888,104 | 3,698,356 |
| Linux aarch64 binary tarball      | 6,791,424 | 3,497,140 | 3,271,584 |

Bugfixes

Attempt to fix a linking issue in the Homebrew build.

Features

Added --memory-limit=auto to mkdwarfs to use a more reasonable (hopefully) default for the block queue. The old default of 1 GiB was quite arbitrary and definitely not suitable for low-end systems. The new auto default will determine the limit based on the number of workers (which in turn is based on the number of CPUs), the block size, and the amount of physical memory of the system.
Replaced vector_byte_buffer with malloc_byte_buffer, which is internally based around a simple buffer that doesn't incur the cost of initializing each element like std::vector. Especially for large blocks which are known to be overwritten immediately, this can save a few CPU cycles.
The x86_64 release binaries now use an optimized memcpy implementation (if supported by the CPU) instead of the rather slow musl memcpy implementation. This makes mkdwarfs a few percent faster and dwarfsextract up to 20% faster.
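
To illustrate the malloc_byte_buffer item above: a buffer backed directly by `std::malloc` hands out uninitialized memory, whereas `std::vector<unsigned char>(n)` value-initializes all `n` bytes first. The class below is a minimal sketch of that idea and not the DwarFS class; the name `raw_buffer` and its interface are assumptions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Illustrative buffer: memory comes straight from std::malloc, so no
// per-element initialization takes place. The contents are indeterminate
// until the caller writes to them.
class raw_buffer {
 public:
  explicit raw_buffer(std::size_t size)
      : data_{static_cast<unsigned char*>(std::malloc(size ? size : 1))},
        size_{size} {
    if (!data_) {
      throw std::bad_alloc{};
    }
  }

  ~raw_buffer() { std::free(data_); }

  // Non-copyable to keep ownership of the allocation unambiguous.
  raw_buffer(const raw_buffer&) = delete;
  raw_buffer& operator=(const raw_buffer&) = delete;

  unsigned char* data() { return data_; }
  std::size_t size() const { return size_; }

 private:
  unsigned char* data_;
  std::size_t size_;
};
```

This only pays off when the whole buffer is overwritten right away, which is the case the release notes describe for large blocks.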

Build

Switched the release binaries to use mimalloc instead of jemalloc. The primary reason for this change is a reduction in binary size.
Updated the xz library to the latest 5.8.1 release.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.12.0...v0.12.1

SHA-256 Checksums

```
f61d49436ad6b02e7e496f746ce09d69a3f050592286b5ff3d6e3edba765b82a  dwarfs-0.12.1-Linux-aarch64.tar.xz
6685622e6bf1edea138023dfb5a84ec241c6a57619ec5d7ee86344057d89296b  dwarfs-0.12.1-Linux-x86_64.tar.xz
5523a5c3aea244cbfbccfe64f1df6053b3901e6af8916fac1530faf0f7a5f07f  dwarfs-0.12.1.tar.xz
d51c2e5ed021a7322928aeb8f09cc3c392362c8a1ea6217e2ba177f241f8a809  dwarfs-0.12.1-Windows-AMD64.7z
8850a0002d7008791c2629fd3a5bc718c50606dfc391bf5b1259d9f6d79c8401  dwarfs-fuse-extract-0.12.1-Linux-aarch64
d01dd82068e0d2020fd35ba5a2ddf416aeba32a9f6f7ac4544a173eefc6c743f  dwarfs-fuse-extract-0.12.1-Linux-x86_64
3dfc3d8d2152f4d9d29c543d0a30b2ba1bc0bba470742e08924ad75dc8c23967  dwarfs-universal-0.12.1-Linux-aarch64
518faa4f5a476dcc4ec75d8ed4ac31d076990bb515cd32a752a8165b3ad04885  dwarfs-universal-0.12.1-Linux-x86_64
1ad50c0b6127e56dc026de7d2f7a3df28fa3983bdd628584a652d05f97e28b88  dwarfs-universal-0.12.1-Windows-AMD64.exe
```

[1] I've noticed that the universal binary, and sometimes the other binaries as well, fluctuate dramatically in size after being UPX compressed. In the last CI build before tagging the new DwarFS release, the UPX-compressed universal binary had a size of 2,713,752 bytes. The only change after tagging the release was the version info (although, with LTO, you never know what the compiler makes of that). Interestingly, when unpacked, both binaries have exactly the same size. But the packed size differs by ~190 KiB.

Published by mhx 11 months ago

dwarfs - dwarfs-0.12.0

The main features of this release are new licensing conditions and significantly smaller binaries.

New Licensing Conditions

Instead of being all GPL-3.0 like all the previous releases, this release changes the license of a large fraction of the DwarFS code to MIT. All tools and libraries that only read DwarFS images are now MIT-licensed. Everything else (e.g. mkdwarfs) is still GPL-3.0 for the time being.

Significantly reduced binary size

By moving the build pipeline to Alpine Linux and through some major refactoring, the size of the release binaries has been reduced significantly.

| Binary (size in bytes)          |   v0.11.3 |   v0.12.0 |
| ------------------------------ | ---------:| ---------:|
| Linux x86_64 universal binary  | 5,319,916 | 2,833,280 |
| Linux aarch64 universal binary | 4,637,312 | 2,725,864 |
| Linux x86_64 binary tarball    | 7,736,712 | 3,888,104 |
| Linux aarch64 binary tarball   | 6,791,424 | 3,497,140 |

There's now also an additional binary called dwarfs-fuse-extract that combines the functionality of the FUSE driver dwarfs and dwarfsextract in a single, extremely small binary:

| Binary (size in bytes)             |   v0.12.0 |
| --------------------------------- | ---------:|
| Linux x86_64 fuse-extract binary  | 1,183,752 |
| Linux aarch64 fuse-extract binary | 1,188,760 |

The main use case for this binary is single-file application image runtimes (e.g. uruntime). This binary doesn't have built-in manual pages or support for the performance monitor. It also only supports zstd and lzma compression.

Bugfixes

Build release binaries against an up-to-date libfuse. Fixes github #252.
Changes for compatibility with Boost.Process v2.

Features

Re-licensed all libraries required for reading DwarFS images under the MIT license. The source of all tools that just read DwarFS images (i.e. everything except for mkdwarfs) is also under the MIT license now. Everything else is still GPL-3.0. Addresses github #255.
Significantly reduced binary size in the static release builds. This is the result of refactoring code that unconditionally pulled in code-heavy dependencies such as libcrypto, as well as optimizing the build pipeline (e.g. building dependencies with only the necessary set of features) and turning on link time optimization.
A new kind of "universal" binary, dwarfs-fuse-extract, is part of the release now. This combines the FUSE driver (dwarfs) and dwarfsextract into a single binary, but does not include the mkdwarfs and dwarfsck tools that are also part of the regular universal binary. dwarfs-fuse-extract is much smaller than the regular universal binary and especially suitable for AppImage-like applications.
New hotness categorizer in mkdwarfs that allows a list of "hot" files to be stored in distinct file system blocks.
New explicit ordering mode in mkdwarfs that allows files to be ordered according to the order in a given list file.
dwarfs now shows the version of the FUSE library used.
New dwarfs options preload_all and preload_category to populate the block cache immediately after mounting.
New dwarfs option analysis_file that can be used for profiling and as input to the new hotness categorizer and explicit ordering mode in mkdwarfs.
New dwarfs option block_allocator that allows the user to switch from a malloc-based block allocator to an mmap-based one. This can help with returning memory back to the system if the blocks are evicted from the cache.
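
The difference behind the block_allocator option can be sketched as follows: memory obtained with an anonymous `mmap` is handed back to the operating system by `munmap`, while memory released with `free` may stay in the allocator's heap. This is a POSIX-only illustration, not the DwarFS allocator; `mmap_alloc` and `mmap_free` are hypothetical names.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Allocate `size` bytes of zero-filled, private anonymous memory.
// Returns nullptr on failure.
inline void* mmap_alloc(std::size_t size) {
  void* p = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Unmap a region previously returned by mmap_alloc. The pages are
// returned to the system as part of this call.
inline bool mmap_free(void* p, std::size_t size) {
  return ::munmap(p, size) == 0;
}
```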

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.11.3...v0.12.0

SHA-256 Checksums

```
15867a3e1b5a310ea27700806ce7d504c912de8741dddbb40430eee419c4532f  dwarfs-0.12.0-Linux-aarch64.tar.xz
4fd1e23a97d871d0536b818c11a58c58859627e347d10c1e211522c8dd56b328  dwarfs-0.12.0-Linux-x86_64.tar.xz
91d5a22e5cf125a9871bcbdb4875bdd661557757b9f50e88553da4b47f8351d2  dwarfs-0.12.0.tar.xz
e9a4d513085bb91ab26f86277cfdfbf7930adf4bcb3c44be9f67f205d1db29fb  dwarfs-0.12.0-Windows-AMD64.7z
f7e7909b779a96dccced8e6749729ccf73ac5782ed5721cc27199498b84a27c1  dwarfs-fuse-extract-0.12.0-Linux-aarch64
264e6cbac5ade98275241082770e7aefb6e931612c543deb87322be03fc65bb8  dwarfs-fuse-extract-0.12.0-Linux-x86_64
41981a4f7a068f6c3bc32dad5307aa4466d0d74bf13083a495151d0ece34b17c  dwarfs-universal-0.12.0-Linux-aarch64
1425e102fd3b8251629025fc3b8aab0f74b7079466261f6fcef6c0e0531249ac  dwarfs-universal-0.12.0-Linux-x86_64
fe5904d64f1bd08f8f2f7dd271587b415bef1237562da4a77ee30982b48e24c4  dwarfs-universal-0.12.0-Windows-AMD64.exe
```

Published by mhx 11 months ago

dwarfs - dwarfs-0.11.3

Bugfixes

Handle absolute paths in --input-list. Fixes github #259.
Don't prefetch blocks that are already in the active list within the block cache.
Ensure that statistics for block tidying are correctly updated in the block cache.
A few build fixes, mainly to simplify building on Alpine.

New Contributors

@hexahigh made their first contribution in https://github.com/mhx/dwarfs/pull/257

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.11.2...v0.11.3

SHA-256 Checksums

```
de7f6609a4ddd6f2feff4cb4e43c4481515c5da178bbe12db24de9e7fec48bac  dwarfs-0.11.3-Linux-aarch64-clang-reldbg-stacktrace.tar.xz
7c0835b89871e48025b2e30577fb1b3c39927f9b86940dd3c5d7c41871e12533  dwarfs-0.11.3-Linux-aarch64-clang.tar.xz
2e771b53ebee66278b3a2e6e18fd04e20abc0f6defccb5a347dbfc2b7436b729  dwarfs-0.11.3-Linux-x86_64-clang-reldbg-stacktrace.tar.xz
adc3fc58d36848a312e846f0e737056b7e406894e24fa20d80fcc476ca7f401f  dwarfs-0.11.3-Linux-x86_64-clang.tar.xz
5ccfc293d74e0509a848d10416b9682cf7318c8fa9291ba9e92e967b9a6bb994  dwarfs-0.11.3.tar.xz
33b3488bc1097b1b2b54194eaa5fb169dfded9a6046de6c4fee693d9a97ece32  dwarfs-0.11.3-Windows-AMD64.7z
e14c0caa38a8d10273a84e57f532e513b2cbc50bb8df707b57c01d575f040a43  dwarfs-universal-0.11.3-Linux-aarch64-clang
7d4857ee18ffae705a41f164a0a810f173bf8d69bc8bef8dcbd1018fa8287f6e  dwarfs-universal-0.11.3-Linux-aarch64-clang-reldbg-stacktrace
64b349aec059b9d460211af6c517f6edd89e79c5e9581381229af745ebf3cc87  dwarfs-universal-0.11.3-Linux-x86_64-clang
07a9ef68256e76e7bda552b24955a3db12c3b34312d7d664e47639995ccdabf1  dwarfs-universal-0.11.3-Linux-x86_64-clang-reldbg-stacktrace
772f00d5d02fdebca4cbd74f4b1b37ee7b57fb3004a078c053b4f58e97a794ed  dwarfs-universal-0.11.3-Windows-AMD64.exe
```

Published by mhx 11 months ago

dwarfs - dwarfs-0.11.2

Bugfixes

macOS Ventura's version of clang appears to be missing an implementation of std::hash<std::filesystem::path>, making it hard to define an unordered_map<filesystem::path>. Work around this by simply using an unordered_map<string> instead.
Installing the binaries using cmake did not honor the CMAKE_INSTALL_BINDIR or CMAKE_INSTALL_SBINDIR variables. Fixes github #253.
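
The std::hash workaround in the first fix above can be sketched as a map keyed by the path's string form. This is an illustration only; `path_index` is a hypothetical class, and the mapped type `int` is a placeholder.

```cpp
#include <filesystem>
#include <string>
#include <unordered_map>

// Illustrative container: paths are converted to std::string before being
// used as keys, so no std::hash<std::filesystem::path> specialization is
// required from the standard library.
class path_index {
 public:
  void set(const std::filesystem::path& p, int value) {
    map_[p.string()] = value;
  }

  // Returns a pointer to the stored value, or nullptr if the path is absent.
  const int* find(const std::filesystem::path& p) const {
    auto it = map_.find(p.string());
    return it == map_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, int> map_;
};
```

Note that two paths compare equal here only if their string forms are identical; no normalization is performed.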

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.11.1...v0.11.2

SHA-256 Checksums

```
61fce8eaa6bbdf10917a5a12331e192748a54ab1aa175ed6f55cb26825ab3177  dwarfs-0.11.2-Linux-aarch64-clang-reldbg-stacktrace.tar.xz
06fc4ed91ee5c348dbfc70771fe3e3ea6834277e4a58f1f99e0bc98cb16ed3d4  dwarfs-0.11.2-Linux-aarch64-clang.tar.xz
15905007cff432bb9be0bdabed93473764c1706796e0da6f3af083f0a142db6d  dwarfs-0.11.2-Linux-x86_64-clang-reldbg-stacktrace.tar.xz
3c82708e00af9d1622e78047efd216e4e29213a60aff3afa8326bade8353ea38  dwarfs-0.11.2-Linux-x86_64-clang.tar.xz
1b38faf399a6d01cd0e5f919b176e1cab76e4a8507088d060a91b92c174d912b  dwarfs-0.11.2.tar.xz
8a028693ce0a7ab083b25dc491b100f41fbf98f28413a38f6773fe1cf27574fb  dwarfs-0.11.2-Windows-AMD64.7z
600134267dd0ad51dd9d8bd1b58fa614b0a0da9a7a3d57f5fce4dbda9bb80460  dwarfs-universal-0.11.2-Linux-aarch64-clang
a9f5f79afeff4eba5cc23893de46e4c8eaa3b51b8f5938ed7f9e6cb92560fa4f  dwarfs-universal-0.11.2-Linux-aarch64-clang-reldbg-stacktrace
1bee828de84c1a3a1c2134bc866f28bdf93a62927cb7e8c416813f389f7745ad  dwarfs-universal-0.11.2-Linux-x86_64-clang
ddbd62d3bf0bf420a1720af6c03bea21ce6a77a73cedc46f28c8d79e4ac26827  dwarfs-universal-0.11.2-Linux-x86_64-clang-reldbg-stacktrace
d95dab93a7e9d8349d4a4393213a401d9ded79040b1c48e661df7dfe118b72a7  dwarfs-universal-0.11.2-Windows-AMD64.exe
```

- C++
Published by mhx 11 months ago

dwarfs - dwarfs-0.11.1

Bugfixes

macOS Ventura's version of clang appears to be missing the <source_location> header, despite Apple claiming otherwise. Fix this by shipping a wrapper and providing a fallback implementation.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.11.0...v0.11.1

SHA-256 Checksums

```
3e1b6331cf2f589d7058700aa2c5dc41f1825f3954f3828eb709034ba57a7c97  dwarfs-0.11.1-Linux-aarch64-clang-reldbg-stacktrace.tar.xz
23b1e0b18a7c3ffeb6c5fcc97ab032a7c1c651454d0fa5cb9741918d97a14ab3  dwarfs-0.11.1-Linux-aarch64-clang.tar.xz
4ec6614e87064ac96dfdb9b7957620bd889f5f1e95416409369d00606cfff0a1  dwarfs-0.11.1-Linux-x86_64-clang-reldbg-stacktrace.tar.xz
1eebf6e66eb5d6dc7cfb9c9b3c7c6e67084acc5ced7a018c15a511e929598f99  dwarfs-0.11.1-Linux-x86_64-clang.tar.xz
7a0cccb1ec3c2a18e9a014893c1d3e1f8f2c44ade6936c9f6d3bab5ec14b2052  dwarfs-0.11.1.tar.xz
c43fd9f2089b94ddd2819c7853dd6d8c34951e2d42cbfdc2e4470cde9c3e18fb  dwarfs-0.11.1-Windows-AMD64.7z
e9bf1f8bcccf363be25396d2f60d9f4e7765eba5bd647f071aa4d0ba5cb3785b  dwarfs-universal-0.11.1-Linux-aarch64-clang
22966f1dba98697db0cad127d1d8c50ef5952b5c9816cc2564b9410a37cdbaa3  dwarfs-universal-0.11.1-Linux-aarch64-clang-reldbg-stacktrace
0a025dc0f854ad9f3a5f9ca89ac43ada5305de14fe4b2e03088ce9ee5a23dbf4  dwarfs-universal-0.11.1-Linux-x86_64-clang
a38de846f48c7979c204699af6a1fda0d8d634caac68223b2b46fcb1c52c0e56  dwarfs-universal-0.11.1-Linux-x86_64-clang-reldbg-stacktrace
b8299e5f2102283c52c1cbd8ad14a0c3b71244937327e895e0aee809d92e4474  dwarfs-universal-0.11.1-Windows-AMD64.exe
```

- C++
Published by mhx 11 months ago

dwarfs - dwarfs-0.11.0

Bugfixes

Remove the access implementation from the FUSE driver. There's no point here trying to be more clever than FUSE's default. This makes sure DwarFS will behave more like other FUSE file systems. See github discussion #244 for details.
Limit the number of chunks returned in inodeinfo xattr. Highly fragmented files would have megabytes in inodeinfo, which not only breaks the xattr interface, but can also dramatically slow down tools like eza that like to read xattrs for no apparent reason.
Avoid nested indentation in manpages to work around ronn-ng bug. Fixes github #249.
Don't link library against jemalloc. This fixes both issues with pydwarfs and issues building with jemalloc support on macOS. Only the binaries are now linked against jemalloc, which should be sufficient.

Features

Support case-insensitive lookups. Fixes github #232.
Allow setting image size in FUSE driver. Fixes github #239.
Support extracting a subset of files with dwarfsextract using the new --pattern option. The same glob patterns can be used as for the filter rules in mkdwarfs. Fixes github #243.
Allow overriding UID / GID for the whole file system when using the FUSE driver on non-Windows platforms. See github discussion #244.
Expose more LZMA options (mode, mf, nice, depth).
Improve filter patterns, which now support ranges and complementation.
Improve speed of filesystem walk / walk_data_order calls by 80% / 40%. The impact of this will largely depend on what code is being run for each inode; for example, listing more than 14 million files with dwarfsck now takes about 16 seconds compared to 17 seconds with the previous release.
Added an inode size cache to the metadata to speed up file size computation for large, highly fragmented files. The configuration is currently fixed using a conservative default. Only files with at least 128 chunks will be added to the cache, so in a lot of cases this cache may be completely empty and not contribute to the size of the file system image at all.
Use bit-packing for hardlink, shared files, and chunk tables. This will consume less memory when loading a DwarFS image.
Show total hardlink size in dwarfsck output.
Library: return a dir_entry_view from readdir and find. This is more consistent, but was previously not easily possible due to the lack of a "self" dir entry in the metadata. The "self" entry has been added and will only impact the size of the metadata if directories metadata is not packed.
Library: prefer std::string_view over char const*.
Library: add directory iterator to directory_view.
Library: support for maxiov parameter in readv call.
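The bit-packing mentioned in the feature list above saves memory by storing each table entry with only as many bits as the largest value requires. The following is a minimal sketch of that general technique, not the DwarFS metadata layout; `pack`, `unpack` and `packed_size_bytes` are hypothetical helpers.

```python
def pack(values):
    """Pack non-negative ints using one shared, minimal bit width.

    Returns (width, packed); entry i occupies bits [i*width, (i+1)*width).
    """
    width = max((v.bit_length() for v in values), default=0)
    width = max(width, 1)  # use at least one bit per entry
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return width, packed


def unpack(width, packed, count):
    """Recover `count` entries of `width` bits each."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


def packed_size_bytes(width, count):
    """Bytes needed to hold `count` entries of `width` bits."""
    return (width * count + 7) // 8
```

For example, a table of four values no larger than 7 needs 3 bits per entry, i.e. 2 bytes in total rather than 16 bytes as 32-bit integers.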

Other

Lots of internal refactoring to improve overall code quality.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.10.2...v0.11.0

SHA-256 Checksums

```
2040a951697ddb78a4b6ee887e06be4295f9d2e3708a311ac72ad9ad7bd28aa3  dwarfs-0.11.0-Linux-aarch64-clang-reldbg-stacktrace.tar.xz
0db0d6bc823d26f86d47f1edf8b4ddbcf85fab24e118be7de9ee091234b5623e  dwarfs-0.11.0-Linux-aarch64-clang.tar.xz
a7214b10902653c582aa4c21e05e2476518ed1d15e4640cc3eb2bbe53a297120  dwarfs-0.11.0-Linux-x86_64-clang-reldbg-stacktrace.tar.xz
35e851bce5ba6a17b6b53081d1305ebcee5698d8bc770b8b1a875d2986fd6d7c  dwarfs-0.11.0-Linux-x86_64-clang.tar.xz
852c96133444493eff6f03324bc2700e31859d75410a937f0714eae9f75d2dd4  dwarfs-0.11.0.tar.xz
15591223010400488c5066a864bcee3ad71c045e2aa4bf60b7c05e9d45909b9f  dwarfs-0.11.0-Windows-AMD64.7z
da197d19b3eadfea5180034765d70c050ae9b85ade58dd0aa91b65283a079236  dwarfs-universal-0.11.0-Linux-aarch64-clang
d58ad14583345d4e7efb4ddb0278ec39c836646a39868422ca1358fa22a990b7  dwarfs-universal-0.11.0-Linux-aarch64-clang-reldbg-stacktrace
72fe171dd9d9abd0bba46e52a983934affbcc9a7349d07854eda91d788ea686b  dwarfs-universal-0.11.0-Linux-x86_64-clang
1c5b19c21aca4dc6df8cff3e06358c96fb4e3bb1e969ed3ceef0eb381d84f98b  dwarfs-universal-0.11.0-Linux-x86_64-clang-reldbg-stacktrace
f2451ed0832c13157f869a3d7ba3596fcb4bb0c5c55741fc054ce6b1bdc977c8  dwarfs-universal-0.11.0-Windows-AMD64.exe
```

- C++
Published by mhx 11 months ago

dwarfs - dwarfs-0.10.2

Bugfixes

Gracefully handle localized error messages on Windows. These error messages can contain characters from a Windows (non-UTF-8) code page, which could cause a fatal error in fmt::print in the logging code. Call sites that log such error messages now try to convert these from the code page to UTF-8 or, if that fails, simply replace all characters that are invalid from a UTF-8 point-of-view. Partial fix for #241.
Handle invalid wide chars in file names on Windows. For some reason, Windows allows invalid UTF-16 characters in file names. Try to handle these gracefully when converting to UTF-8. Partial fix for #241.
Workaround for new boost versions which have a process component.
Workaround for a deprecated boost header.
Support for upcoming Boost 1.87.0. io_service was deprecated and replaced by io_context in 1.66.0. The upcoming Boost 1.87.0 will remove the deprecated API. (Thanks to Michael Cho for the fix.)
Disable extended output algorithms (shake(128|256)).
Install libraries to CMAKE_INSTALL_LIBDIR. Fixes #240.
mode/uid/gid checks were expecting 16-bit types.
Stricter metadata checks and improved error messages.
Various fixes for filesystem_extractor to prevent memory leaks, correctly handle errors during extraction, and prevent creation of invalid archive outputs due to padding.
Various minor fixes: non-virtual dtors, missing includes, std::move vs. std::forward, unused code removal.
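The two-step strategy described in the first item above (convert from the code page, otherwise replace whatever is not valid UTF-8) can be illustrated in a few lines. This is a sketch of the idea only, assuming Python's built-in codecs stand in for the Windows conversion; the actual fix lives in the C++ logging call sites, and `to_display_text` is a hypothetical name.

```python
def to_display_text(raw, codepage):
    """Decode `raw` from `codepage`; if that fails, fall back to UTF-8
    with every invalid sequence replaced by U+FFFD."""
    try:
        return raw.decode(codepage)
    except (UnicodeDecodeError, LookupError):
        # Conversion failed (bad bytes or unknown code page):
        # keep what is valid UTF-8 and replace the rest.
        return raw.decode("utf-8", errors="replace")
```

The fallback never raises, so a logging call built on top of it cannot fail because of the message text itself.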

Other

More test cases for stricter metadata checks. Also enable the strict checks in unit tests by default.
Fix typos in README.md. (Thanks to Christian Clauss for the fix.)
Fix typos in man pages.

New Contributors

@cclauss made their first contribution in https://github.com/mhx/dwarfs/pull/235
@cho-m made their first contribution in https://github.com/mhx/dwarfs/pull/245

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.10.1...v0.10.2

SHA-256 Checksums

```
2f4d275d006228acb2280c3bf5327f02098c2ba110d81fe3854a80f5fd848058  dwarfs-0.10.2-Linux-aarch64-clang-reldbg-stacktrace.tar.xz
75878252ef0bfc490e5bd6ad5870bc5a02531650ceacf1258807e09606069561  dwarfs-0.10.2-Linux-aarch64-clang.tar.xz
74b52460ebd2d8e752ad7fbe976c683be542a8a581fdf25ac59ba1dea5bc5d0c  dwarfs-0.10.2-Linux-x86_64-clang-reldbg-stacktrace.tar.xz
a018bfe2531763a273a2d78bc507b1c89fe58a44f7955c980c854a55f9adbaea  dwarfs-0.10.2-Linux-x86_64-clang.tar.xz
36767290a39f92782e41daaa3eb45e39550ad1a4294a6d8365bc0f456f75f00c  dwarfs-0.10.2.tar.xz
c15280d920b67b51b42117612bd8a959eb5ca9ed0202fd765e19743aad61a728  dwarfs-0.10.2-Windows-AMD64.7z
36f72f1ff049a1d955e68547540b932539beab44b0cba23efbdb7a1b0bfd32d4  dwarfs-universal-0.10.2-Linux-aarch64-clang
4d55e783e352a5aafc321f7ac36964b0493601320d3d93d021634e78e743505d  dwarfs-universal-0.10.2-Linux-aarch64-clang-reldbg-stacktrace
b565399a0a671d06be3e078376e02b388ee14133680b8d19483fc93c294b12d2  dwarfs-universal-0.10.2-Linux-x86_64-clang
cb374fc2d64bbf3bd4dd4714f1be37e3d6fc6ecffc7afd93714b6897e9d3751a  dwarfs-universal-0.10.2-Linux-x86_64-clang-reldbg-stacktrace
eb69b1bf4703d28bd3d5f477dca1ab3460dda4250c7ce1899eb4192c2c1bef69  dwarfs-universal-0.10.2-Windows-AMD64.exe
```

- C++
Published by mhx about 1 year ago

dwarfs - v0.10.1

Bugfixes

Allow building utils_test against a non-compatible, system-installed version of gtest. This is a common issue when trying to integrate dwarfs into a package manager, as these generally disallow fetching external dependencies at build time.
dwarfsck was always reporting a block size of 1 byte rather than the actual block size of the image.
DWARFS_HAVE_LIBBROTLI was not set correctly in the config file, causing build errors if the library was built without brotli.
Several small fixes for building with Homebrew.

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.10.0...v0.10.1

SHA-256 Checksums

```
53bb766f3a22f019c4bac7cbf995376cb4f3f0ad69e4793471af11c954185227  dwarfs-0.10.1-Linux-aarch64-clang-stacktrace.tar.xz
f272f667649d71ec7d29d6822ad4198e13a33e997d722f74f2bca23b239de72f  dwarfs-0.10.1-Linux-aarch64-clang.tar.xz
671ce264938ab4cacc8af0aabcacb1ecfffa01284b4959441e921264ae19b47e  dwarfs-0.10.1-Linux-x86_64-clang-stacktrace.tar.xz
84894bf6a26cac2eb2c8d43d6fccf1ece7665c4c15050cec494d09199bd8310e  dwarfs-0.10.1-Linux-x86_64-clang.tar.xz
4041ed9aa19e03f44dbe69b470f31423a3c358bcd07e78230311b859629785b6  dwarfs-0.10.1-Windows-AMD64.7z
db785e0e0f257fa4363d90153db34127add4552791a72998b30ded787840d039  dwarfs-0.10.1.tar.xz
3e003c9a5fbf31b75548c11a2c2c1958f606ce2c2022db4baa6d62b80201c76d  dwarfs-universal-0.10.1-Linux-aarch64-clang
44ad0a3f2d89e373b0279d1db7c19aeca46879972a2db226e31ec7ebe8ff103e  dwarfs-universal-0.10.1-Linux-aarch64-clang-stacktrace
18f99297c7425bd1bea87d47a2046bfc7e00fe9cc566f024e631ed39a6bb1913  dwarfs-universal-0.10.1-Linux-x86_64-clang
c60821be4a248be2feb54905b5bb6c5cd323014bcb7107f0d586ba7f70deb488  dwarfs-universal-0.10.1-Linux-x86_64-clang-stacktrace
768d013d55cd030c1fbabd35ad522648581c79435da4994cc39de75b3a7eda30  dwarfs-universal-0.10.1-Windows-AMD64.exe
```

- C++
Published by mhx over 1 year ago

dwarfs - dwarfs-0.10.0

This release doesn't bring a lot of new features or bugfixes to the command-line tools. However, large parts of the code base have been refactored (in more than 300 commits since the last release) and a couple of long-standing issues have been resolved:

It is finally possible to properly install the libraries that implement most of DwarFS. These libraries come with all necessary headers and a CMake config script so you can start building your own tools.
Shared library builds are now explicitly supported. These have caused some trouble in the past, but the problematic code has been refactored to avoid issues like missing compression / categorizer code.
The code now builds about twice as fast. This is due to shipping generated code in the source tarball and removing a lot of dependencies on the folly library.
It is now possible to do modular builds, i.e. the library, tools and FUSE driver can be built independently.

Bugfixes

Fixed a race condition identified by ThreadSanitizer in the root node name processing.
The terminal abstraction code did not check any errors when trying to determine the terminal width, leading to a random terminal width value. This caused the manual page tests to occasionally crash.
Fixed some flaky tests, e.g. unmounting the FUSE driver on macOS.

Features

Two sets of universal binaries and binary tarballs are provided for Linux platforms: one without any debug symbols, the other with minimal debug symbols and support for stack traces. For the universal binary, only the version without debug symbols will be UPX-compressed, as the stack trace functionality doesn't work with a compressed binary.
Symbolic links to the universal binary may now be suffixed with a version (i.e. any part of the name starting with - and followed by a digit will be ignored, e.g. the symlink could be mkdwarfs-0.10 and it would be treated as mkdwarfs).
Introduced support for extended attributes on Windows, including a new utility for cross-platform xattr manipulation (pxattr, for portable xattr).
Enhanced file system API, adding error-code based and exception-safe versions for getattr, access, and similar functions.
Filter rules now consistently use Unix path separators, even for the root path component. Addresses a comment in github discussion #228.
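The version-suffix rule for symlinks described above (everything from the first `-` that is followed by a digit is ignored) boils down to a single pattern match. A minimal sketch of that rule, with `tool_name` as a hypothetical helper rather than the function the universal binary actually uses:

```python
import re

# A '-' immediately followed by a digit starts the ignored version suffix.
_VERSION_SUFFIX = re.compile(r"-\d")


def tool_name(link_name):
    """Return the tool name a (possibly version-suffixed) symlink maps to."""
    match = _VERSION_SUFFIX.search(link_name)
    return link_name[: match.start()] if match else link_name
```

With this rule, `tool_name("mkdwarfs-0.10")` yields `mkdwarfs`, while a name such as `dwarfs-universal` is left alone because the `-` is not followed by a digit.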

Improvements

Extensive refactoring to improve code modularity, maintainability and to provide proper libraries. The library code has been moved to different namespaces to make it easier to understand the role of different components (e.g. reader, writer, extractor).
Replaced all folly library dependencies in the public DwarFS library interface with alternatives from more broadly available libraries such as boost or nlohmann::json. folly and fbthrift are still used as implementation details, but no longer leak into the public library interfaces.
A much smaller subset of folly is now used in DwarFS and only the necessary components are built, significantly reducing the number of compilation units when building DwarFS.
It is now possible to do modular builds in addition to the default monolithic build, i.e. you can build and install just the DwarFS libraries and later build/install the tools (mkdwarfs, ...) and/or the FUSE driver against these libraries. This is particularly useful for packaging (e.g. in Homebrew, which has removed all FUSE support from the core formulae).
Shared library builds are now explicitly supported. This fixes issues such as github #184.
The source tarball now contains all auto-generated code, e.g. manual pages or generated thrift code. This reduces the number of build-time dependencies (e.g. ronn or mistletoe are no longer required) and significantly reduces the build steps (it is no longer necessary to build the thrift compiler). The build is now roughly twice as fast as in the 0.9.x releases.
The parallel-hashmap, xxHash and zstd submodules have been removed from the git repo and are no longer added to the source tarball. Both xxHash and zstd are now widely available. If a suitable version of parallel-hashmap is found on the system, this will be used, otherwise it will be fetched during the build. Being a header-only library and only used internally, there's no need for it to be installed.
A lot of GCC warnings have been fixed and upstreamed to folly / fbthrift.

New Contributors

@mindfocus made their first contribution in https://github.com/mhx/dwarfs/pull/222

Full Changelog: https://github.com/mhx/dwarfs/compare/v0.9.10...v0.10.0

SHA-256 Checksums

52045506839249823a9b05b711f89e3fd18a729b264149fbe74f557bd8f16b81  dwarfs-0.10.0-Linux-aarch64-clang-stacktrace.tar.xz
b12a33650694baa6f546ac18768b8a7b2e1dd90c1c674d26bec7f5149a285537  dwarfs-0.10.0-Linux-aarch64-clang.tar.xz
8593eaecc1bed4f570626ff07b0e56a7f6a025c3a5849f4b0056006ab9801393  dwarfs-0.10.0-Linux-x86_64-clang-stacktrace.tar.xz
53f02da7ff7fe484b3bb91e6532987c3fb59db28131309ccbb46d2c2c9cec9ce  dwarfs-0.10.0-Linux-x86_64-clang.tar.xz
f24ee132cf0c77b0c94e49b3d8108fdf7e0ff965031bbc34aa51e3c2e066fa43  dwarfs-0.10.0-Windows-AMD64.7z
c01ae59d4662e4f027a7c8a5934b7aebe6edc4f7affd836fa73e6d861d18bf35  dwarfs-0.10.0.tar.xz
573656e33a171017046a32500b9ebe50ebd117df16212e6ef8f8d4b034618210  dwarfs-system-gtest.patch
149c9518d5229bf6dd07b5c39df02ed5cbc4bddbf9bf54f6fa5617d76a594b1c  dwarfs-universal-0.10.0-Linux-aarch64-clang
bfaad42a15c687fdaf502283792322692e479c563c56e8734efd9a298956d716  dwarfs-universal-0.10.0-Linux-aarch64-clang-stacktrace
5f3401cdd7267b6eec6f758f902d7d874534cc77b5be7233944832992e2c0081  dwarfs-universal-0.10.0-Linux-x86_64-clang
307f973fd7da525c5f06f6090b4a4f7a0bea041d785dcb718fb226f6b15f942f  dwarfs-universal-0.10.0-Linux-x86_64-clang-stacktrace
99cf71972156ec6f5d88ab0331709ecdc1e80f571582fce95c89b694d3be258b  dwarfs-universal-0.10.0-Windows-AMD64.exe

- C++
Published by mhx over 1 year ago

dwarfs - dwarfs-0.9.10

Bugfixes

When cloning LZMA compressor objects, the LZMA filter options of the cloned instance would still point to an options object in the original instance. This could lead to LZMA errors when initializing a new compressor. Fixes github #224.
Fetch range-v3 if no suitable version is found. Fixes github #221.
Filter rules did not work correctly when the input was the root directory.
duf reported odd sizes due to using bsize instead of frsize.
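The bsize/frsize item above comes down to which unit the block counts in a statvfs result are expressed in: POSIX defines `f_blocks` in units of `f_frsize`, not `f_bsize`. A minimal sketch of the arithmetic a tool such as duf has to do; `total_bytes` is a hypothetical helper and the numbers in the usage note are made up.

```python
def total_bytes(st):
    """Total file system size in bytes from a statvfs-like result.

    f_blocks counts fragments of f_frsize bytes; multiplying by
    f_bsize (the preferred I/O block size) gives a wrong size
    whenever the two values differ.
    """
    return st.f_blocks * st.f_frsize
```

On Unix-like systems the result of `os.statvfs(mountpoint)` can be passed in directly. With `f_frsize = 512`, `f_bsize = 1048576` and `f_blocks = 2048`, the correct size is 1 MiB, whereas multiplying by `f_bsize` would report 2 GiB.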

- C++
Published by mhx over 1 year ago

dwarfs - dwarfs-0.9.9

Bugfixes

A bug introduced by an optimization to skip hashing of large files if they already differ in the first 4 KiB could, under rare circumstances, lead to an unexpected "inode has no file" exception after the scanning phase. This bug did not cause any file system inconsistency issues; mkdwarfs either crashes with the exception, or its output will be correct. Fixes github #217 (see the issue for more details).

Features

A sequential access detector was added to the block cache, which can trigger a prefetch of blocks assumed to be read in the future. This improves sequential read throughput roughly by a factor of two. Random access should typically be unaffected. Can be configured / disabled using -o seq_detector.
Added tracing support in FUSE driver and dwarfsextract, which allows simple performance analysis using chrome://tracing. Traces can be enabled using -o perfmon_trace and --perfmon-trace.
Added performance monitoring and tracing support for the block cache.
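The release notes do not spell out how the sequential access detector decides when to prefetch. One simple way such a detector could work is to watch the last few block numbers and, once they form a consecutive run, suggest the next block. The sketch below makes exactly that assumption; the class name, the `threshold` parameter and the heuristic itself are illustrative and not taken from the DwarFS block cache.

```python
from collections import deque


class SequentialDetector:
    """Suggest a block to prefetch after `threshold` consecutive accesses."""

    def __init__(self, threshold=4):
        self._recent = deque(maxlen=threshold)

    def observe(self, block_no):
        """Record an access; return the block to prefetch, or None."""
        self._recent.append(block_no)
        if len(self._recent) < self._recent.maxlen:
            return None  # not enough history yet
        blocks = list(self._recent)
        if all(b - a == 1 for a, b in zip(blocks, blocks[1:])):
            return block_no + 1  # sequential run: fetch the next block early
        return None
```

A random-access pattern never produces a consecutive run, so it triggers no prefetches, which matches the note above that random access should typically be unaffected.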

Performance

Significantly improved the speed of dwarfsck --checksum.

- C++
Published by mhx almost 2 years ago

dwarfs - dwarfs-0.9.8

Bugfixes

Build custom version of libcrypto to link with the release binaries in order for them to run properly on FIPS-enabled setups. Fixes github #210.
When mounting a DwarFS image on macOS and viewing the volume in Finder, only the directories were shown, but no files. The root cause was that a non-existent extended attribute is reported via a different error code in macOS (ENOATTR) compared to Linux (ENODATA) and the wrong error code was returned for certain Finder-related attributes. Fixes github #211.
macOS builds using jemalloc were crashing when calling mallctl("version", ...). The root cause of the crash is still unclear, but as a workaround, the jemalloc version is compiled in from a preprocessor constant rather than using mallctl.

- C++
Published by mhx almost 2 years ago

dwarfs - dwarfs-0.9.7

Bugfixes

Handle root uid correctly in access() implementation. Fixes github #204.

Features

Show and track library dependencies. Dependencies will be displayed in the command line help; they will also be tracked in the history metadata of a DwarFS image. See also github #207.

Documentation

Describe nilsimsa ordering algorithm more accurately.

Performance

Reorder branches to improve ricepp speed with real world data.
Some tweaks to improve segmenter speed.

- C++
Published by mhx almost 2 years ago

dwarfs - dwarfs-0.9.6

Bugfixes

Add workaround for new glog release breaking the folly build. Fixes github #201.

Performance

Improve ricepp decoding speed by about 25% on x86 and arm, and up to 100% on Windows. Also improve encoding speed on Windows by 25%. No more need for special hybrid Clang build on Windows.

- C++
Published by mhx almost 2 years ago

dwarfs - dwarfs-0.9.5

Bugfixes

Windows path handling was wrong and didn't work properly for e.g. network shares. This is hopefully fixed for all tools now.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.9.4

Bugfixes

Prevent installation of ricepp headers/libs. Fixes github #195.
Don't fetch googletest in ricepp build if the targets are already available. Fixes github #194.

Features

Added blocksize option to the FUSE driver, which allows the st_blksize value to be configured for the mounted file system. This can be used to optimize throughput.
Added experimental readahead option to the FUSE driver. This can potentially increase throughput when performing sequential reads.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.9.3

Bugfixes

v0.8.0 removed the implementation of the null decompressor under the assumption that it was no longer used; it was, however, still used when recompressing an image with null-compressed blocks. The change to remove the implementation was reverted and a new test case was added. Fixes github #193.

Performance

Some more ricepp compression speed improvements. Also, the universal binaries for x86_64 now automatically choose a ricepp version based on CPU capabilities.
For Windows, there's an experimental -ricepp package/binary. This contains a "hybrid" build where the ricepp library was built using clang and everything else using cl. This binary offers significantly faster ricepp compression. Decompression speeds are similar to the regular package/binary. If you don't care about compressing large amounts of FITS files on Windows, just stick to the regular package/binary.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.9.2

Bugfixes

v0.9.0 introduced an optimization where large files of equal size were only fully hashed for deduplication if the first 4K of their contents also produced the same hash. This introduced a bug causing an exception to be thrown when processing large hard-linked files. The root cause was that the data structure intended to be used for exactly this case was just never populated, and the fix was adding a single line to fill the data structure. The test cases didn't cover large hard-linked files, so this slipped through into the release. A new test case has been added as well.
On Windows, when using PowerShell, the error message dialog for a missing WinFsp DLL was not shown when running dwarfs.exe. The workaround is to use the same delayed loading mechanism that's already used for the universal binary and show the error in the terminal. See also the discussion on github #192.
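The deduplication optimization described in the first item above can be pictured as a three-stage filter: group by size, then by a hash of the first 4 KiB, and only fully hash what is still ambiguous. The sketch below illustrates that idea under simplifying assumptions: it works on in-memory byte strings instead of files, applies the prefix check to files of any size, and uses SHA-256, none of which is claimed to match mkdwarfs. `find_duplicates` is a hypothetical name.

```python
import hashlib
from collections import defaultdict

PREFIX = 4096  # only the first 4 KiB are hashed in the cheap stage


def find_duplicates(files):
    """files: mapping name -> bytes. Return sorted groups of identical files."""
    by_size = defaultdict(list)
    for name, data in files.items():
        by_size[len(data)].append(name)

    groups = []
    for names in by_size.values():
        if len(names) < 2:
            continue  # unique size: no hashing needed at all
        by_prefix = defaultdict(list)
        for name in names:
            key = hashlib.sha256(files[name][:PREFIX]).digest()
            by_prefix[key].append(name)
        for candidates in by_prefix.values():
            if len(candidates) < 2:
                continue  # prefix differs: skip the expensive full hash
            by_full = defaultdict(list)
            for name in candidates:
                by_full[hashlib.sha256(files[name]).digest()].append(name)
            groups.extend(sorted(g) for g in by_full.values() if len(g) > 1)
    return sorted(groups)
```

The saving comes from the second stage: two large files of equal size that differ early on are told apart after reading 4 KiB each instead of their full contents.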

Features

Added a --list option to dwarfsck. This lists all files in the file system image. When used with --verbose, the list also shows permissions, size, uid/gid and symbolic link information. Fixes github #192.
Added a --checksum option to dwarfsck. This produces output similar to the *sum programs from coreutils and can be used to check the contents of a DwarFS image against local files.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.9.1

Bugfixes

Invalid UTF-8 characters in file paths would crash mkdwarfs if these paths were displayed in the progress output. A possible workaround was to disable progress output. This fix replaces any invalid characters before displaying them. Fixes github #191.
The CMakeLists.txt would bail out as soon as it discovered --as-needed in the linker flags. However, --as-needed is only a problem when combined with BUILD_SHARED_LIBS=ON. The check has been changed to only trigger if both conditions are met.

Other

Minor speed improvements in ricepp compression.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.9.0

Only two weeks since the last release, but another major milestone: DwarFS now runs on all major platforms, including macOS. There are no macOS binaries available for download, but the installation procedure is relatively simple and I'm really hoping for a Homebrew formula to be added soon.

The only other change since v0.8.0 is the addition of the ricepp compression algorithm and the fits categorizer, both of which are intended to be used together for efficiently compressing raw data in astrophotography.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.8.0

After more than 600 commits, it's time for another major release. In addition to a long list of fixes, there are quite a few new features, most notably a categorization framework that allows identifying different categories of files and treating them differently. Right now, there are only two categorizers — pcmaudio and incompressible — but there are hopefully more to come. Along with the pcmaudio categorizer, support for FLAC compression has been added. This allows for large collections of uncompressed audio files to be archived efficiently, and also accessed efficiently: the DwarFS FUSE driver can decode a large audio file using multiple cores, something that cannot be done with a single compressed FLAC file.

The project code is now tested much more thoroughly; various new abstractions allow the command line interfaces to actually be covered by the unit tests.

Also, unlike many previous releases, images produced by this release will be compatible with older releases as long as they don't use new features like FLAC compression or history sections, which are unsupported by older releases. The 0.7.3 and later releases will even deal with unknown sections and compression algorithms. Going forward, use of new features will be tracked by feature flags, so older releases can determine if the feature set used by a file system image is fully or partially supported.

Last but not least, the binaries can now be built with manual pages built-in. This is particularly useful on Windows, where man is not a thing, but also with the universal binaries if you don't have a full install and need to quickly check the manual. The manuals can be read using the --man option.

New Features

Categorizer framework. Initially supported categorizers are pcmaudio (detect audio data & metadata and provide context for FLAC compressor) and incompressible (detects "incompressible" data). Enabled using the --categorize option.
Multiple segmenters can now run in parallel and write to the same filesystem image in a fully deterministic way. Currently, a segmenter instance will be used per category/subcategory. This can make segmenting multi-threaded in cases where there are multiple categories. The number of segmenter worker threads can be configured using --num-segmenter-workers.
The segmenter now supports different "granularities". The granularity is determined by the categorizer. For example, when segmenting the audio data in a 16-bit stereo PCM file, the granularity is 4 (bytes). This ensures that the segmenter will only produce chunks that start/end on a sample boundary.
The segmenter now also features simple "repeating sequence detection". Under certain conditions, these sequences could cause the segmenter to slow down dramatically. See github #161 for details.
FLAC compression. This can only be used along with the pcmaudio categorizer. Due to the way data is spread across different blocks, both FLAC compression and decompression can likely make use of multiple CPU cores for large audio files, meaning that loading a .wav file from a DwarFS image using FLAC compression will likely be much faster than loading the same data from a single FLAC file.
Completely new similarity ordering implementation that supports multi-threaded and fully deterministic nilsimsa ordering. Also, nilsimsa options are now ever so slightly more user friendly.
The --recompress feature of mkdwarfs has been largely rewritten. It now ensures the input filesystem is checked before an attempt is made to recompress it. Decompression is now using multiple threads. Also, recompression can be applied only to a subset of categories and compression options can be selected per category.
mkdwarfs now stores a history block in the output image by default. The history block contains information about the version of mkdwarfs, all command line arguments, and a time stamp. A new history entry will be added whenever the image is altered (i.e. by using --recompress). The history can be displayed using dwarfsck. History timestamps can be disabled using --no-history-timestamps for bit-identical images. History creation can also be completely disabled using --no-history.
All tools now come with built-in manual pages. This is valuable especially on Windows, which doesn't have man at all, or for the universal binaries, which are usually not installed alongside the manual pages. Running each tool with --man will show the manual page for the tool, using the configured pager. On Windows, if less.exe is in the PATH, it'll also be used as a pager.
New verbose logging level (between info and debug).
Logging now properly supports multi-line strings.
Show compression library versions as part of the --help output. For dwarfsextract, also show libarchive version.
--set-time now supports time strings in different formats (e.g. 20240101T0530).
mkdwarfs can now write the filesystem image to stdout, making it possible to directly stream the output image to e.g. netcat.
Progress display for mkdwarfs has been completely overhauled. Different components (e.g. hashing, categorization, segmenting, ...) can now display their own progress in addition to a "global" progress.
mkdwarfs now supports ordering by "reverse path" with --order=revpath. This is like path ordering, but with the path components reversed (i.e. foo/bar/baz.xyz will be ordered as if it were baz.xyz/bar/foo).
It is now possible to configure larger bloom filters in mkdwarfs.
The mkdwarfs segmenter can now be fully disabled using -W 0.
mkdwarfs now adds "feature sets" to the filesystem metadata. These can be used to introduce new features without necessarily breaking compatibility with older tools. As long as a filesystem image doesn't actively use the new features, it can still be read by old tools. Addresses github #158.
dwarfsck has a new --quiet option that will only report errors.
dwarfsck with --print-header will exit with a special exit code (2) if the image has no header. In all other cases, the exit code will be 0 (no error) or 1 (error).
The --json option of dwarfsck now outputs filesystem information in JSON format.
dwarfsck has a new --no-check option that skips checking all block hashes. This is useful for quickly accessing filesystem information.
The FUSE driver exposes a new dwarfs.inodeinfo xattr on Linux that contains a JSON object with information about the inode, e.g. a list of chunks and associated categories.
Don't enable readlink in the FUSE driver if filesystem has no symlinks. This is mainly useful for Windows where symlink support increases the number of getattr calls issued by WinFsp.
As an experimental feature, CPU affinity for each worker group can be configured via the DWARFS_WORKER_GROUP_AFFINITY environment variable. This works for all tools, but is really only useful if you have different types of cores (e.g. performance and efficiency cores) and would like to e.g. always run the segmenter on a performance core.
The universal binaries are now compressed with a different upx compression level, making them slightly bigger, but much faster to decompress.
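The "reverse path" ordering from the feature list above (--order=revpath) is easy to picture with a small sort key. This sketch takes the description literally, treating foo/bar/baz.xyz as if it were baz.xyz/bar/foo and comparing the resulting strings; how mkdwarfs compares paths internally is not stated in the notes, and both helper names are hypothetical.

```python
def revpath_key(path):
    """Sort key: the path with its components in reverse order."""
    return "/".join(reversed(path.split("/")))


def order_by_revpath(paths):
    """Return `paths` sorted by their reversed-component form."""
    return sorted(paths, key=revpath_key)
```

The effect is that files with the same name end up next to each other regardless of the directory they live in, which can place similar data close together in the image.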

Bugfixes

Allow version override for nixpkgs. Fixes github #155.
Resize progress bar when terminal size changes. Fixes github #159.
Add Extended Attributes section to README. Fixes github #160.
Support 32-bit uid/gid/mode. Also support more than 65536 uids/gids/modes in a filesystem image. Fixes github #173.
Add workaround for broken utf8cpp release. Fixes github #182.
Don't call check_section() in filesystem ctor, as it renders the section index useless. Also add regression test to ensure this won't be accidentally reintroduced. Fixes github #183.
Ensure timely exit in progress dtor. This could occasionally block command line tools for a few seconds before exiting.
--set-owner and --set-group did not work properly with non-zero ids. There were two distinct issues: (1) when building a DwarFS image with --set-owner and/or --set-group, the single uid/gid was stored in place of the index and the respective lookup vectors were left empty and (2) when reading such a DwarFS image, the uid/gid was always set to zero. The issue with (1) is not only that it's a special case, but it also wastes metadata space by repeatedly storing a potentially wide integer value. This fix addresses both issues. The uid/gid information is now stored more efficiently and, when reading an image using the old representation, the correct uid/gid will be reported. Unit tests were added to ensure both old and new formats are read correctly.
mkdwarfs is now much better at handling inaccessible or vanishing files. In particular on Windows, where a successful access() call doesn't necessarily mean it'll be possible to open a file, this will make it possible to create a DwarFS file system from hierarchies containing inaccessible files. On other platforms, this means mkdwarfs can now handle files that are vanishing while the file system is being built.
mkdwarfs progress updates are now "atomic", i.e. one update is always written with a single system call. This didn't make much of a difference on Linux, but the notoriously slow Windows terminal, along with somewhat interesting thread scheduling, would sometimes make the updates look like a typewriter in slow-motion.
utf8_truncate() didn't handle zero-width characters properly. This could cause issues when truncating certain UTF8 strings.
A race condition in simple progress mode was fixed.
A race condition in filesystem_writer was fixed.
The --no-create-timestamp option in mkdwarfs was always enabled and thus useless.
Common options (like --log-level) were inconsistent between tools.
Progress was incorrect when mkdwarfs was copying sections with --recompress.
Treat NTFS junctions like directories.
Fix canonical path on Windows when accessing mounted DwarFS image.
Fix slow sorting in file_scanner due to path comparison.
On Windows, don't crash with an assertion if the input path for mkdwarfs is not found.
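
The --set-owner/--set-group fix above relies on storing a small index per inode plus a lookup vector of unique ids, instead of repeating a potentially wide id everywhere. A rough sketch of that idea follows; the function names and the list-based layout are assumptions for illustration and do not reflect the actual metadata schema.

```python
def build_owner_table(uids):
    """Return (lookup, indices).

    `lookup` holds each unique id once, in first-seen order.
    `indices` holds one small index per inode pointing into `lookup`.
    """
    lookup = []
    position = {}
    indices = []
    for uid in uids:
        if uid not in position:
            position[uid] = len(lookup)
            lookup.append(uid)
        indices.append(position[uid])
    return lookup, indices


def owner_of(lookup, indices, inode):
    """Resolve the id of an inode through its index."""
    return lookup[indices[inode]]
```

With a single forced owner, the lookup vector has exactly one entry and every inode stores index 0, which is the compact representation the fix describes.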

Removed Features

Python scripting support has been completely removed.

Documentation

Add mkdwarfs sequence diagram.
Document known issues with WinFsp.
Update README with extended attributes information.
Add script to check if all options are documented in manpage.

Building

Factor out repetitive thrift library code in CMakeLists.txt.
Use FetchContent for both fmt and googletest.
Use mold for linking when available.
The CI workflow now uploads coverage information to codecov.io with every commit.

Testing

A ton of tests were added (from 4 kLOC to more than 10 kLOC) and, unsurprisingly, a number of bugs were found in the process.
Introduced an I/O abstraction layer for all *_main() functions. This allows testing almost all tool functionality without starting the tool as a subprocess. It also makes it easier to inject errors and to change properties such as the terminal size.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.7.5

Bugfixes

Fix crash in the FUSE driver on Windows when tools like Notepad++ try to access a file like a directory (presumably because this works in cases where the file is an archive). This is a Windows-only issue because the Linux FUSE driver uses the inode-based API, whereas the Windows driver uses the string-based API. While parsing a path in the string-based API, there was no check whether a path component was a directory before trying to descend further.
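
The missing check can be pictured with a toy path resolver. This is a sketch under simple assumptions (directories are nested dicts, files are any other value, and the function name is hypothetical); it is not the driver's actual code.

```python
def lookup(root, path):
    """Resolve a '/'-separated path in a nested-dict tree.

    Returns the entry, or None if a component is missing or if a
    non-final component is not a directory.
    """
    node = root
    for component in [c for c in path.split("/") if c]:
        # The check described above: never descend into a non-directory.
        if not isinstance(node, dict):
            return None
        if component not in node:
            return None
        node = node[component]
    return node
```

Treating a file like a directory, as in "/docs/a.zip/inner", then fails cleanly with None instead of descending into something that is not a directory.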

Other

The universal binaries have been compressed using a different compression level (-9 instead of --best --ultra-brute) in upx. The compression ratio is slightly worse, but the decompression speed is significantly faster.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.7.4

Bugfixes

Fix regression that broke section index optimization introduced in v0.7.3. Fixes github #183.
Add workaround for broken utf8cpp release. Fixes github #182.

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.7.3

This is a small incremental update over the 0.7.2 release adding a single new feature: forward compatibility. This means that the 0.7.3 release will be able to handle DwarFS file system images created with newer releases as long as these images don't use features that are not understood by the older binaries. Up until now, support for new features often triggered a file system version increment, rendering the images unusable with older binaries even if the features weren't actually used in the image. This fixes #158.
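
Conceptually, the forward-compatibility check boils down to a subset test between the features an image actually uses and the features a binary understands. The sketch below illustrates that idea only; the function names and feature names are hypothetical and not taken from the DwarFS code.

```python
def can_read(image_features, supported_features):
    """An image is readable if every feature it uses is supported."""
    return set(image_features) <= set(supported_features)


def missing_features(image_features, supported_features):
    """Features used by the image that this reader does not understand."""
    return sorted(set(image_features) - set(supported_features))
```

An image that uses no new features has an empty feature set, so it passes the check with any reader, which matches the behaviour described above.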

- C++
Published by mhx about 2 years ago

dwarfs - dwarfs-0.7.2

Bugfixes

Fix locale fallback if user-default locale cannot be set. Fixes github #156.

- C++
Published by mhx over 2 years ago

dwarfs - dwarfs-0.7.1

Bugfixes

Fix potential division by zero crash in speedometer.

Other

New tool header.
Source code cleanups.
Updated static build procedure (see README).

- C++
Published by mhx over 2 years ago

dwarfs - dwarfs-0.7.0

This release took much longer than anticipated, but comes with a rather big surprise (for me, at least): Windows support! I didn't expect this to happen just yet, especially given that I haven't really used Windows over the past two decades. My biggest worries were all the dependencies, but fortunately I came across vcpkg and all of a sudden, porting DwarFS to Windows seemed feasible. So here we are, and all the different tools (mkdwarfs, dwarfsck, dwarfsextract and the FUSE driver dwarfs) are now working on Windows.

As of this release, in addition to the "classic" statically linked binaries, DwarFS is also available as a universal binary for each platform. The universal binaries bundle the four main tools (mkdwarfs, dwarfsck, dwarfsextract, dwarfs) in a single, compressed binary that is between 2.5 and 4 MiB in size, a fraction of the size of the standalone binaries. The tools can be accessed either by passing the --tool=<name> option as the first argument, or, more conveniently, by creating symbolic links to the universal binary using the name of the respective tool.
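
The two ways of selecting a tool can be sketched as a small dispatch function. This is an illustration only: the function name is an assumption, and the real binary may treat edge cases (such as file extensions on Windows) differently.

```python
import os

TOOLS = ("mkdwarfs", "dwarfsck", "dwarfsextract", "dwarfs")


def select_tool(argv):
    """Return (tool, remaining_args); tool is None if nothing matches."""
    args = list(argv[1:])
    # An explicit --tool=<name> is only honoured as the first argument.
    if args and args[0].startswith("--tool="):
        name = args[0][len("--tool="):]
        return (name if name in TOOLS else None), args[1:]
    # Otherwise use the name the binary was invoked as, e.g. via a symlink.
    name = os.path.basename(argv[0])
    return (name if name in TOOLS else None), args
```

For example, invoking the binary through a symlink named dwarfsck selects dwarfsck, while passing --tool=mkdwarfs as the first argument selects mkdwarfs regardless of the binary's name.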

New Features

Windows support. All tools are fully working on Windows, including features such as hard links, symbolic links, and Unicode file names. Thanks to WinFsp, the FUSE driver is also working, albeit with a few quirks (1, 2, 3, 4) compared to the Linux version.
Universal binaries that bundle all tools in a single binary. On Windows, the universal binary supports delayed loading of WinFsp DLL. This makes the mkdwarfs, dwarfsck and dwarfsextract tools usable without the WinFsp DLL.
Added support for Brotli compression. This is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a compression ratio better than ZSTD. Fixes github #76.
Added --filter option to support simple (rsync-like) filter rules. This resulted from a discussion on github #6.
Added --compress-niceness option to mkdwarfs. This lowers the priority of the compression worker threads, which has two advantages: a system running mkdwarfs will generally be more responsive, and the compression threads won't starve themselves by taking processing power away from the segmenter.
Added --stdout-progress option to dwarfsextract for use with tools such as yad. Fixes github #117.
Added --chmod option to mkdwarfs. Fixes github #7.
Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.
Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3 hash. Also fixes github #92.
Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.
Added --num-scanner-workers option.
Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.
Show throughput in the scanning and segmenting phases in mkdwarfs.
Show how much of a file has been consumed in the segmenting phase in mkdwarfs. Useful primarily for large files.
New metadata format (v2.5). The only change is the addition of a "preferred path separator". This is used to correctly interpret symbolic links, as this is the only place where path separators are stored in DwarFS at all.
dwarfs and dwarfsextract now have options to enable performance monitoring. This can provide insight into the latency of various file system operations.
Unreadable files are now added as empty files instead of being ignored. Fixes github #40.
Honour user locale settings when formatting numbers.

Performance improvements

Added a small offset cache to improve random access as well as sequential read latency for large, fragmented files. This gave a 100x higher throughput for a case where DwarFS was used to compress raw file system images. The DwarFS FUSE driver is now capable of achieving read throughput of more than 6 GB/s on a Xeon(R) E-2286M machine.
Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.
Improved de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.
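
The size-first de-duplication idea can be sketched as follows. This is a simplified batch version with assumed names: the real scanner works incrementally and delays hashing until a second file of the same size shows up, and SHA-256 is used here only as a stand-in for whichever file hash is configured.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group duplicate files; `files` maps name -> bytes content.

    Content is hashed only for sizes shared by at least two files.
    Returns (groups, hashed_count).
    """
    by_size = defaultdict(list)
    for name, data in files.items():
        by_size[len(data)].append(name)

    groups = []
    hashed = 0
    for names in by_size.values():
        if len(names) < 2:
            continue  # unique size: cannot be a duplicate, skip hashing
        by_digest = defaultdict(list)
        for name in names:
            by_digest[hashlib.sha256(files[name]).hexdigest()].append(name)
            hashed += 1
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return sorted(groups), hashed
```

Files with a unique size are never read for hashing at all, which is where the savings on slow file systems come from.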

Bugfixes

Use folly::hardware_concurrency(). Fixes github #130.
Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats (e.g. USTAR, which has a limit of at most 255 characters).
Properly handle unicode path truncation.
Support LZ4 compression levels above 9.
Fix heap-use-after-free in dwarfsextract due to missing archive_write_close() call.
Fix heap-use-after-free in brotli decompressor due to re-allocation of the decompressed block data.
Default FUSE driver debuglevel to warn in background mode. Fixes github #113.
Fixed extract_block.py, which was incorrectly using printf instead of print.

Documentation

Updated file system format documentation to cover headers and section indices.
Documented how to produce bit-identical images.
Updated internal operation section of mkdwarfs manpage.

Testing

Lots of new tools tests.
Removed dependency on tar and diff binaries, mainly driven by their unavailability on Windows.
Added GitHub workflow based CI pipeline to avoid regressions and simplify builds.

Other

The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.
Started using C++20 features.
Versioning files are no longer written to the git source tree.

- C++
Published by mhx over 2 years ago

dwarfs - dwarfs-0.7.0-RC6

Features

Support delayed loading of WinFsp DLL for universal binary. This makes the mkdwarfs, dwarfsck and dwarfsextract tools of the universal binary usable without the WinFsp DLL.

Performance

Optimized the offset cache to improve random read latency as well as sequential read latency. This gave a 100x higher throughput for a case where DwarFS was used to compress raw file system images. Fixes github #142.

Bugfixes

Fixed building with make instead of ninja. Also fixed building in Debug mode. Fixes github #146.
Fixed ninja clean.
Fixed symlink creation for mount.dwarfs/mount.dwarfs2.

Other

Added CI pipeline.
Don't write versioning files to source tree.

- C++
Published by mhx over 2 years ago

dwarfs - dwarfs-0.7.0-RC5

Features

Windows support. All tools can now be built and run on Windows, including the FUSE driver, which makes use of WinFsp. Also fixes github #85.
Build a "universal" binary that combines mkdwarfs, dwarfsck, dwarfsextract and dwarfs in a single binary. This binary can be used either through symbolic links with the proper names of the tool, or by passing --tool=<name> as the first argument on the command line.
Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.
Show throughput in the scanning and segmenting phases in mkdwarfs.
Show how much of a file has been consumed in the segmenting phase. Useful primarily for large files.
dwarfs and dwarfsextract now have options to enable performance monitoring. This can give insight into the latency of various file system operations.
Added inode offset cache, which improves read() latency for very fragmented files.

Bugfixes

Use folly::hardware_concurrency(). Fixes github #130.
Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats.
Properly handle unicode path truncation.

Documentation

Update file system format documentation to cover headers and section indices.

Testing

Lots of new tools tests.
Remove dependency on tar and diff binaries.

Other

Switch to C++20.

- C++
Published by mhx over 2 years ago

dwarfs - dwarfs-0.7.0-RC4

Features

Add --compress-niceness option to mkdwarfs.

- C++
Published by mhx about 3 years ago

dwarfs - dwarfs-0.7.0-RC3

Bugfixes

Fix heap-use-after-free in dwarfsextract.
Fix dwarfs benchmark binary.

Features

Add --stdout-progress option to dwarfsextract. Fixes github #117.

Other

Reduce amount of test data to speed up compiles and avoid timeouts on travis.

- C++
Published by mhx over 3 years ago

dwarfs - dwarfs-0.7.0-RC2

Bugfixes

Fix linking against compression libs. Fixes github #112.
Default FUSE driver debuglevel to warn in background mode. Fixes github #113.

Features

Add --chmod option. Fixes github #7.
Add unreadable files as empty files. Fixes github #40.

Documentation

Document how to produce bit-identical images.
Update internal operation section of mkdwarfs manpage.
Add more documentation details for --file-hash option.

Other

Test image reproducibility for path and similarity ordering

- C++
Published by mhx over 3 years ago

dwarfs - dwarfs-0.7.0-RC1

Bugfixes

Fixed extract_block.py, which was incorrectly using printf instead of print.
Support LZ4 compression levels above 9.

Features

Added --filter option to support simple (rsync-like) filter rules. This was driven by a discussion on github #6.
Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.
The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.
Added support for Brotli compression. This is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a compression ratio better than ZSTD. Fixes github #76.
Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3. Also fixes github #92.
Improved de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.
Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.
Honour user locale when formatting numbers.
Added --num-scanner-workers option.
Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.

Other

Added unit tests for progress class.
Lots of internal cleanups.

- C++
Published by mhx over 3 years ago

dwarfs - dwarfs-0.6.2

Bugfixes

Fix #91: image creation reproducibility. Add --no-create-timestamp option, produce deterministic inode numbers and fix fsst bug that causes symbol tables to be non-deterministic. Images built while omitting create timestamps will now be bit-identical.
Fix #93: only overwrite existing output file when --force option given on command line.
Fix #104: extracting large files was causing dwarfsextract to OOM. This was fixed by extracting large files in chunks rather than all at once.
Fix #105: handle strrchr() return NULL.
Fix out-of-bounds access (PR #106).
Fix swapped-out cached block detection (PR #107).
Fix data race in cached block that was triggered by statistics collection and could cause the process to crash.
Fix heap-use-after-free when writing section index.
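
The fix for #104 amounts to bounding the amount of data held in memory at any time. A minimal sketch of chunked copying is shown below; the function name and the default chunk size are assumptions, not the values used by dwarfsextract.

```python
def copy_in_chunks(src, dst, chunk_size=1 << 20):
    """Copy from file-like `src` to `dst`, holding at most `chunk_size`
    bytes in memory at a time. Returns the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

Memory use then depends on the chunk size rather than on the size of the file being extracted.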

- C++
Published by mhx over 3 years ago

dwarfs - dwarfs-0.6.1

Bugfixes

Fix binary installation. This caused the 0.6.0 binary release to contain test binaries as well as duplicate binaries.
The fuse2 driver (dwarfs2) was also missing in the 0.6.0 binary release.

- C++
Published by mhx over 3 years ago

dwarfs - dwarfs-0.6.0

Features

Add support for cache tidying, which releases cache memory when the mounted file system is unused.
Section index support for speeding up mount times (fixes #48).

Bugfixes

Fix and simplify static builds as much as possible. Document how to set up a static build environment. This also fixes #75 and #54. Huge shoutout to Maxim Samsonov (@maxirmx) for implementing most of this!
Fix #71: driver hangs when unmounting.
Fix #67: dwarfs I/O hangs if call to fuse_reply_iov fails.
Fix #86: block size bits config issues.
Various build fixes.

- C++
Published by mhx over 3 years ago

dwarfs - dwarfs-0.5.6

Bugfixes

Build fixes for gcc-11 (fixes #52)
Use REALPATH in version.cmake to fix building in symbolically linked repositories (fixes #47).

- C++
Published by mhx over 4 years ago

dwarfs - dwarfs-0.5.5

Features

If a filesystem block cannot be compressed to less than the uncompressed size, it will be stored uncompressed. This feature actually fixes the bug described below.
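
The store-uncompressed fallback can be sketched with Python's lzma module. The function names and the (flag, payload) return shape are assumptions for illustration; the on-disk representation in DwarFS is different.

```python
import lzma


def pack_block(data: bytes):
    """Return (is_compressed, payload) for one block."""
    compressed = lzma.compress(data)
    # Keep the raw block unless compression made it strictly smaller.
    if len(compressed) >= len(data):
        return False, data
    return True, compressed


def unpack_block(is_compressed: bool, payload: bytes) -> bytes:
    """Inverse of pack_block."""
    return lzma.decompress(payload) if is_compressed else payload
```

High-entropy input, such as already compressed files, takes the first branch, so a stored block never grows beyond its uncompressed size.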

Bugfixes

When building a filesystem from high entropy input data (e.g. already compressed files), and when using LZMA compression with block sizes >= 25, the LZMA algorithm could be unable to pack a block into the worst-case allocated size. This behaviour was not expected and crashed mkdwarfs, and seems to me like a bug in LZMA's lzma_stream_buffer_bound() function. The issue has been fixed by not compressing blocks at all if the compressed size matches or exceeds the uncompressed size. This fixes part of github #45.
Filesystems created such that after segmenting the total data size was a multiple of the block size (i.e. the last block was completely filled) had the last block written to the image twice. Such a filesystem image is perfectly usable, but the repeated block uses space unnecessarily. This is highly unlikely to happen with real data.
Filesystems created with -P shared_files, but no shared files in the source tree, were created correctly, but could not be loaded. This has been fixed and the filesystems can now be loaded correctly.

Other

Added tests for binaries and FUSE driver.
Minor code cleanups.

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.5.4

Bugfixes

FUSE driver hangs when accessing files and the driver is not started in foreground or debug mode. This bug is present in both the 0.5.2 and 0.5.3 releases. Fixes github #44.

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.5.3

Bugfixes

Add PREFER_SYSTEM_GTEST for distributions (like Gentoo) that have a gtest package. (fixes github #42)
Make sure the source tarball can be built inside a git repo. The version file generation code would attempt to pull information from any outside git repository without checking if it's actually the DwarFS repo. This issue came up when building Arch Linux packages.

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.5.2

Bugfixes

Make FUSE driver exit with non-zero exit code if filesystem cannot be mounted. Fixes github #41.

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.5.1

Bugfixes

fsst library was built with -march=native, which caused the static binaries not to work on non-AVX platforms. The fsst library is now being built with no extra flags.

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.5.0

New Features

New metadata format (v2.3). This includes a number of changes:
- Correct hardlink preservation. With older metadata formats, all duplicate files would appear hardlinked. The new format preserves hardlinked files exactly as present in the input data, and performs additional deduplication at a lower level.
- The new format offers a lot of customization for additional packing of metadata. You can use these to trade off metadata size, mounting speed, etc. Especially for filesystems with millions of files, the metadata size can be reduced significantly.
- In particular, filename and symlink data can be stored in a format that reduces the size by roughly a factor of two, but still allows for random access, so the compressed data can be mapped into memory and decompressed on the fly.
DwarFS now directly supports images using a custom header. The header can be completely arbitrary. mkdwarfs can write, replace or remove such headers, and all other tools can either skip to a specified offset, or determine this offset automatically. This fixes github #38.
dwarfsck has been improved to perform extensive metadata checks.
dwarfsck now shows a detailed breakdown of metadata memory usage, which can be used to optimize metadata packing options.
Added ENABLE_COVERAGE cmake option.

Performance improvements

Scanning has been significantly optimized and is now up to three times faster on average.
Digest computation has been parallelized in both mkdwarfs and dwarfsck giving better performance on multi-core systems.
A set of micro-benchmarks has been added to evaluate the performance of different filesystem operations. This can be built by enabling the -DWITH_BENCHMARKS=1 cmake option.
Zstd contexts are now reused during compression, which seems to give some minor speedup.

Bugfixes

Disable multiversioning on non-x86 platforms, which broke the ARM build.
Due to a bug in the bloom filter code, only half of each 64-bit block in the bloom filter was utilized, which reduced the efficiency of the filter. The bug was spotted thanks to ubsan. With the fixed filter being twice as effective, the default size of the bloom filter has now been halved.
When exporting metadata using --export-metadata, dwarfsck was not truncating the output file, which could lead to a corrupt metadata export.

Other

Compatibility testing with older filesystem versions has been improved.
A new test suite has been added to check detection of corrupted DwarFS images.
Added some high level internals documentation for mkdwarfs.
Documented the filesystem and metadata formats.
Lots of internal cleanups.

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.4.1

Performance improvements

Binaries built with gcc have traditionally been much slower than those built with clang, but it was unclear why that was the case. It turns out the reason is simply that CMake defaults to -O3 optimization, which is known to cause performance regressions in some cases. The build has been changed to always build with -O2 when doing an optimized GCC build. The Clang build is unaffected. (fixes github #14)
The segmenting code now uses a bloom filter to discard unsuccessful matches as early and quickly as possible. While this only gives a minor speedup when using a single lookback block, as you increase the number of lookback blocks speed is barely affected whereas before it would slow down significantly. The bloom filter size (relative to the number of values) can be tuned by using --bloom-filter-size, though increasing it any further from the default is likely not going to make a difference.
nilsimsa similarity computation has been improved to make use of different instruction sets depending on CPU architecture, speeding up the process of ordering files by similarity by almost a factor of 2.
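
For readers unfamiliar with bloom filters, the early-rejection idea used by the segmenter can be sketched like this. It is a generic textbook bloom filter with assumed names and parameters (BLAKE2b-based hashing, a Python int as the bit set), not the implementation used in mkdwarfs.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit set

    def _positions(self, value: bytes):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(value, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, value: bytes):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))
```

A negative answer is always correct, so the expensive exact lookup only needs to run when might_contain() returns True.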

Bugfixes

Linking against libarchive was fixed so that it also works for shared library builds. (fixes github #36)
mkdwarfs didn't catch certain exceptions correctly, which would cause a stack trace instead of a simple error message. This has been fixed.
The statically linked executables were unable to handle any exceptions at all due to duplicate stack unwinding code. This has (hopefully) been fixed now.

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.4.0

Up to twice as fast and up to 10% better compression

The segmenting algorithm has been completely rewritten and is now much cleaner, uses much less memory, is significantly faster and detects a lot more duplicate segments. At the same time it's easier to configure (just a single window size instead of a list).

As a result, mkdwarfs speed has been significantly improved. The 47 GiB worth of Perl installations can now be turned into a DwarFS image in less than 6 minutes, about 30% faster than with the 0.3.1 release. Using lzma compression, it actually takes less than 4 minutes now, almost twice as fast as 0.3.1.

At the same time, compression ratio also significantly improved, mostly due to the new segmenting algorithm. With the 0.3.1 release, using the default configuration, the 47 GiB of Perl installations compressed down to 471.6 MiB. With the 0.4.0 release, this has dropped to 426.5 MiB, a 10% improvement. Using lzma compression (-l9), the size of the resulting image went from 319.5 MiB to 300.9 MiB, about 5% better. More importantly, though, the uncompressed file system size dropped from about 7 GiB to 4 GiB thanks to improved segmenting, which means fewer blocks need to be decompressed on average when using the file system.
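
As a quick check of the percentages quoted above, the relative size reduction can be computed directly from the image sizes. The helper name is just for illustration.

```python
def reduction_percent(before: float, after: float) -> float:
    """Relative size reduction in percent."""
    return (before - after) / before * 100.0


default_gain = reduction_percent(471.6, 426.5)  # default configuration, MiB
lzma_gain = reduction_percent(319.5, 300.9)     # lzma compression (-l9), MiB
```

This gives roughly 9.6% for the default configuration and roughly 5.8% for lzma, in line with the rounded figures in the text.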

New `dwarfsextract` tool

The new tool allows extracting a file system image directly to disk without having to use the FUSE driver. It also allows conversion of the file system image directly into a standard archive format (e.g. tar or cpio). Extracting a DwarFS image can be significantly faster than extracting an equivalent compressed archive.

Options have been cleaned up

The --blockhash-window-sizes and --blockhash-increment-shift options were replaced by --window-size and --window-step, respectively. The new --window-size option takes only a single window size instead of a list. There's also a new option --max-lookback-blocks that allows duplicate segments to be detected across multiple blocks, which can result in significantly better compression when using small file system blocks.

Bugfixes

The rewrite of the segmenting algorithm was triggered by a "bug" (github #35) that caused excessive memory consumption in mkdwarfs. It wasn't really a bug, though, more like a bad algorithm that used memory proportional to the file size. This issue has now been fully solved.
Scanning of large files would excessively grow the RSS of mkdwarfs. The kernel would have reclaimed the memory sooner or later, but the code now actively releases it while scanning.
The project can now be built to use the system installed zstd and xxHash libraries. (fixes github #34)
The project can now be built without the legacy FUSE driver. (fixes github #32)

- C++
Published by mhx almost 5 years ago

dwarfs - dwarfs-0.3.1

Bugfix release

This fixes a couple of minor compilation issues mostly related to issue #31.

- C++
Published by mhx about 5 years ago

dwarfs - dwarfs-0.3.0

Even better compression than before

Mostly thanks to a new ordering algorithm that is now enabled by default, I've seen a 15% improvement in achievable compression ratio. In my standard test of packing 48 GiB of Perl installations, the resulting DwarFS image size reduced from 556 MiB to 472 MiB without any regression in compression speed.

More memory efficient FUSE driver

By switching to jemalloc, the FUSE driver has become much more memory efficient, using up to ten times less memory than with the standard glibc allocator.

Python scripting support

The Lua scripting interface has been fully replaced by a new Python interface. I've been looking for a luabind replacement, but none of the candidates seemed to be well maintained or reasonably easy to integrate. Python is much more approachable for most people and boost::python seems well maintained. The new interface also has a lot more features. You can find an example script in the distribution.

Fix for file system images created with versions before dwarfs-0.2.3

If you've created DwarFS images with the 0.2.0, 0.2.1 or 0.2.2 releases, symbolic links were stored in a way that the FUSE driver in the 0.2.x releases could not read them back correctly. With the new 0.3.0 release, these old images, including the symbolic links, can now be read again, so there's no need to rebuild your old images.

Improved file system format

The file system format has been updated with the 0.3.0 release to include integrity checking via SHA2-512/256 hashes as well as features that should make recovery easier in case of file system image corruption. In addition to the SHA hashes, the extremely fast xxHash library is used to store a second hash that is checked every time any part of the file system is used. While there are currently no recovery features implemented, having this data in the file system already should be really valuable. You can convert an old image to the new format using:

mkdwarfs -i old.dwarfs -o new.dwarfs --recompress none

Statically linked 64-bit Linux binaries available

Given the long list of dependencies, building DwarFS might not be an option for you. In that case, you can now download the binary distribution that should work fine on most 64-bit Linux distributions. FUSE drivers are included for both FUSE2 and FUSE3.

Lots of smaller fixes & changes

See the Change Log for a full list of changes.

- C++
Published by mhx about 5 years ago

dwarfs - dwarfs-0.3.0-RC1

- C++
Published by mhx about 5 years ago

dwarfs - dwarfs-0.2.4

Fix --set-owner and --set-group options, which caused an exception to be thrown at the end of creating a file system. (fixes github #24)

- C++
Published by mhx about 5 years ago
