The DUBT can be non-deterministic if an index is equal to
`ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the
indices at 2.
This bug was found by the OSS-Fuzz determinism fuzzer. With this change
the fuzzer test passes. And I've confirmed that this is the root cause,
not just hiding the problem.
Aside: This took me a long time to figure out, because I thought I had
tried this first thing. But, apparantly I messed it up, because when I
was going through it again with @felixhandte, I was pointing out that it
wasn't the case, but it turns out it was.
Credit to: OSS-Fuzz
Add the libzstd.pc target to the lib target in lib/Makefile, which makes
it inherit LDFLAGS_DYNLIB from the lib-mt target. This allows us to add
a Libs.private field to libzstd.pc which gets conditionally populated
with '-pthread'.
The 1.5.0 release notes mention that the static library isn't
multi-threaded by default, due to concern for people building static
binaries with libzstd:
Now the dynamic library supports multi-threaded compression by
default. Note that this property is not extended to the static
library because doing so would have impacted the build script of
existing client applications (requiring them to add -pthread to their
recipe), thus potentially breaking their build.
To get closer to being able to enable multi-threading for all library
builds by default, this commit makes it so that any libzstd consumer
using pkg-config gets the correct flags.
We also fix the indentation of the rule for libzstd.pc and move it
outside the if/endif block for install rules (which uses a list of OSs
where the rules were validated), so the rule is available for all users
of the 'lib*' targets.
* The block splitter missed a bounds check, so when the buffer is too small it
passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which
can then write the compressed data past the end of the buffer. This is a new
regression in v1.5.0 when the block splitter is enabled. It is either enabled
explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()`
or `ZSTD_compressStream*()`.
* `HUF_writeCTable_wksp()` omits a bounds check when calling
`HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass
an erroneously large size to `HUF_compressWeights()`, which can then write
past the end of the buffer. This bug has been present for ages. However, I
believe that zstd cannot trigger the bug, because it never calls
`HUF_compress*()` with `dstCapacity == 0` because of [this check][1].
Credit to: Oss-Fuzz
[1]: 89127e5ee2/lib/compress/zstd_compress_literals.c (L100)
* Flatten ZSTD_row_getMatchMask
* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary.
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)
* replace memcpy with MEM_readST
* silence alignment warnings
* fix neon casts
* Update zstd_lazy.c
* unify simd preprocessor detection (#3)
* remove duplicate asserts
* tweak rotates
* improve endian detection
* add cast
there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension
but must be uint16_t to get gcc to generate a rotate instruction..
* more casts
* fix casts
better work-around for the (bogus) warning: unary minus on unsigned
Even with -fvisibility=hidden added to CFLAGS, any symbol which is
given a default visibility attribute ends up exported in the dynamic
library. This happens through zstd_internal.h which defines
..._STATIC_LINKING_ONLY before including various header files, and is
included for example in lib/common/pool.c.
To avoid this, this patch distinguishes static and non-static APIs, by
using ZSTDLIB_API only for the latter, and introducing
ZSTDLIB_STATIC_API for the former. For now, both are exported, but
non-static APIs can be hidden by overriding the definition
ZSTDLIB_STATIC_API. lib/Makefile is modified to allow this using
make CPPFLAGS_DYNLIB=-DZSTDLIB_STATIC_API=ZSTDLIB_HIDDEN
In addition, API declarations are dropped from zstd_compress.c (they
aren't needed there).
Signed-off-by: Stephen Kitt <steve@sk2.org>
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.
This can cause non-determinism because prefix mode and ext-dict mode
match finders can return different results. It can also hurt speed
because ext-dict match finders are slower.
The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
never updated.
4. We will call the ext-dict match finder instead of the prefix one.
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it cause non-determinism.
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.
40def70387/lib/compress/zstd_opt.c (L476)
This optimization is based off the length longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.
The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
and restored limit to 256 when in 64-bit mode
(it was reduced to 200 to give more room for 32-bit).
This should fix test instability issues
using lot of threads in 32-bit environments.
When running armv6 userspace on armv8 hardware with a 64 bit Linux kernel,
the mode 2 caused SIGBUS (unaligned memory access).
Running all our arm builds in the build farm
only on armv8 simplifies administration a lot.
Depending on compiler and environment, this change might slow down
memory accesses (did not benchmark it). The original analysis is 6 years old.
Fixes#2632