* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
- Incorrect weight comparison (weights are allowed to be equal to
table log).
- HUF_compress1X_usingCTable_internal() can return compressed
size >= source size, so the assert that `cSize <= 65535` isn't
correct, and it needs to be checked instead.
This PR fixes an incorrect comparison in figuring out `minChain` in
`ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison
had been masked by the fact that `idx` was always 1, until @terrelln changed
that in #2726.
Credit-to: OSS-Fuzz
The DUBT can be non-deterministic if an index is equal to
`ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the
indices at 2.
This bug was found by the OSS-Fuzz determinism fuzzer. With this change
the fuzzer test passes. And I've confirmed that this is the root cause,
not just hiding the problem.
Aside: This took me a long time to figure out, because I thought I had
tried this first thing. But, apparantly I messed it up, because when I
was going through it again with @felixhandte, I was pointing out that it
wasn't the case, but it turns out it was.
Credit to: OSS-Fuzz
* The block splitter missed a bounds check, so when the buffer is too small it
passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which
can then write the compressed data past the end of the buffer. This is a new
regression in v1.5.0 when the block splitter is enabled. It is either enabled
explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()`
or `ZSTD_compressStream*()`.
* `HUF_writeCTable_wksp()` omits a bounds check when calling
`HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass
an erroneously large size to `HUF_compressWeights()`, which can then write
past the end of the buffer. This bug has been present for ages. However, I
believe that zstd cannot trigger the bug, because it never calls
`HUF_compress*()` with `dstCapacity == 0` because of [this check][1].
Credit to: Oss-Fuzz
[1]: 89127e5ee2/lib/compress/zstd_compress_literals.c (L100)
* Flatten ZSTD_row_getMatchMask
* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary.
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)
* replace memcpy with MEM_readST
* silence alignment warnings
* fix neon casts
* Update zstd_lazy.c
* unify simd preprocessor detection (#3)
* remove duplicate asserts
* tweak rotates
* improve endian detection
* add cast
there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension
but must be uint16_t to get gcc to generate a rotate instruction..
* more casts
* fix casts
better work-around for the (bogus) warning: unary minus on unsigned
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.
This can cause non-determinism because prefix mode and ext-dict mode
match finders can return different results. It can also hurt speed
because ext-dict match finders are slower.
The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
never updated.
4. We will call the ext-dict match finder instead of the prefix one.
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it cause non-determinism.
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.
40def70387/lib/compress/zstd_opt.c (L476)
This optimization is based off the length longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.
The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
and restored limit to 256 when in 64-bit mode
(it was reduced to 200 to give more room for 32-bit).
This should fix test instability issues
using lot of threads in 32-bit environments.