Previously, if an index was equal to `reducerValue + 1`, it would get remapped
during index reduction to 1 i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the
parsing of the input slightly, by causing tree nodes to be nullified when they
otherwise wouldn't be. This hardly matters from a correctness or efficiency
perspective, but it does impact determinism.
So this commit changes index reduction to avoid mapping indices to collide with
`ZSTD_DUBT_UNSORTED_MARK`.
that's clearer than finding the tables somewhere in the middle of `compress.c`.
Also, down the line, it may potentially allows zstd to feature adjusted tables depending on target cpu.
Speed up compilation times by moving each specialized search function
into its own function. This is faster because compilers can handle many
smaller functions much faster than one gigantic function. The previous
approach generated one giant function with `switch` statements and
inlining to select the implementation.
| Compiler | Flags | Dev Time (s) | PR Time (s) | Delta |
|----------|-------------------------------------|--------------|-------------|-------|
| gcc | -O3 | 16.5 | 5.6 | -66% |
| gcc | -O3 -g -fsanitize=address,undefined | 158.9 | 38.2 | -75% |
| clang | -O3 | 36.5 | 5.5 | -85% |
| clang | -O3 -g -fsanitize=address,undefined | 27.8 | 17.5 | -37% |
This also reduces the binary size because the search functions are no
longer inlined into the main body.
| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc | 1563868 | 1308844 | -16% |
| clang | 1924372 | 1376020 | -28% |
Finally, the performance is not impacted significantly by this change,
in fact we generally see a small speed boost.
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|-------|
| gcc | 5 | 110.6 | 110.0 | -0.5% |
| gcc | 7 | 70.4 | 72.2 | +2.5% |
| gcc | 9 | 53.2 | 53.5 | +0.5% |
| gcc | 13 | 12.7 | 12.9 | +1.5% |
| clang | 5 | 113.9 | 110.4 | -3.0% |
| clang | 7 | 67.7 | 70.6 | +4.2% |
| clang | 9 | 51.9 | 52.2 | +0.5% |
| clang | 13 | 12.4 | 13.3 | +7.2% |
The compression strategy is unmodified in this PR, so the compressed size
should be exactly the same. I may have a follow up PR to slightly improve
the compression ratio, if it doesn't cost too much speed.
Fix underflow of `nbCompares` by switching to an `int` and comparing
`nbCompares > 0`. This is a minimal fix, because I don't want to change
the logic. These loops seem to be doing `nbCompares + 1` comparisons.
The bug was reported by Dan Carpenter and found by Smatch static
checker.
https://lore.kernel.org/all/20211008063704.GA5370@kili/
There is no minimum value check, so the parameter could be negative.
Switch to the standard pattern of using `BOUNDCHECK()`.
The bug was reported by Dan Carpenter and found by Smatch static
checker.
https://lore.kernel.org/all/20211008063704.GA5370@kili/