Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.
Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into
every function, since `dictMode` is no longer used as a template. Create
two specializations, for opt levels 0 and 2, and call one of the two
specializations.
Lastly, remove the hack that disabled inlining for zstd_opt for the
Linux Kernel, as we've gotten most of the benefit already.
Compilation time sees a ~4x reduction:
| Compiler | Flags | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc | -O3 | 10.1 | 2.3 | -77% |
| gcc | -O3 -fsanitize=address,undefined | 61.1 | 10.2 | -83% |
| clang | -O3 | 9.0 | 2.1 | -76% |
| clang | -O3 -fsanitize=address,undefined | 33.5 | 5.1 | -84% |
Build size is reduced by 150KB - 200KB:
| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc | 1327476 | 1177108 | -11% |
| clang | 1378324 | 1167780 | -15% |
There is a <2% speed loss in all cases:
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|--------|
| gcc | 16 | 4.78 | 4.72 | -1.25% |
| gcc | 17 | 3.49 | 3.46 | -0.85% |
| gcc | 18 | 2.92 | 2.86 | -2.04% |
| gcc | 19 | 2.61 | 2.61 | 0.00% |
| clang | 16 | 4.69 | 4.80 | 2.34% |
| clang | 17 | 3.53 | 3.49 | -1.13% |
| clang | 18 | 2.86 | 2.85 | -0.34% |
| clang | 19 | 2.61 | 2.61 | 0.00% |
Fixes Issue #2862.
The optimal parser is unlikely to be used in the linux kernel in
practice. There is no reason these functions should be force inlined,
since we aren't gaining anything, and are losing build size.
| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11 | 1142090 | 952754 | -189336 |
| clang-12 | 1228402 | 976290 | -252112 |
This is a temporary solution pending the resolution of PR #2862 in the
`dev` branch.
Fix underflow of `nbCompares` by switching to an `int` and comparing
`nbCompares > 0`. This is a minimal fix, because I don't want to change
the logic. These loops seem to be doing `nbCompares + 1` comparisons.
The bug was reported by Dan Carpenter and found by Smatch static
checker.
https://lore.kernel.org/all/20211008063704.GA5370@kili/
turns out, it's possible to constify MatchState* parameter
in some parts of the binary tree algorithm,
making it a pure read-only parameter,
as opposed to a mutable state.
This is supposed to be helpful for both maintenance and the compiler.
better for large files, and sources with relatively "stable" entropy,
like silesia.tar.
slightly worse for files with rapidly changing entropy,
like Calgary.tar/.
Updated small files tests in fuzzer
used to be necessary to counter-balance the fixed-weight frequency update
which has been recently changed for an adaptive rate (targeting stable starting frequency stats).
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.
40def70387/lib/compress/zstd_opt.c (L476)
This optimization is based off the length longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.
The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`