2068 Commits

Author SHA1 Message Date
Nick Terrell
4d8a2132d0 [opt] Fix oss-fuzz bug in optimal parser
oss-fuzz uncovered a scenario where we're evaluating the cost of litLength = 131072,
which can't be represented in the zstd format, so we accessed 1 beyond LL_bits.

Fix the issue by making it cost 1 bit more than litLength = 131071.

There are still follow ups:
1. This happened because literals_cost[0] = 0, so the optimal parser chose 36 literals
   over a match. Should we bound literals_cost[literal] > 0, unless the block truly only
   has one literal value?
2. When no matches are found, the cost model isn't updated. In this case no matches were
   found for an entire block. So the literals cost model wasn't updated at all. That made
   the optimal parser think literals_cost[0] = 0, where it is actually quite high, since
   the block was entirely random noise.

Credit to OSS-Fuzz.
2022-01-06 16:10:18 -08:00
Yann Collet
41ad7332dd Updated expression for better readability 2022-01-04 09:07:11 -08:00
Yann Collet
8c53e526db fix performance issue in scenario #2966 (part 1)
When re-using a compression state, across multiple successive compressions,
the state should minimize the amount of allocation and initialization required.

This mostly matters in situations where initialization is an overwhelming task
compared to compression itself.
This can happen when the amount to compress is small,
while the compression state was given the impression that it would be much larger,
aka, streaming mode without providing a srcSize hint.

This lean-initialization optimization was broken in 980f3bbf8354edec0ad32b4430800f330185de6a .

This commit fixes it, making this scenario once again on par with v1.4.9.

Note that this does not completely fix #2966,
since another heavy initialization, specific to row mode,
is also happening (and was not present in v1.4.9).
This will be fixed in a separate commit.
2021-12-31 15:16:19 -08:00
Yann Collet
ad7c9fc11e use ZSTD_memcpy(), for proper redirection within Linux Kernel 2021-12-28 17:41:47 -08:00
Yann Collet
8da414231d found a few more places which were dependent on seqStore offcode sumtype numeric representation 2021-12-28 17:03:24 -08:00
Yann Collet
de9f52e945 regroup all mentions of ZSTD_REP_MOVE within zstd_compress_internal.h 2021-12-28 13:47:57 -08:00
Yann Collet
a34ccad9a6 fixed minor conversion warnings 2021-12-28 13:21:22 -08:00
Yann Collet
92a08eec72 abstracted storeSeq() sumtype numeric representation from zstd_lazy.c 2021-12-28 12:23:39 -08:00
Yann Collet
e909fa627f abstracted storeSeq() sumtype numeric representation from zstd_opt.c 2021-12-28 12:14:33 -08:00
Yann Collet
6fa640ef70 separate newRep() from updateRep()
the new contracts seems to make more sense :
updateRep() updates an array of repeat offsets _in place_,
while newRep() generates a new structure with the updated repeat-offset array.

Most callers are actually expecting the in-place variant,
and a limited sub-section, in `zstd_opt.c` mainly, prefer `newRep()`.
2021-12-28 11:52:33 -08:00
Yann Collet
321583ccf5 fixed minor typecast warnings 2021-12-28 11:38:21 -08:00
Yann Collet
b7630a474b abstracted usage of offBase sumtype within zstd_lazy.c 2021-12-28 10:59:47 -08:00
Yann Collet
435f5a2e6d fixed regression test assert
optLdm->offset might be == 0 in invalid case.
Only use STORE_OFFSET() after validating it's a correct case.
2021-12-28 09:55:31 -08:00
Yann Collet
2068889146 created STORED_*() macros
to act on values stored / expressed in the sumtype numeric representation required by `storedSeq()`.

This makes it possible to abstract away this representation by using the macros to extract these values.

First user : ZSTD_updateRep() .
2021-12-28 06:59:07 -08:00
Yann Collet
1aed962216 introduce macros STORE_OFFSET() and STORE_REPCODE()
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.

Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.

While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".

One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.

This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.

At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
2021-12-23 22:03:30 -08:00
Yann Collet
aeff128331 change seqDef.offset into seqDef.offBase
to better reflect the value stored in this field.
2021-12-23 17:56:08 -08:00
Yann Collet
e145b58cfd changed seqDef.matchLength into seqDef.mlBase
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
2021-12-23 13:39:46 -08:00
Yann Collet
b77fcac61f change ZSTD_storeSeq() interface to accept matchLength
instead of mlBase.

This removes the need to do `- MINMATCH` at every call site.

The new interface contract is checked with an `assert()`.
2021-12-23 12:03:33 -08:00
Yann Collet
a9e43b37d0
Revert "Limit ZSTD_maxCLevel to 21 for 32-bit binaries." 2021-12-20 11:43:14 -08:00
Yann Collet
f829c32258 forgot the chainlog is effectively a "fake" value with rowHash
the only value which makes sense is `hashlog-1`
as it mimics the real memory usage.
2021-12-16 11:37:40 -08:00
Yann Collet
db1b408a2f rebalance lazy compression levels 2021-12-15 21:33:31 -08:00
Yann Collet
c8d6067615 fixed incorrect rowlog initialization
the variable has only very limited usage,
being only used once at the beginning of the block for prefetching only,
hence the error had no impact on compression ratio.
2021-12-15 14:37:05 -08:00
Yann Collet
eaf786242d
Merge pull request #2929 from facebook/sse_row_lazy
simplify SSE implementation of row_lazy match finder
2021-12-15 11:47:15 -08:00
Norbert Lange
2fbb1d10c1 Reduce bit tables to 8bit
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.

Should not have negative performance implications.
2021-12-14 23:47:57 +01:00
Yann Collet
05430b25a8 roll SSE implementation of row_lazy match finder
mostly for maintenance convenience.

Performance wise, there is very little change,
slightly faster for slog 3 & 4,
neutral or very slightly negative for slot 5 & 6.
2021-12-14 10:44:23 -08:00
W. Felix Handte
82a49c88f9 Increment Step by 1 not 2
I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)

Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as
opposed to the `01__23__` we were actually doing), it's a big ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit
sliiightly improving compression. Speed remains pretty much unchanged.
2021-12-13 16:59:33 -05:00
W. Felix Handte
6ca5f42402 Rewrite step to Track Increment Between Pairs of Positions
The position updates are rewritten from `ip[N] = ip[N-1] + step` to be
`ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing
of gaps at setup and then we only have to keep a single `step` variable.

This seems to work quite well on GCC and Clang!
2021-12-13 14:48:26 -05:00
W. Felix Handte
b8434cb754 Allow Templating ZSTD_fast Matchfinders on Acceleration (Lvl < -1) 2021-12-13 14:46:57 -05:00
W. Felix Handte
ace6a7e746 Decompose step into Two Variables
This avoids an additional addition, at the cost of an additional variable.
2021-12-10 16:44:23 -05:00
W. Felix Handte
22501cd283 Stagger Application of stepSize in ZSTD_fast
This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That
is, it always looks at adjacent pairs of positions, and only applies the
acceleration every other position. This produces a more fine-grained
acceleration.
2021-12-10 16:44:23 -05:00
Nick Terrell
b94407b6cf Remove possible NULL pointer addition
Refactor `ZSTDMT_isOverlapped()` to do NULL checks before computing the
end pointer.

Fixes #2906.
2021-12-08 12:40:40 -08:00
Nick Terrell
014bbb29f8
Merge pull request #2898 from terrelln/issue-2862
Improve zstd_opt build speed and size
2021-12-02 19:49:43 -05:00
Yann Collet
1bf3d8a475
Merge pull request #2896 from facebook/m68k
Zstandard compiles and run on m68k cpus
2021-12-02 14:25:45 -08:00
Nick Terrell
e5bfaeede7 Improve zstd_opt build speed and size
Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.

Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into
every function, since `dictMode` is no longer used as a template. Create
two specializations, for opt levels 0 and 2, and call one of the two
specializations.

Lastly, remove the hack that disabled inlining for zstd_opt for the
Linux Kernel, as we've gotten most of the benefit already.

Compilation time sees a ~4x reduction:

| Compiler | Flags                            | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc      | -O3                              |         10.1 |         2.3 |  -77% |
| gcc      | -O3 -fsanitize=address,undefined |         61.1 |        10.2 |  -83% |
| clang    | -O3                              |          9.0 |         2.1 |  -76% |
| clang    | -O3 -fsanitize=address,undefined |         33.5 |         5.1 |  -84% |

Build size is reduced by 150KB - 200KB:

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      |                1327476 |               1177108 |  -11% |
| clang    |                1378324 |               1167780 |  -15% |

There is a <2% speed loss in all cases:

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      |    16 |             4.78 |            4.72 | -1.25% |
| gcc      |    17 |             3.49 |            3.46 | -0.85% |
| gcc      |    18 |             2.92 |            2.86 | -2.04% |
| gcc      |    19 |             2.61 |            2.61 |  0.00% |
| clang    |    16 |             4.69 |            4.80 |  2.34% |
| clang    |    17 |             3.53 |            3.49 | -1.13% |
| clang    |    18 |             2.86 |            2.85 | -0.34% |
| clang    |    19 |             2.61 |            2.61 |  0.00% |

Fixes Issue #2862.
2021-12-02 14:19:41 -08:00
Nick Terrell
01ecd6ffc0
Merge pull request #2892 from terrelln/issue-2785
[CircleCI] Fix short-tests-0
2021-12-02 16:20:56 -05:00
Yann Collet
30b9db8ae4 changed macro name to ZSTD_ALIGNOF
for better consistency
2021-12-02 12:57:42 -08:00
Nick Terrell
21e28f5c24
Merge pull request #2891 from supperPants/dev
Fix typos
2021-12-02 13:53:33 -05:00
Yann Collet
39dced092e fix align conditions for huf_compress 2021-12-01 23:02:00 -08:00
Nick Terrell
91f5891dd0 [CircleCI] Fix short-tests-0
short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead.

Also fix all the test failures that were exposed.

`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
2021-12-01 17:43:46 -08:00
Yann Collet
e89e847820 added alignment test
and fix an incorrect alignment check in cwksp which was failing on m68k
2021-12-01 17:16:36 -08:00
Yann Collet
3f64b31585 Merge branch 'dev' into tomerge2051 2021-12-01 15:29:49 -08:00
Yann Collet
8031dc7a48
Merge pull request #2885 from yoniko/limit-level-32bit-systems
Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries.
2021-12-01 14:19:16 -08:00
supperPants
d4713de5a3 Fix typos. 2021-12-01 22:36:21 +08:00
Nick Terrell
5414dd7978 [bmi2] Add lzcnt and bmi target attributes
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
  TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
  BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
  be any cases where bmi2 is supported but bmi1 isn't. But, since we are
  using the instruction we should check bmi1 as well.
2021-11-30 17:54:56 -08:00
Yonatan Komornik
ef2cba609d ZSTD_maxCLevel now limited to 21 for 32-bit binaries.
CI tests for constrained memory runs with max level on 32-bit binaries.
2021-11-30 10:31:52 -08:00
Felix Handte
c2c6a4ab40
Merge pull request #2869 from felixhandte/oss-fuzz-fix-41005
Determinism: Avoid Mapping Window into Reserved Indices during Reduction
2021-11-18 10:11:48 -05:00
W. Felix Handte
66079085f0 Determinism: Avoid Mapping Window into Reserved Indices during Reduction
PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It
succeeded in addressing that source of non-determinism, but introduced a new
one: it was possible, when index reduction occurred, to map indices in the
window to the reserved value, which would cause them to be zeroed, potentially
altering parsing of the input.

This PR addresses this issue. It makes sure that the bottom of the window is
always `>= ZSTD_WINDOW_START_INDEX`.

I'm not sure if this makes #2850 redundant. I think it's probably still
valuable to have that protection as well.

Credit to OSS-Fuzz for discovering this issue.
2021-11-17 18:09:18 -05:00
Yann Collet
a37a8df532
Merge pull request #2856 from rex4539/typos
Fix typos
2021-11-17 13:04:30 -08:00
Nick Terrell
b7d899d99d
Merge pull request #2864 from terrelln/linux-opt
[linux-kernel] Don't inline function in zstd_opt.c
2021-11-16 14:13:39 -08:00
Nick Terrell
19eb459da3 [linux-kernel] Don't inline function in zstd_opt.c
The optimal parser is unlikely to be used in the linux kernel in
practice. There is no reason these functions should be force inlined,
since we aren't gaining anything, and are losing build size.

| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11   |        1142090 |        952754 |       -189336 |
| clang-12 |        1228402 |        976290 |       -252112 |

This is a temporary solution pending the resolution of PR #2862 in the
`dev` branch.
2021-11-15 20:37:30 -08:00