facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Nick Terrell	4d8a2132d0	[opt] Fix oss-fuzz bug in optimal parser oss-fuzz uncovered a scenario where we're evaluating the cost of litLength = 131072, which can't be represented in the zstd format, so we accessed 1 beyond LL_bits. Fix the issue by making it cost 1 bit more than litLength = 131071. There are still follow ups: 1. This happened because literals_cost[0] = 0, so the optimal parser chose 36 literals over a match. Should we bound literals_cost[literal] > 0, unless the block truly only has one literal value? 2. When no matches are found, the cost model isn't updated. In this case no matches were found for an entire block. So the literals cost model wasn't updated at all. That made the optimal parser think literals_cost[0] = 0, where it is actually quite high, since the block was entirely random noise. Credit to OSS-Fuzz.	2022-01-06 16:10:18 -08:00
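The fix described in commit 4d8a2132d0 (price the unrepresentable literal length as one bit more than the largest representable one) can be sketched as follows. This is a minimal illustration, not the zstd source: the `sketch_` names, the toy bit-count price model, and the fixed-point scale of 256 per bit are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for this sketch only. */
#define SKETCH_MAX_LITLENGTH 131071u  /* largest litLength priced directly */
#define SKETCH_BIT_COST      256u     /* assumed fixed-point cost of one bit */

/* Toy price model (assumption): cost grows with the number of bits
 * needed to write litLength. Stands in for a real table lookup. */
static unsigned sketch_base_price(unsigned litLength)
{
    unsigned bits = 0;
    while (litLength > 0) { bits++; litLength >>= 1; }
    return bits * SKETCH_BIT_COST;
}

/* Price of a literal length. The one value past the representable range
 * is never looked up directly; it costs exactly one bit more than the
 * largest representable length, so no table is read out of bounds. */
static unsigned sketch_lit_length_price(unsigned litLength)
{
    assert(litLength <= SKETCH_MAX_LITLENGTH + 1);
    if (litLength == SKETCH_MAX_LITLENGTH + 1)
        return SKETCH_BIT_COST + sketch_lit_length_price(SKETCH_MAX_LITLENGTH);
    return sketch_base_price(litLength);
}
```

Calling `sketch_lit_length_price(131072u)` takes the special-case branch and returns the price of 131071 plus one bit.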
Yann Collet	41ad7332dd	Updated expression for better readability	2022-01-04 09:07:11 -08:00
Yann Collet	8c53e526db	fix performance issue in scenario #2966 (part 1) When re-using a compression state, across multiple successive compressions, the state should minimize the amount of allocation and initialization required. This mostly matters in situations where initialization is an overwhelming task compared to compression itself. This can happen when the amount to compress is small, while the compression state was given the impression that it would be much larger, aka, streaming mode without providing a srcSize hint. This lean-initialization optimization was broken in 980f3bbf8354edec0ad32b4430800f330185de6a . This commit fixes it, making this scenario once again on par with v1.4.9. Note that this does not completely fix #2966, since another heavy initialization, specific to row mode, is also happening (and was not present in v1.4.9). This will be fixed in a separate commit.	2021-12-31 15:16:19 -08:00
Yann Collet	ad7c9fc11e	use ZSTD_memcpy(), for proper redirection within Linux Kernel	2021-12-28 17:41:47 -08:00
Yann Collet	8da414231d	found a few more places which were dependent on seqStore offcode sumtype numeric representation	2021-12-28 17:03:24 -08:00
Yann Collet	de9f52e945	regroup all mentions of ZSTD_REP_MOVE within zstd_compress_internal.h	2021-12-28 13:47:57 -08:00
Yann Collet	a34ccad9a6	fixed minor conversion warnings	2021-12-28 13:21:22 -08:00
Yann Collet	92a08eec72	abstracted storeSeq() sumtype numeric representation from zstd_lazy.c	2021-12-28 12:23:39 -08:00
Yann Collet	e909fa627f	abstracted storeSeq() sumtype numeric representation from zstd_opt.c	2021-12-28 12:14:33 -08:00
Yann Collet	6fa640ef70	separate newRep() from updateRep(). The new contracts seem to make more sense: updateRep() updates an array of repeat offsets _in place_, while newRep() generates a new structure with the updated repeat-offset array. Most callers are actually expecting the in-place variant, and a limited sub-section, mainly in `zstd_opt.c`, prefers `newRep()`.	2021-12-28 11:52:33 -08:00
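The two contracts from commit 6fa640ef70 can be illustrated with a simplified sketch. It is not the zstd implementation: the `sketch_` names are hypothetical, and the update rule is reduced to pushing a new raw offset to the front of the history (repcode handling is deliberately left out).

```c
#include <assert.h>
#include <string.h>

#define SKETCH_REP_NUM 3  /* number of repeat offsets kept (assumed) */

typedef struct { unsigned rep[SKETCH_REP_NUM]; } sketch_repcodes_t;

/* In-place variant: the caller's array is modified directly.
 * Simplified rule: the new offset becomes rep[0], older ones shift down. */
static void sketch_updateRep(unsigned rep[SKETCH_REP_NUM], unsigned newOffset)
{
    rep[2] = rep[1];
    rep[1] = rep[0];
    rep[0] = newOffset;
}

/* Value-returning variant: the input array is left untouched and a new
 * structure holding the updated history is returned. */
static sketch_repcodes_t sketch_newRep(const unsigned rep[SKETCH_REP_NUM],
                                       unsigned newOffset)
{
    sketch_repcodes_t out;
    memcpy(out.rep, rep, sizeof(out.rep));
    sketch_updateRep(out.rep, newOffset);
    return out;
}
```

Building the value-returning variant on top of the in-place one keeps a single update rule, which matches the commit's description of most callers wanting the in-place form.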
Yann Collet	321583ccf5	fixed minor typecast warnings	2021-12-28 11:38:21 -08:00
Yann Collet	b7630a474b	abstracted usage of offBase sumtype within zstd_lazy.c	2021-12-28 10:59:47 -08:00
Yann Collet	435f5a2e6d	fixed regression test assert optLdm->offset might be == 0 in invalid case. Only use STORE_OFFSET() after validating it's a correct case.	2021-12-28 09:55:31 -08:00
Yann Collet	2068889146	created STORED_*() macros to act on values stored / expressed in the sumtype numeric representation required by `storeSeq()`. This makes it possible to abstract away this representation by using the macros to extract these values. First user : ZSTD_updateRep() .	2021-12-28 06:59:07 -08:00
Yann Collet	1aed962216	introduce macros STORE_OFFSET() and STORE_REPCODE(). These are meant to abstract the sumtype representation required to transfer `offcode` to `ZSTD_storeSeq()`. Unfortunately, the sumtype numeric representation is currently a leaky abstraction that has permeated many other parts of the code, especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`. While this PR does a good job of transferring a large number of call sites to the new macros, there are still a few sites where this transformation is more complex, or where the numeric representation itself is used "as is". One of the problematic areas is the decision to use the numeric format of the sumtype within the match finders of `zstd_lazy`. This commit doesn't change the behavior: it only introduces and employs the macros, and the resulting code remains identical. Ultimately, if the numeric representation of the sumtype can be completely abstracted and no other part of the code depends on it, it will be possible to move it towards something slightly more efficient.	2021-12-23 22:03:30 -08:00
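The idea behind the STORE_*() and STORED_*() macros in commits 1aed962216 and 2068889146 is to pack "repcode or real offset" into a single integer and keep that packing behind accessors. The sketch below shows one such encoding. The `SKETCH_` names and the exact numeric layout (repcodes 1..3 stored as 0..2, real offsets shifted up by `SKETCH_REP_MOVE`) are assumptions for illustration and may differ from the macros in the repository.

```c
#include <assert.h>

#define SKETCH_REP_NUM  3                     /* number of repcodes (assumed) */
#define SKETCH_REP_MOVE (SKETCH_REP_NUM - 1)  /* shift applied to real offsets */

/* Writers: turn a repcode (1..3) or a real offset (> 0) into the stored form. */
#define SKETCH_STORE_REPCODE(r) ((unsigned)(r) - 1u)
#define SKETCH_STORE_OFFSET(o)  ((unsigned)(o) + SKETCH_REP_MOVE)

/* Readers: classify a stored value and recover what was put in. */
#define SKETCH_STORED_IS_OFFSET(v)  ((v) > SKETCH_REP_MOVE)
#define SKETCH_STORED_IS_REPCODE(v) ((v) <= SKETCH_REP_MOVE)
#define SKETCH_STORED_OFFSET(v)     ((v) - SKETCH_REP_MOVE)
#define SKETCH_STORED_REPCODE(v)    ((v) + 1u)
```

As long as call sites only go through these accessors, the layout behind them can change later without touching the callers, which is the goal the commit message states.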
Yann Collet	aeff128331	change seqDef.offset into seqDef.offBase to better reflect the value stored in this field.	2021-12-23 17:56:08 -08:00
Yann Collet	e145b58cfd	changed seqDef.matchLength into seqDef.mlBase since this is effectively what is stored in this field (== matchLength - MINMATCH). This makes it clearer what needs to be done when reading from / writing to this field.	2021-12-23 13:39:46 -08:00
Yann Collet	b77fcac61f	change ZSTD_storeSeq() interface to accept matchLength instead of mlBase. This removes the need to do `- MINMATCH` at every call site. The new interface contract is checked with an `assert()`.	2021-12-23 12:03:33 -08:00
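The interface change in commit b77fcac61f can be sketched as a store function that accepts the full match length, checks the contract with `assert()`, and performs the `- MINMATCH` subtraction once internally. The struct layout and `sketch_` names are hypothetical, and `SKETCH_MINMATCH` is set to 3 as an assumption for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MINMATCH 3  /* assumed minimum match length */

/* Hypothetical sequence record: mlBase holds matchLength - SKETCH_MINMATCH. */
typedef struct {
    unsigned       offBase;
    unsigned short litLength;
    unsigned short mlBase;
} sketch_seqDef;

/* Callers pass the real matchLength; the subtraction lives in one place.
 * Long lengths that do not fit in 16 bits are out of scope for this sketch. */
static void sketch_storeSeq(sketch_seqDef* seq, size_t litLength,
                            unsigned offBase, size_t matchLength)
{
    assert(matchLength >= SKETCH_MINMATCH);  /* interface contract */
    seq->litLength = (unsigned short)litLength;
    seq->offBase   = offBase;
    seq->mlBase    = (unsigned short)(matchLength - SKETCH_MINMATCH);
}
```

A call such as `sketch_storeSeq(&s, 5, 10, 7)` stores `mlBase == 4`, so no call site needs to remember the offset.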
Yann Collet	a9e43b37d0	Revert "Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries."	2021-12-20 11:43:14 -08:00
Yann Collet	f829c32258	forgot the chainlog is effectively a "fake" value with rowHash the only value which makes sense is `hashlog-1` as it mimics the real memory usage.	2021-12-16 11:37:40 -08:00
Yann Collet	db1b408a2f	rebalance lazy compression levels	2021-12-15 21:33:31 -08:00
Yann Collet	c8d6067615	fixed incorrect rowlog initialization the variable has only very limited usage, being only used once at the beginning of the block for prefetching only, hence the error had no impact on compression ratio.	2021-12-15 14:37:05 -08:00
Yann Collet	eaf786242d	Merge pull request #2929 from facebook/sse_row_lazy simplify SSE implementation of row_lazy match finder	2021-12-15 11:47:15 -08:00
Norbert Lange	2fbb1d10c1	Reduce bit tables to 8bit This saves some 1.7Kb in rodata section (x86_64, zstd tool), while assembler code stays the same except the type of a few load/extend instructions. Should not have negative performance implications.	2021-12-14 23:47:57 +01:00
Yann Collet	05430b25a8	roll SSE implementation of row_lazy match finder mostly for maintenance convenience. Performance-wise, there is very little change: slightly faster for slog 3 & 4, neutral or very slightly negative for slog 5 & 6.	2021-12-14 10:44:23 -08:00
W. Felix Handte	82a49c88f9	Increment Step by 1 not 2 I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate due to incompressible inputs. (The methods I tried slowed things down quite a bit.) Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as opposed to the `01__23__` we were actually doing), it's a bit ambitious to increment `step` by 2. Instead, let's increment it by 1, which has the benefit of sliiightly improving compression. Speed remains pretty much unchanged.	2021-12-13 16:59:33 -05:00
W. Felix Handte	6ca5f42402	Rewrite `step` to Track Increment Between Pairs of Positions The position updates are rewritten from `ip[N] = ip[N-1] + step` to be `ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing of gaps at setup and then we only have to keep a single `step` variable. This seems to work quite well on GCC and Clang!	2021-12-13 14:48:26 -05:00
W. Felix Handte	b8434cb754	Allow Templating `ZSTD_fast` Matchfinders on Acceleration (Lvl < -1)	2021-12-13 14:46:57 -05:00
W. Felix Handte	ace6a7e746	Decompose `step` into Two Variables This avoids an additional addition, at the cost of an additional variable.	2021-12-10 16:44:23 -05:00
W. Felix Handte	22501cd283	Stagger Application of `stepSize` in ZSTD_fast This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That is, it always looks at adjacent pairs of positions, and only applies the acceleration every other position. This produces a more fine-grained acceleration.	2021-12-10 16:44:23 -05:00
Nick Terrell	b94407b6cf	Remove possible NULL pointer addition Refactor `ZSTDMT_isOverlapped()` to do NULL checks before computing the end pointer. Fixes #2906.	2021-12-08 12:40:40 -08:00
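The refactor described in commit b94407b6cf (do the NULL checks before computing any end pointer) can be sketched as below. The `sketch_range_t` type and function name are hypothetical, and this is not the body of `ZSTDMT_isOverlapped()`. It only illustrates the ordering: no `pointer + size` arithmetic happens until both start pointers are known to be non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte range: start may be NULL when the range is unset. */
typedef struct { const void* start; size_t size; } sketch_range_t;

/* Returns 1 if the two ranges share at least one byte, 0 otherwise.
 * Both ranges are assumed to point into the same underlying buffer. */
static int sketch_isOverlapped(sketch_range_t a, sketch_range_t b)
{
    const unsigned char* const aStart = (const unsigned char*)a.start;
    const unsigned char* const bStart = (const unsigned char*)b.start;

    /* Check for NULL first: adding an offset to NULL is undefined behavior. */
    if (aStart == NULL || bStart == NULL) return 0;
    /* Empty ranges cannot overlap anything. */
    if (a.size == 0 || b.size == 0) return 0;

    {   const unsigned char* const aEnd = aStart + a.size;
        const unsigned char* const bEnd = bStart + b.size;
        return aStart < bEnd && bStart < aEnd;
    }
}
```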
Nick Terrell	014bbb29f8	Merge pull request #2898 from terrelln/issue-2862 Improve zstd_opt build speed and size	2021-12-02 19:49:43 -05:00
Yann Collet	1bf3d8a475	Merge pull request #2896 from facebook/m68k Zstandard compiles and runs on m68k cpus	2021-12-02 14:25:45 -08:00
Nick Terrell	e5bfaeede7	Improve zstd_opt build speed and size
Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.
Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into every function, since `dictMode` is no longer used as a template. Create two specializations, for opt levels 0 and 2, and call one of the two specializations.
Lastly, remove the hack that disabled inlining for zstd_opt for the Linux Kernel, as we've gotten most of the benefit already.
Compilation time sees a ~4x reduction:
| Compiler | Flags                            | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc      | -O3                              | 10.1         | 2.3         | -77%  |
| gcc      | -O3 -fsanitize=address,undefined | 61.1         | 10.2        | -83%  |
| clang    | -O3                              | 9.0          | 2.1         | -76%  |
| clang    | -O3 -fsanitize=address,undefined | 33.5         | 5.1         | -84%  |
Build size is reduced by 150KB - 200KB:
| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      | 1327476                | 1177108               | -11%  |
| clang    | 1378324                | 1167780               | -15%  |
There is a <2% speed loss in all cases:
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      | 16    | 4.78             | 4.72            | -1.25% |
| gcc      | 17    | 3.49             | 3.46            | -0.85% |
| gcc      | 18    | 2.92             | 2.86            | -2.04% |
| gcc      | 19    | 2.61             | 2.61            | 0.00%  |
| clang    | 16    | 4.69             | 4.80            | 2.34%  |
| clang    | 17    | 3.53             | 3.49            | -1.13% |
| clang    | 18    | 2.86             | 2.85            | -0.34% |
| clang    | 19    | 2.61             | 2.61            | 0.00%  |
Fixes Issue #2862.	2021-12-02 14:19:41 -08:00
Nick Terrell	01ecd6ffc0	Merge pull request #2892 from terrelln/issue-2785 [CircleCI] Fix short-tests-0	2021-12-02 16:20:56 -05:00
Yann Collet	30b9db8ae4	changed macro name to ZSTD_ALIGNOF for better consistency	2021-12-02 12:57:42 -08:00
Nick Terrell	21e28f5c24	Merge pull request #2891 from supperPants/dev Fix typos	2021-12-02 13:53:33 -05:00
Yann Collet	39dced092e	fix align conditions for huf_compress	2021-12-01 23:02:00 -08:00
Nick Terrell	91f5891dd0	[CircleCI] Fix short-tests-0 short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead. Also fix all the test failures that were exposed. `make all` is failing on CircleCI because it is missing Docker. Move that test to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.	2021-12-01 17:43:46 -08:00
Yann Collet	e89e847820	added alignment test and fixed an incorrect alignment check in cwksp which was failing on m68k	2021-12-01 17:16:36 -08:00
Yann Collet	3f64b31585	Merge branch 'dev' into tomerge2051	2021-12-01 15:29:49 -08:00
Yann Collet	8031dc7a48	Merge pull request #2885 from yoniko/limit-level-32bit-systems Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries.	2021-12-01 14:19:16 -08:00
supperPants	d4713de5a3	Fix typos.	2021-12-01 22:36:21 +08:00
Nick Terrell	5414dd7978	[bmi2] Add lzcnt and bmi target attributes * When dynamic dispatching to bmi2 add lzcnt and bmi to the TARGET_ATTRIBUTE. * Centralize the bmi2 TARGET_ATTRIBUTE definition to BMI2_TARGET_ATTRIBUTE so we can change it in the future. * Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't be any cases where bmi2 is supported but bmi1 isn't. But, since we are using the instruction we should check bmi1 as well.	2021-11-30 17:54:56 -08:00
Yonatan Komornik	ef2cba609d	`ZSTD_maxCLevel` now limited to 21 for 32-bit binaries. CI tests for constrained memory runs with max level on 32-bit binaries.	2021-11-30 10:31:52 -08:00
Felix Handte	c2c6a4ab40	Merge pull request #2869 from felixhandte/oss-fuzz-fix-41005 Determinism: Avoid Mapping Window into Reserved Indices during Reduction	2021-11-18 10:11:48 -05:00
W. Felix Handte	66079085f0	Determinism: Avoid Mapping Window into Reserved Indices during Reduction PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It succeeded in addressing that source of non-determinism, but introduced a new one: it was possible, when index reduction occurred, to map indices in the window to the reserved value, which would cause them to be zeroed, potentially altering parsing of the input. This PR addresses this issue. It makes sure that the bottom of the window is always `>= ZSTD_WINDOW_START_INDEX`. I'm not sure if this makes #2850 redundant. I think it's probably still valuable to have that protection as well. Credit to OSS-Fuzz for discovering this issue.	2021-11-17 18:09:18 -05:00
Yann Collet	a37a8df532	Merge pull request #2856 from rex4539/typos Fix typos	2021-11-17 13:04:30 -08:00
Nick Terrell	b7d899d99d	Merge pull request #2864 from terrelln/linux-opt [linux-kernel] Don't inline function in zstd_opt.c	2021-11-16 14:13:39 -08:00
Nick Terrell	19eb459da3	[linux-kernel] Don't inline function in zstd_opt.c
The optimal parser is unlikely to be used in the linux kernel in practice. There is no reason these functions should be force inlined, since we aren't gaining anything, and are losing build size.
| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11   | 1142090        | 952754        | -189336       |
| clang-12 | 1228402        | 976290        | -252112       |
This is a temporary solution pending the resolution of PR #2862 in the `dev` branch.	2021-11-15 20:37:30 -08:00

2068 Commits