facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Yann Collet	270f9bf005	better consistency in accessing @input as suggested by @terrelln. Also: commented zstreamtest more to ensure ZSTD_stableInBuffer is tested.	2022-01-26 10:31:24 -08:00
Yann Collet	8296be4a0a	pretend consuming input to provide a sense of forward progress	2022-01-26 10:31:24 -08:00
Yann Collet	4b9d1dd9ff	fixed incorrect comment	2022-01-26 10:31:24 -08:00
Yann Collet	27d336b099	minor behavior refinements specifically, there is no obligation to start streaming compression with pos=0. stableSrc mode is now compatible with this setup.	2022-01-26 10:31:24 -08:00
Yann Collet	37b87add7a	make stableSrc compatible with regular streaming API including flushStream(). Now the only condition is for `input.size` to continuously grow.	2022-01-26 10:31:24 -08:00
Yann Collet	c0c5ffa973	streaming compression : lazy parameter adaptation with stable input effectively makes ZSTD_c_stableInput compatible with ZSTD_compressStream() and zstd_e_continue operation mode.	2022-01-26 10:31:24 -08:00
Yann Collet	5684bae4f6	minor refactoring on streaming compression implementation.	2022-01-26 10:31:23 -08:00
Yann Collet	cc7d23bcec	Merge pull request #2965 from facebook/offbase Converge sumtype (offset \| repcode) numeric representation towards offBase	2022-01-24 15:47:42 -08:00
Yann Collet	71921e596f	Merge pull request #2983 from facebook/minLitPricev2 [opt] minor compression ratio improvement	2022-01-20 16:02:31 -08:00
Elliot Gorokhovsky	f936dd89cb	Minor lint fix	2022-01-20 11:54:43 -07:00
Elliot Gorokhovsky	9b6dfedf0c	Documentation and minor refactor to clarify MT memory management.	2022-01-18 09:43:05 -07:00
Yann Collet	ca0135c2fd	new Formulation presumes faster	2022-01-07 14:37:53 -08:00
Yann Collet	9e1b4828e5	enforce a minimum price of 1 bit per literal in the optimal parser	2022-01-07 13:53:48 -08:00
Nick Terrell	4d8a2132d0	[opt] Fix oss-fuzz bug in optimal parser oss-fuzz uncovered a scenario where we're evaluating the cost of litLength = 131072, which can't be represented in the zstd format, so we accessed 1 beyond LL_bits. Fix the issue by making it cost 1 bit more than litLength = 131071. There are still follow ups: 1. This happened because literals_cost[0] = 0, so the optimal parser chose 36 literals over a match. Should we bound literals_cost[literal] > 0, unless the block truly only has one literal value? 2. When no matches are found, the cost model isn't updated. In this case no matches were found for an entire block. So the literals cost model wasn't updated at all. That made the optimal parser think literals_cost[0] = 0, where it is actually quite high, since the block was entirely random noise. Credit to OSS-Fuzz.	2022-01-06 16:10:18 -08:00
Yann Collet	41ad7332dd	Updated expression for better readability	2022-01-04 09:07:11 -08:00
Yann Collet	8c53e526db	fix performance issue in scenario #2966 (part 1) When re-using a compression state, across multiple successive compressions, the state should minimize the amount of allocation and initialization required. This mostly matters in situations where initialization is an overwhelming task compared to compression itself. This can happen when the amount to compress is small, while the compression state was given the impression that it would be much larger, aka, streaming mode without providing a srcSize hint. This lean-initialization optimization was broken in 980f3bbf8354edec0ad32b4430800f330185de6a . This commit fixes it, making this scenario once again on par with v1.4.9. Note that this does not completely fix #2966, since another heavy initialization, specific to row mode, is also happening (and was not present in v1.4.9). This will be fixed in a separate commit.	2021-12-31 15:16:19 -08:00
Yann Collet	03903f5701	fixed minor compression difference in btlazy2 subtle dependency on sumtype numeric representation	2021-12-29 18:51:03 -08:00
Yann Collet	7a18d709ae	updated all names to offBase convention	2021-12-29 17:30:43 -08:00
Yann Collet	f92ec5ea54	change the offset\|repcode sumtype format to match offBase directly at ZSTD_storeSeq() interface. In the process, remove ZSTD_REP_MOVE. This makes it possible, in future commits, to update and effectively simplify the naming scheme to properly label the updated processing pipeline : offset \| repcode => offBase => offCode + offBits	2021-12-29 12:03:36 -08:00
Yann Collet	ad7c9fc11e	use ZSTD_memcpy(), for proper redirection within Linux Kernel	2021-12-28 17:41:47 -08:00
Yann Collet	8da414231d	found a few more places which were dependent on seqStore offcode sumtype numeric representation	2021-12-28 17:03:24 -08:00
Yann Collet	de9f52e945	regroup all mentions of ZSTD_REP_MOVE within zstd_compress_internal.h	2021-12-28 13:47:57 -08:00
Yann Collet	a34ccad9a6	fixed minor conversion warnings	2021-12-28 13:21:22 -08:00
Yann Collet	92a08eec72	abstracted storeSeq() sumtype numeric representation from zstd_lazy.c	2021-12-28 12:23:39 -08:00
Yann Collet	e909fa627f	abstracted storeSeq() sumtype numeric representation from zstd_opt.c	2021-12-28 12:14:33 -08:00
Yann Collet	6fa640ef70	separate newRep() from updateRep() the new contracts seem to make more sense : updateRep() updates an array of repeat offsets _in place_, while newRep() generates a new structure with the updated repeat-offset array. Most callers are actually expecting the in-place variant, and a limited sub-section, in `zstd_opt.c` mainly, prefers `newRep()`.	2021-12-28 11:52:33 -08:00
Yann Collet	321583ccf5	fixed minor typecast warnings	2021-12-28 11:38:21 -08:00
Yann Collet	b7630a474b	abstracted usage of offBase sumtype within zstd_lazy.c	2021-12-28 10:59:47 -08:00
Yann Collet	435f5a2e6d	fixed regression test assert optLdm->offset might be == 0 in invalid case. Only use STORE_OFFSET() after validating it's a correct case.	2021-12-28 09:55:31 -08:00
Yann Collet	2068889146	created STORED_*() macros to act on values stored / expressed in the sumtype numeric representation required by `storedSeq()`. This makes it possible to abstract away this representation by using the macros to extract these values. First user : ZSTD_updateRep() .	2021-12-28 06:59:07 -08:00
Yann Collet	1aed962216	introduce macros STORE_OFFSET() and STORE_REPCODE() this is meant to abstract the sumtype representation required to transfer `offcode` to `ZSTD_storeSeq()`. Unfortunately, the sumtype numeric representation is currently a leaky abstraction that has permeated many other parts of the code, especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`. While this PR does a good job of transferring a large number of call sites to using the new macros, there are still a few sites where this transformation is more complex, or where the numeric representation itself is used "as is". One of the problematic areas is the decision to use the numeric format of the sumtype within the match finders of `zstd_lazy`. This commit doesn't change the behavior, it only introduces and employs the macros, but eventually the resulting code remains identical. At target, if the numeric representation of the sumtype can be completely abstracted and no other part of the code depends on it, it will be possible to move it towards something slightly more efficient.	2021-12-23 22:03:30 -08:00
Yann Collet	aeff128331	change seqDef.offset into seqDef.offBase to better reflect the value stored in this field.	2021-12-23 17:56:08 -08:00
Yann Collet	e145b58cfd	changed seqDef.matchLength into seqDef.mlBase since this is effectively what is stored in this field (== matchLength - MINMATCH). This makes it clearer what needs to be done when reading from / writing to this field.	2021-12-23 13:39:46 -08:00
Yann Collet	b77fcac61f	change ZSTD_storeSeq() interface to accept matchLength instead of mlBase. This removes the need to do `- MINMATCH` at every call site. The new interface contract is checked with an `assert()`.	2021-12-23 12:03:33 -08:00
Yann Collet	a9e43b37d0	Revert "Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries."	2021-12-20 11:43:14 -08:00
Yann Collet	f829c32258	forgot the chainlog is effectively a "fake" value with rowHash the only value which makes sense is `hashlog-1` as it mimics the real memory usage.	2021-12-16 11:37:40 -08:00
Yann Collet	db1b408a2f	rebalance lazy compression levels	2021-12-15 21:33:31 -08:00
Yann Collet	c8d6067615	fixed incorrect rowlog initialization the variable has only very limited usage, being only used once at the beginning of the block for prefetching only, hence the error had no impact on compression ratio.	2021-12-15 14:37:05 -08:00
Yann Collet	eaf786242d	Merge pull request #2929 from facebook/sse_row_lazy simplify SSE implementation of row_lazy match finder	2021-12-15 11:47:15 -08:00
Norbert Lange	2fbb1d10c1	Reduce bit tables to 8bit This saves some 1.7Kb in rodata section (x86_64, zstd tool), while assembler code stays the same except the type of a few load/extend instructions. Should not have negative performance implications.	2021-12-14 23:47:57 +01:00
Yann Collet	05430b25a8	roll SSE implementation of row_lazy match finder mostly for maintenance convenience. Performance wise, there is very little change, slightly faster for slog 3 & 4, neutral or very slightly negative for slot 5 & 6.	2021-12-14 10:44:23 -08:00
W. Felix Handte	82a49c88f9	Increment Step by 1 not 2 I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate due to incompressible inputs. (The methods I tried slowed things down quite a bit.) Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as opposed to the `01__23__` we were actually doing), it's a bit ambitious to increment `step` by 2. Instead, let's increment it by 1, which has the benefit of sliiightly improving compression. Speed remains pretty much unchanged.	2021-12-13 16:59:33 -05:00
W. Felix Handte	6ca5f42402	Rewrite `step` to Track Increment Between Pairs of Positions The position updates are rewritten from `ip[N] = ip[N-1] + step` to be `ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing of gaps at setup and then we only have to keep a single `step` variable. This seems to work quite well on GCC and Clang!	2021-12-13 14:48:26 -05:00
W. Felix Handte	b8434cb754	Allow Templating `ZSTD_fast` Matchfinders on Acceleration (Lvl < -1)	2021-12-13 14:46:57 -05:00
W. Felix Handte	ace6a7e746	Decompose `step` into Two Variables This avoids an additional addition, at the cost of an additional variable.	2021-12-10 16:44:23 -05:00
W. Felix Handte	22501cd283	Stagger Application of `stepSize` in ZSTD_fast This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That is, it always looks at adjacent pairs of positions, and only applies the acceleration every other position. This produces a more fine-grained acceleration.	2021-12-10 16:44:23 -05:00
Nick Terrell	b94407b6cf	Remove possible NULL pointer addition Refactor `ZSTDMT_isOverlapped()` to do NULL checks before computing the end pointer. Fixes #2906.	2021-12-08 12:40:40 -08:00
Nick Terrell	014bbb29f8	Merge pull request #2898 from terrelln/issue-2862 Improve zstd_opt build speed and size	2021-12-02 19:49:43 -05:00
Yann Collet	1bf3d8a475	Merge pull request #2896 from facebook/m68k Zstandard compiles and run on m68k cpus	2021-12-02 14:25:45 -08:00
Nick Terrell	e5bfaeede7	Improve zstd_opt build speed and size Use the same trick as we did for zstd_lazy in PR #2828: * Create one search function specialization for each (dictMode, mls). * Select the search function pointer at the top of the match finder. Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into every function, since `dictMode` is no longer used as a template. Create two specializations, for opt levels 0 and 2, and call one of the two specializations. Lastly, remove the hack that disabled inlining for zstd_opt for the Linux Kernel, as we've gotten most of the benefit already. Compilation time sees a ~4x reduction: \| Compiler \| Flags \| Dev Time (s) \| PR Time (s) \| Delta \| \|----------\|----------------------------------\|--------------\|-------------\|-------\| \| gcc \| -O3 \| 10.1 \| 2.3 \| -77% \| \| gcc \| -O3 -fsanitize=address,undefined \| 61.1 \| 10.2 \| -83% \| \| clang \| -O3 \| 9.0 \| 2.1 \| -76% \| \| clang \| -O3 -fsanitize=address,undefined \| 33.5 \| 5.1 \| -84% \| Build size is reduced by 150KB - 200KB: \| Compiler \| Dev libzstd.a Size (B) \| PR libzstd.a Size (B) \| Delta \| \|----------\|------------------------\|-----------------------\|-------\| \| gcc \| 1327476 \| 1177108 \| -11% \| \| clang \| 1378324 \| 1167780 \| -15% \| There is a <2% speed loss in all cases: \| Compiler \| Level \| Dev Speed (MB/s) \| PR Speed (MB/s) \| Delta \| \|----------\|-------\|------------------\|-----------------\|--------\| \| gcc \| 16 \| 4.78 \| 4.72 \| -1.25% \| \| gcc \| 17 \| 3.49 \| 3.46 \| -0.85% \| \| gcc \| 18 \| 2.92 \| 2.86 \| -2.04% \| \| gcc \| 19 \| 2.61 \| 2.61 \| 0.00% \| \| clang \| 16 \| 4.69 \| 4.80 \| 2.34% \| \| clang \| 17 \| 3.53 \| 3.49 \| -1.13% \| \| clang \| 18 \| 2.86 \| 2.85 \| -0.34% \| \| clang \| 19 \| 2.61 \| 2.61 \| 0.00% \| Fixes Issue #2862.	2021-12-02 14:19:41 -08:00
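
Commits e145b58cfd and b77fcac61f above describe a field that stores `matchLength - MINMATCH` (named `mlBase`) and a store function that accepts the plain `matchLength` and checks its contract with an `assert()`. The following is only a minimal sketch of that idea, not the zstd implementation: every `sketch_*` / `SKETCH_*` name, the struct layout and the 16-bit range check are assumptions made for illustration, and the handling of lengths that do not fit in 16 bits is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum match length. zstd uses 3; the macro name here is hypothetical. */
#define SKETCH_MINMATCH 3

/* Hypothetical stand-in for a stored sequence. */
typedef struct {
    uint32_t offBase;    /* offset | repcode, in its stored numeric form */
    uint16_t litLength;
    uint16_t mlBase;     /* == matchLength - SKETCH_MINMATCH */
} sketch_seqDef;

/* Callers pass the real matchLength; the subtraction happens in one place. */
static void sketch_storeSeq(sketch_seqDef* seq, size_t litLength,
                            uint32_t offBase, size_t matchLength)
{
    /* Interface contract: a match is never shorter than the minimum. */
    assert(matchLength >= SKETCH_MINMATCH);
    /* Sketch-only simplification: both lengths must fit in 16 bits. */
    assert(litLength <= UINT16_MAX);
    assert(matchLength - SKETCH_MINMATCH <= UINT16_MAX);

    seq->offBase   = offBase;
    seq->litLength = (uint16_t)litLength;
    seq->mlBase    = (uint16_t)(matchLength - SKETCH_MINMATCH);
}

/* Readers add the minimum back to recover the real match length. */
static size_t sketch_matchLength(const sketch_seqDef* seq)
{
    return (size_t)seq->mlBase + SKETCH_MINMATCH;
}
```

With this shape, a call site writes `sketch_storeSeq(&seq, litLength, offBase, matchLength)` without a `- MINMATCH` of its own, and the field name makes it explicit that a reader has to add the minimum back.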

2084 Commits