facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Yann Collet	637b2d7a24	fixed bug 44168 discovered by oss-fuzz. It's a bug in the test itself: ZSTD_compressBound() as an upper bound of the compressed size only works for data compressed "normally". But in situations where many flushes are forcefully introduced, this creates many more blocks, each of which has the potential to increase the size by 3 bytes. In extreme cases (lots of small incompressible blocks), the expansion can go beyond ZSTD_compressBound(). The situation is similar when using the CompressSequences() API with Explicit Block Delimiters, in which case each explicit block acts like a deliberate flush. When employed by a fuzzer, it's possible to generate scenarios like the one described above, with tons of small incompressible blocks, thus going beyond ZSTD_compressBound(). Fix: when using Explicit Block Delimiters, use a larger bound to account for this scenario.	2022-01-29 16:36:20 -08:00
Yann Collet	9a68840176	minor refactor to blocksplit, notably simplification of ZSTD_deriveSeqStoreChunk()	2022-01-27 20:24:35 -08:00
Yann Collet	5d70ec0bc4	Merge pull request #3033 from facebook/fix44108 fix issue 44108	2022-01-27 10:57:48 -08:00
Yann Collet	bad7f82300	Merge pull request #2974 from facebook/fix2966_part3 Lazy parameters adaptation (part 1 - ZSTD_c_stableInBuffer)	2022-01-27 06:14:04 -08:00
Yann Collet	8df1257c3c	fix issue 44108, credit to oss-fuzz. In rare circumstances, the block-splitter might cut a block at the exact beginning of a repcode. In that case, since litlength=0, if the repcode expected 1+ literals in front, its meaning changes. This scenario is controlled in ZSTD_seqStore_resolveOffCodes(), and the repcode is transformed into a raw offset when its new meaning is incorrect. In more complex scenarios, the previous block might be emitted as uncompressed after all, thus modifying the expected repcode history. In the case discovered by oss-fuzz, the first block is emitted as uncompressed, so the repcode history remains at default values: 1,4,8. But since the starting repcode is repcode3, and the literal length is == 0, its meaning is: repcode1 - 1. Since repcode1==1, it results in an offset value of 0, which is invalid. So that's what the `assert()` was verifying: the result of the repcode translation should be a valid offset. But actually, it doesn't matter, because this result will then be compared to reality, and since it's an invalid offset, it will necessarily be discarded if incorrect, then the repcode will be replaced by a raw offset. So the `assert()` is not useful. Furthermore, it's incorrect, because it assumes this situation cannot happen, but it does, as described in the scenario above.	2022-01-27 05:49:59 -08:00
Yann Collet	f2d9652ad8	more usage of new error code stabilityCondition_notRespected as suggested by @terrelln	2022-01-26 18:30:55 -08:00
Yann Collet	8b46895588	removed new huffman depth heuristic results are now identical to before this PR	2022-01-26 15:22:06 -08:00
Yann Collet	a66e8bb437	introduced LitHufLog constant, which properly represents the maximum bit size of compressed literals (11) as defined in the specification. To be preferred over HUF_TABLELOG_DEFAULT, which represents the same value but by accident. Name selected to keep the same convention as existing width definitions, MLFSELog, LLFSELog and OffFSELog.	2022-01-26 14:47:24 -08:00
Yann Collet	32a5d95dcb	moved HufLog to lib/decompress it's only used to size decompression tables	2022-01-26 14:47:24 -08:00
Yann Collet	e9dd923fa4	only declare debug functions in debug mode	2022-01-26 14:47:24 -08:00
Yann Collet	5db717af10	proper max limit to 11	2022-01-26 14:47:24 -08:00
Yann Collet	4684836f4f	update regression tests minor compression ratio benefits in some cases, no compression ratio regression in the measured scenarios.	2022-01-26 14:47:24 -08:00
Yann Collet	51da2d2ff2	improved compression of literals in specific corner cases. In rare cases, the default huffman depth selector is a bit too harsh, requiring brutal adaptations to the tree, resulting in some loss of compression ratio. This new heuristic avoids the worst cases, favoring compression ratio. As an example, compression of a specific distribution of 771 literals is now improved to 441 bytes, from 601 bytes before.	2022-01-26 14:47:24 -08:00
Yann Collet	7616e39f3b	adding traces to better track processing of literals	2022-01-26 14:47:21 -08:00
Yann Collet	cbff372d10	added helper function inBuffer_forEndFlush()	2022-01-26 11:05:57 -08:00
Yann Collet	b99ece96b9	converted checks into user validation generating error codes had to create a new error code for this condition, none of the existing ones were fitting enough.	2022-01-26 10:43:50 -08:00
Yann Collet	c1668a00d2	fix extended case combining stableInBuffer with continue() and flush() modes	2022-01-26 10:31:25 -08:00
Yann Collet	270f9bf005	better consistency in accessing @input as suggested by @terrelln. Also: commented zstreamtest more to ensure ZSTD_stableInBuffer is tested.	2022-01-26 10:31:24 -08:00
Yann Collet	8296be4a0a	pretend consuming input to provide a sense of forward progress	2022-01-26 10:31:24 -08:00
Yann Collet	4b9d1dd9ff	fixed incorrect comment	2022-01-26 10:31:24 -08:00
Yann Collet	27d336b099	minor behavior refinements specifically, there is no obligation to start streaming compression with pos=0. stableSrc mode is now compatible with this setup.	2022-01-26 10:31:24 -08:00
Yann Collet	37b87add7a	make stableSrc compatible with regular streaming API including flushStream(). Now the only condition is for `input.size` to continuously grow.	2022-01-26 10:31:24 -08:00
Yann Collet	c0c5ffa973	streaming compression: lazy parameter adaptation with stable input. Effectively makes ZSTD_c_stableInput compatible with ZSTD_compressStream() and the ZSTD_e_continue operation mode.	2022-01-26 10:31:24 -08:00
Yann Collet	5684bae4f6	minor refactoring on streaming compression implementation.	2022-01-26 10:31:23 -08:00
Yann Collet	fc2ea97442	refactored fuzzer tests for sequence compression api add explicit delimiter mode to libfuzzer test	2022-01-26 00:19:35 -08:00
Yann Collet	87dcd3326a	fix sequence compression API in Explicit Delimiter mode	2022-01-25 13:33:41 -08:00
Yann Collet	cc7d23bcec	Merge pull request #2965 from facebook/offbase Converge sumtype (offset | repcode) numeric representation towards offBase	2022-01-24 15:47:42 -08:00
Yann Collet	71921e596f	Merge pull request #2983 from facebook/minLitPricev2 [opt] minor compression ratio improvement	2022-01-20 16:02:31 -08:00
Elliot Gorokhovsky	f936dd89cb	Minor lint fix	2022-01-20 11:54:43 -07:00
Elliot Gorokhovsky	9b6dfedf0c	Documentation and minor refactor to clarify MT memory management.	2022-01-18 09:43:05 -07:00
Yann Collet	ca0135c2fd	new formulation, presumed faster	2022-01-07 14:37:53 -08:00
Yann Collet	9e1b4828e5	enforce a minimum price of 1 bit per literal in the optimal parser	2022-01-07 13:53:48 -08:00
Nick Terrell	4d8a2132d0	[opt] Fix oss-fuzz bug in optimal parser oss-fuzz uncovered a scenario where we're evaluating the cost of litLength = 131072, which can't be represented in the zstd format, so we accessed 1 beyond LL_bits. Fix the issue by making it cost 1 bit more than litLength = 131071. There are still follow ups: 1. This happened because literals_cost[0] = 0, so the optimal parser chose 36 literals over a match. Should we bound literals_cost[literal] > 0, unless the block truly only has one literal value? 2. When no matches are found, the cost model isn't updated. In this case no matches were found for an entire block. So the literals cost model wasn't updated at all. That made the optimal parser think literals_cost[0] = 0, where it is actually quite high, since the block was entirely random noise. Credit to OSS-Fuzz.	2022-01-06 16:10:18 -08:00
Yann Collet	41ad7332dd	Updated expression for better readability	2022-01-04 09:07:11 -08:00
Yann Collet	8c53e526db	fix performance issue in scenario #2966 (part 1) When re-using a compression state, across multiple successive compressions, the state should minimize the amount of allocation and initialization required. This mostly matters in situations where initialization is an overwhelming task compared to compression itself. This can happen when the amount to compress is small, while the compression state was given the impression that it would be much larger, aka, streaming mode without providing a srcSize hint. This lean-initialization optimization was broken in 980f3bbf8354edec0ad32b4430800f330185de6a . This commit fixes it, making this scenario once again on par with v1.4.9. Note that this does not completely fix #2966, since another heavy initialization, specific to row mode, is also happening (and was not present in v1.4.9). This will be fixed in a separate commit.	2021-12-31 15:16:19 -08:00
Yann Collet	03903f5701	fixed minor compression difference in btlazy2 subtle dependency on sumtype numeric representation	2021-12-29 18:51:03 -08:00
Yann Collet	7a18d709ae	updated all names to offBase convention	2021-12-29 17:30:43 -08:00
Yann Collet	f92ec5ea54	change the offset|repcode sumtype format to match offBase directly at ZSTD_storeSeq() interface. In the process, remove ZSTD_REP_MOVE. This makes it possible, in future commits, to update and effectively simplify the naming scheme to properly label the updated processing pipeline : offset | repcode => offBase => offCode + offBits	2021-12-29 12:03:36 -08:00
Yann Collet	ad7c9fc11e	use ZSTD_memcpy(), for proper redirection within Linux Kernel	2021-12-28 17:41:47 -08:00
Yann Collet	8da414231d	found a few more places which were dependent on seqStore offcode sumtype numeric representation	2021-12-28 17:03:24 -08:00
Yann Collet	de9f52e945	regroup all mentions of ZSTD_REP_MOVE within zstd_compress_internal.h	2021-12-28 13:47:57 -08:00
Yann Collet	a34ccad9a6	fixed minor conversion warnings	2021-12-28 13:21:22 -08:00
Yann Collet	92a08eec72	abstracted storeSeq() sumtype numeric representation from zstd_lazy.c	2021-12-28 12:23:39 -08:00
Yann Collet	e909fa627f	abstracted storeSeq() sumtype numeric representation from zstd_opt.c	2021-12-28 12:14:33 -08:00
Yann Collet	6fa640ef70	separate newRep() from updateRep(). The new contracts seem to make more sense: updateRep() updates an array of repeat offsets _in place_, while newRep() generates a new structure with the updated repeat-offset array. Most callers are actually expecting the in-place variant, and a limited sub-section, in `zstd_opt.c` mainly, prefers `newRep()`.	2021-12-28 11:52:33 -08:00
Yann Collet	321583ccf5	fixed minor typecast warnings	2021-12-28 11:38:21 -08:00
Yann Collet	b7630a474b	abstracted usage of offBase sumtype within zstd_lazy.c	2021-12-28 10:59:47 -08:00
Yann Collet	435f5a2e6d	fixed regression test assert optLdm->offset might be == 0 in invalid case. Only use STORE_OFFSET() after validating it's a correct case.	2021-12-28 09:55:31 -08:00
Yann Collet	2068889146	created STORED_*() macros to act on values stored / expressed in the sumtype numeric representation required by `storedSeq()`. This makes it possible to abstract away this representation by using the macros to extract these values. First user : ZSTD_updateRep() .	2021-12-28 06:59:07 -08:00
Yann Collet	1aed962216	introduce macros STORE_OFFSET() and STORE_REPCODE(). This is meant to abstract the sumtype representation required to transfer `offcode` to `ZSTD_storeSeq()`. Unfortunately, the sumtype numeric representation is currently a leaky abstraction that has permeated many other parts of the code, especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`. While this PR does a good job of transferring a large number of call sites to the new macros, there are still a few sites where this transformation is more complex, or where the numeric representation itself is used "as is". One of the problematic areas is the decision to use the numeric format of the sumtype within the match finders of `zstd_lazy`. This commit doesn't change the behavior: it only introduces and employs the macros, so the resulting code remains identical. Ultimately, if the numeric representation of the sumtype can be completely abstracted and no other part of the code depends on it, it will be possible to move it towards something slightly more efficient.	2021-12-23 22:03:30 -08:00

2103 Commits