facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Yann Collet	e9dd923fa4	only declare debug functions in debug mode	2022-01-26 14:47:24 -08:00
Yann Collet	7a18d709ae	updated all names to offBase convention	2021-12-29 17:30:43 -08:00
Yann Collet	f92ec5ea54	change the offset\|repcode sumtype format to match offBase directly at ZSTD_storeSeq() interface. In the process, remove ZSTD_REP_MOVE. This makes it possible, in future commits, to update and effectively simplify the naming scheme to properly label the updated processing pipeline : offset \| repcode => offBase => offCode + offBits	2021-12-29 12:03:36 -08:00
Yann Collet	ad7c9fc11e	use ZSTD_memcpy(), for proper redirection within Linux Kernel	2021-12-28 17:41:47 -08:00
Yann Collet	8da414231d	found a few more places which were dependent on seqStore offcode sumtype numeric representation	2021-12-28 17:03:24 -08:00
Yann Collet	de9f52e945	regroup all mentions of ZSTD_REP_MOVE within zstd_compress_internal.h	2021-12-28 13:47:57 -08:00
Yann Collet	e909fa627f	abstracted storeSeq() sumtype numeric representation from zstd_opt.c	2021-12-28 12:14:33 -08:00
Yann Collet	6fa640ef70	separate newRep() from updateRep() the new contracts seems to make more sense : updateRep() updates an array of repeat offsets _in place_, while newRep() generates a new structure with the updated repeat-offset array. Most callers are actually expecting the in-place variant, and a limited sub-section, in `zstd_opt.c` mainly, prefer `newRep()`.	2021-12-28 11:52:33 -08:00
Yann Collet	435f5a2e6d	fixed regression test assert optLdm->offset might be == 0 in invalid case. Only use STORE_OFFSET() after validating it's a correct case.	2021-12-28 09:55:31 -08:00
Yann Collet	2068889146	created STORED_*() macros to act on values stored / expressed in the sumtype numeric representation required by `storedSeq()`. This makes it possible to abstract away this representation by using the macros to extract these values. First user : ZSTD_updateRep() .	2021-12-28 06:59:07 -08:00
Yann Collet	1aed962216	introduce macros STORE_OFFSET() and STORE_REPCODE() this meant to abstract the sumtype representation required to transfert `offcode` to `ZSTD_storeSeq()`. Unfortunately, the sumtype numeric representation is currently a leaky abstraction that has permeated many other parts of the code, especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`. While this PR makes a good job a transfering a large nb of call sites to using the new macros, there are still a few sites where this transformation is more complex, or where the numeric representation itself it used "as is". One of the problematics area is the decision to use the numeric format of the sumtype within the match finders of `zstd_lazy`. This commit doesn't change the behavior, it only introduces and employes the macros, but eventually the resulting code remains identical. At target, if the numeric representation of the sumtype can be completely abstracted and no other part of the code depends on it, it will be possible to move it towards something slightly more efficient.	2021-12-23 22:03:30 -08:00
Yann Collet	aeff128331	change seqDef.offset into seqDef.offBase to better reflect the value stored in this field.	2021-12-23 17:56:08 -08:00
Yann Collet	e145b58cfd	changed seqDef.matchLength into seqDef.mlBase since this is effectively what is stored in this field (== matchLength - MINMATCH). This makes it clearer what needs to be done when reading from / writing to this field.	2021-12-23 13:39:46 -08:00
Yann Collet	b77fcac61f	change ZSTD_storeSeq() interface to accept matchLength instead of mlBase. This removes the need to do `- MINMATCH` at every call site. The new interface contract is checked with an `assert()`.	2021-12-23 12:03:33 -08:00
Felix Handte	c2c6a4ab40	Merge pull request #2869 from felixhandte/oss-fuzz-fix-41005 Determinism: Avoid Mapping Window into Reserved Indices during Reduction	2021-11-18 10:11:48 -05:00
W. Felix Handte	66079085f0	Determinism: Avoid Mapping Window into Reserved Indices during Reduction PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It succeeded in addressing that source of non-determinism, but introduced a new one: it was possible, when index reduction occurred, to map indices in the window to the reserved value, which would cause them to be zeroed, potentially altering parsing of the input. This PR addresses this issue. It makes sure that the bottom of the window is always `>= ZSTD_WINDOW_START_INDEX`. I'm not sure if this makes #2850 redundant. I think it's probably still valuable to have that protection as well. Credit to OSS-Fuzz for discovering this issue.	2021-11-17 18:09:18 -05:00
Dimitris Apostolou	ebbd675998	Fix typos	2021-11-13 10:04:04 +02:00
Yann Collet	9d62957b31	Merge pull request #2800 from animalize/fix_c89 Fix a C89 error in msvc	2021-10-18 14:32:04 -07:00
Ma Lin	ae986fcdb8	Use __assume(0) for unreachable code path in msvc msvc will optimize away the condition check.	2021-09-27 19:23:57 +08:00
Ma Lin	e5ba858270	Don't initialize the first parameter of _BitScanForward* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64	2021-09-25 16:36:53 +08:00
Ma Lin	95f492ea17	Don't initialize the first parameter of _BitScanReverse* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64	2021-09-25 16:36:53 +08:00
Nick Terrell	14772d97be	Merge pull request #2796 from terrelln/linux-fixes [lib] Make lib compatible with `-Wfall-through` excepting legacy	2021-09-23 16:11:53 -07:00
Nick Terrell	189e87bcbe	[lib] Make lib compatible with `-Wfall-through` excepting legacy Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported compilers this uses an attribute, otherwise it becomes a comment. This is necessary to be compatible with clang's `-Wfall-through`, and gcc's `-Wfall-through=2` which don't support comments. Without this the linux build emits a bunch of warnings. Also add a test to CI to ensure that we don't regress.	2021-09-23 10:51:18 -07:00
Yann Collet	fa2a4d77c7	constify MatchState* parameter when possible turns out, it's possible to constify MatchState* parameter in some parts of the binary tree algorithm, making it a pure read-only parameter, as opposed to a mutable state. This is supposed to be helpful for both maintenance and the compiler.	2021-09-23 08:27:44 -07:00
senhuang42	1d8143c84f	Move block splitter from stack to CCtx	2021-09-23 00:02:31 -04:00
senhuang42	06f42c3bfd	Use new paramSwitch enum for LDM	2021-09-21 14:22:09 -04:00
senhuang42	b5c35d7ea3	Use new paramSwitch enum for LCM, row matchfinder, and block splitter	2021-09-21 14:22:02 -04:00
Nick Terrell	6ee70bae46	Merge pull request #2733 from terrelln/huf-cspeed [HUF] Improve Huffman encoding speed	2021-08-03 12:59:54 -04:00
Nick Terrell	46f2710562	[HUF] Improve Huffman encoding speed Improve Huffman encoding speed by 20% for gcc and 10% for clang. \| Compiler \| Benchmark \| Config \| Dataset \| Ratio \| Speed MB/s (dev) \| Speed MB/s (huf-cspeed) \| Speed MB/s (huf-cspeed - dev) \| \|----------\|-------------------\|---------\|-------------\|-------\|------------------\|-------------------------\|-------------------------------\| \| gcc \| compress \| level_1 \| enwik7 \| 2.43 \| 253.70 \| 258.72 \| 2.0% \| \| gcc \| compress \| level_1 \| silesia \| 2.88 \| 341.90 \| 348.15 \| 1.8% \| \| gcc \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 761.83 \| 912.76 \| 19.8% \| \| gcc \| compress_literals \| level_1 \| silesia \| 1.28 \| 754.83 \| 902.37 \| 19.5% \| \| gcc \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 502.81 \| 552.79 \| 9.9% \| \| gcc \| compress_literals \| level_7 \| silesia \| 1.11 \| 675.97 \| 776.44 \| 14.9% \| \| clang \| compress \| level_1 \| enwik7 \| 2.43 \| 277.54 \| 280.98 \| 1.2% \| \| clang \| compress \| level_1 \| silesia \| 2.88 \| 369.98 \| 375.46 \| 1.5% \| \| clang \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 828.83 \| 918.41 \| 10.8% \| \| clang \| compress_literals \| level_1 \| silesia \| 1.28 \| 815.81 \| 905.41 \| 11.0% \| \| clang \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 533.13 \| 553.30 \| 3.8% \| \| clang \| compress_literals \| level_7 \| silesia \| 1.11 \| 714.52 \| 775.38 \| 8.5% \|	2021-07-27 15:10:35 -07:00
Nick Terrell	ba044bd6f1	[bug-fix] Fix a determinism bug with the DUBT The DUBT can be non-deterministic if an index is equal to `ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the indices at 2. This bug was found by the OSS-Fuzz determinism fuzzer. With this change the fuzzer test passes. And I've confirmed that this is the root cause, not just hiding the problem. Aside: This took me a long time to figure out, because I thought I had tried this first thing. But, apparantly I messed it up, because when I was going through it again with @felixhandte, I was pointing out that it wasn't the case, but it turns out it was. Credit to: OSS-Fuzz	2021-07-15 13:02:49 -07:00
Nick Terrell	c2555f8c6f	[lib] Fix fuzzer timeouts by backing off overflow correction Linearly back off the frequency of overflow correction based on the number of times the `ZSTD_window_t` has been overflow corrected. This will still allow the fuzzer to quickly find overflow correction bugs, while also keeping good speed for larger inputs. Additionally, the `nbOverflowCorrections` variable can be useful for debugging coredumps, since we can inspect the `ZSTD_CCtx` to see if overflow correction has happened yet. I've verified this fixes the timeouts in OSS-Fuzz (176 seconds -> 6 seconds). I've also verified that fuzzers and `fuzzer` and `zstreamtest` still catch the row-hash overflow correction bug.	2021-05-06 22:03:41 -07:00
Nick Terrell	207e33bb61	Merge pull request #2616 from terrelln/deterministic-dict [lib] Add ZSTD_c_deterministicRefPrefix	2021-05-06 11:09:22 -07:00
Nick Terrell	172b4b6ac4	[lib] Add ZSTD_c_deterministicRefPrefix This flag forces zstd to always load the prefix in ext-dict mode, even if it happens to be contiguous, to force determinism. It also applies to dictionaries that are re-processed. A determinism test case is also added, which fails without `ZSTD_c_deterministicRefPrefix` and passes with it set. Question: Should this be the default behavior? It isn't in this PR.	2021-05-05 18:49:56 -07:00
Nick Terrell	c2183d7cdf	[lib] Move some ZSTD_CCtx_params off the stack * Take `params` by const reference in `ZSTD_resetCCtx_internal()`. * Add `simpleApiParams` to the CCtx and use them in the simple API functions, instead of creating those parameters on the stack. I think this is a good direction to move in, because we shouldn't need to worry about adding parameters to `ZSTD_CCtx_params`, since it should always be on the heap (unless they become absoultely gigantic). Some `ZSTD_CCtx_params` are still on the stack in the CDict functions, but I've left them for now, because it was a little more complex, and we don't use those functions in stack-constrained currently.	2021-05-05 13:25:16 -07:00
Nick Terrell	94db4398a0	[lib] Always load the dictionary in one go Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded in multiple segments. Instead, when we detect large dictionaries, ensure that we reset the context's indicies. Then, for dictionaries larger than `ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally, enable DDS for large dictionaries, since we no longer load in multiple segments. This simplifes the dictionary loading code, and reduces opportunities for non-determinism to slip in.	2021-05-04 16:45:25 -07:00
Nick Terrell	34aff7ea06	Bug fix & run overflow correction much more frequently in tests * Fix overflow correction when `windowLog < cycleLog`. Previously, we got the correction wrong in this case, and our chain tables and binary trees would be corrupted. Now, we work as long as `maxDist` is a power of two, by adding `MAX(maxDist, cycleSize)` to our indices. * When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero run overflow correction as frequently as allowed without impacting compression ratio. * Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10% speed penalty at most, which seems reasonable.	2021-05-03 15:21:47 -07:00
Nick Terrell	4694423c4f	Add and integrate lazy row hash strategy	2021-04-07 09:53:34 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
sen	84ccb81e7c	Merge pull request #2561 from senhuang42/longlength_enum Add enum for representing long length ID	2021-03-26 15:55:12 -04:00
Sen Huang	b1a43455f8	Add enum for representing long length ID	2021-03-26 10:41:09 -07:00
Sen Huang	2a907bf4aa	Move lastCountSize into a returned struct, fix MSAN error	2021-03-25 09:11:15 -07:00
Nick Terrell	f8ac0ea7ef	Merge pull request #2539 from terrelln/linux-kernel-fixes Fixes for the next linux kernel patch version	2021-03-24 10:34:29 -07:00
Sen Huang	41c3eae6d9	Fix various fuzzer failures: repcode history, superblocks	2021-03-24 08:21:29 -07:00
senhuang42	0633bf17c3	Change 1.3.4 bugfix to be cross-compatible with superblocks and normal compression	2021-03-24 08:21:29 -07:00
senhuang42	eb1ee8686d	Refactor buildSequencesStatistics() to avoid pointer increment for superblocks	2021-03-24 08:21:29 -07:00
senhuang42	f06f6626ed	Update function names for consistency	2021-03-24 08:20:54 -07:00
senhuang42	c56d6e49e8	Add block splitter to experimental params	2021-03-24 08:20:54 -07:00
senhuang42	c05c090cc2	Centralize entropy statistics calculations to zstd_compress.c	2021-03-24 08:20:29 -07:00
Nick Terrell	cd1551d261	[lib][tracing] Add ZSTD_NO_TRACE macro When defined, it disables tracing, and avoids including the header.	2021-03-16 11:47:27 -07:00
Yann Collet	8884cb887d	Merge pull request #2483 from mpu/ldmgear New algorithms for the long distance matcher	2021-02-11 08:38:23 -08:00

1 2 3 4 5

242 Commits