facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Nick Terrell	6ee70bae46	Merge pull request #2733 from terrelln/huf-cspeed [HUF] Improve Huffman encoding speed	2021-08-03 12:59:54 -04:00
Nick Terrell	46f2710562	[HUF] Improve Huffman encoding speed Improve Huffman encoding speed by 20% for gcc and 10% for clang.	2021-07-27 15:10:35 -07:00

| Compiler | Benchmark | Config | Dataset | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed change (huf-cspeed - dev) |
|----------|-------------------|---------|---------|-------|------------------|-------------------------|---------------------------------|
| gcc | compress | level_1 | enwik7 | 2.43 | 253.70 | 258.72 | 2.0% |
| gcc | compress | level_1 | silesia | 2.88 | 341.90 | 348.15 | 1.8% |
| gcc | compress_literals | level_1 | enwik7 | 1.49 | 761.83 | 912.76 | 19.8% |
| gcc | compress_literals | level_1 | silesia | 1.28 | 754.83 | 902.37 | 19.5% |
| gcc | compress_literals | level_7 | enwik7 | 1.29 | 502.81 | 552.79 | 9.9% |
| gcc | compress_literals | level_7 | silesia | 1.11 | 675.97 | 776.44 | 14.9% |
| clang | compress | level_1 | enwik7 | 2.43 | 277.54 | 280.98 | 1.2% |
| clang | compress | level_1 | silesia | 2.88 | 369.98 | 375.46 | 1.5% |
| clang | compress_literals | level_1 | enwik7 | 1.49 | 828.83 | 918.41 | 10.8% |
| clang | compress_literals | level_1 | silesia | 1.28 | 815.81 | 905.41 | 11.0% |
| clang | compress_literals | level_7 | enwik7 | 1.29 | 533.13 | 553.30 | 3.8% |
| clang | compress_literals | level_7 | silesia | 1.11 | 714.52 | 775.38 | 8.5% |

Nick Terrell	ba044bd6f1	[bug-fix] Fix a determinism bug with the DUBT The DUBT can be non-deterministic if an index is equal to `ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the indices at 2. This bug was found by the OSS-Fuzz determinism fuzzer. With this change the fuzzer test passes. And I've confirmed that this is the root cause, not just hiding the problem. Aside: This took me a long time to figure out, because I thought I had tried this first thing. But, apparently I messed it up, because when I was going through it again with @felixhandte, I was pointing out that it wasn't the case, but it turns out it was. Credit to: OSS-Fuzz	2021-07-15 13:02:49 -07:00
Nick Terrell	c2555f8c6f	[lib] Fix fuzzer timeouts by backing off overflow correction Linearly back off the frequency of overflow correction based on the number of times the `ZSTD_window_t` has been overflow corrected. This will still allow the fuzzer to quickly find overflow correction bugs, while also keeping good speed for larger inputs. Additionally, the `nbOverflowCorrections` variable can be useful for debugging coredumps, since we can inspect the `ZSTD_CCtx` to see if overflow correction has happened yet. I've verified this fixes the timeouts in OSS-Fuzz (176 seconds -> 6 seconds). I've also verified that the fuzzers, `fuzzer`, and `zstreamtest` still catch the row-hash overflow correction bug.	2021-05-06 22:03:41 -07:00
Nick Terrell	207e33bb61	Merge pull request #2616 from terrelln/deterministic-dict [lib] Add ZSTD_c_deterministicRefPrefix	2021-05-06 11:09:22 -07:00
Nick Terrell	172b4b6ac4	[lib] Add ZSTD_c_deterministicRefPrefix This flag forces zstd to always load the prefix in ext-dict mode, even if it happens to be contiguous, to force determinism. It also applies to dictionaries that are re-processed. A determinism test case is also added, which fails without `ZSTD_c_deterministicRefPrefix` and passes with it set. Question: Should this be the default behavior? It isn't in this PR.	2021-05-05 18:49:56 -07:00
Nick Terrell	c2183d7cdf	[lib] Move some ZSTD_CCtx_params off the stack * Take `params` by const reference in `ZSTD_resetCCtx_internal()`. * Add `simpleApiParams` to the CCtx and use them in the simple API functions, instead of creating those parameters on the stack. I think this is a good direction to move in, because we shouldn't need to worry about adding parameters to `ZSTD_CCtx_params`, since it should always be on the heap (unless they become absolutely gigantic). Some `ZSTD_CCtx_params` are still on the stack in the CDict functions, but I've left them for now, because it was a little more complex, and we don't use those functions in stack-constrained environments currently.	2021-05-05 13:25:16 -07:00
Nick Terrell	94db4398a0	[lib] Always load the dictionary in one go Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded in multiple segments. Instead, when we detect large dictionaries, ensure that we reset the context's indices. Then, for dictionaries larger than `ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally, enable DDS for large dictionaries, since we no longer load in multiple segments. This simplifies the dictionary loading code, and reduces opportunities for non-determinism to slip in.	2021-05-04 16:45:25 -07:00
Nick Terrell	34aff7ea06	Bug fix & run overflow correction much more frequently in tests * Fix overflow correction when `windowLog < cycleLog`. Previously, we got the correction wrong in this case, and our chain tables and binary trees would be corrupted. Now, we work as long as `maxDist` is a power of two, by adding `MAX(maxDist, cycleSize)` to our indices. * When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero, run overflow correction as frequently as allowed without impacting compression ratio. * Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10% speed penalty at most, which seems reasonable.	2021-05-03 15:21:47 -07:00
Nick Terrell	4694423c4f	Add and integrate lazy row hash strategy	2021-04-07 09:53:34 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
sen	84ccb81e7c	Merge pull request #2561 from senhuang42/longlength_enum Add enum for representing long length ID	2021-03-26 15:55:12 -04:00
Sen Huang	b1a43455f8	Add enum for representing long length ID	2021-03-26 10:41:09 -07:00
Sen Huang	2a907bf4aa	Move lastCountSize into a returned struct, fix MSAN error	2021-03-25 09:11:15 -07:00
Nick Terrell	f8ac0ea7ef	Merge pull request #2539 from terrelln/linux-kernel-fixes Fixes for the next linux kernel patch version	2021-03-24 10:34:29 -07:00
Sen Huang	41c3eae6d9	Fix various fuzzer failures: repcode history, superblocks	2021-03-24 08:21:29 -07:00
senhuang42	0633bf17c3	Change 1.3.4 bugfix to be cross-compatible with superblocks and normal compression	2021-03-24 08:21:29 -07:00
senhuang42	eb1ee8686d	Refactor buildSequencesStatistics() to avoid pointer increment for superblocks	2021-03-24 08:21:29 -07:00
senhuang42	f06f6626ed	Update function names for consistency	2021-03-24 08:20:54 -07:00
senhuang42	c56d6e49e8	Add block splitter to experimental params	2021-03-24 08:20:54 -07:00
senhuang42	c05c090cc2	Centralize entropy statistics calculations to zstd_compress.c	2021-03-24 08:20:29 -07:00
Nick Terrell	cd1551d261	[lib][tracing] Add ZSTD_NO_TRACE macro When defined, it disables tracing, and avoids including the header.	2021-03-16 11:47:27 -07:00
Yann Collet	8884cb887d	Merge pull request #2483 from mpu/ldmgear New algorithms for the long distance matcher	2021-02-11 08:38:23 -08:00
Quentin Carbonneaux	552efcac2d	relocate large arrays from the stack to ldmState_t	2021-02-10 16:16:54 +01:00
Nick Terrell	e59c9459a5	[trace] Keep track of a uint64_t tracing context The most common information that you want to track between begin() and end() is the timestamp of the begin function, so you can measure the duration of the (de)compression call. Allow the tracing library to put this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep a global map in this case. If a single uint64_t is not enough, the tracing library can return a unique identifier (like the context pointer) instead, and use it as a key in a map. This keeps the simple case simple.	2021-02-09 11:37:05 -08:00
Nick Terrell	54a4998a80	Add basic tracing functionality	2021-02-05 16:28:52 -08:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Yann Collet	6132df8dd3	fix gcc-10 strict aliasing warnings by exposing HUF_CElt declaration.	2020-12-04 16:43:19 -08:00
senhuang42	7742f076b4	Add experimental param for sequence validation	2020-11-20 11:57:41 -05:00
senhuang42	7f563b0519	Add new sequence format as an experimental CCtx param	2020-11-16 10:49:17 -05:00
senhuang42	46824cb018	Add new sequence compress api params to cctx	2020-11-16 10:49:17 -05:00
senhuang42	5fd69f8173	Add documentation for new api functions	2020-11-16 10:49:16 -05:00
senhuang42	e5fe485dcc	Fix cSize calculation for noCompressBlocks	2020-11-16 10:49:16 -05:00
senhuang42	75b01f34b9	Add support for uncompressible blocks	2020-11-16 10:49:16 -05:00
senhuang42	2cff8df1a2	Pull block compression out of main compressSequences() function	2020-11-16 10:49:16 -05:00
senhuang42	89f3848310	Add support for repcodes	2020-11-16 10:49:16 -05:00
Nick Terrell	d4e021fe35	[lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set We don't use it when we have a stable input buffer, so don't allocate it. I had to slightly modify `ZSTD_copyCCtx()` by storing the `ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is no longer the correct signal for the buffered mode.	2020-10-30 10:55:34 -07:00
Nick Terrell	c74be3f6de	[lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set Adds the validation of the input/output buffers only. They are still unused.	2020-10-30 10:55:34 -07:00
Nick Terrell	e3e0775cc8	[API] Add ZSTD_c_stable{In,Out}Buffer parameters This commit adds the parameters and sets the value in the CCtxParams but it does not do anything with the value.	2020-10-30 10:54:39 -07:00
Yann Collet	f5d5cd3b40	Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser Integrate long distance matches into optimal parser	2020-10-13 13:09:07 -07:00
Nick Terrell	d5c688e8ae	Fix ZSTD_adjustCParams_internal() to handle dictionary logic Pass in the `ZSTD_cParamMode_e` to select how we define our cparams. Based on the mode we either take the `dictSize` into account or we set it to `0`. See the documentation for `ZSTD_cParamMode_e`. Some of the modes currently share the same behavior. But they have distinct modes because they are drastically different cases. E.g. compression + reprocessing the dictionary and creating a cdict. Additionally, when downsizing the hashLog and chainLog take the (adjusted) dictionary size into account, since the size of the dictionary gets added onto the window size. Adds a simple test to ensure that we aren't downsizing too far.	2020-10-12 12:50:04 -07:00
Yann Collet	12541931fa	Merge pull request #2328 from marxin/zstd-pool-api Allow external creation of POOLs that can be shared.	2020-10-09 01:00:50 -07:00
senhuang42	b9c8033cde	Define kNullRawSeqStore for every file	2020-10-07 19:02:41 -04:00
senhuang42	a6165c1b28	Change matchState_t::ldmSeqStore to pointer	2020-10-07 14:13:57 -04:00
senhuang42	10647924f1	Make function descriptions more accurate	2020-10-07 13:56:25 -04:00
senhuang42	1a687b3fcb	Improve documentation of relevant structs	2020-10-07 13:56:25 -04:00
senhuang42	a1ef2db5b2	Add ldm_calculateMatchRange() function	2020-10-07 13:56:25 -04:00
senhuang42	ef823e0299	Remove rawSeqStore.base and add rawSeqStore.posInSequence	2020-10-07 13:56:25 -04:00
senhuang42	ea92fb3a68	Cleanups, add comments and explanations	2020-10-07 13:56:25 -04:00
senhuang42	6ccd97fc96	Fixed end of match boundary update issues	2020-10-07 13:56:25 -04:00
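
The overflow-correction fix described in commit 34aff7ea06 above can be pictured with a small arithmetic model. This is only a sketch under stated assumptions, not zstd's actual code: the function and parameter names are hypothetical, and the step that reduces the current index to its position within the cycle is an assumption about how the correction is applied. Only the `MAX(maxDist, cycleSize)` term comes from the commit message.

```python
from typing import Tuple


def overflow_correct(curr: int, window_log: int, cycle_log: int) -> Tuple[int, int]:
    """Return (new_curr, correction) for an index that needs correcting.

    Illustrative model only; names are hypothetical, not zstd internals.
    """
    max_dist = 1 << window_log    # window size, assumed to be a power of two
    cycle_size = 1 << cycle_log   # period of the chain table / binary tree
    cycle_mask = cycle_size - 1

    # Assumption: keep the position within the cycle, then add
    # MAX(maxDist, cycleSize) so the new index is at least one window
    # in size and still has the same residue modulo the cycle.
    new_curr = (curr & cycle_mask) + max(max_dist, cycle_size)
    correction = curr - new_curr  # amount subtracted from stored indices
    return new_curr, correction


if __name__ == "__main__":
    # windowLog < cycleLog, the case the commit message says was wrong before.
    print(overflow_correct(curr=3_000_000_123, window_log=10, cycle_log=17))
```

Because both sizes are powers of two, the larger one is a multiple of the smaller, so adding it leaves the index's position within the cycle unchanged whether `windowLog` is below or above `cycleLog`.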

215 Commits