facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Nick Terrell	8389a5122b	Merge pull request #2602 from terrelln/ldm-opt [LDM] Speed optimization on repetitive data	2021-05-04 23:13:09 -07:00
Nick Terrell	32823bc150	[LDM] Speed optimization on repetitive data LDM does especially poorly on repetitive data when that data's hash happens to have `(hash & stopMask) == 0`. Either because the `stopMask == 0` or random chance. Optimize this case by skipping over repetitive patterns. The detection is very simplistic, but should catch most of the offending cases. ``` head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long 21.187881087 seconds time elapsed head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long 1.149707921 seconds time elapsed ```	2021-05-04 10:57:42 -07:00
Nick Terrell	34aff7ea06	Bug fix & run overflow correction much more frequently in tests * Fix overflow correction when `windowLog < cycleLog`. Previously, we got the correction wrong in this case, and our chain tables and binary trees would be corrupted. Now, we work as long as `maxDist` is a power of two, by adding `MAX(maxDist, cycleSize)` to our indices. * When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero run overflow correction as frequently as allowed without impacting compression ratio. * Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10% speed penalty at most, which seems reasonable.	2021-05-03 15:21:47 -07:00
Nick Terrell	4694423c4f	Add and integrate lazy row hash strategy	2021-04-07 09:53:34 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Quentin Carbonneaux	552efcac2d	relocate large arrays from the stack to ldmState_t	2021-02-10 16:16:54 +01:00
Quentin Carbonneaux	e2ad174d73	fix some compiler warnings	2021-02-08 20:19:16 +01:00
Quentin Carbonneaux	874a590e5c	deal safely with short inputs in ZSTD_ldm_generateSequences The fuzzer CI found this bug.	2021-02-04 11:15:24 +01:00
Quentin Carbonneaux	9f327c02fd	new core ldm algorithm	2021-02-03 22:24:07 +01:00
Quentin Carbonneaux	aee3dc877f	fix a variable name to reflect its nature	2021-01-22 02:24:19 -08:00
Quentin Carbonneaux	d6e3de77dc	fix warning and remove one more occurrence of makeEntryAndInsertByTag	2021-01-20 01:39:16 -08:00
Quentin Carbonneaux	e0d5eca8fa	fix forgotten numTagBits in getTagMask	2021-01-20 00:54:20 -08:00
Quentin Carbonneaux	1e65711ca5	a couple performance improvement changes for ldm	2021-01-20 00:54:20 -08:00
Thomas Waldmann	92a2b5ccc9	fixup: lits means literals	2021-01-07 23:30:42 +01:00
Thomas Waldmann	f9802d80a0	fix typos (work done by Andrea Gelmini)	2021-01-07 18:47:23 +01:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Nick Terrell	0953645837	Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue Fix long distance matcher OSS-fuzz issue	2020-10-27 11:13:03 -07:00
senhuang42	4d01979b62	Expose and call ZSTD_ldm_skipRawSeqStoreBytes()	2020-10-16 20:30:00 -04:00
senhuang42	d0550bb18f	Clarify argument names, fix DEBUGLOG() statements	2020-10-14 15:45:43 -04:00
senhuang42	3f99c9b38d	Adjust match backwards count args	2020-10-14 15:23:03 -04:00
senhuang42	bf0d559449	Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments()	2020-10-14 12:58:06 -04:00
senhuang42	a6165c1b28	Change matchState_t::ldmSeqStore to pointer	2020-10-07 14:13:57 -04:00
senhuang42	abce708a56	Move posInSequence correction to correct location	2020-10-07 13:56:25 -04:00
senhuang42	0fac8e07e1	Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes	2020-10-07 13:56:25 -04:00
senhuang42	a5500cf2af	Refactor separate ldm variables all into one struct	2020-10-07 13:56:25 -04:00
senhuang42	031b7ec15f	Disable LDM minMatch adjustment when using opt parser	2020-10-07 13:56:25 -04:00
senhuang42	b8bfc4e63d	Add cSize regression test to fuzzer.c	2020-10-07 13:56:25 -04:00
senhuang42	10647924f1	Make function descriptions more accurate	2020-10-07 13:56:25 -04:00
senhuang42	7dee62c287	Reset ldmSeqStore after initStats_ultra() pass for btultra2	2020-10-07 13:56:25 -04:00
senhuang42	ea92fb3a68	Cleanups, add comments and explanations	2020-10-07 13:56:25 -04:00
senhuang42	6ccd97fc96	Fixed end of match boundary update issues	2020-10-07 13:56:25 -04:00
senhuang42	28394b64f2	Add proper bounds check on adding ldms	2020-10-07 13:56:25 -04:00
senhuang42	f57c7e6bbf	Add base adjustment correction	2020-10-07 13:56:25 -04:00
senhuang42	84009a076a	Add re-copying of ldmSeqStore after processing	2020-10-07 13:56:25 -04:00
senhuang42	35d9f488f5	Modify codepath to use opt parser exclusively if the compression level is high enough	2020-10-07 13:56:24 -04:00
Nick Terrell	f91ed5c766	[lib] s/current/curr because it collides with Linux Kernel macro	2020-09-09 14:35:39 -07:00
Yann Collet	fdc56baa42	fix 22294 (#2151)	2020-05-18 21:05:10 -07:00
Nick Terrell	b2092c6dc4	[ldm] Reset loadedDictEnd when the context is reset	2020-05-18 12:35:44 -07:00
Nick Terrell	add7ed2d4a	[lib] Fix bug in loading LDM dictionary in MT mode Exposed when loading a dictionary < LDM minMatch bytes in MT mode. Test Plan: ``` CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address" ./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297 ``` TODO: Add an explicit test that loads a small dictionary in MT mode	2020-05-14 11:52:28 -07:00
W. Felix Handte	6028827fee	Rewrite Include Paths to be Relative Addresses #1998.	2020-05-04 15:20:26 -04:00
Bimba Shrestha	5b0a452cac	Adding --long support for --patch-from (#1959) * adding long support for patch-from * adding refPrefix to dictionary_decompress * adding refPrefix to dictionary_loader * conversion nit * triggering log mode on chainLog < fileLog and removing old threshold * adding refPrefix to dictionary_round_trip * adding docs * adding enableldm + forceWindow test for dict * separate patch-from logic into FIO_adjustParamsForPatchFromMode * moving memLimit adjustment to outside ifdefs (need for decomp) * removing refPrefix gate on dictionary_round_trip * rebase on top of dev refPrefix change * making sure refPrefx + ldm is < 1% of srcSize * combining notes for patch-from * moving memlimit logic inside fileio.c * adding display for optimal parser and long mode trigger * conversion nit * fuzzer found heap-overflow fix * another conversion nit * moving FIO_adjustMemLimitForPatchFromMode outside ifndef * making params immutable * moving memLimit update before createDictBuffer call * making maxSrcSize unsigned long long * making dictSize and maxSrcSize params unsigned long long * error on files larger than 4gb * extend refPrefix test to include round trip * conversion to size_t * making sure ldm is at least 10x better * removing break * including zstd_compress_internal and removing redundant macros * exposing ZSTD_cycleLog() * using cycleLog instead of chainLog * add some more docs about user optimizations * formatting	2020-04-17 15:58:53 -05:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
W. Felix Handte	19a0955ec9	Add `ZSTD_cwksp_alloc_size()` to Help Calculate Needed Workspace Size	2019-10-10 13:40:16 -04:00
Nick Terrell	ddab2a94e8	Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy()	2019-09-20 00:56:20 -07:00
Nick Terrell	75cfe1dc69	[ldm] Fix bug in overflow correction with large job size (#1678) * [ldm] Fix bug in overflow correction with large job size * [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode) * [test] Add test that exposes the bug Sadly the test fails on our CI because it uses too much memory, so I had to comment it out.	2019-07-12 18:45:18 -04:00
Josh Soref	a880ca239b	Spelling (#1582) * spelling: accidentally * spelling: across * spelling: additionally * spelling: addresses * spelling: appropriate * spelling: assumed * spelling: available * spelling: builder * spelling: capacity * spelling: compiler * spelling: compressibility * spelling: compressor * spelling: compression * spelling: contract * spelling: convenience * spelling: decompress * spelling: description * spelling: deflate * spelling: deterministically * spelling: dictionary * spelling: display * spelling: eliminate * spelling: preemptively * spelling: exclude * spelling: failure * spelling: independence * spelling: independent * spelling: intentionally * spelling: matching * spelling: maximum * spelling: meaning * spelling: mishandled * spelling: memory * spelling: occasionally * spelling: occurrence * spelling: official * spelling: offsets * spelling: original * spelling: output * spelling: overflow * spelling: overridden * spelling: parameter * spelling: performance * spelling: probability * spelling: receives * spelling: redundant * spelling: recompression * spelling: resources * spelling: sanity * spelling: segment * spelling: series * spelling: specified * spelling: specify * spelling: subtracted * spelling: successful * spelling: return * spelling: translation * spelling: update * spelling: unrelated * spelling: useless * spelling: variables * spelling: variety * spelling: verbatim * spelling: verification * spelling: visited * spelling: warming * spelling: workers * spelling: with	2019-04-12 11:18:11 -07:00
Yann Collet	be9e561da4	changed ZSTD_c_compressionStrategy into ZSTD_c_strategy also : fixed paramgrill, and limit conditions	2018-12-06 15:00:52 -08:00
Yann Collet	41c7d0b1e1	changed hashEveryLog into hashRateLog	2018-11-21 14:36:57 -08:00
Yann Collet	e874dacc08	changed searchLength into minMatch refactored all relevant API and calls for consistency.	2018-11-20 14:56:07 -08:00
Nick Terrell	b9693d3a49	[lib] Add rsyncable mode - Add rsyncable mode to multithreaded mode - Factor out LDM's hash function for reuse	2018-11-14 16:59:57 -08:00

79 Commits