facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Elliot Gorokhovsky	f349d18776	Merge pull request #3127 from embg/repcode_history Correct and clarify repcode offset history logic	2022-05-12 13:50:15 -04:00
Elliot Gorokhovsky	3620a0a565	Nits	2022-05-12 12:53:15 -04:00
W. Felix Handte	1dd046a507	Fix Comments Slightly	2022-05-11 12:38:45 -04:00
W. Felix Handte	cd1f582943	Hoist Hash Table Writes Up into Each Match Found Block Refactoring this way avoids the bad write in the case that `step > 4`, and is a bit more straightforward. It also seems to perform better!	2022-05-11 11:27:34 -04:00
W. Felix Handte	040986a4f4	ZSTD_fast_noDict: Minimize Checks When Writing Hash Table for ip1 This commit avoids checking whether a hashtable write is safe in two of the three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This produces a ~0.5% speed-up in compression. A comment in the code describes why we can skip this check in the other two paths (the repcode check and the first match check in the unrolled loop). A downside is that in the new position where we make this check, we have not yet computed `mLength`. We therefore have to avoid writing possibly dangerous positions, rather than the old check which only avoids writing actually dangerous positions. This leads to a miniscule loss in ratio (remember that this scenario can only been triggered in very negative levels or under incompressibility acceleration).	2022-05-10 14:29:39 -07:00
Elliot Gorokhovsky	22875ece61	Nits	2022-05-09 21:01:38 -04:00
Elliot Gorokhovsky	97aabc496e	Correct and clarify repcode offset history logic	2022-05-09 21:01:38 -04:00
Elliot Gorokhovsky	ac371be27b	Remove hasStep variant (not enough wins to justify the code size increase)	2022-04-28 18:06:24 -04:00
Elliot Gorokhovsky	ce6b69f5c5	Final nit	2022-04-28 14:49:45 -04:00
Elliot Gorokhovsky	6a2e1f7c69	Revert "Hardcode repcode safety check, fix cosmetic nits" This reverts commit 518cb83833074d304dfcaa93cfc16039ea4683c8.	2022-04-27 18:16:21 -04:00
Elliot Gorokhovsky	518cb83833	Hardcode repcode safety check, fix cosmetic nits	2022-04-26 17:54:25 -04:00
Elliot Gorokhovsky	809f652912	Optimize repcode predicate, hardcode hasStep == 0 scenario, cosmetic fixes	2022-04-20 14:40:52 -04:00
Elliot Gorokhovsky	2820efe7ec	Nits	2022-04-19 11:39:52 -04:00
Elliot Gorokhovsky	3536262f70	Port noDict pipeline	2022-04-15 12:16:16 -04:00
Elliot Gorokhovsky	64efba4c5e	Software pipeline for ZSTD_compressBlock_fast_dictMatchState (#3086) * prefetch dict content inside loop * ip0/ip1 pipeline * add L2_4 prefetch to dms pipeline * Remove L1 prefetch * Remove L2 prefetching * Reduce # of gotos * Cosmetic fixes * Check final position sometimes * Track step size as in bc768bc * Fix nits	2022-03-17 12:35:11 -04:00
Yann Collet	7a18d709ae	updated all names to offBase convention	2021-12-29 17:30:43 -08:00
Yann Collet	1aed962216	introduce macros STORE_OFFSET() and STORE_REPCODE() this meant to abstract the sumtype representation required to transfert `offcode` to `ZSTD_storeSeq()`. Unfortunately, the sumtype numeric representation is currently a leaky abstraction that has permeated many other parts of the code, especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`. While this PR makes a good job a transfering a large nb of call sites to using the new macros, there are still a few sites where this transformation is more complex, or where the numeric representation itself it used "as is". One of the problematics area is the decision to use the numeric format of the sumtype within the match finders of `zstd_lazy`. This commit doesn't change the behavior, it only introduces and employes the macros, but eventually the resulting code remains identical. At target, if the numeric representation of the sumtype can be completely abstracted and no other part of the code depends on it, it will be possible to move it towards something slightly more efficient.	2021-12-23 22:03:30 -08:00
Yann Collet	b77fcac61f	change ZSTD_storeSeq() interface to accept matchLength instead of mlBase. This removes the need to do `- MINMATCH` at every call site. The new interface contract is checked with an `assert()`.	2021-12-23 12:03:33 -08:00
W. Felix Handte	82a49c88f9	Increment Step by 1 not 2 I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate due to incompressible inputs. (The methods I tried slowed things down quite a bit.) Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as opposed to the `01__23__` we were actually doing), it's a big ambitious to increment `step` by 2. Instead, let's increment it by 1, which has the benefit sliiightly improving compression. Speed remains pretty much unchanged.	2021-12-13 16:59:33 -05:00
W. Felix Handte	6ca5f42402	Rewrite `step` to Track Increment Between Pairs of Positions The position updates are rewritten from `ip[N] = ip[N-1] + step` to be `ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing of gaps at setup and then we only have to keep a single `step` variable. This seems to work quite well on GCC and Clang!	2021-12-13 14:48:26 -05:00
W. Felix Handte	b8434cb754	Allow Templating `ZSTD_fast` Matchfinders on Acceleration (Lvl < -1)	2021-12-13 14:46:57 -05:00
W. Felix Handte	ace6a7e746	Decompose `step` into Two Variables This avoids an additional addition, at the cost of an additional variable.	2021-12-10 16:44:23 -05:00
W. Felix Handte	22501cd283	Stagger Application of `stepSize` in ZSTD_fast This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That is, it always looks at adjacent pairs of positions, and only applies the acceleration every other position. This produces a more fine-grained acceleration.	2021-12-10 16:44:23 -05:00
Nick Terrell	802ea885ef	Reduce function size in fast & dfast Take the same approach as in PR #2828 [0] to remove functions that force inline many function bodies and `switch`. Instead, create one function per "template" combination, and then switch between these functions. This allows the compiler to break the large function into many small functions, which generally helps codegen. Also, in the `extDict` modes when there is no ext-dict, call the top level function instead of the force inlined one, to save on code size. I'm specifically doing this because gcc on the parisc architecture doesn't handle the large function body well, and ends up using a lot of excess stack space. Outlining these functions fixes it.	2021-11-15 19:05:48 -08:00
W. Felix Handte	d6fd7761c9	Fix VS Build: Explicitly Cast to Narrow Ints	2021-09-01 14:15:04 -04:00
W. Felix Handte	15e67bfa7e	Deduplicate Implementations This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new `ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is functionally a no-op.	2021-09-01 14:15:04 -04:00
W. Felix Handte	64054dec44	Tweak Step	2021-09-01 14:15:04 -04:00
W. Felix Handte	24fcccd05c	Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed) Unrolling the loop to handle 2 positions in each iteration allows us to reduce the frequency of some operations that don't need to happen at every position. One such operation is the step calculation, which is a very rough heuristic anyways. It's fine if we do this a position later. The other operation is the repcode check. But since the repcode check already tries expanding back one position, we're really not missing much of importance by only trying it every other position. This commit also slightly reorders some operations.	2021-09-01 14:15:04 -04:00
W. Felix Handte	57a100f6dc	Add `ip1 + 128` Prefetch; Tiny Cleanup	2021-09-01 14:15:04 -04:00
W. Felix Handte	991d660ea9	Nit: Only Store 2 Hash Variables	2021-09-01 14:15:04 -04:00
W. Felix Handte	8706bc115a	Nit: Dedup idx0 and idx1	2021-09-01 14:15:04 -04:00
W. Felix Handte	7c24c3e6ce	Give Up on Searching End of Block Amusingly, it seems to be a non-trivial performance hit to add in final searches or even hash table insertions during cleanup. So let's not. It seems to not make any meaningful difference in compression ratio.	2021-09-01 14:15:03 -04:00
W. Felix Handte	35932ab2f1	Prefetch Input in Incompressible Sections (+0.25% Speed)	2021-09-01 14:15:03 -04:00
W. Felix Handte	b092dd75b7	Shrink Pipeline from 4 Positions to 3	2021-09-01 14:15:03 -04:00
W. Felix Handte	387840af79	Re-Order Operations for Slightly Better Performance	2021-09-01 14:15:03 -04:00
W. Felix Handte	bc768bccc0	Track Step Size Statefully, Rather than Recalculating Every Time	2021-09-01 14:15:03 -04:00
W. Felix Handte	80bc12b33a	Initial Pipelined Implementation for ZSTD_fast	2021-09-01 14:15:03 -04:00
Nick Terrell	10b35b312b	[lib] Fix off-by-one error in repcode checks The repcode checks disallowed repcodes that are equal to `windowLow`. This is slightly inefficient, but isn't a problem on its own. Together with the next commit, it cause non-determinism.	2021-05-13 17:05:59 -07:00
Sen Huang	e6c8a5dd40	Fix incorrect usages of repIndex across all strategies	2021-05-04 19:50:55 -04:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Thomas Waldmann	f9802d80a0	fix typos (work done by Andrea Gelmini)	2021-01-07 18:47:23 +01:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Nick Terrell	f91ed5c766	[lib] s/current/curr because it collides with Linux Kernel macro	2020-09-09 14:35:39 -07:00
Yann Collet	fdc56baa42	fix 22294 (#2151)	2020-05-18 21:05:10 -07:00
Nick Terrell	4e0515916d	[lib] Fix repcode validation in no dict mode	2020-05-12 11:57:15 -07:00
Yann Collet	54144285fd	small speed improvement for strategy fast gcc 9.3.0 : kennedy : 459 -> 466 silesia : 360 -> 365 enwik8 : 267 -> 269 clang 10.0.0 : kennedy : 436 -> 441 silesia : 364 -> 366 enwik8 : 271 -> 272	2020-05-07 06:15:58 -07:00
Nick Terrell	5fcbc484c8	Merge pull request #2040 from caoyzh/dev-2 Optimize by prefetching on aarch64	2020-04-08 13:14:47 -07:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
caoyzh	7201980650	Optimize by prefetching on aarch64	2020-03-14 15:25:59 +08:00
Bimba Shrestha	43fc88f443	Adding comment and remvoing ivdep	2020-03-10 14:57:27 -05:00

99 Commits