facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
W. Felix Handte	15e67bfa7e	Deduplicate Implementations This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new `ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is functionally a no-op.	2021-09-01 14:15:04 -04:00
W. Felix Handte	64054dec44	Tweak Step	2021-09-01 14:15:04 -04:00
W. Felix Handte	24fcccd05c	Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed) Unrolling the loop to handle 2 positions in each iteration allows us to reduce the frequency of some operations that don't need to happen at every position. One such operation is the step calculation, which is a very rough heuristic anyways. It's fine if we do this a position later. The other operation is the repcode check. But since the repcode check already tries expanding back one position, we're really not missing much of importance by only trying it every other position. This commit also slightly reorders some operations.	2021-09-01 14:15:04 -04:00
W. Felix Handte	57a100f6dc	Add `ip1 + 128` Prefetch; Tiny Cleanup	2021-09-01 14:15:04 -04:00
W. Felix Handte	991d660ea9	Nit: Only Store 2 Hash Variables	2021-09-01 14:15:04 -04:00
W. Felix Handte	8706bc115a	Nit: Dedup idx0 and idx1	2021-09-01 14:15:04 -04:00
W. Felix Handte	7c24c3e6ce	Give Up on Searching End of Block Amusingly, it seems to be a non-trivial performance hit to add in final searches or even hash table insertions during cleanup. So let's not. It seems to not make any meaningful difference in compression ratio.	2021-09-01 14:15:03 -04:00
W. Felix Handte	35932ab2f1	Prefetch Input in Incompressible Sections (+0.25% Speed)	2021-09-01 14:15:03 -04:00
W. Felix Handte	b092dd75b7	Shrink Pipeline from 4 Positions to 3	2021-09-01 14:15:03 -04:00
W. Felix Handte	387840af79	Re-Order Operations for Slightly Better Performance	2021-09-01 14:15:03 -04:00
W. Felix Handte	bc768bccc0	Track Step Size Statefully, Rather than Recalculating Every Time	2021-09-01 14:15:03 -04:00
W. Felix Handte	80bc12b33a	Initial Pipelined Implementation for ZSTD_fast	2021-09-01 14:15:03 -04:00
Nick Terrell	10b35b312b	[lib] Fix off-by-one error in repcode checks The repcode checks disallowed repcodes that are equal to `windowLow`. This is slightly inefficient, but isn't a problem on its own. Together with the next commit, it causes non-determinism.	2021-05-13 17:05:59 -07:00
Sen Huang	e6c8a5dd40	Fix incorrect usages of repIndex across all strategies	2021-05-04 19:50:55 -04:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Thomas Waldmann	f9802d80a0	fix typos (work done by Andrea Gelmini)	2021-01-07 18:47:23 +01:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Nick Terrell	f91ed5c766	[lib] s/current/curr because it collides with Linux Kernel macro	2020-09-09 14:35:39 -07:00
Yann Collet	fdc56baa42	fix 22294 (#2151)	2020-05-18 21:05:10 -07:00
Nick Terrell	4e0515916d	[lib] Fix repcode validation in no dict mode	2020-05-12 11:57:15 -07:00
Yann Collet	54144285fd	small speed improvement for strategy fast. gcc 9.3.0: kennedy 459 -> 466, silesia 360 -> 365, enwik8 267 -> 269; clang 10.0.0: kennedy 436 -> 441, silesia 364 -> 366, enwik8 271 -> 272	2020-05-07 06:15:58 -07:00
Nick Terrell	5fcbc484c8	Merge pull request #2040 from caoyzh/dev-2 Optimize by prefetching on aarch64	2020-04-08 13:14:47 -07:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
caoyzh	7201980650	Optimize by prefetching on aarch64	2020-03-14 15:25:59 +08:00
Bimba Shrestha	43fc88f443	Adding comment and removing ivdep	2020-03-10 14:57:27 -05:00
Bimba Shrestha	4c72a1a9c2	adding vector to main loop	2020-03-05 09:55:38 -08:00
Nick Terrell	ddab2a94e8	Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy()	2019-09-20 00:56:20 -07:00
Yann Collet	243200e5bf	minor refactor of ZSTD_fast - reduced variables lifetime - more accurate code comments	2019-09-17 14:02:57 -07:00
Yann Collet	facbe8b2c2	factored the logic selecting lowest match index as suggested by @terrelln	2019-08-05 15:18:43 +02:00
Yann Collet	98692c2838	fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3	2019-08-01 15:58:17 +02:00
Yann Collet	a30febaeeb	Made fast strategy compatible with new offset validation strategy fast mode does the same thing as before : it pre-emptively invalidates any index that could lead to offset > maxDistance. It's supposed to help speed. But this logic is performed inside zstd_fast, so that other strategies can select a different behavior.	2019-05-31 16:34:55 -07:00
Nick Terrell	95624b77e4	[libzstd] Speed up single segment zstd_fast by 5%
This PR is based on top of PR #1563. The optimization is to process two input pointers per loop. It is based on ideas from [igzip] level 1, and talking to @gbtucker.

| Platform | Silesia | Enwik8 |
|-------------------------|-------------|--------|
| OSX clang-10 | +5.3% | +5.4% |
| i9 5 GHz gcc-8 | +6.6% | +6.6% |
| i9 5 GHz clang-7 | +8.0% | +8.0% |
| Skylake 2.4 GHz gcc-4.8 | +6.3% | +7.9% |
| Skylake 2.4 GHz clang-7 | +6.2% | +7.5% |

Testing on all Silesia files on my Intel i9-9900k with gcc-8:

| Silesia File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| silesia.tar | +0.17% | +6.6% |
| dickens | +0.25% | +7.0% |
| mozilla | +0.02% | +6.8% |
| mr | -0.30% | +10.9% |
| nci | +1.28% | +4.5% |
| ooffice | -0.35% | +10.7% |
| osdb | +0.75% | +9.8% |
| reymont | +0.65% | +4.6% |
| samba | +0.70% | +5.9% |
| sao | -0.01% | +14.0% |
| webster | +0.30% | +5.5% |
| xml | +0.92% | +5.3% |
| x-ray | -0.00% | +1.4% |

Same tests on Calgary. For brevity, I've only included files where compression ratio regressed or was much better.

| Calgary File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| calgary.tar | +0.30% | +7.1% |
| geo | -0.14% | +25.0% |
| obj1 | -0.46% | +15.2% |
| obj2 | -0.18% | +6.0% |
| pic | +1.80% | +9.3% |
| trans | -0.35% | +5.5% |

We gain 0.1% of compression ratio on Silesia. We gain 0.3% of compression ratio on enwik8. I also tested on the GitHub and hg-commands datasets without a dictionary, and we gain a small amount of compression ratio on each, as well as speed.

I tested the negative compression levels on Silesia on my Intel i9-9900k with gcc-8:

| Level | Ratio Change | Speed Change |
|-------|--------------|--------------|
| -1 | +0.13% | +6.4% |
| -2 | +4.6% | -1.5% |
| -3 | +7.5% | -4.8% |
| -4 | +8.5% | -6.9% |
| -5 | +9.1% | -9.1% |

Roughly, the negative levels now scale half as quickly. E.g. the new level 16 is roughly equivalent to the old level 8, but a bit quicker and smaller. If you don't think this is the right trade off, we can change it to multiply the step size by 2, instead of adding 1. I think this makes sense, because it gives a bit slower ratio decay.

[igzip]: https://github.com/01org/isa-l/tree/master/igzip	2019-04-02 19:02:50 -07:00
Nick Terrell	f00407b640	Split out zstd_fast dict match state function	2019-03-29 10:39:16 -06:00
Yann Collet	ed2fb6bd57	fixed : better error message when dictionary missing during benchmark. Also : refactored ZSTD_fillHashTable(), just for readability (it does the same thing)	2018-12-20 17:20:07 -08:00
Yann Collet	e874dacc08	changed searchLength into minMatch refactored all relevant API and calls for consistency.	2018-11-20 14:56:07 -08:00
W. Felix Handte	bad74c4781	Use Working Ctx Logs when not in DMS Mode We pre-hash the ptr for the dict match state sometimes. When that actually happens, a hashlog of 0 can produce undefined behavior (right shift a long long by 64). Only applies to unoptimized compilations, since when optimizations are applied, those hash operations are dropped when we're not actually in dms mode.	2018-09-28 17:12:54 -07:00
W. Felix Handte	fe96e98f81	Support a Separate Hash Log in ZSTD_fast	2018-09-28 17:12:54 -07:00
W. Felix Handte	bc880ebe8f	Stop Passing in `hashLog` and `stepSize` to `ZSTD_compressBlock_fast_generic`	2018-09-28 17:12:54 -07:00
W. Felix Handte	dcdf437fed	Also Remove CParams from Table Filling Functions' Args	2018-09-28 17:10:42 -07:00
W. Felix Handte	6cb2454646	Remove CParams from Block Compressor Functions' Args	2018-09-28 17:10:42 -07:00
Yann Collet	f98c69d77c	fix : huge (>4GB) stream of blocks experimental function ZSTD_compressBlock() is designed with very small data in mind, for situations where saving the ~12 bytes of frame header can actually make a difference. Some systems though may have to deal with small and large data entangled. If the data is larger than a block (> 128KB), compressBlock() cannot compress it in one round. That's why it's possible to compress in multiple rounds. This is a chain of compressed blocks. Some users push this capability to the limit, encoding gigantic chains of blocks. On crossing the 4GB limit, some internal overflow occurs. This fix moves the overflow correction mechanism higher in the call chain, so that it's applied also to gigantic chains of blocks. Added a test case in fuzzer.c, which crashed before the fix and passes now.	2018-09-26 14:24:28 -07:00
W. Felix Handte	b048af5999	ZSTD_fast: Don't Search Dict Context When Mismatch Was Found	2018-09-14 15:23:35 -07:00
Yann Collet	c2c47e24e0	support targetlen==0 with strategy==ZSTD_fast to mean "normal compression", targetlen >= 1 now means "disable huffman compression of literals"	2018-06-07 15:49:01 -07:00
Yann Collet	357c648c3f	changed a few variable names to unify naming convention	2018-06-04 17:10:50 -07:00
Yann Collet	2108decb41	Fixed a nasty corruption bug recently introduced into the new dictionary mode. The bug could be reproduced with this command : ./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639 error was in function ZSTD_count_2segments() : the beginning of the 2nd segment corresponds to prefixStart and not the beginning of the current block (istart == src). This would result in comparing the wrong byte.	2018-06-01 18:54:34 -07:00
W. Felix Handte	d9c7e67125	Assert that Dict and Current Window are Adjacent in Index Space	2018-05-23 17:53:03 -04:00
W. Felix Handte	298d24fa57	Make loadedDictEnd an Index, not the Dict Len	2018-05-23 17:53:03 -04:00
W. Felix Handte	7ef85e0618	Fixes in re Comments	2018-05-23 17:53:03 -04:00
W. Felix Handte	9c92223468	Avoid Undefined Behavior in Match Ptr Calculation	2018-05-23 17:53:03 -04:00
W. Felix Handte	95bdf20a87	Moar Renames	2018-05-23 17:53:03 -04:00


74 Commits
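Commit bc768bccc0 ("Track Step Size Statefully, Rather than Recalculating Every Time") keeps the search step in a variable instead of recomputing it at every probe. The following is a minimal sketch of that idea. It assumes a scan over a region that never yields a match. The step starts at `stepSize` and grows by one each time the position reaches a checkpoint spaced `kStepIncr` bytes apart. The function name and both parameters are hypothetical, and this is not zstd's actual loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not zstd's code): count how many positions a
 * match finder would probe while skipping through `len` bytes that never
 * produce a match. The step lives in a variable and is bumped by one
 * whenever the position reaches the next checkpoint, instead of being
 * recomputed from the distance to the last match on every probe. */
static size_t count_probes_stateful(size_t len, size_t stepSize, size_t kStepIncr)
{
    assert(stepSize >= 1);          /* a zero step could never advance */
    size_t pos = 0;
    size_t step = stepSize;         /* current distance between probes */
    size_t nextStep = kStepIncr;    /* position at which the step grows */
    size_t probes = 0;

    while (pos < len) {
        probes++;                   /* a hash lookup would happen here */
        if (pos >= nextStep) {      /* passed a checkpoint: skip faster */
            step++;
            nextStep += kStepIncr;
        }
        pos += step;
    }
    return probes;
}
```

For example, `count_probes_stateful(10, 1, 4)` probes positions 0, 1, 2, 3, 4, 6 and 8, so it returns 7. The state update is one comparison and an occasional increment. That is cheaper than a shift and add on every iteration, which matches the motivation given in the commit title.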
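Commit bad74c4781 ("Use Working Ctx Logs when not in DMS Mode") notes that a hashLog of 0 can make a 64-bit value be right-shifted by 64. In C that is undefined behavior. The following sketch shows a multiplicative hash of that general shape and the fix's idea: when no dictionary match state is in use, take the hash log from the working context. The multiplier, `hash8()` and `pick_hash_log()` are illustrative assumptions, not zstd's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative odd 64-bit multiplier; treat the exact value as an assumption. */
#define HYPOTHETICAL_PRIME8 0xCF1BBCDCB7A56463ULL

/* Multiplicative hash keeping the top `hashLog` bits of the product.
 * A shift count of 64 (hashLog == 0) is undefined behavior for a
 * 64-bit operand, so it must never reach the shift. */
static size_t hash8(uint64_t u, unsigned hashLog)
{
    assert(hashLog >= 1 && hashLog <= 63);
    return (size_t)((u * HYPOTHETICAL_PRIME8) >> (64 - hashLog));
}

/* Hypothetical helper: use the dictionary's hash log only in dictionary
 * match state mode. Otherwise fall back to the working context's log,
 * which is never 0. */
static unsigned pick_hash_log(int dictMatchStateMode, unsigned dictHashLog,
                              unsigned ctxHashLog)
{
    return dictMatchStateMode ? dictHashLog : ctxHashLog;
}
```

The commit message adds that optimized builds drop the unused hash computation outside dictionary match state mode. The undefined shift could therefore only surface in unoptimized builds, which is why the guard matters even though the value is discarded.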
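Commit f98c69d77c moves the overflow correction higher in the call chain, so that long chains of `ZSTD_compressBlock()` calls also receive it. The sketch below illustrates the general idea under these assumptions:

- Positions are stored as 32-bit indices.
- Once the running index passes a limit below 2^32, every stored index is shifted down by the same amount, so distances inside the window are preserved.
- Index 0 means "empty" here.

`reduce_table()`, `correct_overflow_if_needed()`, `limit` and `windowSize` are hypothetical names and do not reproduce zstd's actual correction routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift every stored index down by `reducer`. Entries that would land at
 * or below 0 are cleared to 0, which this sketch treats as "empty". */
static void reduce_table(uint32_t* table, size_t size, uint32_t reducer)
{
    for (size_t i = 0; i < size; i++)
        table[i] = (table[i] <= reducer) ? 0 : table[i] - reducer;
}

/* If the current index has passed `limit`, rebase the index space so the
 * current position becomes `windowSize`. Returns the correction applied,
 * or 0 if none was needed. Distances within the window are unchanged. */
static uint32_t correct_overflow_if_needed(uint32_t* table, size_t size,
                                           uint32_t* currIndex, uint32_t limit,
                                           uint32_t windowSize)
{
    if (*currIndex <= limit) return 0;
    assert(*currIndex > windowSize);
    uint32_t const correction = *currIndex - windowSize;
    reduce_table(table, size, correction);
    *currIndex -= correction;
    return correction;
}
```

With a current index of 4000000100 and a table entry of 4000000000, a window of 1000 gives a correction of 3999999100. The entry becomes 900 and the current index becomes 1000, so the distance of 100 is preserved. The commit's point is where this check runs: it must run for every block, not only at the frame level, or a long chain of blocks can cross 4GB unchecked.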
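Commit 2108decb41 traces a corruption bug to `ZSTD_count_2segments()`, where the second segment must begin at `prefixStart` rather than at the start of the current block. The following is a hedged sketch of two-segment match counting. It assumes a match that starts in an external dictionary segment ending at `mEnd` and may continue into the prefix. The function names are hypothetical, and the sketch compares byte by byte rather than word by word.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many leading bytes of [ip, iEnd) equal the bytes at match. */
static size_t count_equal(const unsigned char* ip, const unsigned char* match,
                          const unsigned char* iEnd)
{
    const unsigned char* const start = ip;
    while (ip < iEnd && *ip == *match) {
        ip++;
        match++;
    }
    return (size_t)(ip - start);
}

/* A match beginning in the dictionary segment [match, mEnd) may run past
 * mEnd. If it does, comparison resumes at prefixStart, the first byte of
 * the prefix segment (not the start of the current block). */
static size_t count_two_segments(const unsigned char* ip, const unsigned char* match,
                                 const unsigned char* iEnd, const unsigned char* mEnd,
                                 const unsigned char* prefixStart)
{
    assert(ip <= iEnd && match <= mEnd);
    size_t const segLen = (size_t)(mEnd - match);
    size_t const inLen = (size_t)(iEnd - ip);
    const unsigned char* const vEnd = ip + (segLen < inLen ? segLen : inLen);
    size_t const n = count_equal(ip, match, vEnd);
    if (match + n != mEnd) return n;   /* mismatch inside the first segment */
    return n + count_equal(ip + n, prefixStart, iEnd);
}
```

For example, take a dictionary `"xxabc"` and a prefix `"defabcdefz"`. The call `count_two_segments(src + 3, dict + 2, src + 10, dict + 5, src)` matches `"abc"` in the dictionary and then `"def"` from `prefixStart`, returning 6. Passing `src + 3` (the block start in this example) as the second segment's start would compare `"defz"` against `"abcd"` and stop at 3. This is the kind of wrong-byte comparison the commit describes.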