This assert slows the loop down by 10x. We can get similar
coverage by asserting at the beginning & end of the loop.
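As a rough illustration of the idea (a minimal sketch with hypothetical names, not the actual zstd code), the per-iteration assert becomes invariant checks at the loop boundaries:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: check the invariant once before and once after the
 * hot loop instead of on every iteration. */
static size_t sum_bytes(const unsigned char* ip, const unsigned char* iend,
                        const unsigned char* bound)
{
    size_t total = 0;
    assert(iend <= bound);   /* checked once at the beginning of the loop */
    while (ip < iend) {
        /* previously: assert(ip < bound); paid on every iteration */
        total += *ip++;
    }
    assert(ip <= bound);     /* and once at the end */
    return total;
}
```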
We need this fix because Debian compiles zstd with asserts
enabled. Separately, we should ask them why, and whether they would
consider disabling asserts in their builds, since we don't
optimize for assert-enabled builds.
Fixes Issue #3150.
ZSTD_seqSymbol is a structure that is 64 bits wide in total,
so on aarch64 it can be loaded in a single operation and its
fields extracted by simple shifts or bitfield extracts.
GCC doesn't recognize this and instead generates unnecessary
ldr/ldrb/ldrh operations, which causes a performance drop.
With this change, a 2~4% uplift is observed on silesia and
2.5~6% on cantrbry @L8 on Arm N1.
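A minimal sketch of the intended codegen pattern, assuming the ZSTD_seqSymbol field layout (U16 nextState; BYTE nbAdditionalBits; BYTE nbBits; U32 baseValue) and a little-endian aarch64 target; names are illustrative:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t nextState;
    uint8_t  nbAdditionalBits;
    uint8_t  nbBits;
    uint32_t baseValue;
} seqSymbol;   /* stand-in for ZSTD_seqSymbol; 64 bits in total */

/* One 64-bit load (a single ldr on aarch64)... */
static inline uint64_t seqSymbol_load64(const seqSymbol* s)
{
    uint64_t v;
    memcpy(&v, s, sizeof(v));
    return v;
}

/* ...then extract each field with shifts/masks (little-endian layout). */
static inline uint32_t sym_nextState(uint64_t v) { return (uint32_t)(v & 0xFFFF); }
static inline uint32_t sym_nbAddBits(uint64_t v) { return (uint32_t)((v >> 16) & 0xFF); }
static inline uint32_t sym_nbBits(uint64_t v)    { return (uint32_t)((v >> 24) & 0xFF); }
static inline uint32_t sym_baseValue(uint64_t v) { return (uint32_t)(v >> 32); }
```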
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8
On aarch64, ZSTD_wildcopy uses a simple loop to perform
16B-based memory copies. An existing optimized two-stage copy
can achieve better performance.
Applying it to aarch64 also yields a ~1% uplift on the
silesia corpus.
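A minimal sketch of the two-stage idea (illustrative names and constants; it assumes, as ZSTD_wildcopy does, that the destination has enough slack past `length` to tolerate over-copying):

```c
#include <stddef.h>
#include <string.h>

static void wildcopy_sketch(void* dst, const void* src, size_t length)
{
    unsigned char*       op   = (unsigned char*)dst;
    const unsigned char* ip   = (const unsigned char*)src;
    unsigned char* const oend = op + length;

    /* Stage 1: one unconditional 16B copy covers the common short case. */
    memcpy(op, ip, 16);
    if (length <= 16) return;
    op += 16; ip += 16;

    /* Stage 2: wide strided copy; over-copying past `length` is allowed. */
    do {
        memcpy(op,      ip,      16);
        memcpy(op + 16, ip + 16, 16);
        op += 32; ip += 32;
    } while (op < oend);
}
```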
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ic1253308e7a8a7df2d08963ba544e086c81ce8be
We found that movemask is either not used properly or consumes too much CPU.
This effort optimizes the movemask emulation on ARM.
For levels 8-9 we saw 3-5% improvements; for level 10 we saw a 1.5%
improvement.
The key idea is not to use pure movemasks but to work with groups of bits.
For rowEntries == 16 and 32 the groups have size 4 and 2, respectively.
Each bit is duplicated within its group; we then AND with a mask so that
only one bit is set per group, which keeps the clear-lowest-bit iteration
`a &= (a - 1)` working.
Also, aarch64 has no 16-bit rotate instructions, only 32- and 64-bit
ones, which is why we see larger improvements for levels 8-9.
The vshrn_n_u16 instruction is used to achieve this: it shifts every u16
right by 4 and narrows it to the lower 8 bits; see the sketch below. The
same trick is used in
[Folly](c570259008/folly/container/detail/F14Table.h (L446)).
It takes 2 cycles according to the Neoverse-N{1,2} guidelines.
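A minimal sketch for the rowEntries == 16 case (illustrative names, not the exact zstd code, and it omits the rotation by the row head):

```c
#include <arm_neon.h>
#include <stdint.h>

/* Builds a 64-bit "grouped movemask": each of the 16 row bytes contributes
 * a 4-bit group, and the final AND keeps a single bit per group so that
 * `a &= (a - 1)` iteration visits one match at a time. */
static uint64_t grouped_matches_16(const uint8_t* row, uint8_t tag)
{
    const uint8x16_t chunk     = vld1q_u8(row);
    const uint16x8_t equalMask = vreinterpretq_u16_u8(vceqq_u8(chunk, vdupq_n_u8(tag)));
    const uint8x8_t  packed    = vshrn_n_u16(equalMask, 4);  /* 0xFF byte -> 0xF nibble */
    const uint64_t   matches   = vget_lane_u64(vreinterpret_u64_u8(packed), 0);
    return matches & 0x8888888888888888ULL;  /* one bit set per 4-bit group */
}
```

The index of a match is then the trailing-zero count of the lowest set bit divided by 4.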
The 64-bit movemask is already well optimized. We have ongoing experiments,
but have not been able to validate that other implementations are reliably faster.
This commit avoids checking whether a hashtable write is safe in two of the
three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This
produces a ~0.5% speed-up in compression.
A comment in the code describes why we can skip this check in the other two
paths (the repcode check and the first match check in the unrolled loop).
A downside is that at the new position where we make this check, we have not
yet computed `mLength`. We therefore have to avoid writing *possibly* dangerous
positions, rather than the old check, which only avoided writing *actually*
dangerous positions. This leads to a minuscule loss in ratio (remember that
this scenario can only be triggered at very negative levels or under
incompressibility acceleration).
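A purely conceptual sketch of the trade-off described above (hypothetical identifiers, not the actual zstd fast-compressor code): the old guard needs `mLength`, the new one does not, so it has to be conservative.

```c
#include <stdint.h>
#include <stddef.h>

/* Old position of the check: mLength is known, so only writes that are
 * actually out of bounds get skipped. */
static void recordPos_exact(uint32_t* hashTable, size_t h,
                            uint32_t pos, uint32_t mLength, uint32_t maxEnd)
{
    if (pos + mLength <= maxEnd) hashTable[h] = pos;
}

/* New position of the check: mLength is not computed yet, so any possibly
 * dangerous position is skipped, costing a minuscule amount of ratio. */
static void recordPos_conservative(uint32_t* hashTable, size_t h,
                                   uint32_t pos, uint32_t maxSafePos)
{
    if (pos <= maxSafePos) hashTable[h] = pos;
}
```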
Partial, Meson-only implementation of #2976 for non-MSVC builds.
Due to the prevalence of private symbol reuse, linking to a shared
library is simply unreliable, but we still want to defer to the
shared library for installable applications. By linking to both, we can
share symbols where possible and statically link where needed.
This means we no longer need to manually track every file that needs to
be extracted and reused.
The flip side is that MSVC does not support this at all, so for MSVC
builds we just link to a full static copy even where
-Ddefault_library=shared.
As a side benefit, by using library inclusion rather than including
extra explicit object files, the zstd program shrinks in size slightly
(~4kb).
These need to be explicitly included as we use their private symbols,
but we don't need to recompile them when we can reuse the existing
objects.
Minus 7 compile steps.
The poolTests program already linked to libzstd, and later to
libtestcommon with the libzstd objects included, so this was redundant.
Minus 4 compile steps.