facebook/zstd - zstd - Final Minetest

Author	SHA1	Message	Date
Jun He	ec5fdcde19	lib: add hint to generate more pipeline friendly code (#3138) Statistics from the silesia test data files show that the chance of a position beyond highThreshold is very low (~1.3%@L8 in most cases, all <2.5%), and is in the "lowprob area". Add the branch hint so the compiler can get better pipeline codegen. With this change, an uplift of ~1% is observed on mozilla and xml, and a slight (0.3%~0.8%) but consistent uplift on other files on Arm N1. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade	2022-07-29 10:28:04 -07:00
Jun He	558cf20d0d	decomp: add prefetch for matched seq on aarch64 (#3164) match is used for following sequence copy. It is only updated when extDict is needed, which is a low probability case. So it can be prefetched to reduce cache miss. The benchmarks on various Arm platforms showed uplift from 1% ~ 14% with gcc-11/clang-14. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: If201af4799d2455d74c79f8387404439d7f684ae	2022-07-29 10:27:20 -07:00
udayanbapat	43f21a600e	Initial commit to address 3090. Added support to decompress empty block. (#3118) * Initial commit to address 3090. Added support to decompress empty block * Update zstd_decompress_block.c Addressed review comments for the case of 'set_basic' * Update lib/decompress/zstd_decompress_block.c Co-authored-by: Nick Terrell <nickrterrell@gmail.com> * Update lib/decompress/zstd_decompress_block.c Co-authored-by: Nick Terrell <nickrterrell@gmail.com> Co-authored-by: Nick Terrell <nickrterrell@gmail.com>	2022-07-14 11:54:34 -07:00
Jun He	2491c65937	dec: adjust seqSymbol load on aarch64 ZSTD_seqSymbol is a structure with a total width of 64 bits, so on aarch64 it can be loaded in one operation and its fields extracted by simple shifts or bit extracts. GCC doesn't recognize this and generates unnecessary ldr/ldrb/ldrh operations that cause a performance drop. With this change, an uplift of 2~4% on silesia and 2.5~6% on cantrbry is observed @L8 on Arm N1. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8	2022-05-30 22:01:38 +08:00
Dominique Pelle	b772f53952	Typo and grammar fixes	2022-03-12 08:58:04 +01:00
Elliot Gorokhovsky	db2f4a6532	Move bitwise builtins into bits.h	2022-02-14 11:16:03 -05:00
Yann Collet	7616e39f3b	adding traces to better track processing of literals	2022-01-26 14:47:21 -08:00
Norbert Lange	2fbb1d10c1	Reduce bit tables to 8bit This saves some 1.7Kb in rodata section (x86_64, zstd tool), while assembler code stays the same except the type of a few load/extend instructions. Should not have negative performance implications.	2021-12-14 23:47:57 +01:00
Nick Terrell	5414dd7978	[bmi2] Add lzcnt and bmi target attributes * When dynamic dispatching to bmi2 add lzcnt and bmi to the TARGET_ATTRIBUTE. * Centralize the bmi2 TARGET_ATTRIBUTE definition to BMI2_TARGET_ATTRIBUTE so we can change it in the future. * Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't be any cases where bmi2 is supported but bmi1 isn't. But, since we are using the instruction we should check bmi1 as well.	2021-11-30 17:54:56 -08:00
binhdvo	04734ee84a	Fix oss fuzz test error (#2837)	2021-10-29 10:29:50 -04:00
binhdvo	6a7ede3dfc	Reduce size of dctx by reutilizing dst buffer (#2751) * Reduce size of dctx by reutilizing dst buffer Co-authored-by: Binh Vo <binhvo@fb.com>	2021-10-25 10:38:01 -04:00
Norbert Lange	0d45540695	decompress: conditionally remove bmi2 from context Use a helper function, which will just return 0 in case the feature is disabled. Allows constant propagation and removal of dead code.	2021-09-26 14:41:37 +02:00
Nick Terrell	189e87bcbe	[lib] Make lib compatible with `-Wfall-through` excepting legacy Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported compilers this uses an attribute, otherwise it becomes a comment. This is necessary to be compatible with clang's `-Wfall-through`, and gcc's `-Wfall-through=2` which don't support comments. Without this the linux build emits a bunch of warnings. Also add a test to CI to ensure that we don't regress.	2021-09-23 10:51:18 -07:00
Danila Kutenin	2c2c9e7dfd	Add possible improvements for gcc-11	2021-06-29 09:06:47 +01:00
Danila Kutenin	08a3ddbd28	Add comment for gcc-11	2021-06-08 20:54:21 +01:00
Danila Kutenin	6534c0000f	Be C89 compliant and fix alignment for gcc11	2021-06-08 20:45:57 +01:00
Danila Kutenin	a80d268700	Optimize ZSTD_decodeSequence by another x%	2021-05-29 18:21:10 +01:00
Yann Collet	439e58d060	improved gcc-9 and gcc-10 decoding speed the new alignment setting is better for gcc-9 and gcc-10 by about ~+5%. Unfortunately, it's worse for essentially all other compilers. Make the new alignment setting conditional to gcc-9+.	2021-05-08 00:01:01 -07:00
Yann Collet	6755baf940	update decoder hot loop alignment This seems to bring an additional ~+1.2% decompression speed on average across 10 compilers x 6 scenarios.	2021-05-07 15:18:16 -07:00
Yann Collet	1db5947591	improve decompression speed of long variant by ~+5% changed strategy, now unconditionally prefetch the first 2 cache lines, instead of cache lines corresponding to the first and last bytes of the match. This better corresponds to cpu expectation, which should auto-prefetch following cachelines on detecting the sequential nature of the read. This is globally positive, by +5%, though exact gains depend on compiler (from -2% to +15%). The only negative counter-example is gcc-9.	2021-05-07 11:26:14 -07:00
Yann Collet	ee425faaa7	Merge branch 'dev' into d_prefetch_refactor	2021-05-06 19:49:26 -07:00
Nick Terrell	b052b583e5	[lib] Fix UBSAN warning in ZSTD_decompressSequences()	2021-05-06 15:31:30 -07:00
Yann Collet	7ef6d7b36c	deeper prefetching pipeline for decompressSequencesLong pipeline increased from 4 to 8 slots. This change substantially improves decompression speed when there are long distance offsets. example with enwik9 compressed at level 22 : gcc-9 : 947 -> 1039 MB/s clang-10: 884 -> 946 MB/s I also checked the "cold dictionary" scenario, and found a smaller benefit, around ~2% (measurements are more noisy for this scenario).	2021-05-05 10:04:03 -07:00
Yann Collet	8cde167a27	Merge branch 'dev' into d_prefetch_refactor	2021-05-05 09:13:38 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Yann Collet	f5434663ea	Refactor prefetching for the decoding loop Following #2545, I noticed that one field in `seq_t` is optional, and only used in combination with prefetching. (This may have contributed to static analyzer failure to detect correct initialization). I then wondered if it would be possible to rewrite the code so that this optional part is handled directly by the prefetching code rather than delegated as an option into `ZSTD_decodeSequence()`. This resulted in this refactoring exercise, where the prefetching responsibility is better isolated into its own function and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations. Incidentally, due to better code locality, it reduces the need to send information around, leading to simplified interface, and smaller state structures.	2021-03-19 15:48:17 -07:00
Nick Terrell	f9b1e711ba	[zstd] Fix NULL pointer addition in ZSTD_checkContinuity() Don't start a new section when `dstSize == 0` to avoid NULL pointer addition.	2021-02-05 12:18:06 -08:00
Yann Collet	b9748757b0	fixed minor cast warning	2021-02-05 09:55:54 -08:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Yann Collet	0b39531d75	moving all references to `release` branch was previously `master`	2020-12-16 23:00:35 -08:00
Nick Terrell	c465f24457	ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free	2020-08-26 12:26:03 -07:00
Nick Terrell	80f577baa2	Move standard includes to zstd_deps.h	2020-08-26 12:25:08 -07:00
Nick Terrell	614e446000	Merge pull request #2271 from terrelln/small-blocks Small block optimizations	2020-08-24 18:54:33 -07:00
Nick Terrell	52f33a1da5	Fix compiler warnings	2020-08-24 16:09:45 -07:00
Nick Terrell	6f301a7903	Merge pull request #2272 from terrelln/dstSize_tooSmall [fix] Always return dstSize_tooSmall when it is the case	2020-08-24 15:01:17 -07:00
Nick Terrell	6d2f750b37	Document the BMI2 default() functions	2020-08-24 14:44:33 -07:00
Nick Terrell	1302f8d676	[fix] Always return dstSize_tooSmall when it is the case	2020-08-24 13:38:13 -07:00
Nick Terrell	575731b6db	Use ncount=1 when < 4096 symbols	2020-08-18 16:47:53 -07:00
Nick Terrell	612e947c5e	wire up bmi2 support	2020-08-17 16:35:28 -07:00
Nick Terrell	ba1fd17a9f	speed up literal header decoding	2020-08-17 12:17:53 -07:00
Nick Terrell	6004c1117f	speed up small blocks	2020-08-16 23:03:38 -07:00
Carl Woffenden	4c81fae146	Fix clang -Wcomma warning	2020-08-13 16:11:22 +02:00
Nick Terrell	cce0edfdbe	Fix unused variable warnings in fuzzing build mode without asserts Fix unused variable warnings when `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined but asserts are disabled. Fixes #2210.	2020-06-22 12:56:57 -07:00
Nick Terrell	f800e72a3c	[lib] Fix assertion when dictionary is prefix	2020-05-12 14:33:59 -07:00
Nick Terrell	4b88bd3ee0	[lib][fuzz] Assert sequences are valid in round trip tests	2020-05-11 20:38:49 -07:00
Nick Terrell	5717bd39ee	[lib] Fix NULL pointer dereference When the output buffer is `NULL` with size 0, but the frame content size is non-zero, we will write to the NULL pointer because our bounds check underflowed. This was exposed by a recent PR that allowed an empty frame into the single-pass shortcut in streaming mode. * Fix the bug. * Fix another NULL dereference in zstd-v1. * Overflow checks in 32-bit mode. * Add a dedicated test. * Expose the bug in the dedicated simple_decompress fuzzer. * Switch all mallocs in fuzzers to return NULL for size=0. * Fix a new timeout in a fuzzer. Neither clang nor gcc show a decompression speed regression on x86-64. On x86-32 clang is slightly positive and gcc loses 2.5% of speed. Credit to OSS-Fuzz.	2020-05-06 12:09:02 -07:00
W. Felix Handte	6028827fee	Rewrite Include Paths to be Relative Addresses #1998.	2020-05-04 15:20:26 -04:00
W. Felix Handte	5e5f262612	Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations	2020-05-04 10:58:55 -04:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
Nick Terrell	8d0ee37ac0	Align decompress sequences loop to 32+16 bytes The alignment is added before the loop, so this shouldn't hurt performance in any case. The only way it hurts is if there is already performance instability, and we force it to be stable but in the bad case. This consistently gets us into the good case with gcc-{7,8,9} on an Intel i9-9900K and clang-9. gcc-5 is 5% worse than its best case but has stable performance. We get consistently good behavior on my Macbook Pro compiled with both clang and gcc-8. It ends up in the 50% from DSB and 50% from MITE case, but the performance is the same as the 85% DSB case, so that's fine.	2020-03-23 19:40:31 -07:00
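
Commit ec5fdcde19 above describes adding a branch hint because positions beyond highThreshold are rare. The log does not show the patch itself, so the following is only a minimal sketch of the general technique. It assumes a GCC- or Clang-style `__builtin_expect`, and the names `EXAMPLE_UNLIKELY` and `example_next_position` are hypothetical, not taken from zstd.

```c
#include <assert.h>

/* Hypothetical macro name: marks a condition as rarely true.
 * Falls back to the plain condition on compilers without the builtin. */
#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#  define EXAMPLE_UNLIKELY(x) (x)
#endif

/* Advance a position in a power-of-two table, skipping the rarely hit
 * area above highThreshold. All parameter names are illustrative. */
static unsigned example_next_position(unsigned position, unsigned step,
                                      unsigned tableMask, unsigned highThreshold)
{
    assert((step & 1u) == 1u);           /* an odd step eventually reaches every slot */
    assert(highThreshold <= tableMask);  /* slot 0 is always acceptable, so the loop ends */
    position = (position + step) & tableMask;
    while (EXAMPLE_UNLIKELY(position > highThreshold)) {
        position = (position + step) & tableMask;  /* cold path */
    }
    return position;
}
```

The hint changes no results; it only tells the compiler which side of the branch to lay out as the fall-through path.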

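Commit 189e87bcbe above replaces fall-through comments with a macro that expands to an attribute on supporting compilers. A minimal sketch of that idea follows. The macro name `EXAMPLE_FALLTHROUGH` and the function `example_read_le` are hypothetical, and the feature test shown is one possible way to detect the attribute, not necessarily the one zstd uses.

```c
#include <assert.h>

/* Hypothetical macro: use the fallthrough attribute when the compiler
 * reports support for it, otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(__fallthrough__)
#    define EXAMPLE_FALLTHROUGH __attribute__((__fallthrough__))
#  endif
#endif
#ifndef EXAMPLE_FALLTHROUGH
#  define EXAMPLE_FALLTHROUGH /* fall-through */
#endif

/* Read up to 3 bytes as a little-endian value; each case deliberately
 * continues into the next one. */
static unsigned example_read_le(const unsigned char* p, int n)
{
    unsigned v = 0;
    switch (n) {
    case 3: v |= (unsigned)p[2] << 16; EXAMPLE_FALLTHROUGH;
    case 2: v |= (unsigned)p[1] << 8;  EXAMPLE_FALLTHROUGH;
    case 1: v |= (unsigned)p[0];       break;
    default: break;
    }
    return v;
}
```

Because the macro is followed by a semicolon at each use, it forms an empty statement when the attribute is unavailable, so the same source compiles either way.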

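Commits 5717bd39ee and f9b1e711ba above concern a NULL output buffer of size 0: a bounds check that underflowed, and pointer arithmetic on NULL. The log does not include the actual fixes, so this is only a sketch of one common way to write such a check, comparing lengths instead of forming `op + len`. The function name `example_fits` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when len bytes fit in [op, oend), 0 otherwise.
 * A NULL op is treated as an empty buffer, so no arithmetic is ever
 * performed on a NULL pointer and no sum can wrap around. */
static int example_fits(const unsigned char* op, const unsigned char* oend,
                        size_t len)
{
    size_t remaining;
    if (op == NULL) {
        remaining = 0;                  /* assumption: NULL means capacity 0 */
    } else {
        assert(oend >= op);             /* caller must pass a valid range */
        remaining = (size_t)(oend - op);
    }
    return len <= remaining;
}
```

Comparing `len` against the remaining capacity avoids computing an out-of-range pointer, which is the step that can misbehave when the buffer is empty or the length is very large.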
83 Commits