83 Commits

Author SHA1 Message Date
Jun He
ec5fdcde19
lib: add hint to generate more pipeline friendly code (#3138)
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
2022-07-29 10:28:04 -07:00
Jun He
558cf20d0d
decomp: add prefetch for matched seq on aarch64 (#3164)
match is used for following sequence copy. It is
only updated when extDict is needed, which is a
low probability case. So it can be prefetched to
reduce cache miss.
The benchmarks on various Arm platforms showed
uplift from 1% ~ 14% with gcc-11/clang-14.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
2022-07-29 10:27:20 -07:00
udayanbapat
43f21a600e
Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
Jun He
2491c65937 dec: adjust seqSymbol load on aarch64
ZSTD_seqSymbol is a structure with total of 64 bits
wide. So it can be loaded in one operation and
extract its fields by simply shifting or extracting
on aarch64.
GCC doesn't recognize this and generates more
unnecessary ldr/ldrb/ldrh operations that cause
performance drop.
With this change it is observed 2~4% uplift of
silesia and 2.5~6% of cantrbry @L8 on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8
2022-05-30 22:01:38 +08:00
Dominique Pelle
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
Elliot Gorokhovsky
db2f4a6532 Move bitwise builtins into bits.h 2022-02-14 11:16:03 -05:00
Yann Collet
7616e39f3b adding traces to better track processing of literals 2022-01-26 14:47:21 -08:00
Norbert Lange
2fbb1d10c1 Reduce bit tables to 8bit
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.

Should not have negative performance implications.
2021-12-14 23:47:57 +01:00
Nick Terrell
5414dd7978 [bmi2] Add lzcnt and bmi target attributes
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
  TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
  BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
  be any cases where bmi2 is supported but bmi1 isn't. But, since we are
  using the instruction we should check bmi1 as well.
2021-11-30 17:54:56 -08:00
binhdvo
04734ee84a
Fix oss fuzz test error (#2837) 2021-10-29 10:29:50 -04:00
binhdvo
6a7ede3dfc
Reduce size of dctx by reutilizing dst buffer (#2751)
* Reduce size of dctx by reutilizing dst buffer

Co-authored-by: Binh Vo <binhvo@fb.com>
2021-10-25 10:38:01 -04:00
Norbert Lange
0d45540695 decompress: conditionally remove bmi2 from context
Use an helper function, which will just return 0 in case
the feature is disabled.
Allows constant propagation and removal of dead code.
2021-09-26 14:41:37 +02:00
Nick Terrell
189e87bcbe [lib] Make lib compatible with -Wfall-through excepting legacy
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.

This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.

Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
Danila Kutenin
2c2c9e7dfd Add possible improvements for gcc-11 2021-06-29 09:06:47 +01:00
Danila Kutenin
08a3ddbd28 Add comment for gcc-11 2021-06-08 20:54:21 +01:00
Danila Kutenin
6534c0000f Be C89 compliant and fix alignment for gcc11 2021-06-08 20:45:57 +01:00
Danila Kutenin
a80d268700 Optimize ZSTD_decodeSequence by another x% 2021-05-29 18:21:10 +01:00
Yann Collet
439e58d060 improved gcc-9 and gcc-10 decoding speed
the new alignment setting is better for gcc-9 and gcc-10
by about ~+5%.

Unfortunately, it's worse for essentially all other compilers.

Make the new alignment setting conditional to gcc-9+.
2021-05-08 00:01:01 -07:00
Yann Collet
6755baf940 update decoder hot loop alignment
This seems to bring an additional ~+1.2% decompression speed
on average across 10 compilers x 6 scenarios.
2021-05-07 15:18:16 -07:00
Yann Collet
1db5947591 improve decompression speed of long variant by ~+5%
changed strategy,
now unconditionally prefetch the first 2 cache lines,
instead of cache lines corresponding to the first and last bytes of the match.

This better corresponds to cpu expectation,
which should auto-prefetch following cachelines on detecting the sequential nature of the read.

This is globally positive, by +5%,
though exact gains depend on compiler (from -2% to +15%).
The only negative counter-example is gcc-9.
2021-05-07 11:26:14 -07:00
Yann Collet
ee425faaa7 Merge branch 'dev' into d_prefetch_refactor 2021-05-06 19:49:26 -07:00
Nick Terrell
b052b583e5 [lib] Fix UBSAN warning in ZSTD_decompressSequences() 2021-05-06 15:31:30 -07:00
Yann Collet
7ef6d7b36c deeper prefetching pipeline for decompressSequencesLong
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, around ~2%
(measurements are more noisy for this scenario).
2021-05-05 10:04:03 -07:00
Yann Collet
8cde167a27 Merge branch 'dev' into d_prefetch_refactor 2021-05-05 09:13:38 -07:00
Nick Terrell
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Yann Collet
f5434663ea Refactor prefetching for the decoding loop
Following #2545,
I noticed that one field in `seq_t` is optional,
and only used in combination with prefetching.
(This may have contributed to static analyzer failure to detect correct initialization).

I then wondered if it would be possible to rewrite the code
so that this optional part is handled directly by the prefetching code
rather than delegated as an option into `ZSTD_decodeSequence()`.

This resulted into this refactoring exercise
where the prefetching responsibility is better isolated into its own function
and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations.
Incidently, due to better code locality,
it reduces the need to send information around,
leading to simplified interface, and smaller state structures.
2021-03-19 15:48:17 -07:00
Nick Terrell
f9b1e711ba [zstd] Fix NULL pointer addition in ZSTD_checkContinuity()
Don't start a new section when `dstSize == 0` to avoid NULL
pointer addition.
2021-02-05 12:18:06 -08:00
Yann Collet
b9748757b0 fixed minor cast warning 2021-02-05 09:55:54 -08:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Yann Collet
0b39531d75 moving all references to release branch
was previously `master`
2020-12-16 23:00:35 -08:00
Nick Terrell
c465f24457 ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free 2020-08-26 12:26:03 -07:00
Nick Terrell
80f577baa2 Move standard includes to zstd_deps.h 2020-08-26 12:25:08 -07:00
Nick Terrell
614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
Nick Terrell
52f33a1da5 Fix compiler warnings 2020-08-24 16:09:45 -07:00
Nick Terrell
6f301a7903
Merge pull request #2272 from terrelln/dstSize_tooSmall
[fix] Always return dstSize_tooSmall when it is the case
2020-08-24 15:01:17 -07:00
Nick Terrell
6d2f750b37 Document the BMI2 default() functions 2020-08-24 14:44:33 -07:00
Nick Terrell
1302f8d676 [fix] Always return dstSize_tooSmall when it is the case 2020-08-24 13:38:13 -07:00
Nick Terrell
575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Nick Terrell
612e947c5e wire up bmi2 support 2020-08-17 16:35:28 -07:00
Nick Terrell
ba1fd17a9f speed up literal header decoding 2020-08-17 12:17:53 -07:00
Nick Terrell
6004c1117f speed up small blocks 2020-08-16 23:03:38 -07:00
Carl Woffenden
4c81fae146 Fix clang -Wcomma warning 2020-08-13 16:11:22 +02:00
Nick Terrell
cce0edfdbe Fix unused variable warnings in fuzzing build mode without asserts
Fix unused vairable warnings when `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined but asserts are disabled.

Fixes #2210.
2020-06-22 12:56:57 -07:00
Nick Terrell
f800e72a3c [lib] Fix assertion when dictionary is prefix 2020-05-12 14:33:59 -07:00
Nick Terrell
4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
Nick Terrell
5717bd39ee [lib] Fix NULL pointer dereference
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.

This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.

* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.

Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.

Credit to OSS-Fuzz.
2020-05-06 12:09:02 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
W. Felix Handte
5e5f262612 Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations 2020-05-04 10:58:55 -04:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell
8d0ee37ac0 Align decompress sequences loop to 32+16 bytes
The alignment is added before the loop, so this shouldn't hurt
performance in any case. The only way it hurts is if there is already
performance instability, and we force it to be stable but in the bad
case.

This consistently gets us into the good case with gcc-{7,8,9} on an
Intel i9-9900K and clang-9. gcc-5 is 5% worse than its best case but has
stable performance. We get consistently good behavior on my Macbook Pro
compiled with both clang and gcc-8. It ends up in the 50% from DSB and
50% from MITE case, but the performance is the same as the 85% DSB case,
so thats fine.
2020-03-23 19:40:31 -07:00