148 Commits

Author SHA1 Message Date
Yann Collet
aba88fa996
Merge pull request #2829 from facebook/ZSTD_DECODER_INTERNAL_BUFFER
minor : change build macro to ZSTD_DECODER_INTERNAL_BUFFER
2021-10-26 10:48:16 -07:00
Yann Collet
518f06b281 added minimum for decoder buffer
also : introduced macro BOUNDED()
2021-10-26 08:21:31 -07:00
Nick Terrell
13cad3abb1 [lazy] Speed up compilation times
Speed up compilation times by moving each specialized search function
into its own function. This is faster because compilers can handle many
smaller functions much faster than one gigantic function. The previous
approach generated one giant function with `switch` statements and
inlining to select the implementation.

| Compiler | Flags                               | Dev Time (s) | PR Time (s) | Delta |
|----------|-------------------------------------|--------------|-------------|-------|
| gcc      | -O3                                 |         16.5 |         5.6 |  -66% |
| gcc      | -O3 -g -fsanitize=address,undefined |        158.9 |        38.2 |  -75% |
| clang    | -O3                                 |         36.5 |         5.5 |  -85% |
| clang    | -O3 -g -fsanitize=address,undefined |         27.8 |        17.5 |  -37% |

This also reduces the binary size because the search functions are no
longer inlined into the main body.

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      |                1563868 |               1308844 |  -16% |
| clang    |                1924372 |               1376020 |  -28% |

Finally, the performance is not impacted significantly by this change,
in fact we generally see a small speed boost.

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|-------|
| gcc      |     5 |            110.6 |           110.0 | -0.5% |
| gcc      |     7 |             70.4 |            72.2 | +2.5% |
| gcc      |     9 |             53.2 |            53.5 | +0.5% |
| gcc      |    13 |             12.7 |            12.9 | +1.5% |
| clang    |     5 |            113.9 |           110.4 | -3.0% |
| clang    |     7 |             67.7 |            70.6 | +4.2% |
| clang    |     9 |             51.9 |            52.2 | +0.5% |
| clang    |    13 |             12.4 |            13.3 | +7.2% |

The compression strategy is unmodified in this PR, so the compressed size
should be exactly the same. I may have a follow up PR to slightly improve
the compression ratio, if it doesn't cost too much speed.
2021-10-22 13:45:26 -07:00
Yann Collet
9d62957b31
Merge pull request #2800 from animalize/fix_c89
Fix a C89 error in msvc
2021-10-18 14:32:04 -07:00
Nick Terrell
b77d95b053
Merge pull request #2820 from terrelln/nb-compares
[binary-tree] Fix underflow of nbCompares
2021-10-11 09:59:57 -07:00
Nick Terrell
c6c482fe07 [binary-tree] Fix underflow of nbCompares
Fix underflow of `nbCompares` by switching to an `int` and comparing
`nbCompares > 0`. This is a minimal fix, because I don't want to change
the logic. These loops seem to be doing `nbCompares + 1` comparisons.

The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 13:22:55 -07:00
Nick Terrell
399644b1f1 [nit] Fix buggy indentation
The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 11:13:11 -07:00
Sen Huang
4b7f45cb04 Pull hot loop into its own function 2021-09-28 08:19:44 -07:00
Sen Huang
ccdcbf4621 Try beginning and end of match 2021-09-28 08:19:44 -07:00
Sen Huang
b8fd6bf30c Skip most long matches in lazy hash table update 2021-09-28 08:19:39 -07:00
Ma Lin
ae986fcdb8 Use __assume(0) for unreachable code path in msvc
msvc will optimize away the condition check.
2021-09-27 19:23:57 +08:00
Ma Lin
e5ba858270 Don't initialize the first parameter of _BitScanForward* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64
2021-09-25 16:36:53 +08:00
Ma Lin
cc22042da0 Fix a C89 error in msvc
Variables (r) must be declared at the beginning of a code block.
This causes msvc2012 to fail to compile 64-bit build.
2021-09-25 16:32:06 +08:00
Yann Collet
fa2a4d77c7 constify MatchState* parameter when possible
turns out, it's possible to constify MatchState* parameter
in some parts of the binary tree algorithm,
making it a pure read-only parameter,
as opposed to a mutable state.

This is supposed to be helpful for both maintenance and the compiler.
2021-09-23 08:27:44 -07:00
Sen Huang
539b3aab9b Optimize 32-bit VecMask_next() 2021-08-04 17:14:58 -04:00
senhuang42
e411040ea1 Add 64 row entry support for lazy 2021-08-04 16:19:12 -04:00
sen
5c46f62006
Merge pull request #2677 from senhuang42/ci_overhaul_2
[CI][2/2] Migrate CI tests which (currently) fail
2021-08-02 09:55:49 -04:00
Sen Huang
5ec7897a26 Fix static analyzer warnings 2021-07-29 09:11:12 -07:00
W. Felix Handte
da58821ff2 Fix DDSS Load
This PR fixes an incorrect comparison in figuring out `minChain` in
`ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison
had been masked by the fact that `idx` was always 1, until @terrelln changed
that in #2726.

Credit-to: OSS-Fuzz
2021-07-27 11:49:44 -04:00
aqrit
dd4f6aa9e6
Flatten ZSTD_row_getMatchMask (#2681)
* Flatten ZSTD_row_getMatchMask

* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary. 
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)

* replace memcpy with MEM_readST

* silence alignment warnings

* fix neon casts

* Update zstd_lazy.c

* unify simd preprocessor detection (#3)

* remove duplicate asserts

* tweak rotates

* improve endian detection

* add cast

there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension
but must be uint16_t to get gcc to generate a rotate instruction..

* more casts

* fix casts

better work-around for the (bogus) warning: unary minus on unsigned
2021-06-09 08:50:25 +03:00
Felix Handte
8a3bdfaa7b
Merge pull request #2654 from wolfpld/dev
Initialize "potentially uninitialized" pointers.
2021-06-07 13:04:19 -04:00
Yann Collet
02ece5d59f
Merge pull request #2653 from TrianglesPCT/dev
Enable SSE2 compression path to work on MSVC
2021-05-17 11:20:50 -07:00
Dan Nelson
54f78e3df8 ZSTD_VecMask_next: fix incorrect variable name in fallback code path 2021-05-15 10:20:37 -05:00
TrianglesPCT
bee0ef5647
Update zstd_lazy.c
It put the changes back when I tried to make a separate pull request, i don't understand githubs interface at all.
2021-05-14 19:23:13 -06:00
TrianglesPCT
d688ab1e0c
Add files via upload
AVX2
2021-05-14 19:18:12 -06:00
TrianglesPCT
bb1cdd8c63
Update zstd_lazy.c
add space
2021-05-14 19:11:28 -06:00
TrianglesPCT
a62856bf65
Update zstd_lazy.c
Remove the AVX2 part
2021-05-14 19:10:24 -06:00
TrianglesPCT
8f7ea1afeb
Update zstd_lazy.c
Switch to other comment style
2021-05-14 19:02:34 -06:00
TrianglesPCT
0e071214b5
Update zstd_lazy.c
switch to unaligned load as I don't know if buffer will always be aligned to 32 bytes, and compilers aside from MSVC might actually use aligned loads
2021-05-14 17:03:30 -06:00
TrianglesPCT
69ac124b12
Update zstd_lazy.c 2021-05-14 16:53:19 -06:00
TrianglesPCT
0b9f4bb0ff
Update zstd_lazy.c
use 8bit
2021-05-14 16:47:24 -06:00
Bartosz Taudul
7012c6e7a4
Initialize "potentially uninitialized" pointers. 2021-05-15 00:40:49 +02:00
TrianglesPCT
77d54eb3b3
Add files via upload 2021-05-14 16:40:32 -06:00
TrianglesPCT
25bda9053a
Add files via upload
msvc suport
avx2 path
2021-05-14 16:32:04 -06:00
Nick Terrell
10b35b312b [lib] Fix off-by-one error in repcode checks
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it cause non-determinism.
2021-05-13 17:05:59 -07:00
Sen Huang
e6c8a5dd40 Fix incorrect usages of repIndex across all strategies 2021-05-04 19:50:55 -04:00
felixhandte
efa6dfa729 Apply DDS adjustments to avoid assert failures 2021-04-23 16:41:00 -04:00
Sen Huang
8844f93957 Adjust nb elements to prefetch in ZSTD_row_fillHashCache() 2021-04-12 14:24:58 -04:00
Sen Huang
4d63d6e8aa Update results.csv, add Row hash to regression test 2021-04-07 10:31:41 -07:00
Nick Terrell
4694423c4f Add and integrate lazy row hash strategy 2021-04-07 09:53:34 -07:00
Nick Terrell
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
W. Felix Handte
c5fab8848a Document searchFuncs Table 2020-09-10 22:10:02 -04:00
W. Felix Handte
85a95840e4 Further Consolidate Dict Mode Checks 2020-09-10 22:10:02 -04:00
W. Felix Handte
efa33861f2 Attempt to Fix MSVC Warnings 2020-09-10 22:10:02 -04:00
W. Felix Handte
ed43832770 Simplify Match Limit Checks
Seems like a ~1.25% speedup.
2020-09-10 22:10:02 -04:00
W. Felix Handte
06d240b8a7 Use All Available Space in the Hash Table to Extent Chain Table Reach
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.
2020-09-10 22:10:02 -04:00
W. Felix Handte
b2b0641ea0 Rewrite Table Fill to Retain Cache Entries Beyond Chain Window 2020-09-10 22:10:02 -04:00
W. Felix Handte
916238d9dc Avoid Malloc in Table Fill; Pack Tmp Structure into Hash Table 2020-09-10 22:10:02 -04:00
W. Felix Handte
f42c5bddd9 Truncate Chain at Last Possible Attempt
Make the chain table denser?
2020-09-10 22:10:02 -04:00