4238 Commits

Author SHA1 Message Date
Yann Collet
c8d6067615 fixed incorrect rowlog initialization
the variable has only very limited usage,
being only used once at the beginning of the block for prefetching only,
hence the error had no impact on compression ratio.
2021-12-15 14:37:05 -08:00
Yann Collet
eaf786242d
Merge pull request #2929 from facebook/sse_row_lazy
simplify SSE implementation of row_lazy match finder
2021-12-15 11:47:15 -08:00
Norbert Lange
2fbb1d10c1 Reduce bit tables to 8bit
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.

Should not have negative performance implications.
2021-12-14 23:47:57 +01:00
Norbert Lange
99923dfc1a Add typedefs for 8bit (un)signed
To make code more expressive, add U8 and S8 typedefs
2021-12-14 23:47:57 +01:00
binhdvo
64205b7832
Fix performance degradation with -m32 (#2926) 2021-12-14 15:53:50 -05:00
Felix Handte
5e2fede604
Merge pull request #2921 from felixhandte/neg-lvl-stagger-step
Stagger Stepping in Negative Levels
2021-12-14 14:13:57 -05:00
Yann Collet
05430b25a8 roll SSE implementation of row_lazy match finder
mostly for maintenance convenience.

Performance wise, there is very little change,
slightly faster for slog 3 & 4,
neutral or very slightly negative for slot 5 & 6.
2021-12-14 10:44:23 -08:00
W. Felix Handte
82a49c88f9 Increment Step by 1 not 2
I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)

Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as
opposed to the `01__23__` we were actually doing), it's a big ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit
sliiightly improving compression. Speed remains pretty much unchanged.
2021-12-13 16:59:33 -05:00
W. Felix Handte
6ca5f42402 Rewrite step to Track Increment Between Pairs of Positions
The position updates are rewritten from `ip[N] = ip[N-1] + step` to be
`ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing
of gaps at setup and then we only have to keep a single `step` variable.

This seems to work quite well on GCC and Clang!
2021-12-13 14:48:26 -05:00
W. Felix Handte
b8434cb754 Allow Templating ZSTD_fast Matchfinders on Acceleration (Lvl < -1) 2021-12-13 14:46:57 -05:00
Yann Collet
e1ab2200ff fixed x32 compatibility 2021-12-10 21:02:17 -08:00
W. Felix Handte
ace6a7e746 Decompose step into Two Variables
This avoids an additional addition, at the cost of an additional variable.
2021-12-10 16:44:23 -05:00
W. Felix Handte
22501cd283 Stagger Application of stepSize in ZSTD_fast
This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That
is, it always looks at adjacent pairs of positions, and only applies the
acceleration every other position. This produces a more fine-grained
acceleration.
2021-12-10 16:44:23 -05:00
Yann Collet
57383d2317
Merge pull request #2914 from facebook/xxhash081
updated xxHash to latest v0.8.1
2021-12-08 16:48:46 -08:00
Yann Collet
3ce265fea8 remove offending static assert lines
no idea why visual + clang-cl + appveyor don't like them,
I've not been able to reproduce the issue locally,
but these static assert are very unlikely to deliver a useful signal,
I can't imagine a situation where they will be wrong,
and if they are, then a ton of other things will be broken way before reaching that point.
2021-12-08 15:05:17 -08:00
Nick Terrell
8b40095b3f
Merge pull request #2916 from terrelln/issue-2906
Remove possible NULL pointer addition
2021-12-08 16:51:10 -05:00
Yann Collet
16241b7d26 altered copyright title 2021-12-08 13:18:41 -08:00
Yann Collet
a9cd6164d7 removed declarations of XXH3 symbols when XXH_NO_XXH3 is defined
on top of implementations, which were already scoped out.
2021-12-08 12:56:16 -08:00
Yann Collet
27e706de88 replaces malloc / free / memcpy by Zstandard's version 2021-12-08 12:51:04 -08:00
Nick Terrell
b94407b6cf Remove possible NULL pointer addition
Refactor `ZSTDMT_isOverlapped()` to do NULL checks before computing the
end pointer.

Fixes #2906.
2021-12-08 12:40:40 -08:00
Nick Terrell
859e0500ab
Merge pull request #2915 from terrelln/oss-fuzz-build-fix
Fix oss-fuzz build
2021-12-08 15:32:49 -05:00
Nick Terrell
aa7729c9f3 Fix oss-fuzz build
Disable assembly when dataflow sanitizer is enabled.

This regressed in PR #2893, which accidentally removed the check for
dataflow sanitizer.
2021-12-08 11:01:52 -08:00
W. Felix Handte
9f1dee8fa5 Fix Up #2659; Build libzstd.pc Whenever Building the Lib on Unix 2021-12-08 12:43:34 -05:00
Yann Collet
1c7d2c4dd5 updated xxHash to latest v0.8.1
with minor modifications directly embedded in source :
- does not compile XXH3
- namespace emulation (ZSTD_ prefix)

Incidentally fix #2824
2021-12-07 21:16:15 -08:00
Felix Handte
9118ee04c2
Merge pull request #2659 from ericonr/pc
[lib] Fix libzstd.pc for lib-mt builds
2021-12-07 14:18:38 -05:00
Nick Terrell
b6b4c9a3da
Merge pull request #2907 from Hello71/armv6-fix-legacy
Apply FORCE_MEMORY_ACCESS=1 to legacy
2021-12-06 15:41:22 -05:00
Alex Xu (Hello71)
3d773d7013 Apply FORCE_MEMORY_ACCESS=1 to legacy
See #2633, #2881.
2021-12-05 22:51:44 -05:00
Nick Terrell
486472c453
Merge pull request #2893 from terrelln/issue-2789
[asm] Share portability macros and restrict ASM further
2021-12-03 14:07:30 -05:00
Felix Handte
d2c86ec898
Merge pull request #2897 from felixhandte/zstd-deprecated-avoid-deprecated
Avoid Using Deprecated Functions in Deprecated Code
2021-12-03 12:09:58 -05:00
Nick Terrell
c284569457 [asm] Share portability macros and restrict ASM further
Move portability macros to `lib/common/portability_macros.h`. This file
only contains platform/feature detection (e.g. 0/1 macros). This file is
shared between C and ASM code, so it cannot include any C code.

Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new
header.

Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS
assembler.

Finally, only include the ASM code if we are actually going to use it.
This disables it on all Windows platforms, which should resolve the
problem brought up in Issue #2789.
2021-12-02 16:58:04 -08:00
Nick Terrell
014bbb29f8
Merge pull request #2898 from terrelln/issue-2862
Improve zstd_opt build speed and size
2021-12-02 19:49:43 -05:00
Yann Collet
1bf3d8a475
Merge pull request #2896 from facebook/m68k
Zstandard compiles and run on m68k cpus
2021-12-02 14:25:45 -08:00
Nick Terrell
e5bfaeede7 Improve zstd_opt build speed and size
Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.

Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into
every function, since `dictMode` is no longer used as a template. Create
two specializations, for opt levels 0 and 2, and call one of the two
specializations.

Lastly, remove the hack that disabled inlining for zstd_opt for the
Linux Kernel, as we've gotten most of the benefit already.

Compilation time sees a ~4x reduction:

| Compiler | Flags                            | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc      | -O3                              |         10.1 |         2.3 |  -77% |
| gcc      | -O3 -fsanitize=address,undefined |         61.1 |        10.2 |  -83% |
| clang    | -O3                              |          9.0 |         2.1 |  -76% |
| clang    | -O3 -fsanitize=address,undefined |         33.5 |         5.1 |  -84% |

Build size is reduced by 150KB - 200KB:

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      |                1327476 |               1177108 |  -11% |
| clang    |                1378324 |               1167780 |  -15% |

There is a <2% speed loss in all cases:

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      |    16 |             4.78 |            4.72 | -1.25% |
| gcc      |    17 |             3.49 |            3.46 | -0.85% |
| gcc      |    18 |             2.92 |            2.86 | -2.04% |
| gcc      |    19 |             2.61 |            2.61 |  0.00% |
| clang    |    16 |             4.69 |            4.80 |  2.34% |
| clang    |    17 |             3.53 |            3.49 | -1.13% |
| clang    |    18 |             2.86 |            2.85 | -0.34% |
| clang    |    19 |             2.61 |            2.61 |  0.00% |

Fixes Issue #2862.
2021-12-02 14:19:41 -08:00
W. Felix Handte
e688317652 Fix Include Path 2021-12-02 16:53:52 -05:00
Nick Terrell
01ecd6ffc0
Merge pull request #2892 from terrelln/issue-2785
[CircleCI] Fix short-tests-0
2021-12-02 16:20:56 -05:00
W. Felix Handte
d82d67d073 Migrate to FORWARD_IF_ERROR 2021-12-02 16:06:07 -05:00
Yann Collet
30b9db8ae4 changed macro name to ZSTD_ALIGNOF
for better consistency
2021-12-02 12:57:42 -08:00
Yann Collet
1d025d871b bound alignment backup to sizeof(void*) 2021-12-02 11:30:03 -08:00
W. Felix Handte
a3ee9815c7 Avoid Using Deprecated Functions in Deprecated Code
`lib/deprecated` is no longer built by zstd's bundled build files. However,
users may try to build these files when they import the source tree into
their own build systems. And if they have `-Wdeprecated-declarations` on,
this can produce warnings.

This PR migrates these files away from using deprecated declarations.

This addresses #2767.
2021-12-02 14:25:33 -05:00
Yann Collet
80a13fd645 move the alignment macro to compiler.h
because mem.h is dropped in the Linux kernel.

Changed macro definition order (gcc/clang/msvc before c11)
due to a limitation in the kernel source builder.

Changed the backup to sizeof(),
reverting to previous behavior when no support of alignof() is detected.
2021-12-02 11:20:01 -08:00
Nick Terrell
21e28f5c24
Merge pull request #2891 from supperPants/dev
Fix typos
2021-12-02 13:53:33 -05:00
Yann Collet
39dced092e fix align conditions for huf_compress 2021-12-01 23:02:00 -08:00
Nick Terrell
91f5891dd0 [CircleCI] Fix short-tests-0
short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead.

Also fix all the test failures that were exposed.

`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
2021-12-01 17:43:46 -08:00
Yann Collet
e89e847820 added alignment test
and fix an incorrect alignment check in cwksp which was failing on m68k
2021-12-01 17:16:36 -08:00
Yann Collet
a71eed360b removed lib/Makefile preamble
now included from libzstd.mk
2021-12-01 16:54:59 -08:00
Yann Collet
3f64b31585 Merge branch 'dev' into tomerge2051 2021-12-01 15:29:49 -08:00
Nick Terrell
9b97fdf74f
Merge pull request #2887 from terrelln/issue-2815
[zdict] Remove ZDICT_CONTENTSIZE_MIN restriction for ZDICT_finalizeDictionary
2021-12-01 18:15:53 -05:00
Yann Collet
8031dc7a48
Merge pull request #2885 from yoniko/limit-level-32bit-systems
Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries.
2021-12-01 14:19:16 -08:00
supperPants
d4713de5a3 Fix typos. 2021-12-01 22:36:21 +08:00
Nick Terrell
0356e05f07 [zdict] Remove ZDICT_CONTENTSIZE_MIN restriction for ZDICT_finalizeDictionary
Allow the `dictContentSize` to be any size. The finalized dictionary
content size must be at least as large as the maximum repcode (8). So we
add zero bytes to the dictionary to ensure that we meet that
requirement.

I've removed this restriction because its been causing us headaches when
people complain that dictionary training failed. It fails because there
isn't enough useful content to put in the dictionary. Either because
every sample is exactly the same and less than ZDICT_CONTENTSIZE_MIN bytes,
or there isn't enough content. Instead, we should succeed in creating
the dictionary, and it is up to the user to decide if it is worthwhile.
It is possible that the tables alone provide enough value.

NOTE: This allows us to produce dictionaries with finalized
`dictContentSize < ZDICT_CONTENTSIZE_MIN`. But, they are still valid
zstd dictionaries. We could remove the `ZDICT_CONTENTSIZE_MIN` macro,
but I've decided to leave that for now, so we don't break users.
2021-11-30 18:02:26 -08:00