Commit Graph

4090 Commits (2624652a3273817a99fb977975db96e38d440bae)

Author SHA1 Message Date
Yann Collet f829c32258 forgot the chainlog is effectively a "fake" value with rowHash
the only value which makes sense is `hashlog-1`
as it mimics the real memory usage.
2021-12-16 11:37:40 -08:00
Yann Collet db1b408a2f rebalance lazy compression levels 2021-12-15 21:33:31 -08:00
Yann Collet c8d6067615 fixed incorrect rowlog initialization
the variable has only very limited usage,
being only used once at the beginning of the block for prefetching only,
hence the error had no impact on compression ratio.
2021-12-15 14:37:05 -08:00
Yann Collet eaf786242d
Merge pull request #2929 from facebook/sse_row_lazy
simplify SSE implementation of row_lazy match finder
2021-12-15 11:47:15 -08:00
Norbert Lange 2fbb1d10c1 Reduce bit tables to 8bit
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.

Should not have negative performance implications.
2021-12-14 23:47:57 +01:00
Norbert Lange 99923dfc1a Add typedefs for 8bit (un)signed
To make code more expressive, add U8 and S8 typedefs
2021-12-14 23:47:57 +01:00
binhdvo 64205b7832
Fix performance degradation with -m32 (#2926) 2021-12-14 15:53:50 -05:00
Felix Handte 5e2fede604
Merge pull request #2921 from felixhandte/neg-lvl-stagger-step
Stagger Stepping in Negative Levels
2021-12-14 14:13:57 -05:00
Yann Collet 05430b25a8 roll SSE implementation of row_lazy match finder
mostly for maintenance convenience.

Performance wise, there is very little change,
slightly faster for slog 3 & 4,
neutral or very slightly negative for slot 5 & 6.
2021-12-14 10:44:23 -08:00
W. Felix Handte 82a49c88f9 Increment Step by 1 not 2
I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)

Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as
opposed to the `01__23__` we were actually doing), it's a big ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit
sliiightly improving compression. Speed remains pretty much unchanged.
2021-12-13 16:59:33 -05:00
W. Felix Handte 6ca5f42402 Rewrite `step` to Track Increment Between Pairs of Positions
The position updates are rewritten from `ip[N] = ip[N-1] + step` to be
`ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing
of gaps at setup and then we only have to keep a single `step` variable.

This seems to work quite well on GCC and Clang!
2021-12-13 14:48:26 -05:00
W. Felix Handte b8434cb754 Allow Templating `ZSTD_fast` Matchfinders on Acceleration (Lvl < -1) 2021-12-13 14:46:57 -05:00
Yann Collet e1ab2200ff fixed x32 compatibility 2021-12-10 21:02:17 -08:00
W. Felix Handte ace6a7e746 Decompose `step` into Two Variables
This avoids an additional addition, at the cost of an additional variable.
2021-12-10 16:44:23 -05:00
W. Felix Handte 22501cd283 Stagger Application of `stepSize` in ZSTD_fast
This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That
is, it always looks at adjacent pairs of positions, and only applies the
acceleration every other position. This produces a more fine-grained
acceleration.
2021-12-10 16:44:23 -05:00
Yann Collet 57383d2317
Merge pull request #2914 from facebook/xxhash081
updated xxHash to latest v0.8.1
2021-12-08 16:48:46 -08:00
Yann Collet 3ce265fea8 remove offending static assert lines
no idea why visual + clang-cl + appveyor don't like them,
I've not been able to reproduce the issue locally,
but these static assert are very unlikely to deliver a useful signal,
I can't imagine a situation where they will be wrong,
and if they are, then a ton of other things will be broken way before reaching that point.
2021-12-08 15:05:17 -08:00
Nick Terrell 8b40095b3f
Merge pull request #2916 from terrelln/issue-2906
Remove possible NULL pointer addition
2021-12-08 16:51:10 -05:00
Yann Collet 16241b7d26 altered copyright title 2021-12-08 13:18:41 -08:00
Yann Collet a9cd6164d7 removed declarations of XXH3 symbols when XXH_NO_XXH3 is defined
on top of implementations, which were already scoped out.
2021-12-08 12:56:16 -08:00
Yann Collet 27e706de88 replaces malloc / free / memcpy by Zstandard's version 2021-12-08 12:51:04 -08:00
Nick Terrell b94407b6cf Remove possible NULL pointer addition
Refactor `ZSTDMT_isOverlapped()` to do NULL checks before computing the
end pointer.

Fixes #2906.
2021-12-08 12:40:40 -08:00
Nick Terrell 859e0500ab
Merge pull request #2915 from terrelln/oss-fuzz-build-fix
Fix oss-fuzz build
2021-12-08 15:32:49 -05:00
Nick Terrell aa7729c9f3 Fix oss-fuzz build
Disable assembly when dataflow sanitizer is enabled.

This regressed in PR #2893, which accidentally removed the check for
dataflow sanitizer.
2021-12-08 11:01:52 -08:00
W. Felix Handte 9f1dee8fa5 Fix Up #2659; Build libzstd.pc Whenever Building the Lib on Unix 2021-12-08 12:43:34 -05:00
Yann Collet 1c7d2c4dd5 updated xxHash to latest v0.8.1
with minor modifications directly embedded in source :
- does not compile XXH3
- namespace emulation (ZSTD_ prefix)

Incidentally fix #2824
2021-12-07 21:16:15 -08:00
Felix Handte 9118ee04c2
Merge pull request #2659 from ericonr/pc
[lib] Fix libzstd.pc for lib-mt builds
2021-12-07 14:18:38 -05:00
Nick Terrell b6b4c9a3da
Merge pull request #2907 from Hello71/armv6-fix-legacy
Apply FORCE_MEMORY_ACCESS=1 to legacy
2021-12-06 15:41:22 -05:00
Alex Xu (Hello71) 3d773d7013 Apply FORCE_MEMORY_ACCESS=1 to legacy
See #2633, #2881.
2021-12-05 22:51:44 -05:00
Nick Terrell 486472c453
Merge pull request #2893 from terrelln/issue-2789
[asm] Share portability macros and restrict ASM further
2021-12-03 14:07:30 -05:00
Felix Handte d2c86ec898
Merge pull request #2897 from felixhandte/zstd-deprecated-avoid-deprecated
Avoid Using Deprecated Functions in Deprecated Code
2021-12-03 12:09:58 -05:00
Nick Terrell c284569457 [asm] Share portability macros and restrict ASM further
Move portability macros to `lib/common/portability_macros.h`. This file
only contains platform/feature detection (e.g. 0/1 macros). This file is
shared between C and ASM code, so it cannot include any C code.

Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new
header.

Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS
assembler.

Finally, only include the ASM code if we are actually going to use it.
This disables it on all Windows platforms, which should resolve the
problem brought up in Issue #2789.
2021-12-02 16:58:04 -08:00
Nick Terrell 014bbb29f8
Merge pull request #2898 from terrelln/issue-2862
Improve zstd_opt build speed and size
2021-12-02 19:49:43 -05:00
Yann Collet 1bf3d8a475
Merge pull request #2896 from facebook/m68k
Zstandard compiles and run on m68k cpus
2021-12-02 14:25:45 -08:00
Nick Terrell e5bfaeede7 Improve zstd_opt build speed and size
Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.

Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into
every function, since `dictMode` is no longer used as a template. Create
two specializations, for opt levels 0 and 2, and call one of the two
specializations.

Lastly, remove the hack that disabled inlining for zstd_opt for the
Linux Kernel, as we've gotten most of the benefit already.

Compilation time sees a ~4x reduction:

| Compiler | Flags                            | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc      | -O3                              |         10.1 |         2.3 |  -77% |
| gcc      | -O3 -fsanitize=address,undefined |         61.1 |        10.2 |  -83% |
| clang    | -O3                              |          9.0 |         2.1 |  -76% |
| clang    | -O3 -fsanitize=address,undefined |         33.5 |         5.1 |  -84% |

Build size is reduced by 150KB - 200KB:

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      |                1327476 |               1177108 |  -11% |
| clang    |                1378324 |               1167780 |  -15% |

There is a <2% speed loss in all cases:

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      |    16 |             4.78 |            4.72 | -1.25% |
| gcc      |    17 |             3.49 |            3.46 | -0.85% |
| gcc      |    18 |             2.92 |            2.86 | -2.04% |
| gcc      |    19 |             2.61 |            2.61 |  0.00% |
| clang    |    16 |             4.69 |            4.80 |  2.34% |
| clang    |    17 |             3.53 |            3.49 | -1.13% |
| clang    |    18 |             2.86 |            2.85 | -0.34% |
| clang    |    19 |             2.61 |            2.61 |  0.00% |

Fixes Issue #2862.
2021-12-02 14:19:41 -08:00
W. Felix Handte e688317652 Fix Include Path 2021-12-02 16:53:52 -05:00
Nick Terrell 01ecd6ffc0
Merge pull request #2892 from terrelln/issue-2785
[CircleCI] Fix short-tests-0
2021-12-02 16:20:56 -05:00
W. Felix Handte d82d67d073 Migrate to `FORWARD_IF_ERROR` 2021-12-02 16:06:07 -05:00
Yann Collet 30b9db8ae4 changed macro name to ZSTD_ALIGNOF
for better consistency
2021-12-02 12:57:42 -08:00
Yann Collet 1d025d871b bound alignment backup to sizeof(void*) 2021-12-02 11:30:03 -08:00
W. Felix Handte a3ee9815c7 Avoid Using Deprecated Functions in Deprecated Code
`lib/deprecated` is no longer built by zstd's bundled build files. However,
users may try to build these files when they import the source tree into
their own build systems. And if they have `-Wdeprecated-declarations` on,
this can produce warnings.

This PR migrates these files away from using deprecated declarations.

This addresses #2767.
2021-12-02 14:25:33 -05:00
Yann Collet 80a13fd645 move the alignment macro to compiler.h
because mem.h is dropped in the Linux kernel.

Changed macro definition order (gcc/clang/msvc before c11)
due to a limitation in the kernel source builder.

Changed the backup to sizeof(),
reverting to previous behavior when no support of alignof() is detected.
2021-12-02 11:20:01 -08:00
Nick Terrell 21e28f5c24
Merge pull request #2891 from supperPants/dev
Fix typos
2021-12-02 13:53:33 -05:00
Yann Collet 39dced092e fix align conditions for huf_compress 2021-12-01 23:02:00 -08:00
Nick Terrell 91f5891dd0 [CircleCI] Fix short-tests-0
short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead.

Also fix all the test failures that were exposed.

`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
2021-12-01 17:43:46 -08:00
Yann Collet e89e847820 added alignment test
and fix an incorrect alignment check in cwksp which was failing on m68k
2021-12-01 17:16:36 -08:00
Yann Collet a71eed360b removed lib/Makefile preamble
now included from libzstd.mk
2021-12-01 16:54:59 -08:00
Yann Collet 3f64b31585 Merge branch 'dev' into tomerge2051 2021-12-01 15:29:49 -08:00
Nick Terrell 9b97fdf74f
Merge pull request #2887 from terrelln/issue-2815
[zdict] Remove ZDICT_CONTENTSIZE_MIN restriction for ZDICT_finalizeDictionary
2021-12-01 18:15:53 -05:00
Yann Collet 8031dc7a48
Merge pull request #2885 from yoniko/limit-level-32bit-systems
Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries.
2021-12-01 14:19:16 -08:00