the variable has only very limited usage,
being only used once at the beginning of the block for prefetching only,
hence the error had no impact on compression ratio.
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.
Should not have negative performance implications.
mostly for maintenance convenience.
Performance wise, there is very little change,
slightly faster for slog 3 & 4,
neutral or very slightly negative for slot 5 & 6.
I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)
Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as
opposed to the `01__23__` we were actually doing), it's a big ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit
sliiightly improving compression. Speed remains pretty much unchanged.
The position updates are rewritten from `ip[N] = ip[N-1] + step` to be
`ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing
of gaps at setup and then we only have to keep a single `step` variable.
This seems to work quite well on GCC and Clang!
This replicates the behavior of @terrelln's `ZSTD_fast` implementation. That
is, it always looks at adjacent pairs of positions, and only applies the
acceleration every other position. This produces a more fine-grained
acceleration.
no idea why visual + clang-cl + appveyor don't like them,
I've not been able to reproduce the issue locally,
but these static assert are very unlikely to deliver a useful signal,
I can't imagine a situation where they will be wrong,
and if they are, then a ton of other things will be broken way before reaching that point.
Move portability macros to `lib/common/portability_macros.h`. This file
only contains platform/feature detection (e.g. 0/1 macros). This file is
shared between C and ASM code, so it cannot include any C code.
Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new
header.
Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS
assembler.
Finally, only include the ASM code if we are actually going to use it.
This disables it on all Windows platforms, which should resolve the
problem brought up in Issue #2789.
Use the same trick as we did for zstd_lazy in PR #2828:
* Create one search function specialization for each (dictMode, mls).
* Select the search function pointer at the top of the match finder.
Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into
every function, since `dictMode` is no longer used as a template. Create
two specializations, for opt levels 0 and 2, and call one of the two
specializations.
Lastly, remove the hack that disabled inlining for zstd_opt for the
Linux Kernel, as we've gotten most of the benefit already.
Compilation time sees a ~4x reduction:
| Compiler | Flags | Dev Time (s) | PR Time (s) | Delta |
|----------|----------------------------------|--------------|-------------|-------|
| gcc | -O3 | 10.1 | 2.3 | -77% |
| gcc | -O3 -fsanitize=address,undefined | 61.1 | 10.2 | -83% |
| clang | -O3 | 9.0 | 2.1 | -76% |
| clang | -O3 -fsanitize=address,undefined | 33.5 | 5.1 | -84% |
Build size is reduced by 150KB - 200KB:
| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc | 1327476 | 1177108 | -11% |
| clang | 1378324 | 1167780 | -15% |
There is a <2% speed loss in all cases:
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|--------|
| gcc | 16 | 4.78 | 4.72 | -1.25% |
| gcc | 17 | 3.49 | 3.46 | -0.85% |
| gcc | 18 | 2.92 | 2.86 | -2.04% |
| gcc | 19 | 2.61 | 2.61 | 0.00% |
| clang | 16 | 4.69 | 4.80 | 2.34% |
| clang | 17 | 3.53 | 3.49 | -1.13% |
| clang | 18 | 2.86 | 2.85 | -0.34% |
| clang | 19 | 2.61 | 2.61 | 0.00% |
Fixes Issue #2862.
`lib/deprecated` is no longer built by zstd's bundled build files. However,
users may try to build these files when they import the source tree into
their own build systems. And if they have `-Wdeprecated-declarations` on,
this can produce warnings.
This PR migrates these files away from using deprecated declarations.
This addresses #2767.
because mem.h is dropped in the Linux kernel.
Changed macro definition order (gcc/clang/msvc before c11)
due to a limitation in the kernel source builder.
Changed the backup to sizeof(),
reverting to previous behavior when no support of alignof() is detected.
short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead.
Also fix all the test failures that were exposed.
`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
Allow the `dictContentSize` to be any size. The finalized dictionary
content size must be at least as large as the maximum repcode (8). So we
add zero bytes to the dictionary to ensure that we meet that
requirement.
I've removed this restriction because its been causing us headaches when
people complain that dictionary training failed. It fails because there
isn't enough useful content to put in the dictionary. Either because
every sample is exactly the same and less than ZDICT_CONTENTSIZE_MIN bytes,
or there isn't enough content. Instead, we should succeed in creating
the dictionary, and it is up to the user to decide if it is worthwhile.
It is possible that the tables alone provide enough value.
NOTE: This allows us to produce dictionaries with finalized
`dictContentSize < ZDICT_CONTENTSIZE_MIN`. But, they are still valid
zstd dictionaries. We could remove the `ZDICT_CONTENTSIZE_MIN` macro,
but I've decided to leave that for now, so we don't break users.