When decompressing with `-q` and an output file, the progress bar was mistakenly printed. This is a minimal fix, with a larger refactor to be stacked on top of it.
Fixes #2967.
OSS-Fuzz uncovered a scenario where we evaluate the cost of litLength = 131072,
which can't be represented in the zstd format, so we accessed one element past the end of LL_bits.
Fix the issue by making it cost 1 bit more than litLength = 131071.
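To illustrate the clamp, here is a simplified sketch that prices a literal length only by its number of extra bits, using the literal-length codes 0 to 35 of the zstd format. With that mapping, 131071 lands on code 35 while 131072 would land on code 36, one past the end of `LL_bits`. The helpers `ll_code()` and `ll_extra_bits_price()` and the constant `LL_MAX_REPRESENTABLE` are hypothetical; the real price function in `zstd_opt.c` also accounts for the entropy-coded cost of the code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Extra bits per literal-length code (0..35), from the zstd format. */
static const uint8_t LL_bits[36] = {
     0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0,
     1, 1, 1, 1, 2, 2, 3, 3,
     4, 6, 7, 8, 9,10,11,12,
    13,14,15,16
};

/* Code for small literal lengths (0..63). */
static const uint8_t LL_Code[64] = {
     0,  1,  2,  3,  4,  5,  6,  7,
     8,  9, 10, 11, 12, 13, 14, 15,
    16, 16, 17, 17, 18, 18, 19, 19,
    20, 20, 20, 20, 21, 21, 21, 21,
    22, 22, 22, 22, 22, 22, 22, 22,
    23, 23, 23, 23, 23, 23, 23, 23,
    24, 24, 24, 24, 24, 24, 24, 24,
    24, 24, 24, 24, 24, 24, 24, 24
};

/* Largest literal length representable with code 35 (hypothetical name). */
#define LL_MAX_REPRESENTABLE 131071u

/* Index of the highest set bit; only called with v >= 64 here. */
static unsigned highbit32(uint32_t v) {
    unsigned r = 0;
    while (v >>= 1) r++;
    return r;
}

/* Hypothetical: map a literal length to its code. */
static unsigned ll_code(uint32_t litLength) {
    return (litLength < 64) ? LL_Code[litLength] : highbit32(litLength) + 19;
}

/* Hypothetical simplified price, in bits: extra bits only. */
static unsigned ll_extra_bits_price(uint32_t litLength) {
    if (litLength > LL_MAX_REPRESENTABLE) {
        /* 131072 would map to code 36, past the end of LL_bits.
         * Price it as one bit more than the largest representable length. */
        assert(litLength == LL_MAX_REPRESENTABLE + 1);
        return 1 + ll_extra_bits_price(LL_MAX_REPRESENTABLE);
    }
    return LL_bits[ll_code(litLength)];
}
```

Pricing the unrepresentable length slightly above its neighbor keeps the cost function monotonic, so the parser still sees "all literals" as marginally more expensive than one literal fewer.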
There are still follow-ups:
1. This happened because literals_cost[0] = 0, so the optimal parser chose 36 literals
over a match. Should we require literals_cost[literal] > 0, unless the block truly only
has one literal value?
2. When no matches are found, the cost model isn't updated. In this case, no matches were
found for an entire block, so the literals cost model wasn't updated at all. That made
the optimal parser think literals_cost[0] = 0, when it is actually quite high, since
the block was entirely random noise.
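One possible shape for the bound asked about in the first follow-up, sketched under assumptions: estimate a literal's price from (possibly stale) frequency statistics, then raise a zero price to one bit unless the block itself really contains a single byte value. The function names, the whole-bit `log2` estimate, and the one-bit floor are all hypothetical; zstd's actual literal prices are fractional and computed differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of distinct byte values actually present in the block. */
static unsigned count_distinct(const uint8_t* src, size_t srcSize) {
    uint8_t seen[256] = {0};
    unsigned distinct = 0;
    for (size_t i = 0; i < srcSize; i++) {
        if (!seen[src[i]]) { seen[src[i]] = 1; distinct++; }
    }
    return distinct;
}

/* floor(log2(v)) for v > 0 */
static unsigned log2_floor(uint32_t v) {
    unsigned r = 0;
    while (v >>= 1) r++;
    return r;
}

/* Hypothetical literal price in whole bits, estimated from freq[0..255],
 * then bounded below by 1 bit unless the block truly has one literal value. */
static unsigned lit_price_bits(const uint32_t freq[256], uint8_t sym,
                               const uint8_t* src, size_t srcSize) {
    uint32_t total = 0;
    for (unsigned s = 0; s < 256; s++) total += freq[s];
    /* +1 avoids log2(0) for symbols never seen in the statistics */
    unsigned price = log2_floor(total + 1) - log2_floor(freq[sym] + 1);
    if (price == 0 && count_distinct(src, srcSize) > 1)
        price = 1;   /* proposed bound: literals are never free */
    return price;
}
```

The distinct count comes from the block contents rather than the statistics, so stale statistics (the second follow-up) cannot by themselves make a literal look free.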
Credit to OSS-Fuzz.
xxHash symbols are present in `libzstd.so`, but they are local and therefore
unavailable outside the lib. There are two possible solutions to the problem.
We could make those symbols global, or we could remove the dependency.
This commit chooses the latter approach. This presumably comes at some cost in
code size and build time. I'm open to comments on whether this is a good thing
to do, especially since it applies even when we statically link
everything.
This commit makes several changes:
1. It adds modules for the dictionary builder and errors headers.
2. It captures all of the macros that are used to configure these headers.
When the headers are imported as modules and one of these macros is defined,
the compiler issues a warning that it needs to be defined on the command line.
3. It promotes the modulemap file into the root of the lib directory.
Experimentation shows that clang's `-fimplicit-module-maps` will find the
modulemap when placed here, but not when it's put in a subdirectory.
After merging #2951, I realized that we need to explicitly disable
assembly when we aren't including the assembly source file. Otherwise,
if some compiler other than clang/gcc does support assembly, the build
would fail.
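A minimal sketch of the kind of guard this implies, assuming the build passes `-DZSTD_DISABLE_ASM` whenever the assembly source is not compiled in. The `EXAMPLE_ENABLE_ASM` macro and the `asm_enabled()` helper are hypothetical; the real detection logic in zstd's headers checks more conditions.

```c
#include <assert.h>

/* Simulate a build that did not compile the assembly source and
 * therefore defined ZSTD_DISABLE_ASM. */
#ifndef ZSTD_DISABLE_ASM
#  define ZSTD_DISABLE_ASM 1
#endif

/* Without the explicit opt-out, any compiler that defines __GNUC__
 * (not only gcc itself) on x86-64 would take the assembly path and then
 * fail at link time, because the assembly object was never built. */
#if !defined(ZSTD_DISABLE_ASM) && (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#  define EXAMPLE_ENABLE_ASM 1   /* assembly source is built and usable */
#else
#  define EXAMPLE_ENABLE_ASM 0   /* fall back to the portable C path */
#endif

static int asm_enabled(void) { return EXAMPLE_ENABLE_ASM; }
```

The design choice is to make the opt-out depend on what the build system actually compiled, rather than on guessing the compiler's capabilities.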
When re-using a compression state across multiple successive compressions,
the state should minimize the amount of allocation and initialization required.
This mostly matters in situations where initialization dominates the cost
of compression itself.
This can happen when the amount to compress is small,
while the compression state was led to expect a much larger input,
e.g. streaming mode without a srcSize hint.
This lean-initialization optimization was broken in 980f3bbf8354edec0ad32b4430800f330185de6a .
This commit fixes it, making this scenario once again on par with v1.4.9.
Note that this does not completely fix #2966,
since another heavy initialization, specific to row mode,
is also happening (and was not present in v1.4.9).
This will be fixed in a separate commit.
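The sketch below illustrates why reuse can be cheap, using a generic technique: a match table stores absolute positions, and "resetting" it only advances a validity threshold, so stale entries are ignored instead of being cleared with `memset()`. Everything here (`match_state`, `ms_reset()`, and the rest) is hypothetical and much simpler than zstd's actual state management; it only shows how the reset cost can stay independent of the table size, which is what matters when a small input meets a state sized for a large one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical match state: a table of absolute positions. */
typedef struct {
    uint32_t* table;     /* slot -> absolute position (0 = never written) */
    size_t    nbSlots;
    uint32_t  lowLimit;  /* positions below this belong to earlier inputs */
    uint32_t  nextPos;   /* absolute position of the next input byte */
} match_state;

/* Allocates and zeroes the table once, at creation time. */
static int ms_init(match_state* ms, size_t nbSlots) {
    ms->table = calloc(nbSlots, sizeof *ms->table);
    ms->nbSlots = nbSlots;
    ms->lowLimit = ms->nextPos = 1;   /* position 0 means "empty" */
    return ms->table != NULL;
}

/* Lean reset: O(1) regardless of table size. A real implementation must
 * eventually clear or rebase the table before nextPos overflows. */
static void ms_reset(match_state* ms) {
    ms->lowLimit = ms->nextPos;
}

static void ms_insert(match_state* ms, size_t slot) {
    ms->table[slot] = ms->nextPos++;
}

/* Returns the stored position, or 0 if the slot is empty or stale. */
static uint32_t ms_lookup(const match_state* ms, size_t slot) {
    uint32_t pos = ms->table[slot];
    return (pos >= ms->lowLimit) ? pos : 0;
}

static void ms_free(match_state* ms) { free(ms->table); }
```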
Apparently, even when the assembly file is empty (because
`ZSTD_ENABLE_ASM_X86_64_BMI2` is false), it is still marked as possibly
needing an executable stack, so the whole library is marked as such. This
commit applies a simple fix by moving the noexecstack
indication outside the macro guard.
This commit builds on #2857.
This commit addresses #2963.
The new contracts seem to make more sense:
updateRep() updates an array of repeat offsets _in place_,
while newRep() generates a new structure with the updated repeat-offset array.
Most callers actually expect the in-place variant,
and only a limited subset, mainly in `zstd_opt.c`, prefers `newRep()`.
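A minimal sketch of the two contracts, assuming three repeat offsets and a simplified update rule where a new offset is pushed to the front (repcode handling, including the literal-length-zero special case, is omitted). The names `update_rep()` and `new_rep()` are hypothetical stand-ins for `updateRep()` and `newRep()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REP_NUM 3

typedef struct { uint32_t rep[REP_NUM]; } repcodes_t;

/* In place: shifts the history and records `offset` as the most recent one. */
static void update_rep(uint32_t rep[REP_NUM], uint32_t offset) {
    rep[2] = rep[1];
    rep[1] = rep[0];
    rep[0] = offset;
}

/* By value: leaves `rep` untouched and returns an updated copy. */
static repcodes_t new_rep(const uint32_t rep[REP_NUM], uint32_t offset) {
    repcodes_t out;
    memcpy(out.rep, rep, sizeof out.rep);
    update_rep(out.rep, offset);
    return out;
}
```

Writing `new_rep()` in terms of `update_rep()` keeps the update rule in a single place, so the by-value variant is only a copy plus the in-place call.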
Added macros to act on values stored / expressed in the sumtype numeric representation required by `ZSTD_storeSeq()`.
This makes it possible to abstract away this representation by using the macros to extract these values.
First user: `ZSTD_updateRep()`.
This is meant to abstract the sumtype representation required
to transfer `offcode` to `ZSTD_storeSeq()`.
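A sketch of what such a sumtype encoding and its accessor macros can look like, assuming three repcodes and an encoding where values 0..2 denote repcodes 1..3 and any larger value denotes a real offset shifted up by 2. The macro names follow the style of this change, but these definitions, and the `describe()` consumer, are assumptions for illustration rather than a copy of zstd's headers.

```c
#include <assert.h>
#include <stdint.h>

#define REP_NUM  3
#define REP_MOVE (REP_NUM - 1)   /* stored values 0..REP_MOVE are repcodes */

/* Constructors: encode a repcode ID (1..3) or a real offset (> 0). */
#define STORE_REPCODE(r)     ((uint32_t)(r) - 1)
#define STORE_OFFSET(o)      ((uint32_t)(o) + REP_MOVE)

/* Predicates and accessors: callers never touch the raw numbers. */
#define STORED_IS_REPCODE(s) ((s) <= REP_MOVE)
#define STORED_IS_OFFSET(s)  ((s) > REP_MOVE)
#define STORED_REPCODE(s)    ((s) + 1)          /* returns ID 1..3 */
#define STORED_OFFSET(s)     ((s) - REP_MOVE)   /* returns the real offset */

/* Hypothetical consumer written only in terms of the macros. */
static uint32_t describe(uint32_t stored, int* isRepcode) {
    *isRepcode = STORED_IS_REPCODE(stored);
    return *isRepcode ? STORED_REPCODE(stored) : STORED_OFFSET(stored);
}
```

Because a consumer like `describe()` never relies on the raw numbers, the encoding could later be changed by editing only the macro definitions.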
Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially `zstd_lazy.c`, but also `zstd_opt.c` and `zstd_compress.c`.
While this PR does a good job of transferring a large number of call sites
to the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself is used "as is".
One of the problematic areas is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.
This commit doesn't change behavior: it only introduces and employs the macros,
so in the end the resulting code remains identical.
Ultimately, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to change it to something slightly more efficient.