Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indices. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.
This simplifies the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
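The suffix-only loading described above can be sketched as follows. This is a minimal illustration, not the library's code: `sketch_clamp_dict` and `maxDictSize` are hypothetical names, with `maxDictSize` standing in for the `ZSTD_CURRENT_MAX - 1` limit.

```c
#include <stddef.h>

/* Hypothetical helper: if the dictionary is larger than maxDictSize,
 * keep only its last maxDictSize bytes (the suffix). Bytes at the end
 * of a dictionary are the closest to the data being compressed. */
static void sketch_clamp_dict(const unsigned char** dict, size_t* dictSize,
                              size_t maxDictSize)
{
    if (*dictSize > maxDictSize) {
        *dict += *dictSize - maxDictSize;  /* skip the prefix */
        *dictSize = maxDictSize;
    }
}
```

Because the whole (possibly truncated) dictionary is then loaded in one pass, no per-segment state has to be carried between loads.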
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
got the correction wrong in this case, and our chain tables and binary
trees would be corrupted. Now, we work as long as `maxDist` is a power
of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero,
run overflow correction as frequently as allowed without impacting
compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
`zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
speed penalty at most, which seems reasonable.
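To illustrate the `windowLog < cycleLog` fix from the first bullet, here is a minimal sketch of how a correction could be computed so that the position within the cycle is preserved. The function name and signature are hypothetical, and it assumes `maxDist` is a power of two and that the current index is at least as large as the new index.

```c
#include <stdint.h>

#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: returns the value to subtract from every index.
 * Assumes maxDist is a power of two, so MAX(maxDist, cycleSize) is a
 * multiple of cycleSize and (curr - correction) keeps the same low
 * cycleLog bits as curr. Caller must ensure curr >= newCurrent. */
static uint32_t sketch_overflow_correction(uint32_t curr, uint32_t cycleLog,
                                           uint32_t maxDist)
{
    uint32_t const cycleSize  = 1u << cycleLog;
    uint32_t const cycleMask  = cycleSize - 1;
    uint32_t const cyclePos   = curr & cycleMask;
    uint32_t const newCurrent = cyclePos + SKETCH_MAX(maxDist, cycleSize);
    return curr - newCurrent;
}
```

Adding only `maxDist` would shift the cycle position whenever `maxDist < cycleSize`, which is the case the fix addresses; taking the maximum keeps the new index both cycle-aligned and at least `maxDist` away from zero.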
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
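The NULL-check pattern can be sketched as below. This assumes a GCC- or Clang-compatible toolchain on an ELF platform, where an undefined weak symbol resolves to `NULL`; `my_trace_hook` and `sketch_call_hook_if_present` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical hook, declared weak with no default definition.
 * If no object or library defines it, its address is NULL. */
extern void my_trace_hook(int event) __attribute__((weak));

/* Calls the hook only if some definition was linked in.
 * Returns 1 if the hook was called, 0 otherwise. */
static int sketch_call_hook_if_present(int event)
{
    if (my_trace_hook != NULL) {
        my_trace_hook(event);
        return 1;
    }
    return 0;
}
```

Since there is no default body to compete with a user-supplied one, the result does not depend on the order in which libraries are linked.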
* Perform 64-byte alignment of wksp tables and aligned allocations internally
* Clean up cwksp_finalize() function to only do two allocs
* Refactor aligned/buffer reservation code, remove ASAN req for alignment reservations
* Change from allocating 128 bytes always to allocating only buffer space as needed for tables/aligned
* Back out aligned/table reservation order restriction
* Add stricter bounds for new/resized wksps, fix comment in zstd_cwksp.h
* Do not emit last partitions of blocks as RLE/uncompressed
* Fix repcode updates within block splitter
* Add an entropy tables confirm function, redo ZSTD_confirmRepcodesAndEntropyTables() for better function signature
* Add a repcode updater to block splitter, no longer need to force emit compressed blocks
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is, e.g. `ZSTD_c_format`.
This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.
* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`
It also adds a test case that ensures that each of these functions
ignores the advanced parameters.
Forward the correct compressionLevel to the appliedParams in all cases.
It was already correct for the advanced API, so only the old single-pass
functions needed to be fixed.
This compression level is unused by the library, but is set so that the
tracing framework can consume it.
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.
This keeps the simple case simple.
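As a rough sketch of the simple case, a tracing library could store the start timestamp directly in the context value and compute the duration at the end. Everything here is hypothetical: `SketchTraceCtx` stands in for `ZSTD_TraceCtx`, and `sketch_trace_begin()` / `sketch_trace_end()` are illustrative names, not the library's actual hooks. It uses C11 `timespec_get`, which reads the wall clock; a real tracer would likely prefer a monotonic clock.

```c
#include <stdint.h>
#include <time.h>

typedef uint64_t SketchTraceCtx;  /* hypothetical stand-in for ZSTD_TraceCtx */

/* Current wall-clock time in nanoseconds (C11 timespec_get). */
static uint64_t sketch_now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* begin(): the returned context *is* the start timestamp,
 * so no global map keyed by context pointer is needed. */
static SketchTraceCtx sketch_trace_begin(void)
{
    return sketch_now_ns();
}

/* end(): receives the value returned by begin() and
 * returns the elapsed time in nanoseconds. */
static uint64_t sketch_trace_end(SketchTraceCtx ctx)
{
    return sketch_now_ns() - ctx;
}
```

A tracer that needs more than 64 bits of state would instead return a unique key from its begin function and look the rest up in its own map.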