* seekable_format: fix from-file reading (not in-memory)
The code tries to check the buffer boundary, but there is no buffer in the
from-file reading path (see the sketch after these examples).
* seekable_decompression: break when ZSTD_seekable_decompress() returns zero
* seekable_decompression_mem: break when ZSTD_seekable_decompress() returns zero
* seekable_format: cap the offset+len up to the last dOffset
This allows reading the whole file without getting a corruption error when
the requested range extends past the data left in the file, e.g.:
$ ./seekable_compression seekable_compression.c 8192 | head
$ zstd -cdq seekable_compression.c.zst | wc -c
4737
Before this patch:
$ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
ZSTD_seekable_decompress() error : Corrupted block detected
0
After:
$ ./seekable_decompression seekable_compression.c.zst 0 10000000 | wc -c
4737
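For context, here is a hedged sketch of the from-file read path these fixes concern, built on the helper API from `contrib/seekable_format/zstd_seekable.h`; the buffer size, variable names, and error handling are illustrative assumptions rather than the shipped example code:
```
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
#include "zstd_seekable.h"   /* contrib/seekable_format */

/* Decompress [offset, offset+length) from a seekable .zst file to stdout. */
static void decompress_range(const char* path, unsigned long long offset, size_t length)
{
    FILE* const fin = fopen(path, "rb");
    if (fin == NULL) { perror(path); exit(1); }

    ZSTD_seekable* const seekable = ZSTD_seekable_create();
    {   size_t const ret = ZSTD_seekable_initFile(seekable, fin);
        if (ZSTD_isError(ret)) { fprintf(stderr, "init error\n"); exit(1); }
    }

    /* Cap the requested range at the end of the decompressed data,
     * so over-long requests read up to EOF instead of failing. */
    {   unsigned const lastFrame = ZSTD_seekable_getNumFrames(seekable) - 1;
        unsigned long long const endOffset =
              ZSTD_seekable_getFrameDecompressedOffset(seekable, lastFrame)
            + ZSTD_seekable_getFrameDecompressedSize(seekable, lastFrame);
        if (offset + length > endOffset)
            length = (offset < endOffset) ? (size_t)(endOffset - offset) : 0;
    }

    char buf[4096];
    while (length > 0) {
        size_t const toRead = length < sizeof(buf) ? length : sizeof(buf);
        size_t const got = ZSTD_seekable_decompress(seekable, buf, toRead, offset);
        if (ZSTD_isError(got)) { fprintf(stderr, "decompress error\n"); exit(1); }
        if (got == 0) break;          /* no more data: stop instead of spinning */
        fwrite(buf, 1, got, stdout);
        offset += got;
        length -= got;
    }

    ZSTD_seekable_free(seekable);
    fclose(fin);
}
```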
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indices. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.
This simplifies the dictionary loading code and reduces opportunities
for non-determinism to slip in.
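From the user's side, a large dictionary is now loaded the same way as a small one; a minimal sketch against the stable API, with illustrative names and error checking omitted:
```
#include <zstd.h>

/* Load a dictionary in a single call; the library now handles dictionaries
 * larger than ZSTD_CHUNKSIZE_MAX internally, without segmented loading. */
static size_t attach_dictionary(ZSTD_CCtx* cctx,
                                const void* dictBuffer, size_t dictSize)
{
    return ZSTD_CCtx_loadDictionary(cctx, dictBuffer, dictSize);
}
```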
The previous lower limit was 1 MB.
Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or by manually setting smaller window sizes.
Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
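For reference, the job size is exposed through the `ZSTD_c_jobSize` parameter of the advanced API; a hedged sketch of requesting a small job size (512 KB is an arbitrary illustrative value, and error checking is omitted):
```
#include <zstd.h>

/* Ask for multithreaded compression with small jobs; the effective value is
 * clamped to the library's minimum job size. */
static void request_small_jobs(ZSTD_CCtx* cctx)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);         /* enable MT mode */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_jobSize, 512 * 1024);  /* 512 KB jobs */
}
```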
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`, either because `stopMask == 0` or by
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.
```
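# before the optimization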
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
21.187881087 seconds time elapsed
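
# after the optimization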
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
1.149707921 seconds time elapsed
```
Any stable API entry point introduced after v1.0
should be documented with its minimum version number.
This PR addresses that requirement, mostly updating
entry points added since v1.4.0
and those newly introduced for the upcoming v1.5.0.
Switch from `-T0` to the default `-T1`, which significantly reduces
memory usage for level 19 when there are many cores. This fixes
32-bit builds running out of address space.
Fixes #2603.
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
got the correction wrong in this case, and our chain tables and binary
trees would be corrupted. Now, we work as long as `maxDist` is a power
of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to a non-zero value,
run overflow correction as frequently as allowed without impacting
the compression ratio (see the build sketch after this list).
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
`zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
speed penalty at most, which seems reasonable.
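As a rough sketch, the check can also be enabled in a local test build; this assumes the `MOREFLAGS` hook for passing extra compiler flags to the test Makefile:
```
cd tests
make clean
make MOREFLAGS="-DZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY=1" fuzzer zstreamtest
./fuzzer && ./zstreamtest
```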
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
Replace kernel-style comments with regular comments.
E.g.
```
/** Before */
/* After */
/**
* Before
*/
/*
* After
*/
/***********************************
* Before
***********************************/
/* *********************************
* After
***********************************/
```
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order, the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided, the weak function will be `NULL`.
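A minimal sketch of the pattern, assuming GCC/Clang weak-symbol support and an illustrative hook name (not the actual symbols in the tree):
```
/* Declared weak with no default definition: if nothing in the final link
 * provides trace_hook, its address is NULL. */
extern void trace_hook(int event) __attribute__((__weak__));

static void emit_event(int event)
{
    if (trace_hook != NULL)  /* only call the hook when a definition was linked in */
        trace_hook(event);
}
```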