1234 Commits

Author SHA1 Message Date
sen
9e94b7cac5
Assert no division by 0, correct superblocks 0 sequences case (#2592) 2021-05-07 13:26:56 -04:00
sen
698f261b35
[1.5.0] Deprecate some functions (#2582)
* Add deprecated macro to zstd.h, mark certain functions as deprecated

* Remove ZSTD_compress.c dependencies on deprecated functions
2021-05-06 17:59:32 -04:00
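As an illustration of the deprecation macro this commit describes, here is a minimal sketch of how such a macro is commonly built. `EXAMPLE_DEPRECATED`, `example_old_api()` and `example_new_api()` are hypothetical names chosen for this note; the actual macro added to `zstd.h` may be defined differently.

```c
/* Hypothetical deprecation macro (an assumption, not the definition in zstd.h).
 * It expands to a compiler-specific attribute, or to nothing when unsupported. */
#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(_MSC_VER)
#  define EXAMPLE_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#  define EXAMPLE_DEPRECATED(msg)
#endif

/* Replacement function that callers are expected to migrate to. */
int example_new_api(int x) { return x * 2; }

/* Old entry point: still functional, but call sites get a compiler warning. */
EXAMPLE_DEPRECATED("use example_new_api() instead")
int example_old_api(int x);

/* The old function simply forwards to the new one. */
int example_old_api(int x) { return example_new_api(x); }
```

Library code that must build without warnings would call `example_new_api()` directly, which matches the second bullet about removing internal dependencies on deprecated functions.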
Nick Terrell
207e33bb61
Merge pull request #2616 from terrelln/deterministic-dict
[lib] Add ZSTD_c_deterministicRefPrefix
2021-05-06 11:09:22 -07:00
Nick Terrell
172b4b6ac4 [lib] Add ZSTD_c_deterministicRefPrefix
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.

A determinism test case is also added, which fails without
`ZSTD_c_deterministicRefPrefix` and passes with it set.

Question: Should this be the default behavior? It isn't in this PR.
2021-05-05 18:49:56 -07:00
Nick Terrell
eb7e74ccb7 [tests] Set DEBUGLEVEL=2 by default
This allows us to quickly check for compile errors in debug log
messages, which are compiled out when `DEBUGLEVEL < 2`.
2021-05-05 13:29:06 -07:00
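The effect described above can be sketched with a logging macro that disappears below a threshold. `EXAMPLE_DEBUGLOG` and `example_sum()` are hypothetical names for this note, and the macro is a simplified assumption rather than zstd's own `DEBUGLOG` definition.

```c
#include <stdio.h>

/* Assumption: default to level 2 here, mirroring the test default described above. */
#ifndef DEBUGLEVEL
#  define DEBUGLEVEL 2
#endif

/* Hypothetical logging macro. Below level 2 the arguments are discarded by the
 * preprocessor, so mistakes inside them are never seen by the compiler. */
#if DEBUGLEVEL >= 2
#  define EXAMPLE_DEBUGLOG(...) fprintf(stderr, __VA_ARGS__)
#else
#  define EXAMPLE_DEBUGLOG(...) ((void)0)
#endif

int example_sum(const int* values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i) total += values[i];
    /* With DEBUGLEVEL < 2, a typo such as `totl` in this call would still compile. */
    EXAMPLE_DEBUGLOG("example_sum: count=%d total=%d\n", count, total);
    return total;
}
```

Building tests at level 2 means every log statement is type-checked on each build, at the cost of extra output on stderr.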
Nick Terrell
c2183d7cdf [lib] Move some ZSTD_CCtx_params off the stack
* Take `params` by const reference in `ZSTD_resetCCtx_internal()`.
* Add `simpleApiParams` to the CCtx and use them in the simple API
  functions, instead of creating those parameters on the stack.

I think this is a good direction to move in, because we shouldn't need
to worry about adding parameters to `ZSTD_CCtx_params`, since it should
always be on the heap (unless they become absolutely gigantic).

Some `ZSTD_CCtx_params` are still on the stack in the CDict functions,
but I've left them for now, because it was a little more complex, and we
don't use those functions in stack-constrained environments currently.
2021-05-05 13:25:16 -07:00
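A minimal sketch of the pattern described in this commit, under stated assumptions: `ExampleParams`, `ExampleCCtx`, `example_reset()` and `example_compress_simple()` are hypothetical stand-ins, not the real `ZSTD_CCtx_params` or `ZSTD_resetCCtx_internal()`. The idea shown is that the context owns a scratch parameter struct, and the reset function receives parameters by const pointer rather than by value.

```c
#include <string.h>

/* Hypothetical stand-in for a large parameter struct. */
typedef struct {
    int compressionLevel;
    int windowLog;
    unsigned char reserved[256];  /* padding to make the struct noticeably large */
} ExampleParams;

/* Hypothetical context. It is expected to live on the heap (or in static
 * storage), so embedding the scratch params here keeps them off the stack. */
typedef struct {
    ExampleParams appliedParams;    /* parameters currently in effect */
    ExampleParams simpleApiParams;  /* scratch space used by the simple API */
} ExampleCCtx;

/* Takes params by const pointer: no by-value copy of the struct per call. */
void example_reset(ExampleCCtx* cctx, const ExampleParams* params)
{
    cctx->appliedParams = *params;
}

/* Simple API entry point: fills the context-owned scratch params instead of
 * declaring a local ExampleParams variable. */
int example_compress_simple(ExampleCCtx* cctx, int level)
{
    ExampleParams* const p = &cctx->simpleApiParams;
    memset(p, 0, sizeof(*p));
    p->compressionLevel = level;
    p->windowLog = 10 + level;  /* arbitrary illustrative mapping */
    example_reset(cctx, p);
    return cctx->appliedParams.compressionLevel;
}
```

With this layout, growing `ExampleParams` increases the context size but not the stack usage of the simple API functions.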
Nick Terrell
0b88c2582c [test] Add large dict/data --patch-from test
Dictionary size must be > `ZSTD_CHUNKSIZE_MAX`.
2021-05-04 17:31:32 -07:00
Nick Terrell
94db4398a0 [lib] Always load the dictionary in one go
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indices. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.

This simplifies the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
2021-05-04 16:45:25 -07:00
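The "only load the suffix" step can be sketched as follows. `ExampleSpan`, `example_dict_suffix()` and the `maxSize` parameter are hypothetical; in the commit the limit is `ZSTD_CURRENT_MAX - 1`, and the real loading code does considerably more than this.

```c
#include <stddef.h>

/* Hypothetical view onto the part of the dictionary that will be loaded. */
typedef struct {
    const unsigned char* start;
    size_t size;
} ExampleSpan;

/* If the dictionary exceeds maxSize, keep only its last maxSize bytes;
 * otherwise keep all of it. */
ExampleSpan example_dict_suffix(const unsigned char* dict, size_t dictSize, size_t maxSize)
{
    ExampleSpan span = { dict, dictSize };
    if (dictSize > maxSize) {
        span.start = dict + (dictSize - maxSize);  /* skip the oldest bytes */
        span.size = maxSize;
    }
    return span;
}
```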
Nick Terrell
1ffa80a09e [easy] Rewrite rowHashLog computation
`ZSTD_highbit32(1u << x) == x` when it isn't undefined behavior.
2021-05-04 11:43:20 -07:00
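The identity quoted in this commit can be checked with a small stand-in. `example_highbit32()` is a hypothetical portable substitute for `ZSTD_highbit32()`, assumed here to return the index of the highest set bit of a non-zero 32-bit value; the shift is only defined for `x` in 0..31, which is the "when it isn't undefined behavior" caveat.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ZSTD_highbit32 (an assumption about its contract):
 * index of the highest set bit. The input must be non-zero. */
unsigned example_highbit32(uint32_t v)
{
    unsigned r = 0;
    assert(v != 0);
    while (v >>= 1) ++r;
    return r;
}

/* Returns 1 if highbit32(1u << x) == x for every well-defined shift (0..31). */
int example_identity_holds(void)
{
    for (unsigned x = 0; x < 32; ++x) {
        if (example_highbit32((uint32_t)1 << x) != x) return 0;
    }
    return 1;
}
```

Since the two sides agree wherever the expression is defined, code of the form `highbit32(1u << x)` can be replaced by `x` directly.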
Nick Terrell
34aff7ea06 Bug fix & run overflow correction much more frequently in tests
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
  got the correction wrong in this case, and our chain tables and binary
  trees would be corrupted. Now, we work as long as `maxDist` is a power
  of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero,
  run overflow correction as frequently as allowed without impacting
  compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
  `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
  speed penalty at most, which seems reasonable.
2021-05-03 15:21:47 -07:00
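A simplified sketch of the arithmetic in the first bullet, not zstd's actual implementation: `example_overflow_correction()` is a hypothetical helper that computes how much to subtract from every index. It keeps the low `cycleLog` bits of the current index and adds `MAX(maxDist, cycleSize)`, so the reduced index keeps its position within the cycle and stays at least `maxDist` above zero. This relies on `maxDist` being a power of two, as the commit message states.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Returns the amount to subtract from all indices (hypothetical helper). */
uint32_t example_overflow_correction(uint32_t curr, uint32_t cycleLog, uint32_t maxDist)
{
    assert(cycleLog < 32);
    assert(maxDist != 0 && (maxDist & (maxDist - 1)) == 0);  /* power of two */

    uint32_t const cycleSize = (uint32_t)1 << cycleLog;
    uint32_t const cycleMask = cycleSize - 1;
    /* Position of the current index within its cycle. */
    uint32_t const currentCycle = curr & cycleMask;
    /* Both maxDist and cycleSize are powers of two, so the larger is a multiple
     * of cycleSize and adding it leaves the low cycleLog bits unchanged. */
    uint32_t const newCurrent = currentCycle + EXAMPLE_MAX(maxDist, cycleSize);

    assert(curr >= newCurrent);  /* correction only makes sense for large indices */
    return curr - newCurrent;
}
```

For example, with `curr = 1000`, `cycleLog = 4` and `maxDist = 4` (the `windowLog < cycleLog` case), the reduced index is 24, which has the same low four bits as 1000.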
senhuang42
61fe571af6 Fix chaintable check to include rowhash in ZSTD_reduceIndex() 2021-04-30 19:52:04 -04:00
Nick Terrell
6cee3c2c4f [trace] Remove default definitions of weak symbols
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
2021-04-26 16:05:39 -07:00
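The NULL-check approach can be sketched as below, assuming GCC or Clang on an ELF platform where an undefined weak symbol resolves to a null address. `example_trace_hook()` and `example_trace()` are hypothetical names, not zstd's tracing symbols.

```c
#include <stddef.h>

/* Hypothetical weak hook. Nothing in this file defines it, so unless another
 * object or library provides a definition, its address is NULL at run time. */
extern void example_trace_hook(int event) __attribute__((weak));

/* Calls the hook only when a definition was linked in.
 * Returns 1 if the hook ran, 0 if it was absent. */
int example_trace(int event)
{
    if (example_trace_hook != NULL) {
        example_trace_hook(event);
        return 1;
    }
    return 0;
}
```

Because no default definition exists, there is nothing for a user-supplied definition to compete with, which is how the link-order problem described above is avoided.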
felixhandte
efa6dfa729 Apply DDS adjustments to avoid assert failures 2021-04-23 16:41:00 -04:00
sen
12c045f74d
Merge pull request #2574 from senhuang42/repcode_mismatch_detector_fix
Correct the block splitter mismatched repcodes detection.
2021-04-12 23:27:43 -04:00
Sen Huang
550f76f131 Correct the detection of mismatched repcodes 2021-04-09 09:08:51 -07:00
Nick Terrell
4694423c4f Add and integrate lazy row hash strategy 2021-04-07 09:53:34 -07:00
sen
f71aabb5b5
Move clevel override to after initLocalDict() (#2571) 2021-04-06 21:05:37 -04:00
sen
f1e8b565c2
Maintain two repcode histories for block splitting, replace invalid repcodes (#2569) 2021-04-06 17:25:55 -04:00
sen
e38124555e
Fix dictionary force reloading clevel selection (#2570)
* Move cdict clevel override to before localdict init

* Update results.csv after dict load changes
2021-04-06 15:35:09 -04:00
sen
980f3bbf83
[cwksp] Align all allocated "tables" and "aligneds" to 64 bytes (#2546)
* Perform 64-byte alignment of wksp tables and aligneds internally

* Clean up cwksp_finalize() function to only do two allocs

* Refactor aligned/buffer reservation code, remove ASAN req for alignment reservations

* Change from allocating 128 bytes always to allocating only buffer space as needed for tables/aligned

* Back out aligned/table reservation order restriction

* Add stricter bounds for new/resized wksps, fix comment in zstd_cwksp.h
2021-04-01 20:07:19 -04:00
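The 64-byte alignment mentioned in this commit comes down to rounding an address up to the next multiple of 64. The helpers below are a hypothetical sketch of that arithmetic only; `example_align_up()` and `example_padding_for()` are not functions from `zstd_cwksp.h`.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ALIGNMENT 64u  /* must be a power of two */

/* Rounds addr up to the next multiple of EXAMPLE_ALIGNMENT
 * (returns addr unchanged if it is already aligned). */
uintptr_t example_align_up(uintptr_t addr)
{
    uintptr_t const mask = (uintptr_t)EXAMPLE_ALIGNMENT - 1;
    return (addr + mask) & ~mask;
}

/* Number of padding bytes needed before addr to reach the next aligned address.
 * Reserving only this much relates to the bullet about allocating buffer space
 * as needed rather than always allocating 128 bytes. */
size_t example_padding_for(uintptr_t addr)
{
    return (size_t)(example_align_up(addr) - addr);
}
```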
sen
255925c231
Fix repcode-related OSS-fuzz issues in block splitter (#2560)
* Do not emit last partitions of blocks as RLE/uncompressed

* Fix repcode updates within block splitter

* Add an entropytables confirm function, redo ZSTD_confirmRepcodesAndEntropyTables() for better function signature

* Add a repcode updater to block splitter, no longer need to force emit compressed blocks
2021-03-31 15:14:59 -04:00
Nick Terrell
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
sen
84ccb81e7c
Merge pull request #2561 from senhuang42/longlength_enum
Add enum for representing long length ID
2021-03-26 15:55:12 -04:00
Sen Huang
b1a43455f8 Add enum for representing long length ID 2021-03-26 10:41:09 -07:00
sen
4fe2e7ae14
Merge pull request #2558 from senhuang42/msan_block_splitter_fix
Fix block splitter minor MSAN warning.
2021-03-25 13:51:43 -04:00
sen
b0407b9f0e
Merge pull request #2555 from senhuang42/default_clevel_func
Add ZSTD_defaultCLevel() function to public API
2021-03-25 13:07:28 -04:00
Sen Huang
2a907bf4aa Move lastCountSize into a returned struct, fix MSAN error 2021-03-25 09:11:15 -07:00
Sen Huang
e398744a35 Add ZSTD_defaultCLevel() function to public API 2021-03-25 08:04:00 -07:00
Nick Terrell
f8ac0ea7ef
Merge pull request #2539 from terrelln/linux-kernel-fixes
Fixes for the next linux kernel patch version
2021-03-24 10:34:29 -07:00
sen
bf542c8a8d
Merge pull request #2447 from senhuang42/block_splitter_v2
Recursive block splitting
2021-03-24 12:27:22 -04:00
Sen Huang
5b566ebe08 Rename *compressSequences*() functions for clarity 2021-03-24 08:21:29 -07:00
Sen Huang
0ef1f935b7 Add a fallback in case the total blocksize of split blocks exceeds raw block size 2021-03-24 08:21:29 -07:00
Sen Huang
c90e81a692 Enable block splitter by default when applicable 2021-03-24 08:21:29 -07:00
Sen Huang
e34332834a Clean up various functions, add debuglogging for estimate vs. actual sizes 2021-03-24 08:21:29 -07:00
Sen Huang
41c3eae6d9 Fix various fuzzer failures: repcode history, superblocks 2021-03-24 08:21:29 -07:00
senhuang42
0633bf17c3 Change 1.3.4 bugfix to be cross-compatible with superblocks and normal compression 2021-03-24 08:21:29 -07:00
senhuang42
eb1ee8686d Refactor buildSequencesStatistics() to avoid pointer increment for superblocks 2021-03-24 08:21:29 -07:00
senhuang42
e2bb215117 Add unit tests and fuzzer param 2021-03-24 08:21:09 -07:00
senhuang42
de52de1347 Add recursive block split algorithm 2021-03-24 08:21:09 -07:00
senhuang42
f06f6626ed Update function names for consistency 2021-03-24 08:20:54 -07:00
senhuang42
c56d6e49e8 Add block splitter to experimental params 2021-03-24 08:20:54 -07:00
senhuang42
2949a95224 Refactor block compression logic into single function 2021-03-24 08:20:54 -07:00
senhuang42
c05c090cc2 Centralize entropy statistics calculations to zstd_compress.c 2021-03-24 08:20:29 -07:00
sen
c48889f097
Merge pull request #2538 from senhuang42/monotonicity_test
Add memory monotonicity test over srcSize
2021-03-22 16:54:34 -04:00
Sen Huang
dff4a0e867 Make ZSTD_estimateCCtxSize_internal() loop through all srcSize parameter sets as well 2021-03-21 16:15:31 -07:00
Sen Huang
77ae664ba6 Fix ZSTD_dedicatedDictSearch_isSupported() requirements 2021-03-16 17:36:05 -07:00
senhuang42
386111adec Add a nbSeq argument to compressSequences()
Refactor ZSTD_compressBlock_internal() to do the block header write within and add nbSeq argument to compressSequences()
2021-03-16 14:04:22 -07:00
senhuang42
98764493cf Move block header write into compressBlock_internal() 2021-03-16 14:04:22 -07:00
Nick Terrell
cd1551d261 [lib][tracing] Add ZSTD_NO_TRACE macro
When defined, it disables tracing, and avoids including the header.
2021-03-16 11:47:27 -07:00
Nick Terrell
7736549bea [bug-fix] Make simple single-pass functions ignore advanced parameters
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is. E.g. `ZSTD_c_format`.

This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.

* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`

It also adds a test case that ensures that each of these functions
ignores the advanced parameters.
2021-02-12 19:11:23 -08:00