Commit Graph

273 Commits (0953645837d0f5c22aa3bef713de2c2e13ea1bc5)

Author SHA1 Message Date
senhuang42 81a2c02d8f Move ldm no regression test to fuzzer longtests 2020-10-19 15:28:46 -04:00
Yann Collet f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.

Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
Nick Terrell 7083f79008 [bug] Fix dictContentType when reprocessing cdict
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).

Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.

Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.

Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
senhuang42 e96ea5d147 Fix static analyze fuzzer.c error 2020-10-07 13:56:25 -04:00
senhuang42 b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
Nick Terrell 2e7d174130 Reset all decompression parameters in ZSTD_DCtx_reset()
* Reset all decompression parameters in `ZSTD_DCtx_reset()` when
  resetting parameters.
* Add a test case.
2020-10-01 14:19:21 -07:00
W. Felix Handte 9398acb245 Move Last Two Long Tests in fuzzer.c into Separate --long-tests Section 2020-09-17 13:31:10 -04:00
Yann Collet dec1a78d3e minor fix casting for Visual 2020-09-14 11:46:23 -07:00
Yann Collet c91a0855f8 check endDirective in ZSTD_compressStream2()
fix #2297
also :
- `assert()` `endDirective` in `ZSTD_compressStream_internal()`, for debug mode
- add relevant tests
2020-09-14 10:56:08 -07:00
W. Felix Handte d6246d4a0f Print More During Fuzzer Test to Avoid CI Killing it Due to Timeout
This is kind of hacky. And maybe this test doesn't need to be permanently as
exhaustive as it is now. But while we're actively developing the DDSS, we
should ensure it's compatible across many different modes.
2020-09-10 23:35:42 -04:00
W. Felix Handte 6d3f816b3e Test Fewer Dictionary Sizes 2020-09-10 22:30:52 -04:00
W. Felix Handte b6df3fd438 Fix Debug Logging in 32-bit Build 2020-09-10 22:10:02 -04:00
W. Felix Handte 2cc2b40a1b Test DDSS A Little More Thoroughly 2020-09-10 22:10:02 -04:00
W. Felix Handte b81f3a37f9 Easy: Fix Test 2020-09-10 18:51:52 -04:00
W. Felix Handte 2cf6cfc55f Add Fuzzer Test for the Various Dict Attachment Strategies 2020-09-10 18:51:52 -04:00
Yann Collet f82d9865b9
Merge pull request #2278 from senhuang42/ignore_checksum_advanced_param
New advanced decompression param to ignore checksums
2020-08-25 12:08:53 -07:00
Nick Terrell 614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
senhuang42 a030560d62 Add new DCtx param: validateChecksum and update unit tests 2020-08-24 17:28:00 -04:00
Nick Terrell 1302f8d676 [fix] Always return dstSize_tooSmall when it is the case 2020-08-24 13:38:13 -07:00
senhuang42 44c54a3e31 Addressing comments: more comments, cleanup, remove extra function, checksum logic 2020-08-24 16:14:19 -04:00
senhuang42 ffaa0df76d Document change in CLI for --no-check during decompression in --help menu 2020-08-24 09:49:12 -04:00
senhuang42 20eb095882 Added unit test to fuzzer.c, changed definition param name 2020-08-22 13:26:33 -04:00
senhuang42 1b34b15e6b Adding CLI capability to invoke decompression with no checksum 2020-08-21 17:49:30 -04:00
senhuang42 6a8dbdcd1f Modify decompression loop to gnore checksums if flag is enabled 2020-08-21 16:46:46 -04:00
Nick Terrell 575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Nick Terrell 651d3d73e0 [test] Update the ldm loadedDictEnd test to cover zstdmt 2020-05-19 16:14:14 -07:00
Nick Terrell 7b317b4876 [test] Test that the ldm dictionary gets invalidated on reset 2020-05-18 16:00:28 -07:00
Nick Terrell 87dbd6d4bf [test] Improve LDM forceMaxWindow test 2020-05-18 15:11:18 -07:00
Nick Terrell bf0591e1e2 [test] Expose the LDM+MT+dict bug in a unit test 2020-05-14 12:06:55 -07:00
Yann Collet e001715b3d fixed asan test 2020-05-11 20:35:47 -07:00
Yann Collet 20bd246045 blindfix for VS macro redefinition 2020-05-11 19:29:36 -07:00
Yann Collet 91ad01218e updated initStatic tests
differentiate small CCtx for small inputs
from full CCtx
from CStream contexts.

Ensure allocation & resize tests are more precise.
2020-05-11 18:50:10 -07:00
Yann Collet 608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
Yann Collet dd026ca505 re-inforced tests for initStaticCCtx
ensure that `estimateCCtxSize()` works as intended.
2020-05-09 11:30:45 -07:00
Nick Terrell 5717bd39ee [lib] Fix NULL pointer dereference
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.

This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.

* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.

Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.

Credit to OSS-Fuzz.
2020-05-06 12:09:02 -07:00
W. Felix Handte 2cf72d56a6 Try to Fix MSVC Error
It's complaining about the `memcpy`s, saying:

"warning C4090: 'function': different 'const' qualifiers"

Let's try explicitly casting to the argument types...
2020-05-04 10:59:15 -04:00
W. Felix Handte dacbcd2cc1 Fix Up Some Pointer Handling in Tests 2020-05-04 10:59:15 -04:00
Nick Terrell e103d7b4a6
Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
Nick Terrell 108a5572a5
Merge pull request #2048 from nocnokneo/ctest-support
Add CTest support
2020-04-28 11:01:13 -07:00
Bimba Shrestha f7a7409a49 adding fail test when passing wrong fullDict using refPrefix 2020-04-21 22:26:48 -07:00
Bimba Shrestha 5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Bimba Shrestha 31e76f1ed4 adding test for dctx size reduction 2020-04-04 08:49:24 -07:00
Nick Terrell ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Taylor Braun-Jones 3cbc3d37e7 Add documentation for -T option 2020-03-23 17:49:04 -04:00
Bimba Shrestha 6a4258a08a
Removing symbols already in unit tests and adding some new unit tests for missing symbols (#1985)
* Removing symbols that are not being tested

* Removing symbols used in zstdcli, fileio, dibio and benchzstd

* Removing symbols used in zbuff and add test-zbuff to travis

* Removing remaining symbols and adding unit tests instead

* Removing symbols test entirely
2020-02-05 16:55:00 -08:00
Nick Terrell 3ed0f65158 [cmake] Add playTests.sh as a test 2020-01-13 14:16:15 -08:00
Bimba Shrestha b1f53b1a10 [fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936)
* Adding fail logging for superblock flow

* Dividing by targetCBlockSize instead of blockSize

* Adding new const and using more acurate formula for nbBlocks

* Only do dstCapacity check if using superblock

* Remvoing disabling logic

* Updating test to make it catch more extreme case of previou bug

* Also updating comment

* Only taking compressEnd shortcut on non-superblock
2020-01-03 16:53:51 -08:00
Bimba Shrestha 56415efc76 Constifying, malloc check and naming nit 2019-12-17 17:16:51 -08:00
Bimba Shrestha 989ce13e19 One more type conversion 2019-12-13 16:50:21 -08:00