1563 Commits (0d058469526a07febed7026c90f5212f601bb963)

Author SHA1 Message Date
senhuang42 17222654bf Add streaming decompression to unit test 2021-01-07 12:29:12 -05:00
senhuang42 22b7bff2bc Add unit test, improve documentation 2021-01-07 12:29:12 -05:00
Nick Terrell 58476bcf7f Don't shrink window log in ZSTD_getCParams()
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way
we treat streaming compression. Choose parameters based on the
dictionary size + source size, and assume the source size is small
if unknown. But, don't shrink the window log down in
ZSTD_adjustCParams_internal().
2021-01-04 15:54:09 -08:00
Nick Terrell 9d31c704d5 Don't shrink window log when streaming with a dictionary
Fixes #2442.

1. When creating a dictionary keep the same behavior as before.
   Assume the source size is 513 bytes when adjusting parameters.
2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep
   the same behavior as before.
3. When attaching a dictionary keep the same behavior of ignoring
   the dictionary size. When streaming this will select the
   largest parameters and not adjust them down. But, the CDict
   will use the correctly sized parameters, which seems like the
   right tradeoff.
4. When not attaching a dictionary (either forced not to, or
   using a prefix dictionary) we select parameters based on the
   dictionary size + source size, and assume the source size is
   small, which is the same behavior as before. But, now we don't
   adjust the window log (and hash and chain log) down when the
   source size is unknown.

When the source size is unknown all cdicts should attach, except
when the user disables attaching, or `forceWindow` is used. This
means that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.

TODO: Add a streaming + dictionary regression test case.
2021-01-04 15:54:09 -08:00
Nick Terrell a98a6e2091 [test][regression] Add no source size with dictionary test
* Add a test that runs without a pledgedSrcSize and with a dictionary.
* Add github.tar data which uses the github dictionary while compressing
  github.tar, instead of each file individually.
2021-01-04 15:54:09 -08:00
Nick Terrell 66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Yann Collet ff2f888d56 fixed one more minor cast issue
can't use address calculation with `void*`
2020-12-29 11:44:37 -08:00
Yann Collet 7f8be046b9 fixed minor warnings introduced in #2439 2020-12-28 14:07:31 -08:00
Yann Collet cfff4c1cd5
Merge pull request #2439 from senhuang42/skippable_frame_api
Generate skippable frame API
2020-12-28 11:22:07 -08:00
senhuang42 5c41490bfe Use pre-defined constants 2020-12-21 11:52:05 -05:00
senhuang42 339d8ba103 Add unit test 2020-12-21 11:33:27 -05:00
Yann Collet 9648bf027b try to keep libzstd.a "as is" once created
to be compatible with scenarios such as
`make -j allmost`
2020-12-20 17:10:57 -08:00
Yann Collet 3536e9d5ff removing tests using too many resources for 32-bit address space 2020-12-17 15:44:54 -08:00
Yann Collet 0b39531d75 moving all references to `release` branch
was previously `master`
2020-12-16 23:00:35 -08:00
Nick Terrell 0be843b200 [tests] Fix playTests.sh with spaces in path 2020-12-10 11:03:47 -08:00
senhuang42 b9ab6bc061 Fix various conversion warnings 2020-12-08 10:07:28 -05:00
Yann Collet 69a04ccf68
Merge pull request #2413 from senhuang42/paramgrill_windows
Paramgrill for windows
2020-12-04 21:38:39 -08:00
Yann Collet b86e3c9304
Merge pull request #2415 from facebook/fix_aliasing
fix gcc-10 strict aliasing warnings
2020-12-04 21:30:57 -08:00
Yann Collet 5c0a3489a5 fix aliasing warning in decodecorpus 2020-12-04 19:21:40 -08:00
Nick Terrell c238db046f
Merge pull request #2414 from terrelln/mt-progress
[lib] Ensure that multithreaded compression always makes some progress
2020-12-04 16:30:08 -08:00
Nick Terrell 4c58cb8383 [lib] Ensure that multithreaded compression always makes some progress 2020-12-03 20:25:14 -08:00
senhuang42 260b85acf5 Fix MSVC 2019 warnings 2020-12-03 10:36:45 -05:00
Yann Collet 5de5c1d759 fixed fuzzer multithreading tests 2020-12-02 10:34:12 -08:00
Yann Collet db21d383b5 fixed fuzzer32 to support multithreading tests
though it still fails on test33:
`test 33: superblock uncompressible data, too many nocompress superblocks`
2020-12-02 09:13:55 -08:00
Yann Collet f69d8c027d removed fullbench-lib from tests/all
this build works fine on all my systems,
but seems to fail in the CI environment.
Unclear why there is a difference.
This build test is not relevant anyway.
2020-12-02 00:21:29 -08:00
Yann Collet 9f8b180d5d fixed API documentation 2020-12-02 00:15:07 -08:00
Yann Collet f8d0b46a9f streamline fuzzer
from fuzzer32
2020-12-01 23:44:16 -08:00
Yann Collet 37165f66b7 better usage of default build rules 2020-12-01 23:36:05 -08:00
Yann Collet 343a75d2ef simplified test makefile
removed gzstd target:
relevant tests are unused and broken anyway
2020-12-01 22:33:45 -08:00
senhuang42 4c5f337248 Use cctx's minMatch instead of global MINMATCH, make fuzzer use validation 2020-11-30 15:41:20 -05:00
Yann Collet 4b5d7e9ddb fix lz4 test messed by console detection 2020-11-30 06:47:16 -08:00
senhuang42 23554ff25f Force CCtx minmatch to be same as generated minmatch 2020-11-23 13:29:20 -05:00
senhuang42 c502cd33e5 Fix generating 1 too few characters in random string generator 2020-11-20 16:58:25 -05:00
senhuang42 5b0c8f0a7c Add appropriate bound to matchlengths, and reduce srcSize max 2020-11-20 16:58:25 -05:00
senhuang42 a73a07b189 Add a bound for matchlength dependent on window size 2020-11-20 16:58:25 -05:00
senhuang42 5c68c5e31e Variety of minor fixups, reduce allocation, make deterministic 2020-11-20 16:58:25 -05:00
senhuang42 59c021f501 Add built binary to .gitignore 2020-11-20 16:58:25 -05:00
senhuang42 26bc0bfdf6 Add new fuzzer to build targets 2020-11-20 16:58:25 -05:00
senhuang42 ed575963c5 Implement new fuzzer for sequence compression 2020-11-20 16:58:25 -05:00
senhuang42 7742f076b4 Add experimental param for sequence validation 2020-11-20 11:57:41 -05:00
senhuang42 05c0229668 Clean up visual conversion warnings 2020-11-18 15:36:29 -05:00
senhuang42 d6d7ba2a1f Modification to offset validation to include entire sequence 2020-11-17 10:13:22 -05:00
senhuang42 55b90ef010 Fix unit tests to agree with new changes 2020-11-16 11:36:37 -05:00
senhuang42 3d26615c84 Adjust unit tests to agree with new sequence generation API 2020-11-16 10:49:17 -05:00
senhuang42 2db8441245 Add RLE support 2020-11-16 10:49:17 -05:00
senhuang42 2bbdddf24e Add test case to roundtrip using ZSTD_getSequences() and ZSTD_compressSequences() 2020-11-16 10:49:16 -05:00
senhuang42 9d936d61d2 Reduce number of memcpy() calls 2020-11-13 19:43:30 -05:00
senhuang42 1a8af0de73 Improve unit test 2020-11-12 11:09:09 -05:00
sen f62edf0fe9
Merge pull request #2381 from senhuang42/expand_sequence_extraction_api
Add enum to define ZSTD_Sequence type and update sequence extraction API
2020-11-06 13:00:31 -05:00
senhuang42 7d1dea070c Update unit tests 2020-11-06 11:10:37 -05:00
senhuang42 51abd58208 Rename getSequences() to generateSequences() 2020-11-06 10:53:22 -05:00
Luke Pitt eac309c71b Add ZSTD_getDictID_fromCDict function to experimental section 2020-11-04 11:37:37 +00:00
senhuang42 c54a25b666 Revert compressibility change 2020-11-02 11:38:58 -05:00
senhuang42 d4d0346b40 Update name of enum, clarify documentation 2020-11-02 11:38:17 -05:00
senhuang42 9102f30dbf Update unit test 2020-11-02 11:30:31 -05:00
senhuang42 3327932609 Update ZSTD_getSequences function signature 2020-11-02 10:17:59 -05:00
Nick Terrell 37d546c445
Merge pull request #2379 from terrelln/regression-test
[regression] Updates results.csv & add README
2020-10-30 15:09:38 -07:00
Nick Terrell 7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
Nick Terrell a446fa33dc [regression] Add README explaining the test 2020-10-30 13:55:52 -07:00
Nick Terrell 222916a5d3 [regression] Update results.csv
https://github.com/facebook/zstd/pull/2339 removes the single-pass zstdmt API.
This changes the compressed size, because we no longer take the # of threads into
account when deciding the job size.
2020-10-30 13:54:30 -07:00
sen c37c714ef1
Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api
Refine external ZSTD_Sequence API
2020-10-30 15:47:25 -04:00
Nick Terrell 2ebf6d5588 [test] Add unit tests for ZSTD_c_stable{In,Out}Buffer 2020-10-30 10:55:34 -07:00
sen ff93440fc6
Merge pull request #2375 from senhuang42/ldm_oss_fuzz_testcase
Add a test case for LDM + opt parser with small uncompressible block
2020-10-29 09:32:05 -04:00
senhuang42 7198ebb213 Un-mix declarations and code 2020-10-28 18:51:03 -04:00
senhuang42 60a52c29e6 Add check for allocation 2020-10-28 16:22:22 -04:00
Nick Terrell 599ff58e08
Merge pull request #2339 from terrelln/zstdmt-stability
Fix zstdmt stability issues and clean up the zstdmt code
2020-10-27 19:43:13 -07:00
senhuang42 169fc07aa1 Move test to appropriate location 2020-10-27 16:59:43 -04:00
senhuang42 db0b5d7d1e Add test to fuzzer.c 2020-10-27 16:57:24 -04:00
sen 17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
senhuang42 dc448563e9 Add test compatibility with last literals in sequences 2020-10-27 12:35:28 -04:00
Yann Collet d3f1a9b5bd fix partial-build test
sometimes, the scope difference is solely determined by the list of source files,
not by the flags.
2020-10-22 21:36:09 -07:00
Yann Collet 91a8cb9559 fix DEBUGLEVEL redefinition from tests/ 2020-10-22 00:20:40 -07:00
Yann Collet 494f7169ed fix directory creation for Windows' libzstd 2020-10-22 00:15:31 -07:00
Yann Collet ca75da8fa3 fix test
DEBUGLEVEL redefinition
2020-10-21 23:51:13 -07:00
Nick Terrell d6dae2000b
Merge pull request #2365 from senhuang42/move_opt_parser_test_to_long_tests
Move ldm + opt parser no regression test to long tests
2020-10-20 11:34:36 -04:00
senhuang42 81a2c02d8f Move ldm no regression test to fuzzer longtests 2020-10-19 15:28:46 -04:00
senhuang42 df470e176b Add unit test for no cctx requested params change 2020-10-19 10:52:41 -04:00
senhuang42 42d037bdba Add libregression build target, also fix make clean and .gitignore 2020-10-15 10:34:50 -04:00
Yann Collet f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell ede4f97153 [zstdmt] Fix bug where extra empty blocks are emitted
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed, an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.

The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
2020-10-12 12:55:17 -07:00
Nick Terrell 9ab9229e11 [zstreamtest] Add compression determinism tests
* Run compression twice and check the compressed data is byte-identical.
  The compression loop had to be rewritten to ensure determinism. It is
  guaranteed by always making maximal forward progress.
* When nbWorkers > 0, change the number of workers 1/8 of the time.
* Run in single-pass mode 1/4 of the time.

I've run a few hundred thousand iterations of zstreamtest and have seen
no determinism issues so far. Before the zstdmt fix that skips the
single-pass shortcut, non-determinism showed up in a few hundred
iterations.
2020-10-12 12:55:17 -07:00
Nick Terrell c51a9e79b9 [zstdmt] Rip out the zstdmt API
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
2020-10-12 12:55:16 -07:00
Nick Terrell d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.

Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
Nick Terrell 7083f79008 [bug] Fix dictContentType when reprocessing cdict
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).

Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.

Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.

Test: Added a test case that exposes the bug, and fixed the raw
content tests so they no longer modify the `dictBuffer`. Modifying it
made all later tests that use the `dictBuffer` run with raw content,
which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
Yann Collet b951ad20a2
Merge pull request #2329 from senhuang42/prevent_summary_updates_when_using_stdout
Prevent summary updates when using stdout
2020-10-09 01:01:36 -07:00
Yann Collet c3ee284ca2
Merge pull request #2319 from facebook/fullbench_stream2
update fullbench for compressStream2()
2020-10-09 00:40:59 -07:00
senhuang42 e96ea5d147 Fix static analyze fuzzer.c error 2020-10-07 13:56:25 -04:00
senhuang42 b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42 429dec4f42 Add DEBUGLOG() calls in ldm helpers 2020-10-07 13:56:25 -04:00
senhuang42 cfd2aec1b7 Add unit tests into playTests.sh 2020-10-07 13:56:25 -04:00
senhuang42 7259b258d1 Add callsites to zstdcli.c and tests to playTests.sh 2020-10-07 13:47:38 -04:00
Nick Terrell 0057c4acf7
Merge pull request #2333 from terrelln/stable-dst
Reset all decompression parameters in ZSTD_DCtx_reset()
2020-10-01 18:56:11 -07:00
Nick Terrell 2e7d174130 Reset all decompression parameters in ZSTD_DCtx_reset()
* Reset all decompression parameters in `ZSTD_DCtx_reset()` when
  resetting parameters.
* Add a test case.
2020-10-01 14:19:21 -07:00
Yann Collet 83461ce963
Merge pull request #2322 from senhuang42/guard_against_stdin_for_warning_prompts
Don't let warning messages consume input from stdin
2020-09-30 08:26:50 -07:00
senhuang42 9f7212a48b Update unit tests 2020-09-24 16:44:33 -04:00
Yann Collet c6c0a57c53
Merge pull request #2315 from senhuang42/allow_zstd_suffix
Support .zstd suffix only for decompression
2020-09-24 09:44:48 -07:00
senhuang42 21cd640b93 Add unit tests to guard against bad stdin 2020-09-22 14:55:41 -04:00
senhuang42 7aa3da1cd7 Use IS_CONSOLE macro to detect that we're indeed using a console 2020-09-22 14:15:52 -04:00
Nick Terrell 973f2adeec [tests] Don't write to stdout 2020-09-22 00:40:27 -07:00
Yann Collet 5618e000bd update fullbench for compressStream2()
makes it possible to measure scenarios such as #2314
2020-09-21 07:19:20 -07:00