1904 Commits

Author SHA1 Message Date
senhuang42
3327932609 Update ZSTD_getSequences function signature 2020-11-02 10:17:59 -05:00
Nick Terrell
7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
sen
c37c714ef1
Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api
Refine external ZSTD_Sequence API
2020-10-30 15:47:25 -04:00
Nick Terrell
d4e021fe35 [lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
2020-10-30 10:55:34 -07:00
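A minimal sketch of why the size check stops working as a mode signal, assuming a simplified context. `SketchCCtx`, `SketchPolicy`, and both helper functions are hypothetical names for illustration, not the real `ZSTD_CCtx` fields:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffered policy; not the real enum. */
typedef enum { sketch_not_buffered = 0, sketch_buffered = 1 } SketchPolicy;

/* Hypothetical, heavily simplified context. */
typedef struct {
    size_t inBuffSize;     /* 0 when no input buffer was allocated */
    SketchPolicy policy;   /* stored explicitly at reset time */
} SketchCCtx;

/* Old signal: infer the mode from the buffer size. */
static SketchPolicy policy_from_size(const SketchCCtx* cctx)
{
    assert(cctx != NULL);
    return cctx->inBuffSize > 0 ? sketch_buffered : sketch_not_buffered;
}

/* New signal: read the stored policy, which stays correct even when
 * the input buffer is not allocated. */
static SketchPolicy policy_from_field(const SketchCCtx* cctx)
{
    assert(cctx != NULL);
    return cctx->policy;
}
```

With a stable input buffer the context is still in buffered (streaming) mode but holds no input buffer, so only the stored field gives the right answer.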
Nick Terrell
24f72789e2 [lib] Skip the input window buffer when ZSTD_c_stableInBuffer is set
Compress directly from the `ZSTD_inBuffer`. We still allocate the input
buffer. A following commit will remove that allocation.
2020-10-30 10:55:34 -07:00
Nick Terrell
6bd6b6f7d3 [cwksp] Return NULL when 0 bytes are requested
This ensures that the buffer is never used.
2020-10-30 10:55:34 -07:00
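A minimal sketch of the idea, assuming a plain bump allocator. `SketchWorkspace` and `sketch_reserve()` are hypothetical and do not mirror the real cwksp layout or function names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump-allocator workspace, for illustration only. */
typedef struct {
    unsigned char* next;  /* next free byte */
    unsigned char* end;   /* one past the last usable byte */
} SketchWorkspace;

/* Reserve `bytes` from the workspace.
 * Returns NULL when 0 bytes are requested, so a zero-sized buffer can
 * never be dereferenced by accident, and NULL when space runs out. */
static void* sketch_reserve(SketchWorkspace* ws, size_t bytes)
{
    unsigned char* p;
    assert(ws != NULL);
    if (bytes == 0) return NULL;
    if ((size_t)(ws->end - ws->next) < bytes) return NULL;
    p = ws->next;
    ws->next += bytes;
    return p;
}
```

Returning NULL instead of a valid zero-length pointer means any accidental use of the buffer fails immediately rather than silently touching neighboring memory.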
Nick Terrell
fcf81cee5e [lib] Avoid allocating output buffer when ZSTD_c_stableOutBuffer is set
We compress directly to the `ZSTD_outBuffer` so we don't need to
allocate it.
2020-10-30 10:55:34 -07:00
Nick Terrell
6d5dc93d4e [lib] Compress directly into output when ZSTD_c_stableOutBuffer is set
When we have a stable output buffer, always compress directly into the
`ZSTD_outBuffer`. We are allowed to return `dstSizeTooSmall`.
2020-10-30 10:55:34 -07:00
Nick Terrell
987cb4ca6a [lib] Take the shortcut when ZSTD_c_stableOutBuffer is set
When we have a stable output buffer, take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
2020-10-30 10:55:34 -07:00
Nick Terrell
809b2f2071 [lib] Set ZSTD_c_stable{In,Out}Buffer in ZSTD_compress2()
Sets these parameters in ZSTD_compress2() then resets them to their
original values after the compression call.

An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
2020-10-30 10:55:34 -07:00
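The save, set, and restore approach described above can be sketched roughly as follows. All names here (`SketchCtx`, `SketchMode`, `sketch_stream_compress()`, `sketch_one_shot_compress()`) are hypothetical; this is not the actual `ZSTD_compress2()` body:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the stable-buffer parameter values. */
typedef enum { sketch_unstable = 0, sketch_stable = 1 } SketchMode;

typedef struct {
    SketchMode inBufferMode;
    SketchMode outBufferMode;
    SketchMode seenIn;   /* what the streaming call observed */
    SketchMode seenOut;
} SketchCtx;

/* Placeholder for the streaming compression call: it only records
 * which buffer modes were active while it ran. */
static int sketch_stream_compress(SketchCtx* ctx)
{
    ctx->seenIn = ctx->inBufferMode;
    ctx->seenOut = ctx->outBufferMode;
    return 0;
}

/* One-shot wrapper: force both buffers to "stable" for the duration
 * of a single call, then restore the caller's settings. */
static int sketch_one_shot_compress(SketchCtx* ctx)
{
    SketchMode savedIn, savedOut;
    int result;
    assert(ctx != NULL);
    savedIn = ctx->inBufferMode;
    savedOut = ctx->outBufferMode;
    ctx->inBufferMode = sketch_stable;
    ctx->outBufferMode = sketch_stable;
    result = sketch_stream_compress(ctx);
    ctx->inBufferMode = savedIn;    /* restore even if the call failed */
    ctx->outBufferMode = savedOut;
    return result;
}
```

Restoring unconditionally keeps the context's user-visible parameters unchanged across the call, which is the property the commit message asks for.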
Nick Terrell
c74be3f6de [lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set
Adds the validation of the input/output buffers only. They are still
unused.
2020-10-30 10:55:34 -07:00
Nick Terrell
e3e0775cc8 [API] Add ZSTD_c_stable{In,Out}Buffer parameters
This commit adds the parameters and sets the value in the CCtxParams
but it does not do anything with the value.
2020-10-30 10:54:39 -07:00
Nick Terrell
e2581d9572 [lib] Set appliedParams in zstdmt mode
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
2020-10-30 10:54:38 -07:00
senhuang42
536e89c723 Sequence extractor should update CBlockState 2020-10-30 12:13:19 -04:00
senhuang42
32cac2627a Emit last literals of 0 size as well, to indicate block boundary 2020-10-29 16:41:17 -04:00
senhuang42
69bd5f0654 Correct literalsRead calculation to include longLength 2020-10-29 14:49:37 -04:00
senhuang42
59624f3163 Remove implicit typecast to appease appVeyor windows build 2020-10-28 16:25:09 -04:00
senhuang42
3ed5d053d8 Clarify comments in zstd.h some more 2020-10-28 09:53:09 -04:00
Nick Terrell
599ff58e08
Merge pull request #2339 from terrelln/zstdmt-stability
Fix zstdmt stability issues and clean up the zstdmt code
2020-10-27 19:43:13 -07:00
sen
17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
Nick Terrell
0953645837
Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue
Fix long distance matcher OSS-fuzz issue
2020-10-27 11:13:03 -07:00
senhuang42
3163909d14 Remove unused variable position 2020-10-27 12:58:12 -04:00
senhuang42
dc448563e9 Add test compatibility with last literals in sequences 2020-10-27 12:35:28 -04:00
senhuang42
1d221ecc03 Add support for representing last literals in the extracted seqs 2020-10-27 11:19:48 -04:00
senhuang42
9171f920cd Improve documentation of seqStore_t 2020-10-27 10:50:22 -04:00
senhuang42
96b0ff7886 Improve documentation regarding various operations in copyBlockSequences 2020-10-27 10:36:06 -04:00
senhuang42
3a11c7eb03 Modify ZSTD_copyBlockSequences to agree with new API 2020-10-27 10:31:40 -04:00
senhuang42
8bdb32aebe Add a function for LDM enable check 2020-10-20 13:46:02 -04:00
senhuang42
578e889ec1 Move ldm enable to compressStream2() 2020-10-20 13:04:45 -04:00
senhuang42
d28d8a1d72 Include LDM tables size for CCtx size estimation where relevant 2020-10-20 09:21:30 -04:00
senhuang42
b1c7fc5768 Add compatibility for multithreading 2020-10-19 12:07:06 -04:00
senhuang42
590f7f55f0 Add ldm enable condition in ZSTD_resetCCtx_internal 2020-10-19 10:26:17 -04:00
senhuang42
4d01979b62 Expose and call ZSTD_ldm_skipRawSeqStoreBytes() 2020-10-16 20:30:00 -04:00
Yann Collet
a0ec50c2dc
Merge pull request #2355 from senhuang42/change_ldm_mt_config
Reduce --long mode MT jobsize at higher levels
2020-10-16 13:35:50 -07:00
senhuang42
f49926edf4 Change cycleLog adjustment to +3 from +4 2020-10-15 09:56:05 -04:00
senhuang42
ee84817fe7 Reset posInSequence when using ZSTD_referenceExternalSequences() 2020-10-14 22:06:08 -04:00
senhuang42
d0550bb18f Clarify argument names, fix DEBUGLOG() statements 2020-10-14 15:45:43 -04:00
senhuang42
3f99c9b38d Adjust match backwards count args 2020-10-14 15:23:03 -04:00
senhuang42
bf0d559449 Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments() 2020-10-14 12:58:06 -04:00
senhuang42
467e4383b0 Merge branch 'dev' of github.com:senhuang42/zstd into change_ldm_mt_config 2020-10-14 10:17:50 -04:00
Yann Collet
f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
7e6f91ed84 [minor] Improve docs and add an assert in response to review 2020-10-12 16:43:17 -07:00
senhuang42
354b5f1c0a Use cycleLog instead of chainLog to determine LDM jobLog 2020-10-12 16:09:59 -04:00
Nick Terrell
441ce4178f [zstdmt] Clarify a comment 2020-10-12 12:58:13 -07:00
Nick Terrell
efff5d8b2d [zstdmt] Fix determinism issue with rsyncable mode
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attempt to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
   for the next sync point.

The fix is to detect whether we're currently paused at a sync point and,
if we are, not to load any more input.

Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
2020-10-12 12:55:17 -07:00
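The scenario and fix above can be modeled with a small sketch. Only the job-table-full condition is taken from the commit message; the `SketchMtCtx` struct, the `pausedAtSyncPoint` flag, and the helper functions are assumptions made for illustration, not the real zstdmt code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified multithreading context. */
typedef struct {
    unsigned nextJobID;
    unsigned doneJobID;
    unsigned jobIDMask;      /* job table size - 1 */
    int pausedAtSyncPoint;   /* assumed flag: a sync point was found, job not yet created */
    size_t loaded;           /* bytes buffered for the next job */
} SketchMtCtx;

/* Condition quoted in the commit message. */
static int sketch_job_table_full(const SketchMtCtx* mt)
{
    return mt->nextJobID > mt->doneJobID + mt->jobIDMask;
}

/* Load up to `avail` bytes. `syncPos` is the offset of the next sync
 * point in the input, or `avail` if there is none. Returns the number
 * of bytes consumed. */
static size_t sketch_load_input(SketchMtCtx* mt, size_t avail, size_t syncPos)
{
    assert(mt != NULL);
    if (mt->pausedAtSyncPoint) return 0;  /* the fix: don't search past a pending sync point */
    if (syncPos < avail) {
        mt->pausedAtSyncPoint = 1;
        avail = syncPos;
    }
    mt->loaded += avail;
    return avail;
}

/* Try to turn the buffered input into a job. Returns 1 on success. */
static int sketch_try_create_job(SketchMtCtx* mt)
{
    assert(mt != NULL);
    if (sketch_job_table_full(mt)) return 0;
    mt->nextJobID++;
    mt->loaded = 0;
    mt->pausedAtSyncPoint = 0;
    return 1;
}
```

Because the pause flag survives a failed job creation, the job boundary lands on the same sync point no matter how many times the caller retries, which is what makes the output deterministic in this model.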
Nick Terrell
ede4f97153 [zstdmt] Fix bug where extra empty blocks are emitted
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed, an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.

The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
2020-10-12 12:55:17 -07:00
Nick Terrell
c51a9e79b9 [zstdmt] Rip out the zstdmt API
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
2020-10-12 12:55:16 -07:00
Nick Terrell
1784c4b4ab [zstdmt] Remove single-pass shortcut
Simplifies the code and removes blocking from zstdmt.

At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.

Fixes #2327.
2020-10-12 12:53:26 -07:00
Nick Terrell
b55ae009ac [zstdmt] Remove singleBlockingThread mode
This is already handled by zstd, so this logic is never used.
2020-10-12 12:53:26 -07:00
Nick Terrell
d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.

Additionally, when downsizing the hashLog and chainLog, take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00