199 Commits

Author SHA1 Message Date
Sen Huang
41c3eae6d9 Fix various fuzzer failures: repcode history, superblocks 2021-03-24 08:21:29 -07:00
senhuang42
0633bf17c3 Change 1.3.4 bugfix to be cross-compatible with superblocks and normal compression 2021-03-24 08:21:29 -07:00
senhuang42
eb1ee8686d Refactor buildSequencesStatistics() to avoid pointer increment for superblocks 2021-03-24 08:21:29 -07:00
senhuang42
f06f6626ed Update function names for consistency 2021-03-24 08:20:54 -07:00
senhuang42
c56d6e49e8 Add block splitter to experimental params 2021-03-24 08:20:54 -07:00
senhuang42
c05c090cc2 Centralize entropy statistics calculations to zstd_compress.c 2021-03-24 08:20:29 -07:00
Yann Collet
8884cb887d
Merge pull request #2483 from mpu/ldmgear
New algorithms for the long distance matcher
2021-02-11 08:38:23 -08:00
Quentin Carbonneaux
552efcac2d relocate large arrays from the stack to ldmState_t 2021-02-10 16:16:54 +01:00
Nick Terrell
e59c9459a5 [trace] Keep track of a uint64_t tracing context
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.

This keeps the simple case simple.
2021-02-09 11:37:05 -08:00
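The "simple case" in the commit above can be sketched roughly as follows. This is a minimal illustration, not the library's tracing API: `trace_ctx_t`, `sketch_trace_begin()`, `sketch_trace_end()` and the fake clock are hypothetical names standing in for a tracing library that stores the begin timestamp directly in the `uint64_t` context.

```c
#include <stdint.h>

/* Hypothetical stand-in for ZSTD_TraceCtx: a plain uint64_t. */
typedef uint64_t trace_ctx_t;

/* Fake clock so the sketch is deterministic; a real tracer would read a
 * monotonic clock here instead. */
uint64_t g_fake_now_ns = 0;

uint64_t sketch_now_ns(void) { return g_fake_now_ns; }

/* begin(): hand the start timestamp back to the caller as the context. */
trace_ctx_t sketch_trace_begin(void) { return sketch_now_ns(); }

/* end(): the context *is* the start time, so the duration is computed
 * without looking anything up in a global map. */
uint64_t sketch_trace_end(trace_ctx_t ctx) { return sketch_now_ns() - ctx; }
```

If one `uint64_t` is not enough, the same slot could instead carry a unique key (such as the context pointer) into a map owned by the tracing library, as the message describes.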
Nick Terrell
54a4998a80 Add basic tracing functionality 2021-02-05 16:28:52 -08:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Yann Collet
6132df8dd3 fix gcc-10 strict aliasing warnings
by exposing HUF_CElt declaration.
2020-12-04 16:43:19 -08:00
senhuang42
7742f076b4 Add experimental param for sequence validation 2020-11-20 11:57:41 -05:00
senhuang42
7f563b0519 Add new sequence format as an experimental CCtx param 2020-11-16 10:49:17 -05:00
senhuang42
46824cb018 Add new sequence compress api params to cctx 2020-11-16 10:49:17 -05:00
senhuang42
5fd69f8173 Add documentation for new api functions 2020-11-16 10:49:16 -05:00
senhuang42
e5fe485dcc Fix cSize calculation for noCompressBlocks 2020-11-16 10:49:16 -05:00
senhuang42
75b01f34b9 Add support for uncompressible blocks 2020-11-16 10:49:16 -05:00
senhuang42
2cff8df1a2 Pull block compression out of main compressSequences() function 2020-11-16 10:49:16 -05:00
senhuang42
89f3848310 Add support for repcodes 2020-11-16 10:49:16 -05:00
Nick Terrell
d4e021fe35 [lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
2020-10-30 10:55:34 -07:00
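A rough sketch of why the buffered policy has to be stored explicitly once the input buffer can be skipped. All names here (`sketch_cctx`, `sketch_buffer_policy_e`, and the helpers) are hypothetical and only mirror the idea in the message above; they are not zstd's definitions.

```c
#include <stddef.h>

/* Hypothetical policy enum, modeled on the idea of ZSTD_buffered_policy_e. */
typedef enum { sketch_not_buffered = 0, sketch_buffered = 1 } sketch_buffer_policy_e;

typedef struct {
    sketch_buffer_policy_e policy; /* stored explicitly in the context */
    int stableInBuffer;            /* caller promises the input stays in place */
    size_t inBuffSize;             /* 0 when no internal input buffer exists */
} sketch_cctx;

/* Size the internal input buffer; it is skipped when the input is stable. */
size_t sketch_in_buff_size(sketch_buffer_policy_e policy, int stableInBuffer,
                           size_t blockSize)
{
    if (policy != sketch_buffered || stableInBuffer) return 0;
    return blockSize;
}

void sketch_init(sketch_cctx* c, sketch_buffer_policy_e policy,
                 int stableInBuffer, size_t blockSize)
{
    c->policy = policy;
    c->stableInBuffer = stableInBuffer;
    c->inBuffSize = sketch_in_buff_size(policy, stableInBuffer, blockSize);
}

/* Buffered mode is read from the stored policy, not inferred from
 * inBuffSize > 0, which is now 0 in stable-input buffered mode too. */
int sketch_is_buffered(const sketch_cctx* c)
{
    return c->policy == sketch_buffered;
}
```

With a stable input buffer, `inBuffSize` is 0 even though the context is still in buffered mode, which is why a copy routine can no longer rely on the size alone.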
Nick Terrell
c74be3f6de [lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set
Adds the validation of the input/output buffers only. They are still
unused.
2020-10-30 10:55:34 -07:00
Nick Terrell
e3e0775cc8 [API] Add ZSTD_c_stable{In,Out}Buffer parameters
This commit adds the parameters and sets the value in the CCtxParams
but it does not do anything with the value.
2020-10-30 10:54:39 -07:00
Yann Collet
f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior, but they remain
distinct modes because they cover drastically different cases, e.g.
compression that reprocesses the dictionary versus creating a cdict.

Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
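The two ideas in the message above (a mode that decides whether `dictSize` counts, and a table-log cap that includes the dictionary) might look roughly like this. The enum values, the choice of which mode zeroes the dictionary size, and the exact `+ 1` bound are assumptions made for illustration; see the documentation for `ZSTD_cParamMode_e` for the real rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modes, loosely modeled on the idea of ZSTD_cParamMode_e. */
typedef enum {
    sketch_mode_no_attach_dict, /* dictionary content is reprocessed */
    sketch_mode_attach_dict,    /* assumption: dictionary keeps its own tables */
    sketch_mode_create_cdict,
    sketch_mode_unknown
} sketch_param_mode_e;

/* Either take dictSize into account or treat it as 0, depending on mode. */
size_t sketch_effective_dict_size(sketch_param_mode_e mode, size_t dictSize)
{
    return (mode == sketch_mode_attach_dict) ? 0 : dictSize;
}

/* Smallest b such that (1 << b) >= v, for v >= 1. */
unsigned sketch_ceil_log2(uint64_t v)
{
    unsigned bits = 0;
    while (bits < 63 && ((uint64_t)1 << bits) < v) bits++;
    return bits;
}

/* Cap hashLog using window size plus (adjusted) dictionary size, since the
 * dictionary is added onto the window. The "+ 1" margin is an assumption. */
unsigned sketch_cap_hash_log(unsigned hashLog, unsigned windowLog, size_t dictSize)
{
    uint64_t const covered = ((uint64_t)1 << windowLog) + dictSize;
    unsigned const maxLog = sketch_ceil_log2(covered) + 1;
    return (hashLog > maxLog) ? maxLog : hashLog;
}
```

Ignoring the dictionary when capping would shrink the tables as if only the window had to be indexed, which is the "downsizing too far" case the added test guards against.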
Yann Collet
12541931fa
Merge pull request #2328 from marxin/zstd-pool-api
Allow external creation of POOLs that can be shared.
2020-10-09 01:00:50 -07:00
senhuang42
b9c8033cde Define kNullRawSeqStore for every file 2020-10-07 19:02:41 -04:00
senhuang42
a6165c1b28 Change matchState_t::ldmSeqStore to pointer 2020-10-07 14:13:57 -04:00
senhuang42
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
senhuang42
1a687b3fcb Improve documentation of relevant structs 2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2 Add ldm_calculateMatchRange() function 2020-10-07 13:56:25 -04:00
senhuang42
ef823e0299 Remove rawSeqStore.base and add rawSeqStore.posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96 Fixed end of match boundary update issues 2020-10-07 13:56:25 -04:00
senhuang42
e1ae398ad5 Add rawSeqStore to match state 2020-10-07 13:56:24 -04:00
Martin Liska
b684900a4a Allow external creation of POOLs that can be shared. 2020-10-07 12:44:33 +02:00
Nick Terrell
27c969ed07 Add comments to ZSTD_getLowest{Match,Prefix}Index()
Clarify how we handle dictionaries in each case.
2020-10-01 13:21:46 -07:00
animalize
2e5d73dd72 Use MEM_STATIC FORCE_INLINE_ATTR instead of FORCE_INLINE_TEMPLATE
This adds `__attribute__((unused))` under `__GNUC__`, which eliminates the `-Werror=unused-function` error.
2020-09-21 13:26:38 +08:00
animalize
0a69a6b1ca Let MSVC force inline ZSTD_hashPtr() function
ZSTD_hashPtr() was not being inlined by MSVC, which led to lower performance than with GCC.
2020-09-21 10:38:55 +08:00
W. Felix Handte
5390fee4f7 Rename and Move DD_BLOG Constant to ZSTD_LAZY_DDSS_BUCKET_LOG 2020-09-10 18:51:52 -04:00
W. Felix Handte
f1b428fdac Rename enableDedicatedDictSearch to dedicatedDictSearch in MatchState
This makes it clear that not only is the feature allowed here, we're actually
using it, as opposed to the CCtxParam field, in which it's enabled, but we may
or may not be using it.
2020-09-10 18:51:52 -04:00
W. Felix Handte
34b545acb0 Add a ZSTD_dedicatedDictSearch ZSTD_dictMode_e to Allow Const Propagation
Speed +1.5%.
2020-09-10 18:51:52 -04:00
Bimba Shrestha
31e581bf65 adding enableDedicatedDictSearch to matchState_t 2020-09-10 18:51:52 -04:00
Bimba Shrestha
f10d4e313c adding ZSTD_dedicatedDictSearch_defaultCParameters variable 2020-09-10 18:51:52 -04:00
Bimba Shrestha
c497cb6716 Add ZSTD_c_enableDedicatedDictSearch Param 2020-09-10 18:51:52 -04:00
Nick Terrell
a90779397a [lib] Reduce zstd stack usage by 1KB 2020-09-09 14:35:39 -07:00
Nick Terrell
f91ed5c766 [lib] s/current/curr because it collides with Linux Kernel macro 2020-09-09 14:35:39 -07:00
Nick Terrell
c465f24457 ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free 2020-08-26 12:26:03 -07:00
Niadb
216a63dcf7
Add files via upload 2020-07-28 02:52:52 -06:00
Nick Terrell
08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validating that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
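The repeat-mode selection described above can be sketched as follows. The enum and function names are hypothetical, and the sketch assumes that "check" means the dictionary table must be verified against a block's symbols before it is reused, while "valid" means it can be reused unconditionally.

```c
#include <stddef.h>

/* Hypothetical repeat modes, modeled on the idea of an FSE repeat mode. */
typedef enum {
    sketch_repeat_none,
    sketch_repeat_check, /* table may lack symbols: verify before reuse */
    sketch_repeat_valid  /* every symbol present: always safe to reuse */
} sketch_repeat_e;

/* Pick the repeat mode for a dictionary entropy table from its normalized
 * counts: any zero count means that symbol is missing from the table. */
sketch_repeat_e sketch_dict_repeat_mode(const short* normalizedCount,
                                        size_t nbSymbols)
{
    size_t s;
    for (s = 0; s < nbSymbols; ++s) {
        if (normalizedCount[s] == 0) return sketch_repeat_check;
    }
    return sketch_repeat_valid;
}
```

Under this reading, a dictionary with gaps is accepted rather than rejected, at the cost of a per-block check before its tables are reused.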