1313 Commits

Author SHA1 Message Date
udayanbapat
43f21a600e
Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
Elliot Gorokhovsky
cb9e341129 Nits 2022-06-23 16:59:21 -04:00
Elliot Gorokhovsky
2a128110d0 Add prefetchCDictTables CCtxParam 2022-06-22 16:13:07 -04:00
Elliot Gorokhovsky
f6ef14329f
"Short cache" optimization for level 1-4 DMS (+5-30% compression speed) (#3152)
* first attempt at fast DMS short cache

* significant wins for some scenarios

* fix all clang regressions

* nits

* fix 1.5% gcc11 regression on hot 110Kdict scenario

* fix CI

* nit

* Add tags to doublefast hash table

* use tags in doublefast DMS

* Fix CI

* Clean up some hardcoded logic / constants

* Switch forCCtx to an enum

* nit

* add short cache to ip+1 long search

* Move tag size into hashLog

* Minor nits

* Truncate dictionaries greater than 16MB in short cache mode

* Helper function for tag comparison

* Cap short cache hashLog at 24 to prevent overflow

* size_t dictTagsMatch -> int dictTagsMatch

* nit

* Clean up and comment dictionary truncation

* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e

* Comment and expand helper functions

* Asserts and documentation

* nit
2022-06-21 17:27:19 -04:00
Elliot Gorokhovsky
f313a773a4
Merge pull request #3157 from embg/huge_dict_bugfix
Bugfix for huge dictionaries
2022-06-09 15:35:29 -04:00
Elliot Gorokhovsky
31bd6402c6 Bugfix for huge dictionaries 2022-06-09 11:39:30 -04:00
ihsinme
5081ccb056
Update zstd_compress.c 2022-05-30 14:08:19 +03:00
Dominique Pelle
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
Elliot Gorokhovsky
db2f4a6532 Move bitwise builtins into bits.h 2022-02-14 11:16:03 -05:00
binhdvo
b9566fc558
Add rails for huffman table log calculation (#3047) 2022-02-02 15:12:48 -05:00
Yann Collet
cad9f8d5f9 fix 44239
credit to oss-fuzz

This issue could happen when using the new Sequence Compression API in Explicit Delimiter Mode
with a too small dstCapacity.
In which case, there was one place where the buffer size wasn't checked.
2022-02-01 10:49:38 -08:00
Yann Collet
637b2d7a24 fixed bug 44168
discovered by oss-fuzz

It's a bug in the test itself :
ZSTD_compressBound() as an upper bound of the compress size
only works for data compressed "normally".
But in situations where many flushes are forcefully introduced,
this creates many more blocks,
each of which has a potential to increase the size by 3 bytes.
In extreme cases (lots of small incompressible blocks), the expansion can go beyond ZSTD_compressBound().

This situation is similar when using the CompressSequences() API
with Explicit Block Delimiters.
In which case, each explicit block acts like a deliberate flush.
When employed by a fuzzer, it's possible to generate scenarios like the one described above,
with tons of incompressible blocks of small sizes,
thus going beyond ZSTD_compressBound().

fix : when using Explicit Block Delimiters, use a larger bound, to account for this scenario.
2022-01-29 16:36:20 -08:00
Yann Collet
9a68840176 minor refactor to blocksplit
notably simplication of ZSTD_deriveSeqStoreChunk()
2022-01-27 20:24:35 -08:00
Yann Collet
5d70ec0bc4
Merge pull request #3033 from facebook/fix44108
fix issue 44108
2022-01-27 10:57:48 -08:00
Yann Collet
bad7f82300
Merge pull request #2974 from facebook/fix2966_part3
Lazy parameters adaptation (part 1 - ZSTD_c_stableInBuffer)
2022-01-27 06:14:04 -08:00
Yann Collet
8df1257c3c fix issue 44108
credit to oss-fuzz

In rare circumstances, the block-splitter might cut a block at the exact beginning of a repcode.
In which case, since litlength=0, if the repcode expected 1+ literals in front, its signification changes.
This scenario is controlled in ZSTD_seqStore_resolveOffCodes(),
and the repcode is transformed into a raw offset when its new meaning is incorrect.

In more complex scenarios, the previous block might be emitted as uncompressed after all,
thus modifying the expected repcode history.
In the case discovered by oss-fuzz, the first block is emitted as uncompressed,
so the repcode history remains at default values: 1,4,8.

But since the starting repcode is repcode3, and the literal length is == 0,
its meaning is : = repcode1 - 1.
Since repcode1==1, it results in an offset value of 0, which is invalid.

So that's what the `assert()` was verifying : the result of the repcode translation should be a valid offset.

But actually, it doesn't matter, because this result will then be compared to reality,
and since it's an invalid offset, it will necessarily be discarded if incorrect,
then the repcode will be replaced by a raw offset.

So the `assert()` is not useful.
Furthermore, it's incorrect, because it assumes this situation cannot happen, but it does, as described in above scenario.
2022-01-27 05:49:59 -08:00
Yann Collet
f2d9652ad8 more usage of new error code stabilityCondition_notRespected
as suggested by @terrelln
2022-01-26 18:30:55 -08:00
Yann Collet
a66e8bb437 introduced LitHufLog constant
which properly represents the maximum bit size of compressed literals (11) as defined in the specification.

To be preferred from HUF_TABLELOG_DEFAULT which represents the same value but by accident.

Name selected to keep the same convention as existing width definitions,
MLFSELog, LLFSELog and OffFSELog.
2022-01-26 14:47:24 -08:00
Yann Collet
32a5d95dcb moved HufLog to lib/decompress
it's only used to size decompression tables
2022-01-26 14:47:24 -08:00
Yann Collet
7616e39f3b adding traces to better track processing of literals 2022-01-26 14:47:21 -08:00
Yann Collet
cbff372d10 added helper function inBuffer_forEndFlush() 2022-01-26 11:05:57 -08:00
Yann Collet
b99ece96b9 converted checks into user validation generating error codes
had to create a new error code for this condition,
none of the existing ones were fitting enough.
2022-01-26 10:43:50 -08:00
Yann Collet
c1668a00d2 fix extended case combining stableInBuffer with continue() and flush() modes 2022-01-26 10:31:25 -08:00
Yann Collet
270f9bf005 better consistency in accessing @input
as suggested by @terrelln.

Also : commented zstreamtest more
to ensure ZSTD_stableInBuffer is tested/
2022-01-26 10:31:24 -08:00
Yann Collet
8296be4a0a pretend consuming input to provide a sense of forward progress 2022-01-26 10:31:24 -08:00
Yann Collet
4b9d1dd9ff fixed incorrect comment 2022-01-26 10:31:24 -08:00
Yann Collet
27d336b099 minor behavior refinements
specifically, there is no obligation to start streaming compression with pos=0.
stableSrc mode is now compatible with this setup.
2022-01-26 10:31:24 -08:00
Yann Collet
37b87add7a make stableSrc compatible with regular streaming API
including flushStream().

Now the only condition is for `input.size` to continuously grow.
2022-01-26 10:31:24 -08:00
Yann Collet
c0c5ffa973 streaming compression : lazy parameter adaptation with stable input
effectively makes ZSTD_c_stableInput compatible ZSTD_compressStream()
and zstd_e_continue operation mode.
2022-01-26 10:31:24 -08:00
Yann Collet
5684bae4f6 minor refactoring
on streaming compression implementation.
2022-01-26 10:31:23 -08:00
Yann Collet
fc2ea97442 refactored fuzzer tests for sequence compression api
add explicit delimiter mode to libfuzzer test
2022-01-26 00:19:35 -08:00
Yann Collet
87dcd3326a fix sequence compression API in Explicit Delimiter mode 2022-01-25 13:33:41 -08:00
Yann Collet
7a18d709ae updated all names to offBase convention 2021-12-29 17:30:43 -08:00
Yann Collet
8da414231d found a few more places which were dependent on seqStore offcode sumtype numeric representation 2021-12-28 17:03:24 -08:00
Yann Collet
6fa640ef70 separate newRep() from updateRep()
the new contracts seems to make more sense :
updateRep() updates an array of repeat offsets _in place_,
while newRep() generates a new structure with the updated repeat-offset array.

Most callers are actually expecting the in-place variant,
and a limited sub-section, in `zstd_opt.c` mainly, prefer `newRep()`.
2021-12-28 11:52:33 -08:00
Yann Collet
1aed962216 introduce macros STORE_OFFSET() and STORE_REPCODE()
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.

Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.

While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".

One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.

This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.

At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
2021-12-23 22:03:30 -08:00
Yann Collet
aeff128331 change seqDef.offset into seqDef.offBase
to better reflect the value stored in this field.
2021-12-23 17:56:08 -08:00
Yann Collet
e145b58cfd changed seqDef.matchLength into seqDef.mlBase
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
2021-12-23 13:39:46 -08:00
Yann Collet
b77fcac61f change ZSTD_storeSeq() interface to accept matchLength
instead of mlBase.

This removes the need to do `- MINMATCH` at every call site.

The new interface contract is checked with an `assert()`.
2021-12-23 12:03:33 -08:00
Yann Collet
a9e43b37d0
Revert "Limit ZSTD_maxCLevel to 21 for 32-bit binaries." 2021-12-20 11:43:14 -08:00
Norbert Lange
2fbb1d10c1 Reduce bit tables to 8bit
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.

Should not have negative performance implications.
2021-12-14 23:47:57 +01:00
Nick Terrell
21e28f5c24
Merge pull request #2891 from supperPants/dev
Fix typos
2021-12-02 13:53:33 -05:00
Yann Collet
3f64b31585 Merge branch 'dev' into tomerge2051 2021-12-01 15:29:49 -08:00
Yann Collet
8031dc7a48
Merge pull request #2885 from yoniko/limit-level-32bit-systems
Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries.
2021-12-01 14:19:16 -08:00
supperPants
d4713de5a3 Fix typos. 2021-12-01 22:36:21 +08:00
Nick Terrell
5414dd7978 [bmi2] Add lzcnt and bmi target attributes
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
  TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
  BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
  be any cases where bmi2 is supported but bmi1 isn't. But, since we are
  using the instruction we should check bmi1 as well.
2021-11-30 17:54:56 -08:00
Yonatan Komornik
ef2cba609d ZSTD_maxCLevel now limited to 21 for 32-bit binaries.
CI tests for constrained memory runs with max level on 32-bit binaries.
2021-11-30 10:31:52 -08:00
Dimitris Apostolou
ebbd675998
Fix typos 2021-11-13 10:04:04 +02:00
W. Felix Handte
48572f52b1 Rewrite Fix to Still Auto-Vectorize 2021-11-09 12:17:03 -05:00
W. Felix Handte
61765cacd0 Avoid Reducing Indices to Reserved Values
Previously, if an index was equal to `reducerValue + 1`, it would get remapped
during index reduction to 1 i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the
parsing of the input slightly, by causing tree nodes to be nullified when they
otherwise wouldn't be. This hardly matters from a correctness or efficiency
perspective, but it does impact determinism.

So this commit changes index reduction to avoid mapping indices to collide with
`ZSTD_DUBT_UNSORTED_MARK`.
2021-11-08 20:03:52 -05:00