989 Commits

Author SHA1 Message Date
Nick Terrell
08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
Felix Handte
2af4e07326
Merge pull request #2133 from felixhandte/single-size-calculation
Consolidate CCtx Size Estimation Code
2020-05-28 13:07:18 -04:00
Nick Terrell
b2092c6dc4 [ldm] Reset loadedDictEnd when the context is reset 2020-05-18 12:35:44 -07:00
Nick Terrell
add7ed2d4a [lib] Fix bug in loading LDM dictionary in MT mode
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.

Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```

TODO: Add an explicit test that loads a small dictionary in MT mode
2020-05-14 11:52:28 -07:00
W. Felix Handte
3bb7992350 Fix Size Estimate for LDM Seq Space 2020-05-14 13:50:53 -04:00
Nick Terrell
c3e921c639
Merge pull request #2131 from terrelln/raw-dict-fuzzer
Fix rare scenario with lazy parser, dictionary, and repcodes
2020-05-12 17:44:31 -07:00
W. Felix Handte
d9a1e37aec Nit: Fix Size Type for 32-bit 2020-05-12 18:03:31 -04:00
W. Felix Handte
1aa6c7ccce Assert We Allocated Approximately What We Expected To 2020-05-12 16:55:03 -04:00
W. Felix Handte
27e2482217 Minor Refactor 2020-05-12 16:55:03 -04:00
W. Felix Handte
afc2488973 Handle Non-Static CCtxes in Estimation 2020-05-12 16:54:33 -04:00
W. Felix Handte
7ed996f5a0 Consolidate CCtx Size Estimation Code
This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper,
a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which
folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`,
which is more invasive.

This attempts to guarantee that the estimates returned to users are always
correct.
2020-05-12 16:26:53 -04:00
Nick Terrell
4e0515916d [lib] Fix repcode validation in no dict mode 2020-05-12 11:57:15 -07:00
Yann Collet
608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
W. Felix Handte
c6636afbbb Fix ZSTD_estimateCCtxSize() Under ASAN
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.

This commit fixes this.
2020-05-11 18:58:19 -04:00
caoyzh
a7e34ff693 revert ZSTD_reduceTable_internal()'s modificatiion 2020-05-07 13:10:46 -07:00
caoyzh
b2e56f7f7f Optimize compression by using neon function. 2020-05-07 13:10:46 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
W. Felix Handte
6696933b32 Make All Invocations Start With Literal Format String 2020-05-04 10:59:15 -04:00
W. Felix Handte
5e5f262612 Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations 2020-05-04 10:58:55 -04:00
Nick Terrell
e103d7b4a6
Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
Bimba Shrestha
1875f616ce passing dictContentType instead of rawContent every time 2020-04-21 22:29:35 -07:00
Bimba Shrestha
5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Bimba Shrestha
c0d4b2b5a3
Merge pull request #2075 from bimbashrestha/dict_fuzzer_ref
[bug] handling case where prefix is NULL or 0 sized in refPrefix_advanced
2020-04-07 17:37:19 -05:00
Bimba Shrestha
1658ae75cd handling nil case for refprefix 2020-04-07 14:41:53 -07:00
Carl Woffenden
edd9a07322 Code replicated in compression and decompression moved to shared headers
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
2020-04-07 11:02:06 +02:00
Carl Woffenden
7c420344d2 Single-file decoder script can now (optionally) create an encoder
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
2020-04-03 19:07:46 +02:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Bimba Shrestha
80c26117a9 Line-wrapping 2020-02-03 09:38:16 -08:00
Bimba Shrestha
ee8a712af3 Using appliedParams instead of supplied params 2020-01-31 15:49:07 -08:00
Nick Terrell
036b30b555
Fix super block compression and stream raw blocks in decompression (#1947)
Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block.

This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care.

* I added a test case that verifies that the decompression has 1 byte latency.
* I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them.
* The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all).

Credit to OSS-Fuzz.
2020-01-10 18:02:11 -08:00
Bimba Shrestha
b1f53b1a10 [fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936)
* Adding fail logging for superblock flow

* Dividing by targetCBlockSize instead of blockSize

* Adding new const and using more acurate formula for nbBlocks

* Only do dstCapacity check if using superblock

* Remvoing disabling logic

* Updating test to make it catch more extreme case of previou bug

* Also updating comment

* Only taking compressEnd shortcut on non-superblock
2020-01-03 16:53:51 -08:00
Bimba Shrestha
56415efc76 Constifying, malloc check and naming nit 2019-12-17 17:16:51 -08:00
Bimba Shrestha
5225dcfc0f Adding bool to check if enough room left for noCompress superblocks 2019-12-13 15:47:28 -08:00
Yann Collet
d73e2fb465
Merge pull request #1891 from bimbashrestha/oss
[fuzz] Superblock fuzz issues
2019-12-10 13:17:00 -08:00
Bimba Shrestha
e1913dc87f Making const, removing unnecessary indent, changing parameter order 2019-12-04 15:51:17 -08:00
Bimba Shrestha
2ec556fec2 Moving init/end functions, moving compressSuperBlock inside body() 2019-12-04 15:23:13 -08:00
Bimba Shrestha
ffb0463041 Refactor 2019-12-04 14:52:27 -08:00
Bimba Shrestha
49c6d49247 [fuzz] msan uninitialized unsigned value (#1908)
Fixes new fuzz issue

Credit to OSS-Fuzz

* Initializing unsigned value

* Initialilzing to 1 instead of 0 because its more conservative

* Unconditionoally setting to check first and then checking zero

* Moving bool to before block for c90

* Move check set before block
2019-12-04 10:02:17 -08:00
Bimba Shrestha
1fc9352f81 Using bss var instead of creating new bool 2019-12-02 21:39:06 -08:00
Bimba Shrestha
1f681d8592 Merge branch 'oss' of https://github.com/bimbashrestha/zstd into oss 2019-11-27 10:56:54 -08:00
Bimba Shrestha
a3a3c62b81 [fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights (#1898)
Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table.

* Only setting loaded dict huf table to valid on non-zero

* Adding hasNoZeroWeights test to fse tables

* Forbiding nbBits != 0 when weight == 0

* Reverting the last commit

* Setting table log to 0 when weight == 0

* Small (invalid) zero weight dict test

* Small (valid) zero weight dict test

* Initializing repeatMode vars to check before zero check

* Removing FSE changes to seperate pr

* Reverting accidentally changed file

* Negating bool, using unsigned, optimization nit
2019-11-26 12:24:19 -08:00
Bimba Shrestha
d4e17d0776 Negating bool, updating bool on inner branches 2019-11-26 12:17:43 -08:00
Bimba Shrestha
826b555463
Merge branch 'dev' into oss 2019-11-22 17:29:33 -08:00
Bimba Shrestha
10bce1919e Mixed declration fix 2019-11-21 13:08:27 -08:00
Bimba Shrestha
0451accab1 Checking noCompressBlock explicitly for rep code confirmation 2019-11-21 13:06:26 -08:00
Nick Terrell
659e9f05cf Fix null pointer addition 2019-11-20 18:36:04 -08:00
Bimba Shrestha
8f0c2d04c8 Going back to original flow but removing else return 2019-11-19 10:03:07 -08:00
Nick Terrell
a839d6852c
Merge pull request #1888 from senhuang42/superblocks_fixed
RLE test and re-enable RLE in main compression loop
2019-11-18 16:09:33 -08:00
Bimba Shrestha
80586f5e80 Reversing condition order and forwarding error 2019-11-18 13:53:55 -08:00
Bimba Shrestha
dade64428f Output regular uncompressed block when compressSequences fails 2019-11-18 08:43:14 -08:00