Commit Graph

307 Commits (e926e9c4c34fb170d901497e5b92189dbf4147a6)

Author SHA1 Message Date
Yann Collet f82d9865b9
Merge pull request #2278 from senhuang42/ignore_checksum_advanced_param
New advanced decompression param to ignore checksums
2020-08-25 12:08:53 -07:00
Nick Terrell 614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
senhuang42 a030560d62 Add new DCtx param: validateChecksum and update unit tests 2020-08-24 17:28:00 -04:00
Nick Terrell 1302f8d676 [fix] Always return dstSize_tooSmall when it is the case 2020-08-24 13:38:13 -07:00
senhuang42 44c54a3e31 Addressing comments: more comments, cleanup, remove extra function, checksum logic 2020-08-24 16:14:19 -04:00
senhuang42 ffaa0df76d Document change in CLI for --no-check during decompression in --help menu 2020-08-24 09:49:12 -04:00
senhuang42 20eb095882 Added unit test to fuzzer.c, changed definition param name 2020-08-22 13:26:33 -04:00
senhuang42 1b34b15e6b Adding CLI capability to invoke decompression with no checksum 2020-08-21 17:49:30 -04:00
senhuang42 6a8dbdcd1f Modify decompression loop to gnore checksums if flag is enabled 2020-08-21 16:46:46 -04:00
Nick Terrell 575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Nick Terrell 651d3d73e0 [test] Update the ldm loadedDictEnd test to cover zstdmt 2020-05-19 16:14:14 -07:00
Nick Terrell 7b317b4876 [test] Test that the ldm dictionary gets invalidated on reset 2020-05-18 16:00:28 -07:00
Nick Terrell 87dbd6d4bf [test] Improve LDM forceMaxWindow test 2020-05-18 15:11:18 -07:00
Nick Terrell bf0591e1e2 [test] Expose the LDM+MT+dict bug in a unit test 2020-05-14 12:06:55 -07:00
Yann Collet e001715b3d fixed asan test 2020-05-11 20:35:47 -07:00
Yann Collet 20bd246045 blindfix for VS macro redefinition 2020-05-11 19:29:36 -07:00
Yann Collet 91ad01218e updated initStatic tests
differentiate small CCtx for small inputs
from full CCtx
from CStream contexts.

Ensure allocation & resize tests are more precise.
2020-05-11 18:50:10 -07:00
Yann Collet 608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
Yann Collet dd026ca505 re-inforced tests for initStaticCCtx
ensure that `estimateCCtxSize()` works as intended.
2020-05-09 11:30:45 -07:00
Nick Terrell 5717bd39ee [lib] Fix NULL pointer dereference
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.

This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.

* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.

Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.

Credit to OSS-Fuzz.
2020-05-06 12:09:02 -07:00
W. Felix Handte 2cf72d56a6 Try to Fix MSVC Error
It's complaining about the `memcpy`s, saying:

"warning C4090: 'function': different 'const' qualifiers"

Let's try explicitly casting to the argument types...
2020-05-04 10:59:15 -04:00
W. Felix Handte dacbcd2cc1 Fix Up Some Pointer Handling in Tests 2020-05-04 10:59:15 -04:00
Nick Terrell e103d7b4a6
Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
Nick Terrell 108a5572a5
Merge pull request #2048 from nocnokneo/ctest-support
Add CTest support
2020-04-28 11:01:13 -07:00
Bimba Shrestha f7a7409a49 adding fail test when passing wrong fullDict using refPrefix 2020-04-21 22:26:48 -07:00
Bimba Shrestha 5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Bimba Shrestha 31e76f1ed4 adding test for dctx size reduction 2020-04-04 08:49:24 -07:00
Nick Terrell ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Taylor Braun-Jones 3cbc3d37e7 Add documentation for -T option 2020-03-23 17:49:04 -04:00
Bimba Shrestha 6a4258a08a
Removing symbols already in unit tests and adding some new unit tests for missing symbols (#1985)
* Removing symbols that are not being tested

* Removing symbols used in zstdcli, fileio, dibio and benchzstd

* Removing symbols used in zbuff and add test-zbuff to travis

* Removing remaining symbols and adding unit tests instead

* Removing symbols test entirely
2020-02-05 16:55:00 -08:00
Nick Terrell 3ed0f65158 [cmake] Add playTests.sh as a test 2020-01-13 14:16:15 -08:00
Bimba Shrestha b1f53b1a10 [fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936)
* Adding fail logging for superblock flow

* Dividing by targetCBlockSize instead of blockSize

* Adding new const and using more acurate formula for nbBlocks

* Only do dstCapacity check if using superblock

* Remvoing disabling logic

* Updating test to make it catch more extreme case of previou bug

* Also updating comment

* Only taking compressEnd shortcut on non-superblock
2020-01-03 16:53:51 -08:00
Bimba Shrestha 56415efc76 Constifying, malloc check and naming nit 2019-12-17 17:16:51 -08:00
Bimba Shrestha 989ce13e19 One more type conversion 2019-12-13 16:50:21 -08:00
Bimba Shrestha 4399eed42e Adding explict cast to satisfy appveyor ci 2019-12-13 16:38:11 -08:00
Bimba Shrestha db5124ef6e More void* issues. Just replacing with BYTE* 2019-12-13 16:24:49 -08:00
Bimba Shrestha 49b2bf7106 'void* size issue' fix 2019-12-13 16:06:57 -08:00
Bimba Shrestha e3cd2785e2 Add test to catch too many noCompress superblocks on streaming 2019-12-13 15:31:29 -08:00
Bimba Shrestha 826b555463
Merge branch 'dev' into oss 2019-11-22 17:29:33 -08:00
Bimba Shrestha 707a12c419 Test enough room for checksum in superblock 2019-11-22 17:25:36 -08:00
Nick Terrell 659e9f05cf Fix null pointer addition 2019-11-20 18:36:04 -08:00
Nick Terrell a839d6852c
Merge pull request #1888 from senhuang42/superblocks_fixed
RLE test and re-enable RLE in main compression loop
2019-11-18 16:09:33 -08:00
Sen Huang bc3e21578d No margin on RLE test size check 2019-11-18 16:39:16 -05:00
Sen Huang db8efbfe7d Updated comment to reflect actual compression behavior 2019-11-15 16:11:14 -05:00
Sen Huang 75c34684c0 Modified existing RLE test to take compressed size into account 2019-11-15 12:26:48 -05:00
Yann Collet d67742bc5d
Merge pull request #1858 from senhuang42/dictionary_header_size
Method to get dictionary header size
2019-11-14 09:44:07 -08:00
Sen Huang 97b7f712f3 Change to heap allocation, remove implicit type conversion 2019-11-08 13:57:25 -05:00
Sen Huang e1edc554a3 Added 2 unit tests: one for sanity, one for correctnesson fixed dict 2019-11-08 13:57:25 -05:00
Nick Terrell 8c474f9845 Fix parameter selection and adjustment with srcSize == 0 2019-11-07 08:58:43 -08:00
Nick Terrell b1ec94e63c Fix ZSTD_f_zstd1_magicless for small data
* Fix `ZSTD_FRAMEHEADERSIZE_PREFIX` and `ZSTD_FRAMEHEADERSIZE_MIN` to
  take a `format` parameter, so it is impossible to get the wrong size.
* Fix the places that called `ZSTD_FRAMEHEADERSIZE_PREFIX` without
  taking the format into account, which is now impossible by design.
* Call `ZSTD_frameHeaderSize_internal()` with `dctx->format`.
* The added tests catch both bugs in `ZSTD_decompressFrame()`.

Fixes #1813.
2019-10-21 21:16:17 -07:00
Yann Collet 6323966e53 updated erroneous comments using ZSTD_dm_*
instead of the current ZSTD_dct_*,
reported by @nigeltao (#1822)
2019-10-16 16:14:04 -07:00
Yann Collet fb77afc626
Merge pull request #1760 from bimbashrestha/extract_sequences_api
Adding api for extracting sequences from seqstore
2019-10-10 13:11:18 -07:00
Bimba Shrestha 36528b96c4 Manually moving instead of memcpy on decoder and using genBuffer() 2019-10-03 09:26:51 -07:00
Bimba Shrestha b63a1e7ae5 Typo fix 2019-09-27 07:20:20 -07:00
Bimba Shrestha 91daee5c06 Fixing appveyor test 2019-09-26 16:21:57 -07:00
Bimba Shrestha 75b1286354 Fixing shortest failure 2019-09-26 16:07:34 -07:00
Bimba Shrestha bb27472afc Adding more realistic test for get sequences 2019-09-26 15:38:31 -07:00
Bimba Shrestha be0bebd24e Adding test and null check for malloc 2019-09-23 15:08:18 -07:00
W. Felix Handte f7d9b36835 Update Comment on `ZSTD_estimateCCtxSize()` 2019-09-20 14:11:29 -04:00
Bimba Shrestha 3cacc0a30b Casting void pointer to ZSTD_Sequence pointer 2019-09-17 17:44:08 -07:00
Bimba Shrestha 5b038f128f Merge branch 'extract_sequences_api' of https://github.com/bimbashrestha/zstd into extract_sequences_api 2019-09-16 13:35:49 -07:00
Bimba Shrestha 1f93be0f6d Handling memory leak and potential side effect 2019-09-16 13:35:45 -07:00
Bimba Shrestha a874435478
Merge branch 'dev' into extract_sequences_api 2019-09-16 13:29:59 -07:00
W. Felix Handte 194c542598 Fix Memory Leak in Test 2019-09-11 14:25:30 -04:00
W. Felix Handte ff67c62458 Fix Compilation Error (`uint32_t` -> `size_t`) 2019-09-11 13:59:09 -04:00
W. Felix Handte 5707c8a9d5 Speed Up Test a Little 2019-09-11 13:23:59 -04:00
W. Felix Handte ed4c2c60c3 Add Fuzzer Test Case for Index Reduction 2019-09-11 13:17:19 -04:00
Bimba Shrestha 9e7bb55e14 Addressing comments 2019-09-09 20:04:46 -07:00
Bimba Shrestha 5f8b0f6890 Changing api to get sequences across all blocks 2019-08-30 09:18:44 -07:00
Yann Collet 5198347382
Merge pull request #1744 from bimbashrestha/dev
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
bimbashrestha e5704bbfdf Added test for multiple blocks of zeros and fixed nit about comments 2019-08-28 08:32:34 -07:00
Ed Maste b81d7cc6a0 remove extraneous doubled ;s 2019-08-15 21:17:06 -04:00
Yann Collet 0b0b83e8f3 fix test 122
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet efe8496755 minor test refactoring
just for clarity, for the currently failing unit test
2019-08-02 19:31:19 +02:00
Yann Collet 387e20d4f0 fixed minor conversion warning in datagen 2019-08-02 18:02:54 +02:00
Yann Collet 37f47e51a8 fixed datagen
to produce same content on both 32 and 64-bit platforms
by removing floating from literal table determination.

also : added checksum trace in compression control test,
so that it's easier to determine if test fails
as a consequence of compressing a different sample.
2019-08-02 17:34:53 +02:00
Yann Collet d1927f0b39 regenerate sample to compress
to reduce chances of differences between 32 and 64-bit fuzzer tests
2019-08-02 15:31:00 +02:00
Yann Collet 5cf1b24aca fixed strategies greedy, lazy & lazy2
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet 2115292616 minor : fixed ptr arithmetic
invalid on void ptr
2019-08-01 17:12:26 +02:00
Yann Collet 810a9cac08 added efficiency test
to detect gross CR variations after a patch.

Tests normal and dictionary compression.
2019-08-01 16:59:22 +02:00
Yann Collet 98692c2838 fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3 2019-08-01 15:58:17 +02:00
Tyler-Tran c55d2e7ba3 Adding shrinking flag for cover and fastcover (#1656)
* Changed ERROR(GENERIC) excluding inits

* editing git ignore

* Edited init functions to size_t returns

* moved declarations earlier

* resolved issues with changes to init functions

* fixed style and an error check

* attempting to add tests that might trigger changes

* added && die to cases expecting to fail

* resolved no die on expected failed command

* fixed accel to be incorrect value

* Adding an automated shrinking option

* Fixing build

* finalizing fixes

* fix?

* Removing added comment in cover.h

* Styling fixes

* Merging with fb dev

* removing megic number for default regression

* Requested revisions

* fixing support for fast cover

* fixing casting errors

* parenthesis fix

* fixing some build nits

* resolving travis ci syntax

* might resolve all compilation issues

* removed unused variable

* remodeling the selectDict function

* fixing bad memory access

* fixing error checks

* fixed erroring check in selectDict

* fixing mixed declarations

* modify mixed declaration

* fixing nits and adding test cases

* Adding requested changes + fixed bug for error checking

* switched double comparison from != to <

* fixed declaration typing

* refactoring COVER_best_finish() and changing shrinkDict

* removing the const's

* modifying ZDICT_optimizeTrainFromBuffer_cover functions

* fixing potential bad memcpy

* fixing the error function for dict size
2019-06-27 16:26:57 -07:00
Yann Collet ed38b645db fullbench: pass proper parameters in scenario 43 2019-05-29 15:26:06 -07:00
Yann Collet 4baecdf72a added comments to better understand enforceMaxDist() 2019-05-28 13:15:48 -07:00
Josh Soref a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Yann Collet 8ac2831f3d
Merge pull request #1581 from facebook/benchfn
benchfn's reduced dependencies
2019-04-11 14:23:04 -07:00
Nick Terrell 50b9c41196 [libzstd] Fix decompression dictionary bugs and clean up initialization
Bugs:

* `ZSTD_DCtx_refPrefix()` didn't clear the dictionary after the first
  use. Fix and add a test case.
* `ZSTD_DCtx_reset()` always cleared the dictionary. Fix and add a test
  case.
* After calling `ZSTD_resetDStream()` you could no longer load a
  dictionary, since the stage was set to `zdss_loadHeader`. Fix and add
  a test case.

Cleanup:

* Make `ZSTD_initDStream*()` and `ZSTD_resetDStream()` wrap the new
 advanced API, and add test cases.
* Document the equivalent of these functions in the advanced API and
  document the unstable functions as deprecated.
2019-04-10 12:59:02 -07:00
Yann Collet 59a7116cc2 benchfn dependencies reduced to only timefn
benchfn used to rely on mem.h, and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required to bring all these dependencies.

Now, dependency is reduced to timefn only.
This required to create a separate timefn from util,
and rewrite benchfn and timefn to no longer need mem.h.

Separating timefn from util has a wide effect accross the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
2019-04-10 12:37:03 -07:00
Nick Terrell 824aaa695f [libzstd] Fix ZSTD_decompressDCtx() with a dictionary
* `ZSTD_decompressDCtx()` did not use the dictionary loaded by
  `ZSTD_DCtx_loadDictionary()`.
* Add a unit test.
* A stacked diff uses `ZSTD_decompressDCtx()` in the
  `dictionary_round_trip` and `dictionary_decompress` fuzzers.
2019-04-09 17:59:27 -07:00
Nick Terrell 48a6427d22 [libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.

Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`

* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
  `ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell 6b053b9f60 [lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary only create the cdict once we've started the
  compression job. This allows the user to pass the dictionary before they
  set other settings, and is in line with the rest of the API.
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
2019-03-21 16:13:53 -07:00
Nick Terrell f52a7d8faa
Merge pull request #1547 from shakeelrao/fix-error
Fix incorrect error code in ZSTD_errorFrameSizeInfo
2019-03-15 10:57:49 -07:00
Nick Terrell 787b76904a [libzstd] Allow compression parameters to be set with a cdict
The order you set parameters in the advanced API is not supposed to matter.
However, once you call `ZSTD_CCtx_refCDict()` the compression parameters
cannot be changed. Remove that restriction, and document what parameters
are used when using a CDict.

If the CCtx is in dictionary mode, then the CDict's parameters are used.
If the CCtx is not in dictionary mode, then its requested parameters are
used.
2019-03-13 16:10:05 -07:00
shakeelrao 18d3a97d43 Add unit test to validate the error case 2019-03-13 01:43:40 -07:00
shakeelrao 95dfd48143 update formatting 2019-03-01 23:11:15 -08:00
shakeelrao 3da3dc2f45 add missing size content test 2019-03-01 21:27:30 -08:00
shakeelrao 03026c3b1d change compressedBound to ULL 2019-03-01 00:03:50 -08:00
shakeelrao 8930c3c79b implement API-level changes 2019-02-28 22:55:18 -08:00
shakeelrao d0a3f25697 change return type to ULL 2019-02-28 01:52:01 -08:00
shakeelrao 820af1e078 Provide an API function to estimate decompressed size.
Introduces a new utility function `ZSTD_findFrameCompressedSize_internal` which
is equivalent to `ZSTD_findFrameCompressSize`, but accepts an additional output
parameter `bound` that computes an upper-bound for the compressed data in the frame.

The new API function is named `ZSTD_decompressBound` to be consistent with
`zstd_compressBound` (the inverse operation). Clients will now be able to compute an upper-bound for
their compressed payloads instead of guessing a large size.

Implements https://github.com/facebook/zstd/issues/1536.
2019-02-28 00:42:49 -08:00