123 Commits (bb6ca687137a976b4daa22349ab4aa1e53336fe9)

Author SHA1 Message Date
Nick Terrell 66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
senhuang42 4c5f337248 Use cctx's minMatch instead of global MINMATCH, make fuzzer use validation 2020-11-30 15:41:20 -05:00
senhuang42 23554ff25f Force CCtx minmatch to be same as generated minmatch 2020-11-23 13:29:20 -05:00
senhuang42 c502cd33e5 Fix generating 1 too few characters in random string generator 2020-11-20 16:58:25 -05:00
senhuang42 5b0c8f0a7c Add appropriate bound to matchlengths, and reduce srcSize max 2020-11-20 16:58:25 -05:00
senhuang42 a73a07b189 Add a bound for matchlength dependent on window size 2020-11-20 16:58:25 -05:00
senhuang42 5c68c5e31e Variety of minor fixups, reduce allocation, make deterministic 2020-11-20 16:58:25 -05:00
senhuang42 59c021f501 Add built binary to .gitignore 2020-11-20 16:58:25 -05:00
senhuang42 26bc0bfdf6 Add new fuzzer to build targets 2020-11-20 16:58:25 -05:00
senhuang42 ed575963c5 Implement new fuzzer for sequence compression 2020-11-20 16:58:25 -05:00
senhuang42 42d037bdba Add libregression build target, also fix make clean and .gitignore 2020-10-15 10:34:50 -04:00
Nick Terrell cf83aceaf3
Merge pull request #2282 from terrelln/ncount-fix
[bug] Fix FSE_readNCount()
2020-08-26 10:31:07 -07:00
Nick Terrell ae163015b1 [fuzz] Fix stream_decompress timeouts 2020-08-25 17:13:09 -07:00
Nick Terrell 49eeb2d1fc [fuzz] Disable superblock expansion test 2020-08-25 17:13:06 -07:00
Nick Terrell 4193638996 [bug] Fix FSE_readNCount()
* Fix bug introduced in PR #2271
* Fix long-standing bug that is impossible to trigger inside of zstd
* Add a fuzzer that makes sure the normalized count always round trips
  correctly
2020-08-25 15:42:41 -07:00
Nick Terrell 1302f8d676 [fix] Always return dstSize_tooSmall when it is the case 2020-08-24 13:38:13 -07:00
Nick Terrell 0dcd3eec43
Merge pull request #2152 from terrelln/simple-rt-bound
[fuzz] Expand the allowedExpansion
2020-05-19 12:56:11 -07:00
Nick Terrell b82bf711fc [fuzz] Expand the allowedExpansion 2020-05-19 11:42:53 -07:00
Yann Collet fdc56baa42
fix 22294 (#2151) 2020-05-18 21:05:10 -07:00
Bimba Shrestha 255e5e3f56
[fuzz] Adding dictionary_stream_round_trip fuzzer (#2140)
* Adding dictionary_stream_round_trip

* fixing memory leak
2020-05-15 13:33:31 -07:00
Nick Terrell 4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
Nick Terrell 1185dfb8d1 [fuzz] Add raw dictionary content fuzzer 2020-05-11 19:03:33 -07:00
Nick Terrell 301a62fe08 [fuzz] Fix compress bound for dictionary_round_trip 2020-05-11 19:00:52 -07:00
Nick Terrell 5717bd39ee [lib] Fix NULL pointer dereference
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.

This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.

* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.

Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.

Credit to OSS-Fuzz.
2020-05-06 12:09:02 -07:00
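The underflow described in the commit above can be illustrated with a minimal sketch. It is not the zstd code: `MARGIN`, `fits_unchecked`, and `fits_checked` are hypothetical names, and the sketch only assumes that the bounds check subtracts a fixed margin from an unsigned capacity.

```c
#include <stddef.h>

/* Hypothetical safety margin that a fast copy loop may write past the end. */
#define MARGIN 32

/* Buggy form: when capacity < MARGIN the unsigned subtraction wraps around
 * to a huge value, so the check passes even for an empty (NULL) buffer. */
int fits_unchecked(size_t capacity, size_t need)
{
    return need <= capacity - MARGIN;
}

/* Fixed form: rule out the wrap-around before subtracting. */
int fits_checked(size_t capacity, size_t need)
{
    return capacity >= MARGIN && need <= capacity - MARGIN;
}
```

With a capacity of 0 and a non-zero frame content size, `fits_unchecked` reports that the write fits, which is the situation the commit describes; `fits_checked` rejects it.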
Nick Terrell e103d7b4a6
Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix an off-by-one error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
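The "within 200 bytes of the next header byte size" rule from the commit above might look roughly like the sketch below. The function name and the `writesTable` flag are hypothetical, and the 1 KB and 16 KB thresholds for the 3-, 4-, and 5-byte literals headers are an assumption based on the zstd frame format rather than something stated in the commit message.

```c
#include <stddef.h>

/* Assumed thresholds: 3-byte header below 1 KB, 4-byte below 16 KB, else 5.
 * When a Huffman table will be written (an assumption about when the slack
 * applies), reserve 200 bytes of slack so the next header size up is chosen
 * early and the table has room. */
size_t literals_header_size(size_t litSize, int writesTable)
{
    size_t const slack = writesTable ? 200 : 0;
    return 3 + (litSize >= 1024 - slack) + (litSize >= 16384 - slack);
}
```

For example, 900 bytes of literals get a 3-byte header without a table but a 4-byte header when a table is expected, because 900 is within 200 bytes of the 1 KB boundary.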
Nick Terrell 1343b815f8 [fuzz] Fuzz test ZSTD_d_stableOutBuffer 2020-04-27 20:04:04 -07:00
Bimba Shrestha 5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering long mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefix + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4 GB

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Bimba Shrestha 794f03459e adding refPrefix 2020-04-06 22:57:49 -07:00
Nick Terrell ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell d1cc9d2797
[fuzz] Allow zero sized buffers for streaming fuzzers (#1945)
* Allow zero-sized buffers in `stream_decompress`. Ensure that we never have two
  zero-sized buffers in a row, so forward progress is guaranteed.
* Make case 4 in `stream_round_trip` issue a zero-sized buffer call followed by
  a full call, to guarantee forward progress.
* Fix `limitCopy()` in legacy decoders.
* Fix memcpy in `zstdmt_compress.c`.

Catches the bug fixed in PR #1939
2020-01-09 11:38:50 -08:00
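The forward-progress rule in the commit above (a zero-sized buffer is allowed, but never twice in a row) can be sketched as follows. `next_buffer_size` and its parameters are hypothetical and do not come from the fuzzers themselves.

```c
#include <stddef.h>

/* Hypothetical helper: pick the next buffer size from a fuzzer-supplied
 * value. A zero size is allowed, but never twice in a row while input
 * remains, so the streaming loop always advances. */
size_t next_buffer_size(size_t requested, size_t remaining, int *prevWasZero)
{
    size_t size = requested < remaining ? requested : remaining;
    if (size == 0 && *prevWasZero && remaining > 0) {
        size = 1;  /* force at least one byte after a zero-sized call */
    }
    *prevWasZero = (size == 0);
    return size;
}
```

A caller would initialise `prevWasZero` to 0 and pass the same variable on every iteration of the streaming loop.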
Nick Terrell b77ad810c9
[fuzz] Fix regression_driver.c with directory input (#1944)
The `numFiles` variable wasn't updated, so the fuzzer didn't do anything.
I did two things to fix this:

1. Remove the `numFiles` variable entirely.
2. Error if we can't open a file and print the number of files tested.
2020-01-08 13:20:56 -08:00
Yann Collet c71bd45a3b Merge branch 'dev' into ahmed_file 2019-11-26 11:20:26 -08:00
Nick Terrell e68db76b4b Update .gitignore 2019-11-20 16:36:40 -08:00
Yann Collet aea2ff5d8d fixed wrong assert() in regression driver 2019-11-06 14:56:21 -08:00
Yann Collet a7e33e3e10 updated fuzz tests to use FileNamesTable* abstraction 2019-11-06 14:42:13 -08:00
Sen Huang e21a8bbecd Fix FUZZ_rand32() bug 2019-11-05 16:43:24 -05:00
Sen Huang f2932fb5eb Fix more merge conflicts 2019-11-05 15:54:05 -05:00
Nick Terrell 60205fec02 Fix 2 bugs in dictionary loading
* Silently skip dictionaries less than 8 bytes, unless using `ZSTD_dct_fullDict`.
  This changes the compressor, which previously silently skipped dictionaries <= 8 bytes.
* Allow repcodes that are equal to the dictionary content size, since it is in bounds.
2019-11-01 16:52:07 -07:00
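The two rules from the commit above can be written as small predicates. This is an illustrative sketch only: the function names and the `isFullDict` flag (standing in for `ZSTD_dct_fullDict`) are hypothetical, and rejecting a zero repcode is an assumption that the commit message does not state.

```c
#include <stddef.h>

/* Returns 1 when a dictionary should be silently ignored: it is shorter
 * than 8 bytes and the caller did not insist on a full dictionary. */
int dict_silently_skipped(size_t dictSize, int isFullDict)
{
    return dictSize < 8 && !isFullDict;
}

/* Returns 1 when a repcode offset is acceptable. An offset equal to the
 * dictionary content size is in bounds; rejecting 0 is an assumption. */
int repcode_in_bounds(unsigned rep, size_t dictContentSize)
{
    return rep != 0 && rep <= dictContentSize;
}
```

The sketch does not model what happens to a short dictionary when `isFullDict` is set; it only shows that such a dictionary is not skipped silently.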
Nick Terrell 75e7c0d107 [fuzz] Add dictionary_loader fuzzer
* Adds the fuzzer
* Adds an additional `InputType` for the fuzzer

I ran the fuzzer for about 10 minutes and it found 2 bugs:

* Catches the original bug without any help
* Catches an additional bug with 8-byte dictionaries
2019-11-01 15:54:24 -07:00
Nick Terrell 8c11f089a1 [fuzz] Increase output buffer size of stream_round_trip
Fixes OSS-Fuzz crash.
Credit to OSS-Fuzz
2019-10-18 13:39:08 -07:00
Nick Terrell d721fcf3ee [fuzz] Fix leak in block_round_trip 2019-09-13 10:32:38 -07:00
Nick Terrell 7c4578160e [fuzz] Generate seed data up to 256KB 2019-09-12 15:02:01 -07:00
Dario Pavlovic 51e9d29a51 Merge branch 'improvDataGen' of github.com:darxsys/zstd into improvDataGen 2019-09-12 13:11:02 -07:00
Dario Pavlovic cd8588077e It's time for all of rng seed code to go. Goodbye 2019-09-12 13:10:34 -07:00
Dario Pavlovic 47bb4c6a23
Update tests/fuzz/fuzz_data_producer.h 2019-09-12 12:45:28 -07:00
Dario Pavlovic 92c58c4d5d Use range instead of the generic uint32 method to use fewer bytes when generating necessary numbers. 2019-09-12 12:40:12 -07:00
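The idea behind the commit above, consuming only as many input bytes as the requested range needs, can be sketched as below. `producer_t` and `producer_range` are hypothetical names, not the declarations in `tests/fuzz/fuzz_data_producer.h`, and taking bytes from the end of the buffer is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data producer: a view over the fuzzer input. */
typedef struct {
    const uint8_t *data;
    size_t size;
} producer_t;

/* Returns a value in [min, max], consuming one byte per 8 bits of range
 * rather than always consuming 4 bytes. The caller guarantees min <= max. */
uint32_t producer_range(producer_t *p, uint32_t min, uint32_t max)
{
    uint32_t const range = max - min;
    uint32_t rolling = range;
    uint32_t result = 0;
    while (rolling > 0 && p->size > 0) {
        result = (result << 8) | p->data[--p->size];  /* take from the end */
        rolling >>= 8;
    }
    if (range == UINT32_MAX) return result;  /* avoid range + 1 overflow */
    return min + result % (range + 1);
}
```

A range such as [10, 20] consumes a single byte, while [0, 1000] consumes two, so small choices leave more of the input for the data under test.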
Dario Pavlovic b5b24c2a0d Combining fuzz_data_producer restrict calls into a single function 2019-09-11 10:09:29 -07:00
Dario Pavlovic 23cc2d8510 All tests should give some portion of data to the producer and use the rest. 2019-09-10 16:52:38 -07:00
Dario Pavlovic 0630d084cb [Fuzz] Improve data generation #1723
Converting the rest of the tests to use the new data producer.
2019-09-10 16:14:43 -07:00
Dario Pavlovic ea1ad123da Addressing nits 2019-09-09 16:13:24 -07:00