6425 Commits (59a7116cc2fb0118a0c4a238326b0549425b994e)

Author SHA1 Message Date
Yann Collet 59a7116cc2 benchfn dependencies reduced to only timefn
benchfn used to rely on mem.h, and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required bringing in all these dependencies.

Now, dependency is reduced to timefn only.
This required splitting timefn out of util,
and rewrite benchfn and timefn to no longer need mem.h.

Separating timefn from util has a wide effect across the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
2019-04-10 12:37:03 -07:00
Yann Collet 094c000904 Merge branch 'dev' into benchfn 2019-04-10 11:57:05 -07:00
Nick Terrell f86d4bd1d5
Merge pull request #1576 from terrelln/dict-fuzz
Add new fuzzers and fix exposed bugs
2019-04-10 11:29:15 -07:00
Nick Terrell 5f6ca3c6ce
Merge pull request #1578 from orip/r-flag-typo
Fixed `-r` typo
2019-04-10 11:19:02 -07:00
Yann Collet 90c0462d63 minor presentation refactoring
and removed some // comment style
2019-04-10 10:03:06 -07:00
Ori Peleg bdeb4786b5 Fixed `-r` typo 2019-04-10 13:37:41 +03:00
Nick Terrell c45dec12c5 [fuzzer] Use ZSTD_DCtx_loadDictionary_advanced() half the time 2019-04-09 18:02:22 -07:00
Nick Terrell 10a3d4dca9 [fuzzer] Make the regression_driver work while fuzzers are active 2019-04-09 18:01:49 -07:00
Nick Terrell 824aaa695f [libzstd] Fix ZSTD_decompressDCtx() with a dictionary
* `ZSTD_decompressDCtx()` did not use the dictionary loaded by
  `ZSTD_DCtx_loadDictionary()`.
* Add a unit test.
* A stacked diff uses `ZSTD_decompressDCtx()` in the
  `dictionary_round_trip` and `dictionary_decompress` fuzzers.
2019-04-09 17:59:27 -07:00
Nick Terrell c5d70b7dbb [fuzzer] Sometimes fuzz with one less output byte
Zstd compression sometimes behaves differently depending on whether it has
at least `ZSTD_compressBound()` output bytes. Half of the time, fuzz with
`ZSTD_compressBound() - 1` output bytes. Ensure that we have at least
one byte of overhead by disabling either the dictionary ID or checksum.
2019-04-09 16:47:59 -07:00
Nick Terrell 48a6427d22 [libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.

Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`.

* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
  `ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell 7a1fde2957 [fuzzer] Add dictionary fuzzers 2019-04-08 21:07:28 -07:00
Nick Terrell 462918560c [fuzzer] Fix stream_round_trip for the new options 2019-04-08 21:06:19 -07:00
Nick Terrell f871b5144e [fuzz] Use the new advanced API 2019-04-08 20:01:38 -07:00
Nick Terrell e649fad7aa [dictBuilder] Fix displayLevel for corpus warning
Pass the displayLevel into the corpus warning, because it is used in
fast cover and cover, so it needs to respect the local level.
2019-04-08 20:00:18 -07:00
Nick Terrell bfcd5b81d7 [libzstd] Don't check the dictID in fuzzing mode
When `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined, don't check
the dictID. This check makes the fuzzers' job harder, and it happens at the
very beginning.
2019-04-08 19:57:41 -07:00
Nick Terrell 1a90133b15
Merge pull request #1575 from terrelln/zstdmt
[libzstd] Remove ZSTDMT from the shared library
2019-04-08 16:51:40 -07:00
Nick Terrell 947548c24f Remove double the from README 2019-04-08 16:50:18 -07:00
Nick Terrell 641e594309 [libzstd] Remove ZSTDMT from the shared object
* Remove ZSTDMT from the shared object by default.
* Provide a macro `ZSTD_LEGACY_MULTITHREADED_API` to override it.
* Document it in `lib/README.md`.
2019-04-07 18:47:52 -07:00
Nick Terrell d5910a5d94
Merge pull request #1574 from terrelln/examples
Stabilize ZSTD_getDictID_*() functions and clean up examples
2019-04-05 23:29:32 -07:00
Nick Terrell 1d0c1707d1 [examples] Clean up and comment the examples 2019-04-05 21:02:07 -07:00
Nick Terrell 1dfe37fea9 [libzstd] Stabilize ZSTD_getDictID_*() functions 2019-04-05 18:59:30 -07:00
Nick Terrell ce388fe4d2 [libzstd] Fix return value docs for ZSTD_compressStream2() 2019-04-05 17:44:07 -07:00
Nick Terrell a63aaaa2cc
Merge pull request #1573 from terrelln/regression
[regression] Update results.csv for level 1 change
2019-04-05 11:06:35 -07:00
Nick Terrell dbc8a59a0a
Merge pull request #1569 from terrelln/stable
Stabilize the advanced API
2019-04-05 10:47:48 -07:00
Nick Terrell 50c634b86e [regression] Update results.csv for level 1 change 2019-04-05 10:46:22 -07:00
Nick Terrell 7231ea72a8 [libzstd] Reword the streaming docs for the new API 2019-04-03 19:21:05 -07:00
Nick Terrell cf7d601bf5 Move the dictionary API and mark the legacy API
* Move the dictionary API below the streaming API
* Mark the legacy streaming API as redundant
2019-04-03 19:16:40 -07:00
Nick Terrell d7d89513d6 Stabilize advanced API
This commit moves the candidate advanced API to the stable section.
It makes some minor whitespace changes, but it doesn't change any
of the wording of the documentation.

I'll put up a separate PR that tweaks some of the documentation
once this lands, so that it is easier to review.

NOTE: Even though these functions are now in stable, they aren't
stable until the next release (in under 1 month). They may still
change until then.
2019-04-03 18:43:20 -07:00
Nick Terrell 0827edeace [libzstd] Bump the library version to 1.4.0
Bumps the library version to 1.4.0 in preparation for stabilizing the
advanced API.
2019-04-03 18:43:20 -07:00
Nick Terrell 72a3fbc0e4
Merge pull request #1562 from terrelln/2fast
[libzstd] Speed up single segment zstd_fast by 5%
2019-04-03 18:08:15 -07:00
Nick Terrell 56261001ea
Merge pull request #1567 from terrelln/examples2
[examples] Update streaming_decompression.c
2019-04-03 11:27:49 -07:00
Yann Collet 816a3f47c7
Merge pull request #1568 from terrelln/examples3
Update streaming_memory_usage.c and fix ZSTD_estimateCStreamSize_usingCCtxParams()
2019-04-03 09:07:13 -07:00
Nick Terrell cdc8ae2e9b [examples] Update streaming_memory_usage.c
Update to use the new streaming API. Making progress on Issue #1548.

Tested that the checks don't fail.
Tested with window log 9-32. The lowest and highest fail as expected.
2019-04-02 19:20:57 -07:00
Nick Terrell 00679da22b [libzstd] Setting ZSTD_d_maxWindowLog to 0 means default 2019-04-02 19:20:52 -07:00
Nick Terrell 95624b77e4 [libzstd] Speed up single segment zstd_fast by 5%
This PR is based on top of PR #1563.

The optimization is to process two input pointers per loop.
It is based on ideas from [igzip] level 1, and talking to @gbtucker.

| Platform                | Silesia     | Enwik8 |
|-------------------------|-------------|--------|
| OSX clang-10            | +5.3%       | +5.4%  |
| i9 5 GHz gcc-8          | +6.6%       | +6.6%  |
| i9 5 GHz clang-7        | +8.0%       | +8.0%  |
| Skylake 2.4 GHz gcc-4.8 | +6.3%       | +7.9%  |
| Skylake 2.4 GHz clang-7 | +6.2%       | +7.5%  |

Testing on all Silesia files on my Intel i9-9900k with gcc-8

| Silesia File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| silesia.tar  | +0.17%       | +6.6%        |
| dickens      | +0.25%       | +7.0%        |
| mozilla      | +0.02%       | +6.8%        |
| mr           | -0.30%       | +10.9%       |
| nci          | +1.28%       | +4.5%        |
| ooffice      | -0.35%       | +10.7%       |
| osdb         | +0.75%       | +9.8%        |
| reymont      | +0.65%       | +4.6%        |
| samba        | +0.70%       | +5.9%        |
| sao          | -0.01%       | +14.0%       |
| webster      | +0.30%       | +5.5%        |
| xml          | +0.92%       | +5.3%        |
| x-ray        | -0.00%       | +1.4%        |

Same tests on Calgary. For brevity, I've only included files
where compression ratio regressed or was much better.

| Calgary File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| calgary.tar  | +0.30%       | +7.1%        |
| geo          | -0.14%       | +25.0%       |
| obj1         | -0.46%       | +15.2%       |
| obj2         | -0.18%       | +6.0%        |
| pic          | +1.80%       | +9.3%        |
| trans        | -0.35%       | +5.5%        |

We gain 0.1% of compression ratio on Silesia.
We gain 0.3% of compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as speed.

I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8:

| Level | Ratio Change | Speed Change |
|-------|--------------|--------------|
| -1    | +0.13%       | +6.4%        |
| -2    | +4.6%        | -1.5%        |
| -3    | +7.5%        | -4.8%        |
| -4    | +8.5%        | -6.9%        |
| -5    | +9.1%        | -9.1%        |

Roughly, the negative levels now scale half as quickly. E.g. the new
level 16 is roughly equivalent to the old level 8, but a bit quicker
and smaller. If you don't think this is the right trade-off, we can
change it to multiply the step size by 2, instead of adding 1. I think
this makes sense, because it gives a bit slower ratio decay.

[igzip]: https://github.com/01org/isa-l/tree/master/igzip
2019-04-02 19:02:50 -07:00
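
The "two input pointers per loop" idea described in commit 95624b77e4 above can be illustrated with a toy hash-table match finder. This is only a sketch and not zstd's actual `zstd_fast` code: the names `hash4`, `is_match`, `first_match_two_per_loop`, `HASH_LOG`, `MIN_MATCH`, and `NO_POS` are hypothetical, and the function merely reports the first position whose next 4 bytes repeat an earlier 4-byte sequence. Each iteration hashes and looks up two adjacent positions before testing either candidate, so the two lookups do not have to wait on each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LOG  12              /* toy table with 4096 entries (assumption) */
#define MIN_MATCH 4               /* bytes compared per candidate */
#define NO_POS    ((size_t)-1)    /* sentinel: empty slot / no match */

/* Multiplicative hash of 4 bytes down to HASH_LOG bits. */
static uint32_t hash4(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);      /* unaligned-safe 4-byte read */
    return (v * 2654435761u) >> (32 - HASH_LOG);
}

/* A candidate is a match only if the bytes really are equal. */
static int is_match(const uint8_t *src, size_t cand, size_t pos) {
    return cand != NO_POS && memcmp(src + cand, src + pos, MIN_MATCH) == 0;
}

/* Returns the first position that repeats an earlier MIN_MATCH-byte
 * sequence, or NO_POS. Two positions are handled per loop iteration. */
static size_t first_match_two_per_loop(const uint8_t *src, size_t n) {
    size_t table[1u << HASH_LOG];
    size_t i, limit, ip0 = 0;

    assert(src != NULL || n == 0);
    if (n < MIN_MATCH) return NO_POS;
    for (i = 0; i < (1u << HASH_LOG); i++) table[i] = NO_POS;
    limit = n - MIN_MATCH;        /* last position with MIN_MATCH bytes left */

    while (ip0 + 1 <= limit) {
        size_t ip1 = ip0 + 1;
        /* Hash both positions up front. */
        uint32_t h0 = hash4(src + ip0);
        uint32_t h1 = hash4(src + ip1);
        /* Look up and insert in position order, so ip1 can see ip0. */
        size_t cand0 = table[h0];
        size_t cand1;
        table[h0] = ip0;
        cand1 = table[h1];
        table[h1] = ip1;
        /* Only now verify the candidates. */
        if (is_match(src, cand0, ip0)) return ip0;
        if (is_match(src, cand1, ip1)) return ip1;
        ip0 += 2;
    }
    /* Tail: at most one position remains. */
    if (ip0 <= limit && is_match(src, table[hash4(src + ip0)], ip0)) return ip0;
    return NO_POS;
}
```

For instance, calling `first_match_two_per_loop` on the 8 bytes of `"aaaaaaaa"` finds the repeat at position 1, because the lookup for the second pointer already sees the entry inserted for the first. A real compressor would go on to extend the match and emit sequences; the sketch stops at detection to keep the loop structure visible.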
Nick Terrell de58910b5a [examples] Update streaming_decompression.c
Update to use the new streaming API. Making progress on Issue #1548.

Tested that it can decompress files produced by `streaming_compression`.
Tested that it can decompress two frames concatenated together.
Tested that it fails on corrupted data.
2019-04-02 18:52:59 -07:00
Nick Terrell 882ceb86bc
Merge pull request #1566 from terrelln/examples
[examples] Update multiple_streaming_compression.c
2019-04-02 17:13:10 -07:00
Nick Terrell 56682a7709 Fix ZSTD_estimateCStreamSize_usingCCtxParams()
It wasn't using the ZSTD_CCtx_params correctly. It must actualize
the compression parameters by calling ZSTD_getCParamsFromCCtxParams()
to get the real window log.

Tested by updating the streaming memory usage example in the next
commit. The CHECK() failed before this patch, and passes after.

I also added a unit test to zstreamtest.c that failed before this
patch, and passes after.
2019-04-01 18:02:52 -07:00
Nick Terrell 04325cbc2f Fix indentation 2019-04-01 17:33:49 -07:00
Nick Terrell fb13d757af [examples] Update multiple_streaming_compression.c
Update to use the new streaming API. Making progress on Issue #1548.

Tested that multiple files could be compressed, and that the output
is the same as calling `streaming_compression` multiple times with
the same compression level, and that it can be decompressed.
2019-04-01 16:41:06 -07:00
Nick Terrell 425ce5547c
Merge pull request #1563 from terrelln/dms-sep
[libzstd] Split out zstd_fast dict match state function
2019-03-29 16:19:21 -06:00
Nick Terrell f00407b640 Split out zstd_fast dict match state function 2019-03-29 10:39:16 -06:00
Nick Terrell 6625f3b390
Merge pull request #1561 from shakeelrao/fix-typo
Update comments in zstd.h and fileio.c
2019-03-28 23:42:16 -06:00
shakeelrao dca73db30c fix srcSize typo and add new UTIL func to comment 2019-03-28 17:50:34 -07:00
Nick Terrell dcc6c7e9ae
Merge pull request #1556 from terrelln/dictbuilder
[cover] Improvements for small or homogeneous data
2019-03-25 15:08:32 -07:00
Nick Terrell 440f390cba
Merge pull request #1557 from terrelln/examples
[examples] Update streaming_compression to the new API
2019-03-25 15:07:35 -07:00
Nick Terrell 7186a50775
Merge pull request #1559 from shakeelrao/reject-dict
[CLI] ensure dictionary and input file are different
2019-03-25 15:06:58 -07:00
shakeelrao 44f77b5c71 Add whitespace to test case 2019-03-24 03:42:11 -07:00
shakeelrao b25d7eacf2 Rename test 2019-03-24 03:40:03 -07:00