1201 Commits

Author SHA1 Message Date
Nick Terrell
cdb9481e38 [libzstd] Optimize ZSTD_insertBt1() for repetitive data
We would only skip at most 192 bytes at a time before this diff.
This was added to optimize long matches and skip the middle of the
match. However, it doesn't handle the case of repetitive data.

This patch keeps the optimization, but also handles repetitive data
by taking the max of the two return values.

```
> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.27
strategy=2
0.23
strategy=3
0.27
strategy=4
0.43
strategy=5
0.56
strategy=6
0.43
strategy=7
0.34
strategy=8
0.34
strategy=9
0.35
```

At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes.
In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`.

Fixes Issue #1634.
2019-06-05 20:34:00 -07:00
Yann Collet
b13a9207f9
Merge pull request #1623 from facebook/fullbench
fullbench minor improvements
2019-05-31 14:40:19 -07:00
Yann Collet
ed38b645db fullbench: pass proper parameters in scenario 43 2019-05-29 15:26:06 -07:00
Yann Collet
9719fd616c removed nextToUpdate3 from ZSTD_window
it's now a local variable of ZSTD_compressBlock_opt()
2019-05-28 16:18:12 -07:00
Yann Collet
33dabc8c80 get bt matches : made it a bit clearer which parameters are input and output 2019-05-28 16:11:32 -07:00
Yann Collet
327cf6fac1 nextToUpdate3 does not need to be maintained outside of zstd_opt.c
It's re-synchronized with nextToUpdate at beginning of each block.
It only needs to be tracked from within zstd_opt block parser.

Made the logic clear, so that no code tried to maintain this variable.

An even better solution would be to make nextToUpdate3
an internal variable of ZSTD_compressBlock_opt_generic().
That would make it possible to remove it from ZSTD_matchState_t,
thus restricting its visibility to only where it's actually useful.

This would require deeper changes though,
since the matchState is the natural structure to transport parameters into and inside the parser.
2019-05-28 15:26:52 -07:00
Yann Collet
6453f8158f complementary code comments
on variables used / impacted during maxDist check
2019-05-28 14:12:16 -07:00
Yann Collet
4baecdf72a added comments to better understand enforceMaxDist() 2019-05-28 13:15:48 -07:00
Nick Terrell
a17fe4c9e5 [visual] Fix unreachable code warning 2019-04-16 11:32:35 -07:00
Nick Terrell
de0499f7fa [libzstd] Require ZSTD_MULTITHREAD to create a ZSTDMT_CCtx
ZSTDMT was broken when compiled without ZSTD_MULTITHREAD defined,
because `ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nbWorkerss)`
failed. It was detected by the MSVC test which runs the fuzzer with
multithreading disabled.

This is a very niche use case of a deprecated API, because the API is
inefficient and synchronous, since `threading.h` will be synchronous.
Users almost certainly don't want this, and anyone who tested their code
should realize that it is broken. Therefore, I think it is safe to
require `ZSTD_MULTITHREAD` to be defined to use ZSTDMT.
2019-04-15 23:04:46 -07:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
48a6427d22 [libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.

Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`

* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
  `ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell
641e594309 [libzstd] Remove ZSTDMT from the shared object
* Remove ZSTDMT from the shared object by default.
* Provide a macro `ZSTD_LEGACY_MULTITHREADED_API` to override it.
* Document it in `lib/README.md`.
2019-04-07 18:47:52 -07:00
Nick Terrell
72a3fbc0e4
Merge pull request #1562 from terrelln/2fast
[libzstd] Speed up single segment zstd_fast by 5%
2019-04-03 18:08:15 -07:00
Nick Terrell
95624b77e4 [libzstd] Speed up single segment zstd_fast by 5%
This PR is based on top of PR #1563.

The optimization is to process two input pointers per loop.
It is based on ideas from [igzip] level 1, and talking to @gbtucker.

| Platform                | Silesia     | Enwik8 |
|-------------------------|-------------|--------|
| OSX clang-10            | +5.3%       | +5.4%  |
| i9 5 GHz gcc-8          | +6.6%       | +6.6%  |
| i9 5 GHz clang-7        | +8.0%       | +8.0%  |
| Skylake 2.4 GHz gcc-4.8 | +6.3%       | +7.9%  |
| Skylake 2.4 GHz clang-7 | +6.2%       | +7.5%  |

Testing on all Silesia files on my Intel i9-9900k with gcc-8

| Silesia File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| silesia.tar  | +0.17%       | +6.6%        |
| dickens      | +0.25%       | +7.0%        |
| mozilla      | +0.02%       | +6.8%        |
| mr           | -0.30%       | +10.9%       |
| nci          | +1.28%       | +4.5%        |
| ooffice      | -0.35%       | +10.7%       |
| osdb         | +0.75%       | +9.8%        |
| reymont      | +0.65%       | +4.6%        |
| samba        | +0.70%       | +5.9%        |
| sao          | -0.01%       | +14.0%       |
| webster      | +0.30%       | +5.5%        |
| xml          | +0.92%       | +5.3%        |
| x-ray        | -0.00%       | +1.4%        |

Same tests on Calgary. For brevity, I've only included files
where compression ratio regressed or was much better.

| Calgary File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| calgary.tar  | +0.30%       | +7.1%        |
| geo          | -0.14%       | +25.0%       |
| obj1         | -0.46%       | +15.2%       |
| obj2         | -0.18%       | +6.0%        |
| pic          | +1.80%       | +9.3%        |
| trans        | -0.35%       | +5.5%        |

We gain 0.1% of compression ratio on Silesia.
We gain 0.3% of compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as speed.

I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8:

| Level | Ratio Change | Speed Change |
|-------|--------------|--------------|
| -1    | +0.13%       | +6.4%        |
| -2    | +4.6%        | -1.5%        |
| -3    | +7.5%        | -4.8%        |
| -4    | +8.5%        | -6.9%        |
| -5    | +9.1%        | -9.1%        |

Roughly, the negative levels now scale half as quickly. E.g. the new
level 16 is roughly equivalent to the old level 8, but a bit quicker
and smaller.  If you don't think this is the right trade off, we can
change it to multiply the step size by 2, instead of adding 1. I think
this makes sense, because it gives a bit slower ratio decay.

[igzip]: https://github.com/01org/isa-l/tree/master/igzip
2019-04-02 19:02:50 -07:00
Nick Terrell
56682a7709 Fix ZSTD_estimateCStreamSize_usingCCtxParams()
It wasn't using the ZSTD_CCtx_params correctly. It must actualize
the compression parameters by calling ZSTD_getCParamsFromCCtxParams()
to get the real window log.

Tested by updating the streaming memory usage example in the next
commit. The CHECK() failed before this patch, and passes after.

I also added a unit test to zstreamtest.c that failed before this
patch, and passes after.
2019-04-01 18:02:52 -07:00
Nick Terrell
f00407b640 Split out zstd_fast dict match state function 2019-03-29 10:39:16 -06:00
Nick Terrell
6b053b9f60 [lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary only create the cdict once we've started the
  compression job. This allows the user to pass the dictionary before they
  set other settings, and is in line with the rest of the API.
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
2019-03-21 16:13:53 -07:00
Nick Terrell
e55da9e963 Wrap the new advanced api completely 2019-03-21 10:54:40 -07:00
Nick Terrell
787b76904a [libzstd] Allow compression parameters to be set with a cdict
The order you set parameters in the advanced API is not supposed to matter.
However, once you call `ZSTD_CCtx_refCDict()` the compression parameters
cannot be changed. Remove that restriction, and document what parameters
are used when using a CDict.

If the CCtx is in dictionary mode, then the CDict's parameters are used.
If the CCtx is not in dictionary mode, then its requested parameters are
used.
2019-03-13 16:10:05 -07:00
Nick Terrell
0594e8135b [libzstd] Free local cdict when referencing cdict
We no longer care about the `cdictLocal` after calling
`ZSTD_CCtx_refCDict()`, so we should free it to save some
memory.
2019-03-13 14:54:31 -07:00
Nick Terrell
7ad7ba3178 [libzstd] Rename ZSTD_CCtxParam_* to ZSTD_CCtxParams_* 2019-02-19 17:44:52 -08:00
Nick Terrell
f4abba02ba [libzstd] Clean up parameter code
* Move all ZSTDMT parameter setting code to ZSTD_CCtxParams_*Parameter().
  ZSTDMT now calls these functions, so we can keep all the logic in the
  same place.
* Clean up `ZSTD_CCtx_setParameter()` to only add extra checks where needed.
* Clean up `ZSTDMT_initJobCCtxParams()` by copying all parameters by default,
  and then zeroing the ones that need to be zeroed. We've missed adding several
  parameters here, and it makes more sense to only have to update it if you
  change something in ZSTDMT.
* Add `ZSTDMT_cParam_clampBounds()` to clamp a parameter into its valid
  range. Use this to keep backwards compatibility when setting ZSTDMT parameters,
  which clamp into the valid range.
2019-02-19 13:22:37 -08:00
Nick Terrell
3d7377b874 [libzstd] Handle uncompressed literals 2019-02-15 14:58:11 -08:00
Nick Terrell
f9513115e4 [libzstd] Add ZSTD_c_literalCompressionMode flag
It controls the literals compression. It is either
`auto`, `huffman`, or `uncompressed`. It defaults to
`auto`, which is the current behavior.
2019-02-13 14:59:22 -08:00
W. Felix Handte
501eb25102 Rename FORWARD_ERROR -> FORWARD_IF_ERROR 2019-01-29 12:56:07 -05:00
W. Felix Handte
03e040a966 Replace Uses of CHECK_E with RETURN_ERROR_IF(*_isError(... 2019-01-28 17:33:01 -05:00
W. Felix Handte
64bb6640f2 Replace CHECK_F Uses in zstdmt_compress.c and zstd_ddict.c 2019-01-28 17:15:57 -05:00
W. Felix Handte
cafc3b1bcb Also Convert zstd_compress.c 2019-01-28 17:05:18 -05:00
Yann Collet
f9e4f89252 improved comments for adjustCParams() and getCParams() 2019-01-02 12:18:40 -08:00
Yann Collet
e980ba212f
Merge pull request #1471 from facebook/nofloat
guard functions using floating point for debug mode only
2018-12-23 12:35:51 -08:00
Yann Collet
aae5bc538a
Merge pull request #1470 from facebook/U32
fix confusion between unsigned <-> U32
2018-12-23 12:35:39 -08:00
Yann Collet
c9dfb7e445 guard functions using floating point for debug mode only
they are only used to print debug messages.
Requested in #1386,
2018-12-22 09:09:40 -08:00
Yann Collet
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet
c8d1fda982 update aarch64 test to xenial
in an attempt to circumvent the `ld` bug
2018-12-21 15:08:48 -08:00
Yann Collet
8f35c7f94c
Merge pull request #1466 from facebook/noDictPresent
fixed : better error message
2018-12-20 19:01:27 -08:00
Yann Collet
41b45b84a1
Merge pull request #1465 from facebook/noFilePresent
fixed : detection of non-existing file
2018-12-20 17:21:04 -08:00
Yann Collet
ed2fb6bd57 fixed : better error message when dictionary missing
during benchmark.
Also : refactored ZSTD_fillHashTable(),
just for readability (it does the same thing)
2018-12-20 17:20:07 -08:00
Yann Collet
95784c654c fixed shadowing of stat variable
some standard lib declares a `stat` variable at global scope
shadowing local declarations ....
2018-12-20 14:56:44 -08:00
Yann Collet
2898afab52 fixed OSSfuzz 11849
The problem was already masked,
due to no longer accepting tiny blocks for statistics.

But in case it could still happen with not-so-tiny blocks,
there is a stricter control which ensures that
nothing was already loaded prior to statistics collection.
2018-12-19 16:54:15 -08:00
Yann Collet
8e0e495ce8 fixed: compression ratio discrepancy
depending on initialization,
the first byte of a new frame was invalidated or not.

As a consequence, one match opportunity was available or not,
resulting in slightly different compressed sizes
(on average, 1 or 2 bytes once every 20 frames).

It impacted ratio comparison between one-shot and streaming modes.

This fix makes the first byte of a new frame always a valid match.
Now compressed size is always the same.
It also improves compressed size by a negligible amount.
2018-12-19 10:11:06 -08:00
Yann Collet
d0e15f8d32
Merge pull request #1458 from terrelln/estimate
[libzstd] Fix estimate with negative levels
2018-12-18 15:12:21 -08:00
Yann Collet
04baecaeed
Merge pull request #1457 from facebook/btultra2.1
btultra2 and very small input
2018-12-18 14:46:55 -08:00
Nick Terrell
d7def456d8 [libzstd] Fix estimate with negative levels
* Fix `ZSTD_estimateCCtxSize()` with negative levels.
* Fix `ZSTD_estimateCStreamSize()` with negative levels.
* Add a unit test to test for this error.
2018-12-18 14:24:49 -08:00
Yann Collet
ef984e7307 fix debug levels
as reported by @terrelln.
2 is reserved for temporary usage only.
2018-12-18 13:40:07 -08:00
Yann Collet
635783da12 btultra2 and very small srcSize
When srcSize is small,
the nb of symbols produced is likely too small to warrant dedicated probability tables.
In which case, predefined distribution tables will be used instead.

There is a cheap algorithm in btultra initialization :
it presumes default distribution will be used if srcSize <= 1024.

btultra2 now uses the same threshold to shut down probability estimation,
since measured frequencies won't be used at entropy stage,
and therefore relying on them to determine sequence cost is misleading,
resulting in worse compression ratios.

This fixes btultra2 performance issue on very small input.

Note that, a proper way should be
to determine which symbol is going to use predefined probaility
and which symbol is going to use dynamic ones.
But the current algorithm is unable to make a "per-symbol" decision.
So this will require significant modifications.
2018-12-18 12:32:58 -08:00
Yann Collet
373ff8b983 play around with rescale weights 2018-12-17 15:48:34 -08:00
Yann Collet
8be145a8c1 fixed default job size 2018-12-13 16:38:08 -08:00
Yann Collet
62180b27d5 zstdmt parameter getter/setter use int 2018-12-13 15:47:34 -08:00
Yann Collet
34f01e600f fixed multiple conversions
from 64-bit to 32-bit
2018-12-13 14:02:22 -08:00