Commit Graph

1336 Commits (81512e9ebee5db31ac62eba4e6bc04ea34517729)

Author SHA1 Message Date
Nick Terrell 81512e9ebe Avoid '#define inline /* ... */'
Take definition of `FORCE_INLINE` from `zstd_internal.h`.
2017-03-08 14:00:21 -08:00
Nick Terrell e06c303475 Fix ZSTD_sizeof_CStream() 2017-03-08 13:45:10 -08:00
Yann Collet f44b55c18d Merge pull request #584 from terrelln/huff-repeat
Allow compressor to repeat Huffman tables
2017-03-02 17:20:11 -08:00
Yann Collet fe5d27062e disable prefetch-decode for 32-bits target
This decoder variant is detrimental to x86 architecture
likely due to register pressure.

Note that the variant is disabled for all 32-bits targets.
It's unclear if it would help for different architectures,
such as ARM, MIPS or PowerPC.
2017-03-02 17:09:21 -08:00
Nick Terrell d051cd5b43 Use workspace for count and CTable 2017-03-02 16:38:07 -08:00
Sean Purcell 553f67e0c1 Remove 'generic' inline strategy
Seems to avoid performance loss for compression.
Same strategy tested on decompression side, did not appear to improve
speed.
2017-03-02 15:18:13 -08:00
Sean Purcell 3d95925a59 Merge remote-tracking branch 'origin/dev' into m32 2017-03-02 15:17:56 -08:00
Nick Terrell a419777eb1 Allow compressor to repeat Huffman tables
* Compressor saves most recently used Huffman table and reuses it
  if it produces better results.
* I attempted to preserve CPU usage profile.
  I intentionally left all of the existing heuristics in place.
  There is only a speed difference on the second block and later.
  When compressing large enough blocks (say >= 4 KiB) there is
  no significant difference in compression speed.
  Dictionary compression of one block is the same speed for blocks
  with literals <= 1 KiB, and after that the difference is not
  very significant.
* In the synthetic data, with blocks 10 KB or smaller, most blocks
  can't use repeated tables because the previous block did not
  contain a symbol that the current block contains.
  Once blocks are about 12 KB or more, most previous blocks have
  valid Huffman tables for the current block, and the compression
  ratio and decompression speed jumped.
* In silesia blocks as small as 4KB can frequently reuse the
  previous Huffman table (85%), but it isn't as profitable, and
  the previous Huffman table only gets used about 3% of the time.
* Microbenchmarks show that `HUF_validateCTable()` takes ~55 ns
  and `HUF_estimateCompressedSize()` takes ~35 ns.
  They are decently well optimized, the first versions took 90 ns
  and 120 ns respectively. `HUF_validateCTable()` could be twice as
  fast, if we cast the `HUF_CElt*` to a `U32*` and compare to 0.
  However, `U32` has an alignment of 4 instead of 2, so I think that
  might be undefined behavior.
* I've ran `zstreamtest` compiled normally, with UASAN and with MSAN
  for 4 hours each.

The worst case for the speed difference is a bunch of small blocks
in the same frame. I modified `bench.c` to compress the input in a
single frame but with blocks of the given block size, set by `-B`.
Benchmarks on level 1:

|  Program  | Block size |   Corpus  | Ratio | Compression MB/s | Decompression MB/s |
|-----------|------------|-----------|-------|------------------|--------------------|
| zstd.base |        256 | synthetic | 2.364 |            110.0 |              297.0 |
|      zstd |        256 | synthetic | 2.367 |            108.9 |              297.0 |
| zstd.base |        256 | silesia   | 2.204 |             93.8 |              415.7 |
|      zstd |        256 | silesia   | 2.204 |             93.4 |              415.7 |
| zstd.base |        512 | synthetic | 2.594 |            144.2 |              420.0 |
|      zstd |        512 | synthetic | 2.599 |            141.5 |              425.7 |
| zstd.base |        512 | silesia   | 2.358 |            118.4 |              432.6 |
|      zstd |        512 | silesia   | 2.358 |            119.8 |              432.6 |
| zstd.base |       1024 | synthetic | 2.790 |            192.3 |              594.1 |
|      zstd |       1024 | synthetic | 2.794 |            192.3 |              600.0 |
| zstd.base |       1024 | silesia   | 2.524 |            148.2 |              464.2 |
|      zstd |       1024 | silesia   | 2.525 |            148.2 |              467.6 |
| zstd.base |       4096 | synthetic | 3.023 |            300.0 |             1000.0 |
|      zstd |       4096 | synthetic | 3.024 |            300.0 |             1010.1 |
| zstd.base |       4096 | silesia   | 2.779 |            223.1 |              623.5 |
|      zstd |       4096 | silesia   | 2.779 |            223.1 |              636.0 |
| zstd.base |      16384 | synthetic | 3.131 |            350.0 |             1150.1 |
|      zstd |      16384 | synthetic | 3.152 |            350.0 |             1630.3 |
| zstd.base |      16384 | silesia   | 2.871 |            296.5 |              883.3 |
|      zstd |      16384 | silesia   | 2.872 |            294.4 |              898.3 |
2017-03-02 13:27:52 -08:00
Yann Collet fdb0fd34b3 Merge pull request #583 from terrelln/set-dictid
Set dictID to 0 for content only dictionaries
2017-03-02 13:15:31 -08:00
Nick Terrell 3475b9b431 Set dictID to 0 for content only dictionaries 2017-03-02 12:33:02 -08:00
Sean Purcell d44703d145 Offsets >= 32MB in 32-bits mode 2017-03-01 16:27:56 -08:00
Yann Collet 76f0494089 xxhash can be included twice in any order
Previously,

followed by :

would fail to include the static definitions,
because the second include was simply skipped by guard macro.

Now it works as intended :
the missing static part is included during the second include.
2017-03-01 13:29:29 -08:00
Yann Collet 4bcc69b761 solves warnings when compiling with global XXH_STATIC_LINKING_ONLY
XXH_STATIC_LINKING_ONLY protection macro is intended to be triggered just before the include.
The main idea is to keep this setting local :
user module shall explicitly understand and accept the static linking restriction
which becomes transparent when triggering the macro at project level.
Global definition also triggers redefinition warnings for user modules which do locally define the macro.

This new version compiles lib and cli without warning when the macro is set globally.
That's not a scenario to be recommended, since it trades a local effect for a global one,
but it was easy enough to provide from zstd side.
2017-03-01 11:33:25 -08:00
Yann Collet 31432cc57d Merge pull request #579 from iburinoc/multiframe
Check to ensure ddict isn't null before dereference
2017-03-01 11:02:04 -08:00
Sean Purcell a81d4fee58 Check to ensure ddict isn't null before dereference 2017-02-28 15:28:29 -08:00
Yann Collet 22d79762ef fixed multi frames 2017-02-28 02:12:42 -08:00
Yann Collet a33ae64204 fixed decoding skippable frames 2017-02-28 01:15:28 -08:00
Yann Collet d1760113ec Improved speed of ZSTD_decompressStream()
When ZSTD_decompressStream() detects
that there is enough space in dst
to complete decompression in a single pass,
delegates to ZSTD_decompress(),
for an extra ~5% speed boost
2017-02-28 00:14:28 -08:00
Yann Collet a81c2e7e44 Merge pull request #573 from facebook/ddict
Improved DDict memory usage
2017-02-27 20:54:42 -08:00
Yann Collet dccd6b6f65 cli : fix : --rm is silent when input is stdin
previously, app would produce an error message, and stop.
2017-02-27 15:57:50 -08:00
Yann Collet 0b9b894b2d reduced ZSTD_DDict memory usage
saved 128 KB
2017-02-27 00:27:30 -08:00
Yann Collet bd7fa21deb added ZSTD_refDDict()
Now DDict does no longer depends on DCtx duplication
2017-02-26 14:43:07 -08:00
Yann Collet d73eebc00f loadEntropy works on new ZSTD_entropy_t type 2017-02-26 10:16:42 -08:00
Yann Collet 8629f0e41f created entropy structure type 2017-02-25 18:33:31 -08:00
Yann Collet 8dff956dbf Added DDict unit test in fuzzer
also : slightly modified loadEntropy :
know src must points at start of dictionary
2017-02-25 10:11:15 -08:00
Yann Collet 14312d833e zstdmt : fix : loading prefix from previous segments
There used to be a (very small) chance that
loading prefix from previous segment
would be confused with a real zstd dictionary.
For that to happen, the prefix needs to start
with the same value as dictionary magic.
That's 1 chance in 4 billions if all values have equal probability.
But in fact, since some values are more common (0x00000000 for example)
others are less common, and dictionary magic was selected to be one of them,
so probabilities are likely even lower.

Anyway, this risk is no down to zero
by adding a new CCtx parameter : ZSTD_p_forceRawDict

Current parameter policy : the parameter "stick" to its CCtx,
so any dictionary loading after ZSTD_p_forceRawDict is set
will be loaded in "raw" ("content only") mode,
even if CCtx is re-used multiple times with multiple different dictionary.
It's up to the user to reset this value differently if it needs so.
2017-02-23 23:42:12 -08:00
Yann Collet 831b4890ce minor tests/Makefile refactoring
and update of zstd_manual,html
2017-02-23 23:09:10 -08:00
Yann Collet cce8d8ba2b Merge pull request #560 from iburinoc/findcompressedsize
Change name to to findFrameCompressedSize and add skippable support
2017-02-23 13:39:23 -08:00
Sean Purcell 83038d236a Fix bug in FSE distribution normalization 2017-02-22 13:52:48 -08:00
Sean Purcell 64417cd2ff Describe ambiguity around skippable frames 2017-02-22 13:29:01 -08:00
Sean Purcell 9757cc811b Update comment 2017-02-22 12:28:21 -08:00
Sean Purcell 9050e1925e Change name to to findFrameCompressedSize and add skippable support 2017-02-22 12:12:34 -08:00
Przemyslaw Skibinski d8114e5802 zstd_compress.c: fix memory leaks 2017-02-21 18:59:56 +01:00
Anders Oleson 517577bf53 spelling fixes in comments
i.e. occurred labeled Huffman
2017-02-20 12:08:59 -08:00
Sean Purcell 6b010dec80 execSequence copies up to 2*WILDCOPY_OVERLENGTH extra 2017-02-16 12:05:40 -08:00
Sean Purcell 887eaa9e21 Fix wildcopy overwriting data still in window 2017-02-15 16:43:45 -08:00
Yann Collet 2252d29a5a Merge branch 'dev' of github.com:facebook/zstd into dev 2017-02-15 12:00:50 -08:00
Yann Collet 4596037042 updated fse version
feature minor refactoring (removing FSE_abs())
also : fix a few minor issues recently introduced in examples
2017-02-15 12:00:03 -08:00
Yann Collet 44f82d781f Merge pull request #545 from terrelln/force-window
[zstdmt] Fix MSAN failure with ZSTD_p_forceWindow
2017-02-15 10:20:15 -08:00
Yann Collet f0b9a8dddb Merge pull request #547 from inikep/dev11
Avoid fseek()'s 2GiB barrier with MacOS and *BSD
2017-02-14 12:29:00 -08:00
Yann Collet 9696bfc2ad Merge pull request #544 from ds77/avoid-empty
Portable way to avoid empty unit warning in threading.c
2017-02-14 00:54:55 -08:00
Przemyslaw Skibinski b876b96ce1 Merge remote-tracking branch 'refs/remotes/facebook/dev' into dev11 2017-02-14 09:26:03 +01:00
Nick Terrell ecf90ca24b [zstdmt] Fix MSAN failure with ZSTD_p_forceWindow
Reproduction steps:

```
make zstreamtest CC=clang CFLAGS="-O3 -g -fsanitize=memory -fsanitize-memory-track-origins"
./zstreamtest -vv -t4178 -i4178 -s4531
```

How to get to the error in gdb (may be a more efficient way):

* 2 breaks at zstd_compress.c:2418  -- in ZSTD_compressContinue_internal()
* 2 breaks at zstd_compress.c:2276  -- in ZSTD_compressBlock_internal()
* 1 break at zstd_compress.c:1547

Why the error occurred:

When `zc->forceWindow == 1`, after calling `ZSTD_loadDictionaryContent()` we
have `zc->loadedDictEnd == zc->nextToUpdate == 0`. But, we've really loaded up
to `iend` into the dictionary. Then in `ZSTD_compressBlock_internal()` we see
that `current > zc->nextToUpdate + 384`, so we load the last 192 bytes a second
time. In this case the bytes we are loading are a block of all 0s, starting in
the previous block. So when we are loading the last 192 bytes, we find a `match`
in the future, 183 bytes beyond `ip`. Since the block is all 0s, the match
extends to the end of the block. But in `ZSTD_count()` we only check that
`pIn < pInLoopLimit`, but since `pMatch > pIn`, `pMatch` eventually points past
the end of the buffer, causing the MSAN failure.

The fix:

The line changed sets sets `zc->nextToUpdate` to the end of the dictionary.
This is the behavior that existed before `ZSTD_p_forceWindow` was introduced.
This fixes the exposing test case. Since the code doesn't fail without
`zc->forceWindow`, it makes sense that this works. I've run the command
`./zstreamtest -T2mn` 64 times without failures. CI should also verify nothing
obvious broke.
2017-02-13 19:11:22 -08:00
Yann Collet 58af614ef2 push version and NEWS to v1.1.4 2017-02-13 18:32:44 -08:00
ds77 08e6a88a97 avoid empty translation unit warning without #pragma 2017-02-14 00:46:47 +01:00
Przemyslaw Skibinski 09c8e5390d __builtin_bswap requires gcc 4.3+ 2017-02-13 12:45:53 +01:00
Sean Purcell d7bfcac18a Expose frameSrcSize to experimental API 2017-02-10 11:55:44 -08:00
Sean Purcell 5069b6c2c3 Merge branch 'dev' into multiframe 2017-02-10 10:08:55 -08:00
Yann Collet bbba42acd1 Merge pull request #537 from terrelln/small-bugs
Fix small bugs
2017-02-10 04:35:43 -08:00
Yann Collet a28c34cb7a Merge pull request #538 from iburinoc/errorstring
Fix ZSTD_getErrorString and add tests
2017-02-10 03:59:56 -08:00