Commit Graph

4834 Commits (5414b8ea01dff427bd186d042c5b03acdf9ac920)

Author SHA1 Message Date
Yann Collet 5414b8ea01 Merge branch 'dev' of github.com:facebook/zstd into dev 2018-03-09 11:53:24 -08:00
Yann Collet e916b9090e gen_html: changed CFLAGS to CXXFLAGS
since it's associated with $(CXX)
2018-03-09 11:52:14 -08:00
Yann Collet 51169575a8
Merge pull request #1036 from terrelln/thread-void
[threading] Cast unused arguments to void
2018-03-07 12:14:05 -08:00
Yann Collet 0379d83951
Merge pull request #1034 from facebook/longOffsetMode
Dynamic selection of long offset mode
2018-03-07 10:26:35 -08:00
Nick Terrell 7e103cdaf5 [threading] Cast unused arguments to void 2018-03-06 18:36:40 -08:00
Yann Collet db147ea620 improved comments
following @terrelln's suggestions
2018-03-06 18:15:26 -08:00
Yann Collet 51262bd832
Merge pull request #1033 from facebook/benchDecode
fix benchmark issue when measuring decoding speed only
2018-03-06 17:55:23 -08:00
Yann Collet 06ca9c7d7c fixed 0-seq blocks in block-decompression mode 2018-03-06 01:50:19 -08:00
Yann Collet 9a91afe6ef long offset mode : new default threshold for 32-bit 2018-03-05 16:41:08 -08:00
Yann Collet 7bd7a3ad43 long offset mode : new default threshold for 64-bit mode 2018-03-05 16:16:49 -08:00
Yann Collet c0393a538f fixed counting long distance weights 2018-03-05 15:12:10 -08:00
Yann Collet a70f7e10fa Merge branch 'benchDecode' into longOffsetMode 2018-03-05 14:09:00 -08:00
Yann Collet 03e7e14192 fix benchmark issue when measuring only decoding speed
The zstd bench module can focus on decompression speed _only_.
This is useful when trying to measure performance
on large input data compressed at a high compression level,
as compression time becomes problematic (too long).

This mode is triggered by the command: zstd -b -d

The problem was: in this mode,
measured decoding speed was > 10% slower
than in nominal mode (compression + decompression),
making the decompression benchmark mode much less useful.

This patch fixes the issue.
It's not completely clear why, but
moving the `memcpy()` operation sooner in the pipeline fixed it.

I can still measure some difference, but it is in the < 2% range,
so it's much more tolerable.

Also: it no longer matters in which order the
`-b` and `-d` commands are selected;
the combination always triggers bench_decodeOnly mode.
2018-03-05 13:57:41 -08:00
Yann Collet 41bd10446e Merge branch 'dev' into longOffsetMode 2018-03-05 13:10:10 -08:00
Yann Collet cb789d2df8 re-inserted offset evaluation 2018-03-05 13:08:59 -08:00
Yann Collet 99afe72576
Merge pull request #1032 from facebook/bmi2
Enable DYNAMIC_BMI2 for clang
2018-03-05 13:03:24 -08:00
Yann Collet b91ddf0ae6 Merge branch 'dev' into longOffsetMode 2018-03-05 11:59:54 -08:00
Yann Collet 403741130d
Merge pull request #1029 from cemeyer/dev
FIO_addFInfo: Fully initialize output 'total' struct
2018-03-05 11:49:48 -08:00
Yann Collet d02b44cf55 DYNAMIC_BMI2 enabled for clang
clang only claims compatibility with gcc 4.2.
Consequently, a recent patch which reserved DYNAMIC_BMI2 for gcc >= 4.8
also disabled it for clang.

fix : __clang__ is now enough to enable DYNAMIC_BMI2
(associated with other existing conditions : x64/x64, !bmi2)
2018-03-04 16:05:59 -08:00
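
A minimal sketch of the kind of guard described in the commit above; the exact macro conditions here are illustrative only, not zstd's actual definition (which lives in its common compiler header):

    /* Illustrative sketch: enable runtime BMI2 dispatch when the compiler can
     * emit BMI2 code on demand (clang, or a recent enough gcc), the target is
     * x64, and BMI2 is not already enabled unconditionally. */
    #if !defined(__BMI2__) \
        && (defined(__x86_64__) || defined(_M_X64)) \
        && (defined(__clang__) \
            || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))))
    #  define DYNAMIC_BMI2 1
    #else
    #  define DYNAMIC_BMI2 0
    #endif
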
Yann Collet 3ba307b240
Merge pull request #1031 from facebook/inline48
force_inline HUF_decodeSymbol*()
2018-03-01 17:52:15 -08:00
Yann Collet 45b09e7625 limit DYNAMIC_BMI2 to gcc >= 4.8
the bmi2 attribute is not supported by gcc 4.4
2018-03-01 15:02:18 -08:00
Yann Collet b01552a07a force inlining of HUF_decodeSymbol*() functions
which was not done properly by gcc 4.8
resulting in a major performance difference.

ex :
zstd -b1 silesia.tar
before : dec 680 MB/s
after  : dec 710 MB/s  (without bmi2)
after  : dec 770 MB/s  (with DYNAMIC_BMI2)
2018-03-01 11:31:45 -08:00
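
A minimal sketch of the force-inline technique mentioned above (the macro spelling and helper are hypothetical; zstd defines its own portable helpers for this):

    /* Hypothetical macro: ask the compiler to inline even when its own
     * heuristic (as with gcc 4.8 here) would decline to. */
    #if defined(__GNUC__) || defined(__clang__)
    #  define MY_FORCE_INLINE static inline __attribute__((always_inline))
    #elif defined(_MSC_VER)
    #  define MY_FORCE_INLINE static __forceinline
    #else
    #  define MY_FORCE_INLINE static inline
    #endif

    /* Placeholder hot helper: in the real code this would be a symbol decoder. */
    MY_FORCE_INLINE unsigned nextSymbol(unsigned state) { return state >> 1; }
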
Conrad Meyer 606374269c FIO_addFInfo: Fully initialize output 'total' struct
Silence a Coverity warning about 'windowSize' being uninitialized.
(Yes, nothing that calls this routine actually uses the windowSize
value.  Still, appeasing Coverity is pretty harmless in this case.)
2018-02-28 15:23:05 -08:00
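
The usual pattern for this class of Coverity warning is to zero the whole output struct before filling in the fields that are actually used; a tiny generic sketch (type and field names are hypothetical, not fileio.c's):

    #include <string.h>

    typedef struct { unsigned long long windowSize; unsigned nbFiles; } fileInfo_t;

    static fileInfo_t addFileInfo(fileInfo_t a, fileInfo_t b)
    {
        fileInfo_t total;
        memset(&total, 0, sizeof(total));      /* every field starts defined */
        total.nbFiles = a.nbFiles + b.nbFiles;
        /* total.windowSize intentionally left at 0: callers don't read it here */
        return total;
    }
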
Yann Collet 564cb1b640 update doc/README.md on provided tools to test 3rd party implementations 2018-02-27 17:37:05 -08:00
Yann Collet ccb7184a76
Merge pull request #1026 from terrelln/lrm-window
LDM manages its own window round buffer
2018-02-27 17:09:10 -08:00
Nick Terrell 0a0e64c641 LDM manages its own window round buffer 2018-02-27 12:13:23 -08:00
Yann Collet 2c4d3f339a
Merge pull request #1025 from facebook/huf
Huf
2018-02-27 09:57:01 -08:00
Yann Collet 33a3f18848 fixed wrong size test 2018-02-26 18:27:51 -08:00
Yann Collet d18d43aaf9
Merge pull request #1024 from terrelln/window-split
Split the window state into substructure
2018-02-26 17:18:33 -08:00
Yann Collet 89741653ab added error code workSpace_tooSmall 2018-02-26 15:11:50 -08:00
Yann Collet 6cdf690441 minor cleaning of huff0
Updates code documentation, and properly names a few "magic constants".
Also, HUF_compress_internal() gets a cleaner way
to determine the size of tables inside the workspace.
2018-02-26 14:52:23 -08:00
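
A small sketch of the general "determine table sizes inside a workspace" pattern alluded to above; the names, layout, and the too-small check are hypothetical, not huff0's actual scheme:

    #include <stddef.h>

    typedef struct { unsigned* count; unsigned* cTable; } hypTables;

    /* Carve two tables out of one caller-provided workspace, failing cleanly
     * (cf. the new workSpace_tooSmall error code) when they cannot fit.
     * Assumes the workspace is suitably aligned for unsigned. */
    static int carveTables(void* workspace, size_t wkspSize, size_t maxSymbols, hypTables* out)
    {
        size_t const countSize  = (maxSymbols + 1) * sizeof(unsigned);
        size_t const cTableSize = (maxSymbols + 1) * sizeof(unsigned);
        if (wkspSize < countSize + cTableSize) return 1;   /* workspace too small */
        out->count  = (unsigned*)workspace;
        out->cTable = out->count + (maxSymbols + 1);
        return 0;
    }
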
Nick Terrell 6b88d592fd Reduce ZSTD_CHAINLOG_MAX to 29 in 32-bit mode 2018-02-26 13:30:24 -08:00
Nick Terrell 7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
Yann Collet 50bc2ce95e
Merge pull request #1021 from terrelln/lrm-split
Split block compressor out of long range matcher
2018-02-23 17:36:51 -08:00
Yann Collet 653383f74a minor nit from Mac XCode 2018-02-22 15:44:26 -08:00
Nick Terrell 7e2bf4ebad Remove long range matcher immediate repcode check
The compression ratio gets about 0.01% worse on the files I tested, but the
code is much simpler.
2018-02-22 15:18:47 -08:00
Nick Terrell af866b3a58 Split block compressor out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
  stores them in a table. It should work with any chunk size, but
  is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
  instead of encoding the literals directly, it passes them to a
  secondary block compressor. The code to handle chunk sizes greater
  than the block size is currently commented out, since it is unused.
  The next PR will uncomment and exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
  `targetLength`. Also don't emit repcode matches in the LDM block
  compressor. Enabling the LDM with the optimal parser now actually improves
  the compression ratio.
* The compression ratio is very similar to before. It is very slightly
  different, because the repcode handling is slightly different. If I remove
  immediate repcode checking in both branches, the compressed size is exactly
  the same.
* The speed looks to be the same or better than before.

Up Next (in a separate PR)
--------------------------

Allow sequence generation to happen prior to compression, and produce more
than a block's worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
2018-02-22 15:18:41 -08:00
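
A rough, hypothetical sketch of the two-phase structure described in the commit above: sequences are generated into a table first, then a block-compression step emits them and hands the literals to a secondary compressor. All names, types, and bodies below are invented for illustration and do not reflect zstd's internal API:

    #include <stddef.h>
    #include <string.h>

    typedef struct { size_t offset, litLength, matchLength; } seq_t;
    typedef struct { seq_t table[64]; size_t size; } seqStore_t;

    /* Phase 1: scan the chunk and record long-distance matches as sequences.
     * (Real matching logic elided; here the table simply ends up empty.) */
    static void ldm_generateSequences(seqStore_t* store, const void* src, size_t srcSize)
    {
        (void)src; (void)srcSize;
        store->size = 0;
    }

    /* Stand-in for the secondary block compressor that receives the literals. */
    static size_t secondary_compress(void* dst, size_t dstCapacity, const void* lit, size_t litSize)
    {
        size_t const n = litSize < dstCapacity ? litSize : dstCapacity;
        memcpy(dst, lit, n);                /* placeholder: raw copy, no entropy coding */
        return n;
    }

    /* Phase 2: emit the pre-computed sequences; literals between matches are
     * passed to the secondary compressor instead of being encoded here. */
    static size_t ldm_blockCompress(const seqStore_t* store,
                                    void* dst, size_t dstCapacity,
                                    const void* src, size_t srcSize)
    {
        (void)store;                        /* a real version would walk store->table */
        return secondary_compress(dst, dstCapacity, src, srcSize);
    }
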
Yann Collet 4fb071ec3c
Merge pull request #1022 from facebook/bmi2IntoC
Implemented BMI2 functions directly within huf_decompress.c
2018-02-22 14:30:43 -08:00
Yann Collet 0fd4df6ed3 Implemented BMI2 functions directly within huf_decompress.c
This makes it easier to edit for maintenance and evolutions
(I plan to experiment with modifications in the Huffman decompression functions).

The methodology followed seems broadly applicable to other BMI2 modules.

Performance was tracked rigorously at each step;
there is no noticeable loss (nor gain) of performance compared to the `#include` version.

Note however that 4X decoder variants tend to be extremely sensitive to code alignment.
This source code resulted in pretty good performance for gcc 7.2 and 7.3,
but future changes (even in other parts of the code) might trigger the issue again.
2018-02-22 10:51:47 -08:00
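
A minimal sketch of the compile-one-file, dispatch-at-runtime pattern described above (macro and function names are illustrative, and the caller is assumed to have detected BMI2 support, e.g. via cpuid; zstd's actual helpers live in its common headers):

    /* Illustrative only: build a BMI2-targeted variant of a hot function in the
     * same translation unit, and pick it at runtime via a 'bmi2' flag.
     * Guarded so non-x86 builds still compile. */
    #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    #  define HAS_BMI2_TARGET 1
    #else
    #  define HAS_BMI2_TARGET 0
    #endif

    static size_t decode_default(const unsigned char* src, size_t size)
    {
        size_t sum = 0, i;
        for (i = 0; i < size; i++) sum += src[i];   /* placeholder for real decoding */
        return sum;
    }

    #if HAS_BMI2_TARGET
    __attribute__((target("bmi2")))
    static size_t decode_bmi2(const unsigned char* src, size_t size)
    {
        size_t sum = 0, i;
        for (i = 0; i < size; i++) sum += src[i];   /* same body, compiled with BMI2 enabled */
        return sum;
    }
    #endif

    static size_t decode(const unsigned char* src, size_t size, int bmi2)
    {
    #if HAS_BMI2_TARGET
        if (bmi2) return decode_bmi2(src, size);
    #else
        (void)bmi2;
    #endif
        return decode_default(src, size);
    }
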
Yann Collet 4d6632c8f3
Merge pull request #1020 from facebook/betterBench
updated fullbench measurement methodology
2018-02-21 14:51:39 -08:00
Yann Collet 6e481504ee fullbench includes assert.h
as it is missing for Windows
2018-02-21 11:42:23 -08:00
Yann Collet 9c5a8040a9 fixed huf_compress workspace size 2018-02-21 11:34:49 -08:00
Yann Collet 364ce19463 update fullbench measurement methodology
to use fewer calls to time(), like bench.c.

Also upgraded accuracy to nanoseconds.
2018-02-21 09:43:32 -08:00
Yann Collet 993ffffba3
Merge pull request #1019 from facebook/betterBench
improve benchmark measurement for small inputs
2018-02-21 05:47:08 -08:00
Yann Collet 25d00d10fc fixed minor conversion warning 2018-02-20 16:52:28 -08:00
Yann Collet 010ba5f71f
Merge pull request #1017 from terrelln/c-bmi2
[compress] Support BMI2
2018-02-20 15:34:59 -08:00
Yann Collet 3538a535bf use TIMELOOP_NANOSEC
as suggested by @terrelln
2018-02-20 15:33:56 -08:00
Yann Collet d3364aa39e improve benchmark measurement for small inputs
by invoking time() once per batch, instead of once per compression / decompression.
The batch is dynamically resized so that each round lasts approximately 1 second.

Also: increases time accuracy to nanoseconds
2018-02-20 14:58:40 -08:00
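
A minimal sketch of the batch-timing idea described above: time a whole batch of iterations with one clock read, then grow the batch until a round takes roughly one second (clock source and names here are illustrative, not bench.c's actual code):

    #include <stdio.h>
    #include <time.h>

    /* Placeholder for the operation being benchmarked (e.g. one compression call).
     * With an empty body the loop may be optimized away; substitute real work. */
    static void one_iteration(void) { }

    int main(void)
    {
        unsigned long long batch = 1;
        for (;;) {
            struct timespec start, end;
            unsigned long long i;
            clock_gettime(CLOCK_MONOTONIC, &start);    /* one clock read per batch */
            for (i = 0; i < batch; i++) one_iteration();
            clock_gettime(CLOCK_MONOTONIC, &end);
            {
                unsigned long long ns =
                    (unsigned long long)(end.tv_sec - start.tv_sec) * 1000000000ULL
                    + (unsigned long long)(end.tv_nsec - start.tv_nsec);
                if (ns >= 1000000000ULL) {             /* round lasted ~1 second: report */
                    printf("%llu iterations in %llu ns\n", batch, ns);
                    break;
                }
                batch *= 2;                            /* too fast to measure: grow the batch */
            }
        }
        return 0;
    }
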
Nick Terrell 6e128d3534 [BMI2] Add comments to the bmi2 variable in the contexts 2018-02-20 14:12:11 -08:00
Yann Collet 70163bf0d3 added clarification comments in zstd_errors.h
answering some points in #1018
2018-02-20 12:54:49 -08:00