Commit Graph

6890 Commits (36528b96c4ae2dff700e63851f710aa11c1d0bc8)

Author SHA1 Message Date
Yann Collet 0d38ee3c30
Merge pull request #1690 from piguin/dev
fix compiling errors with clang-8
2019-07-24 15:37:05 -07:00
Yann Collet ff8b18a0bb
Merge pull request #1697 from Tyler-Tran/dev
Adding documentation for --shrink flag
2019-07-24 15:35:11 -07:00
W. Felix Handte 15da57820d Add New Seekable Compression Example to .gitignore 2019-07-24 18:22:20 -04:00
W. Felix Handte 9cb9b1c9a5 Update Manual 2019-07-24 18:21:11 -04:00
W. Felix Handte 25824cc185 Update CHANGELOG 2019-07-24 17:35:52 -04:00
W. Felix Handte 8083581f9a Bump Library Version Number to 1.4.2 2019-07-24 17:35:19 -04:00
Tyler Tran 5a61e66f7b previous commit did not undo all changes 2019-07-24 13:53:50 -07:00
Tyler Tran 12d60a9bd9 removing changes to zstd.1 2019-07-24 13:52:34 -07:00
Tyler Tran f8c1d7979c modifying minor nit 2019-07-22 16:36:44 -07:00
Tyler Tran 02da4497f0 Adding documentation for shrink flag PR #1656 2019-07-22 16:33:22 -07:00
Yann Collet b0a5d380af
Merge pull request #1695 from iburinoc/seekable-buff
Fix seekable decompression in-memory api
2019-07-22 15:34:32 -07:00
Nick Terrell 740b32173f
Merge pull request #1696 from terrelln/legacy-fix
[legacy] Fix bug in zstd-0.5 decoder
2019-07-22 18:06:18 -04:00
Nick Terrell e6edcfa795 [legacy] Fix bug in zstd-0.5 decoder
The match length and literal length extra bytes could either
by 2 bytes or 3 bytes in version 0.5. All earlier verions were
always 3 bytes, and later version didn't have dumps.

The bug, introduced by commit 0fd322f812,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.

I've tested this fix with every zstd version < 1.0.0 on the buggy file,
and we are now always successfully decompressing with the right
checksum.

Fixes #1693.
2019-07-22 13:05:09 -07:00
Sean Purcell 671d533ea7 Fix seekable decompression in-memory api 2019-07-21 23:22:25 -04:00
Yann Collet be3d2e2de8
Merge pull request #1679 from ephiepark/dev
Restructure the source files
2019-07-19 15:29:07 -07:00
Yann Collet f2620697c7
Merge pull request #1685 from vivekmig/dev
Add Check if Block Size Exceeds Maximum
2019-07-19 15:22:29 -07:00
Felix Handte 52181f877a
v1.4.1: Merge pull request #1691 from facebook/dev 2019-07-19 14:37:37 -04:00
Yann Collet d636cd1444
Merge pull request #1692 from felixhandte/v1.4.1-changelog
Update CHANGELOG with v1.4.1 Changes
2019-07-19 09:10:39 -07:00
W. Felix Handte 62a0dc57b1 Update CHANGELOG with v1.4.1 Changes 2019-07-19 11:18:10 -04:00
Qin Li 04a9d6b828 fix compiling errors with clang-8
Compiling with clang-8 fails with the following errors:

largeNbDicts.c:562:37: error: implicit conversion turns floating-point
number into integer: 'const double' to 'U64' (aka 'unsigned long')
[-Werror,-Wfloat-conversion]
        U64 const dTime_ns = result.nanoSecPerRun;
                  ~~~~~~~~   ~~~~~~~^~~~~~~~~~~~~

zstdcli.c:300:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
 * @return 1 means that cover parameters were correct
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

zstdcli.c:301:5: error: '@return' command used in a comment that is
not attached to a function or method declaration
[-Werror,-Wdocumentation]
 * @return 0 in case of malformed parameters
   ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2019-07-18 19:41:00 -07:00
Vivek Miglani a3ce0c9d04 Fixing decodecorpus test issue 2019-07-18 14:32:09 -07:00
W. Felix Handte a2861d75eb [doc] Bump Format Spec Version 2019-07-17 18:55:45 -04:00
W. Felix Handte c05b270edc [doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content
This changes the size limit on compressed blocks to match those of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB, removing the additional
restriction that had been placed on `Compressed_Block`s, that they be smaller
than the decompressed content they represent.

Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. And on the other, this bound is actually
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as an `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block size over the size of the data it represents,
because those tables can be re-used by subsequent blocks.

Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.
2019-07-17 18:55:45 -04:00
Vivek Miglani c7be7d2efb Fixing compressed block size checks 2019-07-17 12:53:15 -07:00
Ephraim Park 1dc98de279 Restructure the source files 2019-07-15 17:39:18 -07:00
Nick Terrell f7d56943fd
Merge pull request #1684 from terrelln/regression
[regression] Update results for ZSTD_double_fast update
2019-07-15 15:39:52 -04:00
Vivek Miglani 3f108f82fb Return error if block size exceeds maximum 2019-07-15 12:10:21 -07:00
Nick Terrell 4c2943df23 [regression] Update results for ZSTD_double_fast update 2019-07-15 11:25:22 -07:00
Vivek Miglani de61b36f9e Merge branch 'master' of https://github.com/vivekmig/zstd into dev 2019-07-15 10:47:09 -07:00
Yann Collet 8fb08b68cc
Merge pull request #1681 from facebook/level3
updated double_fast complementary insertion
2019-07-12 16:16:06 -07:00
Nick Terrell 75cfe1dc69
[ldm] Fix bug in overflow correction with large job size (#1678)
* [ldm] Fix bug in overflow correction with large job size

* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)

* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Yann Collet eaeb7f00b5 updated the _extDict variant of double fast 2019-07-12 14:17:17 -07:00
Yann Collet e8a7f5d3ce double-fast: changed the trade-off for a smaller positive change
same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).
2019-07-12 11:34:53 -07:00
mgrice 812e8f2a16 perf improvements for zstd decode (#1668)
* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand.  The sites where wildcopy is invoked have an interesting distribution of lengths to be copied.  The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
 via attribute and flag.  I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressedratio     encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s ,        1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s ,        1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s ,        1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s ,        1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s ,        1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s ,        1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s ,         662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s ,         592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s ,         549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s ,         624.2MB/s     6.15%
    2#ooffice    6152192->   3323931 (1.851),        155MB/s ,         564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s ,         521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s ,        1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s ,        1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s ,        1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s ,        1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s ,        1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s ,          1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* <Replace this line with a title. Use 1 line only, 67 chars or less>

Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet d1327738c2 updated double_fast complementary insertion
in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).

More details in the PR.
2019-07-11 15:25:22 -07:00
Yann Collet b01c1c679f
Merge pull request #1675 from ephiepark/dev
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00
Yann Collet 2387d574cb updated .gitignore 2019-07-09 14:54:48 -07:00
Yann Collet 34a1a37246 updated .gitignore rule 2019-07-09 14:44:22 -07:00
Yann Collet 8eda16c9f6
Merge pull request #1677 from LeeYoung624/gitignore_fix
fix gitignore errors
2019-07-09 14:36:38 -07:00
Yann Collet b8ec4b0fd6 updated version number (to v1.4.1)
also : added doc on context re-use, as suggested by @scherepanov at #1676
2019-07-09 11:43:59 -07:00
LeeYoung624 654cb9d439 fix gitignore errors 2019-07-09 21:08:13 +08:00
Yann Collet 096714d1b8
Merge pull request #1671 from ephiepark/dev
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park f57ac7b09e Factor out the logic to build sequences 2019-07-03 15:42:38 -07:00
Ephraim Park 9007701670 Adding targetCBlockSize param 2019-07-03 15:41:52 -07:00
Nick Terrell e962f07d19
[fuzz] Add a compression fuzzer with randomly sized output buffer (#1670) 2019-07-02 22:05:07 -07:00
Nick Terrell 6c92ba774e
ZSTD_compressSequences_internal assert op <= oend (#1667)
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Lets assert that it doesn't happen.
2019-07-02 15:45:47 -07:00
Yann Collet 857e608b51
Merge pull request #1658 from facebook/memset
memset() rather than reduceIndex()
2019-07-01 15:01:43 -07:00
Yann Collet 4d611ca405
Merge pull request #1664 from ephiepark/dev
decodecorpus
2019-07-01 14:13:49 -07:00
Ephraim Park 01f5b5d918
Merge pull request #7 from ephiepark/decodecorpus
reflect code review comments
2019-07-01 10:18:59 -07:00
Ephraim Park 28309520c0 reflect code review comments 2019-07-01 10:17:30 -07:00