4238 Commits

Author SHA1 Message Date
Yann Collet
2e6f5bc0d8
Merge pull request #2771 from facebook/opt_investigation
Improve optimal parser performance on small data
2021-09-14 10:36:34 -07:00
Nick Terrell
d22bbed5db
Merge pull request #2776 from terrelln/oss-fuzz-fix
[rsyncable] Ensure ZSTD_compressBound() is respected
2021-09-14 09:37:43 -07:00
Yann Collet
fd94b9d1c9 Merge branch 'dev' into opt_investigation 2021-09-14 01:15:51 -07:00
Nick Terrell
a418b4e478 [rsyncable] Ensure ZSTD_compressBound() is respected
In degenerate cases `--rsyncable` could create very small blocks (1
byte). This causes the compressed output to be larger than
`ZSTD_compressBound()`. Fix the issue by ensuring that rsyncable mode
never outputs blocks smaller than 128 KB.

The minimum job size is 512 KB, so we shouldn't lose many
synchronization points from skipping any that cause blocks smaller than
128 KB. And even if we do, that is fine, because we'll find the next
one.

This fixes the `raw_dictionary_round_trip` oss-fuzz assert.

Credit to OSS-Fuzz
2021-09-13 17:14:07 -07:00
Sen Huang
1daf3c8dbc Use 32 buckets for log2 bucketing in huffman sort 2021-09-13 12:29:16 -04:00
Yann Collet
f58e63bee7 Merge branch 'dev' into opt_investigation 2021-09-12 01:42:49 -07:00
Felix Handte
d68aa19a2f
Merge pull request #2749 from felixhandte/zstd-fast-pipelined
Pipelined Implementation of ZSTD_fast (~+5% Speed)
2021-09-09 17:05:30 -04:00
Yann Collet
b7f46ebc23 use ZSTD_memcpy() for better portability
notably within kernel space
2021-09-08 14:45:53 -07:00
Yann Collet
7fce9a41b5 change update rate to 12/11/11/11
better for large files, and sources with relatively "stable" entropy,
like silesia.tar.
slightly worse for files with rapidly changing entropy,
like Calgary.tar/.

Updated small files tests in fuzzer
2021-09-08 14:05:57 -07:00
Yann Collet
ef78611c26 change update rate to 11/10/10/10
better for larger blocks,
very small inefficiency on small block.
2021-09-08 08:58:28 -07:00
Yann Collet
42a3ed752a removed frequency booster for stat initialization of btultra2
used to be necessary to counter-balance the fixed-weight frequency update
which has been recently changed for an adaptive rate (targeting stable starting frequency stats).
2021-09-08 07:56:43 -07:00
Yann Collet
08ceda3dfc new statistics update policy
small general compression ratio improvement for btopt+ strategies/
2021-09-04 00:52:44 -07:00
Yann Collet
23a9368c45 new starting offcode table for zstd_opt 2021-09-03 17:41:42 -07:00
Yann Collet
27a8bbe265 new initializer for ll price 2021-09-03 16:07:31 -07:00
Yann Collet
f0fc8cb3e1 Disable console notification by default within the library
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
2021-09-03 13:44:07 -07:00
Yann Collet
eab692211e removed pretty-print of sizes in benchmark
This is less appropriate for this mode :
benchmark is about accuracy,
it's important to read the exact values.
2021-09-03 12:51:02 -07:00
sen
71076b7a01
Merge pull request #2763 from senhuang42/opt_compiletime
Improve compile speed and binary size in `opt`
2021-09-02 11:59:02 -04:00
Yann Collet
a8cf85ad0a
Merge pull request #2762 from facebook/level13
minor rebalancing of level 13
2021-09-01 20:32:53 -07:00
Sen Huang
d88c1d95ce Remove inlining for opt 2021-09-01 16:58:57 -04:00
Yann Collet
70d89e5a12 minor rebalancing of level 13
This new setup is slighly better on `silesia.tar` :
Ratio : 3.649 -> 3.655
Speed : 11.9 MB/s -> 12.2 MB/s
At the cost of more memory : 24 MB -> 32 MB
The new memory budget is a reasonable interpolation between neighboring levels 12 and 14:
level 12 : 24 MB
level 13 : 32 MB (increased from 24 MB)
level 14 : 48 MB
Window size remains unaffected (4 MB)
2021-09-01 13:05:10 -07:00
senhuang42
414e24becf Add 8 bytes to FSE workspace 2021-09-01 15:56:33 -04:00
W. Felix Handte
d6fd7761c9 Fix VS Build: Explicitly Cast to Narrow Ints 2021-09-01 14:15:04 -04:00
W. Felix Handte
15e67bfa7e Deduplicate Implementations
This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new
`ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is
functionally a no-op.
2021-09-01 14:15:04 -04:00
W. Felix Handte
64054dec44 Tweak Step 2021-09-01 14:15:04 -04:00
W. Felix Handte
24fcccd05c Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)
Unrolling the loop to handle 2 positions in each iteration allows us to reduce
the frequency of some operations that don't need to happen at every position.
One such operation is the step calculation, which is a very rough heuristic
anyways. It's fine if we do this a position later. The other operation is the
repcode check. But since the repcode check already tries expanding back one
position, we're really not missing much of importance by only trying it every
other position.

This commit also slightly reorders some operations.
2021-09-01 14:15:04 -04:00
W. Felix Handte
57a100f6dc Add ip1 + 128 Prefetch; Tiny Cleanup 2021-09-01 14:15:04 -04:00
W. Felix Handte
991d660ea9 Nit: Only Store 2 Hash Variables 2021-09-01 14:15:04 -04:00
W. Felix Handte
8706bc115a Nit: Dedup idx0 and idx1 2021-09-01 14:15:04 -04:00
W. Felix Handte
7c24c3e6ce Give Up on Searching End of Block
Amusingly, it seems to be a non-trivial performance hit to add in final
searches or even hash table insertions during cleanup. So let's not. It seems
to not make any meaningful difference in compression ratio.
2021-09-01 14:15:03 -04:00
W. Felix Handte
35932ab2f1 Prefetch Input in Incompressible Sections (+0.25% Speed) 2021-09-01 14:15:03 -04:00
W. Felix Handte
b092dd75b7 Shrink Pipeline from 4 Positions to 3 2021-09-01 14:15:03 -04:00
W. Felix Handte
387840af79 Re-Order Operations for Slightly Better Performance 2021-09-01 14:15:03 -04:00
W. Felix Handte
bc768bccc0 Track Step Size Statefully, Rather than Recalculating Every Time 2021-09-01 14:15:03 -04:00
W. Felix Handte
80bc12b33a Initial Pipelined Implementation for ZSTD_fast 2021-09-01 14:15:03 -04:00
Yann Collet
74b4171fb8 fix alignment condition in FSE_buildCTable
2-bytes alignment is enough for 16-bit fields
2021-08-29 19:05:04 -07:00
Yann Collet
18a20b3ad7
Merge pull request #2752 from facebook/hashLog3max
make ZSTD_HASHLOG3_MAX private
2021-08-20 12:51:17 -07:00
Yann Collet
2de42174bb make ZSTD_HASHLOG3_MAX private
This is an implementation detail,
it doesn't belong to public space (zstd.h).
2021-08-20 09:52:42 -07:00
sen
ae998544de
Merge pull request #2750 from senhuang42/sb_compress
Improve branch misses on FSE symbol spreading
2021-08-20 12:47:24 -04:00
senhuang42
da095ed899 Improve branch misses on FSE symbol spreading 2021-08-18 10:22:22 -07:00
Clément Chigot
399849e236 Makefile: add AIX support
For lib, AIX linker doesn't allow --soname.
2021-08-13 10:25:14 +02:00
Sen Huang
539b3aab9b Optimize 32-bit VecMask_next() 2021-08-04 17:14:58 -04:00
senhuang42
e411040ea1 Add 64 row entry support for lazy 2021-08-04 16:19:12 -04:00
senhuang42
31820e032c Rebalance clevels for lazy 2021-08-04 16:18:52 -04:00
senhuang42
aa1957477b Improve Huffman sorting algorithm 2021-08-04 12:43:34 -04:00
Nick Terrell
6ee70bae46
Merge pull request #2733 from terrelln/huf-cspeed
[HUF] Improve Huffman encoding speed
2021-08-03 12:59:54 -04:00
Nick Terrell
d8a0797268 [fuzz] Add Huffman round trip fuzzer
* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
  - Incorrect weight comparison (weights are allowed to be equal to
    table log).
  - HUF_compress1X_usingCTable_internal() can return compressed
    size >= source size, so the assert that `cSize <= 65535` isn't
    correct, and it needs to be checked instead.
2021-08-03 08:10:06 -07:00
sen
5c46f62006
Merge pull request #2677 from senhuang42/ci_overhaul_2
[CI][2/2] Migrate CI tests which (currently) fail
2021-08-02 09:55:49 -04:00
Sen Huang
5ec7897a26 Fix static analyzer warnings 2021-07-29 09:11:12 -07:00
Nick Terrell
46f2710562 [HUF] Improve Huffman encoding speed
Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed MB/s (huf-cspeed - dev) |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
2021-07-27 15:10:35 -07:00
W. Felix Handte
da58821ff2 Fix DDSS Load
This PR fixes an incorrect comparison in figuring out `minChain` in
`ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison
had been masked by the fact that `idx` was always 1, until @terrelln changed
that in #2726.

Credit-to: OSS-Fuzz
2021-07-27 11:49:44 -04:00