senhuang42
b5c35d7ea3
Use new paramSwitch enum for LCM, row matchfinder, and block splitter
2021-09-21 14:22:02 -04:00
sen
9d2a45a705
Merge pull request #2778 from senhuang42/opt_inlining_revert
...
Revert opt outlining change
2021-09-15 14:22:10 -04:00
Sen Huang
bd84e4a9d3
Revert opt outlining change
2021-09-15 09:08:41 -07:00
Yann Collet
b7f46ebc23
use ZSTD_memcpy() for better portability
...
notably within kernel space
2021-09-08 14:45:53 -07:00
Yann Collet
7fce9a41b5
change update rate to 12/11/11/11
...
better for large files, and sources with relatively "stable" entropy,
like silesia.tar.
slightly worse for files with rapidly changing entropy,
like Calgary.tar/.
Updated small files tests in fuzzer
2021-09-08 14:05:57 -07:00
Yann Collet
ef78611c26
change update rate to 11/10/10/10
...
better for larger blocks,
very small inefficiency on small block.
2021-09-08 08:58:28 -07:00
Yann Collet
42a3ed752a
removed frequency booster for stat initialization of btultra2
...
used to be necessary to counter-balance the fixed-weight frequency update
which has been recently changed for an adaptive rate (targeting stable starting frequency stats).
2021-09-08 07:56:43 -07:00
Yann Collet
08ceda3dfc
new statistics update policy
...
small general compression ratio improvement for btopt+ strategies/
2021-09-04 00:52:44 -07:00
Yann Collet
23a9368c45
new starting offcode table for zstd_opt
2021-09-03 17:41:42 -07:00
Yann Collet
27a8bbe265
new initializer for ll price
2021-09-03 16:07:31 -07:00
Yann Collet
f0fc8cb3e1
Disable console notification by default within the library
...
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
2021-09-03 13:44:07 -07:00
Yann Collet
eab692211e
removed pretty-print of sizes in benchmark
...
This is less appropriate for this mode :
benchmark is about accuracy,
it's important to read the exact values.
2021-09-03 12:51:02 -07:00
Sen Huang
d88c1d95ce
Remove inlining for opt
2021-09-01 16:58:57 -04:00
Nick Terrell
46f2710562
[HUF] Improve Huffman encoding speed
...
Improve Huffman encoding speed by 20% for gcc and 10% for clang.
| Compiler | Benchmark | Config | Dataset | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed MB/s (huf-cspeed - dev) |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc | compress | level_1 | enwik7 | 2.43 | 253.70 | 258.72 | 2.0% |
| gcc | compress | level_1 | silesia | 2.88 | 341.90 | 348.15 | 1.8% |
| gcc | compress_literals | level_1 | enwik7 | 1.49 | 761.83 | 912.76 | 19.8% |
| gcc | compress_literals | level_1 | silesia | 1.28 | 754.83 | 902.37 | 19.5% |
| gcc | compress_literals | level_7 | enwik7 | 1.29 | 502.81 | 552.79 | 9.9% |
| gcc | compress_literals | level_7 | silesia | 1.11 | 675.97 | 776.44 | 14.9% |
| clang | compress | level_1 | enwik7 | 2.43 | 277.54 | 280.98 | 1.2% |
| clang | compress | level_1 | silesia | 2.88 | 369.98 | 375.46 | 1.5% |
| clang | compress_literals | level_1 | enwik7 | 1.49 | 828.83 | 918.41 | 10.8% |
| clang | compress_literals | level_1 | silesia | 1.28 | 815.81 | 905.41 | 11.0% |
| clang | compress_literals | level_7 | enwik7 | 1.29 | 533.13 | 553.30 | 3.8% |
| clang | compress_literals | level_7 | silesia | 1.11 | 714.52 | 775.38 | 8.5% |
2021-07-27 15:10:35 -07:00
Nick Terrell
91c9a247b6
[lib] Fix determinism bug in the optimal parser
...
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.
40def70387/lib/compress/zstd_opt.c (L476)
This optimization is based off the length longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.
The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
2021-05-13 17:05:59 -07:00
Nick Terrell
a494308ae9
[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
...
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Nick Terrell
66e811d782
[license] Update year to 2021
2021-01-04 17:53:52 -05:00
senhuang42
d6911b86be
Require LDM matches to be strictly greater in length
2020-10-09 12:56:18 -04:00
senhuang42
b9c8033cde
Define kNullRawSeqStore for every file
2020-10-07 19:02:41 -04:00
senhuang42
a6165c1b28
Change matchState_t::ldmSeqStore to pointer
2020-10-07 14:13:57 -04:00
senhuang42
abce708a56
Move posInSequence correction to correct location
2020-10-07 13:56:25 -04:00
senhuang42
0c515590d8
Replace offCode of largest match if ldm's offCode is superior
2020-10-07 13:56:25 -04:00
senhuang42
0fac8e07e1
Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes
2020-10-07 13:56:25 -04:00
senhuang42
a5500cf2af
Refactor separate ldm variables all into one struct
2020-10-07 13:56:25 -04:00
senhuang42
0325d878f2
Remove bubbling down matches with longer offCode and same matchLen
2020-10-07 13:56:25 -04:00
senhuang42
ddf8a3f1b9
Enable inclusion of mid-flight LDMs in opt parser
2020-10-07 13:56:25 -04:00
senhuang42
88f72ed942
Correct incorrect offcode calculation
2020-10-07 13:56:25 -04:00
senhuang42
d8b43a4202
Add explicit conversion of size_t to U32
2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d
Add cSize regression test to fuzzer.c
2020-10-07 13:56:25 -04:00
senhuang42
c87d2e5866
Prefix new static ldm helpers with ZSTD_opt
2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42
Add DEBUGLOG() calls in ldm helpers
2020-10-07 13:56:25 -04:00
senhuang42
10647924f1
Make function descriptions more accurate
2020-10-07 13:56:25 -04:00
senhuang42
37617e23d7
Correct matchLength calculation and remove unnecessary functions
2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287
Reset ldmSeqStore after initStats_ultra() pass for btultra2
2020-10-07 13:56:25 -04:00
senhuang42
0718aa70df
Refactor existing functions to use posInSequence
2020-10-07 13:56:25 -04:00
senhuang42
7348b40a87
Adjustments to ldm_calculateMatchRange() to calculate bounds correctly
2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2
Add ldm_calculateMatchRange() function
2020-10-07 13:56:25 -04:00
senhuang42
4793ae3b84
Prevent duplicate LDMs from being inserted
2020-10-07 13:56:25 -04:00
senhuang42
65f9cfeeec
Add extra bounds check to prevent heap access after free ASAN error
2020-10-07 13:56:25 -04:00
senhuang42
bff5785fd5
Address mixed variables C90 warning
2020-10-07 13:56:25 -04:00
senhuang42
724b94ed18
ldm_getNextMatch fixed return values
2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68
Cleanups, add comments and explanations
2020-10-07 13:56:25 -04:00
senhuang42
78da2e1808
Fixed sifting algorithm
2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96
Fixed end of match boundary update issues
2020-10-07 13:56:25 -04:00
senhuang42
28394b64f2
Add proper bounds check on adding ldms
2020-10-07 13:56:25 -04:00
senhuang42
a2f2b58d04
Add a function ldm_voidSequences()
2020-10-07 13:56:25 -04:00
senhuang42
9c3c7cd20e
Fix function argument to getNextMatch()
2020-10-07 13:56:25 -04:00
senhuang42
c8b8572b38
Adjustments to no longer segfault on nci
2020-10-07 13:56:25 -04:00
senhuang42
5df9b5e05f
Add initial getNextMatch() in opt parser
2020-10-07 13:56:25 -04:00
senhuang42
f8ce7cabc3
Added more debugging
2020-10-07 13:56:25 -04:00