Commit Graph

4090 Commits (2624652a3273817a99fb977975db96e38d440bae)

Author SHA1 Message Date
W. Felix Handte 051b473e7e Fall Back in _extDict to New _noDict Rather than Old Merged Impl 2021-10-05 14:54:37 -04:00
W. Felix Handte fcab4841aa Nit: Rename Function 2021-10-05 14:54:37 -04:00
W. Felix Handte 47fd762ecc Nit: Unnest Blocks that Don't Declare Anything 2021-10-05 14:54:37 -04:00
W. Felix Handte 2cdfad538c Search One Last Position 2021-10-05 14:54:37 -04:00
W. Felix Handte 6ae44c0db8 Advance Long Index Lookup (+0.5% Speed)
This lookup can be advanced to before the short match check because either way
we will use it (in the next loop iter or in `_search_next_long`).
2021-10-05 14:54:37 -04:00
W. Felix Handte 2ddef7c872 Write Back Advanced Hash in Long Matches as Well (+Ratio)
Since we're now hashing the position ahead even if we find a long match and
don't search that next position, we can write it back into the hashtable even
in long matches. This seems to cost us no speed, and improves compression
ratio slightly!
2021-10-05 14:54:37 -04:00
W. Felix Handte 39f2491bfc Use Look-Ahead Hash for Next Long Check after Short Match (+0.5% Speed)
This costs a little ratio, unfortunately.
2021-10-05 14:54:37 -04:00
W. Felix Handte db4e1b5479 Hash Long One Position Ahead (+2.5% Speed)
Aside from maybe a latency win in the loop, this means that when we find a
short match, we've already done the hash we need to check the next long match.
2021-10-05 14:54:37 -04:00
W. Felix Handte a1ac7205d0 Pull Match Found Stuff Out of the Loop 2021-10-05 14:54:37 -04:00
W. Felix Handte 072ffaad67 Extract Working Variables 2021-10-05 14:54:37 -04:00
W. Felix Handte 1bdf041071 Track Step Rather than Recalculating (+0.5% Speed) 2021-10-05 14:54:37 -04:00
W. Felix Handte 258c0623e1 Extract Single-Segment Variant of ZSTD_dfast 2021-10-05 14:54:37 -04:00
stanjo74 52598d54e9
Limit train samples (#2809)
* Limit training samples size to 2GB

* simplified DISPLAYLEVEL() macro to use global vqriable instead of local.

* refactored training samples loading

* fixed compiler warning

* addressed comments from the pull request

* addressed @terrelln comments

* missed some fixes

* fixed type mismatch

* Fixed bug passing estimated number of samples rather insted of the loaded number of samples.
Changed unit conversion not to use bit-shifts.

* fixed a declaration after code

* fixed type conversion compile errors

* fixed more type castting

* fixed more type mismatching

* changed sizes type to size_t

* move type casting

* more type cast fixes
2021-10-04 17:47:52 -07:00
Yann Collet 7868f38019
Merge pull request #2747 from Helflym/dev
Add AIX support in Makefile
2021-10-01 08:13:39 -07:00
Nick Terrell 3a4d421c0f
Merge pull request #2802 from solbjorn/fix-kernel-wundef
[contrib][linux] Fix -Wundef inside Linux kernel tree
2021-09-29 09:48:17 -07:00
Ma Lin 894f05e88d Fix ZSTD_countTrailingZeros() bug
`>> 3` is wrong.
2021-09-29 07:20:09 +08:00
Sen Huang 4b7f45cb04 Pull hot loop into its own function 2021-09-28 08:19:44 -07:00
Sen Huang ccdcbf4621 Try beginning and end of match 2021-09-28 08:19:44 -07:00
Sen Huang b8fd6bf30c Skip most long matches in lazy hash table update 2021-09-28 08:19:39 -07:00
Nick Terrell 9ef055d706
Merge pull request #2808 from terrelln/huf-oss-fuzz-fix
[huf] Fix OSS-Fuzz assert
2021-09-27 15:00:52 -07:00
Felix Handte 8b7a19fcd4
Merge pull request #2805 from nolange/smaller_code_with_disabled_features
Smaller code with disabled features
2021-09-27 17:43:21 -04:00
Nick Terrell a07ddb47f7 [huf] Fix OSS-Fuzz assert
PR #2784 introduced a bug in the decompressor that caused some valid
inputs to fail to decompress. The bitstream isn't reloaded after the 4X*
loop if the number of elements remaining is small enough, causing us to
read more bits than are available in the bitcontainer.

This was caught by the MSAN fuzzer in OSS-Fuzz because the assembly
implementation isn't used in the MSAN build.

Credit to OSS-Fuzz.
2021-09-27 13:56:07 -07:00
Ma Lin ae986fcdb8 Use __assume(0) for unreachable code path in msvc
msvc will optimize away the condition check.
2021-09-27 19:23:57 +08:00
Yann Collet 2ed14c2476 minor : fix comment
provide correct reasons to include zstd_internal.h
2021-09-26 08:44:18 -07:00
Norbert Lange 6763f40331 zstd_decompress: use a helper function for context create
Multiple ZSTD_createDCtx* functions call other (public)
ZSTD_createDCtx* functions, this makes it harder for humans
and compilers to throw out code that is not used.

This farms out the logic into a static function, if a program
only uses a single ZSTD_createDCtx variant, all others can be easily
dropped and the remaining implementation can be specialized.
2021-09-26 14:41:37 +02:00
Norbert Lange 0d45540695 decompress: conditionally remove bmi2 from context
Use an helper function, which will just return 0 in case
the feature is disabled.
Allows constant propagation and removal of dead code.
2021-09-26 14:41:37 +02:00
Norbert Lange 02296cac82 decompress: conditionally remove legacy members from context
Remove the then unneeded variables from the struct,
and all accesses to them.
2021-09-26 12:12:17 +02:00
Alexander Lobakin 71526e6f29 [contrib][linux] Fix -Wundef inside Linux kernel tree
Commit d7ef97a013
("[build] Fix oss-fuzz build with the dataflow sanitizer") broke
build inside Linux-kernel after 'import', as it no longer can
conditionally remove ZSTD_MEMORY_SANITIZER definition from
the #if DEF_A || DEF_B block. This emits -Wundef warning which
can be treated as error.
Split this preprocessor condition into two separate conditions
to fix this.

Fixes: d7ef97a013 ("[build] Fix oss-fuzz build with the dataflow sanitizer")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
2021-09-25 13:35:25 +02:00
Ma Lin e5ba858270 Don't initialize the first parameter of _BitScanForward* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64
2021-09-25 16:36:53 +08:00
Ma Lin 95f492ea17 Don't initialize the first parameter of _BitScanReverse* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64
2021-09-25 16:36:53 +08:00
Ma Lin cc22042da0 Fix a C89 error in msvc
Variables (r) must be declared at the beginning of a code block.
This causes msvc2012 to fail to compile 64-bit build.
2021-09-25 16:32:06 +08:00
Nick Terrell 14772d97be
Merge pull request #2796 from terrelln/linux-fixes
[lib] Make lib compatible with `-Wfall-through` excepting legacy
2021-09-23 16:11:53 -07:00
Nick Terrell 01976ce4cd
Merge pull request #2799 from terrelln/oss-fuzz-build
[build] Fix oss-fuzz build with the dataflow sanitizer
2021-09-23 15:55:10 -07:00
Nick Terrell 1903d6a5a8
Merge pull request #2798 from abxhr/typo-fix
Fix typo
2021-09-23 13:11:45 -07:00
Nick Terrell d7ef97a013 [build] Fix oss-fuzz build with the dataflow sanitizer
The dataflow sanitizer requires all code to be instrumented. We can't
instrument the ASM function, so we have to disable it.
2021-09-23 11:48:39 -07:00
Abshar Mohammed Aslam 54a888b57b
Fix typo 2021-09-23 21:54:38 +04:00
Nick Terrell 189e87bcbe [lib] Make lib compatible with `-Wfall-through` excepting legacy
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.

This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.

Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
Yann Collet fa2a4d77c7 constify MatchState* parameter when possible
turns out, it's possible to constify MatchState* parameter
in some parts of the binary tree algorithm,
making it a pure read-only parameter,
as opposed to a mutable state.

This is supposed to be helpful for both maintenance and the compiler.
2021-09-23 08:27:44 -07:00
senhuang42 1d8143c84f Move block splitter from stack to CCtx 2021-09-23 00:02:31 -04:00
sen 044c8b4722
Merge pull request #2779 from senhuang42/fse_fix
Fix NCountWriteBound
2021-09-22 13:51:21 -04:00
sen 1e99d36361
Merge pull request #2788 from senhuang42/param_switch
Use new paramSwitch enum for row matchfinder and block splitter
2021-09-22 13:27:55 -04:00
Nick Terrell 9450876a9d [huf] Fix compilation when DYNAMIC_BMI2=0 && BMI2 is supported
* Fix compilation issues pointed out in PR #2790.
* Add test cases to GitHub actions that test all combinations of
  `DYNAMIC_BMI2` BMI2 support.
2021-09-21 16:49:13 -07:00
senhuang42 06f42c3bfd Use new paramSwitch enum for LDM 2021-09-21 14:22:09 -04:00
senhuang42 b5c35d7ea3 Use new paramSwitch enum for LCM, row matchfinder, and block splitter 2021-09-21 14:22:02 -04:00
Nick Terrell a5f2c45528 Huffman ASM 2021-09-20 14:46:43 -07:00
Nick Terrell d7542aacd9 [fuzzer] Add huf_decompress fuzzer
Add a fuzzer for Huffman decompression. Fix several bugs in Huffman
decompression, mostly related to `op == NULL` and pointer underflow.
2021-09-17 15:00:49 -07:00
Nick Terrell 8bf699aa59 [build] Add support for ASM files in Make + CMake
* Extract out common portion of `lib/Makefile` into `lib/libzstd.mk`.
  Most relevantly, the way we find library files.
* Use `lib/libzstd.mk` in the other Makefiles instead of repeating the
  same code.
* Add a test `tests/test-variants.sh` that checks that the builds of
  `make -C programs allVariants` are correct, and run it in Actions.
* Adds support for ASM files in the CMake build.

The Meson build is not updated because it lists every file in zstd,
and supports ASM off the bat, so the Huffman ASM commit will just add
the ASM file to the list.

The Visual Studios build is not updated because I'm not adding ASM
support to Visual Studios yet.
2021-09-17 14:13:53 -07:00
sen 9d2a45a705
Merge pull request #2778 from senhuang42/opt_inlining_revert
Revert opt outlining change
2021-09-15 14:22:10 -04:00
Sen Huang a7aa2c5df6 Fix NCountWriteBound 2021-09-15 09:51:42 -07:00
Sen Huang bd84e4a9d3 Revert opt outlining change 2021-09-15 09:08:41 -07:00
Nick Terrell 2fabd370bb
Merge pull request #2777 from terrelln/oss-fuzz-fix
[rsyncable] Fix test failures
2021-09-14 13:20:22 -07:00
Nick Terrell 9d9e2ed00b [rsyncable] Fix test failures
Test failures showed up on the daily cron job. They didn't show up
in CI because the condition is somewhat rare, and didn't trigger
during the CI tests.

This PR fixes up the logic in `findSynchronizationPoint()` to correctly
handle the edge case. It also un-comments an assert that helps catch the
issue, and verify that rsyncable mode is calculating the correct hash.

After the fix, the test that failed passes:

```
./zstreamtest --newapi -t1 --no-big-tests -s9680
```
2021-09-14 12:28:53 -07:00
Yann Collet 2e6f5bc0d8
Merge pull request #2771 from facebook/opt_investigation
Improve optimal parser performance on small data
2021-09-14 10:36:34 -07:00
Nick Terrell d22bbed5db
Merge pull request #2776 from terrelln/oss-fuzz-fix
[rsyncable] Ensure ZSTD_compressBound() is respected
2021-09-14 09:37:43 -07:00
Yann Collet fd94b9d1c9 Merge branch 'dev' into opt_investigation 2021-09-14 01:15:51 -07:00
Nick Terrell a418b4e478 [rsyncable] Ensure ZSTD_compressBound() is respected
In degenerate cases `--rsyncable` could create very small blocks (1
byte). This causes the compressed output to be larger than
`ZSTD_compressBound()`. Fix the issue by ensuring that rsyncable mode
never outputs blocks smaller than 128 KB.

The minimum job size is 512 KB, so we shouldn't lose many
synchronization points from skipping any that cause blocks smaller than
128 KB. And even if we do, that is fine, because we'll find the next
one.

This fixes the `raw_dictionary_round_trip` oss-fuzz assert.

Credit to OSS-Fuzz
2021-09-13 17:14:07 -07:00
Sen Huang 1daf3c8dbc Use 32 buckets for log2 bucketing in huffman sort 2021-09-13 12:29:16 -04:00
Yann Collet f58e63bee7 Merge branch 'dev' into opt_investigation 2021-09-12 01:42:49 -07:00
Felix Handte d68aa19a2f
Merge pull request #2749 from felixhandte/zstd-fast-pipelined
Pipelined Implementation of ZSTD_fast (~+5% Speed)
2021-09-09 17:05:30 -04:00
Yann Collet b7f46ebc23 use ZSTD_memcpy() for better portability
notably within kernel space
2021-09-08 14:45:53 -07:00
Yann Collet 7fce9a41b5 change update rate to 12/11/11/11
better for large files, and sources with relatively "stable" entropy,
like silesia.tar.
slightly worse for files with rapidly changing entropy,
like Calgary.tar/.

Updated small files tests in fuzzer
2021-09-08 14:05:57 -07:00
Yann Collet ef78611c26 change update rate to 11/10/10/10
better for larger blocks,
very small inefficiency on small block.
2021-09-08 08:58:28 -07:00
Yann Collet 42a3ed752a removed frequency booster for stat initialization of btultra2
used to be necessary to counter-balance the fixed-weight frequency update
which has been recently changed for an adaptive rate (targeting stable starting frequency stats).
2021-09-08 07:56:43 -07:00
Yann Collet 08ceda3dfc new statistics update policy
small general compression ratio improvement for btopt+ strategies/
2021-09-04 00:52:44 -07:00
Yann Collet 23a9368c45 new starting offcode table for zstd_opt 2021-09-03 17:41:42 -07:00
Yann Collet 27a8bbe265 new initializer for ll price 2021-09-03 16:07:31 -07:00
Yann Collet f0fc8cb3e1 Disable console notification by default within the library
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
2021-09-03 13:44:07 -07:00
Yann Collet eab692211e removed pretty-print of sizes in benchmark
This is less appropriate for this mode :
benchmark is about accuracy,
it's important to read the exact values.
2021-09-03 12:51:02 -07:00
sen 71076b7a01
Merge pull request #2763 from senhuang42/opt_compiletime
Improve compile speed and binary size in `opt`
2021-09-02 11:59:02 -04:00
Yann Collet a8cf85ad0a
Merge pull request #2762 from facebook/level13
minor rebalancing of level 13
2021-09-01 20:32:53 -07:00
Sen Huang d88c1d95ce Remove inlining for opt 2021-09-01 16:58:57 -04:00
Yann Collet 70d89e5a12 minor rebalancing of level 13
This new setup is slighly better on `silesia.tar` :
Ratio : 3.649 -> 3.655
Speed : 11.9 MB/s -> 12.2 MB/s
At the cost of more memory : 24 MB -> 32 MB
The new memory budget is a reasonable interpolation between neighboring levels 12 and 14:
level 12 : 24 MB
level 13 : 32 MB (increased from 24 MB)
level 14 : 48 MB
Window size remains unaffected (4 MB)
2021-09-01 13:05:10 -07:00
senhuang42 414e24becf Add 8 bytes to FSE workspace 2021-09-01 15:56:33 -04:00
W. Felix Handte d6fd7761c9 Fix VS Build: Explicitly Cast to Narrow Ints 2021-09-01 14:15:04 -04:00
W. Felix Handte 15e67bfa7e Deduplicate Implementations
This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new
`ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is
functionally a no-op.
2021-09-01 14:15:04 -04:00
W. Felix Handte 64054dec44 Tweak Step 2021-09-01 14:15:04 -04:00
W. Felix Handte 24fcccd05c Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)
Unrolling the loop to handle 2 positions in each iteration allows us to reduce
the frequency of some operations that don't need to happen at every position.
One such operation is the step calculation, which is a very rough heuristic
anyways. It's fine if we do this a position later. The other operation is the
repcode check. But since the repcode check already tries expanding back one
position, we're really not missing much of importance by only trying it every
other position.

This commit also slightly reorders some operations.
2021-09-01 14:15:04 -04:00
W. Felix Handte 57a100f6dc Add `ip1 + 128` Prefetch; Tiny Cleanup 2021-09-01 14:15:04 -04:00
W. Felix Handte 991d660ea9 Nit: Only Store 2 Hash Variables 2021-09-01 14:15:04 -04:00
W. Felix Handte 8706bc115a Nit: Dedup idx0 and idx1 2021-09-01 14:15:04 -04:00
W. Felix Handte 7c24c3e6ce Give Up on Searching End of Block
Amusingly, it seems to be a non-trivial performance hit to add in final
searches or even hash table insertions during cleanup. So let's not. It seems
to not make any meaningful difference in compression ratio.
2021-09-01 14:15:03 -04:00
W. Felix Handte 35932ab2f1 Prefetch Input in Incompressible Sections (+0.25% Speed) 2021-09-01 14:15:03 -04:00
W. Felix Handte b092dd75b7 Shrink Pipeline from 4 Positions to 3 2021-09-01 14:15:03 -04:00
W. Felix Handte 387840af79 Re-Order Operations for Slightly Better Performance 2021-09-01 14:15:03 -04:00
W. Felix Handte bc768bccc0 Track Step Size Statefully, Rather than Recalculating Every Time 2021-09-01 14:15:03 -04:00
W. Felix Handte 80bc12b33a Initial Pipelined Implementation for ZSTD_fast 2021-09-01 14:15:03 -04:00
Yann Collet 74b4171fb8 fix alignment condition in FSE_buildCTable
2-bytes alignment is enough for 16-bit fields
2021-08-29 19:05:04 -07:00
Yann Collet 18a20b3ad7
Merge pull request #2752 from facebook/hashLog3max
make ZSTD_HASHLOG3_MAX private
2021-08-20 12:51:17 -07:00
Yann Collet 2de42174bb make ZSTD_HASHLOG3_MAX private
This is an implementation detail,
it doesn't belong to public space (zstd.h).
2021-08-20 09:52:42 -07:00
sen ae998544de
Merge pull request #2750 from senhuang42/sb_compress
Improve branch misses on FSE symbol spreading
2021-08-20 12:47:24 -04:00
senhuang42 da095ed899 Improve branch misses on FSE symbol spreading 2021-08-18 10:22:22 -07:00
Clément Chigot 399849e236 Makefile: add AIX support
For lib, AIX linker doesn't allow --soname.
2021-08-13 10:25:14 +02:00
Sen Huang 539b3aab9b Optimize 32-bit VecMask_next() 2021-08-04 17:14:58 -04:00
senhuang42 e411040ea1 Add 64 row entry support for lazy 2021-08-04 16:19:12 -04:00
senhuang42 31820e032c Rebalance clevels for lazy 2021-08-04 16:18:52 -04:00
senhuang42 aa1957477b Improve Huffman sorting algorithm 2021-08-04 12:43:34 -04:00
Nick Terrell 6ee70bae46
Merge pull request #2733 from terrelln/huf-cspeed
[HUF] Improve Huffman encoding speed
2021-08-03 12:59:54 -04:00
Nick Terrell d8a0797268 [fuzz] Add Huffman round trip fuzzer
* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
  - Incorrect weight comparison (weights are allowed to be equal to
    table log).
  - HUF_compress1X_usingCTable_internal() can return compressed
    size >= source size, so the assert that `cSize <= 65535` isn't
    correct, and it needs to be checked instead.
2021-08-03 08:10:06 -07:00
sen 5c46f62006
Merge pull request #2677 from senhuang42/ci_overhaul_2
[CI][2/2] Migrate CI tests which (currently) fail
2021-08-02 09:55:49 -04:00
Sen Huang 5ec7897a26 Fix static analyzer warnings 2021-07-29 09:11:12 -07:00
Nick Terrell 46f2710562 [HUF] Improve Huffman encoding speed
Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed MB/s (huf-cspeed - dev) |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
2021-07-27 15:10:35 -07:00
W. Felix Handte da58821ff2 Fix DDSS Load
This PR fixes an incorrect comparison in figuring out `minChain` in
`ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison
had been masked by the fact that `idx` was always 1, until @terrelln changed
that in #2726.

Credit-to: OSS-Fuzz
2021-07-27 11:49:44 -04:00
Nick Terrell ba044bd6f1 [bug-fix] Fix a determinism bug with the DUBT
The DUBT can be non-deterministic if an index is equal to
`ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the
indices at 2.

This bug was found by the OSS-Fuzz determinism fuzzer. With this change
the fuzzer test passes. And I've confirmed that this is the root cause,
not just hiding the problem.

Aside: This took me a long time to figure out, because I thought I had
tried this first thing. But, apparantly I messed it up, because when I
was going through it again with @felixhandte, I was pointing out that it
wasn't the case, but it turns out it was.

Credit to: OSS-Fuzz
2021-07-15 13:02:49 -07:00
makise-homura 3cd085cec3 Clarify no-tree-vectorize usage for ICC and LCC 2021-07-14 20:00:44 +03:00
makise-homura a5f518ae27 Change zstdcli's main() declaration due to -Wmain on some compilers 2021-07-14 19:55:47 +03:00
makise-homura d4ad02c721 Add support for MCST LCC compiler 2021-07-10 03:57:06 +03:00
binhdvo b3e372c171
Merge pull request #2717 from binhdvo/bootcamp
Proactively skip huffman compression based on sampling where non-comp…
2021-07-01 10:39:58 -04:00
Binh Vo dc5b693f1e Proactively skip huffman compression based on sampling where non-compressibility is suspected 2021-06-30 11:02:47 -04:00
Nick Terrell 609be382ac
Merge pull request #2719 from danlark1/danlark_iwyu
Include what you use in zstd_ldm_geartab
2021-06-29 16:53:10 -07:00
Nick Terrell 094b26081f
Merge pull request #2689 from danlark1/dev
Optimize zstd decompression by another x%
2021-06-29 11:34:36 -07:00
Danila Kutenin e855b78be6 Include what you use in zstd_ldm_geartab 2021-06-29 17:57:53 +01:00
Danila Kutenin 2c2c9e7dfd Add possible improvements for gcc-11 2021-06-29 09:06:47 +01:00
sen 45d707e908
Merge pull request #2715 from senhuang42/sequence_api_3
[RFC] Add internal API for converting ZSTD_Sequence into seqStore
2021-06-24 13:02:11 -04:00
senhuang42 76466dfadf Add simple API for converting ZSTD_Sequence into seqStore 2021-06-23 12:10:48 -04:00
Érico Nogueira 4d09952701 [lib] Fix libzstd.pc for lib-mt builds
Add the libzstd.pc target to the lib target in lib/Makefile, which makes
it inherit LDFLAGS_DYNLIB from the lib-mt target. This allows us to add
a Libs.private field to libzstd.pc which gets conditionally populated
with '-pthread'.

The 1.5.0 release notes mention that the static library isn't
multi-threaded by default, due to concern for people building static
binaries with libzstd:

   Now the dynamic library supports multi-threaded compression by
   default.  Note that this property is not extended to the static
   library because doing so would have impacted the build script of
   existing client applications (requiring them to add -pthread to their
   recipe), thus potentially breaking their build.

To get closer to being able to enable multi-threading for all library
builds by default, this commit makes it so that any libzstd consumer
using pkg-config gets the correct flags.

We also fix the indentation of the rule for libzstd.pc and move it
outside the if/endif block for install rules (which uses a list of OSs
where the rules were validated), so the rule is available for all users
of the 'lib*' targets.
2021-06-19 17:42:24 -03:00
Usuario 8bdce1ff97 lib/Makefile: Fix small typo in ZSTD_FORCE_DECOMPRESS_* build macros 2021-06-18 10:07:39 -04:00
binhdvo 0152435ab0
Merge pull request #2708 from binhdvo/skippable
Add API for fetching skippable frame content
2021-06-14 19:00:31 -04:00
Binh Vo 9d9f7680f8 Add API for fetching skippable frame content 2021-06-14 16:01:28 -04:00
Nick Terrell 05b6773fbc [fix] Add missing bounds checks during compression
* The block splitter missed a bounds check, so when the buffer is too small it
  passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which
  can then write the compressed data past the end of the buffer. This is a new
  regression in v1.5.0 when the block splitter is enabled. It is either enabled
  explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()`
  or `ZSTD_compressStream*()`.
* `HUF_writeCTable_wksp()` omits a bounds check when calling
  `HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass
  an erroneously large size to `HUF_compressWeights()`, which can then write
  past the end of the buffer. This bug has been present for ages. However, I
  believe that zstd cannot trigger the bug, because it never calls
  `HUF_compress*()` with `dstCapacity == 0` because of [this check][1].

Credit to: Oss-Fuzz

[1]: 89127e5ee2/lib/compress/zstd_compress_literals.c (L100)
2021-06-14 11:35:33 -07:00
sen d5f3568c4b
Merge pull request #2697 from senhuang42/entropy_repeat_fix
[bug] Fix entropy repeat mode bug
2021-06-10 16:39:17 +03:00
aqrit dd4f6aa9e6
Flatten ZSTD_row_getMatchMask (#2681)
* Flatten ZSTD_row_getMatchMask

* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary. 
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)

* replace memcpy with MEM_readST

* silence alignment warnings

* fix neon casts

* Update zstd_lazy.c

* unify simd preprocessor detection (#3)

* remove duplicate asserts

* tweak rotates

* improve endian detection

* add cast

there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension
but must be uint16_t to get gcc to generate a rotate instruction..

* more casts

* fix casts

better work-around for the (bogus) warning: unary minus on unsigned
2021-06-09 08:50:25 +03:00
Danila Kutenin 08a3ddbd28 Add comment for gcc-11 2021-06-08 20:54:21 +01:00
Danila Kutenin 6534c0000f Be C89 compliant and fix alignment for gcc11 2021-06-08 20:45:57 +01:00
Felix Handte 8a3bdfaa7b
Merge pull request #2654 from wolfpld/dev
Initialize "potentially uninitialized" pointers.
2021-06-07 13:04:19 -04:00
Sen Huang 923e5ad3f5 Fix entropy repeat mode bug 2021-06-07 00:32:03 -07:00
Danila Kutenin 444f4db955 Move declaration of 1 to an inlined cast 2021-05-29 20:55:37 +01:00
Danila Kutenin a80d268700 Optimize ZSTD_decodeSequence by another x% 2021-05-29 18:21:10 +01:00
senhuang42 939276cd0c Add ldm and block splitter auto-enable to old api 2021-05-24 13:09:32 -04:00
Nick Terrell 746f7976ab [trace] Refine the ZSTD_HAVE_WEAK_SYMBOLS detection
* Only enable for ELF on x86-64 or i386.
* Also explicitly disable for AIX.

Fixes #2658.
2021-05-18 20:22:36 -07:00
Yann Collet 02ece5d59f
Merge pull request #2653 from TrianglesPCT/dev
Enable SSE2 compression path to work on MSVC
2021-05-17 11:20:50 -07:00
Dan Nelson 54f78e3df8 ZSTD_VecMask_next: fix incorrect variable name in fallback code path 2021-05-15 10:20:37 -05:00
TrianglesPCT bee0ef5647
Update zstd_lazy.c
It put the changes back when I tried to make a separate pull request, i don't understand githubs interface at all.
2021-05-14 19:23:13 -06:00
TrianglesPCT d688ab1e0c
Add files via upload
AVX2
2021-05-14 19:18:12 -06:00
TrianglesPCT bb1cdd8c63
Update zstd_lazy.c
add space
2021-05-14 19:11:28 -06:00
TrianglesPCT a62856bf65
Update zstd_lazy.c
Remove the AVX2 part
2021-05-14 19:10:24 -06:00
TrianglesPCT 8f7ea1afeb
Update zstd_lazy.c
Switch to other comment style
2021-05-14 19:02:34 -06:00
TrianglesPCT 0e071214b5
Update zstd_lazy.c
switch to unaligned load as I don't know if buffer will always be aligned to 32 bytes, and compilers aside from MSVC might actually use aligned loads
2021-05-14 17:03:30 -06:00
TrianglesPCT 69ac124b12
Update zstd_lazy.c 2021-05-14 16:53:19 -06:00
TrianglesPCT 0b9f4bb0ff
Update zstd_lazy.c
use 8bit
2021-05-14 16:47:24 -06:00
Bartosz Taudul 7012c6e7a4
Initialize "potentially uninitialized" pointers. 2021-05-15 00:40:49 +02:00
TrianglesPCT 77d54eb3b3
Add files via upload 2021-05-14 16:40:32 -06:00
TrianglesPCT 52f44bb365
Add files via upload
msvc
2021-05-14 16:33:07 -06:00
TrianglesPCT 25bda9053a
Add files via upload
msvc suport
avx2 path
2021-05-14 16:32:04 -06:00
Stephen Kitt e81d567547
Distinguish static symbols, allow hiding them
Even with -fvisibility=hidden added to CFLAGS, any symbol which is
given a default visibility attribute ends up exported in the dynamic
library. This happens through zstd_internal.h which defines
..._STATIC_LINKING_ONLY before including various header files, and is
included for example in lib/common/pool.c.

To avoid this, this patch distinguishes static and non-static APIs, by
using ZSTDLIB_API only for the latter, and introducing
ZSTDLIB_STATIC_API for the former. For now, both are exported, but
non-static APIs can be hidden by overriding the definition
ZSTDLIB_STATIC_API. lib/Makefile is modified to allow this using

	make CPPFLAGS_DYNLIB=-DZSTDLIB_STATIC_API=ZSTDLIB_HIDDEN

In addition, API declarations are dropped from zstd_compress.c (they
aren't needed there).

Signed-off-by: Stephen Kitt <steve@sk2.org>
2021-05-14 19:41:59 +02:00
Nick Terrell 03c4111299 [lib] Fix dictionary invalidation logic
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.

This can cause non-determinism because prefix mode and ext-dict mode
match finders can return different results. It can also hurt speed
because ext-dict match finders are slower.

The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
   never updated.
4. We will call the ext-dict match finder instead of the prefix one.
2021-05-13 17:05:59 -07:00
Nick Terrell 10b35b312b [lib] Fix off-by-one error in repcode checks
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it cause non-determinism.
2021-05-13 17:05:59 -07:00
Nick Terrell 91c9a247b6 [lib] Fix determinism bug in the optimal parser
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.

40def70387/lib/compress/zstd_opt.c (L476)

This optimization is based off the length longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.

The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
2021-05-13 17:05:59 -07:00
Yann Collet 8fae35591e Merge branch 'dev' of github.com:facebook/zstd into dev 2021-05-12 13:12:30 -07:00
Yann Collet cb0cad9b79 reduce Max nb Workers to 64 in 32-bit mode
and restored limit to 256 when in 64-bit mode
(it was reduced to 200 to give more room for 32-bit).

This should fix test instability issues
using lot of threads in 32-bit environments.
2021-05-12 13:10:25 -07:00
sen c730b8c5a3
Remove const data members in threadpooltest payload (#2639) (#2640) 2021-05-12 16:09:48 -04:00