1939 Commits

Author SHA1 Message Date
W. Felix Handte
15e67bfa7e Deduplicate Implementations
This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new
`ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is
functionally a no-op.
2021-09-01 14:15:04 -04:00
W. Felix Handte
64054dec44 Tweak Step 2021-09-01 14:15:04 -04:00
W. Felix Handte
24fcccd05c Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)
Unrolling the loop to handle 2 positions in each iteration allows us to reduce
the frequency of some operations that don't need to happen at every position.
One such operation is the step calculation, which is a very rough heuristic
anyway. It's fine if we do this a position later. The other operation is the
repcode check. But since the repcode check already tries expanding back one
position, we're really not missing much of importance by only trying it every
other position.

This commit also slightly reorders some operations.
2021-09-01 14:15:04 -04:00
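The trade-off described in 24fcccd05c can be illustrated with a small sketch. It is not the zstd implementation: `rep_match4()` and `find_rep_every_other()` are hypothetical names, and the scan only looks for a 4-byte match at a fixed repeat offset `rep`. The check runs at every other position, and a one-byte backward extension recovers a match that actually begins at the skipped position.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the 4 bytes at p equal the 4 bytes located `rep` bytes earlier. */
static int rep_match4(const unsigned char *p, size_t rep)
{
    return memcmp(p, p - rep, 4) == 0;
}

/* Hypothetical scan: probes the repeat offset only at every other position,
 * then tries to extend the match back by one byte.
 * Returns the start index of the first match found, or `len` if none. */
static size_t find_rep_every_other(const unsigned char *buf, size_t len, size_t rep)
{
    size_t pos;
    if (rep == 0 || len < 4 || rep > len - 4) return len;
    for (pos = rep; pos + 4 <= len; pos += 2) {
        if (rep_match4(buf + pos, rep)) {
            /* The match may have started at the position we skipped. */
            if (pos > rep && buf[pos - 1] == buf[pos - 1 - rep]) pos--;
            return pos;
        }
    }
    return len;
}
```

With the buffer `"xabcdefabcdefg"` and `rep = 6`, the match begins at index 7, a skipped position. The probe at index 8 still finds it and the backward step returns 7. What this scheme can miss is a match of exactly 4 bytes that starts on a skipped position.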
W. Felix Handte
57a100f6dc Add ip1 + 128 Prefetch; Tiny Cleanup 2021-09-01 14:15:04 -04:00
W. Felix Handte
991d660ea9 Nit: Only Store 2 Hash Variables 2021-09-01 14:15:04 -04:00
W. Felix Handte
8706bc115a Nit: Dedup idx0 and idx1 2021-09-01 14:15:04 -04:00
W. Felix Handte
7c24c3e6ce Give Up on Searching End of Block
Amusingly, it seems to be a non-trivial performance hit to add in final
searches or even hash table insertions during cleanup. So let's not. It seems
to not make any meaningful difference in compression ratio.
2021-09-01 14:15:03 -04:00
W. Felix Handte
35932ab2f1 Prefetch Input in Incompressible Sections (+0.25% Speed) 2021-09-01 14:15:03 -04:00
W. Felix Handte
b092dd75b7 Shrink Pipeline from 4 Positions to 3 2021-09-01 14:15:03 -04:00
W. Felix Handte
387840af79 Re-Order Operations for Slightly Better Performance 2021-09-01 14:15:03 -04:00
W. Felix Handte
bc768bccc0 Track Step Size Statefully, Rather than Recalculating Every Time 2021-09-01 14:15:03 -04:00
W. Felix Handte
80bc12b33a Initial Pipelined Implementation for ZSTD_fast 2021-09-01 14:15:03 -04:00
Yann Collet
74b4171fb8 fix alignment condition in FSE_buildCTable
2-byte alignment is enough for 16-bit fields
2021-08-29 19:05:04 -07:00
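For illustration, an alignment test of the kind this commit relaxes might look like the sketch below. `is_aligned()` is a hypothetical helper, not a zstd function. The point is only that a table of 16-bit fields needs its base address to be a multiple of 2, not of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `ptr` is a multiple of `align` bytes.
 * `align` must be a power of two. */
static int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr & (uintptr_t)(align - 1)) == 0;
}
```

A caller holding a workspace pointer for 16-bit entries would check `is_aligned(workSpace, sizeof(uint16_t))`. Requiring 4-byte alignment would reject buffers that are perfectly usable.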
Yann Collet
18a20b3ad7
Merge pull request #2752 from facebook/hashLog3max
make ZSTD_HASHLOG3_MAX private
2021-08-20 12:51:17 -07:00
Yann Collet
2de42174bb make ZSTD_HASHLOG3_MAX private
This is an implementation detail,
it doesn't belong to public space (zstd.h).
2021-08-20 09:52:42 -07:00
sen
ae998544de
Merge pull request #2750 from senhuang42/sb_compress
Improve branch misses on FSE symbol spreading
2021-08-20 12:47:24 -04:00
senhuang42
da095ed899 Improve branch misses on FSE symbol spreading 2021-08-18 10:22:22 -07:00
Sen Huang
539b3aab9b Optimize 32-bit VecMask_next() 2021-08-04 17:14:58 -04:00
senhuang42
e411040ea1 Add 64 row entry support for lazy 2021-08-04 16:19:12 -04:00
senhuang42
31820e032c Rebalance clevels for lazy 2021-08-04 16:18:52 -04:00
senhuang42
aa1957477b Improve Huffman sorting algorithm 2021-08-04 12:43:34 -04:00
Nick Terrell
6ee70bae46
Merge pull request #2733 from terrelln/huf-cspeed
[HUF] Improve Huffman encoding speed
2021-08-03 12:59:54 -04:00
Nick Terrell
d8a0797268 [fuzz] Add Huffman round trip fuzzer
* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
  - Incorrect weight comparison (weights are allowed to be equal to
    table log).
  - HUF_compress1X_usingCTable_internal() can return compressed
    size >= source size, so the assert that `cSize <= 65535` isn't
    correct, and it needs to be checked instead.
2021-08-03 08:10:06 -07:00
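The second fix replaces an assertion with a runtime check. A minimal sketch of that pattern follows, assuming the size is stored in a 16-bit little-endian field. `write_size16()` is a hypothetical helper, not the actual Huffman code.

```c
#include <stddef.h>

/* Hypothetical helper: stores cSize in a 16-bit little-endian field.
 * Returns 1 on success, or 0 if cSize cannot be represented, so the caller
 * can fall back instead of relying on an assert that may not hold. */
static int write_size16(unsigned char *dst, size_t cSize)
{
    if (cSize == 0 || cSize > 65535) return 0;
    dst[0] = (unsigned char)(cSize & 0xFF);
    dst[1] = (unsigned char)(cSize >> 8);
    return 1;
}
```

An assert disappears in release builds, so a value that can legitimately exceed the limit has to be rejected by code that always runs.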
sen
5c46f62006
Merge pull request #2677 from senhuang42/ci_overhaul_2
[CI][2/2] Migrate CI tests which (currently) fail
2021-08-02 09:55:49 -04:00
Sen Huang
5ec7897a26 Fix static analyzer warnings 2021-07-29 09:11:12 -07:00
Nick Terrell
46f2710562 [HUF] Improve Huffman encoding speed
Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speedup (huf-cspeed vs. dev)  |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
2021-07-27 15:10:35 -07:00
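The last column of the table is the relative change from `dev` to `huf-cspeed`. A small sketch of that arithmetic (`speedup_percent()` is a hypothetical helper):

```c
/* Relative speed change in percent, going from `base` MB/s to `changed` MB/s. */
static double speedup_percent(double base, double changed)
{
    return (changed - base) / base * 100.0;
}
```

For the first `compress_literals` row (gcc, level_1, enwik7), `speedup_percent(761.83, 912.76)` is about 19.8, matching the table.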
W. Felix Handte
da58821ff2 Fix DDSS Load
This PR fixes an incorrect comparison in figuring out `minChain` in
`ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison
had been masked by the fact that `idx` was always 1, until @terrelln changed
that in #2726.

Credit-to: OSS-Fuzz
2021-07-27 11:49:44 -04:00
Nick Terrell
ba044bd6f1 [bug-fix] Fix a determinism bug with the DUBT
The DUBT can be non-deterministic if an index is equal to
`ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the
indices at 2.

This bug was found by the OSS-Fuzz determinism fuzzer. With this change
the fuzzer test passes. And I've confirmed that this is the root cause,
not just hiding the problem.

Aside: This took me a long time to figure out, because I thought I had
tried this first thing. But, apparently I messed it up, because when I
was going through it again with @felixhandte, I was pointing out that it
wasn't the case, but it turns out it was.

Credit to: OSS-Fuzz
2021-07-15 13:02:49 -07:00
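The fix can be pictured with a short sketch. The value `1` for the mark and the names `UNSORTED_MARK`, `FIRST_VALID_INDEX`, `collides_with_mark()` and `stored_index()` are assumptions made for illustration. Only `ZSTD_DUBT_UNSORTED_MARK` and the starting index of 2 come from the commit message.

```c
/* Assumed value, for illustration only. */
#define UNSORTED_MARK     1u
/* From the commit message: indices now start at 2. */
#define FIRST_VALID_INDEX 2u

/* A stored value is ambiguous if a real position index equals the mark. */
static int collides_with_mark(unsigned index)
{
    return index == UNSORTED_MARK;
}

/* Maps a 0-based position to a stored index. Starting at FIRST_VALID_INDEX
 * keeps the first positions clear of the mark (wraparound is ignored here). */
static unsigned stored_index(unsigned position)
{
    return position + FIRST_VALID_INDEX;
}
```

If indices started at 1 under this assumption, the first position would be indistinguishable from the "unsorted" sentinel, and the result would depend on how that slot happened to be interpreted.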
binhdvo
b3e372c171
Merge pull request #2717 from binhdvo/bootcamp
Proactively skip huffman compression based on sampling where non-comp…
2021-07-01 10:39:58 -04:00
Binh Vo
dc5b693f1e Proactively skip huffman compression based on sampling where non-compressibility is suspected 2021-06-30 11:02:47 -04:00
Nick Terrell
609be382ac
Merge pull request #2719 from danlark1/danlark_iwyu
Include what you use in zstd_ldm_geartab
2021-06-29 16:53:10 -07:00
Danila Kutenin
e855b78be6 Include what you use in zstd_ldm_geartab 2021-06-29 17:57:53 +01:00
sen
45d707e908
Merge pull request #2715 from senhuang42/sequence_api_3
[RFC] Add internal API for converting ZSTD_Sequence into seqStore
2021-06-24 13:02:11 -04:00
senhuang42
76466dfadf Add simple API for converting ZSTD_Sequence into seqStore 2021-06-23 12:10:48 -04:00
Nick Terrell
05b6773fbc [fix] Add missing bounds checks during compression
* The block splitter missed a bounds check, so when the buffer is too small it
  passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which
  can then write the compressed data past the end of the buffer. This is a new
  regression in v1.5.0 when the block splitter is enabled. It is either enabled
  explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()`
  or `ZSTD_compressStream*()`.
* `HUF_writeCTable_wksp()` omits a bounds check when calling
  `HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass
  an erroneously large size to `HUF_compressWeights()`, which can then write
  past the end of the buffer. This bug has been present for ages. However, I
  believe that zstd cannot trigger the bug, because it never calls
  `HUF_compress*()` with `dstCapacity == 0` because of [this check][1].

Credit to: OSS-Fuzz

[1]: 89127e5ee2/lib/compress/zstd_compress_literals.c (L100)
2021-06-14 11:35:33 -07:00
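Both bugs share a pattern: a size is derived from the destination capacity without first checking that the capacity is large enough. A minimal sketch of the `dstCapacity == 0` case follows, assuming one byte is reserved for a header before the payload is written. The struct and function names are hypothetical.

```c
#include <stddef.h>

typedef struct {
    int ok;            /* 0 if the destination is too small */
    size_t remaining;  /* capacity left for the payload when ok == 1 */
} PayloadCapacity;

/* Hypothetical helper: capacity left after reserving a 1-byte header. */
static PayloadCapacity capacity_after_header(size_t dstCapacity)
{
    PayloadCapacity r = {0, 0};
    /* Without this check, dstCapacity - 1 wraps around to SIZE_MAX when
     * dstCapacity == 0, and the callee believes it has a huge buffer. */
    if (dstCapacity < 1) return r;
    r.ok = 1;
    r.remaining = dstCapacity - 1;
    return r;
}
```

Because `size_t` is unsigned, the subtraction never goes negative. It silently becomes a very large number, which is why the check has to come before the arithmetic.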
sen
d5f3568c4b
Merge pull request #2697 from senhuang42/entropy_repeat_fix
[bug] Fix entropy repeat mode bug
2021-06-10 16:39:17 +03:00
aqrit
dd4f6aa9e6
Flatten ZSTD_row_getMatchMask (#2681)
* Flatten ZSTD_row_getMatchMask

* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary.
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)

* replace memcpy with MEM_readST

* silence alignment warnings

* fix neon casts

* Update zstd_lazy.c

* unify simd preprocessor detection (#3)

* remove duplicate asserts

* tweak rotates

* improve endian detection

* add cast

There is a fun little catch-22 with gcc: the result from pmovmskb has to be cast to uint32_t to avoid a zero-extension,
but it must be uint16_t to get gcc to generate a rotate instruction.

* more casts

* fix casts

better work-around for the (bogus) warning: unary minus on unsigned
2021-06-09 08:50:25 +03:00
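The "scalar path using SWAR" item refers to SIMD-within-a-register: comparing several tag bytes at once with ordinary integer arithmetic. The sketch below is a generic illustration of that technique for 8 tags, not the code from this commit. `match_mask8()` is a hypothetical name, and the final bit gathering uses a plain loop for clarity.

```c
#include <stdint.h>
#include <string.h>

/* Returns a bitmask with bit i set when tags[i] == tag. */
static unsigned match_mask8(const unsigned char tags[8], unsigned char tag)
{
    const uint64_t lo7 = 0x7F7F7F7F7F7F7F7FULL;
    unsigned char bytes[8];
    uint64_t chunk, x, hit;
    unsigned mask = 0;
    int i;

    memcpy(&chunk, tags, sizeof chunk);          /* load 8 tags in one word */
    x = chunk ^ (0x0101010101010101ULL * tag);   /* matching bytes become 0x00 */
    /* Exact zero-byte test: each byte of `hit` is 0x80 if the byte of x is
     * zero, else 0x00. The per-byte add cannot carry into the next byte. */
    hit = ~(((x & lo7) + lo7) | x | lo7);

    /* Copy back out in memory order, so the result does not depend on
     * the byte order of the machine. */
    memcpy(bytes, &hit, sizeof bytes);
    for (i = 0; i < 8; i++) mask |= (unsigned)(bytes[i] >> 7) << i;
    return mask;
}
```

For tags `{1, 2, 3, 2, 5, 6, 7, 2}` and `tag = 2`, the result has bits 1, 3 and 7 set, that is `0x8A`.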
Felix Handte
8a3bdfaa7b
Merge pull request #2654 from wolfpld/dev
Initialize "potentially uninitialized" pointers.
2021-06-07 13:04:19 -04:00
Sen Huang
923e5ad3f5 Fix entropy repeat mode bug 2021-06-07 00:32:03 -07:00
senhuang42
939276cd0c Add ldm and block splitter auto-enable to old api 2021-05-24 13:09:32 -04:00
Yann Collet
02ece5d59f
Merge pull request #2653 from TrianglesPCT/dev
Enable SSE2 compression path to work on MSVC
2021-05-17 11:20:50 -07:00
Dan Nelson
54f78e3df8 ZSTD_VecMask_next: fix incorrect variable name in fallback code path 2021-05-15 10:20:37 -05:00
TrianglesPCT
bee0ef5647
Update zstd_lazy.c
It put the changes back when I tried to make a separate pull request; I don't understand GitHub's interface at all.
2021-05-14 19:23:13 -06:00
TrianglesPCT
d688ab1e0c
Add files via upload
AVX2
2021-05-14 19:18:12 -06:00
TrianglesPCT
bb1cdd8c63
Update zstd_lazy.c
add space
2021-05-14 19:11:28 -06:00
TrianglesPCT
a62856bf65
Update zstd_lazy.c
Remove the AVX2 part
2021-05-14 19:10:24 -06:00
TrianglesPCT
8f7ea1afeb
Update zstd_lazy.c
Switch to other comment style
2021-05-14 19:02:34 -06:00
TrianglesPCT
0e071214b5
Update zstd_lazy.c
switch to unaligned load as I don't know if buffer will always be aligned to 32 bytes, and compilers aside from MSVC might actually use aligned loads
2021-05-14 17:03:30 -06:00
TrianglesPCT
69ac124b12
Update zstd_lazy.c 2021-05-14 16:53:19 -06:00
TrianglesPCT
0b9f4bb0ff
Update zstd_lazy.c
use 8bit
2021-05-14 16:47:24 -06:00