679 Commits

Author SHA1 Message Date
Yann Collet
518f06b281 added minimum for decoder buffer
also : introduced macro BOUNDED()
2021-10-26 08:21:31 -07:00
binhdvo
6a7ede3dfc
Reduce size of dctx by reutilizing dst buffer (#2751)
* Reduce size of dctx by reutilizing dst buffer

Co-authored-by: Binh Vo <binhvo@fb.com>
2021-10-25 10:38:01 -04:00
Yann Collet
9d62957b31
Merge pull request #2800 from animalize/fix_c89
Fix a C89 error in msvc
2021-10-18 14:32:04 -07:00
Ma Lin
894f05e88d Fix ZSTD_countTrailingZeros() bug
`>> 3` is wrong.
2021-09-29 07:20:09 +08:00
Ma Lin
ae986fcdb8 Use __assume(0) for unreachable code path in msvc
msvc will optimize away the condition check.
2021-09-27 19:23:57 +08:00
Ma Lin
e5ba858270 Don't initialize the first parameter of _BitScanForward* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64
2021-09-25 16:36:53 +08:00
Ma Lin
95f492ea17 Don't initialize the first parameter of _BitScanReverse* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64
2021-09-25 16:36:53 +08:00
Nick Terrell
14772d97be
Merge pull request #2796 from terrelln/linux-fixes
[lib] Make lib compatible with `-Wfall-through` excepting legacy
2021-09-23 16:11:53 -07:00
Nick Terrell
d7ef97a013 [build] Fix oss-fuzz build with the dataflow sanitizer
The dataflow sanitizer requires all code to be instrumented. We can't
instrument the ASM function, so we have to disable it.
2021-09-23 11:48:39 -07:00
Nick Terrell
189e87bcbe [lib] Make lib compatible with -Wfall-through excepting legacy
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.

This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.

Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
Nick Terrell
a5f2c45528 Huffman ASM 2021-09-20 14:46:43 -07:00
senhuang42
414e24becf Add 8 bytes to FSE workspace 2021-09-01 15:56:33 -04:00
senhuang42
da095ed899 Improve branch misses on FSE symbol spreading 2021-08-18 10:22:22 -07:00
senhuang42
aa1957477b Improve Huffman sorting algorithm 2021-08-04 12:43:34 -04:00
Nick Terrell
6ee70bae46
Merge pull request #2733 from terrelln/huf-cspeed
[HUF] Improve Huffman encoding speed
2021-08-03 12:59:54 -04:00
Nick Terrell
d8a0797268 [fuzz] Add Huffman round trip fuzzer
* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
  - Incorrect weight comparison (weights are allowed to be equal to
    table log).
  - HUF_compress1X_usingCTable_internal() can return compressed
    size >= source size, so the assert that `cSize <= 65535` isn't
    correct, and it needs to be checked instead.
2021-08-03 08:10:06 -07:00
Nick Terrell
46f2710562 [HUF] Improve Huffman encoding speed
Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed MB/s (huf-cspeed - dev) |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
2021-07-27 15:10:35 -07:00
makise-homura
3cd085cec3 Clarify no-tree-vectorize usage for ICC and LCC 2021-07-14 20:00:44 +03:00
makise-homura
a5f518ae27 Change zstdcli's main() declaration due to -Wmain on some compilers 2021-07-14 19:55:47 +03:00
makise-homura
d4ad02c721 Add support for MCST LCC compiler 2021-07-10 03:57:06 +03:00
binhdvo
b3e372c171
Merge pull request #2717 from binhdvo/bootcamp
Proactively skip huffman compression based on sampling where non-comp…
2021-07-01 10:39:58 -04:00
Binh Vo
dc5b693f1e Proactively skip huffman compression based on sampling where non-compressibility is suspected 2021-06-30 11:02:47 -04:00
Nick Terrell
094b26081f
Merge pull request #2689 from danlark1/dev
Optimize zstd decompression by another x%
2021-06-29 11:34:36 -07:00
aqrit
dd4f6aa9e6
Flatten ZSTD_row_getMatchMask (#2681)
* Flatten ZSTD_row_getMatchMask

* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary. 
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)

* replace memcpy with MEM_readST

* silence alignment warnings

* fix neon casts

* Update zstd_lazy.c

* unify simd preprocessor detection (#3)

* remove duplicate asserts

* tweak rotates

* improve endian detection

* add cast

there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension
but must be uint16_t to get gcc to generate a rotate instruction..

* more casts

* fix casts

better work-around for the (bogus) warning: unary minus on unsigned
2021-06-09 08:50:25 +03:00
Danila Kutenin
444f4db955 Move declaration of 1 to an inlined cast 2021-05-29 20:55:37 +01:00
Danila Kutenin
a80d268700 Optimize ZSTD_decodeSequence by another x% 2021-05-29 18:21:10 +01:00
Nick Terrell
746f7976ab [trace] Refine the ZSTD_HAVE_WEAK_SYMBOLS detection
* Only enable for ELF on x86-64 or i386.
* Also explicitly disable for AIX.

Fixes #2658.
2021-05-18 20:22:36 -07:00
Bernhard M. Wiedemann
28d0120b5a Avoid SIGBUS on armv6
When running armv6 userspace on armv8 hardware with a 64 bit Linux kernel,
the mode 2 caused SIGBUS (unaligned memory access).
Running all our arm builds in the build farm
only on armv8 simplifies administration a lot.

Depending on compiler and environment, this change might slow down
memory accesses (did not benchmark it). The original analysis is 6 years old.

Fixes #2632
2021-05-11 17:51:03 +02:00
Nick Terrell
09149beaf8 [1.5.0] Move zstd_errors.h and zdict.h to lib/ root
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
2021-04-30 15:13:54 -07:00
Nick Terrell
6cee3c2c4f [trace] Remove default definitions of weak symbols
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
2021-04-26 16:05:39 -07:00
Sen Huang
550f76f131 Correct the detection of mismatched repcodes 2021-04-09 09:08:51 -07:00
Nick Terrell
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Sen Huang
b1a43455f8 Add enum for representing long length ID 2021-03-26 10:41:09 -07:00
Nick Terrell
f8ac0ea7ef
Merge pull request #2539 from terrelln/linux-kernel-fixes
Fixes for the next linux kernel patch version
2021-03-24 10:34:29 -07:00
Nick Terrell
634bfd339f [FSE] Clean up workspace using dynamically sized struct 2021-03-22 11:07:07 -07:00
Nick Terrell
756bd59322 [huf][fse] Clean up workspaces
* Move `counting` to a struct in `FSE_decompress_wksp_body()`
* Fix error code in `FSE_decompress_wksp_body()`
* Rename a variable in `HUF_ReadDTableX2_Workspace`
2021-03-17 16:50:37 -07:00
Nick Terrell
cd1551d261 [lib][tracing] Add ZSTD_NO_TRACE macro
When defined, it disables tracing, and avoids including the header.
2021-03-16 11:47:27 -07:00
Nick Terrell
3b1aba42cc [fse] Reduce stack usage of FSE_decompress_wksp() by 512 bytes
* Move `counting` into the workspace
* Inrease `HUF_DECOMPRESS_WORKSPACE_SIZE` by 512 bytes
2021-03-05 13:24:27 -08:00
Nick Terrell
b5fd348a85
Merge pull request #2523 from terrelln/huf-stack-reduction
Add HUF_writeCTable_wksp() function
2021-03-05 12:35:09 -08:00
Nick Terrell
5df2a21f1e Add HUF_writeCTable_wksp() function
This saves ~700 bytes of stack space in HUF_writeCTable.
2021-03-05 10:29:18 -08:00
Yann Collet
029f974ddc strengthen compilation flags 2021-03-02 15:43:52 -08:00
Nick Terrell
f520f6dfbe [trace] Minor fixes found during integration
* Mark `ZSTD_CCtx_getParameter()` as const
* Add `extern "C"` guards to `zstd_trace.h`
2021-02-11 16:20:04 -08:00
Nick Terrell
e59c9459a5 [trace] Keep track of a uint64_t tracing context
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.

This keeps the simple case simple.
2021-02-09 11:37:05 -08:00
Nick Terrell
54a4998a80 Add basic tracing functionality 2021-02-05 16:28:52 -08:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Nick Terrell
ae85676d44 Fix alignment of scratchBuffer in HUF_compressWeights()
The scratch buffer must be 4-byte aligned. This causes test failures in
32-bit systems, where the stack isn't aligned.

Fixes Issue #2428.
2020-12-17 14:30:27 -08:00
Yann Collet
6132df8dd3 fix gcc-10 strict aliasing warnings
by exposing HUF_CElt declaration.
2020-12-04 16:43:19 -08:00
Yann Collet
68c14bdff2 minor speed improvement to HUF_readCTable()
faster by ~+1-2%
2020-12-04 16:33:39 -08:00
Nick Terrell
7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
Nick Terrell
c74be3f6de [lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set
Adds the validation of the input/output buffers only. They are still
unused.
2020-10-30 10:55:34 -07:00