Nick Terrell
14772d97be
Merge pull request #2796 from terrelln/linux-fixes
...
[lib] Make lib compatible with `-Wfall-through` excepting legacy
2021-09-23 16:11:53 -07:00
Nick Terrell
d7ef97a013
[build] Fix oss-fuzz build with the dataflow sanitizer
...
The dataflow sanitizer requires all code to be instrumented. We can't
instrument the ASM function, so we have to disable it.
2021-09-23 11:48:39 -07:00
Nick Terrell
189e87bcbe
[lib] Make lib compatible with -Wfall-through excepting legacy
...
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.
This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.
Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
Nick Terrell
a5f2c45528
Huffman ASM
2021-09-20 14:46:43 -07:00
makise-homura
3cd085cec3
Clarify no-tree-vectorize usage for ICC and LCC
2021-07-14 20:00:44 +03:00
makise-homura
a5f518ae27
Change zstdcli's main() declaration due to -Wmain on some compilers
2021-07-14 19:55:47 +03:00
makise-homura
d4ad02c721
Add support for MCST LCC compiler
2021-07-10 03:57:06 +03:00
aqrit
dd4f6aa9e6
Flatten ZSTD_row_getMatchMask ( #2681 )
...
* Flatten ZSTD_row_getMatchMask
* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary.
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577 )
* replace memcpy with MEM_readST
* silence alignment warnings
* fix neon casts
* Update zstd_lazy.c
* unify simd preprocessor detection (#3 )
* remove duplicate asserts
* tweak rotates
* improve endian detection
* add cast
there is a fun little catch-22 with gcc: the result from pmovmskb has to be cast to uint32_t to avoid a zero-extension, but must be uint16_t to get gcc to generate a rotate instruction.
* more casts
* fix casts
better work-around for the (bogus) warning: unary minus on unsigned
2021-06-09 08:50:25 +03:00
Nick Terrell
a494308ae9
[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
...
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Nick Terrell
54a4998a80
Add basic tracing functionality
2021-02-05 16:28:52 -08:00
Nick Terrell
66e811d782
[license] Update year to 2021
2021-01-04 17:53:52 -05:00
Nick Terrell
caecd8c211
Allow user to override ASAN/MSAN detection
...
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER, and likewise for
MEMORY_SANITIZER. Also set them to 0/1 instead of checking whether they are defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
2020-09-24 19:42:04 -07:00
Nick Terrell
9ae0483858
Reorganize zstd_deps.h and mem.h + replace mem.h for the kernel
2020-09-24 19:41:59 -07:00
Nick Terrell
260fc75028
Move __has_builtin() fallback define to compiler.h
2020-09-24 15:51:08 -07:00
Nick Terrell
4d63ee57f5
Move ASAN/MSAN support declarations to compiler.h
2020-09-24 15:51:08 -07:00
Nick Terrell
e3bda594ae
Prefer __builtin_prefetch over inline asm
...
Reorder the ifdefs for the PREFETCH macros so that the compiler builtin is
favored over the inline assembly for aarch64.
2020-08-10 22:17:18 -07:00
Niadb
493fd40dca
Add files via upload
2020-07-28 02:52:15 -06:00
Niadb
74f65f624c
Update compiler.h
...
clean wording
2020-06-19 09:51:00 -06:00
Niadb
8c115cbe23
Update compiler.h
...
Added a comment explaining the purpose of the WIN_CDECL macro
2020-06-19 09:48:35 -06:00
Niadb
2962fda93f
Add files via upload
2020-06-19 03:34:05 -06:00
Niadb
a4c8aa5e02
Add files via upload
2020-06-19 03:31:47 -06:00
W. Felix Handte
952427aebf
Avoid inline Keyword in C90
...
Previously we would use it for all gcc-like compilations, even when a
restrictive mode that disallows it had been selected.
2020-05-04 10:59:15 -04:00
Nick Terrell
5fcbc484c8
Merge pull request #2040 from caoyzh/dev-2
...
Optimize by prefetching on aarch64
2020-04-08 13:14:47 -07:00
Nick Terrell
ac58c8d720
Fix copyright and license lines
...
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized
The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.
The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
caoyzh
7201980650
Optimize by prefetching on aarch64
2020-03-14 15:25:59 +08:00
Bimba Shrestha
85d0efd619
Removing no-tree-vectorize for intel
2020-03-05 10:02:48 -08:00
Nick Terrell
718f00ff6f
Optimize decompression speed for gcc and clang ( #1892 )
...
* Optimize `ZSTD_decodeSequence()`
* Optimize Huffman decoding
* Optimize `ZSTD_decompressSequences()`
* Delete `ZSTD_decodeSequenceLong()`
2019-11-25 18:26:19 -08:00
Nick Terrell
5cb7615f1f
Add UNUSED_ATTR to ZSTD_storeSeq()
2019-09-20 21:37:13 -07:00
Carl Woffenden
901ea61f83
Tweaks to create a single-file decoder
...
The CHECK_F macros differ slightly (but eventually do the same thing). Older GCC needs to fall back on the old-style pragma optimisation flags.
2019-08-21 17:49:17 +02:00
Joseph Chen
3855bc4295
Add support for IAR C/C++ Compiler for Arm
2019-07-29 15:25:58 +08:00
mgrice
812e8f2a16
perf improvements for zstd decode ( #1668 )
...
* perf improvements for zstd decode
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c to invoke wildcopy handles the latter well, but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off
via attribute and flag. I found that neither the attribute nor the command-line flag was entirely successful in turning off vectorization, which is why both are used.
silesia benchmark data - second triad of each file is with the original code:
file orig compressedratio encode decode change
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Josh Soref
a880ca239b
Spelling ( #1582 )
...
* spelling: accidentally
* spelling: across
* spelling: additionally
* spelling: addresses
* spelling: appropriate
* spelling: assumed
* spelling: available
* spelling: builder
* spelling: capacity
* spelling: compiler
* spelling: compressibility
* spelling: compressor
* spelling: compression
* spelling: contract
* spelling: convenience
* spelling: decompress
* spelling: description
* spelling: deflate
* spelling: deterministically
* spelling: dictionary
* spelling: display
* spelling: eliminate
* spelling: preemptively
* spelling: exclude
* spelling: failure
* spelling: independence
* spelling: independent
* spelling: intentionally
* spelling: matching
* spelling: maximum
* spelling: meaning
* spelling: mishandled
* spelling: memory
* spelling: occasionally
* spelling: occurrence
* spelling: official
* spelling: offsets
* spelling: original
* spelling: output
* spelling: overflow
* spelling: overridden
* spelling: parameter
* spelling: performance
* spelling: probability
* spelling: receives
* spelling: redundant
* spelling: recompression
* spelling: resources
* spelling: sanity
* spelling: segment
* spelling: series
* spelling: specified
* spelling: specify
* spelling: subtracted
* spelling: successful
* spelling: return
* spelling: translation
* spelling: update
* spelling: unrelated
* spelling: useless
* spelling: variables
* spelling: variety
* spelling: verbatim
* spelling: verification
* spelling: visited
* spelling: warming
* spelling: workers
* spelling: with
2019-04-12 11:18:11 -07:00
W. Felix Handte
9d5f3963ff
Add Option to Not Request Inlining with ZSTD_NO_INLINE
2018-12-18 13:36:39 -08:00
Yann Collet
626040ab53
changed PREFETCH() macro into PREFETCH_L2()
...
which is more accurate
2018-11-12 17:05:32 -08:00
Yann Collet
9126da5b5c
improve long-range decoder speed
...
on enwik9 at level 22 (which is almost a worst case scenario),
speed improves by +7% on my laptop (415 -> 445 MB/s)
2018-11-08 12:47:46 -08:00
Yann Collet
5512400677
updated code comments, based on @terrelln review
2018-09-13 16:44:04 -07:00
Yann Collet
2618253da2
fixed PREFETCH() macro
...
for corner cases and platforms without this instruction
2018-09-12 16:15:37 -07:00
Yann Collet
4de344d505
added conditional prefetch
...
depending on amount of work to do.
2018-09-12 10:29:47 -07:00
Yann Collet
63a519dbf6
implemented first prefetch
...
based on dictID.
dictContent is prefetched up to 32 KB
(no contentSize adaptation)
2018-09-11 17:23:44 -07:00
Yann Collet
bbd78df59b
add build macro NO_PREFETCH
...
prevent usage of prefetch intrinsic commands
which are not supported by c2rust
(see https://github.com/immunant/c2rust/issues/13 )
2018-07-06 17:06:04 -07:00
fbrosson
291824f49d
__builtin_prefetch probably did not exist before gcc 3.1.
2018-05-18 18:40:11 +00:00
taigacon
2c3ad05812
Fix the problem that enables DYNAMIC_BMI2 macro by mistake on ARM architecture with Clang ( #1110 )
2018-04-23 15:41:50 -07:00
Yann Collet
ad15c1b724
added __has_attribute() define for non-clang compilers
2018-03-23 19:04:48 -07:00
Yann Collet
52ca7c6c56
make DYNAMIC_BMI2 support of clang conditional to __has_attribute()
...
to support older clang versions such as 3.4
2018-03-23 18:45:42 -07:00
Yann Collet
d02b44cf55
DYNAMIC_BMI2 enabled for clang
...
clang only claims compatibility with gcc 4.2.
Consequently, a recent patch which reserved DYNAMIC_BMI2 for gcc >= 4.8
also disabled it for clang.
fix: __clang__ is now enough to enable DYNAMIC_BMI2
(associated with other existing conditions : x64/x64, !bmi2)
2018-03-04 16:05:59 -08:00
Yann Collet
45b09e7625
limit DYNAMIC_BMI2 to gcc >= 4.8
...
attribute bmi2 not supported by gcc 4.4
2018-03-01 15:02:18 -08:00
Nick Terrell
4319132312
[decompress] Support BMI2
2018-02-13 17:00:15 -08:00
Yann Collet
3128e03be6
updated license header
...
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Yann Collet
32fb407c9d
updated a bunch of headers
...
for the new license
2017-08-18 16:52:05 -07:00
Nick Terrell
565e925eb7
[libzstd] Fix FORCE_INLINE macro
2017-08-14 21:12:05 -07:00