9577 Commits

Author SHA1 Message Date
Danila Kutenin
e11783b04d [lazy] Optimize ZSTD_row_getMatchMask for level 8-10
We found that movemask is not used properly or consumes too much CPU.
This effort helps to optimize the movemask emulation on ARM.

For level 8-9 we saw 3-5% improvements. For level 10 we saw 1.5%
improvement.

The key idea is not to use pure movemasks but to have groups of bits.
For rowEntries == 16, 32 we are going to have groups of size 4 and 2
respectively. It means that each bit will be duplicated within the group.

Then we AND the mask so that only one bit is set per group, so that
iteration by clearing the lowest set bit, `a &= (a - 1)`, still works.

Also, aarch64 does not have rotate instructions for 16 bit, only for 32
and 64, that's why we see more improvements for level 8-9.

vshrn_n_u16 instruction is used to achieve that: vshrn_n_u16 shifts by
4 every u16 and narrows to 8 lower bits. See the picture below. It's
also used in
[Folly](c570259008/folly/container/detail/F14Table.h (L446)).
It also uses 2 cycles according to Neoverse-N{1,2} guidelines.

64 bit movemask is already well optimized. We have ongoing experiments
but have not been able to validate that other implementations are reliably faster.
2022-05-22 10:44:24 +00:00
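The grouped-bit idea in e11783b04d can be illustrated with a portable scalar sketch for the rowEntries == 16 case. This is not the NEON code from the commit: the helper names (`grouped_match_mask16`, `one_bit_per_group`, `collect_matches`, `ctz64`) are hypothetical, the loop merely stands in for the vector compare plus `vshrn_n_u16` narrowing, and the choice of which bit to keep in each group (the top one, via `0x8888...`) is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Count trailing zero bits. Precondition: x != 0. */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Scalar stand-in for the vector compare + narrowing step (hypothetical):
 * entry i owns the 4-bit group at bits [4*i, 4*i+3], and all four bits
 * of the group are set when tags[i] == tag. */
static uint64_t grouped_match_mask16(const uint8_t tags[16], uint8_t tag)
{
    uint64_t mask = 0;
    for (int i = 0; i < 16; i++) {
        if (tags[i] == tag) {
            mask |= (uint64_t)0xF << (4 * i);
        }
    }
    return mask;
}

/* Keep exactly one bit per 4-bit group (assumed here: the top bit). */
static uint64_t one_bit_per_group(uint64_t grouped)
{
    return grouped & 0x8888888888888888ULL;
}

/* Walk the matches with the "clear lowest set bit" idiom.
 * Writes matching entry indices to out[] and returns how many there are. */
static size_t collect_matches(const uint8_t tags[16], uint8_t tag,
                              unsigned out[16])
{
    uint64_t a = one_bit_per_group(grouped_match_mask16(tags, tag));
    size_t n = 0;
    while (a != 0) {
        out[n++] = ctz64(a) >> 2; /* bit position / group size = entry index */
        a &= a - 1;               /* drop the match just visited */
    }
    return n;
}
```

Because only one bit survives per group, each `a &= a - 1` step removes exactly one match, and dividing the bit position by the group size (a shift by 2 here) recovers the entry index. With 32 entries and groups of size 2, the same sketch would use a shift by 1 and a two-bit group pattern.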
Yann Collet
fda537b299
Merge pull request #3135 from averred/dev
Typo in man
2022-05-20 10:05:16 -07:00
Talha Khan
14894d63c1 Typo in man 2022-05-20 16:53:48 +08:00
Elliot Gorokhovsky
f349d18776
Merge pull request #3127 from embg/repcode_history
Correct and clarify repcode offset history logic
2022-05-12 13:50:15 -04:00
Elliot Gorokhovsky
3620a0a565 Nits 2022-05-12 12:53:15 -04:00
Felix Handte
8af64f4116
Merge pull request #3129 from felixhandte/zstd-fast-nodict-unconditional-ip1-table-write
ZSTD_fast_noDict: Avoid Safety Check When Writing `ip1` into Table
2022-05-11 17:04:02 -04:00
W. Felix Handte
1bc8019e10 Update results.csv 2022-05-11 10:27:35 -07:00
W. Felix Handte
1dd046a507 Fix Comments Slightly 2022-05-11 12:38:45 -04:00
W. Felix Handte
cd1f582943 Hoist Hash Table Writes Up into Each Match Found Block
Refactoring this way avoids the bad write in the case that `step > 4`, and
is a bit more straightforward. It also seems to perform better!
2022-05-11 11:27:34 -04:00
W. Felix Handte
040986a4f4 ZSTD_fast_noDict: Minimize Checks When Writing Hash Table for ip1
This commit avoids checking whether a hashtable write is safe in two of the
three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This
produces a ~0.5% speed-up in compression.

A comment in the code describes why we can skip this check in the other two
paths (the repcode check and the first match check in the unrolled loop).

A downside is that in the new position where we make this check, we have not
yet computed `mLength`. We therefore have to avoid writing *possibly* dangerous
positions, rather than the old check which only avoids writing *actually*
dangerous positions. This leads to a minuscule loss in ratio (remember that
this scenario can only be triggered in very negative levels or under
incompressibility acceleration).
2022-05-10 14:29:39 -07:00
Elliot Gorokhovsky
22875ece61 Nits 2022-05-09 21:01:38 -04:00
Elliot Gorokhovsky
97aabc496e Correct and clarify repcode offset history logic 2022-05-09 21:01:38 -04:00
Elliot Gorokhovsky
8bf32de850
Merge pull request #3126 from embg/fix_freebsd_ci
Unbreak FreeBSD CI
2022-05-09 19:48:13 -04:00
Elliot Gorokhovsky
83049cb3fe Unbreak FreeBSD CI 2022-05-09 18:28:03 -04:00
Elliot Gorokhovsky
7915c1164e
Merge pull request #3114 from embg/fast_extdict_pipeline2
Software pipeline for ZSTD_compressBlock_fast_extDict
2022-05-05 15:06:47 -04:00
Elliot Gorokhovsky
3be9a81e46 Update results.csv 2022-05-04 16:05:37 -04:00
Yann Collet
ea763f33cb
Merge pull request #3122 from eli-schwartz/betterlinkage
meson: for internal linkage, link to both libzstd and a static copy of it
2022-05-02 10:56:37 -07:00
Eli Schwartz
6548ec7440
meson: for internal linkage, link to both libzstd and a static copy of it
Partial, Meson-only implementation of #2976 for non-MSVC builds.

Due to the prevalence of private symbol reuse, linking to a shared
library is simply utterly unreliable, but we still want to defer to the
shared library for installable applications. By linking to both, we can
share symbols where possible, and statically link where needed.

This means we no longer need to manually track every file that needs to
be extracted and reused.

The flip side is that MSVC does not support this at all, so for MSVC
builds we just link to a full static copy even when
-Ddefault_library=shared is set.

As a side benefit, by using library inclusion rather than including
extra explicit object files, the zstd program shrinks in size slightly
(~4kb).
2022-04-28 21:57:02 -04:00
Eli Schwartz
8d522b8a9d
meson: avoid rebuilding some libzstd sources in the programs
These need to be explicitly included as we use their private symbols,
but we don't need to recompile them when we can reuse the existing
objects.

Minus 7 compile steps.
2022-04-28 21:56:36 -04:00
Eli Schwartz
df6eefb3bb
meson: avoid rebuilding some libzstd files in the test programs
The poolTests program already linked to libzstd, and later to
libtestcommon with included libzstd objects. So this was redundant.

Minus 4 compile steps.
2022-04-28 21:56:36 -04:00
Elliot Gorokhovsky
ac371be27b Remove hasStep variant (not enough wins to justify the code size increase) 2022-04-28 18:06:24 -04:00
Elliot Gorokhovsky
ce6b69f5c5 Final nit 2022-04-28 14:49:45 -04:00
Elliot Gorokhovsky
6a2e1f7c69 Revert "Hardcode repcode safety check, fix cosmetic nits"
This reverts commit 518cb83833074d304dfcaa93cfc16039ea4683c8.
2022-04-27 18:16:21 -04:00
Elliot Gorokhovsky
518cb83833 Hardcode repcode safety check, fix cosmetic nits 2022-04-26 17:54:25 -04:00
Yann Collet
86bd977a79
Merge pull request #3117 from cuishuang/dev
fix some typos
2022-04-26 10:02:04 -07:00
cuishuang
05796796fd fix some typos
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-04-26 17:40:23 +08:00
Elliot Gorokhovsky
809f652912 Optimize repcode predicate, hardcode hasStep == 0 scenario, cosmetic fixes 2022-04-20 14:40:52 -04:00
Yann Collet
66633f9386
Merge pull request #3039 from eli-schwartz/meson
Meson fixups for Windows
2022-04-19 15:51:19 -07:00
Yann Collet
f1faab6720
Merge pull request #3112 from facebook/man2
updated man page, providing more details for --train mode
2022-04-19 15:36:47 -07:00
Elliot Gorokhovsky
2820efe7ec Nits 2022-04-19 11:39:52 -04:00
Elliot Gorokhovsky
3536262f70 Port noDict pipeline 2022-04-15 12:16:16 -04:00
Yann Collet
eb726c6a20 updated man pages
had to run the conversion script on Ubuntu, as it doesn't run correctly on macOS anymore.
2022-04-13 18:57:27 -07:00
Yann Collet
0df2fd6088 updated man page, providing more details for --train mode
following questions from #3111.

Note: only the source markdown has been updated,
the actual man page zstd.1 still needs to be processed.
2022-04-13 18:51:59 -07:00
Yann Collet
460780f804
Merge pull request #3094 from dirkmueller/usage_cleanup
Split help in long and short version, cleanup formatting
2022-04-05 07:09:54 -07:00
Yann Collet
e4cd9bbd88
Merge pull request #3108 from paulmenzel/fix-typo-in-zstd.1
Remove superfluous *not* in description of `--long[=#]` in zstd(1)
2022-04-03 23:12:06 -07:00
Paul Menzel
f133bc8c9c zstd.1: Remove superfluous *not* in description of --long[=#]
Resolves: https://github.com/facebook/zstd/issues/3101
2022-04-03 07:29:51 +02:00
Elliot Gorokhovsky
3e6bbdd847
Disable visual-2015 tests (#3106) 2022-03-31 12:26:20 -04:00
Yann Collet
455c2c21e6
Merge pull request #3103 from facebook/fix45586
fix minor bug in sequence_compression_api tester
2022-03-30 10:08:57 -07:00
Yann Collet
678bfff4fe fix minor bug in sequence_compression_api tester
margin was merely slightly too short for extra splitting.
2022-03-29 16:45:09 -07:00
Dirk Müller
7fbe60d577
Split help in long and short version, cleanup formatting
Adopt the more standard Usage: formatting style
List short and long options alongside where available
Print lists as a table
Use command style description
2022-03-29 12:57:47 +02:00
Nick Terrell
f229daaf42
Merge pull request #3052 from dirkmueller/gzip_keep
Keep original file if -c or --stdout is given
2022-03-28 10:35:21 -07:00
Elliot Gorokhovsky
64efba4c5e
Software pipeline for ZSTD_compressBlock_fast_dictMatchState (#3086)
* prefetch dict content inside loop

* ip0/ip1 pipeline

* add L2_4 prefetch to dms pipeline

* Remove L1 prefetch

* Remove L2 prefetching

* Reduce # of gotos

* Cosmetic fixes

* Check final position sometimes

* Track step size as in bc768bc

* Fix nits
2022-03-17 12:35:11 -04:00
Nick Terrell
eadb6c874f
Merge pull request #3095 from dpelle/typo-and-grammar-fixes
Typo and grammar fixes
2022-03-14 09:17:21 -07:00
Dominique Pelle
3a64aa29a6 One more mistake (Node -> Note) 2022-03-13 00:08:55 +01:00
Dominique Pelle
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
Nick Terrell
05fc7c78c8
Merge pull request #3088 from cyberknight777/dev
[contrib][linux] Fix a warning in zstd_reset_cstream()
2022-03-11 10:01:11 -08:00
Nick Terrell
c3a89ef60c
Merge pull request #3093 from dirkmueller/cli_tests_fixup
Handle newer less versions in zstdless testing
2022-03-10 10:41:52 -08:00
Dirk Müller
7a3997c21a
Handle newer less versions in zstdless testing
Newer less versions appear to have changed how error messages
are shown on stderr and stdout. Hardcode the
expected behavior to make the tests pass with any less version.

Also set locale to C so that the strings are matching.
2022-03-10 09:47:33 +01:00
Cyber Knight
498ac8238d
[contrib][linux] Make zstd_reset_cstream() functionally identical to ZSTD_resetCStream()
- As suggested by Nick Terrell, the zstd maintainer in the Linux kernel, making zstd_reset_cstream() functionally identical to ZSTD_resetCStream() would be the perfect way to fix the warning without touching any core functions or breaking other parts of the code.

Suggested-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2022-03-10 15:32:13 +08:00
Nick Terrell
6a8fba9e5f
Merge pull request #3092 from terrelln/2022-03-09-decoder-errata-doc
[doc] Add decompressor errata document
2022-03-09 15:10:29 -08:00