zstd/compress at e11783b04d1c49678bb4f95a4ecaa26323bd823d - zstd - Final Minetest

facebook/zstd


Danila Kutenin e11783b04d [lazy] Optimize ZSTD_row_getMatchMask for level 8-10

We found that movemask is not used properly or consumes too much CPU.
This effort helps to optimize the movemask emulation on ARM.

For level 8-9 we saw 3-5% improvements. For level 10 we saw a 1.5%
improvement.

The key idea is not to use pure movemasks but to have groups of bits.
For rowEntries == 16 and 32 we use groups of size 4 and 2,
respectively, so each bit is duplicated within its group.

We then AND the mask so that only one bit per group stays set, which
keeps iteration by clearing the lowest set bit, `a &= (a - 1)`, working
as before.
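
A minimal scalar sketch of the grouped-mask idea for rowEntries == 16, not the code from this commit: the helper names `grouped_mask16`, `ctz64` and `collect_matches` are hypothetical, and the loop stands in for the vectorized comparison. It shows that after the AND, each matching entry owns exactly one bit, so `a &= (a - 1)` still visits every match once and the entry index is the bit position divided by the group size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: builds a 64-bit mask with one 4-bit group per entry.
 * A matching entry first gets all 4 bits of its group set (the "duplicated"
 * bits), then the AND keeps only the lowest bit of each group. */
static uint64_t grouped_mask16(const uint8_t tags[16], uint8_t needle)
{
    uint64_t mask = 0;
    for (int i = 0; i < 16; i++) {
        if (tags[i] == needle) {
            mask |= (uint64_t)0xF << (4 * i);
        }
    }
    return mask & 0x1111111111111111ULL;
}

/* Portable count-trailing-zeros; the caller must pass x != 0. */
static unsigned ctz64(uint64_t x)
{
    unsigned n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Walks the mask by clearing the lowest set bit each step.
 * Writes matching entry indices to out and returns how many were found. */
static size_t collect_matches(uint64_t mask, unsigned out[16])
{
    size_t n = 0;
    while (mask != 0) {
        out[n++] = ctz64(mask) >> 2;   /* bit position / group size 4 */
        mask &= mask - 1;              /* drop the lowest set bit */
    }
    return n;
}
```

For groups of size 2 (rowEntries == 32) the same sketch would AND with `0x5555555555555555` and shift the bit position right by 1 instead of 2.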

Also, aarch64 does not have rotate instructions for 16 bit, only for 32
and 64, that's why we see more improvements for level 8-9.

The vshrn_n_u16 intrinsic is used to achieve that: it shifts every u16
right by 4 and narrows the result to its lower 8 bits. See the picture
below. It's also used in
[Folly](c570259008/folly/container/detail/F14Table.h (L446)).
It also uses 2 cycles according to Neoverse-N{1,2} guidelines.
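
To make the shift-and-narrow step concrete, here is a scalar emulation, assuming the input is the 16-byte result of a byte-wise compare (each byte 0xFF or 0x00) viewed as eight little-endian u16 lanes. The function name `shrn4_emulated` is hypothetical and the loop is only a stand-in for what `vshrn_n_u16(v, 4)` computes on real NEON hardware. Each input byte ends up contributing one 4-bit group to the 64-bit result.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for vshrn_n_u16(v, 4) applied to a
 * 16-byte compare result: returns the eight narrowed bytes packed
 * into one 64-bit value, lane 0 in the lowest byte. */
static uint64_t shrn4_emulated(const uint8_t cmp[16])
{
    uint64_t out = 0;
    for (int lane = 0; lane < 8; lane++) {
        /* Combine two neighbouring bytes into one little-endian u16 lane. */
        uint16_t v = (uint16_t)(cmp[2 * lane] | (cmp[2 * lane + 1] << 8));
        /* Shift right by 4, then keep only the lower 8 bits. */
        uint8_t narrowed = (uint8_t)(v >> 4);
        out |= (uint64_t)narrowed << (8 * lane);
    }
    return out;
}
```

With this layout, byte i of the compare result maps to bits 4*i through 4*i+3 of the output, which is the group-of-4 mask described above for 16 entries.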

The 64-bit movemask is already well optimized. We have ongoing experiments
but have not been able to confirm that other implementations are reliably faster.

2022-05-22 10:44:24 +00:00


clevels.h

Revert "Limit ZSTD_maxCLevel to 21 for 32-bit binaries."

2021-12-20 11:43:14 -08:00

fse_compress.c

Move bitwise builtins into bits.h

2022-02-14 11:16:03 -05:00

hist.c

[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files

2021-03-30 10:30:43 -07:00

hist.h

[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files

2021-03-30 10:30:43 -07:00

huf_compress.c

Move bitwise builtins into bits.h

2022-02-14 11:16:03 -05:00

zstd_compress_internal.h

Move bitwise builtins into bits.h

2022-02-14 11:16:03 -05:00

zstd_compress_literals.c

fix 44239

2022-02-01 10:49:38 -08:00

zstd_compress_literals.h

Proactively skip huffman compression based on sampling where non-compressibility is suspected

2021-06-30 11:02:47 -04:00

zstd_compress_sequences.c

Typo and grammar fixes

2022-03-12 08:58:04 +01:00

zstd_compress_sequences.h

[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files

2021-03-30 10:30:43 -07:00

zstd_compress_superblock.c

Typo and grammar fixes

2022-03-12 08:58:04 +01:00

zstd_compress_superblock.h

[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files

2021-03-30 10:30:43 -07:00

zstd_compress.c

Typo and grammar fixes

2022-03-12 08:58:04 +01:00

zstd_cwksp.h

only declare debug functions in debug mode

2022-01-26 14:47:24 -08:00

zstd_double_fast.c

Nits

2022-05-12 12:53:15 -04:00

zstd_double_fast.h

[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files

2021-03-30 10:30:43 -07:00

zstd_fast.c

Merge pull request #3127 from embg/repcode_history

2022-05-12 13:50:15 -04:00

zstd_fast.h

[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files

2021-03-30 10:30:43 -07:00

zstd_lazy.c

[lazy] Optimize ZSTD_row_getMatchMask for level 8-10

2022-05-22 10:44:24 +00:00

zstd_lazy.h

Add and integrate lazy row hash strategy

2021-04-07 09:53:34 -07:00

zstd_ldm_geartab.h

Include what you use in zstd_ldm_geartab

2021-06-29 17:57:53 +01:00

zstd_ldm.c

Typo and grammar fixes

2022-03-12 08:58:04 +01:00

zstd_ldm.h

Use new paramSwitch enum for LCM, row matchfinder, and block splitter

2021-09-21 14:22:02 -04:00

zstd_opt.c

Merge pull request #2965 from facebook/offbase

2022-01-24 15:47:42 -08:00

zstd_opt.h

[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files

2021-03-30 10:30:43 -07:00

zstdmt_compress.c

Typo and grammar fixes

2022-03-12 08:58:04 +01:00

zstdmt_compress.h

Documentation and minor refactor to clarify MT memory management.

2022-01-18 09:43:05 -07:00