decomp: add prefetch for matched seq on aarch64 (#3164)

match is the source of the sequence copy that
follows. It is only updated when extDict is
needed, which is a low-probability case, so it
can be prefetched early to reduce cache misses.

Benchmarks on various Arm platforms showed an
uplift of 1% to 14% with gcc-11 and clang-14.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
Jun He 2022-07-30 01:27:20 +08:00 committed by GitHub
parent 85d633042d
commit 558cf20d0d
GPG Key ID: 4AEE18F83AFDEB23
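
The idea in the commit message can be sketched outside of zstd as follows. This is a minimal illustration, not the library's code: `SKETCH_PREFETCH_L1`, `sketch_seq`, and `sketch_exec_sequence` are hypothetical names, the macro assumes the GCC/Clang `__builtin_prefetch` builtin rather than zstd's own `PREFETCH_L1` definition, and the extDict and wildcopy paths are left out entirely. It only shows the ordering: compute the match address, issue the prefetch, do the literal copy, then read from the match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a prefetch macro (assumption: GCC/Clang builtin).
 * Arguments: address, 0 = prefetch for read, 3 = high temporal locality. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_PREFETCH_L1(ptr) __builtin_prefetch((ptr), 0, 3)
#else
#  define SKETCH_PREFETCH_L1(ptr) ((void)(ptr))
#endif

/* Hypothetical, simplified sequence: copy litLength literal bytes, then
 * copy matchLength bytes starting `offset` bytes before the write position. */
typedef struct {
    size_t litLength;
    size_t matchLength;
    size_t offset;
} sketch_seq;

/* Executes one sequence at dst[dstPos]. Returns the number of bytes written,
 * or 0 if the sequence does not fit or the offset is invalid. */
static size_t sketch_exec_sequence(unsigned char* dst, size_t dstPos,
                                   size_t dstCapacity,
                                   const unsigned char* lit, sketch_seq seq)
{
    assert(dst != NULL);
    if (dstPos > dstCapacity) return 0;
    if (seq.litLength > dstCapacity - dstPos) return 0;
    if (seq.matchLength > dstCapacity - dstPos - seq.litLength) return 0;
    if (seq.offset == 0 || seq.offset > dstPos + seq.litLength) return 0;

    unsigned char* op = dst + dstPos;
    const unsigned char* match = dst + (dstPos + seq.litLength - seq.offset);

    /* Ask for the match bytes early, so the load can overlap with the
     * literal copy below instead of stalling the match copy. */
    SKETCH_PREFETCH_L1(match);

    if (seq.litLength > 0) memcpy(op, lit, seq.litLength);
    op += seq.litLength;

    /* Byte-wise copy so that overlapping matches (offset < matchLength)
     * repeat the pattern, as LZ77-style decoders require. */
    for (size_t i = 0; i < seq.matchLength; i++) op[i] = match[i];

    return seq.litLength + seq.matchLength;
}
```

A prefetch is only a hint, so the result is the same with or without it; any gain depends on the target core and has to be measured, as the benchmarks quoted above were.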


@@ -967,6 +967,11 @@ size_t ZSTD_execSequence(BYTE* op,
assert(op != NULL /* Precondition */);
assert(oend_w < oend /* No underflow */);
#if defined(__aarch64__)
/* prefetch sequence starting from match that will be used for copy later */
PREFETCH_L1(match);
#endif
/* Handle edge cases in a slow path:
* - Read beyond end of literals
* - Match end is within WILDCOPY_OVERLIMIT of oend