decomp: add prefetch for matched seq on aarch64 (#3164)
`match` is the source of the sequence copy that follows. It is only updated when extDict is needed, which is a low-probability case, so it can be prefetched to reduce cache misses.

Benchmarks on various Arm platforms showed an uplift of 1% to 14% with gcc-11/clang-14.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
parent 85d633042d
commit 558cf20d0d
@@ -967,6 +967,11 @@ size_t ZSTD_execSequence(BYTE* op,
 
     assert(op != NULL /* Precondition */);
     assert(oend_w < oend /* No underflow */);
+
+#if defined(__aarch64__)
+    /* prefetch sequence starting from match that will be used for copy later */
+    PREFETCH_L1(match);
+#endif
     /* Handle edge cases in a slow path:
      *   - Read beyond end of literals
      *   - Match end is within WILDCOPY_OVERLIMIT of oend
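For readers unfamiliar with the pattern, the following is a minimal standalone sketch of the idea in the hunk above: issue a read prefetch for the match source before the literal copy, so the line is more likely to be in cache when the match copy starts. It is not zstd's code. `SKETCH_PREFETCH_L1` and `sketch_exec_sequence` are hypothetical names introduced for illustration, the macro is assumed to map to the GCC/Clang `__builtin_prefetch` builtin, and the sketch ignores the extDict, wildcopy, and bounds-checking paths that the real `ZSTD_execSequence` handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PREFETCH_L1; an assumption, not zstd's definition. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_PREFETCH_L1(ptr) __builtin_prefetch((ptr), 0 /* read */, 3 /* high temporal locality */)
#else
#  define SKETCH_PREFETCH_L1(ptr) ((void)(ptr)) /* no-op on other compilers */
#endif

/* Execute one simplified sequence: copy litLength literal bytes to op, then
 * copy matchLength bytes starting `offset` bytes before the end of those
 * literals. Returns the number of bytes written.
 * The caller must guarantee that the output buffer is large enough and that
 * the match source lies inside data already written (or about to be written
 * by the literal copy). */
static size_t sketch_exec_sequence(unsigned char* op,
                                   const unsigned char* lit, size_t litLength,
                                   size_t offset, size_t matchLength)
{
    unsigned char* const oLitEnd = op + litLength;
    const unsigned char* const match = oLitEnd - offset;
    size_t i;

    assert(op != NULL);
    assert(offset >= 1);

    /* Hint only: start fetching the match source before it is needed. */
    SKETCH_PREFETCH_L1(match);

    /* Literal copy; the prefetch can proceed in the background meanwhile. */
    memcpy(op, lit, litLength);

    /* Byte-wise match copy so overlapping matches (offset < matchLength)
     * repeat the pattern correctly. */
    for (i = 0; i < matchLength; i++) {
        oLitEnd[i] = match[i];
    }
    return litLength + matchLength;
}
```

Calling `sketch_exec_sequence(out, "abc", 3, 3, 6)` on an empty 16-byte buffer writes `abcabcabc` and returns 9. The prefetch is purely a hint and does not change the result, which is why the commit can restrict it to aarch64 without affecting correctness elsewhere.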