Fix ZSTD_execSequence() performance regression

Commit ae1cb3b3d0 caused the regression.
It is an instruction alignment issue: declaring the loop counter as
`U64 i` instead of `U32 i` brings the regression back. This patch fixes
the regression with gcc, but recovers only part of the lost performance
with clang.

Benchmarks:
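
Run on `silesia.tar`. Only levels 1-5 are shown because the performance
regression was uniform across all levels. I also did one run on levels
1-19 and it looked good. `Before` is before the regressing commit,
`While` is with the regression present, and `After` is with this patch.
Higher is better.
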
| Build | Level | Before | While | After |
|-------|-------|-------:|------:|------:|
| gcc | 1 | 931.4 | 904.4 | 932.8 |
| gcc | 2 | 849.1 | 822.6 | 851.2 |
| gcc | 3 | 815.6 | 790.6 | 818.9 |
| gcc | 4 | 794.1 | 770.7 | 798.0 |
| gcc | 5 | 785.7 | 760.7 | 788.8 |
| clang | 1 | 705.5 | 683.2 | 693.8 |
| clang | 2 | 670.0 | 649.2 | 660.7 |
| clang | 3 | 659.6 | 639.8 | 651.4 |
| clang | 4 | 652.5 | 634.7 | 645.9 |
| clang | 5 | 646.9 | 625.5 | 637.7 |
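
To put numbers on "fixes gcc, only partly recovers clang", the table rows can be reduced to two ratios. The following is a minimal sketch; the function names `relative_change_pct` and `recovered_fraction` are hypothetical helpers introduced here for illustration and are not part of zstd.

```c
#include <stddef.h>

/* Hypothetical helper: relative change of `value` versus `baseline`,
 * in percent. A negative result means `value` is slower. */
double relative_change_pct(double baseline, double value)
{
    return (value - baseline) / baseline * 100.0;
}

/* Hypothetical helper: share of the regression that the patch wins back.
 * 0.0 means nothing was recovered, 1.0 means back to the `before` speed,
 * and values above 1.0 mean the patched build is faster than `before`.
 * Assumes before != during. */
double recovered_fraction(double before, double during, double after)
{
    return (after - during) / (before - during);
}
```

For the level 1 rows, `relative_change_pct(931.4, 904.4)` is about -2.9 for gcc and `relative_change_pct(705.5, 683.2)` is about -3.2 for clang. `recovered_fraction(931.4, 904.4, 932.8)` is slightly above 1.0 for gcc, while `recovered_fraction(705.5, 683.2, 693.8)` is a little under 0.5 for clang.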
Branch: dev
Parent: ee5b725823
Commit: 10bfd0c0d5
@@ -887,7 +887,8 @@ size_t ZSTD_execSequence(BYTE* op,
             sequence.matchLength -= length1;
             match = base;
             if (op > oend_w) {
-                while (op < oMatchEnd) *op++ = *match++;
+                U32 i;
+                for (i = 0; i < sequence.matchLength; ++i) op[i] = match[i];
                 return sequenceLength;
             }
     }   }
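
The two loop forms touched by this hunk can be compared in isolation. The sketch below is an illustration only: `copy_ptr_loop` and `copy_indexed_loop` are hypothetical names, the `BYTE` and `U32` typedefs are stand-ins for zstd's own, and the assertion on the length is an assumption added here to make the 32-bit counter safe, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  BYTE;  /* stand-in for zstd's BYTE typedef */
typedef uint32_t U32;   /* stand-in for zstd's U32 typedef */

/* Old form: pointer-bumping loop bounded by an end pointer.
 * Returns the position just past the last byte written. */
BYTE* copy_ptr_loop(BYTE* op, const BYTE* match, BYTE* const oMatchEnd)
{
    while (op < oMatchEnd) *op++ = *match++;
    return op;
}

/* New form: indexed loop with a 32-bit counter.
 * Assumption of this sketch: matchLength fits in 32 bits, otherwise
 * the counter would wrap before reaching the bound. */
BYTE* copy_indexed_loop(BYTE* op, const BYTE* match, size_t matchLength)
{
    U32 i;
    assert(matchLength <= UINT32_MAX);
    for (i = 0; i < matchLength; ++i) op[i] = match[i];
    return op + matchLength;
}
```

Both forms copy forward one byte at a time, so they behave identically when the match overlaps the output (for example, a match 2 bytes behind `op` repeats a 2-byte pattern). Any speed difference comes from the code the compiler generates for each form, which is why it has to be measured per compiler rather than inferred from the source.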