Fix ZSTD_execSequence() performance regression

Commit ae1cb3b3d0 caused the regression.
It appears to be an instruction alignment issue: declaring the loop
counter as `U64 i` instead of `U32 i` brings the regression back.  This
patch fixes the regression with gcc, but recovers only about half of the
lost clang performance.

Benchmarks:
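Run on `silesia.tar`.  "Before" is the build prior to commit ae1cb3b3d0,
"While" is the build with the regression, and "After" is the build with
this patch.  I only show levels 1-5 because the performance regression
was uniform across all levels.  I also did one run on levels 1-19 and
it looked good.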

| Build | Level | Before | While | After |
|-------|-------|-------:|------:|------:|
| gcc   |     1 |  931.4 | 904.4 | 932.8 |
| gcc   |     2 |  849.1 | 822.6 | 851.2 |
| gcc   |     3 |  815.6 | 790.6 | 818.9 |
| gcc   |     4 |  794.1 | 770.7 | 798.0 |
| gcc   |     5 |  785.7 | 760.7 | 788.8 |
| clang |     1 |  705.5 | 683.2 | 693.8 |
| clang |     2 |  670.0 | 649.2 | 660.7 |
| clang |     3 |  659.6 | 639.8 | 651.4 |
| clang |     4 |  652.5 | 634.7 | 645.9 |
| clang |     5 |  646.9 | 625.5 | 637.7 |
dev
Nick Terrell 2016-10-27 16:19:54 -07:00
parent ee5b725823
commit 10bfd0c0d5
1 changed file with 2 additions and 1 deletion

@@ -887,7 +887,8 @@ size_t ZSTD_execSequence(BYTE* op,
             sequence.matchLength -= length1;
             match = base;
             if (op > oend_w) {
-                while (op < oMatchEnd) *op++ = *match++;
+                U32 i;
+                for (i = 0; i < sequence.matchLength; ++i) op[i] = match[i];
                 return sequenceLength;
             }
     }   }
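
The two loop forms in this change can be compared in isolation with a small sketch. The helper names `copy_ptr_loop` and `copy_idx_loop` are hypothetical, and the `BYTE` and `U32` typedefs are stand-ins for zstd's own definitions. The sketch only illustrates that both forms perform the same forward byte-by-byte copy, including when source and destination overlap. It does not reproduce the alignment effect behind the regression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for zstd's typedefs (assumed here for illustration). */
typedef unsigned char BYTE;
typedef unsigned int  U32;

/* Old form: advance both pointers until op reaches the end of the match. */
static BYTE* copy_ptr_loop(BYTE* op, const BYTE* match, size_t matchLength)
{
    BYTE* const oMatchEnd = op + matchLength;
    while (op < oMatchEnd) *op++ = *match++;
    return op;
}

/* New form: index from 0 with a 32-bit counter; op and match stay fixed. */
static BYTE* copy_idx_loop(BYTE* op, const BYTE* match, size_t matchLength)
{
    U32 i;
    for (i = 0; i < matchLength; ++i) op[i] = match[i];
    return op + matchLength;
}
```

Both helpers copy strictly front to back, so an overlapping match (a destination that starts a few bytes after its source) repeats the leading bytes, which is the behavior a match copy relies on. Calling either one with `op = buf + 2`, `match = buf`, and a length of 6 on a buffer holding `abcdefgh` leaves `abababab`.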