From 10bfd0c0d5c584c5015c5a3a34436f7025fac325 Mon Sep 17 00:00:00 2001
From: Nick Terrell
Date: Thu, 27 Oct 2016 16:19:54 -0700
Subject: [PATCH] Fix ZSTD_execSequence() performance regression

Commit ae1cb3b3d07024618269b89e3421d828adfd34d9 caused the regression.
It is an instruction alignment issue: if the loop counter is declared
as `U64 i` instead of `U32 i`, the regression returns.

This patch fixes the regression with gcc, but recovers only part of the
lost performance with clang.

Benchmarks:

Run on `silesia.tar`. "Before" is the speed before the regressing
commit, "While" is the speed with the regression, and "After" is the
speed with this patch applied. I only show levels 1-5 because the
regression was uniform across all levels. I did one run on levels 1-19
and it looked good.

| Build | Level | Before | While | After |
|-------|-------|-------:|------:|------:|
| gcc   | 1     | 931.4  | 904.4 | 932.8 |
| gcc   | 2     | 849.1  | 822.6 | 851.2 |
| gcc   | 3     | 815.6  | 790.6 | 818.9 |
| gcc   | 4     | 794.1  | 770.7 | 798.0 |
| gcc   | 5     | 785.7  | 760.7 | 788.8 |
| clang | 1     | 705.5  | 683.2 | 693.8 |
| clang | 2     | 670.0  | 649.2 | 660.7 |
| clang | 3     | 659.6  | 639.8 | 651.4 |
| clang | 4     | 652.5  | 634.7 | 645.9 |
| clang | 5     | 646.9  | 625.5 | 637.7 |

---
 lib/decompress/zstd_decompress.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/decompress/zstd_decompress.c b/lib/decompress/zstd_decompress.c
index f3ff4ebf..ed68d888 100644
--- a/lib/decompress/zstd_decompress.c
+++ b/lib/decompress/zstd_decompress.c
@@ -887,7 +887,8 @@ size_t ZSTD_execSequence(BYTE* op,
             sequence.matchLength -= length1;
             match = base;
             if (op > oend_w) {
-                while (op < oMatchEnd) *op++ = *match++;
+                U32 i;
+                for (i = 0; i < sequence.matchLength; ++i) op[i] = match[i];
                 return sequenceLength;
             }
     }   }
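For reference, here is a minimal standalone sketch of the two loop forms in the hunk above. It is not zstd code: `BYTE` and `U32` are local stand-in typedefs, and `copy_pointer_loop`, `copy_index_loop` and `loops_agree` are hypothetical helper names. At that point in `ZSTD_execSequence()`, `oMatchEnd - op` equals the already-reduced `sequence.matchLength`, so both loops copy the same bytes in the same forward order, one byte at a time. That ordering matters when the source and destination overlap. The sketch only checks that the two forms are behaviorally equivalent; it does not reproduce the alignment-dependent speed difference, which depends on the compiler, the flags, and the surrounding code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for zstd's internal typedefs (assumption). */
typedef unsigned char BYTE;
typedef unsigned int  U32;

/* Old form: advance both pointers until op reaches the match end. */
static BYTE* copy_pointer_loop(BYTE* op, const BYTE* match, size_t matchLength)
{
    BYTE* const oMatchEnd = op + matchLength;
    while (op < oMatchEnd) *op++ = *match++;
    return op;
}

/* New form from the patch: index with a 32-bit counter.
 * Assumes matchLength fits in 32 bits, as match lengths are bounded. */
static BYTE* copy_index_loop(BYTE* op, const BYTE* match, size_t matchLength)
{
    U32 i;
    for (i = 0; i < matchLength; ++i) op[i] = match[i];
    return op + matchLength;
}

/* Run both loops on identical buffers and report whether the resulting
 * buffers and end pointers agree. dstOffset < len makes the ranges overlap. */
static int loops_agree(const char* seed, size_t dstOffset, size_t len)
{
    BYTE a[64], b[64];
    size_t const seedLen = strlen(seed);
    BYTE* endA;
    BYTE* endB;
    assert(seedLen <= sizeof(a) && dstOffset + len <= sizeof(a));
    memset(a, 0, sizeof(a));
    memcpy(a, seed, seedLen);
    memcpy(b, a, sizeof(a));
    endA = copy_pointer_loop(a + dstOffset, a, len);
    endB = copy_index_loop(b + dstOffset, b, len);
    return (endA - a) == (endB - b) && memcmp(a, b, sizeof(a)) == 0;
}
```

With an overlapping copy such as `copy_index_loop(buf + 2, buf, 6)` on a buffer starting with `ab`, both forms repeat the two-byte pattern, giving `abababab`.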