deeper prefetching pipeline for decompressSequencesLong

pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, around 2%
(measurements are more noisy for this scenario).
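
For illustration only, here is a minimal sketch of the pipelining idea: sequences are queued in a power-of-two ring buffer, the match source of each one is prefetched when it is queued, and the copy runs once the slot comes around again. All `demo_` names are hypothetical and are not zstd identifiers. The sketch assumes match-only sequences (no literals, no bitstream decoding) and uses the GCC/clang builtin `__builtin_prefetch`, which the real function does not necessarily use in this form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sequence: copy `length` bytes from `offset` bytes back. */
typedef struct { size_t offset; size_t length; } demo_seq_t;

#define DEMO_STORED_SEQS 8                        /* ring size, must be a power of two */
#define DEMO_STORED_SEQS_MASK (DEMO_STORED_SEQS - 1)

#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_PREFETCH(p) __builtin_prefetch((p), 0, 3)
#else
#  define DEMO_PREFETCH(p) ((void)(p))            /* no-op on other compilers */
#endif

/* Byte-wise copy so that overlapping matches (offset < length) behave as in LZ77. */
static size_t demo_copy_match(unsigned char* dst, size_t outPos, demo_seq_t seq)
{
    size_t i;
    for (i = 0; i < seq.length; i++)
        dst[outPos + i] = dst[outPos + i - seq.offset];
    return outPos + seq.length;
}

/* Queue one sequence: validate it, prefetch its match source, store it in `slot`.
 * Returns 0 on invalid input, 1 otherwise. */
static int demo_queue(const unsigned char* dst, size_t dstCapacity, size_t* prefetchPos,
                      demo_seq_t seq, demo_seq_t* slot)
{
    if (seq.offset == 0 || seq.offset > *prefetchPos) return 0;
    if (seq.length > dstCapacity - *prefetchPos) return 0;
    DEMO_PREFETCH(dst + *prefetchPos - seq.offset);
    *slot = seq;
    *prefetchPos += seq.length;
    return 1;
}

/* dst already holds `histSize` bytes of history.
 * Returns the final output size, or 0 on invalid input (dst may be partially written). */
static size_t demo_execute_pipelined(unsigned char* dst, size_t dstCapacity, size_t histSize,
                                     const demo_seq_t* seqs, size_t nbSeq)
{
    demo_seq_t ring[DEMO_STORED_SEQS];
    size_t const advance = nbSeq < DEMO_STORED_SEQS ? nbSeq : DEMO_STORED_SEQS;
    size_t prefetchPos = histSize;   /* where the next queued sequence will write */
    size_t outPos = histSize;        /* where the next executed sequence will write */
    size_t n;

    assert(dst != NULL && (seqs != NULL || nbSeq == 0));
    if (histSize > dstCapacity) return 0;

    /* Prologue: fill the pipeline, issuing one prefetch per slot. */
    for (n = 0; n < advance; n++)
        if (!demo_queue(dst, dstCapacity, &prefetchPos, seqs[n], &ring[n])) return 0;

    /* Steady state: execute the oldest queued sequence, then reuse its slot. */
    for (; n < nbSeq; n++) {
        demo_seq_t* const slot = &ring[n & DEMO_STORED_SEQS_MASK];
        outPos = demo_copy_match(dst, outPos, *slot);
        if (!demo_queue(dst, dstCapacity, &prefetchPos, seqs[n], slot)) return 0;
    }

    /* Epilogue: drain the sequences still in flight. */
    for (n = nbSeq - advance; n < nbSeq; n++)
        outPos = demo_copy_match(dst, outPos, ring[n & DEMO_STORED_SEQS_MASK]);

    return outPos;
}
```

In this sketch, a deeper ring gives each prefetch more time to complete before its copy runs, which is presumably why the gain shows up mostly with long distance offsets.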
Yann Collet 2021-05-05 10:04:03 -07:00
parent 455fd1a067
commit 7ef6d7b36c

@@ -1254,9 +1254,9 @@ ZSTD_decompressSequencesLong_body(
     /* Regen sequences */
     if (nbSeq) {
-#define STORED_SEQS 4
+#define STORED_SEQS 8
 #define STORED_SEQS_MASK (STORED_SEQS-1)
-#define ADVANCED_SEQS 4
+#define ADVANCED_SEQS STORED_SEQS
         seq_t sequences[STORED_SEQS];
         int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS);
         seqState_t seqState;