Merge branch 'dev' into longOffsetMode

Yann Collet, 2018-03-05 11:59:54 -08:00
commit b91ddf0ae6
42 changed files with 2879 additions and 1861 deletions

View File

@@ -1,4 +1,4 @@
-<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/readme/doc/images/zstd_logo86.png" alt="Zstandard"></p>
+<p align="center"><img src="https://raw.githubusercontent.com/facebook/zstd/dev/doc/images/zstd_logo86.png" alt="Zstandard"></p>
 
 __Zstandard__, or `zstd` as short version, is a fast lossless compression algorithm,
 targeting real-time compression scenarios at zlib-level and better compression ratios.

View File

@@ -2,19 +2,24 @@ Zstandard Documentation
 =======================
 
 This directory contains material defining the Zstandard format,
-as well as for help using the `zstd` library.
+as well as detailed instructions to use the `zstd` library.
 
-__`zstd_manual.html`__ : Documentation of `zstd.h` API, in html format.
-Click on this link: [http://zstd.net/zstd_manual.html](http://zstd.net/zstd_manual.html)
-to display documentation of latest release in readable format within a browser.
-
 __`zstd_compression_format.md`__ : This document defines the Zstandard compression format.
 Compliant decoders must adhere to this document,
 and compliant encoders must generate data that follows it.
 
+Should you look for resources to develop your own port of the Zstandard algorithm,
+you may find the following resources useful:
+
 __`educational_decoder`__ : This directory contains an implementation of a Zstandard decoder,
 compliant with the Zstandard compression format.
 It can be used, for example, to better understand the format,
-or as the basis for a separate implementation a Zstandard decoder/encoder.
+or as the basis for a separate implementation of a Zstandard decoder.
+
+__`zstd_manual.html`__ : Documentation on the functions found in `zstd.h`.
+See [http://zstd.net/zstd_manual.html](http://zstd.net/zstd_manual.html) for
+the manual released with the latest official `zstd` release.
+
+[__`decode_corpus`__](https://github.com/facebook/zstd/tree/dev/tests#decodecorpus---tool-to-generate-zstandard-frames-for-decoder-testing) :
+This tool, stored in the `/tests` directory, is able to generate random valid frames,
+which is useful if you wish to test your decoder and verify it fully supports the specification.

View File

@@ -416,7 +416,7 @@ size_t ZSTD_estimateDCtxSize(void);
 It will also consider src size to be arbitrarily "large", which is worst case.
 If srcSize is known to always be small, ZSTD_estimateCCtxSize_usingCParams() can provide a tighter estimation.
 ZSTD_estimateCCtxSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
-ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbThreads is > 1.
+ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbWorkers is >= 1.
 Note : CCtx size estimation is only correct for single-threaded compression.
 </p></pre><BR>
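
For context, a minimal sketch (not part of this diff) of how the estimation API above is typically paired with zstd's static-allocation helpers; ZSTD_initStaticCCtx() is assumed to be available from the same experimental, static-linking-only header:

    #define ZSTD_STATIC_LINKING_ONLY
    #include <zstd.h>
    #include <stdlib.h>

    /* Reserve the worst-case budget for a given level (srcSize treated as
     * arbitrarily "large"), then build the CCtx inside that workspace. */
    static ZSTD_CCtx* create_sized_cctx(int compressionLevel)
    {
        size_t const budget = ZSTD_estimateCCtxSize(compressionLevel);
        void* const workspace = malloc(budget);
        if (workspace == NULL) return NULL;
        return ZSTD_initStaticCCtx(workspace, budget);  /* NULL if budget is too small */
    }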
@@ -429,7 +429,7 @@ size_t ZSTD_estimateDStreamSize_fromFrame(const void* src, size_t srcSize);
 It will also consider src size to be arbitrarily "large", which is worst case.
 If srcSize is known to always be small, ZSTD_estimateCStreamSize_usingCParams() can provide a tighter estimation.
 ZSTD_estimateCStreamSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
-ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbThreads is set to a value > 1.
+ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbWorkers is >= 1.
 Note : CStream size estimation is only correct for single-threaded compression.
 ZSTD_DStream memory budget depends on window Size.
 This information can be passed manually, using ZSTD_estimateDStreamSize,
@@ -800,18 +800,13 @@ size_t ZSTD_decodingBufferSize_min(unsigned long long windowSize, unsigned long
 </b>/* multi-threading parameters */<b>
 </b>/* These parameters are only useful if multi-threading is enabled (ZSTD_MULTITHREAD).<b>
  * They return an error otherwise. */
-    ZSTD_p_nbThreads=400,    </b>/* Select how many threads a compression job can spawn (default:1)<b>
-                              * More threads improve speed, but also increase memory usage.
-                              * Can only receive a value > 1 if ZSTD_MULTITHREAD is enabled.
-                              * Special: value 0 means "do not change nbThreads" */
-    ZSTD_p_nonBlockingMode,  </b>/* Single thread mode is by default "blocking" :<b>
-                              * it finishes its job as much as possible, and only then gives back control to caller.
-                              * In contrast, multi-thread is by default "non-blocking" :
-                              * it takes some input, flush some output if available, and immediately gives back control to caller.
-                              * Compression work is performed in parallel, within worker threads.
-                              * (note : a strong exception to this rule is when first job is called with ZSTD_e_end : it becomes blocking)
-                              * Setting this parameter to 1 will enforce non-blocking mode even when only 1 thread is selected.
-                              * It allows the caller to do other tasks while the worker thread compresses in parallel. */
+    ZSTD_p_nbWorkers=400,    </b>/* Select how many threads will be spawned to compress in parallel.<b>
+                              * When nbWorkers >= 1, triggers asynchronous mode :
+                              * ZSTD_compress_generic() consumes some input, flush some output if possible, and immediately gives back control to caller,
+                              * while compression work is performed in parallel, within worker threads.
+                              * (note : a strong exception to this rule is when first invocation sets ZSTD_e_end : it becomes a blocking call).
+                              * More workers improve speed, but also increase memory usage.
+                              * Default value is `0`, aka "single-threaded mode" : no worker is spawned, compression is performed inside Caller's thread, all invocations are blocking */
     ZSTD_p_jobSize,          </b>/* Size of a compression job. This value is only enforced in streaming (non-blocking) mode.<b>
                               * Each compression job is completed in parallel, so indirectly controls the nb of active threads.
                               * 0 means default, which is dynamically determined based on compression parameters.
@@ -837,28 +832,34 @@ size_t ZSTD_decodingBufferSize_min(unsigned long long windowSize, unsigned long
                               * Larger values increase memory usage and compression ratio, but decrease
                               * compression speed.
                               * Must be clamped between ZSTD_HASHLOG_MIN and ZSTD_HASHLOG_MAX
-                              * (default: windowlog - 7). */
+                              * (default: windowlog - 7).
+                              * Special: value 0 means "do not change ldmHashLog". */
     ZSTD_p_ldmMinMatch,      </b>/* Minimum size of searched matches for long distance matcher.<b>
                               * Larger/too small values usually decrease compression ratio.
                               * Must be clamped between ZSTD_LDM_MINMATCH_MIN
-                              * and ZSTD_LDM_MINMATCH_MAX (default: 64). */
+                              * and ZSTD_LDM_MINMATCH_MAX (default: 64).
+                              * Special: value 0 means "do not change ldmMinMatch". */
     ZSTD_p_ldmBucketSizeLog, </b>/* Log size of each bucket in the LDM hash table for collision resolution.<b>
                               * Larger values usually improve collision resolution but may decrease
                               * compression speed.
-                              * The maximum value is ZSTD_LDM_BUCKETSIZELOG_MAX (default: 3). */
+                              * The maximum value is ZSTD_LDM_BUCKETSIZELOG_MAX (default: 3).
+                              * note : 0 is a valid value */
     ZSTD_p_ldmHashEveryLog,  </b>/* Frequency of inserting/looking up entries in the LDM hash table.<b>
                               * The default is MAX(0, (windowLog - ldmHashLog)) to
                               * optimize hash table usage.
                               * Larger values improve compression speed. Deviating far from the
                               * default value will likely result in a decrease in compression ratio.
-                              * Must be clamped between 0 and ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN. */
+                              * Must be clamped between 0 and ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN.
+                              * note : 0 is a valid value */
 } ZSTD_cParameter;
 </b></pre><BR>
 <pre><b>size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned value);
 </b><p> Set one compression parameter, selected by enum ZSTD_cParameter.
+Setting a parameter is generally only possible during frame initialization (before starting compression),
+except for a few exceptions which can be updated during compression: compressionLevel, hashLog, chainLog, searchLog, minMatch, targetLength and strategy.
 Note : when `value` is an enum, cast it to unsigned for proper type checking.
-@result : informational value (typically, the one being set, possibly corrected),
+@result : informational value (typically, the value being set, clamped correctly),
 or an error code (which can be tested with ZSTD_isError()).
 </p></pre><BR>
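
Below, a hedged sketch (not in the diff) of driving the renamed ZSTD_p_nbWorkers parameter through ZSTD_CCtx_setParameter() and ZSTD_compress_generic(); buffer management is reduced to a single-shot call:

    #define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_compress_generic() was still experimental here */
    #include <zstd.h>

    static size_t compress_with_workers(ZSTD_CCtx* cctx,
                                        void* dst, size_t dstCapacity,
                                        const void* src, size_t srcSize)
    {
        ZSTD_inBuffer  input  = { src, srcSize, 0 };
        ZSTD_outBuffer output = { dst, dstCapacity, 0 };
        size_t remaining;
        ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, 5);
        ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbWorkers, 4);   /* >= 1 : asynchronous mode */
        do {   /* note : per the doc above, the first ZSTD_e_end invocation may block */
            remaining = ZSTD_compress_generic(cctx, &output, &input, ZSTD_e_end);
            if (ZSTD_isError(remaining)) return remaining;
        } while (remaining != 0);   /* 0 : frame fully written */
        return output.pos;
    }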
@@ -1000,7 +1001,7 @@ size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const void* prefix, size_t
 </p></pre><BR>
 
 <pre><b>size_t ZSTD_resetCCtxParams(ZSTD_CCtx_params* params);
-</b><p> Reset params to default, with the default compression level.
+</b><p> Reset params to default values.
 </p></pre><BR>
@@ -1028,9 +1029,10 @@ size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const void* prefix, size_t
 <pre><b>size_t ZSTD_CCtx_setParametersUsingCCtxParams(
         ZSTD_CCtx* cctx, const ZSTD_CCtx_params* params);
 </b><p> Apply a set of ZSTD_CCtx_params to the compression context.
-  This must be done before the dictionary is loaded.
-  The pledgedSrcSize is treated as unknown.
-  Multithreading parameters are applied only if nbThreads > 1.
+  This can be done even after compression is started,
+  if nbWorkers==0, this will have no impact until a new compression is started.
+  if nbWorkers>=1, new parameters will be picked up at next job,
+  with a few restrictions (windowLog, pledgedSrcSize, nbWorkers, jobSize, and overlapLog are not updated).
 </p></pre><BR>
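
And a short sketch (again not in the diff) of the ZSTD_CCtx_params workflow this entry describes; ZSTD_createCCtxParams() and ZSTD_freeCCtxParams() are assumed from the same experimental API:

    #define ZSTD_STATIC_LINKING_ONLY
    #include <zstd.h>

    /* returns 1 on success, 0 on failure */
    static int apply_level_and_workers(ZSTD_CCtx* cctx)
    {
        int ok;
        ZSTD_CCtx_params* const params = ZSTD_createCCtxParams();
        if (params == NULL) return 0;
        ZSTD_CCtxParam_setParameter(params, ZSTD_p_compressionLevel, 19);
        ZSTD_CCtxParam_setParameter(params, ZSTD_p_nbWorkers, 2);
        ok = !ZSTD_isError( ZSTD_CCtx_setParametersUsingCCtxParams(cctx, params) );
        ZSTD_freeCCtxParams(params);
        return ok;
    }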

View File

@@ -9,6 +9,7 @@
 # This Makefile presumes libzstd is installed, using `sudo make install`
 
+CPPFLAGS += -I../lib
 LIB = ../lib/libzstd.a
 
 .PHONY: default all clean test

View File

@@ -46,7 +46,7 @@ static unsigned readU32FromChar(const char** stringPtr)
 int main(int argc, char const *argv[]) {
-    printf("\n Zstandard (v%u) memory usage for streaming contexts : \n\n", ZSTD_versionNumber());
+    printf("\n Zstandard (v%s) memory usage for streaming : \n\n", ZSTD_versionString());
     unsigned wLog = 0;
     if (argc > 1) {
@@ -69,11 +69,13 @@ int main(int argc, char const *argv[]) {
     /* forces compressor to use maximum memory size for given compression level,
      * by not providing any information on input size */
-    ZSTD_parameters params = ZSTD_getParams(compressionLevel, 0, 0);
+    ZSTD_parameters params = ZSTD_getParams(compressionLevel, ZSTD_CONTENTSIZE_UNKNOWN, 0);
     if (wLog) {  /* special mode : specific wLog */
         printf("Using custom compression parameter : level 1 + wLog=%u \n", wLog);
-        params = ZSTD_getParams(1, 1 << wLog, 0);
-    size_t const error = ZSTD_initCStream_advanced(cstream, NULL, 0, params, 0);
+        params = ZSTD_getParams(1 /*compressionLevel*/,
+                                1 << wLog /*estimatedSrcSize*/,
+                                0 /*no dictionary*/);
+    size_t const error = ZSTD_initCStream_advanced(cstream, NULL, 0, params, ZSTD_CONTENTSIZE_UNKNOWN);
     if (ZSTD_isError(error)) {
         printf("ZSTD_initCStream_advanced error : %s \n", ZSTD_getErrorName(error));
         return 1;

View File

@@ -25,6 +25,9 @@ cxx_library(
     name='decompress',
     header_namespace='',
     visibility=['PUBLIC'],
+    headers=subdir_glob([
+        ('decompress', '*_impl.h'),
+    ]),
     srcs=glob(['decompress/zstd*.c']),
     deps=[
         ':common',
@@ -80,6 +83,15 @@ cxx_library(
     ]),
 )
 
+cxx_library(
+    name='cpu',
+    header_namespace='',
+    visibility=['PUBLIC'],
+    exported_headers=subdir_glob([
+        ('common', 'cpu.h'),
+    ]),
+)
+
 cxx_library(
     name='bitstream',
     header_namespace='',
@@ -196,6 +208,7 @@ cxx_library(
     deps=[
         ':bitstream',
         ':compiler',
+        ':cpu',
        ':entropy',
        ':errors',
        ':mem',

View File

@@ -63,6 +63,25 @@
 #  endif
 #endif
 
+/* target attribute */
+#if defined(__GNUC__)
+#  define TARGET_ATTRIBUTE(target) __attribute__((__target__(target)))
+#else
+#  define TARGET_ATTRIBUTE(target)
+#endif
+
+/* Enable runtime BMI2 dispatch based on the CPU.
+ * Enabled for clang & gcc >=4.8 on x86 when BMI2 isn't enabled by default.
+ */
+#ifndef DYNAMIC_BMI2
+#  if (defined(__clang__) && __has_attribute(__target__)) \
+   || (defined(__GNUC__) && (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))) \
+   && (defined(__x86_64__) || defined(_M_X86)) && !defined(__BMI2__)
+#    define DYNAMIC_BMI2 1
+#  else
+#    define DYNAMIC_BMI2 0
+#  endif
+#endif
+
 /* prefetch */
 #if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_I86))  /* _mm_prefetch() is not defined outside of x86/x64 */
 #  include <mmintrin.h>   /* https://msdn.microsoft.com/fr-fr/library/84szxsww(v=vs.90).aspx */
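
To illustrate (this example is not part of the commit), the two macros above combine into the runtime-dispatch pattern used later in huf_compress.c: compile a kernel twice, once with the "bmi2" target attribute, and branch on a runtime flag. All names below are invented for the sketch:

    #include <stddef.h>
    #include "compiler.h"   /* TARGET_ATTRIBUTE, DYNAMIC_BMI2 */

    /* Hypothetical kernel, compiled twice when dispatch is enabled. */
    static size_t kernel_default(const void* src, size_t srcSize)
    {   (void)src; return srcSize;   /* placeholder body */
    }

    #if DYNAMIC_BMI2
    static TARGET_ATTRIBUTE("bmi2")
    size_t kernel_bmi2(const void* src, size_t srcSize)
    {   (void)src; return srcSize;   /* same body, built with BMI2 codegen */
    }
    #endif

    size_t kernel(const void* src, size_t srcSize, int bmi2)
    {
    #if DYNAMIC_BMI2
        if (bmi2) return kernel_bmi2(src, srcSize);
    #endif
        (void)bmi2;
        return kernel_default(src, srcSize);
    }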

lib/common/cpu.h (new file, 216 lines)
View File

@@ -0,0 +1,216 @@
/*
* Copyright (c) 2018-present, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under both the BSD-style license (found in the
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
* in the COPYING file in the root directory of this source tree).
* You may select, at your option, one of the above-listed licenses.
*/
#ifndef ZSTD_COMMON_CPU_H
#define ZSTD_COMMON_CPU_H
/**
* Implementation taken from folly/CpuId.h
* https://github.com/facebook/folly/blob/master/folly/CpuId.h
*/
#include <string.h>
#include "mem.h"
#ifdef _MSC_VER
#include <intrin.h>
#endif
typedef struct {
U32 f1c;
U32 f1d;
U32 f7b;
U32 f7c;
} ZSTD_cpuid_t;
MEM_STATIC ZSTD_cpuid_t ZSTD_cpuid(void) {
U32 f1c = 0;
U32 f1d = 0;
U32 f7b = 0;
U32 f7c = 0;
#ifdef _MSC_VER
int reg[4];
__cpuid((int*)reg, 0);
{
int const n = reg[0];
if (n >= 1) {
__cpuid((int*)reg, 1);
f1c = (U32)reg[2];
f1d = (U32)reg[3];
}
if (n >= 7) {
__cpuidex((int*)reg, 7, 0);
f7b = (U32)reg[1];
f7c = (U32)reg[2];
}
}
#elif defined(__i386__) && defined(__PIC__) && !defined(__clang__) && defined(__GNUC__)
  /* The following block is like the normal cpuid branch below, but gcc
   * reserves ebx for use as its PIC register, so we must specially
   * handle the save and restore to avoid clobbering the register.
   */
U32 n;
__asm__(
"pushl %%ebx\n\t"
"cpuid\n\t"
"popl %%ebx\n\t"
: "=a"(n)
: "a"(0)
: "ecx", "edx");
if (n >= 1) {
U32 f1a;
__asm__(
"pushl %%ebx\n\t"
"cpuid\n\t"
"popl %%ebx\n\t"
: "=a"(f1a), "=c"(f1c), "=d"(f1d)
: "a"(1)
:);
}
if (n >= 7) {
__asm__(
"pushl %%ebx\n\t"
"cpuid\n\t"
"movl %%ebx, %%eax\n\r"
"popl %%ebx"
: "=a"(f7b), "=c"(f7c)
: "a"(7), "c"(0)
: "edx");
}
#elif defined(__x86_64__) || defined(_M_X64) || defined(__i386__)
U32 n;
__asm__("cpuid" : "=a"(n) : "a"(0) : "ebx", "ecx", "edx");
if (n >= 1) {
U32 f1a;
__asm__("cpuid" : "=a"(f1a), "=c"(f1c), "=d"(f1d) : "a"(1) : "ebx");
}
if (n >= 7) {
U32 f7a;
__asm__("cpuid"
: "=a"(f7a), "=b"(f7b), "=c"(f7c)
: "a"(7), "c"(0)
: "edx");
}
#endif
{
ZSTD_cpuid_t cpuid;
cpuid.f1c = f1c;
cpuid.f1d = f1d;
cpuid.f7b = f7b;
cpuid.f7c = f7c;
return cpuid;
}
}
#define X(name, r, bit) \
MEM_STATIC int ZSTD_cpuid_##name(ZSTD_cpuid_t const cpuid) { \
return ((cpuid.r) & (1U << bit)) != 0; \
}
/* cpuid(1): Processor Info and Feature Bits. */
#define C(name, bit) X(name, f1c, bit)
C(sse3, 0)
C(pclmuldq, 1)
C(dtes64, 2)
C(monitor, 3)
C(dscpl, 4)
C(vmx, 5)
C(smx, 6)
C(eist, 7)
C(tm2, 8)
C(ssse3, 9)
C(cnxtid, 10)
C(fma, 12)
C(cx16, 13)
C(xtpr, 14)
C(pdcm, 15)
C(pcid, 17)
C(dca, 18)
C(sse41, 19)
C(sse42, 20)
C(x2apic, 21)
C(movbe, 22)
C(popcnt, 23)
C(tscdeadline, 24)
C(aes, 25)
C(xsave, 26)
C(osxsave, 27)
C(avx, 28)
C(f16c, 29)
C(rdrand, 30)
#undef C
#define D(name, bit) X(name, f1d, bit)
D(fpu, 0)
D(vme, 1)
D(de, 2)
D(pse, 3)
D(tsc, 4)
D(msr, 5)
D(pae, 6)
D(mce, 7)
D(cx8, 8)
D(apic, 9)
D(sep, 11)
D(mtrr, 12)
D(pge, 13)
D(mca, 14)
D(cmov, 15)
D(pat, 16)
D(pse36, 17)
D(psn, 18)
D(clfsh, 19)
D(ds, 21)
D(acpi, 22)
D(mmx, 23)
D(fxsr, 24)
D(sse, 25)
D(sse2, 26)
D(ss, 27)
D(htt, 28)
D(tm, 29)
D(pbe, 31)
#undef D
/* cpuid(7): Extended Features. */
#define B(name, bit) X(name, f7b, bit)
B(bmi1, 3)
B(hle, 4)
B(avx2, 5)
B(smep, 7)
B(bmi2, 8)
B(erms, 9)
B(invpcid, 10)
B(rtm, 11)
B(mpx, 14)
B(avx512f, 16)
B(avx512dq, 17)
B(rdseed, 18)
B(adx, 19)
B(smap, 20)
B(avx512ifma, 21)
B(pcommit, 22)
B(clflushopt, 23)
B(clwb, 24)
B(avx512pf, 26)
B(avx512er, 27)
B(avx512cd, 28)
B(sha, 29)
B(avx512bw, 30)
B(avx512vl, 31)
#undef B
#define C(name, bit) X(name, f7c, bit)
C(prefetchwt1, 0)
C(avx512vbmi, 1)
#undef C
#undef X
#endif /* ZSTD_COMMON_CPU_H */
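
Typical use of this header, as a sketch rather than quoted code: query CPUID once, cache the result if desired, and feed the feature bit to the new *_bmi2 entry points:

    #include "cpu.h"

    /* returns 1 on BMI2-capable x86, 0 otherwise (and on non-x86 targets,
     * where every ZSTD_cpuid_t field stays zero). */
    static int has_bmi2(void)
    {
        ZSTD_cpuid_t const cpuid = ZSTD_cpuid();   /* cheap; callers may cache it */
        return ZSTD_cpuid_bmi2(cpuid);
    }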

View File

@@ -29,6 +29,7 @@ const char* ERR_getErrorString(ERR_enum code)
     case PREFIX(parameter_outOfBound): return "Parameter is out of bound";
     case PREFIX(init_missing): return "Context should be init first";
     case PREFIX(memory_allocation): return "Allocation error : not enough memory";
+    case PREFIX(workSpace_tooSmall): return "workSpace buffer is not large enough";
     case PREFIX(stage_wrong): return "Operation not authorized at current processing stage";
     case PREFIX(tableLog_tooLarge): return "tableLog requires too much memory : unsupported";
     case PREFIX(maxSymbolValue_tooLarge): return "Unsupported max Symbol Value : too large";

View File

@@ -67,7 +67,6 @@ HUF_compress() :
     `srcSize` must be <= `HUF_BLOCKSIZE_MAX` == 128 KB.
     @return : size of compressed data (<= `dstCapacity`).
     Special values : if return == 0, srcData is not compressible => Nothing is stored within dst !!!
-                     if return == 1, srcData is a single repeated byte symbol (RLE compression).
                      if HUF_isError(return), compression failed (more details using HUF_getErrorName())
 */
 HUF_PUBLIC_API size_t HUF_compress(void* dst, size_t dstCapacity,
@@ -80,7 +79,7 @@ HUF_decompress() :
     `originalSize` : **must** be the ***exact*** size of original (uncompressed) data.
     Note : in contrast with FSE, HUF_decompress can regenerate
            RLE (cSrcSize==1) and uncompressed (cSrcSize==dstSize) data,
-           because it knows size to regenerate.
+           because it knows size to regenerate (originalSize).
     @return : size of regenerated data (== originalSize),
               or an error code, which can be tested using HUF_isError()
 */
@@ -101,6 +100,7 @@ HUF_PUBLIC_API const char* HUF_getErrorName(size_t code);  /**< provides error c
 /** HUF_compress2() :
  *  Same as HUF_compress(), but offers direct control over `maxSymbolValue` and `tableLog`.
+ *  `maxSymbolValue` must be <= HUF_SYMBOLVALUE_MAX .
  *  `tableLog` must be `<= HUF_TABLELOG_MAX` . */
 HUF_PUBLIC_API size_t HUF_compress2 (void* dst, size_t dstCapacity, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog);
@@ -129,7 +129,7 @@ HUF_PUBLIC_API size_t HUF_compress4X_wksp (void* dst, size_t dstCapacity, const
 /* ******************************************************************
  *  WARNING !!
  *  The following section contains advanced and experimental definitions
- *  which shall never be used in the context of dll
+ *  which shall never be used in the context of a dynamic library,
  *  because they are not guaranteed to remain stable in the future.
  *  Only consider them in association with static linking.
  * *****************************************************************/
@@ -141,8 +141,8 @@ HUF_PUBLIC_API size_t HUF_compress4X_wksp (void* dst, size_t dstCapacity, const
 /* *** Constants *** */
-#define HUF_TABLELOG_MAX      12      /* max configured tableLog (for static allocation); can be modified up to HUF_ABSOLUTEMAX_TABLELOG */
-#define HUF_TABLELOG_DEFAULT  11      /* tableLog by default, when not specified */
+#define HUF_TABLELOG_MAX      12      /* max runtime value of tableLog (due to static allocation); can be modified up to HUF_ABSOLUTEMAX_TABLELOG */
+#define HUF_TABLELOG_DEFAULT  11      /* default tableLog value when none specified */
 #define HUF_SYMBOLVALUE_MAX  255
 #define HUF_TABLELOG_ABSOLUTEMAX  15  /* absolute limit of HUF_MAX_TABLELOG. Beyond that value, code does not work */
@@ -223,12 +223,13 @@ typedef enum {
  *  If it uses hufTable it does not modify hufTable or repeat.
  *  If it doesn't, it sets *repeat = HUF_repeat_none, and it sets hufTable to the table used.
  *  If preferRepeat then the old table will always be used if valid. */
-size_t HUF_compress4X_repeat(void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize, HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat);  /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */
+size_t HUF_compress4X_repeat(void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize, HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2);  /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */
 
 /** HUF_buildCTable_wksp() :
  *  Same as HUF_buildCTable(), but using externally allocated scratch buffer.
- *  `workSpace` must be aligned on 4-bytes boundaries, and be at least as large as a table of 1024 unsigned.
+ *  `workSpace` must be aligned on 4-bytes boundaries, and be at least as large as a table of HUF_CTABLE_WORKSPACE_SIZE_U32 unsigned.
  */
+#define HUF_CTABLE_WORKSPACE_SIZE_U32 (2*HUF_SYMBOLVALUE_MAX +1 +1)
 size_t HUF_buildCTable_wksp (HUF_CElt* tree, const U32* count, U32 maxSymbolValue, U32 maxNbBits, void* workSpace, size_t wkspSize);
 
 /*! HUF_readStats() :
@@ -236,8 +237,8 @@ size_t HUF_buildCTable_wksp (HUF_CElt* tree, const U32* count, U32 maxSymbolValu
     `huffWeight` is destination buffer.
     @return : size read from `src` , or an error Code .
     Note : Needed by HUF_readCTable() and HUF_readDTableXn() . */
-size_t HUF_readStats(BYTE* huffWeight, size_t hwSize, U32* rankStats,
-                     U32* nbSymbolsPtr, U32* tableLogPtr,
+size_t HUF_readStats(BYTE* huffWeight, size_t hwSize,
+                     U32* rankStats, U32* nbSymbolsPtr, U32* tableLogPtr,
                      const void* src, size_t srcSize);
 
 /** HUF_readCTable() :
@@ -279,7 +280,7 @@ size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* src, si
  *  If it uses hufTable it does not modify hufTable or repeat.
  *  If it doesn't, it sets *repeat = HUF_repeat_none, and it sets hufTable to the table used.
  *  If preferRepeat then the old table will always be used if valid. */
-size_t HUF_compress1X_repeat(void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize, HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat);  /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */
+size_t HUF_compress1X_repeat(void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize, HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2);  /**< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */
 
 size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* single-symbol decoder */
 size_t HUF_decompress1X4 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* double-symbol decoder */
@@ -295,6 +296,14 @@ size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const void* cS
 size_t HUF_decompress1X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
 size_t HUF_decompress1X4_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
 
+/* BMI2 variants.
+ * If the CPU has BMI2 support, pass bmi2=1, otherwise pass bmi2=0.
+ */
+size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
+size_t HUF_decompress1X2_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
+size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
+
 #endif /* HUF_STATIC_LINKING_ONLY */
 
 #if defined (__cplusplus)
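
A usage sketch (assuming cpu.h from this same commit) showing how the new bmi2 argument is meant to be fed from runtime CPU detection:

    #define HUF_STATIC_LINKING_ONLY
    #include "huf.h"
    #include "cpu.h"    /* added by this commit */

    static size_t decompress_4X(void* dst, size_t dstSize,
                                const void* cSrc, size_t cSrcSize,
                                const HUF_DTable* DTable)
    {
        int const bmi2 = ZSTD_cpuid_bmi2(ZSTD_cpuid());
        return HUF_decompress4X_usingDTable_bmi2(dst, dstSize, cSrc, cSrcSize, DTable, bmi2);
    }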

View File

@@ -35,12 +35,20 @@ extern "C" {
 #  define ZSTDERRORLIB_API ZSTDERRORLIB_VISIBILITY
 #endif
 
-/*-****************************************
- *  error codes list
- *  note : this API is still considered unstable
- *         and shall not be used with a dynamic library.
- *         only static linking is allowed
- ******************************************/
+/*-*********************************************
+ *  Error codes list
+ *-*********************************************
+ *  Error codes _values_ are pinned down since v1.3.1 only.
+ *  Therefore, don't rely on values if you may link to any version < v1.3.1.
+ *
+ *  Only values < 100 are considered stable.
+ *
+ *  note 1 : this API shall be used with static linking only.
+ *           dynamic linking is not yet officially supported.
+ *  note 2 : Prefer relying on the enum rather than on its value whenever possible.
+ *           This is the only supported way to use the error list < v1.3.1
+ *  note 3 : ZSTD_isError() is always correct, whatever the library version.
+ **********************************************/
 typedef enum {
   ZSTD_error_no_error = 0,
   ZSTD_error_GENERIC  = 1,
@@ -61,9 +69,10 @@ typedef enum {
   ZSTD_error_stage_wrong       = 60,
   ZSTD_error_init_missing      = 62,
   ZSTD_error_memory_allocation = 64,
+  ZSTD_error_workSpace_tooSmall= 66,
   ZSTD_error_dstSize_tooSmall = 70,
   ZSTD_error_srcSize_wrong    = 72,
-  /* following error codes are not stable and may be removed or changed in a future version */
+  /* following error codes are __NOT STABLE__, they can be removed or changed in future versions */
   ZSTD_error_frameIndex_tooLarge = 100,
   ZSTD_error_seekableIO          = 102,
   ZSTD_error_maxCode = 120  /* never EVER use this value directly, it can change in future versions! Use ZSTD_isError() instead */
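
A brief sketch (not from the diff) of the stable way to inspect a failure, relying on the enum rather than its numeric value as note 2 recommends; ZSTD_getErrorCode() is assumed from this same header:

    #include <stdio.h>
    #include <zstd.h>
    #include "zstd_errors.h"   /* static linking only, per note 1 */

    static void report(size_t ret)
    {
        if (ZSTD_isError(ret)) {
            if (ZSTD_getErrorCode(ret) == ZSTD_error_workSpace_tooSmall)  /* code added here */
                fprintf(stderr, "workspace too small, enlarge it\n");
            else
                fprintf(stderr, "zstd error : %s\n", ZSTD_getErrorName(ret));
        }
    }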

View File

@@ -140,6 +140,7 @@ typedef enum { set_basic, set_rle, set_compressed, set_repeat } symbolEncodingTy
 #define MLFSELog    9
 #define LLFSELog    9
 #define OffFSELog   8
+#define MaxFSELog  MAX(MAX(MLFSELog, LLFSELog), OffFSELog)
 
 static const U32 LL_bits[MaxLL+1] = { 0, 0, 0, 0, 0, 0, 0, 0,
                                       0, 0, 0, 0, 0, 0, 0, 0,
@@ -212,6 +213,12 @@ MEM_STATIC void ZSTD_wildcopy_e(void* dst, const void* src, void* dstEnd)   /* s
 /*-*******************************************
 *  Private declarations
 *********************************************/
+typedef struct rawSeq_s {
+    U32 offset;
+    U32 litLength;
+    U32 matchLength;
+} rawSeq;
+
 typedef struct seqDef_s {
     U32 offset;
     U16 litLength;

View File

@@ -248,7 +248,7 @@ static size_t FSE_writeNCount_generic (void* header, size_t headerBufferSize,
         bitCount  -= (count<max);
         previous0  = (count==1);
         if (remaining<1) return ERROR(GENERIC);
-        while (remaining<threshold) nbBits--, threshold>>=1;
+        while (remaining<threshold) { nbBits--; threshold>>=1; }
     }
     if (bitCount>16) {
         if ((!writeIsSafe) && (out > oend - 2)) return ERROR(dstSize_tooSmall);   /* Buffer overflow */
@@ -540,7 +540,7 @@ static size_t FSE_normalizeM2(short* norm, U32 tableLog, const unsigned* count,
            find max, then give all remaining points to max */
         U32 maxV = 0, maxC = 0;
         for (s=0; s<=maxSymbolValue; s++)
-            if (count[s] > maxC) maxV=s, maxC=count[s];
+            if (count[s] > maxC) { maxV=s; maxC=count[s]; }
         norm[maxV] += (short)ToDistribute;
         return 0;
     }
@@ -548,7 +548,7 @@ static size_t FSE_normalizeM2(short* norm, U32 tableLog, const unsigned* count,
     if (total == 0) {
         /* all of the symbols were low enough for the lowOne or lowThreshold */
         for (s=0; ToDistribute > 0; s = (s+1)%(maxSymbolValue+1))
-            if (norm[s] > 0) ToDistribute--, norm[s]++;
+            if (norm[s] > 0) { ToDistribute--; norm[s]++; }
         return 0;
     }
@@ -604,7 +604,7 @@ size_t FSE_normalizeCount (short* normalizedCounter, unsigned tableLog,
                 U64 restToBeat = vStep * rtbTable[proba];
                 proba += (count[s]*step) - ((U64)proba<<scale) > restToBeat;
             }
-            if (proba > largestP) largestP=proba, largest=s;
+            if (proba > largestP) { largestP=proba; largest=s; }
             normalizedCounter[s] = proba;
             stillToDistribute -= proba;
     }   }

View File

@@ -46,6 +46,7 @@
 #include <string.h>     /* memcpy, memset */
 #include <stdio.h>      /* printf (debug) */
 #include "bitstream.h"
+#include "compiler.h"
 #define FSE_STATIC_LINKING_ONLY   /* FSE_optimalTableLog_internal */
 #include "fse.h"        /* header compression */
 #define HUF_STATIC_LINKING_ONLY
@@ -322,7 +323,10 @@ static void HUF_sort(nodeElt* huffNode, const U32* count, U32 maxSymbolValue)
         U32 const c = count[n];
         U32 const r = BIT_highbit32(c+1) + 1;
         U32 pos = rank[r].current++;
-        while ((pos > rank[r].base) && (c > huffNode[pos-1].count)) huffNode[pos]=huffNode[pos-1], pos--;
+        while ((pos > rank[r].base) && (c > huffNode[pos-1].count)) {
+            huffNode[pos] = huffNode[pos-1];
+            pos--;
+        }
         huffNode[pos].count = c;
         huffNode[pos].byte  = (BYTE)n;
     }
@@ -331,10 +335,10 @@ static void HUF_sort(nodeElt* huffNode, const U32* count, U32 maxSymbolValue)
 /** HUF_buildCTable_wksp() :
  *  Same as HUF_buildCTable(), but using externally allocated scratch buffer.
- *  `workSpace` must be aligned on 4-bytes boundaries, and be at least as large as a table of 1024 unsigned.
+ *  `workSpace` must be aligned on 4-bytes boundaries, and be at least as large as a table of HUF_CTABLE_WORKSPACE_SIZE_U32 unsigned.
  */
 #define STARTNODE (HUF_SYMBOLVALUE_MAX+1)
-typedef nodeElt huffNodeTable[2*HUF_SYMBOLVALUE_MAX+1 +1];
+typedef nodeElt huffNodeTable[HUF_CTABLE_WORKSPACE_SIZE_U32];
 size_t HUF_buildCTable_wksp (HUF_CElt* tree, const U32* count, U32 maxSymbolValue, U32 maxNbBits, void* workSpace, size_t wkspSize)
 {
     nodeElt* const huffNode0 = (nodeElt*)workSpace;
@@ -345,9 +349,10 @@ size_t HUF_buildCTable_wksp (HUF_CElt* tree, const U32* count, U32 maxSymbolValu
     U32 nodeRoot;
 
     /* safety checks */
-    if (wkspSize < sizeof(huffNodeTable)) return ERROR(GENERIC);   /* workSpace is not large enough */
+    if (((size_t)workSpace & 3) != 0) return ERROR(GENERIC);       /* must be aligned on 4-bytes boundaries */
+    if (wkspSize < sizeof(huffNodeTable)) return ERROR(workSpace_tooSmall);
     if (maxNbBits == 0) maxNbBits = HUF_TABLELOG_DEFAULT;
-    if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) return ERROR(GENERIC);
+    if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) return ERROR(maxSymbolValue_tooLarge);
     memset(huffNode0, 0, sizeof(huffNodeTable));
 
     /* sort, decreasing order */
@@ -433,117 +438,69 @@ static int HUF_validateCTable(const HUF_CElt* CTable, const unsigned* count, uns
   return !bad;
 }
 
-static void HUF_encodeSymbol(BIT_CStream_t* bitCPtr, U32 symbol, const HUF_CElt* CTable)
-{
-    BIT_addBitsFast(bitCPtr, CTable[symbol].val, CTable[symbol].nbBits);
-}
-
 size_t HUF_compressBound(size_t size) { return HUF_COMPRESSBOUND(size); }
 
-#define HUF_FLUSHBITS(s)  BIT_flushBits(s)
-#define HUF_FLUSHBITS_1(stream) \
-    if (sizeof((stream)->bitContainer)*8 < HUF_TABLELOG_MAX*2+7) HUF_FLUSHBITS(stream)
-#define HUF_FLUSHBITS_2(stream) \
-    if (sizeof((stream)->bitContainer)*8 < HUF_TABLELOG_MAX*4+7) HUF_FLUSHBITS(stream)
+#define FUNCTION(fn) fn##_default
+#define TARGET
+#include "huf_compress_impl.h"
+#undef TARGET
+#undef FUNCTION
+
+#if DYNAMIC_BMI2
+
+#define FUNCTION(fn) fn##_bmi2
+#define TARGET TARGET_ATTRIBUTE("bmi2")
+#include "huf_compress_impl.h"
+#undef TARGET
+#undef FUNCTION
+
+#endif
+
+static size_t HUF_compress1X_usingCTable_internal(void* dst, size_t dstSize,
+                                   const void* src, size_t srcSize,
+                                   const HUF_CElt* CTable, const int bmi2)
+{
+#if DYNAMIC_BMI2
+    if (bmi2) {
+        return HUF_compress1X_usingCTable_internal_bmi2(dst, dstSize, src, srcSize, CTable);
+    }
+#endif
+    (void)bmi2;
+    return HUF_compress1X_usingCTable_internal_default(dst, dstSize, src, srcSize, CTable);
+}
+
+static size_t HUF_compress4X_usingCTable_internal(void* dst, size_t dstSize,
+                                   const void* src, size_t srcSize,
+                                   const HUF_CElt* CTable, const int bmi2)
+{
+#if DYNAMIC_BMI2
+    if (bmi2) {
+        return HUF_compress4X_usingCTable_internal_bmi2(dst, dstSize, src, srcSize, CTable);
+    }
+#endif
+    (void)bmi2;
+    return HUF_compress4X_usingCTable_internal_default(dst, dstSize, src, srcSize, CTable);
+}
 
 size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable)
 {
-    const BYTE* ip = (const BYTE*) src;
-    BYTE* const ostart = (BYTE*)dst;
-    BYTE* const oend = ostart + dstSize;
-    BYTE* op = ostart;
-    size_t n;
-    BIT_CStream_t bitC;
-
-    /* init */
-    if (dstSize < 8) return 0;   /* not enough space to compress */
-    { size_t const initErr = BIT_initCStream(&bitC, op, oend-op);
-      if (HUF_isError(initErr)) return 0; }
-
-    n = srcSize & ~3;  /* join to mod 4 */
-    switch (srcSize & 3)
-    {
-        case 3 : HUF_encodeSymbol(&bitC, ip[n+ 2], CTable);
-                 HUF_FLUSHBITS_2(&bitC);
-                 /* fall-through */
-        case 2 : HUF_encodeSymbol(&bitC, ip[n+ 1], CTable);
-                 HUF_FLUSHBITS_1(&bitC);
-                 /* fall-through */
-        case 1 : HUF_encodeSymbol(&bitC, ip[n+ 0], CTable);
-                 HUF_FLUSHBITS(&bitC);
-                 /* fall-through */
-        case 0 : /* fall-through */
-        default: break;
-    }
-
-    for (; n>0; n-=4) {  /* note : n&3==0 at this stage */
-        HUF_encodeSymbol(&bitC, ip[n- 1], CTable);
-        HUF_FLUSHBITS_1(&bitC);
-        HUF_encodeSymbol(&bitC, ip[n- 2], CTable);
-        HUF_FLUSHBITS_2(&bitC);
-        HUF_encodeSymbol(&bitC, ip[n- 3], CTable);
-        HUF_FLUSHBITS_1(&bitC);
-        HUF_encodeSymbol(&bitC, ip[n- 4], CTable);
-        HUF_FLUSHBITS(&bitC);
-    }
-
-    return BIT_closeCStream(&bitC);
+    return HUF_compress1X_usingCTable_internal(dst, dstSize, src, srcSize, CTable, /* bmi2 */ 0);
 }
 
 size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable)
 {
-    size_t const segmentSize = (srcSize+3)/4;   /* first 3 segments */
-    const BYTE* ip = (const BYTE*) src;
-    const BYTE* const iend = ip + srcSize;
-    BYTE* const ostart = (BYTE*) dst;
-    BYTE* const oend = ostart + dstSize;
-    BYTE* op = ostart;
-
-    if (dstSize < 6 + 1 + 1 + 1 + 8) return 0;   /* minimum space to compress successfully */
-    if (srcSize < 12) return 0;   /* no saving possible : too small input */
-    op += 6;   /* jumpTable */
-
-    { CHECK_V_F(cSize, HUF_compress1X_usingCTable(op, oend-op, ip, segmentSize, CTable) );
-      if (cSize==0) return 0;
-      MEM_writeLE16(ostart, (U16)cSize);
-      op += cSize;
-    }
-
-    ip += segmentSize;
-    { CHECK_V_F(cSize, HUF_compress1X_usingCTable(op, oend-op, ip, segmentSize, CTable) );
-      if (cSize==0) return 0;
-      MEM_writeLE16(ostart+2, (U16)cSize);
-      op += cSize;
-    }
-
-    ip += segmentSize;
-    { CHECK_V_F(cSize, HUF_compress1X_usingCTable(op, oend-op, ip, segmentSize, CTable) );
-      if (cSize==0) return 0;
-      MEM_writeLE16(ostart+4, (U16)cSize);
-      op += cSize;
-    }
-
-    ip += segmentSize;
-    { CHECK_V_F(cSize, HUF_compress1X_usingCTable(op, oend-op, ip, iend-ip, CTable) );
-      if (cSize==0) return 0;
-      op += cSize;
-    }
-
-    return op-ostart;
+    return HUF_compress4X_usingCTable_internal(dst, dstSize, src, srcSize, CTable, /* bmi2 */ 0);
 }
 
 static size_t HUF_compressCTable_internal(
                 BYTE* const ostart, BYTE* op, BYTE* const oend,
                 const void* src, size_t srcSize,
-                unsigned singleStream, const HUF_CElt* CTable)
+                unsigned singleStream, const HUF_CElt* CTable, const int bmi2)
 {
     size_t const cSize = singleStream ?
-                         HUF_compress1X_usingCTable(op, oend - op, src, srcSize, CTable) :
-                         HUF_compress4X_usingCTable(op, oend - op, src, srcSize, CTable);
+                         HUF_compress1X_usingCTable_internal(op, oend - op, src, srcSize, CTable, bmi2) :
+                         HUF_compress4X_usingCTable_internal(op, oend - op, src, srcSize, CTable, bmi2);
     if (HUF_isError(cSize)) { return cSize; }
     if (cSize==0) { return 0; }   /* uncompressible */
     op += cSize;
@@ -552,86 +509,98 @@ static size_t HUF_compressCTable_internal(
     return op-ostart;
 }
 
+typedef struct {
+    U32 count[HUF_SYMBOLVALUE_MAX + 1];
+    HUF_CElt CTable[HUF_SYMBOLVALUE_MAX + 1];
+    huffNodeTable nodeTable;
+} HUF_compress_tables_t;
+
-/* `workSpace` must a table of at least 1024 unsigned */
+/* HUF_compress_internal() :
+ * `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U32 unsigned */
 static size_t HUF_compress_internal (
                 void* dst, size_t dstSize,
                 const void* src, size_t srcSize,
                 unsigned maxSymbolValue, unsigned huffLog,
                 unsigned singleStream,
                 void* workSpace, size_t wkspSize,
-                HUF_CElt* oldHufTable, HUF_repeat* repeat, int preferRepeat)
+                HUF_CElt* oldHufTable, HUF_repeat* repeat, int preferRepeat,
+                const int bmi2)
 {
+    HUF_compress_tables_t* const table = (HUF_compress_tables_t*)workSpace;
     BYTE* const ostart = (BYTE*)dst;
     BYTE* const oend = ostart + dstSize;
    BYTE* op = ostart;
-    U32* count;
-    size_t const countSize = sizeof(U32) * (HUF_SYMBOLVALUE_MAX + 1);
-    HUF_CElt* CTable;
-    size_t const CTableSize = sizeof(HUF_CElt) * (HUF_SYMBOLVALUE_MAX + 1);
 
     /* checks & inits */
-    if (wkspSize < sizeof(huffNodeTable) + countSize + CTableSize) return ERROR(GENERIC);
-    if (!srcSize) return 0;  /* Uncompressed (note : 1 means rle, so first byte must be correct) */
-    if (!dstSize) return 0;  /* cannot fit within dst budget */
+    if (((size_t)workSpace & 3) != 0) return ERROR(GENERIC);  /* must be aligned on 4-bytes boundaries */
+    if (wkspSize < sizeof(*table)) return ERROR(workSpace_tooSmall);
+    if (!srcSize) return 0;  /* Uncompressed */
+    if (!dstSize) return 0;  /* cannot fit anything within dst budget */
     if (srcSize > HUF_BLOCKSIZE_MAX) return ERROR(srcSize_wrong);   /* current block size limit */
     if (huffLog > HUF_TABLELOG_MAX) return ERROR(tableLog_tooLarge);
+    if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) return ERROR(maxSymbolValue_tooLarge);
     if (!maxSymbolValue) maxSymbolValue = HUF_SYMBOLVALUE_MAX;
     if (!huffLog) huffLog = HUF_TABLELOG_DEFAULT;
 
-    count = (U32*)workSpace;
-    workSpace = (BYTE*)workSpace + countSize;
-    wkspSize -= countSize;
-    CTable = (HUF_CElt*)workSpace;
-    workSpace = (BYTE*)workSpace + CTableSize;
-    wkspSize -= CTableSize;
-
-    /* Heuristic : If we don't need to check the validity of the old table use the old table for small inputs */
+    /* Heuristic : If old table is valid, use it for small inputs */
     if (preferRepeat && repeat && *repeat == HUF_repeat_valid) {
-        return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, singleStream, oldHufTable);
+        return HUF_compressCTable_internal(ostart, op, oend,
+                                           src, srcSize,
+                                           singleStream, oldHufTable, bmi2);
     }
 
     /* Scan input and build symbol stats */
-    {   CHECK_V_F(largest, FSE_count_wksp (count, &maxSymbolValue, (const BYTE*)src, srcSize, (U32*)workSpace) );
+    {   CHECK_V_F(largest, FSE_count_wksp (table->count, &maxSymbolValue, (const BYTE*)src, srcSize, table->count) );
        if (largest == srcSize) { *ostart = ((const BYTE*)src)[0]; return 1; }   /* single symbol, rle */
-       if (largest <= (srcSize >> 7)+1) return 0;   /* Fast heuristic : not compressible enough */
+       if (largest <= (srcSize >> 7)+1) return 0;   /* heuristic : probably not compressible enough */
    }
 
     /* Check validity of previous table */
-    if (repeat && *repeat == HUF_repeat_check && !HUF_validateCTable(oldHufTable, count, maxSymbolValue)) {
+    if ( repeat
+      && *repeat == HUF_repeat_check
+      && !HUF_validateCTable(oldHufTable, table->count, maxSymbolValue)) {
        *repeat = HUF_repeat_none;
    }
    /* Heuristic : use existing table for small inputs */
    if (preferRepeat && repeat && *repeat != HUF_repeat_none) {
-        return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, singleStream, oldHufTable);
+        return HUF_compressCTable_internal(ostart, op, oend,
+                                           src, srcSize,
+                                           singleStream, oldHufTable, bmi2);
    }
 
    /* Build Huffman Tree */
    huffLog = HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue);
-    {   CHECK_V_F(maxBits, HUF_buildCTable_wksp (CTable, count, maxSymbolValue, huffLog, workSpace, wkspSize) );
+    {   CHECK_V_F(maxBits, HUF_buildCTable_wksp(table->CTable, table->count,
+                                                maxSymbolValue, huffLog,
+                                                table->nodeTable, sizeof(table->nodeTable)) );
        huffLog = (U32)maxBits;
-        /* Zero the unused symbols so we can check it for validity */
-        memset(CTable + maxSymbolValue + 1, 0, CTableSize - (maxSymbolValue + 1) * sizeof(HUF_CElt));
+        /* Zero unused symbols in CTable, so we can check it for validity */
+        memset(table->CTable + (maxSymbolValue + 1), 0,
+               sizeof(table->CTable) - ((maxSymbolValue + 1) * sizeof(HUF_CElt)));
    }
 
    /* Write table description header */
-    {   CHECK_V_F(hSize, HUF_writeCTable (op, dstSize, CTable, maxSymbolValue, huffLog) );
-        /* Check if using the previous table will be beneficial */
+    {   CHECK_V_F(hSize, HUF_writeCTable (op, dstSize, table->CTable, maxSymbolValue, huffLog) );
+        /* Check if using previous huffman table is beneficial */
        if (repeat && *repeat != HUF_repeat_none) {
-            size_t const oldSize = HUF_estimateCompressedSize(oldHufTable, count, maxSymbolValue);
-            size_t const newSize = HUF_estimateCompressedSize(CTable, count, maxSymbolValue);
+            size_t const oldSize = HUF_estimateCompressedSize(oldHufTable, table->count, maxSymbolValue);
+            size_t const newSize = HUF_estimateCompressedSize(table->CTable, table->count, maxSymbolValue);
            if (oldSize <= hSize + newSize || hSize + 12 >= srcSize) {
-                return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, singleStream, oldHufTable);
-            }
-        }
-        /* Use the new table */
+                return HUF_compressCTable_internal(ostart, op, oend,
+                                                   src, srcSize,
+                                                   singleStream, oldHufTable, bmi2);
+        }   }
+
+        /* Use the new huffman table */
        if (hSize + 12ul >= srcSize) { return 0; }
        op += hSize;
        if (repeat) { *repeat = HUF_repeat_none; }
-        if (oldHufTable) { memcpy(oldHufTable, CTable, CTableSize); }  /* Save the new table */
+        if (oldHufTable)
+            memcpy(oldHufTable, table->CTable, sizeof(table->CTable));  /* Save new table */
    }
-    return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, singleStream, CTable);
+    return HUF_compressCTable_internal(ostart, op, oend,
+                                       src, srcSize,
+                                       singleStream, table->CTable, bmi2);
 }
@ -640,52 +609,70 @@ size_t HUF_compress1X_wksp (void* dst, size_t dstSize,
unsigned maxSymbolValue, unsigned huffLog, unsigned maxSymbolValue, unsigned huffLog,
void* workSpace, size_t wkspSize) void* workSpace, size_t wkspSize)
{ {
return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, 1 /* single stream */, workSpace, wkspSize, NULL, NULL, 0); return HUF_compress_internal(dst, dstSize, src, srcSize,
maxSymbolValue, huffLog, 1 /*single stream*/,
workSpace, wkspSize,
NULL, NULL, 0, 0 /*bmi2*/);
} }
size_t HUF_compress1X_repeat (void* dst, size_t dstSize, size_t HUF_compress1X_repeat (void* dst, size_t dstSize,
const void* src, size_t srcSize, const void* src, size_t srcSize,
unsigned maxSymbolValue, unsigned huffLog, unsigned maxSymbolValue, unsigned huffLog,
void* workSpace, size_t wkspSize, void* workSpace, size_t wkspSize,
HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat) HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2)
{ {
return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, 1 /* single stream */, workSpace, wkspSize, hufTable, repeat, preferRepeat); return HUF_compress_internal(dst, dstSize, src, srcSize,
maxSymbolValue, huffLog, 1 /*single stream*/,
workSpace, wkspSize, hufTable,
repeat, preferRepeat, bmi2);
} }
size_t HUF_compress1X (void* dst, size_t dstSize, size_t HUF_compress1X (void* dst, size_t dstSize,
const void* src, size_t srcSize, const void* src, size_t srcSize,
unsigned maxSymbolValue, unsigned huffLog) unsigned maxSymbolValue, unsigned huffLog)
{ {
unsigned workSpace[1024]; unsigned workSpace[HUF_WORKSPACE_SIZE_U32];
return HUF_compress1X_wksp(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, workSpace, sizeof(workSpace)); return HUF_compress1X_wksp(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, workSpace, sizeof(workSpace));
} }
/* HUF_compress4X_repeat():
* compress input using 4 streams.
* provide workspace to generate compression tables */
size_t HUF_compress4X_wksp (void* dst, size_t dstSize, size_t HUF_compress4X_wksp (void* dst, size_t dstSize,
const void* src, size_t srcSize, const void* src, size_t srcSize,
unsigned maxSymbolValue, unsigned huffLog, unsigned maxSymbolValue, unsigned huffLog,
void* workSpace, size_t wkspSize) void* workSpace, size_t wkspSize)
{ {
return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, 0 /* 4 streams */, workSpace, wkspSize, NULL, NULL, 0); return HUF_compress_internal(dst, dstSize, src, srcSize,
maxSymbolValue, huffLog, 0 /*4 streams*/,
workSpace, wkspSize,
NULL, NULL, 0, 0 /*bmi2*/);
} }
/* HUF_compress4X_repeat():
* compress input using 4 streams.
* re-use an existing huffman compression table */
size_t HUF_compress4X_repeat (void* dst, size_t dstSize, size_t HUF_compress4X_repeat (void* dst, size_t dstSize,
const void* src, size_t srcSize, const void* src, size_t srcSize,
unsigned maxSymbolValue, unsigned huffLog, unsigned maxSymbolValue, unsigned huffLog,
void* workSpace, size_t wkspSize, void* workSpace, size_t wkspSize,
HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat) HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2)
{ {
return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, 0 /* 4 streams */, workSpace, wkspSize, hufTable, repeat, preferRepeat); return HUF_compress_internal(dst, dstSize, src, srcSize,
maxSymbolValue, huffLog, 0 /* 4 streams */,
workSpace, wkspSize,
hufTable, repeat, preferRepeat, bmi2);
} }
size_t HUF_compress2 (void* dst, size_t dstSize, size_t HUF_compress2 (void* dst, size_t dstSize,
const void* src, size_t srcSize, const void* src, size_t srcSize,
unsigned maxSymbolValue, unsigned huffLog) unsigned maxSymbolValue, unsigned huffLog)
{ {
unsigned workSpace[1024]; unsigned workSpace[HUF_WORKSPACE_SIZE_U32];
return HUF_compress4X_wksp(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, workSpace, sizeof(workSpace)); return HUF_compress4X_wksp(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, workSpace, sizeof(workSpace));
} }
size_t HUF_compress (void* dst, size_t maxDstSize, const void* src, size_t srcSize) size_t HUF_compress (void* dst, size_t maxDstSize, const void* src, size_t srcSize)
{ {
return HUF_compress2(dst, maxDstSize, src, (U32)srcSize, 255, HUF_TABLELOG_DEFAULT); return HUF_compress2(dst, maxDstSize, src, srcSize, 255, HUF_TABLELOG_DEFAULT);
} }
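
The `_wksp` variants above differ from the `HUF_compress()`/`HUF_compress2()` wrappers only in who owns the scratch memory: the wrappers burn a `HUF_WORKSPACE_SIZE_U32` stack array per call, while the `_wksp` entry points let the caller allocate the buffer once and reuse it. A minimal caller sketch (hypothetical helper name, using only the API shown above):

#include <stdlib.h>
#include "huf.h"   /* HUF_compress4X_wksp, HUF_WORKSPACE_SIZE, HUF_TABLELOG_DEFAULT */

/* Same effect as HUF_compress2(), but with a heap-allocated, reusable
 * workspace instead of the large stack array. */
static size_t compress_with_wksp(void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize)
{
    void* const wksp = malloc(HUF_WORKSPACE_SIZE);
    size_t cSize;
    if (wksp == NULL) return 0;   /* treat alloc failure as "not compressed" */
    cSize = HUF_compress4X_wksp(dst, dstCapacity, src, srcSize,
                                255 /*maxSymbolValue*/, HUF_TABLELOG_DEFAULT,
                                wksp, HUF_WORKSPACE_SIZE);
    free(wksp);
    return cSize;   /* 0 => not compressible; HUF_isError() for real errors */
}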

View File

@ -0,0 +1,120 @@
/*
* Copyright (c) 2018-present, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under both the BSD-style license (found in the
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
* in the COPYING file in the root directory of this source tree).
* You may select, at your option, one of the above-listed licenses.
*/
#ifndef FUNCTION
# error "FUNCTION(name) must be defined"
#endif
#ifndef TARGET
# error "TARGET must be defined"
#endif
static void FUNCTION(HUF_encodeSymbol)(BIT_CStream_t* bitCPtr, U32 symbol, const HUF_CElt* CTable)
{
BIT_addBitsFast(bitCPtr, CTable[symbol].val, CTable[symbol].nbBits);
}
#define HUF_FLUSHBITS(s) BIT_flushBits(s)
#define HUF_FLUSHBITS_1(stream) \
if (sizeof((stream)->bitContainer)*8 < HUF_TABLELOG_MAX*2+7) HUF_FLUSHBITS(stream)
#define HUF_FLUSHBITS_2(stream) \
if (sizeof((stream)->bitContainer)*8 < HUF_TABLELOG_MAX*4+7) HUF_FLUSHBITS(stream)
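
These two conditional macros exist so the hot loop below only flushes when the accumulator could actually overflow. Assuming HUF_TABLELOG_MAX == 12 (its value in huf.h, outside this diff), the thresholds work out as:

64-bit bitContainer : 64 >= 12*2+7 (=31) and 64 >= 12*4+7 (=55)
                      -> HUF_FLUSHBITS_1 and HUF_FLUSHBITS_2 both compile away
32-bit bitContainer : 32 >= 31, but 32 < 55
                      -> only HUF_FLUSHBITS_2 actually flushes

So a 64-bit build accumulates up to four symbols between unconditional flushes, a 32-bit build at most two.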
static TARGET
size_t FUNCTION(HUF_compress1X_usingCTable_internal)(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable)
{
const BYTE* ip = (const BYTE*) src;
BYTE* const ostart = (BYTE*)dst;
BYTE* const oend = ostart + dstSize;
BYTE* op = ostart;
size_t n;
BIT_CStream_t bitC;
/* init */
if (dstSize < 8) return 0; /* not enough space to compress */
{ size_t const initErr = BIT_initCStream(&bitC, op, oend-op);
if (HUF_isError(initErr)) return 0; }
n = srcSize & ~3; /* round down srcSize to a multiple of 4 */
switch (srcSize & 3)
{
case 3 : FUNCTION(HUF_encodeSymbol)(&bitC, ip[n+ 2], CTable);
HUF_FLUSHBITS_2(&bitC);
/* fall-through */
case 2 : FUNCTION(HUF_encodeSymbol)(&bitC, ip[n+ 1], CTable);
HUF_FLUSHBITS_1(&bitC);
/* fall-through */
case 1 : FUNCTION(HUF_encodeSymbol)(&bitC, ip[n+ 0], CTable);
HUF_FLUSHBITS(&bitC);
/* fall-through */
case 0 : /* fall-through */
default: break;
}
for (; n>0; n-=4) { /* note : n&3==0 at this stage */
FUNCTION(HUF_encodeSymbol)(&bitC, ip[n- 1], CTable);
HUF_FLUSHBITS_1(&bitC);
FUNCTION(HUF_encodeSymbol)(&bitC, ip[n- 2], CTable);
HUF_FLUSHBITS_2(&bitC);
FUNCTION(HUF_encodeSymbol)(&bitC, ip[n- 3], CTable);
HUF_FLUSHBITS_1(&bitC);
FUNCTION(HUF_encodeSymbol)(&bitC, ip[n- 4], CTable);
HUF_FLUSHBITS(&bitC);
}
return BIT_closeCStream(&bitC);
}
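
Note the encoding direction: the `srcSize & 3` remainder is emitted first, then the main loop walks `n` downward, so symbols enter the bitstream back-to-front. A small trace with illustrative numbers:

srcSize = 6  ->  n = srcSize & ~3 = 4
switch (srcSize & 3 == 2) : encode ip[5], then ip[4]   (fall-through)
loop, n = 4               : encode ip[3], ip[2], ip[1], ip[0]

Since Huffman bitstreams are read in the reverse of write order, this back-to-front emission lets the decoder regenerate ip[0], ip[1], ... in natural order.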
static TARGET
size_t FUNCTION(HUF_compress4X_usingCTable_internal)(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable)
{
size_t const segmentSize = (srcSize+3)/4; /* first 3 segments */
const BYTE* ip = (const BYTE*) src;
const BYTE* const iend = ip + srcSize;
BYTE* const ostart = (BYTE*) dst;
BYTE* const oend = ostart + dstSize;
BYTE* op = ostart;
if (dstSize < 6 + 1 + 1 + 1 + 8) return 0; /* minimum space to compress successfully */
if (srcSize < 12) return 0; /* no saving possible : too small input */
op += 6; /* jumpTable */
{ CHECK_V_F(cSize, FUNCTION(HUF_compress1X_usingCTable_internal)(op, oend-op, ip, segmentSize, CTable) );
if (cSize==0) return 0;
MEM_writeLE16(ostart, (U16)cSize);
op += cSize;
}
ip += segmentSize;
{ CHECK_V_F(cSize, FUNCTION(HUF_compress1X_usingCTable_internal)(op, oend-op, ip, segmentSize, CTable) );
if (cSize==0) return 0;
MEM_writeLE16(ostart+2, (U16)cSize);
op += cSize;
}
ip += segmentSize;
{ CHECK_V_F(cSize, FUNCTION(HUF_compress1X_usingCTable_internal)(op, oend-op, ip, segmentSize, CTable) );
if (cSize==0) return 0;
MEM_writeLE16(ostart+4, (U16)cSize);
op += cSize;
}
ip += segmentSize;
{ CHECK_V_F(cSize, FUNCTION(HUF_compress1X_usingCTable_internal)(op, oend-op, ip, iend-ip, CTable) );
if (cSize==0) return 0;
op += cSize;
}
return op-ostart;
}
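
The 6 bytes reserved for the jumpTable hold the compressed sizes of the first three streams as little-endian 16-bit values; the fourth stream simply runs to the end of the block. A decoder-side sketch of locating the four streams (an illustration, not code from this commit, using the MEM_readLE16() helper from mem.h):

{   size_t const cSize1 = MEM_readLE16((const BYTE*)cSrc);      /* stream 1 */
    size_t const cSize2 = MEM_readLE16((const BYTE*)cSrc + 2);  /* stream 2 */
    size_t const cSize3 = MEM_readLE16((const BYTE*)cSrc + 4);  /* stream 3 */
    const BYTE* const istart1 = (const BYTE*)cSrc + 6;          /* skip jumpTable */
    const BYTE* const istart2 = istart1 + cSize1;
    const BYTE* const istart3 = istart2 + cSize2;
    const BYTE* const istart4 = istart3 + cSize3;   /* runs to end of block */
}

Each of the first three streams regenerates exactly segmentSize = (totalSize+3)/4 bytes; the last one covers the remainder.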

View File

@ -21,6 +21,7 @@
* Dependencies * Dependencies
***************************************/ ***************************************/
#include <string.h> /* memset */ #include <string.h> /* memset */
#include "cpu.h"
#include "mem.h" #include "mem.h"
#define FSE_STATIC_LINKING_ONLY /* FSE_encodeSymbol */ #define FSE_STATIC_LINKING_ONLY /* FSE_encodeSymbol */
#include "fse.h" #include "fse.h"
@ -73,6 +74,7 @@ ZSTD_CCtx* ZSTD_createCCtx_advanced(ZSTD_customMem customMem)
cctx->customMem = customMem; cctx->customMem = customMem;
cctx->requestedParams.compressionLevel = ZSTD_CLEVEL_DEFAULT; cctx->requestedParams.compressionLevel = ZSTD_CLEVEL_DEFAULT;
cctx->requestedParams.fParams.contentSizeFlag = 1; cctx->requestedParams.fParams.contentSizeFlag = 1;
cctx->bmi2 = ZSTD_cpuid_bmi2(ZSTD_cpuid());
return cctx; return cctx;
} }
} }
@ -96,6 +98,7 @@ ZSTD_CCtx* ZSTD_initStaticCCtx(void *workspace, size_t workspaceSize)
void* const ptr = cctx->blockState.nextCBlock + 1; void* const ptr = cctx->blockState.nextCBlock + 1;
cctx->entropyWorkspace = (U32*)ptr; cctx->entropyWorkspace = (U32*)ptr;
} }
cctx->bmi2 = ZSTD_cpuid_bmi2(ZSTD_cpuid());
return cctx; return cctx;
} }
@ -140,8 +143,6 @@ size_t ZSTD_sizeof_CStream(const ZSTD_CStream* zcs)
/* private API call, for dictBuilder only */ /* private API call, for dictBuilder only */
const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx) { return &(ctx->seqStore); } const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx) { return &(ctx->seqStore); }
#define ZSTD_CLEVEL_CUSTOM 999
static ZSTD_compressionParameters ZSTD_getCParamsFromCCtxParams( static ZSTD_compressionParameters ZSTD_getCParamsFromCCtxParams(
ZSTD_CCtx_params CCtxParams, U64 srcSizeHint, size_t dictSize) ZSTD_CCtx_params CCtxParams, U64 srcSizeHint, size_t dictSize)
{ {
@ -160,13 +161,6 @@ static void ZSTD_cLevelToCCtxParams_srcSize(ZSTD_CCtx_params* CCtxParams, U64 sr
CCtxParams->compressionLevel = ZSTD_CLEVEL_CUSTOM; CCtxParams->compressionLevel = ZSTD_CLEVEL_CUSTOM;
} }
static void ZSTD_cLevelToCParams(ZSTD_CCtx* cctx)
{
DEBUGLOG(4, "ZSTD_cLevelToCParams: level=%i", cctx->requestedParams.compressionLevel);
ZSTD_cLevelToCCtxParams_srcSize(
&cctx->requestedParams, cctx->pledgedSrcSizePlusOne-1);
}
static void ZSTD_cLevelToCCtxParams(ZSTD_CCtx_params* CCtxParams) static void ZSTD_cLevelToCCtxParams(ZSTD_CCtx_params* CCtxParams)
{ {
DEBUGLOG(4, "ZSTD_cLevelToCCtxParams"); DEBUGLOG(4, "ZSTD_cLevelToCCtxParams");
@ -246,10 +240,48 @@ static ZSTD_CCtx_params ZSTD_assignParamsToCCtxParams(
return ERROR(parameter_outOfBound); \ return ERROR(parameter_outOfBound); \
} } } }
static int ZSTD_isUpdateAuthorized(ZSTD_cParameter param)
{
switch(param)
{
case ZSTD_p_compressionLevel:
case ZSTD_p_hashLog:
case ZSTD_p_chainLog:
case ZSTD_p_searchLog:
case ZSTD_p_minMatch:
case ZSTD_p_targetLength:
case ZSTD_p_compressionStrategy:
return 1;
case ZSTD_p_format:
case ZSTD_p_windowLog:
case ZSTD_p_contentSizeFlag:
case ZSTD_p_checksumFlag:
case ZSTD_p_dictIDFlag:
case ZSTD_p_forceMaxWindow :
case ZSTD_p_nbWorkers:
case ZSTD_p_jobSize:
case ZSTD_p_overlapSizeLog:
case ZSTD_p_enableLongDistanceMatching:
case ZSTD_p_ldmHashLog:
case ZSTD_p_ldmMinMatch:
case ZSTD_p_ldmBucketSizeLog:
case ZSTD_p_ldmHashEveryLog:
default:
return 0;
}
}
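
ZSTD_isUpdateAuthorized() is what drives the new cParamsChanged path: parameters in the first group may now be changed after zcss_init, i.e. while a (multithreaded) compression is running, and are picked up at the next job boundary. A hypothetical streaming sketch, using only entry points declared in this diff:

ZSTD_CCtx* const cctx = ZSTD_createCCtx();
ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbWorkers, 2);          /* async MT mode */
ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, 3);
/* ... several ZSTD_compress_generic(cctx, &out, &in, ZSTD_e_continue) calls ... */
ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, 19);  /* authorized mid-frame */
/* jobs started from here use the level-19 cParams; a window-affecting
 * parameter such as ZSTD_p_windowLog would still fail with stage_wrong */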
size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned value) size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned value)
{ {
DEBUGLOG(4, "ZSTD_CCtx_setParameter (%u, %u)", (U32)param, value); DEBUGLOG(4, "ZSTD_CCtx_setParameter (%u, %u)", (U32)param, value);
if (cctx->streamStage != zcss_init) return ERROR(stage_wrong); if (cctx->streamStage != zcss_init) {
if (ZSTD_isUpdateAuthorized(param)) {
cctx->cParamsChanged = 1;
} else {
return ERROR(stage_wrong);
} }
switch(param) switch(param)
{ {
@ -268,7 +300,9 @@ size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned v
case ZSTD_p_targetLength: case ZSTD_p_targetLength:
case ZSTD_p_compressionStrategy: case ZSTD_p_compressionStrategy:
if (cctx->cdict) return ERROR(stage_wrong); if (cctx->cdict) return ERROR(stage_wrong);
if (value>0) ZSTD_cLevelToCParams(cctx); /* Can optimize if srcSize is known */ if (value>0) {
ZSTD_cLevelToCCtxParams_srcSize(&cctx->requestedParams, cctx->pledgedSrcSizePlusOne-1); /* Optimize cParams when srcSize is known */
}
return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value); return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
case ZSTD_p_contentSizeFlag: case ZSTD_p_contentSizeFlag:
@ -281,20 +315,20 @@ size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned v
* default : 0 when using a CDict, 1 when using a Prefix */ * default : 0 when using a CDict, 1 when using a Prefix */
return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value); return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
case ZSTD_p_nbThreads: case ZSTD_p_nbWorkers:
if ((value > 1) && cctx->staticSize) { if ((value>0) && cctx->staticSize) {
return ERROR(parameter_unsupported); /* MT not compatible with static alloc */ return ERROR(parameter_unsupported); /* MT not compatible with static alloc */
} }
return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value); return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
case ZSTD_p_nonBlockingMode:
case ZSTD_p_jobSize: case ZSTD_p_jobSize:
case ZSTD_p_overlapSizeLog: case ZSTD_p_overlapSizeLog:
return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value); return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
case ZSTD_p_enableLongDistanceMatching: case ZSTD_p_enableLongDistanceMatching:
if (cctx->cdict) return ERROR(stage_wrong); if (cctx->cdict) return ERROR(stage_wrong);
if (value>0) ZSTD_cLevelToCParams(cctx); if (value>0)
ZSTD_cLevelToCCtxParams_srcSize(&cctx->requestedParams, cctx->pledgedSrcSizePlusOne-1); /* Optimize cParams when srcSize is known */
return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value); return ZSTD_CCtxParam_setParameter(&cctx->requestedParams, param, value);
case ZSTD_p_ldmHashLog: case ZSTD_p_ldmHashLog:
@ -403,21 +437,12 @@ size_t ZSTD_CCtxParam_setParameter(
CCtxParams->forceWindow = (value > 0); CCtxParams->forceWindow = (value > 0);
return CCtxParams->forceWindow; return CCtxParams->forceWindow;
case ZSTD_p_nbThreads : case ZSTD_p_nbWorkers :
if (value == 0) return CCtxParams->nbThreads;
#ifndef ZSTD_MULTITHREAD #ifndef ZSTD_MULTITHREAD
if (value > 1) return ERROR(parameter_unsupported); if (value > 0) return ERROR(parameter_unsupported);
return 1; return 0;
#else #else
return ZSTDMT_CCtxParam_setNbThreads(CCtxParams, value); return ZSTDMT_CCtxParam_setNbWorkers(CCtxParams, value);
#endif
case ZSTD_p_nonBlockingMode :
#ifndef ZSTD_MULTITHREAD
return ERROR(parameter_unsupported);
#else
CCtxParams->nonBlockingMode = (value>0);
return CCtxParams->nonBlockingMode;
#endif #endif
case ZSTD_p_jobSize : case ZSTD_p_jobSize :
@ -476,6 +501,9 @@ size_t ZSTD_CCtxParam_setParameter(
/** ZSTD_CCtx_setParametersUsingCCtxParams() : /** ZSTD_CCtx_setParametersUsingCCtxParams() :
* just applies `params` into `cctx` * just applies `params` into `cctx`
* no action is performed, parameters are merely stored. * no action is performed, parameters are merely stored.
* If ZSTDMT is enabled, parameters are pushed to cctx->mtctx.
 * This is possible even while a compression is ongoing,
 * in which case the new parameters are applied on the fly, starting with the next compression job.
*/ */
size_t ZSTD_CCtx_setParametersUsingCCtxParams( size_t ZSTD_CCtx_setParametersUsingCCtxParams(
ZSTD_CCtx* cctx, const ZSTD_CCtx_params* params) ZSTD_CCtx* cctx, const ZSTD_CCtx_params* params)
@ -484,7 +512,6 @@ size_t ZSTD_CCtx_setParametersUsingCCtxParams(
if (cctx->cdict) return ERROR(stage_wrong); if (cctx->cdict) return ERROR(stage_wrong);
cctx->requestedParams = *params; cctx->requestedParams = *params;
return 0; return 0;
} }
@ -680,7 +707,7 @@ static size_t ZSTD_sizeof_matchState(ZSTD_compressionParameters const* cParams,
size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params) size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
{ {
/* Estimate CCtx size is supported for single-threaded compression only. */ /* Estimate CCtx size is supported for single-threaded compression only. */
if (params->nbThreads > 1) { return ERROR(GENERIC); } if (params->nbWorkers > 0) { return ERROR(GENERIC); }
{ ZSTD_compressionParameters const cParams = { ZSTD_compressionParameters const cParams =
ZSTD_getCParamsFromCCtxParams(*params, 0, 0); ZSTD_getCParamsFromCCtxParams(*params, 0, 0);
size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << cParams.windowLog); size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << cParams.windowLog);
@ -691,12 +718,11 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
size_t const blockStateSpace = 2 * sizeof(ZSTD_compressedBlockState_t); size_t const blockStateSpace = 2 * sizeof(ZSTD_compressedBlockState_t);
size_t const matchStateSize = ZSTD_sizeof_matchState(&params->cParams, /* forCCtx */ 1); size_t const matchStateSize = ZSTD_sizeof_matchState(&params->cParams, /* forCCtx */ 1);
size_t const ldmSpace = params->ldmParams.enableLdm ? size_t const ldmSpace = ZSTD_ldm_getTableSize(params->ldmParams);
ZSTD_ldm_getTableSize(params->ldmParams.hashLog, size_t const ldmSeqSpace = ZSTD_ldm_getMaxNbSeq(params->ldmParams, blockSize) * sizeof(rawSeq);
params->ldmParams.bucketSizeLog) : 0;
size_t const neededSpace = entropySpace + blockStateSpace + tokenSpace + size_t const neededSpace = entropySpace + blockStateSpace + tokenSpace +
matchStateSize + ldmSpace; matchStateSize + ldmSpace + ldmSeqSpace;
DEBUGLOG(5, "sizeof(ZSTD_CCtx) : %u", (U32)sizeof(ZSTD_CCtx)); DEBUGLOG(5, "sizeof(ZSTD_CCtx) : %u", (U32)sizeof(ZSTD_CCtx));
DEBUGLOG(5, "estimate workSpace : %u", (U32)neededSpace); DEBUGLOG(5, "estimate workSpace : %u", (U32)neededSpace);
@ -729,7 +755,7 @@ size_t ZSTD_estimateCCtxSize(int compressionLevel)
size_t ZSTD_estimateCStreamSize_usingCCtxParams(const ZSTD_CCtx_params* params) size_t ZSTD_estimateCStreamSize_usingCCtxParams(const ZSTD_CCtx_params* params)
{ {
if (params->nbThreads > 1) { return ERROR(GENERIC); } if (params->nbWorkers > 0) { return ERROR(GENERIC); }
{ size_t const CCtxSize = ZSTD_estimateCCtxSize_usingCCtxParams(params); { size_t const CCtxSize = ZSTD_estimateCCtxSize_usingCCtxParams(params);
size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << params->cParams.windowLog); size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << params->cParams.windowLog);
size_t const inBuffSize = ((size_t)1 << params->cParams.windowLog) + blockSize; size_t const inBuffSize = ((size_t)1 << params->cParams.windowLog) + blockSize;
@ -768,7 +794,7 @@ size_t ZSTD_estimateCStreamSize(int compressionLevel) {
ZSTD_frameProgression ZSTD_getFrameProgression(const ZSTD_CCtx* cctx) ZSTD_frameProgression ZSTD_getFrameProgression(const ZSTD_CCtx* cctx)
{ {
#ifdef ZSTD_MULTITHREAD #ifdef ZSTD_MULTITHREAD
if ((cctx->appliedParams.nbThreads > 1) || (cctx->appliedParams.nonBlockingMode)) { if (cctx->appliedParams.nbWorkers > 0) {
return ZSTDMT_getFrameProgression(cctx->mtctx); return ZSTDMT_getFrameProgression(cctx->mtctx);
} }
#endif #endif
@ -857,13 +883,9 @@ static void ZSTD_reset_compressedBlockState(ZSTD_compressedBlockState_t* bs)
*/ */
static void ZSTD_invalidateMatchState(ZSTD_matchState_t* ms) static void ZSTD_invalidateMatchState(ZSTD_matchState_t* ms)
{ {
size_t const endT = (size_t)(ms->nextSrc - ms->base); ZSTD_window_clear(&ms->window);
U32 const end = (U32)endT;
assert(endT < (3U<<30));
ms->lowLimit = end; ms->nextToUpdate = ms->window.dictLimit + 1;
ms->dictLimit = end;
ms->nextToUpdate = end + 1;
ms->loadedDictEnd = 0; ms->loadedDictEnd = 0;
ms->opt.litLengthSum = 0; /* force reset of btopt stats */ ms->opt.litLengthSum = 0; /* force reset of btopt stats */
} }
@ -905,10 +927,8 @@ static void* ZSTD_reset_matchState(ZSTD_matchState_t* ms, void* ptr, ZSTD_compre
assert(((size_t)ptr & 3) == 0); assert(((size_t)ptr & 3) == 0);
ms->nextSrc = NULL;
ms->base = NULL;
ms->dictBase = NULL;
ms->hashLog3 = hashLog3; ms->hashLog3 = hashLog3;
memset(&ms->window, 0, sizeof(ms->window));
ZSTD_invalidateMatchState(ms); ZSTD_invalidateMatchState(ms);
/* opt parser space */ /* opt parser space */
@ -963,7 +983,8 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
if (params.ldmParams.enableLdm) { if (params.ldmParams.enableLdm) {
/* Adjust long distance matching parameters */ /* Adjust long distance matching parameters */
ZSTD_ldm_adjustParameters(&params.ldmParams, params.cParams.windowLog); params.ldmParams.windowLog = params.cParams.windowLog;
ZSTD_ldm_adjustParameters(&params.ldmParams, &params.cParams);
assert(params.ldmParams.hashLog >= params.ldmParams.bucketSizeLog); assert(params.ldmParams.hashLog >= params.ldmParams.bucketSizeLog);
assert(params.ldmParams.hashEveryLog < 32); assert(params.ldmParams.hashEveryLog < 32);
zc->ldmState.hashPower = zc->ldmState.hashPower =
@ -978,17 +999,19 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
size_t const buffOutSize = (zbuff==ZSTDb_buffered) ? ZSTD_compressBound(blockSize)+1 : 0; size_t const buffOutSize = (zbuff==ZSTDb_buffered) ? ZSTD_compressBound(blockSize)+1 : 0;
size_t const buffInSize = (zbuff==ZSTDb_buffered) ? windowSize + blockSize : 0; size_t const buffInSize = (zbuff==ZSTDb_buffered) ? windowSize + blockSize : 0;
size_t const matchStateSize = ZSTD_sizeof_matchState(&params.cParams, /* forCCtx */ 1); size_t const matchStateSize = ZSTD_sizeof_matchState(&params.cParams, /* forCCtx */ 1);
size_t const maxNbLdmSeq = ZSTD_ldm_getMaxNbSeq(params.ldmParams, blockSize);
void* ptr; void* ptr;
/* Check if workSpace is large enough, alloc a new one if needed */ /* Check if workSpace is large enough, alloc a new one if needed */
{ size_t const entropySpace = HUF_WORKSPACE_SIZE; { size_t const entropySpace = HUF_WORKSPACE_SIZE;
size_t const blockStateSpace = 2 * sizeof(ZSTD_compressedBlockState_t); size_t const blockStateSpace = 2 * sizeof(ZSTD_compressedBlockState_t);
size_t const bufferSpace = buffInSize + buffOutSize; size_t const bufferSpace = buffInSize + buffOutSize;
size_t const ldmSpace = params.ldmParams.enableLdm size_t const ldmSpace = ZSTD_ldm_getTableSize(params.ldmParams);
? ZSTD_ldm_getTableSize(params.ldmParams.hashLog, params.ldmParams.bucketSizeLog) size_t const ldmSeqSpace = maxNbLdmSeq * sizeof(rawSeq);
: 0;
size_t const neededSpace = entropySpace + blockStateSpace + ldmSpace + size_t const neededSpace = entropySpace + blockStateSpace + ldmSpace +
matchStateSize + tokenSpace + bufferSpace; ldmSeqSpace + matchStateSize + tokenSpace +
bufferSpace;
DEBUGLOG(4, "Need %uKB workspace, including %uKB for match state, and %uKB for buffers", DEBUGLOG(4, "Need %uKB workspace, including %uKB for match state, and %uKB for buffers",
(U32)(neededSpace>>10), (U32)(matchStateSize>>10), (U32)(bufferSpace>>10)); (U32)(neededSpace>>10), (U32)(matchStateSize>>10), (U32)(bufferSpace>>10));
DEBUGLOG(4, "windowSize: %u - blockSize: %u", (U32)windowSize, (U32)blockSize); DEBUGLOG(4, "windowSize: %u - blockSize: %u", (U32)windowSize, (U32)blockSize);
@ -1043,7 +1066,12 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
assert(((size_t)ptr & 3) == 0); /* ensure ptr is properly aligned */ assert(((size_t)ptr & 3) == 0); /* ensure ptr is properly aligned */
zc->ldmState.hashTable = (ldmEntry_t*)ptr; zc->ldmState.hashTable = (ldmEntry_t*)ptr;
ptr = zc->ldmState.hashTable + ldmHSize; ptr = zc->ldmState.hashTable + ldmHSize;
zc->ldmSequences = (rawSeq*)ptr;
ptr = zc->ldmSequences + maxNbLdmSeq;
memset(&zc->ldmState.window, 0, sizeof(zc->ldmState.window));
} }
assert(((size_t)ptr & 3) == 0); /* ensure ptr is properly aligned */
ptr = ZSTD_reset_matchState(&zc->blockState.matchState, ptr, &params.cParams, crp, /* forCCtx */ 1); ptr = ZSTD_reset_matchState(&zc->blockState.matchState, ptr, &params.cParams, crp, /* forCCtx */ 1);
@ -1083,7 +1111,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
void ZSTD_invalidateRepCodes(ZSTD_CCtx* cctx) { void ZSTD_invalidateRepCodes(ZSTD_CCtx* cctx) {
int i; int i;
for (i=0; i<ZSTD_REP_NUM; i++) cctx->blockState.prevCBlock->rep[i] = 0; for (i=0; i<ZSTD_REP_NUM; i++) cctx->blockState.prevCBlock->rep[i] = 0;
assert(/* !extDict */ cctx->blockState.matchState.lowLimit == cctx->blockState.matchState.dictLimit); assert(!ZSTD_window_hasExtDict(cctx->blockState.matchState.window));
} }
static size_t ZSTD_resetCCtx_usingCDict(ZSTD_CCtx* cctx, static size_t ZSTD_resetCCtx_usingCDict(ZSTD_CCtx* cctx,
@ -1125,13 +1153,9 @@ static size_t ZSTD_resetCCtx_usingCDict(ZSTD_CCtx* cctx,
{ {
ZSTD_matchState_t const* srcMatchState = &cdict->matchState; ZSTD_matchState_t const* srcMatchState = &cdict->matchState;
ZSTD_matchState_t* dstMatchState = &cctx->blockState.matchState; ZSTD_matchState_t* dstMatchState = &cctx->blockState.matchState;
dstMatchState->window = srcMatchState->window;
dstMatchState->nextToUpdate = srcMatchState->nextToUpdate; dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
dstMatchState->nextToUpdate3= srcMatchState->nextToUpdate3; dstMatchState->nextToUpdate3= srcMatchState->nextToUpdate3;
dstMatchState->nextSrc = srcMatchState->nextSrc;
dstMatchState->base = srcMatchState->base;
dstMatchState->dictBase = srcMatchState->dictBase;
dstMatchState->dictLimit = srcMatchState->dictLimit;
dstMatchState->lowLimit = srcMatchState->lowLimit;
dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd; dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
} }
cctx->dictID = cdict->dictID; cctx->dictID = cdict->dictID;
@ -1186,13 +1210,9 @@ static size_t ZSTD_copyCCtx_internal(ZSTD_CCtx* dstCCtx,
{ {
ZSTD_matchState_t const* srcMatchState = &srcCCtx->blockState.matchState; ZSTD_matchState_t const* srcMatchState = &srcCCtx->blockState.matchState;
ZSTD_matchState_t* dstMatchState = &dstCCtx->blockState.matchState; ZSTD_matchState_t* dstMatchState = &dstCCtx->blockState.matchState;
dstMatchState->window = srcMatchState->window;
dstMatchState->nextToUpdate = srcMatchState->nextToUpdate; dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
dstMatchState->nextToUpdate3= srcMatchState->nextToUpdate3; dstMatchState->nextToUpdate3= srcMatchState->nextToUpdate3;
dstMatchState->nextSrc = srcMatchState->nextSrc;
dstMatchState->base = srcMatchState->base;
dstMatchState->dictBase = srcMatchState->dictBase;
dstMatchState->dictLimit = srcMatchState->dictLimit;
dstMatchState->lowLimit = srcMatchState->lowLimit;
dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd; dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
} }
dstCCtx->dictID = srcCCtx->dictID; dstCCtx->dictID = srcCCtx->dictID;
@ -1260,19 +1280,6 @@ static void ZSTD_reduceTable_btlazy2(U32* const table, U32 const size, U32 const
ZSTD_reduceTable_internal(table, size, reducerValue, 1); ZSTD_reduceTable_internal(table, size, reducerValue, 1);
} }
/*! ZSTD_ldm_reduceTable() :
* reduce table indexes by `reducerValue` */
static void ZSTD_ldm_reduceTable(ldmEntry_t* const table, U32 const size,
U32 const reducerValue)
{
U32 u;
for (u = 0; u < size; u++) {
if (table[u].offset < reducerValue) table[u].offset = 0;
else table[u].offset -= reducerValue;
}
}
/*! ZSTD_reduceIndex() : /*! ZSTD_reduceIndex() :
* rescale all indexes to avoid future overflow (indexes are U32) */ * rescale all indexes to avoid future overflow (indexes are U32) */
static void ZSTD_reduceIndex (ZSTD_CCtx* zc, const U32 reducerValue) static void ZSTD_reduceIndex (ZSTD_CCtx* zc, const U32 reducerValue)
@ -1294,11 +1301,6 @@ static void ZSTD_reduceIndex (ZSTD_CCtx* zc, const U32 reducerValue)
U32 const h3Size = (U32)1 << ms->hashLog3; U32 const h3Size = (U32)1 << ms->hashLog3;
ZSTD_reduceTable(ms->hashTable3, h3Size, reducerValue); ZSTD_reduceTable(ms->hashTable3, h3Size, reducerValue);
} }
if (zc->appliedParams.ldmParams.enableLdm) {
U32 const ldmHSize = (U32)1 << zc->appliedParams.ldmParams.hashLog;
ZSTD_ldm_reduceTable(zc->ldmState.hashTable, ldmHSize, reducerValue);
}
} }
@ -1377,7 +1379,7 @@ static size_t ZSTD_compressLiterals (ZSTD_entropyCTables_t const* prevEntropy,
ZSTD_strategy strategy, ZSTD_strategy strategy,
void* dst, size_t dstCapacity, void* dst, size_t dstCapacity,
const void* src, size_t srcSize, const void* src, size_t srcSize,
U32* workspace) U32* workspace, const int bmi2)
{ {
size_t const minGain = ZSTD_minGain(srcSize); size_t const minGain = ZSTD_minGain(srcSize);
size_t const lhSize = 3 + (srcSize >= 1 KB) + (srcSize >= 16 KB); size_t const lhSize = 3 + (srcSize >= 1 KB) + (srcSize >= 16 KB);
@ -1402,9 +1404,9 @@ static size_t ZSTD_compressLiterals (ZSTD_entropyCTables_t const* prevEntropy,
int const preferRepeat = strategy < ZSTD_lazy ? srcSize <= 1024 : 0; int const preferRepeat = strategy < ZSTD_lazy ? srcSize <= 1024 : 0;
if (repeat == HUF_repeat_valid && lhSize == 3) singleStream = 1; if (repeat == HUF_repeat_valid && lhSize == 3) singleStream = 1;
cLitSize = singleStream ? HUF_compress1X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11, cLitSize = singleStream ? HUF_compress1X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
workspace, HUF_WORKSPACE_SIZE, (HUF_CElt*)nextEntropy->hufCTable, &repeat, preferRepeat) workspace, HUF_WORKSPACE_SIZE, (HUF_CElt*)nextEntropy->hufCTable, &repeat, preferRepeat, bmi2)
: HUF_compress4X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11, : HUF_compress4X_repeat(ostart+lhSize, dstCapacity-lhSize, src, srcSize, 255, 11,
workspace, HUF_WORKSPACE_SIZE, (HUF_CElt*)nextEntropy->hufCTable, &repeat, preferRepeat); workspace, HUF_WORKSPACE_SIZE, (HUF_CElt*)nextEntropy->hufCTable, &repeat, preferRepeat, bmi2);
if (repeat != HUF_repeat_none) { if (repeat != HUF_repeat_none) {
/* reused the existing table */ /* reused the existing table */
hType = set_repeat; hType = set_repeat;
@ -1559,95 +1561,52 @@ size_t ZSTD_buildCTable(void* dst, size_t dstCapacity,
} }
} }
MEM_STATIC #define FUNCTION(fn) fn##_default
#define TARGET
#include "zstd_compress_impl.h"
#undef TARGET
#undef FUNCTION
#if DYNAMIC_BMI2
#define FUNCTION(fn) fn##_bmi2
#define TARGET TARGET_ATTRIBUTE("bmi2")
#include "zstd_compress_impl.h"
#undef TARGET
#undef FUNCTION
#endif
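
This is the double-compilation idiom: the body in zstd_compress_impl.h is instantiated once as ZSTD_encodeSequences_default and once as ZSTD_encodeSequences_bmi2 with the bmi2 target attribute, so only this translation unit needs BMI2 code generation while the binary keeps running on older CPUs. A self-contained miniature of the pattern (hypothetical names; assumes the GCC/Clang-style target attribute that TARGET_ATTRIBUTE wraps):

static int work_default(int x) { return x * 2; }

#if DYNAMIC_BMI2
__attribute__((target("bmi2")))      /* what TARGET_ATTRIBUTE("bmi2") expands to */
static int work_bmi2(int x) { return x * 2; }   /* compiler may use BMI2 here */
#endif

static int work(int x, int bmi2)     /* flag comes from ZSTD_cpuid_bmi2() */
{
#if DYNAMIC_BMI2
    if (bmi2) return work_bmi2(x);
#endif
    (void)bmi2;
    return work_default(x);
}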
size_t ZSTD_encodeSequences( size_t ZSTD_encodeSequences(
void* dst, size_t dstCapacity, void* dst, size_t dstCapacity,
FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
seqDef const* sequences, size_t nbSeq, int longOffsets) seqDef const* sequences, size_t nbSeq, int longOffsets, int bmi2)
{ {
BIT_CStream_t blockStream; #if DYNAMIC_BMI2
FSE_CState_t stateMatchLength; if (bmi2) {
FSE_CState_t stateOffsetBits; return ZSTD_encodeSequences_bmi2(dst, dstCapacity,
FSE_CState_t stateLitLength; CTable_MatchLength, mlCodeTable,
CTable_OffsetBits, ofCodeTable,
CHECK_E(BIT_initCStream(&blockStream, dst, dstCapacity), dstSize_tooSmall); /* not enough space remaining */ CTable_LitLength, llCodeTable,
sequences, nbSeq, longOffsets);
/* first symbols */
FSE_initCState2(&stateMatchLength, CTable_MatchLength, mlCodeTable[nbSeq-1]);
FSE_initCState2(&stateOffsetBits, CTable_OffsetBits, ofCodeTable[nbSeq-1]);
FSE_initCState2(&stateLitLength, CTable_LitLength, llCodeTable[nbSeq-1]);
BIT_addBits(&blockStream, sequences[nbSeq-1].litLength, LL_bits[llCodeTable[nbSeq-1]]);
if (MEM_32bits()) BIT_flushBits(&blockStream);
BIT_addBits(&blockStream, sequences[nbSeq-1].matchLength, ML_bits[mlCodeTable[nbSeq-1]]);
if (MEM_32bits()) BIT_flushBits(&blockStream);
if (longOffsets) {
U32 const ofBits = ofCodeTable[nbSeq-1];
int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
if (extraBits) {
BIT_addBits(&blockStream, sequences[nbSeq-1].offset, extraBits);
BIT_flushBits(&blockStream);
}
BIT_addBits(&blockStream, sequences[nbSeq-1].offset >> extraBits,
ofBits - extraBits);
} else {
BIT_addBits(&blockStream, sequences[nbSeq-1].offset, ofCodeTable[nbSeq-1]);
}
BIT_flushBits(&blockStream);
{ size_t n;
for (n=nbSeq-2 ; n<nbSeq ; n--) { /* intentional underflow */
BYTE const llCode = llCodeTable[n];
BYTE const ofCode = ofCodeTable[n];
BYTE const mlCode = mlCodeTable[n];
U32 const llBits = LL_bits[llCode];
U32 const ofBits = ofCode;
U32 const mlBits = ML_bits[mlCode];
DEBUGLOG(6, "encoding: litlen:%2u - matchlen:%2u - offCode:%7u",
sequences[n].litLength,
sequences[n].matchLength + MINMATCH,
sequences[n].offset); /* 32b*/ /* 64b*/
/* (7)*/ /* (7)*/
FSE_encodeSymbol(&blockStream, &stateOffsetBits, ofCode); /* 15 */ /* 15 */
FSE_encodeSymbol(&blockStream, &stateMatchLength, mlCode); /* 24 */ /* 24 */
if (MEM_32bits()) BIT_flushBits(&blockStream); /* (7)*/
FSE_encodeSymbol(&blockStream, &stateLitLength, llCode); /* 16 */ /* 33 */
if (MEM_32bits() || (ofBits+mlBits+llBits >= 64-7-(LLFSELog+MLFSELog+OffFSELog)))
BIT_flushBits(&blockStream); /* (7)*/
BIT_addBits(&blockStream, sequences[n].litLength, llBits);
if (MEM_32bits() && ((llBits+mlBits)>24)) BIT_flushBits(&blockStream);
BIT_addBits(&blockStream, sequences[n].matchLength, mlBits);
if (MEM_32bits() || (ofBits+mlBits+llBits > 56)) BIT_flushBits(&blockStream);
if (longOffsets) {
int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
if (extraBits) {
BIT_addBits(&blockStream, sequences[n].offset, extraBits);
BIT_flushBits(&blockStream); /* (7)*/
}
BIT_addBits(&blockStream, sequences[n].offset >> extraBits,
ofBits - extraBits); /* 31 */
} else {
BIT_addBits(&blockStream, sequences[n].offset, ofBits); /* 31 */
}
BIT_flushBits(&blockStream); /* (7)*/
} }
FSE_flushCState(&blockStream, &stateMatchLength);
FSE_flushCState(&blockStream, &stateOffsetBits);
FSE_flushCState(&blockStream, &stateLitLength);
{ size_t const streamSize = BIT_closeCStream(&blockStream);
if (streamSize==0) return ERROR(dstSize_tooSmall); /* not enough space */
return streamSize;
} }
#endif
(void)bmi2;
return ZSTD_encodeSequences_default(dst, dstCapacity,
CTable_MatchLength, mlCodeTable,
CTable_OffsetBits, ofCodeTable,
CTable_LitLength, llCodeTable,
sequences, nbSeq, longOffsets);
} }
MEM_STATIC size_t ZSTD_compressSequences_internal(seqStore_t* seqStorePtr, MEM_STATIC size_t ZSTD_compressSequences_internal(seqStore_t* seqStorePtr,
ZSTD_entropyCTables_t const* prevEntropy, ZSTD_entropyCTables_t const* prevEntropy,
ZSTD_entropyCTables_t* nextEntropy, ZSTD_entropyCTables_t* nextEntropy,
ZSTD_compressionParameters const* cParams, ZSTD_compressionParameters const* cParams,
void* dst, size_t dstCapacity, U32* workspace) void* dst, size_t dstCapacity, U32* workspace,
const int bmi2)
{ {
const int longOffsets = cParams->windowLog > STREAM_ACCUMULATOR_MIN; const int longOffsets = cParams->windowLog > STREAM_ACCUMULATOR_MIN;
U32 count[MaxSeq+1]; U32 count[MaxSeq+1];
@ -1672,7 +1631,7 @@ MEM_STATIC size_t ZSTD_compressSequences_internal(seqStore_t* seqStorePtr,
size_t const litSize = seqStorePtr->lit - literals; size_t const litSize = seqStorePtr->lit - literals;
size_t const cSize = ZSTD_compressLiterals(prevEntropy, nextEntropy, size_t const cSize = ZSTD_compressLiterals(prevEntropy, nextEntropy,
cParams->strategy, op, dstCapacity, literals, litSize, cParams->strategy, op, dstCapacity, literals, litSize,
workspace); workspace, bmi2);
if (ZSTD_isError(cSize)) if (ZSTD_isError(cSize))
return cSize; return cSize;
assert(cSize <= dstCapacity); assert(cSize <= dstCapacity);
@ -1752,7 +1711,7 @@ MEM_STATIC size_t ZSTD_compressSequences_internal(seqStore_t* seqStorePtr,
CTable_OffsetBits, ofCodeTable, CTable_OffsetBits, ofCodeTable,
CTable_LitLength, llCodeTable, CTable_LitLength, llCodeTable,
sequences, nbSeq, sequences, nbSeq,
longOffsets); longOffsets, bmi2);
if (ZSTD_isError(bitstreamSize)) return bitstreamSize; if (ZSTD_isError(bitstreamSize)) return bitstreamSize;
op += bitstreamSize; op += bitstreamSize;
} }
@ -1765,11 +1724,11 @@ MEM_STATIC size_t ZSTD_compressSequences(seqStore_t* seqStorePtr,
ZSTD_entropyCTables_t* nextEntropy, ZSTD_entropyCTables_t* nextEntropy,
ZSTD_compressionParameters const* cParams, ZSTD_compressionParameters const* cParams,
void* dst, size_t dstCapacity, void* dst, size_t dstCapacity,
size_t srcSize, U32* workspace) size_t srcSize, U32* workspace, int bmi2)
{ {
size_t const cSize = ZSTD_compressSequences_internal( size_t const cSize = ZSTD_compressSequences_internal(
seqStorePtr, prevEntropy, nextEntropy, cParams, dst, dstCapacity, seqStorePtr, prevEntropy, nextEntropy, cParams, dst, dstCapacity,
workspace); workspace, bmi2);
/* If the srcSize <= dstCapacity, then there is enough space to write a /* If the srcSize <= dstCapacity, then there is enough space to write a
* raw uncompressed block. Since we ran out of space, the block must not * raw uncompressed block. Since we ran out of space, the block must not
* be compressible, so fall back to a raw uncompressed block. * be compressible, so fall back to a raw uncompressed block.
@ -1836,41 +1795,46 @@ static size_t ZSTD_compressBlock_internal(ZSTD_CCtx* zc,
void* dst, size_t dstCapacity, void* dst, size_t dstCapacity,
const void* src, size_t srcSize) const void* src, size_t srcSize)
{ {
ZSTD_matchState_t* const ms = &zc->blockState.matchState;
DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%u, dictLimit=%u, nextToUpdate=%u)", DEBUGLOG(5, "ZSTD_compressBlock_internal (dstCapacity=%u, dictLimit=%u, nextToUpdate=%u)",
(U32)dstCapacity, zc->blockState.matchState.dictLimit, zc->blockState.matchState.nextToUpdate); (U32)dstCapacity, ms->window.dictLimit, ms->nextToUpdate);
if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1)
return 0; /* don't even attempt compression below a certain srcSize */ return 0; /* don't even attempt compression below a certain srcSize */
ZSTD_resetSeqStore(&(zc->seqStore)); ZSTD_resetSeqStore(&(zc->seqStore));
/* limited update after a very long match */ /* limited update after a very long match */
{ const BYTE* const base = zc->blockState.matchState.base; { const BYTE* const base = ms->window.base;
const BYTE* const istart = (const BYTE*)src; const BYTE* const istart = (const BYTE*)src;
const U32 current = (U32)(istart-base); const U32 current = (U32)(istart-base);
if (current > zc->blockState.matchState.nextToUpdate + 384) if (current > ms->nextToUpdate + 384)
zc->blockState.matchState.nextToUpdate = current - MIN(192, (U32)(current - zc->blockState.matchState.nextToUpdate - 384)); ms->nextToUpdate = current - MIN(192, (U32)(current - ms->nextToUpdate - 384));
} }
/* select and store sequences */ /* select and store sequences */
{ U32 const extDict = zc->blockState.matchState.lowLimit < zc->blockState.matchState.dictLimit; { U32 const extDict = ZSTD_window_hasExtDict(ms->window);
size_t lastLLSize; size_t lastLLSize;
{ int i; for (i = 0; i < ZSTD_REP_NUM; ++i) zc->blockState.nextCBlock->rep[i] = zc->blockState.prevCBlock->rep[i]; } { int i; for (i = 0; i < ZSTD_REP_NUM; ++i) zc->blockState.nextCBlock->rep[i] = zc->blockState.prevCBlock->rep[i]; }
if (zc->appliedParams.ldmParams.enableLdm) { if (zc->appliedParams.ldmParams.enableLdm) {
typedef size_t (*ZSTD_ldmBlockCompressor)( size_t const nbSeq =
ldmState_t* ldms, ZSTD_matchState_t* ms, seqStore_t* seqStore, ZSTD_ldm_generateSequences(&zc->ldmState, zc->ldmSequences,
U32 rep[ZSTD_REP_NUM], ZSTD_CCtx_params const* params, &zc->appliedParams.ldmParams,
void const* src, size_t srcSize); src, srcSize, extDict);
ZSTD_ldmBlockCompressor const ldmBlockCompressor = extDict ? ZSTD_compressBlock_ldm_extDict : ZSTD_compressBlock_ldm; lastLLSize =
lastLLSize = ldmBlockCompressor(&zc->ldmState, &zc->blockState.matchState, &zc->seqStore, zc->blockState.nextCBlock->rep, &zc->appliedParams, src, srcSize); ZSTD_ldm_blockCompress(zc->ldmSequences, nbSeq,
ms, &zc->seqStore,
zc->blockState.nextCBlock->rep,
&zc->appliedParams.cParams,
src, srcSize, extDict);
} else { /* not long range mode */ } else { /* not long range mode */
ZSTD_blockCompressor const blockCompressor = ZSTD_selectBlockCompressor(zc->appliedParams.cParams.strategy, extDict); ZSTD_blockCompressor const blockCompressor = ZSTD_selectBlockCompressor(zc->appliedParams.cParams.strategy, extDict);
lastLLSize = blockCompressor(&zc->blockState.matchState, &zc->seqStore, zc->blockState.nextCBlock->rep, &zc->appliedParams.cParams, src, srcSize); lastLLSize = blockCompressor(ms, &zc->seqStore, zc->blockState.nextCBlock->rep, &zc->appliedParams.cParams, src, srcSize);
} }
{ const BYTE* const lastLiterals = (const BYTE*)src + srcSize - lastLLSize; { const BYTE* const lastLiterals = (const BYTE*)src + srcSize - lastLLSize;
ZSTD_storeLastLiterals(&zc->seqStore, lastLiterals, lastLLSize); ZSTD_storeLastLiterals(&zc->seqStore, lastLiterals, lastLLSize);
} } } }
/* encode sequences and literals */ /* encode sequences and literals */
{ size_t const cSize = ZSTD_compressSequences(&zc->seqStore, &zc->blockState.prevCBlock->entropy, &zc->blockState.nextCBlock->entropy, &zc->appliedParams.cParams, dst, dstCapacity, srcSize, zc->entropyWorkspace); { size_t const cSize = ZSTD_compressSequences(&zc->seqStore, &zc->blockState.prevCBlock->entropy, &zc->blockState.nextCBlock->entropy, &zc->appliedParams.cParams, dst, dstCapacity, srcSize, zc->entropyWorkspace, zc->bmi2);
if (ZSTD_isError(cSize) || cSize == 0) return cSize; if (ZSTD_isError(cSize) || cSize == 0) return cSize;
/* confirm repcodes and entropy tables */ /* confirm repcodes and entropy tables */
{ ZSTD_compressedBlockState_t* const tmp = zc->blockState.prevCBlock; { ZSTD_compressedBlockState_t* const tmp = zc->blockState.prevCBlock;
@ -1914,52 +1878,19 @@ static size_t ZSTD_compress_frameChunk (ZSTD_CCtx* cctx,
return ERROR(dstSize_tooSmall); /* not enough space to store compressed block */ return ERROR(dstSize_tooSmall); /* not enough space to store compressed block */
if (remaining < blockSize) blockSize = remaining; if (remaining < blockSize) blockSize = remaining;
/* preemptive overflow correction: if (ZSTD_window_needOverflowCorrection(ms->window)) {
* 1. correction is large enough: U32 const cycleLog = ZSTD_cycleLog(cctx->appliedParams.cParams.chainLog, cctx->appliedParams.cParams.strategy);
* lowLimit > (3<<29) ==> current > 3<<29 + 1<<windowLog - blockSize U32 const correction = ZSTD_window_correctOverflow(&ms->window, cycleLog, maxDist, ip);
* 1<<windowLog <= newCurrent < 1<<chainLog + 1<<windowLog
*
* current - newCurrent
* > (3<<29 + 1<<windowLog - blockSize) - (1<<windowLog + 1<<chainLog)
* > (3<<29 - blockSize) - (1<<chainLog)
* > (3<<29 - blockSize) - (1<<30) (NOTE: chainLog <= 30)
* > 1<<29 - 1<<17
*
* 2. (ip+blockSize - cctx->base) doesn't overflow:
* In 32 bit mode we limit windowLog to 30 so we don't get
* differences larger than 1<<31-1.
* 3. cctx->lowLimit < 1<<32:
* windowLog <= 31 ==> 3<<29 + 1<<windowLog < 7<<29 < 1<<32.
*/
if (ms->lowLimit > (3U<<29)) {
U32 const cycleMask = ((U32)1 << ZSTD_cycleLog(cctx->appliedParams.cParams.chainLog, cctx->appliedParams.cParams.strategy)) - 1;
U32 const current = (U32)(ip - cctx->blockState.matchState.base);
U32 const newCurrent = (current & cycleMask) + ((U32)1 << cctx->appliedParams.cParams.windowLog);
U32 const correction = current - newCurrent;
ZSTD_STATIC_ASSERT(ZSTD_CHAINLOG_MAX <= 30); ZSTD_STATIC_ASSERT(ZSTD_CHAINLOG_MAX <= 30);
ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX_32 <= 30); ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX_32 <= 30);
ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX <= 31); ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX <= 31);
assert(current > newCurrent);
assert(correction > 1<<28); /* Loose bound, should be about 1<<29 */
ZSTD_reduceIndex(cctx, correction); ZSTD_reduceIndex(cctx, correction);
ms->base += correction;
ms->dictBase += correction;
ms->lowLimit -= correction;
ms->dictLimit -= correction;
if (ms->nextToUpdate < correction) ms->nextToUpdate = 0; if (ms->nextToUpdate < correction) ms->nextToUpdate = 0;
else ms->nextToUpdate -= correction; else ms->nextToUpdate -= correction;
DEBUGLOG(4, "Correction of 0x%x bytes to lowLimit=0x%x", correction, ms->lowLimit);
}
/* enforce maxDist */
if ((U32)(ip+blockSize - ms->base) > ms->loadedDictEnd + maxDist) {
U32 const newLowLimit = (U32)(ip+blockSize - ms->base) - maxDist;
if (ms->lowLimit < newLowLimit) ms->lowLimit = newLowLimit;
if (ms->dictLimit < ms->lowLimit)
DEBUGLOG(5, "ZSTD_compress_frameChunk : update dictLimit from %u to %u ",
ms->dictLimit, ms->lowLimit);
if (ms->dictLimit < ms->lowLimit) ms->dictLimit = ms->lowLimit;
if (ms->nextToUpdate < ms->lowLimit) ms->nextToUpdate = ms->lowLimit;
} }
ZSTD_window_enforceMaxDist(&ms->window, ip + blockSize, ms->loadedDictEnd + maxDist);
if (ms->nextToUpdate < ms->window.lowLimit) ms->nextToUpdate = ms->window.lowLimit;
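
The reasoning from the deleted comment now lives behind ZSTD_window_needOverflowCorrection() / ZSTD_window_correctOverflow(), but the arithmetic is unchanged: when indexes approach the top of the U32 range, current is remapped to (current & cycleMask) + (1 << windowLog) and every stored index is shifted down by the difference. With illustrative numbers:

windowLog = 27, cycleLog = 17  ->  cycleMask = (1<<17)-1 = 0x1FFFF
current    = 0xC0000123
newCurrent = (0xC0000123 & 0x1FFFF) + (1<<27) = 0x123 + 0x08000000 = 0x08000123
correction = 0xC0000123 - 0x08000123          = 0xB8000000

ZSTD_reduceIndex() then subtracts correction from every table entry (zeroing those that would go negative), which preserves all match distances while reclaiming most of the index range.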
{ size_t cSize = ZSTD_compressBlock_internal(cctx, { size_t cSize = ZSTD_compressBlock_internal(cctx,
op+ZSTD_blockHeaderSize, dstCapacity-ZSTD_blockHeaderSize, op+ZSTD_blockHeaderSize, dstCapacity-ZSTD_blockHeaderSize,
@ -2051,38 +1982,12 @@ size_t ZSTD_writeLastEmptyBlock(void* dst, size_t dstCapacity)
} }
static void ZSTD_manageWindowContinuity(ZSTD_matchState_t* ms, void const* src, size_t srcSize)
{
const BYTE* const ip = (const BYTE*) src;
/* Check if blocks follow each other */
if (src != ms->nextSrc) {
/* not contiguous */
size_t const distanceFromBase = (size_t)(ms->nextSrc - ms->base);
DEBUGLOG(5, "ZSTD_manageWindowContinuity: non contiguous blocks, new segment starts at %u", ms->dictLimit);
ms->lowLimit = ms->dictLimit;
assert(distanceFromBase == (size_t)(U32)distanceFromBase); /* should never overflow */
ms->dictLimit = (U32)distanceFromBase;
ms->dictBase = ms->base;
ms->base = ip - distanceFromBase;
ms->nextToUpdate = ms->dictLimit;
if (ms->dictLimit - ms->lowLimit < HASH_READ_SIZE) ms->lowLimit = ms->dictLimit; /* too small extDict */
}
ms->nextSrc = ip + srcSize;
/* if input and dictionary overlap : reduce dictionary (area presumed modified by input) */
if ((ip+srcSize > ms->dictBase + ms->lowLimit) & (ip < ms->dictBase + ms->dictLimit)) {
ptrdiff_t const highInputIdx = (ip + srcSize) - ms->dictBase;
U32 const lowLimitMax = (highInputIdx > (ptrdiff_t)ms->dictLimit) ? ms->dictLimit : (U32)highInputIdx;
ms->lowLimit = lowLimitMax;
}
}
static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx, static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
void* dst, size_t dstCapacity, void* dst, size_t dstCapacity,
const void* src, size_t srcSize, const void* src, size_t srcSize,
U32 frame, U32 lastFrameChunk) U32 frame, U32 lastFrameChunk)
{ {
ZSTD_matchState_t* ms = &cctx->blockState.matchState;
size_t fhSize = 0; size_t fhSize = 0;
DEBUGLOG(5, "ZSTD_compressContinue_internal, stage: %u, srcSize: %u", DEBUGLOG(5, "ZSTD_compressContinue_internal, stage: %u, srcSize: %u",
@ -2100,7 +2005,11 @@ static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
if (!srcSize) return fhSize; /* do not generate an empty block if no input */ if (!srcSize) return fhSize; /* do not generate an empty block if no input */
ZSTD_manageWindowContinuity(&cctx->blockState.matchState, src, srcSize); if (!ZSTD_window_update(&ms->window, src, srcSize)) {
ms->nextToUpdate = ms->window.dictLimit;
}
if (cctx->appliedParams.ldmParams.enableLdm)
ZSTD_window_update(&cctx->ldmState.window, src, srcSize);
DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (U32)cctx->blockSize); DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (U32)cctx->blockSize);
{ size_t const cSize = frame ? { size_t const cSize = frame ?
@ -2152,15 +2061,8 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms, ZSTD_CCtx_params
const BYTE* const iend = ip + srcSize; const BYTE* const iend = ip + srcSize;
ZSTD_compressionParameters const* cParams = &params->cParams; ZSTD_compressionParameters const* cParams = &params->cParams;
/* input becomes current prefix */ ZSTD_window_update(&ms->window, src, srcSize);
ms->lowLimit = ms->dictLimit;
ms->dictLimit = (U32)(ms->nextSrc - ms->base);
ms->dictBase = ms->base;
ms->base = ip - ms->dictLimit;
ms->nextToUpdate = ms->dictLimit;
ms->loadedDictEnd = params->forceWindow ? 0 : (U32)(iend - ms->base);
ms->nextSrc = iend;
if (srcSize <= HASH_READ_SIZE) return 0; if (srcSize <= HASH_READ_SIZE) return 0;
switch(params->cParams.strategy) switch(params->cParams.strategy)
@ -2190,7 +2092,7 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms, ZSTD_CCtx_params
assert(0); /* not possible : not a valid strategy id */ assert(0); /* not possible : not a valid strategy id */
} }
ms->nextToUpdate = (U32)(iend - ms->base); ms->nextToUpdate = (U32)(iend - ms->window.base);
return 0; return 0;
} }
@ -2513,7 +2415,8 @@ size_t ZSTD_compress_advanced_internal(
const void* dict,size_t dictSize, const void* dict,size_t dictSize,
ZSTD_CCtx_params params) ZSTD_CCtx_params params)
{ {
DEBUGLOG(4, "ZSTD_compress_advanced_internal"); DEBUGLOG(4, "ZSTD_compress_advanced_internal (srcSize:%u)",
(U32)srcSize);
CHECK_F( ZSTD_compressBegin_internal(cctx, dict, dictSize, ZSTD_dm_auto, NULL, CHECK_F( ZSTD_compressBegin_internal(cctx, dict, dictSize, ZSTD_dm_auto, NULL,
params, srcSize, ZSTDb_not_buffered) ); params, srcSize, ZSTDb_not_buffered) );
return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize); return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
@ -2993,7 +2896,7 @@ MEM_STATIC size_t ZSTD_limitCopy(void* dst, size_t dstCapacity,
/** ZSTD_compressStream_generic(): /** ZSTD_compressStream_generic():
* internal function for all *compressStream*() variants and *compress_generic() * internal function for all *compressStream*() variants and *compress_generic()
* non-static, because can be called from zstdmt.c * non-static, because can be called from zstdmt_compress.c
* @return : hint size for next input */ * @return : hint size for next input */
size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs, size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
ZSTD_outBuffer* output, ZSTD_outBuffer* output,
@ -3172,28 +3075,28 @@ size_t ZSTD_compress_generic (ZSTD_CCtx* cctx,
#ifdef ZSTD_MULTITHREAD #ifdef ZSTD_MULTITHREAD
if ((cctx->pledgedSrcSizePlusOne-1) <= ZSTDMT_JOBSIZE_MIN) { if ((cctx->pledgedSrcSizePlusOne-1) <= ZSTDMT_JOBSIZE_MIN) {
params.nbThreads = 1; /* do not invoke multi-threading when src size is too small */ params.nbWorkers = 0; /* do not invoke multi-threading when src size is too small */
params.nonBlockingMode = 0;
} }
if ((params.nbThreads > 1) | (params.nonBlockingMode == 1)) { if (params.nbWorkers > 0) {
if (cctx->mtctx == NULL || (params.nbThreads != ZSTDMT_getNbThreads(cctx->mtctx))) { /* mt context creation */
DEBUGLOG(4, "ZSTD_compress_generic: creating new mtctx for nbThreads=%u", if (cctx->mtctx == NULL || (params.nbWorkers != ZSTDMT_getNbWorkers(cctx->mtctx))) {
params.nbThreads); DEBUGLOG(4, "ZSTD_compress_generic: creating new mtctx for nbWorkers=%u",
params.nbWorkers);
if (cctx->mtctx != NULL) if (cctx->mtctx != NULL)
DEBUGLOG(4, "ZSTD_compress_generic: previous nbThreads was %u", DEBUGLOG(4, "ZSTD_compress_generic: previous nbWorkers was %u",
ZSTDMT_getNbThreads(cctx->mtctx)); ZSTDMT_getNbWorkers(cctx->mtctx));
ZSTDMT_freeCCtx(cctx->mtctx); ZSTDMT_freeCCtx(cctx->mtctx);
cctx->mtctx = ZSTDMT_createCCtx_advanced(params.nbThreads, cctx->customMem); cctx->mtctx = ZSTDMT_createCCtx_advanced(params.nbWorkers, cctx->customMem);
if (cctx->mtctx == NULL) return ERROR(memory_allocation); if (cctx->mtctx == NULL) return ERROR(memory_allocation);
} }
DEBUGLOG(4, "call ZSTDMT_initCStream_internal as nbThreads=%u", params.nbThreads); /* mt compression */
DEBUGLOG(4, "call ZSTDMT_initCStream_internal as nbWorkers=%u", params.nbWorkers);
CHECK_F( ZSTDMT_initCStream_internal( CHECK_F( ZSTDMT_initCStream_internal(
cctx->mtctx, cctx->mtctx,
prefixDict.dict, prefixDict.dictSize, ZSTD_dm_rawContent, prefixDict.dict, prefixDict.dictSize, ZSTD_dm_rawContent,
cctx->cdict, params, cctx->pledgedSrcSizePlusOne-1) ); cctx->cdict, params, cctx->pledgedSrcSizePlusOne-1) );
cctx->streamStage = zcss_load; cctx->streamStage = zcss_load;
cctx->appliedParams.nbThreads = params.nbThreads; cctx->appliedParams.nbWorkers = params.nbWorkers;
cctx->appliedParams.nonBlockingMode = params.nonBlockingMode;
} else } else
#endif #endif
{ CHECK_F( ZSTD_resetCStream_internal( { CHECK_F( ZSTD_resetCStream_internal(
@ -3201,19 +3104,23 @@ size_t ZSTD_compress_generic (ZSTD_CCtx* cctx,
prefixDict.dictMode, cctx->cdict, params, prefixDict.dictMode, cctx->cdict, params,
cctx->pledgedSrcSizePlusOne-1) ); cctx->pledgedSrcSizePlusOne-1) );
assert(cctx->streamStage == zcss_load); assert(cctx->streamStage == zcss_load);
assert(cctx->appliedParams.nbThreads <= 1); assert(cctx->appliedParams.nbWorkers == 0);
} } } }
/* compression stage */ /* compression stage */
#ifdef ZSTD_MULTITHREAD #ifdef ZSTD_MULTITHREAD
if ((cctx->appliedParams.nbThreads > 1) | (cctx->appliedParams.nonBlockingMode==1)) { if (cctx->appliedParams.nbWorkers > 0) {
size_t const flushMin = ZSTDMT_compressStream_generic(cctx->mtctx, output, input, endOp); if (cctx->cParamsChanged) {
ZSTDMT_updateCParams_whileCompressing(cctx->mtctx, cctx->requestedParams.compressionLevel, cctx->requestedParams.cParams);
cctx->cParamsChanged = 0;
}
{ size_t const flushMin = ZSTDMT_compressStream_generic(cctx->mtctx, output, input, endOp);
if ( ZSTD_isError(flushMin) if ( ZSTD_isError(flushMin)
|| (endOp == ZSTD_e_end && flushMin == 0) ) { /* compression completed */ || (endOp == ZSTD_e_end && flushMin == 0) ) { /* compression completed */
ZSTD_startNewCompression(cctx); ZSTD_startNewCompression(cctx);
} }
return flushMin; return flushMin;
} } }
#endif #endif
CHECK_F( ZSTD_compressStream_generic(cctx, output, input, endOp) ); CHECK_F( ZSTD_compressStream_generic(cctx, output, input, endOp) );
DEBUGLOG(5, "completed ZSTD_compress_generic"); DEBUGLOG(5, "completed ZSTD_compress_generic");

View File

@ -0,0 +1,106 @@
/*
* Copyright (c) 2018-present, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under both the BSD-style license (found in the
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
* in the COPYING file in the root directory of this source tree).
* You may select, at your option, one of the above-listed licenses.
*/
#ifndef FUNCTION
# error "FUNCTION(name) must be defined"
#endif
#ifndef TARGET
# error "TARGET must be defined"
#endif
MEM_STATIC TARGET
size_t FUNCTION(ZSTD_encodeSequences)(
void* dst, size_t dstCapacity,
FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable,
FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable,
FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable,
seqDef const* sequences, size_t nbSeq, int longOffsets)
{
BIT_CStream_t blockStream;
FSE_CState_t stateMatchLength;
FSE_CState_t stateOffsetBits;
FSE_CState_t stateLitLength;
CHECK_E(BIT_initCStream(&blockStream, dst, dstCapacity), dstSize_tooSmall); /* not enough space remaining */
/* first symbols */
FSE_initCState2(&stateMatchLength, CTable_MatchLength, mlCodeTable[nbSeq-1]);
FSE_initCState2(&stateOffsetBits, CTable_OffsetBits, ofCodeTable[nbSeq-1]);
FSE_initCState2(&stateLitLength, CTable_LitLength, llCodeTable[nbSeq-1]);
BIT_addBits(&blockStream, sequences[nbSeq-1].litLength, LL_bits[llCodeTable[nbSeq-1]]);
if (MEM_32bits()) BIT_flushBits(&blockStream);
BIT_addBits(&blockStream, sequences[nbSeq-1].matchLength, ML_bits[mlCodeTable[nbSeq-1]]);
if (MEM_32bits()) BIT_flushBits(&blockStream);
if (longOffsets) {
U32 const ofBits = ofCodeTable[nbSeq-1];
int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
if (extraBits) {
BIT_addBits(&blockStream, sequences[nbSeq-1].offset, extraBits);
BIT_flushBits(&blockStream);
}
BIT_addBits(&blockStream, sequences[nbSeq-1].offset >> extraBits,
ofBits - extraBits);
} else {
BIT_addBits(&blockStream, sequences[nbSeq-1].offset, ofCodeTable[nbSeq-1]);
}
BIT_flushBits(&blockStream);
{ size_t n;
for (n=nbSeq-2 ; n<nbSeq ; n--) { /* intentional underflow */
BYTE const llCode = llCodeTable[n];
BYTE const ofCode = ofCodeTable[n];
BYTE const mlCode = mlCodeTable[n];
U32 const llBits = LL_bits[llCode];
U32 const ofBits = ofCode;
U32 const mlBits = ML_bits[mlCode];
DEBUGLOG(6, "encoding: litlen:%2u - matchlen:%2u - offCode:%7u",
sequences[n].litLength,
sequences[n].matchLength + MINMATCH,
sequences[n].offset);
/* 32b*/ /* 64b*/
/* (7)*/ /* (7)*/
FSE_encodeSymbol(&blockStream, &stateOffsetBits, ofCode); /* 15 */ /* 15 */
FSE_encodeSymbol(&blockStream, &stateMatchLength, mlCode); /* 24 */ /* 24 */
if (MEM_32bits()) BIT_flushBits(&blockStream); /* (7)*/
FSE_encodeSymbol(&blockStream, &stateLitLength, llCode); /* 16 */ /* 33 */
if (MEM_32bits() || (ofBits+mlBits+llBits >= 64-7-(LLFSELog+MLFSELog+OffFSELog)))
BIT_flushBits(&blockStream); /* (7)*/
BIT_addBits(&blockStream, sequences[n].litLength, llBits);
if (MEM_32bits() && ((llBits+mlBits)>24)) BIT_flushBits(&blockStream);
BIT_addBits(&blockStream, sequences[n].matchLength, mlBits);
if (MEM_32bits() || (ofBits+mlBits+llBits > 56)) BIT_flushBits(&blockStream);
if (longOffsets) {
int const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN-1);
if (extraBits) {
BIT_addBits(&blockStream, sequences[n].offset, extraBits);
BIT_flushBits(&blockStream); /* (7)*/
}
BIT_addBits(&blockStream, sequences[n].offset >> extraBits,
ofBits - extraBits); /* 31 */
} else {
BIT_addBits(&blockStream, sequences[n].offset, ofBits); /* 31 */
}
BIT_flushBits(&blockStream); /* (7)*/
} }
DEBUGLOG(6, "ZSTD_encodeSequences: flushing ML state with %u bits", stateMatchLength.stateLog);
FSE_flushCState(&blockStream, &stateMatchLength);
DEBUGLOG(6, "ZSTD_encodeSequences: flushing Off state with %u bits", stateOffsetBits.stateLog);
FSE_flushCState(&blockStream, &stateOffsetBits);
DEBUGLOG(6, "ZSTD_encodeSequences: flushing LL state with %u bits", stateLitLength.stateLog);
FSE_flushCState(&blockStream, &stateLitLength);
{ size_t const streamSize = BIT_closeCStream(&blockStream);
if (streamSize==0) return ERROR(dstSize_tooSmall); /* not enough space */
return streamSize;
}
}
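
The /* 32b */ /* 64b */ columns annotate the worst-case bit count sitting in the accumulator at each step, which is what justifies the sparse flushing. As a worked check of the 64-bit threshold, assuming the usual table-log bounds LLFSELog == 9, MLFSELog == 9, OffFSELog == 8 (values from zstd_internal.h, outside this diff):

3 FSE state transitions cost at most LLFSELog + MLFSELog + OffFSELog = 9+9+8 = 26 bits
early-flush condition: ofBits + mlBits + llBits >= 64 - 7 - 26 = 31

i.e. the stream is flushed before adding the length/offset payloads only when those payload bits, plus the 26 state bits, plus the up-to-7 bits a previous flush may leave behind, could overflow the 64-bit container.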

View File

@ -30,8 +30,9 @@ extern "C" {
/*-************************************* /*-*************************************
* Constants * Constants
***************************************/ ***************************************/
static const U32 g_searchStrength = 8; #define kSearchStrength 8
#define HASH_READ_SIZE 8 #define HASH_READ_SIZE 8
#define ZSTD_CLEVEL_CUSTOM 999
#define ZSTD_DUBT_UNSORTED_MARK 1 /* For btlazy2 strategy, index 1 now means "unsorted". #define ZSTD_DUBT_UNSORTED_MARK 1 /* For btlazy2 strategy, index 1 now means "unsorted".
It could be confused for a real successor at index "1", if sorted as larger than its predecessor. It could be confused for a real successor at index "1", if sorted as larger than its predecessor.
It's not a big deal though : candidate will just be sorted again. It's not a big deal though : candidate will just be sorted again.
@ -109,10 +110,14 @@ typedef struct {
BYTE const* dictBase; /* extDict indexes relative to this position */ BYTE const* dictBase; /* extDict indexes relative to this position */
U32 dictLimit; /* below that point, need extDict */ U32 dictLimit; /* below that point, need extDict */
U32 lowLimit; /* below that point, no more data */ U32 lowLimit; /* below that point, no more data */
} ZSTD_window_t;
typedef struct {
ZSTD_window_t window; /* State for window round buffer management */
U32 loadedDictEnd; /* index of end of dictionary */
U32 nextToUpdate; /* index from which to continue table update */ U32 nextToUpdate; /* index from which to continue table update */
U32 nextToUpdate3; /* index from which to continue table update */ U32 nextToUpdate3; /* index from which to continue table update */
U32 hashLog3; /* dispatch table : larger == faster, more memory */ U32 hashLog3; /* dispatch table : larger == faster, more memory */
U32 loadedDictEnd; /* index of end of dictionary */
U32* hashTable; U32* hashTable;
U32* hashTable3; U32* hashTable3;
U32* chainTable; U32* chainTable;
@ -131,6 +136,7 @@ typedef struct {
} ldmEntry_t;

typedef struct {
+    ZSTD_window_t window;   /* State for the window round buffer management */
    ldmEntry_t* hashTable;
    BYTE* bucketOffsets;    /* Next position in bucket to insert entry */
    U64 hashPower;          /* Used to compute the rolling hash.
@ -143,6 +149,7 @@ typedef struct {
    U32 bucketSizeLog;      /* Log bucket size for collision resolution, at most 8 */
    U32 minMatchLength;     /* Minimum match length */
    U32 hashEveryLog;       /* Log number of entries to skip */
+    U32 windowLog;          /* Window log for the LDM */
} ldmParams_t;

struct ZSTD_CCtx_params_s {
@ -151,12 +158,11 @@ struct ZSTD_CCtx_params_s {
    ZSTD_frameParameters fParams;

    int compressionLevel;
-    U32 forceWindow;           /* force back-references to respect limit of
+    int forceWindow;           /* force back-references to respect limit of
                                * 1<<wLog, even for dictionary */

    /* Multithreading: used to pass parameters to mtctx */
-    U32 nbThreads;
-    int nonBlockingMode;       /* will trigger ZSTDMT even with nbThreads==1 */
+    unsigned nbWorkers;
    unsigned jobSize;
    unsigned overlapSizeLog;
@ -165,14 +171,15 @@ struct ZSTD_CCtx_params_s {
    /* For use with createCCtxParams() and freeCCtxParams() only */
    ZSTD_customMem customMem;
};  /* typedef'd to ZSTD_CCtx_params within "zstd.h" */

struct ZSTD_CCtx_s {
    ZSTD_compressionStage_e stage;
-    U32   dictID;
+    int cParamsChanged;    /* == 1 if cParams(except wlog) or compression level are changed in requestedParams. Triggers transmission of new params to ZSTDMT (if available) then reset to 0. */
+    int bmi2;              /* == 1 if the CPU supports BMI2 and 0 otherwise. CPU support is determined dynamically once per context lifetime. */
    ZSTD_CCtx_params requestedParams;
    ZSTD_CCtx_params appliedParams;
+    U32   dictID;
    void* workSpace;
    size_t workSpaceSize;
    size_t blockSize;
@ -185,6 +192,7 @@ struct ZSTD_CCtx_s {
    seqStore_t seqStore;    /* sequences storage ptrs */
    ldmState_t ldmState;    /* long distance matching state */
+    rawSeq* ldmSequences;   /* Storage for the ldm output sequences */
    ZSTD_blockState_t blockState;
    U32* entropyWorkspace;  /* entropy workspace of HUF_WORKSPACE_SIZE bytes */
@ -439,6 +447,159 @@ MEM_STATIC size_t ZSTD_hashPtr(const void* p, U32 hBits, U32 mls)
    }
}
/*-*************************************
* Round buffer management
***************************************/
#define ZSTD_LOWLIMIT_MAX (3U << 29) /* Max lowLimit allowed */
/* Maximum chunk size before overflow correction needs to be called again */
#define ZSTD_CHUNKSIZE_MAX \
( ((U32)-1) /* Maximum ending current index */ \
- (1U << ZSTD_WINDOWLOG_MAX) /* Max distance from lowLimit to current */ \
- ZSTD_LOWLIMIT_MAX) /* Maximum beginning lowLimit */
/**
* ZSTD_window_clear():
* Clears the window containing the history by simply setting it to empty.
*/
MEM_STATIC void ZSTD_window_clear(ZSTD_window_t* window)
{
size_t const endT = (size_t)(window->nextSrc - window->base);
U32 const end = (U32)endT;
window->lowLimit = end;
window->dictLimit = end;
}
/**
* ZSTD_window_hasExtDict():
* Returns non-zero if the window has a non-empty extDict.
*/
MEM_STATIC U32 ZSTD_window_hasExtDict(ZSTD_window_t const window)
{
return window.lowLimit < window.dictLimit;
}
/**
* ZSTD_window_needOverflowCorrection():
* Returns non-zero if the indices are getting too large and need overflow
* protection.
*/
MEM_STATIC U32 ZSTD_window_needOverflowCorrection(ZSTD_window_t const window)
{
return window.lowLimit > ZSTD_LOWLIMIT_MAX;
}
/**
* ZSTD_window_correctOverflow():
* Reduces the indices to protect from index overflow.
* Returns the correction made to the indices, which must be applied to every
* stored index.
*
* The least significant cycleLog bits of the indices must remain the same,
* which may be 0. Every index up to maxDist in the past must be valid.
* NOTE: (maxDist & cycleMask) must be zero.
*/
MEM_STATIC U32 ZSTD_window_correctOverflow(ZSTD_window_t* window, U32 cycleLog,
U32 maxDist, void const* src)
{
/* preemptive overflow correction:
* 1. correction is large enough:
* lowLimit > (3<<29) ==> current > 3<<29 + 1<<windowLog
* 1<<windowLog <= newCurrent < 1<<chainLog + 1<<windowLog
*
* current - newCurrent
* > (3<<29 + 1<<windowLog) - (1<<windowLog + 1<<chainLog)
* > (3<<29) - (1<<chainLog)
* > (3<<29) - (1<<30) (NOTE: chainLog <= 30)
* > 1<<29
*
* 2. (ip+ZSTD_CHUNKSIZE_MAX - cctx->base) doesn't overflow:
* After correction, current is less than (1<<chainLog + 1<<windowLog).
* In 64-bit mode we are safe, because we have 64-bit ptrdiff_t.
* In 32-bit mode we are safe, because (chainLog <= 29), so
* ip+ZSTD_CHUNKSIZE_MAX - cctx->base < 1<<32.
* 3. (cctx->lowLimit + 1<<windowLog) < 1<<32:
* windowLog <= 31 ==> 3<<29 + 1<<windowLog < 7<<29 < 1<<32.
*/
U32 const cycleMask = (1U << cycleLog) - 1;
U32 const current = (U32)((BYTE const*)src - window->base);
U32 const newCurrent = (current & cycleMask) + maxDist;
U32 const correction = current - newCurrent;
assert((maxDist & cycleMask) == 0);
assert(current > newCurrent);
/* Loose bound, should be around 1<<29 (see above) */
assert(correction > 1<<28);
window->base += correction;
window->dictBase += correction;
window->lowLimit -= correction;
window->dictLimit -= correction;
DEBUGLOG(4, "Correction of 0x%x bytes to lowLimit=0x%x", correction,
window->lowLimit);
return correction;
}
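
A standalone check of the correction arithmetic above. The cycleLog, maxDist and current values are made up for illustration; only the invariants mirror the function:

    /* Standalone sketch: newCurrent keeps the low cycleLog bits of current
     * (so cyclic chain/binary-tree tables stay valid) while all indices are
     * pulled down by `correction`. */
    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned const cycleLog   = 17;              /* e.g. a chainLog for a cyclic table */
        unsigned const maxDist    = 1u << 27;        /* 1 << windowLog */
        unsigned const cycleMask  = (1u << cycleLog) - 1;
        unsigned const current    = 0xC0001234u;     /* index past ZSTD_LOWLIMIT_MAX */
        unsigned const newCurrent = (current & cycleMask) + maxDist;
        unsigned const correction = current - newCurrent;
        assert((maxDist & cycleMask) == 0);                          /* stated precondition */
        assert((newCurrent & cycleMask) == (current & cycleMask));   /* low bits preserved */
        assert(newCurrent >= maxDist);                /* the full window stays addressable */
        assert(correction > (1u << 28));              /* the loose bound from the code */
        printf("correction = 0x%x\n", correction);
        return 0;
    }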
/**
* ZSTD_window_enforceMaxDist():
* Sets lowLimit such that indices earlier than (srcEnd - base) - lowLimit are
* invalid. This allows a simple check index >= lowLimit to see if it is valid.
* Source pointers past srcEnd are not guaranteed to be valid.
*/
MEM_STATIC void ZSTD_window_enforceMaxDist(ZSTD_window_t* window,
void const* srcEnd, U32 maxDist)
{
U32 const current = (U32)((BYTE const*)srcEnd - window->base);
if (current > maxDist) {
U32 const newLowLimit = current - maxDist;
if (window->lowLimit < newLowLimit) window->lowLimit = newLowLimit;
if (window->dictLimit < window->lowLimit) {
DEBUGLOG(5, "Update dictLimit from %u to %u", window->dictLimit,
window->lowLimit);
window->dictLimit = window->lowLimit;
}
}
}
/**
* ZSTD_window_update():
* Updates the window by appending [src, src + srcSize) to the window.
* If it is not contiguous, the current prefix becomes the extDict, and we
* forget about the extDict. Handles overlap of the prefix and extDict.
* Returns non-zero if the segment is contiguous.
*/
MEM_STATIC U32 ZSTD_window_update(ZSTD_window_t* window,
void const* src, size_t srcSize)
{
BYTE const* const ip = (BYTE const*)src;
U32 contiguous = 1;
/* Check if blocks follow each other */
if (src != window->nextSrc) {
/* not contiguous */
size_t const distanceFromBase = (size_t)(window->nextSrc - window->base);
DEBUGLOG(5, "Non contiguous blocks, new segment starts at %u",
window->dictLimit);
window->lowLimit = window->dictLimit;
assert(distanceFromBase == (size_t)(U32)distanceFromBase); /* should never overflow */
window->dictLimit = (U32)distanceFromBase;
window->dictBase = window->base;
window->base = ip - distanceFromBase;
// ms->nextToUpdate = window->dictLimit;
if (window->dictLimit - window->lowLimit < HASH_READ_SIZE) window->lowLimit = window->dictLimit; /* too small extDict */
contiguous = 0;
}
window->nextSrc = ip + srcSize;
/* if input and dictionary overlap : reduce dictionary (area presumed modified by input) */
if ( (ip+srcSize > window->dictBase + window->lowLimit)
& (ip < window->dictBase + window->dictLimit)) {
ptrdiff_t const highInputIdx = (ip + srcSize) - window->dictBase;
U32 const lowLimitMax = (highInputIdx > (ptrdiff_t)window->dictLimit) ? window->dictLimit : (U32)highInputIdx;
window->lowLimit = lowLimitMax;
}
return contiguous;
}
#if defined (__cplusplus)
}
#endif


@ -21,7 +21,7 @@ void ZSTD_fillDoubleHashTable(ZSTD_matchState_t* ms,
    U32 const mls = cParams->searchLength;
    U32* const hashSmall = ms->chainTable;
    U32 const hBitsS = cParams->chainLog;
-    const BYTE* const base = ms->base;
+    const BYTE* const base = ms->window.base;
    const BYTE* ip = base + ms->nextToUpdate;
    const BYTE* const iend = ((const BYTE*)end) - HASH_READ_SIZE;
    const U32 fastHashFillStep = 3;
@ -55,11 +55,11 @@ size_t ZSTD_compressBlock_doubleFast_generic(
    const U32 hBitsL = cParams->hashLog;
    U32* const hashSmall = ms->chainTable;
    const U32 hBitsS = cParams->chainLog;
-    const BYTE* const base = ms->base;
+    const BYTE* const base = ms->window.base;
    const BYTE* const istart = (const BYTE*)src;
    const BYTE* ip = istart;
    const BYTE* anchor = istart;
-    const U32 lowestIndex = ms->dictLimit;
+    const U32 lowestIndex = ms->window.dictLimit;
    const BYTE* const lowest = base + lowestIndex;
    const BYTE* const iend = istart + srcSize;
    const BYTE* const ilimit = iend - HASH_READ_SIZE;
@ -113,7 +113,7 @@ size_t ZSTD_compressBlock_doubleFast_generic(
                while (((ip>anchor) & (match>lowest)) && (ip[-1] == match[-1])) { ip--; match--; mLength++; } /* catch up */
            }
        } else {
-            ip += ((ip-anchor) >> g_searchStrength) + 1;
+            ip += ((ip-anchor) >> kSearchStrength) + 1;
            continue;
        }
@ -187,14 +187,14 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_generic(
    U32 const hBitsL = cParams->hashLog;
    U32* const hashSmall = ms->chainTable;
    U32 const hBitsS = cParams->chainLog;
-    const BYTE* const base = ms->base;
-    const BYTE* const dictBase = ms->dictBase;
+    const BYTE* const base = ms->window.base;
+    const BYTE* const dictBase = ms->window.dictBase;
    const BYTE* const istart = (const BYTE*)src;
    const BYTE* ip = istart;
    const BYTE* anchor = istart;
-    const U32 lowestIndex = ms->lowLimit;
+    const U32 lowestIndex = ms->window.lowLimit;
    const BYTE* const dictStart = dictBase + lowestIndex;
-    const U32 dictLimit = ms->dictLimit;
+    const U32 dictLimit = ms->window.dictLimit;
    const BYTE* const lowPrefixPtr = base + dictLimit;
    const BYTE* const dictEnd = dictBase + dictLimit;
    const BYTE* const iend = istart + srcSize;
@ -264,7 +264,7 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_generic(
            ZSTD_storeSeq(seqStore, ip-anchor, anchor, offset + ZSTD_REP_MOVE, mLength-MINMATCH);
        } else {
-            ip += ((ip-anchor) >> g_searchStrength) + 1;
+            ip += ((ip-anchor) >> kSearchStrength) + 1;
            continue;
        }   }
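
The kSearchStrength heuristic renamed above grows the skip step by one for every 2^kSearchStrength bytes scanned since the last match, so the search accelerates through incompressible regions instead of probing every position. A standalone sketch of that arithmetic:

    #include <stdio.h>

    #define kSearchStrength 8

    int main(void)
    {
        size_t ip = 0, anchor = 0, steps = 0;
        while (ip < ((size_t)1 << 20)) {   /* pretend 1 MiB never matches */
            ip += ((ip - anchor) >> kSearchStrength) + 1;
            steps++;
        }
        printf("positions probed across 1 MiB of incompressible data: %zu\n", steps);
        return 0;
    }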


@ -19,7 +19,7 @@ void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
    U32* const hashTable = ms->hashTable;
    U32 const hBits = cParams->hashLog;
    U32 const mls = cParams->searchLength;
-    const BYTE* const base = ms->base;
+    const BYTE* const base = ms->window.base;
    const BYTE* ip = base + ms->nextToUpdate;
    const BYTE* const iend = ((const BYTE*)end) - HASH_READ_SIZE;
    const U32 fastHashFillStep = 3;
@ -45,11 +45,11 @@ size_t ZSTD_compressBlock_fast_generic(
        U32 const hlog, U32 const mls)
{
    U32* const hashTable = ms->hashTable;
-    const BYTE* const base = ms->base;
+    const BYTE* const base = ms->window.base;
    const BYTE* const istart = (const BYTE*)src;
    const BYTE* ip = istart;
    const BYTE* anchor = istart;
-    const U32 lowestIndex = ms->dictLimit;
+    const U32 lowestIndex = ms->window.dictLimit;
    const BYTE* const lowest = base + lowestIndex;
    const BYTE* const iend = istart + srcSize;
    const BYTE* const ilimit = iend - HASH_READ_SIZE;
@ -79,7 +79,7 @@ size_t ZSTD_compressBlock_fast_generic(
        } else {
            U32 offset;
            if ( (matchIndex <= lowestIndex) || (MEM_read32(match) != MEM_read32(ip)) ) {
-                ip += ((ip-anchor) >> g_searchStrength) + 1;
+                ip += ((ip-anchor) >> kSearchStrength) + 1;
                continue;
            }
            mLength = ZSTD_count(ip+4, match+4, iend) + 4;
@ -149,14 +149,14 @@ static size_t ZSTD_compressBlock_fast_extDict_generic(
        U32 const hlog, U32 const mls)
{
    U32* hashTable = ms->hashTable;
-    const BYTE* const base = ms->base;
-    const BYTE* const dictBase = ms->dictBase;
+    const BYTE* const base = ms->window.base;
+    const BYTE* const dictBase = ms->window.dictBase;
    const BYTE* const istart = (const BYTE*)src;
    const BYTE* ip = istart;
    const BYTE* anchor = istart;
-    const U32 lowestIndex = ms->lowLimit;
+    const U32 lowestIndex = ms->window.lowLimit;
    const BYTE* const dictStart = dictBase + lowestIndex;
-    const U32 dictLimit = ms->dictLimit;
+    const U32 dictLimit = ms->window.dictLimit;
    const BYTE* const lowPrefixPtr = base + dictLimit;
    const BYTE* const dictEnd = dictBase + dictLimit;
    const BYTE* const iend = istart + srcSize;
@ -185,7 +185,7 @@ static size_t ZSTD_compressBlock_fast_extDict_generic(
        } else {
            if ( (matchIndex < lowestIndex) ||
                 (MEM_read32(match) != MEM_read32(ip)) ) {
-                ip += ((ip-anchor) >> g_searchStrength) + 1;
+                ip += ((ip-anchor) >> kSearchStrength) + 1;
                continue;
            }
            { const BYTE* matchEnd = matchIndex < dictLimit ? dictEnd : iend;
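
In the extDict path above, a match that starts in the dictionary segment may continue into the current prefix, so it is counted up to its segment end first and only continues if the whole dictionary tail matched. This mirrors the idea behind ZSTD_count_2segments(); the helper below is a local stand-in, not the library function:

    #include <stdio.h>

    static size_t count(const unsigned char* a, const unsigned char* b,
                        const unsigned char* aEnd)
    {
        size_t n = 0;
        while (a + n < aEnd && a[n] == b[n]) n++;
        return n;
    }

    int main(void)
    {
        const unsigned char dict[]   = "rolling";     /* pretend extDict segment */
        const unsigned char prefix[] = "rolling";     /* current prefix, same start */
        const unsigned char input[]  = "rollingroll";
        /* Count within the dictionary segment first... */
        size_t m = count(input, dict, input + 7);
        /* ...and continue into the prefix only if the segment end was reached. */
        if (m == 7) m += count(input + 7, prefix, input + sizeof(input) - 1);
        printf("total match length across segments: %zu\n", m);  /* 7 + 4 = 11 */
        return 0;
    }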


@ -28,17 +28,17 @@ void ZSTD_updateDUBT(
    U32 const btLog  = cParams->chainLog - 1;
    U32 const btMask = (1 << btLog) - 1;

-    const BYTE* const base = ms->base;
+    const BYTE* const base = ms->window.base;
    U32 const target = (U32)(ip - base);
    U32 idx = ms->nextToUpdate;

    if (idx != target)
        DEBUGLOG(7, "ZSTD_updateDUBT, from %u to %u (dictLimit:%u)",
-                    idx, target, ms->dictLimit);
+                    idx, target, ms->window.dictLimit);
    assert(ip + 8 <= iend);   /* condition for ZSTD_hashPtr */
    (void)iend;

-    assert(idx >= ms->dictLimit);   /* condition for valid base+idx */
+    assert(idx >= ms->window.dictLimit);   /* condition for valid base+idx */
    for ( ; idx < target ; idx++) {
        size_t const h = ZSTD_hashPtr(base + idx, hashLog, mls);   /* assumption : ip + 8 <= iend */
        U32 const matchIndex = hashTable[h];
@ -68,9 +68,9 @@ static void ZSTD_insertDUBT1(
    U32 const btLog  = cParams->chainLog - 1;
    U32 const btMask = (1 << btLog) - 1;
    size_t commonLengthSmaller=0, commonLengthLarger=0;
-    const BYTE* const base = ms->base;
-    const BYTE* const dictBase = ms->dictBase;
-    const U32 dictLimit = ms->dictLimit;
+    const BYTE* const base = ms->window.base;
+    const BYTE* const dictBase = ms->window.dictBase;
+    const U32 dictLimit = ms->window.dictLimit;
    const BYTE* const ip = (current>=dictLimit) ? base + current : dictBase + current;
    const BYTE* const iend = (current>=dictLimit) ? inputEnd : dictBase + dictLimit;
    const BYTE* const dictEnd = dictBase + dictLimit;
@ -80,7 +80,7 @@ static void ZSTD_insertDUBT1(
    U32* largerPtr  = smallerPtr + 1;
    U32 matchIndex = *smallerPtr;
    U32 dummy32;   /* to be nullified at the end */
-    U32 const windowLow = ms->lowLimit;
+    U32 const windowLow = ms->window.lowLimit;

    DEBUGLOG(8, "ZSTD_insertDUBT1(%u) (dictLimit=%u, lowLimit=%u)",
                current, dictLimit, windowLow);
@ -150,9 +150,9 @@ static size_t ZSTD_DUBT_findBestMatch (
    size_t const h = ZSTD_hashPtr(ip, hashLog, mls);
    U32 matchIndex = hashTable[h];

-    const BYTE* const base = ms->base;
+    const BYTE* const base = ms->window.base;
    U32 const current = (U32)(ip-base);
-    U32 const windowLow = ms->lowLimit;
+    U32 const windowLow = ms->window.lowLimit;

    U32* const bt = ms->chainTable;
    U32 const btLog  = cParams->chainLog - 1;
@ -203,8 +203,8 @@ static size_t ZSTD_DUBT_findBestMatch (

    /* find longest match */
    {   size_t commonLengthSmaller=0, commonLengthLarger=0;
-        const BYTE* const dictBase = ms->dictBase;
-        const U32 dictLimit = ms->dictLimit;
+        const BYTE* const dictBase = ms->window.dictBase;
+        const U32 dictLimit = ms->window.dictLimit;
        const BYTE* const dictEnd = dictBase + dictLimit;
        const BYTE* const prefixStart = base + dictLimit;
        U32* smallerPtr = bt + 2*(current&btMask);
@ -279,7 +279,7 @@ static size_t ZSTD_BtFindBestMatch (
                const U32 mls /* template */)
{
    DEBUGLOG(7, "ZSTD_BtFindBestMatch");
-    if (ip < ms->base + ms->nextToUpdate) return 0;   /* skipped area */
+    if (ip < ms->window.base + ms->nextToUpdate) return 0;   /* skipped area */
    ZSTD_updateDUBT(ms, cParams, ip, iLimit, mls);
    return ZSTD_DUBT_findBestMatch(ms, cParams, ip, iLimit, offsetPtr, mls, 0);
}
@ -309,7 +309,7 @@ static size_t ZSTD_BtFindBestMatch_extDict (
                const U32 mls)
{
    DEBUGLOG(7, "ZSTD_BtFindBestMatch_extDict");
-    if (ip < ms->base + ms->nextToUpdate) return 0;   /* skipped area */
+    if (ip < ms->window.base + ms->nextToUpdate) return 0;   /* skipped area */
    ZSTD_updateDUBT(ms, cParams, ip, iLimit, mls);
    return ZSTD_DUBT_findBestMatch(ms, cParams, ip, iLimit, offsetPtr, mls, 1);
}
@ -347,7 +347,7 @@ static U32 ZSTD_insertAndFindFirstIndex_internal(
    const U32 hashLog = cParams->hashLog;
    U32* const chainTable = ms->chainTable;
    const U32 chainMask = (1 << cParams->chainLog) - 1;
-    const BYTE* const base = ms->base;
+    const BYTE* const base = ms->window.base;
    const U32 target = (U32)(ip - base);
    U32 idx = ms->nextToUpdate;
@ -381,12 +381,12 @@ size_t ZSTD_HcFindBestMatch_generic (
    U32* const chainTable = ms->chainTable;
    const U32 chainSize = (1 << cParams->chainLog);
    const U32 chainMask = chainSize-1;
-    const BYTE* const base = ms->base;
-    const BYTE* const dictBase = ms->dictBase;
-    const U32 dictLimit = ms->dictLimit;
+    const BYTE* const base = ms->window.base;
+    const BYTE* const dictBase = ms->window.dictBase;
+    const U32 dictLimit = ms->window.dictLimit;
    const BYTE* const prefixStart = base + dictLimit;
    const BYTE* const dictEnd = dictBase + dictLimit;
-    const U32 lowLimit = ms->lowLimit;
+    const U32 lowLimit = ms->window.lowLimit;
    const U32 current = (U32)(ip-base);
    const U32 minChain = current > chainSize ? current - chainSize : 0;
    U32 nbAttempts = 1U << cParams->searchLog;
@ -471,7 +471,7 @@ size_t ZSTD_compressBlock_lazy_generic(
    const BYTE* anchor = istart;
    const BYTE* const iend = istart + srcSize;
    const BYTE* const ilimit = iend - 8;
-    const BYTE* const base = ms->base + ms->dictLimit;
+    const BYTE* const base = ms->window.base + ms->window.dictLimit;

    typedef size_t (*searchMax_f)(
                        ZSTD_matchState_t* ms, ZSTD_compressionParameters const* cParams,
@ -508,7 +508,7 @@ size_t ZSTD_compressBlock_lazy_generic(
        }

        if (matchLength < 4) {
-            ip += ((ip-anchor) >> g_searchStrength) + 1;   /* jump faster over incompressible sections */
+            ip += ((ip-anchor) >> kSearchStrength) + 1;   /* jump faster over incompressible sections */
            continue;
        }
@ -635,13 +635,13 @@ size_t ZSTD_compressBlock_lazy_extDict_generic(
    const BYTE* anchor = istart;
    const BYTE* const iend = istart + srcSize;
    const BYTE* const ilimit = iend - 8;
-    const BYTE* const base = ms->base;
-    const U32 dictLimit = ms->dictLimit;
-    const U32 lowestIndex = ms->lowLimit;
+    const BYTE* const base = ms->window.base;
+    const U32 dictLimit = ms->window.dictLimit;
+    const U32 lowestIndex = ms->window.lowLimit;
    const BYTE* const prefixStart = base + dictLimit;
-    const BYTE* const dictBase = ms->dictBase;
+    const BYTE* const dictBase = ms->window.dictBase;
    const BYTE* const dictEnd = dictBase + dictLimit;
-    const BYTE* const dictStart = dictBase + ms->lowLimit;
+    const BYTE* const dictStart = dictBase + lowestIndex;

    typedef size_t (*searchMax_f)(
                        ZSTD_matchState_t* ms, ZSTD_compressionParameters const* cParams,
@ -681,7 +681,7 @@ size_t ZSTD_compressBlock_lazy_extDict_generic(
        }

        if (matchLength < 4) {
-            ip += ((ip-anchor) >> g_searchStrength) + 1;   /* jump faster over incompressible sections */
+            ip += ((ip-anchor) >> kSearchStrength) + 1;   /* jump faster over incompressible sections */
            continue;
        }


@ -28,8 +28,17 @@ size_t ZSTD_ldm_initializeParameters(ldmParams_t* params, U32 enableLdm)
    return 0;
}

-void ZSTD_ldm_adjustParameters(ldmParams_t* params, U32 windowLog)
+void ZSTD_ldm_adjustParameters(ldmParams_t* params,
+                               ZSTD_compressionParameters const* cParams)
{
+    U32 const windowLog = cParams->windowLog;
+    if (cParams->strategy >= ZSTD_btopt) {
+      /* Get out of the way of the optimal parser */
+      U32 const minMatch = MAX(cParams->targetLength, params->minMatchLength);
+      assert(minMatch >= ZSTD_LDM_MINMATCH_MIN);
+      assert(minMatch <= ZSTD_LDM_MINMATCH_MAX);
+      params->minMatchLength = minMatch;
+    }
    if (params->hashLog == 0) {
        params->hashLog = MAX(ZSTD_HASHLOG_MIN, windowLog - LDM_HASH_RLOG);
        assert(params->hashLog <= ZSTD_HASHLOG_MAX);
@ -41,12 +50,19 @@ void ZSTD_ldm_adjustParameters(ldmParams_t* params,
    params->bucketSizeLog = MIN(params->bucketSizeLog, params->hashLog);
}

-size_t ZSTD_ldm_getTableSize(U32 hashLog, U32 bucketSizeLog) {
-    size_t const ldmHSize = ((size_t)1) << hashLog;
-    size_t const ldmBucketSizeLog = MIN(bucketSizeLog, hashLog);
-    size_t const ldmBucketSize =
-        ((size_t)1) << (hashLog - ldmBucketSizeLog);
-    return ldmBucketSize + (ldmHSize * (sizeof(ldmEntry_t)));
-}
+size_t ZSTD_ldm_getTableSize(ldmParams_t params)
+{
+    size_t const ldmHSize = ((size_t)1) << params.hashLog;
+    size_t const ldmBucketSizeLog = MIN(params.bucketSizeLog, params.hashLog);
+    size_t const ldmBucketSize =
+        ((size_t)1) << (params.hashLog - ldmBucketSizeLog);
+    size_t const totalSize = ldmBucketSize + ldmHSize * sizeof(ldmEntry_t);
+    return params.enableLdm ? totalSize : 0;
+}
+
+size_t ZSTD_ldm_getMaxNbSeq(ldmParams_t params, size_t maxChunkSize)
+{
+    return params.enableLdm ? (maxChunkSize / params.minMatchLength) : 0;
+}
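
Rough use of the two sizing helpers above, assuming hashLog=20, bucketSizeLog=3, minMatchLength=64, a 1 MB chunk size, and an 8-byte ldmEntry_t (a U32 offset plus a U32 checksum); the figures are illustrative only:

    #include <stdio.h>

    int main(void)
    {
        unsigned const hashLog = 20, bucketSizeLog = 3, minMatchLength = 64;
        size_t const ldmHSize      = (size_t)1 << hashLog;
        size_t const ldmBucketSize = (size_t)1 << (hashLog - bucketSizeLog);
        size_t const entrySize     = 8;   /* assumed sizeof(ldmEntry_t) */
        size_t const tableSize     = ldmBucketSize + ldmHSize * entrySize;
        size_t const maxNbSeq      = ((size_t)1 << 20) / minMatchLength;
        printf("LDM table: %zu bytes, max sequences per 1 MB chunk: %zu\n",
               tableSize, maxNbSeq);
        return 0;
    }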
/** ZSTD_ldm_getSmallHash() :
@ -215,12 +231,12 @@ static size_t ZSTD_ldm_fillFastTables(ZSTD_matchState_t* ms,
    {
    case ZSTD_fast:
        ZSTD_fillHashTable(ms, cParams, iend);
-        ms->nextToUpdate = (U32)(iend - ms->base);
+        ms->nextToUpdate = (U32)(iend - ms->window.base);
        break;

    case ZSTD_dfast:
        ZSTD_fillDoubleHashTable(ms, cParams, iend);
-        ms->nextToUpdate = (U32)(iend - ms->base);
+        ms->nextToUpdate = (U32)(iend - ms->window.base);
        break;

    case ZSTD_greedy:
@ -271,57 +287,61 @@ static U64 ZSTD_ldm_fillLdmHashTable(ldmState_t* state,
 *  (after a long match, only update tables a limited amount). */
static void ZSTD_ldm_limitTableUpdate(ZSTD_matchState_t* ms, const BYTE* anchor)
{
-    U32 const current = (U32)(anchor - ms->base);
+    U32 const current = (U32)(anchor - ms->window.base);
    if (current > ms->nextToUpdate + 1024) {
        ms->nextToUpdate =
            current - MIN(512, current - ms->nextToUpdate - 1024);
    }
}
-size_t ZSTD_compressBlock_ldm(
-        ldmState_t* ldmState, ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
-        ZSTD_CCtx_params const* params, void const* src, size_t srcSize)
-{
-    ZSTD_compressionParameters const* cParams = &params->cParams;
-    const ldmParams_t ldmParams = params->ldmParams;
-    const U64 hashPower = ldmState->hashPower;
-    const U32 hBits = ldmParams.hashLog - ldmParams.bucketSizeLog;
-    const U32 ldmBucketSize = ((U32)1 << ldmParams.bucketSizeLog);
-    const U32 ldmTagMask = ((U32)1 << ldmParams.hashEveryLog) - 1;
-    const BYTE* const base = ms->base;
-    const BYTE* const istart = (const BYTE*)src;
-    const BYTE* ip = istart;
-    const BYTE* anchor = istart;
-    const U32 lowestIndex = ms->dictLimit;
-    const BYTE* const lowest = base + lowestIndex;
-    const BYTE* const iend = istart + srcSize;
-    const BYTE* const ilimit = iend - MAX(ldmParams.minMatchLength, HASH_READ_SIZE);
-
-    const ZSTD_blockCompressor blockCompressor =
-        ZSTD_selectBlockCompressor(cParams->strategy, 0);
-    U64 rollingHash = 0;
-    const BYTE* lastHashed = NULL;
-    size_t i, lastLiterals;
-
-    /* Main Search Loop */
-    while (ip < ilimit) {   /* < instead of <=, because repcode check at (ip+1) */
-        size_t mLength;
-        U32 const current = (U32)(ip - base);
-        size_t forwardMatchLength = 0, backwardMatchLength = 0;
-        ldmEntry_t* bestEntry = NULL;
-        if (ip != istart) {
-            rollingHash = ZSTD_ldm_updateHash(rollingHash, lastHashed[0],
-                                              lastHashed[ldmParams.minMatchLength],
-                                              hashPower);
-        } else {
-            rollingHash = ZSTD_ldm_getRollingHash(ip, ldmParams.minMatchLength);
-        }
-        lastHashed = ip;
-
-        /* Do not insert and do not look for a match */
-        if (ZSTD_ldm_getTag(rollingHash, hBits, ldmParams.hashEveryLog) !=
-                ldmTagMask) {
-            ip++;
-            continue;
-        }
+static size_t ZSTD_ldm_generateSequences_internal(
+        ldmState_t* ldmState, rawSeq* sequences,
+        ldmParams_t const* params, void const* src, size_t srcSize,
+        int const extDict)
+{
+    rawSeq const* const sequencesStart = sequences;
+    /* LDM parameters */
+    U32 const minMatchLength = params->minMatchLength;
+    U64 const hashPower = ldmState->hashPower;
+    U32 const hBits = params->hashLog - params->bucketSizeLog;
+    U32 const ldmBucketSize = 1U << params->bucketSizeLog;
+    U32 const hashEveryLog = params->hashEveryLog;
+    U32 const ldmTagMask = (1U << params->hashEveryLog) - 1;
+    /* Prefix and extDict parameters */
+    U32 const dictLimit = ldmState->window.dictLimit;
+    U32 const lowestIndex = extDict ? ldmState->window.lowLimit : dictLimit;
+    BYTE const* const base = ldmState->window.base;
+    BYTE const* const dictBase = extDict ? ldmState->window.dictBase : NULL;
+    BYTE const* const dictStart = extDict ? dictBase + lowestIndex : NULL;
+    BYTE const* const dictEnd = extDict ? dictBase + dictLimit : NULL;
+    BYTE const* const lowPrefixPtr = base + dictLimit;
+    /* Input bounds */
+    BYTE const* const istart = (BYTE const*)src;
+    BYTE const* const iend = istart + srcSize;
+    BYTE const* const ilimit = iend - MAX(minMatchLength, HASH_READ_SIZE);
+    /* Input positions */
+    BYTE const* anchor = istart;
+    BYTE const* ip = istart;
+    /* Rolling hash */
+    BYTE const* lastHashed = NULL;
+    U64 rollingHash = 0;
+
+    while (ip <= ilimit) {
+        size_t mLength;
+        U32 const current = (U32)(ip - base);
+        size_t forwardMatchLength = 0, backwardMatchLength = 0;
+        ldmEntry_t* bestEntry = NULL;
+        if (ip != istart) {
+            rollingHash = ZSTD_ldm_updateHash(rollingHash, lastHashed[0],
+                                              lastHashed[minMatchLength],
+                                              hashPower);
+        } else {
+            rollingHash = ZSTD_ldm_getRollingHash(ip, minMatchLength);
+        }
+        lastHashed = ip;
+
+        /* Do not insert and do not look for a match */
+        if (ZSTD_ldm_getTag(rollingHash, hBits, hashEveryLog) != ldmTagMask) {
+            ip++;
+            continue;
+        }
@ -331,27 +351,49 @@ static size_t ZSTD_ldm_generateSequences_internal(
        /* Get the best entry and compute the match lengths */
        {
            ldmEntry_t* const bucket =
                ZSTD_ldm_getBucket(ldmState,
                                   ZSTD_ldm_getSmallHash(rollingHash, hBits),
-                                  ldmParams);
+                                  *params);
            ldmEntry_t* cur;
            size_t bestMatchLength = 0;
            U32 const checksum = ZSTD_ldm_getChecksum(rollingHash, hBits);

            for (cur = bucket; cur < bucket + ldmBucketSize; ++cur) {
-                const BYTE* const pMatch = cur->offset + base;
                size_t curForwardMatchLength, curBackwardMatchLength,
                       curTotalMatchLength;
                if (cur->checksum != checksum || cur->offset <= lowestIndex) {
                    continue;
                }
-
-                curForwardMatchLength = ZSTD_count(ip, pMatch, iend);
-                if (curForwardMatchLength < ldmParams.minMatchLength) {
-                    continue;
-                }
-                curBackwardMatchLength = ZSTD_ldm_countBackwardsMatch(
-                                             ip, anchor, pMatch, lowest);
-                curTotalMatchLength = curForwardMatchLength +
-                                      curBackwardMatchLength;
+                if (extDict) {
+                    BYTE const* const curMatchBase =
+                        cur->offset < dictLimit ? dictBase : base;
+                    BYTE const* const pMatch = curMatchBase + cur->offset;
+                    BYTE const* const matchEnd =
+                        cur->offset < dictLimit ? dictEnd : iend;
+                    BYTE const* const lowMatchPtr =
+                        cur->offset < dictLimit ? dictStart : lowPrefixPtr;
+
+                    curForwardMatchLength = ZSTD_count_2segments(
+                                                ip, pMatch, iend,
+                                                matchEnd, lowPrefixPtr);
+                    if (curForwardMatchLength < minMatchLength) {
+                        continue;
+                    }
+                    curBackwardMatchLength =
+                        ZSTD_ldm_countBackwardsMatch(ip, anchor, pMatch,
+                                                     lowMatchPtr);
+                    curTotalMatchLength = curForwardMatchLength +
+                                          curBackwardMatchLength;
+                } else { /* !extDict */
+                    BYTE const* const pMatch = base + cur->offset;
+                    curForwardMatchLength = ZSTD_count(ip, pMatch, iend);
+                    if (curForwardMatchLength < minMatchLength) {
+                        continue;
+                    }
+                    curBackwardMatchLength =
+                        ZSTD_ldm_countBackwardsMatch(ip, anchor, pMatch,
+                                                     lowPrefixPtr);
+                    curTotalMatchLength = curForwardMatchLength +
+                                          curBackwardMatchLength;
+                }

                if (curTotalMatchLength > bestMatchLength) {
                    bestMatchLength = curTotalMatchLength;
@ -366,7 +408,7 @@ static size_t ZSTD_ldm_generateSequences_internal(
        if (bestEntry == NULL) {
            ZSTD_ldm_makeEntryAndInsertByTag(ldmState, rollingHash,
                                             hBits, current,
-                                             ldmParams);
+                                             *params);
            ip++;
            continue;
        }
@ -375,280 +417,203 @@
        mLength = forwardMatchLength + backwardMatchLength;
        ip -= backwardMatchLength;

-        /* Call the block compressor on the remaining literals */
-        {
-            U32 const matchIndex = bestEntry->offset;
-            const BYTE* const match = base + matchIndex - backwardMatchLength;
-            U32 const offset = (U32)(ip - match);
-
-            /* Fill tables for block compressor */
-            ZSTD_ldm_limitTableUpdate(ms, anchor);
-            ZSTD_ldm_fillFastTables(ms, cParams, anchor);
-
-            /* Call block compressor and get remaining literals */
-            lastLiterals = blockCompressor(ms, seqStore, rep, cParams, anchor, ip - anchor);
-            ms->nextToUpdate = (U32)(ip - base);
-
-            /* Update repToConfirm with the new offset */
-            for (i = ZSTD_REP_NUM - 1; i > 0; i--)
-                rep[i] = rep[i-1];
-            rep[0] = offset;
-
-            /* Store the sequence with the leftover literals */
-            ZSTD_storeSeq(seqStore, lastLiterals, ip - lastLiterals,
-                          offset + ZSTD_REP_MOVE, mLength - MINMATCH);
-        }
-
-        /* Insert the current entry into the hash table */
-        ZSTD_ldm_makeEntryAndInsertByTag(ldmState, rollingHash, hBits,
-                                         (U32)(lastHashed - base),
-                                         ldmParams);
-
-        assert(ip + backwardMatchLength == lastHashed);
-
-        /* Fill the hash table from lastHashed+1 to ip+mLength*/
-        /* Heuristic: don't need to fill the entire table at end of block */
-        if (ip + mLength < ilimit) {
-            rollingHash = ZSTD_ldm_fillLdmHashTable(
-                              ldmState, rollingHash, lastHashed,
-                              ip + mLength, base, hBits, ldmParams);
-            lastHashed = ip + mLength - 1;
-        }
-        ip += mLength;
-        anchor = ip;
-        /* Check immediate repcode */
-        while ( (ip < ilimit)
-             && ( (rep[1] > 0) && (rep[1] <= (U32)(ip-lowest))
-             && (MEM_read32(ip) == MEM_read32(ip - rep[1])) )) {
-
-            size_t const rLength = ZSTD_count(ip+4, ip+4-rep[1],
-                                              iend) + 4;
-            /* Swap repToConfirm[1] <=> repToConfirm[0] */
-            {
-                U32 const tmpOff = rep[1];
-                rep[1] = rep[0];
-                rep[0] = tmpOff;
-            }
-
-            ZSTD_storeSeq(seqStore, 0, anchor, 0, rLength-MINMATCH);
-
-            /* Fill the hash table from lastHashed+1 to ip+rLength*/
-            if (ip + rLength < ilimit) {
-                rollingHash = ZSTD_ldm_fillLdmHashTable(
-                                  ldmState, rollingHash, lastHashed,
-                                  ip + rLength, base, hBits, ldmParams);
-                lastHashed = ip + rLength - 1;
-            }
-            ip += rLength;
-            anchor = ip;
-        }
-    }
-
-    ZSTD_ldm_limitTableUpdate(ms, anchor);
-    ZSTD_ldm_fillFastTables(ms, cParams, anchor);
-
-    lastLiterals = blockCompressor(ms, seqStore, rep, cParams, anchor, iend - anchor);
-    ms->nextToUpdate = (U32)(iend - base);
-
-    /* Return the last literals size */
-    return lastLiterals;
-}
-
-size_t ZSTD_compressBlock_ldm_extDict(
-        ldmState_t* ldmState, ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
-        ZSTD_CCtx_params const* params, void const* src, size_t srcSize)
-{
-    const ldmParams_t ldmParams = params->ldmParams;
-    ZSTD_compressionParameters const* cParams = &params->cParams;
-    const U64 hashPower = ldmState->hashPower;
-    const U32 hBits = ldmParams.hashLog - ldmParams.bucketSizeLog;
-    const U32 ldmBucketSize = ((U32)1 << ldmParams.bucketSizeLog);
-    const U32 ldmTagMask = ((U32)1 << ldmParams.hashEveryLog) - 1;
-    const BYTE* const base = ms->base;
-    const BYTE* const dictBase = ms->dictBase;
-    const BYTE* const istart = (const BYTE*)src;
-    const BYTE* ip = istart;
-    const BYTE* anchor = istart;
-    const U32 lowestIndex = ms->lowLimit;
-    const BYTE* const dictStart = dictBase + lowestIndex;
-    const U32 dictLimit = ms->dictLimit;
-    const BYTE* const lowPrefixPtr = base + dictLimit;
-    const BYTE* const dictEnd = dictBase + dictLimit;
-    const BYTE* const iend = istart + srcSize;
-    const BYTE* const ilimit = iend - MAX(ldmParams.minMatchLength, HASH_READ_SIZE);
-
-    const ZSTD_blockCompressor blockCompressor =
-        ZSTD_selectBlockCompressor(cParams->strategy, 1);
-    U64 rollingHash = 0;
-    const BYTE* lastHashed = NULL;
-    size_t i, lastLiterals;
-
-    /* Search Loop */
-    while (ip < ilimit) {   /* < instead of <=, because (ip+1) */
-        size_t mLength;
-        const U32 current = (U32)(ip-base);
-        size_t forwardMatchLength = 0, backwardMatchLength = 0;
-        ldmEntry_t* bestEntry = NULL;
-        if (ip != istart) {
-            rollingHash = ZSTD_ldm_updateHash(rollingHash, lastHashed[0],
-                                              lastHashed[ldmParams.minMatchLength],
-                                              hashPower);
-        } else {
-            rollingHash = ZSTD_ldm_getRollingHash(ip, ldmParams.minMatchLength);
-        }
-        lastHashed = ip;
-
-        if (ZSTD_ldm_getTag(rollingHash, hBits, ldmParams.hashEveryLog) !=
-                ldmTagMask) {
-            /* Don't insert and don't look for a match */
-            ip++;
-            continue;
-        }
-
-        /* Get the best entry and compute the match lengths */
-        {
-            ldmEntry_t* const bucket =
-                ZSTD_ldm_getBucket(ldmState,
-                                   ZSTD_ldm_getSmallHash(rollingHash, hBits),
-                                   ldmParams);
-            ldmEntry_t* cur;
-            size_t bestMatchLength = 0;
-            U32 const checksum = ZSTD_ldm_getChecksum(rollingHash, hBits);
-
-            for (cur = bucket; cur < bucket + ldmBucketSize; ++cur) {
-                const BYTE* const curMatchBase =
-                    cur->offset < dictLimit ? dictBase : base;
-                const BYTE* const pMatch = curMatchBase + cur->offset;
-                const BYTE* const matchEnd =
-                    cur->offset < dictLimit ? dictEnd : iend;
-                const BYTE* const lowMatchPtr =
-                    cur->offset < dictLimit ? dictStart : lowPrefixPtr;
-                size_t curForwardMatchLength, curBackwardMatchLength,
-                       curTotalMatchLength;
-
-                if (cur->checksum != checksum || cur->offset <= lowestIndex) {
-                    continue;
-                }
-
-                curForwardMatchLength = ZSTD_count_2segments(
-                                            ip, pMatch, iend,
-                                            matchEnd, lowPrefixPtr);
-                if (curForwardMatchLength < ldmParams.minMatchLength) {
-                    continue;
-                }
-                curBackwardMatchLength = ZSTD_ldm_countBackwardsMatch(
-                                             ip, anchor, pMatch, lowMatchPtr);
-                curTotalMatchLength = curForwardMatchLength +
-                                      curBackwardMatchLength;
-
-                if (curTotalMatchLength > bestMatchLength) {
-                    bestMatchLength = curTotalMatchLength;
-                    forwardMatchLength = curForwardMatchLength;
-                    backwardMatchLength = curBackwardMatchLength;
-                    bestEntry = cur;
-                }
-            }
-        }
-
-        /* No match found -- continue searching */
-        if (bestEntry == NULL) {
-            ZSTD_ldm_makeEntryAndInsertByTag(ldmState, rollingHash, hBits,
-                                             (U32)(lastHashed - base),
-                                             ldmParams);
-            ip++;
-            continue;
-        }
-
-        /* Match found */
-        mLength = forwardMatchLength + backwardMatchLength;
-        ip -= backwardMatchLength;
-
-        /* Call the block compressor on the remaining literals */
-        {
-            /* ip = current - backwardMatchLength
-             * The match is at (bestEntry->offset - backwardMatchLength) */
-            U32 const matchIndex = bestEntry->offset;
-            U32 const offset = current - matchIndex;
-
-            /* Fill the hash table for the block compressor */
-            ZSTD_ldm_limitTableUpdate(ms, anchor);
-            ZSTD_ldm_fillFastTables(ms, cParams, anchor);
-
-            /* Call block compressor and get remaining literals */
-            lastLiterals = blockCompressor(ms, seqStore, rep, cParams, anchor, ip - anchor);
-            ms->nextToUpdate = (U32)(ip - base);
-
-            /* Update repToConfirm with the new offset */
-            for (i = ZSTD_REP_NUM - 1; i > 0; i--)
-                rep[i] = rep[i-1];
-            rep[0] = offset;
-
-            /* Store the sequence with the leftover literals */
-            ZSTD_storeSeq(seqStore, lastLiterals, ip - lastLiterals,
-                          offset + ZSTD_REP_MOVE, mLength - MINMATCH);
-        }
-
-        /* Insert the current entry into the hash table */
-        ZSTD_ldm_makeEntryAndInsertByTag(ldmState, rollingHash, hBits,
-                                         (U32)(lastHashed - base),
-                                         ldmParams);
-
-        /* Fill the hash table from lastHashed+1 to ip+mLength */
-        assert(ip + backwardMatchLength == lastHashed);
-        if (ip + mLength < ilimit) {
-            rollingHash = ZSTD_ldm_fillLdmHashTable(
-                              ldmState, rollingHash, lastHashed,
-                              ip + mLength, base, hBits,
-                              ldmParams);
-            lastHashed = ip + mLength - 1;
-        }
-        ip += mLength;
-        anchor = ip;
-
-        /* check immediate repcode */
-        while (ip < ilimit) {
-            U32 const current2 = (U32)(ip-base);
-            U32 const repIndex2 = current2 - rep[1];
-            const BYTE* repMatch2 = repIndex2 < dictLimit ?
-                                    dictBase + repIndex2 : base + repIndex2;
-            if ( (((U32)((dictLimit-1) - repIndex2) >= 3) &
-                        (repIndex2 > lowestIndex))  /* intentional overflow */
-               && (MEM_read32(repMatch2) == MEM_read32(ip)) ) {
-                const BYTE* const repEnd2 = repIndex2 < dictLimit ?
-                                            dictEnd : iend;
-                size_t const repLength2 =
-                    ZSTD_count_2segments(ip+4, repMatch2+4, iend,
-                                         repEnd2, lowPrefixPtr) + 4;
-                U32 tmpOffset = rep[1];
-                rep[1] = rep[0];
-                rep[0] = tmpOffset;
-                ZSTD_storeSeq(seqStore, 0, anchor, 0, repLength2-MINMATCH);
-                /* Fill the hash table from lastHashed+1 to ip+repLength2*/
-                if (ip + repLength2 < ilimit) {
-                    rollingHash = ZSTD_ldm_fillLdmHashTable(
-                                      ldmState, rollingHash, lastHashed,
-                                      ip + repLength2, base, hBits,
-                                      ldmParams);
-                    lastHashed = ip + repLength2 - 1;
-                }
-                ip += repLength2;
-                anchor = ip;
-                continue;
-            }
-            break;
-        }
-    }
-
-    ZSTD_ldm_limitTableUpdate(ms, anchor);
-    ZSTD_ldm_fillFastTables(ms, cParams, anchor);
-
-    /* Call the block compressor one last time on the last literals */
-    lastLiterals = blockCompressor(ms, seqStore, rep, cParams, anchor, iend - anchor);
-    ms->nextToUpdate = (U32)(iend - base);
-
-    /* Return the last literals size */
-    return lastLiterals;
-}
+        {
+            /* Store the sequence:
+             * ip = current - backwardMatchLength
+             * The match is at (bestEntry->offset - backwardMatchLength)
+             */
+            U32 const matchIndex = bestEntry->offset;
+            U32 const offset = current - matchIndex;
+
+            sequences->litLength = (U32)(ip - anchor);
+            sequences->matchLength = (U32)mLength;
+            sequences->offset = offset;
+            ++sequences;
+        }
+
+        /* Insert the current entry into the hash table */
+        ZSTD_ldm_makeEntryAndInsertByTag(ldmState, rollingHash, hBits,
+                                         (U32)(lastHashed - base),
+                                         *params);
+
+        assert(ip + backwardMatchLength == lastHashed);
+
+        /* Fill the hash table from lastHashed+1 to ip+mLength*/
+        /* Heuristic: don't need to fill the entire table at end of block */
+        if (ip + mLength <= ilimit) {
+            rollingHash = ZSTD_ldm_fillLdmHashTable(
+                              ldmState, rollingHash, lastHashed,
+                              ip + mLength, base, hBits, *params);
+            lastHashed = ip + mLength - 1;
+        }
+        ip += mLength;
+        anchor = ip;
+    }
+    /* Return the number of sequences generated */
+    return sequences - sequencesStart;
+}
+
+/*! ZSTD_ldm_reduceTable() :
+ *  reduce table indexes by `reducerValue` */
+static void ZSTD_ldm_reduceTable(ldmEntry_t* const table, U32 const size,
+                                 U32 const reducerValue)
+{
+    U32 u;
+    for (u = 0; u < size; u++) {
+        if (table[u].offset < reducerValue) table[u].offset = 0;
+        else table[u].offset -= reducerValue;
+    }
+}
+
+size_t ZSTD_ldm_generateSequences(
+        ldmState_t* ldmState, rawSeq* sequences,
+        ldmParams_t const* params, void const* src, size_t srcSize,
+        int const extDict)
+{
+    U32 const maxDist = 1U << params->windowLog;
+    BYTE const* const istart = (BYTE const*)src;
+    size_t const kMaxChunkSize = 1 << 20;
+    size_t const nbChunks = (srcSize / kMaxChunkSize) + ((srcSize % kMaxChunkSize) != 0);
+    size_t nbSeq = 0;
+    size_t chunk;
+
+    assert(ZSTD_CHUNKSIZE_MAX >= kMaxChunkSize);
+    /* Check that ZSTD_window_update() has been called for this chunk prior
+     * to passing it to this function.
+     */
+    assert(ldmState->window.nextSrc >= (BYTE const*)src + srcSize);
+    for (chunk = 0; chunk < nbChunks; ++chunk) {
+        size_t const chunkStart = chunk * kMaxChunkSize;
+        size_t const chunkEnd = MIN(chunkStart + kMaxChunkSize, srcSize);
+        size_t const chunkSize = chunkEnd - chunkStart;
+
+        assert(chunkStart < srcSize);
+        if (ZSTD_window_needOverflowCorrection(ldmState->window)) {
+            U32 const ldmHSize = 1U << params->hashLog;
+            U32 const correction = ZSTD_window_correctOverflow(
+                &ldmState->window, /* cycleLog */ 0, maxDist, src);
+            ZSTD_ldm_reduceTable(ldmState->hashTable, ldmHSize, correction);
+        }
+        /* kMaxChunkSize should be small enough that we don't lose too much of
+         * the window through early invalidation.
+         * TODO: * Test the chunk size.
+         *       * Try invalidation after the sequence generation and test
+         *         the offset against maxDist directly.
+         */
+        ZSTD_window_enforceMaxDist(&ldmState->window, istart + chunkEnd,
+                                   maxDist);
+        nbSeq += ZSTD_ldm_generateSequences_internal(
+            ldmState, sequences + nbSeq, params, istart + chunkStart, chunkSize,
+            extDict);
+    }
+    return nbSeq;
+}
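
The chunking arithmetic above, standalone: srcSize is cut into 1 MB chunks, and a trailing partial chunk counts as one more. The sample size is made up:

    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        size_t const kMaxChunkSize = 1 << 20;
        size_t const srcSize = ((size_t)5 << 20) + 1234;   /* 5 MB plus change */
        size_t const nbChunks = (srcSize / kMaxChunkSize)
                              + ((srcSize % kMaxChunkSize) != 0);
        assert(nbChunks == 6);   /* five full chunks plus the 1234-byte tail */
        printf("%zu chunks\n", nbChunks);
        return 0;
    }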
+#if 0
+/**
+ * If the sequence length is longer than remaining then the sequence is split
+ * between this block and the next.
+ *
+ * Returns the current sequence to handle, or if the rest of the block should
+ * be literals, it returns a sequence with offset == 0.
+ */
+static rawSeq maybeSplitSequence(rawSeq* sequences, size_t* nbSeq,
+                                 size_t const seq, size_t const remaining,
+                                 U32 const minMatch)
+{
+    rawSeq sequence = sequences[seq];
+    assert(sequence.offset > 0);
+    /* Handle partial sequences */
+    if (remaining <= sequence.litLength) {
+        /* Split the literals that we have out of the sequence.
+         * They will become the last literals of this block.
+         * The next block starts off with the remaining literals.
+         */
+        sequences[seq].litLength -= remaining;
+        *nbSeq = seq;
+        sequence.offset = 0;
+    } else if (remaining < sequence.litLength + sequence.matchLength) {
+        /* Split the match up into two sequences. One in this block, and one
+         * in the next with no literals. If either match would be shorter
+         * than searchLength we omit it.
+         */
+        U32 const matchPrefix = remaining - sequence.litLength;
+        U32 const matchSuffix = sequence.matchLength - matchPrefix;
+
+        assert(remaining > sequence.litLength);
+        assert(matchPrefix < sequence.matchLength);
+        assert(matchPrefix + matchSuffix == sequence.matchLength);
+        /* Update the current sequence */
+        sequence.matchLength = matchPrefix;
+        /* Update the next sequence when long enough, otherwise omit it. */
+        if (matchSuffix >= minMatch) {
+            sequences[seq].litLength = 0;
+            sequences[seq].matchLength = matchSuffix;
+            *nbSeq = seq;
+        } else {
+            sequences[seq + 1].litLength += matchSuffix;
+            *nbSeq = seq + 1;
+        }
+        if (sequence.matchLength < minMatch) {
+            /* Skip the current sequence if it is too short */
+            sequence.offset = 0;
+        }
+    }
+    return sequence;
+}
+#endif
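
A worked instance of the splitting rule described in maybeSplitSequence() above, with made-up numbers: a sequence of 30 literals plus a 100-byte match crossing a block boundary with 80 bytes remaining:

    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned const litLength = 30, matchLength = 100;
        unsigned const remaining = 80, minMatch = 64;
        unsigned const matchPrefix = remaining - litLength;     /* 50 stays in this block */
        unsigned const matchSuffix = matchLength - matchPrefix; /* 50 would spill over */
        assert(remaining > litLength);
        assert(remaining < litLength + matchLength);
        assert(matchPrefix + matchSuffix == matchLength);
        /* The spilled part survives only if it is still a valid match length;
         * otherwise its bytes are re-emitted as literals in the next block. */
        printf("this block: %u lits + %u match; next block: %s\n",
               litLength, matchPrefix,
               matchSuffix >= minMatch ? "a 0-literal match" : "literals only");
        return 0;
    }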
+size_t ZSTD_ldm_blockCompress(rawSeq const* sequences, size_t nbSeq,
+    ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+    ZSTD_compressionParameters const* cParams, void const* src, size_t srcSize,
+    int const extDict)
+{
+    ZSTD_blockCompressor const blockCompressor =
+        ZSTD_selectBlockCompressor(cParams->strategy, extDict);
+    BYTE const* const base = ms->window.base;
+    /* Input bounds */
+    BYTE const* const istart = (BYTE const*)src;
+    BYTE const* const iend = istart + srcSize;
+    /* Input positions */
+    BYTE const* ip = istart;
+    size_t seq;
+    /* Loop through each sequence and apply the block compressor to the lits */
+    for (seq = 0; seq < nbSeq; ++seq) {
+        rawSeq const sequence = sequences[seq];
+        int i;
+
+        if (sequence.offset == 0)
+            break;
+
+        assert(ip + sequence.litLength + sequence.matchLength <= iend);
+
+        /* Fill tables for block compressor */
+        ZSTD_ldm_limitTableUpdate(ms, ip);
+        ZSTD_ldm_fillFastTables(ms, cParams, ip);
+        /* Run the block compressor */
+        {
+            size_t const newLitLength =
+                blockCompressor(ms, seqStore, rep, cParams, ip,
+                                sequence.litLength);
+            ip += sequence.litLength;
+            ms->nextToUpdate = (U32)(ip - base);
+            /* Update the repcodes */
+            for (i = ZSTD_REP_NUM - 1; i > 0; i--)
+                rep[i] = rep[i-1];
+            rep[0] = sequence.offset;
+            /* Store the sequence */
+            ZSTD_storeSeq(seqStore, newLitLength, ip - newLitLength,
+                          sequence.offset + ZSTD_REP_MOVE,
+                          sequence.matchLength - MINMATCH);
+            ip += sequence.matchLength;
+        }
+    }
+    ZSTD_ldm_limitTableUpdate(ms, ip);
+    ZSTD_ldm_fillFastTables(ms, cParams, ip);
+    {
+        size_t const lastLiterals = blockCompressor(ms, seqStore, rep, cParams,
+                                                    ip, iend - ip);
+        ms->nextToUpdate = (U32)(iend - base);
+        return lastLiterals;
+    }
+}
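
The repcode update performed after each LDM sequence above, standalone (ZSTD_REP_NUM is 3 in zstd): the newest offset enters rep[0] and the older entries shift down one slot:

    #include <assert.h>
    #include <stdio.h>

    #define ZSTD_REP_NUM 3

    int main(void)
    {
        unsigned rep[ZSTD_REP_NUM] = { 8, 16, 24 };
        unsigned const offset = 1000;   /* offset of the stored LDM sequence */
        int i;
        for (i = ZSTD_REP_NUM - 1; i > 0; i--)
            rep[i] = rep[i-1];
        rep[0] = offset;
        assert(rep[0] == 1000 && rep[1] == 8 && rep[2] == 16);
        printf("rep = {%u, %u, %u}\n", rep[0], rep[1], rep[2]);
        return 0;
    }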


@ -24,34 +24,59 @@ extern "C" {
#define ZSTD_LDM_DEFAULT_WINDOW_LOG ZSTD_WINDOWLOG_DEFAULTMAX
#define ZSTD_LDM_HASHEVERYLOG_NOTSET 9999

-/** ZSTD_compressBlock_ldm_generic() :
+/**
+ * ZSTD_ldm_generateSequences():
 *
- *  This is a block compressor intended for long distance matching.
+ * Generates the sequences using the long distance match finder.
+ * The sequences completely parse a prefix of the source, but leave off the last
+ * literals. Returns the number of sequences generated into `sequences`. The
+ * user must have called ZSTD_window_update() for all of the input they have,
+ * even if they pass it to ZSTD_ldm_generateSequences() in chunks.
 *
- *  The function searches for matches of length at least
- *  ldmParams.minMatchLength using a hash table in cctx->ldmState.
- *  Matches can be at a distance of up to cParams.windowLog.
- *
- *  Upon finding a match, the unmatched literals are compressed using a
- *  ZSTD_blockCompressor (depending on the strategy in the compression
- *  parameters), which stores the matched sequences. The "long distance"
- *  match is then stored with the remaining literals from the
- *  ZSTD_blockCompressor. */
-size_t ZSTD_compressBlock_ldm(
-        ldmState_t* ldms, ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
-        ZSTD_CCtx_params const* params, void const* src, size_t srcSize);
-
-size_t ZSTD_compressBlock_ldm_extDict(
-        ldmState_t* ldms, ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
-        ZSTD_CCtx_params const* params, void const* src, size_t srcSize);
+ * NOTE: The source may be any size, assuming it doesn't overflow the hash table
+ * indices, and the output sequences table is large enough.
+ */
+size_t ZSTD_ldm_generateSequences(
+        ldmState_t* ldms, rawSeq* sequences,
+        ldmParams_t const* params, void const* src, size_t srcSize,
+        int const extDict);
+
+/**
+ * ZSTD_ldm_blockCompress():
+ *
+ * Compresses a block using the predefined sequences, along with a secondary
+ * block compressor. The literals section of every sequence is passed to the
+ * secondary block compressor, and those sequences are interspersed with the
+ * predefined sequences. Returns the length of the last literals.
+ * `nbSeq` is the number of sequences available in `sequences`.
+ *
+ * NOTE: The source must be at most the maximum block size, but the predefined
+ * sequences can be any size, and may be longer than the block. In the case that
+ * they are longer than the block, the last sequences may need to be split into
+ * two. We handle that case correctly, and update `sequences` and `nbSeq`
+ * appropriately.
+ */
+size_t ZSTD_ldm_blockCompress(rawSeq const* sequences, size_t nbSeq,
+        ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+        ZSTD_compressionParameters const* cParams, void const* src, size_t srcSize,
+        int const extDict);

/** ZSTD_ldm_initializeParameters() :
 *  Initialize the long distance matching parameters to their default values. */
size_t ZSTD_ldm_initializeParameters(ldmParams_t* params, U32 enableLdm);

/** ZSTD_ldm_getTableSize() :
- *  Estimate the space needed for long distance matching tables. */
-size_t ZSTD_ldm_getTableSize(U32 hashLog, U32 bucketSizeLog);
+ *  Estimate the space needed for long distance matching tables or 0 if LDM is
+ *  disabled.
+ */
+size_t ZSTD_ldm_getTableSize(ldmParams_t params);
+
+/** ZSTD_ldm_getSeqSpace() :
+ *  Return an upper bound on the number of sequences that can be produced by
+ *  the long distance matcher, or 0 if LDM is disabled.
+ */
+size_t ZSTD_ldm_getMaxNbSeq(ldmParams_t params, size_t maxChunkSize);

/** ZSTD_ldm_getTableSize() :
 *  Return prime8bytes^(minMatchLength-1) */
@ -62,8 +87,12 @@ U64 ZSTD_ldm_getHashPower(U32 minMatchLength);
 *  windowLog and params->hashLog.
 *
 *  Ensures that params->bucketSizeLog is <= params->hashLog (setting it to
- *  params->hashLog if it is not). */
-void ZSTD_ldm_adjustParameters(ldmParams_t* params, U32 windowLog);
+ *  params->hashLog if it is not).
+ *
+ *  Ensures that the minMatchLength >= targetLength during optimal parsing.
+ */
+void ZSTD_ldm_adjustParameters(ldmParams_t* params,
+                               ZSTD_compressionParameters const* cParams);

#if defined (__cplusplus)
}


@@ -247,7 +247,7 @@ static U32 ZSTD_insertAndFindFirstIndexHash3 (ZSTD_matchState_t* ms, const BYTE*
{
U32* const hashTable3 = ms->hashTable3;
U32 const hashLog3 = ms->hashLog3;
-const BYTE* const base = ms->base;
+const BYTE* const base = ms->window.base;
U32 idx = ms->nextToUpdate3;
U32 const target = ms->nextToUpdate3 = (U32)(ip - base);
size_t const hash3 = ZSTD_hash3Ptr(ip, hashLog3);
@@ -281,9 +281,9 @@ static U32 ZSTD_insertBt1(
U32 const btMask = (1 << btLog) - 1;
U32 matchIndex = hashTable[h];
size_t commonLengthSmaller=0, commonLengthLarger=0;
-const BYTE* const base = ms->base;
-const BYTE* const dictBase = ms->dictBase;
-const U32 dictLimit = ms->dictLimit;
+const BYTE* const base = ms->window.base;
+const BYTE* const dictBase = ms->window.dictBase;
+const U32 dictLimit = ms->window.dictLimit;
const BYTE* const dictEnd = dictBase + dictLimit;
const BYTE* const prefixStart = base + dictLimit;
const BYTE* match;
@@ -292,7 +292,7 @@ static U32 ZSTD_insertBt1(
U32* smallerPtr = bt + 2*(current&btMask);
U32* largerPtr = smallerPtr + 1;
U32 dummy32; /* to be nullified at the end */
-U32 const windowLow = ms->lowLimit;
+U32 const windowLow = ms->window.lowLimit;
U32 matchEndIdx = current+8+1;
size_t bestLength = 8;
U32 nbCompares = 1U << cParams->searchLog;
@@ -383,7 +383,7 @@ void ZSTD_updateTree_internal(
const BYTE* const ip, const BYTE* const iend,
const U32 mls, const U32 extDict)
{
-const BYTE* const base = ms->base;
+const BYTE* const base = ms->window.base;
U32 const target = (U32)(ip - base);
U32 idx = ms->nextToUpdate;
DEBUGLOG(7, "ZSTD_updateTree_internal, from %u to %u (extDict:%u)",
@@ -409,7 +409,7 @@ U32 ZSTD_insertBtAndGetAllMatches (
ZSTD_match_t* matches, const U32 lengthToBeat, U32 const mls /* template */)
{
U32 const sufficient_len = MIN(cParams->targetLength, ZSTD_OPT_NUM -1);
-const BYTE* const base = ms->base;
+const BYTE* const base = ms->window.base;
U32 const current = (U32)(ip-base);
U32 const hashLog = cParams->hashLog;
U32 const minMatch = (mls==3) ? 3 : 4;
@@ -420,12 +420,12 @@ U32 ZSTD_insertBtAndGetAllMatches (
U32 const btLog = cParams->chainLog - 1;
U32 const btMask= (1U << btLog) - 1;
size_t commonLengthSmaller=0, commonLengthLarger=0;
-const BYTE* const dictBase = ms->dictBase;
-U32 const dictLimit = ms->dictLimit;
+const BYTE* const dictBase = ms->window.dictBase;
+U32 const dictLimit = ms->window.dictLimit;
const BYTE* const dictEnd = dictBase + dictLimit;
const BYTE* const prefixStart = base + dictLimit;
U32 const btLow = btMask >= current ? 0 : current - btMask;
-U32 const windowLow = ms->lowLimit;
+U32 const windowLow = ms->window.lowLimit;
U32* smallerPtr = bt + 2*(current&btMask);
U32* largerPtr = bt + 2*(current&btMask) + 1;
U32 matchEndIdx = current+8+1; /* farthest referenced position of any match => detects repetitive patterns */
@@ -566,7 +566,7 @@ FORCE_INLINE_TEMPLATE U32 ZSTD_BtGetAllMatches (
{
U32 const matchLengthSearch = cParams->searchLength;
DEBUGLOG(7, "ZSTD_BtGetAllMatches");
-if (ip < ms->base + ms->nextToUpdate) return 0; /* skipped area */
+if (ip < ms->window.base + ms->nextToUpdate) return 0; /* skipped area */
ZSTD_updateTree_internal(ms, cParams, ip, iHighLimit, matchLengthSearch, extDict);
switch(matchLengthSearch)
{
@@ -675,8 +675,8 @@ size_t ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* ms,seqStore_t* seqStore
const BYTE* anchor = istart;
const BYTE* const iend = istart + srcSize;
const BYTE* const ilimit = iend - 8;
-const BYTE* const base = ms->base;
-const BYTE* const prefixStart = base + ms->dictLimit;
+const BYTE* const base = ms->window.base;
+const BYTE* const prefixStart = base + ms->window.dictLimit;
U32 const sufficient_len = MIN(cParams->targetLength, ZSTD_OPT_NUM -1);
U32 const minMatch = (cParams->searchLength == 3) ? 3 : 4;
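
The hunks above are mechanical: every direct field access (ms->base, ms->dictBase, ms->dictLimit, ms->lowLimit) becomes an access through ms->window. A minimal sketch of the grouping implied by this diff (only these four fields are confirmed here; the real structure may carry more, and the sketch name is hypothetical):

    typedef struct {
        const BYTE* base;       /* virtual start of the addressable space */
        const BYTE* dictBase;   /* base of the extDict segment */
        U32 dictLimit;          /* index where the current prefix begins */
        U32 lowLimit;           /* lowest searchable index (window floor) */
    } window_sketch_t;

Grouping these lets window-management invariants be maintained in one place instead of separately inside each match finder.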


@@ -10,7 +10,7 @@
/* ====== Tuning parameters ====== */
-#define ZSTDMT_NBTHREADS_MAX 200
+#define ZSTDMT_NBWORKERS_MAX 200
#define ZSTDMT_JOBSIZE_MAX (MEM_32bits() ? (512 MB) : (2 GB)) /* note : limited by `jobSize` type, which is `unsigned` */
#define ZSTDMT_OVERLAPLOG_DEFAULT 6
@@ -97,9 +97,9 @@ typedef struct ZSTDMT_bufferPool_s {
buffer_t bTable[1]; /* variable size */
} ZSTDMT_bufferPool;
-static ZSTDMT_bufferPool* ZSTDMT_createBufferPool(unsigned nbThreads, ZSTD_customMem cMem)
+static ZSTDMT_bufferPool* ZSTDMT_createBufferPool(unsigned nbWorkers, ZSTD_customMem cMem)
{
-unsigned const maxNbBuffers = 2*nbThreads + 3;
+unsigned const maxNbBuffers = 2*nbWorkers + 3;
ZSTDMT_bufferPool* const bufPool = (ZSTDMT_bufferPool*)ZSTD_calloc(
sizeof(ZSTDMT_bufferPool) + (maxNbBuffers-1) * sizeof(buffer_t), cMem);
if (bufPool==NULL) return NULL;
@@ -236,23 +236,24 @@ static void ZSTDMT_freeCCtxPool(ZSTDMT_CCtxPool* pool)
}
/* ZSTDMT_createCCtxPool() :
- * implies nbThreads >= 1 , checked by caller ZSTDMT_createCCtx() */
-static ZSTDMT_CCtxPool* ZSTDMT_createCCtxPool(unsigned nbThreads,
+ * implies nbWorkers >= 1 , checked by caller ZSTDMT_createCCtx() */
+static ZSTDMT_CCtxPool* ZSTDMT_createCCtxPool(unsigned nbWorkers,
ZSTD_customMem cMem)
{
ZSTDMT_CCtxPool* const cctxPool = (ZSTDMT_CCtxPool*) ZSTD_calloc(
-sizeof(ZSTDMT_CCtxPool) + (nbThreads-1)*sizeof(ZSTD_CCtx*), cMem);
+sizeof(ZSTDMT_CCtxPool) + (nbWorkers-1)*sizeof(ZSTD_CCtx*), cMem);
+assert(nbWorkers > 0);
if (!cctxPool) return NULL;
if (ZSTD_pthread_mutex_init(&cctxPool->poolMutex, NULL)) {
ZSTD_free(cctxPool, cMem);
return NULL;
}
cctxPool->cMem = cMem;
-cctxPool->totalCCtx = nbThreads;
+cctxPool->totalCCtx = nbWorkers;
cctxPool->availCCtx = 1; /* at least one cctx for single-thread mode */
cctxPool->cctx[0] = ZSTD_createCCtx_advanced(cMem);
if (!cctxPool->cctx[0]) { ZSTDMT_freeCCtxPool(cctxPool); return NULL; }
-DEBUGLOG(3, "cctxPool created, with %u threads", nbThreads);
+DEBUGLOG(3, "cctxPool created, with %u workers", nbWorkers);
return cctxPool;
}
@@ -260,15 +261,16 @@ static ZSTDMT_CCtxPool* ZSTDMT_createCCtxPool(unsigned nbWorkers,
static size_t ZSTDMT_sizeof_CCtxPool(ZSTDMT_CCtxPool* cctxPool)
{
ZSTD_pthread_mutex_lock(&cctxPool->poolMutex);
-{ unsigned const nbThreads = cctxPool->totalCCtx;
+{ unsigned const nbWorkers = cctxPool->totalCCtx;
size_t const poolSize = sizeof(*cctxPool)
-+ (nbThreads-1)*sizeof(ZSTD_CCtx*);
++ (nbWorkers-1) * sizeof(ZSTD_CCtx*);
unsigned u;
size_t totalCCtxSize = 0;
-for (u=0; u<nbThreads; u++) {
+for (u=0; u<nbWorkers; u++) {
totalCCtxSize += ZSTD_sizeof_CCtx(cctxPool->cctx[u]);
}
ZSTD_pthread_mutex_unlock(&cctxPool->poolMutex);
+assert(nbWorkers > 0);
return poolSize + totalCCtxSize;
}
}
@@ -295,8 +297,8 @@ static void ZSTDMT_releaseCCtx(ZSTDMT_CCtxPool* pool, ZSTD_CCtx* cctx)
if (pool->availCCtx < pool->totalCCtx)
pool->cctx[pool->availCCtx++] = cctx;
else {
-/* pool overflow : should not happen, since totalCCtx==nbThreads */
-DEBUGLOG(5, "CCtx pool overflow : free cctx");
+/* pool overflow : should not happen, since totalCCtx==nbWorkers */
+DEBUGLOG(4, "CCtx pool overflow : free cctx");
ZSTD_freeCCtx(cctx);
}
ZSTD_pthread_mutex_unlock(&pool->poolMutex);
@@ -502,52 +504,51 @@ static ZSTDMT_jobDescription* ZSTDMT_createJobsTable(U32* nbJobsPtr, ZSTD_custom
return jobTable;
}
-/* ZSTDMT_CCtxParam_setNbThreads():
+/* ZSTDMT_CCtxParam_setNbWorkers():
 * Internal use only */
-size_t ZSTDMT_CCtxParam_setNbThreads(ZSTD_CCtx_params* params, unsigned nbThreads)
+size_t ZSTDMT_CCtxParam_setNbWorkers(ZSTD_CCtx_params* params, unsigned nbWorkers)
{
-if (nbThreads > ZSTDMT_NBTHREADS_MAX) nbThreads = ZSTDMT_NBTHREADS_MAX;
-if (nbThreads < 1) nbThreads = 1;
-params->nbThreads = nbThreads;
+if (nbWorkers > ZSTDMT_NBWORKERS_MAX) nbWorkers = ZSTDMT_NBWORKERS_MAX;
+params->nbWorkers = nbWorkers;
params->overlapSizeLog = ZSTDMT_OVERLAPLOG_DEFAULT;
params->jobSize = 0;
-return nbThreads;
+return nbWorkers;
}
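
Note the dropped lower clamp in the hunk above: nbWorkers == 0 now survives the setter, and later asserts in this diff (jobParams.nbWorkers == 0) rely on 0 meaning "no worker thread, compress on the calling thread". The clamp that remains, isolated as a sketch:

    /* sketch of the remaining clamp; ZSTDMT_NBWORKERS_MAX is 200 above */
    static unsigned clamp_nbWorkers(unsigned nbWorkers)
    {
        return (nbWorkers > ZSTDMT_NBWORKERS_MAX) ? ZSTDMT_NBWORKERS_MAX : nbWorkers;
    }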
-ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbThreads, ZSTD_customMem cMem)
+ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbWorkers, ZSTD_customMem cMem)
{
ZSTDMT_CCtx* mtctx;
-U32 nbJobs = nbThreads + 2;
-DEBUGLOG(3, "ZSTDMT_createCCtx_advanced (nbThreads = %u)", nbThreads);
-if (nbThreads < 1) return NULL;
-nbThreads = MIN(nbThreads , ZSTDMT_NBTHREADS_MAX);
+U32 nbJobs = nbWorkers + 2;
+DEBUGLOG(3, "ZSTDMT_createCCtx_advanced (nbWorkers = %u)", nbWorkers);
+if (nbWorkers < 1) return NULL;
+nbWorkers = MIN(nbWorkers , ZSTDMT_NBWORKERS_MAX);
if ((cMem.customAlloc!=NULL) ^ (cMem.customFree!=NULL))
/* invalid custom allocator */
return NULL;
mtctx = (ZSTDMT_CCtx*) ZSTD_calloc(sizeof(ZSTDMT_CCtx), cMem);
if (!mtctx) return NULL;
-ZSTDMT_CCtxParam_setNbThreads(&mtctx->params, nbThreads);
+ZSTDMT_CCtxParam_setNbWorkers(&mtctx->params, nbWorkers);
mtctx->cMem = cMem;
mtctx->allJobsCompleted = 1;
-mtctx->factory = POOL_create_advanced(nbThreads, 0, cMem);
+mtctx->factory = POOL_create_advanced(nbWorkers, 0, cMem);
mtctx->jobs = ZSTDMT_createJobsTable(&nbJobs, cMem);
assert(nbJobs > 0); assert((nbJobs & (nbJobs - 1)) == 0); /* ensure nbJobs is a power of 2 */
mtctx->jobIDMask = nbJobs - 1;
-mtctx->bufPool = ZSTDMT_createBufferPool(nbThreads, cMem);
-mtctx->cctxPool = ZSTDMT_createCCtxPool(nbThreads, cMem);
+mtctx->bufPool = ZSTDMT_createBufferPool(nbWorkers, cMem);
+mtctx->cctxPool = ZSTDMT_createCCtxPool(nbWorkers, cMem);
if (!mtctx->factory | !mtctx->jobs | !mtctx->bufPool | !mtctx->cctxPool) {
ZSTDMT_freeCCtx(mtctx);
return NULL;
}
-DEBUGLOG(3, "mt_cctx created, for %u threads", nbThreads);
+DEBUGLOG(3, "mt_cctx created, for %u threads", nbWorkers);
return mtctx;
}
-ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbThreads)
+ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbWorkers)
{
-return ZSTDMT_createCCtx_advanced(nbThreads, ZSTD_defaultCMem);
+return ZSTDMT_createCCtx_advanced(nbWorkers, ZSTD_defaultCMem);
}
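
For a given nbWorkers, the constructors above size every pool up front: nbWorkers threads in the factory, a jobs table rounded up from nbWorkers+2 to a power of 2 (the assert right after ZSTDMT_createJobsTable checks this, since jobIDMask requires it), up to 2*nbWorkers+3 buffers, and nbWorkers compression contexts. A toy reproduction of the job-table rounding (helper name hypothetical):

    static unsigned jobTableSize(unsigned nbWorkers)
    {
        unsigned const nbJobsMin = nbWorkers + 2;
        unsigned pow2 = 1;
        while (pow2 < nbJobsMin) pow2 <<= 1;   /* round up, so jobIDMask = pow2-1 works */
        return pow2;
    }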
@@ -649,8 +650,8 @@ size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSTDMT_parameter parameter,
}
}
-/* Sets parameters relevant to the compression job, initializing others to
- * default values. Notably, nbThreads should probably be zero. */
+/* Sets parameters relevant to the compression job,
+ * initializing others to default values. */
static ZSTD_CCtx_params ZSTDMT_initJobCCtxParams(ZSTD_CCtx_params const params)
{
ZSTD_CCtx_params jobParams;
@@ -664,13 +665,29 @@ static ZSTD_CCtx_params ZSTDMT_initJobCCtxParams(ZSTD_CCtx_params const params)
return jobParams;
}
-/* ZSTDMT_getNbThreads():
+/*! ZSTDMT_updateCParams_whileCompressing() :
+ * Update compression level and parameters (except wlog)
+ * while compression is ongoing.
+ * New parameters will be applied to next compression job. */
+void ZSTDMT_updateCParams_whileCompressing(ZSTDMT_CCtx* mtctx, int compressionLevel, ZSTD_compressionParameters cParams)
+{
+U32 const saved_wlog = mtctx->params.cParams.windowLog; /* Do not modify windowLog while compressing */
+DEBUGLOG(5, "ZSTDMT_updateCParams_whileCompressing (level:%i)",
+compressionLevel);
+mtctx->params.compressionLevel = compressionLevel;
+if (compressionLevel != ZSTD_CLEVEL_CUSTOM)
+cParams = ZSTD_getCParams(compressionLevel, mtctx->frameContentSize, 0 /* dictSize */ );
+cParams.windowLog = saved_wlog;
+mtctx->params.cParams = cParams;
+}
+
+/* ZSTDMT_getNbWorkers():
 * @return nb threads currently active in mtctx.
 * mtctx must be valid */
-unsigned ZSTDMT_getNbThreads(const ZSTDMT_CCtx* mtctx)
+unsigned ZSTDMT_getNbWorkers(const ZSTDMT_CCtx* mtctx)
{
assert(mtctx != NULL);
-return mtctx->params.nbThreads;
+return mtctx->params.nbWorkers;
}
/* ZSTDMT_getFrameProgression():
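
The wlog exception in the new function above appears deliberate: jobs already queued for the current frame were split and overlapped according to the frame's windowLog, so it must stay fixed until the frame ends (this reasoning is inferred from the in-code comment, not stated elsewhere in the patch). The save/restore pattern, isolated:

    /* sketch: apply new cParams while pinning the frame-lifetime windowLog */
    void update_cparams_keep_wlog(ZSTD_compressionParameters* live,
                                  ZSTD_compressionParameters next)
    {
        U32 const saved_wlog = live->windowLog;  /* invariant for the current frame */
        next.windowLog = saved_wlog;
        *live = next;
    }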
@@ -709,15 +726,15 @@ ZSTD_frameProgression ZSTDMT_getFrameProgression(ZSTDMT_CCtx* mtctx)
/* ===== Multi-threaded compression ===== */
/* ------------------------------------------ */
-static unsigned ZSTDMT_computeNbJobs(size_t srcSize, unsigned windowLog, unsigned nbThreads) {
-assert(nbThreads>0);
+static unsigned ZSTDMT_computeNbJobs(size_t srcSize, unsigned windowLog, unsigned nbWorkers) {
+assert(nbWorkers>0);
{ size_t const jobSizeTarget = (size_t)1 << (windowLog + 2);
size_t const jobMaxSize = jobSizeTarget << 2;
-size_t const passSizeMax = jobMaxSize * nbThreads;
+size_t const passSizeMax = jobMaxSize * nbWorkers;
unsigned const multiplier = (unsigned)(srcSize / passSizeMax) + 1;
-unsigned const nbJobsLarge = multiplier * nbThreads;
+unsigned const nbJobsLarge = multiplier * nbWorkers;
unsigned const nbJobsMax = (unsigned)(srcSize / jobSizeTarget) + 1;
-unsigned const nbJobsSmall = MIN(nbJobsMax, nbThreads);
+unsigned const nbJobsSmall = MIN(nbJobsMax, nbWorkers);
return (multiplier>1) ? nbJobsLarge : nbJobsSmall;
} }
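
A worked example of the job-count rule above, with illustrative values: windowLog = 23 gives jobSizeTarget = 1 << 25 = 32 MB and jobMaxSize = 128 MB; with nbWorkers = 4, passSizeMax = 512 MB.

    /* srcSize = 100 MB : multiplier = 100/512 + 1 = 1  -> "small" path:
     *   nbJobsMax = 100/32 + 1 = 4, nbJobsSmall = MIN(4, 4) = 4 jobs.
     * srcSize = 2 GB   : multiplier = 2048/512 + 1 = 5 -> "large" path:
     *   nbJobsLarge = 5 * 4 = 20 jobs, i.e. roughly 5 passes of 4 workers. */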
@@ -734,7 +751,7 @@ static size_t ZSTDMT_compress_advanced_internal(
ZSTD_CCtx_params const jobParams = ZSTDMT_initJobCCtxParams(params);
unsigned const overlapRLog = (params.overlapSizeLog>9) ? 0 : 9-params.overlapSizeLog;
size_t const overlapSize = (overlapRLog>=9) ? 0 : (size_t)1 << (params.cParams.windowLog - overlapRLog);
-unsigned const nbJobs = ZSTDMT_computeNbJobs(srcSize, params.cParams.windowLog, params.nbThreads);
+unsigned const nbJobs = ZSTDMT_computeNbJobs(srcSize, params.cParams.windowLog, params.nbWorkers);
size_t const proposedJobSize = (srcSize + (nbJobs-1)) / nbJobs;
size_t const avgJobSize = (((proposedJobSize-1) & 0x1FFFF) < 0x7FFF) ? proposedJobSize + 0xFFFF : proposedJobSize; /* avoid too small last block */
const char* const srcStart = (const char*)src;
@@ -742,13 +759,13 @@ static size_t ZSTDMT_compress_advanced_internal(
unsigned const compressWithinDst = (dstCapacity >= ZSTD_compressBound(srcSize)) ? nbJobs : (unsigned)(dstCapacity / ZSTD_compressBound(avgJobSize)); /* presumes avgJobSize >= 256 KB, which should be the case */
size_t frameStartPos = 0, dstBufferPos = 0;
XXH64_state_t xxh64;
-assert(jobParams.nbThreads == 0);
-assert(mtctx->cctxPool->totalCCtx == params.nbThreads);
+assert(jobParams.nbWorkers == 0);
+assert(mtctx->cctxPool->totalCCtx == params.nbWorkers);
DEBUGLOG(4, "ZSTDMT_compress_advanced_internal: nbJobs=%2u (rawSize=%u bytes; fixedSize=%u) ",
nbJobs, (U32)proposedJobSize, (U32)avgJobSize);
-if ((nbJobs==1) | (params.nbThreads<=1)) { /* fallback to single-thread mode : this is a blocking invocation anyway */
+if ((nbJobs==1) | (params.nbWorkers<=1)) { /* fallback to single-thread mode : this is a blocking invocation anyway */
ZSTD_CCtx* const cctx = mtctx->cctxPool->cctx[0];
if (cdict) return ZSTD_compress_usingCDict_advanced(cctx, dst, dstCapacity, src, srcSize, cdict, jobParams.fParams);
return ZSTD_compress_advanced_internal(cctx, dst, dstCapacity, src, srcSize, NULL, 0, jobParams);
@@ -856,7 +873,7 @@ size_t ZSTDMT_compress_advanced(ZSTDMT_CCtx* mtctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
const ZSTD_CDict* cdict,
-ZSTD_parameters const params,
+ZSTD_parameters params,
unsigned overlapLog)
{
ZSTD_CCtx_params cctxParams = mtctx->params;
@@ -892,12 +909,14 @@ size_t ZSTDMT_initCStream_internal(
const ZSTD_CDict* cdict, ZSTD_CCtx_params params,
unsigned long long pledgedSrcSize)
{
-DEBUGLOG(4, "ZSTDMT_initCStream_internal (pledgedSrcSize=%u)", (U32)pledgedSrcSize);
+DEBUGLOG(4, "ZSTDMT_initCStream_internal (pledgedSrcSize=%u, nbWorkers=%u, cctxPool=%u)",
+(U32)pledgedSrcSize, params.nbWorkers, mtctx->cctxPool->totalCCtx);
/* params are supposed to be fully validated at this point */
assert(!ZSTD_isError(ZSTD_checkCParams(params.cParams)));
assert(!((dict) && (cdict))); /* either dict or cdict, not both */
-assert(mtctx->cctxPool->totalCCtx == params.nbThreads);
-mtctx->singleBlockingThread = (pledgedSrcSize <= ZSTDMT_JOBSIZE_MIN); /* do not trigger multi-threading when srcSize is too small */
-/* init */
+assert(mtctx->cctxPool->totalCCtx == params.nbWorkers);
if (params.jobSize == 0) {
if (params.cParams.windowLog >= 29)
params.jobSize = ZSTDMT_JOBSIZE_MAX;
@@ -906,15 +925,17 @@ size_t ZSTDMT_initCStream_internal(
}
if (params.jobSize > ZSTDMT_JOBSIZE_MAX) params.jobSize = ZSTDMT_JOBSIZE_MAX;
+mtctx->singleBlockingThread = (pledgedSrcSize <= ZSTDMT_JOBSIZE_MIN); /* do not trigger multi-threading when srcSize is too small */
if (mtctx->singleBlockingThread) {
ZSTD_CCtx_params const singleThreadParams = ZSTDMT_initJobCCtxParams(params);
DEBUGLOG(4, "ZSTDMT_initCStream_internal: switch to single blocking thread mode");
-assert(singleThreadParams.nbThreads == 0);
+assert(singleThreadParams.nbWorkers == 0);
return ZSTD_initCStream_internal(mtctx->cctxPool->cctx[0],
dict, dictSize, cdict,
singleThreadParams, pledgedSrcSize);
}
-DEBUGLOG(4, "ZSTDMT_initCStream_internal: %u threads", params.nbThreads);
+DEBUGLOG(4, "ZSTDMT_initCStream_internal: %u workers", params.nbWorkers);
if (mtctx->allJobsCompleted == 0) { /* previous compression not correctly finished */
ZSTDMT_waitForAllJobsCompleted(mtctx);
@@ -993,8 +1014,6 @@ size_t ZSTDMT_initCStream_usingCDict(ZSTDMT_CCtx* mtctx,
size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* mtctx, unsigned long long pledgedSrcSize)
{
if (!pledgedSrcSize) pledgedSrcSize = ZSTD_CONTENTSIZE_UNKNOWN;
-if (mtctx->params.nbThreads==1)
-return ZSTD_resetCStream(mtctx->cctxPool->cctx[0], pledgedSrcSize);
return ZSTDMT_initCStream_internal(mtctx, NULL, 0, ZSTD_dm_auto, 0, mtctx->params,
pledgedSrcSize);
}
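
Two behavioral points in the hunks above: the small-source test now runs after jobSize is settled, and the nbThreads==1 early-exit in ZSTDMT_resetCStream is removed, presumably because the singleBlockingThread path inside ZSTDMT_initCStream_internal now covers that case. The decision itself reduces to a single comparison (ZSTDMT_JOBSIZE_MIN is defined elsewhere in this file, not shown in this diff):

    /* sketch of the fallback decision */
    int use_single_blocking_thread(unsigned long long pledgedSrcSize,
                                   unsigned long long jobSizeMin)
    {
        return pledgedSrcSize <= jobSizeMin;  /* small inputs stay on the calling thread */
    }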


@@ -30,15 +30,15 @@
/* === Memory management === */
typedef struct ZSTDMT_CCtx_s ZSTDMT_CCtx;
-ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbThreads);
-ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbThreads,
+ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbWorkers);
+ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx_advanced(unsigned nbWorkers,
ZSTD_customMem cMem);
ZSTDLIB_API size_t ZSTDMT_freeCCtx(ZSTDMT_CCtx* mtctx);
ZSTDLIB_API size_t ZSTDMT_sizeof_CCtx(ZSTDMT_CCtx* mtctx);
-/* === Simple buffer-to-butter one-pass function === */
+/* === Simple one-pass compression function === */
ZSTDLIB_API size_t ZSTDMT_compressCCtx(ZSTDMT_CCtx* mtctx,
void* dst, size_t dstCapacity,
@@ -50,7 +50,7 @@ ZSTDLIB_API size_t ZSTDMT_compressCCtx(ZSTDMT_CCtx* mtctx,
/* === Streaming functions === */
ZSTDLIB_API size_t ZSTDMT_initCStream(ZSTDMT_CCtx* mtctx, int compressionLevel);
-ZSTDLIB_API size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* mtctx, unsigned long long pledgedSrcSize); /**< if srcSize is not known at reset time, use ZSTD_CONTENTSIZE_UNKNOWN. Note: for compatibility with older programs, 0 means the same as ZSTD_CONTENTSIZE_UNKNOWN, but it may change in the future, to mean "empty" */
+ZSTDLIB_API size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* mtctx, unsigned long long pledgedSrcSize); /**< if srcSize is not known at reset time, use ZSTD_CONTENTSIZE_UNKNOWN. Note: for compatibility with older programs, 0 means the same as ZSTD_CONTENTSIZE_UNKNOWN, but it will change in the future to mean "empty" */
ZSTDLIB_API size_t ZSTDMT_compressStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output, ZSTD_inBuffer* input);
@@ -68,7 +68,7 @@ ZSTDLIB_API size_t ZSTDMT_compress_advanced(ZSTDMT_CCtx* mtctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
const ZSTD_CDict* cdict,
-ZSTD_parameters const params,
+ZSTD_parameters params,
unsigned overlapLog);
ZSTDLIB_API size_t ZSTDMT_initCStream_advanced(ZSTDMT_CCtx* mtctx,
@@ -109,19 +109,28 @@ ZSTDLIB_API size_t ZSTDMT_compressStream_generic(ZSTDMT_CCtx* mtctx,
ZSTD_EndDirective endOp);
-/* === Private definitions; never ever use directly === */
+/* ========================================================
+ * === Private interface, for use by ZSTD_compress.c ===
+ * === Not exposed in libzstd. Never invoke directly ===
+ * ======================================================== */
size_t ZSTDMT_CCtxParam_setMTCtxParameter(ZSTD_CCtx_params* params, ZSTDMT_parameter parameter, unsigned value);
-/* ZSTDMT_CCtxParam_setNbThreads()
- * Set nbThreads, and clamp it correctly,
- * also reset jobSize and overlapLog */
-size_t ZSTDMT_CCtxParam_setNbThreads(ZSTD_CCtx_params* params, unsigned nbThreads);
+/* ZSTDMT_CCtxParam_setNbWorkers()
+ * Set nbWorkers, and clamp it.
+ * Also reset jobSize and overlapLog */
+size_t ZSTDMT_CCtxParam_setNbWorkers(ZSTD_CCtx_params* params, unsigned nbWorkers);
-/* ZSTDMT_getNbThreads():
+/*! ZSTDMT_updateCParams_whileCompressing() :
+ * Update compression level and parameters (except wlog)
+ * while compression is ongoing.
+ * New parameters will be applied to next compression job. */
+void ZSTDMT_updateCParams_whileCompressing(ZSTDMT_CCtx* mtctx, int compressionLevel, ZSTD_compressionParameters cParams);
+
+/* ZSTDMT_getNbWorkers():
 * @return nb threads currently active in mtctx.
 * mtctx must be valid */
-unsigned ZSTDMT_getNbThreads(const ZSTDMT_CCtx* mtctx);
+unsigned ZSTDMT_getNbWorkers(const ZSTDMT_CCtx* mtctx);
/* ZSTDMT_getFrameProgression():
 * tells how much data has been consumed (input) and produced (output) for current frame.


@@ -49,6 +49,7 @@
****************************************************************/
#define HUF_isError ERR_isError
#define HUF_STATIC_ASSERT(c) { enum { HUF_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */
+#define CHECK_F(f) { size_t const err_ = (f); if (HUF_isError(err_)) return err_; }
/* **************************************************************
@@ -57,10 +58,10 @@
#define HUF_ALIGN(x, a) HUF_ALIGN_MASK((x), (a) - 1)
#define HUF_ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))
/*-***************************/
/* generic DTableDesc */
/*-***************************/
typedef struct { BYTE maxTableLog; BYTE tableType; BYTE tableLog; BYTE reserved; } DTableDesc;
static DTableDesc HUF_getDTableDesc(const HUF_DTable* table)
@@ -74,7 +75,6 @@ static DTableDesc HUF_getDTableDesc(const HUF_DTable* table)
/*-***************************/
/* single-symbol decoding */
/*-***************************/
typedef struct { BYTE byte; BYTE nbBits; } HUF_DEltX2; /* single-symbol decoding */
size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize)
@@ -94,10 +94,7 @@ size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize
huffWeight = (BYTE *)((U32 *)workSpace + spaceUsed32);
spaceUsed32 += HUF_ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2;
-if ((spaceUsed32 << 2) > wkspSize)
-return ERROR(tableLog_tooLarge);
-workSpace = (U32 *)workSpace + spaceUsed32;
-wkspSize -= (spaceUsed32 << 2);
+if ((spaceUsed32 << 2) > wkspSize) return ERROR(tableLog_tooLarge);
HUF_STATIC_ASSERT(sizeof(DTableDesc) == sizeof(HUF_DTable));
/* memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not necessary, even though some analyzer complain ... */
@@ -144,8 +141,10 @@ size_t HUF_readDTableX2(HUF_DTable* DTable, const void* src, size_t srcSize)
workSpace, sizeof(workSpace));
}
+typedef struct { U16 sequence; BYTE nbBits; BYTE length; } HUF_DEltX4; /* double-symbols decoding */
-static BYTE HUF_decodeSymbolX2(BIT_DStream_t* Dstream, const HUF_DEltX2* dt, const U32 dtLog)
+FORCE_INLINE_TEMPLATE BYTE
+HUF_decodeSymbolX2(BIT_DStream_t* Dstream, const HUF_DEltX2* dt, const U32 dtLog)
{
size_t const val = BIT_lookBitsFast(Dstream, dtLog); /* note : dtLog >= 1 */
BYTE const c = dt[val].byte;
@@ -164,30 +163,33 @@
if (MEM_64bits()) \
HUF_DECODE_SYMBOLX2_0(ptr, DStreamPtr)
-HINT_INLINE size_t HUF_decodeStreamX2(BYTE* p, BIT_DStream_t* const bitDPtr, BYTE* const pEnd, const HUF_DEltX2* const dt, const U32 dtLog)
+HINT_INLINE size_t
+HUF_decodeStreamX2(BYTE* p, BIT_DStream_t* const bitDPtr, BYTE* const pEnd, const HUF_DEltX2* const dt, const U32 dtLog)
{
BYTE* const pStart = p;
/* up to 4 symbols at a time */
-while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) && (p <= pEnd-4)) {
+while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p < pEnd-3)) {
HUF_DECODE_SYMBOLX2_2(p, bitDPtr);
HUF_DECODE_SYMBOLX2_1(p, bitDPtr);
HUF_DECODE_SYMBOLX2_2(p, bitDPtr);
HUF_DECODE_SYMBOLX2_0(p, bitDPtr);
}
-/* closer to the end */
-while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) && (p < pEnd))
+/* [0-3] symbols remaining */
+if (MEM_32bits())
+while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p < pEnd))
HUF_DECODE_SYMBOLX2_0(p, bitDPtr);
-/* no more data to retrieve from bitstream, hence no need to reload */
+/* no more data to retrieve from bitstream, no need to reload */
while (p < pEnd)
HUF_DECODE_SYMBOLX2_0(p, bitDPtr);
return pEnd-pStart;
}
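
Two micro-optimizations in the loop above are easy to miss: the bitwise & evaluates both operands without a short-circuit branch (both sides are already branch-free), and the new bound is the same pointer set written differently. The reload loop in the middle stays only for 32-bit mode, where the smaller bit container may run dry before the last few symbols. A quick check of the bound equivalence:

    #include <assert.h>
    static void bound_equivalence(const BYTE* p, const BYTE* pEnd)
    {
        /* for pointers into the same buffer, these two tests always agree */
        assert((p <= pEnd-4) == (p < pEnd-3));
    }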
-static size_t HUF_decompress1X2_usingDTable_internal(
+FORCE_INLINE_TEMPLATE size_t
+HUF_decompress1X2_usingDTable_internal_body(
void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize,
const HUF_DTable* DTable)
@@ -200,58 +202,17 @@
DTableDesc const dtd = HUF_getDTableDesc(DTable);
U32 const dtLog = dtd.tableLog;
-{ size_t const errorCode = BIT_initDStream(&bitD, cSrc, cSrcSize);
-if (HUF_isError(errorCode)) return errorCode; }
+CHECK_F( BIT_initDStream(&bitD, cSrc, cSrcSize) );
HUF_decodeStreamX2(op, &bitD, oend, dt, dtLog);
-/* check */
if (!BIT_endOfDStream(&bitD)) return ERROR(corruption_detected);
return dstSize;
}
-size_t HUF_decompress1X2_usingDTable(
-void* dst, size_t dstSize,
-const void* cSrc, size_t cSrcSize,
-const HUF_DTable* DTable)
-{
-DTableDesc dtd = HUF_getDTableDesc(DTable);
-if (dtd.tableType != 0) return ERROR(GENERIC);
-return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable);
-}
-size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dstSize,
-const void* cSrc, size_t cSrcSize,
-void* workSpace, size_t wkspSize)
-{
-const BYTE* ip = (const BYTE*) cSrc;
-size_t const hSize = HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, workSpace, wkspSize);
-if (HUF_isError(hSize)) return hSize;
-if (hSize >= cSrcSize) return ERROR(srcSize_wrong);
-ip += hSize; cSrcSize -= hSize;
-return HUF_decompress1X2_usingDTable_internal (dst, dstSize, ip, cSrcSize, DCtx);
-}
-size_t HUF_decompress1X2_DCtx(HUF_DTable* DCtx, void* dst, size_t dstSize,
-const void* cSrc, size_t cSrcSize)
-{
-U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32];
-return HUF_decompress1X2_DCtx_wksp(DCtx, dst, dstSize, cSrc, cSrcSize,
-workSpace, sizeof(workSpace));
-}
-size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize)
-{
-HUF_CREATE_STATIC_DTABLEX2(DTable, HUF_TABLELOG_MAX);
-return HUF_decompress1X2_DCtx (DTable, dst, dstSize, cSrc, cSrcSize);
-}
-static size_t HUF_decompress4X2_usingDTable_internal(
+FORCE_INLINE_TEMPLATE size_t
+HUF_decompress4X2_usingDTable_internal_body(
void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize,
const HUF_DTable* DTable)
@@ -286,23 +247,19 @@
BYTE* op2 = opStart2;
BYTE* op3 = opStart3;
BYTE* op4 = opStart4;
-U32 endSignal;
+U32 endSignal = BIT_DStream_unfinished;
DTableDesc const dtd = HUF_getDTableDesc(DTable);
U32 const dtLog = dtd.tableLog;
if (length4 > cSrcSize) return ERROR(corruption_detected); /* overflow */
-{ size_t const errorCode = BIT_initDStream(&bitD1, istart1, length1);
-if (HUF_isError(errorCode)) return errorCode; }
-{ size_t const errorCode = BIT_initDStream(&bitD2, istart2, length2);
-if (HUF_isError(errorCode)) return errorCode; }
-{ size_t const errorCode = BIT_initDStream(&bitD3, istart3, length3);
-if (HUF_isError(errorCode)) return errorCode; }
-{ size_t const errorCode = BIT_initDStream(&bitD4, istart4, length4);
-if (HUF_isError(errorCode)) return errorCode; }
+CHECK_F( BIT_initDStream(&bitD1, istart1, length1) );
+CHECK_F( BIT_initDStream(&bitD2, istart2, length2) );
+CHECK_F( BIT_initDStream(&bitD3, istart3, length3) );
+CHECK_F( BIT_initDStream(&bitD4, istart4, length4) );
-/* 16-32 symbols per loop (4-8 symbols per stream) */
+/* up to 16 symbols per loop (4 symbols per stream) in 64-bit mode */
endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4);
-for ( ; (endSignal==BIT_DStream_unfinished) && (op4<(oend-7)) ; ) {
+while ( (endSignal==BIT_DStream_unfinished) && (op4<(oend-3)) ) {
HUF_DECODE_SYMBOLX2_2(op1, &bitD1);
HUF_DECODE_SYMBOLX2_2(op2, &bitD2);
HUF_DECODE_SYMBOLX2_2(op3, &bitD3);
@@ -319,10 +276,15 @@
HUF_DECODE_SYMBOLX2_0(op2, &bitD2);
HUF_DECODE_SYMBOLX2_0(op3, &bitD3);
HUF_DECODE_SYMBOLX2_0(op4, &bitD4);
-endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4);
+BIT_reloadDStream(&bitD1);
+BIT_reloadDStream(&bitD2);
+BIT_reloadDStream(&bitD3);
+BIT_reloadDStream(&bitD4);
}
/* check corruption */
+/* note : should not be necessary : op# advance in lock step, and we control op4.
+ * but curiously, binary generated by gcc 7.2 & 7.3 with -mbmi2 runs faster when >=1 test is present */
if (op1 > opStart2) return ERROR(corruption_detected);
if (op2 > opStart3) return ERROR(corruption_detected);
if (op3 > opStart4) return ERROR(corruption_detected);
@@ -335,8 +297,8 @@
HUF_decodeStreamX2(op4, &bitD4, oend, dt, dtLog);
/* check */
-endSignal = BIT_endOfDStream(&bitD1) & BIT_endOfDStream(&bitD2) & BIT_endOfDStream(&bitD3) & BIT_endOfDStream(&bitD4);
-if (!endSignal) return ERROR(corruption_detected);
+{ U32 const endCheck = BIT_endOfDStream(&bitD1) & BIT_endOfDStream(&bitD2) & BIT_endOfDStream(&bitD3) & BIT_endOfDStream(&bitD4);
+if (!endCheck) return ERROR(corruption_detected); }
/* decoded size */
return dstSize;
@@ -344,6 +306,279 @@
}
+FORCE_INLINE_TEMPLATE U32
+HUF_decodeSymbolX4(void* op, BIT_DStream_t* DStream, const HUF_DEltX4* dt, const U32 dtLog)
+{
+size_t const val = BIT_lookBitsFast(DStream, dtLog); /* note : dtLog >= 1 */
+memcpy(op, dt+val, 2);
+BIT_skipBits(DStream, dt[val].nbBits);
+return dt[val].length;
+}
+FORCE_INLINE_TEMPLATE U32
+HUF_decodeLastSymbolX4(void* op, BIT_DStream_t* DStream, const HUF_DEltX4* dt, const U32 dtLog)
+{
+size_t const val = BIT_lookBitsFast(DStream, dtLog); /* note : dtLog >= 1 */
+memcpy(op, dt+val, 1);
+if (dt[val].length==1) BIT_skipBits(DStream, dt[val].nbBits);
+else {
+if (DStream->bitsConsumed < (sizeof(DStream->bitContainer)*8)) {
+BIT_skipBits(DStream, dt[val].nbBits);
+if (DStream->bitsConsumed > (sizeof(DStream->bitContainer)*8))
+/* ugly hack; works only because it's the last symbol. Note : can't easily extract nbBits from just this symbol */
+DStream->bitsConsumed = (sizeof(DStream->bitContainer)*8);
+} }
+return 1;
+}
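
How the double-symbol table works, per the two helpers above: each HUF_DEltX4 entry packs up to two decoded bytes in `sequence`, the combined bit cost in `nbBits`, and the number of valid bytes (1 or 2) in `length`. memcpy(op, dt+val, 2) stores both bytes at once, and the caller advances by `length`, so a single-byte entry simply gets its second byte overwritten by the next store. A sketch of one entry (byte order inside `sequence` follows however the table builder wrote it; values are illustrative, not taken from a real table):

    /* entry for the symbol pair 'a','b' costing 5 bits total:
     *   HUF_DEltX4 e = { .sequence = 'a' | ('b'<<8), .nbBits = 5, .length = 2 };
     * single-symbol entry for 'c' costing 7 bits:
     *   HUF_DEltX4 e = { .sequence = 'c', .nbBits = 7, .length = 1 };       */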
+#define HUF_DECODE_SYMBOLX4_0(ptr, DStreamPtr) \
+ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog)
+#define HUF_DECODE_SYMBOLX4_1(ptr, DStreamPtr) \
+if (MEM_64bits() || (HUF_TABLELOG_MAX<=12)) \
+ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog)
+#define HUF_DECODE_SYMBOLX4_2(ptr, DStreamPtr) \
+if (MEM_64bits()) \
+ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog)
+HINT_INLINE size_t
+HUF_decodeStreamX4(BYTE* p, BIT_DStream_t* bitDPtr, BYTE* const pEnd,
+const HUF_DEltX4* const dt, const U32 dtLog)
+{
+BYTE* const pStart = p;
+/* up to 8 symbols at a time */
+while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p < pEnd-(sizeof(bitDPtr->bitContainer)-1))) {
+HUF_DECODE_SYMBOLX4_2(p, bitDPtr);
+HUF_DECODE_SYMBOLX4_1(p, bitDPtr);
+HUF_DECODE_SYMBOLX4_2(p, bitDPtr);
+HUF_DECODE_SYMBOLX4_0(p, bitDPtr);
+}
+/* closer to end : up to 2 symbols at a time */
+while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p <= pEnd-2))
+HUF_DECODE_SYMBOLX4_0(p, bitDPtr);
+while (p <= pEnd-2)
+HUF_DECODE_SYMBOLX4_0(p, bitDPtr); /* no need to reload : reached the end of DStream */
+if (p < pEnd)
+p += HUF_decodeLastSymbolX4(p, bitDPtr, dt, dtLog);
+return p-pStart;
+}
+FORCE_INLINE_TEMPLATE size_t
+HUF_decompress1X4_usingDTable_internal_body(
+void* dst, size_t dstSize,
+const void* cSrc, size_t cSrcSize,
+const HUF_DTable* DTable)
+{
+BIT_DStream_t bitD;
+/* Init */
+CHECK_F( BIT_initDStream(&bitD, cSrc, cSrcSize) );
+/* decode */
+{ BYTE* const ostart = (BYTE*) dst;
+BYTE* const oend = ostart + dstSize;
+const void* const dtPtr = DTable+1; /* force compiler to not use strict-aliasing */
+const HUF_DEltX4* const dt = (const HUF_DEltX4*)dtPtr;
+DTableDesc const dtd = HUF_getDTableDesc(DTable);
+HUF_decodeStreamX4(ostart, &bitD, oend, dt, dtd.tableLog);
+}
+/* check */
+if (!BIT_endOfDStream(&bitD)) return ERROR(corruption_detected);
+/* decoded size */
+return dstSize;
+}
+FORCE_INLINE_TEMPLATE size_t
+HUF_decompress4X4_usingDTable_internal_body(
+void* dst, size_t dstSize,
+const void* cSrc, size_t cSrcSize,
+const HUF_DTable* DTable)
+{
+if (cSrcSize < 10) return ERROR(corruption_detected); /* strict minimum : jump table + 1 byte per stream */
+{ const BYTE* const istart = (const BYTE*) cSrc;
+BYTE* const ostart = (BYTE*) dst;
+BYTE* const oend = ostart + dstSize;
+const void* const dtPtr = DTable+1;
+const HUF_DEltX4* const dt = (const HUF_DEltX4*)dtPtr;
+/* Init */
+BIT_DStream_t bitD1;
+BIT_DStream_t bitD2;
+BIT_DStream_t bitD3;
+BIT_DStream_t bitD4;
+size_t const length1 = MEM_readLE16(istart);
+size_t const length2 = MEM_readLE16(istart+2);
+size_t const length3 = MEM_readLE16(istart+4);
+size_t const length4 = cSrcSize - (length1 + length2 + length3 + 6);
+const BYTE* const istart1 = istart + 6; /* jumpTable */
+const BYTE* const istart2 = istart1 + length1;
+const BYTE* const istart3 = istart2 + length2;
+const BYTE* const istart4 = istart3 + length3;
+size_t const segmentSize = (dstSize+3) / 4;
+BYTE* const opStart2 = ostart + segmentSize;
+BYTE* const opStart3 = opStart2 + segmentSize;
+BYTE* const opStart4 = opStart3 + segmentSize;
+BYTE* op1 = ostart;
+BYTE* op2 = opStart2;
+BYTE* op3 = opStart3;
+BYTE* op4 = opStart4;
+U32 endSignal;
+DTableDesc const dtd = HUF_getDTableDesc(DTable);
+U32 const dtLog = dtd.tableLog;
+if (length4 > cSrcSize) return ERROR(corruption_detected); /* overflow */
+CHECK_F( BIT_initDStream(&bitD1, istart1, length1) );
+CHECK_F( BIT_initDStream(&bitD2, istart2, length2) );
+CHECK_F( BIT_initDStream(&bitD3, istart3, length3) );
+CHECK_F( BIT_initDStream(&bitD4, istart4, length4) );
+/* 16-32 symbols per loop (4-8 symbols per stream) */
+endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4);
+for ( ; (endSignal==BIT_DStream_unfinished) & (op4<(oend-(sizeof(bitD4.bitContainer)-1))) ; ) {
+HUF_DECODE_SYMBOLX4_2(op1, &bitD1);
+HUF_DECODE_SYMBOLX4_2(op2, &bitD2);
+HUF_DECODE_SYMBOLX4_2(op3, &bitD3);
+HUF_DECODE_SYMBOLX4_2(op4, &bitD4);
+HUF_DECODE_SYMBOLX4_1(op1, &bitD1);
+HUF_DECODE_SYMBOLX4_1(op2, &bitD2);
+HUF_DECODE_SYMBOLX4_1(op3, &bitD3);
+HUF_DECODE_SYMBOLX4_1(op4, &bitD4);
+HUF_DECODE_SYMBOLX4_2(op1, &bitD1);
+HUF_DECODE_SYMBOLX4_2(op2, &bitD2);
+HUF_DECODE_SYMBOLX4_2(op3, &bitD3);
+HUF_DECODE_SYMBOLX4_2(op4, &bitD4);
+HUF_DECODE_SYMBOLX4_0(op1, &bitD1);
+HUF_DECODE_SYMBOLX4_0(op2, &bitD2);
+HUF_DECODE_SYMBOLX4_0(op3, &bitD3);
+HUF_DECODE_SYMBOLX4_0(op4, &bitD4);
+endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4);
+}
+/* check corruption */
+if (op1 > opStart2) return ERROR(corruption_detected);
+if (op2 > opStart3) return ERROR(corruption_detected);
+if (op3 > opStart4) return ERROR(corruption_detected);
+/* note : op4 already verified within main loop */
+/* finish bitStreams one by one */
+HUF_decodeStreamX4(op1, &bitD1, opStart2, dt, dtLog);
+HUF_decodeStreamX4(op2, &bitD2, opStart3, dt, dtLog);
+HUF_decodeStreamX4(op3, &bitD3, opStart4, dt, dtLog);
+HUF_decodeStreamX4(op4, &bitD4, oend, dt, dtLog);
+/* check */
+{ U32 const endCheck = BIT_endOfDStream(&bitD1) & BIT_endOfDStream(&bitD2) & BIT_endOfDStream(&bitD3) & BIT_endOfDStream(&bitD4);
+if (!endCheck) return ERROR(corruption_detected); }
+/* decoded size */
+return dstSize;
+}
+}
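
The 4-stream framing consumed by HUF_decompress4X4_usingDTable_internal_body (and its X2 sibling earlier) is fully visible above; laid out:

    /*  compressed input:
     *  +------+------+------+----------+----------+----------+----------+
     *  | len1 | len2 | len3 | stream 1 | stream 2 | stream 3 | stream 4 |
     *  | LE16 | LE16 | LE16 | len1 B   | len2 B   | len3 B   | the rest |
     *  +------+------+------+----------+----------+----------+----------+
     *  length4 = cSrcSize - (length1+length2+length3+6), checked for overflow;
     *  each stream decodes one quarter of the output, segmentSize = (dstSize+3)/4. */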
+typedef size_t (*HUF_decompress_usingDTable_t)(void *dst, size_t dstSize,
+const void *cSrc,
+size_t cSrcSize,
+const HUF_DTable *DTable);
+#if DYNAMIC_BMI2
+#define X(fn) \
+\
+static size_t fn##_default( \
+void* dst, size_t dstSize, \
+const void* cSrc, size_t cSrcSize, \
+const HUF_DTable* DTable) \
+{ \
+return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable); \
+} \
+\
+static TARGET_ATTRIBUTE("bmi2") size_t fn##_bmi2( \
+void* dst, size_t dstSize, \
+const void* cSrc, size_t cSrcSize, \
+const HUF_DTable* DTable) \
+{ \
+return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable); \
+} \
+\
+static size_t fn(void* dst, size_t dstSize, void const* cSrc, \
+size_t cSrcSize, HUF_DTable const* DTable, int bmi2) \
+{ \
+if (bmi2) { \
+return fn##_bmi2(dst, dstSize, cSrc, cSrcSize, DTable); \
+} \
+return fn##_default(dst, dstSize, cSrc, cSrcSize, DTable); \
+}
+#else
+#define X(fn) \
+static size_t fn(void* dst, size_t dstSize, void const* cSrc, \
+size_t cSrcSize, HUF_DTable const* DTable, int bmi2) \
+{ \
+(void)bmi2; \
+return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable); \
+}
+#endif
+X(HUF_decompress1X2_usingDTable_internal)
+X(HUF_decompress4X2_usingDTable_internal)
+X(HUF_decompress1X4_usingDTable_internal)
+X(HUF_decompress4X4_usingDTable_internal)
+#undef X
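
The X(fn) macro above stamps out, for each force-inlined `_body` template, a portable `_default` instantiation, a TARGET_ATTRIBUTE("bmi2") instantiation compiled with BMI2 codegen, and a small dispatcher taking the runtime `bmi2` flag. The same pattern reduced to essentials, outside HUF (names hypothetical, GCC/Clang attribute syntax):

    static size_t work_body(const void* src, size_t n) { (void)src; return n; }

    static size_t work_default(const void* src, size_t n)
    { return work_body(src, n); }                     /* baseline codegen */

    __attribute__((target("bmi2")))
    static size_t work_bmi2(const void* src, size_t n)
    { return work_body(src, n); }                     /* BMI2-enabled codegen */

    static size_t work(const void* src, size_t n, int bmi2)
    { return bmi2 ? work_bmi2(src, n) : work_default(src, n); }

Because `_body` is force-inlined into both instantiations, the compiler emits two specialized copies of the hot loop from a single source of truth.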
+size_t HUF_decompress1X2_usingDTable(
+void* dst, size_t dstSize,
+const void* cSrc, size_t cSrcSize,
+const HUF_DTable* DTable)
+{
+DTableDesc dtd = HUF_getDTableDesc(DTable);
+if (dtd.tableType != 0) return ERROR(GENERIC);
+return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
+}
+size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dstSize,
+const void* cSrc, size_t cSrcSize,
+void* workSpace, size_t wkspSize)
+{
+const BYTE* ip = (const BYTE*) cSrc;
+size_t const hSize = HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, workSpace, wkspSize);
+if (HUF_isError(hSize)) return hSize;
+if (hSize >= cSrcSize) return ERROR(srcSize_wrong);
+ip += hSize; cSrcSize -= hSize;
+return HUF_decompress1X2_usingDTable_internal(dst, dstSize, ip, cSrcSize, DCtx, /* bmi2 */ 0);
+}
+size_t HUF_decompress1X2_DCtx(HUF_DTable* DCtx, void* dst, size_t dstSize,
+const void* cSrc, size_t cSrcSize)
+{
+U32 workSpace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32];
+return HUF_decompress1X2_DCtx_wksp(DCtx, dst, dstSize, cSrc, cSrcSize,
+workSpace, sizeof(workSpace));
+}
+size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize)
+{
+HUF_CREATE_STATIC_DTABLEX2(DTable, HUF_TABLELOG_MAX);
+return HUF_decompress1X2_DCtx (DTable, dst, dstSize, cSrc, cSrcSize);
+}
size_t HUF_decompress4X2_usingDTable( size_t HUF_decompress4X2_usingDTable(
void* dst, size_t dstSize, void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize, const void* cSrc, size_t cSrcSize,
@ -351,13 +586,12 @@ size_t HUF_decompress4X2_usingDTable(
{ {
DTableDesc dtd = HUF_getDTableDesc(DTable); DTableDesc dtd = HUF_getDTableDesc(DTable);
if (dtd.tableType != 0) return ERROR(GENERIC); if (dtd.tableType != 0) return ERROR(GENERIC);
return HUF_decompress4X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); return HUF_decompress4X2_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
} }
static size_t HUF_decompress4X2_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize,
size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize, const void* cSrc, size_t cSrcSize,
void* workSpace, size_t wkspSize) void* workSpace, size_t wkspSize, int bmi2)
{ {
const BYTE* ip = (const BYTE*) cSrc; const BYTE* ip = (const BYTE*) cSrc;
@ -367,7 +601,14 @@ size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
if (hSize >= cSrcSize) return ERROR(srcSize_wrong); if (hSize >= cSrcSize) return ERROR(srcSize_wrong);
ip += hSize; cSrcSize -= hSize; ip += hSize; cSrcSize -= hSize;
return HUF_decompress4X2_usingDTable_internal (dst, dstSize, ip, cSrcSize, dctx); return HUF_decompress4X2_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx, bmi2);
}
size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize,
void* workSpace, size_t wkspSize)
{
return HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, 0);
} }
@ -387,8 +628,6 @@ size_t HUF_decompress4X2 (void* dst, size_t dstSize, const void* cSrc, size_t cS
/* *************************/ /* *************************/
/* double-symbols decoding */ /* double-symbols decoding */
/* *************************/ /* *************************/
typedef struct { U16 sequence; BYTE nbBits; BYTE length; } HUF_DEltX4; /* double-symbols decoding */
typedef struct { BYTE symbol; BYTE weight; } sortedSymbol_t; typedef struct { BYTE symbol; BYTE weight; } sortedSymbol_t;
/* HUF_fillDTableX4Level2() : /* HUF_fillDTableX4Level2() :
@ -508,10 +747,7 @@ size_t HUF_readDTableX4_wksp(HUF_DTable* DTable, const void* src,
weightList = (BYTE *)((U32 *)workSpace + spaceUsed32); weightList = (BYTE *)((U32 *)workSpace + spaceUsed32);
spaceUsed32 += HUF_ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2; spaceUsed32 += HUF_ALIGN(HUF_SYMBOLVALUE_MAX + 1, sizeof(U32)) >> 2;
if ((spaceUsed32 << 2) > wkspSize) if ((spaceUsed32 << 2) > wkspSize) return ERROR(tableLog_tooLarge);
return ERROR(tableLog_tooLarge);
workSpace = (U32 *)workSpace + spaceUsed32;
wkspSize -= (spaceUsed32 << 2);
rankStart = rankStart0 + 1; rankStart = rankStart0 + 1;
memset(rankStats, 0, sizeof(U32) * (2 * HUF_TABLELOG_MAX + 2 + 1)); memset(rankStats, 0, sizeof(U32) * (2 * HUF_TABLELOG_MAX + 2 + 1));
@ -588,95 +824,6 @@ size_t HUF_readDTableX4(HUF_DTable* DTable, const void* src, size_t srcSize)
workSpace, sizeof(workSpace)); workSpace, sizeof(workSpace));
} }
static U32 HUF_decodeSymbolX4(void* op, BIT_DStream_t* DStream, const HUF_DEltX4* dt, const U32 dtLog)
{
size_t const val = BIT_lookBitsFast(DStream, dtLog); /* note : dtLog >= 1 */
memcpy(op, dt+val, 2);
BIT_skipBits(DStream, dt[val].nbBits);
return dt[val].length;
}
static U32 HUF_decodeLastSymbolX4(void* op, BIT_DStream_t* DStream, const HUF_DEltX4* dt, const U32 dtLog)
{
size_t const val = BIT_lookBitsFast(DStream, dtLog); /* note : dtLog >= 1 */
memcpy(op, dt+val, 1);
if (dt[val].length==1) BIT_skipBits(DStream, dt[val].nbBits);
else {
if (DStream->bitsConsumed < (sizeof(DStream->bitContainer)*8)) {
BIT_skipBits(DStream, dt[val].nbBits);
if (DStream->bitsConsumed > (sizeof(DStream->bitContainer)*8))
/* ugly hack; works only because it's the last symbol. Note : can't easily extract nbBits from just this symbol */
DStream->bitsConsumed = (sizeof(DStream->bitContainer)*8);
} }
return 1;
}
#define HUF_DECODE_SYMBOLX4_0(ptr, DStreamPtr) \
ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog)
#define HUF_DECODE_SYMBOLX4_1(ptr, DStreamPtr) \
if (MEM_64bits() || (HUF_TABLELOG_MAX<=12)) \
ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog)
#define HUF_DECODE_SYMBOLX4_2(ptr, DStreamPtr) \
if (MEM_64bits()) \
ptr += HUF_decodeSymbolX4(ptr, DStreamPtr, dt, dtLog)
HINT_INLINE size_t HUF_decodeStreamX4(BYTE* p, BIT_DStream_t* bitDPtr, BYTE* const pEnd, const HUF_DEltX4* const dt, const U32 dtLog)
{
BYTE* const pStart = p;
/* up to 8 symbols at a time */
while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p < pEnd-(sizeof(bitDPtr->bitContainer)-1))) {
HUF_DECODE_SYMBOLX4_2(p, bitDPtr);
HUF_DECODE_SYMBOLX4_1(p, bitDPtr);
HUF_DECODE_SYMBOLX4_2(p, bitDPtr);
HUF_DECODE_SYMBOLX4_0(p, bitDPtr);
}
/* closer to end : up to 2 symbols at a time */
while ((BIT_reloadDStream(bitDPtr) == BIT_DStream_unfinished) & (p <= pEnd-2))
HUF_DECODE_SYMBOLX4_0(p, bitDPtr);
while (p <= pEnd-2)
HUF_DECODE_SYMBOLX4_0(p, bitDPtr); /* no need to reload : reached the end of DStream */
if (p < pEnd)
p += HUF_decodeLastSymbolX4(p, bitDPtr, dt, dtLog);
return p-pStart;
}
static size_t HUF_decompress1X4_usingDTable_internal(
void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize,
const HUF_DTable* DTable)
{
BIT_DStream_t bitD;
/* Init */
{ size_t const errorCode = BIT_initDStream(&bitD, cSrc, cSrcSize);
if (HUF_isError(errorCode)) return errorCode;
}
/* decode */
{ BYTE* const ostart = (BYTE*) dst;
BYTE* const oend = ostart + dstSize;
const void* const dtPtr = DTable+1; /* force compiler to not use strict-aliasing */
const HUF_DEltX4* const dt = (const HUF_DEltX4*)dtPtr;
DTableDesc const dtd = HUF_getDTableDesc(DTable);
HUF_decodeStreamX4(ostart, &bitD, oend, dt, dtd.tableLog);
}
/* check */
if (!BIT_endOfDStream(&bitD)) return ERROR(corruption_detected);
/* decoded size */
return dstSize;
}
size_t HUF_decompress1X4_usingDTable( size_t HUF_decompress1X4_usingDTable(
void* dst, size_t dstSize, void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize, const void* cSrc, size_t cSrcSize,
@ -684,7 +831,7 @@ size_t HUF_decompress1X4_usingDTable(
{ {
DTableDesc dtd = HUF_getDTableDesc(DTable); DTableDesc dtd = HUF_getDTableDesc(DTable);
if (dtd.tableType != 1) return ERROR(GENERIC); if (dtd.tableType != 1) return ERROR(GENERIC);
return HUF_decompress1X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); return HUF_decompress1X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
} }
size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dstSize, size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dstSize,
@ -699,7 +846,7 @@ size_t HUF_decompress1X4_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dstSize,
if (hSize >= cSrcSize) return ERROR(srcSize_wrong); if (hSize >= cSrcSize) return ERROR(srcSize_wrong);
ip += hSize; cSrcSize -= hSize; ip += hSize; cSrcSize -= hSize;
return HUF_decompress1X4_usingDTable_internal (dst, dstSize, ip, cSrcSize, DCtx); return HUF_decompress1X4_usingDTable_internal(dst, dstSize, ip, cSrcSize, DCtx, /* bmi2 */ 0);
} }
@ -717,99 +864,6 @@ size_t HUF_decompress1X4 (void* dst, size_t dstSize, const void* cSrc, size_t cS
return HUF_decompress1X4_DCtx(DTable, dst, dstSize, cSrc, cSrcSize); return HUF_decompress1X4_DCtx(DTable, dst, dstSize, cSrc, cSrcSize);
} }
static size_t HUF_decompress4X4_usingDTable_internal(
void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize,
const HUF_DTable* DTable)
{
if (cSrcSize < 10) return ERROR(corruption_detected); /* strict minimum : jump table + 1 byte per stream */
{ const BYTE* const istart = (const BYTE*) cSrc;
BYTE* const ostart = (BYTE*) dst;
BYTE* const oend = ostart + dstSize;
const void* const dtPtr = DTable+1;
const HUF_DEltX4* const dt = (const HUF_DEltX4*)dtPtr;
/* Init */
BIT_DStream_t bitD1;
BIT_DStream_t bitD2;
BIT_DStream_t bitD3;
BIT_DStream_t bitD4;
size_t const length1 = MEM_readLE16(istart);
size_t const length2 = MEM_readLE16(istart+2);
size_t const length3 = MEM_readLE16(istart+4);
size_t const length4 = cSrcSize - (length1 + length2 + length3 + 6);
const BYTE* const istart1 = istart + 6; /* jumpTable */
const BYTE* const istart2 = istart1 + length1;
const BYTE* const istart3 = istart2 + length2;
const BYTE* const istart4 = istart3 + length3;
size_t const segmentSize = (dstSize+3) / 4;
BYTE* const opStart2 = ostart + segmentSize;
BYTE* const opStart3 = opStart2 + segmentSize;
BYTE* const opStart4 = opStart3 + segmentSize;
BYTE* op1 = ostart;
BYTE* op2 = opStart2;
BYTE* op3 = opStart3;
BYTE* op4 = opStart4;
U32 endSignal;
DTableDesc const dtd = HUF_getDTableDesc(DTable);
U32 const dtLog = dtd.tableLog;
if (length4 > cSrcSize) return ERROR(corruption_detected); /* overflow */
{ size_t const errorCode = BIT_initDStream(&bitD1, istart1, length1);
if (HUF_isError(errorCode)) return errorCode; }
{ size_t const errorCode = BIT_initDStream(&bitD2, istart2, length2);
if (HUF_isError(errorCode)) return errorCode; }
{ size_t const errorCode = BIT_initDStream(&bitD3, istart3, length3);
if (HUF_isError(errorCode)) return errorCode; }
{ size_t const errorCode = BIT_initDStream(&bitD4, istart4, length4);
if (HUF_isError(errorCode)) return errorCode; }
/* 16-32 symbols per loop (4-8 symbols per stream) */
endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4);
for ( ; (endSignal==BIT_DStream_unfinished) & (op4<(oend-(sizeof(bitD4.bitContainer)-1))) ; ) {
HUF_DECODE_SYMBOLX4_2(op1, &bitD1);
HUF_DECODE_SYMBOLX4_2(op2, &bitD2);
HUF_DECODE_SYMBOLX4_2(op3, &bitD3);
HUF_DECODE_SYMBOLX4_2(op4, &bitD4);
HUF_DECODE_SYMBOLX4_1(op1, &bitD1);
HUF_DECODE_SYMBOLX4_1(op2, &bitD2);
HUF_DECODE_SYMBOLX4_1(op3, &bitD3);
HUF_DECODE_SYMBOLX4_1(op4, &bitD4);
HUF_DECODE_SYMBOLX4_2(op1, &bitD1);
HUF_DECODE_SYMBOLX4_2(op2, &bitD2);
HUF_DECODE_SYMBOLX4_2(op3, &bitD3);
HUF_DECODE_SYMBOLX4_2(op4, &bitD4);
HUF_DECODE_SYMBOLX4_0(op1, &bitD1);
HUF_DECODE_SYMBOLX4_0(op2, &bitD2);
HUF_DECODE_SYMBOLX4_0(op3, &bitD3);
HUF_DECODE_SYMBOLX4_0(op4, &bitD4);
endSignal = BIT_reloadDStream(&bitD1) | BIT_reloadDStream(&bitD2) | BIT_reloadDStream(&bitD3) | BIT_reloadDStream(&bitD4);
}
/* check corruption */
if (op1 > opStart2) return ERROR(corruption_detected);
if (op2 > opStart3) return ERROR(corruption_detected);
if (op3 > opStart4) return ERROR(corruption_detected);
/* note : op4 already verified within main loop */
/* finish bitStreams one by one */
HUF_decodeStreamX4(op1, &bitD1, opStart2, dt, dtLog);
HUF_decodeStreamX4(op2, &bitD2, opStart3, dt, dtLog);
HUF_decodeStreamX4(op3, &bitD3, opStart4, dt, dtLog);
HUF_decodeStreamX4(op4, &bitD4, oend, dt, dtLog);
/* check */
{ U32 const endCheck = BIT_endOfDStream(&bitD1) & BIT_endOfDStream(&bitD2) & BIT_endOfDStream(&bitD3) & BIT_endOfDStream(&bitD4);
if (!endCheck) return ERROR(corruption_detected); }
/* decoded size */
return dstSize;
}
}
size_t HUF_decompress4X4_usingDTable( size_t HUF_decompress4X4_usingDTable(
void* dst, size_t dstSize, void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize, const void* cSrc, size_t cSrcSize,
@ -817,13 +871,12 @@ size_t HUF_decompress4X4_usingDTable(
{ {
DTableDesc dtd = HUF_getDTableDesc(DTable); DTableDesc dtd = HUF_getDTableDesc(DTable);
if (dtd.tableType != 1) return ERROR(GENERIC); if (dtd.tableType != 1) return ERROR(GENERIC);
return HUF_decompress4X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable); return HUF_decompress4X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
} }
static size_t HUF_decompress4X4_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize,
size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize, const void* cSrc, size_t cSrcSize,
void* workSpace, size_t wkspSize) void* workSpace, size_t wkspSize, int bmi2)
{ {
const BYTE* ip = (const BYTE*) cSrc; const BYTE* ip = (const BYTE*) cSrc;
@@ -833,7 +886,14 @@ size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
if (hSize >= cSrcSize) return ERROR(srcSize_wrong); if (hSize >= cSrcSize) return ERROR(srcSize_wrong);
ip += hSize; cSrcSize -= hSize; ip += hSize; cSrcSize -= hSize;
return HUF_decompress4X4_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx); return HUF_decompress4X4_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx, bmi2);
}
size_t HUF_decompress4X4_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
const void* cSrc, size_t cSrcSize,
void* workSpace, size_t wkspSize)
{
return HUF_decompress4X4_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, /* bmi2 */ 0);
} }
@@ -861,8 +921,8 @@ size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize,
const HUF_DTable* DTable) const HUF_DTable* DTable)
{ {
DTableDesc const dtd = HUF_getDTableDesc(DTable); DTableDesc const dtd = HUF_getDTableDesc(DTable);
return dtd.tableType ? HUF_decompress1X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable) : return dtd.tableType ? HUF_decompress1X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) :
HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable); HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
} }
size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize,
@@ -870,8 +930,8 @@ size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize,
const HUF_DTable* DTable) const HUF_DTable* DTable)
{ {
DTableDesc const dtd = HUF_getDTableDesc(DTable); DTableDesc const dtd = HUF_getDTableDesc(DTable);
return dtd.tableType ? HUF_decompress4X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable) : return dtd.tableType ? HUF_decompress4X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) :
HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable); HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0);
} }
@@ -994,3 +1054,42 @@ size_t HUF_decompress1X_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize,
return HUF_decompress1X_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize, return HUF_decompress1X_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcSize,
workSpace, sizeof(workSpace)); workSpace, sizeof(workSpace));
} }
size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2)
{
DTableDesc const dtd = HUF_getDTableDesc(DTable);
return dtd.tableType ? HUF_decompress1X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2) :
HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
}
size_t HUF_decompress1X2_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2)
{
const BYTE* ip = (const BYTE*) cSrc;
size_t const hSize = HUF_readDTableX2_wksp(dctx, cSrc, cSrcSize, workSpace, wkspSize);
if (HUF_isError(hSize)) return hSize;
if (hSize >= cSrcSize) return ERROR(srcSize_wrong);
ip += hSize; cSrcSize -= hSize;
return HUF_decompress1X2_usingDTable_internal(dst, dstSize, ip, cSrcSize, dctx, bmi2);
}
size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2)
{
DTableDesc const dtd = HUF_getDTableDesc(DTable);
return dtd.tableType ? HUF_decompress4X4_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2) :
HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, cSrcSize, DTable, bmi2);
}
size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2)
{
/* validation checks */
if (dstSize == 0) return ERROR(dstSize_tooSmall);
if (cSrcSize == 0) return ERROR(corruption_detected);
{ U32 const algoNb = HUF_selectDecoder(dstSize, cSrcSize);
return algoNb ? HUF_decompress4X4_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, bmi2) :
HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrcSize, workSpace, wkspSize, bmi2);
}
}
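
All of the new `_bmi2` entry points follow the same contract: the caller detects CPU support once and threads the same flag through every call, exactly as ZSTD_DCtx does internally. A hedged usage fragment (the buffer and workspace variables are placeholders supplied by the caller):

int const bmi2 = ZSTD_cpuid_bmi2(ZSTD_cpuid());   /* detect once, from lib/common/cpu.h */
size_t const dSize = HUF_decompress4X_hufOnly_wksp_bmi2(dctx, dst, dstCapacity,
                                                        cSrc, cSrcSize,
                                                        workSpace, wkspSize, bmi2);
if (HUF_isError(dSize)) { /* corruption or sizing error */ }
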


@@ -43,6 +43,7 @@
* Dependencies * Dependencies
*********************************************************/ *********************************************************/
#include <string.h> /* memcpy, memmove, memset */ #include <string.h> /* memcpy, memmove, memset */
#include "cpu.h"
#include "mem.h" /* low level memory routines */ #include "mem.h" /* low level memory routines */
#define FSE_STATIC_LINKING_ONLY #define FSE_STATIC_LINKING_ONLY
#include "fse.h" #include "fse.h"
@@ -80,10 +81,25 @@ typedef enum { ZSTDds_getFrameHeaderSize, ZSTDds_decodeFrameHeader,
typedef enum { zdss_init=0, zdss_loadHeader, typedef enum { zdss_init=0, zdss_loadHeader,
zdss_read, zdss_load, zdss_flush } ZSTD_dStreamStage; zdss_read, zdss_load, zdss_flush } ZSTD_dStreamStage;
typedef struct { typedef struct {
FSE_DTable LLTable[FSE_DTABLE_SIZE_U32(LLFSELog)]; U32 fastMode;
FSE_DTable OFTable[FSE_DTABLE_SIZE_U32(OffFSELog)]; U32 tableLog;
FSE_DTable MLTable[FSE_DTABLE_SIZE_U32(MLFSELog)]; } ZSTD_seqSymbol_header;
typedef struct {
U16 nextState;
BYTE nbAdditionalBits;
BYTE nbBits;
U32 baseValue;
} ZSTD_seqSymbol;
#define SEQSYMBOL_TABLE_SIZE(log) (1 + (1<<log))
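
Under natural alignment each ZSTD_seqSymbol packs into 8 bytes (U16 + 2×BYTE + U32), so with LLFSELog = 9 the literal-length table below costs (1 + (1<<9)) * 8 = 4104 bytes, cell 0 doubling as the ZSTD_seqSymbol_header. A hypothetical compile-time probe of that assumption (not part of the patch):

typedef char ZSTD_seqSymbol_is_8_bytes[(sizeof(ZSTD_seqSymbol) == 8) ? 1 : -1];  /* fails to compile if padding differs */
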
typedef struct {
ZSTD_seqSymbol LLTable[SEQSYMBOL_TABLE_SIZE(LLFSELog)];
ZSTD_seqSymbol OFTable[SEQSYMBOL_TABLE_SIZE(OffFSELog)];
ZSTD_seqSymbol MLTable[SEQSYMBOL_TABLE_SIZE(MLFSELog)];
HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)]; /* can accommodate HUF_decompress4X */ HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)]; /* can accommodate HUF_decompress4X */
U32 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32]; U32 workspace[HUF_DECOMPRESS_WORKSPACE_SIZE_U32];
U32 rep[ZSTD_REP_NUM]; U32 rep[ZSTD_REP_NUM];
@@ -91,9 +107,9 @@ typedef struct {
struct ZSTD_DCtx_s struct ZSTD_DCtx_s
{ {
const FSE_DTable* LLTptr; const ZSTD_seqSymbol* LLTptr;
const FSE_DTable* MLTptr; const ZSTD_seqSymbol* MLTptr;
const FSE_DTable* OFTptr; const ZSTD_seqSymbol* OFTptr;
const HUF_DTable* HUFptr; const HUF_DTable* HUFptr;
ZSTD_entropyDTables_t entropy; ZSTD_entropyDTables_t entropy;
const void* previousDstEnd; /* detect continuity */ const void* previousDstEnd; /* detect continuity */
@@ -116,6 +132,7 @@ struct ZSTD_DCtx_s
size_t litSize; size_t litSize;
size_t rleSize; size_t rleSize;
size_t staticSize; size_t staticSize;
int bmi2; /* == 1 if the CPU supports BMI2 and 0 otherwise. CPU support is determined dynamically once per context lifetime. */
/* streaming */ /* streaming */
ZSTD_DDict* ddictLocal; ZSTD_DDict* ddictLocal;
@@ -173,6 +190,7 @@ static void ZSTD_initDCtx_internal(ZSTD_DCtx* dctx)
dctx->inBuffSize = 0; dctx->inBuffSize = 0;
dctx->outBuffSize = 0; dctx->outBuffSize = 0;
dctx->streamStage = zdss_init; dctx->streamStage = zdss_init;
dctx->bmi2 = ZSTD_cpuid_bmi2(ZSTD_cpuid());
} }
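
ZSTD_cpuid_bmi2() comes from the new cpu.h helper included above; conceptually it queries CPUID leaf 7, sub-leaf 0, where BMI2 is EBX bit 8. A stand-alone sketch with the GCC/Clang builtin wrapper (not the library's actual implementation):

#include <cpuid.h>   /* GCC/Clang builtin CPUID wrapper */

static int has_bmi2(void)
{
    unsigned eax, ebx, ecx, edx;
    /* leaf 7, sub-leaf 0: structured extended feature flags */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return 0;
    return (ebx >> 8) & 1;   /* EBX bit 8 = BMI2 */
}
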
ZSTD_DCtx* ZSTD_initStaticDCtx(void *workspace, size_t workspaceSize) ZSTD_DCtx* ZSTD_initStaticDCtx(void *workspace, size_t workspaceSize)
@@ -571,13 +589,13 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
if (HUF_isError((litEncType==set_repeat) ? if (HUF_isError((litEncType==set_repeat) ?
( singleStream ? ( singleStream ?
HUF_decompress1X_usingDTable(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr) : HUF_decompress1X_usingDTable_bmi2(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr, dctx->bmi2) :
HUF_decompress4X_usingDTable(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr) ) : HUF_decompress4X_usingDTable_bmi2(dctx->litBuffer, litSize, istart+lhSize, litCSize, dctx->HUFptr, dctx->bmi2) ) :
( singleStream ? ( singleStream ?
HUF_decompress1X2_DCtx_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize, HUF_decompress1X2_DCtx_wksp_bmi2(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize,
dctx->entropy.workspace, sizeof(dctx->entropy.workspace)) : dctx->entropy.workspace, sizeof(dctx->entropy.workspace), dctx->bmi2) :
HUF_decompress4X_hufOnly_wksp(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize, HUF_decompress4X_hufOnly_wksp_bmi2(dctx->entropy.hufTable, dctx->litBuffer, litSize, istart+lhSize, litCSize,
dctx->entropy.workspace, sizeof(dctx->entropy.workspace))))) dctx->entropy.workspace, sizeof(dctx->entropy.workspace), dctx->bmi2))))
return ERROR(corruption_detected); return ERROR(corruption_detected);
dctx->litPtr = dctx->litBuffer; dctx->litPtr = dctx->litBuffer;
@@ -652,98 +670,219 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx,
} }
} }
/* Default FSE distribution tables.
typedef union { * These are pre-calculated FSE decoding tables, using the default distributions as defined in the specification :
FSE_decode_t realData; * https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md#default-distributions
FSE_DTable dtable; * They were generated programmatically with the following method :
U32 alignedBy4; * - start from default distributions, present in /lib/common/zstd_internal.h
} FSE_decode_t4; * - generate tables normally, using ZSTD_buildFSETable()
* - print out the content of the tables
* - prettify the output (reported below), and test with a fuzzer to ensure correctness */
/* Default FSE distribution table for Literal Lengths */ /* Default FSE distribution table for Literal Lengths */
static const FSE_decode_t4 LL_defaultDTable[(1<<LL_DEFAULTNORMLOG)+1] = { static const ZSTD_seqSymbol LL_defaultDTable[(1<<LL_DEFAULTNORMLOG)+1] = {
{ { LL_DEFAULTNORMLOG, 1, 1 } }, /* header : tableLog, fastMode, fastMode */ { 1, 1, 1, LL_DEFAULTNORMLOG}, /* header : fastMode, tableLog */
/* base, symbol, bits */ /* nextState, nbAddBits, nbBits, baseVal */
{ { 0, 0, 4 } }, { { 16, 0, 4 } }, { { 32, 1, 5 } }, { { 0, 3, 5 } }, { 0, 0, 4, 0}, { 16, 0, 4, 0},
{ { 0, 4, 5 } }, { { 0, 6, 5 } }, { { 0, 7, 5 } }, { { 0, 9, 5 } }, { 32, 0, 5, 1}, { 0, 0, 5, 3},
{ { 0, 10, 5 } }, { { 0, 12, 5 } }, { { 0, 14, 6 } }, { { 0, 16, 5 } }, { 0, 0, 5, 4}, { 0, 0, 5, 6},
{ { 0, 18, 5 } }, { { 0, 19, 5 } }, { { 0, 21, 5 } }, { { 0, 22, 5 } }, { 0, 0, 5, 7}, { 0, 0, 5, 9},
{ { 0, 24, 5 } }, { { 32, 25, 5 } }, { { 0, 26, 5 } }, { { 0, 27, 6 } }, { 0, 0, 5, 10}, { 0, 0, 5, 12},
{ { 0, 29, 6 } }, { { 0, 31, 6 } }, { { 32, 0, 4 } }, { { 0, 1, 4 } }, { 0, 0, 6, 14}, { 0, 1, 5, 16},
{ { 0, 2, 5 } }, { { 32, 4, 5 } }, { { 0, 5, 5 } }, { { 32, 7, 5 } }, { 0, 1, 5, 20}, { 0, 1, 5, 22},
{ { 0, 8, 5 } }, { { 32, 10, 5 } }, { { 0, 11, 5 } }, { { 0, 13, 6 } }, { 0, 2, 5, 28}, { 0, 3, 5, 32},
{ { 32, 16, 5 } }, { { 0, 17, 5 } }, { { 32, 19, 5 } }, { { 0, 20, 5 } }, { 0, 4, 5, 48}, { 32, 6, 5, 64},
{ { 32, 22, 5 } }, { { 0, 23, 5 } }, { { 0, 25, 4 } }, { { 16, 25, 4 } }, { 0, 7, 5, 128}, { 0, 8, 6, 256},
{ { 32, 26, 5 } }, { { 0, 28, 6 } }, { { 0, 30, 6 } }, { { 48, 0, 4 } }, { 0, 10, 6, 1024}, { 0, 12, 6, 4096},
{ { 16, 1, 4 } }, { { 32, 2, 5 } }, { { 32, 3, 5 } }, { { 32, 5, 5 } }, { 32, 0, 4, 0}, { 0, 0, 4, 1},
{ { 32, 6, 5 } }, { { 32, 8, 5 } }, { { 32, 9, 5 } }, { { 32, 11, 5 } }, { 0, 0, 5, 2}, { 32, 0, 5, 4},
{ { 32, 12, 5 } }, { { 0, 15, 6 } }, { { 32, 17, 5 } }, { { 32, 18, 5 } }, { 0, 0, 5, 5}, { 32, 0, 5, 7},
{ { 32, 20, 5 } }, { { 32, 21, 5 } }, { { 32, 23, 5 } }, { { 32, 24, 5 } }, { 0, 0, 5, 8}, { 32, 0, 5, 10},
{ { 0, 35, 6 } }, { { 0, 34, 6 } }, { { 0, 33, 6 } }, { { 0, 32, 6 } }, { 0, 0, 5, 11}, { 0, 0, 6, 13},
{ 32, 1, 5, 16}, { 0, 1, 5, 18},
{ 32, 1, 5, 22}, { 0, 2, 5, 24},
{ 32, 3, 5, 32}, { 0, 3, 5, 40},
{ 0, 6, 4, 64}, { 16, 6, 4, 64},
{ 32, 7, 5, 128}, { 0, 9, 6, 512},
{ 0, 11, 6, 2048}, { 48, 0, 4, 0},
{ 16, 0, 4, 1}, { 32, 0, 5, 2},
{ 32, 0, 5, 3}, { 32, 0, 5, 5},
{ 32, 0, 5, 6}, { 32, 0, 5, 8},
{ 32, 0, 5, 9}, { 32, 0, 5, 11},
{ 32, 0, 5, 12}, { 0, 0, 6, 15},
{ 32, 1, 5, 18}, { 32, 1, 5, 20},
{ 32, 2, 5, 24}, { 32, 2, 5, 28},
{ 32, 3, 5, 40}, { 32, 4, 5, 48},
{ 0, 16, 6,65536}, { 0, 15, 6,32768},
{ 0, 14, 6,16384}, { 0, 13, 6, 8192},
}; /* LL_defaultDTable */ }; /* LL_defaultDTable */
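
Following the generation recipe in the comment above, these tables can be reproduced by feeding the default distributions through the new builder (ZSTD_buildFSETable, defined further down in this file) and printing each cell. A hypothetical harness, assuming LL_defaultNorm/MaxLL from zstd_internal.h and <stdio.h> in scope:

static void printLLDefaultTable(void)
{
    ZSTD_seqSymbol dt[SEQSYMBOL_TABLE_SIZE(LL_DEFAULTNORMLOG)];
    U32 u;
    ZSTD_buildFSETable(dt, LL_defaultNorm, MaxLL, LL_base, LL_bits, LL_DEFAULTNORMLOG);
    for (u = 0; u < (1u << LL_DEFAULTNORMLOG); u++)
        printf("{ %3u, %2u, %u, %5u },\n",
               (unsigned)dt[u+1].nextState, (unsigned)dt[u+1].nbAdditionalBits,
               (unsigned)dt[u+1].nbBits, (unsigned)dt[u+1].baseValue);
}
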
/* Default FSE distribution table for Offset Codes */
static const ZSTD_seqSymbol OF_defaultDTable[(1<<OF_DEFAULTNORMLOG)+1] = {
{ 1, 1, 1, OF_DEFAULTNORMLOG}, /* header : fastMode, tableLog */
/* nextState, nbAddBits, nbBits, baseVal */
{ 0, 0, 5, 0}, { 0, 6, 4, 61},
{ 0, 9, 5, 509}, { 0, 15, 5,32765},
{ 0, 21, 5,2097149}, { 0, 3, 5, 5},
{ 0, 7, 4, 125}, { 0, 12, 5, 4093},
{ 0, 18, 5,262141}, { 0, 23, 5,8388605},
{ 0, 5, 5, 29}, { 0, 8, 4, 253},
{ 0, 14, 5,16381}, { 0, 20, 5,1048573},
{ 0, 2, 5, 1}, { 16, 7, 4, 125},
{ 0, 11, 5, 2045}, { 0, 17, 5,131069},
{ 0, 22, 5,4194301}, { 0, 4, 5, 13},
{ 16, 8, 4, 253}, { 0, 13, 5, 8189},
{ 0, 19, 5,524285}, { 0, 1, 5, 1},
{ 16, 6, 4, 61}, { 0, 10, 5, 1021},
{ 0, 16, 5,65533}, { 0, 28, 5,268435453},
{ 0, 27, 5,134217725}, { 0, 26, 5,67108861},
{ 0, 25, 5,33554429}, { 0, 24, 5,16777213},
}; /* OF_defaultDTable */
/* Default FSE distribution table for Match Lengths */ /* Default FSE distribution table for Match Lengths */
static const FSE_decode_t4 ML_defaultDTable[(1<<ML_DEFAULTNORMLOG)+1] = { static const ZSTD_seqSymbol ML_defaultDTable[(1<<ML_DEFAULTNORMLOG)+1] = {
{ { ML_DEFAULTNORMLOG, 1, 1 } }, /* header : tableLog, fastMode, fastMode */ { 1, 1, 1, ML_DEFAULTNORMLOG}, /* header : fastMode, tableLog */
/* base, symbol, bits */ /* nextState, nbAddBits, nbBits, baseVal */
{ { 0, 0, 6 } }, { { 0, 1, 4 } }, { { 32, 2, 5 } }, { { 0, 3, 5 } }, { 0, 0, 6, 3}, { 0, 0, 4, 4},
{ { 0, 5, 5 } }, { { 0, 6, 5 } }, { { 0, 8, 5 } }, { { 0, 10, 6 } }, { 32, 0, 5, 5}, { 0, 0, 5, 6},
{ { 0, 13, 6 } }, { { 0, 16, 6 } }, { { 0, 19, 6 } }, { { 0, 22, 6 } }, { 0, 0, 5, 8}, { 0, 0, 5, 9},
{ { 0, 25, 6 } }, { { 0, 28, 6 } }, { { 0, 31, 6 } }, { { 0, 33, 6 } }, { 0, 0, 5, 11}, { 0, 0, 6, 13},
{ { 0, 35, 6 } }, { { 0, 37, 6 } }, { { 0, 39, 6 } }, { { 0, 41, 6 } }, { 0, 0, 6, 16}, { 0, 0, 6, 19},
{ { 0, 43, 6 } }, { { 0, 45, 6 } }, { { 16, 1, 4 } }, { { 0, 2, 4 } }, { 0, 0, 6, 22}, { 0, 0, 6, 25},
{ { 32, 3, 5 } }, { { 0, 4, 5 } }, { { 32, 6, 5 } }, { { 0, 7, 5 } }, { 0, 0, 6, 28}, { 0, 0, 6, 31},
{ { 0, 9, 6 } }, { { 0, 12, 6 } }, { { 0, 15, 6 } }, { { 0, 18, 6 } }, { 0, 0, 6, 34}, { 0, 1, 6, 37},
{ { 0, 21, 6 } }, { { 0, 24, 6 } }, { { 0, 27, 6 } }, { { 0, 30, 6 } }, { 0, 1, 6, 41}, { 0, 2, 6, 47},
{ { 0, 32, 6 } }, { { 0, 34, 6 } }, { { 0, 36, 6 } }, { { 0, 38, 6 } }, { 0, 3, 6, 59}, { 0, 4, 6, 83},
{ { 0, 40, 6 } }, { { 0, 42, 6 } }, { { 0, 44, 6 } }, { { 32, 1, 4 } }, { 0, 7, 6, 131}, { 0, 9, 6, 515},
{ { 48, 1, 4 } }, { { 16, 2, 4 } }, { { 32, 4, 5 } }, { { 32, 5, 5 } }, { 16, 0, 4, 4}, { 0, 0, 4, 5},
{ { 32, 7, 5 } }, { { 32, 8, 5 } }, { { 0, 11, 6 } }, { { 0, 14, 6 } }, { 32, 0, 5, 6}, { 0, 0, 5, 7},
{ { 0, 17, 6 } }, { { 0, 20, 6 } }, { { 0, 23, 6 } }, { { 0, 26, 6 } }, { 32, 0, 5, 9}, { 0, 0, 5, 10},
{ { 0, 29, 6 } }, { { 0, 52, 6 } }, { { 0, 51, 6 } }, { { 0, 50, 6 } }, { 0, 0, 6, 12}, { 0, 0, 6, 15},
{ { 0, 49, 6 } }, { { 0, 48, 6 } }, { { 0, 47, 6 } }, { { 0, 46, 6 } }, { 0, 0, 6, 18}, { 0, 0, 6, 21},
{ 0, 0, 6, 24}, { 0, 0, 6, 27},
{ 0, 0, 6, 30}, { 0, 0, 6, 33},
{ 0, 1, 6, 35}, { 0, 1, 6, 39},
{ 0, 2, 6, 43}, { 0, 3, 6, 51},
{ 0, 4, 6, 67}, { 0, 5, 6, 99},
{ 0, 8, 6, 259}, { 32, 0, 4, 4},
{ 48, 0, 4, 4}, { 16, 0, 4, 5},
{ 32, 0, 5, 7}, { 32, 0, 5, 8},
{ 32, 0, 5, 10}, { 32, 0, 5, 11},
{ 0, 0, 6, 14}, { 0, 0, 6, 17},
{ 0, 0, 6, 20}, { 0, 0, 6, 23},
{ 0, 0, 6, 26}, { 0, 0, 6, 29},
{ 0, 0, 6, 32}, { 0, 16, 6,65539},
{ 0, 15, 6,32771}, { 0, 14, 6,16387},
{ 0, 13, 6, 8195}, { 0, 12, 6, 4099},
{ 0, 11, 6, 2051}, { 0, 10, 6, 1027},
}; /* ML_defaultDTable */ }; /* ML_defaultDTable */
/* Default FSE distribution table for Offset Codes */
static const FSE_decode_t4 OF_defaultDTable[(1<<OF_DEFAULTNORMLOG)+1] = { static void ZSTD_buildSeqTable_rle(ZSTD_seqSymbol* dt, U32 baseValue, U32 nbAddBits)
{ { OF_DEFAULTNORMLOG, 1, 1 } }, /* header : tableLog, fastMode, fastMode */ {
/* base, symbol, bits */ void* ptr = dt;
{ { 0, 0, 5 } }, { { 0, 6, 4 } }, ZSTD_seqSymbol_header* const DTableH = (ZSTD_seqSymbol_header*)ptr;
{ { 0, 9, 5 } }, { { 0, 15, 5 } }, ZSTD_seqSymbol* const cell = dt + 1;
{ { 0, 21, 5 } }, { { 0, 3, 5 } },
{ { 0, 7, 4 } }, { { 0, 12, 5 } }, DTableH->tableLog = 0;
{ { 0, 18, 5 } }, { { 0, 23, 5 } }, DTableH->fastMode = 0;
{ { 0, 5, 5 } }, { { 0, 8, 4 } },
{ { 0, 14, 5 } }, { { 0, 20, 5 } }, cell->nbBits = 0;
{ { 0, 2, 5 } }, { { 16, 7, 4 } }, cell->nextState = 0;
{ { 0, 11, 5 } }, { { 0, 17, 5 } }, assert(nbAddBits < 255);
{ { 0, 22, 5 } }, { { 0, 4, 5 } }, cell->nbAdditionalBits = (BYTE)nbAddBits;
{ { 16, 8, 4 } }, { { 0, 13, 5 } }, cell->baseValue = baseValue;
{ { 0, 19, 5 } }, { { 0, 1, 5 } }, }
{ { 16, 6, 4 } }, { { 0, 10, 5 } },
{ { 0, 16, 5 } }, { { 0, 28, 5 } },
{ { 0, 27, 5 } }, { { 0, 26, 5 } }, /* ZSTD_buildFSETable() :
{ { 0, 25, 5 } }, { { 0, 24, 5 } }, * generate FSE decoding table for one symbol (ll, ml or off) */
}; /* OF_defaultDTable */ static void
ZSTD_buildFSETable(ZSTD_seqSymbol* dt,
const short* normalizedCounter, unsigned maxSymbolValue,
const U32* baseValue, const U32* nbAdditionalBits,
unsigned tableLog)
{
ZSTD_seqSymbol* const tableDecode = dt+1;
U16 symbolNext[MaxSeq+1];
U32 const maxSV1 = maxSymbolValue + 1;
U32 const tableSize = 1 << tableLog;
U32 highThreshold = tableSize-1;
/* Sanity Checks */
assert(maxSymbolValue <= MaxSeq);
assert(tableLog <= MaxFSELog);
/* Init, lay down lowprob symbols */
{ ZSTD_seqSymbol_header DTableH;
DTableH.tableLog = tableLog;
DTableH.fastMode = 1;
{ S16 const largeLimit= (S16)(1 << (tableLog-1));
U32 s;
for (s=0; s<maxSV1; s++) {
if (normalizedCounter[s]==-1) {
tableDecode[highThreshold--].baseValue = s;
symbolNext[s] = 1;
} else {
if (normalizedCounter[s] >= largeLimit) DTableH.fastMode=0;
symbolNext[s] = normalizedCounter[s];
} } }
memcpy(dt, &DTableH, sizeof(DTableH));
}
/* Spread symbols */
{ U32 const tableMask = tableSize-1;
U32 const step = FSE_TABLESTEP(tableSize);
U32 s, position = 0;
for (s=0; s<maxSV1; s++) {
int i;
for (i=0; i<normalizedCounter[s]; i++) {
tableDecode[position].baseValue = s;
position = (position + step) & tableMask;
while (position > highThreshold) position = (position + step) & tableMask; /* lowprob area */
} }
assert(position == 0); /* position must reach all cells once, otherwise normalizedCounter is incorrect */
}
/* Build Decoding table */
{ U32 u;
for (u=0; u<tableSize; u++) {
U32 const symbol = tableDecode[u].baseValue;
U32 const nextState = symbolNext[symbol]++;
tableDecode[u].nbBits = (BYTE) (tableLog - BIT_highbit32(nextState) );
tableDecode[u].nextState = (U16) ( (nextState << tableDecode[u].nbBits) - tableSize);
assert(nbAdditionalBits[symbol] < 255);
tableDecode[u].nbAdditionalBits = (BYTE)nbAdditionalBits[symbol];
tableDecode[u].baseValue = baseValue[symbol];
} }
}
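
A worked instance of the recurrence in the final loop (illustrative numbers, not from the patch): with tableLog = 6 (tableSize = 64), a symbol of normalized count 2 occupies two cells whose counters become 2 and 3; BIT_highbit32 of both is 1, so nbBits = 6 - 1 = 5 and the rebased states are (2<<5) - 64 = 0 and (3<<5) - 64 = 32. Decoding that symbol therefore reads 5 fresh bits and lands uniformly in [0,32) or [32,64). A tiny self-check:

#include <assert.h>
#include <stdint.h>

static uint32_t hb(uint32_t v) { uint32_t n = 0; while (v >>= 1) n++; return n; }  /* stand-in for BIT_highbit32 */

int main(void)
{
    uint32_t const tableLog = 6, tableSize = 1u << tableLog;
    uint32_t nextState;
    for (nextState = 2; nextState <= 3; nextState++) {
        uint32_t const nbBits = tableLog - hb(nextState);
        uint32_t const base   = (nextState << nbBits) - tableSize;
        assert(nbBits == 5 && (base == 0 || base == 32));
    }
    return 0;
}
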
/*! ZSTD_buildSeqTable() : /*! ZSTD_buildSeqTable() :
* @return : nb bytes read from src, * @return : nb bytes read from src,
* or an error code if it fails, testable with ZSTD_isError() * or an error code if it fails */
*/ static size_t ZSTD_buildSeqTable(ZSTD_seqSymbol* DTableSpace, const ZSTD_seqSymbol** DTablePtr,
static size_t ZSTD_buildSeqTable(FSE_DTable* DTableSpace, const FSE_DTable** DTablePtr,
symbolEncodingType_e type, U32 max, U32 maxLog, symbolEncodingType_e type, U32 max, U32 maxLog,
const void* src, size_t srcSize, const void* src, size_t srcSize,
const FSE_decode_t4* defaultTable, U32 flagRepeatTable) const U32* baseValue, const U32* nbAdditionalBits,
const ZSTD_seqSymbol* defaultTable, U32 flagRepeatTable)
{ {
switch(type) switch(type)
{ {
case set_rle : case set_rle :
if (!srcSize) return ERROR(srcSize_wrong); if (!srcSize) return ERROR(srcSize_wrong);
if ( (*(const BYTE*)src) > max) return ERROR(corruption_detected); if ( (*(const BYTE*)src) > max) return ERROR(corruption_detected);
FSE_buildDTable_rle(DTableSpace, *(const BYTE*)src); { U32 const symbol = *(const BYTE*)src;
U32 const baseline = baseValue[symbol];
U32 const nbBits = nbAdditionalBits[symbol];
ZSTD_buildSeqTable_rle(DTableSpace, baseline, nbBits);
}
*DTablePtr = DTableSpace; *DTablePtr = DTableSpace;
return 1; return 1;
case set_basic : case set_basic :
*DTablePtr = &defaultTable->dtable; *DTablePtr = defaultTable;
return 0; return 0;
case set_repeat: case set_repeat:
if (!flagRepeatTable) return ERROR(corruption_detected); if (!flagRepeatTable) return ERROR(corruption_detected);
@@ -754,7 +893,7 @@ static size_t ZSTD_buildSeqTable(FSE_DTable* DTableSpace, const FSE_DTable** DTa
size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize); size_t const headerSize = FSE_readNCount(norm, &max, &tableLog, src, srcSize);
if (FSE_isError(headerSize)) return ERROR(corruption_detected); if (FSE_isError(headerSize)) return ERROR(corruption_detected);
if (tableLog > maxLog) return ERROR(corruption_detected); if (tableLog > maxLog) return ERROR(corruption_detected);
FSE_buildDTable(DTableSpace, norm, max, tableLog); ZSTD_buildFSETable(DTableSpace, norm, max, baseValue, nbAdditionalBits, tableLog);
*DTablePtr = DTableSpace; *DTablePtr = DTableSpace;
return headerSize; return headerSize;
} }
@@ -764,6 +903,35 @@ static size_t ZSTD_buildSeqTable(FSE_DTable* DTableSpace, const FSE_DTable** DTa
} }
} }
static const U32 LL_base[MaxLL+1] = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 18, 20, 22, 24, 28, 32, 40,
48, 64, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000,
0x2000, 0x4000, 0x8000, 0x10000 };
static const U32 OF_base[MaxOff+1] = {
0, 1, 1, 5, 0xD, 0x1D, 0x3D, 0x7D,
0xFD, 0x1FD, 0x3FD, 0x7FD, 0xFFD, 0x1FFD, 0x3FFD, 0x7FFD,
0xFFFD, 0x1FFFD, 0x3FFFD, 0x7FFFD, 0xFFFFD, 0x1FFFFD, 0x3FFFFD, 0x7FFFFD,
0xFFFFFD, 0x1FFFFFD, 0x3FFFFFD, 0x7FFFFFD, 0xFFFFFFD, 0x1FFFFFFD, 0x3FFFFFFD, 0x7FFFFFFD };
static const U32 OF_bits[MaxOff+1] = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31 };
static const U32 ML_base[MaxML+1] = {
3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34,
35, 37, 39, 41, 43, 47, 51, 59,
67, 83, 99, 0x83, 0x103, 0x203, 0x403, 0x803,
0x1003, 0x2003, 0x4003, 0x8003, 0x10003 };
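
These arrays transcribe the baseline/extra-bits columns of the sequence-codes tables in the format specification. For instance, match-length code 43 pairs baseline ML_base[43] = 0x83 (131) with 7 extra bits (its entry in ML_bits, defined in zstd_internal.h), covering lengths 131..258 before code 44 takes over at 0x103 = 259. Decoding any code is one lookup plus an optional bit read; a fragment, where `seqState` stands for a live seqState_t as used by the decoders below:

U32 const mlCode = 43;                          /* example code */
U32 const nbExtra = ML_bits[mlCode];            /* 7 */
size_t const matchLength = ML_base[mlCode]
        + (nbExtra ? BIT_readBitsFast(&seqState->DStream, nbExtra) : 0);
/* matchLength now lies in [131, 258] */
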
size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr, size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
const void* src, size_t srcSize) const void* src, size_t srcSize)
{ {
@@ -800,19 +968,27 @@ size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
/* Build DTables */ /* Build DTables */
{ size_t const llhSize = ZSTD_buildSeqTable(dctx->entropy.LLTable, &dctx->LLTptr, { size_t const llhSize = ZSTD_buildSeqTable(dctx->entropy.LLTable, &dctx->LLTptr,
LLtype, MaxLL, LLFSELog, LLtype, MaxLL, LLFSELog,
ip, iend-ip, LL_defaultDTable, dctx->fseEntropy); ip, iend-ip,
LL_base, LL_bits,
LL_defaultDTable, dctx->fseEntropy);
if (ZSTD_isError(llhSize)) return ERROR(corruption_detected); if (ZSTD_isError(llhSize)) return ERROR(corruption_detected);
ip += llhSize; ip += llhSize;
} }
{ size_t const ofhSize = ZSTD_buildSeqTable(dctx->entropy.OFTable, &dctx->OFTptr, { size_t const ofhSize = ZSTD_buildSeqTable(dctx->entropy.OFTable, &dctx->OFTptr,
OFtype, MaxOff, OffFSELog, OFtype, MaxOff, OffFSELog,
ip, iend-ip, OF_defaultDTable, dctx->fseEntropy); ip, iend-ip,
OF_base, OF_bits,
OF_defaultDTable, dctx->fseEntropy);
if (ZSTD_isError(ofhSize)) return ERROR(corruption_detected); if (ZSTD_isError(ofhSize)) return ERROR(corruption_detected);
ip += ofhSize; ip += ofhSize;
} }
{ size_t const mlhSize = ZSTD_buildSeqTable(dctx->entropy.MLTable, &dctx->MLTptr, { size_t const mlhSize = ZSTD_buildSeqTable(dctx->entropy.MLTable, &dctx->MLTptr,
MLtype, MaxML, MLFSELog, MLtype, MaxML, MLFSELog,
ip, iend-ip, ML_defaultDTable, dctx->fseEntropy); ip, iend-ip,
ML_base, ML_bits,
ML_defaultDTable, dctx->fseEntropy);
if (ZSTD_isError(mlhSize)) return ERROR(corruption_detected); if (ZSTD_isError(mlhSize)) return ERROR(corruption_detected);
ip += mlhSize; ip += mlhSize;
} }
@@ -829,11 +1005,16 @@ typedef struct {
const BYTE* match; const BYTE* match;
} seq_t; } seq_t;
typedef struct {
size_t state;
const ZSTD_seqSymbol* table;
} ZSTD_fseState;
typedef struct { typedef struct {
BIT_DStream_t DStream; BIT_DStream_t DStream;
FSE_DState_t stateLL; ZSTD_fseState stateLL;
FSE_DState_t stateOffb; ZSTD_fseState stateOffb;
FSE_DState_t stateML; ZSTD_fseState stateML;
size_t prevOffset[ZSTD_REP_NUM]; size_t prevOffset[ZSTD_REP_NUM];
const BYTE* prefixStart; const BYTE* prefixStart;
const BYTE* dictEnd; const BYTE* dictEnd;
@@ -888,119 +1069,6 @@ size_t ZSTD_execSequenceLast7(BYTE* op,
} }
typedef enum { ZSTD_lo_isRegularOffset, ZSTD_lo_isLongOffset=1 } ZSTD_longOffset_e;
/* We need to add at most (ZSTD_WINDOWLOG_MAX_32 - 1) bits to read the maximum
* offset bits. But we can only read at most (STREAM_ACCUMULATOR_MIN_32 - 1)
* bits before reloading. This value is the maximum number of bits we read
* after reloading when we are decoding long offsets.
*/
#define LONG_OFFSETS_MAX_EXTRA_BITS_32 \
(ZSTD_WINDOWLOG_MAX_32 > STREAM_ACCUMULATOR_MIN_32 \
? ZSTD_WINDOWLOG_MAX_32 - STREAM_ACCUMULATOR_MIN_32 \
: 0)
static seq_t ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffsets)
{
seq_t seq;
U32 const llCode = FSE_peekSymbol(&seqState->stateLL);
U32 const mlCode = FSE_peekSymbol(&seqState->stateML);
U32 const ofCode = FSE_peekSymbol(&seqState->stateOffb); /* <= MaxOff, by table construction */
U32 const llBits = LL_bits[llCode];
U32 const mlBits = ML_bits[mlCode];
U32 const ofBits = ofCode;
U32 const totalBits = llBits+mlBits+ofBits;
static const U32 LL_base[MaxLL+1] = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 18, 20, 22, 24, 28, 32, 40,
48, 64, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000,
0x2000, 0x4000, 0x8000, 0x10000 };
static const U32 ML_base[MaxML+1] = {
3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34,
35, 37, 39, 41, 43, 47, 51, 59,
67, 83, 99, 0x83, 0x103, 0x203, 0x403, 0x803,
0x1003, 0x2003, 0x4003, 0x8003, 0x10003 };
static const U32 OF_base[MaxOff+1] = {
0, 1, 1, 5, 0xD, 0x1D, 0x3D, 0x7D,
0xFD, 0x1FD, 0x3FD, 0x7FD, 0xFFD, 0x1FFD, 0x3FFD, 0x7FFD,
0xFFFD, 0x1FFFD, 0x3FFFD, 0x7FFFD, 0xFFFFD, 0x1FFFFD, 0x3FFFFD, 0x7FFFFD,
0xFFFFFD, 0x1FFFFFD, 0x3FFFFFD, 0x7FFFFFD, 0xFFFFFFD, 0x1FFFFFFD, 0x3FFFFFFD, 0x7FFFFFFD };
/* sequence */
{ size_t offset;
if (!ofCode)
offset = 0;
else {
ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
assert(ofBits <= MaxOff);
if (MEM_32bits() && longOffsets && (ofBits >= STREAM_ACCUMULATOR_MIN_32)) {
U32 const extraBits = ofBits - MIN(ofBits, 32 - seqState->DStream.bitsConsumed);
offset = OF_base[ofCode] + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
BIT_reloadDStream(&seqState->DStream);
if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
assert(extraBits <= LONG_OFFSETS_MAX_EXTRA_BITS_32); /* to avoid another reload */
} else {
offset = OF_base[ofCode] + BIT_readBitsFast(&seqState->DStream, ofBits/*>0*/); /* <= (ZSTD_WINDOWLOG_MAX-1) bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
}
}
if (ofCode <= 1) {
offset += (llCode==0);
if (offset) {
size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */
if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset = temp;
} else { /* offset == 0 */
offset = seqState->prevOffset[0];
}
} else {
seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset;
}
seq.offset = offset;
}
seq.matchLength = ML_base[mlCode]
+ ((mlCode>31) ? BIT_readBitsFast(&seqState->DStream, mlBits/*>0*/) : 0); /* <= 16 bits */
if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
BIT_reloadDStream(&seqState->DStream);
if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
BIT_reloadDStream(&seqState->DStream);
/* Ensure there are enough bits to read the rest of the data in 64-bit mode. */
ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
seq.litLength = LL_base[llCode]
+ ((llCode>15) ? BIT_readBitsFast(&seqState->DStream, llBits/*>0*/) : 0); /* <= 16 bits */
if (MEM_32bits())
BIT_reloadDStream(&seqState->DStream);
DEBUGLOG(6, "seq: litL=%u, matchL=%u, offset=%u",
(U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset);
/* ANS state update */
FSE_updateState(&seqState->stateLL, &seqState->DStream); /* <= 9 bits */
FSE_updateState(&seqState->stateML, &seqState->DStream); /* <= 9 bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */
FSE_updateState(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */
return seq;
}
HINT_INLINE HINT_INLINE
size_t ZSTD_execSequence(BYTE* op, size_t ZSTD_execSequence(BYTE* op,
BYTE* const oend, seq_t sequence, BYTE* const oend, seq_t sequence,
@@ -1082,165 +1150,6 @@ size_t ZSTD_execSequence(BYTE* op,
} }
static size_t ZSTD_decompressSequences(
ZSTD_DCtx* dctx,
void* dst, size_t maxDstSize,
const void* seqStart, size_t seqSize, int nbSeq,
const ZSTD_longOffset_e isLongOffset)
{
const BYTE* ip = (const BYTE*)seqStart;
const BYTE* const iend = ip + seqSize;
BYTE* const ostart = (BYTE* const)dst;
BYTE* const oend = ostart + maxDstSize;
BYTE* op = ostart;
const BYTE* litPtr = dctx->litPtr;
const BYTE* const litEnd = litPtr + dctx->litSize;
const BYTE* const base = (const BYTE*) (dctx->base);
const BYTE* const vBase = (const BYTE*) (dctx->vBase);
const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
DEBUGLOG(5, "ZSTD_decompressSequences");
/* Regen sequences */
if (nbSeq) {
seqState_t seqState;
dctx->fseEntropy = 1;
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
FSE_initDState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
FSE_initDState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
FSE_initDState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && nbSeq ; ) {
nbSeq--;
{ seq_t const sequence = ZSTD_decodeSequence(&seqState, isLongOffset);
size_t const oneSeqSize = ZSTD_execSequence(op, oend, sequence, &litPtr, litEnd, base, vBase, dictEnd);
DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqSize);
if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
op += oneSeqSize;
} }
/* check if reached exact end */
DEBUGLOG(5, "after decode loop, remaining nbSeq : %i", nbSeq);
if (nbSeq) return ERROR(corruption_detected);
/* save reps for next block */
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
}
/* last literal segment */
{ size_t const lastLLSize = litEnd - litPtr;
if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
memcpy(op, litPtr, lastLLSize);
op += lastLLSize;
}
return op-ostart;
}
HINT_INLINE
seq_t ZSTD_decodeSequenceLong(seqState_t* seqState, ZSTD_longOffset_e const longOffsets)
{
seq_t seq;
U32 const llCode = FSE_peekSymbol(&seqState->stateLL);
U32 const mlCode = FSE_peekSymbol(&seqState->stateML);
U32 const ofCode = FSE_peekSymbol(&seqState->stateOffb); /* <= MaxOff, by table construction */
U32 const llBits = LL_bits[llCode];
U32 const mlBits = ML_bits[mlCode];
U32 const ofBits = ofCode;
U32 const totalBits = llBits+mlBits+ofBits;
static const U32 LL_base[MaxLL+1] = {
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 18, 20, 22, 24, 28, 32, 40,
48, 64, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000,
0x2000, 0x4000, 0x8000, 0x10000 };
static const U32 ML_base[MaxML+1] = {
3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34,
35, 37, 39, 41, 43, 47, 51, 59,
67, 83, 99, 0x83, 0x103, 0x203, 0x403, 0x803,
0x1003, 0x2003, 0x4003, 0x8003, 0x10003 };
static const U32 OF_base[MaxOff+1] = {
0, 1, 1, 5, 0xD, 0x1D, 0x3D, 0x7D,
0xFD, 0x1FD, 0x3FD, 0x7FD, 0xFFD, 0x1FFD, 0x3FFD, 0x7FFD,
0xFFFD, 0x1FFFD, 0x3FFFD, 0x7FFFD, 0xFFFFD, 0x1FFFFD, 0x3FFFFD, 0x7FFFFD,
0xFFFFFD, 0x1FFFFFD, 0x3FFFFFD, 0x7FFFFFD, 0xFFFFFFD, 0x1FFFFFFD, 0x3FFFFFFD, 0x7FFFFFFD };
/* sequence */
{ size_t offset;
if (!ofCode)
offset = 0;
else {
ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
assert(ofBits <= MaxOff);
if (MEM_32bits() && longOffsets) {
U32 const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN_32-1);
offset = OF_base[ofCode] + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
if (MEM_32bits() || extraBits) BIT_reloadDStream(&seqState->DStream);
if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
} else {
offset = OF_base[ofCode] + BIT_readBitsFast(&seqState->DStream, ofBits); /* <= (ZSTD_WINDOWLOG_MAX-1) bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
}
}
if (ofCode <= 1) {
offset += (llCode==0);
if (offset) {
size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */
if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset = temp;
} else {
offset = seqState->prevOffset[0];
}
} else {
seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset;
}
seq.offset = offset;
}
seq.matchLength = ML_base[mlCode] + ((mlCode>31) ? BIT_readBitsFast(&seqState->DStream, mlBits) : 0); /* <= 16 bits */
if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
BIT_reloadDStream(&seqState->DStream);
if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
BIT_reloadDStream(&seqState->DStream);
/* Verify that there are enough bits to read the rest of the data in 64-bit mode. */
ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
seq.litLength = LL_base[llCode] + ((llCode>15) ? BIT_readBitsFast(&seqState->DStream, llBits) : 0); /* <= 16 bits */
if (MEM_32bits())
BIT_reloadDStream(&seqState->DStream);
{ size_t const pos = seqState->pos + seq.litLength;
const BYTE* const matchBase = (seq.offset > pos) ? seqState->dictEnd : seqState->prefixStart;
seq.match = matchBase + pos - seq.offset; /* note : this operation can overflow when seq.offset is really too large, which can only happen when input is corrupted.
* No consequence though : no memory access will occur, overly large offset will be detected in ZSTD_execSequenceLong() */
seqState->pos = pos + seq.matchLength;
}
/* ANS state update */
FSE_updateState(&seqState->stateLL, &seqState->DStream); /* <= 9 bits */
FSE_updateState(&seqState->stateML, &seqState->DStream); /* <= 9 bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */
FSE_updateState(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */
return seq;
}
HINT_INLINE HINT_INLINE
size_t ZSTD_execSequenceLong(BYTE* op, size_t ZSTD_execSequenceLong(BYTE* op,
BYTE* const oend, seq_t sequence, BYTE* const oend, seq_t sequence,
@@ -1320,82 +1229,52 @@ size_t ZSTD_execSequenceLong(BYTE* op,
return sequenceLength; return sequenceLength;
} }
static size_t ZSTD_decompressSequencesLong( typedef enum { ZSTD_lo_isRegularOffset, ZSTD_lo_isLongOffset=1 } ZSTD_longOffset_e;
ZSTD_DCtx* dctx,
void* dst, size_t maxDstSize, #define FUNCTION(fn) fn##_default
const void* seqStart, size_t seqSize, int nbSeq, #define TARGET
#include "zstd_decompress_impl.h"
#undef TARGET
#undef FUNCTION
#if DYNAMIC_BMI2
#define FUNCTION(fn) fn##_bmi2
#define TARGET TARGET_ATTRIBUTE("bmi2")
#include "zstd_decompress_impl.h"
#undef TARGET
#undef FUNCTION
#endif
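
The two #include passes above stamp out a `_default` and a `_bmi2` copy of every decoder defined in zstd_decompress_impl.h; the dispatchers below then branch once per call on dctx->bmi2. The same idiom in miniature, with hypothetical file names and a toy `sum` function (TARGET_ATTRIBUTE expands to the GCC/Clang function-target attribute):

/* impl.inc -- compiled once per expansion:
 *   static TARGET int FUNCTION(sum)(const int* a, int n)
 *   {   int s = 0, i;
 *       for (i = 0; i < n; i++) s += a[i];
 *       return s;
 *   }
 */
#define FUNCTION(fn) fn##_default
#define TARGET
#include "impl.inc"                 /* emits sum_default() */
#undef TARGET
#undef FUNCTION

#if DYNAMIC_BMI2
#  define FUNCTION(fn) fn##_bmi2
#  define TARGET __attribute__((target("bmi2")))
#  include "impl.inc"               /* emits sum_bmi2(), compiled for BMI2 */
#  undef TARGET
#  undef FUNCTION
#endif

static int sum(const int* a, int n, int bmi2)   /* runtime dispatcher */
{
#if DYNAMIC_BMI2
    if (bmi2) return sum_bmi2(a, n);
#endif
    (void)bmi2;
    return sum_default(a, n);
}
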
typedef size_t (*ZSTD_decompressSequences_t)(
ZSTD_DCtx *dctx, void *dst, size_t maxDstSize, const void *seqStart,
size_t seqSize, const ZSTD_longOffset_e isLongOffset);
static size_t ZSTD_decompressSequences(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
const void* seqStart, size_t seqSize,
const ZSTD_longOffset_e isLongOffset) const ZSTD_longOffset_e isLongOffset)
{ {
const BYTE* ip = (const BYTE*)seqStart; #if DYNAMIC_BMI2
const BYTE* const iend = ip + seqSize; if (dctx->bmi2) {
BYTE* const ostart = (BYTE* const)dst; return ZSTD_decompressSequences_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, isLongOffset);
BYTE* const oend = ostart + maxDstSize;
BYTE* op = ostart;
const BYTE* litPtr = dctx->litPtr;
const BYTE* const litEnd = litPtr + dctx->litSize;
const BYTE* const prefixStart = (const BYTE*) (dctx->base);
const BYTE* const dictStart = (const BYTE*) (dctx->vBase);
const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
/* Regen sequences */
if (nbSeq) {
#define STORED_SEQS 4
#define STOSEQ_MASK (STORED_SEQS-1)
#define ADVANCED_SEQS 4
seq_t sequences[STORED_SEQS];
int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS);
seqState_t seqState;
int seqNb;
dctx->fseEntropy = 1;
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
seqState.prefixStart = prefixStart;
seqState.pos = (size_t)(op-prefixStart);
seqState.dictEnd = dictEnd;
CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
FSE_initDState(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
FSE_initDState(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
FSE_initDState(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
/* prepare in advance */
for (seqNb=0; (BIT_reloadDStream(&seqState.DStream) <= BIT_DStream_completed) && (seqNb<seqAdvance); seqNb++) {
sequences[seqNb] = ZSTD_decodeSequenceLong(&seqState, isLongOffset);
} }
if (seqNb<seqAdvance) return ERROR(corruption_detected); #endif
return ZSTD_decompressSequences_default(dctx, dst, maxDstSize, seqStart, seqSize, isLongOffset);
/* decode and decompress */
for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && (seqNb<nbSeq) ; seqNb++) {
seq_t const sequence = ZSTD_decodeSequenceLong(&seqState, isLongOffset);
size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[(seqNb-ADVANCED_SEQS) & STOSEQ_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
PREFETCH(sequence.match); /* note : it's safe to invoke PREFETCH() on any memory address, including invalid ones */
sequences[seqNb&STOSEQ_MASK] = sequence;
op += oneSeqSize;
}
if (seqNb<nbSeq) return ERROR(corruption_detected);
/* finish queue */
seqNb -= seqAdvance;
for ( ; seqNb<nbSeq ; seqNb++) {
size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[seqNb&STOSEQ_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
op += oneSeqSize;
} }
/* save reps for next block */ static size_t ZSTD_decompressSequencesLong(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); } const void* seqStart, size_t seqSize,
const ZSTD_longOffset_e isLongOffset)
{
#if DYNAMIC_BMI2
if (dctx->bmi2) {
return ZSTD_decompressSequencesLong_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, isLongOffset);
} }
#endif
/* last literal segment */ return ZSTD_decompressSequencesLong_default(dctx, dst, maxDstSize, seqStart, seqSize, isLongOffset);
{ size_t const lastLLSize = litEnd - litPtr;
if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
memcpy(op, litPtr, lastLLSize);
op += lastLLSize;
} }
return op-ostart;
}
static unsigned static unsigned
ZSTD_getLongOffsetsShare(const FSE_DTable* offTable) ZSTD_getLongOffsetsShare(const FSE_DTable* offTable)
{ {
@@ -1438,22 +1317,12 @@ static size_t ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
srcSize -= litCSize; srcSize -= litCSize;
} }
/* Build Decoding Tables */ if ( frame /* windowSize exists */
{ int nbSeq; && (dctx->fParams.windowSize > (1<<24))
size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, srcSize); && MEM_64bits() /* x86 benefits less from long mode than x64 */ )
if (ZSTD_isError(seqHSize)) return seqHSize; return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, isLongOffset);
ip += seqHSize;
srcSize -= seqHSize;
if (dctx->fParams.windowSize > (1<<24)) { return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, isLongOffset);
U32 const shareLongOffsets = ZSTD_getLongOffsetsShare(dctx->OFTptr);
U32 const minShare = MEM_64bits() ? 5 : 13;
if (shareLongOffsets >= minShare)
return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
}
return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
}
} }
@@ -1948,8 +1817,12 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyDTables_t* entropy, const void* const
U32 offcodeMaxValue = MaxOff, offcodeLog; U32 offcodeMaxValue = MaxOff, offcodeLog;
size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr); size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(offcodeHeaderSize)) return ERROR(dictionary_corrupted); if (FSE_isError(offcodeHeaderSize)) return ERROR(dictionary_corrupted);
if (offcodeMaxValue > MaxOff) return ERROR(dictionary_corrupted);
if (offcodeLog > OffFSELog) return ERROR(dictionary_corrupted); if (offcodeLog > OffFSELog) return ERROR(dictionary_corrupted);
CHECK_E(FSE_buildDTable(entropy->OFTable, offcodeNCount, offcodeMaxValue, offcodeLog), dictionary_corrupted); ZSTD_buildFSETable(entropy->OFTable,
offcodeNCount, offcodeMaxValue,
OF_base, OF_bits,
offcodeLog);
dictPtr += offcodeHeaderSize; dictPtr += offcodeHeaderSize;
} }
@@ -1957,8 +1830,12 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyDTables_t* entropy, const void* const
unsigned matchlengthMaxValue = MaxML, matchlengthLog; unsigned matchlengthMaxValue = MaxML, matchlengthLog;
size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr); size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(matchlengthHeaderSize)) return ERROR(dictionary_corrupted); if (FSE_isError(matchlengthHeaderSize)) return ERROR(dictionary_corrupted);
if (matchlengthMaxValue > MaxML) return ERROR(dictionary_corrupted);
if (matchlengthLog > MLFSELog) return ERROR(dictionary_corrupted); if (matchlengthLog > MLFSELog) return ERROR(dictionary_corrupted);
CHECK_E(FSE_buildDTable(entropy->MLTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog), dictionary_corrupted); ZSTD_buildFSETable(entropy->MLTable,
matchlengthNCount, matchlengthMaxValue,
ML_base, ML_bits,
matchlengthLog);
dictPtr += matchlengthHeaderSize; dictPtr += matchlengthHeaderSize;
} }
@@ -1966,8 +1843,12 @@ static size_t ZSTD_loadEntropy(ZSTD_entropyDTables_t* entropy, const void* const
unsigned litlengthMaxValue = MaxLL, litlengthLog; unsigned litlengthMaxValue = MaxLL, litlengthLog;
size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr); size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(litlengthHeaderSize)) return ERROR(dictionary_corrupted); if (FSE_isError(litlengthHeaderSize)) return ERROR(dictionary_corrupted);
if (litlengthMaxValue > MaxLL) return ERROR(dictionary_corrupted);
if (litlengthLog > LLFSELog) return ERROR(dictionary_corrupted); if (litlengthLog > LLFSELog) return ERROR(dictionary_corrupted);
CHECK_E(FSE_buildDTable(entropy->LLTable, litlengthNCount, litlengthMaxValue, litlengthLog), dictionary_corrupted); ZSTD_buildFSETable(entropy->LLTable,
litlengthNCount, litlengthMaxValue,
LL_base, LL_bits,
litlengthLog);
dictPtr += litlengthHeaderSize; dictPtr += litlengthHeaderSize;
} }


@@ -0,0 +1,356 @@
/*
* Copyright (c) 2018-present, Facebook, Inc.
* All rights reserved.
*
* This source code is licensed under both the BSD-style license (found in the
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
* in the COPYING file in the root directory of this source tree).
* You may select, at your option, one of the above-listed licenses.
*/
#ifndef FUNCTION
# error "FUNCTION(name) must be defined"
#endif
#ifndef TARGET
# error "TARGET must be defined"
#endif
static TARGET void
FUNCTION(ZSTD_updateFseState)(ZSTD_fseState* DStatePtr, BIT_DStream_t* bitD)
{
ZSTD_seqSymbol const DInfo = DStatePtr->table[DStatePtr->state];
U32 const nbBits = DInfo.nbBits;
size_t const lowBits = BIT_readBits(bitD, nbBits);
DStatePtr->state = DInfo.nextState + lowBits;
}
/* We need to add at most (ZSTD_WINDOWLOG_MAX_32 - 1) bits to read the maximum
* offset bits. But we can only read at most (STREAM_ACCUMULATOR_MIN_32 - 1)
* bits before reloading. This value is the maximum number of bits we read
* after reloading when we are decoding long offsets.
*/
#define LONG_OFFSETS_MAX_EXTRA_BITS_32 \
(ZSTD_WINDOWLOG_MAX_32 > STREAM_ACCUMULATOR_MIN_32 \
? ZSTD_WINDOWLOG_MAX_32 - STREAM_ACCUMULATOR_MIN_32 \
: 0)
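
With the limits in this tree (ZSTD_WINDOWLOG_MAX_32 = 30 from zstd.h, STREAM_ACCUMULATOR_MIN_32 = 25 from bitstream.h; values assumed, as neither appears in this diff) the macro works out to 30 - 25 = 5 extra bits, which the ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5) inside the decoders below pins down.
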
static TARGET seq_t
FUNCTION(ZSTD_decodeSequence)(seqState_t* seqState, const ZSTD_longOffset_e longOffsets)
{
seq_t seq;
U32 const llBits = seqState->stateLL.table[seqState->stateLL.state].nbAdditionalBits;
U32 const mlBits = seqState->stateML.table[seqState->stateML.state].nbAdditionalBits;
U32 const ofBits = seqState->stateOffb.table[seqState->stateOffb.state].nbAdditionalBits;
U32 const totalBits = llBits+mlBits+ofBits;
U32 const llBase = seqState->stateLL.table[seqState->stateLL.state].baseValue;
U32 const mlBase = seqState->stateML.table[seqState->stateML.state].baseValue;
U32 const ofBase = seqState->stateOffb.table[seqState->stateOffb.state].baseValue;
/* sequence */
{ size_t offset;
if (!ofBits)
offset = 0;
else {
ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
assert(ofBits <= MaxOff);
if (MEM_32bits() && longOffsets && (ofBits >= STREAM_ACCUMULATOR_MIN_32)) {
U32 const extraBits = ofBits - MIN(ofBits, 32 - seqState->DStream.bitsConsumed);
offset = ofBase + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
BIT_reloadDStream(&seqState->DStream);
if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
assert(extraBits <= LONG_OFFSETS_MAX_EXTRA_BITS_32); /* to avoid another reload */
} else {
offset = ofBase + BIT_readBitsFast(&seqState->DStream, ofBits/*>0*/); /* <= (ZSTD_WINDOWLOG_MAX-1) bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
}
}
if (ofBits <= 1) {
offset += (llBase==0);
if (offset) {
size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */
if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset = temp;
} else { /* offset == 0 */
offset = seqState->prevOffset[0];
}
} else {
seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset;
}
seq.offset = offset;
}
seq.matchLength = mlBase
+ ((mlBits>0) ? BIT_readBitsFast(&seqState->DStream, mlBits/*>0*/) : 0); /* <= 16 bits */
if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
BIT_reloadDStream(&seqState->DStream);
if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
BIT_reloadDStream(&seqState->DStream);
/* Ensure there are enough bits to read the rest of the data in 64-bit mode. */
ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
seq.litLength = llBase
+ ((llBits>0) ? BIT_readBitsFast(&seqState->DStream, llBits/*>0*/) : 0); /* <= 16 bits */
if (MEM_32bits())
BIT_reloadDStream(&seqState->DStream);
DEBUGLOG(6, "seq: litL=%u, matchL=%u, offset=%u",
(U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset);
/* ANS state update */
FUNCTION(ZSTD_updateFseState)(&seqState->stateLL, &seqState->DStream); /* <= 9 bits */
FUNCTION(ZSTD_updateFseState)(&seqState->stateML, &seqState->DStream); /* <= 9 bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */
FUNCTION(ZSTD_updateFseState)(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */
return seq;
}
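
Offset codes 0 and 1 resolve against the three-entry repeat-offset history, and the selector shifts by one when the literal length is zero, which is why the code tests llBase == 0 (only literal-length code 0 has baseline 0). The mapping, per the format specification, with R1 = prevOffset[0] being the most recent offset:

/*   litLength > 0 : value 1 -> R1,  value 2 -> R2,  value 3 -> R3
 *   litLength == 0: value 1 -> R2,  value 2 -> R3,  value 3 -> R1 - 1
 * The chosen offset then moves to the front of the history, as in the
 * prevOffset[] shuffle above. */
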
HINT_INLINE seq_t
FUNCTION(ZSTD_decodeSequenceLong)(seqState_t* seqState, ZSTD_longOffset_e const longOffsets)
{
seq_t seq;
U32 const llBits = seqState->stateLL.table[seqState->stateLL.state].nbAdditionalBits;
U32 const mlBits = seqState->stateML.table[seqState->stateML.state].nbAdditionalBits;
U32 const ofBits = seqState->stateOffb.table[seqState->stateOffb.state].nbAdditionalBits;
U32 const totalBits = llBits+mlBits+ofBits;
U32 const llBase = seqState->stateLL.table[seqState->stateLL.state].baseValue;
U32 const mlBase = seqState->stateML.table[seqState->stateML.state].baseValue;
U32 const ofBase = seqState->stateOffb.table[seqState->stateOffb.state].baseValue;
/* sequence */
{ size_t offset;
if (!ofBits)
offset = 0;
else {
ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 == 5);
assert(ofBits <= MaxOff);
if (MEM_32bits() && longOffsets) {
U32 const extraBits = ofBits - MIN(ofBits, STREAM_ACCUMULATOR_MIN_32-1);
offset = ofBase + (BIT_readBitsFast(&seqState->DStream, ofBits - extraBits) << extraBits);
if (MEM_32bits() || extraBits) BIT_reloadDStream(&seqState->DStream);
if (extraBits) offset += BIT_readBitsFast(&seqState->DStream, extraBits);
} else {
offset = ofBase + BIT_readBitsFast(&seqState->DStream, ofBits); /* <= (ZSTD_WINDOWLOG_MAX-1) bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream);
}
}
if (ofBits <= 1) {
offset += (llBase==0);
if (offset) {
size_t temp = (offset==3) ? seqState->prevOffset[0] - 1 : seqState->prevOffset[offset];
temp += !temp; /* 0 is not valid; input is corrupted; force offset to 1 */
if (offset != 1) seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset = temp;
} else {
offset = seqState->prevOffset[0];
}
} else {
seqState->prevOffset[2] = seqState->prevOffset[1];
seqState->prevOffset[1] = seqState->prevOffset[0];
seqState->prevOffset[0] = offset;
}
seq.offset = offset;
}
seq.matchLength = mlBase + ((mlBits>0) ? BIT_readBitsFast(&seqState->DStream, mlBits) : 0); /* <= 16 bits */
if (MEM_32bits() && (mlBits+llBits >= STREAM_ACCUMULATOR_MIN_32-LONG_OFFSETS_MAX_EXTRA_BITS_32))
BIT_reloadDStream(&seqState->DStream);
if (MEM_64bits() && (totalBits >= STREAM_ACCUMULATOR_MIN_64-(LLFSELog+MLFSELog+OffFSELog)))
BIT_reloadDStream(&seqState->DStream);
/* Verify that there are enough bits to read the rest of the data in 64-bit mode. */
ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMULATOR_MIN_64);
seq.litLength = llBase + ((llBits>0) ? BIT_readBitsFast(&seqState->DStream, llBits) : 0); /* <= 16 bits */
if (MEM_32bits())
BIT_reloadDStream(&seqState->DStream);
{ size_t const pos = seqState->pos + seq.litLength;
const BYTE* const matchBase = (seq.offset > pos) ? seqState->dictEnd : seqState->prefixStart;
seq.match = matchBase + pos - seq.offset; /* note : this operation can overflow when seq.offset is really too large, which can only happen when input is corrupted.
* No consequence though : no memory access will occur, overly large offset will be detected in ZSTD_execSequenceLong() */
seqState->pos = pos + seq.matchLength;
}
/* ANS state update */
FUNCTION(ZSTD_updateFseState)(&seqState->stateLL, &seqState->DStream); /* <= 9 bits */
FUNCTION(ZSTD_updateFseState)(&seqState->stateML, &seqState->DStream); /* <= 9 bits */
if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */
FUNCTION(ZSTD_updateFseState)(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */
return seq;
}
static TARGET void
FUNCTION(ZSTD_initFseState)(ZSTD_fseState* DStatePtr, BIT_DStream_t* bitD, const ZSTD_seqSymbol* dt)
{
const void* ptr = dt;
const ZSTD_seqSymbol_header* const DTableH = (const ZSTD_seqSymbol_header*)ptr;
DStatePtr->state = BIT_readBits(bitD, DTableH->tableLog);
DEBUGLOG(6, "ZSTD_initFseState : val=%u using %u bits",
(U32)DStatePtr->state, DTableH->tableLog);
BIT_reloadDStream(bitD);
DStatePtr->table = dt + 1;
}
static TARGET
size_t FUNCTION(ZSTD_decompressSequences)(
ZSTD_DCtx* dctx,
void* dst, size_t maxDstSize,
const void* seqStart, size_t seqSize,
const ZSTD_longOffset_e isLongOffset)
{
const BYTE* ip = (const BYTE*)seqStart;
const BYTE* const iend = ip + seqSize;
BYTE* const ostart = (BYTE* const)dst;
BYTE* const oend = ostart + maxDstSize;
BYTE* op = ostart;
const BYTE* litPtr = dctx->litPtr;
const BYTE* const litEnd = litPtr + dctx->litSize;
const BYTE* const base = (const BYTE*) (dctx->base);
const BYTE* const vBase = (const BYTE*) (dctx->vBase);
const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
int nbSeq;
DEBUGLOG(5, "ZSTD_decompressSequences");
/* Build Decoding Tables */
{ size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, seqSize);
DEBUGLOG(5, "ZSTD_decodeSeqHeaders: size=%u, nbSeq=%i",
(U32)seqHSize, nbSeq);
if (ZSTD_isError(seqHSize)) return seqHSize;
ip += seqHSize;
}
/* Regen sequences */
if (nbSeq) {
seqState_t seqState;
dctx->fseEntropy = 1;
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
FUNCTION(ZSTD_initFseState)(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
FUNCTION(ZSTD_initFseState)(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
FUNCTION(ZSTD_initFseState)(&seqState.stateML, &seqState.DStream, dctx->MLTptr);
for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && nbSeq ; ) {
nbSeq--;
{ seq_t const sequence = FUNCTION(ZSTD_decodeSequence)(&seqState, isLongOffset);
size_t const oneSeqSize = ZSTD_execSequence(op, oend, sequence, &litPtr, litEnd, base, vBase, dictEnd);
DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqSize);
if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
op += oneSeqSize;
} }
/* check if reached exact end */
DEBUGLOG(5, "ZSTD_decompressSequences: after decode loop, remaining nbSeq : %i", nbSeq);
if (nbSeq) return ERROR(corruption_detected);
/* save reps for next block */
{ U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
}
/* last literal segment */
{ size_t const lastLLSize = litEnd - litPtr;
if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
memcpy(op, litPtr, lastLLSize);
op += lastLLSize;
}
return op-ostart;
}
static TARGET
size_t FUNCTION(ZSTD_decompressSequencesLong)(
ZSTD_DCtx* dctx,
void* dst, size_t maxDstSize,
const void* seqStart, size_t seqSize,
const ZSTD_longOffset_e isLongOffset)
{
const BYTE* ip = (const BYTE*)seqStart;
const BYTE* const iend = ip + seqSize;
BYTE* const ostart = (BYTE* const)dst;
BYTE* const oend = ostart + maxDstSize;
BYTE* op = ostart;
const BYTE* litPtr = dctx->litPtr;
const BYTE* const litEnd = litPtr + dctx->litSize;
const BYTE* const prefixStart = (const BYTE*) (dctx->base);
const BYTE* const dictStart = (const BYTE*) (dctx->vBase);
const BYTE* const dictEnd = (const BYTE*) (dctx->dictEnd);
int nbSeq;
/* Build Decoding Tables */
{ size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, seqSize);
if (ZSTD_isError(seqHSize)) return seqHSize;
ip += seqHSize;
}
/* Regen sequences */
if (nbSeq) {
#define STORED_SEQS 4
#define STOSEQ_MASK (STORED_SEQS-1)
#define ADVANCED_SEQS 4
        seq_t sequences[STORED_SEQS];
        int const seqAdvance = MIN(nbSeq, ADVANCED_SEQS);
        seqState_t seqState;
        int seqNb;
        dctx->fseEntropy = 1;
        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) seqState.prevOffset[i] = dctx->entropy.rep[i]; }
        seqState.prefixStart = prefixStart;
        seqState.pos = (size_t)(op-prefixStart);
        seqState.dictEnd = dictEnd;
        CHECK_E(BIT_initDStream(&seqState.DStream, ip, iend-ip), corruption_detected);
        FUNCTION(ZSTD_initFseState)(&seqState.stateLL, &seqState.DStream, dctx->LLTptr);
        FUNCTION(ZSTD_initFseState)(&seqState.stateOffb, &seqState.DStream, dctx->OFTptr);
        FUNCTION(ZSTD_initFseState)(&seqState.stateML, &seqState.DStream, dctx->MLTptr);

        /* prepare in advance */
        for (seqNb=0; (BIT_reloadDStream(&seqState.DStream) <= BIT_DStream_completed) && (seqNb<seqAdvance); seqNb++) {
            sequences[seqNb] = FUNCTION(ZSTD_decodeSequenceLong)(&seqState, isLongOffset);
        }
        if (seqNb<seqAdvance) return ERROR(corruption_detected);

        /* decode and decompress */
        for ( ; (BIT_reloadDStream(&(seqState.DStream)) <= BIT_DStream_completed) && (seqNb<nbSeq) ; seqNb++) {
            seq_t const sequence = FUNCTION(ZSTD_decodeSequenceLong)(&seqState, isLongOffset);
            size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[(seqNb-ADVANCED_SEQS) & STOSEQ_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
            if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
            PREFETCH(sequence.match);  /* note : it's safe to invoke PREFETCH() on any memory address, including invalid ones */
            sequences[seqNb&STOSEQ_MASK] = sequence;
            op += oneSeqSize;
        }
        if (seqNb<nbSeq) return ERROR(corruption_detected);

        /* finish queue */
        seqNb -= seqAdvance;
        for ( ; seqNb<nbSeq ; seqNb++) {
            size_t const oneSeqSize = ZSTD_execSequenceLong(op, oend, sequences[seqNb&STOSEQ_MASK], &litPtr, litEnd, prefixStart, dictStart, dictEnd);
            if (ZSTD_isError(oneSeqSize)) return oneSeqSize;
            op += oneSeqSize;
        }

        /* save reps for next block */
        { U32 i; for (i=0; i<ZSTD_REP_NUM; i++) dctx->entropy.rep[i] = (U32)(seqState.prevOffset[i]); }
#undef STORED_SEQS
#undef STOSEQ_MASK
#undef ADVANCED_SEQS
    }

    /* last literal segment */
    {   size_t const lastLLSize = litEnd - litPtr;
        if (lastLLSize > (size_t)(oend-op)) return ERROR(dstSize_tooSmall);
        memcpy(op, litPtr, lastLLSize);
        op += lastLLSize;
    }

    return op-ostart;
}
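
The function above is a general software-prefetch pipeline: decode a few items ahead, park them in a small power-of-two ring, and issue a prefetch on each address as soon as it is known, several iterations before it is dereferenced. A minimal standalone sketch of the same pattern (not zstd code; work_item, decode_next and execute are hypothetical placeholders, and __builtin_prefetch assumes gcc/clang):

#include <stddef.h>

#define RING 4                    /* must be a power of 2, like STORED_SEQS */
#define RING_MASK (RING-1)

typedef struct { const char* match; size_t len; } work_item;   /* hypothetical */

/* decode_next() produces the next item; execute() consumes one, touching item->match */
static size_t run_pipeline(work_item (*decode_next)(void*),
                           void (*execute)(const work_item*),
                           void* state, size_t nbItems)
{
    work_item ring[RING];
    size_t n, done = 0;
    size_t const advance = (nbItems < RING) ? nbItems : RING;

    for (n = 0; n < advance; n++)          /* fill the ring first */
        ring[n] = decode_next(state);

    for ( ; n < nbItems; n++) {
        work_item const next = decode_next(state);
        execute(&ring[(n - RING) & RING_MASK]);   /* oldest item : its data should be in cache by now */
        __builtin_prefetch(next.match);           /* harmless on any address, as noted above */
        ring[n & RING_MASK] = next;               /* overwrite the slot just executed */
        done++;
    }
    for (n = nbItems - advance; n < nbItems; n++) {   /* drain the ring */
        execute(&ring[n & RING_MASK]);
        done++;
    }
    return done;
}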

View File

@@ -381,7 +381,9 @@ ZSTDLIB_API size_t ZSTD_DStreamOutSize(void);   /*!< recommended size for output
 #define ZSTD_WINDOWLOG_MIN          10
 #define ZSTD_HASHLOG_MAX    ((ZSTD_WINDOWLOG_MAX < 30) ? ZSTD_WINDOWLOG_MAX : 30)
 #define ZSTD_HASHLOG_MIN             6
-#define ZSTD_CHAINLOG_MAX   ((ZSTD_WINDOWLOG_MAX < 29) ? ZSTD_WINDOWLOG_MAX+1 : 30)
+#define ZSTD_CHAINLOG_MAX_32        29
+#define ZSTD_CHAINLOG_MAX_64        30
+#define ZSTD_CHAINLOG_MAX ((unsigned)(sizeof(size_t) == 4 ? ZSTD_CHAINLOG_MAX_32 : ZSTD_CHAINLOG_MAX_64))
 #define ZSTD_CHAINLOG_MIN   ZSTD_HASHLOG_MIN
 #define ZSTD_HASHLOG3_MAX           17
 #define ZSTD_SEARCHLOG_MAX  (ZSTD_WINDOWLOG_MAX-1)
@@ -506,7 +508,7 @@ ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict);
 *  It will also consider src size to be arbitrarily "large", which is worst case.
 *  If srcSize is known to always be small, ZSTD_estimateCCtxSize_usingCParams() can provide a tighter estimation.
 *  ZSTD_estimateCCtxSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
-*  ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbThreads is > 1.
+*  ZSTD_estimateCCtxSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbWorkers is >= 1.
 *  Note : CCtx size estimation is only correct for single-threaded compression. */
 ZSTDLIB_API size_t ZSTD_estimateCCtxSize(int compressionLevel);
 ZSTDLIB_API size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams);
@@ -518,7 +520,7 @@ ZSTDLIB_API size_t ZSTD_estimateDCtxSize(void);
 *  It will also consider src size to be arbitrarily "large", which is worst case.
 *  If srcSize is known to always be small, ZSTD_estimateCStreamSize_usingCParams() can provide a tighter estimation.
 *  ZSTD_estimateCStreamSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
-*  ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbThreads is set to a value > 1.
+*  ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParam_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_p_nbWorkers is >= 1.
 *  Note : CStream size estimation is only correct for single-threaded compression.
 *  ZSTD_DStream memory budget depends on window Size.
 *  This information can be passed manually, using ZSTD_estimateDStreamSize,
@@ -992,18 +994,13 @@ typedef enum {
     /* multi-threading parameters */
     /* These parameters are only useful if multi-threading is enabled (ZSTD_MULTITHREAD).
      * They return an error otherwise. */
-    ZSTD_p_nbThreads=400,    /* Select how many threads a compression job can spawn (default:1)
-                              * More threads improve speed, but also increase memory usage.
-                              * Can only receive a value > 1 if ZSTD_MULTITHREAD is enabled.
-                              * Special: value 0 means "do not change nbThreads" */
-    ZSTD_p_nonBlockingMode,  /* Single thread mode is by default "blocking" :
-                              * it finishes its job as much as possible, and only then gives back control to caller.
-                              * In contrast, multi-thread is by default "non-blocking" :
-                              * it takes some input, flush some output if available, and immediately gives back control to caller.
-                              * Compression work is performed in parallel, within worker threads.
-                              * (note : a strong exception to this rule is when first job is called with ZSTD_e_end : it becomes blocking)
-                              * Setting this parameter to 1 will enforce non-blocking mode even when only 1 thread is selected.
-                              * It allows the caller to do other tasks while the worker thread compresses in parallel. */
+    ZSTD_p_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
+                              * When nbWorkers >= 1, triggers asynchronous mode :
+                              * ZSTD_compress_generic() consumes some input, flush some output if possible, and immediately gives back control to caller,
+                              * while compression work is performed in parallel, within worker threads.
+                              * (note : a strong exception to this rule is when first invocation sets ZSTD_e_end : it becomes a blocking call).
+                              * More workers improve speed, but also increase memory usage.
+                              * Default value is `0`, aka "single-threaded mode" : no worker is spawned, compression is performed inside Caller's thread, all invocations are blocking */
     ZSTD_p_jobSize,          /* Size of a compression job. This value is only enforced in streaming (non-blocking) mode.
                               * Each compression job is completed in parallel, so indirectly controls the nb of active threads.
                               * 0 means default, which is dynamically determined based on compression parameters.
@@ -1029,29 +1026,35 @@ typedef enum {
                               * Larger values increase memory usage and compression ratio, but decrease
                               * compression speed.
                               * Must be clamped between ZSTD_HASHLOG_MIN and ZSTD_HASHLOG_MAX
-                              * (default: windowlog - 7). */
+                              * (default: windowlog - 7).
+                              * Special: value 0 means "do not change ldmHashLog". */
     ZSTD_p_ldmMinMatch,      /* Minimum size of searched matches for long distance matcher.
                               * Larger/too small values usually decrease compression ratio.
                               * Must be clamped between ZSTD_LDM_MINMATCH_MIN
-                              * and ZSTD_LDM_MINMATCH_MAX (default: 64). */
+                              * and ZSTD_LDM_MINMATCH_MAX (default: 64).
+                              * Special: value 0 means "do not change ldmMinMatch". */
     ZSTD_p_ldmBucketSizeLog, /* Log size of each bucket in the LDM hash table for collision resolution.
                               * Larger values usually improve collision resolution but may decrease
                               * compression speed.
-                              * The maximum value is ZSTD_LDM_BUCKETSIZELOG_MAX (default: 3). */
+                              * The maximum value is ZSTD_LDM_BUCKETSIZELOG_MAX (default: 3).
+                              * note : 0 is a valid value */
     ZSTD_p_ldmHashEveryLog,  /* Frequency of inserting/looking up entries in the LDM hash table.
                               * The default is MAX(0, (windowLog - ldmHashLog)) to
                               * optimize hash table usage.
                              * Larger values improve compression speed. Deviating far from the
                               * default value will likely result in a decrease in compression ratio.
-                              * Must be clamped between 0 and ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN. */
+                              * Must be clamped between 0 and ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN.
+                              * note : 0 is a valid value */
 } ZSTD_cParameter;
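
Taken together, the notes above mean a caller can opt into long distance matching with one call and leave every LDM tuning knob at its 0 ("use default") value. A hedged sketch, assuming cctx was created elsewhere and that ZSTD_p_windowLog keeps its documented meaning:

    /* enable LDM over a 128 MB window; ldmHashLog/ldmMinMatch/... stay at their defaults */
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_windowLog, 27);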
/*! ZSTD_CCtx_setParameter() :
 *  Set one compression parameter, selected by enum ZSTD_cParameter.
+ *  Setting a parameter is generally only possible during frame initialization (before starting compression),
+ *  except for a few exceptions which can be updated during compression: compressionLevel, hashLog, chainLog, searchLog, minMatch, targetLength and strategy.
 *  Note : when `value` is an enum, cast it to unsigned for proper type checking.
- *  @result : informational value (typically, the one being set, possibly corrected),
+ *  @result : informational value (typically, value being set clamped correctly),
 *            or an error code (which can be tested with ZSTD_isError()). */
 ZSTDLIB_API size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cParameter param, unsigned value);
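
For readers tracking the nbWorkers change, a minimal sketch of how this parameter surface is driven end-to-end (error handling trimmed; requires a ZSTD_MULTITHREAD build for nbWorkers >= 1, per the notes above):

#include "zstd.h"   /* ZSTD_CCtx_setParameter, ZSTD_compress_generic : v1.3.x advanced API */

static size_t compress_mt(void* dst, size_t dstCapacity,
                          const void* src, size_t srcSize, int level)
{
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_outBuffer out = { dst, dstCapacity, 0 };
    ZSTD_inBuffer  in  = { src, srcSize, 0 };
    size_t ret;
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, (unsigned)level);
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbWorkers, 2);   /* 0 = single-threaded (default) */
    do {
        ret = ZSTD_compress_generic(cctx, &out, &in, ZSTD_e_end);
    } while (ret != 0 && !ZSTD_isError(ret));            /* >0 means "not fully flushed yet" */
    ZSTD_freeCCtx(cctx);
    return ZSTD_isError(ret) ? ret : out.pos;
}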
@@ -1198,7 +1201,7 @@ ZSTDLIB_API size_t ZSTD_compress_generic_simpleArgs (
 ZSTDLIB_API ZSTD_CCtx_params* ZSTD_createCCtxParams(void);
 
 /*! ZSTD_resetCCtxParams() :
- *  Reset params to default, with the default compression level.
+ *  Reset params to default values.
  */
 ZSTDLIB_API size_t ZSTD_resetCCtxParams(ZSTD_CCtx_params* params);
 
@@ -1227,9 +1230,10 @@ ZSTDLIB_API size_t ZSTD_CCtxParam_setParameter(ZSTD_CCtx_params* params, ZSTD_cP
 /*! ZSTD_CCtx_setParametersUsingCCtxParams() :
 *  Apply a set of ZSTD_CCtx_params to the compression context.
- *  This must be done before the dictionary is loaded.
- *  The pledgedSrcSize is treated as unknown.
- *  Multithreading parameters are applied only if nbThreads > 1.
+ *  This can be done even after compression is started,
+ *  if nbWorkers==0, this will have no impact until a new compression is started.
+ *  if nbWorkers>=1, new parameters will be picked up at next job,
+ *  with a few restrictions (windowLog, pledgedSrcSize, nbWorkers, jobSize, and overlapLog are not updated).
 */
 ZSTDLIB_API size_t ZSTD_CCtx_setParametersUsingCCtxParams(
         ZSTD_CCtx* cctx, const ZSTD_CCtx_params* params);
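
As a usage sketch for the API above: a parameter set can be built once and applied to any number of contexts. Names are taken from the declarations shown here; ZSTD_freeCCtxParams is assumed to exist alongside the create function, and error checks are elided:

static void apply_params(ZSTD_CCtx* cctx)
{
    ZSTD_CCtx_params* const params = ZSTD_createCCtxParams();
    ZSTD_CCtxParam_setParameter(params, ZSTD_p_compressionLevel, 19);
    ZSTD_CCtxParam_setParameter(params, ZSTD_p_nbWorkers, 4);
    ZSTD_CCtx_setParametersUsingCCtxParams(cctx, params);  /* picked up at next job when workers are active */
    ZSTD_freeCCtxParams(params);   /* assumption : release function paired with ZSTD_createCCtxParams() */
}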

View File

@@ -34,6 +34,7 @@
 #include <stdlib.h>      /* malloc, free */
 #include <string.h>      /* memset */
 #include <stdio.h>       /* fprintf, fopen */
+#include <assert.h>      /* assert */
 
 #include "mem.h"
 #define ZSTD_STATIC_LINKING_ONLY
@@ -51,8 +52,9 @@
 #  define ZSTD_GIT_COMMIT_STRING ZSTD_EXPAND_AND_QUOTE(ZSTD_GIT_COMMIT)
 #endif
 
-#define TIMELOOP_MICROSEC     1*1000000ULL /* 1 second */
-#define ACTIVEPERIOD_MICROSEC 70*1000000ULL /* 70 seconds */
+#define TIMELOOP_MICROSEC     (1*1000000ULL)     /* 1 second */
+#define TIMELOOP_NANOSEC      (1*1000000000ULL)  /* 1 second */
+#define ACTIVEPERIOD_MICROSEC (70*TIMELOOP_MICROSEC) /* 70 seconds */
 #define COOLPERIOD_SEC        10
 
 #define KB *(1 <<10)
@@ -122,12 +124,12 @@ void BMK_setBlockSize(size_t blockSize)
 void BMK_setDecodeOnlyMode(unsigned decodeFlag) { g_decodeOnly = (decodeFlag>0); }
 
-static U32 g_nbThreads = 1;
-void BMK_setNbThreads(unsigned nbThreads) {
+static U32 g_nbWorkers = 0;
+void BMK_setNbWorkers(unsigned nbWorkers) {
 #ifndef ZSTD_MULTITHREAD
-    if (nbThreads > 1) DISPLAYLEVEL(2, "Note : multi-threading is disabled \n");
+    if (nbWorkers > 0) DISPLAYLEVEL(2, "Note : multi-threading is disabled \n");
 #endif
-    g_nbThreads = nbThreads;
+    g_nbWorkers = nbWorkers;
 }
 
 static U32 g_realTime = 0;
@@ -264,7 +266,9 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
     {   U64 fastestC = (U64)(-1LL), fastestD = (U64)(-1LL);
         U64 const crcOrig = g_decodeOnly ? 0 : XXH64(srcBuffer, srcSize, 0);
         UTIL_time_t coolTime;
-        U64 const maxTime = (g_nbSeconds * TIMELOOP_MICROSEC) + 1;
+        U64 const maxTime = (g_nbSeconds * TIMELOOP_NANOSEC) + 1;
+        U32 nbDecodeLoops = (U32)((100 MB) / (srcSize+1)) + 1;  /* initial conservative speed estimate */
+        U32 nbCompressionLoops = (U32)((2 MB) / (srcSize+1)) + 1;  /* initial conservative speed estimate */
         U64 totalCTime=0, totalDTime=0;
         U32 cCompleted=g_decodeOnly, dCompleted=0;
 #       define NB_MARKS 4
@@ -283,19 +287,17 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
         }
 
         if (!g_decodeOnly) {
-            UTIL_time_t clockStart;
             /* Compression */
             DISPLAYLEVEL(2, "%2s-%-17.17s :%10u ->\r", marks[markNb], displayName, (U32)srcSize);
             if (!cCompleted) memset(compressedBuffer, 0xE5, maxCompressedSize);  /* warm up and erase result buffer */
 
-            UTIL_sleepMilli(1);  /* give processor time to other processes */
+            UTIL_sleepMilli(5);  /* give processor time to other processes */
             UTIL_waitForNextTick();
-            clockStart = UTIL_getTime();
 
             if (!cCompleted) {   /* still some time to do compression tests */
-                U64 const clockLoop = g_nbSeconds ? TIMELOOP_MICROSEC : 1;
                 U32 nbLoops = 0;
-                ZSTD_CCtx_setParameter(ctx, ZSTD_p_nbThreads, g_nbThreads);
+                UTIL_time_t const clockStart = UTIL_getTime();
+                ZSTD_CCtx_setParameter(ctx, ZSTD_p_nbWorkers, g_nbWorkers);
                 ZSTD_CCtx_setParameter(ctx, ZSTD_p_compressionLevel, cLevel);
                 ZSTD_CCtx_setParameter(ctx, ZSTD_p_enableLongDistanceMatching, g_ldmFlag);
                 ZSTD_CCtx_setParameter(ctx, ZSTD_p_ldmMinMatch, g_ldmMinMatch);
@@ -314,7 +316,9 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
                 ZSTD_CCtx_setParameter(ctx, ZSTD_p_targetLength, comprParams->targetLength);
                 ZSTD_CCtx_setParameter(ctx, ZSTD_p_compressionStrategy, comprParams->strategy);
                 ZSTD_CCtx_loadDictionary(ctx, dictBuffer, dictBufferSize);
-                do {
+
+                if (!g_nbSeconds) nbCompressionLoops=1;
+                for (nbLoops=0; nbLoops<nbCompressionLoops; nbLoops++) {
                     U32 blockNb;
                     for (blockNb=0; blockNb<nbBlocks; blockNb++) {
 #if 0   /* direct compression function, for occasional comparison */
@@ -343,12 +347,16 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
                         }
                         blockTable[blockNb].cSize = out.pos;
 #endif
                     }
-                    nbLoops++;
-                } while (UTIL_clockSpanMicro(clockStart) < clockLoop);
-                {   U64 const loopDuration = UTIL_clockSpanMicro(clockStart);
-                    if (loopDuration < fastestC*nbLoops)
-                        fastestC = loopDuration / nbLoops;
+                }
+                {   U64 const loopDuration = UTIL_clockSpanNano(clockStart);
+                    if (loopDuration > 0) {
+                        if (loopDuration < fastestC * nbCompressionLoops)
+                            fastestC = loopDuration / nbCompressionLoops;
+                        nbCompressionLoops = (U32)(TIMELOOP_NANOSEC / fastestC) + 1;
+                    } else {
+                        assert(nbCompressionLoops < 40000000);   /* avoid overflow */
+                        nbCompressionLoops *= 100;
+                    }
                     totalCTime += loopDuration;
                     cCompleted = (totalCTime >= maxTime);   /* end compression tests */
         }   }
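
The rewrite above replaces "spin until the clock says stop" with self-calibration: run a planned batch of iterations, time the whole batch with the nanosecond clock, then size the next batch so each measurement lasts about one second (and grow the batch 100x when the clock is too coarse to register anything). A standalone sketch of the same technique, assuming POSIX clock_gettime and a hypothetical work() callback:

#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NANOSEC 1000000000ULL

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * NANOSEC + (uint64_t)ts.tv_nsec;
}

/* returns best observed ns per call of work(), recalibrating the batch size
 * after each measurement so that a batch lasts roughly one second */
static uint64_t bench_calibrated(void (*work)(void), unsigned nbMeasures)
{
    uint64_t best = UINT64_MAX;
    uint32_t nbLoops = 4;   /* conservative initial estimate */
    unsigned m;
    for (m = 0; m < nbMeasures; m++) {
        uint32_t i;
        uint64_t const start = now_ns();
        for (i = 0; i < nbLoops; i++) work();
        {   uint64_t const duration = now_ns() - start;
            if (duration > 0) {
                uint64_t const perLoop = duration / nbLoops;
                if (perLoop < best) best = perLoop;
                nbLoops = (uint32_t)(NANOSEC / (best + 1)) + 1;   /* aim for ~1 second per batch */
            } else {
                assert(nbLoops < 40000000);   /* avoid overflow, as in bench.c */
                nbLoops *= 100;               /* clock too coarse : grow the batch */
            }
        }
    }
    return best;
}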
@@ -358,7 +366,7 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
             ratio = (double)srcSize / (double)cSize;
             markNb = (markNb+1) % NB_MARKS;
             {   int const ratioAccuracy = (ratio < 10.) ? 3 : 2;
-                double const compressionSpeed = (double)srcSize / fastestC;
+                double const compressionSpeed = ((double)srcSize / fastestC) * 1000;
                 int const cSpeedAccuracy = (compressionSpeed < 10.) ? 2 : 1;
                 DISPLAYLEVEL(2, "%2s-%-17.17s :%10u ->%10u (%5.*f),%6.*f MB/s\r",
                         marks[markNb], displayName, (U32)srcSize, (U32)cSize,
@@ -376,16 +384,16 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
             /* Decompression */
             if (!dCompleted) memset(resultBuffer, 0xD6, srcSize);  /* warm result buffer */
 
-            UTIL_sleepMilli(1);  /* give processor time to other processes */
+            UTIL_sleepMilli(5);  /* give processor time to other processes */
             UTIL_waitForNextTick();
 
             if (!dCompleted) {
-                U64 clockLoop = g_nbSeconds ? TIMELOOP_MICROSEC : 1;
                 U32 nbLoops = 0;
                 ZSTD_DDict* const ddict = ZSTD_createDDict(dictBuffer, dictBufferSize);
                 UTIL_time_t const clockStart = UTIL_getTime();
                 if (!ddict) EXM_THROW(2, "ZSTD_createDDict() allocation failure");
-                do {
+                if (!g_nbSeconds) nbDecodeLoops = 1;
+                for (nbLoops=0; nbLoops < nbDecodeLoops; nbLoops++) {
                     U32 blockNb;
                     for (blockNb=0; blockNb<nbBlocks; blockNb++) {
                         size_t const regenSize = ZSTD_decompress_usingDDict(dctx,
@@ -397,22 +405,26 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
                                 blockNb, (U32)blockTable[blockNb].cSize, ZSTD_getErrorName(regenSize));
                         }
                         blockTable[blockNb].resSize = regenSize;
-                    }
-                    nbLoops++;
-                } while (UTIL_clockSpanMicro(clockStart) < clockLoop);
+                }   }
                 ZSTD_freeDDict(ddict);
-                {   U64 const loopDuration = UTIL_clockSpanMicro(clockStart);
-                    if (loopDuration < fastestD*nbLoops)
-                        fastestD = loopDuration / nbLoops;
+                {   U64 const loopDuration = UTIL_clockSpanNano(clockStart);
+                    if (loopDuration > 0) {
+                        if (loopDuration < fastestD * nbDecodeLoops)
+                            fastestD = loopDuration / nbDecodeLoops;
+                        nbDecodeLoops = (U32)(TIMELOOP_NANOSEC / fastestD) + 1;
+                    } else {
+                        assert(nbDecodeLoops < 40000000);   /* avoid overflow */
+                        nbDecodeLoops *= 100;
+                    }
                     totalDTime += loopDuration;
                     dCompleted = (totalDTime >= maxTime);
             }   }
 
             markNb = (markNb+1) % NB_MARKS;
             {   int const ratioAccuracy = (ratio < 10.) ? 3 : 2;
-                double const compressionSpeed = (double)srcSize / fastestC;
+                double const compressionSpeed = ((double)srcSize / fastestC) * 1000;
                 int const cSpeedAccuracy = (compressionSpeed < 10.) ? 2 : 1;
-                double const decompressionSpeed = (double)srcSize / fastestD;
+                double const decompressionSpeed = ((double)srcSize / fastestD) * 1000;
                 DISPLAYLEVEL(2, "%2s-%-17.17s :%10u ->%10u (%5.*f),%6.*f MB/s ,%6.1f MB/s \r",
                         marks[markNb], displayName, (U32)srcSize, (U32)cSize,
                         ratioAccuracy, ratio,
@@ -461,8 +473,8 @@ static int BMK_benchMem(const void* srcBuffer, size_t srcSize,
     }   /* for (testNb = 1; testNb <= (g_nbSeconds + !g_nbSeconds); testNb++) */
 
     if (g_displayLevel == 1) {   /* hidden display mode -q, used by python speed benchmark */
-        double const cSpeed = (double)srcSize / fastestC;
-        double const dSpeed = (double)srcSize / fastestD;
+        double const cSpeed = ((double)srcSize / fastestC) * 1000;
+        double const dSpeed = ((double)srcSize / fastestD) * 1000;
         if (g_additionalParam)
             DISPLAY("-%-3i%11i (%5.3f) %6.2f MB/s %6.1f MB/s  %s (param=%d)\n", cLevel, (int)cSize, ratio, cSpeed, dSpeed, displayName, g_additionalParam);
         else
@@ -634,7 +646,8 @@ static void BMK_benchFileTable(const char* const * const fileNamesTable, unsigne
     }
 }
 
-static void BMK_syntheticTest(int cLevel, int cLevelLast, double compressibility, const ZSTD_compressionParameters* compressionParams)
+static void BMK_syntheticTest(int cLevel, int cLevelLast, double compressibility,
+                              const ZSTD_compressionParameters* compressionParams)
 {
     char name[20] = {0};
     size_t benchedSize = 10000000;

View File

@@ -22,7 +22,7 @@ int BMK_benchFiles(const char** fileNamesTable, unsigned nbFiles, const char* di
 /* Set Parameters */
 void BMK_setNbSeconds(unsigned nbLoops);
 void BMK_setBlockSize(size_t blockSize);
-void BMK_setNbThreads(unsigned nbThreads);
+void BMK_setNbWorkers(unsigned nbWorkers);
 void BMK_setRealTime(unsigned priority);
 void BMK_setNotificationLevel(unsigned level);
 void BMK_setSeparateFiles(unsigned separate);

View File

@@ -36,18 +36,21 @@
 #  include <io.h>
 #endif
 
-#include "bitstream.h"
 #include "mem.h"
 #include "fileio.h"
 #include "util.h"
 
 #define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_magicNumber, ZSTD_frameHeaderSize_max */
 #include "zstd.h"
+#include "zstd_errors.h"           /* ZSTD_error_frameParameter_windowTooLarge */
 
 #if defined(ZSTD_GZCOMPRESS) || defined(ZSTD_GZDECOMPRESS)
 #  include <zlib.h>
 #  if !defined(z_const)
 #    define z_const
 #  endif
 #endif
 
 #if defined(ZSTD_LZMACOMPRESS) || defined(ZSTD_LZMADECOMPRESS)
 #  include <lzma.h>
 #endif
@@ -215,23 +218,23 @@ static U32 g_removeSrcFile = 0;
 void FIO_setRemoveSrcFile(unsigned flag) { g_removeSrcFile = (flag>0); }
 static U32 g_memLimit = 0;
 void FIO_setMemLimit(unsigned memLimit) { g_memLimit = memLimit; }
-static U32 g_nbThreads = 1;
-void FIO_setNbThreads(unsigned nbThreads) {
+static U32 g_nbWorkers = 1;
+void FIO_setNbWorkers(unsigned nbWorkers) {
 #ifndef ZSTD_MULTITHREAD
-    if (nbThreads > 1) DISPLAYLEVEL(2, "Note : multi-threading is disabled \n");
+    if (nbWorkers > 0) DISPLAYLEVEL(2, "Note : multi-threading is disabled \n");
 #endif
-    g_nbThreads = nbThreads;
+    g_nbWorkers = nbWorkers;
 }
 static U32 g_blockSize = 0;
 void FIO_setBlockSize(unsigned blockSize) {
-    if (blockSize && g_nbThreads==1)
+    if (blockSize && g_nbWorkers==0)
         DISPLAYLEVEL(2, "Setting block size is useless in single-thread mode \n");
     g_blockSize = blockSize;
 }
 #define FIO_OVERLAP_LOG_NOTSET 9999
 static U32 g_overlapLog = FIO_OVERLAP_LOG_NOTSET;
 void FIO_setOverlapLog(unsigned overlapLog){
-    if (overlapLog && g_nbThreads==1)
+    if (overlapLog && g_nbWorkers==0)
         DISPLAYLEVEL(2, "Setting overlapLog is useless in single-thread mode \n");
     g_overlapLog = overlapLog;
 }
@@ -427,7 +430,7 @@ static cRess_t FIO_createCResources(const char* dictFileName, int cLevel,
     if (!ress.srcBuffer || !ress.dstBuffer)
         EXM_THROW(31, "allocation error : not enough memory");
 
-    /* Advances parameters, including dictionary */
+    /* Advanced parameters, including dictionary */
     {   void* dictBuffer;
         size_t const dictBuffSize = FIO_createDictBuffer(&dictBuffer, dictFileName);   /* works with dictFileName==NULL */
         if (dictFileName && (dictBuffer==NULL))
@@ -439,8 +442,7 @@ static cRess_t FIO_createCResources(const char* dictFileName, int cLevel,
         /* compression level */
         CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_compressionLevel, cLevel) );
         /* long distance matching */
-        CHECK( ZSTD_CCtx_setParameter(
-                  ress.cctx, ZSTD_p_enableLongDistanceMatching, g_ldmFlag) );
+        CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_enableLongDistanceMatching, g_ldmFlag) );
         CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_ldmHashLog, g_ldmHashLog) );
         CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_ldmMinMatch, g_ldmMinMatch) );
         if (g_ldmBucketSizeLog != FIO_LDM_PARAM_NOTSET) {
@@ -458,13 +460,12 @@ static cRess_t FIO_createCResources(const char* dictFileName, int cLevel,
         CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_targetLength, comprParams->targetLength) );
         CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_compressionStrategy, (U32)comprParams->strategy) );
         /* multi-threading */
-        DISPLAYLEVEL(5,"set nb threads = %u \n", g_nbThreads);
-        CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_nbThreads, g_nbThreads) );
 #ifdef ZSTD_MULTITHREAD
-        CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_nonBlockingMode, 1) );
+        DISPLAYLEVEL(5,"set nb workers = %u \n", g_nbWorkers);
+        CHECK( ZSTD_CCtx_setParameter(ress.cctx, ZSTD_p_nbWorkers, g_nbWorkers) );
 #endif
         /* dictionary */
-        CHECK( ZSTD_CCtx_setPledgedSrcSize(ress.cctx, srcSize) );   /* just for dictionary loading, for compression parameters adaptation */
+        CHECK( ZSTD_CCtx_setPledgedSrcSize(ress.cctx, srcSize) );   /* set the value temporarily for dictionary loading, to adapt compression parameters */
         CHECK( ZSTD_CCtx_loadDictionary(ress.cctx, dictBuffer, dictBuffSize) );
         CHECK( ZSTD_CCtx_setPledgedSrcSize(ress.cctx, ZSTD_CONTENTSIZE_UNKNOWN) );   /* reset */
@@ -735,56 +736,22 @@ static unsigned long long FIO_compressLz4Frame(cRess_t* ress,
  * @return : 0 : compression completed correctly,
  *           1 : missing or pb opening srcFileName
  */
-static int FIO_compressFilename_internal(cRess_t ress,
-                                         const char* dstFileName, const char* srcFileName, int compressionLevel)
+static unsigned long long
+FIO_compressZstdFrame(const cRess_t* ressPtr,
+                      const char* srcFileName, U64 fileSize,
+                      int compressionLevel, U64* readsize)
 {
+    cRess_t const ress = *ressPtr;
     FILE* const srcFile = ress.srcFile;
     FILE* const dstFile = ress.dstFile;
-    U64 readsize = 0;
     U64 compressedfilesize = 0;
-    U64 const fileSize = UTIL_getFileSize(srcFileName);
     ZSTD_EndDirective directive = ZSTD_e_continue;
-    DISPLAYLEVEL(5, "%s: %u bytes \n", srcFileName, (U32)fileSize);
+    DISPLAYLEVEL(6, "compression using zstd format \n");
 
-    switch (g_compressionType) {
-        case FIO_zstdCompression:
-            break;
-
-        case FIO_gzipCompression:
-#ifdef ZSTD_GZCOMPRESS
-            compressedfilesize = FIO_compressGzFrame(&ress, srcFileName, fileSize, compressionLevel, &readsize);
-#else
-            (void)compressionLevel;
-            EXM_THROW(20, "zstd: %s: file cannot be compressed as gzip (zstd compiled without ZSTD_GZCOMPRESS) -- ignored \n",
-                            srcFileName);
-#endif
-            goto finish;
-
-        case FIO_xzCompression:
-        case FIO_lzmaCompression:
-#ifdef ZSTD_LZMACOMPRESS
-            compressedfilesize = FIO_compressLzmaFrame(&ress, srcFileName, fileSize, compressionLevel, &readsize, g_compressionType==FIO_lzmaCompression);
-#else
-            (void)compressionLevel;
-            EXM_THROW(20, "zstd: %s: file cannot be compressed as xz/lzma (zstd compiled without ZSTD_LZMACOMPRESS) -- ignored \n",
-                            srcFileName);
-#endif
-            goto finish;
-
-        case FIO_lz4Compression:
-#ifdef ZSTD_LZ4COMPRESS
-            compressedfilesize = FIO_compressLz4Frame(&ress, srcFileName, fileSize, compressionLevel, &readsize);
-#else
-            (void)compressionLevel;
-            EXM_THROW(20, "zstd: %s: file cannot be compressed as lz4 (zstd compiled without ZSTD_LZ4COMPRESS) -- ignored \n",
-                            srcFileName);
-#endif
-            goto finish;
-    }
-
     /* init */
     if (fileSize != UTIL_FILESIZE_UNKNOWN)
         ZSTD_CCtx_setPledgedSrcSize(ress.cctx, fileSize);
-    (void)compressionLevel;
+    (void)srcFileName;
 
     /* Main compression loop */
     do {
@@ -793,9 +760,9 @@ static int FIO_compressFilename_internal(cRess_t ress,
         size_t const inSize = fread(ress.srcBuffer, (size_t)1, ress.srcBufferSize, srcFile);
         ZSTD_inBuffer inBuff = { ress.srcBuffer, inSize, 0 };
         DISPLAYLEVEL(6, "fread %u bytes from source \n", (U32)inSize);
-        readsize += inSize;
+        *readsize += inSize;
 
-        if (inSize == 0 || (fileSize != UTIL_FILESIZE_UNKNOWN && readsize == fileSize))
+        if ((inSize == 0) || (*readsize == fileSize))
             directive = ZSTD_e_end;
 
         result = 1;
@@ -809,12 +776,13 @@ static int FIO_compressFilename_internal(cRess_t ress,
             if (outBuff.pos) {
                 size_t const sizeCheck = fwrite(ress.dstBuffer, 1, outBuff.pos, dstFile);
                 if (sizeCheck!=outBuff.pos)
-                    EXM_THROW(25, "Write error : cannot write compressed block into %s", dstFileName);
+                    EXM_THROW(25, "Write error : cannot write compressed block");
                 compressedfilesize += outBuff.pos;
             }
             if (READY_FOR_UPDATE()) {
                 ZSTD_frameProgression const zfp = ZSTD_getFrameProgression(ress.cctx);
-                DISPLAYUPDATE(2, "\rRead :%6u MB - Consumed :%6u MB - Compressed :%6u MB => %.2f%%",
+                DISPLAYUPDATE(2, "\r(%i) Read :%6u MB - Consumed :%6u MB - Compressed :%6u MB => %.2f%%",
+                                compressionLevel,
                                 (U32)(zfp.ingested >> 20),
                                 (U32)(zfp.consumed >> 20),
                                 (U32)(zfp.produced >> 20),
@@ -823,10 +791,67 @@ static int FIO_compressFilename_internal(cRess_t ress,
         }
     } while (directive != ZSTD_e_end);
 
-finish:
+    return compressedfilesize;
+}
+
+/*! FIO_compressFilename_internal() :
+ *  same as FIO_compressFilename_extRess(), with `ress.desFile` already opened.
+ *  @return : 0 : compression completed correctly,
+ *            1 : missing or pb opening srcFileName
+ */
+static int
+FIO_compressFilename_internal(cRess_t ress,
+                              const char* dstFileName, const char* srcFileName,
+                              int compressionLevel)
+{
+    U64 readsize = 0;
+    U64 compressedfilesize = 0;
+    U64 const fileSize = UTIL_getFileSize(srcFileName);
+    DISPLAYLEVEL(5, "%s: %u bytes \n", srcFileName, (U32)fileSize);
+
+    /* compression format selection */
+    switch (g_compressionType) {
+        default:
+        case FIO_zstdCompression:
+            compressedfilesize = FIO_compressZstdFrame(&ress, srcFileName, fileSize, compressionLevel, &readsize);
+            break;
+
+        case FIO_gzipCompression:
+#ifdef ZSTD_GZCOMPRESS
+            compressedfilesize = FIO_compressGzFrame(&ress, srcFileName, fileSize, compressionLevel, &readsize);
+#else
+            (void)compressionLevel;
+            EXM_THROW(20, "zstd: %s: file cannot be compressed as gzip (zstd compiled without ZSTD_GZCOMPRESS) -- ignored \n",
+                            srcFileName);
+#endif
+            break;
+
+        case FIO_xzCompression:
+        case FIO_lzmaCompression:
+#ifdef ZSTD_LZMACOMPRESS
+            compressedfilesize = FIO_compressLzmaFrame(&ress, srcFileName, fileSize, compressionLevel, &readsize, g_compressionType==FIO_lzmaCompression);
+#else
+            (void)compressionLevel;
+            EXM_THROW(20, "zstd: %s: file cannot be compressed as xz/lzma (zstd compiled without ZSTD_LZMACOMPRESS) -- ignored \n",
+                            srcFileName);
+#endif
+            break;
+
+        case FIO_lz4Compression:
+#ifdef ZSTD_LZ4COMPRESS
+            compressedfilesize = FIO_compressLz4Frame(&ress, srcFileName, fileSize, compressionLevel, &readsize);
+#else
+            (void)compressionLevel;
+            EXM_THROW(20, "zstd: %s: file cannot be compressed as lz4 (zstd compiled without ZSTD_LZ4COMPRESS) -- ignored \n",
+                            srcFileName);
+#endif
+            break;
+    }
+
     /* Status */
     DISPLAYLEVEL(2, "\r%79s\r", "");
-    DISPLAYLEVEL(2,"%-20s :%6.2f%%   (%6llu => %6llu bytes, %s) \n", srcFileName,
+    DISPLAYLEVEL(2,"%-20s :%6.2f%%   (%6llu => %6llu bytes, %s) \n",
+        srcFileName,
         (double)compressedfilesize / (readsize+(!readsize)/*avoid div by zero*/) * 100,
         (unsigned long long)readsize, (unsigned long long) compressedfilesize,
         dstFileName);
@@ -1142,33 +1167,46 @@ static unsigned FIO_passThrough(FILE* foutput, FILE* finput, void* buffer, size_
     return 0;
 }
 
-static void FIO_zstdErrorHelp(dRess_t* ress, size_t ret, char const* srcFileName)
+/* FIO_highbit64() :
+ * gives position of highest bit.
+ * note : only works for v > 0 !
+ */
+static unsigned FIO_highbit64(unsigned long long v)
+{
+    unsigned count = 0;
+    assert(v != 0);
+    v >>= 1;
+    while (v) { v >>= 1; count++; }
+    return count;
+}
+
+/* FIO_zstdErrorHelp() :
+ * detailed error message when requested window size is too large */
+static void FIO_zstdErrorHelp(dRess_t* ress, size_t err, char const* srcFileName)
 {
     ZSTD_frameHeader header;
-    /* No special help for these errors */
-    if (ZSTD_getErrorCode(ret) != ZSTD_error_frameParameter_windowTooLarge)
+
+    /* Help message only for one specific error */
+    if (ZSTD_getErrorCode(err) != ZSTD_error_frameParameter_windowTooLarge)
         return;
+
     /* Try to decode the frame header */
-    ret = ZSTD_getFrameHeader(&header, ress->srcBuffer, ress->srcBufferLoaded);
-    if (ret == 0) {
-        U32 const windowSize = (U32)header.windowSize;
-        U32 const windowLog = BIT_highbit32(windowSize) + ((windowSize & (windowSize - 1)) != 0);
-        U32 const windowMB = (windowSize >> 20) + ((windowSize & ((1 MB) - 1)) != 0);
-        assert(header.windowSize <= (U64)((U32)-1));
+    err = ZSTD_getFrameHeader(&header, ress->srcBuffer, ress->srcBufferLoaded);
+    if (err == 0) {
+        unsigned long long const windowSize = header.windowSize;
+        U32 const windowLog = FIO_highbit64(windowSize) + ((windowSize & (windowSize - 1)) != 0);
+        U32 const windowMB = (U32)((windowSize >> 20) + ((windowSize & ((1 MB) - 1)) != 0));
+        assert(windowSize < (U64)(1ULL << 52));
         assert(g_memLimit > 0);
         DISPLAYLEVEL(1, "%s : Window size larger than maximum : %llu > %u\n",
-                        srcFileName, header.windowSize, g_memLimit);
+                        srcFileName, windowSize, g_memLimit);
         if (windowLog <= ZSTD_WINDOWLOG_MAX) {
             DISPLAYLEVEL(1, "%s : Use --long=%u or --memory=%uMB\n",
                             srcFileName, windowLog, windowMB);
            return;
         }
-    } else if (ZSTD_getErrorCode(ret) != ZSTD_error_frameParameter_windowTooLarge) {
-        DISPLAYLEVEL(1, "%s : Error decoding frame header to read window size : %s\n",
-                        srcFileName, ZSTD_getErrorName(ret));
-        return;
     }
-    DISPLAYLEVEL(1, "%s : Window log larger than ZSTD_WINDOWLOG_MAX=%u not supported\n",
+    DISPLAYLEVEL(1, "%s : Window log larger than ZSTD_WINDOWLOG_MAX=%u; not supported\n",
                     srcFileName, ZSTD_WINDOWLOG_MAX);
 }
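
FIO_highbit64() computes floor(log2(v)); adding the `(windowSize & (windowSize - 1)) != 0` term rounds up for non-powers-of-two, which yields exactly the window log the error message suggests passing to `--long=`. A small standalone check of that identity (same loop as the function above):

#include <assert.h>

static unsigned highbit64(unsigned long long v)   /* floor(log2(v)), v > 0 */
{
    unsigned count = 0;
    assert(v != 0);
    v >>= 1;
    while (v) { v >>= 1; count++; }
    return count;
}

int main(void)
{
    unsigned long long const windowSize = (1ULL << 27) + 1;   /* just above 128 MiB */
    /* windowLog = ceil(log2(windowSize)), via the round-up term */
    unsigned const windowLog = highbit64(windowSize) + ((windowSize & (windowSize - 1)) != 0);
    assert(highbit64(1ULL << 27) == 27);   /* exact power of 2 : no rounding */
    assert(windowLog == 28);               /* non-power of 2 : rounds up */
    return 0;
}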

View File

@@ -54,7 +54,7 @@ void FIO_setDictIDFlag(unsigned dictIDFlag);
 void FIO_setChecksumFlag(unsigned checksumFlag);
 void FIO_setRemoveSrcFile(unsigned flag);
 void FIO_setMemLimit(unsigned memLimit);
-void FIO_setNbThreads(unsigned nbThreads);
+void FIO_setNbWorkers(unsigned nbWorkers);
 void FIO_setBlockSize(unsigned blockSize);
 void FIO_setOverlapLog(unsigned overlapLog);
 void FIO_setLdmFlag(unsigned ldmFlag);

View File

@@ -142,7 +142,9 @@ static int g_utilDisplayLevel;
         }
         return 1000000000ULL*(clockEnd.QuadPart - clockStart.QuadPart)/ticksPerSecond.QuadPart;
     }
+
 #elif defined(__APPLE__) && defined(__MACH__)
+
     #include <mach/mach_time.h>
     #define UTIL_TIME_INITIALIZER 0
     typedef U64 UTIL_time_t;
@@ -167,7 +169,9 @@ static int g_utilDisplayLevel;
     }
         return ((clockEnd - clockStart) * (U64)rate.numer) / ((U64)rate.denom);
     }
+
 #elif (PLATFORM_POSIX_VERSION >= 200112L) && (defined __UCLIBC__ || ((__GLIBC__ == 2 && __GLIBC_MINOR__ >= 17) || __GLIBC__ > 2))
+
     #define UTIL_TIME_INITIALIZER { 0, 0 }
     typedef struct timespec UTIL_freq_t;
     typedef struct timespec UTIL_time_t;
@@ -223,6 +227,12 @@ UTIL_STATIC U64 UTIL_clockSpanMicro( UTIL_time_t clockStart )
     return UTIL_getSpanTimeMicro(clockStart, clockEnd);
 }
 
+/* returns time span in nanoseconds */
+UTIL_STATIC U64 UTIL_clockSpanNano(UTIL_time_t clockStart)
+{
+    UTIL_time_t const clockEnd = UTIL_getTime();
+    return UTIL_getSpanTimeNano(clockStart, clockEnd);
+}
 
 UTIL_STATIC void UTIL_waitForNextTick(void)
 {
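
The new helper pairs with UTIL_getTime() in the same start/span pattern as UTIL_clockSpanMicro(). A hypothetical caller (fn stands in for any workload), using only the names declared above:

#include <stdio.h>
#include "util.h"   /* UTIL_getTime, UTIL_clockSpanNano */

static void time_section(void (*fn)(void))
{
    UTIL_time_t const clockStart = UTIL_getTime();
    fn();   /* hypothetical workload */
    printf("section took %llu ns \n",
           (unsigned long long)UTIL_clockSpanNano(clockStart));
}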

View File

@@ -116,10 +116,16 @@ the last one takes effect.
     Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
     `--memory=windowSize` needs to be passed to the decompressor.
 * `-T#`, `--threads=#`:
-    Compress using `#` threads (default: 1).
+    Compress using `#` working threads (default: 1).
     If `#` is 0, attempt to detect and use the number of physical CPU cores.
-    In all cases, the nb of threads is capped to ZSTDMT_NBTHREADS_MAX==256.
+    In all cases, the nb of threads is capped to ZSTDMT_NBTHREADS_MAX==200.
     This modifier does nothing if `zstd` is compiled without multithread support.
+* `--single-thread`:
+    Does not spawn a thread for compression, use caller thread instead.
+    This is the only available mode when multithread support is disabled.
+    In this mode, compression is serialized with I/O.
+    (This is different from `-T1`, which spawns 1 compression thread in parallel of I/O).
+    Single-thread mode also features lower memory usage.
 * `-D file`:
     use `file` as Dictionary to compress or decompress FILE(s)
 * `--nodictID`:

View File

@@ -135,7 +135,7 @@ static int usage_advanced(const char* programName)
     DISPLAY( "--ultra : enable levels beyond %i, up to %i (requires more memory)\n", ZSTDCLI_CLEVEL_MAX, ZSTD_maxCLevel());
     DISPLAY( "--long[=#] : enable long distance matching with given window log (default: %u)\n", g_defaultMaxWindowLog);
 #ifdef ZSTD_MULTITHREAD
-    DISPLAY( " -T#    : use # threads for compression (default: 1) \n");
+    DISPLAY( " -T#    : spawns # compression threads (default: 1) \n");
     DISPLAY( " -B#    : select size of each job (default: 0==automatic) \n");
 #endif
     DISPLAY( "--no-dictID : don't write dictID into header (dictionary compression)\n");
@@ -366,21 +366,22 @@ typedef enum { zom_compress, zom_decompress, zom_test, zom_bench, zom_train, zom
 int main(int argCount, const char* argv[])
 {
     int argNb,
-        forceStdout=0,
         followLinks = 0,
+        forceStdout = 0,
+        lastCommand = 0,
+        ldmFlag = 0,
         main_pause = 0,
-        nextEntryIsDictionary=0,
-        operationResult=0,
+        nbWorkers = 0,
         nextArgumentIsOutFileName = 0,
         nextArgumentIsMaxDict = 0,
         nextArgumentIsDictID = 0,
         nextArgumentsAreFiles = 0,
-        ultra=0,
-        lastCommand = 0,
-        nbThreads = 1,
-        setRealTimePrio = 0,
+        nextEntryIsDictionary = 0,
+        operationResult = 0,
         separateFiles = 0,
-        ldmFlag = 0;
+        setRealTimePrio = 0,
+        singleThread = 0,
+        ultra=0;
     unsigned bench_nbSeconds = 3;   /* would be better if this value was synchronized from bench */
     size_t blockSize = 0;
     zstd_operation_mode operation = zom_compress;
@@ -418,11 +419,13 @@ int main(int argCount, const char* argv[])
     if (filenameTable==NULL) { DISPLAY("zstd: %s \n", strerror(errno)); exit(1); }
     filenameTable[0] = stdinmark;
     g_displayOut = stderr;
     programName = lastNameFromPath(programName);
+#ifdef ZSTD_MULTITHREAD
+    nbWorkers = 1;
+#endif
 
     /* preset behaviors */
-    if (exeNameMatch(programName, ZSTD_ZSTDMT)) nbThreads=0;
+    if (exeNameMatch(programName, ZSTD_ZSTDMT)) nbWorkers=0;
     if (exeNameMatch(programName, ZSTD_UNZSTD)) operation=zom_decompress;
     if (exeNameMatch(programName, ZSTD_CAT)) { operation=zom_decompress; forceStdout=1; FIO_overwriteMode(); outFileName=stdoutmark; g_displayLevel=1; }     /* supports multiple formats */
     if (exeNameMatch(programName, ZSTD_ZCAT)) { operation=zom_decompress; forceStdout=1; FIO_overwriteMode(); outFileName=stdoutmark; g_displayLevel=1; }    /* behave like zcat, also supports multiple formats */
@@ -481,6 +484,7 @@ int main(int argCount, const char* argv[])
     if (!strcmp(argument, "--keep")) { FIO_setRemoveSrcFile(0); continue; }
     if (!strcmp(argument, "--rm")) { FIO_setRemoveSrcFile(1); continue; }
     if (!strcmp(argument, "--priority=rt")) { setRealTimePrio = 1; continue; }
+    if (!strcmp(argument, "--single-thread")) { nbWorkers = 0; singleThread = 1; continue; }
 #ifdef ZSTD_GZCOMPRESS
     if (!strcmp(argument, "--format=gzip")) { suffix = GZ_EXTENSION; FIO_setCompressionType(FIO_gzipCompression); continue; }
 #endif
@@ -515,7 +519,7 @@ int main(int argCount, const char* argv[])
                         continue;
                     }
 #endif
-                    if (longCommandWArg(&argument, "--threads=")) { nbThreads = readU32FromChar(&argument); continue; }
+                    if (longCommandWArg(&argument, "--threads=")) { nbWorkers = readU32FromChar(&argument); continue; }
                     if (longCommandWArg(&argument, "--memlimit=")) { memLimit = readU32FromChar(&argument); continue; }
                     if (longCommandWArg(&argument, "--memory=")) { memLimit = readU32FromChar(&argument); continue; }
                     if (longCommandWArg(&argument, "--memlimit-decompress=")) { memLimit = readU32FromChar(&argument); continue; }
@@ -648,7 +652,7 @@ int main(int argCount, const char* argv[])
                     /* nb of threads (hidden option) */
                 case 'T':
                     argument++;
-                    nbThreads = readU32FromChar(&argument);
+                    nbWorkers = readU32FromChar(&argument);
                     break;
 
                     /* Dictionary Selection level */
@@ -716,11 +720,13 @@ int main(int argCount, const char* argv[])
     /* Welcome message (if verbose) */
     DISPLAYLEVEL(3, WELCOME_MESSAGE);
 
-    if (nbThreads == 0) {
-        /* try to guess */
-        nbThreads = UTIL_countPhysicalCores();
-        DISPLAYLEVEL(3, "Note: %d physical core(s) detected \n", nbThreads);
+#ifdef ZSTD_MULTITHREAD
+    if ((nbWorkers==0) && (!singleThread)) {
+        /* automatically set # workers based on # of reported cpus */
+        nbWorkers = UTIL_countPhysicalCores();
+        DISPLAYLEVEL(3, "Note: %d physical core(s) detected \n", nbWorkers);
     }
+#endif
 
     g_utilDisplayLevel = g_displayLevel;
     if (!followLinks) {
@@ -763,7 +769,7 @@ int main(int argCount, const char* argv[])
         BMK_setNotificationLevel(g_displayLevel);
         BMK_setSeparateFiles(separateFiles);
         BMK_setBlockSize(blockSize);
-        BMK_setNbThreads(nbThreads);
+        BMK_setNbWorkers(nbWorkers);
         BMK_setRealTime(setRealTimePrio);
         BMK_setNbSeconds(bench_nbSeconds);
         BMK_setLdmFlag(ldmFlag);
@@ -791,7 +797,7 @@ int main(int argCount, const char* argv[])
             zParams.dictID = dictID;
             if (cover) {
                 int const optimize = !coverParams.k || !coverParams.d;
-                coverParams.nbThreads = nbThreads;
+                coverParams.nbThreads = nbWorkers;
                 coverParams.zParams = zParams;
                 operationResult = DiB_trainFromFiles(outFileName, maxDictSize, filenameTable, filenameIdx, blockSize, NULL, &coverParams, optimize);
             } else {
@@ -835,7 +841,7 @@ int main(int argCount, const char* argv[])
     FIO_setNotificationLevel(g_displayLevel);
     if (operation==zom_compress) {
 #ifndef ZSTD_NOCOMPRESS
-        FIO_setNbThreads(nbThreads);
+        FIO_setNbWorkers(nbWorkers);
         FIO_setBlockSize((U32)blockSize);
         FIO_setLdmFlag(ldmFlag);
         FIO_setLdmHashLog(g_ldmHashLog);
View File

@@ -15,6 +15,7 @@
 #include "util.h"        /* Compiler options, UTIL_GetFileSize */
 #include <stdlib.h>      /* malloc */
 #include <stdio.h>       /* fprintf, fopen, ftello64 */
+#include <assert.h>      /* assert */
 
 #include "mem.h"         /* U32 */
 #ifndef ZSTD_DLL_IMPORT
@@ -181,7 +182,7 @@ static size_t local_ZSTD_compress_generic_T2_end(void* dst, size_t dstCapacity,
     ZSTD_inBuffer buffIn;
     (void)buff2;
     ZSTD_CCtx_setParameter(g_cstream, ZSTD_p_compressionLevel, 1);
-    ZSTD_CCtx_setParameter(g_cstream, ZSTD_p_nbThreads, 2);
+    ZSTD_CCtx_setParameter(g_cstream, ZSTD_p_nbWorkers, 2);
     buffOut.dst = dst;
     buffOut.size = dstCapacity;
     buffOut.pos = 0;
@@ -198,7 +199,7 @@ static size_t local_ZSTD_compress_generic_T2_continue(void* dst, size_t dstCapac
     ZSTD_inBuffer buffIn;
     (void)buff2;
     ZSTD_CCtx_setParameter(g_cstream, ZSTD_p_compressionLevel, 1);
-    ZSTD_CCtx_setParameter(g_cstream, ZSTD_p_nbThreads, 2);
+    ZSTD_CCtx_setParameter(g_cstream, ZSTD_p_nbWorkers, 2);
     buffOut.dst = dst;
     buffOut.size = dstCapacity;
     buffOut.pos = 0;
@@ -413,33 +414,48 @@ static size_t benchMem(const void* src, size_t srcSize, U32 benchNb)
         break;
     /* test functions */
-    /* by convention, test functions can be added > 100 */
+    /* convention: test functions have ID > 100 */
     default : ;
     }
-    { size_t i; for (i=0; i<dstBuffSize; i++) dstBuff[i]=(BYTE)i; }      /* warming up memory */
+    /* warming up memory */
+    { size_t i; for (i=0; i<dstBuffSize; i++) dstBuff[i]=(BYTE)i; }
+    /* benchmark loop */
     {   U32 loopNb;
+        U32 nbRounds = (U32)((50 MB) / (srcSize+1)) + 1;  /* initial conservative speed estimate */
 #       define TIME_SEC_MICROSEC (1*1000000ULL) /* 1 second */
-        U64 const clockLoop = TIMELOOP_S * TIME_SEC_MICROSEC;
+#       define TIME_SEC_NANOSEC (1*1000000000ULL) /* 1 second */
         DISPLAY("%2i- %-30.30s : \r", benchNb, benchName);
         for (loopNb = 1; loopNb <= g_nbIterations; loopNb++) {
             UTIL_time_t clockStart;
             size_t benchResult=0;
-            U32 nbRounds;
-            UTIL_sleepMilli(1);  /* give processor time to other processes */
+            U32 roundNb;
+            UTIL_sleepMilli(5);  /* give processor time to other processes */
             UTIL_waitForNextTick();
             clockStart = UTIL_getTime();
-            for (nbRounds=0; UTIL_clockSpanMicro(clockStart) < clockLoop; nbRounds++) {
+            for (roundNb=0; roundNb < nbRounds; roundNb++) {
                 benchResult = benchFunction(dstBuff, dstBuffSize, buff2, src, srcSize);
-                if (ZSTD_isError(benchResult)) { DISPLAY("ERROR ! %s() => %s !! \n", benchName, ZSTD_getErrorName(benchResult)); exit(1); }
-            }
-            { U64 const clockSpanMicro = UTIL_clockSpanMicro(clockStart);
-              double const averageTime = (double)clockSpanMicro / TIME_SEC_MICROSEC / nbRounds;
+                if (ZSTD_isError(benchResult)) {
+                    DISPLAY("ERROR ! %s() => %s !! \n", benchName, ZSTD_getErrorName(benchResult));
+                    exit(1);
+            }   }
+            { U64 const clockSpanNano = UTIL_clockSpanNano(clockStart);
+              double const averageTime = (double)clockSpanNano / TIME_SEC_NANOSEC / nbRounds;
+              if (clockSpanNano > 0) {
                 if (averageTime < bestTime) bestTime = averageTime;
-                DISPLAY("%2i- %-30.30s : %7.1f MB/s (%9u)\r", loopNb, benchName, (double)srcSize / (1 MB) / bestTime, (U32)benchResult);
+                assert(bestTime > (1./2000000000));
+                nbRounds = (U32)(1. / bestTime);  /* aim for 1 sec */
+                DISPLAY("%2i- %-30.30s : %7.1f MB/s (%9u)\r",
+                        loopNb, benchName,
+                        (double)srcSize / (1 MB) / bestTime,
+                        (U32)benchResult);
+              } else {
+                assert(nbRounds < 40000000);  /* avoid overflow */
+                nbRounds *= 100;
+              }
    }   }   }
    DISPLAY("%2u\n", benchNb);
@@ -573,7 +589,7 @@ int main(int argc, const char** argv)
     for(i=1; i<argc; i++) {
         const char* argument = argv[i];
-        if(!argument) continue;   /* Protection if argument empty */
+        assert(argument != NULL);
         /* Commands (note : aggregated commands are allowed) */
         if (argument[0]=='-') {
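Replacing the runtime guard with `assert()` reflects that, in a hosted C environment, `argv[0..argc-1]` are guaranteed non-null, so `if(!argument) continue;` was unreachable defensive code; the assert documents the invariant in debug runs and compiles away under `NDEBUG`. A trimmed illustration of the pattern:

```c
#include <assert.h>

int main(int argc, const char** argv)
{
    int i;
    for (i = 1; i < argc; i++) {
        const char* const argument = argv[i];
        assert(argument != NULL);   /* holds by definition of argv; checked only without NDEBUG */
        (void)argument;             /* argument parsing would go here */
    }
    return 0;
}
```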

View File

@@ -53,7 +53,7 @@ static const U32 nbTestsDefault = 30000;
 /*-************************************
 *  Display Macros
 **************************************/
-#define DISPLAY(...)          fprintf(stdout, __VA_ARGS__)
+#define DISPLAY(...)          fprintf(stderr, __VA_ARGS__)
 #define DISPLAYLEVEL(l, ...)  if (g_displayLevel>=l) { DISPLAY(__VA_ARGS__); }
 static U32 g_displayLevel = 2;
@@ -63,7 +63,7 @@ static UTIL_time_t g_displayClock = UTIL_TIME_INITIALIZER;
 #define DISPLAYUPDATE(l, ...) if (g_displayLevel>=l) { \
             if ((UTIL_clockSpanMicro(g_displayClock) > g_refreshRate) || (g_displayLevel>=4)) \
             { g_displayClock = UTIL_getTime(); DISPLAY(__VA_ARGS__); \
-            if (g_displayLevel>=4) fflush(stdout); } }
+            if (g_displayLevel>=4) fflush(stderr); } }
 #undef MIN
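Moving `DISPLAY` from `stdout` to `stderr` follows the usual convention for test diagnostics: `stdout` stays clean for data (important when output is piped or compared), and `stderr` is at most line-buffered, so `\r` progress updates tend to appear promptly. A minimal self-contained version of the same macro pair (a sketch using only the C standard library, not the suite's exact macros):

```c
#include <stdio.h>

static unsigned g_displayLevel = 2;

/* Diagnostics go to stderr: stdout stays free for piped data,
 * and stderr's weaker buffering makes progress lines show up quickly. */
#define DISPLAY(...)         fprintf(stderr, __VA_ARGS__)
#define DISPLAYLEVEL(l, ...) do { if (g_displayLevel >= (l)) DISPLAY(__VA_ARGS__); } while (0)

int main(void)
{
    DISPLAYLEVEL(2, "starting tests...\r");
    DISPLAYLEVEL(4, "verbose detail, shown only at level >= 4\n");
    DISPLAY("done \n");
    return 0;
}
```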
@@ -226,7 +226,7 @@ static int FUZ_mallocTests(unsigned seed, double compressibility, unsigned part)
         ZSTD_outBuffer out = { outBuffer, outSize, 0 };
         ZSTD_inBuffer in = { inBuffer, inSize, 0 };
         CHECK_Z( ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, (U32)compressionLevel) );
-        CHECK_Z( ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbThreads, nbThreads) );
+        CHECK_Z( ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbWorkers, nbThreads) );
         while ( ZSTD_compress_generic(cctx, &out, &in, ZSTD_e_end) ) {}
         ZSTD_freeCCtx(cctx);
         DISPLAYLEVEL(3, "compress_generic,-T%u,end level %i : ",
@@ -246,7 +246,7 @@ static int FUZ_mallocTests(unsigned seed, double compressibility, unsigned part)
         ZSTD_outBuffer out = { outBuffer, outSize, 0 };
         ZSTD_inBuffer in = { inBuffer, inSize, 0 };
         CHECK_Z( ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, (U32)compressionLevel) );
-        CHECK_Z( ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbThreads, nbThreads) );
+        CHECK_Z( ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbWorkers, nbThreads) );
         CHECK_Z( ZSTD_compress_generic(cctx, &out, &in, ZSTD_e_continue) );
         while ( ZSTD_compress_generic(cctx, &out, &in, ZSTD_e_end) ) {}
         ZSTD_freeCCtx(cctx);
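`CHECK_Z` wraps every zstd call whose `size_t` return may encode an error. Its exact definition lives elsewhere in the test sources; a plausible sketch of such a guard macro (names and formatting are illustrative, not the verbatim macro):

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Abort loudly if a zstd call reports an error; the failing expression
 * is stringified so the log points at the exact call site. */
#define CHECK_Z(f) do {                              \
    size_t const err = (f);                          \
    if (ZSTD_isError(err)) {                         \
        fprintf(stderr, "ERROR: %s : %s \n", #f,     \
                ZSTD_getErrorName(err));             \
        exit(1);                                     \
    }                                                \
} while (0)
```

This keeps each call site on a single line while still reporting zstd's error name on failure, which is why it appears around nearly every parameter-setting call in these tests.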
@@ -1209,7 +1209,7 @@ static int basicUnitTests(U32 seed, double compressibility)
     if (strcmp("No error detected", ZSTD_getErrorName(ZSTD_error_GENERIC)) != 0) goto _output_error;
     DISPLAYLEVEL(3, "OK \n");
-    DISPLAYLEVEL(4, "test%3i : testing ZSTD dictionary sizes : ", testNb++);
+    DISPLAYLEVEL(3, "test%3i : testing ZSTD dictionary sizes : ", testNb++);
     RDG_genBuffer(CNBuffer, CNBuffSize, compressibility, 0., seed);
     {
         size_t const size = MIN(128 KB, CNBuffSize);
@@ -1230,6 +1230,7 @@ static int basicUnitTests(U32 seed, double compressibility)
         ZSTD_freeCDict(lgCDict);
         ZSTD_freeCCtx(cctx);
     }
+    DISPLAYLEVEL(3, "OK \n");
 _end:
     free(CNBuffer);

View File

@@ -634,6 +634,7 @@ roundTripTest -g518K "19 --long"
 fileRoundTripTest -g5M "3 --long"
+roundTripTest -g96K "5 --single-thread"
 if [ -n "$hasMT" ]
 then
     $ECHO "\n===> zstdmt round-trip tests "

View File

@@ -94,7 +94,7 @@ static size_t cctxParamRoundTripTest(void* resultBuff, size_t resultBuffCapacity
     /* Set parameters */
     CHECK_Z( ZSTD_CCtxParam_setParameter(cctxParams, ZSTD_p_compressionLevel, cLevel) );
-    CHECK_Z( ZSTD_CCtxParam_setParameter(cctxParams, ZSTD_p_nbThreads, 2) );
+    CHECK_Z( ZSTD_CCtxParam_setParameter(cctxParams, ZSTD_p_nbWorkers, 2) );
     CHECK_Z( ZSTD_CCtxParam_setParameter(cctxParams, ZSTD_p_overlapSizeLog, 5) );
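`ZSTD_CCtxParam_setParameter()` mutates a detached parameter object rather than a live context, which lets a test (or application) build one parameter set and replay it across contexts. A sketch of the surrounding lifecycle, assuming this era's experimental names `ZSTD_createCCtxParams()`, `ZSTD_CCtx_setParametersUsingCCtxParams()` and `ZSTD_freeCCtxParams()` (the helper `setupMTParams` is illustrative):

```c
#define ZSTD_STATIC_LINKING_ONLY   /* the CCtxParams API is experimental */
#include <zstd.h>

/* Build a reusable parameter set, then apply it to a context in one step. */
static size_t setupMTParams(ZSTD_CCtx* cctx, int cLevel)
{
    ZSTD_CCtx_params* const params = ZSTD_createCCtxParams();
    size_t err;
    ZSTD_CCtxParam_setParameter(params, ZSTD_p_compressionLevel, (unsigned)cLevel);
    ZSTD_CCtxParam_setParameter(params, ZSTD_p_nbWorkers, 2);
    ZSTD_CCtxParam_setParameter(params, ZSTD_p_overlapSizeLog, 5);
    err = ZSTD_CCtx_setParametersUsingCCtxParams(cctx, params);
    ZSTD_freeCCtxParams(params);
    return err;   /* 0 on success, or a ZSTD error code */
}
```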

View File

@@ -753,9 +753,9 @@ static int basicUnitTests(U32 seed, double compressibility)
     DISPLAYLEVEL(3, "OK \n");
     /* Complex multithreading + dictionary test */
-    {   U32 const nbThreads = 2;
+    {   U32 const nbWorkers = 2;
         size_t const jobSize = 4 * 1 MB;
-        size_t const srcSize = jobSize * nbThreads;  /* we want each job to have predictable size */
+        size_t const srcSize = jobSize * nbWorkers;  /* we want each job to have predictable size */
         size_t const segLength = 2 KB;
         size_t const offset = 600 KB;   /* must be larger than window defined in cdict */
         size_t const start = jobSize + (offset-1);
@@ -763,7 +763,7 @@ static int basicUnitTests(U32 seed, double compressibility)
         BYTE* const dst = (BYTE*)CNBuffer + start - offset;
         DISPLAYLEVEL(3, "test%3i : compress %u bytes with multiple threads + dictionary : ", testNb++, (U32)srcSize);
         CHECK_Z( ZSTD_CCtx_setParameter(zc, ZSTD_p_compressionLevel, 3) );
-        CHECK_Z( ZSTD_CCtx_setParameter(zc, ZSTD_p_nbThreads, 2) );
+        CHECK_Z( ZSTD_CCtx_setParameter(zc, ZSTD_p_nbWorkers, nbWorkers) );
         CHECK_Z( ZSTD_CCtx_setParameter(zc, ZSTD_p_jobSize, jobSize) );
         assert(start > offset);
         assert(start + segLength < COMPRESSIBLE_NOISE_LENGTH);
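Sizing note: with `srcSize = jobSize * nbWorkers` and `ZSTD_p_jobSize` pinned, each worker receives exactly one job of known size, so which job can still reach the dictionary window is reproducible across runs. A hedged sketch of the same context setup (`ZSTD_CCtx_loadDictionary` is assumed from the same experimental API; the helper name and error handling are illustrative):

```c
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>

/* Configure 2 workers with a fixed job size and a dictionary,
 * mirroring the unit test's setup (error checks elided for brevity). */
static void setupMTWithDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize)
{
    unsigned const nbWorkers = 2;
    size_t const jobSize = 4 * (1u << 20);   /* 4 MB per job */
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, 3);
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbWorkers, nbWorkers);
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_jobSize, (unsigned)jobSize);
    /* a source of size jobSize * nbWorkers then maps to exactly one job per worker */
    ZSTD_CCtx_loadDictionary(cctx, dict, dictSize);
}
```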
@@ -1672,7 +1672,7 @@ static int fuzzerTests_newAPI(U32 seed, U32 nbTests, unsigned startTest, double
         U32 const nbThreadsAdjusted = (windowLogMalus < nbThreadsCandidate) ? nbThreadsCandidate - windowLogMalus : 1;
         U32 const nbThreads = MIN(nbThreadsAdjusted, nbThreadsMax);
         DISPLAYLEVEL(5, "t%u: nbThreads : %u \n", testNb, nbThreads);
-        CHECK_Z( setCCtxParameter(zc, cctxParams, ZSTD_p_nbThreads, nbThreads, useOpaqueAPI) );
+        CHECK_Z( setCCtxParameter(zc, cctxParams, ZSTD_p_nbWorkers, nbThreads, useOpaqueAPI) );
         if (nbThreads > 1) {
             U32 const jobLog = FUZ_rand(&lseed) % (testLog+1);
             CHECK_Z( setCCtxParameter(zc, cctxParams, ZSTD_p_overlapSizeLog, FUZ_rand(&lseed) % 10, useOpaqueAPI) );