Merge remote-tracking branch 'refs/remotes/Cyan4973/dev' into dev

This commit is contained in:
inikep 2016-06-17 15:17:37 +02:00
commit dba8b44370
25 changed files with 276 additions and 130 deletions

1
.gitattributes vendored
View File

@ -18,3 +18,4 @@
# Windows
*.bat text eol=crlf
*.cmd text eol=crlf

1
.gitignore vendored
View File

@ -13,6 +13,7 @@
*.dylib
# Executables
zstd
*.exe
*.out
*.app

View File

@ -51,6 +51,7 @@ all:
zstdprogram:
$(MAKE) -C $(PRGDIR)
mv $(PRGDIR)/zstd .
zlibwrapper:
$(MAKE) -C $(ZSTDDIR) all

5
NEWS
View File

@ -5,8 +5,9 @@ New : Visual build scripts, by Christophe Chevalier
New : Support for Sparse File-systems (do not use space for zero-filled sectors)
New : Frame checksum support
New : Support pass-through mode (when using `-df`)
New : API : dictionary files from custom content, by Giuseppe Ottaviano
New : API support for custom malloc/free functions
API : more efficient Dictionary API : `ZSTD_compress_usingCDict()`, `ZSTD_decompress_usingDDict()`
API : create dictionary files from custom content, by Giuseppe Ottaviano
API : support for custom malloc/free functions
New : controllable Dictionary ID
New : Support for skippable frames

View File

@ -1,4 +1,4 @@
**Zstd**, short for Zstandard, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level compression ratio.
**Zstd**, short for Zstandard, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.
It is provided as a BSD-license package, hosted on Github.
@ -7,7 +7,7 @@ It is provided as a BSD-license package, hosted on Github.
|master | [![Build Status](https://travis-ci.org/Cyan4973/zstd.svg?branch=master)](https://travis-ci.org/Cyan4973/zstd) |
|dev | [![Build Status](https://travis-ci.org/Cyan4973/zstd.svg?branch=dev)](https://travis-ci.org/Cyan4973/zstd) |
As a reference, several fast compression algorithms were tested and compared to [zlib] on a Core i7-3930K CPU @ 4.5GHz, using [lzbench], an open-source in-memory benchmark by @inikep compiled with gcc 5.2.1, on the [Silesia compression corpus].
As a reference, several fast compression algorithms were tested and compared on a Core i7-3930K CPU @ 4.5GHz, using [lzbench], an open-source in-memory benchmark by @inikep compiled with gcc 5.2.1, with the [Silesia compression corpus].
[lzbench]: https://github.com/inikep/lzbench
[Silesia compression corpus]: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
@ -16,7 +16,7 @@ As a reference, several fast compression algorithms were tested and compared to
|Name | Ratio | C.speed | D.speed |
|-----------------|-------|--------:|--------:|
| | | MB/s | MB/s |
|**zstd 0.6.0 -1**|**2.877**|**330**| **915** |
|**zstd 0.7.0 -1**|**2.877**|**325**| **930** |
| [zlib] 1.2.8 -1 | 2.730 | 95 | 360 |
| brotli -0 | 2.708 | 220 | 430 |
| QuickLZ 1.5 | 2.237 | 510 | 605 |
@ -28,16 +28,16 @@ As a reference, several fast compression algorithms were tested and compared to
[zlib]:http://www.zlib.net/
[LZ4]: http://www.lz4.org/
Zstd can also offer stronger compression ratio at the cost of compression speed.
Speed vs Compression trade-off is configurable by small increment. Decompression speed is preserved and remain roughly the same at all settings, a property shared by most LZ compression algorithms, such as [zlib].
Zstd can also offer stronger compression ratios at the cost of compression speed.
Speed vs Compression trade-off is configurable by small increment. Decompression speed is preserved and remain roughly the same at all settings, a property shared by most LZ compression algorithms, such as [zlib] or lzma.
The following test is run on a Core i7-3930K CPU @ 4.5GHz, using [lzbench], an open-source in-memory benchmark by @inikep compiled with gcc 5.2.1, on the [Silesia compression corpus].
The following tests were run on a Core i7-3930K CPU @ 4.5GHz, using [lzbench], an open-source in-memory benchmark by @inikep compiled with gcc 5.2.1, on the [Silesia compression corpus].
Compression Speed vs Ratio | Decompression Speed
---------------------------|--------------------
![Compression Speed vs Ratio](images/Cspeed4.png "Compression Speed vs Ratio") | ![Decompression Speed](images/Dspeed4.png "Decompression Speed")
Several algorithms can produce higher compression ratio at slower speed, falling outside of the graph.
Several algorithms can produce higher compression ratio but at slower speed, falling outside of the graph.
For a larger picture including very slow modes, [click on this link](images/DCspeed5.png) .
@ -74,8 +74,10 @@ Hence, deploying one dictionary per type of data will provide the greater benefi
### Status
Zstd is in development. The internal format evolves to reach better performance. "Final Format" is projected H1 2016, and will be tagged `v1.0`. Zstd offers legacy support, meaning any data compressed by any version >= 0.1 (therefore including current one) remain decodable in the future.
The library is also quite robust, able to withstand hazards situations, including invalid inputs. Library reliability has been tested using [Fuzz Testing](https://en.wikipedia.org/wiki/Fuzz_testing), with both [internal tools](programs/fuzzer.c) and [external ones](http://lcamtuf.coredump.cx/afl). Therefore, Zstandard is considered safe for production environments.
Zstd compression format has reached "Final status". It means it is planned to become the official stable zstd format and be tagged `v1.0`. The reason it's not yet tagged `v1.0` is that it currently performs its "validation period", making sure the format holds all its promises and nothing was missed.
Zstd library also offers legacy decoder support. Any data compressed by any version >= `v0.1` (hence including current one) remains decodable now and in the future.
The library has been validated using strong [fuzzer tests](https://en.wikipedia.org/wiki/Fuzz_testing), including both [internal tools](programs/fuzzer.c) and [external ones](http://lcamtuf.coredump.cx/afl). It's able to withstand hazard situations, including invalid inputs.
As a consequence, Zstandard is considered safe for, and is currently used in, production environments.
### Branch Policy

View File

@ -132,8 +132,8 @@ size_t FSE_count(unsigned* count, unsigned* maxSymbolValuePtr, const void* src,
/*! FSE_optimalTableLog():
dynamically downsize 'tableLog' when conditions are met.
It saves CPU time, by using smaller tables, while preserving or even improving compression ratio.
@return : recommended tableLog (necessarily <= initial 'tableLog') */
unsigned FSE_optimalTableLog(unsigned tableLog, size_t srcSize, unsigned maxSymbolValue);
@return : recommended tableLog (necessarily <= 'maxTableLog') */
unsigned FSE_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue);
/*! FSE_normalizeCount():
normalize counts so that sum(count[]) == Power_of_2 (2^tableLog)

View File

@ -64,7 +64,7 @@
#endif
#define ZSTD_OPT_NUM (1<<12)
#define ZSTD_DICT_MAGIC 0xEC30A437
#define ZSTD_DICT_MAGIC 0xEC30A437 /* v0.7 */
#define ZSTD_REP_NUM 3
#define ZSTD_REP_INIT ZSTD_REP_NUM

View File

@ -1128,7 +1128,6 @@ void ZSTD_compressBlock_fast_generic(ZSTD_CCtx* cctx,
size_t offset_1=cctx->rep[0], offset_2=cctx->rep[1];
/* init */
ZSTD_resetSeqStore(seqStorePtr);
ip += (ip==lowest);
{ U32 const maxRep = (U32)(ip-lowest);
if (offset_1 > maxRep) offset_1 = 0;
@ -1239,7 +1238,6 @@ static void ZSTD_compressBlock_fast_extDict_generic(ZSTD_CCtx* ctx,
U32 offset_1=ctx->rep[0], offset_2=ctx->rep[1];
/* init */
ZSTD_resetSeqStore(seqStorePtr);
/* skip first position to avoid read overflow during repcode match check */
hashTable[ZSTD_hashPtr(ip, hBits, mls)] = (U32)(ip-base);
ip++;
@ -1743,7 +1741,6 @@ void ZSTD_compressBlock_lazy_generic(ZSTD_CCtx* ctx,
/* init */
ip += (ip==base);
ctx->nextToUpdate3 = ctx->nextToUpdate;
ZSTD_resetSeqStore(seqStorePtr);
{ U32 i;
U32 const maxRep = (U32)(ip-base);
for (i=0; i<ZSTD_REP_INIT; i++) {
@ -1847,7 +1844,7 @@ _storeSequence:
/* Save reps for next block */
{ int i;
for (i=0; i<ZSTD_REP_NUM; i++) {
if (!rep[i]) rep[i] = (U32)(iend-base); /* in case some zero are left */
if (!rep[i]) rep[i] = (U32)(iend - ctx->base); /* in case some zero are left */
ctx->savedRep[i] = rep[i];
} }
@ -1913,7 +1910,6 @@ void ZSTD_compressBlock_lazy_extDict_generic(ZSTD_CCtx* ctx,
{ U32 i; for (i=0; i<ZSTD_REP_INIT; i++) rep[i]=ctx->rep[i]; }
ctx->nextToUpdate3 = ctx->nextToUpdate;
ZSTD_resetSeqStore(seqStorePtr);
ip += (ip == prefixStart);
/* Match Loop */
@ -2097,11 +2093,7 @@ typedef void (*ZSTD_blockCompressor) (ZSTD_CCtx* ctx, const void* src, size_t sr
static ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, int extDict)
{
static const ZSTD_blockCompressor blockCompressor[2][6] = {
#if 1
{ ZSTD_compressBlock_fast, ZSTD_compressBlock_greedy, ZSTD_compressBlock_lazy, ZSTD_compressBlock_lazy2, ZSTD_compressBlock_btlazy2, ZSTD_compressBlock_btopt },
#else
{ ZSTD_compressBlock_fast_extDict, ZSTD_compressBlock_greedy_extDict, ZSTD_compressBlock_lazy_extDict,ZSTD_compressBlock_lazy2_extDict, ZSTD_compressBlock_btlazy2_extDict, ZSTD_compressBlock_btopt_extDict },
#endif
{ ZSTD_compressBlock_fast_extDict, ZSTD_compressBlock_greedy_extDict, ZSTD_compressBlock_lazy_extDict,ZSTD_compressBlock_lazy2_extDict, ZSTD_compressBlock_btlazy2_extDict, ZSTD_compressBlock_btopt_extDict }
};
@ -2111,8 +2103,9 @@ static ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, int
static size_t ZSTD_compressBlock_internal(ZSTD_CCtx* zc, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
{
ZSTD_blockCompressor blockCompressor = ZSTD_selectBlockCompressor(zc->params.cParams.strategy, zc->lowLimit < zc->dictLimit);
ZSTD_blockCompressor const blockCompressor = ZSTD_selectBlockCompressor(zc->params.cParams.strategy, zc->lowLimit < zc->dictLimit);
if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) return 0; /* don't even attempt compression below a certain srcSize */
ZSTD_resetSeqStore(&(zc->seqStore));
blockCompressor(zc, src, srcSize);
return ZSTD_compressSequences(zc, dst, dstCapacity, srcSize);
}
@ -2335,52 +2328,60 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_CCtx* zc, const void* src, size_t
/* Dictionary format :
Magic == ZSTD_DICT_MAGIC (4 bytes)
HUF_writeCTable(256)
FSE_writeNCount(ml)
FSE_writeNCount(off)
FSE_writeNCount(ll)
RepOffsets
Dictionary content
*/
/*! ZSTD_loadDictEntropyStats() :
@return : size read from dictionary */
static size_t ZSTD_loadDictEntropyStats(ZSTD_CCtx* zc, const void* dict, size_t dictSize)
@return : size read from dictionary
note : magic number supposed already checked */
static size_t ZSTD_loadDictEntropyStats(ZSTD_CCtx* cctx, const void* dict, size_t dictSize)
{
/* note : magic number already checked */
size_t const dictSizeStart = dictSize;
const BYTE* dictPtr = (const BYTE*)dict;
const BYTE* const dictEnd = dictPtr + dictSize;
{ size_t const hufHeaderSize = HUF_readCTable(zc->hufTable, 255, dict, dictSize);
{ size_t const hufHeaderSize = HUF_readCTable(cctx->hufTable, 255, dict, dictSize);
if (HUF_isError(hufHeaderSize)) return ERROR(dictionary_corrupted);
dict = (const char*)dict + hufHeaderSize;
dictSize -= hufHeaderSize;
dictPtr += hufHeaderSize;
}
{ short offcodeNCount[MaxOff+1];
unsigned offcodeMaxValue = MaxOff, offcodeLog = OffFSELog;
size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dict, dictSize);
size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(offcodeHeaderSize)) return ERROR(dictionary_corrupted);
{ size_t const errorCode = FSE_buildCTable(zc->offcodeCTable, offcodeNCount, offcodeMaxValue, offcodeLog);
{ size_t const errorCode = FSE_buildCTable(cctx->offcodeCTable, offcodeNCount, offcodeMaxValue, offcodeLog);
if (FSE_isError(errorCode)) return ERROR(dictionary_corrupted); }
dict = (const char*)dict + offcodeHeaderSize;
dictSize -= offcodeHeaderSize;
dictPtr += offcodeHeaderSize;
}
{ short matchlengthNCount[MaxML+1];
unsigned matchlengthMaxValue = MaxML, matchlengthLog = MLFSELog;
size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dict, dictSize);
size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(matchlengthHeaderSize)) return ERROR(dictionary_corrupted);
{ size_t const errorCode = FSE_buildCTable(zc->matchlengthCTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog);
{ size_t const errorCode = FSE_buildCTable(cctx->matchlengthCTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog);
if (FSE_isError(errorCode)) return ERROR(dictionary_corrupted); }
dict = (const char*)dict + matchlengthHeaderSize;
dictSize -= matchlengthHeaderSize;
dictPtr += matchlengthHeaderSize;
}
{ short litlengthNCount[MaxLL+1];
unsigned litlengthMaxValue = MaxLL, litlengthLog = LLFSELog;
size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dict, dictSize);
size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(litlengthHeaderSize)) return ERROR(dictionary_corrupted);
{ size_t const errorCode = FSE_buildCTable(zc->litlengthCTable, litlengthNCount, litlengthMaxValue, litlengthLog);
{ size_t const errorCode = FSE_buildCTable(cctx->litlengthCTable, litlengthNCount, litlengthMaxValue, litlengthLog);
if (FSE_isError(errorCode)) return ERROR(dictionary_corrupted); }
dictSize -= litlengthHeaderSize;
dictPtr += litlengthHeaderSize;
}
zc->flagStaticTables = 1;
return (dictSizeStart-dictSize);
if (dictPtr+12 > dictEnd) return ERROR(dictionary_corrupted);
cctx->rep[0] = MEM_readLE32(dictPtr+0); if (cctx->rep[0] >= dictSize) return ERROR(dictionary_corrupted);
cctx->rep[1] = MEM_readLE32(dictPtr+4); if (cctx->rep[1] >= dictSize) return ERROR(dictionary_corrupted);
cctx->rep[2] = MEM_readLE32(dictPtr+8); if (cctx->rep[2] >= dictSize) return ERROR(dictionary_corrupted);
dictPtr += 12;
cctx->flagStaticTables = 1;
return dictPtr - (const BYTE*)dict;
}
/** ZSTD_compress_insertDictionary() :

View File

@ -465,7 +465,6 @@ void ZSTD_compressBlock_opt_generic(ZSTD_CCtx* ctx,
/* init */
ctx->nextToUpdate3 = ctx->nextToUpdate;
ZSTD_resetSeqStore(seqStorePtr);
ZSTD_rescaleFreqs(seqStorePtr);
ip += (ip==prefixStart);
{ U32 i; for (i=0; i<ZSTD_REP_INIT; i++) rep[i]=ctx->rep[i]; }
@ -574,7 +573,7 @@ void ZSTD_compressBlock_opt_generic(ZSTD_CCtx* ctx,
best_mlen = minMatch;
{ U32 i;
for (i=0; i<ZSTD_REP_NUM; i++) {
if ((rep[i]<(U32)(inr-prefixStart))
if ((opt[cur].rep[i]<(U32)(inr-prefixStart))
&& (MEM_readMINMATCH(inr, minMatch) == MEM_readMINMATCH(inr - opt[cur].rep[i], minMatch))) { /* check rep */
mlen = (U32)ZSTD_count(inr+minMatch, inr+minMatch - opt[cur].rep[i], iend) + minMatch;
ZSTD_LOG_PARSER("%d: Found REP %d/%d mlen=%d off=%d rep=%d opt[%d].off=%d\n", (int)(inr-base), i, ZSTD_REP_NUM, mlen, i, opt[cur].rep[i], cur, opt[cur].off);
@ -757,7 +756,6 @@ void ZSTD_compressBlock_opt_extDict_generic(ZSTD_CCtx* ctx,
{ U32 i; for (i=0; i<ZSTD_REP_INIT; i++) rep[i]=ctx->rep[i]; }
ctx->nextToUpdate3 = ctx->nextToUpdate;
ZSTD_resetSeqStore(seqStorePtr);
ZSTD_rescaleFreqs(seqStorePtr);
ip += (ip==prefixStart);

View File

@ -625,7 +625,7 @@ size_t HUF_decompress1X4_usingDTable(
const HUF_DTable* DTable)
{
DTableDesc dtd = HUF_getDTableDesc(DTable);
if (dtd.tableType != 0) return ERROR(GENERIC);
if (dtd.tableType != 1) return ERROR(GENERIC);
return HUF_decompress1X4_usingDTable_internal(dst, dstSize, cSrc, cSrcSize, DTable);
}

View File

@ -223,8 +223,10 @@ void ZSTD_copyDCtx(ZSTD_DCtx* dstDCtx, const ZSTD_DCtx* srcDCtx)
// new
1 byte - FrameHeaderDescription :
bit 0-1 : dictID (0, 1, 2 or 4 bytes)
bit 2-4 : reserved (must be zero)
bit 5 : SkippedWindowLog (if 1, WindowLog byte is not present)
bit 2 : checksumFlag
bit 3 : reserved (must be zero)
bit 4 : reserved (unused, can be any value)
bit 5 : Single Segment (if 1, WindowLog byte is not present)
bit 6-7 : FrameContentFieldSize (0, 2, 4, or 8)
if (SkippedWindowLog && !FrameContentFieldsize) FrameContentFieldsize=1;
@ -365,7 +367,7 @@ size_t ZSTD_getFrameParams(ZSTD_frameParams* fparamsPtr, const void* src, size_t
U32 windowSize = 0;
U32 dictID = 0;
U64 frameContentSize = 0;
if ((fhdByte & 0x18) != 0) return ERROR(frameParameter_unsupported); /* reserved bits */
if ((fhdByte & 0x08) != 0) return ERROR(frameParameter_unsupported); /* reserved bits, which must be zero */
if (!directMode) {
BYTE const wlByte = ip[pos++];
U32 const windowLog = (wlByte >> 3) + ZSTD_WINDOWLOG_ABSOLUTEMIN;
@ -1186,47 +1188,51 @@ static void ZSTD_refDictContent(ZSTD_DCtx* dctx, const void* dict, size_t dictSi
dctx->previousDstEnd = (const char*)dict + dictSize;
}
static size_t ZSTD_loadEntropy(ZSTD_DCtx* dctx, const void* dict, size_t const dictSizeStart)
static size_t ZSTD_loadEntropy(ZSTD_DCtx* dctx, const void* const dict, size_t const dictSize)
{
size_t dictSize = dictSizeStart;
const BYTE* dictPtr = (const BYTE*)dict;
const BYTE* const dictEnd = dictPtr + dictSize;
{ size_t const hSize = HUF_readDTableX4(dctx->hufTable, dict, dictSize);
if (HUF_isError(hSize)) return ERROR(dictionary_corrupted);
dict = (const char*)dict + hSize;
dictSize -= hSize;
dictPtr += hSize;
}
{ short offcodeNCount[MaxOff+1];
U32 offcodeMaxValue=MaxOff, offcodeLog=OffFSELog;
size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dict, dictSize);
size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(offcodeHeaderSize)) return ERROR(dictionary_corrupted);
{ size_t const errorCode = FSE_buildDTable(dctx->OffTable, offcodeNCount, offcodeMaxValue, offcodeLog);
if (FSE_isError(errorCode)) return ERROR(dictionary_corrupted); }
dict = (const char*)dict + offcodeHeaderSize;
dictSize -= offcodeHeaderSize;
dictPtr += offcodeHeaderSize;
}
{ short matchlengthNCount[MaxML+1];
unsigned matchlengthMaxValue = MaxML, matchlengthLog = MLFSELog;
size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dict, dictSize);
size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(matchlengthHeaderSize)) return ERROR(dictionary_corrupted);
{ size_t const errorCode = FSE_buildDTable(dctx->MLTable, matchlengthNCount, matchlengthMaxValue, matchlengthLog);
if (FSE_isError(errorCode)) return ERROR(dictionary_corrupted); }
dict = (const char*)dict + matchlengthHeaderSize;
dictSize -= matchlengthHeaderSize;
dictPtr += matchlengthHeaderSize;
}
{ short litlengthNCount[MaxLL+1];
unsigned litlengthMaxValue = MaxLL, litlengthLog = LLFSELog;
size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dict, dictSize);
size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr);
if (FSE_isError(litlengthHeaderSize)) return ERROR(dictionary_corrupted);
{ size_t const errorCode = FSE_buildDTable(dctx->LLTable, litlengthNCount, litlengthMaxValue, litlengthLog);
if (FSE_isError(errorCode)) return ERROR(dictionary_corrupted); }
dictSize -= litlengthHeaderSize;
dictPtr += litlengthHeaderSize;
}
if (dictPtr+12 > dictEnd) return ERROR(dictionary_corrupted);
dctx->rep[0] = MEM_readLE32(dictPtr+0); if (dctx->rep[0] >= dictSize) return ERROR(dictionary_corrupted);
dctx->rep[1] = MEM_readLE32(dictPtr+4); if (dctx->rep[1] >= dictSize) return ERROR(dictionary_corrupted);
dctx->rep[2] = MEM_readLE32(dictPtr+8); if (dctx->rep[2] >= dictSize) return ERROR(dictionary_corrupted);
dictPtr += 12;
dctx->litEntropy = dctx->fseEntropy = 1;
return dictSizeStart - dictSize;
return dictPtr - (const BYTE*)dict;
}
static size_t ZSTD_decompress_insertDictionary(ZSTD_DCtx* dctx, const void* dict, size_t dictSize)

View File

@ -578,9 +578,10 @@ typedef struct
void* workPlace; /* must be ZSTD_BLOCKSIZE_MAX allocated */
} EStats_ress_t;
#define MAXREPOFFSET 1024
static void ZDICT_countEStats(EStats_ress_t esr,
U32* countLit, U32* offsetcodeCount, U32* matchlengthCount, U32* litlengthCount,
U32* countLit, U32* offsetcodeCount, U32* matchlengthCount, U32* litlengthCount, U32* repOffsets,
const void* src, size_t srcSize)
{
const seqStore_t* seqStorePtr;
@ -614,6 +615,17 @@ static void ZDICT_countEStats(EStats_ress_t esr,
size_t u;
for (u=0; u<nbSeq; u++) litlengthCount[codePtr[u]]++;
} }
/* rep offsets */
{ const U32* const offsetPtr = seqStorePtr->offsetStart;
U32 offset1 = offsetPtr[0] - 3;
U32 offset2 = offsetPtr[1] - 3;
if (offset1 >= MAXREPOFFSET) offset1 = 0;
if (offset2 >= MAXREPOFFSET) offset2 = 0;
repOffsets[offset1] += 3;
repOffsets[offset2] += 1;
}
}
/*
@ -629,12 +641,29 @@ static size_t ZDICT_maxSampleSize(const size_t* fileSizes, unsigned nbFiles)
static size_t ZDICT_totalSampleSize(const size_t* fileSizes, unsigned nbFiles)
{
size_t total;
size_t total=0;
unsigned u;
for (u=0, total=0; u<nbFiles; u++) total += fileSizes[u];
for (u=0; u<nbFiles; u++) total += fileSizes[u];
return total;
}
typedef struct { U32 offset; U32 count; } offsetCount_t;
static void ZDICT_insertSortCount(offsetCount_t table[ZSTD_REP_NUM+1], U32 val, U32 count)
{
U32 u;
table[ZSTD_REP_NUM].offset = val;
table[ZSTD_REP_NUM].count = count;
for (u=ZSTD_REP_NUM; u>0; u--) {
offsetCount_t tmp;
if (table[u-1].count >= table[u].count) break;
tmp = table[u-1];
table[u-1] = table[u];
table[u] = tmp;
}
}
#define OFFCODE_MAX 18 /* only applicable to first block */
static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
unsigned compressionLevel,
@ -649,6 +678,8 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
short matchLengthNCount[MaxML+1];
U32 litLengthCount[MaxLL+1];
short litLengthNCount[MaxLL+1];
U32 repOffset[MAXREPOFFSET] = { 0 };
offsetCount_t bestRepOffset[ZSTD_REP_NUM+1];
EStats_ress_t esr;
ZSTD_parameters params;
U32 u, huffLog = 12, Offlog = OffFSELog, mlLog = MLFSELog, llLog = LLFSELog, total;
@ -656,12 +687,15 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
size_t eSize = 0;
size_t const totalSrcSize = ZDICT_totalSampleSize(fileSizes, nbFiles);
size_t const averageSampleSize = totalSrcSize / nbFiles;
BYTE* dstPtr = (BYTE*)dstBuffer;
/* init */
for (u=0; u<256; u++) countLit[u]=1; /* any character must be described */
for (u=0; u<=OFFCODE_MAX; u++) offcodeCount[u]=1;
for (u=0; u<=MaxML; u++) matchLengthCount[u]=1;
for (u=0; u<=MaxLL; u++) litLengthCount[u]=1;
repOffset[1] = repOffset[4] = repOffset[8] = 1;
memset(bestRepOffset, 0, sizeof(bestRepOffset));
esr.ref = ZSTD_createCCtx();
esr.zc = ZSTD_createCCtx();
esr.workPlace = malloc(ZSTD_BLOCKSIZE_MAX);
@ -679,7 +713,7 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
/* collect stats on all files */
for (u=0; u<nbFiles; u++) {
ZDICT_countEStats(esr,
countLit, offcodeCount, matchLengthCount, litLengthCount,
countLit, offcodeCount, matchLengthCount, litLengthCount, repOffset,
(const char*)srcBuffer + pos, fileSizes[u]);
pos += fileSizes[u];
}
@ -693,6 +727,13 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
}
huffLog = (U32)errorCode;
/* looking for most common first offsets */
{ U32 offset;
for (offset=1; offset<MAXREPOFFSET; offset++)
ZDICT_insertSortCount(bestRepOffset, offset, repOffset[offset]);
}
/* note : the result of this phase should be used to better appreciate the impact on statistics */
total=0; for (u=0; u<=OFFCODE_MAX; u++) total+=offcodeCount[u];
errorCode = FSE_normalizeCount(offcodeNCount, Offlog, offcodeCount, total, OFFCODE_MAX);
if (FSE_isError(errorCode)) {
@ -720,46 +761,70 @@ static size_t ZDICT_analyzeEntropy(void* dstBuffer, size_t maxDstSize,
}
llLog = (U32)errorCode;
/* write result to buffer */
errorCode = HUF_writeCTable(dstBuffer, maxDstSize, hufTable, 255, huffLog);
if (HUF_isError(errorCode)) {
{ size_t const hhSize = HUF_writeCTable(dstPtr, maxDstSize, hufTable, 255, huffLog);
if (HUF_isError(hhSize)) {
eSize = ERROR(GENERIC);
DISPLAYLEVEL(1, "HUF_writeCTable error");
goto _cleanup;
}
dstBuffer = (char*)dstBuffer + errorCode;
maxDstSize -= errorCode;
eSize += errorCode;
dstPtr += hhSize;
maxDstSize -= hhSize;
eSize += hhSize;
}
errorCode = FSE_writeNCount(dstBuffer, maxDstSize, offcodeNCount, OFFCODE_MAX, Offlog);
if (FSE_isError(errorCode)) {
{ size_t const ohSize = FSE_writeNCount(dstPtr, maxDstSize, offcodeNCount, OFFCODE_MAX, Offlog);
if (FSE_isError(ohSize)) {
eSize = ERROR(GENERIC);
DISPLAYLEVEL(1, "FSE_writeNCount error with offcodeNCount");
goto _cleanup;
}
dstBuffer = (char*)dstBuffer + errorCode;
maxDstSize -= errorCode;
eSize += errorCode;
dstPtr += ohSize;
maxDstSize -= ohSize;
eSize += ohSize;
}
errorCode = FSE_writeNCount(dstBuffer, maxDstSize, matchLengthNCount, MaxML, mlLog);
if (FSE_isError(errorCode)) {
{ size_t const mhSize = FSE_writeNCount(dstPtr, maxDstSize, matchLengthNCount, MaxML, mlLog);
if (FSE_isError(mhSize)) {
eSize = ERROR(GENERIC);
DISPLAYLEVEL(1, "FSE_writeNCount error with matchLengthNCount");
goto _cleanup;
}
dstBuffer = (char*)dstBuffer + errorCode;
maxDstSize -= errorCode;
eSize += errorCode;
dstPtr += mhSize;
maxDstSize -= mhSize;
eSize += mhSize;
}
errorCode = FSE_writeNCount(dstBuffer, maxDstSize, litLengthNCount, MaxLL, llLog);
if (FSE_isError(errorCode)) {
{ size_t const lhSize = FSE_writeNCount(dstPtr, maxDstSize, litLengthNCount, MaxLL, llLog);
if (FSE_isError(lhSize)) {
eSize = ERROR(GENERIC);
DISPLAYLEVEL(1, "FSE_writeNCount error with litlengthNCount");
goto _cleanup;
}
dstBuffer = (char*)dstBuffer + errorCode;
maxDstSize -= errorCode;
eSize += errorCode;
dstPtr += lhSize;
maxDstSize -= lhSize;
eSize += lhSize;
}
if (maxDstSize<12) {
eSize = ERROR(GENERIC);
DISPLAYLEVEL(1, "not enough space to write RepOffsets");
goto _cleanup;
}
# if 0
MEM_writeLE32(dstPtr+0, bestRepOffset[0].offset);
MEM_writeLE32(dstPtr+4, bestRepOffset[1].offset);
MEM_writeLE32(dstPtr+8, bestRepOffset[2].offset);
#else
/* at this stage, we don't use the result of "most common first offset",
as the impact of statistics is not properly evaluated */
MEM_writeLE32(dstPtr+0, repStartValue[0]);
MEM_writeLE32(dstPtr+4, repStartValue[1]);
MEM_writeLE32(dstPtr+8, repStartValue[2]);
#endif
dstPtr += 12;
eSize += 12;
_cleanup:
ZSTD_freeCCtx(esr.ref);

View File

@ -79,7 +79,7 @@ const char* ZDICT_getErrorName(size_t errorCode);
/* ====================================================================================
* The definitions in this section are considered experimental.
* They should never be used in association with a dynamic library, as they may change in the future.
* They should never be used with a dynamic library, as they may change in the future.
* They are provided for advanced usages.
* Use them only in association with static linking.
* ==================================================================================== */

1
programs/.gitignore vendored
View File

@ -50,3 +50,4 @@ afl
# Misc files
*.bat
fileTests.sh
dirTest*

View File

@ -847,10 +847,17 @@ int main(int argc, const char** argv)
/* Get Seed */
DISPLAY("Starting zstd tester (%i-bits, %s)\n", (int)(sizeof(size_t)*8), ZSTD_VERSION_STRING);
if (!seedset) seed = (U32)(clock() % 10000);
if (!seedset) {
time_t const t = time(NULL);
U32 const h = XXH32(&t, sizeof(t), 1);
seed = h % 10000;
}
DISPLAY("Seed = %u\n", seed);
if (proba!=FUZ_compressibility_default) DISPLAY("Compressibility : %u%%\n", proba);
if (nbTests < testNb) nbTests = testNb;
if (testNb==0)
result = basicUnitTests(0, ((double)proba) / 100); /* constant seed for predictability */
if (!result)

View File

@ -133,31 +133,6 @@ diff tmpSparse2M tmpSparseRegenerated
rm tmpSparse*
$ECHO "\n**** dictionary tests **** "
./datagen > tmpDict
./datagen -g1M | $MD5SUM > tmp1
./datagen -g1M | $ZSTD -D tmpDict | $ZSTD -D tmpDict -dvq | $MD5SUM > tmp2
diff -q tmp1 tmp2
$ECHO "Create first dictionary"
$ZSTD --train *.c -o tmpDict
cp zstdcli.c tmp
$ZSTD -f tmp -D tmpDict
$ZSTD -d tmp.zst -D tmpDict -of result
diff zstdcli.c result
$ECHO "Create second (different) dictionary"
$ZSTD --train *.c *.h -o tmpDictC
$ZSTD -d tmp.zst -D tmpDictC -of result && die "wrong dictionary not detected!"
$ECHO "Create dictionary with short dictID"
$ZSTD --train *.c --dictID 1 -o tmpDict1
cmp tmpDict tmpDict1 && die "dictionaries should have different ID !"
$ECHO "Compress without dictID"
$ZSTD -f tmp -D tmpDict1 --no-dictID
$ZSTD -d tmp.zst -D tmpDict -of result
diff zstdcli.c result
rm tmp*
$ECHO "\n**** multiple files tests **** "
./datagen -s1 > tmp1 2> /dev/null
@ -181,9 +156,46 @@ $ECHO "compress multiple files including a missing one (notHere) : "
$ZSTD -f tmp1 notHere tmp2 && die "missing file not detected!"
$ECHO "\n**** dictionary tests **** "
./datagen > tmpDict
./datagen -g1M | $MD5SUM > tmp1
./datagen -g1M | $ZSTD -D tmpDict | $ZSTD -D tmpDict -dvq | $MD5SUM > tmp2
diff -q tmp1 tmp2
$ECHO "- Create first dictionary"
$ZSTD --train *.c -o tmpDict
cp zstdcli.c tmp
$ZSTD -f tmp -D tmpDict
$ZSTD -d tmp.zst -D tmpDict -of result
diff zstdcli.c result
$ECHO "- Create second (different) dictionary"
$ZSTD --train *.c *.h -o tmpDictC
$ZSTD -d tmp.zst -D tmpDictC -of result && die "wrong dictionary not detected!"
$ECHO "- Create dictionary with short dictID"
$ZSTD --train *.c --dictID 1 -o tmpDict1
cmp tmpDict tmpDict1 && die "dictionaries should have different ID !"
$ECHO "- Compress without dictID"
$ZSTD -f tmp -D tmpDict1 --no-dictID
$ZSTD -d tmp.zst -D tmpDict -of result
diff zstdcli.c result
$ECHO "- Compress multiple files with dictionary"
rm -rf dirTestDict
mkdir dirTestDict
cp *.c dirTestDict
cp *.h dirTestDict
cat dirTestDict/* | $MD5SUM > tmph1 # note : we expect same file order to generate same hash
$ZSTD -f dirTestDict/* -D tmpDictC
$ZSTD -d dirTestDict/*.zst -D tmpDictC -c | $MD5SUM > tmph2
diff -q tmph1 tmph2
rm -rf dirTestDict
rm tmp*
$ECHO "\n**** integrity tests **** "
$ECHO "test one file (tmp1.zst) "
./datagen > tmp1
$ZSTD tmp1
$ZSTD -t tmp1.zst
$ZSTD --test tmp1.zst
$ECHO "test multiple files (*.zst) "

View File

@ -43,7 +43,7 @@ It also features a very fast decoder, with speed > 500 MB/s per core.
.SH OPTIONS
.TP
.B \-#
# compression level [1-21] (default:1)
# compression level [1-22] (default:1)
.TP
.BR \-d ", " --decompress
decompression

51
projects/build/README.md Normal file
View File

@ -0,0 +1,51 @@
Here are a few command lines for reference :
### Build with Visual Studio 2013 for msvcr120.dll
Running the following command will build both the `Release Win32` and `Release x64` versions:
```batch
build\build.VS2013.cmd
```
The result of each build will be in the corresponding `build\bin\Release\{ARCH}\` folder.
If you want to only need one architecture:
- Win32: `build\build.generic.cmd VS2013 Win32 Release v120`
- x64: `build\build.generic.cmd VS2013 x64 Release v120`
If you want a Debug build:
- Win32: `build\build.generic.cmd VS2013 Win32 Debug v120`
- x64: `build\build.generic.cmd VS2013 x64 Debug v120`
### Build with Visual Studio 2015 for msvcr140.dll
Running the following command will build both the `Release Win32` and `Release x64` versions:
```batch
build\build.VS2015.cmd
```
The result of each build will be in the corresponding `build\bin\Release\{ARCH}\` folder.
If you want to only need one architecture:
- Win32: `build\build.generic.cmd VS2015 Win32 Release v140`
- x64: `build\build.generic.cmd VS2015 x64 Release v140`
If you want a Debug build:
- Win32: `build\build.generic.cmd VS2015 Win32 Debug v140`
- x64: `build\build.generic.cmd VS2015 x64 Debug v140`
### Build with Visual Studio 2015 for msvcr120.dll
You need to invoke `build\build.generic.cmd` with the proper arguments:
**For Win32**
```batch
build\build.generic.cmd VS2015 Win32 Release v120
```
The result of the build will be in the `build\bin\Release\Win32\` folder.
**For x64**
```batch
build\build.generic.cmd VS2015 x64 Release v120
```
The result of the build will be in the `build\bin\Release\x64\` folder.
If you want Debug builds, replace `Release` with `Debug`.

View File

@ -33,7 +33,7 @@ IF %msbuild_version% == VS2013 SET msbuild="%programfiles(x86)%\MSBuild\12.0\Bin
IF %msbuild_version% == VS2015 SET msbuild="%programfiles(x86)%\MSBuild\14.0\Bin\MSBuild.exe"
rem TODO: Visual Studio "15" (vNext) will use MSBuild 15.0 ?
SET project="%~p0\..\projects\VS2010\zstd.sln"
SET project="%~p0\..\VS2010\zstd.sln"
SET msbuild_params=/verbosity:minimal /nologo /t:Clean,Build /p:Platform=%msbuild_platform% /p:Configuration=%msbuild_configuration%
IF NOT "%msbuild_toolset%" == "" SET msbuild_params=%msbuild_params% /p:PlatformToolset=%msbuild_toolset%
@ -50,4 +50,3 @@ IF ERRORLEVEL 1 EXIT /B 1
echo # Success
echo # OutDir: %output%
echo #