Commit Graph

2810 Commits (c4a40dbf650ec7137b7548ff55c3d85477e043b0)

Author SHA1 Message Date
W. Felix Handte 596f7d1256 Fix #1412: Perform Signed Comparison When Setting Attach Dict Param 2018-11-12 12:07:57 -08:00
Yann Collet 7b0c551bff
Merge pull request #1411 from facebook/prefetch_dict
Improves decompression speed when using cold dictionary
2018-11-09 11:31:35 -08:00
Yann Collet 1b4a9c518b
Merge pull request #1410 from facebook/prefetch_dec
improve long-range decoder speed
2018-11-08 18:41:58 -08:00
Yann Collet 483759a3de Improves decompression speed when using cold dictionary
by triggering the prefetching decoder path
(which used to be dedicated to long-range offsets only).

Figures on my laptop :
no content prefetch : ~300 MB/s (for reference)
full content prefetch : ~325 MB/s (before this patch)
new prefetch path : ~375 MB/s (after this patch)

The benchmark speed is already significant,
but another side-effect is that this version
prefetch less data into memory,
since it only prefetches what's needed, instead of the full dictionary.

This is supposed to help highly active environments
such as active databases,
that can't be properly measured in benchmark environment (too clean).

Also :
fixed the largeNbDict test program
which was working improperly when setting nbBlocks > nbFiles.
2018-11-08 17:00:23 -08:00
Yann Collet 20fb9e7f36 reduced assertion strength
one limit case can apparently be generated during fuzzer tests
2018-11-08 12:57:34 -08:00
Yann Collet 9126da5b5c improve long-range decoder speed
on enwik9 at level 22 (which is almost a worst case scenario),
speed improves by +7% on my laptop (415 -> 445 MB/s)
2018-11-08 12:47:46 -08:00
Yann Collet 8bed4012bd fixed decompression-only benchmark 2018-11-08 12:36:39 -08:00
Nick Terrell a8daa2d683 Signal before unlocking in pool.c 2018-11-08 10:45:53 -08:00
Bartosz Szreder 5c5c476338 Prevent deadlock on malloc() failure. 2018-11-08 10:29:31 +01:00
Yann Collet e0701d3c5d
Merge pull request #1404 from facebook/T36302429
fixed T36302471
2018-11-06 11:53:20 -08:00
Yann Collet 3e5cdf1b6a fixed T36302429 2018-11-05 17:50:30 -08:00
Yann Collet 2caa995558 just add an assert() in ZSTD_insertBtAndGetAllMatches()
to express a condition on ll0 .
May help static analyzer as in #1397
2018-11-05 17:13:32 -08:00
Yann Collet 3a90229616
Merge pull request #1395 from facebook/decompressblock
created zstd_decompress_block module
2018-10-29 16:28:09 -07:00
Yann Collet acd75a1448 fixed a second memset() on NULL
not sure why it only triggers now,
this code has been around for a while.

Introduced a new error code : dstBuffer_null,
I couldn't express anything even remotely similar with existing error codes set.
2018-10-29 15:03:57 -07:00
Yann Collet 9c58098200 fixed memcpy() on NULL warning
memcpy(NULL, src, 0) is undefined behavior.
2018-10-29 13:57:37 -07:00
Yann Collet ea966c8fb1
Merge pull request #1396 from facebook/huf_refactor
refactor HUF_compress_internal for clarity
2018-10-29 13:06:45 -07:00
Yann Collet fc20b3c441 added flag -Wc++-compat
for library and cli
2018-10-26 16:38:23 -07:00
Yann Collet c3c7deb1e1
Merge pull request #1392 from coetry/dev
provide consistent spacing to enum field
2018-10-26 15:29:07 -07:00
Yann Collet 1866bd374a Merge branch 'dev' into huf_Refactor 2018-10-26 15:25:01 -07:00
Yann Collet 8d56f4baee added a few comments for clarifications 2018-10-26 15:21:52 -07:00
Yann Collet 450356b5af Merge branch 'dev' into decompressblock 2018-10-26 15:03:43 -07:00
Yann Collet 7d4960a5e8
Merge pull request #1390 from facebook/nullAsOutput
support decompressing an empty frame into NULL
2018-10-26 14:43:16 -07:00
Allen Hai 3783720f70 vertically align code comment 2018-10-26 16:16:06 -05:00
Yann Collet 7b74405150 refactor HUF_compress_internal for clarity
changed workspace parameter convention
to always provide workspaceSize,
so that size can be explicitly checked.

Also, use more enum to make the meaning of some parameters more explicit.
2018-10-26 13:21:37 -07:00
Allen Hai 26e34d8a73 provide consistent spacing to enum field 2018-10-25 18:45:20 -05:00
Yann Collet 2b4914082e created zstd_decompress_block module
isolate all logic associated with block decompression
into its own module.

zstd_decompress is still in charge
of context creation/destruction,
frames, headers, streaming, special blocks, etc.

Compressed blocks themselves are now handled within zstd_decompress_block .
2018-10-25 16:28:41 -07:00
Yann Collet cb320a9fc0 added comment on public ddict functions 2018-10-24 16:50:03 -07:00
Yann Collet 806a5c84e4 support decompressing an empty frame into NULL
fix #1385
decompressing into NULL was an automatic error.
It is now allowed, as long as the content of the frame is empty.

Seems to simplify things for `arrow`.
Maybe some other projects rely on this behavior ?
2018-10-24 16:34:35 -07:00
Yann Collet debff3929b fixed warnings in testpools 2018-10-24 10:36:06 -07:00
Yann Collet cc3612e1c5 added simple guard macros
in case of accidental multi-includes
2018-10-23 17:55:23 -07:00
Yann Collet ccd2d426fc separate DDict logic into its own module
created zstd_ddict.c within lib/decompress
2018-10-23 17:25:49 -07:00
Yann Collet f181799082 fix decodecorpus incorrect frame generation
fix #1379
decodecorpus was generating one extraneous byte when `nbSeq==0`.
This is disallowed by the specification.

The reference decoder was just skipping the extraneous byte.
It is now stricter, and flag such situation as an error.
2018-10-20 18:56:21 -07:00
Yann Collet 1e6208e75e bumped version number to v1.3.7
updated documentation
2018-10-11 14:40:12 -07:00
Ori Livneh f31715f5e0 Enable use of bswap intrinsics in clang
Necessary because clang disguises itself as an older (__GNUC_MINOR__ = 2) GCC.
2018-10-11 15:01:09 -04:00
Yann Collet 6ed3b526e4 restored bitMask for shift values
since corrupted bitstreams can generate too large values.

This slightly reduces the benefits from clang on my laptop.
gcc results and code generation are not affected.
2018-10-10 18:29:50 -07:00
Yann Collet c012e9540a removed one assert()
that can be triggered by a corrupted bitstream.
2018-10-10 17:33:04 -07:00
Yann Collet 7791f192ee removed one assert()
which can be triggered when input is corrupted.
2018-10-10 16:39:15 -07:00
Yann Collet d3ec23313d improved decompression speed
while reviewing #1364,
I found a decompression speed improvement.

On my laptop, the new code decompresses +5-6% faster on clang
and +2-3% faster on gcc.

not bad for an accidental optimization...
2018-10-10 15:48:43 -07:00
Yann Collet 942df522cc
Merge pull request #1361 from facebook/streamdoc
Clarify streaming api doc
2018-10-08 19:19:34 -07:00
W. Felix Handte b8235be865 Avoid Searching Dictionary in ZSTD_btlazy2 When an Optimal Match is Found
Bailing here is important to avoid reading past the end of the input buffer.
2018-10-08 15:59:32 -07:00
W. Felix Handte d121b3451c Clean Up Debug Log Statements 2018-10-08 15:59:32 -07:00
W. Felix Handte 08da9ad316 Remove Unused Variable 2018-10-08 15:59:32 -07:00
Yann Collet 8fc79fac07 clarify streaming api doc
as suggested by @indygreg in #1360
2018-10-08 15:53:29 -07:00
Yann Collet 11cd2ea43d finalized minor warnings on Haiku 2018-10-03 16:37:50 -07:00
Yann Collet bc93b801f0
Merge pull request #1330 from korli/haiku
Enable building zstd on Haiku.
2018-10-03 13:36:00 -07:00
Jerome Duval 87c10e2f58 Enable building zstd on Haiku. 2018-10-03 09:51:56 +02:00
Yann Collet 22ddf3523a fixed msan warning
on btlazy2 strategy with dictAttach
2018-10-02 18:20:20 -07:00
Yann Collet c9843ec232
Merge pull request #1348 from facebook/donotdelete
Fix #1082
2018-10-02 16:37:58 -07:00
Yann Collet 3ca6261223 fixed static analyzer warnings
note : for some reason,
scan-build version on my laptop found problems within fastcover.c
that scan-build on travisCI does not flag.

They are, as usual, false positive :
the analyzer does not understand that a table (`offset`) is correctly filled before usage.
2018-10-02 15:59:11 -07:00
Yann Collet 228c6e5147
Merge pull request #1317 from felixhandte/split-logs
Independent Dictionary and Working Context Table Logs
2018-10-01 17:20:12 -07:00
W. Felix Handte 5b296869df Revert Ability to Set HashLog and ChainLog on Context When Dict is Attached
This capability is not needed / used in the current unit of work. I'll
re-introduce it later, when we start allowing users to override the deduced
working context logs.
2018-10-01 13:28:13 -07:00
W. Felix Handte c2369fedc4 Restore Passing CParams to `ZSTD_insertAndFindFirstIndex_internal` 2018-09-28 17:12:54 -07:00
W. Felix Handte bad74c4781 Use Working Ctx Logs when not in DMS Mode
We pre-hash the ptr for the dict match state sometimes. When that actually
happens, a hashlog of 0 can produce undefined behavior (right shift a long
long by 64). Only applies to unoptimized compilations, since when
optimizations are applied, those hash operations are dropped when we're not
actually in dms mode.
2018-09-28 17:12:54 -07:00
W. Felix Handte c38acff94f When Attaching Dictionary, Size Working Tables Based on Input Size Only 2018-09-28 17:12:54 -07:00
W. Felix Handte 9d87d50878 Remove Log Overriding for the Time Being 2018-09-28 17:12:54 -07:00
W. Felix Handte 77fd17d93f Remove Strategy-Dependency in Making Attachment Decision 2018-09-28 17:12:54 -07:00
W. Felix Handte 00c088b32d Support Split Logs in ZSTD_btopt..ZSTD_btultra 2018-09-28 17:12:54 -07:00
W. Felix Handte 0783492178 Bump Split Log Support to ZSTD_btultra 2018-09-28 17:12:54 -07:00
W. Felix Handte e4ac4a0f16 Support Split Logs in ZSTD_greedy..ZSTD_btlazy2 2018-09-28 17:12:54 -07:00
W. Felix Handte e710dc3369 Bump Split Log Support to ZSTD_btlazy2 2018-09-28 17:12:54 -07:00
W. Felix Handte 22fcb8d4c7 Support Split Logs in ZSTD_dfast 2018-09-28 17:12:54 -07:00
W. Felix Handte a232b3bb7c Bump Split Log Support to ZSTD_dfast 2018-09-28 17:12:54 -07:00
W. Felix Handte fe96e98f81 Support a Separate Hash Log in ZSTD_fast 2018-09-28 17:12:54 -07:00
W. Felix Handte bc880ebe8f Stop Passing in `hashLog` and `stepSize` to `ZSTD_compressBlock_fast_generic` 2018-09-28 17:12:54 -07:00
W. Felix Handte b3107c7799 Temporary Commit to Retain Requested Hash and Chain Logs During Dict Attach 2018-09-28 17:12:54 -07:00
W. Felix Handte 34e0193129 Allow Setting Hash and Chain Logs on Contexts with Attached CDict 2018-09-28 17:12:54 -07:00
W. Felix Handte eae8232f50 For Supported Strategies, Attach Dict Even When Params Don't Match 2018-09-28 17:12:54 -07:00
W. Felix Handte 01ff945eae Split Attach and Copy Reset Strategies into Separate Implementation Functions 2018-09-28 17:12:54 -07:00
W. Felix Handte a6d6bbeae1 Pull Attachment Decision into Separate Function 2018-09-28 17:12:54 -07:00
W. Felix Handte b7fba599ae And Then Avoid the Unused Parameter Warning 2018-09-28 17:12:54 -07:00
W. Felix Handte 1f188ae655 Move Asserts into Function to Avoid Unused Function Warning 2018-09-28 17:12:54 -07:00
W. Felix Handte 7212b5e5c2 Move Match State CParams Setting into `resetCCtx` and `continueCCtx` 2018-09-28 17:12:54 -07:00
W. Felix Handte 01e34d365b Strengthen Assertion to Assert Equality 2018-09-28 17:12:53 -07:00
W. Felix Handte 50cc1cf4d5 Remove CParams Arg from ZSTD_ldm_blockCompress 2018-09-28 17:12:53 -07:00
W. Felix Handte 14764de49f Stop Separately Passing CParams in ZSTD_lazy Internal Functions 2018-09-28 17:12:53 -07:00
W. Felix Handte 97149f22c3 Stop Separately Passing CParams in ZSTD_opt Internal Functions 2018-09-28 17:10:42 -07:00
W. Felix Handte dcdf437fed Also Remove CParams from Table Filling Functions' Args 2018-09-28 17:10:42 -07:00
W. Felix Handte 3483f89101 Also Assert Equivalency When Filling MatchState with Prefix 2018-09-28 17:10:42 -07:00
W. Felix Handte 6cb2454646 Remove CParams from Block Compressor Functions' Args 2018-09-28 17:10:42 -07:00
W. Felix Handte 03103269de Assert `ctx` and `ms` cparams Equivalency 2018-09-28 17:10:42 -07:00
W. Felix Handte 4e3ecee9ed Remove cParams from CDict 2018-09-28 17:10:42 -07:00
W. Felix Handte 76ef87ed9d Add ZSTD_compressionParameters to ZSTD_matchState_t 2018-09-28 17:10:42 -07:00
Nick Terrell 6391cd1030 [zstd] Fix newly added test case 2018-09-28 12:09:28 -07:00
Yann Collet 73773c6b6a fixed legacy compilation tests
for some reason, these tests started failing recently on CircleCI
2018-09-27 18:15:14 -07:00
Nick Terrell a180ea07c4 Restore ZSTD_noCompressBlock() for clarity 2018-09-27 16:06:02 -07:00
Nick Terrell aec1a3ec58 Change byte to value to avoid a GRUB typedef 2018-09-27 15:24:48 -07:00
Nick Terrell 109bd37474 Include stddef.h for size_t 2018-09-27 15:24:48 -07:00
Nick Terrell f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
Yann Collet 2a5cd8535a
Merge pull request #1342 from facebook/fixcatyd
fix : huge (>4GB) chain of blocks
2018-09-27 10:20:14 -07:00
Yann Collet 404a7bfed0 moved again overflow correction
cannot work from within ZSTD_compressBlock()
2018-09-26 18:06:53 -07:00
Yann Collet 0e2dbac18a changed overflow correction place
keep one in compress_frameChunk(),
so that it's tested at every loop
in case some user simply some large mulit-GB input in a single invocation.

Add one in ZSTD_compressBlock(),
since compressBlock() explicitly skips frameChunk().
2018-09-26 15:35:38 -07:00
Yann Collet e74eade251
Merge pull request #1339 from facebook/grep_colors
fixed usage of grep in Makefile
2018-09-26 14:39:20 -07:00
Yann Collet 8883af6a1e
Merge pull request #1327 from facebook/adapt
Adaptive compression
2018-09-26 14:39:08 -07:00
Yann Collet f98c69d77c fix : huge (>4GB) stream of blocks
experimental function ZSTD_compressBlock() is designed for very small data in mind,
for situation where saving the ~12 bytes of frame header can actually make a difference.

Some systems though may have to deal with small and large data entangled.
If it's larger than a block (> 128KB), compressBlock() cannot compress them in one round.

That's why it's possible to compress in multiple rounds.
This is a chain of compressed blocks.

Some users push this capability to the limit, encoding gigantic chain of blocks.
On crossing the 4GB limit, some internal overflow occurs.

This fix moves the overflow correction mechanism higher in the call chain,
so that it's applied also to gigantic chains of blocks.

Added a test case in fuzzer.c, which crashes before the fix, and pass now.
2018-09-26 14:24:28 -07:00
Yann Collet 8ff17a6a09
Merge pull request #1329 from facebook/v04isout
Changed default legacy support to v0.5+
2018-09-26 13:39:05 -07:00
Yann Collet 08f68d83c5 fixed usage of grep in Makefile
when terminal uses colors
as suggested by @danielshir (#1294)
2018-09-25 16:56:53 -07:00
Yann Collet 04f47bbdd2 Merge branch 'dev' into adapt 2018-09-24 16:56:45 -07:00
Yann Collet 9bb6c15f79
Merge pull request #1332 from facebook/minclevel
defined a minimum negative level
2018-09-24 16:01:13 -07:00
Yann Collet 292d8e4a83 added some tests based on limits.h
in order to ensure proper type mapping
when not using stdint.h
2018-09-23 23:57:30 -07:00
Yann Collet 71a5210617 avoid recompiling dll every time under mingw 2018-09-21 17:40:30 -07:00
Yann Collet c484345a82 Merge branch 'mingw' into adapt 2018-09-21 16:00:46 -07:00
Yann Collet bfff4f4809 ensure all writes to job->cSize are mutex protected
even when reporting errors,
using a macro for code brevity, as suggested by @terrelln,
2018-09-21 16:00:39 -07:00
Yann Collet 32b7cf1bcf fixed tautological tests
involving ZSTD_TARGETLENGTH_MIN (== 0)
2018-09-21 15:04:43 -07:00
Yann Collet c044345f8f Merge branch 'mingw' into minclevel 2018-09-21 14:56:57 -07:00
Yann Collet de6c75e4e5
Merge pull request #1318 from felixhandte/shadow-dict-matches
Don't Search Dictionary Context When Working Context Search Resulted in Mismatch
2018-09-21 12:15:33 -07:00
Yann Collet a54c86cfc6 defined a minimum negative level
which can be probed using new function ZSTD_minCLevel().

Also : redefined ZSTD_TARGETLENGTH_MIN/MAX for consistency

used the opportunity to bump version number to v1.3.6
2018-09-20 16:52:03 -07:00
Yann Collet b2939163e1 Changed default legacy support to v0.5+
thus dropping read support for v0.4.

It's always possible to re-enable it, by changing build macro ZSTD_LEGACY_SUPPORT to 4.
2018-09-20 14:30:20 -07:00
Yann Collet 7992942d66 fixed complex tsan issue
when job->consumed == job->src.size , compression job is presumed completed,
so it must be the very last action done in worker thread.
2018-09-20 13:47:31 -07:00
Yann Collet 6b07a66aec fixed minor reporting discrepancy in MT mode 2018-09-19 16:30:55 -07:00
Yann Collet ca02ebee07 removed static variables
so that --adapt can work on multiple input files too
2018-09-19 15:25:50 -07:00
Yann Collet 89bc309d90 error out when --adapt is associated with --single-thread
since they are not compatible
2018-09-19 14:49:13 -07:00
Yann Collet 2f78228f65 Merge branch 'dev' into adapt 2018-09-19 12:43:42 -07:00
Yann Collet 005f000aed updated documentation of *refPrefix()
indicating the equivalence with `diff` operation.
2018-09-18 13:07:08 -07:00
ko-zu 18b4a1da61 Fix clang build
Fix dixygen comment
Fix clang binary path
2018-09-16 10:27:02 +09:00
Yann Collet 7269fe6cd3 minor code comment update 2018-09-14 16:06:35 -07:00
Yann Collet 0403148315
Merge pull request #1295 from felixhandte/hdr-intro-comment-negative-lvls
Proposed Update to Zstd.h Introduction Comment
2018-09-14 15:29:19 -07:00
W. Felix Handte b76c888497 ZSTD_dfast: Don't Search Dict Context When Mismatch Was Found 2018-09-14 15:24:25 -07:00
W. Felix Handte b048af5999 ZSTD_fast: Don't Search Dict Context When Mismatch Was Found 2018-09-14 15:23:35 -07:00
Yann Collet 0e5b447aaa
Merge pull request #1316 from facebook/coldDict
Cold dictionary mitigation
2018-09-14 10:37:46 -07:00
Yann Collet 5512400677 updated code comments, based on @terrelln review 2018-09-13 16:44:04 -07:00
Yann Collet d195eec97e fixed msan error
cold dictionary is detected through a comparison with dictEnd,
which was not initialized at the beginning of first DCtx usage.
2018-09-13 12:29:52 -07:00
Yann Collet 674dd21bd0 final parameter tuning 2018-09-12 17:25:34 -07:00
Yann Collet 419dfd4ea3 clean traces 2018-09-12 16:40:28 -07:00
Yann Collet 2618253da2 fixed PREFETCH() macro
for corner cases and platforms without this instruction
2018-09-12 16:15:37 -07:00
Yann Collet 44d3b83bb1 conditional dict content prefetching
based on nbSeq.
2018-09-12 15:35:21 -07:00
Yann Collet 5fb5ed3b31 adjust heuristic decisions 2018-09-12 12:32:09 -07:00
Nick Terrell f6daddf2db Also allow x86 2018-09-12 12:05:32 -07:00
Nick Terrell 1e0bac6a9c [libzstd] Fix cpu for MSFT ARM
The `__cpuid()` and `__cpuidex()` intrinsics are only available
on x86 and x86_64.
2018-09-12 10:35:16 -07:00
Yann Collet 4de344d505 added conditional prefetch
depending on amount of work to do.
2018-09-12 10:29:47 -07:00
Yann Collet 63a519dbf6 implemented first prefetch
based on dictID.
dictContent is prefetched up to 32 KB
(no contentSize adaptation)
2018-09-11 17:23:44 -07:00
Yann Collet 3675ef4762 added comment about minimum size of FSE tables
required for DDict creation,
which use this space as workspace during Hufman table building stage.
2018-09-10 11:24:17 -07:00
Yann Collet f97ca36eab strengthened conditions for using workplace into fse table space
ensure that the structure layout is as expected.
will trigger an error if it changes in the future.

Another solution would be to use a union,
this would be cleaner and get rid of these static asserts.

However, in order to keep the current code unmodified,
it would be necessary to use an un-named unions.
And apparently, un-named unions are only possible on "recent" compilers (C99+).
2018-09-06 17:54:13 -07:00
Yann Collet 87406548f0 reduced DDict size, by -2KB
corresponding to the removal of workspace
which is needed while building huffman table
and is now either present in DCtx,
or temporarily borrowed from available FSE table space.
2018-09-06 17:07:53 -07:00
Yann Collet 50b216146f
Merge pull request #1304 from facebook/largeNbDicts
contrib/largeNbDicts
2018-09-06 09:50:56 -07:00
Jennifer Liu 21721b75a3 Change default f to 20 2018-09-04 17:15:14 -07:00
Jennifer Liu 944c9986e0 Update comment on default steps of cover and fastcover 2018-08-30 15:37:29 -07:00
Jennifer Liu 16db0337b1 Always use splitPoint=1.0 for non-optimize cover and fastcover 2018-08-30 14:59:22 -07:00
Yann Collet 31ebb26945
Merge pull request #1301 from terrelln/lit-size
[zstd] Fix seqStore growth
2018-08-28 17:10:25 -07:00
Nick Terrell 5e580de6da [zstd] Fix seqStore growth
We could undersize the literals buffer by up to 11 bytes,
due to a combination of 2 bugs:
* The literals buffer didn't have `WILDCOPY_OVERLENGTH` extra
  space, like it is supposed to.
* We didn't check the literals buffer size in `ZSTD_sufficientBuff()`.
2018-08-28 13:24:44 -07:00
Yann Collet b37a0a6bde
Merge pull request #1298 from facebook/bench
Refactored bench.c
2018-08-28 12:25:02 -07:00
modbw d14edf259f
Fixed memory leak detected by cppcheck
cppcheck (which is run regularly in our CI environment)  detected a possible memory leak.
2018-08-28 07:25:05 +02:00
Yann Collet 6782725155 first sketch for largeNbDicts test program 2018-08-26 19:29:12 -07:00
Yann Collet af23d39eb8
Merge pull request #1297 from felixhandte/check-offset-table
Fix Missing Offset Table Check
2018-08-24 17:36:44 -07:00
W. Felix Handte 37f17ee237 Mark Repeated Offset Table as Needing Check 2018-08-24 14:33:34 -07:00
Nick Terrell e34e917655 Fix compiler warning 2018-08-23 17:48:06 -07:00
Nick Terrell 5ee5e71be3 [zstd] Add note about empty ZSTD_CDict 2018-08-23 17:48:06 -07:00
Nick Terrell 924944e471 [zstd] Reuse the ZSTD_CCtx more often with small data. 2018-08-23 17:48:06 -07:00
Yann Collet 2e45badff4 refactored bench.c
for clarity and safety, especially at interface level
2018-08-23 14:21:18 -07:00
Jennifer Liu 9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo spliting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCovr

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
W. Felix Handte e589ac6276 Reformat Introduction Comment and Mention Negative Levels 2018-08-22 17:07:34 -07:00
Yann Collet c71c4f23d7 fix "unused parameter" in single-thread mode
within newly added ZSD_toFlushNow()
2018-08-20 11:40:10 -07:00
Yann Collet 105677c6db created ZSTDMT_toFlushNow()
tells in a non-blocking way if there is something ready to flush right now.
only works with multi-threading for the time being.

Useful to know if flush speed will be limited by lack of production.
2018-08-17 18:11:54 -07:00
Yann Collet 36d6165a2d Makefile: added variable SCANBUILD
so that a different version of scan-build can be selected
2018-08-16 16:44:13 -07:00
Yann Collet 1515f0bb0d fixed more issues detected by recent version of scan-build
test run on Linux
2018-08-16 15:20:25 -07:00
Yann Collet 5291d9ac31 fix scope of scan-build tests
exclude zlib code
2018-08-15 17:41:44 -07:00
Yann Collet 42a02ab745 fixed minor warnings issued by scan-build 2018-08-15 14:36:02 -07:00
Yann Collet 3692c31598 Merge branch 'dev' into scanbuild 2018-08-15 13:50:49 -07:00
Yann Collet 6e66bbf5dd fixed several minor issues detected by scan-build
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00
Yann Collet 3e4617ef54 frameProgression reports nbActiveWorkers and output flushed 2018-08-14 11:49:25 -07:00
Yann Collet e7a49c6683 introduced command --adapt 2018-08-11 20:48:06 -07:00
Yann Collet 2dd76037be zstd cli can increase level when input is too slow 2018-08-09 15:51:30 -07:00
Yann Collet 79a35ac20d minor code comments improvements 2018-08-09 15:16:31 -07:00
W. Felix Handte 2ca7c69167 Fix CDict Attachment to Handle CDicts with Non-Zero Starts
CDicts were previously guaranteed to be generated with `lowLimit=dictLimit=0`.
This is no longer true, and so the old length and index calculations are no
longer valid. This diff fixes them to handle non-zero start indices in CDicts.
2018-08-07 18:14:14 -07:00
Yann Collet 5808027abf Merge branch 'dev' into fix1241 2018-08-03 16:08:33 -07:00
Yann Collet 5892dd5da4
Merge pull request #1255 from terrelln/norm-fix
[FSE] Fix division by zero
2018-08-02 11:48:56 -07:00
Nick Terrell dc5a67cb7b Disallow tableLog == srcLog 2018-08-02 11:12:17 -07:00
Jennifer Liu f5228f2c44 Refactoring 2018-07-31 13:58:54 -07:00
Jennifer Liu 4e29bc2469 Use CDict instead of CCtx in analyzeEntropy 2018-07-31 10:36:45 -07:00
cyan4973 3f535007e4 fix %zu support under minGW
and relevant test on Appveyor
2018-07-30 16:56:18 +02:00
cyan4973 aade1e5904 Merge branch 'dev' into fix1241 2018-07-30 16:30:35 +02:00
Nick Terrell 9889bca530 [FSE] Fix division by zero
When the primary normalization method fails, and
`(1 << tableLog) == (maxSymbolValue + 1)`, and every symbol gets assigned
normalized weight 1 or -1 in the first loop, then the next division can
raise `SIGFPE`.
2018-07-27 17:30:03 -07:00
Yann Collet 6e490a2f09
Merge pull request #1237 from terrelln/init-cstream-adv
Set requestedParams in ZSTD_initCStream*()
2018-07-18 16:33:30 +02:00
cyan4973 9597b438e9 fix #1241
Ensure that first input position is valid for a match
even during first usage of context
by starting reference at 1
(avoiding the problematic 0).
2018-07-17 18:52:57 +02:00
cyan4973 53e1f0504e zstdmt debug traces compatibles with mingw
since mingw does not have `sys/times.h`,
remove this path when detecting mingw compilation.
2018-07-17 14:39:44 +02:00
Nick Terrell 45821fac0c
Merge pull request #1225 from jennifermliu/dev
Split samples when building dictionary for COVER
2018-07-13 13:26:15 -07:00
Nick Terrell 6d222c437c Set requestedParams in ZSTD_initCStream*()
The correct parameters are used once, but once `ZSTD_resetCStream()` is
called the default parameters (level 3) are used. Fix this by setting
`requestedParams` in the `ZSTD_initCStream*()` functions.

The added tests both fail before this patch and pass after.
2018-07-12 18:35:55 -07:00
Jennifer Liu 612b346ed5 Add explanation for split=100 2018-07-11 15:50:28 -07:00
Jennifer Liu 5021441d86 Change default splitPoint to 100 2018-07-10 11:19:33 -07:00
Jennifer Liu 456f290e31 Change back to splitPoint<=0 2018-07-09 13:53:25 -07:00
Jennifer Liu 7efabb2cf6 Only make 0.0 default splitPoint 2018-07-09 12:26:53 -07:00
Yann Collet bbd78df59b add build macro NO_PREFETCH
prevent usage of prefetch intrinsic commands
which are not supported by c2rust
(see https://github.com/immunant/c2rust/issues/13)
2018-07-06 17:06:04 -07:00
Jennifer Liu 015a00af0f Change cover_sum back to 2 parameters and fix splitPoint issues 2018-07-06 14:24:18 -07:00
Jennifer Liu 0bbff01211 Fix testing parameter 2018-07-05 22:40:32 -07:00
Jennifer Liu a085d1aae1 Allow splitPoint==1.0 (using all samples for both training and testing) 2018-07-05 10:38:45 -07:00
Jennifer Liu 0881184c89 Some edits based on pull request comments 2018-07-03 17:53:27 -07:00
Jennifer Liu 16e75e8804 Update minimal training sample size 2018-07-03 12:07:06 -07:00
Jennifer Liu 348e5f77a9 Add split=# to cli 2018-06-29 17:54:41 -07:00
Jennifer Liu 52fbbbcb6b Explicitly cast double to unsigned 2018-06-29 16:17:20 -07:00
Jennifer Liu f9d19b83fb Fix variable declaration problem 2018-06-29 15:46:56 -07:00
Jennifer Liu e061d84016 Another fix to comparator 2018-06-29 15:38:08 -07:00
Jennifer Liu 59797d3328 Fix splitPoint floating point comparison problem 2018-06-29 12:47:03 -07:00
Jennifer Liu 0ef06f2e8a Split samples into train and test sets 2018-06-29 12:33:34 -07:00
Yann Collet 121aa2c388
Merge pull request #1211 from facebook/staticAssert
updated DEBUG_STATIC_ASSERT()
2018-06-27 12:19:17 -07:00
Yann Collet 4489daec09 slightly adjusted default-distribution threshold
depending on strategy.
fast favors faster compression and decompression speeds.
2018-06-26 20:10:45 -07:00
Yann Collet ff773bfcde zeroise freq table with memset()
improves decoding speed by ~5% in github_users sample set
2018-06-26 17:24:41 -07:00
Yann Collet 7b9bbf77c9 switched to a sizeof() version
avoid -Werror=unused-variable issue
2018-06-26 14:08:35 -07:00
Yann Collet f98ec46979 updated DEBUG_STATIC_ASSERT()
following suggestion from #1209
2018-06-26 12:04:59 -07:00
Nick Terrell b426bcc097
[zstdmt] Fix jobsize bugs (#1205)
[zstdmt] Fix jobsize bugs

* `ZSTDMT_serialState_reset()` should use `targetSectionSize`, not `jobSize` when sizing the seqstore.
  Add an assert that checks that we sized the seqstore using the right job size.
* `ZSTDMT_compressionJob()` should check if `rawSeqStore.seq == NULL`.
* `ZSTDMT_initCStream_internal()` should not adjust `mtctx->params.jobSize` (clamping to MIN/MAX is okay).
2018-06-25 15:21:08 -07:00
Yann Collet 3b53bfe4f3
Merge pull request #1200 from felixhandte/zstd-attach-dict-pref
Add CCtx Param Controlling Dict Attachment Behavior
2018-06-25 12:42:31 -07:00
Yann Collet 31769ce702 error on no forward progress
streaming decoders, such as ZSTD_decompressStream() or ZSTD_decompress_generic(),
may end up making no forward progress,
(aka no byte read from input __and__ no byte written to output),
due to unusual parameters conditions,
such as providing an output buffer already full.

In such case, the caller may be caught in an infinite loop,
calling the streaming decompression function again and again,
without making any progress.

This version detects such situation, and generates an error instead :
ZSTD_error_dstSize_tooSmall when output buffer is full,
ZSTD_error_srcSize_wrong when input buffer is empty.

The detection tolerates a number of attempts before triggering an error,
controlled by ZSTD_NO_FORWARD_PROGRESS_MAX macro constant,
which is set to 16 by default, and can be re-defined at compilation time.
This behavior tolerates potentially existing implementations
where such cases happen sporadically, like once or twice,
which is not dangerous (only infinite loops are),
without generating an error, hence without breaking these implementations.
2018-06-22 17:58:21 -07:00
Yann Collet 3934e010a2
Merge pull request #1197 from facebook/poolResize
Thread Pool resize
2018-06-22 14:20:07 -07:00
Yann Collet fbd5dfc1b1 changed POOL_resize() return type to int
return is now just en error code.
This guarantee that `ctx` remains valid after POOL_resize().
Gets rid of internal POOL_free() operation.
2018-06-22 12:14:59 -07:00
Yann Collet 1d5648ca10
Merge pull request #1196 from felixhandte/zstd-btopt-in-place-dict
ZSTD_btopt: Support Searching the Dictionary Context In-Place
2018-06-22 11:53:23 -07:00
Yann Collet f6242d30b7
Merge pull request #1202 from facebook/barelyCompressible
Increase threshold detection of poorly compressible data
2018-06-22 11:52:52 -07:00
Yann Collet 698fd00afb huf: increase threshold detection of poorly compressible data 2018-06-21 18:32:38 -07:00
Yann Collet 243cd9d8bb add a cond_broadcast after resize
to make sure all threads (notably newly available threads)
get awaken to immediately process potential items in the queue.
2018-06-21 18:04:58 -07:00
Yann Collet 818e72b4d5 added extended POOL test
abrupt end + downsizing with running jobs remaining in queue.

also : POOL_resize() requires numThreads >= 1
2018-06-21 14:58:59 -07:00
W. Felix Handte 01bb1c1016 Add CCtx Param Controlling Dict Attachment Behavior 2018-06-21 17:29:25 -04:00
W. Felix Handte 3e91dc4d6a Add Repcode Bounds Check 2018-06-21 15:54:41 -04:00
W. Felix Handte 5bd3d4b7d2 Add Debug Log Statement 2018-06-21 15:54:07 -04:00
W. Felix Handte 3caba150c6 Fix `dmsBtLow` Test 2018-06-21 15:53:40 -04:00
W. Felix Handte 5da9bbc38e Conceivably Dedup ZSTD_noDict and ZSTD_dictMatchState _insertBt1 Impls
By reverting to the bool extDict flag, we call ZSTD_insertBt1 with the same
const args in both non-extDict dictModes.
2018-06-21 11:20:01 -04:00
Yann Collet 6de249c1c6 fixed: bug when counting nb of active threads
when queueSize > 1

also : added a test in testpool.c
       verifying resizing is effective.
2018-06-20 18:28:49 -07:00
Yann Collet 6b48eb12c0 change control of threadLimit
now limits maximum nb of active threads
even when queueSize > 1.
2018-06-20 14:35:39 -07:00
W. Felix Handte 5d81f71e83 Consistency in Guarding DMS-Only Variable Initializations 2018-06-20 16:54:53 -04:00
W. Felix Handte 9c14eafe3d Also Use `matchLow` for HC3 Match 2018-06-20 15:51:14 -04:00
W. Felix Handte 0a6cf7cd1d Minor Changes 2018-06-20 15:27:23 -04:00
W. Felix Handte ae1f3898a2 Remove Dead(!) HC3 DMS Lookup 2018-06-20 15:27:12 -04:00
Yann Collet 93702a7a62
Merge pull request #1198 from facebook/msdebug
made Visual Studio compatible with DEBUGLEVEL >= 2
2018-06-20 12:26:31 -07:00
cyan4973 ae0b7ffa0a made Visual Studio compatible with DEBUGLEVEL >= 2 2018-06-20 09:45:02 -07:00
Yann Collet 62469c9f41 fixed wrong size in pthread struct transfer 2018-06-19 20:14:03 -07:00
Yann Collet 166901dc72 reduced POOL_resize() restriction
It's not necessary to ensure that no job is ongoing.
The pool is only expanded, existing threads are preserved.
In case of error, the only option is to return NULL and terminate the thread pool anyway.
2018-06-19 18:07:18 -07:00
Yann Collet 066fbbfe1c make zstdmt resize its context
when nbThreads change.

Technically, it only expands.
But when instructed to use less threads,
the thread pool will limit nb of concurrent threads.
2018-06-19 17:28:56 -07:00
Yann Collet 4567c57199 finalized POOL_resize()
POOL_ctx* POOL_resize(POOL_ctx* ctx, size_t numThreads)

The function may fail, and returns a NULL pointer in this case.
2018-06-19 16:03:12 -07:00
Yann Collet 6768cf53fd
Merge pull request #1190 from terrelln/ldm-adjust
Adjust advanced parameters to source size
2018-06-19 14:40:56 -07:00
W. Felix Handte 03c39c540b Fix Incorrect Param 2018-06-19 15:36:33 -04:00
W. Felix Handte de639502aa Update Dict Attachment Cut-Offs 2018-06-19 15:36:13 -04:00
W. Felix Handte f0a13bcd68 Make Sure Position 0 Gets Into the Tree 2018-06-19 15:10:06 -04:00
W. Felix Handte 87fe4788a3 Fix Compression Ratio Regression #1 2018-06-19 13:01:21 -04:00
W. Felix Handte 4bb79f9c55 Misc Changes 2018-06-19 13:01:21 -04:00
W. Felix Handte 2091f34e9e Find Proper Matches 2018-06-19 13:01:21 -04:00
W. Felix Handte 64348a15f1 Misc Fixes 2018-06-19 13:01:21 -04:00
W. Felix Handte ade8586ce6 Find `mls == 3` Matches 2018-06-19 13:01:21 -04:00
W. Felix Handte ce743312e2 Fix Typo 2018-06-19 13:01:21 -04:00
W. Felix Handte a075864756 Switch `!= ZSTD_extDict` to `== ZSTD_noDict` 2018-06-19 13:01:21 -04:00
W. Felix Handte 1e03377bde Implement RepCode Check 2018-06-19 13:01:21 -04:00
W. Felix Handte ccbf067973 Add _dictMatchState Functions 2018-06-19 13:01:21 -04:00
W. Felix Handte d5d8240967 Convert `extDict` Flag to `dictMode` Enum 2018-06-19 13:01:21 -04:00
W. Felix Handte 93c3184d44 Attach Dicts when Using ZSTD_btopt and ZSTD_btultra 2018-06-19 13:01:21 -04:00
Yann Collet 1c714fda3f introduced POOL_resize()
not complete yet :
finalize behavior in case of unfinished expansion
2018-06-18 20:46:39 -07:00
Nick Terrell 3841dbac84 Adjust advanced parameters to source size
In the new advanced API, adjust the parameters even if they are explicitly
set. This mainly applies to the `windowLog`, and accordingly the `hashLog`
and `chainLog`, when the source size is known.
2018-06-18 15:49:31 -07:00
Yann Collet e30f13bde0
Merge pull request #1185 from felixhandte/zstd-btlazy-in-place-dict
ZSTD_btlazy2: Support Searching the Dictionary Context In-Place
2018-06-18 13:29:44 -07:00
Yann Collet d8462ecba2 Merge branch 'dev' into huf_rename 2018-06-14 20:42:10 -04:00
Yann Collet b7e5ebef2a grouped X2 function together 2018-06-14 20:41:50 -04:00
Yann Collet 9698d2fb72
Merge pull request #1189 from facebook/hist
histogram module
2018-06-14 20:39:52 -04:00
Yann Collet 6901c94cd6 avoid duplicate code comments
when a function is decribed in hist.h,
do not describe it again in hist.c
to avoid future doc synchronization issues.
2018-06-14 19:47:05 -04:00
Yann Collet f70f829ff5
Merge pull request #1187 from facebook/fix1186
fix dctx initialization within ZSTD_decompress in stack mode
2018-06-14 16:22:22 -04:00
Yann Collet a71513bec6
Merge pull request #1184 from facebook/debug
Grouped debug functions into debug.h
2018-06-14 16:21:53 -04:00
Yann Collet 1adf84ccb7 renamed all HUF_decompress*X4*() functions into *X2
to underline they generate up to 2 symbols per decoding,
in preparation for a future *X3 variant.
2018-06-14 15:17:03 -04:00
Yann Collet a09af5eb6b renamed all HUF_decompress*X2*() functions into *X1
to underline they generate one symbol per decoding operation.

The new naming scheme will make it easier to introduce an *X3 variant.
2018-06-14 15:08:43 -04:00