Commit Graph

1161 Commits (0ed8ee4a371a11b91a68677cdff7a7efee79fe57)

Author SHA1 Message Date
Yann Collet 8e0e495ce8 fixed: compression ratio discrepancy
depending on initialization,
the first byte of a new frame was invalidated or not.

As a consequence, one match opportunity was available or not,
resulting in slightly different compressed sizes
(on average, 1 or 2 bytes once every 20 frames).

It impacted ratio comparison between one-shot and streaming modes.

This fix makes the first byte of a new frame always a valid match.
Now compressed size is always the same.
It also improves compressed size by a negligible amount.
2018-12-19 10:11:06 -08:00
Yann Collet d0e15f8d32
Merge pull request #1458 from terrelln/estimate
[libzstd] Fix estimate with negative levels
2018-12-18 15:12:21 -08:00
Yann Collet 04baecaeed
Merge pull request #1457 from facebook/btultra2.1
btultra2 and very small input
2018-12-18 14:46:55 -08:00
Nick Terrell d7def456d8 [libzstd] Fix estimate with negative levels
* Fix `ZSTD_estimateCCtxSize()` with negative levels.
* Fix `ZSTD_estimateCStreamSize()` with negative levels.
* Add a unit test to test for this error.
2018-12-18 14:24:49 -08:00
Yann Collet ef984e7307 fix debug levels
as reported by @terrelln.
2 is reserved for temporary usage only.
2018-12-18 13:40:07 -08:00
Yann Collet 635783da12 btultra2 and very small srcSize
When srcSize is small,
the nb of symbols produced is likely too small to warrant dedicated probability tables.
In which case, predefined distribution tables will be used instead.

There is a cheap algorithm in btultra initialization :
it presumes default distribution will be used if srcSize <= 1024.

btultra2 now uses the same threshold to shut down probability estimation,
since measured frequencies won't be used at entropy stage,
and therefore relying on them to determine sequence cost is misleading,
resulting in worse compression ratios.

This fixes btultra2 performance issue on very small input.

Note that, a proper way should be
to determine which symbol is going to use predefined probaility
and which symbol is going to use dynamic ones.
But the current algorithm is unable to make a "per-symbol" decision.
So this will require significant modifications.
2018-12-18 12:32:58 -08:00
Yann Collet 373ff8b983 play around with rescale weights 2018-12-17 15:48:34 -08:00
Yann Collet 8be145a8c1 fixed default job size 2018-12-13 16:38:08 -08:00
Yann Collet 62180b27d5 zstdmt parameter getter/setter use `int` 2018-12-13 15:47:34 -08:00
Yann Collet 34f01e600f fixed multiple conversions
from 64-bit to 32-bit
2018-12-13 14:02:22 -08:00
Yann Collet f2f86d369b Merge branch 'btultra2' into ovlog_def 2018-12-12 20:58:14 -08:00
Yann Collet 9a92ed401d updated compression results.csv
and fixed nit
2018-12-12 20:30:09 -08:00
Yann Collet 9792acda3b Merge branch 'dev' into btultra2 2018-12-12 20:18:27 -08:00
Yann Collet 7bb8dfc62f new overlapLog default values
varies between 6 and 9, depending on strategy
2018-12-11 18:10:29 -08:00
Yann Collet eee789b7ea continued: changed to overlapLog
in deeper code layer.
for consistency.
2018-12-11 17:41:42 -08:00
Yann Collet 9b784dec7f changed parameter name to ZSTD_c_overlapLog
from overlapSizeLog.

Reasoning :
`overlapLog` is already used everwhere, in the code, command line and documentation.
`ZSTD_c_overlapSizeLog` feels unnecessarily different.
2018-12-11 16:55:33 -08:00
Yann Collet 5e6aaa3abb fixed btultra2 usage with prefix
notably while using multi-threading
2018-12-10 18:45:03 -08:00
Yann Collet 3619c34399 fix assert position within ZSTD_compress2() 2018-12-10 17:42:35 -08:00
Yann Collet c226a7b9f3 fixed ZSTD_compress2()
as suggested by @terrelln
2018-12-10 17:33:49 -08:00
Yann Collet 37e314a68d updated clevel table for large inputs 2018-12-09 22:38:05 -08:00
Yann Collet c9c4c7ec8c update clevel table for 256K 2018-12-08 21:40:08 -08:00
Yann Collet 8075d75f9c update clevel table for 128K 2018-12-08 10:42:55 -08:00
Yann Collet 95b152ab33 updated clevel table for 16K
to introduce btultra2
2018-12-07 20:12:43 -08:00
Yann Collet d613fd9afe linked btultra2 as strategy9
and ensure zstdbench detects out-of-bound parameters
2018-12-06 19:27:37 -08:00
Yann Collet ae370b0e12 minor bound refinements 2018-12-06 16:51:17 -08:00
Yann Collet 39e28982cf introduced constants ZSTD_STRATEGY_MIN and ZSTD_STRATEGY_MAX 2018-12-06 16:16:16 -08:00
Yann Collet c3c3488981 fixed c++ assignment to enum 2018-12-06 15:57:55 -08:00
Yann Collet be9e561da4 changed ZSTD_c_compressionStrategy into ZSTD_c_strategy
also : fixed paramgrill, and limit conditions
2018-12-06 15:00:52 -08:00
Yann Collet e9448cdf4c introduced strategy btultra2
note : not yet applied on any compression level
2018-12-06 13:38:09 -08:00
Yann Collet 3583d19c4e changed parameter names from ZSTD_p_* to ZSTD_c_*
for naming consistency
2018-12-05 17:26:02 -08:00
Yann Collet aec945f0dc implemented ZSTD_dParam_getBounds()
and ZSTD_DCtx_setParameter()
2018-12-04 15:35:37 -08:00
Yann Collet 6ced8f7c7c joined normal streaming API with advanced one 2018-12-03 14:22:38 -08:00
Yann Collet d8e215cbee created ZSTD_compress2() and ZSTD_compressStream2()
ZSTD_compress_generic() is renamed ZSTD_compressStream2().

Note that, for the time being,
the "stable" API and advanced one use different parameter planes :
setting parameters using the advanced API does not influence ZSTD_compressStream()
and using ZSTD_initCStream() does not influence parameters for ZSTD_compressStream2().
2018-11-30 11:25:56 -08:00
Yann Collet d4d4e109e9 getParameter fills an int*
rather than an unsigned*
for consistency
since type of setParameter() changed to int.
2018-11-21 15:37:26 -08:00
Yann Collet 41c7d0b1e1 changed hashEveryLog into hashRateLog 2018-11-21 14:36:57 -08:00
Yann Collet 5d3592398d fixed fall-through 2018-11-20 16:09:33 -08:00
Yann Collet 5c6d4b18ac completed implementation of ZSTD_cParam_getBounds()
for all parameters
2018-11-20 16:06:00 -08:00
Yann Collet 2e7fd6a2cb fixed remaining searchLength invocations 2018-11-20 15:13:27 -08:00
Yann Collet e874dacc08 changed searchLength into minMatch
refactored all relevant API and calls
for consistency.
2018-11-20 14:56:07 -08:00
Yann Collet 114bd4346e changed enum type name to ZSTD_ResetDirective
for naming consistency :
types should start with a capital letter (after prefix)
2018-11-20 12:00:20 -08:00
Yann Collet 3b838abf97 ZSTD_CCtx_setParameter : `value` argument is now `int`
for compatibility with compression level
2018-11-20 11:53:01 -08:00
Yann Collet 5c68639186 updated ZSTD_DCtx_reset()
signature and behavior is now the same as ZSTD_CCtx_reset()
2018-11-15 16:12:39 -08:00
Yann Collet 06c8d5a4f4 Merge branch 'dev' into advancedAPI
fixed rsyncable
2018-11-15 10:51:24 -08:00
Nick Terrell b9693d3a49 [lib] Add rsyncable mode
- Add rsyncable mode to multithreaded mode
- Factor out LDM's hash function for reuse
2018-11-14 16:59:57 -08:00
Yann Collet 7b0391e37e finalized retrofit of ZSTD_CCtx_reset()
updated all depending sources
2018-11-14 13:05:35 -08:00
Yann Collet ff8d371708 modified ZSTD_CCtx_reset()
which now accepts an enum,
to distinguish between resetting the session, or the parameters (or both).

removed ZSTD_CCtx_resetParameters(), which is redundant.

start replacing invocation of ZSTD_CCtx_reset*() functions

Updated advanced API documentation

trimmed down amount of API staged in RC,
in particular, all functions related to ZSTD_CCtxParams()
seem too advanced.
2018-11-14 12:33:57 -08:00
Yann Collet d7e10a774a added constant ZSTD_WINDOWLOG_LIMIT_DEFAULT
answering #1407.

Also : removed obsolete function ZSTD_setDStreamParameter()
which could only be used with one parameter (DStream_p_maxWindowSize).
Now replaced by ZSTD_DCtx_setWindowSize() (which exists since a few revisions)
2018-11-13 18:12:34 -08:00
Yann Collet b83d1e7714 removed some `static const` variables
and replaced by traditional macro constants.

Unfortunately, C doesn't consider `static const` to mean "constant"
2018-11-13 16:56:32 -08:00
Yann Collet f28af025d9
Merge pull request #1413 from felixhandte/attach-dict-fix-unsigned-compare
Fix #1412: Perform Signed Comparison When Setting Attach Dict Param
2018-11-12 17:53:11 -08:00
Yann Collet 626040ab53 changed PREFETCH() macro into PREFETCH_L2()
which is more accurate
2018-11-12 17:05:32 -08:00