Commit Graph

2149 Commits (31b54b6eea9051cb44f239ecd9d52b453df8880a)

Author SHA1 Message Date
Yann Collet 27c5853c42 zstdmt: job table correctly cleaned after synchronous ZSTDMT_compress() 2018-01-26 14:35:54 -08:00
Yann Collet 0d426f6b83 zstdmt : refactor a few member names
for clarity
2018-01-26 13:00:14 -08:00
Yann Collet 79b6e28b0a zstdmt : flush() only lock to read shared job members
Other job members are accessed directly.
This avoids a full job copy, which would access everything,
including a few members that are supposed to be used by worker only,
uselessly requiring additional locks to avoid race conditions.
2018-01-26 12:15:43 -08:00
Yann Collet d2b62b6fa5 minor : ZSTDMT_writeLastEmptyBlock() is a void function
because it cannot fail
2018-01-26 11:06:34 -08:00
Yann Collet fca13c6855 zstdmt : fixed memory leak
writeLastEmptyBlock() must release srcBuffer
as mtctx assumes it's done by job worker.

minor : changed 2 job member names (src->srcBuffer, srcStart->prefixStart) for clarity
2018-01-26 10:44:09 -08:00
Yann Collet 8e128eaf05 zstdmt : refactor job members
grouped by sharing properties
2018-01-26 10:20:38 -08:00
Yann Collet 777d3c1559 fixed minor declaration-after-statement warning 2018-01-25 17:45:18 -08:00
Yann Collet a1d4041e69 zstdmt: removed job->jobCompleted
replaced by equivalent signal job->consumer == job->srcSize.

created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.

It avoids creating a job with srcSize==0.
2018-01-25 17:35:49 -08:00
Yann Collet 1272d8e760 zstdmt:: renamed mutex and cond to underline they are context-global 2018-01-25 14:52:34 -08:00
Yann Collet 5f349b129c zstdmt : correctly set end of frame 2018-01-23 15:52:40 -08:00
Yann Collet c1cc57f270 zstdmt : fix end condition (ZSTD_e_end)
When ZSTD_e_end directive is provided,
the question is not only "are internal buffers completely flushed",
it is also "is current frame completed".

In some rare cases,
it was possible for internal buffers to be completely flushed,
triggering a @return == 0,
but frame was not completed as it needed a last null-size block to mark the end,
resulting in an unfinished frame.
2018-01-23 15:19:11 -08:00
Yann Collet de5e38a7a6 zstdmt: fixed minor race condition
no real consequence, but pollute tsan tests :

job->dstBuff is being modified inside worker,
while main thread might read it accidentally
because it copies whole job.
But since it doesn't used dstBuff, there is no real consequence.

Other potential solution : only copy useful data, instead of whole job
2018-01-23 14:03:07 -08:00
Yann Collet ebd955e26a zstdmt : fixed ending frame with 0-size block 2018-01-23 13:12:40 -08:00
Yann Collet 6711396d97 zstreamtest : fixed test 32 : multi-thread compression
using ZSTD_compress_generic(,,ZSTD_e_end)
Since it already provides ZSTD_e_end as directive,
it should not be followed by ZSTDMT_endStream().
2018-01-19 22:20:53 -08:00
Yann Collet a7ef3a219c zstdmt : fixed last job size 2018-01-19 18:19:09 -08:00
Yann Collet 3ad7d4951c zstdmt : finally vanquished an elusive and rare race condition 2018-01-19 17:35:08 -08:00
Yann Collet 940634a610 zstdmt : simplify job creation
job will not be created when not enough room within job Table
2018-01-19 13:25:06 -08:00
Yann Collet dc69623453 zstdmt: fixed corruption issue in ZSTDMT_endStream()
when invoked directly.
2018-01-19 12:41:56 -08:00
Yann Collet 70f81d6030 zstdmt uses POOL_tryAdd() to call a new worker
so that it's no longer a blocking call.
This makes it possible to stream out data gradually,
while waiting for a worker to become available.
2018-01-19 10:01:40 -08:00
Yann Collet d19dc1903c
Merge pull request #995 from facebook/progressiveMT
Progressive mt
2018-01-18 17:59:49 -08:00
Yann Collet 6f7280fb33 fixed frame checksum issue
and race conditions
2018-01-18 16:20:26 -08:00
Yann Collet 997e4d0ccd added POOL_tryAdd() 2018-01-18 14:39:51 -08:00
Yann Collet 4f43ef731d Merge branch 'dev' into constCDict 2018-01-18 13:36:43 -08:00
Yann Collet ef97d5a287 Merge branch 'progressiveMT' into progressiveFlush 2018-01-18 13:35:24 -08:00
Yann Collet b6ab232f2d Merge branch 'dev' into progressiveMT 2018-01-18 13:34:56 -08:00
Nick Terrell 9d96761520 Set repcodes for empty ZSTD_CDict
When the dictionary is <= 8 bytes, no data is loaded from the dictionary.
In this case the repcodes weren't set, because they were inserted after the
size check. Fix this problem in general by first setting the cdict state to
a clean state of an empty dictionary, then filling the state from there.
2018-01-18 13:28:30 -08:00
Yann Collet c7190c69cc fixes for @terrelln comments 2018-01-18 11:15:23 -08:00
Yann Collet 1b5d80d633 zstdmt: added ability to flush current job before it's completed
however, zstdmt may still wait on next available worker,
so it's not smooth yet.
2018-01-18 11:03:27 -08:00
Yann Collet aa79c18e3f fixed a few access contention
passes thread sanitizer test
2018-01-17 17:18:19 -08:00
Yann Collet 394eec697b Introduce ZSTD_getFrameProgression()
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced

Ingested can be larger than consumed due to buffering effect.

For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.

That being said, update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.

This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
2018-01-17 16:39:02 -08:00
Yann Collet f3b8f90b6d changed initStatic?Dict() return type to const ZSTD_?Dict*
ZSTD_create?Dict() is required to produce a ?Dict* return type
because `free()` does not accept a `const type*` argument.
If it wasn't for this restriction, I would have preferred to create a `const ?Dict*` object
to emphasize the fact that, once created, a dictionary never changes
(hence can be shared concurrently until the end of its lifetime).

There is no such limitation with initStatic?Dict() :
as stated in the doc, there is no corresponding free() function,
since `workspace` is provided, hence allocated, externally,
it can only be free() externally.

Which means, ZSTD_initStatic?Dict() can return a `const ZSTD_?Dict*` pointer.

Tested with `make all`, to catch initStatic's users,
which, incidentally, also updated zstd.h documentation.
2018-01-17 14:08:48 -08:00
Yann Collet b86865323a Merge branch 'dev' into progressiveMT
fixed minor conflict on cdict
2018-01-17 13:51:03 -08:00
Yann Collet d14cc881b0 zstdmt : fixed very large window sizes
would create too large buffers,
since default job size == window size * 4.

This would crash on 32-bit systems.

Also : jobSize being a 32-bit unsigned, it cannot be >= 4 GB,
so the formula was failing for large window sizes >= 1 GB.
Fixed now : max job Size is 2 GB, whatever the window size.
2018-01-17 12:39:58 -08:00
Yann Collet 58dd7de640 zstdmt: fixed an endless loop on allocation failure
this happened on 32-bits build when requiring a too large input buffer,
typically on wlog=29, creating jobs of 2 GB size.

also : zstd32 now compiles with multithread support enabled by default
(can be disabled with HAVE_THREAD=0)
2018-01-17 12:10:15 -08:00
Nick Terrell 16bd0fd4df Reduce size of ZSTD_CDict
Shaves 492,076 B off of the `ZSTD_CDict`.
The size of a `ZSTD_CDict` created from a 112,640 B dictionary is:

| Level | Before (B) | After (B) |
|-------|------------|-----------|
| 1     | 648,448    | 156,412   |
| 3     | 1,140,008  | 647,932   |
2018-01-17 11:50:49 -08:00
Yann Collet cb57c107ff zstdmt: minor variable renaming, for clarity 2018-01-17 11:39:07 -08:00
Yann Collet 1dba98d563 introduced parameter ZSTD_p_nonBlockingMode
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.

It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.

Applied within fileio.c, this patch provides non-negligible gains during compression.

Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9

With traditional single-thread blocking mode :
real    0m9.557s
user    0m8.861s
sys     0m0.538s

With new single-worker non blocking mode :
real    0m7.938s
user    0m8.049s
sys     0m0.514s

=> 20% faster
2018-01-16 16:15:47 -08:00
Yann Collet 6025465e42 ZSTDMT : minor CCtx memory optimization
can be useful when a compression job only has small amount of data to compress.
2018-01-16 15:34:41 -08:00
Yann Collet 2e23333094 ZSTDMT can now work in non-blocking mode with 1 thread
it still fallbacks to single-thread blocking invocation
when input is small (<1job)
or when invoking ZSTDMT_compress(), which is blocking.

Also : fixed a bug in new block-granular compression routine.
2018-01-16 15:28:43 -08:00
Yann Collet 8e83c5c910 Merge branch 'dev' into progressiveMT 2018-01-16 12:54:33 -08:00
Nick Terrell aae267a2e1 Reorganize block state 2018-01-16 11:17:50 -08:00
Nick Terrell 887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
Yann Collet 9477f6529d
Merge pull request #984 from terrelln/dict-load
Load more dictionary positions into table if empty
2018-01-13 13:20:42 -08:00
Yann Collet 58ecf13e02 zstdmt : can compress at block granularity
offering perspective of more accurate progression report.
2018-01-13 13:18:57 -08:00
Nick Terrell 9a211d1f05 Load more dictionary positions into table if empty
If the hash table is empty load positions into the hash table
that we would otherwise skip.

| Level | Data Set     | Improvement |
|-------|--------------|-------------|
| 1     | github       | 0.44%       |
| 1     | hg-changelog | 0.13%       |
| 1     | hg-commands  | 1.28%       |
| 1     | hg-manifest  | 0.70%       |
| 3     | github       | 0.74%       |
| 3     | hg-changelog | 0.87%       |
| 3     | hg-commands  | 1.74%       |
| 3     | hg-manifest  | 0.23%       |
2018-01-12 16:17:22 -08:00
Yann Collet 863b2f8db4
Merge pull request #983 from terrelln/dict-wlog
Increase windowLog from CDict based on the srcSize when known
2018-01-12 07:47:43 -08:00
Nick Terrell b610b777d3 Increase windowLog from CDict based on the srcSize when known 2018-01-11 16:23:21 -08:00
Yann Collet cacf47cbee Merge branch 'dev' into dubtlazy
and fixed conflicts
2018-01-11 13:25:08 -08:00
Yann Collet 04c00f9388
Merge pull request #982 from facebook/fix304
Fix for #304 and #977 : error during dictionary creation
2018-01-11 13:20:59 -08:00
Yann Collet b9a14900ff changed function name to ZSTD_DUBT_findBestMatch() 2018-01-11 12:38:31 -08:00
Yann Collet 752bae4a48 added warning message
when pathological dataset is detected
(note : cover_optimize needs -v to display the warning)
2018-01-11 11:29:28 -08:00
Yann Collet e8093dde09 fixed #304
Pathological samples may result in literal section being incompressible.
This case is now detected,
and literal distribution is replaced by one that can be written into the dictionary.
2018-01-11 11:16:32 -08:00
Yann Collet 218e9fe0fc added a test case for dictBuilder failure
cyclic data set makes the entropy stage fails
now, onto a fix for #304 ...
2018-01-11 09:42:38 -08:00
Yann Collet ff795580f2 fixed bug #976, reported by @indygreg
constants in zstd.h should not depend on MIN() macro which existence is not guaranteed.

Added a test to check the specific constants.
The test is a bit too specific.
But I have found no way to control a more generic "are all macro already defined" condition,
especially as this is a valid construction (the missing macro might be defined later, intentionnally).
2018-01-10 20:33:45 -08:00
Yann Collet 292eeb672f api doc : grouped all ZSTD_create*_advanced() functions together
in a new "custom memory allocator" paragraph
which is itself part of "memory management" category.

This makes it simpler to see the relation between the type and its usages.
2018-01-10 09:07:47 -08:00
Yann Collet 3ea156368c API doc : grouped ZSTD_initStatic*() together
within "memory management" category.
2018-01-10 08:49:50 -08:00
Yann Collet b17fb488b0 fixed msan test
a pointer calculation was wrong in a corner case
2018-01-06 20:50:36 +01:00
Yann Collet 658d6b8588 Merge branch 'dev' into dubtlazy 2018-01-06 12:40:58 +01:00
Yann Collet a927fae2a1 fixed ZSTD_reduceIndex()
following suggestions from @terrelln.
Also added some comments to present logic behind ZSTD_preserveUnsortedMark().
2018-01-06 12:31:26 +01:00
Yann Collet 2eff217136 updated /lib documentation 2017-12-31 15:50:00 +01:00
Yann Collet 00db4dbbb3 fixed minor argument property for Visual 2017-12-30 15:42:28 +01:00
Yann Collet f597f55675 improved btlazy2 : list of unsorted candidates can reach extDict
It used to stop on reaching extDict, for simplification.
As a consequence, there was a small loss of performance each time the round buffer would restart from beginning.
It's not a large difference though, just several hundreds of bytes on silesia.
This patch fixes it.
2017-12-30 15:12:59 +01:00
Yann Collet a68b76afef updated compression level table for btlazy2
now selected for levels 13, 14 and 15.

Also : dropped the requirement for monotonic memory budget increase of compression levels,,
which was required for ZSTD_estimateCCtxSize()
in order to ensure that a memory budget for level L is large enough for any level <= L.
This condition is now ensured at run time inside ZSTD_estimateCCtxSize().
2017-12-30 11:40:35 +01:00
Yann Collet eb52e2f45e simplify ZSTD_preserveUnsortedMark() implementation
since no compiler attempts to auto-vectorize it.
2017-12-30 11:13:52 +01:00
Yann Collet d228b6b0d0 btlazy2 : optimization for dictionary compression
we want the dictionary table to be fully sorted,
not just lazily filled.
Dictionary loading is a bit more intensive,
but it saves cpu cycles for match search during compression.
2017-12-29 19:14:18 +01:00
Yann Collet 02f64ef955 btlazy2: fixed interaction between unsortedMark and reduceTable 2017-12-29 19:08:51 +01:00
Yann Collet 64482c2c97 fixed bug in dubt
the chain of unsorted candidates could grow beyond lowLimit.
2017-12-29 17:04:37 +01:00
Yann Collet f36da5b4d9 minor speed optimization : index overflow prevention
new code supposed to be easier to auto-vectorize
2017-12-29 14:40:33 +01:00
Yann Collet 5235d8d6ba first implementation of delayed update for btlazy2
This is a pretty nice speed win.

The new strategy consists in stacking new candidates as if it was a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using lazy strategy.

The baseline performance for btlazy2 on my laptop is :
15#calgary.tar       :   3265536 ->    955985 (3.416),  7.06 MB/s , 618.0 MB/s
15#enwik7            :  10000000 ->   3067341 (3.260),  4.65 MB/s , 521.2 MB/s
15#silesia.tar       : 211984896 ->  58095131 (3.649),  6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar       :   3265536 ->    958060 (3.408),  9.12 MB/s , 621.1 MB/s
15#enwik7            :  10000000 ->   3078318 (3.249),  6.37 MB/s , 525.1 MB/s
15#silesia.tar       : 211984896 ->  58444111 (3.627),  9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as a maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary, it represents the maximum depth of searches.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.

As a consequence, in the new implementation,
the effective depth of the binary tree is substantially shorter.

To compensate, it's enough to increase `searchLog` value.
Here is the result after adding just +1 to searchLog (level 15 setting in this patch):
15#calgary.tar       :   3265536 ->    956311 (3.415),  8.32 MB/s , 611.4 MB/s
15#enwik7            :  10000000 ->   3067655 (3.260),  5.43 MB/s , 535.5 MB/s
15#silesia.tar       : 211984896 ->  58113144 (3.648),  8.35 MB/s , 679.3 MB/s

aka, almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
2017-12-28 16:58:57 +01:00
Yann Collet 473362e922
Merge pull request #958 from facebook/continueCCtx
fix a subtle issue in continue mode
2017-12-20 00:12:50 +01:00
Yann Collet cafedcbbe4 ZSTD_resetCCtx_internal: fixed order of arguments
params1 was swapped with params2.
This used to be a non-issue when testing for strict equality,
but now that some tests look for "sufficient size" `<=`, order matters.
2017-12-19 21:49:04 +01:00
Yann Collet 9096088f45 changed variable name for clarity, suggested by @terrelln 2017-12-19 21:20:46 +01:00
Yann Collet f299fa39ac fix a subtle issue in continue mode
The deep fuzzer tests caught a subtle bug that was probably there for a long time.
The impact of the bug is not a crash, or any other clear error signal,
rather, it reduces performance, by cutting data into smaller blocks.
Eventually, the following test would fail because it produces too many 1-byte blocks,
requiring more space than buffer can provide :
`./zstreamtest_asan --mt -s3514 -t1678312 -i1678314`

The root scenario is as follows :
- Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1`
- The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`.
  `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`.
  The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes.
- Complete compression
- New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN`
  trigger "continue mode"
- Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only,
  but in fact, it also depends on `pledgedSrcSize`.
- The "old" blocksize (1) is still there,
  next compression will use this value to cut input into blocks,
  resulting in more blocks and worse performance than necessary performance.

Given the scenario, and its possible variants, I'm surprised it did not show up before.
But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger.
The above test is a special corner case, where performance is so impacted that it reaches an error case.

The fix works, but I'm not completely pleased.
I think the current code relies too much on implied relations between variables.
This will likely break again in the future when some related part of the code change.
Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3.
So a longer term fix will have to be considered after the release.

To do : create a reliable test case which triggers this scenario for CI tests.
2017-12-19 09:43:03 +01:00
Yann Collet 5c2f2ebfdb zstdmt via compress_generic: reduce opportunity to free/create mtctx
`zstreamtest --newapi` (and `--opaqueapi`) create and destroy way too many threads
resulting in failure of tsan tests,
and potentially connected to the qemu flaky tests.

This is because, at each test, the nb of threads can be changed (random).

The `--no-big-tests` directive reduce this choice to 1/2 threads,
in order to limit memory usage, especially for qemu and 32-bits builds.
Unfortunately, swapping between 1 and 2 threads is enough to constantly create/destroy new mtctx.

This patch takes advantage of the following property :
via compress_generic, no internal mtctx is needed for nbThreads < 2.
As a consequence, when nbThreads == 2, the currently active mtctx is necessarily good.

This dramatically reduces the nb of thread creations when invoking `zstreamtest --newapi --no-big-tests`
(only when parent cctx itself is created, which is randomized to 1/256 tests).

Expected outcome :
- at a minimum : tsan tests shall now work continuously without exploding the thread counter
- at best : flaky qemu tests on `zstreamtest --newapi --no-big-tests` may stop being flaky, due to less stress from constant thread creation/destruction

Real world impact :
minimal, I don't expect users to constantly change `nbThreads` between each invocation.
If `nbThreads` remains stable, existing implementation re-uses existing mtctx.

Also : `zstreamtest --newapi` but without `--no-big-tests` doesn't benefit as much,
since this test can select a random `nbThreads` value between 1 and 4.
The current patch only reduces opportunity to free/create mtctx (for example : 2->1->2 doesn't need a new mtctx)
but doesn't completely eliminate it, since `nbThreads` can still change between 2/3/4.
A more complete solution could be to only use 2 out of 4 allocated threads, thus keeping the pool at a constant size.
This would require a larger change to `POOL_*` api though.
2017-12-16 12:48:13 -08:00
Yann Collet 3cbfac1cdb updated levels 15-20
taking advantage of `btopt` improved speed to tune parameters.
Levels 16-19 are stronger than previous release, making the graph more favorable.

In theory, I should also update small-size tables,
but I got lazy on that one ...
2017-12-14 23:29:00 -08:00
Yann Collet 2cff66b62f version bump to v1.3.3 2017-12-14 16:11:20 -08:00
Yann Collet 8c41a9cb1e
Merge pull request #951 from facebook/lastBlock
saves 3-bytes on small input with streaming API
2017-12-14 15:39:50 -08:00
Yann Collet a0ac8c895c
Merge pull request #950 from facebook/srcSizeAdaptation
fix adaptation on srcSize
2017-12-14 14:48:31 -08:00
Yann Collet 281f06e01f saves 3-bytes on small input with streaming API
zstd streaming API was adding a null-block at end of frame for small input.

Reason is : on small input, a single block is enough.
ZSTD_CStream would size its input buffer to expect a single block of this size,
automatically triggering a flush on reaching this size.

Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`).
The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame.

The solution is to not flush automatically, which is btw the expected behavior.
It happens in this case because blocksize is defined with exactly the same size as input.
Just adding one-byte is enough to stop triggering the automatic flush.

I initially looked at another solution, solving the problem directly in the compression context.
But it felt awkward.
Now, the underlying compression API `ZSTD_compressContinue()` would take the decision the close a frame
on reaching its expected end (`pledgedSrcSize`).
This feels awkward, a responsability over-reach, beyond the definition of this API.
ZSTD_compressContinue() is clearly documented as a guaranteed flush,
with ZSTD_compressEnd() generating a guaranteed end.

I faced similar issue when trying to port a similar mechanism at the higher streaming layer.
Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller,
since it did not explicitly requested an end of frame.
The only sensible action remaining after that is to end the frame with no additional input.
This adds additional logic in the ZSTD_CStream state to check this condition.
Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?)

In the end, just enlarging input buffer by 1 byte feels the least intrusive change.
It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code.

The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.
2017-12-14 11:47:02 -08:00
Yann Collet c005df136f
Merge pull request #947 from facebook/fix944
Fix #944
2017-12-14 10:01:52 -08:00
Yann Collet 2e97a6d464 fixed minor declaration-after-statement warning 2017-12-13 18:50:05 -08:00
Yann Collet 5432ef6921 fixes adaptation on srcSize
This patch restores capability for each file to receive adapted compression parameters depending on its size.

The bug breaking this feature was relatively silly :
setting a parameter with a value "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.

Unfortunately, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameters selection between tableID 0,1,2,3 is largely internal,
leaving no trace to outside world, not even in frame header.
2017-12-13 17:45:26 -08:00
Yann Collet d23eb9a098 zstreamtest : added missing CHECK_Z() 2017-12-13 15:35:49 -08:00
Nick Terrell 22727a7467 Fix cdict compressor repcodes 2017-12-13 11:31:20 -08:00
Yann Collet e28305fcca fix #944 : ZSTDMT with large files and dictionary now works correctly
windowLog is now enforced from provided compression parameters,
instead of being copied blindly from `cdict`
where it could be smaller.

also :
- fix a minor bug in zstreamtest --mt : advanced parameters must be set before init
- changed advanced parameter name to ZSTDMT_jobSize
2017-12-12 18:04:58 -08:00
Yann Collet 03832b7aa5 re-added test case
messing with revert ... :(
2017-12-12 14:01:54 -08:00
Yann Collet 8a104fda05 Revert "Created a test case which reliably reproduces bug #944"
This reverts commit 5098d1fbe2.
2017-12-12 12:51:49 -08:00
Yann Collet 5098d1fbe2 Created a test case which reliably reproduces bug #944
in zstreamtest.
2017-12-12 12:48:31 -08:00
Yann Collet ac8e022806
Merge pull request #943 from facebook/fix942
Fix #942
2017-12-08 13:53:08 -05:00
Yann Collet dfc697e967 comment clarification 2017-12-08 12:16:49 -05:00
Yann Collet c029ee1f0b ZSTD_initCStream_srcSize() considers "0" to mean "unknown"
to not break existing programs relying on this behavior.
Might be changed to mean "empty" in the future.
2017-12-07 17:13:10 -05:00
Yann Collet 3aa2b27a89 fix #942 : streaming interface does not compress after ZSTD_initCStream()
While the final result is still, technically, a frame,
the resulting frame expands initial data instead of compressing it.
This is because the streaming API creates a tiny 1-byte buffer for input,
because it believes input is empty (0-bytes),
because in the past, 0 used to mean "unknown" instead.

This patch fixes the issue.
Todo : add a test which traps the issue.
2017-12-07 02:52:50 -05:00
Yann Collet c173dbd6e7 no longer supported starting C++17 2017-12-04 18:00:53 -08:00
Yann Collet 7e05ef851a Merge branch 'dev' into qemu32panic 2017-12-03 11:14:36 -08:00
Yann Collet 5e1f34b7e4 setParameter : no side-effect on setting a compression parameter
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is no a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
2017-12-01 21:17:09 -08:00
Yann Collet 78290874a5 fixed Visual warning on minor interface discrepancy 2017-11-29 17:01:14 -08:00
Yann Collet d3c59edac9 removed long-range-mode tests from `zstreamtest --no-big-tests` 2017-11-29 16:42:20 -08:00
Yann Collet 998a93b784 simplified ZSTD_CCtx_setParametersUsingCCtxParams()
Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action.
Any action shall be taken at parameter application only (during init).
It makes it possible to just copy CCtxParams from external container to internal state,
and get rid of the more complex code which was trying to compensate for missing actions.
2017-11-29 16:13:05 -08:00
Yann Collet f98ee994c4 zstd_opt: added comments, as requested by @terrelln 2017-11-29 15:19:00 -08:00
Yann Collet bc42bc3b1d removed one invocation of SET_PRICE() macro 2017-11-28 16:08:56 -08:00
Yann Collet 0a0a212934 zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.

Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.

The "proper" formula makes it possible to remove this heuristic.

Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).

It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
2017-11-28 14:07:03 -08:00
Yann Collet b71405dc51 removed a bunch of code related to cached literal price
optState was used both to evaluate price
and to cache cost of previously calculated literals.
This created a strong dependency, forcing parser to request cost in a strict order.
This limitation is forbids future parser with skipping capabilities.

After this patch, caching literals price still exists,
but is now explicit, in a stack structure.
2017-11-28 12:32:24 -08:00
Yann Collet 03f30d9dcb separate rawLiterals, fullLiterals and match costs
removed one SET_PRICE() macro invocation
2017-11-28 12:14:46 -08:00
Yann Collet eee87cd6f2 btopt: minor refactor : removed one SET_PRICE() macro invocation
direct assignment makes operation cleaner.
Also allows some (very minor) optimization (non-measurable)
2017-11-27 17:18:57 -08:00
Yann Collet e9d1987fd7 btopt: minor speed optimization
matchPrice is always right at beginning
2017-11-27 17:01:51 -08:00
Yann Collet bd88f633ac zstreamtest : in `-T#s`, s considered a suffix meaning "seconds"
avoid unintentionnally triggering `seedset`,
so that seed gets automatically determined when not set.
2017-11-27 12:15:23 -08:00
Yann Collet f8d5c478af fixed comment, reported by @gyscos 2017-11-21 10:36:14 -08:00
Yann Collet 4154aec679 fixed comment, as suggested by @terrelln 2017-11-21 10:26:17 -08:00
Yann Collet 899f2a29f6 strategy ZSTD_btopt pinned to (0) variant (faster one) 2017-11-20 11:53:20 -08:00
Yann Collet 3f457264d1 slightly improved compression speed 2017-11-19 14:40:21 -08:00
Yann Collet 42c1e64270 slightly improved ratio at -22
merging of repcode search into btsearch introduced a small compression ratio regressio at max level :
1.3.2 : 52728769
after repMerge patch : 52760789 (+32020)

A few minor changes have produced this difference.
They can be hard to spot.

This patch buys back about half of the difference,
by no longer inserting position at hc3 when a long match is found there.
It feels strangely counter-intuitive, but works :
after this patch : 52742555 (-18234)
2017-11-19 14:00:55 -08:00
Yann Collet 99435dbbab minor : search early-out on sufficient_len for hc3 and rep
very very small speed and ratio increases
2017-11-19 12:58:04 -08:00
Yann Collet d100670045 btopt0 : a bit faster and weaker 2017-11-19 10:38:02 -08:00
Yann Collet e6da37c430 created (hidden) new strategy btopt0
about ~+10% faster but losing ~0.01 compression ratio
(note : amplitude vary a lot depending on files, but direction remains the same)
2017-11-19 10:21:21 -08:00
Yann Collet e717a5b0dd zstd_opt: minor speed optimization
Calculate reference log2sums only once per serie of sequence
(as opposed to once per sequence)

Also: improved code comments
2017-11-18 16:24:02 -08:00
Yann Collet d11661c3ec fix ZSTD_COMPRESSBOUND() macro
It was using macro `KB`, which is not defined in `zstd.h`.
2017-11-18 11:16:39 -08:00
Yann Collet a4a20a4b2f fix un-initialized memory warning
harmless, but cleaner
2017-11-17 15:51:52 -08:00
Yann Collet 23767e950a fix one UB pointer arithmetic in encoder
Instead of calculating distance between 2 memory objects, which is UB,
we extract the offset from object 1, and transfer it into object 2.
2017-11-17 13:24:51 -08:00
Yann Collet cdade555ee fixed one UB pointer arithmetic 2017-11-17 11:40:08 -08:00
Yann Collet 11e58d9ba4 fixed minor warning
warning: void function returning a value
(even if the return value is void)
2017-11-16 15:21:30 -08:00
Yann Collet 15768cabb5 fixed some complex scenarios
Fixed : multithreading to compress some small data with dictionary
Fixed : ZSTD_initCStream_usingCDict()
Improved streaming memory usage when pledgedSrcSize is known.
2017-11-16 15:18:18 -08:00
Yann Collet 05dffe43a7 Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.

Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.

It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.

Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.

Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.

- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
  Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
2017-11-16 12:18:56 -08:00
Yann Collet dfc14579f5 removed wrong assertion 2017-11-15 15:35:56 -08:00
Yann Collet c55e35b2fc removed a few specialized traces 2017-11-15 15:04:53 -08:00
Yann Collet 61c2d70c86 shortened repcode match finder implementation 2017-11-15 14:37:40 -08:00
Yann Collet d7e9805028 fixed corruption issue 2017-11-15 13:44:24 -08:00
Yann Collet 046ea53bef still fighting data corruption
due to messed up tree.
Seems to happen when reaching end of buffer.
2017-11-15 11:29:24 -08:00
Yann Collet 4202b2e8a6 merged rep search into btMatchSearch
but there is a tree corruption somewhere ...
bug hunt ongoing
2017-11-14 20:38:52 -08:00
Yann Collet 9a11f70dc3 merged repcode search into BT match search
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)

It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).

warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
2017-11-13 02:23:48 -08:00
Yann Collet eb47705b18 reduced scope of multiple variables
renamed some variables for better understanding
2017-11-10 08:31:12 -08:00
Yann Collet 100d8ad6be lib/compress: created ZSTD_LLcode() and ZSTD_MLcode()
transform length into code.
Since transformation is needed in several places throughout the code,
better write the logic in one place.
2017-11-08 12:43:05 -08:00
Yann Collet 5aa0352742 zstd_opt: simplified ZSTD_getPrice() and ZSTD_updatePrice() interface
ZSTD_getPrice() and ZSTD_updatePrice() accept normal matchlength as argument
instead of matchlength-MINMATCH,
which makes them easier / more logical to use and read.
Conversion is simply done internally.
2017-11-08 12:23:27 -08:00
Yann Collet bf730e2044 zstd_opt: refactor code for improved readability
renamed variables to be more meaningful
reduced scope of multiple variables
removed some useless var attribution
2017-11-08 12:07:39 -08:00
Yann Collet 4191efa993 zstd_opt: ensure sufficient_len < ZSTD_OPT_NUM to simplify some tests 2017-11-08 11:24:00 -08:00
Yann Collet ee441d5d2b renamed zstd_compress.h into zstd_compress_internal.h
to emphasize the fact that all definitions it contains
must remain private, accross lib/compress modules.
2017-11-07 16:15:23 -08:00
Yann Collet 8b6aecf2cb moved a few structures from `zstd_internal.h` to `zstd_compress.h`
which is a more precise scope
2017-11-07 16:03:14 -08:00
Yann Collet aec56a52fb
Merge pull request #908 from facebook/ubsan
Modified one pointer arithmetic expression to a more conformant way.
2017-11-07 11:45:34 -08:00
Yann Collet d0ffd398d2
Merge pull request #906 from facebook/fixAutoPledge
fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize
2017-11-02 10:14:20 -07:00
Yann Collet 150354c5fe minor refactor
added some traces and assert
related to hunting a potential ubsan error in 32-bits more
(it ends up being a compiler-side issue : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82802).

Modified one pointer arithmetic expression for a more conformant way.
2017-11-01 16:57:48 -07:00
Yann Collet 428e8b3bf4 fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overriden by srcSize instead.

This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.

It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
2017-11-01 13:15:23 -07:00
Nick Terrell 1fc4f593da Allow skippable frames of any size 2017-11-01 13:07:26 -07:00
Yann Collet 61e5a1adfc removed direct call to malloc() from pool.c 2017-10-31 17:43:24 -07:00
Yann Collet f73e15de33
Merge pull request #903 from terrelln/empty-input
[libzstd] Fix parameter selection for empty input
2017-10-28 17:28:08 -07:00
Nick Terrell 86b8134cad [libzstd] Fix parameter selection for empty input
ZSTD_compress() and friends would treat an empty input as an unknown size
when selecting parameters. Thus, they would drastically overallocate the
context. Tell ZSTD_getParams() that the source size is 1 when it is empty.
2017-10-25 17:24:15 -07:00
Nick Terrell b495140f67 Update BUCK files
* Correct XXH namespace (Fixes #901)
* Multithreading always enabled
* GZIP/LZ4/LZMA always enabled
* Legacy support always fully enabled
2017-10-25 12:47:57 -07:00
Yann Collet 97dccbbb2b fixed zbufftest
preserve "pledgedSrcSize=0" means "unknown" in init_advanced()
2017-10-19 14:06:02 -07:00
Yann Collet ca1a9ebac5 fixed zlib wrapper
it was invoking ZSTD_initCStream_advanced() with pledgedSrcSize==0 and contentSizeFlag=1
which means "empty"
while the intention was to mean "unknown".

The contentSizeFlag==1 is new, it is a consequence of setting this value to 1 by default.

The solution selected here is to pass ZSTD_CONTENTSIZE_UNKNOWN to mean "unknown".
So contentSizeFlag remains set (it wasn't in previous versions).
2017-10-18 11:22:23 -07:00
Yann Collet 1ff8a8c109 Merge pull request #891 from facebook/contentSize
Content size
2017-10-17 17:24:51 -07:00
Yann Collet 32c9f715ae fixed : Visual build compressing stdin with multi-threading enabled fails
It was multiple reasons stacked :
- Visual use a different code path, because ZSTD_NEWAPI is not defined
- fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN`  (fixed)
- ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)
2017-10-17 14:07:43 -07:00
Yann Collet 13bfe885aa edited ZSTD_initCStream_advanced() comment 2017-10-16 14:06:22 -07:00