Commit Graph

201 Commits (fca13c6855d28d6ba8628794465372cb6088442d)

Author SHA1 Message Date
Yann Collet fca13c6855 zstdmt : fixed memory leak
writeLastEmptyBlock() must release srcBuffer
as mtctx assumes it's done by job worker.

minor : changed 2 job member names (src->srcBuffer, srcStart->prefixStart) for clarity
2018-01-26 10:44:09 -08:00
Yann Collet 8e128eaf05 zstdmt : refactor job members
grouped by sharing properties
2018-01-26 10:20:38 -08:00
Yann Collet a1d4041e69 zstdmt: removed job->jobCompleted
replaced by equivalent signal job->consumer == job->srcSize.

created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.

It avoids creating a job with srcSize==0.
2018-01-25 17:35:49 -08:00
Yann Collet 1272d8e760 zstdmt:: renamed mutex and cond to underline they are context-global 2018-01-25 14:52:34 -08:00
Yann Collet 5f349b129c zstdmt : correctly set end of frame 2018-01-23 15:52:40 -08:00
Yann Collet c1cc57f270 zstdmt : fix end condition (ZSTD_e_end)
When ZSTD_e_end directive is provided,
the question is not only "are internal buffers completely flushed",
it is also "is current frame completed".

In some rare cases,
it was possible for internal buffers to be completely flushed,
triggering a @return == 0,
but frame was not completed as it needed a last null-size block to mark the end,
resulting in an unfinished frame.
2018-01-23 15:19:11 -08:00
Yann Collet de5e38a7a6 zstdmt: fixed minor race condition
no real consequence, but pollute tsan tests :

job->dstBuff is being modified inside worker,
while main thread might read it accidentally
because it copies whole job.
But since it doesn't used dstBuff, there is no real consequence.

Other potential solution : only copy useful data, instead of whole job
2018-01-23 14:03:07 -08:00
Yann Collet ebd955e26a zstdmt : fixed ending frame with 0-size block 2018-01-23 13:12:40 -08:00
Yann Collet a7ef3a219c zstdmt : fixed last job size 2018-01-19 18:19:09 -08:00
Yann Collet 3ad7d4951c zstdmt : finally vanquished an elusive and rare race condition 2018-01-19 17:35:08 -08:00
Yann Collet 940634a610 zstdmt : simplify job creation
job will not be created when not enough room within job Table
2018-01-19 13:25:06 -08:00
Yann Collet dc69623453 zstdmt: fixed corruption issue in ZSTDMT_endStream()
when invoked directly.
2018-01-19 12:41:56 -08:00
Yann Collet 70f81d6030 zstdmt uses POOL_tryAdd() to call a new worker
so that it's no longer a blocking call.
This makes it possible to stream out data gradually,
while waiting for a worker to become available.
2018-01-19 10:01:40 -08:00
Yann Collet 6f7280fb33 fixed frame checksum issue
and race conditions
2018-01-18 16:20:26 -08:00
Yann Collet ef97d5a287 Merge branch 'progressiveMT' into progressiveFlush 2018-01-18 13:35:24 -08:00
Yann Collet c7190c69cc fixes for @terrelln comments 2018-01-18 11:15:23 -08:00
Yann Collet 1b5d80d633 zstdmt: added ability to flush current job before it's completed
however, zstdmt may still wait on next available worker,
so it's not smooth yet.
2018-01-18 11:03:27 -08:00
Yann Collet aa79c18e3f fixed a few access contention
passes thread sanitizer test
2018-01-17 17:18:19 -08:00
Yann Collet 394eec697b Introduce ZSTD_getFrameProgression()
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced

Ingested can be larger than consumed due to buffering effect.

For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.

That being said, update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.

This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
2018-01-17 16:39:02 -08:00
Yann Collet d14cc881b0 zstdmt : fixed very large window sizes
would create too large buffers,
since default job size == window size * 4.

This would crash on 32-bit systems.

Also : jobSize being a 32-bit unsigned, it cannot be >= 4 GB,
so the formula was failing for large window sizes >= 1 GB.
Fixed now : max job Size is 2 GB, whatever the window size.
2018-01-17 12:39:58 -08:00
Yann Collet 58dd7de640 zstdmt: fixed an endless loop on allocation failure
this happened on 32-bits build when requiring a too large input buffer,
typically on wlog=29, creating jobs of 2 GB size.

also : zstd32 now compiles with multithread support enabled by default
(can be disabled with HAVE_THREAD=0)
2018-01-17 12:10:15 -08:00
Yann Collet cb57c107ff zstdmt: minor variable renaming, for clarity 2018-01-17 11:39:07 -08:00
Yann Collet 1dba98d563 introduced parameter ZSTD_p_nonBlockingMode
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.

It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.

Applied within fileio.c, this patch provides non-negligible gains during compression.

Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9

With traditional single-thread blocking mode :
real    0m9.557s
user    0m8.861s
sys     0m0.538s

With new single-worker non blocking mode :
real    0m7.938s
user    0m8.049s
sys     0m0.514s

=> 20% faster
2018-01-16 16:15:47 -08:00
Yann Collet 6025465e42 ZSTDMT : minor CCtx memory optimization
can be useful when a compression job only has small amount of data to compress.
2018-01-16 15:34:41 -08:00
Yann Collet 2e23333094 ZSTDMT can now work in non-blocking mode with 1 thread
it still fallbacks to single-thread blocking invocation
when input is small (<1job)
or when invoking ZSTDMT_compress(), which is blocking.

Also : fixed a bug in new block-granular compression routine.
2018-01-16 15:28:43 -08:00
Yann Collet 58ecf13e02 zstdmt : can compress at block granularity
offering perspective of more accurate progression report.
2018-01-13 13:18:57 -08:00
Yann Collet 473362e922
Merge pull request #958 from facebook/continueCCtx
fix a subtle issue in continue mode
2017-12-20 00:12:50 +01:00
Yann Collet f299fa39ac fix a subtle issue in continue mode
The deep fuzzer tests caught a subtle bug that was probably there for a long time.
The impact of the bug is not a crash, or any other clear error signal,
rather, it reduces performance, by cutting data into smaller blocks.
Eventually, the following test would fail because it produces too many 1-byte blocks,
requiring more space than buffer can provide :
`./zstreamtest_asan --mt -s3514 -t1678312 -i1678314`

The root scenario is as follows :
- Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1`
- The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`.
  `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`.
  The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes.
- Complete compression
- New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN`
  trigger "continue mode"
- Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only,
  but in fact, it also depends on `pledgedSrcSize`.
- The "old" blocksize (1) is still there,
  next compression will use this value to cut input into blocks,
  resulting in more blocks and worse performance than necessary performance.

Given the scenario, and its possible variants, I'm surprised it did not show up before.
But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger.
The above test is a special corner case, where performance is so impacted that it reaches an error case.

The fix works, but I'm not completely pleased.
I think the current code relies too much on implied relations between variables.
This will likely break again in the future when some related part of the code change.
Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3.
So a longer term fix will have to be considered after the release.

To do : create a reliable test case which triggers this scenario for CI tests.
2017-12-19 09:43:03 +01:00
Yann Collet 5c2f2ebfdb zstdmt via compress_generic: reduce opportunity to free/create mtctx
`zstreamtest --newapi` (and `--opaqueapi`) create and destroy way too many threads
resulting in failure of tsan tests,
and potentially connected to the qemu flaky tests.

This is because, at each test, the nb of threads can be changed (random).

The `--no-big-tests` directive reduce this choice to 1/2 threads,
in order to limit memory usage, especially for qemu and 32-bits builds.
Unfortunately, swapping between 1 and 2 threads is enough to constantly create/destroy new mtctx.

This patch takes advantage of the following property :
via compress_generic, no internal mtctx is needed for nbThreads < 2.
As a consequence, when nbThreads == 2, the currently active mtctx is necessarily good.

This dramatically reduces the nb of thread creations when invoking `zstreamtest --newapi --no-big-tests`
(only when parent cctx itself is created, which is randomized to 1/256 tests).

Expected outcome :
- at a minimum : tsan tests shall now work continuously without exploding the thread counter
- at best : flaky qemu tests on `zstreamtest --newapi --no-big-tests` may stop being flaky, due to less stress from constant thread creation/destruction

Real world impact :
minimal, I don't expect users to constantly change `nbThreads` between each invocation.
If `nbThreads` remains stable, existing implementation re-uses existing mtctx.

Also : `zstreamtest --newapi` but without `--no-big-tests` doesn't benefit as much,
since this test can select a random `nbThreads` value between 1 and 4.
The current patch only reduces opportunity to free/create mtctx (for example : 2->1->2 doesn't need a new mtctx)
but doesn't completely eliminate it, since `nbThreads` can still change between 2/3/4.
A more complete solution could be to only use 2 out of 4 allocated threads, thus keeping the pool at a constant size.
This would require a larger change to `POOL_*` api though.
2017-12-16 12:48:13 -08:00
Yann Collet e28305fcca fix #944 : ZSTDMT with large files and dictionary now works correctly
windowLog is now enforced from provided compression parameters,
instead of being copied blindly from `cdict`
where it could be smaller.

also :
- fix a minor bug in zstreamtest --mt : advanced parameters must be set before init
- changed advanced parameter name to ZSTDMT_jobSize
2017-12-12 18:04:58 -08:00
Yann Collet 03832b7aa5 re-added test case
messing with revert ... :(
2017-12-12 14:01:54 -08:00
Yann Collet 8a104fda05 Revert "Created a test case which reliably reproduces bug #944"
This reverts commit 5098d1fbe2.
2017-12-12 12:51:49 -08:00
Yann Collet 5098d1fbe2 Created a test case which reliably reproduces bug #944
in zstreamtest.
2017-12-12 12:48:31 -08:00
Yann Collet 5e1f34b7e4 setParameter : no side-effect on setting a compression parameter
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is no a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
2017-12-01 21:17:09 -08:00
Yann Collet d3c59edac9 removed long-range-mode tests from `zstreamtest --no-big-tests` 2017-11-29 16:42:20 -08:00
Yann Collet 05dffe43a7 Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.

Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.

It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.

Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.

Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.

- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
  Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
2017-11-16 12:18:56 -08:00
Yann Collet ee441d5d2b renamed zstd_compress.h into zstd_compress_internal.h
to emphasize the fact that all definitions it contains
must remain private, accross lib/compress modules.
2017-11-07 16:15:23 -08:00
Yann Collet 8b6aecf2cb moved a few structures from `zstd_internal.h` to `zstd_compress.h`
which is a more precise scope
2017-11-07 16:03:14 -08:00
Yann Collet 150354c5fe minor refactor
added some traces and assert
related to hunting a potential ubsan error in 32-bits more
(it ends up being a compiler-side issue : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82802).

Modified one pointer arithmetic expression for a more conformant way.
2017-11-01 16:57:48 -07:00
Yann Collet 32c9f715ae fixed : Visual build compressing stdin with multi-threading enabled fails
It was multiple reasons stacked :
- Visual use a different code path, because ZSTD_NEWAPI is not defined
- fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN`  (fixed)
- ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)
2017-10-17 14:07:43 -07:00
Yann Collet beb9b4b398 fixed ZSTDMT_initCStream() when contentSizeFlag==1 by default
and a wrong test in zstreamtest --mt
2017-10-13 19:09:30 -07:00
Yann Collet 3c1e3f8ec9 contentSizeFlag enabled by default would also fail for streaming and MT operations
fixed
2017-10-13 18:32:06 -07:00
Yann Collet 67478f4cb0 fixed minor conversion warnings for printf
in debug mode
2017-10-02 17:28:57 -07:00
Yann Collet 7c9669f272 Merge pull request #873 from facebook/shorterTests
Leaner tests
2017-09-29 17:26:46 -07:00
Yann Collet 1416bc0f07 erase existence of a buffer when it's sent out of the pool
In some complex scenario,
the buffer would be freed because it's too large,
another buffer would be allocated, but fail,
trigger an error,
and the general buffer pool would then be freed,
where the definition of the already freed buffer would be found
(beyond total index, but still), and freed again, resulting in double-free error.
2017-09-29 16:27:47 -07:00
Yann Collet e963800e27 zstdmt : fixed : buffer dst0 wasn't properly set to null after usage
now it's possible to unconditionnally invoke ZSTD_releaseAllJobRessources()
wether previous compression was completed correctly or not.
2017-09-28 23:01:31 -07:00
Yann Collet 754ae5cc0b removed ZSTDMT_waitForAllJobsCompleted() from ZSTDMT_freeCCtx()
as per @terrelln comment
2017-09-28 20:45:31 -07:00
Yann Collet b93598d6a4 zstdmt : reduced maximum nb of threads
to avoid memory address space issues on 32-bits systems
(see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=876416#17)
2017-09-28 13:49:12 -07:00
Yann Collet e4ec427720 Merge branch 'dev' into shorterTests
fixed conflicts
2017-09-28 12:19:28 -07:00
Yann Collet 8074261d00 zstdmt : move on when not enough memory for a new input buffer
just continue operations without input forward progress,
instead of an error that stops current compression session.
2017-09-28 11:46:19 -07:00