274 Commits

Author SHA1 Message Date
Nick Terrell
94c77710a9 Integrate ldm with zstdmt
Integrate ldm into zstdmt by running it in serial and in order in the first
step of each job, in the same place as the hash gets updated. The input
buffer is sized to fit the whole LDM window and 2 full buffers of slack.
Input buffers cannot be reused until the LDM step is done with them.
After the LDM step is finished, the jobs don't actually have access to the
full window, only the overlap.

Tested on a few different multi-GB files with and without sanitizers,
and with different numbers of threads.
2018-03-19 16:29:03 -07:00
Nick Terrell
aa4dbd09a1
Pull job/overlap log logic into common function (#1055)
Prepares for LDM integration by separating the job size and overlap logic
into helper functions.
2018-03-19 15:56:36 -07:00
Yann Collet
6f4d0778a5 make it possible to express compression parameters in any order 2018-03-19 14:41:23 -07:00
Nick Terrell
2253d01b27 Move XXH64_update() into worker threads
* Computes the XXH hash in the worker threads.
* Workers get a sequence number and wait until ther number shows up. On
  error, ensures that its sequence is finished, so future threads don't
  get blocked.
* Sets up for ldm integration, which will go in the same spot.
2018-03-19 11:08:27 -07:00
Nick Terrell
f15a17e19f Use a single buffer in zstdmt
Summary:
Allocate a single input buffer large enough to house each job, as well as
enough space for the IO thread to write 2 extra buffers. One goes in the
`POOL` queue, and one to fill, and then block on a full `POOL` queue.
Since we can't overlap with the prefix, we allocate space for 3 extra
input buffers.

Test Plan:
* CI
* With and without ASAN/UBSAN run zstdmt with different number of threads
  on two large binaries, and verify that their checksums match.
* Test on the tip of the zstdmt ldm integration.

Reviewers: cyan

Differential Revision: https://phabricator.intern.facebook.com/D7284007

Tasks: T25664120
2018-03-15 16:21:33 -07:00
Yann Collet
50f763ec44 fixed several comments are underlined by @terrelln 2018-03-13 14:23:14 -07:00
Yann Collet
6a9b41b731 create command --fast[=#]
access negative compression levels from command line
for both compression and benchmark modes.

also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.

added relevant cli tests.
2018-03-11 20:01:23 -07:00
Yann Collet
5cb1144872 fixed --single-thread
was incorrectly set to -T0 (use as many cores as possible) previously
2018-02-13 14:56:35 -08:00
Yann Collet
c72091556b fixed minor nit as per @terrelln's comments 2018-02-09 09:46:08 -08:00
Yann Collet
5188749e1c ensure compression parameters are updated when only compression level is changed 2018-02-02 16:31:20 -08:00
Yann Collet
4b525af53a zstdmt: applies new parameters on the fly
when invoked from ZSTD_compress_generic()
2018-02-02 15:58:13 -08:00
Yann Collet
90eca318a7 fileio: create dedicated function to generate zstd frames
like other formats
2018-02-02 14:24:56 -08:00
Yann Collet
209df52ba2 Changed nbThreads for nbWorkers
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
2018-02-01 19:29:30 -08:00
Yann Collet
60fa90b6c0 zstdmt: added ability to change compression parameters during compression 2018-02-01 16:13:31 -08:00
Yann Collet
2cb0740b6b zstdmt: changed naming convention
to avoid confusion with blocks.

also:
- jobs are cut into chunks of 512KB now, to reduce nb of mutex calls.
- fix function declaration ZSTD_getBlockSizeMax()
- fix outdated comment
2018-01-30 14:43:36 -08:00
Yann Collet
ba0cd8cf78 fixed minor conversion warning for C++ compilation mode 2018-01-26 18:18:42 -08:00
Yann Collet
caf9e96dc3 job mutex creation is checked 2018-01-26 18:09:25 -08:00
Yann Collet
9c40ae7ff1 zstdmt: there is now one mutex/cond per job 2018-01-26 17:55:08 -08:00
Yann Collet
77e36273de zstdmt: minor code refactor for clarity 2018-01-26 17:08:58 -08:00
Yann Collet
27c5853c42 zstdmt: job table correctly cleaned after synchronous ZSTDMT_compress() 2018-01-26 14:35:54 -08:00
Yann Collet
0d426f6b83 zstdmt : refactor a few member names
for clarity
2018-01-26 13:00:14 -08:00
Yann Collet
79b6e28b0a zstdmt : flush() only lock to read shared job members
Other job members are accessed directly.
This avoids a full job copy, which would access everything,
including a few members that are supposed to be used by worker only,
uselessly requiring additional locks to avoid race conditions.
2018-01-26 12:15:43 -08:00
Yann Collet
d2b62b6fa5 minor : ZSTDMT_writeLastEmptyBlock() is a void function
because it cannot fail
2018-01-26 11:06:34 -08:00
Yann Collet
fca13c6855 zstdmt : fixed memory leak
writeLastEmptyBlock() must release srcBuffer
as mtctx assumes it's done by job worker.

minor : changed 2 job member names (src->srcBuffer, srcStart->prefixStart) for clarity
2018-01-26 10:44:09 -08:00
Yann Collet
8e128eaf05 zstdmt : refactor job members
grouped by sharing properties
2018-01-26 10:20:38 -08:00
Yann Collet
a1d4041e69 zstdmt: removed job->jobCompleted
replaced by equivalent signal job->consumer == job->srcSize.

created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.

It avoids creating a job with srcSize==0.
2018-01-25 17:35:49 -08:00
Yann Collet
1272d8e760 zstdmt:: renamed mutex and cond to underline they are context-global 2018-01-25 14:52:34 -08:00
Yann Collet
5f349b129c zstdmt : correctly set end of frame 2018-01-23 15:52:40 -08:00
Yann Collet
c1cc57f270 zstdmt : fix end condition (ZSTD_e_end)
When ZSTD_e_end directive is provided,
the question is not only "are internal buffers completely flushed",
it is also "is current frame completed".

In some rare cases,
it was possible for internal buffers to be completely flushed,
triggering a @return == 0,
but frame was not completed as it needed a last null-size block to mark the end,
resulting in an unfinished frame.
2018-01-23 15:19:11 -08:00
Yann Collet
de5e38a7a6 zstdmt: fixed minor race condition
no real consequence, but pollute tsan tests :

job->dstBuff is being modified inside worker,
while main thread might read it accidentally
because it copies whole job.
But since it doesn't used dstBuff, there is no real consequence.

Other potential solution : only copy useful data, instead of whole job
2018-01-23 14:03:07 -08:00
Yann Collet
ebd955e26a zstdmt : fixed ending frame with 0-size block 2018-01-23 13:12:40 -08:00
Yann Collet
a7ef3a219c zstdmt : fixed last job size 2018-01-19 18:19:09 -08:00
Yann Collet
3ad7d4951c zstdmt : finally vanquished an elusive and rare race condition 2018-01-19 17:35:08 -08:00
Yann Collet
940634a610 zstdmt : simplify job creation
job will not be created when not enough room within job Table
2018-01-19 13:25:06 -08:00
Yann Collet
dc69623453 zstdmt: fixed corruption issue in ZSTDMT_endStream()
when invoked directly.
2018-01-19 12:41:56 -08:00
Yann Collet
70f81d6030 zstdmt uses POOL_tryAdd() to call a new worker
so that it's no longer a blocking call.
This makes it possible to stream out data gradually,
while waiting for a worker to become available.
2018-01-19 10:01:40 -08:00
Yann Collet
6f7280fb33 fixed frame checksum issue
and race conditions
2018-01-18 16:20:26 -08:00
Yann Collet
ef97d5a287 Merge branch 'progressiveMT' into progressiveFlush 2018-01-18 13:35:24 -08:00
Yann Collet
c7190c69cc fixes for @terrelln comments 2018-01-18 11:15:23 -08:00
Yann Collet
1b5d80d633 zstdmt: added ability to flush current job before it's completed
however, zstdmt may still wait on next available worker,
so it's not smooth yet.
2018-01-18 11:03:27 -08:00
Yann Collet
aa79c18e3f fixed a few access contention
passes thread sanitizer test
2018-01-17 17:18:19 -08:00
Yann Collet
394eec697b Introduce ZSTD_getFrameProgression()
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced

Ingested can be larger than consumed due to buffering effect.

For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.

That being said, update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.

This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
2018-01-17 16:39:02 -08:00
Yann Collet
d14cc881b0 zstdmt : fixed very large window sizes
would create too large buffers,
since default job size == window size * 4.

This would crash on 32-bit systems.

Also : jobSize being a 32-bit unsigned, it cannot be >= 4 GB,
so the formula was failing for large window sizes >= 1 GB.
Fixed now : max job Size is 2 GB, whatever the window size.
2018-01-17 12:39:58 -08:00
Yann Collet
58dd7de640 zstdmt: fixed an endless loop on allocation failure
this happened on 32-bits build when requiring a too large input buffer,
typically on wlog=29, creating jobs of 2 GB size.

also : zstd32 now compiles with multithread support enabled by default
(can be disabled with HAVE_THREAD=0)
2018-01-17 12:10:15 -08:00
Yann Collet
cb57c107ff zstdmt: minor variable renaming, for clarity 2018-01-17 11:39:07 -08:00
Yann Collet
1dba98d563 introduced parameter ZSTD_p_nonBlockingMode
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.

It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.

Applied within fileio.c, this patch provides non-negligible gains during compression.

Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9

With traditional single-thread blocking mode :
real    0m9.557s
user    0m8.861s
sys     0m0.538s

With new single-worker non blocking mode :
real    0m7.938s
user    0m8.049s
sys     0m0.514s

=> 20% faster
2018-01-16 16:15:47 -08:00
Yann Collet
6025465e42 ZSTDMT : minor CCtx memory optimization
can be useful when a compression job only has small amount of data to compress.
2018-01-16 15:34:41 -08:00
Yann Collet
2e23333094 ZSTDMT can now work in non-blocking mode with 1 thread
it still fallbacks to single-thread blocking invocation
when input is small (<1job)
or when invoking ZSTDMT_compress(), which is blocking.

Also : fixed a bug in new block-granular compression routine.
2018-01-16 15:28:43 -08:00
Yann Collet
58ecf13e02 zstdmt : can compress at block granularity
offering perspective of more accurate progression report.
2018-01-13 13:18:57 -08:00
Yann Collet
473362e922
Merge pull request #958 from facebook/continueCCtx
fix a subtle issue in continue mode
2017-12-20 00:12:50 +01:00