Commit Graph

1436 Commits (72dbf1bcd04fe429f00e7435255a16470764871c)

Author SHA1 Message Date
George Lu e89f1fb45c Fix scan-build warnings in bench.c 2018-08-14 14:44:47 -07:00
Yann Collet 3e4617ef54 frameProgression reports nbActiveWorkers and output flushed 2018-08-14 11:49:25 -07:00
Yann Collet 973a8d42c7
Merge pull request #1236 from GeorgeLu97/paramgrillconstraints
ParamgrillConstraints
2018-08-13 15:44:50 -07:00
Yann Collet 0853f86044 adaptive mode uses default window size of 8 MB 2018-08-13 13:13:22 -07:00
Yann Collet 33f7709c71 fileio: changed parameter type from ptr to plain structure
safer : this parameter is read-only,
we don't want original structure to be modified
2018-08-13 13:02:03 -07:00
Yann Collet f3aa510738 rateLimiter does not "catch up" when input speed is slow 2018-08-13 11:38:55 -07:00
Yann Collet e7a49c6683 introduced command --adapt 2018-08-11 20:48:06 -07:00
Yann Collet 9d26cb6a75 slow down faster when output speed is limited 2018-08-09 17:44:30 -07:00
Yann Collet 3d7b533f68 Merge branch 'dev' into adapt 2018-08-09 15:57:36 -07:00
Yann Collet 754942cb79 fixed assert() condition 2018-08-09 15:57:19 -07:00
Yann Collet 2dd76037be zstd cli can increase level when input is too slow 2018-08-09 15:51:30 -07:00
Yann Collet 79a35ac20d minor code comments improvements 2018-08-09 15:16:31 -07:00
Yann Collet 51e71a5ec7 added zstdgrep documentation
presenting `zstdgrep` limit regarding dictionary compression
with workaround recommended by @tobwen (#1268)
2018-08-09 12:28:25 -07:00
George Lu bfe8392e23 Remove ctx from benchMem 2018-08-09 12:07:57 -07:00
George Lu 8278a49cb6 const srcPtrs 2018-08-09 10:42:58 -07:00
George Lu 3d230db853 Change speed representation from floating point to integral 2018-08-09 10:42:58 -07:00
George Lu dd270b2f75 Renaming / Style fixes 2018-08-09 10:42:58 -07:00
George Lu e148db366e Separate capacity vs size
Also:
Make suggested fixes
-varInds_t
-reorder some arguments
-remove code duplication
-update README / -h
-Fix memory leaks
2018-08-09 10:42:58 -07:00
George Lu df026e159f Fix windows implicit casting bugs 2018-08-09 10:42:58 -07:00
George Lu 7b5b3d7ae3 BenchMem with block compressed sizes passed back up 2018-08-09 10:42:58 -07:00
George Lu 3adc217ea4 Total Changes:
Add different constraint types (decompression speed, compression memory, parameter constraints)
Separate search space by strategy + strategy selection
Memoize results
Real random restarts
Support multiple files
Support Dictionary inputs
Debug Macro for extra printing
2018-08-09 10:42:58 -07:00
George Lu eb21b7f482 Not crashing 2018-08-09 10:42:58 -07:00
George Lu 5f49034520 Working V1 2018-08-09 10:42:58 -07:00
George Lu cffb6da339 Parses additional parameters
Additional constraint checking

Minor fixes

more param parsing

Add Memory

Change paramVariation

work on feasibility

reformat bench

Changed Paramgrill to use bench.c benchmarking

customlevel macro

Printing Flag

Minor changes

Explicit casting

Makefile fix

casting, type fix

Printing Flag

Minor Changes

comments, helper fn's
2018-08-09 10:42:58 -07:00
Yann Collet 5808027abf Merge branch 'dev' into fix1241 2018-08-03 16:08:33 -07:00
Yann Collet 2fdab1629b fix unused variable warning 2018-08-03 08:30:01 -07:00
Yann Collet 5203f01774 fix : zstd cli can be built with build macro ZSTD_NOBENCH
which disables bench.c module
2018-08-03 07:54:29 -07:00
cyan4973 3f535007e4 fix %zu support under minGW
and relevant test on Appveyor
2018-07-30 16:56:18 +02:00
George Lu 09ccd977c3 no zero 2018-07-26 15:17:58 -07:00
Yann Collet effa84c8d1
Merge pull request #1230 from terrelln/train-out
zstdcli: Allow -o before --train
2018-07-18 16:34:10 +02:00
Nick Terrell 4e706d7f2c fileio: Error in compression on read errors
We can write a corrupted file if the input file errors during a read.
We should return a non-zero error code in this case.
2018-07-17 15:26:30 -07:00
Nick Terrell 58b8219475 zstdcli: Allow -o before --train
Only set the default value if `outFileName` is unset.

Fixes #1227.
2018-07-16 12:45:34 -07:00
Nick Terrell 45821fac0c
Merge pull request #1225 from jennifermliu/dev
Split samples when building dictionary for COVER
2018-07-13 13:26:15 -07:00
Jennifer Liu 612b346ed5 Add explanation for split=100 2018-07-11 15:50:28 -07:00
Jennifer Liu 5021441d86 Change default splitPoint to 100 2018-07-10 11:19:33 -07:00
Jennifer Liu bfad1af031 Update doc for split==100 2018-07-05 11:05:31 -07:00
Jennifer Liu 0881184c89 Some edits based on pull request comments 2018-07-03 17:53:27 -07:00
Yann Collet 689bfecd48
Merge pull request #1188 from GeorgeLu97/BenchModule
Bench module
2018-07-02 13:33:27 -07:00
Jennifer Liu 8afcb8eea7 Update documentation 2018-07-01 19:59:37 -07:00
Jennifer Liu 84e8b2a305 Fix another declaration issue 2018-06-29 18:02:02 -07:00
Jennifer Liu 348e5f77a9 Add split=# to cli 2018-06-29 17:54:41 -07:00
Yann Collet b5207aadfa make build tests more unforgiving
`-Werror` will ensure they fail if there is the slightest warning.

fix a minor warning specific to `zstd_decompress` variant.
2018-06-29 17:10:56 -07:00
W. Felix Handte 712a9fd972 Allow Invoking `zstd --list` When `stdin` is not a `tty`
Also now returns an error when no inputs are given.

New proposed behavior:

```
felix@odin:~/prog/zstd (list-stdin-check)$ ./zstd -l; echo $?
No files given
1
felix@odin:~/prog/zstd (list-stdin-check)$ ./zstd -l Makefile.zst; echo $?
Frames  Skips  Compressed  Uncompressed  Ratio  Check  Filename
     1      0     3.08 KB      10.92 KB  3.544  XXH64  Makefile.zst
0
felix@odin:~/prog/zstd (list-stdin-check)$ ./zstd -l <Makefile.zst; echo $?
zstd: --list does not support reading from standard input
No files given
1
felix@odin:~/prog/zstd (list-stdin-check)$ ./zstd -l Makefile.zst <Makefile.zst; echo $?
Frames  Skips  Compressed  Uncompressed  Ratio  Check  Filename
     1      0     3.08 KB      10.92 KB  3.544  XXH64  Makefile.zst
0
felix@odin:~/prog/zstd (list-stdin-check)$
```
2018-06-29 15:33:44 -04:00
Yann Collet a2c3a4cd0e
Merge pull request #1214 from jennifermliu/dev
Make --fast=0 fail
2018-06-27 18:53:39 -07:00
Yann Collet 1fd621ff6d minor man page update
regarding advanced parameter `tlen`
which was recently changed.
`0` in association with `ZSTD_fast` now means "normal fast mode".
2018-06-27 18:49:02 -07:00
Jennifer Liu 1ab57a7ce1 Redirect failed test result to INTOVOID and update comment about parsing fast command 2018-06-27 16:27:45 -07:00
Jennifer Liu aef8486fee Make fast=0 fail 2018-06-27 14:27:27 -07:00
cyan4973 f741fb8fcd minor fixes for MSYS2 compilation 2018-06-26 01:22:45 -07:00
George Lu 50d612f4f0 Interleave compression/decompression
Fix Bugs
2018-06-25 15:01:03 -07:00
George Lu d6121ad0e1 Opaque State
And minor fixups (comments/alignment/checks/fix memory leak)
2018-06-25 08:07:43 -07:00
George Lu ab26f24c9c benchFunction Timed Wrappers
Add BMK_benchFunctionTimed
Add BMK_init_customResultCont..
Change benchMem to use benchFunctionTimed
Minor Fixes/Adjustments
2018-06-21 16:23:55 -07:00
George Lu a8eea99ebe Incremental Display + Fn Separations
Seperate syntheticTest and fileTableTest (now renamed as benchFiles)
Add incremental display to benchMem
Change to only iterMode for benchFunction
Make Synthetic test's compressibility configurable from cli (using -P#)
2018-06-21 16:23:18 -07:00
Yann Collet 93702a7a62
Merge pull request #1198 from facebook/msdebug
made Visual Studio compatible with DEBUGLEVEL >= 2
2018-06-20 12:26:31 -07:00
cyan4973 ae0b7ffa0a made Visual Studio compatible with DEBUGLEVEL >= 2 2018-06-20 09:45:02 -07:00
Yann Collet 6768cf53fd
Merge pull request #1190 from terrelln/ldm-adjust
Adjust advanced parameters to source size
2018-06-19 14:40:56 -07:00
Yann Collet c0b6ce95b1
Merge pull request #1179 from supertopher/dev
Improves UX for --list command's lack of support for pipes
2018-06-19 14:36:30 -07:00
Nick Terrell 1d0fcde45d Use debug.h in fileio.c 2018-06-18 15:51:21 -07:00
Nick Terrell 3841dbac84 Adjust advanced parameters to source size
In the new advanced API, adjust the parameters even if they are explicitly
set. This mainly applies to the `windowLog`, and accordingly the `hashLog`
and `chainLog`, when the source size is known.
2018-06-18 15:49:31 -07:00
George Lu a3c8b59990 Fix cli no print
Change looping behavior to match old
2018-06-18 15:38:14 -07:00
George Lu e482e328cd Reorder Arguments
make initFn nullable
2018-06-18 13:21:42 -07:00
George Lu 0d1ee22990 Requested Changes
Add Comment
Simplify Interface (Remove resultSet)
Reorder Arguments
Remove customBench displayLevel
Reorder bench.h
Change benchFiles return type to match advanced
Rename stuff
2018-06-18 12:01:12 -07:00
George Lu 8522346322 Make Fullbench use new function
Rearrange Args
Add nothing function
Use new function, change locals to match
New Display
Comment cleanup
Change builds
2018-06-15 11:37:49 -04:00
George Lu 20f4f32379 Add to bench
-Remove global variables
-Remove gv setting functions
-Add advancedParams struct
-Add defaultAdvancedParams();
-Change return type of bench Files
-Change cli to use new interface
-Changed error returns to own struct value
-Change default compression benchmark to use decompress_generic
-Add CustomBench function
-Add Documentation for new functions
2018-06-14 14:23:24 -04:00
Topher Lubaway 6bca3fb4bf Reduce noise in diff
putting the code block back on the exact line it came from
2018-06-13 14:32:59 -07:00
Topher Lubaway ec24f98cca Removes duplicate IS_CONSOLE from PR
I misunderstood that this function was included already
2018-06-13 13:39:23 -07:00
Yann Collet c986dbf241
Merge pull request #1168 from GeorgeLu97/paramgrillfeatures
Have paramgrill share bench.c benchmarking function
2018-06-13 11:38:29 -04:00
George Lu 01d940b670 Requested changes
-Remove g_displaylevel/setNotificationLevel function
-Add extern "C"
-Remove averaging
-Reorder arguments

More fixes

-Added BMK_return_t (result + possible error)
-Correct comment'
-Nullcheck ctx, dctx when allocated
-Remove extra assert
2018-06-12 17:02:44 -04:00
Topher Lubaway b024e1e1f4 Keep windows specific headers
Accidentially deleted this existing windows only header
2018-06-12 10:16:27 -07:00
Topher Lubaway 88ae51acb3 Multi-OS support for --list detecting stream input
IS_CONSOLE stolen wholesale from Options.cpp
not sure if i should have extracted that code for DRY-ness
tested in OSX and functionality seems appropriate
unstested in a windows environment
2018-06-12 07:59:17 -07:00
Topher Lubaway 881defaeb3 Only check for tty in non-windows environments
unistd.h is for unix standard tools.
There does not appear to be a simple isatty for windows
this we only run the logic and header include in
non-windows environments
2018-06-11 15:26:35 -07:00
Topher Lubaway 5ca1d5c6f4 Properly brackets isatty if statement
¯\_(ツ)_/¯ this is my first commit in c
2018-06-11 12:19:15 -07:00
Topher Lubaway 4c16608e3c Improves UX for --list command's lack of support for pipes
--list does not support piped input
This checks for a terminal and exits 1 with a well formatted
error message if the STDIN is not from a terminal
2018-06-11 10:13:00 -07:00
Ryan Schmidt b567ce9d68 Fix name of macOS 2018-06-09 14:31:17 -05:00
George Lu 0e808d608b Make paramgrill use bench.c benchmarking 2018-06-08 12:01:05 -07:00
Yann Collet d3615c28db
Merge pull request #1159 from GeorgeLu97/suffixlist
Unknown Suffix Error
2018-06-01 14:00:10 -07:00
George Lu 8984cc93d6 update display 2018-05-31 18:04:05 -07:00
George Lu 547096d672 update man 2018-05-31 18:03:52 -07:00
George Lu c9b1068298 removed strcats 2018-05-31 17:47:29 -07:00
George Lu 5ff30fe2e5 Unknown Suffix Error
Changed so only compiled formats are printed in list of supported extensions
2018-05-31 16:13:36 -07:00
George Lu 140f59d38e Added --format=zstd
title
2018-05-31 15:29:35 -07:00
Yann Collet 174bd3d4a7
Merge pull request #1131 from facebook/zstdcli
minor: control numeric argument overflow
2018-05-14 11:53:58 -07:00
Yann Collet 9cd5c63771 cli: control numeric argument overflow
exit on overflow
backported from paramgrill
added associated test case
2018-05-12 14:29:33 -07:00
Yann Collet b824d213cb fix #1115 2018-05-12 10:21:30 -07:00
cyan4973 62487b5e76 fixed decoding bogus lz4 frame
FIO would keep presenting data after an LZ4F decoding error
resulting in a NULL pointer dereference
when associated with older liblz4 version (< v1.8.1.2)
2018-04-23 18:50:16 -07:00
Yann Collet 1da629f2ad
Merge pull request #1104 from terrelln/fast-train
Allow negative compression levels in training
2018-04-09 14:16:20 -07:00
Nick Terrell 569e2abccd Allow negative compression levels in training
* Set `dictCLevel` in `zstdcli.c`.
* Only set to default level if the compression level `== 0`, not `<= 0`.
2018-04-09 12:12:03 -07:00
Björn Ketelaars e5ea8d272a fix typo in programs/zstd.{1,1.md}
s/nodictID/no-dictID/g
2018-04-05 06:44:46 +02:00
Yann Collet 7188862d32
Merge pull request #1086 from hagemt/hagemt-patch-1
Correct small typo in manual (man file and markdown)
2018-03-30 20:45:10 -06:00
Tor E Hagemann c7a5e60bc6
Update zstd.1.md 2018-03-30 15:25:32 -07:00
Tor E Hagemann 292d370ab4
Update zstd.1 2018-03-30 14:53:57 -07:00
Yann Collet 525f3fab33 restored ability to manually set overlapLog 2018-03-28 11:33:41 -06:00
Yann Collet 01082a39bd restored simple status line during zstd compression
the more advanced one, featuring amount of data buffered,
is triggered on `-v`.
2018-03-22 17:49:46 -07:00
Yann Collet 153bc1c004 removed limit ZSTD_TARGETLENGTH_MAX
this makes it possible to specify extremely large negative compression levels,
achieving the side effect as "no compression".

It will also be possible to define larger targetlength for ultra compression mode.

There is no adverse side effect due to removing this limit.
2018-03-21 15:50:05 -07:00
Yann Collet 353117c5d7 implemented ZSTD_DCtx_loadDictionary*()
this required updating ZSTD_createDDict_advanced()
to accept a dictContentType parameter (raw, full, auto).
2018-03-20 13:40:29 -07:00
Yann Collet 4c5cbac179
Merge pull request #1041 from facebook/fasterFast
Negative compression levels
2018-03-13 21:32:46 -07:00
Yann Collet bd7bb94361
Merge pull request #1044 from baldurk/remove-utf8-characters
Remove non-ASCII characters in header file comments
2018-03-13 13:22:07 -07:00
Baldur Karlsson 430a2fec19 Remove non-ASCII characters in header file comments
* Replaced a non-breaking space and an en dash with a plain space and
  a hyphen.
* This means the files are simple ASCII and less likely to run into
  codepage issues.
2018-03-13 20:05:53 +00:00
Jesse Talavera-Greenberg 2f70fbf2a3
Made -H's printout specify the semantics of -T0 2018-03-12 20:43:32 -04:00
Yann Collet a57d43d4d4 updated documentation of targetLength 2018-03-12 11:35:01 -07:00
Yann Collet f24566b597 minor bench improvements
- do not test level 0, as it is converted into level 3,
  which feels strange when compressing multiple levels
- Use direct synchronous mode when a single worker is requested.
2018-03-12 04:02:57 -07:00
Yann Collet 6a9b41b731 create command --fast[=#]
access negative compression levels from command line
for both compression and benchmark modes.

also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.

added relevant cli tests.
2018-03-11 20:01:23 -07:00
Yann Collet a70f7e10fa Merge branch 'benchDecode' into longOffsetMode 2018-03-05 14:09:00 -08:00
Yann Collet 03e7e14192 fix benchmark issue when measuring only decoding speed
zstd bench module can focus on decompression speed _only_.
This is useful when trying to measure performance
on large input data compressed using a high level
as compression time becomes problematic (too long).

This mode is triggered by command : zstd -b -d

Problem was : in such a mode,
measured decoding speed was > 10% slower
than in nominal mode (compression + decompression),
making decompression benchmark mode much less useful.

This patch fixes the issue.
It's not completely clear why, but
moving the `memcpy()` operation sooner in the pipeline fixed it.

I can still measure some difference, but it is in the < 2% range,
so it's much more tolerable.

also : it doesn't matter anymore in which order are selected
commands `-b` and `-d`.
The combination always triggers bench_decodeOnly mode.
2018-03-05 13:57:41 -08:00
Yann Collet 41bd10446e Merge branch 'dev' into longOffsetMode 2018-03-05 13:10:10 -08:00
Yann Collet b91ddf0ae6 Merge branch 'dev' into longOffsetMode 2018-03-05 11:59:54 -08:00
Conrad Meyer 606374269c FIO_addFInfo: Fully initialize output 'total' struct
Silence a Coverity warning about 'windowSize' being uninitialized.
(Yes, nothing that calls this routine actually uses the windowSize
value.  Still, appeasing Coverity is pretty harmless in this case.)
2018-02-28 15:23:05 -08:00
Yann Collet 25d00d10fc fixed minor conversion warning 2018-02-20 16:52:28 -08:00
Yann Collet 3538a535bf use TIMELOOP_NANOSEC
as suggested by @terrelln
2018-02-20 15:33:56 -08:00
Yann Collet d3364aa39e improve benchmark measurement for small inputs
by invoking time() once per batch, instead of once per compression / decompression.
Batch is dynamically resized so that each round lasts approximately 1 second.

Also : increases time accuracy to nanosecond
2018-02-20 14:58:40 -08:00
Yann Collet 5cb1144872 fixed --single-thread
was incorrectly set to -T0 (use as many cores as possible) previously
2018-02-13 14:56:35 -08:00
Yann Collet 04a3f85ce7 fixed gcc warning on a switch code path 2018-02-09 16:16:27 -08:00
Yann Collet 75689838e4 specify new command --single-thread 2018-02-09 15:55:41 -08:00
Yann Collet 4beaeaace5 Merge branch 'dev' into flexibleLevel 2018-02-09 09:15:05 -08:00
Yann Collet 4b525af53a zstdmt: applies new parameters on the fly
when invoked from ZSTD_compress_generic()
2018-02-02 15:58:13 -08:00
Yann Collet 90eca318a7 fileio: create dedicated function to generate zstd frames
like other formats
2018-02-02 14:24:56 -08:00
Yann Collet 549d26ae71
Merge pull request #1005 from systemcrash/dev
Update zstd.1
2018-02-02 10:04:40 -08:00
Yann Collet 6c492af284 fixed minor conversion warning 2018-02-01 20:16:00 -08:00
Yann Collet 209df52ba2 Changed nbThreads for nbWorkers
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
2018-02-01 19:29:30 -08:00
Yann Collet 4b6a94f0cc clarified comments on LDM parameters 2018-02-01 17:07:27 -08:00
Yann Collet 2bfc79ab8d removed bitstream.h dependency 2018-02-01 16:13:04 -08:00
Yann Collet 823a28a1f4
Merge pull request #1000 from facebook/progressiveFlush
Progressive flush
2018-01-30 22:49:47 -08:00
systemcrash d13a75c969
Update zstd.1 2018-01-29 18:38:02 +01:00
Yann Collet 9f8ed23b5b bumped version number to v1.3.4
also added a paragraph on using compression level with training mode
as this is a recurrent question (see for example #1004)
2018-01-27 22:23:26 -08:00
ne-sted 50aea2f293 cli: fix align of defaults 2018-01-24 15:07:22 +02:00
Yann Collet cb5eba8e20 add `zcat` symlink support, suggested by @wtarreau
added some test
also updated relevant doc

+ fixed a mistake in `lz4` symlink support :
  lz4 utility doesn't remove source files by default (like zstd, but unlike gzip).
  The symlink must behave the same.
2018-01-19 11:26:35 -08:00
Yann Collet 70f81d6030 zstdmt uses POOL_tryAdd() to call a new worker
so that it's no longer a blocking call.
This makes it possible to stream out data gradually,
while waiting for a worker to become available.
2018-01-19 10:01:40 -08:00
Yann Collet 4d08ba8b77 fileio: READY_FOR_UPDATE() is now a function-like macro
as suggested by @terrelln
2018-01-18 11:27:13 -08:00
Yann Collet aa79c18e3f fixed a few access contention
passes thread sanitizer test
2018-01-17 17:18:19 -08:00
Yann Collet 394eec697b Introduce ZSTD_getFrameProgression()
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced

Ingested can be larger than consumed due to buffering effect.

For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.

That being said, update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.

This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
2018-01-17 16:39:02 -08:00
Yann Collet 58dd7de640 zstdmt: fixed an endless loop on allocation failure
this happened on 32-bits build when requiring a too large input buffer,
typically on wlog=29, creating jobs of 2 GB size.

also : zstd32 now compiles with multithread support enabled by default
(can be disabled with HAVE_THREAD=0)
2018-01-17 12:10:15 -08:00
Yann Collet 3e1e57db27 fix fileio progression status update
The compression % is no longer correct,
since it's no longer possible to make direct correlation
between nb bytes read and nb bytes written
due to large internal buffer inside CCtx
(exacerbated with --long).

The current "fix" is to no longer display the %.

A more complex solution will have to count exactly how much data has been consumed and compressed internally, within CCtx buffers.
2018-01-16 17:35:00 -08:00
Yann Collet 10c213761a cli: fix for no-MT mode
when cli is compiled without MT support,
invoking ZSTD_p_nonBlockingMode result in an error code.

This patch only sets ZSTD_p_nonBlockingMode when ZSTD_MULTITHREAD is set, meaning there is MT support.

The error code could also be intentionnally ignored (there is no side effect).
2018-01-16 17:28:11 -08:00
Yann Collet 1dba98d563 introduced parameter ZSTD_p_nonBlockingMode
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.

It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.

Applied within fileio.c, this patch provides non-negligible gains during compression.

Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9

With traditional single-thread blocking mode :
real    0m9.557s
user    0m8.861s
sys     0m0.538s

With new single-worker non blocking mode :
real    0m7.938s
user    0m8.049s
sys     0m0.514s

=> 20% faster
2018-01-16 16:15:47 -08:00
Yann Collet 58ecf13e02 zstdmt : can compress at block granularity
offering perspective of more accurate progression report.
2018-01-13 13:18:57 -08:00
Yann Collet 1edf33764e
Merge pull request #974 from terrelln/dstfile
[fileio] Improve safety of output file modifications
2018-01-10 19:02:48 +01:00
Yann Collet 752880ffed
Merge pull request #963 from facebook/benchfix
fix: bench can accept hlog custom parameter
2018-01-06 06:57:02 +01:00
Nick Terrell ed9611dc62 [fileio] Don't call FIO_remove() on stdout or /dev/null 2018-01-05 11:50:24 -08:00
Nick Terrell 282ad05e0a [fileio] Use FIO_remove() everywhere for safety 2018-01-05 11:44:45 -08:00
Nick Terrell fd63140e1c [util] Refuse to set file stat on non-regular file 2018-01-05 11:44:25 -08:00
Pádraig Brady e0596715dc zstd: fix crash when not overwriting existing files
This fixes the following crash:
  $ touch exists
  $ programs/zstd -r examples/ -o exists
  zstd: exists already exists; not overwritten
  Segmentation fault (core dumped)

* programs/fileio.c (FIO_compressMultipleFilenames):
Handle the case where we're not overwriting the destination.

Reported at https://bugzilla.redhat.com/1530049
2018-01-02 15:24:09 +00:00
Yann Collet c707c6e9f2 fix: bench can accept hlog custom parameter
was ignored during initialization
2017-12-27 13:32:05 +01:00
Yann Collet cc9e026866
Merge pull request #952 from terrelln/merge-end
[fileio] Merge end loop for small optimization
2017-12-15 10:27:53 -08:00
Yann Collet 2cff66b62f version bump to v1.3.3 2017-12-14 16:11:20 -08:00
Nick Terrell f48d34edba [fileio] Merge end loop for small optimization 2017-12-14 15:52:24 -08:00
Yann Collet a0ac8c895c
Merge pull request #950 from facebook/srcSizeAdaptation
fix adaptation on srcSize
2017-12-14 14:48:31 -08:00
Yann Collet 2e97a6d464 fixed minor declaration-after-statement warning 2017-12-13 18:50:05 -08:00
Yann Collet 5432ef6921 fixes adaptation on srcSize
This patch restores capability for each file to receive adapted compression parameters depending on its size.

The bug breaking this feature was relatively silly :
setting a parameter with a value "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.

Unfortunately, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameters selection between tableID 0,1,2,3 is largely internal,
leaving no trace to outside world, not even in frame header.
2017-12-13 17:45:26 -08:00
Nick Terrell 4680e85bdf Allow -o with multiple files 2017-12-13 17:44:34 -08:00
Yann Collet 4d0dfafa7b
Merge pull request #949 from terrelln/rrm
[fileio] Refuse to remove non-regular file
2017-12-13 17:36:39 -08:00
Nick Terrell 82bc8fe0cc [fileio] Refuse to remove non-regular file 2017-12-13 13:38:26 -08:00
Nick Terrell b5e7f6c0f3 [fileio] Fix window size MB calculation
Test command:
```
head -c 10000 /dev/zero | ./zstd -c --zstd=wlog=12 | ./zstd -M2048 -t
```
2017-12-13 10:57:01 -08:00
Yann Collet 31293330d0 It's still necessary to check PLATFORM_POSIX_VERSION for clock_gettime()
glibc/uclibc is not enough
2017-12-04 16:31:59 -08:00
Yann Collet 0097469238 removed a few redundant #include 2017-12-04 16:02:42 -08:00
Yann Collet e46194bbf9 fix #911 : changed detection macro for clock_gettime()
The new macro might be a bit too restrictive.
Systems which do not support new test will simply default to <time.h>'s `clock_t clock()`,
suffering lesser benchmark accuracy.
Should it matter, the detection macro will have to be upgraded.
2017-12-04 15:57:01 -08:00
Yann Collet 55faa5492d fileio: fixed LZ4F invocation from assert() 2017-12-04 11:26:59 -08:00
Yann Collet af2fbbcb0d
Merge pull request #939 from facebook/shorterCircleCI
Faster CircleCI tests
2017-12-04 11:22:30 -08:00
Yann Collet 71f012e5bf zstdcli: fixed minor warning when bench module not enabled
one variable defined but not used
2017-12-01 17:42:46 -08:00
Yann Collet a1b24e6262
Merge pull request #938 from terrelln/time
Use util.h for timing
2017-12-01 16:40:38 -08:00
Nick Terrell dab8cfa3c7 Combine definitions of SEC_TO_MICRO 2017-11-30 19:40:53 -08:00
Nick Terrell 9a2f6f477b Use util.h for timing 2017-11-30 14:57:25 -08:00
Yann Collet 2f22a6ec50 Merge branch 'dev' into opt3 2017-11-28 15:03:58 -08:00
Yann Collet 0a0a212934 zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.

Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.

The "proper" formula makes it possible to remove this heuristic.

Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).

It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
2017-11-28 14:07:03 -08:00
W. Felix Handte baff9dd15e Fix LZ4 Compression Buffer Overflow
Fixes issue where, when `zstd --format=lz4` is fed an input larger than 128KB,
the read overruns the input buffer. This changes Zstd to use LZ4 with chained
64KB blocks. This is technically a breaking change in that some third party
LZ4 implementations may not support linked blocks. However, progress should not
be allowed to be stopped by such petty concerns as backwards compatibility!
2017-11-28 12:07:26 -05:00
Yann Collet 743b23878e install: changed variable MANDIR into MAN1DIR
MANDIR still exists, and is now the parent of MAN1DIR
2017-11-27 13:47:35 -08:00
Yann Collet 2fd765498a updated man page
following patch #931 by @scottchiefbaker
2017-11-24 17:20:54 -08:00
Yann Collet c857ee850a minor update 2017-11-24 16:44:28 -08:00
Scott Baker 31a191b178 Include information about the benchmark output/methodology
Addresses #930
2017-11-22 20:34:25 -08:00
Yann Collet daebc7fe26 bench: slightly adjusted display format
adapt accuracy depending on value.
makes it possible to have higher accuracy for small value,
notably small compression speed.
This capability is expected to be useful while modifying optimal parser.
2017-11-18 15:54:32 -08:00
Nick Terrell a6052af0e8 [zstd] Fix rare bug with signal handler 2017-11-17 16:38:56 -08:00
Yann Collet 5b957ba899 minor interface adjustments 2017-11-17 01:21:40 -08:00
Yann Collet d898fb7ba6 bench: added cli command `-S` to benchmark multiple files separately
Currently, all files are joined by default,
they are compressed separately but benchmarked together,
providing a single final result.

Benchmarking files separately make it possible to accurately measure difference for each file.
This is expected to be useful while tuning optimal parser.
2017-11-17 00:22:55 -08:00
Yann Collet 8accfa7fcc bench: realTime is a global parameter
like most parameters not directly related to compression
2017-11-17 00:02:37 -08:00
Yann Collet 9a11f70dc3 merged repcode search into BT match search
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)

It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).

warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
2017-11-13 02:23:48 -08:00
Yann Collet 6f1dfa8adf removed line with `//` comment
this is for a different topic
(better parameter adaptation for small files + dictionary and/or custome parameters)
2017-11-01 17:01:45 -07:00
Yann Collet 428e8b3bf4 fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overriden by srcSize instead.

This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.

It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
2017-11-01 13:15:23 -07:00
Nick Terrell b495140f67 Update BUCK files
* Correct XXH namespace (Fixes #901)
* Multithreading always enabled
* GZIP/LZ4/LZMA always enabled
* Legacy support always fully enabled
2017-10-25 12:47:57 -07:00
Yann Collet 91535d71ec fixed missing zstdmt_compress.h dependency
we lose a warning message :
when a job size is chosen < minimum job size for multithreading,
it is automatically resized to minimum size.

If this information is really useful, it should be present in zstd.h now.
2017-10-19 12:09:34 -07:00
Yann Collet eac42534fe bench: fixed Visual warning regarding struct initialization
also :
removed dependency on zstdmt_compress.h
removed several unused macros
fileio : small code refactoring to reduce some variable scope
2017-10-19 11:56:14 -07:00
Yann Collet d3b9547aa4 IO and bench : ZSTD_NEWAPI is the only remaining code path
removed the other 2 code paths (single thread, and ZSTDMT ones)
keeping only the new advanced API, for easier code coverage.

It shall also fix identified issue with Visual Studio
which doesn't have ZSTD_NEWAPI defined.
2017-10-18 17:01:53 -07:00
Yann Collet 300e1df0a3 fixed wrong test to display compression status 2017-10-18 11:41:52 -07:00
Yann Collet 18b795374a UTIL_getFileSize() returns UTIL_FILESIZE_UNKNOWN on failure
UTIL_getFileSize() used to return zero on failure.
This made it impossible to distinguish a failure from a genuine empty file.
Both cases where coalesced.

Adding UTIL_FILESIZE_UNKNOWN constant has many consequences on user code,
since in many places, the `0` was assumed to mean "error".
This is no longer the case, and the error code must be actively checked.
2017-10-17 16:14:25 -07:00
Yann Collet 32c9f715ae fixed : Visual build compressing stdin with multi-threading enabled fails
It was multiple reasons stacked :
- Visual use a different code path, because ZSTD_NEWAPI is not defined
- fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN`  (fixed)
- ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)
2017-10-17 14:07:43 -07:00
Yann Collet fc8d293460 dictionary compression use correct file size estimation
when determining compression parameters
to compress one file only.

For multiple files, it still "bets" that files are going to be small.

There was also a bug recently added in ZSTD_CCtx_loadDictionary_advanced()
making it incapable to use pledgedSrcSize to determine compression parameters.
2017-10-14 01:21:43 -07:00
Yann Collet 9ef32b3cf1 minor : zstd -l -v display each file name 2017-10-14 00:02:32 -07:00
Yann Collet dd18d73e7e fileio: content size is enabled by default 2017-10-13 16:32:18 -07:00
Nick Terrell 6dd958eea2 [zstdcli] Add window size to verbose list
```
> zstd --list -v file1 file2 file3
*** zstd command line interface 64-bits v1.3.2, by Yann Collet ***
Window Size: 512.00 KB (524288 B)
Compressed Size: 0.02 KB (19 B)
Check: XXH64

Window Size: 8192.00 KB (8388608 B)
Compressed Size: 0.02 KB (19 B)
Check: XXH64

Window Size: 512.00 KB (524288 B)
Compressed Size: 0.01 KB (15 B)
Check: None

```
2017-10-04 12:26:28 -07:00
Yann Collet 3b27ed41fd Merge branch 'srcSize' into dev 2017-10-02 16:34:14 -07:00
Yann Collet 4946993f87 removed isRegularFile parameter
no longer useful : size of src is determined for each file.
2017-10-02 12:29:25 -07:00
Yann Collet 7f580f9ee8 interruption handler and variable are static 2017-10-02 11:39:05 -07:00
Yann Collet fe5444bc66 removed the statement for all versions of Visual Studio 2017-10-02 02:02:16 -07:00
Yann Collet 51d82d5516 same error in Visual Studio 2012 ... 2017-10-02 01:12:40 -07:00
Yann Collet ed7ae4c9bd The issue also impacts Visual Studio 2010 2017-10-02 00:45:28 -07:00
Yann Collet 6e7ba3df2f added (void)sig to avoid compilers complaining that sig is not used. 2017-10-02 00:19:47 -07:00
Yann Collet 82bc200f82 conditionnally removed invocation that generates a buggy warning with Visual Studio 2008 2017-10-02 00:02:24 -07:00
Yann Collet bd18095edc blindfix for Visual : minor casting issue
should not happen since SIGIGN is provided by <signal.h>,
so it should work "ouf of the box"
2017-10-01 15:32:48 -07:00
Yann Collet 00fc1ba8dd cli: add Ctrl-C support, requested by @mike155 in #854
Now, pressing Ctrl-C during compression or decompression
will erase operation artefact (unfinished destination file)
before leaving execution.
2017-10-01 12:10:26 -07:00
Yann Collet e580dc6a4a Merge pull request #860 from felixhandte/zstd-lz4-support-tests
Add Default LZ4 Support When Available
2017-09-29 22:32:54 -07:00
Yann Collet 8afb151c9b cli: fixed wrong initialization in MT mode
It's not good to mix old and new API
ZSTD_resetCStream() doesn't just set pledgedSrcSize :
it also sets the CCtx for a single thread compression.

Problem is, when 2+ threads are defined in cctx->requestedParams,
ZSTD_compress_generic() will want to start MT compression,
since initialization is supposed to have already happened (thanks to ZSTD_resetCStream())
except that the underlying ZSTDMT_CCtx* object is not created,
resulting in a segfault.

This is an invalid construction
(correct one is to use ZSTD_CCtx_setPledgedSrcSize()).
I haven't found a nice way to mitigate this impact if someone makes the same mistake.

At some point, removing the old API to keep only the new API within fileio.c will limit these risks.
2017-09-29 22:14:37 -07:00
Yann Collet fbd5ab7027 minor fix : no longer use fake srcSize during resource creation
srcSize is read and provided at each file, not at resource creation.
This used to be useful with older API, because it could not re-adapt parameters between sessions.

At some point, it will be better to remove the old code, and only keep the new_api.
It works fine by now.
2017-09-29 19:40:27 -07:00
Yann Collet db1668a43b fix : srcSize written in frame header when multiple files compressed
This information used to be disabled when nbFiles>1.
It was badly initialized later in the code, resulting in an error.
2017-09-29 18:05:18 -07:00