updated man page

and added `--adapt` test in `playTests.sh`
dev
Yann Collet 2018-09-19 17:37:22 -07:00
parent 6b07a66aec
commit 45010da074
3 changed files with 60 additions and 50 deletions


@@ -1,5 +1,5 @@
.
.TH "ZSTD" "1" "2018-06-27" "zstd 1.3.5" "User Commands"
.TH "ZSTD" "1" "September 2018" "zstd 1.3.5" "User Commands"
.
.SH "NAME"
\fBzstd\fR \- zstd, zstdmt, unzstd, zstdcat \- Compress or decompress \.zst files
@@ -100,6 +100,10 @@ Display information related to a zstd compressed file, such as size, ratio, and
\fB#\fR compression level [1\-19] (default: 3)
.
.TP
\fB\-\-fast[=#]\fR
switch to ultra\-fast compression levels\. If \fB=#\fR is not present, it defaults to \fB1\fR\. The higher the value, the faster the compression speed, at the cost of some compression ratio\. This setting overwrites compression level if one was set previously\. Similarly, if a compression level is set after \fB\-\-fast\fR, it overrides it\.
.
.TP
\fB\-\-ultra\fR
unlocks high compression levels 20+ (maximum 22), using a lot more memory\. Note that decompression will also require more memory when using these levels\.
.
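As a quick illustration of the `--fast` override behaviour described above, a hedged sketch with a hypothetical file name; the last level-setting option on the command line is the one that takes effect:

    zstd --fast=5 data.bin           # ultra-fast level 5, writes data.bin.zst
    zstd -f --fast=5 -19 data.bin    # -19 comes last, so level 19 is used
    zstd -f -19 --fast=5 data.bin    # --fast=5 comes last, so fast level 5 is used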
@@ -111,16 +115,16 @@ enables long distance matching with \fB#\fR \fBwindowLog\fR, if not \fB#\fR is n
Note: If \fBwindowLog\fR is set to larger than 27, \fB\-\-long=windowLog\fR or \fB\-\-memory=windowSize\fR needs to be passed to the decompressor\.
.
.TP
\fB\-\-fast[=#]\fR
switch to ultra\-fast compression levels\. If \fB=#\fR is not present, it defaults to \fB1\fR\. The higher the value, the faster the compression speed, at the cost of some compression ratio\. This setting overwrites compression level if one was set previously\. Similarly, if a compression level is set after \fB\-\-fast\fR, it overrides it\.
.
.TP
\fB\-T#\fR, \fB\-\-threads=#\fR
Compress using \fB#\fR working threads (default: 1)\. If \fB#\fR is 0, attempt to detect and use the number of physical CPU cores\. In all cases, the nb of threads is capped to ZSTDMT_NBTHREADS_MAX==200\. This modifier does nothing if \fBzstd\fR is compiled without multithread support\.
.
.TP
\fB\-\-single\-thread\fR
Does not spawn a thread for compression, use caller thread instead\. This is the only available mode when multithread support is disabled\. In this mode, compression is serialized with I/O\. (This is different from \fB\-T1\fR, which spawns 1 compression thread in parallel of I/O)\. Single\-thread mode also features lower memory usage\.
Does not spawn a thread for compression; uses a single thread for both I/O and compression\. In this mode, compression is serialized with I/O, which is slightly slower\. (This is different from \fB\-T1\fR, which spawns 1 compression thread in parallel with I/O)\. This mode is the only one available when multithread support is disabled\. Single\-thread mode features lower memory usage\. The final compressed result is slightly different from that of \fB\-T1\fR\.
.
.TP
\fB\-\-adapt\fR
\fBzstd\fR will dynamically adapt compression level to perceived I/O conditions\. Compression level adaptation can be observed live by using command \fB\-v\fR\. The feature works when combined with multi\-threading and \fB\-\-long\fR mode\. It does not work with \fB\-\-single\-thread\fR\. It sets window size to 8 MB by default (can be changed manually, see \fBwlog\fR)\. Due to the chaotic nature of dynamic adaptation, compressed result is not reproducible\. \fInote\fR : at the time of this writing, \fB\-\-adapt\fR can remain stuck at low speed when combined with multiple worker threads (>=2)\.
.
.TP
\fB\-D file\fR
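To make the \-\-adapt entry above concrete, a minimal sketch with a hypothetical file name: `-T0` requests one worker per detected core, which is the multi-threaded setting `--adapt` is meant to combine with, and `-v` shows the level changing live; per the text above, the mode is not available together with `--single-thread`.

    zstd -T0 --adapt -v big.log    # level rises or falls with perceived I/O speed; -v shows it live
    # per the entry above, the compressed output is not bit-reproducible across runs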
@@ -194,7 +198,7 @@ All arguments after \fB\-\-\fR are treated as files
Use FILEs as training set to create a dictionary\. The training set should contain a lot of small files (> 100), and weight typically 100x the target dictionary size (for example, 10 MB for a 100 KB dictionary)\.
.
.IP
Supports multithreading if \fBzstd\fR is compiled with threading support\. Additional parameters can be specified with \fB\-\-train\-fastcover\fR\. The legacy dictionary builder can be accessed with \fB\-\-train\-legacy\fR\. The cover dictionary builder can be accessed with \fB\-\-train\-cover\fR\. Equivalent to \fB\-\-train\-fastCover=d=8,steps=4\fR\.
Supports multithreading if \fBzstd\fR is compiled with threading support\. Additional parameters can be specified with \fB\-\-train\-fastcover\fR\. The legacy dictionary builder can be accessed with \fB\-\-train\-legacy\fR\. The cover dictionary builder can be accessed with \fB\-\-train\-cover\fR\. Equivalent to \fB\-\-train\-fastcover=d=8,steps=4\fR\.
.
.TP
\fB\-o file\fR
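A hedged sketch of the training workflow just described, with hypothetical paths: train on a set of small samples, then supply the resulting dictionary to both compression and decompression with `-D`:

    zstd --train samples/*.json -o mydict          # default trainer, equivalent to --train-fastcover=d=8,steps=4
    zstd -f -D mydict samples/0001.json            # compress one sample using the dictionary
    zstd -d -f -D mydict samples/0001.json.zst     # decompression needs the same dictionary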
@@ -218,11 +222,10 @@ A dictionary ID is a locally unique ID that a decoder can use to verify it is us
.
.TP
\fB\-\-train\-cover[=k#,d=#,steps=#,split=#]\fR
Select parameters for the default dictionary builder algorithm named cover\. If \fId\fR is not specified, then it tries \fId\fR = 6 and \fId\fR = 8\. If \fIk\fR is not specified, then it tries \fIsteps\fR values in the range [50, 2000]\. If \fIsteps\fR is not specified, then the default value of 40 is used\. If \fIsplit\fR is not specified or \fIsplit\fR <= 0, then the default value of 100 is used\. If \fIsplit\fR is 100, all input samples are used for both training and testing
to find optimal _d_ and _k_ to build dictionary.Requires that \fId\fR <= \fIk\fR\.
Select parameters for the default dictionary builder algorithm named cover\. If \fId\fR is not specified, then it tries \fId\fR = 6 and \fId\fR = 8\. If \fIk\fR is not specified, then it tries \fIsteps\fR values in the range [50, 2000]\. If \fIsteps\fR is not specified, then the default value of 40 is used\. If \fIsplit\fR is not specified or split <= 0, then the default value of 100 is used\. Requires that \fId\fR <= \fIk\fR\.
.
.IP
Selects segments of size \fIk\fR with highest score to put in the dictionary\. The score of a segment is computed by the sum of the frequencies of all the subsegments of size \fId\fR\. Generally \fId\fR should be in the range [6, 8], occasionally up to 16, but the algorithm will run faster with d <= \fI8\fR\. Good values for \fIk\fR vary widely based on the input data, but a safe range is [2 * \fId\fR, 2000]\. Supports multithreading if \fBzstd\fR is compiled with threading support\.
Selects segments of size \fIk\fR with highest score to put in the dictionary\. The score of a segment is computed by the sum of the frequencies of all the subsegments of size \fId\fR\. Generally \fId\fR should be in the range [6, 8], occasionally up to 16, but the algorithm will run faster with d <= \fI8\fR\. Good values for \fIk\fR vary widely based on the input data, but a safe range is [2 * \fId\fR, 2000]\. If \fIsplit\fR is 100, all input samples are used for both training and testing to find optimal \fId\fR and \fIk\fR to build dictionary\. Supports multithreading if \fBzstd\fR is compiled with threading support\.
.
.IP
Examples:
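The \fIsplit\fR parameter above can be illustrated with a hedged example (sample paths hypothetical): `split=80` trains on 80% of the samples and scores candidate `d`/`k` pairs on the remaining 20%, while `split=100` reuses every sample for both training and testing:

    zstd --train-cover=k=50,d=8,split=80 samples/*.c -o dict80
    zstd --train-cover=split=100 samples/*.c -o dict100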
@@ -239,15 +242,15 @@ Examples:
.IP
\fBzstd \-\-train\-cover=k=50 FILEs\fR
.
.IP
\fBzstd \-\-train\-cover=k=50,split=60 FILEs\fR
.
.TP
\fB\-\-train\-fastcover[=k#,d=#,f=#,steps=#,split=#,accel=#]\fR
Same as cover but with extra parameters \fIf\fR and \fIaccel\fR and different default value of split
Same as cover but with extra parameters \fIf\fR and \fIaccel\fR, and a different default value of \fIsplit\fR\. If \fIsplit\fR is not specified, then it tries \fIsplit\fR = 75\. If \fIf\fR is not specified, then it tries \fIf\fR = 20\. Requires that 0 < \fIf\fR < 32\. If \fIaccel\fR is not specified, then it tries \fIaccel\fR = 1\. Requires that 0 < \fIaccel\fR <= 10\. Requires that \fId\fR = 6 or \fId\fR = 8\.
.
.IP
If \fIsplit\fR is not specified, then it tries \fIsplit\fR = 75. If \fIf\fR is not specified, then it tries \fIf\fR = 20. Requires that 0 < \fIf\fR < 32. If \fIaccel\fR is not specified, then it tries \fIaccel\fR = 1. Requires that 0 < \fIaccel\fR <= 10. Requires that \fId\fR = 6 or \fId\fR = 8.
.
.IP
\fIf\fR is log of size of array that keeps track of frequency of subsegments of size \fId\fR. The subsegment is hashed to an index in the range [0,2^\fIf\fR - 1]. It is possible that 2 different subsegments are hashed to the same index, and they are considered as the same subsegment when computing frequency. Using a higher \fIf\fR reduces collision but takes longer.
\fIf\fR is log of size of array that keeps track of frequency of subsegments of size \fId\fR\. The subsegment is hashed to an index in the range [0,2^\fIf\fR \- 1]\. It is possible that 2 different subsegments are hashed to the same index, and they are considered as the same subsegment when computing frequency\. Using a higher \fIf\fR reduces collision but takes longer\.
.
.IP
Examples:
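Building on the \fIf\fR and \fIaccel\fR descriptions above, a small sketch with hypothetical paths: a larger `f` enlarges the frequency table and so reduces hash collisions, while a larger `accel` speeds training up:

    zstd --train-fastcover=f=24,accel=2 samples/*.c -o fastDict
    zstd --train-fastcover=k=48,d=8,f=20,steps=40 samples/*.c -o fastDict2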


@@ -137,10 +137,12 @@ the last one takes effect.
* `--adapt` :
`zstd` will dynamically adapt compression level to perceived I/O conditions.
Compression level adaptation can be observed live by using command `-v`.
Works with multi-threading and `--long` mode.
When not specified otherwise, uses a window size of 8 MB.
The feature works when combined with multi-threading and `--long` mode.
It does not work with `--single-thread`.
It sets window size to 8 MB by default (can be changed manually, see `wlog`).
Due to the chaotic nature of dynamic adaptation, compressed result is not reproducible.
Adaptation does not work with `--single-thread`.
_note_ : at the time of this writing, `--adapt` can remain stuck at low speed
when combined with multiple worker threads (>=2).
* `-D file`:
use `file` as Dictionary to compress or decompress FILE(s)
* `--no-dictID`:
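To ground the `wlog` remark above, a hedged example with a hypothetical file name: the 8 MB default window of `--adapt` can be changed through the advanced `--zstd=wlog=#` syntax, or the mode can be combined with `--long` as the text allows:

    zstd -T0 --adapt --zstd=wlog=24 huge.tar       # raise the window to 16 MB (2^24 bytes)
    zstd -f -T0 --adapt --long -v huge.tar         # adaptive level plus long-distance matching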


@@ -417,38 +417,6 @@ test -f dictionary
rm tmp* dictionary
$ECHO "\n===> cover dictionary builder : advanced options "
TESTFILE=../programs/zstdcli.c
./datagen > tmpDict
$ECHO "- Create first dictionary"
$ZSTD --train-cover=k=46,d=8,split=80 *.c ../programs/*.c -o tmpDict
cp $TESTFILE tmp
$ZSTD -f tmp -D tmpDict
$ZSTD -d tmp.zst -D tmpDict -fo result
$DIFF $TESTFILE result
$ECHO "- Create second (different) dictionary"
$ZSTD --train-cover=k=56,d=8 *.c ../programs/*.c ../programs/*.h -o tmpDictC
$ZSTD -d tmp.zst -D tmpDictC -fo result && die "wrong dictionary not detected!"
$ECHO "- Create dictionary with short dictID"
$ZSTD --train-cover=k=46,d=8,split=80 *.c ../programs/*.c --dictID=1 -o tmpDict1
cmp tmpDict tmpDict1 && die "dictionaries should have different ID !"
$ECHO "- Create dictionary with size limit"
$ZSTD --train-cover=steps=8 *.c ../programs/*.c -o tmpDict2 --maxdict=4K
$ECHO "- Compare size of dictionary from 90% training samples with 80% training samples"
$ZSTD --train-cover=split=90 -r *.c ../programs/*.c
$ZSTD --train-cover=split=80 -r *.c ../programs/*.c
$ECHO "- Create dictionary using all samples for both training and testing"
$ZSTD --train-cover=split=100 -r *.c ../programs/*.c
$ECHO "- Test -o before --train-cover"
rm -f tmpDict dictionary
$ZSTD -o tmpDict --train-cover *.c ../programs/*.c
test -f tmpDict
$ZSTD --train-cover *.c ../programs/*.c
test -f dictionary
rm tmp* dictionary
$ECHO "\n===> fastCover dictionary builder : advanced options "
TESTFILE=../programs/zstdcli.c
@@ -890,6 +858,10 @@ roundTripTest -g700M -P50 "1 --single-thread --long=29"
roundTripTest -g600M -P50 "1 --single-thread --long --zstd=wlog=29,clog=28"
$ECHO "\n===> adaptive mode "
roundTripTest -g270000000 " --adapt"
if [ -n "$hasMT" ]
then
$ECHO "\n===> zstdmt long round-trip tests "
@@ -902,4 +874,37 @@ else
$ECHO "\n**** no multithreading, skipping zstdmt tests **** "
fi
$ECHO "\n===> cover dictionary builder : advanced options "
TESTFILE=../programs/zstdcli.c
./datagen > tmpDict
$ECHO "- Create first dictionary"
$ZSTD --train-cover=k=46,d=8,split=80 *.c ../programs/*.c -o tmpDict
cp $TESTFILE tmp
$ZSTD -f tmp -D tmpDict
$ZSTD -d tmp.zst -D tmpDict -fo result
$DIFF $TESTFILE result
$ECHO "- Create second (different) dictionary"
$ZSTD --train-cover=k=56,d=8 *.c ../programs/*.c ../programs/*.h -o tmpDictC
$ZSTD -d tmp.zst -D tmpDictC -fo result && die "wrong dictionary not detected!"
$ECHO "- Create dictionary with short dictID"
$ZSTD --train-cover=k=46,d=8,split=80 *.c ../programs/*.c --dictID=1 -o tmpDict1
cmp tmpDict tmpDict1 && die "dictionaries should have different ID !"
$ECHO "- Create dictionary with size limit"
$ZSTD --train-cover=steps=8 *.c ../programs/*.c -o tmpDict2 --maxdict=4K
$ECHO "- Compare size of dictionary from 90% training samples with 80% training samples"
$ZSTD --train-cover=split=90 -r *.c ../programs/*.c
$ZSTD --train-cover=split=80 -r *.c ../programs/*.c
$ECHO "- Create dictionary using all samples for both training and testing"
$ZSTD --train-cover=split=100 -r *.c ../programs/*.c
$ECHO "- Test -o before --train-cover"
rm -f tmpDict dictionary
$ZSTD -o tmpDict --train-cover *.c ../programs/*.c
test -f tmpDict
$ZSTD --train-cover *.c ../programs/*.c
test -f dictionary
rm tmp* dictionary
rm tmp*
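For reference, a hedged sketch of what the new `roundTripTest -g270000000 " --adapt"` line amounts to when written out by hand, reusing the `$ZSTD`, `$DIFF` and `./datagen` helpers the surrounding script already relies on (temporary file names hypothetical):

    ./datagen -g270000000 > tmpAdaptIn                  # ~270 MB of synthetic compressible data
    $ZSTD -f --adapt -v tmpAdaptIn -o tmpAdaptIn.zst    # watch the level adapt during compression
    $ZSTD -d -f tmpAdaptIn.zst -o tmpAdaptOut
    $DIFF tmpAdaptIn tmpAdaptOut                        # round trip must be lossless
    rm tmpAdapt*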