was broken by #1505.
I'm surprised it passed CI tests.
LZ4 tests are part of the "Extended" tests on Travis CI,
which are run on "master" and in "cron" jobs.
Since the latest cron job did not fail
(notably this one: https://travis-ci.org/facebook/zstd/jobs/484365040),
it suggests cron jobs are no longer using the `dev` branch.
To be investigated.
fseek() doesn't indicate when it moves past the end of a file.
Consequently, if a file is truncated within its last block, the truncation wouldn't be detected.
This PR adds a test scenario that induces this situation, using a small compressed file consisting of a single block.
The test is added to tests/playTests.sh.
The check is implemented by ensuring that the file handle's position equals the file size upon exit.
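A minimal sketch of that check, assuming the file size was obtained beforehand (names are illustrative, not the actual fileio.c symbols):
```
#include <stdio.h>

/* After decoding, the read position must match the file size;
 * otherwise the source was truncated within its last block. */
static int FIO_isFullyConsumed_sketch(FILE* srcFile, long srcFileSize)
{
    long const position = ftell(srcFile);   /* current read offset */
    if (position < 0) return 0;             /* ftell() failed */
    return position == srcFileSize;         /* equal => nothing missing at the tail */
}
```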
due to poor support of inode identifiers.
On Visual Studio, the option is limited to comparing file names,
which is imperfect, but far better than disabling the feature entirely.
It's enough to pass the associated tests.
as suggested in #1441.
generally U32 and unsigned are the same thing,
except when they are not ...
case: 32-bit compilation for MIPS (uint32_t == unsigned long).
The vast majority of the transformations consist in changing U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).
Among the issues this patch solves:
- some parameters were declared with type `unsigned` in *.h,
but with type `U32` in their implementation *.c;
- some parameters have type unsigned*,
but the caller uses a pointer to U32 instead.
These fixes are useful.
However, the bulk of the changes is about %u formatting,
which requires an unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.
As a consequence, the amount of code changed is larger than I would expect for such a patch.
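As a tiny illustration of the %u issue (made-up variable, on a hypothetical platform where U32 maps to unsigned long):
```
#include <stdio.h>
#include <stdint.h>

typedef uint32_t U32;   /* on some 32-bit MIPS toolchains, uint32_t is unsigned long */

int main(void)
{
    U32 const nbSeeds = 42;                 /* made-up value */
    /* printf("%u\n", nbSeeds);                mismatched when U32 != unsigned */
    printf("%u\n", (unsigned)nbSeeds);      /* explicit cast is always correct */
    return 0;
}
```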
Testing is also a pain:
it requires manually modifying `mem.h`
in order to lie about `U32`,
typically forcing it to be an `unsigned long`.
On a 64-bit system, this breaks the equivalence unsigned == U32.
Unfortunately, it also breaks a few static_assert() checks controlling structure sizes,
so it additionally requires modifying `debug.h` to make `static_assert()` a no-op,
and then reverting all these changes afterwards.
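For reference, the temporary modification looks roughly like this (an illustrative sketch, not the exact upstream lines; the assertion macro name may differ in `debug.h`):
```
/* mem.h : deliberately lie about U32 on a 64-bit platform,
 * so that U32 and unsigned stop being interchangeable. */
typedef unsigned long U32;        /* instead of: typedef uint32_t U32; */

/* debug.h : neutralize the static assertion macro, since the oversized U32
 * breaks a few structure-size checks that would otherwise stop compilation. */
#undef  DEBUG_STATIC_ASSERT
#define DEBUG_STATIC_ASSERT(c)    /* temporarily a no-op for this manual test */
```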
So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.
I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another coding restriction, adding more friction during merge tests,
since most platforms don't need this distinction (hence contributors won't notice it),
and while this can matter in theory, the number of impacted platforms seems minimal.
Thoughts?
Compare the input and output files by their inode number and
refuse to open the output file if it is the same as the input file.
This doesn't work when (de)compressing multiple files to a single
file, but that is a very uncommon use case, mostly used for
benchmarking by me.
Fixes #1422.
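A minimal sketch of the idea, assuming a POSIX stat() is available (illustrative helper, not the exact util/fileio implementation):
```
#include <sys/stat.h>

/* Returns 1 when srcName and dstName designate the same physical file,
 * i.e. same device and same inode. */
static int UTIL_isSameFile_sketch(const char* srcName, const char* dstName)
{
    struct stat srcStat, dstStat;
    if (stat(srcName, &srcStat) != 0) return 0;   /* source unreadable: let open() report it */
    if (stat(dstName, &dstStat) != 0) return 0;   /* destination missing: nothing to collide with */
    return (srcStat.st_dev == dstStat.st_dev)
        && (srcStat.st_ino == dstStat.st_ino);
}
```
As noted in the earlier entry about Visual Studio, inode identifiers are unreliable there, so that platform falls back to comparing file names instead.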
The `--no-progress` flag disables zstd's progress bars, but leaves
the summary.
I've added simple tests to `playTests.sh` to make sure the parsing
works.
from overlapSizeLog.
Reasoning:
`overlapLog` is already used everywhere: in the code, on the command line, and in the documentation.
`ZSTD_c_overlapSizeLog` feels unnecessarily different.
ZSTD_compress_generic() is renamed ZSTD_compressStream2().
Note that, for the time being,
the "stable" API and advanced one use different parameter planes :
setting parameters using the advanced API does not influence ZSTD_compressStream()
and using ZSTD_initCStream() does not influence parameters for ZSTD_compressStream2().
answering #1407.
Also : removed obsolete function ZSTD_setDStreamParameter()
which could only be used with one parameter (DStream_p_maxWindowSize).
Now replaced by ZSTD_DCtx_setWindowSize() (which exists since a few revisions)
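For illustration, a typical call sequence with the advanced API now looks roughly like this (a minimal sketch using the current parameter enum names; error checks and output-buffer management omitted):
```
#include <zstd.h>

/* Parameters are attached to the CCtx, then ZSTD_compressStream2() drives
 * the compression. ZSTD_compressStream() / ZSTD_initCStream() use their own,
 * separate parameter plane and would not see these settings. */
static void compressAll_sketch(ZSTD_CCtx* cctx, ZSTD_outBuffer* out, ZSTD_inBuffer* in)
{
    size_t remaining;
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1);
    while (in->pos < in->size)
        ZSTD_compressStream2(cctx, out, in, ZSTD_e_continue);  /* assumes `out` is large enough */
    do {
        remaining = ZSTD_compressStream2(cctx, out, in, ZSTD_e_end);
    } while (remaining != 0);   /* 0 means the frame is fully written */
}
```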
gcc-8 on Linux doesn't like this usage of strncat:
`warning: ‘strncat’ output truncated before terminating nul copying as many bytes from a string as its length`.
Not sure what was wrong; it might be a false positive,
but the logic is simple enough to be replaced by a plain `memcpy()`,
thus avoiding the shenanigans of null-terminated strings.
following #1356,
only enable backtrace compilation on Linux + glibc.
Also, disable backtrace by default in the "release" compilation,
so that fewer platforms are impacted by the new requirements.
Can be manually enabled/disabled using BACKTRACE=1/0.
Note: for some reason,
the scan-build version on my laptop found problems within fastcover.c
that scan-build on Travis CI does not flag.
They are, as usual, false positives:
the analyzer does not understand that a table (`offset`) is correctly filled before usage.
from EXIT_IF() to RETURN_IF()
EXIT could be misunderstood as exit(), which terminates program execution.
But the macro only leaves the function, not the program.
For OSX and Linux, add a signal handler to SIGABRT, SIGFPE, SIGILL,
SIGSEGV, and SIGBUS. When the program terminates unexpectedly the
handler will print the current stack to the terminal to help determine
the location of the failure.
On OSX the output will look like:
```
Stack trace:
4 zstd 0x000000010927ed96 main + 16886
5 libdyld.dylib 0x00007fff767d1015 start + 1
6 ??? 0x0000000000000001 0x0 + 1
```
On Linux the output will look like:
```
Stack trace:
./zstd() [0x4b8e1b]
./zstd() [0x4b928a]
./zstd() [0x403dc2]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f5e0fbb0445]
./zstd() [0x405754]
```
As is, the code does not function on WIN32.
See also: https://oroboro.com/stack-trace-on-crash/
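A simplified sketch of the mechanism (the actual handler is more elaborate; assumes glibc or OSX, where <execinfo.h> is available):
```
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd() : glibc / OSX only */
#include <signal.h>
#include <unistd.h>

static void onCrash(int sig)
{
    void* frames[64];
    int const nbFrames = backtrace(frames, 64);
    write(STDERR_FILENO, "Stack trace:\n", 13);
    backtrace_symbols_fd(frames, nbFrames, STDERR_FILENO);  /* no malloc: usable in a handler */
    signal(sig, SIG_DFL);   /* restore the default handler, then re-raise */
    raise(sig);
}

static void installCrashHandlers(void)
{
    signal(SIGABRT, onCrash);
    signal(SIGFPE,  onCrash);
    signal(SIGILL,  onCrash);
    signal(SIGSEGV, onCrash);
    signal(SIGBUS,  onCrash);
}
```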
Per warnings from flawfinder: "Does not check for buffer overflows when
copying to destination [MS-banned] (CWE-120). Consider using snprintf,
strcpy_s, or strlcpy (warning: strncpy easily misused).".
Replaced calls to strcpy and strcat in `fileio.c` with calls taking a
specified size (`strncpy` and `strncat`).
Tested the changes on OSX, Linux, Windows.
On OSX + Linux, changes were tested with ASAN. The following flags were
used: 'check_initialization_order=1:strict_init_order=1:detect_odr_violation=1:detect_stack_use_after_return=1'
To reproduce warning:
./flawfinder.py ./programs/fileio.c
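As a generic illustration of the pattern (not the exact fileio.c call sites; the helper name and buffer size are made up), strncpy needs an explicit terminator because it does not add one when the source fills the buffer:
```
#include <string.h>

#define DST_SIZE 256   /* made-up buffer size */

static void buildName_sketch(char dst[DST_SIZE], const char* srcName, const char* suffix)
{
    strncpy(dst, srcName, DST_SIZE - 1);
    dst[DST_SIZE - 1] = '\0';                          /* strncpy may not null-terminate */
    strncat(dst, suffix, DST_SIZE - strlen(dst) - 1);  /* leave room for the final '\0' */
}
```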
tells in a non-blocking way if there is something ready to flush right now.
only works with multi-threading for the time being.
Useful to know if flush speed will be limited by lack of production.
In the new advanced API, adjust the parameters even if they are explicitly
set. This mainly applies to the `windowLog`, and accordingly the `hashLog`
and `chainLog`, when the source size is known.
IS_CONSOLE stolen wholesale from Options.cpp;
not sure if I should have extracted that code for DRY-ness.
Tested on OSX and the functionality seems appropriate;
untested in a Windows environment.
unistd.h is for Unix standard tools.
There does not appear to be a simple isatty for Windows,
thus we only run the logic and include the header in
non-Windows environments.
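A sketch of the resulting guard (the macro shape is illustrative):
```
/* Only include unistd.h and rely on isatty() outside of Windows,
 * as described above; Windows builds simply skip the console detection. */
#ifndef _WIN32
#  include <stdio.h>
#  include <unistd.h>
#  define IS_CONSOLE(stdStream) isatty(fileno(stdStream))
#else
#  define IS_CONSOLE(stdStream) 0
#endif
```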
FIO would keep presenting data after an LZ4F decoding error,
resulting in a NULL pointer dereference
when associated with older liblz4 versions (< v1.8.1.2).
access negative compression levels from the command line,
for both compression and benchmark modes.
Also: ensure proper propagation of parameters
through the ZSTD_compress_generic() interface.
Added relevant CLI tests.
Silence a Coverity warning about 'windowSize' being uninitialized.
(Yes, nothing that calls this routine actually uses the windowSize
value. Still, appeasing Coverity is pretty harmless in this case.)
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one worker thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
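For illustration, the mode is now selected through a single parameter (shown with the current enum name ZSTD_c_nbWorkers; error checks omitted):
```
#include <zstd.h>

/* nbWorkers == 0 : compression runs synchronously in the calling thread.
 * nbWorkers == 1 : one worker thread compresses while the caller is free (e.g. for I/O).
 * nbWorkers >= 2 : parallel compression, like the former nbThreads >= 2. */
static void selectThreadingMode(ZSTD_CCtx* cctx, int nbWorkers)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nbWorkers);
}
```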
Produces 3 statistics for ongoing frame compression:
- ingested
- consumed (effectively compressed)
- produced
Ingested can be larger than consumed, due to buffering effects.
For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.
That being said, the update is not "smooth",
because with a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.
This could be improved with more granular flushing,
i.e. starting to flush before the ongoing job is fully completed.
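These counters are what ZSTD_getFrameProgression() exposes today; a minimal sketch of building the displayed ratio from them, relating produced bytes to consumed bytes as described above (field names per current zstd.h, which places this function in the static-linking-only section):
```
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_getFrameProgression() is experimental API */
#include <zstd.h>
#include <stdio.h>

/* Progress display sketch: the ratio relates produced bytes to consumed bytes
 * (bytes effectively compressed), not to bytes merely ingested into buffers. */
static void printProgress_sketch(const ZSTD_CCtx* cctx)
{
    ZSTD_frameProgression const fp = ZSTD_getFrameProgression(cctx);
    if (fp.consumed > 0)
        fprintf(stderr, "\rRead : %u MB ==> %.2f%% ",
                (unsigned)(fp.ingested >> 20),
                (double)fp.produced / (double)fp.consumed * 100.0);
}
```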
The compression % is no longer correct,
since it's no longer possible to make a direct correlation
between the number of bytes read and the number of bytes written,
due to large internal buffers inside the CCtx
(exacerbated with --long).
The current "fix" is to no longer display the %.
A more complex solution will have to count exactly how much data has been consumed and compressed internally, within CCtx buffers.
when the CLI is compiled without MT support,
invoking ZSTD_p_nonBlockingMode results in an error code.
This patch only sets ZSTD_p_nonBlockingMode when ZSTD_MULTITHREAD is set, meaning there is MT support.
The error code could also be intentionally ignored (there is no side effect).
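A sketch of the guard, assuming that era's experimental setter and parameter name (ZSTD_p_nonBlockingMode has since been superseded by the nbWorkers >= 1 behavior described in an earlier entry):
```
#include <zstd.h>

/* Only request non-blocking mode when multi-threading support is compiled in;
 * without ZSTD_MULTITHREAD, the setter would simply return an error code,
 * which could also be deliberately ignored, as noted above. */
static void requestNonBlockingMode_sketch(ZSTD_CCtx* cctx)
{
#ifdef ZSTD_MULTITHREAD
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_nonBlockingMode, 1);
#else
    (void)cctx;
#endif
}
```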
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set,
which is non-blocking.
It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.
Applied within fileio.c, this patch provides non-negligible gains during compression.
Tested on my laptop, with enwik9 (1000000000 bytes): time zstd -f enwik9
With the traditional single-thread blocking mode:
real 0m9.557s
user 0m8.861s
sys 0m0.538s
With the new single-worker non-blocking mode:
real 0m7.938s
user 0m8.049s
sys 0m0.514s
=> 20% faster
This fixes the following crash:
```
$ touch exists
$ programs/zstd -r examples/ -o exists
zstd: exists already exists; not overwritten
Segmentation fault (core dumped)
```
* programs/fileio.c (FIO_compressMultipleFilenames):
Handle the case where we're not overwriting the destination.
Reported at https://bugzilla.redhat.com/1530049
This patch restores the capability for each file to receive adapted compression parameters depending on its size.
The bug breaking this feature was relatively silly:
setting a parameter with a value of "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.
Regrettably, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameter selection between tableIDs 0, 1, 2 and 3 is largely internal,
leaving no trace to the outside world, not even in the frame header.
Fixes an issue where, when `zstd --format=lz4` is fed an input larger than 128KB,
the read overruns the input buffer. This changes Zstd to use LZ4 with chained
64KB blocks. This is technically a breaking change in that some third party
LZ4 implementations may not support linked blocks. However, progress should not
be allowed to be stopped by such petty concerns as backwards compatibility!
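The relevant LZ4 frame settings look roughly like this (LZ4F API; the helper name is made up and the actual fileio.c integration is not shown):
```
#include <string.h>
#include <lz4frame.h>

/* Request 64KB blocks that are linked together, so that matches may reference
 * data from the previous block and inputs larger than one block decode correctly. */
static LZ4F_preferences_t lz4ChainedPrefs_sketch(void)
{
    LZ4F_preferences_t prefs;
    memset(&prefs, 0, sizeof(prefs));
    prefs.frameInfo.blockSizeID = LZ4F_max64KB;
    prefs.frameInfo.blockMode   = LZ4F_blockLinked;
    return prefs;
}
```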
we lose a warning message:
when a job size is chosen below the minimum job size for multithreading,
it is automatically resized to the minimum.
If this information is really useful, it should be present in zstd.h now.
removed the other 2 code paths (single-thread and ZSTDMT ones),
keeping only the new advanced API, for easier code coverage.
It should also fix the identified issue with Visual Studio,
which doesn't have ZSTD_NEWAPI defined.
UTIL_getFileSize() used to return zero on failure.
This made it impossible to distinguish a failure from a genuine empty file.
Both cases were coalesced.
Adding the UTIL_FILESIZE_UNKNOWN constant has many consequences on user code,
since in many places, the `0` was assumed to mean "error".
This is no longer the case, and the error code must be actively checked.
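A sketch of the updated calling convention (illustrative caller; assumes the U64 return type and the names from programs/util.h mentioned above):
```
#include "util.h"   /* UTIL_getFileSize(), UTIL_FILESIZE_UNKNOWN */

/* Old pattern: `if (fileSize == 0)` conflated errors with genuinely empty files.
 * New pattern: failure is reported as UTIL_FILESIZE_UNKNOWN and must be checked explicitly. */
static unsigned long long checkedFileSize_sketch(const char* fileName)
{
    U64 const fileSize = UTIL_getFileSize(fileName);
    if (fileSize == UTIL_FILESIZE_UNKNOWN) {
        /* size could not be determined: not necessarily an empty file */
        return 0;   /* illustrative fallback; real call sites handle the error */
    }
    return fileSize;
}
```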
There were multiple reasons stacked:
- Visual uses a different code path, because ZSTD_NEWAPI is not defined
- fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN` (fixed)
- ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)
when determining compression parameters
to compress one file only.
For multiple files, it still "bets" that files are going to be small.
There was also a bug recently introduced in ZSTD_CCtx_loadDictionary_advanced(),
making it unable to use pledgedSrcSize to determine compression parameters.
It's not good to mix the old and new APIs.
ZSTD_resetCStream() doesn't just set pledgedSrcSize:
it also sets up the CCtx for single-threaded compression.
The problem is, when 2+ threads are defined in cctx->requestedParams,
ZSTD_compress_generic() will want to start MT compression,
since initialization is supposed to have already happened (thanks to ZSTD_resetCStream()),
except that the underlying ZSTDMT_CCtx* object is not created,
resulting in a segfault.
This is an invalid construction
(the correct one is to use ZSTD_CCtx_setPledgedSrcSize()).
I haven't found a nice way to mitigate this impact if someone makes the same mistake.
At some point, removing the old API to keep only the new API within fileio.c will limit these risks.
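For reference, a sketch of the correct construction with the new API (error checks omitted; assumes the total source size is known up front):
```
#include <zstd.h>

/* Declare the pledged source size on the CCtx instead of calling
 * ZSTD_resetCStream(), so multi-threaded initialization stays consistent. */
static void startFrame_sketch(ZSTD_CCtx* cctx, unsigned long long totalSrcSize, int nbWorkers)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nbWorkers);
    ZSTD_CCtx_setPledgedSrcSize(cctx, totalSrcSize);
    /* ...then stream the data with ZSTD_compressStream2() as usual... */
}
```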
srcSize is read and provided for each file, not at resource creation.
This used to be useful with the older API, because it could not re-adapt parameters between sessions.
At some point, it will be better to remove the old code and only keep the new API.
It works fine for now.
fixes #874:
when a frame is not properly terminated by a "last block" signal,
zstd -d used to detect it immediately and error out.
This version will decode and flush the last block, and only then issue an error.
* Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail
on my Mac.
* Maximum window size in 64-bit mode is 2GB, since that is the largest
power of 2 that works with the overflow prevention.
* Allow `--long=windowLog` to set the window log, along with
`--zstd=wlog=#`. These options also set the window size during
decompression, but don't override `--memory=#` if it is set (see the API sketch after this list).
* Present a helpful error message when the window size is too large during
decompression.
* The long range matcher defaults to a hash log 7 less than the window log,
which keeps it at 20 for window log 27.
* Keep the default long range matcher window size and the default maximum
window size at 27 for the API and CLI.
* Add tests that use the maximum window size and hash size for compression
and decompression.
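On the API side, the same knobs are reachable through the advanced parameters; a minimal sketch (current enum names, error checks omitted):
```
#include <zstd.h>

/* Compression: enable long-distance matching with an explicit window log;
 * the LDM hash log mirrors the "window log - 7" default mentioned above.
 * Decompression: cap the accepted window size, in the spirit of --memory=#. */
static void configureLongMode_sketch(ZSTD_CCtx* cctx, ZSTD_DCtx* dctx, int windowLog)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, windowLog);        /* e.g. 27 */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_ldmHashLog, windowLog - 7);   /* 20 for windowLog 27 */

    ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, windowLog);     /* reject larger windows */
}
```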
Simple makefile change + quick typename change
Test:
```
make clean
make
# successfully produces binary without lz4 support
make clean
# with flags to pick up my lz4 build
make MOREFLAGS="-L/home/felixh/prog/lz4/lib -I/home/felixh/prog/lz4/lib"
# successfully produces binary with lz4 support
echo "TEST TEST TEST THIS IS A TEST STRING PLEASE TEST THIS PLEASE OK THANK YOU" | \
  ./lz4/lz4 | \
  LD_LIBRARY_PATH=/home/felixh/prog/lz4/lib ./zstd/zstd -d
# successfully prints TEST TEST TEST THIS IS A TEST STRING PLEASE TEST THIS PLEASE OK THANK YOU
```
file-information is dependent on decompression functions.
It should only be enabled when ZSTD_NODECOMPRESS is not set.
Also: added a zstd-compress compilation test to `make shortest`.