Commit Graph

848 Commits (87125c2c01097dd2fd51050cf03f177733a74c99)

Author SHA1 Message Date
Yann Collet c005df136f
Merge pull request #947 from facebook/fix944
Fix #944
2017-12-14 10:01:52 -08:00
Yann Collet 2e97a6d464 fixed minor declaration-after-statement warning 2017-12-13 18:50:05 -08:00
Yann Collet 5432ef6921 fixes adaptation on srcSize
This patch restores capability for each file to receive adapted compression parameters depending on its size.

The bug breaking this feature was relatively silly :
setting a parameter with a value "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.

Unfortunately, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameters selection between tableID 0,1,2,3 is largely internal,
leaving no trace to outside world, not even in frame header.
2017-12-13 17:45:26 -08:00
Yann Collet d23eb9a098 zstreamtest : added missing CHECK_Z() 2017-12-13 15:35:49 -08:00
Nick Terrell 22727a7467 Fix cdict compressor repcodes 2017-12-13 11:31:20 -08:00
Yann Collet e28305fcca fix #944 : ZSTDMT with large files and dictionary now works correctly
windowLog is now enforced from provided compression parameters,
instead of being copied blindly from `cdict`
where it could be smaller.

also :
- fix a minor bug in zstreamtest --mt : advanced parameters must be set before init
- changed advanced parameter name to ZSTDMT_jobSize
2017-12-12 18:04:58 -08:00
Yann Collet 03832b7aa5 re-added test case
messing with revert ... :(
2017-12-12 14:01:54 -08:00
Yann Collet 8a104fda05 Revert "Created a test case which reliably reproduces bug #944"
This reverts commit 5098d1fbe2.
2017-12-12 12:51:49 -08:00
Yann Collet 5098d1fbe2 Created a test case which reliably reproduces bug #944
in zstreamtest.
2017-12-12 12:48:31 -08:00
Yann Collet ac8e022806
Merge pull request #943 from facebook/fix942
Fix #942
2017-12-08 13:53:08 -05:00
Yann Collet dfc697e967 comment clarification 2017-12-08 12:16:49 -05:00
Yann Collet c029ee1f0b ZSTD_initCStream_srcSize() considers "0" to mean "unknown"
to not break existing programs relying on this behavior.
Might be changed to mean "empty" in the future.
2017-12-07 17:13:10 -05:00
Yann Collet 3aa2b27a89 fix #942 : streaming interface does not compress after ZSTD_initCStream()
While the final result is still, technically, a frame,
the resulting frame expands initial data instead of compressing it.
This is because the streaming API creates a tiny 1-byte buffer for input,
because it believes input is empty (0-bytes),
because in the past, 0 used to mean "unknown" instead.

This patch fixes the issue.
Todo : add a test which traps the issue.
2017-12-07 02:52:50 -05:00
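The "0 means unknown" versus "0 means empty" ambiguity described in the two entries above can be sketched as follows. This is a minimal illustration, not the library's code: `SKETCH_SIZE_UNKNOWN`, `sketch_legacy_pledged_size()` and `sketch_input_buffer_size()` are hypothetical names, and the buffer-sizing policy is an assumption chosen only to show how an "empty" reading of 0 leads to a 1-byte input buffer.

```c
#include <stdint.h>

/* Hypothetical sentinel for this sketch: the all-ones value marks "size unknown". */
#define SKETCH_SIZE_UNKNOWN ((uint64_t)-1)

/* Legacy entry point semantics: callers pass 0 when they do not know the size,
 * so 0 is translated to the sentinel instead of being read as "empty". */
static uint64_t sketch_legacy_pledged_size(uint64_t pledged)
{
    return (pledged == 0) ? SKETCH_SIZE_UNKNOWN : pledged;
}

/* Assumed input-buffer policy: an unknown size gets the full default buffer,
 * a known size gets a buffer no larger than needed (minimum 1 byte). */
static uint64_t sketch_input_buffer_size(uint64_t pledged, uint64_t defaultSize)
{
    if (pledged == SKETCH_SIZE_UNKNOWN) return defaultSize;
    if (pledged == 0) return 1;   /* "empty" input: the tiny buffer from the bug report */
    return (pledged < defaultSize) ? pledged : defaultSize;
}
```

Passing a raw 0 straight to `sketch_input_buffer_size()` yields the 1-byte buffer, while routing it through `sketch_legacy_pledged_size()` first yields the default size.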
Yann Collet c173dbd6e7 no longer supported starting C++17 2017-12-04 18:00:53 -08:00
Yann Collet 7e05ef851a Merge branch 'dev' into qemu32panic 2017-12-03 11:14:36 -08:00
Yann Collet 5e1f34b7e4 setParameter : no side-effect on setting a compression parameter
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is now a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
2017-12-01 21:17:09 -08:00
Yann Collet 78290874a5 fixed Visual warning on minor interface discrepancy 2017-11-29 17:01:14 -08:00
Yann Collet d3c59edac9 removed long-range-mode tests from `zstreamtest --no-big-tests` 2017-11-29 16:42:20 -08:00
Yann Collet 998a93b784 simplified ZSTD_CCtx_setParametersUsingCCtxParams()
Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action.
Any action shall be taken at parameter application only (during init).
It makes it possible to just copy CCtxParams from external container to internal state,
and get rid of the more complex code which was trying to compensate for missing actions.
2017-11-29 16:13:05 -08:00
Yann Collet f98ee994c4 zstd_opt: added comments, as requested by @terrelln 2017-11-29 15:19:00 -08:00
Yann Collet bc42bc3b1d removed one invocation of SET_PRICE() macro 2017-11-28 16:08:56 -08:00
Yann Collet 0a0a212934 zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.

Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.

The "proper" formula makes it possible to remove this heuristic.

Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).

It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
2017-11-28 14:07:03 -08:00
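The comparison described in the entry above (literals cost versus match cost plus the cost of a zero litlength) can be illustrated with a small sketch. The struct, the function names and the prices are all hypothetical and are not taken from zstd_opt; the sketch only shows why omitting the litlength==0 term tilts the choice toward matches.

```c
/* Hypothetical prices, in arbitrary units, for one position. */
typedef struct {
    unsigned literalsCost;    /* cost of continuing with literals */
    unsigned matchCost;       /* cost of the match itself */
    unsigned litLength0Cost;  /* cost of encoding a literal length of zero */
} sketch_prices;

/* Older comparison: the zero-litlength cost is ignored, favoring matches. */
static int sketch_prefers_match_old(const sketch_prices* p)
{
    return p->matchCost < p->literalsCost;
}

/* Comparison as described in the commit message:
 * a match starting right after another match also pays for litlength==0. */
static int sketch_prefers_match_new(const sketch_prices* p)
{
    return (p->matchCost + p->litLength0Cost) < p->literalsCost;
}
```

With prices such as literals = 20, match = 18 and litlength==0 = 3, the older comparison selects the match while the newer one keeps the literals.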
Yann Collet b71405dc51 removed a bunch of code related to cached literal price
optState was used both to evaluate price
and to cache cost of previously calculated literals.
This created a strong dependency, forcing parser to request cost in a strict order.
This limitation forbids future parsers with skipping capabilities.

After this patch, caching literals price still exists,
but is now explicit, in a stack structure.
2017-11-28 12:32:24 -08:00
Yann Collet 03f30d9dcb separate rawLiterals, fullLiterals and match costs
removed one SET_PRICE() macro invocation
2017-11-28 12:14:46 -08:00
Yann Collet eee87cd6f2 btopt: minor refactor : removed one SET_PRICE() macro invocation
direct assignment makes operation cleaner.
Also allows some (very minor) optimization (non-measurable)
2017-11-27 17:18:57 -08:00
Yann Collet e9d1987fd7 btopt: minor speed optimization
matchPrice is always right at beginning
2017-11-27 17:01:51 -08:00
Yann Collet f8d5c478af fixed comment, reported by @gyscos 2017-11-21 10:36:14 -08:00
Yann Collet 4154aec679 fixed comment, as suggested by @terrelln 2017-11-21 10:26:17 -08:00
Yann Collet 899f2a29f6 strategy ZSTD_btopt pinned to (0) variant (faster one) 2017-11-20 11:53:20 -08:00
Yann Collet 3f457264d1 slightly improved compression speed 2017-11-19 14:40:21 -08:00
Yann Collet 42c1e64270 slightly improved ratio at -22
merging of repcode search into btsearch introduced a small compression ratio regression at max level :
1.3.2 : 52728769
after repMerge patch : 52760789 (+32020)

A few minor changes have produced this difference.
They can be hard to spot.

This patch buys back about half of the difference,
by no longer inserting position at hc3 when a long match is found there.
It feels strangely counter-intuitive, but works :
after this patch : 52742555 (-18234)
2017-11-19 14:00:55 -08:00
Yann Collet 99435dbbab minor : search early-out on sufficient_len for hc3 and rep
very very small speed and ratio increases
2017-11-19 12:58:04 -08:00
Yann Collet d100670045 btopt0 : a bit faster and weaker 2017-11-19 10:38:02 -08:00
Yann Collet e6da37c430 created (hidden) new strategy btopt0
about ~+10% faster but losing ~0.01 compression ratio
(note : amplitude varies a lot depending on files, but direction remains the same)
2017-11-19 10:21:21 -08:00
Yann Collet e717a5b0dd zstd_opt: minor speed optimization
Calculate reference log2sums only once per series of sequences
(as opposed to once per sequence)

Also: improved code comments
2017-11-18 16:24:02 -08:00
Yann Collet a4a20a4b2f fix un-initialized memory warning
harmless, but cleaner
2017-11-17 15:51:52 -08:00
Yann Collet 23767e950a fix one UB pointer arithmetic in encoder
Instead of calculating distance between 2 memory objects, which is UB,
we extract the offset from object 1, and transfer it into object 2.
2017-11-17 13:24:51 -08:00
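The technique named in the entry above (take the offset within object 1, then apply it to object 2) might look like the following. This is a generic sketch, not the encoder's actual code; `sketch_translate()` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Re-express a position inside one object as the same position inside another.
 * Only pointers into the same object are subtracted (ptr - srcBase);
 * the resulting offset is then applied to the second object. */
static const unsigned char* sketch_translate(const unsigned char* ptr,
                                             const unsigned char* srcBase,
                                             const unsigned char* dstBase)
{
    size_t const offset = (size_t)(ptr - srcBase);  /* same object: well-defined */
    return dstBase + offset;                        /* caller ensures offset fits in the second object */
}
```

The point of the design is that `dstBase - srcBase` is never computed, since subtracting pointers into two unrelated objects is undefined behavior in C.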
Yann Collet 11e58d9ba4 fixed minor warning
warning: void function returning a value
(even if the return value is void)
2017-11-16 15:21:30 -08:00
Yann Collet 15768cabb5 fixed some complex scenarios
Fixed : multithreading to compress some small data with dictionary
Fixed : ZSTD_initCStream_usingCDict()
Improved streaming memory usage when pledgedSrcSize is known.
2017-11-16 15:18:18 -08:00
Yann Collet 05dffe43a7 Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not updated, expecting ZSTD_updateTree() to fill the tree later on.

Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.

It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch adds a few of them.

Additionally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.

Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.

- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
  Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
2017-11-16 12:18:56 -08:00
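The two ZSTDMT behaviors listed in the entry above (clamping nbThreads when it is set, and falling back to a single thread for small inputs) can be sketched like this. The constants and function names are hypothetical, and the numeric values are assumptions for illustration only, not the library's actual limits.

```c
#include <stddef.h>

#define SKETCH_JOBSIZE_MIN   ((size_t)1 << 20)  /* assumed value, for illustration only */
#define SKETCH_NBTHREADS_MAX 128u               /* assumed value, for illustration only */

/* Clamp the requested thread count into [1, SKETCH_NBTHREADS_MAX]. */
static unsigned sketch_clamp_nb_threads(unsigned requested)
{
    if (requested < 1) return 1;
    if (requested > SKETCH_NBTHREADS_MAX) return SKETCH_NBTHREADS_MAX;
    return requested;
}

/* Inputs no larger than one minimal job would occupy a single worker anyway,
 * so multithreading is skipped for them. */
static unsigned sketch_effective_threads(unsigned requested, size_t srcSize)
{
    unsigned const clamped = sketch_clamp_nb_threads(requested);
    if (srcSize <= SKETCH_JOBSIZE_MIN) return 1;
    return clamped;
}
```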
Yann Collet dfc14579f5 removed wrong assertion 2017-11-15 15:35:56 -08:00
Yann Collet c55e35b2fc removed a few specialized traces 2017-11-15 15:04:53 -08:00
Yann Collet 61c2d70c86 shortened repcode match finder implementation 2017-11-15 14:37:40 -08:00
Yann Collet d7e9805028 fixed corruption issue 2017-11-15 13:44:24 -08:00
Yann Collet 046ea53bef still fighting data corruption
due to messed up tree.
Seems to happen when reaching end of buffer.
2017-11-15 11:29:24 -08:00
Yann Collet 4202b2e8a6 merged rep search into btMatchSearch
but there is a tree corruption somewhere ...
bug hunt ongoing
2017-11-14 20:38:52 -08:00
Yann Collet 9a11f70dc3 merged repcode search into BT match search
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)

It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).

warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
2017-11-13 02:23:48 -08:00
Yann Collet eb47705b18 reduced scope of multiple variables
renamed some variables for better understanding
2017-11-10 08:31:12 -08:00
Yann Collet 100d8ad6be lib/compress: created ZSTD_LLcode() and ZSTD_MLcode()
transform length into code.
Since transformation is needed in several places throughout the code,
better write the logic in one place.
2017-11-08 12:43:05 -08:00
Yann Collet 5aa0352742 zstd_opt: simplified ZSTD_getPrice() and ZSTD_updatePrice() interface
ZSTD_getPrice() and ZSTD_updatePrice() accept normal matchlength as argument
instead of matchlength-MINMATCH,
which makes them easier / more logical to use and read.
Conversion is simply done internally.
2017-11-08 12:23:27 -08:00
Yann Collet bf730e2044 zstd_opt: refactor code for improved readability
renamed variables to be more meaningful
reduced scope of multiple variables
removed some useless var attribution
2017-11-08 12:07:39 -08:00
Yann Collet 4191efa993 zstd_opt: ensure sufficient_len < ZSTD_OPT_NUM to simplify some tests 2017-11-08 11:24:00 -08:00
Yann Collet ee441d5d2b renamed zstd_compress.h into zstd_compress_internal.h
to emphasize the fact that all definitions it contains
must remain private, across lib/compress modules.
2017-11-07 16:15:23 -08:00
Yann Collet 8b6aecf2cb moved a few structures from `zstd_internal.h` to `zstd_compress.h`
which is a more precise scope
2017-11-07 16:03:14 -08:00
Yann Collet 150354c5fe minor refactor
added some traces and assert
related to hunting a potential ubsan error in 32-bits mode
(it ends up being a compiler-side issue : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82802).

Modified one pointer arithmetic expression to be more conformant.
2017-11-01 16:57:48 -07:00
Yann Collet 428e8b3bf4 fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overridden by srcSize instead.

This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.

It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
2017-11-01 13:15:23 -07:00
Nick Terrell 86b8134cad [libzstd] Fix parameter selection for empty input
ZSTD_compress() and friends would treat an empty input as an unknown size
when selecting parameters. Thus, they would drastically overallocate the
context. Tell ZSTD_getParams() that the source size is 1 when it is empty.
2017-10-25 17:24:15 -07:00
Yann Collet 1ff8a8c109 Merge pull request #891 from facebook/contentSize
Content size
2017-10-17 17:24:51 -07:00
Yann Collet 32c9f715ae fixed : Visual build compressing stdin with multi-threading enabled fails
There were multiple reasons stacked :
- Visual uses a different code path, because ZSTD_NEWAPI is not defined
- fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN`  (fixed)
- ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)
2017-10-17 14:07:43 -07:00
Yann Collet 13bfe885aa edited ZSTD_initCStream_advanced() comment 2017-10-16 14:06:22 -07:00
Nick Terrell 7f961ba6cd Don't allow default tables to repeat
It isn't useful in any case to repeat default tables.
Saves a few bytes on Silesia, since we don't trigger the dictionary
heuristic.

Before: 211988480 => 73651998 bytes
After:  211988480 => 73651721 bytes
2017-10-16 11:37:56 -07:00
Yann Collet fc8d293460 dictionary compression use correct file size estimation
when determining compression parameters
to compress one file only.

For multiple files, it still "bets" that files are going to be small.

There was also a bug recently added in ZSTD_CCtx_loadDictionary_advanced()
making it incapable of using pledgedSrcSize to determine compression parameters.
2017-10-14 01:21:43 -07:00
Yann Collet beb9b4b398 fixed ZSTDMT_initCStream() when contentSizeFlag==1 by default
and a wrong test in zstreamtest --mt
2017-10-13 19:09:30 -07:00
Yann Collet 213ef3b510 fixed ZSTD_initCStream_advanced() behavior, which depends on contentSizeFlag,
and a stream fuzzer test, which was incorrect
(relied on 0 being unconditionally transformed into `ZSTD_CONTENTSIZE_UNKNOWN`)
2017-10-13 19:01:58 -07:00
Yann Collet 3c1e3f8ec9 contentSizeFlag enabled by default would also fail for streaming and MT operations
fixed
2017-10-13 18:32:06 -07:00
Yann Collet fb44516641 ensure fParams.contentSizeFlag starts at 1
such default was failing for ZSTD_compressBegin/ZSTD_compressContinue
fixed too
2017-10-13 17:39:13 -07:00
Yann Collet dd18d73e7e fileio: content size is enabled by default 2017-10-13 16:32:18 -07:00
Nick Terrell ced6e6189c Add DEBUGLOG() that prints FSE encoding types 2017-10-13 14:55:23 -07:00
Nick Terrell 24ac2dbd2a Fix invalid use of dictionary offcode table
Fixes #888.
2017-10-13 12:47:03 -07:00
Yann Collet a9e5705077 minor code formatting
added a trace during sequence encoding
2017-10-13 02:36:16 -07:00
Nick Terrell a86a7097ec Ensure dictionary Huff table can encode any symbol
* Ensure that the dictionary Huffman CTable has maxSymbolValue 255.
* Fix a stack buffer overflow during compression dictionary loading.
2017-10-03 13:22:13 -07:00
Yann Collet 67478f4cb0 fixed minor conversion warnings for printf
in debug mode
2017-10-02 17:28:57 -07:00
Yann Collet 004fd34fd9 Merge pull request #876 from facebook/srcSize
CLI Fix : srcSize written in frame headers when compressing multiple files
2017-10-02 15:02:05 -07:00
Nick Terrell 86e83e926f [libzstd] Set CLEVEL_CUSTOM correctly
In `ZSTD_compressBegin_advanced()`, `ZSTD_parameters` are used to set the
compression parameters, but the level didn't get set to `CLEVEL_CUSTOM`, so
`ZSTD_compressBlock()` used the wrong parameters when checking the source
size.
2017-10-02 13:43:30 -07:00
Yann Collet 6e930c13d1 Merge branch 'dev' into compressBound 2017-10-01 11:24:02 -07:00
Yann Collet dc404119e5 ZSTD_adjustCParams_internal : minor optimization 2017-09-30 15:02:40 -07:00
Nick Terrell c5d6dde502 Don't `size -= 1` in ZSTD_adjustCParams()
The window size could end up too small if the source size is 2^n + 1.

Credit to OSS-Fuzz
2017-09-30 14:20:06 -07:00
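The arithmetic behind "too small if the source size is 2^n + 1" can be reproduced with a small sketch. This is a hypothetical reconstruction, not the actual ZSTD_adjustCParams() code: it assumes the window log is derived as a ceiling log2 of the size, and shows what an extra decrement does to that result.

```c
#include <stdint.h>

/* Position of the highest set bit (returns 0 for v <= 1). */
static unsigned sketch_highbit32(uint32_t v)
{
    unsigned r = 0;
    while (v >>= 1) r++;
    return r;
}

/* Smallest log such that (1 << log) >= size, for size >= 2. */
static unsigned sketch_ceil_log2(uint32_t size)
{
    return sketch_highbit32(size - 1) + 1;
}

/* Faulty variant: an extra decrement before the same computation. */
static unsigned sketch_ceil_log2_extra_decrement(uint32_t size)
{
    size -= 1;
    return sketch_highbit32(size - 1) + 1;
}
```

For size = 1025 (2^10 + 1), the first function returns 11, giving a 2048-byte window that covers the input. The faulty variant returns 10, giving a 1024-byte window that is one byte short.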
Yann Collet 5b10345b26 added ZSTD_COMPRESSBOUND() as a macro
ZSTD_compressBound() works fine, but is only useful for dynamic allocation.
For static allocation, only a macro can provide the amount during compilation time.
2017-09-29 23:17:41 -07:00
Yann Collet 8afb151c9b cli: fixed wrong initialization in MT mode
It's not good to mix old and new API
ZSTD_resetCStream() doesn't just set pledgedSrcSize :
it also sets the CCtx for a single thread compression.

Problem is, when 2+ threads are defined in cctx->requestedParams,
ZSTD_compress_generic() will want to start MT compression,
since initialization is supposed to have already happened (thanks to ZSTD_resetCStream())
except that the underlying ZSTDMT_CCtx* object is not created,
resulting in a segfault.

This is an invalid construction
(correct one is to use ZSTD_CCtx_setPledgedSrcSize()).
I haven't found a nice way to mitigate this impact if someone makes the same mistake.

At some point, removing the old API to keep only the new API within fileio.c will limit these risks.
2017-09-29 22:14:37 -07:00
Yann Collet fbd5ab7027 minor fix : no longer use fake srcSize during resource creation
srcSize is read and provided at each file, not at resource creation.
This used to be useful with older API, because it could not re-adapt parameters between sessions.

At some point, it will be better to remove the old code, and only keep the new_api.
It works fine by now.
2017-09-29 19:40:27 -07:00
Yann Collet db1668a43b fix : srcSize written in frame header when multiple files compressed
This information used to be disabled when nbFiles>1.
It was badly initialized later in the code, resulting in an error.
2017-09-29 18:05:18 -07:00
Yann Collet 7c9669f272 Merge pull request #873 from facebook/shorterTests
Leaner tests
2017-09-29 17:26:46 -07:00
Yann Collet 1416bc0f07 erase existence of a buffer when it's sent out of the pool
In some complex scenario,
the buffer would be freed because it's too large,
another buffer would be allocated, but fail,
trigger an error,
and the general buffer pool would then be freed,
where the definition of the already freed buffer would be found
(beyond total index, but still), and freed again, resulting in double-free error.
2017-09-29 16:27:47 -07:00
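The double-free scenario in the entry above, and the fix of erasing a buffer's slot when it leaves the pool, can be sketched as follows. The pool layout and function names are hypothetical and much simpler than the real zstdmt buffer pool.

```c
#include <stdlib.h>

#define SKETCH_POOL_SLOTS 4

typedef struct {
    void*    slots[SKETCH_POOL_SLOTS];
    unsigned nbBuffers;   /* number of buffers currently stored */
} sketch_pool;

/* Hand a buffer to the caller and erase it from the pool,
 * so that the pool no longer holds a stale copy of the pointer. */
static void* sketch_pool_get(sketch_pool* pool)
{
    void* buf;
    if (pool->nbBuffers == 0) return NULL;
    pool->nbBuffers--;
    buf = pool->slots[pool->nbBuffers];
    pool->slots[pool->nbBuffers] = NULL;
    return buf;
}

/* Free every slot, including those beyond nbBuffers; free(NULL) is a no-op,
 * so erased slots cannot cause a second free. */
static void sketch_pool_free_all(sketch_pool* pool)
{
    unsigned u;
    for (u = 0; u < SKETCH_POOL_SLOTS; u++) {
        free(pool->slots[u]);
        pool->slots[u] = NULL;
    }
    pool->nbBuffers = 0;
}
```

Without the `= NULL` in `sketch_pool_get()`, a caller that frees the buffer and then triggers `sketch_pool_free_all()` would free the same pointer twice.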
Yann Collet e963800e27 zstdmt : fixed : buffer dst0 wasn't properly set to null after usage
now it's possible to unconditionally invoke ZSTD_releaseAllJobRessources()
whether previous compression was completed correctly or not.
2017-09-28 23:01:31 -07:00
Yann Collet 754ae5cc0b removed ZSTDMT_waitForAllJobsCompleted() from ZSTDMT_freeCCtx()
as per @terrelln comment
2017-09-28 20:45:31 -07:00
Yann Collet 86b4fe5b45 adjustCParams : restored previous behavior
unknown srcSize presumed small if there is a dictionary (dictSize>0)
and presumed large otherwise.
2017-09-28 18:14:28 -07:00
Yann Collet b93598d6a4 zstdmt : reduced maximum nb of threads
to avoid memory address space issues on 32-bits systems
(see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=876416#17)
2017-09-28 13:49:12 -07:00
Yann Collet e4ec427720 Merge branch 'dev' into shorterTests
fixed conflicts
2017-09-28 12:19:28 -07:00
Yann Collet 8074261d00 zstdmt : move on when not enough memory for a new input buffer
just continue operations without input forward progress,
instead of an error that stops current compression session.
2017-09-28 11:46:19 -07:00
Yann Collet 2cd15dd9a4 fixed minor Visual conversion warning 2017-09-28 02:33:41 -07:00
Yann Collet 377abcc02c zstdmt : better behavior when freeing a context right after a memory allocation error
wait for all jobs to be completed, so that freeing can happen safely
2017-09-28 02:23:44 -07:00
Yann Collet d6770f80af minor : rewrite unit tests using CHECK_Z macro 2017-09-28 02:14:48 -07:00
Yann Collet 9b5b47ac93 ensure adjustCParams adjusts hLog and cLog even without srcSize
It would previously exit when srcSize is unknown.
But in the case of custom parameters,
hLog and cLog can still be too large in comparison with windowLog.

Reduces maximum memory allocated during zstreamtest --newapi
2017-09-28 01:25:40 -07:00
Yann Collet 54a827fff0 Merge branch 'dev' into newFormats
Fixed conflicts in zstdmt_compress.c
2017-09-27 16:39:40 -07:00
Yann Collet e45a2aea9b Merge pull request #869 from terrelln/dev
[libzstd] pthread function prefixed with ZSTD_
2017-09-27 16:35:08 -07:00
Nick Terrell b555b7ef41 [libzstd][opt] Simplify repcode logic 2017-09-27 15:30:12 -07:00
Yann Collet c994932788 fixed ZSTD_format_e value validation 2017-09-27 12:22:22 -07:00
Nick Terrell 6c41adfb28 [libzstd] pthread function prefixed with ZSTD_
* `sed -i 's/pthread_/ZSTD_pthread_/g' lib/{,common,compress,decompress,dictBuilder}/*.[hc]`
* Fix up `lib/common/threading.[hc]`
* `sed -i s/PTHREAD_MUTEX_LOCK/ZSTD_PTHREAD_MUTEX_LOCK/g lib/compress/zstdmt_compress.c`
2017-09-27 11:48:48 -07:00
Yann Collet ecf1778e23 updated ZSTD_format_e value validation
also updated manual
2017-09-27 11:19:21 -07:00
Yann Collet 4791561c4a silence minor gcc warning -Wempty-body
also silence fuzz test artefacts
2017-09-26 17:57:38 -07:00
Yann Collet 9f0b8dfbe9 Merge branch 'dev' into newFormats 2017-09-26 14:22:39 -07:00
Nick Terrell c233bdbaee Increase maximum window size
* Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail
  on my Mac.
* Maximum window size in 64-bit mode is 2GB, since that is the largest
  power of 2 that works with the overflow prevention.
* Allow `--long=windowLog` to set the window log, along with
  `--zstd=wlog=#`. These options also set the window size during
  decompression, but don't override `--memory=#` if it is set.
* Present a helpful error message when the window size is too large during
  decompression.
* The long range matcher defaults to a hash log 7 less than the window log,
  which keeps it at 20 for window log 27.
* Keep the default long range matcher window size and the default maximum
  window size at 27 for the API and CLI.
* Add tests that use the maximum window size and hash size for compression
  and decompression.
2017-09-26 14:00:01 -07:00
Yann Collet 586df82a78 Merge pull request #862 from terrelln/static
[zstd] Backport kernel patch from @ColinIanKing
2017-09-25 17:02:40 -07:00
Yann Collet 5d8fdd1641 Merge pull request #855 from terrelln/maxoff
[libzstd] Increase MaxOff
2017-09-25 16:34:29 -07:00
Nick Terrell 76cb38d085 [zstd] Backport kernel patch from @ColinIanKing
* Make the U32 table in `FSE_normalizeCount()` static.
* Patch from https://lkml.kernel.org/r/20170922145946.14316-1-colin.king@canonical.com.
* Clang makes non-static tables static anyways. gcc however, does [weird things](https://godbolt.org/g/fvTcED).
* Benchmarks showed no difference in speed.
2017-09-25 16:18:23 -07:00
Yann Collet 6ee05a02b8 added ZSTD_decompress_generic()
same as ZSTD_decompressStream(),
just for a similar feeling as the compression side, which uses ZSTD_compress_generic()
2017-09-25 15:41:48 -07:00
Yann Collet 62568c9a42 added capability to generate magic-less frames
decoder not implemented yet
2017-09-25 14:26:26 -07:00
Nick Terrell bbe77212ef [libzstd] Increase MaxOff 2017-09-25 13:36:18 -07:00
Yann Collet 96f0cde31a minor function rename
ZSTD_estimateCStreamSize_advanced_usingCParams -> ZSTD_estimateCStreamSize_usingCParams
_usingX is clear.
_advanced feels redundant
2017-09-24 16:47:02 -07:00
Yann Collet 7c3dea42ce added prototypes for advanced parameters for decompression API
required to decode custom formats
2017-09-24 15:57:29 -07:00
Nick Terrell d6abb28951 Prepare for ZSTD_WINDOWLOG_MAX == 31 2017-09-21 17:18:41 -07:00
Yann Collet da74aabc00 Merge pull request #850 from terrelln/fse-optimal
[fse] Fix FSE_optimalTableLog() for srcSize==1
2017-09-19 14:59:21 -07:00
Nick Terrell 6c9ed76676 [ldm] Fix corner case where minMatch < 8
There is a potential read buffer overflow when minMatch < 8.

fix-fuzz-failure
2017-09-19 13:49:37 -07:00
Yann Collet 7d1ff3817b fix ZSTD_sizeof_CCtx() / ZSTD_sizeof_CStream()
previous result was over-estimated
by counting streaming buffers twice
2017-09-18 14:47:34 -07:00
Nick Terrell cae3e3c652 [fse] Fix FSE_optimalTableLog() for srcSize==1 2017-09-18 14:11:18 -07:00
Yann Collet 539b91ee9b minor : added assert in bt 2017-09-16 23:41:58 -07:00
Yann Collet 335780c427 fixed too strong alignment assert in ZSTD_initStaticCCtx()
64-bits fields are only 32-bits aligned on 32-bits CPU
2017-09-13 16:35:29 -07:00
Yann Collet f1571dad8f Merge pull request #838 from stellamplau/ldm-mergeDev
Add long distance matcher
2017-09-13 13:24:08 -07:00
Yann Collet 3306bcb0e6 fix #820 : GCC v3.x 32-bits doesn't define 64-bits intrinsic
resulting in undefined symbol error.
Push the requirement to GCC 4 for now.
Another solution, proposed by @NWilson, is to use __LONG_MAX__ instead.
__LONG_MAX__ is a GCC-specific constant, whose value is supposed to depend on underlying target hardware (32/64 bits)
Might be better, but seems also more complex, hence more prone to side effects.
Keeping the simple solution for now (just rely on __GNUC__)
2017-09-11 15:17:31 -07:00
Stella Lau eb3327c10a Merge branch 'dev' of https://github.com/facebook/zstd into ldm-mergeDev 2017-09-11 15:00:01 -07:00
Stella Lau f902bf9676 Merge branch 'ldm-integrate' into ldm-mergeDev 2017-09-11 14:55:29 -07:00
Yann Collet f325ee4e84 fixed pass-through warning 2017-09-11 14:37:03 -07:00
Stella Lau 0d1b54db61 Explicitly cast raw numerals when left-shifting 2017-09-11 14:28:18 -07:00
Yann Collet 0d6ecc72a3 makes it possible to compile libzstd in single-thread mode without zstdmt_compress.c (#819) 2017-09-11 14:09:34 -07:00
Yann Collet 3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Stella Lau 360428c5d9 Move ldm functions to their own file 2017-09-06 18:09:26 -07:00
Stella Lau 2b99d696de Remove debug code 2017-09-06 15:57:26 -07:00
Stella Lau eeff55dfa8 Merge remote-tracking branch 'upstream/dev' into ldm-mergeDev 2017-09-06 15:56:32 -07:00
Yann Collet ad0046244f Merge pull request #831 from terrelln/split-compress
Split parsers out of zstd_compress.c
2017-09-06 10:01:27 -07:00
Stella Lau 9e4060200b Add tests and fix pointer alignment 2017-09-06 09:14:05 -07:00
Stella Lau c706de5395 Rename and add short ldm parameters in cli 2017-09-05 21:11:18 -07:00
Stella Lau 98b85426f1 Fix setting of nextToUpdate at end of ldm matcher 2017-09-05 20:41:37 -07:00
Nick Terrell 721726d688 Split parsers out of zstd_compress.c 2017-09-05 17:10:25 -07:00
Stella Lau 08d33fe1c9 Fix parameter handling in copyCCtx with cdict 2017-09-05 15:50:20 -07:00
Stella Lau fd0071da29 Fix parameter handling with ZSTD_copyCCtx 2017-09-05 15:34:17 -07:00
Stella Lau 67d4a6161c Add ldmBucketSizeLog param 2017-09-02 21:55:29 -07:00
Stella Lau a1f04d518d Move hashEveryLog to cctxParams and update cli 2017-09-01 15:05:47 -07:00
Stella Lau 767a0b3be1 Move ldm hashLog, bucketLog, and mml to cctxParams 2017-09-01 12:24:59 -07:00
Stella Lau 17d8e0bdcc Merge remote-tracking branch 'upstream/longRangeMatcher' into ldm-integrate 2017-09-01 10:19:38 -07:00
Stella Lau 8081becadc Add long distance matching as a CCtxParam 2017-09-01 09:18:58 -07:00
Yann Collet 369c29dd1a fixed impact of merge conflict for longRange 2017-08-31 18:25:56 -07:00
Yann Collet d7ad99b2ab Merge branch 'longRangeMatcher' into dev 2017-08-31 18:08:37 -07:00
Stella Lau 6a546efb8c Add long distance matcher
Move last literals section to ZSTD_block_internal
2017-08-31 12:53:19 -07:00
Stella Lau 90a31bfa16 Pass dictMode to ZSTDMT_initCStream; fix nits
- Return error code in estimate{CCtx,CStream}Size functions
2017-08-30 16:19:07 -07:00
Stella Lau ee65701720 Minor fixes; remove formatting only changes 2017-08-29 20:27:35 -07:00
Stella Lau a6e20e1bd7 Add test for raw content starting with dict header 2017-08-29 18:36:18 -07:00
Stella Lau 623e3cd40b Use ZSTD_dm_rawContent in zstdmt_compress 2017-08-29 18:04:32 -07:00
Stella Lau 82d636b76a Rename applyCCtxParams() 2017-08-29 18:03:06 -07:00
Stella Lau 4e835720bf Delay creation of ZSTDMT_CCtx 2017-08-29 17:58:32 -07:00
Stella Lau c7a18b7c21 Localize 'dictMode' from cctx to function param 2017-08-29 15:52:24 -07:00
Stella Lau c88fb9267f Replace 'byReference' with enum 2017-08-29 11:55:02 -07:00
Stella Lau b5b9275e67 Rename estimateCCtxSize_advanced() and estimateCStreamSize_advanced() 2017-08-29 10:49:29 -07:00
Stella Lau 0e56a84a1e Fix getting cParams from CCtxParams 2017-08-28 19:25:17 -07:00
Stella Lau 024098a47d Fix parameter retrieval from cdict 2017-08-25 17:58:28 -07:00
Stella Lau 2adde898c8 Fix typo with ZSTDMT_parameter 2017-08-25 16:13:40 -07:00
Stella Lau 18224608ff Remove ZSTD_setCCtxParameter() 2017-08-25 13:58:41 -07:00
Stella Lau 0744592d38 Add function initializing cctxParams from clevel 2017-08-25 13:36:47 -07:00
Stella Lau 9911153723 Move jobSize and overlapLog in zstdmt to cctxParams 2017-08-25 13:14:51 -07:00
Stella Lau de5193422d Distinguish between jobParams and cctxParams in zstdmt 2017-08-25 11:36:17 -07:00
Stella Lau eb7bbab36a Remove ZSTD_p_refDictContent and dictContentByRef 2017-08-25 11:11:45 -07:00
Nick Terrell db3f5372df [zstdmt] Use POOL_create_advanced() 2017-08-24 18:12:28 -07:00
Stella Lau 15fdeb9e41 Enforce nbThreads<=1 for estimateCCtxSize 2017-08-24 16:28:49 -07:00
Stella Lau 2fbf0285b2 Fix interaction with ZSTD_setCCtxParameter() and cleanup 2017-08-24 11:25:41 -07:00
Stella Lau fd9bf42516 Fix forceWindow and dictMode setting for zstdmt jobs 2017-08-23 19:16:57 -07:00
Stella Lau bf3108fb50 Ensure zstdmt uses 'job version' of cctx parameters 2017-08-23 17:03:31 -07:00
Stella Lau 1c81f725ff Remove duplicated testing code 2017-08-23 15:47:15 -07:00
Stella Lau 64ce49426b Fix cstream compression level 2017-08-23 12:30:47 -07:00
Stella Lau 5bc2c1e982 Add prototype support for customMem with cctxParams 2017-08-23 12:03:30 -07:00
Yann Collet e9ce1208a1 Merge pull request #812 from facebook/longRangeFix
fixed extraordinary scenario where all fields use maximum nbBits
2017-08-23 11:35:28 -07:00
Stella Lau 6f1a21c7e9 Remove formatting-only changes 2017-08-23 10:24:19 -07:00
Stella Lau 11303778d0 Add function to make cctxParams from ZSTD_parameters 2017-08-22 14:53:13 -07:00
Stella Lau 23fc0e41fa Remove 'opaque' naming from internal functions 2017-08-22 14:24:47 -07:00
Stella Lau 8fd1636776 Remove unused functions 2017-08-22 13:33:58 -07:00
Yann Collet 6b2b6a9bd5 fixed extraordinary scenario where all fields use maximum possible nb of bits simultaneously
can only happen if windowLog>=27  (level 22 --ultra)
2017-08-22 12:09:21 -07:00
Stella Lau e50ed1fa3a Fix undefined behavior when srcSize==1 2017-08-22 11:55:42 -07:00
Stella Lau 60e1bc617c Explicitly create a job cctxParam for multithreading 2017-08-21 15:39:37 -07:00
Stella Lau 5b956f4753 Comment out CCtx_param versions of CDict functions 2017-08-21 14:49:16 -07:00
Stella Lau fd8a25786e Check parameters are valid in initCCtxParams 2017-08-21 13:23:35 -07:00
Stella Lau 1c0dbe81b1 Add documentation for CCtx_params 2017-08-21 13:18:00 -07:00
Stella Lau 939f954285 Pass ZSTD_CCtx_params as const ptr when possible 2017-08-21 12:57:18 -07:00
Stella Lau 560b34f6d2 Return error code when initializing NULL cctxParams 2017-08-21 11:52:26 -07:00
Stella Lau 25be09c6b4 Set some parameters to zero before initializing cdict 2017-08-21 11:35:46 -07:00
Yann Collet 232d62b637 fixed a few headers that were too hastily copy/pasted during last license change 2017-08-21 11:24:32 -07:00
Stella Lau 502031ca10 Use cctxParam version of createCDict internally 2017-08-21 11:00:44 -07:00
Stella Lau 91b30dbe84 Remove test parameter 2017-08-21 10:09:06 -07:00
Stella Lau f181f33bdf Disable tests and refactor 2017-08-21 01:59:08 -07:00
Stella Lau 023b24e6d4 Add cctx param tests 2017-08-20 22:55:07 -07:00
Yann Collet 7db552676e reduced pool queue to 0 to save memory
fixed : pool performance when jobs are fired fast and queueSize==0
2017-08-19 15:07:54 -07:00
Stella Lau 6cee6e07e5 Add internal createCDict function 2017-08-18 22:48:31 -07:00
Stella Lau d775519296 Add cctxParam versions of internal functions 2017-08-18 17:37:58 -07:00
Yann Collet 32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
Stella Lau 63b8c98531 Pass cctx parameters to MTCtx 2017-08-18 16:17:24 -07:00
Stella Lau 399ae013d4 Add function to apply cctx params 2017-08-18 13:01:55 -07:00
Stella Lau 81d89d82a6 Move nbThreads to cctx params 2017-08-18 12:08:57 -07:00
Stella Lau 2300c58a6f Move dictContentByRef to cctx params 2017-08-18 12:03:16 -07:00
Stella Lau b6cb2ed8cb Move dictMode to cctxParams 2017-08-18 11:43:31 -07:00
Stella Lau 97e27affcb Move compression level to cctx params 2017-08-18 11:20:08 -07:00
Stella Lau c0221124d5 Add function to set opaque parameters 2017-08-17 19:30:22 -07:00
Stella Lau 4169f49171 Add initialization/allocation functions for opaque params 2017-08-17 18:45:04 -07:00
Stella Lau ade95b8bed Add opaque interfaces for static initialization 2017-08-17 18:13:08 -07:00