1013 Commits

Author SHA1 Message Date
bimbashrestha
4a1ca5e0a8 Adding method for extracting sequences. 2019-08-29 11:55:12 -07:00
bimbashrestha
e5704bbfdf Added test for multiple blocks of zeros and fixed nit about comments 2019-08-28 08:32:34 -07:00
bimbashrestha
96201d9774 Added bool to cctx and fixed some comment nits 2019-08-26 15:30:41 -07:00
bimbashrestha
991cbc9024 Fixing mixed declaration compiler complaint 2019-08-26 15:00:50 -07:00
bimbashrestha
ce264ce53b Forbiding emission of RLE when its the first block 2019-08-26 14:54:29 -07:00
bimbashrestha
33b6446ca7 Removing accidental method call 2019-08-26 14:34:43 -07:00
bimbashrestha
7b041b552e Removing assert for rle that doesn't always hold 2019-08-26 12:26:53 -07:00
bimbashrestha
1f2bf77f2a Using typedef U32 instead of int 2019-08-26 09:00:22 -07:00
bimbashrestha
ba46932492 Removing implicit conversion from const void* to const BYTE* and added constant for threshold 2019-08-26 08:51:34 -07:00
bimbashrestha
0e3ba02cf1 Fixing more test falure errors 2019-08-22 13:54:41 -07:00
bimbashrestha
4faf3a5911 Fixing ci-circle test failure issues 2019-08-22 13:46:15 -07:00
bimbashrestha
cba5350f88 Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert 2019-08-22 12:12:44 -07:00
Nick Magerko
493f95c7df Fix merge conflicts 2019-08-22 11:51:41 -07:00
bimbashrestha
4c90d862e3 Generate RLE blocks in the encoder 2019-08-22 11:27:20 -07:00
Nick Magerko
c7a24d7a14 Define ZSTD_SRCSIZEHINT_MIN as 0 2019-08-20 13:06:15 -07:00
Nick Magerko
edf2abf106 Fix fall-through case 2019-08-19 12:32:43 -07:00
Nick Magerko
dffbac5f89 Add --size-hint=# option 2019-08-19 11:38:49 -07:00
Yann Collet
782bfb858a fixed very minor inefficiency (nbSeq==127)
The nbSeq "short" format (1-byte)
is compatible with any value < 128.

However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.

Spotted by Intel's Smita Kumar.
2019-08-15 16:41:34 +02:00
Yann Collet
0b0b83e8f3 fix test 122
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
98692c2838 fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3 2019-08-01 15:58:17 +02:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
Restructure the source files
2019-07-19 15:29:07 -07:00
Ephraim Park
1dc98de279 Restructure the source files 2019-07-15 17:39:18 -07:00
Yann Collet
b01c1c679f
Merge pull request #1675 from ephiepark/dev
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00
Yann Collet
096714d1b8
Merge pull request #1671 from ephiepark/dev
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park
f57ac7b09e Factor out the logic to build sequences 2019-07-03 15:42:38 -07:00
Ephraim Park
9007701670 Adding targetCBlockSize param 2019-07-03 15:41:52 -07:00
Nick Terrell
6c92ba774e
ZSTD_compressSequences_internal assert op <= oend (#1667)
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Lets assert that it doesn't happen.
2019-07-02 15:45:47 -07:00
Yann Collet
857e608b51
Merge pull request #1658 from facebook/memset
memset() rather than reduceIndex()
2019-07-01 15:01:43 -07:00
Yann Collet
621adde3b2 changed naming to ZSTD_indexTooCloseToMax()
Also : minor speed optimization :
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.

Also : changed position of LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
2019-06-24 14:39:29 -07:00
Yann Collet
45c9fbd6d9 prefer memset() rather than reduceIndex() when close to index range limit
by disabling continue mode when index is close to limit.
2019-06-21 16:19:21 -07:00
Nick Terrell
674534a700 [zstd] Fix data corruption in niche use case
* Extract the overflow correction into a helper function.
* Load the dictionary `ZSTD_CHUNKSIZE_MAX = 512 MB` bytes at a time
  and overflow correct between each chunk.

Data corruption could happen when all these conditions are true:

* You are using multithreading mode
* Your overlap size is >= 512 MB (implies window size >= 512 MB)
* You are using a strategy >= ZSTD_btlazy
* You are compressing more than 4 GB

The problem is that when loading a large dictionary we don't do
overflow correction. We can only load 512 MB at a time, and may
need to do overflow correction before each chunk.
2019-06-21 15:47:31 -07:00
Yann Collet
80d6ccea79 removed UINT32_MAX
apparently not guaranteed on all platforms,
replaced by UINT_MAX.
2019-05-31 17:27:07 -07:00
Yann Collet
a968099038 minor code cleaning for new index invalidation strategy 2019-05-31 16:52:37 -07:00
Yann Collet
bc601bdc6d first implementation of small window size for btopt
noticeably improves compression ratio
when window size is small (< 18).

enwik7	level 19

windowLog	`dev`	`smallwlog`	improvement
23	3.577	3.577	0.02%
22	3.536	3.538	0.06%
21	3.462	3.467	0.14%
20	3.364	3.377	0.39%
19	3.244	3.272	0.86%
18	3.110	3.166	1.80%
17	2.843	3.057	7.53%
16	2.724	2.943	8.04%
15	2.594	2.822	8.79%
14	2.456	2.686	9.36%
13	2.312	2.523	9.13%
12	2.162	2.361	9.20%
11	2.003	2.182	8.94%
2019-05-31 15:55:12 -07:00
Yann Collet
b13a9207f9
Merge pull request #1623 from facebook/fullbench
fullbench minor improvements
2019-05-31 14:40:19 -07:00
Yann Collet
ed38b645db fullbench: pass proper parameters in scenario 43 2019-05-29 15:26:06 -07:00
Yann Collet
327cf6fac1 nextToUpdate3 does not need to be maintained outside of zstd_opt.c
It's re-synchronized with nextToUpdate at beginning of each block.
It only needs to be tracked from within zstd_opt block parser.

Made the logic clear, so that no code tried to maintain this variable.

An even better solution would be to make nextToUpdate3
an internal variable of ZSTD_compressBlock_opt_generic().
That would make it possible to remove it from ZSTD_matchState_t,
thus restricting its visibility to only where it's actually useful.

This would require deeper changes though,
since the matchState is the natural structure to transport parameters into and inside the parser.
2019-05-28 15:26:52 -07:00
Yann Collet
6453f8158f complementary code comments
on variables used / impacted during maxDist check
2019-05-28 14:12:16 -07:00
Yann Collet
4baecdf72a added comments to better understand enforceMaxDist() 2019-05-28 13:15:48 -07:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
48a6427d22 [libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.

Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`

* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
  `ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell
56682a7709 Fix ZSTD_estimateCStreamSize_usingCCtxParams()
It wasn't using the ZSTD_CCtx_params correctly. It must actualize
the compression parameters by calling ZSTD_getCParamsFromCCtxParams()
to get the real window log.

Tested by updating the streaming memory usage example in the next
commit. The CHECK() failed before this patch, and passes after.

I also added a unit test to zstreamtest.c that failed before this
patch, and passes after.
2019-04-01 18:02:52 -07:00
Nick Terrell
6b053b9f60 [lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary only create the cdict once we've started the
  compression job. This allows the user to pass the dictionary before they
  set other settings, and is in line with the rest of the API.
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
2019-03-21 16:13:53 -07:00
Nick Terrell
e55da9e963 Wrap the new advanced api completely 2019-03-21 10:54:40 -07:00
Nick Terrell
787b76904a [libzstd] Allow compression parameters to be set with a cdict
The order you set parameters in the advanced API is not supposed to matter.
However, once you call `ZSTD_CCtx_refCDict()` the compression parameters
cannot be changed. Remove that restriction, and document what parameters
are used when using a CDict.

If the CCtx is in dictionary mode, then the CDict's parameters are used.
If the CCtx is not in dictionary mode, then its requested parameters are
used.
2019-03-13 16:10:05 -07:00
Nick Terrell
0594e8135b [libzstd] Free local cdict when referencing cdict
We no longer care about the `cdictLocal` after calling
`ZSTD_CCtx_refCDict()`, so we should free it to save some
memory.
2019-03-13 14:54:31 -07:00
Nick Terrell
7ad7ba3178 [libzstd] Rename ZSTD_CCtxParam_* to ZSTD_CCtxParams_* 2019-02-19 17:44:52 -08:00
Nick Terrell
f4abba02ba [libzstd] Clean up parameter code
* Move all ZSTDMT parameter setting code to ZSTD_CCtxParams_*Parameter().
  ZSTDMT now calls these functions, so we can keep all the logic in the
  same place.
* Clean up `ZSTD_CCtx_setParameter()` to only add extra checks where needed.
* Clean up `ZSTDMT_initJobCCtxParams()` by copying all parameters by default,
  and then zeroing the ones that need to be zeroed. We've missed adding several
  parameters here, and it makes more sense to only have to update it if you
  change something in ZSTDMT.
* Add `ZSTDMT_cParam_clampBounds()` to clamp a parameter into its valid
  range. Use this to keep backwards compatibility when setting ZSTDMT parameters,
  which clamp into the valid range.
2019-02-19 13:22:37 -08:00
Nick Terrell
3d7377b874 [libzstd] Handle uncompressed literals 2019-02-15 14:58:11 -08:00
Nick Terrell
f9513115e4 [libzstd] Add ZSTD_c_literalCompressionMode flag
It controls the literals compression. It is either
`auto`, `huffman`, or `uncompressed`. It defaults to
`auto`, which is the current behavior.
2019-02-13 14:59:22 -08:00