Commit Graph

2273 Commits (0998f10813efc40d42f7f4043183a81153032292)

Author SHA1 Message Date
W. Felix Handte 0998f10813 Implement First Repcode Check 2018-05-25 13:13:28 -04:00
W. Felix Handte 50c5b2bb90 Find Dict Hash Table Matches 2018-05-25 13:13:28 -04:00
W. Felix Handte 7a25f7ef5b Existing Repcode Check Only Applies to noDict Case 2018-05-25 13:13:28 -04:00
W. Felix Handte 8b241da4df Properly Initialize Repcode Values 2018-05-25 13:13:28 -04:00
W. Felix Handte 7097a03749 Add Necessary Dict Variables 2018-05-25 13:13:28 -04:00
W. Felix Handte aacbbf4f9a Rename 'lowest' to 'localLowest' to Prepare to Introduce Dict Indices 2018-05-25 13:13:28 -04:00
W. Felix Handte c10d1b4011 Skeleton for In-Place Impl for ZSTD_dfast 2018-05-25 13:13:28 -04:00
Yann Collet b5ef32fea7 Merge branch 'dev' into fracFse 2018-05-24 14:09:49 -07:00
Yann Collet 776128d16f fix corner case when requiring cost of an FSE symbol
ensure that, when frequency[symbol]==0,
result is (tableLog + 1) bits
with both upper-bit and fractional-bit estimates.

Also : enable BIT_DEBUG in /tests
2018-05-24 13:59:11 -07:00
Yann Collet 08c5be5db3
Merge pull request #1117 from felixhandte/zstd-fast-in-place-dict
ZSTD_fast: Support Searching the Dictionary Context In-Place
2018-05-23 19:32:25 -07:00
Nick Terrell 06b70179da
Work around bug in zstd decoder (#1147)
Work around bug in zstd decoder

Pull request #1144 exercised a new path in the zstd decoder that proved to
be buggy. Avoid the extremely rare bug by emitting an uncompressed block.
2018-05-23 18:02:30 -07:00
Nick Terrell f2d0924b87 Variable declarations 2018-05-23 14:58:58 -07:00
W. Felix Handte d9c7e67125 Assert that Dict and Current Window are Adjacent in Index Space 2018-05-23 17:53:03 -04:00
W. Felix Handte 298d24fa57 Make loadedDictEnd an Index, not the Dict Len 2018-05-23 17:53:03 -04:00
W. Felix Handte 7ef85e0618 Fixes in re Comments 2018-05-23 17:53:03 -04:00
W. Felix Handte 582b7f85ed Don't Attach Empty Dict Contents
In weird corner cases, they produce unexpected results...
2018-05-23 17:53:03 -04:00
W. Felix Handte 9c92223468 Avoid Undefined Behavior in Match Ptr Calculation 2018-05-23 17:53:03 -04:00
W. Felix Handte a44ab3b475 Remove Out-of-Date Comment 2018-05-23 17:53:03 -04:00
W. Felix Handte 95bdf20a87 Moar Renames 2018-05-23 17:53:03 -04:00
W. Felix Handte 7e0402e738 Also Attach Dict When Source Size is Unknown 2018-05-23 17:53:03 -04:00
W. Felix Handte 3ba70cc759 Clear the Dictionary When Sliding the Window 2018-05-23 17:53:03 -04:00
W. Felix Handte b05ae9b608 Refine ip Initialization to Avoid ARM Weirdness 2018-05-23 17:53:03 -04:00
W. Felix Handte 1a7b34ef28 Use New Index Invariant to Simplify Conditionals 2018-05-23 17:53:03 -04:00
W. Felix Handte 2d598e6fed Force Working Context Indices Greater than Dict Indices 2018-05-23 17:53:03 -04:00
W. Felix Handte d005e5daf4 Whitespace Fix 2018-05-23 17:53:03 -04:00
W. Felix Handte 154eb09419 Switch to Original Match Calc for noDict Repcode Check 2018-05-23 17:53:03 -04:00
W. Felix Handte 191fc74a51 Rename 'hasDict' to 'dictMode' 2018-05-23 17:53:03 -04:00
W. Felix Handte ae4fcf7816 Respond to PR Comments; Formatting/Style/Lint Fixes 2018-05-23 17:53:03 -04:00
W. Felix Handte ca26cecc7a Rename and Reformat 2018-05-23 17:53:03 -04:00
W. Felix Handte 66bc1ca641 Change Cut-Off to 8 KB 2018-05-23 17:53:03 -04:00
W. Felix Handte c31ee3c7f8 Fix Rep Code Initialization 2018-05-23 17:53:03 -04:00
W. Felix Handte b67196f30d Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff 2018-05-23 17:53:03 -04:00
W. Felix Handte 265c2869d1 Split Wrapper Functions to Cause Inlining 2018-05-23 17:53:03 -04:00
W. Felix Handte 6929964d65 Add bounds check in repcode tests 2018-05-23 17:53:03 -04:00
W. Felix Handte 70a537d1d7 Initial Repcode Check Support for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
W. Felix Handte 8d24ff0353 Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
W. Felix Handte d18a405779 Refer to the Dictionary Match State In-Place (Sometimes) 2018-05-23 17:53:03 -04:00
Nick Terrell c92dd11940 Error if reported size is too large in edge case 2018-05-23 14:47:20 -07:00
Nick Terrell a97e9a627a [zstd] Fix decompression edge case
This edge case is only possible with the new optimal encoding selector,
since before zstd would always choose `set_basic` for small numbers of
sequences.

Fix `FSE_readNCount()` to support buffers < 4 bytes.

Credit to OSS-Fuzz
2018-05-23 12:16:00 -07:00
Nick Terrell e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
Nick Terrell 49cf880513 Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.

* The cost of `set_basic` is computed using the cross-entropy cost
  function `ZSTD_crossEntropyCost()`, using the normalized default count
  and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
  previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
  `ZSTD_entropyCost()`, together with the cost of writing the normalized
  count `ZSTD_NCountCost()`.
2018-05-22 14:33:22 -07:00
Yann Collet 27af35c110
Merge pull request #1143 from facebook/tableLevels
Update table of compression levels
2018-05-19 14:40:37 -07:00
Yann Collet 5381369cb1 Merge branch 'dev' into tableLevels 2018-05-18 18:23:27 -07:00
Yann Collet b0b3fb517d updated compression levels for blocks of 256KB 2018-05-18 17:17:12 -07:00
Nick Terrell 7cbb8bbbbf [cover] Small compression ratio improvement
The cover algorithm selects one segment per epoch, and it selects the epoch
size such that `epochs * segmentSize ~= dictSize`. Selecting less epochs
gives the algorithm more candidates to choose from for each segment it
selects, and then it will loop back to the first epoch when it hits the
last one.

The trade off is that now it takes longer to select each segment, since it
has to look at more data before making a choice.

I benchmarked on the following data sets using this command:

```sh
$ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR | wc -c
```

| Data set     | k (approx) |  Before  |  After   | % difference |
|--------------|------------|----------|----------|--------------|
| GitHub       | ~1000      |   738138 |   746610 |       +1.14% |
| hg-changelog | ~90        |  4295156 |  4285336 |       -0.23% |
| hg-commands  | ~500       |  1095580 |  1079814 |       -1.44% |
| hg-manifest  | ~400       | 16559892 | 16504346 |       -0.34% |

There is some noise in the measurements, since small changes to `k` can
have large differences, which is why I'm using `steps=256`, to try to
minimize the noise. However, the GitHub data set still has some noise.

If I run the GitHub data set on my Mac, which presumably lists directory
entries in a different order, so the dictionary builder sees the files in
a different order, or I use `steps=1024` I see these results.

| Run        | Before | After  | % difference |
|------------|--------|--------|--------------|
| steps=1024 | 738138 | 734470 |       -0.50% |
| MacBook    | 738451 | 737132 |       -0.18% |

Question: Should we expose this as a parameter? I don't think it is
necessary. Someone might want to turn it up to exchange a much longer
dictionary building time in exchange for a slightly better dictionary.
I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16`
with a faster running time.
2018-05-18 16:15:27 -07:00
fbrosson 291824f49d __builtin_prefetch did probably not exist before gcc 3.1. 2018-05-18 18:40:11 +00:00
fbrosson 16bb8f1f9e Drop colon in asm snippet to make old versions of gcc happy. 2018-05-18 17:05:36 +00:00
Yann Collet 63eeeaa1dd update table levels for blocks <= 16K
also : allow hlog to be slighly larger than windowlog,
as it's apparently good for both speed and compression ratio.
2018-05-16 16:13:37 -07:00
Yann Collet 9938b17d4c
Merge pull request #1135 from facebook/frameCSize
decompress: changed error code when input is too large
2018-05-15 11:02:53 -07:00
Nick Terrell 30d9c84b1a Fix failing Travis tests 2018-05-15 09:46:20 -07:00