facebook/zstd - zstd - Final Minetest

Commit Graph

Author	SHA1	Message	Date
Yann Collet	712318a244	Merge pull request #1146 from terrelln/fse-fix [zstd] Fix decompression edge case	2018-05-23 16:41:42 -07:00
Nick Terrell	f2d0924b87	Variable declarations	2018-05-23 14:58:58 -07:00
Nick Terrell	c92dd11940	Error if reported size is too large in edge case	2018-05-23 14:47:20 -07:00
Nick Terrell	a97e9a627a	[zstd] Fix decompression edge case This edge case is only possible with the new optimal encoding selector, since before zstd would always choose `set_basic` for small numbers of sequences. Fix `FSE_readNCount()` to support buffers < 4 bytes. Credit to OSS-Fuzz	2018-05-23 12:16:00 -07:00
Yann Collet	27dc078aa6	Merge pull request #1144 from terrelln/fse-entropy Approximate FSE encoding costs for selection	2018-05-22 19:25:37 -07:00
Yann Collet	4a498f03dc	Merge pull request #1145 from terrelln/spec Clarify what happens when Number_of_Sequences == 0	2018-05-22 16:21:40 -07:00
Nick Terrell	73f4c890cd	Clarify what happens when Number_of_Sequences == 0	2018-05-22 16:12:33 -07:00
Nick Terrell	e3959d5eba	Fixes	2018-05-22 16:06:33 -07:00
Nick Terrell	49cf880513	Approximate FSE encoding costs for selection Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and `set_repeat`, and select the one with the lowest cost. * The cost of `set_basic` is computed using the cross-entropy cost function `ZSTD_crossEntropyCost()`, using the normalized default count and the count. * The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the previous table to see if it is able to represent the distribution. * The cost of `set_compressed` is computed with the entropy cost function `ZSTD_entropyCost()`, together with the cost of writing the normalized count `ZSTD_NCountCost()`.	2018-05-22 14:33:22 -07:00
Yann Collet	27af35c110	Merge pull request #1143 from facebook/tableLevels Update table of compression levels	2018-05-19 14:40:37 -07:00
Yann Collet	ade583948d	Merge branch 'tableLevels' of github.com:facebook/zstd into tableLevels	2018-05-18 18:23:40 -07:00
Yann Collet	5381369cb1	Merge branch 'dev' into tableLevels	2018-05-18 18:23:27 -07:00
Yann Collet	ca06a1d82f	Merge pull request #1142 from terrelln/better-dict [cover] Small compression ratio improvement	2018-05-18 17:19:13 -07:00
Yann Collet	38c2c46823	Merge branch 'dev' into tableLevels	2018-05-18 17:17:45 -07:00
Yann Collet	b0b3fb517d	updated compression levels for blocks of 256KB	2018-05-18 17:17:12 -07:00
Nick Terrell	7cbb8bbbbf	[cover] Small compression ratio improvement The cover algorithm selects one segment per epoch, and it selects the epoch size such that `epochs * segmentSize ~= dictSize`. Selecting less epochs gives the algorithm more candidates to choose from for each segment it selects, and then it will loop back to the first epoch when it hits the last one. The trade off is that now it takes longer to select each segment, since it has to look at more data before making a choice. I benchmarked on the following data sets using this command: ```sh $ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR \| wc -c ``` \| Data set \| k (approx) \| Before \| After \| % difference \| \|--------------\|------------\|----------\|----------\|--------------\| \| GitHub \| ~1000 \| 738138 \| 746610 \| +1.14% \| \| hg-changelog \| ~90 \| 4295156 \| 4285336 \| -0.23% \| \| hg-commands \| ~500 \| 1095580 \| 1079814 \| -1.44% \| \| hg-manifest \| ~400 \| 16559892 \| 16504346 \| -0.34% \| There is some noise in the measurements, since small changes to `k` can have large differences, which is why I'm using `steps=256`, to try to minimize the noise. However, the GitHub data set still has some noise. If I run the GitHub data set on my Mac, which presumably lists directory entries in a different order, so the dictionary builder sees the files in a different order, or I use `steps=1024` I see these results. \| Run \| Before \| After \| % difference \| \|------------\|--------\|--------\|--------------\| \| steps=1024 \| 738138 \| 734470 \| -0.50% \| \| MacBook \| 738451 \| 737132 \| -0.18% \| Question: Should we expose this as a parameter? I don't think it is necessary. Someone might want to turn it up to exchange a much longer dictionary building time in exchange for a slightly better dictionary. I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16` with a faster running time.	2018-05-18 16:15:27 -07:00
Yann Collet	44303428c6	Merge pull request #1139 from fbrosson/prefetch __builtin_prefetch did probably not exist before gcc 3.1.	2018-05-18 13:23:35 -07:00
fbrosson	291824f49d	__builtin_prefetch did probably not exist before gcc 3.1.	2018-05-18 18:40:11 +00:00
Yann Collet	bd6417de7f	Merge pull request #1140 from fbrosson/cpu-asm Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 10:32:16 -07:00
fbrosson	16bb8f1f9e	Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 17:05:36 +00:00
Yann Collet	63eeeaa1dd	update table levels for blocks <= 16K also : allow hlog to be slightly larger than windowlog, as it's apparently good for both speed and compression ratio.	2018-05-16 16:13:37 -07:00
Yann Collet	9938b17d4c	Merge pull request #1135 from facebook/frameCSize decompress: changed error code when input is too large	2018-05-15 11:02:53 -07:00
Yann Collet	b14c4bff96	Merge pull request #1136 from terrelln/fix Fix failing Travis tests	2018-05-15 11:02:01 -07:00
Nick Terrell	30d9c84b1a	Fix failing Travis tests	2018-05-15 09:46:20 -07:00
Yann Collet	f372ffc64d	Merge pull request #1127 from facebook/staticDictCost Improved optimal parser with dictionary	2018-05-14 17:45:50 -07:00
Yann Collet	d59cf02df0	decompress: changed error code when input is too large ZSTD_decompress() can decompress multiple frames sent as a single input. But the input size must be the exact sum of all compressed frames, no more. In the case of a mistake on srcSize, being larger than required, ZSTD_decompress() will try to decompress a new frame after current one, and fail. As a consequence, it will issue an error code, ERROR(prefix_unknown). While the error is technically correct (the decoder could not recognise the header of _next_ frame), it's confusing, as users will believe that the first header of the first frame is wrong, which is not the case (it's correct). It makes it more difficult to understand that the error is in the source size, which is too large. This patch changes the error code provided in such a scenario. If (at least) a first frame was successfully decoded, and then following bytes are garbage values, the decoder assumes the provided input size is wrong (too large), and issue the error code ERROR(srcSize_wrong).	2018-05-14 15:32:28 -07:00
Yann Collet	c8c67f7c84	Merge branch 'dev' into tableLevels	2018-05-14 11:55:52 -07:00
Yann Collet	174bd3d4a7	Merge pull request #1131 from facebook/zstdcli minor: control numeric argument overflow	2018-05-14 11:53:58 -07:00
Yann Collet	5d76201fee	Merge pull request #1130 from facebook/man fix #1115	2018-05-14 11:52:53 -07:00
Yann Collet	902db38798	Merge pull request #1129 from facebook/paramgrill Paramgrill refactoring	2018-05-14 11:52:41 -07:00
Yann Collet	3870db1ba5	Merge branch 'dev' into tableLevels	2018-05-14 11:52:05 -07:00
Yann Collet	4da0216db0	Merge pull request #1133 from felixhandte/travis-fix Make Travis CI Run `apt-get update`	2018-05-14 09:59:43 -07:00
W. Felix Handte	e26be5a7b3	Travis CI Runs apt-get Update	2018-05-14 11:55:21 -04:00
Yann Collet	2c392952f9	paramgrill: use NB_LEVELS_TRACKED in loop make it easier to generate/track more levels than ZSTD_maxClevel()	2018-05-13 17:25:53 -07:00
Yann Collet	c9227ee16b	update table for 128 KB blocks	2018-05-13 17:15:07 -07:00
Yann Collet	b4250489cf	update compression levels for large inputs	2018-05-13 01:53:38 -07:00
Yann Collet	9cd5c63771	cli: control numeric argument overflow exit on overflow backported from paramgrill added associated test case	2018-05-12 14:29:33 -07:00
Yann Collet	3f89cd1081	minor : factor out errorOut()	2018-05-12 14:09:32 -07:00
Yann Collet	b824d213cb	fix #1115	2018-05-12 10:21:30 -07:00
Yann Collet	50993901b2	paramgrill: subtle change in level spacing distance between levels is slightly increased to compensate for level 1 speed improvements and the will to have stronger level 19 extending the range of speed to cover.	2018-05-12 09:40:04 -07:00
Yann Collet	a3f2e84a37	added programmable constraints	2018-05-11 19:43:08 -07:00
Yann Collet	17c19fbbb5	generalized use of readU32FromChar() and check input overflow	2018-05-11 17:32:26 -07:00
Yann Collet	761758982e	replaced FSE_count by FSE_count_simple to reduce usage of stack memory. Also : tweaked a few comments, as suggested by @terrelln	2018-05-11 16:03:37 -07:00
Yann Collet	66b81817b5	Merge pull request #1128 from facebook/libdir minor Makefile patch	2018-05-11 11:47:59 -07:00
Yann Collet	3193d692c2	minor patch, ensuring LIBDIR is created before installation follow-up from #1123	2018-05-11 11:31:48 -07:00
Yann Collet	99ddca43a6	fixed wrong assertion base can actually overflow	2018-05-10 19:48:09 -07:00
Yann Collet	0d7626672d	fixed c++ conversion warning	2018-05-10 18:17:21 -07:00
Yann Collet	09d0fa29ee	minor adjusting of weights	2018-05-10 18:13:48 -07:00
Yann Collet	1a26ec6e8d	opt: init statistics from dictionary instead of starting from fake "default" statistics.	2018-05-10 17:59:12 -07:00
Yann Collet	74b1c75d64	btopt : minor adjustment of update frequencies	2018-05-10 16:32:36 -07:00

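The mode selection described in commit 49cf880513 (estimate a cost for `set_basic`, `set_compressed`, and `set_repeat`, then pick the cheapest) can be illustrated with a small sketch. This is not zstd's implementation: the function names, the floating-point arithmetic, the `header_bits` parameter standing in for the cost of writing the normalized count, and the use of plain cross-entropy for `set_repeat` (where the commit uses `FSE_bitCost()`) are all assumptions made for illustration.

```python
import math


def entropy_cost(counts):
    """Bits to code `counts` with an ideal code derived from the counts themselves."""
    total = sum(counts)
    return sum(c * -math.log2(c / total) for c in counts if c > 0)


def cross_entropy_cost(counts, norm):
    """Bits to code `counts` with a fixed distribution `norm`.

    Returns infinity when `norm` gives zero weight to a symbol that occurs,
    i.e. the table cannot represent the data.
    """
    norm_total = sum(norm)
    cost = 0.0
    for c, n in zip(counts, norm):
        if c == 0:
            continue
        if n == 0:
            return math.inf
        cost += c * -math.log2(n / norm_total)
    return cost


def select_mode(counts, default_norm, prev_norm=None, header_bits=0.0):
    """Return (cheapest mode, all costs). Hypothetical helper, not a zstd API."""
    costs = {
        # Default table: pay the cross-entropy against the default distribution.
        "set_basic": cross_entropy_cost(counts, default_norm),
        # New table: pay the entropy plus the (assumed) cost of describing the table.
        "set_compressed": entropy_cost(counts) + header_bits,
    }
    if prev_norm is not None:
        # Previous table: free to reference, but only usable if it covers the data.
        costs["set_repeat"] = cross_entropy_cost(counts, prev_norm)
    return min(costs, key=costs.get), costs
```

For example, with counts `[8, 4, 2, 2]` and a uniform default table, the default table costs 32 bits and the data's own entropy is 28 bits, so `set_compressed` only wins when describing the new table costs fewer than 4 bits in this toy model.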

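The epoch sizing discussed in commit 7cbb8bbbbf (`epochs * segmentSize ~= dictSize`, then using fewer epochs so each selection sees more candidates) can be sketched with simple integer arithmetic. The function and parameter names below are hypothetical, and the exact way the factor is applied is an assumption; the commit message only states that `2`, `4`, and `16` were tested and that `4` gave most of the benefit.

```python
def cover_epochs(dict_size, segment_size, num_positions, divisor=4):
    """Return (epochs, epoch_size) for a cover-style dictionary builder.

    dict_size     -- target dictionary size in bytes
    segment_size  -- size of each selected segment in bytes
    num_positions -- number of candidate positions in the training data
    divisor       -- assumed reduction factor; 1 reproduces
                     epochs * segment_size ~= dict_size
    """
    # Fewer epochs means each epoch spans more of the input,
    # so each selected segment is chosen from more candidates.
    epochs = max(1, dict_size // segment_size // divisor)
    epoch_size = num_positions // epochs
    return epochs, epoch_size


# Baseline: one segment per epoch fills the dictionary in a single pass.
print(cover_epochs(112640, 1000, 1_000_000, divisor=1))
# Reduced epoch count: the builder loops back to the first epoch
# after the last one until the dictionary is full.
print(cover_epochs(112640, 1000, 1_000_000, divisor=4))
```

The trade-off matches the one named in the commit message: a larger epoch gives each selection more data to inspect, which takes longer per segment.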
5077 Commits (712318a2440121dd88bb0bcd9ce956b13fb3c5ac)