171 Commits

Author SHA1 Message Date
senhuang42
a5500cf2af Refactor separate ldm variables all into one struct 2020-10-07 13:56:25 -04:00
senhuang42
0325d878f2 Remove bubbling down matches with longer offCode and same matchLen 2020-10-07 13:56:25 -04:00
senhuang42
ddf8a3f1b9 Enable inclusion of mid-flight LDMs in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
88f72ed942 Correct incorrect offcode calculation 2020-10-07 13:56:25 -04:00
senhuang42
d8b43a4202 Add explicit conversion of size_t to U32 2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42
c87d2e5866 Prefix new static ldm helpers with ZSTD_opt 2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42 Add DEBUGLOG() calls in ldm helpers 2020-10-07 13:56:25 -04:00
senhuang42
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
senhuang42
37617e23d7 Correct matchLength calculation and remove unnecessary functions 2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287 Reset ldmSeqStore after initStats_ultra() pass for btultra2 2020-10-07 13:56:25 -04:00
senhuang42
0718aa70df Refactor existing functions to use posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
7348b40a87 Adjustments to ldm_calculateMatchRange() to calculate bounds correctly 2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2 Add ldm_calculateMatchRange() function 2020-10-07 13:56:25 -04:00
senhuang42
4793ae3b84 Prevent duplicate LDMs from being inserted 2020-10-07 13:56:25 -04:00
senhuang42
65f9cfeeec Add extra bounds check to prevent heap access after free ASAN error 2020-10-07 13:56:25 -04:00
senhuang42
bff5785fd5 Address mixed variables C90 warning 2020-10-07 13:56:25 -04:00
senhuang42
724b94ed18 ldm_getNextMatch fixed return values 2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
senhuang42
78da2e1808 Fixed sifting algorithm 2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96 Fixed end of match boundary update issues 2020-10-07 13:56:25 -04:00
senhuang42
28394b64f2 Add proper bounds check on adding ldms 2020-10-07 13:56:25 -04:00
senhuang42
a2f2b58d04 Add a function ldm_voidSequences() 2020-10-07 13:56:25 -04:00
senhuang42
9c3c7cd20e Fix function argument to getNextMatch() 2020-10-07 13:56:25 -04:00
senhuang42
c8b8572b38 Adjustments to no longer segfault on nci 2020-10-07 13:56:25 -04:00
senhuang42
5df9b5e05f Add initial getNextMatch() in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
f8ce7cabc3 Added more debugging 2020-10-07 13:56:25 -04:00
senhuang42
84009a076a Add re-copying of ldmSeqStore after processing 2020-10-07 13:56:25 -04:00
senhuang42
42395a70c2 Add debug statements, flesh out functions 2020-10-07 13:56:25 -04:00
senhuang42
dd3dd199bb Get zstd to build with new functions and callsites, fix arguments 2020-10-07 13:56:25 -04:00
senhuang42
766c4a8c28 Implement part of ldm_maybeAddLdm() 2020-10-07 13:56:25 -04:00
senhuang42
84777059d2 Implement ldm_getNextMatch() 2020-10-07 13:56:24 -04:00
senhuang42
28c74bf591 Implement basic splitSequence and skipSequence functions 2020-10-07 13:56:24 -04:00
senhuang42
634ab7830d Flesh out required args for ldm_handleLdm() 2020-10-07 13:56:24 -04:00
senhuang42
db70761032 Add callsites to appropriate locations in ..opt_generic() 2020-10-07 13:56:24 -04:00
senhuang42
aea61e3c91 Add ldm helper function declarations into opt parser 2020-10-07 13:56:24 -04:00
Nick Terrell
f91ed5c766 [lib] s/current/curr because it collides with Linux Kernel macro 2020-09-09 14:35:39 -07:00
Nick Terrell
c465f24457 ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free 2020-08-26 12:26:03 -07:00
Nick Terrell
6d687a8816 [lib] Fix dictionary + repcodes + optimal parser 2020-05-12 10:36:53 -07:00
W. Felix Handte
c7da66c9cf Purge C++-Style Comments (// ...), Make Compilation Succeed Under C90 2020-05-04 10:59:15 -04:00
Nick Terrell
e103d7b4a6
Fix superblock mode (#2100)
Fixes:

* Enable RLE blocks for superblock mode
* Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
* Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to a raw block, which is okay, since that is streamable). If no block has compressed literals, that is okay; we will fix up the next Huffman table.
* Handle the case where the last sub-block is uncompressed (maybe it is very small). Before, it would skip superblock mode in this case; now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
* Respect disableLiteralsCompression in superblock mode
* Fix superblock mode to handle a block consisting of only compressed literals
* Fix an off-by-one error in superblock mode that disabled it whenever there were last literals
* Fix superblock mode with long literals/matches (> 0xFFFF)
* Allow superblock mode to repeat Huffman tables
* Respect ZSTD_minGain().

Tests:

* Simple check for the condition in #2096.
* When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.

Remaining limitations:

* O(targetCBlockSize^2) because we recompute statistics for every sequence
* Unable to split literals of length > targetCBlockSize into multiple sequences
* Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell
81fda0419e [opt] Only update repcodes upon arrival 2020-03-04 17:57:15 -08:00
Nick Terrell
0f9882deb9 [opt] Don't recompute repcodes while emitting sequences 2020-03-04 17:23:00 -08:00
Nick Terrell
c6caa2d04e [opt] Delete ZSTD_litLengthContribution 2020-03-04 16:35:26 -08:00
Nick Terrell
610171ed86 [opt] Explain why we don't include literals price 2020-03-04 16:29:19 -08:00
Nick Terrell
5f49578be7 [opt] Don't recompute initial literals price 2020-03-04 16:27:17 -08:00
Nick Terrell
ddab2a94e8 Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy() 2019-09-20 00:56:20 -07:00
Yann Collet
facbe8b2c2 factored the logic selecting lowest match index
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
98e7c344cd fixed strategies btopt+ 2019-08-02 14:42:53 +02:00