Yann Collet
243200e5bf
minor refactor of ZSTD_fast
...
- reduced variable lifetimes
- more accurate code comments
2019-09-17 14:02:57 -07:00
Bimba Shrestha
a874435478
Merge branch 'dev' into extract_sequences_api
2019-09-16 13:29:59 -07:00
Bimba Shrestha
9e7bb55e14
Addressing comments
2019-09-09 20:04:46 -07:00
W. Felix Handte
b511a84adc
Move Workspace Functions to Their Own File
2019-09-09 13:34:08 -04:00
W. Felix Handte
077a2d7dc9
Rename
2019-09-09 13:34:08 -04:00
W. Felix Handte
ebd162194f
Clean Up TODOs and Comments
2019-09-09 13:34:08 -04:00
W. Felix Handte
2abe0145b1
Improve Comments a Bit
2019-09-09 13:34:08 -04:00
W. Felix Handte
75d574368b
When Loading Dict By Copy, Always Put it in the Workspace
2019-09-09 13:34:08 -04:00
W. Felix Handte
e69b67e33a
Alloc Tables Separately
2019-09-09 13:34:08 -04:00
W. Felix Handte
6177354b36
Begin Introducing Phases
2019-09-09 13:34:08 -04:00
W. Felix Handte
786f2266bb
TMP
2019-09-09 13:34:08 -04:00
W. Felix Handte
ccaac852e8
Normalize Case 'workSpace' -> 'workspace'
2019-09-09 13:27:18 -04:00
Varun S Nair
771645471f
Passing ZSTD_CCtx_params by const pointer
2019-09-05 15:28:30 +05:30
Bimba Shrestha
5f8b0f6890
Changing api to get sequences across all blocks
2019-08-30 09:18:44 -07:00
Yann Collet
5198347382
Merge pull request #1744 from bimbashrestha/dev
...
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
mgrice
b830599582
Improvements in zstd decode performance
...
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say, 8) than to copy fewer (say, 3). This change takes that further by exploiting two properties:
1. It's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions and fewer branches.
2. A 16-byte chunk size means that ~90% of wildcopy invocations have a trip count of 1, so branch prediction improves.
Speedup on Xeon E5-2680v4 is in the range of 3-5%.
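A minimal sketch of the chunked-copy idea described above (illustrative only, not the actual zstd wildcopy; it assumes the destination has at least 15 bytes of slack past `length` and that source and destination do not overlap within a chunk):

    #include <stddef.h>
    #include <string.h>

    /* Copy `length` bytes in fixed 16-byte chunks, possibly overwriting up to
     * 15 bytes past the end. Most calls complete in a single iteration. */
    void wildcopy16(void* dst, const void* src, size_t length)
    {
        unsigned char* d = (unsigned char*)dst;
        const unsigned char* s = (const unsigned char*)src;
        unsigned char* const end = d + length;
        do {
            memcpy(d, s, 16);   /* typically lowered to one or two vector moves */
            d += 16;
            s += 16;
        } while (d < end);
    }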
Measured wildcopy length distributions on silesia.tar:
level <=8 <=16 <=24 >24
1 78.05% 11.49% 3.52% 6.94%
3 82.14% 8.99% 2.44% 6.43%
6 85.81% 6.51% 2.92% 4.76%
8 83.02% 7.31% 3.64% 6.03%
10 84.13% 6.67% 3.29% 5.91%
15 77.58% 7.55% 5.21% 9.66%
16 80.07% 7.20% 3.98% 8.75%
Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
bimbashrestha
96201d9774
Added bool to cctx and fixed some comment nits
2019-08-26 15:30:41 -07:00
Nick Magerko
2d39b43906
Use int for srcSizeHint when sensible
2019-08-19 16:49:25 -07:00
Nick Magerko
dffbac5f89
Add --size-hint=# option
2019-08-19 11:38:49 -07:00
Yann Collet
facbe8b2c2
factored the logic selecting lowest match index
...
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
98692c2838
fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3
2019-08-01 15:58:17 +02:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
...
Restructure the source files
2019-07-19 15:29:07 -07:00
Ephraim Park
1dc98de279
Restructure the source files
2019-07-15 17:39:18 -07:00
mgrice
812e8f2a16
perf improvements for zstd decode ( #1668 )
...
* perf improvements for zstd decode
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c that invokes wildcopy handles the latter well, but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied (left-hand side is the optimized version, right-hand side is with the vectorizer on):
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it is turned off via attribute and flag. Neither the attribute nor the command-line flag was entirely successful at turning off vectorization on its own, which is why both are used.
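A hedged sketch of the guard described above (the macro name is illustrative; clang also defines __GNUC__, so it is excluded explicitly, and the makefile additionally passes -fno-tree-vectorize to gcc):

    #include <stddef.h>

    /* Apply GCC's no-tree-vectorize attribute only on real GCC;
     * clang defines __GNUC__ but does not honor this attribute. */
    #if defined(__GNUC__) && !defined(__clang__)
    #  define DONT_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
    #else
    #  define DONT_VECTORIZE
    #endif

    DONT_VECTORIZE
    void byte_copy_loop(unsigned char* dst, const unsigned char* src, size_t length)
    {
        size_t i;
        for (i = 0; i < length; i++)
            dst[i] = src[i];   /* the loop we do not want auto-vectorized */
    }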
silesia benchmark data (the second triad of each file is with the original code; the percentage is the decode-speed improvement of the new code):
file orig compressedratio encode decode change
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
Overall average decode speedup (levels 1-3, silesia): 7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
* Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet
096714d1b8
Merge pull request #1671 from ephiepark/dev
...
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park
9007701670
Adding targetCBlockSize param
2019-07-03 15:41:52 -07:00
Yann Collet
944e2e9e12
benchfn : added macro CONTROL()
...
Like assert(), but cannot be disabled.
Provides a proper separation between user contract errors (CONTROL())
and invariant verification (assert()).
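A minimal sketch of an always-on check in this spirit (the exact definition in benchfn may differ):

    #include <stdio.h>
    #include <stdlib.h>

    /* Like assert(), but never compiled out by NDEBUG:
     * used for user contract violations that must always be caught. */
    #define CONTROL(c)  do {                                              \
            if (!(c)) {                                                   \
                fprintf(stderr, "error: %s:%d: condition `%s` failed\n",  \
                        __FILE__, __LINE__, #c);                          \
                abort();                                                  \
        }   }   while (0)

    int main(void)
    {
        int nbRuns = 5;
        CONTROL(nbRuns > 0);   /* stays active in release builds */
        (void)nbRuns;
        return 0;
    }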
2019-06-21 15:58:55 -07:00
Yann Collet
a968099038
minor code cleaning for new index invalidation strategy
2019-05-31 16:52:37 -07:00
Yann Collet
bc601bdc6d
first implementation of small window size for btopt
...
noticeably improves compression ratio
when window size is small (< 18).
enwik7 level 19
windowLog `dev` `smallwlog` improvement
23 3.577 3.577 0.02%
22 3.536 3.538 0.06%
21 3.462 3.467 0.14%
20 3.364 3.377 0.39%
19 3.244 3.272 0.86%
18 3.110 3.166 1.80%
17 2.843 3.057 7.53%
16 2.724 2.943 8.04%
15 2.594 2.822 8.79%
14 2.456 2.686 9.36%
13 2.312 2.523 9.13%
12 2.162 2.361 9.20%
11 2.003 2.182 8.94%
2019-05-31 15:55:12 -07:00
Yann Collet
9719fd616c
removed nextToUpdate3 from ZSTD_window
...
it's now a local variable of ZSTD_compressBlock_opt()
2019-05-28 16:18:12 -07:00
Yann Collet
327cf6fac1
nextToUpdate3 does not need to be maintained outside of zstd_opt.c
...
It's re-synchronized with nextToUpdate at beginning of each block.
It only needs to be tracked from within zstd_opt block parser.
Made the logic clear, so that no code tries to maintain this variable elsewhere.
An even better solution would be to make nextToUpdate3
an internal variable of ZSTD_compressBlock_opt_generic().
That would make it possible to remove it from ZSTD_matchState_t,
thus restricting its visibility to only where it's actually useful.
This would require deeper changes though,
since the matchState is the natural structure to transport parameters into and inside the parser.
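An illustrative sketch of the pattern being described (simplified names, not the real zstd structures): the per-block parser would own nextToUpdate3 as a local, re-derived from the match state at block start:

    #include <stddef.h>

    typedef unsigned U32;

    /* Simplified stand-in for the match state; only the relevant field. */
    typedef struct {
        U32 nextToUpdate;          /* still tracked across blocks */
    } ExampleMatchState;

    size_t exampleCompressBlock_opt(ExampleMatchState* ms,
                                    const void* src, size_t srcSize)
    {
        U32 nextToUpdate3 = ms->nextToUpdate;   /* re-synchronized per block */
        /* ... the optimal parser would advance nextToUpdate3 here ... */
        (void)src;
        (void)nextToUpdate3;
        return srcSize;
    }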
2019-05-28 15:26:52 -07:00
Yann Collet
6453f8158f
complementary code comments
...
on variables used / impacted during maxDist check
2019-05-28 14:12:16 -07:00
Yann Collet
4baecdf72a
added comments to better understand enforceMaxDist()
2019-05-28 13:15:48 -07:00
Josh Soref
a880ca239b
Spelling ( #1582 )
...
* spelling: accidentally
* spelling: across
* spelling: additionally
* spelling: addresses
* spelling: appropriate
* spelling: assumed
* spelling: available
* spelling: builder
* spelling: capacity
* spelling: compiler
* spelling: compressibility
* spelling: compressor
* spelling: compression
* spelling: contract
* spelling: convenience
* spelling: decompress
* spelling: description
* spelling: deflate
* spelling: deterministically
* spelling: dictionary
* spelling: display
* spelling: eliminate
* spelling: preemptively
* spelling: exclude
* spelling: failure
* spelling: independence
* spelling: independent
* spelling: intentionally
* spelling: matching
* spelling: maximum
* spelling: meaning
* spelling: mishandled
* spelling: memory
* spelling: occasionally
* spelling: occurrence
* spelling: official
* spelling: offsets
* spelling: original
* spelling: output
* spelling: overflow
* spelling: overridden
* spelling: parameter
* spelling: performance
* spelling: probability
* spelling: receives
* spelling: redundant
* spelling: recompression
* spelling: resources
* spelling: sanity
* spelling: segment
* spelling: series
* spelling: specified
* spelling: specify
* spelling: subtracted
* spelling: successful
* spelling: return
* spelling: translation
* spelling: update
* spelling: unrelated
* spelling: useless
* spelling: variables
* spelling: variety
* spelling: verbatim
* spelling: verification
* spelling: visited
* spelling: warming
* spelling: workers
* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
6b053b9f60
[lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
...
* After loading a dictionary, only create the cdict once we've started the
compression job. This allows the user to pass the dictionary before they
set other settings, and is in line with the rest of the API.
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
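A hedged usage sketch of the ordering this change enables (error checks omitted for brevity; the buffers are assumed to be provided by the caller):

    #include <zstd.h>

    size_t compressWithDict(void* dst, size_t dstCapacity,
                            const void* src, size_t srcSize,
                            const void* dictBuf, size_t dictSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        /* Dictionary may now be loaded before other parameters are set;
         * the cdict is only created once compression actually starts. */
        ZSTD_CCtx_loadDictionary(cctx, dictBuf, dictSize);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
        {   size_t const written = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
            ZSTD_freeCCtx(cctx);
            return written;
        }
    }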
2019-03-21 16:13:53 -07:00
Nick Terrell
e55da9e963
Wrap the new advanced api completely
2019-03-21 10:54:40 -07:00
Nick Terrell
3d7377b874
[libzstd] Handle uncompressed literals
2019-02-15 14:58:11 -08:00
Nick Terrell
f9513115e4
[libzstd] Add ZSTD_c_literalCompressionMode flag
...
It controls how literals are compressed. It is either
`auto`, `huffman`, or `uncompressed`, and defaults to
`auto`, which matches the current behavior.
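A hedged usage sketch (at this point the parameter and the ZSTD_lcm_* values sit in the static-linking-only, experimental section of zstd.h, so the exact spelling may vary between versions):

    #define ZSTD_STATIC_LINKING_ONLY   /* experimental parameters */
    #include <zstd.h>

    /* Force uncompressed literals for this compression context. */
    void disableLiteralCompression(ZSTD_CCtx* cctx)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_literalCompressionMode,
                               ZSTD_lcm_uncompressed);
    }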
2019-02-13 14:59:22 -08:00
Yann Collet
e980ba212f
Merge pull request #1471 from facebook/nofloat
...
guard functions using floating point for debug mode only
2018-12-23 12:35:51 -08:00
Yann Collet
c9dfb7e445
guard functions using floating point for debug mode only
...
They are only used to print debug messages.
Requested in #1386.
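A hedged sketch of the pattern (DEBUGLEVEL mirrors the macro used in lib/common/debug.h; the helper name is illustrative):

    #include <stdio.h>

    #ifndef DEBUGLEVEL
    #  define DEBUGLEVEL 0
    #endif

    /* Floating point is only needed to print a debug trace,
     * so the helper exists only in debug builds. */
    #if DEBUGLEVEL >= 2
    static void printCompressionRatio(size_t srcSize, size_t dstSize)
    {
        fprintf(stderr, "ratio: %.3f \n",
                (double)srcSize / (double)(dstSize ? dstSize : 1));
    }
    #else
    #  define printCompressionRatio(srcSize, dstSize)  ((void)0)
    #endif

    void exampleEndOfJob(size_t srcSize, size_t dstSize)
    {
        printCompressionRatio(srcSize, dstSize);   /* no-op in release builds */
    }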
2018-12-22 09:09:40 -08:00
Yann Collet
ededcfca57
fix confusion between unsigned <-> U32
...
As suggested in #1441.
Generally U32 and unsigned are the same thing,
except when they are not...
Case in point: 32-bit compilation for MIPS (uint32_t == unsigned long).
The vast majority of transformations consist of changing U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).
Among the issues this patch solves:
- some parameters were declared with type `unsigned` in *.h,
but with type `U32` in their implementation *.c;
- some parameters have type unsigned*,
but the caller used a pointer to U32 instead.
These fixes are useful.
However, the bulk of the changes is about %u formatting,
which requires an unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.
As a consequence, the amount of code changed is larger than I would expect for such a patch.
Testing is also a pain:
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.
So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.
I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributors will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.
Thoughts ?
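An illustrative example of the class of mismatch being cleaned up (hypothetical function name; on most platforms the cast is a no-op, but it matters where uint32_t is unsigned long):

    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t U32;

    /* Header and implementation must agree on `unsigned`, not `U32`. */
    unsigned example_getWindowLog(void);
    unsigned example_getWindowLog(void) { return 27; }

    int main(void)
    {
        U32 const wlog = example_getWindowLog();
        /* %u expects unsigned; cast when passing a U32, since the two
         * types are not guaranteed to be identical on every platform. */
        printf("windowLog = %u \n", (unsigned)wlog);
        return 0;
    }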
2018-12-21 18:09:41 -08:00
Yann Collet
95784c654c
fixed shadowing of stat variable
...
Some standard libraries declare `stat` at global scope,
so local variables named `stat` trigger shadowing warnings.
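An illustrative sketch of the kind of fix involved (hypothetical helper; the local is simply renamed so it no longer shadows the global `stat` from <sys/stat.h>):

    #include <sys/stat.h>

    /* Returns the file size, or -1 on error. */
    long exampleFileSize(const char* filename)
    {
        struct stat statbuf;   /* renamed from `stat` to avoid -Wshadow */
        if (stat(filename, &statbuf) != 0) return -1;
        return (long)statbuf.st_size;
    }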
2018-12-20 14:56:44 -08:00
Yann Collet
8e0e495ce8
fixed: compression ratio discrepancy
...
depending on initialization,
the first byte of a new frame was invalidated or not.
As a consequence, one match opportunity was available or not,
resulting in slightly different compressed sizes
(on average, 1 or 2 bytes once every 20 frames).
It impacted ratio comparison between one-shot and streaming modes.
This fix makes the first byte of a new frame always a valid match.
Now compressed size is always the same.
It also improves compressed size by a negligible amount.
2018-12-19 10:11:06 -08:00
Yann Collet
eee789b7ea
continued: changed to overlapLog
...
in deeper code layers,
for consistency.
2018-12-11 17:41:42 -08:00
Yann Collet
41c7d0b1e1
changed hashEveryLog into hashRateLog
2018-11-21 14:36:57 -08:00
Nick Terrell
b9693d3a49
[lib] Add rsyncable mode
...
- Add rsyncable mode to multithreaded mode
- Factor out LDM's hash function for reuse
2018-11-14 16:59:57 -08:00
W. Felix Handte
4127de5fa6
Switch Enum to Only Non-Negative Values, Update Comments
2018-11-12 12:47:47 -08:00
Yann Collet
8d56f4baee
added a few comments for clarifications
2018-10-26 15:21:52 -07:00
W. Felix Handte
6cb2454646
Remove CParams from Block Compressor Functions' Args
2018-09-28 17:10:42 -07:00
W. Felix Handte
76ef87ed9d
Add ZSTD_compressionParameters to ZSTD_matchState_t
2018-09-28 17:10:42 -07:00