Commit Graph

223 Commits (a34a39c1838552027342e859a5a3917db07ef700)

Author SHA1 Message Date
Yann Collet e6fa70a0a1 reorganized ZSTD_resetCCtx_internal()
clearer separation between variables and buffers
clearer buffers category
kept static buffers at the beginning, favoring cache locality
(it will be easier to add FSE tables there later)

This break a few assumptions that hashTable was always at the beginning.
This is fixed.
And remaining assumptions (namely that tables stand next to each other in memory)
are now tested with assert.
2017-04-20 17:28:31 -07:00
Yann Collet 4f818182b8 clarified frame parameters for ZSTD_compress*_usingCDict()
created ZSTD_compressBegin_usingCDict_internal(),
which gives direct control to frame Parameters.
ZSTD_resetCStream_internal() now points into it.
2017-04-17 18:29:06 -07:00
Yann Collet c47c68f6ca proper evaluation of Huffman CTable size 2017-04-17 16:14:21 -07:00
Yann Collet 20d5e03893 content size is controlled at bufferless level
so it's active for all entry points

Also : added relevant test (wrong content size) in fuzzer
2017-04-11 18:34:02 -07:00
Nick Terrell 16a739cab0 Switch call of FSE_count() to FSE_count_wksp() 2017-04-04 16:17:21 -07:00
Yann Collet 274f59919d Changed memory strategy to __packed for gcc
Method 1 __packed is always as good or better than memcpy().
But it's not portable, as it depends on compiler extension.

For gcc, __pakced directive works fine.
Furthermore, gcc has serious performance issues with memcpy() on ARM 32 bits.
See #620
2017-03-30 12:52:14 -07:00
Nick Terrell 5152fb2cb2 Convert all tabs to spaces 2017-03-29 18:51:58 -07:00
Yann Collet f332ece468 dictBuilder fails to create dictionary on certain input
Properly expressed with an error code (see zstd_errors.h)
and a cli return code != 0
2017-03-23 16:24:02 -07:00
Sean Purcell 8fe5c6862c Fix undefined behaviour in decompressor 2017-03-10 10:17:42 -08:00
Yann Collet 1f2c95c5f3 minor code refactor in HUF module 2017-03-05 21:07:20 -08:00
Nick Terrell 54c4babd8f Always check Huffman tables for ZSTD_lazy+
The compressor always reuses the existing Huffman table if the literals
size is at most 1 KiB. If the compression strategy is `ZSTD_lazy` or
stronger always check to see if reusing the previous table or creating
a new table is better.

This doesn't yet weigh in decompression speed. I don't want to add any
heuristics there until I have real data to work with to ensure that the
heuristic works for at least one use case, preferably more.
2017-03-03 16:49:38 -08:00
Yann Collet f44b55c18d Merge pull request #584 from terrelln/huff-repeat
Allow compressor to repeat Huffman tables
2017-03-02 17:20:11 -08:00
Nick Terrell d051cd5b43 Use workspace for count and CTable 2017-03-02 16:38:07 -08:00
Sean Purcell 3d95925a59 Merge remote-tracking branch 'origin/dev' into m32 2017-03-02 15:17:56 -08:00
Nick Terrell a419777eb1 Allow compressor to repeat Huffman tables
* Compressor saves most recently used Huffman table and reuses it
  if it produces better results.
* I attempted to preserve CPU usage profile.
  I intentionally left all of the existing heuristics in place.
  There is only a speed difference on the second block and later.
  When compressing large enough blocks (say >= 4 KiB) there is
  no significant difference in compression speed.
  Dictionary compression of one block is the same speed for blocks
  with literals <= 1 KiB, and after that the difference is not
  very significant.
* In the synthetic data, with blocks 10 KB or smaller, most blocks
  can't use repeated tables because the previous block did not
  contain a symbol that the current block contains.
  Once blocks are about 12 KB or more, most previous blocks have
  valid Huffman tables for the current block, and the compression
  ratio and decompression speed jumped.
* In silesia blocks as small as 4KB can frequently reuse the
  previous Huffman table (85%), but it isn't as profitable, and
  the previous Huffman table only gets used about 3% of the time.
* Microbenchmarks show that `HUF_validateCTable()` takes ~55 ns
  and `HUF_estimateCompressedSize()` takes ~35 ns.
  They are decently well optimized, the first versions took 90 ns
  and 120 ns respectively. `HUF_validateCTable()` could be twice as
  fast, if we cast the `HUF_CElt*` to a `U32*` and compare to 0.
  However, `U32` has an alignment of 4 instead of 2, so I think that
  might be undefined behavior.
* I've ran `zstreamtest` compiled normally, with UASAN and with MSAN
  for 4 hours each.

The worst case for the speed difference is a bunch of small blocks
in the same frame. I modified `bench.c` to compress the input in a
single frame but with blocks of the given block size, set by `-B`.
Benchmarks on level 1:

|  Program  | Block size |   Corpus  | Ratio | Compression MB/s | Decompression MB/s |
|-----------|------------|-----------|-------|------------------|--------------------|
| zstd.base |        256 | synthetic | 2.364 |            110.0 |              297.0 |
|      zstd |        256 | synthetic | 2.367 |            108.9 |              297.0 |
| zstd.base |        256 | silesia   | 2.204 |             93.8 |              415.7 |
|      zstd |        256 | silesia   | 2.204 |             93.4 |              415.7 |
| zstd.base |        512 | synthetic | 2.594 |            144.2 |              420.0 |
|      zstd |        512 | synthetic | 2.599 |            141.5 |              425.7 |
| zstd.base |        512 | silesia   | 2.358 |            118.4 |              432.6 |
|      zstd |        512 | silesia   | 2.358 |            119.8 |              432.6 |
| zstd.base |       1024 | synthetic | 2.790 |            192.3 |              594.1 |
|      zstd |       1024 | synthetic | 2.794 |            192.3 |              600.0 |
| zstd.base |       1024 | silesia   | 2.524 |            148.2 |              464.2 |
|      zstd |       1024 | silesia   | 2.525 |            148.2 |              467.6 |
| zstd.base |       4096 | synthetic | 3.023 |            300.0 |             1000.0 |
|      zstd |       4096 | synthetic | 3.024 |            300.0 |             1010.1 |
| zstd.base |       4096 | silesia   | 2.779 |            223.1 |              623.5 |
|      zstd |       4096 | silesia   | 2.779 |            223.1 |              636.0 |
| zstd.base |      16384 | synthetic | 3.131 |            350.0 |             1150.1 |
|      zstd |      16384 | synthetic | 3.152 |            350.0 |             1630.3 |
| zstd.base |      16384 | silesia   | 2.871 |            296.5 |              883.3 |
|      zstd |      16384 | silesia   | 2.872 |            294.4 |              898.3 |
2017-03-02 13:27:52 -08:00
Sean Purcell d44703d145 Offsets >= 32MB in 32-bits mode 2017-03-01 16:27:56 -08:00
Yann Collet 76f0494089 xxhash can be included twice in any order
Previously,

followed by :

would fail to include the static definitions,
because the second include was simply skipped by guard macro.

Now it works as intended :
the missing static part is included during the second include.
2017-03-01 13:29:29 -08:00
Yann Collet 4bcc69b761 solves warnings when compiling with global XXH_STATIC_LINKING_ONLY
XXH_STATIC_LINKING_ONLY protection macro is intended to be triggered just before the include.
The main idea is to keep this setting local :
user module shall explicitly understand and accept the static linking restriction
which becomes transparent when triggering the macro at project level.
Global definition also triggers redefinition warnings for user modules which do locally define the macro.

This new version compiles lib and cli without warning when the macro is set globally.
That's not a scenario to be recommended, since it trades a local effect for a global one,
but it was easy enough to provide from zstd side.
2017-03-01 11:33:25 -08:00
Yann Collet 0b9b894b2d reduced ZSTD_DDict memory usage
saved 128 KB
2017-02-27 00:27:30 -08:00
Anders Oleson 517577bf53 spelling fixes in comments
i.e. occurred labeled Huffman
2017-02-20 12:08:59 -08:00
Yann Collet 2252d29a5a Merge branch 'dev' of github.com:facebook/zstd into dev 2017-02-15 12:00:50 -08:00
Yann Collet 4596037042 updated fse version
feature minor refactoring (removing FSE_abs())
also : fix a few minor issues recently introduced in examples
2017-02-15 12:00:03 -08:00
Yann Collet f0b9a8dddb Merge pull request #547 from inikep/dev11
Avoid fseek()'s 2GiB barrier with MacOS and *BSD
2017-02-14 12:29:00 -08:00
ds77 08e6a88a97 avoid empty translation unit warning without #pragma 2017-02-14 00:46:47 +01:00
Przemyslaw Skibinski 09c8e5390d __builtin_bswap requires gcc 4.3+ 2017-02-13 12:45:53 +01:00
Sean Purcell e0b3265e87 Fix ZSTD_getErrorString and add tests 2017-02-08 17:28:49 -08:00
Yann Collet cc3d1bc262 Merge pull request #525 from terrelln/covermt
Multithreaded COVER dictionary training
2017-01-30 10:15:33 -08:00
Nick Terrell b42dd27ef5 Add include guards and extern C 2017-01-27 16:00:19 -08:00
Nick Terrell e628eaf87a Fix pool.c threading.h import 2017-01-26 15:29:10 -08:00
cyan4973 2e3b659ae1 fixed minor warnings (Visual, conversion, doxygen) 2017-01-20 14:43:09 -08:00
cyan4973 5fba09fa41 updated util's time for Windows compatibility
Correctly measures time on Posix systems when running with
Multi-threading

Todo : check Windows measurement under multi-threading
2017-01-20 12:57:31 -08:00
Yann Collet 0f984d94c4 changed MT enabling macro to ZSTD_MULTITHREAD 2017-01-19 14:05:07 -08:00
Yann Collet 32dfae6f98 fixed Multi-threaded compression
MT compression generates a single frame.
Multi-threading operates by breaking the frames into independent sections.
But from a decoder perspective, there is no difference :
it's just a suite of blocks.

Problem is, decoder preserves repCodes from previous block to start decoding next block.
This is also valid between sections, since they are no different than changing block.

Previous version would incorrectly initialize repcodes to their default value at the beginning of each section.
When using them, there was a mismatch between encoder (default values) and decoder (values from previous block).

This change ensures that repcodes won't be used at the beginning of a new section.
It works by setting them to 0.
This only works with regular (single segment) variants : extDict variants will fail !
Fortunately, sections beyond the 1st one belong to this category.

To be checked : btopt strategy.
This change was only validated from fast to btlazy2 strategies.
2017-01-19 10:32:55 -08:00
Yann Collet f1cb55192c fixed linux warnings 2017-01-02 01:11:55 +01:00
Nick Terrell bb13387d7d Fix pool for threading.h 2016-12-31 19:10:47 -05:00
Nick Terrell 4204e03e77 Add threading.h condition variables 2016-12-31 19:10:29 -05:00
Yann Collet 3b9d434356 extended ZSTDMT code support for non-MT systems and WIN32 (preliminary) 2016-12-31 16:32:19 +01:00
Yann Collet 3b29dbd9e8 new zstdmt version using generic treadpool 2016-12-31 06:04:25 +01:00
Yann Collet c6a6417458 bench correctly measures time for multi-threaded compression (posix only) 2016-12-31 03:31:26 +01:00
Nick Terrell e777a5be6b Add a thread pool for ZSTDMT and COVER 2016-12-29 23:39:44 -08:00
Yann Collet 0819abe3c1 added ZSTD_createDDict_byReference() body 2016-12-21 19:25:15 +01:00
Nick Terrell 8de46ab51a Export all API functions 2016-12-16 13:27:30 -08:00
Yann Collet 5397a66b19 minor BMI version check 2016-12-13 15:21:06 +01:00
Nick Terrell 064a143520 Fix execSequence wildcopy undefined behavior
execSequence relied on pointer overflow to handle cases where
`sequence.matchLength < 8`.  Instead of passing an `size_t` to
wildcopy, pass a `ptrdiff_t`.
2016-12-12 19:01:23 -08:00
Yann Collet 825dffbc43 moved zbuff source files into lib/deprecated 2016-12-05 19:28:19 -08:00
Przemyslaw Skibinski 821bf1febc fixed Doxygen trailing comment 2016-12-02 16:13:41 +01:00
Yann Collet b89af20353 reduced table sizes for HUF_readDTableX4 2016-12-01 18:24:59 -08:00
Yann Collet a0d742b1e4 introduced HUF_buildCTable_wksp(), to reduce stack memory usage 2016-12-01 17:47:30 -08:00
Yann Collet e928f7e16d introduced ext_wksp variants of count to reduce stack memory usage 2016-12-01 16:13:35 -08:00
Yann Collet 5e00b848a8 FSE_compress_wksp() uses less stack space 2016-11-30 16:46:13 -08:00