Commit Graph

41 Commits (551240067786409fa7ac6da9ecee0505e3065bc9)

Author SHA1 Message Date
Yann Collet 50b216146f
Merge pull request #1304 from facebook/largeNbDicts
contrib/largeNbDicts
2018-09-06 09:50:56 -07:00
Jennifer Liu 21721b75a3 Change default f to 20 2018-09-04 17:15:14 -07:00
Jennifer Liu 944c9986e0 Update comment on default steps of cover and fastcover 2018-08-30 15:37:29 -07:00
Jennifer Liu 16db0337b1 Always use splitPoint=1.0 for non-optimize cover and fastcover 2018-08-30 14:59:22 -07:00
Yann Collet 6782725155 first sketch for largeNbDicts test program 2018-08-26 19:29:12 -07:00
Jennifer Liu 9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo spliting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCovr

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
Jennifer Liu 612b346ed5 Add explanation for split=100 2018-07-11 15:50:28 -07:00
Jennifer Liu 5021441d86 Change default splitPoint to 100 2018-07-10 11:19:33 -07:00
Jennifer Liu 015a00af0f Change cover_sum back to 2 parameters and fix splitPoint issues 2018-07-06 14:24:18 -07:00
Jennifer Liu 0ef06f2e8a Split samples into train and test sets 2018-06-29 12:33:34 -07:00
Yann Collet 9f8ed23b5b bumped version number to v1.3.4
also added a paragraph on using compression level with training mode
as this is a recurrent question (see for example #1004)
2018-01-27 22:23:26 -08:00
Yann Collet 218e9fe0fc added a test case for dictBuilder failure
cyclic data set makes the entropy stage fails
now, onto a fix for #304 ...
2018-01-11 09:42:38 -08:00
Yann Collet 3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Yann Collet 32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
Nick Terrell 5b7fd7c422 [zdict] Make COVER the default algorithm 2017-06-26 21:09:22 -07:00
Nick Terrell a1280406b0 [libzstd] Allow users to define custom visibility 2017-05-19 18:01:59 -07:00
Nick Terrell 865918dd04 Fix typo in zdict.h 2017-05-02 11:02:37 -07:00
Yann Collet 9da3b215ec Ensure all limits derived from same constants
Now uses ZDICT_DICTSIZE_MIN and ZDICT_CONTENTSIZE_MIN
from zdict.h.

Also : reduced values to 256 and 128 respectively
2017-03-24 15:02:09 -07:00
Nick Terrell 545987996a Fix deprecation warnings for clang with C++14 2017-02-08 17:38:17 -08:00
Nick Terrell 43474313f8 Fix documentation about memory usage 2017-01-27 18:43:05 -08:00
Nick Terrell 2fe9126591 Add multithread support to COVER 2017-01-27 11:56:02 -08:00
Nick Terrell 8d984699db Document memory requirements for COVER algorithm 2017-01-09 18:20:10 -08:00
Nick Terrell 3a1fefcf00 Simplify COVER parameters 2017-01-02 17:51:38 -08:00
Nick Terrell 96b39f65fa Add COVER dictionary builder 2017-01-02 13:22:51 -08:00
Nick Terrell 1b5d4a7d53 ZDICT_finalizeDictionary() flipped comparison 2016-12-22 18:14:57 -08:00
Nick Terrell bcbe77e994 ZDICT_finalizeDictionary() flipped comparison
`ZDICT_finalizeDictionary()` had a flipped comparison.
I also allowed `dictBufferCapacity == dictContentSize`.
It might be the case that the user wants to fill the dictionary
completely up, and then let zstd take exactly the space it needs
for the entropy tables.
2016-12-22 18:01:14 -08:00
Nick Terrell 78a0072d5a Fix failing test due to deprecation warning 2016-12-22 17:36:16 -08:00
Yann Collet d76d1a9ef0 added ZDICT_finalizeDictionary() 2016-12-22 20:18:43 +01:00
Nick Terrell 8de46ab51a Export all API functions 2016-12-16 13:27:30 -08:00
Yann Collet d725427a3c g_time => local displayTime 2016-09-02 15:32:39 -07:00
Yann Collet 4ded9e591c added boilerplate 2016-08-30 11:06:28 -07:00
Yann Collet da3fbcb302 Added ZDICT_getDictID() 2016-08-19 14:23:58 +02:00
Alexander Borzunov 0f6f17a14f Rename ZSTDLIB_API to ZDICTLIB_API in zdict.h 2016-08-18 16:47:06 +05:00
Alexander Borzunov 1f48382b1a Export functions related to dictionary compression from DLL 2016-08-18 16:12:49 +05:00
Yann Collet d469a98c01 Clarified API comments, from suggestions by ‎Bryan O'Sullivan‎ 2016-07-28 04:55:03 +02:00
Yann Collet 4110534886 ZSTD_maxCLevel() is promoted to "stable" API (#254, by @FrancescAlted) 2016-07-27 15:09:11 +02:00
Yann Collet 5e80dd3261 fixed minor coverity warnings 2016-07-13 19:21:57 +02:00
Yann Collet 52a0622beb RepsCodes are saved into Dict
(uncomplete : need decompression to regenerate them)
2016-06-16 01:05:04 +02:00
Yann Collet e69b8ccceb merged `zdict_static.h` into `zdict.h`. Now requires `ZDICT_STATIC_LINKING_ONLY` macro. 2016-06-04 18:56:23 +02:00
Giuseppe Ottaviano 370b751e24 Expose function to add entropy tables to pre-built dictionary.
In some cases a custom dictionary building algorithm tailored for a specific
input can be more effective than the one produced by `ZDICT_trainFromBuffer`,
but with the current API it's not possible encode the entropy tables into the
custom-built dictionary.

This commit extracts the logic to add entropy tables to a dictionary from
`ZDICT_trainFromBuffer` and exposes it as a function
`ZDICT_addEntropyTablesFromBuffer`.
2016-05-30 19:50:09 -07:00
inikep 23a0889301 separation of lib/ into common/, compress/, decompress/, dictBuilder/, legacy/ 2016-04-22 12:43:18 +02:00