108 Commits (b660ef87f85646891211ce162c0dd1b38eba366e)

Author SHA1 Message Date
Vincent Torri 6b5c10b48c shared library: rename import library with .dll.a extension
Most open source projects use this extension for the import library.
The Win32 linker supports this extension; see
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/win32.html
section "direct linking to a dll"
2019-11-15 19:46:06 +01:00
W. Felix Handte 74bd76c3ff In pkg-config File, Derive Lib and Include Dir from Prefix at Use-Time
Addresses #1794. Instead of deriving the lib dir and include dir at
build-time, let's do it like everyone else does at pkg-config run-time.

This has the disadvantage that we can no longer override LIBDIR and
INCLUDEDIR in the Makefile and have that reflected in the .pc file.
2019-10-25 15:07:31 -04:00
mgrice 812e8f2a16 perf improvements for zstd decode (#1668)
* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc, I noticed that even though gcc vectorizes the loop in wildcopy, the result is not as good as what can be done by hand.  The sites where wildcopy is invoked have an interesting distribution of lengths to be copied.  The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.  The code in zstd_decompress.c that invokes wildcopy handles the latter well, but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off via attribute and flag.  I found that neither the attribute nor the command-line flag alone was entirely successful in turning off vectorization, which is why both are used.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressed(ratio)   encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s          1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s          1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s          1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s          1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s          1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s          1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s           662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s           592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s           549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s           624.2MB/s     6.15%
    2#ooffice    6152192->   3323931(1.851),         155MB/s           564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s           521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s          1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s          1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s          1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s          1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s          1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s            1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase
2019-07-11 18:31:07 -04:00
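
The hand-written copy loop and the vectorizer opt-out described in commit 812e8f2a16 could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the commit: the names `NO_TREE_VECTORIZE`, `WILDCOPY_STEP` and `wildcopy_sketch` are hypothetical, and the 16-byte chunk size is assumed. The attribute is applied only for gcc, since clang also defines `__GNUC__`.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: only real gcc gets the attribute; clang also defines
 * __GNUC__, so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define NO_TREE_VECTORIZE
#endif

#define WILDCOPY_STEP 16  /* hypothetical chunk size */

/* Copies `length` bytes in fixed-size chunks. It always copies at least one
 * chunk and may read and write up to WILDCOPY_STEP - 1 bytes past
 * src + length and dst + length (a full chunk when length is 0), so the
 * caller must provide that much slack. src and dst must not overlap. */
static NO_TREE_VECTORIZE void wildcopy_sketch(void *dst, const void *src, size_t length)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *const end = d + length;

    /* do/while: the first chunk is copied unconditionally, which suits a
     * loop whose trip count is rarely above 1. */
    do {
        memcpy(d, s, WILDCOPY_STEP);  /* fixed-size copy */
        d += WILDCOPY_STEP;
        s += WILDCOPY_STEP;
    } while (d < end);
}
```

Because the chunk size is a compile-time constant, a compiler can typically lower each `memcpy` to a pair of wide loads and stores without any startup check. The per-function attribute would normally be paired with `-fno-tree-vectorize` passed to gcc only, as the later bullets in the commit message describe.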
Nick Terrell 641e594309 [libzstd] Remove ZSTDMT from the shared object
* Remove ZSTDMT from the shared object by default.
* Provide a macro `ZSTD_LEGACY_MULTITHREADED_API` to override it.
* Document it in `lib/README.md`.
2019-04-07 18:47:52 -07:00
Nick Terrell 9f9630f455 [Windows] Don't use a .def file 2019-02-19 16:52:38 -08:00
Peter (Stig) Edwards 894bbda44c
-Wformat-security not needed with -Wformat=2 2019-02-01 09:31:02 +00:00
W. Felix Handte bd4afc389f Add Logic to Makefile to Convert Make Vars to Defines 2018-12-18 13:36:39 -08:00
Yann Collet 7ef7dc561a check availability of --color=never command on grep and egrep
before applying them.
Fixes #1436
2018-12-03 15:46:55 -08:00
Yann Collet fc20b3c441 added flag -Wc++-compat
for library and cli
2018-10-26 16:38:23 -07:00
Yann Collet bc93b801f0
Merge pull request #1330 from korli/haiku
Enable building zstd on Haiku.
2018-10-03 13:36:00 -07:00
Jerome Duval 87c10e2f58 Enable building zstd on Haiku. 2018-10-03 09:51:56 +02:00
Nick Terrell f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
Yann Collet e74eade251
Merge pull request #1339 from facebook/grep_colors
fixed usage of grep in Makefile
2018-09-26 14:39:20 -07:00
Yann Collet 8ff17a6a09
Merge pull request #1329 from facebook/v04isout
Changed default legacy support to v0.5+
2018-09-26 13:39:05 -07:00
Yann Collet 08f68d83c5 fixed usage of grep in Makefile
when terminal uses colors
as suggested by @danielshir (#1294)
2018-09-25 16:56:53 -07:00
Yann Collet 71a5210617 avoid recompiling dll every time under mingw 2018-09-21 17:40:30 -07:00
Yann Collet b2939163e1 Changed default legacy support to v0.5+
thus dropping read support for v0.4.

It's always possible to re-enable it, by changing build macro ZSTD_LEGACY_SUPPORT to 4.
2018-09-20 14:30:20 -07:00
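
As a rough illustration of what "lowest supported version" means for commit b2939163e1, the sketch below gates a version check on `ZSTD_LEGACY_SUPPORT`. Only the macro name and the default of 5 come from the commit message. The helper `legacy_version_supported` is hypothetical, and the sketch only models positive values of the macro.

```c
/* Default after this change: read support starts at v0.5.
 * Building with -DZSTD_LEGACY_SUPPORT=4 would re-enable v0.4. */
#ifndef ZSTD_LEGACY_SUPPORT
#  define ZSTD_LEGACY_SUPPORT 5
#endif

/* Hypothetical helper: returns 1 if a legacy frame of format version
 * 0.<minor> is at or above the configured lower bound, 0 otherwise. */
static int legacy_version_supported(int minor)
{
    return minor >= ZSTD_LEGACY_SUPPORT;
}
```

With the default setting, `legacy_version_supported(4)` is 0 and `legacy_version_supported(5)` is 1.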
Yann Collet 6782725155 first sketch for largeNbDicts test program 2018-08-26 19:29:12 -07:00
cyan4973 3f535007e4 fix %zu support under minGW
and relevant test on Appveyor
2018-07-30 16:56:18 +02:00
Ryan Schmidt b567ce9d68 Fix name of macOS 2018-06-09 14:31:17 -05:00
George Lu b3ef314830 Fix Typos 2018-06-04 17:19:06 -07:00
George Lu 609d72b0ca Added Deprecated Dependencies 2018-06-04 14:33:21 -07:00
George Lu 9437021d2f Remove old file declaration 2018-06-04 13:32:41 -07:00
George Lu 65de25a463 Created Macros 2018-06-04 09:56:29 -07:00
Yann Collet 3193d692c2 minor patch, ensuring LIBDIR is created before installation
follow-up from #1123
2018-05-11 11:31:48 -07:00
Baruch Siach 9a0643b633 lib/Makefile: create include directory before headers installation
Make sure that $(INCLUDEDIR) exists before copying the headers there.
Otherwise, the content of header files is copied over
$(DESTDIR)$(INCLUDEDIR), making it a regular file.

While at it, remove $(DESTDIR)$(INCLUDEDIR) from the list of directories
to create in the install-pc target. The install-pc target does not need
this directory.
2018-05-08 20:59:44 +03:00
Peter Seiderer 64bfdca5b9 Split library install target into pc, static, shared and include only target
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
2018-04-30 20:32:32 +02:00
Björn Ketelaars 9d3048346d Fix building zstd on OpenBSD. 2018-03-31 10:46:20 +02:00
Yann Collet 0d6ecc72a3 makes it possible to compile libzstd in single-thread mode without zstdmt_compress.c (#819) 2017-09-11 14:09:34 -07:00
Yann Collet 3a12531a3d lib/Makefile : better support for GNU conventions
see https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html
2017-09-06 16:35:49 -07:00
Yann Collet b0cb081dc8 last batch of header files changed to reflect new license (#825)
only remains to update contrib/linux-kernel (@terrelln)
2017-08-31 12:20:50 -07:00
Bernhard M. Wiedemann cf689b84f9 Sort input file list
in order to make builds reproducible
in spite of non-deterministic filesystem readdir order.
See https://reproducible-builds.org/ for why this is good.
2017-08-26 17:08:00 +02:00
Yuri 92bafda406 INSTALL_DATA instead of INSTALL_LIB for libzstd.a
INSTALL_LIB can be passed -s flag to strip symbols. Static libraries should not be stripped, only dynamic ones should be stripped.
2017-06-17 00:23:41 -07:00
Dmitry V. Levin 1ea655c765 Fix typo in libzstd.a-mt make rules
The macro name is ZSTD_MULTITHREAD, not ZSTD_MULTHREAD.

Fixes: ca6fae7808 ("Add MT enabled targets for libzstd")
2017-05-25 23:43:05 +00:00
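
A typo like the one fixed in commit 1ea655c765 is easy to miss because the preprocessor never complains about a macro it does not know. The sketch below simulates the misspelled define in the source itself (in the real build it would arrive as a `-D` flag from the Makefile). The function `multithread_enabled` is hypothetical.

```c
/* Simulates the misspelled flag: ZSTD_MULTHREAD instead of ZSTD_MULTITHREAD. */
#define ZSTD_MULTHREAD 1

/* Hypothetical probe: reports whether the multithreaded code path
 * would be compiled in. */
static int multithread_enabled(void)
{
#ifdef ZSTD_MULTITHREAD
    return 1;
#else
    return 0;  /* taken here: the misspelled macro never matches, with no diagnostic */
#endif
}
```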
Yann Collet 2d4d31c18a removed gcc compilation flag -Wbad-function-cast
It makes it more difficult to directly cast the result of a function,
requiring the result to be stored in an intermediate variable.
It does not necessarily help readability,
and this restriction can be difficult to overcome in some constructions,
like some macros.

also : fixed minor Visual conversion warnings in datagencli.c
2017-05-16 11:34:38 -07:00
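
The restriction described in commit 2d4d31c18a can be illustrated with a small sketch. All function names here are hypothetical. With `-Wbad-function-cast`, gcc is expected to warn on the direct cast in `average_direct`, because a call returning a floating-point type is cast to an integer type. The version with an intermediate variable avoids the warning.

```c
/* Hypothetical helper returning a double. */
static double average(int a, int b)
{
    return (a + b) / 2.0;
}

/* Direct cast of a call result: the pattern -Wbad-function-cast flags. */
static int average_direct(int a, int b)
{
    return (int)average(a, b);
}

/* Workaround the flag demands: store the result first, then cast it. */
static int average_via_temp(int a, int b)
{
    double const r = average(a, b);
    return (int)r;
}
```

Both functions return the same value. The second form only adds a line, but inside a macro that must expand to a single expression there is no place to declare the temporary, which is the difficulty the commit message mentions.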
Yann Collet 83d0c764dc added several compilation flags 2017-05-15 17:15:46 -07:00
Yann Collet a00e9599f1 removed -g from DEBUGFLAGS
It inflates binary sizes, which is negative for the Windows build.
It also makes it impossible to check if 2 different source codes
get nonetheless compiled to the same binary,
since checksum will be different, due to integrated source code.
2017-05-04 17:24:29 -07:00
Sean Purcell ca6fae7808 Add MT enabled targets for libzstd 2017-04-18 14:13:01 -07:00
Sean Purcell 120df494e9 Update builds to not support legacy v01-v03 2017-03-13 14:44:08 -07:00
Sean Purcell 334cb34edb ZSTD_LEGACY_SUPPORT defines lowest supported version 2017-03-13 14:32:30 -07:00
Yann Collet 8b1d004031 added -Wformat-security flag, as recommended by @pixelb 2017-03-05 21:17:32 -08:00
Yann Collet b54e235bf3 fixed Mac OS-X specific directory in $(RM) list
these directories are now removed with -r command
2017-02-05 10:22:58 -08:00
Yann Collet c2a4632789 release builds use less debug symbols and warnings
release builds are triggered through either `make`,
or their specific target `make zstd-release` and `make lib-release`.
2017-02-02 20:54:41 -08:00
Yann Collet d7e3cb58c5 Resolved merge conflict dev+zstdmt 2017-01-20 16:44:50 -08:00
Przemyslaw Skibinski d72f4b6b7a added "Makefile is validated" 2017-01-17 12:40:06 +01:00
Yann Collet 6334b04d61 compile object files, for faster recompilation 2017-01-02 03:22:18 +01:00
Przemyslaw Skibinski 75f3a3a335 changed default PREFIX and MANDIR 2016-12-28 12:32:41 +01:00
Przemyslaw Skibinski 63b0014b96 BSD: improved "make install" 2016-12-23 10:05:49 +01:00
Przemyslaw Skibinski b999170311 Solaris: working "make -C lib install" 2016-12-22 20:14:37 +01:00
Yann Collet 383b8088a3 minor lib build refactoring 2016-12-08 18:42:27 -08:00