Frank Denis 72064eba23 std/crypto: vectorize BLAKE3
Gives a ~40% speedup on x86_64.

However, the generic code remains faster on aarch64.

This still processes only one block at a time for now.

I'm pretty confident that processing more blocks per round
will eventually give a substantial performance improvement on
all platforms with vector units.
2020-10-25 21:13:14 -04:00