Gives a ~40% speedup on x86_64.
However, the generic code remains faster on aarch64.
The implementation still processes only one block at a time for now.
I'm pretty confident that processing more blocks per round
will eventually give a substantial performance improvement on
all platforms with vector units.
The bcrypt function intentionally requires quite a lot of CPU cycles
to complete.
In addition to that, performance drops massively whenever its full
state doesn't stay in the CPU's L1 cache.
These properties slow down brute-force attacks against low-entropy
inputs (typically passwords), and GPU-based attacks gain little
to no advantage over CPUs.
The NaCl constructions are available in pretty much all programming
languages, making them a solid choice for applications that require
interoperability.
Go ships them in the golang.org/x/crypto packages, JavaScript has the popular
tweetnacl.js module, and reimplementations and ports of TweetNaCl
have been made everywhere.
Zig has almost everything that NaCl has at this point, the main
missing component being the Salsa20 cipher, on top of which NaCl's
secretboxes, boxes, and sealedboxes can be implemented.
So, here they are!
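A minimal sketch of what this enables; the `std.crypto.nacl.SecretBox`
namespace and the exact signatures below are assumptions, not a copy of
the actual API:

```zig
const std = @import("std");

// Assumed namespace for the new NaCl-compatible constructions.
const SecretBox = std.crypto.nacl.SecretBox;

test "secretbox round trip" {
    const key = [_]u8{0x42} ** SecretBox.key_length;
    const nonce = [_]u8{0x24} ** SecretBox.nonce_length;
    const msg = "hello, nacl";

    // The ciphertext includes the authentication tag.
    var c: [SecretBox.tag_length + msg.len]u8 = undefined;
    SecretBox.seal(&c, msg, nonce, key);

    var m: [msg.len]u8 = undefined;
    try SecretBox.open(&m, &c, nonce, key);
    try std.testing.expectEqualSlices(u8, msg, &m);
}
```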
Also, clean up the X25519 API a little bit along the way.
- Use `PascalCase` for all types. So, `AES256GCM` is now `Aes256Gcm`.
- Consistently use `_length` instead of mixing `_size` and `_length` for the
constants we expose.
- Use `minimum_key_length` when it represents an actual minimum length.
Otherwise, use `key_length`.
- Require output buffers (for ciphertexts, MACs, hashes) to be exactly the
expected size, instead of requiring at least that size in some functions and
the exact size in others.
- Use a `_bits` suffix instead of `_length` when a size is represented as a
number of bits to avoid confusion.
- Functions returning a constant-sized slice now declare that slice type
directly instead of taking a pointer plus a runtime assertion. This is the
case for most hash functions.
- Use `camelCase` for all functions instead of `snake_case`.
No functional changes, but these are breaking API changes.
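A hypothetical snippet applying these conventions (illustrative only; the
declarations are not copied from the actual API):

```zig
const std = @import("std");

// `Aes256Gcm` is PascalCase; constants use a `_length` suffix and are
// expressed in bytes; functions are camelCase.
const Aes256Gcm = std.crypto.aead.aes_gcm.Aes256Gcm;

comptime {
    std.debug.assert(Aes256Gcm.key_length == 32);
    std.debug.assert(Aes256Gcm.tag_length == 16);
}
```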
`DefaultCsprng` is documented as a cryptographically secure RNG.
While `ISAAC` is a CSPRNG, the variant we have, `ISAAC64`, is not.
A 64-bit seed is too small to satisfy that claim anyway.
We also saw it being used with the current date as a seed, which
also defeats the point of a CSPRNG.
Set `DefaultCsprng` to `Gimli` instead of `ISAAC64`, rename
the parameter from `init_s` to `secret_seed`, and add a comment to
clarify what kind of seed is expected here.
Instead of directly touching the internals of the Gimli implementation
(which can change or be architecture-specific), add an `init()` function
to the state.
Our Gimli-based CSPRNG was also not backtracking resistant. Gimli
is a permutation; it can be reverted. So, if the state was ever leaked,
not only future secrets but also all the previously generated ones could be
recovered. Clear the rate after a squeeze in order to prevent this.
Finally, a dumb test was added just to exercise `DefaultCsprng` since
we don't use it anywhere.
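A rough sketch of the intended usage and the fixes above; the paths,
constant names and `getrandom` location are assumptions and have moved
between Zig versions:

```zig
const std = @import("std");

test "DefaultCsprng seeding" {
    // The seed must be a high-entropy secret (e.g. from the OS RNG);
    // seeding with the current date would defeat the point of a CSPRNG.
    var secret_seed: [std.rand.DefaultCsprng.secret_seed_length]u8 = undefined;
    try std.os.getrandom(&secret_seed);

    var csprng = std.rand.DefaultCsprng.init(secret_seed);

    // After each squeeze, the rate part of the Gimli state is cleared,
    // so a leaked state can't be run backwards to recover past outputs.
    var buf: [32]u8 = undefined;
    csprng.random().bytes(&buf);
}
```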
HMAC is a generic construction, so we allow it to be instantiated
with any hash function.
In practice, HMAC is almost exclusively used with MD5, SHA1 and SHA2,
so it makes sense to define some shortcuts for them.
However, defining `HmacBlake2s256` is a bit weird (and why
specifically that one, and not other hash functions we also support?).
There would be nothing wrong with that construction, but it's not
used in any standard protocol and would be a curious choice.
BLAKE2 is a keyed hash function, so it doesn't need HMAC to be used as
a MAC; that also makes it a poor example of a possible hash
function for HMAC.
This commit doesn't remove the ability to use a `Hmac(Blake2s256)` type
if, for some reason, applications really need it, but it removes
`HmacBlake2s256` as a predefined constant.
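Instantiating it manually remains a one-liner; the module paths below are
assumptions:

```zig
const std = @import("std");
const Hmac = std.crypto.auth.hmac.Hmac;
const Blake2s256 = std.crypto.hash.blake2.Blake2s256;

// Still possible, just no longer pre-defined as a shortcut:
const HmacBlake2s256 = Hmac(Blake2s256);

test "manual HMAC instantiation" {
    var mac: [HmacBlake2s256.mac_length]u8 = undefined;
    HmacBlake2s256.create(&mac, "message", "key");
}
```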
This is slightly slower, but it makes our verification function compatible
with batch signatures, which, in turn, makes blockchain people happy.
And we want to make our users happy.
Add convenience functions to subtract edwards25519 points and to
clear the cofactor.
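A sketch of the new convenience functions (the type's location and the
exact method names are assumptions based on the description above):

```zig
const std = @import("std");
const Edwards25519 = std.crypto.ecc.Edwards25519;

test "subtract points and clear the cofactor" {
    const b = Edwards25519.basePoint;
    const p = b.add(b).sub(b); // p == b
    // Multiplies by the cofactor (8), mapping p into the prime-order subgroup.
    const q = p.clearCofactor();
    try q.rejectIdentity();
}
```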
Brings a 30% speed boost on x86_64 even though we still process only
one block at a time for now.
Only enabled on x86_64, since the non-vectorized implementation currently
seems to perform better on some other architectures (at least on aarch64).
But the non-vectorized implementation still gets a small speed boost
as well (~17%) from these changes.
Performance increases from ~400 MiB/s to ~450 MiB/s at the expense of
extra code. Thus, aggregation is disabled on ReleaseSmall.
Since the multiplication cost is significant compared to the reduction,
aggregating more than 2 blocks is probably not worth it.
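For reference, the algebra behind 2-block aggregation: precomputing r²
turns two dependent Horner steps into independent products that can be
reduced once. A sketch, not the actual code:

```zig
// All operations are modulo 2^130 - 5, as in Poly1305.
//
// One block at a time (Horner):
//   h = (h + m1) * r
//   h = (h + m2) * r
//
// Two blocks per iteration, with r2 = r^2 precomputed:
//   h = (h + m1) * r2 + m2 * r
//
// Both evaluate to (h + m1) * r^2 + m2 * r; the aggregated form lets the
// two products be computed independently before a single reduction.
```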
Showcase that Zig can be a great option for high-performance cryptography.
The AEGIS family of authenticated encryption algorithms was selected for
high-performance applications in the final portfolio of the CAESAR
competition.
They reuse the AES core function, but are substantially faster than the
CCM, GCM and OCB modes while offering a high level of security.
AEGIS algorithms are especially fast on CPUs with built-in AES support, and
the 128L variant fully takes advantage of the pipeline in modern Intel CPUs.
Performance of the Zig implementation is on par with libsodium.
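A round-trip sketch with the Zig implementation; the namespace is an
assumption, and the explicit tag/nonce/associated-data convention follows
the rest of std.crypto:

```zig
const std = @import("std");
const Aegis128L = std.crypto.aead.aegis.Aegis128L;

test "aegis-128l round trip" {
    const key = [_]u8{0x01} ** Aegis128L.key_length;
    const nonce = [_]u8{0x02} ** Aegis128L.nonce_length;
    const msg = "fast authenticated encryption";
    const ad = "";

    var c: [msg.len]u8 = undefined;
    var tag: [Aegis128L.tag_length]u8 = undefined;
    Aegis128L.encrypt(&c, &tag, msg, ad, nonce, key);

    var m: [msg.len]u8 = undefined;
    try Aegis128L.decrypt(&m, &c, tag, ad, nonce, key);
    try std.testing.expectEqualSlices(u8, msg, &m);
}
```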
Before:
gimli-hash: 120 MiB/s
gimli-aead: 130 MiB/s
After:
gimli-hash: 195 MiB/s
gimli-aead: 208 MiB/s
This also fixes in-place decryption along the way:
if the input and output buffers were the same, decryption used to fail.
The benchmark now returns on decryption errors, in order to detect similar
issues in future AEADs even in non-ReleaseFast modes.
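A check along these lines catches the aliasing case; `Aead` stands for any
AEAD following the std.crypto convention, and the namespace used here is an
assumption:

```zig
const std = @import("std");
// Any std.crypto AEAD with the same shape would do; the Gimli AEAD's
// actual namespace is assumed to match this one.
const Aead = std.crypto.aead.aegis.Aegis128L;

test "in-place decryption" {
    const key = [_]u8{0} ** Aead.key_length;
    const nonce = [_]u8{0} ** Aead.nonce_length;

    var buf = "same buffer in and out".*;
    var tag: [Aead.tag_length]u8 = undefined;

    Aead.encrypt(&buf, &tag, &buf, "", nonce, key); // ciphertext overwrites plaintext
    try Aead.decrypt(&buf, &buf, tag, "", nonce, key); // input & output alias
    try std.testing.expectEqualSlices(u8, "same buffer in and out", &buf);
}
```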
* Reorganize crypto/aes in order to separate parameters, implementations and
modes.
* Add a zero-cost abstraction over the internal representation of a block,
so that blocks can be kept in vector registers in optimized implementations.
* Add architecture-independent aesenc/aesdec/aesenclast/aesdeclast operations,
so that any AES-based primitive can be implemented, including those that
don't use the original key schedule (AES-PRF, AEGIS, MeowHash...).
* Add support for parallelization/wide blocks to take advantage of hardware
implementations.
* Align T-tables to cache lines in the software implementations to slightly
reduce side channels.
* Add an optimized implementation for modern Intel CPUs with AES-NI.
* Add new tests (AES256 key expansion).
* Reimplement the counter mode to work with any block cipher, any endianness
and to take advantage of wide blocks.
* Add benchmarks for AES.
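A sketch of what the block abstraction makes possible; the type and method
names follow the description above, but treat the exact signatures as
assumptions:

```zig
const std = @import("std");
const aes = std.crypto.core.aes;

// One AES round through the abstraction: with AES-NI this can lower to a
// single aesenc instruction, elsewhere it falls back to the software tables.
fn oneRound(state_bytes: *const [16]u8, round_key_bytes: *const [16]u8) [16]u8 {
    const state = aes.Block.fromBytes(state_bytes);
    const round_key = aes.Block.fromBytes(round_key_bytes);
    return state.encrypt(round_key).toBytes();
}
```

Primitives such as AEGIS or AES-PRF can then be written once against this
block type and pick up hardware acceleration wherever it is available.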