165 Commits

Author SHA1 Message Date
Frank Denis
83abb32247 std/crypto - edwards25519 precomp: prefer doublings over adds
Doublings are a little bit faster than additions, so use them half
the time during precomputations.
2020-11-25 15:37:43 -08:00
Frank Denis
9c2b014ea8 std/crypto: use NAF for multi-scalar edwards25519 multiplication
Transforming scalars to non-adjacent form shrinks the number of
precomputations down to 8, while still processing 4 bits at a time.

However, real-world benchmarks show that the transform is only
really useful with large precomputation tables and for batch
signature verification. So, do it for batch verification only.
2020-11-17 17:07:32 -08:00
Frank Denis
0d9c474ecf std/crypto: implement the Hash-To-Curve standard for Edwards25519
https://github.com/cfrg/draft-irtf-cfrg-hash-to-curve

This is quite an important feature to have since many other standards
being worked on depend on this operation.

Brings a couple useful arithmetic operations on field elements by the way.

This PR also adds comments to the functions we expose in 25519/field
so that they can appear in the generated documentation.
2020-11-17 17:06:38 -08:00
Andrew Kelley
e6bfa377d1 std.crypto.isap: fix callsites of secureZero 2020-11-16 18:10:41 -07:00
Frank Denis
f9d209787b std/crypto: add ISAPv2 (ISAP-A-128a) AEAD
We currently have ciphers optimized for performance, for
compatibility, for size and for specific CPUs.

However we lack a class of ciphers that is becoming increasingly
important, as Zig is being used for embedded systems, but also as
hardware-level side channels keep being found on (Intel) CPUs.

Here is ISAPv2, a construction specifically designed for resilience
against leakage and fault attacks.

ISAPv2 is obviously not optimized for performance, but can be an
option for highly sensitive data, when the runtime environment cannot
be trusted.
2020-11-16 16:02:19 -08:00
Frank Denis
7f9e3e419c Use @reduce 2020-11-07 20:30:13 +01:00
Frank Denis
bd07154242 Add mem.timingSafeEql() for constant-time array comparison
This is a trivial implementation that just does a or[xor] loop.

However, this pattern is used by virtually all crypto libraries and
in practice, even without assembly barriers, LLVM never turns it into
code with conditional jumps, even if one of the parameters is constant.

This has been verified to still be the case with LLVM 11.0.0.
2020-11-07 20:18:43 +01:00
Frank Denis
e7b60b219b std/crypto: don't constrain Gimli hash output to a fixed length
As documented in the comment right above the finalization function,
Gimli can be used as a XOF, i.e. the output doesn't have a fixed
length.

So, allow it to be used that way, just like BLAKE3.
2020-11-05 17:21:19 -05:00
Frank Denis
2e354c387e math.shl/math.shr: add support for vectors 2020-11-05 17:20:54 -05:00
Frank Denis
73aef46f7d std.crypto: namespace constructions a bit more
With the simple rule that whenever we have or will have 2 similar
functions, they should be in their own namespace.

Some of these new namespaces currently contain a single function.

This is to prepare for reduced-round versions that are likely to
be added later.
2020-11-05 17:20:25 -05:00
Frank Denis
4417206230 Now that they support vectors, use math.rot{l,r} 2020-11-05 17:19:48 -05:00
Frank Denis
8d7c160fb4 Make Gimli test vector look like the python implementation 2020-11-03 09:13:14 +01:00
Frank Denis
d764636d21 Another big-endian fix for Gimli
We read and write bytes directly from the state, but in the init
function, we potentially endian-swap them.

Initialize bytes in native format since we will be reading them
in native format as well later.

Also use the public interface in the "permute" test rather than an
internal interface. The state itself is not meant to be accessed directly,
even in tests.
2020-11-03 02:01:48 +01:00
Frank Denis
ad9655db3a Fix Gimli for big-endian targets 2020-11-02 13:38:20 -05:00
Frank Denis
c387f1340f std/crypto: make Hkdf functions public 2020-11-01 18:27:11 +02:00
Frank Denis
26793453a7 std/crypto/blake2b: allow the initial output length to be set
BLAKE2 includes the expected output length in the initial state.

This length is actually distinct from the actual output length
used at finalization.

BLAKE2b-256/128 is thus not the same as BLAKE2b-128.

This behavior can be a little bit surprising, and has been "fixed"
in BLAKE3.

In order to support this, we may want to provide an option to set the
length used for domain separation.

In Zig, there is another reason to allow this: we assume that the
output length is defined at comptime.

But BLAKE2 doesn't have a fixed output length. For an output length that
is not known at comptime, we can't take the full block size and
truncate it due to the reason above.

What we can do now is set that length as an option to get the correct
initial state, and truncate the output if necessary.
2020-10-29 15:18:37 -04:00
Frank Denis
e59dd7eecf std/crypto/x25519: return encoded points directly + ed->mont map
Leverage result location semantics for X25519 like we do everywhere
else in 25519/*

Also add the edwards25519->curve25519 map by the way since many
applications seem to use this to share the same key pair for encryption
and signature.
2020-10-29 14:39:58 -04:00
Frank Denis
5764c550ed std/crypto: vectorize Salsa20
20% faster on x86_64, slower on aarch64 as usual :/
2020-10-29 14:34:58 -04:00
Frank Denis
0adc144f88 std/crypto: adjust aesni parallelism to CPU models
Intel keeps changing the latency & throughput of the aes* and clmul
instructions every time they release a new model.

Adjust `optimal_parallel_blocks` accordingly, keeping 8 as a safe
default for unknown data.
2020-10-28 21:44:00 +02:00
Frank Denis
ea45897fcc PascalCase *box names, remove unneeded comptime & parenthesis
Also rename (salsa20|chacha20)Internal() to a better name.

And sort reexported crypto.* names
2020-10-28 21:43:15 +02:00
Žiga Željko
7c2bde1f07 std/crypto: API cleanup 2020-10-26 19:19:34 -04:00
Frank Denis
74a1175d9d std/*: add missing MIT license headers 2020-10-26 17:41:29 +01:00
Frank Denis
72064eba23 std/crypto: vectorize BLAKE3
Gives a ~40% speedup on x86_64.

However, the generic code remains faster on aarch64.

This is still processing only one block at a time for now.

I'm pretty confident that processing more blocks per round
will eventually give a substantial performance improvement on
all platforms with vector units.
2020-10-25 21:13:14 -04:00
Frank Denis
1b4ab749cf std/crypto: add the bcrypt password hashing function
The bcrypt function intentionally requires quite a lot of CPU cycles
to complete.

In addition to that, not having its full state constantly in the
CPU L1 cache causes a massive performance drop.

These properties slow down brute-force attacks against low-entropy
inputs (typically passwords), and GPU-based attacks get little
to no advantages over CPUs.
2020-10-25 21:11:40 -04:00
Frank Denis
0c7a99b38d Move ed25519 key pairs to a KeyPair structure 2020-10-25 21:55:05 +01:00
Frank Denis
28fb97f188 Add (X)Salsa20 and NaCl boxes
The NaCl constructions are available in pretty much all programming
languages, making them a solid choice for applications that require
interoperability.

Go includes them in the standard library, JavaScript has the popular
tweetnacl.js module, and reimplementations and ports of TweetNaCl
have been made everywhere.

Zig has almost everything that NaCl has at this point, the main
missing component being the Salsa20 cipher, on top on which NaCl's
secretboxes, boxes, and sealedboxes can be implemented.

So, here they are!

And clean the X25519 API up a little bit by the way.
2020-10-25 18:04:12 +01:00
Frank Denis
91a1c20e74 Fix a typo (s/multple/multiple/) 2020-10-24 07:57:34 +02:00
Frank Denis
047599928a Add a benchmark for signature verifications 2020-10-22 09:58:26 +02:00
Frank Denis
2d9befe9bf Implement multiscalar edwards25519 point multiplication 2020-10-22 09:58:26 +02:00
Frank Denis
0fb6fdd7eb Support variable-time edwards25519 scalar multiplication
This is useful to save some CPU cycles when the scalar is public,
such as when verifying signatures.
2020-10-22 09:58:26 +02:00
Frank Denis
ff658abe79 std/crypto/25519: use Barrett reduction for scalars (mod l) 2020-10-22 09:58:26 +02:00
Frank Denis
8e79b3cf23 std/crypto/25519: add support for batch Ed25519 signature verification 2020-10-22 09:58:26 +02:00
Frank Denis
fa17447090 std/crypto: make the whole APIs more consistent
- use `PascalCase` for all types. So, AES256GCM is now Aes256Gcm.
- consistently use `_length` instead of mixing `_size` and `_length` for the
constants we expose
- Use `minimum_key_length` when it represents an actual minimum length.
Otherwise, use `key_length`.
- Require output buffers (for ciphertexts, macs, hashes) to be of the right
size, not at least of that size in some functions, and the exact size elsewhere.
- Use a `_bits` suffix instead of `_length` when a size is represented as a
number of bits to avoid confusion.
- Functions returning a constant-sized slice are now defined as a slice instead
of a pointer + a runtime assertion. This is the case for most hash functions.
- Use `camelCase` for all functions instead of `snake_case`.

No functional changes, but these are breaking API changes.
2020-10-17 18:53:08 -04:00
Frank Denis
0b4a5254fa Vectorize Gimli 2020-10-16 18:41:11 -04:00
Frank Denis
51a3d0603c std.rand: set DefaultCsprng to Gimli, and require a larger seed
`DefaultCsprng` is documented as a cryptographically secure RNG.

While `ISAAC` is a CSPRNG, the variant we have, `ISAAC64` is not.
A 64 bit seed is a bit small to satisfy that claim.

We also saw it being used with the current date as a seed, that
also defeats the point of a CSPRNG.

Set `DefaultCsprng` to `Gimli` instead of `ISAAC64`, rename
the parameter from `init_s` to `secret_seed` + add a comment to
clarify what kind of seed is expected here.

Instead of directly touching the internals of the Gimli implementation
(which can change/be architecture-specific), add an `init()` function
to the state.

Our Gimli-based CSPRNG was also not backtracking resistant. Gimli
is a permutation; it can be reverted. So, if the state was ever leaked,
future secrets, but also all the previously generated ones could be
recovered. Clear the rate after a squeeze in order to prevent this.

Finally, a dumb test was added just to exercise `DefaultCsprng` since
we don't use it anywhere.
2020-10-15 20:57:16 -04:00
Frank Denis
cb44f27104 std/crypto/hmac: remove HmacBlake2s256 definition
HMAC is a generic construction, so we allow it to be instantiated
with any hash function.

In practice, HMAC is almost exclusively used with MD5, SHA1 and SHA2,
so it makes sense to define some shortcuts for them.

However, defining `HmacBlake2s256` is a bit weird (and why
specifically that one, and not other hash functions we also support?).
There would be nothing wrong with that construction, but it's not
used in any standard protocol and would be a curious choice.

BLAKE2 being a keyed hash function, it doesn't need HMAC to be used as
a MAC, so that also doesn't make it a good example of a possible hash
function for HMAC.

This commit doesn't remove the ability to use a Hmac(Blake2s256) type
if, for some reason, applications really need this, but it removes
HmacBlake2s256 as a constant.
2020-10-15 20:50:34 -04:00
Frank Denis
f3667e8a80 std/crypto/25519: do cofactored ed25519 verification
This is slightly slower but makes our verification function compatible
with batch signatures. Which, in turn, makes blockchain people happy.
And we want to make our users happy.

Add convenience functions to substract edwards25519 points and to
clear the cofactor.
2020-10-15 18:49:10 -04:00
Frank Denis
9f109ba0eb Simpler ChaCha20 vector code 2020-10-10 22:45:41 +02:00
Frank Denis
459128e059 Use an array of comptime_int for shuffle masks
Suggested by @LemonBoy - Thanks!
2020-10-10 22:45:41 +02:00
Frank Denis
9b386bda33 std/crypto: add a vectorized ChaCha20 implementation
Brings a 30% speed boost on x86_64 even though we still process only
one block at a time for now.

Only enabled on x86_64 since the non-vectorized implementation seems
to currently perform better on some architectures (at least on aarch64).

But the non-vectorized implementation still gets a little speed boost
as well (~17%) with these changes.
2020-10-10 22:45:41 +02:00
Andrew Kelley
b02341d6f5
Merge pull request #6614 from jedisct1/aes-arm
std/crypto/aes: add AES hardware acceleration on aarch64
2020-10-08 18:09:40 -04:00
Frank Denis
1bc2b68916 ghash: add pmull support on aarch64 2020-10-08 18:09:23 -04:00
Frank Denis
60d1e675d2 aes/aesni is not based on a Go implementation, only aes/soft is
Don't blame them for our bugs :)
2020-10-08 14:55:11 +02:00
Frank Denis
f39dc00ed4 std/crypto/aes: add AES hardware acceleration on aarch64 2020-10-08 14:55:08 +02:00
Frank Denis
fb63a2cfae std/crypto: faster (mod 2^255-19) square root computation
251 squarings, 250 multiplications -> 251 squarings, 11 multiplications
2020-10-06 19:48:26 -04:00
Frank Denis
06c16f44e7 std/crypto: Add support for AES-GCM
Already pretty fast on platforms with AES-NI, even though GHASH
reduction hasn't been optimized yet, and we don't do stitching either.
2020-10-06 00:00:33 +02:00
Frank Denis
d343b75e7f ghash & poly1305: fix handling of partial blocks and add pad()
pad() aligns the next input to the first byte of a block, which is
useful to implement the IETF version of ChaCha20Poly1305 and AES-GCM.
2020-10-05 23:50:38 +02:00
Andrew Kelley
8170a3d574
Merge pull request #6463 from jedisct1/ghash
std/crypto: add GHASH implementation
2020-10-04 02:46:36 -04:00
Frank Denis
97fd0974b9 ghash: add pclmul support on x86_64 2020-10-01 02:05:11 +02:00
Frank Denis
8161de7fa4 Implement ghash aggregated reduction
Performance increases from ~400 MiB/s to 450 MiB/s at the expense of
extra code. Thus, aggregation is disabled on ReleaseSmall.

Since the multiplication cost is significant compared to the reduction,
aggregating more than 2 blocks is probably not worth it.
2020-10-01 02:05:07 +02:00