An Introduction to the Noise Protocol Framework
Nick Mooney, March 5th, 2020 (Last Updated: March 5th, 2020)
Noise is a framework that can be used to construct secure channel protocols. Noise takes a fairly basic set of cryptographic operations and allows them to be combined in ways that provide various security properties. Noise is not a protocol itself: it is a protocol framework, so by "filling in the blanks" you get a concrete protocol that has essentially no knobs to twist. We’ll use the term “Noise protocol” to refer to a concrete protocol, and “Noise framework” to refer to the framework overall.
Every Noise protocol begins with a handshake that follows a particular pattern. The end result of a Noise handshake is an encrypted channel that provides various forms of confidentiality, integrity, and authenticity guarantees. Which of these guarantees you get depends on which handshake pattern is used, but a collection of standard handshakes with known security properties is provided. The Noise framework is fully agnostic to what is actually transmitted via the encrypted channel established with a handshake. You could transmit messages, video files, or anything else.
Noise is fundamentally based around Diffie-Hellman (DH) key agreement. There are many constructions that make use of DH; perhaps the simplest is to agree on a key that is then used directly for symmetric encryption. Noise has several advantages over building your own DH-based protocol. Two of the primary benefits are (1) that the structured nature of the Noise framework allows us to build protocols with exactly the properties we need, as well as analyze whether those properties are present, and (2) that “advanced” properties not provided by a simple DH construction (like message authentication) can be built into Noise protocols with combinations of Diffie-Hellman and the behavior of the Noise state machine. Noise Explorer is a tool that automatically analyzes handshake patterns and graphically demonstrates the security guarantees present at each step of the handshake. I refer to Noise Explorer often when trying to understand new handshake patterns.
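To make the "simplest DH construction" concrete, here is a minimal sketch using only the Python standard library. The group parameters `P` and `G` and the function names are assumptions chosen for illustration: this is a toy finite-field Diffie-Hellman, not a secure one, and a real protocol would use a vetted DH function such as X25519.

```python
import hashlib
import secrets

# Toy parameters for illustration only (assumption, NOT secure).
P = 2**255 - 19
G = 2

def generate_keypair() -> tuple[int, int]:
    """Return (private, public) for the toy DH group."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)

def shared_key(private: int, remote_public: int) -> bytes:
    """Hash the raw DH output into a 32-byte symmetric key."""
    dh_output = pow(remote_public, private, P)
    return hashlib.sha256(dh_output.to_bytes(32, "big")).digest()

alice_priv, alice_pub = generate_keypair()
bob_priv, bob_pub = generate_keypair()

# Each side combines its own private key with the other's public key.
alice_key = shared_key(alice_priv, bob_pub)
bob_key = shared_key(bob_priv, alice_pub)
```

Both sides arrive at the same key, but nothing here authenticates either party or binds the key to a transcript of what was exchanged. Those are the gaps that Noise handshake patterns are designed to close.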
The rigidity of a Noise protocol is one of its biggest assets. A web browser, using TLS in lieu of a Noise protocol, might have to connect to a wide variety of servers, each supporting different combinations of cryptographic algorithms. This flexibility on the part of the web browser means that the browser might sometimes use less secure cryptography than it is capable of, or that bugs may be introduced by the logic that handles protocol negotiation. On the other hand, a Noise protocol uses a defined set of cryptographic algorithms and handshake messages that are chosen ahead of time. Noise fits well in homogeneous environments where negotiation is not generally required because both parties run software controlled by the same entity.
I will try to explain a bit more about the Noise framework and why it's neat, but I should mention that the Noise spec is very readable.
There are several reasons why I think the Noise framework is useful:
- The framework is flexible, but protocols built with the framework are concrete: each participant implements the same state machine, and the framework is built to ensure from the start that each participant sees the same interactions
- There is almost no built-in protocol negotiation*, reducing the risk of downgrade attacks (where an attacker forces a client to use a less-secure mode of operation), confused deputy attacks (where a client is "tricked" into misbehaving or leaking secret information), and other issues that are the result of two different parties having different views of the same interaction
- Noise requires a fairly minimal set of primitives to build a concrete protocol:
  - An AEAD cipher (such as ChaCha20-Poly1305 or AES-GCM)
  - A hashing algorithm (such as BLAKE2 or SHA-256)
  - A Diffie-Hellman scheme (such as ECDH with Curve25519)
In short, Noise allows developers to build secure protocols that do not have a lot of surprising behavior.
* Noise supports fallback patterns, which allow for some negotiation in circumstances that cause an initial handshake to fail, such as when a long-term static key has changed. This is very limited compared to, say, TLS.
A Noise protocol begins with two parties exchanging handshake messages. During this handshake phase the parties exchange DH public keys and perform a sequence of DH operations, hashing the DH results into a shared secret key. After the handshake phase each party can use this shared key to send encrypted transport messages.
The Noise framework supports handshakes where each party has a long-term static key pair and/or an ephemeral key pair. A Noise handshake is described by a simple language. This language consists of tokens which are arranged into message patterns. Message patterns are arranged into handshake patterns.
A message pattern is a sequence of tokens that specifies the DH public keys that comprise a handshake message, and the DH operations that are performed when sending or receiving that message. A handshake pattern specifies the sequential exchange of messages that comprise a handshake.
A handshake pattern can be instantiated by DH functions, cipher functions, and hash functions to give a concrete Noise protocol.
Source: The Noise Protocol Framework Specification
A handshake consists of two parties, the initiator and the responder. Once a Noise handshake is completed, the result is an AEAD-protected transport channel, but it's also important to note that arbitrary message payloads can be transmitted during the handshake phase, before the full handshake is complete. This allows immediate transmission of protocol messages without the full round trip delay of the handshake. Payloads transmitted alongside handshake messages are partially protected, and will have different security guarantees depending on which handshake message they are attached to.
Whenever encrypted information is transmitted during a handshake (after keying material has been established, usually after the first Diffie-Hellman), the hash of the handshake transcript so far is included as the "associated data" in AEAD. This helps ensure that both parties have the same view of the handshake, even if the encrypted payload is empty.
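The running handshake hash can be sketched in a few lines. The function below follows the `h = HASH(h || data)` rule that the Noise specification calls MixHash; the helper `transcript_hash`, the all-zero starting value, and the byte strings standing in for public keys are assumptions made for illustration.

```python
import hashlib

def mix_hash(h: bytes, data: bytes) -> bytes:
    """Fold new handshake data into the running hash: h = SHA-256(h || data)."""
    return hashlib.sha256(h + data).digest()

def transcript_hash(messages: list[bytes]) -> bytes:
    """Hash a sequence of handshake values in order (hypothetical helper)."""
    h = bytes(32)  # placeholder start value; Noise derives it from the protocol name
    for message in messages:
        h = mix_hash(h, message)
    return h

initiator_view = transcript_hash([b"e_pub_initiator", b"e_pub_responder"])
responder_view = transcript_hash([b"e_pub_initiator", b"e_pub_responder"])
tampered_view = transcript_hash([b"e_pub_attacker", b"e_pub_responder"])
```

Because `h` is supplied as associated data, a party whose view matches `tampered_view` would fail to decrypt anything encrypted by a party holding `initiator_view`, even when the payload is empty.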
The quote above mentions that the initiator and responder can each have a long-term static key pair and/or an ephemeral key pair. Noise handshake patterns are named after the state of these long-term static keys: NN, NK, XN, etc. The first letter indicates the status of the initiator's long-term static key, and the second letter indicates the status of the responder's long-term static key. All Noise handshakes involve some combination of transmitting public keys and performing Diffie-Hellman operations. Static keys are used to provide long-term participant identity, so you can confirm that the party you’re talking to today is the same party you were talking to yesterday.
All the standard handshake patterns require an exchange of ephemeral keys: this is done to provide forward secrecy, so that a later compromise of long-term static keys would not reveal the plaintext contents of previous communications. Noise has this property in common with TLS 1.3, which also requires the exchange of ephemeral keys, an upgrade from previous versions of TLS where it was optional. Some Noise protocols also offer identity hiding properties, depending on when the static keys are transmitted.
| N | No long-term static key is present |
| K | The long-term static key is Known to the other party before the handshake |
| X | The long-term static key is transmitted (Xmitted) to the other party during the handshake |
| I | The long-term static key (for the initiator) is Immediately transmitted to the responder, despite absent/reduced identity hiding |
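As a small illustration of the naming scheme, the sketch below decodes a two-letter pattern name into the status of each party's static key. The function name and dictionary are hypothetical, and the sketch only handles the two-letter interactive names described here (no one-way patterns and no PSK suffixes).

```python
STATIC_KEY_STATUS = {
    "N": "no long-term static key",
    "K": "static key known to the other party before the handshake",
    "X": "static key transmitted to the other party during the handshake",
    "I": "static key immediately transmitted to the responder",
}

def describe_pattern(name: str) -> dict[str, str]:
    """Map a two-letter pattern name such as 'NK' to per-party descriptions."""
    if len(name) != 2:
        raise ValueError(f"expected a two-letter pattern name, got {name!r}")
    initiator, responder = name
    # 'I' only makes sense for the initiator, as described in the table.
    if initiator not in "NKXI" or responder not in "NKX":
        raise ValueError(f"unknown pattern name: {name!r}")
    return {
        "initiator": STATIC_KEY_STATUS[initiator],
        "responder": STATIC_KEY_STATUS[responder],
    }

nk = describe_pattern("NK")
```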
Handshakes are represented textually using a standard format: an arrow signifying the direction of communication followed by a sequence of tokens that describe state machine operations. You will see this "ASCII art" format whenever handshake patterns are described in the Noise specification or elsewhere.
02. Valid Handshakes
During a handshake, each party transmits its ephemeral and/or static public keys, and performs DH operations between the ephemeral and/or static public keys of both parties. In fact, there are only six possible tokens (barring PSKs, which we will discuss later):
- e: generate an ephemeral keypair and transmit the public key. -> at the front of the line indicates that the public key is transmitted from initiator to responder, and <- indicates that it is transmitted from responder to initiator.
- s: transmit the long-term static public key. The -> and <- arrows signify the transmission direction in the same way as for the e token.
- ee, es, se, ss: both participants perform a DH between the ephemeral/static keypair of the initiator and the ephemeral/static keypair of the responder. The first letter names the initiator's key and the second letter names the responder's key.
Commas separate each token in the same step of the handshake and indicate that the associated action occurs before the next token is processed.
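The textual format is simple enough to parse mechanically. The following sketch turns the arrow-and-token notation into a list of message patterns; the class and function names are hypothetical, and pre-messages (the lines before a `...` separator) and PSK tokens are deliberately not handled.

```python
from dataclasses import dataclass

TOKENS = {"e", "s", "ee", "es", "se", "ss"}
TRANSMITTING = {"e", "s"}  # only these tokens put a public key on the wire

@dataclass(frozen=True)
class MessagePattern:
    sender: str              # "initiator" or "responder"
    tokens: tuple[str, ...]  # processed left to right

def parse_pattern(text: str) -> list[MessagePattern]:
    """Parse lines such as '-> e' or '<- e, ee' into MessagePattern objects."""
    messages = []
    for line in text.strip().splitlines():
        arrow, _, rest = line.strip().partition(" ")
        if arrow not in ("->", "<-"):
            raise ValueError(f"bad direction marker: {arrow!r}")
        tokens = tuple(token.strip() for token in rest.split(","))
        unknown = [token for token in tokens if token not in TOKENS]
        if unknown:
            raise ValueError(f"unknown tokens: {unknown}")
        sender = "initiator" if arrow == "->" else "responder"
        messages.append(MessagePattern(sender, tokens))
    return messages

NN = parse_pattern("""
-> e
<- e, ee
""")

# Public keys actually sent in each message of NN.
keys_on_wire = [[t for t in m.tokens if t in TRANSMITTING] for m in NN]
```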
03. Example Handshake Patterns
Jumping Right In: The NN Pattern
Here is the NN Noise handshake pattern. NN means that neither party has a long-term static key, so the handshake is based entirely on ephemeral keys. The handshake pattern is:
    -> e
    <- e, ee
This pattern represents an unauthenticated DH handshake.
The first thing to notice is that e and ee are not messages, per se -- they are tokens processed by the state machines of both parties. Some tokens (e, s), but not all (e.g. ee, es), lead to messages being sent.
Let's look at what each party does during this handshake.
-> e
The -> arrow indicates that the transmission will be from the initiator to the responder. The e token specifies that the initiator generates an ephemeral keypair and transmits the public key to the responder. The responder receives and stores the initiator's public key. Both parties hash this key into their handshake hash, which will be included as associated data in AEAD encryption (ensuring that both parties have the same view of the handshake transcript) as soon as a symmetric key is established and the parties begin encrypting messages. The initiator also has the option to transmit a payload alongside this handshake message. Because no key has been established yet, such a payload would be sent in the clear, with no authentication.
<- e, ee
The responder now does the same. The <- arrow indicates the transmission will be from the responder to the initiator. The e token indicates that the responder will generate an ephemeral keypair and transmit the public key to the initiator. The initiator receives this key, and both parties hash the key into their handshake hash. Now, processing the ee token, both parties perform a Diffie-Hellman between the initiator ephemeral key and the responder ephemeral key. The result of this Diffie-Hellman is used to create a new chaining key, which is in turn used to derive a key that can be used to symmetrically encrypt/decrypt content*. As we mentioned in the first handshake step, now that symmetric key material has been generated, the handshake hash will be included as associated data whenever a payload is encrypted.
The responder can include a message payload alongside this handshake message. This message would be encrypted, providing message secrecy and some forward secrecy.
See the analysis of the NN handshake in Noise Explorer for some more information.
At the termination of the handshake, both parties will have a shared symmetric state (technically, two shared symmetric states) that can be used to send encrypted messages back and forth. These transport messages (post-handshake) will benefit from message secrecy and some forward secrecy. Because neither party is authenticated, whether in-band or by out-of-band means, this scheme is not resistant to an active attacker who can intercept and replace handshake messages.
* The chaining key is used as an input to HKDF, which outputs the actual key k used for encryption. Each update to the chaining key also results in a new k.
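The NN walkthrough can be simulated end to end with the standard library. This is a sketch under loud assumptions: the DH is a toy finite-field group (`P`, `G`) that is not secure, the protocol name is made up, the `Party` class is hypothetical, and prologue and payload processing are skipped, so it is not wire-compatible with real Noise implementations. The HKDF chaining follows the HMAC-based two-output construction described in the Noise specification.

```python
import hashlib
import hmac
import secrets

P, G = 2**255 - 19, 2  # toy DH group (assumption, NOT secure)
PROTOCOL_NAME = b"Noise_NN_toyDH_noAEAD_SHA256"  # hypothetical, fits in 32 bytes

def hkdf2(ck: bytes, ikm: bytes) -> tuple[bytes, bytes]:
    """Noise-style HKDF with two outputs, built from HMAC-SHA-256."""
    temp = hmac.new(ck, ikm, hashlib.sha256).digest()
    out1 = hmac.new(temp, b"\x01", hashlib.sha256).digest()
    out2 = hmac.new(temp, out1 + b"\x02", hashlib.sha256).digest()
    return out1, out2

class Party:
    def __init__(self) -> None:
        # h and ck both start from the zero-padded protocol name.
        self.h = PROTOCOL_NAME.ljust(32, b"\x00")
        self.ck = self.h
        self.k = None        # no encryption key until the first DH
        self.e_priv = None   # local ephemeral private key
        self.re = None       # remote ephemeral public key

    def mix_hash(self, data: bytes) -> None:
        self.h = hashlib.sha256(self.h + data).digest()

    def send_e(self) -> bytes:
        """Token e (sender side): generate an ephemeral key, hash and send it."""
        self.e_priv = secrets.randbelow(P - 3) + 2
        e_pub = pow(G, self.e_priv, P).to_bytes(32, "big")
        self.mix_hash(e_pub)
        return e_pub

    def recv_e(self, e_pub: bytes) -> None:
        """Token e (receiver side): store the remote key and hash it."""
        self.re = e_pub
        self.mix_hash(e_pub)

    def ee(self) -> None:
        """Token ee: DH of both ephemerals, folded into ck to derive k."""
        dh = pow(int.from_bytes(self.re, "big"), self.e_priv, P)
        self.ck, self.k = hkdf2(self.ck, dh.to_bytes(32, "big"))

initiator, responder = Party(), Party()
responder.recv_e(initiator.send_e())   # -> e
initiator.recv_e(responder.send_e())   # <- e, ...
responder.ee()                         # ... ee (responder side)
initiator.ee()                         # ... ee (initiator side)
```

After these steps both parties hold the same `h`, `ck`, and `k`, without either of them having proven who they are.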
Changing Things: The NK Pattern
Let's consider now the NK pattern. The initiator here still has no long-term static identity key, but the responder has a long-term static identity that is known to the initiator (transmitted out of band, or during a previous handshake).
The handshake pattern is as follows:
    <- s
    ...
    -> e, es
    <- e, ee
The first step of the handshake pattern is a "pre-message," which just serves to identify that the contents were somehow transmitted before the handshake began. In this case, <- s shows that the responder's long-term static identity was somehow communicated to the initiator ahead of time. The ... separates pre-messages from handshake messages.
-> e, es
The initiator generates an ephemeral keypair and transmits the public key to the responder. Transmitted / received messages are always hashed into the handshake hash. Next, both parties perform a Diffie-Hellman between the initiator's ephemeral key and the responder's static key, which is (as always) used to update the chaining key.
Because our chaining key is now based on the responder's long-term static key, which was transmitted out-of-band, any message payload attached to this handshake message benefits from some message secrecy (i.e. given a full transcript of this handshake, the message contents could only be decrypted by an attacker with access to the responder's long-term private key).
<- e, ee
The responder now generates an ephemeral keypair and transmits its public key to the initiator. Any payload attached to this handshake message benefits from sender authentication, since the responder's long-term static key was used in the earlier es Diffie-Hellman. The payload also benefits from some message secrecy, since that DH was used to establish a symmetric key.
Both parties perform a Diffie-Hellman between the initiator's ephemeral key and the responder's ephemeral key, rolling the result into the chaining key and enabling forward secrecy, should the responder’s long-term static key ever be compromised.
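To see why the es token authenticates the responder, the sketch below runs NK twice: once against the genuine responder and once against an impostor who does not hold the private key matching the static public key the initiator already knows. As before, the DH group is a toy and not secure, the protocol name and all function names are hypothetical, and prologue and payload handling are omitted.

```python
import hashlib
import hmac
import secrets

P, G = 2**255 - 19, 2  # toy DH group (assumption, NOT secure)

def keypair() -> tuple[int, int]:
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)

def dh(priv: int, pub: int) -> bytes:
    return pow(pub, priv, P).to_bytes(32, "big")

def mix_hash(h: bytes, pub: int) -> bytes:
    return hashlib.sha256(h + pub.to_bytes(32, "big")).digest()

def mix_key(ck: bytes, ikm: bytes) -> tuple[bytes, bytes]:
    """Return (new chaining key, encryption key k) via Noise-style HKDF."""
    temp = hmac.new(ck, ikm, hashlib.sha256).digest()
    new_ck = hmac.new(temp, b"\x01", hashlib.sha256).digest()
    k = hmac.new(temp, new_ck + b"\x02", hashlib.sha256).digest()
    return new_ck, k

def nk_handshake(known_rs_pub: int, responder_s_priv: int):
    """Play both roles of NK; return ((h, k) of initiator, (h, k) of responder)."""
    responder_s_pub = pow(G, responder_s_priv, P)
    h0 = b"Noise_NK_toy".ljust(32, b"\x00")  # hypothetical protocol name
    i_ck = r_ck = h0
    # Pre-message "<- s": each side hashes the static key it believes in.
    i_h = mix_hash(h0, known_rs_pub)
    r_h = mix_hash(h0, responder_s_pub)
    # "-> e, es"
    ie_priv, ie_pub = keypair()
    i_h, r_h = mix_hash(i_h, ie_pub), mix_hash(r_h, ie_pub)
    i_ck, i_k = mix_key(i_ck, dh(ie_priv, known_rs_pub))
    r_ck, r_k = mix_key(r_ck, dh(responder_s_priv, ie_pub))
    # "<- e, ee"
    re_priv, re_pub = keypair()
    i_h, r_h = mix_hash(i_h, re_pub), mix_hash(r_h, re_pub)
    i_ck, i_k = mix_key(i_ck, dh(ie_priv, re_pub))
    r_ck, r_k = mix_key(r_ck, dh(re_priv, ie_pub))
    return (i_h, i_k), (r_h, r_k)

server_priv, server_pub = keypair()
impostor_priv, _ = keypair()

honest_i, honest_r = nk_handshake(server_pub, server_priv)
spoofed_i, spoofed_r = nk_handshake(server_pub, impostor_priv)
```

In the honest run both sides end with the same hash and key. In the spoofed run they diverge at the es step, so the impostor can neither read the initiator's payload nor produce a message the initiator would accept.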
04. The Handshake State Machine
During a Noise handshake, each party keeps track of the following variables:
- s, e: The static and ephemeral keypairs of the local party (which may be empty)
- rs, re: The static and ephemeral public keys of the remote party (which may be empty)
- h: The aforementioned handshake hash, which hashes all handshake data sent and received
- ck: A chaining key based on hashes of the outputs of all previous Diffie-Hellman operations
- k, n: An encryption key (derived from ck) and a nonce that are used to encrypt message payloads
As each token is processed, these variables are updated. The functions supported by the state machine are defined in the Processing Rules section of the Noise specification.
Because the handshake pattern is set ahead of time, each state of the state machine has exactly one valid transition to the next state. You can view the possible state transitions as a simple, single-directional chain: there is no input that causes cyclical behavior.
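One way to picture this is as a record of the variables listed above plus a cursor that only ever moves forward. The sketch below is a hypothetical data structure for illustration (the `position` field and method names are not from the specification) and performs no cryptography.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HandshakeState:
    message_patterns: tuple      # e.g. (("e",), ("e", "ee")) for NN
    s: Optional[bytes] = None    # local static keypair
    e: Optional[bytes] = None    # local ephemeral keypair
    rs: Optional[bytes] = None   # remote static public key
    re: Optional[bytes] = None   # remote ephemeral public key
    h: bytes = b""               # handshake hash
    ck: bytes = b""              # chaining key
    k: Optional[bytes] = None    # encryption key, empty until the first DH
    n: int = 0                   # nonce counter used with k
    position: int = 0            # hypothetical cursor into message_patterns

    @property
    def complete(self) -> bool:
        return self.position == len(self.message_patterns)

    def next_message(self) -> tuple:
        """Return the tokens of the single valid next step and advance."""
        if self.complete:
            raise RuntimeError("handshake already complete")
        tokens = self.message_patterns[self.position]
        self.position += 1
        return tokens

nn = HandshakeState(message_patterns=(("e",), ("e", "ee")))
first = nn.next_message()
second = nn.next_message()
```

There is no method that moves `position` backwards or branches, which mirrors the single-directional chain of states described above.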
05. After the Handshake
During the handshake phase, the two parties share a single symmetric cipher state. Once a Noise handshake is completed, this state is split into two cipher states, one for each direction of communication. Each of the newly-created ciphers uses a key derived from an HKDF with the chaining key as input.
At this point, the handshake is complete and there is nothing Noise-specific about communicating over the encrypted channels produced by the handshake. Noise does specify a rekey operation that could be triggered by an application-specific message to rotate keys any time after the handshake has been completed.
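The split step can be sketched as one more HKDF call on the final chaining key, this time with empty input key material. The HKDF construction follows the Noise specification; the stand-in chaining key and the variable names are assumptions for illustration, and the AEAD cipher itself is not shown.

```python
import hashlib
import hmac

def hkdf2(ck: bytes, ikm: bytes) -> tuple[bytes, bytes]:
    """Noise-style HKDF with two outputs, built from HMAC-SHA-256."""
    temp = hmac.new(ck, ikm, hashlib.sha256).digest()
    out1 = hmac.new(temp, b"\x01", hashlib.sha256).digest()
    out2 = hmac.new(temp, out1 + b"\x02", hashlib.sha256).digest()
    return out1, out2

def split(ck: bytes) -> tuple[bytes, bytes]:
    """Derive one transport key per direction from the final chaining key."""
    return hkdf2(ck, b"")

# Stand-in for the chaining key both parties hold after the handshake (assumption).
final_ck = hashlib.sha256(b"example chaining key").digest()

# First key: initiator -> responder. Second key: responder -> initiator.
k_initiator_to_responder, k_responder_to_initiator = split(final_ck)
```

Each direction then has its own key and its own nonce counter starting at zero, so the two parties never encrypt under the same key and nonce.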
06. Adding More
Noise supports several other features outside of the handshake patterns that we haven't yet talked about.
Prologues can be used to ensure that both parties have identical views of data -- to ensure that a MITM attack hasn't occurred between the two users before the handshake commences, for example. Prologues will cause the handshake to fail if both parties do not have the same prologue data, but prologues are not considered to be secret data and are not mixed into encryption keys.
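A prologue check can be sketched as hashing the prologue into the handshake hash before any message is processed. The starting value `h0` and the prologue strings below are made up for illustration; in a real handshake `h0` comes from the protocol name.

```python
import hashlib

def mix_prologue(h: bytes, prologue: bytes) -> bytes:
    """Hash the prologue into the handshake hash before the first message."""
    return hashlib.sha256(h + prologue).digest()

# Stand-in for the initial handshake hash (assumption).
h0 = hashlib.sha256(b"stand-in for the protocol-name hash").digest()

client_h = mix_prologue(h0, b"app-version=2;ciphers=modern")
server_h = mix_prologue(h0, b"app-version=2;ciphers=modern")
downgraded_h = mix_prologue(h0, b"app-version=1;ciphers=legacy")
```

If an attacker altered the data one side saw, that side would hold something like `downgraded_h`. Since the handshake hash is used as associated data, the first encrypted handshake payload would then fail to decrypt. The prologue affects only `h`, never the chaining key, which matches the point that prologues are not mixed into encryption keys.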
Noise also supports pre-shared keys. PSKs can be used to provide message secrecy (and some form of message authentication) before any other handshake operations have occurred. Noise patterns that use PSKs are named by appending "pskZ" to the name of the handshake, where "Z" is a number indicating where the psk token is inserted into the handshake.
Take NNpsk0, for example. Remember that the original NN handshake is:
    -> e
    <- e, ee
NNpsk0 is NN with the PSK token included at the beginning of the first handshake message. The suffixes 1, 2, etc. place the PSK token at the end of the first, second, etc. messages respectively. The NNpsk0 handshake pattern is:
    -> psk, e
    <- e, ee
As a PSK is pre-shared by definition, the psk token doesn't actually cause either party to transmit anything to the other. The psk token is processed by both parties mixing the PSK into their symmetric state.
In particular, this token is processed by each party calling MixKeyAndHash(psk) (defined in the Noise spec), which updates both the chaining key and the handshake hash. To ensure forward secrecy and avoid catastrophic reuse of cipher keys, the Noise protocol framework does not allow for the transmission of encrypted data after just processing the psk token. When an e token is processed in a PSK handshake, the ephemeral public key is mixed into the handshake hash (as usual) and also into the chaining key (a step that is specific to PSK handshakes). This mixing randomizes the symmetric key so that it is not based solely on the PSK. In fact, an e token must be present in a PSK-based handshake, either before or after the psk token.
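The effect of processing a psk token can be sketched with a three-output HKDF, following the description of MixKeyAndHash in the Noise specification. The Python function names and the starting values are assumptions for illustration.

```python
import hashlib
import hmac

def hkdf3(ck: bytes, ikm: bytes) -> tuple[bytes, bytes, bytes]:
    """Noise-style HKDF with three outputs, built from HMAC-SHA-256."""
    temp = hmac.new(ck, ikm, hashlib.sha256).digest()
    out1 = hmac.new(temp, b"\x01", hashlib.sha256).digest()
    out2 = hmac.new(temp, out1 + b"\x02", hashlib.sha256).digest()
    out3 = hmac.new(temp, out2 + b"\x03", hashlib.sha256).digest()
    return out1, out2, out3

def mix_key_and_hash(ck: bytes, h: bytes, psk: bytes) -> tuple[bytes, bytes, bytes]:
    """Return the updated (ck, h, k) after mixing in a pre-shared key."""
    new_ck, temp_h, temp_k = hkdf3(ck, psk)
    new_h = hashlib.sha256(h + temp_h).digest()  # MixHash(temp_h)
    return new_ck, new_h, temp_k

# Stand-in starting state (assumption): ck and h begin equal in Noise.
start = hashlib.sha256(b"stand-in initial state").digest()

alice = mix_key_and_hash(start, start, b"correct horse battery staple 32b")
bob = mix_key_and_hash(start, start, b"correct horse battery staple 32b")
mallory = mix_key_and_hash(start, start, b"a guess at the pre-shared key...")
```

Both holders of the PSK reach the same state, while a party without it diverges in `ck`, `h`, and `k`. Note that the key derived here depends only on the PSK and fixed values, which is why an e token must also be mixed in before anything is encrypted.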
Full Protocol Names
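When we use Noise to build a protocol, we "fill in the blanks" by providing a handshake pattern, an AEAD construction, a hash function, and a DH scheme. Noise prescribes a naming convention for a specified protocol: the prefix Noise followed by the names of the handshake pattern, the DH function, the cipher function, and the hash function, separated by underscores. For example:
    Noise_NN_25519_ChaChaPoly_SHA256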
This protocol name contains all the information required for Noise clients to participate in a concrete run of this protocol, giving us a nice human-readable way to specify a protocol. The initial chaining key within the handshake state machine is actually based on the full protocol name, further ensuring that both parties have the same internal model of the protocol they are running.
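As a sketch of how the name feeds into the state machine, the code below assembles a protocol name from its four components and derives the initial handshake hash and chaining key from it, using the pad-or-hash rule from the Noise specification for a 32-byte hash. The function names are hypothetical; the component names `25519`, `ChaChaPoly`, and `SHA256` are the ones the specification uses.

```python
import hashlib

def protocol_name(pattern: str, dh: str, cipher: str, hash_name: str) -> str:
    """Join the components into a full Noise protocol name."""
    return "_".join(("Noise", pattern, dh, cipher, hash_name))

def initialize_symmetric(name: str) -> tuple[bytes, bytes]:
    """Return the initial (h, ck) for a protocol that uses SHA-256."""
    raw = name.encode("ascii")
    if len(raw) <= 32:
        h = raw.ljust(32, b"\x00")          # short names are zero-padded
    else:
        h = hashlib.sha256(raw).digest()    # longer names are hashed
    return h, h                             # ck starts out equal to h

name = protocol_name("NN", "25519", "ChaChaPoly", "SHA256")
h, ck = initialize_symmetric(name)

other_h, other_ck = initialize_symmetric(
    protocol_name("NK", "25519", "ChaChaPoly", "SHA256")
)
```

Two parties that disagree about any component of the name start from different values of `h` and `ck`, so their handshake cannot produce matching keys.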
07. Noise in Production
Noise is used today in several high-profile projects:
- WhatsApp uses the "Noise Pipes" construction from the specification to perform encryption of client-server communications
- WireGuard, a modern VPN, uses the Noise IK pattern to establish encrypted channels between clients
- Slack's Nebula project, an overlay networking tool, uses Noise
- The Lightning Network uses Noise
- I2P uses Noise
08. Further Reading
- David Wong's Noise explanation, an excellent visual introduction to the Noise protocol framework
- Trevor Perrin's Noise talk at Real World Crypto 2018
- The official Noise website
- The official Noise specification, one of the more readable specs I have encountered!
- Noise Explorer, a tool that allows you to explore Noise handshake patterns as well as design your own. Noise Explorer performs some automated analysis of the security properties of various handshakes, and is also capable of generating reference implementations.
Thanks to Jordan Wright, Jeremy Erickson, Ed Marczak, and Dennis Jackson for editing and providing input on pre-release versions of this post.