Remote Code Signing Protocol¶
The remote signing protocol facilitates the cryptographic signing of messages involving 2 discrete network peers.
The peer that wants something signed is the initiator.
The peer with access to the signing key that produces cryptographic signatures is the signer.
Peers establish persistent websocket connections to a central server to enable them to speak with each other through firewalls and NATs.
Peers register an ephemeral session with the server, which is essentially a binding between 2 connected websocket clients.
Peers derive session-specific encryption keys using data mutually agreed upon ahead of time. They then relay end-to-end encrypted messages through the central server and perform cryptographic signing operations.
The protocol entails the exchange of JSON encoded objects via websockets.
The JSON objects sent from clients to the server have the following keys:
(string) (required) A unique identifier for this request.
(string) (required) The name of the API / method to invoke on the server.
(object) (optional) Parameters passed to this API invocation.
The JSON objects sent from servers to clients have the following keys:
(string) (optional) Echo of request_id from the message that generated this one. The value could be unknown to the receiver if this message was generated from the other peer in the session.
(string) (required) The message type.
(number) (optional) Integer number of seconds remaining before the session expires and will be automatically deleted by the server.
(object) (optional) Payload further describing this message.
All other fields in the top-level object are reserved for future use.
Messages sent from the client to server ALWAYS result in the server responding to that API request.
It is also possible for servers to send messages to clients asynchronously of any client-initiated message.
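As an illustration of the client-to-server envelope described above, here is a minimal Python sketch. The key names request_id and api appear elsewhere in this document; payload is an assumed name for the optional parameters key, and build_request is a hypothetical helper, not part of any real client.

```python
import json
import uuid


def build_request(api, payload=None):
    """Return a JSON-encoded client request for the given API name."""
    message = {
        "request_id": str(uuid.uuid4()),  # unique per request
        "api": api,
    }
    if payload is not None:
        # "payload" is an assumed key name for the optional parameters object.
        message["payload"] = payload
    return json.dumps(message)
```

For example, build_request("hello") yields an object with only the two required keys, since that API takes no parameters.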
Initial Connection Protocol¶
When a client connects to the server, it SHOULD issue a hello message and wait for the server’s response.
If the response contains a message of the day string, it MUST be displayed to the end-user.
Clients SHOULD also make a best-effort attempt to validate the server’s advertised capabilities, determine compatibility, and raise an error or print a warning if an incompatibility is detected.
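The client-side checks described above could look roughly like the following Python sketch. The payload key names motd and apis are assumptions made for illustration (this document does not preserve the actual key names), and check_server is a hypothetical helper.

```python
def check_server(payload, required_apis, display=print):
    """Show the message of the day and return the APIs the server lacks."""
    motd = payload.get("motd")  # assumed key name
    if motd:
        display(motd)  # the message of the day must reach the end-user
    advertised = set(payload.get("apis", []))  # assumed key name
    missing = sorted(set(required_apis) - advertised)
    for api in missing:
        display(f"warning: server does not advertise API {api!r}")
    return missing
```

A caller might treat a non-empty return value as a hard error or merely as a warning, depending on which APIs it cannot work without.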
The initiator and signer pair with each other by forming a session.
From the server’s perspective, a session is an opaque identifier string with associated state, such as the unique websocket connection IDs of the initiator and signer clients.
Sessions are ephemeral and expire automatically after a duration specified by the initiating client. (The server can impose a maximum duration to prevent service abuse.)
Sessions are generally created by the initiator.
The initiator creates a unique session ID, SessionId, which is to be randomly chosen. It SHOULD have sufficient entropy to prevent server-side collisions. The use of type 4 UUIDs for session IDs is recommended.
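A minimal Python sketch of generating a session ID as a type 4 (random) UUID. The function name new_session_id is hypothetical.

```python
import uuid


def new_session_id():
    """Return a random (version 4) UUID string for use as a session ID."""
    return str(uuid.uuid4())
```

Version 4 UUIDs carry 122 random bits, which makes accidental server-side collisions very unlikely.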
Once a server-side session is created, the initiator then shares a session join string with the signer via an out-of-band mechanism. See Session Join Strings for more.
At this point, mechanisms diverge based on the session joining mechanism employed. Generally speaking, the signer sends a join-session message to the server to register itself as the other peer in the session. Both peers then derive encryption keys and communicate with each other by issuing send-message messages. See Signing Protocol for more.
Session Join Strings¶
The initiator and signer need to leverage an out-of-band mechanism for communicating metadata with each other in order to join a server-established session. There are various potential solutions for this and we’ve purposefully designed the mechanism to be extensible.
Generically, the mechanism to join a session is expressed through a session join string, or SJS.
The SJS is ultimately a CBOR encoded array of length 2. The array’s elements are:
(string) The scheme being used.
(varied) The payload for that scheme.
But to end-users it is an opaque string.
The SJS can be encoded as:
Base64 using the RFC 3548 URL safe character set with optional
SESSION JOIN STRING as the armoring tag.
In general, the session join string is shared out-of-band with the other peer, who uses it to join the session.
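To make the SJS layout concrete, the following Python sketch encodes a 2-element CBOR array and wraps it in URL-safe base64. It hand-rolls only the small CBOR subset needed (byte strings, text strings, arrays); a real implementation would more likely use a CBOR library. The function names are hypothetical, and the output keeps standard base64 padding, which is an assumption.

```python
import base64


def _cbor_head(major, length):
    """Encode a CBOR initial byte plus its length argument (RFC 8949)."""
    if length < 24:
        return bytes([(major << 5) | length])
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < 1 << (8 * size):
            return bytes([(major << 5) | extra]) + length.to_bytes(size, "big")
    raise ValueError("length too large")


def cbor_encode(value):
    """Encode the subset of CBOR needed here: bytes, str, and lists."""
    if isinstance(value, bytes):
        return _cbor_head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _cbor_head(3, len(raw)) + raw
    if isinstance(value, list):
        return _cbor_head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_sjs(scheme, payload):
    """Encode [scheme, payload] as CBOR, then as URL-safe base64 text."""
    raw = cbor_encode([scheme, payload])
    return base64.urlsafe_b64encode(raw).decode("ascii")
```

The result is the opaque string an end-user would copy to the other peer.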
In general, session join strings are designed such that a 3rd party becoming aware of the SJS will not jeopardize the security of the current or future signing operations. However, denial of service could occur if the SJS exposes the session ID and a 3rd party joins the session before the intended peer.
The following sections denote the defined session join string schemes.
Section names are the scheme names.
The publickey0 session joining mechanism relies on public key cryptography to authenticate the 2nd peer in a session by leveraging knowledge of the 2nd peer’s public encryption key.
The initiating peer, A, MUST know the public key of the joining peer, B.
A generates a random value at least 32 bytes long, ChallengeSecret.
A generates a new RFC 7748 Curve 25519 private key. Its private /
public components are
A generates a new random 16 byte value, SharedAESKey.
A loads the public key of B, BPublic. It usually does so by extracting the X.509 SubjectPublicKeyInfo (SPKI) (RFC 5280 Section 4.1.2.7) from an X.509 certificate or a DER/PEM fragment of just the SPKI.
A prepares a plaintext message to be sent to B, AJoinPlaintext. This message is a CBOR array with the following elements:
(Index 0) (optional string) URL of the server to connect to.
(Index 1) (string) The session identifier created on the server.
(Index 2) (bytes) The content of ChallengeSecret.
(Index 3) (bytes)
A encrypts AJoinPlaintext using AES-128 in GCM with SharedAESKey, producing AJoinCiphertext. A 12 byte nonce is used where the bytes are all 0x42. The 16 byte authentication tag is appended to the raw ciphertext and constitutes the final bytes of AJoinCiphertext.
A encrypts SharedAESKey using asymmetric encryption targeting BPublic, producing SharedAESCiphertext. For RSA, OAEP padding with SHA-256 digests MUST be used.
The payload of the session join string is a CBOR array with the following elements:
(Index 0) (bytes) The SharedAESCiphertext.
(Index 1) (bytes) The SPKI describing which public key was used to encrypt SharedAESKey (BSPKI).
(Index 2) (bytes) The AJoinCiphertext.
So, the final session join string is ["publickey0", [SharedAESCiphertext, BSPKI, AJoinCiphertext]].
The session join string is then CBOR and base64 encoded and made available to B.
B receives and decodes the SJS.
B locates the decryption key from the provided SPKI structure. (
may want to impose restrictions here to prevent clients from fishing for
BPrivate, yielding back
B verifies and decrypts AJoinCiphertext using SharedAESKey, yielding AJoinPlaintext.
B generates a new RFC 7748 Curve 25519 private key,
B connects to the server and sends a
join-session message with
At this point, A and B both perform key agreement using their ephemeral X25519 private key and the public key of the other peer, each deriving SessionSharedKey.
The procedure described in AEAD Key Derivation is then used to derive new symmetric keys.
ChallengeSecret is used as the additional value to
The session join string consists of 2 discrete encrypted payloads and is generally safe against offline attacks. Unless ciphers are broken, the private key is required to obtain anything beyond side-channel information (like total payload size).
SessionId is encrypted, so compromise of the SJS can’t easily lead to a
DoS by an unwanted peer joining the session.
The server doesn’t see anything: the encrypted AES key and AES encrypted peer metadata are both encapsulated in the SJS. We could potentially move some of these to the server to reduce the length of the SJS.
Open Questions for Security Audit¶
We don’t sign / HMAC the asymmetrically encrypted AES key. Nor do we include an IV or other prepended message. This seems to go against best practices. Does it matter? Does the additional layer of AEAD feeding into the key agreement compensate for this?
Is the use of a constant nonce for the AJoinCiphertext acceptable? The AES key is randomly generated and is used exactly once, so do the nonces even matter?
Is AES-128 in GCM mode a sufficient key/cipher for encrypting the main message?
We currently generate 2 distinct private keys: 1 for key agreement and 1 for AES encryption. They are generated independently. Does this make sense or should perhaps HKDF be used against a common key?
Right now there is no explicit trust anchoring between the asymmetric encryption targeting B and the derived shared secret key. Should B produce a cryptographic signature using
A doesn’t assume that ability to decrypt authenticates B? Or is ability to decrypt along with the assumption that only
AEAD Key Derivation¶
The schemes above commonly detail the steps to enable 2 peers to mutually derive a session-ephemeral shared encryption key, SessionSharedKey.
Rather than use SessionSharedKey directly for subsequent message exchange, we instead derive additional keys from it for use with Authenticated Encryption with Associated Data (AEAD) encryption / message exchange.
An identifier value is associated with peers assuming roles A (the session initiator) and B (the session joiner). The value is a bytes concatenation of:
The role name. e.g.
A colon (:).
The SessionId identifier, UTF-8 encoded.
A colon (:).
An additional value communicated in the session join string. e.g.
These values are known as
HKDF is used to derive new keys.
Step 1 / HKDF-Extract uses an empty salt and
SessionSharedKey to produce
a pseudorandom key,
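Step 2 / HKDF-Expand is performed twice to derive 2 new keys. The first invocation uses the identifier value for role A and produces RoleAKey. The second invocation uses the identifier value for role B and produces RoleBKey. RoleAKey and RoleBKey are used to empower AEAD encryption / message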
exchange. ChaCha20+Poly1305 is used. Nonces are 12 bytes where the first 4
bytes are a little-endian u32 counter whose initial used value is
the subsequent 8 bytes are always 0. Additional authenticated data (AAD) is generally not used.
RoleAKey is used by A to encrypt messages and by B to verify/decrypt messages from A.
RoleBKey is used by B to encrypt messages and by A to verify/decrypt messages from B.
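The derivation above can be sketched in Python using only the standard library. Several details here are assumptions rather than facts from this document: SHA-256 as the HKDF hash, the literal role names "A" and "B", passing the role identifier as the HKDF-Expand info value, and a 32 byte output (the key size ChaCha20+Poly1305 requires). All function names are hypothetical, and the initial nonce counter value is left to the caller.

```python
import hashlib
import hmac
import struct


def hkdf_extract(salt, ikm):
    """HKDF-Extract (RFC 5869) with SHA-256; an empty salt means zero bytes."""
    if not salt:
        salt = bytes(hashlib.sha256().digest_size)
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk, info, length=32):
    """HKDF-Expand (RFC 5869) with SHA-256."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def role_identifier(role, session_id, extra):
    """Concatenate role name, colon, session ID, colon, and the extra value."""
    return role.encode("utf-8") + b":" + session_id.encode("utf-8") + b":" + extra


def derive_role_keys(session_shared_key, session_id, extra):
    """Return (RoleAKey, RoleBKey) derived from SessionSharedKey."""
    prk = hkdf_extract(b"", session_shared_key)
    key_a = hkdf_expand(prk, role_identifier("A", session_id, extra))
    key_b = hkdf_expand(prk, role_identifier("B", session_id, extra))
    return key_a, key_b


def message_nonce(counter):
    """12 byte nonce: little-endian u32 counter followed by 8 zero bytes."""
    return struct.pack("<I", counter) + bytes(8)
```

Because each direction has its own key, the two peers can keep independent counters without ever reusing a (key, nonce) pair, provided each sender increments its counter for every message.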
Open Questions for Security Audit¶
Is ChaCha20+Poly1305 a reasonable cipher choice? Or should we be using block ciphers (e.g. AES)?
Using a simple, easily guessable counter for nonces seems wrong. Using a random value seems more appropriate. But both parties need to know what the nonce will be. Do we use a random value for the nonce but encode the nonce in plaintext next to the exchanged ciphertext messages? Or do we need something else entirely?
We could potentially use additional authenticated data (AAD) to encapsulate more details of the request, such as the request ID. Does that buy us security benefits?
Once 2 peers have established a session and derived encryption keys to facilitate end-to-end encrypted communication, they communicate with each other using peer to peer messages by invoking the send-message API.
This process generally involves a handshake:
1. Both peers simultaneously send ping messages.
2. Upon receipt, each peer sends a pong in response. This dance confirms peer presence and that the derived encryption keys work.
3. The initiator sends a request-signing-certificate to request information about the signer’s public certificate. This is necessary in order to allow the signer to do things like estimate the sizes of signatures and to derive additional details needed for signing.
4. The signer sends a signing-certificate in response. At this point, both peers are ready to commence signing.
5. The initiator sends a sign-request.
6. The signer receives the request, assesses it, creates a cryptographic signature, and sends a signature in reply.
7. Steps 5-6 are repeated as necessary.
8. Either peer sends a goodbye to finalize the session.
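The handshake and signing loop can be summarized as an expected message sequence. The Python sketch below is purely illustrative: expected_exchange is a hypothetical helper, the ordering of the two simultaneous pings (and pongs) is arbitrary, and the final goodbye is attributed to the initiator even though either peer may send it.

```python
def expected_exchange(sign_requests):
    """List (sender, message) pairs for a session with N signing requests."""
    steps = [
        ("initiator", "ping"),
        ("signer", "ping"),
        ("initiator", "pong"),
        ("signer", "pong"),
        ("initiator", "request-signing-certificate"),
        ("signer", "signing-certificate"),
    ]
    for _ in range(sign_requests):
        steps.append(("initiator", "sign-request"))
        steps.append(("signer", "signature"))
    steps.append(("initiator", "goodbye"))  # either peer may send this
    return steps
```

A test harness could compare an observed trace against this list to catch peers that skip the certificate exchange or sign before the ping/pong check completes.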
Client Issued Messages¶
The following sections denote the types of messages issued from clients to servers.
Section names denote the value of the api key in the messages.
Greets the server and obtains information about the server.
This message type has no payload.
Servers respond to this message with a message conveying information about the server.
Requests the creation of a new session on the server.
Sent by the initiator as part of session negotiation.
(string) (required) Unique identifier to use for this session.
(number) (required) Requested session duration, in seconds.
(string) (optional) Additional context to be passed to the peer when it joins the session.
Servers SHOULD automatically expire the server-side session state after its TTL duration expires. Servers MAY close connections to connected clients when their session expires. Servers MAY impose a shorter TTL if the requested TTL is too long.
Servers respond to this message with a session-created.
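A server might apply the TTL rules above along these lines. This is a sketch under assumptions: the 3600 second cap is an arbitrary illustrative value, and both function names are hypothetical.

```python
import time

MAX_TTL_SECONDS = 3600  # hypothetical server-side cap


def effective_ttl(requested_ttl, max_ttl=MAX_TTL_SECONDS):
    """Clamp the requested session duration to the server's maximum."""
    if requested_ttl <= 0:
        raise ValueError("TTL must be positive")
    return min(requested_ttl, max_ttl)


def seconds_remaining(expires_at, now=None):
    """Integer seconds before a session expires, never negative."""
    now = time.monotonic() if now is None else now
    return max(0, int(expires_at - now))
```

The second helper produces the kind of integer the server reports to clients as the time remaining before the session is deleted.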
Attempts to join an existing session.
Sent by the signer as part of session negotiation.
(string) (required) Identifier of session to join.
(string) (optional) Additional context to pass through to the other peer.
Servers respond to this message with a session-joined.
Sends an (encrypted) message to the other peer in this session.
(string) (required) Identifier of session to use for peer lookup.
(string) (required) Base64 encoded ciphertext of an AEAD encrypted message to send to the peer.
Server implementations MUST ensure that the client issuing this request is bound to the session they are attempting to send a message to.
Servers react to this message by sending a peer-message to the other peer in the specified session.
Servers respond to this message with a message-sent.
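The server-side handling of send-message, including the binding check, might be sketched as follows. The sessions mapping (session ID to a tuple of connection IDs), the SessionError type, and route_message are all hypothetical; the message key name comes from the peer-message description in this document.

```python
class SessionError(Exception):
    """Raised when a request cannot be applied to a session."""


def route_message(sessions, session_id, sender, message):
    """Return (recipient, peer-message payload) for a send-message request."""
    peers = sessions.get(session_id)
    if peers is None:
        raise SessionError("unknown session")
    if sender not in peers:
        # The sender must be bound to the session it is addressing.
        raise SessionError("client is not bound to this session")
    others = [peer for peer in peers if peer != sender]
    if not others:
        raise SessionError("no peer has joined yet")
    # The server relays the base64 ciphertext unchanged; it cannot decrypt it.
    return others[0], {"message": message}
```

On success the server would deliver the payload to the recipient as a peer-message and reply to the sender with a message-sent.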
Indicates the client is finished and will be disconnecting.
(string) (required) Identifier of session to use for peer lookup.
(string) (optional) Reason the client is disconnecting.
Server implementations MUST ensure that the client issuing this request is bound to the session they are attempting to close.
Servers react to this message by sending a session-closed to the other peer in the specified session.
Servers respond to this message with a session-closed.
Server Sent Messages¶
The following sections denote the types of messages sent from the server to clients.
Section names denote the value of the type field in the message.
Conveys information about a server-side error.
Could be sent in reply to any API request or sent asynchronously if some error occurred (such as the peer disconnecting unexpectedly).
(string) (required) Value that uniquely identifies this error type.
(string) (required) Human readable error message.
Conveys information about the server.
Sent in reply to a hello request.
(array of strings) (required) Names of APIs that the server supports.
(string) (optional) Message of the day conveying messaging that the server operator wishes clients to know about.
Conveys the successful creation of a session.
Sent in reply to a create-session request.
Conveys the successful joining into a session.
Sent in reply to a join-session request.
Sent asynchronously by servers in response to a join-session issued by the joining peer.
(string) (optional) Data from the peer required to finish initializing the session.
If this message was sent in reply to a join-session, the value will be from the initiating peer.
If this message was sent to the pre-existing peer in reaction to a join-session, the value will be from the joining peer.
Conveys the successful sending of a message to the session peer.
Sent in reply to a send-message request.
Delivers an (encrypted) message from the peer in this session.
Sent asynchronously by servers in response to a send-message issued by the other peer in a session.
(string) (required) Base64 encoded AEAD message.
Conveys that the session has been finalized and can no longer be used.
Sent in reply to a goodbye request as well as asynchronously to the peer in its session.
(string) (optional) Provides further context on why the session was closed.
Peer to Peer Messages¶
The message field denotes a base64 encoded AEAD encrypted message. The message consists of the ciphertext with the authentication tag appended. The plaintext of these messages is the JSON encoding of an object having the following keys:
(string) (required) The message type. This is a message namespace distinct from server-sent messages.
(object) (optional) Payload for this message.
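The layering of a peer-to-peer message (JSON plaintext, then AEAD, then base64) can be sketched as below. The key names type and payload are assumptions, as is the use of the standard base64 alphabet. The AEAD step is passed in as a callable because the Python standard library has no ChaCha20+Poly1305 implementation; seal is expected to return the ciphertext with the authentication tag appended, and open_ to reverse it. Both function names are hypothetical.

```python
import base64
import json


def encode_peer_message(message_type, payload, seal):
    """JSON-encode a peer message, seal it, and base64 encode the result."""
    body = {"type": message_type}  # assumed key name
    if payload is not None:
        body["payload"] = payload  # assumed key name
    plaintext = json.dumps(body).encode("utf-8")
    sealed = seal(plaintext)  # expected to return ciphertext || tag
    return base64.b64encode(sealed).decode("ascii")


def decode_peer_message(message, open_):
    """Reverse encode_peer_message using the matching open function."""
    plaintext = open_(base64.b64decode(message))
    return json.loads(plaintext.decode("utf-8"))
```

The returned string is what a client would place in the message field of a send-message request.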
The following sections denote the types of peer-to-peer messages. The section names denote the value for the message type.
Check on the status of the peer.
Receivers should send a pong in response.
Respond to a status check from a peer.
Sent in response to a ping message.
Requests the peer to send it information about its signing certificate.
Receivers should send a signing-certificate in response.
Should only be sent by the initiator.
Describes the signing certificate(s) being used by the signer.
Sent in response to a request-signing-certificate.
(array of objects) (required) Contains a list of signing certificates that will potentially be used.
Each entry is an object described below.
Today, there is likely a single certificate in this array. We’ve left the door open for supporting the use of multiple signing certificates in the future.
Each entry in the certificates array is an object with the following keys:
(string) (required) Base64 encoded DER of the public X.509 certificate.
(array of strings) (optional) Base64 encoded DER of additional public X.509 certificates in the signing chain for this certificate.
Requests the cryptographic signing of a message.
(string) (required) Base64 encoded message to be signed.
Conveys the cryptographic signature over a message.
Sent in response to a sign-request.
(string) (required) Base64 encoded message that was signed.
(string) (required) Base64 encoded signature data.
(string) (required) Base64 encoded DER encoding of OID denoting the signature algorithm.
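Since the signature algorithm arrives as a base64 encoded DER OID, a receiver needs to decode it before comparing it against known algorithms. The following Python sketch handles the short-form DER length only, which covers typical algorithm OIDs; decode_der_oid is a hypothetical helper, and standard base64 is assumed.

```python
import base64


def decode_der_oid(encoded):
    """Decode a base64 DER OBJECT IDENTIFIER into dotted-decimal form."""
    der = base64.b64decode(encoded)
    if len(der) < 3 or der[0] != 0x06:
        raise ValueError("not a DER OBJECT IDENTIFIER")
    if der[1] >= 0x80 or der[1] != len(der) - 2:
        raise ValueError("unsupported or inconsistent length")
    if der[-1] & 0x80:
        raise ValueError("truncated arc")
    arcs, value = [], 0
    for byte in der[2:]:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # a clear high bit marks the last byte of an arc
            arcs.append(value)
            value = 0
    first = arcs[0]
    # The first two arcs are packed together as 40 * arc1 + arc2.
    if first < 40:
        head = (0, first)
    elif first < 80:
        head = (1, first - 40)
    else:
        head = (2, first - 80)
    return ".".join(str(arc) for arc in (*head, *arcs[1:]))
```

For instance, the DER bytes 06 09 2a 86 48 86 f7 0d 01 01 0b decode to 1.2.840.113549.1.1.11, the OID for sha256WithRSAEncryption.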