
codex32: Checksummed SSSS-aware BIP32 seeds

Leon Olsson Curr and Pearlwort Sneed · Updated Mar 29, 2026

Specification

  BIP: 93
  Layer: Applications
  Title: codex32: Checksummed SSSS-aware BIP32 seeds
  Authors: Leon Olsson Curr and Pearlwort Sneed 
           Andrew Poelstra 
  Status: Draft
  Type: Informational
  Assigned: 2023-02-13
  License: BSD-3-Clause
  Discussion: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2023-February/021469.html

Introduction

Abstract

This document proposes a checksummed base32 format, "codex32", and a standard for backing up and restoring the master seed of a BIP-0032 hierarchical deterministic wallet using it. It includes an encoding format, a BCH error-correcting checksum, and optional Shamir's secret sharing algorithms for share generation and secret recovery. Secret data can be encoded directly, or split into up to 31 shares. A minimum threshold of shares, which can be between 2 and 9, is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.

Copyright

This document is licensed under the 3-clause BSD license.

Motivation

BIP-0032 master seed data is the source entropy used to derive all private keys in an HD wallet. Safely storing this secret data is the hardest and most important part of self-custody. However, there is a tension between security, which demands limiting the number of backups, and resilience, which demands widely replicated backups. Encrypting the seed does not change this fundamental tradeoff, since it leaves essentially the same problem of how to back up the encryption key(s).

To allow users freedom to make this tradeoff, we use Shamir's secret sharing, which guarantees that any number of shares less than the threshold leaks no information about the secret. This approach allows increasing safety by widely distributing the generated shares, while also providing security against the compromise of one or more shares (as long as fewer than the threshold have been compromised).

SLIP-0039 has essentially the same motivations as this standard. However, unlike SLIP-0039,

  • this standard aims to be simple enough for hand computation
  • we use the bech32 alphabet rather than a word list, resulting in fixed-length compact encodings
  • we do not support multi-level secret sharing (splitting of shares), although it is technically possible and may be added in a future BIP
  • because of the need to support hand computation, we do not support passphrases or key hardening

Users who demand a higher level of security for particular secrets, or have a general distrust in digital electronic devices, have the option of using hand computation to back up and restore secret data in an interoperable manner. In particular, all computations can be done with simple lookup tables. It is therefore possible to compute and verify checksums, and to split and recover seeds, entirely using pen and paper. For long-lived rarely-used seeds, the ability to hand-verify checksums has a significant benefit even for users who do not care to do any other part of this process by hand. It means that they can verify the integrity (against non-malicious tampering) of their shares regularly, say, on an annual basis, without needing to continually expose secret data to new hardware.

The ability to compute properties by hand comes from our choice of a small field and our use of linear error correcting codes. It does not come with any reduction in security, as long as users use high-quality randomness. Note that hand computation is optional, the particular details of hand computation are outside the scope of this standard, and implementers do not need to be concerned with this possibility.

BIP-0039 serves the same purpose as this standard: encoding master seeds for storage by users. However, BIP-0039 has no error-correcting ability, cannot sensibly be extended to support secret sharing, has no support for versioning or other metadata, and has many technical design decisions that make implementation and interoperability difficult (for example, the use of SHA-512 to derive seeds, or the use of 11-bit words).

Specification

We first describe the general checksummed base32 format called codex32 and then define a BIP-0032 master seed encoding using it.

codex32

A codex32 string is similar to a bech32 string defined in BIP-0173. It reuses the base-32 character set from BIP-0173, and consists of:

  • A human-readable part, which is the string "ms" (or "MS").
  • A separator, which is always "1".
  • A data part which is in turn subdivided into:
      • A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
          • If the threshold parameter is "0" then the share index, defined below, MUST have a value of "s" (or "S").
      • An identifier consisting of 4 bech32 characters.
      • A share index, which is any bech32 character. Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section "Unshared Secret").
      • A payload which is a sequence of up to 74 bech32 characters. (However, see Long codex32 Strings below for an exception to this limit.)
      • A checksum which consists of 13 bech32 characters as described below.

As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase. Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes. In particular, given an all uppercase codex32 string, we still use lowercase ms as the human-readable part during checksum construction. For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings. If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
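
The layout and case rules above can be illustrated with a small parser. This is only a sketch: `split_codex32` and the dictionary keys are hypothetical names introduced here, the checksum is not verified, and long codex32 strings are not handled.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet from BIP-0173


def split_codex32(s):
    """Split a short codex32 string into labelled parts (hypothetical helper).

    Only the layout is checked; the checksum is NOT verified here.
    """
    if s != s.lower() and s != s.upper():
        raise ValueError("mixed case is not allowed")
    s = s.lower()  # the lowercase form is used for all further processing
    hrp, sep, data = s.rpartition("1")
    if sep != "1" or hrp != "ms":
        raise ValueError("expected human-readable part 'ms' and separator '1'")
    if any(ch not in CHARSET for ch in data):
        raise ValueError("data part contains a non-bech32 character")
    if len(data) < 1 + 4 + 1 + 13:
        raise ValueError("data part is too short")
    threshold, identifier, index = data[0], data[1:5], data[5]
    if threshold not in "023456789":
        raise ValueError("threshold must be '0' or a digit from '2' to '9'")
    if threshold == "0" and index != "s":
        raise ValueError("threshold '0' requires share index 's'")
    payload, checksum = data[6:-13], data[-13:]
    if len(payload) > 74:
        raise ValueError("long codex32 strings are not handled by this sketch")
    return {
        "hrp": hrp,
        "threshold": threshold,
        "identifier": identifier,
        "share_index": index,
        "payload": payload,
        "checksum": checksum,
    }


# Illustrative uppercase input; the trailing 13 characters are a placeholder,
# not a valid checksum.
example = "MS1" + "0" + "TEST" + "S" + "X" * 26 + "Q" * 13
parts = split_codex32(example)
```

Because the whole string is lowercased before splitting, an uppercase string (as recommended for handwriting and QR codes) yields the same parts as its lowercase form.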

Checksum

The last thirteen characters of the data part form a checksum and contain no information. Valid strings MUST pass the criteria for validity specified by the Python 3 code snippet below. The function ms32_verify_checksum must return true when its argument is the data part as a list of integers representing the characters converted using the bech32 character table from BIP-0173.

To construct a valid checksum given the data-part characters (excluding the checksum), the ms32_create_checksum function can be used.

MS32_CONST = 0x10ce0795c2fd1e62a

def ms32_polymod(values):
    GEN = [
        0x19dc500ce73fde210,
        0x1bfae00def77fe529,
        0x1fbd920fffe7bee52,
        0x1739640bdeee3fdad,
        0x07729a039cfc75f5a,
    ]
    residue = 0x23181b3
    for v in values:
        b = (residue >> 60)
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

def ms32_verify_checksum(data):
    if len(data) >= 96:  # See Long codex32 Strings
        return ms32_verify_long_checksum(data)
    if len(data) <= 93:
        return ms32_polymod(data) == MS32_CONST
    return False

def ms32_create_checksum(data):
    if len(data) > 80:  # See Long codex32 Strings
        return ms32_create_long_checksum(data)
    values = data
    polymod = ms32_polymod(values + [0] * 13) ^ MS32_CONST
    return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]

This implements a BCH code that guarantees detection of any error affecting at most 8 characters and has less than a 3 in 10^20 chance of failing to detect random errors affecting more characters.
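
As a rough illustration of the checksum in use, the sketch below builds a checksum for a made-up data part, confirms that it verifies, and shows that a single changed character is detected. The example data and the variable names (`data`, `valid`, `corrupted`, `undetected_bound`) are assumptions made for this sketch; only short strings are covered.

```python
MS32_CONST = 0x10ce0795c2fd1e62a
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet from BIP-0173
GEN = [
    0x19dc500ce73fde210,
    0x1bfae00def77fe529,
    0x1fbd920fffe7bee52,
    0x1739640bdeee3fdad,
    0x07729a039cfc75f5a,
]


def ms32_polymod(values):
    """Polymod for short codex32 strings, as given in the specification."""
    residue = 0x23181b3
    for v in values:
        b = residue >> 60
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue


# Hypothetical data part: threshold "0", identifier "test", index "s",
# followed by 26 payload characters.
data = [CHARSET.index(c) for c in "0tests" + "x" * 26]

# Create the 13-character checksum and append it.
residue = ms32_polymod(data + [0] * 13) ^ MS32_CONST
checksum = [(residue >> 5 * (12 - i)) & 31 for i in range(13)]
valid = data + checksum
ok = ms32_polymod(valid) == MS32_CONST

# Change one character; verification now fails.
corrupted = list(valid)
corrupted[10] ^= 1
detected = ms32_polymod(corrupted) != MS32_CONST

# A 13-character checksum carries 65 bits, so a uniformly random string
# passes with probability 32**-13 (about 2.7e-20).
undetected_bound = 1 / 32 ** 13
```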

Error Correction

A codex32 string without a valid checksum MUST NOT be used. The checksum is designed to be an error correcting code that can correct up to 4 character substitutions, up to 8 unreadable characters (called erasures), or up to 13 consecutive erasures. Implementations SHOULD provide the user with a corrected valid codex32 string if possible. However, implementations SHOULD NOT automatically proceed with a corrected codex32 string without user confirmation of the corrected string, either by prompting the user, or returning a corrected string in an error message and allowing the user to repeat their action. We do not specify how an implementation should implement error correction. However, we recommend that:

  • Implementations make suggestions to substitute non-bech32 characters with bech32 characters in some situations, such as replacing "B" with "8", "O" with "0", "I" with "l", etc.
  • Implementations interpret "?" as an erasure.
  • Implementations optionally interpret other non-bech32 characters, or characters with incorrect case, as erasures.
  • If a string with 8 or fewer erasures can have those erasures filled in to make a valid codex32 string, then the implementation suggests such a string as a correction.
  • If a string consisting of valid bech32 characters in the proper case can be made valid by substituting 4 or fewer characters, then the implementation suggests such a string as a correction.
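
The standard does not prescribe a correction algorithm. Purely as an illustration, the sketch below fills in a single "?" erasure by trying all 32 characters. It is a naive brute-force approach, not BCH decoding, and it does not scale to the 8 erasures or 4 substitutions the code can handle. `fill_single_erasure` and the example data are hypothetical, and only short strings are covered.

```python
MS32_CONST = 0x10ce0795c2fd1e62a
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet from BIP-0173
GEN = [
    0x19dc500ce73fde210,
    0x1bfae00def77fe529,
    0x1fbd920fffe7bee52,
    0x1739640bdeee3fdad,
    0x07729a039cfc75f5a,
]


def ms32_polymod(values):
    """Polymod for short codex32 strings, as given in the specification."""
    residue = 0x23181b3
    for v in values:
        b = residue >> 60
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue


def fill_single_erasure(data_part):
    """Return every completion of exactly one '?' that passes the checksum."""
    if data_part.count("?") != 1:
        raise ValueError("this sketch handles exactly one erasure")
    pos = data_part.index("?")
    candidates = []
    for ch in CHARSET:
        trial = data_part[:pos] + ch + data_part[pos + 1:]
        if ms32_polymod([CHARSET.index(c) for c in trial]) == MS32_CONST:
            candidates.append(trial)
    return candidates


# Build a valid data part from made-up contents, then damage one character.
values = [CHARSET.index(c) for c in "0tests" + "x" * 26]
residue = ms32_polymod(values + [0] * 13) ^ MS32_CONST
values += [(residue >> 5 * (12 - i)) & 31 for i in range(13)]
valid = "".join(CHARSET[v] for v in values)
damaged = valid[:8] + "?" + valid[9:]
suggestions = fill_single_erasure(damaged)
```

In line with the recommendation above, the returned candidate should be shown to the user for confirmation rather than used silently.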

Unshared Secret

When the share index of a valid codex32 string (converted to lowercase) is the letter "s", we call the string a codex32 secret.

The secret is decoded by converting the payload to bytes:

  • Translate the characters to 5-bit values using the bech32 character table from BIP-0173, most significant bit first.
  • Re-arrange those bits into groups of 8 bits. Any incomplete group at the end MUST be 4 bits or less, and is discarded.

Note that unlike the decoding process in BIP-0173, we do NOT require that the incomplete group be all zeros.
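
The two decoding steps can be sketched as follows. `payload_to_bytes` is a hypothetical helper name, and the bit-string approach is chosen for readability rather than speed.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet from BIP-0173


def payload_to_bytes(payload):
    """Convert payload characters to bytes, discarding trailing padding bits."""
    # Each character contributes 5 bits, most significant bit first.
    bits = "".join(format(CHARSET.index(ch), "05b") for ch in payload.lower())
    leftover = len(bits) % 8
    if leftover > 4:
        raise ValueError("incomplete trailing group is longer than 4 bits")
    usable = len(bits) - leftover  # the leftover bits are dropped, whatever their value
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


# 26 characters carry 130 bits: 16 whole bytes plus 2 discarded bits.
seed = payload_to_bytes("x" * 26)
# Same seed: the final character differs only in the discarded bits.
same_seed = payload_to_bytes("x" * 25 + "y")
```

Both calls return the same 16 bytes, which shows that the discarded bits need not be zero.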

For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid). We recommend using the digit "0" for the threshold parameter in this case. The 4-character identifier also has no effect beyond helping users distinguish between secrets when they have more than one.

The function ms32_encode constructs a codex32 string when its argument is the converted data-part characters (excluding the checksum).

To validate a codex32 string and determine the data-part (excluding the checksum) as a list of 5-bit values, the ms32_decode function can be used.

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def ms32_encode(data):
    combined = data + ms32_create_checksum(data)
    return "ms" + "1" + ''.join([CHARSET[d] for d in combined])

def ms32_decode(codex):
    # Reject non-printable characters and mixed-case strings.
    if ((any(ord(x) < 33 or ord(x) > 126 for x in codex)) or
            (codex.lower() != codex and codex.upper() != codex)):
        return None
    codex = codex.lower()
    pos = codex.rfind("1")
    if pos < 2 or not (48 <= len(codex) <= 127):
        return None
    if not all(x in CHARSET for x in codex[pos+1:]):
        return None
    # Check the human-readable part, the threshold digit, and the "0"/"s" rule.
    if (codex[:pos] != "ms" or codex[pos+1].isalpha()
            or (codex[pos+1] == "0" and codex[pos+6] != "s")):
        return None
    data = [CHARSET.index(x) for x in codex[pos+1:]]
    if not ms32_verify_checksum(data):
        return None
    return data[:-13 if len(data) < 94 else -15]  # See Long codex32 Strings

Master seed format

When the human-readable part of a valid codex32 secret (converted to lowercase) is the string "ms", we call it a codex32-encoded master seed or secret seed. The payload in this case is a direct encoding of a BIP-0032 HD master seed.

A secret seed is a codex32 encoding of:

  • The human-readable part "ms" for master seed.
  • The data-part values:
      • A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
      • An identifier consisting of 4 bech32 characters.
          • We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed and share set the user may need to disambiguate.
      • The share index "s".
      • A conversion of the 16-to-64-byte BIP-0032 HD master seed to bech32:
          • Start with the bits of the master seed, most significant bit per byte first.
          • Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
          • Translate those bits to characters using the bech32 character table from BIP-0173.
      • A valid checksum in accordance with the Checksum section.
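
The seed-to-payload conversion above can be sketched as follows. `seed_to_payload` and its `pad_bit` parameter are hypothetical; the standard allows arbitrary padding bits, and a constant pad bit is just one simple choice made for this sketch. The threshold, identifier, share index, and checksum still have to be added around the payload.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet from BIP-0173


def seed_to_payload(seed, pad_bit="0"):
    """Convert a 16-to-64-byte master seed to codex32 payload characters."""
    if not 16 <= len(seed) <= 64:
        raise ValueError("master seed must be 16 to 64 bytes long")
    # Bits of the seed, most significant bit of each byte first.
    bits = "".join(format(b, "08b") for b in seed)
    # Pad to a multiple of 5 bits; the pad value is arbitrary (assumption: constant).
    bits += pad_bit * (-len(bits) % 5)
    return "".join(CHARSET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


seed = bytes.fromhex("318c6318c6318c6318c6318c6318c631")
payload_zero_pad = seed_to_payload(seed)               # padded with "00"
payload_one_pad = seed_to_payload(seed, pad_bit="1")   # padded with "11"
```

The two payloads differ only in their final character, and both decode to the same seed because the padding bits are discarded on decoding.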

Recovering Secret

When the share index of a valid codex32 string (converted to lowercase) is not the letter "s", we call the string a codex32 share. The first character of the data part indicates the threshold of the share, and it is required to be a non-"0" digit.

In order to recover a secret, one needs a set of valid shares such that:

  • All shares have the same threshold value, the same identifier, and the same length.
  • All of the share index values are distinct.
  • The number of shares is exactly equal to the (common) threshold value.

If all the above conditions are satisfied, the ms32_recover function will return a codex32 secret when its argument is the list of codex32 shares with each share represented as a list of integers representing the characters converted using the bech32 character table from BIP-0173.

BECH32_INV = [
    0, 1, 20, 24, 10, 8, 12, 29, 5, 11, 4, 9, 6, 28, 26, 31,
    22, 18, 17, 23, 2, 25, 16, 19, 3, 21, 14, 30, 13, 7, 27, 15,
]

def bech32_mul(a, b):
    res = 0
    for i in range(5):
        res ^= a if ((b >> i) & 1) else 0
        a *= 2
        a ^= 41 if (32 <= a) else 0
    return res

def bech32_lagrange(l, x):
    n = 1
    c = []
    for i in l:
        n = bech32_mul(n, i ^ x)
        m = 1
        for j in l:
            m = bech32_mul(m, (x if i == j else i) ^ j)
        c.append(m)
    return [bech32_mul(n, BECH32_INV[i]) for i in c]

def ms32_interpolate(l, x):
    w = bech32_lagrange([s[5] for s in l], x)
    res = []
    for i in range(len(l[0])):
        n = 0
        for j in range(len(l)):
            n ^= bech32_mul(w[j], l[j][i])
        res.append(n)
    return res

def ms32_recover(shares):
    return ms32_interpolate(shares, 16)
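
A small round trip may make the interpolation easier to follow. The sketch below uses threshold 2 with made-up data: it derives a share with index "c" from a secret and a share with index "a", then recovers the secret from the two shares alone. The input strings and the helper `to_values` are hypothetical, and checksums are left out for brevity (interpolation works character by character, so they would be carried along in the same way).

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet from BIP-0173
BECH32_INV = [
    0, 1, 20, 24, 10, 8, 12, 29, 5, 11, 4, 9, 6, 28, 26, 31,
    22, 18, 17, 23, 2, 25, 16, 19, 3, 21, 14, 30, 13, 7, 27, 15,
]


def bech32_mul(a, b):
    """Multiply two elements of the 32-element bech32 field."""
    res = 0
    for i in range(5):
        res ^= a if ((b >> i) & 1) else 0
        a *= 2
        a ^= 41 if (32 <= a) else 0
    return res


def bech32_lagrange(indices, x):
    """Lagrange weights for evaluating at x, given the known share indices."""
    n = 1
    c = []
    for i in indices:
        n = bech32_mul(n, i ^ x)
        m = 1
        for j in indices:
            m = bech32_mul(m, (x if i == j else i) ^ j)
        c.append(m)
    return [bech32_mul(n, BECH32_INV[i]) for i in c]


def ms32_interpolate(shares, x):
    """Interpolate every character position of the shares at index x."""
    w = bech32_lagrange([s[5] for s in shares], x)
    res = []
    for i in range(len(shares[0])):
        n = 0
        for j in range(len(shares)):
            n ^= bech32_mul(w[j], shares[j][i])
        res.append(n)
    return res


def to_values(s):
    """Hypothetical helper: data-part characters to 5-bit values."""
    return [CHARSET.index(ch) for ch in s]


# Made-up inputs: same threshold "2", same identifier "test", same length.
secret = to_values("2tests" + "x" * 26)
share_a = to_values("2testa" + CHARSET[:26])

# Derive a new share at index "c", then recover the secret (index "s" = 16).
share_c = ms32_interpolate([secret, share_a], CHARSET.index("c"))
recovered = ms32_interpolate([share_a, share_c], CHARSET.index("s"))
```

Note that the threshold and identifier characters come out unchanged in the derived share, and its share index character is the requested index, because those positions are interpolated like any other.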

Generating Shares

If we already have k valid codex32 strings such that:

  • All strings have the same threshold value k, the same identifier, and the same length
  • All of the share index values are distinct
Then we can derive additional shares with the ms32_interpolate function by passing it a list of exactly k of these codex32 strings, together with a fresh share index distinct from all of the existing share indexes. The newly derived share will

[Content truncated; view full spec at source]
