← Back to BitTorrent Enhancement Proposals
BEP 42specificationDraftaddresseskey-managementdhtp2prouting

DHT Security extension

DHT Security Extension

No reviews
Arvid Norberg·Updated Mar 29, 2026·0 reviews·0 attestations·View source
Collections:BEPs — Merged

Specification

DHT Security Extension

The purpose of this extension is to make it harder to launch a few

specific attacks against the BitTorrent DHT and also to make it harder

to snoop the network.

Specifically the attack this extension intends to make harder is launching

8 or more DHT nodes with node-IDs selected close to a specific target

info-hash, in order to become the main nodes hosting peers for it. Currently

this is very easy to do and lets the attacker not only see all the traffic

related to this specific info-hash but also block access to it by other

peers.

The proposed guard against this is to enforce restrictions on which node-ID

a node can choose, based on its external IP address.

Considerations

One straight forward scheme to tie the node ID to an IP would be to hash

the IP and force the node ID to share the prefix of that hash. One main

drawback of this approach is that an entities control over the DHT key

space grows linearly with its control over the IP address space.

In order to successfully launch an attack, you just need to find 8 IPs

whose hash will be closest to the target info-hash. Given the current

size of the DHT, that is quite likely to be possible by anyone in control

of a /8 IP block.

The size of the DHT is approximately 8.4 million nodes. This is estimated

by observing that a typical routing table typically has about 20 of its

top routing table buckets full. That means the key space is dense enough

to contain 8 nodes for every combination of the 20 top bits of node IDs.

2^20 * 8 = 8388608

By controlling that many IP addresses, an attacker could snoop any info-hash.

By controlling 8 times that many IP addresses, an attacker could actually

take over any info-hash.

With IPv4, snooping would require a /8 IP block, giving access to 16.7 million

Ips.

Another problem with hashing the IP is that multiple users behind a NAT are

forced to run their DHT nodes on the same node ID.

Node ID restriction

In order to avoid the number node IDs controlled to grow linearly by the number

of IPs, as well as allowing more than one node ID per external IP, the node

ID can be restricted at each class level of the IP.

Another important property of the restriction put on node IDs is that the

distribution of the IDs remain uniform. This is why CRC32C (Castagnoli) was

chosen as the hash function.

The expression to calculate a valid ID prefix (from an IPv4 address) is

	crc32c((ip & 0x030f3fff) | (r << 29))

And for an IPv6 address (ip is the high 64 bits of the address)

	crc32c((ip & 0x0103070f1f3f7fff) | (r << 61))

r is a random number in the range [0, 7]. The resulting integer,

representing the masked IP address is supposed to be big-endian before

hashed. The "|" operator means bit-wise OR.

The details of implementing this is to evaluate the expression, store the

result in a big endian 64 bit integer and hash those 8 bytes with CRC32C.

The first (most significant) 21 bits of the node ID used in the DHT MUST

match the first 21 bits of the resulting hash. The last byte of the hash MUST

match the random number (r) used to generate the hash.

image:
image:

Example code code for calculating a valid node ID

	uint8_t* ip; // our external IPv4 or IPv6 address (network byte order)
	int num_octets; // the number of octets to consider in ip (4 or 8)
	uint8_t node_id[20]; // resulting node ID

	uint8_t v4_mask[] = { 0x03, 0x0f, 0x3f, 0xff };
	uint8_t v6_mask[] = { 0x01, 0x03, 0x07, 0x0f, 0x1f, 0x3f, 0x7f, 0xff };
	uint8_t* mask = num_octets == 4 ? v4_mask : v6_mask;

	for (int i = 0; i < num_octets; ++i)
		ip[i] &= mask[i];

	uint32_t rand = std::rand() & 0xff;
	uint8_t r = rand & 0x7;
	ip[0] |= r << 5;

	uint32_t crc = 0;
	crc = crc32c(crc, ip, num_octets);

	// only take the top 21 bits from crc
	node_id[0] = (crc >> 24) & 0xff;
	node_id[1] = (crc >> 16) & 0xff;
	node_id[2] = ((crc >> 8) & 0xf8) | (std::rand() & 0x7);
	for (int i = 3; i < 19; ++i) node_id[i] = std::rand();
	node_id[19] = rand;

test vectors:

.. parsed-literal

	IP           rand  example node ID
	============ ===== ==========================================
	124.31.75.21   1   **5fbfbf** f10c5d6a4ec8a88e4c6ab4c28b95eee4 **01**
	21.75.31.124  86   **5a3ce9** c14e7a08645677bbd1cfe7d8f956d532 **56**
	65.23.51.170  22   **a5d432** 20bc8f112a3d426c84764f8c2a1150e6 **16**
	84.124.73.14  65   **1b0321** dd1bb1fe518101ceef99462b947a01ff **41**
	43.213.53.83  90   **e56f6c** bf5b7c4be0237986d5243b87aa6d5130 **5a**

The bold parts of the node ID are the important parts. The rest are

random numbers. The last bold number of each row has only its most significant

bit pulled from the CRC32C function. The lower 3 bits are random.

Bootstrapping

In order to set ones initial node ID, the external IP needs to be known. This

is not a trivial problem. With this extension, all DHT responses SHOULD include

a top-level field called ip, containing a compact binary representation of

the requestor's IP and port. That is big endian IP followed by 2 bytes of big endian

port.

The IP portion is the same byte sequence used to verify the node ID.

It is important that the ip field is in the top level dictionary. Nodes that

enforce the node-ID will respond with an error message ("y": "e", "e": { ... }),

whereas a node that supports this extension but without enforcing it will respond

with a normal reply ("y": "r", "r": { ... }).

A DHT node which receives an ip result in a request SHOULD consider restarting

its DHT node with a new node ID, taking this IP into account. Since a single node

can not be trusted, there should be some mechanism to determine whether or

not the node has a correct understanding of its external IP or not. This could

be done by voting, or only restart the DHT once at least a certain number of

nodes, from separate searches, tells you your node ID is incorrect.

Rationale

The choice of using CRC32C instead of a more traditional cryptographic hash

function is justified primarily of these reasons:

1. it is a fast function

2. produces well distributed results

3. there is no need for the hash function to be one-way (the input set is

so small that any hash function could be reversed).

4. CRC32C (Castagnoli) is supported in hardware by SSE 4.2, which can

significantly speed up computation

There are primarily two tests run on SHA-1 and CRC32C to establish the

distribution of results. The first one is the number of bits in the output

set that contain every possible combination of bits. The CRC32C function

has a longer such prefix in its output than SHA-1. This means nodes will still

have well uniformly distributed IDs, even when IP addresses in use are not

uniformly distributed.

The following graph illustrate a few different hash functions with regard

to this property.

image:

This test takes into account IP addresses that are not globally routable, i.e.

reserved for local networks, multicast and other things. It also takes into

account that some /8 blocks are not in use by end-users and extremely unlikely

to ever run a DHT node. This makes the results likely to be very similar to

what we would see in the wild.

These results indicate that CRC32C provides the best uniformity in the results

in terms of bit prefixes where all possibilities are represented, and that

no more than 21 bits should be used from the result. If more than 21 bits

were used, there would be certain node IDs that would be impossible to

have, which would make routing sub-optimal.

The second test is a sanity test for the uniform distribution property.

The target space (32 bit integer) is divided up into 1000 buckets. Every valid

IP and r input is run through the algorithm and the result is put in the

bucket it falls in. The expectation is that each bucket has roughly an equal

number of results falling into it. The following graph shows the resulting

histogram, comparing SHA-1 and CRC32C.

image:

The source code for these tests can be found here_.

The reason to use CRC32C instead of the CRC32 implemented by zlib is that

Intel CPUs have hardware support for the CRC32C calculations. The input

being exactly 4 bytes is also deliberate, to make it fit in a single

instruction.

Enforcement

Once enforced, responses to get_peers requests whose node ID does not match its

external IP should be considered to not contain a token and thus not be

eligible as storage target. Implementations should take care that they find the

closest set of nodes which return a token and whose IDs matches their IPs

before sending a store request to those nodes. In the original DHT specification

only announce_peer stores data in the network, but any future extension

which stores data in the network SHOULD use the same restriction.

Non-compliant nodes also must not be considered to have sent a valid response

when checking the termination condition of a lookup. For example, if a lookup

stops sending requests after good responses have been received from the closest

8 peers then non-compliant nodes must not be included in that count.

Any peer on a local network address is exempt from this node ID verification.

This includes the following IP blocks:

10.0.0.0/8

reserved for local networks

172.16.0.0/12

reserved for local networks

192.168.0.0/16

reserved for local networks

169.254.0.0/16

reserved for self-assigned IPs

127.0.0.0/8

reserved for loopback

Backwards Compatibility and Transition

During some transition period, this restriction should not be enforced, and

peers whose node ID does not match this formula relative to their external IP

should not be blocked.

Requests from peers whose node ID does not match their external IP should

always be serviced, even after the transition period. The attack this protects

from is storing data on an attacker's node, not servicing an attackers request.

Even after the transition period, enforcing nodes MAY also announce to

non-compliant nodes as long as they also announce to the closest set of

compliant nodes.

Forward Compatibility

If the total size of the DHT grows to the point where the inherent size limit

in this proposal is too small, the modulus constants can be updated in a new

proposal, and another transition period where both sets of modulus constants

are accepted.

Discussion (0 threads)

Loading discussions...