← Back to BitTorrent Enhancement Proposals
BEP 30specificationDraftp2p

Merkle hash torrent extension

BitTorrent requires a torrent file containing a cryptographic digest of every piece of the content to allow the verification of pieces during the download. Large torrent files put a strain on the Web servers distributing them, and cannot be directly included in RSS feeds or gossiped around. A related problem is the use of large piece sizes. To keep the size of a torrent file small (as to not over

No reviews
Arno Bakker·Updated Mar 29, 2026·0 reviews·0 attestations·View source
Collections:BEPs — Merged

Specification

Abstract

BitTorrent requires a torrent file containing a cryptographic digest of

every piece of the content to allow the verification of pieces during the

download. Large torrent files put a strain on the Web servers distributing

them, and cannot be directly included in RSS feeds or gossiped around.

A related problem is the use of large piece sizes. To keep the size of a

torrent file small (as to not overload the Web servers) the number of hashes

for a content file is being kept small. For large files this implies that the

piece size over which digests are calculated must go up (up to 2MB pieces are

used). The large piece sizes affect the ability of peers to barter pieces.

Only when a piece has been completely received and verified using the digest

may it be traded with other peers. This means that it may be some time

before a node starts bartering with others.

Our solution to these two problems is to replace the list of digests with a

single Merkle hash [1]_. A Merkle hash can be used to verify the integrity

of the total content file as well as the individual blocks via a hierarchical

scheme. It works by constructing a hash tree of the content and using just

the root hash as data integrity protection. The simple root hash value also

allows for smaller piece sizes to be used. A common form of hash trees is the

Merkle hash tree, hence the name.

Simple Merkle Hashes

We propose a minimalistic design that does not affect the existing BitTorrent

protocol and clients very much. The design is backwards compatible in the

sense that clients supporting the Simple Merkle Hash extension can still be

made to process regular torrent files easily.

>From the content we construct a hash tree as follows. Given a piece size,

we calculate the hashes of all the pieces in the set of content files. Next,

we create a binary tree of sufficient height. Sufficient height means that the

lowest level in the tree has enough nodes to hold all piece hashes in the set.

We place all piece hashes in the tree, starting at the left-most leaf, see

figure. The remaining leaves in the tree are assigned a filler hash value of

0 (see Discussion). Finally, we calculate the hash values of the higher levels

in the tree, by concatenating the hash values of the two children (again left

to right) and computing the hash of that aggregate. This process ends in a

hash value for the root node, which we call the root hash. The hashing

algorithm used is SHA1, as in normal torrents.

The root hash along with the total size of the content-file set and the piece

size are now the only information in the system that needs to come from a

trusted source. A client that has only the root hash of a file set can check

any piece as follows (see figure). It first calculates the hash of the piece

it received. Along with this piece it should have received the hashes of the

piece's sibling and of its uncles, that is the sibling Y of its parent X,

and the uncle of that Y until the root is reached (uncles are marked with \*

in the figure). Using this information the client recalculates the root hash

of the tree, and compares it to the root hash it received from the trusted

source.

                                          0* = root hash
                                       /     \
                                   /            \
                               /                   \
                           /                          \
                       /                                 \
                     1*                                     2
                    / \                                    / \
                  /     \                                /     \
                /         \                            /         \
              /             \                        /             \
            /                 \                    /                 \
           3                   4                  5                   6* = uncle
          / \                 / \                / \                 / \
         /   \               /   \              /   \               /   \
        /     \             /     \            /     \             /     \
      7         8         9        10        11        12*       13        14 
     / \       / \       / \       / \       / \       / \       / \       / \
   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30
   
   P0   P1   P2   P3   P4   P5   P6   P7   P8*  P9*  P10  P11  P12   X    X    X
   = piece index                            =    =                   = filler hash 
                                            p    s                   
                                            i    i                   
                                            e    b                   
                                            c    l
                                            e    i
                                                 n
                                                 g

Inclusion in BitTorrent

The original publisher of the content-file set creates a so-called *Merkle

torrent* which is a torrent file that contains a root hash key in its

info part instead of a pieces key, see BEP 3 [#BEP-3]_.

When a seeder starts it uses the information in the Merkle torrent and the

file set to reconstruct the hash tree and registers itself with the tracker

using the hash value of the info part of the Merkle torrent, as usual

(see Discussion).

A BitTorrent client that supports the Simple Merkle Hash extension must also

support the Extension protocol (BEP 10) [#BEP-10]_. In particular, it must add

a Tr_hashpiece message name in the m field of the Extension

protocol's handshake message. Such a client must not send piece messages

but must use Extension protocol messages with type Tr_hashpiece to send

pieces.

A Tr_hashpiece message consists of an index, begin, hashlist and piece.

The hashlist consists of the piece's own hash, the piece's sibling hash, and

the uncles of the piece up until and including the root hash (see above and

Discussion). In particular, the hashlist is a list of 2-element lists. The

first element denotes the node offset in the tree, the second element is the

hash value. The node offset is the number of the node when numbered in a

breadth-first fashion (i.e., going left to right starting at the top).

Only the Tr_hashpiece message with begin field equal to 0 must contain a

filled hashlist, for all other begin values the hashlist must be empty. In

other words, the message containing the first subpiece should have a filled

hashlist, subsequent subpieces should not.

Formally, a Tr_hashpiece message has the following payload:

1. 4-byte index

2. 4-byte begin

3. 4-byte length of bencoded hashlist

4. the bencoded hashlist

5. the subpiece data

Upon receipt of a Tr_hashpiece message, the receiver recomputes the root

hash using the hashlist and compares it to the root hash in the Merkle

torrent. If they match, all the hash values are saved in the receiver's own

hash tree, such that they can be passed on to others when the piece is

downloaded from this receiver. When all subpieces have come in, the piece is

checked using the hash from the hash tree.

Discussion

We chose a binary tree for simplicity. Trees with larger degrees are also

possible. However, the number of hashes that need to be sent with each

piece is already small at about 2log of the file-set size.

Using the hash of the info part for registering at the tracker means

that for a given content-file set, the swarm that use a conventional torrent

file and the swarm that uses a Merkle torrent will be disjunct. The infohash

value is different, hence the swarms are known under different identifiers at

the trackers.

In theory we can create one swarm. In that swarm, new clients could serve

pieces to old clients. For the new clients to benefit from the old clients,

however, we need to add some way for the new to obtain the hashes required to

check a piece. Designing a fool proof solution for this problem is not

trivial.

Because we let the initial seeders recalculate the hash tree, this

extension is incompatible with the proposed HTTP Seeding extensions in

BEP 17 [#BEP-17]_ and 19 [#BEP-19]_ .

Including the root hash in a Tr_hashpiece message allows a quick sanity

check.

This extension paves the way for BitTorrent URLs. The only information

required for a client to commence sharing are the root hash, the total size,

the piece size, and a source of peer addresses (tracker, DHT).

Acknowledgements

Development of this extension was supported by funding from:

  • BSIK Freeband Communication I-Share project (Dutch Ministry of Economic
  • Affairs)

  • The European Community's Seventh Framework Programme in the P2P-Next
  • project under grant agreement no 216217.

    Thanks to Olaf van der Spek and Johan Pouwelse for ideas and suggestions.

    References

    .. [1] MERKLE, R. A Digital Signature Based on a Conventional Encryption

    Function. In Proceedings CRYPTOâ™87 (Santa Barbara, CA, USA, Aug. 1987),

    C. Pomerance, Ed., no. 293 in Lecture Notes in Computer Science,

    Springer-Verlag, pp. 369â“378.

    .. [#BEP-3] BEP_0003. The BitTorrent Protocol Specification, Cohen

    (http://www.bittorrent.org/beps/bep_0003.html)

    .. [#BEP-10] BEP_0010. Extension Protocol, Norberg, Strigeus, Hazel

    (http://www.bittorrent.org/beps/bep_0010.html)

    .. [#BEP-17] BEP_0017. HTTP Seeding, Hoffman

    (http://www.bittorrent.org/beps/bep_0017.html)

    .. [#BEP-19] BEP_0019. WebSeed - HTTP/FTP Seeding (GetRight style), Burford

    (http://www.bittorrent.org/beps/bep_0019.html)

    Copyright

    This document has been placed in the public domain.

    ..

    Local Variables:

    mode: indented-text

    indent-tabs-mode: nil

    sentence-end-double-space: t

    fill-column: 70

    coding: utf-8

    End:

    Discussion (0 threads)

    Loading discussions...