Merkle hash torrent extension
BitTorrent requires a torrent file containing a cryptographic digest of every piece of the content to allow the verification of pieces during the download. Large torrent files put a strain on the Web servers distributing them, and cannot be directly included in RSS feeds or gossiped around. A related problem is the use of large piece sizes. To keep the size of a torrent file small (as to not over
No reviewsSpecification
Abstract
BitTorrent requires a torrent file containing a cryptographic digest of
every piece of the content to allow the verification of pieces during the
download. Large torrent files put a strain on the Web servers distributing
them, and cannot be directly included in RSS feeds or gossiped around.
A related problem is the use of large piece sizes. To keep the size of a
torrent file small (as to not overload the Web servers) the number of hashes
for a content file is being kept small. For large files this implies that the
piece size over which digests are calculated must go up (up to 2MB pieces are
used). The large piece sizes affect the ability of peers to barter pieces.
Only when a piece has been completely received and verified using the digest
may it be traded with other peers. This means that it may be some time
before a node starts bartering with others.
Our solution to these two problems is to replace the list of digests with a
single Merkle hash [1]_. A Merkle hash can be used to verify the integrity
of the total content file as well as the individual blocks via a hierarchical
scheme. It works by constructing a hash tree of the content and using just
the root hash as data integrity protection. The simple root hash value also
allows for smaller piece sizes to be used. A common form of hash trees is the
Merkle hash tree, hence the name.
Simple Merkle Hashes
We propose a minimalistic design that does not affect the existing BitTorrent
protocol and clients very much. The design is backwards compatible in the
sense that clients supporting the Simple Merkle Hash extension can still be
made to process regular torrent files easily.
>From the content we construct a hash tree as follows. Given a piece size,
we calculate the hashes of all the pieces in the set of content files. Next,
we create a binary tree of sufficient height. Sufficient height means that the
lowest level in the tree has enough nodes to hold all piece hashes in the set.
We place all piece hashes in the tree, starting at the left-most leaf, see
figure. The remaining leaves in the tree are assigned a filler hash value of
0 (see Discussion). Finally, we calculate the hash values of the higher levels
in the tree, by concatenating the hash values of the two children (again left
to right) and computing the hash of that aggregate. This process ends in a
hash value for the root node, which we call the root hash. The hashing
algorithm used is SHA1, as in normal torrents.
The root hash along with the total size of the content-file set and the piece
size are now the only information in the system that needs to come from a
trusted source. A client that has only the root hash of a file set can check
any piece as follows (see figure). It first calculates the hash of the piece
it received. Along with this piece it should have received the hashes of the
piece's sibling and of its uncles, that is the sibling Y of its parent X,
and the uncle of that Y until the root is reached (uncles are marked with \*
in the figure). Using this information the client recalculates the root hash
of the tree, and compares it to the root hash it received from the trusted
source.
0* = root hash
/ \
/ \
/ \
/ \
/ \
1* 2
/ \ / \
/ \ / \
/ \ / \
/ \ / \
/ \ / \
3 4 5 6* = uncle
/ \ / \ / \ / \
/ \ / \ / \ / \
/ \ / \ / \ / \
7 8 9 10 11 12* 13 14
/ \ / \ / \ / \ / \ / \ / \ / \
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
P0 P1 P2 P3 P4 P5 P6 P7 P8* P9* P10 P11 P12 X X X
= piece index = = = filler hash
p s
i i
e b
c l
e i
n
g
Inclusion in BitTorrent
The original publisher of the content-file set creates a so-called *Merkle
torrent* which is a torrent file that contains a root hash key in its
info part instead of a pieces key, see BEP 3 [#BEP-3]_.
When a seeder starts it uses the information in the Merkle torrent and the
file set to reconstruct the hash tree and registers itself with the tracker
using the hash value of the info part of the Merkle torrent, as usual
(see Discussion).
A BitTorrent client that supports the Simple Merkle Hash extension must also
support the Extension protocol (BEP 10) [#BEP-10]_. In particular, it must add
a Tr_hashpiece message name in the m field of the Extension
protocol's handshake message. Such a client must not send piece messages
but must use Extension protocol messages with type Tr_hashpiece to send
pieces.
A Tr_hashpiece message consists of an index, begin, hashlist and piece.
The hashlist consists of the piece's own hash, the piece's sibling hash, and
the uncles of the piece up until and including the root hash (see above and
Discussion). In particular, the hashlist is a list of 2-element lists. The
first element denotes the node offset in the tree, the second element is the
hash value. The node offset is the number of the node when numbered in a
breadth-first fashion (i.e., going left to right starting at the top).
Only the Tr_hashpiece message with begin field equal to 0 must contain a
filled hashlist, for all other begin values the hashlist must be empty. In
other words, the message containing the first subpiece should have a filled
hashlist, subsequent subpieces should not.
Formally, a Tr_hashpiece message has the following payload:
1. 4-byte index
2. 4-byte begin
3. 4-byte length of bencoded hashlist
4. the bencoded hashlist
5. the subpiece data
Upon receipt of a Tr_hashpiece message, the receiver recomputes the root
hash using the hashlist and compares it to the root hash in the Merkle
torrent. If they match, all the hash values are saved in the receiver's own
hash tree, such that they can be passed on to others when the piece is
downloaded from this receiver. When all subpieces have come in, the piece is
checked using the hash from the hash tree.
Discussion
We chose a binary tree for simplicity. Trees with larger degrees are also
possible. However, the number of hashes that need to be sent with each
piece is already small at about 2log of the file-set size.
Using the hash of the info part for registering at the tracker means
that for a given content-file set, the swarm that use a conventional torrent
file and the swarm that uses a Merkle torrent will be disjunct. The infohash
value is different, hence the swarms are known under different identifiers at
the trackers.
In theory we can create one swarm. In that swarm, new clients could serve
pieces to old clients. For the new clients to benefit from the old clients,
however, we need to add some way for the new to obtain the hashes required to
check a piece. Designing a fool proof solution for this problem is not
trivial.
Because we let the initial seeders recalculate the hash tree, this
extension is incompatible with the proposed HTTP Seeding extensions in
BEP 17 [#BEP-17]_ and 19 [#BEP-19]_ .
Including the root hash in a Tr_hashpiece message allows a quick sanity
check.
This extension paves the way for BitTorrent URLs. The only information
required for a client to commence sharing are the root hash, the total size,
the piece size, and a source of peer addresses (tracker, DHT).
Acknowledgements
Development of this extension was supported by funding from:
Affairs)
project under grant agreement no 216217.
Thanks to Olaf van der Spek and Johan Pouwelse for ideas and suggestions.
References
.. [1] MERKLE, R. A Digital Signature Based on a Conventional Encryption
Function. In Proceedings CRYPTOâ87 (Santa Barbara, CA, USA, Aug. 1987),
C. Pomerance, Ed., no. 293 in Lecture Notes in Computer Science,
Springer-Verlag, pp. 369â378.
.. [#BEP-3] BEP_0003. The BitTorrent Protocol Specification, Cohen
(http://www.bittorrent.org/beps/bep_0003.html)
.. [#BEP-10] BEP_0010. Extension Protocol, Norberg, Strigeus, Hazel
(http://www.bittorrent.org/beps/bep_0010.html)
.. [#BEP-17] BEP_0017. HTTP Seeding, Hoffman
(http://www.bittorrent.org/beps/bep_0017.html)
.. [#BEP-19] BEP_0019. WebSeed - HTTP/FTP Seeding (GetRight style), Burford
(http://www.bittorrent.org/beps/bep_0019.html)
Copyright
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
Discussion (0 threads)
Loading discussions...