uTorrent transport protocol
uTorrent Transport Protocol
No reviewsSpecification
uTorrent Transport Protocol
credits
The uTorrent transport protocol was designed by Ludvig Strigeus,
Greg Hazel, Stanislav Shalunov, Arvid Norberg and Bram Cohen.
rationale
The motivation for uTP is for BitTorrent clients to not disrupt
internet connections, while still utilizing the unused bandwidth
fully.
The problem is that DSL and cable modems typically have a send
buffer disproportional to their max send rate, which can hold several
seconds worth of packets. BitTorrent traffic is typically background
transfers, and should have lower priority than checking email,
phone calls and browsing the web, but when using regular TCP
connections BitTorrent quickly fills up the send buffer, adding
multiple seconds delay to all interactive traffic.
The fact that BitTorrent uses multiple TCP connections gives it an
unfair advantage when competing with other services for bandwidth,
which exaggerates the effect of BitTorrent filling the upload pipe.
The reason for this is because TCP distributes the available bandwidth
evenly across connections, and the more connections one application
uses, the larger share of the bandwidth it gets.
The traditional solution to this problem is to cap the upload rate
of the BitTorrent client to 80% of the up-link capacity. 80% leaves
some head room for interactive traffic.
The main drawbacks with this solution are:
- The user needs to configure his/her BitTorrent client, it won't
work out-of-the-box.
- The user needs to know his/her internet connection's upload
capacity. This capacity may change, especially on laptops that
may connect to a large number of different networks.
- The headroom of 20% is arbitrary and wastes bandwidth. Whenever
there is no interactive traffic competing with BitTorrent, the
extra 20% are wasted. Whenever there is competing interactive
traffic, it cannot use more than 20% of the capacity.
uTP solves this problem by using the modem queue size as a controller
for its send rate. When the queue grows too large, it throttles back.
This lets it utilize the full upload capacity when there is no
competition for it, and it lets it throttle back to virtually nothing
when there is a lot of interactive traffic.
overview
This document assumes some knowledge of how TCP and window based
congestion control works.
uTP is a transport protocol layered on top of UDP. As such, it must
(and has the ability to) implement its own congestion control.
The main difference compared to TCP is the delay based congestion
control. See the congestion control section.
Like TCP, uTP uses window based congestion control. Each socket
has a max_window which determines the maximum number of bytes
the socket may have in-flight at any given time. Any packet that
has been sent, but not yet acked, is considered to be in-flight.
The number of bytes in-flight is cur_window.
A socket may only send a packet if cur_window + packet_size
is less than or equal to min(max_window, wnd_size).
The packet size may vary, see the packet sizes section.
wnd_size is the advertised window from the other end. It sets
an upper limit on the number of packets in-flight.
An implementation MAY violate the above rule if the max_window
is smaller than the packet size, and it paces the packets so that
the average cur_window is less than or equal to max_window.
Each socket keeps a state for the last delay measurement from the
other endpoint (reply_micro). Whenever a packet is received,
this state is updated by subtracting timestamp_microseconds
from the hosts current time, in microseconds (see header format).
Every time a packet is sent, the sockets reply_micro value is
put in the timestamp_difference_microseconds field of the packet
header.
Unlike TCP, sequence numbers and ACKs in uTP refers to packets, not
bytes. This means uTP cannot repackage data when resending it.
Each socket keeps a state of the next sequence number to use when
sending a packet, seq_nr. It also keeps a state of the sequence
number that was last received, ack_nr. The oldest unacked packet
is seq_nr - cur_window.
header format
version 1 header:
0 4 8 16 24 32 +-------+-------+---------------+---------------+---------------+ | type | ver | extension | connection_id | +-------+-------+---------------+---------------+---------------+ | timestamp_microseconds | +---------------+---------------+---------------+---------------+ | timestamp_difference_microseconds | +---------------+---------------+---------------+---------------+ | wnd_size | +---------------+---------------+---------------+---------------+ | seq_nr | ack_nr | +---------------+---------------+---------------+---------------+
All fields are in network byte order (big endian).
version
This is the protocol version. The current version is 1.
connection_id
This is a random, unique, number identifying all the packets that belong to
the same connection. Each socket has one connection ID for sending packets and
a different connection ID for receiving packets. The endpoint initiating the
connection decides which ID to use, and the return path has the same ID + 1.
timestamp_microseconds
This is the 'microseconds' parts of the timestamp of when this packet was sent.
This is set using gettimeofday() on posix and QueryPerformanceTimer()
on windows. The higher resolution this timestamp has, the better. The closer
to the actual transmit time it is set, the better.
timestamp_difference_microseconds
This is the difference between the local time and the timestamp in the last
received packet, at the time the last packet was received. This is the latest
one-way delay measurement of the link from the remote peer to the local machine.
When a socket is newly opened and doesn't have any delay samples yet, this must
be set to 0.
wnd_size
Advertised receive window. This is 32 bits wide and specified in bytes.
The window size is the number of bytes currently in-flight, i.e. sent but not
acked. The advertised receive window lets the other end cap the window size
if it cannot receive any faster, if its receive buffer is filling up.
When sending packets, this should be set to the number of bytes left in the
socket's receive buffer.
extension
The type of the first extension in a linked list of extension headers. 0 means
no extension.
There is currently one extension:
1. Selective acks
Extensions are linked, just like TCP options. If the extension field is non-zero,
immediately following the uTP header are two bytes
0 8 16 +---------------+---------------+ | extension | len | +---------------+---------------+
where extension specifies the type of the next extension in the linked list,
0 terminates the list. And len specifies the number of bytes of this extension.
Unknown extensions can be skipped by simply advancing len bytes.
Selective ACK
,,,,,,,,,,,,,
Selective ACK is an extension that can selectively ACK packets non-sequentially.
Its payload is a bitmask of at least 32 bits, in multiples of 32 bits. Each bit
represents one packet in the send window. Bits that are outside of the send window
are ignored. A set bit specifies that packet has been received, a cleared bit
specifies that the packet has not been received. The header looks like this
0 8 16 +---------------+---------------+---------------+---------------+ | extension | len | bitmask +---------------+---------------+---------------+---------------+ | +---------------+---------------+
Note that the len field of extensions refer to bytes, which in this extension
must be at least 4, and in multiples of 4.
The selective ACK is only sent when at least one sequence number was skipped in
the received stream. The first bit in the mask therefore represents ack_nr + 2.
ack_nr + 1 is assumed to have been dropped or be missing when this packet was sent.
A set bit represents a packet that has been received, a cleared bit represents
a packet that has not yet been received.
The bitmask has reverse byte order. The first byte represents packets [ack_nr + 2,
ack_nr + 2 + 7] in reverse order. The least significant bit in the byte represents
ack_nr + 2, the most significant bit in the byte represents ack_nr + 2 + 7. The
next byte in the mask represents [ack_nr + 2 + 8, ack_nr + 2 + 15] in reverse order,
and so on. The bitmask is not limited to 32 bits but can be of any size.
Here is the layout of a bitmask representing the first 32 packet acks
represented in a selective ACK bitfield
0 8 16 +---------------+---------------+---------------+---------------+ | 9 8 ... 3 2 | 17 ... 10 | 25 ... 18 | 33 ... 26 | +---------------+---------------+---------------+---------------+
The number in the diagram maps the bit in the bitmask to the offset to add to
ack_nr in order to calculate the sequence number that the bit is ACKing.
type
The type field describes the type of packet.
It can be one of:
ST_DATA = 0
regular data packet. Socket is in connected state and has data
to send. An ST_DATA packet always has a data payload.
ST_FIN = 1
Finalize the connection. This is the last packet. It closes
the connection, similar to TCP FIN flag.
This connection will never have a sequence number greater than
the sequence number in this packet. The socket records this sequence
number as eof_pkt. This lets the socket wait for packets that
might still be missing and arrive out of order even after receiving
the ST_FIN packet.
ST_STATE = 2
State packet. Used to transmit an ACK with no data. Packets that don't
include any payload do not increase the seq_nr.
ST_RESET = 3
Terminate connection forcefully. Similar to TCP RST flag. The remote
host does not have any state for this connection. It is stale and should
be terminated.
ST_SYN = 4
Connect SYN. Similar to TCP SYN flag, this packet initiates a connection.
The sequence number is initialized to 1. The connection ID is initialized
to a random number. The syn packet is special, all subsequent packets sent
on this connection (except for re-sends of the ST_SYN) are sent with the
connection ID + 1. The connection ID is what the other end is expected to
use in its responses.
When receiving an ST_SYN, the new socket should be initialized with the
ID in the packet header. The send ID for the socket should be initialized
to the ID + 1. The sequence number for the return channel is initialized
to a random number. The other end expects an ST_STATE packet (only an ACK)
in response.
seq_nr
This is the sequence number of this packet. As opposed to TCP, uTP sequence
numbers are not referring to bytes, but packets. The sequence number tells the
other end in which order packets should be served back to the application layer.
ack_nr
This is the sequence number the sender of the packet last received in the other
direction.
connection setup
Here is a diagram illustrating the exchanges and states to initiate
a connection. The c. refers to a state in the socket itself, pkt.
refers to a field in the packet header.
initiating endpoint accepting endpoint | c.state = CS_SYN_SENT | | c.seq_nr = 1 | | c.conn_id_recv = rand() | | c.conn_id_send = c.conn_id_recv + 1 | | | | | | ST_SYN | | seq_nr=c.seq_nr++ | | ack_nr=* | | conn_id=c.rcv_conn_id | | >-------------------------------------------> | | c.receive_conn_id = pkt.conn_id+1 | | c.send_conn_id = pkt.conn_id | | c.seq_nr = rand() | | c.ack_nr = pkt.seq_nr | | c.state = CS_SYN_RECV | | | | | | | | | | ST_STATE | | seq_nr=c.seq_nr++ | | ack_nr=c.ack_nr | | conn_id=c.send_conn_id | | <------------------------------------------< | | c.state = CS_CONNECTED | | c.ack_nr = pkt.seq_nr | | | | | | | | ST_DATA | | seq_nr=c.seq_nr++ | | ack_nr=c.ack_nr | | conn_id=c.conn_id_send | | >-------------------------------------------> | | c.ack_nr = pkt.seq_nr | | c.state = CS_CONNECTED | | | | | connection established .. ..|.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..|.. .. | | | ST_DATA | | seq_nr=c.seq_nr++ | | ack_nr=c.ack_nr | | conn_id=c.send_conn_id | | <------------------------------------------< | | c.ack_nr = pkt.seq_nr | | | | | V V
Connections are identified by their conn_id header. If the connection ID of a new
connection collides with an existing connection, the connection attempt will fails, since
the ST_SYN packet will be unexpected in the existing stream, and ignored.
packet loss
If the packet with sequence number (seq_nr - cur_window) has not been acked
(t
[Content truncated — view full spec at source]
Discussion (0 threads)
Loading discussions...