732 lines
27 KiB
Plaintext
732 lines
27 KiB
Plaintext
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Network Working Group E. Blanton
|
|||
|
|
Request for Comments: 3517 Purdue University
|
|||
|
|
Category: Standards Track M. Allman
|
|||
|
|
BBN/NASA GRC
|
|||
|
|
K. Fall
|
|||
|
|
Intel Research
|
|||
|
|
L. Wang
|
|||
|
|
University of Kentucky
|
|||
|
|
April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
A Conservative Selective Acknowledgment (SACK)-based
|
|||
|
|
Loss Recovery Algorithm for TCP
|
|||
|
|
|
|||
|
|
Status of this Memo
|
|||
|
|
|
|||
|
|
This document specifies an Internet standards track protocol for the
|
|||
|
|
Internet community, and requests discussion and suggestions for
|
|||
|
|
improvements. Please refer to the current edition of the "Internet
|
|||
|
|
Official Protocol Standards" (STD 1) for the standardization state
|
|||
|
|
and status of this protocol. Distribution of this memo is unlimited.
|
|||
|
|
|
|||
|
|
Copyright Notice
|
|||
|
|
|
|||
|
|
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
|||
|
|
|
|||
|
|
Abstract
|
|||
|
|
|
|||
|
|
This document presents a conservative loss recovery algorithm for TCP
|
|||
|
|
that is based on the use of the selective acknowledgment (SACK) TCP
|
|||
|
|
option. The algorithm presented in this document conforms to the
|
|||
|
|
spirit of the current congestion control specification (RFC 2581),
|
|||
|
|
but allows TCP senders to recover more effectively when multiple
|
|||
|
|
segments are lost from a single flight of data.
|
|||
|
|
|
|||
|
|
Terminology
|
|||
|
|
|
|||
|
|
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
|||
|
|
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
|
|||
|
|
document are to be interpreted as described in BCP 14, RFC 2119
|
|||
|
|
[RFC2119].
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 1]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
1 Introduction
|
|||
|
|
|
|||
|
|
This document presents a conservative loss recovery algorithm for TCP
|
|||
|
|
that is based on the use of the selective acknowledgment (SACK) TCP
|
|||
|
|
option. While the TCP SACK [RFC2018] is being steadily deployed in
|
|||
|
|
the Internet [All00], there is evidence that hosts are not using the
|
|||
|
|
SACK information when making retransmission and congestion control
|
|||
|
|
decisions [PF01]. The goal of this document is to outline one
|
|||
|
|
straightforward method for TCP implementations to use SACK
|
|||
|
|
information to increase performance.
|
|||
|
|
|
|||
|
|
[RFC2581] allows advanced loss recovery algorithms to be used by TCP
|
|||
|
|
[RFC793] provided that they follow the spirit of TCP's congestion
|
|||
|
|
control algorithms [RFC2581, RFC2914]. [RFC2582] outlines one such
|
|||
|
|
advanced recovery algorithm called NewReno. This document outlines a
|
|||
|
|
loss recovery algorithm that uses the SACK [RFC2018] TCP option to
|
|||
|
|
enhance TCP's loss recovery. The algorithm outlined in this
|
|||
|
|
document, heavily based on the algorithm detailed in [FF96], is a
|
|||
|
|
conservative replacement of the fast recovery algorithm [Jac90,
|
|||
|
|
RFC2581]. The algorithm specified in this document is a
|
|||
|
|
straightforward SACK-based loss recovery strategy that follows the
|
|||
|
|
guidelines set in [RFC2581] and can safely be used in TCP
|
|||
|
|
implementations. Alternate SACK-based loss recovery methods can be
|
|||
|
|
used in TCP as implementers see fit (as long as the alternate
|
|||
|
|
algorithms follow the guidelines provided in [RFC2581]). Please
|
|||
|
|
note, however, that the SACK-based decisions in this document (such
|
|||
|
|
as what segments are to be sent at what time) are largely decoupled
|
|||
|
|
from the congestion control algorithms, and as such can be treated as
|
|||
|
|
separate issues if so desired.
|
|||
|
|
|
|||
|
|
2 Definitions
|
|||
|
|
|
|||
|
|
The reader is expected to be familiar with the definitions given in
|
|||
|
|
[RFC2581].
|
|||
|
|
|
|||
|
|
The reader is assumed to be familiar with selective acknowledgments
|
|||
|
|
as specified in [RFC2018].
|
|||
|
|
|
|||
|
|
For the purposes of explaining the SACK-based loss recovery algorithm
|
|||
|
|
we define four variables that a TCP sender stores:
|
|||
|
|
|
|||
|
|
"HighACK" is the sequence number of the highest byte of data that
|
|||
|
|
has been cumulatively ACKed at a given point.
|
|||
|
|
|
|||
|
|
"HighData" is the highest sequence number transmitted at a given
|
|||
|
|
point.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 2]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
"HighRxt" is the highest sequence number which has been
|
|||
|
|
retransmitted during the current loss recovery phase.
|
|||
|
|
|
|||
|
|
"Pipe" is a sender's estimate of the number of bytes outstanding
|
|||
|
|
in the network. This is used during recovery for limiting the
|
|||
|
|
sender's sending rate. The pipe variable allows TCP to use a
|
|||
|
|
fundamentally different congestion control than specified in
|
|||
|
|
[RFC2581]. The algorithm is often referred to as the "pipe
|
|||
|
|
algorithm".
|
|||
|
|
|
|||
|
|
For the purposes of this specification we define a "duplicate
|
|||
|
|
acknowledgment" as a segment that arrives with no data and an
|
|||
|
|
acknowledgment (ACK) number that is equal to the current value of
|
|||
|
|
HighACK, as described in [RFC2581].
|
|||
|
|
|
|||
|
|
We define a variable "DupThresh" that holds the number of duplicate
|
|||
|
|
acknowledgments required to trigger a retransmission. Per [RFC2581]
|
|||
|
|
this threshold is defined to be 3 duplicate acknowledgments.
|
|||
|
|
However, implementers should consult any updates to [RFC2581] to
|
|||
|
|
determine the current value for DupThresh (or method for determining
|
|||
|
|
its value).
|
|||
|
|
|
|||
|
|
Finally, a range of sequence numbers [A,B] is said to "cover"
|
|||
|
|
sequence number S if A <= S <= B.
|
|||
|
|
|
|||
|
|
3 Keeping Track of SACK Information
|
|||
|
|
|
|||
|
|
For a TCP sender to implement the algorithm defined in the next
|
|||
|
|
section it must keep a data structure to store incoming selective
|
|||
|
|
acknowledgment information on a per connection basis. Such a data
|
|||
|
|
structure is commonly called the "scoreboard". The specifics of the
|
|||
|
|
scoreboard data structure are out of scope for this document (as long
|
|||
|
|
as the implementation can perform all functions required by this
|
|||
|
|
specification).
|
|||
|
|
|
|||
|
|
Note that this document refers to keeping account of (marking)
|
|||
|
|
individual octets of data transferred across a TCP connection. A
|
|||
|
|
real-world implementation of the scoreboard would likely prefer to
|
|||
|
|
manage this data as sequence number ranges. The algorithms presented
|
|||
|
|
here allow this, but require arbitrary sequence number ranges to be
|
|||
|
|
marked as having been selectively acknowledged.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 3]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
4 Processing and Acting Upon SACK Information
|
|||
|
|
|
|||
|
|
For the purposes of the algorithm defined in this document the
|
|||
|
|
scoreboard SHOULD implement the following functions:
|
|||
|
|
|
|||
|
|
Update ():
|
|||
|
|
|
|||
|
|
Given the information provided in an ACK, each octet that is
|
|||
|
|
cumulatively ACKed or SACKed should be marked accordingly in the
|
|||
|
|
scoreboard data structure, and the total number of octets SACKed
|
|||
|
|
should be recorded.
|
|||
|
|
|
|||
|
|
Note: SACK information is advisory and therefore SACKed data MUST
|
|||
|
|
NOT be removed from TCP's retransmission buffer until the data is
|
|||
|
|
cumulatively acknowledged [RFC2018].
|
|||
|
|
|
|||
|
|
IsLost (SeqNum):
|
|||
|
|
|
|||
|
|
This routine returns whether the given sequence number is
|
|||
|
|
considered to be lost. The routine returns true when either
|
|||
|
|
DupThresh discontiguous SACKed sequences have arrived above
|
|||
|
|
'SeqNum' or (DupThresh * SMSS) bytes with sequence numbers greater
|
|||
|
|
than 'SeqNum' have been SACKed. Otherwise, the routine returns
|
|||
|
|
false.
|
|||
|
|
|
|||
|
|
SetPipe ():
|
|||
|
|
|
|||
|
|
This routine traverses the sequence space from HighACK to HighData
|
|||
|
|
and MUST set the "pipe" variable to an estimate of the number of
|
|||
|
|
octets that are currently in transit between the TCP sender and
|
|||
|
|
the TCP receiver. After initializing pipe to zero the following
|
|||
|
|
steps are taken for each octet 'S1' in the sequence space between
|
|||
|
|
HighACK and HighData that has not been SACKed:
|
|||
|
|
|
|||
|
|
(a) If IsLost (S1) returns false:
|
|||
|
|
|
|||
|
|
Pipe is incremented by 1 octet.
|
|||
|
|
|
|||
|
|
The effect of this condition is that pipe is incremented for
|
|||
|
|
packets that have not been SACKed and have not been determined
|
|||
|
|
to have been lost (i.e., those segments that are still assumed
|
|||
|
|
to be in the network).
|
|||
|
|
|
|||
|
|
(b) If S1 <= HighRxt:
|
|||
|
|
|
|||
|
|
Pipe is incremented by 1 octet.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 4]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
The effect of this condition is that pipe is incremented for
|
|||
|
|
the retransmission of the octet.
|
|||
|
|
|
|||
|
|
Note that octets retransmitted without being considered lost are
|
|||
|
|
counted twice by the above mechanism.
|
|||
|
|
|
|||
|
|
NextSeg ():
|
|||
|
|
|
|||
|
|
This routine uses the scoreboard data structure maintained by the
|
|||
|
|
Update() function to determine what to transmit based on the SACK
|
|||
|
|
information that has arrived from the data receiver (and hence
|
|||
|
|
been marked in the scoreboard). NextSeg () MUST return the
|
|||
|
|
sequence number range of the next segment that is to be
|
|||
|
|
transmitted, per the following rules:
|
|||
|
|
|
|||
|
|
(1) If there exists a smallest unSACKed sequence number 'S2' that
|
|||
|
|
meets the following three criteria for determining loss, the
|
|||
|
|
sequence range of one segment of up to SMSS octets starting
|
|||
|
|
with S2 MUST be returned.
|
|||
|
|
|
|||
|
|
(1.a) S2 is greater than HighRxt.
|
|||
|
|
|
|||
|
|
(1.b) S2 is less than the highest octet covered by any
|
|||
|
|
received SACK.
|
|||
|
|
|
|||
|
|
(1.c) IsLost (S2) returns true.
|
|||
|
|
|
|||
|
|
(2) If no sequence number 'S2' per rule (1) exists but there
|
|||
|
|
exists available unsent data and the receiver's advertised
|
|||
|
|
window allows, the sequence range of one segment of up to SMSS
|
|||
|
|
octets of previously unsent data starting with sequence number
|
|||
|
|
HighData+1 MUST be returned.
|
|||
|
|
|
|||
|
|
(3) If the conditions for rules (1) and (2) fail, but there exists
|
|||
|
|
an unSACKed sequence number 'S3' that meets the criteria for
|
|||
|
|
detecting loss given in steps (1.a) and (1.b) above
|
|||
|
|
(specifically excluding step (1.c)) then one segment of up to
|
|||
|
|
SMSS octets starting with S3 MAY be returned.
|
|||
|
|
|
|||
|
|
Note that rule (3) is a sort of retransmission "last resort".
|
|||
|
|
It allows for retransmission of sequence numbers even when the
|
|||
|
|
sender has less certainty a segment has been lost than as with
|
|||
|
|
rule (1). Retransmitting segments via rule (3) will help
|
|||
|
|
sustain TCP's ACK clock and therefore can potentially help
|
|||
|
|
avoid retransmission timeouts. However, in sending these
|
|||
|
|
segments the sender has two copies of the same data considered
|
|||
|
|
to be in the network (and also in the Pipe estimate). When an
|
|||
|
|
ACK or SACK arrives covering this retransmitted segment, the
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 5]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
sender cannot be sure exactly how much data left the network
|
|||
|
|
(one of the two transmissions of the packet or both
|
|||
|
|
transmissions of the packet). Therefore the sender may
|
|||
|
|
underestimate Pipe by considering both segments to have left
|
|||
|
|
the network when it is possible that only one of the two has.
|
|||
|
|
|
|||
|
|
We believe that the triggering of rule (3) will be rare and
|
|||
|
|
that the implications are likely limited to corner cases
|
|||
|
|
relative to the entire recovery algorithm. Therefore we leave
|
|||
|
|
the decision of whether or not to use rule (3) to
|
|||
|
|
implementors.
|
|||
|
|
|
|||
|
|
(4) If the conditions for each of (1), (2), and (3) are not met,
|
|||
|
|
then NextSeg () MUST indicate failure, and no segment is
|
|||
|
|
returned.
|
|||
|
|
|
|||
|
|
Note: The SACK-based loss recovery algorithm outlined in this
|
|||
|
|
document requires more computational resources than previous TCP loss
|
|||
|
|
recovery strategies. However, we believe the scoreboard data
|
|||
|
|
structure can be implemented in a reasonably efficient manner (both
|
|||
|
|
in terms of computation complexity and memory usage) in most TCP
|
|||
|
|
implementations.
|
|||
|
|
|
|||
|
|
5 Algorithm Details
|
|||
|
|
|
|||
|
|
Upon the receipt of any ACK containing SACK information, the
|
|||
|
|
scoreboard MUST be updated via the Update () routine.
|
|||
|
|
|
|||
|
|
Upon the receipt of the first (DupThresh - 1) duplicate ACKs, the
|
|||
|
|
scoreboard is to be updated as normal. Note: The first and second
|
|||
|
|
duplicate ACKs can also be used to trigger the transmission of
|
|||
|
|
previously unsent segments using the Limited Transmit algorithm
|
|||
|
|
[RFC3042].
|
|||
|
|
|
|||
|
|
When a TCP sender receives the duplicate ACK corresponding to
|
|||
|
|
DupThresh ACKs, the scoreboard MUST be updated with the new SACK
|
|||
|
|
information (via Update ()). If no previous loss event has occurred
|
|||
|
|
on the connection or the cumulative acknowledgment point is beyond
|
|||
|
|
the last value of RecoveryPoint, a loss recovery phase SHOULD be
|
|||
|
|
initiated, per the fast retransmit algorithm outlined in [RFC2581].
|
|||
|
|
The following steps MUST be taken:
|
|||
|
|
|
|||
|
|
(1) RecoveryPoint = HighData
|
|||
|
|
|
|||
|
|
When the TCP sender receives a cumulative ACK for this data octet
|
|||
|
|
the loss recovery phase is terminated.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 6]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
(2) ssthresh = cwnd = (FlightSize / 2)
|
|||
|
|
|
|||
|
|
The congestion window (cwnd) and slow start threshold (ssthresh)
|
|||
|
|
are reduced to half of FlightSize per [RFC2581].
|
|||
|
|
|
|||
|
|
(3) Retransmit the first data segment presumed dropped -- the segment
|
|||
|
|
starting with sequence number HighACK + 1. To prevent repeated
|
|||
|
|
retransmission of the same data, set HighRxt to the highest
|
|||
|
|
sequence number in the retransmitted segment.
|
|||
|
|
|
|||
|
|
(4) Run SetPipe ()
|
|||
|
|
|
|||
|
|
Set a "pipe" variable to the number of outstanding octets
|
|||
|
|
currently "in the pipe"; this is the data which has been sent by
|
|||
|
|
the TCP sender but for which no cumulative or selective
|
|||
|
|
acknowledgment has been received and the data has not been
|
|||
|
|
determined to have been dropped in the network. It is assumed
|
|||
|
|
that the data is still traversing the network path.
|
|||
|
|
|
|||
|
|
(5) In order to take advantage of potential additional available
|
|||
|
|
cwnd, proceed to step (C) below.
|
|||
|
|
|
|||
|
|
Once a TCP is in the loss recovery phase the following procedure MUST
|
|||
|
|
be used for each arriving ACK:
|
|||
|
|
|
|||
|
|
(A) An incoming cumulative ACK for a sequence number greater than
|
|||
|
|
RecoveryPoint signals the end of loss recovery and the loss
|
|||
|
|
recovery phase MUST be terminated. Any information contained in
|
|||
|
|
the scoreboard for sequence numbers greater than the new value of
|
|||
|
|
HighACK SHOULD NOT be cleared when leaving the loss recovery
|
|||
|
|
phase.
|
|||
|
|
|
|||
|
|
(B) Upon receipt of an ACK that does not cover RecoveryPoint the
|
|||
|
|
following actions MUST be taken:
|
|||
|
|
|
|||
|
|
(B.1) Use Update () to record the new SACK information conveyed
|
|||
|
|
by the incoming ACK.
|
|||
|
|
|
|||
|
|
(B.2) Use SetPipe () to re-calculate the number of octets still
|
|||
|
|
in the network.
|
|||
|
|
|
|||
|
|
(C) If cwnd - pipe >= 1 SMSS the sender SHOULD transmit one or more
|
|||
|
|
segments as follows:
|
|||
|
|
|
|||
|
|
(C.1) The scoreboard MUST be queried via NextSeg () for the
|
|||
|
|
sequence number range of the next segment to transmit (if any),
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 7]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
and the given segment sent. If NextSeg () returns failure (no
|
|||
|
|
data to send) return without sending anything (i.e., terminate
|
|||
|
|
steps C.1 -- C.5).
|
|||
|
|
|
|||
|
|
(C.2) If any of the data octets sent in (C.1) are below HighData,
|
|||
|
|
HighRxt MUST be set to the highest sequence number of the
|
|||
|
|
retransmitted segment.
|
|||
|
|
|
|||
|
|
(C.3) If any of the data octets sent in (C.1) are above HighData,
|
|||
|
|
HighData must be updated to reflect the transmission of
|
|||
|
|
previously unsent data.
|
|||
|
|
|
|||
|
|
(C.4) The estimate of the amount of data outstanding in the
|
|||
|
|
network must be updated by incrementing pipe by the number of
|
|||
|
|
octets transmitted in (C.1).
|
|||
|
|
|
|||
|
|
(C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
|
|||
|
|
|
|||
|
|
5.1 Retransmission Timeouts
|
|||
|
|
|
|||
|
|
In order to avoid memory deadlocks, the TCP receiver is allowed to
|
|||
|
|
discard data that has already been selectively acknowledged. As a
|
|||
|
|
result, [RFC2018] suggests that a TCP sender SHOULD expunge the SACK
|
|||
|
|
information gathered from a receiver upon a retransmission timeout
|
|||
|
|
"since the timeout might indicate that the data receiver has
|
|||
|
|
reneged." Additionally, a TCP sender MUST "ignore prior SACK
|
|||
|
|
information in determining which data to retransmit." However, a
|
|||
|
|
SACK TCP sender SHOULD still use all SACK information made available
|
|||
|
|
during the slow start phase of loss recovery following an RTO.
|
|||
|
|
|
|||
|
|
If an RTO occurs during loss recovery as specified in this document,
|
|||
|
|
RecoveryPoint MUST be set to HighData. Further, the new value of
|
|||
|
|
RecoveryPoint MUST be preserved and the loss recovery algorithm
|
|||
|
|
outlined in this document MUST be terminated. In addition, a new
|
|||
|
|
recovery phase (as described in section 5) MUST NOT be initiated
|
|||
|
|
until HighACK is greater than or equal to the new value of
|
|||
|
|
RecoveryPoint.
|
|||
|
|
|
|||
|
|
As described in Sections 4 and 5, Update () SHOULD continue to be
|
|||
|
|
used appropriately upon receipt of ACKs. This will allow the slow
|
|||
|
|
start recovery period to benefit from all available information
|
|||
|
|
provided by the receiver, despite the fact that SACK information was
|
|||
|
|
expunged due to the RTO.
|
|||
|
|
|
|||
|
|
If there are segments missing from the receiver's buffer following
|
|||
|
|
processing of the retransmitted segment, the corresponding ACK will
|
|||
|
|
contain SACK information. In this case, a TCP sender SHOULD use this
|
|||
|
|
SACK information when determining what data should be sent in each
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 8]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
segment of the slow start. The exact algorithm for this selection is
|
|||
|
|
not specified in this document (specifically NextSeg () is
|
|||
|
|
inappropriate during slow start after an RTO). A relatively
|
|||
|
|
straightforward approach to "filling in" the sequence space reported
|
|||
|
|
as missing should be a reasonable approach.
|
|||
|
|
|
|||
|
|
6 Managing the RTO Timer
|
|||
|
|
|
|||
|
|
The standard TCP RTO estimator is defined in [RFC2988]. Due to the
|
|||
|
|
fact that the SACK algorithm in this document can have an impact on
|
|||
|
|
the behavior of the estimator, implementers may wish to consider how
|
|||
|
|
the timer is managed. [RFC2988] calls for the RTO timer to be
|
|||
|
|
re-armed each time an ACK arrives that advances the cumulative ACK
|
|||
|
|
point. Because the algorithm presented in this document can keep the
|
|||
|
|
ACK clock going through a fairly significant loss event,
|
|||
|
|
(comparatively longer than the algorithm described in [RFC2581]), on
|
|||
|
|
some networks the loss event could last longer than the RTO. In this
|
|||
|
|
case the RTO timer would expire prematurely and a segment that need
|
|||
|
|
not be retransmitted would be resent.
|
|||
|
|
|
|||
|
|
Therefore we give implementers the latitude to use the standard
|
|||
|
|
[RFC2988] style RTO management or, optionally, a more careful variant
|
|||
|
|
that re-arms the RTO timer on each retransmission that is sent during
|
|||
|
|
recovery MAY be used. This provides a more conservative timer than
|
|||
|
|
specified in [RFC2988], and so may not always be an attractive
|
|||
|
|
alternative. However, in some cases it may prevent needless
|
|||
|
|
retransmissions, go-back-N transmission and further reduction of the
|
|||
|
|
congestion window.
|
|||
|
|
|
|||
|
|
7 Research
|
|||
|
|
|
|||
|
|
The algorithm specified in this document is analyzed in [FF96], which
|
|||
|
|
shows that the above algorithm is effective in reducing transfer time
|
|||
|
|
over standard TCP Reno [RFC2581] when multiple segments are dropped
|
|||
|
|
from a window of data (especially as the number of drops increases).
|
|||
|
|
[AHKO97] shows that the algorithm defined in this document can
|
|||
|
|
greatly improve throughput in connections traversing satellite
|
|||
|
|
channels.
|
|||
|
|
|
|||
|
|
8 Security Considerations
|
|||
|
|
|
|||
|
|
The algorithm presented in this paper shares security considerations
|
|||
|
|
with [RFC2581]. A key difference is that an algorithm based on SACKs
|
|||
|
|
is more robust against attackers forging duplicate ACKs to force the
|
|||
|
|
TCP sender to reduce cwnd. With SACKs, TCP senders have an
|
|||
|
|
additional check on whether or not a particular ACK is legitimate.
|
|||
|
|
While not fool-proof, SACK does provide some amount of protection in
|
|||
|
|
this area.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 9]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
Acknowledgments
|
|||
|
|
|
|||
|
|
The authors wish to thank Sally Floyd for encouraging this document
|
|||
|
|
and commenting on early drafts. The algorithm described in this
|
|||
|
|
document is loosely based on an algorithm outlined by Kevin Fall and
|
|||
|
|
Sally Floyd in [FF96], although the authors of this document assume
|
|||
|
|
responsibility for any mistakes in the above text. Murali Bashyam,
|
|||
|
|
Ken Calvert, Tom Henderson, Reiner Ludwig, Jamshid Mahdavi, Matt
|
|||
|
|
Mathis, Shawn Ostermann, Vern Paxson and Venkat Venkatsubra provided
|
|||
|
|
valuable feedback on earlier versions of this document. We thank
|
|||
|
|
Matt Mathis and Jamshid Mahdavi for implementing the scoreboard in ns
|
|||
|
|
and hence guiding our thinking in keeping track of SACK state.
|
|||
|
|
|
|||
|
|
The first author would like to thank Ohio University and the Ohio
|
|||
|
|
University Internetworking Research Group for supporting the bulk of
|
|||
|
|
his work on this project.
|
|||
|
|
|
|||
|
|
Normative References
|
|||
|
|
|
|||
|
|
[RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC
|
|||
|
|
793, September 1981.
|
|||
|
|
|
|||
|
|
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S. and A. Romanow, "TCP
|
|||
|
|
Selective Acknowledgment Options", RFC 2018, October 1996.
|
|||
|
|
|
|||
|
|
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision
|
|||
|
|
3", BCP 9, RFC 2026, October 1996.
|
|||
|
|
|
|||
|
|
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
|
|||
|
|
Requirement Levels", BCP 14, RFC 2119, March 1997.
|
|||
|
|
|
|||
|
|
[RFC2581] Allman, M., Paxson, V. and R. Stevens, "TCP Congestion
|
|||
|
|
Control", RFC 2581, April 1999.
|
|||
|
|
|
|||
|
|
Informative References
|
|||
|
|
|
|||
|
|
[AHKO97] Mark Allman, Chris Hayes, Hans Kruse, Shawn Ostermann. TCP
|
|||
|
|
Performance Over Satellite Links. Proceedings of the Fifth
|
|||
|
|
International Conference on Telecommunications Systems,
|
|||
|
|
Nashville, TN, March, 1997.
|
|||
|
|
|
|||
|
|
[All00] Mark Allman. A Web Server's View of the Transport Layer.
|
|||
|
|
ACM Computer Communication Review, 30(5), October 2000.
|
|||
|
|
|
|||
|
|
[FF96] Kevin Fall and Sally Floyd. Simulation-based Comparisons
|
|||
|
|
of Tahoe, Reno and SACK TCP. Computer Communication
|
|||
|
|
Review, July 1996.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 10]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
[Jac90] Van Jacobson. Modified TCP Congestion Avoidance Algorithm.
|
|||
|
|
Technical Report, LBL, April 1990.
|
|||
|
|
|
|||
|
|
[PF01] Jitendra Padhye, Sally Floyd. Identifying the TCP Behavior
|
|||
|
|
of Web Servers, ACM SIGCOMM, August 2001.
|
|||
|
|
|
|||
|
|
[RFC2582] Floyd, S. and T. Henderson, "The NewReno Modification to
|
|||
|
|
TCP's Fast Recovery Algorithm", RFC 2582, April 1999.
|
|||
|
|
|
|||
|
|
[RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC
|
|||
|
|
2914, September 2000.
|
|||
|
|
|
|||
|
|
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
|
|||
|
|
Timer", RFC 2988, November 2000.
|
|||
|
|
|
|||
|
|
[RFC3042] Allman, M., Balakrishnan, H, and S. Floyd, "Enhancing TCP's
|
|||
|
|
Loss Recovery Using Limited Transmit", RFC 3042, January
|
|||
|
|
2001.
|
|||
|
|
|
|||
|
|
Intellectual Property Rights Notice
|
|||
|
|
|
|||
|
|
The IETF takes no position regarding the validity or scope of any
|
|||
|
|
intellectual property or other rights that might be claimed to
|
|||
|
|
pertain to the implementation or use of the technology described in
|
|||
|
|
this document or the extent to which any license under such rights
|
|||
|
|
might or might not be available; neither does it represent that it
|
|||
|
|
has made any effort to identify any such rights. Information on the
|
|||
|
|
IETF's procedures with respect to rights in standards-track and
|
|||
|
|
standards-related documentation can be found in BCP-11. Copies of
|
|||
|
|
claims of rights made available for publication and any assurances of
|
|||
|
|
licenses to be made available, or the result of an attempt made to
|
|||
|
|
obtain a general license or permission for the use of such
|
|||
|
|
proprietary rights by implementors or users of this specification can
|
|||
|
|
be obtained from the IETF Secretariat.
|
|||
|
|
|
|||
|
|
The IETF invites any interested party to bring to its attention any
|
|||
|
|
copyrights, patents or patent applications, or other proprietary
|
|||
|
|
rights which may cover technology that may be required to practice
|
|||
|
|
this standard. Please address the information to the IETF Executive
|
|||
|
|
Director.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 11]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
Authors' Addresses
|
|||
|
|
|
|||
|
|
Ethan Blanton
|
|||
|
|
Purdue University Computer Sciences
|
|||
|
|
1398 Computer Science Building
|
|||
|
|
West Lafayette, IN 47907
|
|||
|
|
|
|||
|
|
EMail: eblanton@cs.purdue.edu
|
|||
|
|
|
|||
|
|
|
|||
|
|
Mark Allman
|
|||
|
|
BBN Technologies/NASA Glenn Research Center
|
|||
|
|
Lewis Field
|
|||
|
|
21000 Brookpark Rd. MS 54-5
|
|||
|
|
Cleveland, OH 44135
|
|||
|
|
|
|||
|
|
Phone: 216-433-6586
|
|||
|
|
Fax: 216-433-8705
|
|||
|
|
EMail: mallman@bbn.com
|
|||
|
|
http://roland.grc.nasa.gov/~mallman
|
|||
|
|
|
|||
|
|
|
|||
|
|
Kevin Fall
|
|||
|
|
Intel Research
|
|||
|
|
2150 Shattuck Ave., PH Suite
|
|||
|
|
Berkeley, CA 94704
|
|||
|
|
|
|||
|
|
EMail: kfall@intel-research.net
|
|||
|
|
|
|||
|
|
|
|||
|
|
Lili Wang
|
|||
|
|
Laboratory for Advanced Networking
|
|||
|
|
210 Hardymon Building
|
|||
|
|
University of Kentucky
|
|||
|
|
Lexington, KY 40506-0495
|
|||
|
|
|
|||
|
|
EMail: lwang0@uky.edu
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 12]
|
|||
|
|
|
|||
|
|
RFC 3517 SACK-based Loss Recovery for TCP April 2003
|
|||
|
|
|
|||
|
|
|
|||
|
|
Full Copyright Statement
|
|||
|
|
|
|||
|
|
Copyright (C) The Internet Society (2003). All Rights Reserved.
|
|||
|
|
|
|||
|
|
This document and translations of it may be copied and furnished to
|
|||
|
|
others, and derivative works that comment on or otherwise explain it
|
|||
|
|
or assist in its implementation may be prepared, copied, published
|
|||
|
|
and distributed, in whole or in part, without restriction of any
|
|||
|
|
kind, provided that the above copyright notice and this paragraph are
|
|||
|
|
included on all such copies and derivative works. However, this
|
|||
|
|
document itself may not be modified in any way, such as by removing
|
|||
|
|
the copyright notice or references to the Internet Society or other
|
|||
|
|
Internet organizations, except as needed for the purpose of
|
|||
|
|
developing Internet standards in which case the procedures for
|
|||
|
|
copyrights defined in the Internet Standards process must be
|
|||
|
|
followed, or as required to translate it into languages other than
|
|||
|
|
English.
|
|||
|
|
|
|||
|
|
The limited permissions granted above are perpetual and will not be
|
|||
|
|
revoked by the Internet Society or its successors or assigns.
|
|||
|
|
|
|||
|
|
This document and the information contained herein is provided on an
|
|||
|
|
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
|||
|
|
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
|||
|
|
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
|||
|
|
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
|||
|
|
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
|||
|
|
|
|||
|
|
Acknowledgement
|
|||
|
|
|
|||
|
|
Funding for the RFC Editor function is currently provided by the
|
|||
|
|
Internet Society.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Blanton, et al. Standards Track [Page 13]
|
|||
|
|
|