Network Working Group                                    K. Ramakrishnan
Request for Comments: 3168                            TeraOptic Networks
Updates: 2474, 2401, 793                                        S. Floyd
Obsoletes: 2481                                                    ACIRI
Category: Standards Track                                       D. Black
                                                                     EMC
                                                          September 2001


The Addition of Explicit Congestion Notification (ECN) to IP

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

This memo specifies the incorporation of ECN (Explicit Congestion
Notification) to TCP and IP, including ECN's use of two bits in the
IP header.

Table of Contents

1. Introduction.................................................. 3
2. Conventions and Acronyms...................................... 5
3. Assumptions and General Principles............................ 5
4. Active Queue Management (AQM)................................. 6
5. Explicit Congestion Notification in IP........................ 6
5.1. ECN as an Indication of Persistent Congestion............... 10
5.2. Dropped or Corrupted Packets................................ 11
5.3. Fragmentation............................................... 11
6. Support from the Transport Protocol........................... 12
6.1. TCP......................................................... 13
6.1.1 TCP Initialization......................................... 14
6.1.1.1. Middlebox Issues........................................ 16
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field. 17
6.1.2. The TCP Sender............................................ 18
6.1.3. The TCP Receiver.......................................... 19
6.1.4. Congestion on the ACK-path................................ 20
6.1.5. Retransmitted TCP packets................................. 20

Ramakrishnan, et al.        Standards Track                     [Page 1]

RFC 3168               The Addition of ECN to IP          September 2001

6.1.6. TCP Window Probes......................................... 22
7. Non-compliance by the End Nodes............................... 22
8. Non-compliance in the Network................................. 24
8.1. Complications Introduced by Split Paths..................... 25
9. Encapsulated Packets.......................................... 25
9.1. IP packets encapsulated in IP............................... 25
9.1.1. The Limited-functionality and Full-functionality Options.. 27
9.1.2. Changes to the ECN Field within an IP Tunnel.............. 28
9.2. IPsec Tunnels............................................... 29
9.2.1. Negotiation between Tunnel Endpoints...................... 31
9.2.1.1. ECN Tunnel Security Association Database Field.......... 32
9.2.1.2. ECN Tunnel Security Association Attribute............... 32
9.2.1.3. Changes to IPsec Tunnel Header Processing............... 33
9.2.2. Changes to the ECN Field within an IPsec Tunnel........... 35
9.2.3. Comments for IPsec Support................................ 35
9.3. IP packets encapsulated in non-IP Packet Headers............ 36
10. Issues Raised by Monitoring and Policing Devices............. 36
11. Evaluations of ECN........................................... 37
11.1. Related Work Evaluating ECN................................ 37
11.2. A Discussion of the ECN nonce.............................. 37
11.2.1. The Incremental Deployment of ECT(1) in Routers.......... 38
12. Summary of changes required in IP and TCP.................... 38
13. Conclusions.................................................. 40
14. Acknowledgements............................................. 41
15. References................................................... 41
16. Security Considerations...................................... 45
17. IPv4 Header Checksum Recalculation........................... 45
18. Possible Changes to the ECN Field in the Network............. 45
18.1. Possible Changes to the IP Header.......................... 46
18.1.1. Erasing the Congestion Indication........................ 46
18.1.2. Falsely Reporting Congestion............................. 47
18.1.3. Disabling ECN-Capability................................. 47
18.1.4. Falsely Indicating ECN-Capability........................ 47
18.2. Information carried in the Transport Header................ 48
18.3. Split Paths................................................ 49
19. Implications of Subverting End-to-End Congestion Control..... 50
19.1. Implications for the Network and for Competing Flows....... 50
19.2. Implications for the Subverted Flow........................ 53
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion
      Control.................................................... 54
20. The Motivation for the ECT Codepoints........................ 54
20.1. The Motivation for an ECT Codepoint........................ 54
20.2. The Motivation for two ECT Codepoints...................... 55
21. Why use Two Bits in the IP Header?........................... 57
22. Historical Definitions for the IPv4 TOS Octet................ 58
23. IANA Considerations.......................................... 60
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet................. 60
23.2. TCP Header Flags........................................... 61
23.3. IPSEC Security Association Attributes....................... 62
24. Authors' Addresses........................................... 62
25. Full Copyright Statement..................................... 63

1. Introduction

We begin by describing TCP's use of packet drops as an indication of
congestion. Next we explain that with the addition of active queue
management (e.g., RED) to the Internet infrastructure, where routers
detect congestion before the queue overflows, routers are no longer
limited to packet drops as an indication of congestion. Routers can
instead set the Congestion Experienced (CE) codepoint in the IP
header of packets from ECN-capable transports. We describe when the
CE codepoint is to be set in routers, and describe modifications
needed to TCP to make it ECN-capable. Modifications to other
transport protocols (e.g., unreliable unicast or multicast, reliable
multicast, other reliable unicast transport protocols) could be
considered as those protocols are developed and advance through the
standards process. We also describe in this document the issues
involving the use of ECN within IP tunnels, and within IPsec tunnels
in particular.

One of the guiding principles for this document is that, to the
extent possible, the mechanisms specified here be incrementally
deployable. One challenge to the principle of incremental deployment
has been the prior existence of some IP tunnels that were not
compatible with the use of ECN. As ECN becomes deployed,
non-compatible IP tunnels will have to be upgraded to conform to
this document.

This document obsoletes RFC 2481, "A Proposal to add Explicit
Congestion Notification (ECN) to IP", which defined ECN as an
Experimental Protocol for the Internet Community. This document also
updates RFC 2474, "Definition of the Differentiated Services Field
(DS Field) in the IPv4 and IPv6 Headers", in defining the ECN field
in the IP header, RFC 2401, "Security Architecture for the Internet
Protocol" to change the handling of IPv4 TOS Byte and IPv6 Traffic
Class Octet in tunnel mode header construction to be compatible with
the use of ECN, and RFC 793, "Transmission Control Protocol", in
defining two new flags in the TCP header.

TCP's congestion control and avoidance algorithms are based on the
notion that the network is a black-box [Jacobson88, Jacobson90]. The
network's state of congestion or otherwise is determined by
end-systems probing for the network state, by gradually increasing
the load on the network (by increasing the window of packets that are
outstanding in the network) until the network becomes congested and a
packet is lost. Treating the network as a "black-box" and treating
loss as an indication of congestion in the network is appropriate for
pure best-effort data carried by TCP, with little or no sensitivity
to delay or loss of individual packets. In addition, TCP's
congestion management algorithms have techniques built-in (such as
Fast Retransmit and Fast Recovery) to minimize the impact of losses,
from a throughput perspective. However, these mechanisms are not
intended to help applications that are in fact sensitive to the delay
or loss of one or more individual packets. Interactive traffic such
as telnet, web-browsing, and transfer of audio and video data can be
sensitive to packet losses (especially when using an unreliable data
delivery transport such as UDP) or to the increased latency of the
packet caused by the need to retransmit the packet after a loss (with
the reliable data delivery semantics provided by TCP).

Since TCP determines the appropriate congestion window to use by
gradually increasing the window size until it experiences a dropped
packet, this causes the queues at the bottleneck router to build up.
With most packet drop policies at the router that are not sensitive
to the load placed by each individual flow (e.g., tail-drop on queue
overflow), this means that some of the packets of latency-sensitive
flows may be dropped. In addition, such drop policies lead to
synchronization of loss across multiple flows.

Active queue management mechanisms detect congestion before the queue
overflows, and provide an indication of this congestion to the end
nodes. Thus, active queue management can reduce unnecessary queuing
delay for all traffic sharing that queue. The advantages of active
queue management are discussed in RFC 2309 [RFC2309]. Active queue
management avoids some of the bad properties of dropping on queue
overflow, including the undesirable synchronization of loss across
multiple flows. More importantly, active queue management means that
transport protocols with mechanisms for congestion control (e.g.,
TCP) do not have to rely on buffer overflow as the only indication of
congestion.

Active queue management mechanisms may use one of several methods for
indicating congestion to end-nodes. One is to use packet drops, as is
currently done. However, active queue management allows the router to
separate policies of queuing or dropping packets from the policies
for indicating congestion. Thus, active queue management allows
routers to use the Congestion Experienced (CE) codepoint in a packet
header as an indication of congestion, instead of relying solely on
packet drops. This has the potential of reducing the impact of loss
on latency-sensitive flows.

There exist some middleboxes (firewalls, load balancers, or intrusion
detection systems) in the Internet that either drop a TCP SYN packet
configured to negotiate ECN, or respond with a RST. This document
specifies procedures that TCP implementations may use to provide
robust connectivity even in the presence of such equipment.

2. Conventions and Acronyms

The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
document, are to be interpreted as described in [RFC2119].

3. Assumptions and General Principles

In this section, we describe some of the important design principles
and assumptions that guided the design choices in this proposal.

* Because ECN is likely to be adopted gradually, accommodating
  migration is essential. Some routers may still only drop packets
  to indicate congestion, and some end-systems may not be
  ECN-capable. The most viable strategy is one that accommodates
  incremental deployment without having to resort to "islands" of
  ECN-capable and non-ECN-capable environments.

* New mechanisms for congestion control and avoidance need to
  co-exist and cooperate with existing mechanisms for congestion
  control. In particular, new mechanisms have to co-exist with
  TCP's current methods of adapting to congestion and with
  routers' current practice of dropping packets in periods of
  congestion.

* Congestion may persist over different time-scales. The time
  scales that we are concerned with are congestion events that may
  last longer than a round-trip time.

* The number of packets in an individual flow (e.g., TCP
  connection or an exchange using UDP) may range from a small
  number of packets to quite a large number. We are interested in
  managing the congestion caused by flows that send enough packets
  so that they are still active when network feedback reaches
  them.

* Asymmetric routing is likely to be a normal occurrence in the
  Internet. The path (sequence of links and routers) followed by
  data packets may be different from the path followed by the
  acknowledgment packets in the reverse direction.

* Many routers process the "regular" headers in IP packets more
  efficiently than they process the header information in IP
  options. This suggests keeping congestion experienced
  information in the regular headers of an IP packet.

* It must be recognized that not all end-systems will cooperate in
  mechanisms for congestion control. However, new mechanisms
  shouldn't make it easier for TCP applications to disable TCP
  congestion control. The benefit of lying about participating in
  new mechanisms such as ECN-capability should be small.

4. Active Queue Management (AQM)

Random Early Detection (RED) is one mechanism for Active Queue
Management (AQM) that has been proposed to detect incipient
congestion [FJ93], and is currently being deployed in the Internet
[RFC2309]. AQM is meant to be a general mechanism using one of
several alternatives for congestion indication, but in the absence of
ECN, AQM is restricted to using packet drops as a mechanism for
congestion indication. AQM drops packets based on the average queue
length exceeding a threshold, rather than only when the queue
overflows. However, because AQM may drop packets before the queue
actually overflows, AQM is not always forced by memory limitations to
discard the packet.

AQM can set a Congestion Experienced (CE) codepoint in the packet
header instead of dropping the packet, when such a field is provided
in the IP header and understood by the transport protocol. The use
of the CE codepoint with ECN allows the receiver(s) to receive the
packet, avoiding the potential for excessive delays due to
retransmissions after packet losses. We use the term 'CE packet' to
denote a packet that has the CE codepoint set.

5. Explicit Congestion Notification in IP

This document specifies that the Internet provide a congestion
indication for incipient congestion (as in RED and earlier work
[RJ90]) where the notification can sometimes be through marking
packets rather than dropping them. This uses an ECN field in the IP
header with two bits, making four ECN codepoints, '00' to '11'. The
ECN-Capable Transport (ECT) codepoints '10' and '01' are set by the
data sender to indicate that the end-points of the transport protocol
are ECN-capable; we call them ECT(0) and ECT(1) respectively. The
phrase "the ECT codepoint" in this document refers to either of the
two ECT codepoints. Routers treat the ECT(0) and ECT(1) codepoints
as equivalent. Senders are free to use either the ECT(0) or the
ECT(1) codepoint to indicate ECT, on a packet-by-packet basis.

The use of two codepoints for ECT, ECT(0) and ECT(1), is
motivated primarily by the desire to allow mechanisms for the data
sender to verify that network elements are not erasing the CE
codepoint, and that data receivers are properly reporting to the
sender the receipt of packets with the CE codepoint set, as required
by the transport protocol. Guidelines for the senders and receivers
to differentiate between the ECT(0) and ECT(1) codepoints will be
addressed in separate documents, for each transport protocol. In
particular, this document does not address mechanisms for TCP
end-nodes to differentiate between the ECT(0) and ECT(1) codepoints.
Protocols and senders that only require a single ECT codepoint SHOULD
use ECT(0).

The not-ECT codepoint '00' indicates a packet that is not using ECN.
The CE codepoint '11' is set by a router to indicate congestion to
the end nodes. Routers that have a packet arriving at a full queue
drop the packet, just as they do in the absence of ECN.

   +-----+-----+
   | ECN FIELD |
   +-----+-----+
     ECT   CE         [Obsolete] RFC 2481 names for the ECN bits.
      0     0         Not-ECT
      0     1         ECT(1)
      1     0         ECT(0)
      1     1         CE

   Figure 1: The ECN Field in IP.

The use of two ECT codepoints essentially gives a one-bit ECN nonce
in packet headers, and routers necessarily "erase" the nonce when
they set the CE codepoint [SCWA99]. For example, routers that erased
the CE codepoint would face additional difficulty in reconstructing
the original nonce, and thus repeated erasure of the CE codepoint
would be more likely to be detected by the end-nodes. The ECN nonce
also can address the problem of misbehaving transport receivers lying
to the transport sender about whether or not the CE codepoint was set
in a packet. The motivations for the use of two ECT codepoints are
discussed in more detail in Section 20, along with some discussion of
alternate possibilities for the fourth ECN codepoint (that is, the
codepoint '01'). Backwards compatibility with earlier ECN
implementations that do not understand the ECT(1) codepoint is
discussed in Section 11.

In RFC 2481 [RFC2481], the ECN field was divided into the ECN-Capable
Transport (ECT) bit and the CE bit. The ECN field with only the
ECN-Capable Transport (ECT) bit set in RFC 2481 corresponds to the
ECT(0) codepoint in this document, and the ECN field with both the
ECT and CE bit in RFC 2481 corresponds to the CE codepoint in this
document. The '01' codepoint was left undefined in RFC 2481, and
this is the reason for recommending the use of ECT(0) when only a
single ECT codepoint is needed.

      0     1     2     3     4     5     6     7
   +-----+-----+-----+-----+-----+-----+-----+-----+
   |          DS FIELD, DSCP           | ECN FIELD |
   +-----+-----+-----+-----+-----+-----+-----+-----+

     DSCP: differentiated services codepoint
     ECN:  Explicit Congestion Notification

   Figure 2: The Differentiated Services and ECN Fields in IP.

Bits 6 and 7 in the IPv4 TOS octet are designated as the ECN field.
The IPv4 TOS octet corresponds to the Traffic Class octet in IPv6,
and the ECN field is defined identically in both cases. The
definitions for the IPv4 TOS octet [RFC791] and the IPv6 Traffic
Class octet have been superseded by the six-bit DS (Differentiated
Services) Field [RFC2474, RFC2780]. Bits 6 and 7 are listed in
[RFC2474] as Currently Unused, and are specified in RFC 2780 as
approved for experimental use for ECN. Section 22 gives a brief
history of the TOS octet.

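As an illustration of the field layout in Figures 1 and 2, the
following Python sketch splits a TOS or Traffic Class octet into its
DSCP and ECN parts and sets the CE codepoint. It is not part of the
specification. The names Ecn, split_tos, is_ect and set_ce are
hypothetical, and the sketch assumes the usual IETF bit numbering in
which bit 0 is the most significant bit, so that bits 6 and 7 are the
two low-order bits of the octet.

```python
from enum import IntEnum
from typing import Tuple


class Ecn(IntEnum):
    """The four ECN codepoints of Figure 1, as two-bit values."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


# Bits 6 and 7 of the octet, assuming bit 0 is the most significant bit.
ECN_MASK = 0b00000011


def split_tos(octet: int) -> Tuple[int, Ecn]:
    """Split a TOS / Traffic Class octet into (DSCP, ECN codepoint)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return octet >> 2, Ecn(octet & ECN_MASK)


def is_ect(codepoint: Ecn) -> bool:
    """True for either ECT codepoint; routers treat them as equivalent."""
    return codepoint in (Ecn.ECT_0, Ecn.ECT_1)


def set_ce(octet: int) -> int:
    """Return the octet with the ECN field set to CE and the DSCP untouched."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return octet | ECN_MASK
```

For example, split_tos(0b10111010) yields DSCP 46 with the ECT(0)
codepoint. A real IPv4 implementation that rewrites this octet would
also have to update the header checksum, which this sketch omits.
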
Because of the unstable history of the TOS octet, the use of the ECN
field as specified in this document cannot be guaranteed to be
backwards compatible with those past uses of these two bits that
pre-date ECN. The potential dangers of this lack of backwards
compatibility are discussed in Section 22.

Upon the receipt by an ECN-Capable transport of a single CE packet,
the congestion control algorithms followed at the end-systems MUST be
essentially the same as the congestion control response to a *single*
dropped packet. For example, for ECN-Capable TCP the source TCP is
required to halve its congestion window for any window of data
containing either a packet drop or an ECN indication.

One reason for requiring that the congestion-control response to the
CE packet be essentially the same as the response to a dropped packet
is to accommodate the incremental deployment of ECN in both
end-systems and in routers. Some routers may drop ECN-Capable packets
(e.g., using the same AQM policies for congestion detection) while
other routers set the CE codepoint, for equivalent levels of
congestion. Similarly, a router might drop a non-ECN-Capable packet
but set the CE codepoint in an ECN-Capable packet, for equivalent
levels of congestion. If there were different congestion control
responses to a CE codepoint than to a packet drop, this could result
in unfair treatment for different flows.

An additional goal is that the end-systems should react to congestion
at most once per window of data (i.e., at most once per round-trip
time), to avoid reacting multiple times to multiple indications of
congestion within a round-trip time.

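The two sender-side requirements above (respond to a CE packet as to
a single dropped packet, and react at most once per window of data)
can be sketched as follows. This is a minimal illustration rather
than the TCP behavior specified in Section 6.1. The SenderState
fields, the use of segment-based sequence numbers, and the floor of
one segment on the congestion window are assumptions made for the
example.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    cwnd: int           # congestion window, in segments (assumed unit)
    snd_nxt: int = 0    # next sequence number to be sent, in segments
    recover: int = -1   # snd_nxt recorded at the most recent reduction


def on_congestion_indication(state: SenderState, ack_no: int) -> bool:
    """Handle a packet drop or an ECN indication reported with ack_no.

    Returns True if the window was reduced, and False if the indication
    belongs to a window of data for which the sender already reacted.
    """
    if ack_no <= state.recover:
        # Data up to 'recover' was outstanding at the last reduction,
        # so this indication is for the same window: do not react again.
        return False
    state.cwnd = max(state.cwnd // 2, 1)   # halve, assumed floor of 1
    state.recover = state.snd_nxt          # start of the next window
    return True
```

A drop and a CE mark arriving within the same round-trip time
therefore cause one halving, not two, and both kinds of indication go
through the same function so that they receive the same response.
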
For a router, the CE codepoint of an ECN-Capable packet SHOULD only
be set if the router would otherwise have dropped the packet as an
indication of congestion to the end nodes. When the router's buffer
is not yet full and the router is prepared to drop a packet to inform
end nodes of incipient congestion, the router should first check to
see if the ECT codepoint is set in that packet's IP header. If so,
then instead of dropping the packet, the router MAY set the CE
codepoint in the IP header.

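The router behavior described so far in this section can be
summarized in a short decision function. The sketch below is only an
illustration: the Action enum, the function name and its parameters
are hypothetical, the AQM decision itself is taken as an input, and
forwarding an already-marked CE packet unchanged when the AQM wants
to signal congestion is an assumption of the sketch, not a
requirement stated above.

```python
from enum import Enum

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class Action(Enum):
    FORWARD = "forward"    # transmit the packet unchanged
    MARK_CE = "mark_ce"    # set the CE codepoint, then transmit
    DROP = "drop"          # discard the packet


def router_action(ecn_field: int, queue_full: bool,
                  aqm_wants_to_signal: bool,
                  marking_enabled: bool = True) -> Action:
    """Choose what to do with one arriving packet."""
    if queue_full:
        # A packet arriving at a full queue is dropped, as without ECN.
        return Action.DROP
    if not aqm_wants_to_signal:
        return Action.FORWARD
    if ecn_field == CE:
        # Assumption: the packet already carries the indication, so it
        # is forwarded with the CE codepoint left unchanged.
        return Action.FORWARD
    if marking_enabled and ecn_field in (ECT_0, ECT_1):
        # ECN-Capable packet: mark it instead of dropping it (a MAY).
        return Action.MARK_CE
    # Not-ECT packet, or marking disabled: signal congestion by dropping.
    return Action.DROP
```

The marking_enabled flag reflects that marking is optional for the
router: with marking disabled, the function falls back to the drop
that the AQM would have performed anyway.
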
An environment where all end nodes were ECN-Capable could allow new
criteria to be developed for setting the CE codepoint, and new
congestion control mechanisms for end-node reaction to CE packets.
However, this is a research issue, and as such is not addressed in
this document.

When a CE packet (i.e., a packet that has the CE codepoint set) is
received by a router, the CE codepoint is left unchanged, and the
packet is transmitted as usual. When severe congestion has occurred
and the router's queue is full, then the router has no choice but to
drop some packet when a new packet arrives. We anticipate that such
packet losses will become relatively infrequent when a majority of
end-systems become ECN-Capable and participate in TCP or other
compatible congestion control mechanisms. In an ECN-Capable
environment that is adequately-provisioned, packet losses should
occur primarily during transients or in the presence of
non-cooperating sources.

The above discussion of when CE may be set instead of dropping a
packet applies by default to all Differentiated Services Per-Hop
Behaviors (PHBs) [RFC2475]. Specifications for PHBs MAY provide
more specifics on how a compliant implementation is to choose between
setting CE and dropping a packet, but this is NOT REQUIRED. A router
MUST NOT set CE instead of dropping a packet when the drop that would
occur is caused by reasons other than congestion or the desire to
indicate incipient congestion to end nodes (e.g., a diffserv edge
node may be configured to unconditionally drop certain classes of
traffic to prevent them from entering its diffserv domain).

We expect that routers will set the CE codepoint in response to
incipient congestion as indicated by the average queue size, using
the RED algorithms suggested in [FJ93, RFC2309]. To the best of our
knowledge, this is the only proposal currently under discussion in
the IETF for routers to drop packets proactively, before the buffer
overflows. However, this document does not attempt to specify a
particular mechanism for active queue management, leaving that
endeavor, if needed, to other areas of the IETF. While ECN is
inextricably tied up with the need to have a reasonable active queue
management mechanism at the router, the reverse does not hold; active
queue management mechanisms have been developed and deployed
independent of ECN, using packet drops as indications of congestion
in the absence of ECN in the IP architecture.

5.1. ECN as an Indication of Persistent Congestion

We emphasize that a *single* packet with the CE codepoint set in an
IP packet causes the transport layer to respond, in terms of
congestion control, as it would to a packet drop. The instantaneous
queue size is likely to see considerable variations even when the
router does not experience persistent congestion. As such, it is
important that transient congestion at a router, reflected by the
instantaneous queue size reaching a threshold much smaller than the
capacity of the queue, not trigger a reaction at the transport layer.
Therefore, the CE codepoint should not be set by a router based on
the instantaneous queue size.

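To make the distinction between instantaneous and average queue size
concrete, the sketch below keeps an exponentially weighted moving
average of the queue length, in the style of RED [FJ93]. It is an
illustration under stated assumptions: the class name, the weight of
0.002 and the threshold of 5 packets are chosen for the example, and
the plain threshold test is a simplification, since RED marks
probabilistically between a minimum and a maximum threshold.

```python
class AverageQueue:
    """Exponentially weighted moving average of the queue length."""

    def __init__(self, weight: float = 0.002) -> None:
        if not 0.0 < weight <= 1.0:
            raise ValueError("weight must be in (0, 1]")
        self.weight = weight   # illustrative value, not a recommendation
        self.avg = 0.0

    def update(self, instantaneous_len: int) -> float:
        """Fold one queue-length sample into the average and return it."""
        self.avg += self.weight * (instantaneous_len - self.avg)
        return self.avg


def congestion_is_persistent(avg: float, min_threshold: float) -> bool:
    """Simplified test: signal only when the *average* crosses the threshold."""
    return avg >= min_threshold


# A single burst of 50 packets barely moves the average ...
q = AverageQueue()
burst_avg = q.update(50)
# ... while a queue that stays at 50 packets for 1000 arrivals does.
for _ in range(1000):
    sustained_avg = q.update(50)
```

With these example values the burst leaves the average near 0.1
packets, well below the threshold, while the sustained queue pushes
it above 40 packets. A router that compared the instantaneous length
of 50 against the same threshold would have signalled congestion in
both cases.
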
For example, since the ATM and Frame Relay mechanisms for congestion
indication have typically been defined without an associated notion
of average queue size as the basis for determining that an
intermediate node is congested, we believe that they provide a very
noisy signal. The TCP-sender reaction specified in this document for
ECN is NOT the appropriate reaction for such a noisy signal of
congestion notification. However, if the routers that interface to
the ATM network have a way of maintaining the average queue at the
interface, and use it to come to a reliable determination that the
ATM subnet is congested, they may use the ECN notification that is
defined here.

We continue to encourage experiments in techniques at layer 2 (e.g.,
|
|||
|
|
in ATM switches or Frame Relay switches) to take advantage of ECN.
|
|||
|
|
For example, using a scheme such as RED (where packet marking is
|
|||
|
|
based on the average queue length exceeding a threshold), layer 2
|
|||
|
|
devices could provide a reasonably reliable indication of congestion.
|
|||
|
|
When all the layer 2 devices in a path set that layer's own
|
|||
|
|
Congestion Experienced codepoint (e.g., the EFCI bit for ATM, the
|
|||
|
|
FECN bit in Frame Relay) in this reliable manner, then the interface
|
|||
|
|
router to the layer 2 network could copy the state of that layer 2
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Ramakrishnan, et al. Standards Track [Page 10]
|
|||
|
|
|
|||
|
|
RFC 3168 The Addition of ECN to IP September 2001
|
|||
|
|
|
|||
|
|
|
|||
|
|
Congestion Experienced codepoint into the CE codepoint in the IP
|
|||
|
|
header. We recognize that this is not the current practice, nor is
|
|||
|
|
it in current standards. However, encouraging experimentation in this
|
|||
|
|
manner may provide the information needed to enable evolution of
|
|||
|
|
existing layer 2 mechanisms to provide a more reliable means of
|
|||
|
|
congestion indication, when they use a single bit for indicating
|
|||
|
|
congestion.
|
|||
|
|
|
|||
|
|
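As an illustration only, the following Python sketch shows one way a
node could base the CE decision on a smoothed queue length rather
than on the instantaneous one. The class name, the EWMA gain, and the
single deterministic threshold are assumptions made for this example;
RED itself marks probabilistically between two thresholds.

```python
from dataclasses import dataclass


@dataclass
class AvgQueueMarker:
    """Decides on CE from an averaged queue length (hypothetical helper)."""

    weight: float = 0.002      # assumed EWMA gain
    threshold: float = 20.0    # assumed marking threshold, in packets
    avg: float = 0.0           # running average queue length

    def on_arrival(self, queue_len: int, ect: bool) -> bool:
        """Update the average; return True if this packet should get CE."""
        self.avg += self.weight * (queue_len - self.avg)
        # Only ECN-capable packets may be marked; a short burst moves
        # the average only slightly and so does not trigger a mark.
        return ect and self.avg > self.threshold
```

With a small gain, a transient burst barely moves the average, which
matches the intent that CE signals persistent congestion.
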
5.2. Dropped or Corrupted Packets

For the proposed use for ECN in this document (that is, for a
transport protocol such as TCP for which a dropped data packet is an
indication of congestion), end nodes detect dropped data packets, and
the congestion response of the end nodes to a dropped data packet is
at least as strong as the congestion response to a received CE
packet. To ensure the reliable delivery of the congestion indication
of the CE codepoint, an ECT codepoint MUST NOT be set in a packet
unless the loss of that packet in the network would be detected by
the end nodes and interpreted as an indication of congestion.

Transport protocols such as TCP do not necessarily detect all packet
drops, such as the drop of a "pure" ACK packet; for example, TCP does
not reduce the arrival rate of subsequent ACK packets in response to
an earlier dropped ACK packet. Any proposal for extending ECN-
Capability to such packets would have to address issues such as the
case of an ACK packet that was marked with the CE codepoint but was
later dropped in the network. We believe that this aspect is still
the subject of research, so this document specifies that at this
time, "pure" ACK packets MUST NOT indicate ECN-Capability.

Similarly, if a CE packet is dropped later in the network due to
corruption (bit errors), the end nodes should still invoke congestion
control, just as TCP would today in response to a dropped data
packet. This issue of corrupted CE packets would have to be
considered in any proposal for the network to distinguish between
packets dropped due to corruption, and packets dropped due to
congestion or buffer overflow. In particular, the ubiquitous
deployment of ECN would not, in and of itself, be a sufficient
development to allow end-nodes to interpret packet drops as
indications of corruption rather than congestion.

5.3. Fragmentation

ECN-capable packets MAY have the DF (Don't Fragment) bit set.
Reassembly of a fragmented packet MUST NOT lose indications of
congestion. In other words, if any fragment of an IP packet to be
reassembled has the CE codepoint set, then one of two actions MUST be
taken:

* Set the CE codepoint on the reassembled packet. However, this
  MUST NOT occur if any of the other fragments contributing to
  this reassembly carries the Not-ECT codepoint.

* The packet is dropped, instead of being reassembled, for any
  other reason.

If both actions are applicable, either MAY be chosen. Reassembly of
a fragmented packet MUST NOT change the ECN codepoint when all of the
fragments carry the same codepoint.

We would note that because RFC 2481 did not specify reassembly
behavior, older ECN implementations conformant with that Experimental
RFC do not necessarily perform reassembly correctly, in terms of
preserving the CE codepoint in a fragment. The sender could avoid
the consequences of this behavior by setting the DF bit in ECN-
Capable packets.

Situations may arise in which the above reassembly specification is
insufficiently precise. For example, if there is a malicious or
broken entity in the path at or after the fragmentation point, packet
fragments could carry a mixture of ECT(0), ECT(1), and/or Not-ECT
codepoints. The reassembly specification above does not place
requirements on reassembly of fragments in this case. In situations
where more precise reassembly behavior would be required, protocol
specifications SHOULD instead specify that DF MUST be set in all
ECN-capable packets sent by the protocol.

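The reassembly rules of this section can be sketched as a small
Python function. This is an illustration under stated assumptions,
not a reference implementation: the function name and the
(action, codepoint) return convention are invented for the example,
and the two-bit values follow the ECN field definition given earlier
in this document. A mixture of codepoints without CE is reported as
"unspecified", since no requirement is placed on that case.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def reassemble_ecn(fragment_codepoints):
    """Return (action, codepoint) for the ECN codepoints of all fragments."""
    kinds = set(fragment_codepoints)
    if not kinds:
        raise ValueError("no fragments")
    if len(kinds) == 1:
        # All fragments agree: the codepoint is preserved unchanged.
        return ("reassemble", fragment_codepoints[0])
    if CE in kinds:
        if NOT_ECT in kinds:
            # CE must not be set when another fragment is Not-ECT, so
            # dropping is the only remaining permitted action.
            return ("drop", None)
        return ("reassemble", CE)
    # Mixed ECT(0)/ECT(1)/Not-ECT without CE: no requirement applies.
    return ("unspecified", None)
```

A caller would apply the returned codepoint to the reassembled packet
only when the action is "reassemble".
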
6. Support from the Transport Protocol

ECN requires support from the transport protocol, in addition to the
functionality given by the ECN field in the IP packet header. First,
the transport protocol might require negotiation between the endpoints
during setup to determine that all of the endpoints are ECN-capable,
so that the sender can set the ECT codepoint in transmitted packets.
Second, the transport protocol must be capable of reacting
appropriately to the receipt of CE packets. This reaction could be
in the form of the data receiver informing the data sender of the
received CE packet (e.g., TCP), of the data receiver unsubscribing from
a layered multicast group (e.g., RLM [MJV96]), or of some other
action that ultimately reduces the arrival rate of that flow on that
congested link. CE packets indicate persistent rather than transient
congestion (see Section 5.1), and hence reactions to the receipt of
CE packets should be those appropriate for persistent congestion.

This document only addresses the addition of ECN Capability to TCP,
leaving issues of ECN in other transport protocols to further
research. For TCP, ECN requires three new pieces of functionality:
negotiation between the endpoints during connection setup to
determine if they are both ECN-capable; an ECN-Echo (ECE) flag in the
TCP header so that the data receiver can inform the data sender when
a CE packet has been received; and a Congestion Window Reduced (CWR)
flag in the TCP header so that the data sender can inform the data
receiver that the congestion window has been reduced. The support
required from other transport protocols is likely to be different,
particularly for unreliable or reliable multicast transport
protocols, and will have to be determined as other transport
protocols are brought to the IETF for standardization.

In a mild abuse of terminology, in this document we refer to `TCP
packets' instead of `TCP segments'.

6.1. TCP

The following sections describe in detail the proposed use of ECN in
TCP. This proposal is described in essentially the same form in
[Floyd94]. We assume that the source TCP uses the standard congestion
control algorithms of Slow-start, Fast Retransmit and Fast Recovery
[RFC2581].

This proposal specifies two new flags in the Reserved field of the
TCP header. The TCP mechanism for negotiating ECN-Capability uses
the ECN-Echo (ECE) flag in the TCP header. Bit 9 in the Reserved
field of the TCP header is designated as the ECN-Echo flag. The
location of the 6-bit Reserved field in the TCP header is shown in
Figure 4 of RFC 793 [RFC793] (and is reproduced below for
completeness). This specification of the ECN Field leaves the
Reserved field as a 4-bit field using bits 4-7.

To enable the TCP receiver to determine when to stop setting the
ECN-Echo flag, we introduce a second new flag in the TCP header, the
CWR flag. The CWR flag is assigned to Bit 8 in the Reserved field of
the TCP header.

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |                       | U | A | P | R | S | F |
| Header Length |        Reserved       | R | C | S | S | Y | I |
|               |                       | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Figure 3: The old definition of bytes 13 and 14 of the TCP
header.

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|               |               | C | E | U | A | P | R | S | F |
| Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
|               |               | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Figure 4: The new definition of bytes 13 and 14 of the TCP
header.

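To make the bit numbering concrete, the following Python sketch
decodes the 16-bit word formed by bytes 13 and 14 of the TCP header
under the new definition. The helper names are hypothetical; the only
facts taken from the text are that bit 0 is the leftmost bit, that
CWR is bit 8, and that ECN-Echo is bit 9.

```python
def bit_mask(bit: int) -> int:
    """Mask for one bit of the 16-bit word; bit 0 is the leftmost bit."""
    return 1 << (15 - bit)


# Flag positions as drawn in the new header layout.
FLAGS = {
    "CWR": bit_mask(8), "ECE": bit_mask(9), "URG": bit_mask(10),
    "ACK": bit_mask(11), "PSH": bit_mask(12), "RST": bit_mask(13),
    "SYN": bit_mask(14), "FIN": bit_mask(15),
}


def parse_bytes_13_14(word: int) -> dict:
    """Split the word into header length, reserved bits, and flag names."""
    return {
        "header_length": (word >> 12) & 0xF,   # bits 0-3
        "reserved": (word >> 8) & 0xF,         # bits 4-7
        "flags": {name for name, mask in FLAGS.items() if word & mask},
    }
```

Seen from the second byte alone, CWR and ECN-Echo are therefore the
masks 0x80 and 0x40.
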
Thus, ECN uses the ECT and CE flags in the IP header (as shown in
Figure 1) for signaling between routers and connection endpoints, and
uses the ECN-Echo and CWR flags in the TCP header (as shown in Figure
4) for TCP-endpoint to TCP-endpoint signaling. For a TCP connection,
a typical sequence of events in an ECN-based reaction to congestion
is as follows:

* An ECT codepoint is set in packets transmitted by the sender to
  indicate that ECN is supported by the transport entities for
  these packets.

* An ECN-capable router detects impending congestion and detects
  that an ECT codepoint is set in the packet it is about to drop.
  Instead of dropping the packet, the router chooses to set the CE
  codepoint in the IP header and forwards the packet.

* The receiver receives the packet with the CE codepoint set, and
  sets the ECN-Echo flag in its next TCP ACK sent to the sender.

* The sender receives the TCP ACK with ECN-Echo set, and reacts to
  the congestion as if a packet had been dropped.

* The sender sets the CWR flag in the TCP header of the next
  packet sent to the receiver to acknowledge its receipt of and
  reaction to the ECN-Echo flag.

The negotiation for using ECN by the TCP transport entities and the
use of the ECN-Echo and CWR flags is described in more detail in the
sections below.

6.1.1. TCP Initialization

In the TCP connection setup phase, the source and destination TCPs
exchange information about their willingness to use ECN. Subsequent
to the completion of this negotiation, the TCP sender sets an ECT
codepoint in the IP header of data packets to indicate to the network
that the transport is capable and willing to participate in ECN for
this packet. This indicates to the routers that they may mark this
packet with the CE codepoint, if they would like to use that as a
method of congestion notification. If the TCP connection does not
wish to use ECN notification for a particular packet, the sending TCP
sets the ECN codepoint to not-ECT, and the TCP receiver ignores the
CE codepoint in the received packet.

For this discussion, we designate the initiating host as Host A and
the responding host as Host B. We call a SYN packet with the ECE and
CWR flags set an "ECN-setup SYN packet", and we call a SYN packet
with at least one of the ECE and CWR flags not set a "non-ECN-setup
SYN packet". Similarly, we call a SYN-ACK packet with only the ECE
flag set but the CWR flag not set an "ECN-setup SYN-ACK packet", and
we call a SYN-ACK packet with any other configuration of the ECE and
CWR flags a "non-ECN-setup SYN-ACK packet".

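The four terms just defined reduce to two predicates on the ECE and
CWR flags. The following Python sketch states them directly; the
function names are chosen for this example and are not part of the
specification.

```python
def classify_syn(ece: bool, cwr: bool) -> str:
    """A SYN is ECN-setup only when both ECE and CWR are set."""
    return "ECN-setup SYN" if (ece and cwr) else "non-ECN-setup SYN"


def classify_syn_ack(ece: bool, cwr: bool) -> str:
    """A SYN-ACK is ECN-setup only when ECE is set and CWR is clear."""
    if ece and not cwr:
        return "ECN-setup SYN-ACK"
    return "non-ECN-setup SYN-ACK"
```

Note that a SYN-ACK carrying both flags falls into the non-ECN-setup
class, so a responder that merely copies the flags of the SYN is not
mistaken for an ECN-capable one.
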
Before a TCP connection can use ECN, Host A sends an ECN-setup SYN
packet, and Host B sends an ECN-setup SYN-ACK packet. For a SYN
packet, the setting of both ECE and CWR in the ECN-setup SYN packet
is defined as an indication that the sending TCP is ECN-Capable,
rather than as an indication of congestion or of response to
congestion. More precisely, an ECN-setup SYN packet indicates that
the TCP implementation transmitting the SYN packet will participate
in ECN as both a sender and receiver. Specifically, as a receiver,
it will respond to incoming data packets that have the CE codepoint
set in the IP header by setting ECE in outgoing TCP Acknowledgement
(ACK) packets. As a sender, it will respond to incoming packets that
have ECE set by reducing the congestion window and setting CWR when
appropriate. An ECN-setup SYN packet does not commit the TCP sender
to setting the ECT codepoint in any or all of the packets it may
transmit. However, the commitment to respond appropriately to
incoming packets with the CE codepoint set remains even if the TCP
sender in a later transmission, within this TCP connection, sends a
SYN packet without ECE and CWR set.

When Host B sends an ECN-setup SYN-ACK packet, it sets the ECE flag
but not the CWR flag. An ECN-setup SYN-ACK packet is defined as an
indication that the TCP transmitting the SYN-ACK packet is ECN-
Capable. As with the SYN packet, an ECN-setup SYN-ACK packet does
not commit the TCP host to setting the ECT codepoint in transmitted
packets.

The following rules apply to the sending of ECN-setup packets within
a TCP connection, where a TCP connection is defined by the standard
rules for TCP connection establishment and termination.

* If a host has received an ECN-setup SYN packet, then it MAY send
  an ECN-setup SYN-ACK packet. Otherwise, it MUST NOT send an
  ECN-setup SYN-ACK packet.

* A host MUST NOT set ECT on data packets unless it has sent at
  least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has
  received at least one ECN-setup SYN or ECN-setup SYN-ACK packet,
  and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK
  packet. If a host has received at least one non-ECN-setup SYN
  or non-ECN-setup SYN-ACK packet, then it SHOULD NOT set ECT on
  data packets.

* If a host ever sets the ECT codepoint on a data packet, then
  that host MUST correctly set/clear the CWR TCP bit on all
  subsequent packets in the connection.

* If a host has sent at least one ECN-setup SYN or ECN-setup SYN-
  ACK packet, and has received no non-ECN-setup SYN or non-ECN-
  setup SYN-ACK packet, then if that host receives TCP data
  packets with ECT and CE codepoints set in the IP header, then
  that host MUST process these packets as specified for an ECN-
  capable connection.

* A host that is not willing to use ECN on a TCP connection SHOULD
  clear both the ECE and CWR flags in all non-ECN-setup SYN and/or
  SYN-ACK packets that it sends to indicate this unwillingness.
  Receivers MUST correctly handle all forms of the non-ECN-setup
  SYN and SYN-ACK packets.

* A host MUST NOT set ECT on SYN or SYN-ACK packets.

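The rules above that govern when ECT may be set can be collected into
one decision function. The Python sketch below is one possible
reading, offered as an illustration: the state record, its field
names, and the string results are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EcnSetupState:
    """What one host has sent and received so far on this connection."""

    sent_ecn_setup: bool = False
    received_ecn_setup: bool = False
    sent_non_ecn_setup: bool = False
    received_non_ecn_setup: bool = False


def ect_permission(state: EcnSetupState, is_syn_or_syn_ack: bool = False) -> str:
    """Return "MUST NOT", "SHOULD NOT", or "MAY" for setting ECT."""
    if is_syn_or_syn_ack:
        return "MUST NOT"            # ECT is never set on SYN or SYN-ACK
    negotiated = state.sent_ecn_setup and state.received_ecn_setup
    if not negotiated or state.sent_non_ecn_setup:
        return "MUST NOT"
    if state.received_non_ecn_setup:
        return "SHOULD NOT"
    return "MAY"
```

A host that first sends an ECN-setup SYN and later falls back to a
non-ECN-setup SYN ends up in the "MUST NOT" case, whatever it
receives afterwards.
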
A TCP client enters TIME-WAIT state after receiving a FIN-ACK, and
transitions to CLOSED state after a timeout. Many TCP
implementations create a new TCP connection if they receive an in-
window SYN packet during TIME-WAIT state. When a TCP host enters
TIME-WAIT or CLOSED state, it should ignore any previous state about
the negotiation of ECN for that connection.

6.1.1.1. Middlebox Issues

ECN introduces the use of the ECN-Echo and CWR flags in the TCP
header (as shown in Figure 4) for initialization. There exist some
faulty firewalls, load balancers, and intrusion detection systems in
the Internet that either drop an ECN-setup SYN packet or respond with
a RST, in the belief that such a packet (with these bits set) is a
signature for a port-scanning tool that could be used in a denial-
of-service attack. Some of the offending equipment has been
identified, and a web page [FIXES] contains a list of non-compliant
products and the fixes posted by the vendors, where these are
available. The TBIT web page [TBIT] lists some of the web servers
affected by this faulty equipment. We mention this in this document
as a warning to the community of this problem.

To provide robust connectivity even in the presence of such faulty
equipment, a host that receives a RST in response to the transmission
of an ECN-setup SYN packet MAY resend a SYN with CWR and ECE cleared.
This could result in a TCP connection being established without using
ECN.

A host that receives no reply to an ECN-setup SYN within the normal
SYN retransmission timeout interval MAY resend the SYN and any
subsequent SYN retransmissions with CWR and ECE cleared. To overcome
normal packet loss that results in the original SYN being lost, the
originating host may retransmit one or more ECN-setup SYN packets
before giving up and retransmitting the SYN with the CWR and ECE bits
cleared.

We note that in this case, the following example scenario is
possible:

(1) Host A: Sends an ECN-setup SYN.
(2) Host B: Sends an ECN-setup SYN/ACK, packet is dropped or delayed.
(3) Host A: Sends a non-ECN-setup SYN.
(4) Host B: Sends a non-ECN-setup SYN/ACK.

We note that in this case, following the procedures above, neither
Host A nor Host B may set the ECT bit on data packets. Further, an
important consequence of the rules for ECN setup and usage in Section
6.1.1 is that a host is forbidden from using the reception of ECT
data packets as an implicit signal that the other host is ECN-
capable.

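The optional fallback described in this section might be sketched as
follows. This is a hedged illustration only: the function and its
`ecn_attempts` parameter (how many ECN-setup SYNs to try before
giving up on ECN) are hypothetical, and the specification leaves that
number to the implementation.

```python
def syn_flags_for_attempt(attempt: int, got_rst: bool,
                          ecn_attempts: int = 2) -> tuple:
    """Return (ece, cwr) for the SYN sent as 0-based attempt `attempt`."""
    if got_rst:
        # A RST answered an ECN-setup SYN: retry with both flags cleared.
        return (False, False)
    if attempt < ecn_attempts:
        # Ordinary loss is still plausible, so keep offering ECN.
        return (True, True)
    # No reply after the ECN-setup tries: this and all later SYN
    # retransmissions are sent as non-ECN-setup SYNs.
    return (False, False)
```

Once the fallback SYN has been sent, the setup rules of Section 6.1.1
forbid this host from setting ECT on data packets for the connection.
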
6.1.1.2. Robust TCP Initialization with an Echoed Reserved Field

There is the question of why we chose to have the TCP sending the SYN
set two ECN-related flags in the Reserved field of the TCP header for
the SYN packet, while the responding TCP sending the SYN-ACK sets
only one ECN-related flag in the SYN-ACK packet. This asymmetry is
necessary for the robust negotiation of ECN-capability with some
deployed TCP implementations. There exists at least one faulty TCP
implementation in which TCP receivers set the Reserved field of the
TCP header in ACK packets (and hence the SYN-ACK) simply to reflect
the Reserved field of the TCP header in the received data packet.
Because the TCP SYN packet sets the ECN-Echo and CWR flags to
indicate ECN-capability, while the SYN-ACK packet sets only the ECN-
Echo flag, the sending TCP correctly interprets a receiver's
reflection of its own flags in the Reserved field as an indication
that the receiver is not ECN-capable. The sending TCP is not misled
by a faulty TCP implementation sending a SYN-ACK packet that simply
reflects the Reserved field of the incoming SYN packet.

6.1.2. The TCP Sender

For a TCP connection using ECN, new data packets are transmitted with
an ECT codepoint set in the IP header. When only one ECT codepoint
is needed by a sender for all packets sent on a TCP connection,
ECT(0) SHOULD be used. If the sender receives an ECN-Echo (ECE) ACK
packet (that is, an ACK packet with the ECN-Echo flag set in the TCP
header), then the sender knows that congestion was encountered in the
network on the path from the sender to the receiver. The indication
of congestion should be treated just as a congestion loss in non-
ECN-Capable TCP. That is, the TCP source halves the congestion window
"cwnd" and reduces the slow start threshold "ssthresh". The sending
TCP SHOULD NOT increase the congestion window in response to the
receipt of an ECN-Echo ACK packet.

TCP should not react to congestion indications more than once every
window of data (or more loosely, more than once every round-trip
time). That is, the TCP sender's congestion window should be reduced
only once in response to a series of dropped and/or CE packets from a
single window of data. In addition, the TCP source should not
decrease the slow-start threshold, ssthresh, if it has been decreased
within the last round trip time. However, if any retransmitted
packets are dropped, then this is interpreted by the source TCP as a
new instance of congestion.

After the source TCP reduces its congestion window in response to a
CE packet, incoming acknowledgments that continue to arrive can
"clock out" outgoing packets as allowed by the reduced congestion
window. If the congestion window consists of only one MSS (maximum
segment size), and the sending TCP receives an ECN-Echo ACK packet,
then the sending TCP should in principle still reduce its congestion
window in half. However, the value of the congestion window is
bounded below by a value of one MSS. If the sending TCP were to
continue to send, using a congestion window of 1 MSS, this results in
the transmission of one packet per round-trip time. It is necessary
to still reduce the sending rate of the TCP sender even further, on
receipt of an ECN-Echo packet when the congestion window is one. We
use the retransmit timer as a means of reducing the rate further in
this circumstance. Therefore, the sending TCP MUST reset the
retransmit timer on receiving the ECN-Echo packet when the congestion
window is one. The sending TCP will then be able to send a new
packet only when the retransmit timer expires.

When an ECN-Capable TCP sender reduces its congestion window for any
reason (because of a retransmit timeout, a Fast Retransmit, or in
response to an ECN Notification), the TCP sender sets the CWR flag in
the TCP header of the first new data packet sent after the window
reduction. If that data packet is dropped in the network, then the
sending TCP will have to reduce the congestion window again and
retransmit the dropped packet.

We ensure that the "Congestion Window Reduced" information is
reliably delivered to the TCP receiver. This comes about from the
fact that if the new data packet carrying the CWR flag is dropped,
then the TCP sender will have to again reduce its congestion window,
and send another new data packet with the CWR flag set. Thus, the
CWR bit in the TCP header SHOULD NOT be set on retransmitted packets.

When the TCP data sender is ready to set the CWR bit after reducing
the congestion window, it SHOULD set the CWR bit only on the first
new data packet that it transmits.

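The sender behavior described in this section can be sketched as a
toy Python model. It is a simplification under stated assumptions,
not the normative algorithm: the class and attribute names are
invented, "once per window" is approximated by remembering the send
point at the time of the last reduction, ssthresh is simply set to
the halved window, and the retransmit timer is represented only by a
boolean.

```python
class EcnSender:
    """Toy model of the sender-side ECN reaction (byte sequence numbers)."""

    def __init__(self, mss: int = 1460, cwnd_segments: int = 10):
        self.mss = mss
        self.cwnd = cwnd_segments * mss
        self.ssthresh = self.cwnd
        self.snd_nxt = 0             # next new byte to send
        self.reduced_at = -1         # snd_nxt at the last reduction
        self.cwr_pending = False     # CWR goes on the next new data packet
        self.wait_for_timer = False  # window was already one MSS

    def on_ack(self, ack_seq: int, ece: bool) -> None:
        if not ece or ack_seq <= self.reduced_at:
            return                   # no echo, or same window as last reaction
        # At one MSS the window cannot shrink; the sender instead waits
        # for the (reset) retransmit timer before sending again.
        self.wait_for_timer = self.cwnd <= self.mss
        self.cwnd = max(self.cwnd // 2, self.mss)
        self.ssthresh = self.cwnd    # simplifying assumption
        self.reduced_at = self.snd_nxt
        self.cwr_pending = True

    def send_new_data(self, length: int) -> dict:
        pkt = {"seq": self.snd_nxt, "ect": True, "cwr": self.cwr_pending}
        self.cwr_pending = False     # only the first new packet carries CWR
        self.snd_nxt += length
        return pkt

    def retransmit(self, seq: int) -> dict:
        # Retransmissions carry neither CWR nor ECT (see Section 6.1.5).
        return {"seq": seq, "ect": False, "cwr": False}
```

A second ECN-Echo ACK for data sent before the reduction leaves the
window untouched, while one for later data triggers a new halving.
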
[Floyd94] discusses TCP's response to ECN in more detail. [Floyd98]
discusses the validation test in the ns simulator, which illustrates
a wide range of ECN scenarios. These scenarios include the following:
an ECN followed by another ECN, a Fast Retransmit, or a Retransmit
Timeout; a Retransmit Timeout or a Fast Retransmit followed by an
ECN; and a congestion window of one packet followed by an ECN.

TCP follows existing algorithms for sending data packets in response
to incoming ACKs, multiple duplicate acknowledgments, or retransmit
timeouts [RFC2581]. TCP also follows the normal procedures for
increasing the congestion window when it receives ACK packets without
the ECN-Echo bit set [RFC2581].

6.1.3. The TCP Receiver

When TCP receives a CE data packet at the destination end-system, the
TCP data receiver sets the ECN-Echo flag in the TCP header of the
subsequent ACK packet. If there is any ACK withholding implemented,
as in current "delayed-ACK" TCP implementations where the TCP
receiver can send an ACK for two arriving data packets, then the
ECN-Echo flag in the ACK packet will be set to '1' if the CE
codepoint is set in any of the data packets being acknowledged. That
is, if any of the received data packets are CE packets, then the
returning ACK has the ECN-Echo flag set.

To provide robustness against the possibility of a dropped ACK packet
carrying an ECN-Echo flag, the TCP receiver sets the ECN-Echo flag in
a series of ACK packets sent subsequently. The TCP receiver uses the
CWR flag received from the TCP sender to determine when to stop
setting the ECN-Echo flag.

After a TCP receiver sends an ACK packet with the ECN-Echo bit set,
that TCP receiver continues to set the ECN-Echo flag in all the ACK
packets it sends (whether they acknowledge CE data packets or non-CE
data packets) until it receives a CWR packet (a packet with the CWR
flag set). After the receipt of the CWR packet, acknowledgments for
subsequent non-CE data packets do not have the ECN-Echo flag set. If
another CE packet is received by the data receiver, the receiver
would once again send ACK packets with the ECN-Echo flag set. While
the receipt of a CWR packet does not guarantee that the data sender
received the ECN-Echo message, this does suggest that the data sender
reduced its congestion window at some point *after* it sent the data
packet for which the CE codepoint was set.

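The receiver side amounts to a one-bit latch: set by a CE packet and
cleared by a CWR packet. The Python sketch below illustrates this.
The class name is hypothetical, and handling CWR before CE when both
appear on the same packet is an assumption of this example, chosen so
that a fresh CE mark is not lost.

```python
class EcnReceiver:
    """Tracks whether outgoing ACKs should carry the ECN-Echo flag."""

    def __init__(self):
        self.echo = False

    def on_data(self, ce: bool, cwr: bool) -> bool:
        """Process one arriving data packet; return ECE for the next ACK."""
        if cwr:
            self.echo = False   # sender reports it has reduced its window
        if ce:
            self.echo = True    # new congestion mark: start echoing again
        return self.echo
```

Because the latch persists across packets, a delayed ACK covering
several data packets carries ECN-Echo if any of them was CE-marked.
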
We have already specified that a TCP sender is not required to reduce
its congestion window more than once per window of data. Some care
is required if the TCP sender is to avoid unnecessary reductions of
the congestion window when a window of data includes both dropped
packets and (marked) CE packets. This is illustrated in [Floyd98].

6.1.4. Congestion on the ACK-path

For the current generation of TCP congestion control algorithms, pure
acknowledgement packets (i.e., packets that do not contain any
accompanying data) MUST be sent with the not-ECT codepoint. Current
TCP receivers have no mechanisms for reducing traffic on the ACK-path
in response to congestion notification. Mechanisms for responding to
congestion on the ACK-path are areas for current and future research.
(One simple possibility would be for the sender to reduce its
congestion window when it receives a pure ACK packet with the CE
codepoint set.) For current TCP implementations, a single dropped ACK
generally has only a very small effect on the TCP's sending rate.

6.1.5. Retransmitted TCP packets

This document specifies that ECN-capable TCP implementations MUST NOT
set either ECT codepoint (ECT(0) or ECT(1)) in the IP header for
retransmitted data packets, and that the TCP data receiver SHOULD
ignore the ECN field on arriving data packets that are outside of the
receiver's current window. This is for greater security against
denial-of-service attacks, as well as for robustness of the ECN
congestion indication with packets that are dropped later in the
network.

First, we note that if the TCP sender were to set an ECT codepoint on
|
|||
|
|
a retransmitted packet, then if an unnecessarily-retransmitted packet
|
|||
|
|
was later dropped in the network, the end nodes would never receive
|
|||
|
|
the indication of congestion from the router setting the CE
|
|||
|
|
codepoint. Thus, setting an ECT codepoint on retransmitted data
|
|||
|
|
packets is not consistent with the robust delivery of the congestion
|
|||
|
|
indication even for packets that are later dropped in the network.
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
Ramakrishnan, et al. Standards Track [Page 20]
|
|||
|
|
|
|||
|
|
RFC 3168 The Addition of ECN to IP September 2001
|
|||
|
|
|
|||
|
|
|
|||
|
|
In addition, an attacker capable of spoofing the IP source address of
|
|||
|
|
the TCP sender could send data packets with arbitrary sequence
|
|||
|
|
numbers, with the CE codepoint set in the IP header. On receiving
|
|||
|
|
this spoofed data packet, the TCP data receiver would determine that
|
|||
|
|
the data does not lie in the current receive window, and return a
|
|||
|
|
duplicate acknowledgement. We define an out-of-window packet at the
|
|||
|
|
TCP data receiver as a data packet that lies outside the receiver's
|
|||
|
|
current window. On receiving an out-of-window packet, the TCP data
|
|||
|
|
receiver has to decide whether or not to treat the CE codepoint in
|
|||
|
|
the packet header as a valid indication of congestion, and therefore
|
|||
|
|
whether to return ECN-Echo indications to the TCP data sender. If
|
|||
|
|
the TCP data receiver ignored the CE codepoint in an out-of-window
|
|||
|
|
packet, then the TCP data sender would not receive this possibly-
|
|||
|
|
legitimate indication of congestion from the network, resulting in a
|
|||
|
|
violation of end-to-end congestion control. On the other hand, if
|
|||
|
|
the TCP data receiver honors the CE indication in the out-of-window
|
|||
|
|
packet, and reports the indication of congestion to the TCP data
|
|||
|
|
sender, then the malicious node that created the spoofed, out-of-
|
|||
|
|
window packet has successfully "attacked" the TCP connection by
|
|||
|
|
forcing the data sender to unnecessarily reduce (halve) its
|
|||
|
|
congestion window. To prevent such a denial-of-service attack, we
|
|||
|
|
specify that a legitimate TCP data sender MUST NOT set an ECT
|
|||
|
|
codepoint on retransmitted data packets, and that the TCP data
|
|||
|
|
receiver SHOULD ignore the CE codepoint on out-of-window packets.
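
As an illustration only, the receiver-side rule can be sketched in
Python. The names honor_ce, rcv_nxt and rcv_wnd are assumptions of
this sketch rather than part of this specification, and a packet is
classified by its first sequence number alone, which simplifies the
handling of segments that only partly overlap the window.

```python
CE = 0b11  # Congestion Experienced codepoint of the two-bit ECN field


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % 2**32 < rcv_wnd


def honor_ce(ecn_field: int, seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Decide whether an arriving data packet should trigger ECN-Echo.

    A CE mark is honored only when the packet is in-window; CE on an
    out-of-window packet is ignored, as the text above recommends.
    """
    return ecn_field == CE and in_window(seq, rcv_nxt, rcv_wnd)
```

With this shape, a spoofed packet carrying CE and an arbitrary
sequence number is acknowledged as usual but does not cause the
sender to reduce its congestion window.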
|
|||
|
|
|
|||
|
|
One drawback of not setting ECT(0) or ECT(1) on retransmitted packets
is that it denies ECN protection for retransmitted packets. However,
for an ECN-capable TCP connection in a fully-ECN-capable environment
with mild congestion, packets should rarely be dropped due to
congestion in the first place, and so instances of retransmitted
packets should rarely arise. If packets are being retransmitted,
then there are already packet losses (from corruption or from
congestion) that ECN has been unable to prevent.

We note that if the router sets the CE codepoint for an ECN-capable
data packet within a TCP connection, then the TCP connection is
guaranteed to receive that indication of congestion, or to receive
some other indication of congestion within the same window of data,
even if this packet is dropped or reordered in the network. We
consider two cases, when the packet is later retransmitted, and when
the packet is not later retransmitted.

In the first case, if the packet is either dropped or delayed, and at
some point retransmitted by the data sender, then the retransmission
is a result of a Fast Retransmit or a Retransmit Timeout for either
that packet or for some prior packet in the same window of data. In
this case, because the data sender already has retransmitted this
packet, we know that the data sender has already responded to an
indication of congestion for some packet within the same window of
data as the original packet. Thus, even if the first transmission of
the packet is dropped in the network, or is delayed, if it had the CE
codepoint set, and is later ignored by the data receiver as an out-
of-window packet, this is not a problem, because the sender has
already responded to an indication of congestion for that window of
data.

In the second case, if the packet is never retransmitted by the data
sender, then this data packet is the only copy of this data received
by the data receiver, and therefore arrives at the data receiver as
an in-window packet, regardless of how much the packet might be
delayed or reordered. In this case, if the CE codepoint is set on
the packet within the network, this will be treated by the data
receiver as a valid indication of congestion.

6.1.6. TCP Window Probes.

When the TCP data receiver advertises a zero window, the TCP data
sender sends window probes to determine if the receiver's window has
increased. Window probe packets do not contain any user data except
for the sequence number, which is a byte. If a window probe packet
is dropped in the network, this loss is not detected by the receiver.
Therefore, the TCP data sender MUST NOT set either an ECT codepoint
or the CWR bit on window probe packets.

However, because window probes use exact sequence numbers, they
cannot be easily spoofed in denial-of-service attacks. Therefore, if
a window probe arrives with the CE codepoint set, then the receiver
SHOULD respond to the ECN indications.
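
The sender-side rules for retransmitted packets and window probes
can be summarized in a small sketch. The function names and boolean
flags below are hypothetical, and the choice of ECT(0) for ordinary
data packets is an assumption made for the example.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def sender_ecn_field(ecn_negotiated: bool, is_retransmission: bool,
                     is_window_probe: bool) -> int:
    """Pick the ECN codepoint for an outgoing TCP data packet."""
    if not ecn_negotiated:
        return NOT_ECT
    if is_retransmission or is_window_probe:
        # An ECT codepoint MUST NOT be set on these packets.
        return NOT_ECT
    return ECT_0  # assumption of this sketch: ordinary data uses ECT(0)


def may_set_cwr(is_window_probe: bool) -> bool:
    """The CWR bit MUST NOT be set on window probe packets."""
    return not is_window_probe
```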
|
|||
|
|
|
|||
|
|
7. Non-compliance by the End Nodes

This section discusses concerns about the vulnerability of ECN to
non-compliant end-nodes (i.e., end nodes that set the ECT codepoint
in transmitted packets but do not respond to received CE packets).
We argue that the addition of ECN to the IP architecture will not
significantly increase the current vulnerability of the architecture
to unresponsive flows.

Even for non-ECN environments, there are serious concerns about the
damage that can be done by non-compliant or unresponsive flows (that
is, flows that do not respond to congestion control indications by
reducing their arrival rate at the congested link). For example, an
end-node could "turn off congestion control" by not reducing its
congestion window in response to packet drops. This is a concern for
the current Internet. It has been argued that routers will have to
deploy mechanisms to detect and differentially treat packets from
non-compliant flows [RFC2309,FF99]. It has also been suggested that
techniques such as end-to-end per-flow scheduling and isolation of
one flow from another, differentiated services, or end-to-end
reservations could remove some of the more damaging effects of
unresponsive flows.

It might seem that dropping packets in itself is an adequate
deterrent for non-compliance, and that the use of ECN removes this
deterrent. We would argue in response that (1) ECN-capable routers
preserve packet-dropping behavior in times of high congestion; and
(2) even in times of high congestion, dropping packets in itself is
not an adequate deterrent for non-compliance.

First, ECN-Capable routers will only mark packets (as opposed to
dropping them) when the packet marking rate is reasonably low. During
periods where the average queue size exceeds an upper threshold, and
therefore the potential packet marking rate would be high, our
recommendation is that routers drop packets rather than set the CE
codepoint in packet headers.
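
The marking policy described above can be sketched as a RED-style
decision function. The linear marking probability, the parameter
names (min_th, max_th, max_p) and the injectable random source are
assumptions borrowed from common descriptions of RED, not
requirements of this document.

```python
import random

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def router_action(avg_queue: float, min_th: float, max_th: float,
                  max_p: float, ecn_field: int, rng=random.random) -> str:
    """Return "forward", "mark" (set CE) or "drop" for one arriving packet."""
    if avg_queue < min_th:
        return "forward"
    if avg_queue > max_th:
        # Above the upper threshold the marking rate would be high,
        # so the packet is dropped even if it is ECN-capable.
        return "drop"
    probability = max_p * (avg_queue - min_th) / (max_th - min_th)
    if rng() >= probability:
        return "forward"
    # The packet was selected for a congestion notification: mark it
    # if the transport is ECN-capable, otherwise drop it.
    return "drop" if ecn_field == NOT_ECT else "mark"
```

Passing a fixed rng, for example `lambda: 0.0`, makes the decision
deterministic, which is convenient when exercising the thresholds.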
|
|||
|
|
|
|||
|
|
During the periods of low or moderate packet marking rates when ECN
would be deployed, there would be little deterrent effect on
unresponsive flows of dropping rather than marking those packets. For
example, delay-insensitive flows using reliable delivery might have
an incentive to increase rather than to decrease their sending rate
in the presence of dropped packets. Similarly, delay-sensitive flows
using unreliable delivery might increase their use of FEC in response
to an increased packet drop rate, increasing rather than decreasing
their sending rate. For the same reasons, we do not believe that
packet dropping itself is an effective deterrent for non-compliance
even in an environment of high packet drop rates, when all flows are
sharing the same packet drop rate.

Several methods have been proposed to identify and restrict non-
compliant or unresponsive flows. The addition of ECN to the network
environment would not in any way increase the difficulty of designing
and deploying such mechanisms. If anything, the addition of ECN to
the architecture would make the job of identifying unresponsive flows
slightly easier. For example, in an ECN-Capable environment routers
are not limited to information about packets that are dropped or have
the CE codepoint set at that router itself; in such an environment,
routers could also take note of arriving CE packets that indicate
congestion encountered by that packet earlier in the path.

8. Non-compliance in the Network

This section considers the issues when a router is operating,
possibly maliciously, to modify either of the bits in the ECN field.
We note that in IPv4, the IP header is protected from bit errors by a
header checksum; this is not the case in IPv6. Thus for IPv6 the
ECN field can be accidentally modified by bit errors on links or in
routers without being detected by an IP header checksum.
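
To illustrate the IPv4 case, the sketch below computes the standard
Internet checksum over a 20-octet IPv4 header and shows that changing
the ECN field without recomputing the checksum is detectable. The
header contents are made-up example values and the helper names are
hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def with_checksum(header: bytes) -> bytes:
    """Return the header with octets 10-11 set to a fresh checksum."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return header[:10] + struct.pack("!H", internet_checksum(zeroed)) + header[12:]


def checksum_ok(header: bytes) -> bool:
    """A header with a valid checksum sums to zero after complementing."""
    return internet_checksum(header) == 0


def set_ecn(header: bytes, codepoint: int) -> bytes:
    """Overwrite the two low-order bits of the second octet (the ECN field)."""
    tos = (header[1] & 0xFC) | (codepoint & 0x03)
    return header[:1] + bytes([tos]) + header[2:]


# Example header: version 4, IHL 5, ECT(0), TCP, documentation addresses.
base = struct.pack("!BBHHHBBH4s4s", 0x45, 0x02, 40, 0, 0x4000, 64, 6, 0,
                   bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
sent = with_checksum(base)
marked_stale = set_ecn(sent, 0b11)          # CE written, checksum left stale
marked_fixed = with_checksum(marked_stale)  # CE written, checksum recomputed
```

Here `checksum_ok(marked_stale)` is false while
`checksum_ok(marked_fixed)` is true: a bit error in the ECN field is
caught by the IPv4 header checksum, whereas a deliberate change that
also recomputes the checksum is not.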
|
|||
|
|
|
|||
|
|
By tampering with the bits in the ECN field, an adversary (or a
broken router) could do one or more of the following: falsely report
congestion, disable ECN-Capability for an individual packet, erase
the ECN congestion indication, or falsely indicate ECN-Capability.
Section 18 systematically examines the various cases by which the ECN
field could be modified. The important criterion considered in
determining the consequences of such modifications is whether it is
likely to lead to poorer behavior in any dimension (throughput,
delay, fairness or functionality) than if a router were to drop a
packet.
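
The four kinds of tampering listed above can be expressed as a
classification of a change to the two-bit ECN field. This is a sketch
under the assumption that the change was made by a router that was
not actually congested; the function name and label strings are
hypothetical.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def classify_modification(before: int, after: int) -> set:
    """Return the effects of rewriting the ECN field from before to after."""
    effects = set()
    if before != CE and after == CE:
        effects.add("falsely report congestion")
    if before == CE and after != CE:
        effects.add("erase congestion indication")
    if before != NOT_ECT and after == NOT_ECT:
        effects.add("disable ECN-Capability")
    if before == NOT_ECT and after != NOT_ECT:
        effects.add("falsely indicate ECN-Capability")
    return effects
```

A single rewrite can combine two effects: changing CE to not-ECT both
erases the congestion indication and disables ECN-Capability for that
packet. A change between ECT(0) and ECT(1) yields an empty set, since
it is not one of the four cases listed above.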
|
|||
|
|
|
|||
|
|
The first two possible changes, falsely reporting congestion or
disabling ECN-Capability for an individual packet, are no worse than
if the router were to simply drop the packet. From a congestion
control point of view, setting the CE codepoint in the absence of
congestion by a non-compliant router would be no worse than a router
dropping a packet unnecessarily. By "erasing" an ECT codepoint of a
packet that is later dropped in the network, a router's actions could
result in an unnecessary packet drop for that packet later in the
network.

However, as discussed in Section 18, a router that erases the ECN
congestion indication or falsely indicates ECN-Capability could
potentially do more damage to the flow than if it had simply dropped
the packet. A rogue or broken router that "erased" the CE codepoint
in arriving CE packets would prevent that indication of congestion
from reaching downstream receivers. This could result in the failure
of congestion control for that flow and a resulting increase in
congestion in the network, ultimately resulting in subsequent packets
dropped for this flow as the average queue size increased at the
congested gateway.

Section 19 considers the potential repercussions of subverting end-
to-end congestion control by either falsely indicating ECN-
Capability, or by erasing the congestion indication in ECN (the CE-
codepoint). We observe in Section 19 that the consequence of
subverting ECN-based congestion control may lead to potential
unfairness, but this is likely to be no worse than the subversion of
either ECN-based or packet-based congestion control by the end nodes.

8.1. Complications Introduced by Split Paths

If a router or other network element has access to all of the packets
of a flow, then that router could do no more damage to a flow by
altering the ECN field than it could by simply dropping all of the
packets from that flow. However, in some cases, a malicious or
broken router might have access to only a subset of the packets from
a flow. The question is as follows: can this router, by altering
the ECN field in this subset of the packets, do more damage to that
flow than if it had simply dropped that set of the packets?

This is also discussed in detail in Section 18, which concludes as
follows: It is true that the adversary that has access only to a
subset of packets in an aggregate might, by subverting ECN-based
congestion control, be able to deny the benefits of ECN to the other
packets in the aggregate. While this is undesirable, this is not a
sufficient concern to result in disabling ECN.
|
|||
|
|
|
|||
|
|
9. Encapsulated Packets

9.1. IP packets encapsulated in IP

The encapsulation of IP packet headers in tunnels is used in many
places, including IPsec and IP in IP [RFC2003]. This section
considers issues related to interactions between ECN and IP tunnels,
and specifies two alternative solutions. This discussion is
complemented by RFC 2983's discussion of interactions between
Differentiated Services and IP tunnels of various forms [RFC 2983],
as Differentiated Services uses the remaining six bits of the IP
header octet that is used by ECN (see Figure 2 in Section 5).

Some IP tunnel modes are based on adding a new "outer" IP header that
encapsulates the original, or "inner" IP header and its associated
packet. In many cases, the new "outer" IP header may be added and
removed at intermediate points along a connection, enabling the
network to establish a tunnel without requiring endpoint
participation. We denote tunnels that specify that the outer header
be discarded at tunnel egress as "simple tunnels".

ECN uses the ECN field in the IP header for signaling between routers
and connection endpoints. ECN interacts with IP tunnels based on the
treatment of the ECN field in the IP header. In simple IP tunnels
the octet containing the ECN field is copied or mapped from the inner
IP header to the outer IP header at IP tunnel ingress, and the outer
header's copy of this field is discarded at IP tunnel egress. If the
outer header were to be simply discarded without taking care to deal
with the ECN field, and an ECN-capable router were to set the CE
(Congestion Experienced) codepoint within a packet in a simple IP
tunnel, this indication would be discarded at tunnel egress, losing
the indication of congestion.
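
The loss of the congestion indication in a simple tunnel can be seen
in a few lines of Python. Only the two-bit ECN field is modeled, and
the function names are hypothetical.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def simple_tunnel_ingress(inner: int) -> int:
    """Copy the inner ECN field into the new outer header."""
    return inner


def simple_tunnel_egress(inner: int, outer: int) -> int:
    """Discard the outer header; the inner ECN field is delivered as-is."""
    return inner


inner = ECT_0
outer = simple_tunnel_ingress(inner)
outer = CE  # a congested ECN-capable router in the tunnel marks the outer header
delivered = simple_tunnel_egress(inner, outer)
```

After egress, `delivered` is still ECT(0): the CE mark placed on the
outer header inside the tunnel never reaches the endpoint.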
|
|||
|
|
|
|||
|
|
Thus, the use of ECN over simple IP tunnels would result in routers
attempting to use the outer IP header to signal congestion to
endpoints, but those congestion warnings never arriving because the
outer header is discarded at the tunnel egress point. This problem
was encountered with ECN and IPsec in tunnel mode, and RFC 2481
recommended that ECN not be used with the older simple IPsec tunnels
in order to avoid this behavior and its consequences. When ECN
becomes widely deployed, then simple tunnels likely to carry ECN-
capable traffic will have to be changed. If ECN-capable traffic is
carried by a simple tunnel through a congested, ECN-capable router,
this could result in subsequent packets being dropped for this flow
as the average queue size increases at the congested router, as
discussed in Section 8 above.

From a security point of view, the use of ECN in the outer header of
an IP tunnel might raise security concerns because an adversary could
tamper with the ECN information that propagates beyond the tunnel
endpoint. Based on an analysis in Sections 18 and 19 of these
concerns and the resultant risks, our overall approach is to make
support for ECN an option for IP tunnels, so that an IP tunnel can be
specified or configured either to use ECN or not to use ECN in the
outer header of the tunnel. Thus, in environments or tunneling
protocols where the risks of using ECN are judged to outweigh its
benefits, the tunnel can simply not use ECN in the outer header.
Then the only indication of congestion experienced at routers within
the tunnel would be through packet loss.

The result is that there are two viable options for the behavior of
ECN-capable connections over an IP tunnel, including IPsec tunnels:

* A limited-functionality option in which ECN is preserved in the
  inner header, but disabled in the outer header. The only
  mechanism available for signaling congestion occurring within
  the tunnel in this case is dropped packets.

* A full-functionality option that supports ECN in both the inner
  and outer headers, and propagates congestion warnings from nodes
  within the tunnel to endpoints.

Support for these options requires varying amounts of changes to IP
header processing at tunnel ingress and egress. A small subset of
these changes sufficient to support only the limited-functionality
option would be sufficient to eliminate any incompatibility between
ECN and IP tunnels.

One goal of this document is to give guidance about the tradeoffs
between the limited-functionality and full-functionality options. A
full discussion of the potential effects of an adversary's
modifications of the ECN field is given in Sections 18 and 19.
|
|||
|
|
|
|||
|
|
9.1.1. The Limited-functionality and Full-functionality Options

The limited-functionality option for ECN encapsulation in IP tunnels
is for the not-ECT codepoint to be set in the outside (encapsulating)
header regardless of the value of the ECN field in the inside
(encapsulated) header. With this option, the ECN field in the inner
header is not altered upon decapsulation. The disadvantage of this
approach is that the flow does not have ECN support for that part of
the path that is using IP tunneling, even if the encapsulated packet
(from the original TCP sender) is ECN-Capable. That is, if the
encapsulated packet arrives at a congested router that is ECN-
capable, and the router can decide to drop or mark the packet as an
indication of congestion to the end nodes, the router will not be
permitted to set the CE codepoint in the packet header, but instead
will have to drop the packet.

The full-functionality option for ECN encapsulation is to copy the
ECN codepoint of the inside header to the outside header on
encapsulation if the inside header is not-ECT or ECT, and to set the
ECN codepoint of the outside header to ECT(0) if the ECN codepoint of
the inside header is CE. On decapsulation, if the CE codepoint is
set on the outside header, then the CE codepoint is also set in the
inner header. Otherwise, the ECN codepoint on the inner header is
left unchanged. That is, for full ECN support the encapsulation and
decapsulation processing involves the following: At tunnel ingress,
the full-functionality option sets the ECN codepoint in the outer
header. If the ECN codepoint in the inner header is not-ECT or ECT,
then it is copied to the ECN codepoint in the outer header. If the
ECN codepoint in the inner header is CE, then the ECN codepoint in
the outer header is set to ECT(0). Upon decapsulation at the tunnel
egress, the full-functionality option sets the CE codepoint in the
inner header if the CE codepoint is set in the outer header.
Otherwise, no change is made to this field of the inner header.
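
The ingress and egress rules for the two options can be written out
as a pair of functions. This is a sketch: the function names and the
boolean full_functionality flag are assumptions, only the two-bit ECN
field is modeled, and the decision to drop a packet at egress is not
covered here.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner: int, full_functionality: bool) -> int:
    """Return the ECN codepoint for the outer header at tunnel ingress."""
    if not full_functionality:
        return NOT_ECT   # limited-functionality: outer is always not-ECT
    if inner == CE:
        return ECT_0     # CE is not copied; the outer header starts as ECT(0)
    return inner         # not-ECT, ECT(0) and ECT(1) are copied unchanged


def decapsulate(inner: int, outer: int, full_functionality: bool) -> int:
    """Return the ECN codepoint of the inner header after tunnel egress."""
    if full_functionality and outer == CE:
        return CE        # propagate congestion marked inside the tunnel
    return inner         # otherwise the inner header is left unchanged
```

For example, an ECT(0) packet that is marked CE inside a
full-functionality tunnel leaves the tunnel with CE in its inner
header, while the same packet in a limited-functionality tunnel
leaves with its inner header still set to ECT(0).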
|
|||
|
|
|
|||
|
|
With the full-functionality option, a flow can take advantage of ECN
in those parts of the path that might use IP tunneling. The
disadvantage of the full-functionality option from a security
perspective is that the IP tunnel cannot protect the flow from
certain modifications to the ECN bits in the IP header within the
tunnel. The potential dangers from modifications to the ECN bits in
the IP header are described in detail in Sections 18 and 19.

(1) An IP tunnel MUST modify the handling of the DS field octet at
    IP tunnel endpoints by implementing either the limited-
    functionality or the full-functionality option.

(2) Optionally, an IP tunnel MAY enable the endpoints of an IP
    tunnel to negotiate the choice between the limited-functionality
    and the full-functionality option for ECN in the tunnel.

The minimum required to make ECN usable with IP tunnels is the
limited-functionality option, which prevents ECN from being enabled
in the outer header of the tunnel. Full support for ECN requires the
use of the full-functionality option. If there are no optional
mechanisms for the tunnel endpoints to negotiate a choice between the
limited-functionality or full-functionality option, there can be a
pre-existing agreement between the tunnel endpoints about whether to
support the limited-functionality or the full-functionality ECN
option.

All IP tunnels MUST implement the limited-functionality option, and
SHOULD support the full-functionality option.

In addition, it is RECOMMENDED that packets with the CE codepoint in
the outer header be dropped if they arrive at the tunnel egress point
for a tunnel that uses the limited-functionality option, or for a
tunnel that uses the full-functionality option but for which the
not-ECT codepoint is set in the inner header. This is motivated by
backwards compatibility and to ensure that no unauthorized
modifications of the ECN field take place, and is discussed further
in the next Section (9.1.2).
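
The drop recommendation in the preceding paragraph reduces to a short
predicate. The function name and the full_functionality flag are
hypothetical, and the sketch covers only the rule as stated in that
paragraph.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def should_drop_at_egress(inner: int, outer: int,
                          full_functionality: bool) -> bool:
    """Apply the RECOMMENDED egress check for CE in the outer header."""
    if outer != CE:
        return False          # the recommendation only concerns CE packets
    if not full_functionality:
        return True           # limited-functionality tunnels never expect CE
    return inner == NOT_ECT   # CE on a packet whose transport is not ECN-capable
```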
|
|||
|
|
|
|||
|
|
9.1.2. Changes to the ECN Field within an IP Tunnel.

The presence of a copy of the ECN field in the inner header of an IP
tunnel mode packet provides an opportunity for detection of
unauthorized modifications to the ECN field in the outer header.
Comparison of the ECT fields in the inner and outer headers falls
into two categories for implementations that conform to this
document:

* If the IP tunnel uses the full-functionality option, then the
  not-ECT codepoint should be set in the outer header if and only
  if it is also set in the inner header.

* If the tunnel uses the limited-functionality option, then the
  not-ECT codepoint should be set in the outer header.

Receipt of a packet not satisfying the appropriate condition could be
a cause of concern.
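
The two conditions above translate directly into a consistency check
that a tunnel egress could apply. The function name and the
full_functionality flag are assumptions of this sketch.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def outer_header_consistent(inner: int, outer: int,
                            full_functionality: bool) -> bool:
    """True if the outer ECN field is what a conforming ingress could produce."""
    if full_functionality:
        # Outer is not-ECT if and only if inner is not-ECT.
        return (outer == NOT_ECT) == (inner == NOT_ECT)
    # Limited-functionality: outer must be not-ECT regardless of inner.
    return outer == NOT_ECT
```

A False result does not identify what went wrong; it only flags a
packet that merits the concern mentioned above.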
|
|||
|
|
|
|||
|
|
Consider the case of an IP tunnel where the tunnel ingress point has
not been updated to this document's requirements, while the tunnel
egress point has been updated to support ECN. In this case, the IP
tunnel is not explicitly configured to support the full-functionality
ECN option. However, the tunnel ingress point is behaving identically
to a tunnel ingress point that supports the full-functionality
option. If packets from an ECN-capable connection use this tunnel,
the ECT codepoint will be set in the outer header at the tunnel
ingress point. Congestion within the tunnel may then result in ECN-
capable routers setting CE in the outer header. Because the tunnel
has not been explicitly configured to support the full-functionality
option, the tunnel egress point expects the not-ECT codepoint to be
set in the outer header. When an ECN-capable tunnel egress point
receives a packet with the ECT or CE codepoint in the outer header,
in a tunnel that has not been configured to support the full-
functionality option, that packet should be processed, according to
whether the CE codepoint was set, as follows. It is RECOMMENDED that
on a tunnel that has not been configured to support the full-
functionality option, packets should be dropped at the egress point
if the CE codepoint is set in the outer header but not in the inner
header, and should be forwarded otherwise.

An IP tunnel cannot provide protection against erasure of congestion
indications based on changing the ECN codepoint from CE to ECT. The
erasure of congestion indications may impact the network and other
flows in ways that would not be possible in the absence of ECN. It
is important to note that erasure of congestion indications can only
be performed to congestion indications placed by nodes within the
tunnel; the copy of the ECN field in the inner header preserves
congestion notifications from nodes upstream of the tunnel ingress
(unless the inner header is also erased). If erasure of congestion
notifications is judged to be a security risk that exceeds the
congestion management benefits of ECN, then tunnels could be
specified or configured to use the limited-functionality option.
|
|||
|
|
|
|||
|
|
9.2. IPsec Tunnels

IPsec supports secure communication over potentially insecure network
components such as intermediate routers. IPsec protocols support two
operating modes, transport mode and tunnel mode, that span a wide
range of security requirements and operating environments. Transport
mode security protocol header(s) are inserted between the IP (IPv4 or
IPv6) header and higher layer protocol headers (e.g., TCP), and hence
transport mode can only be used for end-to-end security on a
connection. IPsec tunnel mode is based on adding a new "outer" IP
header that encapsulates the original, or "inner" IP header and its
associated packet. Tunnel mode security headers are inserted between
these two IP headers. In contrast to transport mode, the new "outer"
IP header and tunnel mode security headers can be added and removed
at intermediate points along a connection, enabling security gateways
to secure vulnerable portions of a connection without requiring
endpoint participation in the security protocols. An important
aspect of tunnel mode security is that in the original specification,
the outer header is discarded at tunnel egress, ensuring that
security threats based on modifying the IP header do not propagate
beyond that tunnel endpoint. Further discussion of IPsec can be
found in [RFC2401].

The IPsec protocol as originally defined in [ESP, AH] required that
the inner header's ECN field not be changed by IPsec decapsulation
processing at a tunnel egress node; this would have ruled out the
possibility of full-functionality mode for ECN. At the same time,
this would ensure that an adversary's modifications to the ECN field
cannot be used to launch theft- or denial-of-service attacks across
an IPsec tunnel endpoint, as any such modifications will be discarded
at the tunnel endpoint.

In principle, permitting the use of ECN functionality in the outer
header of an IPsec tunnel raises security concerns because an
adversary could tamper with the information that propagates beyond
the tunnel endpoint. Based on an analysis (included in Sections 18
and 19) of these concerns and the associated risks, our overall
approach has been to provide configuration support for IPsec changes
to remove the conflict with ECN.

In particular, in tunnel mode the IPsec tunnel MUST support the
limited-functionality option outlined in Section 9.1.1, and SHOULD
support the full-functionality option outlined in Section 9.1.1.

This makes permission to use ECN functionality in the outer header of
an IPsec tunnel a configurable part of the corresponding IPsec
Security Association (SA), so that it can be disabled in situations
where the risks are judged to outweigh the benefits. The result is
that an IPsec security administrator is presented with two
alternatives for the behavior of ECN-capable connections within an
IPsec tunnel, the limited-functionality alternative and full-
functionality alternative described earlier.

In addition, this document specifies how the endpoints of an IPsec
tunnel could negotiate enabling ECN functionality in the outer
headers of that tunnel based on security policy. The ability to
negotiate ECN usage between tunnel endpoints would enable a security
administrator to disable ECN in situations where she believes the
risks (e.g., of lost congestion notifications) outweigh the benefits
of ECN.

The IPsec protocol, as defined in [ESP, AH], does not include the IP
header's ECN field in any of its cryptographic calculations (in the
case of tunnel mode, the outer IP header's ECN field is not
included). Hence modification of the ECN field by a network node has
no effect on IPsec's end-to-end security, because it cannot cause any
IPsec integrity check to fail. As a consequence, IPsec does not
provide any defense against an adversary's modification of the ECN
field (i.e., a man-in-the-middle attack), as the adversary's
modification will also have no effect on IPsec's end-to-end security.
In some environments, the ability to modify the ECN field without
affecting IPsec integrity checks may constitute a covert channel; if
it is necessary to eliminate such a channel or reduce its bandwidth,
then the IPsec tunnel should be run in limited-functionality mode.
|
|||
|
|
|
|||
|
|
9.2.1. Negotiation between Tunnel Endpoints
|
|||
|
|
|
|||
|
|
This section describes the detailed changes to enable usage of ECN
|
|||
|
|
over IPsec tunnels, including the negotiation of ECN support between
|
|||
|
|
tunnel endpoints. This is supported by three changes to IPsec:
|
|||
|
|
|
|||
|
|
* An optional Security Association Database (SAD) field indicating
|
|||
|
|
whether tunnel encapsulation and decapsulation processing allows
|
|||
|
|
or forbids ECN usage in the outer IP header.
|
|||
|
|
|
|||
|
|
* An optional Security Association Attribute that enables
|
|||
|
|
negotiation of this SAD field between the two endpoints of an SA
|
|||
|
|
that supports tunnel mode.
|
|||
|
|
|
|||
|
|
* Changes to tunnel mode encapsulation and decapsulation
|
|||
|
|
processing to allow or forbid ECN usage in the outer IP header
|
|||
|
|
based on the value of the SAD field. When ECN usage is allowed
|
|||
|
|
in the outer IP header, the ECT codepoint is set in the outer
|
|||
|
|
header for ECN-capable connections and congestion notifications
|
|||
|
|
(indicated by the CE codepoint) from such connections are
|
|||
|
|
propagated to the inner header at tunnel egress.
|
|||
|
|
|
|||
|
|
If negotiation of ECN usage is implemented, then the SAD field SHOULD
|
|||
|
|
also be implemented. On the other hand, negotiation of ECN usage is
|
|||
|
|
OPTIONAL in all cases, even for implementations that support the SAD
|
|||
|
|
field. The encapsulation and decapsulation processing changes are
|
|||
|
|
REQUIRED, but MAY be implemented without the other two changes by
|
|||
|
|
assuming that ECN usage is always forbidden. The full-functionality
|
|||
|
|
alternative for ECN usage over IPsec tunnels consists of the SAD
|
|||
|
|
field and the full version of encapsulation and decapsulation
|
|||
|
|
processing changes, with or without the OPTIONAL negotiation support.
|
|||
|
|
The limited-functionality alternative consists of a subset of the
|
|||
|
|
encapsulation and decapsulation changes that always forbids ECN
|
|||
|
|
usage.
|
|||
|
|
|
|||
|
|
These changes are covered further in the following three subsections.
|
|||
|
|
|
|||
|
|
9.2.1.1. ECN Tunnel Security Association Database Field
|
|||
|
|
|
|||
|
|
Full ECN functionality adds a new field to the SAD (see [RFC2401]):
|
|||
|
|
|
|||
|
|
ECN Tunnel: allowed or forbidden.
|
|||
|
|
|
|||
|
|
Indicates whether ECN-capable connections using this SA in tunnel
|
|||
|
|
mode are permitted to receive ECN congestion notifications for
|
|||
|
|
congestion occurring within the tunnel. The allowed value enables
|
|||
|
|
ECN congestion notifications. The forbidden value disables such
|
|||
|
|
notifications, causing all congestion to be indicated via dropped
|
|||
|
|
packets.
|
|||
|
|
|
|||
|
|
[OPTIONAL. The value of this field SHOULD be assumed to be
|
|||
|
|
"forbidden" in implementations that do not support it.]
|
|||
|
|
|
|||
|
|
If this attribute is implemented, then the SA specification in a
|
|||
|
|
Security Policy Database (SPD) entry MUST support a corresponding
|
|||
|
|
attribute, and this SPD attribute MUST be covered by the SPD
|
|||
|
|
administrative interface (currently described in Section 4.4.1 of
|
|||
|
|
[RFC2401]).
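
As a rough illustration only, the SAD field described in this section
could be modeled as shown below. The names EcnTunnel, SadEntry, spi,
tunnel_mode, and ecn_notifications_enabled are hypothetical and are not
taken from [RFC2401]; a real SAD entry carries many more fields.

```python
from dataclasses import dataclass
from enum import Enum


class EcnTunnel(Enum):
    """The two values of the ECN Tunnel SAD field."""
    ALLOWED = "allowed"
    FORBIDDEN = "forbidden"


@dataclass
class SadEntry:
    """Hypothetical, heavily simplified SAD entry (illustration only)."""
    spi: int
    tunnel_mode: bool = True
    # Implementations that do not support the field behave as if it
    # were "forbidden", so that is the default here.
    ecn_tunnel: EcnTunnel = EcnTunnel.FORBIDDEN


def ecn_notifications_enabled(sa: SadEntry) -> bool:
    """True if congestion inside the tunnel may be signalled via ECN."""
    return sa.tunnel_mode and sa.ecn_tunnel is EcnTunnel.ALLOWED
```

With this sketch, an entry created without an explicit value reports
that ECN notifications are disabled, so all congestion within the
tunnel is indicated by dropped packets.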
|
|||
|
|
|
|||
|
|
9.2.1.2. ECN Tunnel Security Association Attribute
|
|||
|
|
|
|||
|
|
A new IPsec Security Association Attribute is defined to enable the
|
|||
|
|
support for ECN congestion notifications based on the outer IP header
|
|||
|
|
to be negotiated for IPsec tunnels (see [RFC2407]). This attribute
|
|||
|
|
is OPTIONAL, although implementations that support it SHOULD also
|
|||
|
|
support the SAD field defined in Section 9.2.1.1.
|
|||
|
|
|
|||
|
|
Attribute Type
|
|||
|
|
|
|||
|
|
class               value           type
-------------------------------------------------
ECN Tunnel           10             Basic
|
|||
|
|
|
|||
|
|
The IPsec SA Attribute value 10 has been allocated by IANA to
|
|||
|
|
indicate that the ECN Tunnel SA Attribute is being negotiated; the
|
|||
|
|
type of this attribute is Basic (see Section 4.5 of [RFC2407]). The
|
|||
|
|
Class Values are used to conduct the negotiation. See [RFC2407,
|
|||
|
|
RFC2408, RFC2409] for further information including encoding formats
|
|||
|
|
and requirements for negotiating this SA attribute.
|
|||
|
|
|
|||
|
|
Class Values
|
|||
|
|
|
|||
|
|
ECN Tunnel
|
|||
|
|
|
|||
|
|
Specifies whether ECN functionality is allowed to be used with Tunnel
|
|||
|
|
Encapsulation Mode. This affects tunnel encapsulation and
|
|||
|
|
decapsulation processing - see Section 9.2.1.3.
|
|||
|
|
|
|||
|
|
RESERVED            0
Allowed             1
Forbidden           2
|
|||
|
|
|
|||
|
|
Values 3-61439 are reserved to IANA. Values 61440-65535 are for
|
|||
|
|
private use.
|
|||
|
|
|
|||
|
|
If unspecified, the default shall be assumed to be Forbidden.
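
The value ranges above can be sketched in code as follows. This is an
illustrative sketch, not a normative encoding: the function names are
hypothetical, and the wire layout in encode_basic_attribute assumes the
ISAKMP "TV" form for Basic attributes (attribute-format bit set, 15-bit
attribute type, 16-bit value); consult [RFC2408] for the authoritative
format.

```python
import struct
from typing import Optional

ECN_TUNNEL_ATTR_TYPE = 10  # IPsec SA Attribute value allocated by IANA

RESERVED, ALLOWED, FORBIDDEN = 0, 1, 2


def classify_ecn_tunnel_value(value: int) -> str:
    """Map a 16-bit ECN Tunnel class value to its meaning."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("class value must fit in 16 bits")
    if value == RESERVED:
        return "reserved"
    if value == ALLOWED:
        return "allowed"
    if value == FORBIDDEN:
        return "forbidden"
    if value <= 61439:
        return "reserved-to-iana"
    return "private-use"


def effective_ecn_tunnel(negotiated: Optional[int]) -> int:
    """Apply the default: an unspecified attribute means Forbidden."""
    return FORBIDDEN if negotiated is None else negotiated


def encode_basic_attribute(attr_type: int, value: int) -> bytes:
    """Assumed TV encoding: AF bit (0x8000) | type, then 16-bit value."""
    return struct.pack("!HH", 0x8000 | attr_type, value)
```

For example, encode_basic_attribute(ECN_TUNNEL_ATTR_TYPE, ALLOWED)
produces four octets under the assumed layout.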
|
|||
|
|
|
|||
|
|
ECN Tunnel is a new SA attribute, and hence initiators that use it
|
|||
|
|
can expect to encounter responders that do not understand it, and
|
|||
|
|
therefore reject proposals containing it. For backwards
|
|||
|
|
compatibility with such implementations initiators SHOULD always also
|
|||
|
|
include a proposal without the ECN Tunnel attribute to enable such a
|
|||
|
|
responder to select a transform or proposal that does not contain the
|
|||
|
|
ECN Tunnel attribute. RFC 2407 currently requires responders to
|
|||
|
|
reject all proposals if any proposal contains an unknown attribute;
|
|||
|
|
this requirement is expected to be changed to require a responder not
|
|||
|
|
to select proposals or transforms containing unknown attributes.
|
|||
|
|
|
|||
|
|
9.2.1.3. Changes to IPsec Tunnel Header Processing
|
|||
|
|
|
|||
|
|
For full ECN support, the encapsulation and decapsulation processing
|
|||
|
|
for the IPv4 TOS field and the IPv6 Traffic Class field are changed
|
|||
|
|
from that specified in [RFC2401] to the following:

                     <-- How Outer Hdr Relates to Inner Hdr -->
                     Outer Hdr at                 Inner Hdr at
IPv4                 Encapsulator                 Decapsulator
  Header fields:     --------------------         ------------
    DS Field         copied from inner hdr (5)    no change
    ECN Field        constructed (7)              constructed (8)

IPv6
  Header fields:
    DS Field         copied from inner hdr (6)    no change
    ECN Field        constructed (7)              constructed (8)
|
|||
|
|
|
|||
|
|
(5)(6) If the packet will immediately enter a domain for which the
|
|||
|
|
DSCP value in the outer header is not appropriate, that value MUST
|
|||
|
|
be mapped to an appropriate value for the domain [RFC2474]. Also
see [RFC2475] for further information.
|
|||
|
|
|
|||
|
|
(7) If the value of the ECN Tunnel field in the SAD entry for this
|
|||
|
|
SA is "allowed" and the ECN field in the inner header is set to
|
|||
|
|
any value other than CE, copy this ECN field to the outer header.
|
|||
|
|
If the ECN field in the inner header is set to CE, then set the
|
|||
|
|
ECN field in the outer header to ECT(0).
|
|||
|
|
|
|||
|
|
(8) If the value of the ECN Tunnel field in the SAD entry for this
|
|||
|
|
SA is "allowed" and the ECN field in the inner header is set to
|
|||
|
|
ECT(0) or ECT(1) and the ECN field in the outer header is set to
|
|||
|
|
CE, then copy the ECN field from the outer header to the inner
|
|||
|
|
header. Otherwise, make no change to the ECN field in the inner
|
|||
|
|
header.
|
|||
|
|
|
|||
|
|
Notes (5) and (6) are identical here; both numbers are retained to
match the usage in [RFC2401], where the two notes differ.
|
|||
|
|
|
|||
|
|
The above description applies to implementations that support the ECN
|
|||
|
|
Tunnel field in the SAD; such implementations MUST implement this
|
|||
|
|
processing instead of the processing of the IPv4 TOS octet and IPv6
|
|||
|
|
Traffic Class octet defined in [RFC2401]. This constitutes the
|
|||
|
|
full-functionality alternative for ECN usage with IPsec tunnels.
|
|||
|
|
|
|||
|
|
An implementation that does not support the ECN Tunnel field in the
|
|||
|
|
SAD MUST implement this processing by assuming that the value of the
|
|||
|
|
ECN Tunnel field of the SAD is "forbidden" for every SA. In this
|
|||
|
|
case, the processing of the ECN field reduces to:
|
|||
|
|
|
|||
|
|
(7) Set the ECN field to not-ECT in the outer header.
|
|||
|
|
(8) Make no change to the ECN field in the inner header.
|
|||
|
|
|
|||
|
|
This constitutes the limited-functionality alternative for ECN usage
|
|||
|
|
with IPsec tunnels.
|
|||
|
|
|
|||
|
|
For backwards compatibility, packets with the CE codepoint set in the
|
|||
|
|
outer header SHOULD be dropped if they arrive on an SA that is using
|
|||
|
|
the limited-functionality option, or that is using the full-
|
|||
|
|
functionality option with the not-ECT codepoint set in the inner
|
|||
|
|
header.
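
The processing in notes (7) and (8), together with the limited-
functionality fallback and the backwards-compatibility drop rule, can
be sketched as below. This is an illustrative sketch rather than a
reference implementation: the function names are hypothetical, the
"allowed" flag stands for the ECN Tunnel SAD field, and the two-bit
codepoint values are those of the ECN field (00 not-ECT, 01 ECT(1),
10 ECT(0), 11 CE).

```python
from typing import Optional

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def outer_ecn_at_encapsulation(inner_ecn: int, allowed: bool) -> int:
    """Note (7): construct the ECN field of the outer header."""
    if not allowed:
        return NOT_ECT   # limited functionality: never use ECN in the tunnel
    if inner_ecn == CE:
        return ECT_0     # CE in the inner header becomes ECT(0) outside
    return inner_ecn     # any other value is copied


def inner_ecn_at_decapsulation(inner_ecn: int, outer_ecn: int,
                               allowed: bool) -> Optional[int]:
    """Note (8): return the inner ECN field after decapsulation.

    Returns None when the packet SHOULD be dropped: CE in the outer
    header on an SA where ECN is forbidden, or where the inner header
    is not-ECT.
    """
    if outer_ecn == CE:
        if not allowed or inner_ecn == NOT_ECT:
            return None
        if inner_ecn in (ECT_0, ECT_1):
            return CE    # propagate the congestion indication inward
    return inner_ecn     # otherwise the inner header is unchanged
```

Returning None for the drop case is a design choice of this sketch; an
implementation might instead raise an exception or return a verdict
alongside the header value.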
|
|||
|
|
|
|||
|
|
9.2.2. Changes to the ECN Field within an IPsec Tunnel.
|
|||
|
|
|
|||
|
|
If the ECN Field is changed inappropriately within an IPsec tunnel,
|
|||
|
|
and this change is detected at the tunnel egress, then the receipt of
|
|||
|
|
a packet not satisfying the appropriate condition for its SA is an
|
|||
|
|
auditable event. An implementation MAY create audit records with
|
|||
|
|
per-SA counts of incorrect packets over some time period rather than
|
|||
|
|
creating an audit record for each erroneous packet. Any such audit
|
|||
|
|
record SHOULD contain the headers from at least one erroneous packet,
|
|||
|
|
but need not contain the headers from every packet represented by the
|
|||
|
|
entry.
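
One possible shape for the aggregated audit records described above is
sketched below. The class and method names are hypothetical, and the
choice to keep the headers of the first erroneous packet (rather than
some other packet) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EcnAuditRecord:
    """Per-SA summary of packets that failed the ECN check."""
    spi: int
    count: int = 0
    sample_headers: bytes = b""  # headers from one erroneous packet


class EcnAuditLog:
    """Aggregates violations per SA until the next flush."""

    def __init__(self) -> None:
        self._pending: Dict[int, EcnAuditRecord] = {}

    def record_violation(self, spi: int, headers: bytes) -> None:
        # Only the first violation for an SA stores its headers.
        record = self._pending.setdefault(
            spi, EcnAuditRecord(spi, sample_headers=headers))
        record.count += 1

    def flush(self) -> List[EcnAuditRecord]:
        """Emit one record per SA for the elapsed period and reset."""
        records = list(self._pending.values())
        self._pending.clear()
        return records
```

A caller would invoke flush() once per reporting period and write the
returned records to the audit log.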
|
|||
|
|
|
|||
|
|
9.2.3. Comments for IPsec Support
|
|||
|
|
|
|||
|
|
Substantial comments were received on two areas of this document
|
|||
|
|
during review by the IPsec working group. This section describes
|
|||
|
|
these comments and explains why the proposed changes were not
|
|||
|
|
incorporated.
|
|||
|
|
|
|||
|
|
The first comment indicated that per-node configuration is easier to
|
|||
|
|
implement than per-SA configuration. After serious thought and
|
|||
|
|
despite some initial encouragement of per-node configuration, it no
|
|||
|
|
longer seems to be a good idea. The concern is that as ECN-awareness
|
|||
|
|
is progressively deployed in IPsec, many ECN-aware IPsec
|
|||
|
|
implementations will find themselves communicating with a mixture of
|
|||
|
|
ECN-aware and ECN-unaware IPsec tunnel endpoints. In such an
|
|||
|
|
environment with per-node configuration, the only reasonable thing to
|
|||
|
|
do is forbid ECN usage for all IPsec tunnels, which is not the
|
|||
|
|
desired outcome.
|
|||
|
|
|
|||
|
|
In the second area, several reviewers noted that SA negotiation is
|
|||
|
|
complex, and adding to it is non-trivial. One reviewer suggested
|
|||
|
|
using ICMP after tunnel setup as a possible alternative. The
|
|||
|
|
addition to SA negotiation in this document is OPTIONAL and will
|
|||
|
|
remain so; implementers are free to ignore it. The authors believe
|
|||
|
|
that the assurance it provides can be useful in a number of
|
|||
|
|
situations. In practice, if this is not implemented, it can be
|
|||
|
|
deleted at a subsequent stage in the standards process. Extending
|
|||
|
|
ICMP to negotiate ECN after tunnel setup is more complex than
|
|||
|
|
extending SA attribute negotiation. Some tunnels do not permit
|
|||
|
|
traffic to be addressed to the tunnel egress endpoint, hence the ICMP
|
|||
|
|
packet would have to be addressed to somewhere else, scanned for by
|
|||
|
|
the egress endpoint, and discarded there or at its actual
|
|||
|
|
destination. In addition, ICMP delivery is unreliable, and hence
|
|||
|
|
there is a possibility of an ICMP packet being dropped, entailing the
|
|||
|
|
invention of yet another ack/retransmit mechanism. It seems better
|
|||
|
|
simply to specify an OPTIONAL extension to the existing SA
|
|||
|
|
negotiation mechanism.
|
|||
|
|
|
|||
|
|
9.3. IP packets encapsulated in non-IP Packet Headers.
|
|||
|
|
|
|||
|
|
A different set of issues is raised, relative to ECN, when IP
|
|||
|
|
packets are encapsulated in tunnels with non-IP packet headers. This
|
|||
|
|
occurs with MPLS [MPLS], GRE [GRE], L2TP [L2TP], and PPTP [PPTP].
|
|||
|
|
For these protocols, there is no conflict with ECN; it is just that
|
|||
|
|
ECN cannot be used within the tunnel unless an ECN codepoint can be
|
|||
|
|
specified for the header of the encapsulating protocol. Earlier work
|
|||
|
|
considered a preliminary proposal for incorporating ECN into MPLS,
|
|||
|
|
and proposals for incorporating ECN into GRE, L2TP, or PPTP will be
|
|||
|
|
considered as the need arises.
|
|||
|
|
|
|||
|
|
10. Issues Raised by Monitoring and Policing Devices
|
|||
|
|
|
|||
|
|
One possibility is that monitoring and policing devices (or more
|
|||
|
|
informally, "penalty boxes") will be installed in the network to
|
|||
|
|
monitor whether best-effort flows are appropriately responding to
|
|||
|
|
congestion, and to preferentially drop packets from flows determined
|
|||
|
|
not to be using adequate end-to-end congestion control procedures.
|
|||
|
|
|
|||
|
|
We recommend that any "penalty box" that detects a flow or an
|
|||
|
|
aggregate of flows that is not responding to end-to-end congestion
|
|||
|
|
control first change from marking to dropping packets from that flow,
|
|||
|
|
before taking any additional action to restrict the bandwidth
|
|||
|
|
available to that flow. Thus, initially, the router may drop packets
|
|||
|
|
in which the router would otherwise have set the CE codepoint.
|
|||
|
|
This could include dropping those arriving packets for that flow that
|
|||
|
|
are ECN-Capable and that already have the CE codepoint set. In this
|
|||
|
|
way, any congestion indications seen by that router for that flow
|
|||
|
|
will be guaranteed to also be seen by the end nodes, even in the
|
|||
|
|
presence of malicious or broken routers elsewhere in the path. If we
|
|||
|
|
assume that the first action taken at any "penalty box" for an ECN-
|
|||
|
|
capable flow will be to drop packets instead of marking them, then
|
|||
|
|
there is no way that an adversary that subverts ECN-based end-to-end
|
|||
|
|
congestion control can cause a flow to be characterized as being
|
|||
|
|
non-cooperative and placed into a more severe action within the
|
|||
|
|
"penalty box".
|
|||
|
|
|
|||
|
|
The monitoring and policing devices that are actually deployed could
|
|||
|
|
fall short of the `ideal' monitoring device described above, in that
|
|||
|
|
the monitoring is applied not to a single flow, but to an aggregate
|
|||
|
|
of flows (e.g., those sharing a single IPsec tunnel). In this case,
|
|||
|
|
the switch from marking to dropping would apply to all of the flows
|
|||
|
|
in that aggregate, denying the benefits of ECN to the other flows in
|
|||
|
|
the aggregate also. At the highest level of aggregation, another
|
|||
|
|
form of the disabling of ECN happens even in the absence of
|
|||
|
|
monitoring and policing devices, when ECN-Capable RED queues switch
|
|||
|
|
from marking to dropping packets as an indication of congestion when
|
|||
|
|
the average queue size has exceeded some threshold.
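
The first-stage "penalty box" behavior recommended in this section can
be sketched as a per-packet decision. This is a simplified sketch: the
function and parameter names are hypothetical, would_signal stands for
the queue's decision to indicate congestion on this packet, and
forwarding an already CE-marked packet of an unflagged flow unchanged
is an assumption of the sketch.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def congestion_action(ecn: int, would_signal: bool,
                      flow_flagged: bool) -> str:
    """Return "forward", "mark", or "drop" for one arriving packet."""
    if flow_flagged:
        # First-stage penalty: drop wherever a mark would have been
        # used, optionally including packets that already carry CE.
        if would_signal or ecn == CE:
            return "drop"
        return "forward"
    if not would_signal:
        return "forward"
    if ecn in (ECT_0, ECT_1):
        return "mark"       # set the CE codepoint
    if ecn == CE:
        return "forward"    # already marked upstream (assumption)
    return "drop"           # not-ECT packets can only be dropped
```

Because a flagged flow only ever sees drops, a congestion indication
observed at this device cannot be erased further along the path.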
|
|||
|
|
|
|||
|
|
11. Evaluations of ECN
|
|||
|
|
|
|||
|
|
11.1. Related Work Evaluating ECN
|
|||
|
|
|
|||
|
|
This section discusses some of the related work evaluating the use of
|
|||
|
|
ECN. The ECN Web Page [ECN] has pointers to other papers, as well as
|
|||
|
|
to implementations of ECN.
|
|||
|
|
|
|||
|
|
[Floyd94] considers the advantages and drawbacks of adding ECN to the
|
|||
|
|
TCP/IP architecture. As shown in the simulation-based comparisons,
|
|||
|
|
one advantage of ECN is to avoid unnecessary packet drops for short
|
|||
|
|
or delay-sensitive TCP connections. A second advantage of ECN is in
|
|||
|
|
avoiding some unnecessary retransmit timeouts in TCP. This paper
|
|||
|
|
discusses in detail the integration of ECN into TCP's congestion
|
|||
|
|
control mechanisms. The possible disadvantages of ECN discussed in
|
|||
|
|
the paper are that a non-compliant TCP connection could falsely
|
|||
|
|
advertise itself as ECN-capable, and that a TCP ACK packet carrying
|
|||
|
|
an ECN-Echo message could itself be dropped in the network. The
|
|||
|
|
first of these two issues is discussed in the appendix of this
|
|||
|
|
document, and the second is addressed by the addition of the CWR flag
|
|||
|
|
in the TCP header.
|
|||
|
|
|
|||
|
|
Experimental evaluations of ECN include [RFC2884,K98]. The
|
|||
|
|
conclusions of [K98] and [RFC2884] are that ECN TCP gets moderately
|
|||
|
|
better throughput than non-ECN TCP; that ECN TCP flows are fair
|
|||
|
|
towards non-ECN TCP flows; and that ECN TCP is robust with two-way
|
|||
|
|
traffic (with congestion in both directions) and with multiple
|
|||
|
|
congested gateways. Experiments with many short web transfers show
|
|||
|
|
that, while most of the short connections have similar transfer times
|
|||
|
|
with or without ECN, a small percentage of the short connections have
|
|||
|
|
very long transfer times for the non-ECN experiments as compared to
|
|||
|
|
the ECN experiments.
|
|||
|
|
|
|||
|
|
11.2. A Discussion of the ECN nonce.
|
|||
|
|
|
|||
|
|
The use of two ECT codepoints, ECT(0) and ECT(1), can provide a one-
|
|||
|
|
bit ECN nonce in packet headers [SCWA99]. The primary motivation for
|
|||
|
|
this is the desire to allow mechanisms for the data sender to verify
|
|||
|
|
that network elements are not erasing the CE codepoint, and that data
|
|||
|
|
receivers are properly reporting to the sender the receipt of packets
|
|||
|
|
with the CE codepoint set, as required by the transport protocol.
|
|||
|
|
This section discusses issues of backwards compatibility with IP ECN
|
|||
|
|
implementations in routers conformant with RFC 2481, in which only
|
|||
|
|
one ECT codepoint was defined. We do not believe that the
|
|||
|
|
incremental deployment of ECN implementations that understand the
|
|||
|
|
ECT(1) codepoint will cause significant operational problems. This
|
|||
|
|
is particularly likely to be the case when the deployment of the
|
|||
|
|
ECT(1) codepoint begins with routers, before the ECT(1) codepoint
|
|||
|
|
starts to be used by end-nodes.
|
|||
|
|
|
|||
|
|
11.2.1. The Incremental Deployment of ECT(1) in Routers.
|
|||
|
|
|
|||
|
|
ECN has been an Experimental standard since January 1999, and there
|
|||
|
|
are already implementations of ECN in routers that do not understand
|
|||
|
|
the ECT(1) codepoint. When the use of the ECT(1) codepoint is
|
|||
|
|
standardized for TCP or for other transport protocols, this could
|
|||
|
|
mean that a data sender is using the ECT(1) codepoint, but that this
|
|||
|
|
codepoint is not understood by a congested router on the path.
|
|||
|
|
|
|||
|
|
If allowed by the transport protocol, a data sender would be free not
|
|||
|
|
to make use of ECT(1) at all, and to send all ECN-capable packets
|
|||
|
|
with the codepoint ECT(0). However, if an ECN-capable sender is
|
|||
|
|
using ECT(1), and the congested router on the path did not understand
|
|||
|
|
the ECT(1) codepoint, then the router would end up marking some of
|
|||
|
|
the ECT(0) packets, and dropping some of the ECT(1) packets, as
|
|||
|
|
indications of congestion. Since TCP is required to react to both
|
|||
|
|
marked and dropped packets, this behavior of dropping packets that
|
|||
|
|
could have been marked poses no significant threat to the network,
|
|||
|
|
and is consistent with the overall approach to ECN that allows
|
|||
|
|
routers to determine when and whether to mark packets as they see fit
|
|||
|
|
(see Section 5).
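
The difference between a router that understands only the single ECT
codepoint of RFC 2481 and one that understands both ECT codepoints can
be sketched as follows. The function names are hypothetical, and the
sketch only models the choice of congestion signal for a packet the
router has already decided to signal on.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def legacy_router_signal(ecn: int) -> str:
    """Signal used by a router that does not understand ECT(1)."""
    if ecn == CE:
        return "keep"   # already carries a congestion indication
    return "mark" if ecn == ECT_0 else "drop"


def current_router_signal(ecn: int) -> str:
    """Signal used by a router that understands both ECT codepoints."""
    if ecn == CE:
        return "keep"
    return "mark" if ecn in (ECT_0, ECT_1) else "drop"
```

In both cases the sender receives a congestion indication; the legacy
router merely delivers it as a drop for ECT(1) packets.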
|
|||
|
|
|
|||
|
|
12. Summary of changes required in IP and TCP
|
|||
|
|
|
|||
|
|
This document specified two bits in the IP header to be used for ECN.
|
|||
|
|
The not-ECT codepoint indicates that the transport protocol will
|
|||
|
|
ignore the CE codepoint. This is the default value for the ECN
|
|||
|
|
codepoint. The ECT codepoints indicate that the transport protocol
|
|||
|
|
is willing and able to participate in ECN.
|
|||
|
|
|
|||
|
|
The router sets the CE codepoint to indicate congestion to the end
|
|||
|
|
nodes. The CE codepoint in a packet header MUST NOT be reset by a
|
|||
|
|
router.
|
|||
|
|
|
|||
|
|
TCP requires three changes for ECN: a setup phase and two new flags
|
|||
|
|
in the TCP header. The ECN-Echo flag is used by the data receiver to
|
|||
|
|
inform the data sender of a received CE packet. The Congestion
|
|||
|
|
Window Reduced (CWR) flag is used by the data sender to inform the
|
|||
|
|
data receiver that the congestion window has been reduced.
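
The interplay of the two flags at the data receiver can be sketched as
below. The class and method names are hypothetical; the flag masks
assume the usual layout of the TCP flags octet, with CWR and ECE
occupying the two bits above URG. The handling of a packet that carries
both CE and CWR is a choice made for this sketch.

```python
TCP_FLAG_ECE = 0x40  # ECN-Echo
TCP_FLAG_CWR = 0x80  # Congestion Window Reduced


class ReceiverEcnState:
    """Receiver side: echo congestion until the sender confirms."""

    def __init__(self) -> None:
        self.echo = False

    def on_data(self, ce_marked: bool, cwr_set: bool) -> None:
        if cwr_set:
            self.echo = False   # sender has reduced its window
        if ce_marked:
            self.echo = True    # a new CE packet restarts the echo

    def ack_flags(self) -> int:
        """ECN-related flag bits to set on the next ACK."""
        return TCP_FLAG_ECE if self.echo else 0
```

The receiver keeps setting ECN-Echo on its ACKs until it sees CWR, so
the loss of a single ACK does not lose the congestion indication.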
|
|||
|
|
|
|||
|
|
When ECN (Explicit Congestion Notification) is used, it is required
|
|||
|
|
that congestion indications generated within an IP tunnel not be lost
|
|||
|
|
at the tunnel egress. We specified a minor modification to the IP
|
|||
|
|
protocol's handling of the ECN field during encapsulation and de-
|
|||
|
|
capsulation to allow flows that will undergo IP tunneling to use ECN.
|
|||
|
|
|
|||
|
|
Two options for ECN in tunnels were specified:
|
|||
|
|
|
|||
|
|
1) A limited-functionality option that does not use ECN inside the IP
|
|||
|
|
tunnel, by setting the ECN field in the outer header to not-ECT, and
|
|||
|
|
not altering the inner header at the time of decapsulation.
|
|||
|
|
|
|||
|
|
2) The full-functionality option, which sets the ECN field in the
|
|||
|
|
outer header to either not-ECT or to one of the ECT codepoints,
|
|||
|
|
depending on the ECN field in the inner header. At decapsulation, if
|
|||
|
|
the CE codepoint is set in the outer header, and the inner header is
|
|||
|
|
set to one of the ECT codepoints, then the CE codepoint is copied to
|
|||
|
|
the inner header.
|
|||
|
|
|
|||
|
|
For IPsec tunnels, this document also defines an optional IPsec
|
|||
|
|
Security Association (SA) attribute that enables negotiation of ECN
|
|||
|
|
usage within IPsec tunnels and an optional field in the Security
|
|||
|
|
Association Database to indicate whether ECN is permitted in tunnel
|
|||
|
|
mode on an SA. The required changes to IPsec tunnels for ECN usage
|
|||
|
|
modify RFC 2401 [RFC2401], which defines the IPsec architecture and
|
|||
|
|
specifies some aspects of its implementation. The new IPsec SA
|
|||
|
|
attribute is in addition to those already defined in Section 4.5 of
|
|||
|
|
[RFC2407].
|
|||
|
|
|
|||
|
|
This document obsoletes RFC 2481, "A Proposal to add Explicit
|
|||
|
|
Congestion Notification (ECN) to IP", which defined ECN as an
|
|||
|
|
Experimental Protocol for the Internet Community. The rest of this
|
|||
|
|
section describes the relationship between this document and its
|
|||
|
|
predecessor.
|
|||
|
|
|
|||
|
|
RFC 2481 included a brief discussion of the use of ECN with
|
|||
|
|
encapsulated packets, and noted that for the IPsec specifications at
|
|||
|
|
the time (January 1999), flows could not safely use ECN if they were
|
|||
|
|
to traverse IPsec tunnels. RFC 2481 also described the changes that
|
|||
|
|
could be made to IPsec tunnel specifications to make them compatible
|
|||
|
|
with ECN.
|
|||
|
|
|
|||
|
|
This document also incorporates work that was done after RFC 2481.
|
|||
|
|
First was to describe the changes to IPsec tunnels in detail, and
|
|||
|
|
extensively discuss the security implications of ECN (now included as
|
|||
|
|
Sections 18 and 19 of this document). Second was to extend the
|
|||
|
|
discussion of IPsec tunnels to include all IP tunnels. Because older
|
|||
|
|
IP tunnels are not compatible with a flow's use of ECN, the
|
|||
|
|
deployment of ECN in the Internet will create strong pressure for
|
|||
|
|
older IP tunnels to be updated to an ECN-compatible version, using
|
|||
|
|
either the limited-functionality or the full-functionality option.
|
|||
|
|
|
|||
|
|
This document does not address the issue of including ECN in non-IP
|
|||
|
|
tunnels such as MPLS, GRE, L2TP, or PPTP. An earlier preliminary
|
|||
|
|
document about adding ECN support to MPLS was not advanced.
|
|||
|
|
|
|||
|
|
A third new piece of work after RFC 2481 was to describe the ECN
procedure for retransmitted data packets, namely that an ECT codepoint
should not be set on retransmitted data packets. The motivation for
|
|||
|
|
this additional specification is to eliminate a possible avenue for
|
|||
|
|
denial-of-service attacks on an existing TCP connection. Some prior
|
|||
|
|
deployments of ECN-capable TCP might not conform to the (new)
|
|||
|
|
requirement not to set an ECT codepoint on retransmitted packets; we
|
|||
|
|
do not believe this will cause significant problems in practice.
|
|||
|
|
|
|||
|
|
This document also expands slightly on the specification of the use
|
|||
|
|
of SYN packets for the negotiation of ECN. While some prior
|
|||
|
|
deployments of ECN-capable TCP might not conform to the requirements
|
|||
|
|
specified in this document, we do not believe that this will lead to
|
|||
|
|
any performance or compatibility problems for TCP connections with a
|
|||
|
|
combination of TCP implementations at the endpoints.
|
|||
|
|
|
|||
|
|
This document also includes the specification of the ECT(1)
|
|||
|
|
codepoint, which may be used by TCP as part of the implementation of
|
|||
|
|
an ECN nonce.
|
|||
|
|
|
|||
|
|
13. Conclusions
|
|||
|
|
|
|||
|
|
Given the current effort to implement AQM, we believe this is the
|
|||
|
|
right time to deploy congestion avoidance mechanisms that do not
|
|||
|
|
depend on packet drops alone. With the increased deployment of
|
|||
|
|
applications and transports sensitive to the delay and loss of a
|
|||
|
|
single packet (e.g., realtime traffic, short web transfers),
|
|||
|
|
depending on packet loss as a normal congestion notification
|
|||
|
|
mechanism appears to be insufficient (or at the very least, non-
|
|||
|
|
optimal).
|
|||
|
|
|
|||
|
|
We examined the consequence of modifications of the ECN field within
|
|||
|
|
the network, analyzing all the opportunities for an adversary to
|
|||
|
|
change the ECN field. In many cases, the change to the ECN field is
|
|||
|
|
no worse than dropping a packet. However, we noted that some changes
|
|||
|
|
have the more serious consequence of subverting end-to-end congestion
|
|||
|
|
control. We point out that, even then, the potential damage
|
|||
|
|
is limited, and is similar to the threat posed by end-systems
|
|||
|
|
intentionally failing to cooperate with end-to-end congestion
|
|||
|
|
control.
|
|||
|
|
|
|||
|
|
14. Acknowledgements
|
|||
|
|
|
|||
|
|
Many people have made contributions to this work and this document,
|
|||
|
|
including many that we have not managed to directly acknowledge in
|
|||
|
|
this document. In addition, we would like to thank Kenjiro Cho for
|
|||
|
|
the proposal for the TCP mechanism for negotiating ECN-Capability,
|
|||
|
|
Kevin Fall for the proposal of the CWR bit, Steve Blake for material
|
|||
|
|
on IPv4 Header Checksum Recalculation, Jamal Hadi-Salim for
|
|||
|
|
discussions of ECN issues, and Steve Bellovin, Jim Bound, Brian
|
|||
|
|
Carpenter, Paul Ferguson, Stephen Kent, Greg Minshall, and Vern
|
|||
|
|
Paxson for discussions of security issues. We also thank the
|
|||
|
|
Internet End-to-End Research Group for ongoing discussions of these
|
|||
|
|
issues.
|
|||
|
|
|
|||
|
|
Email discussions with a number of people, including Dax Kelson,
|
|||
|
|
Alexey Kuznetsov, Jamal Hadi-Salim, and Venkat Venkatsubra, have
|
|||
|
|
addressed the issues raised by non-conformant equipment in the
|
|||
|
|
Internet that does not respond to TCP SYN packets with the ECE and
|
|||
|
|
CWR flags set. We thank Mark Handley, Jitentra Padhye, and others
|
|||
|
|
for discussions on the TCP initialization procedures.
|
|||
|
|
|
|||
|
|
The discussion of ECN and IP tunnel considerations draws heavily on
|
|||
|
|
related discussions and documents from the Differentiated Services
|
|||
|
|
Working Group. We thank Tabassum Bint Haque from Dhaka, Bangladesh,
|
|||
|
|
for feedback on IP tunnels. We thank Derrell Piper and Kero Tivinen
|
|||
|
|
for proposing modifications to RFC 2407 that improve the usability of
|
|||
|
|
negotiating the ECN Tunnel SA attribute.
|
|||
|
|
|
|||
|
|
We thank David Wetherall, David Ely, and Neil Spring for the proposal
|
|||
|
|
for the ECN nonce. We also thank Stefan Savage for discussions on
|
|||
|
|
this issue. We thank Bob Briscoe and Jon Crowcroft for raising the
|
|||
|
|
issue of fragmentation in IP, on alternate semantics for the fourth
|
|||
|
|
ECN codepoint, and several other topics. We thank Richard Wendland
|
|||
|
|
for feedback on several issues in the document.
|
|||
|
|
|
|||
|
|
We also thank the IESG, and in particular the Transport Area
|
|||
|
|
Directors over the years, for their feedback and their work towards
|
|||
|
|
the standardization of ECN.
|
|||
|
|
|
|||
|
|
15. References
|
|||
|
|
|
|||
|
|
[AH] Kent, S. and R. Atkinson, "IP Authentication Header",
|
|||
|
|
RFC 2402, November 1998.
|
|||
|
|
|
|||
|
|
[ECN] "The ECN Web Page", URL
|
|||
|
|
"http://www.aciri.org/floyd/ecn.html". Reference for
|
|||
|
|
informational purposes only.
|
|||
|
|
|
|||
|
|
[ESP]        Kent, S. and R. Atkinson, "IP Encapsulating Security
             Payload", RFC 2406, November 1998.

[FIXES]      ECN-under-Linux Unofficial Vendor Support Page, URL
             "http://gtf.org/garzik/ecn/". Reference for
             informational purposes only.

[FJ93]       Floyd, S., and Jacobson, V., "Random Early Detection
             gateways for Congestion Avoidance", IEEE/ACM
             Transactions on Networking, V.1 N.4, August 1993, p.
             397-413.

[Floyd94]    Floyd, S., "TCP and Explicit Congestion Notification",
             ACM Computer Communication Review, V. 24 N. 5, October
             1994, p. 10-23.

[Floyd98]    Floyd, S., "The ECN Validation Test in the NS
             Simulator", URL "http://www-mash.cs.berkeley.edu/ns/",
             test tcl/test/test-all-ecn. Reference for
             informational purposes only.

[FF99]       Floyd, S., and Fall, K., "Promoting the Use of End-to-
             End Congestion Control in the Internet", IEEE/ACM
             Transactions on Networking, August 1999.

[FRED]       Lin, D., and Morris, R., "Dynamics of Random Early
             Detection", SIGCOMM '97, September 1997.

[GRE]        Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
             Routing Encapsulation (GRE)", RFC 1701, October 1994.

[Jacobson88] V. Jacobson, "Congestion Avoidance and Control", Proc.
             ACM SIGCOMM '88, pp. 314-329.

[Jacobson90] V. Jacobson, "Modified TCP Congestion Avoidance
             Algorithm", Message to end2end-interest mailing list,
             April 1990. URL
             "ftp://ftp.ee.lbl.gov/email/vanj.90apr30.txt".

[K98]        Krishnan, H., "Analyzing Explicit Congestion
             Notification (ECN) benefits for TCP", Master's thesis,
             UCLA, 1998. Citation for acknowledgement purposes only.

[L2TP]       Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
             G. and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
             RFC 2661, August 1999.

[MJV96]      S. McCanne, V. Jacobson, and M. Vetterli, "Receiver-
             driven Layered Multicast", SIGCOMM '96, August 1996, pp.
             117-130.

[MPLS]       Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J.
             McManus, "Requirements for Traffic Engineering Over
             MPLS", RFC 2702, September 1999.

[PPTP]       Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,
             W. and G. Zorn, "Point-to-Point Tunneling Protocol
             (PPTP)", RFC 2637, July 1999.

[RFC791]     Postel, J., "Internet Protocol", STD 5, RFC 791,
             September 1981.

[RFC793]     Postel, J., "Transmission Control Protocol", STD 7, RFC
             793, September 1981.

[RFC1141]    Mallory, T. and A. Kullberg, "Incremental Updating of
             the Internet Checksum", RFC 1141, January 1990.

[RFC1349]    Almquist, P., "Type of Service in the Internet Protocol
             Suite", RFC 1349, July 1992.

[RFC1455]    Eastlake, D., "Physical Link Security Type of Service",
             RFC 1455, May 1993.

[RFC1701]    Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
             Routing Encapsulation (GRE)", RFC 1701, October 1994.

[RFC1702]    Hanks, S., Li, T., Farinacci, D. and P. Traina, "Generic
             Routing Encapsulation over IPv4 networks", RFC 1702,
             October 1994.

[RFC2003]    Perkins, C., "IP Encapsulation within IP", RFC 2003,
             October 1996.

[RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2309]    Braden, B., et al., "Recommendations on Queue Management
             and Congestion Avoidance in the Internet", RFC 2309,
             April 1998.

[RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for the
             Internet Protocol", RFC 2401, November 1998.

[RFC2407]    Piper, D., "The Internet IP Security Domain of
             Interpretation for ISAKMP", RFC 2407, November 1998.

[RFC2408]    Maughan, D., Schertler, M., Schneider, M. and J. Turner,
             "Internet Security Association and Key Management
             Protocol (ISAKMP)", RFC 2408, November 1998.

[RFC2409]    Harkins D. and D. Carrel, "The Internet Key Exchange
             (IKE)", RFC 2409, November 1998.

[RFC2474]    Nichols, K., Blake, S., Baker, F. and D. Black,
             "Definition of the Differentiated Services Field (DS
             Field) in the IPv4 and IPv6 Headers", RFC 2474, December
             1998.

[RFC2475]    Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.
             and W. Weiss, "An Architecture for Differentiated
             Services", RFC 2475, December 1998.

[RFC2481]    Ramakrishnan K. and S. Floyd, "A Proposal to add
             Explicit Congestion Notification (ECN) to IP", RFC 2481,
             January 1999.

[RFC2581]    Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
             Control", RFC 2581, April 1999.

[RFC2884]    Hadi Salim, J. and U. Ahmed, "Performance Evaluation of
             Explicit Congestion Notification (ECN) in IP Networks",
             RFC 2884, July 2000.

[RFC2983]    Black, D., "Differentiated Services and Tunnels",
             RFC 2983, October 2000.

[RFC2780]    Bradner S. and V. Paxson, "IANA Allocation Guidelines
             For Values In the Internet Protocol and Related
             Headers", BCP 37, RFC 2780, March 2000.

[RJ90]       K. K. Ramakrishnan and Raj Jain, "A Binary Feedback
             Scheme for Congestion Avoidance in Computer Networks",
             ACM Transactions on Computer Systems, Vol.8, No.2, pp.
             158-181, May 1990.

[SCWA99]     Stefan Savage, Neal Cardwell, David Wetherall, and Tom
             Anderson, "TCP Congestion Control with a Misbehaving
             Receiver", ACM Computer Communications Review, October
             1999.

[TBIT]       Jitendra Padhye and Sally Floyd, "Identifying the TCP
             Behavior of Web Servers", ICSI TR-01-002, February 2001.
             URL "http://www.aciri.org/tbit/".

16. Security Considerations

Security considerations have been discussed in Sections 7, 8, 18, and
19.

17. IPv4 Header Checksum Recalculation

IPv4 header checksum recalculation is an issue with some high-end
router architectures using an output-buffered switch, since most if
not all of the header manipulation is performed on the input side of
the switch, while the ECN decision would need to be made local to the
output buffer. This is not an issue for IPv6, since there is no IPv6
header checksum. The IPv4 TOS octet is the last (low-order) byte of
the first 16-bit half-word of the header.

RFC 1141 [RFC1141] discusses the incremental updating of the IPv4
checksum after the TTL field is decremented. The incremental
updating of the IPv4 checksum after the CE codepoint was set would
work as follows: Let HC be the original header checksum for an ECT(0)
packet, and let HC' be the new header checksum after the CE bit has
been set. That is, the ECN field has changed from '10' to '11'.
Then for header checksums calculated with one's complement
subtraction, HC' would be recalculated as follows:

     HC' = { HC - 1     HC > 1
           { 0x0000     HC = 1

For header checksums calculated on two's complement machines, HC'
would be recalculated as follows after the CE bit was set:

     HC' = { HC - 1     HC > 0
           { 0xFFFE     HC = 0

A similar incremental updating of the IPv4 checksum can be carried
out when the ECN field is changed from ECT(1) to CE, that is, from
'01' to '11'.

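As a cross-check of the update rules above, the following minimal
Python sketch sets the CE codepoint on an ECT packet and patches the
header checksum incrementally, so that the result can be compared
with a full recomputation. It relies on the general identity
HC' = ~(~HC + ~m + m') from RFC 1624, which this document does not
cite and which is brought in here as an assumption. The function
names are hypothetical, and the sketch assumes a header whose length
is a multiple of two bytes.

```python
import struct


def ones_complement_sum(words):
    """Add 16-bit words, folding carries back in (end-around carry)."""
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """Full recomputation; the checksum field (bytes 10-11) counts as zero."""
    words = list(struct.unpack("!%dH" % (len(header) // 2), header))
    words[5] = 0
    return ~ones_complement_sum(words) & 0xFFFF


def set_ce_incremental(header: bytes) -> bytes:
    """Set CE on an ECT(0) or ECT(1) packet and patch the checksum in place."""
    tos = header[1]
    if (tos & 0x03) not in (0x01, 0x02):
        raise ValueError("packet is not ECT(0) or ECT(1)")
    # The TOS octet is the low-order byte of the first 16-bit word.
    old_word = (header[0] << 8) | tos
    new_word = old_word | 0x03
    (hc,) = struct.unpack("!H", header[10:12])
    # Assumed identity (RFC 1624): HC' = ~(~HC + ~m + m')
    new_hc = ~ones_complement_sum(
        [~hc & 0xFFFF, ~old_word & 0xFFFF, new_word]
    ) & 0xFFFF
    out = bytearray(header)
    out[1] = tos | 0x03
    out[10:12] = struct.pack("!H", new_hc)
    return bytes(out)
```

For an ECT(0) packet with a nonzero checksum, the patched value is
HC - 1, in agreement with the two's complement rule given above.
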
18. Possible Changes to the ECN Field in the Network

This section discusses in detail possible changes to the ECN field in
the network, such as falsely reporting congestion, disabling ECN-
Capability for an individual packet, erasing the ECN congestion
indication, or falsely indicating ECN-Capability.

18.1. Possible Changes to the IP Header

18.1.1. Erasing the Congestion Indication

First, we consider the changes that a router could make that would
result in effectively erasing the congestion indication after it had
been set by a router upstream. The convention followed is: ECN
codepoint of received packet -> ECN codepoint of packet transmitted.

Replacing the CE codepoint with the ECT(0) or ECT(1) codepoint
effectively erases the congestion indication. However, with the use
of two ECT codepoints, a router erasing the CE codepoint has no way
to know whether the original ECT codepoint was ECT(0) or ECT(1).
Thus, it is possible for the transport protocol to deploy mechanisms
to detect such erasures of the CE codepoint.

The consequence of the erasure of the CE codepoint for the upstream
router is that there is a potential for congestion to build for a
time, because the congestion indication does not reach the source.
However, the packet would be received and acknowledged.

The potential effect of erasing the congestion indication is complex,
and is discussed in depth in Section 19 below. Note that the effect
of erasing the congestion indication is different from dropping a
packet in the network. When a data packet is dropped, the drop is
detected by the TCP sender, and interpreted as an indication of
congestion. Similarly, if a sufficient number of consecutive
acknowledgement packets are dropped, causing the cumulative
acknowledgement field not to be advanced at the sender, the sender is
limited by the congestion window from sending additional packets, and
ultimately the retransmit timer expires.

In contrast, a systematic erasure of the CE bit by a downstream
router can have the effect of causing a queue buildup at an upstream
router, including the possible loss of packets due to buffer
overflow. There is a potential of unfairness in that another flow
that goes through the congested router could react to the CE bit set
while the flow that has the CE bit erased could see better
performance. The limitations on this potential unfairness are
discussed in more detail in Section 19 below.

The last of the three possible changes (the first two being the
replacement of CE with ECT(0) or with ECT(1)) is to replace the CE
codepoint with the not-ECT codepoint, thus erasing the congestion
indication and disabling ECN-Capability at the same time.

The `erasure' of the congestion indication is only effective if the
packet does not end up being marked or dropped again by a downstream
router. If the CE codepoint is replaced by an ECT codepoint, the
packet remains ECN-Capable, and could be either marked or dropped by
a downstream router as an indication of congestion. If the CE
codepoint is replaced by the not-ECT codepoint, the packet is no
longer ECN-capable, and can therefore be dropped but not marked by a
downstream router as an indication of congestion.

18.1.2. Falsely Reporting Congestion

This change is to set the CE codepoint when an ECT codepoint was
already set, even though there was no congestion. This change does
not affect the treatment of that packet along the rest of the path.
In particular, a router does not examine the CE codepoint in deciding
whether to drop or mark an arriving packet.

However, this could result in the application unnecessarily invoking
end-to-end congestion control, and reducing its arrival rate. By
itself, this is no worse (for the application or for the network)
than if the tampering router had actually dropped the packet.

18.1.3. Disabling ECN-Capability

This change is to turn off the ECT codepoint of a packet. This means
that if the packet later encounters congestion (e.g., by arriving at
a RED queue with a moderate average queue size), it will be dropped
instead of being marked. By itself, this is no worse (for the
application) than if the tampering router had actually dropped the
packet. The saving grace in this particular case is that there is no
congested router upstream expecting a reaction from setting the CE
bit.

18.1.4. Falsely Indicating ECN-Capability

This change would incorrectly label a packet as ECN-Capable. The
packet may have been sent either by an ECN-Capable transport or a
transport that is not ECN-Capable.

If the packet later encounters moderate congestion at an ECN-Capable
router, the router could set the CE codepoint instead of dropping the
packet. If the transport protocol in fact is not ECN-Capable, then
the transport will never receive this indication of congestion, and
will not reduce its sending rate in response. The potential
consequences of falsely indicating ECN-capability are discussed
further in Section 19 below.

If the packet never later encounters congestion at an ECN-Capable
router, then relabeling a not-ECT packet with an ECT codepoint would
have no effect, other than possibly interfering with the use of the
ECN nonce by the transport protocol. Relabeling a not-ECT packet
with the CE codepoint, however, would have the effect of giving
false reports of congestion to a monitoring device along the path.
If the transport protocol is ECN-Capable, then this change
could also have an effect at the transport level, by combining
falsely indicating ECN-Capability with falsely reporting congestion.
For an ECN-capable transport, this would cause the transport to
unnecessarily react to congestion. In this particular case, the
router that is incorrectly changing the ECN field could have dropped
the packet. Thus for this case of an ECN-capable transport, the
consequence of this change to the ECN field is no worse than dropping
the packet.

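The changes discussed in Sections 18.1.1 through 18.1.4 can be
summarized as a lookup on the pair (received codepoint, transmitted
codepoint). The following Python sketch is one possible way to write
that classification. The function name and the label strings are
hypothetical, and the ECT(0)/ECT(1) swap case is an addition that the
text above does not discuss as a separate change. Note that an ECT to
CE change is also what a genuinely congested router does, so the
field alone cannot show whether such a report is false.

```python
# ECN field values: the two low-order bits of the TOS / Traffic Class octet.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def classify_change(received: int, transmitted: int) -> list:
    """Label the effect of rewriting the ECN field from received to transmitted."""
    for codepoint in (received, transmitted):
        if codepoint not in (NOT_ECT, ECT_1, ECT_0, CE):
            raise ValueError("ECN field must be a 2-bit value")
    if received == transmitted:
        return []
    effects = []
    if received == CE:
        # '11' -> '10', '11' -> '01', or '11' -> '00' (Section 18.1.1)
        effects.append("erases congestion indication")
        if transmitted == NOT_ECT:
            effects.append("disables ECN-Capability")
    elif received == NOT_ECT:
        # '00' -> ECT or '00' -> '11' (Section 18.1.4)
        effects.append("falsely indicates ECN-Capability")
        if transmitted == CE:
            effects.append("reports congestion")
    elif transmitted == CE:
        # ECT -> '11': false only if there was no congestion (Section 18.1.2)
        effects.append("reports congestion")
    elif transmitted == NOT_ECT:
        # ECT -> '00' (Section 18.1.3)
        effects.append("disables ECN-Capability")
    else:
        # ECT(0) <-> ECT(1): not covered as a separate case in the text
        effects.append("swaps ECT(0) and ECT(1)")
    return effects
```

For example, classify_change(CE, NOT_ECT) returns both the erasure
label and the disabling label, matching the last case described in
Section 18.1.1.
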
18.2. Information carried in the Transport Header

For TCP, an ECN-capable TCP receiver informs its TCP peer that it is
ECN-capable at the TCP level, conveying this information in the TCP
header at the time the connection is set up. This document does not
consider potential dangers introduced by changes in the transport
header within the network. We note that when IPsec is used, the
transport header is protected both in tunnel and transport modes
[ESP, AH].

Another issue concerns TCP packets with a spoofed IP source address
carrying invalid ECN information in the transport header. For
completeness, we examine here some possible ways that a node spoofing
the IP source address of another node could use the two ECN flags in
the TCP header to launch a denial-of-service attack. However, these
attacks would require an ability for the attacker to use valid TCP
sequence numbers, and any attacker with this ability and with the
ability to spoof IP source addresses could damage the TCP connection
without using the ECN flags. Therefore, ECN does not add any new
vulnerabilities in this respect.

An acknowledgement packet with a spoofed IP source address of the TCP
data receiver could include the ECE bit set. If accepted by the TCP
data sender as a valid packet, this spoofed acknowledgement packet
could result in the TCP data sender unnecessarily halving its
congestion window. However, to be accepted by the data sender, such
a spoofed acknowledgement packet would have to have the correct
32-bit sequence number as well as a valid acknowledgement number. An
attacker that could successfully send such a spoofed acknowledgement
packet could also send a spoofed RST packet, or do other equally
damaging operations to the TCP connection.

Packets with a spoofed IP source address of the TCP data sender could
include the CWR bit set. Again, to be accepted, such a packet would
have to have a valid sequence number. In addition, such a spoofed
packet would have a limited performance impact. Spoofing a data
packet with the CWR bit set could result in the TCP data receiver
sending fewer ECE packets than it would otherwise, if the data
receiver was sending ECE packets when it received the spoofed CWR
packet.

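The argument above rests on the sender honoring ECE only on segments
that pass TCP's ordinary validity checks. The following Python sketch
shows a simplified form of such a check, loosely modeled on the
segment acceptability tests of [RFC793]. It is an illustration under
stated assumptions and not a complete TCP input routine: the function
names and parameters are hypothetical, duplicate acknowledgements are
treated as acceptable, and zero-window and data-length handling are
omitted.

```python
MOD = 1 << 32  # TCP sequence numbers wrap modulo 2**32


def in_window(value: int, start: int, length: int) -> bool:
    """True if value lies in [start, start + length) modulo 2**32."""
    return (value - start) % MOD < length


def honor_ece(seg_seq, seg_ack, ece, rcv_nxt, rcv_wnd, snd_una, snd_nxt):
    """Decide whether an arriving ACK should trigger the ECE response.

    The segment must carry a sequence number inside the receive window
    and must acknowledge data that has been sent and is not older than
    the oldest unacknowledged byte (SND.UNA <= SEG.ACK <= SND.NXT).
    """
    seq_ok = in_window(seg_seq, rcv_nxt, rcv_wnd)
    ack_ok = (seg_ack - snd_una) % MOD <= (snd_nxt - snd_una) % MOD
    return bool(ece) and seq_ok and ack_ok
```

A blind attacker has to guess a sequence number that falls inside the
receive window, and an attacker who can do that can equally well
inject a RST, which is the point made in the text.
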
18.3. Split Paths

In some cases, a malicious or broken router might have access to only
a subset of the packets from a flow. The question is as follows:
can this router, by altering the ECN field in this subset of the
packets, do more damage to that flow than if it had simply dropped
that set of packets?

We will classify the packets in the flow as A packets and B packets,
and assume that the adversary only has access to A packets. Assume
that the adversary is subverting end-to-end congestion control along
the path traveled by A packets only, by either falsely indicating
ECN-Capability upstream of the point where congestion occurs, or
erasing the congestion indication downstream. Consider also that
there exists a monitoring device that sees both the A and B packets,
and will "punish" both the A and B packets if the total flow is
determined not to be properly responding to indications of
congestion. Another key characteristic that we believe is likely to
be true is that the monitoring device, before `punishing' the A&B
flow, will first drop packets instead of setting the CE codepoint,
and will drop arriving packets of that flow that already have the CE
codepoint set. If the end nodes are in fact using end-to-end
congestion control, they will see all of the indications of
congestion seen by the monitoring device, and will begin to respond
to these indications of congestion. Thus, the monitoring device is
successful in providing the indications to the flow at an early
stage.

It is true that the adversary that has access only to the A packets
might, by subverting ECN-based congestion control, be able to deny
the benefits of ECN to the other packets in the A&B aggregate. While
this is unfortunate, this is not a reason to disable ECN.

A variant of falsely reporting congestion occurs when there are two
adversaries along a path, where the first adversary falsely reports
congestion, and the second adversary `erases' those reports. (Unlike
packet drops, ECN congestion reports can be `reversed' later in the
network by a malicious or broken router. However, the use of the ECN
nonce could help the transport to detect this behavior.) While this
would be transparent to the end node, it is possible that a
monitoring device between the first and second adversaries would see
the false indications of congestion. Keep in mind our recommendation
in this document, that before `punishing' a flow for not responding
appropriately to congestion, the router will first switch to dropping
rather than marking as an indication of congestion, for that flow.
When this includes dropping arriving packets from that flow that have
the CE codepoint set, this ensures that these indications of
congestion are being seen by the end nodes. Thus, there is no
additional harm that we are able to postulate as a result of multiple
conflicting adversaries.

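The behavior recommended above for a monitoring device can be
sketched as a small per-packet decision function. This Python sketch
is illustrative only. The name monitor_action and the flow_flagged
input are hypothetical, and how a device decides that a flow is not
responding to congestion is outside the scope of the sketch.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def monitor_action(ecn_field: int, congested: bool, flow_flagged: bool) -> str:
    """Return "forward", "mark", or "drop" for one arriving packet.

    flow_flagged is True once the device suspects that the flow is not
    responding to congestion indications (hypothetical input).
    """
    if flow_flagged:
        # Before punishing the flow, switch from marking to dropping, and
        # drop arrivals that already carry CE, so that an adversary
        # further along the path cannot erase the indication.
        if congested or ecn_field == CE:
            return "drop"
        return "forward"
    if not congested:
        return "forward"
    # Ordinary moderate-congestion behavior: mark ECN-capable packets
    # (a packet that already carries CE simply stays CE), drop the rest.
    return "drop" if ecn_field == NOT_ECT else "mark"
```

Because a drop cannot be reversed later in the network, the end nodes
of a flagged flow see every indication that the monitoring device
generates.
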
19. Implications of Subverting End-to-End Congestion Control

This section focuses on the potential repercussions of subverting
end-to-end congestion control by either falsely indicating ECN-
Capability, or by erasing the congestion indication in ECN (the CE
codepoint). Subverting end-to-end congestion control by either of
these two methods can have consequences both for the application and
for the network. We discuss these separately below.

The first method to subvert end-to-end congestion control, that of
falsely indicating ECN-Capability, effectively subverts end-to-end
congestion control only if the packet later encounters congestion
that results in the setting of the CE codepoint. In this case, the
transport protocol (which may not be ECN-capable) does not receive
the indication of congestion from these downstream congested routers.

The second method to subvert end-to-end congestion control, `erasing'
the CE codepoint in a packet, effectively subverts end-to-end
congestion control only when the CE codepoint in the packet was set
earlier by a congested router. In this case, the transport protocol
does not receive the indication of congestion from the upstream
congested routers.

Either of these two methods of subverting end-to-end congestion
control can potentially introduce more damage to the network (and
possibly to the flow itself) than if the adversary had simply dropped
packets from that flow. However, as we discuss later in this section
and in Section 7, this potential damage is limited.

19.1. Implications for the Network and for Competing Flows

The CE codepoint of the ECN field is only used by routers as an
indication of congestion during periods of *moderate* congestion.
ECN-capable routers should drop rather than mark packets during heavy
congestion even if the router's queue is not yet full. For example,
for routers using active queue management based on RED, the router
should drop rather than mark packets that arrive while the average
queue sizes exceed the RED queue's maximum threshold.

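The distinction between the moderate and heavy congestion cases can
be sketched as follows for a RED-style queue. This Python sketch is a
simplification under stated assumptions and not a RED implementation:
the function name is hypothetical, min_th and max_th stand for RED's
minimum and maximum thresholds, and the selected argument stands in
for RED's probabilistic choice of which packets receive a congestion
indication between the two thresholds.

```python
NOT_ECT = 0b00  # ECN field value of a packet that is not ECN-capable


def red_ecn_action(avg_queue, min_th, max_th, ecn_field, selected):
    """Return "enqueue", "mark", or "drop" for one arriving packet."""
    if avg_queue > max_th:
        # Heavy congestion: drop rather than mark, even for ECN-capable
        # packets and even if the queue is not yet full.
        return "drop"
    if avg_queue < min_th or not selected:
        # No congestion indication is generated for this packet.
        return "enqueue"
    # Moderate congestion: set CE if the packet is ECN-capable,
    # otherwise fall back to dropping it.
    return "mark" if ecn_field != NOT_ECT else "drop"
```

Only the "mark" outcome can later be erased by a malicious or broken
router, which is why the potential damage discussed in this section
is bounded by the switch to dropping.
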
One consequence for the network of subverting end-to-end congestion
control is that flows that do not receive the congestion indications
from the network might increase their sending rate until they drive
the network into heavier congestion. Then, the congested router
could begin to drop rather than mark arriving packets. For flows
that are not isolated by some form of per-flow scheduling or other
per-flow mechanisms, but are instead aggregated with other flows in a
single queue in an undifferentiated fashion, this packet-dropping at
the congested router would apply to all flows that share that queue.
Thus, the consequences would be to increase the level of congestion
in the network.

In some cases, the increase in the level of congestion will lead to a
substantial buffer buildup at the congested queue that will be
sufficient to drive the congested queue from the packet-marking to
the packet-dropping regime. This transition could occur either
because of buffer overflow, or because of the active queue management
policy described above that drops packets when the average queue is
above RED's maximum threshold. At this point, all flows, including
the subverted flow, will begin to see packet drops instead of packet
marks, and a malicious or broken router will no longer be able to
`erase' these indications of congestion in the network. If the end
nodes are deploying appropriate end-to-end congestion control, then
the subverted flow will reduce its arrival rate in response to
congestion. When the level of congestion is sufficiently reduced,
the congested queue can return from the packet-dropping regime to the
packet-marking regime. The steady-state pattern could be one of the
congested queue oscillating between these two regimes.

In other cases, the consequences of subverting end-to-end congestion
control will not be severe enough to drive the congested link into
sufficiently heavy congestion that packets are dropped instead of
being marked. In this case, the implications for competing flows in
the network will be a slightly increased rate of packet marking or
dropping, and a corresponding decrease in the bandwidth available to
those flows. This can be a stable state if the arrival rate of the
subverted flow is sufficiently small, relative to the link bandwidth,
that the average queue size at the congested router remains under
control. In particular, the subverted flow could have a limited
bandwidth demand on the link at this router, while still getting more
than its "fair" share of the link. This limited demand could be due
to a limited demand from the data source; a limitation from the TCP
advertised window; a lower-bandwidth access pipe; or other factors.
Thus the subversion of ECN-based congestion control can still lead to
unfairness, which we believe is appropriate to note here.

The threat to the network posed by the subversion of ECN-based
congestion control in the network is essentially the same as the
threat posed by an end-system that intentionally fails to cooperate
with end-to-end congestion control. The deployment of mechanisms in
routers to address this threat is an open research question, and is
discussed further in Section 10.

Let us take the example described in Section 18.1.1, where the CE
codepoint that was set in a packet is erased: {'11' -> '10' or '11'
-> '01'}. The consequence for the congested upstream router that set
the CE codepoint is that this congestion indication does not reach
the end nodes for that flow. The source (even one which is completely
cooperative and not malicious) is thus allowed to continue to
increase its sending rate (if it is a TCP flow, by increasing its
congestion window). The flow potentially achieves better throughput
than the other flows that also share the congested router, especially
if there are no policing mechanisms or per-flow queuing mechanisms at
that router. Consider the behavior of the other flows, especially if
they are cooperative: that is, the flows that do not experience
subverted end-to-end congestion control. They are likely to reduce
their load (e.g., by reducing their window size) on the congested
router, thus benefiting our subverted flow. This results in
unfairness. As we discussed above, this unfairness could be
transient (because the congested queue is driven into the
packet-dropping regime), oscillatory (because the congested queue
oscillates between the packet-marking and the packet-dropping
regimes), or more moderate but persistent (because the congested
queue is never driven to the packet-dropping regime).

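The unfairness described above can be illustrated with a deliberately
crude toy model. The following Python sketch is not part of this
specification and makes strong simplifying assumptions: two
window-based flows share one queue, the cooperative flow halves its
window on a mark or a drop, and the subverted flow (whose CE marks
are erased downstream) halves its window only on a drop. The
thresholds, the additive increase of one packet per round, and the
name simulate are all hypothetical.

```python
def simulate(rounds=100, mark_th=20, drop_th=40):
    """Return (packets sent by cooperative flow, by subverted flow, drop rounds)."""
    coop, subv = 10, 10          # current windows, in packets per round
    sent_coop = sent_subv = drop_rounds = 0
    for _ in range(rounds):
        load = coop + subv
        sent_coop += coop
        sent_subv += subv
        if load > drop_th:
            # Packet-dropping regime: drops cannot be erased, so both
            # flows see the congestion indication and back off.
            drop_rounds += 1
            coop = max(1, coop // 2)
            subv = max(1, subv // 2)
        elif load > mark_th:
            # Packet-marking regime: only the cooperative flow sees CE.
            coop = max(1, coop // 2)
        # Additive increase for both flows.
        coop += 1
        subv += 1
    return sent_coop, sent_subv, drop_rounds
```

With the default parameters the subverted flow sends more than the
cooperative flow, and the queue enters the dropping regime at least
once, which corresponds to the oscillatory case in the text. Raising
drop_th far above the achievable load gives the persistent case.
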
The results would be similar if the subverted flow was intentionally
|
|||
|
|
avoiding end-to-end congestion control. One difference is that a
|
|||
|
|
flow that is intentionally avoiding end-to-end congestion control at
|
|||
|
|
the end nodes can avoid end-to-end congestion control even when the
|
|||
|
|
congested queue is in packet-dropping mode, by refusing to reduce its
|
|||
|
|
sending rate in response to packet drops in the network. Thus the
|
|||
|
|
problems for the network from the subversion of ECN-based congestion
|
|||
|
|
control are less severe than the problems caused by the intentional
|
|||
|
|
avoidance of end-to-end congestion control in the end nodes. It is
|
|||
|
|
also the case that it is considerably more difficult to control the
|
|||
|
|
behavior of the end nodes than it is to control the behavior of the
|
|||
|
|
infrastructure itself. This is not to say that the problems for the
|
|||
|
|
network posed by the network's subversion of ECN-based congestion
|
|||
|
|
control are small; just that they are dwarfed by the problems for the
|
|||
|
|
network posed by the subversion of either ECN-based or other
|
|||
|
|
currently known packet-based congestion control mechanisms by the end
|
|||
|
|
nodes.
|
|||
|
|
|
|||
|
|
19.2. Implications for the Subverted Flow
|
|||
|
|
|
|||
|
|
When a source indicates that it is ECN-capable, there is an
|
|||
|
|
expectation that the routers in the network that are capable of
|
|||
|
|
participating in ECN will use the CE codepoint for indication of
|
|||
|
|
congestion. There is the potential benefit of using ECN in reducing
|
|||
|
|
the amount of packet loss (in addition to the reduced queuing delays
|
|||
|
|
because of active queue management policies). When the packet flows
|
|||
|
|
through an IPsec tunnel where the nodes that the tunneled packets
|
|||
|
|
traverse are untrusted in some way, the expectation is that IPsec
|
|||
|
|
will protect the flow from subversion that results in undesirable
|
|||
|
|
consequences.
|
|||
|
|
|
|||
|
|
In many cases, a subverted flow will benefit from the subversion of
|
|||
|
|
end-to-end congestion control for that flow in the network, by
|
|||
|
|
receiving more bandwidth than it would have otherwise, relative to
|
|||
|
|
competing non-subverted flows. If the congested queue reaches the
|
|||
|
|
packet-dropping stage, then the subversion of end-to-end congestion
|
|||
|
|
control might or might not be of overall benefit to the subverted
|
|||
|
|
flow, depending on that flow's relative tradeoffs between throughput,
|
|||
|
|
loss, and delay.
|
|||
|
|
|
|||
|
|
One form of subverting end-to-end congestion control is to falsely
|
|||
|
|
indicate ECN-capability by setting the ECT codepoint. This has the
|
|||
|
|
consequence of downstream congested routers setting the CE codepoint
|
|||
|
|
in vain. However, as described in Section 9.1.2, if an ECT codepoint
|
|||
|
|
is changed in an IP tunnel, this can be detected at the egress point
|
|||
|
|
of the tunnel, as long as the inner header was not changed within the
|
|||
|
|
tunnel.
|
|||
|
|
|
|||
|
|
The second form of subverting end-to-end congestion control is to
|
|||
|
|
erase the congestion indication by erasing the CE codepoint. In this
|
|||
|
|
case, it is the upstream congested routers that set the CE codepoint
|
|||
|
|
in vain.
|
|||
|
|
|
|||
|
|
If an ECT codepoint is erased within an IP tunnel, then this can be
|
|||
|
|
detected at the egress point of the tunnel, as long as the inner
|
|||
|
|
header was not changed within the tunnel. If the CE codepoint is set
|
|||
|
|
upstream of the IP tunnel, then any erasure of the outer header's CE
|
|||
|
|
codepoint within the tunnel will have no effect because the inner
|
|||
|
|
header preserves the set value of the CE codepoint. However, if the
|
|||
|
|
CE codepoint is set within the tunnel, and erased either within or
|
|||
|
|
downstream of the tunnel, this is not necessarily detected at the
|
|||
|
|
egress point of the tunnel.
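
The detection described above can be sketched as a check performed at the tunnel egress. This is a minimal illustration, not part of the specification: it assumes that the tunnel ingress marked the outer header as ECN-capable whenever the inner header carried ECT(0), ECT(1), or CE, and that the inner header was not modified inside the tunnel. The function and constant names are hypothetical.

```python
# Two-bit ECN field codepoints from the IP header.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def ect_erased_in_tunnel(inner: int, outer: int) -> bool:
    """Return True if the egress can tell that ECT was erased in the tunnel.

    Assumption: the ingress made the outer header ECN-capable whenever the
    inner header was ECT(0), ECT(1), or CE, so an outer Not-ECT codepoint
    arriving with an ECN-capable inner header signals erasure.
    """
    inner_ecn_capable = inner in (ECT_0, ECT_1, CE)
    return inner_ecn_capable and outer == NOT_ECT
```

A CE codepoint that was set inside the tunnel and then rewritten to an ECT codepoint would pass this check, which matches the caveat that such an erasure is not necessarily detected at the egress point.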
|
|||
|
|
|
|||
|
|
With this subversion of end-to-end congestion control, an end-system
|
|||
|
|
transport does not respond to the congestion indication. Along with
|
|||
|
|
the increased unfairness for the non-subverted flows described in the
|
|||
|
|
previous section, the congested router's queue could continue to
|
|||
|
|
build, resulting in packet loss at the congested router - which is a
|
|||
|
|
means for indicating congestion to the transport in any case. In the
|
|||
|
|
interim, the flow might experience higher queuing delays, possibly
|
|||
|
|
along with an increased bandwidth relative to other non-subverted
|
|||
|
|
flows. But transports do not inherently assume that queuing along
the path is consistently and carefully managed. We
|
|||
|
|
believe that these forms of subverting end-to-end congestion control
|
|||
|
|
are no worse for the subverted flow than if the adversary had simply
|
|||
|
|
dropped the packets of that flow itself.
|
|||
|
|
|
|||
|
|
19.3. Non-ECN-Based Methods of Subverting End-to-end Congestion Control
|
|||
|
|
|
|||
|
|
We have shown that, in many cases, a malicious or broken router that
|
|||
|
|
is able to change the bits in the ECN field can do no more damage
|
|||
|
|
than if it had simply dropped the packet in question. However, this
|
|||
|
|
is not true in all cases, in particular in the cases where the broken
|
|||
|
|
router subverted end-to-end congestion control by either falsely
|
|||
|
|
indicating ECN-Capability or by erasing the ECN congestion indication
|
|||
|
|
(in the CE codepoint). While there are many ways that a router can
|
|||
|
|
harm a flow by dropping packets, a router cannot subvert end-to-end
|
|||
|
|
congestion control by dropping packets. As an example, a router
|
|||
|
|
cannot subvert TCP congestion control by dropping data packets,
|
|||
|
|
acknowledgement packets, or control packets.
|
|||
|
|
|
|||
|
|
Even though packet-dropping cannot be used to subvert end-to-end
|
|||
|
|
congestion control, there *are* non-ECN-based methods for subverting
|
|||
|
|
end-to-end congestion control that a broken or malicious router could
|
|||
|
|
use. For example, a broken router could duplicate data packets, thus
|
|||
|
|
effectively negating the effects of end-to-end congestion control
|
|||
|
|
along some portion of the path. (For a router that duplicated
|
|||
|
|
packets within an IPsec tunnel, the security administrator can cause
|
|||
|
|
the duplicate packets to be discarded by configuring anti-replay
|
|||
|
|
protection for the tunnel.) This duplication of packets within the
|
|||
|
|
network would have similar implications for the network and for the
|
|||
|
|
subverted flow as those described in Sections 18.1.1 and 18.1.4
|
|||
|
|
above.
|
|||
|
|
|
|||
|
|
20. The Motivation for the ECT Codepoints.
|
|||
|
|
|
|||
|
|
20.1. The Motivation for an ECT Codepoint.
|
|||
|
|
|
|||
|
|
The need for an ECT codepoint is motivated by the fact that ECN will
|
|||
|
|
be deployed incrementally in an Internet where some transport
|
|||
|
|
protocols and routers understand ECN and some do not. With an ECT
|
|||
|
|
codepoint, the router can drop packets from flows that are not ECN-
|
|||
|
|
capable, but can *instead* set the CE codepoint in packets that *are*
|
|||
|
|
ECN-capable. Because an ECT codepoint allows an end node to have the
|
|||
|
|
CE codepoint set in a packet *instead* of having the packet dropped,
|
|||
|
|
an end node might have some incentive to deploy ECN.
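
The choice that the ECT codepoint gives a router can be sketched as follows. This is an illustrative sketch only: it assumes the active queue management has already selected the packet for a congestion signal and that the queue is still in the marking regime (a heavily congested router may drop packets regardless). The function name and return convention are hypothetical.

```python
# Two-bit ECN field codepoints from the IP header.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def congestion_action(ecn_field: int):
    """Decide how to signal congestion for a packet selected by the AQM.

    Returns ("drop", None) for packets that are not ECN-capable, and
    ("forward", CE) for ECN-capable packets, which are marked instead of
    being dropped. A packet that already carries CE is forwarded with CE
    left in place.
    """
    if ecn_field == NOT_ECT:
        return ("drop", None)
    return ("forward", CE)
```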
|
|||
|
|
|
|||
|
|
If there were no ECT codepoint, then the router would have to set the
|
|||
|
|
CE codepoint for packets from both ECN-capable and non-ECN-capable
|
|||
|
|
flows. In this case, there would be no incentive for end-nodes to
|
|||
|
|
deploy ECN, and no viable path of incremental deployment from a non-
|
|||
|
|
ECN world to an ECN-capable world. Consider the first stages of such
|
|||
|
|
an incremental deployment, where a subset of the flows are ECN-
|
|||
|
|
capable. At the onset of congestion, when the packet
|
|||
|
|
dropping/marking rate would be low, routers would only set CE
|
|||
|
|
codepoints, rather than dropping packets. However, only those flows
|
|||
|
|
that are ECN-capable would understand and respond to CE packets. The
|
|||
|
|
result is that the ECN-capable flows would back off, and the non-
|
|||
|
|
ECN-capable flows would be unaware of the ECN signals and would
|
|||
|
|
continue to open their congestion windows.
|
|||
|
|
|
|||
|
|
In this case, there are two possible outcomes: (1) the ECN-capable
|
|||
|
|
flows back off, the non-ECN-capable flows get all of the bandwidth,
|
|||
|
|
and congestion remains mild, or (2) the ECN-capable flows back off,
|
|||
|
|
the non-ECN-capable flows don't, and congestion increases until the
|
|||
|
|
router transitions from setting the CE codepoint to dropping packets.
|
|||
|
|
While this second outcome evens out the fairness, the ECN-capable
|
|||
|
|
flows would still receive little benefit from being ECN-capable,
|
|||
|
|
because the increased congestion would drive the router to packet-
|
|||
|
|
dropping behavior.
|
|||
|
|
|
|||
|
|
A flow that advertises itself as ECN-Capable but does not respond to
|
|||
|
|
CE codepoints is functionally equivalent to a flow that turns off
|
|||
|
|
congestion control, as discussed earlier in this document.
|
|||
|
|
|
|||
|
|
Thus, in a world where a subset of the flows are ECN-capable, but
|
|||
|
|
where ECN-capable flows have no mechanism for indicating that fact to
|
|||
|
|
the routers, there would be less effective and less fair congestion
|
|||
|
|
control in the Internet, resulting in a strong incentive for end
|
|||
|
|
nodes not to deploy ECN.
|
|||
|
|
|
|||
|
|
20.2. The Motivation for two ECT Codepoints.
|
|||
|
|
|
|||
|
|
The primary motivation for the two ECT codepoints is to provide a
|
|||
|
|
one-bit ECN nonce. The ECN nonce allows the development of
|
|||
|
|
mechanisms for the sender to probabilistically verify that network
|
|||
|
|
elements are not erasing the CE codepoint, and that data receivers
|
|||
|
|
are properly reporting to the sender the receipt of packets with the
|
|||
|
|
CE codepoint set.
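
The probabilistic nature of the check can be illustrated with a small sketch. The nonce mechanism itself is not specified in this section, so the sketch rests on stated assumptions: the sender picks ECT(0) or ECT(1) uniformly at random for each packet, and an element that erases a CE mark must guess which ECT codepoint to restore, independently for each packet. All names are hypothetical.

```python
import random

# ECN-capable codepoints that can carry the one-bit nonce, plus CE.
ECT_1, ECT_0, CE = 0b01, 0b10, 0b11


def choose_ect_codepoint(rng: random.Random) -> int:
    """Pick the ECT codepoint for an outgoing packet (the one-bit nonce)."""
    return rng.choice((ECT_0, ECT_1))


def undetected_erasure_probability(erased_marks: int) -> float:
    """Chance that every erased CE mark is restored to the right nonce.

    Assumes each guess is independent and correct with probability 1/2.
    """
    if erased_marks < 0:
        raise ValueError("erased_marks must be non-negative")
    return 0.5 ** erased_marks
```

Under these assumptions, an element that erases ten CE marks escapes detection with probability 2^-10, which is why the check is described as probabilistic rather than exact.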
|
|||
|
|
|
|||
|
|
Another possibility for senders to detect misbehaving network
|
|||
|
|
elements or receivers would be for the data sender to occasionally
|
|||
|
|
send a data packet with the CE codepoint set, to see if the receiver
|
|||
|
|
reports receiving the CE codepoint. Of course, if these packets
|
|||
|
|
encountered congestion in the network, the router might make no
|
|||
|
|
change in the packets, because the CE codepoint would already be set.
|
|||
|
|
Thus, for packets sent with the CE codepoint set, the TCP end-nodes
|
|||
|
|
could not determine if some router intended to set the CE codepoint
|
|||
|
|
in these packets. For this reason, sending packets with the CE
|
|||
|
|
codepoint would have to be done sparingly, and would be a less
|
|||
|
|
effective check against misbehaving network elements and receivers
|
|||
|
|
than would be the ECN nonce.
|
|||
|
|
|
|||
|
|
The assignment of the fourth ECN codepoint to ECT(1) precludes the
|
|||
|
|
use of this codepoint for some other purposes. For clarity, we
|
|||
|
|
briefly list other possible purposes here.
|
|||
|
|
|
|||
|
|
One possibility might have been for the data sender to use the fourth
|
|||
|
|
ECN codepoint to indicate alternate semantics for ECN. However,
it seems to us more appropriate to signal such semantics using a
differentiated services codepoint in the DS field.
|
|||
|
|
|
|||
|
|
A second possible use for the fourth ECN codepoint would have been to
|
|||
|
|
give the router two separate codepoints for the indication of
|
|||
|
|
congestion, CE(0) and CE(1), for mild and severe congestion
|
|||
|
|
respectively. While this could be useful in some cases, this
|
|||
|
|
certainly does not seem a compelling requirement at this point. If
there were judged to be a compelling need for this, the complications
of incremental deployment would most likely necessitate more than
just one codepoint for this function.
|
|||
|
|
|
|||
|
|
A third use that has been informally proposed for the fourth ECN codepoint
|
|||
|
|
is for use in some forms of multicast congestion control, based on
|
|||
|
|
randomized procedures for duplicating marked packets at routers.
|
|||
|
|
Some proposed multicast packet duplication procedures are based on a
|
|||
|
|
new ECN codepoint that (1) conveys the fact that congestion occurred
|
|||
|
|
upstream of the duplication point that marked the packet with this
|
|||
|
|
codepoint and (2) can detect congestion downstream of that
|
|||
|
|
duplication point. ECT(1) can serve this purpose because it is both
|
|||
|
|
distinct from ECT(0) and is replaced by CE when ECN marking occurs in
|
|||
|
|
response to congestion or incipient congestion. Explanation of how
|
|||
|
|
this enhanced version of ECN would be used by multicast congestion
|
|||
|
|
control is beyond the scope of this document, as are ECN-aware
|
|||
|
|
multicast packet duplication procedures and the processing of the ECN
|
|||
|
|
field at multicast receivers in all cases (i.e., irrespective of the
|
|||
|
|
multicast packet duplication procedure(s) used).
|
|||
|
|
|
|||
|
|
The specification of IP tunnel modifications for ECN in this document
|
|||
|
|
assumes that the only change made to the outer IP header's ECN field
|
|||
|
|
between tunnel endpoints is to set the CE codepoint to indicate
|
|||
|
|
congestion. This is not consistent with some of the proposed uses of
|
|||
|
|
ECT(1) by the multicast duplication procedures in the previous
|
|||
|
|
paragraph, and such procedures SHOULD NOT be deployed unless this
|
|||
|
|
inconsistency between multicast duplication procedures and IP tunnels
|
|||
|
|
with full ECN functionality is resolved. Limited ECN functionality
|
|||
|
|
may be used instead, although in practice many tunnel protocols
|
|||
|
|
(including IPsec) will not work correctly if multicast traffic
|
|||
|
|
duplication occurs within the tunnel.
|
|||
|
|
|
|||
|
|
21. Why use Two Bits in the IP Header?
|
|||
|
|
|
|||
|
|
Given the need for an ECT indication in the IP header, there still
|
|||
|
|
remains the question of whether the ECT (ECN-Capable Transport) and
|
|||
|
|
CE (Congestion Experienced) codepoints should have been overloaded on
|
|||
|
|
a single bit. This overloaded-one-bit alternative, explored in
|
|||
|
|
[Floyd94], would have involved a single bit with two values. One
|
|||
|
|
value, "ECT and not CE", would represent an ECN-Capable Transport,
|
|||
|
|
and the other value, "CE or not ECT", would represent either
|
|||
|
|
Congestion Experienced or a non-ECN-Capable transport.
|
|||
|
|
|
|||
|
|
One difference between the one-bit and two-bit implementations
|
|||
|
|
concerns packets that traverse multiple congested routers. Consider
|
|||
|
|
a CE packet that arrives at a second congested router, and is
|
|||
|
|
selected by the active queue management at that router for either
|
|||
|
|
marking or dropping. In the one-bit implementation, the second
|
|||
|
|
congested router has no choice but to drop the CE packet, because it
|
|||
|
|
cannot distinguish between a CE packet and a non-ECT packet. In the
|
|||
|
|
two-bit implementation, the second congested router has the choice of
|
|||
|
|
either dropping the CE packet, or of leaving it alone with the CE
|
|||
|
|
codepoint set.
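
The difference at a second congested router can be sketched by listing the options available under each encoding. This is an illustration under assumptions: the text does not say which bit value would carry which meaning in the one-bit scheme, so the sketch uses labels rather than bit values, and all names are hypothetical.

```python
from enum import Enum


class TwoBit(Enum):
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class OneBit(Enum):
    # Labels only; the actual bit values are not given in the text.
    ECT_AND_NOT_CE = "ECT and not CE"
    CE_OR_NOT_ECT = "CE or not ECT"


def router_options_two_bit(field: TwoBit) -> frozenset:
    """Options for a congested router when the AQM selects this packet."""
    if field is TwoBit.NOT_ECT:
        return frozenset({"drop"})
    if field is TwoBit.CE:
        # Already marked upstream: drop it or leave the mark in place.
        return frozenset({"drop", "leave CE set"})
    return frozenset({"drop", "set CE"})


def router_options_one_bit(bit: OneBit) -> frozenset:
    """Same decision with ECT and CE overloaded on a single bit."""
    if bit is OneBit.ECT_AND_NOT_CE:
        return frozenset({"drop", "set CE"})
    # A CE packet is indistinguishable from a non-ECT packet: drop only.
    return frozenset({"drop"})
```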
|
|||
|
|
|
|||
|
|
Another difference between the one-bit and two-bit implementations
|
|||
|
|
comes from the fact that with the one-bit implementation, receivers
|
|||
|
|
in a single flow cannot distinguish between CE and non-ECT packets.
|
|||
|
|
Thus, in the one-bit implementation an ECN-capable data sender would
|
|||
|
|
have to unambiguously indicate to the receiver or receivers whether
|
|||
|
|
each packet had been sent as ECN-Capable or as non-ECN-Capable. One
|
|||
|
|
possibility would be for the sender to indicate in the transport
|
|||
|
|
header whether the packet was sent as ECN-Capable. A second
|
|||
|
|
possibility that would involve a functional limitation for the one-
|
|||
|
|
bit implementation would be for the sender to unambiguously indicate
|
|||
|
|
that it was going to send *all* of its packets as ECN-Capable or as
|
|||
|
|
non-ECN-Capable. For a multicast transport protocol, this
|
|||
|
|
unambiguous indication would have to be apparent to receivers joining
|
|||
|
|
an on-going multicast session.
|
|||
|
|
|
|||
|
|
Another concern that was described earlier (with a recommendation in this
|
|||
|
|
document) is that transports (particularly TCP) should not mark pure
|
|||
|
|
ACK packets or retransmitted packets as being ECN-Capable. A pure
|
|||
|
|
ACK packet from a non-ECN-capable transport could be dropped, without
|
|||
|
|
necessarily having an impact on the transport from a congestion
|
|||
|
|
control perspective (because subsequent ACKs are cumulative). An
|
|||
|
|
ECN-capable transport reacting to the CE codepoint in a pure ACK
|
|||
|
|
packet by reducing the window would be at a disadvantage in
|
|||
|
|
comparison to a non-ECN-capable transport. For this reason (and for
|
|||
|
|
reasons described earlier in relation to retransmitted packets), it
|
|||
|
|
is desirable to have the ECT codepoint set on a per-packet basis.
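
A per-packet choice of the ECN field along these lines might look like the sketch below. It is an illustration rather than a definitive sender implementation: the function name and parameters are hypothetical, and the use of ECT(0) for new data segments is an assumption made for the example.

```python
# Two-bit ECN field codepoints from the IP header.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def ecn_codepoint_for_segment(ecn_negotiated: bool,
                              payload_len: int,
                              is_retransmission: bool) -> int:
    """Choose the ECN field for one outgoing TCP segment.

    Pure ACKs (no payload) and retransmitted segments are sent as Not-ECT;
    only new data on an ECN-capable connection is sent as ECN-capable.
    """
    if not ecn_negotiated:
        return NOT_ECT
    if payload_len == 0:
        return NOT_ECT  # pure ACK
    if is_retransmission:
        return NOT_ECT
    return ECT_0  # assumption: ECT(0) is used for new data
```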
|
|||
|
|
|
|||
|
|
Another advantage of the two-bit approach is that it is somewhat more
|
|||
|
|
robust. The most critical issue, discussed in Section 8, is that the
|
|||
|
|
default indication should be that of a non-ECN-Capable transport. In
|
|||
|
|
a two-bit implementation, this requirement for the default value
|
|||
|
|
simply means that the not-ECT codepoint should be the default. In
|
|||
|
|
the one-bit implementation, this means that the single overloaded bit
|
|||
|
|
should by default be in the "CE or not ECT" position. This is less
|
|||
|
|
clear and straightforward, and possibly more open to incorrect
|
|||
|
|
implementations either in the end nodes or in the routers.
|
|||
|
|
|
|||
|
|
In summary, while a one-bit implementation would be possible, it has
the following significant limitations relative to the two-bit
implementation. First, the one-bit implementation has
|
|||
|
|
more limited functionality for the treatment of CE packets at a
|
|||
|
|
second congested router. Second, the one-bit implementation requires
|
|||
|
|
either that extra information be carried in the transport header of
|
|||
|
|
packets from ECN-Capable flows (to convey the functionality of the
|
|||
|
|
second bit elsewhere, namely in the transport header), or that
|
|||
|
|
senders in ECN-Capable flows accept the limitation that receivers
|
|||
|
|
must be able to determine a priori which packets are ECN-Capable and
|
|||
|
|
which are not ECN-Capable. Third, the one-bit implementation is
|
|||
|
|
possibly more open to errors from faulty implementations that choose
|
|||
|
|
the wrong default value for the ECN bit. We believe that the use of
|
|||
|
|
the extra bit in the IP header for the ECT-bit is extremely valuable
in overcoming these limitations.
|
|||
|
|
|
|||
|
|
22. Historical Definitions for the IPv4 TOS Octet
|
|||
|
|
|
|||
|
|
RFC 791 [RFC791] defined the ToS (Type of Service) octet in the IP
|
|||
|
|
header. In RFC 791, bits 6 and 7 of the ToS octet are listed as
|
|||
|
|
"Reserved for Future Use", and are shown set to zero. The first two
|
|||
|
|
fields of the ToS octet were defined as the Precedence and Type of
|
|||
|
|
Service (TOS) fields.
|
|||
|
|
|
|||
|
|
         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |       TOS       |  0  |  0  |  RFC 791
      +-----+-----+-----+-----+-----+-----+-----+-----+
|
|||
|
|
|
|||
|
|
RFC 1122 included bits 6 and 7 in the TOS field, though it did not
|
|||
|
|
discuss any specific use for those two bits:
|
|||
|
|
|
|||
|
|
         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |             TOS             |  RFC 1122
      +-----+-----+-----+-----+-----+-----+-----+-----+
|
|||
|
|
|
|||
|
|
The IPv4 TOS octet was redefined in RFC 1349 [RFC1349] as follows:
|
|||
|
|
|
|||
|
|
         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |   PRECEDENCE    |          TOS          | MBZ |  RFC 1349
      +-----+-----+-----+-----+-----+-----+-----+-----+
|
|||
|
|
|
|||
|
|
Bit 6 in the TOS field was defined in RFC 1349 for "Minimize Monetary
|
|||
|
|
Cost". In addition to the Precedence and Type of Service (TOS)
|
|||
|
|
fields, the last field, MBZ (for "must be zero") was defined as
|
|||
|
|
currently unused. RFC 1349 stated that "The originator of a datagram
|
|||
|
|
sets [the MBZ] field to zero (unless participating in an Internet
|
|||
|
|
protocol experiment which makes use of that bit)."
|
|||
|
|
|
|||
|
|
RFC 1455 [RFC1455] defined an experimental standard that used all
|
|||
|
|
four bits in the TOS field to request a guaranteed level of link
|
|||
|
|
security.
|
|||
|
|
|
|||
|
|
RFC 1349 and RFC 1455 have been obsoleted by "Definition of the
|
|||
|
|
Differentiated Services Field (DS Field) in the IPv4 and IPv6
|
|||
|
|
Headers" [RFC2474] in which bits 6 and 7 of the DS field are listed
|
|||
|
|
as Currently Unused (CU). RFC 2780 [RFC2780] specified ECN as an
|
|||
|
|
experimental use of the two-bit CU field. RFC 2780 updated the
|
|||
|
|
definition of the DS Field to only encompass the first six bits of
|
|||
|
|
this octet rather than all eight bits; these first six bits are
|
|||
|
|
defined as the Differentiated Services CodePoint (DSCP):
|
|||
|
|
|
|||
|
|
         0     1     2     3     4     5     6     7
      +-----+-----+-----+-----+-----+-----+-----+-----+
      |               DSCP                |    CU     |  RFCs 2474,
      +-----+-----+-----+-----+-----+-----+-----+-----+    2780
|
|||
|
|
|
|||
|
|
Because of this unstable history, the definition of the ECN field in
|
|||
|
|
this document cannot be guaranteed to be backwards compatible with
|
|||
|
|
all past uses of these two bits.
|
|||
|
|
|
|||
|
|
Prior to RFC 2474, routers were not permitted to modify bits in
|
|||
|
|
either the DSCP or ECN field of packets forwarded through them, and
|
|||
|
|
hence routers that comply only with RFCs prior to 2474 should have no
|
|||
|
|
effect on ECN. For end nodes, bit 7 (the second ECN bit) must be
|
|||
|
|
transmitted as zero for any implementation compliant only with RFCs
|
|||
|
|
prior to 2474. Such nodes may transmit bit 6 (the first ECN bit) as
|
|||
|
|
one for the "Minimize Monetary Cost" provision of RFC 1349 or the
|
|||
|
|
experiment authorized by RFC 1455; neither this aspect of RFC 1349
|
|||
|
|
nor the experiment in RFC 1455 was widely implemented or used. The
|
|||
|
|
damage that could be done by a broken, non-conformant router would
|
|||
|
|
include "erasing" the CE codepoint for an ECN-capable packet that
|
|||
|
|
arrived at the router with the CE codepoint set, or setting the CE
|
|||
|
|
codepoint even in the absence of congestion. This has been discussed
|
|||
|
|
in the section on "Non-compliance in the Network".
|
|||
|
|
|
|||
|
|
The damage that could be done in an ECN-capable environment by a
|
|||
|
|
non-ECN-capable end-node transmitting packets with the ECT codepoint
|
|||
|
|
set has been discussed in the section on "Non-compliance by the End
|
|||
|
|
Nodes".
|
|||
|
|
|
|||
|
|
23. IANA Considerations
|
|||
|
|
|
|||
|
|
This section lists the namespaces that have been created by this
specification and the values that have been assigned in existing
namespaces managed by IANA.
|
|||
|
|
|
|||
|
|
23.1. IPv4 TOS Byte and IPv6 Traffic Class Octet
|
|||
|
|
|
|||
|
|
The codepoints for the ECN Field of the IP header are specified by
|
|||
|
|
the Standards Action of this RFC, as is required by RFC 2780.
|
|||
|
|
|
|||
|
|
When this document is published as an RFC, IANA should create a new
|
|||
|
|
registry, "IPv4 TOS Byte and IPv6 Traffic Class Octet", with the
|
|||
|
|
namespace as follows:
|
|||
|
|
|
|||
|
|
IPv4 TOS Byte and IPv6 Traffic Class Octet
|
|||
|
|
|
|||
|
|
Description: The registrations are identical for IPv4 and IPv6.
|
|||
|
|
|
|||
|
|
Bits 0-5: see Differentiated Services Field Codepoints Registry
|
|||
|
|
(http://www.iana.org/assignments/dscp-registry)
|
|||
|
|
|
|||
|
|
Bits 6-7, ECN Field:
|
|||
|
|
|
|||
|
|
      Binary  Keyword                                  References
      ------  -------                                  ----------
        00    Not-ECT (Not ECN-Capable Transport)      [RFC 3168]
        01    ECT(1) (ECN-Capable Transport(1))        [RFC 3168]
        10    ECT(0) (ECN-Capable Transport(0))        [RFC 3168]
        11    CE (Congestion Experienced)              [RFC 3168]
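
As an illustration of the registry layout above, the following sketch splits an IPv4 TOS byte or IPv6 Traffic Class octet into its DSCP and ECN parts. Bits are numbered from the most significant bit, so bits 6-7 are the two least significant bits of the octet. The function and dictionary names are hypothetical.

```python
# Keywords for the two-bit ECN field, as registered above.
ECN_KEYWORDS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def split_tos_octet(octet: int):
    """Return (dscp, ecn_keyword) for a TOS / Traffic Class octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must fit in 8 bits")
    dscp = octet >> 2   # bits 0-5: Differentiated Services codepoint
    ecn = octet & 0b11  # bits 6-7: ECN field
    return dscp, ECN_KEYWORDS[ecn]
```

For example, the octet 0b10111010 carries DSCP 46 with the ECT(0) codepoint.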
|
|||
|
|
|
|||
|
|
23.2. TCP Header Flags
|
|||
|
|
|
|||
|
|
The codepoints for the CWR and ECE flags in the TCP header are
|
|||
|
|
specified by the Standards Action of this RFC, as is required by RFC
|
|||
|
|
2780.
|
|||
|
|
|
|||
|
|
When this document is published as an RFC, IANA should create a new
|
|||
|
|
registry, "TCP Header Flags", with the namespace as follows:
|
|||
|
|
|
|||
|
|
TCP Header Flags
|
|||
|
|
|
|||
|
|
The Transmission Control Protocol (TCP) included a 6-bit Reserved
|
|||
|
|
field defined in RFC 793, reserved for future use, in bytes 13 and 14
|
|||
|
|
of the TCP header, as illustrated below. The other six Control bits
|
|||
|
|
are defined separately by RFC 793.
|
|||
|
|
|
|||
|
|
        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      |               |                       | U | A | P | R | S | F |
      | Header Length |        Reserved       | R | C | S | S | Y | I |
      |               |                       | G | K | H | T | N | N |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|
|||
|
|
|
|||
|
|
RFC 3168 defines two of the six bits from the Reserved field to be
|
|||
|
|
used for ECN, as follows:
|
|||
|
|
|
|||
|
|
        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      |               |               | C | E | U | A | P | R | S | F |
      | Header Length |    Reserved   | W | C | R | C | S | S | Y | I |
      |               |               | R | E | G | K | H | T | N | N |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|
|||
|
|
|
|||
|
|
TCP Header Flags
|
|||
|
|
|
|||
|
|
      Bit      Name                                    Reference
      ---      ----                                    ---------
       8        CWR (Congestion Window Reduced)        [RFC 3168]
       9        ECE (ECN-Echo)                         [RFC 3168]
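
The bit positions registered above can be illustrated by reading the two flags from a raw TCP header. In this sketch, bytes 13 and 14 of the header (counting from one) are the bytes at offsets 12 and 13, and bit 0 of that 16-bit word is the most significant bit. The function and mask names are hypothetical.

```python
import struct

# Bit n of the 16-bit word (bit 0 is the most significant) has mask
# 1 << (15 - n).
CWR_MASK = 1 << (15 - 8)  # 0x0080
ECE_MASK = 1 << (15 - 9)  # 0x0040


def ecn_flags(tcp_header: bytes):
    """Return (cwr, ece) read from a raw TCP header."""
    if len(tcp_header) < 14:
        raise ValueError("need at least the first 14 bytes of the header")
    (word,) = struct.unpack("!H", tcp_header[12:14])
    return bool(word & CWR_MASK), bool(word & ECE_MASK)
```

For instance, a SYN segment with both CWR and ECE set has the flag byte 0xC2, and `ecn_flags` returns `(True, True)` for it.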
|
|||
|
|
|
|||
|
|
23.3. IPSEC Security Association Attributes
|
|||
|
|
|
|||
|
|
IANA allocated the IPSEC Security Association Attribute value 10 for
|
|||
|
|
the ECN Tunnel use described in Section 9.2.1.2 above at the request
|
|||
|
|
of David Black in November 1999. The IANA has changed the Reference
|
|||
|
|
for this allocation from David Black's request to this RFC.
|
|||
|
|
|
|||
|
|
24. Authors' Addresses
|
|||
|
|
|
|||
|
|
K. K. Ramakrishnan
|
|||
|
|
TeraOptic Networks, Inc.
|
|||
|
|
|
|||
|
|
Phone: +1 (408) 666-8650
|
|||
|
|
EMail: kk@teraoptic.com
|
|||
|
|
|
|||
|
|
|
|||
|
|
Sally Floyd
|
|||
|
|
ACIRI
|
|||
|
|
|
|||
|
|
Phone: +1 (510) 666-2989
|
|||
|
|
EMail: floyd@aciri.org
|
|||
|
|
URL: http://www.aciri.org/floyd/
|
|||
|
|
|
|||
|
|
|
|||
|
|
David L. Black
|
|||
|
|
EMC Corporation
|
|||
|
|
42 South St.
|
|||
|
|
Hopkinton, MA 01748
|
|||
|
|
|
|||
|
|
Phone: +1 (508) 435-1000 x75140
|
|||
|
|
EMail: black_david@emc.com
|
|||
|
|
|
|||
|
|
25. Full Copyright Statement
|
|||
|
|
|
|||
|
|
Copyright (C) The Internet Society (2001). All Rights Reserved.
|
|||
|
|
|
|||
|
|
This document and translations of it may be copied and furnished to
|
|||
|
|
others, and derivative works that comment on or otherwise explain it
|
|||
|
|
or assist in its implementation may be prepared, copied, published
|
|||
|
|
and distributed, in whole or in part, without restriction of any
|
|||
|
|
kind, provided that the above copyright notice and this paragraph are
|
|||
|
|
included on all such copies and derivative works. However, this
|
|||
|
|
document itself may not be modified in any way, such as by removing
|
|||
|
|
the copyright notice or references to the Internet Society or other
|
|||
|
|
Internet organizations, except as needed for the purpose of
|
|||
|
|
developing Internet standards in which case the procedures for
|
|||
|
|
copyrights defined in the Internet Standards process must be
|
|||
|
|
followed, or as required to translate it into languages other than
|
|||
|
|
English.
|
|||
|
|
|
|||
|
|
The limited permissions granted above are perpetual and will not be
|
|||
|
|
revoked by the Internet Society or its successors or assigns.
|
|||
|
|
|
|||
|
|
This document and the information contained herein is provided on an
|
|||
|
|
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
|||
|
|
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
|||
|
|
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
|||
|
|
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
|||
|
|
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
|
|||
|
|
|
|||
|
|
Acknowledgement
|
|||
|
|
|
|||
|
|
Funding for the RFC Editor function is currently provided by the
|
|||
|
|
Internet Society.
|
|||
|
|