draft-ietf-tsvwg-tunnel-congestion-feedback-03.txt   draft-ietf-tsvwg-tunnel-congestion-feedback-04.txt 
Internet Engineering Task Force X. Wei Internet Engineering Task Force X. Wei
INTERNET-DRAFT Huawei Technologies INTERNET-DRAFT Huawei Technologies
Intended Status: Informational L.Zhu Intended Status: Informational L.Zhu
Expires: April 3, 2017 Huawei Technologies Expires: July 29, 2017 Huawei Technologies
L.Deng L.Deng
China Mobile China Mobile
September 30, 2016 January 25, 2017
Tunnel Congestion Feedback Tunnel Congestion Feedback
draft-ietf-tsvwg-tunnel-congestion-feedback-03 draft-ietf-tsvwg-tunnel-congestion-feedback-04
Abstract Abstract
This document describes a mechanism to calculate congestion of a This document describes a method to measure congestion on a tunnel
tunnel segment based on RFC6040 recommendations, and a feedback segment based on recommendations from RFC 6040, "Tunneling of
protocol by which to send the measured congestion of the tunnel from Explicit Congestion Notification", and to use IPFIX to communicate
egress to ingress . A basic model for measuring tunnel congestion the congestion measurements from the tunnel's egress to a controller
and feedback is described, and a protocol for carrying the feedback which can respond by modifying the traffic control policies at the
data is outlined. tunnel's ingress.
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as other groups may also distribute working documents as
Internet-Drafts. Internet-Drafts.
skipping to change at page 2, line 6 skipping to change at page 2, line 4
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
Copyright and License Notice Copyright and License Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions And Terminologies . . . . . . . . . . . . . . . . . 3 2. Conventions And Terminologies . . . . . . . . . . . . . . . . . 3
3. Congestion Information Feedback Models . . . . . . . . . . . . 4 3. Congestion Information Feedback Models . . . . . . . . . . . . 3
3.1 Direct Model . . . . . . . . . . . . . . . . . . . . . . . . 4 4. Congestion Level Measurement . . . . . . . . . . . . . . . . . 4
3.2 Centralized Model . . . . . . . . . . . . . . . . . . . . . 4 5. Congestion Information Delivery . . . . . . . . . . . . . . . . 6
4. Congestion Level Measurement . . . . . . . . . . . . . . . . . 5 5.1 IPFIX Extensions . . . . . . . . . . . . . . . . . . . . . . 7
5. Congestion Information Delivery . . . . . . . . . . . . . . . . 8 5.1.1 tunnelEcnCeCePacketTotalCount . . . . . . . . . . . . . 8
5.1 IPFIX Extentions . . . . . . . . . . . . . . . . . . . . . . 9 5.1.2 tunnelEcnEct0NectPacketTotalCount . . . . . . . . . . . 8
5.1.1 ce-cePacketTotalCount . . . . . . . . . . . . . . . . . 9 5.1.3 tunnelEcnEct1NectPacketTotalCount . . . . . . . . . . . 8
5.1.2 ect0-nectPacketTotalCount . . . . . . . . . . . . . . . 9 5.1.4 tunnelEcnCeNectPacketTotalCount . . . . . . . . . . . . 9
5.1.3 ect1-nectPacketTotalCount . . . . . . . . . . . . . . . 10 5.1.5 tunnelEcnCeEct0PacketTotalCount . . . . . . . . . . . . 9
5.1.4 ce-nectPacketTotalCount . . . . . . . . . . . . . . . . 10 5.1.6 tunnelEcnCeEct1PacketTotalCount . . . . . . . . . . . . 9
5.1.5 ce-ect0PacketTotalCount . . . . . . . . . . . . . . . . 10 5.1.7 tunnelEcnEct0Ect0PacketTotalCount . . . . . . . . . . . 10
5.1.6 ce-ect1PacketTotalCount . . . . . . . . . . . . . . . . 11 5.1.8 tunnelEcnEct1Ect1PacketTotalCount . . . . . . . . . . . 10
5.1.7 ect0-ect0PacketTotalCount . . . . . . . . . . . . . . . 11 6. Congestion Management . . . . . . . . . . . . . . . . . . . . . 10
5.1.8 ect1-ect1PacketTotalCount . . . . . . . . . . . . . . . 11 6.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . 11
6. Congestion Management . . . . . . . . . . . . . . . . . . . . . 12 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 14
7. Security . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 14
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 12 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 16
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 9.1 Normative References . . . . . . . . . . . . . . . . . . . 16
9.1 Normative References . . . . . . . . . . . . . . . . . . . 14 9.2 Informative References . . . . . . . . . . . . . . . . . . 17
9.2 Informative References . . . . . . . . . . . . . . . . . . 15 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 17
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 15 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 16
1. Introduction 1. Introduction
In IP network, persistent congestion (or named congestion collapse) In IP networks, persistent congestion[RFC2914] lowers transport
lowers transport throughput, leading to waste of network resource. throughput, leading to waste of network resource. Appropriate
Appropriate congestion control mechanisms are therefore critical to congestion control mechanisms are therefore critical to prevent the
prevent the network from falling into the persistent congestion network from falling into the persistent congestion state. Currently,
state. Currently, transport protocols such as TCP[RFC793], transport protocols such as TCP[RFC793], SCTP[RFC4960],
SCTP[RFC4960], DCCP[RFC4340], have their built-in congestion control DCCP[RFC4340], have their built-in congestion control mechanisms, and
mechanisms, and even for certain single transport protocol like TCP even for certain single transport protocol like TCP there can be a
there can be a couple of different congestion control mechanisms to couple of different congestion control mechanisms to choose from. All
choose from. All these congestion control mechanisms are implemented these congestion control mechanisms are implemented on host side, and
on host side, and there are reasons that only host side congestion there are reasons that only host side congestion control is not
control is not sufficient for the whole network to keep away from sufficient for the whole network to keep away from persistent
persistent congestion. For example, (1) some protocol's congestion congestion. For example, (1) some protocol's congestion control
control scheme may have internal design flaws; (2) improper software scheme may have internal design flaws; (2) improper software
implementation of protocol; (3) some transport protocols do not even implementation of protocol; (3) some transport protocols, e.g.
provide congestion control at all. RTP[RFC3550] do not even provide congestion control at all.
In order to have a better control on network congestion status, it's
necessary for the network side to do certain kind of traffic control.
For example, ConEx [ConEx] provides a method for network operator to
learn about traffic's congestion contribution information, and then
congestion management action can be taken based on this information.
Tunnels are widely deployed in various networks including public Tunnels are widely deployed in various networks including public
Internet, datacenter network, and enterprise network etc. A tunnel Internet, data center network, and enterprise network etc. A tunnel
consists of ingress, egress and a set of interior routers. For the consists of ingress, egress and a set of intermediate routers. For
tunnel scenario, a tunnel-based mechanism which is different from the tunnel scenario, a tunnel-based mechanism is introduced for
ConEx is introduced for network traffic control to keep the network network traffic control to keep the network from persistent
from persistent congestion. Here, tunnel ingress will implement congestion. Here, tunnel ingress will implement congestion
congestion management function to control the traffic entering the management function to control the traffic entering the tunnel.
tunnel.
In order to perform congestion management at ingress, the ingress
must first obtain the inner tunnel congestion level information. Yet
the ingress cannot use the locally visible traffic rates, because it
would require additional knowledge of downstream capacity and
topology, as well as cross traffic that does not pass through this
ingress.
This document provides a mechanism of feeding back inner tunnel This document provides a mechanism of feeding back inner tunnel
congestion level to the ingress. Using this mechanism the egress can congestion level to the ingress. Using this mechanism the egress can
feed the tunnel congestion level information it collects back to the feed the tunnel congestion level information it collects back to the
ingress. After receiving this information the ingress will be able to ingress. After receiving this information the ingress will be able to
perform congestion management according to network management policy. perform congestion management according to network management policy.
2. Conventions And Terminologies 2. Conventions And Terminologies
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
DP: Decision Point, a logical entity that make congestion management DP: Decision Point, a logical entity that makes congestion
decision based on the received congestion feedback information. management decision based on the received congestion feedback
information.
AP: Action Point, a logical entity that implements congestion AP: Action Point, a logical entity that implements congestion
management action according to the decision made by Decision Point. management action according to the decision made by Decision Point.
ECT: ECN-Capable Transport code point defined in RFC3168.
3. Congestion Information Feedback Models 3. Congestion Information Feedback Models
The feedback model mainly consists of tunnel egress and tunnel
ingress. The tunnel egress composes of meter function and exporter
function; tunnel ingress composes AP (Action Point) function,
collector function and DP (Decision Point) function.
According to specific network deployment, there are two kinds of The Meter function collects network congestion level information, and
feedback model: direct model and centralized model. conveys the information to Exporter which feeds back the information
to the collector function.
The collector collects congestion level information from exporter,
after that congestion management Decision Point (DP) function will
make congestion management decision based on the information from
collector.
The action point controls the traffic entering tunnel, and it
implements traffic control decision of DP.
3.1 Direct Model
Feedback Feedback
+-----------------------------------+ +-----------------------------------+
| | | |
| | | |
| V | V
+--------------+ +-------------+ +--------------+ +-------------+
| +--------+ | | +---------+ | | +--------+ | | +---------+ |
| |Exporter| | | |Collector| | | |Exporter| | | |Collector| |
| +---|----+ | | +---|-----+ | | +---|----+ | | +---|-----+ |
| +--|--+ | | +|-+ | | +--|--+ | | +|-+ |
| |Meter| | | |DP| | | |Meter| | | |DP| |
| +-----+ | | +--+ | | +-----+ | | +--+ |
| | | +--+ | | | | +--+ |
| | | |AP| | | | | |AP| |
| | | +--+ | | | | +--+ |
|Egress | | Ingress | |Egress | | Ingress |
+--------------+ +-------------+ +--------------+ +-------------+
Figure 1: Feedback Model.
(a) Direct Feedback Model.
Direct model means egress feeds information directly to ingress. The
egress consists of Meter function and Exporter function, the Meter
function collects network congestion level information, and convey
the information to Exporter which feeds back the information to the
Collector function locating at ingress, after that congestion
management Decision Point (DP) function on ingress will make
congestion management decision based on the information from
Collector. The ingress here will act as both the decision point that
decides how to do congestion management and the action point that
implements congestion management decision.
3.2 Centralized Model
+-------------------+
|+---------+ +--+ |
feedback ||Collector|---|DP| |
+---->|+---------+ +--+ |#########
| | | #
| | Controller | #
| +-------------------+ #
| #
| #
+--------------+ +------V------+
| +--------+ | | |
| |Exporter| | | |
| +---|----+ | | |
| +--|--+ | | |
| |Meter| | | |
| +-----+ | | |
| | | +--+ |
| | | |AP| |
| | | +--+ |
|Egress | | Ingress |
+--------------+ +-------------+
(b) Centralized Feedback Model
In the centralized model, the ingress only takes the role of action
point, and it implements traffic control decision from another entity
named "controller". Here, after Exporter function on egress has
collected network congestion level information, it feeds back the
information to the collector of a controller instead of the ingress.
Then the controller makes congestion management decision and sends
the decision to the ingress to implement.
4. Congestion Level Measurement 4. Congestion Level Measurement
This section describes how to measure congestion level in a tunnel. This section describes how to measure congestion level in a
tunnel.
There could be different approaches of packet loss detection for The congestion level measurement is based on ECN (Explicit
different tunneling protocol scenarios. For instance, if there is a Congestion Notification) [RFC3168] and packet drop. If the routers
sequence field in the tunneling protocol header, it will be easy for support ECN, after router's queue length is over a predefined
egress to detect packet loss through the gaps in sequence number threshold, the routers will mark the ECN-capable packets as
space. Another approach is to compare the number of packets entering Congestion Experienced (CE) or drop not-ECT packets with the
ingress and the number of packets arriving at egress over the same probability proportional to queue length; if the queue overflows
span of packets. This document will focus on the latter one which is all packets will be dropped. If the routers do not support ECN,
a more general approach. after router's queue length is over a predefined threshold, the
routers will drop both the ECN-capable packets and the not-ECT
packets with the probability proportional to the queue length.
If the routers support Explicit Congestion Notification (ECN), after The network congestion level could be indicated through the ratio
router's queue length is over a predefined threshold, the routers of CE-marked packet and the ratio of packet drop, the relationship
will marks the ECN-capable packets as Congestion Experienced (CE) or between these two kinds of indicator is complementary. If the
drop not-ECT packets with the probability proportional to queue congestion level in tunnel is not high enough, the packets would
length; if the queue overflows all packets will be dropped. If the be marked as CE instead of being dropped, and then it is easy to
routers do not support ECN, after router's queue length is over a calculate congestion level according to the ratio of CE-marked
predefined threshold, the routers will drop both the ECN-capable packets. If the congestion level is so high that ECT packet will
packets and the not-ECT packets with the probability proportional to be dropped, then the packet loss ratio could be calculated by
the queue length. It's assumed all routers in the tunnel support ECN. comparing total packets entering ingress and total packets
arriving at egress over the same span of packets, if packet loss
is detected, it could be assumed that severe congestion has
occurred in the tunnel. Because loss is only ever a sign of
serious congestion, so it doesn't need to measure loss ratio
accurately.
Faked ECN-capable transport (ECT) is used at ingress to defer packet Faked ECN-capable transport (ECT) is used at ingress to defer
loss to egress. The basic idea of faked ECT is that, when packet loss to egress. The basic idea of faked ECT is that, when
encapsulating packets, ingress first marks tunnel outer header encapsulating packets, ingress first marks tunnel outer header
according to RFC6040, and then remarks outer header of Not-ECT packet according to RFC6040, and then remarks outer header of Not-ECT
as ECT, there will be three kinds of combination of outer header ECN packet as ECT, there will be three kinds of combination of outer
field and inner header ECN field: CE|CE, ECT|N-ECT, ECT|ECT (in the header ECN field and inner header ECN field: CE|CE, ECT|N-ECT,
form of outer ECN| inner ECN); when decapsulating packets at egress, ECT|ECT (in the form of outer ECN| inner ECN); when decapsulating
RFC6040 defined decapsulation behavior is used, and according to packets at egress, RFC6040 defined decapsulation behavior is used,
RFC6040, the packets marked as CE|N-ECT will be dropped by egress. and according to RFC6040, the packets marked as CE|N-ECT will be
dropped by egress.
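The faked ECT marking and the RFC 6040 decapsulation behavior described above can be illustrated with a minimal Python sketch. The function names (ingress_outer_ecn, egress_decap) are hypothetical, and remarking with ECT(0) rather than ECT(1) is an assumption of this sketch; the decapsulation table follows the outer/inner combinations defined in RFC 6040.

```python
# ECN codepoints of the two-bit ECN field (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

DROP = None  # sentinel: the egress discards the packet


def ingress_outer_ecn(inner):
    """Faked ECT at the ingress (hypothetical helper).

    RFC 6040 normal mode copies the inner ECN field to the outer header;
    a Not-ECT outer header is then remarked as ECT. Choosing ECT(0) for
    the remark is an assumption of this sketch.
    """
    outer = inner
    return ECT0 if outer == NOT_ECT else outer


# RFC 6040 decapsulation: (inner, outer) -> ECN field of the forwarded packet.
_DECAP = {
    (NOT_ECT, NOT_ECT): NOT_ECT, (NOT_ECT, ECT0): NOT_ECT,
    (NOT_ECT, ECT1): NOT_ECT,    (NOT_ECT, CE): DROP,
    (ECT0, NOT_ECT): ECT0,       (ECT0, ECT0): ECT0,
    (ECT0, ECT1): ECT1,          (ECT0, CE): CE,
    (ECT1, NOT_ECT): ECT1,       (ECT1, ECT0): ECT1,
    (ECT1, ECT1): ECT1,          (ECT1, CE): CE,
    (CE, NOT_ECT): CE,           (CE, ECT0): CE,
    (CE, ECT1): CE,              (CE, CE): CE,
}


def egress_decap(inner, outer):
    """Return the outgoing ECN field, or DROP for the CE|N-ECT combination."""
    return _DECAP[(inner, outer)]
```

With this marking, a Not-ECT packet that meets congestion inside the tunnel is CE-marked instead of dropped by the routers, and is only dropped at the egress, where it can still be counted.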
In case all interior routers support ECN, the network congestion To calculate congestion level, for the same span of packets, the
level could be indicated through the ratio of CE-marked packet and number of each kind of ECN marking packet at ingress and egress
the ratio of packet drop, the relationship between these two kinds of will be compared to get the volume of CE-marked packet in the
indicator is complementary. If the congestion level in tunnel is not tunnel; and the total number of packets at ingress and egress will
high enough, the packets would be marked as CE instead of being be compared to detect the packet loss.
dropped, and then it is easy to calculate congestion level according
to the ratio of CE-marked packets. If the congestion level is so high
that ECT packet will be dropped, then the packet loss ratio could be
calculated by comparing total packets entering ingress and total
packets arriving at egress over the same span of packets, if packet
loss is detected, it could be assumed that severe congestion has
occurred in the tunnel. Because loss is only ever a sign of serious
congestion, so it doesn't need to measure loss ratio accurately.
The basic procedure of congestion level measurement is as follows: The basic procedure of congestion level measurement is as follows:
+-------+ +------+ +-------+ +------+
|Ingress| |Egress| |Ingress| |Egress|
+-------+ +------+ +-------+ +------+
| | | |
+----------------+ | +----------------+ |
|cumulative count| | |cumulative count| |
+----------------+ | +----------------+ |
| | | |
| <node id-i, ECN counts> | | <node id-i, ECN counts> |
|------------------------>| |------------------------>|
|<node id-e, ECN counts> | |<node id-e, ECN counts> |
|<------------------------| |<------------------------|
| | | |
| | | |
(a) Direct model feedback procedure Figure 2: Procedure of Congestion Level Measurement
+----------+ +-------+ +------+
|Controller| |Ingress| |Egress|
+----------+ +-------+ +------+
| | |
| +----------------+ |
| |cumulative count| |
| +----------------+ |
| | |
| | <node id-i, ECN counts> |
| |------------------------>|
| | |
| |
| |
| <node id-i, ECN counts> |
| <node id-e, ECN counts> |
|<---------------------------------------|
| |
| |
| |
(b) Centralized model feedback procedure
Ingress encapsulates packets and marks outer header according to Ingress encapsulates packets and marks outer header according to
faked ECT as described above. Ingress cumulatively counts packets for faked ECT as described above. Ingress cumulatively counts packets for
three types of ECN combination (CE|CE, ECT|N-ECT, ECT|ECT) and then three types of ECN combination (CE|CE, ECT|N-ECT, ECT|ECT) and then
the ingress regularly sends cumulative packet counts message of each the ingress regularly sends cumulative packet counts message of each
type of ECN combination to the egress. When each message arrives, the type of ECN combination to the egress. When each message arrives, the
egress cumulatively counts packets coming from the ingress and adds egress cumulatively counts packets coming from the ingress and adds
its own packet counts of each type of ECN combination (CE|CE, ECT|N- its own packet counts of each type of ECN combination (CE|CE, ECT|N-
ECT, CE|N-ECT, CE|ECT, ECT|ECT) to the message and either returns the ECT, CE|N-ECT, CE|ECT, ECT|ECT) to the message and returns the whole
whole message to the ingress, or to a central controller. message to the ingress.
The counting of packets can be at the granularity of the all traffic The counting of packets can be at the granularity of the all traffic
from the ingress to the egress to learn about the overall congestion from the ingress to the egress to learn about the overall congestion
status of the path between the ingress and the egress. The counting status of the path between the ingress and the egress. The counting
can also be at the granularity of individual customer's traffic or a can also be at the granularity of individual customer's traffic or a
specific set of flows to learn about their congestion contribution. specific set of flows to learn about their congestion contribution.
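As an illustration of the comparison described in this section, the following Python sketch derives a CE-marking ratio and a loss indication from one pair of cumulative counts. The class EcnCounts and the function congestion_level are hypothetical names, and the sketch assumes that both sets of counts cover the same span of packets.

```python
from dataclasses import dataclass


@dataclass
class EcnCounts:
    """Cumulative packet counts per outer|inner ECN combination."""
    ce_ce: int = 0
    ect_nect: int = 0
    ect_ect: int = 0
    ce_nect: int = 0  # observed only at the egress
    ce_ect: int = 0   # observed only at the egress

    def total(self):
        return (self.ce_ce + self.ect_nect + self.ect_ect
                + self.ce_nect + self.ce_ect)


def congestion_level(ingress, egress):
    """Return (ce_ratio, loss_detected) for one span of packets.

    Packets marked CE inside the tunnel arrive as CE|N-ECT or CE|ECT,
    because CE|CE packets were already CE-marked before the ingress.
    """
    sent = ingress.total()
    received = egress.total()
    marked_in_tunnel = egress.ce_nect + egress.ce_ect
    ce_ratio = marked_in_tunnel / sent if sent else 0.0
    # Any shortfall at the egress is treated as a sign of severe
    # congestion; the loss ratio itself is not computed precisely.
    loss_detected = received < sent
    return ce_ratio, loss_detected
```

For example, if the ingress reports 100 packets and the egress reports 100 packets of which 9 arrive as CE|N-ECT or CE|ECT, the sketch yields a CE ratio of 9/100 with no loss detected.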
5. Congestion Information Delivery 5. Congestion Information Delivery
As described above, the tunnel ingress needs to convey message of As described above, the tunnel ingress needs to convey a message
cumulative packet counts of each type of ECN combination to tunnel containing cumulative packet counts of each type of ECN combination
egress, and the tunnel egress also needs to feed the message of to tunnel egress, and the tunnel egress also needs to feed back the
cumulative packet counts of each type of ECN combination to the message of cumulative packet counts of each type of ECN combination
ingress or central collector. This section describes how the messages to the ingress. This section describes how the messages should be
could be conveyed. conveyed.
The message can travel along the same path with network data traffic,
referred as in band signal; or go through a different path from
network data traffic, referred as out of band signal. Because out of
band scheme needs additional separate path which might limit its
actual deployment, the in band scheme will be discussed here.
Because the message is transmitted in band, so the message packet may The message travels along the same path with network data traffic,
get lost in case of network congestion. To cope with the situation referred as in-band signal. Because the message is transmitted in
that the message packet gets lost, the packet counts values are sent band, so the message packet may get lost in case of network
as cumulative counters. Then if a message is lost the next message congestion. To cope with the situation that the message packet gets
will recover the missing information. lost, the packet counts values are sent as cumulative counters. Then
if a message is lost the next message will recover the missing
information. Even though the missing information could be recovered,
the message should be transmitted in a much higher priority than
users' traffic flows.
IPFIX [RFC7011] is selected as information feedback protocol. IPFIX IPFIX [RFC7011] is selected as information feedback protocol. IPFIX
is preferred to use SCTP as transport. SCTP allows partially reliable uses preferably SCTP as transport. SCTP allows partially reliable
delivery [RFC3758], which ensures the feedback message will not be delivery [RFC3758], which ensures the feedback message will not be
blocked in case of packet loss due to network congestion. blocked in case of packet loss due to network congestion.
Ingress can do congestion management at different granularity which Ingress can do congestion management at different granularity which
means both the overall aggregated inner tunnel congestion level and means both the overall aggregated inner tunnel congestion level and
congestion level contributed by certain traffic(s) could be measured congestion level contributed by certain traffic(s) could be measured
for different congestion management purpose. For example, if the for different congestion management purpose. For example, if the
ingress only wants to limit congestion volume caused by certain ingress only wants to limit congestion volume caused by certain
traffic(s), e.g., UDP-based traffic, then congestion volume for the traffic(s), e.g., UDP-based traffic, then congestion volume for the
traffic will be fed back; or if the ingress does overall congestion traffic will be fed back; or if the ingress does overall congestion
management, the aggregated congestion volume will be fed back. management, the aggregated congestion volume will be fed back.
When sending message from ingress to egress, the ingress acts as When sending message from ingress to egress, the ingress acts as
IPFIX exporter and egress acts as IPFIX collector; When feedback IPFIX exporter and egress acts as IPFIX collector; When feedback
congestion level information from egress to ingress or to controller, congestion level information from egress to ingress, then the egress
the the egress acts as IPFIX exporter and ingress or controller acts acts as IPFIX exporter and ingress acts as IPFIX collector.
as IPFIX collector.
The combination of congestion level measurement and congestion The combination of congestion level measurement and congestion
information delivery procedure should be as follows: information delivery procedure should be as follows:
# The ingress determines template record to be used. The template # The ingress determines IPFIX template record to be used. The
record can be preconfigured or determined at runtime, the content of template record can be preconfigured or determined at runtime, the
template record will be determined according to the granularity of content of template record will be determined according to the
congestion management, if the ingress wants to limit congestion granularity of congestion management, if the ingress wants to limit
volume contributed by specific traffic flow then the elements such as congestion volume contributed by specific traffic flow then the
source IP address, destination IP address, flow id and CE-marked elements such as source IP address, destination IP address, flow id
packet volume of the flow etc will be included in the template and CE-marked packet volume of the flow etc will be included in the
record. template record.
# Meter on ingress measures traffic volume according to template # Meter on ingress measures traffic volume according to template
record chosen and then the measurement records are sent to egress in record chosen and then the measurement records are sent to egress in
band. band.
# Meter on egress measures congestion level information according to # Meter on egress measures congestion level information according to
template record, the template record can be preconfigured or use the template record, the content of template record should be the same
template record from ingress, the content of template record should as template record of ingress.
be the same as template record of ingress.
# Exporter of egress sends measurement record together with the # Exporter of egress sends measurement record together with the
measurement record of ingress to Controller or back to the ingress. measurement record of ingress back to the ingress.
5.1 IPFIX Extentions
5.1 IPFIX Extensions
This sub-section defines a list of new IPFIX Information Elements This sub-section defines a list of new IPFIX Information Elements
according to RFC7013 [RFC7013]. according to RFC7013 [RFC7013].
5.1.1 ce-cePacketTotalCount 5.1.1 tunnelEcnCeCePacketTotalCount
Description: The total number of incoming packets with CE|CE ECN Description: The total number of incoming packets with CE|CE ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD1 ElementId: TBD1
Status: current Status: current
Units: packets Units: packets
5.1.2 ect0-nectPacketTotalCount 5.1.2 tunnelEcnEct0NectPacketTotalCount
Description: The total number of incoming packets with ECT(0)|N-ECT Description: The total number of incoming packets with ECT(0)|N-ECT
ECN marking combination for this Flow at the Observation Point since ECN marking combination for this Flow at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD2 ElementId: TBD2
Status: current Status: current
Units: packets Units: packets
5.1.3 ect1-nectPacketTotalCount 5.1.3 tunnelEcnEct1NectPacketTotalCount
Description: The total number of incoming packets with ECT(1)|N-ECT Description: The total number of incoming packets with ECT(1)|N-ECT
ECN marking combination for this Flow at the Observation Point since ECN marking combination for this Flow at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD3 ElementId: TBD3
Status: current Status: current
Units: packets Units: packets
5.1.4 ce-nectPacketTotalCount 5.1.4 tunnelEcnCeNectPacketTotalCount
Description: The total number of incoming packets with CE|N-ECT ECN Description: The total number of incoming packets with CE|N-ECT ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD4 ElementId: TBD4
Status: current Status: current
Units: packets Units: packets
5.1.5 ce-ect0PacketTotalCount 5.1.5 tunnelEcnCeEct0PacketTotalCount
Description: The total number of incoming packets with CE|ECT(0) ECN Description: The total number of incoming packets with CE|ECT(0) ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD5 ElementId: TBD5
Status: current Status: current
Units: packets Units: packets
5.1.6 ce-ect1PacketTotalCount 5.1.6 tunnelEcnCeEct1PacketTotalCount
Description: The total number of incoming packets with CE|ECT(1) ECN Description: The total number of incoming packets with CE|ECT(1) ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD6 ElementId: TBD6
Status: current Status: current
Units: packets Units: packets
5.1.7 ect0-ect0PacketTotalCount
5.1.7 tunnelEcnEct0Ect0PacketTotalCount
Description: The total number of incoming packets with ECT(0)|ECT(0) Description: The total number of incoming packets with ECT(0)|ECT(0)
ECN marking combination for this Flow at the Observation Point since ECN marking combination for this Flow at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD7 ElementId: TBD7
Status: current Status: current
Units: packets Units: packets
5.1.8 ect1-ect1PacketTotalCount 5.1.8 tunnelEcnEct1Ect1PacketTotalCount
Description: The total number of incoming packets with ECT(1)|ECT(1) Description: The total number of incoming packets with ECT(1)|ECT(1)
ECN marking combination for this Flow at the Observation Point since ECN marking combination for this Flow at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Abstract Data Type: unsigned64 Abstract Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
ElementId: TBD8 ElementId: TBD8
Status: current Status: current
Units: packets Units: packets
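The eight counters defined above can be illustrated with a short Python sketch of a metering process that classifies packets by ECN marking combination. It is only an illustration under stated assumptions: the "X|Y" notation is read here as outer-header codepoint followed by inner-header codepoint, and the function name and the input format (pairs of TOS / Traffic Class bytes) are hypothetical. The codepoint values are those of RFC 3168.

```python
from collections import Counter

# ECN codepoints from RFC 3168 (two low-order bits of the TOS / Traffic Class byte).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Assumption: keys are (outer, inner) codepoints, matching the "X|Y" notation
# of the Information Element descriptions in Section 5.1.
COUNTER_NAMES = {
    (CE, CE): "tunnelEcnCeCePacketTotalCount",
    (ECT0, NOT_ECT): "tunnelEcnEct0NectPacketTotalCount",
    (ECT1, NOT_ECT): "tunnelEcnEct1NectPacketTotalCount",
    (CE, NOT_ECT): "tunnelEcnCeNectPacketTotalCount",
    (CE, ECT0): "tunnelEcnCeEct0PacketTotalCount",
    (CE, ECT1): "tunnelEcnCeEct1PacketTotalCount",
    (ECT0, ECT0): "tunnelEcnEct0Ect0PacketTotalCount",
    (ECT1, ECT1): "tunnelEcnEct1Ect1PacketTotalCount",
}


def count_combinations(packets):
    """Count packets per ECN marking combination.

    packets: iterable of (outer_tos, inner_tos) byte pairs (hypothetical input).
    Combinations without a counter in Section 5.1 are ignored.
    """
    counts = Counter()
    for outer_tos, inner_tos in packets:
        key = (outer_tos & 0b11, inner_tos & 0b11)
        name = COUNTER_NAMES.get(key)
        if name is not None:
            counts[name] += 1
    return counts
```

A real metering process would keep these as running totals since (re-)initialization, as required by the totalCounter semantics.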
6. Congestion Management 6. Congestion Management
After tunnel ingress (or controller) receives congestion level After tunnel ingress receives congestion level information, then
information, then congestion management actions could be taken based congestion management actions could be taken based on the
on the information, e.g. if the congestion level is higher than a information, e.g. if the congestion level is higher than a predefined
predefined threshold, then action could be taken to reduce the threshold, then action could be taken to reduce the congestion level.
congestion level.
The design of network side congestion management SHOULD take host The design of network side congestion management SHOULD take host
side e2e congestion control mechanism into consideration, which means side e2e congestion control mechanism into consideration, which means
the congestion management needs to avoid the impacts on e2e the congestion management needs to avoid the impacts on e2e
congestion control. For instance, congestion management action must congestion control. For instance, congestion management action must
be delayed by more than a worst-case global RTT, otherwise tunnel be delayed by more than a worst-case global RTT (e.g. 100ms),
traffic management will not give normal e2e congestion control enough otherwise tunnel traffic management will not give normal e2e
time to do its job, and the system could go unstable. congestion control enough time to do its job, and the system could go
unstable.
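One possible reading of the threshold and delay requirements above is sketched below in Python: an action is triggered only after the congestion level has stayed above the threshold for longer than a hold time chosen to exceed the worst-case RTT. The class name, the parameters and the persistence-based interpretation of "delayed" are assumptions made for illustration, not part of this document.

```python
class DelayedCongestionTrigger:
    """Signal an action only after congestion has persisted beyond a hold time."""

    def __init__(self, threshold, hold_time_s=0.1):
        # hold_time_s is assumed to be set above the worst-case global RTT.
        self.threshold = threshold
        self.hold_time_s = hold_time_s
        self._above_since = None  # time at which the level first exceeded the threshold

    def update(self, congestion_level, now_s):
        """Feed one measurement; return True when an action may be taken."""
        if congestion_level <= self.threshold:
            # Congestion cleared (e.g. by e2e congestion control): start over.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now_s
        return (now_s - self._above_since) > self.hold_time_s
```

The reset on every below-threshold sample gives e2e congestion control the first chance to resolve the congestion before the tunnel reacts.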
The detailed description of congestion management is out of scope of The detailed description of congestion management is out of scope of
this document, as examples, congestion management such as circuit this document, as examples, congestion management such as circuit
breaker [CB] and congestion policing [CP] could be applied. Circuit breaker [CB] could be applied. Circuit breaker is an automatic
breaker is an automatic mechanism to estimate congestion, and to mechanism to estimate congestion, and to terminate flow(s) when
terminate flow(s) when persistent congestion is detected to prevent persistent congestion is detected to prevent network congestion
network congestion collapse; Congestion policing is used in data collapse.
center to limit the amount of congestion any tenant can cause
according to the congestion information in the tunnels.
7. Security 6.1 Example
This subsection gives an example of how the solution described in
this document could work.
First, IPFIX Template Records are exchanged between ingress and
egress to negotiate the format of the Data Records. The example here
measures the congestion level of the overall tunnel (caused by all
the traffic in the tunnel). After the negotiation is finished, the
ingress sends an in-band message to the egress. The message contains
the number of packets of each ECN marking combination (i.e. CE|CE,
ECT|N-ECT and ECT|ECT) that the ingress has received up to the time
the message is sent.
When the egress receives the message, it counts the number of packets
of each ECN marking combination that it has received up to that
point. The egress then sends a feedback message to the ingress that
contains these counts together with the information from the
ingress's message.
Figure 3 to Figure 6 below show the example procedure between ingress
and egress.
+---------------------------------+----------------------+
|Set ID=2                         |Length=40             |
|---------------------------------|----------------------|
|Template ID=256                  |Field Count=8         |
|---------------------------------|----------------------|
|tunnelEcnCeCePacketTotalCount    |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnEctNectPacketTotalCount |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnEctEctPacketTotalCount  |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnCeCePacketTotalCount    |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnEctNectPacketTotalCount |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnEctEctPacketTotalCount  |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnCeNectPacketTotalCount  |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnCeEctPacketTotalCount   |Field Length=8        |
+---------------------------------+----------------------+
Figure 3: Template Record Sent From Egress to Ingress
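The layout in Figure 3 follows the IPFIX Template Set format of RFC 7011: a 4-octet Set Header, a 4-octet Template Record Header, and one 4-octet Field Specifier per field. The Python sketch below shows how such a set could be serialized. The ElementIds are placeholders because this document leaves them as TBD, and the function name is hypothetical.

```python
import struct

# Placeholder ElementIds (assumption): the real values are TBD1..TBD8.
IE_CE_CE, IE_ECT_NECT, IE_ECT_ECT, IE_CE_NECT, IE_CE_ECT = range(32001, 32006)


def encode_template_set(template_id, fields):
    """Serialize an IPFIX Template Set holding a single Template Record.

    fields: list of (element_id, field_length) pairs. Only IANA-registered
    (non-enterprise) Information Elements are handled in this sketch.
    """
    body = struct.pack("!HH", template_id, len(fields))
    for element_id, field_length in fields:
        # Enterprise bit is kept at 0 by masking to 15 bits.
        body += struct.pack("!HH", element_id & 0x7FFF, field_length)
    set_length = 4 + len(body)  # Set Header (4 octets) plus the record
    return struct.pack("!HH", 2, set_length) + body  # Set ID 2 = Template Set


# Field order of Figure 3: ingress counts, egress counts, then CE|N-ECT and CE|ECT.
FIGURE_3_FIELDS = [(ie, 8) for ie in (
    IE_CE_CE, IE_ECT_NECT, IE_ECT_ECT,
    IE_CE_CE, IE_ECT_NECT, IE_ECT_ECT,
    IE_CE_NECT, IE_CE_ECT,
)]
template_set = encode_template_set(256, FIGURE_3_FIELDS)
```

With eight fields the encoded set is 4 + 4 + 8 * 4 = 40 octets long, which matches the Length value shown in Figure 3.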
+---------------------------------+----------------------+
|Set ID=2                         |Length=20             |
|---------------------------------|----------------------|
|Template ID=257                  |Field Count=3         |
|---------------------------------|----------------------|
|tunnelEcnCeCePacketTotalCount    |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnEctNectPacketTotalCount |Field Length=8        |
|---------------------------------|----------------------|
|tunnelEcnEctEctPacketTotalCount  |Field Length=8        |
+---------------------------------+----------------------+
Figure 4: Template Record Sent From Ingress to Egress
+-------+      +-+ +-+ +-+ +-+ +-+ +-+ +-+       +-------+
|       |      |M| |P| |P| |P| |M| |P| |P|       |       |
|       |      +-+ +-+ +-+ +-+ +-+ +-+ +-+       |       |
|       |<---------------------------------------|       |
|       |                                        |       |
|       |                                        |       |
|egress |                +-+ +-+                 |ingress|
|       |                |M| |M|                 |       |
|       |                +-+ +-+                 |       |
|       |--------------------------------------->|       |
|       |                                        |       |
|       |                                        |       |
+-------+                                        +-------+
+-+
|M| : Message Packet
+-+
+-+
|P| : User Packet
+-+
Figure 5: Traffic Flow Between Ingress and Egress
                Set ID=257, Length=28
+-------+                A1                 +-------+
|       |                B1                 |       |
|       |                C1                 |       |
|       |  <-----------------------------   |       |
|       |                                   |       |
|       |                                   |       |
|       |       Set ID=256, Length=68       |       |
|       |                A1                 |       |
|       |                B1                 |       |
|egress |                C1                 |ingress|
|       |                A2                 |       |
|       |                B2                 |       |
|       |                C2                 |       |
|       |                D                  |       |
|       |                E                  |       |
|       |  ---------------------------->    |       |
|       |                                   |       |
+-------+                                   +-------+
Figure 6: Message Between Ingress and Egress
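The two messages in Figure 6 are IPFIX Data Sets whose records consist only of unsigned64 counters, so their lengths follow directly from the field count. The Python sketch below packs and unpacks such a set. The helper names and the sample counter values are hypothetical, and a complete IPFIX Message would additionally carry the Message Header defined in RFC 7011.

```python
import struct


def encode_data_set(set_id, counters):
    """Pack one Data Record of unsigned64 counters into an IPFIX Data Set."""
    length = 4 + 8 * len(counters)  # Set Header plus 8 octets per counter
    return struct.pack(f"!HH{len(counters)}Q", set_id, length, *counters)


def decode_data_set(data):
    """Return (set_id, counters) for a Data Set holding one record of unsigned64s."""
    set_id, length = struct.unpack_from("!HH", data)
    count = (length - 4) // 8
    return set_id, list(struct.unpack_from(f"!{count}Q", data, 4))


# Ingress -> egress: A1, B1, C1 (sample values are made up).
ingress_msg = encode_data_set(257, [100, 50, 30])
# Egress -> ingress: A1, B1, C1, A2, B2, C2, D, E.
feedback_msg = encode_data_set(256, [100, 50, 30, 100, 45, 25, 3, 2])
```

The resulting lengths are 4 + 3 * 8 = 28 and 4 + 8 * 8 = 68 octets, matching the Length values shown in Figure 6.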
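The following is an example of how the tunnel congestion level could
be calculated. In the formulas, A1, B1 and C1 are the CE|CE,
ECT|N-ECT and ECT|ECT counts reported by the ingress, A2, B2 and C2
are the corresponding counts at the egress, and D and E are the
CE|N-ECT and CE|ECT counts at the egress, in the field order of
Figures 3 and 4.
The congestion level can be divided into two categories: (1) slight
congestion (no packets dropped); (2) serious congestion (packets are
dropped).
For slight congestion, the congestion level is indicated by the
number of packets that were CE-marked inside the tunnel:
ce_marked = (A2 + D + E) - A1
For serious congestion, the congestion level is indicated by the
number of lost packets:
total_ingress = (A1 + B1 + C1)
total_egress = (A2 + B2 + C2 + D + E)
packet_loss = (total_ingress - total_egress)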
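The calculation above can be written as a small Python function. The argument names mirror the symbols in the formulas. Their mapping to counters (A = CE|CE, B = ECT|N-ECT, C = ECT|ECT, D = CE|N-ECT, E = CE|ECT, with suffix 1 for ingress and 2 for egress) is inferred from the field order in the figures and should be treated as an assumption. The function name is hypothetical.

```python
def tunnel_congestion(a1, b1, c1, a2, b2, c2, d, e):
    """Return (ce_marked, packet_loss) for one ingress/egress measurement pair."""
    # Packets CE-marked inside the tunnel: CE seen at egress minus
    # packets that were already CE-marked when they entered the tunnel.
    ce_marked = (a2 + d + e) - a1
    total_ingress = a1 + b1 + c1
    total_egress = a2 + b2 + c2 + d + e
    packet_loss = total_ingress - total_egress
    return ce_marked, packet_loss


# No loss: 100 packets in, 100 out, 10 of them newly CE-marked.
no_loss = tunnel_congestion(10, 40, 50, 10, 35, 45, 5, 5)
# Loss: 100 packets in, 90 out.
with_loss = tunnel_congestion(10, 40, 50, 10, 30, 40, 5, 5)
```

Because the counters are running totals, a deployment would normally apply the same arithmetic to the differences between two successive reports rather than to the absolute values.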
7. Security Considerations
This document describes the tunnel congestion calculation and This document describes the tunnel congestion calculation and
feedback. For feeding back congestion, security mechanisms of IPFIX feedback.
are expected to be sufficient. No additional security concerns are
expected. The tunnel endpoints are assumed to be deployed in the same
administrative domain, so the ingress and egress trust each other.
The signaling traffic between ingress and egress will be protected
using the security mechanisms provided by IPFIX (see Section 11 of
RFC7011).
From a privacy point of view, in the case of fine-grained congestion
management the ingress is aware of the amount of traffic for specific
application flows inside the tunnel, which may seem to be an invasion
of privacy. In any case, the solution does not introduce additional
privacy problems.
8. IANA Considerations 8. IANA Considerations
This document defines a set of new IPFIX Information Elements This document defines a set of new IPFIX Information Elements
(IE), which need to be registered at IANA IPFIX Information Element (IE), which need to be registered at IANA IPFIX Information Element
Registry. Registry.
ElementID: TBD1 ElementID: TBD1
Name:ce-cePacketTotalCount Name:tunnelEcnCeCePacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|CE ECN Description:The total number of incoming packets with CE|CE ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Units: packets Units: packets
ElementID: TBD2 ElementID: TBD2
Name:ect0-nectPacketTotalCount Name:tunnelEcnEct0NectPacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with ECT(0)|N-ECT Description:The total number of incoming packets with ECT(0)|N-ECT
ECN marking combination for this Flow at the Observation Point since ECN marking combination for this Flow at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Units: packets Units: packets
ElementID: TBD3 ElementID: TBD3
Name: ect1-nectPacketTotalCount Name: tunnelEcnEct1NectPacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with ECT(1)|N-ECT Description:The total number of incoming packets with ECT(1)|N-ECT
ECN marking combination for this Flow at the Observation Point since ECN marking combination for this Flow at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Units: packets Units: packets
ElementID: TBD4 ElementID: TBD4
Name:ce-nectPacketTotalCount Name:tunnelEcnCeNectPacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|N-ECT ECN Description:The total number of incoming packets with CE|N-ECT ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Units: packets Units: packets
ElementID: TBD5 ElementID: TBD5
Name:ce-ect0PacketTotalCount Name:tunnelEcnCeEct0PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|ECT(0) ECN Description:The total number of incoming packets with CE|ECT(0) ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Units: packets Units: packets
ElementID: TBD6 ElementID: TBD6
Name:ce-ect1PacketTotalCount Name:tunnelEcnCeEct1PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with CE|ECT(1) ECN Description:The total number of incoming packets with CE|ECT(1) ECN
marking combination for this Flow at the Observation Point since the marking combination for this Flow at the Observation Point since the
Metering Process (re-)initialization for this Observation Point. Metering Process (re-)initialization for this Observation Point.
Units: packets Units: packets
ElementID: TBD7 ElementID: TBD7
Name:ect0-ect0PacketTotalCount Name:tunnelEcnEct0Ect0PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with ECT(0)|ECT(0) Description:The total number of incoming packets with ECT(0)|ECT(0)
ECN marking combination for this Flow at the Observation Point since ECN marking combination for this Flow at the Observation Point since
the Metering Process (re-)initialization for this Observation Point. the Metering Process (re-)initialization for this Observation Point.
Units: packets Units: packets
ElementID: TBD8 ElementID: TBD8
Name:ect1-ect1PacketTotalCount Name:tunnelEcnEct1Ect1PacketTotalCount
Data Type: unsigned64 Data Type: unsigned64
Data Type Semantics: totalCounter Data Type Semantics: totalCounter
Status: current Status: current
Description:The total number of incoming packets with Description:The total number of incoming packets with
ECT(1)|ECT(1) ECN marking combination for this Flow at the Observation ECT(1)|ECT(1) ECN marking combination for this Flow at the Observation
Point since the Metering Process (re-)initialization for this Point since the Metering Process (re-)initialization for this
Observation Point. Observation Point.
Units: packets Units: packets
[TO BE REMOVED: This registration should take place at the following [TO BE REMOVED: This registration should take place at the following
skipping to change at page 15, line 10 skipping to change at page 17, line 5
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997, Requirement Levels", BCP 14, RFC 2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>. <http://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001, <http://www.rfc- RFC 3168, September 2001, <http://www.rfc-
editor.org/info/rfc3168>. editor.org/info/rfc3168>.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003,
<http://www.rfc-editor.org/info/rfc3550>.
[RFC3758] Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P. [RFC3758] Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P.
Conrad, "Stream Control Transmission Protocol (SCTP) Conrad, "Stream Control Transmission Protocol (SCTP)
Partial Reliability Extension", RFC 3758, May 2004, Partial Reliability Extension", RFC 3758, May 2004,
<http://www.rfc-editor.org/info/rfc3758>. <http://www.rfc-editor.org/info/rfc3758>.
[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
Congestion Control Protocol (DCCP)", RFC 4340, March 2006, Congestion Control Protocol (DCCP)", RFC 4340, March 2006,
<http://www.rfc-editor.org/info/rfc4340>. <http://www.rfc-editor.org/info/rfc4340>.
[RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol",
skipping to change at page 15, line 47 skipping to change at page 17, line 47
[CONEX] Matt Mathis, Bob Briscoe. "Congestion Exposure (ConEx) [CONEX] Matt Mathis, Bob Briscoe. "Congestion Exposure (ConEx)
Concepts, Abstract Mechanism and Requirements", RFC7713, Concepts, Abstract Mechanism and Requirements", RFC7713,
December 2015 December 2015
9.2 Informative References 9.2 Informative References
[CB] G. Fairhurst. "Network Transport Circuit Breakers", draft-ietf- [CB] G. Fairhurst. "Network Transport Circuit Breakers", draft-ietf-
tsvwg-circuit-breaker-01, April 02, 2015 tsvwg-circuit-breaker-01, April 02, 2015
[CP] Bob Briscoe, Murari Sridharan. "Network Performance Isolation
in Data Centres using Congestion Policing", draft-briscoe-
conex-data-centre-02, February 14, 2014
10. Acknowledgements 10. Acknowledgements
Thanks Bob Briscoe for his insightful suggestions on the basic Thanks Bob Briscoe for his insightful suggestions on the basic
mechanisms of congestion information collection and many other useful mechanisms of congestion information collection and many other useful
comments. Thanks David Black for his useful technical suggestions. comments. Thanks David Black for his useful technical suggestions.
Also, thanks Anthony Chan and John Kaippallimalil for their careful
reviews. Also, thanks Anthony Chan, Jake Holland, John Kaippallimalil and
Vincent Roca for their careful reviews.
Authors' Addresses Authors' Addresses
Xinpeng Wei Xinpeng Wei
Beiqing Rd. Z-park No.156, Haidian District, Beiqing Rd. Z-park No.156, Haidian District,
Beijing, 100095, P. R. China Beijing, 100095, P. R. China
E-mail: weixinpeng@huawei.com E-mail: weixinpeng@huawei.com
Zhu Lei Zhu Lei
Beiqing Rd. Z-park No.156, Haidian District, Beiqing Rd. Z-park No.156, Haidian District,