draft-ietf-tsvwg-datagram-plpmtud-07.txt   draft-ietf-tsvwg-datagram-plpmtud-08.txt 
Internet Engineering Task Force G. Fairhurst Internet Engineering Task Force G. Fairhurst
Internet-Draft T. Jones Internet-Draft T. Jones
Updates: 4821 (if approved) University of Aberdeen Updates: 4821 (if approved) University of Aberdeen
Intended status: Standards Track M. Tuexen Intended status: Standards Track M. Tuexen
Expires: August 22, 2019 I. Ruengeler Expires: 7 December 2019 I. Ruengeler
T. Voelker T. Voelker
Muenster University of Applied Sciences Muenster University of Applied Sciences
February 18, 2019 5 June 2019
Packetization Layer Path MTU Discovery for Datagram Transports Packetization Layer Path MTU Discovery for Datagram Transports
draft-ietf-tsvwg-datagram-plpmtud-07 draft-ietf-tsvwg-datagram-plpmtud-08
Abstract Abstract
This document describes a robust method for Path MTU Discovery This document describes a robust method for Path MTU Discovery
(PMTUD) for datagram Packetization Layers (PLs). The document (PMTUD) for datagram Packetization Layers (PLs). It describes an
describes an extension to RFC 1191 and RFC 8201, which specifies extension to RFC 1191 and RFC 8201, which specifies ICMP-based Path
ICMP-based Path MTU Discovery for IPv4 and IPv6. The method allows a MTU Discovery for IPv4 and IPv6. The method allows a PL, or a
PL, or a datagram application that uses a PL, to discover whether a datagram application that uses a PL, to discover whether a network
network path can support the current size of datagram. This can be path can support the current size of datagram. This can be used to
used to detect and reduce the message size when a sender encounters a detect and reduce the message size when a sender encounters a network
network black hole (where packets are discarded, and no ICMP message black hole (where packets are discarded). The method can probe a
is received). The method can also probe a network path with network path with progressively larger packets to discover whether
progressively larger packets to find whether the maximum packet size the maximum packet size can be increased. This allows a sender to
can be increased. This allows a sender to determine an appropriate determine an appropriate packet size, providing functionality for
packet size, providing functionality for datagram transports that is datagram transports that is equivalent to the Packetization Layer
equivalent to the Packetization Layer PMTUD specification for TCP, PMTUD specification for TCP, specified in RFC 4821.
specified in RFC 4821.
The document also provides implementation notes for incorporating The document also provides implementation notes for incorporating
Datagram PMTUD into IETF datagram transports or applications that use Datagram PMTUD into IETF datagram transports or applications that use
datagram transports. datagram transports.
When published, this specification updates RFC 4821. When published, this specification updates RFC 4821.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
skipping to change at page 2, line 7 skipping to change at page 2, line 7
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 22, 2019. This Internet-Draft will expire on 7 December 2019.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents (https://trustee.ietf.org/
(https://trustee.ietf.org/license-info) in effect on the date of license-info) in effect on the date of publication of this document.
publication of this document. Please review these documents Please review these documents carefully, as they describe your rights
carefully, as they describe your rights and restrictions with respect and restrictions with respect to this document. Code Components
to this document. Code Components extracted from this document must extracted from this document must include Simplified BSD License text
include Simplified BSD License text as described in Section 4.e of as described in Section 4.e of the Trust Legal Provisions and are
the Trust Legal Provisions and are provided without warranty as provided without warranty as described in the Simplified BSD License.
described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Classical Path MTU Discovery . . . . . . . . . . . . . . 4 1.1. Classical Path MTU Discovery . . . . . . . . . . . . . . 4
1.2. Packetization Layer Path MTU Discovery . . . . . . . . . 6 1.2. Packetization Layer Path MTU Discovery . . . . . . . . . 6
1.3. Path MTU Discovery for Datagram Services . . . . . . . . 7 1.3. Path MTU Discovery for Datagram Services . . . . . . . . 7
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7
3. Features Required to Provide Datagram PLPMTUD . . . . . . . . 9 3. Features Required to Provide Datagram PLPMTUD . . . . . . . . 10
4. DPLPMTUD Mechanisms . . . . . . . . . . . . . . . . . . . . . 12 4. DPLPMTUD Mechanisms . . . . . . . . . . . . . . . . . . . . . 12
4.1. PLPMTU Probe Packets . . . . . . . . . . . . . . . . . . 12 4.1. PLPMTU Probe Packets . . . . . . . . . . . . . . . . . . 12
4.2. Confirmation of Probed Packet Size . . . . . . . . . . . 13 4.2. Confirmation of Probed Packet Size . . . . . . . . . . . 14
4.3. Detection of Black Holes . . . . . . . . . . . . . . . . 14 4.3. Detection of Unsupported PLPMTU Size, aka Black Hole
Detection . . . . . . . . . . . . . . . . . . . . . . . . 14
4.4. Response to PTB Messages . . . . . . . . . . . . . . . . 15 4.4. Response to PTB Messages . . . . . . . . . . . . . . . . 15
4.4.1. Validation of PTB Messages . . . . . . . . . . . . . 15 4.4.1. Validation of PTB Messages . . . . . . . . . . . . . 15
4.4.2. Use of PTB Messages . . . . . . . . . . . . . . . . . 16 4.4.2. Use of PTB Messages . . . . . . . . . . . . . . . . . 16
5. Datagram Packetization Layer PMTUD . . . . . . . . . . . . . 17 5. Datagram Packetization Layer PMTUD . . . . . . . . . . . . . 17
5.1. DPLPMTUD Components . . . . . . . . . . . . . . . . . . . 18 5.1. DPLPMTUD Components . . . . . . . . . . . . . . . . . . . 18
5.1.1. Timers . . . . . . . . . . . . . . . . . . . . . . . 18 5.1.1. Timers . . . . . . . . . . . . . . . . . . . . . . . 18
5.1.2. Constants . . . . . . . . . . . . . . . . . . . . . . 19 5.1.2. Constants . . . . . . . . . . . . . . . . . . . . . . 19
5.1.3. Variables . . . . . . . . . . . . . . . . . . . . . . 19 5.1.3. Variables . . . . . . . . . . . . . . . . . . . . . . 20
5.2. DPLPMTUD Phases . . . . . . . . . . . . . . . . . . . . . 20 5.1.4. Overview of DPLPMTUD Phases . . . . . . . . . . . . . 21
5.2.1. BASE_PMTU Confirmation Phase . . . . . . . . . . . . 22 5.2. State Machine . . . . . . . . . . . . . . . . . . . . . . 23
5.2.2. Search Phase . . . . . . . . . . . . . . . . . . . . 22 5.3. Search to Increase the PLPMTU . . . . . . . . . . . . . . 26
5.2.2.1. Resilience to Inconsistent Path Information . . . 22 5.3.1. Probing for a larger PLPMTU . . . . . . . . . . . . . 26
5.2.3. Search Complete Phase . . . . . . . . . . . . . . . . 23 5.3.2. Selection of Probe Sizes . . . . . . . . . . . . . . 27
5.2.4. PROBE_BASE Phase . . . . . . . . . . . . . . . . . . 23 5.3.3. Resilience to Inconsistent Path Information . . . . . 27
5.2.5. ERROR Phase . . . . . . . . . . . . . . . . . . . . . 24 5.4. Robustness to Inconsistent Paths . . . . . . . . . . . . 28
5.2.5.1. Robustness to Inconsistent Path . . . . . . . . . 24
5.2.6. DISABLED Phase . . . . . . . . . . . . . . . . . . . 24
5.3. State Machine . . . . . . . . . . . . . . . . . . . . . . 24
5.4. Search to Increase the PLPMTU . . . . . . . . . . . . . . 27
5.4.1. Probing for a Larger PLPMTU . . . . . . . . . . . . . 27
5.4.2. Selection of Probe Sizes . . . . . . . . . . . . . . 28
5.4.3. Resilience to Inconsistent Path Information . . . . . 28
6. Specification of Protocol-Specific Methods . . . . . . . . . 28 6. Specification of Protocol-Specific Methods . . . . . . . . . 28
6.1. Application support for DPLPMTUD with UDP or UDP-Lite . . 29 6.1. Application support for DPLPMTUD with UDP or
UDP-Lite . . . . . . . . . . . . . . . . . . . . . . . . 28
6.1.1. Application Request . . . . . . . . . . . . . . . . . 29 6.1.1. Application Request . . . . . . . . . . . . . . . . . 29
6.1.2. Application Response . . . . . . . . . . . . . . . . 29 6.1.2. Application Response . . . . . . . . . . . . . . . . 29
6.1.3. Sending Application Probe Packets . . . . . . . . . . 30 6.1.3. Sending Application Probe Packets . . . . . . . . . . 29
6.1.4. Validating the Path . . . . . . . . . . . . . . . . . 30 6.1.4. Validating the Path . . . . . . . . . . . . . . . . . 29
6.1.5. Handling of PTB Messages . . . . . . . . . . . . . . 30 6.1.5. Handling of PTB Messages . . . . . . . . . . . . . . 29
6.2. DPLPMTUD with UDP Options . . . . . . . . . . . . . . . . 30 6.2. DPLPMTUD for SCTP . . . . . . . . . . . . . . . . . . . . 30
6.2.1. UDP Probe Request Option . . . . . . . . . . . . . . 32 6.2.1. SCTP/IPv4 and SCTP/IPv6 . . . . . . . . . . . . . . . 30
6.2.2. UDP Probe Response Option . . . . . . . . . . . . . . 32 6.2.2. DPLPMTUD for SCTP/UDP . . . . . . . . . . . . . . . . 31
6.3. DPLPMTUD for SCTP . . . . . . . . . . . . . . . . . . . . 33 6.2.3. DPLPMTUD for SCTP/DTLS . . . . . . . . . . . . . . . 31
6.3.1. SCTP/IPv4 and SCTP/IPv6 . . . . . . . . . . . . . . . 33 6.3. DPLPMTUD for QUIC . . . . . . . . . . . . . . . . . . . . 32
6.3.1.1. Sending SCTP Probe Packets . . . . . . . . . . . 33 6.3.1. Sending QUIC Probe Packets . . . . . . . . . . . . . 32
6.3.1.2. Validating the Path with SCTP . . . . . . . . . . 34 6.3.2. Validating the Path with QUIC . . . . . . . . . . . . 33
6.3.1.3. PTB Message Handling by SCTP . . . . . . . . . . 34 6.3.3. Handling of PTB Messages by QUIC . . . . . . . . . . 33
6.3.2. DPLPMTUD for SCTP/UDP . . . . . . . . . . . . . . . . 34 6.4. DPLPMTUD for UDP-Options . . . . . . . . . . . . . . . . 33
6.3.2.1. Sending SCTP/UDP Probe Packets . . . . . . . . . 34 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 33
6.3.2.2. Validating the Path with SCTP/UDP . . . . . . . . 34 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 33
6.3.2.3. Handling of PTB Messages by SCTP/UDP . . . . . . 34 9. Security Considerations . . . . . . . . . . . . . . . . . . . 33
6.3.3. DPLPMTUD for SCTP/DTLS . . . . . . . . . . . . . . . 34 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.3.3.1. Sending SCTP/DTLS Probe Packets . . . . . . . . . 35 10.1. Normative References . . . . . . . . . . . . . . . . . . 34
6.3.3.2. Validating the Path with SCTP/DTLS . . . . . . . 35 10.2. Informative References . . . . . . . . . . . . . . . . . 36
6.3.3.3. Handling of PTB Messages by SCTP/DTLS . . . . . . 35 Appendix A. Revision Notes . . . . . . . . . . . . . . . . . . . 37
6.4. DPLPMTUD for QUIC . . . . . . . . . . . . . . . . . . . . 35 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40
6.4.1. Sending QUIC Probe Packets . . . . . . . . . . . . . 35
6.4.2. Validating the Path with QUIC . . . . . . . . . . . . 36
6.4.3. Handling of PTB Messages by QUIC . . . . . . . . . . 36
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 36
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36
9. Security Considerations . . . . . . . . . . . . . . . . . . . 36
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 38
10.1. Normative References . . . . . . . . . . . . . . . . . . 38
10.2. Informative References . . . . . . . . . . . . . . . . . 39
Appendix A. Event-driven state changes . . . . . . . . . . . . . 40
Appendix B. Revision Notes . . . . . . . . . . . . . . . . . . . 43
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 45
1. Introduction 1. Introduction
The IETF has specified datagram transport using UDP, SCTP, and DCCP, The IETF has specified datagram transport using UDP, SCTP, and DCCP,
as well as protocols layered on top of these transports (e.g., SCTP/ as well as protocols layered on top of these transports (e.g., SCTP/
UDP, DCCP/UDP, QUIC/UDP), and direct datagram transport over the IP UDP, DCCP/UDP, QUIC/UDP), and direct datagram transport over the IP
network layer. This document describes a robust method for Path MTU network layer. This document describes a robust method for Path MTU
Discovery (PMTUD) that may be used with these transport protocols (or Discovery (PMTUD) that may be used with these transport protocols (or
the applications that use their transport service) to discover an the applications that use their transport service) to discover an
appropriate size of packet to use across an Internet path. appropriate size of packet to use across an Internet path.
1.1. Classical Path MTU Discovery 1.1. Classical Path MTU Discovery
Classical Path Maximum Transmission Unit Discovery (PMTUD) can be Classical Path Maximum Transmission Unit Discovery (PMTUD) can be
used with any transport that is able to process ICMP Packet Too Big used with any transport that is able to process ICMP Packet Too Big
(PTB) messages (e.g., [RFC1191] and [RFC8201]). The term PTB message (PTB) messages (e.g., [RFC1191] and [RFC8201]). In this document,
is applied to both IPv4 ICMP Unreachable messages (type 3) that carry the term PTB message is applied to both IPv4 ICMP Unreachable
the error Fragmentation Needed (Type 3, Code 4) [RFC0792] and ICMPv6 messages (type 3) that carry the error Fragmentation Needed (Type 3,
packet too big messages (Type 2) [RFC4443]. When a sender receives a Code 4) [RFC0792] and ICMPv6 packet too big messages (Type 2)
PTB message, it reduces the effective MTU to the value reported as [RFC4443]. When a sender receives a PTB message, it reduces the
the Link MTU in the PTB message. A method from time to time effective MTU to the value reported as the Link MTU in the PTB
increases the packet size in an attempt to discover an increase in the message. A method from time to time increases the packet
supported PMTU. The packets sent with a size larger than the current size in an attempt to discover an increase in the supported PMTU. The
effective PMTU are known as probe packets. packets sent with a size larger than the current effective PMTU are
known as probe packets.
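The definition of a PTB message above can be illustrated with a minimal sketch. It is not part of either draft revision; the function names `is_ptb` and `reduce_effective_pmtu` are hypothetical, and the use of `min()` (so that a PTB message never raises the effective PMTU) is an assumption made for the illustration.

```python
# Illustrative sketch only; names below are hypothetical, not from the draft.

ICMPV4_DEST_UNREACHABLE = 3  # IPv4 ICMP Unreachable [RFC0792]
ICMPV4_FRAG_NEEDED = 4       # Code 4: Fragmentation Needed [RFC0792]
ICMPV6_PACKET_TOO_BIG = 2    # ICMPv6 Packet Too Big [RFC4443]


def is_ptb(ip_version: int, icmp_type: int, icmp_code: int) -> bool:
    """Return True when the ICMP message is a PTB message as the term is used here."""
    if ip_version == 4:
        return (icmp_type == ICMPV4_DEST_UNREACHABLE
                and icmp_code == ICMPV4_FRAG_NEEDED)
    if ip_version == 6:
        return icmp_type == ICMPV6_PACKET_TOO_BIG
    return False


def reduce_effective_pmtu(current_pmtu: int, reported_link_mtu: int) -> int:
    """Classical PMTUD reaction: lower the effective PMTU to the reported Link MTU.

    Assumption: a PTB message is never allowed to increase the effective PMTU.
    """
    return min(current_pmtu, reported_link_mtu)
```

For example, `is_ptb(4, 3, 4)` and `is_ptb(6, 2, 0)` are both true, while an IPv4 Unreachable message with any other code is not treated as a PTB message.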
Packets not intended as probe packets are either fragmented to the Packets not intended as probe packets are either fragmented to the
current effective PMTU, or the attempt to send fails with an error current effective PMTU, or the attempt to send fails with an error
code. Applications are sometimes provided with a primitive to let code. Applications are sometimes provided with a primitive to let
them read the Maximum Packet Size (MPS), derived from the current them read the Maximum Packet Size (MPS), derived from the current
effective PMTU. effective PMTU.
Classical PMTUD is subject to protocol failures. One failure arises Classical PMTUD is subject to protocol failures. One failure arises
when traffic using a packet size larger than the actual PMTU is when traffic using a packet size larger than the actual PMTU is
black-holed (all datagrams sent with this size, or larger, are black-holed (all datagrams sent with this size, or larger, are
silently discarded without the sender receiving PTB messages). This discarded). This could arise when the PTB messages are not delivered
could arise when the PTB messages are not delivered back to the back to the sender for some reason (see for example [RFC2923]).
sender for some reason (see for example [RFC2923]).
Examples where PTB messages are not delivered include: Examples where PTB messages are not delivered include:
o The generation of ICMP messages is usually rate limited. This may * The generation of ICMP messages is usually rate limited. This
result in no PTB messages being sent to the sender (see section could result in no PTB messages being generated to the sender (see
2.4 of [RFC4443]) section 2.4 of [RFC4443])
o ICMP messages are increasingly filtered by middleboxes (including * ICMP messages can be filtered by middleboxes (including firewalls)
firewalls) [RFC4890]. A stateful firewall could be configured [RFC4890]. A stateful firewall could be configured with a policy
with a policy to block incoming ICMP messages, which would prevent to block incoming ICMP messages, which would prevent reception of
reception of PTB messages to endpoints behind this firewall. PTB messages to a sending endpoint behind this firewall.
o When the router issuing the ICMP message drops a tunneled packet, * When the router issuing the ICMP message drops a tunneled packet,
the resulting ICMP message will be directed to the tunnel ingress. the resulting ICMP message will be directed to the tunnel ingress.
This tunnel endpoint is responsible for forwarding the ICMP This tunnel endpoint is responsible for forwarding the ICMP
message and also processing the quoted packet within the payload message and also processing the quoted packet within the payload
field to remove the effect of the tunnel, and return a correctly field to remove the effect of the tunnel, and return a correctly
formatted ICMP message to the sender [I-D.ietf-intarea-tunnels]. formatted ICMP message to the sender [I-D.ietf-intarea-tunnels].
Failure to do this results in black-holing. Failure to do this prevents the PTB message reaching the original
sender.
o Asymmetry in forwarding can result in there being no route back to * Asymmetry in forwarding can result in there being no return route
the original sender, which would prevent an ICMP message being to the original sender, which would prevent an ICMP message being
delivered to the sender. This can be also be an issue when delivered to the sender. This issue can also arise when policy-
policy-based routing is used, Equal Cost Multipath (ECMP) routing based routing is used, Equal Cost Multipath (ECMP) routing is
is used, or a middlebox acts as an application load balancer. An used, or a middlebox acts as an application load balancer. An
example is where the path towards the server is chosen by ECMP example is where the path towards the server is chosen by ECMP
routing depending on bytes in the IP payload. In this case, when routing depending on bytes in the IP payload. In this case, when
a packet sent by the server encounters a problem after the ECMP a packet sent by the server encounters a problem after the ECMP
router, then any resulting ICMP message needs to also be directed router, then any resulting ICMP message needs to also be directed
by the ECMP router towards the same server (i.e., ICMP messages by the ECMP router towards the original sender.
need to follow the same path as the flows to which they
correspond). Failure to do this results in black-holing.
o There are cases where the next hop destination fails to receive a * There are additional cases where the next hop destination fails to
packet because of its size. This could be due to misconfiguration receive a packet because of its size. This could be due to
of the layer 2 path between nodes, for instance the MTU configured misconfiguration of the layer 2 path between nodes, for instance
in a layer 2 switch, or misconfiguration of the Maximum Receive the MTU configured in a layer 2 switch, or misconfiguration of the
Unit (MRU). If the packet is dropped by the link, this will not Maximum Receive Unit (MRU). If the packet is dropped by the link,
cause a PTB message to be sent, and result in consequent black- this will not cause a PTB message to be sent to the original
holing. sender.
Another failure could result if a node that is not on the network Another failure could result if a node that is not on the network
path sends a PTB message that attempts to force the sender to change path sends a PTB message that attempts to force a sender to change
the effective PMTU [RFC8201]. A sender can protect itself from the effective PMTU [RFC8201]. A sender can protect itself from
reacting to such messages by utilising the quoted packet within a PTB reacting to such messages by utilising the quoted packet within a PTB
message payload to validate that the received PTB message was message payload to validate that the received PTB message was
generated in response to a packet that had actually originated from generated in response to a packet that had actually originated from
the sender. However, there are situations where a sender would be the sender. However, there are situations where a sender would be
unable to provide this validation. unable to provide this validation. Examples where validation of the
PTB message is not possible include:
Examples where validation of the PTB message is not possible include:
o When a router issuing the ICMP message implements RFC792 * When a router issuing the ICMP message implements RFC792
[RFC0792], it is only required to include the first 64 bits of the [RFC0792], it is only required to include the first 64 bits of the
IP payload of the packet within the quoted payload. This may be IP payload of the packet within the quoted payload. There could
insufficient to perform the tunnel processing described in the be insufficient bytes remaining for the sender to interpret the
previous bullet. There could be insufficient bytes remaining for quoted transport information.
the sender to interpret the quoted transport information. The
recommendation in RFC1812 [RFC1812] is that IPv4 routers return a
quoted packet with as much of the original datagram as possible
without the length of the ICMP datagram exceeding 576 bytes.
(IPv6 routers include as much of invoking packet as possible
without the ICMPv6 packet exceeding 1280 bytes [RFC4443].)
o The use of tunnels/encryption can reduce the size of the quoted Note: The recommendation in RFC1812 [RFC1812] is that IPv4 routers
return a quoted packet with as much of the original datagram as
possible without the length of the ICMP datagram exceeding 576
bytes. IPv6 routers include as much of the invoking packet as
possible without the ICMPv6 packet exceeding 1280 bytes [RFC4443].
* The use of tunnels/encryption can reduce the size of the quoted
packet returned to the original source address, increasing the packet returned to the original source address, increasing the
risk that there could be insufficient bytes remaining for the risk that there could be insufficient bytes remaining for the
sender to interpret the quoted transport information. sender to interpret the quoted transport information.
o Even when the PTB message includes sufficient bytes of the quoted * Even when the PTB message includes sufficient bytes of the quoted
packet, the network layer could lack sufficient context to packet, the network layer could lack sufficient context to
validate the message, because validation depends on information validate the message, because validation depends on information
about the active transport flows at an endpoint node (e.g., the about the active transport flows at an endpoint node (e.g., the
socket/address pairs being used, and other protocol header socket/address pairs being used, and other protocol header
information). information).
o When a packet is encapsulated/tunneled over an encrypted * When a packet is encapsulated/tunneled over an encrypted
transport, the tunnel/encapsulation ingress might have transport, the tunnel/encapsulation ingress might have
insufficient context, or computational power, to reconstruct the insufficient context, or computational power, to reconstruct the
transport header that would be needed to perform validation. transport header that would be needed to perform validation.
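The validation step discussed above, and the ways it can fail, can be sketched as follows. This is a hedged illustration rather than a method defined by the draft: it assumes an IPv4 quoted packet carrying UDP, and the names `Verdict`, `Flow` and `validate_quoted_ipv4_udp` are hypothetical. A real PL would apply further checks using its own protocol state.

```python
import struct
from enum import Enum
from typing import Set, Tuple

# Hypothetical flow key: (source address, source port, destination address, destination port)
Flow = Tuple[bytes, int, bytes, int]


class Verdict(Enum):
    VALID = "valid"                # quoted packet matches a flow this sender owns
    INVALID = "invalid"            # quoted packet does not match any active flow
    UNVERIFIABLE = "unverifiable"  # too few quoted bytes, or not a protocol we can parse


def validate_quoted_ipv4_udp(quoted: bytes, active_flows: Set[Flow]) -> Verdict:
    """Check that the packet quoted in a PTB message came from this sender."""
    if len(quoted) < 20:
        return Verdict.UNVERIFIABLE       # not even a minimal IPv4 header
    version = quoted[0] >> 4
    ihl = (quoted[0] & 0x0F) * 4          # IPv4 header length in bytes
    if version != 4 or ihl < 20:
        return Verdict.INVALID
    if len(quoted) < ihl + 4:
        return Verdict.UNVERIFIABLE       # transport ports were not quoted
    if quoted[9] != 17:                   # this sketch only understands UDP (protocol 17)
        return Verdict.UNVERIFIABLE
    src, dst = quoted[12:16], quoted[16:20]
    sport, dport = struct.unpack_from("!HH", quoted, ihl)
    if (src, sport, dst, dport) in active_flows:
        return Verdict.VALID
    return Verdict.INVALID
```

The three-way result reflects the text above: a short quoted packet does not prove that the PTB message is forged, only that the sender cannot validate it.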
1.2. Packetization Layer Path MTU Discovery 1.2. Packetization Layer Path MTU Discovery
The term Packetization Layer (PL) has been introduced to describe the The term Packetization Layer (PL) has been introduced to describe the
layer that is responsible for placing data blocks into the payload of layer that is responsible for placing data blocks into the payload of
IP packets and selecting an appropriate MPS. This function is often IP packets and selecting an appropriate MPS. This function is often
performed by a transport protocol, but can also be performed by other performed by a transport protocol, but can also be performed by other
encapsulation methods working above the transport layer. encapsulation methods working above the transport layer.
In contrast to PMTUD, Packetization Layer Path MTU Discovery In contrast to PMTUD, Packetization Layer Path MTU Discovery
(PLPMTUD) [RFC4821] does not rely upon reception and validation of (PLPMTUD) [RFC4821] does not rely upon reception and validation of
PTB messages. It is therefore more robust than Classical PMTUD. PTB messages. It is therefore more robust than Classical PMTUD.
This has become the recommended approach for implementing PMTU This has become the recommended approach for implementing PMTU
discovery with TCP. discovery with TCP.
It uses a general strategy where the PL sends probe packets to search It uses a general strategy where the PL sends probe packets to search
for the largest size of unfragmented datagram that can be sent over a for the largest size of unfragmented datagram that can be sent over a
network path. The probe packets are sent with a progressively larger network path. Probe packets are sent with a progressively larger
packet size. If a probe packet is successfully delivered (as packet size. If a probe packet is successfully delivered (as
determined by the PL), then the PLPMTU is raised to the size of the determined by the PL), then the PLPMTU is raised to the size of the
successful probe. If no response is received to a probe packet, the successful probe. If no response is received to a probe packet, the
method reduces the probe size. This PLPMTU is used to set the method reduces the probe size. The result of probing with the PLPMTU
application MPS. is used to set the application MPS.
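The general probing strategy described above can be sketched as a simple search loop. This is an illustration under stated assumptions and not the search algorithm specified later in the draft: the function `search_plpmtu`, the `probe` callback, the default step of 32 bytes and the halving of the step after a failed probe are all hypothetical choices. It also assumes that a probe fails only because of its size, which a real PL cannot assume.

```python
from typing import Callable


def search_plpmtu(base: int, max_size: int,
                  probe: Callable[[int], bool], step: int = 32) -> int:
    """Search for the largest probe size that is confirmed as delivered.

    base     -- size already known to work (starting PLPMTU)
    max_size -- largest size worth probing (for example, the local link MTU)
    probe    -- hypothetical callback: sends a probe of the given size and
                returns True if the PL confirmed delivery
    """
    plpmtu = base
    size = min(base + step, max_size)
    while step >= 1 and size > plpmtu:
        if probe(size):
            plpmtu = size            # successful probe: raise the PLPMTU
        else:
            max_size = size - 1      # failed probe: never try this size again
            step //= 2               # and continue with smaller increments
        size = min(plpmtu + step, max_size)
    return plpmtu


# Usage: a simulated path that delivers packets of up to 1400 bytes.
found = search_plpmtu(1200, 1500, lambda size: size <= 1400)
```

The value returned would then be used to set the MPS offered to the application, after subtracting any PL header overhead.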
PLPMTUD introduces flexibility in the implementation of PMTU PLPMTUD introduces flexibility in the implementation of PMTU
discovery. At one extreme, it can be configured to only perform PTB discovery. At one extreme, it can be configured to only perform ICMP
black hole detection and recovery to increase the robustness of Black Hole Detection and recovery to increase the robustness of
Classical PMTUD, or at the other extreme, all PTB processing can be Classical PMTUD, or at the other extreme, all PTB processing can be
disabled and PLPMTUD can completely replace Classical PMTUD. disabled and PLPMTUD can completely replace Classical PMTUD.
PLPMTUD can also include additional consistency checks without PLPMTUD can also include additional consistency checks without
increasing the risk of increased black-holing. For instance,the increasing the risk that data is lost when probing to discover the
information available at the PL, or higher layers, makes PTB message path MTU. For example, information available at the PL, or higher
validation more straight forward. layers, enables received PTB messages to be validated before being
utilized.
1.3. Path MTU Discovery for Datagram Services 1.3. Path MTU Discovery for Datagram Services
Section 5 of this document presents a set of algorithms for datagram Section 5 of this document presents a set of algorithms for datagram
protocols to discover the largest size of unfragmented datagram that protocols to discover the largest size of unfragmented datagram that
can be sent over a network path. The method described relies on can be sent over a network path. The method described relies on
features of the PL described in Section 3 and applies to transport features of the PL described in Section 3 and applies to transport
protocols operating over IPv4 and IPv6. It does not require protocols operating over IPv4 and IPv6. It does not require
cooperation from the lower layers, although it can utilise PTB cooperation from the lower layers, although it can utilize PTB
messages when these received messages are made available to the PL. messages when these received messages are made available to the PL.
The UDP Usage Guidelines [RFC8085] state "an application SHOULD The UDP Usage Guidelines [RFC8085] state "an application SHOULD
either use the Path MTU information provided by the IP layer or either use the Path MTU information provided by the IP layer or
implement Path MTU Discovery (PMTUD)", but do not provide a implement Path MTU Discovery (PMTUD)", but do not provide a
mechanism for discovering the largest size of unfragmented datagram mechanism for discovering the largest size of unfragmented datagram
that can be used on a network path. Prior to this document, PLPMTUD that can be used on a network path. Prior to this document, PLPMTUD
had not been specified for UDP. had not been specified for UDP.
Section 10.2 of [RFC4821] recommends a PLPMTUD probing method for the Section 10.2 of [RFC4821] recommends a PLPMTUD probing method for the
Stream Control Transmission Protocol (SCTP). SCTP utilises probe Stream Control Transmission Protocol (SCTP). SCTP utilizes probe
packets consisting of a minimal sized HEARTBEAT chunk bundled with a packets consisting of a minimal sized HEARTBEAT chunk bundled with a
PAD chunk as defined in [RFC4820], but RFC4821 does not provide a PAD chunk as defined in [RFC4820], but RFC4821 does not provide a
complete specification. The present document provides the details to complete specification. The present document provides the details to
complete that specification. complete that specification.
The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires The Datagram Congestion Control Protocol (DCCP) [RFC4340] requires
implementations to support Classical PMTUD and states that a DCCP implementations to support Classical PMTUD and states that a DCCP
sender "MUST maintain the MPS allowed for each active DCCP session". sender "MUST maintain the MPS allowed for each active DCCP session".
It also defines the current congestion control MPS (CCMPS) supported It also defines the current congestion control MPS (CCMPS) supported
by a network path. This recommends use of PMTUD, and suggests use of by a network path. This recommends use of PMTUD, and suggests use of
skipping to change at page 8, line 12 skipping to change at page 8, line 9
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
Other terminology is directly copied from [RFC4821], and the Other terminology is directly copied from [RFC4821], and the
definitions in [RFC1122]. definitions in [RFC1122].
Actual PMTU: The Actual PMTU is the PMTU of a network path between a Actual PMTU: The Actual PMTU is the PMTU of a network path between a
sender PL and a destination PL, which the DPLPMTUD algorithm seeks sender PL and a destination PL, which the DPLPMTUD algorithm seeks
to determine. to determine.
Black Holed: Packets are Black holed when the sender is unaware that Black Hole: A Black Hole is encountered when a sender is unaware
packets are not delivered to the destination endpoint (e.g., when that packets are not being delivered to the destination end point.
the sender transmits packets of a particular size with a Two types of Black Hole are relevant to DPLPMTUD:
previously known effective PMTU and they are silently discarded by
the network, but is not made aware of a change to the path that Packet Black Hole: Packets encounter a Packet Black Hole when
resulted in a smaller PLPMTU by ICMP messages). packets are not delivered to the destination
endpoint (e.g., when the sender transmits
packets of a particular size with a previously
known effective PMTU and they are discarded by
the network).
ICMP Black Hole An ICMP Black Hole is encountered when the
sender is unaware that packets are not
delivered to the destination endpoint because
PTB messages are not received by the
originating PL sender.
Black holed : Traffic is black-holed when the sender is unaware that
packets are not being delivered. This could be due to a Packet
Black Hole or an ICMP Black Hole.
Classical Path MTU Discovery: Classical PMTUD is a process described Classical Path MTU Discovery: Classical PMTUD is a process described
in [RFC1191] and [RFC8201], in which nodes rely on PTB messages to in [RFC1191] and [RFC8201], in which nodes rely on PTB messages to
learn the largest size of unfragmented datagram that can be used learn the largest size of unfragmented datagram that can be used
across a network path. across a network path.
Datagram: A datagram is a transport-layer protocol data unit, Datagram: A datagram is a transport-layer protocol data unit,
transmitted in the payload of an IP packet. transmitted in the payload of an IP packet.
Effective PMTU: The Effective PMTU is the current estimated value Effective PMTU: The Effective PMTU is the current estimated value
for PMTU that is used by a PMTUD. This is equivalent to the for PMTU that is used by a PMTUD. This is equivalent to the
PLPMTU derived by PLPMTUD. PLPMTU derived by PLPMTUD.
EMTU_S: The Effective MTU for sending (EMTU_S) is defined in EMTU_S: The Effective MTU for sending (EMTU_S) is defined in
[RFC1122] as "the maximum IP datagram size that may be sent, for a [RFC1122] as "the maximum IP datagram size that may be sent, for a
particular combination of IP source and destination addresses...". particular combination of IP source and destination addresses...".
EMTU_R: The Effective MTU for receiving (EMTU_R) is designated in EMTU_R: The Effective MTU for receiving (EMTU_R) is designated in
[RFC1122] as the largest datagram size that can be reassembled by [RFC1122] as the largest datagram size that can be reassembled by
EMTU_R ("Effective MTU to receive"). EMTU_R (Effective MTU to receive).
Link: A Link is a communication facility or medium over which nodes Link: A Link is a communication facility or medium over which nodes
can communicate at the link layer, i.e., a layer below the IP can communicate at the link layer, i.e., a layer below the IP
layer. Examples are Ethernet LANs and Internet (or higher) layer layer. Examples are Ethernet LANs and Internet (or higher) layer
tunnels. tunnels.
Link MTU: The Link Maximum Transmission Unit (MTU) is the size in Link MTU: The Link Maximum Transmission Unit (MTU) is the size in
bytes of the largest IP packet, including the IP header and bytes of the largest IP packet, including the IP header and
payload, that can be transmitted over a link. Note that this payload, that can be transmitted over a link. Note that this
could more properly be called the IP MTU, to be consistent with could more properly be called the IP MTU, to be consistent with
how other standards organizations use the acronym. This includes how other standards organizations use the acronym. This includes
the IP header, but excludes link layer headers and other framing the IP header, but excludes link layer headers and other framing
that is not part of IP or the IP payload. Other standards that is not part of IP or the IP payload. Other standards
organizations generally define the link MTU to include the link organizations generally define the link MTU to include the link
layer headers. layer headers.
MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU that DPLPMTUD MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU that DPLPMTUD
will attempt to use. will attempt to use.
MPS: The Maximum Packet Size (MPS) is the largest size of MPS: The Maximum Packet Size (MPS) is the largest size of
application data block that can be sent across a network path. In application data block that can be sent across a network path by a
DPLPMTUD this quantity is derived from the PLPMTU by taking into PL. In DPLPMTUD this quantity is derived from the PLPMTU by
consideration the size of the lower protocol layer headers. taking into consideration the size of the lower protocol layer
headers. Probe packets generated by DPLPMTUD can have a size
larger than the MPS.
MIN_PMTU: The MIN_PMTU is the smallest size of PLPMTU that DPLPMTUD MIN_PMTU: The MIN_PMTU is the smallest size of PLPMTU that DPLPMTUD
will attempt to use. will attempt to use.
Packet: A Packet is the IP header plus the IP payload. Packet: A Packet is the IP header plus the IP payload.
Packetization Layer (PL): The Packetization Layer (PL) is the layer Packetization Layer (PL): The Packetization Layer (PL) is the layer
of the network stack that places data into packets and performs of the network stack that places data into packets and performs
transport protocol functions. transport protocol functions.
skipping to change at page 10, line 21 skipping to change at page 10, line 35
It MAY utilize similar information about the receiver when this It MAY utilize similar information about the receiver when this
is supplied (note this could be less than EMTU_R). This avoids is supplied (note this could be less than EMTU_R). This avoids
implementations trying to send probe packets that can not be implementations trying to send probe packets that can not be
transmitted by the local link. Too high of a value could reduce transmitted by the local link. Too high of a value could reduce
the efficiency of the search algorithm. Some applications also the efficiency of the search algorithm. Some applications also
have a maximum transport protocol data unit (PDU) size, in which have a maximum transport protocol data unit (PDU) size, in which
case there is no benefit from probing for a size larger than this case there is no benefit from probing for a size larger than this
(unless a transport allows multiplexing multiple applications (unless a transport allows multiplexing multiple applications
PDUs into the same datagram). PDUs into the same datagram).
2. PLPMTU: A datagram application using a transport layer not 2. PLPMTU: A datagram application using a PL not supporting
supporting fragmentation is REQUIRED to be able to choose the fragmentation is REQUIRED to be able to choose the size of
size of datagrams sent to the network, up to the PLPMTU, or a datagrams sent to the network, up to the PLPMTU, or a smaller
smaller value (such as the MPS) derived from this. This value is value (such as the MPS) derived from this. This value is managed
managed by the DPLPMTUD method. The PLPMTU (specified as the by the DPLPMTUD method. The PLPMTU (specified as the effective
effective PMTU in Section 1 of [RFC1191]) is equivalent to the PMTU in Section 1 of [RFC1191]) is equivalent to the EMTU_S
EMTU_S (specified in [RFC1122]). (specified in [RFC1122]).
3. Probe packets: On request, a DPLPMTUD sender is REQUIRED to be 3. Probe packets: On request, a DPLPMTUD sender is REQUIRED to be
able to transmit a packet larger than the PLPMTU. This is used able to transmit a packet larger than the PLPMTU. This is used
to send a probe packet. In IPv4, a probe packet MUST be sent to send a probe packet. In IPv4, a probe packet MUST be sent
with the Don't Fragment (DF) bit set in the IP header, and with the Don't Fragment (DF) bit set in the IP header, and
without network layer endpoint fragmentation. In IPv6, a probe without network layer endpoint fragmentation. In IPv6, a probe
packet is always sent without source fragmentation (as specified packet is always sent without source fragmentation (as specified
in section 5.4 of [RFC8201]). in section 5.4 of [RFC8201]).
4. Processing PTB messages: A DPLPMTUD sender MAY optionally utilize 4. Processing PTB messages: A DPLPMTUD sender MAY optionally utilize
skipping to change at page 11, line 4 skipping to change at page 11, line 20
validation confirms that the PTB message was sent in response to validation confirms that the PTB message was sent in response to
a packet originated by the sender, and needs to be performed a packet originated by the sender, and needs to be performed
before the PLPMTU discovery method reacts to the PTB message. A before the PLPMTU discovery method reacts to the PTB message. A
PTB message MUST NOT be used to increase the PLPMTU [RFC8201]. PTB message MUST NOT be used to increase the PLPMTU [RFC8201].
5. Reception feedback: The destination PL endpoint is REQUIRED to 5. Reception feedback: The destination PL endpoint is REQUIRED to
provide a feedback method that indicates to the DPLPMTUD sender provide a feedback method that indicates to the DPLPMTUD sender
when a probe packet has been received by the destination PL when a probe packet has been received by the destination PL
endpoint. The mechanism needs to be robust to the possibility endpoint. The mechanism needs to be robust to the possibility
that packets could be significantly delayed along a network path. that packets could be significantly delayed along a network path.
The local PL endpoint at the sending node is REQUIRED to pass The local PL endpoint at the sending node is REQUIRED to pass
this feedback to the sender-side DPLPMTUD method. this feedback to the sender DPLPMTUD method.
6. Probe loss recovery: It is RECOMMENDED to use probe packets that 6. Probe loss recovery: It is RECOMMENDED to use probe packets that
do not carry any user data. Most datagram transports permit do not carry any user data. Most datagram transports permit
this. If a probe packet contains user data requiring this. If a probe packet contains user data requiring
retransmission in case of loss, the PL (or layers above) are retransmission in case of loss, the PL (or layers above) are
REQUIRED to arrange any retransmission/repair of any resulting REQUIRED to arrange any retransmission/repair of any resulting
loss. DPLPMTUD is REQUIRED to be robust in the case where probe loss. DPLPMTUD is REQUIRED to be robust in the case where probe
packets are lost due to other reasons (including link packets are lost due to other reasons (including link
transmission error, congestion). transmission error, congestion).
skipping to change at page 11, line 31 skipping to change at page 11, line 46
indication of congestion and the loss SHOULD NOT directly trigger indication of congestion and the loss SHOULD NOT directly trigger
a congestion control reaction [RFC4821]. a congestion control reaction [RFC4821].
8. Shared PLPMTU state: The PLPMTU value could also be stored with 8. Shared PLPMTU state: The PLPMTU value could also be stored with
the corresponding entry in the destination cache and used by the corresponding entry in the destination cache and used by
other PL instances. The specification of PLPMTUD [RFC4821] other PL instances. The specification of PLPMTUD [RFC4821]
states: "If PLPMTUD updates the MTU for a particular path, all states: "If PLPMTUD updates the MTU for a particular path, all
Packetization Layer sessions that share the path representation Packetization Layer sessions that share the path representation
(as described in Section 5.2 of [RFC4821]) SHOULD be notified to (as described in Section 5.2 of [RFC4821]) SHOULD be notified to
make use of the new MTU". Such methods MUST be robust to the make use of the new MTU". Such methods MUST be robust to the
wide variety of underlying network forwarding behaviours. PLPMTU wide variety of underlying network forwarding behaviors. PLPMTU
adjustments based on shared PLPMTU values should be incorporated adjustments based on shared PLPMTU values should be incorporated
in the search algorithms. Section 5.2 of [RFC8201] provides in the search algorithms. Section 5.2 of [RFC8201] provides
guidance on the caching of PMTU information and also the relation guidance on the caching of PMTU information and also the relation
to IPv6 flow labels. to IPv6 flow labels.
In addition, the following principles are stated for design of a In addition, the following principles are stated for design of a
DPLPMTUD method: DPLPMTUD method:
o MPS: A method is REQUIRED to signal an appropriate MPS to the * MPS: A method is REQUIRED to signal an appropriate MPS to the
higher layer using the PL. The value of the MPS can change higher layer using the PL. The value of the MPS can change
following a change to the path. It is RECOMMENDED that methods following a change to the path. It is RECOMMENDED that methods
avoid forcing an application to use an arbitrarily small MPS avoid forcing an application to use an arbitrarily small MPS
(PLPMTU) for transmission while the method is searching for the (PLPMTU) for transmission while the method is searching for the
currently supported PLPMTU. Datagram PLs do not necessarily currently supported PLPMTU. Datagram PLs do not necessarily
support fragmentation of PDUs larger than the PLPMTU. A reduced support fragmentation of PDUs larger than the PLPMTU. A reduced
MPS can adversely impact the performance of a datagram MPS can adversely impact the performance of a datagram
application. application.
o Path validation: It is RECOMMENDED that methods are robust to path * Path validation: It is RECOMMENDED that methods are robust to path
changes that could have occurred since the path characteristics changes that could have occurred since the path characteristics
were last confirmed, and to the possibility of inconsistent path were last confirmed, and to the possibility of inconsistent path
information being received. information being received.
o Datagram reordering: A method is REQUIRED to be robust to the * Datagram reordering: A method is REQUIRED to be robust to the
possibility that a flow encounters reordering, or the traffic possibility that a flow encounters reordering, or the traffic
(including probe packets) is divided over more than one network (including probe packets) is divided over more than one network
path. path.
o When to probe: It is RECOMMENDED that methods determine whether * When to probe: It is RECOMMENDED that methods determine whether
the path capacity has increased since it last measured the path. the path has changed since it last measured the path. This can
This determines when the path should again be probed. help determine when to probe the path again.
4. DPLPMTUD Mechanisms 4. DPLPMTUD Mechanisms
This section lists the protocol mechanisms used in this This section lists the protocol mechanisms used in this
specification. specification.
4.1. PLPMTU Probe Packets 4.1. PLPMTU Probe Packets
The DPLPMTUD method relies upon the PL sender being able to generate The DPLPMTUD method relies upon the PL sender being able to generate
probe packets with a specific size. TCP is able to generate these probe packets with a specific size. TCP is able to generate these
probe packets by choosing to appropriately segment data being sent probe packets by choosing to appropriately segment data being sent
[RFC4821]. In contrast, a datagram PL that needs to construct a [RFC4821]. In contrast, a datagram PL that needs to construct a
probe packet has to either request an application to send a data probe packet has to either request an application to send a data
block that is larger than that generated by an application, or to block that is larger than that generated by an application, or to
utilise padding functions to extend a datagram beyond the size of the utilize padding functions to extend a datagram beyond the size of the
application data block. Protocols that permit exchange of control application data block. Protocols that permit exchange of control
messages (without an application data block) could alternatively messages (without an application data block) could alternatively
prefer to generate a probe packet by extending a control message with prefer to generate a probe packet by extending a control message with
padding data. padding data.
A receiver needs to be able to distinguish an in-band data block from A receiver needs to be able to distinguish an in-band data block from
any added padding. This is needed to ensure that any added padding any added padding. This is needed to ensure that any added padding
is not passed on to an application at the receiver. is not passed on to an application at the receiver.
This results in three possible ways that a sender can create a probe This results in three possible ways that a sender can create a probe
packet listed in order of preference: packet listed in order of preference:
Probing using padding data: A probe packet that contains only Probing using padding data: A probe packet that contains only
control information together with any padding, which is needed to control information together with any padding, which is needed to
be inflated to the size required for the probe packet. Since be inflated to the size required for the probe packet. Since
these probe packets do not carry an application-supplied data these probe packets do not carry an application-supplied data
block, they do not typically require retransmission, although they block, they do not typically require retransmission, although they
do still consume network capacity and incur endpoint processing. do still consume network capacity and incur endpoint processing.
Probing using application data and padding data: A probe packet that Probing using application data and padding
data: A probe packet that
contains a data block supplied by an application that is combined contains a data block supplied by an application that is combined
with padding to inflate the length of the datagram to the size with padding to inflate the length of the datagram to the size
required for the probe packet. If the application/transport needs required for the probe packet. If the application/transport needs
protection from the loss of this probe packet, the application/ protection from the loss of this probe packet, the application/
transport could perform transport-layer retransmission/repair of transport could perform transport-layer retransmission/repair of
the data block (e.g., by retransmission after loss is detected or the data block (e.g., by retransmission after loss is detected or
by duplicating the data block in a datagram without the padding by duplicating the data block in a datagram without the padding
data). data).
Probing using application data: A probe packet that contains a data Probing using application data: A probe packet that contains a data
skipping to change at page 13, line 24 skipping to change at page 13, line 41
issue a data block of the desired probe size. If the application/ issue a data block of the desired probe size. If the application/
transport needs protection from the loss of an unsuccessful probe transport needs protection from the loss of an unsuccessful probe
packet, the application/transport needs then to perform transport- packet, the application/transport needs then to perform transport-
layer retransmission/repair of the data block (e.g., by layer retransmission/repair of the data block (e.g., by
retransmission after loss is detected). retransmission after loss is detected).
A PL that uses a probe packet carrying an application data block, A PL that uses a probe packet carrying an application data block,
could need to retransmit this application data block if the probe could need to retransmit this application data block if the probe
fails. This could need the PL to re-fragment the data block to a fails. This could need the PL to re-fragment the data block to a
smaller packet size that is expected to traverse the end-to-end path smaller packet size that is expected to traverse the end-to-end path
(which could utilise endpoint network-layer or PL fragmentation when (which could utilize endpoint network-layer or PL fragmentation when
these are available). these are available).
DPLPMTUD MAY choose to use only one of these methods to simplify the DPLPMTUD MAY choose to use only one of these methods to simplify the
implementation. implementation.
Probe messages sent by a PL MUST contain enough information to Probe messages sent by a PL MUST contain enough information to
uniquely identify the probe within the Maximum Segment Lifetime, while uniquely identify the probe within the Maximum Segment Lifetime, while
being robust to reordering and replay of probe response and PTB being robust to reordering and replay of probe response and PTB
messages. messages.
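The probe requirements above (a specific size, padding that a receiver can separate from any data block, and an identifier that survives reordering) can be illustrated with a minimal sketch. The wire layout, the 4-byte identifier, and the names build_probe and parse_probe are assumptions made for illustration only; the draft does not define them, and each PL specifies its own probe format. The size here is the PL payload size, so IP and transport header sizes would still need to be accounted for when targeting a given packet size.

```python
import struct

# Hypothetical probe layout (an assumption, not taken from the draft):
#   4-byte probe identifier | 2-byte data block length | data block | zero padding
HEADER = struct.Struct("!IH")


def build_probe(probe_id: int, probe_size: int, data: bytes = b"") -> bytes:
    """Return a PL payload of exactly probe_size bytes."""
    pad_len = probe_size - HEADER.size - len(data)
    if pad_len < 0:
        raise ValueError("probe_size too small for header and data block")
    # Padding inflates the datagram to the size being probed.
    return HEADER.pack(probe_id, len(data)) + data + bytes(pad_len)


def parse_probe(payload: bytes) -> tuple[int, bytes]:
    """Return (probe_id, data block); padding is never passed to the application."""
    probe_id, data_len = HEADER.unpack_from(payload)
    return probe_id, payload[HEADER.size:HEADER.size + data_len]
```

Calling build_probe with an empty data block corresponds to probing using padding data only; supplying a data block corresponds to probing using application data and padding data.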
skipping to change at page 14, line 5 skipping to change at page 14, line 23
mechanism SHOULD also be used by DPLPMTUD to acknowledge reception of mechanism SHOULD also be used by DPLPMTUD to acknowledge reception of
a probe packet. a probe packet.
A PL that does not acknowledge data reception (e.g., UDP and UDP- A PL that does not acknowledge data reception (e.g., UDP and UDP-
Lite) is unable itself to detect when the packets that it sends are Lite) is unable itself to detect when the packets that it sends are
discarded because their size is greater than the actual PMTU. These discarded because their size is greater than the actual PMTU. These
PLs need to either rely on an application protocol to detect this PLs need to either rely on an application protocol to detect this
loss, or make use of an additional transport method such as UDP- loss, or make use of an additional transport method such as UDP-
Options [I-D.ietf-tsvwg-udp-options]. Options [I-D.ietf-tsvwg-udp-options].
Section 5 specifies this function for a set of IETF-specified Section 6 specifies this function for a set of IETF-specified
protocols. protocols.
4.3. Detection of Black Holes 4.3. Detection of Unsupported PLPMTU Size, aka Black Hole Detection
A PL sender needs to reduce the PLPMTU when it discovers the actual A PL sender needs to reduce the PLPMTU when it discovers the actual
PMTU supported by a network path is less than the PLPMTU (i.e. to PMTU supported by a network path is less than the PLPMTU. This can
detect that traffic is being black holed). This can be triggered be triggered when a validated PTB message is received, or by another
when a validated PTB message is received, or by another event that event that indicates the network path no longer sustains the current
indicates the network path no longer sustains the current packet packet size, such as a loss report from the PL, or repeated lack of
size, such as a loss report from the PL or repeated lack of response response to probe packets sent to confirm the PLPMTU. Detection is
to probe packets sent to confirm the PLPMTU. Detection is followed followed by a reduction of the PLPMTU.
by a reduction of the PLPMTU.
Black Hole detection is performed by periodically sending packet This is performed by sending packet probes of size PLPMTU to verify
probes of size PLPMTU to verify that a network path still supports that a network path still supports the last acknowledged PLPMTU size.
the last acknowledged PLPMTU size. There are two ways a DPLPMTUD There are two alternative mechanisms:
sender can detect that the current PLPMTU is not sustained by the path
(i.e., to detect a black hole):
o A PL can rely upon a mechanism implemented within the PL protocol * A PL can rely upon a mechanism implemented within the PL to detect
to detect excessive loss of data sent with a specific packet size excessive loss of data sent with a specific packet size and then
and then conclude that this excessive loss could be a result of an conclude that this excessive loss could be a result of an invalid
invalid PMTU (as in PLPMTUD for TCP [RFC4821]). PMTU (as in PLPMTUD for TCP [RFC4821]).
o A PL can use the probing mechanism to send confirmation probe * A PL can use the DPLPMTUD probing mechanism to periodically
packets of the size of the current PLPMTU and a timer to track generate probe packets of the size of the current PLPMTU (e.g.,
whether acknowledgments are received (e.g., the number of probe using the confirmation timer Section 5.1.1). A timer tracks
packets sent without receiving an acknowledgement, PROBE_COUNT, whether acknowledgments are received. Successive loss of probes
becomes greater than the MAX_PROBES). These messages need to be is an indication that the current path no longer supports the
generated periodically (e.g., using the confirmation timer PLPMTU (e.g., when the number of probe packets sent without
Section 5.1.1), and MAY inhibit sending probe packets when no receiving an acknowledgement, PROBE_COUNT, becomes greater than
application data has been sent since the previous probe packet. A MAX_PROBES).
PL preferring to use an up-to-date PMTU once user data is sent
again, MAY choose to continue PMTU discovery for each path.
However, this may result in additional packets being sent.
Successive loss of probes is an indication that the current path
no longer supports the PLPMTU.
When the method detects the current PLPMTU is not supported (a black A PL MAY inhibit sending probe packets when no application data has
hole is found), DPLPMTUD sets a lower MPS. The PL then confirms that been sent since the previous probe packet. A PL preferring to use an
the updated PLPMTU can be successfully used across the path. This up-to-date PLPMTU once user data is sent again, MAY choose to
can need the PL to send a probe packet with a size less than the size continue PLPMTU discovery for each path. However, this may result in
of the data block generated by an application. In this case, the PL additional packets being sent.
could provide a way to fragment a datagram at the PL, or could
instead utilise a control packet with padding. When the method detects the current PLPMTU is not supported, DPLPMTUD
sets a lower MPS. The PL then confirms that the updated PLPMTU can
be successfully used across the path. The PL could need to send a
probe packet with a size less than the size of the data block
generated by an application. In this case, the PL could provide a
way to fragment a datagram at the PL, or use a control packet as the
packet probe.
4.4. Response to PTB Messages 4.4. Response to PTB Messages
This method requires the DPLPMTUD sender to validate any received PTB This method requires the DPLPMTUD sender to validate any received PTB
message before using the PTB information. The response to a PTB message before using the PTB information. The response to a PTB
message depends on the PTB_SIZE indicated in the PTB message, the message depends on the PTB_SIZE indicated in the PTB message, the
state of the PLPMTUD state machine, and the IP protocol being used. state of the PLPMTUD state machine, and the IP protocol being used.
Section 4.4.1 first describes validation for both IPv4 ICMP Section 4.4.1 first describes validation for both IPv4 ICMP
Unreachable messages (type 3) and ICMPv6 packet too big messages, Unreachable messages (type 3) and ICMPv6 packet too big messages,
both of which are referred to as PTB messages in this document. both of which are referred to as PTB messages in this document.
4.4.1. Validation of PTB Messages 4.4.1. Validation of PTB Messages
This section specifies utlisation of PTB messages. This section specifies utilization of PTB messages.
o A simple implementation MAY ignore received PTB messages and in * A simple implementation MAY ignore received PTB messages and in
this case the PLPMTU is not updated when a PTB message is this case the PLPMTU is not updated when a PTB message is
received. received.
o An implementation that supports PTB messages MUST validate * An implementation that supports PTB messages MUST validate
messages before they are further processed. messages before they are further processed.
A PL that receives a PTB message from a router or middlebox, performs A PL that receives a PTB message from a router or middlebox, performs
ICMP validation as specified in Section 5.2 of [RFC8085][RFC8201]. ICMP validation as specified in Section 5.2 of [RFC8085][RFC8201].
Because DPLPMTUD operates at the PL, the PL needs to check that each Because DPLPMTUD operates at the PL, the PL needs to check that each
received PTB message is received in response to a packet transmitted received PTB message is received in response to a packet transmitted
by the endpoint PL performing DPLPMTUD. by the endpoint PL performing DPLPMTUD.
The PL MUST check the protocol information in the quoted packet The PL MUST check the protocol information in the quoted packet
carried in the ICMP PTB message payload to validate the message carried in an ICMP PTB message payload to validate the message
originated from the sending node. This validation includes originated from the sending node. This validation includes
determining that the combination of the IP addresses, the protocol, determining that the combination of the IP addresses, the protocol,
the source port and destination port match those returned in the the source port and destination port match those returned in the
quoted packet - this is also necessary for the PTB message to be quoted packet - this is also necessary for the PTB message to be
passed to the corresponding PL. passed to the corresponding PL.
The validation SHOULD utilise information that it is not simple for The validation SHOULD utilize information that it is not simple for
an off-path attacker to determine. For example, by checking the an off-path attacker to determine [RFC8085]. For example, by
value of a protocol header field known only to the two PL endpoints. checking the value of a protocol header field known only to the two
A datagram application that uses well-known source and destination PL endpoints. A datagram application that uses well-known source and
ports ought to also rely on other information to complete this destination ports ought to also rely on other information to complete
validation. this validation.
These checks are intended to provide protection from packets that These checks are intended to provide protection from packets that
originate from a node that is not on the network path. originate from a node that is not on the network path. A PTB message
that does not complete the validation MUST NOT be further utilized by
A PTB message that does not complete the validation MUST NOT be the DPLPMTUD method.
further utilised by the DPLPMTUD method.
PTB messages that have been validated MAY be utilised by the DPLPMTUD PTB messages that have been validated MAY be utilized by the DPLPMTUD
algorithm, but MUST NOT be used directly to set the PLPMTU. A method algorithm, but MUST NOT be used directly to set the PLPMTU. A method
that utilises these PTB messages can improve the speed at which that utilizes these PTB messages can improve the speed at which
the algorithm detects an appropriate PLPMTU, compared to one that the algorithm detects an appropriate PLPMTU, compared to one that
relies solely on probing. Section 4.4.2 describes this processing. relies solely on probing. Section 4.4.2 describes this processing.
4.4.2. Use of PTB Messages 4.4.2. Use of PTB Messages
A set of checks are intended to provide protection from a router that A set of checks are intended to provide protection from a router that
reports an unexpected PTB_SIZE. The PL needs to check that the reports an unexpected PTB_SIZE. The PL also needs to check that the
indicated PTB_SIZE is less than the size used by probe packets and indicated PTB_SIZE is less than the size used by probe packets and
larger than minimum size accepted. larger than minimum size accepted.
This section provides a summary of how PTB messages can be utilised. This section provides a summary of how PTB messages can be utilized.
This processing depends on the PTB_SIZE and the current value of a This processing depends on the PTB_SIZE and the current value of a
set of variables: set of variables:
MIN_PMTU < PTB_SIZE < BASE_PMTU MIN_PMTU < PTB_SIZE < BASE_PMTU
* A robust PL MAY enter an error state (see Section 5.2) for an
IPv4 path when the PTB_SIZE reported in the PTB message is
larger than or equal to 68 bytes and when this is less than the
BASE_PMTU.
* A robust PL MAY enter the PROBE_ERROR state for an IPv4 path * A robust PL MAY enter an error state (see Section 5.2) for an
when the PTB_SIZE reported in the PTB message >= 68 bytes and IPv6 path when the PTB_SIZE reported in the PTB message is
when this is less than the BASE_PMTU. larger than or equal to 1280 bytes and when this is less than
the BASE_PMTU.
* A robust PL MAY enter the PROBE_ERROR state for an IPv6 path
when the PTB_SIZE reported in the PTB message >= 1280 bytes and
when this is less than the BASE_PMTU.
PTB_SIZE = PLPMTU PTB_SIZE = PLPMTU
* Completes the search for a larger PLPMTU.
* Transition to SEARCH_COMPLETE.
PTB_SIZE > PROBED_SIZE PTB_SIZE > PROBED_SIZE
* Inconsistent network signal.
* The PTB_SIZE > PROBED_SIZE, inconsistent network signal. These * PTB message ought to be discarded without further processing
PTB messages ought to be discarded without further processing (e.g., PLPMTU not modified).
(the PLPMTU not updated).
* The information could be utilised as an input to trigger * The information could be utilized as an input to trigger
enabling a resilience mode. enabling a resilience mode.
BASE_PMTU <= PTB_SIZE < PLPMTU BASE_PMTU <= PTB_SIZE < PLPMTU
* Black Hole Detection is triggered and the PLPMTU ought to be
* Black hole detection is triggered and the PLPMTU ought to be
set to BASE_PMTU. set to BASE_PMTU.
* The PL could use PTB_SIZE reported in the PTB message to * The PL could use the PTB_SIZE reported in the PTB message to
initialise a search algorithm. initialize a search algorithm.
PLPMTU < PTB_SIZE < PROBED_SIZE PLPMTU < PTB_SIZE < PROBED_SIZE
* The PLPMTU continues to be valid, but the last PROBED_SIZE * The PLPMTU continues to be valid, but the last PROBED_SIZE
searched was larger than the actual PMTU. searched was larger than the actual PMTU.
* The PLPMTU is not updated. * The PLPMTU is not updated.
* The PL can use the reported PTB_SIZE from the PTB message as * The PL can use the reported PTB_SIZE from the PTB message as
the next search point when it resumes the search algorithm. the next search point when it resumes the search algorithm.
xxx Author Note: Do we want to specify how to handle PTB Message with xxx Author Note: Do we want to specify how to handle PTB Message with
PTB_SIZE = 0? xxx PTB_SIZE = 0? xxx
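The PTB_SIZE cases listed above can be sketched as a small classifier. This is a minimal illustration, not part of the specification: the function name classify_ptb, the PtbAction labels, and the assumption that BASE_PMTU <= PLPMTU <= PROBED_SIZE are all hypothetical. Cases the text leaves open (a PTB_SIZE below MIN_PMTU, including 0, and a PTB_SIZE equal to PROBED_SIZE) are reported as UNSPECIFIED rather than decided here.

```python
from enum import Enum


class PtbAction(Enum):
    """Hypothetical labels for the outcomes described in Section 4.4.2."""
    ENTER_ERROR = "may enter an error state"
    BLACK_HOLE = "black hole detection, PLPMTU set to BASE_PMTU"
    SEARCH_COMPLETE = "search for a larger PLPMTU completes"
    NEXT_SEARCH_POINT = "PLPMTU kept, PTB_SIZE usable as next search point"
    DISCARD = "inconsistent signal, PLPMTU not modified"
    UNSPECIFIED = "not covered by the listed cases"


def classify_ptb(ptb_size, min_pmtu, base_pmtu, plpmtu, probed_size):
    """Classify the PTB_SIZE of an already validated PTB message.

    Assumes base_pmtu <= plpmtu <= probed_size (an assumption of this sketch).
    """
    if ptb_size < min_pmtu:
        # Includes PTB_SIZE = 0, which the draft leaves as an open question.
        return PtbAction.UNSPECIFIED
    if ptb_size < base_pmtu:
        return PtbAction.ENTER_ERROR
    if ptb_size < plpmtu:
        return PtbAction.BLACK_HOLE
    if ptb_size == plpmtu:
        return PtbAction.SEARCH_COMPLETE
    if ptb_size < probed_size:
        return PtbAction.NEXT_SEARCH_POINT
    if ptb_size > probed_size:
        return PtbAction.DISCARD
    return PtbAction.UNSPECIFIED
```

For example, with the IPv4 values from this document (MIN_PMTU of 68, BASE_PMTU of 1200) and illustrative values of PLPMTU = 1400 and PROBED_SIZE = 1500, classify_ptb(1450, 68, 1200, 1400, 1500) yields NEXT_SEARCH_POINT.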
5. Datagram Packetization Layer PMTUD 5. Datagram Packetization Layer PMTUD
This section specifies Datagram PLPMTUD (DPLPMTUD). The method can This section specifies Datagram PLPMTUD (DPLPMTUD). The method can
be introduced at various points (as indicated with * in the figure be introduced at various points (as indicated with * in the figure
below) in the IP protocol stack to discover the PLPMTU so that an below) in the IP protocol stack to discover the PLPMTU so that an
application can utilise an appropriate MPS for the current network application can utilize an appropriate MPS for the current network
path. DPLPMTUD SHOULD NOT be used by an application if it is already path. DPLPMTUD SHOULD NOT be used by an application if it is already
used in a lower layer. used in a lower layer.
+----------------------+ +----------------------+
| Application* | | Application* |
+-+-------+----+---+---+ +-+-------+----+----+--+
| | | | | | | |
+---+--+ +--+--+ | +-+---+ +---+--+ +--+--+ | +-+---+
| QUIC*| |UDPO*| | |SCTP*| | QUIC*| |UDPO*| | |SCTP*|
+---+--+ +--+--+ | ++--+-+ +---+--+ +--+--+ | +--+--+
| | | | | | | | | |
+-------+-+ | | | +-------+--+ | | |
| | | | | | | |
++-+--++ | +-+-+--+ |
| UDP | | | UDP | |
+---+--+ | +---+--+ |
| | | |
+--------------+-----+-+ +--------------+-----+-+
| Network Interface | | Network Interface |
+----------------------+ +----------------------+
Figure 1: Examples where DPLPMTUD can be implemented Figure 1: Examples where DPLPMTUD can be implemented
The central idea of DPLPMTUD is probing by a sender. Probe packets The central idea of DPLPMTUD is probing by a sender. Probe packets
are sent to find the maximum size of a user message that can be are sent to find the maximum size of a user message that can be
completely transferred across the network path from the sender to the completely transferred across the network path from the sender to the
destination. destination.
This section identifies the components needed for implementation, the The following sections identify the components needed for
phases of operation, the state machine and search algorithm. implementation, provide an overview of the phases of operation, and
specify the state machine and search algorithm.
5.1. DPLPMTUD Components 5.1. DPLPMTUD Components
This section describes components of DPLPMTUD. This section describes the timers, constants, and variables of
DPLPMTUD.
5.1.1. Timers 5.1.1. Timers
The method utilises up to three timers: The method utilizes up to three timers:
PROBE_TIMER: The PROBE_TIMER is configured to expire after a period PROBE_TIMER: The PROBE_TIMER is configured to expire after a
longer than the maximum time to receive an acknowledgment to a period longer than the maximum time to receive
probe packet. This value MUST NOT be smaller than 1 second, and an acknowledgment to a probe packet. This value
SHOULD be larger than 15 seconds. Guidance on selection of the MUST NOT be smaller than 1 second, and SHOULD be
timer value is provided in section 3.1.1 of the UDP Usage larger than 15 seconds. Guidance on selection
Guidelines [RFC8085]. of the timer value is provided in section 3.1.1
of the UDP Usage Guidelines [RFC8085].
If the PL has a path Round Trip Time (RTT) estimate and timely If the PL has a path Round Trip Time (RTT)
acknowledgements the PROBE_TIMER can be derived from the PL RTT estimate and timely acknowledgements the
estimate. PROBE_TIMER can be derived from the PL RTT
estimate.
PMTU_RAISE_TIMER: The PMTU_RAISE_TIMER is configured to the period a PMTU_RAISE_TIMER: The PMTU_RAISE_TIMER is configured to the period
sender will continue to use the current PLPMTU, after which it re- a sender will continue to use the current
enters the Search phase. This timer has a period of 600 secs, as PLPMTU, after which it re-enters the Search
recommended by PLPMTUD [RFC4821]. phase. This timer has a period of 600 secs, as
recommended by PLPMTUD [RFC4821].
DPLPMTUD MAY inhibit sending probe packets when no application DPLPMTUD MAY inhibit sending probe packets when
data has been sent since the previous probe packet. A PL no application data has been sent since the
preferring to use an up-to-date PMTU once user data is sent again, previous probe packet. A PL preferring to use
can choose to continue PMTU discovery for each path. However, an up-to-date PMTU once user data is sent again,
this could in sending additional packets. can choose to continue PMTU discovery for each
path. However, this may result in sending
additional packets.
CONFIRMATION_TIMER: When an acknowledged PL is used, this timer MUST CONFIRMATION_TIMER: When an acknowledged PL is used, this timer MUST
NOT be used. For other PLs, the CONFIRMATION_TIMER is configured NOT be used. For other PLs, the
to the period a PL sender waits before confirming the current CONFIRMATION_TIMER is configured to the period a
PLPMTU is still supported. This is less than the PMTU_RAISE_TIMER PL sender waits before confirming the current
and used to decrease the PLPMTU (e.g., when a black hole is PLPMTU is still supported. This is less than
encountered). Confirmation needs to be frequent enough when data the PMTU_RAISE_TIMER and used to decrease the
is flowing that the sending PL does not black hole extensive PLPMTU (e.g., when a black hole is encountered).
amounts of traffic. Guidance on selection of the timer value is Confirmation needs to be frequent enough when
provided in section 3.1.1 of the UDP Usage Guidelines [RFC8085]. data is flowing that the sending PL does not
black hole extensive amounts of traffic.
Guidance on selection of the timer value is
provided in section 3.1.1 of the UDP Usage
Guidelines [RFC8085].
DPLPMTUD MAY inhibit sending probe packets when no application DPLPMTUD MAY inhibit sending probe packets when
data has been sent since the previous probe packet. A PL no application data has been sent since the
preferring to use an up-to-date PMTU once user data is sent again, previous probe packet. A PL preferring to use
can choose to continue PMTU discovery for each path. However, an up-to-date PMTU once user data is sent again,
this may result in sending additional packets. can choose to continue PMTU discovery for each
path. However, this may result in sending
additional packets.
An implementation could implement the various timers using a single An implementation could implement the various timers using a single
timer. timer.
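The timer constraints above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical TimerConfig record and check_timers helper (neither is defined by this document). It raises on the two MUST-level rules (PROBE_TIMER not smaller than 1 second, no CONFIRMATION_TIMER with an acknowledged PL) and returns advisory notes for the rest. No default is given here for the CONFIRMATION_TIMER because the text does not state one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimerConfig:
    """Hypothetical container for the three DPLPMTUD timers (seconds)."""
    probe_timer: float
    pmtu_raise_timer: float = 600.0             # period recommended by RFC 4821
    confirmation_timer: Optional[float] = None  # None means "not used"
    acknowledged_pl: bool = False


def check_timers(cfg: TimerConfig) -> List[str]:
    """Raise ValueError on MUST violations; return notes for advisory ones."""
    if cfg.probe_timer < 1.0:
        raise ValueError("PROBE_TIMER MUST NOT be smaller than 1 second")
    if cfg.acknowledged_pl and cfg.confirmation_timer is not None:
        raise ValueError("CONFIRMATION_TIMER MUST NOT be used with an acknowledged PL")

    notes = []
    if cfg.probe_timer <= 15.0:
        notes.append("PROBE_TIMER SHOULD be larger than 15 seconds")
    if (cfg.confirmation_timer is not None
            and cfg.confirmation_timer >= cfg.pmtu_raise_timer):
        notes.append("CONFIRMATION_TIMER is expected to be less than PMTU_RAISE_TIMER")
    return notes
```

A configuration such as TimerConfig(probe_timer=30.0, confirmation_timer=30.0) passes without notes; the 30 second values are illustrative only.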
5.1.2. Constants 5.1.2. Constants
The following constants are defined: The following constants are defined:
MAX_PROBES: MAX_PROBES is the maximum value of the PROBE_COUNT MAX_PROBES: The MAX_PROBES is the maximum value of the PROBE_COUNT
counter. The default value of MAX_PROBES is 10. counter (see Section 5.1.3). The default value of
MAX_PROBES is 10.
MIN_PMTU: The MIN_PMTU is smallest allowed probe packet size. For MIN_PMTU: The MIN_PMTU is the smallest allowed probe packet size.
IPv6, this value is 1280 bytes, as specified in [RFC2460]. For For IPv6, this value is 1280 bytes, as specified in
IPv4, the minimum value is 68 bytes. (An IPv4 router is required [RFC2460]. For IPv4, the minimum value is 68 bytes.
to be able to forward a datagram of 68 bytes without further
fragmentation. This is the combined size of an IPv4 header and
the minimum fragment size of 8 bytes. In addition, receivers are
required to be able to reassemble fragmented datagrams at least up
to 576 bytes, as stated in section 3.3.3 of [RFC1122]))
MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU. This has to Note: An IPv4 router is required to be able to forward a
be less than or equal to the minimum of the local MTU of the datagram of 68 bytes without further fragmentation.
outgoing interface and the destination PMTU for receiving. An This is the combined size of an IPv4 header and the
application or PL MAY reduce the MAX_PMTU when there is no need to minimum fragment size of 8 bytes. In addition,
send packets larger than a specific size. receivers are required to be able to reassemble
fragmented datagrams at least up to 576 bytes, as stated
in section 3.3.3 of [RFC1122].
BASE_PMTU: The BASE_PMTU is a configured size expected to work for MAX_PMTU: The MAX_PMTU is the largest size of PLPMTU. This has to
most paths. The size is equal to or larger than the MIN_PMTU and be less than or equal to the minimum of the local MTU of
smaller than the MAX_PMTU. In the case of IPv6, this value is the outgoing interface and the destination PMTU for
1280 bytes [RFC2460]. When using IPv4, a size of 1200 bytes is receiving. An application, or PL, MAY reduce the
RECOMMENDED. MAX_PMTU when there is no need to send packets larger
than a specific size.
BASE_PMTU: The BASE_PMTU is a configured size expected to work for
most paths. The size is equal to or larger than the
MIN_PMTU and smaller than the MAX_PMTU. In the case of
IPv6, this value is 1280 bytes [RFC2460]. When using
IPv4, a size of 1200 bytes is RECOMMENDED.
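As an illustration, the constants above could be captured as follows. This is a sketch under stated assumptions: the dictionary layout keyed by IP version and the helper names max_pmtu and constants_consistent are hypothetical, and the optional app_limit parameter models the permission for an application or PL to reduce MAX_PMTU.

```python
from typing import Optional

MAX_PROBES = 10                  # default maximum value of PROBE_COUNT
MIN_PMTU = {4: 68, 6: 1280}      # bytes, keyed by IP version
BASE_PMTU = {4: 1200, 6: 1280}   # bytes; 1200 is the RECOMMENDED IPv4 value


def max_pmtu(local_mtu: int, dest_pmtu: int, app_limit: Optional[int] = None) -> int:
    """Largest allowed PLPMTU: at most min(local MTU, destination PMTU)."""
    limit = min(local_mtu, dest_pmtu)
    if app_limit is not None:
        # An application or PL MAY reduce MAX_PMTU further.
        limit = min(limit, app_limit)
    return limit


def constants_consistent(ip_version: int, max_pmtu_value: int) -> bool:
    """Check MIN_PMTU <= BASE_PMTU < MAX_PMTU for the given IP version."""
    return MIN_PMTU[ip_version] <= BASE_PMTU[ip_version] < max_pmtu_value
```

For a sender with a 1500 byte local MTU and a peer able to receive 9000 bytes, max_pmtu(1500, 9000) gives 1500, which is consistent with both the IPv4 and IPv6 values of BASE_PMTU.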
5.1.3. Variables 5.1.3. Variables
This method utilises a set of variables: This method utilizes a set of variables:
PROBED_SIZE: The PROBED_SIZE is the size of the current probe PROBED_SIZE: The PROBED_SIZE is the size of the current probe
packet. This is a tentative value for the PLPMTU, which is packet. This is a tentative value for the PLPMTU,
awaiting confirmation by an acknowledgment. which is awaiting confirmation by an acknowledgment.
PROBE_COUNT: The PROBE_COUNT is a count of the number of PROBE_COUNT: The PROBE_COUNT is a count of the number of
unsuccessful probe packets that have been sent with a size of unsuccessful probe packets that have been sent with a
PROBED_SIZE. The value is initialised to zero when a particular size of PROBED_SIZE. The value is initialized to zero
size of PROBED_SIZE is first attempted. when a particular size of PROBED_SIZE is first
attempted.
The figure below illustrates the relationship between the packet size The figure below illustrates the relationship between the packet size
constants and variables, in this case when the DPLPMTUD algorithm constants and variables at a point of time when the DPLPMTUD
performs path probing to increase the size of the PLPMTU. The MPS is algorithm performs path probing to increase the size of the PLPMTU.
less than the PLPMTU. A probe packet has been sent of size A probe packet has been sent of size PROBED_SIZE. Once this is
PROBED_SIZE. When this is acknowledged, the PLPMTU will be raised to acknowledged, the PLPMTU will raise to PROBED_SIZE allowing the
PROBED_SIZE allowing the PROBED_SIZE to be increased towards the DPLPMTUD algorithm to further increase PROBED_SIZE towards the actual
actual PMTU. PMTU.
MIN_PMTU MAX_PMTU MIN_PMTU MAX_PMTU
<--------------------------------------------------> <-------------------------------------------------->
| | | | | | | |
V | | V v | | v
BASE_PMTU | V Actual PMTU BASE_PMTU | v Actual PMTU
| PROBED_SIZE | PROBED_SIZE
V v
PLPMTU PLPMTU
Figure 2: Relationships between probe and packet sizes Figure 2: Relationships between packet size constants and variables
5.2. DPLPMTUD Phases
The Datagram PLPMTUD algorithm moves through several phases of
operation.
An implementation that only reduces the PLPMTU to a suitable size
would be sufficient to ensure reliable operation, but can be very
inefficient when the actual PMTU changes or when the method (for
whatever reason) makes a suboptimal choice for the PLPMTU.
A full implementation of DPLPMTUD provides an algorithm enabling the 5.1.4. Overview of DPLPMTUD Phases
DPLPMTUD sender to increase the PLPMTU following a change in the
characteristics of the path, such as when a link is reconfigured with
a larger MTU, or when there is a change in the set of links traversed
by an end-to-end flow (e.g., after a routing or path fail-over
decision).
Black hole detection (Section 4.3) and PTB processing (Section 4.4) This section provides a high-level informative view of the DPLPMTUD
proceed in parallel with these phases of operation. method, by describing the movement of the method through several
phases of operation. More detail is available in the state machine
Section 5.2.
+------------------------+ +------+
| BASE_PMTU Confirmation +-- Connectivity +------->| Base |----------------+ Connectivity
+------------+-----------+ \----+ or BASE_PMTU | +------+ | or BASE_PMTU
| ^ V Confirmation Fails | | | confirmation failed
Connectivity and | | +-------+ | | v
BASE_PMTU confirmed | +---------+ Error | | | Connectivity +-------+
| +-------+ | | and BASE_PMTU | Error |
| CONFIRMATION_TIMER | | confirmed +-------+
| Fires | | |
V | v | Consistent connectivity
+----------------+ +--------------+ PLPMTU | +--------+ | and BASE_PMTU
| Search Complete|<---------+ Search | confirmation | | Search |<--------------+ confirmed
+----------------+ +--------------+ failed | +--------+
Search Algorithm | ^ |
Completes | | |
| Raise | | Search
| timer | | algorithm
| expired | | completed
| | |
| | v
| +-----------------+
+---| Search Complete |
+-----------------+
Figure 3: DPLPMTUD Phases Figure 3: DPLPMTUD Phases
BASE_PMTU Confirmation BASE_PMTU Confirmation Phase
* The BASE_PMTU Confirmation Phase confirms connectivity to the
* Connectivity is confirmed. remote peer. This phase is implicit for a connection-oriented
PL (where it can be performed in a PL connection handshake). A
* DPLPMTUD confirms the BASE_PMTU is supported across the network connectionless PL needs to send an acknowledged probe packet to
path. confirm that the remote peer is reachable.
* DPLPMTUD then enters the search phase.
Search
* DPLPMTUD performs probing to increase the PLPMTU.
* DPLPMTUD then enters the search complete or an error phase.
Search Complete
* DPLPMTUD has found a suitable PLPMTU that is supported across
the network path.
* Black hole detection will confirm this PLPMTU continues to be
supported.
* On a longer time-frame, DPLPMTUD will re-enter the search phase
to discover if the PLPMTU can be raised.
Error
* Inconsistent or invalid network signals cause DPLPMTUD to be
unable to progress.
* This causes the algorithm to lower the MPS until the path is
shown to support the BASE_PMTU, or to suspend DPLPMTUD.
5.2.1. BASE_PMTU Confirmation Phase
DPLPMTUD starts in the BASE_PMTU confirmation phase. BASE_PMTU
confirmation is performed in two stages:
1. Connectivity to the remote peer is first confirmed. When a
connection-oriented PL is used, this stage is implicit. It is
performed as part of the normal PL connection handshake. In
contrast, an connectionless PL MUST send an acknowledged probe
packet to confirm that the remote peer is reachable.
2. In the second stage, the PL confirms it can successfully send a
datagram of the BASE_PMTU size across the current path.
A PL that does not wish to support a network path with a PLPMTU less
than BASE_PMTU can simplify the phase into a single step by
performing connectivity checks with probes of the BASE_PMTU size.
A PL MAY respond to PTB messages while in this phase, see
Section 4.4.
Once BASE_PMTU confirmation has completed, DPLPMTUD can advertise an
MPS to an upper layer.
If DPLPMTUD fails to complete these tests it enters the
PROBE_DISABLED phase, see Section 5.2.6, and ceases using DPLPTMUD.
5.2.2. Search Phase
The search phase utilises a search algorithm in attempt to increase
the PLPMTU (see Section 5.4.1). The PL sender increases the MPS each
time a packet probe confirms a larger PLPMTU is supported by the
path. The algorithm concludes by entering the SEARCH_COMPLETE phase,
see Section 5.2.3.
A PL MAY respond to PTB messages while in this phase, using the PTB
to advance or terminate the search, see Section 4.4. Similarly black
hole detection can terminate the search by entering the PROBE_BASE
phase, see Section 5.2.4.
5.2.2.1. Resilience to Inconsistent Path Information
Sometimes a PL sender is able to detect inconsistent results from the
sequence of PLPMTU probes that it sends or the sequence of PTB
messages that it receives. This could be manifested as excessive
fluctuation of the MPS.
When inconsistent path information is detected, a PL sender can
enable an alternate search mode that clamps the offered MPS to a
smaller value for a period of time. This avoids unnecessary black-
holing of packets.
5.2.3. Search Complete Phase
On entry to the search complete phase, the DPLPMTUD sender starts the
PMTU_RAISE_TIMER. In this phase, the PLPMTU remains at the value
confirmed by the last successful probe packet.
In this phase, the PL MUST periodically confirm that the PLPMTU is
still supported by the path. If the PL is designed in a way that is
unable to confirm reachability to the destination endpoint after
probing has completed, the method uses a CONFIRMATION_TIMER to
periodically repeat a probe packet for the current PLPMTU size.
If the DPLPMTUD sender is unable to confirm reachability for packets
with a size of the current PLPMTU (e.g., if the CONFIRMATION_TIMER
expires) or the PL signals a lack of reachability, the method exits
the phase and enters the PROBE_BASE phase, see Section 5.2.4.
If the PMTU_RAISE_TIMER expires, the DPLPMTUD sender re-enters the
Search phase, see Section 5.2.2, and resumes probing for a larger
PLPMTU.
Back hole detection can be used in parallel to check that a network * The sender also confirms that BASE_PMTU is supported across the
path continues to support a previously confirmed PLPMTU. If a black network path.
hole is detected the algorithm moves to the PROBE_BASE phase, see
Section 5.2.4.
The phase can also exited when a validated PTB message is received * A PL that does not wish to support a path with a PLPMTU less
(see Section 4.4.1). than BASE_PMTU can simplify the phase into a single step by
performing the connectivity checks with a probe of the
BASE_PMTU size.
5.2.4. PROBE_BASE Phase * Once confirmed, DPLPMTUD enters the Search Phase. If this
phase fails to confirm, DPLPMTUD enters the Error Phase.
This phase is entered when black hole detection or a PTB message Search Phase
indicates that the PLPMTU is not supported by the path. * The Search Phase utilizes a search algorithm to send probe
packets to seek to increase the PLPMTU.
On entry to this phase, the PLPMTU is set to the BASE_PMTU, and a * The algorithm concludes when it has found a suitable PLPMTU, by
corresponding reduced MPS is advertised. entering the Search Complete Phase.
PROBED_SIZE is then set to the PLPMTU (i.e., the BASE_PMTU), to * A PL could respond to PTB messages using the PTB to advance or
confirm this size is supported across the path. If confirmed, terminate the search, see Section 4.4.
DPLPMTUD enters the Search Phase to determine whether the PL sender
can use a larger PLPMTU.
If the path cannot be confirmed to support the BASE_PMTU after * Black Hole Detection can also terminate the search by entering
sending MAX_PROBES, DPLPMTUD moves to the Error phase, see the BASE_PMTU Confirmation phase.
Section 5.2.5.
5.2.5. ERROR Phase Search Complete Phase
* The Search Complete Phase is entered when the PLPMTU is
supported across the network path.
The ERROR phase is entered when there is conflicting or invalid * A PL can use a CONFIRMATION_TIMER to periodically repeat a
PLPMTU information for the path (e.g. a failure to support the probe packet for the current PLPMTU size. If the sender is
BASE_PMTU). In this phase, the MPS is set to a value less than the unable to confirm reachability (e.g., if the CONFIRMATION_TIMER
BASE_PMTU, but at least the size of the MIN_PMTU. expires) or the PL signals a lack of reachability, DPLPMTUD
enters the BASE_PMTU Confirmation phase.
DPLPMTUD remains in the ERROR phase until a consistent view of the * The PMTU_RAISE_TIMER is used to periodically resume the search
path can be discovered and it has also been confirmed that the path phase to discover if the PLPMTU can be raised.
supports the BASE_PMTU.
Note: MIN_PMTU may be identical to BASE_PMTU, simplifying the actions * Black Hole Detection or receipt of a validated PTB message
in this phase. (Section 4.4.1) can cause the sender to enter the BASE_PMTU
Confirmation Phase.
If no acknowledgement is received for PROBE_COUNT probes of size Error Phase
MIN_PMTU, the method suspends DPLPMTUD, see Section 5.2.5. * The Error Phase is entered when there is conflicting or invalid
PLPMTU information for the path (e.g. a failure to support the
BASE_PMTU) that causes DPLPMTUD to be unable to progress and the
PLPMTU is lowered.
5.2.5.1. Robustness to Inconsistent Path * DPLPMTUD remains in the Error Phase until a consistent view of
the path can be discovered and it has also been confirmed that
the path supports the BASE_PMTU (or DPLPMTUD is suspended).
Robustness to paths unable to sustain the BASE_PMTU. Some paths * Note: MIN_PMTU may be identical to BASE_PMTU, simplifying the
could be unable to sustain packets of the BASE_PMTU size. These actions in this phase.
paths could use an alternate algorithm to implement the PROBE_ERROR
phase that allows fallback to a smaller than desired PLPMTU, rather
than suffer connectivity failure.
This could also utilise methods such as endpoint IP fragmentation to An implementation that only reduces the PLPMTU to a suitable size
enable the PL sender to communicate using packets smaller than the would be sufficient to ensure reliable operation, but can be very
BASE_PMTU. inefficient when the actual PMTU changes or when the method (for
whatever reason) makes a suboptimal choice for the PLPMTU.
5.2.6. DISABLED Phase A full implementation of DPLPMTUD provides an algorithm enabling the
DPLPMTUD sender to increase the PLPMTU following a change in the
characteristics of the path, such as when a link is reconfigured with
a larger MTU, or when there is a change in the set of links traversed
by an end-to-end flow (e.g., after a routing or path fail-over
decision).
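The phase overview in draft -08 can be read as a transition table. The sketch below is one informal reading of Figure 3 and the accompanying bullets, not a normative state machine: the Phase and Event names are hypothetical, and the single BLACK_HOLE event stands in for black hole detection, a failed confirmation, or a validated PTB message.

```python
from enum import Enum, auto


class Phase(Enum):
    BASE = auto()             # BASE_PMTU Confirmation Phase
    SEARCH = auto()
    SEARCH_COMPLETE = auto()
    ERROR = auto()


class Event(Enum):
    BASE_CONFIRMED = auto()       # connectivity and BASE_PMTU confirmed
    BASE_FAILED = auto()          # connectivity or BASE_PMTU confirmation failed
    SEARCH_DONE = auto()          # search algorithm completed
    RAISE_TIMER_EXPIRED = auto()  # PMTU_RAISE_TIMER expired
    BLACK_HOLE = auto()           # PLPMTU no longer confirmed on the path


TRANSITIONS = {
    (Phase.BASE, Event.BASE_CONFIRMED): Phase.SEARCH,
    (Phase.BASE, Event.BASE_FAILED): Phase.ERROR,
    (Phase.ERROR, Event.BASE_CONFIRMED): Phase.SEARCH,
    (Phase.SEARCH, Event.SEARCH_DONE): Phase.SEARCH_COMPLETE,
    (Phase.SEARCH, Event.BLACK_HOLE): Phase.BASE,
    (Phase.SEARCH_COMPLETE, Event.RAISE_TIMER_EXPIRED): Phase.SEARCH,
    (Phase.SEARCH_COMPLETE, Event.BLACK_HOLE): Phase.BASE,
}


def next_phase(phase: Phase, event: Event) -> Phase:
    """Return the next phase; events with no listed transition keep the phase."""
    return TRANSITIONS.get((phase, event), phase)
```

Keeping the phase unchanged for unlisted pairs is a design choice of this sketch; an implementation could equally treat them as errors.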
This phase suspends operation of DPLPMTUD. It disables probing for 5.2. State Machine
the PLPMTU until action is taken by the PL or application using the
PL.
5.3. State Machine A state machine for DPLPMTUD is depicted in Figure 4. If multipath
or multihoming is supported, a state machine is needed for each path.
A state machine for DPLPMTUD is depicted in Figure 4. If multihoming Note: Some state changes are not shown to simplify the diagram.
is supported, a state machine is needed for each path.
| | | |
| Start | PL indicates loss | Start | PL indicates loss
| | of connectivity | | of connectivity
V V v v
+---------------+ +---------------+ +---------------+ +---------------+
| DISABLED | | ERROR | | DISABLED | | ERROR |
+---------------+ +---------------+ +---------------+ PROBE_TIMER expiry: +---------------+
| PL indicates PROBE_TIMER expiry: ^ | | PL indicates PROBE_COUNT = MAX_PROBES or ^ |
| connectivity PROBE_COUNT = MAX_PROBES | | | connectivity PTB: PTB_SIZE < BASE_PMTU | |
+--------------------+ +---------------+ | +--------------------+ +---------------+ |
| | | | | |
V | BASE_PMTU Probe | v | BASE_PMTU Probe |
+---------------+ acked | +---------------+ acked |
| BASE |----------------------+ | BASE |----------------------+
+---------------+ | +---------------+ |
Black hole detected or ^ | ^ ^ Black hole detected or | Black hole detected or ^ | ^ ^ Black hole detected or |
PTB_SIZE < PLPMTU | | | | PTB_SIZE < PLPMTU | PTB: PTB_SIZE < PLPMTU | | | | PTB: PTB_SIZE < PLPMTU |
+--------------------+ | | +--------------------+ | +--------------------+ | | +--------------------+ |
| +----+ | | | +----+ | |
| PROBE_TIMER expiry: | | | PROBE_TIMER expiry: | |
| PROBE_COUNT < MAX_PROBES | | | PROBE_COUNT < MAX_PROBES | |
| | | | | |
| PMTU_RAISE_TIMER expiry | | | PMTU_RAISE_TIMER expiry | |
| +-----------------------------------------+ | | | +-----------------------------------------+ | |
| | | | | | | | | |
| | V | V | | v | v
+---------------+ +---------------+ +---------------+ +---------------+
|SEARCH_COMPLETE| | SEARCHING | |SEARCH_COMPLETE| | SEARCHING |
+---------------+ +---------------+ +---------------+ +---------------+
| ^ ^ | | ^ | ^ ^ | | ^
| | | | | | | | | | | |
| | +-----------------------------------------+ | | | | +-----------------------------------------+ | |
| | MAX_PMTU Probe acked or | | | | MAX_PMTU Probe acked or PROBE_TIMER | |
| | PTB (BASE_PMTU <= PTB_SIZE < PROBED_SIZE) or | | | | expiry: PROBE_COUNT = MAX_PROBES or | |
+----+ PROBE_COUNT = MAX_PROBES +----+ +----+ PTB: PTB_SIZE = PLPMTU +----+
CONFIRMATION_TIMER expiry: PROBE_TIMER expiry: CONFIRMATION_TIMER expiry: PROBE_TIMER expiry:
PROBE_COUNT < MAX_PROBES or PROBE_COUNT < MAX_PROBES or PROBE_COUNT < MAX_PROBES or PROBE_COUNT < MAX_PROBES or
PLPMTU Probe acked Probe acked PLPMTU Probe acked Probe acked or PTB:
PLPMTU < PTB_SIZE < PROBED_SIZE
Figure 4: State machine for Datagram PLPMTUD. Note: Some state Figure 4: State machine for Datagram PLPMTUD
changes are not show to simplify the diagram.
The following states are defined: The following states are defined:
DISABLED: The DISABLED state is the initial state before probing has DISABLED: The DISABLED state is the initial state before
started. It is also entered from any other state, when the PL probing has started. It is also entered from any
indicates loss of connectivity. This state is left, once the PL other state, when the PL indicates loss of
indicates connectivity to the remote PL. connectivity. This state is left, once the PL
indicates connectivity to the remote PL.
BASE: The BASE state is used to confirm that the BASE_PMTU size is BASE: The BASE state is used to confirm that the
supported by the network path and is designed to allow an BASE_PMTU size is supported by the network path and
application to continue working when there are transient is designed to allow an application to continue
reductions in the actual PMTU. It also seeks to avoid long working when there are transient reductions in the
periods where traffic is black holed while searching for a larger actual PMTU. It also seeks to avoid long periods
PLPMTU. where traffic is black holed while searching for a
larger PLPMTU.
On entry, the PROBED_SIZE is set to the BASE_PMTU size and the On entry, the PROBED_SIZE is set to the BASE_PMTU
PROBE_COUNT is set to zero. size and the PROBE_COUNT is set to zero.
Each time a probe packet is sent, and the PROBE_TIMER is started. Each time a probe packet is sent, the PROBE_TIMER
The state is exited when the probe packet is acknowledged, and the is started. The state is exited when the probe
PL sender enters the SEARCHING state. packet is acknowledged, and the PL sender enters
the SEARCHING state.
The state is also left when the PROBE_COUNT reaches MAX_PROBES; a The state is also left when the PROBE_COUNT reaches
PTB message is validated. This causes the PL sender to enter the MAX_PROBES or a received PTB message is validated.
ERROR state. This causes the PL sender to enter the ERROR state.
SEARCHING: The SEARCHING state is the main probing state. This SEARCHING: The SEARCHING state is the main probing state.
state is entered when probing for the BASE_PMTU was successful. This state is entered when probing for the
BASE_PMTU was successful.
The PROBE_COUNT is set to zero when the first probe packet is sent The PROBE_COUNT is set to zero when the first probe
for each probe size. Each time a probe packet is acknowledged, packet is sent for each probe size. Each time a
the PLPMTU is set to the PROBED_SIZE, and then the PROBED_SIZE is probe packet is acknowledged, the PLPMTU is set to
increased using the search algorithm. the PROBED_SIZE, and then the PROBED_SIZE is
increased using the search algorithm.
When a probe packet is sent and not acknowledged within the period When a probe packet is sent and not acknowledged
of the PROBE_TIMER, the PROBE_COUNT is incremented and the probe within the period of the PROBE_TIMER, the
packet is retransmitted. The state is exited when the PROBE_COUNT PROBE_COUNT is incremented and the probe packet is
reaches MAX_PROBES; a PTB message is validated; a probe of size retransmitted. The state is exited when the
MAX_PMTU is acknowledged or black hole detection is triggered. PROBE_COUNT reaches MAX_PROBES, a received PTB
message is validated, a probe of size MAX_PMTU is
acknowledged, or a black hole is detected.
SEARCH_COMPLETE: The SEARCH_COMPLETE state indicates a successful SEARCH_COMPLETE: The SEARCH_COMPLETE state indicates a successful
end to the PROBE_SEARCH state. DPLPMTUD remains in this state end to the SEARCHING state. DPLPMTUD remains in
until either the PMTU_RAISE_TIMER expires; a received PTB message this state until either the PMTU_RAISE_TIMER
is validated; or black hole detection is triggered. expires, a received PTB message is validated, or a
black hole is detected.
When DPLPMTUD uses an unacknowledged PL and is in the
SEARCH_COMPLETE state, a CONFIRMATION_TIMER periodically resets
the PROBE_COUNT and schedules a probe packet with the size of the
PLPMTU. If the probe packet fails to be acknowledged after
MAX_PROBES attempts, the method enters the BASE state. When used
with an acknowledged PL (e.g., SCTP), DPLPMTUD SHOULD NOT continue
to generate PLPMTU probes in this state.
ERROR: The ERROR state represents the case where either the network When DPLPMTUD uses an unacknowledged PL and is in
path is not known to support a PLPMTU of at least the BASE_PMTU the SEARCH_COMPLETE state, a CONFIRMATION_TIMER
size or when there is contradictory information about the network periodically resets the PROBE_COUNT and schedules a
path that would otherwise result in excessive variation in the MPS probe packet with the size of the PLPMTU. If the
signalled to the higher layer. The state implements a method to probe packet fails to be acknowledged after
mitigate oscillation in the state-event engine. It signals a MAX_PROBES attempts, the method enters the BASE
conservative value of the MPS to the higher layer by the PL. The state. When used with an acknowledged PL (e.g.,
state is exited when Packet Probes no longer detect the error or SCTP), DPLPMTUD SHOULD NOT continue to generate
when the PL indicates that connectivity has been lost. PLPMTU probes in this state.
Implementations are permitted to enable endpoint fragmentation if ERROR: The ERROR state represents the case where either
the DPLPMTUD is unable to validate MIN_PMTU within PROBE_COUNT the network path is not known to support a PLPMTU
probes. If DPLPMTUD is unable to validate MIN_PMTU the of at least the BASE_PMTU size or when there is
implementation should transition to PROBE_DISABLED. contradictory information about the network path
that would otherwise result in excessive variation
in the MPS signalled to the higher layer. The
state implements a method to mitigate oscillation
in the state-event engine. It signals a
conservative value of the MPS to the higher layer
by the PL. The state is exited when packet probes
no longer detect the error or when the PL indicates
that connectivity has been lost.
Appendix A contains an informative description of key events. Implementations are permitted to enable endpoint
fragmentation if the DPLPMTUD is unable to validate
MIN_PMTU within PROBE_COUNT probes. If DPLPMTUD is
unable to validate MIN_PMTU the implementation
should transition to the DISABLED state.
5.4. Search to Increase the PLPMTU 5.3. Search to Increase the PLPMTU
This section describes the algorithms used by DPLPMTUD to search for This section describes the algorithms used by DPLPMTUD to search for
a larger PLPMTU. a larger PLPMTU.
5.4.1. Probing for a Larger PLPMTU 5.3.1. Probing for a larger PLPMTU
Implementations use a search algorithm across the search range to Implementations use a search algorithm across the search range to
determine whether a larger PLPMTU can be supported across a network determine whether a larger PLPMTU can be supported across a network
path. path.
The method discovers the search range by confirming the minimum The method discovers the search range by confirming the minimum
PLPMTU and then using the probe method to select a PROBED_SIZE less PLPMTU and then using the probe method to select a PROBED_SIZE less
than or equal to MAX_PMTU. MAX_PMTU is the minimum of the local MTU than or equal to MAX_PMTU. MAX_PMTU is the minimum of the local MTU
and EMTU_R (learned from the remote endpoint). The MAX_PMTU MAY be and EMTU_R (learned from the remote endpoint). The MAX_PMTU MAY be
reduced by an application that sets a maximum to the size of reduced by an application that sets a maximum to the size of
datagrams it will send. datagrams it will send.
The PROBE_COUNT is initialised to zero when a probe packet is first The PROBE_COUNT is initialized to zero when a probe packet is first
sent with a particular size. A timer is used by the search algorithm sent with a particular size. A timer is used by the search algorithm
to trigger the sending of probe packets of size PROBED_SIZE, larger to trigger the sending of probe packets of size PROBED_SIZE, larger
than the PLPMTU. Each probe packet successfully sent to the remote than the PLPMTU. Each probe packet successfully sent to the remote
peer is confirmed by acknowledgement at the PL, see Section 4.1. peer is confirmed by acknowledgement at the PL, see Section 4.1.
Each time a probe packet is sent to the destination, the PROBE_TIMER Each time a probe packet is sent to the destination, the PROBE_TIMER
is started. The timer is cancelled when the PL receives is started. The timer is canceled when the PL receives
acknowledgment that the probe packet has been successfully sent acknowledgment that the probe packet has been successfully sent
across the path (Section 4.1). This confirms that the PROBED_SIZE is across the path (Section 4.1). This confirms that the PROBED_SIZE is
supported, and the PROBED_SIZE value is then assigned to the PLPMTU. supported, and the PROBED_SIZE value is then assigned to the PLPMTU.
The search algorithm can continue to send subsequent probe packets of The search algorithm can continue to send subsequent probe packets of
an increasing size. an increasing size.
If the timer expires before a probe packet is acknowledged, the probe If the timer expires before a probe packet is acknowledged, the probe
has failed to confirm the PROBED_SIZE. Each time the PROBE_TIMER has failed to confirm the PROBED_SIZE. Each time the PROBE_TIMER
expires, the PROBE_COUNT is incremented, the PROBE_TIMER is expires, the PROBE_COUNT is incremented, the PROBE_TIMER is
reinitialised, and a probe packet of the same size is retransmitted reinitialized, and a probe packet of the same size is retransmitted
(the replicated probe improves the resilience to loss). The maximum (the replicated probe improves the resilience to loss). The maximum
number of retransmissions for a particular size is configured number of retransmissions for a particular size is configured
(MAX_PROBES). If the value of the PROBE_COUNT reaches MAX_PROBES, (MAX_PROBES). If the value of the PROBE_COUNT reaches MAX_PROBES,
probing will stop, and the PL sender enters the SEARCH_COMPLETE probing will stop, and the PL sender enters the SEARCH_COMPLETE
state. state.
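The probe and timer handling described above can be sketched as follows. This is a minimal illustration, not part of the specification: the class name ProbeSearch, its method names, and the value MAX_PROBES = 3 are assumptions made for the example. Sending packets and running the PROBE_TIMER are left to the caller.

```python
from dataclasses import dataclass

MAX_PROBES = 3  # assumed value, chosen only for illustration


@dataclass
class ProbeSearch:
    """Hypothetical holder for the per-path search variables."""
    plpmtu: int              # largest size confirmed so far
    probed_size: int         # size currently being probed
    probe_count: int = 0
    state: str = "SEARCHING"

    def start_probe(self, size: int) -> None:
        """Begin probing a new size; PROBE_COUNT starts at zero."""
        self.probed_size = size
        self.probe_count = 0

    def on_ack(self) -> None:
        """Probe acknowledged: PROBED_SIZE is assigned to the PLPMTU."""
        self.plpmtu = self.probed_size

    def on_probe_timer_expiry(self) -> bool:
        """PROBE_TIMER expired without an acknowledgement.

        Returns True if the caller should retransmit a probe of the
        same size, or False once PROBE_COUNT has reached MAX_PROBES
        and the sender has entered SEARCH_COMPLETE.
        """
        self.probe_count += 1
        if self.probe_count >= MAX_PROBES:
            self.state = "SEARCH_COMPLETE"
            return False
        return True
```

In this sketch a failed search leaves the PLPMTU at the last confirmed value, which matches the text: only an acknowledged probe assigns PROBED_SIZE to the PLPMTU.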
5.4.2. Selection of Probe Sizes 5.3.2. Selection of Probe Sizes
The search algorithm needs to determine a minimum useful gain in The search algorithm needs to determine a minimum useful gain in
PLPMTU. It would not be constructive for a PL sender to attempt to PLPMTU. It would not be constructive for a PL sender to attempt to
probe for all sizes - this would incur unnecessary load on the path probe for all sizes. This would incur unnecessary load on the path
and has the undesirable effect of slowing the time to reach a more and has the undesirable effect of slowing the time to reach a more
optimal MPS. Implementations SHOULD select the set of probe packet optimal MPS. Implementations SHOULD select the set of probe packet
sizes to maximise the gain in PLPMTU from each search step. sizes to maximize the gain in PLPMTU from each search step.
Implementations could optimize the search procedure by selecting step Implementations could optimize the search procedure by selecting step
sizes from a table of common PMTU sizes. When selecting the sizes from a table of common PMTU sizes. When selecting the
appropriate next size to search, an implementor ought to also appropriate next size to search, an implementor ought to also
consider that there can be common sizes of MPS that applications seek consider that there can be common sizes of MPS that applications seek
to use. to use, and there could be common sizes of MTU used within the
network.
xxx Author Note: A future version of this section will detail example
methods for selecting probe size values, but does not plan to mandate
a single method. xxx
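One way to select step sizes from a table can be sketched as below. The document does not mandate a method, so this is only an example under stated assumptions: the table contents (COMMON_SIZES) and the function name next_probe_size are hypothetical, and the linear walk through the table is one possible choice among many.

```python
from typing import Optional

# Hypothetical table of candidate sizes in bytes; illustrative values only.
COMMON_SIZES = [1280, 1400, 1492, 1500, 4352, 9000]


def next_probe_size(plpmtu: int, max_pmtu: int) -> Optional[int]:
    """Return the smallest table entry that is larger than the current
    PLPMTU and no larger than MAX_PMTU, or None if no entry qualifies."""
    for size in COMMON_SIZES:
        if plpmtu < size <= max_pmtu:
            return size
    return None
```

A return value of None corresponds to the case where no further candidate lies within the search range, so the search can stop.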
5.4.3. Resilience to Inconsistent Path Information 5.3.3. Resilience to Inconsistent Path Information
A decision to increase the PLPMTU needs to be resilient to the A decision to increase the PLPMTU needs to be resilient to the
possibility that information learned about the network path is possibility that information learned about the network path is
inconsistent (this could happen when probe packets are lost due to inconsistent. A path is inconsistent when, for example, probe
other reasons, or some of the packets in a flow are forwarded along a packets are lost due to other reasons (i.e., not packet size) or due
portion of the path that supports a different actual PMTU). to frequent path changes. Frequent path changes could occur by
unexpected "flapping" - where some packets from a flow pass along one
path, but other packets follow a different path with different
properties.
Frequent path changes could occur due to unexpected "flapping" - A PL sender is able to detect inconsistency from the sequence of
where some packets from a flow pass along one path, but other packets PLPMTU probes that it sends or the sequence of PTB messages that it
follow a different path with different properties. DPLPMTUD can be receives. When inconsistent path information is detected, a PL
made resilient to these anomalies by introducing hysteresis into the sender could use an alternate search mode that clamps the offered MPS
search decision to increase the MPS. to a smaller value for a period of time. This avoids unnecessary
loss of packets due to MTU limitation.
5.4. Robustness to Inconsistent Paths
Some paths could be unable to sustain packets of the BASE_PMTU size.
To be robust to these paths an implementation could implement the
ERROR state. This allows fallback to a smaller-than-desired PLPMTU,
rather than suffer connectivity failure. This could utilize methods
such as endpoint IP fragmentation to enable the PL sender to
communicate using packets smaller than the BASE_PMTU.
6. Specification of Protocol-Specific Methods 6. Specification of Protocol-Specific Methods
This section specifies protocol-specific details for datagram PLPMTUD This section specifies protocol-specific details for datagram PLPMTUD
for IETF-specified transports. for IETF-specified transports.
The first subsection provides guidance on how to implement the The first subsection provides guidance on how to implement the
DPLPMTUD method as a part of an application using UDP or UDP-Lite. DPLPMTUD method as a part of an application using UDP or UDP-Lite.
The guidance also applies to other datagram services that do not The guidance also applies to other datagram services that do not
include a specific transport protocol (such as a tunnel include a specific transport protocol (such as a tunnel
encapsulation). The following subsections describe how DPLPMTUD can encapsulation). The following subsections describe how DPLPMTUD can
be implemented as a part of the transport service, allowing be implemented as a part of the transport service, allowing
applications using the service to benefit from discovery of the applications using the service to benefit from discovery of the
PLPMTU without themselves needing to implement this method. PLPMTU without themselves needing to implement this method.
6.1. Application support for DPLPMTUD with UDP or UDP-Lite 6.1. Application support for DPLPMTUD with UDP or UDP-Lite
The current specifications of UDP [RFC0768] and UDP-Lite [RFC3828] do The current specifications of UDP [RFC0768] and UDP-Lite [RFC3828] do
skipping to change at page 29, line 28 skipping to change at page 28, line 51
The DPLPMTUD method can be implemented as a part of an application The DPLPMTUD method can be implemented as a part of an application
built directly or indirectly on UDP or UDP-Lite, but relies on built directly or indirectly on UDP or UDP-Lite, but relies on
higher-layer protocol features to implement the method [RFC8085]. higher-layer protocol features to implement the method [RFC8085].
Some primitives used by DPLPMTUD might not be available via the Some primitives used by DPLPMTUD might not be available via the
Datagram API (e.g., the ability to access the PLPMTU cache, or Datagram API (e.g., the ability to access the PLPMTU cache, or
interpret received PTB messages). interpret received PTB messages).
In addition, it is desirable that PMTU discovery is not performed by In addition, it is desirable that PMTU discovery is not performed by
multiple protocol layers. An application SHOULD avoid implementing multiple protocol layers. An application SHOULD avoid using DPLPMTUD
DPLPMTUD when the underlying transport system provides this when the underlying transport system provides this capability. To
capability. Using a common method for managing the PLPMTU has use a common method for managing the PLPMTU has benefits, both in the
benefits, both in the ability to share state between different ability to share state between different processes and opportunities
processes and opportunities to coordinate probing. to coordinate probing.
6.1.1. Application Request 6.1.1. Application Request
An application needs an application-layer protocol mechanism (such as An application needs an application-layer protocol mechanism (such as
a message acknowledgement method) that solicits a response from a a message acknowledgement method) that solicits a response from a
destination endpoint. The method SHOULD allow the sender to check destination endpoint. The method SHOULD allow the sender to check
the value returned in the response to provide additional protection the value returned in the response to provide additional protection
from off-path insertion of data [RFC8085]. Suitable methods include a from off-path insertion of data [RFC8085]. Suitable methods include a
parameter known only to the two endpoints, such as a session ID or parameter known only to the two endpoints, such as a session ID or
initialised sequence number. initialized sequence number.
6.1.2. Application Response 6.1.2. Application Response
An application needs an application-layer protocol mechanism to An application needs an application-layer protocol mechanism to
communicate the response from the destination endpoint. This communicate the response from the destination endpoint. This
response may indicate successful reception of the probe across the response may indicate successful reception of the probe across the
path, but could also indicate that some (or all) packets have failed path, but could also indicate that some (or all) packets have failed
to reach the destination. to reach the destination.
6.1.3. Sending Application Probe Packets 6.1.3. Sending Application Probe Packets
A probe packet may carry an application data block, but the A probe packet may carry an application data block, but the
successful transmission of this data is at risk when used for successful transmission of this data is at risk when used for
probing. Some applications may prefer to use a probe packet that probing. Some applications may prefer to use a probe packet that
does not carry an application data block to avoid disruption to does not carry an application data block to avoid disruption to data
normal data transfer. transfer.
6.1.4. Validating the Path 6.1.4. Validating the Path
An application that does not have other higher-layer information An application that does not have other higher-layer information
confirming correct delivery of datagrams SHOULD implement the confirming correct delivery of datagrams SHOULD implement the
CONFIRMATION_TIMER to periodically send probe packets while in the CONFIRMATION_TIMER to periodically send probe packets while in the
SEARCH_COMPLETE state. SEARCH_COMPLETE state.
6.1.5. Handling of PTB Messages 6.1.5. Handling of PTB Messages
An application that is able and wishes to receive PTB messages MUST An application that is able and wishes to receive PTB messages MUST
perform ICMP validation as specified in Section 5.2 of [RFC8085]. perform ICMP validation as specified in Section 5.2 of [RFC8085].
This requires the application to check each received PTB This requires the application to check each received PTB
message to validate that it is received in response to transmitted message to validate that it is received in response to transmitted
traffic and that the reported PTB_SIZE is less than the current traffic and that the reported PTB_SIZE is less than the current
probed size (see Section 4.4.2). A validated PTB message MAY be used probed size (see Section 4.4.2). A validated PTB message MAY be used
as input to the DPLPMTUD algorithm, but MUST NOT be used directly to as input to the DPLPMTUD algorithm, but MUST NOT be used directly to
set the PLPMTU. set the PLPMTU.
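The checks on a received PTB message can be sketched as below. This is a simplified illustration: the function name ptb_search_hint and the boolean matches_sent_traffic are assumptions, and the latter stands in for the full ICMP validation of [RFC8085], which is not shown. The returned value is only a hint to the search algorithm and is never written directly to the PLPMTU.

```python
from typing import Optional


def ptb_search_hint(ptb_size: int, probed_size: int,
                    matches_sent_traffic: bool) -> Optional[int]:
    """Return PTB_SIZE as an input for the search algorithm, or None
    if the PTB message has to be ignored."""
    if not matches_sent_traffic:
        # Not a response to traffic this endpoint sent: ignore it.
        return None
    if ptb_size >= probed_size:
        # The reported size is not less than the current probed size.
        return None
    return ptb_size
```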
6.2. DPLPMTUD with UDP Options 6.2. DPLPMTUD for SCTP
UDP Options[I-D.ietf-tsvwg-udp-options] can supply the additional
functionality required to implement DPLPMTUD within the UDP transport
service. Implementing DPLPMTUD using UDP Options avoids the need for
each application to implement the DPLPMTUD method.
Section 5.6 of[I-D.ietf-tsvwg-udp-options] defines the Maximum
Segment Size (MSS) option, which allows the local sender to indicate
the EMTU_R to the peer. The value received in this option can be
used to initialise MAX_PMTU.
UDP Options enables padding to be added to UDP datagrams that are
used as Probe Packets. Feedback confirming reception of each Probe
Packet is provided by two new UDP Options:
o The Probe Request Option (Section 6.2.1) is set by a sending PL to
solicit a response from a remote endpoint. A four-byte token
identifies each request.
o The Probe Response Option (Section 6.2.2) is generated by the UDP
Options receiver in response to reception of a previously received
Probe Request Option. Each Probe Response Option echoes a
previously received four-byte token.
The token value allows implementations to distinguish between
acknowledgements for initial probe packets and acknowledgements
confirming receipt of subsequent probe packets (e.g., travelling
along alternate paths with a larger RTT). Each probe packet needs to
be uniquely identifiable by the UDP Options sender within the Maximum
Segment Lifetime (MSL). The UDP Options sender therefore needs to
not recycle token values until they have expired or have been
acknowledged. A 4 byte value for the token field provides sufficient
space for multiple unique probes to be made within the MSL.
The initial value of the four byte token field SHOULD be assigned to
a randomised value, as described in section 5.1 of [RFC8085], to
enhance protection from off-path attacks.
Implementations ought to only send a probe packet with a Request
Probe Option when required by their local state machine, i.e., when
probing to grow the PLPMTU or to confirm the current PLPMTU. The
procedure to handle the loss of a response packet is the
responsibility of the sender of the request. Implementations are
allowed to track multiple requests and respond to them with a single
packet.
A PL needs to determine that the path can still support the size of
datagram that the application is currently sending in the DPLPMTUD
search_done state (i.e., to detect black-holing of data). One way to
achieve this is to send probe packets of size PLPMTU or to utilise a
higher-layer method that provides explicit feedback indicating any
packet loss. Another possibility is to utilise data packets that
carry a Timestamp Option. Reception of a valid timestamp that was
echoed by the remote endpoint can be used to infer connectivity.
This can provide useful feedback even over paths with asymmetric
capacity and/or that carry UDP Option flows that have very asymmetric
datagram rates, because an echo of the most recent timestamp still
indicates reception of at least one packet of the transmitted size.
This is sufficient to confirm there is no black hole.
In contrast, when sending a probe to increase the PLPMTU, a timestamp
might be unable to unambiguously identify that a specific probe
packet has been received. Timestamp mechanisms cannot be used to
confirm the reception of individual probe messages and cannot be used
to stimulate a response from the remote peer.
6.2.1. UDP Probe Request Option
The Probe Request Option allows a sending endpoint to solicit a
response from a destination endpoint.
The Probe Request Option carries a four byte token set by the sender.
This token can be set to a value that is likely to be known only to
the sender (and is sent along the end-to-end path). The initial
value of the token SHOULD be assigned to a randomised value, as
described in section 5.1 of [RFC8085], to enhance protection from
off-path attacks.
The sender needs to then check the value returned in the UDP Probe
Response Option. The value of the Token field uniquely identifies a
probe within the maximum segment lifetime.
+----------+--------+-----------------+
| Kind=9* | Len=6 | Token |
+----------+--------+-----------------+
1 byte 1 byte 4 bytes
* To be confirmed by IANA.
Figure 5: UDP Probe REQ Option Format
6.2.2. UDP Probe Response Option
The Probe Response Option is generated in response to reception of a
previously received Probe Request Option. This response is generated
by the UDP Option processing.
The Probe Response Option carries a four byte token field. The Token
field associates the response with the Token value carried in the
most recently-received Echo Request. The rate of generation of UDP
packets carrying a Probe Response Option is expected to be less than
once per RTT and SHOULD be rate-limited (see Section 9).
+----------+--------+-----------------+
| Kind=10* | Len=6 | Token |
+----------+--------+-----------------+
1 byte 1 byte 4 bytes
* To be confirmed by IANA.
Figure 6: UDP Probe RES Option Format
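The layout in Figures 5 and 6 can be illustrated with a short encoding sketch. It rests on several assumptions: the Kind values 9 and 10 are taken from the figures and are marked there as unconfirmed by IANA, the token is treated as a 4-byte unsigned integer in network byte order, and the function names are hypothetical.

```python
import struct

PROBE_REQ_KIND = 9    # from Figure 5; "to be confirmed by IANA"
PROBE_RES_KIND = 10   # from Figure 6; "to be confirmed by IANA"
OPTION_LEN = 6        # 1 byte Kind + 1 byte Len + 4 byte Token


def encode_probe_option(kind: int, token: int) -> bytes:
    """Pack Kind, Len and a 4-byte token (network byte order assumed)."""
    return struct.pack("!BBI", kind, OPTION_LEN, token)


def decode_probe_option(data: bytes):
    """Unpack one 6-byte option and return the pair (kind, token)."""
    kind, length, token = struct.unpack("!BBI", data[:OPTION_LEN])
    if length != OPTION_LEN:
        raise ValueError("unexpected option length")
    return kind, token
```

A sender would compare the token returned in a Probe Response Option with the token it placed in the matching Probe Request Option.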
6.3. DPLPMTUD for SCTP
Section 10.2 of [RFC4821] specifies a recommended PLPMTUD probing Section 10.2 of [RFC4821] specifies a recommended PLPMTUD probing
method for SCTP. It recommends the use of the PAD chunk, defined in method for SCTP. It recommends the use of the PAD chunk, defined in
[RFC4820] to be attached to a minimum length HEARTBEAT chunk to build [RFC4820] to be attached to a minimum length HEARTBEAT chunk to build
a probe packet. This enables probing without affecting the transfer a probe packet. This enables probing without affecting the transfer
of user messages and without interfering with congestion control. of user messages and without interfering with congestion control.
This is preferred to using DATA chunks (with padding as required) as This is preferred to using DATA chunks (with padding as required) as
path probes. path probes.
XXX Author Note: Future versions of this document might define a XXX Author Note: Future versions of this document might define a
parameter contained in the INIT and INIT ACK chunk to indicate the parameter contained in the INIT and INIT ACK chunk to indicate the
remote peer MTU to the local peer. However, multihoming makes this a remote peer MTU to the local peer. However, multihoming makes this a
bit complex, so it might not be worth doing. XXX bit complex, so it might not be worth doing. XXX
6.3.1. SCTP/IPv4 and SCTP/IPv6 6.2.1. SCTP/IPv4 and SCTP/IPv6
The base protocol is specified in [RFC4960]. This provides an The base protocol is specified in [RFC4960]. This provides an
acknowledged PL. A sender can therefore enter the PROBE_BASE state acknowledged PL. A sender can therefore enter the BASE state as soon
as soon as connectivity has been confirmed. as connectivity has been confirmed.
6.3.1.1. Sending SCTP Probe Packets 6.2.1.1. Sending SCTP Probe Packets
Probe packets consist of an SCTP common header followed by a Probe packets consist of an SCTP common header followed by a
HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to control HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to control
the length of the probe packet. The HEARTBEAT chunk is used to the length of the probe packet. The HEARTBEAT chunk is used to
trigger the sending of a HEARTBEAT ACK chunk. The reception of the trigger the sending of a HEARTBEAT ACK chunk. The reception of the
HEARTBEAT ACK chunk acknowledges reception of a successful probe. HEARTBEAT ACK chunk acknowledges reception of a successful probe.
The HEARTBEAT chunk carries a Heartbeat Information parameter which The HEARTBEAT chunk carries a Heartbeat Information parameter which
should include, besides the information suggested in [RFC4960], the should include, besides the information suggested in [RFC4960], the
probe size, which is the size of the complete datagram. The size of probe size, which is the size of the complete datagram. The size of
the PAD chunk is therefore computed by reducing the probing size by the PAD chunk is therefore computed by reducing the probing size by
the IPv4 or IPv6 header size, the SCTP common header, the HEARTBEAT the IPv4 or IPv6 header size, the SCTP common header, the HEARTBEAT
request and the PAD chunk header. The payload of the PAD chunk request and the PAD chunk header. The payload of the PAD chunk
contains arbitrary data. contains arbitrary data.
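The PAD chunk computation above can be written out as a short sketch. The assumptions are: an IPv4 header without options (20 bytes) or an IPv6 header without extension headers (40 bytes), the 12-byte SCTP common header, a 4-byte PAD chunk header, and a HEARTBEAT chunk length supplied by the caller. The function name and the optional udp_encap flag (which accounts for the 8-byte UDP header used with SCTP/UDP) are hypothetical, and 4-byte chunk alignment is not handled.

```python
IPV4_HEADER = 20         # bytes, assuming no IP options
IPV6_HEADER = 40         # bytes, assuming no extension headers
SCTP_COMMON_HEADER = 12  # bytes
PAD_CHUNK_HEADER = 4     # bytes
UDP_HEADER = 8           # bytes, only for UDP encapsulation


def pad_payload_len(probe_size: int, heartbeat_chunk_len: int,
                    ipv6: bool = False, udp_encap: bool = False) -> int:
    """Return the number of PAD chunk payload bytes needed so that the
    complete datagram has the size probe_size."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    overhead = (ip_header + SCTP_COMMON_HEADER
                + heartbeat_chunk_len + PAD_CHUNK_HEADER)
    if udp_encap:
        overhead += UDP_HEADER
    pad = probe_size - overhead
    if pad < 0:
        raise ValueError("probe size is smaller than the fixed overhead")
    return pad
```

For example, with a 48-byte HEARTBEAT chunk (an illustrative figure) and a probe size of 1400 bytes over IPv4, the PAD payload is 1400 - 20 - 12 - 48 - 4 = 1316 bytes.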
To avoid fragmentation of retransmitted data, probing starts right To avoid fragmentation of retransmitted data, probing starts right
after the handshake, before data is sent. Assuming normal behaviour after the PL handshake, before data is sent. Assuming this behavior
(i.e., the PMTU is smaller than or equal to the interface MTU), this (i.e., the PMTU is smaller than or equal to the interface MTU), this
process will take a few round trip time periods depending on the process will take a few round trip time periods depending on the
number of PMTU sizes probed. The Heartbeat timer can be used to number of PMTU sizes probed. The Heartbeat timer can be used to
implement the PROBE_TIMER. implement the PROBE_TIMER.
6.3.1.2. Validating the Path with SCTP 6.2.1.2. Validating the Path with SCTP
Since SCTP provides an acknowledged PL, a sender MUST NOT implement Since SCTP provides an acknowledged PL, a sender MUST NOT implement
the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.
6.3.1.3. PTB Message Handling by SCTP 6.2.1.3. PTB Message Handling by SCTP
Normal ICMP validation MUST be performed as specified in Appendix C Normal ICMP validation MUST be performed as specified in Appendix C
of [RFC4960]. This requires that the first 8 bytes of the SCTP of [RFC4960]. This requires that the first 8 bytes of the SCTP
common header are quoted in the payload of the PTB message, which can common header are quoted in the payload of the PTB message, which can
be the case for ICMPv4 and is normally the case for ICMPv6. be the case for ICMPv4 and is normally the case for ICMPv6.
When a PTB message has been validated, the PTB_SIZE reported in the When a PTB message has been validated, the PTB_SIZE reported in the
PTB message SHOULD be used with the DPLPMTUD algorithm, providing PTB message SHOULD be used with the DPLPMTUD algorithm, providing
that the reported PTB_SIZE is less than the current probe size. that the reported PTB_SIZE is less than the current probe size (see
Section 4.4).
6.3.2. DPLPMTUD for SCTP/UDP 6.2.2. DPLPMTUD for SCTP/UDP
The UDP encapsulation of SCTP is specified in [RFC6951]. The UDP encapsulation of SCTP is specified in [RFC6951].
6.3.2.1. Sending SCTP/UDP Probe Packets 6.2.2.1. Sending SCTP/UDP Probe Packets
Packet probing can be performed as specified in Section 6.3.1.1. The Packet probing can be performed as specified in Section 6.2.1.1. The
maximum payload is reduced by 8 bytes, which has to be considered maximum payload is reduced by 8 bytes, which has to be considered
when filling the PAD chunk. when filling the PAD chunk.
6.3.2.2. Validating the Path with SCTP/UDP 6.2.2.2. Validating the Path with SCTP/UDP
Since SCTP provides an acknowledged PL, a sender MUST NOT implement Since SCTP provides an acknowledged PL, a sender MUST NOT implement
the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.
6.3.2.3. Handling of PTB Messages by SCTP/UDP 6.2.2.3. Handling of PTB Messages by SCTP/UDP
Normal ICMP validation MUST be performed for PTB messages as ICMP validation MUST be performed for PTB messages as specified in
specified in Appendix C of [RFC4960]. This requires that the first 8 Appendix C of [RFC4960]. This requires that the first 8 bytes of the
bytes of the SCTP common header are contained in the PTB message, SCTP common header are contained in the PTB message, which can be the
which can be the case for ICMPv4 (but note the UDP header also case for ICMPv4 (but note the UDP header also consumes a part of the
consumes a part of the quoted packet header) and is normally the case quoted packet header) and is normally the case for ICMPv6. When the
for ICMPv6. When the validation is completed, the PTB_SIZE indicated validation is completed, the PTB_SIZE indicated in the PTB message
in the PTB message SHOULD be used with the DPLPMTUD providing that SHOULD be used with the DPLPMTUD providing that the reported PTB_SIZE
the reported PTB_SIZE is less than the current probe size. is less than the current probe size.
6.3.3. DPLPMTUD for SCTP/DTLS 6.2.3. DPLPMTUD for SCTP/DTLS
The Datagram Transport Layer Security (DTLS) encapsulation of SCTP is The Datagram Transport Layer Security (DTLS) encapsulation of SCTP is
specified in [RFC8261]. It is used for data channels in WebRTC specified in [RFC8261]. It is used for data channels in WebRTC
implementations. implementations.
6.3.3.1. Sending SCTP/DTLS Probe Packets 6.2.3.1. Sending SCTP/DTLS Probe Packets
Packet probing can be done as specified in Section 6.3.1.1. Packet probing can be done as specified in Section 6.2.1.1.
6.3.3.2. Validating the Path with SCTP/DTLS 6.2.3.2. Validating the Path with SCTP/DTLS
Since SCTP provides an acknowledged PL, a sender MUST NOT implement Since SCTP provides an acknowledged PL, a sender MUST NOT implement
the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.
6.3.3.3. Handling of PTB Messages by SCTP/DTLS 6.2.3.3. Handling of PTB Messages by SCTP/DTLS
It is not possible to perform normal ICMP validation as specified in It is not possible to perform ICMP validation as specified in
[RFC4960], since even if the ICMP message payload contains sufficient [RFC4960], since even if the ICMP message payload contains sufficient
information, the reflected SCTP common header would be encrypted. information, the reflected SCTP common header would be encrypted.
Therefore it is not possible to process PTB messages at the PL. Therefore it is not possible to process PTB messages at the PL.
6.4. DPLPMTUD for QUIC 6.3. DPLPMTUD for QUIC
Quick UDP Internet Connection (QUIC) [I-D.ietf-quic-transport] is a Quick UDP Internet Connection (QUIC) [I-D.ietf-quic-transport] is a
UDP-based transport that provides reception feedback. The UDP UDP-based transport that provides reception feedback. The UDP
payload includes the QUIC packet header, protected payload, and any payload includes the QUIC packet header, protected payload, and any
authentication fields. QUIC depends on a PMTU of at least 1280 authentication fields. QUIC depends on a PMTU of at least 1280
bytes. bytes.
Section 9.2 of [I-D.ietf-quic-transport] describes the path Section 14.1 of [I-D.ietf-quic-transport] describes the path
considerations when sending QUIC packets. It recommends the use of considerations when sending QUIC packets. It recommends the use of
PADDING frames to build the probe packet. Pure probe-only packets PADDING frames to build the probe packet. Pure probe-only packets
are constructed with PADDING frames and PING frames to create a are constructed with PADDING frames and PING frames to create a
padding only packet that will elicit an acknowledgement. Padding padding only packet that will elicit an acknowledgement. Such
only frames enable probing the without affecting the transfer of padding only packets enable probing without affecting the transfer of
other QUIC frames. other QUIC frames.
The recommendation for QUIC endpoints implementing DPLPMTUD is The recommendation for QUIC endpoints implementing DPLPMTUD is that a
therefore that a MPS is maintained for each combination of local and MPS is maintained for each combination of local and remote IP
remote IP addresses [I-D.ietf-quic-transport]. If a QUIC endpoint addresses [I-D.ietf-quic-transport]. If a QUIC endpoint determines
determines that the PMTU between any pair of local and remote IP that the PMTU between any pair of local and remote IP addresses has
addresses has fallen below an acceptable MPS, it needs to immediately fallen below an acceptable MPS, it needs to immediately cease sending
cease sending QUIC packets on the affected path. This could result QUIC packets on the affected path. This could result in termination
in termination of the connection if an alternative path cannot be of the connection if an alternative path cannot be found
found [I-D.ietf-quic-transport]. [I-D.ietf-quic-transport].
6.4.1. Sending QUIC Probe Packets 6.3.1. Sending QUIC Probe Packets
A probe packet consists of a QUIC Header and a payload containing A probe packet consists of a QUIC Header and a payload containing
PADDING Frames and a PING Frame. PADDING Frames are a single octet PADDING Frames and a PING Frame. PADDING Frames are a single octet
(0x00) and several of these can be used to create a probe packet of (0x00) and several of these can be used to create a probe packet of
size PROBED_SIZE. QUIC provides an acknowledged PL, A sender can size PROBED_SIZE. QUIC provides an acknowledged PL, a sender can
therefore enter the PROBE_BASE state as soon as connectivity has been therefore enter the BASE state as soon as connectivity has been
confirmed. confirmed.
The current specification of QUIC sets the following: The current specification of QUIC sets the following:
o BASE_PMTU: 1200. A QUIC sender needs to pad initial packets to * BASE_PMTU: 1200. A QUIC sender needs to pad initial packets to
1200 bytes to confirm the path can support packets of a useful 1200 bytes to confirm the path can support packets of a useful
size. size.
o MIN_PMTU: 1200 bytes. A QUIC sender that determines the PMTU has * MIN_PMTU: 1200 bytes. A QUIC sender that determines the PMTU has
fallen below 1200 bytes MUST immediately stop sending on the fallen below 1200 bytes MUST immediately stop sending on the
affected path. affected path.
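The probe construction described above can be sketched as follows. This is a minimal illustration, not the QUIC wire format: it assumes the PING frame is the single octet 0x01 (taken from the QUIC transport specification, not from this text), and the "overhead" parameter is a hypothetical stand-in for every byte of the packet that is not frame payload (headers and authentication fields). The names build_probe_frames and path_usable are illustrative only.

```python
BASE_PMTU = 1200  # bytes, value quoted in the list above
MIN_PMTU = 1200   # bytes, value quoted in the list above

PADDING_FRAME = b"\x00"  # single-octet PADDING frame, as stated in the text
PING_FRAME = b"\x01"     # assumption: single-octet PING frame type


def build_probe_frames(probed_size: int, overhead: int) -> bytes:
    """Return frame bytes that bring a packet up to probed_size.

    overhead is a hypothetical count of all non-frame bytes in the
    packet (headers, authentication tag).
    """
    room = probed_size - overhead
    if room < len(PING_FRAME):
        raise ValueError("probed_size leaves no room for a PING frame")
    # One PING frame elicits an acknowledgement; PADDING fills the rest.
    return PING_FRAME + PADDING_FRAME * (room - len(PING_FRAME))


def path_usable(pmtu: int) -> bool:
    """A sender stops using a path whose PMTU has fallen below MIN_PMTU."""
    return pmtu >= MIN_PMTU
```

A caller would pick PROBED_SIZE, subtract its own per-packet overhead, and place the returned bytes in the protected payload of the probe packet.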
6.4.2. Validating the Path with QUIC 6.3.2. Validating the Path with QUIC
QUIC provides an acknowledged PL. A sender therefore MUST NOT QUIC provides an acknowledged PL. A sender therefore MUST NOT
implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state. implement the CONFIRMATION_TIMER while in the SEARCH_COMPLETE state.
6.4.3. Handling of PTB Messages by QUIC 6.3.3. Handling of PTB Messages by QUIC
QUIC operates over the UDP transport, and the guidelines on ICMP QUIC operates over the UDP transport, and the guidelines on ICMP
validation as specified in Section 5.2 of [RFC8085] therefore apply. validation as specified in Section 5.2 of [RFC8085] therefore apply.
In addition to UDP Port validation QUIC can validate an ICMP message In addition to UDP Port validation QUIC can validate an ICMP message
by looking for valid Connection IDs in the quoted packet. by looking for valid Connection IDs in the quoted packet.
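A rough sketch of the two checks named above (UDP port validation, then a search for a valid Connection ID in the quoted packet) is given below. It rests on several assumptions that are not part of this text: the caller has already stripped the quoted IP header, so the input starts at the UDP header; only QUIC short-header packets are examined, with the Destination Connection ID assumed to start at the second byte of the QUIC packet; and the function name and parameters are hypothetical.

```python
import struct
from typing import Iterable


def ptb_quote_matches(quoted_udp: bytes, local_port: int,
                      remote_port: int, peer_cids: Iterable[bytes]) -> bool:
    """Check whether a quoted datagram looks like one this endpoint sent.

    quoted_udp: bytes of the quoted packet starting at the UDP header.
    peer_cids: Connection IDs this endpoint places in packets it sends.
    """
    if len(quoted_udp) < 9:  # 8-byte UDP header plus first QUIC byte
        return False
    src_port, dst_port = struct.unpack("!HH", quoted_udp[:4])
    if (src_port, dst_port) != (local_port, remote_port):
        return False
    quic = quoted_udp[8:]
    if quic[0] & 0x80:
        # Long-header packets are not handled in this sketch.
        return False
    # Assumption: the Destination Connection ID follows the first byte.
    return any(cid and quic[1:1 + len(cid)] == cid for cid in peer_cids)
```

An ICMP message that truncates the quote before the end of the Connection ID fails the comparison and is ignored in this sketch.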
6.4. DPLPMTUD for UDP-Options
UDP Options [I-D.ietf-tsvwg-udp-options] provides a way to extend UDP
to provide new transport mechanisms.
Support for using DPLPMTUD with UDP-Options is defined in the UDP-
Options specification [I-D.ietf-tsvwg-udp-options].
7. Acknowledgements 7. Acknowledgements
This work was partially funded by the European Union's Horizon 2020 This work was partially funded by the European Union's Horizon 2020
research and innovation programme under grant agreement No. 644334 research and innovation programme under grant agreement No. 644334
(NEAT). The views expressed are solely those of the author(s). (NEAT). The views expressed are solely those of the author(s).
8. IANA Considerations 8. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
XXX If new UDP Options are specified in this document, a request to
IANA will be included here. XXX
If there are no requirements for IANA, the section will be removed If there are no requirements for IANA, the section will be removed
during conversion into an RFC by the RFC Editor. during conversion into an RFC by the RFC Editor.
9. Security Considerations 9. Security Considerations
The security considerations for the use of UDP and SCTP are provided The security considerations for the use of UDP and SCTP are provided
in the referenced RFCs. Security guidance for applications using UDP in the referenced RFCs. Security guidance for applications using UDP
is provided in the UDP Usage Guidelines [RFC8085], specifically the is provided in the UDP Usage Guidelines [RFC8085], specifically the
generation of probe packets is regarded as a "Low Data-Volume generation of probe packets is regarded as a "Low Data-Volume
Application", described in section 3.1.3 of that document. This Application", described in section 3.1.3 of that document. This
recommends that a sender limits generation of probe packets to an recommends that a sender limits generation of probe packets to an
average rate lower than one probe per 3 seconds. average rate lower than one probe per 3 seconds.
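One simple way to respect this limit is to enforce a minimum spacing between probes. The sketch below is an illustration only: the class name and the choice of a fixed 3 second spacing (which keeps the average rate at or below one probe per 3 seconds) are assumptions, and the current time is passed in by the caller so the logic does not depend on a particular clock.

```python
class ProbeRateLimiter:
    """Allow at most one probe per min_interval seconds."""

    def __init__(self, min_interval: float = 3.0):
        self.min_interval = min_interval
        self._last_probe = None  # time of the last permitted probe

    def allow(self, now: float) -> bool:
        """Return True and record the probe if one may be sent at time now."""
        if (self._last_probe is not None
                and now - self._last_probe < self.min_interval):
            return False
        self._last_probe = now
        return True
```

A sender would call allow() with a monotonic timestamp before generating each probe packet and skip the probe when it returns False.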
A PL sender needs to ensure that the method used to confirm reception A PL sender needs to ensure that the method used to confirm reception
of probe packets offers protection from off-path attackers injecting of probe packets offers protection from off-path attackers injecting
packets into the path. This protection is provided in IETF-defined packets into the path. This protection is provided in IETF-defined
protocols (e.g., TCP, SCTP) using a randomly-initialised sequence protocols (e.g., TCP, SCTP) using a randomly-initialized sequence
number. A description of one way to do this when using UDP is number. A description of one way to do this when using UDP is
provided in section 5.1 of [RFC8085]. provided in section 5.1 of [RFC8085].
There are cases where ICMP Packet Too Big (PTB) messages are not There are cases where ICMP Packet Too Big (PTB) messages are not
delivered due to policy, configuration or equipment design (see delivered due to policy, configuration or equipment design (see
Section 1.1), this method therefore does not rely upon PTB messages Section 1.1), this method therefore does not rely upon PTB messages
being received, but is able to utilise these when they are received being received, but is able to utilize these when they are received
by the sender. PTB messages could potentially be used to cause a by the sender. PTB messages could potentially be used to cause a
node to inappropriately reduce the PLPMTU. A node supporting node to inappropriately reduce the PLPMTU. A node supporting
DPLPMTUD MUST therefore appropriately validate the payload of PTB DPLPMTUD MUST therefore appropriately validate the payload of PTB
messages to ensure these are received in response to transmitted messages to ensure these are received in response to transmitted
traffic (i.e., a reported error condition that corresponds to a traffic (i.e., a reported error condition that corresponds to a
datagram actually sent by the path layer, see Section 4.4.1). datagram actually sent by the path layer, see Section 4.4.1).
An on-path attacker, able to create a PTB message could forge PTB An on-path attacker, able to create a PTB message could forge PTB
messages that include a valid quoted IP packet. Such an attack could messages that include a valid quoted IP packet. Such an attack could
be used to drive down the PLPMTU. There are two ways this method can be used to drive down the PLPMTU. There are two ways this method can
mitigate such attacks: First, by ensuring that a PL mitigate such attacks: First, by ensuring that a PL
sender never reduces the PLPMTU below the base size, solely in sender never reduces the PLPMTU below the base size, solely in
response to receiving a PTB message. This is achieved by first response to receiving a PTB message. This is achieved by first
entering the PROBE_BASE state when such a message is received. entering the BASE state when such a message is received. Second, the
Second, the design does not require processing of PTB messages, a PL design does not require processing of PTB messages, a PL sender could
sender could therefore suspend processing of PTB messages (e.g., in a therefore suspend processing of PTB messages (e.g., in a robustness
robustness mode after detecting that subsequent probes actually mode after detecting that subsequent probes actually confirm that a
confirm that a size larger than the PTB_SIZE is supported by a path). size larger than the PTB_SIZE is supported by a path).
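The first mitigation can be illustrated with a small decision function. This is a sketch under stated assumptions, not the procedure defined by this document: the function name and return convention are hypothetical, the PTB message is assumed to have already passed validation, and the handling of a PTB_SIZE that lies between the base size and the current PLPMTU (adopting the reported size) is an assumption made only to complete the example.

```python
from typing import Tuple


def handle_validated_ptb(plpmtu: int, ptb_size: int,
                         base_pmtu: int) -> Tuple[int, bool]:
    """Return (size to use next, whether to re-enter the BASE state)."""
    if ptb_size >= plpmtu:
        # The message does not call for a reduction.
        return plpmtu, False
    if ptb_size < base_pmtu:
        # Never go below the base size on a PTB message alone:
        # fall back to the base size and confirm it by probing.
        return base_pmtu, True
    # Assumption: a reported size between base and PLPMTU is adopted.
    return ptb_size, False
```

For example, with a PLPMTU of 1500 and a base size of 1200, a forged PTB message reporting 576 bytes leads only to a return to the BASE state, where probing decides whether the base size is still supported.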
Parallel forwarding paths SHOULD be considered. Section 5.2.5.1 Parallel forwarding paths SHOULD be considered. Section 5.4
identifies the need for robustness in the method when the path identifies the need for robustness in the method when the path
information may be inconsistent. information may be inconsistent.
A node performing DPLPMTUD could experience conflicting information A node performing DPLPMTUD could experience conflicting information
about the size of supported probe packets. This could occur when about the size of supported probe packets. This could occur when
multiple paths are concurrently in use and these exhibit a multiple paths are concurrently in use and these exhibit a
different PMTU. If not considered, this could result in data being different PMTU. If not considered, this could result in data being
black holed when the PLPMTU is larger than the smallest PMTU across black holed when the PLPMTU is larger than the smallest PMTU across
the current paths. the current paths.
10. References 10. References
10.1. Normative References 10.1. Normative References
[I-D.ietf-quic-transport] [I-D.ietf-quic-transport]
Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
and Secure Transport", draft-ietf-quic-transport-16 (work and Secure Transport", draft-ietf-quic-transport-20 (work
in progress), October 2018. in progress), 23 April 2019,
<http://www.ietf.org/internet-drafts/draft-ietf-quic-
[I-D.ietf-tsvwg-udp-options] transport-20.txt>.
Touch, J., "Transport Options for UDP", draft-ietf-tsvwg-
udp-options-05 (work in progress), July 2018.
[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
DOI 10.17487/RFC0768, August 1980, DOI 10.17487/RFC0768, August 1980,
<https://www.rfc-editor.org/info/rfc768>. <https://www.rfc-editor.org/info/rfc768>.
[RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, [RFC1191] Mogul, J.C. and S.E. Deering, "Path MTU discovery",
DOI 10.17487/RFC1191, November 1990, RFC 1191, DOI 10.17487/RFC1191, November 1990,
<https://www.rfc-editor.org/info/rfc1191>. <https://www.rfc-editor.org/info/rfc1191>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460, (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
December 1998, <https://www.rfc-editor.org/info/rfc2460>. December 1998, <https://www.rfc-editor.org/info/rfc2460>.
skipping to change at page 39, line 34 skipping to change at page 36, line 24
[RFC8261] Tuexen, M., Stewart, R., Jesup, R., and S. Loreto, [RFC8261] Tuexen, M., Stewart, R., Jesup, R., and S. Loreto,
"Datagram Transport Layer Security (DTLS) Encapsulation of "Datagram Transport Layer Security (DTLS) Encapsulation of
SCTP Packets", RFC 8261, DOI 10.17487/RFC8261, November SCTP Packets", RFC 8261, DOI 10.17487/RFC8261, November
2017, <https://www.rfc-editor.org/info/rfc8261>. 2017, <https://www.rfc-editor.org/info/rfc8261>.
10.2. Informative References 10.2. Informative References
[I-D.ietf-intarea-tunnels] [I-D.ietf-intarea-tunnels]
Touch, J. and M. Townsley, "IP Tunnels in the Internet Touch, J. and M. Townsley, "IP Tunnels in the Internet
Architecture", draft-ietf-intarea-tunnels-09 (work in Architecture", draft-ietf-intarea-tunnels-09 (work in
progress), July 2018. progress), 19 July 2018,
<http://www.ietf.org/internet-drafts/draft-ietf-intarea-
tunnels-09.txt>.
[I-D.ietf-tsvwg-udp-options]
Touch, J., "Transport Options for UDP", draft-ietf-tsvwg-
udp-options-07 (work in progress), 8 March 2019,
<http://www.ietf.org/internet-drafts/draft-ietf-tsvwg-udp-
options-07.txt>.
[RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5,
RFC 792, DOI 10.17487/RFC0792, September 1981, RFC 792, DOI 10.17487/RFC0792, September 1981,
<https://www.rfc-editor.org/info/rfc792>. <https://www.rfc-editor.org/info/rfc792>.
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, Communication Layers", STD 3, RFC 1122,
DOI 10.17487/RFC1122, October 1989, DOI 10.17487/RFC1122, October 1989,
<https://www.rfc-editor.org/info/rfc1122>. <https://www.rfc-editor.org/info/rfc1122>.
skipping to change at page 40, line 25 skipping to change at page 37, line 22
[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
<https://www.rfc-editor.org/info/rfc4821>. <https://www.rfc-editor.org/info/rfc4821>.
[RFC4890] Davies, E. and J. Mohacsi, "Recommendations for Filtering [RFC4890] Davies, E. and J. Mohacsi, "Recommendations for Filtering
ICMPv6 Messages in Firewalls", RFC 4890, ICMPv6 Messages in Firewalls", RFC 4890,
DOI 10.17487/RFC4890, May 2007, DOI 10.17487/RFC4890, May 2007,
<https://www.rfc-editor.org/info/rfc4890>. <https://www.rfc-editor.org/info/rfc4890>.
Appendix A. Event-driven state changes Appendix A. Revision Notes
This appendix contains an informative description of key events:
Path Setup: When a new path is initiated, the state is set to
PROBE_START. This sends a probe packet with the size of the
BASE_PMTU. As soon as the path is confirmed, the state changes to
PROBE_SEARCH.
Arrival of an Acknowledgment: Depending on the probing state, the
reaction differs according to Figure 7, which is a simplification
of Figure 4 focusing on this event.
+--------------+ +----------------+
| PROBE_START | --3------------------------------> | PROBE_DISABLED |
+--------------+ --4---------------- ------------> +----------------+
\/
+--------------+ /\ +--------------+
| PROBE_ERROR | -------------------- \ ----------> | PROBE_BASE |
+--------------+ --4--------------/ \ +--------------+
\
+--------------+ --1 -------- \ +--------------+
| PROBE_BASE | \ --- \ ------> | PROBE_ERROR |
+--------------+ --3--------- \ -----/ \ +--------------+
\ \
+--------------+ \ -----> +--------------+
| PROBE_SEARCH | --2--- -----------------> | PROBE_SEARCH |
+--------------+ \ ------------------> +--------------+
\ ---- /
+---------------+ / \ +---------------+
|SEARCH_COMPLETE| -1--- \ |SEARCH_COMPLETE|
+---------------+ -5-- -----------------------> +---------------+
\
\ +--------------+
--------------------------> | PROBE_BASE |
+--------------+
Condition 1: The maximum PMTU size has not yet been reached.
Condition 2: The maximum PMTU size has been reached.
Condition 3: Probe Timer expires and PROBE_COUNT = MAX_PROBES.
Condition 4: PROBE_ACK received.
Condition 5: Black hole detected.
Figure 7: State changes at the arrival of an acknowledgment
Probing timeout: The PROBE_COUNT is initialised to zero each time
the value of PROBED_SIZE is changed and when an acknowledgment
confirms delivery of a probe packet. The PROBE_TIMER is started
each time a probe packet is sent. It is stopped when an
acknowledgment arrives that confirms delivery of a probe packet of
PROBED_SIZE. If the probe packet is not acknowledged before the
PROBE_TIMER expires, the PROBE_COUNT is incremented. When the
PROBE_COUNT equals the value MAX_PROBES, the state is changed,
otherwise a new probe packet of the same size (PROBED_SIZE) is
resent. The state transitions are illustrated in Figure 8. This
shows a simplification of Figure 4 with a focus only on this
event.
+--------------+ +----------------+
| PROBE_START | --2------------------------------->| PROBE_DISABLED |
+--------------+ +----------------+
+--------------+ +--------------+
| PROBE_ERROR | -----------------> | PROBE_ERROR |
+--------------+ / +--------------+
/
+--------------+ --2----------/ +--------------+
| PROBE_BASE | --1------------------------------> | PROBE_BASE |
+--------------+ +--------------+
+--------------+ +--------------+
| PROBE_SEARCH | --1------------------------------> | PROBE_SEARCH |
+--------------+ --2--------- +--------------+
\
+---------------+ \ +---------------+
|SEARCH_COMPLETE| -------------------> |SEARCH_COMPLETE|
+---------------+ +---------------+
Condition 1: The maximum number of probe packets has not been reached.
Condition 2: The maximum number of probe packets has been reached.
XXX This diagram has not been validated.
Figure 8: State changes at the expiration of the probe timer
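The probe counting rules in the "Probing timeout" description can be sketched as a small helper. The class and method names are hypothetical, the value of MAX_PROBES is supplied by the caller rather than taken from this text, and the helper only reports whether to resend or to change state; choosing the next state remains the job of the state machine.

```python
class ProbeCounter:
    """Track PROBE_COUNT for the current PROBED_SIZE."""

    def __init__(self, max_probes: int):
        self.max_probes = max_probes
        self.probe_count = 0
        self.probed_size = None

    def set_probed_size(self, size: int) -> None:
        # PROBE_COUNT is reset each time PROBED_SIZE changes.
        if size != self.probed_size:
            self.probed_size = size
            self.probe_count = 0

    def on_probe_acked(self) -> None:
        # A confirmed probe also resets PROBE_COUNT.
        self.probe_count = 0

    def on_timer_expired(self) -> str:
        # An unacknowledged probe increments PROBE_COUNT.
        self.probe_count += 1
        if self.probe_count >= self.max_probes:
            return "change_state"
        return "resend"  # send another probe of the same PROBED_SIZE
```

With max_probes set to 3, two expirations of the PROBE_TIMER each request a resend and the third requests a state change.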
PMTU raise timer timeout: DPLPMTUD periodically sends a probe packet
to detect whether a larger PMTU is possible. This probe packet is
generated by the PMTU_RAISE_TIMER.
Arrival of a PTB message: The active probing of the path can be
supported by the arrival of a PTB message indicating the PTB_SIZE.
Two examples are:
1. The PTB_SIZE is between the PLPMTU and the probe that
triggered the PTB message.
2. The PTB_SIZE is smaller than the PLPMTU.
In the first case, the PROBE_BASE state transitions to the PROBE_ERROR
state. In the PROBE_SEARCH state, a new probe packet is sent with
the size reported by the PTB message.
In the second case, the probing starts again with a value of
PROBE_BASE.
Appendix B. Revision Notes
Note to RFC-Editor: please remove this entire section prior to Note to RFC-Editor: please remove this entire section prior to
publication. publication.
Individual draft -00: Individual draft -00:
o Comments and corrections are welcome directly to the authors or * Comments and corrections are welcome directly to the authors or
via the IETF TSVWG working group mailing list. via the IETF TSVWG working group mailing list.
o This update is proposed for WG comments. * This update is proposed for WG comments.
Individual draft -01: Individual draft -01:
o Contains the first representation of the algorithm, showing the * Contains the first representation of the algorithm, showing the
states and timers states and timers
o This update is proposed for WG comments. * This update is proposed for WG comments.
Individual draft -02: Individual draft -02:
o Contains updated representation of the algorithm, and textual * Contains updated representation of the algorithm, and textual
corrections. corrections.
o The text describing when to set the effective PMTU has not yet * The text describing when to set the effective PMTU has not yet
been validated by the authors been validated by the authors
o To determine security to off-path-attacks: We need to decide * To determine security to off-path-attacks: We need to decide
whether a received PTB message SHOULD/MUST be validated? The text whether a received PTB message SHOULD/MUST be validated? The text
on how to handle a PTB message indicating a link MTU larger than on how to handle a PTB message indicating a link MTU larger than
the probe has not yet been validated by the authors the probe has not yet been validated by the authors
o No text currently describes how to handle inconsistent results * No text currently describes how to handle inconsistent results
from arbitrary re-routing along different parallel paths from arbitrary re-routing along different parallel paths
o This update is proposed for WG comments. * This update is proposed for WG comments.
Working Group draft -00: Working Group draft -00:
o This draft follows a successful adoption call for TSVWG * This draft follows a successful adoption call for TSVWG
o There is still work to complete, please comment on this draft. * There is still work to complete, please comment on this draft.
Working Group draft -01: Working Group draft -01:
o This draft includes improved introduction. * This draft includes improved introduction.
o The draft is updated to require ICMP validation prior to accepting * The draft is updated to require ICMP validation prior to accepting
PTB messages - this to be confirmed by WG PTB messages - this to be confirmed by WG
o Section added to discuss Selection of Probe Size - methods to be * Section added to discuss Selection of Probe Size - methods to be
evaluated and recommendations to be considered evaluated and recommendations to be considered
o Section added to align with work proposed in the QUIC WG. * Section added to align with work proposed in the QUIC WG.
Working Group draft -02: Working Group draft -02:
o The draft was updated based on feedback from the WG, and a * The draft was updated based on feedback from the WG, and a
detailed review by Magnus Westerlund. detailed review by Magnus Westerlund.
o The document updates RFC 4821. * The document updates RFC 4821.
o Requirements list updated. * Requirements list updated.
o Added more explicit discussion of a simpler black-hole detection * Added more explicit discussion of a simpler black-hole detection
mode. mode.
o This draft includes reorganisation of the section on IETF * This draft includes reorganisation of the section on IETF
protocols. protocols.
o Added more discussion of implementation within an application. * Added more discussion of implementation within an application.
o Added text on flapping paths. * Added text on flapping paths.
o Replaced 'effective MTU' with new term PLPMTU. * Replaced 'effective MTU' with new term PLPMTU.
Working Group draft -03: Working Group draft -03:
o Updated figures * Updated figures
o Added more discussion on blackhole detection * Added more discussion on blackhole detection
o Added figure describing just blackhole detection * Added figure describing just blackhole detection
o Added figure relating MPS sizes * Added figure relating MPS sizes
Working Group draft -04: Working Group draft -04:
o Described phases and named these consistently. * Described phases and named these consistently.
o Corrected transition from confirmation directly to the search * Corrected transition from confirmation directly to the search
phase (Base has been checked). phase (Base has been checked).
o Redrawn state diagrams. * Redrawn state diagrams.
o Renamed BASE_MTU to BASE_PMTU (because it is a base for the PMTU). * Renamed BASE_MTU to BASE_PMTU (because it is a base for the PMTU).
o Clarified Error state. * Clarified Error state.
o Clarified suspending DPLPMTUD. * Clarified suspending DPLPMTUD.
o Verified normative text in requirements section. * Verified normative text in requirements section.
o Removed duplicate text. * Removed duplicate text.
o Changed all text to refer to /packet probe/probe packet/ * Changed all text to refer to /packet probe/probe packet/
/validation/verification/ added term /Probe Confirmation/ and /validation/verification/ added term /Probe Confirmation/ and
clarified BlackHole detection. clarified BlackHole detection.
Working Group draft -05: Working Group draft -05:
o Updated security considerations. * Updated security considerations.
o Feedback after speaking with Joe Touch helped improve UDP-Options * Feedback after speaking with Joe Touch helped improve UDP-Options
description. description.
Working Group draft -06: Working Group draft -06:
o Updated description of ICMP issues in section 1.1 * Updated description of ICMP issues in section 1.1
o Update to description of QUIC. * Update to description of QUIC.
Working group draft -07: Working group draft -07:
o Moved description of the PTB processing method from the PTB * Moved description of the PTB processing method from the PTB
requirements section. requirements section.
o Clarified what is performed in the PTB validation check. * Clarified what is performed in the PTB validation check.
o Updated security consideration to explain PTB security without * Updated security consideration to explain PTB security without
needing to read the rest of the document. needing to read the rest of the document.
o Reformatted state machine diagram * Reformatted state machine diagram
Working group draft -08:
* Moved to rfcxml v3+
* Rendered diagrams to svg in html version.
* Removed Appendix A. Event-driven state changes.
* Removed section on DPLPMTUD with UDP Options.
* Shortened the description of phases.
Authors' Addresses Authors' Addresses
Godred Fairhurst Godred Fairhurst
University of Aberdeen University of Aberdeen
School of Engineering School of Engineering, Fraser Noble Building
Fraser Noble Building Aberdeen
Aberdeen AB24 3UE AB24 3UE
UK United Kingdom
Email: gorry@erg.abdn.ac.uk Email: gorry@erg.abdn.ac.uk
Tom Jones Tom Jones
University of Aberdeen University of Aberdeen
School of Engineering School of Engineering, Fraser Noble Building
Fraser Noble Building Aberdeen
Aberdeen AB24 3UE AB24 3UE
UK United Kingdom
Email: tom@erg.abdn.ac.uk Email: tom@erg.abdn.ac.uk
Michael Tuexen Michael Tuexen
Muenster University of Applied Sciences Muenster University of Applied Sciences
Stegerwaldstrasse 39 Stegerwaldstrasse 39
Steinfurt 48565 48565 Steinfurt
DE Germany
Email: tuexen@fh-muenster.de Email: tuexen@fh-muenster.de
Irene Ruengeler Irene Ruengeler
Muenster University of Applied Sciences Muenster University of Applied Sciences
Stegerwaldstrasse 39 Stegerwaldstrasse 39
Steinfurt 48565 48565 Steinfurt
DE Germany
Email: i.ruengeler@fh-muenster.de Email: i.ruengeler@fh-muenster.de
Timo Voelker Timo Voelker
Muenster University of Applied Sciences Muenster University of Applied Sciences
Stegerwaldstrasse 39 Stegerwaldstrasse 39
Steinfurt 48565 48565 Steinfurt
DE Germany
Email: timo.voelker@fh-muenster.de Email: timo.voelker@fh-muenster.de
 End of changes. 248 change blocks. 
951 lines changed or deleted 692 lines changed or added
