TCP Maintenance Working Group                                   Y. Cheng
Internet-Draft                                               N. Cardwell
Intended status: Standards Track                            N. Dukkipati
Expires: January 14, 2021                                         P. Jha
                                                             Google, Inc
                                                           July 13, 2020

        RACK-TLP: a time-based efficient loss detection for TCP
                        draft-ietf-tcpm-rack-09

Abstract

   This document presents the RACK-TLP loss detection algorithm for TCP.
   RACK-TLP uses per-segment transmit timestamps and selective
   acknowledgements (SACK) [RFC2018] and has two parts.  RACK ("Recent
   ACKnowledgment") starts fast recovery quickly using time-based
   inferences derived from ACK feedback.  TLP ("Tail Loss Probe")
   leverages RACK and sends a probe packet to trigger ACK feedback to
   avoid retransmission timeout (RTO) events.  Compared to the widely
   used DUPACK threshold approach, RACK-TLP detects losses more
   efficiently when there are application-limited flights of data, lost
   retransmissions, or data packet reordering events.  It is intended to
   be an alternative to the DUPACK threshold approach in
   [RFC5681][RFC6675].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 14, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
     2.1.  Background
     2.2.  Motivation
   3.  RACK-TLP high-level design
     3.1.  RACK: time-based loss inferences from ACKs
     3.2.  TLP: sending one segment to probe losses quickly with RACK
     3.3.  RACK-TLP: reordering resilience with a time threshold
       3.3.1.  Reordering design rationale
       3.3.2.  Reordering window adaptation
     3.4.  An Example of RACK-TLP in Action: fast recovery
     3.5.  An Example of RACK-TLP in Action: RTO
     3.6.  Design Summary
   4.  Requirements
   5.  Definitions
     5.1.  Per-packet variables
     5.2.  Per-connection variables
   6.  RACK Algorithm Details
     6.1.  Upon transmitting a data segment
     6.2.  Upon receiving an ACK
     6.3.  Upon RTO expiration
   7.  TLP Algorithm Details
     7.1.  Initializing state
     7.2.  Scheduling a loss probe
     7.3.  Sending a loss probe upon PTO expiration
     7.4.  Detecting losses by the ACK of the loss probe
       7.4.1.  General case: detecting packet losses using RACK
       7.4.2.  Special case: detecting a single loss repaired by the
               loss probe
   8.  Discussion
     8.1.  Advantages and disadvantages
     8.2.  Relationships with other loss recovery algorithms
     8.3.  Interaction with congestion control
     8.4.  TLP recovery detection with delayed ACKs
     8.5.  RACK for other transport protocols
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.  In this document, these words will appear
   with that interpretation only when in UPPER CASE.  Lower case uses of
   these words are not to be interpreted as carrying [RFC2119]
   significance.

2.  Introduction

   This document presents RACK-TLP, a TCP loss detection algorithm that
   improves upon the widely implemented DUPACK counting approach in
   [RFC5681][RFC6675], and that is RECOMMENDED to be used as an
   alternative to that earlier approach.  RACK-TLP has two parts.  RACK
   ("Recent ACKnowledgment") detects losses quickly using time-based
   inferences derived from ACK feedback.  TLP ("Tail Loss Probe")
   triggers ACK feedback by quickly sending a probe segment, to avoid
   retransmission timeout (RTO) events.

2.1.  Background

   In traditional TCP loss recovery algorithms [RFC5681][RFC6675], a
   sender starts fast recovery when the number of DUPACKs received
   reaches a threshold (DupThresh) that defaults to 3 (this approach is
   referred to as DUPACK-counting in the rest of the document).  The
   sender also halves the congestion window during the recovery.  The
   rationale behind the partial window reduction is that congestion does
   not seem severe since ACK clocking is still maintained.  The time
   elapsed in fast recovery can be just one round-trip, e.g., if the
   sender uses SACK-based recovery [RFC6675] and the number of lost
   segments is small.

   If fast recovery is not triggered, or triggers but fails to repair
   all the losses, then the sender resorts to RTO recovery.  The RTO
   timer interval is conservatively the smoothed RTT (SRTT) plus four
   times the RTT variation, and is lower bounded to 1 second [RFC6298].
   Upon RTO timer expiration, the sender retransmits the first
   unacknowledged segment and resets the congestion window to the LOSS
   WINDOW value (by default 1 full-size segment [RFC5681]).  The
   rationale behind the congestion window reset is that an entire flight
   of data was lost, and the ACK clock was lost, so this deserves a
   cautious response.  The sender then retransmits the rest of the data
   following the slow start algorithm [RFC5681].  The time elapsed in
   RTO recovery is one RTO interval plus the number of round-trips
   needed to repair all the losses.
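
   As an illustration of why RTO recovery is slow, the following is a
   minimal Python sketch of the RTO interval computation summarized
   above, following the smoothing rules of [RFC6298].  The class and
   method names are hypothetical and are not part of this document, and
   the clock granularity term of [RFC6298] is omitted for brevity.

```python
class RtoEstimator:
    """Illustrative RTO estimator (hypothetical names, not from the draft)."""

    ALPHA = 1 / 8    # SRTT gain from [RFC6298]
    BETA = 1 / 4     # RTTVAR gain from [RFC6298]
    MIN_RTO = 1.0    # lower bound of 1 second, in seconds

    def __init__(self):
        self.srtt = None
        self.rttvar = None

    def on_rtt_sample(self, r):
        """Fold one RTT measurement (in seconds) into SRTT and RTTVAR."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r

    def rto(self):
        """SRTT plus four times RTTVAR, never below the 1 second floor."""
        if self.srtt is None:
            return self.MIN_RTO  # initial RTO before any measurement
        return max(self.MIN_RTO, self.srtt + 4 * self.rttvar)
```

   With a 100 ms path, SRTT plus four times the RTT variation is well
   under a second, so the 1 second floor dominates: the sender waits
   roughly ten round trips before RTO recovery even begins.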

2.2.  Motivation

   Fast Recovery is the preferred form of loss recovery because it can
   potentially recover all losses in the time scale of a single round
   trip, with only a fractional congestion window reduction.  RTO
   recovery and congestion window reset should ideally be the last
   resort, only used when the entire flight is lost.  However, in
   addition to losing an entire flight of data, the following situations
   can unnecessarily resort to RTO recovery with traditional TCP loss
   recovery algorithms [RFC5681][RFC6675]:

   1.  Packet drops for short flows or at the end of an application data
       flight.  When the sender is limited by the application (e.g.,
       structured request/response traffic), segments lost at the end of
       the application data transfer often can only be recovered by RTO.
       Consider an example of losing only the last segment in a flight
       of 100 segments.  Lacking any DUPACK, the sender's RTO expires
       and reduces the congestion window to 1, and the sender raises the
       congestion window to just 2 after the loss repair is
       acknowledged.  In contrast, any single segment loss occurring
       between the first and the 97th segment would result in fast
       recovery, which would only cut the window in half.

   2.  Lost retransmissions.  Heavy congestion or traffic policers can
       cause retransmissions to be lost again.  Lost retransmissions
       cause a resort to RTO recovery, since DUPACK-counting does not
       detect the loss of the retransmissions.  Then the slow start
       after RTO recovery could cause burst losses again that severely
       degrade performance [POLICER16].

   3.  Packet reordering.  Link-layer protocols (e.g., 802.11 block
       ACK), link bonding, or routers' internal load-balancing (e.g.,
       ECMP) can deliver TCP segments out of order.  The degree of such
       reordering is usually within the order of the path round trip
       time.  If the reordering degree is beyond DupThresh, DUPACK-
       counting can cause a spurious fast recovery and unnecessary
       congestion window reduction.  To mitigate the issue, [RFC4653]
       adjusts DupThresh to half of the inflight size to tolerate a
       higher degree of reordering.  However, if more than half of the
       inflight is lost, then the sender has to resort to RTO recovery.
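
   The 100-segment example in the first item can be checked with a
   small Python sketch.  It assumes a simplified model that is not
   specified in this document: exactly one segment is lost, the sender
   is application-limited (no new data follows the flight), and the
   receiver returns one DUPACK for every segment delivered after the
   hole.  The function names are hypothetical.

```python
DUP_THRESH = 3  # default DupThresh in [RFC5681][RFC6675]


def dupacks_for_single_loss(flight_size, lost_index):
    """DUPACKs seen when only segment `lost_index` (1-based) is lost.

    Assumption: each segment delivered after the hole triggers one DUPACK.
    """
    return flight_size - lost_index


def recovery_mode(flight_size, lost_index, dup_thresh=DUP_THRESH):
    """Return which recovery DUPACK-counting falls into for this loss."""
    if dupacks_for_single_loss(flight_size, lost_index) >= dup_thresh:
        return "fast recovery"
    return "RTO"
```

   Under these assumptions a loss at position 97 still yields three
   DUPACKs and fast recovery, while a loss at any of the last three
   positions can only be repaired by RTO.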

3.  RACK-TLP high-level design

   RACK-TLP allows senders to recover losses more effectively in all
   three scenarios described in the previous section.  There are two
   design principles behind RACK-TLP.  The first principle is to detect
   losses via ACK events as much as possible, to repair losses at round-
   trip time-scales.  The second principle is to gently probe the
   network to solicit additional ACK feedback, to avoid RTO expiration
   and subsequent congestion window reset.  At a high level, the two
   principles are implemented in RACK and TLP, respectively.

3.1.  RACK: time-based loss inferences from ACKs

   The rationale behind RACK is that if a segment is delivered out of
   order, then the segments sent chronologically before that were either
   lost or reordered.  This concept is not fundamentally different from
   [RFC5681][RFC6675][FACK].  RACK's key innovation is using per-segment
   transmission timestamps and widely-deployed SACK options to conduct
   time-based inferences, instead of inferring losses by counting ACKs
   or SACKed sequences.  Time-based inferences are more robust than
   DUPACK-counting approaches because they have no dependence on flight
   size, and thus are effective for application-limited traffic.

   Conceptually, RACK puts a virtual timer for every data segment sent
   (including retransmissions).  Each timer expires dynamically based on
   the latest RTT measurements plus an additional delay budget to
   accommodate potential packet reordering (called the reordering
   window).  When a segment's timer expires, RACK marks the
   corresponding segment lost for retransmission.

   In reality, as an algorithm, RACK does not arm a timer for every
   segment sent because it's not necessary.  Instead the sender records
   the most recent transmission time of every data segment sent,
   including retransmissions.  For each ACK received, the sender
   calculates the latest RTT measurement (if eligible) and adjusts the
   expiration time of every segment sent but not yet delivered.  If a
   segment has expired, RACK marks it lost.

   Since the time-based logic of RACK applies equally to retransmissions
   and original transmissions, it can detect lost retransmissions as
   well.  If a segment has been retransmitted but its most recent
   (re)transmission timestamp has expired, then after a reordering
   window it's marked lost.
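
   The virtual-timer idea described above can be sketched in Python as
   follows.  This is an illustration under stated assumptions, not the
   normative algorithm: the Segment type, the function name, and its
   parameters are hypothetical, time is in arbitrary integer units, and
   tie-breaking between segments with equal timestamps is ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int
    xmit_ts: int            # most recent (re)transmission time
    delivered: bool = False
    lost: bool = False


def rack_detect_loss(segments, rack_xmit_ts, rtt, reo_wnd, now):
    """Mark expired segments lost and return their sequence numbers.

    rack_xmit_ts is the transmission time of the most recently sent
    segment known to be delivered.  An undelivered segment is marked lost
    if it was sent before that segment and its virtual timer
    (xmit_ts + rtt + reo_wnd) has expired.
    """
    newly_lost = []
    for seg in segments:
        if seg.delivered or seg.lost:
            continue
        sent_before = seg.xmit_ts < rack_xmit_ts
        expired = now >= seg.xmit_ts + rtt + reo_wnd
        if sent_before and expired:
            seg.lost = True
            newly_lost.append(seg.seq)
    return newly_lost
```

   Because the check uses only the most recent transmission time, a
   retransmitted segment gets a fresh virtual timer and can be marked
   lost again by the same logic if a later-sent segment is delivered
   first.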

3.2.  TLP: sending one segment to probe losses quickly with RACK

   RACK infers losses from ACK feedback; however, in some cases ACKs are
   sparse, particularly when the inflight is small or when the losses
   are high.  In some challenging cases the last few segments in a
   flight are lost.  With [RFC5681] or [RFC6675] the sender's RTO would
   expire and reset the congestion window, when in reality most of the
   flight has been delivered.

   Consider an example where a sender with a large congestion window
   transmits 100 new data segments after an application write, and only
   the last three segments are lost.  Without RACK-TLP, the RTO expires,
   the sender retransmits the first unacknowledged segment, and the
   congestion window slow-starts from 1.  After all the retransmits are
   acknowledged the congestion window has been increased to 4.  The
   total delivery time for this application transfer is three RTTs plus
   one RTO, a steep cost given that only a tiny fraction of the flight
   was lost.  If instead the losses had occurred three segments sooner
   in the flight, then fast recovery would have recovered all losses
   within one round-trip and would have avoided resetting the congestion
   window.

   Fast Recovery would be preferable in such scenarios; TLP is designed
   to trigger the feedback RACK needs to enable that.  After the last
   (100th) segment was originally sent, TLP sends the next available
   (new) segment or retransmits the last (highest-sequenced) segment in
   two round-trips to probe the network, hence the name "Tail Loss
   Probe".  The successful delivery of the probe would solicit an ACK.
   RACK uses this ACK to detect that the 98th and 99th segments were
   lost, trigger fast recovery, and retransmit both successfully.  The
   total recovery time is four RTTs, and the congestion window is only
   partially reduced instead of being fully reset.  If the probe was
   also lost then the sender would invoke RTO recovery, resetting the
   congestion window.
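
   The arithmetic behind the two timelines in this example can be
   reproduced with a short Python sketch.  The model is an assumption
   made for illustration: the RTO timer is taken to be restarted when
   the last ACK of the original flight arrives (one RTT after sending),
   the probe is sent two RTTs after the last segment and retransmits
   the 100th segment, and each recovery round takes exactly one RTT.
   All function names are hypothetical and times are in milliseconds.

```python
def slow_start_after_rto(num_lost, loss_window=1):
    """Return (round trips, final cwnd) to retransmit num_lost segments."""
    cwnd = loss_window
    rounds = 0
    remaining = num_lost
    while remaining > 0:
        sent = min(cwnd, remaining)
        remaining -= sent
        cwnd += sent      # slow start: cwnd grows by one per ACKed segment
        rounds += 1
    return rounds, cwnd


def delivery_time_with_rto(rtt, rto, num_lost):
    """Original flight (1 RTT), then the RTO, then slow-start rounds."""
    rounds, cwnd = slow_start_after_rto(num_lost)
    return rtt + rto + rounds * rtt, cwnd


def delivery_time_with_tlp(rtt):
    """Probe after 2 RTTs, its ACK 1 RTT later, then 1 RTT of fast recovery."""
    probe_sent = 2 * rtt
    probe_acked = probe_sent + rtt
    return probe_acked + rtt
```

   For example, with a 100 ms RTT and the 1000 ms minimum RTO,
   delivery_time_with_rto(100, 1000, 3) gives 1300 ms (three RTTs plus
   one RTO) with a final congestion window of 4, whereas
   delivery_time_with_tlp(100) gives 400 ms (four RTTs).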

3.3.  RACK-TLP: reordering resilience with a time threshold

3.3.1.  Reordering design rationale

   Upon receiving an ACK indicating an out-of-order data delivery, a
   sender cannot tell immediately whether that out-of-order delivery was
   a result of reordering or loss.  It can only distinguish between the
   two in hindsight if the missing sequence ranges are filled in later
   without retransmission.  Thus a loss detection algorithm needs to
   budget some wait time (a reordering window) to try to disambiguate
   packet reordering from packet loss.

   The reordering window in the DUPACK-counting approach is implicitly
   defined as the elapsed time to receive acknowledgements for
   DupThresh-worth of out-of-order deliveries.  This approach is
   effective if the network reordering degree (in sequence distance) is
   smaller than DupThresh and at least DupThresh segments after the loss
   are acknowledged.  For cases where the reordering degree is larger
   than the default DupThresh of 3 packets, one alternative is to
   dynamically adapt DupThresh based on the FlightSize (e.g., adjusting
   DupThresh to half of the FlightSize).  However, this does not work
   well with the following two types of reordering:

   1.  Application-limited flights where the last non-full-sized segment
       is delivered first and then the remaining full-sized segments in
       the flight are delivered in order.  This reordering pattern can
       occur when segments traverse parallel forwarding paths.  In such
       scenarios the degree of reordering in packet distance is one
       segment less than the flight size.

   2.  A flight of segments that are delivered partially out of order.
       One cause for this pattern is wireless link-layer retransmissions
       with an inadequate reordering buffer at the receiver.  In such
       scenarios, the wireless sender sends the data packets in order
       initially, but some are lost and then recovered by link-layer
       retransmissions; the wireless receiver delivers the TCP data
       packets in the order they are received, due to the inadequate
       reordering buffer.  The random wireless transmission errors in
       such scenarios cause the reordering degree, expressed in packet
       distance, to have highly variable values up to the flight size.

   In the above two cases the degree of reordering in packet distance is
   highly variable, making the DUPACK-counting approach ineffective,
   including dynamic adaptation variants like [RFC4653].  Instead, the
   degree of reordering in time difference in such cases is usually
   within a single round-trip time.  This is because the packets either
   traverse slightly disjoint paths with similar propagation delays or
   are repaired quickly by the local access technology.  Hence, using a
   time threshold instead of a packet threshold strikes a middle ground,
   allowing a bounded degree of reordering resilience while still
   allowing fast recovery.  This is the rationale behind the RACK-TLP
   reordering resilience design.

   Specifically, RACK-TLP introduces a new dynamic reordering window
   parameter in time units, and the sender considers a data segment S
   lost if both conditions are met:

   1.  Another data segment sent later than S has been delivered.

   2.  S has not been delivered after the estimated round-trip time plus
       the reordering window.

   Note that condition (1) implies at least one round-trip of time has
   elapsed since S has been sent.
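
   The two conditions can be sketched in a few lines of Python.  This
   is an illustrative sketch only, not part of the specification: the
   Segment class, the is_lost() function, and all parameter names are
   hypothetical, and times are plain floating-point seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    xmit_ts: float           # time of the most recent (re)transmission
    delivered: bool = False  # True once cumulatively or selectively ACKed


def is_lost(s, latest_delivered_xmit_ts, rtt, reo_wnd, now):
    """Return True if segment s meets both loss conditions.

    latest_delivered_xmit_ts is the send time of the most recently sent
    segment known to be delivered (a hypothetical parameter name).
    """
    if s.delivered:
        return False
    # Condition 1: another segment sent later than S has been delivered.
    later_delivered = latest_delivered_xmit_ts > s.xmit_ts
    # Condition 2: S is still undelivered after RTT + reordering window.
    waited_enough = now >= s.xmit_ts + rtt + reo_wnd
    return later_delivered and waited_enough
```

   With an RTT estimate of 100 ms and a reordering window of 25 ms, a
   segment sent at time 0 is considered lost at 130 ms but not yet at
   120 ms, provided a later-sent segment has already been delivered.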

3.3.2.  Reordering window adaptation

   The RACK reordering window adapts to the measured duration of
   reordering events, within reasonable and specific bounds in order to
   disincentivize excessive reordering.  More specifically:

   1.  If the sender has not observed any reordering since the
       connection was established, then the RACK reordering window
       SHOULD be zero in either of the following cases:

       1.  After learning that three segments have been delivered out of
           order (e.g. receiving 3 DUPACKs per [RFC5681]); in turn, this
           will cause the RACK loss detection logic to trigger fast
           recovery.

       2.  During fast recovery or RTO recovery.

   2.  If the sender has observed some reordering since the connection
       was established, then the RACK reordering window SHOULD be set to
       a small fraction of the round-trip time, or zero if no round-trip
       time estimate is available.

   3.  The RACK reordering window MUST be bounded and this bound SHOULD
       be SRTT.

   4.  If the receiver uses Duplicate Selective Acknowledgement (DSACK)
       [RFC2883], the RACK reordering window SHOULD leverage DSACK
       information to adaptively estimate the duration of reordering
       events.
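
   As a rough illustration of how these rules combine, the following
   Python sketch picks a reordering window from the sender's current
   state.  It is a sketch under stated assumptions, not a normative
   algorithm: the function and parameter names are hypothetical, the
   "small fraction" is assumed to be 1/4 of the minimum RTT, and "mult"
   stands in for a DSACK-driven multiplier.

```python
def reo_wnd(reordering_seen, segs_sacked, in_recovery, min_rtt, srtt,
            dup_thresh=3, fraction=0.25, mult=1):
    """Pick a reordering window in seconds (illustrative only)."""
    # Rule 1: no reordering observed, and either three segments were
    # delivered out of order or the sender is in loss recovery.
    if not reordering_seen and (in_recovery or segs_sacked >= dup_thresh):
        return 0.0
    # Rule 2: zero if no round-trip time estimate is available.
    if min_rtt is None or srtt is None:
        return 0.0
    # Rules 2 and 4: a small fraction of the RTT, scaled by an
    # (assumed) DSACK-driven multiplier.
    wnd = min_rtt * fraction * mult
    # Rule 3: the window is bounded by SRTT.
    return min(wnd, srtt)
```

   For example, with a minimum RTT of 100 ms and an SRTT of 120 ms, the
   sketch returns 25 ms by default and never more than 120 ms, however
   large the multiplier grows.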

   For short flows, the low initial reordering window is key to
   recovering quickly, at the risk of spurious retransmissions.  The
   rationale is that spurious retransmissions for short flows are not
   expected to produce excessive additional network traffic.  For long
   flows the design tolerates reordering within a round trip.  This
   handles reordering caused by path divergence in small time scales
   (reordering within the round-trip time of the shortest path).

   However, the fact that the initial reordering window is low, and the
   reordering window's adaptive growth is bounded, means that there will
   continue to be a cost to reordering to disincentivize excessive
   network reordering over highly disjoint paths.  For such networks
   there are good alternative solutions, such as MPTCP.

3.4.  An Example of RACK-TLP in Action: fast recovery

   The following example in Figure 1 illustrates the RACK-TLP algorithm
   in action:

   Event  TCP DATA SENDER                            TCP DATA RECEIVER
   _____  ____________________________________________________________
     1.   Send P0, P1, P2, P3          -->
          [P1, P2, P3 dropped by network]

     2.                                <--          Receive P0, ACK P0

     3a.  2RTTs after (2), TLP timer fires
     3b.  TLP: retransmits P3          -->

     4.                                <--         Receive P3, SACK P3

     5a.  Receive SACK for P3
     5b.  RACK: marks P1, P2 lost
     5c.  Retransmit P1, P2            -->
          [P1 retransmission dropped by network]

     6.                                <--    Receive P2, SACK P2 & P3

     7a.  RACK: marks P1 retransmission lost
     7b.  Retransmit P1                -->

     8.                                <--          Receive P1, ACK P3

                                  Figure 1.

   Figure 1, above, illustrates a sender sending four segments (P0, P1,
   P2, P3) and losing the last three segments.  After two round-trips,
   TLP sends a loss probe, retransmitting the last segment, P3, to
   solicit SACK feedback and restore the ACK clock (event 3).  The
   delivery of P3 enables RACK to infer (event 5b) that P1 and P2 were
   likely lost, because they were sent before P3.  The sender then
   retransmits P1 and P2.  Unfortunately, the retransmission of P1 is
   lost again.  However, the delivery of the retransmission of P2 allows
   RACK to infer that the retransmission of P1 was likely lost (event
   7a), and hence P1 should be retransmitted (event 7b).

3.5.  An Example of RACK-TLP in Action: RTO

   In addition to enhancing fast recovery, RACK improves the accuracy of
   RTO recovery by reducing spurious retransmissions.

   Without RACK, upon RTO timer expiration the sender marks all the
   unacknowledged segments lost.  This approach can lead to spurious
   retransmissions.  For example, consider a simple case where one
   segment was sent with an RTO of 1 second, and then the application
   writes more data, causing a second and third segment to be sent right
   before the RTO of the first segment expires.  Suppose only the first
   segment is lost.  Without RACK, upon RTO expiration the sender marks
   all three segments as lost and retransmits the first segment.  When
   the sender receives the ACK that selectively acknowledges the second
   segment, the sender spuriously retransmits the third segment.

   With RACK, upon RTO timer expiration the only segment automatically
   marked lost is the first segment (since it was sent an RTO ago); for
   all the other segments RACK only marks the segment lost if at least
   one round trip has elapsed since the segment was transmitted.
   Consider the previous example scenario, this time with RACK.  With
   RACK, when the RTO expires the sender only marks the first segment as
   lost, and retransmits that segment.  The other two very recently sent
   segments are not marked lost, because they were sent less than one
   round trip ago and there were no ACKs providing evidence that they
   were lost.  When the sender receives the ACK that selectively
   acknowledges the second segment, the sender would not retransmit the
   third segment but rather would send any new segments (if allowed by
   congestion window and receive window).

   In the above example, if the sender were to send a large burst of
   segments instead of two segments right before RTO, without RACK the
   sender may spuriously retransmit almost the entire flight
   [RACK-TCPM97].  Note that the Eifel protocol [RFC3522] cannot prevent
   this issue because it can only detect spurious RTO episodes.  In this
   example the RTO itself was not spurious.
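
   The difference between the two behaviors at RTO expiration can be
   sketched as follows.  This Python fragment is an illustration under
   simplifying assumptions, not the specified algorithm: mark_on_rto()
   and its parameters are hypothetical names, times are in seconds, and
   the reordering window is ignored.

```python
def mark_on_rto(xmit_times, now, rtt, with_rack):
    """Return indices of segments marked lost when the RTO fires.

    xmit_times holds the send times of the unacknowledged segments,
    oldest first (a hypothetical representation of the scoreboard).
    """
    lost = []
    for i, ts in enumerate(xmit_times):
        if not with_rack:
            # Without RACK: every unacknowledged segment is marked lost.
            lost.append(i)
        elif i == 0 or now - ts >= rtt:
            # With RACK: the first segment, plus any segment sent at
            # least one round trip ago.
            lost.append(i)
    return lost
```

   For the scenario above (one segment sent at time 0, two more sent
   just before the 1-second RTO, and an RTT of 100 ms), the sketch marks
   all three segments lost without RACK and only the first with RACK.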

3.6.  Design Summary

   To summarize, RACK-TLP aims to adapt to small time-varying degrees of
   reordering, quickly recover most losses within one to two round
   trips, and avoid costly RTO recoveries.  In the presence of
   reordering, the adaptation algorithm can impose sometimes-needless
   delays when it waits to disambiguate loss from reordering, but the
   penalty for waiting is bounded to one round trip and such delays are
   confined to flows long enough to have observed reordering.

4.  Requirements

   The reader is expected to be familiar with the definitions given in
   the TCP congestion control [RFC5681] and selective acknowledgment
   [RFC2018][RFC6675] RFCs.  RACK-TLP has the following requirements:

   1.  The connection MUST use selective acknowledgment (SACK) options
       [RFC2018], and the sender keeps SACK scoreboard information on
       a per-connection basis ([RFC6675] section 3).

   2.  For each data segment sent, the sender MUST store its most recent
       transmission time with a timestamp whose granularity is finer
       than 1/4 of the minimum RTT of the connection.  At the time of
       writing, microsecond resolution is suitable for intra-datacenter
       traffic and millisecond granularity or finer is suitable for the
       Internet.  Note that RACK-TLP can be implemented with TSO (TCP
       Segmentation Offload) support by having multiple segments in a
       TSO aggregate share the same timestamp.

   3.  RACK DSACK-based reordering window adaptation is RECOMMENDED but
       is not required.

   4.  TLP requires RACK.

5.  Definitions

   The reader is expected to be familiar with the variables of SND.UNA,
   SND.NXT, SEG.ACK, and SEG.SEQ in [RFC793], SMSS, FlightSize in
   [RFC5681], DupThresh in [RFC6675], RTO and SRTT in [RFC6298].  A
   RACK-TLP implementation needs to store new per-packet and
   per-connection state, described below.

5.1.  Per-packet variables

   These variables indicate the status of the most recent transmission
   of a data segment:

   "Segment.lost" is true if the most recent (re)transmission of the
   segment has been marked lost and needs to be retransmitted.  False
   otherwise.

   "Segment.retransmitted" is true if it was retransmitted in the most
   recent transmission.  False otherwise.

   "Segment.xmit_ts" is the time of the last transmission of a data
   segment, including retransmissions, if any, with a clock granularity
   specified in the Requirements section.

   "Segment.end_seq" is the next sequence number after the last sequence
   number of the data segment.

5.2.  Per-connection variables

   "RACK.segment".  Among all the segments that have been either
   selectively or cumulatively acknowledged, RACK.segment is the one
   that was sent most recently (including retransmissions).

   "RACK.xmit_ts" is the latest transmission timestamp of RACK.segment.

   "RACK.end_seq" is the Segment.end_seq of RACK.segment.

   "RACK.ack_ts" is the time when the full sequence range of
   RACK.segment was selectively or cumulatively acknowledged.

   "RACK.segs_sacked" returns the total number of segments selectively
   acknowledged in the SACK scoreboard.

   "RACK.fack" is the highest selectively or cumulatively acknowledged
   sequence (i.e. forward acknowledgement).

   "RACK.min_RTT" is the estimated minimum round-trip time (RTT) of the
   connection.

   "RACK.rtt" is the RTT of the most recently delivered segment on the
   connection (either cumulatively acknowledged or selectively
   acknowledged) that was not marked invalid as a possible spurious
   retransmission.

   "RACK.reordering_seen" indicates whether the sender has detected data
   segment reordering event(s).

   "RACK.reo_wnd" is a reordering window computed in the unit of time
   used for recording segment transmission times.  It is used to defer
   the moment at which RACK marks a segment lost.

   "RACK.dsack" indicates if a DSACK option has been received since the
   last RACK.reo_wnd change.

   "RACK.reo_wnd_mult" is the multiplier applied to adjust RACK.reo_wnd.

   "RACK.reo_wnd_persist" is the number of loss recoveries before
   resetting RACK.reo_wnd.

   "RACK.rtt_seq" is the SND.NXT when RACK.rtt is updated.

   "TLP.is_retrans": a boolean indicating whether there is an
   unacknowledged TLP retransmission.

   "TLP.end_seq": the value of SND.NXT at the time of sending a TLP
   retransmission.

   "TLP.max_ack_delay": sender's maximum delayed ACK timer budget.

   Per-connection timers

   "RACK reordering timer": a timer that allows RACK to wait for
   reordering to resolve, to try to disambiguate reordering from loss,
   when some out-of-order segments are marked as SACKed.

   "TLP PTO": a timer event indicating that an ACK is overdue and the
   sender should transmit a TLP segment, to solicit SACK or ACK
   feedback.

   These timers augment the existing timers maintained by a sender,
   including the RTO timer [RFC6298].  A RACK-TLP sender arms one of
   these three timers -- RACK reordering timer, TLP PTO timer, or RTO
   timer -- when it has unacknowledged segments in flight.  The
   implementation can simplify managing all three timers by multiplexing
   a single timer among them with an additional variable to indicate the
   event to invoke upon the next timer expiration.
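
   One possible way to multiplex the three timers is sketched below in
   Python.  This is an assumption-laden illustration rather than a
   prescribed design: the TimerEvent and MuxTimer names are
   hypothetical, and the caller is assumed to poll with its own clock.

```python
from enum import Enum


class TimerEvent(Enum):
    NONE = 0
    RACK_REORDERING = 1
    TLP_PTO = 2
    RTO = 3


class MuxTimer:
    """One deadline shared by the RACK reordering, TLP PTO, and RTO timers."""

    def __init__(self):
        self.event = TimerEvent.NONE
        self.deadline = None

    def arm(self, event, now, delay):
        # Arming replaces whatever was pending: only one timer is active.
        self.event = event
        self.deadline = now + delay

    def cancel(self):
        self.event = TimerEvent.NONE
        self.deadline = None

    def poll(self, now):
        """Return the event to invoke if the deadline passed, else NONE."""
        if self.event is TimerEvent.NONE or now < self.deadline:
            return TimerEvent.NONE
        fired = self.event
        self.cancel()
        return fired
```

   The "event" field plays the role of the additional variable mentioned
   above: it records which of the three handlers to run when the single
   underlying timer expires.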

6.  RACK Algorithm Details

6.1.  Upon transmitting a data segment

   Upon transmitting a new segment or retransmitting an old segment,
   record the time in Segment.xmit_ts and set Segment.lost to FALSE.
   Upon retransmitting a segment, set Segment.retransmitted to TRUE.

   RACK_transmit_data(Segment):
           Segment.xmit_ts = Now()
           Segment.lost = FALSE

   RACK_retransmit_data(Segment):
           Segment.retransmitted = TRUE
           RACK_transmit_data(Segment)

6.2.  Upon receiving an ACK

   Step 1: Update RACK.min_RTT.

   Use the RTT measurements obtained via [RFC6298] or [RFC7323] to
   update the estimated minimum RTT in RACK.min_RTT.  The sender SHOULD
   track a simple global minimum of all RTT measurements from the
   connection, or a windowed min-filtered estimate of recent RTT
   measurements.

   Step 2: Update state for most recently sent segment that has been
   delivered

   In this step, RACK updates the state that tracks the most recently
   sent segment that has been delivered: RACK.segment; RACK maintains
   its latest transmission timestamp in RACK.xmit_ts and its highest
   sequence number in RACK.end_seq.  These two variables are used, in
   later steps, to estimate if some segments not yet delivered were
   likely lost.  Given the information provided in an ACK, each segment
   cumulatively ACKed or SACKed is marked as delivered in the
   scoreboard.  Since an ACK can also acknowledge retransmitted data
   segments, and retransmissions can be spurious, the sender needs to
   take care to avoid spurious inferences.  For example, if the sender
   were to use timing information from a spurious retransmission, the
   RACK.rtt could be vastly underestimated.

   To avoid spurious inferences, ignore a segment as invalid if any of
   its sequence range has been retransmitted before and either of two
   conditions is true:

   1.  The Timestamp Echo Reply field (TSecr) of the ACK's timestamp
       option [RFC7323], if available, indicates the ACK was not
       acknowledging the last retransmission of the segment.

   2.  The segment was last retransmitted less than RACK.min_RTT ago.

   The second check is a heuristic when the TCP Timestamp option is not
   available, or when the round trip time is less than the TCP Timestamp
   clock granularity.

   Among all the segments newly ACKed or SACKed by this ACK that pass
   the checks above, update the RACK.rtt to be the RTT sample calculated
   using this ACK.  Furthermore, record the most recent Segment.xmit_ts
   in RACK.xmit_ts if it is ahead of RACK.xmit_ts.  If Segment.xmit_ts
   equals RACK.xmit_ts (e.g. due to clock granularity limits) then
   compare Segment.end_seq and RACK.end_seq to break the tie.

   Step 2 may be summarized in pseudocode as:

   RACK_sent_after(t1, seq1, t2, seq2):
       If t1 > t2:
           Return true
       Else if t1 == t2 AND seq1 > seq2:
           Return true
       Else:
           Return false

   RACK_update():
       For each Segment newly acknowledged cumulatively or selectively:
           rtt = Now() - Segment.xmit_ts
           If Segment.retransmitted is TRUE:
               /* Skip segments whose RTT sample may be spurious */
               If ACK.ts_option.echo_reply < Segment.xmit_ts:
                  Continue
               If rtt < RACK.min_RTT:
                  Continue

           RACK.rtt = rtt
           If RACK_sent_after(Segment.xmit_ts, Segment.end_seq,
                              RACK.xmit_ts, RACK.end_seq):
               RACK.xmit_ts = Segment.xmit_ts
               RACK.end_seq = Segment.end_seq
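
   For readers who prefer something executable, Step 2 can be
   approximated in Python as below.  This is a sketch, not a reference
   implementation.  The Segment and RackState classes and the
   rack_update() signature are hypothetical.  Two choices are
   assumptions of this sketch: an invalid segment is skipped while the
   remaining segments in the ACK are still processed, and RACK.end_seq
   is recorded together with RACK.xmit_ts so that the tie-break has
   something to compare against.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    xmit_ts: float               # time of the most recent (re)transmission
    end_seq: int                 # sequence number just past the segment
    retransmitted: bool = False


@dataclass
class RackState:
    xmit_ts: float = 0.0
    end_seq: int = 0
    rtt: float = 0.0
    min_rtt: float = 0.0


def sent_after(t1, seq1, t2, seq2):
    """True if (t1, seq1) was sent after (t2, seq2); seq breaks ties."""
    return t1 > t2 or (t1 == t2 and seq1 > seq2)


def rack_update(rack, acked, now, ts_echo=None):
    """Update RACK state from newly acknowledged segments.

    ts_echo is the echoed timestamp from the ACK, or None when the TCP
    Timestamp option is unavailable.
    """
    for seg in acked:
        rtt = now - seg.xmit_ts
        if seg.retransmitted:
            # The ACK may be for an earlier transmission of the segment.
            if ts_echo is not None and ts_echo < seg.xmit_ts:
                continue
            if rtt < rack.min_rtt:
                continue
        rack.rtt = rtt
        if sent_after(seg.xmit_ts, seg.end_seq, rack.xmit_ts, rack.end_seq):
            rack.xmit_ts = seg.xmit_ts
            rack.end_seq = seg.end_seq
```

   When two acknowledged segments share a transmit timestamp, as can
   happen with TSO, the one with the higher end sequence becomes the
   most recently sent delivered segment.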

   Step 3: Detect data segment reordering

   To detect reordering, the sender looks for original data segments
   being delivered out of order.  To detect such cases, the sender
   tracks the highest sequence selectively or cumulatively acknowledged
   in the RACK.fack variable.  The name "fack" stands for the most
   "Forward ACK" (this term is adopted from [FACK]).  If a
   never-retransmitted segment that's below RACK.fack is (selectively or
   cumulatively) acknowledged, it has been delivered out of order.  The
   sender sets RACK.reordering_seen to TRUE if such segment is
   identified.

   RACK_detect_reordering():
       For each Segment newly acknowledged cumulatively or selectively:
           If Segment.end_seq > RACK.fack:
               RACK.fack = Segment.end_seq
           Else if Segment.end_seq < RACK.fack AND
                   Segment.retransmitted is FALSE:
               RACK.reordering_seen = TRUE
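
   The same logic can be written as a small pure function in Python.
   This is only an illustrative sketch: detect_reordering() and its
   tuple-based input are hypothetical, chosen so the behavior can be
   checked in isolation.

```python
def detect_reordering(fack, acked):
    """Return (new_fack, reordering_seen) for newly ACKed segments.

    acked is an iterable of (end_seq, retransmitted) pairs, in the
    order the sender processes them.
    """
    seen = False
    for end_seq, retransmitted in acked:
        if end_seq > fack:
            # A new highest acknowledged sequence: advance fack.
            fack = end_seq
        elif end_seq < fack and not retransmitted:
            # A never-retransmitted segment below fack arrived late.
            seen = True
    return fack, seen
```

   For instance, acknowledging a segment ending at 3000 and then an
   original segment ending at 1000 reports reordering, while the same
   late acknowledgment for a retransmitted segment does not.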

   Step 4: Update RACK reordering window

   The RACK reordering window, RACK.reo_wnd, serves as an adaptive
   allowance for settling time before marking a segment lost.  This step
   documents a detailed algorithm that follows the principles outlined
   in the ``RACK reordering window adaptation'' section.

   If the sender has not yet observed any reordering based on the
   previous step, then RACK prioritizes quick loss recovery by setting
   RACK.reo_wnd to 0 when the number of SACKed segments exceeds
   DupThresh, or during loss recovery.

   Aside from those special conditions, RACK starts with a conservative
   reordering window of RACK.min_RTT/4.  This value was chosen because
   Linux TCP used the same factor in its implementation to delay Early
   Retransmit [RFC5827] to reduce spurious loss detections in the
   presence of reordering, and experience showed this worked reasonably
   well [DMCG11].

   However, the reordering detection in the previous step, Step 3, has a
   self-reinforcing drawback when the reordering window is too small to
   cope with the actual reordering.  When that happens, RACK could
   spuriously mark reordered segments lost, causing them to be
   retransmitted.  In turn, the retransmissions can prevent the
   necessary conditions for Step 3 to detect reordering, since this
   mechanism requires ACKs or SACKs for only segments that have never
   been retransmitted.  In some cases such scenarios can persist,
   causing RACK to continue to spuriously mark segments lost without
   realizing the reordering window is too small.

   To avoid the issue above, RACK dynamically adapts to higher degrees
   of reordering using DSACK options from the receiver.  Receiving an
   ACK with a DSACK option indicates a spurious retransmission,
   suggesting that RACK.reo_wnd may be too small.  The RACK.reo_wnd
   increases linearly for every round trip in which the sender receives
   some DSACK option, so that after N distinct round trips in which a
   DSACK is received, the RACK.reo_wnd becomes (N+1) * min_RTT / 4, with
   an upper-bound of SRTT.

   If the reordering is temporary then a large adapted reordering window
   would unnecessarily delay loss recovery later.  Therefore, RACK
   persists the inflated RACK.reo_wnd for only 16 loss recoveries, after
   which it resets RACK.reo_wnd to its starting value, min_RTT / 4.  The
   downside of resetting the reordering window is the risk of triggering
   spurious fast recovery episodes if the reordering remains high.  The
   rationale for this approach is to bound such spurious recoveries to
   approximately once every 16 recoveries (less than 7%).

   To track the linear scaling factor for the adaptive reordering
   window, RACK uses the variable RACK.reo_wnd_mult, which is
   initialized to 1 and adapts with the following pseudocode, which
   implements the above algorithm:

   RACK_update_reo_wnd():

       /* DSACK-based reordering window adaptation */
       If RACK.dsack_round is not None AND
          SND.UNA >= RACK.dsack_round:
           RACK.dsack_round = None
       /* Grow the reordering window per round that sees DSACK.
          Reset the window after 16 DSACK-free recoveries */
       If RACK.dsack_round is None AND
          any DSACK option is present on latest received ACK:
           RACK.dsack_round = SND.NXT
           RACK.reo_wnd_mult += 1
           RACK.reo_wnd_persist = 16
       Else if exiting Fast or RTO recovery:
           RACK.reo_wnd_persist -= 1
           If RACK.reo_wnd_persist <= 0:
               RACK.reo_wnd_mult = 1

       If RACK.reordering_seen is FALSE:
           If in Fast or RTO recovery:
               Return 0
           Else if RACK.segs_sacked >= DupThresh:
               Return 0
       Return min(RACK.min_RTT / 4 * RACK.reo_wnd_mult, SRTT)
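
   The adaptation can be exercised with a small Python sketch.  This is
   not a reference implementation: the Rack class, the function
   parameters, and the DUP_THRESH value of 3 are assumptions made for
   illustration, and all times are in milliseconds.

```python
from dataclasses import dataclass
from typing import Optional

DUP_THRESH = 3  # assumed value of DupThresh


@dataclass
class Rack:
    min_rtt: float                      # milliseconds
    srtt: float                         # milliseconds
    reo_wnd_mult: int = 1
    reo_wnd_persist: int = 16
    dsack_round: Optional[int] = None
    reordering_seen: bool = False
    segs_sacked: int = 0


def rack_update_reo_wnd(rack, snd_una, snd_nxt, ack_has_dsack,
                        in_recovery, exiting_recovery):
    # The DSACK round ends once all data sent during it is acknowledged.
    if rack.dsack_round is not None and snd_una >= rack.dsack_round:
        rack.dsack_round = None
    # Grow the window at most once per round trip that sees a DSACK.
    if rack.dsack_round is None and ack_has_dsack:
        rack.dsack_round = snd_nxt
        rack.reo_wnd_mult += 1
        rack.reo_wnd_persist = 16
    elif exiting_recovery:
        rack.reo_wnd_persist -= 1
        if rack.reo_wnd_persist <= 0:
            rack.reo_wnd_mult = 1
    if not rack.reordering_seen:
        if in_recovery or rack.segs_sacked >= DUP_THRESH:
            return 0.0
    return min(rack.min_rtt / 4 * rack.reo_wnd_mult, rack.srtt)


rack = Rack(min_rtt=40, srtt=50, reordering_seen=True)
w0 = rack_update_reo_wnd(rack, 0, 10000, False, False, False)
w1 = rack_update_reo_wnd(rack, 0, 10000, True, False, False)
w2 = rack_update_reo_wnd(rack, 5000, 12000, True, False, False)
```

   Here the second DSACK arrives within the same round trip as the
   first (SND.UNA has not yet reached RACK.dsack_round), so the window
   grows only once.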

   Step 5: Detect losses.

   For each segment that has not been SACKed, RACK considers that
   segment lost if another segment that was sent later has been
   delivered, and the reordering window has passed.  RACK considers the
   reordering window to have passed if the RACK.segment was sent
   sufficiently after the segment in question, or a sufficient time has
   elapsed since the RACK.segment was S/ACKed, or some combination of
   the two.  More precisely, RACK marks a segment lost if:

    RACK.xmit_ts >= Segment.xmit_ts
           AND
    (RACK.xmit_ts - Segment.xmit_ts) + (now - RACK.ack_ts) >= RACK.reo_wnd

   Solving this second condition for "now", the moment at which a
   segment is marked lost, yields:

   now >= Segment.xmit_ts + RACK.reo_wnd + (RACK.ack_ts - RACK.xmit_ts)

   Then (RACK.ack_ts - RACK.xmit_ts) is the round trip time of the most
   recently (re)transmitted segment that's been delivered.  When
   segments are delivered in order, the most recently (re)transmitted
   segment that's been delivered is also the most recently delivered,
   hence RACK.rtt == RACK.ack_ts - RACK.xmit_ts.  But if segments were
   reordered, then the segment delivered most recently was sent before
   the most recently (re)transmitted segment.  Hence RACK.rtt >
   (RACK.ack_ts - RACK.xmit_ts).

   Since RACK.rtt >= (RACK.ack_ts - RACK.xmit_ts), the previous equation
   reduces to saying that the sender can declare a segment lost when:

   now >= Segment.xmit_ts + RACK.reo_wnd + RACK.rtt

   In turn, that is equivalent to stating that a RACK sender should
   declare a segment lost when:

   Segment.xmit_ts + RACK.rtt + RACK.reo_wnd - now <= 0

   Note that if the value on the left hand side is positive, it
   represents the remaining wait time before the segment is deemed lost.
   But this risks a timeout (RTO) if no more ACKs come back (e.g., due to
   losses or application-limited transmissions) to trigger the marking.
   For timely loss detection, the sender is RECOMMENDED to install a
   reordering timer.  This timer expires at the earliest moment when
   RACK would conclude that all the unacknowledged segments within the
   reordering window were lost.

   The following pseudocode implements the algorithm above.  When an ACK
   is received or the RACK reordering timer expires, call
   RACK_detect_loss_and_arm_timer().  The algorithm breaks timestamp
   ties by using the TCP sequence space, since high-speed networks often
   have multiple segments with identical timestamps.

   RACK_detect_loss():
       timeout = 0
       RACK.reo_wnd = RACK_update_reo_wnd()
       For each segment, Segment, not acknowledged yet:
           If Segment.lost is TRUE AND Segment.retransmitted is FALSE:
               Continue /* Segment lost but not yet retransmitted */

           If RACK_sent_after(RACK.xmit_ts, RACK.end_seq,
                              Segment.xmit_ts, Segment.end_seq):
               remaining = Segment.xmit_ts + RACK.rtt +
                           RACK.reo_wnd - Now()
               If remaining <= 0:
                   Segment.lost = TRUE
               Else:
                   timeout = max(remaining, timeout)
       Return timeout

   RACK_detect_loss_and_arm_timer():
       timeout = RACK_detect_loss()
       If timeout != 0:
           Arm the RACK timer to call
           RACK_detect_loss_and_arm_timer() after timeout
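
   A minimal Python sketch of the loss-marking pass follows.  The Seg
   class and the flat parameter list are assumptions made for this
   sketch.  The sent_after() helper stands in for RACK_sent_after(),
   which is not defined in this step; it is assumed to compare transmit
   times and break ties by sequence number, as the text above
   describes.  Times are in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Seg:
    xmit_ts: float          # time of the most recent (re)transmission
    end_seq: int
    lost: bool = False
    retransmitted: bool = False


def sent_after(t1, seq1, t2, seq2):
    """True if (t1, seq1) was sent after (t2, seq2); sequence breaks ties."""
    return t1 > t2 or (t1 == t2 and seq1 > seq2)


def rack_detect_loss(unacked, rack_xmit_ts, rack_end_seq, rack_rtt,
                     reo_wnd, now):
    """Mark lost segments; return the reordering-timer delay (0 = none)."""
    timeout = 0.0
    for seg in unacked:
        if seg.lost and not seg.retransmitted:
            continue                        # already marked, not yet resent
        if sent_after(rack_xmit_ts, rack_end_seq, seg.xmit_ts, seg.end_seq):
            remaining = seg.xmit_ts + rack_rtt + reo_wnd - now
            if remaining <= 0:
                seg.lost = True
            else:
                timeout = max(remaining, timeout)
    return timeout


p1 = Seg(xmit_ts=0, end_seq=1000)
p2 = Seg(xmit_ts=8, end_seq=2000)
timeout = rack_detect_loss([p1, p2], rack_xmit_ts=10, rack_end_seq=3000,
                           rack_rtt=40, reo_wnd=10, now=50)
```

   With these numbers p1 has waited out RACK.rtt plus the reordering
   window and is marked lost, while p2 still has 8 ms remaining, so the
   caller would arm the reordering timer for 8 ms.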

   As an optimization, an implementation can choose to check only
   segments that have been sent before RACK.xmit_ts.  This can be more
   efficient than scanning the entire SACK scoreboard, especially when
   there are many segments in flight.  The implementation can use a
   separate doubly-linked list ordered by Segment.xmit_ts and inserts a
   segment at the tail of the list when it is (re)transmitted, and
   removes a segment from the list when it is delivered or marked lost.
   In Linux TCP this optimization improved CPU usage by orders of
   magnitude during some fast recovery episodes on high-speed WAN
   networks.
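
   One way to picture the time-ordered list is the Python sketch
   below.  It substitutes an OrderedDict for the doubly-linked list, so
   it is an analogy and not the Linux data structure.  The class and
   method names are hypothetical, transmit timestamps are assumed to be
   non-decreasing, and timestamp ties are ignored for brevity.

```python
from collections import OrderedDict


class XmitOrderList:
    """Unacknowledged segments ordered by most recent transmit time."""

    def __init__(self):
        self._segs = OrderedDict()          # end_seq -> xmit_ts, oldest first

    def on_transmit(self, end_seq, xmit_ts):
        # A (re)transmission is the newest send, so it moves to the tail.
        self._segs.pop(end_seq, None)
        self._segs[end_seq] = xmit_ts

    def on_delivered_or_lost(self, end_seq):
        self._segs.pop(end_seq, None)

    def sent_before(self, rack_xmit_ts):
        """Yield end_seq of segments sent before rack_xmit_ts, then stop."""
        for end_seq, xmit_ts in self._segs.items():
            if xmit_ts >= rack_xmit_ts:
                break                       # the rest of the list is newer
            yield end_seq


lst = XmitOrderList()
lst.on_transmit(1000, 0)
lst.on_transmit(2000, 1)
lst.on_transmit(3000, 2)
lst.on_transmit(1000, 5)                    # retransmission moves to the tail
candidates = list(lst.sent_before(3))
```

   The scan stops at the first segment that is not older than
   RACK.xmit_ts, so its cost depends on the number of candidate
   segments and not on the size of the whole scoreboard.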

6.3.  Upon RTO expiration

   Upon RTO timer expiration, RACK marks the first outstanding segment
   as lost (since it was sent an RTO ago); for all the other segments
   RACK only marks the segment lost if the time elapsed since the
   segment was transmitted is at least the sum of the recent RTT and the
   reordering window.

   RACK_mark_losses_on_RTO():
       For each segment, Segment, not acknowledged yet:
           If Segment.seq == SND.UNA OR
              Segment.xmit_ts + RACK.rtt + RACK.reo_wnd - Now() <= 0:
               Segment.lost = TRUE

7.  TLP Algorithm Details

7.1.  Initializing and resetting state

   Reset TLP.is_retrans and TLP.end_seq when initiating a connection,
   fast recovery, or RTO recovery.

       TLP.is_retrans = false
       TLP.end_seq = None

7.2.  Scheduling a loss probe

   The sender schedules a loss probe timeout (PTO) to transmit a segment
   during the normal transmission process.  The sender SHOULD start or
   restart a loss probe PTO timer after transmitting new data (that was
   not itself a loss probe) or upon receiving an ACK that cumulatively
   acknowledges new data, unless it is already in fast recovery, RTO
   recovery, or the sender has segments delivered out-of-order (i.e.
   RACK.segs_sacked is not zero).  These conditions are excluded because
   they are addressed by similar mechanisms, like Limited Transmit
   [RFC3042], the RACK reordering timer, and F-RTO [RFC5682].  Further,
   prior to scheduling a PTO the sender SHOULD cancel any pending PTO,
   RTO, RACK reordering timer, or zero window probe (ZWP) timer
   [RFC793].

   The sender calculates the PTO interval by taking into account a
   number of factors.

   First, the default PTO interval is 2*SRTT.  By that time, it is
   prudent to declare that an ACK is overdue, since under normal
   circumstances, i.e. no losses, an ACK typically arrives in one SRTT.
   Choosing PTO to be exactly an SRTT would risk causing spurious
   probes, given that network and end-host delay variance can cause an
   ACK to be delayed beyond SRTT.  Hence the PTO is conservatively
   chosen to be the next integral multiple of SRTT.

   Second, when there is no SRTT estimate available, the PTO SHOULD be 1
   second.  This conservative value corresponds to the RTO value when no
   SRTT is available, per [RFC6298].

   Third, when FlightSize is one segment, the sender MAY inflate PTO by
   TLP.max_ack_delay to accommodate a potential delayed acknowledgment
   and reduce the risk of spurious retransmissions.  The actual value of
   TLP.max_ack_delay is implementation-specific.

   Finally, if the time at which an RTO would fire (here denoted
   "TCP_RTO_expiration()") is sooner than the computed time for the PTO,
   then the sender schedules a TLP to be sent at that RTO time.

   Summarizing these considerations in pseudocode form, a sender SHOULD
   use the following logic to select the duration of a PTO:

   TLP_calc_PTO():
       If SRTT is available:
           PTO = 2 * SRTT
           If FlightSize is one segment:
              PTO += TLP.max_ack_delay
       Else:
           PTO = 1 sec

       If Now() + PTO > TCP_RTO_expiration():
           PTO = TCP_RTO_expiration() - Now()
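
   The PTO calculation can be checked with a few numbers using the
   Python sketch below.  The function signature is an assumption made
   for illustration: all times are passed in milliseconds, srtt is None
   when no estimate exists, and rto_expiration is the absolute time at
   which the RTO would fire.

```python
def tlp_calc_pto(srtt, flight_size_segs, max_ack_delay, now, rto_expiration):
    """Return the PTO interval in milliseconds."""
    if srtt is not None:
        pto = 2 * srtt
        if flight_size_segs == 1:
            pto += max_ack_delay        # allow for a delayed ACK
    else:
        pto = 1000                      # 1 second when no SRTT estimate exists
    if now + pto > rto_expiration:
        pto = rto_expiration - now      # never fire later than the RTO
    return pto


normal = tlp_calc_pto(50, 4, 25, now=0, rto_expiration=1000)
single = tlp_calc_pto(50, 1, 25, now=0, rto_expiration=1000)
no_srtt = tlp_calc_pto(None, 3, 25, now=0, rto_expiration=5000)
capped = tlp_calc_pto(50, 4, 25, now=0, rto_expiration=80)
```

   With an SRTT of 50 ms the probe fires after 100 ms, or after 125 ms
   when a single segment is in flight and the assumed
   TLP.max_ack_delay is 25 ms.  The last call shows the PTO being
   capped by an RTO that would expire sooner.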

7.3.  Sending a loss probe upon PTO expiration

   When the PTO timer expires, the sender SHOULD transmit a previously
   unsent data segment, if the receive window allows, and increment the
   FlightSize accordingly.  Note that FlightSize could be one packet
   greater than the congestion window temporarily until the next ACK
   arrives.

   If such a segment is not available, then the sender SHOULD retransmit
   the highest-sequence segment sent so far and set TLP.is_retrans to
   true.  This segment is chosen in order to deal with the
   retransmission ambiguity problem in TCP.  Suppose a sender sends N
   segments, and then retransmits the last segment (segment N) as a loss
   probe, and then the sender receives a SACK for segment N.  As long as
   the sender waits for the RACK reordering window to expire, it doesn't
   matter if that SACK was for the original transmission of segment N or
   the TLP retransmission; in either case the arrival of the SACK for
   segment N provides evidence that the N-1 segments preceding segment N
   were likely lost.

   In the case where there is only one original outstanding segment of
   data (N=1), the same logic (trivially) applies: an ACK for a single
   outstanding segment tells the sender the N-1=0 segments preceding
   that segment were lost.  Furthermore, whether there are N>1 or N=1
   outstanding segments, there is a question about whether the original
   last segment or its TLP retransmission were lost; the sender
   estimates whether there was such a loss using TLP recovery detection
   (see below).

   The sender MUST follow the RACK transmission procedures in the ``Upon
   Transmitting a Data Segment'' section (see above) upon sending either
   a retransmission or new data loss probe.  This is critical for
   detecting losses using the ACK for the loss probe.  Furthermore,
   prior to sending a loss probe, the sender MUST check that there is no
   other previous loss probe still in flight.  This ensures that at any
   given time the sender has at most one additional packet in flight
   beyond the congestion window limit.  This invariant is maintained
   using the state variable TLP.end_seq, which indicates the latest
   unacknowledged TLP loss probe's ending sequence.  It is reset when
   the loss probe has been acknowledged or is deemed lost or irrelevant.
   After attempting to send a loss probe, regardless of whether a loss
   probe was sent, the sender MUST re-arm the RTO timer, not the PTO
   timer, if FlightSize is not zero.  This ensures RTO recovery remains
   the last resort if TLP fails.  The following pseudocode summarizes
   the operations.

   TLP_send_probe():

       If TLP.end_seq is None:
           TLP.is_retrans = false
           Segment = send buffer segment starting at SND.NXT
           If Segment exists and fits the peer receive window limit:
              /* Transmit the lowest-sequence unsent Segment */
              Transmit Segment
              RACK_transmit_data(Segment)
              TLP.end_seq = SND.NXT
              Increase FlightSize by Segment length
           Else:
              /* Retransmit the highest-sequence Segment sent */
              Segment = send buffer segment ending at SND.NXT
              Transmit Segment
              RACK_retransmit_data(Segment)
              TLP.end_seq = SND.NXT
              TLP.is_retrans = true

7.4.  Detecting losses by the ACK of the loss probe

   When there is packet loss in a flight ending with a loss probe, the
   feedback solicited by a loss probe will reveal one of two scenarios,
   depending on the pattern of losses.

7.4.1.  General case: detecting packet losses using RACK

   If the loss probe and the ACK that acknowledges the probe are
   delivered successfully, RACK-TLP uses this ACK -- just as it would
   with any other ACK -- to detect if any segments sent prior to the
   probe were dropped.  RACK would typically infer that any
   unacknowledged data segments sent before the loss probe were lost,
   since they were sent sufficiently far in the past (at least one PTO
   has elapsed, plus one round-trip for the loss probe to be ACKed).
   More specifically, RACK_detect_loss() (step 5) would mark those
   earlier segments as lost.  Then the sender would trigger a fast
   recovery to recover those losses.

7.4.2.  Special case: detecting a single loss repaired by the loss probe

   If the TLP retransmission repairs all the lost in-flight sequence
   ranges (i.e. only the last segment in the flight was lost), the ACK
   for the loss probe appears to be a regular cumulative ACK, which
   would not normally trigger the congestion control response to this
   packet loss event.  The following TLP recovery detection mechanism
   examines ACKs to detect this special case to make congestion control
   respond properly [RFC5681].

   After a TLP retransmission, the sender checks for this special case
   of a single loss that is recovered by the loss probe itself.  To
   accomplish this, the sender checks for a duplicate ACK or DSACK
   indicating that both the original segment and TLP retransmission
   arrived at the receiver, meaning there was no loss.  If the TLP
   sender does not receive such an indication, then it SHOULD assume
   that either the original data segment or the TLP retransmission were
   lost, for congestion control purposes.

   If the TLP retransmission is spurious, a receiver that uses DSACK
   would return an ACK that covers TLP.end_seq with a DSACK option (Case
   1).  If the receiver does not support DSACK, it would return a DUPACK
   without any SACK option (Case 2).  If the sender receives an ACK
   matching either case, then the sender estimates that the receiver
   received both the original data segment and the TLP probe
   retransmission, and so the sender considers the TLP episode to be
   done, and records that fact by setting TLP.end_seq to None.

   Upon receiving an ACK that covers some sequence number after
   TLP.end_seq, the sender should have received any ACKs for the
   original segment and TLP probe retransmission segment.  At that time,
   if the TLP.end_seq is still set, and thus indicates that the TLP
   probe retransmission remains unacknowledged, then the sender should
   presume that at least one of its data segments was lost.  The sender
   then SHOULD invoke a congestion control response equivalent to a fast
   recovery.

   More precisely, on each ACK the sender executes the following:

   TLP_process_ack(ACK):
       If TLP.end_seq is not None AND ACK.seq >= TLP.end_seq:
           If not TLP.is_retrans:
               TLP.end_seq = None    /* TLP of new data delivered */
           Else if ACK has a DSACK option matching TLP.end_seq:
               TLP.end_seq = None    /* Case 1, above */
           Else If ACK.seq > TLP.end_seq:
               TLP.end_seq = None    /* Repaired the single loss */
               (Invoke congestion control to react on
                the loss event the probe has repaired)
           Else If ACK is a DUPACK without any SACK option:
               TLP.end_seq = None     /* Case 2, above */
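
   The branches above can be traced with the following Python sketch.
   The Tlp class, the boolean parameters, and the convention of
   returning True when congestion control should react are all
   assumptions made for illustration; a real sender would derive the
   two boolean flags from the ACK's SACK and DSACK options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tlp:
    end_seq: Optional[int] = None   # ending sequence of the probe in flight
    is_retrans: bool = False        # True if the probe was a retransmission


def tlp_process_ack(tlp, ack_seq, dsack_matches_probe, is_dupack_without_sack):
    """Return True when congestion control should react to a repaired loss."""
    if tlp.end_seq is None or ack_seq < tlp.end_seq:
        return False
    if not tlp.is_retrans:
        tlp.end_seq = None          # probe carried new data
    elif dsack_matches_probe:
        tlp.end_seq = None          # Case 1: spurious probe, DSACK receiver
    elif ack_seq > tlp.end_seq:
        tlp.end_seq = None          # probe repaired a single loss
        return True
    elif is_dupack_without_sack:
        tlp.end_seq = None          # Case 2: spurious probe, no DSACK support
    return False


repaired = Tlp(end_seq=5000, is_retrans=True)
react = tlp_process_ack(repaired, 6000, False, False)

spurious = Tlp(end_seq=5000, is_retrans=True)
no_react = tlp_process_ack(spurious, 5000, True, False)
```

   In the first trace the cumulative ACK moves past the probe with no
   sign that the receiver saw two copies, so the sketch reports a loss
   for congestion control.  In the second the DSACK shows that the
   probe was spurious, and the episode ends without a congestion
   response.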

8.  Discussion

8.1.  Advantages and disadvantages

   The biggest advantage of RACK-TLP is that every data segment, whether
   it is an original data transmission or a retransmission, can be used
   to detect losses of the segments sent chronologically prior to it.
   This enables RACK-TLP to use fast recovery in cases with
   application-limited flights of data, lost retransmissions, or data
   segment reordering events.  Consider the following examples:

   1.  Packet drops at the end of an application data flight: Consider a
       sender that transmits an application-limited flight of three data
       segments (P1, P2, P3), and P1 and P3 are lost.  Suppose the
       transmission of each segment is at least RACK.reo_wnd after the
       transmission of the previous segment.  RACK will mark P1 as lost
       when the SACK of P2 is received, and this will trigger the
       retransmission of P1 as R1.  When R1 is cumulatively
       acknowledged, RACK will mark P3 as lost and the sender will
       retransmit P3 as R3.  This example illustrates how RACK is able
       to repair certain drops at the tail of a transaction without an
       RTO recovery.  Notice that neither the conventional duplicate ACK
       threshold [RFC5681], nor [RFC6675], nor the Forward
       Acknowledgment [FACK] algorithm can detect such losses, because
       of the required segment or sequence count.

   2.  Lost retransmission: Consider a flight of three data segments
       (P1, P2, P3) that are sent; P1 and P2 are dropped.  Suppose the
       transmission of each segment is at least RACK.reo_wnd after the
       transmission of the previous segment.  When P3 is SACKed, RACK
       will mark P1 and P2 lost and they will be retransmitted as R1 and
       R2.  Suppose R1 is lost again but R2 is SACKed; RACK will mark R1
       lost and trigger retransmission again.  Again, neither the
       conventional three duplicate ACK threshold approach, nor
       [RFC6675], nor the Forward Acknowledgment [FACK] algorithm can
       detect such losses.  Such a lost retransmission can happen when
       TCP is being rate-limited, particularly by token bucket policers
       with a large bucket depth and a low rate limit; in such cases
       retransmissions are often lost repeatedly because standard
       congestion control requires multiple round trips to reduce the
       rate below the policed rate.

   3.  Packet reordering: Consider a simple reordering event where a
       flight of segments is sent as (P1, P2, P3).  P1 and P2 carry a
       full payload of MSS octets, but P3 has only a 1-octet payload.
       Suppose the sender has detected reordering previously and thus
       RACK.reo_wnd is min_RTT/4.  Now P3 is reordered and delivered
       first, before P1 and P2.  As long as P1 and P2 are delivered
       within min_RTT/4, RACK will not consider P1 and P2 lost.  But if
       P1 and P2 are delivered outside the reordering window, then RACK
       will still spuriously mark P1 and P2 lost.
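
   Examples 1 and 2 above can be replayed with a small Python sketch.
   It is a deliberately simplified model, not the full algorithm: it
   assumes a segment is marked lost as soon as some segment transmitted
   at least reo_wnd later has been delivered, and it omits the
   reordering timer.  The Segment class and the helper names are
   hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    xmit_ts: float        # time of the most recent (re)transmission, in ms
    acked: bool = False
    lost: bool = False


def retransmit(seg, now):
    """Model a retransmission: new transmit time, loss mark cleared."""
    seg.xmit_ts = now
    seg.lost = False


def rack_on_delivery(segments, delivered_name, reo_wnd):
    """Record that `delivered_name` was (S)ACKed; return names newly marked lost."""
    for seg in segments:
        if seg.name == delivered_name:
            seg.acked = True
    # Transmit time of the most recently sent segment known to be delivered.
    rack_xmit_ts = max(s.xmit_ts for s in segments if s.acked)
    newly_lost = []
    for seg in segments:
        sent_long_before = seg.xmit_ts + reo_wnd <= rack_xmit_ts
        if not seg.acked and not seg.lost and sent_long_before:
            seg.lost = True
            newly_lost.append(seg.name)
    return newly_lost


def make_flight():
    """Three segments sent 10 ms apart (an assumed spacing above reo_wnd)."""
    return [Segment("P1", 0.0), Segment("P2", 10.0), Segment("P3", 20.0)]
```

   For example 1, delivering P2 with reo_wnd = 5 marks only P1 lost;
   after P1 is retransmitted and acknowledged, P3 is marked lost because
   it was sent well before the retransmission.  For example 2,
   delivering P3 marks P1 and P2 lost, and a later SACK of the
   retransmitted P2 marks the retransmitted P1 lost again.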

   The examples above show that RACK-TLP is particularly useful when the
   sender is limited by the application, which can happen with
   interactive or request/response traffic.  Similarly, RACK still works
   when the sender is limited by the receive window, which can happen
   with applications that use the receive window to throttle the sender.

   RACK-TLP works more efficiently with TCP Segmentation Offload (TSO)
   compared to DUPACK-counting.  RACK always marks the entire TSO
   aggregate lost because the segments in the same TSO aggregate have
   the same transmission timestamp.  By contrast, the algorithms based
   on sequence counting (e.g., [RFC6675][RFC5681]) may mark only a
   subset of segments in the TSO aggregate lost, forcing the stack to
   perform expensive fragmentation of the TSO aggregate, or to
   selectively tag individual segments lost in the scoreboard.

   The main drawback of RACK-TLP is the additional state required
   compared to DUPACK-counting.  RACK requires the sender to record the
   transmission time of each segment sent at a clock granularity that is
   finer than 1/4 of the minimum RTT of the connection.  TCP
   implementations that record this already for RTT estimation do not
   require any new per-packet state.  But implementations that are not
   yet recording segment transmission times will need to add per-packet
   internal state (expected to be either 4 or 8 octets per segment or
   TSO aggregate) to track transmission times.  In contrast, the
   [RFC6675] loss detection approach does not require any per-packet
   state beyond the SACK scoreboard; this is particularly useful on
   ultra-low RTT networks where the RTT may be less than the sender TCP
   clock granularity (e.g., inside data-centers).

8.2.  Relationships with other loss recovery algorithms

   The primary motivation of RACK-TLP is to provide a general
   alternative to some of the standard loss recovery algorithms
   [RFC5681][RFC6675][RFC5827][RFC4653].  [RFC5827][RFC4653] dynamically
   adjust the duplicate ACK threshold based on the current or previous
   flight sizes.  RACK-TLP takes a different approach by using a
   time-based reordering window.  RACK-TLP can be seen as an extended
   Early Retransmit [RFC5827] without a FlightSize limit but with an
   additional reordering window.  [FACK] considers an original segment
   to be lost when its sequence range is sufficiently far below the
   highest SACKed sequence.  In some sense RACK-TLP can be seen as a
   generalized form of FACK that operates in time space instead of
   sequence space, enabling it to better handle reordering,
   application-limited traffic, and lost retransmissions.

   RACK-TLP is compatible with the standard RTO [RFC6298], RTO-restart
   [RFC7765], F-RTO [RFC5682] and Eifel algorithms [RFC3522].  This is
   because RACK-TLP only detects loss by using ACK events.  It neither
   changes the RTO timer calculation nor detects spurious RTO.

8.3.  Interaction with congestion control

   RACK-TLP intentionally decouples loss detection from congestion
   control.  RACK-TLP only detects losses; it does not modify the
   congestion control algorithm [RFC5681][RFC6937].  A segment marked
   lost by RACK-TLP MUST NOT be retransmitted until congestion control
   deems this appropriate.

   The only exception -- the only way in which RACK-TLP modulates the
   congestion control algorithm -- is that one outstanding loss probe
   can be sent even if the congestion window is full.  However, this
   temporary over-commit is accounted for and credited in the in-flight
   data tracked for congestion control, so that congestion control will
   erase the over-commit upon the next ACK.

   If packet losses happen after the reordering window has been
   increased by DSACK, RACK-TLP may take longer to detect losses than
   the pure DUPACK-counting approach.  In this case TCP may continue to
   increase the congestion window upon receiving ACKs during this time,
   making the sender more aggressive.

   The following simple example compares how RACK-TLP and non-RACK-TLP
   loss detection interact with congestion control: suppose a sender
   has a congestion window (cwnd) of 20 segments on a SACK-enabled
   connection.  It sends 10 data segments and all of them are lost.

   Without RACK-TLP, the sender would time out, reset cwnd to 1, and
   retransmit the first segment.  It would take four round trips (1 + 2
   + 4 + 3 = 10) to retransmit all the 10 lost segments using slow
   start.  The recovery latency would be RTO + 4*RTT, with an ending
   cwnd of 4 segments due to congestion window validation.

   With RACK-TLP, a sender would send the TLP after 2*RTT and get a
   DUPACK, enabling RACK to detect the losses and trigger fast recovery.
   If the sender implements Proportional Rate Reduction [RFC6937] it
   would slow start to retransmit the remaining 9 lost segments since
   the number of segments in flight (0) is lower than the slow start
   threshold (10).  The slow start would again take four round trips (1
   + 2 + 4 + 3 = 10) to retransmit all the lost segments.  The recovery
   latency would be 2*RTT + 4*RTT, with an ending cwnd set to the slow
   start threshold of 10 segments.

   The difference in recovery latency (RTO + 4*RTT vs 6*RTT) can be
   significant if the RTT is much smaller than the minimum RTO (1 second
   in [RFC6298]) or if the RTT is large.  The former case can happen in
   local area networks, data-center networks, or content distribution
   networks with deep deployments.  The latter case can happen in
   developing regions with highly congested and/or high-latency
   networks.
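
   The arithmetic in this example can be checked with a short Python
   sketch.  It assumes an idealized slow start in which cwnd starts at
   one segment and doubles every round trip; the function names and the
   RTT and RTO values used in the usage note are illustrative
   assumptions.

```python
def slow_start_rounds(n_segments, initial_cwnd=1):
    """Round trips needed to send n_segments when cwnd doubles each round."""
    rounds, sent, cwnd = 0, 0, initial_cwnd
    while sent < n_segments:
        sent += min(cwnd, n_segments - sent)   # send up to one cwnd per round
        cwnd *= 2
        rounds += 1
    return rounds


def recovery_latency_ms(rtt_ms, rto_ms, with_rack_tlp, lost_segments=10):
    """Latency to repair a fully lost flight, following the example above."""
    # With RACK-TLP the probe is sent after 2*RTT; otherwise the sender
    # has to wait for the retransmission timeout to fire.
    trigger_ms = 2 * rtt_ms if with_rack_tlp else rto_ms
    return trigger_ms + slow_start_rounds(lost_segments) * rtt_ms
```

   With an assumed RTT of 10 ms and the 1 second minimum RTO,
   recovery_latency_ms(10, 1000, False) gives 1040 ms while
   recovery_latency_ms(10, 1000, True) gives 60 ms, which matches
   RTO + 4*RTT versus 6*RTT.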

8.4.  TLP recovery detection with delayed ACKs

   Delayed ACKs complicate the detection of repairs done by TLP, since
   with a delayed ACK the sender receives one fewer ACK than would
   normally be expected.  To mitigate this complication, before sending
   a TLP loss probe retransmission, the sender should attempt to wait
   long enough that the receiver has sent any delayed ACKs that it is
   withholding.  The sender algorithm described above features such a
   delay, in the form of TLP.max_ack_delay.  Furthermore, if the
   receiver supports DSACK then in the case of a delayed ACK the
   sender's TLP recovery detection mechanism (see above) can use the
   DSACK information to infer that the original and TLP retransmission
   both arrived at the receiver.

   If there is ACK loss or a delayed ACK without a DSACK, then this
   algorithm is conservative, because the sender will reduce the
   congestion window when in fact there was no packet loss.  In practice
   this is acceptable, and potentially even desirable: if there is
   reverse path congestion then reducing the congestion window can be
   prudent.

8.5.  RACK for other transport protocols

   RACK can be implemented in other transport protocols (e.g.,
   [QUIC-LR]).  The [Sprout] loss detection algorithm was also
   independently designed to use a 10ms reordering window to improve its
   loss detection.

9.  Security Considerations

   RACK-TLP algorithm behavior is based on information conveyed in SACK
   options, so it has security considerations similar to those described
   in the Security Considerations section of [RFC6675].

   Additionally, RACK-TLP has a lower risk profile than [RFC6675]
   because it is not vulnerable to ACK-splitting attacks [SCWA99]: for
   an MSS-size segment sent, the receiver or the attacker might send MSS
   ACKs that SACK or acknowledge one additional byte per ACK.  This
   would not fool RACK.  In such a scenario, RACK.xmit_ts would not
   advance, because all the sequence ranges within the segment were
   transmitted at the same time, and thus carry the same transmission
   timestamp.  In other words, SACKing only one byte of a segment or
   SACKing the segment in entirety have the same effect with RACK.
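
   The ACK-splitting argument can be illustrated with a small Python
   sketch.  It is a simplified model under stated assumptions: segments
   and SACK blocks are represented as half-open byte ranges, and
   rack_xmit_ts_after is a hypothetical helper that tracks only the
   latest transmission timestamp among segments touched by any SACK.

```python
def rack_xmit_ts_after(sacks, segments):
    """Return the RACK.xmit_ts implied by a list of SACK blocks.

    sacks:    list of (start, end) byte ranges, end exclusive
    segments: list of (start, end, xmit_ts) for transmitted segments
    """
    rack_xmit_ts = None
    for s_start, s_end in sacks:
        for g_start, g_end, ts in segments:
            overlaps = s_start < g_end and g_start < s_end
            if overlaps and (rack_xmit_ts is None or ts > rack_xmit_ts):
                # Every byte of a segment shares one transmit timestamp,
                # so a partial SACK advances RACK.xmit_ts no further
                # than a SACK of the whole segment would.
                rack_xmit_ts = ts
    return rack_xmit_ts


MSS = 1460                                   # assumed segment size in bytes
one_segment = [(0, MSS, 100.0)]              # sent at t = 100.0 ms
split_acks = [(i, i + 1) for i in range(MSS)]
whole_ack = [(0, MSS)]
```

   Both rack_xmit_ts_after(split_acks, one_segment) and
   rack_xmit_ts_after(whole_ack, one_segment) return 100.0, even though
   the first case delivers 1460 separate acknowledgments that a
   DUPACK-counting sender could be tricked into counting.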

10.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

11.  Acknowledgments

   The authors thank Matt Mathis for his insights in FACK and Michael
   Welzl for his per-packet timer idea that inspired this work.  Eric
   Dumazet, Randy Stewart, Van Jacobson, Ian Swett, Rick Jones, Jana
   Iyengar, Hiren Panchasara, Praveen Balasubramanian, Yoshifumi
   Nishida, Bob Briscoe, Felix Weinrank, Michael Tuexen, Martin Duke,
   Ilpo Jarvinen, Theresa Enghardt, Mirja Kuehlewind, Gorry Fairhurst,
   and Yi Huang contributed to the draft or the implementations in
   Linux, FreeBSD, Windows, and QUIC.

12.  References

12.1.  Normative References

   [RFC2018]  Mathis, M. and J. Mahdavi, "TCP Selective Acknowledgment
              Options", RFC 2018, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", RFC 2119, March 1997.

   [RFC2883]  Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
              Extension to the Selective Acknowledgement (SACK) Option
              for TCP", RFC 2883, July 2000.

   [RFC4737]  Morton, A., Ciavattone, L., Ramachandran, G., Shalunov,
              S., and J. Perser, "Packet Reordering Metrics", RFC 4737,
              November 2006.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5682]  Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
              "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
              Spurious Retransmission Timeouts with TCP", RFC 5682,
              September 2009.

   [RFC5827]  Allman, M., Ayesta, U., Wang, L., Blanton, J., and P.
              Hurtig, "Early Retransmit for TCP and Stream Control
              Transmission Protocol (SCTP)", RFC 5827, April 2010.

   [RFC6298]  Paxson, V., Allman, M., Chu, J., and M. Sargent,
              "Computing TCP's Retransmission Timer", RFC 6298, June
              2011.

   [RFC6675]  Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
              and Y. Nishida, "A Conservative Loss Recovery Algorithm
              Based on Selective Acknowledgment (SACK) for TCP",
              RFC 6675, August 2012.

   [RFC6937]  Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional
              Rate Reduction for TCP", RFC 6937, May 2013.

   [RFC7323]  Borman, D., Braden, B., Jacobson, V., and R.
              Scheffenegger, "TCP Extensions for High Performance",
              RFC 7323, September 2014.

   [RFC793]   Postel, J., "Transmission Control Protocol", RFC 793,
              September 1981.

12.2.  Informative References

   [DMCG11]   Dukkipati, N., Mathis, M., Cheng, Y., and M. Ghobadi,
              "Proportional Rate Reduction for TCP", May 2013.

   [FACK]     Mathis, M. and J. Mahdavi, "Forward acknowledgement:
              refining TCP congestion control", ACM SIGCOMM Computer
              Communication Review, Volume 26, Issue 4, October 1996.

   [POLICER16]
              Flach, T., Papageorge, P., Terzis, A., Pedrosa, L., Cheng,
              Y., Karim, T., Katz-Bassett, E., and R. Govindan, "An
              Analysis of Traffic Policing in the Web", ACM SIGCOMM,
              2016.

   [QUIC-LR]  Iyengar, J. and I. Swett, "QUIC Loss Recovery And
              Congestion Control", draft-ietf-quic-recovery-latest (work
              in progress), March 2020.

   [RACK-TCPM97]
              Cheng, Y., "RACK: a time-based fast loss recovery", IETF97
              TCPM meeting, 2016.

   [RFC7765]  Hurtig, P., Brunstrom, A., Petlund, A., and M. Welzl, "TCP
              and SCTP RTO Restart", RFC 7765, February 2016.

   [SCWA99]   Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
              "TCP Congestion Control With a Misbehaving Receiver", ACM
              Computer Communication Review, 29(5), 1999.

   [Sprout]   Winstein, K., Sivaraman, A., and H. Balakrishnan,
              "Stochastic Forecasts Achieve High Throughput and Low
              Delay over Cellular Networks", USENIX Symposium on
              Networked Systems Design and Implementation (NSDI), 2013.

   [THIN-STREAM]
              Petlund, A., Evensen, K., Griwodz, C., and P. Halvorsen,
              "TCP enhancements for interactive thin-stream
              applications", NOSSDAV, 2008.

   [TLP]      Dukkipati, N., Cardwell, N., Cheng, Y., and M. Mathis,
              "Tail Loss Probe (TLP): An Algorithm for Fast Recovery of
              Tail Drops", draft-dukkipati-tcpm-tcp-loss-probe-01 (work
              in progress), August 2013.

Authors' Addresses

   Yuchung Cheng
   Google, Inc

   Email: ycheng@google.com

   Neal Cardwell
   Google, Inc

   Email: ncardwell@google.com

   Nandita Dukkipati
   Google, Inc

   Email: nanditad@google.com

   Priyaranjan Jha
   Google, Inc

   Email: priyarjha@google.com