Network Working Group                                          G. Almes
Internet Draft                                             S. Kalidindi
Expiration Date: March 1999                                M. Zekauskas
                                            Advanced Network & Services
                                                            August 1998

                    A One-way Delay Metric for IPPM
                    <draft-ietf-ippm-delay-04.txt>

1. Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months, and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

This memo provides information for the Internet community. This memo does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

2. Introduction

This memo defines a metric for one-way delay of packets across Internet paths. It builds on notions introduced and discussed in the IPPM Framework document, RFC 2330 [1]; the reader is assumed to be familiar with that document.

This memo is intended to be parallel in structure to a companion document for Packet Loss ("A Packet Loss Metric for IPPM" <draft-ietf-ippm-loss-04.txt>) [2].

The structure of the memo is as follows:

+ A 'singleton' analytic metric, called Type-P-One-way-Delay, will be introduced to measure a single observation of one-way delay.
+ Using this singleton metric, a 'sample', called Type-P-One-way-Delay-Poisson-Stream, will be introduced to measure a sequence of singleton delays measured at times taken from a Poisson process.

+ Using this sample, several 'statistics' of the sample will be defined and discussed.

This progression from singleton to sample to statistics, with clear separation among them, is important.

Whenever a technical term from the IPPM Framework document is first used in this memo, it will be tagged with a trailing asterisk. For example, "term*" indicates that "term" is defined in the Framework.

2.1. Motivation:

One-way delay of a Type-P* packet from a source host* to a destination host is useful for several reasons:

+ Some applications do not perform well (or at all) if end-to-end delay between hosts is large relative to some threshold value.

+ Erratic variation in delay makes it difficult (or impossible) to support many real-time applications.

+ The larger the value of delay, the more difficult it is for transport-layer protocols to sustain high bandwidths.

+ The minimum value of this metric provides an indication of the delay due only to propagation and transmission delay.

+ The minimum value of this metric provides an indication of the delay that will likely be experienced when the path* traversed is lightly loaded.

+ Values of this metric above the minimum provide an indication of the congestion present in the path.

It is outside the scope of this document to say precisely how delay metrics would be applied to specific problems.

2.2. General Issues Regarding Time

Whenever a time (i.e., a moment in history) is mentioned here, it is understood to be measured in seconds (and fractions) relative to UTC.

As described more fully in the Framework document, there are four distinct, but related notions of clock uncertainty:

synchronization*
     measures the extent to which two clocks agree on what time it is.
     For example, the clock on one host might be 5.4 msec ahead of the clock on a second host.

accuracy*
     measures the extent to which a given clock agrees with UTC. For example, the clock on a host might be 27.1 msec behind UTC.

resolution*
     measures the precision of a given clock. For example, the clock on an old Unix host might tick only once every 10 msec, and thus have a resolution of only 10 msec.

skew*
     measures the change of accuracy, or of synchronization, with time. For example, the clock on a given host might gain 1.3 msec per hour and thus be 27.1 msec behind UTC at one time and only 25.8 msec behind an hour later. In this case, we say that the clock of the given host has a skew of 1.3 msec per hour relative to UTC, and this threatens accuracy. We might also speak of the skew of one clock relative to another clock, and this threatens synchronization.

3. A Singleton Definition for One-way Delay

3.1. Metric Name:

Type-P-One-way-Delay

3.2. Metric Parameters:

+ Src, the IP address of a host

+ Dst, the IP address of a host

+ T, a time

3.3. Metric Units:

The value of a Type-P-One-way-Delay is either a non-negative real number, or an undefined (informally, infinite) number of seconds.

3.4. Definition:

For a non-negative real number dT, >>the *Type-P-One-way-Delay* from Src to Dst at T is dT<< means that Src sent the first bit of a Type-P packet to Dst at wire-time* T and that Dst received the last bit of that packet at wire-time T+dT.

>>The *Type-P-One-way-Delay* from Src to Dst at T is undefined (informally, infinite)<< means that Src sent the first bit of a Type-P packet to Dst at wire-time T and that Dst did not receive that packet.

Suggestions for what to report along with metric values appear in Section 3.8 after a discussion of the metric, methodologies for measuring the metric, and error analysis.

3.5. Discussion:

Type-P-One-way-Delay is a relatively simple analytic metric, and one that we believe will afford effective methods of measurement.

The following issues are likely to come up in practice:

+ Since delay values will often be as low as the 100 usec to 10 msec range, it will be important for Src and Dst to synchronize very closely. GPS systems afford one way to achieve synchronization to within several tens of usec. Ordinary application of NTP may allow synchronization to within several msec, but this depends on the stability and symmetry of delay properties among those NTP agents used, and this delay is what we are trying to measure. A combination of some GPS-based NTP servers and a conservatively designed and deployed set of other NTP servers should yield good results, but this is yet to be tested.

+ A given methodology will have to include a way to determine whether a delay value is infinite or whether it is merely very large (and the packet is yet to arrive at Dst). As noted by Mahdavi and Paxson [4], simple upper bounds (such as the 255 seconds theoretical upper bound on the lifetimes of IP packets [5]) could be used, but good engineering, including an understanding of packet lifetimes, will be needed in practice. {Comment: Note that, for many applications of these metrics, the harm in treating a large delay as infinite might be zero or very small. A TCP data packet, for example, that arrives only after several multiples of the RTT may as well have been lost.}

+ If the packet is duplicated along the path (or paths) so that multiple non-corrupt copies arrive at the destination, then the packet is counted as received, and the first copy to arrive determines the packet's one-way delay.
+ If the packet is fragmented and if, for whatever reason, reassembly does not occur, then the packet will be deemed lost.

3.6. Methodologies:

As with other Type-P-* metrics, the detailed methodology will depend on the Type-P (e.g., protocol number, UDP/TCP port number, size, precedence).

Generally, for a given Type-P, the methodology would proceed as follows:

+ Arrange that Src and Dst are synchronized; that is, that they have clocks that are very closely synchronized with each other and each fairly close to the actual time.

+ At the Src host, select Src and Dst IP addresses, and form a test packet of Type-P with these addresses. Any 'padding' portion of the packet needed only to make the test packet a given size should be filled with randomized bits to avoid a situation in which the measured delay is lower than it would otherwise be due to compression techniques along the path.

+ At the Dst host, arrange to receive the packet.

+ At the Src host, place a timestamp in the prepared Type-P packet, and send it towards Dst.

+ If the packet arrives within a reasonable period of time, take a timestamp as soon as possible upon the receipt of the packet. By subtracting the two timestamps, an estimate of one-way delay can be computed.
  Error analysis of a given implementation of the method must take into account the closeness of synchronization between Src and Dst. If the delay between Src's timestamp and the actual sending of the packet is known, then the estimate could be adjusted by subtracting this amount; uncertainty in this value must be taken into account in error analysis. Similarly, if the delay between the actual receipt of the packet and Dst's timestamp is known, then the estimate could be adjusted by subtracting this amount; uncertainty in this value must be taken into account in error analysis. See the next section, "Errors and Uncertainties", for a more detailed discussion.

+ If the packet fails to arrive within a reasonable period of time, the one-way delay is taken to be undefined (informally, infinite). Note that the threshold of 'reasonable' is a parameter of the methodology.

Issues such as the packet format, the means by which Dst knows when to expect the test packet, and the means by which Src and Dst are synchronized are outside the scope of this document. {Comment: We plan to document elsewhere our own work in describing such more detailed implementation techniques and we encourage others to do so as well.}

3.7. Errors and Uncertainties:

The description of any specific measurement method should include an accounting and analysis of various sources of error or uncertainty. The Framework document provides general guidance on this point, but we note here the following specifics related to delay metrics:

+ Errors or uncertainties due to uncertainties in the clocks of the Src and Dst hosts.

+ Errors or uncertainties due to the difference between 'wire time' and 'host time'.

In addition, the loss threshold may affect the results. Each of these is discussed in more detail below, along with a section ("Calibration") on accounting for these errors and uncertainties.

3.7.1. Errors or uncertainties related to Clocks

The uncertainty in a measurement of one-way delay is related, in part, to uncertainties in the clocks of the Src and Dst hosts. In the following, we refer to the clock used to measure when the packet was sent from Src as the source clock, we refer to the clock used to measure when the packet was received by Dst as the dest clock, we refer to the observed time when the packet was sent by the source clock as Tsource, and the observed time when the packet was received by the dest clock as Tdest. Alluding to the notions of synchronization, accuracy, resolution, and skew mentioned in the Introduction, we note the following:

+ Any error in the synchronization between the source clock and the dest clock will contribute to error in the delay measurement. We say that the source clock and the dest clock have a synchronization error of Tsynch if the source clock is Tsynch ahead of the dest clock. Thus, if we know the value of Tsynch exactly, we could correct for clock synchronization by adding Tsynch to the uncorrected value of Tdest-Tsource.

+ The accuracy of a clock is important only in identifying the time at which a given delay was measured. Accuracy, per se, has no importance to the accuracy of the measurement of delay.
  When computing delays, we are interested only in the differences between clock values, not the values themselves.

+ The resolution of a clock adds to uncertainty about any time measured with it. Thus, if the source clock has a resolution of 10 msec, then this adds 10 msec of uncertainty to any time value measured with it. We will denote the resolution of the source clock and the dest clock as Rsource and Rdest, respectively.

+ The skew of a clock is not so much an additional issue as it is a realization of the fact that Tsynch is itself a function of time. Thus, if we attempt to measure or to bound Tsynch, this needs to be done periodically. Over some periods of time, this function can be approximated as a linear function plus some higher order terms; in these cases, one option is to use knowledge of the linear component to correct the clock. Using this correction, the residual Tsynch is made smaller, but remains a source of uncertainty that must be accounted for. We use the function Esynch(t) to denote an upper bound on the uncertainty in synchronization. Thus, |Tsynch(t)| <= Esynch(t).

Taking these items together, we note that the naive computation Tdest-Tsource will be off by Tsynch(t) +/- (|Rsource|+|Rdest|). Using the notion of Esynch(t), we note that these clock-related problems introduce a total uncertainty of Esynch(t)+|Rsource|+|Rdest|. This estimate of total clock-related uncertainty should be included in the error/uncertainty analysis of any measurement implementation.

3.7.2. Errors or uncertainties related to Wire-time vs Host-time

As we have defined one-way delay, we would like to measure the time between when the test packet
leaves the network interface of Src and when it (completely) arrives at the network interface of Dst, and we refer to this as 'wire time'. If the timings are themselves performed by software on Src and Dst, however, then this software can only directly measure the time between when Src grabs a timestamp just prior to sending the test packet and when Dst grabs a timestamp just after having received the test packet, and we refer to this as 'host time'.

To the extent that the difference between wire time and host time is accurately known, this knowledge can be used to correct for host time measurements and the corrected value more accurately estimates the desired (wire time) metric.

To the extent, however, that the difference between wire time and host time is uncertain, this uncertainty must be accounted for in an analysis of a given measurement method. We denote by Hsource an upper bound on the uncertainty in the difference between wire time and
host time on the Src host, and similarly define Hdest for the Dst host. We then note that these problems introduce a total uncertainty of Hsource+Hdest. This estimate of total wire-vs-host uncertainty should be included in the error/uncertainty analysis of any measurement implementation.

3.7.3. Calibration

Generally, the measured values can be decomposed as follows:

     measured value = true value + systematic error + random error

If the systematic error (the constant bias in measured values) can be determined, it can be compensated for in the reported results.

     reported value = measured value - systematic error

therefore

     reported value = true value + random error

The goal of calibration is to determine the systematic and random error in as much detail as possible. At a minimum, a bound ("e") should be found such that the reported value is in the range (true value - e) to (true value + e) at least 95 percent of the time. We call "e" the error bar for the measurements. {Comment: 95 percent was chosen because (1) some confidence level is desirable to be able to remove outliers which will be found in measuring any physical property; (2) a particular confidence level should be specified so that the results of independent implementations can be compared; and (3) even with a prototype user-level implementation, 95% was loose enough to exclude outliers.}

From the
discussion in the previous two sections, the error in measurements could be bounded by determining all the individual uncertainties, and adding them together to form

     Esynch(t) + |Rsource| + |Rdest| + Hsource + Hdest.

However, reasonable bounds on both the clock-related uncertainty captured by the first three terms and the host-related uncertainty captured by the last two terms should be possible by careful design techniques and calibrating the instruments using a known, isolated, network in a lab.

For example, the clock-related uncertainties are greatly reduced through the use of a GPS time source. The sum of Esynch(t) + |Rsource| + |Rdest| is small, and is also bounded for the duration of the measurement because of the global time source.

The host-related uncertainties, Hsource + Hdest, could be bounded by connecting two instruments back-to-back with a high-speed serial link or isolated LAN (depending on the intended network connection for actual measurement), and performing repeated measurements.
In this case, unlike measuring live networks, repeated measurements are measuring the same wire time. (When measuring live networks, the wire time is what you are measuring, and varies with the load encountered on the path traversed by the test packets.)

If the test packets are small, such a network connection has a minimal wire time that may be approximated by zero. The measured delay therefore contains only systematic and random error in the instrumentation. The "average value" of repeated measurements is the systematic error, and the variation is the random error.

One way to compute the systematic error, and the random error to a 95% confidence, is to repeat the experiment many times - at least hundreds of tests. The systematic error would then be the median, and likely the mode (the most frequently occurring value). {Comment: It's likely the systematic error is represented by the minimum value (which is also the median and the mode); with unloaded instruments on a single test path all the random error will tend to be increased time due to host processing. The only error resulting in a delay less than the systematic error would be due to clock-related uncertainties (resolution and relative skew).}

The random error could then be found by removing the systematic error from the measured values. The 95% confidence interval would be the range from the 2nd percentile to the 97th percentile of these deviations from the true value.
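The calibration arithmetic described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not part of the metric definition: the names percentile and calibrate are hypothetical, delays are in any consistent unit, the true wire time of the back-to-back link is approximated by zero, and the Xth percentile is assumed to be the smallest value at or below which at least X percent of the values fall.

```python
import statistics


def percentile(sorted_values, pct):
    """Smallest value v such that at least pct percent of values are <= v.

    Assumed convention; pct is an integer from 0 to 100 and
    sorted_values must already be sorted in ascending order.
    """
    if not sorted_values:
        raise ValueError("empty sample")
    n = len(sorted_values)
    # Integer ceiling of pct * n / 100, clamped so that pct = 0 yields the minimum.
    rank = max(1, (pct * n + 99) // 100)
    return sorted_values[rank - 1]


def calibrate(measured_delays):
    """Return (systematic_error, low, high) for back-to-back calibration runs.

    systematic_error is the median of the measured delays; low and high are
    the 2nd and 97th percentiles of the deviations from that median.
    """
    systematic = statistics.median(measured_delays)
    deviations = sorted(d - systematic for d in measured_delays)
    return systematic, percentile(deviations, 2), percentile(deviations, 97)


# Hypothetical calibration data: 200 delays cycling through 100..109 usec.
delays = [100 + (i % 10) for i in range(200)]
systematic, low, high = calibrate(delays)
```

With this synthetic data the systematic error is 104.5 usec and the deviations range from -4.5 to +4.5 usec.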
The error bar "e" could then be taken to be the largest absolute value of these two numbers, plus the clock-related uncertainty. If all of the deviations are positive, then the 95% confidence interval is simply the 95th percentile, and that value should be used instead of the larger of the 2nd and 97th percentiles. {Comment: as described, this bound is relatively loose since the uncertainties are added, and the absolute value of the largest deviation is used. As long as the resulting value is not a significant fraction of the measured values, it is a reasonable bound. If the resulting value is a significant fraction of the measured values, then more exact methods will be needed to compute an error bar.}

Note that random error is a function of measurement load. For example, if many paths will be measured by one instrument, this might increase interrupts, process scheduling, and disk I/O (for example, recording the measurements), all of which may increase the random error in measured singletons. Therefore, in addition to minimal load measurements to find the systematic error, calibration measurements should be performed with the same measurement load that the instruments will see in the field.

In addition to calibrating the instruments for finite one-way delay, two checks should be made to ensure that packets reported as losses were really lost. First, the threshold for loss should be verified.
In particular, ensure the "reasonable" threshold is reasonable: that it is very unlikely a packet will arrive after the threshold value, and therefore the number of packets lost over an interval is not sensitive to the error bound on measurements. Second, consider the probability that a packet arrives at the network interface, but is lost due to congestion on that interface or to other resource exhaustion (e.g., buffers) in the instrument.

3.8. Reporting the metric:

The calibration and context in which the metric is measured must be carefully considered, and should always be reported along with metric results. We now present four items to consider: the Type-P of test packets, the threshold of infinite delay (if any), error calibration, and the path traversed by the test packets. This list is not exhaustive; any additional information that could be useful in interpreting applications of the
metrics should also be reported.

3.8.1. Type-P

As noted in the Framework document [1], the value of the metric may depend on the type of IP packets used to make the measurement, or "type-P". The value of Type-P-One-way-Delay could change if the protocol (UDP or TCP), port number, size, or arrangement for special treatment (e.g., IP precedence or RSVP) changes. The exact Type-P used to make the measurements must be accurately reported.

3.8.2. Loss threshold

In addition, the threshold (or methodology to distinguish) between a large finite delay and loss should be reported.

3.8.3. Calibration results

+ If the systematic error can be determined, it should be removed from the measured values.

+ Report an error bar, e, such that the true value is the reported value plus or minus e, with 95% confidence.

+ If possible, report the probability that a test packet with finite delay is reported as lost due to resource exhaustion on the measurement instrument.

3.8.4. Path

Finally, the path traversed by the packet should be reported, if possible. In general it is impractical to know the precise path a given
We denote by Hsource an upper boundpacket takes through the network. The precise path may be known for certain Type-P on short or stable paths. If Type-P includes theuncertaintyrecord route (or loose-source route) option in thedifference between wire timeIP header, andhost time ontheSrc host,path is short enough, andsimilarly define Hdest forall routers* on theDst host. Wepath support record (or loose-source) route, thennote that these problems introduce a total uncertainty of Hsource+Hdest.the path will be precisely recorded. Thisestimate of total wire-vs-host uncertainty shouldis impractical because the route must beincluded inshort enough, many routers do not support (or are not configured for) record route, and use of this feature would often artificially worsen theerror/uncertainty analysisperformance observed by removing the packet from common-case processing. However, partial information is still valuable context. For example, if a host can choose between two links* (and hence two separate routes from src to dst), then the initial link used is valuable context. {Comment: For example, with Merit's NetNow setup, a Src on one NAP can reach a Dst on another NAP by either ofany measurement implementation.several different backbone networks.} 4. A Definition for Samples of One-way Delay Given the singleton metric Type-P-One-way-Delay, we now define one particular sample of such singletons. The idea of the sample is to select a particular binding of the parameters Src, Dst, and Type-P, then define a sample of values of parameter T. The means for defining the values of T is to select a beginning time T0, a final time Tf, and an average rate lambda, then define a pseudo-random Poisson arrival process of rate lambda, whose values fall between T0 and Tf. The time interval between successive values of T will then average 1/lambda. 4.1. Metric Name: Type-P-One-way-Delay-Poisson-Stream 4.2. 
Metric Parameters: + Src, the IP address of a host + Dst, the IP address of a host + T0, a time + Tf, a time + lambda, a rate in reciprocal seconds 4.3. Metric Units: A sequence of pairs; the elements of each pair are: + T, a time, and + dT, either a non-negative real number or an undefined number of seconds. The values of T in the sequence are monotonic increasing. Note that T would be a valid parameter to Type-P-One-way-Delay, and that dT would be a valid value of Type-P-One-way-Delay. 4.4. Definition: Given T0, Tf, and lambda, we compute a pseudo-random Poisson process beginning at or before T0, with average arrival rate lambda, and ending at or after Tf. Those time values greater than or equal to T0 and less than or equal to Tf are then selected. At each of the times in this process, we obtain the value of Type-P-One-way-Delay at this time. The value of the sample is the sequence made up of the resulting <time, delay> pairs. If there are no such pairs, the sequence is of length zero and the sample is said to be empty. 4.5. Discussion: Note first that, since a pseudo-random number sequence is employed, the sequence of times, and hence the value of the sample, is not fully specified. Pseudo-random number generators of good quality will be needed to achieve the desired qualities. The sample is defined in terms of a Poisson process both to avoid the effects of self-synchronization and also capture a sample that is statistically as unbiased as possible. {Comment: there is, of course, no claim that real Internet traffic arrives according to a Poisson arrival process.} All the singleton Type-P-One-way-Delay metrics in the sequence will have the same values of Src, Dst, and Type-P. Note also that, given one sample that runs from T0 to Tf, and given new time values T0' and Tf' such that T0 <= T0' <= Tf' <= Tf, the subsequence of the given sample whose time values fall between T0' and Tf' are also a valid Type-P-One-way-Delay-Poisson-Stream sample. 4.6. 
Methodologies:

The methodologies follow directly from:

+ the selection of specific times, using the specified Poisson arrival process, and
+ the methodologies discussion already given for the singleton Type-P-One-way-Delay metric.

Care must, of course, be taken to correctly handle out-of-order arrival of test packets; it is possible that the Src could send one test packet at TS[i], then send a second one (later) at TS[i+1], while the Dst could receive the second test packet at TR[i+1], and then receive the first one (later) at TR[i].

4.7. Errors and Uncertainties:

In addition to sources of errors and uncertainties associated with methods employed to measure the singleton values that make up the sample, care must be taken to analyze the accuracy of the Poisson arrival process of the wire-time of the sending of the test packets. Problems with this process could be caused by several things, including problems with the pseudo-random number techniques used to generate the Poisson arrival process, or with jitter in the value of Hsource (mentioned above as uncertainty in the singleton delay metric). The Framework document shows how to use the Anderson-Darling test to verify the accuracy of the Poisson process.

4.8. Reporting the metric:

You should report the calibration and context for the underlying singletons along with the stream. (See "Reporting the metric" for Type-P-One-way-Delay.)

5. Some Statistics Definitions for One-way Delay

Given the sample metric Type-P-One-way-Delay-Poisson-Stream, we now offer several statistics of that sample. These statistics are offered mostly to be illustrative of what could be done.

5.1. Type-P-One-way-Delay-Percentile

Given a Type-P-One-way-Delay-Poisson-Stream and a percent X between 0% and 100%, the Xth percentile of all the dT values in the Stream. In computing this percentile, undefined values are treated as infinitely large. Note that this means that the percentile could be undefined (informally, infinite).
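As an illustration only, the percentile computation described above might be sketched in Python as follows. The sketch assumes the empirical-distribution convention, in which the Xth percentile is the smallest value x such that at least X% of the dT values are less than or equal to x. The function name delay_percentile, the use of None for an undefined delay, and the (T, dT) tuple layout are assumptions of this sketch and are not part of the metric definition.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical representation: each pair is (T, dT); dT is None when undefined.
Pair = Tuple[float, Optional[float]]


def delay_percentile(stream: Sequence[Pair], x_percent: float) -> Optional[float]:
    """Return the Xth percentile of the dT values, or None if undefined."""
    if not 0 <= x_percent <= 100:
        raise ValueError("x_percent must be between 0 and 100")
    if not stream:
        # An empty sample has no percentile; reported here as None.
        return None
    # Undefined delays are treated as infinitely large, so they sort last.
    delays = sorted(math.inf if dt is None else dt for _, dt in stream)
    n = len(delays)
    # Assumed convention: smallest value whose empirical CDF is >= X/100.
    # For X = 0 this sketch simply returns the smallest value.
    rank = max(1, math.ceil(x_percent * n / 100))
    value = delays[rank - 1]
    # An infinite result means the percentile itself is undefined.
    return None if math.isinf(value) else value
```

For instance, given five pairs whose dT values are 100, 110, undefined, 90, and 500 msec, this sketch yields 110 for X = 50 and None (undefined) for X = 100.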
In addition, the Type-P-One-way-Delay-Percentile is undefined if the sample is empty.

Example: suppose we take a sample and the results are:

Stream1 = <
<T1, 100 msec>
<T2, 110 msec>
<T3, undefined>
<T4, 90 msec>
<T5, 500 msec>
>

Then the 50th percentile would be 110 msec, since 90 msec and 100 msec are smaller and 500 msec and 'undefined' are larger.

Note that if the probability that a packet with finite delay is reported as lost is significant, then a high percentile (90th or 95th) might be reported as infinite instead of finite.

5.2. Type-P-One-way-Delay-Median

Given a Type-P-One-way-Delay-Poisson-Stream, the median of all the dT values in the Stream. In computing the median, undefined values are treated as infinitely large. As noted in the Framework document, the median differs from the 50th percentile only when the sample contains an even number of values, in which case the mean of the two central values is used.

Example: suppose we take a sample and the results are:

Stream2 = <
<T1, 100 msec>
<T2, 110 msec>
<T3, undefined>
<T4, 90 msec>
>

Then the median would be 105 msec, the mean of 100 msec and 110 msec, the two central values.

5.3. Type-P-One-way-Delay-Minimum

Given a Type-P-One-way-Delay-Poisson-Stream, the minimum of all the dT values in the Stream. In computing this, undefined values are treated as infinitely large. Note that this means that the minimum could be undefined (informally, infinite) if all the dT values are undefined. In addition, the Type-P-One-way-Delay-Minimum is undefined if the sample is empty.

In the above example, the minimum would be 90 msec.

5.4. Type-P-One-way-Delay-Inverse-Percentile

Given a Type-P-One-way-Delay-Poisson-Stream and a non-negative time duration threshold, the fraction of all the dT values in the Stream less than or equal to the threshold. The result could be as low as 0% (if all the dT values exceed the threshold) or as high as 100%.

In the above example (Stream2), the Inverse-Percentile of 103 msec would be 50%.

6.
Security Considerations

Conducting Internet measurements raises both security and privacy concerns. This memo does not specify an implementation of the metrics, so it does not directly affect the security of the Internet nor of applications which run on the Internet. However, implementations of these metrics must be mindful of security and privacy concerns.

There are two types of security concerns: potential harm caused by the measurements, and potential harm to the measurements. The measurements could cause harm because they are active, and inject packets into the network. The measurement parameters must be carefully selected so that the measurements inject trivial amounts of additional traffic into the networks they measure. If they inject "too much" traffic, they can skew the results of the measurement, and in extreme cases cause congestion and denial of service.

The measurements themselves could be harmed by routers giving measurement traffic a different priority than "normal" traffic, or by an attacker injecting artificial measurement traffic. If routers can recognize measurement traffic and treat it separately, the measurements will not reflect actual user traffic. If an attacker injects artificial traffic that is accepted as legitimate, the loss rate will be artificially lowered. Therefore, the measurement methodologies should include appropriate techniques to reduce the probability that measurement traffic can be distinguished from "normal" traffic. Authentication techniques, such as digital signatures, may be used where appropriate to guard against injected traffic attacks.

The privacy concerns of network measurement are limited by the active measurements described in this memo. Unlike passive measurements, there can be no release of existing user data.

7. Acknowledgements

Special thanks are due to Vern Paxson of Lawrence Berkeley Labs for his helpful comments on issues of clock uncertainty and statistics.
Thanks also to Will Leland, Sean Shapira, and Roland Wittig for several useful suggestions.

8. References

[1] V. Paxson, G. Almes, J. Mahdavi, and M. Mathis, "Framework for IP Performance Metrics", RFC 2330, May 1998.

[2] G. Almes, S. Kalidindi, and M. Zekauskas, "A Packet Loss Metric for IPPM", Internet-Draft <draft-ietf-ippm-loss-04.txt>, August 1998.

[3] D. Mills, "Network Time Protocol (v3)", RFC 1305, April 1992.

[4] J. Mahdavi and V. Paxson, "IPPM Metrics for Measuring Connectivity", Internet-Draft <draft-ietf-ippm-connectivity-02.txt>, August 1998.

[5] J. Postel, "Internet Protocol", RFC 791, September 1981.

9. Authors' Addresses

Guy Almes
Advanced Network & Services, Inc.
200 Business Park Drive
Armonk, NY 10504
USA
Phone: +1 914 765 1120
EMail: almes@advanced.org

Sunil Kalidindi
Advanced Network & Services, Inc.
200 Business Park Drive
Armonk, NY 10504
USA
Phone: +1 914 765 1128
EMail: kalidindi@advanced.org

Matthew J. Zekauskas
Advanced Network & Services, Inc.
200 Business Park Drive
Armonk, NY 10504
USA
Phone: +1 914 765 1112
EMail: matt@advanced.org

Expiration date: March, 1999