Network Working Group                                             Y. Cai
Internet-Draft                                                     H. Ou
Intended status: Standards Track                           Alibaba Group
Expires: April 13, 2020                                    S. Vallepalli
                                                               M. Mishra
                                                               S. Venaas
                                                     Cisco Systems, Inc.
                                                                A. Green
                                                         British Telecom
                                                        October 11, 2019

                  PIM Designated Router Load Balancing
                         draft-ietf-pim-drlb-11

Abstract

On a multi-access network, one of the PIM-SM routers is elected as a Designated Router. One of the responsibilities of the Designated Router is to track local multicast listeners and forward data to these listeners if the group is operating in PIM-SM. This document specifies a modification to the PIM-SM protocol that allows more than one of the PIM-SM routers to take on this responsibility so that the forwarding load can be distributed among multiple routers.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 13, 2020.

Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Applicability
   4.  Functional Overview
     4.1.  GDR Candidates
   5.  Protocol Specification
     5.1.  Hash Mask and Hash Algorithm
     5.2.  Modulo Hash Algorithm
       5.2.1.  Modulo Hash Algorithm Example
       5.2.2.  Limitations
     5.3.  PIM Hello Options
       5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello Option
       5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option
     5.4.  PIM DR Operation
     5.5.  PIM GDR Candidate Operation
     5.6.  DRLB-List Hello Option Processing
     5.7.  PIM Assert Modification
     5.8.  Backward Compatibility
   6.  Manageability Considerations
   7.  IANA Considerations
     7.1.  Initial registry
     7.2.  Assignment of new hash algorithms
   8.  Security Considerations
   9.  Acknowledgement
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

On a multi-access LAN, such as an Ethernet, with one or more PIM-SM [RFC7761] routers, one of the PIM-SM routers is elected as a Designated Router (DR). The PIM DR has two responsibilities in the PIM-SM protocol. For any active sources on a LAN, the PIM DR is responsible for registering with the Rendezvous Point (RP) if the group is operating in PIM-SM. Also, the PIM DR is responsible for tracking local multicast listeners and forwarding to these listeners if the group is operating in PIM-SM.

Consider the following LAN in Figure 1:

             (core networks)
               |     |     |
               |     |     |
              R1    R2    R3
               |     |     |
            ------(LAN)------
                     |
                     |
             (many receivers)

                 Figure 1: LAN with receivers

Assume R1 is elected as the DR. According to the PIM-SM protocol, R1 will be responsible for forwarding traffic to that LAN on behalf of any local members. In addition to keeping track of membership reports, R1 is also responsible for initiating the creation of source and/or shared trees towards the senders or the RPs. The membership reports would be IGMP or MLD messages. This applies to any versions of the IGMP and MLD protocols. The most recent versions are IGMPv3 [RFC3376] and MLDv2 [RFC3810].

Having a single router acting as DR and being responsible for data plane forwarding leads to several issues. One of the issues is that the aggregated bandwidth will be limited to what R1 can handle with regard to the capacity of incoming links, the interface on the LAN, and total forwarding capacity. It is very common that a LAN consists of switches that run IGMP/MLD or PIM snooping [RFC4541]. This allows the forwarding of multicast packets to be restricted only to segments leading to receivers who have indicated their interest in multicast groups using either IGMP or MLD. The emergence of the switched Ethernet allows the aggregated bandwidth to exceed, sometimes by a large number, that of a single link. For example, let us modify Figure 1 and introduce an Ethernet switch in Figure 2.

             (core networks)
               |     |     |
               |     |     |
              R1    R2    R3
               |     |     |
            +=gi0===gi1===gi2=+
            +                 +
            +      switch     +
            +                 +
            +=gi4===gi5===gi6=+
               |     |     |
              H1    H2    H3

              Figure 2: LAN with Ethernet Switch

Let us assume that each individual link is a Gigabit Ethernet. Each router, R1, R2 and R3, and the switch have enough forwarding capacity to handle hundreds of Gigabits of data. Let us further assume that each of the hosts requests 500 Mbps of unique multicast data.
This totals to 1.5 Gbps of data, which is less than what each switch or the combined uplink bandwidth across the routers can handle, even under failure of a single router. On the other hand, the link between R1 and the switch, via port gi0, can only handle a throughput of 1 Gbps. And if R1 is the only DR (the PIM DR elected using the procedure defined by [RFC7761]), at least 500 Mbps worth of data will be lost because the only link that can be used to draw the traffic from the routers to the switch is via gi0. In other words, the entire network's throughput is limited by the single connection between the PIM DR and the switch (or LAN as in Figure 1).

Another important issue is related to failover. If R1 is the only forwarder on a shared LAN, when R1 goes out of service, multicast forwarding for the entire LAN has to be rebuilt by the newly elected PIM DR. However, if there was a way that allowed multiple routers to forward to the LAN for different groups, failure of one of the routers would only lead to disruption to a subset of the flows, therefore improving the overall resilience of the network.

This document specifies a modification to the PIM-SM protocol that allows more than one of these routers, called Group Designated Routers (GDRs), to be selected so that the forwarding load can be distributed among a number of routers.

2.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

With respect to PIM-SM, this document follows the terminology that has been defined in [RFC7761].
This document also introduces the following new terms:

o  GDR: Group Designated Router. For each multicast flow, either a (*,G) for Any-Source Multicast (ASM), or an (S,G) for Source-Specific Multicast (SSM) [RFC4607], a hash algorithm (described below) is used to select one of the routers as a GDR. The GDR is responsible for initiating the forwarding tree building process for the corresponding multicast flow.

o  GDR Candidate: a router that has the potential to become a GDR. There might be multiple GDR Candidates on a LAN, but only one can become the GDR for a specific multicast flow.

3.  Applicability

The extension specified in this document applies to PIM-SM routers when they act as last hop routers (there are directly connected receivers). It does not alter the behavior of a PIM DR, or any other routers, on the first hop network (directly connected sources). This is because the source tree is built using the IP address of the sender, not the IP address of the PIM DR that sends the registers towards the RP. The load balancing between first hop routers can be achieved naturally if an IGP provides equal cost multiple paths (which it usually does in practice). Also, distributing the load to do registering does not justify the additional complexity required to support it.

4.  Functional Overview

In the PIM DR election as defined in [RFC7761], when multiple routers are connected to a multi-access LAN (for example, an Ethernet), one of them is elected to act as PIM DR. The PIM DR is responsible for sending local Join/Prune messages towards the RP or source.
In order to elect the PIM DR, each PIM router on the LAN examines the received PIM Hello messages and compares its own DR priority and IP address with those of its neighbors. The router with the highest DR priority is the PIM DR. If there are multiple such routers, their IP addresses are used as the tie-breaker, as described in [RFC7761].

In order to share forwarding load among last hop routers, besides the normal PIM DR election, the GDR is also elected on the multi-access LAN. There is only one PIM DR on the multi-access LAN, but there might be multiple GDR Candidates.

For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a hash algorithm is used to select one of the routers to be the GDR. A new DR Load Balancing Capability (DRLB-Cap) PIM Hello Option, which contains the hash algorithm type, is announced by routers on interfaces where this specification is enabled. Routers that advertise the new DRLB-Cap Option in their PIM Hello, and that use the same GDR election hash algorithm and the same DR priority as the PIM DR, are considered GDR Candidates.

Hash Masks are defined for Source, Group and RP separately, in order to handle PIM ASM/SSM. The masks, as well as a sorted list of GDR Candidate Addresses, are announced by the DR in a new DR Load Balancing List (DRLB-List) PIM Hello Option.

A hash algorithm based on the announced Source, Group, or RP masks allows one GDR to be assigned to a corresponding multicast state. That GDR is responsible for initiating the creation of the multicast forwarding tree for multicast traffic.

4.1.  GDR Candidates

GDR is the new concept introduced by this specification. GDR Candidates are routers eligible for GDR election on the LAN. To become a GDR Candidate, a router must have the same DR priority and run the same GDR election hash algorithm as the DR on the LAN.
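
The eligibility rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Router` record, the function names and the addresses are hypothetical, every router is assumed to advertise a DR priority, and the sketch assumes that no GDR election takes place when the elected DR itself does not announce DRLB-Cap.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass(frozen=True)
class Router:
    # Hypothetical view of what a router has learned from a neighbor's Hello.
    address: IPv4Address
    dr_priority: int
    hash_algorithm: Optional[int]  # None means no DRLB-Cap option was seen


def elect_dr(routers: List[Router]) -> Router:
    # Highest DR priority wins; the highest IP address breaks ties.
    return max(routers, key=lambda r: (r.dr_priority, int(r.address)))


def gdr_candidates(routers: List[Router]) -> List[Router]:
    dr = elect_dr(routers)
    if dr.hash_algorithm is None:
        return []  # assumption: a DR without DRLB-Cap means no GDR election
    eligible = [
        r for r in routers
        if r.hash_algorithm == dr.hash_algorithm
        and r.dr_priority == dr.dr_priority
    ]
    # Listed from highest to lowest address, as the DR would announce them.
    return sorted(eligible, key=lambda r: int(r.address), reverse=True)


routers = [
    Router(IPv4Address("203.0.113.4"), 10, 0),
    Router(IPv4Address("203.0.113.3"), 10, 0),
    Router(IPv4Address("203.0.113.2"), 10, 1),  # different hash algorithm
    Router(IPv4Address("203.0.113.1"), 5, 0),   # less preferred DR priority
]
candidate_addresses = [str(r.address) for r in gdr_candidates(routers)]
```

In this sketch only the routers at 203.0.113.4 (the DR) and 203.0.113.3 end up as GDR Candidates. The other two are excluded for a mismatched hash algorithm and a less preferred DR priority, respectively.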
For example, assume there are 4 routers on the LAN: R1, R2, R3 and R4, each announcing a DRLB-Cap option. R1, R2 and R3 have the same DR priority while R4's DR priority is less preferred. In this example, R4 will not be eligible for GDR election, because R4 will not become a PIM DR unless all of R1, R2 and R3 go out of service.

Furthermore, assume router R1 wins the PIM DR election, R1 and R2 run the same hash algorithm for GDR election, while R3 runs a different one. In this case, only R1 and R2 will be eligible for GDR election, while R3 will not.

As a DR, R1 will include its own Load Balancing Hash Masks and the identity of R1 and R2 (the GDR Candidates) in its DRLB-List Hello Option.

5.  Protocol Specification

5.1.  Hash Mask and Hash Algorithm

A Hash Mask is used to extract a number of bits from the corresponding IP address field (32 for IPv4, 128 for IPv6) and calculate a hash value. A hash value is used to select a GDR from the GDR Candidates advertised by the PIM DR. The hash masks allow for certain flows to always be forwarded by the same GDR, since the hash values are the same. For instance, the mask 0.0.255.0 means that only the third octet will be considered when hashing.

In the text below, a hash mask is in some places said to be zero. A hash mask is zero if no bits are set. That is, 0.0.0.0 for IPv4 and :: for IPv6. Also, a hash mask is said to be an all-bits-set mask if it is 255.255.255.255 for IPv4 or FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF for IPv6.

There are three Hash Masks defined:

o  RP Hash Mask

o  Source Hash Mask

o  Group Hash Mask

The hash masks need to be configured on the PIM routers that can potentially become a PIM DR, unless the implementation provides default hash mask values.
An implementation SHOULD have default hash mask values as follows. The default RP Hash Mask SHOULD be zero (no bits set). The default Source and Group Hash Masks SHOULD both be all-bits-set masks. These default values are likely acceptable for most deployments, and simplify configuration.

The DRLB-List Hello Option contains a list of GDR Candidates. The first one listed has ordinal number 0, the second listed ordinal number 1, and the last one has ordinal number N - 1 if there are N candidates listed. The hash value computed will be the ordinal number of the GDR Candidate that is acting as GDR.

o  If the group is in ASM mode and the RP Hash Mask announced by the PIM DR is not zero (at least one bit is set), calculate the value of hashvalue_RP [Section 5.2] to determine the GDR.

o  If the group is in ASM mode and the RP Hash Mask announced by the PIM DR is zero (no bits are set), obtain the value of hashvalue_Group [Section 5.2] to determine the GDR.

o  If the group is in SSM mode, use hashvalue_SG [Section 5.2] to determine the GDR.

A simple Modulo hash algorithm is defined in this document. However, to allow other hash algorithms to be used, a 1-octet "Hash Algorithm" field is included in the DRLB-Cap Hello Option to specify the hash algorithm used by the router.

If different hash algorithms are advertised among the routers on a LAN, only the routers advertising the same hash algorithm as the DR (as well as having the same DR priority as the DR) are eligible for GDR election.

5.2.  Modulo Hash Algorithm

As part of computing the hash, the notation LSZC(hash_mask) is used to denote the number of zeroes counted from the least significant bit of a Hash Mask hash_mask. As an example, LSZC(255.255.128.0) is 15 and LSZC(FFFF:8000::) is 111. If all bits are set, LSZC will be 0. If the mask is zero, then LSZC will be 32 for IPv4, and 128 for IPv6.

The number of GDR Candidates is denoted as GDRC.

In simple terms, the idea behind the Modulo hash algorithm is that the corresponding mask is applied to a value, then the result is shifted right LSZC(mask) bits so that the least significant bits that were masked out are not considered. Then this result is masked by 0xFFFF, keeping only its least significant 16 bits. Finally, the hash value is this result modulo the number of GDR Candidates (GDRC).

The Modulo hash algorithm for computing the values hashvalue_RP, hashvalue_Group and hashvalue_SG is defined as follows.

hashvalue_RP is calculated as:

   (((RP_address & RP_mask) >> LSZC(RP_mask)) & 0xFFFF) % GDRC

RP_address is the address of the RP defined for the group and RP_mask is the RP Hash Mask.

hashvalue_Group is calculated as:

   (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xFFFF) % GDRC

Group_address is the group address and Group_mask is the Group Hash Mask.

hashvalue_SG is calculated as:

   ((((Source_address & Source_mask) >> LSZC(Source_mask)) & 0xFFFF) ^
    (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xFFFF)) % GDRC

Source_address is the source address and Source_mask is the Source Hash Mask. Group_address is the group address and Group_mask is the Group Hash Mask.

5.2.1.  Modulo Hash Algorithm Example

To help illustrate the algorithm, consider this example. Router X with IPv4 address 203.0.113.1 receives a DRLB-List Hello Option from the DR, which announces RP Hash Mask 0.0.255.0 and a list of GDR Candidates, sorted by IP addresses from high to low: 203.0.113.3, 203.0.113.2 and 203.0.113.1. The ordinal number assigned to those addresses would be:

   0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1 (Router X)

Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2 198.51.100.2 for Group2. Following the modulo hash algorithm:

LSZC(0.0.255.0) is 8 and GDRC is 3. The hashvalue_RP for Group1 with RP RP1 is:

   (((192.0.2.1 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 2 % 3 = 2

which matches the ordinal number assigned to Router X. Router X will be the GDR for Group1.

The hashvalue_RP for Group2 with RP RP2 is:

   (((198.51.100.2 & 0.0.255.0) >> 8) & 0xFFFF) % 3 = 100 % 3 = 1

which is different from the ordinal number of Router X (2). Hence, Router X will not be GDR for Group2.

5.2.2.  Limitations

The Modulo Hash Algorithm has poor failover characteristics when a shared LAN has more than two GDRs.
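
The computation defined in Section 5.2, and the way its result depends on the number of candidates, can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, addresses and masks are passed as strings for readability, and the 0xFFFF constant is taken as written in the formulas of this document.

```python
from ipaddress import ip_address


def lszc(mask: int, width: int) -> int:
    """Count zero bits from the least significant bit of a hash mask."""
    if mask == 0:
        return width  # 32 for IPv4, 128 for IPv6
    # (mask & -mask) isolates the lowest set bit of the mask.
    return (mask & -mask).bit_length() - 1


def masked_value(address: str, mask: str) -> int:
    """Apply a hash mask, drop the masked-out low bits, keep 0xFFFF."""
    addr = ip_address(address)
    m = int(ip_address(mask))
    shift = lszc(m, addr.max_prefixlen)
    return ((int(addr) & m) >> shift) & 0xFFFF


def hashvalue_rp(rp: str, rp_mask: str, gdrc: int) -> int:
    return masked_value(rp, rp_mask) % gdrc


def hashvalue_group(group: str, group_mask: str, gdrc: int) -> int:
    return masked_value(group, group_mask) % gdrc


def hashvalue_sg(source: str, source_mask: str,
                 group: str, group_mask: str, gdrc: int) -> int:
    return (masked_value(source, source_mask)
            ^ masked_value(group, group_mask)) % gdrc


# The two RPs from the example in Section 5.2.1, with three candidates.
group1_ordinal = hashvalue_rp("192.0.2.1", "0.0.255.0", 3)
group2_ordinal = hashvalue_rp("198.51.100.2", "0.0.255.0", 3)

# The same group hashed against three and then two candidates.
ordinal_with_3 = hashvalue_group("233.252.0.3", "255.255.255.255", 3)
ordinal_with_2 = hashvalue_group("233.252.0.3", "255.255.255.255", 2)
```

With these inputs the two RP hashes reproduce the ordinals 2 and 1 from the example. The last two lines use a hypothetical group address to show the dependence on GDRC: the group maps to ordinal 0 when three candidates are listed and to ordinal 1 when only two remain, even though the candidate at ordinal 0 is still present.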

In the case of more than two GDRs on a LAN, when one GDR fails, all of the groups may be reassigned to a different GDR, even if they were not assigned to the failed GDR. However, many deployments use only two routers on a shared LAN for redundancy purposes. Future work may define new hash algorithms where only groups assigned to the failed GDR get reassigned.

5.3.  PIM Hello Options

When a PIM router sends a PIM Hello on an interface with this specification enabled, it includes a new option, called "DR Load Balancing Capability (DRLB-Cap)".

Besides this DRLB-Cap Hello Option, the elected PIM DR also includes a new "DR Load Balancing List (DRLB-List) Hello Option". The DRLB-List Hello Option consists of three Hash Masks as defined above and also a sorted list of GDR Candidate addresses on the LAN.

5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 34           |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    |Hash Algorithm |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 3: PIM DR Load Balancing Capability Hello Option

   Type: 34

   Length: 4

   Reserved: Transmitted as zero, ignored on receipt.

   Hash Algorithm: Hash algorithm type. 0 for the Modulo algorithm defined in this document.

This DRLB-Cap Hello Option MUST be advertised by routers on all interfaces where DR Load Balancing is enabled.

5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Type = 35           |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Group Mask                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Source Mask                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            RP Mask                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   GDR Candidate Address(es)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 4: PIM DR Load Balancing List Hello Option

   Type: 35

   Length: (3 + n) x (4 or 16), where n is the number of GDR candidates.

   Group Mask (32/128 bits): Mask applied to group addresses as part of hash computation.

   Source Mask (32/128 bits): Mask applied to source addresses as part of hash computation.

   RP Mask (32/128 bits): Mask applied to RP addresses as part of hash computation.

   All masks MUST have the same number of bits as the IP source address in the PIM Hello IP header.

   GDR Candidate Address (32/128 bits): Address(es) of GDR Candidate(s)

   All addresses MUST be in the same address family as the PIM Hello IP header. It is RECOMMENDED that the addresses are sorted in descending order.

   If the "Interface ID" option, as specified in [RFC6395], is present in a GDR Candidate's PIM Hello message, and the "Router ID" portion is non-zero:

   +  For IPv4, the "GDR Candidate Address" will be set directly to the "Router ID".
   +  For IPv6, the "GDR Candidate Address" will be 96 bits of zeroes followed by the 32 bit Router ID.

   If the "Interface ID" option is not present in a GDR Candidate's PIM Hello message, or if the "Interface ID" option is present but the "Router ID" field is zero, the "GDR Candidate Address" will be the IPv4 or IPv6 source address of the PIM Hello message.

This DRLB-List Hello Option MUST only be advertised by the elected PIM DR. It MUST be ignored if received from a non-DR.

5.4.  PIM DR Operation

The DR election process is still the same as defined in [RFC7761]. A DR that has this specification enabled on an interface advertises the new DRLB-List Hello Option, which contains mask values from user configuration (or default values), followed by a list of GDR Candidate Addresses. It is RECOMMENDED that the list is sorted, from the highest value to the lowest value. The reason for sorting the list is to make the behavior deterministic, regardless of the order in which the DR learns of new candidates. Note that, the same as non-DR routers, the DR also advertises the DRLB-Cap Hello Option to indicate its capability of supporting this specification and the type of its GDR election hash algorithm.

If a PIM DR receives a neighbor DRLB-Cap Hello Option, which contains the same hash algorithm as the DR, and the neighbor has the same DR priority as the DR, the PIM DR SHOULD consider the neighbor as a GDR Candidate and insert the GDR Candidate's Address into the list of the DRLB-List Option.
However, the DR may have policies limiting which GDR Candidates, or the number of GDR Candidates, to include. The DR would normally include itself in the list of GDR Candidates.

If a PIM neighbor included in the list expires, stops announcing the DRLB-Cap Hello Option, changes DR priority, changes hash algorithm or otherwise becomes ineligible as a candidate, the DR should immediately send a triggered hello with a new list in the DRLB-List option, excluding the neighbor.

If a new router becomes eligible as a candidate, there is no urgency in sending out an updated list. An updated list SHOULD be included in the next hello.

5.5.  PIM GDR Candidate Operation

When an IGMP/MLD report is received, without this specification, only the PIM DR will handle the join and potentially run into the issues described earlier. Using this specification, a hash algorithm is used by the GDR Candidates to determine which router is going to be responsible for building forwarding trees on behalf of the host.

If this specification is enabled on an interface, the router MUST include the DRLB-Cap Hello Option in all PIM Hello messages sent on that interface. Note that the presence of the DRLB-Cap Option in PIM Hello does not guarantee that this router would be considered as a GDR candidate. Once DR election is done, the DRLB-List Hello Option would be received from the current PIM DR on the link, which would contain a list of GDR Candidates selected by the PIM DR.

A router only acts as a GDR Candidate if it is included in the GDR Candidate list of the DRLB-List Hello Option. See next section for details.

5.6.  DRLB-List Hello Option Processing

This section discusses processing of the DRLB-List Hello Option.
All routers MUST ignore the DRLB-List Hello Option if it is received from a PIM router which is not the DR. The option MUST only be processed by routers that are announcing the DRLB-Cap Option. Also, the algorithm announced in the DRLB-Cap Option MUST be the same as the one announced by the DR. All GDR Candidates MUST use the Hash Masks advertised in the option, even if they differ from those the candidate was configured with.

A router stores the latest option contents that were announced, if any, and deletes the previous contents. The router MUST also compare the new contents with any previous contents and, if there are any changes, continue processing as below. Note that if the option does not pass the above checks, the below processing MUST be done as if the option was not announced.

If the contents of the DRLB-List Option (the masks or the candidate list) differ from the previously saved copy, or if the option is received for the first time, or if it is no longer being received or accepted, the option MUST be processed as below.

1. If the router was not included in the previous GDR list, or there was no previous GDR list, but it is included in the new GDR list: For each of the groups, or source and group pairs if the group is in SSM mode, with local receiver interest, the router MUST run the hash algorithm to determine which of them it is the GDR for. If it is not the GDR for a group, or source and group pair if SSM, no processing is required. If it is hashed as the GDR, it needs to build a multicast forwarding tree.

2. If the router was included in the previous GDR list, and still is included in the new GDR list: For each of the groups, or source and group pairs if the group is in SSM mode, with local receiver interest, the router MUST run the hash algorithm to determine which of them it is the GDR for. If it was the GDR for a group, or source and group pair if SSM, and the new hash result chose it as the GDR, then no processing is required. If it was the GDR for a group, or source and group pair if SSM, earlier and now it is no longer the GDR, then it sets the assert metric preference to maximum (0x7FFFFFFF) and the assert metric to one less than maximum (0xFFFFFFFE), as explained in Section 5.7. If it was not the GDR for a group, or source and group pair if SSM, earlier, and the new hash does not make it GDR, then no processing is required. If it was not the GDR for a group, or source and group pair if SSM, earlier, and now becomes the GDR, it starts building the multicast forwarding tree for this flow.

3. If the router was included in the previous GDR list, but is not included in the new GDR list, or there is no new GDR list: For each of the groups, or source and group pairs if the group is in SSM mode, with local receiver interest, the router MUST do as follows. If it was the GDR for a group, or source and group pair if SSM, it sets the assert metric preference to maximum (0x7FFFFFFF) and the assert metric to one less than maximum (0xFFFFFFFE), as explained in Section 5.7. If it was not the GDR, then no processing is required.

5.7. PIM Assert Modification

GDR changes may occur due to configuration changes, due to GDR candidates going down, and also due to new routers coming up and becoming GDR candidates. This may occur while flows are being forwarded. If the GDR for an active flow changes, there is likely to be some disruption, such as packet loss or duplicates. By using asserts, packet loss is minimized, while allowing a small amount of duplicates.

When a router stops acting as the GDR for a group, or source and group pair if SSM, it MUST set the assert metric preference to maximum (0x7FFFFFFF) and the assert metric to one less than maximum (0xFFFFFFFE). This was also mentioned in the previous section. That is, whenever it sends or receives an assert for the group, it must use these values as the metric preference and metric rather than the values provided by routing. This is similar to what is done for AssertCancel Messages in [RFC7761], except that the metric value here is one less.
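The per-flow decisions described above for a changed DRLB-List Option, together with the assert values a former GDR uses, can be sketched as follows. This is an illustrative sketch only and not part of the protocol definition. The names DrlbList, Action, is_gdr, and on_option_change are hypothetical, and the hash algorithm is assumed to be supplied by the caller as pick_gdr, a function that maps a flow and the option contents to the address of the selected GDR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

# Values from the text for a router that stops acting as GDR.
ASSERT_PREF_MAX = 0x7FFFFFFF         # maximum assert metric preference
ASSERT_METRIC_NEAR_MAX = 0xFFFFFFFE  # one less than maximum assert metric


@dataclass(frozen=True)
class DrlbList:
    """Hypothetical container for the stored DRLB-List Option contents."""
    masks: Tuple[int, ...]  # hash masks announced by the DR
    gdrs: Tuple[str, ...]   # GDR candidate addresses announced by the DR


class Action(Enum):
    NONE = "no processing required"
    BUILD_TREE = "build multicast forwarding tree"
    NEAR_MAX_ASSERT = "use near-maximum assert values"


# Assumption: the hash algorithm is provided by the caller.
PickGdr = Callable[[Hashable, DrlbList], str]


def is_gdr(me: str, option: Optional[DrlbList], flow: Hashable,
           pick_gdr: PickGdr) -> bool:
    """True if 'me' is listed in the option and hashed as GDR for 'flow'."""
    if option is None or me not in option.gdrs:
        return False
    return pick_gdr(flow, option) == me


def on_option_change(me: str, old: Optional[DrlbList],
                     new: Optional[DrlbList], flows: Iterable[Hashable],
                     pick_gdr: PickGdr) -> Dict[Hashable, Action]:
    """Decide, per flow with local receiver interest, what 'me' must do."""
    if old == new:
        return {}  # unchanged contents need no processing
    actions: Dict[Hashable, Action] = {}
    for flow in flows:
        was = is_gdr(me, old, flow, pick_gdr)  # role under previous contents
        now = is_gdr(me, new, flow, pick_gdr)  # role under new contents
        if now and not was:
            actions[flow] = Action.BUILD_TREE
        elif was and not now:
            # Former GDR: assert with ASSERT_PREF_MAX / ASSERT_METRIC_NEAR_MAX.
            actions[flow] = Action.NEAR_MAX_ASSERT
        else:
            actions[flow] = Action.NONE
    return actions
```

Passing None for old or new models the cases where there was no previous GDR list or where the option is no longer received or accepted. Because pick_gdr receives the full option contents, a change of masks alone is handled the same way as a change of the candidate list.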
The rest of this section is just for illustration purposes and not part of the protocol definition.

To illustrate the behavior when there is a GDR change, consider the following scenario where there are two flows, G1 and G2. R1 is the GDR for G1, and R2 is the GDR for G2. When R3 comes up, it is possible that R3 becomes GDR for both G1 and G2, hence R3 starts to build the forwarding trees for G1 and G2. If R1 and R2 stop forwarding before R3 completes the process, packet loss might occur. On the other hand, if R1 and R2 continue forwarding while R3 is building the forwarding trees, duplicates might occur.

When the role of GDR changes as above, instead of immediately stopping forwarding, R1 and R2 continue forwarding to G1 and G2 respectively, while, at the same time, R3 builds forwarding trees for G1 and G2. This will lead to PIM Asserts.

Using the above example, for G1, assume R1 and R3 agree on the new GDR, which is R3. With the new assert behavior, R1 sets its assert metric to the near maximum value discussed above. That makes R3, which uses its normal metric in its Assert, the Assert winner.
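The G1 case above can be sketched as a simplified assert comparison. This is an illustration under stated assumptions and not part of the protocol definition: the names AssertInfo, effective, and assert_winner are hypothetical, the addresses and routing metrics are made up, and the comparison omits the RPT-bit that [RFC7761] also considers. It compares metric preference first, then metric (lower is better in both cases), and breaks ties in favor of the higher IP address.

```python
import ipaddress
from dataclasses import dataclass, replace

ASSERT_PREF_MAX = 0x7FFFFFFF         # maximum assert metric preference
ASSERT_METRIC_NEAR_MAX = 0xFFFFFFFE  # one less than maximum assert metric


@dataclass(frozen=True)
class AssertInfo:
    """Hypothetical view of the values a router places in its Assert."""
    router: str
    address: str
    preference: int  # metric preference provided by routing
    metric: int      # metric provided by routing


def effective(info: AssertInfo, is_gdr: bool) -> AssertInfo:
    """Values the router actually uses: a non-GDR overrides routing."""
    if is_gdr:
        return info
    return replace(info, preference=ASSERT_PREF_MAX,
                   metric=ASSERT_METRIC_NEAR_MAX)


def assert_winner(a: AssertInfo, b: AssertInfo) -> AssertInfo:
    """Simplified comparison: lower preference, lower metric, higher address."""
    def key(x: AssertInfo):
        return (x.preference, x.metric,
                -int(ipaddress.ip_address(x.address)))
    return min(a, b, key=key)


# Made-up values: R1 has the better routing metric toward the source.
r1 = AssertInfo("R1", "10.0.0.1", preference=110, metric=10)
r3 = AssertInfo("R3", "10.0.0.3", preference=110, metric=20)

# Once R1 knows it is no longer the GDR, R3 wins despite its worse metric.
winner = assert_winner(effective(r1, is_gdr=False), effective(r3, is_gdr=True))
```

With routing values alone, R1 would win the assert in this example. The override is what lets the new GDR take over regardless of the routing metrics.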
For G2, assume it takes a slightly longer time for R2 to find out that R3 is the new GDR, so R2 still considers itself to be the GDR while R3 has already assumed the role of GDR. Since both R2 and R3 think they are GDRs, they further compare their metrics and IP addresses. If R3 has the better routing metric, or the same metric but a better tie-breaker, the result will be consistent with the GDR selection. If, unfortunately, R2 has the better metric, or the same metric but a better tie-breaker, R2 will become the Assert winner and continue to forward traffic. Shortly after, when R2 finds out that it is no longer the GDR, R2 will change to using the near maximum assert metric. The next time R2 sends an assert message, it will lose the assert and stop forwarding. As assert winner, R2 would send periodic assert messages per [RFC7761].

5.8. Backward Compatibility

In the case of a hybrid Ethernet shared LAN (where some PIM routers enable the specification defined in this document, and some do not), the following applies:

o  If a router which does not support this specification becomes the DR on the LAN, then it is the only router acting as a DR, and there will be no load-balancing.
o  If a router which does not support this specification becomes a non-DR on the link, then it acts as a non-DR as defined in [RFC7761], and it will not take part in any load-balancing. Load-balancing may still happen.

6. Manageability Considerations

An administrator needs to consider what the total bandwidth requirements are and find a set of routers that together have enough total capacity, while making sure that each of the routers can handle its part, assuming that the traffic is distributed roughly equally among the routers. Ideally, one should also have enough bandwidth to handle the case where at least one router fails. Ideally, all the routers should have reachability to the sources, and RPs if applicable, that is not via the LAN.

Care must be taken when choosing what hash masks to configure. One would typically configure the same masks on all the routers, so that they are the same regardless of which router is elected as DR. The default masks are likely suitable for most deployments. The RP Hash Mask must be configured (the default is no bits set) if one wishes to hash based on the RP address rather than the group address for ASM. The default masks will use the entire group addresses, and source addresses if SSM, as part of the hash. An administrator may set other masks that mask out parts of the addresses to ensure that certain flows always get hashed to the same router. How this is achieved depends on how the group addresses are allocated.

Only the routers announcing the same Hash Algorithm as the DR would be considered as GDR candidates. Network administrators need to make sure that the desired set of routers announce the same algorithm. Migration between different algorithms is not considered in this document.

7. IANA Considerations

IANA has temporarily assigned type 34 for the PIM DR Load Balancing Capability (DRLB-Cap) Hello Option, and type 35 for the PIM DR Load Balancing List (DRLB-List) Hello Option in the PIM-Hello Options registry. IANA is requested to make these assignments permanent when this document is published as an RFC. Note that the option names have changed slightly since the temporary assignments were made. Also, the length of option 34 is always 4, whereas the registry currently says it is variable.

This document requests IANA to create a registry called "Designated Router Load Balancing Hash Algorithms" in the "Protocol Independent Multicast (PIM)" branch of the registry tree. The registry lists hash algorithms for use by PIM Designated Router Load Balancing.

7.1. Initial registry

The initial content of the registry should be as follows.

   Type    Name          Reference
   ------  ------------  --------------
   0       Modulo        This document
   1-255   Unassigned

7.2. Assignment of new hash algorithms

Assignment of new hash algorithms is done according to the "IETF Review" model, see [RFC8126].

8. Security Considerations

Security of the new DR Load Balancing PIM Hello Options is only guaranteed by the security of PIM Hello messages, so the security considerations for PIM Hello messages as described in PIM-SM [RFC7761] apply here.

If the DR is subverted, it could omit or add certain GDRs or announce an unsupported algorithm. If another router is subverted, it could be made DR and cause similar issues. While these issues are specific to this specification, they are not that different from existing attacks such as subverting a DR and lowering the DR priority, causing a different router to become the DR.

If a GDR is subverted, it could potentially be made to stop forwarding all the traffic it is expected to forward. This is similar to what can happen today if a DR is subverted.
9. Acknowledgement

The authors would like to thank Steve Simlo and Taki Millonis for helping with the original idea; Alia Atlas, Bill Atwood, Jake Holland, Bharat Joshi, Anish Kachinthaya, Anvitha Kachinthaya and Alvaro Retana for reviews and comments; and Toerless Eckert and Rishabh Parekh for helpful conversation on the document.

10. References

10.1. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC6395]  Gulrajani, S. and S. Venaas, "An Interface Identifier (ID) Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395, October 2011, <https://www.rfc-editor.org/info/rfc6395>.

[RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I., Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent Multicast - Sparse Mode (PIM-SM): Protocol Specification (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March 2016, <https://www.rfc-editor.org/info/rfc7761>.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

10.2. Informative References

[RFC3376]  Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A. Thyagarajan, "Internet Group Management Protocol, Version 3", RFC 3376, DOI 10.17487/RFC3376, October 2002, <https://www.rfc-editor.org/info/rfc3376>.

[RFC3810]  Vida, R., Ed. and L. Costa, Ed., "Multicast Listener Discovery Version 2 (MLDv2) for IPv6", RFC 3810, DOI 10.17487/RFC3810, June 2004, <https://www.rfc-editor.org/info/rfc3810>.
[RFC4541]  Christensen, M., Kimball, K., and F. Solensky, "Considerations for Internet Group Management Protocol (IGMP) and Multicast Listener Discovery (MLD) Snooping Switches", RFC 4541, DOI 10.17487/RFC4541, May 2006, <https://www.rfc-editor.org/info/rfc4541>.

[RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, <https://www.rfc-editor.org/info/rfc4607>.

[RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017, <https://www.rfc-editor.org/info/rfc8126>.

Authors' Addresses

Yiqun Cai
Alibaba Group
Email: yiqun.cai@alibaba-inc.com

Heidi Ou
Alibaba Group
Email: heidi.ou@alibaba-inc.com

Sri Vallepalli
Cisco Systems, Inc.
3625 Cisco Way
San Jose CA 95134
USA
Email: svallepa@cisco.com

Mankamana Mishra
Cisco Systems, Inc.
821 Alder Drive
Milpitas CA 95035
USA
Email: mankamis@cisco.com

Stig Venaas
Cisco Systems, Inc.
Tasman Drive
San Jose CA 95134
USA
Email: stig@cisco.com

Andy Green
British Telecom
Adastral Park
Ipswich IP5 2RE
United Kingdom
Email: andy.da.green@bt.com