Network Working Group                                             Y. Cai
Internet-Draft                                                     H. Ou
Intended status: Standards Track                           Alibaba Group
Expires: April 13, 2020                                    S. Vallepalli
                                                               M. Mishra
                                                               S. Venaas
                                                     Cisco Systems, Inc.
                                                                A. Green
                                                         British Telecom
                                                        October 11, 2019

                  PIM Designated Router Load Balancing
                         draft-ietf-pim-drlb-11

Abstract

   On a multi-access network, one of the PIM-SM routers is elected as a
   Designated Router.  One of the responsibilities of the Designated
   Router is to track local multicast listeners and forward data to
   these listeners if the group is operating in PIM-SM.  This document
   specifies a modification to the PIM-SM protocol that allows more than
   one of the PIM-SM routers to take on this responsibility so that the
   forwarding load can be distributed among multiple routers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 13, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Applicability
   4.  Functional Overview
     4.1.  GDR Candidates
   5.  Protocol Specification
     5.1.  Hash Mask and Hash Algorithm
     5.2.  Modulo Hash Algorithm
       5.2.1.  Modulo Hash Algorithm Example
       5.2.2.  Limitations
     5.3.  PIM Hello Options
       5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello
               Option
       5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option
     5.4.  PIM DR Operation
     5.5.  PIM GDR Candidate Operation
     5.6.  DRLB-List Hello Option Processing
     5.7.  PIM Assert Modification
     5.8.  Backward Compatibility
   6.  Manageability Considerations
   7.  IANA Considerations
     7.1.  Initial registry
     7.2.  Assignment of new hash algorithms
   8.  Security Considerations
   9.  Acknowledgement
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   On a multi-access LAN, such as an Ethernet, with one or more PIM-SM
   [RFC7761] routers, one of the PIM-SM routers is elected as a
   Designated Router (DR).  The PIM DR has two responsibilities in the
   PIM-SM protocol.  For any active sources on a LAN, the PIM DR is
   responsible for registering with the Rendezvous Point (RP) if the
   group is operating in PIM-SM.  Also, the PIM DR is responsible for
   tracking local multicast listeners and forwarding to these listeners
   if the group is operating in PIM-SM.

   Consider the following last hop LAN in Figure 1:

                             (core networks)
                              |     |     |
                              |     |     |
                             R1    R2     R3
                              |     |     |
                              ----(LAN)----
                                    |
                                    |
                            (many receivers)

                       Figure 1: Last Hop LAN with receivers

   Assume R1 is elected as the DR.  According to [RFC7761], R1 will be
   responsible for forwarding traffic to that LAN on behalf of any local
   members.  In addition to keeping track of membership reports, R1 is
   also responsible for initiating the creation of source and/or shared
   trees towards the senders or the RPs.

   The membership reports would be IGMP or MLD messages.  This applies
   to any versions of the IGMP and MLD protocols.  The most recent
   versions are IGMPv3 [RFC3376] and MLDv2 [RFC3810].

   Having a single router acting as DR and being responsible for data
   plane forwarding uncovers a limitation in the protocol.  In
   comparison, even though an OSPF DR or an IS-IS DIS handles additional
   duties while running the OSPF or IS-IS protocols, they are not
   required to be solely responsible for forwarding packets for the
   network.  On a last hop LAN, on the other hand, only the PIM DR is
   asked to forward packets, while the other routers handle only control
   traffic (and perhaps drop packets due to RPF failures).  Hence the
   forwarding load of a last hop LAN is concentrated on a single router.

   This leads to several issues.  One of the issues is that the
   aggregated bandwidth will be limited to what R1 can handle with
   regard to the capacity of its incoming links, its interface on the
   LAN, and its total forwarding capacity.  It is very common that a LAN
   consists of switches that run IGMP/MLD or PIM snooping [RFC4541].
   This allows the forwarding of multicast packets to be restricted only
   to segments leading to receivers who have indicated their interest in
   multicast groups using either IGMP or MLD.  The emergence of switched
   Ethernet allows the aggregated bandwidth to exceed, sometimes by a
   large number, that of a single link.  For example, let us modify
   Figure 1 and introduce an Ethernet switch in Figure 2.

                            (core networks)
                             |     |     |
                             |     |     |
                            R1    R2     R3
                             |     |     |
                          +=gi0===gi1===gi2=+
                          +                 +
                          +      switch     +
                          +                 +
                          +=gi4===gi5===gi6=+
                             |     |     |
                            H1    H2     H3

                   Figure 2: LAN with Ethernet Switch

   Let us assume that each individual link is Gigabit Ethernet.  Each
   router, R1, R2 and R3, and the switch have enough forwarding capacity
   to handle hundreds of Gigabits of data.

   Let us further assume that each of the hosts requests 500 Mbps of
   unique multicast data.  This totals 1.5 Gbps of data, which is less
   than what the switch or the combined uplink bandwidth across the
   routers can handle, even under failure of a single router.

   On the other hand, the link between R1 and the switch, via port gi0,
   can only handle a throughput of 1 Gbps.  If R1 is the only DR (the
   PIM DR elected using the procedure defined by [RFC7761]), at least
   500 Mbps worth of data will be lost, because the only link that can
   be used to draw the traffic from the routers to the switch is via
   gi0.  In other words, the entire network's throughput is limited by
   the single connection between the PIM DR and the switch (or the LAN
   as in Figure 1).

   Another important issue is related to failover.  If R1 is the only
   forwarder on a shared LAN, when R1 goes out of service, multicast
   forwarding for the entire LAN has to be rebuilt by the newly elected
   PIM DR.  However, if there was a way that allowed multiple routers to
   forward to the LAN for different groups, failure of one of the
   routers would only lead to disruption to a subset of the flows,
   therefore improving the overall resilience of the network.

   The hash algorithm defined in this document has a limitation (see
   Section 5.2.2), but this document provides the option to have
   different and more consistent hash algorithms in the future.

   This document specifies a modification to the PIM-SM protocol that
   allows more than one of these routers, called Group Designated
   Routers (GDRs), to be selected so that the forwarding load can be
   distributed among a number of routers.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   With respect to PIM-SM, this document follows the terminology that
   has been defined in [RFC7761].

   This document also introduces the following new acronyms:

   o  GDR: Group Designated Router.  For each multicast flow, either a
      (*,G) for Any-Source Multicast (ASM), or an (S,G) for
      Source-Specific Multicast (SSM) [RFC4607], a hash algorithm
      (described below) is used to select one of the routers as a GDR.
      The GDR is responsible for initiating the forwarding tree building
      process for the corresponding multicast flow.

   o  GDR Candidate: a router that has the potential to become a GDR.
      A GDR Candidate must have the same DR priority and must run the
      same GDR election hash algorithm as the DR.  It must send and
      process new PIM Hello Options as defined in this document.  There
      might be multiple GDR Candidates on a LAN, but only one can become
      the GDR for a specific multicast flow.

3.  Applicability

   The extension specified in this document applies to PIM-SM routers
   when they act as last hop routers (there are directly connected
   receivers).  It does not alter the behavior of a PIM DR, or any other
   routers, on the first hop network (directly connected sources).  This
   is because the source tree is built using the IP address of the
   sender, not the IP address of the PIM DR that sends the registers
   towards the RP.  The load balancing between first hop routers can be
   achieved naturally if an IGP provides equal cost multiple paths
   (which it usually does in practice).  Also, distributing the load to
   do registering does not justify the additional complexity required
   to support it.

4.  Functional Overview

   In the PIM DR election as defined in [RFC7761], when multiple routers
   are connected to a multi-access LAN (for example, an Ethernet), one
   of them is elected to act as PIM DR.  The PIM DR is responsible for
   sending local Join/Prune messages towards the RP or source.  In order
   to elect the PIM DR, each PIM router on the LAN examines the received
   PIM Hello messages and compares its own DR priority and IP address
   with those of its neighbors.  The router with the highest DR priority
   is the PIM DR.  If there are multiple such routers, their IP
   addresses are used as the tie-breaker, as described in [RFC7761].

   In order to share the forwarding load among last hop routers,
   besides the normal PIM DR election, the GDR is also elected on the
   multi-access LAN.  There is only one PIM DR on the multi-access LAN,
   but there might be multiple GDR Candidates.

   For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a
   hash algorithm is used to select one of the routers to be the GDR.  A
   new DR Load Balancing Capability (DRLB-Cap) PIM Hello Option, which
   contains the hash algorithm type, is announced by routers on
   interfaces where this specification is enabled.  Routers with the new
   DRLB-Cap Option advertised in their PIM Hello, and using the same GDR
   election hash algorithm and the same DR priority as the PIM DR, are
   considered as GDR Candidates.

   Hash Masks are defined for Source, Group and RP separately, in order
   to handle PIM ASM/SSM.  The masks, as well as a sorted list of GDR
   Candidate Addresses, are announced by the DR in a new DR Load
   Balancing List (DRLB-List) PIM Hello Option.

   A hash algorithm based on the announced Source, Group, or RP masks
   allows one GDR to be assigned to a corresponding multicast state.
   That GDR is responsible for initiating the creation of the multicast
   forwarding tree for multicast traffic.

4.1.  GDR Candidates

   GDR is the new concept introduced by this specification.  GDR
   Candidates are routers eligible for GDR election on the LAN.  To
   become a GDR Candidate, a router MUST support this specification,
   have the same DR priority, and run the same GDR election hash
   algorithm as the DR on the LAN.

   For example, assume there are 4 routers on the LAN: R1, R2, R3 and
   R4, each announcing a DRLB-Cap option.  R1, R2 and R3 have the same
   DR priority while R4's DR priority is less preferred.  In this
   example, R4 will not be eligible for GDR election, because R4 will
   not become a PIM DR unless all of R1, R2 and R3 go out of service.

   Furthermore, assume that router R1 wins the PIM DR election, that R1
   and R2 run the same hash algorithm for GDR election, and that R3 runs
   a different one.  In this case, only R1 and R2 will be eligible for
   GDR election, while R3 will not.

   As a DR, R1 will include its own Load Balancing Hash Masks and the
   identity of R1 and R2 (the GDR Candidates) in its DRLB-List Hello
   Option.
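   The eligibility rule in this example can be sketched in Python.  This
   is a minimal illustration under stated assumptions: the "Neighbor"
   record and "gdr_candidates" function are hypothetical names, the
   documentation addresses are chosen so that R1 wins the DR election,
   and R3's hash algorithm value 1 stands for some hypothetical
   non-Modulo algorithm.  The result is sorted from the highest address
   to the lowest, as RECOMMENDED for the DRLB-List.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    # Hypothetical per-router state learned from PIM Hello messages.
    name: str
    address: str
    dr_priority: int
    hash_algorithm: Optional[int]  # None if no DRLB-Cap option was seen


def gdr_candidates(dr: Neighbor, routers: List[Neighbor]) -> List[Neighbor]:
    """Keep routers with the DR's DR priority and hash algorithm."""
    eligible = [
        r for r in routers
        if r.hash_algorithm is not None
        and r.hash_algorithm == dr.hash_algorithm
        and r.dr_priority == dr.dr_priority
    ]
    # Descending address order makes the advertised list deterministic.
    return sorted(eligible,
                  key=lambda r: int(ipaddress.ip_address(r.address)),
                  reverse=True)


r1 = Neighbor("R1", "192.0.2.3", 10, 0)
r2 = Neighbor("R2", "192.0.2.2", 10, 0)
r3 = Neighbor("R3", "192.0.2.1", 10, 1)   # different hash algorithm
r4 = Neighbor("R4", "192.0.2.4", 5, 0)    # less preferred DR priority
print([r.name for r in gdr_candidates(r1, [r1, r2, r3, r4])])  # ['R1', 'R2']
```

   The DR includes itself, since it trivially matches its own DR
   priority and hash algorithm.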

5.  Protocol Specification

5.1.  Hash Mask and Hash Algorithm

   A Hash Mask is used to extract a number of bits from the
   corresponding IP address field (32 for IPv4, 128 for IPv6) and
   calculate a hash value.  A hash value is used to select a GDR from
   the GDR Candidates advertised by the PIM DR.  For example, 0.0.255.0
   defines a Hash Mask for an IPv4 address that masks out the first,
   the second, and the fourth octets.

   The hash masks make it possible for certain flows to always be
   forwarded by the same GDR, since such flows produce the same hash
   value.  For instance, the mask 0.0.255.0 means that only the third
   octet will be considered when hashing.
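   A minimal Python sketch of applying a Hash Mask, using the standard
   "ipaddress" module; the helper name "masked" and the two group
   addresses are illustrative assumptions.  The two groups differ in the
   second and fourth octets but share the third, so under 0.0.255.0 they
   keep exactly the same bits and therefore hash identically.

```python
import ipaddress


def masked(address: str, mask: str) -> int:
    """Keep only the address bits that are set in the Hash Mask."""
    return int(ipaddress.ip_address(address)) & int(ipaddress.ip_address(mask))


MASK = "0.0.255.0"  # only the third octet is considered

a = masked("239.1.5.1", MASK)
b = masked("239.200.5.77", MASK)
print(ipaddress.ip_address(a), a == b)  # 0.0.5.0 True
```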

   In the text below, a hash mask is in some places said to be zero.  A
   hash mask is zero if no bits are set, that is, 0.0.0.0 for IPv4 and
   :: for IPv6.  Also, a hash mask is said to be an all-bits-set mask if
   it is 255.255.255.255 for IPv4 or
   FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF for IPv6.

   There are three Hash Masks defined:

   o  RP Hash Mask

   o  Source Hash Mask

   o  Group Hash Mask

   The hash masks need to be configured on the PIM routers that can
   potentially become a PIM DR, unless the implementation provides
   default hash mask values.  An implementation SHOULD have default hash
   mask values as follows.  The default RP Hash Mask SHOULD be zero (no
   bits set).  The default Source and Group Hash Masks SHOULD both be
   all-bits-set masks.  These default values are likely acceptable for
   most deployments, and simplify configuration.

   The DRLB-List Hello Option contains a list of GDR Candidates.  The
   first one listed has ordinal number 0, the second one has ordinal
   number 1, and the last one has ordinal number N - 1 if there are N
   candidates listed.  The hash value computed will be the ordinal
   number of the GDR Candidate that is acting as GDR.

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is not zero (at least one bit is set), calculate the value
      of hashvalue_RP [Section 5.2] to determine the GDR.

   o  If the group is in ASM mode and the RP Hash Mask announced by the
      PIM DR is zero (no bits are set), obtain the value of
      hashvalue_Group [Section 5.2] to determine the GDR.

   o  If the group is in SSM mode, use hashvalue_SG [Section 5.2] to
      determine the GDR.
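   The choice among the three hash values, together with the default
   masks (zero RP Hash Mask, all-bits-set Source and Group Hash Masks),
   can be sketched in Python as follows.  The function names are
   hypothetical and masks are handled as plain integers.  With the
   default RP Hash Mask, ASM groups fall back to hashvalue_Group.

```python
from typing import Tuple


def default_masks(version: int) -> Tuple[int, int, int]:
    """Return default (RP, Source, Group) Hash Masks for IPv4 or IPv6."""
    width = 32 if version == 4 else 128
    all_bits_set = (1 << width) - 1
    # RP mask zero; Source and Group masks all-bits-set.
    return 0, all_bits_set, all_bits_set


def hash_to_use(ssm: bool, rp_mask: int) -> str:
    """Name the hash value that selects the GDR for a flow."""
    if ssm:
        return "hashvalue_SG"
    # ASM: the RP hash applies only if at least one RP mask bit is set.
    return "hashvalue_RP" if rp_mask != 0 else "hashvalue_Group"


rp_mask, _, _ = default_masks(4)
print(hash_to_use(ssm=False, rp_mask=rp_mask))  # hashvalue_Group
```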

   A simple Modulo hash algorithm is defined in this document.  However,
   to allow other hash algorithms to be used, a 1-octet "Hash Algorithm"
   field is included in the DRLB-Cap Hello Option to specify the hash
   algorithm used by the router.

   If different hash algorithms are advertised among the routers on a
   LAN, only the routers advertising the same hash algorithm as the DR
   (as well as having the same DR priority as the DR) are eligible for
   GDR election.

5.2.  Modulo Hash Algorithm

   As part of computing the hash, the notation LSZC(hash_mask) is used
   to denote the number of consecutive zero bits in hash_mask, counted
   from the least significant bit.  As an example, LSZC(255.255.255.128)
   is 7 and LSZC(FFFF:8000::) is 111.  If all bits are set, LSZC will be
   0.  If the mask is zero, then LSZC will be 32 for IPv4, and 128 for
   IPv6.
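   LSZC can be computed by isolating the lowest set bit of the mask
   (value & -value).  The following Python sketch, where the function
   name "lszc" is illustrative, reproduces LSZC(255.255.255.128) = 7,
   LSZC(FFFF:8000::) = 111, and the all-bits-set and zero cases.

```python
import ipaddress


def lszc(mask: str) -> int:
    """Count zero bits from the least significant end of a Hash Mask."""
    addr = ipaddress.ip_address(mask)
    value = int(addr)
    if value == 0:
        return addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    # value & -value keeps only the lowest set bit; its index is the count.
    return (value & -value).bit_length() - 1


print(lszc("255.255.255.128"), lszc("FFFF:8000::"))  # 7 111
```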

   The number of GDR Candidates is denoted as GDRC.

   The idea behind the Modulo hash algorithm is, in simple terms, that
   the corresponding mask is applied to a value, then the result is
   shifted right LSZC(mask) bits so that the least significant bits that
   were masked out are not considered.  Then this result is masked by
   0xFFFF, keeping only the last 16 bits of the result.  Finally, the
   hash value is this result modulo the number of GDR Candidates (GDRC).

   The Modulo hash algorithm for computing the values hashvalue_RP,
   hashvalue_Group and hashvalue_SG is defined as follows.  In these
   expressions, "&" denotes bitwise AND, ">>" a right shift, "^" bitwise
   exclusive OR, and "%" the modulo operation.

   hashvalue_RP is calculated as:

      (((RP_address & RP_mask) >> LSZC(RP_mask)) & 0xFFFF) % GDRC

      RP_address is the address of the RP defined for the group and
      RP_mask is the RP Hash Mask.

   hashvalue_Group is calculated as:

      (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xFFFF) %
      GDRC

      Group_address is the group address and Group_mask is the Group
      Hash Mask.

   hashvalue_SG is calculated as:

      ((((Source_address & Source_mask) >> LSZC(Source_mask)) & 0xFFFF)
      ^ (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xFFFF)) %
      GDRC

      Source_address is the source address, Source_mask is the Source
      Hash Mask, Group_address is the group address and Group_mask is
      the Group Hash Mask.
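   The three formulas translate directly into Python.  This is a sketch,
   not a reference implementation: addresses and masks are given as
   strings and parsed with the standard "ipaddress" module, and the
   helper names are hypothetical.  Parentheses make the operator
   grouping explicit, exactly as in the formulas.

```python
import ipaddress


def _bits(text: str) -> int:
    """Integer value of an IPv4 or IPv6 address or mask."""
    return int(ipaddress.ip_address(text))


def _lszc(mask: str) -> int:
    """Zero bits counted from the least significant bit of the mask."""
    value = _bits(mask)
    if value == 0:
        return ipaddress.ip_address(mask).max_prefixlen
    return (value & -value).bit_length() - 1


def _term(address: str, mask: str) -> int:
    """((address & mask) >> LSZC(mask)) & 0xFFFF, as in the formulas."""
    return ((_bits(address) & _bits(mask)) >> _lszc(mask)) & 0xFFFF


def hashvalue_rp(rp: str, rp_mask: str, gdrc: int) -> int:
    return _term(rp, rp_mask) % gdrc


def hashvalue_group(group: str, group_mask: str, gdrc: int) -> int:
    return _term(group, group_mask) % gdrc


def hashvalue_sg(source: str, source_mask: str,
                 group: str, group_mask: str, gdrc: int) -> int:
    # XOR the masked source and group terms before taking the modulo.
    return (_term(source, source_mask) ^ _term(group, group_mask)) % gdrc


print(hashvalue_rp("192.0.2.1", "0.0.255.0", 3))     # 2
print(hashvalue_rp("198.51.100.2", "0.0.255.0", 3))  # 1
```

   The two printed values correspond to the RP example in Section
   5.2.1.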

5.2.1.  Modulo Hash Algorithm Example

   To help illustrate the algorithm, consider this example.  Router X
   with IPv4 address 203.0.113.1 receives a DRLB-List Hello Option from
   the DR, which announces RP Hash Mask 0.0.255.0 and a list of GDR
   Candidates, sorted by IP addresses from high to low: 203.0.113.3,
   203.0.113.2 and 203.0.113.1.  The ordinal number assigned to those
   addresses would be:

   0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1 (Router X)

   Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2 198.51.100.2
   for Group2.  Following the modulo hash algorithm, LSZC(0.0.255.0) is
   8 and GDRC is 3.  The hashvalue_RP for Group1 with RP1 192.0.2.1 is:

   ((((192.0.2.1 & 0.0.255.0) >> 8) & 0xFFFF) % 3) = 2 % 3 = 2

   which matches the ordinal number assigned to Router X.  Router X will
   be the GDR for Group1.

   The hashvalue_RP for Group2 with RP2 198.51.100.2 is:

   ((((198.51.100.2 & 0.0.255.0) >> 8) & 0xFFFF) % 3) = 100 % 3 = 1

   which is different from the ordinal number of Router X (2).  Hence,
   Router X will not be the GDR for Group2.
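   A GDR Candidate compares each computed hash value with its own
   ordinal number, which is the position of its address in the
   DRLB-List as received (the candidate does not re-sort the list).  A
   minimal Python sketch with hypothetical function names, using the
   addresses and hash values from this example:

```python
import ipaddress
from typing import List


def ordinal(candidates: List[str], me: str) -> int:
    """Position of `me` in the DRLB-List as received, or -1 if absent."""
    mine = ipaddress.ip_address(me)
    for index, address in enumerate(candidates):
        if ipaddress.ip_address(address) == mine:
            return index
    return -1


def is_gdr(candidates: List[str], me: str, hash_value: int) -> bool:
    """A listed candidate is the GDR when the hash equals its ordinal."""
    position = ordinal(candidates, me)
    return position != -1 and position == hash_value


drlb_list = ["203.0.113.3", "203.0.113.2", "203.0.113.1"]
print(is_gdr(drlb_list, "203.0.113.1", 2))  # True  (Group1)
print(is_gdr(drlb_list, "203.0.113.1", 1))  # False (Group2)
```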

5.2.2.  Limitations

   The Modulo Hash Algorithm has poor failover characteristics when a
   shared LAN has more than two GDRs.  In the case of more than two GDRs
   on a LAN, when one GDR fails, all of the groups may be reassigned to
   a different GDR, even if they were not assigned to the failed GDR.
   However, many deployments use only two routers on a shared LAN for
   redundancy purposes.  Future work may define new hash algorithms
   where only groups assigned to the failed GDR get reassigned.
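   The failover behavior can be demonstrated with a small Python
   simulation.  It assumes 600 hypothetical masked group values, three
   GDR Candidates advertised in the order R3, R2, R1, and the failure of
   R2, after which the DR advertises R3, R1.  It counts how many groups
   change GDR, keyed by their previous GDR.

```python
from collections import Counter
from typing import List


def owner(value: int, candidates: List[str]) -> str:
    """Modulo hash: the hash value indexes the advertised candidate list."""
    return candidates[value % len(candidates)]


before = ["R3", "R2", "R1"]   # hypothetical advertised DRLB-List
after = ["R3", "R1"]          # R2 has failed; the DR re-advertises

moved = Counter()
for value in range(600):      # hypothetical masked group values
    if owner(value, before) != owner(value, after):
        moved[owner(value, before)] += 1

print(sorted(moved.items()))  # [('R1', 100), ('R2', 200), ('R3', 100)]
```

   All 200 groups of the failed R2 move, as expected, but 100 groups
   owned by R1 and 100 owned by R3 also move even though those routers
   did not fail.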

5.3.  PIM Hello Options

   When a PIM router sends a PIM Hello on an interface with this
   specification enabled, it includes a new option, called "DR Load
   Balancing Capability (DRLB-Cap)".

   Besides this DRLB-Cap Hello Option, the elected PIM DR also includes
   a new "DR Load Balancing List (DRLB-List) Hello Option".  The
   DRLB-List Hello Option consists of three Hash Masks as defined above
   and also a sorted list of GDR Candidate addresses on the LAN.

   The elected PIM DR uses the DRLB-Cap Hello Options advertised by all
   routers on the LAN to compose the DRLB-List Option.  The GDR
   Candidates use the DRLB-List Hello Option advertised by the PIM DR to
   calculate the hash value.

5.3.1.  PIM DR Load Balancing Capability (DRLB-Cap) Hello Option

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           Type = 34           |         Length = 4            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Reserved                  |Hash Algorithm |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 3: PIM DR Load Balancing Capability Hello Option

      Type: 34

      Length: 4

      Reserved: Transmitted as zero, ignored on receipt.

      Hash Algorithm: Hash algorithm type.  The value 0 denotes the
      Modulo algorithm defined in this document.

   This DRLB-Cap Hello Option MUST be advertised by routers on all
   interfaces where DR Load Balancing is enabled.
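   The DRLB-Cap layout can be checked with Python's standard "struct"
   module.  This sketch assumes, as in Figure 3, 16-bit Type and Length
   fields in network byte order, with Length counting only the 4 octets
   that follow; the function names are hypothetical.  The 24 Reserved
   bits are written as zero and skipped on decode.

```python
import struct

DRLB_CAP_TYPE = 34
MODULO_HASH = 0  # Hash Algorithm value for the Modulo algorithm


def encode_drlb_cap(hash_algorithm: int = MODULO_HASH) -> bytes:
    """Type, Length = 4, 24 reserved bits sent as zero, Hash Algorithm."""
    return struct.pack("!HH3xB", DRLB_CAP_TYPE, 4, hash_algorithm)


def decode_drlb_cap(option: bytes) -> int:
    """Return the Hash Algorithm; the Reserved bits are ignored."""
    if len(option) < 8:
        raise ValueError("option too short")
    opt_type, length, hash_algorithm = struct.unpack("!HH3xB", option[:8])
    if opt_type != DRLB_CAP_TYPE or length != 4:
        raise ValueError("not a DRLB-Cap option")
    return hash_algorithm


print(encode_drlb_cap().hex())  # 0022000400000000
```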

5.3.2.  PIM DR Load Balancing List (DRLB-List) Hello Option

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |           Type = 35           |         Length                |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          Group Mask                           |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                          Source Mask                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                            RP Mask                            |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                    GDR Candidate Address(es)                  |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 4: PIM DR Load Balancing List Hello Option

      Type: 35

      Length: (3 + n) x (4 or 16), where n is the number of GDR
      candidates.

      Group Mask (32/128 bits): Mask applied to group addresses as part
      of hash computation.

      Source Mask (32/128 bits): Mask applied to source addresses as
      part of hash computation.

      RP Mask (32/128 bits): Mask applied to RP addresses as part of
      hash computation.

         All masks MUST have the same number of bits as the IP source
         address in the PIM Hello IP header.

      GDR Candidate Address (32/128 bits): Address(es) of GDR
      Candidate(s).

         All addresses MUST be in the same address family as the PIM
         Hello IP header.  It is RECOMMENDED that the addresses are
         sorted in descending order.  The order is converted to the
         ordinal number associated with each GDR Candidate in hash value
         calculation.  For example, if the addresses advertised are R3,
         R2, R1, the ordinal number assigned to R3 is 0, to R2 is 1 and
         to R1 is 2.

         If the "Interface ID" option, as specified in [RFC6395], is
         present in a GDR Candidate's PIM Hello message, and the "Router
         ID" portion is non-zero:

         +  For IPv4, the "GDR Candidate Address" will be set directly
            to the "Router ID".

         +  For IPv6, the "GDR Candidate Address" will be set to the
            IPv4-Compatible IPv6 address of the "Router ID", as
            described in [RFC4291], that is, 96 bits of zeroes followed
            by the 32-bit Router ID.

         If the "Interface ID" option is not present in a GDR
         Candidate's PIM Hello message, or if the "Interface ID" option
         is present but the "Router ID" field is zero, the "GDR Candidate
         Address" will be the IPv4 or IPv6 source address of the PIM
         Hello message.
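   The address selection rules above can be sketched in Python with the
   standard "ipaddress" module.  "gdr_candidate_address" is a
   hypothetical helper; it assumes the caller has already extracted the
   32-bit Router ID from the neighbor's Interface ID option [RFC6395],
   or passes None when that option is absent.

```python
import ipaddress
from typing import Optional, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def gdr_candidate_address(hello_source: str,
                          router_id: Optional[int]) -> Address:
    """Address the DR lists for a GDR Candidate.

    router_id: 32-bit Router ID from the neighbor's Interface ID option,
    or None when that option is absent.
    """
    source = ipaddress.ip_address(hello_source)
    if not router_id:
        # No Interface ID option, or Router ID is zero: use the source.
        return source
    if source.version == 4:
        return ipaddress.IPv4Address(router_id)
    # IPv6: 96 bits of zeroes followed by the 32-bit Router ID.
    return ipaddress.IPv6Address(router_id)


rid = int(ipaddress.IPv4Address("192.0.2.7"))
print(gdr_candidate_address("fe80::1", rid))        # ::c000:207
print(gdr_candidate_address("198.51.100.1", None))  # 198.51.100.1
```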

   This DRLB-List Hello Option MUST only be advertised by the elected
   PIM DR.  It MUST be ignored if received from a non-DR.
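   Encoding the DRLB-List option follows Figure 4: the three masks in
   the order Group, Source, RP, then the candidate addresses, all from
   one address family.  This Python sketch uses hypothetical names,
   leaves the sorting to the caller, and computes Length as the number
   of value octets, (3 + n) x 4 for IPv4 or (3 + n) x 16 for IPv6.

```python
import ipaddress
import struct
from typing import List

DRLB_LIST_TYPE = 35


def encode_drlb_list(group_mask: str, source_mask: str, rp_mask: str,
                     candidates: List[str]) -> bytes:
    """Build Type, Length, Group/Source/RP masks and candidate addresses.

    The caller passes candidates already sorted (descending is
    RECOMMENDED); their order defines the ordinal numbers.
    """
    fields = [ipaddress.ip_address(text)
              for text in (group_mask, source_mask, rp_mask, *candidates)]
    if len({field.version for field in fields}) != 1:
        raise ValueError("masks and addresses must use one address family")
    value = b"".join(field.packed for field in fields)
    # Length = (3 + n) x (4 or 16) octets of value.
    return struct.pack("!HH", DRLB_LIST_TYPE, len(value)) + value


option = encode_drlb_list("255.255.255.255", "255.255.255.255", "0.0.0.0",
                          ["203.0.113.3", "203.0.113.2", "203.0.113.1"])
print(struct.unpack_from("!HH", option), len(option))  # (35, 24) 28
```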

5.4.  PIM DR Operation

   The DR election process is still the same as defined in [RFC7761].  A
   DR that has this specification enabled on an interface advertises the
   new DRLB-List Hello Option, which contains mask values from user
   configuration (or default values), followed by a sorted list of GDR
   Candidate Addresses.  It is RECOMMENDED that the list is sorted from
   the highest value to the lowest value.  The reason for sorting the
   list is to make the behavior deterministic, regardless of the order
   in which the DR learns of new candidates.  Note that, like non-DR
   routers, the DR also advertises the DRLB-Cap Hello Option to indicate
   its capability of supporting this specification and the type of its
   GDR election hash algorithm.

   A PIM DR that receives a PIM Hello with the DRLB-List Option from
   another router ignores the option, as specified in Section 5.3.2.

   If a PIM DR receives a neighbor's DRLB-Cap Hello Option, which
   contains the same hash algorithm as the DR, and the neighbor has the
   same DR priority as the DR, the PIM DR SHOULD consider the neighbor
   as a GDR Candidate and insert the neighbor's GDR Candidate Address
   into the sorted list of the DRLB-List Option.  However, the DR may
   have policies limiting which GDR Candidates, or the number of GDR
   Candidates, to include.

   The DR would normally include itself in the list of GDR Candidates.

   If a PIM neighbor included in the list expires, stops announcing the
   DRLB-Cap Hello Option, changes DR priority, changes hash algorithm or
   otherwise becomes ineligible as a candidate, the DR should
   immediately send a triggered Hello with a new list in the DRLB-List
   option, excluding the neighbor.

   If a new router becomes eligible as a candidate, there is no urgency
   in sending out an updated list.  An updated list SHOULD be included
   in the next Hello.
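   The DR's choice between a triggered Hello and waiting for the next
   periodic Hello can be sketched in Python.  This is an illustration
   only, with hypothetical names and documentation addresses: it ignores
   any local policy that limits the list, and it treats eligibility
   (same DR priority and hash algorithm, DRLB-Cap still announced,
   neighbor not expired) as already computed by the caller.

```python
from typing import List, Set


def list_update_action(advertised: List[str], eligible: Set[str]) -> str:
    """Return 'triggered', 'next-hello' or 'none' for the DRLB-List.

    advertised: candidate addresses in the DRLB-List last sent.
    eligible:   addresses of routers currently eligible as candidates.
    """
    if any(addr not in eligible for addr in advertised):
        # A listed neighbor expired or became ineligible: resend at once.
        return "triggered"
    if eligible - set(advertised):
        # New candidates are not urgent; include them in the next Hello.
        return "next-hello"
    return "none"


print(list_update_action(["192.0.2.3", "192.0.2.2"], {"192.0.2.3"}))
```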

5.5.  PIM GDR Candidate Operation

   When an IGMP/MLD report is received, without this specification, only
   the PIM DR will handle the join and potentially run into the issues
   described earlier.  Using this specification, a hash algorithm is
   used by the GDR Candidates to determine which router is going to be
   responsible for building forwarding trees on behalf of the host.

   If this specification is enabled on an interface, the router MUST
   include the DRLB-Cap Hello Option in all PIM Hello messages sent on
   that interface.  Note that the presence of the DRLB-Cap Option in a
   PIM Hello does not guarantee that this router will be considered as a
   GDR Candidate.  Once DR election is done, the DRLB-List Hello Option
   will be received from the current PIM DR on the link; it contains the
   list of GDR Candidates selected by the PIM DR.

   A router only acts as a GDR Candidate if it is included in the GDR
   Candidate list of the DRLB-List Hello Option.  See the next section
   for details.

5.6.  DRLB-List Hello Option Processing

   This section discusses processing of the DRLB-List Hello Option.  All
   routers MUST ignore the DRLB-List Hello Option if it is received from
   a PIM router which is not the DR.  The option MUST only be processed
   by routers that are announcing the DRLB-Cap Option.  Also, the hash
   algorithm that the router announces in its DRLB-Cap Option MUST be
   the same as the one announced by the DR.  All GDR Candidates MUST use
   the Hash Masks advertised by the DR in the DRLB-List Hello Option,
   even if they differ from those the candidate was configured with.

   A router stores the latest option contents that were announced, if
   any, replacing any previously stored contents.  The router MUST
   compare the new contents with the previously stored contents and, if
   there are any changes, continue processing as below.  Note that if
   the option does not pass the above checks, the processing below MUST
   be done as if the option was not announced.
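   A non-normative Python sketch of the acceptance checks and change
   detection described in this section follows.  The "DrlbList" and
   "DrlbListReceiver" names are hypothetical, masks and addresses are
   assumed to be IPv4, and the DR's hash algorithm is assumed to be
   known from its DRLB-Cap Option.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import IPv4Address, ip_address
from typing import Optional


@dataclass(frozen=True)
class DrlbList:
    """Hypothetical parsed DRLB-List option (IPv4 only, for brevity)."""
    source_mask: IPv4Address
    group_mask: IPv4Address
    rp_mask: IPv4Address
    candidates: tuple[IPv4Address, ...]


class DrlbListReceiver:
    """Tracks the DRLB-List contents a router acts on."""

    def __init__(self, my_algorithm: Optional[int]) -> None:
        self.my_algorithm = my_algorithm  # None: this router sends no DRLB-Cap
        self.saved: Optional[DrlbList] = None

    def accept(self, sender: IPv4Address, current_dr: IPv4Address,
               dr_algorithm: Optional[int],
               option: Optional[DrlbList]) -> Optional[DrlbList]:
        """Return the option, or None if it counts as not announced."""
        if option is None or sender != current_dr:
            return None                   # absent, or not sent by the DR
        if self.my_algorithm is None or self.my_algorithm != dr_algorithm:
            return None                   # no DRLB-Cap, or algorithm mismatch
        return option                     # the DR's masks override local config

    def update(self, sender: IPv4Address, current_dr: IPv4Address,
               dr_algorithm: Optional[int],
               option: Optional[DrlbList]) -> bool:
        """Save the latest accepted contents; True means process a change."""
        accepted = self.accept(sender, current_dr, dr_algorithm, option)
        changed = accepted != self.saved  # first receipt, change or withdrawal
        self.saved = accepted             # previous contents are discarded
        return changed
```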

   If the contents of the DRLB-List Option (the masks or the candidate
   list) differ from the previously saved copy, if the option is
   received for the first time, or if it is no longer being received or
   accepted, the option MUST be processed as described below.

   1.  If the router was not included in the previous GDR list, or there
       was no previous GDR list, but it is included in the new GDR list,
       the router MUST, for each of the groups (or source and group pairs
       if the group is in SSM mode) with local receiver interest, run
       the hash algorithm to determine which of them it is the GDR for.

          If it is not the GDR for a group, or source and group pair if
          SSM, no processing is required.

          If it is hashed as the GDR, it needs to build a multicast
          forwarding tree.

   2.  If the router was included in the previous GDR list, and still is
       included in the new GDR list, the router MUST, for each of the
       groups (or source and group pairs if the group is in SSM mode)
       with local receiver interest, run the hash algorithm to determine
       which of them it is the GDR for.

          If it was the GDR for a group, or source and group pair if
          SSM, and the new hash result chose it as the GDR, then no
          processing is required.

          If it was the GDR for a group, or source and group pair if
          SSM, earlier and now it is no longer the GDR, then it sets the
          assert metric preference to maximum (0x7FFFFFFF) and the
          assert metric to one less than maximum (0xFFFFFFFE), as
          explained in Section 5.7.

          If it was not the GDR for a group, or source and group pair if
          SSM, earlier, and the new hash does not make it GDR, then no
          processing is required.

          If it was not the GDR for a group, or source and group pair if
          SSM, earlier, and now becomes the GDR, it starts building a
          multicast forwarding tree for this flow.

   3.  If the router was included in the previous GDR list, but is not
       included in the new GDR list, or there is no new GDR list, the
       router MUST, for each of the groups (or source and group pairs if
       the group is in SSM mode) with local receiver interest, do as
       follows.

          If it was the GDR for a group, or source and group pair if
          SSM, it sets the assert metric preference to maximum
          (0x7FFFFFFF) and the assert metric to one less than maximum
          (0xFFFFFFFE), as explained in Section 5.7.

          If it was not the GDR, then no processing is required.
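   The per-flow decisions in items 1 to 3 can be summarized by the
   following Python sketch.  It is illustrative only: "flow_action" and
   "Action" are hypothetical names, and the hash computation itself is
   represented by the "hashed_to_me" input rather than implemented.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    NONE = "no processing required"
    BUILD_TREE = "build multicast forwarding tree"
    ASSERT_NEAR_MAX = "use near-maximum assert metric"


def flow_action(was_gdr: bool, in_new_list: bool,
                hashed_to_me: bool) -> Action:
    """Per-flow action after a DRLB-List change (items 1 to 3).

    was_gdr:      the router was the GDR for the flow under the previous
                  GDR list (False if there was no previous list or the
                  router was not in it).
    in_new_list:  the router is listed in the new GDR Candidate list.
    hashed_to_me: the hash over the new list selects this router
                  (ignored when in_new_list is False).
    """
    if not in_new_list:              # item 3: removed, or no new list
        return Action.ASSERT_NEAR_MAX if was_gdr else Action.NONE
    if hashed_to_me:                 # items 1 and 2: newly or still GDR
        return Action.NONE if was_gdr else Action.BUILD_TREE
    return Action.ASSERT_NEAR_MAX if was_gdr else Action.NONE
```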

5.7.  PIM Assert Modification

   GDR changes may occur due to configuration changes, GDR Candidates
   going down, or new routers coming up and becoming GDR Candidates.
   This may occur while flows are being forwarded.  If the GDR for an
   active flow changes, there is likely to be some disruption, such as
   packet loss or duplicates.  By using asserts, packet loss is
   minimized, while allowing a small amount of duplicates.

   When a router stops acting as the GDR for a group, or source and
   group pair if SSM, it MUST set the assert metric preference to
   maximum (0x7FFFFFFF) and the assert metric to one less than maximum
   (0xFFFFFFFE).  This was also mentioned in the previous section.  That
   is, whenever it sends or receives an assert for the group, it must
   use these values as the metric preference and metric rather than the
   values provided by routing.  This is similar to what is done for
   AssertCancel messages in [RFC7761], except that the metric value here
   is one less.
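   The effect of these near-maximum values on the assert election can be
   sketched in Python as follows.  This is a simplified, non-normative
   model of the [RFC7761] assert comparison (RPT bit, then metric
   preference, then route metric, lower being better, with the higher IP
   address breaking ties); the names "AssertMetric", "assert_winner" and
   "metric_for" are hypothetical.  The 0x7FFFFFFF limit reflects that
   the metric preference is a 31-bit field that shares 32 bits with the
   RPT bit in the Assert message.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import IPv4Address, ip_address

NEAR_MAX_PREFERENCE = 0x7FFFFFFF  # largest 31-bit metric preference
NEAR_MAX_METRIC = 0xFFFFFFFE      # one less than the largest route metric


@dataclass(frozen=True)
class AssertMetric:
    rpt_bit: int        # 0 for (S,G) asserts in this sketch
    preference: int
    metric: int
    address: IPv4Address

    def key(self) -> tuple[int, int, int, int]:
        # Lower is better for the first three fields; the higher address
        # wins ties, so it is negated to keep "lower key wins" throughout.
        return (self.rpt_bit, self.preference, self.metric,
                -int(self.address))


def assert_winner(a: AssertMetric, b: AssertMetric) -> AssertMetric:
    """Return the assert winner between two forwarders on the LAN."""
    return a if a.key() < b.key() else b


def metric_for(still_gdr: bool, route_pref: int, route_metric: int,
               address: IPv4Address) -> AssertMetric:
    """Metric a forwarding router uses in asserts for one flow.

    A router that was the GDR but no longer is uses the near-maximum
    values instead of the values provided by routing.
    """
    if still_gdr:
        return AssertMetric(0, route_pref, route_metric, address)
    return AssertMetric(0, NEAR_MAX_PREFERENCE, NEAR_MAX_METRIC, address)
```

   A former GDR therefore loses the assert to any router that still uses
   routing-derived values, even if the former GDR has the better route
   or the higher address.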

   The rest of this section is just for illustration purposes and is not
   part of the protocol definition.

   To illustrate the behavior when there is a GDR change, consider the
   following scenario where there are two flows G1 and G2.  R1 is the
   GDR for G1, and R2 is the GDR for G2.  When R3 comes up, it is
   possible that R3 becomes GDR for both G1 and G2, hence R3 starts to
   build the forwarding trees for G1 and G2.  If R1 and R2 stop
   forwarding before R3 completes the process, packet loss might occur.
   On the other hand, if R1 and R2 continue forwarding while R3 is
   building the forwarding trees, duplicates might occur.

   This is not a typical deployment scenario, but it might still happen.
   The mechanism described here minimizes the impact.  Since the main
   goal is to minimize packet loss, a small amount of duplicates is
   allowed, and PIM Assert is relied upon to minimize the duplication.

   When the role of GDR changes as above, instead of immediately
   stopping forwarding, R1 and R2 continue forwarding to G1 and G2
   respectively, while, at the same time, R3 builds forwarding trees for
   G1 and G2.  This will lead to PIM Asserts.

   In other words, a router that has this specification enabled on its
   downstream interface, and that was the GDR before the change but no
   longer is, advertises the near-maximum assert metric described above
   rather than its routing metric.

   Using the above example, for G1, assume R1 and R3 agree on the new
   GDR, which is R3.  With the new assert behavior, R1 sets its assert
   metric to the near-maximum value discussed above.  That will make R3,
   which has a normal metric in its Assert, the Assert winner.

   For G2, assume it takes a slightly longer time for R2 to find out
   that R3 is the new GDR, so R2 still considers itself the GDR while R3
   has already assumed the role of GDR.  Since both R2 and R3 think they
   are GDRs, they further compare their metrics and IP addresses.  If R3
   has the better routing metric, or the same metric but a better
   tie-breaker, the result will be consistent with the GDR selection.
   If, unfortunately, R2 has the better metric, or the same metric but a
   better tie-breaker, R2 will become the Assert winner and continue to
   forward traffic.  Shortly afterwards, when R2 finds out that it is no
   longer the GDR, R2 will change to using the near-maximum assert
   metric.  As the Assert winner, R2 sends periodic Assert messages per
   [RFC7761], so the next time R2 sends an Assert message, it will lose
   the assert and stop forwarding.

5.8.  Backward Compatibility

   In the case of a hybrid Ethernet shared LAN (where some PIM routers
   enable the specification defined in this document, and some do not):

   o  If a router which does not support this specification becomes the
      DR on the LAN, then it is the only router acting as a DR, and
      there will be no load-balancing.

   o  If a router which does not support this specification becomes a
      non-DR on the link, then it acts as a non-DR as defined in
      [RFC7761], and it will not take part in any load-balancing.
      Load-balancing may still happen among the other routers.

6.  Manageability Considerations

   An administrator needs to consider what the total bandwidth
   requirements are and find a set of routers that together have enough
   total capacity, while making sure that each of the routers can handle
   its part, assuming that the traffic is distributed roughly equally
   among the routers.  Ideally, there should also be enough bandwidth to
   handle the case where at least one router fails.  Ideally, all the
   routers should also have reachability to the sources, and RPs if
   applicable, that is not via the LAN.

   Care must be taken when choosing what hash masks to configure.  One
   would typically configure the same masks on all the routers, so that
   they are the same regardless of which router is elected as DR.  The
   default masks are likely suitable for most deployments.  The RP Hash
   Mask must be configured (the default is no bits set) if one wishes to
   hash based on the RP address rather than the group address for ASM.
   The default masks will use the entire group addresses, and source
   addresses if SSM, as part of the hash.  An administrator may set
   other masks that mask out part of the addresses to ensure that
   certain flows always get hashed to the same router.  How this is
   achieved depends on how the group addresses are allocated.
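   To illustrate why masking keeps certain flows together, the following
   Python sketch groups ASM group addresses by their masked value.  It
   does not implement the hash algorithm of this document; it relies
   only on the fact that a hash computed over masked addresses cannot
   distinguish addresses that differ solely in masked-out bits.  The
   function names are hypothetical, and only IPv4 group masks are shown.

```python
from collections import defaultdict
from ipaddress import IPv4Address, ip_address


def masked(addr: str, mask: str) -> IPv4Address:
    """Apply a hash mask to an address (bitwise AND)."""
    return IPv4Address(int(ip_address(addr)) & int(ip_address(mask)))


def hash_buckets(groups: list, group_mask: str) -> dict:
    """Group addresses by masked value.

    All groups in one bucket present identical input to any mask-based
    hash, so they are always assigned to the same GDR.
    """
    buckets = defaultdict(list)
    for g in groups:
        buckets[masked(g, group_mask)].append(g)
    return dict(buckets)


if __name__ == "__main__":
    groups = ["239.1.1.1", "239.1.1.200", "239.1.2.1"]
    print(hash_buckets(groups, "255.255.255.0"))
```

   With a group mask of 255.255.255.0, 239.1.1.1 and 239.1.1.200 fall in
   the same bucket, while a full mask (255.255.255.255) would hash every
   group independently.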

   Only the routers announcing the same Hash Algorithm as the DR would
   be considered as GDR candidates.  Network administrators need to make
   sure that the desired set of routers announce the same algorithm.
   Migration between different algorithms is not considered in this
   document.

7.  IANA Considerations

   IANA has temporarily assigned type 34 for the PIM DR Load Balancing
   Capability (DRLB-Cap) Hello Option, and type 35 for the PIM DR Load
   Balancing GDR List (DRLB-List) Hello Option in the PIM-Hello Options
   registry.  IANA is requested to make these assignments permanent when
   this document is published as an RFC.  Note that the option names
   have changed slightly since the temporary assignments were made.
   Also, the length of option 34 is always 4, while the registry
   currently says it is variable.

   This document requests IANA to create a registry called "Designated
   Router Load Balancing Hash Algorithms" in the "Protocol Independent
   Multicast (PIM)" branch of the registry tree.  The registry lists
   hash algorithms for use by PIM Designated Router Load Balancing.

7.1.  Initial registry

   The initial content of the registry should be as follows.

    Type   Name                                     Reference
    ------ ---------------------------------------- --------------------
    0      Modulo                                   This document
    1-255  Unassigned

7.2.  Assignment of new hash algorithms

   Assignment of new hash algorithms is done according to the "IETF
   Review" model; see [RFC8126].

8.  Security Considerations

   Security of the new DR Load Balancing PIM Hello Options is only
   guaranteed by the security of PIM Hello messages, so the security
   considerations for PIM Hello messages as described in PIM-SM
   [RFC7761] apply here.

   If the DR is subverted it could omit or add certain GDRs or announce
   an unsupported algorithm.  If another router is subverted, it could
   be made DR and cause similar issues.  While these issues are specific
   to this specification, they are not that different from existing
   attacks such as subverting a DR and lowering the DR priority, causing
   a different router to become the DR.

   If a GDR is subverted, it could potentially be made to stop
   forwarding all the traffic it is expected to forward.  This is
   similar to what can happen today if a DR is subverted.

9.  Acknowledgement

   The authors would like to thank Steve Simlo and Taki Millonis for
   helping with the original idea; Alia Atlas, Bill Atwood, Jake
   Holland, Bharat Joshi, Anish Kachinthaya, Anvitha Kachinthaya and
   Alvaro Retana for reviews and comments; and Toerless Eckert and
   Rishabh Parekh for helpful conversation on the document.

   Special thanks to Anish Kachinthaya, Anvitha Kachinthaya and Jake
   Holland for reviewing the document and providing comments.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
              2006, <https://www.rfc-editor.org/info/rfc4291>.

   [RFC6395]  Gulrajani, S. and S. Venaas, "An Interface Identifier (ID)
              Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395,
              October 2011, <https://www.rfc-editor.org/info/rfc6395>.

   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
              Multicast - Sparse Mode (PIM-SM): Protocol Specification
              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
              2016, <https://www.rfc-editor.org/info/rfc7761>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

10.2.  Informative References

   [RFC3376]  Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
              Thyagarajan, "Internet Group Management Protocol, Version
              3", RFC 3376, DOI 10.17487/RFC3376, October 2002,
              <https://www.rfc-editor.org/info/rfc3376>.

   [RFC3810]  Vida, R., Ed. and L. Costa, Ed., "Multicast Listener
              Discovery Version 2 (MLDv2) for IPv6", RFC 3810,
              DOI 10.17487/RFC3810, June 2004,
              <https://www.rfc-editor.org/info/rfc3810>.

   [RFC4541]  Christensen, M., Kimball, K., and F. Solensky,
              "Considerations for Internet Group Management Protocol
              (IGMP) and Multicast Listener Discovery (MLD) Snooping
              Switches", RFC 4541, DOI 10.17487/RFC4541, May 2006,
              <https://www.rfc-editor.org/info/rfc4541>.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, DOI 10.17487/RFC4607, August 2006,
              <https://www.rfc-editor.org/info/rfc4607>.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017,
              <https://www.rfc-editor.org/info/rfc8126>.

Authors' Addresses

   Yiqun Cai
   Alibaba Group

   Email: yiqun.cai@alibaba-inc.com

   Heidi Ou
   Alibaba Group

   Email: heidi.ou@alibaba-inc.com

   Sri Vallepalli
   Cisco Systems, Inc.
   3625 Cisco Way
   San Jose  CA 95134
   USA

   Email: svallepa@cisco.com

   Mankamana Mishra
   Cisco Systems, Inc.
   821 Alder Drive,
   Milpitas  CA 95035
   USA

   Email: mankamis@cisco.com

   Stig Venaas
   Cisco Systems, Inc.
   Tasman Drive
   San Jose  CA 95134
   USA

   Email: stig@cisco.com

   Andy Green
   British Telecom
   Adastral Park
   Ipswich  IP5 2RE
   United Kingdom

   Email: andy.da.green@bt.com