Dynamic Host Configuration (DHC)                            T. Mrugalski
Internet-Draft                                                       ISC
Intended status: Standards Track                              K. Kinnear
Expires: April 25, 2013                                            Cisco
                                                        October 22, 2012

                         DHCPv6 Failover Design
                draft-ietf-dhc-dhcpv6-failover-design-02

Abstract

DHCPv6 as defined in [RFC3315] does not offer server redundancy. This document defines a design for DHCPv6 failover, a mechanism for running two servers on the same network with the capability for either server to take over clients' leases in case of server failure or network partition. This is a DHCPv6 failover design document; it is not a protocol specification. It is the second document in a planned series of three. DHCPv6 failover requirements are specified in [I-D.ietf-dhc-dhcpv6-failover-requirements]. A protocol specification document is planned to follow this document.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on April 25, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Requirements Language
   2.  Glossary
   3.  Introduction
     3.1.  Additional Requirements
     3.2.  Features out of Scope: Load Balancing
   4.  Protocol Overview
     4.1.  Failover State Machine Overview
     4.2.  Messages
   5.  Connection Management
     5.1.  Creating Connections
     5.2.  Endpoint Identification
   6.  Resource Allocation
     6.1.  Proportional Allocation
     6.2.  Independent Allocation
     6.3.  Choosing Allocation Algorithm
   7.  Information model
   8.  Failover Mechanisms
     8.1.  Time Skew
     8.2.  Time expression
     8.3.  Lazy updates
     8.4.  MCLT concept
       8.4.1.  MCLT example
     8.5.  Unreachability detection
     8.6.  Re-allocating Leases
     8.7.  Sending Binding Update
     8.8.  Receiving Binding Update
     8.9.  Conflict Resolution
     8.10. Acknowledging Reception
   9.  Endpoint States
     9.1.  State Machine Operation
     9.2.  State Machine Initialization
     9.3.  STARTUP State
       9.3.1.  Operation in STARTUP State
       9.3.2.  Transition Out of STARTUP State
     9.4.  PARTNER-DOWN State
       9.4.1.  Operation in PARTNER-DOWN State
       9.4.2.  Transition Out of PARTNER-DOWN State
     9.5.  RECOVER State
       9.5.1.  Operation in RECOVER State
       9.5.2.  Transition Out of RECOVER State
     9.6.  RECOVER-WAIT State
       9.6.1.  Operation in RECOVER-WAIT State
       9.6.2.  Transition Out of RECOVER-WAIT State
     9.7.  RECOVER-DONE State
       9.7.1.  Operation in RECOVER-DONE State
       9.7.2.  Transition Out of RECOVER-DONE State
     9.8.  NORMAL State
       9.8.1.  Operation in NORMAL State
       9.8.2.  Transition Out of NORMAL State
     9.9.  COMMUNICATIONS-INTERRUPTED State
       9.9.1.  Operation in COMMUNICATIONS-INTERRUPTED State
       9.9.2.  Transition Out of COMMUNICATIONS-INTERRUPTED State
     9.10. POTENTIAL-CONFLICT State
       9.10.1. Operation in POTENTIAL-CONFLICT State
       9.10.2. Transition Out of POTENTIAL-CONFLICT State
     9.11. RESOLUTION-INTERRUPTED State
       9.11.1. Operation in RESOLUTION-INTERRUPTED State
       9.11.2. Transition Out of RESOLUTION-INTERRUPTED State
     9.12. CONFLICT-DONE State
       9.12.1. Operation in CONFLICT-DONE State
       9.12.2. Transition Out of CONFLICT-DONE State
   10. Proposed extensions
     10.1. Active-active mode
   11. Dynamic DNS Considerations
     11.1. Relationship between failover and dynamic DNS update
     11.2. Exchanging DDNS Information
     11.3. Adding RRs to the DNS
     11.4. Deleting RRs from the DNS
     11.5. Name Assignment with No Update of DNS
   12. Reservations and failover
   13. Security Considerations
   14. IANA Considerations
   15. Acknowledgements
   16. References
     16.1. Normative References
     16.2. Informative References
   Authors' Addresses

1.
Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2. Glossary

This is a supplemental glossary that should be combined with the definitions in Section 3 of [I-D.ietf-dhc-dhcpv6-failover-requirements].

   o Failover endpoint - The failover protocol allows for there to be a unique failover 'endpoint' for each failover relationship in which a failover server participates. The failover relationship is defined by a relationship name, and includes the failover partner IP address, the role this server takes with respect to that partner (primary or secondary), and the prefixes associated with that relationship. Note that a single prefix can only be associated with a single failover relationship. This failover endpoint can take actions and hold unique states. Typically, there is one failover endpoint per partner (server), although there may be more. 'Server' and 'failover endpoint' are synonymous only if the server participates in only one failover relationship. However, for the sake of simplicity, 'Server' is used throughout the document to refer to a failover endpoint unless to do so would be confusing.

   o Failover transmission - All messages exchanged between partners.

   o Independent Allocation - A prefix allocation algorithm that splits the available pool of resources between the primary and secondary servers and that is particularly well suited for vast pools (i.e. when available resources are not expected to deplete). See Section 6.2 for details.

   o Partner - Name of the other DHCPv6 server that participates in a failover relationship. When the role (primary or secondary) is not important, the other server is referred to as a "failover partner" or simply partner.

   o Primary Server - The first of the two DHCPv6 servers that participate in a failover relationship. In active-passive mode this is the server that handles most of the client traffic. Its failover partner is referred to as the secondary server.

   o Proportional Allocation - A prefix allocation algorithm that splits the available resources (addresses or prefixes) between the primary and secondary servers and that is particularly well suited for more limited resources. See Section 6.1 for details.

   o Resource - Any type of resource that is assignable using DHCPv6. Currently there are two types of such resources defined: a non-temporary IPv6 address and an IPv6 prefix. Due to the nature of temporary addresses, they are not covered by the failover mechanism. Other resource types may be defined in the future.

   o Responsive - A server that is responsive will respond to DHCPv6 client requests.

   o Secondary Server - The second of the two DHCPv6 servers that participate in a failover relationship. Its failover partner is referred to as the primary server. In active-passive mode this server typically does not handle client traffic and acts as a backup.

   o Server - A DHCPv6 server that implements DHCPv6 failover. 'Server' and 'failover endpoint' are synonymous only if the server participates in only one failover relationship.

   o Unresponsive - A server that is unresponsive will not respond to DHCPv6 client requests.

3. Introduction

The failover protocol design provides a means for cooperating DHCPv6 servers to work together to provide a DHCPv6 service with availability that is increased beyond that which could be provided by a single DHCPv6 server operating alone. It is designed to protect DHCPv6 clients against server unreachability, including server failure and network partition.
It is possible to deploy exactly two servers that are able to continue providing a lease on an IPv6 address [RFC3315] or on an IPv6 prefix [RFC3633] without the DHCPv6 client experiencing lease expiration or a reassignment of a lease to a different IPv6 address in the event of failure by one or the other of the two servers.

This protocol defines active-passive mode, sometimes also called a hot standby model. This means that during normal operation one server is active (i.e. actively responds to clients' requests) while the second is passive (i.e. it does receive clients' requests, but does not respond to them; it only maintains a copy of the lease database and is ready to take over incoming queries in case of primary server failure). Active-active mode (i.e. both servers actively handling clients' requests) is currently not supported for the sake of simplicity. Such a mode may be defined as an extension at a later time.

The failover protocol is designed to provide lease stability for leases with lease times beyond a short period. Due to the additional overhead required, failover is not suitable for leases shorter than 30 seconds. The DHCPv6 Failover protocol MUST NOT be used for leases shorter than 30 seconds.

This design attempts to fulfill all DHCPv6 failover requirements defined in [I-D.ietf-dhc-dhcpv6-failover-requirements].

3.1. Additional Requirements

The following requirements are not related to the failover mechanism in general, but rather to this particular design.

   1. Minimize Asymmetry - While there are two distinct roles in failover (primary and secondary server), the differences between those two roles should be as small as possible. This will yield a simpler design as well as a simpler implementation of that design.

3.2. Features out of Scope: Load Balancing

While it is tempting to extend the DHCPv6 failover mechanism to also offer load balancing, as DHCPv4 failover did, this design does not do that. Here is the reasoning for this decision.
In the general case (not related to failover), load balancing solutions are used when a single server is not able to handle the total incoming traffic. However, by its very definition, DHCPv6 failover is supposed to assure service availability despite the failure of one server. That leads to the conclusion that each server must be able to handle the whole traffic. Therefore, in a properly provisioned setup, load balancing is not needed.

It is likely that active-active mode, which is essentially a form of load balancing, will be defined as an extension in the near future.

4. Protocol Overview

The DHCPv6 Failover Protocol is defined as a communication between failover partners with all associated algorithms and mechanisms. Failover communication is conducted over a TCP connection established between the partners. The protocol reuses the framing format specified in Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460], but uses different message types. New failover-specific message types are listed in Section 4.2. All information is sent over the connection as typical DHCPv6 messages that convey DHCPv6 options, following the format defined in Section 22.1 of [RFC3315].

After initialization, the primary server establishes a TCP connection with its partner. The primary server sends a CONNECT message with initial parameters. The secondary server responds with CONNECTACK.

Depending on the failover state of each partner, they MUST initiate one of the binding update procedures. Each server MAY send an UPDREQ message to request that its partner send all updates that have not been sent yet (this case applies when the partner has an existing database and wants to update it). Alternatively, a server MAY choose to send an UPDREQALL message to request a full lease database transmission including all leases (this case applies when booting up a new server after installation, after corruption or complete loss of the database, or after another catastrophic failure).

Servers exchange lease information by using BNDUPD messages.
Depending on the local and remote state of a lease, a server may either accept or reject the update. Reception of lease update information is confirmed by responding with a BNDACK message with the appropriate status. The majority of the messages sent over a failover TCP connection consist of BNDUPD and BNDACK messages.

A subset of available resources (addresses or prefixes) is reserved for secondary server use. This is required for handling the case where both servers are able to communicate with clients, but unable to communicate with each other. After the initial connection is established, the secondary server requests a pool of available addresses by sending a POOLREQ message. The primary server assigns addresses to the secondary by sending a series of BNDUPD messages. When this process is complete, the primary server sends a POOLRESP message to the secondary server. The secondary server may initiate such a pool request at any time when in communication with the primary server.

Failover servers use a lazy update mechanism to update their failover partner about changes to their lease state database. After a server performs any modification to its lease state database (assigns a new lease, extends an existing one, releases or expires a lease), it sends its response to the client's request first (performing the "regular" DHCPv6 operation) and then informs its failover partner using a BNDUPD message. This BNDUPD message SHOULD be sent soon after the response is sent to the DHCPv6 client, but there is no specific requirement of a minimum time in which to do so.

The major problem with the lazy update mechanism is the case when the server crashes after sending a response to the client, but before sending the lazy update to its partner (or when communication between partners is interrupted). To solve this problem, the concept known as the Maximum Client Lead Time (MCLT), initially designed for DHCPv4 failover, is used.
The MCLT is the maximum amount of time that one server can extend a lease for a client's binding beyond the time known by its failover partner. See Section 8.4 for a detailed description of how the MCLT affects assigned lease times.

Servers verify each other's availability by periodically exchanging CONTACT messages. See Section 8.5 for a discussion of detecting a partner's unreachability.

A server that is being shut down transmits a DISCONNECT message, closes the connection with its failover partner and stops operation. A server SHOULD transmit any pending lease updates before transmitting the DISCONNECT message.

4.1. Failover State Machine Overview

The following section provides a simplified description of all states. For the sake of clarity and simplicity, it omits important details. For the complete description, see Section 9. In case of a disagreement between the simplified and complete descriptions, please follow Section 9.

Each server MUST be in one of the well-defined states. In each state a server may be either responsive (responds to clients' queries) or unresponsive (clients' queries are ignored).

A server starts its operation in the short-lived STARTUP state. A server determines its partner's reachability and state and sets its own state based on that determination. It frequently returns to the state it was in before shutdown.

During typical operation, when servers maintain communication, both are in NORMAL state. In that state only the primary responds to clients' requests. The secondary server is unresponsive to DHCPv6 clients.

If a server discovers that its partner is no longer reachable, it goes to COMMUNICATIONS-INTERRUPTED state. A server must be extra cautious, as it cannot distinguish whether its partner is down or only the communication between the servers is interrupted. Since communication between partners is not possible, a server must act on the assumption that its partner is up.
A failover server must follow a defined procedure; in particular, it MUST NOT extend any lease more than the MCLT beyond its partner's knowledge of the lease expiration time. This imposes an additional burden on the server, in that clients will return to the server for lease renewals more frequently than they would otherwise. Therefore it is not recommended to operate for prolonged periods in this state. Once communication is reestablished, a server may go into NORMAL, POTENTIAL-CONFLICT or PARTNER-DOWN state. It may also stay in COMMUNICATIONS-INTERRUPTED state if certain conditions are met.

Once a server is switched into PARTNER-DOWN (when auto-partner-down is used or as a result of administrative action), it can extend leases regardless of which server initially granted the lease. In that state the server handles leases from its own pool, but is also able to serve the pool of its downed partner. MCLT restrictions no longer apply. Operation in this mode is less demanding for the server that remains operational than operation in COMMUNICATIONS-INTERRUPTED state, but PARTNER-DOWN does not offer any kind of redundancy.

When a server does not have an intact lease state database (e.g. due to a first-time run or a catastrophic failure), or detects that its partner is in PARTNER-DOWN state and additional conditions are met, it switches to RECOVER state. In that state the server acknowledges that the content of its database is doubtful and that it needs to refresh its database from its partner. Once this operation is complete, it switches to RECOVER-WAIT and later to RECOVER-DONE.

Once servers reestablish the connection, they discover each other's state. Depending on the conditions, they may return to NORMAL or move to POTENTIAL-CONFLICT if the partner is in a state that does not allow a simple re-integration of the servers' lease state databases. It is a goal of this protocol to minimize the possibility that POTENTIAL-CONFLICT state is ever entered.
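The MCLT restriction described above for COMMUNICATIONS-INTERRUPTED state can be illustrated with a small sketch. It is not part of this design: the function name, the parameter names and the use of plain integer seconds are assumptions made for illustration. It also assumes that the partner's known expiration time is never treated as earlier than the current time, a detail that Section 8.4 governs.

```python
from typing import Optional


def max_lease_expiry(now: int,
                     partner_known_expiry: Optional[int],
                     desired_lifetime: int,
                     mclt: int) -> int:
    """Return the latest expiration time a server may give to a client.

    All values are in seconds. partner_known_expiry is the expiration
    time the failover partner is known to hold for this lease, or None
    if the partner has never been told about the lease.
    """
    # Assumption: a partner-known expiry in the past (or no knowledge at
    # all) is treated as "now", so a brand new lease is capped at MCLT.
    if partner_known_expiry is None:
        base = now
    else:
        base = max(now, partner_known_expiry)

    # Never exceed what the partner knows by more than the MCLT, and
    # never hand out more than the lifetime the server wanted to give.
    return min(now + desired_lifetime, base + mclt)
```

With an MCLT of 3600 seconds and a desired lifetime of one day, a new lease is capped at now + 3600. Once the partner has acknowledged a later expiration time, a renewal can receive the full desired lifetime.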
Servers running in POTENTIAL-CONFLICT do not respond to clients' requests and work only on resolving potential conflicts. Once outstanding lease updates are exchanged, servers move to CONFLICT-DONE or NORMAL state.

Servers that are recovering from potential conflicts and lose communication switch to RESOLUTION-INTERRUPTED.

A server that is being shut down sends a DISCONNECT message. See Section 4.2.

4.2. Messages

The failover protocol is centered around the message exchanges used by one server to update its partner and respond to received updates. The following list enumerates these messages. It should be noted that no specific formats or message type values are assigned at this stage. Appropriate implementation details will be specified in a separate protocol specification document.

   o BNDUPD - The binding update message is used to send binding lease changes to the partner. One message may contain one or more lease updates. The partner is expected to respond with a BNDACK message.

   o BNDACK - The binding acknowledgement is used to confirm a received BNDUPD message. It may contain a positive or negative response (e.g. due to a detected lease conflict).

   o POOLREQ - The Pool Request message is used by one server (typically the secondary) to request allocation of resources (addresses or prefixes) from its partner. The partner responds with POOLRSP.

   o POOLRSP - The Pool Response message is used by one server (typically the primary) to respond to its partner's request for resource allocation. One POOLRSP message may contain more than one pool.

   o UPDREQ - The update request message is used by one server to request that its partner send all binding database changes that have not been sent and confirmed already. The requested partner is expected to respond with zero or more BNDUPD messages, followed by an UPDDONE that signals the end of updates.
   o UPDREQALL - The update request all message is used by one server to request that all binding database information be sent, in order to recover from a total loss of the requesting server's binding database. The requested server responds with zero or more BNDUPD messages, followed by an UPDDONE that signals the end of updates.

   o UPDDONE - The update done message is used by the responding server to indicate that all requested updates have been sent by the responding server and acknowledged by the requesting server.

   o CONNECT - The connect message is used by the primary server to establish a high level connection with the other server, and to transmit several important configuration data items between the servers. The partner is expected to confirm by responding with a CONNECTACK message.

   o CONNECTACK - The connect acknowledgement message is used by the secondary server to respond to a CONNECT message from the primary server.

   o DISCONNECT - The disconnect message is used by either server when closing a connection and shutting down. No response is required for this message.

   o STATE - The state message is used by either server to inform its partner about a change of failover state. In some cases it may also be used to inform the partner about the current state, e.g. after a connection is established in COMMUNICATIONS-INTERRUPTED or PARTNER-DOWN state.

   o CONTACT - The contact message is used by either server to ensure that the other server continues to see the connection as operational. It MUST be transmitted periodically over every established connection if other message traffic is not flowing, and it MAY be sent at any time.

5. Connection Management

5.1. Creating Connections

Every primary server implementing the failover protocol SHOULD attempt to connect to all of its partners periodically, where the period is implementation dependent and SHOULD be configurable.
In the event that a connection has been rejected by a CONNECTACK message with a reject-reason option contained in it or by a DISCONNECT message, a server SHOULD reduce the frequency with which it attempts to connect to that server, but it SHOULD continue to attempt to connect periodically.

Every secondary server implementing the failover protocol SHOULD listen for connection attempts from the primary server.

When a connection attempt succeeds, the primary server which has initiated the connection attempt MUST send a CONNECT message down the connection.

When a connection attempt is received, the only information that the receiving server has is the IP address of the partner initiating the connection. If it has any relationships with the connecting server for which it is a secondary server, it should just await the CONNECT message to determine which relationship this connection is to serve. If it has no secondary relationships with the connecting server, it SHOULD drop the connection.

To summarize -- a primary server MUST use a connection that it has initiated in order to send a CONNECT message. Every server that is a secondary server in a relationship simply listens for connection attempts from the primary server.

Once a connection is established, the primary server MUST send a CONNECT message across the connection. A secondary server MUST wait for the CONNECT message from a primary server. If the secondary server does not receive a CONNECT message from the primary server in an installation dependent amount of time, it MAY drop the connection.

Every CONNECT message includes a TLS-request option, and if the CONNECTACK message does not reject the CONNECT message and the TLS-reply option says TLS MUST be used, then the servers will immediately enter into TLS negotiation. Once TLS negotiation is complete, the primary server MUST resend the CONNECT message on the newly secured TLS connection and then wait for the CONNECTACK message in response.
The TLS-request and TLS-reply options MUST NOT appear in either this second CONNECT or its associated CONNECTACK message as they had in the first messages.

The second message sent over a new connection (either a bare TCP connection or a connection utilizing TLS) is a STATE message. Upon the receipt of this message, the receiver can consider communications up.

5.2. Endpoint Identification

The proper operation of the failover protocol requires more than the transmission of messages between one server and the other. Each endpoint might seem to be a single DHCPv6 server, but in fact there are situations where additional flexibility in configuration is useful. A failover endpoint is always associated with a set of DHCPv6 prefixes that are configured on the DHCPv6 server where the endpoint appears. A DHCPv6 prefix MUST NOT be associated with more than one failover endpoint.

The failover protocol SHOULD be configured with one failover relationship between each pair of failover servers. In this case there is one failover endpoint for that relationship on each failover partner. This failover relationship MUST have a unique name.

There is typically little need for additional relationships between any two servers, but there MAY be more than one failover relationship between two servers -- however each MUST have a unique relationship name.

Any failover endpoint can take actions and hold unique states.

This document frequently describes the behavior of the protocol in terms of primary and secondary servers, not primary and secondary failover endpoints. However, it is important to remember that every 'server' described in this document is in reality a failover endpoint that resides in a particular process, and that several failover endpoints may reside in the same server process.

It is not the case that there is a unique failover endpoint for each prefix that participates in a failover relationship.
On one server, there is (typically) one failover endpoint per partner, regardless of how many prefixes are managed by that combination of partner and role. Conversely, on a particular server, any given prefix will be associated with exactly one failover endpoint.

When a connection is received from the partner, the unique failover endpoint to which the message is directed is determined solely by the IP address of the partner, the relationship-name, and the role of the receiving server.

6. Resource Allocation

Currently there are two allocation algorithms defined for resources (addresses or prefixes). Additional allocation schemes may be defined as future extensions.

   1. Proportional Allocation - This allocation algorithm is a direct application of the algorithm defined in [dhcpv4-failover] to DHCPv6. Available resources are split between the primary and secondary servers. Released resources are always returned to the primary server. Primary and secondary servers may initiate a rebalancing procedure when the disparity between the resources available to each server reaches a preconfigured threshold. Only resources that are not leased to any client are "owned" by one of the servers. This algorithm is particularly well suited for scenarios where the amount of available resources is limited, as may be the case with prefix delegation. See Section 6.1 for details.

   2. Independent Allocation - This allocation algorithm assumes that available resources are split between the primary and secondary servers as well. In this case, however, resources are assigned to a specific server for all time, regardless of whether they are available or currently used. This algorithm is much simpler than proportional allocation, because resource imbalance does not have to be checked and there is no rebalancing for independent allocation. This algorithm is particularly well suited for scenarios where there is an abundance of available resources, which is typically the case for DHCPv6 address allocation.
See Section 6.2 for details.

6.1. Proportional Allocation

In this allocation scheme, each server has its own pool of available resources. Note that a resource is not "owned" by a particular server throughout its entire lifetime. Only a resource which is available is "owned" by a particular server -- once it has been leased to a client, it is not owned by either failover partner. When it finally becomes available again, it will be owned initially by the primary server, and it may or may not be allocated to the secondary server by the primary server.

The flow of a resource is as follows: initially a resource is owned by the primary server. It may be allocated to the secondary server if it is available, and then it is owned by the secondary server. Either server can allocate available resources which it owns to clients, in which case it ceases to own them. When the client releases the resource or the lease on it expires, it will again become available and will be owned by the primary.

When a resource is released or its lease expires, it does not return to the server which allocated it initially because, in general, that server will have had to replenish its pool of available resources well in advance of any likely lease expirations. Thus, having a particular resource cycle back to the secondary might well put the secondary more out of balance with respect to the primary instead of enhancing the balance of available addresses or prefixes between them.

Pools governed by proportional allocation are used for allocation when the server is in any state except PARTNER-DOWN. In PARTNER-DOWN state the healthy partner can allocate from either pool (both its own and its partner's).
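The ownership flow described above can be sketched as a small state model. This is only an illustration under stated assumptions: the `Owner` and `Resource` names, the `partner_down` flag and the use of exceptions for invalid transitions are hypothetical and are not defined by this design.

```python
from enum import Enum


class Owner(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    NONE = "none"  # leased to a client; owned by neither server


class Resource:
    """One address or prefix under proportional allocation (sketch)."""

    def __init__(self, name):
        self.name = name
        self.owner = Owner.PRIMARY  # every resource starts with the primary
        self.client = None

    def delegate_to_secondary(self):
        # Only an available resource owned by the primary can be handed over.
        if self.owner is not Owner.PRIMARY:
            raise ValueError("resource is not available at the primary")
        self.owner = Owner.SECONDARY

    def lease(self, server, client, partner_down=False):
        if self.owner is Owner.NONE:
            raise ValueError("resource is already leased")
        # A server allocates from its own pool; in PARTNER-DOWN state it
        # may also allocate from its partner's pool.
        if self.owner is not server and not partner_down:
            raise ValueError("resource belongs to the partner's pool")
        self.owner = Owner.NONE
        self.client = client

    def release(self):
        # Release or expiration: the resource always returns to the
        # primary, whichever server leased it.
        if self.owner is not Owner.NONE:
            raise ValueError("resource is not leased")
        self.client = None
        self.owner = Owner.PRIMARY
```

A resource delegated to the secondary, leased by it and then released ends up owned by the primary again, which is the behavior the text above motivates.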
The allocation and maintenance of these address pools is an area of some sensitivity, since the goal is to maintain a more or less constant ratio of available addresses between the two servers.

The initial allocation when the servers first integrate is triggered by the POOLREQ message from the secondary to the primary. This is followed by the POOLRESP message, in which the primary tells the secondary how many resources it allocated to the secondary. Then, the primary sends the allocated resources to the secondary via BNDUPD messages. The POOLREQ/POOLRESP message exchange is a trigger for the primary to perform a scan of its database and to ensure that the secondary has enough resources (based on some configured ratio).

Servers frequently have several kinds of resources available on a particular network segment. The failover protocol assumes that both primary and secondary servers are configured in such a way that each knows the type and number of resources on every network segment participating in the failover protocol. The primary server is responsible for allocating to the secondary server the correct proportion of available resources of each kind, and the secondary server is responsible for being configured in such a way that it can tell the kind of every resource based solely on the IP or prefix address itself.

The resources are delegated to the secondary using the BNDUPD message with a state of FREE_BACKUP, which indicates the resource is now available for allocation by the secondary. Once the message is sent, the primary MUST NOT use these resources for allocation to DHCPv6 clients.

Available resources can be delegated back to the primary server in certain cases. BNDUPD will contain state FREE for leases that were previously in FREE_BACKUP state.

The POOLREQ/POOLRESP message exchange initiated by the secondary is valid at any time, and the primary server SHOULD, whenever it receives the POOLREQ message, scan its database of prefixes and determine if the secondary needs more resources from any of the prefixes.
In order to support a reasonably dynamic balance of the resources between the failover partners, the primary server needs to do additional work to ensure that the secondary server has as many resources as it needs (but that it doesn't have more than it needs).

The primary server SHOULD examine the balance of available resources between the primary and secondary for a particular prefix whenever the number of available resources for either the primary or secondary changes by more than a configured limit. The primary server SHOULD adjust the available resource balance as required to ensure the configured resource balance, except that the primary server SHOULD apply some threshold mechanism to such a balance adjustment in order to minimize the overhead of maintaining this balance.

An example of a threshold approach is: do not attempt to re-balance the prefixes on the primary and secondary until the out-of-balance value exceeds a configured value.

The primary server can, at any time, send an available resource to the secondary using a BNDUPD with the state FREE_BACKUP. The primary server can attempt to take an available resource away from the secondary by sending a BNDUPD with the state FREE. If the secondary accepts the BNDUPD, then the resource is now available to the primary and not available to the secondary. Of course, the secondary MUST reject that BNDUPD if it has already used that resource for a DHCP client.

6.2. Independent Allocation

In this allocation scheme, available resources are permanently (until server configuration changes) split between servers. Available resources are split between the primary and secondary servers as part of initial connection establishment. Once resources are allocated to each server, there is no need to reassign them. This algorithm is simpler than proportional allocation since it requires similar initial communication, but does not require a rebalancing mechanism. It assumes that the pool assigned to each server will never deplete. That is often a reasonable assumption for IPv6 addresses (e.g. servers are often assigned a /64 pool that contains many more addresses than existing electronic devices on Earth).
This allocation mechanism SHOULD be used for IPv6 addresses, unless the configured address pool is small or is otherwise administratively limited.

Once each server is assigned a resource pool during initial connection establishment, it may allocate assigned resources to clients. Once a client releases a resource or its lease expires, the returned resource returns to the pool of the server that leased it. Resources never change servers.

During COMMUNICATION-INTERRUPTED events, a partner MAY continue extending existing leases when requested by clients. A healthy partner MUST NOT lease resources that were assigned to its downed partner and later released by a client unless it is in PARTNER-DOWN state. A server SHOULD use its own pool first before starting new assignments from its downed partner's pool. As the assumption is that independent allocation should be used only when available resources are vast and not expected to be fully used at any given time, it is very unlikely that the server will ever need to use its downed partner's pools.

6.3. Choosing Allocation Algorithm

All implementations MUST support the proportional allocation algorithm and SHOULD support independent allocation. If an implementation supports both and lets the user choose between them, the default algorithm SHOULD be the proportional allocation algorithm.

The proportional allocation mechanism is more flexible, as it can dynamically rebalance available resources between servers. That rebalancing places an additional burden on the servers and generates more traffic between servers. The proportional algorithm can be considered as managing available resources more efficiently than the independent one. That is an important aspect when working in a network that is nearing address and/or prefix depletion.

Independent allocation can be used when the number of available resources is large and there is no realistic danger of running out of resources. Use of independent allocation makes communication between partners simpler.

Typically independent allocation is used for IPv6 addresses, because even for /64 pools a server will never run out of addresses to assign, so there is no need to rebalance.
For the prefix delegation mechanism, available resources are much smaller, so there is a danger of running out of prefixes. Therefore, proportional allocation will typically be used for prefix delegation. Independent allocation may be used, but the implications must be well understood. For example, in a network that delegates /64 prefixes out of a /48 prefix (so there can be up to 65536 prefixes delegated) and has 1000 requesting routers, it is safe to use independent allocation.

It should be stressed that the independent allocation algorithm SHOULD NOT be used when the number of resources is limited and there is a realistic danger of depleting resources. If this recommendation is violated, it may lead to a case where one server denies clients due to pool depletion despite the fact that the other partner still has many resources available.

7. Information model

In most DHCP servers a resource (an IP address or a prefix) can take on several different binding-status values, sometimes also called lease states.
While no two DHCP servers probably have exactly the same possible binding-status values, the DHCP RFC enforces some commonality among the general semantics of the binding-status values used by various DHCP server implementations.

In order to transmit binding database updates between one server and another using the failover protocol, some common denominator binding-status values must be defined. It is not expected that these values correspond with any actual implementation of the DHCP protocol in a DHCP server, but rather that the binding-status values defined in this document should be a common denominator of those in use by many DHCP server implementations. The lease binding-status values defined for the failover protocol are listed below.
Unless otherwise noted below, there MAY be client information associated with each of these binding-status values.

ACTIVE -- The lease is assigned to a client. Client identification data MUST appear.

EXPIRED -- indicates that a client's binding on a given lease has expired. When the partner acks the BNDUPD of an expired lease, the server sets its internal state to FREE*. Client identification SHOULD appear.

RELEASED -- indicates that a client sent in a RELEASE message. When the partner acks the BNDUPD of a released lease, the server sets its internal state to FREE*. Client identification SHOULD appear.

FREE* -- Once a lease is expired or released, its state becomes FREE*. Depending on which algorithm and which pool was used to allocate a given lease, FREE* may either mean FREE or FREE_BACKUP. Implementations do not have to implement this FREE* state, but may choose to switch to the destination state directly. For clarity of representation, this transitional FREE* state is treated as a separate state.

FREE -- is used when a DHCP server needs to communicate that a resource is unused by any client, but it was not just released, expired or reset by a network administrator. When the partner acks the BNDUPD of a FREE lease, the server marks the lease as available for assignment by the primary server. Note that on a secondary server running in PARTNER-DOWN state, after waiting the MCLT, the resource MAY be allocated to a client by the secondary server if the proportional algorithm is used. Client identification MAY appear.

FREE_BACKUP -- indicates that this resource can be allocated by the secondary server to a client at any time. Note that on a primary server running in PARTNER-DOWN state, after waiting the MCLT, the resource MAY be allocated to a client by the primary server if the proportional algorithm is used. Client identification MAY appear.

ABANDONED -- indicates that a lease is considered unusable by the DHCP system. The primary reason for entering such a state is reception of a DECLINE message for said lease. Client identification MUST NOT appear.

RESET -- indicates that this resource was previously abandoned, but was made available by operator command. This is a distinct state so that the reason that the resource became FREE can be determined. Client identification MAY appear.

The lease state machine is presented in Figure 1. Most states are stationary, i.e. the lease stays in a given state until an external event triggers a transition to another state. The only transitional state is FREE*. Once it is reached, the state machine immediately transitions to either FREE or FREE_BACKUP state.
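The binding-status values listed above can be captured in a small data model. The following Python sketch is illustrative only: the names BindingStatus, CLIENT_ID_REQUIREMENT and resolve_free_star are hypothetical, the string arguments are this sketch's own encoding, and the FREE* mapping follows the table this document gives as Figure 2.

```python
from enum import Enum


class BindingStatus(Enum):
    """Common-denominator lease states named in this section."""
    ACTIVE = "ACTIVE"
    EXPIRED = "EXPIRED"
    RELEASED = "RELEASED"
    FREE_STAR = "FREE*"  # transitional state; implementations may skip it
    FREE = "FREE"
    FREE_BACKUP = "FREE_BACKUP"
    ABANDONED = "ABANDONED"
    RESET = "RESET"


# Whether client identification accompanies a lease in each state.
# FREE* is not given an explicit rule, so the general default (MAY) is assumed.
CLIENT_ID_REQUIREMENT = {
    BindingStatus.ACTIVE: "MUST",
    BindingStatus.EXPIRED: "SHOULD",
    BindingStatus.RELEASED: "SHOULD",
    BindingStatus.FREE_STAR: "MAY",
    BindingStatus.FREE: "MAY",
    BindingStatus.FREE_BACKUP: "MAY",
    BindingStatus.ABANDONED: "MUST NOT",
    BindingStatus.RESET: "MAY",
}


def resolve_free_star(algorithm, pool_owner):
    """Map the transitional FREE* state to FREE or FREE_BACKUP (per Figure 2)."""
    if algorithm not in ("proportional", "independent"):
        raise ValueError("unknown allocation algorithm")
    if pool_owner not in ("primary", "secondary"):
        raise ValueError("unknown pool owner")
    # Only independent allocation returns a resource to the secondary's pool.
    if algorithm == "independent" and pool_owner == "secondary":
        return BindingStatus.FREE_BACKUP
    return BindingStatus.FREE
```

An implementation that skips FREE* altogether could call such a helper directly when a lease leaves EXPIRED, RELEASED or RESET.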
                +---------+
 /------------->| ACTIVE  |<--------------\
 |              +---------+               |
 |               |   |   |                |
 |       /--(8)--/  (3)  \--(9)-\         |
 |       |           |          |         |
 |       V           V          V         |
 |   +-------+  +--------+ +---------+    |
 |   |EXPIRED|  |RELEASED| |ABANDONED|    |
 |   +-------+  +--------+ +---------+    |
 |       |           |          |         |
 |       |           |        (10)        |
 |       |           |          V         |
 |       |           |     +---------+    |
 |       |           |     |  RESET  |    |
 |       |           |     +---------+    |
 |       |           |          |         |
 |       \--(4)--\  (4) /--(4)--/         |
 |               |   |  |                 |
(1)              V   V  V                (2)
 |              /---------\               |
 |              |  FREE*  |               |
 |              \---------/               |
 |                |     |                 |
 |         /-(5)--/     \-(6)-\           |
 |         |                  |           |
 |         V                  V           |
 |    +-------+         +-----------+     |
 \----|  FREE |<--(7)-->|FREE_BACKUP|-----/
      +-------+         +-----------+

Figure 1: Lease State Machine

Transitions between states are results of the following events:

1. Primary server allocates a lease.

2. Secondary server allocates a lease.

3. Client sends RELEASE and the lease is released.

4. Partner acknowledges state change. This transition MAY also occur if the server is in PARTNER-DOWN state and the MCLT has passed since the entry in RELEASED, EXPIRED, or RESET states.

5. The lease belongs to a pool that is governed by proportional allocation, or independent allocation is used and this lease belongs to the primary server.

6. The lease belongs to a pool that is governed by independent allocation and the lease belongs to the secondary server.

7. Pool rebalance event occurs (POOLREQ/POOLRESP messages are exchanged). Addresses (or prefixes) belonging to the primary server can be assigned to the secondary server pool (transition from FREE to FREE_BACKUP) or vice versa.

8. The lease is expired.

9. DECLINE message is received or a lease is deemed unusable for other reasons.

10. An administrative action is taken to recover an abandoned lease back to usable state. This transition MAY occur due to implementation-specific handling of an ABANDONED resource. One possible example of such use is a Neighbor Discovery or ICMP Echo check if the address is still in use.

A resource that is no longer in use (due to expiration or release) becomes FREE*. Depending on which allocation algorithm is used, a resource that is no longer in use returns to the primary (FREE) or secondary (FREE_BACKUP) pool. The conditions for specific transitions are depicted in Figure 2.

+---------------+---------+-----------+
| \  Pool owner |         |           |
|  \-------\    | Primary | Secondary |
| Algorithm \   |         |           |
+---------------+---------+-----------+
| Proportional  | FREE    | FREE      |
| Independent   | FREE    |FREE_BACKUP|
+---------------+---------+-----------+

Figure 2: FREE* State Transitions

In the case of servers operating in active-passive mode, while a majority of the resources are owned by the primary server, the secondary server will need a portion of the resources to serve new clients while operating in COMMUNICATION-INTERRUPTED state and also in PARTNER-DOWN state before it can take over the entire address pool (after the expiry of MCLT). The secondary server cannot simply take over the entire resource pool immediately, since we have to handle the case where both servers are able to communicate with DHCP clients, but unable to communicate with each other.

The size of the resource pool allocated to the secondary is specified as a percentage of the currently available resources. Thus, as the number of available resources changes on the primary server, the number of resources available to the secondary server MUST also change, although the frequency of the changes made to the secondary server's pool of address resources SHOULD be low enough to not use significant processing power or network bandwidth.

The required size of this private pool allocated to the secondary server is based only on the arrival rate of new DHCP clients and the length of expected downtime of the primary server, and is not directly influenced by the total number of DHCP clients supported by the server pair.

8. Failover Mechanisms

This section lays out an overview of the communication between partners and other mechanisms required for failover operation. As this is a design document, not a protocol specification, high level ideas are presented without implementation specific details (e.g. on-wire protocol formats). Specific protocol details are out of the scope of this document, and may be specified in a separate draft.

8.1. Time Skew

Partners exchange information about known lease states. To reliably compare a known lease state with an update received from a partner, servers must be able to reliably compare the times stored in the known lease state with the times received in the update. Although a simple approach would be to require both partners to use synchronized time, e.g. by using NTP, such a service may not always be available in some scenarios that failover expects to cover. Therefore a mechanism to measure and track relative time differences between servers is necessary. To do so, each message MUST contain information about the time of the transmission in the time context of the transmitter. The transmitting server MUST set this as close to the actual transmission as possible. The receiving partner MUST store its own timestamp of reception as close to the actual reception as possible. The received timestamp information is then compared with the local timestamp.

To account for packet delay variation (jitter), the measured difference is not used directly, but rather the moving average of the time differences of the last TIME_SKEW_PKTS_AVG packets is calculated.
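The moving-average computation just described can be sketched as follows. This is a minimal illustration, not a specified algorithm: the TimeSkewTracker class, its method names, the window value of 5 and the sign convention (local reception time minus the partner's transmission time) are all assumptions of this sketch, and timestamp wrap-around is ignored.

```python
from collections import deque

# Hypothetical window size; this document names the parameter but
# does not assign it a value here.
TIME_SKEW_PKTS_AVG = 5


class TimeSkewTracker:
    """Tracks the average clock difference to a failover partner."""

    def __init__(self, window=TIME_SKEW_PKTS_AVG):
        # deque with maxlen discards the oldest sample automatically.
        self._samples = deque(maxlen=window)

    def record(self, sent_timestamp, received_timestamp):
        """Store one measurement and return the updated time skew.

        sent_timestamp is the partner's transmission time (partner's clock);
        received_timestamp is the local reception time (local clock).
        """
        self._samples.append(received_timestamp - sent_timestamp)
        return self.skew()

    def skew(self):
        """Moving average over the most recent samples, in seconds."""
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)
```

A single delayed packet then shifts the estimate by only a fraction of its delay, which is the point of averaging over several packets.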
This averaged value is referred to as the time skew. Note that the time skew algorithm allows cooperation between servers with completely desynchronized clocks as well as those whose desynchronization itself is not constant.

8.2. Time expression

Timestamps are expressed as the number of seconds since midnight (UTC), January 1, 2000, modulo 2^32. Note that this is the same approach as used in the creation of DUID-LLT (see Section 9.2 of [RFC3315]).

Time differences are expressed in seconds and are signed.

8.3. Lazy updates

Lazy update refers to the requirement placed on a server implementing a failover protocol to update its failover partner whenever the binding database changes. A failover protocol which didn't support lazy update would require the failover partner update to complete before a DHCPv6 server could respond to a DHCPv6 client request. The lazy update mechanism allows a server to allocate a new lease or extend an existing lease and then update its failover partner as time permits.

Although the lazy update mechanism does not introduce additional delays in server response times, it introduces other difficulties. The key problem with lazy update is that when a server fails after updating a client with a particular lease time and before updating its partner, the partner will believe that a lease has expired even though the client still retains a valid lease on that address or prefix.

8.4. MCLT concept

In order to handle the problem introduced by lazy updates (see Section 8.3), a period of time known as the "Maximum Client Lead Time" (MCLT) is defined and must be known to both the primary and secondary servers. Proper use of this time interval places an upper bound on the difference allowed between the lease time provided to a DHCPv6 client by a server and the lease time known by that server's failover partner.
The MCLT is typically much less than the lease time that a server has been configured to offer a client, and so some strategy must exist to allow a server to offer the configured lease time to a client. During a lazy update the updating server typically updates its partner with a potential expiration time which is longer than the lease time previously given to the client and which is longer than the lease time that the server has been configured to give a client. This allows that server to give a longer lease time to the client the next time the client renews its lease, since the time that it will give to the client will not exceed the MCLT beyond the potential expiration time acknowledged by its partner. The fundamental relationship on which much of the correctness of this protocol depends is that the lease expiration time known to a DHCPv6 client MUST NOT under any circumstances be more than the maximum client lead time (MCLT) greater than the potential expiration time known to a server's partner. The remainder of this section makes the above fundamental relationship more explicit. This protocol requires a DHCPv6 server to deal with several different lease intervals and places specific restrictions on their relationships. The purpose of these restrictions is to allow the other server in the pair to be able to make certain assumptions in the absence of an ability to communicate between servers. The different times are: desired valid lifetime: The desired valid lifetime is the lease interval that a DHCPv6 server would like to give to a DHCPv6 client in the absence of any restrictions imposed by the failover protocol. Its determination is outside of the scope of this protocol. Typically this is the result of external configuration of a DHCPv6 server. actual valid lifetime: The actual valid lifetime is the lease interval that a DHCPv6 server gives out to a DHCPv6 client. It may be shorter than the desired valid lifetime (as explained below). 
potential valid lifetime: The potential valid lifetime is the potential lease expiration interval the local server tells its partner in a BNDUPD message.

acknowledged potential valid lifetime: The acknowledged potential valid lifetime is the potential lease interval the partner server has most recently acknowledged in a BNDACK message.

8.4.1. MCLT example

The following example demonstrates the MCLT concept in practice. The values used are arbitrarily chosen and are not a recommendation for actual values. The MCLT in this case is 1 hour. The desired valid lifetime is 3 days, and its renewal time is half the valid lifetime.

When a server makes an offer for a new lease on an IP address to a DHCPv6 client, it determines the desired valid lifetime (in this case, 3 days). It then examines the acknowledged potential valid lifetime (which in this case is zero) and determines the remainder of the time left to run, which is also zero. To this it adds the MCLT. Since the actual valid lifetime cannot be allowed to exceed the remainder of the current acknowledged potential valid lifetime plus the MCLT, the offer made to the client is for the remainder of the current acknowledged potential valid lifetime (i.e., zero) plus the MCLT. Thus, the actual valid lifetime is 1 hour.

Once the server has sent the REPLY to the DHCPv6 client, it will update its failover partner with the lease information. However, the desired potential valid lifetime will be composed of one half of the current actual valid lifetime added to the desired valid lifetime. Thus, the failover partner is updated with a BNDUPD with a potential valid lifetime of 3 days + 1/2 hour.

When the primary server receives a BNDACK to its update of the secondary server's (partner's) potential valid lifetime, it records that as the acknowledged potential valid lifetime.
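The two calculations used in this example can be written down compactly. The following Python sketch is illustrative only: the function names and the t1_fraction parameter are assumptions introduced here, and all values are in seconds.

```python
HOUR = 3600
DAY = 24 * HOUR


def actual_valid_lifetime(desired, acked_potential_remaining, mclt):
    """Lifetime given to the client.

    It may not exceed the remainder of the acknowledged potential valid
    lifetime plus the MCLT, and it never needs to exceed the desired value.
    """
    limit = max(acked_potential_remaining, 0) + mclt
    return min(desired, limit)


def potential_valid_lifetime(desired, actual, t1_fraction=0.5):
    """Lifetime sent to the partner in a BNDUPD.

    The renewal (T1) fraction of the actual lifetime is added to the
    desired lifetime, so the partner stays ahead of the client.
    """
    return desired + t1_fraction * actual
```

With an MCLT of 1 hour, a desired valid lifetime of 3 days and nothing acknowledged yet, actual_valid_lifetime(3 * DAY, 0, HOUR) yields 1 hour, and potential_valid_lifetime(3 * DAY, HOUR) yields 3 days plus half an hour, matching the figures above.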
A server MUST NOT send a BNDACK in response to a BNDUPD message until it is sure that the information in the BNDUPD message has been updated in its lease database. Thus, the primary server in this case can be sure that the secondary server has recorded the potential lease interval in its stable storage when the primary server receives a BNDACK message from the secondary server. When the DHCPv6 client attempts to renew at T1 (approximately one half an hour from the start of the lease), the primary server again determines the desired valid lifetime, which is still 3 days. It then compares this with the original acknowledged potential valid lifetime (3 days + 1/2 hour) and adjusts for the time passed since the secondary was last updated (1/2 hour). Thus the time remaining of the acknowledged potential valid interval is 3 days. Adding the MCLT to this yields 3 days plus 1 hour, which is more than the desired valid lifetime of 3 days. So the client is renewed for the desired valid lifetime -- 3 days. When the primary DHCPv6 server updates the secondary DHCPv6 server after the DHCPv6 client's renewal REPLY is complete, it will calculate the desired potential valid lifetime as the T1 fraction of the actual client valid lifetime (1/2 of 3 days this time = 1.5 days). To this it will add the desired client valid lifetime of 3 days, yielding a total desired potential valid lifetime of 4.5 days. In this way, the primary attempts to have the secondary always "lead" the client in its understanding of the client's valid lifetime so as to be able to always offer the client the desired client valid lifetime. Once the initial actual client valid lifetime of the MCLT is past, the protocol operates effectively like the DHCPv6 protocol does today in its behavior concerning valid lifetimes. 
However, the guarantee that the actual client valid lifetime will never exceed the remaining acknowledged partner server potential valid lifetime by more than the MCLT allows full recovery from a variety of failures.

8.5. Unreachability detection

Each partner maintains an FO_SEND timer for each partner connection. The FO_SEND timer is reset every time any message is transmitted. If the timer reaches the FO_SEND_MAX value, a CONTACT message is transmitted and the timer is reset. The CONTACT message may be transmitted at any time.

8.6. Re-allocating Leases

When in PARTNER-DOWN state there is a waiting period after which a resource can be re-allocated to another client. For resources which are available when the server enters PARTNER-DOWN state, the period is the MCLT from entry into PARTNER-DOWN state. For resources which are not available when the server enters PARTNER-DOWN state, the period is the MCLT after the latest of the following times: the potential valid lifetime, the most recently transmitted potential valid lifetime, the most recently received acknowledged potential valid lifetime, and the most recently transmitted acknowledged potential valid lifetime. If this time would be earlier than the current time plus the MCLT, then the time the server entered PARTNER-DOWN state plus the maximum-client-lead-time is used.

In any other state, a server cannot reallocate a resource from one client to another without first notifying its partner (through a BNDUPD message) and receiving acknowledgement (through a BNDACK message) that its partner is aware that the first client is not using the resource.

This could be modeled in the following way. Though this specific implementation is in no way required, it may serve to better illustrate the concept.

An "available" resource on a server may be allocated to any client. A resource which was leased to a client and which expired or was released by that client would take on a new state, EXPIRED or RELEASED respectively.
The partner server would then be notified that this resource was EXPIRED or RELEASED through a BNDUPD. When the sending server received the BNDACK for that resource showing it was FREE, it would move the resource from EXPIRED or RELEASED to FREE, and it would be available for allocation by the primary server to any clients.

A server MAY reallocate a resource in the EXPIRED or RELEASED state to the same client with no restrictions provided it has not sent a BNDUPD message to its partner. This situation would exist if the lease expired or was released after the transition into PARTNER-DOWN state, for instance.

8.7. Sending Binding Update

This and the following section are written as though every BNDUPD message contains only a single binding update transaction in order to reduce the complexity of the discussion. Note that while a server MAY generate BNDUPD messages with multiple binding update transactions, every server MUST be able to process a BNDUPD message which contains multiple binding update transactions and generate the corresponding BNDACK messages with status for multiple binding update transactions.

Each server updates its failover partner about recent changes in lease states. Each update MUST include at least the following information:

1. resource type - non-temporary address or a prefix. Resource type can be indicated by the container that conveys the actual resource (e.g. an IA_NA option indicates a non-temporary IPv6 address).

2. resource information - the actual address or prefix. That is conveyed using the appropriate option, e.g. an IAADDR for an address or an IAPREFIX for a prefix.

3. valid life time requested by client

4. valid life time sent to client

5. IAID - Identity Association Identifier used by the client while obtaining a given lease. (Note 1: one client may use many IAIDs simultaneously. Note 2: IAIDs for IA, TA and PD are orthogonal number spaces.)

6. Next Expected Client Transmission - time interval since Client Last Transaction Time, when a response from a client is expected.

7. potential valid life time - a lifetime that the server is willing to set if there were no MCLT/failover restrictions imposed.

8. preferred life time sent to client - the actual value sent back to the client

9. CLTT - Client Last Transaction Time, a timestamp of the last received transmission from a client

10. Client DUID

The BNDUPD message MAY contain additional information related to the updated lease. The additional information MAY include, but is not limited to:

1. assigned FQDN name, defined in [RFC4704]

2. Options Requested by the client, i.e. content of the ORO

3. Remote-ID, defined in [RFC4649]

4. Relay-ID, defined in [RFC5460], section 5.4.1

5. Link-layer address [I-D.ietf-dhc-dhcpv6-client-link-layer-addr-opt]

6. Any other options the updating partner deems useful.

The receiving partner MAY store the received additional information, but it MAY choose to ignore it as well. Some of this information may be useful, so it is a good idea to keep or update it. One reason is FQDN information: a server SHOULD be prepared to clean up DNS information once the lease expires or is released. Another reason the partner may be interested in keeping additional data is better support for leasequery [RFC5007] or bulk leasequery [RFC5460], which features queries based on Relay-ID, by link address and by Remote-ID.

8.8. Receiving Binding Update

When a server receives a BNDUPD message, it needs to decide how to process the binding update transaction it contains and whether that transaction represents a conflict of any sort. The conflict resolution process MUST be used on the receipt of every BNDUPD message, not just those that are received while in POTENTIAL-CONFLICT state, in order to increase the robustness of the protocol.

There are three sorts of conflicts:

1. Two clients, one resource - This is the duplicate resource allocation conflict. There are two different clients, each allocated the same resource. See Section 8.9.

2. Two resources, one client conflict - This conflict exists when a client on one server is associated with one resource, and on the other server with a different resource in the same or related subnet. This does not refer to the case where a single client has resources in multiple different subnets or administrative domains, but rather the case where on the same subnet the client has a lease on one IP address in one server and on a different IP address on the other server. This conflict may or may not be a problem for a given DHCP server implementation and policy. If implementations and policies allow, both resources can be assigned to a given client. In the event that a DHCP server requires that a DHCP client have only one outstanding lease of a given type, the conflict MUST be resolved by accepting the lease which has the latest CLTT.

3. binding-status conflict - This is the normal conflict, where one server is updating the other with newer information. See Section 8.9 for details of how to resolve these conflicts.

8.9. Conflict Resolution

The server receiving a lease update from its partner must evaluate the received lease information to see if it is consistent with already known state and decide which information - the previously known or that just received - is "better". The server should take into consideration the following aspects: whether the lease is already assigned to a specific client, which server had contact with the client most recently, the start time of the lease, etc.

When analyzing a BNDUPD message from a partner server, if there is insufficient information in the BNDUPD to process it, then reject the BNDUPD with reject-reason 3: "Missing binding information".

If the resource in the BNDUPD is not a resource associated with the failover endpoint which received the BNDUPD message, then reject it with reject-reason 1: "Illegal IP address (not part of any address pool)".
Every BNDUPD message SHOULD contain a client-last-transaction-time option, which MUST, if it appears, be the time that the server last interacted with the DHCP client. It MUST NOT be, for instance, the time that the lease on an IP address expired. If there has been no interaction with the DHCP client in question (or there is no DHCP client presently associated with this resource), then there will be no client-last-transaction-time option in the BNDUPD message.

The list in Figure 3 presents the conflict resolution outcome. To "accept" a BNDUPD means to update the server's bindings database with the information contained in the BNDUPD and, once the update is complete, send a BNDACK message corresponding to the BNDUPD message. To "reject" a BNDUPD means to leave the server's binding database unchanged and to respond to the BNDUPD with a BNDACK with a reject-reason option included.

When interpreting the information in the following table (Figure 3), for those rules that are listed with "time" -- if a BNDUPD doesn't have a client-last-transaction-time value, then it MUST NOT be considered later than the client-last-transaction-time in the receiving server's binding. If the BNDUPD contains a client-last-transaction-time value and the receiving server's binding does not, then the client-last-transaction-time value in the BNDUPD MUST be considered later than the server's.

                          binding-status in received BNDUPD
   binding-status
   in receiving                                  FREE         RESET
   server            ACTIVE     EXPIRED    RELEASED   FREE_BACKUP  ABANDONED

   ACTIVE            accept(5)  time(2)    time(1)    time(2)      accept
   EXPIRED           time(1)    accept     accept     accept       accept
   RELEASED          time(1)    time(1)    accept     accept       accept
   FREE/FREE_BACKUP  accept     accept     accept     accept       accept
   RESET             time(3)    accept     accept     accept       accept
   ABANDONED         reject(4)  reject(4)  reject(4)  reject(4)    accept

                       Figure 3: Conflict Resolution

time(1): If the client-last-transaction-time in the BNDUPD is later than the client-last-transaction-time in the receiving server's binding, accept it, else reject it.

time(2): If the current time is later than the receiving server's lease-expiration-time, accept it, else reject it.

time(3): If the client-last-transaction-time in the BNDUPD is later than the start-time-of-state in the receiving server's binding, accept it, else reject it.

(1,2,3): If rejecting, use reject reason "Outdated binding information".

(4): Use reject reason "Less critical binding information".

(5): If the clients in a BNDUPD message and in a receiving server's binding differ, then if the receiving server is a secondary accept it, else reject it with a reject reason of "Fatal conflict exists: address in use by other client".

The lease update may be accepted or rejected. Rejection SHOULD NOT change the flag in a lease that says that it should be transmitted to the failover partner. If this flag is set, then it should be transmitted, but if it is not already set, the rejection of a lease state update SHOULD NOT trigger an automatic update of the failover partner sending the rejected update. The potential for update storms is too great, and in the unusual case where the servers simply can't agree, that disagreement is better than an update storm.

8.10. Acknowledging Reception

9. Endpoint States

9.1. State Machine Operation

Each server (or, more accurately, failover endpoint) can take on a variety of failover states. These states play a crucial role in determining the actions that a server will perform when processing a request from a DHCPv6 client as well as dealing with changing external conditions (e.g., loss of connection to a failover partner).

The failover state in which a server is running controls the following behaviors:

o Responsiveness -- the server is either responsive to DHCPv6 client requests or it is not.

o Allocation Pool -- which pool of addresses (or prefixes) can be used for allocation on receipt of a SOLICIT message.

o MCLT -- ensure that valid lifetimes are not beyond what the partner has acked plus the MCLT (or not).
A server will transition from one failover state to another based on the specific values held by the following state variables:

o Current failover state.

o Communications status (OK or not OK).

o Partner's failover state (if known).

Several events can cause the transition from one failover state to another.

o Change in communications status (OK or not OK).

o Change in partner's failover state.

o Receipt of particular messages.

o Expiration of timers.

Whenever either of the last two of the above state variables changes state, the state machine is invoked, which may then trigger a change in the current failover state. Thus, whenever the communications status changes, the state machine processing is invoked. This may or may not result in a change in the current failover state.

Whenever a server transitions to a new failover state, the new state MUST be communicated to its failover partner in a STATE message if the communications status is OK. In addition, whenever a server makes a transition into a new state, it MUST record the new state, its current understanding of its partner's state, and the time at which it entered the new state in stable storage.

The following state transition diagram gives a condensed view of the state machine. If there is a difference between the words describing a particular state and the diagram below, the words should be considered authoritative.

[State transition diagram covering the STARTUP, RECOVER, RECOVER-WAIT, RECOVER-DONE, PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, RESOLUTION-INTERRUPTED, NORMAL and COMMUNICATIONS-INTERRUPTED states]

              Figure 4: Failover Endpoint State Machine

9.2. State Machine Initialization

The state machine is characterized by storage (in stable storage) of at least the following information:

o Current failover state.

o Previous failover state.

o Start time of current failover state.

o Partner's failover state.

o Start time of partner's failover state.

o Time most recent packet received from partner.
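As a rough illustration, the stored items listed above could be modeled as a small record that is restored from stable storage at startup. In the Python sketch below, the field names, the dictionary-based storage format and the load_or_init helper are all assumptions made for illustration. The fallback values used when nothing has been stored are the ones this section specifies: PARTNER-DOWN for a primary, RECOVER for a secondary, and the remaining items unknown.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EndpointState:
    """Hypothetical record of the items kept in stable storage."""
    current_state: str
    previous_state: Optional[str]
    current_state_start: int            # start time of current failover state
    partner_state: Optional[str]        # unknown until a STATE message arrives
    partner_state_start: Optional[int]  # unknown until a STATE message arrives
    last_partner_packet: Optional[int]  # unknown until a packet is received


def load_or_init(stored, is_primary, now):
    """Restore saved values, or build the initial record if none exist."""
    if stored:
        return EndpointState(**stored)
    return EndpointState(
        current_state="PARTNER-DOWN" if is_primary else "RECOVER",
        previous_state=None,
        current_state_start=now,
        partner_state=None,
        partner_state_start=None,
        last_partner_packet=None,
    )
```

Calling asdict() on the record gives a plain dictionary that can be written back to whatever stable storage an implementation uses.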
The state machine is initialized by reading these data items from stable storage and restoring their values from the information saved. If there is no information in stable storage concerning these items, then they should be initialized as follows:

o Current failover state: Primary: PARTNER-DOWN, Secondary: RECOVER

o Previous failover state: None.

o Start time of current failover state: Current time.

o Partner's failover state: None until reception of STATE message.

o Start time of partner's failover state: None until reception of STATE message.

o Time most recent packet received from partner: None until packet received.

9.3. STARTUP State

The STARTUP state affords an opportunity for a server to probe its partner server, before starting to service DHCP clients. When in the STARTUP state, a server attempts to learn its partner's state and determine (using that information if it is available) what state it should enter.

The STARTUP state is not shown with any specific state transitions in the state machine diagram (Figure 4) because the processing during the STARTUP state can cause the server to transition to any of the other states, so that specific state transition arcs would only obscure other information.

9.3.1. Operation in STARTUP State

The server MUST NOT be responsive in STARTUP state.

Whenever a STATE message is sent to the partner while in STARTUP state, the STARTUP flag MUST be set in the message and the previously recorded failover state MUST be placed in the server-state option.

9.3.2. Transition Out of STARTUP State

The following algorithm is followed every time the server initializes itself and enters STARTUP state.

Step 1: If there is any record in stable storage of a previous failover state for this server, set PREVIOUS-STATE to the last recorded value in stable storage, and go to Step 2.

If there is no record of any previous failover state in stable storage for this server, then set the PREVIOUS-STATE to RECOVER and set the TIME-OF-FAILURE to 0. This will allow two servers which already have lease information to synchronize themselves prior to operating.

In some cases, an existing server will be commissioned as a failover server and brought back into operation where its partner is not yet available. In this case, the newly commissioned failover server will not operate until its partner comes online -- but it has operational responsibilities as a DHCP server nonetheless. To properly handle this situation, a server SHOULD be configurable in such a way as to move directly into PARTNER-DOWN state after the startup period expires if it has been unable to contact its partner during the startup period.

Step 2: If the previous state is one where communications was "OK", then set the previous state to the state that is the result of the communications failed state transition (if such a transition exists -- some states don't have a communications failed state transition, since they allow both communications OK and failed).

Step 3: Start the STARTUP state timer. The time that a server remains in the STARTUP state (absent any communications with its partner) is implementation dependent but SHOULD be short. It SHOULD be long enough for a TCP connection to be created to a heavily loaded partner across a slow network.
Step 4: Attempt to create agiven type,TCP connection to theconflict MUST be resolved by acceptingfailover partner. Step 5: Wait for "communications OK". When and if communications become "okay", clear theleaseSTARTUP flag, and set the current state to the PREVIOUS-STATE. If the partner is in PARTNER-DOWN state, and if the time at whichhasit entered PARTNER-DOWN state (as received in thelatest CLTT. 3. binding-status conflict - Thisstart-time-of-state option in the STATE message) isnormal conflict, where one serverlater than the last recorded time of operation of this server, then set CURRENT-STATE to RECOVER. If the time at which it entered PARTNER-DOWN state isupdatingearlier than theother with newer information. See Section 8.9 for detailslast recorded time ofhowoperation of this server, then set CURRENT-STATE toresolve these conflicts. 8.9. Conflict Resolution The server receiving a lease update from its partner must evaluate the received lease informationPOTENTIAL-CONFLICT. Then, transition tosee if it is consistent with already knownthe current state anddecide which information -take thepreviously known or that just received - is "better". The"communications OK" state transition based on the current state of this servershould take into considerationand thefollowing aspects: ifpartner. Step 6: If thelease is already assigned to a specific client, who had contact with client recently, startstartup timeofexpires thelease, etc. When analyzing a BNDUPD message from a partner server, if thereserver SHOULD transition to the PREVIOUS-STATE. 9.4. PARTNER-DOWN State PARTNER-DOWN state isinsufficient informationa state either server can enter. When in this state, theBNDUPD to process it, then rejectserver assumes that it is theBNDUPD with reject-reason 3: "Missing binding information".only server operating and serving the client base. If one server is in PARTNER-DOWN state, theresourceother server MUST NOT be operating. 9.4.1. 
Operation in PARTNER-DOWN State The server MUST be responsive in PARTNER-DOWN state. It will allow renewal of all outstanding leases on IP addresses. For those IP addresses for which theBNDUPDserver isnotusing proportional allocation, it will allocate IP addresses from its own pool, and after aresource associated with the failover endpoint which received the BNDUPD message, then rejectfixed period of time (the MCLT interval) has elapsed from entry into PARTNER-DOWN state, itwith reject-reason 1: "Illegalwill allocate IPaddress (not partaddresses from the set ofanyall available IP addresses. Any IP addresspool)". Every BNDUPD message SHOULD contain a client-last-transaction-time option, which MUST, if it appears, be the time thattagged as available for allocation by the other serverlast interacted with the DHCP client. It(at entry to PARTNER-DOWN state) MUST NOTbe, for instance, the time that the lease on an IP address expired. If there has been no interaction with the DHCP client in question (or there is no DHCP client presently associated with this resource), then there willbeno client-last-transaction-time option in the BNDUPD message. The list in Figure 3 presents the conflict resolution outcome. To "accept" BNDUPD meansallocated toupdatea new client until theserver's bindings database withmaximum-client-lead-time beyond theinformation containedentry into PARTNER-DOWN state has elapsed. A server inthe BNDUDP and once the update is complete, send a BNDACK message correspondingPARTNER-DOWN state MUST NOT allocate an IP address tothe BNDUPD message. To "reject"aBNDUPD meansDHCP client different from that toleasewhich it was allocated at theserver's binding database unchangeg and to respondentrance to PARTNER-DOWN state until theBNDUPD with BNDACK with a rejest- reason option included. 
maximum-client-lead-time beyond the maximum of the following times: client expiration time, most recently transmitted potential-expiration-time, most recently received ack of potential-expiration-time from the partner, and most recently acked potential-expiration-time to the partner. If this time would be earlier than the current time plus the maximum-client-lead-time, then the time the server entered PARTNER-DOWN state plus the maximum-client-lead-time is used.

The server is not restricted by the MCLT when offering lease times while in PARTNER-DOWN state.

In the unlikely case, when there are two servers operating in a PARTNER-DOWN state, there is a chance of duplicate leases assigned. This leads to a POTENTIAL-CONFLICT (unresponsive) state when they re-establish contact. The duplicate lease issue can be postponed to a large extent by the
server granting new leases first from its own pool. Therefore the server operating in PARTNER-DOWN state MUST use its own pool first for new leases before assigning any leases from its downed partner pool.

9.4.2. Transition Out of PARTNER-DOWN State

When a server in PARTNER-DOWN state succeeds in establishing a connection to its partner, its actions are conditional on the state and flags received in the STATE message from the other server as part of the process of establishing the connection.

If the STARTUP bit is set in the server-flags option of a received STATE message, a server in PARTNER-DOWN state MUST NOT take any state transitions based on reestablishing communications. Essentially, if a server is in PARTNER-DOWN state, it ignores all STATE messages from its partner that have the STARTUP bit set in the server-flags option of the STATE message.

If the STARTUP bit is not set in the
server-flags option of a STATE message received from its partner, then a server in PARTNER-DOWN state takes the following actions based on the state of the partner as received in a STATE message (either immediately after establishing communications or at any time later when a new state is received):

If the partner is in: NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN, POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE state
   transition to POTENTIAL-CONFLICT state

If the partner is in: RECOVER, RECOVER-WAIT state
   stay in PARTNER-DOWN state

If the partner is in: RECOVER-DONE state
   transition into NORMAL state

9.5. RECOVER State

This state indicates that the server has no information in its stable storage or that it is re-integrating with a server in PARTNER-DOWN state after it has been down. A server in this state MUST attempt to refresh its stable storage from the other server.

9.5.1. Operation in RECOVER State

The server MUST NOT be responsive in RECOVER state.

A server in RECOVER state will attempt to reestablish communications with the other server.

9.5.2. Transition Out of RECOVER State

If the other server is in POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE state when communications are reestablished, then the server in RECOVER state will move to POTENTIAL-CONFLICT state itself.

If the other server is in any other state, then the server in RECOVER state will request an update of missing binding information by sending an UPDREQ message. If the server has determined that it has lost its stable storage because it has no record of ever having talked to its partner, while its partner does have a record of communicating with it, it MUST send an UPDREQALL message, otherwise it MUST send an UPDREQ message.

It will wait for an UPDDONE message, and upon receipt of that message it will transition to RECOVER-WAIT state.

If communications fails during the reception of the results of the UPDREQ or UPDREQALL message, the
server will remain in RECOVER state, and will re-issue the UPDREQ or UPDREQALL when communications are re-established.

If an UPDDONE message isn't received within an implementation dependent amount of time, and no BNDUPD messages are being received, the connection SHOULD be dropped.

            A                                        B
          Server                                  Server

            |                                        |
         RECOVER                               PARTNER-DOWN
            |                                        |
            | >--UPDREQ-------------------->         |
            |                                        |
            |        <---------------------BNDUPD--< |
            | >--BNDACK-------------------->         |
           ...                                      ...
            |                                        |
            |        <---------------------BNDUPD--< |
            | >--BNDACK-------------------->         |
            |                                        |
            |        <--------------------UPDDONE--< |
            |                                        |
       RECOVER-WAIT                                  |
            |                                        |
            | >--STATE-(RECOVER-WAIT)------>         |
            |                                        |
            |                                        |
   Wait MCLT from last known                         |
   time of failover operation                        |
            |                                        |
       RECOVER-DONE                                  |
            |                                        |
            | >--STATE-(RECOVER-DONE)------>         |
            |                                     NORMAL
            |        <-------------(NORMAL)-STATE--< |
         NORMAL                                      |
            | >---- State-(NORMAL)--------------->   |
            |                                        |

             Figure 5: Transition out of RECOVER state

If, at any time while a server is in RECOVER state communications fails, the server will stay in RECOVER state. When communications are restored, it will restart the process of transitioning out of RECOVER state.

9.6. RECOVER-WAIT State

This state indicates that the server has done an UPDREQ or UPDREQALL and has received the UPDDONE message indicating that it has received all outstanding binding update information. In the RECOVER-WAIT state the server will wait for the MCLT in order to ensure that any processing that this server might have done prior to losing its stable storage will not cause future difficulties.

9.6.1. Operation in RECOVER-WAIT State

The server MUST NOT be responsive in RECOVER-WAIT state.

9.6.2. Transition Out of RECOVER-WAIT State

Upon entry to RECOVER-WAIT state the server MUST start a timer whose expiration is set to a time equal to the time the server went down (if known) or the time the server started (if the down-time is unknown) plus the maximum-client-lead-time.
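The timer rule above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function name `recover_wait_deadline` and its parameters are hypothetical and do not come from this document, and timestamps are assumed to be timezone-aware values supplied by the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recover_wait_deadline(
    started_at: datetime,
    mclt: timedelta,
    went_down_at: Optional[datetime] = None,
) -> datetime:
    """Return the time at which the RECOVER-WAIT timer expires.

    Hypothetical helper: the base time is the time the server went
    down when that is known, otherwise the time the server started.
    The MCLT is then added to that base time.
    """
    base = went_down_at if went_down_at is not None else started_at
    return base + mclt


if __name__ == "__main__":
    start = datetime(2012, 10, 22, 12, 0, tzinfo=timezone.utc)
    down = start - timedelta(minutes=30)
    one_hour = timedelta(hours=1)
    # Down-time known: the deadline is measured from the down-time.
    print(recover_wait_deadline(start, one_hour, went_down_at=down))
    # Down-time unknown: the deadline is measured from the start time.
    print(recover_wait_deadline(start, one_hour))
```

Measuring from the down-time when it is known shortens the wait, since part of the MCLT has already elapsed while the server was not operating.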
When this timer expires, the server will transition into RECOVER-DONE state.

This is to allow any IP addresses that were allocated by this server prior to loss of its client binding information in stable storage to contact the other server or to time out.

If this is the first time this server has run failover -- as determined by the information received from the partner, not necessarily only as determined by this server's stable storage (as that may have been lost), then the waiting time discussed above may be skipped, and the server may transition immediately to RECOVER-DONE state.

If the server has never before run failover, then there is no need to wait in this state -- but, again, to determine if this server has run failover it is vital that the information provided by the partner be utilized, since the stable storage of this server may have been lost.

If communications fails while a server is in RECOVER-WAIT state, it has no effect on the operation of this state.
The server SHOULD continue to operate its timer, and if the timer expires during the period where communications with the other server have failed, then the server SHOULD transition to RECOVER-DONE state. This is rare -- failover state transitions are not usually made while communications are interrupted, but in this case there is no reason to inhibit the timer.

9.7. RECOVER-DONE State

This state exists to allow an interlocked transition for one server from RECOVER state and another server from PARTNER-DOWN or COMMUNICATIONS-INTERRUPTED state into NORMAL state.

9.7.1. Operation in RECOVER-DONE State

A server in RECOVER-DONE state MUST respond only to DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING DHCP messages.

9.7.2. Transition Out of RECOVER-DONE State

When a server in RECOVER-DONE state determines that its partner server has entered NORMAL or RECOVER-DONE state, then it will transition into NORMAL state.

If communications fails while in RECOVER-DONE state, a server will stay in RECOVER-DONE state.

9.8. NORMAL State

NORMAL state is the state used by a server when it is communicating with the other server, and any required resynchronization has been performed. While some bindings database synchronization is performed in NORMAL state, potential conflicts are resolved prior to entry into NORMAL state as is binding database data loss.

When entering NORMAL state, a server will send to the other server all currently unacknowledged binding updates as BNDUPD messages.

When the above process is complete, if the server
entering NORMAL state is a secondary server, then it will request IP addresses for allocation using the POOLREQ message.

9.8.1. Operation in NORMAL State

When in NORMAL state a server will operate in the following manner:

Lease time calculations
   As discussed in Section 8.4, the lease interval given to a DHCP client can never be more than the MCLT greater than the most recently received potential-expiration-time from the failover partner or the current time, whichever is later.

   As long as a server adheres to this constraint, the specifics of the lease interval that it gives to a DHCP client or the value of the potential-expiration-time sent to its failover partner are implementation dependent.

Lazy update of partner server
   After sending a REPLY that includes a lease update to a client, the server servicing a DHCP client request attempts to update its partner with the new binding information. The server transmits both the desired valid lifetime and the actual valid lifetime.
Reallocation of IP addresses between clients
   Whenever a client binding is released or expires, a BNDUPD message must be sent to the partner, setting the binding state to RELEASED or EXPIRED. However, until a BNDACK is received for this message, the IP address cannot be allocated to another client. It cannot be allocated to the same client again if a BNDUPD was sent, otherwise it can. See Section 8.6.

In NORMAL state, each server receives binding updates from its partner server in BNDUPD messages. It records these in its client binding database in stable storage and then sends a corresponding BNDACK message to its partner server.

9.8.2. Transition Out of NORMAL State

If an external command is received by a server in NORMAL state informing it that its partner is down, then it will transition into PARTNER-DOWN state. Generally, this would be an unusual situation, where some external agency knew the partner server was down. Using the command in this case would be appropriate if the polling interval and timeout were long.

If a server in NORMAL state fails to receive acks to messages sent to its partner for an implementation dependent period of time, it MAY move into COMMUNICATIONS-INTERRUPTED state.
This situation might occur if the partner server was capable of maintaining the TCP connection between the servers and also capable of sending a CONTACT message every tSend seconds, but was (for some reason) incapable of processing BNDUPD messages.

If the communications is determined to not be "ok" (as defined in Section 8.5), then transition into COMMUNICATIONS-INTERRUPTED state.

If a server in NORMAL state receives any messages from its partner where the partner has changed state from that expected by the server in NORMAL state, then the server should transition into COMMUNICATIONS-INTERRUPTED state and take the appropriate state transition from there. For example, it would be expected for the partner to transition from POTENTIAL-CONFLICT into NORMAL state, but not for the partner to transition from NORMAL into POTENTIAL-CONFLICT state.

If a server in NORMAL state receives a DISCONNECT message from its partner, the server should transition into COMMUNICATIONS-INTERRUPTED state.

9.9. COMMUNICATIONS-INTERRUPTED State

A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is unable to communicate with its partner.
Primary and secondary servers cycle automatically (without administrative intervention) between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network connection between them fails and recovers, or as the partner server cycles between operational and non-operational. No duplicate IP address allocation can occur while the servers cycle between these states.

When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been configured to support an automatic transition out of COMMUNICATIONS-INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period" has been configured, see section 10), then a timer MUST be started for the length of the configured safe period.

A server transitioning into the COMMUNICATIONS-INTERRUPTED state from the NORMAL state SHOULD raise some alarm condition to alert administrative staff to a potential problem in the DHCP subsystem.

9.9.1. Operation in COMMUNICATIONS-INTERRUPTED State

In this state a server MUST respond to all DHCP client requests. When allocating new leases, each server allocates from its own pool, where the primary MUST allocate only FREE resources (addresses or prefixes), and the secondary MUST allocate only FREE_BACKUP resources (addresses or prefixes). When responding to RENEW messages, each server will allow continued renewal of
a DHCP client's current lease on an IP address or prefix irrespective of whether that lease was given out by the receiving server or not, although the renewal period MUST NOT exceed the maximum client lead time (MCLT) beyond the latest of: 1) the potential valid lifetime already acknowledged by the other server, or 2) the actual valid lifetime sent to the DHCPv6 client, or 3) the potential valid lifetime received from the partner server.

However, since the server cannot communicate with its partner in this state, the acknowledged potential valid lifetime will not be updated in any new bindings. This is likely to eventually cause the actual valid lifetimes to be the current time plus the MCLT (unless this is greater than the desired-client-lease-time).

The server should continue to try to establish a
connection with its partner.

9.9.2. Transition Out of COMMUNICATIONS-INTERRUPTED State

If the safe period timer expires while a server is in the COMMUNICATIONS-INTERRUPTED state, it will transition immediately into PARTNER-DOWN state.

If an external command is received by a server in COMMUNICATIONS-INTERRUPTED state informing it that its partner is down, it will transition immediately into PARTNER-DOWN state.

If communications is restored with the other server, then the server in COMMUNICATIONS-INTERRUPTED state will transition into another state based on the state of the partner:

o  NORMAL or COMMUNICATIONS-INTERRUPTED: Transition into the NORMAL state.

o  RECOVER: Stay in COMMUNICATIONS-INTERRUPTED state.

o  RECOVER-DONE: Transition into NORMAL state.

o  PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or RESOLUTION-INTERRUPTED: Transition into POTENTIAL-CONFLICT state.
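The transition list above maps naturally onto a lookup table. The following is a minimal sketch, not a protocol implementation: the `State` enumeration and the function `on_communications_restored` are hypothetical names introduced for illustration. Partner states that the list does not mention (for example RECOVER-WAIT) are deliberately left out of the table, and the sketch returns None for them as an assumption of this example, not a rule from this document.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    """Failover endpoint states named in this document."""
    STARTUP = "STARTUP"
    NORMAL = "NORMAL"
    COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
    PARTNER_DOWN = "PARTNER-DOWN"
    POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"
    RESOLUTION_INTERRUPTED = "RESOLUTION-INTERRUPTED"
    CONFLICT_DONE = "CONFLICT-DONE"
    RECOVER = "RECOVER"
    RECOVER_WAIT = "RECOVER-WAIT"
    RECOVER_DONE = "RECOVER-DONE"


# Partner state -> next state of a server that is currently in
# COMMUNICATIONS-INTERRUPTED state when communications is restored.
_ON_RESTORE = {
    State.NORMAL: State.NORMAL,
    State.COMMUNICATIONS_INTERRUPTED: State.NORMAL,
    State.RECOVER: State.COMMUNICATIONS_INTERRUPTED,  # stay put
    State.RECOVER_DONE: State.NORMAL,
    State.PARTNER_DOWN: State.POTENTIAL_CONFLICT,
    State.POTENTIAL_CONFLICT: State.POTENTIAL_CONFLICT,
    State.CONFLICT_DONE: State.POTENTIAL_CONFLICT,
    State.RESOLUTION_INTERRUPTED: State.POTENTIAL_CONFLICT,
}


def on_communications_restored(partner: State) -> Optional[State]:
    """Return the next state, or None if the list gives no rule.

    Returning None for unlisted partner states is an assumption of
    this sketch; the text above does not say what to do for them.
    """
    return _ON_RESTORE.get(partner)


if __name__ == "__main__":
    for partner in State:
        print(partner.value, "->", on_communications_restored(partner))
```

Keeping the rules in a table rather than in nested conditionals makes it easy to compare the code against the list line by line.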
The following figure illustrates the transition from NORMAL to COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again.

         Primary                                Secondary
         Server                                  Server

         NORMAL                                  NORMAL
           | >--CONTACT------------------->        |
           |        <--------------------CONTACT--< |
           |         [TCP connection broken]        |
      COMMUNICATIONS           :            COMMUNICATIONS
        INTERRUPTED            :              INTERRUPTED
           |      [attempt new TCP connection]      |
           |         [connection succeeds]          |
           |                                        |
           | >--CONNECT------------------->        |
           |        <-----------------CONNECTACK--< |
           |                                     NORMAL
           |        <-------------------STATE-----< |
         NORMAL                                     |
           | >--STATE--------------------->        |
           |                                        |
           | >--BNDUPD-------------------->        |
           |        <---------------------BNDACK--< |
           |                                        |
           |        <---------------------BNDUPD--< |
           | >------BNDACK---------------->        |
          ...                                      ...
           |                                        |
           |        <--------------------POOLREQ--< |
           | >--POOLRESP-(2)-------------->        |
           |                                        |
           | >--BNDUPD-(#1)--------------->        |
           |        <---------------------BNDACK--< |
           |                                        |
           |        <--------------------POOLREQ--< |
           | >--POOLRESP-(0)-------------->        |
           |                                        |
           | >--BNDUPD-(#2)--------------->        |
           |        <---------------------BNDACK--< |
           |                                        |

   Figure 6: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED and back (example with 2 addresses allocated to secondary)

9.10. POTENTIAL-CONFLICT State

This state indicates that the two servers are attempting to reintegrate with each other, but at least one of them was running in a state that did not guarantee automatic reintegration would be possible. In POTENTIAL-CONFLICT state the
servers may determine that the same resource has been offered and accepted by two different clients.

It is a goal of this protocol to minimize the possibility that POTENTIAL-CONFLICT state is ever entered.

When a primary server enters POTENTIAL-CONFLICT state it should request that the secondary send it all updates of which it is currently unaware by sending an UPDREQ message to the secondary server.

A secondary server entering POTENTIAL-CONFLICT state will wait for the primary to send it an UPDREQ message.

9.10.1. Operation in POTENTIAL-CONFLICT State

Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming DHCP requests.

9.10.2. Transition Out of POTENTIAL-CONFLICT State

If communications fails with the partner while in POTENTIAL-CONFLICT state, then the server will transition to RESOLUTION-INTERRUPTED state.

Whenever either server receives an UPDDONE message from its partner while in POTENTIAL-CONFLICT state, it MUST transition to a new state. The primary MUST transition to CONFLICT-DONE state, and the secondary MUST transition to NORMAL state. This will cause the primary server to leave POTENTIAL-CONFLICT state prior to the secondary, since the primary sends an UPDREQ
message and receives an UPDDONE before the secondary sends an UPDREQ message and receives its UPDDONE message.

When a secondary server receives an indication that the primary server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE state, it SHOULD send an UPDREQ message to the primary server.

         Primary                                Secondary
         Server                                  Server

           |                                        |
   POTENTIAL-CONFLICT                      POTENTIAL-CONFLICT
           |                                        |
           | >--UPDREQ-------------------->        |
           |                                        |
           |        <---------------------BNDUPD--< |
           | >--BNDACK-------------------->        |
          ...                                      ...
           |                                        |
           |        <---------------------BNDUPD--< |
           | >--BNDACK-------------------->        |
           |                                        |
           |        <--------------------UPDDONE--< |
     CONFLICT-DONE                                  |
           | >--STATE--(CONFLICT-DONE)---->        |
           |        <---------------------UPDREQ--< |
           |                                        |
           | >--BNDUPD-------------------->        |
           |        <---------------------BNDACK--< |
          ...                                      ...
           | >--BNDUPD-------------------->        |
           |        <---------------------BNDACK--< |
           |                                        |
           | >--UPDDONE------------------->        |
           |                                     NORMAL
           |        <------------STATE--(NORMAL)--< |
         NORMAL                                     |
           | >--STATE--(NORMAL)----------->        |
           |                                        |
           |        <--------------------POOLREQ--< |
           | >------POOLRESP-(n)---------->        |
           |              addresses                 |

          Figure 7: Transition out of POTENTIAL-CONFLICT

9.11. RESOLUTION-INTERRUPTED State

This state indicates that the
two servers were attempting to reintegrate with each other in POTENTIAL-CONFLICT state, but communications failed prior to completion of re-integration.

If the servers remained in POTENTIAL-CONFLICT while communications was interrupted, neither server would be responsive to DHCP client requests, and if one server had crashed, then there might be no server able to process DHCP requests.

When a server enters RESOLUTION-INTERRUPTED state it SHOULD raise an alarm condition to alert administrative staff of a problem in the DHCP subsystem.

9.11.1. Operation in RESOLUTION-INTERRUPTED State

In this state a server MUST respond to all DHCP client requests. When allocating new resources (addresses or prefixes), each server SHOULD allocate from its own pool (if that can be determined), where the primary SHOULD allocate only FREE resources, and the secondary SHOULD allocate only BACKUP resources. When responding to renewal requests, each server will allow continued renewal of a DHCP client's current lease independent of whether that lease was given out by the receiving server or
not, although the renewal period MUST NOT exceed the maximum client lead time (MCLT) beyond the latest of: 1) the potential valid lifetime already acknowledged by the other server or 2) the lease-expiration-time or 3) potential valid lifetime received from the partner server.

However, since the server cannot communicate with its partner in this state, the acknowledged potential valid lifetime will not be updated in any new bindings.

9.11.2. Transition Out of RESOLUTION-INTERRUPTED State

If an external command is received by a server in RESOLUTION-INTERRUPTED state informing it that its partner is down, it will transition immediately into PARTNER-DOWN state.

If communications is restored with the other server, then the server in RESOLUTION-INTERRUPTED state will transition into POTENTIAL-CONFLICT state.

9.12. CONFLICT-DONE State

This state indicates that during the process where the two servers are attempting to re-integrate with each other, the
primary server has received all of the updates from the secondary server. It makes a transition into CONFLICT-DONE state in order that it may be totally responsive to the client load, as opposed to NORMAL state where it would be in a "balanced" responsive state, running the load balancing algorithm.

9.12.1. Operation in CONFLICT-DONE State

A primary server in CONFLICT-DONE state is fully responsive to all DHCP clients (similar to the situation in COMMUNICATIONS-INTERRUPTED state).

If communications fails, remain in CONFLICT-DONE state. If communications becomes OK, remain in CONFLICT-DONE state until the conditions for transition out become satisfied.

9.12.2. Transition Out of CONFLICT-DONE State

If communications fails with the partner while in CONFLICT-DONE state, then the server will remain in CONFLICT-DONE state.

When a primary server determines that the secondary server has made a transition into NORMAL state, the primary server will also transition into NORMAL state.

10. Proposed extensions

The following section discusses possible extensions to the proposed failover mechanism. Listed extensions must be sufficiently simple to not further complicate the failover protocol.
Any proposals that are considered complex will be defined as stand-alone extensions in separate documents.

10.1. Active-active mode

A very simple way to achieve active-active mode is to remove the restriction that the secondary server MUST NOT respond to SOLICIT and REQUEST messages. Instead it could respond, but MUST have a lower preference than the primary server. Clients discovering available servers will receive ADVERTISE messages from both servers, but are expected to select the primary server as it has a higher preference value configured. The following REQUEST message will be directed to the primary server.

Discussion: Do DHCPv6 clients actually do this? DHCPv4 clients were rumored to wait for a "while" to accept the best offer, but to a first approximation, they all take the first offer they receive that is even acceptable.

The benefit of this approach, compared to the "basic" active-passive solution, is that there is no delay between primary failure and the moment when the secondary starts serving requests.

Discussion: The possibility of setting both servers' preference to an equal value could theoretically work as a crude attempt to provide load balancing. It wouldn't do much good on its own, as one (faster) server could be chosen more frequently (assuming that with equal preference sets clients will pick the first responding server, which is not mandated by DHCPv6).
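The client-side selection that this extension relies on can be sketched as follows. This is only an illustration under stated assumptions: it assumes a client prefers the ADVERTISE carrying the highest preference value and, when preferences are equal, takes the one that arrived first (which, as noted above, DHCPv6 does not mandate). The names Advertise and select_server are hypothetical and are not defined by this design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Advertise:
    """Hypothetical summary of one received ADVERTISE message."""
    server: str        # label for the sending server
    preference: int    # server preference value, 0..255
    arrival_ms: int    # time of arrival relative to the SOLICIT


def select_server(ads: List[Advertise]) -> Optional[Advertise]:
    """Pick the highest preference; on a tie, the earliest arrival (assumed)."""
    if not ads:
        return None
    return min(ads, key=lambda a: (-a.preference, a.arrival_ms))
```

With unequal preferences the slower primary is still chosen; with equal preferences the faster responder wins, which is why equal values alone give only crude load sharing.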
We could design a simple mechanism of dynamically updating preference depending on usage of available resources. This concept hasn't been investigated in detail yet.

11. Dynamic DNS Considerations

DHCP servers (and clients) can use DNS Dynamic Updates as described in RFC 2136 [RFC2136] to maintain DNS name-mappings as they maintain DHCP leases. Many different administrative models for DHCP-DNS integration are possible. Descriptions of several of these models, and guidelines that DHCP servers and clients should follow in carrying them out, are laid out in RFC 4704 [RFC4704].

The nature of the failover protocol introduces some issues concerning dynamic DNS updates that are not part of non-failover environments. This section describes these issues, and defines the information which failover partners should exchange in order to ensure consistent behavior. The presence of this section should not be interpreted as requiring an implementation of the DHCPv6 failover protocol to also support DDNS updates.

The purpose of this discussion is to clarify the areas where the failover and DHCP-DDNS protocols intersect for the benefit of implementations which support both protocols, not to introduce a new requirement into the DHCPv6 failover protocol.
Thus, a DHCPv6 server which implements the failover protocol MAY also support dynamic DNS updates, but if it does support dynamic DNS updates it SHOULD utilize the techniques described here in order to correctly distribute them between the failover partners. See RFC 4704 [RFC4704] as well as RFC 4703 [RFC4703] for information on how DHCPv6 servers deal with potential conflicts when updating DNS even without failover.

From the standpoint of the failover protocol, there is no reason why a server which is utilizing the DDNS protocol to update a DNS server should not be a partner with a server which is not utilizing the DDNS protocol to update a DNS server. However, a server which is not able to support DDNS or is not configured to support DDNS SHOULD output a warning message when it receives BNDUPD messages which indicate that its failover partner is configured to support the DDNS protocol to update a DNS server. An implementation MAY consider this an error and refuse to operate, or it MAY choose to operate anyway, having warned the user of the problem in some way.

11.1. Relationship between failover and dynamic DNS update

The failover protocol describes the conditions under which each failover server may renew a lease to its current DHCP client, and describes the conditions under which it may grant a lease to a new DHCP client. An analogous set of conditions determines when a failover server should initiate a DDNS update, and when it should attempt to remove records from the DNS. The failover protocol's conditions are based on the desired external behavior: avoiding duplicate address and prefix assignments; allowing clients to continue using leases which they obtained from one failover partner even if they can only communicate with the other partner; allowing the secondary DHCP server to grant new leases even if it is unable to communicate with the primary server.
The desired external DDNS behavior for DHCP failover servers is similar to that described above for the failover protocol itself:

1. Allow timely DDNS updates from the server which grants a lease to a client. Recognize that there is often a DDNS update lifecycle which parallels the DHCP lease lifecycle. This is likely to include the addition of records when the lease is granted, and the removal of DNS records when the leased resource is subsequently made available for allocation to a different client.

2. Communicate enough information between the two failover servers to allow one to complete the DDNS update 'lifecycle' even if the other server originally granted the lease.

3. Avoid redundant or overlapping DDNS updates, where both failover servers are attempting to perform DDNS updates for the same lease-client binding.

4. Avoid situations where one partner is attempting to add RRs related to a lease binding while the other partner is attempting to remove RRs related to the same lease binding.

While DHCP servers configured for DDNS typically perform these operations on both the AAAA and the PTR resource records, this is not required.
It is entirely possible that a DHCP server could be configured to only update the DNS with PTR records, and the DHCPv6 clients could be responsible for updating the DNS with their own AAAA records. In this case, the discussions here would apply only to the PTR records.

11.2. Exchanging DDNS Information

In order for either server to be able to complete a DDNS update, or to remove DNS records which were added by its partner, both servers need to know the FQDN associated with the lease-client binding. In addition, to properly handle DDNS updates, additional information is required. All of the following information needs to be transmitted between the failover partners:

1. The FQDN that the client requested be associated with the resource. If the client doesn't request a particular FQDN and one is synthesized by the failover server or if the failover server is configured to replace a client requested FQDN with a different FQDN, then the server generated value would be used.

2. The FQDN that was actually placed in the DNS for this lease. It may differ from the client requested FQDN due to some form of disambiguation or other DHCP server configuration (as described above).

3. The status of any DDNS operations in progress or completed.

4. Information sufficient to allow the failover partner to remove the FQDN from the
DNS should that become necessary.

These data items are the minimum necessary set to reliably allow two failover partners to successfully share the responsibility to keep the DNS up to date with the resources allocated to clients.

This information would typically be included in BNDUPD messages sent from one failover partner to the other. Failover servers MAY choose not to include this information in BNDUPD messages if there has been no change in the status of any DDNS update related to the lease.

The partner server receiving BNDUPD messages containing the DDNS information SHOULD compare the status information and the FQDN with the current DDNS information it has associated with the lease binding, and update its notion of the DDNS status accordingly.

Some implementations will instead choose to send a BNDUPD without waiting for the DDNS update to complete, and then will send a second BNDUPD once the DDNS update is complete. Other implementations will delay sending the partner a BNDUPD until the DDNS update has been acknowledged by the DNS server, or until some time-limit has elapsed, in order to avoid sending a second BNDUPD.

The FQDN option contains the FQDN that will
be associated with the AAAA RR (if the server is performing an AAAA RR update for the client). The PTR RR can be generated automatically from the IP address or prefix value. The FQDN may be composed in any of several ways, depending on server configuration and the information provided by the client in its DHCP messages. The client may supply a hostname which it would like the server to use in forming the FQDN, or it may supply the entire FQDN. The server may be configured to
attempt to use the information the client supplies, it may be configured with an FQDN to use for the client, or it may be configured to synthesize an FQDN.

Since the server interacting with the client may not have completed the DDNS update at the time it sends the first BNDUPD about the lease binding, there may be cases where the FQDN in later BNDUPD messages does not match the FQDN included in earlier messages. For example, the responsive server may be configured to handle situations where two or more DHCP client FQDNs are identical by modifying the most-specific label in the FQDNs of some of the clients in an attempt to generate unique FQDNs for them (a process sometimes called "disambiguation"). Alternatively, at sites which use some or all of the information which clients supply to form the FQDN, it's possible that a client's configuration may be changed so that it begins to supply new data. The server interacting with the client may react by removing the DNS records which it originally added for the client, and replacing them with records that refer to the client's new FQDN. In such cases, the server SHOULD include the actual FQDN that was used in subsequent DDNS options in any BNDUPD messages exchanged between the failover partners. This server SHOULD include relevant information in its BNDUPD messages.
This information may be necessary in order to allow the non-responsive partner to detect client configuration changes that change the hostname or FQDN data which the client includes in its DHCP requests.

11.3. Adding RRs to the DNS

A failover server which is going to perform DDNS updates SHOULD initiate the DDNS update when it grants a new lease to a client. The server which did not grant the lease SHOULD NOT initiate a DDNS update when it receives the BNDUPD after the lease has been granted. The failover protocol ensures that only one of the partners will grant a lease to any individual client, so it follows that this requirement will prevent both partners from initiating updates simultaneously. The server initiating the update SHOULD follow the protocol in RFC 4704 [RFC4704]. The server may be configured to perform a AAAA RR update on behalf of its clients, or not. Ordinarily, a failover server will not initiate DDNS updates when it renews leases. In two cases, however, a failover server MAY initiate a DDNS update when it renews a
lease to its existing client:

1. When the lease was granted before the server was configured to perform DDNS updates, the server MAY be configured to perform updates when it next renews existing leases. The server which granted the lease is the server which should initiate the DDNS update.

2. If a server is in PARTNER-DOWN state, it can conclude that its partner is no longer attempting to perform an update for the existing client. If the remaining server has not recorded that an update for the binding has been successfully completed, the server MAY initiate a DDNS update. It MAY initiate this update immediately upon entry to PARTNER-DOWN state, it may perform this in the background, or it MAY initiate this update upon next hearing from the DHCP client.

11.4. Deleting RRs from the DNS

The failover server which makes a resource FREE SHOULD initiate any DDNS deletes, if it has recorded that DNS records were added on behalf of the
client.

A server not in PARTNER-DOWN state "makes a resource FREE" when it initiates a BNDUPD with a binding-status of FREE, FREE_BACKUP, EXPIRED, or RELEASED. Its partner confirms this status by acking that BNDUPD, and upon receipt of the BNDACK the server has "made the resource FREE". Conversely, a server in PARTNER-DOWN state "makes a resource FREE" when it sets the binding-status to FREE, since in PARTNER-DOWN state no communications is required with the partner.

It is at this point that it should initiate the DDNS operations to delete RRs from the DNS. Its partner SHOULD NOT initiate DDNS deletes for DNS records related to the lease binding as part of sending the BNDACK message. The partner MAY have issued BNDUPD messages with a binding-status of FREE, EXPIRED, or RELEASED previously, but the other server will have rejected these BNDUPD messages.

The failover protocol ensures that only one of the two partner servers will be able to make a resource FREE. The server making the resource FREE may be doing so while it is in NORMAL communication with its partner, or it may be in PARTNER-DOWN state.
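The rule for when a resource counts as having been made FREE, and therefore which server should initiate the DDNS delete, can be sketched as follows. This is a minimal illustration, not part of the design: the names Binding, has_made_free and should_delete_rrs, and the boolean fields, are assumptions introduced here for the example.

```python
from dataclasses import dataclass

# Binding-status values that count as freeing a resource when sent in a BNDUPD.
FREEING_STATUSES = {"FREE", "FREE_BACKUP", "EXPIRED", "RELEASED"}


@dataclass
class Binding:
    """Hypothetical local view of one lease binding."""
    status: str                  # status set locally or sent in our BNDUPD
    bndupd_acked: bool = False   # True once the partner's BNDACK has arrived
    rrs_recorded: bool = False   # True if DNS records were added for the client


def has_made_free(server_state: str, binding: Binding) -> bool:
    """Return True if this server has 'made the resource FREE'."""
    if server_state == "PARTNER-DOWN":
        # No communication with the partner is required in PARTNER-DOWN.
        return binding.status == "FREE"
    # Otherwise the partner must have acknowledged our freeing BNDUPD.
    return binding.status in FREEING_STATUSES and binding.bndupd_acked


def should_delete_rrs(server_state: str, binding: Binding) -> bool:
    """Only the server that made the resource FREE initiates DDNS deletes."""
    return binding.rrs_recorded and has_made_free(server_state, binding)
```

A server that merely acknowledges its partner's BNDUPD never satisfies has_made_free for that binding, which matches the requirement that the acking partner not initiate deletes.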
If a server is in PARTNER-DOWN state, it may be performing DDNS deletes for RRs which its partner added originally. This allows a single remaining partner server to assume responsibility for all of the DDNS activity which the two servers were undertaking.

Another implication of this approach is that no DDNS RR deletes will be performed while either server is in COMMUNICATIONS-INTERRUPTED state, since no resources are moved into the FREE state during that period.

11.5. Name Assignment with No Update of DNS

In some cases, a DHCP server is configured to return a name to the DHCPv6 client but not enter that name into the DNS. This is typically a name that it has discovered or generated from information it has received from the client. In this case this name information SHOULD be communicated to the failover partner, if only to ensure that they will return the same name in the event the partner becomes the server with which the DHCPv6 client begins to interact.

12. Reservations and failover

Some DHCP servers support a capability to offer specific preconfigured resources to DHCP clients.
These are real DHCP clients, they do the entire DHCP protocol, but these servers always offer the client a specific pre-configured resource, and they offer that resource to no other clients. Such a capability has several names, but it is sometimes called a "reservation", in that the resource is reserved for a particular DHCP client.

In a situation where there are two DHCP servers serving the same subnet without using failover, the two DHCP servers need to have disjoint resource pools, but identical reservations for the DHCP clients.

In a failover context, both servers need to be configured with the proper reservations in an identical manner, but if we stop there problems can occur around the edge conditions where reservations are made for a resource that has already been leased to a different client.
Different servers handle this conflict in different ways, but the goal of the failover protocol is to allow correct operation with any server's approach to the normal processing of the DHCP protocol.

The general solution with regards to reservations is as follows. Whenever a reserved resource becomes FREE (i.e., when first configured or whenever a client frees it or it expires or is reset), the primary server MUST show that resource as FREE (and thus available for its own allocation) and it MUST send it to the secondary server in a BNDUPD with a flag set showing that it is reserved and with a status of BACKUP.

Note that this implies that a reserved resource goes through the normal state changes from FREE to ACTIVE (and possibly back to FREE). The failover protocol supports this approach to reservations, i.e., where the resource undergoes the normal state changes of any resource, but it can only be offered to the client for which it is reserved.

From the above, it follows that a reservation solely on the secondary will
Active-active mode A very simple way to achieve active-active mode is to removenot necessarily allow therestrictionsecondary to offer thatseconary server MUST NOT respondaddress toSOLICIT and REQUEST messages. Instead it could respond, but MUST have lower preference than primary server. Clients discovering available servers will receive ADVERTISE messages from both servers, but are expectedclient toselect the primary server aswhom ithas higher preference value configured.is reserved. Thefollowing REQUEST message will be directed toreservation must also appear on the primaryserver. Discussion: Do DHCPv6 clients actually do this? DHCPv4 clients were rumored to waitas well fora "while"the secondary to be able toacceptoffer thebest offer, butresource toa first approximation, they all takethefirst offer they receive that is even acceptable. The benefit of this approach, comparedclient to which is is reserved. When the"basic" active--passive solutionreservation on a resource isthat therecancelled, if the resource isno delay between primary failurecurrently FREE and themoment when secondary starts serving requests. Discussion: The possibility of setting both servers preference to an equal value could theoretically work asserver is the primary, or BACKUP and the server is the secondary, the server MUST send acrude attemptBNDUPD toprovide load balancing. It wouldn't do much good on its own, as one (faster)the other servercould be chosen more frequently (assuming thatwithequal preference sets clients will pick first responding server, whichthe binding-status FREE and an indication that the resource isnot mandated by DHCPv6). We could design a simple mechanism of dynamically updating preference depending on usage of available resources. This concept hasn't been investigated in detail yet. 11. Dynamic DNSno longer reserved. 13. Security ConsiderationsTODO: Describe DNS Update [RFC2136] challenges inDHCPv6 failoverenvironment. 
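The reservation rules can be illustrated with a minimal Python sketch. It is not part of the design: the names Role, State, BndUpd, on_reserved_resource_freed and on_reservation_cancelled are hypothetical, the BNDUPD message is modelled as a plain record rather than any wire format, and the behaviour of a secondary when a reserved resource becomes FREE is an assumption, since the text only states what the primary does.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class State(Enum):
    FREE = "FREE"
    BACKUP = "BACKUP"
    ACTIVE = "ACTIVE"


@dataclass(frozen=True)
class BndUpd:
    """Hypothetical in-memory stand-in for a BNDUPD message."""
    resource: str
    binding_status: State
    reserved: bool


def on_reserved_resource_freed(role: Role, resource: str) -> Optional[BndUpd]:
    """A reserved resource has just become FREE on this server.

    The primary keeps the resource FREE locally and reports it to the
    secondary as BACKUP with the reserved flag set.
    """
    if role is Role.PRIMARY:
        return BndUpd(resource, State.BACKUP, reserved=True)
    # Assumption: the text does not describe the secondary's action here,
    # so this sketch sends nothing.
    return None


def on_reservation_cancelled(role: Role, resource: str,
                             state: State) -> Optional[BndUpd]:
    """The reservation on a resource was cancelled on this server."""
    primary_free = role is Role.PRIMARY and state is State.FREE
    secondary_backup = role is Role.SECONDARY and state is State.BACKUP
    if primary_free or secondary_backup:
        # Tell the partner the resource is FREE and no longer reserved.
        return BndUpd(resource, State.FREE, reserved=False)
    # In any other state (e.g. ACTIVE) the rule does not require an update.
    return None
```

For example, calling on_reservation_cancelled(Role.PRIMARY, "2001:db8::10", State.ACTIVE) yields no update, because the cancellation rule applies only to a FREE resource on the primary or a BACKUP resource on the secondary.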
13. Security Considerations

DHCPv6 failover is an extension of the standard DHCPv6 protocol, so all security considerations from [RFC3315], Section 23 and [RFC3633], Section 15 related to the server apply.

As traffic exchanged between clients and servers is not encrypted, an attacker that has penetrated the network and is able to intercept traffic will not gain anything by also sniffing communication between partners.

An attacker that can impersonate one partner can effectively perform a denial of service attack on the remaining uncompromised server. Several techniques may be used: pretending that conflict resolution is required, requesting a rebalance, claiming that a valid lease was released or declined, etc. For that reason the communication between servers SHOULD support failover connections over TLS, as explained in Section 5.1. Such a secure connection SHOULD be optional and configurable by the administrator.

TODO: Security considerations section contains loose notes and will be transformed into consistent text once the core design solidifies.

14. IANA Considerations

IANA is not requested to perform any actions at this time.

15. Acknowledgements

This document extensively uses concepts, definitions and other parts of the [dhcpv4-failover] document. Authors would like to thank Shawn Routher, Greg Rabil, and Bernie Volz for their significant involvement and contributions. Authors would like to thank VithalPrasad Gaitonde for his insightful comments.

This work has been partially supported by Department of Computer Communications (a division of Gdansk University of Technology) and the Polish Ministry of Science and Higher Education under the European Regional Development Fund, Grant No. POIG.01.01.02-00-045/09-00 (Future Internet Engineering Project).

16. References

16.1. Normative References

[I-D.ietf-dhc-dhcpv6-client-link-layer-addr-opt] Halwasia, G., Systems, C., and W. Dec, "Client Link-layer Address Option in DHCPv6", draft-ietf-dhc-dhcpv6-client-link-layer-addr-opt-03 (work in progress), October 2012.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3315] Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C., and M.
Carney, "Dynamic Host Configuration Protocol for IPv6 (DHCPv6)", RFC 3315, July 2003.

[RFC3633] Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic Host Configuration Protocol (DHCP) version 6", RFC 3633, December 2003.

[RFC4703] Stapp, M. and B. Volz, "Resolution of Fully Qualified Domain Name (FQDN) Conflicts among Dynamic Host Configuration Protocol (DHCP) Clients", RFC 4703, October 2006.

[RFC4704] Volz, B., "The Dynamic Host Configuration Protocol for IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN) Option", RFC 4704, October 2006.

16.2. Informative References

[I-D.ietf-dhc-dhcpv6-failover-requirements] Mrugalski, T. and K. Kinnear, "DHCPv6 Failover Requirements", draft-ietf-dhc-dhcpv6-failover-requirements-02 (work in progress), September 2012.

[RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, "Dynamic Updates in the Domain Name System (DNS UPDATE)", RFC 2136, April 1997.

[RFC4649] Volz, B., "Dynamic Host Configuration Protocol for IPv6 (DHCPv6) Relay Agent Remote-ID Option", RFC 4649, August 2006.

[RFC5007] Brzozowski, J., Kinnear, K., Volz, B., and S. Zeng, "DHCPv6 Leasequery", RFC 5007, September 2007.

[RFC5460] Stapp, M., "DHCPv6 Bulk Leasequery", RFC 5460, February 2009.

[dhcpv4-failover] Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S., Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover Protocol", draft-ietf-dhc-failover-12 (work in progress), March 2003.

Authors' Addresses

Tomasz Mrugalski
Internet Systems Consortium, Inc.
950 Charter Street
Redwood City, CA 94063
USA
Phone: +1 650 423 1345
Email: tomasz.mrugalski@gmail.com

Kim Kinnear
Cisco Systems, Inc.
1414 Massachusetts Ave.
Boxborough, Massachusetts 01719
USA
Phone: +1 (978) 936-0000
Email: kkinnear@cisco.com