--- 1/draft-ietf-dhc-dhcpv6-failover-design-00.txt 2012-09-07 18:14:13.789425771 +0200 +++ 2/draft-ietf-dhc-dhcpv6-failover-design-01.txt 2012-09-07 18:14:13.881425807 +0200 @@ -1,19 +1,19 @@ Dynamic Host Configuration (DHC) T. Mrugalski Internet-Draft ISC Intended status: Standards Track K. Kinnear -Expires: January 5, 2013 Cisco - July 4, 2012 +Expires: March 11, 2013 Cisco + September 7, 2012 DHCPv6 Failover Design - draft-ietf-dhc-dhcpv6-failover-design-00 + draft-ietf-dhc-dhcpv6-failover-design-01 Abstract DHCPv6 defined in [RFC3315] does not offer server redundancy. This document defines a design for DHCPv6 failover, a mechanism for running two servers on the same network with capability for either server to take over clients' leases in case of server failure or network partition. This is a DHCPv6 Failover design document, it is not protocol specification document. It is a second document in a planned series of three documents. DHCPv6 failover requirements are @@ -28,21 +28,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on January 5, 2013. + This Internet-Draft will expire on March 11, 2013. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. 
Please review these documents @@ -50,175 +50,178 @@ to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 2. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 - 3.1. Additional Requirements . . . . . . . . . . . . . . . . . 5 + 3.1. Additional Requirements . . . . . . . . . . . . . . . . . 6 3.2. Features out of Scope: Load Balancing . . . . . . . . . . 6 4. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 6 - 4.1. Failover Machine Sate Overview . . . . . . . . . . . . . . 7 - 5. Connection Management . . . . . . . . . . . . . . . . . . . . 9 - 5.1. Creating Connections . . . . . . . . . . . . . . . . . . . 9 - 5.2. Endpoint Identification . . . . . . . . . . . . . . . . . 10 - 6. Resource Allocation . . . . . . . . . . . . . . . . . . . . . 11 - 6.1. Proportional Allocation . . . . . . . . . . . . . . . . . 12 - 6.2. Independent Allocation . . . . . . . . . . . . . . . . . . 13 - 6.3. Determining Allocation Approach . . . . . . . . . . . . . 13 - 6.3.1. IPv6 Addresses . . . . . . . . . . . . . . . . . . . . 13 - 6.3.2. IPv6 Prefixes . . . . . . . . . . . . . . . . . . . . 13 - 7. Information model . . . . . . . . . . . . . . . . . . . . . . 13 - 8. Failover Mechanisms . . . . . . . . . . . . . . . . . . . . . 14 - 8.1. Time Skew . . . . . . . . . . . . . . . . . . . . . . . . 14 - 8.2. Time expression . . . . . . . . . . . . . . . . . . . . . 14 - 8.3. Lazy updates . . . . . . . . . . . . . . . . . . . . . . . 15 - 8.4. MCLT concept . . . . . . . . . . . . . . . . . . . . . . . 15 - 8.4.1. MCLT example . . . . . . . . . . . . . . . . . . . . . 16 - 8.5. Unreachability detection . . . . . . . . . . . . . 
. . . . 18 - 8.6. Re-allocating Leases . . . . . . . . . . . . . . . . . . . 18 - 8.7. Sending Data . . . . . . . . . . . . . . . . . . . . . . . 18 - 8.7.1. Required Data . . . . . . . . . . . . . . . . . . . . 19 - 8.7.2. Optional Data . . . . . . . . . . . . . . . . . . . . 19 - 8.8. Receiving Data . . . . . . . . . . . . . . . . . . . . . . 19 - 8.8.1. Conflict Resolution . . . . . . . . . . . . . . . . . 19 - 8.8.2. Acknowledging Reception . . . . . . . . . . . . . . . 19 - 9. Endpoint States . . . . . . . . . . . . . . . . . . . . . . . 19 - 9.1. State Machine Operation . . . . . . . . . . . . . . . . . 19 - 9.2. State Machine Initialization . . . . . . . . . . . . . . . 22 - 9.3. STARTUP State . . . . . . . . . . . . . . . . . . . . . . 22 - 9.3.1. Operation in STARTUP State . . . . . . . . . . . . . . 22 - 9.3.2. Transition Out of STARTUP State . . . . . . . . . . . 22 - 9.4. PARTNER-DOWN State . . . . . . . . . . . . . . . . . . . . 24 - 9.4.1. Operation in PARTNER-DOWN State . . . . . . . . . . . 24 - 9.4.2. Transition Out of PARTNER-DOWN State . . . . . . . . . 25 - 9.5. RECOVER State . . . . . . . . . . . . . . . . . . . . . . 25 - 9.5.1. Operation in RECOVER State . . . . . . . . . . . . . . 26 - 9.5.2. Transition Out of RECOVER State . . . . . . . . . . . 26 - 9.6. RECOVER-WAIT State . . . . . . . . . . . . . . . . . . . . 27 - 9.6.1. Operation in RECOVER-WAIT State . . . . . . . . . . . 28 - 9.6.2. Transition Out of RECOVER-WAIT State . . . . . . . . . 28 - 9.7. RECOVER-DONE State . . . . . . . . . . . . . . . . . . . . 28 - 9.7.1. Operation in RECOVER-DONE State . . . . . . . . . . . 29 - 9.7.2. Transition Out of RECOVER-DONE State . . . . . . . . . 29 - 9.8. NORMAL State . . . . . . . . . . . . . . . . . . . . . . . 29 - 9.8.1. Operation in NORMAL State . . . . . . . . . . . . . . 29 - 9.8.2. Transition Out of NORMAL State . . . . . . . . . . . . 30 - 9.9. COMMUNICATIONS-INTERRUPTED State . . . . . . . . . . . . . 31 - 9.9.1. 
Operation in COMMUNICATIONS-INTERRUPTED State . . . . 31 - 9.9.2. Transition Out of COMMUNICATIONS-INTERRUPTED State . . 32 - 9.10. POTENTIAL-CONFLICT State . . . . . . . . . . . . . . . . . 33 - 9.10.1. Operation in POTENTIAL-CONFLICT State . . . . . . . . 34 - 9.10.2. Transition Out of POTENTIAL-CONFLICT State . . . . . . 34 - 9.11. RESOLUTION-INTERRUPTED State . . . . . . . . . . . . . . . 35 - 9.11.1. Operation in RESOLUTION-INTERRUPTED State . . . . . . 36 - 9.11.2. Transition Out of RESOLUTION-INTERRUPTED State . . . . 36 - 9.12. CONFLICT-DONE State . . . . . . . . . . . . . . . . . . . 36 - 9.12.1. Operation in CONFLICT-DONE State . . . . . . . . . . . 37 - 9.12.2. Transition Out of CONFLICT-DONE State . . . . . . . . 37 - 9.13. PAUSED State . . . . . . . . . . . . . . . . . . . . . . . 37 - 9.13.1. Operation in PAUSED State . . . . . . . . . . . . . . 37 - 9.13.2. Transition Out of PAUSED State . . . . . . . . . . . . 38 - 9.14. SHUTDOWN State . . . . . . . . . . . . . . . . . . . . . . 38 - 9.14.1. Operation in SHUTDOWN State . . . . . . . . . . . . . 38 - 9.14.2. Transition Out of SHUTDOWN State . . . . . . . . . . . 38 - 10. Proposed extensions . . . . . . . . . . . . . . . . . . . . . 38 - 10.1. Active-active mode . . . . . . . . . . . . . . . . . . . . 39 - 11. Dynamic DNS Considerations . . . . . . . . . . . . . . . . . . 39 - 12. Reservations and failover . . . . . . . . . . . . . . . . . . 39 - 13. Protocol entities . . . . . . . . . . . . . . . . . . . . . . 39 - 13.1. Failover Protocol . . . . . . . . . . . . . . . . . . . . 40 - 13.2. Protocol constants . . . . . . . . . . . . . . . . . . . . 40 - 14. Open questions . . . . . . . . . . . . . . . . . . . . . . . . 40 - 15. Security Considerations . . . . . . . . . . . . . . . . . . . 40 - 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 40 - 17. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 41 - 18. References . . . . . . . . . . . . . . . . . . . . . . . . . . 
41 - 18.1. Normative References . . . . . . . . . . . . . . . . . . . 41 - 18.2. Informative References . . . . . . . . . . . . . . . . . . 41 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 42 + 4.1. Failover Machine State Overview . . . . . . . . . . . . . 8 + 4.2. Messages . . . . . . . . . . . . . . . . . . . . . . . . . 9 + 5. Connection Management . . . . . . . . . . . . . . . . . . . . 11 + 5.1. Creating Connections . . . . . . . . . . . . . . . . . . . 11 + 5.2. Endpoint Identification . . . . . . . . . . . . . . . . . 12 + 6. Resource Allocation . . . . . . . . . . . . . . . . . . . . . 13 + 6.1. Proportional Allocation . . . . . . . . . . . . . . . . . 13 + 6.2. Independent Allocation . . . . . . . . . . . . . . . . . . 14 + 6.3. Determining Allocation Approach . . . . . . . . . . . . . 15 + 6.3.1. IPv6 Addresses . . . . . . . . . . . . . . . . . . . . 15 + 6.3.2. IPv6 Prefixes . . . . . . . . . . . . . . . . . . . . 15 + 7. Information model . . . . . . . . . . . . . . . . . . . . . . 15 + 8. Failover Mechanisms . . . . . . . . . . . . . . . . . . . . . 19 + 8.1. Time Skew . . . . . . . . . . . . . . . . . . . . . . . . 19 + 8.2. Time expression . . . . . . . . . . . . . . . . . . . . . 19 + 8.3. Lazy updates . . . . . . . . . . . . . . . . . . . . . . . 19 + 8.4. MCLT concept . . . . . . . . . . . . . . . . . . . . . . . 20 + 8.4.1. MCLT example . . . . . . . . . . . . . . . . . . . . . 21 + 8.5. Unreachability detection . . . . . . . . . . . . . . . . . 22 + 8.6. Re-allocating Leases . . . . . . . . . . . . . . . . . . . 23 + 8.7. Sending Binding Update . . . . . . . . . . . . . . . . . . 23 + 8.8. Receiving Binding Update . . . . . . . . . . . . . . . . . 24 + 8.9. Conflict Resolution . . . . . . . . . . . . . . . . . . . 25 + 8.10. Acknowledging Reception . . . . . . . . . . . . . . . . . 27 + 9. Endpoint States . . . . . . . . . . . . . . . . . . . . . . . 27 + 9.1. State Machine Operation . . . . . . . . . . . . . . . . . 
27 + 9.2. State Machine Initialization . . . . . . . . . . . . . . . 30 + 9.3. STARTUP State . . . . . . . . . . . . . . . . . . . . . . 30 + 9.3.1. Operation in STARTUP State . . . . . . . . . . . . . . 31 + 9.3.2. Transition Out of STARTUP State . . . . . . . . . . . 31 + 9.4. PARTNER-DOWN State . . . . . . . . . . . . . . . . . . . . 32 + 9.4.1. Operation in PARTNER-DOWN State . . . . . . . . . . . 32 + 9.4.2. Transition Out of PARTNER-DOWN State . . . . . . . . . 33 + + 9.5. RECOVER State . . . . . . . . . . . . . . . . . . . . . . 34 + 9.5.1. Operation in RECOVER State . . . . . . . . . . . . . . 34 + 9.5.2. Transition Out of RECOVER State . . . . . . . . . . . 34 + 9.6. RECOVER-WAIT State . . . . . . . . . . . . . . . . . . . . 36 + 9.6.1. Operation in RECOVER-WAIT State . . . . . . . . . . . 37 + 9.6.2. Transition Out of RECOVER-WAIT State . . . . . . . . . 37 + 9.7. RECOVER-DONE State . . . . . . . . . . . . . . . . . . . . 37 + 9.7.1. Operation in RECOVER-DONE State . . . . . . . . . . . 38 + 9.7.2. Transition Out of RECOVER-DONE State . . . . . . . . . 38 + 9.8. NORMAL State . . . . . . . . . . . . . . . . . . . . . . . 38 + 9.8.1. Operation in NORMAL State . . . . . . . . . . . . . . 38 + 9.8.2. Transition Out of NORMAL State . . . . . . . . . . . . 39 + 9.9. COMMUNICATIONS-INTERRUPTED State . . . . . . . . . . . . . 40 + 9.9.1. Operation in COMMUNICATIONS-INTERRUPTED State . . . . 40 + 9.9.2. Transition Out of COMMUNICATIONS-INTERRUPTED State . . 41 + 9.10. POTENTIAL-CONFLICT State . . . . . . . . . . . . . . . . . 42 + 9.10.1. Operation in POTENTIAL-CONFLICT State . . . . . . . . 43 + 9.10.2. Transition Out of POTENTIAL-CONFLICT State . . . . . . 43 + 9.11. RESOLUTION-INTERRUPTED State . . . . . . . . . . . . . . . 44 + 9.11.1. Operation in RESOLUTION-INTERRUPTED State . . . . . . 45 + 9.11.2. Transition Out of RESOLUTION-INTERRUPTED State . . . . 45 + 9.12. CONFLICT-DONE State . . . . . . . . . . . . . . . . . . . 45 + 9.12.1. 
Operation in CONFLICT-DONE State . . . . . . . . . . . 46 + 9.12.2. Transition Out of CONFLICT-DONE State . . . . . . . . 46 + 10. Proposed extensions . . . . . . . . . . . . . . . . . . . . . 46 + 10.1. Active-active mode . . . . . . . . . . . . . . . . . . . . 46 + 11. Dynamic DNS Considerations . . . . . . . . . . . . . . . . . . 47 + 12. Reservations and failover . . . . . . . . . . . . . . . . . . 47 + 13. Protocol entities . . . . . . . . . . . . . . . . . . . . . . 47 + 13.1. Failover Protocol . . . . . . . . . . . . . . . . . . . . 47 + 13.2. Protocol constants . . . . . . . . . . . . . . . . . . . . 47 + 14. Open questions . . . . . . . . . . . . . . . . . . . . . . . . 48 + 15. Security Considerations . . . . . . . . . . . . . . . . . . . 48 + 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 48 + 17. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 48 + 18. References . . . . . . . . . . . . . . . . . . . . . . . . . . 49 + 18.1. Normative References . . . . . . . . . . . . . . . . . . . 49 + 18.2. Informative References . . . . . . . . . . . . . . . . . . 49 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 50 1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 2. Glossary This is a supplemental glossary that should be combined with definitions in Section 3 of [I-D.ietf-dhc-dhcpv6-failover-requirements]. o Failover endpoint - The failover protocol allows for there to be a - unique failover 'endpoint' per partner per role per relationship - (where role is primary or secondary and the relationship is - defined by the relationship-name). This failover endpoint can - take actions and hold unique states. Typically, there is a one - failover endpoint per partner (server), although there may be - more. 
'Server' and 'failover endpoint' are synonymous only if the - server participates in only one failover relationship. However, - for the sake of simplicity 'Server' is used throughout the - document to refer to a failover endpoint unless to do so would be - confusing. + unique failover 'endpoint' for each failover relationship in which + a failover server participates. The failover relationship is + defined by a relationship name, and includes the failover partner + IP address, the role this server takes with respect to that + partner (primary or secondary), and the prefixes associated with + that relationship. Note that a single prefix can only be + associated with a single failover relationship. This failover + endpoint can take actions and hold unique states. Typically, + there is a one failover endpoint per partner (server), although + there may be more. 'Server' and 'failover endpoint' are + synonymous only if the server participates in only one failover + relationship. However, for the sake of simplicity 'Server' is + used throughout the document to refer to a failover endpoint + unless to do so would be confusing. o Failover transmission - all messages exchanged between partners. o Independent Allocation - a prefix allocation algorithm to split the available pool of resources between the primary and secondary servers that is particularly well suited for vast pools (i.e. when available resources are not expected to deplete). See Section 6.2 for details. o Primary Server o Proportional Allocation - a prefix allocation algorithm to split the available free leases between the primary and secondary servers that is particularly well suited for more limited resources. See Section 6.1 for details. - o Resource - an IPv6 address or a IPv6 prefix. + o Resource - Any type of resource that is assignable using DHCPv6. + Currently there are two types of such resources defined: a non- + temporary IPv6 address and an IPv6 prefix. 
Due to the nature of + temporary addresses, they are not covered by the failover + mechanism. Other resource types may be defined in the future. o Responsive - A server that is responsive, will respond to DHCPv6 client requests. o Secondary Server + o Server - A DHCPv6 server that implements DHCPv6 failover. - 'Server' and 'failover endpoint' as synonymous only if server + 'Server' and 'failover endpoint' are synonymous only if the server participates in only one failover relationship. o Unresponsive - A server that is unresponsive will not respond to DHCPv6 client requests. 3. Introduction The failover protocol design provides a means for cooperating DHCPv6 servers to work together to provide a DHCPv6 service with availability that is increased beyond that which could be provided by a single DHCPv6 server operating alone. It is designed to protect DHCPv6 clients against server unreachability, including server failure and network partition. It is possible to deploy exactly two servers that are able to continue providing a lease on an IPv6 address [RFC3315] or on an IPv6 prefix [RFC3633] without the DHCPv6 client experiencing lease expiration or a reassignment of a lease to a different IPv6 address in the event of failure by one or the other of the two servers. - This protocol defines active-passive mode, sometimes also called hot - standby model. This means that during normal operation one server is - active (i.e. actively responds to clients' requests) while the second - is passive (i.e. it does receive clients' requests, but does not - respond to them and only maintains a copy of lease database and is - ready to take over incoming queries in case of primary server + This protocol defines active-passive mode, sometimes also called a + hot standby model. This means that during normal operation one + server is active (i.e. actively responds to clients' requests) while + the second is passive (i.e. 
it does receive clients' requests, but + does not respond to them and only maintains a copy of lease database + and is ready to take over incoming queries in case of primary server failure). Active-active mode (i.e. both servers actively handling clients' requests) is currently not supported for the sake of simplicity. Such mode may be defined as an extension at a later time. The failover protocol is designed to provide lease stability for leases with lease times beyond a short period. Due to the additional overhead required, failover is not suitable for leases shorter than 30 seconds. The DHCPv6 Failover protocol MUST NOT be used for leases shorter than 30 seconds. @@ -231,241 +234,302 @@ general, but rather to this particular design. 1. Minimize Asymmetry - while there are two distinct roles in failover (primary and secondary server), the differences between those two roles should be as small as possible. This will yield a simpler design as well as a simpler implementation of that design. 3.2. Features out of Scope: Load Balancing - It may be tempting to extend DHCPv6 failover mechanism to also offer - load balancing, as DHCPv4 failover did. Here is the reasoning for - this decision. In general case (not related to failover) load - balancing solutions are used when each server is not able to handle - total incoming traffic. However, by the very definition, DHCPv6 - failover is supposed to assume service availability despite failure - of one server. That leads to conclusion that each server must be - able to handle whole traffic. Therefore in properly provisioned - setup, load balancing is not needed. + While it is tempting to extend DHCPv6 failover mechanism to also + offer load balancing, as DHCPv4 failover did, this design does not do + that. Here is the reasoning for this decision. In general case (not + related to failover) load balancing solutions are used when each + server is not able to handle total incoming traffic.
However, by the + very definition, DHCPv6 failover is supposed to assume service + availability despite failure of one server. That leads to conclusion + that each server must be able to handle whole traffic. Therefore in + properly provisioned setup, load balancing is not needed. 4. Protocol Overview The DHCPv6 Failover Protocol is defined as a communication between failover partners with all associated algorithms and mechanisms. Failover communication is conducted over a TCP connection established between the partners. The protocol reuses the framing format specified in Section 5.1 of DHCPv6 Bulk Leasequery [RFC5460], but - uses different message types. Additional failover-specific message - types will be defined. All information is sent over the connection - as typical DHCPv6 Options, following format defined in Section 22.1 - of [RFC3315]. + uses different message types. New failover-specific message types + are listed in Section 4.2. All information is sent over the + connection as typical DHCPv6 messages that convey DHCPv6 options, + following format defined in Section 22.1 of [RFC3315]. After initialization, the primary server establishes a TCP connection with its partner. The primary server sends a CONNECT message with initial parameters. Secondary server responds with CONNECTACK. Depending on the failover state of each partner, they MUST initiate one of the binding update procedures. Each server MAY send an UPDREQ message to request its partner to send all updates that have not been - sent yet (this case applies when partner has an existing database and - wants to update it). Alternatively, a server MAY choose to send an - UPDREQALL message to request a full lease database transmission + sent yet (this case applies when the partner has an existing database + and wants to update it). 
Alternatively, a server MAY choose to send + an UPDREQALL message to request a full lease database transmission including all leases (this case applies in case of booting up new server after installation, corruption or complete loss of database, or other catastrophic failure). Servers exchange lease information by using BNDUPD messages. - Depending on local and remote state of a lease, a server may either - accept or reject the update. Reception of lease update information - is confirmed by responding with BNDACK message with appropriate - status. The majority of the messages sent over a failover TCP - connection consists of BNDUPD and BNDACK messages. + Depending on the local and remote state of a lease, a server may + either accept or reject the update. Reception of lease update + information is confirmed by responding with a BNDACK message with + appropriate status. The majority of the messages sent over a + failover TCP connection consists of BNDUPD and BNDACK messages. A subset of available resources (addresses or prefixes) is reserved for secondary server use. This is required for handling a case where both servers are able to communicate with clients, but unable to - communicate with each other. After initial connection is + communicate with each other. After the initial connection is established, the secondary server requests a pool of available - addresses by sending a POOLREQ message. The primary server assigns a - pool to the secondary by transmitting a POOLRESP message and then - sending a series of BNDUPD messages. The secondary server may - initiate such pool request at any time when maintaining communication - with primary server. + addresses by sending a POOLREQ message. The primary server assigns + addresses to the secondary by sending a series of BNDUPD messages. + When this process is complete, the primary server sends a POOLRESP + message to the secondary server. 
The secondary server may initiate + such pool request at any time when in communication with primary + server. Failover servers use a lazy update mechanism to update their failover partner about changes to their lease state database. After a server performs any modifications to its lease state database (assign a new lease, extend an existing one, release or expire a lease), it sends its response to the client's request first (performing the "regular" DHCPv6 operation) and then informs its failover partner using a BNDUPD message. This BNDUPD message SHOULD be sent soon after the response is sent to the DHCPv6 client, but there is no specific requirement of a minimum time in which to do so. The major problem with lazy update mechanism is the case when the - server crashes after sending response to client, but before sending + server crashes after sending a response to client, but before sending the lazy update to its partner (or when communication between - partners is interrupted). To solve this problem, concept known as - the Maximum Client Lead Time (MCLT) (initially designed for DHCPv4 + partners is interrupted). To solve this problem, the concept known + as the Maximum Client Lead Time (MCLT) (initially designed for DHCPv4 failover) is used. The MCLT is the maximum amount of time that one server can extend a lease for a client's binding beyond the time known by its failover partner. See Section 8.4 for detailed - desciption how MCLT affects assigned lease times. + description of how the MCLT affects assigned lease times. Servers verify each other's availability by periodically exchanging - CONTACT messages. See Section 8.5 for discussion about detecting + CONTACT messages. See Section 8.5 for discussion about detecting a partner's unreachability. A server that is being shut down transmits a DISCONNECT message, closes the connection with its failover partner and stops operation. A Server SHOULD transmit any pending lease updates before transmitting DISCONNECT message.
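The MCLT rule described in the overview above can be illustrated with a small sketch. It is not taken from the draft: the function names and the choice to treat a missing or already-expired partner-known expiration as "now" are assumptions made for illustration only. Section 8.4 of the draft remains the authoritative description.

```python
from typing import Optional


def max_lease_expiry(now: float, mclt: float,
                     partner_known_expiry: Optional[float]) -> float:
    """Latest expiration time a server may hand to a client (illustrative).

    Assumption: if the partner knows nothing about the lease, or the
    expiration it knows about is already in the past, the baseline is 'now'.
    """
    if partner_known_expiry is None:
        baseline = now
    else:
        baseline = max(now, partner_known_expiry)
    # The lease may run at most MCLT beyond what the partner knows.
    return baseline + mclt


def grant_expiry(now: float, desired_lifetime: float, mclt: float,
                 partner_known_expiry: Optional[float] = None) -> float:
    """Expiration actually granted: the desired one, capped by the MCLT rule."""
    limit = max_lease_expiry(now, mclt, partner_known_expiry)
    return min(now + desired_lifetime, limit)


# A brand-new lease is capped at now + MCLT, even if a day was desired.
first = grant_expiry(now=1000, desired_lifetime=86400, mclt=3600)
# Once the partner has acknowledged a later expiration, more can be granted.
later = grant_expiry(now=1000, desired_lifetime=86400, mclt=3600,
                     partner_known_expiry=50000)
```

In this sketch a first-time client receives a short lease bounded by the MCLT, and longer lease times only become possible after the partner has learned about a later expiration through a BNDUPD/BNDACK exchange.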
-4.1. Failover Machine Sate Overview +4.1. Failover Machine State Overview - The following section provides simplified description of all states. - For the sake of clarity and simplicity, it omits important details. - For complete description, see Section 9. In case of a disagreement - between simplified and complete description, please follow Section 9. + The following section provides a simplified description of all + states. For the sake of clarity and simplicity, it omits important + details. For complete description, see Section 9. In case of a + disagreement between the simplified and complete description, please + follow Section 9. - Each server may be in one of the well defines states. In each state + Each server MUST be in one of the well-defined states. In each state a server may be either responsive (responds to clients' queries) or unresponsive (clients' queries are ignored). A server starts its operation in short-lived STARTUP state. A server - determines its partner reachibility and state and usually returns - back to the state it was in before shutdown. + determines its partner's reachability and state and sets its own state + based on that determination. It frequently returns back to the state + it was in before shutdown. During typical operation when servers maintain communication, both - are in NORMAL state. In that state only primary responds to clients' - requests. A secondary server in unresponsive. + are in NORMAL state. In that state only the primary responds to + clients' requests. A secondary server is unresponsive to DHCPv6 + clients. If a server discovers that its partner is no longer reachable, it - goes to COMMUNICATIONS-INTERRUPTED state. Server must be extra + goes to COMMUNICATIONS-INTERRUPTED state. A server must be extra cautious as it can't distinguish if its partner is down or just communication between servers is interrupted.
Since communication between partners is not possible, a server must act on the assumption - that if its partner is up, it follows defined procedure. In - particular, not extend any lease beyond its partner knowledge by at - most MCLT. That imposes additional burden on the server. Therefore - it is not recommended to operate for prolonged periods in this state. - Once communication is reestablished, server may go into NORMAL, - POTENTIAL-CONFLICT or PARTNER-DOWN state. It may also stay in - COMMUNICATIONS-INTERRUPTED if certain conditions are met. + that its partner is up. A failover server must follow a defined + procedure, in particular, it MUST NOT extend any lease more than the + MCLT beyond its partner's knowledge of the lease expiration time. + This imposes an additional burden on the server, in that clients will + return to the server for lease renewals more frequently than they + would otherwise. Therefore it is not recommended to operate for + prolonged periods in this state. Once communication is + reestablished, a server may go into NORMAL, POTENTIAL-CONFLICT or + PARTNER-DOWN state. It may also stay in COMMUNICATIONS-INTERRUPTED + state if certain conditions are met. Once a server is switched into PARTNER-DOWN (when auto-partner-down is used or as a result of administrative action), it can extend leases, regardless of the original server that initially granted the lease. In that state server handles leases from its own pool, but is - albo able to serve pool from its downed partner. MCLT restrictions + also able to serve pool from its downed partner. MCLT restrictions no longer apply. Operation in this mode is less demanding for the server that remains operational, than in COMMUNICATIONS-INTERRUPTED state, but PARTNER-DOWN does not offer any kind of redundancy. - When server loses its database (e.g.
due to first time run or - catastrophic failure) or detects that is partner is in PARTNER-DOWN - state and additional conditions are met, it switches to RECOVER - state. In that state server acknowledges that content of its - database is doubtful and needs to refresh its database from its - partner. Once this operation is done, it switches to RECOVER-WAIT - and later to RECOVER-DONE. + When a server does not have an intact lease state database (e.g. due + to first time run or catastrophic failure) or detects that its partner + is in PARTNER-DOWN state and additional conditions are met, it + switches to RECOVER state. In that state the server acknowledges + that content of its database is doubtful and it needs to refresh its + database from its partner. Once this operation is complete, it + switches to RECOVER-WAIT and later to RECOVER-DONE. Once servers reestablish connection, they discover each other's state. Depending on the conditions, they may return to NORMAL or - move to POTENTINAL-CONFLICT in case of unexpected partner's state. + move to POTENTIAL-CONFLICT if the partner is in a state that doesn't + allow a simple re-integration of the server's lease state databases. It is a goal of this protocol to minimize the possibility that POTENTIAL-CONFLICT state is ever entered. Servers running in - POTENTIAL-CONFLICT do not respond to clients' requests and work on - resolving potential conflicts. Once outstanding lease updates are + POTENTIAL-CONFLICT do not respond to clients' requests and work only + on resolving potential conflicts. Once outstanding lease updates are exchanged, servers move to CONFLICT-DONE or NORMAL states. - Servers that are recovering from potential conflict and loose + Servers that are recovering from potential conflicts and lose communication, switch to RESOLUTION-INTERRUPTED. - Server that is being shut down, switches briefly to SHUTDOWN state - and communicates its state to its partner before actual termination.
+ A Server that is being shut down sends a DISCONNECT message. See + Section 4.2. + +4.2. Messages + + The failover protocol is centered around the message exchanges used + by one server to update its partner and respond to received updates. + The following list enumerates these messages. + + It should be noted that no specific formats or message type values + are assigned at this stage. Appropriate implementation details will + be specified in a separate protocol specification document. + + o BNDUPD - The binding update message is used to send the binding + lease changes to the partner. One message may contain one or more + lease updates. The partner is expected to respond with a BNDACK + message. + + o BNDACK - The binding acknowledgement is used for confirmation of + the received BNDUPD message. It may contain a positive or + negative response (e.g. due to detected lease conflict). + + o POOLREQ - The Pool Request message is used by one server + (typically secondary) to request allocation of resources + (addresses or prefixes) from its partner. The partner responds + with POOLRSP. + + o POOLRSP - The Pool Response message is used by one server + (typically primary) to respond to its partner's request for + resource allocation. One POOLRSP message may contain more than + one pool. + + o UPDREQ - The update request message is used by one server to + request that its partner send all binding database changes that + have not been sent and confirmed already. Requested partner is + expected to respond with zero or more BNDUPD messages, followed by + UPDDONE that signals end of updates. + + o UPDREQALL - The update request all is used by one server to + request that all binding database information be sent in order to + recover from a total loss of its binding database by the + requesting server. Requested server responds with zero or more + BNDUPD messages, followed by UPDDONE that signals end of updates.
+ + o UPDDONE - The update done message is used by the responding server + to indicate that all requested updates have been sent by the + responding server and acked by the requesting server. + + o CONNECT - The connect message is used by the primary server to + establish a high level connection with the other server, and to + transmit several important configuration data items between the + servers. The partner is expected to confirm by responding with + a CONNECTACK message. + + o CONNECTACK - The connect acknowledgement message is used by the + secondary server to respond to a CONNECT message from the primary + server. + + o DISCONNECT - The disconnect message is used by either server when + closing a connection and shutting down. No response is required + for this message. + + o STATE - The state message is used by either server to inform its + partner about a change of failover state. In some cases it may be + used to also inform the partner about current state, e.g. after + a connection is established in COMMUNICATIONS-INTERRUPTED or + PARTNER-DOWN states. + + o CONTACT - The contact message is used by either server to ensure + that the other server continues to see the connection as + operational. It MUST be transmitted periodically over every + established connection if other message traffic is not flowing, and it + MAY be sent at any time. 5. Connection Management 5.1. Creating Connections - Every server implementing the failover protocol SHOULD attempt to - connect to all of its partners periodically, where the period is - implementation dependent and SHOULD be configurable.
In the event - that a connection has been rejected by a CONNECTACK message with a - reject-reason option contained in it or a DISCONNECT message, a - server SHOULD reduce the frequency with which it attempts to connect - to that server but it SHOULD continue to attempt to connect + Every primary server implementing the failover protocol SHOULD + attempt to connect to all of its partners periodically, where the + period is implementation dependent and SHOULD be configurable. In + the event that a connection has been rejected by a CONNECTACK message + with a reject-reason option contained in it or a DISCONNECT message, + a server SHOULD reduce the frequency with which it attempts to + connect to that server but it SHOULD continue to attempt to connect periodically. - When a connection attempt succeeds, if the server generating the - connection attempt is a primary server for that relationship, then it - MUST send a CONNECT message down the connection. If it is not a - primary server for the relationship, then it MUST just drop the - connection and wait for the primary server to connect to it. + Every secondary server implementing the failover protocol SHOULD + listen for connection attempts from the primary server. + + When a connection attempt succeeds, the primary server which has + initiated the connection attempt MUST send a CONNECT message down the + connection. When a connection attempt is received, the only information that the receiving server has is the IP address of the partner initiating a - connection. It also knows whether it has the primary role for any - failover relationships with the connecting server. If it has any - relationships for which it is a primary server, it should initiate a - connection of its own to the partner server, one for each primary - relationship it has with that server. 
- - If it has any relationships with the connecting server for which it - is a secondary server, it should just await the CONNECT message to - determine which relationship this connection is to serve. + connection. If it has any relationships with the connecting server + for which it is a secondary server, it should just await the CONNECT + message to determine which relationship this connection is to serve. If it has no secondary relationships with the connecting server, it SHOULD drop the connection. To summarize -- a primary server MUST use a connection that it has initiated in order to send a CONNECT message. Every server that is a - secondary server in a relationship attempts to create a connection to - the server which is primary in the relationship, but that connection - is only used to stimulate the primary server into recognizing that - the secondary server is ready for operation. The reason behind this - is that the secondary server has no way to communicate to the primary - server which relationship a connection is designed to serve. - - A server which has multiple secondary relationships with a primary - server SHOULD only send one stimulus connection attempt to the - primary server. + secondary server in a relationship simply listens for connection + attempts from the primary server. Once a connection is established, the primary server MUST send a CONNECT message across the connection. A secondary server MUST wait for the CONNECT message from a primary server. If the secondary server doesn't receive a CONNECT message from the primary server in - an installation dependent amount of time, it MAY drop the connection - and send another stimulus connection attempt to the primary server. + an installation dependent amount of time, it MAY drop the connection.
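The secondary server's handling of an incoming connection, as described in the revised text of Section 5.1, can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `Action`, `secondary_next_action`, `secondary_partners` and `connect_timeout_s` are hypothetical and do not come from the draft, and the sketch always drops the connection on timeout although the draft only says the server MAY do so.

```python
from enum import Enum
from typing import Set


class Action(Enum):
    DROP = "drop"        # close the TCP connection
    WAIT = "wait"        # keep waiting for the CONNECT message
    PROCEED = "proceed"  # CONNECT arrived; continue with CONNECTACK handling


def secondary_next_action(peer_ip: str,
                          secondary_partners: Set[str],
                          connect_received: bool,
                          waited_s: float,
                          connect_timeout_s: float) -> Action:
    """Decide what a listening server does with an incoming connection.

    secondary_partners is the (hypothetical) set of partner IP addresses
    with which this server has at least one relationship as a secondary.
    """
    # No secondary relationship with the connecting server: SHOULD drop.
    if peer_ip not in secondary_partners:
        return Action.DROP
    # The CONNECT message tells us which relationship the connection serves.
    if connect_received:
        return Action.PROCEED
    # The draft says MAY drop after an installation dependent time;
    # this sketch chooses to always drop.
    if waited_s >= connect_timeout_s:
        return Action.DROP
    return Action.WAIT
```

The function is pure, so a real server would call it from its connection loop each time its state changes, passing the elapsed waiting time and its own configured timeout.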
Every CONNECT message includes a TLS-request option, and if the CONNECTACK message does not reject the CONNECT message and the TLS- reply option says TLS MUST be used, then the servers will immediately enter into TLS negotiation. Once TLS negotiation is complete, the primary server MUST resend the CONNECT message on the newly secured TLS connection and then wait for the CONNECTACK message in response. The TLS-request and TLS-reply options MUST NOT appear in either this second CONNECT or its associated CONNECTACK message as they had in the first messages. The second message sent over a new connection (either a bare TCP connection or a connection utilizing TLS) is a STATE message. Upon the receipt of this message, the receiver can consider communications up. - A secondary server MUST NOT respond to the closing of a TCP - connection with a blind attempt to reconnect -- there may be another - TCP connection to the same failover partner already in use. - 5.2. Endpoint Identification The proper operation of the failover protocol requires more than the transmission of messages between one server and the other. Each endpoint might seem to be a single DHCPv6 server, but in fact there are situations where additional flexibility in configuration is useful. A failover endpoint is always associated with a set of DHCPv6 prefixes that are configured on the DHCPv6 server where the endpoint appears. A DHCPv6 prefix MUST NOT be associated with more than one failover endpoint. @@ -501,30 +565,30 @@ IP address of the partner, the relationship-name, and the role of the receiving server. 6. Resource Allocation Currently there are two allocation algorithms defined for resources (addresses or prefixes). Additional allocation schemes may be defined as future extensions. 1. Proportional Allocation - This allocation algorithm is a direct - application of algorithm defined in [dhcpv4-failover] to DHCPv6. - Available resources are split between primary and secondary - server. 
Released resources are always returned to primary - server. Primary and secondary servers may initiate a rebalancing - procedure, when disparity between resources available to each - server reaches a preconfigured threshold. Only resources that - are not leased to any clients are "owned" by one of the servers. - This algorithm is particularly well suited for scenarios where - amount of available resources is limited, as may be the case for - prefix delegation. See Section 6.1 for details. + application of the algorithm defined in [dhcpv4-failover] to + DHCPv6. Available resources are split between the primary and + secondary servers. Released resources are always returned to the + primary server. Primary and secondary servers may initiate a + rebalancing procedure when the disparity between resources available + to each server reaches a preconfigured threshold. Only resources + that are not leased to any clients are "owned" by one of the + servers. This algorithm is particularly well suited for + scenarios where the amount of available resources is limited, as may + be the case with prefix delegation. See Section 6.1 for details. 2. Independent Allocation - This allocation algorithm assumes that available resources are split between primary and secondary servers as well. In this case, however, resources are assigned to a specific server for all time, regardless of whether they are available or currently used. This algorithm is much simpler than proportional allocation, because resource imbalance doesn't have to be checked and there is no rebalancing for independent allocation. This algorithm is particularly well suited for scenarios where there is an abundance of available resources @@ -535,64 +599,62 @@ In this allocation scheme, each server has its own pool of available resources. Note that a resource is not "owned" by a particular server throughout its entire lifetime.
Only a resource which is available is "owned" by a particular server -- once it has been leased to a client, it is not owned by either failover partner. When it finally becomes available again, it will be owned initially by the primary server, and it may or may not be allocated to the secondary server by the primary server. - So, the flow of a resource is as follows: initially a resource is - owned by the primary server. It may be allocated to the secondary - server if it is available, and then it is owned by the secondary - server. Either server can allocate available resources which they - own to clients, in which case they cease to own them. When the - client releases the resource or the lease on it expires, it will - again become available and will be owned by the primary. + The flow of a resource is as follows: initially a resource is owned + by the primary server. It may be allocated to the secondary server + if it is available, and then it is owned by the secondary server. + Either server can allocate available resources which they own to + clients, in which case they cease to own them. When the client + releases the resource or the lease on it expires, it will again + become available and will be owned by the primary. A resource will not become owned by the server which allocated it initially when it is released or the lease expires because, in general, that server will have had to replenish its pool of available resources well in advance of any likely lease expirations. Thus, having a particular resource cycle back to the secondary might well put the secondary more out of balance with respect to the primary instead of enhancing the balance of available addresses or prefixes between them. - TODO: Need to rework this v4-specific vocabulary to v6, once we - decide how things will look like in v6. - - When they are used, these proportional pools are used for allocation - when in every state but PARTNER-DOWN state. 
In PARTNER-DOWN state a - failover server can allocate from either pool. This allocation and - maintenance of these address pools is an area of some sensitivity, - since the goal is to maintain a more or less constant ratio of - available addresses between the two servers. + Pools governed by proportional allocation are used for allocation + when the server is in all states, except PARTNER-DOWN. In PARTNER- + DOWN state the healthy partner can allocate from either pool (both + its own and its partner's). This allocation and maintenance of these + address pools is an area of some sensitivity, since the goal is to + maintain a more or less constant ratio of available addresses between + the two servers. TODO: Reuse rest of the description from section 5.4 from [dhcpv4-failover] here. 6.2. Independent Allocation In this allocation scheme, available resources are split between servers. Available resources are split between the primary and secondary servers as part of initial connection establishment. Once resources are allocated to each server, there is no need to reassign them. This algorithm is simpler than proportional allocation since - it requires no less initial communicagtion and does not require a + it requires similar initial communication and does not require a rebalancing mechanism, but it assumes that the pool assigned to each server will never deplete. That is often a reasonable assumption for IPv6 addresses (e.g. servers are often assigned a /64 pool that contains many more addresses than existing electronic devices on Earth). This allocation mechanism SHOULD be used for IPv6 addresses, - unless configured address pool is small or is otherwise + unless the configured address pool is small or is otherwise administratively limited. Once each server is assigned a resource pool during initial connection establishment, it may allocate assigned resources to clients. 
Once a client releases a resource or its lease expires, the resource returns to the pool of the same server. Resources never change servers. During COMMUNICATIONS-INTERRUPTED events, a partner MAY continue extending existing leases when requested by clients. A healthy @@ -601,56 +663,212 @@ state. 6.3. Determining Allocation Approach 6.3.1. IPv6 Addresses 6.3.2. IPv6 Prefixes 7. Information model - TODO: Describe information model here. In particular, we need to - describe lease lifecycle here. + In most DHCP servers a resource (an IP address or a prefix) can take + on several different binding-status values, sometimes also called + lease states. While probably no two DHCP servers have exactly the + same possible binding-status values, the DHCP RFC enforces some + commonality among the general semantics of the binding-status values + used by various DHCP server implementations. - TODO: In case of Active-Passive model, while majority of addresses - are owned by the primary server, secondary server will need a portion - of addresses to serve new clients while operating in - communication-interrupted state as also in partner down state before it can take - over the entire address pool (expiry of MCLT). The concept of a - percentage of pool reserved for secondary should be described here. + In order to transmit binding database updates between one server and + another using the failover protocol, some common denominator + binding-status values must be defined. It is not expected that these values + correspond with any actual implementation of the DHCP protocol in a + DHCP server, but rather that the binding-status values defined in + this document should be a common denominator of those in use by many + DHCP server implementations. + + The lease binding-status values defined for the failover protocol are + listed below. Unless otherwise noted below, there MAY be client + information associated with each of these binding-status values.
+ + ACTIVE -- The lease is assigned to a client. Client identification + data MUST appear. + + EXPIRED -- indicates that a client's binding on a given lease has + expired. When the partner acks the BNDUPD of an expired lease, + the server sets its internal state to FREE*. Client + identification SHOULD appear. + + RELEASED -- indicates that a client sent a RELEASE message. When + the partner acks the BNDUPD of a released lease, the server sets + its internal state to FREE*. Client identification SHOULD appear. + + FREE* -- Once a lease is expired or released, its state becomes + FREE*. Depending on which algorithm and which pool was used to + allocate a given lease, FREE* may either mean FREE or FREE_BACKUP. + Implementations do not have to implement this FREE* state, but may + choose to switch to the destination state directly. For clarity + of representation, this transitional FREE* state is treated as a + separate state. + + FREE -- is used when a DHCP server needs to communicate that a + resource is unused by any client, but it was not just released, + expired or reset by a network administrator. When the partner + acks the BNDUPD of a FREE lease, the server marks the lease as + available for assignment by the primary server. Note that on a + secondary server running in PARTNER-DOWN state, after waiting the + MCLT, the resource MAY be allocated to a client by the secondary + server if the proportional algorithm is used. Client identification + MAY appear. + + FREE_BACKUP -- indicates that this resource can be allocated by the + secondary server to a client at any time. Note that on the primary + server running in PARTNER-DOWN state, after waiting the MCLT, the + resource MAY be allocated to a client by the primary server if + the proportional algorithm is used. Client identification MAY + appear. + + ABANDONED -- indicates that a lease is considered unusable by the + DHCP system.
The primary reason for entering such state is + reception of a DECLINE message for said lease. Client + identification MUST NOT appear. + + RESET -- indicates that this resource was previously abandoned, but + was made available by operator command. This is a distinct state + so that the reason that the resource became FREE can be + determined. Client identification MAY appear. + + The lease state machine is presented in Figure 1. Most states + are stationary, i.e. the lease stays in a given state until an external + event triggers a transition to another state. The only transitional + state is FREE*. Once it is reached, the state machine immediately + transitions to either FREE or FREE_BACKUP state. +
+                  +---------+
+   /------------->|  ACTIVE |<--------------\
+   |              +---------+               |
+   |                |  |  |                 |
+   |       /--(8)--/  (3)   \--(9)-\        |
+   |      |            |            |       |
+   |      V            V            V       |
+   |  +-------+   +--------+   +---------+  |
+   |  |EXPIRED|   |RELEASED|   |ABANDONED|  |
+   |  +-------+   +--------+   +---------+  |
+   |      |            |            |       |
+   |      |            |           (10)     |
+   |      |            |            V       |
+   |      |            |       +---------+  |
+   |      |            |       |  RESET  |  |
+   |      |            |       +---------+  |
+   |      |            |            |       |
+   |       \--(4)--\  (4)  /--(4)--/        |
+   |                |  |  |                 |
+  (1)               V  V  V                (2)
+   |              /---------\               |
+   |              |  FREE*  |               |
+   |              \---------/               |
+   |                 |  |                   |
+   |         /-(5)--/    \-(6)-\            |
+   |        |                   |           |
+   |        V                   V           |
+   |    +-------+         +-----------+     |
+   \----|  FREE |<--(7)-->|FREE_BACKUP|-----/
+        +-------+         +-----------+
+
+ Figure 1: Lease State Machine + + Transitions between states are results of the following events: + + 1. Primary server allocates a lease. + + 2. Secondary server allocates a lease. + + 3. Client sends RELEASE and the lease is released. + + 4. Partner acknowledges state change. This transition MAY also + occur if the server is in PARTNER-DOWN state and the MCLT has + passed since the entry in RELEASED, EXPIRED, or RESET states. + + 5. The lease belongs to a pool that is governed by the + proportional allocation, or independent allocation is used and + this lease belongs to the primary server. + + 6.
The lease belongs to a pool that is governed by + independent allocation and the lease belongs to the + secondary server. + + 7. Pool rebalance event occurs (POOLREQ/POOLRSP messages are + exchanged). Addresses (or prefixes) belonging to the primary + server can be assigned to the secondary server pool (transition + from FREE to FREE_BACKUP) or vice versa. + + 8. The lease is expired. + + 9. DECLINE message is received or a lease is deemed unusable for + other reasons. + + 10. An administrative action is taken to recover an abandoned + lease back to usable state. This transition MAY occur due to an + implementation specific handling of an ABANDONED resource. One + possible example of such use is a Neighbor Discovery or ICMP Echo + check of whether the address is still in use. + + A resource that is no longer in use (due to expiration or release) + becomes FREE*. Depending on which allocation algorithm is used, the + resource that is no longer in use returns to the primary (FREE) or + secondary pool (FREE_BACKUP). The conditions for specific + transitions are depicted in Figure 2. +
+   +---------------+---------+-----------+
+   | \   Pool owner|         |           |
+   |  \-------\    | Primary | Secondary |
+   |Algorithm  \   |         |           |
+   +---------------+---------+-----------+
+   | Proportional  |  FREE   |   FREE    |
+   | Independent   |  FREE   |FREE_BACKUP|
+   +---------------+---------+-----------+
+
+ Figure 2: FREE* State Transitions + + TODO: In case of Active-Passive model, while a majority of the + addresses are owned by the primary server, the secondary server will + need a portion of the addresses to serve new clients while operating + in communication-interrupted state and also in partner down state + before it can take over the entire address pool (expiry of MCLT). + The concept of a percentage of pool reserved for secondary should be + described here. 8. Failover Mechanisms This section lays out an overview of the communication between partners and other mechanisms required for failover operation.
As this is a design document, not a protocol specification, high level - ideas are presented without implementation specific details (e.g. - lack of on-wire formats). Implementation details will be specified - in a separate draft. + ideas are presented without implementation specific details (e.g. on- + wire protocol formats). Specific protocol details are out of the + scope of this document, and may be specified in a separate draft. 8.1. Time Skew Partners exchange information about known lease states. To reliably compare a known lease state with an update received from a partner, servers must be able to reliably compare the times stored in the known lease state with the times received in the update. Although a simple approach would be to require both partners to use synchronized - time, e.g. by using NTP, such a service may become unavailable in - some scenarios that failover expects to cover, e.g. network - partition. Therefore a mechanism to measure and track relative time - differences between servers is necessary. To do so, each message - MUST contain FO_TIMESTAMP option that contains the timestamp of the - transmission in the time context of the transmitter. The - transmitting server MUST set this as close to the actual transmission - as possible. The receiving partner MUST store its own timestamp of - reception event as close to the actual reception as possible. The - received timestamp information is then compared with local timestamp. + time, e.g. by using NTP, such a service may not always be available + in some scenarios that failover expects to cover. Therefore a + mechanism to measure and track relative time differences between + servers is necessary. To do so, each message MUST contain + information about the time of the transmission in the time context of + the transmitter. The transmitting server MUST set this as close to + the actual transmission as possible. 
The receiving partner MUST + store its own timestamp of reception as close to the actual reception + as possible. The received timestamp information is then compared + with local timestamp. To account for packet delay variation (jitter), the measured difference is not used directly, but rather the moving average of last TIME_SKEW_PKTS_AVG packets time difference is calculated. This averaged value is referred to as the time skew. Note that the time skew algorithm allows cooperation between clients with completely desynchronized clocks as well as those whose desynchronization itself is not constant. 8.2. Time expression @@ -694,21 +912,21 @@ allow a server to offer the configured lease time to a client. During a lazy update the updating server typically updates its partner with a potential expiration time which is longer than the lease time previously given to the client and which is longer than the lease time that the server has been configured to give a client. This allows that server to give a longer lease time to the client the next time the client renews its lease, since the time that it will give to the client will not exceed the MCLT beyond the potential expiration time acknowledged by its partner. - The fundamental relationship on which much of The correctness of this + The fundamental relationship on which much of the correctness of this protocol depends is that the lease expiration time known to a DHCPv6 client MUST NOT under any circumstances be more than the maximum client lead time (MCLT) greater than the potential expiration time known to a server's partner. The remainder of this section makes the above fundamental relationship more explicit. This protocol requires a DHCPv6 server to deal with several different lease intervals and places specific restrictions on their @@ -771,21 +990,21 @@ send a BNDACK in response to a BNDUPD message until it is sure that the information in the BNDUPD message has been updated in its lease database. 
Thus, the primary server in this case can be sure that the secondary server has recorded the potential lease interval in its stable storage when the primary server receives a BNDACK message from the secondary server. When the DHCPv6 client attempts to renew at T1 (approximately one half an hour from the start of the lease), the primary server again determines the desired valid lifetime, which is still 3 days. It - then compares this with the remaining acknowledged potential valid + then compares this with the original acknowledged potential valid lifetime (3 days + 1/2 hour) and adjusts for the time passed since the secondary was last updated (1/2 hour). Thus the time remaining of the acknowledged potential valid interval is 3 days. Adding the MCLT to this yields 3 days plus 1 hour, which is more than the desired valid lifetime of 3 days. So the client is renewed for the desired valid lifetime -- 3 days. When the primary DHCPv6 server updates the secondary DHCPv6 server after the DHCPv6 client's renewal REPLY is complete, it will calculate the desired potential valid lifetime as the T1 fraction of @@ -805,99 +1024,224 @@ MCLT allows full recovery from a variety of failures. 8.5. Unreachability detection Each partner maintains an FO_SEND timer for each partner connection. The FO_SEND timer is reset every time any message is transmitted. If the timer reaches the FO_SEND_MAX value, a CONTACT message is transmitted and timer is reset. The CONTACT message may be transmitted at any time. - Discussion: Perhaps it would be more reasonable to use echo-reply - approach, rather than periodic transmissions? - 8.6. Re-allocating Leases TODO: Describe controlled re-allocation of released/expired leases to different clients. -8.7. Sending Data +8.7. Sending Binding Update + + This and the following section is written as though every BNDUPD + message contains only a single binding update transaction in order to + reduce the complexity of the discussion. 
Note that while a server + MAY generate BNDUPD messages with multiple binding update + transactions, every server MUST be able to process a BNDUPD message + which contains multiple binding update transactions and generate the + corresponding BNDACK messages with status for multiple binding update + transactions. Each server updates its failover partner about recent changes in - lease states. Each update must include following information: + lease states. Each update MUST include at least the following + information: - 1. resource type - non-temporary address or a prefix + 1. resource type - non-temporary address or a prefix. Resource + type can be indicated by the container that conveys the actual + resource (e.g. an IA_NA option indicates a non-temporary IPv6 + address). - 2. resource information - actual address or prefix + 2. resource information - the actual address or prefix. That is + conveyed using the appropriate option, e.g. an IAADDR for an + address or an IAPREFIX for a prefix. 3. valid life time requested by client - 4. IAID - Identity Association used by client, while obtaining this - lease. (Note1: one client may use many IAID simultaneously. - Note2: IAID for IA, TA and PD are orthogonal number spaces.) + 4. valid life time sent to client - 5. valid life time sent to client + 5. IAID - Identity Association used by the client, while obtaining + a given lease. (Note1: one client may use many IAIDs + simultaneously. Note2: IAID for IA, TA and PD are orthogonal + number spaces.) - 6. potential valid life time + 6. Next Expected Client Transmission - time interval since Client + Last Transaction Time, when a response from a client is + expected. - 7. preferred life time sent to client + 7. potential valid life time - a lifetime that the server would be + willing to set if there were no MCLT/failover restrictions + imposed. - 8. CLTT - Client Last Transaction Time, a timestamp of the last + 8.
preferred life time sent to client - the actual value sent back + to the client + + 9. CLTT - Client Last Transaction Time, a timestamp of the last received transmission from a client - 9. assigned FQDN names, if any (optional) + 10. Client DUID - Discussion: Do we need T1 as well? Something like next expected - client transmission? + The BNDUPD message MAY contain additional information related to the + updated lease. The additional information MAY include, but is not + limited to: - Q: Maybe we could reuse IA_NA and IA_PD options here? Yes. + 1. assigned FQDN name, defined in [RFC4704] - Q: Do we care about preferred lifetime? (presumably no). Certainly - not what was requested by the client. + 2. Options Requested by the client, i.e. content of the ORO - Q: Do we care about IAID? (presumably yes) Yes. + 3. Remote-ID, defined in [RFC4649] -8.7.1. Required Data + 4. Relay-ID, defined in [RFC5460], section 5.4.1 -8.7.2. Optional Data + 5. Link-layer address + [I-D.ietf-dhc-dhcpv6-client-link-layer-addr-opt] -8.8. Receiving Data + 6. Any other options the updating partner deems useful. -8.8.1. Conflict Resolution + The receiving partner MAY store the received additional information, but it + MAY choose to ignore it as well. Some information may be useful, + so it is a good idea to keep or update it. One reason is FQDN + information. A server SHOULD be prepared to clean up DNS information + once the lease expires or is released. Another reason the partner + may be interested in keeping additional data is better support for + leasequery [RFC5007] or bulk leasequery [RFC5460], which feature + queries by Relay-ID, by link address and by Remote-ID. - TODO: This is just a loose collection of notes. This section will - probably need to be rewritten as a flowchart of some kind. +8.8.
Receiving Binding Update + + When a server receives a BNDUPD message, it needs to decide how to + process the binding update transaction it contains and whether that + transaction represents a conflict of any sort. The conflict + resolution process MUST be used on the receipt of every BNDUPD + message, not just those that are received while in POTENTIAL-CONFLICT + state, in order to increase the robustness of the protocol. + + There are three sorts of conflicts: + + 1. Two clients, one resource - This is the duplicate resource + allocation conflict. There are two different clients, each allocated + the same resource. See Section 8.9. + + 2. Two resources, one client conflict - This conflict exists when a + client on one server is associated with one resource, and on + the other server with a different resource in the same or related + subnet. This does not refer to the case where a single client + has resources in multiple different subnets or administrative + domains, but rather the case where on the same subnet the client + has a lease on one IP address in one server and on a different IP + address on the other server. + This conflict may or may not be a problem for a given DHCP server + implementation and policy. If implementations and policies + allow, both resources can be assigned to a given client. In the + event that a DHCP server requires that a DHCP client have only + one outstanding lease of a given type, the conflict MUST be + resolved by accepting the lease which has the latest CLTT. + + 3. binding-status conflict - This is the normal conflict, where one + server is updating the other with newer information. See + Section 8.9 for details of how to resolve these conflicts. + +8.9. Conflict Resolution The server receiving a lease update from its partner must evaluate the received lease information to see if it is consistent with - already known state and decide which information - previously known - or just received - is "better".
The server should take into - consideration the following aspects: if the lease is already assigned - to specific client, who had contact with client recently, start time - of the lease, etc. + already known state and decide which information - the previously + known or that just received - is "better". The server should take + into consideration the following aspects: if the lease is already + assigned to a specific client, who had contact with the client recently, + start time of the lease, etc. + + When analyzing a BNDUPD message from a partner server, if there is + insufficient information in the BNDUPD to process it, then reject the + BNDUPD with reject-reason 3: "Missing binding information". + + If the resource in the BNDUPD is not a resource associated with the + failover endpoint which received the BNDUPD message, then reject it + with reject-reason 1: "Illegal IP address (not part of any address + pool)". + + Every BNDUPD message SHOULD contain a client-last-transaction-time + option, which MUST, if it appears, be the time that the server last + interacted with the DHCP client. It MUST NOT be, for instance, the + time that the lease on an IP address expired. If there has been no + interaction with the DHCP client in question (or there is no DHCP + client presently associated with this resource), then there will be + no client-last-transaction-time option in the BNDUPD message. + + The table in Figure 3 presents the conflict resolution outcome. To + "accept" a BNDUPD means to update the server's bindings database with + the information contained in the BNDUPD and, once the update is + complete, send a BNDACK message corresponding to the BNDUPD message. + To "reject" a BNDUPD means to leave the server's binding database + unchanged and to respond to the BNDUPD with a BNDACK that includes a + reject-reason option.
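The two rejection checks that come before any state comparison can be sketched as follows. This is a minimal illustration, not part of the design: the `BindingUpdate` type, its field names, the `precheck` function, and the choice of which missing fields count as "insufficient information" are all assumptions made for the example. Only the two reject-reason codes and their meanings come from the text.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address, IPv6Network
from typing import List, Optional

# Reject-reason codes quoted in the text.
REJECT_ILLEGAL_ADDRESS = 1  # "Illegal IP address (not part of any address pool)"
REJECT_MISSING_BINDING = 3  # "Missing binding information"


@dataclass
class BindingUpdate:
    """Hypothetical in-memory form of a received BNDUPD (names are assumptions)."""
    resource: Optional[IPv6Address]
    client_duid: Optional[bytes]
    binding_status: Optional[str]
    cltt: Optional[int] = None  # client-last-transaction-time; may be absent


def precheck(update: BindingUpdate, pools: List[IPv6Network]) -> Optional[int]:
    """Return a reject-reason code, or None if conflict resolution may proceed."""
    # Assumption: an update without a resource or a binding-status
    # cannot be processed at all.
    if update.resource is None or update.binding_status is None:
        return REJECT_MISSING_BINDING
    # The resource must belong to a pool of the receiving failover endpoint.
    if not any(update.resource in pool for pool in pools):
        return REJECT_ILLEGAL_ADDRESS
    return None
```

A `None` result only means the update is well-formed enough to be compared against the receiving server's own binding; it does not by itself mean the update is accepted.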
+ + When interpreting the information in the following table (Figure 3), + for those rules that are listed with "time" -- if a BNDUPD doesn't + have a client-last-transaction-time value, then it MUST NOT be + considered later than the client-last-transaction-time in the + receiving server's binding. If the BNDUPD contains a client-last- + transaction-time value and the receiving server's binding does not, + then the client-last-transaction-time value in the BNDUPD MUST be + considered later than the server's.
+
+                      binding-status in received BNDUPD.
+   binding-status
+   in receiving                                    FREE         RESET
+   server        ACTIVE     EXPIRED    RELEASED    FREE_BACKUP  ABANDONED
+
+   ACTIVE        accept(5)  time(2)    time(1)     time(2)      accept
+   EXPIRED       time(1)    accept     accept      accept       accept
+   RELEASED      time(1)    time(1)    accept      accept       accept
+   FREE/BACKUP   accept     accept     accept      accept       accept
+   RESET         time(3)    accept     accept      accept       accept
+   ABANDONED     reject(4)  reject(4)  reject(4)   reject(4)    accept
+
+                    Figure 3: Conflict Resolution
+ + time(1): If the client-last-transaction-time in the BNDUPD is later + than the client-last-transaction-time in the receiving server's + binding, accept it, else reject it. + + time(2): If the current time is later than the receiving server's + lease-expiration-time, accept it, else reject it. + + time(3): If the client-last-transaction-time in the BNDUPD is later + than the start-time-of-state in the receiving server's binding, + accept it, else reject it. + + (1,2,3): If rejecting, use reject reason 15: "Outdated binding + information". + + (4): Use reject reason 16: "Less critical binding information". + + (5): If the clients in a BNDUPD message and in a receiving server's + binding differ, then if the receiving server is a secondary accept + it, else reject it with a reject reason of 2: "Fatal conflict exists: +
Rejection SHOULD NOT change the flag in a lease that says that it should be transmitted to the failover partner. If this flag is set, then it should be transmitted, but if it is not already set, the rejection of a lease state update SHOULD NOT trigger an automatic update of the failover partner sending the rejected update. The potential for update storms is too great, and in the unusual case where the servers simply can't agree, that disagreement is better than an update storm. - Discussion: There will definitely be different types of update - rejections. For example, this will allow a server to treat - differently a case when receiving a new lease that it previously - haven't seen than a case when partner sents old version of a lease - for which a newer state is known. - -8.8.2. Acknowledging Reception +8.10. Acknowledging Reception 9. Endpoint States 9.1. State Machine Operation Each server (or, more accurately, failover endpoint) can take on a variety of failover states. These states play a crucial role in determining the actions that a server will perform when processing a request from a DHCPv6 client as well as dealing with changing external conditions (e.g., loss of connection to a failover partner). @@ -916,45 +1260,49 @@ A server will transition from one failover state to another based on the specific values held by the following state variables: o Current failover state. o Communications status (OK or not OK). o Partner's failover state (if known). - Whenever the either of the last two of the above state variables - changes state, the state machine is invoked, which may then trigger a - change in the current failove state. Thus, whenever the - communications status changes, the state machine is processing is - invoked. This may or may not result in a change in the current - failover state. + Several events can cause the transition from one failover state to + another. + + o Change in communications status (OK or not OK). 
+ + o Change in partner's failover state. + + o Receipt of particular messages. + + o Expiration of timers. + + Whenever either of the last two of the above state variables changes + state, the state machine is invoked, which may then trigger a change + in the current failover state. Thus, whenever the communications + status changes, the state machine processing is invoked. This may or + may not result in a change in the current failover state. Whenever a server transitions to a new failover state, the new state MUST be communicated to its failover partner in a STATE message if the communications status is OK. In addition, whenever a server makes a transition into a new state, it MUST record the new state, its current understanding of its partner's state, and the time at which it entered the new state in stable storage. The following state transition diagram gives a condensed view of the state machine. If there is a difference between the words describing a particular state and the diagram below, the words should be considered authoritative. - A transition into SHUTDOWN or PAUSED state is not represented in the - following figure, since other than sending that state to its partner, - the remaining actions involved look just like the server halting in - its otherwise current state, which then becomes the previous state - upon server restart. - +---------------+ V +--------------+ | RECOVER -|+| | | STARTUP - | |(unresponsive) | +->+(unresponsive)| +------+--------+ +--------------+ +-Comm. OK +-----------------+ | Other State: | PARTNER DOWN - +<----------------------+ | RESOLUTION-INTER. | (responsive) | ^ All POTENTIAL- +----+------------+ | Others CONFLICT------------ | --------+ | | CONFLICT-DONE Comm. OK | +--------------+ | @@ -973,60 +1321,92 @@ +------+--------+ Other +>+----+--------++ resolve Comm. | Comm. OK State: | | ^ conflict Changed | +---Other State:-+ RECOVER | Secondary | V V | | | | | DONE | resolve | ++----------+---++ | | All Others: POTENT.
| | conflict | |CONFLICT-DONE-|+| | | Wait for CONFLICT- | ----+ see (9.10) | | (responsive) | | | Other State: V V | +------+---------+ | | NORMAL or RECOVER ++------------+---+ Other State: NORMAL | | | DONE | NORMAL + +<--------------+ | | +--+----------+-->+ (balanced) +-------External Command--->+ - | ^ ^ +--------+--------+ or Other State: | - | | | | | SHUTDOWN | - | Wait for Comm. OK Comm. Failed or | | - | Other Other Other State: PAUSED | External + | ^ ^ +--------+--------+ | + | | | | | | + | Wait for Comm. OK Comm. Failed | | + | Other Other | External | State: State: | | Command | RECOVER-DONE NORMAL Start Safe Comm. OK or | | COMM. INT. Period Timer Other State: Safe | Comm. OK. | V All Others Period | Other State: | +---------+--------+ | expiration | RECOVER +--+ COMMUNICATIONS - +----+ | | +-------------+ INTERRUPTED | | RECOVER | (responsive) +-------------------------->+ RECOVER-WAIT--------->+------------------+ - Figure 1: Failover Endpoint State Machine + Figure 4: Failover Endpoint State Machine 9.2. State Machine Initialization - TODO + The state machine is characterized by storage (in stable storage) of + at least the following information: + + o Current failover state. + + o Previous failover state. + + o Start time of current failover state. + + o Partner's failover state. + + o Start time of partner's failover state. + + o Time most recent packet received from partner. + + The state machine is initialized by reading these data items from + stable storage and restoring their values from the information saved. + If there is no information in stable storage concerning these items, + then they should be initialized as follows: + + o Current failover state: Primary: PARTNER-DOWN, Secondary: RECOVER + + o Previous failover state: None. + + o Start time of current failover state: Current time. + + o Partner's failover state: None until reception of STATE message. 
+ + o Start time of partner's failover state: None until reception of + STATE message. + + o Time most recent packet received from partner: None until packet + received. 9.3. STARTUP State The STARTUP state affords an opportunity for a server to probe its partner server, before starting to service DHCP clients. When in the STARTUP state, a server attempts to learn its partner's state and determine (using that information if it is available) what state it should enter. The STARTUP state is not shown with any specific state transitions in - the state machine diagram (Figure 1) because the processing during + the state machine diagram (Figure 4) because the processing during the STARTUP state can cause the server to transition to any of the other states, so that specific state transition arcs would only obscure other information. 9.3.1. Operation in STARTUP State The server MUST NOT be responsive in STARTUP state. Whenever a STATE message is sent to the partner while in STARTUP - state the STARTUP flag MUST be set the message and the previously + state the STARTUP flag MUST be set in the message and the previously recorded failover state MUST be placed in the server-state option. 9.3.2. Transition Out of STARTUP State The following algorithm is followed every time the server initializes itself, and enters STARTUP state. Step 1: If there is any record in stable storage of a previous failover state @@ -1048,21 +1428,21 @@ move directly into PARTNER-DOWN state after the startup period expires if it has been unable to contact its partner during the startup period. Step 2: If the previous state is one where communications was "OK", then set the previous state to the state that is the result of the communications failed state transition (if such transition exists -- some states don't have a communications failed state transition, - since they allow both commun- ications OK and failed). + since they allow both communications OK and failed). 
Step 3: Start the STARTUP state timer. The time that a server remains in the STARTUP state (absent any communications with its partner) is implementation dependent but SHOULD be short. It SHOULD be long enough for a TCP connection to be created to a heavily loaded partner across a slow network. Step 4: @@ -1083,21 +1463,21 @@ time at which it entered PARTNER-DOWN state is earlier than the last recorded time of operation of this server, then set CURRENT-STATE to POTENTIAL-CONFLICT. Then, transition to the current state and take the "communications OK" state transition based on the current state of this server and the partner. Step 6: - If the startup time expires the server SHOULD go transition to the + If the startup time expires the server SHOULD transition to the PREVIOUS-STATE. 9.4. PARTNER-DOWN State PARTNER-DOWN state is a state either server can enter. When in this state, the server assumes that it is the only server operating and serving the client base. If one server is in PARTNER-DOWN state, the other server MUST NOT be operating. 9.4.1. Operation in PARTNER-DOWN State @@ -1120,62 +1500,62 @@ DHCP client different from that to which it was allocated at the entrance to PARTNER-DOWN state until the maximum-client-lead-time beyond the maximum of the following times: client expiration time, most recently transmitted potential-expiration-time, most recently received ack of potential-expiration-time from the partner, and most recently acked potential-expiration-time to the partner. If this time would be earlier than the current time plus the maximum-client- lead-time, then the time the server entered PARTNER-DOWN state plus the maximum-client-lead-time is used. - The server is not restricted by the MCLT when offering lease tmes + The server is not restricted by the MCLT when offering lease times while in PARTNER-DOWN state. In the unlikely case, when there are two servers operating in a - PARTNER-DOWN state, there is a change od duplicate leases assigned. 
+ PARTNER-DOWN state, there is a chance of duplicate leases assigned. This leads to a POTENTIAL-CONFLICT (unresponsive) state when they re- establish contact. The duplicate lease issue can be postponed to a - large extent by the server giving new leases from its own pool. - Therefore the server operating in PARTNER-DOWN state MUST use its own - pool first for new leases before assigning any leases from its downed - partner pool. + large extent by the server granting new leases first from its own + pool. Therefore the server operating in PARTNER-DOWN state MUST use + its own pool first for new leases before assigning any leases from + its downed partner pool. 9.4.2. Transition Out of PARTNER-DOWN State When a server in PARTNER-DOWN state succeeds in establishing a con- nection to its partner, its actions are conditional on the state and flags received in the STATE message from the other server as part of the process of establishing the connection. If the STARTUP bit is set in the server-flags option of a received STATE message, a server in PARTNER-DOWN state MUST NOT take any state transitions based on reestablishing communications. Essentially, if a server is in PARTNER-DOWN state, it ignores all STATE messages from its partner that have the STARTUP bit set in the server-flags option - of the STATE message. THIS NEEDS TO BE MOVED + of the STATE message. 
If the STARTUP bit is not set in the server-flags option of a STATE message received from its partner, then a server in PARTNER-DOWN state takes the following actions based on the state of the partner as received in a STATE message (either immediately after establishing communications or at any time later when a new state is received) If the partner is in: NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN, POTENTIAL-CONFLICT, RESOLUTION-INTERRUPTED, or CONFLICT-DONE state transition to POTENTIAL-CONFLICT state If the partner is in: - RECOVER, RECOVER-WAIT, SHUTDOWN, PAUSED state + RECOVER, RECOVER-WAIT state stay in PARTNER-DOWN state If the partner is in: RECOVER-DONE state transition into NORMAL state 9.5. RECOVER State @@ -1247,21 +1627,21 @@ RECOVER-DONE | | | | >--STATE-(RECOVER-DONE)------> | | NORMAL | <-------------(NORMAL)-STATE--< | NORMAL | | >---- State-(NORMAL)---------------> | | | | | - Figure 2: Transition out of RECOVER state + Figure 5: Transition out of RECOVER state If, at any time while a server is in RECOVER state communications fails, the server will stay in RECOVER state. When communications are restored, it will restart the process of transitioning out of RECOVER state. 9.6. RECOVER-WAIT State This state indicates that the server has done an UPDREQ or UPDREQALL and has received the UPDDONE message indicating that it has received @@ -1397,23 +1777,23 @@ Section 8.5), then transition into COMMUNICATIONS-INTERRUPTED state. If a server in NORMAL state receives any messages from its partner where the partner has changed state from that expected by the server in NORMAL state, then the server should transition into COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran- sition from there. For example, it would be expected for the partner to transition from POTENTIAL-CONFLICT into NORMAL state, but not for the partner to transition from NORMAL into POTENTIAL-CONFLICT state. 
- If a server in NORMAL state receives any messages from its partner - where the PARTNER has changed into SHUTDOWN state, the server should - transition into PARTNER-DOWN state. + If a server in NORMAL state receives a DISCONNECT message from its + partner, the server should transition into COMMUNICATIONS-INTERRUPTED + state. 9.9. COMMUNICATIONS-INTERRUPTED State A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is unable to communicate with its partner. Primary and secondary servers cycle automatically (without administrative intervention) between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network connection between them fails and recovers, or as the partner server cycles between operational and non-operational. No duplicate IP address allocation can occur while the servers cycle between these @@ -1425,31 +1805,31 @@ has been configured, see section 10), then a timer MUST be started for the length of the configured safe period. A server transitioning into the COMMUNICATIONS-INTERRUPTED state from the NORMAL state SHOULD raise some alarm condition to alert administrative staff to a potential problem in the DHCP subsystem. 9.9.1. Operation in COMMUNICATIONS-INTERRUPTED State In this state a server MUST respond to all DHCP client requests. - When allocating new lease, each server allocates from its own pool, + When allocating new leases, each server allocates from its own pool, where the primary MUST allocate only FREE resources (addresses or - prefixes), and the secondary MUST allocate only BACKUP resources + prefixes), and the secondary MUST allocate only FREE_BACKUP resources (addresses or prefixes). 
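The allocation rule for COMMUNICATIONS-INTERRUPTED state can be sketched as a simple filter over a server's bindings. This is an illustrative sketch only: the `Role` enum, the `allocatable_resources` function, and the dictionary that maps a resource to its binding-status are assumptions made for the example, not names defined by this document.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


# Binding-status each role may hand out to new clients in this state:
# the primary allocates only FREE, the secondary only FREE_BACKUP.
ALLOCATABLE_STATE = {Role.PRIMARY: "FREE", Role.SECONDARY: "FREE_BACKUP"}


def allocatable_resources(role: Role, bindings: Dict[str, str]) -> List[str]:
    """Return the resources this server may allocate, in sorted order.

    `bindings` is a hypothetical map from resource (address or prefix,
    as a string) to its current binding-status.
    """
    wanted = ALLOCATABLE_STATE[role]
    return sorted(res for res, state in bindings.items() if state == wanted)
```

Keeping the role-to-status mapping in one table makes the rule easy to audit against the text: neither server ever draws from the other's share while communications are down.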
When responding to RENEW messages, each server will allow continued renewal of a DHCP client's current lease on an IP address or prefix irrespective of whether that lease was given out by the receiving server or not, although the renewal period MUST NOT exceed the maximum client lead time (MCLT) beyond the latest of: 1) the potential valid lifetime already acknowledged by the other - server, or 2) the lease- expiration-time , or 3) the potential valid - lifetime received from the partner server. + server, or 2) the actual valid lifetime sent to the DHCPv6 client, or + 3) the potential valid lifetime received from the partner server. However, since the server cannot communicate with its partner in this state, the acknowledged potential valid lifetime will not be updated in any new bindings. This is likely to eventually cause the actual valid lifetimes to be the current time plus the MCLT (unless this is greater than the desired-client-lease- time). The server should continue to try to establish a connection with its partner. @@ -1470,22 +1850,20 @@ o NORMAL or COMMUNICATIONS-INTERRUPTED: Transition into the NORMAL state. o RECOVER: Stay in COMMUNICATIONS-INTERRUPTED state. o RECOVER-DONE: Transition into NORMAL state. o PARTNER-DOWN, POTENTIAL-CONFLICT, CONFLICT-DONE, or RESOLUTION- INTERRUPTED: Transition into POTENTIAL-CONFLICT state. - o SHUTDOWN: Transition into PARTNER-DOWN state. - The following figure illustrates the transition from NORMAL to COMMUNICATIONS-INTERRUPTED state and then back to NORMAL state again. Primary Secondary Server Server NORMAL NORMAL | >--CONTACT-------------------> | | <--------------------CONTACT--< | | [TCP connection broken] | @@ -1503,32 +1881,32 @@ | | >--BNDUPD--------------------> | | <---------------------BNDACK--< | | | | <---------------------BNDUPD--< | | >------BNDACK----------------> | ... ... 
| | | <--------------------POOLREQ--< | | >--POOLRESP-(2)--------------> | - t> | | + | | | >--BNDUPD-(#1)---------------> | | <---------------------BNDACK--< | | | | <--------------------POOLREQ--< | | >--POOLRESP-(0)--------------> | | | | >--BNDUPD-(#2)---------------> | | <---------------------BNDACK--< | | | - Figure 3: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED and + Figure 6: Transition from NORMAL to COMMUNICATIONS-INTERRUPTED and back (example with 2 addresses allocated to secondary) 9.10. POTENTIAL-CONFLICT State This state indicates that the two servers are attempting to reintegrate with each other, but at least one of them was running in a state that did not guarantee automatic reintegration would be possible. In POTENTIAL-CONFLICT state the servers may determine that the same resource has been offered and accepted by two different clients. @@ -1596,21 +1974,21 @@ | >--UPDDONE-------------------> | | NORMAL | <------------STATE--(NORMAL)--< | NORMAL | | >--STATE--(NORMAL)-----------> | | | | <--------------------POOLREQ--< | | >------POOLRESP-(n)----------> | | addresses | - Figure 4: Transition out of POTENTIAL-CONFLICT + Figure 7: Transition out of POTENTIAL-CONFLICT 9.11. RESOLUTION-INTERRUPTED State This state indicates that the two servers were attempting to reintegrate with each other in POTENTIAL-CONFLICT state, but communications failed prior to completion of re-integration. If the servers remained in POTENTIAL-CONFLICT while communications was interrupted, neither server would be responsive to DHCP client requests, and if one server had crashed, then there might be no @@ -1621,21 +1999,21 @@ DHCP subsystem. 9.11.1. Operation in RESOLUTION-INTERRUPTED State In this state a server MUST respond to all DHCP client requests. 
When allocating new resources (addresses or prefixes), each server SHOULD allocate from its own pool (if that can be determined), where the primary SHOULD allocate only FREE resources, and the secondary SHOULD allocate only BACKUP resources. When responding to renewal requests, each server will allow continued renewal of a DHCP client's - current lease irrespective of whether that lease was given out by the + current lease independent of whether that lease was given out by the receiving server or not, although the renewal period MUST NOT exceed the maximum client lead time (MCLT) beyond the latest of: 1) the potential valid lifetime already acknowledged by the other server or 2) the lease-expiration-time or 3) potential valid lifetime received from the partner server. However, since the server cannot communicate with its partner in this state, the acknowledged potential valid lifetime will not be updated in any new bindings. @@ -1675,86 +2053,20 @@ 9.12.2. Transition Out of CONFLICT-DONE State If communications fails with the partner while in CONFLICT-DONE state, then the server will remain in CONFLICT-DONE state. When a primary server determines that the secondary server has made a transition into NORMAL state, the primary server will also transition into NORMAL state. -9.13. PAUSED State - - TODO: Remove PAUSED state completely - - This state exists to allow one server to inform another that it will - be out of service for what is predicted to be a relatively short - time, and to allow the other server to transition to COMMUNICATIONS- - INTERRUPTED state immediately and to begin servicing all DHCP clients - with no interruption in service to new DHCP clients. - - A server which is aware that it is shutting down temporarily SHOULD - send a STATE message with the server-state option containing PAUSED - state and close the TCP connection. 
- - While a server may or may not transition internally into PAUSED - state, the 'previous' state determined when it is restarted MUST be - the state the server was in prior to receiving the command to shut- - down and restart and which precedes its entry into the PAUSED state. - See Section 9.3.2 concerning the use of the previous state upon - server restart. - - When entering PAUSED state, the server MUST store the previous state - in stable storage, and use that state as the previous state when it - is restarted. - -9.13.1. Operation in PAUSED State - - Server MUST NOT perform any operation while in PAUSED state. - -9.13.2. Transition Out of PAUSED State - - A server makes a transition out of PAUSED state by being restarted. - At that time, the previous state MUST be the state the server was in - prior to entering the PAUSED state. - -9.14. SHUTDOWN State - - This state exists to allow one server to inform another that it will - be out of service for what is predicted to be a relatively long time, - and to allow the other server to transition immediately to PARTNER- - DOWN state, and take over completely for the server going down. - - When entering SHUTDOWN state, the server MUST record the previous - state in stable storage for use when the server is restarted. It - also MUST record the current time as the last time operational. - - A server which is aware that it is shutting down SHOULD send a STATE - message with the server-state field containing SHUTDOWN. - -9.14.1. Operation in SHUTDOWN State - - A server in SHUTDOWN state MUST NOT respond to any DHCP client input. - - If a server receives any message indicating that the partner has - moved to PARTNER-DOWN state while it is in SHUTDOWN state then it - MUST record RECOVER state as the previous state to be used when it is - restarted. 
- - A server SHOULD wait for a few seconds after informing the partner of - entry into SHUTDOWN state (if communications are okay) to determine - if the partner entered PARTNER-DOWN state. - -9.14.2. Transition Out of SHUTDOWN State - - A server makes a transition out of SHUTDOWN state by being restarted. - 10. Proposed extensions The following section discusses possible extensions to the proposed failover mechanism. Listed extensions must be sufficiently simple to not further complicate failover protocol. Any proposals that are considered complex will be defined as stand-alone extensions in separate documents. 10.1. Active-active mode @@ -1780,22 +2092,23 @@ equal value could theoretically work as a crude attempt to provide load balancing. It wouldn't do much good on its own, as one (faster) server could be chosen more frequently (assuming that with equal preference sets clients will pick first responding server, which is not mandated by DHCPv6). We could design a simple mechanism of dynamically updating preference depending on usage of available resources. This concept hasn't been investigated in detail yet. 11. Dynamic DNS Considerations - TODO: Descibe DNS Updates challenges in failover environment. It is - nicely described in Section 5.12 of [dhcpv4-failover]. + TODO: Describe DNS Update [RFC2136] challenges in failover + environment. It is nicely described in Section 5.12 of + [dhcpv4-failover]. 12. Reservations and failover TODO: Describe how lease reservation works with failover. See Section 5.13 in [dhcpv4-failover]. 13. Protocol entities Discussion: It is unclear if following sections belong to design or protocol draft. It is currently kept here as a scratchbook with list @@ -1853,58 +2166,64 @@ involvement and contributions. Authors would like to thank VithalPrasad Gaitonde for his insightful comments. 
This work has been partially supported by Department of Computer Communications (a division of Gdansk University of Technology) and the Polish Ministry of Science and Higher Education under the European Regional Development Fund, Grant No. POIG.01.01.02-00-045/ 09-00 (Future Internet Engineering Project). 18. References - 18.1. Normative References + [I-D.ietf-dhc-dhcpv6-client-link-layer-addr-opt] + Halwasia, G., Systems, C., and W. Dec, "Client Link-layer + Address Option in DHCPv6", + draft-ietf-dhc-dhcpv6-client-link-layer-addr-opt-01 (work + in progress), August 2012. + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. + [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, + "Dynamic Updates in the Domain Name System (DNS UPDATE)", + RFC 2136, April 1997. + [RFC3315] Droms, R., Bound, J., Volz, B., Lemon, T., Perkins, C., and M. Carney, "Dynamic Host Configuration Protocol for IPv6 (DHCPv6)", RFC 3315, July 2003. [RFC3633] Troan, O. and R. Droms, "IPv6 Prefix Options for Dynamic Host Configuration Protocol (DHCP) version 6", RFC 3633, December 2003. [RFC4704] Volz, B., "The Dynamic Host Configuration Protocol for IPv6 (DHCPv6) Client Fully Qualified Domain Name (FQDN) Option", RFC 4704, October 2006. - [RFC5460] Stapp, M., "DHCPv6 Bulk Leasequery", RFC 5460, - February 2009. - 18.2. Informative References [I-D.ietf-dhc-dhcpv6-failover-requirements] Mrugalski, T. and K. Kinnear, "DHCPv6 Failover Requirements", - draft-ietf-dhc-dhcpv6-failover-requirements-00 (work in - progress), October 2011. + draft-ietf-dhc-dhcpv6-failover-requirements-01 (work in + progress), July 2012. - [I-D.ietf-dhc-dhcpv6-redundancy-consider] - Tremblay, J., Brzozowski, J., Chen, J., and T. Mrugalski, - "DHCPv6 Redundancy Deployment Considerations", - draft-ietf-dhc-dhcpv6-redundancy-consider-02 (work in - progress), October 2011. 
+ [RFC4649] Volz, B., "Dynamic Host Configuration Protocol for IPv6 + (DHCPv6) Relay Agent Remote-ID Option", RFC 4649, + August 2006. - [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound, - "Dynamic Updates in the Domain Name System (DNS UPDATE)", - RFC 2136, April 1997. + [RFC5007] Brzozowski, J., Kinnear, K., Volz, B., and S. Zeng, + "DHCPv6 Leasequery", RFC 5007, September 2007. + + [RFC5460] Stapp, M., "DHCPv6 Bulk Leasequery", RFC 5460, + February 2009. [dhcpv4-failover] Droms, R., Kinnear, K., Stapp, M., Volz, B., Gonczi, S., Rabil, G., Dooley, M., and A. Kapur, "DHCP Failover Protocol", draft-ietf-dhc-failover-12 (work in progress), March 2003. Authors' Addresses Tomasz Mrugalski