draft-ietf-dhc-failover-03.txt   draft-ietf-dhc-failover-04.txt 
Network Working Group Ralph Droms Network Working Group Ralph Droms
INTERNET DRAFT Bucknell University INTERNET DRAFT Bucknell University
Greg Rabil
Mike Dooley
Arun Kapur
Quadritek Systems
Kim Kinnear Kim Kinnear
Mark Stapp Mark Stapp
Cisco Systems Cisco Systems
Steve Gonczi
Bernie Volz Bernie Volz
Steve Gonczi
Process Software Process Software
November 1998 Greg Rabil
Expires June 1999 Mike Dooley
Arun Kapur
Quadritek Systems
June 1999
Expires December 1999
DHCP Failover Protocol DHCP Failover Protocol
<draft-ietf-dhc-failover-03.txt> <draft-ietf-dhc-failover-04.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft and is in full conformance with
documents of the Internet Engineering Task Force (IETF), its areas, all provisions of Section 10 of RFC2026.
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
To view the entire list of current Internet-Drafts, please check the The list of current Internet-Drafts can be accessed at
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow http://www.ietf.org/ietf/1id-abstracts.txt
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern
Europe), ftp.nic.it (Southern Europe), munnari.oz.au (Pacific Rim), The list of Internet-Draft Shadow Directories can be accessed at
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved.
Abstract Abstract
DHCP [RFC 2131] allows for multiple servers to be operating on a DHCP [RFC 2131] allows for multiple servers to be operating on a
single network. Some sites are interested in running multiple servers single network. Some sites are interested in running multiple servers
in such a way so as to provide redundancy in case of server failure. in such a way so as to provide redundancy in case of server failure.
In order for this to work reliably, the cooperating primary and In order for this to work reliably, the cooperating primary and
DRAFT November 1998
secondary servers must maintain a consistent database of the lease secondary servers must maintain a consistent database of the lease
information. This implies that servers will need to coordinate any information. This implies that servers will need to coordinate any
and all lease activity so that this information is synchronized in and all lease activity so that this information is synchronized in
case of failover. case of failover.
This document defines a protocol to provide this synchronization This document defines a protocol to provide this synchronization
between two servers. One server is designated the "Primary" server, between two servers. One server is designated the "primary" server,
the other is the "Secondary" server. Additionally, this document the other is the "secondary" server. Additionally, this document
describes a protocol for the automatic transfer of control from the describes a protocol which allows each server to determine to which
primary to the secondary in the case of failure (failover), as well DHCP clients it should provide service when both servers are
as a network partition. operating in order to support load balancing as well as when one
server has failed in order to support increased DHCP service
availability.
This document further develops the concepts presented in draft-ietf- This document is a complete rewrite of draft-ietf-dhc-failover-
dhc-failover-02.txt. 03.txt. That earlier draft described a UDP based failover protocol,
and this draft describes a closely related protocol which uses TCP as
a transport and includes new load-balancing and security
capabilities.
Table of Contents
1. Introduction................................................. 4
2. Terminology.................................................. 5
2.1. Requirements terminology................................... 5
2.2. DHCP and failover terminology.............................. 5
3. Background and External Requirements......................... 7
3.1. Key aspects of the DHCP protocol........................... 7
3.2. BOOTP relay agent implementation........................... 9
3.3. What does it mean if a server can't communicate with its partner? 10
3.4. Challenging scenarios for a Failover protocol............. 10
3.5. Using TCP to detect partner server failure................ 11
4. Design Goals................................................ 13
4.1. Design requirements for this protocol..................... 13
4.2. Goals for this protocol................................... 13
4.3. Limitations of this Protocol.............................. 14
5. Protocol Overview........................................... 15
5.1. Messages and States....................................... 15
5.2. Fundamental restrictions.................................. 18
5.3. Load balancing............................................ 24
5.4. Operating in NORMAL state................................. 25
5.5. Operating in COMMUNICATIONS-INTERRUPTED state............. 25
5.6. Operating in PARTNER-DOWN state........................... 25
5.7. Operating in RECOVER state................................ 26
6. Packet Formats.............................................. 26
6.1. Common message format..................................... 26
6.2. Common option format...................................... 28
6.3. BNDUPD message format..................................... 40
6.4. BNDACK message format..................................... 42
6.5. Bulking for BNDUPD and BNDACK messages.................... 44
6.6. UPDREQ message format..................................... 44
6.7. UPDREQALL message format.................................. 44
6.8. UPDDONE message format.................................... 44
6.9. POOLREQ message format.................................... 45
6.10. POOLRESP message format.................................. 45
6.11. CONNECT message format................................... 46
6.12. CONNECTACK message format................................ 46
6.13. STATE message format..................................... 47
6.14. CONTACT message format................................... 48
7. Protocol Messages........................................... 48
7.1. BNDUPD message............................................ 48
7.2. BNDACK message............................................ 57
7.3. UPDREQ message............................................ 58
7.4. UPDREQALL message......................................... 59
7.5. UPDDONE message........................................... 60
7.6. POOLREQ message........................................... 60
7.7. POOLRESP message.......................................... 61
7.8. CONNECT message........................................... 62
7.9. CONNECTACK message........................................ 65
7.10. STATE message............................................ 68
7.11. CONTACT message.......................................... 69
8. Connection Management....................................... 70
8.1. Connection granularity.................................... 70
8.2. Creating the TCP connection............................... 70
8.3. Using the TCP connection for determining communications status. 71
8.4. Using the TCP connection for binding data................. 73
8.5. Using the TCP connection for control messages............. 73
8.6. Losing the TCP connection................................. 73
9. Protocol States............................................. 73
9.1. Server Initialization..................................... 74
9.2. Server State Transitions.................................. 74
9.3. STARTUP state............................................. 77
9.4. PARTNER-DOWN state........................................ 79
9.5. RECOVER state............................................. 81
9.6. NORMAL state.............................................. 83
9.7. COMMUNICATIONS-INTERRUPTED State.......................... 86
9.8. POTENTIAL-CONFLICT state.................................. 89
9.9. RECOVER-DONE state........................................ 90
9.10. PAUSED state............................................. 91
9.11. SHUTDOWN state........................................... 91
10. Safe Period................................................ 92
11. Security................................................... 94
11.1. Simple shared secret..................................... 94
11.2. TLS...................................................... 94
12. Hash algorithm for load balancing.......................... 95
13. Acknowledgments............................................ 96
14. References................................................. 97
15. Author's information....................................... 98
16. Full Copyright Statement................................... 99
1. Introduction 1. Introduction
As the use of DHCP servers in networked environments grows, the DHCP [RFC 2131] allows for multiple servers to be operating on a sin-
dependency of those networks on the DHCP server increases. This is gle network. Some sites are interested in running multiple servers
particularly true of the hosts that receive their configuration in such a way so as to provide redundancy in case of server failure
information from the DHCP server. Therefore, it is very important to since the DHCP subsystem is in many cases a critical part of the net-
be able to provide reliable, continuous availability of DHCP ser- work infrastructure.
vices.
This specification describes a protocol to support automatic failover This document defines a protocol to provide synchronization between
from a primary to its secondary server. The failover mechanism two servers in order that each can take over for the other should
allows the secondary server to perform DHCP actions while the primary either one fail or become unreachable.
is down, or when a network failure prevents the primary and secondary
from communicating. The protocol also specifies how reintegration is
achieved when the primary again becomes operational or when the pri-
mary and secondary can again communicate.
In providing the specification for the failover, the protocol speci- One server is designated the "primary" server, the other is the
fies how to guarantee reliable delivery of binding changes to the "secondary" server, and all DHCP client requests are sent to each
partner server. This is required to synchronize lease data between server.
the primary and the secondary. The protocol further specifies a
mechanism to allow either server to determine if it can communicate
with its partner. The secondary will automatically begin to service
DHCP requests whenever it cannot communicate with the primary. When
the primary server becomes available again, the secondary will convey
any changes that occurred since the time of failover back to the pri-
mary.
Through careful control of the difference between the lease times In order to provide a high availability DHCP service, these
offered to DHCP clients and the lease time known by the secondary cooperating primary and secondary servers must maintain a consistent
server, the protocol allows the primary to communicate with the database of lease information. This implies that servers will need
secondary after the primary has completed communication with the DHCP to coordinate any and all lease activity so that this information is
client (a technique known as "lazy" update) and still guarantee that synchronized in case failover is required. The protocol messages and
processing techniques required to maintain a consistent database are
specified in the protocol described here.
The failover protocol also contains an algorithm which allows each
server to determine to which DHCP clients it should provide service
when both servers are operating normally, and this capability can be
used to support load balancing.
duplicate IP address allocations do not occur. Thus, the protocol 2. Terminology
does not directly impact the ability of a DHCP server to respond to
DHCP client requests.
1.1. Requirements Terminology This section discusses both the generic requirements terminology com-
mon to many IETF protocol specifications as well as specialized DHCP
and failover protocol specific terminology.
2.1. Requirements terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119]. document are to be interpreted as described in RFC 2119 [RFC 2119].
1.2. DHCP Terminology 2.2. DHCP and failover terminology
This document uses the following terms: This document uses the following terms:
o "DHCP client" or "client" o "DHCP client" or "client"
A DHCP client is an Internet host using DHCP to obtain confi- A DHCP client is an Internet host using DHCP to obtain confi-
guration parameters such as a network address. guration parameters such as a network address.
o "DHCP server" or "server" o "DHCP server" or "server"
skipping to change at page 3, line 41 skipping to change at page 5, line 41
o "binding" o "binding"
A binding is a collection of configuration parameters, including A binding is a collection of configuration parameters, including
at least an IP address, associated with or "bound to" a DHCP at least an IP address, associated with or "bound to" a DHCP
client. Bindings are managed by DHCP servers. client. Bindings are managed by DHCP servers.
o "binding database" o "binding database"
The collection of bindings managed by a primary and secondary. The collection of bindings managed by a primary and secondary.
o "failover endpoint"
The failover protocol allows for there to be a unique failover
endpoint per partner per role (where role is primary or secon-
dary). This failover endpoint can take actions and hold unique
states. There are thus a maximum of two failover endpoints per
server per partner (one for each partner as a primary and one
for that same partner as a secondary.)
o "lazy update"
Lazy update refers to the requirement placed on a server imple-
menting a failover protocol to update its failover partner when-
ever the binding database changes. A failover protocol which
didn't support lazy update would require the failover partner
update to be complete before a DHCP server could respond to a
DHCP client request with a DHCPACK. A failover protocol which
does support lazy update places no such restriction on the
update of the failover partner server, and so a server can allo-
cate an IP address or extend a lease on an IP address and then
update its failover partner as time permits. A failover proto-
col which supports lazy update not only removes the requirement
to update the failover partner prior to responding to a DHCP
client with a DHCPACK, but also allows gathering up batches of
updates from one failover server to its partner.
o "subnet address pool" o "subnet address pool"
A subnet address pool is the set of IP addresses which is associated A subnet address pool is the set of IP addresses which is associated
with a particular network number and subnet mask. In the with a particular network number and subnet mask. In the
simple case, there is a single network number and subnet mask simple case, there is a single network number and subnet mask
and a set of IP addresses. In the more complex case (sometimes and a set of IP addresses. In the more complex case (sometimes
called "secondary subnets", sometimes "superscopes"), several called "secondary subnets", sometimes "superscopes"), several
(apparently unrelated) network number and subnet mask combina- (apparently unrelated) network number and subnet mask combina-
tions with their associated IP addresses may all be configured tions with their associated IP addresses may all be configured
together into one subnet address pool. together into one subnet address pool.
o "Primary server" or "Primary" o "Primary server" or "Primary"
A DHCP server configured to provide primary service to a set of A DHCP server configured to provide primary service to a set of
DHCP clients for a particular set of subnet address pools. DHCP clients for a particular set of subnet address pools.
o "Secondary server" or "Secondary" o "Secondary server" or "Secondary"
A DHCP server configured to act as backup to a primary server A DHCP server configured to act as backup to a primary server
for a particular set of subnet address pools. for a particular set of subnet address pools.
o "stable storage" o "stable storage"
Every DHCP server is assumed to have some form of what is called Every DHCP server is assumed to have some form of what is called
"stable storage". Stable storage is used to hold information "stable storage". Stable storage is used to hold information
concerning IP address bindings (among other things) so that this concerning IP address bindings (among other things) so that this
information is not lost in the event of a server failure which information is not lost in the event of a server failure which
requires restart of the server. requires restart of the server.
1.3. Requirements for this protocol o "MCLT"
The following list of goals must be (and are) achieved by this proto- The MCLT refers to maximum client lead time. This time is con-
col. figured on the primary server and transmitted from the primary
to the secondary server in the CONNECT message. It is the max-
imum amount of time that one server can give to a client for a
binding beyond that known and ACKed by the partner server. See
section 5.2.1 for details.
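
The MCLT definition above can be read as a simple bound on the
expiration time a server may give a client. The following is a
minimal sketch of that bound, not the procedure of section 5.2.1.
The names Binding, max_client_expiry, and grant are hypothetical, and
measuring the lead from the current time when the partner has no
current acknowledgment is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Binding:
    # Latest expiration time (seconds) that the partner has ACKed.
    # 0 means the partner has never acknowledged this binding.
    acked_partner_expiry: int = 0


def max_client_expiry(binding, now, mclt):
    """Latest expiration time this server may hand to the client."""
    # Assumption: if the partner's acknowledged expiry is missing or
    # already in the past, the lead is measured from the current time.
    base = max(binding.acked_partner_expiry, now)
    return base + mclt


def grant(binding, now, desired_lease, mclt):
    """Clamp the desired lease so it never leads the partner by more than MCLT."""
    return min(now + desired_lease, max_client_expiry(binding, now, mclt))


# A new client gets at most MCLT; once the partner has ACKed a longer
# time, the full desired lease can be granted.
first = grant(Binding(), now=1000, desired_lease=86400, mclt=3600)
later = grant(Binding(acked_partner_expiry=90000), now=1000,
              desired_lease=86400, mclt=3600)
```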
3. Background and External Requirements
This section highlights key aspects of the DHCP protocol on which the
failover protocol depends. It also discusses the requirements that
the failover protocol places on other aspects of the network infras-
tructure, and some general issues surrounding server failure detec-
tion. Some failure scenarios that provide particular challenges to a
failover protocol are discussed. Finally, the challenges inherent in
using a TCP connection as a means to detect failure of a partner
server are elaborated.
3.1. Key aspects of the DHCP protocol
The failover protocol is designed to augment the DHCP protocol as
described in RFC 2131 [RFC 2131]. There are several key aspects of
the DHCP protocol which are required by the failover protocol in
order to successfully meet its design goals.
3.1.1. Broadcast behavior
There are two aspects of the broadcast behavior of the DHCP protocol
which are key to making the failover protocol operate successfully.
The first is simply that the DHCP protocol requires a DHCP client to
broadcast all DHCPDISCOVER and DHCPREQUEST/INIT-REBOOT messages.
Because of this requirement, a DHCP client that was communicating with
one server will automatically be able to communicate with another
server if one is available.
The second aspect of broadcast behavior is similar to the first, but
involves the distinction between a DHCPREQUEST/RENEW and
DHCPREQUEST/REBINDING. A DHCPREQUEST/RENEW is the message that a
DHCP client uses to extend its lease. It is unicast to the DHCP
server from which it acquired the lease. However, the DHCP protocol
(in a farsighted move) was explicitly designed so that in the event
that a DHCP client cannot contact the server from which it received a
lease on an IP address using a DHCPREQUEST/RENEW, the client is
required to broadcast its renewal using a DHCPREQUEST/REBINDING to
any available DHCP server. Since all DHCP clients were required to
implement this algorithm, the failover protocol can have a different
server from the one that initially granted a lease be the server to
renew a lease. Thus, one server can take over for another with no
interruption in the service as experienced by the DHCP client or its
associated applications software.
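
The RENEW and REBINDING behavior described above can be sketched as a
function that chooses where a client sends its DHCPREQUEST. This is
an illustration only. The function name request_destination is
hypothetical, and the T1 and T2 fractions are the RFC 2131 default
values (0.5 and 0.875 of the lease), which a server may override.

```python
BROADCAST = "255.255.255.255"


def request_destination(elapsed, lease, leasing_server,
                        t1_frac=0.5, t2_frac=0.875):
    """Return (client state, destination address) for a lease of
    'lease' seconds of which 'elapsed' seconds have passed."""
    if elapsed >= lease:
        # The lease has run out; the client must stop using the address.
        return ("EXPIRED", None)
    if elapsed >= t2_frac * lease:
        # The leasing server did not answer: broadcast to any server.
        return ("REBINDING", BROADCAST)
    if elapsed >= t1_frac * lease:
        # Unicast to the server that granted the lease.
        return ("RENEW", leasing_server)
    return ("BOUND", None)
```

Because the REBINDING request is broadcast, a failover partner that
receives it can extend the lease even though it did not grant it.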
3.1.2. Client responsibility
In the DHCP protocol the DHCP clients are entrusted with a consider-
able responsibility. In particular, after they are granted a lease
on an IP address, they are enjoined to only use that IP address while
their lease is valid. Every DHCP client is expected to stop using an
IP address if the expiration time on the lease has passed and if it
cannot get an extension on the lease for that IP address from some
DHCP server. Thus, the correct behavior of every DHCP client in this
regard is required to ensure the integrity of the DHCP service. On
the other hand, incorrect behavior by a client in this area will tend
to adversely affect at most one other DHCP client.
Furthermore, any DHCP client which sends in a DHCPREQUEST/RENEW or
DHCPREQUEST/REBINDING to a DHCP server (either unicast for a RENEW or
broadcast for a REBINDING) MUST still have time to run on the lease
for that IP address. The DHCP server sends the DHCPACK back unicast
to the IP address from which the RENEW or REBINDING originated.
Given the existing responsibility placed on the client to only use an
IP address when the lease is valid, and to only send in a RENEW or
REBINDING if the lease is valid, the failover protocol relies on DHCP
clients to perform responsibly and will, in the absence of conflict-
ing information, believe a DHCP client that is attempting to RENEW or
REBIND a lease on an IP address is the legitimate owner of that IP
address.
One troublesome issue is that of the DHCP client responsibility when
sending in DHCPREQUEST/INIT-REBOOT requests. While the original DHCP
RFC was written to require a DHCP client to have time left to run on
the lease for an IP address if the client is sending an INIT-REBOOT
request, it was sufficiently unclear that some client vendors didn't
realize this until recently. Since the INIT-REBOOT request was sent
with the IP address in the dhcp-requested-address option and not in
the ciaddr (for perfectly good reasons), the similarity to the RENEW
and REBINDING case was lost on many people.
At present, the failover protocol does not assume that a client send-
ing in an INIT-REBOOT request necessarily has a valid lease on the IP
address appearing in the dhcp-requested-address option in the INIT-
REBOOT request.
The implications of this are as follows: Assume that there is a DHCP
client that gets a lease from one server while that server is unable
to communicate with its failover partner. Then, assume that after
that client reboots it is able only to communicate with the other
failover server. If the failover servers have not been able to com-
municate with each other during this process, then the DHCP client
will get a new IP address instead of being able to continue to use
its existing IP address. This will affect no applications on the DHCP
client, since it is rebooting. However, it will use up an additional
IP address in this marginal case.
3.1.3. Stable storage update before DHCPACK
The DHCP protocol allocates resources, and in order to operate
correctly it requires that a DHCP server update some form of stable
storage prior to sending a DHCPACK to a DHCP client in order to grant
that client a lease on an IP address.
One of the goals of the failover protocol is that it not add signifi-
cant additional time to this already time consuming requirement to
update stable storage prior to a DHCPACK. In particular, adding a
requirement to communicate with another server prior to sending a
DHCPACK would simplify the failover protocol, but it would limit the
potential scalability of any DHCP server which employed the failover
protocol in an unacceptable manner.
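
The ordering constraint described above can be sketched as follows:
stable storage is updated first, the DHCPACK is produced second, and
the partner update is only queued for later delivery. The class and
method names are hypothetical, and an in-memory dictionary stands in
for real stable storage.

```python
from collections import deque


class Server:
    def __init__(self):
        self.stable_storage = {}                # stand-in for a durable store
        self.pending_partner_updates = deque()  # updates not yet sent to the partner
        self.log = []                           # records the order of steps

    def handle_request(self, client_id, ip, expiry):
        # 1. Commit the binding before answering the client.
        self.stable_storage[ip] = (client_id, expiry)
        self.log.append("commit")
        # 2. Answer the client without waiting for the partner.
        self.log.append("DHCPACK")
        # 3. Queue the partner update for later delivery.
        self.pending_partner_updates.append((ip, client_id, expiry))
        self.log.append("queue-update")
        return ("DHCPACK", ip, expiry)

    def flush_updates(self, send):
        """Deliver all queued updates through 'send'; return how many were sent."""
        sent = 0
        while self.pending_partner_updates:
            send(self.pending_partner_updates.popleft())
            sent += 1
        return sent
```

Queuing the update rather than sending it inline keeps the partner
off the critical path of the client exchange.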
3.2. BOOTP relay agent implementation
Many DHCP clients are not resident on the same network segment as a
DHCP server. In order to support this form of network architecture,
most contemporary routers implement something known as a BOOTP Relay
Agent. This capability inside of a router listens for all broadcasts
at the DHCP port, port 67, and will relay any broadcasts that it
receives on to a DHCP server. The IP address of the DHCP server must
have been previously configured into the router. As part of the
relay process, the relay agent will place the address of the inter-
face on which it received the broadcast into the giaddr field of the
DHCP packet.
Since the failover protocol requires two DHCP servers to receive any
broadcast DHCP messages, in order to work with DHCP clients which are
not local to the DHCP server, the BOOTP relay agent on the router
closest to the DHCP client must be configured to point at more than
one DHCP server.
Most BOOTP relay agent implementations allow this duplication of
packets.
If this is not possible, an administrator might be able to configure
the relay agent with a subnet broadcast address, but in this case the
primary and secondary DHCP servers in a failover pair must both
reside on the same subnet. While this is a realistic configuration,
it is not the one that most people will use.
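
The relay behavior that failover depends on can be sketched as a
function that copies one client broadcast to every configured server.
The function name relay and the dictionary packet representation are
hypothetical. Filling giaddr only when it is still zero is an
assumption based on common relay agent behavior.

```python
def relay(packet, receiving_interface_ip, servers):
    """Return one (server address, packet copy) pair per configured server."""
    out = []
    for server in servers:
        copy = dict(packet)
        # Assumption: only the first relay agent fills in giaddr.
        if copy.get("giaddr", "0.0.0.0") == "0.0.0.0":
            copy["giaddr"] = receiving_interface_ip
        out.append((server, copy))
    return out


# Both failover partners receive the same DHCPDISCOVER.
forwarded = relay({"op": "DHCPDISCOVER", "giaddr": "0.0.0.0"},
                  "10.1.1.1", ["192.0.2.1", "192.0.2.2"])
```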
3.3. What does it mean if a server can't communicate with its partner?
In any protocol designed to allow one server to take over some
responsibilities from a partner server in the event of "failure" of
that partner server, there is an inherent difficulty in determining
when that partner server has failed.
In fact, it is fundamentally impossible for one server to distinguish
a network communications failure from the outright failure of the
server with which it is trying to communicate. In the case where each
server is handing out resources (in this case IP addresses) to a
client community, mistaking an inability to communicate with a
partner server for failure of that partner server could easily cause
both servers to be handing out the same IP addresses to different
clients.
One way that this is sometimes handled is for there to be more than
two servers. In the case of an odd number of servers, the servers
that can still communicate with a majority of other servers will
consider themselves operational, and any server which can't communicate
with a majority of other servers must immediately cease operations.
While this technique works in some domains, having the only server with
which a DHCP client can communicate voluntarily shut itself down
seems like something worth avoiding.
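
The majority rule discussed above, which the failover protocol
deliberately does not use, can be sketched in a few lines. The
function name may_operate is hypothetical, and counting a server as
always able to reach itself is an assumption of this sketch.

```python
def may_operate(reachable_peers, total_servers):
    """True if this server, plus the peers it can reach, form a strict majority."""
    return (reachable_peers + 1) * 2 > total_servers


# Three servers: the side of a partition with two servers keeps running.
majority_side = may_operate(reachable_peers=1, total_servers=3)
minority_side = may_operate(reachable_peers=0, total_servers=3)
# Two servers that cannot reach each other: neither has a majority.
isolated_pair_member = may_operate(reachable_peers=0, total_servers=2)
```

With only two servers, a partition leaves neither side with a
majority, so both would stop serving clients.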
The failover protocol will operate correctly while both servers are
unable to communicate, whether they are both running or not. At some
point there may be resource contention, and if one of the servers is
actually down, then the operator can inform the other server and the
operational server will be able to use all of the downed server's
resources.
The protocol also allows detection of an orderly shutdown of a parti-
cipating server.
3.4. Challenging scenarios for a Failover protocol
There exist two failure scenarios which provide particular challenges
to the correctness guarantees of a failover protocol.
3.4.1. Primary Server crash before "lazy" update:
In the case where the primary server sends a DHCPACK to a client for
a newly allocated IP address and then crashes prior to sending the
corresponding update to the secondary server, the secondary server
will have no record of the IP address allocation. When the secondary
server takes over, it may well try to allocate that IP address to a
different client. In the case where the first client to receive the
IP address is not on the net at the time (yet while there was still
time to run on its lease), an ICMP echo (i.e., ping) will not prevent
the secondary server from allocating that IP address to a different
client.
The failover protocol deals with this situation by having the primary
and secondary servers allocate addresses for new clients from dis-
joint address pools. See section 5.4 for details.
A more likely (in that DHCPRENEWs are presumably more common than
DHCPDISCOVERs) and more subtle version of this problem is where the
primary server crashes after extending a client's lease time, and
before updating the secondary with a new time using a lazy update.
After the secondary takes over, if the client is not connected to the
network the secondary will believe the client's lease has expired
when, in fact, it has not. In this case as well, the IP address
might be reallocated to a different client while the first client is
still using it.
This scenario is handled by the failover protocol through control of
the lease time and the use of the maximum client lead time (MCLT).
See section 5.2.1 for details.
3.4.2. Network partition where DHCP servers can't communicate but each
can talk to clients:
Several conditions are required for this situation to occur. First,
due to a network failure, the primary and secondary servers cannot
communicate. As well, some of the DHCP clients must be able to com-
municate with the primary server, and some of the clients must now
only be able to communicate with the secondary server. When this
condition occurs, both primary and secondary servers could attempt to
allocate IP addresses for new clients from the same pool of available
addresses. At some point, then, two clients will end up being allo-
cated the same IP address. This will cause problems when the network
failure that created this situation is corrected.
The failover protocol deals with this situation by having the primary
and secondary servers allocate addresses for new clients from dis-
joint address pools. See section 5.4 for details.
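
The idea of disjoint address pools can be sketched as follows. The
alternating split used here is purely an assumption for illustration,
and the function name split_pool is hypothetical. How the protocol
itself divides the available addresses is specified elsewhere in
this document.

```python
import ipaddress


def split_pool(first, last):
    """Split the inclusive range first..last into two disjoint sets."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    addrs = [ipaddress.IPv4Address(n) for n in range(start, end + 1)]
    # Assumption: alternate addresses between the two servers.
    return set(addrs[0::2]), set(addrs[1::2])


primary_pool, secondary_pool = split_pool("192.0.2.10", "192.0.2.19")
```

Because no address appears in both sets, the two servers cannot give
the same address to two new clients, even when they cannot
communicate.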
3.5. Using TCP to detect partner server failure
There are several characteristics of TCP that are important to the
functioning of the failover protocol, which uses one TCP connection
both for bulk data transfer and to assess communications
integrity with the other server. Reliable and ordered message
delivery are chief among these important characteristics.
It would be nice to use the capabilities built into TCP to allow it
to determine if communications integrity exists to the failover
partner, but this strategy contains some problems which require
analysis. There exist three fundamental cases for an open TCP con-
nection that must be examined.
1. When no data is being sent then no messages are traveling
across the TCP connection.
2. When data is queued to be sent, and the receiver has not
blocked the sending of additional data, then messages are
flowing across the TCP connection containing the application
data.
3. When data is queued to be sent, and the receiver has blocked
the transmission of additional data, then persist messages
(window probes) are flowing from the sender to the receiver,
and the receiver's responses ensure that the sender doesn't
miss the receiver opening the window for further
transmissions.
The first case can be turned into the second case by sending
application-level keep-alive messages periodically when there is no
other data queued to be sent. Note that TCP keep-alive messages might be
used as well, but they present additional problems.
Thus, we can ensure that the TCP connection has messages flowing
periodically across the connection fairly easily. The question
remains as to what TCP will do if the other end of the connection
fails to respond (either because of network partition or because the
receiving server crashes). TCP will attempt to retransmit a message
with an exponential backoff, and will eventually time out that
retransmission. However, the length of that timeout cannot, in gen-
eral, be set on a per-connection basis, and is frequently as long as
nine minutes, though in some cases it may be as short as two minutes.
On some systems it can be set system-wide, while on others it
cannot be changed at all.
A value for this timeout that would be appropriate for the failover
protocol, say less than 1 minute, could have unpleasant side-effects
on other applications running on the same server, assuming that it
could be changed at all on the host operating system.
Nine minutes is a long time for the DHCP service to be unavailable to
any new clients that were being served by the server which has
crashed, when there is another server running that could respond to
them immediately as soon as it determines that its partner is not
operational.
The conclusion drawn from this analysis is that TCP provides very
useful support for the failover protocol in the areas of reliable and
ordered message delivery, but cannot by itself be relied upon to
detect partner server failure in a fashion acceptable to the needs of
the failover protocol. Additional failover protocol capabilities
will need to be created to support timely detection of partner server
failure. See section 8.3 for details on this mechanism.
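The kind of application-level timing check that this conclusion calls
for might be sketched as follows. This is only an illustration, not
the mechanism of section 8.3; the class name, the max_silence
parameter, and the explicit clock arguments are assumptions made for
the example.

```python
class PartnerLiveness:
    """Tracks when the partner was last heard from (hypothetical helper)."""

    def __init__(self, max_silence, now):
        self.max_silence = max_silence  # seconds of silence tolerated (assumed knob)
        self.last_heard = now

    def message_received(self, now):
        # Any message from the partner, keep-alive or data, shows it is alive.
        self.last_heard = now

    def partner_unreachable(self, now):
        # True once the partner has been silent for longer than allowed,
        # independent of the TCP retransmission timeout.
        return now - self.last_heard > self.max_silence


liveness = PartnerLiveness(max_silence=60, now=0)
liveness.message_received(now=30)
print(liveness.partner_unreachable(now=80))   # 50 seconds of silence
print(liveness.partner_unreachable(now=100))  # 70 seconds of silence
```

Because the limit is held by the application, it can be well under one
minute without changing any system-wide TCP setting.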
4. Design Goals
This section lists the design requirements, the design goals, and the
limitations of the failover protocol.
4.1. Design requirements for this protocol
The following list of requirements must be (and are) met by this pro-
tocol. They are listed in priority order.
1. Implementations of this protocol must work with existing DHCP 1. Implementations of this protocol must work with existing DHCP
client implementations based on the DHCP protocol [RFC 2131]. client implementations based on the DHCP protocol [1].
2. Implementations of the protocol must work with existing BOOTP 2. Implementations of the protocol must work with existing BOOTP
relay implementations. relay agent implementations.
3. The protocol must provide failover redundancy between servers 3. The protocol must provide failover redundancy between servers
that are not located on the same subnet. that are not located on the same subnet.
1.4. Goals for this protocol 4.2. Goals for this protocol
The following goals are met by this protocol as well, though they are
less important than the requirements listed above. These goals are
listed in priority order.
1. Provide for continued service to DHCP clients through an 1. Provide for continued service to DHCP clients through an
automated mechanism in the event of failure of the primary automated mechanism in the event of failure of the primary
server. server.
2. Avoid binding an IP address to a client while that binding is 2. Avoid binding an IP address to a client while that binding is
currently valid for another client. In other words, do not currently valid for another client. In other words, do not
allocate the same IP address to two clients. allocate the same IP address to two clients.
3. Minimize any need for manual administrative intervention. 3. Minimize any need for manual administrative intervention.
4. Introduce no additional delays in server response time as a 4. Introduce no additional delays in server response time as a
result of the communications required to implement the Fail- result of the network communications required to implement the
over protocol. failover protocol, i.e., don't require communications with the
partner between the receipt of a DHCPREQUEST and the
DRAFT November 1998 corresponding DHCPACK.
5. Share IP address ranges between primary and secondary servers; 5. Share IP address ranges between primary and secondary servers;
i.e., impose no requirement that the pool of available i.e., impose no requirement that the pool of available
addresses be divided between servers. addresses be divided between servers.
6. Continue to meet the goals and objectives of this protocol in 6. Continue to meet the goals and objectives of this protocol in
the event of server failure or network partition. the event of server failure or network partition.
7. Provide graceful reintegration of full protocol service after 7. Provide graceful reintegration of full protocol service after
server failure or network partition. server failure or network partition.
skipping to change at page 5, line 38 skipping to change at page 14, line 42
server. Ensure that in the face of partition, where servers server. Ensure that in the face of partition, where servers
continue to run but cannot communicate with each other, the continue to run but cannot communicate with each other, the
above goals and requirements may be met. In addition, when the above goals and requirements may be met. In addition, when the
partition condition is removed, allow graceful automatic re- partition condition is removed, allow graceful automatic re-
integration without requiring human intervention. integration without requiring human intervention.
11. If either primary or secondary server loses all of the infor- 11. If either primary or secondary server loses all of the infor-
mation that it has stored in stable storage, it should be able mation that it has stored in stable storage, it should be able
to refresh its stable storage from the other server. to refresh its stable storage from the other server.
1.5. Limitations of this Protocol 12. Support load balancing between the primary and secondary
servers, and allow configuration of the percentage of the
client population served by each with a moderately fine granu-
larity.
The following are explicit limitations of this protocol. 4.3. Limitations of this Protocol
1. Under normal operation, only one server at a time will hand The following are explicit limitations of this protocol.
out new IP addresses, but client lease renewals are serviced
by both servers; the protocol provides reliability through
redundancy and some degree of load balancing of lease
renewals.
2. This protocol provides only one level of redundancy through a 1. This protocol provides only one level of redundancy through a
single secondary server for each primary server. single secondary server for each primary server.
3. The protocol provides a way to detect when the primary and 2. A subset of the address pool is reserved for secondary server
secondary server cannot communicate, but once this condition
has been detected, does not (indeed, cannot) provide any way
to further distinguish between network failure and failure of
one of the servers. The protocol allows detection of an ord-
erly shutdown of a participating server.
4. A subset of the address pool is reserved for secondary server
use. In order to handle the failure case where both servers use. In order to handle the failure case where both servers
are able to communicate with DHCP clients, but unable to com- are able to communicate with DHCP clients, but unable to com-
municate with each other, a subset of the IP address pool must municate with each other, a subset of the IP address pool must
be set aside as a private address pool for the secondary be set aside as a private address pool for the secondary
server. The secondary can use these to service newly arrived server. The secondary can use these to service newly arrived
DHCP clients during such a period. The size of this private DHCP clients during such a period. The size of this private
pool SHOULD be based only on the arrival rate of new DHCP pool SHOULD be based only on the arrival rate of new DHCP
clients and the length of expected down-time, and is not clients and the length of expected downtime, and is not influ-
influenced in any way by the total number of DHCP clients sup- enced in any way by the total number of DHCP clients supported
ported by the server pair. by the server pair.
5. The primary and secondary servers do not respond to client 3. The primary and secondary servers do not respond to client
requests at all while recovering from a failure that could requests at all while recovering from a failure that could
have resulted in duplicate IP assignments. (When synchroniz- have resulted in duplicate IP assignments. (When synchroniz-
ing in POTENTIAL-CONFLICT state). ing in POTENTIAL-CONFLICT state).
2. Protocol Operations 5. Protocol Overview
The protocol features a small number of messages to communicate bind- This section will discuss the failover protocol at a relatively high
ing information, operational status and to manage various level level of detail. In the event that a description in this sec-
disconnect-reconnect scenarios between servers. tion conflicts (or appears to conflict due to the overview nature of
this section) with information in later sections of this draft, the
information in the later sections should be considered authoritative.
2.1. Message Addressing and Configuration granularity 5.1. Messages and States
When discussing messages, an important question is "to whom are mes- This protocol is centered around the message exchange used by one
sages sent" and "from whom are messages sent". What is the address- server to update the other server of binding database changes result-
able entity from which and to which messages are sent? ing from DHCP client activity:
At one level, this would seem to be a single DHCP server, but in fact o Communication of binding database changes
there are many situations where additional flexibility in configura-
tion is useful. For instance, there might be several servers which
are each primary for a distinct set of address pools, and one server
which is secondary for all of those address pools. The situation
with the primaries is straightforward, but the secondary will need to
maintain a separate failover state, partner state, and communications
up/down status for each of the separate primary servers for which it
is acting as a secondary.
The protocol allows for there to be a unique failover entity per The binding update (BNDUPD) message is used to send the binding
partner per role (where role is primary or secondary). This failover database changes to the partner server, and the partner server
entity can take actions and hold unique states. There are thus a responds with a binding acknowledgement (BNDACK) message when it
has successfully committed those changes to its own stable
storage.
All of the other messages involve ancillary issues:
maximum of two failover entities per partner (one for the partner as o Management of available IP addresses
a primary and one for that same partner as a secondary.)
The pool request (POOLREQ) is used by the secondary server to
request an allocation of IP addresses from the primary server.
The pool response (POOLRESP) is used by the primary server to
inform the secondary server how many IP addresses it was allo-
cated as the result of a pool request.
o Synchronization of the binding databases between the servers
after they've been out of communications
The update request (UPDREQ) message is used by one server to
request that its partner send it all binding database informa-
tion that it has not already seen. The update request all
(UPDREQALL) message is used by one server to request that all
binding database information be sent in order to recover from a
total loss of its lease state database by the requesting server.
The update done (UPDDONE) message is used by the responding
server to indicate that all requested updates have been sent by the
responding server and acked by the requesting server.
o Connection establishment
The connect (CONNECT) message is used by either server to estab-
lish a high level connection with the other server, and to
transmit several important configuration data items between the
servers. The connect acknowledgement message (CONNECTACK) is
used to respond to a CONNECT message from another server.
o Server synchronization
The state change (STATE) message is used by either server to
inform the other server of a change of failover state.
o Connection integrity management
The contact (CONTACT) message is used by either server to ensure
that the other server continues to see the connection as opera-
tional. It MUST be transmitted periodically over every esta-
blished connection if other message traffic is not flowing, and
it MAY be sent at any time.
5.1.1. Failover endpoints
The proper operation of the failover protocol requires more than the
transmission of messages between one server and the other. Each end-
point might seem to be a single DHCP server, but in fact there are
many situations where additional flexibility in configuration is use-
ful.
For instance, there might be several servers which are each primary
for a distinct set of address pools, and one server which is
secondary for all of those address pools. The situation with the
primaries is straightforward, but the secondary will need to maintain
a separate failover state, partner state, and communications up/down
status for each of the separate primary servers for which it is act-
ing as a secondary.
The failover protocol calls for there to be a unique failover end-
point per partner per role (where role is primary or secondary).
This failover endpoint can take actions and hold unique states.
There are thus a maximum of two failover endpoints per partner (one
for the partner as a primary and one for that same partner as a
secondary.)
Thus, in the case where there are two primary servers A and B each Thus, in the case where there are two primary servers A and B each
backed up by a single common secondary server C, there is one fail- backed up by a single common secondary server C, there is one fail-
over entity on each of A and B, and two different failover entities over endpoint on each of A and B, and two different failover end-
on C. The two different failover entities on C each have unique points on C. The two different failover endpoints on C each have
states and message xid ranges. As far as the protocol described in unique states and independent TCP connections.
this draft is concerned, they constitute different "servers",
although they are certainly part of one server (as the term is com-
monly used) if they reside in the same process.
It is not the case that there is subnet granularity for each failover This document describes the behavior of the protocol in terms of pri-
entity. On one server, there is one failover entity per "partner- mary and secondary servers, not primary and secondary failover end-
role", regardless of how many subnets or address pools are managed by points. However, it is important to remember that every 'server'
that combination of partner and role. Conversely, any given subnet described in this document is in reality a failover endpoint that
or pool will be associated with exactly one failover entity on a sin- resides in a particular process, and that many failover endpoints may
gle server (but it will also be associated with the corresponding reside in the same process.
partner's failover entity.)
When a message is received from the partner, the unique failover It is not the case that there is a unique failover endpoint for each
entity to which the message is directed is determined solely by the subnet that participates in a failover relationship. On one server,
there is one failover endpoint per partner per role, regardless of
how many subnets or address pools are managed by that combination of
partner and role. Conversely, any given subnet or pool will be asso-
ciated with exactly one failover endpoint on a single server.
When a connection is received from the partner, the unique failover
endpoint to which the message is directed is determined solely by the
IP address of the partner and the setting of the SECONDARY bit in the IP address of the partner and the setting of the SECONDARY bit in the
'flags' field of the message header. 'flags' field of the contact message.
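The lookup described above can be sketched as a table keyed by the
partner's IP address and the SECONDARY bit. The addresses, the table
contents, and the function name below are hypothetical and serve only
to illustrate the keying.

```python
# Hypothetical endpoint table on server C, which backs up both A and B.
# Key: (partner IP address, value of the SECONDARY bit in the message).
endpoints = {
    ("192.0.2.1", False): "C's endpoint for partner A",
    ("192.0.2.2", False): "C's endpoint for partner B",
}


def find_endpoint(partner_ip, secondary_bit):
    # No other field takes part in the selection; an unknown
    # combination yields None.
    return endpoints.get((partner_ip, secondary_bit))


print(find_endpoint("192.0.2.1", False))
print(find_endpoint("192.0.2.1", True))
```

Since the key has only two components, one server holds at most two
endpoints for any given partner.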
Throughout this document, the states and actions taken by "servers" Throughout this document, the states and actions taken by "servers"
are described. The terms "server", "primary server", and "secondary are described. The terms "server", "primary server", and "secondary
server" are commonly used to describe the entity taking these states server" are commonly used to describe the failover endpoint taking
and taking actions. This description is wholly accurate only for the these states and performing these actions. This description is
simplest of cases, where all of the address pools on one server are wholly accurate only for the simplest of cases, where all of the
backed up by all of the address pools on another server. In this address pools on one server are backed up by all of the address pools
case, there is a "true" primary and secondary server. In all other on another server. In this case, there is a single failover endpoint
cases, the term "server" is used to describe one of the two possible in each server. In all other cases, the term "server" is used to
failover entities per partner. describe one of the two possible failover endpoints per partner.
2.2. Packet transport 5.2. Fundamental restrictions
All messages sent by this protocol are sent in UDP packets. All mes- There are several fundamental restrictions this protocol places on what
sages are unicast from the sender to the receiver. The next section one server can do in the absence of knowledge of the other server, and
discusses the port to use when sending DHCP failover UDP packets. these restrictions are key to the correct operation of the protocol.
DISCUSSION: 5.2.1. Control of lease time
See section 8, Extended discussion #1, for a discussion of the The key problem with lazy update is that when a server fails
reasons to use UDP as the protocol. after updating a client with a particular lease time and before
updating its partner, the partner will believe that a lease has
expired even though the client still retains a valid lease on that IP
address.
In order to handle this problem, a period of time known as the "Maximum
Client Lead Time" (MCLT) is defined and must be known to both
the primary and secondary servers. Proper use of this time interval
places an upper bound on the difference allowed between the lease
time provided to a DHCP client by a server and the lease time known
by that server's partner. However, the MCLT is typically much less
than the lease time that a server has been configured to offer a
client, and so some strategy must exist to allow a server to offer
the configured lease time to a client. During a lazy update the
updating server typically updates its partner with a potential
expiration time which is longer than the lease time previously given
to the client and which is longer than the lease time that the server
has been configured to give a client. This allows that server to
give a longer lease time to the client the next time the client
renews its lease, since the time that it will give to the client will
not exceed the MCLT beyond the potential expiration time acknowledged
by the partner.
2.3. Port usage When moving to the PARTNER-DOWN state (where a server is allowed to
reallocate the partner's IP addresses), a server will wait the Max-
imum Client Lead Time before allocating any IP addresses from its
partner's pool to any new DHCP clients. Thus, any clients which have
a lease on an IP address with a lease time greater than that known by
the server moving into PARTNER-DOWN state will either have contacted
that server during the MCLT period or their leases will have expired.
Compliant servers SHOULD use port 647 (assigned to dhcp-failover by When a server has transitioned to PARTNER-DOWN state, it MUST NOT
IANA) for sending and receiving Failover protocol messages, though reallocate an IP address from one client to another client until an
they MAY be configured to use a different port (including ports 67 or additional maximum client lead time interval after the lease by the
68). original client expires. (Actually, until the maximum client lead
time after what it believes to be the lease expiration time of the
first client.)
Since the use of port 67 and 68 is allowed, the messages are format- Some optimizations exist for this restriction, in that it only
ted in such a way that they can be distinguished from DHCP or BOOTP applies to leases that were issued BEFORE entering PARTNER-DOWN. Once
messages by the use of distinct message 'op' codes. Note that send- a server has entered PARTNER-DOWN and it leases out an address, it
ing failover messages on port 67 to servers not designed to support need not wait this time as long as it has never communicated with the
them may not only not work, but may cause those servers to operate partner since the lease was given out.
incorrectly or to crash.
DISCUSSION: The fundamental relationship on which much of the correctness of this
protocol depends is that the lease expiration time known to a DHCP
client MUST NOT be more than the maximum client lead time greater
than the potential expiration time known to a server's partner.
Some implementors have a strong requirement for using a separate The remainder of this section makes the above fundamental relation-
port for the Failover protocol, and the use of the allocated port ship more explicit.
647 will accommodate them. Some other implementors seem equally
committed to allowing failover packets to be sent to the standard
DHCP port, port 67. The above language strongly suggests that the
failover port be used (by using SHOULD), but leaves open the pos-
sibility of using the standard DHCP port (or any other) for
servers designed to operate in that fashion.
2.4. Time synchronization between communicating servers This protocol requires a DHCP server to deal with several different
lease intervals and places specific restrictions on their relation-
ships. The purpose of these restrictions is to allow the other server
in the pair to be able to make certain assumptions in the absence of
an ability to communicate between servers.
Each Binding update message carries a "sent time stamp" (the time The different lease times are:
when the message was sent in GMT). This provides a simple mechanism
to determine any "time drift" between communicating servers. o desired lease interval
The desired lease interval is the lease interval that a DHCP
server would like to give to a DHCP client in the absence of any
restrictions imposed by the Failover protocol. Its determina-
tion is outside of the scope of this protocol. Typically this is
the result of external configuration of a DHCP server.
o actual lease interval
The actual lease interval is the lease interval that a DHCP
server gives out to a DHCP client in the dhcp-lease-time option
of a DHCPACK packet. It may be shorter than the desired client
lease interval (as explained below).
o potential lease interval
The potential lease interval is the lease expiration interval
the local server tells to its partner in the potential-
expiration-time option of a BNDUPD message.
o acknowledged potential lease interval
The acknowledged potential lease interval is the potential lease
interval the partner server has most recently acknowledged in
the potential-expiration-time option of a BNDACK message.
The key restriction (and guarantee) that any server makes with
respect to lease intervals is that the actual client lease interval
never exceeds the acknowledged potential lease interval (if any) by
more than a fixed amount. This fixed amount is called the "Maximum
Client Lead Time" (MCLT).
The MCLT MAY be configurable on the primary server, but for correct
server operation it MUST be the same and known to both the primary
and secondary servers. The secondary server determines the MCLT from
the MCLT option sent from the primary server to the secondary server
in the CONNECT or CONNECTACK message.
A server MUST record in its stable storage both the actual lease
interval and the most recently acknowledged potential lease interval
for each IP address binding. It is assumed that the desired client
lease interval can be determined through techniques outside of the
scope of this protocol.
Again, the fundamental relationship among these times which MUST be
maintained is:
actual lease interval <
( acknowledged potential lease interval + MCLT )
Figure 5.1-1 illustrates an initial lease to a client using the rules
discussed in the example which follows it.
DHCP Primary Secondary
time Client Server Server
| (time in intervals) | (absolute time) |
| | |
| >-DHCPDISCOVER-> | |
| <---DHCPOFFER-< | |
| | |
| >-DHCPREQUEST-> | |
| (selecting) | |
| | |
t | <--------DHCPACK-< | |
| lease-time=MCLT | |
| | >-BNDUPD--> |
| | lease-expiration=t+MCLT
| | potential-expiration=t+(MCLT/2)+X
| | |
| | <-BNDACK-< |
| | potential-expiration=t+(MCLT/2)+X
... ... ...
| | |
t+MCLT/2 | >-DHCPREQUEST-> | |
| (renew) | |
| | |
t1 | <--------DHCPACK-< | |
| lease-time=X | |
| | >-BNDUPD--> |
| | lease-expiration=t1+X
| | potential-expiration=t1+(X/2)+X
| | |
| | <-BNDACK-< |
| | potential-expiration=t1+(X/2)+X
... ... ...
Figure 5.1-1: Lazy Update Message Traffic
X = Desired Lease Interval
DISCUSSION: DISCUSSION:
If a UDP packet is successfully transmitted (i.e.: it does not get This protocol mandates no algorithm concerning these lease inter-
lost), the packet travel time is negligible in the framework of vals, as long as the above fundamental relationship is preserved.
DHCP leases. By providing a GMT "sent time" stamp, the recipient
can compare this with its notion of the current GMT time at the
time it receives the packet. The difference (plus the packet
travel time, which we ignore) is the time drift. The recipient
MUST use this time drift value to bias "absolute time" values it
receives from the sender.
2.5. Failover Protocol Messages In the interests of clarity, however, let's examine a specific
example. The MCLT in this case is 1 hour. The desired lease
interval is 3 days, and its renewal time is half the lease inter-
val.
The Failover protocol messages are sent using UDP and encoded using a The rules for this example are:
packet format specific to the Failover protocol. To allow easy
recognition of and separation of Failover protocol messages from
o What to tell the client:
BOOTP and DHCP messages, BOOTP packet 'op' field values 3..11 are Take the remainder of the acknowledged potential lease interval.
used to indicate various Failover protocol message types. A Failover If this is a new lease, then this value will be zero. If this
protocol message is always unicast from the source to the destination remainder plus the MCLT is greater than the desired lease inter-
using the port defined in section 2.2. The sender, and never the val, give the client the desired lease interval else give the
recipient is responsible for retransmission when necessary. client the remainder plus the MCLT.
2.6. Failover protocol packet header format o What to tell the failover partner server:
Take the renewal interval (typically half of the actual client
lease interval), add to it the desired lease interval, and add
it to the current time to yield the value that goes into the
potential-expiration-time option.
Also tell the failover partner the actual lease interval by
adding it to the current time to yield the value that goes into
the lease-expiration option.
In operation this might work as follows:
When a server makes an offer for a new lease on an IP address to a
DHCP client, it determines the desired lease interval (in this
case, 3 days). It then examines the acknowledged potential lease
interval (which in this case is zero) and determines the remainder
of the time left to run, which is also zero. To this it adds the
MCLT. Since the actual lease interval cannot be allowed to exceed
the remainder of the current acknowledged potential lease interval
plus the MCLT, the offer made to the client is for the remainder
of the current acknowledged potential lease interval (i.e., zero)
plus the MCLT. Thus, the actual lease interval is 1 hour.
Once the server has performed the ACK to the DHCP client, it will
update the secondary server with the lease information. However,
the desired potential lease interval will be composed of one
half of the current actual lease interval added to the desired
lease interval. Thus, the secondary server is updated with a
BNDUPD with a lease interval of 3 days + 1/2 hour specified in the
IP Address Lease Time Option (Option 51).
When the primary server receives an ACK to its update of the
secondary server's (partner's) potential lease interval, it
records that as the acknowledged potential lease interval. A
server MUST NOT send a BNDACK in response to a BNDUPD message
until it is sure that the information in the BNDUPD message
resides in its stable storage. Thus, the primary server in this
case can be sure that the secondary server has recorded the poten-
tial lease interval in its stable storage when the primary server
receives a BNDACK message from the secondary server.
When the DHCP client attempts to renew at T1 (approximately one
half an hour from the start of the lease), the primary server
again determines the desired lease interval, which is still 3
days. It then compares this with the remaining acknowledged
potential lease interval (3 days + 1/2 hour) and adjusts for the
time passed since the secondary was last updated (1/2 hour). Thus
the time remaining of the acknowledged potential lease interval is
3 days. Adding the MCLT to this yields 3 days plus 1 hour, which
is more than the desired lease interval of 3 days. So the client
is renewed for the desired lease interval -- 3 days.
When the primary DHCP server updates the secondary DHCP server
after the DHCP client's renewal ACK is complete, it will calculate
the desired potential lease interval as the T1 fraction of the
actual client lease interval (1/2 of 3 days this time = 1.5 days).
To this it will add the desired client lease interval of 3 days,
yielding a total desired partner server lease interval of 4.5
days. In this way, the primary attempts to have the secondary
always "lead" the client in its understanding of the client's
lease interval so as to be able to always offer the client the
desired client lease interval.
Once the initial actual client lease interval of the MCLT is past,
the protocol operates effectively like the DHCP protocol does
today in its behavior concerning lease intervals. However, the
guarantee that the actual client lease interval will never exceed
the remaining acknowledged partner server lease interval by more
than the MCLT allows full recovery from a variety of failures.
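The arithmetic of this example can be reproduced with a short sketch.
It follows only the example rules given above (which the protocol does
not mandate); the function names are hypothetical, times are in
seconds, and the renewal interval is assumed to be half of the actual
lease interval.

```python
HOUR = 3600
DAY = 24 * HOUR


def actual_lease_interval(now, acked_potential_expiration, mclt, desired):
    # Remainder of the acknowledged potential lease interval;
    # zero for a new lease (nothing acknowledged yet).
    if acked_potential_expiration is None:
        remainder = 0
    else:
        remainder = max(0, acked_potential_expiration - now)
    # Give the desired interval unless remainder + MCLT is smaller.
    return min(desired, remainder + mclt)


def potential_expiration(now, actual, desired):
    # Renewal interval (assumed half the actual lease) plus the
    # desired lease interval, added to the current time.
    return now + actual // 2 + desired


MCLT, DESIRED = 1 * HOUR, 3 * DAY

t = 0
first = actual_lease_interval(t, None, MCLT, DESIRED)      # initial lease
acked = potential_expiration(t, first, DESIRED)            # sent in BNDUPD

t1 = t + first // 2                                        # client renews
renewed = actual_lease_interval(t1, acked, MCLT, DESIRED)  # renewal lease
next_acked = potential_expiration(t1, renewed, DESIRED)

print(first / HOUR, (acked - t) / DAY)
print(renewed / DAY, (next_acked - t1) / DAY)
```

With an MCLT of 1 hour and a desired lease interval of 3 days, the
initial lease is 1 hour, the renewal is 3 days, and the second
potential expiration lies 4.5 days after the renewal, matching the
narrative above.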
5.2.2. Controlled re-allocation of IP addresses
When in PARTNER-DOWN state there is a waiting period after which an
IP address can be re-allocated to another client. For leases which
are available when the server enters PARTNER-DOWN state, the period
is the MCLT from entry into PARTNER-DOWN state. For IP addresses
which are not available when the server enters PARTNER-DOWN state,
the period is the MCLT after the lease becomes available. See sec-
tion 9.4.2 for more details.
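The two waiting periods described above reduce to a single
expression, sketched here under the assumption that times are plain
numbers of seconds. The function and parameter names are hypothetical,
and section 9.4.2 remains the authoritative description.

```python
def earliest_reallocation(entered_partner_down, became_available, mclt):
    # Available before entry: wait the MCLT from entry into PARTNER-DOWN.
    # Available after entry: wait the MCLT from when it became available.
    return max(entered_partner_down, became_available) + mclt


print(earliest_reallocation(1000, 400, 3600))   # already available at entry
print(earliest_reallocation(1000, 5000, 3600))  # became available later
```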
In any other state, a server cannot reallocate an address from one
client to another without first notifying its partner (through a
BNDUPD message) and receiving acknowledgement (through a BNDACK mes-
sage) that its partner is aware that that first client is not using
the address.
This could be modeled in the following way. Though this specific
implementation is in no way required, it may serve to better illus-
trate the concept.
An "available" IP address on a server may be allocated to any client.
An IP address which was leased to a client and which expired or was
released by that client would take on a new state, EXPIRED or
RELEASED respectively. The partner server would then be notified
that this IP address was EXPIRED or RELEASED through a BNDUPD. When
the sending server received the BNDACK for that IP address showing it
was FREE, it would move the IP address from EXPIRED or RELEASED to
FREE, and it would be available for allocation by the primary server
to any client.
A server MAY reallocate an IP address in the EXPIRED or RELEASED
state to the same client with no restrictions.
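The model above might be sketched as a small state machine. This is
one possible illustration only; the class and method names are
hypothetical, and sending the BNDUPD and receiving the BNDACK are
represented by comments and a callback rather than real messages.

```python
from enum import Enum


class BindingState(Enum):
    FREE = "FREE"
    ACTIVE = "ACTIVE"
    EXPIRED = "EXPIRED"
    RELEASED = "RELEASED"


class Lease:
    """One IP address as tracked by a single server (hypothetical model)."""

    def __init__(self):
        self.state = BindingState.FREE
        self.client = None

    def can_allocate_to(self, client):
        if self.state is BindingState.FREE:
            return True
        if self.state in (BindingState.EXPIRED, BindingState.RELEASED):
            # The same client may get the address back with no restrictions.
            return client == self.client
        return False

    def assign(self, client):
        if not self.can_allocate_to(client):
            raise ValueError("address may not be allocated to this client yet")
        self.state = BindingState.ACTIVE
        self.client = client

    def expire(self):
        if self.state is BindingState.ACTIVE:
            self.state = BindingState.EXPIRED  # partner is then told via BNDUPD

    def release(self):
        if self.state is BindingState.ACTIVE:
            self.state = BindingState.RELEASED  # partner is then told via BNDUPD

    def on_bndack(self):
        # The partner has acknowledged that the first client no longer
        # uses the address, so it may now go to any client.
        if self.state in (BindingState.EXPIRED, BindingState.RELEASED):
            self.state = BindingState.FREE
            self.client = None
```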
5.3. Load balancing
In order to implement load balancing between a primary and secondary
server pair, each server must respond to DHCPDISCOVER requests from
some clients and not from other clients. In order to do this suc-
cessfully, each server must be able to determine immediately upon
receipt of a DHCP client request whether it is to service this
request or to ignore it in order to allow the other server to service
the request.
In addition, it should be possible to configure the percentage of
clients which will be serviced by either the primary or secondary
server. This configuration should be more or less continuous, from
all serviced by the primary through an even split with half serviced
by each, to all serviced by the secondary.
The technique chosen to support these goals is to define a hash
function which must be applied to the client-identifier or, if no
client-identifier is specified, to the htype concatenated with the
chaddr. The result of this hash function is a number between 0 and
255 which maps into one of 256 "hash-buckets". Each hash bucket is
assigned to one server or the other by the primary server whenever a
connection is established, through use of the hash-bucket-assignment
option.
The hash-bucket-assignment option uses a 32 octet value field (con-
taining 256 bits), with one bit associated with each possible hash
bucket. If the bit corresponding to a hash bucket is a 1 in the
hash-bucket-assignment option, then the secondary server is required
to service all DHCP client requests that map into that hash bucket
when in NORMAL state.
For example, if the primary server sends a hash-bucket-assignment
option to the secondary with the following 32 octets:
buckets
FF FF FF FF FF FF FF FF ( 0 - 63 )
FF FF FF FF FF FF FF FF ( 64 - 127 )
00 00 00 00 00 00 00 00 ( 128 - 191 )
00 00 00 00 00 00 00 00 ( 192 - 255 )
then the secondary MUST service any DHCP client requests where the
client-identifier or htype concatenated with the chaddr hashes into
the bucket values of 0 through 127.
See section 12 for the code to implement the hash bucket algorithm.
Each server MUST implement this same algorithm in order for all
clients to get service.
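The bitmap lookup, as opposed to the hash function itself (which is
defined in section 12 and not reproduced here), could be sketched as
follows. The function names are hypothetical. The sketch assumes that
bucket 0 corresponds to the most significant bit of the first octet;
the example above is consistent with that ordering but does not by
itself establish the bit order within an octet.

```python
def bucket_is_assigned(assignment: bytes, bucket: int) -> bool:
    """Return True if the bit for `bucket` is 1 in a 32 octet assignment."""
    if len(assignment) != 32:
        raise ValueError("hash-bucket-assignment value must be 32 octets")
    if not 0 <= bucket <= 255:
        raise ValueError("hash bucket must be between 0 and 255")
    # Assumption: most significant bit first within each octet.
    return bool(assignment[bucket // 8] & (0x80 >> (bucket % 8)))


def assignment_for_secondary_share(percent: float) -> bytes:
    """Build a value giving roughly `percent` of the buckets to the secondary.

    Assigning the lowest numbered buckets first is an arbitrary choice
    made for this sketch.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    count = round(256 * percent / 100)
    bits = bytearray(32)
    for bucket in range(count):
        bits[bucket // 8] |= 0x80 >> (bucket % 8)
    return bytes(bits)


# The 32 octets from the example above: buckets 0 through 127 set.
example = bytes([0xFF] * 16 + [0x00] * 16)
```

Under these assumptions, a 50 percent share reproduces the 32 octets
shown in the example.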
5.4. Operating in NORMAL state
When in NORMAL state, each server services DHCPDISCOVERs and all
other DHCP requests other than DHCPREQUEST/RENEWAL or
DHCPREQUEST/REBINDING from the client set defined by the load
balancing algorithm. Each server services DHCPREQUEST/RENEWAL or
DHCPREQUEST/REBINDING requests from any client.
In general, whenever the binding database is changed in stable
storage, then a BNDUPD message is sent with the contents of that
change to the partner server. The partner server then writes the
information about that binding in its bindings database in stable
storage and replies with a BNDACK message.
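The servicing rule for NORMAL state could be expressed as in the
sketch below. It reads the two request kinds that are exempt from
load balancing as DHCPREQUEST/RENEWAL and DHCPREQUEST/REBINDING; the
function name and the string labels are hypothetical.

```python
# Request kinds that every server services regardless of load balancing.
SERVICED_FROM_ANY_CLIENT = {"DHCPREQUEST/RENEWAL", "DHCPREQUEST/REBINDING"}


def services_in_normal_state(request_kind: str, client_in_my_set: bool) -> bool:
    """Decide whether this server answers a client request in NORMAL state.

    client_in_my_set: True if the load balancing algorithm places the
    client in the set served by this server.
    """
    if request_kind in SERVICED_FROM_ANY_CLIENT:
        return True
    return client_in_my_set
```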
5.5. Operating in COMMUNICATIONS-INTERRUPTED state
When operating in COMMUNICATIONS-INTERRUPTED state, each server is
operating independently, but does not assume that its partner is not
operating. The partner server might be operating and simply unable
to communicate with this server, or might not be operating.
Each server responds to the full range of DHCP client messages that
it receives, but in such a way that graceful reintegration is always
possible when its partner comes back into contact with it.
5.6. Operating in PARTNER-DOWN state
When operating in PARTNER-DOWN state, a server assumes that its
partner is not currently operating, but does make allowances for the
possibility that that server was operating in the past. It responds
to all DHCP client requests in PARTNER-DOWN state.
Any transactions that the partner server may have had with DHCP
clients but been unable to communicate to this server are allowed for
in the algorithms that are used to gradually take over full control
of all of the addresses configured into the server.
5.7. Operating in RECOVER state
A server operating in RECOVER state assumes that it is reintegrating
with a server that has been operating in PARTNER-DOWN state, and that
it needs to update its bindings database before it services DHCP
client requests.
A server may also operate in RECOVER state in order to fully recover
its bindings database from its partner server.
6. Packet Formats
This section describes the message format common to all failover
messages, and then defines the options used in the failover
protocol.
6.1. Common message format
All failover protocol messages are sent over the TCP connection
between failover endpoints and encoded using a packet format specific
to the failover protocol.
There exists a common message format for all failover messages, which
utilizes the options in a way similar to the DHCP protocol. For each
message type, some options are required and some are optional. In
addition, when a message is received any options that are not under-
stood by the receiving server MUST be ignored.
All of the fields in the fixed portion of the packet MUST be filled All of the fields in the fixed portion of the packet MUST be filled
with correct data in every message sent. with correct data in every message sent.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| op (1) | rev (1) | payload offset (2) | | packet length (2) | msg type (1) |payload off (1)|
+---------------+---------------+---------------+---------------+ +---------------+---------------+---------------+---------------+
| xid (4) | | xid (4) |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
| sending server ID ( IP address ) (4) |
+---------------------------------------------------------------+
| time stamp (4) |
+---------------------------------------------------------------+
| state (1) | flags(1) | reserved (2) |
+---------------+---------------+---------------+---------------+
| 0 or more additional header bytes (variable) | | 0 or more additional header bytes (variable) |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
| Payload Data, formatted as DHCP-style options | | payload data (variable) |
| (although using a unique option number space) | | |
| (variable) | | formatted as DHCP-style options |
| using a unique option number space in the ?R6? |
| format defined by [NAMESPACE] |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
packet length - 2 bytes, network byte order
op - 1 byte This is the length of the packet. It includes the two byte packet
length itself.
These values extend the number space of the existing BOOTP message msg type - 1 byte
type "Op" field.
The message type field is used to distinguish between messages.
The following message types are defined: The following message types are defined:
Value Message Type Value Message Type
----- ------------ ----- ------------
0 reserved to BOOTP/DHCP, unused by failover 0 reserved not used
1 BOOTREQUEST (reserved to BOOTP/DHCP, unused by failover) 1 POOLREQ request allocation of addresses
2 BOOTREPLY (reserved to BOOTP/DHCP, unused by failover) 2 POOLRESP respond with allocation count
3 DHCPPOOLREQ request allocation of addresses 3 BNDUPD update partner with binding info
4 DHCPPOOLRESP respond with allocation count 4 BNDACK acknowledge receipt of binding update
5 DHCPBNDUPD update partner with binding info 5 CONNECT establish connection with partner
6 DHCPBNDACK acknowledge receipt of binding update 6 CONNECTACK respond to attempt to establish contact with partner
7 DHCPPOLL probe partner for comm. integrity 7 UPDREQALL request full transfer of binding info
8 DHCPPRPL acknowledge comm. integrity 8 UPDDONE ack send and ack of req'd binding info
9 DHCPUPDATEREQALL request full transfer of binding info 9 UPDREQ req transfer of un-acked binding info
10 DHCPUPDATEDONE ack send and ack of req'd binding info 10 STATE inform partner of current state or state change
11 DHCPUPDATEREQ req transfer of un-acked binding info 11 CONTACT probe communications integrity with partner
rev - 1 byte
Failover protocol version supported. Set to 1 for the Failover New message types should be defined in one of two ranges, 0-127 or
protocol described in this draft. The value 255 is reserved for 129-255. The range of 0-127 is used for messages that MUST be
experimental implementations. Such implementations SHOULD use the supported by every server, and if a server receives a message in the
DHCP Vendor Class option to recognize a partner server which is using range of 0-127 that it doesn't understand, it MUST drop the TCP con-
the same vendor's experimental implementation. nection. The range of 128-255 is used for messages which MAY be sup-
ported but are not required, and if a server receives a message in
this range that it does not understand it SHOULD ignore the message.
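The handling of unknown message types might look like the sketch
below. The action labels and function name are hypothetical. The text
gives the optional range as both 129-255 and 128-255; this sketch
assumes 128-255.

```python
# Message types defined in the newer table above.
KNOWN_MESSAGE_TYPES = {
    1: "POOLREQ", 2: "POOLRESP", 3: "BNDUPD", 4: "BNDACK",
    5: "CONNECT", 6: "CONNECTACK", 7: "UPDREQALL", 8: "UPDDONE",
    9: "UPDREQ", 10: "STATE", 11: "CONTACT",
}


def action_for_message_type(msg_type: int, known=KNOWN_MESSAGE_TYPES) -> str:
    """Return what a receiver does with a message of the given type."""
    if not 0 <= msg_type <= 255:
        raise ValueError("msg type is a single byte")
    if msg_type in known:
        return "process"
    if msg_type <= 127:
        # Required range: an unknown type means the partner cannot be
        # understood, so the TCP connection is dropped.
        return "drop-connection"
    # Optional range (assumed 128-255): unknown messages are ignored.
    return "ignore"
```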
payload offset - 2 bytes, network byte order payload offset - 1 byte
The byte offset of the Payload area, from the beginning of the The byte offset of the Payload Data, from the beginning of the
Failover packet header. The value for the current protocol version is failover packet header. The value for the current protocol version is
20. 8.
xid - 4 bytes, network byte order xid - 4 bytes, network byte order
The sender of a Failover protocol packet is responsible for setting This is the transaction id of the failover packet. The sender of a
this number, and the receiver of the packet copies the number over failover protocol packet is responsible for setting this number, and
into any response packet, treating it as opaque data. The sender the receiver of the packet copies the number over into any response
SHOULD ensure that every packet sent to a particular IP address and packet, treating it as opaque data. The sender SHOULD ensure that
port combination has a unique transaction id unless that packet is a every packet sent from a particular failover endpoint over the
re-transmission. associated TCP connection has a unique transaction id unless that
packet is a re-transmission.
payload data - variable length
sending server ID - 4 bytes, network byte order The options are placed after the header, after skipping payload
offset bytes from beginning of the packet. The payload data options
are not preceded by a "cookie" value.
The IP address of the sending server. In conjunction with the The payload data is formatted as DHCP style options using the two
setting of the SECONDARY flag, this uniquely determines the failover byte option number and two byte option length format as specified in
entity sending the message as well as that destined to receive the the recommendations of the DHCP panel in [NAMESPACE].
message.
This is placed in the packet instead of being recovered from the IP The maximum length of the payload data in octets is 2048 less the
header for security purposes (see section 8). size of the header, i.e., the maximum packet length is 2048 octets.
time stamp - 4 bytes, unsigned, network byte order 6.2. Common option format
A time stamp, indicating the time when the packet was sent. The time The options contained in the payload data section of the failover
is a 32 bit unsigned long value in network byte order, in units of packet all use the two byte option number and two byte length format
seconds (GMT since EPOCH). as specified by the recommendations of the DHCP panel in [NAMESPACE].
It is used to determine the time drift between the sender and the The option numbers are drawn from an option number space unique to
recipient. The time drift is defined as the difference between the failover protocol. All of the message types share a common
"Arrive Time (GMT)" and "(Send Time (GMT)". The actual packet travel option number space and common options definitions, though not all
time is assumed to be negligible in this context. All Date-Time options are required or meaningful for every message.
values contained in Failover messages MUST be corrected by the time
drift before being stored by the recipient.
state - 1 byte In contrast to the options which appear in DHCP client and server
packets, the options in failover messages are ordered. That is, for
some messages the order in which the options appear in the payload
data area is significant. The messages for which this is the case
spell it out in detail.
This field indicates the state of the sender, at the time the packet For all options which refer to time, they all use an absolute time in
was sent. The field MUST be set in every Failover message. The GMT. Time synchronization has already been achieved between the
server state value can be one of the following: source and the target server using the CONNECT message. All time
fields in the options defined below use a time represented as seconds
elapsed since Jan 1, 1970 (i.e. ANSI C time_t time value representa-
tion). Note that this is (at present) a signed field.
Additional options can be defined for intervendor or vendor specific
use with limited difficulty due to the large number of option numbers
available.
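A minimal sketch of building and parsing a packet in the newer header
layout (packet length, msg type, payload offset, xid) followed by
options with a two byte option number and two byte length. It assumes
no additional header bytes are present when building, and the
function names are hypothetical. Options are kept in a list because
their order can be significant.

```python
import struct

HEADER = struct.Struct("!HBBI")  # packet length, msg type, payload offset, xid
OPTION_HEADER = struct.Struct("!HH")  # option number, option length
MAX_PACKET_LENGTH = 2048


def build_packet(msg_type, xid, options):
    """options: ordered list of (option number, value bytes) pairs."""
    payload = b"".join(
        OPTION_HEADER.pack(code, len(value)) + value for code, value in options
    )
    length = HEADER.size + len(payload)
    if length > MAX_PACKET_LENGTH:
        raise ValueError("packet exceeds 2048 octets")
    return HEADER.pack(length, msg_type, HEADER.size, xid) + payload


def parse_packet(data):
    """Return (msg type, xid, ordered list of (option number, value))."""
    if len(data) < HEADER.size:
        raise ValueError("packet shorter than fixed header")
    length, msg_type, payload_offset, xid = HEADER.unpack_from(data)
    if length != len(data) or length > MAX_PACKET_LENGTH:
        raise ValueError("bad packet length")
    if not HEADER.size <= payload_offset <= length:
        raise ValueError("bad payload offset")
    options = []
    pos = payload_offset  # skips any additional header bytes
    while pos < length:
        if pos + OPTION_HEADER.size > length:
            raise ValueError("truncated option header")
        code, opt_len = OPTION_HEADER.unpack_from(data, pos)
        pos += OPTION_HEADER.size
        if pos + opt_len > length:
            raise ValueError("truncated option value")
        options.append((code, data[pos:pos + opt_len]))
        pos += opt_len
    return msg_type, xid, options
```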
6.2.1. binding-status
This option is used to convey the current state of a binding.
Code Len Type
+-----+-----+------+-----+-----+
| 0 | 1 | 0 | 1 | 1-9 |
+-----+-----+------+-----+-----+
Legal values for this option are:
Value Binding Status
----- ------------------------------------------------
1 FREE Lease has never been used
2 ACTIVE Lease is assigned to a client
3 EXPIRED Lease has expired
4 RELEASED Lease has been released by client
5 ABANDONED A server or client flagged the address as unusable
6 RESET Lease was freed by some external agent
7 BACKUP Lease belongs to secondary's private address pool
8 EXPIRED-GRACE Lease will become available after this period
9 RELEASED-GRACE Lease will become available after this period
6.2.2. assigned-IP-address
The IP address to which this message refers.
Code Len Address
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 2 | 0 | 4 | a1 | a2 | a3 | a4 |
+-----+-----+------+-----+----+-----+-----+-----+
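As an illustration of the layout above, the assigned-IP-address
option could be encoded as follows. The constant and function names
are hypothetical; only the option number 2 and the four octet length
come from the diagram.

```python
import ipaddress
import struct

ASSIGNED_IP_ADDRESS = 2  # option number shown in the diagram above


def encode_assigned_ip_address(address: str) -> bytes:
    """Encode an IPv4 address as a two byte code, two byte length option."""
    packed = ipaddress.IPv4Address(address).packed  # 4 octets, network order
    return struct.pack("!HH", ASSIGNED_IP_ADDRESS, len(packed)) + packed
```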
6.2.3. sending-server-IP-address
The IP address of the server sending this message.
Code Len Address
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 3 | 0 | 4 | a1 | a2 | a3 | a4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.4. addresses-transferred
A 32 bit unsigned long in network byte order. Reports the number of
addresses transferred by the primary to the secondary server
(addresses to be used for the secondary server's private address
pool).
Code Len Number of Addresses
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 4 | 0 | 4 | n1 | n2 | n3 | n4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.5. client-identifier
The format, code and conventions used are identical to DHCP option
61.
Code Len Client Identifier
+-----+-----+------+-----+----+-----+---
| 0 | 5 | 0 | n | i1 | i2 | ...
+-----+-----+------+-----+----+-----+--
6.2.6. client-hardware-address
The format is similar to DHCP option 61. Byte t1 (type) MUST be set
to the proper ARP hardware address code, as defined in the ARP
section of RFC 1700 (it MUST NOT be zero).
Code Len MAC address
+-----+-----+------+-----+----+-----+-----+---
| 0 | 6 | 0 | n | t1 | m1 | m2 | ...
+-----+-----+------+-----+----+-----+-----+---
Either Client Id, Client Hardware Address or BOTH MAY be present in
binding update transactions. At least one of them MUST be present.
If both are present, the Client Id MUST be used to uniquely identify
the owner of the binding (exactly as in RFC 2131).
6.2.7. client-FQDN
If an implementation supports Dynamic DNS updates, this option can be
used to communicate the DNS name that was set. Uses the format of the
Client FQDN option (81) as described in [DDNS] and extended to fit in
the two byte code and length approach of the DHCP panel.
Code Len Flags Rcode1 Rcode2 Domain Name
+-----+-----+------+-----+-----+------+------+-----+------
| 0 | 7 | 0 | n | f | r1 | r2 | d1 | d2...
+-----+-----+------+-----+-----+------+------+-----+------
6.2.8. reject-reason
This option is used to selectively reject binding updates. It MAY be
used in a BNDACK message, always associated with an assigned-IP-address
option, which contains the IP address of the update being rejected.
Code Len Reason Code
+-----+-----+------+-----+----------+
| 0 | 8 | 0 | 1 | R1 |
+-----+-----+------+-----+----------+
Reason codes :
0 Reserved
1 Illegal IP address (not part of any address pool)
2 Fatal conflict exists: address in use by other client.
3 Missing binding information.
4 Connection rejected, time mismatch too great.
5 Connection rejected, invalid MCLT.
6 Connection rejected, unknown reason.
7 Connection rejected, duplicate connection.
8 Connection rejected, invalid failover partner.
9 TLS not supported
10 TLS supported but not configured
11 TLS required but not supported by partner
12 Message digest not supported
13 Message digest not configured
14 Protocol version mismatch
15 Missing binding information
16 Outdated binding information
17 Less critical binding information
18-253, reserved.
254 Unknown: Error occurred but does not match any reason code
255 Reserved for code expansion
6.2.9. message
This option is used to supply a human readable message. It may be
used in association with the Reject Reason Code to provide a human
readable error message for the reject.
Code Len Text
+-----+-----+------+-----+------+-----+--
| 0 | 9 | 0 | n | c1 | c2 | ...
+-----+-----+------+-----+------+-----+--
6.2.10. MCLT
Maximum Client Lead Time, in seconds. A 32 bit integer value, in
network byte order.
Code Len Time
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 10 | 0 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.11. vendor-class-identifier
A string which identifies the vendor of the failover protocol
implementation.
The code for this option is 11, and its minimum length is 1.
Code Len vendor class string
+-----+-----+------+-----+----+-----+---
| 0 | 11 | 0 | n | c1 | c2 | ...
+-----+-----+------+-----+----+-----+---
6.2.12. current-time
The current time expressed as an absolute time in GMT represented as
seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t time value
representation).
Code Len Current Time
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 12 | 0 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.13. lease-expiration-time
The lease expiration time expressed as an absolute time in GMT
represented as seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t
time value representation).
The lease expiration time is the time that a server has ACKed to a
DHCP client.
Code Len Time
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 13 | 0 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.14. potential-expiration-time
The potential expiration time expressed as an absolute time in GMT
represented as seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t
time value representation).
The potential expiration time is the time that one server tells
another server that it may ACK to a client.
Code Len Time
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 14 | 0 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.15. grace-expiration-time
The grace expiration time expressed as an absolute time in GMT
represented as seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t
time value representation).
The grace expiration time is the time that a grace period will
expire.
Code Len Time
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 15 | 0 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.16. client-last-transaction-time
The time at which this server last received a DHCP request from a
particular client expressed as an absolute time in GMT represented as
seconds elapsed since Jan 1, 1970 (i.e. ANSI C time_t time value
representation).
Code Len Partner Down Time
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 16 | 0 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.17. start-time-of-state
The time at which the state contained in this message began,
expressed as an absolute time in GMT represented as seconds elapsed
since Jan 1, 1970 (i.e. ANSI C time_t time value representation).
This option is used for different states in different messages. In a
BNDUPD message it represents the start time of the state of the lease
in the BNDUPD message. In a STATE message, it represents the start
time of the partner server's failover state.
Code Len Start Time of State
+-----+-----+------+-----+----+-----+-----+-----+
| 0 | 17 | 0 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+------+-----+----+-----+-----+-----+
6.2.18. server-state
This option is used to convey the current state of the failover
endpoint in the sending server.
Code Len Server State
+-----+-----+------+-----+-----+
| 0 | 18 | 0 | 1 | 1-9 |
+-----+-----+------+-----+-----+
Legal values for this option are:
Value Server State Value Server State
----- ------------------------------------------------------------- ----- -------------------------------------------------------------
0 NO-STATE May only occur in POLL messages. 0 reserved
The partner should reply, but
should not react with any state
transition.
1 STARTUP Startup state (1) 1 STARTUP Startup state (1)
2 NORMAL Normal state 2 NORMAL Normal state
3 COMMUNICATIONS-INTERRUPTED Communication interrupted (safe) 3 COMMUNICATIONS-INTERRUPTED Communication interrupted (safe)
4 PARTNER-DOWN Partner down (unsafe mode) 4 PARTNER-DOWN Partner down (unsafe mode)
5 POTENTIAL-CONFLICT Synchronizing 5 POTENTIAL-CONFLICT Synchronizing
6 RECOVER Recovering bindings from partner 6 RECOVER Recovering bindings from partner
7 PAUSED Shutting down for a short period. 7 PAUSED Shutting down for a short period.
8 SHUTDOWN Shutting down for an extended 8 SHUTDOWN Shutting down for an extended
period. period.
9 RECOVER-DONE Interlock state prior to NORMAL 9 RECOVER-DONE Interlock state prior to NORMAL
6.2.19. server-flags
Note 1: The STARTUP state is never set in the State field of the mes- This option is used to convey the current flags of the failover
sage, but rather is represented by the setting of the STARTUP flag endpoint in the sending server.
(see the description of the Flags field immediately below). When the
server is in the STARTUP state, the state transmitted in the State
byte is the PREVIOUS state (usually, but not always, the last
recorded in stable storage prior to a server going down -- see sec-
tion 6.3 for details.)
flags - 1 byte Code Len Server Flags
+-----+-----+------+-----+-------+
| 0 | 19 | 0 | 1 | flags |
+-----+-----+------+-----+-------+
Currently, bits 7 (MSB), 6, and 5 are defined. All other bits are Legal values for this option are:
reserved, and must be set to 0.
o SECONDARY Currently, bit 5 is defined. All other bits
are reserved, and must be set to 0.
Bit 7 is the SECONDARY flag and defines the server role. Bit 7 o STARTUP
is 0 if the sender is a primary server, 1 if it is a secondary
server. Note that this role is fixed for the duration of the
relationship between primary and secondary server. In particu-
lar, it does not change when and if the secondary server "takes
over" for the primary server when it enters COMMUNICATIONS-
INTERRUPTED or PARTNER-DOWN state -- each server retains its
role throughout all of its state transitions.
o RESTART Bit 5 is the STARTUP flag. Bit 5 MUST be set to 1 whenever the
server is in STARTUP state, and set to 0 otherwise. (Note that
when in STARTUP state, the state transmitted in the server-state
option is usually the last recorded state from stable storage,
but see section 9.3 for details.)
Bit 6 is the RESTART flag. If bit 6 is 1, the sender is res- 6.2.20. vendor-specific-options
tarting. A server MUST set this bit every time it is re-
started, and it MUST clear the bit upon receiving the first
DHCPPRPL to a DHCPPOLL message it has sent with the bit set.
Whenever a DHCPPOLL message is sent with the RESTART bit set in This option is used to convey options specific to a particular
the 'flags' field, the MCLT Option, Option 235, MUST be vendor's implementation. The vendor class identifier is used to
included. specify which option space the embedded options are drawn from.
Whenever a message with the RESTART bit is received by a server, It functions similarly to the vendor class identifier and vendor
it MUST transition through the communications failed state tran- specific options in the DHCP protocol.
sition. The RESTART bit signals that the partner server has
been restarted, and if communications is already considered to
have failed, then nothing need be done. If, however, the
partner server appeared to be operating correctly, then it was
able to restart without the receiving server noticing that it
was ever gone. The communications failed transition is forced
in this case to restart any on-going resynchronization processes
that were operating with the partner server. See section 6.3
for additional information.
Whenever a DHCPPOLL message is sent with the RESTART bit set, This option contains other options in the same two byte code, two
byte length format. If this option appears in a message without a
corresponding vendor class identifier, it MUST be ignored.
Code Len Embedded options
+-----+-----+------+-----+----+-----+---
| 0 | 20 | 0 | n | c1 | c2 | ...
+-----+-----+------+-----+----+-----+---
the server SHOULD include a Vendor Class Identifier, Option 60, 6.2.21. max-unacked-bndupd
in the message to identify the server to its partner.
o STARTUP The maximum number of BNDUPD message that this server is prepared to
accept over the TCP connection without causing the TCP connection to
block.
Bit 5 is the STARTUP flag. Bit 5 MUST be set to 1 whenever the Code Len Maximum Unacked BNDUPD
server is in STARTUP state, and set to 0 otherwise. (Note that +-----+-----+------+-----+----+-----+-----+-----+
when in STARTUP state, the state transmitted in the 'state' | 0 | 21 | 0 | 4 | n1 | n2 | n3 | n4 |
field is usually the last recorded state from stable storage, +-----+-----+------+-----+----+-----+-----+-----+
but see section 6.3 for details.)
reserved - 2 bytes 6.2.22. server-role
2 filler bytes, reserved. This option is used to convey the role of the failover endpoint in
the sending server.
2.7. DHCPPOOLREQ and DHCPPOOLRESP: Code Len Role
+-----+-----+------+-----+-------+
| 0 | 22 | 0 | 1 | r1 |
+-----+-----+------+-----+-------+
A secondary server requests addresses for its unique use from the A value of 0 indicates that the failover endpoint is a primary server
primary server by using the DHCPPOOLREQ message. The primary is in and a value of 1 indicates that it is a secondary server.
complete charge of how many addresses the secondary receives.
The primary server will allocate IP addresses to the secondary server 6.2.23. receive-timer
upon receipt of a DHCPPOOLREQ message and inform the secondary server
of the number of additional addresses allocated in this allocation
cycle by sending the number in the DHCPPOOLRESP message.
When the primary server gets a DHCPPOOLREQ message, it computes which The number of seconds within which the server must receive a packet
addresses should be transferred to the secondary, and queues up from its partner, or it will assume that the partner is down or the
DHCPBNDUPD transactions by setting the Status of the selected communication path to the partner has failed.
addresses to "BACKUP". Having done this, it sends a DHCPPOOLRESP
message. The DHCPPOOLRESP message carries the "Number of addresses
transferred" as its payload. The primary server does not have to
wait until all the above binding updates have been acknowledged,
The secondary server keeps sending DHCPPOOLREQ messages until it Code Len Receive Timer
receives a DHCPPOOLRESP with "Number of addresses transferred" = 0, +-----+-----+------+-----+----+-----+-----+-----+
or it decides that the partner is not responding. | 0 | 23 | 0 | 4 | s1 | s2 | s3 | s4 |
+-----+-----+------+-----+----+-----+-----+-----+
If the secondary server receives a DHCPPOOLRESP message with "Number 6.2.24. hash-bucket-assignment
of addresses transferred" > 0, it MUST send another DHCPPOOLREQ mes-
sage, since additional addresses may still be waiting for it. How-
ever, the time at which it sends subsequent DHCPPOOLREQ messages is
implementation dependent. This mechanism makes it possible for the
primary server to pace the transfer (e.g., it could generate all
addresses all at once, or one-by-one) and to some degree for the
secondary to pace their receipt.
The set of hash values to which the receiving server MUST respond.
See section 5.3 for more information on how this option is used.
The primary server MUST respond to each DHCPPOOLREQ message it This option consists of a set of 32 bytes, in network byte order,
receives. If it has already generated all private addresses, or it where each bit corresponds to one of 256 possible hash bucket values.
has no available addresses, it MUST send DHCPPOOLRESP with "Number If a bit is set to 1, the recipient is required to service the
of addresses transferred" = 0. requests whose client-identifier or htype concatenated with the
chaddr (if no client-identifier exists) map into the corresponding
hash bucket.
The secondary server MAY send a DHCPPOOLREQ message at any time, and Code Len Hash Buckets
although the primary server is under no obligation to allocate any +-----+-----+------+-----+----+-----+-----+-----+
additional addresses, it MUST respond with a DHCPPOOLRESP indicating | 0 | 24 | 0 | 32 | b1 | b2 | ... | b32 |
how many new addresses it has allocated or 0 if no new addresses were +-----+-----+------+-----+----+-----+-----+-----+
allocated.
2.8. DHCPUPDATEREQ, DHCPUPDATEREQALL and DHCPUPDATEDONE: 6.2.25. message-digest
Whenever either server wishes to be updated with information the The message digest for this message.
other server knows but has not yet transmitted, it will send a
DHCPUPDATEREQ or DHCPUPDATEREQALL message.
When either server gets a DHCPUPDATEREQ or DHCPUPDATEREQALL message, This option consists of a variable number of bytes which contain the
it computes which updates should be transferred to the partner, and message digest of the message prior to the inclusion of this option.
queues up DHCPBNDUPD transactions as appropriate. Once all such
updates have been acknowledged, it sends a DHCPUPDATEDONE message.
If the message that initiated this process was a DHCPUPDATEREQ mes- When this option appears in a message, it MUST appear as the last
sage, the receiving server will transmit only DHCPBNDUPD messages for option in the message.
IP addresses which its information indicates that its partner has not
acked.
If, however, the message that initiated this process was a DHCPUP- Code Len Message Digest
DATEREQALL message, the receiving server will transmit DHCPBNDUPD +-----+-----+------+-----+----+-----+-----
messages for all IP addresses involved in failover with this partner | 0 | 25 | 0 | n | d1 | d2 | ...
in this role. +-----+-----+------+-----+----+-----+-----
The secondary server periodically re-transmits the DHCPUPDATEREQ mes- 6.2.26. protocol-version
sage, until it receives a DHCPUPDATEDONE message with a matching
'xid' field, or until it decides that the partner is not responding.
This approach is similar to the DHCPPOOLREQ/DHCPPOOLRESP message The protocol version being used by the server. It is only sent in the
exchange, with one critical difference: the DHCPPOOLRESP is sent as CONNECT and CONNECTACK messages.
soon as the binding updates are queued up, but the DHCPUPDATEDONE
message is deferred until all of the sender's DHCPBNDUPD messages
have been successfully transmitted and a corresponding DHCPBNDACK
message has been received for each of them.
The server processing a DHCPUPDATEREQ message MUST NOT send a Code Len Version
corresponding DHCPUPDATEDONE message until all of the DHCPBNDUPD mes- +-----+-----+------+-----+----+
sages have been acked by the partner with a DHCPBNDACK message. | 0 | 26 | 0 | 1 | v1 |
+-----+-----+------+-----+----+
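The option diagrams in this section draw each option with what appears to be a two-byte code and a two-byte length ahead of the data. As a minimal sketch under that reading of the diagrams (the helper names encode_option and decode_option are hypothetical and do not come from the draft), the protocol-version option could be packed and unpacked as follows:

```python
import struct

PROTOCOL_VERSION = 26  # option code shown in the protocol-version diagram


def encode_option(code: int, data: bytes) -> bytes:
    """Pack one option as 2-byte code, 2-byte length, then data (assumed layout)."""
    return struct.pack("!HH", code, len(data)) + data


def decode_option(buf: bytes, offset: int = 0):
    """Return (code, data, next_offset) for the option that starts at offset."""
    code, length = struct.unpack_from("!HH", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("option length runs past end of buffer")
    return code, buf[start:end], end
```

Network byte order is assumed here, as it is for the other multi-byte fields in the protocol.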
DRAFT November 1998 6.2.27. TLS-request
Any retransmissions of the DHCPUPDATEREQ message MUST have the same This option contains information relating to TLS security
transaction ID. Use of a new transaction ID may cause rebuilding of negotiation. It is sent in a CONNECT message
the outgoing binding update queue or other processing in the server
with a negative effect on performance.
2.9. DHCPBNDUPD The first byte, req, is the TLS request from this server. A value of
0 indicates no TLS operation, a value of 1 indicates that TLS
operation is desired, and a value of 2 indicates that TLS operation
is required to establish communications with this server.
One server notifies its partner of a binding state change by using The second byte, acc, is what this server will accept for TLS
the DHCPBNDUPD message. operation. A value of 0 means that this server will not accept TLS
connections. A value of 1 means that this server will accept TLS
connections.
Every DHCPBNDUPD message MUST contain: If req is not zero, then acc MUST be 1.
o An Assigned IP Address Option (Option 50). This allows a server which is not configured for TLS support to
inform its partner that it will accept a TLS connection although it
does not desire one, for instance.
o A DHCP Binding Status (Option X). Code Len request accept
+-----+-----+------+-----+----+----+
| 0 | 27 | 0 | 2 | req| acc|
+-----+-----+------+-----+----+----+
o Where the Binding Status is ACTIVE, EXPIRED, RELEASED, or RESET, 6.2.28. TLS-reply
it MUST also contain one or both of the Client Identifier
(Option 61) and the Client Hardware Address (Option X+3). In the
case where the Binding Status is ACTIVE, it MUST contain the
Lease Duration, Option 51.
o Where dynamic DNS updates are being used by the sending server, This option contains information relating to TLS security
the Client FQDN Option, Option 81, is used by the sender to negotiation. It is sent in a CONNECTACK message
communicate the status of the binding update to its partner.
In response to a binding update, the recipient server MUST respond
with a DHCPBNDACK message.
Multiple binding updates MAY be batched up, and sent in one Failover The value of 0 indicates no TLS operation, a value of 1 indicates
protocol message (see section 3.1). that TLS operation is required.
2.10. DHCPBNDACK Code Len TLS
+-----+-----+------+-----+----+
| 0 | 28 | 0 | 1 | t1 |
+-----+-----+------+-----+----+
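The req and acc rules given for the TLS-request option in 6.2.27 can be sketched as a small encoder. This is an illustration only: the function name is hypothetical, and the two-byte code and two-byte length layout is read off the diagram rather than stated in the text.

```python
import struct

TLS_REQUEST = 27  # option code shown in the TLS-request diagram


def encode_tls_request(req: int, acc: int) -> bytes:
    """Encode a TLS-request option, enforcing the constraints from 6.2.27."""
    if req not in (0, 1, 2):
        raise ValueError("req must be 0 (none), 1 (desired) or 2 (required)")
    if acc not in (0, 1):
        raise ValueError("acc must be 0 (will not accept) or 1 (will accept)")
    if req != 0 and acc != 1:
        # "If req is not zero, then acc MUST be 1."
        raise ValueError("acc MUST be 1 when req is not zero")
    return struct.pack("!HHBB", TLS_REQUEST, 2, req, acc)
```

A server that does not want TLS but will tolerate it would call encode_tls_request(0, 1), which is the case the text singles out.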
This message implements either a positive or negative acknowledgment 6.3. BNDUPD message format
of one or more binding updates.
A binding update, (or a batch of binding updates sent as one message) The binding update (BNDUPD) message is used to send the binding data-
are matched up with their associated acknowledgment by having the base changes to the partner server.
same 'xid' field value in the message header.
The server sending a DHCPBNDACK message MAY include any of the The message type for the BNDUPD message is 3.
options that are acceptable in a DHCPBNDUPD message when the
DHCPBNDACK message is returned to the sender. It MUST include at
least the Assigned IP Address Option.
If any of this information differs from the information in the The xid of the BNDUPD MUST be unique with respect to other failover
DHCPBNDUPD message, the receiver MUST NOT update its bindings messages transmitted from this failover endpoint.
The following table summarizes the various options for the BNDUPD
message.
database with that information upon receipt of the DHCPBNDACK mes- binding-status
sage, since the sender will have no way of knowing if the receiver
actually received the message.
The DHCPBNDACK MAY selectively reject one or more updates, by includ- Option ACTIVE EXPIRED RELEASED FREE
ing one or more IP address - Reject Reason option pairs in the mes- ------ ------ ------- -------- ----
sage body. assigned-IP-address MUST MUST MUST MUST
binding-status MUST MUST MUST MUST
client-identifier MAY MAY MAY MAY
client-hardware-address MUST MUST MUST MAY
lease-expiration-time MUST MUST NOT MUST NOT MUST NOT
potential-expiration-time MUST MUST NOT MUST NOT MUST NOT
grace-expiration-time MUST NOT MUST NOT MUST NOT MUST NOT
start-time-of-state SHOULD SHOULD SHOULD SHOULD
client-last-trans.-time SHOULD SHOULD SHOULD MAY
client-FQDN(1) SHOULD SHOULD SHOULD SHOULD
all others MAY MAY MAY MAY
The DHCPBNDACK implicitly acknowledges any binding updates it replies binding-status
to, except those it enumerates using Reject Reason Codes. BACKUP
EXPIRED- RELEASED- RESET
Option GRACE GRACE ABANDONED
------ ------ ----- ---------
assigned-IP-address MUST MUST MUST
binding-status MUST MUST MUST
client-identifier MAY MAY MAY(2)
client-hardware-address MAY MAY MAY(2)
lease-expiration-time MUST NOT MUST NOT MUST NOT
potential-expiration-time MUST NOT MUST NOT MUST NOT
grace-expiration-time MUST MUST MUST NOT
start-time-of-state SHOULD SHOULD SHOULD
client-last-trans.-time SHOULD SHOULD MAY
client-FQDN(1) SHOULD SHOULD SHOULD
all others MAY MAY MAY
Implementations of this protocol MAY send batched updates, and they (1) Only SHOULD appear if client supplies a host name and dynamic DNS
MUST be prepared to receive batched updates. is used.
2.11. DHCPPOLL (2) MUST NOT if binding-status is ABANDONED.
In the absence of other messages, a DHCPPOLL message is used to Table 6.3-1: Options used in a BNDUPD message
verify the communications integrity of the link between the primary
and secondary servers. It is used by either server whenever there is
some question about either the communications integrity or running
status of the other server.
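The MUST and MUST NOT cells of Table 6.3-1 lend themselves to a table-driven check. The sketch below covers only a subset of the table (the BACKUP/RESET/ABANDONED column and the SHOULD and MAY cells are left out), and the names RULES and check_bndupd are hypothetical:

```python
ALWAYS = {"assigned-IP-address", "binding-status"}
HW = {"client-hardware-address"}
LEASE_TIMES = {"lease-expiration-time", "potential-expiration-time"}
GRACE = {"grace-expiration-time"}

# binding-status -> (options that MUST appear, options that MUST NOT appear)
RULES = {
    "ACTIVE": (ALWAYS | HW | LEASE_TIMES, GRACE),
    "EXPIRED": (ALWAYS | HW, LEASE_TIMES | GRACE),
    "RELEASED": (ALWAYS | HW, LEASE_TIMES | GRACE),
    "FREE": (ALWAYS, LEASE_TIMES | GRACE),
    "EXPIRED-GRACE": (ALWAYS | GRACE, LEASE_TIMES),
    "RELEASED-GRACE": (ALWAYS | GRACE, LEASE_TIMES),
}


def check_bndupd(status, present):
    """Return a list of violations for the option names present in a BNDUPD."""
    must, must_not = RULES[status]
    present = set(present)
    problems = [f"missing {name}" for name in sorted(must - present)]
    problems += [f"forbidden {name}" for name in sorted(present & must_not)]
    return problems
```

An empty list means the option set satisfies every MUST and MUST NOT cell modelled here for that binding-status.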
Since current state and other status information is transmitted in 6.4. BNDACK message format
every DHCPPOLL and in every DHCPPRPL message, the DHCPPOLL and
DHCPPRPL exchange can also be used to signal a change in status by a
server or as a way to request an update of the status of its partner.
Whenever a DHCPPOLL message is generated it MUST have a unique value A server sends a binding acknowledgement (BNDACK) message when it has
in the 'xid' field, unless it is a retransmission of a previously successfully committed binding database changes received from a fail-
un-acked DHCPPOLL message. over partner in a BNDUPD message to its own stable storage.
2.12. DHCPPRPL The message type for the BNDACK message is 4.
This message simply replies to the DHCPPOLL message (PRPL = Poll The xid in a BNDACK MUST be the same as the xid of the corresponding
reply). Like all messages, it needs to have all of the fixed BNDUPD.
portions of the failover packet header filled in, including the state
and the flags fields.
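Because a BNDACK carries the xid of the BNDUPD it answers, the sender can pair the two with a simple lookup. The following is a minimal sketch, assuming a monotonically increasing counter is an acceptable way to keep xids unique on one endpoint; the class and method names are hypothetical:

```python
import itertools


class PendingUpdates:
    """Track BNDUPD messages that have not yet been answered by a BNDACK."""

    def __init__(self):
        self._next_xid = itertools.count(1)
        self._pending = {}

    def send_bndupd(self, binding):
        """Assign a fresh xid to an outgoing BNDUPD and remember its binding."""
        xid = next(self._next_xid)
        self._pending[xid] = binding
        return xid

    def receive_bndack(self, xid):
        """Return the binding this BNDACK answers, or None for an unknown xid."""
        return self._pending.pop(xid, None)

    def outstanding(self):
        """Number of BNDUPD messages still waiting for a BNDACK."""
        return len(self._pending)
```

An unknown xid returns None rather than raising, which leaves the caller free to decide how to treat a stray or duplicate BNDACK.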
3. Protocol Payload Data Format The following table summarizes the options for the BNDACK message.
Payload data is encoded as a set of flexible DHCP/BOOTP style options binding-status
[RFC 2132]. (The usual 1 byte option code, 1 byte length, and
"length" bytes of data). The options are placed after the header,
after skipping PayloadOffset bytes. The payload data options are not
preceded by a "cookie" value.
Option ACTIVE EXPIRED RELEASED FREE
------ ------ ------- -------- ----
assigned-IP-address MUST MUST MUST MUST
binding-status MUST MUST MUST MUST
client-identifier MAY MAY MAY MAY
client-hardware-address MUST MUST MUST MAY
reject-reason MAY MAY MAY MAY
message MAY MAY MAY MAY
lease-expiration-time MUST MUST NOT MUST NOT MUST NOT
potential-expiration-time MUST MUST NOT MUST NOT MUST NOT
grace-expiration-time MUST NOT MUST NOT MUST NOT MUST NOT
start-time-of-state SHOULD SHOULD SHOULD SHOULD
client-last-trans.-time SHOULD SHOULD SHOULD MAY
client-FQDN(1) SHOULD SHOULD SHOULD SHOULD
all others MAY MAY MAY MAY
Since the packet is NOT a DHCP/BOOTP protocol packet, the options binding-status
used here do not conflict with any existing "proper" DHCP/BOOTP BACKUP
options. In fact, these options are allocated in relationship to the EXPIRED- RELEASED- RESET
DHCP option space in the following way. Option GRACE GRACE ABANDONED
------ ------ ----- ---------
assigned-IP-address MUST MUST MUST
binding-status MUST MUST MUST
client-identifier MAY MAY MAY
client-hardware-address MAY MAY MAY(2)
reject-reason MAY MAY MAY
message MAY MAY MAY
lease-expiration-time MUST NOT MUST NOT MUST NOT
potential-expiration-time MUST NOT MUST NOT MUST NOT
grace-expiration-time MUST MUST MUST NOT
start-time-of-state SHOULD SHOULD SHOULD
client-last-trans.-time SHOULD SHOULD MAY
client-FQDN(1) SHOULD SHOULD SHOULD
all others MAY MAY MAY
In cases where the syntax and semantics of a Failover Payload Option (1) Only SHOULD appear if client supplies a host name and dynamic DNS
is identical to that of a DHCP/BOOTP option, the same option number is used.
is used. For options unique to the Failover protocol, option numbers
starting at 230 are used.
Thus, all new Failover protocol option numbers are assigned from a (2) MUST NOT if binding-status is ABANDONED.
continuous range beginning with 230.
The protocol is permissive in allowing various other DHCP options in Table 6.4-1: Options used in a BNDACK message
binding updates. As long as the sender wishes to use an option, it
MAY include it. On the other hand, the recipient MUST ignore any
option it is not prepared to process.
3.1. Batching multiple binding updates in one packet 6.5. Bulking for BNDUPD and BNDACK messages
Implementations of this protocol MAY send batched updates, and they DISCUSSION:
MUST be prepared to receive batched updates.
Multiple DHCPBNDUPD transactions MAY be batched together in one Bulking is planned for this protocol, but it hasn't been specified
protocol message. Data sets for individual transactions MUST always in this revision of the draft. Once the draft settles down, we
begin with the Assigned IP Address (Option 50). Option ordering will specify the bulking approach in detail.
between the Assigned IP Address options is not significant.
If batched updates are sent, they MUST be formatted as follows: 6.6. UPDREQ message format
Non-IP Address/Non-client specific options first The update request (UPDREQ) message is used by one server to request
Assigned IP address option (50) for the first address that its partner send it all binding database information that it has
Options pertaining to first address, including not already seen.
at least DHCP Binding Status (230)
Assigned IP address option (50) for the second address
Options pertaining to second address, including
at least DHCP Binding Status (230)
...
In case an implementation chooses to reject some or all of the IP The message type for the UPDREQ message is 9.
address binding information in a DHCPBNDUPD message in a DHCPBNDACK
reply, the DHCPBNDACK message MUST contain one or more Assigned IP
Address (Option 50) / Reject Reason Code pairs to indicate that the
updates for the address(es) were not accepted. The Assigned IP
Address options communicates which updates out of the batch are being
rejected, and the Reject Reason Code indicates why. Any IP addresses
The xid in a UPDREQ message MUST be unique among messages transmitted
from this failover endpoint during the life of this connection.
present in the DHCPBNDUPD message without corresponding Option 50/ There are no options that MUST appear in an UPDREQ message. Any
Reject Reason Code pairs in the DHCPBNDACK message are implicitly option MAY appear.
acked by the DHCPBNDACK message. If the DHCPBNDUPD message only con-
tains one binding update and that update is rejected, a DHCPBNDACK
with a single Assigned IP Address / Reject Reason Code pair MUST be
sent.
3.2. DHCP Binding Status 6.7. UPDREQALL message format
This option is used to convey the current state of a binding. This The update request all (UPDREQALL) message is used by one server to
option is mandatory for DHCPBNDUPD messages. request that all binding database information be sent in order to
recover from a total loss of its lease state database by the request-
ing server.
Code Len Type The message type for the UPDREQALL message is 7.
+-----+-----+-----+
| 230 | 1 | 1-7 |
+-----+-----+-----+
Legal values for this option are:
Value Binding Status The xid in a UPDREQALL message MUST be unique among messages
----- ------------------------------------------------ transmitted from this failover endpoint during the life of this con-
1 FREE Lease has never been used nection.
2 ACTIVE Lease is assigned to a client
3 EXPIRED Lease has expired
4 RELEASED Lease has been released by client
5 ABANDONED A server, or client flagged address as unusable
6 RESET Lease was freed by some external agent
7 BACKUP Lease belongs to secondary's private address pool
3.3. Assigned IP address There are no options that MUST appear in an UPDREQALL message. Any
option MAY appear.
Uses identical code and format to DHCP Option 50 (requested IP 6.8. UPDDONE message format
address). This option is mandatory for DHCPBNDUPD messages and in
any DHCPBNDACK message where a Reject Reason Code option appears.
Code Len Address The update done (UPDDONE) message is used by the responding server to
+-----+-----+-----+-----+-----+-----+ indicate that all requested updates have been sent by the responding
| 50 | 4 | a1 | a2 | a3 | a4 | server as BNDUPD messages and acked by the requesting server using
+-----+-----+-----+-----+-----+-----+ BNDACK messages. While a BNDACK message MUST have been received for
each IP address that was sent in a BNDUPD message, the BNDACK message
could have contained a reject-reason in order to NAK that specific
update.
Thus, this message confirms that the requesting server has received
and responded to a BNDUPD message for all of the requested updates,
but it does not require the requesting server to accept all of the
offered updates.
3.4. Absolute time The message type for the UPDDONE message is 8.
This absolute time is used for the lease grant time as well the The xid in an UPDDONE message MUST be identical to the xid in the
partner-down time. When used in a DHCPBNDUPD or DHCPBNDACK UPDREQ or UPDREQALL message that initiated the update process.
message, it represents the lease grant time. When used in a DHCPPOLL
message, it represents the partner-down time.
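The UPDDONE rule described in 6.8 (send it only after every BNDUPD generated for the request has been answered by a BNDACK, and reuse the xid of the request) can be sketched as a small tracker. The class name and its interface are hypothetical:

```python
class UpdateSession:
    """Track one UPDREQ or UPDREQALL on the responding server."""

    def __init__(self, request_xid, bndupd_xids):
        # xid of the UPDREQ/UPDREQALL that started the update process
        self.request_xid = request_xid
        # xids of the BNDUPD messages queued in response to it
        self.outstanding = set(bndupd_xids)

    def on_bndack(self, bndupd_xid):
        """Record a BNDACK; return the xid to use for UPDDONE, or None.

        A BNDACK counts even when it carries a reject-reason, since
        UPDDONE only confirms that every update was answered.
        """
        self.outstanding.discard(bndupd_xid)
        if not self.outstanding:
            return self.request_xid
        return None
```

A request that results in no BNDUPD messages at all is not handled by on_bndack; a caller would need to send UPDDONE directly in that case.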
An absolute, GMT time value for this option, as time synchronization There are no options that MUST appear in an UPDDONE message. Any
has already been achieved between the source and the target server option MAY appear.
using the time field in the message. Represented as seconds elapsed
since Jan 1, 1970 (i.e. ANSI C time_t time value representation).
Note that this is (at present) a signed field.
Code Len Time 6.9. POOLREQ message format
+------+-----+-----+-----+-----+-----+
| 231 | 4 | t1 | t2 | t3 | t4 |
+------+-----+-----+-----+-----+-----+
3.5. Number of addresses transferred to Secondary Server The pool request (POOLREQ) is used by the secondary server to request
an allocation of IP addresses from the primary server.
A 32 bit unsigned long in network byte order. Reports the number of The message type for the POOLREQ message is 1.
addresses transferred by the primary to the secondary server
(addresses to be used for the secondary server's private address
pool)
Code Len Number of Addresses The xid in a POOLREQ message MUST be unique among messages transmit-
+-----+-----+-----+-----+-----+-----+ ted from this failover endpoint during the life of this connection.
| 232 | 4 | n1 | n2 | n3 | n4 |
+-----+-----+-----+-----+-----+-----+
3.6. Lease Duration There are no options that MUST appear in a POOLREQ message. Any
option MAY appear.
Uses the format and code of the standard DHCP IP Address Lease Time 6.10. POOLRESP message format
option (51). The time is in units of seconds, and is specified as a
32-bit unsigned integer. A Lease Duration of 0xFFFFFFFF indicates an
infinite lease.
Code Len Lease Time The pool response (POOLRESP) is used by the primary server to inform
+-----+-----+-----+-----+-----+-----+ the secondary server how many IP addresses it was allocated as the
| 51 | 4 | t1 | t2 | t3 | t4 | result of a pool request.
+-----+-----+-----+-----+-----+-----+
The message type for the POOLRESP message is 2.
3.7. Client Identifier The xid in the POOLRESP message MUST be identical to the xid in the
POOLREQ message for which this POOLRESP is a response.
The format, code and conventions used are identical to DHCP option The following table shows the options that MUST appear in a POOLRESP
61. message:
Code Len Type Client-Identifier Option
+-----+-----+-----+-----+-----+--- ------
| 61 | n | t1 | i1 | i2 | ... addresses-transferred MUST
+-----+-----+-----+-----+-----+---
3.8. Client Hardware Address Table 6.10-1: Options used in a POOLRESP message
The format is similar to DHCP option 61. T1 (type) MUST be set to the 6.11. CONNECT message format
proper ARP hardware address code, as defined in the ARP section of
RFC 1700 (it MUST NOT be zero!)
Code Len Type MAC address The connect (CONNECT) message is used by either server to establish a
+-----+-----+-----+-----+-----+--- high level connection with the other server, and to transmit several
| 233 | n | t1 | m1 | m2 | ... important configuration data items between the servers.
+-----+-----+-----+-----+-----+---
Either Client Id, Client Hardware Address or BOTH MAY be present in The message type for the CONNECT message is 5.
binding update transactions. At least one of them MUST be present.
If both are present, the Client Id MUST be used to uniquely identify
the owner of the binding (exactly as in RFC 2131).
3.9. Host Name The xid in a CONNECT message MUST be unique among messages transmit-
ted from this failover endpoint during the life of this connection.
Uses the format and code of DHCP option 12. The CONNECT message MUST be the first message sent down a newly esta-
blished connection.
Code Len Host Name The following table summarizes the options that are associated with
+-----+-----+-----+-----+-----+-----+-----+-----+-- the CONNECT message:
| 12 | n | h1 | h2 | h3 | h4 | h5 | h6 | ...
+-----+-----+-----+-----+-----+-----+-----+-----+--
3.10. Domain Name role
Uses the format and code of DHCP option 15. Option primary secondary
------ ------ ---------
sending-server-IP-address MUST MUST
server-role MUST MUST
max-unacked-bndupd MUST MUST
receive-timer MUST MUST
current-time MUST MUST
vendor-class-identifier MUST MUST
protocol-version MUST MUST
TLS-request MUST(1) MUST(1)
MCLT MUST MUST NOT
hash-bucket-assignment MUST MUST NOT
all others MAY MAY
Code Len Domain Name (1) If the CONNECT message is being sent on a TLS secured connection,
+-----+-----+-----+-----+-----+-----+-- then there MUST NOT be a TLS-request option.
| 15 | n | d1 | d2 | d3 | d4 | ...
+-----+-----+-----+-----+-----+-----+--
Table 6.11-1: Options used in a CONNECT message
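Table 6.11-1 makes the required CONNECT options depend on the sender's role and on whether the connection is already TLS secured. A minimal sketch of that check follows; the function name and the tls_secured parameter are hypothetical:

```python
COMMON = {
    "sending-server-IP-address", "server-role", "max-unacked-bndupd",
    "receive-timer", "current-time", "vendor-class-identifier",
    "protocol-version",
}
PRIMARY_ONLY = {"MCLT", "hash-bucket-assignment"}


def check_connect(role, present, tls_secured=False):
    """Return a list of violations for the options present in a CONNECT."""
    present = set(present)
    must = set(COMMON)
    must_not = set()
    if role == "primary":
        must |= PRIMARY_ONLY
    elif role == "secondary":
        must_not |= PRIMARY_ONLY
    else:
        raise ValueError("role must be 'primary' or 'secondary'")
    # Footnote (1): no TLS-request on an already TLS secured connection.
    if tls_secured:
        must_not.add("TLS-request")
    else:
        must.add("TLS-request")
    problems = [f"missing {name}" for name in sorted(must - present)]
    problems += [f"forbidden {name}" for name in sorted(present & must_not)]
    return problems
```

Options in the "all others" row are MAY for both roles, so the check ignores them.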
3.11. Client FQDN 6.12. CONNECTACK message format
If an implementation supports Dynamic DNS updates, this option can be The connect response (CONNECTACK) message is used by a server to
used to communicate the DNS name that was set. Uses the format and respond to the receipt of a CONNECT message.
code of the Client FQDN option (81) as described in <draft-ietf-dhc-
dhcp-dns-08.txt>.
Code Len Flags Rcode1 Rcode2 Domain Name The message type for the CONNECTACK message is 6.
+-----+-----+-----+------+------+-----+------
| 81 | n | f | r1 | r2 | d1 | d2...
+-----+-----+-----+------+------+-----+------
3.12. Reject Reason Code The xid in the CONNECTACK message MUST be identical to the xid in the
CONNECT message for which this CONNECTACK is a response.
This option is used to selectively reject binding updates. It MAY be The following table summarizes the options associated with the CON-
used in DHCPBNDACK message, always following an option 50. Option 50 NECTACK message:
contains the IP address of the specific update being rejected.
Note that a Message option, DHCP Option 56, may be included to give a Option
human readable error indication along with the Reject Reason Code. ------
sending-server-IP-address MUST
server-role MUST
max-unacked-bndupd MUST
receive-timer MUST
current-time MUST
vendor-class-identifier MUST
protocol-version MUST
TLS-reply MUST(1)
reject-reason MAY(2)
message MAY
Code Len Reason code (1) If the CONNECTACK is being sent over an already TLS secured
+-----+-----+----------+ connection, then the TLS-reply option MUST NOT appear.
| 234 | 1 | R1 |
+-----+-----+----------+
Reason codes : (2) Indicates a rejection of the CONNECT message.
0 Reserved Table 6.12-1: Options used in a CONNECTACK message
1 Illegal IP address (not part of any address pool)
2 Fatal conflict exists: address in use by other client.
3 - 253 Reserved for new Reason Codes.
254 Unknown: Error occurred but does not match any reason code
255 Reserved for code expansion
6.13. STATE message format
3.13. Message The state (STATE) message is used by either server to communicate the
current state of the failover endpoint with the other server. It
MUST be sent immediately after a connection is established with
another server, and it MUST be sent whenever the server's state
changes.
This option is used to supply a human readable message. It may be The message type for the STATE message is 10.
used in association with the Reject Reason Code to provide a human
readable error message for the reject.
Code Len Text The xid in a STATE message MUST be unique among messages transmitted
+-----+-----+------+-----+-- from this failover endpoint during the life of this connection.
| 56 | 1 | c1 | c2 | ...
+-----+-----+------+-----+--
3.14. MCLT - Maximum Client Lead Time The following table shows the options that MUST appear in a STATE
message:
Maximum Client Lead Time, in seconds. A 32 bit integer value, in Option
network byte order. This option MUST be used in DHCPPOLL and DHCPPRPL ------
messages, when the server is NOT in normal state. sending-state MUST
server-flags MUST
start-time-of-state MUST
Code Len Time Table 6.13-1: Options used in a STATE message
+------+-----+-----+-----+-----+-----+
| 235 | 4 | t1 | t2 | t3 | t4 |
+------+-----+-----+-----+-----+-----+
3.15. Vendor Class Identifier 6.14. CONTACT message format
A string which identifies the vendor of the failover protocol The contact (CONTACT) message is used by either server to verify that
implementation. the connection is operational to the other server.
The code for this option is 60, and its minimum length is 1. The message type for the CONTACT message is 11.
Code Len Vendor Class Identifier The xid in a CONTACT message MUST be unique among messages transmit-
+-----+-----+-----+-----+-----+-- ted from this failover endpoint during the life of this connection.
| 60 | n | i1 | i2 | i3 | ...
+-----+-----+-----+-----+-----+--
4. Challenging scenarios for a Failover protocol The following table shows the options that MUST appear in a CONTACT
message:
There exist a number of failure scenarios which will challenge the Option
correctness guarantees of the Failover protocol. Two of the ------
scenarios that the Failover protocol was specifically designed to current-time MUST
handle correctly are detailed in this section in order to motivate
some of the more unusual aspects of the protocol's operations.
Table 6.14-1: Options used in a CONTACT message
4.1. Primary Server crash before "lazy" update: 7. Protocol Messages
In the case where the primary server sends a DHCPACK to a client for This section contains the detailed definition of the protocol mes-
a newly allocated IP address and then crashes prior to sending the sages, including the information to include when sending the message,
corresponding update to the secondary server, the secondary server as well as the actions to take upon receiving the message.
will have no record of the IP address allocation. When the secondary
server takes over, it may well try to allocate that IP address to a 7.1. BNDUPD message
different client. In the case where the first client to receive the
IP address is not on the net at the time (yet while there was still The binding update (BNDUPD) message is used to send the binding data-
time to run on its lease), an ICMP echo (i.e., ping) will not prevent base changes to the partner server, and the partner server responds
the secondary server from allocating that IP address to different with a binding acknowledgement (BNDACK) message when it has
successfully committed those changes to its own stable storage.
The rest of the failover protocol exists to determine whether the
partner server is able to communicate or not, and to enable the
partners to exchange BNDUPD/BNDACK messages in order to keep their
binding databases in stable storage synchronized.
7.1.1. Sending the BNDUPD message
A BNDUPD message SHOULD be generated whenever any binding changes. A
change might be in the binding-status, the lease-expiration-time, or
even just the last-transaction-time. In general, any time a DHCP
client sends in a packet that results in a DHCP server writing to its
stable storage, a BNDUPD message SHOULD be generated.
The BNDUPD (and BNDACK) messages refer to the binding-status of the
IP address, and this protocol defines a series of binding-statuses,
discussed in more detail below. Some servers may not support all of
these binding-statuses, and so in those cases they will not be sent,
and upon receipt a reasonable interpretation should be made.
All BNDUPD messages MUST contain the assigned-IP-address option,
which carries the IP address about which the BNDUPD message is
being sent.
All BNDUPD messages MUST contain the binding-status option, and it
will have one of the values in the following list. This list
discusses the meanings of the various binding-statuses and the infor-
mation that should go into the BNDUPD message because of them.
o ACTIVE
Indicates that the IP address is currently leased to a DHCP
client. client.
This is handled in the protocol by having the primary and secondary client-hardware-address
allocate addresses for new clients from distinct address pools.
A more likely (in that DHCPRENEWs are presumably more common than The client-hardware-address option MUST appear, and be set from
DHCPDISCOVERs) and more subtle version of this problem is where the the MAC address of the DHCP client to which this IP address is
primary server crashes after extending a client's lease time, and leased.
before updating the secondary with a new time using a lazy update.
After the secondary takes over, if the client is not connected to the
network the secondary will believe the client's lease has expired
when, in fact, it has not. In this case as well, the IP address
might be reallocated to a different client while the first client is
still using it.
This scenario is handled by the Failover protocol through control of client-identifier
the lease time and the use of the maximum client lead time (MCLT).
See the next section for details.
4.2. Network partition where servers can't communicate but each can If the DHCP client to which this IP address is leased used a
talk to clients: client-identifier option to identify itself, then the client-
identifier MUST appear in the BNDUPD message, else it MUST NOT
appear.
Several conditions are required for this situation to occur. First, lease-expiration-time
due to a network failure, the primary and secondary servers cannot The lease-expiration-time option MUST appear, and be set to the
communicate. As well, some of the DHCP clients must be able to expiration time most recently ACKed to the DHCP client. Note
communicate with the primary server, and some of the clients must now that the time ACKed to a DHCP client is a lease duration in
only be able to communicate with the secondary server. When this seconds, while the lease-expiration-time option in a BNDUPD mes-
condition occurs, both primary and secondary servers could attempt to sage is an absolute time value.
allocate IP addresses for new clients from the same pool of available
addresses. At some point, then, two clients will end up being
allocated the same IP address. This will cause potentially serious
problems when the network failure that created this situation is
corrected.
This is handled in the protocol by having the primary and secondary potential-expiration-time
servers allocate addresses for new clients from distinct address
The potential-expiration-time option MUST appear, and be set to
a value beyond that of the lease-expiration time. This is the
value that is ACKed by the BNDACK message. A server sending a
BNDUPD message MUST be able to recover the potential-
expiration-time sent in every BNDUPD, not just those that
receive a corresponding BNDACK, in order to be able to protect
against possible duplicate allocation of IP addresses after
transitioning to PARTNER-DOWN state. See section 5.2.1 for
details as to why the potential-expiration-time exists and
guidelines for how to decide the value.
pools. o EXPIRED
The specifics of how these two scenarios are handled are supplied in A binding-status of EXPIRED is used when a client's binding on
the next section. an IP address has expired and the server does not wish to imple-
ment an expired-grace period. When the partner server ACK's the
BNDUPD of an EXPIRED IP address, the server sets its internal
state to FREE. It is then available for allocation to any client
of the primary server.
5. Duplicate Address Assignment Control client-hardware-address
There are several ways that the Failover protocol avoids the possi- There SHOULD be a DHCP client associated with the IP address
bility of duplicate address assignment. whose binding has expired. If there is, then the client-
hardware-address option MUST appear, and be set from the MAC
address of the DHCP client to which this IP address was leased.
5.1. Control of lease time client-identifier
The key problem with lazy update is that when the a server fails There SHOULD be a DHCP client associated with the IP address
after updating a client with a particular lease time and before whose binding has expired. If there is, then if the DHCP client
updating its partner, the partner will believe that a lease has to which this IP address was leased used a client-identifier
expired even though the client still retains a valid lease on that IP option to identify itself, then the client-identifier MUST
appear in the BNDUPD message, else it MUST NOT appear.
o RELEASED
A binding-status of RELEASED is used when a DHCP client sends in
a DHCPRELEASE message and the server does not wish to implement
a released-grace period. When the partner server ACK's the
BNDUPD of a RELEASED IP address, the server sets its internal
state to FREE, and it is available for allocation by the primary
server to any DHCP client.
client-hardware-address
There SHOULD be a DHCP client associated with the IP address
whose binding has been released. If there is, then the client-
hardware-address option MUST appear, and be set from the MAC
address of the DHCP client which released this IP address.
client-identifier
There SHOULD be a DHCP client associated with the IP address
whose binding has been released. If there is, then if the DHCP
client which released this IP address used a client-identifier
option to identify itself, then the client-identifier MUST
appear in the BNDUPD message, else it MUST NOT appear.
o FREE
A binding-status of FREE is used when a DHCP server needs to
communicate that an IP address is available for allocation to
another server, but it was not just released, expired, or reset
by a network administrator. When the partner server ACK's the
BNDUPD of a FREE IP address, the server sets its internal state
such that it is available for allocation by any DHCP client.
client-hardware-address
There MAY be a DHCP client associated with the IP address whose
binding is now desired to be FREE. If there is, then the
client-hardware-address option MUST appear, and be set from the
MAC address of the DHCP client which released this IP address.
client-identifier
There MAY be a DHCP client associated with the IP address whose
binding is now desired to be FREE. If there is, then if the
DHCP client which released this IP address used a client-
identifier option to identify itself, then the client-identifier
MUST appear in the BNDUPD message, else it MUST NOT appear.
o EXPIRED-GRACE
Some servers support a grace period after lease expiration, to
handle clock speed differences between clients and servers as
well as to limit the number of times names are removed and
subsequently added to dynamic DNS.
client-hardware-address
There MAY be a DHCP client associated with the IP address whose
binding has now expired. If there is, then the client-
hardware-address option MUST appear, and be set from the MAC
address of the DHCP client which released this IP address.
client-identifier
There MAY be a DHCP client associated with the IP address whose
binding has now expired. If there is, then if the DHCP client
which most recently leased this IP address used a client-
identifier option to identify itself, then the client-identifier
MUST appear in the BNDUPD message, else it MUST NOT appear.
grace-expiration-time
The grace-expiration-time option MUST appear, and is the length
of time that this server will wait before trying to make the IP
address available after the lease has expired for this IP
address. address.
In order to handle this problem, a period of time known as the "Max- o RELEASED-GRACE
imum Client Lead Time" (MCLT) is defined and must be known to both
the primary and secondary servers. Proper use of this time interval
places an upper bound on the difference allowed between the lease
time provided to a DHCP client by a server and the lease time known
by that server's partner. In order that this is not the maximum
lease time that a server can ever provide to a client, during a lazy
update the updating server typically updates its partner with lease
time information which is longer than the lease time previously given
to the client. This allows that server to give a longer lease time
to the client the next time the client renews its lease.
When moving to the PARTNER-DOWN state (where a server is allowed to Some servers support a grace period after lease release by a
reallocate the partner's IP addresses), a server will wait the Max- DHCP client, to handle clock speed differences between clients
imum Client Lead Time before allocating any IP addresses from its and servers as well as to limit the number of times names are
partner's pool to any new DHCP clients. Thus, any clients which have removed and subsequently added to dynamic DNS.
a lease on an IP address with a lease time greater than that known by
the server moving into PARTNER-DOWN state will either have contacted
that server during the MCLT period or their leases will have expired.
When a server has transitioned to PARTNER-DOWN state, it MUST NOT client-hardware-address
reallocate an IP address from one client to another client until an
additional maximum client lead time interval after the lease on the
first client expires. (Actually, until the maximum client lead time
after what it believes to be the lease expiration time of the first
client.)
The fundamental relationship on which much of the correctness of this There MAY be a DHCP client associated with the IP address whose
protocol depends is that the lease expiration time known to a DHCP binding has now been released by sending a DHCPRELEASE. If
client MUST NOT be more than the maximum client lead time greater there is, then the client-hardware-address option MUST appear,
and be set from the MAC address of the DHCP client which
released this IP address.
DRAFT November 1998 client-identifier
than the lease expiration time known to a server's partner. There MAY be a DHCP client associated with the IP address whose
binding has been released. If there is, then if the DHCP client
which most recently leased this IP address used a client-
identifier option to identify itself, then the client-identifier
MUST appear in the BNDUPD message, else it MUST NOT appear.
The remainder of this section makes the above fundamental relation- client-hardware-address
ship more explicit. There MAY be a DHCP client associated with the IP address whose
binding is now desired to be FREE. If there is, then the
client-hardware-address option MUST appear, and be set from the
MAC address of the DHCP client which released this IP address.
This protocol requires a DHCP server to deal with several different client-identifier
lease intervals and places specific restrictions on their relation-
ships. The purpose of these restrictions is to allow the other server
in the pair to be able to make certain assumptions in the absence of
an ability to communicate between servers.
The different lease times are: There MAY be a DHCP client associated with the IP address whose
binding is now desired to be FREE. If there is, then if the
DHCP client which released this IP address used a client-
identifier option to identify itself, then the client-identifier
MUST appear in the BNDUPD message, else it MUST NOT appear.
o desired client lease interval grace-expiration-time
The desired client lease interval is the lease interval that a The grace-expiration-time MUST appear, and is the length of time
DHCP server would like to give to a DHCP client in the absence that this server will wait before trying to make the IP address
of any restrictions imposed by the Failover protocol. Its available after the lease was released for this IP address
determination is outside of the scope of this protocol. Typi-
cally this is the result of external configuration of a DHCP
server.
o actual client lease interval o ABANDONED
The actual client lease interval is the lease interval that a An ABANDONED IP address is one that has been considered unusable
DHCP server gives out to a DHCP client. It may be shorter than by the DHCP subsystem. An IP address for which a valid PING
the desired client lease interval (as explained below). response was received SHOULD be set to ABANDONED.
o desired partner server lease interval client-hardware-address
The desired partner server lease interval is the lease expira- There SHOULD NOT be a DHCP client associated with an ABANDONED
tion interval the local server tells to its partner. IP address. The client-hardware-address option MUST NOT appear
in the BNDUPD message.
o acknowledged partner server lease interval client-identifier
The acknowledged partner server lease interval is the interval There SHOULD NOT be a DHCP client associated with the IP address
the partner server has most recently acknowledged. whose binding has now been ABANDONED. The client-identifier
option MUST NOT appear in the BNDUPD message.
The key restriction (and guarantee) that any server makes with o RESET
respect to lease intervals is that the actual client lease interval
never exceeds the acknowledged partner server lease interval (if any)
by more than a fixed amount. This fixed amount is called the "Max-
imum Client Lead Time" (MCLT).
The MCLT MAY be configurable, but for correct server operation it The RESET value of the binding-status is used to indicate that
MUST be the same and known to both the primary and secondary servers. this IP address was made available by operator command.
It is transmitted from the primary to the secondary in every message o BACKUP
The BACKUP value of binding-status indicates that this IP
address belongs to the secondary server, and can be allocated by
that server to a DHCP client at any time.
sent with the RESTART bit set, and also in every poll and poll reply client-hardware-address
message. The secondary MUST ensure that its value agrees with that
of the primary. See section 3.14 concerning the MCLT Option.
A server MUST record in its stable storage both the local server There MAY be a DHCP client associated with a BACKUP IP address.
lease interval and the most recently acknowledged partner server If there is, the client-hardware-address option MUST appear, and
lease interval for each IP address binding. It is assumed that the be set from the MAC address of the DHCP client to which this IP
desired client lease interval can be determined through techniques address was most recently associated.
outside of the scope of this protocol.
Again, the fundamental relationship among these times which MUST be client-identifier
maintained is:
actual client lease interval < There MAY be a DHCP client associated with this IP address. If
( acknowledged partner lease interval + MCLT ) the DHCP client to which this IP address is leased used a
client-identifier option to identify itself, then the client-
identifier MUST appear in the BNDUPD message, else it MUST NOT
appear.
The "acknowledged partner lease interval" is the acknowledged secon- The following option information is generic to all BNDUPD messages,
dary server lease interval for the primary server, and it would be regardless of the value of the binding-status.
the acknowledged primary server lease interval for the secondary
server when it is operating out of contact with the primary server.
Figure 5.1-1 illustrates an initial lease to a client using the rules o start-time-of-state
discussed in the example which follows it.
The start-time-of-state SHOULD appear. It is set to the time at
which this IP address first took on the state that corresponds to
the current value of binding-status.
DHCP Primary Secondary o last-transaction-time
Client Server Server
| | | The last-transaction-time value SHOULD appear. This is the time at
| >-DHCPDISCOVER-> | | which this DHCP server last received a packet from the DHCP client
| <---DHCPOFFER-< | | referenced by the client-identifier or client-hardware-address that
| | | was associated with the IP address referenced by the assigned-IP-
| >-DHCPREQUEST-> | | address.
| (selecting) | |
| | |
| <--------DHCPACK-< | |
| ^ (MCLT) | |
| : | >-DHCPBNDUPD--> |
| : | (1/2 MCLT + X ) |
| : | |
| : | <-DHCPBNDACK-< |
| MCLT / 2 | |
... : ... ...
| : | |
| V | |
| >-DHCPREQUEST-> | |
| (renew) | |
| | |
| <--------DHCPACK-< | |
| ^ (X) | |
| : | >-DHCPBNDUPD--> |
| : | ( 1/2 X + X ) |
| : | |
| : | <-DHCPBNDACK-< |
| X / 2 | |
| : | |
... ... ... ...
Figure 5.1-1: Lazy Update Message Traffic o client-FQDN
X = Desired Client Lease Interval
DISCUSSION: If the DHCP server is performing dynamic DNS operations on behalf
of the DHCP client represented by the client-identifier or client-
hardware-address, then it should include a client-FQDN option con-
taining the host name, domain name, and status of any dynamic DNS
operations enabled.
This protocol mandates no algorithm concerning these lease inter- The BNDUPD message SHOULD be sent as soon as possible from the time
vals, as long as above fundamental relationship is preserved. that the DHCP client received a response and the lease bindings data-
base is written on stable storage.
In the interests of clarity, however, let's examine a specific 7.1.2. Receiving the BNDUPD message
example. The MCLT in this case is 1 hour. The desired client
lease interval is 3 days, and its renewal time is half the lease
interval.
When a server receives a BNDUPD message, it needs to decide how to
process the message and whether the message represents a conflict
of any sort. The conflict resolution process is used on the receipt
of every BNDUPD message, not just those that are received while in
POTENTIAL-CONFLICT state, in order to increase the robustness of the
protocol.
The rules for this example are: There are two sorts of conflict. The first, more major conflict, is
when a server receives a BNDUPD message from its partner for an
ACTIVE IP address and finds that the client specified in the BNDUPD
message is different from the client associated with this ACTIVE IP
address in this server's bindings database.
o What to tell the client: The second sort of conflict is where the receiving server has in its
bindings database the client specified in the BNDUPD message associ-
ated with a different IP address.
Take the remainder of the acknowledged partner server lease These two conflict cases can both occur together with the same BNDUPD
interval. If this is a new lease, then this value will be zero. message.
If this remainder plus the MCLT is greater than the desired
client lease interval, give the client the desired client lease
interval else give the client the remainder plus the MCLT.
o What to tell the failover partner server: When receiving a BNDUPD message, the server first determines the IP
address from the assigned-IP-address option, and then determines if
there was any client associated with this IP address by looking for
the client-identifier option. If there is no client-identifier
option, then the server looks for a client-hardware-address option,
and ultimately determines the client's identity specified in the
BNDUPD.
Take the renewal interval (typically half of the actual client The client specified in the BNDUPD message is compared to the client
lease interval), and add to it the desired client lease inter- currently associated with the IP address in this server's bindings
val. database. If they are the same, continue. If there is no client in
this server's binding database, continue. If there is a client in
this server's bindings database, and it is different from that speci-
fied in the BNDUPD message, a 'client conflict' exists. See the sec-
tion below on conflict resolution. If the client specified in the
BNDUPD message is associated with a different IP address in this
server's bindings database in the same subnet, then an 'IP address
conflict' exists. This does not refer to the case where a single
client has addresses in multiple different subnets or administrative
domains, but rather the case where in the same subnet the client has
a lease on one IP address in one server and on a different IP
address on the other server. See the section below on conflict reso-
lution.
In operation this might work as follows: If none of the conflicts mentioned above exist, then develop a time
for both the BNDUPD message and the server's information.
When a primary server makes an offer for a new lease on an IP The time for both the BNDUPD and the server's information are
address to a DHCP client, it determines the desired client lease developed independently in the following way: If there is a client-
interval (in this case, 3 days). It then examines the ack- last-transaction time, use that. If there isn't, but there is a
nowledged partner lease interval (which in this case is zero) and start-time-of-state, use that. If there isn't, but there is a
determines the remainder of the time left to run, which is also client-expiration-time, use that. If there isn't, then use the time
zero. To this it adds the MCLT. Since the actual client the BNDUPD message was received for a BNDUPD message, and the current
lease interval cannot be allowed to exceed the remainder of the time for the server's information.
current partner lease interval plus the MCLT, the offer made to
the client is for the remainder of the current partner lease
interval (i.e., zero) plus the MCLT. Thus, the actual client
lease interval is 1 hour.
Once the primary server has performed the ACK to the DHCP client, Then the server determines the binding-status in the BNDUPD, and
it will update the secondary server with the lease information. takes the following actions based on binding-status:
However, the desired partner server lease interval will be com-
posed of the one half of the current actual client lease interval
added to the desired client lease interval. Thus, the secondary
server is updated with a DHCPBNDUPD with a lease interval of 3
days + 1/2 hour specified in the Lease Duration Option (Option
51).
When the primary server receives an ACK to its update of the (In the following list, to "accept" a BNDUPD means to update the
secondary server's (partner's) lease interval, it records that as server's bindings database with the information contained in the
the acknowledged partner server lease interval. A server MUST NOT BNDUPD and once that update is complete, send a BNDACK message
send a DHCPBNDACK in response to a DHCPBNDUPD message until it is corresponding to the BNDUPD message).
sure that the information in the DHCPBNDUPD message resides in its
stable storage. Thus, the primary server in this case can be sure
that the secondary server has recorded the desired partner server
lease interval in its stable storage when the primary server
receives a DHCPBNDACK message from the secondary server.
o ACTIVE in BNDUPD
When the DHCP client attempts to renew at T1 (approximately one If the BNDUPD is LATER than the server's information, accept it,
half an hour from the start of the lease), the primary server else reject it.
again determines the desired client lease interval, which is still
3 days. It then compares this with the remaining acknowledged
partner server lease interval (3 days + 1/2 hour) and adjusts for
the time passed since the secondary was last updated (1/2 hour).
Thus the remaining time on the acknowledged partner server lease
interval is 3 days. Adding the MCLT to this yields 3 days plus 1
hour, which is greater than the desired client lease interval of 3
days. So the client is renewed for the desired client lease
interval -- 3 days.
When the primary DHCP server updates the secondary DHCP server o EXPIRED or EXPIRED-GRACE in BNDUPD
after the DHCP client's renewal ACK is complete, it will calculate
the desired partner server lease interval as the T1 fraction of
the actual client lease interval (1/2 of 3 days this time = 1.5
days). To this it will add the desired client lease interval of 3
days, yielding a total desired partner server lease interval of
4.5 days. In this way, the primary attempts to have the secondary
always "lead" the client in its understanding of the client's
lease interval so as to be able to always offer the client the
desired client lease interval.
Once the initial actual client lease interval of the MCLT is past, If the binding-status in the receiving server's bindings data-
the protocol operates effectively like the DHCP protocol does base is ACTIVE, then reject the BNDUPD. Otherwise, accept the
today in its behavior concerning lease intervals. However, the BNDUPD.
guarantee that the actual client lease interval will never exceed
the remaining acknowledged partner server lease interval by more
than the MCLT allows full recovery from a variety of failures.
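
The lease arithmetic in the example above can be sketched as follows.
This is only an illustration of the two rules ("what to tell the
client" and "what to tell the failover partner server"), not a
required algorithm; the function names and the renewal_fraction
parameter are assumptions made for this sketch, and the values are
those of the example (MCLT of 1 hour, desired lease of 3 days).

```python
from datetime import timedelta

MCLT = timedelta(hours=1)        # example value from the text
DESIRED = timedelta(days=3)      # desired client lease interval


def actual_client_lease(desired, acked_partner_remaining, mclt):
    """Interval to give the client: never more than the remaining
    acknowledged partner interval plus the MCLT."""
    return min(desired, acked_partner_remaining + mclt)


def desired_partner_lease(actual, desired, renewal_fraction=0.5):
    """Interval to send to the partner: the client's renewal interval
    (assumed to be T1, half the actual lease) plus the desired lease."""
    return actual * renewal_fraction + desired


# Initial lease: nothing has been acknowledged yet, so the remainder is zero.
first_actual = actual_client_lease(DESIRED, timedelta(0), MCLT)
first_partner = desired_partner_lease(first_actual, DESIRED)

# Renewal at T1: half of the first lease has elapsed since the update.
elapsed = first_actual / 2
remaining = first_partner - elapsed
renew_actual = actual_client_lease(DESIRED, remaining, MCLT)
renew_partner = desired_partner_lease(renew_actual, DESIRED)
```

With these inputs the first lease is limited to the MCLT, while the
renewal receives the full desired interval because the partner has
already acknowledged a longer one.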
5.2. Controlled re-allocation of IP addresses If the binding-status in the BNDUPD is EXPIRED-GRACE and the
server receiving the BNDUPD does not implement a grace period
for expired leases, then the server MUST set its lease expira-
tion to the value held in the grace-expiration in the BNDUPD.
When in PARTNER-DOWN state (after a period defined in detail in sec- o RELEASED or RELEASED-GRACE in BNDUPD
tion 6.5.2 has passed), there are no restrictions on reallocating a
lease from one client to another.
In any other state, a server cannot reallocate an address from one If the BNDUPD is LATER than the server's information, accept it,
client to another without first notifying (through a DHCPBNDUPD mes- else reject it.
sage) and receiving acknowledgement (through a DHCPBNDACK message)
that its partner is aware that that first client is not using the
address.
This could be modeled in the following way (though this specific If the binding-status in the BNDUPD is RELEASED-GRACE and the
implementation is in no way required). An "available" IP address on server receiving the BNDUPD does not implement a grace period
a server may be allocated to any client. An IP address which was for released leases, then the server MUST set its lease expira-
leased to a client and which expired or was released by that client tion to the value held in the grace-expiration in the BNDUPD.
would take on a new state, say "pending-available". When an IP
address became "pending-available", the partner server would be
o FREE or BACKUP in BNDUPD
notified that this IP address was "available" through a DHCPBNDUPD. If the binding-status in the receiving server's database is
When the sending server received the DHCPBNDACK for that IP address ACTIVE and the lease-expiration-time has not yet been reached,
showing it was "available", it would move the IP address from reject it, else accept it.
"pending-available" to "available", and it would be available for
allocation to any clients.
A server MAY reallocate an IP address in "pending-available" state to o RESET or ABANDONED in BNDUPD
the same client with no restrictions.
5.3. Secondary renewal of leases Accept it under all circumstances.
When operating in NORMAL state, a secondary server MAY process 7.1.3. Conflict resolution when receiving the BNDUPD message
DHCPREQUEST messages for renewal or rebinding leases. In this case,
the requirements for control of lease time and re-allocation of IP
addresses are the same as that of the primary server.
6. Server Operation When either of the following conflicts exists between the informa-
tion in a BNDUPD message and the information held in the receiving
server's bindings database, it should be resolved in the following
manner:
This section discusses the operation of a server implementing the o client conflict
Failover protocol using the state transition diagram in Figure 6.2-1.
This is the common state transition diagram for both servers in a
pair.
6.1. Server Initialization This is the duplicate IP address allocation conflict. There are
two different clients each allocated the same address.
When a server starts it starts out in STARTUP state. See section 6.4 If times for both exist, use the LATER update, else use the
below for details. information from the primary server.
6.2. Establishing Communications Integrity o IP address conflict
Central to the operation of the Failover protocol is a notion of An IP address conflict exists when a client on one server is
"communications okay" or "communications failed". State transitions associated with one IP address, and on the other server with a
are taken in many cases when the status of communications with the different IP address in the same or a related subnet. If one
partner changes. binding-status is ACTIVE and the other is anything but ACTIVE,
then the information in the ACTIVE binding SHOULD be used. Oth-
erwise, if times exist, then the LATER SHOULD be used. Other-
wise, if times do not exist, then the information from the pri-
mary server should be used.
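
The two conflict resolution rules can be sketched as below. This is a
minimal illustration under stated assumptions: the BindingInfo record
and its field names are hypothetical, "times exist" is read as "both
sides have a developed time", and equal times (which the text does
not address) fall through to the second argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BindingInfo:
    """One server's view of a binding (field names are illustrative)."""
    client: Optional[str]
    ip: str
    status: str                     # e.g. "ACTIVE", "FREE", "EXPIRED"
    time: Optional[float] = None    # developed time, if any
    from_primary: bool = False


def resolve_client_conflict(a, b):
    """Two different clients hold the same address: the LATER update
    wins if both sides have a time, otherwise the primary's wins."""
    if a.time is not None and b.time is not None:
        return a if a.time > b.time else b
    return a if a.from_primary else b


def resolve_ip_conflict(a, b):
    """One client holds two addresses: an ACTIVE binding beats a
    non-ACTIVE one, then LATER wins, then the primary wins."""
    if (a.status == "ACTIVE") != (b.status == "ACTIVE"):
        return a if a.status == "ACTIVE" else b
    if a.time is not None and b.time is not None:
        return a if a.time > b.time else b
    return a if a.from_primary else b
```

Each function returns the record whose information is to be kept; how
the losing binding is then cleaned up is outside this sketch.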
A specific discipline exists for establishing and verifying communi- 7.2. BNDACK message
cations integrity. Communications is set to "okay" whenever a mes-
sage sent is acked by the partner. After an implementation dependent
length of time from the communications "okay" event the communica-
tions with the partner are deemed to have "failed" if no subsequent
acknowledgments have been received. Whenever a DHCPPRPL, DHCPUP-
DATEDONE, DHCPPOOLRESP or DHCPBNDACK is received this time period is
restarted.
Obviously, as the time period elapses, a server SHOULD send DHCPPOLL Every BNDUPD message that is received by a server MUST be responded
messages in order to elicit a DHCPPRPL message in reply, which will to with a corresponding BNDACK message. The receiving server SHOULD
respond quickly to every BNDUPD message but it MAY choose to respond
preferentially to DHCP client requests instead of BNDUPD messages,
since there is no absolute time period within which a BNDACK must be
sent in response to a BNDUPD message, and DHCP clients frequently do
have time constraints that must be met.
7.2.1. Sending the BNDACK message
reset the time period. The BNDACK message MUST contain the same xid as the corresponding
BNDUPD message.
While an implementation SHOULD restart this time period on every All of the options which appear in the BNDUPD message MUST be
DHCPUPDATEDONE, DHCPPOOLRESP or DHCPBNDACK or DHCPRPL, it MAY choose included in the BNDACK message. The values in the options MAY be
to only restart it on a DHCPPRPL. updated to reflect current information on the server sending the
BNDACK. Note that update of this information may be used for infor-
mational purposes, but MUST NOT be assumed to necessarily be recorded
in the stable storage of the server that sent the BNDUPD message
because there is no corresponding ACK of the BNDACK message. Any
information that SHOULD be recorded in the partner server's stable
storage MUST be transmitted in a subsequent BNDUPD.
This technique ensures that two-way communications integrity exists If the server is accepting the BNDUPD, the BNDACK message includes
between the servers. Were the timeout period to be reset on the only those options that appear in the BNDUPD message. If the server
receipt of any message from the partner, a network failure where one is rejecting the BNDUPD, the additional option reject-reason MUST
server could send but not receive messages to the partner could lead appear in the BNDACK message, and the message option SHOULD appear in
to failure of the entire redundant DHCP subsystem. For example, in a this case containing a human-readable error message describing in
situation where the primary could send but not receive any messages, some detail the reason for the rejection of the BNDUPD message.
the secondary would never take over from the primary and yet DHCP
clients would not receive any service.
6.3. Server State Transitions 7.2.2. Receiving the BNDACK message
Figure 6.2-1 is the diagram of the server state transitions. The When a server receives a BNDACK message, if it doesn't contain a
reject-reason option, that means that the BNDUPD message was accepted,
and the server which sent the BNDUPD MUST update its stable storage
with the potential-expiration-time value sent in the BNDUPD message
and returned in the BNDACK message. Other values sent in the BNDUPD
message MAY be used as desired.
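
One possible way to process a received BNDACK is sketched below. The
dictionary representation of messages, the "xid" key, and the
handle_bndack name are assumptions made for illustration; only the
option names come from this document.

```python
def handle_bndack(bndack, pending, stable_storage):
    """Process a BNDACK for a previously sent BNDUPD.

    bndack: dict of option name -> value, plus an "xid" entry.
    pending: dict of xid -> the BNDUPD options that were sent.
    stable_storage: dict of IP address -> acknowledged expiration time.
    Returns True if the partner accepted the update.
    """
    bndupd = pending.pop(bndack["xid"], None)
    if bndupd is None:
        return False                # no matching BNDUPD is outstanding
    if "reject-reason" in bndack:
        return False                # rejected: record nothing
    # Accepted: remember the expiration time the partner now holds.
    ip = bndupd["assigned-IP-address"]
    stable_storage[ip] = bndupd["potential-expiration-time"]
    return True
```

The sketch records the value that was sent in the BNDUPD rather than
any updated value in the BNDACK, since only the former is known to be
what the partner committed.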
7.3. UPDREQ message
The update request (UPDREQ) message is used by one server to request
that its partner send it all of the binding database information that
it has not already seen. Since each server is required to keep
track at all times of the binding information the other server has
received and ACKed, one server can request transmission of all un-
ACKed binding database information held by the other server by using
the UPDREQ message.
The UPDREQ message is used whenever the sending server cannot proceed
before it has processed all previously un-ACKed binding update infor-
mation, since the UPDREQ message should yield a corresponding UPDDONE
message. The UPDDONE message is not sent until the server that sent
the UPDREQ message has responded to all of the BNDUPD messages gen-
erated by the UPDREQ message with BNDACK messages. Thus, the sender
of the UPDREQ message can be sure upon receipt of an UPDDONE message
that it has received and committed to stable storage all outstanding
binding database updates.
See section 9, Protocol state transitions, for the details of when the
UPDREQ message is sent.
7.3.1. Sending the UPDREQ message
There are no options for the UPDREQ message.
The UPDREQ message is sent with a unique xid.
7.3.2. Receiving the UPDREQ message
A server receiving an UPDREQ message MUST send all binding database
changes that have not yet been ACKed by the sending server. These
changes are sent as undistinguished BNDUPD messages.
However, the server which received and is processing the UPDREQ mes-
sage MUST track the BNDACK messages that correspond to the BNDUPD
messages triggered by the UPDREQ message and, when they are all
received, the server MUST send an UPDDONE message.
When queuing up the BNDUPD messages for transmission to the sender of
the UPDREQ message, the receiving server MUST honor the value
returned in the max-unacked-bndupd option in the CONNECT or CONNEC-
TACK message that set up the connection with the sending server. It
MUST NOT send more BNDUPD messages without receiving corresponding
BNDACKs than the value returned in max-unacked-bndupd.
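
The interaction between the max-unacked-bndupd limit and the final
UPDDONE can be sketched as a simple sliding window. The UpdateSender
class, its send callback, and the xid numbering are hypothetical; the
sketch assumes a limit of at least one and leaves the contents of the
UPDDONE message unspecified.

```python
from collections import deque


class UpdateSender:
    """Tracks the BNDUPDs triggered by one UPDREQ (illustrative only)."""

    def __init__(self, unacked_bindings, max_unacked_bndupd, send):
        self.queue = deque(unacked_bindings)  # bindings still to send
        self.window = max_unacked_bndupd      # from CONNECT or CONNECTACK
        self.in_flight = set()                # xids awaiting a BNDACK
        self.send = send                      # callable(msg_type, payload)
        self.next_xid = 1
        self.done = False
        self._fill()

    def _fill(self):
        # Send BNDUPDs until the window is full or the queue is empty.
        while self.queue and len(self.in_flight) < self.window:
            xid = self.next_xid
            self.next_xid += 1
            self.in_flight.add(xid)
            self.send("BNDUPD", {"xid": xid, "binding": self.queue.popleft()})
        # Everything sent and ACKed: report completion exactly once.
        if not self.queue and not self.in_flight and not self.done:
            self.done = True
            self.send("UPDDONE", {})

    def on_bndack(self, xid):
        self.in_flight.discard(xid)
        self._fill()
```

The same structure would apply to UPDREQALL, with the queue holding
every binding rather than only the un-ACKed ones.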
7.4. UPDREQALL message
The update request all (UPDREQALL) message is used by one server to
request that its partner send it all of the binding database informa-
tion. This message is used to allow one server to recover from a
failure of stable storage and to restore its binding database in its
entirety from the other server.
A server which sends an UPDREQALL message cannot proceed until all of
its binding update information is restored, and it knows that all of
that information is restored when an UPDDONE message is received.
See section 9, Protocol state transitions, for the details of when the
UPDREQALL message is sent.
7.4.1. Sending the UPDREQALL message
There are no options for the UPDREQALL message.
The UPDREQALL message is sent with a unique xid.
7.4.2. Receiving the UPDREQALL message
A server receiving an UPDREQALL message MUST send all binding data-
base information to the sending server. These changes are sent as
undistinguished BNDUPD messages.
However, the server receiving the UPDREQALL message MUST track the
BNDACK messages that correspond to the BNDUPD messages triggered by
the UPDREQALL message and, when they are all received, the server MUST
send an UPDDONE message.
When queuing up the BNDUPD messages for transmission to the sender of
the UPDREQALL message, the receiving server MUST honor the value
returned in the max-unacked-bndupd option in the CONNECT or CONNEC-
TACK message that set up the connection with the sending server. It
MUST NOT send more BNDUPD messages without receiving corresponding
BNDACKs than the value returned in max-unacked-bndupd.
7.5. UPDDONE message
The update done (UPDDONE) message is used by a server receiving an
UPDREQ or UPDREQALL message to signify that it has sent all of the
BNDUPD messages requested by the UPDREQ or UPDREQALL request and that
it has received a BNDACK for each of those messages.
7.5.1. Sending the UPDDONE message
The UPDDONE message SHOULD be sent as soon as the last BNDACK message
corresponding to a BNDUPD message requested by the UPDREQ or
UPDREQALL is received from the server which sent the UPDREQ or
UPDREQALL.
7.5.2. Receiving the UPDDONE message
A server receiving the UPDDONE message knows that all of the informa-
tion that it requested by sending an UPDREQ or UPDREQALL message has
now been sent and that it has recorded this information in its stable
storage. It typically uses the receipt of an UPDDONE message to
move to a different failover state. See sections 9.5.2 and 9.8.3 for
details.
7.6. POOLREQ message
The pool request (POOLREQ) message is used by the secondary server to
request an allocation of IP addresses from the primary server. It
MUST be sent by a secondary server to a primary server to request IP
address allocation by the primary. The IP addresses allocated are
transmitted using normal BNDUPD messages from the primary to the
secondary.
The POOLREQ message SHOULD be sent from the secondary to the primary
whenever the secondary transitions into NORMAL state. It SHOULD
periodically be resent in order that any change in the number of
available IP addresses on the primary be reflected in the pool on the
secondary.
7.6.1. Sending the POOLREQ message
The POOLREQ message has no options. It must be sent with a unique
xid.
7.6.2. Receiving the POOLREQ message
When a primary server receives a POOLREQ message it SHOULD examine
the binding database and determine how many IP addresses the secon-
dary server should have, and set these IP addresses to BACKUP state.
It SHOULD then send BNDUPD messages concerning all of these IP
addresses to the secondary server.
Servers frequently have several kinds of IP addresses available on a
particular network segment. The failover protocol assumes that both
primary and secondary servers are configured in such a way that each
knows the type and number of IP addresses on every network segment
participating in the failover protocol. The primary server is
responsible for allocating the secondary server the correct propor-
tion of available IP addresses of each kind, and the secondary server
is responsible for being configured in such a way that it can tell
the kind of every IP address based solely on the IP address itself.
A primary server MUST keep track of how many IP addresses were allo-
cated as a result of processing the POOLREQ message, and send that
number in the POOLRESP message.
A primary server MAY choose to defer processing a POOLREQ message
until a more convenient time to process it, but it should not depend
on the secondary server to retransmit the POOLREQ message in that
case.
If a secondary server receives a POOLREQ message it SHOULD report an
error.
7.7. POOLRESP message
A primary server sends a POOLRESP message to a secondary server after
the allocation process for available addresses to the secondary
server is complete. Typically this message will precede some of the
BNDUPD messages that the primary uses to send the actual allocated IP
addresses to the secondary.
7.7.1. Sending the POOLRESP message
The POOLRESP message MUST contain the same xid as the corresponding
POOLREQ message.
The only option which MUST appear in a POOLRESP message is:
o addresses-transferred
The number of addresses allocated to the secondary server by the
primary server as a result of a POOLREQ is contained in the
addresses-transferred option in a POOLRESP message. Note this
is the number of addresses that are transferred to the secondary
in the primary's binding database as a result of the correspond-
ing POOLREQ message, and that it may be some time before they
can all be transmitted to the secondary server through the use
of BNDUPD messages.
7.7.2. Receiving the POOLRESP message
When a secondary server receives a POOLRESP message, it SHOULD send
another POOLREQ message if the value of the addresses-transferred
option is non-zero.
Typically, no other action is taken on the reception of a POOLRESP
message.
7.8. CONNECT message
The connect message is used to establish an application-level
connection over a newly created TCP connection. It gives the source
information for the connection, and some important configuration
information. It may be sent by either primary or secondary server.
It is sent by the initiator of a TCP connection.
7.8.1. Sending the CONNECT message
The CONNECT message MUST be the first message sent by the initiator
of a TCP connection after the establishment of a new TCP connection
with another server participating in the failover protocol.
The xid of the CONNECT message must be unique.
The IP address of the sending server MUST be placed in the sending-
server-IP-address option. This information is placed in an option
inside of the packet in order to allow the identity of the sender to
be covered by a shared secret.
The role of the sending failover endpoint (i.e., either primary or
secondary) MUST be placed in the server-role option.
The current time MUST be placed in the current-time option.
The number of BNDUPD messages the server can accept without blocking
the TCP connection MUST be placed in the max-unacked-bndupd option.
This MUST be a number equal to or greater than 1, SHOULD be a number
greater than 10, and SHOULD be a number less than 100.
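The range rules for max-unacked-bndupd can be illustrated with a
minimal sketch. The helper name check_max_unacked_bndupd is
hypothetical and is not defined by this protocol; only the three
numeric limits come from the text above.

```python
def check_max_unacked_bndupd(value):
    """Return (ok, warnings) for a proposed max-unacked-bndupd value.

    ok is False only when the MUST (value >= 1) is violated. The two
    SHOULD ranges produce warnings but do not make the value invalid.
    """
    if value < 1:
        return False, ["value MUST be equal to or greater than 1"]
    warnings = []
    if value <= 10:
        warnings.append("value SHOULD be greater than 10")
    if value >= 100:
        warnings.append("value SHOULD be less than 100")
    return True, warnings
```

A sender might log the warnings and still transmit the value, since
only the lower bound of 1 is mandatory.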
The length of the receive timer (tReceive, see section 8.3) MUST be
placed in the receive-timer option.
If the sending server is a primary server, then the MCLT MUST be
placed in the MCLT option.
If the sending server is a primary server, then the hash-bucket-
assignment option MUST be included in the CONNECT message. The value
of the hash-bucket-assignment option is determined from the specific
buckets that the primary server has determined that the secondary
server MUST service as part of the load-balancing algorithm. The way
in which the primary server determines this information is outside
the scope of this protocol definition. The primary server SHOULD be
able to be configured with a percentage of clients that the secondary
server will be instructed to service, and the primary server
SHOULD convert that percentage value into a corresponding set of bits
in the hash-bucket-assignment option that are set to a 1, indicating
that the secondary server MUST service clients which map to those
hash buckets.
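One possible conversion from a configured percentage to a set of
bucket bits is sketched below. Everything about the layout is an
assumption made for illustration: 256 buckets packed into 32 octets,
bucket 0 stored in the most significant bit of the first octet, and
the lowest-numbered buckets assigned to the secondary. The protocol
text leaves the actual selection method outside its scope.

```python
NUM_BUCKETS = 256  # assumption: one bit per hash bucket, 32 octets total


def buckets_for_percentage(percent):
    """Build a bitmap with roughly `percent` of the buckets set to 1."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    count = round(NUM_BUCKETS * percent / 100)
    bitmap = bytearray(NUM_BUCKETS // 8)
    for bucket in range(count):
        # assumption: bucket 0 is the most significant bit of octet 0
        bitmap[bucket // 8] |= 0x80 >> (bucket % 8)
    return bytes(bitmap)


def secondary_serves(bitmap, bucket):
    """True if the bit for `bucket` is 1, i.e. the secondary services it."""
    return bool(bitmap[bucket // 8] & (0x80 >> (bucket % 8)))
```

With a 50 percent setting this sketch marks buckets 0 through 127 for
the secondary and leaves buckets 128 through 255 to the primary.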
The vendor class identifier MUST be placed in the vendor-class-
identifier option.
The protocol-version option MUST be included in every CONNECT mes-
sage. The current value of the protocol version is 1.
The TLS-request option MUST be sent and contains the desired TLS
connection request as well as information concerning whether TLS is
supported. If this CONNECT message is being sent over an already
created TLS connection, the TLS-request option MUST NOT appear.
7.8.2. Receiving the CONNECT message
When a server receives a TCP connection on the failover port, it
should wait for a CONNECT message.
When a server receives a CONNECT message it should:
1. Record the time at which the message was received.
2. Examine the protocol-version option, and decide if this server
is capable of interoperating with another server running that
protocol version. If not, then send the CONNECTACK message
with the appropriate reject-reason. The server MUST include
its protocol-version in the CONNECTACK message.
3. Examine the TLS-request option. Figure out the TLS-reply
value based on the capabilities and configuration of this
server, and save it for the CONNECTACK message. If the
results of the TLS negotiation result in a connection rejec-
tion, then go immediately to send the CONNECTACK message.
The possibilities are:
CONNECT        CONNECTACK
TLS-request    TLS-reply
                           Reject
req    acc     t1          Reason     Comments
---    ---     --          ------     --------
 0      0      0
 0      0      1           11         receiver requires TLS
 0      1      0
 0      1      1
 1      0      -                      request doesn't make sense
 1      1      0
 1      1      1
 2      0      -                      request doesn't make sense
 2      1      0           9 or 10    receiver won't do TLS
 2      1      1
4. Check to see if there is a message-digest option in the CON-
NECT message. If there was, and the server does not support
message-digests, then reject the connection with the appropri-
ate reject-reason in the CONNECTACK.
5. Determine if the sender (from the sending-server-IP-address
option) and the role of the sender (from the server-role
option) represent a server with which the receiver was
configured to engage in failover activity.
If not, then the receiving server should reject the CONNECT
request by sending a CONNECTACK message with a reject-reason
value of: 8, invalid failover partner.
If it is, then the receiving failover endpoint should be
determined.
6. Decide if the time delta between the sending of the packet, in
the current-time option, and the receipt of the packet,
recorded in step 1 above, is acceptable. A server MAY require
an arbitrarily small delta in time values in order to set up a
failover connection with another server.
If the delta between the time values is too great, the server
should reject the CONNECT request by sending a CONNECTACK mes-
sage with a reject-reason of 4, time mismatch too great.
If the time mismatch is not considered too great then the
receiving server MUST record the delta between the servers.
The receiving server MUST use this delta to correct all of the
absolute times received from the other server in all time-valued
options. Note that servers can participate in failover with
arbitrarily great time mismatches, as long as the mismatch is
more or less constant.
7. If the receiving server is a secondary server, it MUST examine
the MCLT option in the CONNECT request and use the value of
the MCLT as the MCLT for this failover endpoint.
A receiving secondary server SHOULD be able to operate with
any MCLT sent by the primary, but if it cannot, then it
should send a CONNECTACK with a reject-reason of 5, MCLT
mismatch.
8. The receiving server MAY use the vendor-class-identifier to do
vendor specific processing.
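The TLS negotiation table in step 3 above can be read as a lookup
from the CONNECT TLS-request values (req, acc) and the CONNECTACK
TLS-reply value (t1) to an outcome. The sketch below encodes only
the rows of that table. The function name, the outcome strings, and
the use of a tuple for the "9 or 10" row are illustrative
assumptions, not part of the protocol.

```python
# Rows the table marks as "request doesn't make sense".
INVALID_REQUESTS = {(1, 0), (2, 0)}

# (req, acc, t1) -> reject-reason codes listed in the table.
REJECTS = {
    (0, 0, 1): (11,),    # receiver requires TLS
    (2, 1, 0): (9, 10),  # receiver won't do TLS (table lists "9 or 10")
}


def tls_outcome(req, acc, t1=None):
    """Return (outcome, reject_reasons) for one row of the table."""
    if req not in (0, 1, 2) or acc not in (0, 1):
        raise ValueError("unknown TLS-request value")
    if (req, acc) in INVALID_REQUESTS:
        return "invalid-request", ()
    if t1 not in (0, 1):
        raise ValueError("unknown TLS-reply value")
    reasons = REJECTS.get((req, acc, t1), ())
    if reasons:
        return "reject", reasons
    # A TLS-reply of 1 leads to TLS connection setup; 0 means no TLS.
    return ("use-tls" if t1 == 1 else "no-tls"), ()
```

The table does not list a reject-reason for the two rows marked
"request doesn't make sense", so the sketch returns an empty tuple
for them rather than guessing one.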
7.9. CONNECTACK message
The CONNECTACK message is sent to accept or reject a CONNECT message.
It is sent by the server which accepted the TCP connection and
received a CONNECT message.
7.9.1. Sending the CONNECTACK message
The xid of the CONNECTACK message must be that of the corresponding
CONNECT message.
The IP address of the sending server MUST be placed in the sending-
server-IP-address option. This information is placed in an option
inside of the packet in order to allow the identity of the sender to
be covered by a shared secret.
The role of the sending failover endpoint (i.e., either primary or
secondary) MUST be placed in the server-role option.
The current time MUST be placed in the current-time option.
The protocol-version option MUST be included in every CONNECTACK mes-
sage. The current value of the protocol version is 1.
If the connection has been rejected, the reject-reason option MUST be
placed in the CONNECTACK message with an appropriate reason, and a
message option SHOULD be included with a human-readable error message
describing the reason for the rejection in some detail. If the
reject-reason option appears, then the remaining options listed below
do not appear.
The results of the TLS negotiation MUST be placed in the TLS-reply
option. If this CONNECTACK message is being sent over an already TLS
secured connection, then there MUST NOT be a TLS-reply option.
If there was a message-digest option in the CONNECT message, then
there MUST be a message-digest in the CONNECTACK message if it does
not contain a reject-reason.
The number of BNDUPD messages the server can accept without blocking
the TCP connection MUST be placed in the max-unacked-bndupd option.
This SHOULD be a number greater than 10, and SHOULD be a number less
than 100.
The length of the receive timer (tReceive, see section 8.3) MUST be
placed in the receive-timer option.
If the sending server is a primary server, then the MCLT MUST be
placed in the MCLT option.
The vendor class identifier MUST be placed in the vendor-class-
identifier option.
If the server is rejecting the CONNECT message, then the reject-
reason option MUST appear. A message option MAY appear to give a
human readable version of the rejection reason.
After sending a CONNECTACK message, the server MUST send a STATE mes-
sage.
After sending a CONNECTACK message, the server MUST start two timers
for the connection: tSend and tReceive. The tSend timer SHOULD be
approximately 20 percent of the time in the receive-timer option in
the corresponding CONNECT message. The tReceive timer SHOULD be the
time sent in the receive-timer option in the CONNECTACK message.
The tReceive timer is reset whenever a message is received from this
TCP connection. If it ever expires, the TCP connection is dropped
and communications with this partner is considered not ok.
The tSend timer is reset whenever a packet is sent over this connec-
tion. When it expires, a CONTACT message MUST be sent.
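The two timers can be sketched as plain timestamp bookkeeping. The
class and method names below are hypothetical, times are passed in
explicitly so that the example does not depend on a real clock, and
"approximately 20 percent" is taken here as exactly one fifth.

```python
class ConnectionTimers:
    """Track tSend and tReceive for one failover connection."""

    def __init__(self, partner_receive_time, own_receive_time, now):
        # tSend: about 20 percent of the partner's receive-timer value.
        self.t_send = partner_receive_time / 5
        # tReceive: the value this server sent in its own receive-timer.
        self.t_receive = own_receive_time
        self.last_sent = now
        self.last_received = now

    def on_send(self, now):
        """Any packet sent over the connection resets tSend."""
        self.last_sent = now

    def on_receive(self, now):
        """Any message received from the connection resets tReceive."""
        self.last_received = now

    def must_send_contact(self, now):
        """True once tSend has expired and a CONTACT message is due."""
        return now - self.last_sent >= self.t_send

    def connection_dead(self, now):
        """True once tReceive has expired and the connection is dropped."""
        return now - self.last_received >= self.t_receive
```

A server that polls these two predicates from its event loop would
send a CONTACT message when the first becomes true and drop the TCP
connection when the second becomes true.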
7.9.2. Receiving the CONNECTACK message
When a CONNECTACK message is received, the following actions should
be taken:
1. Record the time the packet was received.
2. Check to see if there is a reject-reason option in the CONNEC-
TACK message. If not, continue with step 3. If there is a
reject-reason option, the server SHOULD report the error code.
If a message option appears a server SHOULD display the string
from the message option in a user visible way. The server
MUST close the connection if a reject-reason option appears.
3. Check to see if the xid on the CONNECTACK matches an outstanding
CONNECT message on this TCP connection.
If it does not, a server SHOULD report an error.
4. Check the value of the TLS-reply option, and if it was 1, then
skip processing of the rest of the CONNECTACK message, and
immediately enter into TLS connection setup.
5. Examine the value of the protocol-version option. If this
server is able to establish connections with another server
running this protocol version, then continue, else close the
connection.
6. Check to see if the sending-server-IP-address and server-role
in the CONNECTACK message correspond to the failover endpoint
for which this TCP connection was created.
If it was not, the server MUST drop the TCP connection and
SHOULD report an error.
7. Decide if the time delta between the sending of the packet, in
the current-time option, and the receipt of the packet,
recorded in step 1 above, is acceptable. A server MAY require
an arbitrarily small delta in time values in order to set up a
failover connection with another server.
If the delta between the time values is too great, the server
should drop the TCP connection.
If the time mismatch is not considered too great then the
receiving server MUST record the delta between the servers.
The receiving server MUST use this delta to correct all of the
absolute times received from the other server in all time-
valued options. Note that the failover protocol is con-
structed so that two servers can be failover partners with
arbitrarily great time mismatches.
8. If the receiving server is a secondary server, it MUST examine
the MCLT option in the CONNECTACK message and use the value of
the MCLT as the MCLT for this failover endpoint.
A receiving secondary server SHOULD be able to operate with
any MCLT sent by the primary, but if it cannot, then it MUST
drop the TCP connection.
9. The receiving server MAY use the vendor-class-identifier to do
vendor specific processing.
10. After accepting a CONNECTACK message, the server MUST send a
STATE message.
After receiving a CONNECTACK message, the server MUST start
two timers for the connection: tSend and tReceive. The tSend
timer SHOULD be approximately 20 percent of the time in the
receive-timer option in the corresponding CONNECTACK message.
The tReceive timer SHOULD be set to the time sent in the
receive-timer option in the CONNECT message.
The tReceive timer is reset whenever a message is received
from this TCP connection. If it ever expires, the TCP connec-
tion is dropped and communications with this partner is con-
sidered not ok.
The tSend timer is reset whenever a packet is sent over this
connection. When it expires, a CONTACT message MUST be sent.
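The time-delta handling in step 7 above (and in step 6 of section
7.8.2) can be sketched as follows. The function names are
hypothetical, all times are plain numbers of seconds, and network
transit time is ignored, which is a simplifying assumption of this
sketch.

```python
def clock_delta(partner_current_time, received_at):
    """Delta to add to the partner's absolute times to get local time.

    partner_current_time is the value of the current-time option and
    received_at is the local time recorded when the message arrived.
    """
    return received_at - partner_current_time


def accept_delta(delta, max_delta=None):
    """Decide whether the mismatch is acceptable.

    max_delta=None models a server that tolerates any mismatch.
    """
    return max_delta is None or abs(delta) <= max_delta


def to_local_time(partner_time, delta):
    """Correct an absolute time taken from a time-valued option."""
    return partner_time + delta
```

For example, if the partner's clock reads 1000 when the local clock
reads 1600, the delta is 600, and a time of 5000 received from the
partner corresponds to 5600 locally.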
7.10. STATE message
The state (STATE) message is used to communicate the current failover
state to the partner server.
The STATE message MUST be sent after sending a CONNECTACK message
that didn't contain a reject-reason option, and MUST be sent after
receiving a CONNECTACK message without a reject-reason option.
A STATE message MUST be sent whenever the failover endpoint changes
its failover state and a connection exists to the partner.
The STATE message requires no response from the failover partner.
7.10.1. Sending the STATE message
The current failover state is placed in the server-state option and
the current state of the STARTUP flag is placed in the server-flags
option.
The message is sent with a unique xid.
A server SHOULD only send the STATE message either when the connec-
tion is created (i.e, after sending or receiving a CONNECTACK message
with no reject-reason option), or when there is a change from the
values sent in a previous STATE message.
7.10.2. Receiving the STATE message
Every STATE message SHOULD indicate a change in state or a change in
the flags.
When a STATE message is received, any state transitions specified in
section 9 are taken.
No response to a STATE message is required.
7.11. CONTACT message
The contact (CONTACT) message is sent to verify communications
integrity with a failover partner. The CONTACT message is sent when
no messages have been sent to the failover partner for a specified
period of time. This is determined by the tSend timer expiring (see
section 8.3).
7.11.1. Sending the CONTACT message
The current time is placed in the current-time option, and the CON-
TACT message is sent.
7.11.2. Receiving the CONTACT message
When a CONTACT message is received, the tReceive timer is reset (as
it is with any message that is received).
A server MAY use the time in the current-time option and the time
recorded above to refine the delta time calculations between the
servers.
8. Connection Management
Servers participating in the failover protocol communicate over TCP
connections. These TCP connections are used both to transmit bind-
ing information from one server to another as well as to allow each
server to determine whether communications is possible with the other
server.
Central to the operation of the failover protocol is a notion of
"communications okay" or "communications failed". Failover state
transitions are taken in many cases when the status of communications
with the partner changes, and the existence or non-existence of a TCP
connection between failover endpoints is used to determine if
communications is "okay" or "failed".
A single TCP connection exists which connects two failover endpoints.
8.1. Connection granularity
There exists one TCP connection between each set of failover end-
points. See section 5.1.1 for an explanation of failover endpoint.
There are a maximum of two TCP connections between any two servers
implementing the failover protocol, one for each of the possible
failover endpoints between these two servers. There is a minimum of
one TCP connection between one server and every other failover server
with which it implements the failover protocol.
8.2. Creating the TCP connection
Every server implementing the failover protocol MUST listen on port
647 for incoming failover TCP connections. The source port of the
TCP connection is unimportant.
Every server implementing the failover protocol SHOULD attempt to
connect to all of its partners periodically, where the period is
implementation dependent and SHOULD be configurable. In the event
that a connection has been rejected by a CONNECTACK message with a
reject-reason option contained in it, a server SHOULD reduce the fre-
quency with which it attempts to connect to that server but it SHOULD
continue to attempt to connect periodically.
Once a connection is established, the first message sent across the
connection MUST be a CONNECT message. This message establishes the
identity of the failover endpoint making the connection.
Every CONNECT message includes a TLS-request option, and if the CON-
NECTACK message does not reject the CONNECT message and the TLS-reply
option says TLS MUST be used, then the servers will enter into TLS
negotiation.
Once that negotiation is complete, then the server MUST resend the
CONNECT message on the newly secured TLS connection and then wait for
the CONNECTACK message in response. The TLS-request and TLS-reply
options MUST have the same values in this second CONNECT and
CONNECTACK message as they had in the first messages.
The second message sent over a new connection is a STATE message.
Upon the receipt of this message, the receiver can consider communi-
cations up.
It is entirely possible that two servers will attempt to make connec-
tions to each other essentially simultaneously, and then each will
send a CONNECT message down the new connection. In this case each
server will receive a CONNECT message on one connection having
already sent a CONNECT message on the other connection. In the event
that the primary server receives a CONNECT message from the secondary
server either while waiting for a CONNECTACK message from a secondary
server or when it has a valid connection open to a secondary server,
it will close the connection on which the CONNECT message was
received.
8.3. Using the TCP connection for determining communications status
The TCP connection is used to determine the communications status of
the other server, i.e., communications-ok, or communications-
interrupted.
Three things must happen for a server to consider that communications
are ok with respect to another server:
1. A TCP connection must be established to the other server.
2. A CONNECT message must be received and a CONNECTACK message
sent in response. The CONNECT message is used to determine
the identity of the failover endpoint of the other end of the
TCP connection -- without it, the failover endpoint cannot be
uniquely determined. Without knowledge of the failover
endpoint, the entity with which communications is ok is
undetermined.
3. A STATE message must be received from the other server over
the connection. This STATE message initializes important
information necessary to the operation of the state machine
that governs the behavior of this failover endpoint.
There are two ways that a server can determine that communications
has failed:
1. The TCP connection can go down, yielding an error when
attempting to send a message. This will happen at least as
often as the period of the tSend timer.
2. The tReceive timer can expire.
In either of these cases, communications is considered interrupted.
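The three conditions for "communications ok" and the two failure
triggers can be sketched as a small status tracker. The class and
attribute names are hypothetical. The sketch only records which
conditions have been met and does not itself perform any I/O.

```python
class CommStatus:
    """Track whether communications with the partner is ok."""

    def __init__(self):
        self.tcp_up = False             # condition 1: TCP connection exists
        self.connect_exchanged = False  # condition 2: CONNECT/CONNECTACK done
        self.state_received = False     # condition 3: STATE message received

    def is_ok(self):
        """Communications is ok only when all three conditions hold."""
        return self.tcp_up and self.connect_exchanged and self.state_received

    def fail(self):
        """Called on a TCP send error or when tReceive expires.

        After a failure all three conditions must be met again on a
        new connection before communications is ok.
        """
        self.tcp_up = False
        self.connect_exchanged = False
        self.state_received = False
```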
Several difficulties arise when trying to use one TCP connection for
both bulk data transfer as well as to sense the communications status
of the other server. One aspect of the problem stems from the dif-
ferent requirements of both uses. The bulk data transfer is of
course critically important to the protocol, but the speed with which
it is processed is not terribly significant. It might well be
minutes before a BNDUPD message is processed, and while not optimal,
such an occasional delay doesn't compromise the correctness of the
protocol. However, the speed with which one server detects the other
server is up (or, more importantly, down) is more highly constrained.
Generally one server should be able to detect that the other server
is not communicating within a minute or less.
These differing time constraints make it difficult to use the same
TCP connection for data transfer as well as to sense communications
integrity. See section 3.5 for additional details on TCP.
The solution to this problem is to require that some message be
received by each end of the connection within a limited time or that
the connection will be considered down. If no messages have been
sent recently, then a CONTACT message is sent.
In the case where there is no data queued to be sent, this is not a
problem, but in the case where there is data queued to be sent to the
partner, then the CONTACT message will not actually be transmitted
until the queued data is sent. Section 3.5 explains why waiting for
TCP to determine that the connection is down is not acceptable, and
leads to a requirement that the receiving server never block the sending
server from sending CONTACT packets.
In order to meet this requirement, each server tells the other server
the number of outstanding BNDUPD messages that it will accept. The
receiving server is required to always be able to accept that many
BNDUPD messages off of the connection's input queue even if it cannot
process them immediately, and to accept all other messages immedi-
ately.
Thus, the sending server's TCP is never blocked from sending a mes-
sage except for very short periods, less than a few seconds unless
the network connection itself has problems. In this case, if the
CONTACT messages don't make it to the partner then the partner will
close the connection.
8.4. Using the TCP connection for binding data
Binding data, in the form of BNDUPD messages and BNDACK messages to
respond to them, are sent across the TCP connection.
In order to support timely detection of any failure in the partner
server, the TCP connection MUST NOT block for more than a very short
time, on the order of a few seconds. Therefore, a server that is
sending BNDUPD messages MUST send only a restricted number before
receiving BNDACK messages about previous messages sent.
The number of outstanding BNDUPD messages that each server will
accept without causing TCP to block transmission of additional data
(i.e, CONTACT messages) is sent by each server in the CONNECT and
CONNECTACK messages in the max-unacked-bndupd option.
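The limit on outstanding BNDUPD messages behaves like a simple send
window. The sketch below is one way to honor it. The class name, the
send callable, and the assumption that a BNDACK is matched to its
BNDUPD by xid are all illustrative and are not taken from this
section.

```python
from collections import deque


class BndupdSender:
    """Queue BNDUPD messages and never exceed the partner's limit."""

    def __init__(self, max_unacked, send):
        self.max_unacked = max_unacked  # partner's max-unacked-bndupd value
        self.send = send                # hypothetical callable(xid, update)
        self.queue = deque()            # updates waiting for window space
        self.unacked = set()            # xids sent but not yet acknowledged

    def enqueue(self, xid, update):
        """Queue one update and send it if the window allows."""
        self.queue.append((xid, update))
        self._pump()

    def on_bndack(self, xid):
        """A BNDACK frees one window slot; unknown xids are ignored."""
        self.unacked.discard(xid)
        self._pump()

    def _pump(self):
        # Send queued updates while fewer than max_unacked are outstanding.
        while self.queue and len(self.unacked) < self.max_unacked:
            xid, update = self.queue.popleft()
            self.unacked.add(xid)
            self.send(xid, update)
```

Because only BNDUPD messages pass through this queue, CONTACT and
other control messages are not held back by the window.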
8.5. Using the TCP connection for control messages
The TCP connection is used for control messages: POOLREQ, UPDREQ,
STATE, UPDREQALL and the corresponding reply messages: POOLRESP,
UPDDONE. A server MUST immediately accept all of these messages from
the TCP connection. A server MUST immediately accept any BNDACK
which is received as well.
8.6. Losing the TCP connection
When the TCP connection is lost, then communications is not ok with
the other server. A server which has lost communications SHOULD
immediately attempt to reconnect to the other server, and should
retry these connection attempts periodically.
Any BNDUPD or other messages that have been received but not yet pro-
cessed from the partner SHOULD be processed as soon as possible.
9. Protocol States
This section discusses the various states that a failover endpoint may
take, and the server actions required when entering the state, operating
in the state, and leaving the state, as well as the events that cause
transitions out of the state into another state.
The state transition diagram in Figure 9.2-1 is relevant for this
section. In the event that the textual description of a state differs
from the state transition diagram, the textual description is to be con-
sidered authoritative. This is the common state transition diagram for
both servers in a failover pair.
9.1. Server Initialization
When a server starts it starts out in STARTUP state. See section 9.4
below for details.
9.2. Server State Transitions
Whenever a server transitions into a new state, it MUST record the
state and the time at which it entered that state in stable storage.
If communications is "ok", it MUST also send a STATE message to its
failover partner.
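The requirement to record each new state before announcing it can be
sketched as below. The file format (a small JSON document), the
function names, and the send_state callable are assumptions made for
illustration. The protocol only requires that the state and its entry
time reach stable storage.

```python
import json
import os
import time


def record_state(path, state, entered_at=None):
    """Atomically persist the failover state and the time it was entered."""
    entered_at = time.time() if entered_at is None else entered_at
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"state": state, "entered_at": entered_at}, f)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk before renaming
    os.replace(tmp, path)     # replace the old record in a single step


def enter_state(path, state, comms_ok, send_state, entered_at=None):
    """Record the new state first, then tell the partner if possible."""
    record_state(path, state, entered_at)
    if comms_ok:
        send_state(state)  # hypothetical callable that sends a STATE message
```

Writing to a temporary file and renaming it means that a crash during
the write leaves the previous record intact.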
Figure 9.2-1 is the diagram of the server state transitions. The
remainder of this section contains information important to the
understanding of that diagram.
The server stays in the current state until all of the actions speci-
fied on the state transition are complete. If communications fails
during one of the actions, the server simply stays in the current
state and attempts a transition whenever the conditions for a transi-
tion are later fulfilled.
In the state transition diagram below, the "+" or "-" in the upper
right corner of each state is a notation about whether communication
is ongoing with the other server.
The legend "responsive", "balanced", or "unresponsive" in each state
indicates whether the server is responsive to all DHCP client
requests, running in load balanced mode, or totally unresponsive in
the respective state. The terms "responsive" and "unresponsive" have
the obvious meanings, while "balanced" means that a DHCP server may
respond to all DHCPREQUEST messages that are RENEWAL or REBINDING,
and to all other messages from clients for which the load balancing
algorithm indicates that it MUST respond. See sections 5.3 and
9.6.2 for details on load balancing.
In the state transition diagram below, when communication is reesta-
blished between the two servers, each must record the state of the
partner when communication was restored. State transitions on one
server in some cases imply state transitions on the partner server,
so a record of the current state of the partner server must be kept
by each server.
If a message is received from a partner with the state equal to zero
(0), then the receiving server should respond to that message with a
DHCPPRPL if it was a DHCPPOLL, but under no circumstances should it
consider communications to be "okay", nor take any state transitions
based on receipt of that message.
If the state of the partner changes while communicating, a server
moves through the communications-failed transition and into whatever
state results. It then immediately moves through whatever state
transition is appropriate given the current state of the partner
server. A server performing this operation SHOULD NOT drop the TCP
connection to its partner.
DISCUSSION:
The point of this technique is simplicity, both in explanation of
the protocol and in its implementation. The alternative to this
technique of memory of partner state and automatic state transi-
tion on change of partner state is to have every state in the fol-
lowing diagram have a state transition for every possible state of
the partner. With the approach adopted, only the states in which
communications are reestablished require a state transition for
each possible partner state.
The current state of a server MUST be recorded in stable storage and
thus be available to the server after a server restart.
+---------------+ V +--------------+
| RECOVER - | | | STARTUP - |
|(unresponsive) | +->|(unresponsive)|
+---------------+ +--------------+
Comm. OK +-----------------+
Other State:-RECOVER | PARTNER DOWN - |<-----+
| | | (responsive) | |
All POTENTIAL- +-----------------+ |
Others CONFLICT------------ | --------+ ^(see |
| Comm. OK | | 9.8.3)|
UPDREQ(ALL) Other State: | +-----+ |
Wait UPDDONE | | | Comm. | |
Wait MCLT from fail RECOVER All Others| Failed | |
+--------------+ | V V | | |
|RECOVER-DONE +| +--+ +--------------+ | |
|(unresponsive)| | | POTENTIAL + |<--+ |
+--------------+ Wait for +>| CONFLICT | |
Comm. OK Other | |(unresponsive)|<--- | --+
+--Other State:-+ State: | +--------------+ | |
| | | RECOVER | | | |
| All POTENT. DONE | Resolve Conflict | |
| Others: CONFLICT-- | ----+ (see 9.8) | |
| Wait for V V | |
| Other State: NORMAL +-----------------+ | |
| V | NORMAL + | External | |
| +--+----------+-->| (balanced) |-Command-->+ |
| ^ ^ +-----------------+ | |
| | | | | |
| Wait for Comm. OK Comm. External |
| Other Other Failed Command |
| State: State: | or | |
|RECOVER-DONE NORMAL Start Safe Safe | |
| | COMM. INT. Period Timer Period | |
| Comm. OK. | V expiration |
| Other State: | +------------------+ | |
| RECOVER +--| COMMUNICATIONS - |-----------+ |
V +-------------| INTERRUPTED | Comm. OK |
RECOVER | (responsive) |--Other State:-+ RECOVER | (responsive) |--Other State:-+
RECOVER-DONE--------->+------------------+ All Others RECOVER-DONE--------->+------------------+ All Others
Figure 6.2-1: Server state diagram. Figure 9.2-1: Server state diagram.
6.4. STARTUP state 9.3. STARTUP state
The STARTUP state affords an opportunity for a server to probe its The STARTUP state affords an opportunity for a server to probe its
partner server, before starting to service DHCP clients. partner server, before starting to service DHCP clients.
DISCUSSION: DISCUSSION:
Without the STARTUP state, a server would likely start in a state Without the STARTUP state, a server would likely start in a state
derived from its previously stored state (held in stable storage), derived from its previously stored state (held in stable storage),
if any. However, this may be inconsistent with the current state if any. However, this may be inconsistent with the current state
of the partner. The STARTUP state affords the opportunity for a of the partner. The STARTUP state affords the opportunity for a
server to potentially learn the partner's state and determine if server to potentially learn the partner's state and determine if
that state is consistent with its derived starting state or that state is consistent with its derived starting state or
whether some significant state change has occurred at the partner whether some significant state change has occurred at the partner
that forces the server to start in another state. This is that forces the server to start in another state. This is
especially critical if significant time has elapsed while the especially critical if significant time has elapsed while the
server was down. server was down.
6.4.1. Operation while in STARTUP state 9.3.1. Operation while in STARTUP state
Whenever a server is in STARTUP state, it MUST be unresponsive to Whenever a server is in STARTUP state, it MUST be unresponsive to
DHCP client requests, and so the time spent in the STARTUP state is DHCP client requests, and so the time spent in the STARTUP state is
necessarily short, typically on the order of a few seconds to a few necessarily short, typically on the order of a few seconds to a few
tens of seconds. The exact time spent in the STARTUP state is imple- tens of seconds. The exact time spent in the STARTUP state is imple-
mentation dependent, and the primary and secondary server are not mentation dependent, and the primary and secondary server are not
required to spend the same amount of time in the STARTUP state. required to spend the same amount of time in the STARTUP state.
Whenever any message is sent to the partner while in STARTUP state Whenever a STATE message is sent to the partner while in STARTUP
the STARTUP bit MUST be set in the 'flags' field of the message state the STARTUP bit MUST be set in the server-flags option and the
header. previously recorded failover state MUST be placed in the server-state
option.
6.4.2. Transition out of STARTUP state 9.3.2. Transition out of STARTUP state
Each server starts out in startup state every time it initializes Each server starts out in startup state every time it initializes
itself, and performs the following algorithm as part of its initiali- itself, and performs the following algorithm as part of its initiali-
zation: zation:
1. Ensure that the RESTART bit is set in the 'flags' field of the 1. Do not send any messages until step 5.
failover message header. Once set, the RESTART bit must
remain set in all failover messages sent by the server to the
partner until the first acknowledgment of a message is
received from that partner. This is required to assure that
the partner knows that the server has restarted, even if the
partner itself is unreachable for a long while.
Do not send any messages until step 5.
2. Is there any record in stable storage of a previous failover 2. Is there any record in stable storage of a previous failover
state? If yes, set previous-state to the last recorded state state? If yes, set previous-state to the last recorded state
in stable storage, and continue with step 3. in stable storage, and continue with step 3.
Is there any configuration information that indicates that Is there any configuration information that indicates that
this server was previously running but lost its stable this server was previously running but lost its stable
storage? Such information must typically come from some storage? Such information must typically come from some
administrative intervention, since it is difficult for a administrative intervention, since it is difficult for a
server to distinguish first startup from a startup after it server to distinguish first startup from a startup after it
has lost its stable storage. If yes, then set the previous- has lost its stable storage. If yes, then set the previous-
state to RECOVER, and set the time-of-failure to whatever time state to RECOVER, and set the time-of-failure to whatever time
was configured, and go on to step 3. This time-of-failure was configured, and go on to step 3. This time-of-failure
will be used in the transition out of the RECOVER state into will be used in the transition out of the RECOVER state into
the RECOVER-DONE state, below. the RECOVER-DONE state, below.
If there is no record of any previous failover state in stable If there is no record of any previous failover state in stable
storage nor of any previous operational activity for this storage nor of any previous operational activity for this
server, then set the previous-state to RECOVER and set the server, then set the previous-state to PARTNER-DOWN if this
time-of-failure to a time before the maximum-client-lead-time server is a primary and RECOVER if this server is a secondary,
before now. If using standard Posix times, 0 would typically and set the time-of-failure to a time before the maximum-
do quite well. client-lead-time before now. If using standard Posix times, 0
would typically do quite well.
3. Is the previous-state NORMAL? If yes, set the previous-state 3. Is the previous-state NORMAL? If yes, set the previous-state
to COMMUNICATIONS-INTERRUPTED. to COMMUNICATIONS-INTERRUPTED.
4. Start the STARTUP state timer. The time that a server remains 4. Start the STARTUP state timer. The time that a server remains
in the STARTUP state (absent any communications with its in the STARTUP state (absent any communications with its
partner) is implementation dependent (and would typically be partner) is implementation dependent and SHOULD be configurable.
configurable). It should be long enough to poll several times It SHOULD be long enough for a TCP connection to be
and stand a good chance to receive a response to at least one created to a heavily loaded partner across a slow network.
poll from a heavily loaded partner across a slow network.
5. Start sending DHCPPOLL messages (with both the RESTART and
STARTUP bits set in the 'flags' field).
6. Wait for "communications okay", i.e., the receipt of an 5. Attempt to create a TCP connection to the failover partner.
DHCPPRPL message. See section 8.2.
When a DHCPPRPL message is received, clear the RESTART flag, 6. Wait for "communications okay", i.e., the process discussed in
clear the STARTUP flag, and set the current state to the section 8.2 "Creating the TCP Connection", to complete,
previous-state. including the receipt of a STATE message from the partner.
If the partner is in PARTNER-DOWN state, and if its partner- When and if communications become "okay", clear the STARTUP
down time (received in the DHCPPRPL message in the Absolute flag, and set the current state to the previous-state.
Time Option) is later than the last recorded time of operation
of this server, then set the current state to RECOVER.
If the partner is in PARTNER-DOWN state, and if the time at
which it entered PARTNER-DOWN state (as received in the
start-time-of-state option in the STATE message) is later than
the last recorded time of operation of this server, then set
the current state to RECOVER.
Then, transition to the current state and take the "communica- Then, transition to the current state and take the "communica-
tions okay" state transition based on the current state of tions okay" state transition based on the current state of
this server and the partner. this server and the partner.
7. If the startup time expires, take an implementation dependent 7. If the startup time expires, take an implementation dependent
action: The server MAY go to the previous-state, or the action: The server MAY go to the previous-state, or the
server MAY wait. server MAY wait.
Reasons to go to previous-state and begin processing: Reasons to go to previous-state and begin processing:
skipping to change at page 36, line 42 skipping to change at page 79, line 34
If the current server has been down for longer than the If the current server has been down for longer than the
maximum-client-lead-time, and it is partitioned from the other maximum-client-lead-time, and it is partitioned from the other
server, then when it returns it will attempt to use its own server, then when it returns it will attempt to use its own
available addresses to allocate to new DHCP clients, and the available addresses to allocate to new DHCP clients, and the
other server may well be in PARTNER-DOWN state and may have other server may well be in PARTNER-DOWN state and may have
already allocated some of those available addresses to DHCP already allocated some of those available addresses to DHCP
clients. In cases where the possibility of partition is high, clients. In cases where the possibility of partition is high,
and the safe period expiration time is less than the likely and the safe period expiration time is less than the likely
operator reaction time, this is a good approach to use. operator reaction time, this is a good approach to use.
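The startup decisions in steps 2, 3 and 6 of the revised (-04) algorithm above can be sketched as follows. This is a minimal illustration, not part of either draft: the function names, the use of strings for states, and the convention that a non-None configured_failure_time means "configuration says stable storage was lost" are all assumptions made for the example.

```python
from typing import Optional, Tuple


def derive_previous_state(stored_state: Optional[str],
                          configured_failure_time: Optional[int],
                          is_primary: bool) -> Tuple[str, Optional[int]]:
    """Steps 2 and 3: choose previous-state and time-of-failure at startup.

    stored_state: last failover state found in stable storage, or None.
    configured_failure_time: hypothetical configuration value; non-None
        means an administrator indicated that stable storage was lost.
    """
    if stored_state is not None:
        previous, time_of_failure = stored_state, None
    elif configured_failure_time is not None:
        previous, time_of_failure = "RECOVER", configured_failure_time
    else:
        # No record at all: the -04 text distinguishes primary and secondary.
        previous = "PARTNER-DOWN" if is_primary else "RECOVER"
        time_of_failure = 0  # Posix time 0 is well before now minus MCLT
    if previous == "NORMAL":
        previous = "COMMUNICATIONS-INTERRUPTED"
    return previous, time_of_failure


def state_after_contact(previous_state: str,
                        partner_state: str,
                        partner_state_start: int,
                        last_operation_time: int) -> str:
    """Step 6: state to enter once communications become "okay"."""
    if (partner_state == "PARTNER-DOWN"
            and partner_state_start > last_operation_time):
        return "RECOVER"
    return previous_state
```

After state_after_contact returns, the server would still take the "communications okay" transition defined for the resulting state, which this sketch does not model.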
6.5. PARTNER-DOWN state 9.4. PARTNER-DOWN state
PARTNER-DOWN state is a state either server can enter. When in this PARTNER-DOWN state is a state either server can enter. When in this
state, the server does not assume that the other server could still state, the server does not assume that the other server could still
be operating and servicing a different set of clients, but instead be operating and servicing a different set of clients, but instead
assumes that it is the only server operating. For this reason, only assumes that it is the only server operating. For this reason, only
one server should be operating in this state at a time. one server should be operating in this state at a time.
6.5.1. Upon Entry to PARTNER-DOWN state 9.4.1. Upon entry to PARTNER-DOWN state
When entering PARTNER-DOWN state a server MUST record the time of
entry, and must transmit it during every DHCPPOLL message or DHCPPRPL
No special actions are required when entering PARTNER-DOWN state.
message sent while in PARTNER-DOWN state. The server should continue to attempt to connect to the partner
periodically.
6.5.2. Operation while in PARTNER-DOWN state 9.4.2. Operation while in PARTNER-DOWN state
A server in PARTNER-DOWN state MUST respond to DHCP client requests. A server in PARTNER-DOWN state MUST respond to DHCP client requests.
It will allow renewal of all outstanding leases on IP addresses, and It will allow renewal of all outstanding leases on IP addresses, and
will allocate IP addresses from its own pool, and after a fixed will allocate IP addresses from its own pool, and after a fixed
period of time (the MCLT interval) has elapsed from entry into period of time (the MCLT interval) has elapsed from entry into
PARTNER-DOWN state, it will allocate IP addresses from the set of all PARTNER-DOWN state, it will allocate IP addresses from the set of all
available IP addresses. available IP addresses.
Once a server has entered NORMAL state, the PARTNER-DOWN state is Once a server has entered NORMAL state, the PARTNER-DOWN state is
entered only on command of an external agency (typically an adminis- entered only on command of an external agency (typically an adminis-
skipping to change at page 37, line 49 skipping to change at page 80, line 45
If the server wishes the Failover protocol to protect it from loss of If the server wishes the Failover protocol to protect it from loss of
stable storage in PARTNER-DOWN state, then it should ensure that the stable storage in PARTNER-DOWN state, then it should ensure that the
MCLT based lease time restrictions in Section 5.1 are maintained, MCLT based lease time restrictions in Section 5.1 are maintained,
even in PARTNER-DOWN state. even in PARTNER-DOWN state.
If the server wishes to forego the protection of the Failover proto- If the server wishes to forego the protection of the Failover proto-
col in the event of loss of stable storage, then it need recognize no col in the event of loss of stable storage, then it need recognize no
restrictions on actual client lease times while in PARTNER-DOWN restrictions on actual client lease times while in PARTNER-DOWN
state. state.
A server in PARTNER-DOWN state MUST poll its partner and attempt to A server in PARTNER-DOWN state attempts to establish communications
establish communications and synchronization. and synchronization with its partner.
While a server is in PARTNER-DOWN state, it MUST send the absolute 9.4.3. Transitions out of PARTNER-DOWN state
time of entry into PARTNER-DOWN using the absolute time option in
When a server in PARTNER-DOWN state succeeds in establishing a
connection to its partner, its actions are conditional on the state and
flags received in the STATE message from the other server as part of
the process of establishing the connection.
every DHCPPOLL and DHCPRPL message sent. If the STARTUP bit is set in the server-flags option of a received
STATE message, a server in PARTNER-DOWN state MUST NOT take any state
transitions based on reestablishing communications. Essentially, if a
server is in PARTNER-DOWN state, it ignores all STATE messages from
its partner that have the STARTUP bit set in the server-flags option
of the STATE message.
6.5.3. Transitions out of PARTNER-DOWN state If the STARTUP bit is not set in the server-flags option of a STATE
message received from its partner, then a server in PARTNER-DOWN
state takes the following actions based on the value of the
server-state option in the received STATE message:
When a server in PARTNER-DOWN state succeeds in contacting its o partner in NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-DOWN or
partner, its actions are conditional on the state and flags received POTENTIAL-CONFLICT state
in the message from the other server.
If the STARTUP bit is set in the 'flags' field of a received DHCPPOLL transition to POTENTIAL-CONFLICT state
message, the server in PARTNER-DOWN state will send a DHCPPRPL mes-
sage with its current state (and with the absolute PARTNER-DOWN time
in the DHCPPRPL). A server in PARTNER-DOWN state MUST NOT take any
state transitions based on reestablishing communications if the
STARTUP bit is set in the 'flags' field of the messages that reesta-
blished communications.
If the STARTUP bit is not set in the 'flags' field then a server in o partner in RECOVER state
PARTNER-DOWN state will move into POTENTIAL-CONFLICT state if the
other server is in the NORMAL, COMMUNICATIONS-INTERRUPTED, PARTNER-
DOWN, or POTENTIAL-CONFLICT state.
If the STARTUP bit is not set in the 'flags' field, then a server in stay in PARTNER-DOWN state
PARTNER-DOWN state will stay in PARTNER-DOWN state if it detects that
the other server is in RECOVER state.
If the STARTUP bit is not set in the 'flags' field, then a server in o partner in RECOVER-DONE state
PARTNER-DOWN state moves into NORMAL state if it detects that the
other server is in RECOVER-DONE state.
6.6. RECOVER state transition into NORMAL state
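The transitions out of PARTNER-DOWN state listed above amount to a small lookup on the partner's reported state, gated by the STARTUP bit. The sketch below is illustrative only; the function name and the string state names are assumptions, and partner states not covered by the text are rejected rather than guessed at.

```python
def partner_down_next_state(partner_state: str, startup_bit: bool) -> str:
    """Next state for a server in PARTNER-DOWN after a STATE message."""
    if startup_bit:
        # STATE messages with the STARTUP bit set cause no transition.
        return "PARTNER-DOWN"
    if partner_state in ("NORMAL", "COMMUNICATIONS-INTERRUPTED",
                         "PARTNER-DOWN", "POTENTIAL-CONFLICT"):
        return "POTENTIAL-CONFLICT"
    if partner_state == "RECOVER":
        return "PARTNER-DOWN"
    if partner_state == "RECOVER-DONE":
        return "NORMAL"
    # The text above does not say what to do for any other value.
    raise ValueError("unhandled partner state: " + partner_state)
```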
9.5. RECOVER state
This state indicates that the server has no information in its stable This state indicates that the server has no information in its stable
storage or that it is re-integrating with a server in PARTNER-DOWN storage or that it is re-integrating with a server in PARTNER-DOWN
state after it has been down. A server in this state will attempt to state after it has been down. A server in this state will attempt to
refresh its stable storage from the other server. refresh its stable storage from the other server.
6.6.1. Operation in RECOVER state 9.5.1. Operation in RECOVER state
A server in RECOVER MUST NOT respond to DHCP client request. A server in RECOVER MUST NOT respond to DHCP client requests.
A server in RECOVER state will attempt to reestablish communications A server in RECOVER state will attempt to reestablish communications
with the other server. with the other server.
6.6.2. Transitions out of RECOVER state 9.5.2. Transitions out of RECOVER state
If the other server is in POTENTIAL-CONFLICT state when communica- If the other server is in POTENTIAL-CONFLICT state when communica-
tions are reestablished, then the server in RECOVER state will move tions are reestablished, then the server in RECOVER state will move
to POTENTIAL-CONFLICT state itself. to POTENTIAL-CONFLICT state itself.
If the other server is in RECOVER state, then this server SHOULD
signal an error and halt processing.
If the other server is in RECOVER state, then this server SHOULD sig-
nal an error and halt processing.
If the other server is in any other state, then the server in RECOVER If the other server is in any other state, then the server in RECOVER
state will request an update of missing binding information by send- state will request an update of missing binding information by send-
ing an UPDATEREQ message. If the server has been configured to indi- ing an UPDREQ message. If the server has been instructed (through
cate that it has lost its stable storage, it will send an configuration or other external agency) that it has lost its stable
UPDATEREQALL message, otherwise it will send an UPDATEREQ message. storage, it MUST send an UPDREQALL message, otherwise it MUST send an
UPDREQ message.
It will wait for an UPDATEDONE message, and upon receipt of that mes- It will wait for an UPDDONE message, and upon receipt of that message
sage it will start a timer whose expiration is set to a time equal to it will start a timer whose expiration is set to a time equal to the
the the time the server went down (if known) or the current time (if time the server went down (if known) or the current time (if the
the down-time is unknown) plus the maximum-client-lead-time. When down-time is unknown) plus the maximum-client-lead-time. When this
this timer goes off, the server will go into RECOVER-DONE state. timer goes off, the server will transition into RECOVER-DONE state.
This is to allow any IP addresses that were allocated by this server This is to allow any IP addresses that were allocated by this server
prior to loss of its client binding information in stable storage to prior to loss of its client binding information in stable storage to
contact the other server or to time out. contact the other server or to time out.
See Figure 6.6-1. See Figure 9.5.2-1.
DISCUSSION: DISCUSSION:
The actual requirement on this wait period in RECOVER is that it The actual requirement on this wait period in RECOVER is that it
start when the recovering server went down, not necessarily when start when the recovering server went down, not necessarily when
it came back up. If the time when the recovering server failed is it came back up. If the time when the recovering server failed is
known, then it could be communicated to the recovering server, and known, then it could be communicated to the recovering server, and
the wait period could be reduced to the maximum-client-lead-time the wait period could be reduced to the maximum-client-lead-time
less the difference between the current time and the time the less the difference between the current time and the time the
server failed. In this way, the waiting period could be minimized. server failed. In this way, the waiting period could be minimized.
If an UPDATEDONE message isn't received within an implementation If an UPDDONE message isn't received within an implementation
dependent amount of time, and no DHCPBNDUPD messages are being dependent amount of time, and no BNDUPD messages are being received,
received, then the UPDATEREQ(ALL) message will be re-transmitted. then the UPDREQ(ALL) message will be re-transmitted.
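The wait that follows receipt of the UPDDONE message can be expressed as simple arithmetic on timestamps. The following is a sketch under the assumption that all times are integer seconds on a common clock; the function and parameter names are hypothetical.

```python
from typing import Optional


def recover_done_time(now: int, time_went_down: Optional[int], mclt: int) -> int:
    """Absolute time at which a server in RECOVER may enter RECOVER-DONE.

    now: time at which the UPDDONE message was received.
    time_went_down: when this server failed, or None if unknown.
    mclt: maximum-client-lead-time in seconds.
    """
    base = time_went_down if time_went_down is not None else now
    return base + mclt


def remaining_wait(now: int, time_went_down: Optional[int], mclt: int) -> int:
    """Seconds still to wait; never negative."""
    return max(0, recover_done_time(now, time_went_down, mclt) - now)
```

With a known down-time the wait shrinks by however long the server has already been down, which is the minimization described in the DISCUSSION above.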
A B A B
Server Server Server Server
| | | |
RECOVER PARTNER-DOWN RECOVER PARTNER-DOWN
| | | |
| >--DHCPUPDATEREQ-------------> | | >--UPDREQ--------------------> |
| | | |
| <-----------------DHCPBNDUPD--< | | <---------------------BNDUPD--< |
| >--DHCPBNDACK----------------> | | >--BNDACK--------------------> |
... ... ... ...
| | | |
| <-----------------DHCPBNDUPD--< | | <---------------------BNDUPD--< |
| >--DHCPBNDACK----------------> | | >--BNDACK--------------------> |
| | | |
| <-------------DHCPUPDATEDONE--< | | <--------------------UPDDONE--< |
| | | |
Wait MCLT from last known | Wait MCLT from last known |
time of operation | time of operation |
| | | |
RECOVER-DONE | RECOVER-DONE |
| | | |
| >--DHCPPOLL-(RECOVER-DONE)---> | | >--STATE-(RECOVER-DONE)------> |
| <-------------------DHCPPRPL--< |
| |
| NORMAL | NORMAL
| | | <-------------(NORMAL)-STATE--< |
| <----------(NORMAL)-DHCPPOLL--< |
| >--DHCPPRPL------------------> |
| |
NORMAL | NORMAL |
| | | |
| | | |
Figure 6.6-1: Transition out of RECOVER state Figure 9.5.2-1: Transition out of RECOVER state
6.7. NORMAL state 9.6. NORMAL state
NORMAL state is the state used by a server when it can communicate NORMAL state is the state used by a server when it can communicate
with the other server. When in this state, the primary responds to with the other server.
DHCP all clients requests and while the secondary only responds to
renewal or rebinding requests which it receives. This is one of the
few states where the operation of the primary and secondary servers
are quite different.
6.7.1. Upon Entry to NORMAL state 9.6.1. Upon Entry to NORMAL state
When entering NORMAL state, a server will send to the other server When entering NORMAL state, a server will send to the other server
all currently unacknowledged DHCPBNDUPD messages. all currently unacknowledged binding updates as BNDUPD messages.
When the above process is complete, if the server entering NORMAL When the above process is complete, if the server entering NORMAL
state is a secondary server, then it will will request IP addresses state is a secondary server, then it will request IP addresses for
for allocation using the DHCPPOOLREQ message and the techniques allocation using the POOLREQ message.
described in section 2.5.
6.7.2. Operation in NORMAL state: Primary Server 9.6.2. Processing DHCP client requests and load balancing
When in NORMAL state, the primary server takes the following actions When in NORMAL state, each server MUST process all requests from some
to implement the Failover protocol: DHCP clients, and MUST NOT process any request other than a
DHCPREQUEST/RENEWAL or a DHCPREQUEST/REBINDING request from some
other DHCP clients. The load balancing algorithm determines into
which set a particular DHCP client falls.
o Lease Time Calculations As discussed in section 5.3, each server will take the client-
identifier from each DHCP client request (or the htype concatenated
to the front of the chaddr if no client-identifier is present in the
request), and hash it with the algorithm given in section 12. The
result of this hash algorithm is a number between 0 and 255.
This number is used to index into the bit array received by a server
in the hash-bucket-assignment option (if the server is a secondary),
or into the inverse of the bit array sent to the secondary in the
hash-bucket-assignment option if the server is a primary.
As discussed in section 5.1, "Control of lease time", the lease If the bit found from this indexing process is a 1 bit, then the
interval given to a DHCP client can never be more than the server MUST process this DHCP request.
maximum-client-lead-time greater than the acknowledged partner-
server-lease-interval.
As long as the primary server adheres to this constraint, the In NORMAL state, a server MUST process every DHCPREQUEST/RENEWAL or
specifics of the lease intervals that it gives to either the DHCPREQUEST/REBINDING request it receives.
DHCP client or the secondary DHCP server are implementation
dependent. One possible approach is shown in section 5.1, but
that particular approach is in no way required by this protocol.
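The load balancing decision in the revised section 9.6.2 can be sketched as a bit lookup. The hash itself (section 12) is not reproduced here, so the sketch takes the already computed hash value as input. The 32-byte layout of the bit array and the most-significant-bit-first ordering within each byte are assumptions made for illustration; the names are hypothetical.

```python
def bucket_bit(hba: bytes, index: int) -> int:
    """Return bit `index` (0..255) of a 256-bit hash-bucket-assignment array.

    Assumption: 32 bytes, bucket 0 is the most significant bit of byte 0.
    """
    if len(hba) != 32:
        raise ValueError("expected a 256-bit (32-byte) array")
    if not 0 <= index <= 255:
        raise ValueError("hash value must be between 0 and 255")
    return (hba[index // 8] >> (7 - index % 8)) & 1


def should_process(request_kind: str, hash_value: int,
                   hba: bytes, is_primary: bool) -> bool:
    """Decide whether a server in NORMAL state processes a client request.

    hba is the array carried in the hash-bucket-assignment option; the
    secondary uses it directly and the primary uses its inverse.
    """
    if request_kind in ("RENEWAL", "REBINDING"):
        return True  # always processed in NORMAL state
    bit = bucket_bit(hba, hash_value)
    if is_primary:
        bit ^= 1
    return bit == 1
```

Because the primary uses the inverse of the array, exactly one of the two servers answers any given non-renewing client.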
o Lazy Update of Secondary Server 9.6.3. Operation in NORMAL state
After an ACK of an IP address binding, the primary server When in NORMAL state, for every DHCP client request that it
attempts to update the secondary with the binding information. processes, as determined by the algorithm described in section 9.6.2,
The lease time used in the update of the secondary MUST be at above, a server will operate in the following manner:
least that given to the DHCP client in the DHCPACK. It MAY,
however, be longer.
o Lease time calculations
o Reallocation of IP Addresses Between Clients As discussed in section 5.2.1, "Control of lease time", the
lease interval given to a DHCP client can never be more than the
MCLT greater than the most recently received potential-
expiration-time from the failover partner or the current time,
whichever is later.
Whenever a client binding is released, a DHCPBNDUPD message must As long as a server adheres to this constraint, the specifics of
be sent to the secondary server, setting the binding state to the lease interval that it gives to a DHCP client or the value
RELEASED. However, until a DHCPBNDACK is received for this mes- of the potential-expiration-time sent to its failover partner
sage, the IP address cannot be allocated to another client. It are implementation dependent. One possible approach is dis-
can be allocated to the same client again. cussed in section 5.2.1, but that particular approach is in no
way required by this protocol.
6.7.3. Operation in NORMAL state: Secondary Server o Lazy update of partner server
In normal state, the secondary server receives binding updates from After an ACK of an IP address binding, the server servicing a
the primary server in DHCPBNDUPD messages. It records these in its DHCP client request attempts to update its partner with the new
client binding database in stable storage and then sends the binding information. The lease time used in the update of the
corresponding DHCPBNDACK message to the primary server. It MUST secondary MUST be at least that given to the DHCP client in the
ensure that the information is recorded in stable storage prior to DHCPACK, and the potential-expiration-time MUST be at least the
sending the DHCPBNDACK message back to the primary server. lease time, and SHOULD be longer.
While in NORMAL state, the secondary server MUST also acquire a o Reallocation of IP addresses between clients
series of IP addresses from the primary server to be used to satisfy
DHCPDISCOVER requests from DHCP clients when in COMMUNICATIONS-
INTERRUPTED state. See section 2.5 for details of this acquisition
process.
The secondary server periodically polls the primary server with the Whenever a client binding is released or expires, a BNDUPD
DHCPPOLL message. If it fails to receive a DHCPPRPL message in reply message must be sent to the partner, setting the binding state to
after a configured number of retries or some administratively deter- RELEASED or EXPIRED. However, until a BNDACK is received for
mined time, the secondary server transitions into COMMUNICATIONS- this message, the IP address cannot be allocated to another
INTERRUPTED state. Both the DHCPPOLL and DHCPPRPL messages carry the client. It can be allocated to the same client again.
current state of the sender.
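The lease time constraint in the revised "Lease time calculations" item above can be written as a clamp on the interval a server wishes to grant. This is a sketch only: times are assumed to be integer seconds on a common clock, a missing potential-expiration-time is represented as None, and the function names are hypothetical.

```python
from typing import Optional


def max_client_expiration(now: int,
                          partner_potential_expiration: Optional[int],
                          mclt: int) -> int:
    """Latest absolute expiration a client lease may be given.

    The limit is MCLT past the later of the current time and the most
    recently received potential-expiration-time from the partner.
    """
    base = now
    if partner_potential_expiration is not None:
        base = max(now, partner_potential_expiration)
    return base + mclt


def clamp_lease_interval(now: int, desired_interval: int,
                         partner_potential_expiration: Optional[int],
                         mclt: int) -> int:
    """Lease interval (seconds) actually offered to the client."""
    limit = max_client_expiration(now, partner_potential_expiration, mclt) - now
    return min(desired_interval, limit)
```

For a binding the partner has never heard of, the first lease is therefore at most MCLT long; later renewals can grow once the partner has received a larger potential-expiration-time.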
When in normal state, a secondary server is responsive to DHCP client In normal state, each server receives binding updates from its
requests if they are RENEWAL or REBINDING. Any changes it makes to partner server in BNDUPD messages. It records these in its client
any leases based on these responses should be sent to the primary binding database in stable storage and then sends a corresponding
server using DHCPBNDUPD messages. BNDACK message to the primary server. It MUST ensure that the infor-
mation is recorded in stable storage prior to sending the BNDACK mes-
sage back to the primary server.
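The ordering requirement above (record in stable storage first, acknowledge second) can be illustrated with an append-only file that is flushed and synced before the acknowledgment is built. The message fields, the file format and the function name are all hypothetical; a real server would use its own binding database and the BNDACK encoding defined elsewhere in the draft.

```python
import json
import os


def handle_bndupd(update: dict, db_path: str) -> dict:
    """Persist a binding update, then return the BNDACK to send.

    `update` is a hypothetical dict such as
    {"ip": "192.0.2.10", "state": "ACTIVE"}.
    """
    with open(db_path, "a", encoding="utf-8") as db:
        db.write(json.dumps(update) + "\n")
        db.flush()
        # Force the record to disk before any acknowledgment exists.
        os.fsync(db.fileno())
    # Only now is it safe to acknowledge the update.
    return {"type": "BNDACK", "ip": update["ip"]}
```

If the write or sync raises, no acknowledgment is produced, so the partner keeps the update as unacknowledged and can send it again.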
6.7.4. Transitions out of NORMAL state 9.6.4. Transitions out of NORMAL state
If an external command is received by a server in NORMAL state If an external command is received by a server in NORMAL state
informing it that its partner is down, then transition into PARTNER- informing it that its partner is down, then transition into PARTNER-
DOWN state. DOWN state.
If a server in NORMAL state fails to receive acks to any messages If a server in NORMAL state fails to receive acks to messages sent to
sent to its partner for an implementation dependent period of time, its partner for an implementation dependent period of time, it MAY
it will move into COMMUNICATIONS-INTERRUPTED state. (See section move into COMMUNICATIONS-INTERRUPTED state. This situation might
6.2). occur if the partner server was capable of maintaining the TCP
connection between the servers and also capable of sending a CONTACT
message every tSend seconds, but was (for some reason) incapable of
processing BNDUPD messages.
If communications are determined not to be "ok" (as defined in
section 8), then transition into COMMUNICATIONS-INTERRUPTED state.
If a server in NORMAL state receives any messages from its partner If a server in NORMAL state receives any messages from its partner
where the partner has changed state from that expected by the server where the partner has changed state from that expected by the server
in NORMAL state, then the server should transition into in NORMAL state, then the server should transition into
COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran- COMMUNICATIONS-INTERRUPTED state and take the appropriate state tran-
sition from there. For example, it would be expected for the partner sition from there. For example, it would be expected for the partner
to transition from POTENTIAL-CONFLICT into NORMAL state, but not for to transition from POTENTIAL-CONFLICT into NORMAL state, but not for
the partner to transition from NORMAL into POTENTIAL-CONFLICT state. the partner to transition from NORMAL into POTENTIAL-CONFLICT state.
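
The transitions out of NORMAL state listed in this section can be summarized as a small lookup function. This is only a sketch of the right-hand (-04) text. The `Event` names and the `move_on_ack_timeout` flag are hypothetical labels introduced here for illustration, and the flag models the "MAY" in the ack-timeout rule as a local policy choice.

```python
from enum import Enum


class State(Enum):
    NORMAL = "NORMAL"
    COMMUNICATIONS_INTERRUPTED = "COMMUNICATIONS-INTERRUPTED"
    PARTNER_DOWN = "PARTNER-DOWN"


class Event(Enum):
    # Hypothetical event labels, one per rule in this section.
    PARTNER_DOWN_COMMAND = 1      # external command: partner is down
    ACK_TIMEOUT = 2               # no acks for an implementation dependent period
    COMMUNICATIONS_NOT_OK = 3     # communications determined to not be "ok"
    UNEXPECTED_PARTNER_STATE = 4  # partner reported an unexpected state change


def next_state_from_normal(event, move_on_ack_timeout=True):
    """Return the state a server in NORMAL state moves to for an event."""
    if event is Event.PARTNER_DOWN_COMMAND:
        return State.PARTNER_DOWN
    if event is Event.ACK_TIMEOUT:
        # The text says MAY, so an implementation can choose to stay put.
        if move_on_ack_timeout:
            return State.COMMUNICATIONS_INTERRUPTED
        return State.NORMAL
    if event in (Event.COMMUNICATIONS_NOT_OK, Event.UNEXPECTED_PARTNER_STATE):
        return State.COMMUNICATIONS_INTERRUPTED
    raise ValueError(f"unhandled event: {event!r}")
```

For an unexpected partner state, the function only covers the first hop. The follow-on transition out of COMMUNICATIONS-INTERRUPTED state is not modeled here.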
6.8. COMMUNICATIONS-INTERRUPTED State 9.7. COMMUNICATIONS-INTERRUPTED State
A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is A server goes into COMMUNICATIONS-INTERRUPTED state whenever it is
unable to communicate with the other server. Primary and secondary unable to communicate with the other server. Primary and secondary
servers cycle automatically (without administrative intervention) servers cycle automatically (without administrative intervention)
between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network between NORMAL and COMMUNICATIONS-INTERRUPTED state as the network
connection between them fails and recovers, or as the partner server connection between them fails and recovers, or as the partner server
cycles between operational and non-operational. No duplicate IP cycles between operational and non-operational. No duplicate IP
address allocation can occur while the servers cycle between these address allocation can occur while the servers cycle between these
states. states.
6.8.1. Upon Entry to COMMUNICATIONS-INTERRUPTED state 9.7.1. Upon Entry to COMMUNICATIONS-INTERRUPTED state
When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been When a server enters COMMUNICATIONS-INTERRUPTED state, if it has been
configured to support an automatic transition out of COMMUNICATIONS- configured to support an automatic transition out of COMMUNICATIONS-
INTERRUPTED state and into PARTNER-DOWN state, then a timer MUST be INTERRUPTED state and into PARTNER-DOWN state (i.e., a "safe period"
started for an implementation dependent period. has been configured, see section 10), then a timer MUST be started
for the length of the configured safe period.
It is anticipated that some alarm condition would be raised upon the
transition from NORMAL state to COMMUNICATIONS-INTERRUPTED state.
6.8.2. Operation in COMMUNICATIONS-INTERRUPTED State A server transitioning into the COMMUNICATIONS-INTERRUPTED state from
the NORMAL state SHOULD raise some alarm condition to alert adminis-
trative staff to a potential problem in the DHCP subsystem.
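
The entry actions described above (start a safe period timer if one is configured, and raise an alarm when arriving from NORMAL state) might look like the sketch below. The function names, the `raise_alarm` callback and the use of plain integer seconds for time are assumptions made for illustration. The draft does not prescribe any of them.

```python
def enter_communications_interrupted(now, safe_period=None,
                                     came_from_normal=True,
                                     raise_alarm=print):
    """Run the entry actions and return the safe period deadline.

    Returns None when no safe period is configured, i.e. when the
    automatic transition into PARTNER-DOWN state is not enabled.
    """
    if came_from_normal:
        # SHOULD: alert administrative staff to a potential problem.
        raise_alarm("failover: NORMAL -> COMMUNICATIONS-INTERRUPTED")
    if safe_period is None:
        return None
    # MUST: start a timer for the length of the configured safe period.
    return now + safe_period


def safe_period_expired(now, deadline):
    """True once the timer has run out and PARTNER-DOWN may be entered."""
    return deadline is not None and now >= deadline


# Usage with a one hour safe period and alarms collected in a list.
alarms = []
deadline = enter_communications_interrupted(1000, safe_period=3600,
                                            raise_alarm=alarms.append)
no_deadline = enter_communications_interrupted(1000, raise_alarm=alarms.append)
```

A real server would also cancel the timer if communication is restored before the deadline. That step is omitted from this sketch.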
In this state a server may respond to DHCP client requests. When 9.7.2. Operation in COMMUNICATIONS-INTERRUPTED State
allocating new IP addresses, each server allocates from its own IP
address pool. When responding to renewal requests, each server will
allow continued renewal of a DHCP client's current lease on an IP
address, although the renewal period MUST NOT exceed the maximum
client lead time (MCLT) beyond the lease time already acknowledged by
the other server.
A server operates in COMMUNICATIONS-INTERRUPTED state as the primary In this state a server MUST respond to all DHCP client requests, and
server does in NORMAL state. the algorithm for load balancing described in section 5.3 MUST NOT be
used. When allocating new IP addresses, each server allocates from
its own IP address pool, where the primary MUST allocate only FREE IP
addresses, and the secondary MUST allocate only BACKUP IP addresses.
When responding to renewal requests, each server will allow continued
renewal of a DHCP client's current lease on an IP address irrespec-
tive of whether that lease was given out by the receiving server or
not, although the renewal period MUST NOT exceed the maximum client
lead time (MCLT) beyond the potential-expiration-time already ack-
nowledged by the other server or the lease-expiration-time or
potential-expiration-time received from the partner server.
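
One possible reading of the renewal limit above is sketched here as arithmetic on absolute times in seconds. It is an interpretation, not text from the draft: the baseline is assumed to be the latest of the current time, the potential-expiration-time acknowledged by the partner, and any expiration time received from the partner, and the function and parameter names are hypothetical.

```python
def capped_lease_expiry(now, desired_lease_time, mclt,
                        acked_potential_expiration,
                        partner_reported_expiration=0):
    """Return the latest expiry a server may grant on renewal.

    All arguments are in seconds. `now` and the two expiration values
    are absolute times; `desired_lease_time` and `mclt` are durations.
    """
    # Assumed baseline: the latest time the partner is known to agree with,
    # but never earlier than the current time.
    baseline = max(now, acked_potential_expiration,
                   partner_reported_expiration)
    limit = baseline + mclt
    # Never grant more than the client would normally be given.
    return min(now + desired_lease_time, limit)


# Usage with a one day desired lease and a one hour MCLT.
stale = capped_lease_expiry(1000, 86400, 3600, acked_potential_expiration=500)
fresh = capped_lease_expiry(1000, 86400, 3600, acked_potential_expiration=5000)
short = capped_lease_expiry(1000, 600, 3600, acked_potential_expiration=5000)
```

In the `stale` case the acknowledged time is already in the past, so the grant collapses to the current time plus the MCLT.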
However, since the server cannot communicate with its partner in this However, since the server cannot communicate with its partner in this
state, the acknowledged-partner-lease-time will not be updated in any state, the acknowledged-potential-expiration time will not be updated
new bindings. This is likely to eventually cause the actual-client- in any new bindings. This is likely to eventually cause the actual-
lease-times to be the current-time plus the maximum-client-lead-time client-lease-times to be the current-time plus the maximum-client-
lead-time (unless this is greater than the desired-client-lease-