draft-ietf-ipsecme-ipsecha-protocol-01.txt   draft-ietf-ipsecme-ipsecha-protocol-02.txt 
Network Working Group R. Singh, Ed. Network Working Group R. Singh, Ed.
Internet-Draft G. Kalyani Internet-Draft G. Kalyani
Intended status: Standards Track Cisco Intended status: Standards Track Cisco
Expires: April 14, 2011 Y. Nir Expires: April 28, 2011 Y. Nir
Check Point Check Point
D. Zhang D. Zhang
Huawei Huawei
October 11, 2010 October 25, 2010
Protocol Support for High Availability IKEv2/IPsec Protocol Support for High Availability of IKEv2/IPsec
draft-ietf-ipsecme-ipsecha-protocol-01 draft-ietf-ipsecme-ipsecha-protocol-02
Abstract Abstract
IKEv2 and IPsec protocols are widely used for deploying VPN. In The IPsec protocol suite is widely used for the deployment of virtual
order to make such VPN highly available, more scalable and failure- private networks (VPNs). In order to make such VPNs highly
prone, these VPNs are implemented as IKEv2/IPsec Highly Available available, more scalable and failure-resistant, these VPNs are
(HA) cluster. But there are many issues in IKEv2/IPsec HA cluster. implemented as IPsec High Availability (HA) clusters. However there
The draft "IPsec Cluster Problem Statement" enumerates all the issues are many issues in IPsec HA clustering, and in particular in IKEv2
encountered in IKEv2/IPsec HA cluster environment. clustering. An earlier document, "IPsec Cluster Problem Statement",
enumerates the issues encountered in the IKEv2/IPsec HA cluster
environment. This document attempts to resolve these issues with the
least possible change to the protocol.
This document proposes an extension to IKEv2 protocol to solve main This document proposes an extension to the IKEv2 protocol to solve
issues of "IPsec Cluster Problem Statement" in Hot Standby cluster the main issues of "IPsec Cluster Problem Statement" in the commonly
and gives implementation advice for other issues. The main issues to deployed hot-standby cluster, and provides implementation advice for
be solved are: other issues. The main issues to be solved are the synchronization
o IKEv2 Message Id synchronization : This is done by syncing up of IKEv2 Message ID counters, and of IPsec Replay Counters.
expected send and receive message Id values with the peer and
updating the values at the newly active cluster member after the
failover.
o IPsec Replay Counter synchronization : This is done by syncing up
bumped up outgoing SA replay counters values with peer and
updating the values at the newly active cluster member after the
failover.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 14, 2011. This Internet-Draft will expire on April 28, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Issues solved from IPsec Cluster Problem Statement . . . . . . 6 3. Issues Resolved from IPsec Cluster Problem Statement . . . . . 5
4. IKEv2/IPsec SA Counter Synchronization Problem . . . . . . . . 6 4. The IKEv2/IPsec SA Counter Synchronization Problem . . . . . . 5
5. IKEv2/IPsec SA Counter Synchronization Solution . . . . . . . 8 5. Counter Synchronization Solution . . . . . . . . . . . . . . . 7
6. IKEv2/IPsec synchronization notification payloads . . . . . . 9 6. IKEv2/IPsec Synchronization Notification Payloads . . . . . . 9
6.1. IKEV2_MESSAGE_ID_SYNC_SUPPORTED . . . . . . . . . . . . . 10 6.1. IKEV2_MESSAGE_ID_SYNC_SUPPORTED . . . . . . . . . . . . . 9
6.2. IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED . . . . . . . . . . . 10 6.2. IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED . . . . . . . . . . . 10
6.3. IKEV2_MESSAGE_ID_SYNC . . . . . . . . . . . . . . . . . . 11 6.3. IKEV2_MESSAGE_ID_SYNC . . . . . . . . . . . . . . . . . . 10
6.4. IPSEC_REPLAY_COUNTER_SYNC . . . . . . . . . . . . . . . . 11 6.4. IPSEC_REPLAY_COUNTER_SYNC . . . . . . . . . . . . . . . . 11
7. Details of implementation . . . . . . . . . . . . . . . . . . 12 7. Implementation Details . . . . . . . . . . . . . . . . . . . . 12
8. Step-by-Step details . . . . . . . . . . . . . . . . . . . . . 13 8. Step by Step Details . . . . . . . . . . . . . . . . . . . . . 13
9. Security Considerations . . . . . . . . . . . . . . . . . . . 14 9. Security Considerations . . . . . . . . . . . . . . . . . . . 14
10. Interaction with other drafts . . . . . . . . . . . . . . . . 14 10. Interaction with other drafts . . . . . . . . . . . . . . . . 14
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 16 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 15
13. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 16 13. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 15
13.1. Draft -01 . . . . . . . . . . . . . . . . . . . . . . . . 16 13.1. Draft -02 . . . . . . . . . . . . . . . . . . . . . . . . 16
13.2. Draft -00 . . . . . . . . . . . . . . . . . . . . . . . . 16 13.2. Draft -01 . . . . . . . . . . . . . . . . . . . . . . . . 16
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 13.3. Draft -00 . . . . . . . . . . . . . . . . . . . . . . . . 16
14.1. Normative References . . . . . . . . . . . . . . . . . . . 17 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 16
14.1. Normative References . . . . . . . . . . . . . . . . . . . 16
14.2. Informative References . . . . . . . . . . . . . . . . . . 17 14.2. Informative References . . . . . . . . . . . . . . . . . . 17
Appendix A. IKEv2 Message Id examples . . . . . . . . . . . . . . 17 Appendix A. IKEv2 Message ID Sync Examples . . . . . . . . . . . 17
A.1. Normal Failover - Example 1 . . . . . . . . . . . . . . . 17
A.2. Normal Failover - Example 2 . . . . . . . . . . . . . . . 18
A.3. Simultaneous Failover . . . . . . . . . . . . . . . . . . 18
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 18
1. Introduction 1. Introduction
IKEv2 is used for deploying IPsec-based VPNs. In order to make such The IPsec protocol suite, including IKEv2, is a major building block
VPN highly available, more scalable and failure-prone, these VPNs are of virtual private networks (VPNs). In order to make such VPNs
implemented as IKEv2/IPsec Highly Available (HA) cluster. But there highly available, more scalable and failure-resistant, these VPNs are
are many issues in IKEv2/IPsec HA cluster. The draft "IPsec Cluster implemented as IKEv2/IPsec Highly Available (HA) clusters. However,
Problem Statement" enumerates all the issues encountered in IKEv2/ there are many issues with the IKEv2/IPsec HA cluster. The problem
IPsec HA cluster. statement draft Section 4 enumerates the issues around the IKEv2/
IPsec HA cluster solution.
In case of Hot Standby cluster implementation of IKEv2/IPsec based
VPNs, the IKEv2/IPsec session gets established with the peer and the
active member of cluster. After that, the active member syncs/
updates the IKE/IPsec SA state to the standby member of the cluster.
This primary SA state sync-up is done on SA bring up and/or rekey.
Doing SA state synchronization/updation between active and peer
member for each IKE and IPsec message standby cluster is very costly,
so normally its done periodically. So, when "failover" event happens
in the cluster, first "failover' is detected by the standby member
and then it becomes active member and it takes considerable time.
During the time of failover and standby member becoming newly active
member, the peer is unaware of failover and keeps sending IKE request
and IPsec packets to the cluster which is allowed as per IKEv2 and
IPsec windowing feature. Now, newly active member after coming up
finds the mismtach in IKE message Id's and IPsec replay counters.
Please see Section 4 for more details.
This document proposes an extension to IKEv2 protocol to solve main In the case of a hot-standby cluster implementation of IKEv2/IPsec
issues of IKE message id sync and IPsec SA replay counter sync and based VPNs, the IKEv2/IPsec session is first established between the
gives implementation advice for others. Here is summary of solutions peer and the active member of the cluster. Later, the active member
provided in this document: continuously syncs/updates the IKE/IPsec SA state to the standby
member of the cluster. This primary SA state sync-up takes place
upon each SA bring-up and/or rekey. Performing the SA state
synchronization/update for every single IKE and IPsec message is very
costly, so normally it is done periodically. As a result, when the
failover event happens, this is first detected by the standby member
and, possibly after a considerable amount of time, it becomes the
active member. During this failover process the peer is unaware of
the failover event, and keeps sending IKE requests and IPsec packets
to the cluster, as in fact it is allowed to do because of the IKEv2
windowing feature. After the newly-active member starts, it detects
the mismatch in IKE Message ID values and IPsec replay counters and
needs to resolve this situation. Please see Section 4 for more
details of the problem.
IKEv2 Message Id synchronization :This is done by syncing up expected This document proposes an extension to the IKEv2 protocol to solve the
send and receive message Id values with the peer and updating the main issues of IKE Message ID synchronization and IPsec SA replay
values at the newly active cluster member after the failover. counter synchronization and gives implementation advice for others.
Following is a summary of the solutions provided in this document:
IPsec Replay Counter synchronization : This is done by syncing up o IKEv2 Message ID synchronization: this is done by syncing up the
bumped up outgoing SA replay counters values with peer and updating expected send and receive Message ID values with the peer, and
the values at the newly active cluster member after the failover updating the values at the newly active cluster member.
o IPsec Replay Counter synchronization: this is done by incrementing
the cluster's outgoing SA replay counter values by a "large"
number, and synchronizing these values with the peer. The peer
sends its outgoing SA replay counter in the response.
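As a rough illustration of the "large number" increment described in the list above, the following Python sketch shows how a newly active member might choose the outgoing replay counter value it announces to the peer. The function name, the bump value of 1,000,000 and the overflow check are assumptions made for this example; the draft does not specify a bump size.

```python
# Hypothetical bump size: the draft only says a "large" number.
HYPOTHETICAL_BUMP = 1_000_000

MAX_SEQ_32 = 2**32 - 1   # 32-bit sequence number space
MAX_SEQ_ESN = 2**64 - 1  # 64-bit space with Extended Sequence Numbers


def bumped_outgoing_counter(last_synced, bump=HYPOTHETICAL_BUMP, esn=False):
    """Return the outgoing counter value to resume from after failover.

    last_synced is the last outgoing replay counter value the standby
    member received from the previously active member. It may be stale,
    so a bump is added to step past any values already used on the wire.
    """
    limit = MAX_SEQ_ESN if esn else MAX_SEQ_32
    bumped = last_synced + bump
    if bumped > limit:
        # Assumption: the counter must not wrap, so the SA would need
        # to be rekeyed instead of resumed.
        raise OverflowError("replay counter space exhausted; rekey the SA")
    return bumped
```

The bump only needs to exceed the number of packets the previously active member could have sent since its last update to the standby member; a real implementation would likely derive it from the synchronization interval and the link rate.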
Though this document describes the IKEv2 message Id sync and IPsec Although this document describes the IKEv2 Message ID and IPsec
replay counter synchronization in context of IPsec HA cluster, the replay counter synchronization in the context of an IPsec HA cluster,
solution provided is genetic and can be used in other scenarios where the solution provided is generic and can be used in other scenarios
IKEv2 message Id sync or IPsec SA replay counters sync is required. where IKEv2 Message ID or IPsec SA replay counter synchronization may
be required.
While some IPsec HA implementation suffers from IKEv2 message Id Implementations differ on the need to synchronize the IKEv2 Message
synchronization problem, some other implementation suffers from IPsec ID and/or IPsec replay counters. Both of these problems are handled
replay counter synchronization. Both of these problem are handled separately, using a separate notification for each capability. This
separately, using separate notify for each problem. This provides provides the flexibility of implementing either or both of these
the flexibility of implementing IKEv2 message Id synchronization or solutions.
IPsec replay counter synchronization or both.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in [RFC2119].
"SA Counter SYNC Request" is the information exchange request defined
in this document to synchronize the IKEv2/IPsec SA counter
information between member of the cluster and the peer.
"SA Counter SYNC Response" is the information exchange response
defined in this document to synchronize the IKEv2/IPsec SA counter
information between member of the cluster and the peer.
Below are the terms taken from [IPsec Cluster Problem Statement] with
added information in context of this document.
"Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
the members is active at any one time. This member is also referred
to as the "active", whereas the other(s) are referred to as
"standbys". VRRP ([RFC5798]) is one method of building such a
cluster. The goal of Hot Standby Cluster is that it creates illusion
of single virtual gateway to the peer(s).
"Active Member" is the primary member in the Hot Standby cluster. It
is responsible for forwarding packets for the virtual gateway.
"Standby Member" is the primary backup router. The member takes
control i.e. becomes active member after the "failover" event.
"Peer" is the IKEv2/IPsec endpoint which establishes VPN connection "SA Counter Synchronization Request/Response" are the request and
with Hot Standby cluster. The Peer knows Hot Standby Cluster by response of the information exchange defined in this document to
single cluster's IP address. In case of "failover", the standby synchronize the IKEv2/IPsec SA counter information between one member
member of the cluster becomes active, so the peer normally doesn't of the cluster and the peer.
notice that "failover" has occurred in the cluster.
"Multiple failover" is the situation when in a cluster with three or Some of the terms listed below are reused from [RFC6027] with further
more nodes failover happens in rapid succession. The protocol and clarification in the context of the current document.
implementation must be able to handle multiple failover i.e. able to
handle new failover even if they are still processing the old
failover.
"Simultaneous failover" is the situation when in a cluster the o "Hot Standby Cluster", or "HS Cluster" is a cluster where only one
failover happens at the both ends at the same time. The protocol and of the members is active at any one time. This member is also
implementation must be able to handle simultaneous failover. referred to as the "active" member, whereas the other(s) are
referred to as "standby" members. VRRP [RFC5798] is one method of
building such a cluster. The goal of Hot Standby Cluster is that
it creates the illusion of a single virtual gateway to the peer(s).
o "Active Member" is the primary member in the Hot-Standby cluster.
It is responsible for forwarding packets on behalf of the virtual
gateway.
o "Standby Member" is the primary backup member. This member takes
control, i.e. becomes the active member, after the failover event.
o "Peer" is an IKEv2/IPsec endpoint that maintains a VPN connection
with the Hot-Standby cluster. The Peer identifies the cluster by
the cluster's (single) IP address. If a failover event occurs,
the standby member of the cluster becomes active, and the peer
normally doesn't notice that failover has taken place.
o "Failover Count" is a global failover event counter maintained by
the HA cluster and incremented by 1 upon each failover event in
the HA cluster. All members of the HA cluster share the failover
count.
o "Multiple failover" is the situation where, in a cluster with
three or more members, failover happens in rapid succession. It
is our goal that the implementation should be able to handle this
situation, i.e. to handle the new failover event even if it is
still processing the old failover.
o "Simultaneous failover" is the situation where two clusters have a
VPN connection between them, and failover happens at both ends
at the same time. It is our goal that the implementation should be
able to handle simultaneous failover.
The generic term IKEv2/IPsec SA counters is used throughout. By The generic term "IKEv2/IPsec SA Counters" is used throughout this
IKEv2 SA counter stands for IKEv2 message ids and IPsec SA counter document. This term refers to both IKEv2 Message ID counters
stands for IPsec SA replay counters which are used to provide (mandatory, and used to ensure reliable delivery as well as to
optional anti-replay feature. protect against message replay in IKEv2) and IPsec SA replay counters
(optional, and used to provide the IPsec anti-replay feature).
3. Issues solved from IPsec Cluster Problem Statement 3. Issues Resolved from IPsec Cluster Problem Statement
IPsec Cluster Problem Statement defines the problems encountered in The IPsec Cluster Problem Statement [RFC6027] enumerates the problems
IPsec Clusters. . The problems along with their section names as raised by IPsec clusters. The following list gives the problem
given in the statement are as follows. statement's sections that are resolved by this document.
o 3.2. Lots of Long Lived State o 3.2. Lots of Long Lived State
o 3.3. IKE Counters o 3.3. IKE Counters
o 3.4. Outbound SA Counters o 3.4. Outbound SA Counters
o 3.5. Inbound SA Counters o 3.5. Inbound SA Counters
o 3.6. Missing Synch Messages o 3.6. Missing Synchronization Messages
o 3.7. Simultaneous use of IKE and IPsec SAs by Different Members o 3.7. Simultaneous use of IKE and IPsec SAs by Different Members
* 3.7.1. Outbound SAs using counter modes * 3.7.1. Outbound SAs using counter modes
o 3.8. Different IP addresses for IKE and IPsec o 3.8. Different IP addresses for IKE and IPsec
o 3.9. Allocation of SPIs o 3.9. Allocation of SPIs
This document solves the main issues using the protocol extension, The main problem areas are solved using the protocol extension
and provides implementation advice for other issues, given as defined below, and additionally this document provides implementation
follows. advice for other issues, given as follows.
o 3.2 This section mentions that there's lots of state that needs to o 3.2 This section mentions that there is a large amount of state
be synchronized. If state is not synchronized, it's not really an that needs to be synchronized. However if state is not
interesting cluster - failover will be just like a reboot, so the synchronized, this is not really an interesting cluster: failover
issue need not be solved with protocol extensions. is equivalent to a reboot of the cluster member, and so the issue
need not be solved with protocol extensions.
o 3.3, 3.4, 3.5, and 3.6 are solved by this document. Please see o 3.3, 3.4, 3.5, and 3.6 are solved by this document. Please see
Section 4 for more details. Section 4 for more details.
o 3.7 is the problem to be solved while building clusters. However, o 3.7 is an implementation problem that needs to be solved while
the peers should be mandated to accept multiple parallel SAs for building IPsec clusters. However, the peers should be required to
3.7.1 accept multiple parallel SAs for 3.7.1.
o 3.8 can be solved by using IKEv2 Redirect Mechanism [RFC-5685]. o 3.8 can be solved by using the IKEv2 Redirect mechanism [RFC5685].
o 3.9 is the problem about avoiding collision of same SPI's among o 3.9 discusses the avoidance of collisions where the same SPI value
the cluster members. This is outside the scope of the document is used by multiple cluster members. This is outside the
since this has to be solved within the context of the cluster and document's scope since the problem needs to be solved internally
not with the peer. to the cluster and does not involve the peer.
4. IKEv2/IPsec SA Counter Synchronization Problem 4. The IKEv2/IPsec SA Counter Synchronization Problem
IKEv2 RFC states that "An IKE endpoint MUST NOT exceed the peer's The IKEv2 protocol [RFC5996] states that "An IKE endpoint MUST NOT
stated window size for transmitted IKE requests". exceed the peer's stated window size for transmitted IKE requests".
As per the protocol, all IKEv2 packets follows request-response All IKEv2 messages are required to follow a request-response
paradigm. The initiator of an IKEv2 request MUST retransmit the paradigm. The initiator of an IKEv2 request MUST retransmit the
request, until it has received a response from the peer. IKEv2 request, until it has received a response from the peer. IKEv2
introduces a windowing mechanism that allows multiple requests to be introduces a windowing mechanism that allows multiple requests to be
outstanding at a given point of time, but mandates that the sender outstanding at a given point of time, but mandates that the sender
window does not move until the oldest message sent from one peer to window should not move until the oldest message sent from one peer to
another is acknowledged. Loss of even a single packet leads to another is acknowledged. Loss of even a single message leads to
repeated re-transmissions followed by an IKEv2 SA teardown if the re- repeated retransmissions followed by an IKEv2 SA teardown if the
transmissions are unacknowledged. retransmissions are unacknowledged.
IPsec Hot Standby Cluster is required to ensure that in case of An IPsec Hot Standby Cluster is required to ensure that in the case
failover of active member, the standby member becomes active of failover, the standby member becomes active immediately. The
immediately. The standby member is expected to have the exact values standby member is expected to have the same value of the Message ID
of message id fields of active member before failover. Even with the counter as the active member had before failover. Even assuming the
best efforts to update the message Id values from active to standby best effort to update the Message ID values from active to standby
member, the values at standby member can be stale due to following member, the values at the standby member can still be stale due to
reasons: the following reasons:
o Standby member is unaware of the last message that was received o The standby member is unaware of the last message that was
and acknowledged by the older active member as failover could have received and acknowledged by the previously active member, as the
happened before the standby could be updated. failover event could have happened before the standby member could
o Standby member does not have information about on-going be updated.
unacknowledged requests of active member before the failover o The standby member does not have information about on-going
event. So after failover event when standby member becomes unacknowledged requests received by the previously active member.
active, it can not re-transmit those requests. As a result after the failover event, the newly active member
cannot retransmit those requests.
When a standby member takes over as the active member, it would start When a standby member takes over as the active member, it can only
the message id ranges from previously updated values. This would initialize the Message ID values from the previously updated values.
make it reject requests from the peer, since the values would be This would make it reject requests from the peer when these values
stale. As a sender, the standby member may end up reusing a stale are stale. Conversely, the standby member may end up reusing a stale
message id which will cause the peer to drop the request. Eventually Message ID value which would cause the peer to drop the request.
there is a high probability of the IKEv2 and corresponding IPsec SAs Eventually there is a high probability of the IKEv2 and corresponding
getting torn down simply because of a transitory message id mis-match IPsec SAs getting torn down simply because of a transitory Message ID
and re-transmission of requests. This is not a desirable feature of mismatch and retransmission of requests, negating the benefits of the
HA. Even after updating standby member periodically the cluster can high availability cluster despite the periodic update between the
loose IKE and so all IPsec SA due to message id i.e. SA counter cluster members.
mismatch.
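The effect of stale Message ID values can be illustrated with a deliberately simplified model of the IKEv2 request window. In this Python sketch the function name, the concrete Message ID values and the window size of 1 are assumptions chosen for the example; real implementations also track retransmissions and cached responses, which are omitted here.

```python
def accepts_request(expected_recv_id, window_size, msg_id):
    """Simplified receive-window check for an IKEv2 request.

    A request is treated as acceptable only if its Message ID falls
    inside [expected_recv_id, expected_recv_id + window_size).
    """
    return expected_recv_id <= msg_id < expected_recv_id + window_size


# Receive side: the previously active member had processed requests up to
# Message ID 11, but the standby member was last updated when the next
# expected request was 10.
stale_expected_recv = 10
peer_next_request = 12
peer_request_accepted = accepts_request(stale_expected_recv, 1, peer_next_request)

# Send side: the standby member believes its next request is Message ID 7,
# while the peer already expects 9 from the cluster.
stale_next_send = 7
peer_expected_recv = 9
cluster_request_accepted = accepts_request(peer_expected_recv, 1, stale_next_send)
```

In both directions the check fails under these assumed values, which corresponds to the two cases described above: the newly active member rejects the peer's request, and the peer drops the request that reuses a stale Message ID.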
Similar issue is observed in IPsec counters also if anti-replay A similar issue is also observed with IPsec anti-replay counters if
protection/ESN is implemented. Even with the best efforts of syncing anti-replay protection/ESN is implemented, which is commonly the
the ESP and AH SA counter numbers from active to stand by member , case. Regardless of how well the ESP and AH SA counters are
there is a chance that the stand-by member would have stale counter synchronized from the active to the standby member, there is a chance
values. The standby member would then send the stale counter that the standby member would end up with stale counter values. The
numbers. The peer would reject/drop such packets since in case of standby member would then use those stale counter values when sending
anti-replay protection feature, duplicate use of counters are not IPsec packets. The peer would reject/drop such packets since when
allowed. In case of IPsec it is OK to skip some counter values and the anti-replay protection feature is enabled, duplicate use of
start with the higher counter values. counters is not allowed. Note that IPsec allows the sender to skip
some counter values and continue sending with higher counter values.
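The following Python sketch models a receiver-side anti-replay window to show why stale counter values are dropped while a jump to higher values is accepted. It is a simplified set-based model written for this illustration, not the bitmap algorithm an actual IPsec implementation would use; the class name and the window size of 64 are assumptions.

```python
class AntiReplayWindow:
    """Simplified anti-replay check for one inbound IPsec SA."""

    def __init__(self, size=64):
        self.size = size      # assumed window size
        self.highest = 0      # highest sequence number accepted so far
        self.seen = set()     # accepted numbers still inside the window

    def accept(self, seq):
        if seq > self.highest:
            # Skipping ahead is allowed: the window slides forward.
            self.highest = seq
            self.seen.add(seq)
            floor = self.highest - self.size
            self.seen = {s for s in self.seen if s > floor}
            return True
        if seq <= self.highest - self.size:
            return False      # too old, left of the window
        if seq in self.seen:
            return False      # duplicate use of a counter value
        self.seen.add(seq)
        return True


window = AntiReplayWindow()
for seq in range(1, 101):
    window.accept(seq)                     # active member sent 1..100

stale_accepted = window.accept(90)         # standby reuses a stale value
jump_accepted = window.accept(100_000)     # sender skips to a higher value
```

Under this model the reused value 90 is rejected as a duplicate, whereas the much higher value is accepted, which is the property the replay counter synchronization relies on.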
Hence a mechanism is required in HA to ensure that the standby member We conclude that a mechanism is required to ensure that the standby
has correct values of message Id values and IPsec counters, so that member has correct Message ID and IPsec counter values when it
sessions are not torn down just because of mismatching counters. becomes active, so that sessions are not torn down as a result of
mismatched counters.
5. IKEv2/IPsec SA Counter Synchronization Solution 5. Counter Synchronization Solution
When the standby member becomes the active member after failover In general, when the standby member becomes the active member after
event in the cluster, the standby member would send an authenticated the failover event, the standby member sends an authenticated IKEv2
IKEv2 request to the peer to send its values of SA counters. request to the peer, asking it to send its SA counter values.
The standby member would then update its values of SA counters and The standby member then updates its own SA counter values and can
then start sending/receiving the requests. resume normally sending and receiving protocol messages.
First, the peer MUST negotiate its ability to support IKEv2 message First, the peer MUST negotiate its ability to support IKEv2 Message
Id synchronization information with active member of the cluster by ID synchronization with the active member of the cluster by sending
sending the IKEV2_MESSAGE_ID_SYNC_SUPPORTED notification in IKE_AUTH the IKEV2_MESSAGE_ID_SYNC_SUPPORTED notification in the IKE_AUTH
exchange. exchange.
Similarly to support IPsec replay counter synchronization, the peer Similarly, to support IPsec Replay Counter synchronization, the peer
MUST negotiate its ability to support IPsec replay counter MUST negotiate this capability with the active member of the cluster
synchronization with active member of the cluster by sending by sending the IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED notification in
IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED notification in IKE_AUTH the IKE_AUTH exchange.
exchange.
Peer Active Member Peer Active Member
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HDR, SK {IDi, [CERT], [CERTREQ], [IDr], AUTH, HDR, SK {IDi, [CERT], [CERTREQ], [IDr], AUTH,
N[IKEV2_MESSAGE_ID_SYNC_SUPPORTED], [N(IKEV2_MESSAGE_ID_SYNC_SUPPORTED),]
N[IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED], [N(IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED),]
SAi2, TSi, TSr} ----------> SAi2, TSi, TSr} ---------->
<---------- HDR, SK {IDr, [CERT+], [CERTREQ+], AUTH, <-------- HDR, SK {IDr, [CERT+], [CERTREQ+], AUTH,
N[IKEV2_MESSAGE_ID_SYNC_SUPPORTED], [N(IKEV2_MESSAGE_ID_SYNC_SUPPORTED),]
N[IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED], SAr2, TSi, TSr} [N(IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED),] SAr2, TSi, TSr}
When peer and active member both support SA counter synchronization, When the peer and active member both support SA counter
the active member MUST sync/update SA counter synchronization synchronization, the active member MUST inform the standby member of
capability to the standby member after the establishment of the IKE the SA counter synchronization capability after the establishment of
SA . So that standby member is aware of the capability and can use the IKE SA. The standby member can then use this capability when it
it when it becomes the active member after failover event. becomes the active member after a failover event.
After the failover event, when the standby member becomes active, it
has to request the SA counters from the peer. The newly-active
member initiates the synchronization request with an Informational
exchange with Message ID zero containing either the notification
IKEV2_MESSAGE_ID_SYNC or the two notifications IKEV2_MESSAGE_ID_SYNC
and IPSEC_REPLAY_COUNTER_SYNC, depending on whether the
synchronization is to be done for IKEv2 Message IDs or for both IKEv2
Message IDs and IPsec replay counters. If the active member has only
negotiated synchronization of IPsec Replay Counters, the request is
sent as a regular IKEv2 Informational exchange (i.e. with a non-zero
Message ID) containing the notification IPSEC_REPLAY_COUNTER_SYNC.
The initiator of the IKEv2 Message ID synchronization request sends
its expected send and receive Message ID values and "failover count"
in an IKEV2_MESSAGE_ID_SYNC notification. The responder compares the
received values with its local values. For both the send and the
receive value, the higher of the cluster member's value and the local
value is selected, and sent in the response message with the
notification IKEV2_MESSAGE_ID_SYNC. The initiator then updates its
send and receive IKEv2 Message IDs to the values received in the
response and can start a normal IKEv2 message exchange.
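The "pick the higher value" step on the responder can be sketched as
follows. This is a minimal illustration, not part of the protocol
definition: the MessageIds type and the select_sync_values function
are hypothetical names, and the sketch assumes the caller has already
expressed both pairs from the same point of view, so that like is
compared with like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageIds:
    """A pair of expected Message IDs (hypothetical helper type)."""
    expected_send: int
    expected_recv: int


def select_sync_values(received: MessageIds, local: MessageIds) -> MessageIds:
    """Return, field by field, the higher of the received and local values.

    Assumption: both arguments describe the same direction of traffic,
    i.e. any mapping between the two sides' views was done by the caller.
    """
    return MessageIds(
        expected_send=max(received.expected_send, local.expected_send),
        expected_recv=max(received.expected_recv, local.expected_recv),
    )
```

The returned pair is what the responder would place in its response;
the initiator adopts the values it receives.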
The initiator of an IPsec Replay Counter synchronization sends the
incremented outgoing IPsec SA replay counter value and a "failover
count" in an IPSEC_REPLAY_COUNTER_SYNC notification in an IKEv2
Informational exchange. The responder updates its incoming IPsec SA
counter values according to the received value. The responder then
sends its own incremented outgoing IPsec SA Replay Counter value in a
synchronization response message, with the same
IPSEC_REPLAY_COUNTER_SYNC notification. The initiator can now update
its incoming IPsec SA counter to the values received in the response
message and can start normal IPsec data traffic.
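How a receiver might update its incoming counters can be sketched as
below, assuming the delta semantics given for
IPSEC_REPLAY_COUNTER_SYNC in Section 6.4 (one value added to the
counters of all Child SAs of the IKE SA). The function name, the
SPI-keyed dictionary and the choice to reject an overflowing delta are
assumptions made for this illustration only; the text does not say how
overflow is to be handled.

```python
from typing import Dict

SEQ_MAX_32 = 2**32 - 1  # largest sequence number without ESN
SEQ_MAX_64 = 2**64 - 1  # largest sequence number with ESN


def apply_replay_delta(counters: Dict[int, int], delta: int, esn: bool) -> Dict[int, int]:
    """Add the same delta to every Child SA counter (keyed by SPI).

    Assumption: a delta that would overflow any counter is rejected
    with ValueError rather than wrapped or clamped.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    limit = SEQ_MAX_64 if esn else SEQ_MAX_32
    updated = {}
    for spi, value in counters.items():
        new_value = value + delta
        if new_value > limit:
            raise ValueError("counter overflow for SPI 0x%08x" % spi)
        updated[spi] = new_value
    return updated
```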
The IKEV2_MESSAGE_ID_SYNC notification payload contains nonce data to
avoid a denial-of-service (DoS) attack due to replay of the SA counter
synchronization response. The nonce values are selected randomly for
each new notification and MUST be validated by the receiver. The
nonce data sent in the response MUST match the nonce data sent by the
newly-active member in its request. If the nonce data received in
the response does not match the request's nonce data, the cluster
member MUST silently discard this response, and SHOULD revert to
normal IKEv2 behavior of retransmitting the request and waiting for a
genuine reply from the peer. Eventually this might result in the
SA being torn down because of excessive retransmissions.
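The nonce handling on the newly-active member can be sketched as
follows. This is an illustration under stated assumptions: the
function names are hypothetical, and the 4-octet nonce length is taken
from the Nonce Data field of the IKEV2_MESSAGE_ID_SYNC payload in
Section 6.3.

```python
import hmac
import secrets

NONCE_LEN = 4  # octets, per the Nonce Data field in Section 6.3


def new_nonce() -> bytes:
    """Pick fresh random nonce data for a new synchronization request."""
    return secrets.token_bytes(NONCE_LEN)


def response_nonce_ok(sent: bytes, received: bytes) -> bool:
    """True only if the response echoes exactly the nonce of the request.

    A False result means the response is to be silently discarded and
    the request retransmitted as in normal IKEv2 operation.
    """
    return len(received) == NONCE_LEN and hmac.compare_digest(sent, received)
```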
Standby [Newly Active] Member                              Peer
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HDR, SK {N(IKEV2_MESSAGE_ID_SYNC),
    [N(IPSEC_REPLAY_COUNTER_SYNC)]} -------->
              <--------- HDR, SK {N(IKEV2_MESSAGE_ID_SYNC),
                             [N(IPSEC_REPLAY_COUNTER_SYNC)]}

Alternatively, if only IPsec Replay Counter synchronization is
desired, a normal Informational exchange is used, where the Message ID
is non-zero:

Standby [Newly Active] Member                              Peer
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HDR, SK {N(IPSEC_REPLAY_COUNTER_SYNC)} -------->
              <--------- HDR, SK {N(IPSEC_REPLAY_COUNTER_SYNC)}

6. IKEv2/IPsec Synchronization Notification Payloads
This section lists the new notification payload types defined by
this extension.
6.1. IKEV2_MESSAGE_ID_SYNC_SUPPORTED
IKEV2_MESSAGE_ID_SYNC_SUPPORTED: This notification payload is
included in the IKE_AUTH request/response to indicate support of the
IKEv2 Message ID synchronization mechanism described in this
document.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Payload  |C|  RESERVED   |         Payload Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Protocol ID(=0)| SPI Size (=0) |      Notify Message Type      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and
'Notify Message Type' fields are the same as described in Section 3
of [RFC5996]. The 'SPI Size' field MUST be set to 0 to indicate
that the SPI is not present in this message. The 'Protocol ID' MUST
be set to 0, since the notification is not specific to a particular
security association. The 'Payload Length' field is set to the
length in octets of the entire payload, including the generic payload
header. The 'Notify Message Type' field is set to indicate
IKEV2_MESSAGE_ID_SYNC_SUPPORTED, value TBD by IANA. There is no data
associated with this notification.
6.2. IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED
IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED: This notification payload is
included in the IKE_AUTH request/response to indicate support for the
IPsec SA Replay Counter synchronization mechanism described in this
document.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Payload  |C|  RESERVED   |         Payload Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Protocol ID(=0)| SPI Size (=0) |      Notify Message Type      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The 'Next Payload', 'Payload Length', 'Protocol ID', 'SPI Size', and
'Notify Message Type' fields are the same as described in Section 3
of [RFC5996]. The 'SPI Size' field MUST be set to 0 to indicate
that the SPI is not present in this message. The 'Protocol ID' MUST
be set to 0, since the notification is not specific to a particular
security association. The 'Payload Length' field is set to the
length in octets of the entire payload, including the generic payload
header. The 'Notify Message Type' field is set to indicate
IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED, value TBD by IANA. There is no
data associated with this notification.
6.3. IKEV2_MESSAGE_ID_SYNC
IKEV2_MESSAGE_ID_SYNC: This notification payload type (value TBD by
IANA) is defined to synchronize the IKEv2 Message ID values between
the newly-active (formerly standby) cluster member and the peer.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Payload  |   RESERVED    |         Payload Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Protocol ID(=0)| SPI Size (=0) |      Notify Message Type      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Failover Count                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Nonce Data                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 EXPECTED_SEND_REQ_MESSAGE_ID                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 EXPECTED_RECV_REQ_MESSAGE_ID                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It contains the following data.
o Failover Count (4 octets): a running count of failover events
between cluster members. It is initialized to 0 when the cluster
is first set up, and incremented by 1 upon each failover event.
o Nonce Data (4 octets): the random nonce data. The data should be
identical in the synchronization request and response.
o EXPECTED_SEND_REQ_MESSAGE_ID (4 octets): this field is used by the
sender of this notification payload to indicate the Message ID it
will use in the next request that it will send to the other
protocol peer.
o EXPECTED_RECV_REQ_MESSAGE_ID (4 octets): this field is used by the
sender of this notification payload to indicate the Message ID it
is expecting in the next request to be received from the other
protocol peer.
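The -02 layout of this payload is fixed-size (24 octets), so encoding
and decoding it can be sketched with a single struct format. The
function names and the dictionary keys are hypothetical, and the
Notify Message Type is passed in as a parameter because its value is
still to be assigned by IANA.

```python
import struct

# Next Payload, RESERVED, Payload Length, Protocol ID, SPI Size,
# Notify Message Type, Failover Count, Nonce Data, send ID, receive ID
_LAYOUT = struct.Struct("!BBHBBHI4sII")  # network byte order, 24 octets


def build_message_id_sync(next_payload: int, notify_type: int,
                          failover_count: int, nonce: bytes,
                          expected_send: int, expected_recv: int) -> bytes:
    """Encode the payload; notify_type is a caller-supplied placeholder."""
    if len(nonce) != 4:
        raise ValueError("Nonce Data must be 4 octets")
    return _LAYOUT.pack(next_payload, 0, _LAYOUT.size, 0, 0, notify_type,
                        failover_count, nonce, expected_send, expected_recv)


def parse_message_id_sync(data: bytes) -> dict:
    """Decode the payload, rejecting malformed input with ValueError."""
    if len(data) != _LAYOUT.size:
        raise ValueError("unexpected payload size")
    (next_payload, _reserved, length, protocol_id, spi_size, notify_type,
     failover_count, nonce, expected_send, expected_recv) = _LAYOUT.unpack(data)
    if length != _LAYOUT.size or protocol_id != 0 or spi_size != 0:
        raise ValueError("malformed IKEV2_MESSAGE_ID_SYNC payload")
    return {
        "next_payload": next_payload,
        "notify_type": notify_type,
        "failover_count": failover_count,
        "nonce": nonce,
        "expected_send": expected_send,
        "expected_recv": expected_recv,
    }
```

A receiver that gets ValueError from the parser would treat the
payload as malformed, which Section 7 maps to INVALID_SYNTAX.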
6.4. IPSEC_REPLAY_COUNTER_SYNC
IPSEC_REPLAY_COUNTER_SYNC: This notification payload type (value TBD
by IANA) is defined to synchronize the IPsec SA Replay Counters
between the newly-active (formerly standby) cluster member and the
peer. Since there may be numerous IPsec SAs established under a
single IKE SA, we do not directly synchronize the value of each one.
Instead, a delta value is sent and all Replay Counters for Child SAs
of this IKE SA are incremented by the same value. Note that this
solution requires that either all of these Child SAs use Extended
Sequence Numbers [RFC4301] or none of them do.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Payload  |E|  RESERVED   |         Payload Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Protocol ID(=0)| SPI Size (=0) |      Notify Message Type      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Outgoing IPsec SA counter                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The notification payload contains the following data.
o E (1 bit): The ESN bit. This MUST be 1 if the IPsec SAs were
established with Extended Sequence Numbers.
o Outgoing IPsec SA delta value (4 or 8 octets): The sender will
increment all the Child SA Replay Counters for its outgoing
traffic by this value. The size of this field depends on the ESN
bit: if the ESN bit is 1, its size is 8 octets, otherwise it is 4
octets.
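The dependency of the field size on the ESN bit can be sketched as
follows, using the -02 layout of this section (generic header, Notify
header, then a 4- or 8-octet delta). The function names are
hypothetical, the Notify Message Type is a caller-supplied placeholder
pending IANA assignment, and placing the E bit in the most significant
bit of the second octet is an assumption read off the figure.

```python
import struct

_HEADER = struct.Struct("!BBHBBH")  # generic header + Notify header, 8 octets
E_BIT = 0x80  # assumed position: first bit after the Next Payload octet


def build_replay_counter_sync(next_payload: int, notify_type: int,
                              delta: int, esn: bool) -> bytes:
    """Encode the payload with a 4-octet (no ESN) or 8-octet (ESN) delta."""
    body = struct.pack("!Q" if esn else "!I", delta)
    flags = E_BIT if esn else 0
    length = _HEADER.size + len(body)
    return _HEADER.pack(next_payload, flags, length, 0, 0, notify_type) + body


def parse_replay_counter_sync(data: bytes):
    """Return (esn, delta); raise ValueError if the payload is malformed."""
    if len(data) < _HEADER.size:
        raise ValueError("payload too short")
    _next, flags, length, protocol_id, spi_size, _type = _HEADER.unpack_from(data)
    esn = bool(flags & E_BIT)
    expected = _HEADER.size + (8 if esn else 4)
    if length != expected or len(data) != expected:
        raise ValueError("length does not match the ESN bit")
    if protocol_id != 0 or spi_size != 0:
        raise ValueError("Protocol ID and SPI Size must be 0")
    (delta,) = struct.unpack_from("!Q" if esn else "!I", data, _HEADER.size)
    return esn, delta
```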
7. Implementation Details
The Message ID value used in the Informational exchange that contains
the IKEV2_MESSAGE_ID_SYNC notification MUST be zero so that it is not
validated upon receipt as required by normal IKEv2 windowing.
Message ID zero MUST be accepted only in an Informational exchange
that contains a notification of type IKEV2_MESSAGE_ID_SYNC. If any
Informational exchange has a Message ID zero, but not this
notification type, such messages MUST be discarded upon decryption
and the INVALID_SYNTAX notification SHOULD be sent. Other payloads
MUST NOT be sent in this Informational exchange. Whenever an
IKEV2_MESSAGE_ID_SYNC or IPSEC_REPLAY_COUNTER_SYNC notification
payload is received with an invalid failover count or invalid nonce
data, the event SHOULD be logged.
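The acceptance rule for Message ID zero can be sketched as a simple
predicate over a decrypted message. The function name and the
string-based representation of exchange and notification types are
hypothetical. The sketch also assumes that an accompanying
IPSEC_REPLAY_COUNTER_SYNC notification is permitted, as in the
exchange diagram of Section 5, while any other payload is not.

```python
from typing import Iterable

MESSAGE_ID_SYNC = "IKEV2_MESSAGE_ID_SYNC"
REPLAY_COUNTER_SYNC = "IPSEC_REPLAY_COUNTER_SYNC"


def message_id_zero_allowed(exchange_type: str, payloads: Iterable[str]) -> bool:
    """Decide whether a decrypted message with Message ID zero is acceptable.

    False means: discard the message (and, per the text above,
    INVALID_SYNTAX SHOULD be sent).
    """
    names = list(payloads)
    if exchange_type != "INFORMATIONAL":
        return False
    if MESSAGE_ID_SYNC not in names:
        return False
    # Assumption: only the two synchronization notifications may appear.
    return all(name in (MESSAGE_ID_SYNC, REPLAY_COUNTER_SYNC) for name in names)
```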
The standby member can initiate the synchronization of IKEv2 Message
IDs under different circumstances.
o When it receives a problematic IKEv2/IPsec packet, i.e. a packet
outside its expected receive window.
o When it has to send the first IKEv2/IPsec packet after a failover
event.
o When it has just received control from the active member and wishes
to update the values proactively, so that it need not start this
exchange later, when sending or receiving the request.

The standby member can initiate the synchronization of IPsec SA
Replay Counters:
o If there has been traffic using the IPsec SA in the recent past
and the standby member suspects that its Replay Counter may be
stale.
Since there can be a large number of sessions at the standby member,
and sending synchronization exchanges for all of them may result in
overload, the standby member can choose to initiate the exchange in a
"lazy" fashion: only when it has to send or receive the request. In
general, the standby member is free to initiate this exchange at its
discretion.

A cluster member which has not announced its capability by using
IKEV2_MESSAGE_ID_SYNC_SUPPORTED MUST NOT send or accept the
notification IKEV2_MESSAGE_ID_SYNC.

A cluster member which has not announced its capability by using
IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED MUST NOT send or accept the
notification IPSEC_REPLAY_COUNTER_SYNC.

If a peer receives an IKEV2_MESSAGE_ID_SYNC or
IPSEC_REPLAY_COUNTER_SYNC request although it had not announced the
appropriate capability in the IKE_AUTH exchange, then it MUST
silently ignore this message.

As usual in IKEv2, if any of the notification payloads defined here
is malformed, the receiver must announce this fact using the
INVALID_SYNTAX notification.
8. Step by Step Details
This section goes through the sequence of steps of a typical failover
event, where the IKEv2 Message ID values are synchronized.
o The active cluster member and the peer device establish the
session. They both announce the capability to synchronize counter
information by sending the IKEV2_MESSAGE_ID_SYNC_SUPPORTED
notification in the IKE_AUTH Exchange.
o The active member dies, and a standby member takes over. The
standby member sends its own idea of the IKE Message IDs (both
incoming and outgoing) to the peer in an Informational message
exchange with Message ID zero.
o The peer first authenticates the message and then validates the
failover count. The peer compares the received values with the
values available locally and picks the higher value. It then
updates its Message IDs with the higher values and also proposes
the same values in its response.
o The peer should not wait for any pending responses while
responding with the new Message ID values. For example, if the
window size is 5 and the peer's window is 3-7, and if the peer has
sent requests 3, 4, 5, 6, 7 and received responses only for 4, 5,
6, 7 but not for 3, then it should include the value 8 in its
EXPECTED_SEND_REQ_MESSAGE_ID field and should not wait for a
response to message 3 anymore.
o Similarly, the peer should also not wait for pending (incoming)
requests. For example, if the window size is 5 and the peer's
window is 3-7 and if the peer has received requests 4, 5, 6, 7 but
not 3, then it should send the value 8 in the
EXPECTED_RECV_REQ_MESSAGE_ID field, and should not expect to
receive message 3 anymore.
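The two window examples above reduce to "one more than the highest
Message ID seen so far, regardless of gaps". A minimal sketch, in
which the function name and the window_start fallback for an empty
window are assumptions made for illustration:

```python
from typing import Iterable


def next_expected_id(seen_ids: Iterable[int], window_start: int) -> int:
    """Return the value to advertise after a failover.

    seen_ids holds the Message IDs already sent (for
    EXPECTED_SEND_REQ_MESSAGE_ID) or already received (for
    EXPECTED_RECV_REQ_MESSAGE_ID). Gaps such as a missing message 3
    are deliberately ignored. If nothing has been seen, the start of
    the window is returned (an assumption of this sketch).
    """
    return max(seen_ids, default=window_start - 1) + 1
```

With the numbers from the text, requests 3 to 7 sent gives 8 for the
send field, and requests 4 to 7 received gives 8 for the receive
field.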
In the case of multiple successive failover events where a
synchronization request is lost, the failover count at the peer will
not have been updated, while the new standby member becomes active
with a further incremented failover count. The peer can therefore
receive a valid failover count that has increased by more than 1.
Accepting an incremented failover count within a range is allowed and
increases interoperability.
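One way to accept a failover count "within a range" is sketched below.
The function name and the max_skip parameter are hypothetical; the
text does not specify the size of the range, and the sketch ignores
wrap-around of the 4-octet counter.

```python
def failover_count_acceptable(received: int, local: int, max_skip: int = 1) -> bool:
    """Accept a failover count that is newer than the local copy by at
    most max_skip steps.

    max_skip=1 accepts only a single failover; a larger value tolerates
    lost synchronization requests across successive failovers.
    """
    return local < received <= local + max_skip
```

A count equal to or lower than the local copy is rejected here, which
matches the use of the Failover Count against replayed requests in
Section 9.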
9. Security Considerations
Since Message ID synchronization messages need to be sent with
Message ID zero, they are potentially vulnerable to replay attacks.
Because of the semantics of this protocol, these can only be
denial-of-service (DoS) attacks, and we are aware of two variants.
o Replay of the Message ID synchronization request: This is countered
by use of the Failover Count, since synchronization starts after the
failover event and each member of the cluster needs to be aware of
the failover event. The receiver of the synchronization request
should verify the received Failover Count and maintain its own
copy of it. If a peer receives a synchronization request with an
already observed Failover Count, it can safely discard the request
if it has already received a valid IKEv2 request or response from
the other side after the synchronization exchange. The peer cannot
know that its synchronization response has reached the other side
until it receives a valid IKEv2 request or response from it. Until
then, and as long as the Failover Count has not increased, the peer
can answer such a request with its cached synchronization response.
o Replay of the Message ID synchronization response: This is
countered by sending the nonce data along with the synchronization
payload. The same nonce data has to be returned in the response.
Thus the standby member will accept a reply only for the current
request. After it receives a valid response, it MUST NOT process
the same response again and MUST discard any additional responses.
10. Interaction with other drafts
The usage scenario of the IKEv2/IPsec SA counter synchronization
proposal is that an IKEv2 SA has been established between the active
member of a hot-standby cluster and a peer, then a failover event
occurred with the standby member becoming active. The proposal
further assumes that the IKEv2 SA state was continuously synchronized
between the active and standby members of the cluster before the
failover event.
o Session resumption [RFC5723] assumes that a peer (client or
initiator) detects the need to re-establish the session. In
IKEv2/IPsec SA counter synchronization, it is the newly-active
member (a gateway or responder) that detects the need to
synchronize the SA counter after the failover event. Also in a
hot-standby cluster, the peer establishes the IKEv2/IPsec session
with a single IP address that represents the whole cluster, so the
peer normally does not detect the event of failover in the cluster
unless the standby member takes too long to become active and the
IKEv2 SA times out by use of the IKEv2 liveness check mechanism.
To conclude, session resumption and SA counter synchronization
after failover are mutually exclusive.
o The IKEv2 Redirect mechanism for load-balancing [RFC5685] can be
used either during the initial stages of SA setup (the IKE_SA_INIT
and IKE_AUTH exchanges) or after session establishment. SA
counter synchronization is only useful after the IKE SA has been
established and a failover event has occurred. So, unlike
Redirect, it is irrelevant during the first two exchanges.
Redirect after the session has been established is mostly useful
for timed or planned shutdown/maintenance. A real failover event
cannot be detected by the active member ahead of time, and so
using Redirect after session establishment is not possible in the
case of failover. So, Redirect and SA counter synchronization
after failover are mutually exclusive.
o IKEv2 Failure Detection [I-D.ietf-ipsecme-failure-detection]
solves a similar problem where the peer can rapidly detect that a
cluster member has crashed based on a token. It is unrelated to
the current scenario because the goal in failover is for the peer
not to notice that a failure has occurred.
11. IANA Considerations
This document introduces four new IKEv2 Notification Message types as
described in Section 6. The new Notify Message Types must be assigned
values between 16396 and 40959.
o IKEV2_MESSAGE_ID_SYNC_SUPPORTED.
o IPSEC_REPLAY_COUNTER_SYNC_SUPPORTED.
o IKEV2_MESSAGE_ID_SYNC.
o IPSEC_REPLAY_COUNTER_SYNC.
12. Acknowledgements 12. Acknowledgements
We would like to thank Pratima Sethi and Frederic Detienne for their We would like to thank Pratima Sethi and Frederic Detienne for their
reviews comments and valuable suggestions for initial version of the review comments and valuable suggestions for the initial version of
document. the document.
We would also like to thank following people (in alphabetical order) We would also like to thank the following people (in alphabetical
for their review comments and valuable suggestions: Dan Harkins, Paul order) for their review comments and valuable suggestions: Dan
Hoffman, Steve Kent, Tero Kivinen, David McGrew, Pekka Riikonen, and Harkins, Paul Hoffman, Steve Kent, Tero Kivinen, David McGrew, Pekka
Yaron Sheffar. Riikonen, and Yaron Sheffer.
13. Change Log 13. Change Log
This section lists all the changes in this document. This section lists all the changes in this document.
NOTE TO RFC EDITOR: Please remove this section before publication. NOTE TO RFC EDITOR: Please remove this section before publication.
13.1. Draft -01 13.1. Draft -02
Added "Multiple and Simultaneous failover' scenarios. Addressed comments by Yaron Sheffer posted on the WG mailing list.
Numerous editorial changes.
13.2. Draft -01
Added "Multiple and Simultaneous failover' scenarios as pointed out
by Pekka Riikonen.
The document now provides a mechanism to sync either the IKEv2 The document now provides a mechanism to sync either the IKEv2
Message ID counters or the IPsec replay counter, or both, to cater Message ID counters or the IPsec replay counter, or both, to cater
to different types of implementations. to different types of implementations.
The HA cluster's "failover count" is used to counter replay of sync The HA cluster's "failover count" is used to counter replay of sync
requests by an attacker. requests by an attacker.
The sync of the IPsec SA replay counter is optimized to have just one The sync of the IPsec SA replay counter is optimized to have just one
global bumped-up outgoing IPsec SA counter for ALL Child SAs under an global bumped-up outgoing IPsec SA counter for ALL Child SAs under an
IKEv2 SA. IKEv2 SA.
The examples added for IKEv2 message Id sync to provide more clarity. The examples added for IKEv2 Message ID sync to provide more clarity.
Some edits as per comments on mailing list to enhance clarity. Some edits as per comments on mailing list to enhance clarity.
13.2. Draft -00 13.3. Draft -00
Version 00 is identical to Version 00 is identical to
draft-kagarigi-ipsecme-ikev2-windowsync-04, started as WG document. draft-kagarigi-ipsecme-ikev2-windowsync-04, started as WG document.
Added IPSECME WG HA design team members as authors. Added IPSECME WG HA design team members as authors.
Added comment in Introduction to discuss the window sync process on Added comment in Introduction to discuss the window sync process on
WG mailing list to solve some concerns. WG mailing list to solve some concerns.
14. References 14. References
14.1. Normative References 14.1. Normative References
[IPsec Cluster Problem Statement]
Nir, Y., "IPsec Cluster Problem Statement", July 2010.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
[RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, [RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
"Internet Key Exchange Protocol: IKEv2", RFC 5996, "Internet Key Exchange Protocol Version 2 (IKEv2)",
September 2010. RFC 5996, September 2010.
[RFC6027] Nir, Y., "IPsec Cluster Problem Statement", RFC 6027,
October 2010.
14.2. Informative References 14.2. Informative References
[I-D.ietf-ipsecme-failure-detection]
Nir, Y., Wierbowski, D., Detienne, F., and P. Sethi, "A
Quick Crash Detection Method for IKE",
draft-ietf-ipsecme-failure-detection-01 (work in
progress), October 2010.
[RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for [RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for
IKEv2", RFC 5685, November 2009. the Internet Key Exchange Protocol Version 2 (IKEv2)",
RFC 5685, November 2009.
[RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", [RFC5723] Sheffer, Y. and H. Tschofenig, "Internet Key Exchange
RFC 5723, January 2010. Protocol Version 2 (IKEv2) Session Resumption", RFC 5723,
January 2010.
Appendix A. IKEv2 Message Id examples [RFC5798] Nadas, S., "Virtual Router Redundancy Protocol (VRRP)
Version 3 for IPv4 and IPv6", RFC 5798, March 2010.
Below are the examples to illustrate how the IKEv2 message Id values Appendix A. IKEv2 Message ID Sync Examples
are synced. The notation used to denote EXPECTED_SEND_REQ_MESSAGE_ID
and EXPECTED_RECV_REQ_MESSAGE_ID on a member is This (non-normative) section presents some examples that illustrate
how the IKEv2 Message ID values are synchronized. We use a tuple
notation, denoting the two counters EXPECTED_SEND_REQ_MESSAGE_ID and
EXPECTED_RECV_REQ_MESSAGE_ID on a member as
(EXPECTED_SEND_REQ_MESSAGE_ID, EXPECTED_RECV_REQ_MESSAGE_ID). (EXPECTED_SEND_REQ_MESSAGE_ID, EXPECTED_RECV_REQ_MESSAGE_ID).
Normal failover - Example 1 A.1. Normal Failover - Example 1
Standby [Newly Active] Member Peer Standby (Newly Active) Member Peer
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Request SYNC (2, 3) --------> Sync Request (2, 3) -------->
Peer has values as (4, 5) so it sends Peer has the values (4, 5) so it sends
< -------------( 4, 5) Response SYNC <------------- (4, 5) as the Sync Response
Normal failover - Example 2 A.2. Normal Failover - Example 2
Standby [Newly Active] Member Peer Standby (Newly Active) Member Peer
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Request SYNC (2, 5) --------> Sync Request (2, 5) -------->
Peer has values as (2, 4) so it sends
< -------------( 5, 4) Response SYNC
Simultaneous failover Peer has the values (2, 4) so it sends
<-------------(5, 4) as the Sync Response
In case of simultaneous failover, both the sides send the SYNC A.3. Simultaneous Failover
request, but whichever side has the higher value will be eventually
synced.
Standby [Newly Active] Member Peer In the case of simultaneous failover, both sides send the
synchronization request, but whichever side has the higher value will
be eventually synchronized.
Standby (Newly Active) Member Peer
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
request SYNC (4,4) -----> Sync Request (4,4) ----->
<-------------- request SYNC (5,5) <-------------- Sync Request (5,5)
response SYNC (5,5) ----> Sync Response (5,5) ---->
<-------- response SYNC (5,5) <-------- Sync Response (5,5)
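The three examples above are consistent with a simple reconciliation rule: the receiver of a sync request compares its own send counter with the requester's receive counter, and its own receive counter with the requester's send counter, and keeps the higher value in each case. The Python sketch below illustrates that rule. It is inferred from the examples only, not taken from the normative processing rules, and the names MessageIds and sync_response are hypothetical.

```python
from typing import NamedTuple


class MessageIds(NamedTuple):
    """(EXPECTED_SEND_REQ_MESSAGE_ID, EXPECTED_RECV_REQ_MESSAGE_ID) on one side."""
    send: int
    recv: int


def sync_response(local: MessageIds, request: MessageIds) -> MessageIds:
    """Return the tuple the receiver of a sync request answers with.

    Assumption (inferred from the appendix examples): what one side expects
    to send is what the other side expects to receive, so the counters are
    compared crosswise and the higher value wins.
    """
    return MessageIds(
        send=max(local.send, request.recv),
        recv=max(local.recv, request.send),
    )


# Example 1: the peer holds (4, 5) and receives Sync Request (2, 3).
example_1 = sync_response(MessageIds(4, 5), MessageIds(2, 3))

# Example 2: the peer holds (2, 4) and receives Sync Request (2, 5).
example_2 = sync_response(MessageIds(2, 4), MessageIds(2, 5))

# Simultaneous failover: each side answers the other's request.
member_reply = sync_response(MessageIds(4, 4), MessageIds(5, 5))
peer_reply = sync_response(MessageIds(5, 5), MessageIds(4, 4))
```

Under this assumed rule, example_1 and example_2 reproduce the Sync Responses (4, 5) and (5, 4) shown in the diagrams, and both replies in the simultaneous case are (5, 5).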
Authors' Addresses Authors' Addresses
Raj Singh (Editor) Raj Singh (Editor)
Cisco Systems, Inc. Cisco Systems, Inc.
Divyashree Chambers, B Wing, O'Shaugnessy Road Divyashree Chambers, B Wing, O'Shaugnessy Road
Bangalore, Karnataka 560025 Bangalore, Karnataka 560025
India India
Phone: +91 80 4301 3320 Phone: +91 80 4301 3320
Email: rsj@cisco.com Email: rsj@cisco.com
Kalyani Garigipati Kalyani Garigipati
Cisco Systems, Inc. Cisco Systems, Inc.
Divyashree Chambers, B Wing, O'Shaugnessy Road Divyashree Chambers, B Wing, O'Shaugnessy Road
Bangalore, Karnataka 560025 Bangalore, Karnataka 560025
India India
Phone: +91 80 4426 4831 Phone: +91 80 4426 4831
Email: kagarigi@cisco.com Email: kagarigi@cisco.com
Yoav Nir Yoav Nir
Check Point Software Technologies Ltd. Check Point Software Technologies Ltd.
5 Hasolelim st. 5 Hasolelim St.
Tel Aviv 67897 Tel Aviv 67897
Israel Israel
Email: ynir@checkpoint.com Email: ynir@checkpoint.com
Dacheng Zhang Dacheng Zhang
Huawei Technologies Ltd. Huawei Technologies Ltd.
Email: zhangdacheng@huawei.com Email: zhangdacheng@huawei.com
 End of changes. 112 change blocks. 
483 lines changed or deleted 547 lines changed or added
