draft-ietf-ipsecme-ipsec-ha-06.txt   draft-ietf-ipsecme-ipsec-ha-07.txt 
Network Working Group Y. Nir Network Working Group Y. Nir
Internet-Draft Check Point Internet-Draft Check Point
Intended status: Informational June 10, 2010 Intended status: Informational June 24, 2010
Expires: December 12, 2010 Expires: December 26, 2010
IPsec Cluster Problem Statement IPsec Cluster Problem Statement
draft-ietf-ipsecme-ipsec-ha-06 draft-ietf-ipsecme-ipsec-ha-07
Abstract Abstract
This document defines terminology, problem statement and requirements This document defines terminology, problem statement and requirements
for implementing IKE and IPsec on clusters. It also describes gaps for implementing IKE and IPsec on clusters. It also describes gaps
in existing standards and their implementation that need to be in existing standards and their implementation that need to be
filled, in order to allow peers to interoperate with clusters from filled, in order to allow peers to interoperate with clusters from
different vendors. An agreed terminology, problem statement and different vendors. An agreed terminology, problem statement and
requirements will allow the IPSECME WG to consider development of requirements will allow the IPSECME WG to consider development of
IPsec/IKEv2 mechanisms to simplify cluster implementations. IPsec/IKEv2 mechanisms to simplify cluster implementations.
skipping to change at page 1, line 36 skipping to change at page 1, line 36
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 12, 2010. This Internet-Draft will expire on December 26, 2010.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 4, line 9 skipping to change at page 4, line 9
"Member" is one gateway in a cluster. "Member" is one gateway in a cluster.
"Availability" is a measure of a system's ability to perform the "Availability" is a measure of a system's ability to perform the
service for which it was designed. It is measured as the percentage service for which it was designed. It is measured as the percentage
of time a service is available, from the time it is supposed to be of time a service is available, from the time it is supposed to be
available. Colloquially, availability is sometimes expressed in available. Colloquially, availability is sometimes expressed in
"nines" rather than percentage, with 3 "nines" meaning 99.9% "nines" rather than percentage, with 3 "nines" meaning 99.9%
availability, 4 "nines" meaning 99.99% availability, etc. availability, 4 "nines" meaning 99.99% availability, etc.
"High Availability" is a condition of a system, not a configuration "High Availability" is a property of a system, not a configuration
type. A system is said to have high availability if its expected type. A system is said to have high availability if its expected
down time is low. High availability can be achieved in various ways, down time is low. High availability can be achieved in various ways,
one of which is clustering. All the clusters described in this one of which is clustering. All the clusters described in this
document achieve high availability. What "high" means depends on document achieve high availability. What "high" means depends on the
application, but usually is 4 to 6 "nines" (at most 0.5-50 minutes of application, but usually is 4 to 6 "nines" (at most 0.5-50 minutes of
down time per year in a system that is supposed to be available all down time per year in a system that is supposed to be available all
the time). the time).
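The "nines" figures above follow from simple arithmetic. The following is a minimal sketch, assuming a 365-day year and a service that is supposed to be available all the time; the function name is illustrative and does not come from the draft.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly down time for an availability of `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. 4 nines -> 0.0001 (99.99% available)
    return unavailability * MINUTES_PER_YEAR


for n in (3, 4, 5, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under these assumptions 4 "nines" allows roughly 52.6 minutes of down time per year and 6 "nines" roughly 0.53 minutes, which is where the "0.5-50 minutes" range comes from.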
"Fault Tolerance" is a condition related to high availability, where "Fault Tolerance" is a property related to high availability, where a
a system maintains service availability, even when a specified set of system maintains service availability, even when a specified set of
fault conditions occur. In clusters, we expect the system to fault conditions occur. In clusters, we expect the system to
maintain service availability, when one or more of the cluster maintain service availability, when one or more of the cluster
members fails. members fails.
"Completely Transparent Cluster" is a cluster where the occurrence of "Completely Transparent Cluster" is a cluster where the occurrence of
a fault is never visible to the peers. a fault is never visible to the peers.
"Partially Transparent Cluster" is a cluster where the occurrence of a "Partially Transparent Cluster" is a cluster where the occurrence of a
fault may be visible to the peers. fault may be visible to the peers.
"Hot Standby Cluster", or "HS Cluster" is a cluster where only one of "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
the members is active at any one time. This member is also referred the members is active at any one time. This member is also referred
to as the the "active", whereas the other(s) are referred to as to as the "active", whereas the other(s) are referred to as "stand-
"stand-bys". [VRRP] is one method of building such a cluster. bys". VRRP ([RFC5798]) is one method of building such a cluster.
"Load Sharing Cluster", or "LS Cluster" is a cluster where more than "Load Sharing Cluster", or "LS Cluster" is a cluster where more than
one of the members may be active at the same time. The term "load one of the members may be active at the same time. The term "load
balancing" is also common, but it implies that the load is actually balancing" is also common, but it implies that the load is actually
balanced between the members, and this is not a requirement. balanced between the members, and this is not a requirement.
"Failover" is the event where a one member takes over some load from "Failover" is the event where one member takes over some load from
some other member. In a hot standby cluster, this hapens when a some other member. In a hot standby cluster, this happens when a
standby member becomes active due to a failure of the former active standby member becomes active due to a failure of the former active
member, or because of an administrator command. In a load sharing member, or because of an administrator command. In a load sharing
cluster, this usually happens because of a failure of one of the cluster, this usually happens because of a failure of one of the
members, but certain load-balancing technologies may allow a members, but certain load-balancing technologies may allow a
particular load (such as all the flows associated with a particular particular load (such as all the flows associated with a particular
child SA) to move from one member to another to even out the load, child SA) to move from one member to another to even out the load,
even without any failures. even without any failures.
"Tight Cluster" is a cluster where all the members share an IP "Tight Cluster" is a cluster where all the members share an IP
address. This could be accomplished using configured interfaces with address. This could be accomplished using configured interfaces with
specialized protocols or hardware, such as VRRP, or through the use specialized protocols or hardware, such as VRRP, or through the use
of multicast addresses, but in any case, peers need only be of multicast addresses, but in any case, peers need only be
configured with one IP address in the PAD. configured with one IP address in the PAD.
"Loose Cluster" is a cluster where each member has a different IP "Loose Cluster" is a cluster where each member has a different IP
address. Peers find the correct member using some method such as DNS address. Peers find the correct member using some method such as DNS
queries or [REDIRECT]. In some cases, a member's IP address(es) may queries or the IKEv2 redirect mechanism ([RFC5685]). In some cases,
be allocated to another member at failover. a member's IP address(es) may be allocated to another member at
failover.
"Synch Channel" is a communications channel among the cluster "Synch Channel" is a communications channel among the cluster
members, used to transfer state information. The synch channel may members, used to transfer state information. The synch channel may
or may not be IP based, may or may not be encrypted, and may work or may not be IP based, may or may not be encrypted, and may work
over short or long distances. The security and physical over short or long distances. The security and physical
characteristics of this channel are out of scope for this document, characteristics of this channel are out of scope for this document,
but it is a requirement that its use be minimized for scalability. but it is a requirement that its use be minimized for scalability.
3. The Problem Statement 3. The Problem Statement
skipping to change at page 6, line 21 skipping to change at page 6, line 21
o IPsec SAs last for minutes or hours, and carry keys, selectors and o IPsec SAs last for minutes or hours, and carry keys, selectors and
other information. Some gateways may carry hundreds of thousands other information. Some gateways may carry hundreds of thousands
such IPsec SAs. such IPsec SAs.
o SPD (Security Policy Database) Cache entries. While the SPD is o SPD (Security Policy Database) Cache entries. While the SPD is
unchanging, the SPD cache changes on the fly due to narrowing. unchanging, the SPD cache changes on the fly due to narrowing.
Entries last at least as long as the SAD (Security Association Entries last at least as long as the SAD (Security Association
Database) entries, but tend to last even longer than that. Database) entries, but tend to last even longer than that.
A naive implementation of a cluster would have no synchronized state, A naive implementation of a cluster would have no synchronized state,
and a failover would produce an effect similar to that of a rebooted and a failover would produce an effect similar to that of a rebooted
gateway. [resumption] describes how new IKE and IPsec SAs can be gateway. [RFC5723] describes how new IKE and IPsec SAs can be
recreated in such a case. recreated in such a case.
3.3. IKE Counters 3.3. IKE Counters
We can overcome the first problem described in Section 3.2, by We can overcome the first problem described in Section 3.2, by
synchronizing states - whenever an SA is created, we can synch this synchronizing states - whenever an SA is created, we can synch this
new state to all other members. However, those states are not only new state to all other members. However, those states are not only
long-lived, they are also ever changing. long-lived, they are also ever changing.
IKE has message counters. A peer MUST NOT process message n until IKE has message counters. A peer MUST NOT process message n until
skipping to change at page 7, line 21 skipping to change at page 7, line 21
A possible solution is to synch replay counter information, not for A possible solution is to synch replay counter information, not for
each packet emitted, but only at regular intervals, say, every 10,000 each packet emitted, but only at regular intervals, say, every 10,000
packets or every 0.5 seconds. After a failover, the newly-active packets or every 0.5 seconds. After a failover, the newly-active
member advances the counters for outbound IPsec SAs by 10,000. To member advances the counters for outbound IPsec SAs by 10,000. To
the peer this looks like up to 10,000 packets were lost, but this the peer this looks like up to 10,000 packets were lost, but this
should be acceptable, as neither ESP nor AH guarantee reliable should be acceptable, as neither ESP nor AH guarantee reliable
delivery. delivery.
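The periodic synch described above can be sketched as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, the synch is modelled as a field update rather than a real message, and only the packet-count trigger is shown (not the 0.5 second timer).

```python
SYNC_INTERVAL = 10_000  # packets between synch messages (figure from the text)


class OutboundSA:
    """Outbound sequence counter of one IPsec SA on one cluster member."""

    def __init__(self) -> None:
        self.last_sent = 0    # last sequence number put on the wire
        self.last_synced = 0  # last value reported on the synch channel

    def send_packet(self) -> int:
        self.last_sent += 1
        if self.last_sent - self.last_synced >= SYNC_INTERVAL:
            self.last_synced = self.last_sent  # stands in for a synch message
        return self.last_sent


def resume_counter(last_synced: int) -> int:
    """Counter value the newly-active member starts from after a failover."""
    return last_synced + SYNC_INTERVAL


# The active member fails after sending 174,321 packets.
active = OutboundSA()
for _ in range(174_321):
    active.send_packet()

# The standby only knows the last synched value (170,000) and jumps ahead.
standby = OutboundSA()
standby.last_sent = resume_counter(active.last_synced)
standby.last_synced = standby.last_sent
first_after_failover = standby.send_packet()
print(active.last_synced, first_after_failover)
```

Because the active member never gets more than SYNC_INTERVAL packets ahead of its last synch, the jump guarantees that no sequence number is reused. The peer simply sees a gap, which looks like packet loss.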
3.5. Inbound SA Counters 3.5. Inbound SA Counters
An even tougher issue, is the synchronization of packet counters for An even tougher issue is the synchronization of packet counters for
inbound IPsec SAs. If a packet arrives at a newly-active member, inbound IPsec SAs. If a packet arrives at a newly-active member,
there is no way to determine whether this packet is a replay or not. there is no way to determine whether this packet is a replay or not.
The periodic synch does not solve the problem at all, because suppose The periodic synch does not solve the problem at all, because suppose
we synchronize every 10,000 packets, and the last synch before the we synchronize every 10,000 packets, and the last synch before the
failover had the counter at 170,000. It is probable, though not failover had the counter at 170,000. It is probable, though not
certain, that packet number 180,000 has not yet been processed, but certain, that packet number 180,000 has not yet been processed, but
if packet 175,000 arrives at the newly-active member, it has no way if packet 175,000 arrives at the newly-active member, it has no way
of determining whether or not that packet has or has not already been of determining whether or not that packet has or has not already been
processed. The synchronization does prevent the processing of really processed. The synchronization does prevent the processing of really
old packets, such as those with counter number 165,000. Ignoring all old packets, such as those with counter number 165,000. Ignoring all
skipping to change at page 8, line 15 skipping to change at page 8, line 15
3.6. Missing Synch Messages 3.6. Missing Synch Messages
The synch channel is very likely not to be infallible. Before The synch channel is very likely not to be infallible. Before
failover is detected, some synchronization messages may have been failover is detected, some synchronization messages may have been
missed. For example, the active member may have created a new Child missed. For example, the active member may have created a new Child
SA using message n. The new information (entry in the SAD and update SA using message n. The new information (entry in the SAD and update
to counters of the IKE SA) is sent on the synch channel. Still, with to counters of the IKE SA) is sent on the synch channel. Still, with
every possible technology, the update may be missed before the every possible technology, the update may be missed before the
failover. failover.
This is a bad situation, because the IKE SA is doomed. the newly- This is a bad situation, because the IKE SA is doomed. The newly-
active member has two problems: active member has two problems:
o It does not have the new IPsec SA pair. It will drop all incoming o It does not have the new IPsec SA pair. It will drop all incoming
packets protected with such an SA. This could be fixed by sending packets protected with such an SA. This could be fixed by sending
some DELETEs and INVALID_SPI notifications, if it wasn't for the some DELETEs and INVALID_SPI notifications, if it wasn't for the
other problem... other problem...
o The counters for the IKE SA show that only request n-1 has been o The counters for the IKE SA show that only request n-1 has been
sent. The next request will get the message ID n, but that will sent. The next request will get the message ID n, but that will
be rejected by the peer. After a sufficient number of be rejected by the peer. After a sufficient number of
retransmissions and rejections, the whole IKE SA with all retransmissions and rejections, the whole IKE SA with all
associated IPsec SAs will get dropped. associated IPsec SAs will get dropped.
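The message ID problem in the second bullet can be illustrated with a deliberately simplified model. The sketch below assumes a window size of 1 and a peer that accepts only the next expected request ID; all class names are hypothetical. A real peer may instead treat the reused ID as a retransmission of the earlier request, but in either case the new exchange makes no progress.

```python
class Peer:
    """Accepts requests strictly in order (window size of 1 is assumed)."""

    def __init__(self) -> None:
        self.next_expected = 0

    def receive_request(self, message_id: int) -> bool:
        if message_id != self.next_expected:
            return False  # stale or out-of-order message ID
        self.next_expected += 1
        return True


class Member:
    """One cluster member's view of the IKE SA request counter."""

    def __init__(self, next_request_id: int = 0) -> None:
        self.next_request_id = next_request_id

    def send_request(self, peer: Peer) -> bool:
        accepted = peer.receive_request(self.next_request_id)
        if accepted:
            self.next_request_id += 1
        return accepted


peer = Peer()
active = Member()
synced_id = 0
for _ in range(5):  # requests 0..4, each followed by a successful synch
    active.send_request(peer)
    synced_id = active.next_request_id

active.send_request(peer)  # request n = 5 creates the Child SA; synch is lost

standby = Member(next_request_id=synced_id)  # still believes ID 5 is unused
accepted = standby.send_request(peer)
print(accepted, peer.next_expected, standby.next_request_id)
```

The newly-active member keeps offering message ID n while the peer has already moved on to n+1, so every retransmission fails in the same way.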
skipping to change at page 9, line 34 skipping to change at page 9, line 34
one implementation to another. one implementation to another.
o Reply packets may arrive with an IPsec SA that is not "matched" to o Reply packets may arrive with an IPsec SA that is not "matched" to
the one used for the outgoing packets. Also, they might arrive at the one used for the outgoing packets. Also, they might arrive at
a different member. This problem is beyond the scope of this a different member. This problem is beyond the scope of this
document and should be solved by the application, perhaps by document and should be solved by the application, perhaps by
forwarding misdirected packets to the correct gateway for deep forwarding misdirected packets to the correct gateway for deep
inspection. inspection.
3.7.1. Outbound SAs using counter modes 3.7.1. Outbound SAs using counter modes
For SAs involving counter mode ciphers such as [CTR] or [GCM] there For SAs involving counter mode ciphers such as CTR ([RFC3686]) or GCM
is yet another complication. The initial vector for such modes MUST ([RFC4106]) there is yet another complication. The initial vector
NOT be repeated, and senders use methods such as counters or LFSRs to for such modes MUST NOT be repeated, and senders use methods such as
ensure this. An SA shared between more than one active member, or counters or LFSRs to ensure this. An SA shared between more than one
even failing over from one member to another need to make sure that active member, or even failing over from one member to another need
they do not generate the same initial vector. See [COUNTER_MODES] to make sure that they do not generate the same initial vector. See
for a discussion of this problem in another context. [COUNTER_MODES] for a discussion of this problem in another context.
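One conceivable way to keep initial vectors distinct across members is to reserve part of the IV for a member identifier. The sketch below is an assumption made for illustration, not something this document specifies: the 8-byte layout (1 byte of member ID followed by a 56-bit counter) and the function name are hypothetical.

```python
IV_LENGTH = 8      # bytes; assumed IV size for this illustration
COUNTER_BITS = 56  # remaining bits after a 1-byte member identifier


def make_iv(member_id: int, counter: int) -> bytes:
    """Build an IV whose first byte identifies the sending cluster member."""
    if not 0 <= member_id < 256:
        raise ValueError("member_id must fit in one byte")
    if not 0 <= counter < 2 ** COUNTER_BITS:
        raise ValueError("per-member counter exhausted; the SA must be rekeyed")
    return bytes([member_id]) + counter.to_bytes(IV_LENGTH - 1, "big")


# Two members using the same counter value still produce different IVs.
iv_a = make_iv(member_id=1, counter=5)
iv_b = make_iv(member_id=2, counter=5)
print(iv_a.hex(), iv_b.hex())
```

With such a split, members sharing an SA never collide as long as each keeps its own counter monotonic. A member taking over after a failover would either use its own identifier or advance the failed member's counter past any value that may have been used.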
3.8. Different IP addresses for IKE and IPsec 3.8. Different IP addresses for IKE and IPsec
In many implementations there are separate IP addresses for the In many implementations there are separate IP addresses for the
cluster, and for each member. While the packets protected by tunnel cluster, and for each member. While the packets protected by tunnel
mode child SAs are encapsulated in IP headers with the cluster IP mode child SAs are encapsulated in IP headers with the cluster IP
address, the IKE packets originate from a specific member, and carry address, the IKE packets originate from a specific member, and carry
that member's IP address. For the peer, this looks weird, as the that member's IP address. For the peer, this looks weird, as the
usual thing is for the IPsec packets to come from the same IP address usual thing is for the IPsec packets to come from the same IP address
as the IKE packets. as the IKE packets.
One obvious solution, is to use some fancy capability of the IKE host One obvious solution is to use some fancy capability of the IKE host
to change things so that IKE packets also come out of the cluster IP to change things so that IKE packets also come out of the cluster IP
address. This can be achieved through NAT or through assigning address. This can be achieved through NAT or through assigning
multiple addresses to interfaces. This is not, however, possible for multiple addresses to interfaces. This is not, however, possible for
all implementations. all implementations.
[ARORA] discusses this problem in greater depth, and proposes another [ARORA] discusses this problem in greater depth, and proposes another
solution, that does involve protocol changes. solution, that does involve protocol changes.
3.9. Allocation of SPIs 3.9. Allocation of SPIs
skipping to change at page 11, line 28 skipping to change at page 11, line 28
Version 02 includes comments by Yaron Sheffer and the acknowledgement Version 02 includes comments by Yaron Sheffer and the acknowledgement
section. section.
Version 03 fixes some ID-nits, and adds the problem presented by Version 03 fixes some ID-nits, and adds the problem presented by
Jitender Arora in [ARORA]. Jitender Arora in [ARORA].
Version 04 fixes a spelling mistake, moves the scope discussion to a Version 04 fixes a spelling mistake, moves the scope discussion to a
subsection of its own (Section 3.1), and adds a short discussion of subsection of its own (Section 3.1), and adds a short discussion of
the duplicate SPI problem, presented by Jean-Michel Combes. the duplicate SPI problem, presented by Jean-Michel Combes.
Versions 05, 06 and 07 just corrected nits and notation
8. Informative References 8. Informative References
[ARORA] Arora, J. and P. Kumar, "Alternate Tunnel Addresses for [ARORA] Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses
(work in progress), April 2010. (work in progress), April 2010.
[COUNTER_MODES] [COUNTER_MODES]
McGrew, D. and B. Weis, "Using Counter Modes with McGrew, D. and B. Weis, "Using Counter Modes with
Encapsulating Security Payload (ESP) and Authentication Encapsulating Security Payload (ESP) and Authentication
Header (AH) to Protect Group Traffic", Header (AH) to Protect Group Traffic",
draft-ietf-msec-ipsec-group-counter-modes (work in draft-ietf-msec-ipsec-group-counter-modes (work in
progress), March 2010. progress), March 2010.
[CTR] Housley, R., "Using Advanced Encryption Standard (AES)
Counter Mode", RFC 3686, January 2009.
[GCM] Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
(GCM) in IPsec Encapsulating Security Payload (ESP)",
RFC 4106, June 2005.
[IKEv2bis] [IKEv2bis]
Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
"Internet Key Exchange Protocol: IKEv2", "Internet Key Exchange Protocol: IKEv2",
draft-ietf-ipsecme-ikev2bis (work in progress), May 2010. draft-ietf-ipsecme-ikev2bis (work in progress), May 2010.
[REDIRECT]
Devarapalli, V. and K. Weniger, "Redirect Mechanism for
IKEv2", RFC 5685, November 2009.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3686] Housley, R., "Using Advanced Encryption Standard (AES)
Counter Mode", RFC 3686, January 2009.
[RFC4106] Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
(GCM) in IPsec Encapsulating Security Payload (ESP)",
RFC 4106, June 2005.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005. Internet Protocol", RFC 4301, December 2005.
[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
RFC 4306, December 2005. RFC 4306, December 2005.
[VRRP] Nadas, S., "Virtual Router Redundancy Protocol (VRRP)", [RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for
RFC 5798, March 2010. IKEv2", RFC 5685, November 2009.
[resumption] [RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
RFC 5723, January 2010. RFC 5723, January 2010.
[RFC5798] Nadas, S., "Virtual Router Redundancy Protocol (VRRP)",
RFC 5798, March 2010.
Author's Address Author's Address
Yoav Nir Yoav Nir
Check Point Software Technologies Ltd. Check Point Software Technologies Ltd.
5 Hasolelim st. 5 Hasolelim st.
Tel Aviv 67897 Tel Aviv 67897
Israel Israel
Email: ynir@checkpoint.com Email: ynir@checkpoint.com
 End of changes. 21 change blocks. 
40 lines changed or deleted 41 lines changed or added
