--- 1/draft-ietf-ipsecme-ipsec-ha-06.txt   2010-06-24 18:12:32.000000000 +0200
+++ 2/draft-ietf-ipsecme-ipsec-ha-07.txt   2010-06-24 18:12:32.000000000 +0200
@@ -1,18 +1,18 @@
 Network Working Group                                            Y. Nir
 Internet-Draft                                               Check Point
-Intended status: Informational                            June 10, 2010
-Expires: December 12, 2010
+Intended status: Informational                            June 24, 2010
+Expires: December 26, 2010

                     IPsec Cluster Problem Statement
-                    draft-ietf-ipsecme-ipsec-ha-06
+                    draft-ietf-ipsecme-ipsec-ha-07

 Abstract

    This document defines terminology, problem statement and
    requirements for implementing IKE and IPsec on clusters.  It also
    describes gaps in existing standards and their implementation that
    need to be filled, in order to allow peers to interoperate with
    clusters from different vendors.  An agreed terminology, problem
    statement and requirements will allow the IPSECME WG to consider
    development of IPsec/IKEv2 mechanisms to simplify cluster
    implementations.

@@ -25,21 +25,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF).  Note that other groups may also distribute
    working documents as Internet-Drafts.  The list of current Internet-
    Drafts is at http://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six
    months and may be updated, replaced, or obsoleted by other documents
    at any time.  It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on December 12, 2010.
+   This Internet-Draft will expire on December 26, 2010.

 Copyright Notice

    Copyright (c) 2010 IETF Trust and the persons identified as the
    document authors.  All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document.  Please review these documents

@@ -118,71 +118,72 @@
    "Member" is one gateway in a cluster.

    "Availability" is a measure of a system's ability to perform the
    service for which it was designed.  It is measured as the percentage
    of time a service is available, from the time it is supposed to be
    available.  Colloquially, availability is sometimes expressed in
    "nines" rather than percentage, with 3 "nines" meaning 99.9%
    availability, 4 "nines" meaning 99.99% availability, etc.

-   "High Availability" is a condition of a system, not a configuration
+   "High Availability" is a property of a system, not a configuration
    type.  A system is said to have high availability if its expected
    down time is low.  High availability can be achieved in various
    ways, one of which is clustering.  All the clusters described in this
-   document achieve high availability. What "high" means depends on
+   document achieve high availability.  What "high" means depends on
    the application, but usually is 4 to 6 "nines" (at most 0.5-50
    minutes of down time per year in a system that is supposed to be
    available all the time).

-   "Fault Tolerance" is a condition related to high availability, where
-   a system maintains service availability, even when a specified set of
+   "Fault Tolerance" is a property related to high availability, where a
+   system maintains service availability, even when a specified set of
    fault conditions occur.  In clusters, we expect the system to
    maintain service availability, when one or more of the cluster
    members fails.

    "Completely Transparent Cluster" is a cluster where the occurrence
    of a fault is never visible to the peers.
"Partially Transparent Cluster" is a cluster where the occurence of a fault may be visible to the peers. "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of the members is active at any one time. This member is also referred - to as the the "active", whereas the other(s) are referred to as - "stand-bys". [VRRP] is one method of building such a cluster. + to as the "active", whereas the other(s) are referred to as "stand- + bys". VRRP ([RFC5798]) is one method of building such a cluster. "Load Sharing Cluster", or "LS Cluster" is a cluster where more than one of the members may be active at the same time. The term "load balancing" is also common, but it implies that the load is actually balanced between the members, and this is not a requirement. - "Failover" is the event where a one member takes over some load from - some other member. In a hot standby cluster, this hapens when a + "Failover" is the event where one member takes over some load from + some other member. In a hot standby cluster, this happens when a standby member becomes active due to a failure of the former active member, or because of an administrator command. In a load sharing cluster, this usually happens because of a failure of one of the members, but certain load-balancing technologies may allow a particular load (such as all the flows associated with a particular child SA) to move from one member to another to even out the load, even without any failures. "Tight Cluster" is a cluster where all the members share an IP address. This could be accomplished using configured interfaces with specialized protocols or hardware, such as VRRP, or through the use of multicast addresses, but in any case, peers need only be configured with one IP address in the PAD. "Loose Cluster" is a cluster where each member has a different IP address. Peers find the correct member using some method such as DNS - queries or [REDIRECT]. In some cases, a member's IP address(es) may - be allocated to another member at failover. + queries or the IKEv2 redirect mechanism ([RFC5685]). In some cases, + a member's IP address(es) may be allocated to another member at + failover. "Synch Channel" is a communications channel among the cluster members, used to transfer state information. The synch channel may or may not be IP based, may or may not be encrypted, and may work over short or long distances. The security and physical characteristics of this channel are out of scope for this document, but it is a requirement that its use be minimized for scalability. 3. The Problem Statement @@ -223,21 +224,21 @@ o IPsec SAs last for minutes or hours, and carry keys, selectors and other information. Some gateways may carry hundreds of thousands such IPsec SAs. o SPD (Security Policy Database) Cache entries. While the SPD is unchanging, the SPD cache changes on the fly due to narrowing. Entries last at least as long as the SAD (Security Association Database) entries, but tend to last even longer than that. A naive implementation of a cluster would have no synchronized state, and a failover would produce an effect similar to that of a rebooted - gateway. [resumption] describes how new IKE and IPsec SAs can be + gateway. [RFC5723] describes how new IKE and IPsec SAs can be recreated in such a case. 3.3. IKE Counters We can overcome the first problem described in Section 3.2, by synchronizing states - whenever an SA is created, we can synch this new state to all other members. 
@@ -271,21 +272,21 @@
    A possible solution is to synch replay counter information, not for
    each packet emitted, but only at regular intervals, say, every
    10,000 packets or every 0.5 seconds.  After a failover, the newly-
    active member advances the counters for outbound IPsec SAs by
    10,000.  To the peer this looks like up to 10,000 packets were lost,
    but this should be acceptable, as neither ESP nor AH guarantee
    reliable delivery.

 3.5.  Inbound SA Counters

-   An even tougher issue, is the synchronization of packet counters for
+   An even tougher issue is the synchronization of packet counters for
    inbound IPsec SAs.  If a packet arrives at a newly-active member,
    there is no way to determine whether this packet is a replay or not.
    The periodic synch does not solve the problem at all.  Suppose we
    synchronize every 10,000 packets, and the last synch before the
    failover had the counter at 170,000.  It is probable, though not
    certain, that packet number 180,000 has not yet been processed, but
    if packet 175,000 arrives at the newly-active member, it has no way
    of determining whether or not that packet has already been
    processed.  The synchronization does prevent the processing of
    really old packets, such as those with counter number 165,000.
    Ignoring all

@@ -311,21 +312,21 @@
 3.6.  Missing Synch Messages

    The synch channel is very likely not to be infallible.  Before
    failover is detected, some synchronization messages may have been
    missed.  For example, the active member may have created a new Child
    SA using message n.  The new information (entry in the SAD and
    update to counters of the IKE SA) is sent on the synch channel.
    Still, with every possible technology, the update may be missed
    before the failover.

-   This is a bad situation, because the IKE SA is doomed. the newly-
+   This is a bad situation, because the IKE SA is doomed.  The newly-
    active member has two problems:

    o  It does not have the new IPsec SA pair.  It will drop all
       incoming packets protected with such an SA.  This could be fixed
       by sending some DELETEs and INVALID_SPI notifications, if it
       wasn't for the other problem...

    o  The counters for the IKE SA show that only request n-1 has been
       sent.  The next request will get the message ID n, but that will
       be rejected by the peer.  After a sufficient number of
       retransmissions and rejections, the whole IKE SA with all
       associated IPsec SAs will get dropped.
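The periodic synchronization of outbound replay counters described above is simple enough to show in a few lines.  The Python below is a minimal sketch, assuming a hypothetical synch callback and the 10,000-packet interval used in the text: the active member announces its outbound sequence counter only occasionally, and after a failover the newly-active member jumps ahead by a full interval so it can never reuse a value the failed member may already have sent.

    SYNC_INTERVAL = 10_000  # packets between announcements on the synch channel

    class OutboundSeqCounter:
        """Outbound ESP/AH sequence counter with periodic synchronization."""

        def __init__(self, synch, start=0):
            self.synch = synch   # hypothetical callback that writes to the synch channel
            self.value = start

        def next_seq(self) -> int:
            self.value += 1
            if self.value % SYNC_INTERVAL == 0:
                self.synch(self.value)  # announce every 10,000 packets
            return self.value

        @classmethod
        def after_failover(cls, synch, last_announced: int) -> "OutboundSeqCounter":
            # The newly-active member cannot know how far past the last
            # announcement the failed member got, so it skips a full
            # interval; to the peer this looks like up to 10,000 lost
            # packets, which ESP and AH tolerate.
            return cls(synch, start=last_announced + SYNC_INTERVAL)

A real implementation would also have to deal with the inbound direction, which Section 3.5 argues cannot be fully solved by periodic synchronization alone.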
@@ -379,39 +380,39 @@
    one implementation to another.

    o  Reply packets may arrive with an IPsec SA that is not "matched"
       to the one used for the outgoing packets.  Also, they might
       arrive at a different member.  This problem is beyond the scope
       of this document and should be solved by the application, perhaps
       by forwarding misdirected packets to the correct gateway for deep
       inspection.

 3.7.1.  Outbound SAs using counter modes

-   For SAs involving counter mode ciphers such as [CTR] or [GCM] there
-   is yet another complication.  The initial vector for such modes MUST
-   NOT be repeated, and senders use methods such as counters or LFSRs to
-   ensure this.  An SA shared between more than one active member, or
-   even failing over from one member to another need to make sure that
-   they do not generate the same initial vector.  See [COUNTER_MODES]
-   for a discussion of this problem in another context.
+   For SAs involving counter mode ciphers such as CTR ([RFC3686]) or GCM
+   ([RFC4106]) there is yet another complication.  The initial vector
+   for such modes MUST NOT be repeated, and senders use methods such as
+   counters or LFSRs to ensure this.  With an SA shared between more
+   than one active member, or failing over from one member to another,
+   the members need to make sure that they do not generate the same
+   initial vector.  See [COUNTER_MODES] for a discussion of this
+   problem in another context.

 3.8.  Different IP addresses for IKE and IPsec

    In many implementations there are separate IP addresses for the
    cluster, and for each member.  While the packets protected by tunnel
    mode child SAs are encapsulated in IP headers with the cluster IP
    address, the IKE packets originate from a specific member, and carry
    that member's IP address.  For the peer, this looks weird, as the
    usual thing is for the IPsec packets to come from the same IP
    address as the IKE packets.

-   One obvious solution, is to use some fancy capability of the IKE host
+   One obvious solution is to use some fancy capability of the IKE host
    to change things so that IKE packets also come out of the cluster IP
    address.  This can be achieved through NAT or through assigning
    multiple addresses to interfaces.  This is not, however, possible
    for all implementations.  [ARORA] discusses this problem in greater
    depth, and proposes another solution that does involve protocol
    changes.

 3.9.  Allocation of SPIs
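Section 3.7.1 above only states the requirement: the explicit IV of a counter mode SA must never repeat, even when several members send on the same SA or when the SA moves at failover.  One way this is commonly handled, and the kind of approach [COUNTER_MODES] discusses in its own context, is to partition the 64-bit IV space among the members.  The Python below sketches that idea under assumed parameters (an 8-bit member identifier); it is an illustration, not a mechanism defined by this draft.

    IV_BITS = 64          # explicit IV size for AES-CTR / AES-GCM in ESP
    MEMBER_ID_BITS = 8    # assumed partitioning; not specified by the draft

    class MemberIvGenerator:
        """Per-member IV source whose values cannot collide with other members'."""

        def __init__(self, member_id: int):
            assert 0 <= member_id < (1 << MEMBER_ID_BITS)
            self.prefix = member_id << (IV_BITS - MEMBER_ID_BITS)
            self.counter = 0

        def next_iv(self) -> bytes:
            # Each member draws from a disjoint slice of the IV space, so an
            # SA shared by several active members, or moved at failover,
            # never sees a repeated IV.
            self.counter += 1
            if self.counter >= (1 << (IV_BITS - MEMBER_ID_BITS)):
                raise OverflowError("IV space exhausted; rekey the SA")
            return (self.prefix | self.counter).to_bytes(IV_BITS // 8, "big")

With this kind of partitioning, a member that takes over an SA simply keeps using its own slice, so the IV counter never needs to be carried over the synch channel.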
Weniger, "Redirect Mechanism for + IKEv2", RFC 5685, November 2009. - [resumption] - Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", + [RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", RFC 5723, January 2010. + [RFC5798] Nadas, S., "Virtual Router Redundancy Protocol (VRRP)", + RFC 5798, March 2010. + Author's Address Yoav Nir Check Point Software Technologies Ltd. 5 Hasolelim st. Tel Aviv 67897 Israel Email: ynir@checkpoint.com