draft-ietf-ipsecme-ipsec-ha-06.txt | draft-ietf-ipsecme-ipsec-ha-07.txt | |||
---|---|---|---|---|
Network Working Group Y. Nir | Network Working Group Y. Nir | |||
Internet-Draft Check Point | Internet-Draft Check Point | |||
Intended status: Informational June 10, 2010 | Intended status: Informational June 24, 2010 | |||
Expires: December 12, 2010 | Expires: December 26, 2010 | |||
IPsec Cluster Problem Statement | IPsec Cluster Problem Statement | |||
draft-ietf-ipsecme-ipsec-ha-06 | draft-ietf-ipsecme-ipsec-ha-07 | |||
Abstract | Abstract | |||
This document defines terminology, problem statement and requirements | This document defines terminology, problem statement and requirements | |||
for implementing IKE and IPsec on clusters. It also describes gaps | for implementing IKE and IPsec on clusters. It also describes gaps | |||
in existing standards and their implementation that need to be | in existing standards and their implementation that need to be | |||
filled, in order to allow peers to interoperate with clusters from | filled, in order to allow peers to interoperate with clusters from | |||
different vendors. An agreed terminology, problem statement and | different vendors. An agreed terminology, problem statement and | |||
requirements will allow the IPSECME WG to consider development of | requirements will allow the IPSECME WG to consider development of | |||
IPsec/IKEv2 mechanisms to simplify cluster implementations. | IPsec/IKEv2 mechanisms to simplify cluster implementations. | |||
skipping to change at page 1, line 36 | skipping to change at page 1, line 36 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on December 12, 2010. | This Internet-Draft will expire on December 26, 2010. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2010 IETF Trust and the persons identified as the | Copyright (c) 2010 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 4, line 9 | skipping to change at page 4, line 9 | |||
"Member" is one gateway in a cluster. | "Member" is one gateway in a cluster. | |||
"Availability" is a measure of a system's ability to perform the | "Availability" is a measure of a system's ability to perform the | |||
service for which it was designed. It is measured as the percentage | service for which it was designed. It is measured as the percentage | |||
of time a service is available, from the time it is supposed to be | of time a service is available, from the time it is supposed to be | |||
available. Colloquially, availability is sometimes expressed in | available. Colloquially, availability is sometimes expressed in | |||
"nines" rather than percentage, with 3 "nines" meaning 99.9% | "nines" rather than percentage, with 3 "nines" meaning 99.9% | |||
availability, 4 "nines" meaning 99.99% availability, etc. | availability, 4 "nines" meaning 99.99% availability, etc. | |||
"High Availability" is a condition of a system, not a configuration | "High Availability" is a property of a system, not a configuration | |||
type. A system is said to have high availability if its expected | type. A system is said to have high availability if its expected | |||
down time is low. High availability can be achieved in various ways, | down time is low. High availability can be achieved in various ways, | |||
one of which is clustering. All the clusters described in this | one of which is clustering. All the clusters described in this | |||
document achieve high availability. What "high" means depends on | document achieve high availability. What "high" means depends on the | |||
application, but usually is 4 to 6 "nines" (at most 0.5-50 minutes of | application, but usually is 4 to 6 "nines" (at most 0.5-50 minutes of | |||
down time per year in a system that is supposed to be available all | down time per year in a system that is supposed to be available all | |||
the time). | the time). | |||
"Fault Tolerance" is a condition related to high availability, where | "Fault Tolerance" is a property related to high availability, where a | |||
a system maintains service availability, even when a specified set of | system maintains service availability, even when a specified set of | |||
fault conditions occur. In clusters, we expect the system to | fault conditions occur. In clusters, we expect the system to | |||
maintain service availability, when one or more of the cluster | maintain service availability, when one or more of the cluster | |||
members fails. | members fails. | |||
"Completely Transparent Cluster" is a cluster where the occurrence of | "Completely Transparent Cluster" is a cluster where the occurrence of | |||
a fault is never visible to the peers. | a fault is never visible to the peers. | |||
"Partially Transparent Cluster" is a cluster where the occurrence of a | "Partially Transparent Cluster" is a cluster where the occurrence of a | |||
fault may be visible to the peers. | fault may be visible to the peers. | |||
"Hot Standby Cluster", or "HS Cluster" is a cluster where only one of | "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of | |||
the members is active at any one time. This member is also referred | the members is active at any one time. This member is also referred | |||
to as the the "active", whereas the other(s) are referred to as | to as the "active", whereas the other(s) are referred to as "stand- | |||
"stand-bys". [VRRP] is one method of building such a cluster. | bys". VRRP ([RFC5798]) is one method of building such a cluster. | |||
"Load Sharing Cluster", or "LS Cluster" is a cluster where more than | "Load Sharing Cluster", or "LS Cluster" is a cluster where more than | |||
one of the members may be active at the same time. The term "load | one of the members may be active at the same time. The term "load | |||
balancing" is also common, but it implies that the load is actually | balancing" is also common, but it implies that the load is actually | |||
balanced between the members, and this is not a requirement. | balanced between the members, and this is not a requirement. | |||
"Failover" is the event where a one member takes over some load from | "Failover" is the event where one member takes over some load from | |||
some other member. In a hot standby cluster, this hapens when a | some other member. In a hot standby cluster, this happens when a | |||
standby member becomes active due to a failure of the former active | standby member becomes active due to a failure of the former active | |||
member, or because of an administrator command. In a load sharing | member, or because of an administrator command. In a load sharing | |||
cluster, this usually happens because of a failure of one of the | cluster, this usually happens because of a failure of one of the | |||
members, but certain load-balancing technologies may allow a | members, but certain load-balancing technologies may allow a | |||
particular load (such as all the flows associated with a particular | particular load (such as all the flows associated with a particular | |||
child SA) to move from one member to another to even out the load, | child SA) to move from one member to another to even out the load, | |||
even without any failures. | even without any failures. | |||
"Tight Cluster" is a cluster where all the members share an IP | "Tight Cluster" is a cluster where all the members share an IP | |||
address. This could be accomplished using configured interfaces with | address. This could be accomplished using configured interfaces with | |||
specialized protocols or hardware, such as VRRP, or through the use | specialized protocols or hardware, such as VRRP, or through the use | |||
of multicast addresses, but in any case, peers need only be | of multicast addresses, but in any case, peers need only be | |||
configured with one IP address in the PAD. | configured with one IP address in the PAD. | |||
"Loose Cluster" is a cluster where each member has a different IP | "Loose Cluster" is a cluster where each member has a different IP | |||
address. Peers find the correct member using some method such as DNS | address. Peers find the correct member using some method such as DNS | |||
queries or [REDIRECT]. In some cases, a member's IP address(es) may | queries or the IKEv2 redirect mechanism ([RFC5685]). In some cases, | |||
be allocated to another member at failover. | a member's IP address(es) may be allocated to another member at | |||
failover. | ||||
"Synch Channel" is a communications channel among the cluster | "Synch Channel" is a communications channel among the cluster | |||
members, used to transfer state information. The synch channel may | members, used to transfer state information. The synch channel may | |||
or may not be IP based, may or may not be encrypted, and may work | or may not be IP based, may or may not be encrypted, and may work | |||
over short or long distances. The security and physical | over short or long distances. The security and physical | |||
characteristics of this channel are out of scope for this document, | characteristics of this channel are out of scope for this document, | |||
but it is a requirement that its use be minimized for scalability. | but it is a requirement that its use be minimized for scalability. | |||
3. The Problem Statement | 3. The Problem Statement | |||
skipping to change at page 6, line 21 | skipping to change at page 6, line 21 | |||
o IPsec SAs last for minutes or hours, and carry keys, selectors and | o IPsec SAs last for minutes or hours, and carry keys, selectors and | |||
other information. Some gateways may carry hundreds of thousands of | other information. Some gateways may carry hundreds of thousands of | |||
such IPsec SAs. | such IPsec SAs. | |||
o SPD (Security Policy Database) Cache entries. While the SPD is | o SPD (Security Policy Database) Cache entries. While the SPD is | |||
unchanging, the SPD cache changes on the fly due to narrowing. | unchanging, the SPD cache changes on the fly due to narrowing. | |||
Entries last at least as long as the SAD (Security Association | Entries last at least as long as the SAD (Security Association | |||
Database) entries, but tend to last even longer than that. | Database) entries, but tend to last even longer than that. | |||
A naive implementation of a cluster would have no synchronized state, | A naive implementation of a cluster would have no synchronized state, | |||
and a failover would produce an effect similar to that of a rebooted | and a failover would produce an effect similar to that of a rebooted | |||
gateway. [resumption] describes how new IKE and IPsec SAs can be | gateway. [RFC5723] describes how new IKE and IPsec SAs can be | |||
recreated in such a case. | recreated in such a case. | |||
3.3. IKE Counters | 3.3. IKE Counters | |||
We can overcome the first problem described in Section 3.2, by | We can overcome the first problem described in Section 3.2, by | |||
synchronizing states - whenever an SA is created, we can synch this | synchronizing states - whenever an SA is created, we can synch this | |||
new state to all other members. However, those states are not only | new state to all other members. However, those states are not only | |||
long-lived, they are also ever changing. | long-lived, they are also ever changing. | |||
IKE has message counters. A peer MUST NOT process message n until | IKE has message counters. A peer MUST NOT process message n until | |||
skipping to change at page 7, line 21 | skipping to change at page 7, line 21 | |||
A possible solution is to synch replay counter information, not for | A possible solution is to synch replay counter information, not for | |||
each packet emitted, but only at regular intervals, say, every 10,000 | each packet emitted, but only at regular intervals, say, every 10,000 | |||
packets or every 0.5 seconds. After a failover, the newly-active | packets or every 0.5 seconds. After a failover, the newly-active | |||
member advances the counters for outbound IPsec SAs by 10,000. To | member advances the counters for outbound IPsec SAs by 10,000. To | |||
the peer this looks like up to 10,000 packets were lost, but this | the peer this looks like up to 10,000 packets were lost, but this | |||
should be acceptable, as neither ESP nor AH guarantee reliable | should be acceptable, as neither ESP nor AH guarantee reliable | |||
delivery. | delivery. | |||
3.5. Inbound SA Counters | 3.5. Inbound SA Counters | |||
An even tougher issue, is the synchronization of packet counters for | An even tougher issue is the synchronization of packet counters for | |||
inbound IPsec SAs. If a packet arrives at a newly-active member, | inbound IPsec SAs. If a packet arrives at a newly-active member, | |||
there is no way to determine whether this packet is a replay or not. | there is no way to determine whether this packet is a replay or not. | |||
The periodic synch does not solve the problem at all, because suppose | The periodic synch does not solve the problem at all, because suppose | |||
we synchronize every 10,000 packets, and the last synch before the | we synchronize every 10,000 packets, and the last synch before the | |||
failover had the counter at 170,000. It is probable, though not | failover had the counter at 170,000. It is probable, though not | |||
certain, that packet number 180,000 has not yet been processed, but | certain, that packet number 180,000 has not yet been processed, but | |||
if packet 175,000 arrives at the newly-active member, it has no way | if packet 175,000 arrives at the newly-active member, it has no way | |||
of determining whether that packet has already been | of determining whether that packet has already been | |||
processed. The synchronization does prevent the processing of really | processed. The synchronization does prevent the processing of really | |||
old packets, such as those with counter number 165,000. Ignoring all | old packets, such as those with counter number 165,000. Ignoring all | |||
skipping to change at page 8, line 15 | skipping to change at page 8, line 15 | |||
3.6. Missing Synch Messages | 3.6. Missing Synch Messages | |||
The synch channel is very likely not to be infallible. Before | The synch channel is very likely not to be infallible. Before | |||
failover is detected, some synchronization messages may have been | failover is detected, some synchronization messages may have been | |||
missed. For example, the active member may have created a new Child | missed. For example, the active member may have created a new Child | |||
SA using message n. The new information (entry in the SAD and update | SA using message n. The new information (entry in the SAD and update | |||
to counters of the IKE SA) is sent on the synch channel. Still, with | to counters of the IKE SA) is sent on the synch channel. Still, with | |||
every possible technology, the update may be missed before the | every possible technology, the update may be missed before the | |||
failover. | failover. | |||
This is a bad situation, because the IKE SA is doomed. the newly- | This is a bad situation, because the IKE SA is doomed. The newly- | |||
active member has two problems: | active member has two problems: | |||
o It does not have the new IPsec SA pair. It will drop all incoming | o It does not have the new IPsec SA pair. It will drop all incoming | |||
packets protected with such an SA. This could be fixed by sending | packets protected with such an SA. This could be fixed by sending | |||
some DELETEs and INVALID_SPI notifications, if it wasn't for the | some DELETEs and INVALID_SPI notifications, if it wasn't for the | |||
other problem... | other problem... | |||
o The counters for the IKE SA show that only request n-1 has been | o The counters for the IKE SA show that only request n-1 has been | |||
sent. The next request will get the message ID n, but that will | sent. The next request will get the message ID n, but that will | |||
be rejected by the peer. After a sufficient number of | be rejected by the peer. After a sufficient number of | |||
retransmissions and rejections, the whole IKE SA with all | retransmissions and rejections, the whole IKE SA with all | |||
associated IPsec SAs will get dropped. | associated IPsec SAs will get dropped. | |||
skipping to change at page 9, line 34 | skipping to change at page 9, line 34 | |||
one implementation to another. | one implementation to another. | |||
o Reply packets may arrive with an IPsec SA that is not "matched" to | o Reply packets may arrive with an IPsec SA that is not "matched" to | |||
the one used for the outgoing packets. Also, they might arrive at | the one used for the outgoing packets. Also, they might arrive at | |||
a different member. This problem is beyond the scope of this | a different member. This problem is beyond the scope of this | |||
document and should be solved by the application, perhaps by | document and should be solved by the application, perhaps by | |||
forwarding misdirected packets to the correct gateway for deep | forwarding misdirected packets to the correct gateway for deep | |||
inspection. | inspection. | |||
3.7.1. Outbound SAs using counter modes | 3.7.1. Outbound SAs using counter modes | |||
For SAs involving counter mode ciphers such as [CTR] or [GCM] there | For SAs involving counter mode ciphers such as CTR ([RFC3686]) or GCM | |||
is yet another complication. The initial vector for such modes MUST | ([RFC4106]) there is yet another complication. The initial vector | |||
NOT be repeated, and senders use methods such as counters or LFSRs to | for such modes MUST NOT be repeated, and senders use methods such as | |||
ensure this. An SA shared between more than one active member, or | counters or LFSRs to ensure this. An SA shared between more than one | |||
even failing over from one member to another need to make sure that | active member, or even failing over from one member to another need | |||
they do not generate the same initial vector. See [COUNTER_MODES] | to make sure that they do not generate the same initial vector. See | |||
for a discussion of this problem in another context. | [COUNTER_MODES] for a discussion of this problem in another context. | |||
3.8. Different IP addresses for IKE and IPsec | 3.8. Different IP addresses for IKE and IPsec | |||
In many implementations there are separate IP addresses for the | In many implementations there are separate IP addresses for the | |||
cluster, and for each member. While the packets protected by tunnel | cluster, and for each member. While the packets protected by tunnel | |||
mode child SAs are encapsulated in IP headers with the cluster IP | mode child SAs are encapsulated in IP headers with the cluster IP | |||
address, the IKE packets originate from a specific member, and carry | address, the IKE packets originate from a specific member, and carry | |||
that member's IP address. For the peer, this looks weird, as the | that member's IP address. For the peer, this looks weird, as the | |||
usual thing is for the IPsec packets to come from the same IP address | usual thing is for the IPsec packets to come from the same IP address | |||
as the IKE packets. | as the IKE packets. | |||
One obvious solution, is to use some fancy capability of the IKE host | One obvious solution is to use some fancy capability of the IKE host | |||
to change things so that IKE packets also come out of the cluster IP | to change things so that IKE packets also come out of the cluster IP | |||
address. This can be achieved through NAT or through assigning | address. This can be achieved through NAT or through assigning | |||
multiple addresses to interfaces. This is not, however, possible for | multiple addresses to interfaces. This is not, however, possible for | |||
all implementations. | all implementations. | |||
[ARORA] discusses this problem in greater depth, and proposes another | [ARORA] discusses this problem in greater depth, and proposes another | |||
solution that does involve protocol changes. | solution that does involve protocol changes. | |||
3.9. Allocation of SPIs | 3.9. Allocation of SPIs | |||
skipping to change at page 11, line 28 | skipping to change at page 11, line 28 | |||
Version 02 includes comments by Yaron Sheffer and the acknowledgement | Version 02 includes comments by Yaron Sheffer and the acknowledgement | |||
section. | section. | |||
Version 03 fixes some ID-nits, and adds the problem presented by | Version 03 fixes some ID-nits, and adds the problem presented by | |||
Jitender Arora in [ARORA]. | Jitender Arora in [ARORA]. | |||
Version 04 fixes a spelling mistake, moves the scope discussion to a | Version 04 fixes a spelling mistake, moves the scope discussion to a | |||
subsection of its own (Section 3.1), and adds a short discussion of | subsection of its own (Section 3.1), and adds a short discussion of | |||
the duplicate SPI problem, presented by Jean-Michel Combes. | the duplicate SPI problem, presented by Jean-Michel Combes. | |||
Versions 05, 06 and 07 just corrected nits and notation. | ||||
8. Informative References | 8. Informative References | |||
[ARORA] Arora, J. and P. Kumar, "Alternate Tunnel Addresses for | [ARORA] Arora, J. and P. Kumar, "Alternate Tunnel Addresses for | |||
IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses | IKEv2", draft-arora-ipsecme-ikev2-alt-tunnel-addresses | |||
(work in progress), April 2010. | (work in progress), April 2010. | |||
[COUNTER_MODES] | [COUNTER_MODES] | |||
McGrew, D. and B. Weis, "Using Counter Modes with | McGrew, D. and B. Weis, "Using Counter Modes with | |||
Encapsulating Security Payload (ESP) and Authentication | Encapsulating Security Payload (ESP) and Authentication | |||
Header (AH) to Protect Group Traffic", | Header (AH) to Protect Group Traffic", | |||
draft-ietf-msec-ipsec-group-counter-modes (work in | draft-ietf-msec-ipsec-group-counter-modes (work in | |||
progress), March 2010. | progress), March 2010. | |||
[CTR] Housley, R., "Using Advanced Encryption Standard (AES) | ||||
Counter Mode", RFC 3686, January 2004. | ||||
[GCM] Viega, J. and D. McGrew, "The Use of Galois/Counter Mode | ||||
(GCM) in IPsec Encapsulating Security Payload (ESP)", | ||||
RFC 4106, June 2005. | ||||
[IKEv2bis] | [IKEv2bis] | |||
Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, | Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, | |||
"Internet Key Exchange Protocol: IKEv2", | "Internet Key Exchange Protocol: IKEv2", | |||
draft-ietf-ipsecme-ikev2bis (work in progress), May 2010. | draft-ietf-ipsecme-ikev2bis (work in progress), May 2010. | |||
[REDIRECT] | ||||
Devarapalli, V. and K. Weniger, "Redirect Mechanism for | ||||
IKEv2", RFC 5685, November 2009. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC3686] Housley, R., "Using Advanced Encryption Standard (AES) | ||||
Counter Mode", RFC 3686, January 2004. | ||||
[RFC4106] Viega, J. and D. McGrew, "The Use of Galois/Counter Mode | ||||
(GCM) in IPsec Encapsulating Security Payload (ESP)", | ||||
RFC 4106, June 2005. | ||||
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the | [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | |||
Internet Protocol", RFC 4301, December 2005. | Internet Protocol", RFC 4301, December 2005. | |||
[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", | [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", | |||
RFC 4306, December 2005. | RFC 4306, December 2005. | |||
[VRRP] Nadas, S., "Virtual Router Redundancy Protocol (VRRP)", | [RFC5685] Devarapalli, V. and K. Weniger, "Redirect Mechanism for | |||
RFC 5798, March 2010. | IKEv2", RFC 5685, November 2009. | |||
[resumption] | [RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", | |||
Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", | ||||
RFC 5723, January 2010. | RFC 5723, January 2010. | |||
[RFC5798] Nadas, S., "Virtual Router Redundancy Protocol (VRRP)", | ||||
RFC 5798, March 2010. | ||||
Author's Address | Author's Address | |||
Yoav Nir | Yoav Nir | |||
Check Point Software Technologies Ltd. | Check Point Software Technologies Ltd. | |||
5 Hasolelim st. | 5 Hasolelim st. | |||
Tel Aviv 67897 | Tel Aviv 67897 | |||
Israel | Israel | |||
Email: ynir@checkpoint.com | Email: ynir@checkpoint.com | |||
End of changes. 21 change blocks. | ||||
40 lines changed or deleted | 41 lines changed or added | |||
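
The periodic counter synchronization discussed in Sections 3.4 and 3.5 of the draft can be illustrated with a minimal Python sketch. It is not taken from the draft or from any implementation: the names `OutboundSA`, `resume_outbound` and `classify_inbound`, and the three verdict strings, are hypothetical. The sketch assumes the draft's example interval of 10,000 packets and ignores the time-based trigger (every 0.5 seconds) and any loss on the synch channel.

```python
SYNC_INTERVAL = 10_000  # packets between synch messages (the draft's example figure)


class OutboundSA:
    """Hypothetical outbound SA state held by the active member."""

    def __init__(self, seq=0):
        self.seq = seq          # last sequence number used
        self.last_synced = seq  # last value sent on the synch channel

    def next_seq(self):
        """Return (sequence number to use, value to synch or None)."""
        self.seq += 1
        to_sync = None
        if self.seq - self.last_synced >= SYNC_INTERVAL:
            self.last_synced = self.seq
            to_sync = self.seq
        return self.seq, to_sync


def resume_outbound(last_synced):
    """Newly-active member: skip a full interval so that no sequence
    number the failed member may have used is ever reused."""
    return OutboundSA(last_synced + SYNC_INTERVAL)


def classify_inbound(last_synced, seq):
    """Illustrates why periodic synch does not settle inbound replay checks."""
    if seq <= last_synced:
        return "reject"        # really old, e.g. 165,000 after a synch at 170,000
    if seq < last_synced + SYNC_INTERVAL:
        return "ambiguous"     # e.g. 175,000: may or may not have been processed
    return "probably-new"      # e.g. 180,000: probably, not certainly, unseen
```

For outbound SAs the jump costs nothing but an apparent burst of lost packets at the peer. For inbound SAs the "ambiguous" range is exactly the gap that the periodic synch leaves open.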