DHCP Failover

Because the DHCP protocol (RFC 2131) allows multiple DHCP servers, it is possible to configure DHCP servers so that, in the event one server is unable to provide leases to requesting clients, a backup DHCP server can take over. Network Registrar provides this capability in its failover feature.

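This chapter describes the benefits of failover, how failover operates, how to allocate IP addresses between the servers, the safe period, failover state transitions, and failover configuration guidelines.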

Benefits of Failover

Failover provides a high-availability DHCP service: it allows you to configure two servers to operate as a redundant pair of DHCP servers. If one server is down, the other server takes over seamlessly and allows existing DHCP clients to keep and renew their IP leases. Clients requesting new leases need not know or care which server responds to their request, and they can obtain leases even if the main server is not operational.

How Failover Operates

After the servers are started, each server contacts the other. After contact is established, the main server provides the backup server with a private pool of IP addresses that it can use in the event of a failure. The main server then updates the backup server whenever it performs an operation for a DHCP client.

Under normal conditions, the main server continues to update the backup server and the backup server allows the main server to service DHCP client requests.

If a failure occurs on the main server, the backup server takes over and renews the addresses of the existing clients and offers addresses to new clients. When the main server is operational again, it automatically reintegrates with the backup server without administrator intervention.

The failover protocol is designed to protect against several kinds of failures:

Failover Regimes

The failover protocol operates in three regimes, which correspond loosely to failover states. These regimes are Normal, Communications Interrupted, and Partner Down.

Each server operates differently in each of these regimes. Table 3-1 describes the server operations.


Table 3-1: Server Operations
Regime: Normal

Main Server: Responsive to all DHCP client requests and allocates IP addresses to new clients from its pool of available IP addresses. It has allocated to the backup server some IP addresses for the backup server to use if communications are interrupted.

Backup Server: Unresponsive to DHCP client requests except renewals or rebinding requests. The backup server has requested and received a set of IP addresses to use for allocation to new DHCP clients if communication with the main server is interrupted.

Regime: Communications Interrupted

Main Server: Responsive to all DHCP client requests. It cannot tell if the backup server has gone down or if the backup server is just unable to communicate. It operates normally, although it cannot reallocate an IP address from one DHCP client to another while in this regime.

Backup Server: Cannot tell if the main server is down or simply not communicating. In either case, the backup server is responsive to all DHCP client requests and can allocate IP addresses from its pool of available addresses it has received from the main server.

Both Servers: Servers usually transition between Normal and Communications Interrupted as one or the other server goes up and down.

Regime: Partner Down

Either Server: The running server is guaranteed that the other server is down. The running server has control of all of the IP addresses, can offer any configured lease time or lease extension period, and at any time can reallocate an IP address from one client to another.

A server transitions to Partner Down only if it is informed that its partner is indeed down. The notification can come through the protocol (used when the partner knows that it is going down) or from the administrator: when the server cannot communicate with its partner, it automatically enters the Communications Interrupted regime, and the administrator can then use the setPartnerDown command.

The setPartnerDown command tells the server that its partner is down. You could configure failover to effect an automatic transition from Communications Interrupted to Partner Down after the safe period has passed, but doing so runs the risk of duplicate IP address allocations if the partner is not actually down.

Ideally you would let the servers move from the Normal to the Communications Interrupted regimes and back again, since these are safe, and you would never need to use administrative intervention to move a server into the Partner Down regime. In some cases, however, this is not practical because a server running in the Communications Interrupted regime is not using the available IP addresses efficiently, and this may restrict the amount of time a server can effectively service DHCP clients.
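The regime transitions described above can be sketched as a small state machine. This is a minimal illustration only, not Network Registrar's implementation: the Regime and Event names and the next_regime function are hypothetical, and the sketch assumes that restored contact returns a server directly to Normal, leaving out any intermediate recovery steps.

```python
from enum import Enum


class Regime(Enum):
    NORMAL = "Normal"
    COMMUNICATIONS_INTERRUPTED = "Communications Interrupted"
    PARTNER_DOWN = "Partner Down"


class Event(Enum):
    # Hypothetical event names, chosen for this sketch only.
    CONTACT_LOST = "contact lost"
    CONTACT_RESTORED = "contact restored"
    PARTNER_ANNOUNCED_SHUTDOWN = "partner announced shutdown"
    SET_PARTNER_DOWN = "setPartnerDown"
    SAFE_PERIOD_EXPIRED = "safe period expired"


def next_regime(current, event, safe_period_enabled=False):
    """Return the regime a server moves to when an event occurs."""
    # Assumption: reintegration is automatic once the partner is reachable.
    if event is Event.CONTACT_RESTORED:
        return Regime.NORMAL
    if current is Regime.NORMAL:
        if event is Event.CONTACT_LOST:
            return Regime.COMMUNICATIONS_INTERRUPTED
        # The partner told us through the protocol that it is going down.
        if event is Event.PARTNER_ANNOUNCED_SHUTDOWN:
            return Regime.PARTNER_DOWN
    if current is Regime.COMMUNICATIONS_INTERRUPTED:
        # Administrator intervention.
        if event is Event.SET_PARTNER_DOWN:
            return Regime.PARTNER_DOWN
        # Automatic transition, only if a safe period is configured.
        if event is Event.SAFE_PERIOD_EXPIRED and safe_period_enabled:
            return Regime.PARTNER_DOWN
    # Any other combination leaves the regime unchanged.
    return current
```

In this sketch, a server that loses contact never reaches Partner Down on its own unless safe_period_enabled is set, which mirrors the duplicate-address risk discussed above.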

There are restrictions on either server running in the Communications Interrupted regime that do not apply to a server running in the Partner Down regime:

In addition, if the backup server is running in the Communications Interrupted regime, the following restrictions apply:

The length of time a server can successfully run in the Communications Interrupted regime is limited only by the number of IP addresses that have been allocated to it, and the corresponding arrival rate of the DHCP client DISCOVER packets for new clients. When there is a high arrival rate of new DHCP clients or a high turnover rate of the client IP addresses, you may need to move the server into the Partner Down regime more quickly.


Note: As far as failover is concerned, a server that is responsive to DISCOVERs is also responsive to INIT-REBOOTs.
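The responsiveness rules from Table 3-1, together with the note above, can be expressed as a simple lookup. The following is a sketch under stated assumptions: the responds function, the role strings, and the message labels are hypothetical names introduced for illustration, and the sketch covers only whether a server answers, not which addresses or lease times it may offer.

```python
RENEWAL_MESSAGES = {"RENEW", "REBIND"}
# Per the note above, INIT-REBOOT is treated the same as DISCOVER.
NEW_CLIENT_MESSAGES = {"DISCOVER", "INIT-REBOOT"}
REGIMES = {"Normal", "Communications Interrupted", "Partner Down"}


def responds(role, regime, message):
    """Return True if a running server of this role answers the message."""
    if role not in {"main", "backup"}:
        raise ValueError(f"unknown role: {role}")
    if regime not in REGIMES:
        raise ValueError(f"unknown regime: {regime}")
    if message not in RENEWAL_MESSAGES | NEW_CLIENT_MESSAGES:
        raise ValueError(f"unknown message: {message}")
    # In Normal, the backup answers only renewals and rebinding requests.
    if regime == "Normal" and role == "backup":
        return message in RENEWAL_MESSAGES
    # In every other case the running server answers all client requests.
    return True
```

For example, responds("backup", "Normal", "DISCOVER") is False, while the same call with "Communications Interrupted" is True.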

Allocation of IP Addresses

In order to enable both your main and backup servers to operate in spite of a network partition (in which both servers can communicate with clients, but not with each other), you need to allocate more IP addresses than are needed to run a single server. The question is how to determine how many additional addresses you need.

You need to configure the main server to allocate a percentage of the currently available addresses in each scope's address pool to the backup server. These addresses are then not available to the main server to allocate to DHCP clients. The backup server uses these addresses in the event that it is running, but cannot talk to the main server, and has not been told that the main server is down.

The question is what percentage of addresses from the main server should be given to the backup. There is no single percentage that will suffice for all environments. It depends on the arrival rate of new DHCP clients and the reaction time of your network administration staff.

The backup server needs enough addresses from each scope to satisfy the requests of all new DHCP clients that arrive during the period in which the backup does not know whether or not the main server is down.

Even during the Partner Down regime, the backup server waits for the expiration of the maximum client lead time and the lease time before reallocating any leases. When these times expire, the backup server does the following:

Example

If, during the day, the administrative staff is able to respond within a two-hour period to a Communications Interrupted regime and determine whether the main server is working, then the backup server needs enough addresses to support a reasonable upper bound on the number of new DHCP clients that might arrive during that two-hour period.

If, during off-hours, the administrative staff is able to respond within a 12-hour period to the same situation, and considering that the arrival rate of previously unheard-from DHCP clients is also lower, then the backup server needs enough addresses to support a reasonable upper bound on the number of DHCP clients that might arrive during that 12-hour period.

Consequently, the number of addresses over which the backup requires sole control would be the greater of the two numbers, and would ultimately be expressed as a percentage of the currently available (unreserved) addresses in each scope.
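The sizing rule in this example can be worked through numerically. The sketch below is illustrative only: the backup_percentage function is a hypothetical helper, and the arrival rates and pool size in the usage note are invented figures, not recommendations.

```python
import math


def backup_percentage(available_addresses, scenarios):
    """Estimate the share of a scope's available addresses for the backup.

    scenarios is an iterable of (new_clients_per_hour, response_hours)
    pairs, one per staffing situation (for example daytime and off-hours).
    Returns (addresses_needed, percentage_of_available).
    """
    if available_addresses <= 0:
        raise ValueError("available_addresses must be positive")
    # The backup needs enough for the worst case across all scenarios.
    needed = max(math.ceil(rate * hours) for rate, hours in scenarios)
    # Round the percentage up, and never ask for more than the whole pool.
    percentage = min(math.ceil(100 * needed / available_addresses), 100)
    return needed, percentage
```

Assuming 20 new clients per hour with a two-hour daytime response and 5 new clients per hour with a 12-hour off-hours response, backup_percentage(1000, [(20, 2), (5, 12)]) returns (60, 6): the off-hours case dominates, and the backup would need about 6 percent of 1000 available addresses.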

If you are using client-class, remember that some clients can only use some set of scopes and other clients can only use other sets of scopes.

Dynamic BOOTP

When you are using Dynamic BOOTP, there are additional restrictions placed on the address usage in such scopes, because BOOTP clients are allocated IP addresses permanently and receive leases that never expire.

When a server whose scope does not have dynamic-bootp enabled goes to the Partner Down regime, it can allocate any available IP address from that scope, regardless of whether it was initially available to the main or backup server. When dynamic-bootp is enabled, however, the main and backup servers can each allocate only their own addresses. Consequently, scopes that enable dynamic-bootp require more addresses to support failover.
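The difference between the two cases can be illustrated with a short sketch. The function name and the representation of address pools as sets of strings are assumptions made for this example, and the sketch ignores the wait for the maximum client lead time and lease time before leases are reallocated.

```python
def allocatable_in_partner_down(own_available, partner_available, dynamic_bootp):
    """Return the available addresses a server in Partner Down may allocate.

    own_available and partner_available are the addresses of one scope that
    were initially available to this server and to its partner.
    """
    if dynamic_bootp:
        # With dynamic-bootp enabled, each server is limited to its own
        # addresses, even in Partner Down.
        return set(own_available)
    # Otherwise any available address in the scope may be used.
    return set(own_available) | set(partner_available)
```

With own addresses {"10.0.0.10", "10.0.0.11"} and partner address {"10.0.0.200"}, the function returns all three addresses when dynamic_bootp is False and only the first two when it is True, which is why such scopes need a larger pool.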

When using dynamic BOOTP, do the following:

Safe Period

The safe period is optional and is disabled by default. It is the period after which either the main or backup server automatically transitions from the COMMUNICATIONS-INTERRUPTED to the PARTNER-DOWN state. You should only enable a safe period if, in the event of a server failure, it is more important to get an IP address than to risk receiving a duplicate address.

When the servers are in the COMMUNICATIONS-INTERRUPTED state, neither server can function long term. This state exists to allow the servers to easily survive transient network communications failures of a few minutes to a few days. Note that the actual time period a server can function effectively in COMMUNICATIONS-INTERRUPTED state depends on the DHCP activity of the network in terms of arrival and departure of DHCP clients on the network.

If both servers are still operating, but cannot communicate, you have no choice but to leave them in COMMUNICATIONS-INTERRUPTED state. In most situations, however, when one server is down for an extended period and the operational server can no longer function effectively in COMMUNICATIONS-INTERRUPTED state, it must be moved into the PARTNER-DOWN state.

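There are two ways that a server can move into this state: an administrator can explicitly tell the server that its partner is down by using the setPartnerDown command, or, if a safe period is configured, the server moves automatically when the safe period expires.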

Configuring the safe period entails some risk, because it allows one server to enter the PARTNER-DOWN state when the other server may not be down. If this should occur, duplicate IP addresses could be allocated.

The purpose of the safe period is to allow network operations staff some time to react to a server moving into the COMMUNICATIONS-INTERRUPTED state. During the safe period, the only requirement is that the network operations staff determine whether both servers are still running and, if they are, either fix the network communications failure or take one of the servers down before the safe period expires.

The length of the safe period is installation specific, and depends in large part on the number of unallocated IP addresses within the subnet address pool and the expected frequency of arrival of previously unknown DHCP clients requiring IP addresses. Many environments should be able to support safe periods of several days.

During this safe period, either server allows renewals from any existing client. The only limitation is the need for IP addresses for the DHCP server to hand out to new DHCP clients and the need to reallocate IP addresses to different DHCP clients.

The number of extra IP addresses required is equal to the expected total number of new DHCP clients encountered during the safe period. This is dependent on the arrival rate of new DHCP clients, not on the total number of outstanding leases on IP addresses.
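This relationship can be turned around to estimate how long a safe period a given address pool can support. The following is a minimal sketch that assumes a constant arrival rate of new clients; both function names are hypothetical.

```python
import math


def extra_addresses_needed(new_clients_per_hour, safe_period_hours):
    """Addresses needed for new clients arriving during the safe period."""
    return math.ceil(new_clients_per_hour * safe_period_hours)


def longest_safe_period_hours(spare_addresses, new_clients_per_hour):
    """Longest safe period the spare addresses can cover, in hours."""
    if new_clients_per_hour <= 0:
        raise ValueError("new_clients_per_hour must be positive")
    return spare_addresses / new_clients_per_hour
```

Under these assumptions, a two-day safe period with 3 new clients per hour requires extra_addresses_needed(3, 48), which is 144 addresses. Conversely, 144 spare addresses at that rate support a safe period of 48 hours. Note that the total number of outstanding leases does not appear in either calculation.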

Even if you can afford only a short safe period, because of a dearth of IP addresses or a very high arrival rate of new DHCP clients, there is substantial benefit in allowing the DHCP subsystem to ride through minor problems that can be fixed within an hour. In such cases, there is no possibility of duplicate IP address allocation, and reintegration after the failure is resolved is automatic and requires no operator intervention.

Failover State Transitions

During normal operation, the DHCP failover servers transition from one state to another. A server stays in its current state until all of the actions specified for the state transition are complete. If communication fails during one of the actions, the server simply stays in its current state and attempts the transition again whenever the conditions for it are fulfilled.

Failover Configuration Guidelines

Network Registrar's failover protocol supports automatic failover from a main to a backup DHCP server.

To use failover you need to:

You can configure your network in a variety of ways, from the simplest, in which a server has a backup server, to more complicated arrangements. The following are typical configurations:

Simple Configuration

In Figure 3-1 there is a main server and its backup server.


Figure 3-1: Simple Failover Configuration


Backoffice Configuration

In Figure 3-2 there are several main servers and a single backup server.


Figure 3-2: Backoffice Failover Configuration


Symmetric Configuration

In Figure 3-3 there are two servers that share the network and the backup responsibilities.


Figure 3-3: Symmetric Failover Configuration


Posted: Thu Jul 13 11:06:17 PDT 2000
Copyright © 1989-2000 Cisco Systems Inc.