VRRPD: IP failover

What it is

It's a daemon that implements VRRPv2 (Virtual Router Redundancy 
Protocol) for Linux. The daemon has to run on each of the boxes that 
together make up the high availability system. 

Basically, its function is to create a set of nodes that share the same 
IP, so that if one dies, another box of the same set can take its place 
transparently for the end user or host, i.e. a redundant system. 
Usually (though not necessarily) it is used on routers.

How it works

Each group has a master box, which serves the services associated with 
an IP. That IP is shared by the whole set. When the master node 
fails, a backup node takes its place as the new master. To choose which 
one should be the new master, static priorities are assigned to each 
node. Furthermore, as a box can be part of several redundancy groups, 
each set is identified by a unique ID, called VRID (Virtual Router ID).

- Virtual IP and virtual MAC

The shared IP is a virtual IP, but it has to be completely valid; no 
host that uses that IP should suffer from a stale entry in its ARP 
table (MAC/IP pair). To prevent this, a virtual MAC associated with 
the virtual IP is created. Otherwise, we'd have to wait for the ARP 
table entry in each host that uses the redundant system to time out 
before the new master could work, defeating the whole point.

Therefore, even if the host that serves that IP changes, the MAC is 
the same throughout the set, so the ARP table entry of each box that 
uses the high availability system stays valid. The service is taken 
over transparently by another node of the set, so the end user hardly 
notices the change.

To make things extra simple, the virtual MAC is made up of the 
standard prefix 00:00:5E:00:01 and the VRID as the last octet. Let's 
say we have assigned VRID 1 to our set. Then our virtual MAC will be 
00:00:5E:00:01 + 01 = 00:00:5E:00:01:01.
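
The derivation can be sketched in a few lines of shell. This is only an 
illustration: vrrp_virtual_mac is a hypothetical helper name, not part 
of vrrpd, and it assumes the five-octet prefix 00:00:5E:00:01 that 
RFC 2338 gives for VRRP, followed by the VRID as one hex octet.

```shell
# Hypothetical helper (not part of vrrpd): build the VRRP virtual MAC
# by appending the VRID, as one hex octet, to the prefix 00:00:5E:00:01.
vrrp_virtual_mac() {
    vrid=$1
    # A VRID is a single octet; valid values are 1..255.
    if [ "$vrid" -lt 1 ] || [ "$vrid" -gt 255 ]; then
        echo "VRID must be between 1 and 255" >&2
        return 1
    fi
    printf '00:00:5E:00:01:%02X\n' "$vrid"
}

vrrp_virtual_mac 1      # prints 00:00:5E:00:01:01
vrrp_virtual_mac 100    # prints 00:00:5E:00:01:64
```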

- Node intercommunication: synchronization

The master node takes an active part in synchronizing the whole system. 
At a fixed interval (by default, 1 second) it announces that it is 
up and running by sending a packet to the 224.0.0.18 multicast 
address. When a few of these intervals (3 by default) pass without 
any announcement from the master, the highest priority backup node 
still working comes into play and takes over as master node.
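
As a rough worked example of that timeout: RFC 2338 defines the time a 
backup waits as (3 x advertisement interval) + skew time, where the 
skew time is (256 - priority) / 256 seconds. The sketch below just 
does that arithmetic; master_down_interval is a hypothetical helper 
name, and whether vrrpd uses exactly this formula internally is an 
assumption.

```shell
# Hypothetical helper: seconds a backup waits before declaring the
# master dead, following the RFC 2338 formula
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
#   Skew_Time            = (256 - Priority) / 256
master_down_interval() {
    adver_int=$1   # advertisement interval in seconds
    prio=$2        # this backup's priority (1..254)
    awk -v a="$adver_int" -v p="$prio" \
        'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

master_down_interval 1 100   # prints 3.609
master_down_interval 1 50    # prints 3.805
```

The skew term means a higher priority backup times out slightly 
earlier, so it tends to claim the master role first.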

If this happens and the original master node comes back to life 
afterwards, it preempts the temporary master because it has a higher 
priority; the first king goes back to its ruling position, sending 
the old backup node back to its idle wait status.

- Priorities: master node and backup nodes

Each node has a static priority that decides, in case of competition, 
who shall be master. The live node with the highest priority will be 
the new ruler.
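
The rule can be sketched as a small filter. This is a toy model, not 
how vrrpd exchanges state: elect_master, the node names and the 
"name priority state" input format are all made up for illustration, 
and the tie-breaking that the protocol applies to equal priorities is 
left out.

```shell
# Toy model: read lines of "name priority state" and print the name of
# the node that is up and has the highest priority.
elect_master() {
    awk '$3 == "up" && (best == "" || $2 + 0 > max) {
             max = $2 + 0; best = $1
         }
         END { if (best != "") print best }'
}

# boxA is the usual master but is down, so boxB wins over boxC.
printf '%s\n' 'boxA 100 down' 'boxB 50 up' 'boxC 20 up' | elect_master
```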

How long does it take?

I did a couple of tests, and it seems that the response is quicker when 
the master fails than when the failed master comes back. On 
average, in the first case, a backup starts working after about 10 
seconds. In the second case, when the failed master returns to its 
original status, it doesn't become functional until about 30 seconds 
to 1 minute later.

Where to see it

Look in /var/log/syslog on each node. Here is an extract from 
a backup:

Aug 10 17:43:21 sandbox vrrpd[3926]: Starting (adver_int: 1000000, vrid: 100, use virtual mac: no)
Aug 10 17:43:21 sandbox vrrpd[3926]: VRRP ID 100 on eth0 (prio: 100) : we are now a backup router.
Aug 10 17:43:24 sandbox vrrpd[3926]: VRRP ID 100 on eth0 (prio: 100): we are now the master router.
Aug 10 17:43:27 sandbox kernel: eth0: no IPv6 routers present
Aug 10 17:45:28 sandbox vrrpd[3931]: Starting (adver_int: 1000000, vrid: 1, use virtual mac: no)
Aug 10 17:45:28 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100) : we are now a backup router.
Aug 10 17:47:02 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100): 172.16.0.3 is down, we are now the master router.
Aug 10 17:47:27 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100) : 172.16.0.3 is up, we are now a backup router.
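
To pick out only the role changes from a busy syslog, a filter along 
these lines may help. It is a sketch that assumes the message wording 
shown in the extract above; vrrp_transitions is a hypothetical helper 
name, and other vrrpd versions may log differently.

```shell
# Hypothetical helper: keep only the vrrpd lines that report a role
# change. Reads log lines on stdin, e.g.
#   vrrp_transitions < /var/log/syslog
vrrp_transitions() {
    grep 'vrrpd\[[0-9]*\]' |
        grep -E 'we are now (a backup|the master) router'
}

# Demo on three lines taken from the extract: only the last one is kept.
vrrp_transitions <<'EOF'
Aug 10 17:43:27 sandbox kernel: eth0: no IPv6 routers present
Aug 10 17:45:28 sandbox vrrpd[3931]: Starting (adver_int: 1000000, vrid: 1, use virtual mac: no)
Aug 10 17:47:02 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100): 172.16.0.3 is down, we are now the master router.
EOF
```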


Use

The package in s/Debian/Lunar/g is [surprise!] vrrpd. Once installed, 
running it is quite simple.

At the master node, box A [output truncated] :

# ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:E0:4C:31:69:5C
	inet addr:172.16.0.3  Bcast:172.16.255.255  Mask:255.255.0.0

# vrrpd -i eth0 -v 1 -D -p 100 172.16.0.222

# ifconfig eth0
eth0  Link encap:Ethernet  HWaddr 00:00:5E:00:01:01
	inet addr:172.16.0.3  Bcast:172.16.255.255  Mask:255.255.0.0 

Options are (in order)

    * -i eth0 : interface that is going to be modified and whose IP is 
		inside the same network as the virtual IP
    * -v 1 : VRID of the set
    * -D : daemonize.
    * -p 100 : this node's priority (the higher the number, the higher 
		the priority)
    * 172.16.0.222 : virtual IP served

In a backup node (box B):

# ifconfig eth0
eth0    Link encap:Ethernet  HWaddr 00:07:95:A6:BE:81
        inet addr:172.16.0.50  Bcast:172.16.255.255  Mask:255.255.0.0

# vrrpd -i eth0 -v 1 -D -p 50 -n 172.16.0.222 

# ifconfig eth0
eth0    Link encap:Ethernet  HWaddr 00:07:95:A6:BE:81
        inet addr:172.16.0.50  Bcast:172.16.255.255  Mask:255.255.0.0

Differences with the master:

    * -p 50 : Has lower priority than the master's 100 (in VRRP a 
higher number means a higher priority)
    * -n : Don't change the MAC, as seen before and after running the 
vrrpd command

IMPORTANT:

    * The virtual IP 172.16.0.222 is a completely valid IP address 
inside the network 172.16.0.0/16. The announcements of the master node 
are not broadcast to 255.255.255.255; they go to the 224.0.0.18 
multicast address and stay on the local network the virtual IP is in, 
172.16.0.0/16 in our example.

    * If we add the -n option (do not immediately change the MAC to 
the virtual one) to the master too, there'll be no problem. As the 
master has a higher priority, as soon as it wakes up, it will preempt 
any backup. So, to keep things simple and stupid (TM), we can add 
-n to the conf of any node, including the master, but THE MASTER HAS TO 
HAVE THE HIGHEST PRIORITY.
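
The first point can be checked before starting the daemon. The sketch 
below compares the network part of two addresses under a given mask; 
ip_to_int and same_network are hypothetical helper names, it handles 
plain dotted-quad IPv4 only, and it relies on the shell doing 64-bit 
arithmetic, which is an assumption about the shell in use.

```shell
# Hypothetical helpers, plain sh arithmetic, IPv4 dotted quads only.

# Convert a dotted quad such as 172.16.0.3 to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if both addresses fall in the same network under the mask.
same_network() {
    ip_a=$(ip_to_int "$1")
    ip_b=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip_a & mask )) -eq $(( ip_b & mask )) ]
}

# Interface address, virtual IP, netmask from the example above.
same_network 172.16.0.3 172.16.0.222 255.255.0.0 && echo "same network"
same_network 172.16.0.3 10.0.0.222 255.255.0.0 || echo "different network"
```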

If now, carrying on with the example, we disconnect the master, the 
backup will come into play. And, as said above, when the master comes 
back, it'll be master and rule again :)

Inconveniences

Open connections are not carried over to the new node upon failure and 
change of master. You'll have to count on hung connections until a 
retry happens. At least we have ensured a short response 
window ;)
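
On the client side, a simple way to ride out that window is to retry. 
The following is a generic sketch, not something vrrpd provides: retry 
is a hypothetical helper, and the attempt count and delay are values 
you would tune to the failover times measured above.

```shell
# Hypothetical helper: run a command until it succeeds, giving up
# after a fixed number of attempts.
#   retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    try=1
    until "$@"; do
        if [ "$try" -ge "$attempts" ]; then
            return 1       # still failing after the last attempt
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

For instance, "retry 15 2 ping -c 1 -W 1 172.16.0.222" would keep 
probing the virtual IP for roughly half a minute before giving up.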

More info:

    * RFC 2338: http://www.faqs.org/rfcs/rfc2338.html
    * Beloved Wikipedia entry: http://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol
    * Imagestream's white paper : http://www.imagestream.com/VRRP_WhitePaper.PDF
    * VRRP: overview, implementation and usage (from the author): 
	http://lwn.net/2001/features/OLS/pdf/pdf/vrrpd.pdf