VRRPD: IP failover

What it is

It's a daemon that implements VRRPv2 (Virtual Router Redundancy Protocol) for Linux. The daemon has to run on each of the boxes that together make up the high availability system. Basically, its job is to create a set of nodes that share one IP, so that if one dies, another box of the same set can take its place transparently for the end user or host; in other words, a redundant system. It is usually (though not necessarily) used on routers.

How it works

Each group has a master box that serves whatever is associated with an IP. That IP is shared by the whole set. When the master node fails, a backup node takes its place as the new master. To choose which one becomes the new master, a static priority is assigned to each node. Furthermore, as a box can be part of several redundancy groups, each set is identified by a unique ID, called the VRID (Virtual Router ID).

- Virtual IP and virtual MAC

The shared IP is a virtual IP, but it has to be completely valid: no host that uses that IP should suffer from a stale entry in its ARP table (the MAC/IP pair). To prevent this, a virtual MAC is associated with the virtual IP. Otherwise, we'd have to wait for the ARP table entry in each host that uses the redundant system to time out before the new master could work, which would wreck the whole point. Since the MAC is the same throughout the set, the ARP table entry of each box that uses the high availability system stays valid even when the host that serves the IP changes. The service is taken over transparently by each node of the set, so the end user hardly notices the change.

To make things extra simple, the virtual MAC is made up of the standard prefix 00:00:5E:00:01 and the VRID as the last octet. Let's say we have assigned VRID 1 to our set. Then our virtual MAC will be 00:00:5E:00:01 + 01 = 00:00:5E:00:01:01.

- Node intercommunication: synchronization

The master node takes an active part in synchronizing the whole system.
Every fixed period of time (1 second by default) it announces that it is up and running by sending a packet to the 224.0.0.18 multicast address. When a few of these cycles (3 by default) pass without any announcement from the master, the highest-priority backup node that is still working steps in and becomes the master. If this happens and the original master comes back to life afterwards, it preempts the temporary master because it has the higher priority: the first king goes back to its ruling position and sends the old backup node back to its idle wait state.

- Priorities: master node and backup nodes

Each node has a static priority that decides, in case of competition, who shall be master. The live node with the highest priority will be the new ruler.

How long does it take?

I did a couple of tests, and the response seems to be quicker when the master fails than when the failed master comes back. On average, in the first case, a backup starts working after about 10 seconds. In the second case, the returning master doesn't become functional until about 30 seconds to 1 minute later.

Where to see it

In /var/log/syslog on each node. Here is an extract from a backup:

Aug 10 17:43:21 sandbox vrrpd[3926]: Starting (adver_int: 1000000, vrid: 100, use virtual mac: no)
Aug 10 17:43:21 sandbox vrrpd[3926]: VRRP ID 100 on eth0 (prio: 100) : we are now a backup router.
Aug 10 17:43:24 sandbox vrrpd[3926]: VRRP ID 100 on eth0 (prio: 100): we are now the master router.
Aug 10 17:43:27 sandbox kernel: eth0: no IPv6 routers present
Aug 10 17:45:28 sandbox vrrpd[3931]: Starting (adver_int: 1000000, vrid: 1, use virtual mac: no)
Aug 10 17:45:28 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100) : we are now a backup router.
Aug 10 17:47:02 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100): 172.16.0.3 is down, we are now the master router.
Aug 10 17:47:27 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100) : 172.16.0.3 is up, we are now a backup router.

Use

The package in s/Debian/Lunar/g is [surprise!] vrrpd. Once installed, running it is quite simple.

At the master node, box A [output truncated]:

# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:E0:4C:31:69:5C
     inet addr:172.16.0.3 Bcast:172.16.255.255 Mask:255.255.0.0
# vrrpd -i eth0 -v 1 -D -p 150 172.16.0.222
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:00:5E:00:01:01
     inet addr:172.16.0.3 Bcast:172.16.255.255 Mask:255.255.0.0

Options are (in order):
* -i eth0 : interface that is going to be modified; its IP is in the same network as the virtual IP
* -v 1 : VRID of the set
* -D : daemonize
* -p 150 : this node's priority (the higher number wins)
* 172.16.0.222 : virtual IP served

In a backup node (box B):

# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:07:95:A6:BE:81
     inet addr:172.16.0.50 Bcast:172.16.255.255 Mask:255.255.0.0
# vrrpd -i eth0 -v 1 -D -p 100 -n 172.16.0.222
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:07:95:A6:BE:81
     inet addr:172.16.0.50 Bcast:172.16.255.255 Mask:255.255.0.0

Differences with the master:
* -p 100 : lower priority than the master
* -n : don't change the MAC, as can be seen in the ifconfig output before and after running vrrpd

IMPORTANT:
* The virtual IP 172.16.0.222 is a completely valid IP address inside the network 172.16.0.0/16. The announcements of the master node are not broadcast to 255.255.255.255; they go to the 224.0.0.18 multicast group and stay on the local network, 172.16.0.0/16 in our example.
* If we add the -n option (don't switch the MAC to the virtual one) to the master too, there'll be no problem. As the master has the highest priority, as soon as it wakes up it will preempt any backup. So, to keep things simple and stupid (TM), we can add -n to the configuration of any node, including the master, but THE MASTER HAS TO HAVE THE HIGHEST PRIORITY.

If now, carrying on with the example, we disconnect the master, the backup will come into play.
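To watch the handover while the master is unplugged, it helps to cut syslog down to the vrrpd state changes. This is only a rough sketch: vrrp_transitions is a made-up helper name, and the pattern is copied from the log extract in "Where to see it", so other vrrpd versions may word their messages differently.

```shell
#!/bin/sh
# Hypothetical helper: keep only the vrrpd "we are now ..." lines
# from syslog-style input read on stdin.
# The message wording is assumed from the log extract on this page.
vrrp_transitions() {
    grep -E 'vrrpd\[[0-9]+]: .*we are now (a backup|the master) router'
}

# Small self-contained demo using two lines from the extract;
# only the vrrpd line is printed, the kernel line is dropped.
vrrp_transitions <<'EOF'
Aug 10 17:43:27 sandbox kernel: eth0: no IPv6 routers present
Aug 10 17:47:02 sandbox vrrpd[3931]: VRRP ID 1 on eth0 (prio: 100): 172.16.0.3 is down, we are now the master router.
EOF
```

On a real node the input would come from the log file instead, for example vrrp_transitions < /var/log/syslog, or from tail -f piped into the function while you pull the cable.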
As said above, when the master comes back, it'll take over and rule again :)

Inconveniences

Open connections are not carried over between nodes when the master fails and changes. You'll have to count on hung connections until a retry happens. At least we have ensured a short response window ;)

More info:
* RFC 2338: http://www.faqs.org/rfcs/rfc2338.html
* Beloved Wikipedia entry: http://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol
* Imagestream's white paper: http://www.imagestream.com/VRRP_WhitePaper.PDF
* VRRP: overview, implementation and usage (from the author): http://lwn.net/2001/features/OLS/pdf/pdf/vrrpd.pdf
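A footnote on RFC 2338 from the list above: two of its rules are easy to check by hand, so here is a small sketch of them. The function names are invented for this page. The formulas are the RFC's (virtual MAC = prefix 00:00:5E:00:01 plus the VRID as one octet; master down interval = 3 x advertisement interval + skew time, with skew = (256 - priority) / 256 seconds). Whether a given vrrpd build follows them to the letter is an assumption.

```shell
#!/bin/sh
# Sketch of two RFC 2338 rules. Function names are hypothetical.

# Virtual MAC: fixed prefix 00:00:5E:00:01, then the VRID as one hex octet.
vrrp_virtual_mac() {
    vrid=$1
    if [ "$vrid" -lt 1 ] || [ "$vrid" -gt 255 ]; then
        echo "VRID must be between 1 and 255" >&2
        return 1
    fi
    printf '00:00:5E:00:01:%02X\n' "$vrid"
}

# Master down interval in microseconds (the unit adver_int uses in syslog):
# 3 * advertisement interval + skew, where skew = (256 - priority) / 256 s.
vrrp_master_down_us() {
    adver_us=$1
    prio=$2
    echo $(( 3 * adver_us + (256 - prio) * 1000000 / 256 ))
}

vrrp_virtual_mac 1                # prints 00:00:5E:00:01:01
vrrp_master_down_us 1000000 100   # prints 3609375
```

With 1 second announcements and priority 100 that gives roughly 3.6 seconds of silence before a backup takes over. This is in line with the first log extract, where the node starts as a backup at 17:43:21 and declares itself master at 17:43:24.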