
VRRP master/master issue on CSS 11501 with 3550


In the setup shown in the picture, two Cisco CSS 11501 load balancers provided redundancy with fate sharing. The route from the "Servers" networks towards the client network went through an IP address configured on a redundant interface shared by the two load balancers.

The VRRP announcements for the virtual routers holding the redundant interfaces on VLANs A and B passed between the two load balancers through the two Cisco Catalyst 3550 switches, which were running IOS Version 12.1(19)EA1c (C3550-I9Q3L2-M).
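
When capturing traffic on the trunk to see whether the announcements arrive at all, it helps to know what to look for. The Python sketch below decodes one advertisement. It assumes the CSS sends standard VRRPv2 advertisements as laid out in RFC 3768 (IP protocol 112, multicast group 224.0.0.18, virtual MAC 00-00-5E-00-01-VRID), which this post does not confirm. The function names and the sample payload are made up for illustration.

```python
import socket
import struct

# Assumption: standard VRRPv2 values from RFC 3768, not verified on the CSS.
VRRP_IP_PROTOCOL = 112
VRRP_MULTICAST_GROUP = "224.0.0.18"


def parse_vrrp_v2(payload: bytes) -> dict:
    """Decode the VRRP part of a captured packet (IP header already removed)."""
    if len(payload) < 8:
        raise ValueError("payload shorter than the fixed VRRP header")
    ver_type, vrid, priority, count, auth_type, adver_int, checksum = struct.unpack(
        "!BBBBBBH", payload[:8]
    )
    end = 8 + 4 * count
    if len(payload) < end:
        raise ValueError("payload truncated before the end of the address list")
    addresses = [
        socket.inet_ntoa(payload[offset:offset + 4]) for offset in range(8, end, 4)
    ]
    return {
        "version": ver_type >> 4,   # high nibble
        "type": ver_type & 0x0F,    # low nibble, 1 = advertisement
        "vrid": vrid,
        "priority": priority,
        "adver_int": adver_int,
        "checksum": checksum,
        "addresses": addresses,
    }


def virtual_mac(vrid: int) -> str:
    """Virtual router MAC address for a given VRID (RFC 3768 convention)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1..255")
    return "00-00-5E-00-01-%02X" % vrid


# Hypothetical advertisement for VRID 1 announcing 192.168.0.1 (checksum left at 0).
sample = (
    struct.pack("!BBBBBBH", 0x21, 1, 100, 1, 0, 1, 0)
    + socket.inet_aton("192.168.0.1")
    + bytes(8)  # authentication data field
)
advert = parse_vrrp_v2(sample)
print(advert["vrid"], advert["addresses"], virtual_mac(advert["vrid"]))
```

The sketch does not verify the checksum; it is only meant to show which fields identify the virtual router in a capture.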

In more detail: each of the two load balancers had one physical link to its corresponding L3 3550, carrying VLANs A and B (on the server side). One ISC link connected the two CSSs for adaptive session redundancy (ASR). The link between the two Cisco 3550s was set up as an 802.1q trunk transporting, among others, VLANs A and B, over which the VRRP communication had to take place.
Although the setup and configuration were double and triple checked, each load balancer claimed to be master for the virtual router instance running on the corresponding VLAN (A or B).
For brevity I will illustrate only the virtual router on VLAN A. The problem seemed to be strongly related to the fact that the CSSs were connected to the 3550s through trunk links.
CSS11501_right# show redundant-interfaces

Redundant-Interfaces:

Interface Address: 192.168.0.2 VRID: 1
Redundant Address: 192.168.0.1 Range: 1
State: Master Master IP: 192.168.0.2

CSS11501_left# show redundant-interfaces

Redundant-Interfaces:

Interface Address: 192.168.0.3 VRID: 1
Redundant Address: 192.168.0.1 Range: 1
State: Master Master IP: 192.168.0.3
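
The master/master condition can be spotted mechanically by comparing the two outputs above. The following Python sketch does that. It assumes the field labels appear exactly as printed here, and the function names are mine, not part of any Cisco tool.

```python
import re

# Field labels as they appear in the "show redundant-interfaces" output above.
FIELD_PATTERNS = {
    "interface_address": r"Interface Address:\s*(\S+)",
    "vrid": r"VRID:\s*(\d+)",
    "redundant_address": r"Redundant Address:\s*(\S+)",
    "state": r"State:\s*(\S+)",
    "master_ip": r"Master IP:\s*(\S+)",
}


def parse_redundant_interface(output: str) -> dict:
    """Extract the fields of a single redundant-interface entry."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, output)
        if match:
            fields[name] = match.group(1)
    return fields


def is_dual_master(a: dict, b: dict) -> bool:
    """True when both peers claim Master for the same VRID and redundant address."""
    same_router = (
        a.get("vrid") == b.get("vrid")
        and a.get("redundant_address") == b.get("redundant_address")
    )
    return same_router and a.get("state") == "Master" and b.get("state") == "Master"


RIGHT = """
Interface Address: 192.168.0.2 VRID: 1
Redundant Address: 192.168.0.1 Range: 1
State: Master Master IP: 192.168.0.2
"""

LEFT = """
Interface Address: 192.168.0.3 VRID: 1
Redundant Address: 192.168.0.1 Range: 1
State: Master Master IP: 192.168.0.3
"""

right = parse_redundant_interface(RIGHT)
left = parse_redundant_interface(LEFT)
print("dual master:", is_dual_master(right, left))
```

In a healthy pair, one side would report a different state and both would agree on the Master IP.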
While searching for this specific problem (both CSSs claiming master), I found that most reported cases were caused by misconfiguration: either an access list was blocking traffic between the two devices, or the VRID was incorrect, and so on. However, there was nothing wrong with the configuration on the CSSs or on the 3550s.
Whenever I checked the counter of VRRP announcements received by the load balancer that should have been the slave, the value was 0.
CSS11501_left# llama
CSS11501_left(debug)# ip scp statistics

totalIpFrames received: 211300
invalidIPFrame: 0 malformedIPFrame: 0
noIngressIPFrame: 0 srcDestSameIPFrame: 0
badIPVersion: 0 badIpHeaderLength: 0
badIpChecksum: 0 badSrcIPFrame: 0
loopbackIPFrame: 0 badIPAddress: 0
badIpDestAddress: 0 zeroTTLIPFrame: 0
badIpProtocol: 0 badIpOptions: 0

Packets received with supported protocol types:
IPPROTO_IP: 0 IPPROTO_ICMP: 12285
IPPROTO_IGMP: 0 IPPROTO_GGP: 0
IPPROTO_TCP: 3129 IPPROTO_EGP: 0
IPPROTO_PUP: 0 IPPROTO_UDP: 47625
IPPROTO_IDP: 0 IPPROTO_TP: 0
IPPROTO_EON: 0 IPPROTO_OSPF: 0
IPPROTO_ENCAP: 0 IPPROTO_VRRP: 0
IPPROTO_OSPF: 0

IP PACKET TO VXWORKS STATISTICS:
packetLeakToVxWorks: 170436
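
To watch this counter over time without reading the whole table, the output can be reduced to name/value pairs. This is a minimal Python sketch, assuming the counters always appear as "name: number" as in the output above. The sample text is a subset of that output and the function names are hypothetical.

```python
import re


def parse_counters(output: str) -> dict:
    """Collect every 'name: number' pair found in the statistics output."""
    # [ \t]* keeps the match on one line, so section titles ending in ':' are skipped.
    pairs = re.findall(r"(\w+):[ \t]*(\d+)", output)
    return {name: int(value) for name, value in pairs}


def vrrp_packets_received(output: str) -> int:
    """Return the IPPROTO_VRRP counter, or raise if the output lacks it."""
    counters = parse_counters(output)
    if "IPPROTO_VRRP" not in counters:
        raise KeyError("IPPROTO_VRRP counter not found in output")
    return counters["IPPROTO_VRRP"]


SAMPLE = """
Packets received with supported protocol types:
IPPROTO_IP: 0 IPPROTO_ICMP: 12285
IPPROTO_TCP: 3129 IPPROTO_EGP: 0
IPPROTO_PUP: 0 IPPROTO_UDP: 47625
IPPROTO_ENCAP: 0 IPPROTO_VRRP: 0
"""

received = vrrp_packets_received(SAMPLE)
if received == 0:
    print("no VRRP announcements have reached this CSS")
```

A counter that stays at 0 on the would-be slave while other protocols keep counting points at the path between the devices rather than at the CSS configuration.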
As mentioned earlier, the 3550s were running IOS Version 12.1(19)EA1c, while the CSSs were running WebNS sg0730203 (07.30.2.03).
I didn't solve the issue myself. I was notified that there was a problem with the IOS release running on the 3550s and that an upgrade to at least an EMI image of 12.1.20 was needed. There is also a bug logged with Cisco, although its setup and configuration are not exactly the same as in the issue presented here.
Here is the bug logged to Cisco.
After the upgrade to IOS 12.1.20, the slave load balancer received the VRRP announcements and the initial VRRP negotiation completed correctly.
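
To check several switches against that minimum release, the version string can be compared numerically. The Python sketch below compares only the three numeric parts and ignores the rebuild suffix (such as EA1c) and the feature set (SMI or EMI), so it is a rough filter and not a statement about which exact images contain the fix. The function names are mine.

```python
import re

# Minimum release mentioned in this post: 12.1.20
MINIMUM_RELEASE = (12, 1, 20)


def parse_ios_version(text: str) -> tuple:
    """Pull (major, minor, maintenance) out of strings like '12.1(19)EA1c' or '12.1.20'."""
    match = re.search(r"(\d+)\.(\d+)[.(](\d+)", text)
    if not match:
        raise ValueError("no IOS version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def needs_upgrade(version_text: str) -> bool:
    """True when the release is older than the minimum named above."""
    return parse_ios_version(version_text) < MINIMUM_RELEASE


for version in ("Version 12.1(19)EA1c", "12.1.20"):
    print(version, "-> upgrade needed:", needs_upgrade(version))
```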
Reference: CSS Redundancy Configuration Guide