Thursday, May 15, 2014

CSCue56163 - 12.4(25e)JAL1 AP recovery img does not work as expected

I got hit with this little gem the other day, and it proved to be quite frustrating. It was only through some lucky Google searches that I found a post on the CSC forums by someone else with the same issue. It seems that this particular recovery image does not like it when proxy-arp is disabled on the gateway, which is something we have done for security reasons. This is now my favorite problem with out-of-the-box APs, followed closely by the box of APs pre-loaded with mesh code and the box of APs from the wrong regulatory domain. It takes the top slot only because it is a flaw in the code rather than a simple mistake on the part of our reseller.

I was able to see this bug in action by issuing the "debug capwap client event" and "debug capwap client error" commands. For my trouble I received the following messages, repeated over and over:

AP4403.a701.a00a>
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: Could not discover any MWAR.
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: Starting Discovery. Initializing discovery latency in discovery responses.
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: CAPWAP State: Discovery.
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: spamResolveStaticGateway - Removing default route for existing gateway 10.119.64.1
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: spamResolveStaticGateway - Adding default route for gateway 10.119.64.1
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: spamResolveStaticGateway  - gateway found 10.119.64.1
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: spamResolveStaticGateway - Removing default route for existing gateway 10.119.64.1
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: spamResolveStaticGateway - Adding default route for gateway 10.119.64.1
*Mar  1 00:27:21.875: %CAPWAP-3-EVENTLOG: spamResolveStaticGateway  - gateway found 10.119.64.1

It was a Google search on some of these messages that led me to the resolution of the issue.

According to the Cisco bug page, there are three ways to deal with this little problem:

1. Make sure that ip proxy-arp is configured (the default setting for an IOS router) on the default gateway of the AP's subnet. Also, if ip broadcast-address is defined on the VLAN with something other than 255.255.255.255, the AP will not join. Either remove the command (no ip broadcast-address) or set it to 255.255.255.255.

2. If console access is available on the AP, disable IP routing. The AP should then be able to join and download the new IOS image:

ap#debug capwap console cli
ap#configure terminal
ap(config)#no ip routing

(wait for it to join)

This setting will not survive a reboot.

3. Install a different recovery (rcvk9w8) or lightweight IOS (k9w8) image on
the AP, such as 15.2(2)JA1.

I went with the second option to troubleshoot the issue, but when the 250 APs in this order are deployed I will probably use option 1 and temporarily enable proxy-arp on the AP management subnets. Below I have included a screen capture directly from Cisco's website.

CSCue56163
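
For reference, here is a rough sketch of what option 1 might look like on the gateway. This is not from the Cisco bug page. The hostname "gateway" and the interface "Vlan64" are placeholders I made up for illustration, so substitute the actual SVI or routed interface that serves as the default gateway for the AP management subnet:

gateway#configure terminal
! Vlan64 is a hypothetical interface name; use the AP subnet's real gateway interface
gateway(config)#interface Vlan64
! re-enable proxy ARP, the IOS default that we had turned off
gateway(config-if)#ip proxy-arp
! return the broadcast address to its default of 255.255.255.255
gateway(config-if)#no ip broadcast-address
gateway(config-if)#end

Once the APs have joined and downloaded their new image, entering "no ip proxy-arp" under the same interface should put the security setting back the way it was.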