[rancid] Re: HP ProCurves losing management interfaces

Mon Apr 13 23:17:40 UTC 2009

Mon, Apr 13, 2009 at 03:24:58PM -0700, Michael Kania:
> All,
> 
> I've been having a problem with a subset of switches periodically losing their management interfaces. We have 3 data centers in the United States and only 1 is having this problem. The problem data center is unique in that it is our largest (roughly 110 HP ProCurve 2810s) and CPU usage on each switch averages 35-45%. The servers behind each switch remain connected while the management interface is down. Ping, snmpget and ssh all fail. The downed management interface eventually recovers, and the logs don't show any sign of failure.
> 
> The rancid logs show a timeout when trying to contact that switch and then 3 failed ssh attempts. I've found that when rancid polls the switch, CPU usage spikes dramatically, and my assumption was that the severe spikes in CPU utilization cause the management interface to fall over. To mitigate this, I've turned down the number of retries and the polling interval, but the problem remains. Is anyone familiar with this issue?
> 
> I'm using rancid version 2.3.2~a9 on Debian etch.
> 
> Thanks,
> Mike Kania
> _______________________________________________
> Rancid-discuss mailing list
> Rancid-discuss at shrubbery.net
> http://www.shrubbery.net/mailman/listinfo.cgi/rancid-discuss

sounds like either a switch s/w bug or some over-zealous rate limiting.  it
is true that running rancid against any network device will use cpu, a little
more than a human running the same commands, but it shouldn't make the device
fail.  if it does, it's the vendor's bug.
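
one way to check whether the outages really line up with rancid runs is to log
reachability of the management address from a separate host and compare the
timestamps against the rancid logs.  a minimal sketch in python, assuming ssh
listens on tcp port 22; the function names, host name, interval and check count
are illustrative placeholders, not anything from rancid or the original post:

```python
import socket
import time
from datetime import datetime, timezone


def mgmt_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def watch(host, interval=10.0, checks=6, port=22):
    """Probe host `checks` times and print a timestamped line on each state change."""
    last = None
    for _ in range(checks):
        up = mgmt_reachable(host, port)
        if up != last:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} {host} {'up' if up else 'DOWN'}")
            last = up
        time.sleep(interval)
```

calling watch("switch1.example.net") (a hypothetical name) from cron or a shell
loop gives a list of up/DOWN transitions in UTC.  if the DOWN lines cluster
around rancid's collection times, that supports the cpu-spike theory; if they
don't, the cause is probably elsewhere.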