[rancid] HP ProCurves losing management interfaces

Mon Apr 13 22:24:58 UTC 2009

All,

I've been having a problem with a subset of switches periodically losing their management interfaces. We have 3 data centers in the United States and only 1 is having this problem. The problem data center is unique in that it is our largest (roughly 110 HP ProCurve 2810s), and CPU usage on each switch averages 35-45%. The servers behind each switch remain connected while the management interface is down, but ping, snmpget and ssh to the switch all fail. The downed management interface eventually recovers, and the switch logs don't show any sign of failure.

The rancid logs show a timeout when trying to contact the affected switch, followed by 3 failed ssh attempts. I've found that CPU usage spikes dramatically when rancid polls a switch, and my assumption was that these severe spikes in CPU utilization cause the management interface to fall over. To mitigate this, I've turned down the number of retries and the polling interval, but the problem remains. Is anyone familiar with this issue?
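
One rough way to test the assumption that the outages follow rancid polls is to compare when each outage began against when each poll started. The sketch below is only an illustration: the function name, the 5-minute window and the timestamps are all hypothetical, and real values would have to come from the rancid logs and whatever monitoring records the management interface going down.

```python
from datetime import datetime, timedelta


def outages_near_polls(poll_times, outage_times, window=timedelta(minutes=5)):
    """Return the outages that began within `window` after some poll started."""
    matched = []
    for outage in outage_times:
        # An outage counts as a match if any poll started 0..window before it.
        if any(timedelta(0) <= outage - poll <= window for poll in poll_times):
            matched.append(outage)
    return matched


# Hypothetical timestamps, for illustration only (not real measurements).
polls = [datetime(2009, 4, 13, 21, 0), datetime(2009, 4, 13, 22, 0)]
outages = [datetime(2009, 4, 13, 22, 2), datetime(2009, 4, 13, 22, 40)]

hits = outages_near_polls(polls, outages)
print(f"{len(hits)} of {len(outages)} outages began within 5 minutes of a poll")
```

If most outages land inside the window, that would support the CPU-spike theory; if they are scattered evenly, something other than rancid is probably involved.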

I'm using rancid version 2.3.2~a9 on Debian etch.

Thanks,
Mike Kania