[rancid] Dealing with rancid dying under heavy load

heasley heas at shrubbery.net
Sat Feb 22 16:04:58 UTC 2014


Sat, Feb 22, 2014 at 10:44:40AM +0200, Alan McKinnon:
> > unless the host can't keep up with the expect processes well enough to stay
> > within the login script's timeout period (or what you've set in cloginrc),
> > it should not fail - but i haven't tried 30 or 40, usually kern.smp.cpus
> 
> That was my thought too - it shouldn't fail and timeouts shouldn't
> happen. These targets are all on fast networks (usually GigaBit, some
> 100M and all respond quickly).
> 
> I might have hit a threshold on the ESXi host; I don't have visibility
> into that environment and can't see what else it's hosting.

the first vm provided to me had 6 cpu and 8G; I couldn't figure out why the
performance of our multithreaded application, also i/o bound, was so
horrible - they'd limited the vm to ~512MHz, so i had a 1989 vm.

> > if the rancid host is a VM and assuming it is timing out, but plenty of
> > spare cpu and net; figure out if it's actually timing out due to wall clock
> > time, vs. missing interrupts for example.
> 
> I'll look into that. I'd discounted simple timeouts as another rancid
> system that deals with kit out in the field has PAR_COUNT=50 and it's
> been tested as high as 100. Some of that kit is stupid slow, I've seen
> show runs take 10 minutes to complete and the system just deals with it
> as expected.

lack of memory seems to be a big one for VMs; esp. VirtualBox and similar
VM systems.  but, my experience with vmware is that disk i/o performance
is poor.  that might be their drivers for the controllers i've had, but it
was bad enough that i moved to VBox.

> Both systems have the same OS config and both run as VMs in the same
> environment. I think step one is to get monitoring graphs out of the
> VMWare team.

that's a good indicator; when retrieving data, rancid spends most of its
time waiting on the device.  if you use NOPIPE=YES in rancid.conf, you
decouple the retrieval from the processing/reformatting.
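
a minimal sketch of that setting, assuming a stock rancid.conf (which is
read as Bourne shell).  the PAR_COUNT value is only the figure quoted
earlier in this thread, not a recommendation; tune it to what the host can
actually sustain.

```shell
# rancid.conf fragment (assumption: the file is sourced as Bourne shell)

# write each device's output to a temporary file first and parse it
# afterwards, instead of piping the login script straight into the parser
NOPIPE=YES; export NOPIPE

# number of devices collected in parallel; 50 is the value mentioned
# above for the other rancid system, used here purely as an illustration
PAR_COUNT=50; export PAR_COUNT
```

with NOPIPE set, a slow parser should no longer hold the device session
open, so a login timeout is more likely to reflect the device or the
network than the rancid host itself.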

