patches for pauses between parallel RANCID runs

Ed Ravin eravin at panix.com
Wed Jun 15 13:39:24 UTC 2005


On Tue, Jun 14, 2005 at 03:16:25PM -0700, john heasley wrote:
> Tue, Jun 14, 2005 at 12:10:58PM -0400, Ed Ravin:
> > I needed to control how fast RANCID starts up jobs in parallel: when
> > using one-time password logins, I had multiple routers trying
> > to log in with the same sequence number, and only one of them could
> > finish logging in.
> > 
> > It turns out "par" already supports such a feature, but there's no easy
> > hook to turn it on.  So here's an addition to /etc/rancid.conf:
> > 
> >   # How long to pause (in seconds) between parallel RANCID runs
> >   # This is important when using the same S/Key account on multiple
> >   # routers, otherwise all the routers will receive the same challenge
> >   # and only one will actually be able to log in.  Default is zero.
> >   # PAR_PAUSE=3; export PAR_PAUSE
[...]
> I don't think that is a reliable solution.  You really need to write-lock the
> file you are reading the keys from.  The process will have to lock that file
> until it manages to get its key accepted (login, then again for enable) or
> gives up, and others will have to block waiting for the lock.

I agree that it's not 100% reliable, but it will probably be good enough.
Note that this is a general issue with S/Key, not a RANCID-specific thing.
I don't like the idea of locking files, as it only solves the problem for
RANCID and only when RANCID is running on just one machine.  Also, when
you add locking code you add the possibility of bugs that deadlock, which
is no fun.

I'd rather do what normally happens when an S/Key collision occurs:
try the login again.  The catch is, I'd like to sleep a random
amount so that a flock of clogins doesn't all retry at the same time.  How
do you get random numbers in Expect?
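
For illustration, here is a minimal sketch of the retry-with-random-sleep
idea, written as a bash wrapper around the login command rather than as a
change inside clogin.  The function name retry_with_jitter and its two
numeric arguments are made up for this example, and it assumes the wrapped
command exits non-zero when the login fails.  (Inside Expect itself, Tcl's
expr rand() returns a random float between 0 and 1 and could be scaled the
same way.)

```shell
#!/bin/bash
# Hypothetical helper, not part of RANCID:
#   retry_with_jitter MAX_TRIES MAX_SLEEP command [args...]
# Runs the command until it succeeds or MAX_TRIES attempts have failed,
# sleeping a random 1..MAX_SLEEP seconds between attempts so that
# parallel jobs are unlikely to retry at the same instant.
retry_with_jitter() {
    local max_tries=$1 max_sleep=$2
    shift 2
    local attempt=1
    while :; do
        # Assumption: a zero exit status means the login succeeded.
        "$@" && return 0
        if [ "$attempt" -ge "$max_tries" ]; then
            return 1
        fi
        # $RANDOM is a bash feature (0..32767); map it to 1..max_sleep.
        sleep $(( (RANDOM % max_sleep) + 1 ))
        attempt=$((attempt + 1))
    done
}
```

Something like "retry_with_jitter 3 5 clogin router1" would then try up to
three times with a 1 to 5 second random pause between attempts.  This only
spreads the retries out; it does not guarantee two jobs never collide.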

What do you think of conditionally skipping the 1-second sleep in
clogin before sending the password?  I think that's part of the problem,
since any clogins using the same account that try another router in that
1-second interval will get a duplicate challenge that will be stale by the
time they finish their 1-second sleeps...

	-- Ed



