[rancid] rancid-run repeating to test device on errors that are not recoverable

Alan McKinnon alan.mckinnon at gmail.com
Fri Jan 13 16:38:42 UTC 2017


On 13/01/2017 16:16, Mischa Diehm wrote:
> Hi,
> 
> trying to get the logs from rancid into our monitoring system I noticed
> that rancid would try to login to systems $ROUND times even though the
> error is clear in terms of being unrecoverable during a rancid-run e.g.:
> 
> rancid@noc-XXX:~/logs$ grep 'Update the SSH known_hosts file
> accordingly.'  RZ-ROUTER.20170113.054748 | grep routerXYZ
> routerXYZ-fa-0-1.urz.p.unibas.ch clogin error: Error: The host key for
> routerXYZ-fa-0-1.urz.p.unibas.ch has changed.  Update the SSH
> known_hosts file accordingly.
> routerXYZ-fa-0-1.urz.p.unibas.ch clogin error: Error: The host key for
> routerXYZ-fa-0-1.urz.p.unibas.ch has changed.  Update the SSH
> known_hosts file accordingly.
> routerXYZ-fa-0-1.urz.p.unibas.ch clogin error: Error: The host key for
> routerXYZ-fa-0-1.urz.p.unibas.ch has changed.  Update the SSH
> known_hosts file accordingly.
> routerXYZ-fa-0-1.urz.p.unibas.ch clogin error: Error: The host key for
> routerXYZ-fa-0-1.urz.p.unibas.ch has changed.  Update the SSH
> known_hosts file accordingly.
> routerXYZ-fa-0-1.urz.p.unibas.ch clogin error: Error: The host key for
> routerXYZ-fa-0-1.urz.p.unibas.ch has changed.  Update the SSH
> known_hosts file accordingly.
> 
> in our case MAX_ROUNDS=4… I checked but couldn’t find a fast, easy way
> to fix this. Same for „check your password“ et al. What do you think? Is
> there an easy way to prevent retrying in case of unrecoverable errors?

I don't see the retries as especially problematic. *login can try and
fail the known_hosts check many tens of times in the time it takes to
retrieve one router's config, so the extra processing effort is very
small, almost below the noise floor.

What the retries do, though, is multiply the log entries, which makes
the problem hard to miss and encourages you to fix known_hosts.
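
If the repeats are a nuisance on the monitoring side, they can be
collapsed after the fact. A rough sketch, assuming each clogin error is
a single line in the real log (the wrapping in the quoted output comes
from the mail client); summarize_clogin_errors is a made-up helper name,
not part of rancid:

```shell
# Hypothetical helper: collapse repeated clogin errors in a rancid log
# into one line per distinct message, prefixed with how often it occurred.
summarize_clogin_errors() {
    grep 'clogin error:' "$1" | sort | uniq -c
}

# Usage (log file name taken from the quoted example):
#   summarize_clogin_errors RZ-ROUTER.20170113.054748
```

With several rounds per run, a device with a changed host key then shows
up once with a count rather than once per attempt.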

My solution for retries is to set MAX_ROUNDS to 1 and have rancid's
cron job run every hour. If an attempt fails for any reason, it tries
again in an hour and I get 2 hours of changes in one cvs commit.
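
Roughly, that setup might look like the following. This is a sketch,
not a copy of my files: the rancid-run path in the cron line is an
assumption and depends on where rancid is installed.

```shell
# rancid.conf fragment: make one collection attempt per run,
# leaving any retry to the next scheduled run.
MAX_ROUNDS=1; export MAX_ROUNDS

# crontab entry for the rancid user, running at the top of every hour
# (the install path below is an assumption; adjust to your system):
# 0 * * * * /usr/local/rancid/bin/rancid-run
```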

-- 
Alan McKinnon
alan.mckinnon at gmail.com


