[rancid] Need some Help - F5's in RANCID

Fri Jul 1 12:52:33 UTC 2011

On 7/1/11, Jethro R Binks <jethro.binks at strath.ac.uk> wrote:
> On Thu, 30 Jun 2011, Lee wrote:
>
>> Well!!  It doesn't seem to depend on NOPIPE.  rancid run manually to
>> collect F5 configs works -- with NOPIPE set or clear.  Rancid run from
>> crontab sometimes works, sometimes not.
>>
>> Unless someone beats me to it (hint, hint :) I'll try to figure out next
>> week whether it's an environment variable setting missing from the
>> crontab run that's causing the problem.
>
> If something works from the command line but not from crontab, then the
> classic explanation is that there is something in the environment that's
> different.
>
> You can simply run the "env" command from cron and examine the mail output
> to see what environment cron jobs run within.
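A minimal Python sketch of that comparison, assuming the cron environment has been captured to a file by a temporary crontab entry such as `* * * * * env > /tmp/cron-env.txt` (the file path is hypothetical; saving the mailed output to a file works too). It parses the `env` output and lists what differs from the interactive shell it is run from.

```python
import os

# Hypothetical path written by a temporary crontab entry:
#   * * * * * env > /tmp/cron-env.txt
CRON_ENV_FILE = "/tmp/cron-env.txt"


def parse_env(text):
    """Parse `env` output (one NAME=value per line) into a dict.

    Lines without '=' are skipped, so multi-line values are not
    handled; this is a simplification.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def diff_env(cron_env, shell_env):
    """Return (only_in_cron, only_in_shell, changed) as sorted name lists."""
    only_cron = sorted(set(cron_env) - set(shell_env))
    only_shell = sorted(set(shell_env) - set(cron_env))
    changed = sorted(k for k in set(cron_env) & set(shell_env)
                     if cron_env[k] != shell_env[k])
    return only_cron, only_shell, changed


if __name__ == "__main__" and os.path.exists(CRON_ENV_FILE):
    with open(CRON_ENV_FILE) as f:
        cron_env = parse_env(f.read())
    shell_env = dict(os.environ)
    only_cron, only_shell, changed = diff_env(cron_env, shell_env)
    for name in only_shell:
        print(f"only interactive: {name}={shell_env[name]}")
    for name in only_cron:
        print(f"only cron:        {name}={cron_env[name]}")
    for name in changed:
        print(f"differs:          {name}: cron={cron_env[name]!r} "
              f"interactive={shell_env[name]!r}")
```

Run it from the interactive shell where rancid works; the "only interactive" lines are the first suspects.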

Right.  I had to do that when porting my stuff from Solaris to Red Hat.

>  Then you can replicate that
> at the command line

That I couldn't do.  Maybe it was just me being ignorant, but there
were some environment variables I couldn't get rid of.  Any hints or
tips on how to replicate a cron environment at the command line would
be appreciated :)
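One way to do this, sketched in Python and not tested against RANCID itself: passing `env=` to `subprocess.run` replaces the inherited environment instead of adding to it, so variables set by the interactive login cannot leak through, and `shell=True` runs the command with `/bin/sh` on POSIX systems, which is what cron normally uses. The `parse_env` and `run_like_cron` helpers are hypothetical names; `env -i NAME=value ... command` is the plain command-line equivalent.

```python
import subprocess


def parse_env(text):
    """Parse `env` output (one NAME=value per line) into a dict.
    Lines without '=' are skipped (multi-line values are not handled)."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def run_like_cron(command, cron_env):
    """Run `command` with ONLY the variables in `cron_env`.

    env= replaces the inherited environment entirely, so nothing from
    the interactive login leaks in; shell=True runs the command via
    /bin/sh on POSIX systems.
    """
    result = subprocess.run(command, shell=True, env=cron_env,
                            capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

For example, `run_like_cron("/path/to/rancid-run", parse_env(open("/tmp/cron-env.txt").read()))`, where both paths are placeholders for the local install and the captured cron environment.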

> and see if that fixes the problem, and then further
> modify the environment to see what breaks.
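The "modify the environment to see what breaks" step can be automated. This Python sketch (the `succeeds` and `find_needed_vars` helpers are hypothetical) starts from the cron environment, adds back one interactive-only or differing variable at a time, and reports which single additions make a failing command succeed. It will not detect a problem that needs two variables set together.

```python
import subprocess


def succeeds(command, env):
    """True if `command` exits 0 when run via /bin/sh with exactly `env`."""
    return subprocess.run(command, shell=True, env=env,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def find_needed_vars(command, cron_env, shell_env):
    """Starting from cron_env, try adding each variable that the
    interactive shell has (or has with a different value), one at a
    time, and return the names whose addition makes `command` succeed."""
    if succeeds(command, cron_env):
        return []  # already works with the cron environment alone
    candidates = sorted(k for k, v in shell_env.items()
                        if cron_env.get(k) != v)
    needed = []
    for name in candidates:
        trial = dict(cron_env)
        trial[name] = shell_env[name]
        if succeeds(command, trial):
            needed.append(name)
    return needed
```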
>
> However, if it "sometimes" works from cron and sometimes not, then it is
> unlikely to be the environment I'd say.

On the one hand, I agree that "works from cron sometimes but not always"
doesn't sound like an environment problem.  On the other hand, I don't
have any other testable theory for what's causing the problem, so it's
worth spending an hour or two to see whether it is an environment issue
or a /bin/sh (cron) vs. /bin/bash (interactive) issue.
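If the failures really are intermittent, a tally over repeated runs can show whether they follow the shell. A Python sketch, assuming `/bin/sh` and `/bin/bash` both exist and that the command is safe to run several times in a row (a full collection against every F5 may be too heavy for that); the `failure_counts` helper and the default run count are arbitrary choices.

```python
import subprocess


def failure_counts(command, env, shells=("/bin/sh", "/bin/bash"), runs=10):
    """Run `command` `runs` times under each shell with the same
    environment and return {shell: number_of_nonzero_exits}."""
    counts = {}
    for shell in shells:
        failures = 0
        for _ in range(runs):
            rc = subprocess.run([shell, "-c", command], env=env,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL).returncode
            if rc != 0:
                failures += 1
        counts[shell] = failures
    return counts
```

Similar failure counts under both shells would point away from the sh vs. bash theory.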

>  Maybe something else: any NFS
> automounting going on?

I have no idea :(  A VM running Red Hat with SAN storage pretty much
sums up my knowledge of that machine.

>  Clashing with some other job (do the failures
> happen in particular windows in time)?  Check the cron logs to see what
> else may be running at the time.  Is it one F5 host or all of them that
> fail?  Maybe it is host-related.
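A Python sketch for the cron-log check, assuming cronie-style syslog lines such as `Jul  1 12:52:01 host CROND[1234]: (rancid) CMD (/path/to/job)` in a file like /var/log/cron; both the line format and the file location are assumptions to verify locally. Given the time of a failed collection, it lists the jobs cron started within a few minutes of it.

```python
import re
from datetime import datetime, timedelta

# Assumed cronie/syslog line format, e.g.
#   Jul  1 12:52:01 host CROND[1234]: (rancid) CMD (/path/to/job)
# Adjust the regex if the local cron log looks different.
CMD_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ CROND\[\d+\]: "
    r"\((?P<user>[^)]+)\) CMD \((?P<cmd>.*)\)$")


def jobs_near(lines, when, window_minutes=5, year=None):
    """Return (timestamp, user, command) for cron jobs started within
    +/- window_minutes of `when`. Syslog timestamps carry no year, so
    the year of `when` is used unless `year` is given. Month names are
    parsed with %b, which assumes an English/C locale."""
    year = year or when.year
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        m = CMD_LINE.match(line.rstrip("\n"))
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m.group('ts')}",
                               "%Y %b %d %H:%M:%S")
        if abs(ts - when) <= window:
            hits.append((ts, m.group("user"), m.group("cmd")))
    return hits
```

Comparing the output for several failed runs should show whether the same neighbour keeps turning up.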

We've also got Cisco NCM collecting F5 configs.  Maybe related: it has
just recently started spewing out F5 change reports that look like
this:

Configuration Diff
< 001: # Binary configuration captured, checksum: 900614
  002: # Device's text version of configuration follows
  003: #-----------------------------------------------------
  004: provision apm {}
---
> 001: # Binary configuration captured, checksum: 710350
  002: # Device's text version of configuration follows
  003: #-----------------------------------------------------
  004: provision apm {}

Configuration Diff
< 001: # Binary configuration captured, checksum: 710350
  002: # Device's text version of configuration follows
  003: #-----------------------------------------------------
  004: provision apm {}
---
> 001: # Binary configuration captured, checksum: 782192
  002: # Device's text version of configuration follows
  003: #-----------------------------------------------------
  004: provision apm {}

Configuration Diff
< 001: # Binary configuration captured, checksum: 782192
  002: # Device's text version of configuration follows
  003: #-----------------------------------------------------
  004: provision apm {}
---
> 001: # Binary configuration captured, checksum: 764708
  002: # Device's text version of configuration follows
  003: #-----------------------------------------------------
  004: provision apm {}
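These reports differ only in the checksum header, which appears to change on every capture. As an illustration only (this is plain Python, not an NCM or RANCID setting), comparing captures with that volatile line stripped out would make such runs produce no diff at all. The `VOLATILE` pattern and `meaningful_diff` helper are hypothetical.

```python
import difflib
import re

# Volatile line to ignore. The pattern matches the checksum header seen
# in the reports above; it assumes nothing else in the capture changes
# from run to run.
VOLATILE = re.compile(r"^# Binary configuration captured, checksum: \d+$")


def meaningful_diff(old_text, new_text):
    """Unified diff of two config captures with volatile lines removed.
    An empty list means only the volatile lines changed."""
    def keep(text):
        return [line for line in text.splitlines()
                if not VOLATILE.match(line)]
    return list(difflib.unified_diff(keep(old_text), keep(new_text),
                                     "previous", "current", lineterm=""))
```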

But in any case, today is the start of a 4-day weekend for me, and
worrying about F5s isn't part of my plans :)

Regards,
Lee