Frequently Asked Questions about rancid - last updated 20180408. This FAQ contains information that may not apply directly to versions of rancid prior to 3.0. It also contains paths containing tags such as , which refer to paths that are site-specific and are determined by how rancid was configured at installation time. These are explained briefly in the configure --help output. Below are the defaults used in rancid. PREFIX configure --prefix= option. default: /usr/local/rancid EPREFIX configure --exec-prefix= option. default: BINDIR configure --bindir= option. default: /bin The location of clogin, etc. SYSCONFDIR configure --sysconfdir option. default: /etc The location of rancid.conf, etc. LOCALSTATEDIR configure --localstatedir option. default: /var The location of the CVS repository, log files, etc. The most recent FAQ can be found at http://www.shrubbery.net/rancid/FAQ ToC 1) Platform/Device specific 2) CVS and filesystem permissions 3) General 4) Extending RANCiD 5) O/S specific 6) E-mail problems 7) I am still stuck 8) License 1) Platform/Device specific Q. I have a Cisco Catalyst 6500 series switch running the IOS (NOT catOS) software, is the router.db device type cisco or cat5? A. A catalyst running IOS is type "cisco". The 'show version' output will have banner including a phrase similar to "Cisco Internetwork Operating System Software". See the router.db(5) manual page. Q. I have Hybrid Cisco switch, like a cat5k with an RSM. How do I collect both the routing engine and switch configurations? A. Recommended way is to use two entries in the router.db, one for each. For example: cat5k_rsm.domain.com;cisco;up cat5k_sw.domain.com;cat5;up Q. I have a Cisco ??? on which collection stopped working, but clogin works as expected. A. Check if 'write term' produces output. Some IOS combined with large configs and low free memory produce zero 'write term' output, esp. combined with a memory leak. The device will have to be rebooted and/or upgraded. Q. I have a Cisco Catalyst switch. clogin connects, but after receiving the prompt, it stalls until it times out. Why? A. This may be due to your prompt. CatOS does not include an implicit '>' in it's prompt, like IOS does. clogin looks for '>' during login, so specify your prompt with a trailing '>'. Also see cat5rancid(1). For example: cat5k> cat5k> enable Password: cat5k> (enable) Q. Why won't a saved HP Procurve configuration load onto the switch to restore the configuration. A. Stefan Metzger and Per-Olof Olsson have found that the Procurve O/S does not like the leading comments, the lines beginning with ';', so remove these comments in whatever manner you wish. Also, they appear to expect a leading comment line that varies between different models, for example: "; Running config file:" OR "; JJBA8" Q. Polling a ZebOS box fails from cron, but is successful from the command- line. A. This is the tty/pty handling of either your O/S or ZebOS. Supposedly, changing the TERM in /rancid.conf to the following seems to fix it. TERM=vt100;export TERM COLUMNS=160; LINES=48; export COLUMNS LINES 2) CVS and filesystem permissions WARNING: Be careful when mucking around with the repository! Q. I am new to CVS, where can I find additional information? A. The manual page for CVS is quite complete, but can be be overwhelming even for someone familiar with RCS. There are some excellent resources on the web. See http://en.wikipedia.org/wiki/Concurrent_Versions_System and http://cvshome.org/. Q. Errors are showing up in the logs like: cvs [diff aborted]: there is no version here; run 'cvs checkout' first A. The directory was not imported into CVS properly or was not properly checked out afterward, so CVS control files or directories do not exist. rancid-cvs should always be used to create the directories and perform the CVS work. If it is just the directories that have been created manually, save a copy of the router.db file, then remove the group's directory, use rancid-cvs, and replace the router.db file. If the CVS import was also performed manually, cd to and use 'cvs co ' to create all the CVS control bits. Q. Errors are showing up in the logs like: cvs commit: Up-to-date check failed for `filename' A. This could mean that the version of the file that rancid has is older than the one in repository or that the file is missing. That could be due to a filesystem error, a system crash, a rancid crash or premature termination, or some third party altering the repository or rancid's local repository. Third parties should NOT alter the repository or rancid's local repository, except for editing the router.db file. All other modificiations should be left to rancid's automation. To recover, run cvs update in the group's directory as the rancid user. If the update fails or creates conflict, the files can be removed and the cvs update re-run. Some files are not saved in cvs, but these will be re-created without harm.. Some have asked why rancid does not run a cvs (or svn or git) update itself. If it did, it could inherit malicious modifications in the repository meant to hide the modifications. Q. I keep receiving the same diff for a (or set of) devices, but I know the data is not changing repeatedly. Why? A. This is probably a CVS or filesystem permissions problem. Check the log file from the last run for that group for clues first; it may provide the exact cause. Note: It is very important the following be done as the user who normally runs the rancid collection from cron. Check the cvs status of the device's file. example: guelah [2704] cvs status rtr.shrubbery.net =================================================================== File: yogi.shrubbery.net Status: Up-to-date Working revision: 1.197 Tue Jul 10 15:41:16 2001 Repository revision: 1.197 /usr/local/rancid/var/CVS/shrubbery/configs/rtr.shrubbery.net,v Sticky Tag: (none) Sticky Date: (none) Sticky Options: (none) The Status: should be Up-to-date. If the status is "Unknown", then somehow the file has been created without being cvs add'ed. This should be corrected by removing that device's entry from the group's router.db file, run rancid-run, replace the entry in router.db, and run rancid-run again. If the Status is anything else, someone has most likely been touching the files manually. Sane state can be achieved by removing the file and running cvs update to get a fresh copy from the repository. Check the ownership and permissions of the file and directory and the directory and file in the cvs repository (/CVS/). They should be owned by the user who runs rancid-run from cron. At the very least, the directory and files should be writable by the rancid user. Group and world permissions will determined by the umask (default 027), which is set in /rancid.conf. Likely the easiest way to fix the ownership on the cvs repository is chown -R /CVS / Q. I am renaming a device but would like to retain the history in CVS. How is this done? A. CVS does not provide a way (AFAIK) to rename files or to rename or delete directories. The best way is to copy the CVS repository file manually like this (disclaimer: BE VERY CAREFUL mucking around with the repository): % su - rancid_user % cd % echo "new_device_name;device_type;up" >> /router.db % cp -p CVS//configs/old_device_name,v \ CVS//configs/new_device_name,v % cd /configs % cvs update where GROUP is the name of the rancid group that the device is a member of. Once the renaming is complete, remove the old name from the router.db file and leave the CVS clean-up of the old filename to rancid. If one wanted to move a device to a different group and maintain the history, the same procedure would work, substituting the new group name appropriately and editing the router.db of both old and new groups, of course. SVN provides a rename function; but we suggest that you use it's copy function instead, and leave the clean-up of the old name to rancid. So, you would use the copy function (proper substitutions, of course) in place of the cp in the CVS example, then commit the new file. Q. I am new to svn. Where can I find more information? A. The svn so-called "red book" is the definitive guide. http://svnbook.red-bean.com/en/1.5/svn-book.html Q. I am removing a group and would like to remove all traces of it from the rancid directory and the CVS repository. How is this done? A. As far as I know, CVS does not provide a way to remove directories. First, remove the group from /rancid.conf. If rancid is running, wait for it to complete. Then just recursively remove the directory. For example, a group named "fubar": % su - rancid_user % cd % rm -rf fubar CVS/fubar Q. I would like to place my git repository on a remote machine. How do I do that? A. We suggest that you configure git to push a clone to the remote. That is, create the rancid group as normally done, including running rancid-cvs(1). Then add a push url to the existing origin. % cd $BASEDIR/ % git remote add % git remote set-url --add --push origin $CVSROOT/ % git remote set-url --add --push origin http://www.shrubbery.net/pipermail/rancid-discuss/2018-August/010348.html Q. I would like to place my CVS repository on a remote machine. How do I do that? A. Assuming that you're starting fresh, its quite simple. Before running rancid-cvs for the first time, adjust CVS_RSH & CVSROOT in rancid.conf similar to the following: CVS_RSH=ssh; export CVS_RSH CVSROOT="myhost:/fqpn/CVS"; export CVSROOT Note that CVS_RSH is not found in the sample rancid.conf that is distributed with rancid. Q. I need a web interface to the rancid CVS repository, for the CVS unsavvy. A. cvsweb works with rancid. Other software may exist for CVS and there are similar tools for Subversion and git. http://www.freebsd.org/projects/cvsweb.html cvsweb.conf: @CVSrepositories = ( 'rancid' => ['RANCID CVS, '/full_path_to_the_RANCID_CVS'], where the path will be /CVS. 3) General Q. I have upgraded to rancid >=3.0 and I am not receiving updates for any devices. A. Have you updated the format of your router.db files? The field separator changed in 3.0. See the UPGRADING file or router.db(5). Q. I have a (set of) device(s) on which collection fails. How can I debug this? A. Our usual diagnostic procedure for this is: - Make sure that the appropriate login script (example: clogin for cisco) works (see rancid.types.base). Use these tests to make sure you don't have routing or firewall issues, DNS or hostname errors, that your .cloginrc is correct, your banner does not have some character that the login script confuses for a prompt, and that the login script doesn't have a bug of some sort. For example: clogin cisco_router This should login to cisco_router and produce a router prompt that you can use normally (interactively), as if clogin were not used (i.e.: telnet cisco_router). Also, see the next FAQ entry. - Verify that commands can be executed on the router via the login script. This will exercise the script functionality needed for rancid. For example: clogin -c 'show version; show diag' cisco_router Should login to cisco_router, run show version and show diag, then disconnect and exit. The output will be displayed on your terminal. Use commands appropriate for the device; for rancid modules converted to perl modules, rancid -t -C produces the full list of commands that rancid would normally run. - Then see if the correct rancid commands work against the router. For example: rancid -t cisco cisco_router Should produce a cisco_router.new file (cooked to a golden rancid-style colour) in the current directory. If it does not, try again with the -d option, so that the cisco_router.new and cisco_router.raw files will not be removed if an error is detected. The cisco_router.raw file is the raw output of the dialogue with the device, and with the -d option, the .raw file will not be deleted at the end, which is helpful to see all the input. The -l option is also useful for logging. Note: in versions prior to 3.10, NOPIPE=YES must be set in your environment to produce the cisco_router.raw file. If all of these work, make sure that the device's entry in the group's router.db file is correct and check the group's last log file for errors. Q. A login script is failing to login to a device with an authentication error. I have the correct information in my .cloginrc. A. It is useful to verify that the information that you this is used, really id. See clogin(1) for the [Mm] options. Q. I am receiving persistent diffs for up/down/added/deleted devices in router.db, but nothing has changed and the cvs repository is up to date. A. Check that the configure process run during the installation of rancid determined the proper options for diff(1); look for diff in the control_rancid script. If you also run rancid from the command-line, be sure that your locale environment variables are consistent between your interactive and non- interactive (ie: cron) environments. On some O/Ses, the locale will affect the operation of sort(1). Q. I have received an e-mail with the subject "rancid hung - ". What does this mean? A. It means that the lockfile for already exists and rancid-run is attempting to run again. The lockfile prevents two instances of rancid from running simulaneously for the same rancid group. This could be due to a system crash or rancid being killed and this not having an opportunity to remove the lock or rancid could indeed be hung. Look at the system process list for an existing instance of rancid for the given group. If none exist, simply remove the lockfile named in the email. If it does exist, note what process in the progress group is hung and where it is hung (see system utilities ps, lsof, truss, ktrace, strace, dtrace and so on). Kill the hung process and rancid may run to completion from there. If the hang occurs again, further debugging will be necessary and will be dependant upon what is hanging. It has been observed that some devies have had bugs that cause interactive sessions to hang at exit and exhibit other exciting behaviors. Q. Are there any characters in the banner that rancid has problems with OR I changed the device's command prompt and now collection is failing? A. The trickiest part about clogin (et al) is recognizing the prompt correctly. clogin looks for '>' and '#' to figure out if it is logged in or in enable mode. So if you have a '>' or '#' in your login banner (or other motd), then clogin gets confused and will not be able to log in correctly, and thus rancid will fail. Don't use '>' or '#', or whatever the termination character of the given device's prompt is, in your prompt or in your banner or other motd. Q. I am experiencing random login failures using telnet and am seeing a kerberos message like: Kerberos: No default realm defined for Kerberos! A. This diagnostic output can confuse rancid if it appears amid a login prompt. The easiest work around is to tell telnet not to attempt kerberos authentication: echo DEFAULT unset autologin >> ~/.telnetrc Q. I use /*login -c to run commands on multiple boxes. Sometimes these are commands that take secondary input, like a filename. How can I enter the data for that secondary prompt? A. Two methods will work. Write an expect script to be used with clogin's -s option, for which a few examples come with rancid like cisco-load.exp. OR provide all the input in one command with the -c option like so: Router#clear counters Clear "show interface" counters on all interfaces [confirm] Router# clogin -c 'clear counters\n' The specific return (\n) will be entered after 'clear counters' followed by the normal return after the command. Some devices apparently eat the linefeed of the typical Unix \r\n sequence and require that a carriage- return be used instead (\r). Q. I would like to collect device configurations every hour, but only receive diffs every Nth collection or every N hours. Is this possible? A. Certainly, but rancid does not provide such a mechanism natively. Two approaches are recommended: 1) Using your preferred mail-list software, add a list with a digest and configure your MTA (example: sendmail) to send diffs to the list. Configure the mail-list software to force the digest at the interval desired. This allows folks to choose which type they prefer, after each collection or every N hours. This method also provides easy methods to archive the diff mail and retrieve previous diffs. 2) Write a script to send diffs, which saves the time it last ran and passes this to the -D option of CVS. Obviously, the first option is the cleanest and most featureful, which is why the script mentioned in the second option is not provided. Q. I'd like to have RANCID automatically begin collection when someone finishes configuring a router. How can I do this? A. Using a syslog watcher script, one can trigger RANCID from the syslog line emitted by, for example, an IOS router after configuration mode is ended. Here's a simple example using the Simple Event Correlator: (http://simple-evcorr.sourceforge.net/) If the syslog line in your logs looks like this (wrapped for readability): Apr 5 09:56:52 acc1.geo269.example.com 72: 000069: *Mar 6 21:40:13.466 \ AEDT: %SYS-5-CONFIG_I: Configured from console by gwbush on vty0 (10.1.1.1) You would use a SEC configuration stanza like this: # example rancid trigger # type=SingleWithSuppress ptype=RegExp pattern=\s\S+:\S+\S+\s(\S+)\.example\.com.*SYS-5-CONFIG_I action=shellcmd /opt/rancid/bin/run-rancid -r $1 window=1800 This will execute the command '/opt/rancid/bin/run-rancid -r acc1.geo269' when it is fed a line like that syslog line. The command will be run at most once every 1800 seconds. If you do not get hostnames in your log lines that match your router.db entries, either fix your reverse DNS or remove the '-r $1' part. And, if it can identify the group the device is part of, include the group on the command line to reduce the time it takes rancid to locate the group and avoid lock contention. Q. I would like to limit the permissions of the rancid user on my devices. Is this possible? A. Strictly speaking, no. Rancid needs permission to read device configuration and other data which is often not available to underprivileged users. However, if you use TACACS+, you can limit the commands that are available to a user. For example, to allow ping and show, but not "show tcp", and nothing else: user = rancid { cmd = "ping" { permit .* } cmd = "show" { deny tcp.* permit .* } # the default is to deny other commands } For RADIUS, Justin Grote suggested privilege levels: http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122newft/122t/122t13/ftprienh.htm Or, on a Juniper, configure a login class, such as below: set system login class RANCID permissions access set system login class RANCID permissions admin set system login class RANCID permissions firewall set system login class RANCID permissions flow-tap set system login class RANCID permissions interface set system login class RANCID permissions network set system login class RANCID permissions routing set system login class RANCID permissions secret set system login class RANCID permissions security set system login class RANCID permissions snmp set system login class RANCID permissions storage set system login class RANCID permissions system set system login class RANCID permissions trace set system login class RANCID permissions view set system login class RANCID permissions view-configuration set system login user rancid class RANCID Q. For approximately X hosts (configs), what size server should we be considering - speed and data storage? A. On modern machines it is unlikely you will have issues with disk space or memory - A heavily laden access router with a complex config won't consume more than a few megabytes of disk space for its configs over several years time (roughly 3 times the sum of all the config or */configs/* over 2 years is a decent approximation). Rancid is typically CPU bound, if you have adequate network bandwidth, but truthfully it spends most of its time waiting for information from the devices it is polling and network RTT. Experience shows rancid takes around 50 Mhz * minutes / device of processing power. This means that a 1Ghz machine can poll: 1000 Mhz * 60 (min/hour) / 50(Mhz min / device) = 1200 devices/hour That's obviously a ball park estimate which varies with many different factors such as the CPU type, the types of devices on your network, and the speed of your network. Q. How can I run rancid to make the most efficient use of resources (i.e. run in the shortest amount of time)? A. You can adjust PAR_COUNT in rancid.conf to achieve maximum efficiency during polling. You can watch the output of the standard unix command vmstat command during polling to determine whether or not the cpu is being wholly utilized - there should be little idle time and no process blocking (see vmstat(8)). Another simpler method is to look at the time stamps on the rancid log files, and adjust PAR_COUNT until the least amount of time is taken during polling. Make sure all devices are being polled by rancid before using this method - failing devices can extend the amount of time rancid takes to finish by a *LONG* period and throw your times way off. It may help to run rancid niced (man nice) if it will be sharing resources with other processes, as it may eat whatever is available if PAR_COUNT is set high. This is done by changing the crontab to be something like: 5 * * * * nice -19 /usr/local/rancid/bin/rancid-run If you _do_ share resources with other processes but want rancid to run efficiently, probably the vmstat method above will work better - rancid may take a little longer to run but you won't be stepping on other processes. Q. *login is failing to connect to devices via ssh with an error about the lack of a matching key or cipher, etc.. A. openssh has disabled (or removed from the default list offered) a number of legacy cryptography algorithms, but many devices are still running old code without support for newer algorithms. Also, as of RANCiD 3.5, the ssh -c option default of "3des" has been dropped from all of the login scripts due to its affect on offered ssh version 2 (sshv2) ciphers, allowing sshv2 negotiation to proceed as normal. See http://www.openssh.com/legacy.html, use ssh -v to display details about the connection negotiation to see what the remote device supports. Also see Cisco bug https://tools.cisco.com/bugsearch/bug/CSCuo76464 and http://stackoverflow.com/questions/25341773/cisco-ssh-key-exchange-fails-from-ubuntu-14-04-client-dh-key-range-mismatch. Suggested fixes for these failures may include the following options being added to /etc/ssh/ssh_config or ~/.ssh/config: Host * KexAlgorithms +diffie-hellman-group1-sha1 Cipher +3des Ciphers +3des-cbc Options can also be set in .cloginrc, such as: add sshcmd * {ssh\ -o\ KexAlgorithms=+diffie-hellman-group1-sha1} Q. What else can I do with rancid? A. The possibilities are endless...rancid is non-toxic when applied properly. see Joe Abley and Stephen Stuart's NANOG presentation: http://www.nanog.org/mtg-0210/abley.html or our NANOG presentation: http://www.shrubbery.net/rancid/NANOG29/ 4) Extending RANCiD Q. I wrote scripts for an unsupported device in rancid 2.x, but I am upgrading to 3.x and rancid-fe no longer works in the same manner. Can I use the script with 3.x? A. Yes. The device type must be added to etc/rancid.types.conf such as: DeviceTypeName;script;RancidScriptName DeviceTypeName;login;LoginScriptName eg: alteon;script;arancid alteon;login;alogin See rancid.types.conf(5). Q. I Have a device that is not supported by rancid. How can I add support for it? A. You should first search the rancid-discuss mail list and the web for previous art. Someone may have already accomplished or begun the task. If that is fruitless, you should query the rancid-discuss mail list, for someone still may have done the work or you may be able to hire or bribe someone. If you must write one, it is helpful to have a basic understanding of rancid's general process. See rancid_intro(7) and this maillist thread thread: http://www.shrubbery.net/pipermail/rancid-discuss/2015-August/008630.html . You should begin with installation of the latest version of rancid. Adding device support has been made easier in 3.0 and is being made easier as it progresses. Now, following the advice from Alan in the mail thread, choose the login script and rancid library that most closely resembles the new device. If it resembles none of them, clogin and ios.pm are reasonable starting points. If you wish to contribute your scripts to the community, it is easier if they are already in the format that autoconf and automake require. You accomplish this by beginning with the .in version of the scripts/libraries. The new files must also be added to automake &/ autoconf configuration files; configure.ac and various Makefile.am files. In some cases, it is possible to use an existing script without modification or just simple changes. Do so, if possible, BUT we encourage you to contribute those changes back to the community so that they will be in the next version and it will be easier for you to upgrade in the future, AND be careful making changes as it is easy to break the login scripts. Otherwise, copy the files to the new names that you have chosen. We normally use the name of the device's O/S for the library name; eg: iosxr.pm. For the login script, we use a one or two character prefix added to login to differentiate them and try to keep the names unique within the filesystem. If you must create an pre-rancid 3.0 style rancid script, the name typically follows the same format as the login script. Add your definitions to rancid.types.conf(5). Alter the login script so that interactive login works; then make -c, -s, and -x work. There are more tips on testing the login script earlier in this FAQ. After the login script works, work on the rancid library. First the inloop function, followed by the filter functions. See tips regarding rancid -d and .raw files earlier in this FAQ. Q. I wish to extend an existing rancid device type with an additional command or alter the filtering of an existing command. What is the recommended way to do this? A. It is recommended that a new device type is defined, so that future upgrades can proceed more smoothly. To do so, read the previous question in this FAQ and see the device type iosshtech in rancid.types.conf(5) for an example of altering the commands run for an existing device type ("ios"), adding a command and adding a new filtering function. 5) O/S specific 5a) General Q. After an O/S upgrade, ssh logins are failing errors such as: Unable to negotiate with IP port 22: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1 or %SSH-3-DH_RANGE_FAIL: Client DH key range mismatch with maximum configured DH key on server. A. The latest openssh has dropped support for less secure ciphers, but many devices do not support the newer ciphers, kex, etc. Work-arounds: Set a different cipher in your .cloginrc, eg: add cyphertype * {aes256-cbc} Re-enable the old kex method in your ssh_config (weakdh.org): Host * GSSAPIAuthentication yes KexAlgorithms +diffie-hellman-group1-sha1 Increase the key size in Cisco config: ip ssh dh min size 4096 5b) Linux Q. Some device collections have erroneously duplicated characters and which characters are duplicated often change between collections. Such as: - access-list 1 permit x.x.x.101 + access-list 1 permitt x.x.x.101 A. This is likely a problem with Kerberized-telnet. See http://www.shrubbery.net/pipermail/rancid-discuss/2011-September/005929.html http://www.shrubbery.net/pipermail/rancid-discuss/2012-February/006210.html There was also a bug prior to 3.0 affecting hlogin and other login scripts that use hpuifilter. 6) E-mail problems Q. Can you help with e-mail delivery problems? A. No. rancid-discuss@, etc. are not Mail Transport Agent (MTA) support lists. There are lists specific to Postfix, Sendmail, and all other MTAs; google for those lists and ask there. But, to explain basic information about rancid's use of the MTA is simple, rancid simply uses sendmail (the executable, named so for historical purposes and not to be confused with the MTA Sendmail) to send mail to the per-rancid-group aliases described in the rancid source README and rancid.conf(5) and affected by rancid.conf(5) and build-time configure options. All mail delivery is performed by the MTA. For debugging the MTA, review logs that your MTA generates and try sending mail to the per-rancid-group aliases manually using your favorite Mail User Agent (MUA) (Mail, mail, mutt, pine, etc) and see debugging options that your MTA and MUA offer for providing more logs, delivery receipts, etc. Q. I just want to store configrurations, I do not want to receive diffs. How can I accomplish this? A. Use procmail to filter them out of your inbox. OR, redirect the mail aliases in your MTA's aliases file or database to a mailman list with no subscribers. OR, redirect the mail aliases to /dev/null. OR, set DIFFSCRIPT in rancid.conf to something that eats it's input, such as "dd of=/dev/null bs=16k". 7) I am still stuck Q. I'm still stuck on this problem. Where can I get more help? A. A discussion list is available, rancid-discuss@shrubbery.net. You must be a subscriber to post. Subscribe like this: shell% echo "subscribe" | mail rancid-discuss-request@shrubbery.net Please, no questions about Unix or MTAs. This is not a Unix, MTA, or Perl help list. If you ask such questions, you will most likely be ignored. 8) License Q. Please explain the RANCID license. A. Quite simple; read it. It is a slightly modified BSD license; it has an additional clause.