Documentation for Autocheck v4.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  1 Introduction
  2 Remote perl scripts
    2.1 Output by remote perl scripts
    2.2 How remote scripts are created
      2.2.1 How parameters in the RUNFILE affect the remote script
      2.2.2 The actual remote perl script
    2.3 Script types in Autocheck v4
  3 The RUNFILE
    3.1 RUNFILE definition format
      3.1.1 RUNFILE special variables
      3.1.2 RUNFILE variables
  4 The CHECKFILE
    4.1 CHECKFILE definition format
    4.2 Check output
      4.2.1 Check output format and variables for checks
  5 The RULEFILE
    5.1 RULEFILE definition format
    5.2 Check dependent rules
  6 The ACTIONFILE
    6.1 ACTIONFILE definition format
    6.2 ACTIONFILE variables
  7 The PAGERULES
    7.1 PAGERULES format
  8 The MAILRULES
  9 The PHONEBOOK
    9.1 PHONEBOOK format
      9.1.1 Special variables
      9.1.2 Name definitions
      9.1.3 Group definitions
  10 The config file
    10.1 Config file format
  11 The pager service


1 Introduction
~~~~~~~~~~~~~~~

Autocheck has been completely rewritten in Perl (again) and is now up
to its fourth incarnation. Autocheck v4 is copyright Henrik Bilar
(hbilar@users.sourceforge.net) and is licensed under the terms of the
GNU General Public License (GPL). The full terms of this license can be
found in the file LICENSE distributed with this software package. The
terms and conditions are also available from the GNU website
(http://www.gnu.org).

The idea this time is that autocheck uses the rlogin tools to create a
perl script on the remote machine and runs it. The perl script produces
some output that the main autocheck modules parse. Autocheck now also
supports local checks, which need no rlogin, and ssh for secure
checking.

There are sample config files in sample-conf/ and they should ideally
be placed in /etc/autocheck or some similar directory. The directory is
configurable in ac4.config, which should be placed in /etc,
/usr/local/etc or the current directory.


2 Remote perl scripts
~~~~~~~~~~~~~~~~~~~~~~

The remote perl scripts are always created on the fly, just before they
are run. This prevents stale versions lingering on different systems
and producing different (incompatible) output.

2.1 Output by remote perl scripts

The perl scripts produce output on STDOUT. The first part of the data
is the data used by the perl autocheck module on the checking server.
This data should be consistent across platforms. Next comes a line with
the text string BEGINDETAIL. Any text after the BEGINDETAIL is ignored
by the actual autocheck module, but it is included in the message sent
to the user in case of an alarm (or reset of a previous alarm).

The output of the perl scripts on the remote machines is always in the
following format:

  LINE        DATA
  1-n         contains arbitrary data parsed by the modules.
  n+1         contains the text string BEGINDETAIL
  (n+2)-m     contains detailed output used in the reports sent to
              users. This can be eg the output of the commands
              processed.

2.2 How remote scripts are created

Remote scripts are generated on the autocheck server and written to the
remote host with the rlogin/rsh/ssh tools. If the check is performed
locally, the scripts are created by accessing the file system directly.

2.2.1 How parameters in the RUNFILE affect the remote script

All the parameters in the RUNFILE are converted to perl variables and
written to the top of the script. The values are encapsulated in double
quotes, eg the string

  host = myhost.mydomain.com

would be written to the perl script like this:

  $host = "myhost.mydomain.com";

Some characters get escaped, ie they get an extra backslash inserted
before them. The characters are \"@'$. This prevents the double quoted
strings from being evaluated as perl code.

All of the above string processing can be avoided by prefixing the
value in the RUNFILE with \ESC\.
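
The conversion described so far can be sketched in a few lines of Perl.
This is only an illustration, not the code Autocheck itself uses: the
function name runfile_line_to_perl is made up, and the sketch assumes
that whitespace around the value (and after the \ESC\ prefix) is
dropped, as the examples in this section suggest.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Autocheck): turn one "name = value"
# RUNFILE line into the perl assignment written to the top of the
# remote script.
sub runfile_line_to_perl {
    my ($line) = @_;
    my ($name, $value) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/
        or die "not a name = value line: $line\n";

    # \ESC\ prefix: copy the rest as it stands, no quotes, no escaping.
    # (Dropping the whitespace after \ESC\ is an assumption.)
    if ($value =~ s/^\\ESC\\\s*//) {
        return "\$$name = $value;";
    }

    # Otherwise put a backslash before each of \ " @ ' $ and wrap the
    # result in double quotes.
    $value =~ s/([\\"\@'\$])/\\$1/g;
    return "\$$name = \"$value\";";
}

print runfile_line_to_perl('host = myhost.mydomain.com'), "\n";
print runfile_line_to_perl('words = \ESC\ ["SUBMIT", "_CF_onError"]'), "\n";
```

The two print statements reproduce the host and words examples used in
this section.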
If \ESC\ is the first part of the string, the string is NOT
encapsulated in double quotes, nor are any characters backslashed. This
is useful for creating eg references to lists, hashes, interpolated
strings etc. The string

  words = \ESC\ ["SUBMIT", "_CF_onError"]

will get written to the perl script like this:

  $words = ["SUBMIT", "_CF_onError"];

which is a reference to an anonymous list.

2.2.2 The actual remote perl script

The remote perl script is generated on the fly from a script definition
fetched from the CHECKFILE. The code generated differs depending on
which type of host is being checked. A script definition starts with an
opening tag line and ends with a closing tag line. The hosttypes can be
any hosttypes this script is to be used for, or "all", which makes the
script the default script to run.

The scriptfile is parsed from top to bottom, and every time a matching
definition is found the current definition (which will be written to
the remote script) is set to that definition. This means that any
default scripts should come before the specialised scripts. Consider
the file:

  .....
  .....
  ....

This definition is flawed, as the generic definition comes after the
specialised definitions, hence no specialised definitions will ever be
used.

2.3 Script types in Autocheck v4

  diskfree
      Lists the percentage free for all disk partitions. The output is
      in the format
          mountpoint = device;percentage free

  ping
      Prints packet loss to all hosts specified in the RUNFILE
      pinghosts variable. The output is in the form:
          host:lossrate:roundtrip_min:roundtrip_avg:roundtrip_max

  swap
      Returns the percentage of free swap space.

  cpu
      Returns the percentage of free CPU time.

  dns
      Returns 1 if the lookup of host "lookuphost" on server "dns"
      (both from the RUNFILE) succeeded or 0 if it failed.

  informix-dbfree
      Returns a list of percentage free in all the Informix DB spaces.
      The only RUNFILE parameter is "server" (informix instance).
      The format of the output is:
          dbspace:freeperc

  informix-status
      Returns 1 if the status command of the instance specified in
      "server" contains the substring "expectString", otherwise
      returns 0.

  process-check
      Returns 1 if NOT all the processes specified in the list
      "processes" are running. Note that the list in the RUNFILE needs
      to be specified like this:
          processes = \ESC\ ["named", "oninit" ]


3 The RUNFILE
~~~~~~~~~~~~~~

The RUNFILE is the file that defines what checks to run, where to run
them and also what parameters to set.

3.1 RUNFILE definition format

The file format looks somewhat like an XML file: each "check" is opened
with a <>-tag and terminated with a matching closing tag. Within these
tags, parameters are specified. Three parameters are mandatory for all
checks. These are host, hosttype and protocol.

3.1.1 RUNFILE special variables

The three variables host, hosttype and protocol are mandatory.

The "host" variable defines on which host to run the current check.

The "hosttype" variable defines what type of host the check is being
run on.

The "protocol" variable defines what protocol to use for communication
with the host. This variable can currently have three values: "local",
"rlogin" and "ssh". If the variable is "local", all actions regarding
script generation etc are done locally on the server. If the variable
is "rlogin", the rlogin/rsh protocol is used for creating and running
the script remotely. "ssh" can be used instead of "rlogin" to use the
Secure Shell utility.

3.1.2 RUNFILE variables

Any number of variables can be defined for a check. They are always in
the format

  name = value

The runfile variables are used to control the behaviour of the check
programs. The check definition makes use of the variables to know eg
what processes to look for in the process-check check.

The value can be escaped by prefixing it with the string "\ESC\"
(without the quotes).
If added, no string manipulation will be performed on the string and it
will be copied to the remote script *as it stands*.

Example

  processes = \ESC\ ["named", "oninit" ]

will generate the following perl code:

  $processes = ["named", "oninit" ];

whereas

  processes = named, oninit

would generate

  $processes = "named, oninit";


4 The CHECKFILE
~~~~~~~~~~~~~~~~

The checkfile defines all the checks for different operating systems.
The sole purpose of the checkscripts is to send back information to the
autocheck engine running on the server. The engine then processes this
data locally. Ie, the checks should not perform any actions other than
collecting data!

4.1 CHECKFILE definition format

The file format, again, resembles that of an XML document. Each "check"
definition starts with a start tag and ends with an end tag. For
example, a diskfree definition defines what the script on the remote
system should look like should the runfile specify a diskfree check on
a system whose hosttype is one of those specified after the checkname.
The hosttypes can be substituted with "all", in which case this check
would be the definition for every system type.

It is important to realize that the checks are just normal perl code
with special variables prepended to the beginning.

4.2 Check output

The various parts of the check engine expect to get data in a certain
format. The output expected always has the generic format of n rows of
actual check data, then a row containing the single text string
"BEGINDETAIL" and then any output that will be sent to the mail
recipients etc in case of an alarm.

4.2.1 Check output format and variables for checks

This section describes what the data portion of the checks should look
like.

  diskfree:
      Diskfree checks should return the amount of free diskspace (in
      percent) for all the mounted volumes.
      Output:        for each mountpoint:
                       mountpoint = device;free space
      Runfile Args:  None

  ping:
      Ping returns ping status information for all the hosts passed to
      it in the runfile variable pinghosts (hosts are separated by a
      space).
      Output:        for each host:
                       host:%lost:roundtrip_min:roundtrip_avg:roundtrip_max
      Runfile Args:  pinghosts     space separated list of hosts

  swap:
      Swap returns the amount of free swap space (in percent).
      Output:        line 1: percent free swapspace
      Runfile Args:  None

  cpu:
      Cpu returns the amount of free cpu time (in percent).
      Output:        line 1: percent free cpu time
      Runfile Args:  None

  html:
      The Html check is designed to check for web server availability.
      Output:        line 1: 1 if server is up, 0 if server is down
      Runfile Args:  url           url to webpage (can include protocol
                                   specification)
                     words         escaped perl list reference of words
                                   to look for in the data the server
                                   returned.
                                   Example:
                                     \ESC\ ["SUBMIT", "_CF_onError"]

  dns:
      The DNS check is designed to check for DNS availability.
      Output:        line 1: 1 if server is up, 0 if server is down
      Runfile Args:  lookuphost    address to attempt to look up on dns
                                   server
                     dns           dns server to test

  informix-dbfree:
      This check is designed to check the database for free space.
      Output:        for each dbspace:
                       dbspace:percent free
      Runfile Args:  server        informix instance

  informix-logsfree:
      This check is designed to keep track of the logical logs.
      Output:        line 1: Percent free logical logs.
      Runfile Args:  server        informix instance

  informix-status:
      This check checks the status of the informix instance specified.
      Output:        line 1: 1 if server online, 0 if off line
      Runfile Args:  server        informix instance
                     expectString  string to look for in output of
                                   onstat command when determining if
                                   server is online or not.

  process-check:
      This check checks for processes running on a system.
      Output:        line 1: 1 if all processes running, 0 otherwise.
      Runfile Args:  processes     escaped perl list reference of
                                   processes to look for.
                                   Example:
                                     \ESC\ ["named", "oninit" ]


5 The RULEFILE
~~~~~~~~~~~~~~~

The RULEFILE is the file that defines what limits would raise an alarm
of a specified level on a specific host.

5.1 RULEFILE definition format

The file format resembles an XML document. Each rule has a start tag
and an end tag. The hostname part of the rule defines for what host
this rule is authoritative. The hostname can be substituted with "all",
in which case this rule will be authoritative for all systems.

Each rule should have a default line called "default".

The actual rules are given as colon separated values and are specified
from the highest (most severe) level to the lowest level. Ie, a line of
5:10:15 for a diskfree rule would set the alarm level to 1 if the
diskfree check returned 4% free space and to level 3 if the amount of
free space was 12%.

Note that rules do not make sense for a few types of checks and
therefore they have no entries in the rulefile.

5.2 Check dependent rules

  diskfree:           default|mountpoint
                      colon separated list of perc free diskspace that
                      would generate an alarm.

  ping:               default|hostname
                      colon separated list of time down (minutes)

  swap:               default
                      colon separated list of perc free swap space

  cpu:                default
                      colon separated list of perc free cpu time

  html:               default|url
                      colon separated list of time down (seconds)

  dns:                default|server
                      colon separated list of time down (seconds)

  informix-dbfree:    default|instance_dbspace
                      colon separated list of perc free dbspace

  informix-logsfree:  default|instance
                      colon separated list of perc free logs

  informix-status:    default|dbinstance
                      colon separated list of time down (seconds)


6 The ACTIONFILE
~~~~~~~~~~~~~~~~~

The action file defines how to notify the operators about the alarm
conditions.

6.1 ACTIONFILE definition format

The file format resembles an XML document. Each action has a start tag
and an end tag. Each action definition also holds definitions of what
actions to take in case of a certain alarm level. The alarm level
definitions start with a tag giving either an alarm level or "all" and
end with the matching end tag.
All variables are given as key-value pairs.

6.2 ACTIONFILE variables

  action      defines what actions to take.
              value: mail or page or script or a combination of all
              three

  mailrecp    defines mail recipients for the alarm. Comma separated
              list.

  pagerecp    names or groups from the phonebook to receive sms text
              messages.

  scriptname  Name of the script from the SCRIPTDIR directory to copy
              to the host and run. The part up to the first space is
              taken as the script name and any text after the script
              name is given as arguments to the remote script.


7 The PAGERULES
~~~~~~~~~~~~~~~~

This file defines when specific alarms are allowed to be paged to
specific users.

7.1 PAGERULES format

  phonebook_entry = comma separated list of time definitions

The phonebook_entry is a group or a name from the PHONEBOOK.

The time definition has the format (the brackets are _not_ optional)

  [dow_start-dow_end][hhmm_start-hhmm_end]

  dow  = Day Of Week (1-7 (1 = Monday))
  hhmm = Hour Minute (eg 0100 for 1am and 1730 for 5.30pm)


8 The MAILRULES
~~~~~~~~~~~~~~~

Does the same for mail as the PAGERULES (section 7) does for paging.
A default section can be defined like

  recp = time limit

and that will then be used if no other section in the file matches. See
section 7 for the time limit format.


9 The PHONEBOOK
~~~~~~~~~~~~~~~~

The phonebook keeps track of phone numbers (eg for sms text messages).

9.1 PHONEBOOK format

Entries are entered one on each line. The format for the entries is

  entry:data

An entry can be a special variable, a name or a group.

9.1.1 Special variables

There are currently two special variables (entries).

The + entry defines the international dialing code. This is prepended
to a phone number definition if the phone number starts with a plus
(+).

The dialingprefix entry gets prepended to every number.

9.1.2 Name definitions

A name definition defines a name and a number to associate with that
name. Eg

  henrik:_447900847485

would define the name henrik and associate the correct number with that
name.
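
To make the entry:data format concrete, here is a small Perl sketch
that sorts phonebook lines into special variables, names and groups. It
is an illustration under assumptions, not the parser Autocheck ships:
the function name parse_phonebook is made up, the values '+:44' and
'@oncall' are invented sample data, and the sketch assumes that a group
entry is recognised by a leading @ in the entry part (groups are
described in section 9.1.3).

```perl
use strict;
use warnings;

# Hypothetical sketch (not Autocheck's own parser): classify PHONEBOOK
# lines of the form entry:data.
sub parse_phonebook {
    my (@lines) = @_;
    my (%special, %names, %groups);

    for my $line (@lines) {
        next if $line =~ /^\s*$/;                  # skip blank lines
        my ($entry, $data) = split /:/, $line, 2;  # split at first colon
        die "malformed entry: $line\n" unless defined $data;
        s/^\s+|\s+$//g for $entry, $data;          # trim whitespace

        if ($entry eq '+' or $entry eq 'dialingprefix') {
            $special{$entry} = $data;              # special variables
        }
        elsif ($entry =~ /^\@/) {
            # Assumption: group members are comma separated in the data.
            $groups{$entry} = [ split /\s*,\s*/, $data ];
        }
        else {
            $names{$entry} = $data;                # plain name
        }
    }
    return (\%special, \%names, \%groups);
}

# Sample data; only the henrik line comes from this document.
my ($special, $names, $groups) = parse_phonebook(
    '+:44',
    'henrik:_447900847485',
    '@oncall:henrik',
);
print "$_ => $names->{$_}\n" for sort keys %$names;
```

The sketch only classifies entries. How the + and dialingprefix values
are applied to a number when dialing is left out on purpose.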

9.1.3 Group definitions

A group definition begins with a @. It can contain other groups, names
and numbers. The individual entries are separated by commas. Recursive
groups are not allowed and they are detected by the parser.


10 The config file
~~~~~~~~~~~~~~~~~

Autocheck searches for the config file in (in order) /etc/ac4.config,
/usr/local/etc/ac4.config and the current directory. Once a config file
is found, the remaining locations are not searched.

10.1 Config file format

Values are given as key-value pairs. They are documented in the config
file.


11 The pager service
~~~~~~~~~~~~~~~~~~~~

The pager alarm service requires the pager daemon to be running on a
machine on the network. The pager daemon is located in the same CVS
tree as ac4 and the module name is sms-perl.
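
As a footnote to section 10, the config file search order can be
sketched in Perl as follows. The function name find_config is made up
for this illustration, and the sketch assumes the file in the current
directory is also called ac4.config. It is not Autocheck's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not Autocheck's own code): return the first
# candidate that exists as a file, or nothing if none does.
sub find_config {
    my (@candidates) = @_;
    for my $path (@candidates) {
        return $path if -f $path;   # first hit wins, stop searching
    }
    return;                         # no config file found
}

my $config = find_config(
    '/etc/ac4.config',
    '/usr/local/etc/ac4.config',
    './ac4.config',                 # assumed name in the current directory
);
print defined $config ? "using $config\n" : "no ac4.config found\n";
```

Passing the candidates in as a list keeps the order explicit and makes
the helper easy to try out against a scratch directory.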