Documentation for Autocheck v4.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 Introduction
2 Remote perl scripts
2.1 Output by remote perl scripts
2.2 How remote scripts are created
2.2.1 How parameters in the RUNFILE affect the remote script
2.2.2 The actual remote perl script
2.3 Script types in Autocheck v4
3 The RUNFILE
3.1 RUNFILE definition format
3.1.1 RUNFILE special variables
3.1.2 RUNFILE variables
4 The CHECKFILE
4.1 CHECKFILE definition format
4.2 Check output
4.2.1 Check output format and variables for checks
5 The RULEFILE
5.1 RULEFILE definition format
5.2 Check-dependent rules
6 The ACTIONFILE
6.1 ACTIONFILE definition format
6.2 ACTIONFILE variables
7 The PAGERULES
7.1 PAGERULES format
8 The MAILRULES
9 The PHONEBOOK
9.1 PHONEBOOK format
9.1.1 Special variables
9.1.2 Name definitions
9.1.3 Group definitions
10 The config file
10.1 Config file format
11 The pager service
1 Introduction
~~~~~~~~~~~~~~~
Autocheck has been completely rewritten in Perl (again) and is now
up to its fourth incarnation.
Autocheck v4 is copyright Henrik Bilar (hbilar@users.sourceforge.net)
and is licensed under the terms of the GNU General Public License
(GPL). The full terms of this license can be found in the file
LICENSE distributed with this software package. The terms and
conditions are also available from the GNU website
(http://www.gnu.org).
The idea this time is that autocheck uses the rlogin tools to create
a perl script on the remote machine and run it. The perl script
produces output that the main autocheck modules parse. Autocheck
now also supports local checks, which need no rlogin, and ssh for
secure checking.
There are sample config files in sample-conf/ and they should ideally
be placed in /etc/autocheck or some similar directory. The directory is
configurable in ac4.config which should be placed in /etc,
/usr/local/etc or the current directory.
2 Remote perl scripts
~~~~~~~~~~~~~~~~~~~~~~
The remote perl scripts are always created on the fly, just before
they are run. This prevents version problems: stale copies of a script
lingering on different systems could otherwise produce different
(incompatible) output.
2.1 Output by remote perl scripts
The perl scripts produce output on STDOUT. The first part of the
output is the data used by the autocheck perl module on the checking
server; this data should be consistent across platforms. Next comes
a line containing the text string BEGINDETAIL. Any text after
BEGINDETAIL is ignored by the autocheck module itself, but it is
included in the message sent to the user in case of an alarm (or the
reset of a previous alarm).
The output of the perl scripts on the remote machines is always in
the following format:
LINE        DATA
1-n         Arbitrary data parsed by the modules.
n+1         The text string BEGINDETAIL.
(n+2)-m     Detailed output used in the reports sent to users,
            e.g. the output of the commands processed.
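The split described above can be sketched in perl as follows. This is a minimal illustration, not the actual autocheck module code; the function name split_check_output and the sample swap output are hypothetical, and the marker is assumed to be a line consisting of exactly BEGINDETAIL.

```perl
use strict;
use warnings;

# Split the raw STDOUT of a check script into the data part (lines 1..n,
# parsed by the autocheck modules) and the detail part (lines n+2..m,
# only copied into the alarm/reset messages). Hypothetical helper.
sub split_check_output {
    my ($raw) = @_;
    my (@data, @detail);
    my $seen_marker = 0;
    for my $line (split /\n/, $raw) {
        if (!$seen_marker && $line eq 'BEGINDETAIL') {
            $seen_marker = 1;    # the marker line belongs to neither part
            next;
        }
        if ($seen_marker) { push @detail, $line } else { push @data, $line }
    }
    return (\@data, \@detail);
}

# Example: output of a hypothetical swap check.
my $output = "87\nBEGINDETAIL\nTotal swap: 2048 MB\nUsed swap: 266 MB\n";
my ($data, $detail) = split_check_output($output);
print "data:   @$data\n";
print "detail: ", join(' | ', @$detail), "\n";
```

Everything before the marker is what the rules are evaluated against; everything after it is passed through untouched.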
2.2 How remote scripts are created
Remote scripts are generated on the autocheck server and written
to the remote host with the rlogin/rsh/ssh tools. If the check is
performed locally the scripts are created by accessing the file
system directly.
2.2.1 How parameters in the RUNFILE affect the remote script
All the parameters in the RUNFILE are converted to perl variables
and written to the top of the script. The values are enclosed in
double quotes; e.g. the line
host = myhost.mydomain.com
would be written to the perl script like this:
$host = "myhost.mydomain.com";
Some characters are escaped, i.e. an extra backslash is inserted
before them. The characters are \"@'$. This prevents perl from
interpolating variables or expressions inside the double-quoted
strings.
All of the above string processing can be avoided by prefixing the
value in the RUNFILE with \ESC\. If the value starts with \ESC\, it is
NOT enclosed in double quotes, nor are any characters backslashed.
This is useful for creating e.g. references to lists or hashes, or
interpolated strings. The line
words = \ESC\ ["SUBMIT", "_CF_onError"]
will be written to the perl script like this:
$words = ["SUBMIT", "_CF_onError"];
which is a reference to an anonymous array.
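A minimal perl sketch of the conversion described in this section. It assumes RUNFILE lines of the form name = value with the surrounding whitespace trimmed, and reads the escaped characters \"@'$ as backslash, double quote, at sign, single quote and dollar sign. The function name runfile_line_to_perl and the note variable are hypothetical; the real generator may differ in details.

```perl
use strict;
use warnings;

# Turn one RUNFILE line ("name = value") into the perl assignment written
# at the top of the remote script. Sketch only: the whitespace handling of
# the real implementation is assumed, not documented.
sub runfile_line_to_perl {
    my ($line) = @_;
    my ($name, $value) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/
        or return;    # not a "name = value" line

    # \ESC\ prefix: copy the rest of the value verbatim, without quoting.
    if ($value =~ s/^\\ESC\\\s*//) {
        return "\$$name = $value;";
    }

    # Otherwise backslash \ " @ ' $ and wrap the value in double quotes.
    $value =~ s/([\\"\@'\$])/\\$1/g;
    return "\$$name = \"$value\";";
}

print runfile_line_to_perl('host = myhost.mydomain.com'), "\n";
print runfile_line_to_perl('words = \ESC\ ["SUBMIT", "_CF_onError"]'), "\n";
print runfile_line_to_perl('note = costs $5 @ peak'), "\n";    # hypothetical variable
```

The last call shows why the escaping matters: without the backslashes, perl would try to interpolate $5 and @ peak inside the double-quoted string.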
2.2.2 The actual remote perl script
The remote perl script is generated on the fly. The script code is
fetched from the CHECKFILE, and the code generated differs depending
on which type of host is being checked.
A script definition starts with a start tag line and is terminated
with an end tag line. The start tag lists the hosttypes the script is
to be used for, or "all", which makes the script the default script
to run.
The CHECKFILE is parsed from top to bottom, and every time a matching
definition is found, the current definition (the one that will be
written to the remote script) is set to that definition. This means
that any default scripts must come before the specialised scripts,
since a later match replaces an earlier one.
Consider the file:
.....
.....
....
This definition is flawed: the generic ("all") definition comes after
the specialised definitions, so no specialised definition will ever
be used.
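The top-to-bottom, last-match-wins selection can be sketched as below. The representation of a parsed CHECKFILE as a list of hashes, the function name select_script and the hosttype names linux, solaris and hpux are hypothetical; only the selection rule itself comes from the text above.

```perl
use strict;
use warnings;

# Pick the script body for a check and hosttype. Each definition is a hash
# with the check name, the hosttypes from its start tag (or "all") and the
# perl code between the tags. This data layout is hypothetical.
sub select_script {
    my ($defs, $check, $hosttype) = @_;
    my $current;
    for my $def (@$defs) {    # top to bottom, in file order
        next unless $def->{check} eq $check;
        my $matches = grep { $_ eq 'all' || $_ eq $hosttype } @{ $def->{hosttypes} };
        $current = $def->{code} if $matches;    # a later match replaces an earlier one
    }
    return $current;
}

my @checkfile = (
    { check => 'diskfree', hosttypes => ['all'],              code => '# generic df code' },
    { check => 'diskfree', hosttypes => ['linux', 'solaris'], code => '# df -k code' },
);
print select_script(\@checkfile, 'diskfree', 'linux'), "\n";   # specialised definition
print select_script(\@checkfile, 'diskfree', 'hpux'), "\n";    # falls back to "all"
```

Reversing @checkfile reproduces the flawed ordering: the "all" definition would then win for every hosttype.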
2.3 Script types in Autocheck v4
diskfree          Lists the percentage free for all disk partitions.
                  The output is in the format
                  mountpoint = device;percentage free
ping              Prints the packet loss for all hosts specified in
                  the RUNFILE pinghosts variable. The output is in
                  the form:
                  host:lossrate:roundtrip_min:roundtrip_avg:roundtrip_max
swap              Returns the percentage of free swap space.
cpu               Returns the percentage of free CPU time.
dns               Returns 1 if the lookup of the host given in
                  "lookuphost" on the server given in "dns" (both
                  from the RUNFILE) succeeded, or 0 if it failed.
informix-dbfree   Returns a list of the percentage free in all the
                  Informix dbspaces. The only RUNFILE parameter is
                  "server" (the Informix instance).
                  The format of the output is:
                  dbspace:freeperc
informix-status   Returns 1 if the output of the status command for
                  the instance specified in "server" contains the
                  string given in the RUNFILE variable "expectString",
                  otherwise returns 0.
process-check     Returns 1 if NOT all the processes specified in
                  the list "processes" are running. Note that the
                  list in the RUNFILE needs to be specified like
                  this:
                  processes = \ESC\ ["named", "oninit" ]
3 The RUNFILE
~~~~~~~~~~~~~~
The RUNFILE is the file that defines what checks to run, where to run
them and also what parameters to set.
3.1 RUNFILE definition format
The file format looks somewhat like an XML file: each check is opened
with a start tag and terminated with the matching end tag. The
parameters for the check are specified between these tags. Three
parameters are mandatory for all checks: host, hosttype and protocol.
3.1.1 RUNFILE special variables
The three variables host, hosttype and protocol are mandatory.
The "host" variable defines on which host to run the current check.
The "hosttype" variable defines what type of host the check is being
run on.
The "protocol" variable defines what protocol to use for communication
with the host. This variable can currently take three values: "local",
"rlogin" and "ssh". If the variable is "local", all actions regarding
script generation etc. are done locally on the server. If the variable
is "rlogin", the rlogin/rsh protocol is used for creating and running
the script remotely. "ssh" works like "rlogin" but uses the Secure
Shell utility instead.
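A sketch of how a parsed check could be validated against the rules above. It assumes the check's variables have already been read into a hash; validate_check is a hypothetical helper, not part of autocheck.

```perl
use strict;
use warnings;

# Check that a parsed RUNFILE check has the three mandatory variables and
# a protocol autocheck knows about. Returns a list of problems (empty if OK).
# The hash-of-parameters representation is an assumption for illustration.
sub validate_check {
    my ($params) = @_;
    my @problems;
    for my $key (qw(host hosttype protocol)) {
        push @problems, "missing mandatory variable '$key'"
            unless defined $params->{$key} && length $params->{$key};
    }
    my %known = map { $_ => 1 } qw(local rlogin ssh);
    my $proto = $params->{protocol};
    if (defined $proto && length $proto && !$known{$proto}) {
        push @problems, "unknown protocol '$proto'";
    }
    return @problems;
}

my @errors = validate_check({ host => 'myhost.mydomain.com', protocol => 'telnet' });
print "$_\n" for @errors;
```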
3.1.2 RUNFILE variables
Any number of variables can be defined for a check. They are always in
the format
name = value
The RUNFILE variables are used to control the behaviour of the check
programs. The check definition uses them to know e.g. which processes
the process-check check should look for.
The value can be escaped by prefixing it with the string "\ESC\" (without
the quotes). If this prefix is present, no string manipulation is
performed on the value and it is copied to the remote script *as it
stands*.
Example
processes = \ESC\ ["named", "oninit" ]
will generate the following perl code:
$processes = ["named", "oninit" ];
whereas
processes = named, oninit
would generate
$processes = "named, oninit";
4 The CHECKFILE
~~~~~~~~~~~~~~~~
The CHECKFILE defines all the checks for the different operating systems.
The sole purpose of the check scripts is to send information back to the
autocheck engine running on the server. The engine then processes this
data locally. I.e., the checks should not perform any actions other than
collecting data!
4.1 CHECKFILE definition format
The file format, again, resembles that of an XML document. Each check
definition starts with a start tag, which names the check and the
hosttypes it applies to, and ends with the matching end tag.
For example
....
defines what the script on the remote system should look like when the
RUNFILE specifies a diskfree check on a system whose hosttype is one of
those listed after the check name.
The hosttypes can be replaced with "all", in which case the check is
the definition for every system type.
It is important to realize that the checks are just normal perl code
with the RUNFILE variables added at the beginning as perl assignments.
4.2 Check output
The various parts of the check engine expect to get data in a certain format.
The expected output always has the generic form of n rows of actual check
data, then a row containing the single text string "BEGINDETAIL", and then
any output that will be sent to the mail recipients etc. in case of an
alarm.
4.2.1 Check output format and variables for checks
This section describes what the data portion of the check output should look like.
diskfree:
  Diskfree checks should return the amount of free disk space (in
  percent) for all the mounted volumes.
  Output:
    for each mountpoint:
      mountpoint = device;free space
  Runfile Args:
    None
ping:
  Ping returns ping status information for all the hosts passed to it in
  the RUNFILE variable pinghosts (hosts are separated by a space).
  Output:
    for each host:
      host:%lost:roundtrip_min:roundtrip_avg:roundtrip_max
  Runfile Args:
    pinghosts      space separated list of hosts
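For illustration, a ping data line could be split into its fields like this. The sketch assumes the fields contain no colons themselves (so it would not handle IPv6 addresses) and keeps every value as a string; parse_ping_line and the host name are hypothetical.

```perl
use strict;
use warnings;

# Parse one data line of the ping check:
#   host:%lost:roundtrip_min:roundtrip_avg:roundtrip_max
# Returns a hash reference, or undef for a malformed line.
sub parse_ping_line {
    my ($line) = @_;
    my @f = split /:/, $line;
    return unless @f == 5;    # not a well-formed ping line
    my %r;
    @r{qw(host lost rtt_min rtt_avg rtt_max)} = @f;
    return \%r;
}

my $r = parse_ping_line('gateway.mydomain.com:0:0.4:0.6:1.2');
printf "%s lost %s%%, avg round trip %s\n", @{$r}{qw(host lost rtt_avg)};
```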
swap:
  Swap returns the amount of free swap space (in percent).
  Output:
    line 1:
      percent free swap space
  Runfile Args:
    None
cpu:
  Cpu returns the amount of free CPU time (in percent).
  Output:
    line 1:
      percent free CPU time
  Runfile Args:
    None
html:
  The html check is designed to check for web server availability.
  Output:
    line 1:
      1 if server is up, 0 if server is down
  Runfile Args:
    url            URL of the web page (can include a protocol
                   specification)
    words          escaped perl list reference of words to look for
                   in the data the server returned.
                   Example:
                   \ESC\ ["SUBMIT", "_CF_onError"]
dns:
  The DNS check is designed to check for DNS availability.
  Output:
    line 1:
      1 if server is up, 0 if server is down
  Runfile Args:
    lookuphost     address to attempt to look up on the DNS server
    dns            DNS server to test
informix-dbfree:
  This check is designed to check the database for free space.
  Output:
    for each dbspace:
      dbspace:percent free
  Runfile Args:
    server         Informix instance
informix-logsfree:
  This check is designed to keep track of the logical logs.
  Output:
    line 1:
      percent free logical logs
  Runfile Args:
    server         Informix instance
informix-status:
  This check checks the status of the specified Informix instance.
  Output:
    line 1:
      1 if server online, 0 if offline
  Runfile Args:
    server         Informix instance
    expectString   string to look for in the output of the onstat
                   command when determining whether the server is
                   online
process-check:
  This check checks for processes running on a system.
  Output:
    line 1:
      1 if all processes running, 0 otherwise.
  Runfile Args:
    processes      escaped perl list reference of processes to
                   look for.
                   Example:
                   \ESC\ ["named", "oninit" ]
5 The RULEFILE
~~~~~~~~~~~~~~~
The RULEFILE is the file that defines what limits would raise an alarm of
a specified level on a specific host.
5.1 RULEFILE definition format
The file format resembles an XML document. Each rule has a
start tag and an end tag. The hostname part of the rule defines
for which host this rule is authoritative. The hostname can be
replaced with "all", in which case this rule will be authoritative
for all systems.
Each rule should have a default line called "default".
The actual rules are given as colon separated values and are specified
from the highest level to the lowest level. I.e., a line of 5:10:15 for a
diskfree rule would set the alarm level to 1 if the diskfree check
returned 4% free space and to level 3 if the amount of free space was 12%.
Note that rules do not make sense for a few types of checks and therefore
they have no entries in the rulefile.
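The level calculation in the example above can be sketched as follows for rules measured in percent free. It assumes a strict less-than comparison, which the 4% and 12% examples do not pin down; rules measured as time down presumably need the comparison reversed and are not covered. alarm_level is a hypothetical name.

```perl
use strict;
use warnings;

# Map a "percent free" value onto an alarm level using a rule line such as
# 5:10:15 (thresholds listed from the highest level to the lowest).
# The level is the position of the first threshold the value falls below;
# 0 means no alarm. Strict "<" is an assumption.
sub alarm_level {
    my ($rule, $value) = @_;
    my @limits = split /:/, $rule;
    for my $i (0 .. $#limits) {
        return $i + 1 if $value < $limits[$i];
    }
    return 0;
}

print alarm_level('5:10:15', 4),  "\n";    # 1
print alarm_level('5:10:15', 12), "\n";    # 3
print alarm_level('5:10:15', 40), "\n";    # 0
```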
5.2 Check-dependent rules
diskfree:
  default|mountpoint        colon separated list of percent free disk
                            space that would generate an alarm
ping:
  default|hostname          colon separated list of time down (minutes)
swap:
  default                   colon separated list of percent free swap space
cpu:
  default                   colon separated list of percent free CPU time
html:
  default|url               colon separated list of time down (seconds)
dns:
  default|server            colon separated list of time down (seconds)
informix-dbfree:
  default|instance_dbspace  colon separated list of percent free dbspace
informix-logsfree:
  default|instance          colon separated list of percent free logs
informix-status:
  default|dbinstance        colon separated list of time down (seconds)
6 The ACTIONFILE
~~~~~~~~~~~~~~~~~
The action file defines how to notify the operators about the alarm conditions.
6.1 ACTIONFILE definition format
The file format resembles an XML document. Each action has a
start tag and an end tag. Each action definition also holds
definitions of what actions to take in case of a certain alarm level.
The alarm level definitions are enclosed in their own start and end
tags; the start tag names the alarm level or "all".
All variables are given as key-value pairs.
6.2 ACTIONFILE variables
action        defines what actions to take.
              Value: mail, page or script, or a combination of these.
mailrecp      defines the mail recipients for the alarm. Comma separated list.
pagerecp      names or groups from the PHONEBOOK to receive SMS text messages.
scriptname    name of a script in the SCRIPTDIR directory to copy to the host
              and run. The part up to the first space is taken as the script
              name, and any text after the script name is passed as
              arguments to the remote script.
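The scriptname splitting rule can be expressed in perl as below. split_scriptname and the restart-service script are hypothetical examples.

```perl
use strict;
use warnings;

# Split an ACTIONFILE scriptname value into the script name (text up to
# the first space) and the argument string passed to the remote script.
sub split_scriptname {
    my ($value) = @_;
    my ($name, $args) = split / /, $value, 2;    # at most two fields
    return ($name, defined $args ? $args : '');
}

my ($script, $args) = split_scriptname('restart-service named -v');
print "script=$script args=$args\n";    # script=restart-service args=named -v
```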
7 The PAGERULES
~~~~~~~~~~~~~~~~
This file defines when specific alarms are allowed to be paged to specific
users.
7.1 PAGERULES format
phonebook_entry = comma separated list of time definitions
The phonebook_entry is a group or a name from the PHONEBOOK.
The time definition has the format (the brackets are _not_ optional)
[dow_start-dow_end][hhmm_start-hhmm_end]
dow = Day Of Week (1-7 (1 = Monday))
hhmm = Hour Minute (eg 0100 for 1am and 1730 for 5.30pm)
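A minimal sketch of matching a moment against a PAGERULES line, assuming inclusive ranges, time ranges that do not wrap past midnight, and that a match on any one of the comma separated definitions is enough. in_time_definition and paging_allowed are hypothetical helper names.

```perl
use strict;
use warnings;

# Does a moment (day of week 1-7, 1 = Monday; time as hhmm) fall inside a
# time definition such as "[1-5][0800-1730]"? Assumes inclusive ranges
# that do not wrap past midnight.
sub in_time_definition {
    my ($def, $dow, $hhmm) = @_;
    my ($d1, $d2, $t1, $t2) =
        $def =~ /^\s*\[(\d)-(\d)\]\[(\d{4})-(\d{4})\]\s*$/
        or return 0;    # a malformed definition never matches
    return $dow >= $d1 && $dow <= $d2 && $hhmm >= $t1 && $hhmm <= $t2 ? 1 : 0;
}

# A phonebook entry lists several comma separated definitions; paging is
# allowed if any one of them matches (assumed).
sub paging_allowed {
    my ($definitions, $dow, $hhmm) = @_;
    for my $def (split /,/, $definitions) {
        return 1 if in_time_definition($def, $dow, $hhmm);
    }
    return 0;
}

my $rule = '[1-5][0800-1730],[6-7][1000-1400]';
print paging_allowed($rule, 3, 915), "\n";    # Wednesday 09:15 -> 1
print paging_allowed($rule, 6, 915), "\n";    # Saturday 09:15  -> 0
```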
8 The MAILRULES
~~~~~~~~~~~~~~~
Does the same for mail as the PAGERULES file (section 7) does for paging.
A default section can be defined like
recp = time limit
and will then be used if no other section in the file matches.
See section 7 for the time limit format.
9 The PHONEBOOK
~~~~~~~~~~~~~~~~
The phonebook keeps track of phone numbers (eg for sms text messages).
9.1 PHONEBOOK format
Entries are entered one per line. The format for the entries is
entry:data
An entry can be a special variable, a name or a group.
9.1.1 Special variables
There are currently two special variables (entries).
The + entry defines the international dialing code. This is prepended to
a phone number definition if the phone number starts with a plus (+).
The dialingprefix entry is prepended to every number.
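How the two special entries might combine can be sketched as follows. The text above does not say whether the leading + is kept or in what order the two prefixes are applied; this sketch assumes the + is replaced by the international dialing code and that dialingprefix goes in front of everything. expand_number and the sample numbers are hypothetical.

```perl
use strict;
use warnings;

# Expand a phonebook number using the "+" and dialingprefix entries.
sub expand_number {
    my ($number, $special) = @_;
    # Assumed: a leading "+" is replaced by the international dialing code.
    if (substr($number, 0, 1) eq '+' && defined $special->{'+'}) {
        $number = $special->{'+'} . substr($number, 1);
    }
    # Assumed: dialingprefix is put in front of every number, last of all.
    my $prefix = defined $special->{dialingprefix} ? $special->{dialingprefix} : '';
    return $prefix . $number;
}

my %special = ('+' => '00', dialingprefix => '9');
print expand_number('+441234567890', \%special), "\n";    # 900441234567890
print expand_number('01234567890', \%special), "\n";      # 901234567890
```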
9.1.2 Name definitions
A name definition defines a name and a number to associate with that name.
E.g. henrik:_447900847485 would define the name henrik and associate that
number with it.
9.1.3 Group definitions
A group definition begins with an @. It can contain other groups, names
and numbers. The individual entries are separated by commas.
Recursive groups are not allowed and they are detected by the parser.
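Group expansion with detection of recursive groups can be sketched as a depth-first walk that remembers which groups it is currently inside. The hash layout, expand_entry, the extra names and the numbers are hypothetical; in this sketch a group reached twice along different paths is allowed, and only a group that contains itself is rejected.

```perl
use strict;
use warnings;

# Expand a phonebook entry into phone numbers. %$book maps entry names to
# their data, e.g. '@oncall' => 'henrik,@dba'. Anything that is neither a
# group nor a known name is treated as a number. Dies on recursive groups.
sub expand_entry {
    my ($book, $entry, $seen) = @_;
    $seen ||= {};
    if ($entry =~ /^@/) {
        die "recursive group $entry\n" if $seen->{$entry};
        die "unknown group $entry\n" unless exists $book->{$entry};
        local $seen->{$entry} = 1;    # marked only while inside this group
        return map { expand_entry($book, $_, $seen) } split /\s*,\s*/, $book->{$entry};
    }
    return exists $book->{$entry} ? $book->{$entry} : $entry;
}

my %book = (
    henrik    => '01234567890',
    anna      => '09876543210',
    '@dba'    => 'anna',
    '@oncall' => 'henrik,@dba,05555555555',
);
print join(',', expand_entry(\%book, '@oncall')), "\n";
```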
10 The config file
~~~~~~~~~~~~~~~~~
Autocheck searches for the config file in the following order:
/etc/ac4.config, /usr/local/etc/ac4.config and ac4.config in the current
directory. As soon as a config file is found, the remaining locations are
not searched.
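The lookup order can be sketched as a search that stops at the first hit. The existence test is passed in as a code reference so the order can be tried without real files; find_config is a hypothetical name and ./ac4.config stands for the file in the current directory.

```perl
use strict;
use warnings;

# Return the first config file that exists, in the documented order.
# $exists is a code reference; in practice it would be sub { -e $_[0] }.
sub find_config {
    my ($exists) = @_;
    for my $path ('/etc/ac4.config', '/usr/local/etc/ac4.config', './ac4.config') {
        return $path if $exists->($path);    # stop at the first hit
    }
    return;
}

my $config = find_config(sub { -e $_[0] });
print defined $config ? "using $config\n" : "no ac4.config found\n";
```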
10.1 Config file format
Values are given as key-value pairs. They are documented in the config file.
11 The pager service
~~~~~~~~~~~~~~~~~~~~
The pager alarm service requires the pager daemon to be running on a machine
on the network. The pager daemon is located in the same CVS tree as ac4 and
the module name is sms-perl.