This entry discusses how to configure Nagios to monitor the availability of your systems, detect incidents, and notify you or take an action when certain events occur.
As an open source system, Nagios has set a new industry standard for system and network monitoring. It has yet to prove itself in application monitoring; however, in my experience it is just as good as any other product there. It is also free and easy enough to learn in a few days, and in almost two years of use it did not crash once.
First of all, some Nagios basics:
The Nagios configuration folders differ from one Linux distribution to another, so use the find command to locate the files. This entry is based on Nagios installed on a Red Hat distribution.
The logical place to start when configuring Nagios to monitor your environment is the nagios.cfg file, which lists the locations of all the other configuration files. This entry focuses on the files under the # OBJECT CONFIGURATION FILE(S) header. These files define the services monitored on each host, the monitoring commands, and the contacts to notify in case of a failure.
libexec: all the Nagios plugins can be found in the libexec folder. These plugins are the building blocks you can use to create any alarm.
contacts.cfg: this file lists all the contacts that Nagios will notify when an alarm is raised; more on that later.
commands.cfg: this file acts as a wrapper that translates a plugin or script into a command that Nagios can use.
How to integrate a new alarm using the ssh wrapper (check_by_ssh):
Start by verifying that the nagios user on the Nagios machine can log in over ssh to the machine that runs the service you want to monitor. If it cannot, add the nagios user's public ssh key to the authorized_keys file on the target machine.
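As a rough sketch of that key setup, assuming the OpenSSH client tools are installed: the two helper functions, the RSA key type and the default key path are my own assumptions, and target_machine and user_you_connect_to are the same placeholders used in the rest of this entry.

```shell
#!/bin/sh
# Run as the nagios user on the Nagios machine.
# Assumption: an RSA key at the default OpenSSH location.
KEY="$HOME/.ssh/id_rsa"

# Hypothetical helper: $1 = remote user, $2 = target machine
setup_key() {
    # Create a passphrase-less key only if none exists yet
    [ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY"
    # Append the public key to authorized_keys on the target machine
    ssh-copy-id -i "$KEY.pub" "$1@$2"
}

# Hypothetical helper: succeeds only if key-based login works
check_login() {
    # BatchMode makes ssh fail instead of prompting for a password
    ssh -o BatchMode=yes "$1@$2" true
}

# Usage:
#   setup_key user_you_connect_to target_machine
#   check_login user_you_connect_to target_machine && echo "login ok"
```

Checking with BatchMode matters because Nagios runs unattended; a login that still prompts for a password will look fine by hand but fail from the scheduler.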
Then create the monitoring script on the target machine. The script should perform the check you need and exit with 0, 1 or 2, which Nagios translates to OK, WARNING and CRITICAL. Before exiting, echo the message you want to appear in Nagios and in the email sent for that alarm.
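A minimal sketch of such a script could look like the following. The disk usage check, the thresholds and the function names are all assumptions made for illustration; only the exit codes and the echoed message follow the convention described above.

```shell
#!/bin/sh
# Hypothetical check: percentage of disk space used on /.
# Thresholds are assumptions; tune them for your environment.
WARN=80
CRIT=90

# Print the alarm message and return the matching Nagios exit code
check_usage() {
    used=$1
    if [ "$used" -ge "$CRIT" ]; then
        echo "CRITICAL - disk usage is ${used}%"
        return 2
    elif [ "$used" -ge "$WARN" ]; then
        echo "WARNING - disk usage is ${used}%"
        return 1
    fi
    echo "OK - disk usage is ${used}%"
    return 0
}

# Read the capacity column for / from POSIX-format df output
current_usage() {
    df -P / | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# In the real script the last two lines would be:
#   check_usage "$(current_usage)"
#   exit $?
```

Keeping the threshold logic in its own function makes it easy to try the three states by hand, for example check_usage 95, before wiring the script into Nagios.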
Once the script is written you need to integrate it into Nagios, but test it first. As the nagios user, run the following command from the libexec folder:
./check_by_ssh -H target_machine -C '/path/to/script/script.sh' -u user_you_connect_to -v
This command returns what Nagios would see when it runs the script. Once the script is verified, integrate it into the object cfg file; an entry should look like the following:
service_description Alarm description
The script_to_run_on_failure must be defined in the commands.cfg file, because the object cfg files do not understand a script path; Nagios looks these names up in commands.cfg before proceeding. So add an entry similar to the following to commands.cfg; this one appends the alarm text to a log file.
command_line echo -e "$NOTIFICATIONTYPE$\nService:$SERVICEDESC$:$HOSTALIAS$\nState:$SERVICESTATE$\n\n$SERVICEOUTPUT$ **$NOTIFICATIONTYPE$" >> /tmp/test.t
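To preview what that command appends before Nagios ever runs it, the same layout can be reproduced by hand. This is only a sketch: log_alarm is a hypothetical helper, its arguments stand in for the macros Nagios would substitute, and the sample values are made up.

```shell
#!/bin/sh
# Hypothetical helper that mimics the command_line above.
# $1=NOTIFICATIONTYPE  $2=SERVICEDESC  $3=HOSTALIAS
# $4=SERVICESTATE      $5=SERVICEOUTPUT  $6=log file
log_alarm() {
    # printf is used instead of echo -e so the escapes behave the same in any sh
    printf '%s\nService:%s:%s\nState:%s\n\n%s **%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$1" >> "$6"
}

# Usage with made-up values:
#   log_alarm PROBLEM "Alarm description" target_machine CRITICAL \
#       "CRITICAL - disk usage is 95%" /tmp/test.t
```

The message you echo in the monitoring script is what arrives in $SERVICEOUTPUT$, so this is also a quick way to see how that text will read in the log.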
Finally, once you are satisfied with the alarm, run /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg to verify your config files. If everything is OK, reload the service using sudo /etc/init.d/nagios reload
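The two steps can be chained so that a broken configuration never gets reloaded. The sketch below assumes the verification run exits with a non-zero status when it finds errors; verify_and_reload and the three variables are names I made up, and the paths are the ones used in this entry.

```shell
#!/bin/sh
# Defaults match the paths used in this entry; override them for other layouts.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RELOAD_CMD=${RELOAD_CMD:-"sudo /etc/init.d/nagios reload"}

# Hypothetical helper: reload only when the config check passes
verify_and_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        # Left unquoted on purpose so the command and its arguments are split
        $RELOAD_CMD
    else
        echo "Configuration check failed, not reloading" >&2
        return 1
    fi
}

# Usage:
#   verify_and_reload
```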