Monitoring Production Environments – Availability Monitoring

This entry discusses how to configure Nagios to monitor the availability of your system, detect incidents, and either notify you or take specific actions when certain events occur.


Why Nagios:

Nagios, an open source system, has set a new industry standard for system and network monitoring. It has yet to prove itself in application monitoring, but in my experience it is as good as any other product there as well. Not to mention that it's free and easy enough to learn in a few days. Additionally, in almost two years of use, Nagios didn't crash once.

First, some Nagios basics:

Nagios configuration directories differ from one Linux distribution to another, so use the find command to locate the files. This entry is based on Nagios installed on a Red Hat distribution.

The logical place to start when configuring Nagios to monitor your environment is the nagios.cfg file, which lists the locations of all the other configuration files. This entry focuses on the files under the # OBJECT CONFIGURATION FILE(S) header: they define the services monitored on each host, the commands used to check them, and the contacts to notify in case of a failure.
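To see which object files a given installation actually loads, the cfg_file= and cfg_dir= lines of nagios.cfg can be listed directly. The helper below is a hypothetical sketch (the function name is illustrative); if no path is passed, it falls back to find, as suggested above.

```shell
#!/bin/sh
# list_object_configs: hypothetical helper that prints the object configuration
# files and directories loaded by nagios.cfg (its cfg_file= and cfg_dir= lines).
# If no path is given, it searches the filesystem for the first nagios.cfg found.

list_object_configs() {
    local cfg="${1:-$(find / -name nagios.cfg -type f 2>/dev/null | head -n 1)}"

    if [ -z "$cfg" ] || [ ! -r "$cfg" ]; then
        echo "nagios.cfg not found or not readable" >&2
        return 1
    fi

    # The anchored pattern skips commented-out entries such as "#cfg_file=...".
    grep -E '^[[:space:]]*cfg_(file|dir)=' "$cfg" | cut -d= -f2-
}

# Example: list_object_configs /usr/local/nagios/etc/nagios.cfg
if [ $# -ge 1 ]; then
    list_object_configs "$1"
fi
```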

libexec: all the Nagios plugins can be found in the libexec folder. These plugins are the building blocks you can use to create any alarm.

contacts.cfg: contains all the contacts Nagios will notify when a given alarm is raised; service definitions refer to them, as shown later.

commands.cfg: a wrapper layer that translates a plugin or script into a named command Nagios can use.

How to integrate a new alarm using the ssh wrapper (check_by_ssh):

Start by verifying that the nagios user on the Nagios machine can log in over SSH, without a password prompt, to the machine running the service you want to monitor. If it can't, copy the nagios user's public SSH key into the authorized_keys file on the target machine.
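Where OpenSSH's ssh-copy-id is available, running `ssh-copy-id user_you_connect_to@target_machine` as the nagios user on the Nagios server does the key copy in one step, and `ssh -o BatchMode=yes user_you_connect_to@target_machine true` fails instead of prompting if key authentication is not working. For a manual copy, the sketch below assumes the public key file has already been transferred to the target machine; the function name and the temporary file path in the example are hypothetical.

```shell
#!/bin/sh
# add_authorized_key: hypothetical helper, run on the target machine as the
# account Nagios logs in as. It appends the nagios user's public key (copied
# over from the Nagios server) to authorized_keys exactly once and applies
# the permissions sshd expects.

add_authorized_key() {
    local pubkey_file="$1"
    local auth_keys="${2:-$HOME/.ssh/authorized_keys}"
    local ssh_dir key

    key=$(cat "$pubkey_file") || return 1
    [ -n "$key" ] || return 1

    ssh_dir=$(dirname "$auth_keys")
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_keys" && chmod 600 "$auth_keys" || return 1

    # -x matches whole lines and -F disables regex, so the key is compared literally.
    if ! grep -qxF -- "$key" "$auth_keys"; then
        # Make sure an existing last line is newline-terminated before appending.
        if [ -s "$auth_keys" ] && [ -n "$(tail -c 1 "$auth_keys")" ]; then
            echo >> "$auth_keys"
        fi
        printf '%s\n' "$key" >> "$auth_keys"
    fi
}

# Example: add_authorized_key /tmp/nagios_id_rsa.pub
if [ $# -ge 1 ]; then
    add_authorized_key "$@"
fi
```

Running it twice is harmless: the second call finds the key already present and leaves the file unchanged.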

Next, create the monitoring script on the target machine. The script should perform the check you need and exit with 0, 1 or 2, which Nagios translates to OK, WARNING and CRITICAL (Nagios also treats 3 as UNKNOWN). Before exiting, echo the message you want to appear in Nagios and in the email sent for that alarm.
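A minimal sketch of such a script, assuming a disk-usage check is what you need; the script name, the default path and the 80/90 percent thresholds are illustrative, not part of the original setup. It prints one status line and returns the matching exit code.

```shell
#!/bin/sh
# check_disk_usage.sh - hypothetical example of a monitoring script for check_by_ssh.
# Usage: check_disk_usage.sh [path] [warn_percent] [crit_percent]
# Prints one status line and returns a Nagios-style exit code:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN

check_disk_usage() {
    local mount_path="$1" warn="$2" crit="$3" usage

    # df -P prints one line per filesystem; column 5 is the capacity, e.g. "42%".
    usage=$(df -P "$mount_path" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

    # Anything that is not a plain number means the check itself failed.
    case "$usage" in
        ''|*[!0-9]*)
            echo "UNKNOWN - could not read disk usage for $mount_path"
            return 3 ;;
    esac

    if [ "$usage" -ge "$crit" ]; then
        echo "CRITICAL - $mount_path is ${usage}% full (critical at ${crit}%)"
        return 2
    elif [ "$usage" -ge "$warn" ]; then
        echo "WARNING - $mount_path is ${usage}% full (warning at ${warn}%)"
        return 1
    fi

    echo "OK - $mount_path is ${usage}% full"
    return 0
}

# No explicit exit: the script's exit status is the status of this last
# command, which check_by_ssh passes back to Nagios.
check_disk_usage "${1:-/}" "${2:-80}" "${3:-90}"
```

Invoked as `check_disk_usage.sh /var 80 90`, it prints a single line starting with OK, WARNING, CRITICAL or UNKNOWN, which is the text Nagios shows for the alarm.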

Once the script is written, test it before integrating it into Nagios. As the nagios user, run the following from the libexec directory:

./check_by_ssh -H target_machine -C '/path/to/script' -l user_you_connect_to -v
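check_by_ssh shows the script's message but not the exit code Nagios will turn into a state. The hypothetical helper below runs any plugin command and prints the state name next to its output; codes 0 to 2 follow the convention described above, and the helper simply labels every other code UNKNOWN.

```shell
#!/bin/sh
# run_check: hypothetical helper that runs a plugin command and prints the
# state name matching its exit code, followed by the plugin's output.

run_check() {
    local output rc state
    output=$("$@" 2>&1)
    rc=$?
    case "$rc" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;
    esac
    printf '%s (exit %d): %s\n' "$state" "$rc" "$output"
    return "$rc"
}

# Example (as the nagios user, from the libexec directory):
#   run_check ./check_by_ssh -H target_machine -C '/path/to/script' -l user_you_connect_to
if [ $# -ge 1 ]; then
    run_check "$@"
fi
```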

Apart from the extra -v details, check_by_ssh prints exactly what Nagios will see when it runs the script. Once the script is verified, add it to the relevant object configuration file; an entry should look like the following:

define service{
        contact_groups          contacts_to_be_emailed
        use                     local-service
        host_name               target_machine
        service_description     Alarm description
        display_name            Alarm_name_to_appear_on_nagios
        check_interval          minutes_between_checks
        event_handler           script_to_run_on_failure
        check_command           check_using_ssh!/path/to/script!user_to_connect_to
        }
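The check_command above refers to check_using_ssh, a custom command name that also has to be defined in commands.cfg. A sketch, assuming the /usr/local/nagios layout used elsewhere in this entry: the hypothetical helper below appends a definition that passes $ARG1$ (the script path) and $ARG2$ (the SSH user) to check_by_ssh, and does nothing if a check_using_ssh definition already exists.

```shell
#!/bin/sh
# add_check_using_ssh: hypothetical helper that appends a check_using_ssh
# command definition to commands.cfg unless one is already defined.
# $ARG1$ is the remote script path and $ARG2$ the SSH user, in the order
# used by check_command in the service entry.

add_check_using_ssh() {
    local commands_cfg="${1:-/usr/local/nagios/etc/objects/commands.cfg}"

    if grep -Eq '^[[:space:]]*command_name[[:space:]]+check_using_ssh[[:space:]]*$' "$commands_cfg" 2>/dev/null; then
        return 0
    fi

    # The quoted 'EOF' keeps $USER1$, $HOSTADDRESS$ and $ARGn$ literal for Nagios to expand.
    cat >> "$commands_cfg" <<'EOF'

define command{
        command_name    check_using_ssh
        command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$ARG1$" -l $ARG2$
        }
EOF
}

# Example: add_check_using_ssh /usr/local/nagios/etc/objects/commands.cfg
if [ $# -ge 1 ]; then
    add_check_using_ssh "$1"
fi
```

$USER1$ is set in resource.cfg and normally points at the libexec directory, so the definition does not hard-code the plugin path.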

The event_handler value is a command name, not a script path: object configuration files refer to commands by name, and Nagios looks each name up in commands.cfg before running anything. So script_to_run_on_failure needs a matching entry in commands.cfg, such as the following, which appends the alarm text to a log file:

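define command{
        command_name    script_to_run_on_failure
        command_line    /bin/echo "$LONGDATETIME$ $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$: $SERVICEOUTPUT$" >> /path/to/alarm_log_file
        }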
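Nagios runs an event handler on every state change of the service, including recoveries, not only on failures. If the handler needs more logic than a one-line append, for example skipping recoveries, the command_line can call a script instead. The script below is a hypothetical sketch; its name, the default log path and the argument order are assumptions.

```shell
#!/bin/sh
# log_alarm.sh - hypothetical event handler that appends the alarm text to a log file.
# Possible command_line in commands.cfg (location and argument order are assumptions):
#   /usr/local/nagios/libexec/log_alarm.sh "$SERVICESTATE$" "$SERVICESTATETYPE$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICEOUTPUT$"

ALARM_LOG="${ALARM_LOG:-/usr/local/nagios/var/alarms.log}"

log_alarm() {
    local state="$1" state_type="$2" host="$3" service="$4" output="$5"

    # Event handlers also fire on recoveries; only problem states are logged here.
    if [ "$state" = "OK" ]; then
        return 0
    fi

    printf '%s [%s/%s] %s - %s: %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$state" "$state_type" "$host" "$service" "$output" \
        >> "$ALARM_LOG"
}

if [ $# -ge 5 ]; then
    log_alarm "$@"
fi
```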

Finally, once you are satisfied with the alarm, check your configuration files for errors:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If everything is OK, reload the service:

sudo /etc/init.d/nagios reload
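The two steps can be chained so the reload only happens when the configuration check passes. A sketch, assuming the /usr/local/nagios paths above; the wrapper's name is hypothetical, and the reload command is passed in as arguments so it can be swapped for whatever your system uses.

```shell
#!/bin/sh
# check_and_reload: hypothetical wrapper that runs the Nagios pre-flight check
# and only reloads when it passes. Paths default to the /usr/local/nagios
# layout used in this entry; override NAGIOS_BIN / NAGIOS_CFG if yours differ.

NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

check_and_reload() {
    # nagios -v parses nagios.cfg and the object files it references,
    # printing any problems and exiting non-zero when it finds errors.
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "Configuration check failed; Nagios was not reloaded." >&2
        return 1
    fi

    # Run the reload command given as arguments.
    "$@"
}

# Example: check_and_reload sudo /etc/init.d/nagios reload
if [ $# -ge 1 ]; then
    check_and_reload "$@"
fi
```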

