Python script to force recheck all critical services in Nagios

in nagios •  7 years ago 

If you've ever run a Nagios server, you may have noticed that manually rechecking a batch of service monitors is a bit of a pain after you've made changes to your network, hosts, or services.

The normal way of rechecking a Nagios service monitor is through the GUI. However, Nagios also lets external programs send it commands through the external command file.

This file is usually located at /usr/local/nagios/var/rw/nagios.cmd, but yours may be elsewhere, depending on your distro and installation procedure. You can always find it by installing mlocate and running the locate command:

 $ locate nagios.cmd

Here is the Nagios help page for External Commands:

https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/extcommands.html
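
To make the command format concrete, here is a minimal sketch of building and sending one forced service check. It assumes the documented line layout `[timestamp] COMMAND;arguments` and the `SCHEDULE_FORCED_SVC_CHECK` command. The function names and the default path are illustrative only, so adjust the path to wherever your nagios.cmd lives.

```python
import time


def format_forced_svc_check(host_name, service_description, when=None):
    """Build one external command line asking Nagios to recheck a service."""
    if when is None:
        when = int(time.time())
    # Layout: [entry_time] COMMAND;host_name;service_description;check_time
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (
        when, host_name, service_description, when)


def send_command(command_line,
                 command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command line to the external command file (a named pipe)."""
    # The default path is an assumption; pass your own location if it differs.
    with open(command_file, "w") as pipe:
        pipe.write(command_line)
```

For example, `send_command(format_forced_svc_check("web01", "HTTP"))` would ask Nagios to recheck the HTTP service on a hypothetical host called web01. Using the current time for both fields schedules the check immediately.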

The key to using External Commands is to realise that the current state of the Nagios system (that is, the current state of all its monitored hosts and services) is also stored in a file. This file is called 'status.dat' and is typically located in: /var/log/nagios/status.dat

Again, your mileage may vary, depending on your distro and installation.

This is the file that Nagios uses to store the current status, comment, and downtime information. This file is used by the CGIs so that current monitoring status can be reported via a web interface. The CGIs must have read access to this file in order to function properly. This file is deleted every time Nagios stops and recreated when it starts. [1]

We can use these two files in combination by scanning the status.dat file for hosts or services which are in a particular state, and then sending commands to the nagios.cmd file to process them.
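
As a rough illustration of the scanning half, the sketch below collects every service in a given state. It assumes each service entry in status.dat begins with a `servicestatus {` line, contains one `key=value` pair per line, and ends with a line holding only `}`. Reading up to the closing brace means the parser does not depend on how many fields a block contains. The function names are made up for this example.

```python
def parse_service_blocks(lines):
    """Yield one dict of key/value pairs per 'servicestatus {' block."""
    block = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("servicestatus {"):
            block = {}
        elif block is not None and line == "}":
            # Closing brace: the block is complete.
            yield block
            block = None
        elif block is not None and "=" in line:
            # Split on the first '=' only, since values may contain '='.
            key, value = line.split("=", 1)
            block[key] = value


def services_in_states(lines, states=("1", "2")):
    """Return (host, service, state) tuples for services in the given states."""
    # In Nagios, current_state 1 is WARNING and 2 is CRITICAL.
    return [
        (b.get("host_name"), b.get("service_description"), b.get("current_state"))
        for b in parse_service_blocks(lines)
        if b.get("current_state") in states
    ]
```

An open file works as the `lines` argument, for example `services_in_states(open('/var/log/nagios/status.dat'))`, with the path adjusted for your installation.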

Below is a Python script I wrote that opens the status.dat file (you may have to update its location), finds the entry for each service (each service has its own block), and determines its current state.

If the state is warning (current_state=1) or critical (current_state=2), it sends a command to the nagios.cmd file to force a recheck of the service.


import re
import time

# Adjust both paths to match your installation.
STATUS_FILE = '/var/log/nagios/status.dat'
COMMAND_FILE = '/var/spool/nagios/cmd/nagios.cmd'

print("Force rechecking the following services:")
print()

with open(STATUS_FILE) as file:
    for line in file:
        if 'servicestatus {' in line:
            host_name = service_description = ""
            # Look ahead a fixed number of lines for the fields we need.
            for i in range(15):
                myline = next(file, "").strip()
                if re.match("host_name", myline):
                    host_name = myline.split("=", 1)[1]
                if re.match("service_description", myline):
                    service_description = myline.split("=", 1)[1]
                # current_state=1 is WARNING, current_state=2 is CRITICAL
                if myline in ("current_state=1", "current_state=2"):
                    print(host_name)
                    print(service_description)
                    print(myline)
                    print()
                    # Use the current time rather than a hard-coded timestamp.
                    now = int(time.time())
                    cmd = "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (
                        now, host_name, service_description, now)
                    # Write straight to the command pipe instead of shelling out.
                    with open(COMMAND_FILE, 'w') as command_file:
                        command_file.write(cmd)

[1] https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/configmain.html


For this script to work with the latest version of Nagios (as of December 2017), you will need to change the line:

for i in range(15):

to:

for i in range(16):
