OpenVZ Virtuozzo Containers Wiki β

Monitoring openvz resources using nagios and snmp

96 bytes removed, 12:26, 21 December 2015
add category monitoring
</pre>
Edit '''/etc/default/snmpd''': remove ''-u snmp'' and replace ''127.0.0.1'' with your IP address (e.g. 207.46.250.119). Full '''/etc/default/snmpd''' example:
<pre>
export MIBDIRS=/usr/share/snmp/mibs
</pre>
<pre>
/etc/init.d/snmpd stop
echo rouser my_username priv >> /etc/snmp/snmpd.conf
echo "extend .1.3.6.1.4.1.2021.51 beancounters /bin/cat /proc/user_beancounters" >> /etc/snmp/snmpd.conf
echo "extend .1.3.6.1.4.1.2021.52 vzquota /bin/cat /proc/vz/vzquota" >> /etc/snmp/snmpd.conf
/etc/init.d/snmpd start
</pre>
 
(Note that the createUser command goes into a separate file. On CentOS 5 this file is located at /var/net-snmp/snmpd.conf. Make sure you stop snmpd before putting the createUser command there!)
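The note above can be sketched as a small helper. This is only an illustration: <code>SNMPD_PERSIST</code> and <code>add_snmp_user</code> are hypothetical names, and the file defaults to a scratch file so the sketch can be tried without touching a live configuration. On a real system you would stop snmpd first and point the variable at the persistent file (/var/net-snmp/snmpd.conf on CentOS 5). It assumes the MD5/DES settings used by the snmpwalk test on this page.

```shell
#!/bin/bash
# Sketch only: SNMPD_PERSIST is a hypothetical variable. It defaults to a
# scratch file so nothing on the system is modified by trying this out.
SNMPD_PERSIST="${SNMPD_PERSIST:-$(mktemp)}"

add_snmp_user() {
    # $1 = user name, $2 = authentication passphrase, $3 = privacy passphrase
    local user="$1" auth="$2" priv="$3"
    # net-snmp rejects passphrases shorter than 8 characters.
    if [ "${#auth}" -lt 8 ] || [ "${#priv}" -lt 8 ]; then
        echo "passphrases must be at least 8 characters" >&2
        return 1
    fi
    # Append the createUser directive; snmpd reads it on its next start.
    echo "createUser $user MD5 $auth DES $priv" >> "$SNMPD_PERSIST"
}

add_snmp_user my_username my_password my_password
cat "$SNMPD_PERSIST"
```

To use it for real, something like <code>SNMPD_PERSIST=/var/net-snmp/snmpd.conf</code> would be set beforehand, with snmpd stopped while the line is added.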
Testing snmp:
<pre>
snmpwalk -v 3 -u my_username -l authPriv -a MD5 -A my_password -x DES -X my_password $(hostname -i)
</pre>
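To look only at the data published by the two <code>extend</code> lines, the walk can be limited to their OIDs. The following is a sketch under that assumption. <code>walk_cmd</code> is a hypothetical helper that prints the command rather than running it, so the arguments can be reviewed first. The credentials are the placeholders used on this page.

```shell
#!/bin/bash
# OIDs taken from the extend lines added to /etc/snmp/snmpd.conf.
UBC_OID=".1.3.6.1.4.1.2021.51"
VZQUOTA_OID=".1.3.6.1.4.1.2021.52"

# Print (do not run) the snmpwalk command for one subtree.
# walk_cmd is a hypothetical helper name, not part of net-snmp.
walk_cmd() {
    local host="$1" oid="$2"
    echo "snmpwalk -v 3 -u my_username -l authPriv -a MD5 -A my_password -x DES -X my_password $host $oid"
}

walk_cmd 207.46.250.119 "$UBC_OID"
walk_cmd 207.46.250.119 "$VZQUOTA_OID"
```

Once the printed command looks right, it can be pasted into a shell on the monitoring host.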
#####################################################################################
if [ $RET1 -gt $RET2 ]; then
RET=$RET1
else
</source>
=== nagios plugin: failcnt in python ===
To be added locally on the VZ HN to /etc/nagios/nrpe_local.conf<br>
Works as a Nagios plugin with option '-f', or reports an increase of a failcnt value by mail if run, e.g., as a cron job with option '-t'. We use it both ways to be sure that we see a peak in case it happened between Nagios checks.
<source lang="python">
#!/usr/bin/python
# Copyright (C) 2008 Christian Benke
# Distributed under the terms of the GNU General Public License v2
# v0.1 2008-04-03
# Christian Benke <benkokakao gmail com>

import string
import pickle
import sys
import getopt
import re
import smtplib
import socket

veid=''
current_data=dict()
opts=None
beancounter_data=open('/proc/user_beancounters','r')
picklefilepath='/tmp/beancounters_pickledump'
# 'host' is used by send_mail() but its definition is not visible in the
# source; the local hostname is assumed here.
host=socket.gethostname()

# ----- find the hostname for each veid ----
def find_veid(veid):
    veid_conf=open('/etc/vz/conf/' + str(veid) + '.conf','r')
    for line in veid_conf:
        if "HOSTNAME=" in line:
            quotes=re.compile("\"")
            line=quotes.sub("",line)
            linefeed=re.compile("\n")
            line=linefeed.sub("",line)
            fqdn=re.split('=',line)
            hostname=re.split('\.',fqdn[1])[0]
            return hostname

# ---------- send mail in case of a counter-change
def send_mail(count_change):
    mailfrom = 'root@' + str(host)
    mailto = 'to@example.com'
    mailsubject = 'Beancounters changed in the last 5 minutes'
    mailbody = 'The Beancounter-failcnt value of the following veid(s) and resource(s) has \nincreased in the last 5 minutes:\n\n'
    server = smtplib.SMTP('localhost')
    server.sendmail(mailfrom, [mailto], '''\
From:''' + mailfrom + '''\
\nTo:''' + mailto + '''\
\nSubject:''' + mailsubject + '''\
\n''' + mailbody + count_change)
    server.quit()

#------------read raw and compare data from user_beancounters
def compare_data(beancounter_data,data_read,count):
    barrier_break=str()
    count_change=str()
    for line in beancounter_data:
        if 'Version' in line or 'uid' in line or 'dummy' in line:
            continue
        else:
            fields=line.split( )
            if len(fields) == 7:
                i=0
                veid=int(fields[0][:-1])
                fields.pop(0) #remove the first element
                current_data[veid]=dict()
                current_data[veid][fields[0]]=fields
            else:
                i=i+1
                current_data[veid][fields[0]]=fields
            if data_read and count == True and data_read != '0': #comparing counters of new data with previous run
                # compare as integers; the raw fields are strings
                if int(data_read[veid][fields[0]][5]) < int(current_data[veid][fields[0]][5]):
                    if int(veid) != 0:
                        hostname=find_veid(veid)
                    else:
                        hostname='OpenVZ Hardware Node'
                    count_change=str(count_change) + str(hostname) + ': ' + str(fields[0]) + ' failcnt has changed from ' + data_read[veid][fields[0]][5] + ' to ' + str(current_data[veid][fields[0]][5]) + '\n'

            if count == False: #comparing current level with barrier/limit
                if current_data[veid][fields[0]][0] == 'oomguarpages': #for oomguarpages and physpages only the limit-value is relevant
                    if int(current_data[veid][fields[0]][1]) > int(current_data[veid][fields[0]][4])*0.9:
                        barrier_break = str(barrier_break) + str(veid) + ': ' + str(current_data[veid][fields[0]][0]) + ' '
                elif current_data[veid][fields[0]][0] == 'physpages':
                    if int(current_data[veid][fields[0]][1]) > int(current_data[veid][fields[0]][4])*0.9:
                        barrier_break = str(barrier_break) + str(veid) + ': ' + str(current_data[veid][fields[0]][0]) + ' '
                else:
                    if int(current_data[veid][fields[0]][1]) > int(current_data[veid][fields[0]][3])*0.9:
                        barrier_break = str(barrier_break) + str(veid) + ': ' + str(current_data[veid][fields[0]][0]) + ' '
    if barrier_break and count == False:
        print barrier_break
        sys.exit(2)
    elif count == False:
        print 'All Beancounters OK'
        sys.exit(0)

    if count_change and count == True:
        send_mail(count_change)
        return current_data
    elif count == True:
        return current_data

# ----- pickle data - read or write
def pickle_data(current_data,action,count,picklefilepath):
    try:
        picklefile = None
        if action == 'write':
            if current_data:
                picklefile=open(picklefilepath,'w')
                pickle.dump(current_data, picklefile)
                picklefile.close()
                return
            else:
                print 'current_data is empty: ' + str(current_data)
        elif action == 'read':
            picklefile=open(picklefilepath,'r')
            data_read=pickle.load(picklefile)
            picklefile.close()
            if data_read:
                return data_read
            else:
                print 'DATA_READ IS NONE:' + str(data_read)
                return data_read
    except IOError:
        current_data = compare_data(beancounter_data,'0',count)
        picklefile=open(picklefilepath,'w')
        pickle.dump(current_data,picklefile)
        picklefile.close()

# ------- print script usage
def usage(prog="check_beancounters.py"):
    print """
check_beancounters.py : Check if resource-values break barriers or limits and failcounters increase

check_beancounters.py [-tfh]

 -h print this message

 -t Check if failcnt-values have increased since the last run
 -f Check if current value of a resource is higher than barrier/limit
"""

opts=getopt.getopt(sys.argv[1:], 'thf')
if opts:
    if opts[0]==[]:
        usage(); sys.exit(0)
    elif opts[0][0][0]=='-h':
        usage(); sys.exit(0)
    elif opts[0][0][0]=='-t':
        count=True
    elif opts[0][0][0]=='-f':
        count=False

data_read=pickle_data(current_data,'read',count,picklefilepath)
current_data = compare_data(beancounter_data,data_read,count)
pickle_data(current_data,'write',count,picklefilepath)
</source>
=== check_vzquota Without SNMP ===
<source lang="bash">
#!/bin/bash
RET=0
# The leading "echo" emits an empty first line, which perl -n consumes
# before the explicit while(<STDIN>) loop reads the rest.
DATA=`echo;sudo /usr/sbin/vzlist -1 2>/dev/null | xargs -I {} bash -c "echo {}:;sudo /usr/sbin/vzquota stat {} | sed 's/\*//g'"`
if [ -z "$DATA" ]; then
    VPS_err=$(sudo /usr/sbin/vzlist -1 2>&1 1>/dev/null)
    if [ -n "$VPS_err" ] && [ "$VPS_err" == "Container(s) not found" ]; then
        echo "OK - $VPS_err";
        exit 0;
    else
        if [ -n "$VPS_err" ]; then
            echo "UNKNOWN - Error: $VPS_err";
        else
            echo "UNKNOWN - VZquota stats are not readable or empty. Maybe it is only readable for root and this script should be called by sudo.";
        fi
        exit 3;
    fi
fi

echo "$DATA" | perl -n -e'
my $vid ;
my $ret=0 ;
my $crit="";
my $warn="";
my $ok="";
while(<STDIN>){
    my %vid;
    if ( /^(\d+):.*/ ){
        $vid=$1;
    }
    if ( /\D*(\d+):.*/ ){
        $vid=$1;
    }
    if ( /\s*(\S+)\s+(\d+)\s+(\d+)\s+(\d+).*/ ){
        $resource=$1 ; $usage=$2 ; $softlimit=$3 ; $hardlimit=$4 ;
        if ( $usage >= $hardlimit ){
            $crit=$crit."VZquota limit exceeded on $vid: $resource usage->$usage, softlimit->$softlimit, hardlimit->$hardlimit, time->$time, expire->$expire " ;
            $ret=2;
        } elsif ( $usage >= $softlimit ){
            $warn=$warn."VZquota limit exceeded on $vid: $resource usage->$usage, softlimit->$softlimit, hardlimit->$hardlimit, time->$time, expire->$expire " ;
            $ret=1;
        }
        $ok=$ok."$vid:$resource $usage/$softlimit\n";
    }
}
if ($ret == 0) {
    print "OK - click on service-link for details...\n$ok";
} elsif ($ret == 1) {
    print "WARNING - $warn\n";
} else {
    print "CRITICAL - $crit\n";
}
exit($ret);
'
RET=$?
exit $RET
</source>
The script calls <code>/usr/sbin/vzlist</code> and <code>/usr/sbin/vzquota</code> through sudo. Normally this requires a password, which check_nrpe will not know. Because of this you need to append a line like the following to <code>/etc/sudoers</code> (user name and path should be adapted to the right ones on your system):
 nagios ALL=NOPASSWD:/usr/sbin/vzlist, /usr/sbin/vzquota
=== check_ubc Without SNMP ===
<source lang="bash">
#!/bin/bash
# Servicestate description can have a http-link to the openvz-wiki
# in case that a resource is warning/critical. To use it:
# 1. set "escape_html_tags=0" in nagios/etc/cgi.cfg
# 2. set "my $linked=1;" in the first perl lines in this script
#
export FILE=/tmp/check_ubc
RET=0
ubc_file='/proc/user_beancounters';
DATA='';
if [ -r $ubc_file ]; then
    DATA=`cat $ubc_file`
fi
if [ -z "$DATA" ]; then
    echo "UNKNOWN - $ubc_file is not readable or empty. Maybe it is only readable for root and this script should be called by sudo.";
    exit 3;
fi

if [ -f $FILE ]; then
echo "$DATA" | perl -n -e'
use Data::Dumper;
my $linked=1; # 0:plain text output, 1:resourcename is a http-link to OpenVZ-wiki
my $file=$ENV{"FILE"};
my $ret=0 ;
my $vid ;
my $resource ;
my $held ;
my $maxheld ;
my $barrier ;
my $limit ;
my $failcnt ;
my %beancounters ;
my %beancounters_old ;
while(<STDIN>){
    my %vmachine;
    if ( /\D*(\d+):.*/ ){
        $vid=$1;
        $beancounters{$vid}=\%vmachine ;
    }
    if ( /^[\W\d]+([a-z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+).*/ ) {
        $resource=$1 ; $held=$2 ; $maxheld=$3 ; $barrier=$4 ; $limit=$5 ; $failcnt=$6 ;
        ${beancounters{$vid}}{$resource}=[$held , $maxheld , $barrier , $limit ,$failcnt ];
        if ( ($held > $barrier) && ($barrier != 0) ) {
            print "WARNING: Limits on $vid: ".&url($resource,$linked)." held->$held , barrier->$barrier ( limit->$limit ) " ;
            $ret=1;
        }
        #print "$vid:$resource $held Barrier:$barrier ";
    }
}

# read and parse old data
open(MYINPUTFILE, "<$file");
while(<MYINPUTFILE>){
    my %vmachine;
    if ( /\D*(\d+):.*/ ){
        $vid=$1;
        $beancounters_old{$vid}=\%vmachine ;
    }
    if ( /^[\W\d]+([a-z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+).*/ ){
        $resource=$1 ; $held=$2 ; $maxheld=$3 ; $barrier=$4 ; $limit=$5 ; $failcnt=$6 ;
        ${beancounters_old{$vid}}{$resource}=[$held , $maxheld , $barrier , $limit ,$failcnt ];
    }
}

#
foreach my $vmachine_id (keys %beancounters) {
    foreach my $resource (keys %{$beancounters{$vmachine_id}} ) {
        if ( defined($beancounters{$vmachine_id}{$resource}[4]) && defined($beancounters_old{$vmachine_id}{$resource}[4]) ){
            my $failcnt=$beancounters{$vmachine_id}{$resource}[4];
            my $failcnt_old=$beancounters_old{$vmachine_id}{$resource}[4];
            my $held=$beancounters{$vmachine_id}{$resource}[0];
            my $maxheld=$beancounters{$vmachine_id}{$resource}[1];
            my $barrier=$beancounters{$vmachine_id}{$resource}[2];
            my $limit=$beancounters{$vmachine_id}{$resource}[3];
            if ( $failcnt_old < $failcnt ){
                print "CRITICAL: Increased failcnt $vmachine_id: ".url($resource,$linked)." from $failcnt_old to $failcnt (held->$held , maxheld->$maxheld , barrier->$barrier , limit->$limit ) " ;
                $ret=2;
            }
            #print "$vmachine_id: Old_Failcnt: $failcnt_old Failcnt:$failcnt \n";
        }
    }
}
sub url {
    my ($name,$with_link)= @_;
    if ($with_link) {
        return "<a target=\"_blank\" href=\"http://wiki.openvz.org/".$name."#".$name."\">$name</a>";
    } else {
        return $name;
    }
}
if ($ret == 0 ) {
    print "OK: All bean counters fine \n";
}
# print Dumper(%beancounters_old);
# print "\n";
exit($ret);
'
RET=$?
fi
# store the current counters for comparison on the next run
echo "$DATA" > $FILE
exit $RET
</source>
The script needs to read the <code>/proc/user_beancounters</code> file, which is normally readable only by root. Because of this you need to append a line like the following to <code>/etc/sudoers</code> (user name and path should be adapted to the right ones on your system):
 nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_ubc
Also reflect this in your <code>nrpe.cfg</code>, so that the script is called with sudo:
 command[check_ubc]=sudo /usr/local/nagios/libexec/check_ubc
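When testing the sudo setup by hand it helps to see which state a plugin run reports. The scripts on this page follow the usual Nagios exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following is a minimal sketch of a wrapper that prints the state name next to the plugin output. <code>nagios_state</code> and <code>run_check</code> are hypothetical helper names, not part of Nagios or NRPE.

```shell
#!/bin/bash
# Map a plugin exit code to the Nagios state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run a plugin command line, print "<STATE>: <output>" and pass the
# plugin's exit code through. run_check is a hypothetical helper.
run_check() {
    local output rc
    output=$("$@" 2>&1)
    rc=$?
    echo "$(nagios_state "$rc"): $output"
    return "$rc"
}

# Demo with a stand-in command that behaves like a plugin in WARNING state.
run_check sh -c 'echo "demo plugin output"; exit 1' || true
```

On the hardware node the same wrapper could be called as the nagios user, for instance as <code>run_check sudo /usr/local/nagios/libexec/check_ubc</code>, to confirm that sudo does not prompt for a password.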
[[Category: HOWTO]]
[[Category: Monitoring]]