greygraph on CentOS

April 1st, 2012 ntavares Posted in centos, en_US, fedora, monitorização, performance No Comments »


Today I took a look at greygraph, a mailgraph-based tool for displaying sqlgrey graphs.

Here are the adaptations needed to run it on CentOS. Have a look at the README inside the distribution anyway. After unpacking the distribution tarball:

  mkdir -p /var/cache/greygraph
  chgrp apache /var/cache/greygraph/
  chmod g+w /var/cache/greygraph/

Review etc/default/greygraph and move it to /etc/sysconfig/greygraph:

  mv etc/default/greygraph /etc/sysconfig/greygraph

Put the files and directories in place:

  mkdir -p /usr/share/greygraph
  mv usr/lib/cgi-bin/greygraph.cgi /usr/share/greygraph/
  mv var/www/css/greygraph.css /usr/share/greygraph/
  mv usr/sbin/greygraph /usr/sbin
  mkdir -p /var/lib/greygraph
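To actually serve the CGI from its new location, something along these lines in the Apache configuration should work. The alias paths and the CSS location are my guesses (modelled on the Debian layout the tarball ships), not taken from the greygraph README, so adjust them to your setup:

```
# Hypothetical snippet for /etc/httpd/conf.d/greygraph.conf -- paths are
# assumptions, adjust to your layout (directives shown are Apache 2.2 style).
ScriptAlias /cgi-bin/greygraph.cgi /usr/share/greygraph/greygraph.cgi
Alias /css/greygraph.css /usr/share/greygraph/greygraph.css
<Directory "/usr/share/greygraph">
    Options +ExecCGI
    Order allow,deny
    Allow from all
</Directory>
```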

As for the SysV script, I've adapted mailgraph's. Download the script greygraph (remove the .txt extension).

In the meantime, I'll try to add the generated RRDs to cacti [1, 2], let me know if you managed to do so.


NRPE for Endian Firewall

December 31st, 2009 ntavares Posted in en_US, monitorização, nagios No Comments »


Finally I had the time to compile NRPE (the nagios addon) for Endian Firewall. If you are in a hurry, look for the download section below.

First of all, an important finding is that Endian Firewall (EFW) seems to be based on Fedora Core 3, so you can sometimes spare yourself some work by installing FC3 RPMs directly. That's what we'll do right away, so we can move around EFW more easily.

Packaging and installing NRPE

Packaging and installing nagios-plugins

  • Grab the source at:

    cd /root

  • Repeat the procedure you did for NRPE: place the tarball in SOURCES and the spec file in SPECS:

    cp nagios-plugins-1.4.14.tar.gz /usr/src/endian/SOURCES/
    cd /usr/src/endian/SOURCES/
    tar xfvz nagios-plugins-1.4.14.tar.gz
    chown -R root:root nagios-plugins-1.4.14
    cp nagios-plugins-1.4.14/nagios-plugins.spec ../SPECS/

  • This bundle includes the so-called standard plugins for nagios. There are a lot of them, and you can trim some out so the build is quicker. Also, you can avoid depending on perl(Net::SNMP), perl(Crypt::DES) and perl(Socket6) - all of which you can grab from DAG's RPM repository (remember the FC3 branching).

  • cd /root

  • Finally, install everything:

    rpm -ivh perl-Net-SNMP-5.2.0-1.1.fc3.rf.noarch.rpm \
    perl-Crypt-DES-2.05-3.1.fc3.rf.i386.rpm \
    perl-Socket6-0.19-1.1.fc3.rf.i386.rpm \
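For completeness: the build step between copying the spec file and installing the resulting RPMs isn't shown above. Presumably it was something like the following - the _topdir override is my assumption, based on EFW keeping its build tree under /usr/src/endian:

```shell
# Hypothetical build step -- the walkthrough above skips it. EFW keeps its
# RPM build tree under /usr/src/endian, so point rpmbuild's _topdir there.
command -v rpmbuild >/dev/null 2>&1 || { echo "rpmbuild not available"; exit 0; }
rpmbuild --define '_topdir /usr/src/endian' -bb /usr/src/endian/SPECS/nagios-plugins.spec
```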

Final notes

Be aware that this is a sample demonstration. I was more interested in getting it done for my case - since I can fix future problems myself - than in doing a proper/full EFW integration. If you think you can contribute by tweaking this build process, just drop me a note.


Here are the RPMs which include the above-mentioned tweaks (this required extra patching of the .spec file and including the patch within the source):


Listing miscellaneous Apache parameters in SNMP

September 28th, 2009 ntavares Posted in en_US, monitorização No Comments »


We recently had to look at a server which occasionally died from a DoS. I was manually monitoring a lot of stuff while watching a persistent BIG apache worker pop up occasionally and then disappear (probably being recycled). Even more rarely I caught two of them. The machine was being flooded with blog spam from a botnet. I did the math and soon found that if the configured number of allowed workers filled up this way, the machine would start swapping like nuts. This seemed to be the cause.

After correcting the problem (many measures were taken, see below), I searched for cacti templates that could evidence this behaviour. I found that neither ApacheStats nor the better Apache templates reported Virtual Memory Size (VSZ) or Resident Set Size (RSS), which is explained by mod_status not reporting them either (both fetch their data by querying mod_status).

So here's a simple way of monitoring these. Suppose there is a server running the apache workers you want to monitor, and a poller machine where you want to collect the data:

Edit your server's /etc/snmp/snmpd.conf

  # .... other configuration directives
  exec .1.3.6.1.4.1.111111.1 ApacheRSS /usr/local/bin/

The .1.3.6.1.4.1.111111.1 OID is a branch of .1.3.6.1.4.1, which was assigned the meaning 'enterprises' - this is where an enterprise without an IANA-assigned code can place its private OIDs. Anyway, you can use any sequence you want.

Create a file in /usr/local/bin/ with the following contents:

  #!/bin/sh
  WORKERS=4
  ps h -C httpd -o rss | sort -rn | head -n $WORKERS

Notice that httpd is apache's process name in CentOS; in Debian, e.g., it would be apache. Now give the script execution rights. Then go to your poller machine, from where you'll run the SNMP queries:
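Putting the server-side pieces together, this is roughly the whole setup. The filename apache-rss.sh is my invention (the original post lost the script's name), and I use /tmp here just so the sketch is safe to dry-run; on the real server the script goes in /usr/local/bin/:

```shell
# Sketch of the full server-side setup; 'apache-rss.sh' is an assumed name.
SCRIPT=/tmp/apache-rss.sh          # use /usr/local/bin/ on the real server
cat > "$SCRIPT" <<'EOF'
#!/bin/sh
WORKERS=4
ps h -C httpd -o rss | sort -rn | head -n $WORKERS
EOF
chmod +x "$SCRIPT"                 # the "execution rights" step
"$SCRIPT"                          # prints up to $WORKERS RSS values, one per line
```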

  [root@poller ~]# snmpwalk -v 2c -c public targetserver .1.3.6.1.4.1.111111.1.101
  SNMPv2-SMI::enterprises.111111.1.101.1 = STRING: "27856"
  SNMPv2-SMI::enterprises.111111.1.101.2 = STRING: "25552"
  SNMPv2-SMI::enterprises.111111.1.101.3 = STRING: "24588"
  SNMPv2-SMI::enterprises.111111.1.101.4 = STRING: "12040"

So this reports the 4 most memory-consuming workers (the value specified in the WORKERS script variable) with their RSS usage (the output of '-o rss' in the script).

Now graphing these values is a bit more complicated, especially because graphs are usually created on a "fixed number of values" basis. That means that whenever your worker count increases or decreases, the script has to cope with it. That's why there is filtering occurring in the script: first we reverse-order the workers by RSS size, then we keep only the first 4 - so you'll be listing the most memory-consuming workers. To avoid having your graphs ask for more values than the script generates, the WORKERS variable should be set to the minimum number of apache workers you'll ever have on your system - that should be the httpd.conf StartServers directive.
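The opposite case can also be handled: when fewer than WORKERS processes are alive, the poller would see a short list. One way to guarantee a fixed-length output (my variation, not the original script) is to pad with zeros:

```shell
#!/bin/sh
# Sketch: always emit exactly $WORKERS values, padding with zeros when
# fewer than $WORKERS httpd workers are alive, so the graphs never see a
# short list. 'yes 0' supplies the padding; head trims to $WORKERS lines.
WORKERS=4
{ ps h -C httpd -o rss 2>/dev/null | sort -rn; yes 0; } | head -n "$WORKERS"
```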

Now for the graphs: this is the tricky part, as I find cacti a little overcomplicated. However, you should be OK with this Netuality post. Create individual data sources for each of the workers, and group the four in a Graph Template. This is the final result, after lots of struggling to get the correct values (I still didn't manage to get the right values, which are ~22KB):


In this graph you won't notice the events I described in the beginning because other measures were taken, including dynamic firewalling, apache tuning and auditing the blogs for comment and track/pingback permissions - we had a user wide open to spam, and that was when the automatic process of cleaning up blog spam was implemented. In any case, this graph will reveal future similar situations, which I hope are over.

I'll try to post the cacti templates as well, as soon as I recover from the struggling :) Drop me a note if you're interested.


Side-effect of mysqlhotcopy and LVM snapshots on active READ server

September 26th, 2009 ntavares Posted in en_US, monitorização, mysql, performance No Comments »


I just came across a particular feature of MySQL while inspecting a Query Cache that was being wiped out at backup times. Whenever you run FLUSH TABLES, the whole Query Cache gets flushed as well, even if you FLUSH TABLES a particular table. And guess what: mysqlhotcopy issues FLUSH TABLES so the tables get in sync on storage.

I actually noticed the problem with Query Cache on a server reporting the cache flush at a [too] round time (backup time).


My first thought was «there's something wrong with mysqlhotcopy». But actually this is expected behaviour:

When no tables are named, closes all open tables, forces all tables in use to be closed, and flushes the query cache. With one or more table names, flushes only the given tables. FLUSH TABLES also removes all query results from the query cache, like the RESET QUERY CACHE statement.

I got curious about why the heck closing a table should invalidate the cache - maybe the "close table" mechanism is overly cautious?
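You can watch the flush happening yourself. The counter values will obviously vary, so this is just the shape of the check (the table name is a placeholder):

```
mysql> SHOW STATUS LIKE 'Qcache_queries_in_cache';  -- some non-zero value
mysql> FLUSH TABLES some_table;                     -- even with a table named
mysql> SHOW STATUS LIKE 'Qcache_queries_in_cache';  -- drops back to 0
```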

Anyway, it's not mysqlhotcopy's fault. And since you should issue FLUSH TABLES for LVM snapshots as well, for consistency, that method is also affected - which renders both methods pretty counterproductive, performance-wise, on a single production server compared to mysqldump, unless you run a post-backup warm-up process. For that, it would be interesting to be able to dump the QC contents and reload them after the backup - which is not possible at the moment... bummer...
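A poor man's warm-up is my sketch, not an established tool: replay a few representative SELECTs right after the backup, so the query cache (and buffer pool) are repopulated before real traffic hits them. The warmup.sql path and its contents are assumptions - you would fill it with your most frequent/expensive SELECT statements:

```shell
#!/bin/sh
# Hypothetical post-backup warm-up: replay representative SELECTs so the
# query cache is repopulated after mysqlhotcopy's FLUSH TABLES.
# /etc/mysql/warmup.sql is an assumed path holding hand-picked SELECTs.
WARMUP=/etc/mysql/warmup.sql
command -v mysql >/dev/null 2>&1 || { echo "mysql client not installed"; exit 0; }
[ -f "$WARMUP" ] || { echo "no $WARMUP, nothing to replay"; exit 0; }
mysql --batch --skip-column-names < "$WARMUP" > /dev/null
```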


Platform monitoring with cacti - LVS

April 4th, 2009 ntavares Posted in clustering, linux driver, monitorização, pt_PT 2 Comments »


I discovered that there is a complete Net-SNMP module for IPVS statistics, the net-snmp-lvs-module. The natural starting point is the LVS FAQ, which points straight to it. The graphs could perhaps be polished further, particularly regarding the InActConn value, but I have no time for that right now. Read the rest of this entry »
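For what it's worth, hooking the module into snmpd usually boils down to a dlmod line in snmpd.conf. The library path below is my assumption - check where your build actually installed the shared object:

```
# /etc/snmp/snmpd.conf -- load the net-snmp-lvs-module (path is an assumption)
dlmod lvs /usr/local/lib/lvs.so
```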
