We recently had to look at a server that occasionally died with what looked like a DoS. While manually monitoring a lot of stuff, I noticed a persistently BIG apache worker popping up occasionally and then disappearing (probably being recycled). More rarely still, I caught two of them. The machine was being flooded with blog spam from a botnet. I did the math and soon found that if all the currently allowed workers grew to that size, the machine would start swapping like nuts. This seemed to be the cause.
After correcting the problem (many measures were taken, see below), I searched for cacti templates that could expose this behaviour. I found that neither ApacheStats nor the better Apache templates report Virtual Memory Size (VSZ) or Resident Set Size (RSS), which is explained by mod_status not reporting them either (both fetch their data by querying mod_status).
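As a rough sketch of that kind of math: the worker limit of 150 and the 2 GB of RAM below are made-up figures, not this server's, and the per-worker size is borrowed from the snmpwalk output further down. The function names are just for illustration.

```shell
#!/bin/sh
# Hypothetical figures throughout: adjust them to your own server.

# worst_case_kb WORKERS RSS_KB
# Prints the total resident memory (in KB) if every worker grew to RSS_KB.
worst_case_kb() {
    echo $(( $1 * $2 ))
}

# fits_in_ram WORKERS RSS_KB RAM_KB
# Exit status 0 if the worst case fits in RAM_KB, 1 otherwise.
fits_in_ram() {
    [ "$(worst_case_kb "$1" "$2")" -le "$3" ]
}

# 150 workers (assumed limit) at 27856 KB each, on an assumed 2 GB box.
echo "worst case: $(worst_case_kb 150 27856) KB"   # prints 4178400
if fits_in_ram 150 27856 2097152; then
    echo "fits in RAM"
else
    echo "would swap"
fi
```

With these numbers the worst case is roughly 4 GB of resident memory on a 2 GB machine, which is the kind of gap that sends a box into swap.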
So here's a simple way of monitoring these. Suppose there is a server running some apache workers you want to monitor, and a poller machine where you want to collect the data.
Edit your server's /etc/snmp/snmpd.conf and add:
# .... other configuration directives
exec .1.3.6.1.4.1.111111.1 ApacheRSS /usr/local/bin/apache-snmp-rss.sh
The '.1.3.6.1.4.1.111111.1' OID is a branch of '.1.3.6.1.4.1', which was assigned the meaning '.iso.org.dod.internet.private.enterprises', which is where an enterprise without an IANA-assigned code should place its OIDs. Anyway, you can use any sequence you want.
Create a file named /usr/local/bin/apache-snmp-rss.sh with the following contents:
#!/bin/sh
WORKERS=4
ps h -C httpd -o rss | sort -rn | head -n $WORKERS
Notice that httpd is apache's process name on CentOS. On Debian, for example, that would be apache (or apache2 for Apache 2.x). Now give the script execution rights (chmod +x). Then go to your poller machine, from where you'll do the SNMP queries:
[root@poller ~]# snmpwalk -v 2c -c public targetserver .1.3.6.1.4.1.111111.1.101
SNMPv2-SMI::enterprises.111111.1.101.1 = STRING: "27856"
SNMPv2-SMI::enterprises.111111.1.101.2 = STRING: "25552"
SNMPv2-SMI::enterprises.111111.1.101.3 = STRING: "24588"
SNMPv2-SMI::enterprises.111111.1.101.4 = STRING: "12040"
So this reports the 4 most memory-hungry workers (4 being the value of the WORKERS variable in the script) with their RSS usage in kilobytes (that's the output of '-o rss' in the script).
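If you'd rather have one total than one value per worker, the output can be summed on the poller. This is only a minimal sketch, assuming the values come back as quoted STRINGs like the ones shown; sum_rss_kb is a name made up for this example, and the sample lines are copied from the walk so it runs without an SNMP agent.

```shell
#!/bin/sh
# sum_rss_kb: read snmpwalk output on stdin and print the sum of the
# quoted values (KB). Splitting on '"' makes the value field number 2.
sum_rss_kb() {
    awk -F'"' '/STRING:/ { total += $2 } END { print total + 0 }'
}

# Demo with sample lines instead of a live query.
sum_rss_kb <<'EOF'
SNMPv2-SMI::enterprises.111111.1.101.1 = STRING: "27856"
SNMPv2-SMI::enterprises.111111.1.101.2 = STRING: "25552"
SNMPv2-SMI::enterprises.111111.1.101.3 = STRING: "24588"
SNMPv2-SMI::enterprises.111111.1.101.4 = STRING: "12040"
EOF
# prints 90036
```

On a real poller you would pipe the snmpwalk command into sum_rss_kb instead of using the here-document.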
Now graphing these values is a bit more complicated, especially because graphs are usually created on a "fixed number of values" basis. That means that whenever your number of workers increases or decreases, the script has to cope with it. That's why there is filtering going on in the script: first we sort the workers by RSS in descending order, then we keep only the first 4, which means you'll be listing the most memory-hungry workers. To avoid having your graphs ask for more values than the script generates, the WORKERS script variable should be set to the minimum number of apache workers you'll ever have on your system. That should be the StartServers directive in httpd.conf.
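Another way to cope with a changing number of workers, which is not what the script above does, is to pad the output so it always has the same number of lines. A minimal sketch follows; top_n_padded is a hypothetical helper name, and padding with 0 is an assumption about what you'd want the graph to show for a missing worker.

```shell
#!/bin/sh
# top_n_padded N: read one RSS value per line on stdin, print the N
# largest in descending order, then pad with 0 up to N lines.
top_n_padded() {
    sort -rn | head -n "$1" | awk -v n="$1" '
        NF  { print $1; seen++ }                       # strip padding spaces
        END { for (i = seen + 0; i < n; i++) print 0 } # fill the gap'
}

# Demo: only three values arrive but four are requested.
printf '100\n300\n200\n' | top_n_padded 4
```

In the monitoring script this would be used as `ps h -C httpd -o rss | top_n_padded "$WORKERS"`, which lets WORKERS exceed the number of running workers at the cost of flat zero lines in the graph.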
Now for the graphs: this is the tricky part, as I find cacti a little overcomplicated. However, you should be OK with this Netuality post. You should create an individual data source for each of the workers and group the four in a Graph Template. This is the final result, after lots of struggling to get the correct values (I still haven't managed to get the right values, which are ~22KB):
In this graph you won't notice the events I described at the beginning, because other measures were taken, including dynamic firewalling, apache tuning, and auditing the blogs for comment and track/pingback permissions. We had a user wide open to spam, and that was when the automatic process of cleaning up the blog spam was implemented. In any case, this graph will expose similar situations in the future, which I hope are over.
I'll try to post the cacti templates as well, as soon as I recover from the struggle. Drop me a note if you're interested.