hi changi
dev and doc are not loading
irc.tw.org as well
cool
it's all clear now
timothyv: here ?
I'm trying to diagnose why noc is going down so often recently
and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
it's the first host on the list
far beyond any other
srishti: hi
hi
I'm trying to diagnose why noc is going down so often recently
and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
far beyond any other host
so I went to look at the apache logs
but I don't see anything special
many nagios checks
ok
but maybe you could make the check on something other than tiki-index.php and /features ?
yea ok
I also see, but that's not related
a lot of 404s to archives.tikiwiki.org from a browser at regular times
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] "GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
every minute
this is strange
yes that's also what I think
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] "GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
looks like nagios but without the nagios signature
have a look at http://noc.tikiwiki.org/stats
looks like a false nagios check
real nagios checks are every 3 minutes
visibly on 2 pages on every host
the thing is, actually, now that we have monit + munin, http/apache checks could be removed from nagios
nagios check is still required to see if the host pings, though
as well, on the old server you used snmp, didn't you ?
yes
what was it for ?
we didn't actually use snmp, just ping, and host_alive
oh, snmp was set up on the server, actually
so I guess that was just not used
yeah
yesterday we had 1542 nagios checks, and 3796 MSIE false nagios checks
quite odd
really weird
btw what time is it for you right now ?
it's 7.40pm PDT
oh, nice
for me it's 2:41 pm
and 7:41 am for changi
good coverage :)
:-)
we could use a guy from canada or us, to fill it up
hehe
anyway, I guess it could be wise to remove the apache monitor from nagios, now
that's probably not related to our current problems
but that traffic is useless
ok sure
do you want to get emails from monit/munin ?
yeah sure
do you have an address for the monitoring staff ?
you may use noc@avonsys.com
great
I'll add you
i am also in the above group
ok
nagios is no longer doing http_check
superb, thanks
I will also add noc@avonsys as recipient of noc@tw.o if you don't mind
yea sure
so you actually get other alerts like diffmon messages
ok
and when people send alerts by mail
we also use that email to discuss sysadmin issues
great
we really need to fix the current condition of the server, it falls over often for unknown reasons
we have an armada of tools already but they are failing to auto-fix everything
so what other plans do you have to get that fixed?
we'll make a plan when analysis reveals the cause
we plan to dig out all the logs we have :)
well, logical move
I explored a bit but didn't see anything relevant yet
if you get time, you are welcome to join the exploration :)
ok sure
we still get those requests from MSIE
have a look at
tail -f /var/log/apache2/access.vcombined-2010.01.29 | grep 113.20.89.106
there is a ghost somewhere !
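The tail/grep above shows the requests live. To get counts and timing gaps for one client from a saved log, something like the following could work. It is only a rough sketch: the field layout is assumed from the two archives.tikiwiki.org lines pasted earlier (vhost first, then the usual combined fields), and `LINE_RE`, `parse_line` and `summarize` are made-up names for illustration, not anything installed on the server.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, inferred from the two sample lines in this log:
# vhost ip ident user [timestamp] "METHOD path proto" status bytes "referer" "agent" ...
LINE_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def summarize(lines, ip):
    """Count hits per (path, status) for one client and list gaps between hits in seconds."""
    hits = [r for r in map(parse_line, lines) if r and r["ip"] == ip]
    hits.sort(key=lambda r: r["ts"])
    by_path = Counter((r["path"], r["status"]) for r in hits)
    gaps = [
        (b["ts"] - a["ts"]).total_seconds()
        for a, b in zip(hits, hits[1:])
    ]
    return len(hits), by_path, gaps


# The two lines quoted in the conversation, used here as sample input.
SAMPLE = [
    'archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] '
    '"GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; '
    'MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"',
    'archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] '
    '"GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; '
    'MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"',
]

if __name__ == "__main__":
    total, by_path, gaps = summarize(SAMPLE, "113.20.89.106")
    print(total, dict(by_path), gaps)
```

To run it on the real file, pass an open file handle instead of `SAMPLE`, since `summarize` accepts any iterable of lines. Grouping by user agent as well would separate the real nagios checks from the MSIE look-alikes.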
:)
on the server
yeah
oh we still have a nagios check on doc.tikiwiki.org (only)
ok lemme have a look
ok done
i mean you shouldn't be getting checks on doc now
polom
hi tailers boy :)
hi changi
heya changi :)
I'm a fanatic tailer
mose: nice job with semaphor :)
that was an easy one actually
error message was explicit
mose: i'll have more time this WE, will inspect this damned apache log
but that was actually the first time I bumped into such an error
the thing is that sometimes apache gets ghosted
impossible to restart the normal way
that's why i created the script in /usr/local/sbin called by monit
maybe that's a mod_bw side effect combined with some other oddity
don't think so, we had this problem before
oh
then maybe apc
it's a tiki problem
well, whatever the tiki problem, apache should live or die
not get ghosted
when it's ghosted, it uses 100% of one cpu
trying to fetch information from mysql
so that's the famous tracker curse
on the old server, the problem was on the mysql server, which couldn't answer more queries
i think so
damn this thing
it was a hack
it became a beast
it's an infinite loop
and as we put mysql on a socket connection, it's apache that crashes :)
poor indian
?
apaches are indian
lol
linux proposes quite a wide range of imagination pretexts ;)
will try some fcgi tuning to avoid this problem
and try to find in the log what query is done when apache crashed
maybe put php5-cgi in debug mode
have to work
see ya