[01:07] hi changi
[01:08] dev and doc are not loading
[01:19] irc.tw.org as well
[02:10] cool its all clear now
[02:14] *** timothyv1 has joined #tikiwiki-monitor
[02:22] *** timothyv2 has joined #tikiwiki-monitor
[02:22] *** timothyv2 has left
[02:22] *** timothyv2 has joined #tikiwiki-monitor
[02:33] *** timothyv has quit IRC (Read error: 110 (Connection timed out))
[02:42] *** timothyv1 has quit IRC (Read error: 113 (No route to host))
[04:21] *** timothyv has joined #tikiwiki-monitor
[04:29] *** deepaks has joined #tikiwiki-monitor
[04:40] *** timothyv2 has quit IRC (Read error: 110 (Connection timed out))
[05:40] *** deepaks has quit IRC ("Leaving.")
[06:17] timothyv: here ?
[06:18] I'm trying to diagnose why noc is getting down so often recently
[06:18] *** timothyv has quit IRC (Read error: 113 (No route to host))
[06:19] and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
[06:20] it's the first host on the list
[06:21] far beyond any other
[06:24] *** srishti has joined #tikiwiki-monitor
[06:24] srishti: hi
[06:24] hi
[06:25] I'm trying to diagnose why noc is getting down so often recently
[06:25] and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
[06:25] far beyond any other host
[06:26] so I go see apache logs
[06:26] but I don't see special thing
[06:27] many nagios checks
[06:27] ok
[06:27] but maybe you could make check on something else than tiki-index.php and /features ?
[06:28] yea ok
[06:28] I also see, but that's not related
[06:28] a lot of 404 to archives.tikiwiki.org
[06:28] from a browser
[06:28] at regular times
[06:28] archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] "GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
[06:29] every minute
[06:29] this is strange
[06:29] yes that's also what I think
[06:30] archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] "GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
[06:30] looks like nagios
[06:30] but without nagios signature
[06:31] have a look on http://noc.tikiwiki.org/stats
[06:31] looks like a false nagios check
[06:32] real nagios checks are every 3 minutes visibly
[06:32] on 2 pages
[06:32] on every host
[06:33] the thing is, actually, as now we have monit + munin, http/apache checks could be removed from nagios
[06:34] nagios check is still required to see if host pings, though
[06:34] as well, on old server you used snmp, didn't you ?
[06:34] yes
[06:34] what was it for ?
[06:36] we didnt actually use snmp, just ping, and host_alive
[06:36] oh, snmp was setup on server, actually
[06:36] so I guess that was just not used
[06:37] yeah
[06:39] on the day of yesterday we had 1542 nagios checks, and 3796 MSIE false nagios checks
[06:39] quite odd
[06:39] really weird
[06:40] *** timothyv1 has joined #tikiwiki-monitor
[06:40] btw what time is it for you right now ?
[06:41] its 7.40pm PDT
[06:41] oh, nice
[06:41] for me it's 2:41 pm
[06:41] and 7:41 am for changi
[06:41] good coverage :)
[06:42] :-)
[06:42] we could use a guy from canada or us, to fill it up
[06:42] hehe
[06:43] anyway, I guess it could be wise to remove the apache monitor from nagios, now
[06:43] that's probably not related to our current problems but that traffic is useless
[06:44] ok sure
[06:49] do you want to get emails from monit/munin ?
[06:49] yeah sure
[06:49] do you have an address for the monitoring staff ?
[06:49] you may use noc@avonsys.com
[06:49] great
[06:50] I add you
[06:50] i am also in the above group
[06:52] ok nagios is no longer doing http_check
[06:52] superb, thanks
[06:54] I will also add noc@avonsys as recipient of noc@tw.o if you don't mind
[06:55] yea sure
[06:55] so you actually get other alerts
[06:55] like diffmon messages
[06:55] ok
[06:55] and when people send alerts by mail
[06:56] we also use that email to discuss sysadmin issues
[06:57] great
[07:01] we really need to fix the current condition of the server, it falls often for unknown reason
[07:02] we have an armada of tools already but they are failing to auto-fix everything
[07:04] so what other plans do yu have to get that fixed?
[07:04] we'll make plan when analysis will reveal the cause
[07:04] we plan to dig out all the logs we have :)
[07:05] well, logical move
[07:05] I explored a bit but didn't see anything relevant yet
[07:05] if you get time, you are welcome to join the exploration :)
[07:07] ok sure
[07:16] we still get those requests from MSIE
[07:16] have a look on tail -f /var/log/apache2/access.vcombined-2010.01.29 | grep 113.20.89.106
[07:18] there is a ghost somewhere ! :)
[07:18] on the server yeah
[07:25] oh we still have nagios check on doc.tikiwiki.org (only)
[07:25] ok lemme have a look
[07:32] ok done
[07:34] i mean yu shouldnt be getting checks on doc now
[08:42] polom
[08:43] hi tailers boy :)
[08:44] hi changi
[09:05] heya changi :)
[09:06] I'm a fanatic tailer
[09:06] mose: nice job with semaphor :)
[09:07] that was an easy one actually
[09:07] error message was explicit
[09:07] mose: i'll have more time this WE, will inspect this damned apache log
[09:07] but that was actually the first time I bumped into such error
[09:08] the thing is that sometimes apache gets ghosted
[09:08] impossible to restart by normal way
[09:08] that's why i create the script in /usr/local/sbin called by monit
[09:08] maybe that's mod_bw side effect
[09:09] combined with some other oddity
[09:09] don't think so, we have this problem before
[09:09] oh
[09:09] then maybe apc
[09:09] it's a tiki problem
[09:09] well, whatever tiki problem apache should live or die
[09:09] not get ghosted
[09:09] when it ghosted, it use 100% of one cpu
[09:10] try to fetch information from mysql
[09:10] so that's the famous tracker curse
[09:10] on the old server, the problem was on the mysql server that couldn't answer more query
[09:10] i think so
[09:10] damn this thing
[09:10] it was a hack
[09:11] it became a beast
[09:11] it's an infinity loop
[09:11] and as we put mysql on socket connection, it's apache that crash :)
[09:12] poor indian
[09:12] ?
[09:12] apaches are indian
[09:12] lol
[09:12] linux proposes quite a wide range of imagination pretexts ;)
[09:13] will try some fcgi tuning
[09:13] to avoid this problem
[09:14] and try to find in log what query is done when apache crashed
[09:14] maybe put php5-cgi in debug mode
[09:15] hav to work
[09:15] see ya
[10:58] *** rigieta has joined #tikiwiki-monitor
[11:08] *** timothyv1 has left
[18:45] *** changi|home has joined #tikiwiki-monitor
[18:45] *** changi|home has left
[18:46] *** changi|home has joined #tikiwiki-monitor
[18:46] *** changi|home has left
[21:24] *** srishti has quit IRC ("Leaving.")
[23:42] *** rigieta has quit IRC ("Leaving.")
[23:47] *** rupeni has joined #tikiwiki-monitor
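
Note: the comparison quoted at [06:39] (1542 nagios checks against 3796 MSIE requests on the same two pages) could be reproduced roughly as in the sketch below. It is only an illustration, not the method used in the channel. The field layout is inferred from the two log lines pasted at [06:28] and [06:30]; the names `parse_line` and `count_probes` are hypothetical; and treating any User-Agent that contains "nagios" as a genuine check is an assumption, since the log only says the odd requests lack a nagios signature.

```python
import re
from collections import Counter

# Layout inferred from the pasted lines:
# vhost ip ident user [time] "METHOD path proto" status size "referer" "agent" ...
LINE_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one access-log line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def count_probes(lines, paths=("/tiki-index.php", "/features")):
    """Count requests to the monitored paths, keyed by (client ip, kind).

    kind is "nagios" when the User-Agent contains "nagios" (assumption),
    and "other" for everything else, such as the MSIE requests.
    """
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None or record["path"] not in paths:
            continue
        kind = "nagios" if "nagios" in record["agent"].lower() else "other"
        counts[(record["ip"], kind)] += 1
    return counts
```

To run it against the file tailed at [07:16], something like `count_probes(open("/var/log/apache2/access.vcombined-2010.01.29"))` would do, and `counts.most_common(10)` then lists the busiest clients.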