| Who | What | When |
|---|---|---|
| timothyv | hi changi
dev and doc are not loading | [01:07] |
| irc.tw.org as well | [01:19] | |
| ........... (idle for 51mn) | ||
| cool it's all clear now | [02:10] | |
| *** | timothyv1 has joined #tikiwiki-monitor | [02:14] |
| timothyv2 has joined #tikiwiki-monitor
timothyv2 has left
timothyv2 has joined #tikiwiki-monitor | [02:22] | |
| timothyv has quit IRC (Read error: 110 (Connection timed out)) | [02:33] | |
| timothyv1 has quit IRC (Read error: 113 (No route to host)) | [02:42] | |
| .................... (idle for 1h39mn) | ||
| timothyv has joined #tikiwiki-monitor | [04:21] | |
| deepaks has joined #tikiwiki-monitor | [04:29] | |
| timothyv2 has quit IRC (Read error: 110 (Connection timed out)) | [04:40] | |
| ............. (idle for 1h0mn) | ||
| deepaks has quit IRC ("Leaving.") | [05:40] | |
| ........ (idle for 37mn) | ||
| mose | timothyv: here ?
I'm trying to diagnose why noc has been going down so often recently | [06:17] |
| *** | timothyv has quit IRC (Read error: 113 (No route to host)) | [06:18] |
| mose | and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
it's the first host on the list far beyond any other | [06:19] |
| *** | srishti has joined #tikiwiki-monitor | [06:24] |
| mose | srishti: hi | [06:24] |
| srishti | hi | [06:24] |
| mose | I'm trying to diagnose why noc has been going down so often recently
and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
far beyond any other host
so I went to look at the apache logs
but I don't see anything special
many nagios checks | [06:25] |
| srishti | ok | [06:27] |
| mose | but maybe you could run the check on something other than tiki-index.php and /features ? | [06:27] |
| srishti | yea ok | [06:28] |
| mose | I also see, but that's not related
a lot of 404 to archives.tikiwiki.org from a browser at regular times
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] "GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
every minute | [06:28] |
| srishti | this is strange | [06:29] |
| mose | yes that's also what I think
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] "GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
looks like nagios but without nagios signature
have a look on http://noc.tikiwiki.org/stats | [06:29] |
| srishti | looks like a false nagios check | [06:31] |
| mose | real nagios checks are every 3 minutes visibly
on 2 pages on every host
the thing is, actually, as now we have monit + munin, http/apache checks could be removed from nagios
nagios check is still required to see if host pings, though
as well, on old server you used snmp, didn't you ? | [06:32] |
| srishti | yes | [06:34] |
| mose | what was it for ? | [06:34] |
| srishti | we didn't actually use snmp, just ping, and host_alive | [06:36] |
| mose | oh, snmp was setup on server, actually
so I guess that was just not used | [06:36] |
| srishti | yeah | [06:37] |
| mose | yesterday we had 1542 nagios checks, and 3796 MSIE false nagios checks
quite odd | [06:39] |
| srishti | really weird | [06:39] |
| *** | timothyv1 has joined #tikiwiki-monitor | [06:40] |
| mose | btw what time is it for you right now ? | [06:40] |
| srishti | its 7.40pm PDT | [06:41] |
| mose | oh, nice
for me it's 2:41 pm
and 7:41 am for changi
good coverage :) | [06:41] |
| srishti | :-) | [06:42] |
| mose | we could use a guy from canada or us, to fill it up | [06:42] |
| srishti | hehe | [06:42] |
| mose | anyway, I guess it could be wise to remove the apache monitor from nagios, now
that's probably not related to our current problems but that traffic is useless | [06:43] |
| srishti | ok sure | [06:44] |
| mose | do you want to get emails from monit/munin ? | [06:49] |
| srishti | yeah sure | [06:49] |
| mose | do you have an address for the monitoring staff ? | [06:49] |
| srishti | you may use noc@avonsys.com | [06:49] |
| mose | great
I add you | [06:49] |
| srishti | i am also in the above group
ok nagios is no longer doing http_check | [06:50] |
| mose | superb, thanks
I will also add noc@avonsys as recipient of noc@tw.o if you don't mind | [06:52] |
| srishti | yea sure | [06:55] |
| mose | so you actually get other alerts
like diffmon messages | [06:55] |
| srishti | ok | [06:55] |
| mose | and when people send alerts by mail
we also use that email to discuss sysadmin issues | [06:55] |
| srishti | great | [06:57] |
| mose | we really need to fix the current condition of the server, it goes down often for unknown reasons
we have an armada of tools already but they are failing to auto-fix everything | [07:01] |
| srishti | so what other plans do you have to get that fixed? | [07:04] |
| mose | we'll make a plan once analysis reveals the cause
we plan to dig out all the logs we have :)
well, logical move
I explored a bit but didn't see anything relevant yet
if you get time, you are welcome to join the exploration :) | [07:04] |
| srishti | ok sure | [07:07] |
| mose | we still get those requests from MSIE
have a look on
tail -f /var/log/apache2/access.vcombined-2010.01.29 | grep 113.20.89.106
there is a ghost somewhere ! :) | [07:16] |
| srishti | on the server yeah | [07:18] |
| mose | oh we still have nagios check on doc.tikiwiki.org (only) | [07:25] |
| srishti | ok lemme have a look | [07:25] |
| ok done
i mean you shouldn't be getting checks on doc now | [07:32] | |
| .............. (idle for 1h8mn) | ||
| changi | polom
hi tailers boy :) | [08:42] |
| srishti | hi changi | [08:44] |
| ..... (idle for 21mn) | ||
| mose | heya changi :)
I'm a fanatic tailer | [09:05] |
| changi | mose: nice job with semaphor :) | [09:06] |
| mose | that was an easy one actually
error message was explicit | [09:07] |
| changi | mose: i'll have more time this WE, will inspect this damned apache log | [09:07] |
| mose | but that was actually the first time I bumped into such an error
the thing is that sometimes apache gets ghosted
impossible to restart the normal way | [09:07] |
| changi | that's why i created the script in /usr/local/sbin called by monit | [09:08] |
| mose | maybe that's mod_bw side effect
combined with some other oddity | [09:08] |
| changi | don't think so, we had this problem before | [09:09] |
| mose | oh
then maybe apc | [09:09] |
| changi | it's a tiki problem | [09:09] |
| mose | well, whatever the tiki problem, apache should live or die
not get ghosted | [09:09] |
| changi | when it's ghosted, it uses 100% of one cpu
trying to fetch information from mysql | [09:09] |
| mose | so that's the famous tracker curse | [09:10] |
| changi | on the old server, the problem was on the mysql server that couldn't answer any more queries
i think so | [09:10] |
| mose | damn this thing
it was a hack, it became a beast | [09:10] |
| changi | it's an infinite loop
and as we put mysql on socket connection, it's apache that crashes :) | [09:11] |
| mose | poor indian | [09:12] |
| changi | ? | [09:12] |
| mose | apaches are indian | [09:12] |
| changi | lol | [09:12] |
| mose | linux proposes quite a wide range of imagination pretexts ;) | [09:12] |
| changi | will try some fcgi tuning
to avoid this problem
and try to find in the log what query is done when apache crashed
maybe put php5-cgi in debug mode
have to work
see ya | [09:13] |
| ..................... (idle for 1h43mn) | ||
| *** | rigieta has joined #tikiwiki-monitor | [10:58] |
| timothyv1 has left | [11:08] | |
| ............................................................................................ (idle for 7h37mn) | ||
| changi|home has joined #tikiwiki-monitor
changi|home has left
changi|home has joined #tikiwiki-monitor
changi|home has left | [18:45] | |
| ................................ (idle for 2h38mn) | ||
| srishti has quit IRC ("Leaving.") | [21:24] | |
| ............................ (idle for 2h18mn) | ||
| rigieta has quit IRC ("Leaving.") | [23:42] | |
| rupeni has joined #tikiwiki-monitor | [23:47] | |
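
A minimal sketch of how the tally mose quotes at [06:39] (1542 nagios checks against 3796 MSIE lookalikes) could be reproduced from the access log. It assumes the vhost-prefixed combined format seen in the two log lines quoted at [06:28] and [06:29], and it assumes that genuine checks carry "nagios" somewhere in their User-Agent; the log never says what the real nagios signature looks like. `LINE_RE`, `classify`, `tally`, and the third sample line are hypothetical names and data introduced only for illustration.

```python
import re
from collections import Counter

# Assumed layout: vhost, client ip, ident, user, [time], "request",
# status, size, "referer", "user-agent" (anything after that is ignored).
LINE_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The two pages mose says the checks hit on every host.
CHECKED_PATHS = {"/tiki-index.php", "/features"}


def classify(line):
    """Label one access-log line as nagios, msie_lookalike, other or unparsed."""
    m = LINE_RE.match(line)
    if m is None:
        return "unparsed"
    if m["path"] not in CHECKED_PATHS:
        return "other"
    agent = m["agent"]
    # Assumption: a real check identifies itself with "nagios" in the agent.
    if "nagios" in agent.lower():
        return "nagios"
    if "MSIE" in agent:
        return "msie_lookalike"
    return "other"


def tally(lines):
    """Count how many lines fall into each category."""
    return Counter(classify(line) for line in lines)


SAMPLE = [
    # The two lines quoted in the log above.
    'archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] '
    '"GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; '
    'MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"',
    'archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] '
    '"GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; '
    'MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"',
    # Hypothetical line standing in for a genuine check (made-up ip and agent).
    'doc.tikiwiki.org 192.0.2.10 - - [29/Jan/2010:00:12:00 +0100] '
    '"GET /tiki-index.php HTTP/1.1" 200 5120 "-" '
    '"check_http (nagios-plugins)" "-"',
]

if __name__ == "__main__":
    print(tally(SAMPLE))
```

To run it against the real file, the lines of /var/log/apache2/access.vcombined-2010.01.29 could be passed to `tally` in place of `SAMPLE`, for example with `tally(open(path))`.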