Who | What | When |
---|---|---|
timothyv | hi changi
dev and doc are not loading | [01:07] |
irc.tw.org as well | [01:19] | |
........... (idle for 51mn) | ||
cool its all clear now | [02:10] | |
*** | timothyv1 has joined #tikiwiki-monitor | [02:14] |
timothyv2 has joined #tikiwiki-monitor
timothyv2 has left
timothyv2 has joined #tikiwiki-monitor | [02:22] | |
timothyv has quit IRC (Read error: 110 (Connection timed out)) | [02:33] | |
timothyv1 has quit IRC (Read error: 113 (No route to host)) | [02:42] | |
.................... (idle for 1h39mn) | ||
timothyv has joined #tikiwiki-monitor | [04:21] | |
deepaks has joined #tikiwiki-monitor | [04:29] | |
timothyv2 has quit IRC (Read error: 110 (Connection timed out)) | [04:40] | |
............. (idle for 1h0mn) | ||
deepaks has quit IRC ("Leaving.") | [05:40] | |
........ (idle for 37mn) | ||
mose | timothyv: here ?
I'm trying to diagnose why noc is going down so often recently | [06:17] |
*** | timothyv has quit IRC (Read error: 113 (No route to host)) | [06:18] |
mose | and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
it's the first host on the list far beyond any other | [06:19] |
*** | srishti has joined #tikiwiki-monitor | [06:24] |
mose | srishti: hi | [06:24] |
srishti | hi | [06:24] |
mose | I'm trying to diagnose why noc is going down so often recently
and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
far beyond any other host
so I went to look at the apache logs
but I don't see anything special
many nagios checks | [06:25] |
srishti | ok | [06:27] |
mose | but maybe you could run the check on something other than tiki-index.php and /features ? | [06:27] |
srishti | yea ok | [06:28] |
mose | I also see, but that's not related
a lot of 404 to archives.tikiwiki.org from a browser at regular times
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] "GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
every minute | [06:28] |
srishti | this is strange | [06:29] |
mose | yes that's also what I think
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] "GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
looks like nagios but without nagios signature
have a look on http://noc.tikiwiki.org/stats | [06:29] |
srishti | looks like a false nagios check | [06:31] |
mose | real nagios checks are every 3 minutes visibly
on 2 pages on every host
the thing is, actually, as now we have monit + munin, http/apache checks could be removed from nagios
nagios check is still required to see if host pings, though
as well, on old server you used snmp, didn't you ? | [06:32] |
srishti | yes | [06:34] |
mose | what was it for ? | [06:34] |
srishti | we didn't actually use snmp, just ping and host_alive | [06:36] |
mose | oh, snmp was setup on server, actually
so I guess that was just not used | [06:36] |
srishti | yeah | [06:37] |
mose | yesterday we had 1542 nagios checks, and 3796 MSIE false nagios checks
quite odd | [06:39] |
srishti | really weird | [06:39] |
*** | timothyv1 has joined #tikiwiki-monitor | [06:40] |
mose | btw what time is it for you right now ? | [06:40] |
srishti | it's 7.40pm PDT | [06:41] |
mose | oh, nice
for me it's 2:41 pm
and 7:41 am for changi
good coverage :) | [06:41] |
srishti | :-) | [06:42] |
mose | we could use a guy from canada or us, to fill it up | [06:42] |
srishti | hehe | [06:42] |
mose | anyway, I guess it could be wise to remove the apache monitor from nagios, now
that's probably not related to our current problems but that traffic is useless | [06:43] |
srishti | ok sure | [06:44] |
mose | do you want to get emails from monit/munin ? | [06:49] |
srishti | yeah sure | [06:49] |
mose | do you have an address for the monitoring staff ? | [06:49] |
srishti | you may use noc@avonsys.com | [06:49] |
mose | great
I add you | [06:49] |
srishti | i am also in the above group
ok nagios is no longer doing http_check | [06:50] |
mose | superb, thanks
I will also add noc@avonsys as recipient of noc@tw.o if you don't mind | [06:52] |
srishti | yea sure | [06:55] |
mose | so you actually get other alerts
like diffmon messages | [06:55] |
srishti | ok | [06:55] |
mose | and when people send alerts by mail
we also use that email to discuss sysadmin issues | [06:55] |
srishti | great | [06:57] |
mose | we really need to fix the current condition of the server, it goes down often for unknown reasons
we have an armada of tools already but they are failing to auto-fix everything | [07:01] |
srishti | so what other plans do you have to get that fixed? | [07:04] |
mose | we'll make a plan when analysis reveals the cause
we plan to dig out all the logs we have :)
well, logical move
I explored a bit but didn't see anything relevant yet
if you get time, you are welcome to join the exploration :) | [07:04] |
srishti | ok sure | [07:07] |
mose | we still get those requests from MSIE
have a look on tail -f /var/log/apache2/access.vcombined-2010.01.29 | grep 113.20.89.106
there is a ghost somewhere ! :) | [07:16] |
srishti | on the server yeah | [07:18] |
mose | oh we still have nagios check on doc.tikiwiki.org (only) | [07:25] |
srishti | ok lemme have a look | [07:25] |
ok done
i mean you shouldn't be getting checks on doc now | [07:32] | |
.............. (idle for 1h8mn) | ||
changi | polom
hi tailers boy :) | [08:42] |
srishti | hi changi | [08:44] |
..... (idle for 21mn) | ||
mose | heya changi :)
I'm a fanatic tailer | [09:05] |
changi | mose: nice job with semaphor :) | [09:06] |
mose | that was an easy one actually
error message was explicit | [09:07] |
changi | mose: i'll have more time this WE, will inspect this damned apache log | [09:07] |
mose | but that was actually the first time I bumped into such an error
the thing is that sometimes apache gets ghosted
impossible to restart the normal way | [09:07] |
changi | that's why i created the script in /usr/local/sbin called by monit | [09:08] |
mose | maybe that's a mod_bw side effect
combined with some other oddity | [09:08] |
changi | don't think so, we had this problem before | [09:09] |
mose | oh
then maybe apc | [09:09] |
changi | it's a tiki problem | [09:09] |
mose | well, whatever the tiki problem, apache should live or die
not get ghosted | [09:09] |
changi | when it's ghosted, it uses 100% of one cpu
trying to fetch information from mysql | [09:09] |
mose | so that's the famous tracker curse | [09:10] |
changi | on the old server, the problem was on the mysql server that couldn't answer more queries
i think so | [09:10] |
mose | damn this thing
it was a hack
it became a beast | [09:10] |
changi | it's an infinite loop
and as we put mysql on socket connection, it's apache that crashes :) | [09:11] |
mose | poor indian | [09:12] |
changi | ? | [09:12] |
mose | apaches are indian | [09:12] |
changi | lol | [09:12] |
mose | linux proposes quite a wide range of imagination pretexts ;) | [09:12] |
changi | will try some fcgi tuning
to avoid this problem
and try to find in the log what query is done when apache crashed
maybe put php5-cgi in debug mode
have to work
see ya | [09:13] |
..................... (idle for 1h43mn) | ||
*** | rigieta has joined #tikiwiki-monitor | [10:58] |
timothyv1 has left | [11:08] | |
............................................................................................ (idle for 7h37mn) | ||
changi|home has joined #tikiwiki-monitor
changi|home has left
changi|home has joined #tikiwiki-monitor
changi|home has left | [18:45] | |
................................ (idle for 2h38mn) | ||
srishti has quit IRC ("Leaving.") | [21:24] | |
............................ (idle for 2h18mn) | ||
rigieta has quit IRC ("Leaving.") | [23:42] | |
rupeni has joined #tikiwiki-monitor | [23:47] |
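
The log exploration discussed above (counting the "false nagios" MSIE requests from 113.20.89.106 against archives.tikiwiki.org) could be sketched in Python roughly as follows. This is only an illustration, not the script anyone in the channel used: the function names `parse_line` and `summarize` are hypothetical, the regular expression is inferred from the two log lines pasted in the conversation, and the month name is parsed with `%b`, which assumes an English/C locale.

```python
import re
from collections import Counter
from datetime import datetime

# Layout inferred from the pasted lines (an assumption, not the real LogFormat):
# vhost ip ident user [time] "request" status bytes "referer" "user-agent" ...
LINE_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # %b relies on English month abbreviations (C/English locale assumed).
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def summarize(lines, ip):
    """Count (vhost, path, status) hits and user agents for one client IP."""
    hits = Counter()
    agents = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["ip"] != ip:
            continue
        hits[(rec["vhost"], rec["path"], rec["status"])] += 1
        agents[rec["agent"]] += 1
    return hits, agents


# The two lines quoted in the conversation, used here as sample input.
SAMPLE = [
    'archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] '
    '"GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; '
    'MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"',
    'archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] '
    '"GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; '
    'MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"',
]

if __name__ == "__main__":
    hits, agents = summarize(SAMPLE, "113.20.89.106")
    for key, count in hits.most_common():
        print(count, *key)
    for agent, count in agents.most_common():
        print(count, agent)
```

To run it against the real file mentioned in the conversation, the `SAMPLE` list could be replaced by an open file handle on /var/log/apache2/access.vcombined-2010.01.29, since `summarize` only needs an iterable of lines.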