timothyv: dev and doc are not loading
irc.tw.org as well
cool, it's all clear now
***: timothyv1 has joined #tikiwiki-monitor
timothyv2 has joined #tikiwiki-monitor
timothyv2 has left
timothyv2 has joined #tikiwiki-monitor
timothyv has quit IRC (Read error: 110 (Connection timed out))
timothyv1 has quit IRC (Read error: 113 (No route to host))
timothyv has joined #tikiwiki-monitor
deepaks has joined #tikiwiki-monitor
timothyv2 has quit IRC (Read error: 110 (Connection timed out))
deepaks has quit IRC ("Leaving.")
mose: timothyv: here ?
I'm trying to diagnose why noc has been going down so often recently
***: timothyv has quit IRC (Read error: 113 (No route to host))
mose: and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
it's the first host on the list
far beyond any other
***: srishti has joined #tikiwiki-monitor
mose: srishti: hi
srishti: hi
mose: I'm trying to diagnose why noc has been going down so often recently
and I was curious about why seine.avonsys.com gets 3.1Gb of traffic on dev.two and 5.7Gb on doc.two
far beyond any other host
so I went to look at the apache logs
but I don't see anything special
just many nagios checks
srishti: ok
mose: but maybe you could check something other than tiki-index.php and /features?
srishti: yea ok
mose: I also see something else, but that's not related
a lot of 404s on archives.tikiwiki.org
from a browser
at regular times
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:32 +0100] "GET /tiki-index.php HTTP/1.1" 404 212 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
every minute
srishti: this is strange
mose: yes that's also what I think
archives.tikiwiki.org 113.20.89.106 - - [29/Jan/2010:00:11:55 +0100] "GET /features HTTP/1.1" 404 206 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)" "-"
looks like nagios
but without nagios signature
have a look at http://noc.tikiwiki.org/stats
srishti: looks like a false nagios check
mose: the real nagios checks apparently run every 3 minutes
on 2 pages
on every host
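The two cadences mentioned here (real nagios every 3 minutes, the MSIE client every minute) can be measured straight from the access log. A minimal sketch, assuming the vhost-prefixed vcombined layout of the sample lines above; the regex and the function name are hypothetical, not part of the noc setup. Hits are grouped by (IP, request line) because the MSIE client alternates between two URLs, which would blur a per-IP interval.

```python
import re
import statistics
from collections import defaultdict
from datetime import datetime

# Assumed layout, taken from the sample lines in this log:
# vhost ip ident user [timestamp] "request" status bytes "referer" "user-agent" ...
LINE_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "[^"]*"'
)
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 29/Jan/2010:00:11:32 +0100


def median_interval(lines):
    """Map (ip, request line) to the median number of seconds between hits."""
    hits = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        hits[(m.group("ip"), m.group("request"))].append(ts)

    result = {}
    for key, stamps in hits.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        if gaps:  # a single hit has no interval
            result[key] = statistics.median(gaps)
    return result
```

Fed the lines of a day's log file (for instance the access.vcombined-2010.01.29 file mentioned later), entries close to 180 would be the nagios probes and entries close to 60 the unexplained client.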
the thing is, now that we have monit + munin, the http/apache checks could be removed from nagios
a nagios check is still needed to see whether the host answers pings, though
also, on the old server you used snmp, didn't you?
srishti: yes
mose: what was it for ?
srishti: we didn't actually use snmp, just ping and host_alive
mose: oh, snmp was set up on the server, actually
so I guess that was just not used
srishti: yeah
mose: yesterday we had 1542 nagios checks and 3796 MSIE false nagios checks
quite odd
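Daily counts like these come from classifying each line of a per-day access log. A minimal sketch, assuming real nagios probes carry the check_http plugin's "check_http/..." User-Agent (the "nagios signature" missing from the MSIE lines) and that the ghost is the MSIE client at 113.20.89.106; the labels and function names are hypothetical.

```python
from collections import Counter

# Assumptions: real nagios probes send a "check_http/..." User-Agent, and the
# unexplained client is the MSIE browser at 113.20.89.106 seen in the excerpts.
GHOST_IP = "113.20.89.106"


def classify(line):
    """Label one access-log line as 'nagios', 'msie-ghost' or 'other'."""
    fields = line.split()
    if "check_http" in line:
        return "nagios"
    # In the vhost-prefixed layout the client IP is the second field.
    if len(fields) > 1 and fields[1] == GHOST_IP and "MSIE" in line:
        return "msie-ghost"
    return "other"


def tally(lines):
    """Count lines per label; on a per-day log file this gives daily totals."""
    return Counter(classify(line) for line in lines)
```

Since the logs are already rotated per day, passing an opened file such as /var/log/apache2/access.vcombined-2010.01.29 to tally would give that day's split.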
srishti: really weird
***: timothyv1 has joined #tikiwiki-monitor
mose: btw what time is it for you right now ?
srishti: it's 7.40pm PDT
mose: oh, nice
for me it's 2:41 pm
and 7:41 am for changi
good coverage :)
srishti: :-)
mose: we could use a guy from Canada or the US to fill the gap
srishti: hehe
mose: anyway, I guess it could be wise to remove the apache monitor from nagios, now
that's probably not related to our current problems, but that traffic is useless
srishti: ok sure
mose: do you want to get emails from monit/munin ?
srishti: yeah sure
mose: do you have an address for the monitoring staff?
srishti: you may use noc@avonsys.com
mose: great
I'll add you
srishti: i am also in the above group
ok, nagios is no longer doing http_check
mose: superb, thanks
I will also add noc@avonsys as recipient of noc@tw.o if you don't mind
srishti: yea sure
mose: so you also get the other alerts
like diffmon messages
srishti: ok
mose: and when people send alerts by mail
we also use that email to discuss sysadmin issues
srishti: great
mose: we really need to fix the current condition of the server, it goes down often for unknown reasons
we have an armada of tools already but they are failing to auto-fix everything
srishti: so what other plans do you have to get that fixed?
mose: we'll make a plan once the analysis reveals the cause
we plan to dig out all the logs we have :)
well, logical move
I explored a bit but didn't see anything relevant yet
if you get time, you are welcome to join the exploration :)
srishti: ok sure
mose: we still get those requests from MSIE
have a look at tail -f /var/log/apache2/access.vcombined-2010.01.29 | grep 113.20.89.106
there is a ghost somewhere ! :)
srishti: on the server yeah
mose: oh we still have nagios check on doc.tikiwiki.org (only)
srishti: ok lemme have a look
ok done
I mean you shouldn't be getting checks on doc now
changi: polom
hi tailers boy :)
srishti: hi changi
mose: heya changi :)
I'm a fanatic tailer
changi: mose: nice job with semaphor :)
mose: that was an easy one actually
error message was explicit
changi: mose: I'll have more time this weekend, will inspect this damned apache log
mose: but that was actually the first time I bumped into such an error
the thing is that sometimes apache gets ghosted
impossible to restart by normal way
changi: that's why I created the script in /usr/local/sbin that monit calls
mose: maybe that's mod_bw side effect
combined with some other oddity
changi: don't think so, we had this problem before
mose: oh
then maybe apc
changi: it's a tiki problem
mose: well, whatever the tiki problem, apache should live or die
not get ghosted
changi: when it's ghosted, it uses 100% of one CPU
trying to fetch information from mysql
mose: so that's the famous tracker curse
changi: on the old server, the problem was the mysql server, which couldn't answer any more queries
i think so
mose: damn this thing
it was a hack
it became a beast
changi: it's an infinite loop
and as we put mysql on socket connection, it's apache that crash :)
mose: poor indian
changi: ?
mose: apaches are indian
changi: lol
mose: linux proposes quite a wide range of imagination pretexts ;)
changi: will try some fcgi tuning
to avoid this problem
and try to find in the log which query is running when apache crashes
maybe put php5-cgi in debug mode
have to work
see ya
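A first pass at "what was running when apache crashed" is to list the requests logged in a short window before the crash time. A minimal sketch, assuming the vcombined layout of the excerpts above and a crash time taken from monit or munin; the function name and the 2-minute default window are hypothetical. One caveat: by default Apache writes an access-log entry when a request finishes (with %t set to when it was received), so a request stuck in the loop may never appear here, and the php5-cgi debug output or mysql side may be needed to catch it.

```python
import re
from datetime import datetime, timedelta

# Assumed vcombined layout as in the excerpts above: the vhost is the first
# field, followed by the usual combined-log fields including [timestamp] "request".
ENTRY_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)"')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def requests_before(lines, crash_time, window=timedelta(minutes=2)):
    """Return sorted (timestamp, vhost, request) tuples for entries logged
    in [crash_time - window, crash_time]. crash_time must be timezone-aware."""
    start = crash_time - window
    hits = []
    for line in lines:
        m = ENTRY_RE.search(line)
        if m is None:
            continue  # not an access-log entry in the assumed layout
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        if start <= ts <= crash_time:
            vhost = line.split(" ", 1)[0]
            hits.append((ts, vhost, m.group("request")))
    return sorted(hits)
```

The crash time can be parsed with the same TS_FORMAT, e.g. datetime.strptime("29/Jan/2010:00:12:00 +0100", TS_FORMAT), so that it compares correctly with the timezone-aware log stamps.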
***: rigieta has joined #tikiwiki-monitor
timothyv1 has left
changi|home has joined #tikiwiki-monitor
changi|home has left
changi|home has joined #tikiwiki-monitor
changi|home has left
srishti has quit IRC ("Leaving.")
rigieta has quit IRC ("Leaving.")
rupeni has joined #tikiwiki-monitor