mmm agent cannot connect to MySQL even when mmm agent user is already created on DB

Asked by Gurvinder Dadyala

Something is wrong with mmm-agent, which runs on the database nodes. This is what is happening now.

I have set up 2 DB nodes and I have 2 virtual IPs [192.168.100.196, 192.168.100.197]. When I start mmm-monitor on the monitor node and the agents on the DB nodes, everything works well. mmm-monitor assigns the writer role to DB1 and the reader role to DB2. After 15 or 30 minutes, mmm-agent on the node assigned the writer role starts throwing an error message saying it cannot connect to MySQL. I am not sure why, because the mmm-agent user is already created on both DB nodes and mmm-agent does not throw this message when the agents start. It begins on its own after 15-30 minutes. The error message follows.

2011/06/08 14:01:16 FATAL Couldn't deny writes: ERROR: Can't connect to MySQL (host = 192.168.100.98:3306, user = mmm_agent)! Can't connect to MySQL server on '192.168.100.98' (4)

I have checked the configuration files on both nodes and everything looks good. I have even deleted the user credentials and created them again, just to make sure the mmm-agent user credentials are not the cause. This usually happens on the node that has been assigned the writer role.

Any inputs are welcome.

Question information

Language: English
Status: Answered
For: mysql-mmm
Assignee: No assignee
Kenny Gryp (gryp) said :
#1

Error code 4 is mentioned in your error message.
It means the call was interrupted (errno 4 is EINTR, "Interrupted system call"), which indicates a timeout during the checks.

You might want to investigate the connection time to MySQL, or raise the timeout for the checks.

Gurvinder Dadyala (gurvinder) said :
#2

Hi Kenny,

Thanks for reviewing my query. I tried that. Earlier I had a MySQL check interval of 1 second in my MMM monitor configuration file, then I changed it to 3 seconds. It still behaves the same way. My MMM monitor configuration file is below for you to look at; here I am checking MySQL every 3 seconds.

################# FILE STARTS HERE ######################

include mmm_common.conf

<monitor>
    ip 192.168.100.96
    pid_path /var/run/mysql-mmm/mmm_mond.pid
    bin_path /usr/libexec/mysql-mmm
    status_path /var/lib/mysql-mmm/mmm_mond.status
    ping_ips 192.168.100.97, 192.168.100.98
    auto_set_online 60
    # The kill_host_bin does not exist by default, though the monitor will
    # throw a warning about it missing. See the section 5.10 "Kill Host
    # Functionality" in the PDF documentation.
    #
    # kill_host_bin /usr/libexec/mysql-mmm/monitor/kill_host
    #

</monitor>

# Ping checker
<check ping>
    check_period 3
    trap_period 5
    timeout 2
</check>

# Mysql checker
<check mysql>
    check_period 3
    trap_period 2
    timeout 2
</check>

# Mysql replication backlog checker
<check rep_backlog>
    check_period 5
    trap_period 10
    max_backlog 60
    timeout 2
</check>

# Mysql replication threads checker
<check rep_threads>
    check_period 3
    trap_period 5
    timeout 2
</check>

<host default>
    monitor_user mmm_monitor
    monitor_password monitor_password
</host>

debug 0
################### FILE ENDS HERE ##############################

You can also have a look at the MMM common configuration file below.
################### FILE STARTS HERE ############################

active_master_role writer

<host default>
    cluster_interface eth0
    pid_path /var/run/mysql-mmm/mmm_agentd.pid
    bin_path /usr/libexec/mysql-mmm/
    replication_user replicant
    replication_password mypwd
    agent_user mmm_agent
    agent_password agent_password
</host>

<host db1>
    ip 192.168.100.97
    mode master
    peer db2
</host>

<host db2>
    ip 192.168.100.98
    mode master
    peer db1
</host>

#<host db3>
# ip 192.168.100.51
# mode slave
#</host>

<role writer>
    hosts db1, db2
    ips 192.168.100.196
    mode exclusive
</role>

<role reader>
    hosts db1, db2
    ips 192.168.100.197, 192.168.100.198
    mode balanced
</role>

######################## FILE ENDS HERE ##########################

In the articles I have gone through on the internet, I found that many users go for a 1-second MySQL check. Do you think something needs to be changed in MySQL, such as any time-related option? I appreciate that you looked into my issue. Thanks again.

Kenny Gryp (gryp) said :
#3

I'm talking about the timeout, not the check interval.

See below:

# Ping checker
<check ping>
    check_period 3
    trap_period 5
    timeout 2
</check>

I suggest testing the connection time from the monitoring node to the database in a script, to see whether it sometimes takes longer than 2 seconds.

The code uses the Perl alarm() function with the timeout to bail out of the check if it takes longer than 'timeout'. Then you would get error 4. See lib/Monitor/Checker/Checks.pm.
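
To see the same bail-out behaviour outside of MMM, here is a rough shell sketch. It is not the MMM code: it swaps Perl's alarm() for the coreutils `timeout` command, which kills the command after the given number of seconds and exits with status 124. The function name `check_with_timeout` and the hostname/user/password in the usage comment are placeholders of mine.

```shell
#!/bin/bash
# Sketch only: approximates a checker "timeout" using coreutils `timeout`
# instead of Perl's alarm(). check_with_timeout is a hypothetical helper.

# check_with_timeout SECONDS CMD [ARGS...]
# Runs CMD, giving up after SECONDS. Prints a one-line verdict and
# returns the underlying exit status (124 means the time limit was hit).
check_with_timeout() {
    local limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s"
    elif [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "ERROR exit=$status"
    fi
    return "$status"
}

# Usage (placeholders for host, user and password):
#   check_with_timeout 2 mysql -h hostname -uuser -ppass -e 'SELECT 1'
```

If this prints TIMEOUT now and then with a limit of 2, the connection is sometimes slower than the checker allows, and a higher timeout (or a faster connection) is what to look at.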

Gurvinder Dadyala (gurvinder) said :
#4

I will check this and then post back. I do not understand why the agent does not start showing this error message as soon as I start the agents on the nodes. It always happens on the node that has been assigned the WRITER role by the MMM monitor. It mostly starts after 15-20 minutes: I get 2-3 identical error messages and then it fixes itself. Anyway, I appreciate that you have shown me a way forward. Till then, take care.

Gurvinder Dadyala (gurvinder) said :
#5

I am also trying to find a script to check the total response time. I have not found anything yet. If you have one, let me know.

Kenny Gryp (gryp) said :
#6

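Do a loop like this:

# -e makes mysql run one query and exit instead of opening a prompt;
# 2>&1 is needed because `time` reports on stderr.
(while true; do
date
time mysql -h hostname -uuser -ppass -e 'SELECT 1'
sleep 10
done) > logfile 2>&1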

It could be that you're having some issues with DNS, which slow down connection/authentication time.
Or the network becomes a problem when a lot of traffic comes in.
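
For the DNS part, one thing to try is timing a reverse lookup of the client's IP on the database host, since by default the MySQL server resolves the connecting client's address (unless skip_name_resolve is set). The following is only a sketch: `reverse_lookup_ms` is a name I made up, and it assumes GNU date (for %N) and glibc's getent.

```shell
#!/bin/bash
# Sketch only: measure how long a reverse DNS lookup takes.
# Assumes GNU date (%N nanoseconds) and getent; reverse_lookup_ms is hypothetical.

# reverse_lookup_ms IP
# Prints the wall-clock milliseconds spent resolving IP through the
# system resolver (hosts file, DNS, ... as configured in nsswitch.conf).
reverse_lookup_ms() {
    local ip=$1 start end
    start=$(date +%s%N)
    getent hosts "$ip" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Usage, run on a DB node with the monitor's IP from the config above:
#   reverse_lookup_ms 192.168.100.96
```

Lookups that take hundreds of milliseconds or more, even occasionally, would point at the resolver rather than at MySQL.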

Also try running the loop on localhost to see whether it still happens when you use only localhost and not the network.
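
If reading raw timings gets tedious, a variant along these lines could mark the slow attempts directly. It is a sketch under assumptions: `timed_run` is a hypothetical helper, LIMIT is set to 2 to mirror the `timeout 2` in the checker config, and GNU date is assumed for sub-second timestamps.

```shell
#!/bin/bash
# Sketch only: time a command and flag runs slower than a threshold.
# LIMIT mirrors "timeout 2" from the checker config; timed_run is hypothetical.
LIMIT=2

# timed_run CMD [ARGS...]
# Runs CMD, prints "<date> <time> <elapsed>s OK|SLOW exit=<status>",
# and returns CMD's exit status.
timed_run() {
    local start end elapsed status verdict
    start=$(date +%s.%N)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s.%N)
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
    verdict=$(awk -v t="$elapsed" -v l="$LIMIT" \
        'BEGIN { print (t + 0 > l + 0) ? "SLOW" : "OK" }')
    echo "$(date '+%F %T') ${elapsed}s $verdict exit=$status"
    return "$status"
}

# Usage (placeholders for host, user and password):
#   while true; do
#       timed_run mysql -h hostname -uuser -ppass -e 'SELECT 1'
#       sleep 10
#   done >> logfile
```

Afterwards `grep SLOW logfile` shows when the slow connections happened, which can be compared with the timestamps of the FATAL messages in the agent log.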
