Change Gluster Volume Connection Timeout for GlusterFS Native Client

Tuesday, February 24, 2015

After setting up a two-node replicated Gluster cluster to use as a web content backend, I began testing what would happen when one of the Gluster nodes went down. The web nodes access the Gluster cluster using the GlusterFS native client, and I expected them to quickly notice that one Gluster node was down and continue serving content from the healthy one, but that was not happening. My test simply involved rebooting one of the Gluster nodes, which came back online within about 30 seconds. However, my website was down for those entire 30 seconds.

I discovered that, by default, a Gluster client waits 42 seconds before flagging an unresponsive Gluster node as down. You would think this would be easy to change, but it turned out to be a bit trickier.

First, I came across the network.ping-timeout Gluster volume setting, which the documentation lists with a default of 42 seconds. On one of the Gluster nodes, I changed it to 5 seconds with the following command (gv0 is the Gluster volume):

gluster volume set gv0 network.ping-timeout 5
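One way to confirm the value was at least recorded on the volume is gluster volume info gv0, which lists changed settings under "Options Reconfigured" as key: value lines. The helper below is a hypothetical sketch (the name ping_timeout_from_info is made up for illustration) that pulls the network.ping-timeout value out of that output, assuming the option line has the form network.ping-timeout: 5.

```shell
# Hypothetical helper: read `gluster volume info` output on stdin and print the
# reconfigured network.ping-timeout value (prints nothing if it was never set).
ping_timeout_from_info() {
  awk '{ sub(/^[ \t]+/, "") }                 # tolerate leading indentation
       /^network\.ping-timeout:/ {
         sub(/^[^:]*:[ \t]*/, "")              # drop the "key: " prefix
         print
       }'
}

# Example, run on a Gluster node:
#   gluster volume info gv0 | ping_timeout_from_info
```

If this prints 5 but clients still take 42 seconds to fail over, the setting is stored on the volume but is not what the clients are acting on, which matches what I saw here.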

However, this had no noticeable effect on failover time.

Second, on both Gluster nodes, I tried a similar change in the main Gluster configuration file, /etc/glusterfs/glusterd.vol, by adding option network.ping-timeout 5 to the management volume:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option network.ping-timeout 5
#   option base-port 49152
end-volume

After restarting the glusterd service on both Gluster nodes, nothing had changed.

Finally, on both of the Gluster nodes, I found that I needed to add a specific volume configuration block for the Gluster volume in /etc/glusterfs/glusterd.vol and set option ping-timeout 5 inside it:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
#   option base-port 49152
end-volume

volume gv0
    type protocol/client
    option ping-timeout 5
end-volume
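Because the option only made a difference once it sat inside the gv0 block rather than the management block, it can be worth checking which block an option actually landed in before restarting glusterd. The sketch below is a hypothetical helper (vol_option is an illustrative name, not a Gluster tool) that assumes the simple volume NAME ... end-volume layout shown above and ignores commented-out lines.

```shell
# Hypothetical helper: print the value of OPTION inside the "volume NAME"
# block of a .vol file such as /etc/glusterfs/glusterd.vol.
# Usage: vol_option FILE NAME OPTION
vol_option() {
  awk -v vol="$2" -v opt="$3" '
    /^[ \t]*#/ { next }                             # skip commented-out lines
    $1 == "volume" { inside = ($2 == vol); next }   # entering a volume block
    $1 == "end-volume" { inside = 0; next }         # leaving a volume block
    inside && $1 == "option" && $2 == opt { print $3 }
  ' "$1"
}

# Example, run on each Gluster node:
#   vol_option /etc/glusterfs/glusterd.vol gv0 ping-timeout
```

With the final configuration, looking up ping-timeout in gv0 should print 5, while the same lookup in management prints nothing.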

After restarting the glusterd service on both Gluster nodes, I could reboot one of them while the other stayed healthy, and my website resumed serving content within about 5 seconds.
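To put a number on the outage rather than eyeballing it, a web node can poll the site once per second while a Gluster node reboots and report the longest run of failed requests. This is a minimal sketch under stated assumptions: longest_outage is a hypothetical helper, and the URL in the example is a placeholder for your own site.

```shell
# Hypothetical helper: run a probe command SAMPLES times, sleeping INTERVAL
# seconds between runs, and print the longest run of consecutive failures.
# With INTERVAL=1 the result is roughly the outage length in seconds.
# Usage: longest_outage SAMPLES INTERVAL COMMAND [ARGS...]
longest_outage() {
  local samples=$1 interval=$2
  shift 2
  local i=0 streak=0 longest=0
  while [ "$i" -lt "$samples" ]; do
    if "$@" >/dev/null 2>&1; then
      streak=0                      # probe succeeded: any outage has ended
    else
      streak=$((streak + 1))        # probe failed: extend the current outage
      if [ "$streak" -gt "$longest" ]; then longest=$streak; fi
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "$longest"
}

# Example (placeholder URL), started just before rebooting one Gluster node:
#   longest_outage 90 1 curl -fsS -o /dev/null --max-time 2 http://www.example.com/
```

The --max-time limit matters here: while the client is waiting out the ping timeout, requests tend to hang rather than fail, and without a limit a single stuck probe would hide the outage instead of counting it.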

