Wednesday, February 25, 2015
I recently set up a two-node replicated Gluster cluster on CentOS 6.6 running Gluster version 3.6.2-1. I set up a private network specifically for Gluster replication and communication, and bound the Gluster daemons to this network.
I bound the Gluster daemons by adding an option transport.socket.bind-address line to the main Gluster configuration file at /etc/glusterfs/glusterd.vol on each Gluster node (on the second Gluster node, 192.168.3.1 would be changed to 192.168.3.2):
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option transport.socket.bind-address 192.168.3.1
#   option base-port 49152
end-volume
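After editing the file, glusterd has to be restarted for the new bind address to take effect. As a quick sanity check (my addition here, not part of the original steps — the service name is the standard CentOS 6 one), the management port 24007 should then be listening on the private address rather than 0.0.0.0:

```shell
# Restart glusterd so the new bind-address takes effect (CentOS 6 service
# name; falls back to a reminder message if the service command fails here).
service glusterd restart 2>/dev/null || echo "restart glusterd manually"

# glusterd's management port is 24007; once bound, it should show as
# 192.168.3.1:24007 (or .2 on the second node), not 0.0.0.0:24007.
LISTEN=$(netstat -tln 2>/dev/null | grep ':24007' \
    || echo "nothing listening on 24007 (glusterd down or unbound here)")
echo "$LISTEN"
```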
Everything was working fine until I began playing with Gluster heal.
When I ran the command

gluster volume heal gv0

it output the following, and everything looked normal:

Launching heal operation to perform index self heal on volume gv0 has been successful
Use heal info commands to check status
However, when I ran the command

gluster volume heal gv0 info

the following was displayed:

gv0: Not able to fetch volfile from glusterd
Volume heal failed
Obviously, something was wrong. The Gluster heal log file for that particular Gluster volume, located at /var/log/glusterfs/glfsheal-gv0.log, contained the following:
[2015-02-24 16:31:53.788775] E [socket.c:2267:socket_connect_finish] 0-gfapi: connection to 127.0.0.1:24007 failed (Connection refused)
Gluster heal was trying to connect to port 24007 on localhost, but, because I had bound the Gluster daemon to the 192.168.3.0 network, that connection was refused. This happens because the Gluster heal client has localhost hardcoded. A very old email thread explains why this is the case.
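You can reproduce the heal client's failing step without any Gluster tooling at all. This sketch (my addition, not part of the original troubleshooting) uses bash's /dev/tcp pseudo-device to attempt the same 127.0.0.1:24007 connection the glfsheal client makes:

```shell
# Attempt the same TCP connection the glfsheal client makes: 127.0.0.1:24007.
# With glusterd bound only to 192.168.3.1, this connection is refused.
if (exec 3<>/dev/tcp/127.0.0.1/24007) 2>/dev/null; then
    RESULT="glusterd reachable on localhost:24007"
else
    RESULT="localhost:24007 refused - matches the glfsheal log error"
fi
echo "$RESULT"
```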
That same email thread suggests running

gluster --remote-host=<IP of Gluster Node> volume heal gv0 info

but this yielded the same error as above.
The only fix I found was to unbind the Gluster daemon by removing the transport.socket.bind-address line from /etc/glusterfs/glusterd.vol on each node.
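The removal can be scripted. The sketch below works against a throwaway copy of the file by default so it is safe to dry-run; point CONF at the real /etc/glusterfs/glusterd.vol on a node to actually apply it, and restart glusterd afterwards:

```shell
# Work on a copy by default; set CONF=/etc/glusterfs/glusterd.vol on a
# real node to actually apply the fix.
CONF="${CONF:-/tmp/glusterd.vol}"

# Seed the copy with a minimal stand-in config if it does not exist yet.
if [ ! -f "$CONF" ]; then
    printf '%s\n' \
        'volume management' \
        '    type mgmt/glusterd' \
        '    option transport.socket.bind-address 192.168.3.1' \
        'end-volume' > "$CONF"
fi

# Drop the bind-address line so glusterd goes back to listening on all
# interfaces.
sed -i '/option transport.socket.bind-address/d' "$CONF"

# On the real node, follow up with: service glusterd restart
```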
After that, running the command

gluster volume heal gv0 info

displayed the proper output:

Brick gluster1.example.com:/export/xvdb1/brick/
Number of entries: 0
Brick gluster2.example.com:/export/xvdb1/brick/
Number of entries: 0