Wednesday, February 25, 2015
I recently set up a two-node replicated Gluster cluster on CentOS 6.6 running Gluster version 3.6.2-1. I set up a private network specifically for Gluster replication and communication, and bound the Gluster daemons to this network.
I bound the Gluster daemons by adding an option transport.socket.bind-address line to the main Gluster configuration file at /etc/glusterfs/glusterd.vol on each Gluster node (on the second Gluster node, 192.168.3.1 would be changed to 192.168.3.2):
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option transport.socket.bind-address 192.168.3.1
#   option base-port 49152
end-volume
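After editing the file, glusterd has to be restarted for the new bind address to take effect. As a quick sanity check (my addition here, not part of the original steps — the service name is the standard CentOS 6 one), the management port 24007 should then be listening on the private address rather than 0.0.0.0:

```shell
# Restart glusterd so the new bind-address takes effect (CentOS 6 service
# name; falls back to a reminder message if the service command fails here).
service glusterd restart 2>/dev/null || echo "restart glusterd manually"

# glusterd's management port is 24007; once bound, it should show as
# 192.168.3.1:24007 (or .2 on the second node), not 0.0.0.0:24007.
LISTEN=$(netstat -tln 2>/dev/null | grep ':24007' \
    || echo "nothing listening on 24007 (glusterd down or unbound here)")
echo "$LISTEN"
```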
Everything was working fine until I began playing with Gluster heal.
When I ran the command

gluster volume heal gv0

it output the following, and everything looked normal:

Launching heal operation to perform index self heal on volume gv0 has been successful
Use heal info commands to check status
However, when I ran the command

gluster volume heal gv0 info

the following was displayed:

gv0: Not able to fetch volfile from glusterd
Volume heal failed
Obviously, something was wrong. The Gluster heal log file for that particular Gluster volume, located at /var/log/glusterfs/glfsheal-gv0.log, contained the following:
[2015-02-24 16:31:53.788775] E [socket.c:2267:socket_connect_finish] 0-gfapi: connection to 127.0.0.1:24007 failed (Connection refused)
Gluster heal was trying to connect to port 24007 on localhost, but, because I had bound the Gluster daemon to the 192.168.3.0 network, that connection was refused. This happens because the Gluster heal client has localhost hardcoded. A very old email thread explains why this is the case.
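You can reproduce the heal client's failing step without any Gluster tooling at all. This sketch (my addition, not part of the original troubleshooting) uses bash's /dev/tcp pseudo-device to attempt the same 127.0.0.1:24007 connection the glfsheal client makes:

```shell
# Attempt the same TCP connection the glfsheal client makes: 127.0.0.1:24007.
# With glusterd bound only to 192.168.3.1, this connection is refused.
if (exec 3<>/dev/tcp/127.0.0.1/24007) 2>/dev/null; then
    RESULT="glusterd reachable on localhost:24007"
else
    RESULT="localhost:24007 refused - matches the glfsheal log error"
fi
echo "$RESULT"
```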
That same email thread suggests running

gluster --remote-host=<IP of Gluster Node> volume heal gv0 info

but this yielded the same error as above.
The only fix I found was to unbind the Gluster daemon by removing the transport.socket.bind-address line from /etc/glusterfs/glusterd.vol on each node.
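The removal can be scripted. The sketch below works against a throwaway copy of the file by default so it is safe to dry-run; point CONF at the real /etc/glusterfs/glusterd.vol on a node to actually apply it, and restart glusterd afterwards:

```shell
# Work on a copy by default; set CONF=/etc/glusterfs/glusterd.vol on a
# real node to actually apply the fix.
CONF="${CONF:-/tmp/glusterd.vol}"

# Seed the copy with a minimal stand-in config if it does not exist yet.
if [ ! -f "$CONF" ]; then
    printf '%s\n' \
        'volume management' \
        '    type mgmt/glusterd' \
        '    option transport.socket.bind-address 192.168.3.1' \
        'end-volume' > "$CONF"
fi

# Drop the bind-address line so glusterd goes back to listening on all
# interfaces.
sed -i '/option transport.socket.bind-address/d' "$CONF"

# On the real node, follow up with: service glusterd restart
```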
After that, running the command

gluster volume heal gv0 info

displayed the proper output:

Brick gluster1.example.com:/export/xvdb1/brick/
Number of entries: 0
Brick gluster2.example.com:/export/xvdb1/brick/
Number of entries: 0