[Gluster-users] Long client hangs in replicated configuration with 2-nodes acting as own clients

Fri Nov 4 21:47:42 UTC 2011

Hello,

I have installed GlusterFS 3.2.4 on a pair of Red Hat Enterprise Linux 6.1
x86_64 machines with 2GB of memory. I am attempting to mirror a directory
full of content between the two servers, which also serve and update the
content through a webapp via Apache. My issue is that the client mount
points hang for 30 minutes or so if either node is brought down.

The volfile will be at the end of this e-mail.

I set up two bricks, one each on nodes server01 and server02, using ext4
with the acl mount option. The /etc/fstab entries on each server look like
this:

/dev/mapper/sysvg-brick01 /brick01              ext4    defaults,nosuid,acl        1 2

From one host, I configured them as a mirror and started the volume:

gluster volume create volume01 replica 2 transport tcp server01:/brick01 server02:/brick01
gluster volume start volume01

Then server01 and server02 each mount the volume from themselves via an
/etc/fstab entry:

localhost:/volume01     /glusterfs/vol01        glusterfs       defaults,_netdev,acl     0 0

This works: modifications inside /glusterfs/vol01 are seen by the other
host. However, when I reboot either server01 or server02, the client mount
point on the surviving node (/glusterfs/vol01) hangs until the rebooted
node comes back up. If that node never boots, the client mount point on the
surviving node hangs for 30 minutes. I have tried reducing frame-timeout to
10 seconds, to no avail.
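
For reference, a frame-timeout change of this kind is normally applied with
the volume set command. This is only a sketch, assuming the option key
network.frame-timeout, which is what would produce the
"option frame-timeout 10" lines in the volfile further down:

```shell
# Assumption: the option key is network.frame-timeout (value in seconds).
gluster volume set volume01 network.frame-timeout 10

# Show the volume's reconfigured options to confirm the change took effect.
gluster volume info volume01
```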

Also, once the rebooted server comes back online, it fails to mount
/glusterfs/vol01; the mount hangs, again for 30 minutes. A subsequent
remount succeeds. Cancelling the hung mount with umount -f /glusterfs/vol01
and then re-mounting also succeeds.
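
To put numbers on the hangs described above, a small probe along these
lines can be run on the surviving node while the other one is down. It is a
minimal sketch, not something from the setup itself: probe_mount is a
hypothetical helper name, it assumes GNU coreutils timeout and stat are
installed, and a stat stuck inside the kernel may not always be killable.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path answers a stat() call within
# a time limit. A stat failure for any reason (including a missing path)
# is reported as "unresponsive".
probe_mount() {
    path=$1
    limit=${2:-5}    # seconds to wait before giving up on the mount
    if timeout -s KILL "$limit" stat "$path" >/dev/null 2>&1; then
        echo "responsive"
    else
        echo "unresponsive"
    fi
}

# Default to the client mount point used in this post.
probe_mount "${1:-/glusterfs/vol01}" 5
```

Running it in a loop with a timestamp (for example from cron or a
while/sleep loop) would show when the mount stops answering and when it
recovers.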

Any ideas what I am doing wrong?

Here is the volfile, as logged in /var/log/glusterfs/glusterfs-vol01.log:

volume volume01-client-0
    type protocol/client
    option remote-host server01
    option remote-subvolume /brick01
    option transport-type tcp
    option frame-timeout 10
end-volume

volume volume01-client-1
    type protocol/client
    option remote-host server02
    option remote-subvolume /brick01
    option transport-type tcp
    option frame-timeout 10
end-volume

volume volume01-replicate-0
    type cluster/replicate
    subvolumes volume01-client-0 volume01-client-1
end-volume

volume volume01-write-behind
    type performance/write-behind
    subvolumes volume01-replicate-0
end-volume

volume volume01-read-ahead
    type performance/read-ahead
    subvolumes volume01-write-behind
end-volume

volume volume01-io-cache
    type performance/io-cache
    subvolumes volume01-read-ahead
end-volume

volume volume01-quick-read
    type performance/quick-read
    subvolumes volume01-io-cache
end-volume

volume volume01-stat-prefetch
    type performance/stat-prefetch
    subvolumes volume01-quick-read
end-volume

volume volume01
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes volume01-stat-prefetch
end-volume