[Gluster-users] Self heal problem

Marcus Wellhardh wellhardh at roxen.com
Fri Nov 29 13:05:33 UTC 2013


Hi,

I have a glusterfs volume replicated on three nodes. I am planning to use
the volume as storage for VMware ESXi machines over NFS. The reason for
using three nodes is to be able to configure quorum and avoid
split-brains. However, during my initial testing, when I intentionally
and gracefully restarted the node "ned", a split-brain/self-heal error
occurred.
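
For reference, the server-side quorum options visible in the volume info
below were set roughly like this (from memory; depending on the GlusterFS
version, cluster.server-quorum-ratio may have to be set on "all" rather
than on a single volume):

  gluster volume set gv0 cluster.server-quorum-type server
  gluster volume set gv0 cluster.server-quorum-ratio 51%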

The logs on "todd" and "rod" show:

  [2013-11-29 12:34:14.614456] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gv0-replicate-0: open of <gfid:09b6d1d7-e583-4cee-93a4-4e972346ade3> failed on child gv0-client-2 (No such file or directory)

The reason is probably that the file was deleted and recreated with the
same file name while the node was offline, i.e. a new inode and thus a
new gfid.

Is this expected? Is it possible to configure the volume to
automatically handle this?
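
If it has to be handled manually, I assume the fix is something like the
following, run directly on the brick that still holds the stale copy (ned
in this case), assuming the standard .glusterfs/<xx>/<yy>/<gfid> hard-link
layout; please correct me if I got this wrong:

  # Remove the stale file and its .glusterfs hard link on ned, then
  # trigger a heal. The gfid is the one getfattr reports on ned below.
  rm /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  rm /data/gv0/.glusterfs/76/ca/76caf49a-25d7-4ebd-b711-a562412bee43
  gluster volume heal gv0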

The same problem happens every time I test a restart. It looks like
VMware is constantly creating new lock files for the vSphere-HA
directory.

Below you will find various information about the glusterfs volume. I
have also attached the full logs for all three nodes. 

[root@todd ~]# gluster volume info
 
Volume Name: gv0
Type: Replicate
Volume ID: a847a533-9509-48c5-9c18-a40b48426fbc
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: todd-storage:/data/gv0
Brick2: rod-storage:/data/gv0
Brick3: ned-storage:/data/gv0
Options Reconfigured:
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%

[root@todd ~]# gluster volume heal gv0 info
Gathering Heal info on volume gv0 has been successful

Brick todd-storage:/data/gv0
Number of entries: 2
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb

Brick rod-storage:/data/gv0
Number of entries: 2
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb

Brick ned-storage:/data/gv0
Number of entries: 0

[root@todd ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000002810000000100000000
trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3

[root@todd ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        	Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 1191        Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:38:36.285091183 +0100
Modify: 2013-11-29 13:26:24.668822831 +0100
Change: 2013-11-29 13:26:24.668822831 +0100

[root@rod ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000002810000000100000000
trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3

[root@rod ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        	Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 1558        Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:38:36.284671510 +0100
Modify: 2013-11-29 13:26:24.668985155 +0100
Change: 2013-11-29 13:26:24.669985185 +0100

[root@ned ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
getfattr: Removing leading '/' from absolute path names
# file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.afr.gv0-client-2=0x000000000000000000000000
trusted.gfid=0x76caf49a25d74ebdb711a562412bee43

[root@ned ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
  File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
  Size: 84        	Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d	Inode: 4545        Links: 2
Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-11-29 11:34:45.199330329 +0100
Modify: 2013-11-29 11:37:03.773330311 +0100
Change: 2013-11-29 11:37:03.773330311 +0100
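
For what it is worth, this is how I read the trusted.afr changelog values
above, assuming the usual layout of three 4-byte pending counters (data,
metadata, entry):

  x=000002810000000100000000
  printf 'data=%d metadata=%d entry=%d\n' "0x${x:0:8}" "0x${x:8:8}" "0x${x:16:8}"
  # prints: data=641 metadata=1 entry=0

So todd and rod both record pending operations against gv0-client-2
(ned), while ned's copy of the lock file has a different gfid, which
matches the "No such file or directory" error in the log.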

Regards,
Marcus Wellhardh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterfs-logs.tgz
Type: application/x-compressed-tar
Size: 79669 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131129/2d3d45cc/attachment.bin>

