[Gluster-users] Self heal problem

Marcus Wellhardh wellhardh at roxen.com
Tue Dec 3 08:01:36 UTC 2013


Hi,

I did a trivial test to verify my delete/recreate theory (commands sketched below):

  1) File exists on all nodes.
  2) One node is powered down.
  3) File is deleted and recreated with same filename.
  4) Failing node is restarted.
  5) Self heal handled the recreated file correctly.
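
For reference, the test was driven with commands along these lines; the
/mnt/gv0 client mount point and the ssh step are specific to my setup:

  echo v1 > /mnt/gv0/testfile      # 1) file replicated to all nodes
  ssh root@ned poweroff            # 2) one node powered down
  rm /mnt/gv0/testfile             # 3) delete and recreate under
  echo v2 > /mnt/gv0/testfile      #    the same filename
  # 4) power the node back on, then verify:
  gluster volume heal gv0 info     # 5) the entry heals cleanly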

GlusterFS handled the above scenario perfectly. So the question is why
self heal fails on the vSphere-HA lock file. Does anyone have any
troubleshooting ideas?
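
For the record, this is roughly what I am looking at so far; <lockfile>
stands for the full .lck path quoted further down:

  gluster volume heal gv0 info
  gluster volume heal gv0 info split-brain
  getfattr -n trusted.gfid -e hex /data/gv0/<lockfile>   # on each brick
  tail -f /var/log/glusterfs/glustershd.log              # self-heal daemon log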

I am using:

  glusterfs-3.4.1-3.el6.x86_64
  CentOS release 6.4
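
Unless there is a better way, the workaround I am considering is the
usual manual fix for a gfid mismatch: remove the stale copy and its
.glusterfs hard link from the brick that missed the recreate (ned,
judging by the older mtime and the pending xattrs on todd/rod below),
then let self heal copy the file back:

  # on ned; <lockfile> as above, 76caf49a-... is ned's old gfid
  rm /data/gv0/<lockfile>
  rm /data/gv0/.glusterfs/76/ca/76caf49a-25d7-4ebd-b711-a562412bee43
  gluster volume heal gv0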

Regards,
Marcus

On Fri, 2013-11-29 at 14:05 +0100, Marcus Wellhardh wrote: 
> Hi,
> 
> I have a GlusterFS volume replicated across three nodes. I am planning to
> use the volume as storage for VMware ESXi machines over NFS. The reason
> for using three nodes is to be able to configure quorum and avoid
> split-brains. However, during my initial testing, when I intentionally
> and gracefully restarted the node "ned", a split-brain/self-heal error
> occurred.
> 
> The log on "todd" and "rod" gives:
> 
>   [2013-11-29 12:34:14.614456] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gv0-replicate-0: open of <gfid:09b6d1d7-e583-4cee-93a4-4e972346ade3> failed on child gv0-client-2 (No such file or directory)
> 
> The reason is probably that the file was deleted and recreated with the
> same file name while the node was offline, i.e. a new inode and thus a
> new gfid.
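> 
> If I understand the internals correctly, self heal opens files by gfid,
> which is resolved via the .glusterfs hard-link tree on each brick. That
> would explain the ENOENT above: the new gfid simply has no entry on the
> brick that was offline, e.g.:
> 
>   # on ned this path should not exist, hence "No such file or directory":
>   ls /data/gv0/.glusterfs/09/b6/09b6d1d7-e583-4cee-93a4-4e972346ade3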
> 
> Is this expected? Is it possible to configure the volume to
> automatically handle this?
> 
> The same problem happens every time I test a restart. It looks like
> VMware is constantly creating new lock files in the vSphere-HA
> directory.
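> 
> To confirm that churn, something like inotifywait (from inotify-tools)
> run against one of the bricks should show the create/delete cycle:
> 
>   inotifywait -m -r -e create,delete /data/gv0/production-cluster/.vSphere-HA/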
> 
> Below you will find various information about the GlusterFS volume. I
> have also attached the full logs for all three nodes.
> 
> [root@todd ~]# gluster volume info
>  
> Volume Name: gv0
> Type: Replicate
> Volume ID: a847a533-9509-48c5-9c18-a40b48426fbc
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: todd-storage:/data/gv0
> Brick2: rod-storage:/data/gv0
> Brick3: ned-storage:/data/gv0
> Options Reconfigured:
> cluster.server-quorum-type: server
> cluster.server-quorum-ratio: 51%
> 
> [root@todd ~]# gluster volume heal gv0 info
> Gathering Heal info on volume gv0 has been successful
> 
> Brick todd-storage:/data/gv0
> Number of entries: 2
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> 
> Brick rod-storage:/data/gv0
> Number of entries: 2
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware
> /production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> 
> Brick ned-storage:/data/gv0
> Number of entries: 0
> 
> [root@todd ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> getfattr: Removing leading '/' from absolute path names
> # file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.afr.gv0-client-2=0x000002810000000100000000
> trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3
> 
> [root@todd ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
>   File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
>   Size: 84        	Blocks: 8          IO Block: 4096   regular file
> Device: fd03h/64771d	Inode: 1191        Links: 2
> Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2013-11-29 11:38:36.285091183 +0100
> Modify: 2013-11-29 13:26:24.668822831 +0100
> Change: 2013-11-29 13:26:24.668822831 +0100
> 
> [root@rod ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> getfattr: Removing leading '/' from absolute path names
> # file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.afr.gv0-client-2=0x000002810000000100000000
> trusted.gfid=0x09b6d1d7e5834cee93a44e972346ade3
> 
> [root@rod ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
>   File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
>   Size: 84        	Blocks: 8          IO Block: 4096   regular file
> Device: fd03h/64771d	Inode: 1558        Links: 2
> Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2013-11-29 11:38:36.284671510 +0100
> Modify: 2013-11-29 13:26:24.668985155 +0100
> Change: 2013-11-29 13:26:24.669985185 +0100
> 
> [root@ned ~]# getfattr -m . -d -e hex /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> getfattr: Removing leading '/' from absolute path names
> # file: data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.afr.gv0-client-2=0x000000000000000000000000
> trusted.gfid=0x76caf49a25d74ebdb711a562412bee43
> 
> [root@ned ~]# stat /data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb
>   File: `/data/gv0/production-cluster/.vSphere-HA/FDM-DA596AD1-4A6C-4571-A3C8-2114B4FF61EA-5034-b6e1d26-vmware/.lck-5e711126a297a6bb'
>   Size: 84        	Blocks: 8          IO Block: 4096   regular file
> Device: fd03h/64771d	Inode: 4545        Links: 2
> Access: (0775/-rwxrwxr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2013-11-29 11:34:45.199330329 +0100
> Modify: 2013-11-29 11:37:03.773330311 +0100
> Change: 2013-11-29 11:37:03.773330311 +0100
> 
> Regards,
> Marcus Wellhardh