[Gluster-users] strange error hangs any access to gluster mount

Jeff Darcy jdarcy at redhat.com
Mon Mar 28 16:52:35 UTC 2011


On 03/28/2011 12:43 PM, Burnash, James wrote:
> I am receiving an error on a client trying to access a gluster mount
> (/pfs2, in this case).
> 
> [2011-03-28 12:26:17.897887] I
> [dht-layout.c:588:dht_layout_normalize] pfs-ro1-dht: found anomalies
> in /. holes=1 overlaps=2
> 
> This is seen on the client in /var/log/glusterfs/pfs2.log, the log
> for the mount point associated with that storage.
> 
> All other clients accessing the same storage do not have the hanging
> symptom, and have no such entry in their logs.
> 
> One possibly helpful note - this node worked fine until I upgraded
> the client from 3.1.1-1 to 3.1.3-1 on the x86_64 architecture,
> running CentOS 5.2. Even after I completely uninstalled GlusterFS
> from this node and reinstalled 3.1.1-1, the problem persisted.
> 
> Here is the RPM info:
> 
> root at jc1lnxsamm33:~# rpm -qa fuse
> fuse-2.7.4-8.el5.x86_64
> root at jc1lnxsamm33:~# rpm -qa "glusterfs*"
> glusterfs-fuse-3.1.1-1.x86_64
> glusterfs-core-3.1.1-1.x86_64
> glusterfs-debuginfo-3.1.1-1.x86_64
> 
> Servers are 4 Distributed-Replicate machines running CentOS 5.5 and
> GlusterFS 3.1.3-1.
> 
> Volume Name: pfs-ro1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 20 x 2 = 40
> Transport-type: tcp
> Bricks:
> Brick1: jc1letgfs17-pfs1:/export/read-only/g01
> Brick2: jc1letgfs18-pfs1:/export/read-only/g01
> Brick3: jc1letgfs17-pfs1:/export/read-only/g02
> Brick4: jc1letgfs18-pfs1:/export/read-only/g02
> ...
> Brick35: jc1letgfs14-pfs1:/export/read-only/g08
> Brick36: jc1letgfs15-pfs1:/export/read-only/g08
> Brick37: jc1letgfs14-pfs1:/export/read-only/g09
> Brick38: jc1letgfs15-pfs1:/export/read-only/g09
> Brick39: jc1letgfs14-pfs1:/export/read-only/g10
> Brick40: jc1letgfs15-pfs1:/export/read-only/g10
> Options Reconfigured:
> performance.stat-prefetch: on
> performance.cache-size: 2GB
> network.ping-timeout: 10
> 
> Any help greatly appreciated.

Can you execute the following command on each of the brick roots?

	 getfattr -d -e hex -n trusted.glusterfs.dht $brick_root
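
To gather that from every brick in one pass, here is a minimal sketch.
It assumes the four servers named in the volume info each export bricks
g01 through g10 under /export/read-only; since the brick list above is
partly elided, adjust the host/path pairs to match your actual
"gluster volume info" output:

	for host in jc1letgfs14-pfs1 jc1letgfs15-pfs1 \
	            jc1letgfs17-pfs1 jc1letgfs18-pfs1; do
	  for n in $(seq -w 1 10); do
	    brick=/export/read-only/g$n
	    echo "== $host:$brick =="
	    # --absolute-names suppresses the "Removing leading '/'" warning
	    ssh $host "getfattr --absolute-names -d -e hex \
	        -n trusted.glusterfs.dht $brick"
	  done
	done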

That should give a clearer picture of what the layouts look like, and
what those gaps/overlaps are.  How they happened is a bit of another
story.  I see this kind of thing pretty often, but I know it's because
of some Weird Stuff (tm) I do in CloudFS.  I'm not aware of any bugs
etc. that would cause this in other contexts.
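
If you want to eyeball the anomalies yourself: as I understand the
on-disk format, that xattr value is 16 bytes -- count, hash type, and
then the start and end of the brick's hash range, each a 32-bit
big-endian word.  Assuming you have saved the getfattr output to a
(hypothetical) layout-dump.txt, something like this sorts the ranges so
holes and overlaps stand out:

	# pull the last two 8-hex-digit words (start/stop), sort by start
	grep -h 'trusted.glusterfs.dht=' layout-dump.txt | \
	  awk -F 0x '{ printf "start=%s stop=%s\n", substr($2,17,8), substr($2,25,8) }' | \
	  sort

In a healthy layout the sorted ranges tile 00000000 through ffffffff,
with each start exactly one past the previous stop; a gap between
consecutive ranges is a hole, and two ranges covering the same hashes
are an overlap.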


