[Gluster-users] dht_layout_dir_mismatch after simultaneously mkdir

kenji kondo kkay.jp at gmail.com
Tue Dec 4 08:04:15 UTC 2012


Hello Avati,

I got the backend xattr dumps, as shown below (a short decoding sketch follows them).
We can see that the gfid on one brick (g13) differs from the others.
I have run into the same situation several times so far.
Best regards
Kondo
------------------------

/backend/ai/anchor/3:

g10:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g11:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g12:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g13:trusted.gfid=0x93c1e7a2797a4b888e17d9b6582da365

g14:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g15:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g16:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g17:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g18:trusted.gfid=0xd1366c6a09c345e8bab26b7953ac6405

g10:trusted.glusterfs.dht=0x00000001000000003ffffffe5ffffffc

g11:trusted.glusterfs.dht=0x00000001000000005ffffffd7ffffffb

g12:trusted.glusterfs.dht=0x00000001000000007ffffffc9ffffffa

g13:trusted.glusterfs.dht=0x000000010000000000000000ffffffff

g14:trusted.glusterfs.dht=0x00000001000000009ffffffbbffffff9

g15:trusted.glusterfs.dht=0x0000000100000000bffffffadffffff8

g16:trusted.glusterfs.dht=0x0000000100000000dffffff9ffffffff

g17:trusted.glusterfs.dht=0x0000000100000000000000001ffffffe

g18:trusted.glusterfs.dht=0x00000001000000001fffffff3ffffffd



/backend/ai/anchor:

g10:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g11:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g12:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g13:trusted.gfid=0x6ee644c825dd4806b2680bfeb985a27d

g14:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g15:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g16:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g17:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g18:trusted.gfid=0x89a33935d5da4acd85586e075eca221d

g10:trusted.glusterfs.dht=0x00000001000000005555555471c71c6f

g11:trusted.glusterfs.dht=0x000000010000000071c71c708e38e38b

g12:trusted.glusterfs.dht=0x00000001000000008e38e38caaaaaaa7

g13:trusted.glusterfs.dht=0x0000000100000000aaaaaaa8c71c71c3

g14:trusted.glusterfs.dht=0x0000000100000000c71c71c4e38e38df

g15:trusted.glusterfs.dht=0x0000000100000000e38e38e0ffffffff

g16:trusted.glusterfs.dht=0x0000000100000000000000001c71c71b

g17:trusted.glusterfs.dht=0x00000001000000001c71c71c38e38e37

g18:trusted.glusterfs.dht=0x000000010000000038e38e3855555553



/backend/ai:

g10.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g11.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g12.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g13.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g14.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g15.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g16.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g17.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g18.gfid:trusted.gfid=0x258ba47b30ee41a984ceea1a491b1669

g10.gfid:trusted.glusterfs.dht=0x00000001000000008e38e38caaaaaaa7

g11.gfid:trusted.glusterfs.dht=0x0000000100000000aaaaaaa8c71c71c3

g12.gfid:trusted.glusterfs.dht=0x0000000100000000c71c71c4e38e38df

g13.gfid:trusted.glusterfs.dht=0x0000000100000000e38e38e0ffffffff

g14.gfid:trusted.glusterfs.dht=0x0000000100000000000000001c71c71b

g15.gfid:trusted.glusterfs.dht=0x00000001000000001c71c71c38e38e37

g16.gfid:trusted.glusterfs.dht=0x000000010000000038e38e3855555553

g17.gfid:trusted.glusterfs.dht=0x00000001000000005555555471c71c6f

g18.gfid:trusted.glusterfs.dht=0x000000010000000071c71c708e38e38b
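
For reference, each trusted.glusterfs.dht value above can be read as four
big-endian 32-bit words; the last two are the inclusive start and stop of
that brick's hash range, and the first two appear to be an entry count and
the hash type. A rough bash sketch (the decode_dht helper is only for
illustration):

decode_dht() {
    local v=${1#0x}
    echo "count=$((16#${v:0:8})) type=$((16#${v:8:8}))" \
         "start=$((16#${v:16:8})) stop=$((16#${v:24:8}))"
}
decode_dht 0x000000010000000000000000ffffffff  # g13 on anchor/3: start=0 stop=4294967295
decode_dht 0x00000001000000003ffffffe5ffffffc  # g10 on anchor/3: start=1073741822 stop=1610612732

Decoded this way, g13's layout for ai/anchor/3 claims the entire hash range
0x00000000-0xffffffff and overlaps the slices held by the other eight bricks,
which matches its mismatched gfid.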





On 2012/12/04, at 9:20, Anand Avati <anand.avati at gmail.com> wrote:



On Mon, Dec 3, 2012 at 5:14 AM, kenji kondo <kkay.jp at gmail.com> wrote:

> Hello Avati, thank you for your reply.
>
> I tried your suggestion on 3.2.7, but I could not test on 3.3.x because
> I don't have a 3.3.x installation.
> Unfortunately, similar problems occurred again, as follows:
>
> gluster> volume set vol22 performance.stat-prefetch off
> gluster> volume info vol22
>
> Volume Name: vol22
> Type: Distribute
> Status: Started
> Number of Bricks: 9
> Transport-type: tcp
> Bricks:
> Brick1: gluster10:/export22/brick
> Brick2: gluster11:/export22/brick
> Brick3: gluster12:/export22/brick
> Brick4: gluster13:/export22/brick
> Brick5: gluster14:/export22/brick
> Brick6: gluster15:/export22/brick
> Brick7: gluster16:/export22/brick
> Brick8: gluster17:/export22/brick
> Brick9: gluster18:/export22/brick
> Options Reconfigured:
> performance.stat-prefetch: off
>
>
> After this setting, I ran the same simulation on vol22.
> But strange directories were created, and I could not remove some of
> them, as shown below:
>
> $ ls -a ai/anchor/3
> .  ..
>
> $ rmdir ai/anchor/3
> rmdir: ai/anchor/3: No such file or directory
>
>
> then I found error messages:
>
> [2012-12-03 18:08:14.816313] E [client3_1-fops.c:2228:client3_1_lookup_cbk]
> 0-vol22-client-3: remote operation failed: Stale NFS file handle
> [2012-12-03 18:08:14.817196] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-4
> [2012-12-03 18:08:14.817258] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-0
> [2012-12-03 18:08:14.817322] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-2
> [2012-12-03 18:08:14.817367] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-1
> [2012-12-03 18:08:14.817398] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-5
> [2012-12-03 18:08:14.817430] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-8
> [2012-12-03 18:08:14.817460] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-7
> [2012-12-03 18:08:14.817506] W [dht-common.c:178:dht_lookup_dir_cbk]
> 0-vol22-dht: /test1130/ai/anchor: gfid different on vol22-client-6
> [2012-12-03 18:08:14.818865] E [client3_1-fops.c:2132:client3_1_opendir_cbk]
> 0-vol22-client-3: remote operation failed: No such file or directory
> [2012-12-03 18:08:14.819198] W [fuse-bridge.c:1016:fuse_unlink_cbk]
> 0-glusterfs-fuse: 1684950: RMDIR()
> /test1130/ai/anchor/3 => -1 (No such file or directory)
>
>
> I also found a strange dht layout with the getfattr command:
>
> $ sudo getfattr -d -m '.*' -n trusted.glusterfs.pathinfo ai/anchor/3
>
> trusted.glusterfs.pathinfo="(vol22-dht-layout (vol22-client-0 1073741822
> 1610612732) (vol22-client-1 1610612733 2147483643) (vol22-client-2 2147483644
> 2684354554)
>
>  (vol22-client-3 0 0)
>
> (vol22-client-4 2684354555 3221225465)
> (vol22-client-5 3221225466 3758096376) (vol22-client-6 3758096377 4294967295)
> (vol22-client-7 0 536870910) (vol22-client-8 536870911 1073741821))"
>
>
> vol22-client-3 covers 0 to 0? This looks incorrect.
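>
> As a quick check (a rough bash sketch using the ranges copied from the
> pathinfo above), the eight non-empty slices already cover the whole
> 32-bit hash space, so vol22-client-3 is left with an empty range, i.e.
> a hole in the layout:
>
> total=0
> for range in 0-536870910 536870911-1073741821 1073741822-1610612732 \
>              1610612733-2147483643 2147483644-2684354554 2684354555-3221225465 \
>              3221225466-3758096376 3758096377-4294967295; do
>     start=${range%-*}; stop=${range#*-}
>     total=$(( total + stop - start + 1 ))
> done
> echo $total   # 4294967296 = 2^32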
>
> The above problem is seen on all of our clients.
> I suspect the problem is related to locking (exclusive control) for
> concurrent mkdir, but I do not understand this phenomenon well enough.
>
> Do you have any idea?
> I will keep trying.
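>
> To be concrete about the simultaneous mkdir I mean: the simulation is
> roughly equivalent to many clients creating the same directory tree at
> the same time, something like this (a simplified sketch; the real mount
> point and job script differ):
>
> # run at about the same moment from several different client mounts
> mkdir -p /mnt/vol22/test1130/ai/anchor/3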
>
>
In such a state, can you get the backend xattr dumps of ai/anchor/ and
ai/anchor/3/ with the following commands run on ALL servers -

sh# getfattr -d -e hex -m . /backend/ai/anchor/
sh# getfattr -d -e hex -m . /backend/ai/anchor/3/
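
Something like the loop below would collect both dumps from every server in
one go (only a sketch: it assumes passwordless ssh, the gluster10..gluster18
hostnames from your brick list, and that /backend/ stands for each brick's
export directory):

for h in gluster1{0..8}; do
    echo "==== $h ===="
    ssh "$h" 'getfattr -d -e hex -m . /backend/ai/anchor/ /backend/ai/anchor/3/'
done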

Thanks,
Avati

