[Gluster-users] Questions about gluster rebalance

Shyam srangana at redhat.com
Wed Sep 10 14:36:41 UTC 2014


On 09/10/2014 03:27 AM, Paul Guo wrote:
> Hello,
>
> Recently I spent a bit of time understanding rebalance since I want to know its
> performance given that there could be more and more bricks to be added into
> my glusterfs volume and there will be more and more files and directories
> in the existing glusterfs volume. During the test I saw something which I'm
> really confused about.
>
> Steps:
>
> SW versions: glusterfs 3.4.4 + centos 6.5
> Initial Configuration: replica 2, lab1:/brick1 + lab2:/brick1
>
> fuse_mount it on /mnt
> cp -rf /sbin /mnt (~300+ files under /sbin)
> add two more bricks: lab1:/brick2 + lab2:/brick2.
> run gluster rebalance.
>
> 1) fix-layout only (e.g. gluster volume rebalance g1 fix-layout start)
>
> After rebalance is done (observed via "gluster volume rebalance g1
> status"),
> I found there is no file under lab1:/brick2/sbin. The hash ranges of
> new brick lab1:/brick2/sbin and old brick lab1:/brick1/sbin appear to
> be ok.
>
> [root@lab1 Desktop]# getfattr -dm. -e hex /brick2/sbin
> getfattr: Removing leading '/' from absolute path names
> # file: brick2/sbin
> trusted.gfid=0x35976c2034d24dc2b0639fde18de007d
> trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
>
> [root@lab1 Desktop]# getfattr -dm. -e hex /brick1/sbin
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/sbin
> trusted.gfid=0x35976c2034d24dc2b0639fde18de007d
> trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
>
> The question is: AFAIK, fix-layout would create "linkto" files
> (files with the "linkto" xattr and only the sticky bit set)
> for those files whose hash values belong
> to the new subvol, so there should have been some "linkto" files
> under lab1:/brick2, but there are none. Why?

fix-layout only fixes the layout, i.e., it spreads the layout to the newer
bricks (or bricks previously not participating in the layout). It does
not create the linkto files.

After fix-layout, if one performs a lookup on a file that should belong
to the newer brick as per the layout and the hash of its name, one can
see the linkto file appear on that brick.
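
To make "as per the layout" concrete, here is a rough Python sketch that
decodes the two trusted.glusterfs.dht values quoted above. It assumes the
16-byte value is four big-endian 32-bit words and that the last two words
are the start and stop of the hash range; the first two words are treated
as opaque header fields. The function names are made up for illustration
and are not part of gluster.

```python
import struct


def decode_dht_layout(hex_value):
    """Split a trusted.glusterfs.dht value into four big-endian 32-bit words.

    Assumption: the last two words are the start and stop of the hash range.
    The first two words are kept as opaque header fields.
    """
    text = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(text)
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(raw))
    word0, word1, start, stop = struct.unpack(">4I", raw)
    return {"word0": word0, "word1": word1, "start": start, "stop": stop}


def owns_hash(layout, name_hash):
    """True if a 32-bit file name hash falls inside the decoded range."""
    return layout["start"] <= name_hash <= layout["stop"]


# Values copied from the getfattr output quoted above.
brick1 = decode_dht_layout("0x0000000100000000000000007ffffffe")
brick2 = decode_dht_layout("0x00000001000000007fffffffffffffff")

for label, layout in (("brick1/sbin", brick1), ("brick2/sbin", brick2)):
    print("%s: 0x%08x - 0x%08x" % (label, layout["start"], layout["stop"]))
```

Under that assumption, brick1/sbin covers the lower half of the 32-bit
hash space and brick2/sbin the upper half, which matches the observation
that the ranges look ok. Computing the hash of a file name is not shown
here.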

Hope this explains (1).

>
> 2) fix-layout + data_migrate (e.g. gluster volume rebalance g1 start)
>
> After migration is done, I saw linkto files under brick2/sbin.
> There are 300+ files in total under the system /sbin. Under brick2/sbin,
> I found that all 300+ files are there, either migrated or as linkto files.
>
> -rwxr-xr-x 2 root root   17400 Sep 10 12:02 vmcore-dmesg
> ---------T 2 root root       0 Sep 10 12:03 weak-modules
> ---------T 2 root root       0 Sep 10 12:03 wipefs
> -rwxr-xr-x 2 root root  295656 Sep 10 12:02 xfsdump
> -rwxr-xr-x 2 root root  510000 Sep 10 12:02 xfs_repair
> -rwxr-xr-x 2 root root  348088 Sep 10 12:02 xfsrestore
>
> And under brick1/sbin, those migrated files are gone as expected.
> There are nearly 150 files under brick1/sbin.
>
> This confuses me since creating those linkto files seems to
> be unnecessary, at least for files whose hash values do not belong
> to the subvol. (My understanding is that if a file's hash value is
> in the range of a subvol then it will be stored in that subvol.)

Can you check if a lookup of the file post rebalance clears up these 
_stale_ linkto files?
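
One way to check this is to count the linkto-looking entries on the brick
before and after the lookup. The Python sketch below goes only by what
the ls output above shows for such files (regular file, zero bytes, no
permission bits other than the sticky bit). It does not read the linkto
xattr, so treat a match as a candidate only. The function names are
hypothetical.

```python
import os
import stat


def looks_like_linkto(mode, size):
    """True for a zero-byte regular file whose only mode bit is the sticky bit."""
    return (
        stat.S_ISREG(mode)
        and stat.S_IMODE(mode) == stat.S_ISVTX
        and size == 0
    )


def count_linkto_candidates(directory):
    """Count entries directly under 'directory' that look like linkto files."""
    count = 0
    for entry in os.scandir(directory):
        st = entry.stat(follow_symlinks=False)
        if looks_like_linkto(st.st_mode, st.st_size):
            count += 1
    return count
```

For example, call count_linkto_candidates("/brick2/sbin") on the brick
host, do the lookup from the fuse mount (e.g. stat the files under
/mnt/sbin), and call it again. The same getfattr command used above on an
individual file shows whether it actually carries the linkto xattr.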

How did you compute the hash of these files and decide that they do not
belong to the new brick (i.e., brick2)? I computed them on my end and you
are right (based on the layout you presented above), but I am curious as
to how you arrived at the same conclusion.

Rebalance could choose not to move files but just create the linkto
files, based on space usage between the source and target bricks, etc. I
am not saying this is what happened here, but it is a possibility.

>
> I quickly looked at the code. gf_defrag_start_crawl() appears to
> be the function for this operation. I do see code that does file migration
> in that code path, but debugging shows that those "linkto" files
> do not seem to be created by gf_defrag_start_crawl(). I'm not that familiar with
> the code details and the theory, so I'm not sure what created those
> "linkto" files and why they were created.

I will keep this part brief: dht_linkfile_create does this, and it would
mostly happen during lookup.

Shyam

