[Gluster-users] [EXT] link files not being created

Xavi Hernandez jahernan at redhat.com
Wed Oct 19 09:08:40 UTC 2022


On Tue, Oct 18, 2022 at 12:22 PM Stefan Solbrig <stefan.solbrig at ur.de>
wrote:

> Hi Xavi,
>
> Hi Stefan,
>
> On Tue, Oct 18, 2022 at 10:34 AM Stefan Solbrig <stefan.solbrig at ur.de>
> wrote:
>
>> Hi Xavi,
>>
>> On Mon, Oct 17, 2022 at 1:03 PM Stefan Solbrig <stefan.solbrig at ur.de>
>> wrote:
>>
>>> Dear all,
>>>
>>> I was doing some testing regarding GlusterFS link files (as they are
>>> created by a "move" operation). According to this document:
>>> https://www.gluster.org/glusterfs-algorithms-distribution/  if a link
>>> file is missing, it should be created after accessing the file.
>>> However, I don't see this behaviour.  If I delete (by hand) a link file
>>> on the brick, the file is still accessible, but the link file is never
>>> recreated. I can do an "open" or a "stat" on the file without getting an
>>> error, but the link file is not created.
>>> Is this the intended behaviour? Or am I misunderstanding the
>>> above-mentioned document?
>>>
>>
>> You shouldn't access or modify the backend filesystems manually; you can
>> accidentally create unexpected problems if you don't fully understand what
>> you are doing.
>>
>> That said, access to the file is most probably still working because
>> Gluster is using its cached information to locate it. If the client
>> mount is restarted, the file probably won't be accessible anymore unless
>> you disable the "lookup-optimize" option (and doing so should recreate the
>> link file).
>>
>> Regards,
>>
>> Xavi
>>
>>
>> Thanks for the quick reply!  Maybe I should explain my motivation for the
>> above-mentioned experiments a bit better. I have a large production system
>> running GlusterFS with almost 5 PB of data (in approx. 100G of inodes). It's
>> a distributed-only system (no sharding, not dispersed).  In this system,
>> the users sometimes experience the problem that they cannot delete a
>> seemingly empty directory.  The cause of this problem is that the
>> directory contains leftover link files, i.e. dht link files whose target
>> is gone. I haven't yet identified why this happens, and I don't have a
>> method to provoke this error (otherwise I would have mentioned it on this
>> list already).
>>
>
> What version of Gluster are you using? If I remember correctly, there was
> a fix in 3.10.2 (and some other following patches) to delete stale link
> files when deleting empty directories, to avoid precisely this problem.
> Recently there have also been some patches to avoid leaving some of those
> stale entries behind.
>
> If you are still using 3.x, I would recommend upgrading to a newer
> version, where many of these issues are already fixed.
>
>
> I'm using 9.4 for the servers, but my client (fuse) is still on 6.0. I know
> that's not optimal and I hope to change this soon, migrating everything to
> 9.6.
>

In that case you shouldn't have had stale link files during rmdir. Were all
the files you removed link files with permissions set to "---------T", size 0
and an xattr named "trusted.glusterfs.dht.linkto"?
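
In case it helps, here is a rough sketch (not an official tool; /data/brick is
just a placeholder for the brick root) to list candidate link files on a brick
and show where each one points:

# link files are 0-byte files whose mode is only the sticky bit (---------T)
find /data/brick -path '*/.glusterfs' -prune -o \
     -type f -perm 1000 -size 0 -print | while read -r f; do
    getfattr --absolute-names -n trusted.glusterfs.dht.linkto -e text "$f" 2>/dev/null
done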


>
>
>>  But my quick & dirty fix is to delete these leftover link files by
>> hand.  (These leftover link files are not being cleaned up by a
>> "rebalance".)
>>
>
> If you only remove the file, you are leaving some data behind that should
> also be removed. Each file is associated with an entry inside
> .glusterfs/xx/yy in the brick, called the gfid. This entry has the format of
> a uuid and can be determined by reading (in hex) the "trusted.gfid" xattr of
> the file you are going to delete:
>
> # getfattr -n trusted.gfid -e hex <file>
>
>
> If you manually remove files, you should also remove the gfid entry.
>
>
> Yes, I'm aware of these files. Once I remove the (named) link file, the
> .glusterfs/xx/yy/.... entries will be the ones that have zero size and no
> other hard link. As far as I understand, every file on the bricks has a hard
> link to .glusterfs/xx/yy/... with the full name representing its gfid.  I
> tend to remove these as well.
>

Correct, except for directories, which are represented by a symlink with a
single hardlink.
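
For illustration only (a sketch; <brick> stands for the brick root), the file
discussed below, whose trusted.gfid is 0x6155412f6ade4009bcb92d839c2ad8b3,
maps to the following backend entry:

# healthy regular file: the gfid entry is an extra hard link, so the link
# count is >= 2 (it drops to 1 once the named file is removed by hand)
stat <brick>/.glusterfs/61/55/6155412f-6ade-4009-bcb9-2d839c2ad8b3
# a directory's gfid entry, by contrast, is a symlink with link count 1
ls -l <brick>/.glusterfs/xx/yy/<gfid-of-a-directory>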


>
>
>> The reason for my experiments with link files is: what happens if, for
>> some reason, I accidentally delete a link file whose target still exists?
>>
>> In the experiments (not on the production system) I also tried unmounting
>> and remounting the system, and I already tried setting "lookup-optimize =
>> off". It doesn't affect the outcome of the experiments.
>>
>
> If after remounting the volume you are still able to access the file but
> the link file is not created, then it means that it's not needed. Maybe it
> was one of those stale link files.
>
>
> Not really... That was exactly the case in the experiment, where I tried to
> delete the link file and the corresponding .glusterfs/xx/yy entry, stopped
> the volume, unmounted it, restarted the volume, remounted it, but the link
> file is still not being recreated.
>

If the file is accessible after remounting and the link file is not
created, it means that dht doesn't need it. It could be a leftover from a
previous operation (rename, rebalance, add-brick, remove-brick ...).
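
For reference (just a sketch; <volname> is a placeholder), the option
mentioned earlier can be checked and toggled from the CLI:

# gluster volume get <volname> cluster.lookup-optimize
# gluster volume set <volname> cluster.lookup-optimize off

With lookup-optimize enabled, a lookup that misses on the hashed brick is not
retried on the other bricks, which is why a missing link file can make a file
unreachable once the cached location is gone.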


> Can you give me one example of those link files (I need the name) and the
> trusted.glusterfs.dht xattr of the parent directory from all bricks?
>
> # getfattr -n trusted.glusterfs.dht -e hex <path/to/directory>
>
>
> Regards,
>
> Xavi
>
>
> Here's one of the stale files:
>
> [root@glubs-01 testvol]# getfattr -d -m. -e hex
> /gl/lv1lucuma/glurchbrick/scratch/analysis/CLS/N302/N302r001/run11/XMLOUT/N302r001n631_sto100.out.xml
> getfattr: Removing leading '/' from absolute path names
> # file:
> gl/lv1lucuma/glurchbrick/scratch/analysis/CLS/N302/N302r001/run11/XMLOUT/N302r001n631_sto100.out.xml
> trusted.gfid=0x6155412f6ade4009bcb92d839c2ad8b3
>
> trusted.gfid2path.428e23fc0d37fc71=0x33343536636634622d336436642d346331622d386331622d6662616466643266356239302f4e333032723030316e3633315f73746f3130302e6f75742e786d6c
> trusted.glusterfs.dht.linkto=0x676c757263682d636c69656e742d3900
> trusted.pgfid.3456cf4b-3d6d-4c1b-8c1b-fbadfd2f5b90=0x00000001
>
> And here is the trusted.glusterfs.dht of the top level directory of each
> brick:
>
> trusted.glusterfs.dht=0x0888f55900000000b9ec78f7c58cd403
> trusted.glusterfs.dht=0x0888f55900000000e59527f2f148c5e9
> trusted.glusterfs.dht=0x0888f55900000000c58cd404ce451686
> trusted.glusterfs.dht=0x0888f55900000000f148c5eafa0f7a9c
> trusted.glusterfs.dht=0x0888f55900000000ce451687d6fd5909
> trusted.glusterfs.dht=0x0888f5590000000008e8c7fe11af7cb0
> trusted.glusterfs.dht=0x0888f55900000000d6fd590ae29ce547
> trusted.glusterfs.dht=0x0888f55900000000209640c72c05d3e5
> trusted.glusterfs.dht=0x0888f55900000000e29ce548e419069c
> trusted.glusterfs.dht=0x0888f55900000000e419069de59527f1
> trusted.glusterfs.dht=0x0888f55900000000fa0f7a9dfb8b9bf1
> trusted.glusterfs.dht=0x0888f55900000000fb8b9bf2fd07bd46
> trusted.glusterfs.dht=0x0888f55900000000fd07bd47fe83de9b
> trusted.glusterfs.dht=0x0888f55900000000fe83de9cffffffff
> trusted.glusterfs.dht=0x0888f5590000000000000000017c2154
> trusted.glusterfs.dht=0x0888f55900000000017c215502f842a9
> trusted.glusterfs.dht=0x0888f5590000000002f842aa047463fe
> trusted.glusterfs.dht=0x0888f55900000000047463ff05f08553
> trusted.glusterfs.dht=0x0888f5590000000005f08554076ca6a8
> trusted.glusterfs.dht=0x0888f55900000000076ca6a908e8c7fd
> trusted.glusterfs.dht=0x0888f5590000000011af7cb1132b9e05
> trusted.glusterfs.dht=0x0888f55900000000132b9e0614a7bf5a
> trusted.glusterfs.dht=0x0888f5590000000014a7bf5b1623e0af
> trusted.glusterfs.dht=0x0888f559000000001623e0b017a35fb5
> trusted.glusterfs.dht=0x0888f5590000000017a35fb6191f810a
> trusted.glusterfs.dht=0x0888f55900000000191f810b1a9f0010
> trusted.glusterfs.dht=0x0888f559000000001a9f00111c1b2165
> trusted.glusterfs.dht=0x0888f559000000001c1b21661d9aa06b
> trusted.glusterfs.dht=0x0888f559000000001d9aa06c1f16c1c0
> trusted.glusterfs.dht=0x0888f559000000001f16c1c1209640c6
> trusted.glusterfs.dht=0x0888f559000000002c05d3e62d81f53a
> trusted.glusterfs.dht=0x0888f559000000002d81f53b509f7813
> trusted.glusterfs.dht=0x0888f55900000000509f781473bcfaec
> trusted.glusterfs.dht=0x0888f5590000000073bcfaed96da7dc5
> trusted.glusterfs.dht=0x0888f5590000000096da7dc6b9ec78f6
>
>
I see that there are bricks with the same value for this xattr. On a pure
distribute volume this shouldn't happen. The file name is hashed and the
result indicates the brick that should contain the file. These xattrs define
the range of hashes that will go to each brick. The size of these ranges is
also different on some bricks. The ranges should be proportional to the size
of each brick's disk, so if all disks are of the same size, the ranges should
be equal in size.
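
To make the ranges easier to compare, here is a rough sketch (not an official
tool) that decodes the values above, assuming the last two 32-bit words of
trusted.glusterfs.dht are the start and end of each brick's hash range:

# layouts.txt holds the lines pasted above, e.g.
# trusted.glusterfs.dht=0x0888f55900000000b9ec78f7c58cd403
while read -r x; do
    v=${x##*0x}                              # keep only the hex digits
    start=$((16#${v:16:8})); end=$((16#${v:24:8}))
    printf '%08x-%08x size=%u\n' "$start" "$end" "$((end - start + 1))"
done < layouts.txt | sort | uniq -c

Sorted like this, duplicated ranges show up with a count greater than 1, and
gaps or overlaps between consecutive ranges are easy to spot by eye.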

Ignoring these potential issues for now, the file you mentioned should be
on the brick that has trusted.glusterfs.dht =
0x0888f55900000000d6fd590ae29ce547 (note that there are two bricks with
this value). The only valid link file should be on this brick, or it could
also be the real file. Any link file on any other brick is stale and not
really required.
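
As a side note, the "trusted.glusterfs.dht.linkto" value is just the name of
the subvolume the link file points to, encoded in hex. For the file above it
can be decoded with, for example:

# echo 676c757263682d636c69656e742d3900 | xxd -r -p
glurch-client-9

so that link file claims the real file lives on the subvolume
"glurch-client-9" (the trailing byte is a NUL terminator). Reading the xattr
with "-e text" instead of "-e hex" should show the same name directly.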

Regards,

Xavi


> Thank you a lot!
> -Stefan
>
>

