[Gluster-users] Repair after accident

Mathias Waack mathias.waack at seim-partner.de
Sat Aug 8 15:02:10 UTC 2020


So b53c8e46-068b-4286-94a6-7cf54f711983 is not a gfid? What else is it?
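
For reference, the path in the find output quoted below matches the usual brick backend layout, in which the entry for a gfid sits under .glusterfs/<first two hex characters>/<next two hex characters>/<full gfid>. A minimal sketch of that mapping follows; the function name gfid_to_path is made up for illustration, and the brick path and gfid are the ones from this thread:

```shell
#!/bin/sh
# gfid_to_path BRICK GFID  (hypothetical helper, not part of glusterfs)
# Print the path under BRICK/.glusterfs where the entry for GFID is expected.
gfid_to_path() {
    brick=$1
    gfid=$2
    # The first two and the next two hex characters form the directory levels.
    d1=$(printf '%s' "$gfid" | cut -c1-2)
    d2=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}

gfid_to_path /zbrick b53c8e46-068b-4286-94a6-7cf54f711983
# prints /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983
```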

Mathias

On 08.08.20 09:00, Strahil Nikolov wrote:
> In glusterfs the long string is called "gfid" and does not represent the name.
>
> Best Regards,
> Strahil Nikolov
>
> On Friday, 7 August 2020, 21:40:11 GMT+3, Mathias Waack <mathias.waack at seim-partner.de> wrote:
>
> Hi Strahil,
>
> but I cannot find these files in the heal info:
>
> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '
> ...
> 7443397  132463 -rw-------   1 999      docker   1073741824 Aug  3 10:35 /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983
>
> Now looking for this file in the heal infos:
>
> gluster volume heal gvol info | grep b53c8e46-068b-4286-94a6-7cf54f711983
>
> shows nothing.
>
> So I do not know what I have to heal...
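
One way to check whether a gfid entry still has a named counterpart on the brick is to look for other paths that share its inode. This is only a sketch, assuming a find that supports -samefile (GNU and BSD find do); the function name named_paths is hypothetical. Empty output would mean that only the .glusterfs entry is left, which is what a link count of 1 already suggests.

```shell
#!/bin/sh
# named_paths BRICK GFID_FILE  (hypothetical helper)
# List every path on BRICK, outside .glusterfs, that is a hard link
# to the same inode as GFID_FILE.
named_paths() {
    brick=$1
    gfid_file=$2
    # Skip the .glusterfs tree itself, then match on inode identity.
    find "$brick" -path "$brick/.glusterfs" -prune -o \
        -samefile "$gfid_file" -print
}

# Example call (paths taken from this thread):
# named_paths /zbrick /zbrick/.glusterfs/b5/3c/b53c8e46-068b-4286-94a6-7cf54f711983
```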
>
> Mathias
>
> On 07.08.20 14:32, Strahil Nikolov wrote:
>> Have you tried running a gluster heal and checking whether the files are back in place?
>>
>> I always thought that those hard links are used by the healing mechanism. If that is true, gluster should restore the files to their original location, and then wiping the correct files through the FUSE mount will be easy.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 7 August 2020, 10:24:38 GMT+03:00, Mathias Waack <mathias.waack at seim-partner.de> wrote:
>>> Hi all,
>>>
>>> maybe I should add some more information:
>>>
>>> The container which filled up the space was running on node x, which
>>> still shows a nearly filled fs:
>>>
>>> 192.168.1.x:/gvol  2.6T  2.5T  149G  95% /gluster
>>>
>>> nearly the same situation on the underlying brick partition on node x:
>>>
>>> zdata/brick     2.6T  2.4T  176G  94% /zbrick
>>>
>>> On node y the network card crashed, glusterfs shows the same values:
>>>
>>> 192.168.1.y:/gvol  2.6T  2.5T  149G  95% /gluster
>>>
>>> but different values on the brick:
>>>
>>> zdata/brick     2.9T  1.6T  1.4T  54% /zbrick
>>>
>>> I think this happened because glusterfs still has hardlinks to the
>>> deleted files on node x? So I can find these files with:
>>>
>>> find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '
>>>
>>> But now I am lost. How can I verify that these files really belong to
>>> the right container? Or can I just delete these files, because there is
>>> no way to access them? Or does glusterfs offer a way to resolve this
>>> situation?
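
To get a feel for how much space such link-count-1 entries occupy before deciding anything, their sizes can be summed. The following is a rough sketch only: the function name orphan_bytes is hypothetical, it only looks at regular files in the two-level gfid directories, and it reports apparent sizes, which can differ from the space actually allocated on zfs.

```shell
#!/bin/sh
# orphan_bytes BRICK  (hypothetical helper)
# Sum the apparent sizes, in bytes, of regular files in the two-level gfid
# directories under BRICK/.glusterfs that have a link count of 1.
orphan_bytes() {
    brick=$1
    find "$brick/.glusterfs" -type f -links 1 \
        -path "$brick/.glusterfs/[0-9a-f][0-9a-f]/[0-9a-f][0-9a-f]/*" \
        -exec wc -c {} + |
        awk '$2 != "total" { s += $1 } END { print s + 0 }'
}

# Example call (brick path taken from this thread):
# orphan_bytes /zbrick
```

Symlinks are skipped by -type f, so the grep -v ' -> ' filter from the command above is not needed here.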
>>>
>>> Mathias
>>>
>>> On 05.08.20 15:48, Mathias Waack wrote:
>>>> Hi all,
>>>>
>>>> we are running a gluster setup with two nodes:
>>>>
>>>> Status of volume: gvol
>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick 192.168.1.x:/zbrick                   49152     0          Y       13350
>>>> Brick 192.168.1.y:/zbrick                   49152     0          Y       5965
>>>> Self-heal Daemon on localhost               N/A       N/A        Y       14188
>>>> Self-heal Daemon on 192.168.1.93            N/A       N/A        Y       6003
>>>>
>>>> Task Status of Volume gvol
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> The glusterfs hosts a bunch of containers with their data volumes. The
>>>> underlying fs is zfs. A few days ago one of the containers created a
>>>> lot of files in one of its data volumes, and in the end it completely
>>>> filled up the space of the glusterfs volume. But this happened only on
>>>> one host; on the other host there was still enough space. We finally
>>>> were able to identify this container and found out that the sizes of
>>>> the data on /zbrick were different on both hosts for this container.
>>>> Now we made the big mistake of deleting these files on both hosts in
>>>> the /zbrick volume, not on the mounted glusterfs volume.
>>>>
>>>> Later we found the reason for this behavior: the network driver on the
>>>> second node partially crashed (which means we were able to log in on
>>>> the node, so we assumed the network was running, but the card was
>>>> already dropping packets at this time) at the same time as the failed
>>>> container started to fill up the gluster volume. After rebooting the
>>>> second node, gluster became available again.
>>>>
>>>> Now the glusterfs volume is running again, but it is still (nearly)
>>>> full: the files created by the container are not visible, but they
>>>> still count against the amount of free space. How can we fix this?
>>>>
>>>> In addition there are some files which are no longer accessible since
>>>> this accident:
>>>>
>>>> tail access.log.old
>>>> tail: cannot open 'access.log.old' for reading: Input/output error
>>>>
>>>> It looks like the files affected by this error are those that were
>>>> changed during the accident. Is there a way to fix this too?
>>>>
>>>> Thanks
>>>>        Mathias
>>>>
>>>>
>>>> ________
>>>>
>>>>
>>>>
>>>> Community Meeting Calendar:
>>>>
>>>> Schedule -
>>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>>> Bridge: https://bluejeans.com/441850968
>>>>
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users

