[Gluster-users] Repair after accident

Fri Aug 7 07:24:38 UTC 2020

Hi all,

Maybe I should add some more information.

The container that filled up the space was running on node x, which
still shows a nearly full filesystem:

192.168.1.x:/gvol  2.6T  2.5T  149G  95% /gluster

The situation is nearly the same on the underlying brick partition on node x:

zdata/brick     2.6T  2.4T  176G  94% /zbrick

On node y, where the network card crashed, glusterfs shows the same values:

192.168.1.y:/gvol  2.6T  2.5T  149G  95% /gluster

but the brick shows different values:

zdata/brick     2.9T  1.6T  1.4T  54% /zbrick

I think this is because glusterfs still has hard links to the
deleted files on node x. I can find these files with:

find /zbrick/.glusterfs -links 1 -ls | grep -v ' -> '
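
A slightly narrower variant of that search, as a rough sketch only: it assumes GNU find and the usual .glusterfs/<xx>/<yy>/<gfid> layout on the brick, and the helper names list_orphan_gfids and orphan_bytes are made up for this mail, not gluster tools. It only lists candidates and adds up their apparent sizes; it deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print "size<TAB>path" for every regular file in the
# gfid tree of a brick that has no second hard link (link count 1).
# Symlinks and anything outside the xx/yy/<gfid> layout are skipped.
list_orphan_gfids() {
    brick=$1
    find "$brick/.glusterfs" -mindepth 3 -maxdepth 3 \
        -type f -links 1 \
        -path '*/.glusterfs/[0-9a-f][0-9a-f]/[0-9a-f][0-9a-f]/*' \
        -name '????????-????-????-????-????????????' \
        -printf '%s\t%p\n'
}

# Hypothetical helper: sum the apparent sizes (bytes) of those candidates.
# On zfs with compression the space actually freed may differ.
orphan_bytes() {
    list_orphan_gfids "$1" | awk -F '\t' '{ total += $1 } END { print total + 0 }'
}
```

Running orphan_bytes /zbrick on node x should then show whether these files roughly account for the difference in used space between the two bricks.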

But now I am lost. How can I verify that these files really belong to the
right container? Or can I just delete these files because there is no way
to access them? Or does glusterfs offer a way to resolve this situation?

Mathias

On 05.08.20 15:48, Mathias Waack wrote:
> Hi all,
>
> we are running a gluster setup with two nodes:
>
> Status of volume: gvol
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.1.x:/zbrick                   49152     0          Y       13350
> Brick 192.168.1.y:/zbrick                   49152     0          Y       5965
> Self-heal Daemon on localhost               N/A       N/A        Y       14188
> Self-heal Daemon on 192.168.1.93            N/A       N/A        Y       6003
>
> Task Status of Volume gvol
> ------------------------------------------------------------------------------ 
>
> There are no active volume tasks
>
> The glusterfs volume hosts a bunch of containers with their data
> volumes. The underlying fs is zfs. A few days ago one of the containers
> created a lot of files in one of its data volumes and eventually
> filled up the glusterfs volume completely. But this happened only on
> one host; on the other host there was still enough space. We were
> finally able to identify this container and found that the sizes of
> its data on /zbrick differed between the two hosts. We then made the
> big mistake of deleting these files on both hosts directly in the
> /zbrick volume, not on the mounted glusterfs volume.
>
> Later we found the reason for this behavior: the network driver on the
> second node partially crashed at the same time as the failing
> container started to fill up the gluster volume. (We were still able
> to log in on the node, so we assumed the network was running, but the
> card was already dropping packets at that time.) After rebooting the
> second node, gluster became available again.
>
> Now the glusterfs volume is running again, but it is still (nearly)
> full: the files created by the container are no longer visible, but
> they still count against the free space. How can we fix this?
>
> In addition there are some files which are no longer accessible since 
> this accident:
>
> tail access.log.old
> tail: cannot open 'access.log.old' for reading: Input/output error
>
> It looks like the files affected by this error are those that were
> changed during the accident. Is there a way to fix this too?
>
> Thanks
>     Mathias