[Gluster-users] Gluster extra large file on brick

Xavi Hernandez jahernan at redhat.com
Tue Jul 13 05:34:08 UTC 2021


Hi Dan,

On Mon, Jul 12, 2021 at 2:20 PM Dan Thomson <dthomson at triumf.ca> wrote:

> Hi gluster users,
>
> I'm having an issue that I'm hoping to get some help with on a
> dispersed volume (EC: 2x(4+2)) that's causing me some headaches. This is
> on a cluster running Gluster 6.9 on CentOS 7.
>
> At some point in the last week, writes to one of my bricks have started
> failing due to a "No Space Left on Device" error:
>
> [2021-07-06 16:08:57.261307] E [MSGID: 115067]
> [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-gluster-01-server:
> 1853436561: WRITEV -2 (f2d6f2f8-4fd7-4692-bd60-23124897be54), client:
> CTX_ID:648a7383-46c8-4ed7-a921-acafc90bec1a-GRAPH_ID:4-PID:19471-HOST:rhevh08.mgmt.triumf.ca-PC_NAME:gluster-01-client-5-RECON_NO:-5,
> error-xlator: gluster-01-posix [No space left on device]
>
> The disk is quite full (listed as 100% on the server), but does have
> some writable room left:
>
> /dev/mapper/vg--brick1-brick1
>               11T   11T   97G 100% /data/glusterfs/gluster-01/brick1
>
> However, I'm not sure that the amount of disk space used on the
> physical drive is the true cause of the "No Space Left on Device"
> errors. I can still write to this brick manually, outside of Gluster,
> so the operating system itself isn't preventing the writes.
>

As Strahil has said, you are probably hitting the minimum space that
Gluster reserves on each brick. You can try the options he suggested.
However, I don't recommend keeping bricks above 90% utilization: all
filesystems, including XFS, tend to degrade in performance when free
space is scarce, and if the brick's filesystem performs worse, Gluster's
performance will drop as well.
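
In case it helps, the reserved space is controlled by the
storage.reserve volume option (a percentage of the brick size; I'm
assuming this is the option Strahil pointed you to). A minimal check
and adjustment would look like this:

    # gluster volume get <volname> storage.reserve
    # gluster volume set <volname> storage.reserve 0

If I remember correctly, setting it to 0 disables the check entirely.
Lowering the reserve may make the ENOSPC errors go away for a while,
but it won't explain why one brick is using so much more space than
the others.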


> During my investigation, I noticed that one of the .glusterfs paths on
> the problem server is using up much more space than it is on the other
> servers. I can't quite figure out why that might be, or how that
> happened, and I'm wondering if anyone has advice on what the cause
> might have been.
>
> I had done some package updates on the server with the issue, but not
> on the other servers. This included a new kernel version, but not the
> Gluster packages. So possibly the updates, or the reboot to load the
> new kernel, may have caused a problem. I have scripts on my gluster
> machines to cleanly kill all of the brick processes before rebooting,
> so I'm not leaning towards an abrupt shutdown being the cause, but
> it's a possibility.
>
> I'm also looking for advice on how to safely remove the problem file
> and rebuild it from the other Gluster peers. I've seen some
> documentation on this, but I'm a little nervous about corrupting the
> volume if I misunderstand the process. I'm not free to take the volume
> or cluster down and do maintenance at this point, but that might be
> something I'll have to consider if it's my only option.
>
> For reference, here's the comparison of the same path that seems to be
> taking up extra space on one of the hosts:
>
> 1: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 2: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 3: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 4: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 5: 26G     /data/gluster-01/brick1/vol/.glusterfs/99/56
> 6: 3.0T    /data/gluster-01/brick1/vol/.glusterfs/99/56
>

This is not normal at all. In a dispersed volume, all bricks should use
roughly the same amount of space.

Can you provide the output of the following commands:

    # gluster volume info <volname>
    # gluster volume status <volname>

Also provide the output of this command from all bricks:

    # ls -ls /data/gluster-01/brick1/vol/.glusterfs/99/56
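
If one gfid file in that directory turns out to be holding most of the
space, it may also help to identify which file on the brick it
corresponds to. One possible way to do it (assuming it's a regular
file, so the .glusterfs entry is a hardlink to the real path; <gfid> is
a placeholder for the actual file name):

    # find /data/gluster-01/brick1/vol -samefile \
          /data/gluster-01/brick1/vol/.glusterfs/99/56/<gfid>

If the gfid file has a link count of 1, the user-visible file has
already been deleted and only the .glusterfs hardlink is keeping the
space in use.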

Regards,

Xavi


> Any and all advice is appreciated.
>
> Thanks!
> --
>
> Daniel Thomson
> DevOps Engineer
> t +1 604 222 7428
> dthomson at triumf.ca
> TRIUMF Canada's particle accelerator centre
> www.triumf.ca @TRIUMFLab
> 4004 Wesbrook Mall
> Vancouver BC V6T 2A3 Canada