[Gluster-users] Problems with write-behind with large files on Gluster 3.8.4

Tue Feb 27 02:44:25 UTC 2018

+csaba

On Tue, Feb 27, 2018 at 2:49 AM, Jim Prewett <download at carc.unm.edu> wrote:

>
> Hello,
>
> I'm having problems when write-behind is enabled on Gluster 3.8.4.
>
> I have 2 Gluster servers each with a single brick that is mirrored between
> them.  The code causing these issues reads two data files each approx. 128G
> in size.  It opens a third file, mmap()'s that file, and subsequently reads
> and writes to it.  The third file, on successful runs (without write-behind
> enabled) is ultimately approx. 224G in size.
>

What exactly is the problem you are facing with write-behind enabled? Is it
that the file size is smaller?

> The servers have the IP addresses 172.17.2.254 and 172.17.2.255 and the
> client has the IP address 172.17.1.61.  These are all IP over InfiniBand.
>
> I'm attaching logfiles for the brick and for the volume from each of the
> servers and for the client.  I'm also attaching the output of "gluster
> volume info" and "gluster volume get <volume> all".
>
> I have only noticed problems with write-behind being enabled with this one
> particular workload.  When I ran it under strace, I see it seeking all over
> the place and reading and writing little bits of data to/from the third
> file.
>

What is the pattern you see when write-behind is disabled? Can you attach
strace of the application for both scenarios - write-behind enabled and
disabled? Can you also explain the workload and its data access pattern?

> For now, I'm leaving write-behind disabled.  What are the performance
> implications of this for jobs that don't have this strange access pattern?
>

Disabling write-behind can reduce write performance for sequential workloads,
because small writes are no longer aggregated before being sent to the bricks.

> My co-worker who usually maintains the Gluster filesystems here is busy
> having a baby right now and I've gotten it while he's out, so I'm /really/
> new to Gluster and am not confident that anything is correct in my
> configuration (nor do I have a specific reason to doubt its correctness! :)
>
> I have checked the InfiniBand fabric for errors and do not see any beyond
> the normal PortXmitWait counter.  There is no firewall on any of these
> machines.  Their system clocks seem to all be synchronized.
>
> Is there anything additional I can provide to help diagnose this problem?
>
> Thanks for any help you can provide! :)
>
> Jim
>
> James E. Prewett                    Jim at Prewett.org download at hpc.unm.edu
> Systems Team Leader           LoGS: http://www.hpc.unm.edu/~download/LoGS/
> Designated Security Officer         OpenPGP key: pub 1024D/31816D93
> HPC Systems Engineer III   UNM HPC  505.277.8210