[Gluster-users] ganesha.nfsd process dies when copying files

Wed Aug 15 05:42:40 UTC 2018

Hi Karli,

I think Alex is right with regard to the NFS version and statefulness.

I only use NFSv3, and failover works as expected.
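
To repeat Karli's test over NFSv3, the client can be remounted with vers=3, and the version it actually negotiated can then be checked in /proc/mounts. Below is a minimal sketch that assumes a Linux client. The nfs_version_of helper is hypothetical and is not part of Gluster, NFS-Ganesha or storhaug. The host and export names are copied from Karli's commands further down.

```shell
#!/bin/sh
# Sketch: report the NFS protocol version a mount point negotiated.
# Assumes a Linux client, where /proc/mounts lists NFS mounts with a
# "vers=..." option (e.g. vers=3 or vers=4.1).
#
# Remount for the NFSv3 test (requires root; host/export from the thread):
#   umount /mnt
#   mount -o vers=3 hv03v.localdomain:/data /mnt/

# Hypothetical helper: nfs_version_of MOUNTPOINT [MOUNTS_FILE]
# MOUNTPOINT must match /proc/mounts exactly (no trailing slash).
nfs_version_of() {
    mnt="$1"
    mounts_file="${2:-/proc/mounts}"
    # Fields: device mountpoint fstype options dump pass
    awk -v m="$mnt" '
        $2 == m && ($3 == "nfs" || $3 == "nfs4") {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^vers=/) {
                    sub(/^vers=/, "", opts[i])
                    print opts[i]   # e.g. "3" or "4.1"
                    exit
                }
            }
        }' "$mounts_file"
}
```

After remounting, running `nfs_version_of /mnt` should print 3. If the mount point is not an NFS mount, the helper prints nothing.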

In my use case, I have 3 nodes running ESXi 6.7, and I set up one Gluster
VM on each ESXi host using its local datastore.

Once the replica 3 volume is formed, I use the CTDB VIP to present the
NFSv3 export back to vCenter and use it as shared storage.

Everything works great, except that performance is not very good ... I am
still looking for ways to improve it.

Cheers,
Edy

On 8/15/2018 12:25 AM, Alex Chekholko wrote:
> Hi Karli,
>
> I'm not 100% sure this is related, but when I set up my ZFS NFS HA per
> https://github.com/ewwhite/zfs-ha/wiki I could only get failover to
> work with NFS v3, not with NFS v4.
>
> From the client point of view, it really looked like with NFS v4 there 
> is an open file handle and that just goes stale and hangs, or 
> something like that, whereas with NFSv3 the client retries and 
> recovers and continues.  I did not investigate further; I just use
> v3.  I think it has something to do with NFSv4 being "stateful" and
> NFSv3 being "stateless".
>
> Can you re-run your test but using NFSv3 on the client mount?  Or do 
> you need to use v4.x?
>
> Regards,
> Alex
>
> On Tue, Aug 14, 2018 at 6:11 AM Karli Sjöberg <karli at inparadise.se>
> wrote:
>
>     On Fri, 2018-08-10 at 09:39 -0400, Kaleb S. KEITHLEY wrote:
>     > On 08/10/2018 09:23 AM, Karli Sjöberg wrote:
>     > > On Fri, 2018-08-10 at 21:23 +0800, Pui Edylie wrote:
>     > > > Hi Karli,
>     > > >
>     > > > Storhaug works with glusterfs 4.1.2 and latest nfs-ganesha.
>     > > >
>     > > > I just installed them last weekend ... they are working very
>     > > > well :)
>     > >
>     > > Okay, awesome!
>     > >
>     > > Is there any documentation on how to do that?
>     > >
>     >
>     > https://github.com/gluster/storhaug/wiki
>     >
>
>     Thanks Kaleb and Edy!
>
>     I have now redone the cluster using the latest and greatest following
>     the above guide and repeated the same test I was doing before (the
>     rsync while loop) with success. I let (forgot) it run for about a day
>     and it was still chugging along nicely when I aborted it, so success
>     there!
>
>     On to the next test: the catastrophic failure test, where one of the
>     servers dies. This one I'm having a more difficult time with.
>
>     1) I start by mounting the share over NFS 4.1 and then write an 8 GiB
>     file of random data with 'dd'. When I "hard-cut" the power to the
>     server I'm writing to, the transfer just stops indefinitely until the
>     server comes back again. Is that supposed to happen? Like this:
>
>     # dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=8192
>     # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
>     # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
>     2434793472 bytes (2,4 GB, 2,3 GiB) copied, 42 s, 57,9 MB/s
>
>     (here I cut the power and let it be for almost two hours before
>     turning it on again)
>
>     dd: error writing '/mnt/test.bin': Remote I/O error
>     2325+0 records in
>     2324+0 records out
>     2436890624 bytes (2,4 GB, 2,3 GiB) copied, 6944,84 s, 351 kB/s
>     # umount /mnt
>
>     Here the unmount command hung and I had to hard reset the client.
>
>     2) Another question: why do some files "change" as you copy them out
>     to the Gluster storage? Is that the way it should be? This time, I
>     deleted everything in the destination directory to start over:
>
>     # mount -o vers=4.1 hv03v.localdomain:/data /mnt/
>     # rm -f /mnt/test.bin
>     # dd if=/var/tmp/test.bin of=/mnt/test.bin bs=1M status=progress
>     8557428736 bytes (8,6 GB, 8,0 GiB) copied, 122 s, 70,1 MB/s
>     8192+0 records in
>     8192+0 records out
>     8589934592 bytes (8,6 GB, 8,0 GiB) copied, 123,039 s, 69,8 MB/s
>     # md5sum /var/tmp/test.bin
>     073867b68fa8eaa382ffe05adb90b583  /var/tmp/test.bin
>     # md5sum /mnt/test.bin
>     634187d367f856f3f5fb31846f796397  /mnt/test.bin
>     # umount /mnt
>
>     Thanks in advance!
>
>     /K
>     _______________________________________________
>     Gluster-users mailing list
>     Gluster-users at gluster.org
>     https://lists.gluster.org/mailman/listinfo/gluster-users
>
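
Karli's test 2 compares two md5sum runs by eye. These runs can be wrapped so that the result is a pass/fail exit status. Below is a minimal sketch that assumes a client with GNU coreutils md5sum. The md5_of and verify_copy helpers are hypothetical and are not part of any Gluster or NFS-Ganesha tool. The sketch only detects a mismatch and does not explain its cause.

```shell
#!/bin/sh
# Sketch: confirm that a copied file matches its source by comparing MD5
# digests. Helper names are hypothetical; assumes GNU coreutils md5sum.

# md5_of FILE: print only the hex digest of FILE.
md5_of() {
    # Read via stdin so md5sum prints "<digest>  -" with no filename
    # escaping; return 2 if the file cannot be opened.
    sum=$(md5sum < "$1") || return 2
    printf '%s\n' "${sum%% *}"   # strip everything from the first space
}

# verify_copy SRC DST: 0 = digests match, 1 = mismatch, 2 = unreadable file.
verify_copy() {
    src_sum=$(md5_of "$1") || return 2
    dst_sum=$(md5_of "$2") || return 2
    if [ "$src_sum" = "$dst_sum" ]; then
        echo "OK $src_sum"
        return 0
    fi
    echo "MISMATCH src=$src_sum dst=$dst_sum" >&2
    return 1
}
```

For example, `verify_copy /var/tmp/test.bin /mnt/test.bin` would return 1 for the two digests shown in test 2. Unmounting and remounting /mnt before the check may help to ensure that the copy is read back from the server rather than from the client's page cache.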
