[Gluster-devel] Multiple NFS Servers (Gluster NFS in 3.x, unfsd, knfsd, etc.)

Gordan Bobic gordan at bobich.net
Wed Jan 6 22:57:26 UTC 2010


Gordan Bobic wrote:

>> With native NFS there'll be no need to first mount a glusterFS
>> FUSE based volume and then export it as NFS. The way it has been 
>> developed is that
>> any glusterfs volume in the volfile can be exported using NFS by adding
>> an NFS volume over it in the volfile. This is something that will become
>> clearer from the sample vol files when 3.0.1 comes out.
> 
> It may be worth checking the performance of that solution vs the 
> performance of the standalone unfsd unbound to portmap/mountd over 
> mounted glfs volumes, as I discovered today that the performance feels 
> very similar to native knfsd and server-side AFR, but without the 
> fuse.ko complications of the former and the bugginess of the latter 
> (e.g. see bug 186: 
> http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=186 - that bug 
> has been driving me nuts since before 2.0.0 was released)
> 
> I'd hate to see this be another wasted effort like booster when there is 
> a solution that already works.
> 
>> The answer to your question is, yes, it will be possible to export your
>> local file system with knfsd and glusterfs distributed-replicated volumes
>> with Gluster NFS translator BUT not in the first release.
> 
> See comment above. Isn't that all the more reason to double check 
> performance figures before even bothering?
> 
> In fact, I may have just convinced myself to acquire some iozone 
> performance figures. Will report later.

OK, I couldn't get iozone to report sane results. glfs was reporting 
figures in the reasonable ballpark for gigabit ethernet (between 7MB/s 
and 110MB/s). NFS was reporting figures that looked more like memory 
bandwidth, so I'd guess that FS-Cache was taking over. With O_DIRECT 
and O_SYNC the NFS figures dropped to around 700KB/s, which is clearly 
not sane either, because in actual use the two seem fairly equivalent.
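
If anyone wants to repeat the O_DIRECT/O_SYNC runs, the invocation 
amounts to something like this (file size, record size and path are 
illustrative, not the exact command I used):

# -i 0/-i 1 select the write and read tests; -I uses O_DIRECT, -o makes writes O_SYNC
iozone -i 0 -i 1 -s 64m -r 1m -I -o -f /mnt/glfs/iozone.tmp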

So I did a redneck test instead: dd 64MB of /dev/zero to a file on 
the mounted partition.
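
The recipe amounts to something like this on each mount point (paths 
are illustrative; conv=fsync and the cache drop are there so the 
figures measure the wire rather than RAM):

# write: 64MB of zeros, flushed out before dd reports its figure
dd if=/dev/zero of=/mnt/glfs/ddtest bs=1M count=64 conv=fsync

# read: drop the page cache first, then pull the file back
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/glfs/ddtest of=/dev/null bs=1M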

On writes, NFS gets 4.4MB/s and GlusterFS (server-side AFR) gets 
4.6MB/s. Pretty even.
On reads, GlusterFS gets 117MB/s and NFS gets 119MB/s (on the first 
read after flushing the caches; after that it goes up to 600MB/s). The 
unbuffered readings are in a sane ballpark, and the small difference 
on reads is roughly what I'd expect given that NFS is running over UDP 
and GlusterFS over TCP.

So, in conclusion, there is no performance difference between them 
worth speaking of. What, then, is the point of implementing a 
user-space NFS handler in glusterfsd when unfsd already seems to do 
the job as well as glusterfsd could reasonably hope to?
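
For reference, the unfsd setup I'm comparing against is just a 
glusterfs client mount re-exported by a standalone unfsd that never 
registers with portmap. Roughly this, although the paths and ports are 
illustrative and the unfsd flags are from memory, so check unfsd(8) 
before copying:

# mount the volume with the fuse client
glusterfs -f /etc/glusterfs/client.vol /mnt/glfs

# export it with unfs3, skipping portmap registration (-p) and pinning the ports
echo '/mnt/glfs 10.2.0.0/24(rw,no_root_squash)' > /etc/exports.glfs
unfsd -e /etc/exports.glfs -p -n 2049 -m 2049

# on the NFS client, point mount at the pinned ports explicitly
mount -o nfsvers=3,tcp,port=2049,mountport=2049 server:/mnt/glfs /mnt/nfs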

There is, however, a problem that iozone did show up: it wouldn't 
complete on glfs. The glusterfs client would cause it to error out 
before it finished, with errors like this:

Error writing block 2047, fd= 3
write: Transport endpoint is not connected

Error writing block 2047, fd= 3
write: File descriptor in bad state

Error writing block 2047, fd= 3
write: File descriptor in bad state



/home/gordan/test/f1: Transport endpoint is not connected
/home/gordan/test/f2: Transport endpoint is not connected
/home/gordan/test/f3: Transport endpoint is not connected
/home/gordan/test/f4: Transport endpoint is not connected


On the client, the logs show things like this:


[2010-01-06 21:42:30] E [client-protocol.c:457:client_ping_timer_expired] home: Server 10.2.0.10:7000 has not responded in the last 10 seconds, disconnecting.
[2010-01-06 21:42:30] E [saved-frames.c:165:saved_frames_unwind] home: forced unwinding frame type(1) op(FSYNC)
[2010-01-06 21:42:30] W [fuse-bridge.c:888:fuse_err_cbk] glusterfs-fuse: 269780: FSYNC() ERR => -1 (Transport endpoint is not connected)


Followed by lots of this:

[2010-01-06 21:46:39] W [fuse-bridge.c:1540:fuse_writev_cbk] glusterfs-fuse: 532985: WRITE => -1 (Transport endpoint is not connected)

and this:

[2010-01-06 21:53:41] W [fuse-bridge.c:888:fuse_err_cbk] glusterfs-fuse: 537456: FLUSH() ERR => -1 (File descriptor in bad state)

glfs seems to be rather fragile when the load starts approaching wire 
speed. My ssh sessions to the same machine, running top, didn't 
disconnect or show any noticeable lag, so the disconnections are 
probably uncalled for, and there ought to be a more graceful way of 
dealing with it. How often does it send heartbeat packets, and how 
many in a row have to go missing before it decides to disconnect?
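
I'm assuming the knob behind that log message is a ping-timeout option 
on the client protocol translator; the option name is my guess from 
the client_ping_timer_expired message, and the volume names and values 
below are illustrative. If the guess is right, bumping it per volume 
in the client volfile would look something like this:

volume home
  type protocol/client
  option transport-type tcp
  option remote-host 10.2.0.10
  option remote-port 7000
  option remote-subvolume home
  # seconds the server may stay silent before the client disconnects
  option ping-timeout 30
end-volume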

Another thing I noticed is that even though the server glusterfs 
process was running with its server-side AFR export, the first time I 
tried to connect to it from the client after some hours of using the 
NFS mount, the server process appears to have crashed. This is what 
ended up in its log:


[2010-01-06 21:35:54] N [server-protocol.c:7065:mop_setvolume] server-home: accepted client from 10.2.3.1:1023
[2010-01-06 21:35:54] N [server-protocol.c:7065:mop_setvolume] server-home: accepted client from 10.2.3.1:1022
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LK)

patchset: v2.0.9
signal received: 11
time of crash: 2010-01-06 21:36:12
configuration details:
argp 1
backtrace 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.9
/lib64/libc.so.6[0x3f55e302d0]
/usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(client_lookup+0xc8)[0x2afea8f89438]
/usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_lookup+0x226)[0x2afea97e3e66]
/usr/lib64/glusterfs/2.0.9/xlator/protocol/server.so(server_lookup_cbk+0x513)[0x2afea95cb2a3]
/usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_self_heal_cbk+0x8e)[0x2afea97e46fe]
/usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_done+0xbe)[0x2afea97f8bce]
/usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_flush_cbk+0x44)[0x2afea97fa284]
/usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_utimes_cbk+0x9)[0x2afea97fa2a9]
/usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(client_utimens_cbk+0x14e)[0x2afea8f9152e]
/usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x2afea8f808aa]
/usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(notify+0x212)[0x2afea8f874e2]
/usr/lib64/glusterfs/2.0.9/transport/socket.so(socket_event_handler+0xd3)[0x2aaaaaaafe33]
/usr/lib64/libglusterfs.so.0[0x3f56a27115]
/usr/sbin/glusterfs(main+0xa06)[0x403e96]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3f55e1d994]
/usr/sbin/glusterfs[0x402509]
---------

None of the above went wrong when using the unfsd mount, and that 
really doesn't look very confidence-inspiring in a stable release (2.0.9).

Gordan




