[Gluster-users] Glusterfs 3.1.2 NFS offline when one brick is down

Wed Mar 9 21:33:28 UTC 2011

Hi Everyone,

I followed the simple 3.1 documentation to set up two boxes in replication mode and mounted the volume over NFS on ESXi 4.1.0. Everything worked right away until I took offline the box that ESXi is not directly pointed to. The whole NFS export then stays inaccessible until that box comes back online.

The boxes are running CentOS 5.5. My installation steps are:

1. Install the CentOS binaries
2. Create a trusted storage pool
3. Create a replica 2 volume and start it
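
For reference, steps 2 and 3 map to the `gluster peer probe`, `gluster volume create ... replica 2 transport tcp` and `gluster volume start` commands from the 3.1 docs. This rough Python sketch only builds and prints those commands rather than running them. The host names `server1`/`server2` and the brick path `/exp` are hypothetical placeholders; `test-volume` is the volume name that shows up in my nfs.log.

```python
import shlex
from typing import List

# Hypothetical host names and brick directory; substitute your own.
SERVERS = ["server1", "server2"]
BRICK_DIR = "/exp"
# Volume name as it appears in nfs.log ("test-volume-client-1").
VOLUME = "test-volume"


def setup_commands(servers: List[str], volume: str, brick_dir: str) -> List[List[str]]:
    """Return the gluster CLI invocations for steps 2 and 3, run on servers[0]."""
    cmds: List[List[str]] = []
    # Step 2: probe every other server into the trusted storage pool.
    for host in servers[1:]:
        cmds.append(["gluster", "peer", "probe", host])
    # Step 3: one brick per server, replica count equal to the number of servers.
    bricks = [f"{host}:{brick_dir}" for host in servers]
    cmds.append(["gluster", "volume", "create", volume,
                 "replica", str(len(servers)), "transport", "tcp", *bricks])
    cmds.append(["gluster", "volume", "start", volume])
    return cmds


if __name__ == "__main__":
    # Print each command as a shell-ready string (requires Python 3.8+).
    for cmd in setup_commands(SERVERS, VOLUME, BRICK_DIR):
        print(shlex.join(cmd))
```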

Then I mounted the first box through NFS on ESXi and, boom, everything started to work. Thanks to the Gluster team for their great work; this is by far the quickest and easiest open-source installation I have used.

When the second box (the one my ESXi is not pointed to) is unplugged or shut down, the whole NFS export becomes inaccessible, and nfs.log on the first box shows the following entries:

[2011-03-09 14:16:13.864156] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x391a60f779] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x391a60ef2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x391a60ee9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-03-09 14:16:01.154175
[2011-03-09 14:16:13.864207] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x391a60f779] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x391a60ef2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x391a60ee9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-03-09 14:16:10.154410
[2011-03-09 14:16:13.864244] I [client.c:1590:client_rpc_notify] test-volume-client-1: disconnected
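
To read these entries more easily, here is a rough Python sketch that splits each line into timestamp, level (E = error, I = info), source location and message, flags client disconnects, and for each forced unwind computes how long the call had been outstanding. The line layout is inferred from these three entries only, so the regular expressions are assumptions, not a documented log format.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Assumed layout, inferred from the entries above:
# [timestamp] LEVEL [file:line:function] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\] (?P<msg>.*)$"
)
# Matches e.g. "op(LOOKUP(27)) called at 2011-03-09 14:16:01.154175"
UNWIND_RE = re.compile(r"op\((?P<op>\w+)\(\d+\)\) called at (?P<called>[\d-]+ [\d:.]+)")
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse(line: str) -> Optional[Dict[str, object]]:
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry: Dict[str, object] = dict(m.groupdict())
    msg = m.group("msg")
    if msg.endswith(": disconnected"):
        # "test-volume-client-1: disconnected" -> name of the client translator
        entry["disconnected"] = msg[: -len(": disconnected")]
    u = UNWIND_RE.search(msg)
    if u is not None:
        # Seconds between the original call and its forced unwind.
        logged = datetime.strptime(m.group("ts"), TS_FMT)
        called = datetime.strptime(u.group("called"), TS_FMT)
        entry["op"] = u.group("op")
        entry["pending_s"] = (logged - called).total_seconds()
    return entry
```

Fed the three lines above, it reports two LOOKUP calls that had been outstanding for about 12.7 s and 3.7 s when they were unwound, followed by `test-volume-client-1` disconnecting.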

Has anyone else run into this?

Thanks in advance.

Hugh