[Gluster-users] Gluster NFS fails to start when replica brick is down

Ravishankar N ravishankar at redhat.com
Tue Jun 10 11:57:39 UTC 2014


Hi Brent,

Thanks for reporting the issue. I have created a bugzilla entry for the 
bug here: https://bugzilla.redhat.com/show_bug.cgi?id=1107649
You can add yourself to the CC list of the bug (create a bugzilla 
account if you need one) so that you are notified of its progress.

Thanks,
Ravi

On 06/10/2014 04:57 PM, Santosh Pradhan wrote:
> Hi Brent,
> Please go ahead and file a bug.
>
> Thanks,
> Santosh
>
> On 06/10/2014 02:29 AM, Kolasinski, Brent D. wrote:
>> Hi all,
>>
>> I have noticed some interesting behavior from my gluster setup regarding
>> NFS on Gluster 3.5.0:
>>
>> My Problem:
>> I have 2 bricks in a replica volume (named gvol0).  This volume is
>> accessed through NFS.  If I fail one of the servers, everything works as
>> expected; gluster NFS continues to export the volume from the remaining
>> brick.  However, if I restart the glusterd, glusterfsd, and rpcbind
>> services or reboot the remaining host while the other brick is down,
>> gluster NFS no longer exports the volume from the remaining brick.  The
>> volume is still accessible through the gluster FUSE client, though.  Is
>> this intended behavior, or is this a possible bug?
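>>
>> (A quick way to check whether the gluster NFS server is actually
>> exporting the volume, assuming the standard gluster CLI and rpcbind's
>> showmount are available on the host:
>>
>>   gluster volume status gvol0 nfs   # state/port of the NFS server process
>>   showmount -e localhost            # exports registered with rpcbind
>>
>> When the NFS server is up, the first command should list an online
>> "NFS Server on localhost" entry and the second should list /gvol0.)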
>>
>> Here is a ps just after a brick fails, with 1 brick remaining to export
>> the volume over gluster NFS:
>>
>> [root at nfs0 ~]# ps aux | grep gluster
>> root      2145  0.0  0.1 518444 24972 ?        Ssl  19:24   0:00
>> /usr/sbin/glusterfsd -s nfs0g --volfile-id gvol0.nfs0g.data-brick0-gvol0
>> -p /var/lib/glusterd/vols/gvol0/run/nfs0g-data-brick0-gvol0.pid -S
>> /var/run/91885b40ac4835907081de3bdc235620.socket --brick-name
>> /data/brick0/gvol0 -l /var/log/glusterfs/bricks/data-brick0-gvol0.log
>> --xlator-option 
>> *-posix.glusterd-uuid=49f53699-babd-4731-9c56-582b2b90b27c
>> --brick-port 49152 --xlator-option gvol0-server.listen-port=49152
>> root      2494  0.1  0.1 414208 19204 ?        Ssl  19:46   0:02
>> /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
>> root      2511  0.0  0.4 471324 77868 ?        Ssl  19:47   0:00
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
>> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
>> /var/run/b0f1e836c0c9f168518e0adba7187c10.socket
>> root      2515  0.0  0.1 334968 25408 ?        Ssl  19:47   0:00
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/173a6cd55e36ea8e0ce0896d27533355.socket --xlator-option
>> *replicate*.node-uuid=49f53699-babd-4731-9c56-582b2b90b27c
>>
>> Here is a ps after restarting the remaining host, with the other brick
>> still down:
>>
>> [root at nfs0 ~]# ps aux | grep gluster
>>
>> root      2134  0.1  0.0 280908 14684 ?        Ssl  20:36   0:00
>> /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
>> root      2144  0.0  0.1 513192 17300 ?        Ssl  20:36   0:00
>> /usr/sbin/glusterfsd -s nfs0g --volfile-id gvol0.nfs0g.data-brick0-gvol0
>> -p /var/lib/glusterd/vols/gvol0/run/nfs0g-data-brick0-gvol0.pid -S
>> /var/run/91885b40ac4835907081de3bdc235620.socket --brick-name
>> /data/brick0/gvol0 -l /var/log/glusterfs/bricks/data-brick0-gvol0.log
>> --xlator-option 
>> *-posix.glusterd-uuid=49f53699-babd-4731-9c56-582b2b90b27c
>> --brick-port 49152 --xlator-option gvol0-server.listen-port=49152
>>
>> It appears glusterfsd is not starting the gluster NFS service back up
>> upon reboot of the remaining host.  Restarting glusterfsd on the
>> remaining host still does not bring NFS up.  However, if I start the
>> gluster service on the host that serves the down brick, NFS starts up
>> again without me restarting any services.
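>>
>> (Two things that may be worth checking on the remaining host, neither
>> confirmed against 3.5.0: a forced volume start, which asks glusterd to
>> respawn any of the volume's daemons that are not running, and the NFS
>> log, which should say why the NFS server did not come up:
>>
>>   gluster volume start gvol0 force    # respawn missing volume daemons
>>   less /var/log/glusterfs/nfs.log     # NFS server log, path as in the ps output above
>> )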
>>
>> Here is the volume information:
>>
>> Volume Name: gvol0
>> Type: Replicate
>> Volume ID: e88afc1c-50d3-4e2e-b540-4c2979219d12
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: nfs0g:/data/brick0/gvol0
>> Brick2: nfs1g:/data/brick0/gvol0
>> Options Reconfigured:
>> nfs.disable: 0
>> network.ping-timeout: 3
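>>
>> (For reference, these options correspond to the usual volume-set
>> commands, e.g.:
>>
>>   gluster volume set gvol0 nfs.disable 0            # keep the gluster NFS server enabled
>>   gluster volume set gvol0 network.ping-timeout 3   # seconds before a server is declared dead
>> )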
>>
>>
>>
>> Is this a bug, or intended functionality?
>>
>>
>>
>>
>>
>> ----------
>> Brent Kolasinski
>> Computer Systems Engineer
>>
>> Argonne National Laboratory
>> Decision and Information Sciences
>> ARM Climate Research Facility
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>



