[Gluster-users] Gluster NFS fails to start when replica brick is down

Kolasinski, Brent D. bkolasinski at anl.gov
Wed Jun 11 16:32:50 UTC 2014


Thanks all for the fast response.

I was able to successfully start NFS for the volume manually (as reported
in the bug comments) by doing the following:

gluster volume start $vol force
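
For anyone else hitting this, a quick way to confirm the gluster NFS
server actually came back after the force start (standard status and
showmount checks; substitute your volume name for $vol):

gluster volume status $vol nfs
showmount -e localhost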


Thanks!

----------
Brent Kolasinski
Computer Systems Engineer

Argonne National Laboratory
Decision and Information Sciences
ARM Climate Research Facility








On 6/10/14, 6:57 AM, "Ravishankar N" <ravishankar at redhat.com> wrote:

>Hi Brent,
>
>Thanks for reporting the issue. I have created a bugzilla entry for the
>bug here: https://bugzilla.redhat.com/show_bug.cgi?id=1107649
>You could add yourself to the CC list of the bug (create a bugzilla
>account if you need to) so that you can be notified about its progress.
>
>Thanks,
>Ravi
>
>On 06/10/2014 04:57 PM, Santosh Pradhan wrote:
>> Hi Brent,
>> Please go ahead and file a bug.
>>
>> Thanks,
>> Santosh
>>
>> On 06/10/2014 02:29 AM, Kolasinski, Brent D. wrote:
>>> Hi all,
>>>
>>> I have noticed some interesting behavior from my gluster setup
>>>regarding
>>> NFS on Gluster 3.5.0:
>>>
>>> My Problem:
>>> I have 2 bricks in a replica volume (named gvol0).  This volume is
>>> accessed through NFS.  If I fail one of the servers, everything works
>>> as expected; gluster NFS continues to export the volume from the
>>> remaining brick.  However, if I restart the glusterd, glusterfsd, and
>>> rpcbind services or reboot the remaining host while the other brick is
>>> down, gluster NFS no longer exports the volume from the remaining
>>> brick.  It still appears to serve the volume to gluster-fuse clients,
>>> though.  Is this intended behavior, or is this a possible bug?
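>>>
>>> For clarity, the failure sequence is roughly the following (the
>>> service names assume a RHEL/CentOS-style init setup and may differ
>>> elsewhere):
>>>
>>>   # the other brick's host (nfs1g) is already down
>>>   service glusterd restart
>>>   service glusterfsd restart
>>>   service rpcbind restart
>>>   # ...or simply reboot this host instead of restarting the services
>>>
>>>   showmount -e localhost   # gvol0 is no longer listed afterwards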
>>>
>>> Here is a ps just after a brick fails, with 1 brick remaining to export
>>> the volume over gluster NFS:
>>>
>>> [root at nfs0 ~]# ps aux | grep gluster
>>> root      2145  0.0  0.1 518444 24972 ?        Ssl  19:24   0:00
>>> /usr/sbin/glusterfsd -s nfs0g --volfile-id
>>> gvol0.nfs0g.data-brick0-gvol0
>>> -p /var/lib/glusterd/vols/gvol0/run/nfs0g-data-brick0-gvol0.pid -S
>>> /var/run/91885b40ac4835907081de3bdc235620.socket --brick-name
>>> /data/brick0/gvol0 -l /var/log/glusterfs/bricks/data-brick0-gvol0.log
>>> --xlator-option
>>> *-posix.glusterd-uuid=49f53699-babd-4731-9c56-582b2b90b27c
>>> --brick-port 49152 --xlator-option gvol0-server.listen-port=49152
>>> root      2494  0.1  0.1 414208 19204 ?        Ssl  19:46   0:02
>>> /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
>>> root      2511  0.0  0.4 471324 77868 ?        Ssl  19:47   0:00
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
>>> /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
>>> /var/run/b0f1e836c0c9f168518e0adba7187c10.socket
>>> root      2515  0.0  0.1 334968 25408 ?        Ssl  19:47   0:00
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/173a6cd55e36ea8e0ce0896d27533355.socket --xlator-option
>>> *replicate*.node-uuid=49f53699-babd-4731-9c56-582b2b90b27c
>>>
>>> Here is a ps after restarting the remaining host, with the other brick
>>> still down:
>>>
>>> [root at nfs0 ~]# ps aux | grep gluster
>>>
>>> root      2134  0.1  0.0 280908 14684 ?        Ssl  20:36   0:00
>>> /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
>>> root      2144  0.0  0.1 513192 17300 ?        Ssl  20:36   0:00
>>> /usr/sbin/glusterfsd -s nfs0g --volfile-id
>>> gvol0.nfs0g.data-brick0-gvol0
>>> -p /var/lib/glusterd/vols/gvol0/run/nfs0g-data-brick0-gvol0.pid -S
>>> /var/run/91885b40ac4835907081de3bdc235620.socket --brick-name
>>> /data/brick0/gvol0 -l /var/log/glusterfs/bricks/data-brick0-gvol0.log
>>> --xlator-option
>>> *-posix.glusterd-uuid=49f53699-babd-4731-9c56-582b2b90b27c
>>> --brick-port 49152 --xlator-option gvol0-server.listen-port=49152
>>>
>>> It appears glusterfsd is not starting the gluster NFS service back up
>>> upon reboot of the remaining host.  If I were to restart glusterfsd on
>>> the remaining host, it still will not bring up NFS.  However, if I
>>> start the gluster service on the host that serves the down brick, NFS
>>> will start up again, without me restarting any services.
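>>>
>>> A quicker way to see the same thing, assuming "gluster volume status"
>>> behaves the usual way, is:
>>>
>>>   gluster volume status gvol0
>>>   showmount -e localhost
>>>
>>> The first should show whether the NFS Server for the volume is online,
>>> and the second should only list the export while gluster NFS is
>>> actually running.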
>>>
>>> Here is the volume information:
>>>
>>> Volume Name: gvol0
>>> Type: Replicate
>>> Volume ID: e88afc1c-50d3-4e2e-b540-4c2979219d12
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: nfs0g:/data/brick0/gvol0
>>> Brick2: nfs1g:/data/brick0/gvol0
>>> Options Reconfigured:
>>> nfs.disable: 0
>>> network.ping-timeout: 3
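>>>
>>> (For anyone reproducing this: those two options correspond to the
>>> usual volume-set commands, i.e. roughly
>>>
>>>   gluster volume set gvol0 nfs.disable 0
>>>   gluster volume set gvol0 network.ping-timeout 3
>>>
>>> on top of a plain 2-brick replica volume.)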
>>>
>>>
>>>
>>> Is this a bug, or intended functionality?
>>>
>>>
>>>
>>>
>>>
>>> ----------
>>> Brent Kolasinski
>>> Computer Systems Engineer
>>>
>>> Argonne National Laboratory
>>> Decision and Information Sciences
>>> ARM Climate Research Facility
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>



