[Bugs] [Bug 1222409] New: nfs-ganesha: HA failover happens but I/O does not move ahead when volume has two mounts and I/O going on both mounts
bugzilla at redhat.com
Mon May 18 07:26:02 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1222409
Bug ID: 1222409
Summary: nfs-ganesha: HA failover happens but I/O does not
move ahead when volume has two mounts and I/O going on
both mounts
Product: GlusterFS
Version: 3.7.0
Component: ganesha-nfs
Severity: high
Assignee: bugs at gluster.org
Reporter: saujain at redhat.com
Description of problem:
The problem is that I/O does not move ahead even though, according to the "pcs status"
output, the nfs-ganesha service has failed over to another node.
The same volume (say vol2) was mounted twice on a client using vers=3, with each
mount done through a virtual IP. iozone was then started on both mount points, and
on one of the servers the nfs-ganesha process was stopped. After the cluster's grace
period expires the I/O should have resumed, but that has not happened in this case.
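For reference, a minimal sketch of the client-side commands behind this setup (the
virtual IPs, mount points and iozone file names are placeholders, since the actual
VIPs are not listed in this report; NFSv3 mounts and iozone's automatic mode are
assumed):

# mount the same volume (vol2) twice, each mount through a different
# virtual IP of the nfs-ganesha HA cluster (VIPs are placeholders)
mount -t nfs -o vers=3 <VIP-1>:/vol2 /mnt/vol2-vip1
mount -t nfs -o vers=3 <VIP-2>:/vol2 /mnt/vol2-vip2

# run iozone on both mount points in parallel
iozone -a -f /mnt/vol2-vip1/iozone.tmp &
iozone -a -f /mnt/vol2-vip2/iozone.tmp &

# on one of the servers, stop nfs-ganesha to trigger the failover
service nfs-ganesha stop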
[root at nfs1 ~]# gluster volume status
Status of volume: gluster_shared_storage
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.148:/rhs/brick1/d1r1-share 49156 0 Y 3549
Brick 10.70.37.77:/rhs/brick1/d1r2-share 49155 0 Y 3329
Brick 10.70.37.76:/rhs/brick1/d2r1-share 49155 0 Y 3081
Brick 10.70.37.69:/rhs/brick1/d2r2-share 49155 0 Y 3346
Brick 10.70.37.148:/rhs/brick1/d3r1-share 49157 0 Y 3566
Brick 10.70.37.77:/rhs/brick1/d3r2-share 49156 0 Y 3346
Brick 10.70.37.76:/rhs/brick1/d4r1-share 49156 0 Y 3098
Brick 10.70.37.69:/rhs/brick1/d4r2-share 49156 0 Y 3363
Brick 10.70.37.148:/rhs/brick1/d5r1-share 49158 0 Y 3583
Brick 10.70.37.77:/rhs/brick1/d5r2-share 49157 0 Y 3363
Brick 10.70.37.76:/rhs/brick1/d6r1-share 49157 0 Y 3115
Brick 10.70.37.69:/rhs/brick1/d6r2-share 49157 0 Y 3380
Self-heal Daemon on localhost N/A N/A Y 28128
Self-heal Daemon on 10.70.37.69 N/A N/A Y 30533
Self-heal Daemon on 10.70.37.77 N/A N/A Y 16037
Self-heal Daemon on 10.70.37.76 N/A N/A Y 6128
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
Status of volume: vol2
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.70.37.148:/rhs/brick1/d1r1 49153 0 Y 28060
Brick 10.70.37.77:/rhs/brick1/d1r2 49152 0 Y 15975
Brick 10.70.37.76:/rhs/brick1/d2r1 49152 0 Y 6068
Brick 10.70.37.69:/rhs/brick1/d2r2 49152 0 Y 30472
Brick 10.70.37.148:/rhs/brick1/d3r1 49154 0 Y 28077
Brick 10.70.37.77:/rhs/brick1/d3r2 49153 0 Y 15992
Brick 10.70.37.76:/rhs/brick1/d4r1 49153 0 Y 6085
Brick 10.70.37.69:/rhs/brick1/d4r2 49153 0 Y 30489
Brick 10.70.37.148:/rhs/brick1/d5r1 49155 0 Y 28094
Brick 10.70.37.77:/rhs/brick1/d5r2 49154 0 Y 16009
Brick 10.70.37.76:/rhs/brick1/d6r1 49154 0 Y 6102
Brick 10.70.37.69:/rhs/brick1/d6r2 49154 0 Y 30506
Self-heal Daemon on localhost N/A N/A Y 28128
Self-heal Daemon on 10.70.37.69 N/A N/A Y 30533
Self-heal Daemon on 10.70.37.77 N/A N/A Y 16037
Self-heal Daemon on 10.70.37.76 N/A N/A Y 6128
Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks
Status of nfs-ganesha on all four nodes (note that no ganesha.nfsd process is left on nfs1, where it was stopped):
nfs1
====
root 3790 1 0 May13 ? 00:00:09 /usr/sbin/glusterfs
--volfile-server=nfs1 --volfile-id=/gluster_shared_storage
/var/run/gluster/shared_storage
---
nfs2
====
root 3300 1 0 May13 ? 00:00:09 /usr/sbin/glusterfs
--volfile-server=nfs1 --volfile-id=/gluster_shared_storage
/var/run/gluster/shared_storage
root 11003 1 0 May15 ? 00:01:08 /usr/bin/ganesha.nfsd -L
/var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p
/var/run/ganesha.nfsd.pid
---
nfs3
====
root 3577 1 0 May13 ? 00:00:08 /usr/sbin/glusterfs
--volfile-server=nfs1 --volfile-id=/gluster_shared_storage
/var/run/gluster/shared_storage
root 4195 1 0 May15 ? 00:01:08 /usr/bin/ganesha.nfsd -L
/var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p
/var/run/ganesha.nfsd.pid
---
nfs4
====
root 14760 1 0 May15 ? 00:00:04 /usr/sbin/glusterfs
--volfile-server=nfs1 --volfile-id=/gluster_shared_storage
/var/run/gluster/shared_storage
root 23970 1 0 May15 ? 00:02:17 /usr/bin/ganesha.nfsd -L
/var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p
/var/run/ganesha.nfsd.pid
pcs status; this clearly shows that the nfs-ganesha instance that was running
on nfs1 has failed over to nfs4:
[root at nfs1 ~]# pcs status
Cluster name: ganesha-ha-360
Last updated: Mon May 18 12:54:12 2015
Last change: Fri May 15 19:25:20 2015
Stack: cman
Current DC: nfs1 - partition with quorum
Version: 1.1.11-97629de
4 Nodes configured
17 Resources configured
Online: [ nfs1 nfs2 nfs3 nfs4 ]
Full list of resources:
Clone Set: nfs-mon-clone [nfs-mon]
Started: [ nfs1 nfs2 nfs3 nfs4 ]
Clone Set: nfs-grace-clone [nfs-grace]
Started: [ nfs1 nfs2 nfs3 nfs4 ]
nfs1-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs4
nfs1-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs4
nfs2-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs2
nfs2-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs2
nfs3-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs3
nfs3-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs3
nfs4-cluster_ip-1 (ocf::heartbeat:IPaddr): Started nfs4
nfs4-trigger_ip-1 (ocf::heartbeat:Dummy): Started nfs4
nfs1-dead_ip-1 (ocf::heartbeat:Dummy): Started nfs1
Version-Release number of selected component (if applicable):
glusterfs-3.7.0beta2-0.0.el6.x86_64
nfs-ganesha-2.2.0-0.el6.x86_64
How reproducible:
Happens on the first attempt itself
Steps to Reproduce:
1. create a volume of type 6x2 and start it
2. start nfs-ganesha on all the nodes under consideration, after completing the
pre-requisites
3. mount the volume on a client on two mount points, using different virtual
IPs
4. on one server, execute "service nfs-ganesha stop"
5. wait for the grace period to finish and for I/O to resume (a rough command
sketch of these steps follows this list)
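A rough command sketch of the steps above (hostnames and brick paths are
placeholders; the exact ganesha CLI depends on the installed 3.7 build, and the HA
pre-requisites, shared storage and ganesha-ha.conf, are assumed to already be in
place):

# 1. create a 6x2 distribute-replicate volume and start it
gluster volume create vol2 replica 2 \
    nfs1:/rhs/brick1/d1r1 nfs2:/rhs/brick1/d1r2 \
    nfs3:/rhs/brick1/d2r1 nfs4:/rhs/brick1/d2r2 \
    ...                     # 12 bricks in total
gluster volume start vol2

# 2. bring up nfs-ganesha on the cluster nodes and export vol2
#    (e.g. "gluster nfs-ganesha enable" and
#     "gluster volume set vol2 ganesha.enable on" on 3.7)

# 3. on the client, mount vol2 on two mount points via two different
#    virtual IPs (see the mount sketch in the description above)

# 4. on one server, stop the service to force a failover
service nfs-ganesha stop

# 5. watch the cluster state and wait for the grace period to expire
pcs status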
Actual results:
After step 4, I/O does not resume as expected.
Expected results:
I/O should resume once the grace period is over.
Additional info: