[Bugs] [Bug 1226029] New: I/O's hanging on tiered volumes (NFS)
bugzilla at redhat.com
Thu May 28 19:16:43 UTC 2015
https://bugzilla.redhat.com/show_bug.cgi?id=1226029
Bug ID: 1226029
Summary: I/O's hanging on tiered volumes (NFS)
Product: GlusterFS
Version: 3.7.0
Component: tiering
Keywords: Triaged
Severity: high
Assignee: bugs at gluster.org
Reporter: rkavunga at redhat.com
QA Contact: bugs at gluster.org
CC: annair at redhat.com, bugs at gluster.org,
dlambrig at redhat.com, nchilaka at redhat.com,
rkavunga at redhat.com, trao at redhat.com
Depends On: 1222442, 1222840
+++ This bug was initially created as a clone of Bug #1222840 +++
+++ This bug was initially created as a clone of Bug #1222442 +++
Description of problem:
I/O's hanging on tiered volumes
[root at dhcp42-250 gluster]# gluster vol info v1
Volume Name: v1
Type: Tier
Volume ID: cdebe3d4-bf02-4f19-9803-96852a9973a1
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.43.107:/rhs/brick2
Brick2: 10.70.42.250:/rhs/brick2
Cold Bricks:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: 10.70.42.250:/rhs/brick1
Brick4: 10.70.43.107:/rhs/brick1
Options Reconfigured:
performance.readdir-ahead: on
Version-Release number of selected component (if applicable):
glusterfs 3.7.0 built on May 15 2015 01:31:12
How reproducible:
Steps to Reproduce:
1. Create a replica 2 vol
2. Attach another replica 2 hot tier
3. Mount (NFS) on client and start linux untar
Actual results:
I/O's are hanging:
linux-2.6.31.1/drivers/net/skfp/h/hwmtm.h
linux-2.6.31.1/drivers/net/skfp/h/mbuf.h
linux-2.6.31.1/drivers/net/skfp/h/osdef1st.h
linux-2.6.31.1/drivers/net/skfp/h/sba.h
linux-2.6.31.1/drivers/net/skfp/h/sba_def.h
linux-2.6.31.1/drivers/net/skfp/h/skfbi.h
linux-2.6.31.1/drivers/net/skfp/h/skfbiinc.h
linux-2.6.31.1/drivers/net/skfp/h/smc.h
linux-2.6.31.1/drivers/net/skfp/h/smt.h
linux-2.6.31.1/drivers/net/skfp/h/smt_p.h
linux-2.6.31.1/drivers/net/skfp/h/smtstate.h
linux-2.6.31.1/drivers/net/skfp/h/supern_2.h
linux-2.6.31.1/drivers/net/skfp/h/targethw.h
linux-2.6.31.1/drivers/net/skfp/h/targetos.h
linux-2.6.31.1/drivers/net/skfp/h/types.h
linux-2.6.31.1/drivers/net/skfp/hwmtm.c
Expected results:
I/O's should not hang.
Additional info:
Attaching sosreport.
--- Additional comment from Anoop on 2015-05-18 04:53:11 EDT ---
--- Additional comment from RHEL Product and Program Management on 2015-05-18
06:13:36 EDT ---
This request has been proposed as a blocker, but a release flag has
not been requested. Please set a release flag to ? to ensure we may
track this bug against the appropriate upcoming release, and reset
the blocker flag to ?.
--- Additional comment from Anoop on 2015-05-18 12:34:57 EDT ---
I figured out that I hit this issue when I try to create a second tiered
volume.
This is what I see in the logs:
glusterd.log
The message "I [MSGID: 106006]
[glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has
disconnected from glusterd." repeated 39 times between [2015-05-18
19:09:56.628006] and [2015-05-18 19:11:53.652759]
nfs.log
[2015-05-18 21:09:08.032573] E [graph.y:153:new_volume] 0-parser: Line 175:
volume 'tier-dht' defined again
[2015-05-18 21:09:08.032897] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 21:09:08.033278] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
This is consistently reproducible.
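The nfs.log excerpt above can be searched mechanically for the graph-parser error. The following is a minimal sketch, assuming the log line layout shown in this report; the function name `find_duplicate_volumes` and the regular expression are illustrative and not part of GlusterFS. The pattern tolerates the line wrapping introduced by the mail archive.

```python
import re

# Matches the graph-parser error seen in nfs.log above. \s+ is used between
# tokens so that entries wrapped across lines by the mailer still match.
DUPLICATE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+E\s+"
    r"\[graph\.y:\d+:new_volume\]\s+0-parser:\s+Line\s+(?P<line>\d+):\s+"
    r"volume\s+'(?P<name>[^']+)'\s+defined\s+again"
)


def find_duplicate_volumes(log_text):
    """Return (timestamp, volfile_line, xlator_name) for each duplicate-definition error."""
    return [
        (m.group("ts"), int(m.group("line")), m.group("name"))
        for m in DUPLICATE_RE.finditer(log_text)
    ]


# Sample copied from the nfs.log excerpt in this comment.
sample = """[2015-05-18 21:09:08.032573] E [graph.y:153:new_volume] 0-parser: Line 175:
volume 'tier-dht' defined again
[2015-05-18 21:09:08.032897] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph"""

print(find_duplicate_volumes(sample))
```

Each tuple gives the time of the failure, the volfile line the parser rejected, and the name that was defined twice.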
--- Additional comment from Triveni Rao on 2015-05-19 02:58:44 EDT ---
I see a similar problem on my setup. If I have many tiered volumes and then
create new volumes (distributed-replicate or distribute), mounting the newly
created volume over NFS fails with "Connection timed out".
[root at rhsqa14-vm1 ~]# gluster v info test
Volume Name: test
Type: Distribute
Volume ID: 345406fa-17c9-4523-bb00-1b489bb552a0
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/j0
Brick2: 10.70.46.236:/rhs/brick1/j0
Brick3: 10.70.46.233:/rhs/brick5/j0
Brick4: 10.70.46.236:/rhs/brick5/j0
Options Reconfigured:
performance.readdir-ahead: on
[root at rhsqa14-vm1 ~]#
[root at rhsqa14-vm5 ~]# mount -t nfs 10.70.46.233:/test /mnt2
mount.nfs: Connection timed out
[root at rhsqa14-vm5 ~]#
Log messages:
[2015-05-18 08:37:26.174773] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:26.174850] I [MSGID: 106006]
[glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has
disconnected from glusterd.
[2015-05-18 08:37:29.175448] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:32.176345] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:35.177158] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:38.177997] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:41.179047] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:44.179887] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:47.180348] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:50.181147] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:53.182271] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:55.110046] I
[glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management:
Received rem brick req
[2015-05-18 08:37:55.120848] I
[glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management:
Generated task-id 0570cbd5-a643-4f2f-b19f-12add534b25e for key remove-brick-id
[2015-05-18 08:37:55.660814] E [graph.y:153:new_volume] 0-parser: Line 197:
volume 'tier-dht' defined again
[2015-05-18 08:37:55.669775] W
[glusterd-brick-ops.c:2253:glusterd_op_remove_brick] 0-management: Unable to
reconfigure NFS-Server
[2015-05-18 08:37:55.669801] E [glusterd-syncop.c:1372:gd_commit_op_phase]
0-management: Commit of operation 'Volume Remove brick' failed on localhost
[2015-05-18 08:37:55.670829] E [glusterd-handshake.c:191:build_volfile_path]
0-management: Couldn't find volinfo
[2015-05-18 08:37:55.672561] E [glusterd-handshake.c:191:build_volfile_path]
0-management: Couldn't find volinfo
[2015-05-18 08:37:55.675863] E [glusterd-handshake.c:191:build_volfile_path]
0-management: Couldn't find volinfo
[2015-05-18 08:37:56.183110] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:59.183753] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:02.185611] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:05.186239] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:08.186897] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:11.187513] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
NFS logs:
[2015-05-18 10:02:56.640509] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:02:56.658596] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-18 10:02:57.663067] E [graph.y:153:new_volume] 0-parser: Line 131:
volume 'tier-dht' defined again
[2015-05-18 10:02:57.663212] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:02:57.663564] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
[2015-05-18 10:06:01.581244] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:06:01.598237] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-18 10:06:01.602635] E [graph.y:153:new_volume] 0-parser: Line 131:
volume 'tier-dht' defined again
[2015-05-18 10:06:01.602771] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:06:01.602987] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
[2015-05-18 10:23:45.699277] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:23:45.720241] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-18 10:23:45.724724] E [graph.y:153:new_volume] 0-parser: Line 167:
volume 'tier-dht' defined again
[2015-05-18 10:23:45.724861] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:23:45.725066] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
[2015-05-19 06:15:30.577550] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-19 06:15:30.595539] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-19 06:15:30.601199] E [graph.y:153:new_volume] 0-parser: Line 228:
volume 'tier-dht' defined again
[2015-05-19 06:15:30.601372] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-19 06:15:30.601569] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
--- Additional comment from Anand Avati on 2015-05-19 05:38:51 EDT ---
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#1) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-20 00:58:04 EDT ---
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#2) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-20 01:22:27 EDT ---
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#3) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-26 02:49:59 EDT ---
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#4) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-28 01:30:26 EDT ---
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#5) for review on master by mohammed rafi kc
(rkavunga at redhat.com)
--- Additional comment from Anand Avati on 2015-05-28 10:02:39 EDT ---
COMMIT: http://review.gluster.org/10820 committed in master by Kaushal M
(kaushal at redhat.com)
------
commit 05566baee6b5f0b2a3b083def4fe9bbdd0f63551
Author: Mohammed Rafi KC <rkavunga at redhat.com>
Date: Tue May 19 14:54:32 2015 +0530
tiering/nfs: duplication of nodes in client graph
When creating client volfiles, an xlator named tier-dht is
loaded for each volume. Services like nfs serve one or
more volumes from a single graph, so a tier-dht xlator is
created for each volume in that graph. The graph parser
then fails because of the duplicate node in the graph.
With this change, tier-dht is renamed to volname-tier-dht.
Change-Id: I3c9b9c23ddcb853773a8a02be7fd8a5d09a7f972
BUG: 1222840
Signed-off-by: Mohammed Rafi KC <rkavunga at redhat.com>
Reviewed-on: http://review.gluster.org/10820
Reviewed-by: Atin Mukherjee <amukherj at redhat.com>
Tested-by: Gluster Build System <jenkins at build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaushal M <kaushal at redhat.com>
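The failure mode described in the commit message can be illustrated with a toy model. This is a sketch only, not the glusterd volgen code or the graph.y parser: the volfile is reduced to bare `volume NAME` / `end-volume` blocks, the helper names are hypothetical, and the second volume name `v2` is an assumption made for the example.

```python
def tier_xlator_name(volname, prefixed):
    """Name of the tier xlator for a volume, before (False) and after (True) the change."""
    return f"{volname}-tier-dht" if prefixed else "tier-dht"


def build_nfs_volfile(volnames, prefixed):
    """Build a heavily simplified combined volfile with one tier xlator per volume."""
    blocks = [
        f"volume {tier_xlator_name(volname, prefixed)}\nend-volume\n"
        for volname in volnames
    ]
    return "\n".join(blocks)


def find_redefinitions(volfile_text):
    """Return (line_number, name) for every 'volume NAME' line whose name was already defined."""
    seen = set()
    duplicates = []
    for lineno, line in enumerate(volfile_text.splitlines(), start=1):
        parts = line.split()
        if len(parts) == 2 and parts[0] == "volume":
            name = parts[1]
            if name in seen:
                duplicates.append((lineno, name))
            seen.add(name)
    return duplicates


# Two tiered volumes served from one NFS graph: the unprefixed name collides.
before = find_redefinitions(build_nfs_volfile(["v1", "v2"], prefixed=False))
after = find_redefinitions(build_nfs_volfile(["v1", "v2"], prefixed=True))
print(before)  # the second 'tier-dht' is reported as a redefinition
print(after)   # no redefinitions once the volume name is part of the xlator name
```

With a single tiered volume the unprefixed name is unique, which matches the observation earlier in this bug that the problem only appears once a second tiered volume exists.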
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1222442
[Bug 1222442] I/O's hanging on tiered volumes (NFS)
https://bugzilla.redhat.com/show_bug.cgi?id=1222840
[Bug 1222840] I/O's hanging on tiered volumes (NFS)