[Bugs] [Bug 1226029] New: I/O's hanging on tiered volumes (NFS)

bugzilla at redhat.com bugzilla at redhat.com
Thu May 28 19:16:43 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1226029

            Bug ID: 1226029
           Summary: I/O's hanging on tiered volumes (NFS)
           Product: GlusterFS
           Version: 3.7.0
         Component: tiering
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: rkavunga at redhat.com
        QA Contact: bugs at gluster.org
                CC: annair at redhat.com, bugs at gluster.org,
                    dlambrig at redhat.com, nchilaka at redhat.com,
                    rkavunga at redhat.com, trao at redhat.com
        Depends On: 1222442, 1222840



+++ This bug was initially created as a clone of Bug #1222840 +++

+++ This bug was initially created as a clone of Bug #1222442 +++

Description of problem:

I/O's hanging on tiered volumes

[root at dhcp42-250 gluster]# gluster vol info v1

Volume Name: v1
Type: Tier
Volume ID: cdebe3d4-bf02-4f19-9803-96852a9973a1
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.43.107:/rhs/brick2
Brick2: 10.70.42.250:/rhs/brick2
Cold Bricks:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: 10.70.42.250:/rhs/brick1
Brick4: 10.70.43.107:/rhs/brick1
Options Reconfigured:
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs 3.7.0 built on May 15 2015 01:31:12

How reproducible:

Steps to Reproduce:
1. Create a replica 2 vol
2. Attach another replica 2 hot tier
3. Mount (NFS) on client and start linux untar

Actual results:
I/O's are hanging:
linux-2.6.31.1/drivers/net/skfp/h/hwmtm.h
linux-2.6.31.1/drivers/net/skfp/h/mbuf.h
linux-2.6.31.1/drivers/net/skfp/h/osdef1st.h
linux-2.6.31.1/drivers/net/skfp/h/sba.h
linux-2.6.31.1/drivers/net/skfp/h/sba_def.h
linux-2.6.31.1/drivers/net/skfp/h/skfbi.h
linux-2.6.31.1/drivers/net/skfp/h/skfbiinc.h
linux-2.6.31.1/drivers/net/skfp/h/smc.h
linux-2.6.31.1/drivers/net/skfp/h/smt.h
linux-2.6.31.1/drivers/net/skfp/h/smt_p.h
linux-2.6.31.1/drivers/net/skfp/h/smtstate.h
linux-2.6.31.1/drivers/net/skfp/h/supern_2.h
linux-2.6.31.1/drivers/net/skfp/h/targethw.h
linux-2.6.31.1/drivers/net/skfp/h/targetos.h
linux-2.6.31.1/drivers/net/skfp/h/types.h
linux-2.6.31.1/drivers/net/skfp/hwmtm.c

Expected results:
I/O's should not hang.

Additional info:
Attaching sosreport.

--- Additional comment from Anoop on 2015-05-18 04:53:11 EDT ---



--- Additional comment from RHEL Product and Program Management on 2015-05-18
06:13:36 EDT ---

This request has been proposed as a blocker, but a release flag has
not been requested. Please set a release flag to ? to ensure we may
track this bug against the appropriate upcoming release, and reset
the blocker flag to ?.

--- Additional comment from Anoop on 2015-05-18 12:34:57 EDT ---

I figured out that I hit this issue when I try to create a second tiered
volume.

This is what I see in the logs:

glusterd.log
The message "I [MSGID: 106006]
[glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has
disconnected from glusterd." repeated 39 times between [2015-05-18
19:09:56.628006] and [2015-05-18 19:11:53.652759]

nfs.log
[2015-05-18 21:09:08.032573] E [graph.y:153:new_volume] 0-parser: Line 175:
volume 'tier-dht' defined again
[2015-05-18 21:09:08.032897] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 21:09:08.033278] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down


This is consistently reproducible.
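
To check whether a node is hitting the same failure, the parser error shown in the nfs.log excerpt above can be picked out with a short script. This is only a sketch: the pattern is derived from the log lines quoted in this report, and the function name find_duplicate_definitions is made up for illustration. Whitespace is flattened first because the excerpts here are wrapped across lines.

```python
import re

# Pattern for the graph parser error quoted in this report. It is an
# assumption that other builds format the message the same way.
DEFINED_AGAIN = re.compile(
    r"\[(?P<ts>[\d\- :.]+)\] E \[graph\.y:\d+:new_volume\] 0-parser: "
    r"Line (?P<line>\d+): volume '(?P<name>[^']+)' defined again"
)


def find_duplicate_definitions(log_text):
    """Return (timestamp, volfile line, xlator name) for each parser error."""
    # Collapse newlines and repeated spaces so wrapped entries still match.
    flat = " ".join(log_text.split())
    return [
        (m["ts"], int(m["line"]), m["name"])
        for m in DEFINED_AGAIN.finditer(flat)
    ]


sample = """[2015-05-18 21:09:08.032573] E [graph.y:153:new_volume] 0-parser: Line 175:
volume 'tier-dht' defined again
[2015-05-18 21:09:08.032897] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
"""

for ts, line_no, name in find_duplicate_definitions(sample):
    print(ts, line_no, name)
```

Run against the sample above, this reports a single hit for 'tier-dht' at volfile line 175.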

--- Additional comment from Triveni Rao on 2015-05-19 02:58:44 EDT ---

I see a similar problem on my setup. If I have many tiered volumes, create new
volumes such as distrep/distribute, and then try to mount a newly created volume
using NFS, the mount fails with "Connection timed out".

[root at rhsqa14-vm1 ~]# gluster v info test

Volume Name: test
Type: Distribute
Volume ID: 345406fa-17c9-4523-bb00-1b489bb552a0
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/j0
Brick2: 10.70.46.236:/rhs/brick1/j0
Brick3: 10.70.46.233:/rhs/brick5/j0
Brick4: 10.70.46.236:/rhs/brick5/j0
Options Reconfigured:
performance.readdir-ahead: on
[root at rhsqa14-vm1 ~]# 

[root at rhsqa14-vm5 ~]# mount -t nfs 10.70.46.233:/test /mnt2
mount.nfs: Connection timed out
[root at rhsqa14-vm5 ~]#


Log messages:

[2015-05-18 08:37:26.174773] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:26.174850] I [MSGID: 106006]
[glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has
disconnected from glusterd.
[2015-05-18 08:37:29.175448] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:32.176345] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:35.177158] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:38.177997] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:41.179047] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:44.179887] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:47.180348] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:50.181147] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:53.182271] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:55.110046] I
[glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management:
Received rem brick req
[2015-05-18 08:37:55.120848] I
[glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management:
Generated task-id 0570cbd5-a643-4f2f-b19f-12add534b25e for key remove-brick-id
[2015-05-18 08:37:55.660814] E [graph.y:153:new_volume] 0-parser: Line 197:
volume 'tier-dht' defined again
[2015-05-18 08:37:55.669775] W
[glusterd-brick-ops.c:2253:glusterd_op_remove_brick] 0-management: Unable to
reconfigure NFS-Server
[2015-05-18 08:37:55.669801] E [glusterd-syncop.c:1372:gd_commit_op_phase]
0-management: Commit of operation 'Volume Remove brick' failed on localhost
[2015-05-18 08:37:55.670829] E [glusterd-handshake.c:191:build_volfile_path]
0-management: Couldn't find volinfo
[2015-05-18 08:37:55.672561] E [glusterd-handshake.c:191:build_volfile_path]
0-management: Couldn't find volinfo
[2015-05-18 08:37:55.675863] E [glusterd-handshake.c:191:build_volfile_path]
0-management: Couldn't find volinfo
[2015-05-18 08:37:56.183110] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:37:59.183753] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:02.185611] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:05.186239] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:08.186897] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)
[2015-05-18 08:38:11.187513] W [socket.c:642:__socket_rwv] 0-nfs: readv on
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid
argument)



NFS logs:

[2015-05-18 10:02:56.640509] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:02:56.658596] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-18 10:02:57.663067] E [graph.y:153:new_volume] 0-parser: Line 131:
volume 'tier-dht' defined again
[2015-05-18 10:02:57.663212] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:02:57.663564] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
[2015-05-18 10:06:01.581244] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:06:01.598237] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-18 10:06:01.602635] E [graph.y:153:new_volume] 0-parser: Line 131:
volume 'tier-dht' defined again
[2015-05-18 10:06:01.602771] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:06:01.602987] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
[2015-05-18 10:23:45.699277] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:23:45.720241] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-18 10:23:45.724724] E [graph.y:153:new_volume] 0-parser: Line 167:
volume 'tier-dht' defined again
[2015-05-18 10:23:45.724861] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:23:45.725066] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
[2015-05-19 06:15:30.577550] I [MSGID: 100030] [glusterfsd.c:2294:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args:
/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p
/var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S
/var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-19 06:15:30.595539] I [event-epoll.c:629:event_dispatch_epoll_worker]
0-epoll: Started thread with index 1
[2015-05-19 06:15:30.601199] E [graph.y:153:new_volume] 0-parser: Line 228:
volume 'tier-dht' defined again
[2015-05-19 06:15:30.601372] E [MSGID: 100026]
[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-19 06:15:30.601569] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-:
received signum (0), shutting down

--- Additional comment from Anand Avati on 2015-05-19 05:38:51 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#1) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-20 00:58:04 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#2) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-20 01:22:27 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#3) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-26 02:49:59 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#4) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-28 01:30:26 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in
client graph) posted (#5) for review on master by mohammed rafi  kc
(rkavunga at redhat.com)

--- Additional comment from Anand Avati on 2015-05-28 10:02:39 EDT ---

COMMIT: http://review.gluster.org/10820 committed in master by Kaushal M
(kaushal at redhat.com) 
------
commit 05566baee6b5f0b2a3b083def4fe9bbdd0f63551
Author: Mohammed Rafi KC <rkavunga at redhat.com>
Date:   Tue May 19 14:54:32 2015 +0530

    tiering/nfs: duplication of nodes in client graph

    When creating client volfiles, a tier-dht xlator is
    loaded for each volume. Services like nfs can carry
    more than one volume in a single graph, so a tier-dht
    xlator is created for each of those volumes. The graph
    parser then fails because the same node name appears
    more than once in the graph.

    With this change, tier-dht is renamed to volname-tier-dht.

    Change-Id: I3c9b9c23ddcb853773a8a02be7fd8a5d09a7f972
    BUG: 1222840
    Signed-off-by: Mohammed Rafi KC <rkavunga at redhat.com>
    Reviewed-on: http://review.gluster.org/10820
    Reviewed-by: Atin Mukherjee <amukherj at redhat.com>
    Tested-by: Gluster Build System <jenkins at build.gluster.com>
    Tested-by: NetBSD Build System
    Reviewed-by: Kaushal M <kaushal at redhat.com>
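
The name collision described in the commit message can be reproduced on a toy volfile. The sketch below is not the glusterd volgen code: the volfile layout is simplified, the subvolume names are hypothetical, and the helpers duplicate_xlators, tier_section and nfs_volfile exist only for this illustration. It shows that a fixed name collides once two tiered volumes share one graph, while a volname prefix keeps every node unique.

```python
import re
from collections import Counter

# Matches the opening line of an xlator definition, e.g. "volume v1-tier-dht".
# "end-volume" and indented option lines do not match.
VOLUME_LINE = re.compile(r"^volume[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def duplicate_xlators(volfile_text):
    """Return the sorted xlator names defined more than once."""
    counts = Counter(VOLUME_LINE.findall(volfile_text))
    return sorted(name for name, n in counts.items() if n > 1)


def tier_section(volname, prefixed):
    """Build a toy tier xlator definition for one volume (hypothetical layout)."""
    name = f"{volname}-tier-dht" if prefixed else "tier-dht"
    return (
        f"volume {name}\n"
        f"    type cluster/tier\n"
        f"    subvolumes {volname}-cold-dht {volname}-hot-dht\n"
        f"end-volume\n"
    )


def nfs_volfile(volnames, prefixed):
    """Concatenate the tier sections of several volumes into one graph."""
    return "\n".join(tier_section(v, prefixed) for v in volnames)


before = nfs_volfile(["v1", "v2"], prefixed=False)
after = nfs_volfile(["v1", "v2"], prefixed=True)

print(duplicate_xlators(before))  # ['tier-dht']
print(duplicate_xlators(after))   # []
```

With a single tiered volume the fixed name is harmless, which matches the earlier observation that the problem only shows up once a second tiered volume is created.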


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1222442
[Bug 1222442] I/O's hanging on tiered volumes (NFS)
https://bugzilla.redhat.com/show_bug.cgi?id=1222840
[Bug 1222840] I/O's hanging on tiered volumes (NFS)

