[Bugs] [Bug 1647165] New: [SNAPSHOT]: with brick multiplexing, snapshot restore will make glusterd send wrong volfile
bugzilla at redhat.com
Tue Nov 6 19:05:45 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1647165
Bug ID: 1647165
Summary: [SNAPSHOT]: with brick multiplexing, snapshot restore
will make glusterd send wrong volfile
Product: GlusterFS
Version: 3.12
Component: snapshot
Assignee: bugs at gluster.org
Reporter: rabhat at redhat.com
CC: bugs at gluster.org, rabhat at redhat.com,
rkavunga at redhat.com
Depends On: 1635050, 1636218
Blocks: 1636291, 1636162
+++ This bug was initially created as a clone of Bug #1636218 +++
+++ This bug was initially created as a clone of Bug #1635050 +++
Description of problem:
With brick multiplexing enabled, a snapshot restore results in the snapshot
brick receiving the client volume file for the corresponding snapshot volume
instead of the correct brick volume file.
[2018-09-28 23:34:10.405953] I
[glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching
not-only child
/run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick1/mirror
[2018-09-28 23:34:10.406009] I
[glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching
not-only child
/run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick2/mirror
The two lines above indicate that this is a brick process. Because of the
restore, the original snapshot volume's bricks are stopped, which leads to the
brick detach logged above.
[2018-09-28 23:34:10.407276] E
[rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not
unregister with portmap
[2018-09-28 23:34:10.407276] E
[rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not
unregister with portmap
[2018-09-28 23:34:10.407299] E [rpcsvc.c:1670:rpcsvc_program_unregister]
0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407310] E [rpcsvc.c:1670:rpcsvc_program_unregister]
0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407319] E [rpcsvc.c:1720:rpcsvc_program_unregister]
0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num:
1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.407323] E [rpcsvc.c:1720:rpcsvc_program_unregister]
0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num:
1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.409365] I [barrier.c:665:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing
all the queued fops
[2018-09-28 23:34:10.409382] I [barrier.c:665:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing
all the queued fops
[2018-09-28 23:34:10.413063] I [io-stats.c:3937:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413151] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413178] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413182] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:3
[2018-09-28 23:34:10.413195] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:3
[2018-09-28 23:34:10.413700] I [io-stats.c:3937:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413775] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413786] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413805] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:2
[2018-09-28 23:34:10.413807] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:2
[2018-09-28 23:34:10.923188] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2018-09-28 23:34:10.923350] I [MSGID: 101191]
[event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with
index 3
[2018-09-28 23:34:11.031917] I [MSGID: 101191]
[event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with
index 4
[2018-09-28 23:34:57.645690] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/protocol/client.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646288] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646714] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646929] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647103] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647274] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647470] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647817] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.648021] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649370] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649654] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649858] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650027] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650192] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650381] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650716] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650914] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.652878] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.address-family' is not recognized
[2018-09-28 23:34:57.652911] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652919] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652927] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652934] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.652946] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.address-family' is not recognized
[2018-09-28 23:34:57.652954] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652962] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652969] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652976] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.683329] I [MSGID: 114020] [client.c:2354:notify]
7-mirror-client-1: parent translators are ready, attempting connect on
transport
Final graph:
+------------------------------------------------------------------------------+
1: volume mirror-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host workspace
5: option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick1/mirror
6: option transport-type socket
7: option transport.address-family inet
8: option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
9: option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: option send-gids true
15: end-volume
16:
17: volume mirror-client-1
18: type protocol/client
19: option ping-timeout 42
20: option remote-host workspace
21: option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick2/mirror
22: option transport-type socket
23: option transport.address-family inet
24: option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
25: option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
26: option transport.tcp-user-timeout 0
27: option transport.socket.keepalive-time 20
28: option transport.socket.keepalive-interval 2
29: option transport.socket.keepalive-count 9
30: option send-gids true
31: end-volume
32:
33: volume 31717a7458a543dca4bffc8e6b1017cc-replicate-0
34: type cluster/replicate
35: option afr-pending-xattr mirror-client-0,mirror-client-1
36: option use-compound-fops off
37: subvolumes mirror-client-0 mirror-client-1
38: end-volume
39:
40: volume 31717a7458a543dca4bffc8e6b1017cc-dht
41: type cluster/distribute
42: option lock-migration off
43: option force-migration off
44: subvolumes 31717a7458a543dca4bffc8e6b1017cc-replicate-0
45: end-volume
46:
47: volume 31717a7458a543dca4bffc8e6b1017cc-read-only
48: type features/read-only
49: option read-only on
50: subvolumes 31717a7458a543dca4bffc8e6b1017cc-dht
51: end-volume
52:
53: volume 31717a7458a543dca4bffc8e6b1017cc-write-behind
54: type performance/write-behind
55: subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-only
56: end-volume
57:
58: volume 31717a7458a543dca4bffc8e6b1017cc-read-ahead
59: type performance/read-ahead
60: subvolumes 31717a7458a543dca4bffc8e6b1017cc-write-behind
61: end-volume
62:
63: volume 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
64: type performance/readdir-ahead
65: option parallel-readdir off
66: option rda-request-size 131072
67: option rda-cache-limit 10MB
68: subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-ahead
69: end-volume
70:
71: volume 31717a7458a543dca4bffc8e6b1017cc-io-cache
72: type performance/io-cache
73: subvolumes 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
74: end-volume
75:
76: volume 31717a7458a543dca4bffc8e6b1017cc-quick-read
77: type performance/quick-read
78: subvolumes 31717a7458a543dca4bffc8e6b1017cc-io-cache
79: end-volume
80:
81: volume 31717a7458a543dca4bffc8e6b1017cc-open-behind
82: type performance/open-behind
83: subvolumes 31717a7458a543dca4bffc8e6b1017cc-quick-read
84: end-volume
85:
86: volume 31717a7458a543dca4bffc8e6b1017cc-md-cache
87: type performance/md-cache
88: subvolumes 31717a7458a543dca4bffc8e6b1017cc-open-behind
89: end-volume
90:
91: volume 31717a7458a543dca4bffc8e6b1017cc
92: type debug/io-stats
93: option log-level INFO
94: option latency-measurement off
95: option count-fop-hits off
96: subvolumes 31717a7458a543dca4bffc8e6b1017cc-md-cache
97: end-volume
98:
+------------------------------------------------------------------------------+
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. With brick multiplexing enabled, create a couple of snapshots for a
gluster volume
2. Do a snapshot restore (a possible command sequence is sketched below)
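A hypothetical reproduction sketch, assuming an existing volume named
testvol (the volume and snapshot names are examples):

# enable brick multiplexing (required to hit the bug)
gluster volume set all cluster.brick-multiplex on
# create and activate a couple of snapshots
gluster snapshot create snap1 testvol no-timestamp
gluster snapshot create snap2 testvol no-timestamp
gluster snapshot activate snap1
gluster snapshot activate snap2
# the volume must be stopped before it can be restored
gluster volume stop testvol
gluster snapshot restore snap2
# then inspect the log of the multiplexed snapshot brick: it receives
# the client volfile of the snapshot volume instead of its brick volfile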
Actual results:
The client volfile is sent to the snapshot brick.
Expected results:
The brick volfile should be sent instead.
Additional info:
--- Additional comment from Worker Ant on 2018-10-01 18:03:32 EDT ---
REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the
volfile) posted (#1) for review on master by Raghavendra Bhat
--- Additional comment from Mohammed Rafi KC on 2018-10-03 07:04:01 EDT ---
I tried to reproduce the issue but wasn't able to hit it with the given steps
to reproduce. Am I missing anything here?
--- Additional comment from Raghavendra Bhat on 2018-10-03 08:47:30 EDT ---
Rafi, once you do the snapshot restore, please check the log file of the
multiplexed snapshot brick: it prints the client volfile, which IMO should
not happen. In fact, due to another memory corruption bug, the brick process
crashed after it got the client volfile (that corruption bug is in the client
stack).
--- Additional comment from Worker Ant on 2018-10-04 01:03:07 EDT ---
REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the
volfile) posted (#4) for review on master by Atin Mukherjee
--- Additional comment from Worker Ant on 2018-10-04 14:33:01 EDT ---
REVIEW: https://review.gluster.org/21348 (mgmt/glusterd: use proper path to the
volfile) posted (#1) for review on release-4.1 by Raghavendra Bhat
--- Additional comment from Worker Ant on 2018-10-05 10:26:07 EDT ---
COMMIT: https://review.gluster.org/21348 committed in release-4.1 by
"Raghavendra Bhat" <raghavendra at redhat.com> with the commit message:
mgmt/glusterd: use proper path to the volfile
Until now, glusterd generated the volfile path for a snapshot volume's
bricks like this:
/snaps/<snap name>/<brick volfile>
But in reality, the path to the brick volfile for a snapshot volume is
/snaps/<snap name>/<snap volume name>/<brick volfile>
The shortened form was a workaround to distinguish between a mount command
used to mount the snapshot volume and a brick of the snapshot volume, so
that glusterd could return the proper volfile for each (the client volfile
for the former and the brick volfile for the latter). But this causes
problems for snapshot restore when brick multiplexing is enabled, because
a multiplexed brick tries to find the volfile and sends its GETSPEC rpc
call to glusterd using the 2nd style of path, i.e.
/snaps/<snap name>/<snap volume name>/<brick volfile>
So, when the snapshot brick (which is multiplexed) sends a GETSPEC rpc
request to glusterd to obtain the brick volume file, glusterd returns the
client volume file of the snapshot volume instead of the brick volume file.
Change-Id: I28b2dfa5d9b379fe943db92c2fdfea879a6a594e
fixes: bz#1636218
Signed-off-by: Raghavendra Bhat <raghavendra at redhat.com>
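To illustrate the path change described in the commit message, here is a
minimal, hypothetical C sketch; build_snap_brick_volfile_path() and its
arguments are made up for illustration and are not glusterd's actual
internals:

#include <stdio.h>

/* Builds the GETSPEC key / volfile path for a snapshot brick. */
static void
build_snap_brick_volfile_path(char *path, size_t len, const char *snap_name,
                              const char *snap_volname,
                              const char *brick_volfile)
{
    /* Old (buggy) form: /snaps/<snap name>/<brick volfile>. A brick's
     * GETSPEC request using the longer form below did not match this,
     * so glusterd fell back to the snapshot volume's client volfile. */
    /* snprintf(path, len, "/snaps/%s/%s", snap_name, brick_volfile); */

    /* Fixed form: /snaps/<snap name>/<snap volume name>/<brick volfile>,
     * matching what a multiplexed snapshot brick actually requests. */
    snprintf(path, len, "/snaps/%s/%s/%s", snap_name, snap_volname,
             brick_volfile);
}

int
main(void)
{
    char path[256];

    build_snap_brick_volfile_path(path, sizeof(path), "snap1",
                                  "31717a7458a543dca4bffc8e6b1017cc",
                                  "workspace.brick1.mirror");
    printf("%s\n", path);
    return 0;
}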
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1635050
[Bug 1635050] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile
https://bugzilla.redhat.com/show_bug.cgi?id=1636162
[Bug 1636162] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile
https://bugzilla.redhat.com/show_bug.cgi?id=1636218
[Bug 1636218] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile
https://bugzilla.redhat.com/show_bug.cgi?id=1636291
[Bug 1636291] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile