[Bugs] [Bug 1647165] New: [SNAPSHOT]: with brick multiplexing, snapshot restore will make glusterd send wrong volfile
bugzilla at redhat.com
Tue Nov 6 19:05:45 UTC 2018
https://bugzilla.redhat.com/show_bug.cgi?id=1647165
Bug ID: 1647165
Summary: [SNAPSHOT]: with brick multiplexing, snapshot restore
will make glusterd send wrong volfile
Product: GlusterFS
Version: 3.12
Component: snapshot
Assignee: bugs at gluster.org
Reporter: rabhat at redhat.com
CC: bugs at gluster.org, rabhat at redhat.com,
rkavunga at redhat.com
Depends On: 1635050, 1636218
Blocks: 1636291, 1636162
+++ This bug was initially created as a clone of Bug #1636218 +++
+++ This bug was initially created as a clone of Bug #1635050 +++
Description of problem:
With brick multiplexing enabled, a snapshot restore results in the snapshot
brick receiving the client volume file for the corresponding snapshot volume
instead of the correct brick volume file.
[2018-09-28 23:34:10.405953] I
[glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching
not-only child
/run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick1/mirror
[2018-09-28 23:34:10.406009] I
[glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching
not-only child
/run/gluster/snaps/1534bc2e092341d7bb7d940e9728c3ca/brick2/mirror
The two lines above indicate that this is a brick process. Because of the
restore, the original snapshot volume's bricks are stopped, which leads to the
brick detach logged above.
[2018-09-28 23:34:10.407276] E
[rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not
unregister with portmap
[2018-09-28 23:34:10.407276] E
[rpcsvc.c:1542:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not
unregister with portmap
[2018-09-28 23:34:10.407299] E [rpcsvc.c:1670:rpcsvc_program_unregister]
0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407310] E [rpcsvc.c:1670:rpcsvc_program_unregister]
0-rpc-service: portmap unregistration of program failed
[2018-09-28 23:34:10.407319] E [rpcsvc.c:1720:rpcsvc_program_unregister]
0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num:
1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.407323] E [rpcsvc.c:1720:rpcsvc_program_unregister]
0-rpc-service: Program unregistration failed: GlusterFS Changelog, Num:
1885957735, Ver: 1, Port: 0
[2018-09-28 23:34:10.409365] I [barrier.c:665:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing
all the queued fops
[2018-09-28 23:34:10.409382] I [barrier.c:665:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-barrier: Disabling barriering and dequeuing
all the queued fops
[2018-09-28 23:34:10.413063] I [io-stats.c:3937:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413151] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413178] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413182] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:3
[2018-09-28 23:34:10.413195] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:3
[2018-09-28 23:34:10.413700] I [io-stats.c:3937:fini]
0-1534bc2e092341d7bb7d940e9728c3ca-io-stats: io-stats translator unloaded
[2018-09-28 23:34:10.413775] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 4.x v1'
[2018-09-28 23:34:10.413786] I [rpcsvc.c:2054:rpcsvc_spawn_threads]
0-rpc-service: terminating 1 threads for program 'GlusterFS 3.3'
[2018-09-28 23:34:10.413805] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 4.x v1' thread terminated; total count:2
[2018-09-28 23:34:10.413807] I [rpcsvc.c:1993:rpcsvc_request_handler]
0-rpc-service: program 'GlusterFS 3.3' thread terminated; total count:2
[2018-09-28 23:34:10.923188] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2018-09-28 23:34:10.923350] I [MSGID: 101191]
[event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with
index 3
[2018-09-28 23:34:11.031917] I [MSGID: 101191]
[event-epoll.c:653:event_dispatch_epoll_worker] 0-epoll: Exited thread with
index 4
[2018-09-28 23:34:57.645690] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/protocol/client.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646288] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646714] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.646929] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647103] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647274] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647470] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.647817] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.648021] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649370] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/cluster/replicate.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649654] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/features/read-only.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.649858] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/write-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650027] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/read-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650192] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/readdir-ahead.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650381] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/io-cache.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650716] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/performance/open-behind.so: undefined
symbol: xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.650914] I [MSGID: 101097]
[xlator.c:341:xlator_dynload_newway] 0-xlator: dlsym(xlator_api) on
/usr/local/lib/glusterfs/6dev/xlator/debug/io-stats.so: undefined symbol:
xlator_api. Fall back to old symbols
[2018-09-28 23:34:57.652878] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.address-family' is not recognized
[2018-09-28 23:34:57.652911] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652919] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652927] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652934] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-1: option
'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.652946] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.address-family' is not recognized
[2018-09-28 23:34:57.652954] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.tcp-user-timeout' is not recognized
[2018-09-28 23:34:57.652962] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.socket.keepalive-time' is not recognized
[2018-09-28 23:34:57.652969] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.socket.keepalive-interval' is not recognized
[2018-09-28 23:34:57.652976] W [MSGID: 101174]
[graph.c:397:_log_if_unknown_option] 7-mirror-client-0: option
'transport.socket.keepalive-count' is not recognized
[2018-09-28 23:34:57.683329] I [MSGID: 114020] [client.c:2354:notify]
7-mirror-client-1: parent translators are ready, attempting connect on
transport
Final graph:
+------------------------------------------------------------------------------+
1: volume mirror-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host workspace
5: option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick1/mirror
6: option transport-type socket
7: option transport.address-family inet
8: option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
9: option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
10: option transport.tcp-user-timeout 0
11: option transport.socket.keepalive-time 20
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-count 9
14: option send-gids true
15: end-volume
16:
17: volume mirror-client-1
18: type protocol/client
19: option ping-timeout 42
20: option remote-host workspace
21: option remote-subvolume /run/gluster/snaps/31717a7458a543dca4bffc8e6b1017cc/brick2/mirror
22: option transport-type socket
23: option transport.address-family inet
24: option username 8b527b11-a2e6-45e3-a12a-99b46dc636ab
25: option password 01f72c35-9b3a-47b3-852b-95bdb481e66a
26: option transport.tcp-user-timeout 0
27: option transport.socket.keepalive-time 20
28: option transport.socket.keepalive-interval 2
29: option transport.socket.keepalive-count 9
30: option send-gids true
31: end-volume
32:
33: volume 31717a7458a543dca4bffc8e6b1017cc-replicate-0
34: type cluster/replicate
35: option afr-pending-xattr mirror-client-0,mirror-client-1
36: option use-compound-fops off
37: subvolumes mirror-client-0 mirror-client-1
38: end-volume
39:
40: volume 31717a7458a543dca4bffc8e6b1017cc-dht
41: type cluster/distribute
42: option lock-migration off
43: option force-migration off
44: subvolumes 31717a7458a543dca4bffc8e6b1017cc-replicate-0
45: end-volume
46:
47: volume 31717a7458a543dca4bffc8e6b1017cc-read-only
48: type features/read-only
49: option read-only on
50: subvolumes 31717a7458a543dca4bffc8e6b1017cc-dht
51: end-volume
52:
53: volume 31717a7458a543dca4bffc8e6b1017cc-write-behind
54: type performance/write-behind
55: subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-only
56: end-volume
57:
58: volume 31717a7458a543dca4bffc8e6b1017cc-read-ahead
59: type performance/read-ahead
60: subvolumes 31717a7458a543dca4bffc8e6b1017cc-write-behind
61: end-volume
62:
63: volume 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
64: type performance/readdir-ahead
65: option parallel-readdir off
66: option rda-request-size 131072
67: option rda-cache-limit 10MB
68: subvolumes 31717a7458a543dca4bffc8e6b1017cc-read-ahead
69: end-volume
70:
71: volume 31717a7458a543dca4bffc8e6b1017cc-io-cache
72: type performance/io-cache
73: subvolumes 31717a7458a543dca4bffc8e6b1017cc-readdir-ahead
74: end-volume
75:
76: volume 31717a7458a543dca4bffc8e6b1017cc-quick-read
77: type performance/quick-read
78: subvolumes 31717a7458a543dca4bffc8e6b1017cc-io-cache
79: end-volume
80:
81: volume 31717a7458a543dca4bffc8e6b1017cc-open-behind
82: type performance/open-behind
83: subvolumes 31717a7458a543dca4bffc8e6b1017cc-quick-read
84: end-volume
85:
86: volume 31717a7458a543dca4bffc8e6b1017cc-md-cache
87: type performance/md-cache
88: subvolumes 31717a7458a543dca4bffc8e6b1017cc-open-behind
89: end-volume
90:
91: volume 31717a7458a543dca4bffc8e6b1017cc
92: type debug/io-stats
93: option log-level INFO
94: option latency-measurement off
95: option count-fop-hits off
96: subvolumes 31717a7458a543dca4bffc8e6b1017cc-md-cache
97: end-volume
98:
+------------------------------------------------------------------------------+
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. With brick multiplexing enabled, create a couple of snapshots for a
gluster volume
2. Do a snapshot restore (a possible command sequence is sketched below)
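A hypothetical reproduction sketch, assuming an existing volume named
testvol (the volume and snapshot names are examples):

# enable brick multiplexing (required to hit the bug)
gluster volume set all cluster.brick-multiplex on
# create and activate a couple of snapshots
gluster snapshot create snap1 testvol no-timestamp
gluster snapshot create snap2 testvol no-timestamp
gluster snapshot activate snap1
gluster snapshot activate snap2
# the volume must be stopped before it can be restored
gluster volume stop testvol
gluster snapshot restore snap2
# then inspect the log of the multiplexed snapshot brick: it receives
# the client volfile of the snapshot volume instead of its brick volfile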
Actual results:
The client volfile is sent to the snapshot brick.
Expected results:
The brick volfile should be sent instead.
Additional info:
--- Additional comment from Worker Ant on 2018-10-01 18:03:32 EDT ---
REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the
volfile) posted (#1) for review on master by Raghavendra Bhat
--- Additional comment from Mohammed Rafi KC on 2018-10-03 07:04:01 EDT ---
I tried to reproduce the issue but wasn't able to hit it with the given steps
to reproduce. Am I missing anything here?
--- Additional comment from Raghavendra Bhat on 2018-10-03 08:47:30 EDT ---
Rafi, once you do the snapshot restore, please check the log file of the
multiplexed snapshot brick: it prints the client volfile, which IMO should
not happen. In fact, due to another memory corruption bug, the brick process
crashed after it got the client volfile (that corruption bug is in the client
stack).
--- Additional comment from Worker Ant on 2018-10-04 01:03:07 EDT ---
REVIEW: https://review.gluster.org/21314 (mgmt/glusterd: use proper path to the
volfile) posted (#4) for review on master by Atin Mukherjee
--- Additional comment from Worker Ant on 2018-10-04 14:33:01 EDT ---
REVIEW: https://review.gluster.org/21348 (mgmt/glusterd: use proper path to the
volfile) posted (#1) for review on release-4.1 by Raghavendra Bhat
--- Additional comment from Worker Ant on 2018-10-05 10:26:07 EDT ---
COMMIT: https://review.gluster.org/21348 committed in release-4.1 by
"Raghavendra Bhat" <raghavendra at redhat.com> with the commit message:
mgmt/glusterd: use proper path to the volfile
Until now, glusterd generated the volfile path for a snapshot volume's
bricks like this:
/snaps/<snap name>/<brick volfile>
But in reality, the path to the brick volfile for a snapshot volume is
/snaps/<snap name>/<snap volume name>/<brick volfile>
The shortened form was a workaround to distinguish between a mount command
used to mount the snapshot volume and a brick of the snapshot volume, so
that glusterd could return the proper volfile for each (the client volfile
for the former and the brick volfile for the latter). But this causes
problems for snapshot restore when brick multiplexing is enabled, because
a multiplexed brick tries to find the volfile and sends its GETSPEC rpc
call to glusterd using the 2nd style of path, i.e.
/snaps/<snap name>/<snap volume name>/<brick volfile>
So, when the snapshot brick (which is multiplexed) sends a GETSPEC rpc
request to glusterd to obtain the brick volume file, glusterd returns the
client volume file of the snapshot volume instead of the brick volume file.
Change-Id: I28b2dfa5d9b379fe943db92c2fdfea879a6a594e
fixes: bz#1636218
Signed-off-by: Raghavendra Bhat <raghavendra at redhat.com>
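To illustrate the path change described in the commit message, here is a
minimal, hypothetical C sketch; build_snap_brick_volfile_path() and its
arguments are made up for illustration and are not glusterd's actual
internals:

#include <stdio.h>

/* Builds the GETSPEC key / volfile path for a snapshot brick. */
static void
build_snap_brick_volfile_path(char *path, size_t len, const char *snap_name,
                              const char *snap_volname,
                              const char *brick_volfile)
{
    /* Old (buggy) form: /snaps/<snap name>/<brick volfile>. A brick's
     * GETSPEC request using the longer form below did not match this,
     * so glusterd fell back to the snapshot volume's client volfile. */
    /* snprintf(path, len, "/snaps/%s/%s", snap_name, brick_volfile); */

    /* Fixed form: /snaps/<snap name>/<snap volume name>/<brick volfile>,
     * matching what a multiplexed snapshot brick actually requests. */
    snprintf(path, len, "/snaps/%s/%s/%s", snap_name, snap_volname,
             brick_volfile);
}

int
main(void)
{
    char path[256];

    build_snap_brick_volfile_path(path, sizeof(path), "snap1",
                                  "31717a7458a543dca4bffc8e6b1017cc",
                                  "workspace.brick1.mirror");
    printf("%s\n", path);
    return 0;
}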
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1635050
[Bug 1635050] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile
https://bugzilla.redhat.com/show_bug.cgi?id=1636162
[Bug 1636162] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile
https://bugzilla.redhat.com/show_bug.cgi?id=1636218
[Bug 1636218] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile
https://bugzilla.redhat.com/show_bug.cgi?id=1636291
[Bug 1636291] [SNAPSHOT]: with brick multiplexing, snapshot restore will
make glusterd send wrong volfile