[Bugs] [Bug 1731509] New: snapd crashes sometimes
bugzilla at redhat.com
bugzilla at redhat.com
Fri Jul 19 15:12:45 UTC 2019
https://bugzilla.redhat.com/show_bug.cgi?id=1731509
Bug ID: 1731509
Summary: snapd crashes sometimes
Product: GlusterFS
Version: 6
Status: NEW
Component: snapshot
Severity: high
Priority: high
Assignee: bugs at gluster.org
Reporter: rabhat at redhat.com
CC: atumball at redhat.com, bugs at gluster.org
Depends On: 1726783
Target Milestone: ---
Classification: Community
+++ This bug was initially created as a clone of Bug #1726783 +++
Description of problem:
This crash was found in the regression runs and it happens once in a while and
not always.
Core was generated by `/build/install/sbin/glusterfsd -s localhost --volfile-id
snapd/patchy -p /var/r'.
Program terminated with signal SIGSEGV, Segmentation fault.
bt
#0 0x00007f0a5d00b016 in __strcmp_sse42 () from ./lib64/libc.so.6
#1 0x00007f0a501deceb in mgmt_get_snapinfo_cbk (req=0x7f0a44001808,
iov=0x7f0a44001840, count=1, myframe=0x7f0a440014b8)
at
/home/raghu/work/src/glusterfs/xlators/features/snapview-server/src/snapview-server-mgmt.c:393
#2 0x00007f0a5e67a58c in rpc_clnt_handle_reply (clnt=0xa4d850,
pollin=0x7f0a4c002080) at
/home/raghu/work/src/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:764
#3 0x00007f0a5e67aab5 in rpc_clnt_notify (trans=0xa4da80, mydata=0xa4d880,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f0a4c002080)
at /home/raghu/work/src/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:931
#4 0x00007f0a5e676a9c in rpc_transport_notify (this=0xa4da80,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7f0a4c002080)
at /home/raghu/work/src/glusterfs/rpc/rpc-lib/src/rpc-transport.c:547
#5 0x00007f0a52c85a19 in socket_event_poll_in (this=0xa4da80,
notify_handled=true) at
/home/raghu/work/src/glusterfs/rpc/rpc-transport/socket/src/socket.c:2582
#6 0x00007f0a52c86a2e in socket_event_handler (fd=9, idx=1, gen=1,
data=0xa4da80, poll_in=1, poll_out=0, poll_err=0, event_thread_died=0 '\000')
at
/home/raghu/work/src/glusterfs/rpc/rpc-transport/socket/src/socket.c:2984
#7 0x00007f0a5e9574f2 in event_dispatch_epoll_handler (event_pool=0x9ff830,
event=0x7f0a50be7140)
at /home/raghu/work/src/glusterfs/libglusterfs/src/event-epoll.c:648
#8 0x00007f0a5e957a0b in event_dispatch_epoll_worker (data=0xa4faf0) at
/home/raghu/work/src/glusterfs/libglusterfs/src/event-epoll.c:761
#9 0x00007f0a5d702dd5 in start_thread () from ./lib64/libpthread.so.0
#10 0x00007f0a5cfcaead in clone () from ./lib64/libc.so.6
(gdb) f 1
#1 0x00007f0a501deceb in mgmt_get_snapinfo_cbk (req=0x7f0a44001808,
iov=0x7f0a44001840, count=1, myframe=0x7f0a440014b8)
at
/home/raghu/work/src/glusterfs/xlators/features/snapview-server/src/snapview-server-mgmt.c:393
393 if ((!strcmp(old_dirents[i].name, dirents[j].name)) &&
(gdb) l
388 LOCK(&priv->snaplist_lock);
389 {
390 oldcount = priv->num_snaps;
391 for (i = 0; i < priv->num_snaps; i++) {
392 for (j = 0; j < snapcount; j++) {
393 if ((!strcmp(old_dirents[i].name, dirents[j].name)) &&
394 (!strcmp(old_dirents[i].uuid, dirents[j].uuid))) {
395 dirents[j].fs = old_dirents[i].fs;
396 old_dirents[i].fs = NULL;
397 break;
(gdb) p old_dirents
$1 = (snap_dirent_t *) 0x0
(gdb) p snap->dirents
No symbol "snap" in current context.
(gdb) p priv->dirents
$2 = (snap_dirent_t *) 0x7f0a440031e0
(gdb) p *priv->dirents
$3 = {name = "snap1", '\000' <repeats 249 times>, uuid =
"da58a428-260e-4c28-abae-6d426f7417fa", snap_volname =
"11536174454641028b9d901c7962abca", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p priv->dirents[0]
$4 = {name = "snap1", '\000' <repeats 249 times>, uuid =
"da58a428-260e-4c28-abae-6d426f7417fa", snap_volname =
"11536174454641028b9d901c7962abca", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p priv->dirents[1]
$5 = {name = "snap2", '\000' <repeats 249 times>, uuid =
"985b0bcc-279a-40d3-8a21-1cb7d250ae25", snap_volname =
"202c92fdf9624836931590897482a09b", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p priv->dirents[2]
$6 = {name = "snap3", '\000' <repeats 249 times>, uuid =
"9c001fcf-b010-48d6-b6f4-b70f120b69f8", snap_volname =
"11f4f20ef1514be5a572b602ffe8d96d", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p priv->dirents[3]
$7 = {name = "snap4", '\000' <repeats 249 times>, uuid =
"3b2eadfe-cad4-4c6f-ba2e-d2c89710d960", snap_volname =
"d28715725bb24411a7f4eebdff70e41a", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p priv->dirents[4]
$8 = {name = "\000\000\000\000\000\000\000\000a\325\001", '\000' <repeats 243
times>, uuid = '\000' <repeats 36 times>, snap_volname = '\000' <repeats 254
times>, fs = 0x0}
(gdb) p dirents
$9 = (snap_dirent_t *) 0x7f0a4c024190
(gdb) p dirents[0]
$10 = {name = "snap1", '\000' <repeats 249 times>, uuid =
"da58a428-260e-4c28-abae-6d426f7417fa", snap_volname =
"11536174454641028b9d901c7962abca", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p dirents[1]
$11 = {name = "snap2", '\000' <repeats 249 times>, uuid =
"985b0bcc-279a-40d3-8a21-1cb7d250ae25", snap_volname =
"202c92fdf9624836931590897482a09b", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p dirents[2]
$12 = {name = "snap3", '\000' <repeats 249 times>, uuid =
"9c001fcf-b010-48d6-b6f4-b70f120b69f8", snap_volname =
"11f4f20ef1514be5a572b602ffe8d96d", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p dirents[3]
$13 = {name = "snap4", '\000' <repeats 249 times>, uuid =
"3b2eadfe-cad4-4c6f-ba2e-d2c89710d960", snap_volname =
"d28715725bb24411a7f4eebdff70e41a", '\000' <repeats 222 times>, fs = 0x0}
(gdb) p dirents[4]
$14 = {
name =
"\000\000\000\000\000\000\000\000\245\004\000\000\000\000\000\000\000@\315I\n\177\000\000\020\f\000L\n\177\000\000؍\356I\n\177\000\000\000\000\000\000\000\000\000\000@\311\001L\n\177\000\000`J\002L\n\177\000\000\000\000\000\000\000\000\000\000\320N\002L\n\177\000\000\000\000\000\000\000\000\000\000؍\356I\n\177\000\000\270\216\356I\n\177\000\000\250\216\356I\n\177\000\000\000\000\000\000\000\000\000\000h\216\356I\n\177\000\000x\216\356I\n\177\000\000\350\216\356I\n\177\000\000\370\216\356I\n\177\000\000\b\217\356I\n\177\000\000\210\216\356I\n\177\000\000\230\216\356I\n\177\000\000\370\215\356I\n\177\000\000\b\216\356I\n\177\000\000\350\215\356I\n\177",
'\000' <repeats 42 times>...,
uuid =
"\000\000\000\000\000\000\000\000\000؎\356I\n\177\000\000\000\000\000\000\000\000\000\000\030\216\356I\n\177\000\000\070\216\356I",
snap_volname = "\n\177\000\000(\216\356I\n\177\000\000H\216\356I\n\177",
'\000' <repeats 42 times>,
"H\217\356I\n\177\000\000\070\217\356I\n\177\000\000(\217\356I\n\177\000\000\030\217\356I\n\177",
'\000' <repeats 18 times>, "h\217\356I\n\177", '\000' <repeats 66 times>,
"X\217\356I\n\177", '\000' <repeats 68 times>, fs = 0x0}
(gdb) p snapcount
$15 = 4
(gdb) p oldcount
$16 = 4
Version-Release number of selected component (if applicable):
How reproducible:
Not always
Additional info:
Reason:
In snapshot daemon, mgmt_get_snapinfo_cbk is the callback for the rpc call to
the glusterd to get the list of snapshots available. The rpc call to glusterd
is made at two instances. One is when snapview-server xlator is spawned it
makes the call in the init function. Another is whenever there is connection
happening between snapd and glusterd (in svs_rpc_notify).
But whenever snapd comes up, it is both inited and gets connected to glusterd.
So, there can be 2 rpc calls happening to glusterd and the order in which the
responses comes can lead to the above crash.
in mgmt_get_snapinfo_cbk we do this
1) Get the list of existing snapshots from the local information private to the
snapview-server xlator
2) Then process the list of snapshots that glusterd sent.
Ideally mgmt_get_snapinfo_cbk should be getting the current list of snapshots
from its private structure under a lock. If not, then at the time of getting
the local list, the list could be empty. But at the time of processing and
comparing local list with the new list obtained via rpc, the list could have
changed. So the empty lists results in a NULL pointer access.
So the fix is to ensure that the local list of snapshots from its private
structure are obtained inside a lock.
--- Additional comment from Worker Ant on 2019-07-03 18:07:23 UTC ---
REVIEW: https://review.gluster.org/22989 (features/snapview-server: obtain the
list of snapshots inside the lock) posted (#1) for review on master by
Raghavendra Bhat
--- Additional comment from Worker Ant on 2019-07-03 18:08:58 UTC ---
REVIEW: https://review.gluster.org/22990 (features/snapview-server: obtain the
list of snapshots inside the lock) posted (#1) for review on master by
Raghavendra Bhat
--- Additional comment from Worker Ant on 2019-07-12 13:37:34 UTC ---
REVIEW: https://review.gluster.org/22990 (features/snapview-server: obtain the
list of snapshots inside the lock) merged (#2) on master by mohammed rafi kc
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1726783
[Bug 1726783] snapd crashes sometimes
--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
More information about the Bugs
mailing list