[Gluster-devel] add-brick crashes client
Emmanuel Dreyfus
manu at netbsd.org
Fri Aug 3 04:52:11 UTC 2012
Hi
I feel unlucky with release-3.3. Adding a pair of bricks to a replicated
volume crashes a client that is using the volume.
The client log is attached. Here is the glusterfsd backtrace in gdb:
Program terminated with signal 11, Segmentation fault.
#0 0xbbbc239c in synctask_wrap (old_task=0xbb711000) at syncop.c:120
120 task->ret = task->syncfn (task->opaque);
(gdb) bt
#0 0xbbbc239c in synctask_wrap (old_task=0xbb711000) at syncop.c:120
#1 0xbb8ccbe0 in swapcontext () from /lib/libc.so.12
Backtrace stopped: Not enough registers or memory available to unwind further
(gdb) print task
$1 = (struct synctask *) 0x0
This means pthread_getspecific(synctask_key) in synctask_get() returned
NULL, something I cannot explain. I see the logs complain about a volume
being down. This may be the cause of the problem since I have been able
to do a live brick-add later, once I restarted glusterd/glusterfsd on
all bricks.
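
For reference, here is a minimal sketch of the failure mode, not the actual
syncop.c code. POSIX specifies that pthread_getspecific() returns NULL when
the calling thread has never associated a value with the key, so a wrapper
that dereferences the result unconditionally will segfault exactly as in the
backtrace above. All names below (demo_task, demo_key, demo_task_wrap,
demo_run) are hypothetical stand-ins, and the NULL check is only an
illustration of a guard, not a proposed fix for the underlying bug.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct synctask; not the GlusterFS definition. */
struct demo_task {
    int (*fn)(void *);
    void *opaque;
    int ret;
};

static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void demo_key_init(void)
{
    pthread_key_create(&demo_key, NULL);
}

/* Same shape as the lookup described above: NULL if this thread set nothing. */
static struct demo_task *demo_task_get(void)
{
    pthread_once(&demo_key_once, demo_key_init);
    return pthread_getspecific(demo_key);
}

/* Guarded wrapper: reports -1 instead of dereferencing a NULL task. */
static int demo_task_wrap(void)
{
    struct demo_task *task = demo_task_get();

    if (task == NULL)
        return -1;
    task->ret = task->fn(task->opaque);
    return 0;
}

static int demo_fn(void *opaque)
{
    return *(int *)opaque;
}

/* This thread stores the task under the key before calling the wrapper. */
static void *thread_with_value(void *arg)
{
    pthread_once(&demo_key_once, demo_key_init);
    pthread_setspecific(demo_key, arg);
    return (void *)(intptr_t)demo_task_wrap();
}

/* This thread never calls pthread_setspecific(), so the lookup yields NULL. */
static void *thread_without_value(void *arg)
{
    (void)arg;
    return (void *)(intptr_t)demo_task_wrap();
}

/* Runs both threads and reports what each wrapper call returned. */
int demo_run(int *rc_set, int *rc_unset, int *task_ret)
{
    int payload = 42;
    struct demo_task task = { demo_fn, &payload, 0 };
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;

    if (pthread_create(&t1, NULL, thread_with_value, &task) != 0)
        return -1;
    if (pthread_join(t1, &r1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, thread_without_value, NULL) != 0)
        return -1;
    if (pthread_join(t2, &r2) != 0)
        return -1;

    *rc_set = (int)(intptr_t)r1;
    *rc_unset = (int)(intptr_t)r2;
    *task_ret = task.ret;
    return 0;
}
```

Calling demo_run() from a small main() and linking with the pthread library
should show the wrapper succeeding only in the thread that set the key. It
does not explain why the key was unset in the crashing client.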
--
Emmanuel Dreyfus
manu at netbsd.org
-------------- next part --------------
[2012-08-03 06:11:13.853900] I [glusterfsd-mgmt.c:64:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2012-08-03 06:11:14.991707] I [io-cache.c:1549:check_cache_size_ok] 1-gfs-quick-read: Max cache size is 18446744069951455232
[2012-08-03 06:11:14.991834] I [io-cache.c:1549:check_cache_size_ok] 1-gfs-io-cache: Max cache size is 18446744069951455232
[2012-08-03 06:11:15.056753] I [client.c:2142:notify] 1-gfs-client-0: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.059123] I [client.c:2142:notify] 1-gfs-client-1: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.061329] I [client.c:2142:notify] 1-gfs-client-2: parent translators are ready, attempting connect on transport
[2012-08-03 06:11:15.063623] I [client.c:2142:notify] 1-gfs-client-3: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
1: volume gfs-client-0
2: type protocol/client
3: option remote-host silo
4: option remote-subvolume /export/wd3a
5: option transport-type tcp
6: end-volume
7:
8: volume gfs-client-1
9: type protocol/client
10: option remote-host hangar
11: option remote-subvolume /export/wd3a
12: option transport-type tcp
13: end-volume
14:
15: volume gfs-client-2
16: type protocol/client
17: option remote-host hangar
18: option remote-subvolume /export/wd1a
19: option transport-type tcp
20: end-volume
21:
22: volume gfs-client-3
23: type protocol/client
24: option remote-host hotstuff
25: option remote-subvolume /export/wd1a
26: option transport-type tcp
27: end-volume
28:
29: volume gfs-replicate-0
30: type cluster/replicate
31: subvolumes gfs-client-0 gfs-client-1
32: end-volume
33:
34: volume gfs-replicate-1
35: type cluster/replicate
36: subvolumes gfs-client-2 gfs-client-3
37: end-volume
38:
39: volume gfs-dht
40: type cluster/distribute
41: subvolumes gfs-replicate-0 gfs-replicate-1
42: end-volume
43:
44: volume gfs-write-behind
45: type performance/write-behind
46: subvolumes gfs-dht
47: end-volume
48:
49: volume gfs-read-ahead
50: type performance/read-ahead
51: subvolumes gfs-write-behind
52: end-volume
53:
54: volume gfs-io-cache
55: type performance/io-cache
56: subvolumes gfs-read-ahead
57: end-volume
58:
59: volume gfs-quick-read
60: type performance/quick-read
61: subvolumes gfs-io-cache
62: end-volume
63:
64: volume gfs-md-cache
65: type performance/md-cache
66: subvolumes gfs-quick-read
67: end-volume
68:
69: volume gfs
70: type debug/io-stats
71: option latency-measurement off
72: option count-fop-hits off
73: subvolumes gfs-md-cache
74: end-volume
+------------------------------------------------------------------------------+
[2012-08-03 06:11:15.070451] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-0: changing port to 24010 (from 0)
[2012-08-03 06:11:16.240641] E [client-handshake.c:1717:client_query_portmap_cbk] 1-gfs-client-3: failed to get the port number for remote subvolume
[2012-08-03 06:11:16.240890] I [client.c:2090:client_rpc_notify] 1-gfs-client-3: disconnected
[2012-08-03 06:11:16.533693] E [client-handshake.c:1717:client_query_portmap_cbk] 1-gfs-client-2: failed to get the port number for remote subvolume
[2012-08-03 06:11:16.533963] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-1: changing port to 24010 (from 0)
[2012-08-03 06:11:16.534166] I [client.c:2090:client_rpc_notify] 1-gfs-client-2: disconnected
[2012-08-03 06:11:16.534363] E [afr-common.c:3664:afr_notify] 1-gfs-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2012-08-03 06:11:18.609639] I [client-handshake.c:1636:select_server_supported_programs] 1-gfs-client-0: Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-08-03 06:11:18.613869] I [client-handshake.c:1433:client_setvolume_cbk] 1-gfs-client-0: Connected to 192.0.2.99:24010, attached to remote volume '/export/wd3a'.
[2012-08-03 06:11:18.614028] I [client-handshake.c:1445:client_setvolume_cbk] 1-gfs-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-08-03 06:11:18.614483] I [afr-common.c:3627:afr_notify] 1-gfs-replicate-0: Subvolume 'gfs-client-0' came back up; going online.
[2012-08-03 06:11:18.615776] I [client-handshake.c:453:client_set_lk_version_cbk] 1-gfs-client-0: Server lk version = 1
[2012-08-03 06:11:19.625116] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-2: changing port to 24011 (from 0)
[2012-08-03 06:11:19.626509] I [client-handshake.c:1636:select_server_supported_programs] 1-gfs-client-1: Using Program GlusterFS 3.3git, Num (1298437), Version (330)
[2012-08-03 06:11:19.627991] I [rpc-clnt.c:1660:rpc_clnt_reconfig] 1-gfs-client-3: changing port to 24009 (from 0)
[2012-08-03 06:11:19.628392] I [client-handshake.c:1433:client_setvolume_cbk] 1-gfs-client-1: Connected to 192.0.2.98:24010, attached to remote volume '/export/wd3a'.
[2012-08-03 06:11:19.628606] I [client-handshake.c:1445:client_setvolume_cbk] 1-gfs-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-08-03 06:11:19.664120] I [fuse-bridge.c:4193:fuse_graph_setup] 0-fuse: switched to graph 1
[2012-08-03 06:11:19.665868] I [client-handshake.c:453:client_set_lk_version_cbk] 1-gfs-client-1: Server lk version = 1
[2012-08-03 06:11:19.669841] I [afr-common.c:1964:afr_set_root_inode_on_first_lookup] 1-gfs-replicate-0: added root inode
[2012-08-03 06:11:19.671492] I [dht-layout.c:593:dht_layout_normalize] 1-gfs-dht: found anomalies in /. holes=1 overlaps=0
[2012-08-03 06:11:19.672057] W [dht-selfheal.c:875:dht_selfheal_directory] 1-gfs-dht: 1 subvolumes down -- not fixing
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-08-03 06:11:19
configuration details:
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
spinlock 1
extattr.h 1
xattr.h 1
st_atimespec.tv_nsec 1
package-string: glusterfs 3.3git