[Gluster-users] Distribute rebalance issues

Stephen Remde stephen.remde at gaist.co.uk
Tue Oct 17 09:18:34 UTC 2017


Hi,


I have a rebalance that has now failed twice on one peer. The rebalance
logs are below (directories anonymised and some irrelevant log lines cut).
It looks like the rebalance process loses its connection to the brick and
immediately stops the rebalance on that peer instead of waiting for the
reconnection, which the logs show happening about ten seconds later.
Is this normal behaviour? So far it has been the same server and the
same (remote) brick both times.


The brick shows a high number of disconnects compared to the other
bricks on the same server:


./export-md0-brick.log.1      2
./export-md1-brick.log.1      2
./export-md2-brick.log.1    181
./export-md3-brick.log.1      2
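
A per-file count like the one above can be reproduced with a short
script. This is only a minimal sketch: the search string "disconnect",
the function name count_disconnects and the script name are assumptions
for illustration, not necessarily how the figures above were obtained.

```python
import sys


def count_disconnects(lines, needle="disconnect"):
    """Count the lines that contain `needle`, ignoring case.

    The default needle is an assumption; adjust it to the exact
    message you want to count in the brick logs.
    """
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


if __name__ == "__main__":
    # Hypothetical usage:
    #   python count_disconnects.py ./export-md*-brick.log.1
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            print(f"{path}\t{count_disconnects(fh)}")
```

Passing the log paths on the command line avoids hard-coding a log
directory, which may differ between installations.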


Any clues? What could be causing this? There is nothing in the log to
indicate a cause.


Steve


gluster volume info video

Volume Name: video
Type: Distribute
Volume ID: ccdac37f-9b0e-415f-b62e-9071d8168199
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: 10.0.0.31:/export/md0/brick
Brick2: 10.0.0.32:/export/md0/brick
Brick3: 10.0.0.31:/export/md1/brick
Brick4: 10.0.0.32:/export/md1/brick
Brick5: 10.0.0.31:/export/md2/brick
Brick6: 10.0.0.32:/export/md2/brick
Brick7: 10.0.0.31:/export/md3/brick
Brick8: 10.0.0.32:/export/md3/brick
Brick9: 10.0.0.33:/export/md0/brick
Options Reconfigured:
network.ping-timeout: 10
cluster.min-free-disk: 1%
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.rebal-throttle: lazy

[2017-10-12 23:00:55.099153] W [socket.c:590:__socket_rwv]
0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by
peer)
[2017-10-12 23:00:55.099709] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from
video-client-4. Client process will keep trying to connect to glusterd
until brick's port is available
[2017-10-12 23:00:55.099741] W [MSGID: 109073]
[dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN.
Exiting
[2017-10-12 23:00:55.099752] I [MSGID: 109029]
[dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on
rebalance
[2017-10-12 23:01:05.478462] I [rpc-clnt.c:1947:rpc_clnt_reconfig]
0-video-client-4: changing port to 49164 (from 0)
[2017-10-12 23:01:05.481180] I [MSGID: 114057]
[client-handshake.c:1446:select_server_supported_programs]
0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-10-12 23:01:05.482630] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4:
Connected to video-client-4, attached to remote volume
'/export/md2/brick'.
[2017-10-12 23:01:05.482659] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4:
Server and Client lk-version numbers are not same, reopening the fds
[2017-10-12 23:01:05.483365] I [MSGID: 114035]
[client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4:
Server lk version = 1
[2017-10-12 23:01:30.310089] I
[dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical
error from gf_defrag_get_entry
[2017-10-12 23:01:30.310166] E [MSGID: 109111]
[dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht:
gf_defrag_process_dir failed for directory: /y/y/y/y/y
[2017-10-12 23:01:30.380574] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /y/y/y/y/y
[2017-10-12 23:01:30.380756] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /y/y/y/y
[2017-10-12 23:01:30.380879] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /y/y/y
[2017-10-12 23:01:30.380965] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /y/y
[2017-10-12 23:03:09.285157] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f112b6d16ba]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55b325019545]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55b3250193b4] ) 0-:
received signum (15), shutting down

[2017-10-17 03:20:28.921512] W [socket.c:590:__socket_rwv]
0-video-client-4: readv on 10.0.0.31:49164 failed (Connection reset by
peer)
[2017-10-17 03:20:28.921554] I [MSGID: 114018]
[client.c:2280:client_rpc_notify] 0-video-client-4: disconnected from
video-client-4. Client process will keep trying to connect to glusterd
until brick's port is available
[2017-10-17 03:20:28.921570] W [MSGID: 109073]
[dht-common.c:8839:dht_notify] 0-video-dht: Received CHILD_DOWN.
Exiting
[2017-10-17 03:20:28.921578] I [MSGID: 109029]
[dht-rebalance.c:4195:gf_defrag_stop] 0-: Received stop command on
rebalance
[2017-10-17 03:20:39.344417] I [rpc-clnt.c:1947:rpc_clnt_reconfig]
0-video-client-4: changing port to 49164 (from 0)
[2017-10-17 03:20:39.347440] I [MSGID: 114057]
[client-handshake.c:1446:select_server_supported_programs]
0-video-client-4: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2017-10-17 03:20:39.349244] I [MSGID: 114046]
[client-handshake.c:1222:client_setvolume_cbk] 0-video-client-4:
Connected to video-client-4, attached to remote volume
'/export/md2/brick'.
[2017-10-17 03:20:39.349261] I [MSGID: 114047]
[client-handshake.c:1233:client_setvolume_cbk] 0-video-client-4:
Server and Client lk-version numbers are not same, reopening the fds
[2017-10-17 03:20:39.350611] I [MSGID: 114035]
[client-handshake.c:201:client_set_lk_version_cbk] 0-video-client-4:
Server lk version = 1
[2017-10-17 03:27:17.231133] I
[dht-rebalance.c:2819:gf_defrag_process_dir] 0-DHT: Found critical
error from gf_defrag_get_entry
[2017-10-17 03:27:17.231214] E [MSGID: 109111]
[dht-rebalance.c:3090:gf_defrag_fix_layout] 0-video-dht:
gf_defrag_process_dir failed for directory: /x/x/x/x/x
[2017-10-17 03:27:17.562481] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /x/x/x/x/x
[2017-10-17 03:27:17.562619] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /x/x/x/x
[2017-10-17 03:27:17.562726] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /x/x/x
[2017-10-17 03:27:17.562810] E [MSGID: 109016]
[dht-rebalance.c:3267:gf_defrag_fix_layout] 0-video-dht: Fix layout
failed for /x/x
[2017-10-17 03:27:18.379825] W [glusterfsd.c:1327:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f700b9696ba]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55f9c0022545]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55f9c00223b4] ) 0-:
received signum (15), shutting down
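
The delay between the disconnect and the reconnect can be read off logs
like the ones above with a short script. This is a minimal sketch, not
a Gluster tool: the function name reconnect_gaps is made up, and it
assumes that each log entry starts with a bracketed timestamp and that
the strings "disconnected from" and "Connected to" mark the two events,
as they do in the excerpts above.

```python
import re
from datetime import datetime

# A log entry starts with "[YYYY-MM-DD HH:MM:SS.ffffff]"; any other
# line is treated as a continuation of a wrapped entry.
ENTRY_START = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]")


def reconnect_gaps(lines):
    """Return the seconds between each 'disconnected from' entry and
    the next 'Connected to' entry."""
    entries = []  # each item is [timestamp, full entry text]
    for raw in lines:
        line = raw.strip()
        match = ENTRY_START.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append([ts, line])
        elif entries and line:
            # Re-join an entry that was wrapped over several lines.
            entries[-1][1] += " " + line

    gaps = []
    down_at = None
    for ts, text in entries:
        if "disconnected from" in text:
            down_at = ts
        elif "Connected to" in text and down_at is not None:
            gaps.append((ts - down_at).total_seconds())
            down_at = None
    return gaps
```

Fed the first excerpt above, this reports a gap of roughly 10.4 seconds
between the disconnect at 23:00:55 and the reconnect at 23:01:05.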