[Gluster-users] How to diagnose volume rebalance failure?
蒲云
cloudor at 126.com
Sat Dec 12 02:32:33 UTC 2015
Hi,
I have tried to rebalance my volume several times, and every attempt failed. The Gluster version is 3.7.4.
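For context, each attempt was started with the standard CLI command, along these lines (I am showing it here for completeness):

[root@d001 ~]# gluster volume rebalance FastVol start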
Status and info:
[root@d001 ~]# gluster volume rebalance FastVol status
     Node  Rebalanced-files        size     scanned    failures     skipped        status  run time in secs
---------  ----------------  ----------  ----------  ----------  ----------  ------------  ----------------
localhost             51251     553.4MB     3422092           0           0        failed          13211.00
volume rebalance: FastVol: success:
[root@d001 ~]# gluster volume status
Status of volume: FastVol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick d001:/mnt/c1/brick                    49154     0          Y       32111
Brick d001:/mnt/b1/brick                    49155     0          Y       24557

Task Status of Volume FastVol
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : cf1b25a0-4e33-4abf-9bb9-64cfd7bad115
Status               : failed
[root@d001 ~]# gluster volume info
Volume Name: FastVol
Type: Distribute
Volume ID: dbee250a-e3fe-4448-b905-b76c5ba80b25
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: d001:/mnt/c1/brick
Brick2: d001:/mnt/b1/brick
Options Reconfigured:
nfs.disable: true
auth.allow: 127.0.0.1,10.*
I have checked FastVol-rebalance.log and found no errors by searching for " E ", but I did find some warnings by searching for " W ":
[2015-12-11 15:51:57.402661] W [MSGID: 109009] [dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht: /for_ybest_fsdir/user/Weixin.oClDcji6/7g/yg/3LAF3uXtLlQndyFA: gfid different on FastVol-client-1. gfid local = 393d4a1a-20b8-49b2-0000-000000000000, gfid subvol = 393d4a1a-20b8-49b2-8f79-cb17472579e2
[2015-12-11 15:52:12.071984] W [MSGID: 109009] [dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht: /for_ybest_fsdir/user/Weixin.oClDcji6/ZA: gfid different on FastVol-client-1. gfid local = 5de2d8a9-954a-437a-8a4f-fe6ab30b646d, gfid subvol = 5de2d8a9-954a-437a-8a4f-fe6ab30b646d
[2015-12-11 16:04:24.346027] W [MSGID: 109009] [dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht: /for_ybest_fsdir/user/Weixin.oClDcjtd/2q: gfid different on FastVol-client-1. gfid local = 49c3a238-c204-4b05-0000-000000000000, gfid subvol = 49c3a238-c204-4b05-85ea-9a400044def6
[2015-12-11 17:55:46.232418] W [MSGID: 109009] [dht-common.c:569:dht_lookup_dir_cbk] 0-FastVol-dht: /for_ybest_fsdir/user/li/ur/on/gzhi/linkwrap/49138: gfid different on FastVol-client-1. gfid local = ae68fd66-36c8-4bd7-0000-000000000000, gfid subvol = ae68fd66-36c8-4bd7-a183-94390fb5704c
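Since these warnings are about gfid mismatches (and in several of them the local gfid ends in all zeros), I suppose the trusted.gfid xattr could be compared directly on the two bricks. A sketch, using the path from the first warning (the same check would apply to the other paths):

[root@d001 ~]# getfattr -n trusted.gfid -e hex /mnt/c1/brick/for_ybest_fsdir/user/Weixin.oClDcji6/7g/yg/3LAF3uXtLlQndyFA
[root@d001 ~]# getfattr -n trusted.gfid -e hex /mnt/b1/brick/for_ybest_fsdir/user/Weixin.oClDcji6/7g/yg/3LAF3uXtLlQndyFA

If the two bricks report different trusted.gfid values for the same directory, that would confirm what DHT is warning about.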
I also checked etc-glusterfs-glusterd.vol.log, but found no errors or warnings after the rebalance task had started. The latest lines in etc-glusterfs-glusterd.vol.log are:
[2015-12-11 16:03:26.709198] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/b87982e05d7252cd3efe66bb7c634115.socket failed (Invalid argument)
[2015-12-11 16:03:29.709626] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/b87982e05d7252cd3efe66bb7c634115.socket failed (Invalid argument)
[2015-12-11 16:03:30.315759] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2015-12-11 16:03:30.318867] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2015-12-11 16:03:30.323944] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2015-12-11 16:03:30.326917] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-12-11 16:03:30.329868] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2015-12-11 16:03:30.371050] I [run.c:190:runner_log] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (--> /usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2] (--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=FastVol -o nfs.disable=on --gd-workdir=/var/lib/glusterd
[2015-12-11 16:03:30.389063] I [run.c:190:runner_log] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (--> /usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2] (--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=FastVol -o nfs.disable=on --gd-workdir=/var/lib/glusterd
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2015-12-11 16:01:32.695911] and [2015-12-11 16:03:29.709689]
[2015-12-11 16:05:44.813587] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2015-12-11 16:05:44.823077] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2015-12-11 16:05:44.825986] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-12-11 16:05:44.829007] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2015-12-11 16:05:44.865623] I [run.c:190:runner_log] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (--> /usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2] (--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh --volname=FastVol -o nfs.disable=true --gd-workdir=/var/lib/glusterd
[2015-12-11 16:05:44.873447] I [run.c:190:runner_log] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7f819756863b] (--> /usr/lib64/libglusterfs.so.0(runner_log+0x105)[0x7f81975bd5a5] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(glusterd_hooks_run_hooks+0x4cc)[0x7f818c027cbc] (--> /usr/lib64/glusterfs/3.7.4/xlator/mgmt/glusterd.so(+0xeefd2)[0x7f818c027fd2] (--> /lib64/libpthread.so.0(+0x79d1)[0x7f81966509d1] ))))) 0-management: Ran script: /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh --volname=FastVol -o nfs.disable=true --gd-workdir=/var/lib/glusterd
[2015-12-11 19:26:37.779065] W [socket.c:588:__socket_rwv] 0-management: readv on /var/run/gluster/gluster-rebalance-dbee250a-e3fe-4448-b905-b76c5ba80b25.sock failed (No data available)
[2015-12-11 19:26:38.220385] I [MSGID: 106007] [glusterd-rebalance.c:162:__glusterd_defrag_notify] 0-management: Rebalance process for volume FastVol has disconnected.
[2015-12-11 19:26:38.220446] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-management: size=588 max=1 total=1235
[2015-12-11 19:26:38.220462] I [MSGID: 101053] [mem-pool.c:616:mem_pool_destroy] 0-management: size=124 max=1 total=1235
[2015-12-12 01:11:13.920354] I [MSGID: 106488] [glusterd-handler.c:1463:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2015-12-12 01:11:34.302028] I [MSGID: 106499] [glusterd-handler.c:4258:__glusterd_handle_status_volume] 0-management: Received status volume req for volume FastVol
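For what it's worth, a broader search pattern (assuming the default log directory /var/log/glusterfs) that matches the log-level letter between the timestamp and the message source, and so would also catch critical-level (C) messages, looks like this:

[root@d001 ~]# grep -E '\] [EWC] \[' /var/log/glusterfs/FastVol-rebalance.log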
Thanks
Cloudor