[Gluster-users] Volume stuck unable to add a brick
Boris Goldowsky
bgoldowsky at cast.org
Mon Apr 15 14:05:52 UTC 2019
Atin, thank you for the reply. Here are all of those pieces of information:
[bgoldowsky at webserver9 ~]$ gluster --version
glusterfs 3.12.2
(same on all nodes)
[bgoldowsky at webserver9 ~]$ sudo gluster peer status
Number of Peers: 3
Hostname: webserver11.cast.org
Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
State: Peer in Cluster (Connected)
Hostname: webserver1.cast.org
Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c
State: Peer in Cluster (Connected)
Other names:
192.168.200.131
webserver1
Hostname: webserver8.cast.org
Uuid: be2f568b-61c5-4016-9264-083e4e6453a2
State: Peer in Cluster (Connected)
Other names:
webserver8
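For anyone who wants to check this output mechanically, here is a minimal sketch that parses the `gluster peer status` text above and lists any peer whose state is not "Connected". The function names (`parse_peer_status`, `disconnected`) are hypothetical helpers, not part of Gluster, and the parsing assumes the plain-text layout shown here.

```python
# Sketch only: parse the plain-text output of `gluster peer status`.
# Function names are illustrative, not part of any Gluster tooling.

SAMPLE = """\
Number of Peers: 3
Hostname: webserver11.cast.org
Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
State: Peer in Cluster (Connected)
Hostname: webserver1.cast.org
Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c
State: Peer in Cluster (Connected)
Other names:
192.168.200.131
webserver1
Hostname: webserver8.cast.org
Uuid: be2f568b-61c5-4016-9264-083e4e6453a2
State: Peer in Cluster (Connected)
Other names:
webserver8
"""


def parse_peer_status(text):
    """Return a list of dicts, one per peer block in the output."""
    peers = []
    current = None
    in_other_names = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Number of Peers"):
            continue
        if line.startswith("Hostname:"):
            # A Hostname line starts a new peer block.
            current = {"hostname": line.split(":", 1)[1].strip(),
                       "other_names": []}
            peers.append(current)
            in_other_names = False
        elif current is None:
            continue
        elif line.startswith("Uuid:"):
            current["uuid"] = line.split(":", 1)[1].strip()
        elif line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif line.startswith("Other names:"):
            in_other_names = True
        elif in_other_names:
            # Lines after "Other names:" are aliases for the current peer.
            current["other_names"].append(line)
    return peers


def disconnected(peers):
    """Hostnames of peers whose state does not include '(Connected)'."""
    return [p["hostname"] for p in peers
            if "(Connected)" not in p.get("state", "")]


peers = parse_peer_status(SAMPLE)
for p in peers:
    print(p["hostname"], p["state"], p["other_names"])
print("not connected:", disconnected(peers))
```

The same function could be fed the output captured on each node in turn, since each node only reports its peers and not itself.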
[bgoldowsky at webserver1 ~]$ sudo gluster v info
Volume Name: dockervols
Type: Replicate
Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/dockervols
Brick2: webserver11:/data/gluster/dockervols
Brick3: webserver9:/data/gluster/dockervols
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
auth.allow: 127.0.0.1
Volume Name: testvol
Type: Replicate
Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/testvol
Brick2: webserver9:/data/gluster/testvol
Brick3: webserver11:/data/gluster/testvol
Brick4: webserver8:/data/gluster/testvol
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
[bgoldowsky at webserver8 ~]$ sudo gluster v info
Volume Name: dockervols
Type: Replicate
Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/dockervols
Brick2: webserver11:/data/gluster/dockervols
Brick3: webserver9:/data/gluster/dockervols
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
auth.allow: 127.0.0.1
Volume Name: testvol
Type: Replicate
Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/testvol
Brick2: webserver9:/data/gluster/testvol
Brick3: webserver11:/data/gluster/testvol
Brick4: webserver8:/data/gluster/testvol
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
[bgoldowsky at webserver9 ~]$ sudo gluster v info
Volume Name: dockervols
Type: Replicate
Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/dockervols
Brick2: webserver11:/data/gluster/dockervols
Brick3: webserver9:/data/gluster/dockervols
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
auth.allow: 127.0.0.1
Volume Name: testvol
Type: Replicate
Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/testvol
Brick2: webserver9:/data/gluster/testvol
Brick3: webserver11:/data/gluster/testvol
Brick4: webserver8:/data/gluster/testvol
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
[bgoldowsky at webserver11 ~]$ sudo gluster v info
Volume Name: dockervols
Type: Replicate
Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/dockervols
Brick2: webserver11:/data/gluster/dockervols
Brick3: webserver9:/data/gluster/dockervols
Options Reconfigured:
auth.allow: 127.0.0.1
transport.address-family: inet
nfs.disable: on
Volume Name: testvol
Type: Replicate
Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/testvol
Brick2: webserver9:/data/gluster/testvol
Brick3: webserver11:/data/gluster/testvol
Brick4: webserver8:/data/gluster/testvol
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
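Comparing four copies of this output by eye is error-prone, so here is a rough sketch of how the `gluster v info` text from two nodes could be compared programmatically. It assumes the plain-text layout shown above, and `parse_volume_info` is a hypothetical helper name. Reconfigured options are stored in a dict, so a difference in listing order alone (as between webserver9 and webserver11 above) does not count as a mismatch.

```python
# Sketch only: compare `gluster v info` text captured on two nodes.
# parse_volume_info is an illustrative helper, not a Gluster API.

WEBSERVER9 = """\
Volume Name: dockervols
Type: Replicate
Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/dockervols
Brick2: webserver11:/data/gluster/dockervols
Brick3: webserver9:/data/gluster/dockervols
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
auth.allow: 127.0.0.1
"""

WEBSERVER11 = """\
Volume Name: dockervols
Type: Replicate
Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: webserver1:/data/gluster/dockervols
Brick2: webserver11:/data/gluster/dockervols
Brick3: webserver9:/data/gluster/dockervols
Options Reconfigured:
auth.allow: 127.0.0.1
transport.address-family: inet
nfs.disable: on
"""


def parse_volume_info(text):
    """Map volume name -> {'fields': dict, 'bricks': list, 'options': dict}."""
    volumes = {}
    vol = None
    section = "fields"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Volume Name:"):
            name = line.split(":", 1)[1].strip()
            vol = {"fields": {}, "bricks": [], "options": {}}
            volumes[name] = vol
            section = "fields"
        elif vol is None:
            continue
        elif line == "Bricks:":
            section = "bricks"
        elif line == "Options Reconfigured:":
            section = "options"
        else:
            # Split on the first colon only; brick paths contain colons too.
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if section == "bricks":
                vol["bricks"].append(value)   # brick order is kept
            elif section == "options":
                vol["options"][key] = value   # option order is ignored
            else:
                vol["fields"][key] = value
    return volumes


info9 = parse_volume_info(WEBSERVER9)
info11 = parse_volume_info(WEBSERVER11)
print("definitions match:", info9 == info11)
```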
[bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force
volume add-brick: failed: Commit failed on webserver8.cast.org. Please check log file for details.
Webserver8 glusterd.log:
[2019-04-15 13:55:42.338197] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197] and [2019-04-15 13:55:42.341618]
[2019-04-15 14:00:20.445011] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7fe697764215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe6a2d16ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4
[2019-04-15 14:00:20.445184] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected]
[2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
[2019-04-15 14:00:20.693637] E [MSGID: 106123] [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed.
[2019-04-15 14:00:20.693667] E [MSGID: 106123] [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick
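Since it is hard to tell what is normal in these logs, one option is to filter for error-level entries only. The sketch below assumes the line layout seen above (`[timestamp] LEVEL [MSGID: n] [file:line:function] message`, with the MSGID part optional). The regular expression and the `parse_line` helper are my own guesses at that format, not an official parser.

```python
# Sketch only: pull error-level ("E") entries out of glusterd.log text.
# The pattern is inferred from the lines quoted in this message.
import re

LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)

LOG = """\
[2019-04-15 14:00:20.445148] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4
[2019-04-15 14:00:20.672347] E [MSGID: 106054] [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected]
[2019-04-15 14:00:20.693491] E [MSGID: 101042] [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq [Transport endpoint is not connected]
[2019-04-15 14:00:20.693597] E [MSGID: 106074] [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
"""


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


entries = [e for e in map(parse_line, LOG.splitlines()) if e is not None]
errors = [e for e in entries if e["level"] == "E"]

for e in errors:
    print(e["ts"], e["msgid"], e["src"])
    print("   ", e["msg"])
```

Lines that do not fit the pattern, such as the 'The message "..." repeated N times' summaries, are simply skipped.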
Webserver11 log file:
[2019-04-15 13:56:29.563270] I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
The message "I [MSGID: 106488] [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270] and [2019-04-15 13:56:29.566209]
[2019-04-15 14:00:33.996866] I [run.c:190:runner_log] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215) [0x7f36de924215] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d) [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f36e9ed6ea5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=dockervols --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd
[2019-04-15 14:00:33.996979] I [MSGID: 106578] [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management: replica-count is set 4
[2019-04-15 14:00:33.997004] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it
[2019-04-15 14:00:34.013789] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already stopped
[2019-04-15 14:00:34.013849] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-04-15 14:00:34.017535] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping glustershd daemon running in pid: 6087
[2019-04-15 14:00:35.018783] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd service is stopped
[2019-04-15 14:00:35.018952] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting glustershd service
[2019-04-15 14:00:35.028306] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped
[2019-04-15 14:00:35.028408] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is stopped
[2019-04-15 14:00:35.028601] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already stopped
[2019-04-15 14:00:35.028645] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is stopped
Thank you for taking a look!
Boris
From: Atin Mukherjee <atin.mukherjee83 at gmail.com>
Date: Friday, April 12, 2019 at 1:10 PM
To: Boris Goldowsky <bgoldowsky at cast.org>
Cc: Gluster-users <gluster-users at gluster.org>
Subject: Re: [Gluster-users] Volume stuck unable to add a brick
On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky <bgoldowsky at cast.org> wrote:
I’ve got a replicated volume with three bricks (“1x3=3”). The idea is to have a common set of files that are locally available on all the machines in a cluster (Scientific Linux 7, which is essentially CentOS 7).
I tried to add a fourth machine, using a command like this:
sudo gluster volume add-brick dockervols replica 4 webserver8:/data/gluster/dockervols force
but the result is:
volume add-brick: failed: Commit failed on webserver1. Please check log file for details.
Commit failed on webserver8. Please check log file for details.
Commit failed on webserver11. Please check log file for details.
Tried: removing the new brick (this also fails) and trying again.
Tried: checking the logs. The log files are not enlightening to me – I don’t know what’s normal and what’s not.
From webserver8 & webserver11 could you attach glusterd log files?
Also please share following:
- gluster version? (gluster --version)
- Output of ‘gluster peer status’
- Output of ‘gluster v info’ from all 4 nodes.
Tried: deleting the brick directory from previous attempt, so that it’s not in the way.
Tried: restarting gluster services
Tried: rebooting
Tried: setting up a new volume, replicated to all four machines. This works, so I’m assuming it’s not a networking issue. But still fails with this existing volume that has the critical data in it.
Running out of ideas. Any suggestions? Thank you!
Boris
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
--
--Atin