[Gluster-users] Volume stuck unable to add a brick

Karthik Subrahmanya ksubrahm at redhat.com
Tue Apr 16 06:51:44 UTC 2019


On Mon, Apr 15, 2019 at 9:43 PM Atin Mukherjee <atin.mukherjee83 at gmail.com>
wrote:

> +Karthik Subrahmanya <ksubrahm at redhat.com>
>
> Didn't we fix this problem recently? "Failed to set extended attribute"
> indicates that the temp mount is failing and that we don't have a quorum
> of bricks up.
>

We had two fixes that handle two kinds of add-brick scenarios.
[1] Fails add-brick when increasing the replica count if any of the bricks
is down, to avoid data loss. This can be overridden by using the force
option.
[2] Allows add-brick to set the extended attributes through the temp mount
if the volume is already mounted (has clients).

They are on version 3.12.2, so patch [1] is present there. But since they
are using the force option, it should not cause a problem even if a brick
is down. The error message they are getting is also different, so I guess
it is not caused by a brick being down.
Patch [2] is not present in 3.12.2, and this is not a conversion from a
plain distribute volume to a replicate volume, so the scenario is different
here. It seems they are hitting some other issue.
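
To narrow this down, one option is to pull only the error-level entries
out of the glusterd log on each node. The following is a minimal sketch,
not a definitive procedure: the function name gluster_log_errors is
hypothetical, the log path in the usage comment is an assumption, and it
relies on each entry starting with "[timestamp] E " as in the excerpts
quoted in this thread.

```shell
# Print only error-level ("E") entries from a glusterd-style log on stdin.
# Assumes each entry begins with "[YYYY-MM-DD HH:MM:SS.usec] <level> ",
# as in the log excerpts quoted in this thread.
gluster_log_errors() {
    grep -E '^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:.]+\] E '
}

# Hypothetical usage; the log path is an assumption, adjust as needed:
# sudo cat /var/log/glusterfs/glusterd.log | gluster_log_errors
```

Reading from stdin keeps the filter independent of where the log file
actually lives on a given installation.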

@Boris,
Can you attach the add-brick temp mount log? The file name should look
something like "dockervols-add-brick-mount.log". Can you also provide all
the brick logs of that volume from that time?
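
One way to gather these files into a single archive is sketched below. It
is only a sketch under stated assumptions: the function name
collect_addbrick_logs is hypothetical, the log directory in the usage
comment is assumed rather than confirmed for this installation, and brick
logs are assumed to sit in a "bricks" subdirectory of it. The whole
"bricks" directory is included, so logs of other volumes' bricks on the
same node come along too.

```shell
# Bundle the add-brick temp mount log and the brick logs into one tarball.
# Arguments: <log dir> <volume name> <output tarball>
# Assumes brick logs live in a "bricks" subdirectory of the log directory.
collect_addbrick_logs() {
    # -C keeps the paths inside the archive relative to the log directory.
    tar -czf "$3" -C "$1" "${2}-add-brick-mount.log" bricks
}

# Hypothetical usage; the log directory is an assumption and reading it
# may need root, so the call would have to run in a root shell:
# collect_addbrick_logs /var/log/glusterfs dockervols /tmp/dockervols-logs.tar.gz
```

If the temp mount log does not exist on a node, tar reports the missing
file and exits non-zero, which is itself useful to know.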

[1] https://review.gluster.org/#/c/glusterfs/+/16330/
[2] https://review.gluster.org/#/c/glusterfs/+/21791/

Regards,
Karthik

>
> Boris - Which gluster version are you using?
>
>
>
> On Mon, Apr 15, 2019 at 7:35 PM Boris Goldowsky <bgoldowsky at cast.org>
> wrote:
>
>> Atin, thank you for the reply.  Here are all of those pieces of
>> information:
>>
>>
>>
>> [bgoldowsky at webserver9 ~]$ gluster --version
>>
>> glusterfs 3.12.2
>>
>> (same on all nodes)
>>
>>
>>
>> [bgoldowsky at webserver9 ~]$ sudo gluster peer status
>>
>> Number of Peers: 3
>>
>>
>>
>> Hostname: webserver11.cast.org
>>
>> Uuid: c2b147fd-cab4-4859-9922-db5730f8549d
>>
>> State: Peer in Cluster (Connected)
>>
>>
>>
>> Hostname: webserver1.cast.org
>>
>> Uuid: 4b918f65-2c9d-478e-8648-81d1d6526d4c
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> 192.168.200.131
>>
>> webserver1
>>
>>
>>
>> Hostname: webserver8.cast.org
>>
>> Uuid: be2f568b-61c5-4016-9264-083e4e6453a2
>>
>> State: Peer in Cluster (Connected)
>>
>> Other names:
>>
>> webserver8
>>
>>
>>
>> [bgoldowsky at webserver1 ~]$ sudo gluster v info
>>
>> Volume Name: dockervols
>>
>> Type: Replicate
>>
>> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/dockervols
>>
>> Brick2: webserver11:/data/gluster/dockervols
>>
>> Brick3: webserver9:/data/gluster/dockervols
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>> auth.allow: 127.0.0.1
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Replicate
>>
>> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 4 = 4
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/testvol
>>
>> Brick2: webserver9:/data/gluster/testvol
>>
>> Brick3: webserver11:/data/gluster/testvol
>>
>> Brick4: webserver8:/data/gluster/testvol
>>
>> Options Reconfigured:
>>
>> transport.address-family: inet
>>
>> nfs.disable: on
>>
>>
>>
>> [bgoldowsky at webserver8 ~]$ sudo gluster v info
>>
>> Volume Name: dockervols
>>
>> Type: Replicate
>>
>> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/dockervols
>>
>> Brick2: webserver11:/data/gluster/dockervols
>>
>> Brick3: webserver9:/data/gluster/dockervols
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>> auth.allow: 127.0.0.1
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Replicate
>>
>> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 4 = 4
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/testvol
>>
>> Brick2: webserver9:/data/gluster/testvol
>>
>> Brick3: webserver11:/data/gluster/testvol
>>
>> Brick4: webserver8:/data/gluster/testvol
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>>
>>
>> [bgoldowsky at webserver9 ~]$ sudo gluster v info
>>
>> Volume Name: dockervols
>>
>> Type: Replicate
>>
>> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/dockervols
>>
>> Brick2: webserver11:/data/gluster/dockervols
>>
>> Brick3: webserver9:/data/gluster/dockervols
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>> auth.allow: 127.0.0.1
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Replicate
>>
>> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 4 = 4
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/testvol
>>
>> Brick2: webserver9:/data/gluster/testvol
>>
>> Brick3: webserver11:/data/gluster/testvol
>>
>> Brick4: webserver8:/data/gluster/testvol
>>
>> Options Reconfigured:
>>
>> nfs.disable: on
>>
>> transport.address-family: inet
>>
>>
>>
>> [bgoldowsky at webserver11 ~]$ sudo gluster v info
>>
>> Volume Name: dockervols
>>
>> Type: Replicate
>>
>> Volume ID: 6093a9c6-ec6c-463a-ad25-8c3e3305b98a
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 3 = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/dockervols
>>
>> Brick2: webserver11:/data/gluster/dockervols
>>
>> Brick3: webserver9:/data/gluster/dockervols
>>
>> Options Reconfigured:
>>
>> auth.allow: 127.0.0.1
>>
>> transport.address-family: inet
>>
>> nfs.disable: on
>>
>>
>>
>> Volume Name: testvol
>>
>> Type: Replicate
>>
>> Volume ID: 4d5f00f5-00ea-4dcf-babf-1a76eca55332
>>
>> Status: Started
>>
>> Snapshot Count: 0
>>
>> Number of Bricks: 1 x 4 = 4
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: webserver1:/data/gluster/testvol
>>
>> Brick2: webserver9:/data/gluster/testvol
>>
>> Brick3: webserver11:/data/gluster/testvol
>>
>> Brick4: webserver8:/data/gluster/testvol
>>
>> Options Reconfigured:
>>
>> transport.address-family: inet
>>
>> nfs.disable: on
>>
>>
>>
>> [bgoldowsky at webserver9 ~]$ sudo gluster volume add-brick dockervols
>> replica 4 webserver8:/data/gluster/dockervols force
>>
>> volume add-brick: failed: Commit failed on webserver8.cast.org. Please
>> check log file for details.
>>
>>
>>
>> Webserver8 glusterd.log:
>>
>>
>>
>> [2019-04-15 13:55:42.338197] I [MSGID: 106488]
>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>> Received get vol req
>>
>> The message "I [MSGID: 106488]
>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>> Received get vol req" repeated 2 times between [2019-04-15 13:55:42.338197]
>> and [2019-04-15 13:55:42.341618]
>>
>> [2019-04-15 14:00:20.445011] I [run.c:190:runner_log]
>> (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215)
>> [0x7fe697764215]
>> -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d)
>> [0x7fe69780de9d] -->/lib64/libglusterfs.so.0(runner_log+0x115)
>> [0x7fe6a2d16ea5] ) 0-management: Ran script:
>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh
>> --volname=dockervols --version=1 --volume-op=add-brick
>> --gd-workdir=/var/lib/glusterd
>>
>> [2019-04-15 14:00:20.445148] I [MSGID: 106578]
>> [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management:
>> replica-count is set 4
>>
>> [2019-04-15 14:00:20.445184] I [MSGID: 106578]
>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management:
>> type is set 0, need to change it
>>
>> [2019-04-15 14:00:20.672347] E [MSGID: 106054]
>> [glusterd-utils.c:13863:glusterd_handle_replicate_brick_ops] 0-management:
>> Failed to set extended attribute trusted.add-brick : Transport endpoint is
>> not connected [Transport endpoint is not connected]
>>
>> [2019-04-15 14:00:20.693491] E [MSGID: 101042]
>> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of /tmp/mntmvdFGq
>> [Transport endpoint is not connected]
>>
>> [2019-04-15 14:00:20.693597] E [MSGID: 106074]
>> [glusterd-brick-ops.c:2590:glusterd_op_add_brick] 0-glusterd: Unable to add
>> bricks
>>
>> [2019-04-15 14:00:20.693637] E [MSGID: 106123]
>> [glusterd-mgmt.c:312:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit
>> failed.
>>
>> [2019-04-15 14:00:20.693667] E [MSGID: 106123]
>> [glusterd-mgmt-handler.c:616:glusterd_handle_commit_fn] 0-management:
>> commit failed on operation Add brick
>>
>>
>>
>> Webserver11 log file:
>>
>>
>>
>> [2019-04-15 13:56:29.563270] I [MSGID: 106488]
>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>> Received get vol req
>>
>> The message "I [MSGID: 106488]
>> [glusterd-handler.c:1559:__glusterd_handle_cli_get_volume] 0-management:
>> Received get vol req" repeated 2 times between [2019-04-15 13:56:29.563270]
>> and [2019-04-15 13:56:29.566209]
>>
>> [2019-04-15 14:00:33.996866] I [run.c:190:runner_log]
>> (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x3a215)
>> [0x7f36de924215]
>> -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0xe3e9d)
>> [0x7f36de9cde9d] -->/lib64/libglusterfs.so.0(runner_log+0x115)
>> [0x7f36e9ed6ea5] ) 0-management: Ran script:
>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh
>> --volname=dockervols --version=1 --volume-op=add-brick
>> --gd-workdir=/var/lib/glusterd
>>
>> [2019-04-15 14:00:33.996979] I [MSGID: 106578]
>> [glusterd-brick-ops.c:1354:glusterd_op_perform_add_bricks] 0-management:
>> replica-count is set 4
>>
>> [2019-04-15 14:00:33.997004] I [MSGID: 106578]
>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management:
>> type is set 0, need to change it
>>
>> [2019-04-15 14:00:34.013789] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: nfs already
>> stopped
>>
>> [2019-04-15 14:00:34.013849] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is
>> stopped
>>
>> [2019-04-15 14:00:34.017535] I [MSGID: 106568]
>> [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping
>> glustershd daemon running in pid: 6087
>>
>> [2019-04-15 14:00:35.018783] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: glustershd
>> service is stopped
>>
>> [2019-04-15 14:00:35.018952] I [MSGID: 106567]
>> [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting
>> glustershd service
>>
>> [2019-04-15 14:00:35.028306] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already
>> stopped
>>
>> [2019-04-15 14:00:35.028408] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: bitd service is
>> stopped
>>
>> [2019-04-15 14:00:35.028601] I [MSGID: 106132]
>> [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: scrub already
>> stopped
>>
>> [2019-04-15 14:00:35.028645] I [MSGID: 106568]
>> [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: scrub service is
>> stopped
>>
>>
>>
>> Thank you for taking a look!
>>
>>
>>
>> Boris
>>
>>
>>
>>
>>
>> *From: *Atin Mukherjee <atin.mukherjee83 at gmail.com>
>> *Date: *Friday, April 12, 2019 at 1:10 PM
>> *To: *Boris Goldowsky <bgoldowsky at cast.org>
>> *Cc: *Gluster-users <gluster-users at gluster.org>
>> *Subject: *Re: [Gluster-users] Volume stuck unable to add a brick
>>
>>
>>
>>
>>
>>
>>
>> On Fri, 12 Apr 2019 at 22:32, Boris Goldowsky <bgoldowsky at cast.org>
>> wrote:
>>
>> I’ve got a replicated volume with three bricks (“1x3=3”). The idea is to
>> have a common set of files that are locally available on all the machines
>> (Scientific Linux 7, which is essentially CentOS 7) in a cluster.
>>
>>
>>
>> I tried to add on a fourth machine, so used a command like this:
>>
>>
>>
>> sudo gluster volume add-brick dockervols replica 4
>> webserver8:/data/gluster/dockervols force
>>
>>
>>
>> but the result is:
>>
>> volume add-brick: failed: Commit failed on webserver1. Please check log
>> file for details.
>>
>> Commit failed on webserver8. Please check log file for details.
>>
>> Commit failed on webserver11. Please check log file for details.
>>
>>
>>
>> Tried: removing the new brick (this also fails) and trying again.
>>
>> Tried: checking the logs. The log files are not enlightening to me – I
>> don’t know what’s normal and what’s not.
>>
>>
>>
>> From webserver8 & webserver11 could you attach glusterd log files?
>>
>>
>>
>> Also please share following:
>>
>> - gluster version? (gluster --version)
>>
>> - Output of ‘gluster peer status’
>>
>> - Output of ‘gluster v info’ from all 4 nodes.
>>
>>
>>
>> Tried: deleting the brick directory from previous attempt, so that it’s
>> not in the way.
>>
>> Tried: restarting gluster services
>>
>> Tried: rebooting
>>
>> Tried: setting up a new volume, replicated to all four machines. This
>> works, so I’m assuming it’s not a networking issue.  But still fails with
>> this existing volume that has the critical data in it.
>>
>>
>>
>> Running out of ideas. Any suggestions?  Thank you!
>>
>>
>>
>> Boris
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>>
>> --Atin
>>
>