[Gluster-users] add-brick: failed: Commit failed
Ravishankar N
ravishankar at redhat.com
Fri May 24 13:48:52 UTC 2019
Hi David,
On 23/05/19 3:54 AM, David Cunningham wrote:
> Hi Ravi,
>
> Please see the log attached.
When I grep -E "Connected to |disconnected from"
gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1"
message. It looks like this temporary mount is not able to connect to the
second brick (gvol0-client-1, i.e. the brick on gfs2), which is why the
lookup is failing due to lack of quorum.
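For reference, this is the check I ran (assuming the log sits in the
default /var/log/glusterfs/ directory; adjust the path if yours differs):

# grep -E "Connected to |disconnected from" /var/log/glusterfs/gvol0-add-brick-mount.log

On a successful add-brick attempt you would expect a "Connected to
gvol0-client-N" line for each brick of the volume.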
> The output of "gluster volume status" is as follows. Should there be
> something listening on gfs3? I'm not sure whether its TCP Port and Pid
> showing as N/A is a symptom or a cause. Thank you.
>
> # gluster volume status
> Status of volume: gvol0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706
> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624
> Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A       N/A        N       N/A
Can you see if the following steps help?
1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v
0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on *both*
gfs1 and gfs2.
2. `gluster volume start gvol0 force`
3. Check if Brick-3 now comes online with a valid TCP port and PID. If
it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3
to see why.
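Putting steps 1-3 together, this is roughly what I have in mind (run the
setfattr/getfattr pair on both gfs1 and gfs2; the getfattr is only a
sanity check that the pending-entry marker was actually set):

# setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0
# getfattr -n trusted.afr.gvol0-client-2 -e hex /nodirectwritedata/gluster/gvol0
# gluster volume start gvol0 force
# gluster volume status gvol0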
Thanks,
Ravi
> Self-heal Daemon on localhost               N/A       N/A        Y       19853
> Self-heal Daemon on gfs1                    N/A       N/A        Y       28600
> Self-heal Daemon on gfs2                    N/A       N/A        Y       17614
>
> Task Status of Volume gvol0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> On Wed, 22 May 2019 at 18:06, Ravishankar N <ravishankar at redhat.com
> <mailto:ravishankar at redhat.com>> wrote:
>
> If you are trying this again, please run `gluster volume set $volname
> client-log-level DEBUG` before attempting the add-brick and attach
> the gvol0-add-brick-mount.log here. After that, you can change the
> client-log-level back to INFO.
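> Spelled out with the fully-qualified option name
> (diagnostics.client-log-level), that is:
>
> # gluster volume set gvol0 diagnostics.client-log-level DEBUG
>     ... re-run the add-brick and collect gvol0-add-brick-mount.log ...
> # gluster volume set gvol0 diagnostics.client-log-level INFO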
>
> -Ravi
>
> On 22/05/19 11:32 AM, Ravishankar N wrote:
>>
>>
>> On 22/05/19 11:23 AM, David Cunningham wrote:
>>> Hi Ravi,
>>>
>>> I'd already done exactly that before, where step 3 was a simple
>>> 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another
>>> suggestion on what the cleanup or reformat should be?
>> `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me,
>> David. Basically, '/nodirectwritedata/gluster/gvol0' must be
>> empty and must not have any extended attributes set on it. Why
>> fuse_first_lookup() is failing is a bit of a mystery to me at
>> this point. :-(
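>> If we do retry this, a quick way to confirm the brick directory is in
>> that clean state on gfs3 (no entries, no trusted.* attributes) is:
>>
>> # ls -la /nodirectwritedata/gluster/gvol0
>> # getfattr -d -m . -e hex /nodirectwritedata/gluster/gvol0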
>> Regards,
>> Ravi
>>>
>>> Thank you.
>>>
>>>
>>> On Wed, 22 May 2019 at 13:56, Ravishankar N
>>> <ravishankar at redhat.com <mailto:ravishankar at redhat.com>> wrote:
>>>
>>> Hmm, so the volume info seems to indicate that the add-brick
>>> was successful but the gfid xattr is missing on the new
>>> brick (as are the actual files, barring the .glusterfs
>>> folder, according to your previous mail).
>>>
>>> Do you want to try removing and adding it again?
>>>
>>> 1. `gluster volume remove-brick gvol0 replica 2
>>> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1
>>>
>>> 2. Check that gluster volume info is now back to a 1x2
>>> volume on all nodes and `gluster peer status` is connected
>>> on all nodes.
>>>
>>> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on
>>> gfs3.
>>>
>>> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1
>>> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.
>>>
>>> 5. Check that the files are getting healed on to the new brick.
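>>>
>>> A straightforward way to follow the heal from any of the nodes is:
>>>
>>> # gluster volume heal gvol0 info
>>>
>>> The list of entries reported there should shrink to zero as the new
>>> arbiter brick catches up.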
>>>
>>> Thanks,
>>> Ravi
>>> On 22/05/19 6:50 AM, David Cunningham wrote:
>>>> Hi Ravi,
>>>>
>>>> Certainly. On the existing two nodes:
>>>>
>>>> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: nodirectwritedata/gluster/gvol0
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.gvol0-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: nodirectwritedata/gluster/gvol0
>>>> trusted.afr.dirty=0x000000000000000000000000
>>>> trusted.afr.gvol0-client-0=0x000000000000000000000000
>>>> trusted.afr.gvol0-client-2=0x000000000000000000000000
>>>> trusted.gfid=0x00000000000000000000000000000001
>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>> On the new node:
>>>>
>>>> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>> getfattr: Removing leading '/' from absolute path names
>>>> # file: nodirectwritedata/gluster/gvol0
>>>> trusted.afr.dirty=0x000000000000000000000001
>>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>> Output of "gluster volume info" is the same on all 3 nodes
>>>> and is:
>>>>
>>>> # gluster volume info
>>>>
>>>> Volume Name: gvol0
>>>> Type: Replicate
>>>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0
>>>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0
>>>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
>>>> Options Reconfigured:
>>>> performance.client-io-threads: off
>>>> nfs.disable: on
>>>> transport.address-family: inet
>>>>
>>>>
>>>> On Wed, 22 May 2019 at 12:43, Ravishankar N
>>>> <ravishankar at redhat.com <mailto:ravishankar at redhat.com>> wrote:
>>>>
>>>> Hi David,
>>>> Could you provide the `getfattr -d -m. -e hex
>>>> /nodirectwritedata/gluster/gvol0` output of all bricks
>>>> and the output of `gluster volume info`?
>>>>
>>>> Thanks,
>>>> Ravi
>>>> On 22/05/19 4:57 AM, David Cunningham wrote:
>>>>> Hi Sanju,
>>>>>
>>>>> Here's what glusterd.log says on the new arbiter
>>>>> server when trying to add the node:
>>>>>
>>>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log]
>>>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd)
>>>>> [0x7fe4ca9102cd]
>>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85)
>>>>> [0x7fe4ca9bbb85]
>>>>> -->/lib64/libglusterfs.so.0(runner_log+0x115)
>>>>> [0x7fe4d5ecc955] ) 0-management: Ran script:
>>>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh
>>>>> --volname=gvol0 --version=1 --volume-op=add-brick
>>>>> --gd-workdir=/var/lib/glusterd
>>>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578]
>>>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks]
>>>>> 0-management: replica-count is set 3
>>>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578]
>>>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks]
>>>>> 0-management: arbiter-count is set 1
>>>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578]
>>>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks]
>>>>> 0-management: type is set 0, need to change it
>>>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053]
>>>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops]
>>>>> 0-management: Failed to set extended attribute
>>>>> trusted.add-brick : Transport endpoint is not
>>>>> connected [Transport endpoint is not connected]
>>>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073]
>>>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick]
>>>>> 0-glusterd: Unable to add bricks
>>>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122]
>>>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn]
>>>>> 0-management: Add-brick commit failed.
>>>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122]
>>>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn]
>>>>> 0-management: commit failed on operation Add brick
>>>>>
>>>>> As before, gvol0-add-brick-mount.log said:
>>>>>
>>>>> [2019-05-22 00:15:17.005695] I
>>>>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE
>>>>> inited with protocol versions: glusterfs 7.24 kernel 7.22
>>>>> [2019-05-22 00:15:17.005749] I
>>>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched
>>>>> to graph 0
>>>>> [2019-05-22 00:15:17.010101] E
>>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first
>>>>> lookup on root failed (Transport endpoint is not
>>>>> connected)
>>>>> [2019-05-22 00:15:17.014217] W
>>>>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2:
>>>>> LOOKUP() / => -1 (Transport endpoint is not connected)
>>>>> [2019-05-22 00:15:17.015097] W
>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse:
>>>>> 00000000-0000-0000-0000-000000000001: failed to
>>>>> resolve (Transport endpoint is not connected)
>>>>> [2019-05-22 00:15:17.015158] W
>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume]
>>>>> 0-glusterfs-fuse: 3: SETXATTR
>>>>> 00000000-0000-0000-0000-000000000001/1
>>>>> (trusted.add-brick) resolution failed
>>>>> [2019-05-22 00:15:17.035636] I
>>>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse:
>>>>> initating unmount of /tmp/mntYGNbj9
>>>>> [2019-05-22 00:15:17.035854] W
>>>>> [glusterfsd.c:1500:cleanup_and_exit]
>>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5]
>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>>> [0x55c81b63de75]
>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down
>>>>> [2019-05-22 00:15:17.035942] I
>>>>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting
>>>>> '/tmp/mntYGNbj9'.
>>>>> [2019-05-22 00:15:17.035966] I
>>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse
>>>>> connection to '/tmp/mntYGNbj9'.
>>>>>
>>>>> Here are the processes running on the new arbiter server:
>>>>> # ps -ef | grep gluster
>>>>> root 3466 1 0 20:13 ? 00:00:00
>>>>> /usr/sbin/glusterfs -s localhost --volfile-id
>>>>> gluster/glustershd -p
>>>>> /var/run/gluster/glustershd/glustershd.pid -l
>>>>> /var/log/glusterfs/glustershd.log -S
>>>>> /var/run/gluster/24c12b09f93eec8e.socket
>>>>> --xlator-option
>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>>> --process-name glustershd
>>>>> root 6832 1 0 May16 ? 00:02:10
>>>>> /usr/sbin/glusterd -p /var/run/glusterd.pid
>>>>> --log-level INFO
>>>>> root 17841 1 0 May16 ? 00:00:58
>>>>> /usr/sbin/glusterfs --process-name fuse
>>>>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs
>>>>>
>>>>> Here are the files created on the new arbiter server:
>>>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
>>>>> drwxr-xr-x 3 root root 4096 May 21 20:15
>>>>> /nodirectwritedata/gluster/gvol0
>>>>> drw------- 2 root root 4096 May 21 20:15
>>>>> /nodirectwritedata/gluster/gvol0/.glusterfs
>>>>>
>>>>> Thank you for your help!
>>>>>
>>>>>
>>>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde
>>>>> <srakonde at redhat.com <mailto:srakonde at redhat.com>> wrote:
>>>>>
>>>>> David,
>>>>>
>>>>> can you please attach the glusterd logs? As the error
>>>>> message says, the commit failed on the arbiter node, so
>>>>> we might be able to find some issue on that node.
>>>>>
>>>>> On Mon, May 20, 2019 at 10:10 AM Nithya
>>>>> Balachandran <nbalacha at redhat.com
>>>>> <mailto:nbalacha at redhat.com>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 17 May 2019 at 06:01, David Cunningham
>>>>> <dcunningham at voisonics.com
>>>>> <mailto:dcunningham at voisonics.com>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We're adding an arbiter node to an
>>>>> existing volume and having an issue. Can
>>>>> anyone help? The root cause error appears
>>>>> to be
>>>>> "00000000-0000-0000-0000-000000000001:
>>>>> failed to resolve (Transport endpoint is
>>>>> not connected)", as below.
>>>>>
>>>>> We are running glusterfs 5.6.1. Thanks in
>>>>> advance for any assistance!
>>>>>
>>>>> On existing node gfs1, trying to add new
>>>>> arbiter node gfs3:
>>>>>
>>>>> # gluster volume add-brick gvol0 replica 3
>>>>> arbiter 1
>>>>> gfs3:/nodirectwritedata/gluster/gvol0
>>>>> volume add-brick: failed: Commit failed on
>>>>> gfs3. Please check log file for details.
>>>>>
>>>>>
>>>>> This looks like a glusterd issue. Please check
>>>>> the glusterd logs for more info.
>>>>> Adding the glusterd dev to this thread. Sanju,
>>>>> can you take a look?
>>>>> Regards,
>>>>> Nithya
>>>>>
>>>>>
>>>>> On new node gfs3 in gvol0-add-brick-mount.log:
>>>>>
>>>>> [2019-05-17 01:20:22.689721] I
>>>>> [fuse-bridge.c:4267:fuse_init]
>>>>> 0-glusterfs-fuse: FUSE inited with
>>>>> protocol versions: glusterfs 7.24 kernel 7.22
>>>>> [2019-05-17 01:20:22.689778] I
>>>>> [fuse-bridge.c:4878:fuse_graph_sync]
>>>>> 0-fuse: switched to graph 0
>>>>> [2019-05-17 01:20:22.694897] E
>>>>> [fuse-bridge.c:4336:fuse_first_lookup]
>>>>> 0-fuse: first lookup on root failed
>>>>> (Transport endpoint is not connected)
>>>>> [2019-05-17 01:20:22.699770] W
>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk]
>>>>> 0-fuse:
>>>>> 00000000-0000-0000-0000-000000000001:
>>>>> failed to resolve (Transport endpoint is
>>>>> not connected)
>>>>> [2019-05-17 01:20:22.699834] W
>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume]
>>>>> 0-glusterfs-fuse: 2: SETXATTR
>>>>> 00000000-0000-0000-0000-000000000001/1
>>>>> (trusted.add-brick) resolution failed
>>>>> [2019-05-17 01:20:22.715656] I
>>>>> [fuse-bridge.c:5144:fuse_thread_proc]
>>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f
>>>>> [2019-05-17 01:20:22.715865] W
>>>>> [glusterfsd.c:1500:cleanup_and_exit]
>>>>> (-->/lib64/libpthread.so.0(+0x7dd5)
>>>>> [0x7fb223bf6dd5]
>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>>> [0x560886581e75]
>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>>> [0x560886581ceb] ) 0-: received signum
>>>>> (15), shutting down
>>>>> [2019-05-17 01:20:22.715926] I
>>>>> [fuse-bridge.c:5914:fini] 0-fuse:
>>>>> Unmounting '/tmp/mntQAtu3f'.
>>>>> [2019-05-17 01:20:22.715953] I
>>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing
>>>>> fuse connection to '/tmp/mntQAtu3f'.
>>>>>
>>>>> Processes running on new node gfs3:
>>>>>
>>>>> # ps -ef | grep gluster
>>>>> root 6832 1 0 20:17 ? 00:00:00
>>>>> /usr/sbin/glusterd -p
>>>>> /var/run/glusterd.pid --log-level INFO
>>>>> root 15799 1 0 20:17 ? 00:00:00
>>>>> /usr/sbin/glusterfs -s localhost
>>>>> --volfile-id gluster/glustershd -p
>>>>> /var/run/gluster/glustershd/glustershd.pid
>>>>> -l /var/log/glusterfs/glustershd.log -S
>>>>> /var/run/gluster/24c12b09f93eec8e.socket
>>>>> --xlator-option
>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>>> --process-name glustershd
>>>>> root 16856 16735 0 21:21 pts/0
>>>>> 00:00:00 grep --color=auto gluster
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> <mailto:Gluster-users at gluster.org>
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Sanju
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> David Cunningham, Voisonics Limited
>>>>> http://voisonics.com/
>>>>> USA: +1 213 221 1092
>>>>> New Zealand: +64 (0)28 2558 3782
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>> --
>>>> David Cunningham, Voisonics Limited
>>>> http://voisonics.com/
>>>> USA: +1 213 221 1092
>>>> New Zealand: +64 (0)28 2558 3782
>>>
>>>
>>>
>>> --
>>> David Cunningham, Voisonics Limited
>>> http://voisonics.com/
>>> USA: +1 213 221 1092
>>> New Zealand: +64 (0)28 2558 3782
>
>
>
> --
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782