[Gluster-users] add-brick: failed: Commit failed
Ravishankar N
ravishankar at redhat.com
Wed May 22 06:06:36 UTC 2019
If you are trying this again, please run `gluster volume set $volname
client-log-level DEBUG` before attempting the add-brick, and attach the
gvol0-add-brick-mount.log here. After that, you can change the
client-log-level back to INFO.
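
For example, with the gvol0 volume from this thread (a sketch; the
add-brick command is the same one that failed earlier, and the mount log
is normally written under /var/log/glusterfs/):

# gluster volume set gvol0 client-log-level DEBUG
# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
# gluster volume set gvol0 client-log-level INFO
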
-Ravi
On 22/05/19 11:32 AM, Ravishankar N wrote:
>
>
> On 22/05/19 11:23 AM, David Cunningham wrote:
>> Hi Ravi,
>>
>> I'd already done exactly that before, where step 3 was a simple 'rm
>> -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on
>> what the cleanup or reformat should be?
> `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me, David.
> Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must
> not have any extended attributes set on it. Why fuse_first_lookup() is
> failing is a bit of a mystery to me at this point. :-(
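>
> For example, a quick sanity check on gfs3 before re-adding the brick (a
> sketch): both commands should come back clean, i.e. no files and no
> trusted.* attributes.
>
> # ls -lA /nodirectwritedata/gluster/gvol0
> # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>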
> Regards,
> Ravi
>>
>> Thank you.
>>
>>
>> On Wed, 22 May 2019 at 13:56, Ravishankar N <ravishankar at redhat.com> wrote:
>>
>> Hmm, so the volume info seems to indicate that the add-brick was
>> successful but the gfid xattr is missing on the new brick (as are
>> the actual files, barring the .glusterfs folder, according to
>> your previous mail).
>>
>> Do you want to try removing and adding it again?
>>
>> 1. `gluster volume remove-brick gvol0 replica 2
>> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1
>>
>> 2. Check that gluster volume info is now back to a 1x2 volume on
>> all nodes and `gluster peer status` is connected on all nodes.
>>
>> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3.
>>
>> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1
>> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.
>>
>> 5. Check that the files are getting healed on to the new brick.
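>>
>> For step 5, one way to watch heal progress (a sketch, assuming the
>> add-brick above succeeds this time):
>>
>> # gluster volume heal gvol0 info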
>>
>> Thanks,
>> Ravi
>> On 22/05/19 6:50 AM, David Cunningham wrote:
>>> Hi Ravi,
>>>
>>> Certainly. On the existing two nodes:
>>>
>>> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: nodirectwritedata/gluster/gvol0
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.gvol0-client-2=0x000000000000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>
>>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: nodirectwritedata/gluster/gvol0
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.gvol0-client-0=0x000000000000000000000000
>>> trusted.afr.gvol0-client-2=0x000000000000000000000000
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>
>>> On the new node:
>>>
>>> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: nodirectwritedata/gluster/gvol0
>>> trusted.afr.dirty=0x000000000000000000000001
>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>
>>> Output of "gluster volume info" is the same on all 3 nodes and is:
>>>
>>> # gluster volume info
>>>
>>> Volume Name: gvol0
>>> Type: Replicate
>>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0
>>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0
>>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
>>> Options Reconfigured:
>>> performance.client-io-threads: off
>>> nfs.disable: on
>>> transport.address-family: inet
>>>
>>>
>>> On Wed, 22 May 2019 at 12:43, Ravishankar N <ravishankar at redhat.com> wrote:
>>>
>>> Hi David,
>>> Could you provide the `getfattr -d -m. -e hex
>>> /nodirectwritedata/gluster/gvol0` output of all bricks and
>>> the output of `gluster volume info`?
>>>
>>> Thanks,
>>> Ravi
>>> On 22/05/19 4:57 AM, David Cunningham wrote:
>>>> Hi Sanju,
>>>>
>>>> Here's what glusterd.log says on the new arbiter server
>>>> when trying to add the node:
>>>>
>>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log]
>>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd)
>>>> [0x7fe4ca9102cd]
>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85)
>>>> [0x7fe4ca9bbb85]
>>>> -->/lib64/libglusterfs.so.0(runner_log+0x115)
>>>> [0x7fe4d5ecc955] ) 0-management: Ran script:
>>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh
>>>> --volname=gvol0 --version=1 --volume-op=add-brick
>>>> --gd-workdir=/var/lib/glusterd
>>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578]
>>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks]
>>>> 0-management: replica-count is set 3
>>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578]
>>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks]
>>>> 0-management: arbiter-count is set 1
>>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578]
>>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks]
>>>> 0-management: type is set 0, need to change it
>>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053]
>>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops]
>>>> 0-management: Failed to set extended attribute
>>>> trusted.add-brick : Transport endpoint is not connected
>>>> [Transport endpoint is not connected]
>>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073]
>>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick]
>>>> 0-glusterd: Unable to add bricks
>>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122]
>>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management:
>>>> Add-brick commit failed.
>>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122]
>>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn]
>>>> 0-management: commit failed on operation Add brick
>>>>
>>>> As before, gvol0-add-brick-mount.log said:
>>>>
>>>> [2019-05-22 00:15:17.005695] I
>>>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE
>>>> inited with protocol versions: glusterfs 7.24 kernel 7.22
>>>> [2019-05-22 00:15:17.005749] I
>>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to
>>>> graph 0
>>>> [2019-05-22 00:15:17.010101] E
>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup
>>>> on root failed (Transport endpoint is not connected)
>>>> [2019-05-22 00:15:17.014217] W
>>>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2:
>>>> LOOKUP() / => -1 (Transport endpoint is not connected)
>>>> [2019-05-22 00:15:17.015097] W
>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse:
>>>> 00000000-0000-0000-0000-000000000001: failed to resolve
>>>> (Transport endpoint is not connected)
>>>> [2019-05-22 00:15:17.015158] W
>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse:
>>>> 3: SETXATTR 00000000-0000-0000-0000-000000000001/1
>>>> (trusted.add-brick) resolution failed
>>>> [2019-05-22 00:15:17.035636] I
>>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating
>>>> unmount of /tmp/mntYGNbj9
>>>> [2019-05-22 00:15:17.035854] W
>>>> [glusterfsd.c:1500:cleanup_and_exit]
>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5]
>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>> [0x55c81b63de75]
>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down
>>>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini]
>>>> 0-fuse: Unmounting '/tmp/mntYGNbj9'.
>>>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini]
>>>> 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'.
>>>>
>>>> Here are the processes running on the new arbiter server:
>>>> # ps -ef | grep gluster
>>>> root 3466 1 0 20:13 ? 00:00:00
>>>> /usr/sbin/glusterfs -s localhost --volfile-id
>>>> gluster/glustershd -p
>>>> /var/run/gluster/glustershd/glustershd.pid -l
>>>> /var/log/glusterfs/glustershd.log -S
>>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option
>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>> --process-name glustershd
>>>> root 6832 1 0 May16 ? 00:02:10
>>>> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>>>> root 17841 1 0 May16 ? 00:00:58
>>>> /usr/sbin/glusterfs --process-name fuse
>>>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs
>>>>
>>>> Here are the files created on the new arbiter server:
>>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
>>>> drwxr-xr-x 3 root root 4096 May 21 20:15
>>>> /nodirectwritedata/gluster/gvol0
>>>> drw------- 2 root root 4096 May 21 20:15
>>>> /nodirectwritedata/gluster/gvol0/.glusterfs
>>>>
>>>> Thank you for your help!
>>>>
>>>>
>>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde <srakonde at redhat.com> wrote:
>>>>
>>>> David,
>>>>
>>>> Can you please attach the glusterd logs? As the error
>>>> message says, the commit failed on the arbiter node, so we
>>>> may be able to find some issue on that node.
>>>>
>>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran <nbalacha at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, 17 May 2019 at 06:01, David Cunningham <dcunningham at voisonics.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> We're adding an arbiter node to an existing
>>>> volume and having an issue. Can anyone help?
>>>> The root cause error appears to be
>>>> "00000000-0000-0000-0000-000000000001: failed
>>>> to resolve (Transport endpoint is not
>>>> connected)", as below.
>>>>
>>>> We are running glusterfs 5.6.1. Thanks in
>>>> advance for any assistance!
>>>>
>>>> On existing node gfs1, trying to add new
>>>> arbiter node gfs3:
>>>>
>>>> # gluster volume add-brick gvol0 replica 3
>>>> arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
>>>> volume add-brick: failed: Commit failed on
>>>> gfs3. Please check log file for details.
>>>>
>>>>
>>>> This looks like a glusterd issue. Please check the
>>>> glusterd logs for more info.
>>>> Adding the glusterd dev to this thread. Sanju, can
>>>> you take a look?
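>>>>
>>>> For reference, glusterd logs to /var/log/glusterfs/glusterd.log by
>>>> default, so something like this on gfs3 should pull out the
>>>> relevant error lines (a sketch):
>>>>
>>>> # grep ' E ' /var/log/glusterfs/glusterd.log | tail
>>>>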
>>>> Regards,
>>>> Nithya
>>>>
>>>>
>>>> On new node gfs3 in gvol0-add-brick-mount.log:
>>>>
>>>> [2019-05-17 01:20:22.689721] I
>>>> [fuse-bridge.c:4267:fuse_init]
>>>> 0-glusterfs-fuse: FUSE inited with protocol
>>>> versions: glusterfs 7.24 kernel 7.22
>>>> [2019-05-17 01:20:22.689778] I
>>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse:
>>>> switched to graph 0
>>>> [2019-05-17 01:20:22.694897] E
>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse:
>>>> first lookup on root failed (Transport endpoint
>>>> is not connected)
>>>> [2019-05-17 01:20:22.699770] W
>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk]
>>>> 0-fuse: 00000000-0000-0000-0000-000000000001:
>>>> failed to resolve (Transport endpoint is not
>>>> connected)
>>>> [2019-05-17 01:20:22.699834] W
>>>> [fuse-bridge.c:3294:fuse_setxattr_resume]
>>>> 0-glusterfs-fuse: 2: SETXATTR
>>>> 00000000-0000-0000-0000-000000000001/1
>>>> (trusted.add-brick) resolution failed
>>>> [2019-05-17 01:20:22.715656] I
>>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse:
>>>> initating unmount of /tmp/mntQAtu3f
>>>> [2019-05-17 01:20:22.715865] W
>>>> [glusterfsd.c:1500:cleanup_and_exit]
>>>> (-->/lib64/libpthread.so.0(+0x7dd5)
>>>> [0x7fb223bf6dd5]
>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>> [0x560886581e75]
>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>> [0x560886581ceb] ) 0-: received signum (15),
>>>> shutting down
>>>> [2019-05-17 01:20:22.715926] I
>>>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting
>>>> '/tmp/mntQAtu3f'.
>>>> [2019-05-17 01:20:22.715953] I
>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse
>>>> connection to '/tmp/mntQAtu3f'.
>>>>
>>>> Processes running on new node gfs3:
>>>>
>>>> # ps -ef | grep gluster
>>>> root 6832 1 0 20:17 ? 00:00:00
>>>> /usr/sbin/glusterd -p /var/run/glusterd.pid
>>>> --log-level INFO
>>>> root 15799 1 0 20:17 ? 00:00:00
>>>> /usr/sbin/glusterfs -s localhost --volfile-id
>>>> gluster/glustershd -p
>>>> /var/run/gluster/glustershd/glustershd.pid -l
>>>> /var/log/glusterfs/glustershd.log -S
>>>> /var/run/gluster/24c12b09f93eec8e.socket
>>>> --xlator-option
>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>> --process-name glustershd
>>>> root 16856 16735 0 21:21 pts/0 00:00:00
>>>> grep --color=auto gluster
>>>>
>>>> --
>>>> David Cunningham, Voisonics Limited
>>>> http://voisonics.com/
>>>> USA: +1 213 221 1092
>>>> New Zealand: +64 (0)28 2558 3782
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Sanju
>>>>
>>>>
>>>>
>>>> --
>>>> David Cunningham, Voisonics Limited
>>>> http://voisonics.com/
>>>> USA: +1 213 221 1092
>>>> New Zealand: +64 (0)28 2558 3782
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>>
>>> --
>>> David Cunningham, Voisonics Limited
>>> http://voisonics.com/
>>> USA: +1 213 221 1092
>>> New Zealand: +64 (0)28 2558 3782
>>
>>
>>
>> --
>> David Cunningham, Voisonics Limited
>> http://voisonics.com/
>> USA: +1 213 221 1092
>> New Zealand: +64 (0)28 2558 3782