[Gluster-users] add-brick: failed: Commit failed

Ravishankar N ravishankar at redhat.com
Fri May 24 13:48:52 UTC 2019


Hi David,

On 23/05/19 3:54 AM, David Cunningham wrote:
> Hi Ravi,
>
> Please see the log attached.
When I grep -E "Connected to |disconnected from" 
gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1" 
message. It looks like this temporary mount is not able to connect to 
the 2nd brick, which is why the lookup is failing due to lack of quorum.
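
For reference, this is roughly the check I ran against the attached log 
(the path below is only an assumption about where the log sits on gfs3; 
adjust it to wherever yours is):

  grep -E "Connected to |disconnected from" /var/log/glusterfs/gvol0-add-brick-mount.log

With all three bricks reachable you would expect a "Connected to 
gvol0-client-N" line for client-0, client-1 and client-2; in your log 
the gvol0-client-1 line never shows up, which is why I think the 
temporary mount cannot reach the second brick.
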
> The output of "gluster volume status" is as follows. Should there be 
> something listening on gfs3? I'm not sure whether the TCP Port and Pid 
> showing as N/A is a symptom or a cause. Thank you.
>
> # gluster volume status
> Status of volume: gvol0
> Gluster process                             TCP Port  RDMA Port  
> Online  Pid
> ------------------------------------------------------------------------------
> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0          Y       7706
> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0          Y       7624
> Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A N/A        N       N/A

Can you see if the following steps help? (I have also put the commands 
together in a short sketch after step 3.)

1. On *both* gfs1 and gfs2, run `setfattr -n trusted.afr.gvol0-client-2 
-v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0`.

2. `gluster volume start gvol0 force`

3. Check if Brick-3 now comes online with a valid TCP port and PID. If 
it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 
to see why.
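
Putting the three steps together, a rough sketch of the commands (the 
brick log file name on gfs3 is a guess based on the usual mapping of 
brick path to log file name, so adjust if yours differs; and, if I'm 
reading the AFR xattr layout right, the value in step 1 marks one 
pending metadata and one pending entry heal against client-2, so the 
self-heal daemon should repopulate the arbiter once its brick is up):

  # on both gfs1 and gfs2
  setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0

  # from any node
  gluster volume start gvol0 force
  gluster volume status gvol0

  # on gfs3, if Brick-3 still shows N/A
  less /var/log/glusterfs/bricks/nodirectwritedata-gluster-gvol0.log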

Thanks,

Ravi


> Self-heal Daemon on localhost               N/A N/A        Y       19853
> Self-heal Daemon on gfs1                    N/A N/A        Y       28600
> Self-heal Daemon on gfs2                    N/A N/A        Y       17614
>
> Task Status of Volume gvol0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> On Wed, 22 May 2019 at 18:06, Ravishankar N <ravishankar at redhat.com 
> <mailto:ravishankar at redhat.com>> wrote:
>
>     If you are trying this again, please run `gluster volume set $volname
>     client-log-level DEBUG` before attempting the add-brick and attach
>     the gvol0-add-brick-mount.log here. After that, you can change the
>     client-log-level back to INFO.
>
>     -Ravi
>
>     On 22/05/19 11:32 AM, Ravishankar N wrote:
>>
>>
>>     On 22/05/19 11:23 AM, David Cunningham wrote:
>>>     Hi Ravi,
>>>
>>>     I'd already done exactly that before, where step 3 was a simple
>>>     'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another
>>>     suggestion on what the cleanup or reformat should be?
>>     `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me,
>>     David. Basically, '/nodirectwritedata/gluster/gvol0' must be
>>     empty and must not have any extended attributes set on it. Why
>>     fuse_first_lookup() is failing is a bit of a mystery to me at
>>     this point. :-(
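>>
>>     To be doubly sure before the next attempt, running these on gfs3
>>     should show an empty directory and no trusted.* xattrs left
>>     behind (just a sanity check, nothing more):
>>
>>     ls -la /nodirectwritedata/gluster/gvol0
>>     getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>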
>>     Regards,
>>     Ravi
>>>
>>>     Thank you.
>>>
>>>
>>>     On Wed, 22 May 2019 at 13:56, Ravishankar N
>>>     <ravishankar at redhat.com <mailto:ravishankar at redhat.com>> wrote:
>>>
>>>         Hmm, so the volume info seems to indicate that the add-brick
>>>         was successful but the gfid xattr is missing on the new
>>>         brick (as are the actual files, barring the .glusterfs
>>>         folder, according to your previous mail).
>>>
>>>         Do you want to try removing and adding it again?
>>>
>>>         1. `gluster volume remove-brick gvol0 replica 2
>>>         gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1
>>>
>>>         2. Check that `gluster volume info` is now back to a 1x2
>>>         volume on all nodes and that `gluster peer status` shows all
>>>         peers as connected.
>>>
>>>         3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on
>>>         gfs3.
>>>
>>>         4. `gluster volume add-brick gvol0 replica 3 arbiter 1
>>>         gfs3:/nodirectwritedata/gluster/gvol0` from gfs1.
>>>
>>>         5. Check that the files are getting healed on to the new brick.
>>>
>>>         Thanks,
>>>         Ravi
>>>         On 22/05/19 6:50 AM, David Cunningham wrote:
>>>>         Hi Ravi,
>>>>
>>>>         Certainly. On the existing two nodes:
>>>>
>>>>         gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>>         getfattr: Removing leading '/' from absolute path names
>>>>         # file: nodirectwritedata/gluster/gvol0
>>>>         trusted.afr.dirty=0x000000000000000000000000
>>>>         trusted.afr.gvol0-client-2=0x000000000000000000000000
>>>>         trusted.gfid=0x00000000000000000000000000000001
>>>>         trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>         trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>>         gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>>         getfattr: Removing leading '/' from absolute path names
>>>>         # file: nodirectwritedata/gluster/gvol0
>>>>         trusted.afr.dirty=0x000000000000000000000000
>>>>         trusted.afr.gvol0-client-0=0x000000000000000000000000
>>>>         trusted.afr.gvol0-client-2=0x000000000000000000000000
>>>>         trusted.gfid=0x00000000000000000000000000000001
>>>>         trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>         trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>>         On the new node:
>>>>
>>>>         gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
>>>>         getfattr: Removing leading '/' from absolute path names
>>>>         # file: nodirectwritedata/gluster/gvol0
>>>>         trusted.afr.dirty=0x000000000000000000000001
>>>>         trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6
>>>>
>>>>         Output of "gluster volume info" is the same on all 3 nodes
>>>>         and is:
>>>>
>>>>         # gluster volume info
>>>>
>>>>         Volume Name: gvol0
>>>>         Type: Replicate
>>>>         Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6
>>>>         Status: Started
>>>>         Snapshot Count: 0
>>>>         Number of Bricks: 1 x (2 + 1) = 3
>>>>         Transport-type: tcp
>>>>         Bricks:
>>>>         Brick1: gfs1:/nodirectwritedata/gluster/gvol0
>>>>         Brick2: gfs2:/nodirectwritedata/gluster/gvol0
>>>>         Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter)
>>>>         Options Reconfigured:
>>>>         performance.client-io-threads: off
>>>>         nfs.disable: on
>>>>         transport.address-family: inet
>>>>
>>>>
>>>>         On Wed, 22 May 2019 at 12:43, Ravishankar N
>>>>         <ravishankar at redhat.com <mailto:ravishankar at redhat.com>> wrote:
>>>>
>>>>             Hi David,
>>>>             Could you provide the `getfattr -d -m. -e hex
>>>>             /nodirectwritedata/gluster/gvol0` output of all bricks
>>>>             and the output of `gluster volume info`?
>>>>
>>>>             Thanks,
>>>>             Ravi
>>>>             On 22/05/19 4:57 AM, David Cunningham wrote:
>>>>>             Hi Sanju,
>>>>>
>>>>>             Here's what glusterd.log says on the new arbiter
>>>>>             server when trying to add the node:
>>>>>
>>>>>             [2019-05-22 00:15:05.963059] I [run.c:242:runner_log]
>>>>>             (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd)
>>>>>             [0x7fe4ca9102cd]
>>>>>             -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85)
>>>>>             [0x7fe4ca9bbb85]
>>>>>             -->/lib64/libglusterfs.so.0(runner_log+0x115)
>>>>>             [0x7fe4d5ecc955] ) 0-management: Ran script:
>>>>>             /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh
>>>>>             --volname=gvol0 --version=1 --volume-op=add-brick
>>>>>             --gd-workdir=/var/lib/glusterd
>>>>>             [2019-05-22 00:15:05.963177] I [MSGID: 106578]
>>>>>             [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks]
>>>>>             0-management: replica-count is set 3
>>>>>             [2019-05-22 00:15:05.963228] I [MSGID: 106578]
>>>>>             [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks]
>>>>>             0-management: arbiter-count is set 1
>>>>>             [2019-05-22 00:15:05.963257] I [MSGID: 106578]
>>>>>             [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks]
>>>>>             0-management: type is set 0, need to change it
>>>>>             [2019-05-22 00:15:17.015268] E [MSGID: 106053]
>>>>>             [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops]
>>>>>             0-management: Failed to set extended attribute
>>>>>             trusted.add-brick : Transport endpoint is not
>>>>>             connected [Transport endpoint is not connected]
>>>>>             [2019-05-22 00:15:17.036479] E [MSGID: 106073]
>>>>>             [glusterd-brick-ops.c:2595:glusterd_op_add_brick]
>>>>>             0-glusterd: Unable to add bricks
>>>>>             [2019-05-22 00:15:17.036595] E [MSGID: 106122]
>>>>>             [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn]
>>>>>             0-management: Add-brick commit failed.
>>>>>             [2019-05-22 00:15:17.036710] E [MSGID: 106122]
>>>>>             [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn]
>>>>>             0-management: commit failed on operation Add brick
>>>>>
>>>>>             As before gvol0-add-brick-mount.log said:
>>>>>
>>>>>             [2019-05-22 00:15:17.005695] I
>>>>>             [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE
>>>>>             inited with protocol versions: glusterfs 7.24 kernel 7.22
>>>>>             [2019-05-22 00:15:17.005749] I
>>>>>             [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched
>>>>>             to graph 0
>>>>>             [2019-05-22 00:15:17.010101] E
>>>>>             [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first
>>>>>             lookup on root failed (Transport endpoint is not
>>>>>             connected)
>>>>>             [2019-05-22 00:15:17.014217] W
>>>>>             [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2:
>>>>>             LOOKUP() / => -1 (Transport endpoint is not connected)
>>>>>             [2019-05-22 00:15:17.015097] W
>>>>>             [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse:
>>>>>             00000000-0000-0000-0000-000000000001: failed to
>>>>>             resolve (Transport endpoint is not connected)
>>>>>             [2019-05-22 00:15:17.015158] W
>>>>>             [fuse-bridge.c:3294:fuse_setxattr_resume]
>>>>>             0-glusterfs-fuse: 3: SETXATTR
>>>>>             00000000-0000-0000-0000-000000000001/1
>>>>>             (trusted.add-brick) resolution failed
>>>>>             [2019-05-22 00:15:17.035636] I
>>>>>             [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse:
>>>>>             initating unmount of /tmp/mntYGNbj9
>>>>>             [2019-05-22 00:15:17.035854] W
>>>>>             [glusterfsd.c:1500:cleanup_and_exit]
>>>>>             (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5]
>>>>>             -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>>>             [0x55c81b63de75]
>>>>>             -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>>>             [0x55c81b63dceb] ) 0-: received signum (15), shutting down
>>>>>             [2019-05-22 00:15:17.035942] I
>>>>>             [fuse-bridge.c:5914:fini] 0-fuse: Unmounting
>>>>>             '/tmp/mntYGNbj9'.
>>>>>             [2019-05-22 00:15:17.035966] I
>>>>>             [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse
>>>>>             connection to '/tmp/mntYGNbj9'.
>>>>>
>>>>>             Here are the processes running on the new arbiter server:
>>>>>             # ps -ef | grep gluster
>>>>>             root      3466     1  0 20:13 ?        00:00:00
>>>>>             /usr/sbin/glusterfs -s localhost --volfile-id
>>>>>             gluster/glustershd -p
>>>>>             /var/run/gluster/glustershd/glustershd.pid -l
>>>>>             /var/log/glusterfs/glustershd.log -S
>>>>>             /var/run/gluster/24c12b09f93eec8e.socket
>>>>>             --xlator-option
>>>>>             *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>>>             --process-name glustershd
>>>>>             root      6832     1  0 May16 ?        00:02:10
>>>>>             /usr/sbin/glusterd -p /var/run/glusterd.pid
>>>>>             --log-level INFO
>>>>>             root     17841     1  0 May16 ?        00:00:58
>>>>>             /usr/sbin/glusterfs --process-name fuse
>>>>>             --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs
>>>>>
>>>>>             Here are the files created on the new arbiter server:
>>>>>             # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald
>>>>>             drwxr-xr-x 3 root root 4096 May 21 20:15
>>>>>             /nodirectwritedata/gluster/gvol0
>>>>>             drw------- 2 root root 4096 May 21 20:15
>>>>>             /nodirectwritedata/gluster/gvol0/.glusterfs
>>>>>
>>>>>             Thank you for your help!
>>>>>
>>>>>
>>>>>             On Tue, 21 May 2019 at 00:10, Sanju Rakonde
>>>>>             <srakonde at redhat.com <mailto:srakonde at redhat.com>> wrote:
>>>>>
>>>>>                 David,
>>>>>
>>>>>                 Can you please attach the glusterd logs? As the
>>>>>                 error message says the commit failed on the
>>>>>                 arbiter node, we might be able to find some issue
>>>>>                 on that node.
>>>>>
>>>>>                 On Mon, May 20, 2019 at 10:10 AM Nithya
>>>>>                 Balachandran <nbalacha at redhat.com
>>>>>                 <mailto:nbalacha at redhat.com>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>                     On Fri, 17 May 2019 at 06:01, David Cunningham
>>>>>                     <dcunningham at voisonics.com
>>>>>                     <mailto:dcunningham at voisonics.com>> wrote:
>>>>>
>>>>>                         Hello,
>>>>>
>>>>>                         We're adding an arbiter node to an
>>>>>                         existing volume and having an issue. Can
>>>>>                         anyone help? The root cause error appears
>>>>>                         to be
>>>>>                         "00000000-0000-0000-0000-000000000001:
>>>>>                         failed to resolve (Transport endpoint is
>>>>>                         not connected)", as below.
>>>>>
>>>>>                         We are running glusterfs 5.6.1. Thanks in
>>>>>                         advance for any assistance!
>>>>>
>>>>>                         On existing node gfs1, trying to add new
>>>>>                         arbiter node gfs3:
>>>>>
>>>>>                         # gluster volume add-brick gvol0 replica 3
>>>>>                         arbiter 1
>>>>>                         gfs3:/nodirectwritedata/gluster/gvol0
>>>>>                         volume add-brick: failed: Commit failed on
>>>>>                         gfs3. Please check log file for details.
>>>>>
>>>>>
>>>>>                     This looks like a glusterd issue. Please check
>>>>>                     the glusterd logs for more info.
>>>>>                     Adding the glusterd dev to this thread. Sanju,
>>>>>                     can you take a look?
>>>>>                     Regards,
>>>>>                     Nithya
>>>>>
>>>>>
>>>>>                         On new node gfs3 in gvol0-add-brick-mount.log:
>>>>>
>>>>>                         [2019-05-17 01:20:22.689721] I
>>>>>                         [fuse-bridge.c:4267:fuse_init]
>>>>>                         0-glusterfs-fuse: FUSE inited with
>>>>>                         protocol versions: glusterfs 7.24 kernel 7.22
>>>>>                         [2019-05-17 01:20:22.689778] I
>>>>>                         [fuse-bridge.c:4878:fuse_graph_sync]
>>>>>                         0-fuse: switched to graph 0
>>>>>                         [2019-05-17 01:20:22.694897] E
>>>>>                         [fuse-bridge.c:4336:fuse_first_lookup]
>>>>>                         0-fuse: first lookup on root failed
>>>>>                         (Transport endpoint is not connected)
>>>>>                         [2019-05-17 01:20:22.699770] W
>>>>>                         [fuse-resolve.c:127:fuse_resolve_gfid_cbk]
>>>>>                         0-fuse:
>>>>>                         00000000-0000-0000-0000-000000000001:
>>>>>                         failed to resolve (Transport endpoint is
>>>>>                         not connected)
>>>>>                         [2019-05-17 01:20:22.699834] W
>>>>>                         [fuse-bridge.c:3294:fuse_setxattr_resume]
>>>>>                         0-glusterfs-fuse: 2: SETXATTR
>>>>>                         00000000-0000-0000-0000-000000000001/1
>>>>>                         (trusted.add-brick) resolution failed
>>>>>                         [2019-05-17 01:20:22.715656] I
>>>>>                         [fuse-bridge.c:5144:fuse_thread_proc]
>>>>>                         0-fuse: initating unmount of /tmp/mntQAtu3f
>>>>>                         [2019-05-17 01:20:22.715865] W
>>>>>                         [glusterfsd.c:1500:cleanup_and_exit]
>>>>>                         (-->/lib64/libpthread.so.0(+0x7dd5)
>>>>>                         [0x7fb223bf6dd5]
>>>>>                         -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5)
>>>>>                         [0x560886581e75]
>>>>>                         -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b)
>>>>>                         [0x560886581ceb] ) 0-: received signum
>>>>>                         (15), shutting down
>>>>>                         [2019-05-17 01:20:22.715926] I
>>>>>                         [fuse-bridge.c:5914:fini] 0-fuse:
>>>>>                         Unmounting '/tmp/mntQAtu3f'.
>>>>>                         [2019-05-17 01:20:22.715953] I
>>>>>                         [fuse-bridge.c:5919:fini] 0-fuse: Closing
>>>>>                         fuse connection to '/tmp/mntQAtu3f'.
>>>>>
>>>>>                         Processes running on new node gfs3:
>>>>>
>>>>>                         # ps -ef | grep gluster
>>>>>                         root 6832     1  0 20:17 ? 00:00:00
>>>>>                         /usr/sbin/glusterd -p
>>>>>                         /var/run/glusterd.pid --log-level INFO
>>>>>                         root 15799     1  0 20:17 ? 00:00:00
>>>>>                         /usr/sbin/glusterfs -s localhost
>>>>>                         --volfile-id gluster/glustershd -p
>>>>>                         /var/run/gluster/glustershd/glustershd.pid
>>>>>                         -l /var/log/glusterfs/glustershd.log -S
>>>>>                         /var/run/gluster/24c12b09f93eec8e.socket
>>>>>                         --xlator-option
>>>>>                         *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412
>>>>>                         --process-name glustershd
>>>>>                         root     16856 16735  0 21:21 pts/0
>>>>>                         00:00:00 grep --color=auto gluster
>>>>>
>>>>>                         -- 
>>>>>                         David Cunningham, Voisonics Limited
>>>>>                         http://voisonics.com/
>>>>>                         USA: +1 213 221 1092
>>>>>                         New Zealand: +64 (0)28 2558 3782
>>>>>                         _______________________________________________
>>>>>                         Gluster-users mailing list
>>>>>                         Gluster-users at gluster.org
>>>>>                         <mailto:Gluster-users at gluster.org>
>>>>>                         https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>>                 -- 
>>>>>                 Thanks,
>>>>>                 Sanju
>>>>>
>>>>>
>>>>>
>>>>>             -- 
>>>>>             David Cunningham, Voisonics Limited
>>>>>             http://voisonics.com/
>>>>>             USA: +1 213 221 1092
>>>>>             New Zealand: +64 (0)28 2558 3782
>>>>>
>>>>>             _______________________________________________
>>>>>             Gluster-users mailing list
>>>>>             Gluster-users at gluster.org  <mailto:Gluster-users at gluster.org>
>>>>>             https://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>>
>>>>
>>>>         -- 
>>>>         David Cunningham, Voisonics Limited
>>>>         http://voisonics.com/
>>>>         USA: +1 213 221 1092
>>>>         New Zealand: +64 (0)28 2558 3782
>>>
>>>
>>>
>>>     -- 
>>>     David Cunningham, Voisonics Limited
>>>     http://voisonics.com/
>>>     USA: +1 213 221 1092
>>>     New Zealand: +64 (0)28 2558 3782
>
>
>
> -- 
> David Cunningham, Voisonics Limited
> http://voisonics.com/
> USA: +1 213 221 1092
> New Zealand: +64 (0)28 2558 3782