[Gluster-users] Gluster Startup Issue

Sat Jun 25 04:55:21 UTC 2016

I've tried using a lot of your script, but I still can't get past the error
message "Launching heal operation to perform full self heal on volume <volname>
has been unsuccessful on bricks that are down. Please check if all brick
processes are running."  Everything else seems to be working, but "gluster
volume heal appian full" keeps failing.

Is there any way to figure out what exactly caused this error message?  The
logs don't seem very useful for determining that.  They only state that the
"Commit" failed on one of the other nodes.
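
One way to narrow this down, as a rough sketch: the "Commit failed on
<server-ip-2>" message points at the glusterd log on that node, so filtering
that log for error and warning entries around the time of the heal request may
show the underlying failure.  The helper name gluster_errors_near is made up
for illustration, and it assumes each log entry sits on one line in the
"[timestamp] LEVEL [MSGID: ...] ..." layout seen in the logs quoted below.

```shell
#!/bin/sh
# gluster_errors_near PREFIX LOGFILE  (hypothetical helper, not part of gluster)
# Print error (E) and warning (W) entries from a GlusterFS log whose timestamp
# starts with PREFIX, e.g. "2016-06-21 14:10".
# Assumed entry layout: [2016-06-21 14:10:38.296801] E [MSGID: 106153] ...
gluster_errors_near() {
    prefix=$1    # timestamp prefix to match, down to any precision
    logfile=$2   # path to the log file to search
    # First keep entries from the requested time window, then keep only
    # those whose level field (right after the timestamp) is E or W.
    grep -F "[$prefix" "$logfile" | grep -E '^\[[^]]*] [EW] '
}
```

For example, on <server-ip-2>, assuming the default log directory
/var/log/glusterfs:

  gluster_errors_near "2016-06-21 14:10" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log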

Restarting the volume sometimes fixes it, but I'm not sure I want to run a
script that keeps restarting the volume until "gluster volume heal appian
full" works.
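
If retrying turns out to be unavoidable, a bounded retry of the heal command
itself, rather than of the volume restart, might be a gentler option.  This is
only a sketch: retry is a hypothetical helper, the attempt count and delay in
the usage line are arbitrary values, and it assumes the gluster CLI exits
non-zero when the heal request fails.

```shell
#!/bin/sh
# retry MAX DELAY CMD...  (hypothetical helper)
# Run CMD until it exits 0, at most MAX times, sleeping DELAY seconds
# between failed attempts.  Returns 0 on success, 1 if every attempt failed.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max failed: $*" >&2
        # Do not sleep after the final attempt.
        if [ "$attempt" -lt "$max" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

Usage might look like:

  retry 10 5 gluster volume heal appian full || echo "heal still failing" >&2

Giving up after a fixed number of attempts at least makes a persistent failure
visible instead of looping forever.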

On Thu, Jun 23, 2016 at 2:21 AM, Heiko L. <heikol at fh-lausitz.de> wrote:

>
> The hostname is not needed.
>
> # nodea=10.1.1.100;bricka=/mnt/sda6/brick4
> should work,
>
> but I prefer to work with hostnames.
>
>
> regards heiko
>
> PS: some notes I forgot:
> - use xfs or zfs (ext3 works, but performance is partly bad (V3.4))
> - the brick dir should not be the top dir of the fs
>   /dev/sda6 /mnt/brick4, brick=/mnt/brick4 ->  not recommended
>   /dev/sda6 /mnt/sda6,   brick=/mnt/sda6/brick4     better
>
> > Thank you for responding, Heiko.  I'm in the process of comparing our two
> > scripts.  The first thing I noticed was that the notes state "need to be
> > defined in the /etc/hosts".  Would using the IP address directly be a
> > problem?
> >
> > On Tue, Jun 21, 2016 at 2:10 PM, Heiko L. <heikol at fh-lausitz.de> wrote:
> >
> >> On Tue, 21.06.2016, 19:22, Danny Lee wrote:
> >> > Hello,
> >> >
> >> >
> >> > We are currently figuring out how to add GlusterFS to our system, using
> >> > scripts, to make our systems highly available.  We are using Gluster
> >> > 3.7.11.
> >> >
> >> > Problem:
> >> > We are trying to migrate from a non-clustered system to a 3-node
> >> > replicated GlusterFS cluster using scripts.  We have tried various
> >> > things to make this work, but we sometimes end up in an undesirable
> >> > state where calling "gluster volume heal <volname> full" gives the
> >> > error message "Launching heal operation to perform full self heal on
> >> > volume <volname> has been unsuccessful on bricks that are down. Please
> >> > check if all brick processes are running."  All the brick processes are
> >> > running according to the command "gluster volume status volname".
> >> >
> >> > Things we have tried (in order of preference):
> >> > 1. Create Volume with 3 Filesystems with the same data
> >> > 2. Create Volume with 2 Empty filesystems and one with the data
> >> > 3. Create Volume with only one filesystem with data and then using
> >> > "add-brick" command to add the other two empty filesystems
> >> > 4. Create Volume with one empty filesystem, mounting it, and then
> >> > copying the data over to that one.  And then finally, using "add-brick"
> >> > command to add the other two empty filesystems
> >> - should be working
> >> - read each file on /mnt/gvol, to trigger replication [2]
> >>
> >> > 5. Create Volume with 3 empty filesystems, mounting it, and then copying
> >> > the data over
> >> - my favorite
> >>
> >> >
> >> > Other things to note:
> >> > A few minutes after the volume is created and started successfully, our
> >> > application server starts up against it, so reads and writes may happen
> >> > pretty quickly after the volume has started.  But there is only about
> >> > 50MB of data.
> >> >
> >> > Steps to reproduce (all in a script):
> >> > # This is run by the primary node with the IP address <server-ip-1>,
> >> > # which has the data
> >> > systemctl restart glusterd
> >> > gluster peer probe <server-ip-2>
> >> > gluster peer probe <server-ip-3>
> >> > # Wait for "gluster peer status" to all be in "Peer in Cluster" state
> >> > gluster volume create <volname> replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]} ${BRICKS[2]} force
> >> > gluster volume set <volname> nfs.disable true
> >> > gluster volume start <volname>
> >> > mkdir -p $MOUNT_POINT
> >> > mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
> >> > find $MOUNT_POINT | xargs stat
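
The "wait for gluster peer status" step above could be sketched roughly as
follows.  This assumes that "gluster peer status" prints one "State:" line per
peer and that a healthy peer reads "State: Peer in Cluster (Connected)"; the
function name all_peers_in_cluster and its expected-peer-count argument are
hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# all_peers_in_cluster EXPECTED  (hypothetical helper)
# Read "gluster peer status" output on stdin and succeed only if exactly
# EXPECTED peers are listed and every one of them is in the
# "Peer in Cluster (Connected)" state (assumed output format).
all_peers_in_cluster() {
    expected=$1
    status=$(cat)
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    total=$(printf '%s\n' "$status" | grep -c '^State:' || true)
    ok=$(printf '%s\n' "$status" | grep -c '^State: Peer in Cluster (Connected)' || true)
    [ "$total" -eq "$expected" ] && [ "$ok" -eq "$expected" ]
}
```

In the script this might be used as

  until gluster peer status | all_peers_in_cluster 2; do sleep 2; done

ideally with an upper bound on the number of iterations so a peer that never
joins does not hang the script.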
> >>
> >> I have written a script for 2 nodes. [1]
> >> but should be at least 3 nodes.
> >>
> >>
> >> I hope it helps you
> >> regards Heiko
> >>
> >> >
> >> > Note that when we added sleeps around the gluster commands, there was a
> >> > higher probability of success, but not 100%.
> >> >
> >> > # Once the volume is started, all the clients/servers will mount the
> >> > # gluster filesystem by polling "mountpoint -q $MOUNT_POINT":
> >> > mkdir -p $MOUNT_POINT
> >> > mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
> >> >
> >> >
> >> > Logs:
> >> > *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
> >> >
> >> >
> >> > [2016-06-21 14:10:38.285234] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname
> >> > [2016-06-21 14:10:38.296801] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on <server-ip-2>. Please check log file for details.
> >> >
> >> >
> >> >
> >> > *usr-local-volname-data-mirrored-data.log* in *server-ip-1*
> >> >
> >> >
> >> > [2016-06-21 14:14:39.233366] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
> >> >
> >> > *I think this is caused by the self heal daemon*
> >> >
> >> >
> >> > *cmd_history.log* in *server-ip-1*
> >> >
> >> >
> >> > [2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED : Commit failed on <server-ip-2>. Please check log file for details.
> >>
> >> [1] http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt
> >>   - old, limit 2 nodes
> >>
> >>