[Gluster-users] Gluster Startup Issue

Wed Jun 22 11:21:44 UTC 2016

Thank you for responding, Heiko.  I am in the process of comparing our two
scripts.  The first thing I noticed was that the notes state "need to be
defined in the /etc/hosts".  Would using the IP address directly be a
problem?

On Tue, Jun 21, 2016 at 2:10 PM, Heiko L. <heikol at fh-lausitz.de> wrote:

> On Tue, 21.06.2016, 19:22, Danny Lee wrote:
> > Hello,
> >
> >
> > We are currently figuring out how to add GlusterFS to our system to make
> > our systems highly available using scripts.  We are using Gluster 3.7.11.
> >
> > Problem:
> > Trying to migrate from a non-clustered system to a 3-node GlusterFS
> > replicated cluster using scripts.  Tried various things to make this work,
> > but it sometimes leaves us in an undesirable state where if you call
> > "gluster volume heal <volname> full", we get the error message,
> > "Launching heal operation to perform full self heal on volume <volname>
> > has been unsuccessful on bricks that are down. Please check if all brick
> > processes are running."  All the brick processes are running according to
> > the command "gluster volume status volname".
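
A minimal sketch of a pre-check a script could run before launching the full
heal described above. It assumes that "gluster volume status <volname> detail"
prints one "Online : Y" or "Online : N" line per brick; the function names
all_bricks_online and heal_if_bricks_online are hypothetical and not part of
either script in this thread.

```shell
# Reads `gluster volume status <volname> detail` output on stdin.
# Succeeds only if at least one brick is listed and every brick is "Online : Y".
# Assumption: the detail output has one "Online : Y|N" line per brick.
all_bricks_online() {
    awk '
        /^Online[ \t]*:/ { total++; if ($NF == "Y") up++ }
        END { exit !(total > 0 && up == total) }
    '
}

# Hypothetical wrapper: only launch the full heal when every brick is online.
heal_if_bricks_online() {
    # $1 = volume name
    if gluster volume status "$1" detail | all_bricks_online; then
        gluster volume heal "$1" full
    else
        echo "not all bricks of $1 are online; skipping heal" >&2
        return 1
    fi
}
```

This only guards against bricks that are visibly down; it would not explain a
"Commit failed" when every brick already reports online.
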
> >
> > Things we have tried:
> > Order of preference
> > 1. Create Volume with 3 Filesystems with the same data
> > 2. Create Volume with 2 empty filesystems and one with the data
> > 3. Create Volume with only one filesystem with data and then using
> > "add-brick" command to add the other two empty filesystems
> > 4. Create Volume with one empty filesystem, mounting it, and then copying
> > the data over to that one.  And then finally, using "add-brick" command
> > to add the other two empty filesystems
> - should be working
> - read each file on /mnt/gvol, to trigger replication [2]
>
> > 5. Create Volume with 3 empty filesystems, mounting it, and then copying
> > the data over
> - my favorite
>
> >
> > Other things to note:
> > A few minutes after the volume is created and started successfully, our
> > application server starts up against it, so reads and writes may happen
> > pretty quickly after the volume has started.  But there is only about
> > 50MB of data.
> >
> > Steps to reproduce (all in a script):
> > # This is run by the primary node with the IP address, <server-ip-1>, that
> > # has data
> > systemctl restart glusterd
> > gluster peer probe <server-ip-2>
> > gluster peer probe <server-ip-3>
> > # Wait for "gluster peer status" to all be in "Peer in Cluster" state
> > gluster volume create <volname> replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]} ${BRICKS[2]} force
> > gluster volume set <volname> nfs.disable true
> > gluster volume start <volname>
> > mkdir -p $MOUNT_POINT
> > mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
> > find $MOUNT_POINT | xargs stat
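
The "wait for Peer in Cluster" step above could be sketched roughly as
follows. This assumes "gluster peer status" prints a "Number of Peers: N"
header and one "State: ..." line per peer; the function names and the
60-second default timeout are hypothetical, chosen only for illustration.

```shell
# Reads `gluster peer status` output on stdin; succeeds only when every
# listed peer reports "Peer in Cluster".
# Assumption: output has "Number of Peers: N" and one "State: ..." per peer.
all_peers_in_cluster() {
    awk '
        /^Number of Peers:/ { expected = $4 }
        /^State: Peer in Cluster/ { ready++ }
        END { exit !(expected > 0 && ready + 0 == expected + 0) }
    '
}

# Polls every 2 seconds until all peers are in the cluster or the timeout
# (seconds, default 60) is reached. Returns non-zero on timeout.
wait_for_peers() {
    wfp_timeout=${1:-60}
    wfp_waited=0
    until gluster peer status | all_peers_in_cluster; do
        if [ "$wfp_waited" -ge "$wfp_timeout" ]; then
            return 1
        fi
        sleep 2
        wfp_waited=$((wfp_waited + 2))
    done
}
```

Polling on the actual state instead of sleeping for a fixed time lets the
script fail loudly on timeout rather than continue with a half-formed cluster.
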
>
> I have written a script for 2 nodes [1],
> but there should be at least 3 nodes.
>
>
> I hope it helps.
> Regards, Heiko
>
> >
> > Note that, when we added sleeps around the gluster commands, there was a
> > higher probability of success, but not 100%.
> >
> > # Once volume is started, all the clients/servers will mount the
> > # gluster filesystem by polling "mountpoint -q $MOUNT_POINT":
> > mkdir -p $MOUNT_POINT
> > mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
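
The client-side polling described above might look something like this
sketch. The retry and ensure_mounted helpers are hypothetical names, and the
attempt count and delay in the usage line are arbitrary illustration values.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or attempts run out.
retry() {
    retry_left=$1
    retry_delay=$2
    shift 2
    while [ "$retry_left" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        retry_left=$((retry_left - 1))
        if [ "$retry_left" -gt 0 ]; then
            sleep "$retry_delay"
        fi
    done
    return 1
}

# ensure_mounted SERVER VOLUME MOUNT_POINT: mount the volume unless it is
# already mounted, then confirm with mountpoint -q.
ensure_mounted() {
    mountpoint -q "$3" && return 0
    mkdir -p "$3" && mount -t glusterfs "$1:/$2" "$3" && mountpoint -q "$3"
}

# Usage (hypothetical values):
#   retry 30 2 ensure_mounted "$SERVER_IP" volname "$MOUNT_POINT"
```
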
> >
> >
> > Logs:
> > *etc-glusterfs-glusterd.vol.log* in *server-ip-1*
> >
> >
> > [2016-06-21 14:10:38.285234] I [MSGID: 106533]
> > [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume]
> > 0-management: Received heal vol req for volume volname
> > [2016-06-21 14:10:38.296801] E [MSGID: 106153]
> > [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
> > <server-ip-2>. Please check log file for details.
> >
> >
> >
> > *usr-local-volname-data-mirrored-data.log* in *server-ip-1*
> >
> >
> > [2016-06-21 14:14:39.233366] E [MSGID: 114058]
> > [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0:
> > failed to get the port number for remote subvolume. Please run 'gluster
> > volume status' on server to see if brick process is running.
> > *I think this is caused by the self heal daemon*
> >
> >
> > *cmd_history.log* in *server-ip-1*
> >
> >
> > [2016-06-21 14:10:38.298800]  : volume heal volname full : FAILED :
> > Commit failed on <server-ip-2>. Please check log file for details.
>
> [1]
> http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt
>   - old, limit 2 nodes