[Gluster-users] Failed file system

Atin Mukherjee amukherj at redhat.com
Wed Aug 3 19:58:28 UTC 2016


Use replace-brick commit force.
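
A minimal sketch of that command (the volume name, hostnames and brick paths
are hypothetical placeholders, not taken from this thread):

gluster volume replace-brick <volname> <failed-host>:/path/to/old-brick \
    <new-host>:/path/to/new-brick commit force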

@Pranith/@Anuradha - after this, will self-heal be triggered automatically, or
is a manual trigger needed?

On Thursday 4 August 2016, Andres E. Moya <amoya at moyasolutions.com> wrote:

> Does anyone else have input?
>
> We are currently running off only 1 node; the other node in the replicated
> brick is offline.
>
> we are not experiencing any downtime because the 1 node is up.
>
> I do not understand which is the best way to bring up a second node.
>
> Do we just recreate the file system and mount points on the node that is
> down and allow Gluster to heal? (My concern with this is whether the node
> that is down will somehow take precedence and wipe out the data on the
> healthy node instead of vice versa.)
>
> Or do we fully wipe out the config on the node that is down, recreate the
> file system, re-add that node into Gluster using the add-brick command with
> replica 3, wait for it to heal, and then run the remove-brick command for
> the failed brick?
>
> Which would be the safest and easiest to accomplish?
>
> Thanks for any input.
>
>
>
> ------------------------------
> *From: *"Leno Vo" <lenovolastname at yahoo.com
> <javascript:_e(%7B%7D,'cvml','lenovolastname at yahoo.com');>>
> *To: *"Andres E. Moya" <amoya at moyasolutions.com
> <javascript:_e(%7B%7D,'cvml','amoya at moyasolutions.com');>>
> *Cc: *"gluster-users" <gluster-users at gluster.org
> <javascript:_e(%7B%7D,'cvml','gluster-users at gluster.org');>>
> *Sent: *Tuesday, August 2, 2016 6:45:27 PM
> *Subject: *Re: [Gluster-users] Failed file system
>
> If you don't want any downtime (in the case that your node 2 really has
> died), you have to create a new Gluster SAN (if you have the resources, of
> course; use 3 nodes this time if at all possible), and then just migrate
> your VMs (or files). That way there is no downtime, but you have to cross
> your fingers that the only remaining node does not die too...  Also,
> without sharding, the VM migration (especially an RDP one) will mean slow
> access for users until it has migrated.
>
> You should start testing sharding; it's fast and cool...
>
>
>
>
> On Tuesday, August 2, 2016 2:51 PM, Andres E. Moya <
> amoya at moyasolutions.com> wrote:
>
>
> Couldn't we just add a new server by:
>
> gluster peer probe
> gluster volume add-brick replica 3 (will this command succeed with 1
> current failed brick?)
>
> let it heal, then
>
> gluster volume remove-brick
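>
> (For illustration, the full sequence might look like this sketch; newserver,
> failedserver, volname, and the brick path are hypothetical placeholders,
> not taken from this thread:)
>
> gluster peer probe newserver
> gluster volume add-brick volname replica 3 newserver:/bricks/volname/brick
> #once "gluster volume heal volname info" shows no pending entries:
> gluster volume remove-brick volname replica 2 failedserver:/bricks/volname/brick force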
> ------------------------------
> *From: *"Leno Vo" <lenovolastname at yahoo.com
> <javascript:_e(%7B%7D,'cvml','lenovolastname at yahoo.com');>>
> *To: *"Andres E. Moya" <amoya at moyasolutions.com
> <javascript:_e(%7B%7D,'cvml','amoya at moyasolutions.com');>>,
> "gluster-users" <gluster-users at gluster.org
> <javascript:_e(%7B%7D,'cvml','gluster-users at gluster.org');>>
> *Sent: *Tuesday, August 2, 2016 1:26:42 PM
> *Subject: *Re: [Gluster-users] Failed file system
>
> You need to have downtime to recreate the second node. Two nodes is
> actually not good for production, and you should have put RAID 1 or RAID 5
> under your Gluster storage. When you recreate the second node you might try
> keeping the VMs that need to be up running and shutting down the rest, but
> stop all backups, and if you have replication, stop it too.  If you have a
> 1G NIC, 2 CPUs and less than 8G of RAM, then I suggest turning off all the
> VMs during recreation of the second node. Someone said that if you have
> sharding with 3.7.x, maybe some VIP VMs can stay up...
>
> If it is just a filesystem, then just turn off the backup service until you
> recreate the second node. Depending on your resources and how big your
> storage is, it might take hours to recreate it, or even days...
>
> Here's my process for recreating the second or third node (copied and
> modified from the net):
>
> #make sure the partition is already added!
> This procedure is for replacing a failed server, IF your newly installed
> server has the same hostname as the failed one:
>
> (If your new server will have a different hostname, see this article
> instead.)
>
> For purposes of this example, the server that crashed will be server3 and
> the other servers will be server1 and server2.
>
> On both server1 and server2, make sure hostname server3 resolves to the
> correct IP address of the new replacement server.
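> #for example, a quick check (illustrative):
> getent hosts server3
>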
> #On either server1 or server2, do
> grep server3 /var/lib/glusterd/peers/*
>
> This will return a uuid followed by ":hostname1=server3"
>
> #On server3, make sure glusterd is stopped, then do
> echo UUID={uuid from previous step}>/var/lib/glusterd/glusterd.info
>
> #actual testing below,
> [root@node1 ~]# cat /var/lib/glusterd/glusterd.info
> UUID=4b9d153c-5958-4dbe-8f91-7b5002882aac
> operating-version=30710
> #the second line (operating-version) is new... it may not be needed.
>
> On server3:
> make sure that all brick directories are created/mounted
> start glusterd
> peer probe one of the existing servers
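> #for example (the brick paths match those used further below; the service
> #command and peer name are illustrative):
> mount /gfs/b1 && mount /gfs/b2     #mount the prepared brick partitions
> mkdir -p /gfs/b1/v1 /gfs/b2/v2 /gfs/b1/config/c1
> systemctl start glusterd
> gluster peer probe server1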
>
> #restart glusterd, check that full peer list has been populated using
>  gluster peer status
>
> (if peers are missing, probe them explicitly, then restart glusterd again)
> #check that full volume configuration has been populated using
>  gluster volume info
>
> if volume configuration is missing, do
> #on the other node
> gluster volume sync "replace-node" all
>
> #on the node to be replaced
> setfattr -n trusted.glusterfs.volume-id \
>   -v 0x$(grep volume-id /var/lib/glusterd/vols/v1/info | cut -d= -f2 | sed 's/-//g') \
>   /gfs/b1/v1
> setfattr -n trusted.glusterfs.volume-id \
>   -v 0x$(grep volume-id /var/lib/glusterd/vols/v2/info | cut -d= -f2 | sed 's/-//g') \
>   /gfs/b2/v2
> setfattr -n trusted.glusterfs.volume-id \
>   -v 0x$(grep volume-id /var/lib/glusterd/vols/config/info | cut -d= -f2 | sed 's/-//g') \
>   /gfs/b1/config/c1
>
> mount -t glusterfs localhost:config /data/data1
>
> #install ctdb if not yet installed and put it back online; follow the steps
> #for creating the ctdb config, but use your common sense and do not delete
> #or modify the current one.
>
> gluster vol heal v1 full
> gluster vol heal v2 full
> gluster vol heal config full
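> #optionally, watch heal progress with, e.g.:
> gluster vol heal v1 info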
>
>
>
> On Tuesday, August 2, 2016 11:57 AM, Andres E. Moya <
> amoya at moyasolutions.com> wrote:
>
>
> Hi, we have a 2-node replica setup.
> On 1 of the nodes the file system that had the brick on it failed, not the
> OS.
> Can we recreate the file system and mount the bricks on the same mount
> point?
>
> What will happen: will the data from the other node sync over, or will the
> failed node wipe out the data on the healthy node?
>
> What would be the correct process?
>
> Thanks in advance for any help
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

-- 
--Atin