[Gluster-users] Failed file system

Anuradha Talur atalur at redhat.com
Thu Aug 4 08:47:54 UTC 2016


The replace-brick commit force command can be used. If you are on glusterfs 3.7.3 or above,
self-heal will be triggered automatically from the good bricks to the newly added brick.
But you can't replace a brick using the same path as before; the new brick path will
have to be different from the existing ones in the volume.
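
For example, a minimal sketch with hypothetical volume and brick names (adjust to your setup):

# volume "vol1", failed brick server2:/bricks/b1, new empty brick at a different path
gluster volume replace-brick vol1 server2:/bricks/b1 server2:/bricks/b1_new commit force
# on 3.7.3+ the heal starts automatically; monitor it with
gluster volume heal vol1 info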

----- Original Message -----
> From: "Mahdi Adnan" <mahdi.adnan at outlook.com>
> To: "Andres E. Moya" <amoya at moyasolutions.com>, "gluster-users" <gluster-users at gluster.org>
> Sent: Thursday, August 4, 2016 1:25:59 AM
> Subject: Re: [Gluster-users] Failed file system
> 
> Hi,
> 
> I'm not an expert in Gluster, but I think it would be better to replace the
> downed brick with a new one.
> Maybe start from here:
> 
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-brick
> 
> 
> --
> 
> Respectfully
> Mahdi A. Mahdi
> 
> 
> 
> 
> Date: Wed, 3 Aug 2016 15:39:35 -0400
> From: amoya at moyasolutions.com
> To: gluster-users at gluster.org
> Subject: Re: [Gluster-users] Failed file system
> 
> Does anyone else have input?
> 
> We are currently running off only 1 node; the other node of the replicated
> volume is offline.
> 
> We are not experiencing any downtime because that 1 node is up.
> 
> I do not know which is the best way to bring up a second node.
> 
> Do we just re-create a file system and the mount points on the node that is
> down and allow gluster to heal? (My concern with this is whether the node
> that is down will somehow take precedence and wipe out the data on the
> healthy node instead of vice versa.)
> 
> Or do we fully wipe out the config on the node that is down, re-create the
> file system, re-add that node to gluster using the add-brick command with
> replica 3, wait for it to heal, and then run the remove-brick command for
> the failed brick?
> 
> Which would be the safest and easiest to accomplish?
> 
> thanks for any input
> 
> 
> 
> 
> From: "Leno Vo" <lenovolastname at yahoo.com>
> To: "Andres E. Moya" <amoya at moyasolutions.com>
> Cc: "gluster-users" <gluster-users at gluster.org>
> Sent: Tuesday, August 2, 2016 6:45:27 PM
> Subject: Re: [Gluster-users] Failed file system
> 
> If you don't want any downtime (in the case that your node 2 has really
> died), you have to create a new gluster SAN (if you have the resources, of
> course; use 3 nodes as much as possible this time) and then just migrate
> your VMs (or files). That way there is no downtime, but you have to cross
> your fingers that the only remaining node will not die too. Also, without
> sharding the VM migration (especially an RDP one) will mean slow access for
> users until it has migrated.
> 
> You should start testing sharding; it's fast and cool.
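> 
> As a reference, a minimal sketch of enabling sharding (hypothetical volume
> name "vol1"; the block size is just an example value, and sharding only
> applies to files created after it is turned on):
> 
> gluster volume set vol1 features.shard on
> gluster volume set vol1 features.shard-block-size 64MB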
> 
> 
> 
> 
> On Tuesday, August 2, 2016 2:51 PM, Andres E. Moya <amoya at moyasolutions.com>
> wrote:
> 
> 
> Couldn't we just add a new server by doing:
> 
> gluster peer probe
> gluster volume add-brick replica 3 (will this command succeed with 1 current
> failed brick?)
> 
> let it heal, then
> 
> gluster volume remove-brick
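> 
> Spelled out, that would look something like this (a sketch with hypothetical
> volume and brick names; the replica count has to be stated whenever it
> changes):
> 
> gluster peer probe server3
> gluster volume add-brick vol1 replica 3 server3:/bricks/b1
> # wait for self-heal to finish, then drop the failed brick again
> gluster volume remove-brick vol1 replica 2 server2:/bricks/b1 force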
> 
> From: "Leno Vo" <lenovolastname at yahoo.com>
> To: "Andres E. Moya" <amoya at moyasolutions.com>, "gluster-users"
> <gluster-users at gluster.org>
> Sent: Tuesday, August 2, 2016 1:26:42 PM
> Subject: Re: [Gluster-users] Failed file system
> 
> You need to have some downtime to recreate the second node. Two nodes is
> actually not good for production, and you should have put RAID 1 or RAID 5
> under your gluster storage. When you recreate the second node you might try
> keeping up only the VMs that need to be running and shutting down the rest,
> but stop all backups, and if you have replication, stop it too. If you have
> a 1G NIC, 2 CPUs and less than 8G RAM, then I suggest turning off all the
> VMs during recreation of the second node. Someone said that if you have
> sharding with 3.7.x, maybe some VIP VMs can stay up.
> 
> If it is just a filesystem, then just turn off the backup service until you
> recreate the second node. Depending on your resources and how big your
> storage is, it might take hours to recreate it, or even days.
> 
> Here's my process for recreating the second or third node (copied and
> modified from the net):
> 
> # make sure the partition is already added!
> This procedure is for replacing a failed server, IF your newly installed
> server has the same hostname as the failed one:
> 
> (If your new server will have a different hostname, see this article
> instead.)
> 
> For purposes of this example, the server that crashed will be server3 and the
> other servers will be server1 and server2
> 
> On both server1 and server2, make sure hostname server3 resolves to the
> correct IP address of the new replacement server.
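> 
> For example, a hypothetical /etc/hosts entry (substitute the real IP of the
> replacement server):
> 
> # on server1 and server2
> echo "192.0.2.13   server3" >> /etc/hosts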
> #On either server1 or server2, do
> grep server3 /var/lib/glusterd/peers/*
> 
> This will return a uuid followed by ":hostname1=server3"
> 
> #On server3, make sure glusterd is stopped, then do
> echo UUID={uuid from previous step}>/var/lib/glusterd/glusterd.info
> 
> # actual testing below:
> [root@node1 ~]# cat /var/lib/glusterd/glusterd.info
> UUID=4b9d153c-5958-4dbe-8f91-7b5002882aac
> operating-version=30710
> # the second line (operating-version) is new; it may not be needed.
> 
> On server3:
> make sure that all brick directories are created/mounted
> start glusterd
> peer probe one of the existing servers
> 
> #restart glusterd, check that full peer list has been populated using
> gluster peer status
> 
> (if peers are missing, probe them explicitly, then restart glusterd again)
> #check that full volume configuration has been populated using
> gluster volume info
> 
> if volume configuration is missing, do
> #on the other node
> gluster volume sync "replace-node" all
> 
> #on the node to be replaced
> setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id
> /var/lib/glusterd/vols/v1/info | cut -d= -f2 | sed 's/-//g') /gfs/b1/v1
> setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id
> /var/lib/glusterd/vols/v2/info | cut -d= -f2 | sed 's/-//g') /gfs/b2/v2
> setfattr -n trusted.glusterfs.volume-id -v 0x$(grep volume-id
> /var/lib/glusterd/vols/config/info | cut -d= -f2 | sed 's/-//g')
> /gfs/b1/config/c1
> 
> mount -t glusterfs localhost:config /data/data1
> 
> # install ctdb if not yet installed and bring it back online; use the steps
> # for creating the ctdb config, but
> # use your common sense and do not delete or modify the current one.
> 
> gluster vol heal v1 full
> gluster vol heal v2 full
> gluster vol heal config full
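> 
> You can then watch the heal progress (same hypothetical volume names as
> above) with:
> 
> gluster vol heal v1 info
> gluster vol heal v2 info
> gluster vol heal config info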
> 
> 
> 
> On Tuesday, August 2, 2016 11:57 AM, Andres E. Moya <amoya at moyasolutions.com>
> wrote:
> 
> 
> Hi, we have a 2-node replica setup.
> On 1 of the nodes the file system that held the brick failed, not the OS.
> Can we re-create a file system and mount the brick on the same mount point?
> 
> What will happen? Will the data from the other node sync over, or will the
> failed node wipe out the data on the other node?
> 
> What would be the correct process?
> 
> Thanks in advance for any help

-- 
Thanks,
Anuradha.

