[Gluster-users] Rebuild Distributed/Replicated Setup

Tue May 17 17:13:01 UTC 2011

Just FYI, I have errors similar to Remi's on 3.2.0 for certain
directories, with the same type of distributed/replicated setup:
Input/Output errors, unable to self-heal, split-brain. Except mine say
it's a permissions problem, not size differences, even though the
permissions on both backends for the files/dirs show the same.
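For anyone wanting to repeat that comparison, here is a minimal sketch of checking ownership and permission bits for the same entry on two backends. The function names are made up for illustration. It only looks at what `stat` reports, so it will not show the `trusted.afr.*` extended attributes that replicate keeps on the bricks, and those can differ even when `ls -l` looks identical.

```python
import os
import stat


def describe(path):
    """Return the basic ownership and permission bits of one backend entry."""
    st = os.lstat(path)
    return {
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "uid": st.st_uid,
        "gid": st.st_gid,
    }


def metadata_diff(path_a, path_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = describe(path_a), describe(path_b)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Since the two bricks normally live on different servers, `describe()` would be run once per server on the brick path and the outputs compared by hand; `metadata_diff()` only helps when both copies are reachable from one host.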

The only way I fixed mine was by removing the offending directory and
recreating it.

-Tony

---------------------------
Manager, IT Operations
Format Dynamics, Inc.
P: 303-228-7327
F: 303-228-7305
abiacco at formatdynamics.com
http://www.formatdynamics.com

From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Remi Broemeling
Sent: Tuesday, May 17, 2011 9:33 AM
To: gluster-users at gluster.org
Subject: Re: [Gluster-users] Rebuild Distributed/Replicated Setup

Hi Pranith.  Sure, here is a pastebin sampling of logs from one of the
hosts: http://pastebin.com/1U1ziwjC

On Mon, May 16, 2011 at 20:48, Pranith Kumar. Karampuri
<pranithk at gluster.com> wrote:

Hi Remi,
   Would it be possible to post the client logs so that we can find out
what issue you are running into?

Pranith

----- Original Message -----
From: "Remi Broemeling" <remi at goclio.com>
To: gluster-users at gluster.org
Sent: Monday, May 16, 2011 10:47:33 PM
Subject: [Gluster-users] Rebuild Distributed/Replicated Setup

Hi,

I've got a distributed/replicated GlusterFS v3.1.2 (installed via RPM)
setup across two servers (web01 and web02) with the following vol
config:

volume shared-application-data-client-0
    type protocol/client
    option remote-host web01
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-client-1
    type protocol/client
    option remote-host web02
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume

volume shared-application-data-write-behind
    type performance/write-behind
    subvolumes shared-application-data-replicate-0
end-volume

volume shared-application-data-read-ahead
    type performance/read-ahead
    subvolumes shared-application-data-write-behind
end-volume

volume shared-application-data-io-cache
    type performance/io-cache
    subvolumes shared-application-data-read-ahead
end-volume

volume shared-application-data-quick-read
    type performance/quick-read
    subvolumes shared-application-data-io-cache
end-volume

volume shared-application-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes shared-application-data-quick-read
end-volume

volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-stat-prefetch
end-volume
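
The config above is easier to follow as a tree. Below is a rough sketch of reading such a volfile and listing the translator stack. This is not GlusterFS's own parser; `parse_volfile` and `tree_lines` are hypothetical helpers that only understand the `volume`, `type`, `option`, `subvolumes` and `end-volume` keywords seen here.

```python
def parse_volfile(text):
    """Parse 'volume ... end-volume' blocks into {name: {type, options, subvolumes}}."""
    volumes = {}
    current = None
    last_key = None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        key = words[0]
        if key == "volume" and len(words) > 1:
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue
        elif key == "type":
            current["type"] = words[1]
        elif key == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            current["subvolumes"].extend(words[1:])
        elif last_key == "subvolumes":
            # A "subvolumes" list wrapped onto the next line by the mailer.
            current["subvolumes"].extend(words)
            continue
        last_key = key
    return volumes


def tree_lines(volumes, name, depth=0):
    """Return indented 'name (type)' lines for a volume and everything below it."""
    vol = volumes[name]
    lines = ["  " * depth + f"{name} ({vol['type']})"]
    for child in vol["subvolumes"]:
        lines.extend(tree_lines(volumes, child, depth + 1))
    return lines
```

Fed the config above and started at `shared-application-data`, `tree_lines` walks from the debug/io-stats volume down through the performance translators to cluster/replicate, which fans out to the two protocol/client volumes for web01 and web02.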

In total, four servers mount this via GlusterFS FUSE. For whatever
reason (I'm really not sure why), the GlusterFS filesystem has run into
a bit of a split-brain nightmare (although to my knowledge an actual
split-brain situation has never occurred in this environment), and I
have been getting corruption issues across the filesystem as well as
complaints that the filesystem cannot be self-healed.

What I would like to do is completely empty one of the two servers (here
I am trying to empty server web01), making the other one (in this case
web02) the authoritative source for the data; and then have web01
completely rebuild its mirror directly from web02.

What's the easiest/safest way to do this? Is there a command that I can
run that will force web01 to re-initialize its mirror directly from
web02 (and thus completely eradicate all of the split-brain errors and
data inconsistencies)?
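
This thread does not record an answer to that question. For context only: in the 3.1/3.2 series, replicate healed an entry when a client looked it up, and the usual documented way to force a heal of everything was to stat every entry through a client mount (`find <mount> -noleaf -print0 | xargs --null stat >/dev/null`). Here is a sketch of the same walk in Python, assuming the volume is mounted on the host running it. The function name is made up, and whether it is safe to empty a brick beforehand is outside what this sketch covers.

```python
import os


def stat_everything(mount_point):
    """Walk a mounted volume and lstat every entry; return (visited, errors)."""
    visited = 0
    errors = []

    def record(exc):
        # os.walk reports directories it could not list through this callback.
        errors.append((exc.filename, exc.strerror))

    for dirpath, dirnames, filenames in os.walk(mount_point, onerror=record):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # the lookup is what gives replicate a chance to heal
                visited += 1
            except OSError as exc:
                errors.append((path, exc.strerror))
    return visited, errors
```

Entries that still fail (for example with Input/output error) end up in the `errors` list, which gives a starting point for inspecting those paths on the bricks.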

Thanks!
--

Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | remi at goclio.com
www.goclio.com | blog | twitter | facebook

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

