[Gluster-users] "mismatching layouts" errors after expanding volume

Jeff Darcy jdarcy at redhat.com
Thu Feb 23 17:21:40 UTC 2012


On 02/23/2012 11:45 AM, Dan Bretherton wrote:
>> The main question is therefore why
>> we're losing connectivity to these servers.
> Could there be a hardware issue?  I have replaced the network cables for 
> the two servers but I don't really know what else to check.  The network 
> switch hasn't recorded any errors for those two ports.  There isn't 
> anything sinister in /var/log/messages.
> 
> It seems a bit of a coincidence that both servers lost connection at 
> exactly the same time.   The only thing the users have started doing 
> differently recently is processing a large number of small text files.  
> There is one particular application they are running that processes this 
> data, but the load on the Glusterfs servers doesn't go up when it is 
> running.

It does seem like a weird coincidence.  About the only thing I can think of is
that there's some combination of events that occurs on those two servers but
not the others.  For example, what if there's some file that happens to live on
that replica pair, and which is accessed in some particularly pathological way?
I used to see something like that with some astrophysics code that would try
to open and truncate the same file from each of a thousand nodes simultaneously
each time it started.  Needless to say, this caused a few problems.  ;)  Maybe
there's something about this new job type that similarly "converges" on one
file for configuration, logging, something like that?
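
To make the pattern concrete, here is a rough sketch of that kind of
"converging" access in Python.  It is only an illustration: the name
converge_on_file, the file name, and the worker count are all made up, and it
uses threads on a single machine rather than a thousand separate nodes.
Pointed at a scratch file on a test volume, something like this might help
show whether simultaneous open-and-truncate calls on one file are enough to
upset a replica pair.

```python
import os
import tempfile
import threading


def converge_on_file(path, workers=50):
    """Have `workers` threads open and truncate `path` at the same moment.

    Returns the number of workers that completed their open/write/close.
    """
    barrier = threading.Barrier(workers)
    finished = []
    lock = threading.Lock()

    def worker(i):
        # Hold every thread here so the opens land as close together as possible.
        barrier.wait()
        # O_TRUNC is what makes this an open-and-truncate, like the job described.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, f"worker {i}\n".encode())
        finally:
            os.close(fd)
        with lock:
            finished.append(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(finished)


if __name__ == "__main__":
    # Hypothetical scratch location; substitute a path on a test mount.
    with tempfile.TemporaryDirectory() as scratch:
        print(converge_on_file(os.path.join(scratch, "shared.conf")))
```

On a local filesystem this should be harmless.  The interesting case is
running it from several clients at once against the same file on a test
mount, never against production data.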
