[Gluster-users] Possible new bug in 3.1.5 discovered
Burnash, James
jburnash at knight.com
Wed Jun 29 15:25:45 UTC 2011
Thank you.
Would you be able to point me to where this is documented in the release notes? It seems clear that I should read them more closely :)
James Burnash
Unix Engineer
Knight Capital Group
From: Anand Avati [mailto:anand.avati at gmail.com]
Sent: Wednesday, June 29, 2011 11:17 AM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Possible new bug in 3.1.5 discovered
Compatibility was broken between 3.1.4 (and earlier) servers and 3.1.5 clients (it results in a hang when the replicate translator is used). This compatibility break was "necessary" in order to fix a hang issue that was present in all 3.1.x releases until then. New servers should work fine with old clients, so upgrade all your servers before upgrading the clients.
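Roughly, the order looks like this (only a sketch: the package names, the glusterd init script, and the mount point /pfs1 and volume pfs-rw1 are taken from this thread or assumed, so adjust for your environment):

# On each storage server, one at a time, before touching any client:
service glusterd stop
rpm -Uvh glusterfs-core-3.1.5-1.x86_64.rpm glusterfs-fuse-3.1.5-1.x86_64.rpm
service glusterd start

# Only after every server is on 3.1.5, upgrade each client and remount:
umount /pfs1
rpm -Uvh glusterfs-core-3.1.5-1.x86_64.rpm glusterfs-fuse-3.1.5-1.x86_64.rpm
mount -t glusterfs jc1letgfs16-pfs1:/pfs-rw1 /pfs1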
Avati
On Wed, Jun 29, 2011 at 8:23 PM, Burnash, James <jburnash at knight.com> wrote:
I'm sorry - I think I wasn't clear.
The problem is that a 3.1.5 client hangs when it writes a file to a GlusterFS native mount point served by a storage pool running 3.1.3.
Are you saying that the clients are known to not be backward compatible within the 3.1.x series?
James Burnash
Unix Engineer
Knight Capital Group
From: Anand Avati [mailto:anand.avati at gmail.com]
Sent: Wednesday, June 29, 2011 10:46 AM
To: Burnash, James
Cc: gluster-users at gluster.org
Subject: Re: [Gluster-users] Possible new bug in 3.1.5 discovered
James,
Both 3.1.5 and 3.2.1 contain necessary fixes for a locking hang, and as a side effect clients and servers hang when used across those versions. Please upgrade your clients to 3.1.5 as well. This is a known, and hard to fix, compatibility issue.
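To confirm which version each node is actually running, something like the following is enough (server hostnames taken from your volume info; the last line is run on the client itself):

for h in jc1letgfs13-pfs1 jc1letgfs16-pfs1; do
    echo -n "$h: "; ssh $h "glusterfs --version | head -1"
done
glusterfs --version | head -1    # on the client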
Avati
On Wed, Jun 29, 2011 at 8:05 PM, Burnash, James <jburnash at knight.com> wrote:
"May you live in interesting times"
Is this a curse or a blessing? :)
I've just tested a 3.1.5 GlusterFS native client against a 3.1.3 storage pool using this volume:
Volume Name: pfs-rw1
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: jc1letgfs16-pfs1:/export/read-write/g01
Brick2: jc1letgfs13-pfs1:/export/read-write/g01
Brick3: jc1letgfs16-pfs1:/export/read-write/g02
Brick4: jc1letgfs13-pfs1:/export/read-write/g02
Options Reconfigured:
performance.cache-size: 2GB
performance.stat-prefetch: 0
network.ping-timeout: 10
diagnostics.client-log-level: ERROR
Any attempt to write to that volume mounted on a native client running version 3.1.5 results in a hang at the command line, which I can only break out of by killing my ssh session into the client. Upon logging back into the same client, I see the process from the attempted write still hanging in uninterruptible sleep (state D):
21172 ? D 0:00 touch /pfs1/test/junk1
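To see where that process is actually stuck (standard procps; the PID is from the listing above):

ps -o pid,stat,wchan:20,cmd -p 21172
# STAT "D" is uninterruptible sleep (blocked in the kernel), not "Z" (zombie);
# WCHAN shows the kernel function the process is waiting in.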
Anybody else run into this situation?
Client mount log (/var/log/glusterfs/pfs2.log) below:
[2011-06-29 10:28:07.860519] E [afr-self-heal-metadata.c:522:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[2011-06-29 10:28:07.860668] E [afr-self-heal-metadata.c:522:afr_sh_metadata_fix] 0-pfs-ro1-replicate-1: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
[Remaining log lines truncated in the archive: the same "Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes" and "data self-heal failed on /" messages repeat for the other replicate subvolumes.]
James Burnash
Unix Engineer
Knight Capital Group