[Gluster-users] File IO issues during brick unreachable in replica config

Sun Jun 3 15:05:09 UTC 2012

I have a volume in a 4-way replica configuration running 3.3.0. Two 
bricks are in one datacenter and two are in the other. We had some sort of 
connectivity issue between the two facilities this morning, and 
applications using gluster mounts (via NFS; in this case a read-only 
workload) experienced IO timeouts.

I have a 5s network timeout (network.ping-timeout) on the volume and a 
20s timeout on the application. Even if a read went through 3 bricks 
before it found a good one, I'd expect it to take about 10s, well under 
the application timeout.
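
To spell out the arithmetic behind that expectation, here is a rough sketch. It assumes unreachable bricks are tried one after another and that each failed attempt costs one full network.ping-timeout. That serial model is only my assumption about how the read is retried, not documented behavior, and worst_case_delay is a made-up helper, not part of gluster.

```shell
#!/bin/sh
# Rough model only: assumes failed bricks are tried serially and that each
# failed attempt costs one full network.ping-timeout (an assumption).

PING_TIMEOUT=5   # network.ping-timeout on the volume, in seconds
APP_TIMEOUT=20   # application-side timeout, in seconds

# worst_case_delay FAILED_BRICKS TIMEOUT
# Prints the seconds spent waiting on unreachable bricks.
# Hypothetical helper for illustration; not a gluster command.
worst_case_delay() {
    echo $(( $1 * $2 ))
}

for failed in 1 2 3; do
    delay=$(worst_case_delay "$failed" "$PING_TIMEOUT")
    if [ "$delay" -lt "$APP_TIMEOUT" ]; then
        verdict="under"
    else
        verdict="NOT under"
    fi
    echo "$failed unreachable brick(s): ${delay}s, $verdict the ${APP_TIMEOUT}s application timeout"
done
```

Under this model, two failed attempts cost 10s and even three cost 15s, so the IO timeouts the applications saw do not fit it.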

What is the expected behavior for a read that occurs while a brick is in 
the process of failing? Should the IO fail, or should it be re-routed to 
an available brick? I don't see anything specific in nfs.log indicating 
that a particular read failed, only that the bricks went up and down.

Info is below. Let me know if there are other logs I need to look at.

[root@dresproddns02 glusterfs]# gluster volume info svn

Volume Name: svn
Type: Replicate
Volume ID: fabe320d-5ef2-4f35-9720-eab617e13dde
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: rhesproddns01:/gluster/svn
Brick2: rhesproddns02:/gluster/svn
Brick3: dresproddns01:/gluster/svn
Brick4: dresproddns02:/gluster/svn
Options Reconfigured:
performance.write-behind-window-size: 128Mb
performance.cache-size: 256Mb
auth.allow: 10.250.53.*,10.252.248.*,169.254.*,127.0.0.1
nfs.register-with-portmap: on
nfs.disable: off
performance.stat-prefetch: 1
network.ping-timeout: 5
performance.flush-behind: on
performance.client-io-threads: 1
nfs.rpc-auth-allow: 127.0.0.1

nfs.log output is here:
http://pastebin.com/CNmP4s32