[Gluster-users] Gluster native client in case of distributed volume node failure

Emir Imamagic eimamagi at srce.hr
Sat Apr 16 19:01:05 UTC 2011


I am trying to find a precise definition of the Gluster native client's 
behavior in the case of a distributed volume node failure. Some info is 
provided in the FAQ:

but it doesn't provide any details.
The only other information I managed to find is this stale document:

The document says that files on the failed node will not be visible to 
the client. However, the behavior of open file handles is not described.

I ran a couple of simple tests with the cp and sha1sum commands to see 
what happens. Server configuration:
  Volume Name: test
  Type: Distribute
  Status: Started
  Number of Bricks: 2
  Transport-type: tcp
  Brick1: gluster1:/data
  Brick2: gluster2:/data
  Options Reconfigured:
  performance.stat-prefetch: off
  performance.write-behind-window-size: 4MB
  performance.io-thread-count: 8
On the client side I use the default mount without any additional options.
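
For completeness, that client mount is just the plain native mount (the 
mount point /gluster is assumed from the commands below; the volume name 
is the one shown above):

```shell
# Default glusterfs native mount, no extra options.
# Any server in the volume can be named here; it only serves the volfile.
mount -t glusterfs gluster1:/test /gluster
```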

*File read*: Both cp and sha1sum seem to read up to the point when the 
node fails and then exit without error: sha1sum reports an incorrect 
hash, and cp copies only part of the file. In the Gluster client logs I 
see errors indicating the node failure, but the commands themselves 
report no error.
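
Because the read path fails silently, the only defensive pattern I see 
on the client side is to verify every copy after the fact. A minimal 
sketch (verify_copy is a hypothetical helper of mine, not a Gluster 
tool; size and SHA-1 comparison are assumed as the verification method):

```shell
# Copy a file and verify it afterwards, since a node failure mid-read
# can yield a silently truncated destination with exit status 0.
verify_copy() {
    src=$1 dst=$2
    cp "$src" "$dst" || return 1
    # Compare sizes first (cheap), then checksums.
    if [ "$(wc -c < "$src")" -ne "$(wc -c < "$dst")" ]; then
        echo "size mismatch: copy truncated" >&2
        return 1
    fi
    if [ "$(sha1sum < "$src")" != "$(sha1sum < "$dst")" ]; then
        echo "checksum mismatch: copy corrupt" >&2
        return 1
    fi
}
```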

*File write*: In the case of a write the situation is slightly better, 
as cp reports that the transport endpoint is not connected and fails:
  # cp testfile /gluster/; echo $?
  cp: writing `testfile': Transport endpoint is not connected
  cp: closing `testfile': Transport endpoint is not connected

Another interesting detail is that in the client log I see that the 
file gets reopened when the storage node comes back online:
  [2011-04-16 14:03:04.909540] I 
[client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on 
/testfile succeeded (remote-fd = 0)
  [2011-04-16 14:03:04.909782] I 
[client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on 
/testfile succeeded (remote-fd = 1)
However, the command has already finished by then. What is the purpose of this reopen?

Is this expected behavior? Could you please provide pointers to 
documentation, if any exists?

Is it possible to tune this behavior to be more NFS-like, i.e. to block 
processes in I/O wait until the node comes back?

Thanks in advance
Emir Imamagic
