[Gluster-users] Stale write/read during self-healing

Armen Baghumian abaghumian at noggin.com.au
Mon Sep 14 23:31:40 UTC 2015


Hey guys,

I'm trying to run an experiment with glusterfs and am having some
problems; I'd appreciate it if someone could point out what I'm doing wrong.
I've created a replicated volume across two nodes.


# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: fe8f42b7-2a4b-4cf0-b22e-082bd5c8ed60
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 172.31.13.220:/srv/gluster-bricks/xvdf/gv0
Brick2: 172.31.20.154:/srv/gluster-bricks/xvdf/gv0
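
(For reference, the volume was created along these lines; take this as
a sketch of the commands rather than a paste from my shell history:)

# gluster volume create gv0 replica 2 \
    172.31.13.220:/srv/gluster-bricks/xvdf/gv0 \
    172.31.20.154:/srv/gluster-bricks/xvdf/gv0
# gluster volume start gv0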


This is the scenario I'm testing. Both servers are up and running:


# gluster peer status
Number of Peers: 1

Hostname: 172.31.13.220
Uuid: 3a7fc90b-47b3-4a80-950c-e27afbb64e06
State: Peer in Cluster (Connected)


And on the other server:


Number of Peers: 1

Hostname: 172.31.20.154
Uuid: aceb2dce-db05-4368-98b4-398338d9de1a
State: Peer in Cluster (Connected)

There are two other servers that mount the volume with the native
glusterfs (FUSE) client.
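
(The mount is just the standard FUSE mount, something like this,
assuming /home/armen/testdata is the mount point, as the script paths
below suggest:)

# mount -t glusterfs 172.31.13.220:/gv0 /home/armen/testdata
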
I run a script on both of these clients that creates zero-byte files,
to measure the fail-over latency.

I've attached the script; the cut below is the important part:


--CUT--
while true
do
    epoch=$(date +"%s%N")
    # run touch in the background and record its PID so we can see
    # exactly which file operation is blocked if the mount point hangs
    touch /home/armen/testdata/$sampledata/$epoch &
    pid=$!
    # wait returns touch's exit status; check it to confirm the
    # mount point actually responded
    wait $pid
    if [ $? -eq 0 ]; then
        echo "mountpoint is up (${pid})"
    fi
    echo "Testing $sampledata..."
done
--CUT--
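
($sampledata is a per-run directory name set elsewhere in the attached
script; each run writes into its own directory under the mount point,
which is where the 1442270482 directory in the ps output below comes from.)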

When I reboot one of the servers, I see the following lines in the logs:


[2015-09-14 22:41:38.006195] E [rpc-clnt.c:362:saved_frames_unwind] (-->
/lib64/libglusterfs.so.0(_gf_log_callingfn+0x186)[0x7f78b08194c6] (-->
/lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f78b05ec01e] (-->
/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f78b05ec12e] (-->
/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x82)[0x7f78b05eda92]
(--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x48)[0x7f78b05ee248] )))))
14-gv0-client-1: forced unwinding frame type(GlusterFS 3.3)
op(LOOKUP(27)) called at 2015-09-14 22:41:37.973664 (xid=0x2d2efc)
[2015-09-14 22:41:38.006387] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 14-gv0-client-1: remote
operation failed: Transport endpoint is not connected. Path:
<gfid:f106d7a3-a7b1-4cf8-8402-40e19993ece6>/testdata
(72d1db26-1e6e-460f-a1ba-d428bd09b4d7)
[2015-09-14 22:41:38.006940] I [client.c:2215:client_rpc_notify]
14-gv0-client-1: disconnected from gv0-client-1. Client process will
keep trying to connect to glusterd until brick's port is available
[2015-09-14 22:41:38.009152] W
[client-rpc-fops.c:2766:client3_3_lookup_cbk] 14-gv0-client-1: remote
operation failed: Transport endpoint is not connected. Path: (null)
(00000000-0000-0000-0000-000000000000)


And, of course, the peer now shows as disconnected:


# gluster peer status
Number of Peers: 1

Hostname: 172.31.20.154
Uuid: aceb2dce-db05-4368-98b4-398338d9de1a
State: Peer in Cluster (Disconnected)


The fail-over works smoothly; I don't see any added latency or
interruption in the scripts.
When the rebooted server comes back online, it performs the self-heal,
as expected:


[2015-09-14 22:42:51.483091] E
[client-handshake.c:1496:client_query_portmap_cbk] 14-gv0-client-1:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2015-09-14 22:42:51.483387] I [client.c:2215:client_rpc_notify]
14-gv0-client-1: disconnected from gv0-client-1. Client process will
keep trying to connect to glusterd until brick's port is available
[2015-09-14 22:42:54.387194] I [rpc-clnt.c:1761:rpc_clnt_reconfig]
14-gv0-client-1: changing port to 49152 (from 0)
[2015-09-14 22:42:54.395019] I
[client-handshake.c:1413:select_server_supported_programs]
14-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-14 22:42:54.403842] I
[client-handshake.c:1200:client_setvolume_cbk] 14-gv0-client-1:
Connected to gv0-client-1, attached to remote volume
'/srv/gluster-bricks/xvdf/gv0'.
[2015-09-14 22:42:54.403986] I
[client-handshake.c:1210:client_setvolume_cbk] 14-gv0-client-1: Server
and Client lk-version numbers are not same, reopening the fds
[2015-09-14 22:42:54.404508] I
[client-handshake.c:188:client_set_lk_version_cbk] 14-gv0-client-1:
Server lk version = 1
[2015-09-14 22:42:54.438649] I
[afr-self-heal-entry.c:554:afr_selfheal_entry_do] 14-gv0-replicate-0:
performing entry selfheal on 6dbd77a9-398c-4c90-b7a0-f6c36337fb2a


But as soon as the self-heal starts, the bash script(s) get blocked:


root       6343  0.0  0.0 107920   352 pts/0     S+   22:42   0:00 touch
/home/armen/testdata/1442270482/1442270574421803041
root     30929  0.5  0.0 113248  1620 pts/0    S+   22:41   0:01
/bin/bash ./test.sh
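
(If it helps, this is roughly how the stuck touch can be inspected,
using the PID from the ps output above; /proc/<pid>/stack needs root
and a kernel that exposes it:)

# ps -o pid,stat,wchan:32,cmd -p 6343
# cat /proc/6343/stack

That should show whether it is sleeping in the FUSE request path while
the heal runs.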


Stracing the glusterfs client process shows:


open("/proc/16343/status", O_RDONLY)    = 11
fstat(11, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f78b0cb2000
read(11, "Name:\ttouch\nState:\tS (sleeping)\n"..., 1024) = 1024
close(11)                               = 0
munmap(0x7f78b0cb2000, 4096)            = 0
gettimeofday({1442207605, 814096}, NULL) = 0
writev(14, [{"\200\0\2x", 4},
{"\0\25\250\36\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\33\0\5\363\227\0\0\0
"..., 72},
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\215\20\317\226\344\222M\344\274\376g\202i<x\33"...,
560}], 3) = 636
gettimeofday({1442207605, 814221}, NULL) = 0
readv(8,


And checking FD=8 with lsof reveals:


glusterfs  1563    root    8u      CHR             10,229      0t0    
7141 /dev/fuse
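
(That was just lsof restricted to the glusterfs PID and descriptor 8,
something like:)

# lsof -a -p 1563 -d 8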

So it seems glusterfs is blocked reading from /dev/fuse.
At this point, if I start another instance of the bash script, it works
fine; only the processes that were already in flight when the self-heal
started stay blocked.

Then, when the following line appears in the logs:

[2015-09-14 22:45:29.847199] I
[afr-self-heal-common.c:476:afr_log_selfheal] 14-gv0-replicate-0:
Completed entry selfheal on 6dbd77a9-398c-4c90-b7a0-f6c36337fb2a.
source=0 sinks=1

the blocked process is released.
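
(For what it's worth, heal progress can also be followed from either
server with the following, which lists the entries still pending heal:)

# gluster volume heal gv0 info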


My question is: why are those processes blocked when the self-heal
starts, while processes started after the self-heal is already under
way work fine?


This is the glusterfs version I'm using, on CentOS 7:

# glusterfs -V
glusterfs 3.6.2 built on Feb  4 2015 20:41:30

# glusterd -V
glusterfs 3.6.2 built on Feb  4 2015 20:41:30

# uname -r
3.10.0-123.8.1.el7.x86_64

Cheers,
Armen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.sh
Type: application/x-shellscript
Size: 1256 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150915/b63b4178/attachment.bin>

