[Gluster-users] libgfapi failover problem on replica bricks

Roman romeo.r at gmail.com
Thu Aug 7 12:32:40 UTC 2014


I'm really sorry to bother you, but it seems all my previous tests were a
waste of time because of those files generated from /dev/zero :). That's good
and bad news. Now I use real files for my tests. As this is almost my last
workday before vacation, the only things I'd rather do are test and document
:) .. so here are some new results:

So this time I've got two gluster volumes:

1. with cluster.self-heal-daemon off
2. with cluster.self-heal-daemon on
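
For reference, this is roughly how I toggled the daemon per volume (the
volume names below are just placeholders, not my real ones):

gluster volume set HA-TEST-SHD-OFF cluster.self-heal-daemon off
gluster volume set HA-TEST-SHD-ON cluster.self-heal-daemon on
gluster volume info HA-TEST-SHD-OFF | grep self-heal-daemon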

1. Real results with SHD off:
Everything seems to work as expected. The VM survives outages of both
glusterfs servers, and I'm able to see the sync happening via network
traffic. FINE!

Sometimes healing kicks in a bit late (it takes anywhere from 1 minute to 1
hour to sync). I don't know why. Ideas?
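
In case it helps to narrow this down, this is roughly how I was watching the
sync while waiting (same placeholder volume name as above, and the brick path
from my earlier mails as an example; I'm not even sure "heal info" answers
while the daemon is disabled, so I mostly just watched the brick sizes):

gluster volume heal HA-TEST-SHD-OFF info
watch -n 10 du -sh /exports/pve1/1T/images/125/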

2. Test results with SHD on:
The VM is not able to survive the second server restart (as described
previously). It gives I/O errors, although the files are synced. Are there
perhaps some locks that do not allow the KVM hypervisor to reconnect to the
storage in time?
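
If it really is a lock problem, maybe a statedump taken right after the
second restart would show it. A rough idea only (the dump directory and the
exact section names are from memory and may differ on Debian builds):

gluster volume statedump HA-TEST-SHD-ON
grep -c inodelk /var/run/gluster/*.dump.*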


So the problem actually is sparse files inside a VM :). If one uses them
(e.g. generated from /dev/zero), the VM will crash and never come up again
due to errors in the qcow2 file headers. Another bug?
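
Next time I'll generate the test load with something like this instead, so
the file inside the VM is not sparse, and then compare the allocated vs.
apparent size of the image on both bricks (just my plan, not verified yet):

dd if=/dev/urandom of=/root/testfile bs=1M count=200 oflag=direct
ls -lsh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
du -sh --apparent-size /exports/pve1/1T/images/125/vm-125-disk-1.qcow2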







2014-08-07 9:53 GMT+03:00 Roman <romeo.r at gmail.com>:

> Ok, then I hope that we will be able to test it in two weeks. Thanks
> for your time and patience.
>
>
> 2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
>>
>> On 08/07/2014 12:17 PM, Roman wrote:
>>
>> Well, one thing is definitely true: if there is no healing daemon
>> running, I'm not able to start the VM after an outage. It seems the qcow2
>> file is corrupted (KVM is unable to read its header).
>>
>> We shall see this again once I have the document with all the steps that
>> need to be carried out :-)
>>
>> Pranith
>>
>>
>>
>> 2014-08-07 9:35 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>>> > This should not happen if you do the writes, let's say from
>>> '/dev/urandom' instead of '/dev/zero'
>>>
>>>  Somewhere deep inside me I thought so! Zero is zero :)
>>>
>>>  >I will provide you with a document for testing this issue properly. I
>>> have a lot going on in my day job, so I'm not getting enough time to write
>>> that out. Considering the weekend is approaching, I will definitely get a
>>> bit of time over the weekend, so I will send you the document then.
>>>
>>>  Thank you a lot. I'll wait. My vacation starts tomorrow and I'll be
>>> out for two weeks, so there's no need to hurry very much.
>>>
>>>
>>>
>>>
>>>  2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>
>>> :
>>>
>>>>
>>>> On 08/07/2014 11:48 AM, Roman wrote:
>>>>
>>>> How can they be in sync if they differ in size? And why, then, is the
>>>> VM not able to survive a gluster outage? I really want to use glusterfs in
>>>> our production for infrastructure virtualization due to its simple setup,
>>>> but I'm not able to at this moment. Maybe you've got some testing agenda?
>>>> Or could you list the steps for doing the tests properly, so our VMs would
>>>> survive the outages?
>>>>
>>>> This is because of sparse files:
>>>> http://en.wikipedia.org/wiki/Sparse_file
>>>> This should not happen if you do the writes, let's say from
>>>> '/dev/urandom' instead of '/dev/zero'.
>>>>
>>>> I will provide you with a document for testing this issue properly. I
>>>> have a lot going on in my day job, so I'm not getting enough time to write
>>>> that out. Considering the weekend is approaching, I will definitely get a
>>>> bit of time over the weekend, so I will send you the document then.
>>>>
>>>> Pranith
>>>>
>>>>
>>>>  We would like to be sure that, when one of the storage servers is
>>>> down, the VMs keep running - that is OK, we see this.
>>>> We would like to be sure that the data is synced after the server is back
>>>> up - we can't see that atm.
>>>> We would like to be sure that the VMs fail over to the second
>>>> storage during the outage - we can't see this atm.
>>>> :(
>>>>
>>>>
>>>> 2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>
>>>> :
>>>>
>>>>>
>>>>> On 08/07/2014 11:33 AM, Roman wrote:
>>>>>
>>>>> The file size increases because of me :) I generate files on the VM from
>>>>> /dev/zero during the outage of one server. Then I bring up the downed
>>>>> server and it seems the files never sync. I'll keep on testing today. I
>>>>> can't read much from the logs either :(. This morning both VMs (one on the
>>>>> volume with self-healing and the other on the volume without it) survived
>>>>> the second server outage (the first server was down yesterday); while the
>>>>> file sizes are different, the VMs ran without problems. But I had restarted
>>>>> them before bringing the second gluster server down.
>>>>>
>>>>> Then there is no bug :-). It seems the files are already in sync
>>>>> according to the extended attributes you have pasted. How do you test
>>>>> whether the files are in sync or not?
>>>>>
>>>>> Pranith
>>>>>
>>>>>
>>>>>  So I'm a bit lost at this moment. I'll try to keep my tests
>>>>> ordered and write here what happens.
>>>>>
>>>>>
>>>>> 2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com
>>>>> >:
>>>>>
>>>>>>
>>>>>> On 08/07/2014 10:46 AM, Roman wrote:
>>>>>>
>>>>>> yes, they do.
>>>>>>
>>>>>>  getfattr: Removing leading '/' from absolute path names
>>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>>
>>>>>>  root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> root at stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> c117d73c9f8a2e09ef13da31f7225fa6
>>>>>>  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>
>>>>>>
>>>>>>
>>>>>>  root at stor2:~# getfattr -d -m. -e hex
>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> getfattr: Removing leading '/' from absolute path names
>>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>>
>>>>>>  root at stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> c117d73c9f8a2e09ef13da31f7225fa6
>>>>>>  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> root at stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>> 2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>
>>>>>>  I think the files are differing in size because of the sparse-file
>>>>>> healing issue. Could you raise a bug with steps to re-create this issue
>>>>>> where the size of the file increases after healing?
>>>>>>
>>>>>> Pranith
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm at redhat.com>:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> | From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>> | To: "Roman" <romeo.r at gmail.com>
>>>>>>> | Cc: gluster-users at gluster.org, "Niels de Vos" <ndevos at redhat.com>,
>>>>>>> "Humble Chirammal" <hchiramm at redhat.com>
>>>>>>> | Sent: Wednesday, August 6, 2014 12:09:57 PM
>>>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on replica
>>>>>>> bricks
>>>>>>> |
>>>>>>> | Roman,
>>>>>>> |      The file went into split-brain. I think we should do these tests
>>>>>>> | with 3.5.2, where monitoring the heals is easier. Let me also come up
>>>>>>> | with a document about how to do the testing you are trying to do.
>>>>>>> |
>>>>>>> | Humble/Niels,
>>>>>>> |      Do we have debs available for 3.5.2? In 3.5.1 there was a
>>>>>>> | packaging issue where /usr/bin/glfsheal was not packaged along with
>>>>>>> | the deb. I think that should be fixed now as well?
>>>>>>> |
>>>>>>>  Pranith,
>>>>>>>
>>>>>>> The 3.5.2 packages for Debian are not available yet. We are
>>>>>>> coordinating internally to get them processed.
>>>>>>> I will update the list once they are available.
>>>>>>>
>>>>>>> --Humble
>>>>>>> |
>>>>>>> | On 08/06/2014 11:52 AM, Roman wrote:
>>>>>>> | > good morning,
>>>>>>> | >
>>>>>>> | > root at stor1:~# getfattr -d -m. -e hex
>>>>>>> | > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
>>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>>>> | >
>>>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
>>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>>>> | >
>>>>>>> | >
>>>>>>> | >
>>>>>>> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <
>>>>>>> pkarampu at redhat.com
>>>>>>>  | > <mailto:pkarampu at redhat.com>>:
>>>>>>> | >
>>>>>>> | >
>>>>>>> | >     On 08/06/2014 11:30 AM, Roman wrote:
>>>>>>> | >>     Also, this time files are not the same!
>>>>>>> | >>
>>>>>>> | >>     root at stor1:~# md5sum
>>>>>>> | >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >>     32411360c53116b96a059f17306caeda
>>>>>>> | >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >>
>>>>>>> | >>     root at stor2:~# md5sum
>>>>>>> | >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >>     65b8a6031bcb6f5fb3a11cb1e8b1c9c9
>>>>>>> | >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >     What is the getfattr output?
>>>>>>> | >
>>>>>>> | >     Pranith
>>>>>>> | >
>>>>>>> | >>
>>>>>>> | >>
>>>>>>> | >>     2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com
>>>>>>>  | >>     <mailto:romeo.r at gmail.com>>:
>>>>>>> | >>
>>>>>>> | >>         Nope, it is not working. But this time it went a bit
>>>>>>> differently
>>>>>>> | >>
>>>>>>> | >>         root at gluster-client:~# dmesg
>>>>>>> | >>         Segmentation fault
>>>>>>> | >>
>>>>>>> | >>
>>>>>>> | >>         I was not able even to start the VM after I done the
>>>>>>> tests
>>>>>>> | >>
>>>>>>> | >>         Could not read qcow2 header: Operation not permitted
>>>>>>> | >>
>>>>>>> | >>         And it seems, it never starts to sync files after first
>>>>>>> | >>         disconnect. VM survives first disconnect, but not
>>>>>>> second (I
>>>>>>> | >>         waited around 30 minutes). Also, I've
>>>>>>> | >>         got network.ping-timeout: 2 in volume settings, but logs
>>>>>>> | >>         react on first disconnect around 30 seconds. Second was
>>>>>>> | >>         faster, 2 seconds.
>>>>>>> | >>
>>>>>>> | >>         Reaction was different also:
>>>>>>> | >>
>>>>>>> | >>         slower one:
>>>>>>> | >>         [2014-08-05 13:26:19.558435] W
>>>>>>> [socket.c:514:__socket_rwv]
>>>>>>> | >>         0-glusterfs: readv failed (Connection timed out)
>>>>>>> | >>         [2014-08-05 13:26:19.558485] W
>>>>>>> | >>         [socket.c:1962:__socket_proto_state_machine]
>>>>>>> 0-glusterfs:
>>>>>>> | >>         reading from socket failed. Error (Connection timed
>>>>>>> out),
>>>>>>>  | >>         peer (10.250.0.1:24007 <http://10.250.0.1:24007>)
>>>>>>> | >>         [2014-08-05 13:26:21.281426] W
>>>>>>> [socket.c:514:__socket_rwv]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-0: readv failed (Connection
>>>>>>> timed out)
>>>>>>> | >>         [2014-08-05 13:26:21.281474] W
>>>>>>> | >>         [socket.c:1962:__socket_proto_state_machine]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-0: reading from socket
>>>>>>> failed.
>>>>>>> | >>         Error (Connection timed out), peer (10.250.0.1:49153
>>>>>>>  | >>         <http://10.250.0.1:49153>)
>>>>>>> | >>         [2014-08-05 13:26:21.281507] I
>>>>>>> | >>         [client.c:2098:client_rpc_notify]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-0: disconnected
>>>>>>> | >>
>>>>>>> | >>         the fast one:
>>>>>>> | >>         2014-08-05 12:52:44.607389] C
>>>>>>> | >>         [client-handshake.c:127:rpc_client_ping_timer_expired]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
>>>>>>>  | >>         <http://10.250.0.2:49153> has not responded in the
>>>>>>> last 2
>>>>>>>  | >>         seconds, disconnecting.
>>>>>>> | >>         [2014-08-05 12:52:44.607491] W
>>>>>>> [socket.c:514:__socket_rwv]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: readv failed (No data
>>>>>>> available)
>>>>>>> | >>         [2014-08-05 12:52:44.607585] E
>>>>>>> | >>         [rpc-clnt.c:368:saved_frames_unwind]
>>>>>>> | >>
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>>>>>> | >>         [0x7fcb1b4b0558]
>>>>>>> | >>
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>>>>>> | >>         [0x7fcb1b4aea63]
>>>>>>> | >>
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>>>>>> | >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
>>>>>>> | >>         unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
>>>>>>> called at
>>>>>>> | >>         2014-08-05 12:52:42.463881 (xid=0x381883x)
>>>>>>> | >>         [2014-08-05 12:52:44.607604] W
>>>>>>> | >>         [client-rpc-fops.c:2624:client3_3_lookup_cbk]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: remote operation failed:
>>>>>>> | >>         Transport endpoint is not connected. Path: /
>>>>>>> | >>         (00000000-0000-0000-0000-000000000001)
>>>>>>> | >>         [2014-08-05 12:52:44.607736] E
>>>>>>> | >>         [rpc-clnt.c:368:saved_frames_unwind]
>>>>>>> | >>
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>>>>>> | >>         [0x7fcb1b4b0558]
>>>>>>> | >>
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>>>>>> | >>         [0x7fcb1b4aea63]
>>>>>>> | >>
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>>>>>> | >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
>>>>>>> | >>         unwinding frame type(GlusterFS Handshake) op(PING(3))
>>>>>>> called
>>>>>>> | >>         at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>>>>>> | >>         [2014-08-05 12:52:44.607753] W
>>>>>>> | >>         [client-handshake.c:276:client_ping_cbk]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: timer must have expired
>>>>>>> | >>         [2014-08-05 12:52:44.607776] I
>>>>>>> | >>         [client.c:2098:client_rpc_notify]
>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: disconnected
>>>>>>> | >>
>>>>>>> | >>
>>>>>>> | >>
>>>>>>> | >>         I've got SSD disks (just for an info).
>>>>>>> | >>         Should I go and give a try for 3.5.2?
>>>>>>> | >>
>>>>>>> | >>
>>>>>>> | >>
>>>>>>> | >>         2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri
>>>>>>>  | >>         <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>>>>>> | >>
>>>>>>> | >>             reply along with gluster-users please :-). May be
>>>>>>> you are
>>>>>>> | >>             hitting 'reply' instead of 'reply all'?
>>>>>>> | >>
>>>>>>> | >>             Pranith
>>>>>>> | >>
>>>>>>> | >>             On 08/05/2014 03:35 PM, Roman wrote:
>>>>>>> | >>>             To make sure and clean, I've created another VM
>>>>>>> with raw
>>>>>>> | >>>             format and goint to repeat those steps. So now
>>>>>>> I've got
>>>>>>> | >>>             two VM-s one with qcow2 format and other with raw
>>>>>>> | >>>             format. I will send another e-mail shortly.
>>>>>>> | >>>
>>>>>>> | >>>
>>>>>>> | >>>             2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri
>>>>>>>  | >>>             <pkarampu at redhat.com <mailto:pkarampu at redhat.com
>>>>>>> >>:
>>>>>>>  | >>>
>>>>>>> | >>>
>>>>>>> | >>>                 On 08/05/2014 03:07 PM, Roman wrote:
>>>>>>> | >>>>                 really, seems like the same file
>>>>>>> | >>>>
>>>>>>> | >>>>                 stor1:
>>>>>>> | >>>>                 a951641c5230472929836f9fcede6b04
>>>>>>> | >>>>
>>>>>>>  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >>>>
>>>>>>> | >>>>                 stor2:
>>>>>>> | >>>>                 a951641c5230472929836f9fcede6b04
>>>>>>> | >>>>
>>>>>>>  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >>>>
>>>>>>> | >>>>
>>>>>>> | >>>>                 one thing I've seen from the logs: somehow
>>>>>>> Proxmox
>>>>>>> | >>>>                 VE is connecting to the servers with the wrong
>>>>>>> version?
>>>>>>> | >>>>                 [2014-08-05 09:23:45.218550] I
>>>>>>> | >>>>
>>>>>>> [client-handshake.c:1659:select_server_supported_programs]
>>>>>>> | >>>>                 0-HA-fast-150G-PVE1-client-0: Using Program
>>>>>>> | >>>>                 GlusterFS 3.3, Num (1298437), Version (330)
>>>>>>> | >>>                 It is the rpc (over-the-network data
>>>>>>> structures)
>>>>>>> | >>>                 version, which has not changed at all since 3.3,
>>>>>>> so
>>>>>>> | >>>                 that's not a problem. So what is the
>>>>>>> conclusion? Is
>>>>>>> | >>>                 your test case working now or not?
>>>>>>> | >>>
>>>>>>> | >>>                 Pranith
>>>>>>> | >>>
>>>>>>> | >>>>                 but if I issue:
>>>>>>> | >>>>                 root at pve1:~# glusterfs -V
>>>>>>> | >>>>                 glusterfs 3.4.4 built on Jun 28 2014 03:44:57
>>>>>>> | >>>>                 seems ok.
>>>>>>> | >>>>
>>>>>>> | >>>>                 server  use 3.4.4 meanwhile
>>>>>>> | >>>>                 [2014-08-05 09:23:45.117875] I
>>>>>>> | >>>>                 [server-handshake.c:567:server_setvolume]
>>>>>>> | >>>>                 0-HA-fast-150G-PVE1-server: accepted client
>>>>>>> from
>>>>>>> | >>>>
>>>>>>> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
>>>>>>> | >>>>                 (version: 3.4.4)
>>>>>>> | >>>>                 [2014-08-05 09:23:49.103035] I
>>>>>>> | >>>>                 [server-handshake.c:567:server_setvolume]
>>>>>>> | >>>>                 0-HA-fast-150G-PVE1-server: accepted client
>>>>>>> from
>>>>>>> | >>>>
>>>>>>> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
>>>>>>> | >>>>                 (version: 3.4.4)
>>>>>>> | >>>>
>>>>>>> | >>>>                 if this could be the reason, of course.
>>>>>>> | >>>>                 I did restart the Proxmox VE yesterday (just
>>>>>>> for an
>>>>>>> | >>>>                 information)
>>>>>>> | >>>>
>>>>>>> | >>>>
>>>>>>> | >>>>
>>>>>>> | >>>>
>>>>>>> | >>>>
>>>>>>> | >>>>                 2014-08-05 12:30 GMT+03:00 Pranith Kumar
>>>>>>> Karampuri
>>>>>>>  | >>>>                 <pkarampu at redhat.com <mailto:
>>>>>>> pkarampu at redhat.com>>:
>>>>>>>  | >>>>
>>>>>>> | >>>>
>>>>>>> | >>>>                     On 08/05/2014 02:33 PM, Roman wrote:
>>>>>>> | >>>>>                     Waited long enough for now, still
>>>>>>> different
>>>>>>> | >>>>>                     sizes and no logs about healing :(
>>>>>>> | >>>>>
>>>>>>> | >>>>>                     stor1
>>>>>>> | >>>>>                     # file:
>>>>>>> | >>>>>
>>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >>>>>
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>> | >>>>>
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>> | >>>>>
>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>> | >>>>>
>>>>>>> | >>>>>                     root at stor1:~# du -sh
>>>>>>> | >>>>>                     /exports/fast-test/150G/images/127/
>>>>>>> | >>>>>                     1.2G  /exports/fast-test/150G/images/127/
>>>>>>> | >>>>>
>>>>>>> | >>>>>
>>>>>>> | >>>>>                     stor2
>>>>>>> | >>>>>                     # file:
>>>>>>> | >>>>>
>>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> | >>>>>
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>> | >>>>>
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>> | >>>>>
>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>> | >>>>>
>>>>>>> | >>>>>
>>>>>>> | >>>>>                     root at stor2:~# du -sh
>>>>>>> | >>>>>                     /exports/fast-test/150G/images/127/
>>>>>>> | >>>>>                     1.4G  /exports/fast-test/150G/images/127/
>>>>>>> | >>>>                     According to the changelogs, the file
>>>>>>> doesn't
>>>>>>> | >>>>                     need any healing. Could you stop the
>>>>>>> operations
>>>>>>> | >>>>                     on the VMs and take md5sum on both these
>>>>>>> machines?
>>>>>>> | >>>>
>>>>>>> | >>>>                     Pranith
>>>>>>> | >>>>
>>>>>>> | >>>>>
>>>>>>> | >>>>>
>>>>>>> | >>>>>
>>>>>>> | >>>>>
>>>>>>> | >>>>>                     2014-08-05 11:49 GMT+03:00 Pranith Kumar
>>>>>>> | >>>>>                     Karampuri <pkarampu at redhat.com
>>>>>>>  | >>>>>                     <mailto:pkarampu at redhat.com>>:
>>>>>>>  | >>>>>
>>>>>>> | >>>>>
>>>>>>> | >>>>>                         On 08/05/2014 02:06 PM, Roman wrote:
>>>>>>> | >>>>>>                         Well, it seems like it doesn't see
>>>>>>> the
>>>>>>> | >>>>>>                         changes were made to the volume ? I
>>>>>>> | >>>>>>                         created two files 200 and 100 MB
>>>>>>> (from
>>>>>>> | >>>>>>                         /dev/zero) after I disconnected the
>>>>>>> first
>>>>>>> | >>>>>>                         brick. Then connected it back and
>>>>>>> got
>>>>>>> | >>>>>>                         these logs:
>>>>>>> | >>>>>>
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830150] I
>>>>>>> | >>>>>>
>>>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>> | >>>>>>                         0-glusterfs: No change in volfile,
>>>>>>> continuing
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830207] I
>>>>>>> | >>>>>>                         [rpc-clnt.c:1676:rpc_clnt_reconfig]
>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>> changing
>>>>>>> | >>>>>>                         port to 49153 (from 0)
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830239] W
>>>>>>> | >>>>>>                         [socket.c:514:__socket_rwv]
>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0: readv
>>>>>>> | >>>>>>                         failed (No data available)
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831024] I
>>>>>>> | >>>>>>
>>>>>>> [client-handshake.c:1659:select_server_supported_programs]
>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0: Using
>>>>>>> | >>>>>>                         Program GlusterFS 3.3, Num
>>>>>>> (1298437),
>>>>>>> | >>>>>>                         Version (330)
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831375] I
>>>>>>> | >>>>>>
>>>>>>> [client-handshake.c:1456:client_setvolume_cbk]
>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>> Connected
>>>>>>> | >>>>>>                         to 10.250.0.1:49153
>>>>>>>  | >>>>>>                         <http://10.250.0.1:49153>,
>>>>>>> attached to
>>>>>>>  | >>>>>>                         remote volume
>>>>>>> '/exports/fast-test/150G'.
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831394] I
>>>>>>> | >>>>>>
>>>>>>> [client-handshake.c:1468:client_setvolume_cbk]
>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>> Server and
>>>>>>> | >>>>>>                         Client lk-version numbers are not
>>>>>>> same,
>>>>>>> | >>>>>>                         reopening the fds
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831566] I
>>>>>>> | >>>>>>
>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>> Server lk
>>>>>>> | >>>>>>                         version = 1
>>>>>>> | >>>>>>
>>>>>>> | >>>>>>
>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830150] I
>>>>>>> | >>>>>>
>>>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>> | >>>>>>                         0-glusterfs: No change in volfile,
>>>>>>> continuing
>>>>>>> | >>>>>>                         this line seems weird to me tbh.
>>>>>>> | >>>>>>                         I do not see any traffic on switch
>>>>>>> | >>>>>>                         interfaces between gluster servers,
>>>>>>> which
>>>>>>>
>>>>>> ...
>
> [Message not shown in full]




-- 
Best regards,
Roman.

