[Gluster-users] libgfapi failover problem on replica bricks

Roman romeo.r at gmail.com
Fri Aug 8 06:05:46 UTC 2014


Just to be sure: why do you guys create an updated version of the glusterfs
package for wheezy if it can't actually be installed on wheezy? :)


2014-08-08 9:03 GMT+03:00 Roman <romeo.r at gmail.com>:

> Oh, unfortunately I won't be able to install either 3.5.2 or 3.4.5 :( They
> both require a libc6 update, and I wouldn't want to risk that.
>
>  glusterfs-common : Depends: libc6 (>= 2.14) but 2.13-38+deb7u3 is to be
> installed
>                     Depends: liblvm2app2.2 (>= 2.02.106) but 2.02.95-8 is
> to be installed
>                     Depends: librdmacm1 (>= 1.0.16) but 1.0.15-1+deb7u1 is
> to be installed
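>
> (For reference: a dry run shows what apt would select without changing
> anything; the package names below are just the ones from the error above.)
>
>   apt-cache policy glusterfs-common libc6 liblvm2app2.2 librdmacm1
>   apt-get install -s glusterfs-common   # simulate only, nothing is installed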
>
>
>
> 2014-08-07 15:32 GMT+03:00 Roman <romeo.r at gmail.com>:
>
>> I'm really sorry to bother you, but it seems all my previous tests were a
>> waste of time, since they used files generated from /dev/zero :). That's both
>> good and bad news. Now I use real files for my tests. As this is almost my
>> last workday, the only things I'd rather do are test and document :) .. so
>> here are some new results:
>>
>> So this time I've got two gluster volumes (the option toggle is sketched below):
>>
>> 1. with cluster.self-heal-daemon off
>> 2. with cluster.self-heal-daemon on
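>>
>> (A minimal sketch of how the option is toggled; HA-TEST-1 and HA-TEST-2 are
>> placeholders for the real volume names:)
>>
>>   gluster volume set HA-TEST-1 cluster.self-heal-daemon off
>>   gluster volume set HA-TEST-2 cluster.self-heal-daemon on
>>   gluster volume info   # the option shows up under "Options Reconfigured"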
>>
>> 1. real results with SHD off:
>> It seems everything is working as expected. The VM survives outages of both
>> glusterfs servers, and I'm able to see the sync via network traffic. FINE!
>>
>> Sometimes healing starts a bit late (it takes from 1 minute to 1 hour to
>> sync). I don't know why. Ideas?
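>>
>> (One way to watch pending heals that does not depend on the self-heal daemon
>> is to read the AFR changelog xattrs directly on both bricks; the path is just
>> an example taken from further down this thread:)
>>
>>   getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>   # a non-zero trusted.afr.<volume>-client-N value means writes are still
>>   # pending against brick N; all zeros on both bricks means nothing to heal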
>>
>> 2. test results on the volume with SHD on:
>> The VM is not able to survive the second server's restart (as previously
>> described): it gives IO errors, although the files are synced. Could there be
>> locks that do not allow the KVM hypervisor to reconnect to the storage in time?
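>>
>> (If stale locks are suspected, a brick statedump lists granted/blocked locks
>> per inode; a sketch, with HA-TEST-2 again as a placeholder volume name:)
>>
>>   gluster volume statedump HA-TEST-2
>>   # dump files land on the brick nodes, typically under /var/run/gluster
>>   # (tunable via the server.statedump-path volume option)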
>>
>>
>> So the problem is actually sparse files inside a VM :). If one uses them
>> (e.g. generated from /dev/zero), the VM will crash and never come up, due to
>> errors in the qcow2 file headers. Another bug?
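>>
>> (When the VM refuses to start like this, the image header can be checked
>> directly on a brick; qemu-img comes with the qemu-utils package:)
>>
>>   qemu-img check /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>   qemu-img info  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2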
>>
>>
>>
>>
>>
>>
>>
>> 2014-08-07 9:53 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>>> OK, then I hope we will be able to test it in two weeks. Thanks for your
>>> time and patience.
>>>
>>>
>>> 2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>>>
>>>> On 08/07/2014 12:17 PM, Roman wrote:
>>>>
>>>> Well, one thing is definitely true: if there is no healing daemon
>>>> running, I'm not able to start the VM after an outage. It seems the qcow2
>>>> file is corrupted (KVM is unable to read its header).
>>>>
>>>> We shall see this again once I have the document with all the steps
>>>> that need to be carried out :-)
>>>>
>>>> Pranith
>>>>
>>>>
>>>>
>>>> 2014-08-07 9:35 GMT+03:00 Roman <romeo.r at gmail.com>:
>>>>
>>>>> > This should not happen if you do the writes, let's say, from
>>>>> '/dev/urandom' instead of '/dev/zero'
>>>>>
>>>>>  Somewhere deep inside me I thought so! Zero is zero :)
>>>>>
>>>>>  >I will provide you with a document for testing this issue properly.
>>>>> I have a lot going on in my day job, so I'm not getting enough time to
>>>>> write it out. Since the weekend is approaching, I will definitely get a
>>>>> bit of time, so I will send you the document over the weekend.
>>>>>
>>>>>  Thank you very much. I'll wait. My vacation starts tomorrow and I'll be
>>>>> out for two weeks, so there's no need to hurry.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <
>>>>> pkarampu at redhat.com>:
>>>>>
>>>>>>
>>>>>> On 08/07/2014 11:48 AM, Roman wrote:
>>>>>>
>>>>>> How can they be in sync if they differ in size? And why, then, is the VM
>>>>>> not able to survive a gluster outage? I really want to use glusterfs in
>>>>>> our production for infrastructure virtualization due to its simple setup,
>>>>>> but I'm not able to at this moment. Maybe you've got some testing agenda?
>>>>>> Or could you list the steps for the right tests, so our VMs would survive
>>>>>> the outages?
>>>>>>
>>>>>> This is because of sparse files.
>>>>>> http://en.wikipedia.org/wiki/Sparse_file
>>>>>> This should not happen if you do the writes, let's say, from
>>>>>> '/dev/urandom' instead of '/dev/zero'.
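>>>>>>
>>>>>> (A minimal illustration with hypothetical file names: a file written from
>>>>>> /dev/zero is full of zero blocks, which copying/healing tools may store as
>>>>>> holes, so the allocated size of the two copies can differ even when the
>>>>>> content is identical; /dev/urandom data can never be stored as holes:)
>>>>>>
>>>>>>   dd if=/dev/zero    of=zeros.img  bs=1M count=100
>>>>>>   dd if=/dev/urandom of=random.img bs=1M count=100
>>>>>>   du -sh --apparent-size zeros.img random.img   # logical size
>>>>>>   du -sh                 zeros.img random.img   # blocks actually allocated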
>>>>>>
>>>>>> I will provide you with a document for testing this issue properly. I
>>>>>> have a lot going on in my day job, so I'm not getting enough time to
>>>>>> write it out. Since the weekend is approaching, I will definitely get a
>>>>>> bit of time, so I will send you the document over the weekend.
>>>>>>
>>>>>> Pranith
>>>>>>
>>>>>>
>>>>>>  We would like to be sure that, when one of the storage servers is down,
>>>>>> the VMs keep running - this is OK, we see it.
>>>>>> We would like to be sure that the data is synced after the server comes
>>>>>> back up - we can't see that atm.
>>>>>> We would like to be sure that the VMs fail over to the second storage
>>>>>> server during the outage - we can't see this atm.
>>>>>> :(
>>>>>>
>>>>>>
>>>>>> 2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri <
>>>>>> pkarampu at redhat.com>:
>>>>>>
>>>>>>>
>>>>>>> On 08/07/2014 11:33 AM, Roman wrote:
>>>>>>>
>>>>>>> The file size increases because of me :) I generate files on the VM
>>>>>>> from /dev/zero during the outage of one server. Then I bring the downed
>>>>>>> server back up, and it seems the files never sync. I'll keep on testing
>>>>>>> today. I can't read much from the logs either :(. This morning both VMs
>>>>>>> (one on the volume with self-healing and the other on the volume without
>>>>>>> it) survived the second server's outage (the first server was down
>>>>>>> yesterday); although the file sizes are different, the VMs ran without
>>>>>>> problems. But I had restarted them before bringing the second gluster
>>>>>>> server down.
>>>>>>>
>>>>>>> Then there is no bug :-). It seems the files are already in sync
>>>>>>> according to the extended attributes you have pasted. How do you test
>>>>>>> whether the files are in sync or not?
>>>>>>>
>>>>>>> Pranith
>>>>>>>
>>>>>>>
>>>>>>>  So I'm a bit lost at the moment. I'll try to keep my tests ordered
>>>>>>> and write here what happens.
>>>>>>>
>>>>>>>
>>>>>>> 2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <
>>>>>>> pkarampu at redhat.com>:
>>>>>>>
>>>>>>>>
>>>>>>>> On 08/07/2014 10:46 AM, Roman wrote:
>>>>>>>>
>>>>>>>> yes, they do.
>>>>>>>>
>>>>>>>>  getfattr: Removing leading '/' from absolute path names
>>>>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>>>>
>>>>>>>>  root at stor1:~# du -sh
>>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> root at stor1:~# md5sum
>>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> c117d73c9f8a2e09ef13da31f7225fa6
>>>>>>>>  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> root at stor1:~# du -sh
>>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  root at stor2:~# getfattr -d -m. -e hex
>>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> getfattr: Removing leading '/' from absolute path names
>>>>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>>>>
>>>>>>>>  root at stor2:~# md5sum
>>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> c117d73c9f8a2e09ef13da31f7225fa6
>>>>>>>>  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> root at stor2:~# du -sh
>>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>> 2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>>
>>>>>>>>  I think the files differ in size because of the sparse-file healing
>>>>>>>> issue. Could you raise a bug with steps to re-create this issue, where
>>>>>>>> the size of the file increases after healing?
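>>>>>>>>
>>>>>>>> (For the bug report, the apparent size vs. allocated blocks of the same
>>>>>>>> file on both bricks would be useful evidence, e.g.:)
>>>>>>>>
>>>>>>>>   stat -c 'size=%s blocks=%b' /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>>   du -sh --apparent-size      /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>>>>   du -sh                      /exports/pve1/1T/images/125/vm-125-disk-1.qcow2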
>>>>>>>>
>>>>>>>> Pranith
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm at redhat.com>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>> | From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>>> | To: "Roman" <romeo.r at gmail.com>
>>>>>>>>> | Cc: gluster-users at gluster.org, "Niels de Vos" <ndevos at redhat.com>,
>>>>>>>>> "Humble Chirammal" <hchiramm at redhat.com>
>>>>>>>>> | Sent: Wednesday, August 6, 2014 12:09:57 PM
>>>>>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on
>>>>>>>>> replica bricks
>>>>>>>>> |
>>>>>>>>> | Roman,
>>>>>>>>> |      The file went into split-brain. I think we should do these
>>>>>>>>> | tests with 3.5.2, where monitoring the heals is easier. Let me also
>>>>>>>>> | come up with a document about how to do the testing you are trying
>>>>>>>>> | to do.
>>>>>>>>> |
>>>>>>>>> | Humble/Niels,
>>>>>>>>> |      Do we have debs available for 3.5.2? In 3.5.1 there was a
>>>>>>>>> | packaging issue where /usr/bin/glfsheal was not packaged along with
>>>>>>>>> | the deb. I think that should be fixed now as well?
>>>>>>>>> |
>>>>>>>>>  Pranith,
>>>>>>>>>
>>>>>>>>> The 3.5.2 packages for Debian are not available yet. We are
>>>>>>>>> coordinating internally to get them processed.
>>>>>>>>> I will update the list once they're available.
>>>>>>>>>
>>>>>>>>> --Humble
>>>>>>>>> |
>>>>>>>>> | On 08/06/2014 11:52 AM, Roman wrote:
>>>>>>>>> | > good morning,
>>>>>>>>> | >
>>>>>>>>> | > root at stor1:~# getfattr -d -m. -e hex
>>>>>>>>> | > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>>>> | >
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
>>>>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>>>>>> | >
>>>>>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
>>>>>>>>> | >
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
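>>>>>>>>> | >
>>>>>>>>> | > (Reading these values: each trusted.afr.<volume>-client-N attribute is
>>>>>>>>> | > three 32-bit counters - data, metadata, entry - in hex. In the first dump
>>>>>>>>> | > (stor1) the client-1 counter is non-zero (0x00000132 pending data ops for
>>>>>>>>> | > the other brick), while in the second dump the client-0 counter is
>>>>>>>>> | > non-zero (0x00000004): each brick blames the other, which is the
>>>>>>>>> | > split-brain described above.)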
>>>>>>>>> | >
>>>>>>>>> | >
>>>>>>>>> | >
>>>>>>>>> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <
>>>>>>>>> pkarampu at redhat.com
>>>>>>>>>  | > <mailto:pkarampu at redhat.com>>:
>>>>>>>>> | >
>>>>>>>>> | >
>>>>>>>>> | >     On 08/06/2014 11:30 AM, Roman wrote:
>>>>>>>>> | >>     Also, this time files are not the same!
>>>>>>>>> | >>
>>>>>>>>> | >>     root at stor1:~# md5sum
>>>>>>>>> | >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >>     32411360c53116b96a059f17306caeda
>>>>>>>>> | >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >>
>>>>>>>>> | >>     root at stor2:~# md5sum
>>>>>>>>> | >>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >>     65b8a6031bcb6f5fb3a11cb1e8b1c9c9
>>>>>>>>> | >>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >     What is the getfattr output?
>>>>>>>>> | >
>>>>>>>>> | >     Pranith
>>>>>>>>> | >
>>>>>>>>> | >>
>>>>>>>>> | >>
>>>>>>>>> | >>     2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com
>>>>>>>>>  | >>     <mailto:romeo.r at gmail.com>>:
>>>>>>>>> | >>
>>>>>>>>> | >>         Nope, it is not working. But this time it went a bit
>>>>>>>>> other way
>>>>>>>>> | >>
>>>>>>>>> | >>         root at gluster-client:~# dmesg
>>>>>>>>> | >>         Segmentation fault
>>>>>>>>> | >>
>>>>>>>>> | >>
>>>>>>>>> | >>         I was not able even to start the VM after I done the
>>>>>>>>> tests
>>>>>>>>> | >>
>>>>>>>>> | >>         Could not read qcow2 header: Operation not permitted
>>>>>>>>> | >>
>>>>>>>>> | >>         And it seems, it never starts to sync files after
>>>>>>>>> first
>>>>>>>>> | >>         disconnect. VM survives first disconnect, but not
>>>>>>>>> second (I
>>>>>>>>> | >>         waited around 30 minutes). Also, I've
>>>>>>>>> | >>         got network.ping-timeout: 2 in volume settings, but
>>>>>>>>> logs
>>>>>>>>> | >>         react on first disconnect around 30 seconds. Second
>>>>>>>>> was
>>>>>>>>> | >>         faster, 2 seconds.
>>>>>>>>> | >>
>>>>>>>>> | >>         Reaction was different also:
>>>>>>>>> | >>
>>>>>>>>> | >>         slower one:
>>>>>>>>> | >>         [2014-08-05 13:26:19.558435] W
>>>>>>>>> [socket.c:514:__socket_rwv]
>>>>>>>>> | >>         0-glusterfs: readv failed (Connection timed out)
>>>>>>>>> | >>         [2014-08-05 13:26:19.558485] W
>>>>>>>>> | >>         [socket.c:1962:__socket_proto_state_machine]
>>>>>>>>> 0-glusterfs:
>>>>>>>>> | >>         reading from socket failed. Error (Connection timed
>>>>>>>>> out),
>>>>>>>>>  | >>         peer (10.250.0.1:24007 <http://10.250.0.1:24007>)
>>>>>>>>> | >>         [2014-08-05 13:26:21.281426] W
>>>>>>>>> [socket.c:514:__socket_rwv]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-0: readv failed
>>>>>>>>> (Connection timed out)
>>>>>>>>> | >>         [2014-08-05 13:26:21.281474] W
>>>>>>>>> | >>         [socket.c:1962:__socket_proto_state_machine]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-0: reading from socket
>>>>>>>>> failed.
>>>>>>>>> | >>         Error (Connection timed out), peer (10.250.0.1:49153
>>>>>>>>>  | >>         <http://10.250.0.1:49153>)
>>>>>>>>> | >>         [2014-08-05 13:26:21.281507] I
>>>>>>>>> | >>         [client.c:2098:client_rpc_notify]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-0: disconnected
>>>>>>>>> | >>
>>>>>>>>> | >>         the fast one:
>>>>>>>>> | >>         2014-08-05 12:52:44.607389] C
>>>>>>>>> | >>         [client-handshake.c:127:rpc_client_ping_timer_expired]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
>>>>>>>>>  | >>         <http://10.250.0.2:49153> has not responded in the
>>>>>>>>> last 2
>>>>>>>>>  | >>         seconds, disconnecting.
>>>>>>>>> | >>         [2014-08-05 12:52:44.607491] W
>>>>>>>>> [socket.c:514:__socket_rwv]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: readv failed (No data
>>>>>>>>> available)
>>>>>>>>> | >>         [2014-08-05 12:52:44.607585] E
>>>>>>>>> | >>         [rpc-clnt.c:368:saved_frames_unwind]
>>>>>>>>> | >>
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>>>>>>>> | >>         [0x7fcb1b4b0558]
>>>>>>>>> | >>
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>>>>>>>> | >>         [0x7fcb1b4aea63]
>>>>>>>>> | >>
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>>>>>>>> | >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1:
>>>>>>>>> forced
>>>>>>>>> | >>         unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
>>>>>>>>> called at
>>>>>>>>> | >>         2014-08-05 12:52:42.463881 (xid=0x381883x)
>>>>>>>>> | >>         [2014-08-05 12:52:44.607604] W
>>>>>>>>> | >>         [client-rpc-fops.c:2624:client3_3_lookup_cbk]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: remote operation failed:
>>>>>>>>> | >>         Transport endpoint is not connected. Path: /
>>>>>>>>> | >>         (00000000-0000-0000-0000-000000000001)
>>>>>>>>> | >>         [2014-08-05 12:52:44.607736] E
>>>>>>>>> | >>         [rpc-clnt.c:368:saved_frames_unwind]
>>>>>>>>> | >>
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>>>>>>>> | >>         [0x7fcb1b4b0558]
>>>>>>>>> | >>
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>>>>>>>> | >>         [0x7fcb1b4aea63]
>>>>>>>>> | >>
>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>>>>>>>> | >>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1:
>>>>>>>>> forced
>>>>>>>>> | >>         unwinding frame type(GlusterFS Handshake) op(PING(3))
>>>>>>>>> called
>>>>>>>>> | >>         at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>>>>>>>> | >>         [2014-08-05 12:52:44.607753] W
>>>>>>>>> | >>         [client-handshake.c:276:client_ping_cbk]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: timer must have expired
>>>>>>>>> | >>         [2014-08-05 12:52:44.607776] I
>>>>>>>>> | >>         [client.c:2098:client_rpc_notify]
>>>>>>>>> | >>         0-HA-fast-150G-PVE1-client-1: disconnected
>>>>>>>>> | >>
>>>>>>>>> | >>
>>>>>>>>> | >>
>>>>>>>>> | >>         I've got SSD disks (just for an info).
>>>>>>>>> | >>         Should I go and give a try for 3.5.2?
>>>>>>>>> | >>
>>>>>>>>> | >>
>>>>>>>>> | >>
>>>>>>>>> | >>         2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri
>>>>>>>>>  | >>         <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>>>>>>>> | >>
>>>>>>>>> | >>             reply along with gluster-users please :-). May be
>>>>>>>>> you are
>>>>>>>>> | >>             hitting 'reply' instead of 'reply all'?
>>>>>>>>> | >>
>>>>>>>>> | >>             Pranith
>>>>>>>>> | >>
>>>>>>>>> | >>             On 08/05/2014 03:35 PM, Roman wrote:
>>>>>>>>> | >>>             To make sure and clean, I've created another VM
>>>>>>>>> with raw
>>>>>>>>> | >>>             format and goint to repeat those steps. So now
>>>>>>>>> I've got
>>>>>>>>> | >>>             two VM-s one with qcow2 format and other with raw
>>>>>>>>> | >>>             format. I will send another e-mail shortly.
>>>>>>>>> | >>>
>>>>>>>>> | >>>
>>>>>>>>> | >>>             2014-08-05 13:01 GMT+03:00 Pranith Kumar
>>>>>>>>> Karampuri
>>>>>>>>>  | >>>             <pkarampu at redhat.com <mailto:
>>>>>>>>> pkarampu at redhat.com>>:
>>>>>>>>>  | >>>
>>>>>>>>> | >>>
>>>>>>>>> | >>>                 On 08/05/2014 03:07 PM, Roman wrote:
>>>>>>>>> | >>>>                 really, seems like the same file
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                 stor1:
>>>>>>>>> | >>>>                 a951641c5230472929836f9fcede6b04
>>>>>>>>> | >>>>
>>>>>>>>>  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                 stor2:
>>>>>>>>> | >>>>                 a951641c5230472929836f9fcede6b04
>>>>>>>>> | >>>>
>>>>>>>>>  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                 one thing I've seen from logs, that somehow
>>>>>>>>> proxmox
>>>>>>>>> | >>>>                 VE is connecting with wrong version to
>>>>>>>>> servers?
>>>>>>>>> | >>>>                 [2014-08-05 09:23:45.218550] I
>>>>>>>>> | >>>>
>>>>>>>>> [client-handshake.c:1659:select_server_supported_programs]
>>>>>>>>> | >>>>                 0-HA-fast-150G-PVE1-client-0: Using Program
>>>>>>>>> | >>>>                 GlusterFS 3.3, Num (1298437), Version (330)
>>>>>>>>> | >>>                 It is the rpc (over the network data
>>>>>>>>> structures)
>>>>>>>>> | >>>                 version, which is not changed at all from
>>>>>>>>> 3.3 so
>>>>>>>>> | >>>                 thats not a problem. So what is the
>>>>>>>>> conclusion? Is
>>>>>>>>> | >>>                 your test case working now or not?
>>>>>>>>> | >>>
>>>>>>>>> | >>>                 Pranith
>>>>>>>>> | >>>
>>>>>>>>> | >>>>                 but if I issue:
>>>>>>>>> | >>>>                 root at pve1:~# glusterfs -V
>>>>>>>>> | >>>>                 glusterfs 3.4.4 built on Jun 28 2014
>>>>>>>>> 03:44:57
>>>>>>>>> | >>>>                 seems ok.
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                 server  use 3.4.4 meanwhile
>>>>>>>>> | >>>>                 [2014-08-05 09:23:45.117875] I
>>>>>>>>> | >>>>                 [server-handshake.c:567:server_setvolume]
>>>>>>>>> | >>>>                 0-HA-fast-150G-PVE1-server: accepted client
>>>>>>>>> from
>>>>>>>>> | >>>>
>>>>>>>>> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
>>>>>>>>> | >>>>                 (version: 3.4.4)
>>>>>>>>> | >>>>                 [2014-08-05 09:23:49.103035] I
>>>>>>>>> | >>>>                 [server-handshake.c:567:server_setvolume]
>>>>>>>>> | >>>>                 0-HA-fast-150G-PVE1-server: accepted client
>>>>>>>>> from
>>>>>>>>> | >>>>
>>>>>>>>> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
>>>>>>>>> | >>>>                 (version: 3.4.4)
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                 if this could be the reason, of course.
>>>>>>>>> | >>>>                 I did restart the Proxmox VE yesterday
>>>>>>>>> (just for an
>>>>>>>>> | >>>>                 information)
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                 2014-08-05 12:30 GMT+03:00 Pranith Kumar
>>>>>>>>> Karampuri
>>>>>>>>>  | >>>>                 <pkarampu at redhat.com <mailto:
>>>>>>>>> pkarampu at redhat.com>>:
>>>>>>>>>  | >>>>
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                     On 08/05/2014 02:33 PM, Roman wrote:
>>>>>>>>> | >>>>>                     Waited long enough for now, still
>>>>>>>>> different
>>>>>>>>> | >>>>>                     sizes and no logs about healing :(
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>                     stor1
>>>>>>>>> | >>>>>                     # file:
>>>>>>>>> | >>>>>
>>>>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >>>>>
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>>>> | >>>>>
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>>>> | >>>>>
>>>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>                     root at stor1:~# du -sh
>>>>>>>>> | >>>>>                     /exports/fast-test/150G/images/127/
>>>>>>>>> | >>>>>                     1.2G
>>>>>>>>>  /exports/fast-test/150G/images/127/
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>                     stor2
>>>>>>>>> | >>>>>                     # file:
>>>>>>>>> | >>>>>
>>>>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>> | >>>>>
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>>>> | >>>>>
>>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>>>> | >>>>>
>>>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>                     root at stor2:~# du -sh
>>>>>>>>> | >>>>>                     /exports/fast-test/150G/images/127/
>>>>>>>>> | >>>>>                     1.4G
>>>>>>>>>  /exports/fast-test/150G/images/127/
>>>>>>>>> | >>>>                     According to the changelogs, the file
>>>>>>>>> doesn't
>>>>>>>>> | >>>>                     need any healing. Could you stop the
>>>>>>>>> operations
>>>>>>>>> | >>>>                     on the VMs and take md5sum on both
>>>>>>>>> these machines?
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>                     Pranith
>>>>>>>>> | >>>>
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>                     2014-08-05 11:49 GMT+03:00 Pranith
>>>>>>>>> Kumar
>>>>>>>>> | >>>>>                     Karampuri <pkarampu at redhat.com
>>>>>>>>>  | >>>>>                     <mailto:pkarampu at redhat.com>>:
>>>>>>>>>  | >>>>>
>>>>>>>>> | >>>>>
>>>>>>>>> | >>>>>                         On 08/05/2014 02:06 PM, Roman
>>>>>>>>> wrote:
>>>>>>>>> | >>>>>>                         Well, it seems like it doesn't
>>>>>>>>> see the
>>>>>>>>> | >>>>>>                         changes were made to the volume ?
>>>>>>>>> I
>>>>>>>>> | >>>>>>                         created two files 200 and 100 MB
>>>>>>>>> (from
>>>>>>>>> | >>>>>>                         /dev/zero) after I disconnected
>>>>>>>>> the first
>>>>>>>>> | >>>>>>                         brick. Then connected it back and
>>>>>>>>> got
>>>>>>>>> | >>>>>>                         these logs:
>>>>>>>>> | >>>>>>
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830150] I
>>>>>>>>> | >>>>>>
>>>>>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>>>> | >>>>>>                         0-glusterfs: No change in
>>>>>>>>> volfile, continuing
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830207] I
>>>>>>>>> | >>>>>>
>>>>>>>>> [rpc-clnt.c:1676:rpc_clnt_reconfig]
>>>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>>>> changing
>>>>>>>>> | >>>>>>                         port to 49153 (from 0)
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830239] W
>>>>>>>>> | >>>>>>                         [socket.c:514:__socket_rwv]
>>>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>>>> readv
>>>>>>>>> | >>>>>>                         failed (No data available)
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831024] I
>>>>>>>>> | >>>>>>
>>>>>>>>> [client-handshake.c:1659:select_server_supported_programs]
>>>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>>>> Using
>>>>>>>>> | >>>>>>                         Program GlusterFS 3.3, Num
>>>>>>>>> (1298437),
>>>>>>>>> | >>>>>>                         Version (330)
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831375] I
>>>>>>>>> | >>>>>>
>>>>>>>>> [client-handshake.c:1456:client_setvolume_cbk]
>>>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>>>> Connected
>>>>>>>>> | >>>>>>                         to 10.250.0.1:49153
>>>>>>>>>  | >>>>>>                         <http://10.250.0.1:49153>,
>>>>>>>>> attached to
>>>>>>>>>  | >>>>>>                         remote volume
>>>>>>>>> '/exports/fast-test/150G'.
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831394] I
>>>>>>>>> | >>>>>>
>>>>>>>>> [client-handshake.c:1468:client_setvolume_cbk]
>>>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>>>> Server and
>>>>>>>>> | >>>>>>                         Client lk-version numbers are not
>>>>>>>>> same,
>>>>>>>>> | >>>>>>                         reopening the fds
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.831566] I
>>>>>>>>> | >>>>>>
>>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>>> | >>>>>>                         0-HA-fast-150G-PVE1-client-0:
>>>>>>>>> Server lk
>>>>>>>>> | >>>>>>                         version = 1
>>>>>>>>> | >>>>>>
>>>>>>>>> | >>>>>>
>>>>>>>>> | >>>>>>                         [2014-08-05 08:30:37.830150] I
>>>>>>>>> | >>>>>>
>>>>>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>>>> | >>>>>>                         0-glusterfs: No change in
>>>>>>>>> volfile, continuing
>>>>>>>>> | >>>>>>                         this line seems weird to me tbh.
>>>>>>>>> | >>>>>>                         I do not see any traffic on switch
>>>>>>>>> | >>>>>>                         interfaces between gluster
>>>>>>>>> servers, which
>>>>>>>>>
>>>>>>>> ...
>>>
>>> [Message not shown in full]
>>
>>
>>
>>
>> --
>> Best regards,
>> Roman.
>>
>
>
>
> --
> Best regards,
> Roman.
>



-- 
Best regards,
Roman.

