[Gluster-users] libgfapi failover problem on replica bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Fri Aug 8 10:23:42 UTC 2014
On 08/08/2014 11:35 AM, Roman wrote:
> Just to be sure: why do you guys create an updated version of the
> glusterfs package for wheezy if it cannot be installed on wheezy? :)
CCed lala, Humble and kaleb, who may know the answer.
Pranith
>
>
> 2014-08-08 9:03 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> Oh, unfortunately I won't be able to install either 3.5.2 or 3.4.5 :(
> They both require a libc6 update. I would not risk that.
>
> glusterfs-common : Depends: libc6 (>= 2.14) but 2.13-38+deb7u3 is to be installed
>                    Depends: liblvm2app2.2 (>= 2.02.106) but 2.02.95-8 is to be installed
>                    Depends: librdmacm1 (>= 1.0.16) but 1.0.15-1+deb7u1 is to be installed
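>
> For reference, a quick way to confirm the conflict (just a sketch, the exact output differs per system):
>
> apt-cache policy libc6 glusterfs-common               # installed vs. candidate versions
> apt-cache show glusterfs-common | grep -i depends     # dependency list of the candidate package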
>
>
>
> 2014-08-07 15:32 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> I'm really sorry to bother you, but it seems all my previous tests were a
> waste of time because of those files generated from /dev/zero :). That is
> both good and bad news. Now I use real files for my tests. As this is
> almost my last workday, the only things I want to do are test and
> document :) .. so here are some new results:
>
> This time I've got two gluster volumes (set up roughly as sketched below):
>
> 1. with cluster.self-heal-daemon off
> 2. with cluster.self-heal-daemon on
>
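> For reference, the options were toggled with the usual volume-set commands (volume names here are placeholders):
>
> gluster volume set <VOLNAME-1> cluster.self-heal-daemon off
> gluster volume set <VOLNAME-2> cluster.self-heal-daemon on
> gluster volume info <VOLNAME-1>   # the change shows up under 'Options Reconfigured'
>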
> 1. Actual results with SHD off:
> It seems everything works as expected. The VM survives an outage of both
> glusterfs servers, and I'm able to see the sync via network traffic. FINE!
>
> Sometimes healing happens a bit late (it takes from 1 minute up to 1 hour
> to sync). I don't know why. Ideas?
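>
> In case it helps, this is how I watch for pending heals (volume name is a placeholder; these are the stock gluster CLI calls):
>
> gluster volume heal <VOLNAME> info    # files with pending heal entries
> gluster volume heal <VOLNAME> full    # kick off a full heal manually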
>
> 2. Test results with SHD on:
> The VM is not able to survive the second server restart (as described
> previously) and gives IO errors, although the files are synced. Some locks
> that do not allow the KVM hypervisor to reconnect to the storage in time?
>
>
> So the actual problem is sparse files inside a VM :). If one uses them
> (generated from /dev/zero, for instance), the VM will crash and never come
> up again due to errors in the qcow2 file headers. Another bug?
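>
> A rough way to confirm whether the qcow2 header is still readable (the path is just the example file from this thread):
>
> qemu-img check /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
> qemu-img info  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2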
>
>
>
>
>
>
>
> 2014-08-07 9:53 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> OK, then I hope we will be able to test it in two weeks. Thanks for your
> time and patience.
>
>
> 2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
>
> On 08/07/2014 12:17 PM, Roman wrote:
>> Well, one thing is definitely true: if there is no healing daemon
>> running, I'm not able to start the VM after an outage. It seems the qcow2
>> file gets corrupted (KVM is unable to read its header).
> We shall see this again once I have the document with all the steps that
> need to be carried out :-)
>
> Pranith
>>
>>
>> 2014-08-07 9:35 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>> > This should not happen if you do the writes, let's say, from
>> > '/dev/urandom' instead of '/dev/zero'.
>>
>> Somewhere deep inside I thought so! Zero is zero :)
>>
>> > I will provide you with a document for testing this issue properly. I
>> > have a lot going on in my day job, so I'm not getting enough time to
>> > write it out. Considering the weekend is approaching, I will get a bit
>> > of time, so I will send you the document over the weekend.
>>
>> Thank you a lot, I'll wait. My vacation starts tomorrow and I'll be out
>> for two weeks, so there's no great hurry.
>>
>>
>>
>>
>> 2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>>
>> On 08/07/2014 11:48 AM, Roman wrote:
>>> How can they be in sync if they differ in size? And why, then, is the
>>> VM not able to survive a gluster outage? I really want to use glusterfs
>>> in our production for infrastructure virtualization because of its
>>> simple setup, but I'm not able to at this moment. Maybe you've got some
>>> testing agenda? Or could you list the steps for doing the tests right,
>>> so our VMs would survive the outages?
>> This is because of sparse files.
>> http://en.wikipedia.org/wiki/Sparse_file
>> This should not happen if you do the writes, let's say, from
>> '/dev/urandom' instead of '/dev/zero'.
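>>
>> As a rough illustration (file name and size are only placeholders): data
>> read from /dev/urandom is never all-zero, so self-heal cannot skip any of
>> it, while all-zero regions may end up as holes in the healed copy.
>>
>> dd if=/dev/urandom of=/mnt/vm-disk/testfile bs=1M count=200   # non-zero test data
>> du -h --apparent-size /mnt/vm-disk/testfile                   # logical size
>> du -h /mnt/vm-disk/testfile                                   # blocks actually allocated
>>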
>>
>> I will provide you with a document for testing this issue properly. I
>> have a lot going on in my day job, so I'm not getting enough time to
>> write it out. Considering the weekend is approaching, I will get a bit of
>> time, so I will send you the document over the weekend.
>>
>> Pranith
>>>
>>> We would like to be sure that when one of the storages is down, the
>>> VMs keep running - that is OK, we see this.
>>> We would like to be sure that data is synced after the server comes
>>> back up - we can't see that atm.
>>> We would like to be sure that VMs fail over to the second storage
>>> during the outage - we can't see this atm :(
>>>
>>>
>>> 2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>>
>>> On 08/07/2014 11:33 AM, Roman wrote:
>>>> The file size increases because of me :) I generate files on the VM
>>>> from /dev/zero during the outage of one server. Then I bring the downed
>>>> server back up, and it seems the files never sync. I'll keep on testing
>>>> today. I can't read much from the logs either :(. This morning both VMs
>>>> (one on the volume with self-healing and the other on the volume
>>>> without it) survived the second server outage (the first server was
>>>> down yesterday); while the file sizes are different, the VMs ran
>>>> without problems. But I had restarted them before bringing the second
>>>> gluster server down.
>>> Then there is no bug :-). It seems the files are already in sync
>>> according to the extended attributes you have pasted. How do you test
>>> whether the files are in sync or not?
>>>
>>> Pranith
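>>>
>>> For what it's worth, the check I rely on is the trusted.afr changelog
>>> xattrs you pasted: each trusted.afr.<volume>-client-N value is 12 bytes
>>> of pending data/metadata/entry counters, and all zeroes on both bricks
>>> means nothing is pending, i.e. the copies are considered in sync. Roughly:
>>>
>>> getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>> # trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000  <- no pending ops
>>> # trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000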
>>>>
>>>> So I'm a bit lost at the moment. I'll try to keep my tests ordered and
>>>> write here what happens.
>>>>
>>>>
>>>> 2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>
>>>>
>>>> On 08/07/2014 10:46 AM, Roman wrote:
>>>>> yes, they do.
>>>>>
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>
>>>>> root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>
>>>>> root at stor2:~# getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>
>>>>> root at stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>> I think the files are differing in size because of the sparse file
>>>> healing issue. Could you raise a bug, with steps to re-create it, for
>>>> this case where the size of the file increases after healing?
>>>>
>>>> Pranith
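>>>>
>>>> A quick way to show the sparseness difference in the bug report could be
>>>> to compare logical size with allocated blocks on each brick (just a
>>>> sketch, using the file from your paste):
>>>>
>>>> du -h --apparent-size /exports/pve1/1T/images/125/vm-125-disk-1.qcow2   # logical size
>>>> du -h /exports/pve1/1T/images/125/vm-125-disk-1.qcow2                   # allocated blocks
>>>> stat -c 'size=%s blocks=%b' /exports/pve1/1T/images/125/vm-125-disk-1.qcow2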
>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm at redhat.com>:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> | From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>> | To: "Roman" <romeo.r at gmail.com>
>>>>> | Cc: gluster-users at gluster.org, "Niels de Vos" <ndevos at redhat.com>, "Humble Chirammal" <hchiramm at redhat.com>
>>>>> | Sent: Wednesday, August 6, 2014 12:09:57 PM
>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
>>>>> |
>>>>> | Roman,
>>>>> | The file went into split-brain. I think we should do these tests with 3.5.2, where monitoring the heals is easier. Let me also come up with a document about how to do the testing you are trying to do.
>>>>> |
>>>>> | Humble/Niels,
>>>>> | Do we have debs available for 3.5.2? In 3.5.1 there was a packaging issue where /usr/bin/glfsheal was not packaged along with the deb. I think that should be fixed now as well?
>>>>> |
>>>>> Pranith,
>>>>>
>>>>> The 3.5.2 packages for debian are not available yet. We are co-ordinating internally to get them processed. I will update the list once they are available.
>>>>>
>>>>> --Humble
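>>>>>
>>>>> Once the debs land, an easy sanity check for the earlier glfsheal packaging issue is to ask dpkg which package owns the binary (it errors out if nothing ships it):
>>>>>
>>>>> dpkg -S /usr/bin/glfsheal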
>>>>> |
>>>>> | On 08/06/2014 11:52 AM, Roman wrote:
>>>>> | > good morning,
>>>>> | >
>>>>> | > root at stor1:~# getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>> | >
>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>> | >
>>>>> | >
>>>>> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >
>>>>> | >
>>>>> | > On 08/06/2014 11:30 AM, Roman wrote:
>>>>> | >> Also, this time the files are not the same!
>>>>> | >>
>>>>> | >> root at stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >> 32411360c53116b96a059f17306caeda  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>
>>>>> | >> root at stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > What is the getfattr output?
>>>>> | >
>>>>> | > Pranith
>>>>> | >
>>>>> | >>
>>>>> | >>
>>>>> | >> 2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com>:
>>>>> | >>
>>>>> | >> Nope, it is not working. But this time it went a bit differently.
>>>>> | >>
>>>>> | >> root at gluster-client:~# dmesg
>>>>> | >> Segmentation fault
>>>>> | >>
>>>>> | >> I was not even able to start the VM after I had done the tests:
>>>>> | >>
>>>>> | >> Could not read qcow2 header: Operation not permitted
>>>>> | >>
>>>>> | >> And it seems it never starts to sync the files after the first
>>>>> | >> disconnect. The VM survives the first disconnect, but not the
>>>>> | >> second (I waited around 30 minutes). Also, I've got
>>>>> | >> network.ping-timeout: 2 in the volume settings, but the logs react
>>>>> | >> to the first disconnect in around 30 seconds; the second was
>>>>> | >> faster, 2 seconds.
>>>>> | >>
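>>>>> | >> For reference, this is how I set and verified that option (the volume name is a placeholder):
>>>>> | >>
>>>>> | >> gluster volume set <VOLNAME> network.ping-timeout 2
>>>>> | >> gluster volume info <VOLNAME>   # shows the value under 'Options Reconfigured'
>>>>> | >>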
>>>>> | >> The reaction was also different:
>>>>> | >>
>>>>> | >> the slower one:
>>>>> | >> [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv] 0-glusterfs: readv failed (Connection timed out)
>>>>> | >> [2014-08-05 13:26:19.558485] W [socket.c:1962:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:24007)
>>>>> | >> [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
>>>>> | >> [2014-08-05 13:26:21.281474] W [socket.c:1962:__socket_proto_state_machine] 0-HA-fast-150G-PVE1-client-0: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:49153)
>>>>> | >> [2014-08-05 13:26:21.281507] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-0: disconnected
>>>>> | >>
>>>>> | >> the fast one:
>>>>> | >> [2014-08-05 12:52:44.607389] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 has not responded in the last 2 seconds, disconnecting.
>>>>> | >> [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
>>>>> | >> [2014-08-05 12:52:44.607585] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2014-08-05 12:52:42.463881 (xid=0x381883x)
>>>>> | >> [2014-08-05 12:52:44.607604] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-HA-fast-150G-PVE1-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
>>>>> | >> [2014-08-05 12:52:44.607736] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>>>> | >> [2014-08-05 12:52:44.607753] W [client-handshake.c:276:client_ping_cbk] 0-HA-fast-150G-PVE1-client-1: timer must have expired
>>>>> | >> [2014-08-05 12:52:44.607776] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-1: disconnected
>>>>> | >>
>>>>> | >>
>>>>> | >>
>>>>> | >> I've got SSD disks (just for information).
>>>>> | >> Should I go and give 3.5.2 a try?
>>>>> | >>
>>>>> | >>
>>>>> | >>
>>>>> | >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>
>>>>> | >> Please reply along with gluster-users :-). Maybe you are hitting
>>>>> | >> 'reply' instead of 'reply all'?
>>>>> | >>
>>>>> | >> Pranith
>>>>> | >>
>>>>> | >> On 08/05/2014 03:35 PM, Roman wrote:
>>>>> | >>> To make sure and keep things clean, I've created another VM with
>>>>> | >>> raw format, and I'm going to repeat those steps. So now I've got
>>>>> | >>> two VMs, one with qcow2 format and the other with raw format. I
>>>>> | >>> will send another e-mail shortly.
>>>>> | >>>
>>>>> | >>>
>>>>> | >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>>
>>>>> | >>>
>>>>> | >>> On 08/05/2014 03:07 PM, Roman wrote:
>>>>> | >>>> really, it seems like the same file
>>>>> | >>>>
>>>>> | >>>> stor1:
>>>>> | >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>
>>>>> | >>>> stor2:
>>>>> | >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>
>>>>> | >>>> One thing I've seen in the logs: somehow Proxmox VE is connecting
>>>>> | >>>> to the servers with the wrong version?
>>>>> | >>>> [2014-08-05 09:23:45.218550] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>> | >>> It is the rpc (over-the-network data structures) version, which
>>>>> | >>> has not changed at all since 3.3, so that's not a problem. So what
>>>>> | >>> is the conclusion? Is your test case working now or not?
>>>>> | >>>
>>>>> | >>> Pranith
>>>>> | >>>
>>>>> | >>>> but if I issue:
>>>>> | >>>> root at pve1:~# glusterfs -V
>>>>> | >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
>>>>> | >>>> it seems ok.
>>>>> | >>>>
>>>>> | >>>> Meanwhile the servers use 3.4.4:
>>>>> | >>>> [2014-08-05 09:23:45.117875] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 (version: 3.4.4)
>>>>> | >>>> [2014-08-05 09:23:49.103035] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
>>>>> | >>>>
>>>>> | >>>> if this could be the reason, of course.
>>>>> | >>>> I did restart the Proxmox VE yesterday (just for information)
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>> On 08/05/2014 02:33 PM, Roman wrote:
>>>>> | >>>>> Waited long enough for now, still different sizes and no logs
>>>>> | >>>>> about healing :(
>>>>> | >>>>>
>>>>> | >>>>> stor1
>>>>> | >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>> | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>> | >>>>>
>>>>> | >>>>> root at stor1:~# du -sh /exports/fast-test/150G/images/127/
>>>>> | >>>>> 1.2G    /exports/fast-test/150G/images/127/
>>>>> | >>>>>
>>>>> | >>>>> stor2
>>>>> | >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>> | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>> | >>>>>
>>>>> | >>>>> root at stor2:~# du -sh /exports/fast-test/150G/images/127/
>>>>> | >>>>> 1.4G    /exports/fast-test/150G/images/127/
>>>>> | >>>> According to the changelogs, the file doesn't need any healing.
>>>>> | >>>> Could you stop the operations on the VMs and take an md5sum on
>>>>> | >>>> both these machines?
>>>>> | >>>>
>>>>> | >>>> Pranith
>>>>> | >>>>
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>> On 08/05/2014 02:06 PM, Roman wrote:
>>>>> | >>>>>> Well, it seems like it doesn't see that changes were made to
>>>>> | >>>>>> the volume? I created two files, 200 and 100 MB (from
>>>>> | >>>>>> /dev/zero), after I disconnected the first brick. Then I
>>>>> | >>>>>> connected it back and got these logs:
>>>>> | >>>>>>
>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>> | >>>>>> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
>>>>> | >>>>>> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
>>>>> | >>>>>> [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>> | >>>>>> [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
>>>>> | >>>>>> [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>>> | >>>>>> [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
>>>>> | >>>>>>
>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>> | >>>>>> This line seems weird to me, tbh.
>>>>> | >>>>>> I do not see any traffic on the switch interfaces between the
>>>>> | >>>>>> gluster servers, which
>>>>>
> ...
>
> [Message shown only partially]
>
>
>
>
> --
> Best regards,
> Roman.