[Gluster-users] libgfapi failover problem on replica bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Fri Aug 8 10:23:42 UTC 2014
On 08/08/2014 11:35 AM, Roman wrote:
> Just to be sure: why do you guys create an updated version of the
> glusterfs package for wheezy if it cannot be installed on wheezy? :)
CCed lala, Humble and kaleb, who may know the answer.
Pranith
>
>
> 2014-08-08 9:03 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> Oh, unfortunately I won't be able to install either 3.5.2 or 3.4.5 :(
> They both require a libc6 update. I would not risk that.
>
> glusterfs-common : Depends: libc6 (>= 2.14) but 2.13-38+deb7u3 is to be installed
>                    Depends: liblvm2app2.2 (>= 2.02.106) but 2.02.95-8 is to be installed
>                    Depends: librdmacm1 (>= 1.0.16) but 1.0.15-1+deb7u1 is to be installed
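>
> For reference, a quick way to confirm the conflict (just a sketch, the exact output differs per system):
>
> apt-cache policy libc6 glusterfs-common               # installed vs. candidate versions
> apt-cache show glusterfs-common | grep -i depends     # dependency list of the candidate package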
>
>
>
> 2014-08-07 15:32 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> I'm really sorry to bother you, but it seems all my previous tests were a
> waste of time because of those files generated from /dev/zero :). That is
> both good and bad news. Now I use real files for my tests. As this is
> almost my last workday, the only things I want to do are test and
> document :) .. so here are some new results:
>
> This time I've got two gluster volumes (set up roughly as sketched below):
>
> 1. with cluster.self-heal-daemon off
> 2. with cluster.self-heal-daemon on
>
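> For reference, the options were toggled with the usual volume-set commands (volume names here are placeholders):
>
> gluster volume set <VOLNAME-1> cluster.self-heal-daemon off
> gluster volume set <VOLNAME-2> cluster.self-heal-daemon on
> gluster volume info <VOLNAME-1>   # the change shows up under 'Options Reconfigured'
>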
> 1. Actual results with SHD off:
> It seems everything works as expected. The VM survives an outage of both
> glusterfs servers, and I'm able to see the sync via network traffic. FINE!
>
> Sometimes healing happens a bit late (it takes from 1 minute up to 1 hour
> to sync). I don't know why. Ideas?
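>
> In case it helps, this is how I watch for pending heals (volume name is a placeholder; these are the stock gluster CLI calls):
>
> gluster volume heal <VOLNAME> info    # files with pending heal entries
> gluster volume heal <VOLNAME> full    # kick off a full heal manually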
>
> 2. Test results with SHD on:
> The VM is not able to survive the second server restart (as described
> previously) and gives IO errors, although the files are synced. Some locks
> that do not allow the KVM hypervisor to reconnect to the storage in time?
>
>
> So the actual problem is sparse files inside a VM :). If one uses them
> (generated from /dev/zero, for instance), the VM will crash and never come
> up again due to errors in the qcow2 file headers. Another bug?
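>
> A rough way to confirm whether the qcow2 header is still readable (the path is just the example file from this thread):
>
> qemu-img check /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
> qemu-img info  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2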
>
>
>
>
>
>
>
> 2014-08-07 9:53 GMT+03:00 Roman <romeo.r at gmail.com>:
>
> OK, then I hope we will be able to test it in two weeks. Thanks for your
> time and patience.
>
>
> 2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
>
> On 08/07/2014 12:17 PM, Roman wrote:
>> Well, one thing is definitely true: if there is no healing daemon
>> running, I'm not able to start the VM after an outage. It seems the qcow2
>> file gets corrupted (KVM is unable to read its header).
> We shall see this again once I have the document with all the steps that
> need to be carried out :-)
>
> Pranith
>>
>>
>> 2014-08-07 9:35 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>> > This should not happen if you do the writes, let's say, from
>> > '/dev/urandom' instead of '/dev/zero'.
>>
>> Somewhere deep inside I thought so! Zero is zero :)
>>
>> > I will provide you with a document for testing this issue properly. I
>> > have a lot going on in my day job, so I'm not getting enough time to
>> > write it out. Considering the weekend is approaching, I will get a bit
>> > of time, so I will send you the document over the weekend.
>>
>> Thank you a lot, I'll wait. My vacation starts tomorrow and I'll be out
>> for two weeks, so there's no great hurry.
>>
>>
>>
>>
>> 2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>>
>> On 08/07/2014 11:48 AM, Roman wrote:
>>> How can they be in sync if they differ in size? And why, then, is the
>>> VM not able to survive a gluster outage? I really want to use glusterfs
>>> in our production for infrastructure virtualization because of its
>>> simple setup, but I'm not able to at this moment. Maybe you've got some
>>> testing agenda? Or could you list the steps for doing the tests right,
>>> so our VMs would survive the outages?
>> This is because of sparse files.
>> http://en.wikipedia.org/wiki/Sparse_file
>> This should not happen if you do the writes, let's say, from
>> '/dev/urandom' instead of '/dev/zero'.
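>>
>> As a rough illustration (file name and size are only placeholders): data
>> read from /dev/urandom is never all-zero, so self-heal cannot skip any of
>> it, while all-zero regions may end up as holes in the healed copy.
>>
>> dd if=/dev/urandom of=/mnt/vm-disk/testfile bs=1M count=200   # non-zero test data
>> du -h --apparent-size /mnt/vm-disk/testfile                   # logical size
>> du -h /mnt/vm-disk/testfile                                   # blocks actually allocated
>>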
>>
>> I will provide you with a document for testing this issue properly. I
>> have a lot going on in my day job, so I'm not getting enough time to
>> write it out. Considering the weekend is approaching, I will get a bit of
>> time, so I will send you the document over the weekend.
>>
>> Pranith
>>>
>>> We would like to be sure that when one of the storages is down, the
>>> VMs keep running - that is OK, we see this.
>>> We would like to be sure that data is synced after the server comes
>>> back up - we can't see that atm.
>>> We would like to be sure that VMs fail over to the second storage
>>> during the outage - we can't see this atm :(
>>>
>>>
>>> 2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>>
>>> On 08/07/2014 11:33 AM, Roman wrote:
>>>> The file size increases because of me :) I generate files on the VM
>>>> from /dev/zero during the outage of one server. Then I bring the downed
>>>> server back up, and it seems the files never sync. I'll keep on testing
>>>> today. I can't read much from the logs either :(. This morning both VMs
>>>> (one on the volume with self-healing and the other on the volume
>>>> without it) survived the second server outage (the first server was
>>>> down yesterday); while the file sizes are different, the VMs ran
>>>> without problems. But I had restarted them before bringing the second
>>>> gluster server down.
>>> Then there is no bug :-). It seems the files are already in sync
>>> according to the extended attributes you have pasted. How do you test
>>> whether the files are in sync or not?
>>>
>>> Pranith
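>>>
>>> For what it's worth, the check I rely on is the trusted.afr changelog
>>> xattrs you pasted: each trusted.afr.<volume>-client-N value is 12 bytes
>>> of pending data/metadata/entry counters, and all zeroes on both bricks
>>> means nothing is pending, i.e. the copies are considered in sync. Roughly:
>>>
>>> getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>> # trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000  <- no pending ops
>>> # trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000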
>>>>
>>>> So I'm a bit lost at the moment. I'll try to keep my tests ordered and
>>>> write here what happens.
>>>>
>>>>
>>>> 2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>
>>>>
>>>> On 08/07/2014 10:46 AM, Roman wrote:
>>>>> yes, they do.
>>>>>
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>
>>>>> root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>
>>>>> root at stor2:~# getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>
>>>>> root at stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>> I think the files are differing in size because of the sparse file
>>>> healing issue. Could you raise a bug, with steps to re-create it, for
>>>> this case where the size of the file increases after healing?
>>>>
>>>> Pranith
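>>>>
>>>> A quick way to show the sparseness difference in the bug report could be
>>>> to compare logical size with allocated blocks on each brick (just a
>>>> sketch, using the file from your paste):
>>>>
>>>> du -h --apparent-size /exports/pve1/1T/images/125/vm-125-disk-1.qcow2   # logical size
>>>> du -h /exports/pve1/1T/images/125/vm-125-disk-1.qcow2                   # allocated blocks
>>>> stat -c 'size=%s blocks=%b' /exports/pve1/1T/images/125/vm-125-disk-1.qcow2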
>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm at redhat.com>:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> | From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>> | To: "Roman" <romeo.r at gmail.com>
>>>>> | Cc: gluster-users at gluster.org, "Niels de Vos" <ndevos at redhat.com>, "Humble Chirammal" <hchiramm at redhat.com>
>>>>> | Sent: Wednesday, August 6, 2014 12:09:57 PM
>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
>>>>> |
>>>>> | Roman,
>>>>> | The file went into split-brain. I think we should do these tests with 3.5.2, where monitoring the heals is easier. Let me also come up with a document about how to do the testing you are trying to do.
>>>>> |
>>>>> | Humble/Niels,
>>>>> | Do we have debs available for 3.5.2? In 3.5.1 there was a packaging issue where /usr/bin/glfsheal was not packaged along with the deb. I think that should be fixed now as well?
>>>>> |
>>>>> Pranith,
>>>>>
>>>>> The 3.5.2 packages for debian are not available yet. We are co-ordinating internally to get them processed. I will update the list once they are available.
>>>>>
>>>>> --Humble
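>>>>>
>>>>> Once the debs land, an easy sanity check for the earlier glfsheal packaging issue is to ask dpkg which package owns the binary (it errors out if nothing ships it):
>>>>>
>>>>> dpkg -S /usr/bin/glfsheal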
>>>>> |
>>>>> | On 08/06/2014 11:52 AM, Roman wrote:
>>>>> | > good morning,
>>>>> | >
>>>>> | > root at stor1:~# getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>> | >
>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>> | >
>>>>> | >
>>>>> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >
>>>>> | >
>>>>> | > On 08/06/2014 11:30 AM, Roman wrote:
>>>>> | >> Also, this time the files are not the same!
>>>>> | >>
>>>>> | >> root at stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >> 32411360c53116b96a059f17306caeda  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>
>>>>> | >> root at stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | > What is the getfattr output?
>>>>> | >
>>>>> | > Pranith
>>>>> | >
>>>>> | >>
>>>>> | >>
>>>>> | >> 2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com>:
>>>>> | >>
>>>>> | >> Nope, it is not working. But this time it went a bit differently.
>>>>> | >>
>>>>> | >> root at gluster-client:~# dmesg
>>>>> | >> Segmentation fault
>>>>> | >>
>>>>> | >> I was not even able to start the VM after I had done the tests:
>>>>> | >>
>>>>> | >> Could not read qcow2 header: Operation not permitted
>>>>> | >>
>>>>> | >> And it seems it never starts to sync the files after the first
>>>>> | >> disconnect. The VM survives the first disconnect, but not the
>>>>> | >> second (I waited around 30 minutes). Also, I've got
>>>>> | >> network.ping-timeout: 2 in the volume settings, but the logs react
>>>>> | >> to the first disconnect in around 30 seconds; the second was
>>>>> | >> faster, 2 seconds.
>>>>> | >>
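>>>>> | >> For reference, this is how I set and verified that option (the volume name is a placeholder):
>>>>> | >>
>>>>> | >> gluster volume set <VOLNAME> network.ping-timeout 2
>>>>> | >> gluster volume info <VOLNAME>   # shows the value under 'Options Reconfigured'
>>>>> | >>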
>>>>> | >> The reaction was also different:
>>>>> | >>
>>>>> | >> the slower one:
>>>>> | >> [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv] 0-glusterfs: readv failed (Connection timed out)
>>>>> | >> [2014-08-05 13:26:19.558485] W [socket.c:1962:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:24007)
>>>>> | >> [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
>>>>> | >> [2014-08-05 13:26:21.281474] W [socket.c:1962:__socket_proto_state_machine] 0-HA-fast-150G-PVE1-client-0: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:49153)
>>>>> | >> [2014-08-05 13:26:21.281507] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-0: disconnected
>>>>> | >>
>>>>> | >> the fast one:
>>>>> | >> [2014-08-05 12:52:44.607389] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 has not responded in the last 2 seconds, disconnecting.
>>>>> | >> [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
>>>>> | >> [2014-08-05 12:52:44.607585] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2014-08-05 12:52:42.463881 (xid=0x381883x)
>>>>> | >> [2014-08-05 12:52:44.607604] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-HA-fast-150G-PVE1-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
>>>>> | >> [2014-08-05 12:52:44.607736] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>>>> | >> [2014-08-05 12:52:44.607753] W [client-handshake.c:276:client_ping_cbk] 0-HA-fast-150G-PVE1-client-1: timer must have expired
>>>>> | >> [2014-08-05 12:52:44.607776] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-1: disconnected
>>>>> | >>
>>>>> | >>
>>>>> | >>
>>>>> | >> I've got SSD disks (just for information).
>>>>> | >> Should I go and give 3.5.2 a try?
>>>>> | >>
>>>>> | >>
>>>>> | >>
>>>>> | >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>
>>>>> | >> Please reply along with gluster-users :-). Maybe you are hitting
>>>>> | >> 'reply' instead of 'reply all'?
>>>>> | >>
>>>>> | >> Pranith
>>>>> | >>
>>>>> | >> On 08/05/2014 03:35 PM, Roman wrote:
>>>>> | >>> To make sure and keep things clean, I've created another VM with
>>>>> | >>> raw format, and I'm going to repeat those steps. So now I've got
>>>>> | >>> two VMs, one with qcow2 format and the other with raw format. I
>>>>> | >>> will send another e-mail shortly.
>>>>> | >>>
>>>>> | >>>
>>>>> | >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>>
>>>>> | >>>
>>>>> | >>> On 08/05/2014 03:07 PM, Roman wrote:
>>>>> | >>>> really, it seems like the same file
>>>>> | >>>>
>>>>> | >>>> stor1:
>>>>> | >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>
>>>>> | >>>> stor2:
>>>>> | >>>> a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>
>>>>> | >>>> One thing I've seen in the logs: somehow Proxmox VE is connecting
>>>>> | >>>> to the servers with the wrong version?
>>>>> | >>>> [2014-08-05 09:23:45.218550] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>> | >>> It is the rpc (over-the-network data structures) version, which
>>>>> | >>> has not changed at all since 3.3, so that's not a problem. So what
>>>>> | >>> is the conclusion? Is your test case working now or not?
>>>>> | >>>
>>>>> | >>> Pranith
>>>>> | >>>
>>>>> | >>>> but if I issue:
>>>>> | >>>> root at pve1:~# glusterfs -V
>>>>> | >>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
>>>>> | >>>> it seems ok.
>>>>> | >>>>
>>>>> | >>>> Meanwhile the servers use 3.4.4:
>>>>> | >>>> [2014-08-05 09:23:45.117875] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 (version: 3.4.4)
>>>>> | >>>> [2014-08-05 09:23:49.103035] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
>>>>> | >>>>
>>>>> | >>>> if this could be the reason, of course.
>>>>> | >>>> I did restart the Proxmox VE yesterday (just for information)
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>>>
>>>>> | >>>>
>>>>> | >>>> On 08/05/2014 02:33 PM, Roman wrote:
>>>>> | >>>>> Waited long enough for now, still different sizes and no logs
>>>>> | >>>>> about healing :(
>>>>> | >>>>>
>>>>> | >>>>> stor1
>>>>> | >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>> | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>> | >>>>>
>>>>> | >>>>> root at stor1:~# du -sh /exports/fast-test/150G/images/127/
>>>>> | >>>>> 1.2G    /exports/fast-test/150G/images/127/
>>>>> | >>>>>
>>>>> | >>>>> stor2
>>>>> | >>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>> | >>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>> | >>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>> | >>>>>
>>>>> | >>>>> root at stor2:~# du -sh /exports/fast-test/150G/images/127/
>>>>> | >>>>> 1.4G    /exports/fast-test/150G/images/127/
>>>>> | >>>> According to the changelogs, the file doesn't need any healing.
>>>>> | >>>> Could you stop the operations on the VMs and take an md5sum on
>>>>> | >>>> both these machines?
>>>>> | >>>>
>>>>> | >>>> Pranith
>>>>> | >>>>
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>> | >>>>>
>>>>> | >>>>>
>>>>> | >>>>> On 08/05/2014 02:06 PM, Roman wrote:
>>>>> | >>>>>> Well, it seems like it doesn't see that changes were made to
>>>>> | >>>>>> the volume? I created two files, 200 and 100 MB (from
>>>>> | >>>>>> /dev/zero), after I disconnected the first brick. Then I
>>>>> | >>>>>> connected it back and got these logs:
>>>>> | >>>>>>
>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>> | >>>>>> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
>>>>> | >>>>>> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
>>>>> | >>>>>> [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>> | >>>>>> [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
>>>>> | >>>>>> [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>>> | >>>>>> [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
>>>>> | >>>>>>
>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>> | >>>>>> This line seems weird to me, tbh.
>>>>> | >>>>>> I do not see any traffic on the switch interfaces between the
>>>>> | >>>>>> gluster servers, which
>>>>>
> ...
>
> [Message shown only partially]
>
>
>
>
> --
> Best regards,
> Roman.