[Gluster-users] libgfapi failover problem on replica bricks

Roman romeo.r at gmail.com
Thu Aug 7 06:53:29 UTC 2014


OK, then I hope we will be able to test it in two weeks. Thanks for your
time and patience.


2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:

>
> On 08/07/2014 12:17 PM, Roman wrote:
>
> Well, one thing is definitely true: if there is no healing daemon running,
> I'm not able to start the VM after an outage. It seems the qcow2 file gets
> corrupted (KVM is unable to read its header).
>
> We shall see this again once I have the document with all the steps that
> need to be carried out :-)
>
> Pranith
>
>
>
> 2014-08-07 9:35 GMT+03:00 Roman <romeo.r at gmail.com>:
>
>> > This should not happen if you do the writes, let's say, from
>> '/dev/urandom' instead of '/dev/zero'.
>>
>>  Somewhere deep inside me I thought so! Zero is zero :)
>>
>>  > I will provide you with a document for testing this issue properly. I
>> have a lot going on in my day job, so I'm not getting enough time to write
>> it up. Since the weekend is approaching, I should get some time then and
>> will send you the document over the weekend.
>>
>>  Thank you very much. I'll wait. My vacation starts tomorrow and I'll be
>> away for two weeks, so there is no rush.
>>
>>
>>
>>
>>  2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>>>
>>> On 08/07/2014 11:48 AM, Roman wrote:
>>>
>>> How can they be in sync if they differ in size? And why, then, is the VM
>>> not able to survive a gluster outage? I really want to use GlusterFS in
>>> our production environment for infrastructure virtualization because of
>>> its simple setup, but at the moment I can't. Maybe you have a testing
>>> agenda? Or could you list the steps for the right tests, so that our VMs
>>> would survive the outages?
>>>
>>> This is because of sparse files.
>>> http://en.wikipedia.org/wiki/Sparse_file
>>> This should not happen if you do the writes, let's say, from '/dev/urandom'
>>> instead of '/dev/zero'.
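>>>
>>> For example, a minimal way to generate test data that cannot be stored
>>> sparsely (file names and sizes below are only illustrative):
>>>
>>>   # inside the VM: random data cannot be written as holes
>>>   dd if=/dev/urandom of=/root/heal-test.bin bs=1M count=200
>>>   # by contrast, zeros may end up as a sparse region on the brick
>>>   dd if=/dev/zero of=/root/zero-test.bin bs=1M count=200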
>>>
>>> I will provide you with a document for testing this issue properly. I
>>> have a lot going on in my day job, so I'm not getting enough time to write
>>> it up. Since the weekend is approaching, I should get some time then and
>>> will send you the document over the weekend.
>>>
>>> Pranith
>>>
>>>
>>>  We would like to be sure that when one of the storage servers is down,
>>> the VMs keep running - that part is OK, we can see it.
>>> We would like to be sure that the data is synced after the server comes
>>> back up - we can't see that at the moment.
>>> We would like to be sure that the VMs fail over to the second storage
>>> server during the outage - we can't see that at the moment.
>>> :(
>>>
>>>
>>> 2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>>>
>>>> On 08/07/2014 11:33 AM, Roman wrote:
>>>>
>>>> The file size increases because of me :) I generate files on the VM from
>>>> /dev/zero during the outage of one server. Then I bring the downed server
>>>> back up, and it seems the files never sync. I'll keep testing today. I
>>>> can't read much from the logs either :(. This morning both VMs (one on the
>>>> volume with self-healing and the other on the volume without it) survived
>>>> the second server's outage (the first server was down yesterday); although
>>>> the file sizes differ, the VMs ran without problems. But I had restarted
>>>> them before bringing the second gluster server down.
>>>>
>>>> Then there is no bug :-). It seems the files are already in sync
>>>> according to the extended attributes you have pasted. How do you test
>>>> whether the files are in sync or not?
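>>>>
>>>> (As a rough check - the brick path is the one from this thread - the
>>>> replicas can be considered in sync when the trusted.afr.* changelogs are
>>>> all zero on both bricks and the checksums match; plain 'du' can differ
>>>> simply because of sparseness. On each storage server, against the brick
>>>> path rather than the mount:
>>>>
>>>>   getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>   md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>   du -sh --apparent-size /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>> )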
>>>>
>>>> Pranith
>>>>
>>>>
>>>>  So I'm a bit lost at the moment. I'll try to keep my tests ordered and
>>>> write here what happens.
>>>>
>>>>
>>>> 2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>
>>>>>
>>>>> On 08/07/2014 10:46 AM, Roman wrote:
>>>>>
>>>>> yes, they do.
>>>>>
>>>>>  getfattr: Removing leading '/' from absolute path names
>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>
>>>>>  root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor1:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor1:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 1.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>
>>>>>
>>>>>
>>>>>  root at stor2:~# getfattr -d -m. -e hex /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000
>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000
>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa
>>>>>
>>>>>  root at stor2:~# md5sum /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> c117d73c9f8a2e09ef13da31f7225fa6  /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> root at stor2:~# du -sh /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>> 2.6G    /exports/pve1/1T/images/125/vm-125-disk-1.qcow2
>>>>>
>>>>>  I think the files differ in size because of the sparse-file healing
>>>>> issue. Could you raise a bug with steps to re-create this issue where the
>>>>> size of the file increases after healing?
>>>>>
>>>>> Pranith
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  2014-08-06 12:49 GMT+03:00 Humble Chirammal <hchiramm at redhat.com>:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> | From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>> | To: "Roman" <romeo.r at gmail.com>
>>>>>> | Cc: gluster-users at gluster.org, "Niels de Vos" <ndevos at redhat.com>, "Humble Chirammal" <hchiramm at redhat.com>
>>>>>> | Sent: Wednesday, August 6, 2014 12:09:57 PM
>>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on replica bricks
>>>>>> |
>>>>>> | Roman,
>>>>>> |      The file went into split-brain. I think we should do these tests
>>>>>> | with 3.5.2, where monitoring the heals is easier. Let me also come up
>>>>>> | with a document about how to do the testing you are trying to do.
>>>>>> |
>>>>>> | Humble/Niels,
>>>>>> |      Do we have debs available for 3.5.2? In 3.5.1 there was a packaging
>>>>>> | issue where /usr/bin/glfsheal was not packaged along with the deb. I
>>>>>> | think that should be fixed now as well?
>>>>>> |
>>>>>>  Pranith,
>>>>>>
>>>>>> The 3.5.2 packages for Debian are not available yet. We are coordinating
>>>>>> internally to get them processed. I will update the list once they are
>>>>>> available.
>>>>>>
>>>>>> --Humble
>>>>>> |
>>>>>> | On 08/06/2014 11:52 AM, Roman wrote:
>>>>>> | > good morning,
>>>>>> | >
>>>>>> | > root at stor1:~# getfattr -d -m. -e hex
>>>>>> | > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>>> | >
>>>>>> | > getfattr: Removing leading '/' from absolute path names
>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
>>>>>> | > trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449
>>>>>> | >
>>>>>> | >
>>>>>> | >
>>>>>> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >
>>>>>> | >
>>>>>> | >     On 08/06/2014 11:30 AM, Roman wrote:
>>>>>> | >>     Also, this time the files are not the same!
>>>>>> | >>
>>>>>> | >>     root at stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >>     32411360c53116b96a059f17306caeda  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >>
>>>>>> | >>     root at stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >>     65b8a6031bcb6f5fb3a11cb1e8b1c9c9  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >     What is the getfattr output?
>>>>>> | >
>>>>>> | >     Pranith
>>>>>> | >
>>>>>> | >>
>>>>>> | >>         2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com>:
>>>>>> | >>
>>>>>> | >>         Nope, it is not working. But this time it went a bit differently.
>>>>>> | >>
>>>>>> | >>         root at gluster-client:~# dmesg
>>>>>> | >>         Segmentation fault
>>>>>> | >>
>>>>>> | >>         I was not even able to start the VM after I had done the tests:
>>>>>> | >>
>>>>>> | >>         Could not read qcow2 header: Operation not permitted
>>>>>> | >>
>>>>>> | >>         And it seems it never starts to sync the files after the first
>>>>>> | >>         disconnect. The VM survives the first disconnect, but not the
>>>>>> | >>         second (I waited around 30 minutes). Also, I've got
>>>>>> | >>         network.ping-timeout: 2 in the volume settings, but the logs
>>>>>> | >>         reacted to the first disconnect only after about 30 seconds;
>>>>>> | >>         the second was faster, 2 seconds (see the note after the log
>>>>>> | >>         excerpts below).
>>>>>> | >>
>>>>>> | >>         The reaction was different as well.
>>>>>> | >>
>>>>>> | >>         The slower one:
>>>>>> | >>         [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv] 0-glusterfs: readv failed (Connection timed out)
>>>>>> | >>         [2014-08-05 13:26:19.558485] W [socket.c:1962:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:24007)
>>>>>> | >>         [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
>>>>>> | >>         [2014-08-05 13:26:21.281474] W [socket.c:1962:__socket_proto_state_machine] 0-HA-fast-150G-PVE1-client-0: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:49153)
>>>>>> | >>         [2014-08-05 13:26:21.281507] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-0: disconnected
>>>>>> | >>
>>>>>> | >>         The fast one:
>>>>>> | >>         [2014-08-05 12:52:44.607389] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 has not responded in the last 2 seconds, disconnecting.
>>>>>> | >>         [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
>>>>>> | >>         [2014-08-05 12:52:44.607585] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2014-08-05 12:52:42.463881 (xid=0x381883x)
>>>>>> | >>         [2014-08-05 12:52:44.607604] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-HA-fast-150G-PVE1-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
>>>>>> | >>         [2014-08-05 12:52:44.607736] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>>>>> | >>         [2014-08-05 12:52:44.607753] W [client-handshake.c:276:client_ping_cbk] 0-HA-fast-150G-PVE1-client-1: timer must have expired
>>>>>> | >>         [2014-08-05 12:52:44.607776] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-1: disconnected
>>>>>> | >>
>>>>>> | >>         I've got SSD disks (just for information).
>>>>>> | >>         Should I go and give 3.5.2 a try?
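>>>>>> | >>
>>>>>> | >>         (Note on the ping-timeout observation above: network.ping-timeout
>>>>>> | >>         is a per-volume option, so one way to confirm and change it is,
>>>>>> | >>         for example:
>>>>>> | >>
>>>>>> | >>           gluster volume info HA-fast-150G-PVE1 | grep ping-timeout
>>>>>> | >>           gluster volume set HA-fast-150G-PVE1 network.ping-timeout 2
>>>>>> | >>
>>>>>> | >>         The slower ~30-second reaction was on the management connection
>>>>>> | >>         to port 24007, which is likely not governed by this volume
>>>>>> | >>         option.)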
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>         2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >>
>>>>>> | >>             Please reply along with gluster-users :-). Maybe you are
>>>>>> | >>             hitting 'reply' instead of 'reply all'?
>>>>>> | >>
>>>>>> | >>             Pranith
>>>>>> | >>
>>>>>> | >>             On 08/05/2014 03:35 PM, Roman wrote:
>>>>>> | >>>             To be sure and start clean, I've created another VM with
>>>>>> | >>>             raw format and am going to repeat those steps. So now I've
>>>>>> | >>>             got two VMs, one with qcow2 format and the other with raw
>>>>>> | >>>             format. I will send another e-mail shortly.
>>>>>> | >>>
>>>>>> | >>>
>>>>>> | >>>             2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >>>
>>>>>> | >>>                 On 08/05/2014 03:07 PM, Roman wrote:
>>>>>> | >>>>                 Really, it seems like the same file:
>>>>>> | >>>>
>>>>>> | >>>>                 stor1:
>>>>>> | >>>>                 a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >>>>
>>>>>> | >>>>                 stor2:
>>>>>> | >>>>                 a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >>>>
>>>>>> | >>>>                 One thing I've seen in the logs: somehow Proxmox VE
>>>>>> | >>>>                 seems to connect to the servers with the wrong version?
>>>>>> | >>>>                 [2014-08-05 09:23:45.218550] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>> | >>>                 It is the rpc (over-the-network data structures)
>>>>>> | >>>                 version, which has not changed at all since 3.3, so
>>>>>> | >>>                 that's not a problem. So what is the conclusion? Is
>>>>>> | >>>                 your test case working now or not?
>>>>>> | >>>
>>>>>> | >>>                 Pranith
>>>>>> | >>>
>>>>>> | >>>>                 But if I issue:
>>>>>> | >>>>                 root at pve1:~# glusterfs -V
>>>>>> | >>>>                 glusterfs 3.4.4 built on Jun 28 2014 03:44:57
>>>>>> | >>>>                 it seems OK.
>>>>>> | >>>>
>>>>>> | >>>>                 The servers use 3.4.4 meanwhile:
>>>>>> | >>>>                 [2014-08-05 09:23:45.117875] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 (version: 3.4.4)
>>>>>> | >>>>                 [2014-08-05 09:23:49.103035] I [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: accepted client from stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 (version: 3.4.4)
>>>>>> | >>>>
>>>>>> | >>>>                 If this could be the reason, of course.
>>>>>> | >>>>                 I did restart the Proxmox VE yesterday (just for information).
>>>>>> | >>>>
>>>>>> | >>>>
>>>>>> | >>>>                 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>  | >>>>
>>>>>> | >>>>
>>>>>> | >>>>                     On 08/05/2014 02:33 PM, Roman wrote:
>>>>>> | >>>>>                     Waited long enough for now, still different
>>>>>> | >>>>>                     sizes and no logs about healing :(
>>>>>> | >>>>>
>>>>>> | >>>>>                     stor1
>>>>>> | >>>>>                     # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>> | >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>> | >>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>> | >>>>>
>>>>>> | >>>>>                     root at stor1:~# du -sh /exports/fast-test/150G/images/127/
>>>>>> | >>>>>                     1.2G  /exports/fast-test/150G/images/127/
>>>>>> | >>>>>
>>>>>> | >>>>>                     stor2
>>>>>> | >>>>>                     # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> | >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>> | >>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>> | >>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>> | >>>>>
>>>>>> | >>>>>                     root at stor2:~# du -sh /exports/fast-test/150G/images/127/
>>>>>> | >>>>>                     1.4G  /exports/fast-test/150G/images/127/
>>>>>> | >>>>                     According to the changelogs, the file doesn't
>>>>>> | >>>>                     need any healing. Could you stop the operations
>>>>>> | >>>>                     on the VMs and take md5sum on both these machines?
>>>>>> | >>>>
>>>>>> | >>>>                     Pranith
>>>>>> | >>>>>
>>>>>> | >>>>>
>>>>>> | >>>>>                     2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >>>>>
>>>>>> | >>>>>                         On 08/05/2014 02:06 PM, Roman wrote:
>>>>>> | >>>>>> Well, it seems like it doesn't see that changes were made to the
>>>>>> | >>>>>> volume? I created two files, 200 MB and 100 MB (from /dev/zero),
>>>>>> | >>>>>> after I disconnected the first brick. Then I connected it back and
>>>>>> | >>>>>> got these logs:
>>>>>> | >>>>>>
>>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>>> | >>>>>> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
>>>>>> | >>>>>> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
>>>>>> | >>>>>> [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>> | >>>>>> [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
>>>>>> | >>>>>> [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>>>> | >>>>>> [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
>>>>>> | >>>>>>
>>>>>> | >>>>>> This line seems weird to me, to be honest:
>>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>>> | >>>>>>
>>>>>> | >>>>>> I do not see any traffic on the switch interfaces between the
>>>>>> | >>>>>> gluster servers, which means there is no syncing between them. I
>>>>>> | >>>>>> tried to ls -l the files on the client and the servers to trigger
>>>>>> | >>>>>> the healing, but seemingly without success. Should I wait more?
>>>>>> | >>>>> Yes, it should take around 10-15 minutes. Could you provide
>>>>>> | >>>>> 'getfattr -d -m. -e hex <file-on-brick>' on both the bricks?
>>>>>> | >>>>>
>>>>>> | >>>>> Pranith
>>>>>> | >>>>>>
>>>>>> | >>>>>>
>>>>>> | >>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >>>>>>
>>>>>> | >>>>>> On 08/05/2014 01:10 PM, Roman wrote:
>>>>>> | >>>>>>> Aha! For some reason I was not able to start the VM anymore;
>>>>>> | >>>>>>> Proxmox VE told me that it was not able to read the qcow2 header
>>>>>> | >>>>>>> because permission was denied for some reason. So I just deleted
>>>>>> | >>>>>>> that file and created a new VM. And the next message I got was
>>>>>> | >>>>>>> this:
>>>>>> | >>>>>> Seems like these are the messages where you took down the bricks
>>>>>> | >>>>>> before self-heal. Could you restart the run, waiting for
>>>>>> | >>>>>> self-heals to complete before taking down the next brick?
>>>>>> | >>>>>>
>>>>>> | >>>>>> Pranith
>>>>>> | >>>>>>
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>
>>>>>> | >>>>>>> [2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 60 ] [ 11 0 ] ]
>>>>>> | >>>>>>> [2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed on /images/124/vm-124-disk-1.qcow2
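>>>>>> | >>>>>>>
>>>>>> | >>>>>>> (A commonly used manual fix for this kind of split-brain on
>>>>>> | >>>>>>> 3.4.x, sketched here with assumed choices - stor2 is assumed to
>>>>>> | >>>>>>> hold the copy to discard, so double-check before deleting
>>>>>> | >>>>>>> anything:
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>   # on the brick whose copy you want to throw away
>>>>>> | >>>>>>>   rm /exports/fast-test/150G/images/124/vm-124-disk-1.qcow2
>>>>>> | >>>>>>>   # also remove its gfid hard link under the brick's .glusterfs
>>>>>> | >>>>>>>   # directory (.glusterfs/XX/YY/<full-gfid>, where XX and YY are
>>>>>> | >>>>>>>   # the first two byte pairs of the gfid shown by getfattr),
>>>>>> | >>>>>>>   # then stat the file from the client mount to trigger a heal
>>>>>> | >>>>>>>
>>>>>> | >>>>>>> Newer releases also have 'gluster volume heal <volname> info
>>>>>> | >>>>>>> split-brain' to list such files.)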
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>
>>>>>> | >>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >>>>>>>
>>>>>> | >>>>>>> I just responded to your earlier mail about how the log looks.
>>>>>> | >>>>>>> The log appears in the mount's logfile.
>>>>>> | >>>>>>>
>>>>>> | >>>>>>> Pranith
>>>>>> | >>>>>>>
>>>>>> | >>>>>>> On 08/05/2014 12:41 PM, Roman wrote:
>>>>>> | >>>>>>>> OK, so I've waited long enough, I think. There was no traffic at
>>>>>> | >>>>>>>> all on the switch ports between the servers, and I could not find
>>>>>> | >>>>>>>> any suitable log message about a completed self-heal (I waited
>>>>>> | >>>>>>>> about 30 minutes). I unplugged the other server's UTP cable this
>>>>>> | >>>>>>>> time and ended up in the same situation:
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>> root at gluster-test1:~# cat /var/log/dmesg
>>>>>> | >>>>>>>> -bash: /bin/cat: Input/output error
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>> brick logs:
>>>>>> | >>>>>>>> [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>> | >>>>>>>> [2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>> | >>>>>>>> [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
>>>>>> | >>>>>>>> [2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>  | >>>>>>>>
>>>>>> | >>>>>>>> Do you think it is possible for you to do these tests on the
>>>>>> | >>>>>>>> latest version, 3.5.2? 'gluster volume heal <volname> info' would
>>>>>> | >>>>>>>> give you that information in versions > 3.5.1. Otherwise you will
>>>>>> | >>>>>>>> have to check it from either the logs - there will be a self-heal
>>>>>> | >>>>>>>> completed message in the mount logs - or by observing
>>>>>> | >>>>>>>> 'getfattr -d -m. -e hex <image-file-on-bricks>'.
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>> Pranith
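>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>> (Concretely, one possible way to watch this - the volume name is
>>>>>> | >>>>>>>> the one from this thread, and the exact mount log filename depends
>>>>>> | >>>>>>>> on the mount point:
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>>   # 3.5.1 and later
>>>>>> | >>>>>>>>   gluster volume heal HA-fast-150G-PVE1 info
>>>>>> | >>>>>>>>   # 3.4.x: watch the fuse mount log for self-heal messages
>>>>>> | >>>>>>>>   grep -i self-heal /var/log/glusterfs/<mount-log>.log
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>> On the bricks, all-zero trusted.afr.* values from the getfattr
>>>>>> | >>>>>>>> command above mean there is nothing left to heal.)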
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>> On 08/05/2014 12:09 PM, Roman wrote:
>>>>>> | >>>>>>>>> OK, I understand. I will try this shortly. How can I be sure
>>>>>> | >>>>>>>>> that the healing process is done if I am not able to see its
>>>>>> | >>>>>>>>> status?
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>  | >>>>>>>>>
>>>>>> | >>>>>>>>> Mounts will do the healing, not the self-heal-daemon. The
>>>>>> | >>>>>>>>> problem, I feel, is that whichever process does the healing must
>>>>>> | >>>>>>>>> have the latest information about the good bricks in this
>>>>>> | >>>>>>>>> usecase. Since for the VM usecase the mounts should have the
>>>>>> | >>>>>>>>> latest information, we should let the mounts do the healing. If
>>>>>> | >>>>>>>>> the mount accesses the VM image, either through someone doing
>>>>>> | >>>>>>>>> operations inside the VM or through an explicit stat on the
>>>>>> | >>>>>>>>> file, it should do the healing.
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>> Pranith.
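>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>> (For instance - the mount path below is an assumption, since
>>>>>> | >>>>>>>>> Proxmox normally mounts gluster storage under /mnt/pve/<storage-id>
>>>>>> | >>>>>>>>> - an explicit access from the client would look like:
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>>   stat /mnt/pve/HA-fast-150G-PVE1/images/127/vm-127-disk-1.qcow2
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>> which makes the fuse mount look the file up on both bricks and
>>>>>> | >>>>>>>>> start a heal if the changelogs say one copy is stale.)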
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>> On 08/05/2014 10:39 AM, Roman wrote:
>>>>>> | >>>>>>>>>> Hmmm, you told me to turn it off. Did I understand something
>>>>>> | >>>>>>>>>> wrong? After I issued the command you sent me, I was not able
>>>>>> | >>>>>>>>>> to watch the healing process; it said it won't be healed
>>>>>> | >>>>>>>>>> because it's turned off.
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>> You didn't mention anything about self-healing. Did you wait
>>>>>> | >>>>>>>>>> until the self-heal was complete?
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>> Pranith
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>> On 08/04/2014 05:49 PM, Roman wrote:
>>>>>> | >>>>>>>>>>> Hi!
>>>>>> | >>>>>>>>>>> The result is pretty much the same. I set the switch port down
>>>>>> | >>>>>>>>>>> for the 1st server; that was OK. Then I brought it back up and
>>>>>> | >>>>>>>>>>> set the other server's port down, and that triggered an IO
>>>>>> | >>>>>>>>>>> error on two virtual machines: one with a local root FS but
>>>>>> | >>>>>>>>>>> network-mounted storage, and the other with a network root FS.
>>>>>> | >>>>>>>>>>> The 1st gave an error on copying to or from the mounted
>>>>>> | >>>>>>>>>>> network disk; the other gave me an error even for reading log
>>>>>> | >>>>>>>>>>> files:
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> cat: /var/log/alternatives.log: Input/output error
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> Then I reset the KVM VM and it told me there was no boot
>>>>>> | >>>>>>>>>>> device. Next I virtually powered it off and back on, and it
>>>>>> | >>>>>>>>>>> booted.
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> By the way, did I have to start/stop the volume?
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> >> Could you do the following and test it again?
>>>>>> | >>>>>>>>>>> >> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> >> Pranith
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
>>>>>> | >>>>>>>>>>>> Hello!
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> Facing the same problem as mentioned here:
>>>>>> | >>>>>>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> My setup is up and running, so I'm ready to help you back with
>>>>>> | >>>>>>>>>>>> feedback.
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> Setup:
>>>>>> | >>>>>>>>>>>> proxmox server as client
>>>>>> | >>>>>>>>>>>> 2 gluster physical servers
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> Server side and client side are both running glusterfs 3.4.4
>>>>>> | >>>>>>>>>>>> from the gluster repo at the moment.
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> The problem is (volume creation and mount are sketched after
>>>>>> | >>>>>>>>>>>> this list):
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> 1. created replica bricks.
>>>>>> | >>>>>>>>>>>> 2. mounted in proxmox (tried both proxmox ways: via the GUI and
>>>>>> | >>>>>>>>>>>>    via fstab with a backup volume line; by the way, while
>>>>>> | >>>>>>>>>>>>    mounting via fstab I'm unable to launch a VM without cache,
>>>>>> | >>>>>>>>>>>>    even though direct-io-mode is enabled in the fstab line).
>>>>>> | >>>>>>>>>>>> 3. installed a VM.
>>>>>> | >>>>>>>>>>>> 4. brought one brick down - OK.
>>>>>> | >>>>>>>>>>>> 5. brought it back up and waited until the sync was done.
>>>>>> | >>>>>>>>>>>> 6. brought the other brick down - got IO errors on the VM guest
>>>>>> | >>>>>>>>>>>>    and was not able to restore the VM after I reset it via the
>>>>>> | >>>>>>>>>>>>    host; it says "no bootable media". After I shut it down
>>>>>> | >>>>>>>>>>>>    (forced) and bring it back up, it boots.
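>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> (A sketch of step 1 and of the fstab variant of step 2, using
>>>>>> | >>>>>>>>>>>> the volume and brick names that appear later in this thread;
>>>>>> | >>>>>>>>>>>> the mount point and option list are only an example:
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>>   gluster volume create HA-fast-150G-PVE1 replica 2 \
>>>>>> | >>>>>>>>>>>>       stor1:/exports/fast-test/150G stor2:/exports/fast-test/150G
>>>>>> | >>>>>>>>>>>>   gluster volume start HA-fast-150G-PVE1
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>>   # /etc/fstab line on the proxmox node
>>>>>> | >>>>>>>>>>>>   stor1:/HA-fast-150G-PVE1 /mnt/pve/gluster glusterfs defaults,_netdev,backupvolfile-server=stor2,direct-io-mode=enable 0 0
>>>>>> | >>>>>>>>>>>> )
>>>>>> | >>>>>>>>>>>>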
>>>>>> | >>>>>>>>>>> Could you do the following and test it again?
>>>>>> | >>>>>>>>>>> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>> Pranith
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> Need help. Tried 3.4.3 and 3.4.4. Packages are still missing
>>>>>> | >>>>>>>>>>>> for 3.4.5 and 3.5.2 on Debian (3.5.1 always gives a healing
>>>>>> | >>>>>>>>>>>> error for some reason).
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> --
>>>>>> | >>>>>>>>>>>> Best regards,
>>>>>> | >>>>>>>>>>>> Roman.
>>>>>> | >>>>>>>>>>>>
>>>>>> | >>>>>>>>>>>> _______________________________________________
>>>>>> | >>>>>>>>>>>> Gluster-users mailing list
>>>>>> | >>>>>>>>>>>> Gluster-users at gluster.org
>>>>>> | >>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>>
>>>>>> | >>>>>>>>>>>                                             --
>>>>>> | >>>>>>>>>>>                                             Best regards,
>>>>>> | >>>>>>>>>>>                                             Roman.
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>>
>>>>>> | >>>>>>>>>>                                         --
>>>>>> | >>>>>>>>>>                                         Best regards,
>>>>>> | >>>>>>>>>>                                         Roman.
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>>
>>>>>> | >>>>>>>>>                                     --
>>>>>> | >>>>>>>>>                                     Best regards,
>>>>>> | >>>>>>>>>                                     Roman.
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>>
>>>>>> | >>>>>>>>                                 --
>>>>>> | >>>>>>>>                                 Best regards,
>>>>>> | >>>>>>>>                                 Roman.
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>
>>>>>> | >>>>>>>                             --
>>>>>> | >>>>>>>                             Best regards,
>>>>>> | >>>>>>>                             Roman.
>>>>>> | >>>>>>
>>>>>> | >>>>>>
>>>>>> | >>>>>>
>>>>>> | >>>>>>
>>>>>> | >>>>>>                         --
>>>>>> | >>>>>>                         Best regards,
>>>>>> | >>>>>>                         Roman.
>>>>>> | >>>>>
>>>>>> | >>>>>
>>>>>> | >>>>>
>>>>>> | >>>>>
>>>>>> | >>>>>                     --
>>>>>> | >>>>>                     Best regards,
>>>>>> | >>>>>                     Roman.
>>>>>> | >>>>
>>>>>> | >>>>
>>>>>> | >>>>
>>>>>> | >>>>
>>>>>> | >>>>                 --
>>>>>> | >>>>                 Best regards,
>>>>>> | >>>>                 Roman.
>>>>>> | >>>
>>>>>> | >>>
>>>>>> | >>>
>>>>>> | >>>
>>>>>> | >>>             --
>>>>>> | >>>             Best regards,
>>>>>> | >>>             Roman.
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>         --
>>>>>> | >>         Best regards,
>>>>>> | >>         Roman.
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>
>>>>>> | >>     --
>>>>>> | >>     Best regards,
>>>>>> | >>     Roman.
>>>>>> | >
>>>>>> | >
>>>>>> | >
>>>>>> | >
>>>>>> | > --
>>>>>> | > Best regards,
>>>>>> | > Roman.
>>>>>> |
>>>>>> |
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Best regards,
>>>>> Roman.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> Best regards,
>>>> Roman.
>>>>
>>>>
>>>>
>>>
>>>
>>>  --
>>> Best regards,
>>> Roman.
>>>
>>>
>>>
>>
>>
>>  --
>> Best regards,
>> Roman.
>>
>
>
>
>  --
> Best regards,
> Roman.
>
>
>


-- 
Best regards,
Roman.