[Gluster-users] libgfapi failover problem on replica bricks

Tue Sep 2 15:01:25 UTC 2014

Same here.
But when I wrote my first message, it never started to heal or sync
at all :)
Now it runs very smoothly, except for logging, which I will check tomorrow.
Thanks for the feedback, though!

2014-09-02 17:20 GMT+03:00 Peter Linder <peter at fiberdirekt.se>:

>  In my setup, proxmox does have a glusterfs mount but it is for
> management purposes only, i.e. creating images and such. The real business is
> done with libgfapi, which means that the kvm process itself is the gluster
> client. It will most certainly trigger a self-heal by itself, so the self-heal
> daemon won't pick it up, and it doesn't have anywhere to log that I
> know of.
>
> That being said, glusterfs has always recovered nicely whenever I have
> lost and recovered a server, but the healing seems to need an hour or so
> based on cpu and network usage graphs....
>
>
>
>
> On 9/1/2014 9:26 AM, Roman wrote:
>
> Hmm, I don't know how, but both VMs survived the second server outage :)
> Still no message about heal completion anywhere :)
>
>
> 2014-09-01 10:13 GMT+03:00 Roman <romeo.r at gmail.com>:
>
>> The mount is on the proxmox machine.
>>
>>  Here are the logs from disconnection until reconnection:
>>
>>
>> [2014-09-01 06:19:38.059383] W [socket.c:522:__socket_rwv] 0-glusterfs:
>> readv on 10.250.0.1:24007 failed (Connection timed out)
>> [2014-09-01 06:19:40.338393] W [socket.c:522:__socket_rwv]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: readv on 10.250.0.1:49159 failed
>> (Connection timed out)
>>  [2014-09-01 06:19:40.338447] I [client.c:2229:client_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: disconnected from 10.250.0.1:49159.
>> Client process will keep trying to connect to glusterd until brick's port
>> is available
>> [2014-09-01 06:19:49.196768] E [socket.c:2161:socket_connect_finish]
>> 0-glusterfs: connection to 10.250.0.1:24007 failed (No route to host)
>> [2014-09-01 06:20:05.565444] E [socket.c:2161:socket_connect_finish]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: connection to 10.250.0.1:24007
>> failed (No route to host)
>> [2014-09-01 06:23:26.607180] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: changing port to 49159 (from 0)
>> [2014-09-01 06:23:26.608032] I
>> [client-handshake.c:1677:select_server_supported_programs]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: Using Program GlusterFS 3.3, Num
>> (1298437), Version (330)
>> [2014-09-01 06:23:26.608395] I
>> [client-handshake.c:1462:client_setvolume_cbk]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159,
>> attached to remote volume '/exports/HA-2TB-TT-Proxmox-cluster/2TB'.
>> [2014-09-01 06:23:26.608420] I
>> [client-handshake.c:1474:client_setvolume_cbk]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: Server and Client lk-version numbers
>> are not same, reopening the fds
>> [2014-09-01 06:23:26.608606] I
>> [client-handshake.c:450:client_set_lk_version_cbk]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: Server lk version = 1
>> [2014-09-01 06:23:40.604979] I [glusterfsd-mgmt.c:1307:mgmt_getspec_cbk]
>> 0-glusterfs: No change in volfile, continuing
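To put a number on the outage in the client log above, the timestamps of the "disconnected from" and "Connected to" records can be subtracted. This is only a rough sketch: the function name `outage_seconds` is made up for illustration (it is not a Gluster tool), and it assumes records keep the bracketed timestamp format and the wording shown in this log.

```python
import re
from datetime import datetime

# Timestamp at the start of each GlusterFS log record, as seen in the log above.
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def outage_seconds(log_text, peer):
    """Seconds between 'disconnected from <peer>' and the next 'Connected to <peer>'."""
    # Drop mail quote markers and re-join wrapped lines into one stream.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    # Split into records; each record starts at a bracketed date.
    records = re.split(r"(?=\[\d{4}-\d{2}-\d{2} )", flat)
    down = up = None
    for rec in records:
        m = STAMP.match(rec)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if down is None and "disconnected from " + peer in rec:
            down = when
        elif down is not None and "Connected to " + peer in rec:
            up = when
            break
    if down is None or up is None:
        raise ValueError("no disconnect/reconnect pair found for " + peer)
    return (up - down).total_seconds()


# Two records copied from the client log above, quote markers included.
sample = """\
>>  [2014-09-01 06:19:40.338447] I [client.c:2229:client_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: disconnected from 10.250.0.1:49159.
>> [2014-09-01 06:23:26.608395] I
>> [client-handshake.c:1462:client_setvolume_cbk]
>> 0-HA-2TB-TT-Proxmox-cluster-client-0: Connected to 10.250.0.1:49159,
"""

print(round(outage_seconds(sample, "10.250.0.1:49159"), 1))  # 226.3
```

For this log the brick was unreachable from the client for a bit under four minutes.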
>>
>>  There is no healing traffic now either. I could try disconnecting the
>> second server now to see whether it fails over. I don't really believe it
>> will :(
>>
>>  Here are some logs from the stor1 server (the one I disconnected):
>>  root at stor1:~# cat
>> /var/log/glusterfs/bricks/exports-HA-2TB-TT-Proxmox-cluster-2TB.log
>> [2014-09-01 06:19:26.403323] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:26.403399] I [server-helpers.c:289:do_fd_cleanup]
>> 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on
>> /images/112/vm-112-disk-1.raw
>> [2014-09-01 06:19:26.403486] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:29.475318] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>  [2014-09-01 06:19:29.475373] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:36.963318] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:36.963373] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:40.419298] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:40.419355] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:42.531310] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:19:42.531368] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>  [2014-09-01 06:23:25.088518] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> sisemon-141844-2014/08/28-19:27:19:824141-HA-2TB-TT-Proxmox-cluster-client-0-0-1
>> (version: 3.5.2)
>> [2014-09-01 06:23:25.532734] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> stor2-22775-2014/08/28-19:26:34:786262-HA-2TB-TT-Proxmox-cluster-client-0-0-1
>> (version: 3.5.2)
>> [2014-09-01 06:23:26.608074] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> pve1-289547-2014/08/28-19:27:22:605477-HA-2TB-TT-Proxmox-cluster-client-0-0-1
>> (version: 3.5.2)
>> [2014-09-01 06:23:27.187556] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> pve1-298005-2014/08/28-19:41:19:7269-HA-2TB-TT-Proxmox-cluster-client-0-0-1
>> (version: 3.5.2)
>> [2014-09-01 06:23:27.213890] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> stor2-22777-2014/08/28-19:26:34:791148-HA-2TB-TT-Proxmox-cluster-client-0-0-1
>> (version: 3.5.2)
>> [2014-09-01 06:23:31.222654] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-1
>> (version: 3.5.2)
>> [2014-09-01 06:23:52.591365] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>  [2014-09-01 06:23:52.591447] W [inodelk.c:392:pl_inodelk_log_cleanup]
>> 0-HA-2TB-TT-Proxmox-cluster-server: releasing lock on
>> 14f70955-5e1e-4499-b66b-52cd50892315 held by {client=0x7f2494001ed0, pid=0
>> lk-owner=bc3ddbdbae7f0000}
>> [2014-09-01 06:23:52.591568] I [server-helpers.c:289:do_fd_cleanup]
>> 0-HA-2TB-TT-Proxmox-cluster-server: fd cleanup on
>> /images/124/vm-124-disk-1.qcow2
>> [2014-09-01 06:23:52.591679] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> pve1-494566-2014/08/29-01:00:13:257498-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:23:58.709444] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> (version: 3.5.2)
>> [2014-09-01 06:24:00.741542] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>  [2014-09-01 06:24:00.741598] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> stor1-3975-2014/09/01-06:23:58:673930-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:30:06.010819] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> (version: 3.5.2)
>> [2014-09-01 06:30:08.056059] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>  [2014-09-01 06:30:08.056127] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> stor1-4030-2014/09/01-06:30:05:976735-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:36:54.307743] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> (version: 3.5.2)
>> [2014-09-01 06:36:56.340078] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>  [2014-09-01 06:36:56.340122] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> stor1-4077-2014/09/01-06:36:54:289911-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> [2014-09-01 06:46:53.601517] I [server-handshake.c:575:server_setvolume]
>> 0-HA-2TB-TT-Proxmox-cluster-server: accepted client from
>> stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>> (version: 3.5.2)
>> [2014-09-01 06:46:55.624705] I [server.c:520:server_rpc_notify]
>> 0-HA-2TB-TT-Proxmox-cluster-server: disconnecting connectionfrom
>> stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>  [2014-09-01 06:46:55.624793] I [client_t.c:417:gf_client_unref]
>> 0-HA-2TB-TT-Proxmox-cluster-server: Shutting down connection
>> stor2-6891-2014/09/01-06:46:53:583529-HA-2TB-TT-Proxmox-cluster-client-0-0-0
>>
>>  The last 2 lines are pretty unclear. Why did it disconnect?
>>
>>
>>
>>
>> 2014-09-01 9:41 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>>
>>> On 09/01/2014 12:08 PM, Roman wrote:
>>>
>>> Well, as far as I can tell, the VMs are not much affected by the healing
>>> process. At least the munin server, which runs under pretty high load (the
>>> load average rarely goes below 0.9 :) ), had no problems. To create some
>>> more load I copied a 590 MB file on the VM's disk. It took 22 seconds,
>>> which is ca. 27 MB/s or 214 Mbps.
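The copy figures above can be checked with a few lines of arithmetic. This is a small sketch, not a benchmark: it takes 1 MB as 8 Mbit and compares the result with the 1 Gbps client interface mentioned in this thread, here treated as 1000 Mbit/s. The helper name `throughput` is made up for illustration.

```python
def throughput(size_mb, seconds):
    """Return (megabytes per second, megabits per second) for one copy."""
    mb_per_s = size_mb / seconds
    mbit_per_s = mb_per_s * 8  # 8 bits per byte
    return mb_per_s, mbit_per_s


mb_s, mbit_s = throughput(590, 22)   # 590 MB copied in 22 seconds
link_mbit = 1000                     # assumed capacity of the 1 Gbps client link
share = mbit_s / link_mbit

print(f"{mb_s:.1f} MB/s, {mbit_s:.1f} Mbit/s, {share:.1%} of the link")
# 26.8 MB/s, 214.5 Mbit/s, 21.5% of the link
```

So the copy used roughly a fifth of the client link, which suggests the link itself was not the limiting factor in this test.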
>>>
>>>  Servers are connected via a 10 Gbit network. The Proxmox client is connected
>>> to the server through a separate 1 Gbps interface. We are thinking of moving it
>>> to 10 Gbps as well.
>>>
>>>  Here is some heal info, which is pretty confusing.
>>>
>>>  Right after the first server restored its connection, it was pretty clear:
>>>
>>>  root at stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
>>> Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>> /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
>>> Number of entries: 1
>>>
>>>  Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>> /images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
>>> /images/112/vm-112-disk-1.raw - Possibly undergoing heal
>>> Number of entries: 2
>>>
>>>
>>>  Some time later it says:
>>>  root at stor1:~# gluster volume heal HA-2TB-TT-Proxmox-cluster info
>>> Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>> Number of entries: 0
>>>
>>>  Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
>>> Number of entries: 0
>>>
>>>  while I can still see traffic between the servers, and there were still no
>>> messages about completion of the healing process.
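When polling heal state repeatedly, it can help to turn the `gluster volume heal <volume> info` text into something comparable between runs. The sketch below parses only the output layout shown above ("Brick ..." header, one path per line, "Number of entries: N"). The function name `pending_heals` is hypothetical, and other GlusterFS versions may print a different layout. As noted above, an empty list did not coincide with the end of inter-server traffic here, so zero entries should not be read as proof that healing has finished.

```python
def pending_heals(heal_info):
    """Map each brick to the list of paths reported under it."""
    bricks = {}
    current = None
    for raw in heal_info.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quote markers
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = []
        elif not line or line.startswith("Number of entries:"):
            continue
        elif current is not None:
            # Drop a trailing status note such as " - Possibly undergoing heal".
            bricks[current].append(line.rsplit(" - ", 1)[0])
    return bricks


# Output copied from the first heal info run above.
sample_heal = """\
Brick stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
/images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
Number of entries: 1

Brick stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB/
/images/124/vm-124-disk-1.qcow2 - Possibly undergoing heal
/images/112/vm-112-disk-1.raw - Possibly undergoing heal
Number of entries: 2
"""

for brick, paths in pending_heals(sample_heal).items():
    print(brick, len(paths), paths)
```

A path that itself contains " - " would be cut short by this parser, so treat it as a starting point only.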
>>>
>>>  On which machine do we have the mount?
>>>
>>> Pranith
>>>
>>>
>>>
>>>
>>> 2014-08-29 10:02 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>
>>> :
>>>
>>>>  Wow, this is great news! Thanks a lot for sharing the results :-). Did
>>>> you get a chance to test the performance of the applications in the vm
>>>> during self-heal?
>>>> May I know more about your use case? i.e. How many vms and what is the
>>>> avg size of each vm etc?
>>>>
>>>> Pranith
>>>>
>>>>
>>>> On 08/28/2014 11:27 PM, Roman wrote:
>>>>
>>>> Here are the results.
>>>> 1. I still have a problem with log rotation: logs are being written to
>>>> the .log.1 file, not the .log file. Any hints on how to fix this?
>>>> 2. The healing logs are much better now; I can see the success
>>>> message.
>>>> 3. Both volumes, with HD off and on, synced successfully. The volume with
>>>> HD on synced much faster.
>>>> 4. Both VMs on the volumes survived the outage, even though new files were
>>>> added and deleted during the outage.
>>>>
>>>>  So replication works well with HD both on and off for volumes holding
>>>> VMs, and even faster with HD on. I need to solve the logging issue.
>>>>
>>>>  Seems we could start production storage from this moment :) The whole
>>>> company will use it. Some distributed and some replicated. Thanks for a great
>>>> product.
>>>>
>>>>
>>>> 2014-08-27 16:03 GMT+03:00 Roman <romeo.r at gmail.com>:
>>>>
>>>>> Installed the new packages. I will run some tests tomorrow. Thanks.
>>>>>
>>>>>
>>>>> 2014-08-27 14:10 GMT+03:00 Pranith Kumar Karampuri <
>>>>> pkarampu at redhat.com>:
>>>>>
>>>>>
>>>>>> On 08/27/2014 04:38 PM, Kaleb KEITHLEY wrote:
>>>>>>
>>>>>>> On 08/27/2014 03:09 AM, Humble Chirammal wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>> | From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>>>>>>> | To: "Humble Chirammal" <hchiramm at redhat.com>
>>>>>>>> | Cc: "Roman" <romeo.r at gmail.com>, gluster-users at gluster.org,
>>>>>>>> "Niels de Vos" <ndevos at redhat.com>
>>>>>>>> | Sent: Wednesday, August 27, 2014 12:34:22 PM
>>>>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on replica
>>>>>>>> bricks
>>>>>>>> |
>>>>>>>> |
>>>>>>>> | On 08/27/2014 12:24 PM, Roman wrote:
>>>>>>>> | > root at stor1:~# ls -l /usr/sbin/glfsheal
>>>>>>>> | > ls: cannot access /usr/sbin/glfsheal: No such file or directory
>>>>>>>> | > Seems like not.
>>>>>>>> | Humble,
>>>>>>>> |       Seems like the binary is still not packaged?
>>>>>>>>
>>>>>>>> Checking with Kaleb on this.
>>>>>>>>
>>>>>>>>  ...
>>>>>>>
>>>>>>>> | >>>             |
>>>>>>>> | >>>             | Humble/Niels,
>>>>>>>> | >>>             |      Do we have debs available for 3.5.2? In 3.5.1
>>>>>>>> | >>>             | there was a packaging issue where /usr/bin/glfsheal
>>>>>>>> | >>>             | was not packaged along with the deb. I think that
>>>>>>>> | >>>             | should be fixed now as well?
>>>>>>>> | >>>             |
>>>>>>>> | >>>             Pranith,
>>>>>>>> | >>>
>>>>>>>> | >>>             The 3.5.2 packages for Debian are not available yet.
>>>>>>>> | >>>             We are co-ordinating internally to get them processed.
>>>>>>>> | >>>             I will update the list once they are available.
>>>>>>>> | >>>
>>>>>>>> | >>>             --Humble
>>>>>>>>
>>>>>>>
>>>>>>> glfsheal isn't in our 3.5.2-1 DPKGs either. We (meaning I) started
>>>>>>> with the 3.5.1 packaging bits from Semiosis. Perhaps he fixed 3.5.1 after
>>>>>>> giving me his bits.
>>>>>>>
>>>>>>> I'll fix it and spin 3.5.2-2 DPKGs.
>>>>>>>
>>>>>>  That is great Kaleb. Please notify semiosis as well in case he is
>>>>>> yet to fix it.
>>>>>>
>>>>>> Pranith
>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>>
>>>>>>> Kaleb
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Best regards,
>>>>> Roman.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>

-- 
Best regards,
Roman.