[Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

Sat Jul 19 06:58:12 UTC 2014

On 07/19/2014 11:25 AM, Andrew Lau wrote:
>
>
> On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri 
> <pkarampu at redhat.com> wrote:
>
>
>     On 07/18/2014 05:43 PM, Andrew Lau wrote:
>>      
>>
>>     On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur
>>     <vbellur at redhat.com> wrote:
>>
>>         [Adding gluster-devel]
>>
>>
>>         On 07/18/2014 05:20 PM, Andrew Lau wrote:
>>
>>             Hi all,
>>
>>             As most of you will have gathered from previous messages,
>>             hosted engine won't work on gluster. A quote from
>>             BZ1097639:
>>
>>             "Using hosted engine with Gluster backed storage is
>>             currently something
>>             we really warn against.
>>
>>
>>             I think this bug should be closed or re-targeted at
>>             documentation, because there is nothing we can do here.
>>             Hosted engine assumes that all writes are atomic and
>>             (immediately) available for all hosts in the cluster.
>>             Gluster violates those assumptions.
>>             "
>>
>>         I tried going through BZ1097639 but could not find much
>>         detail with respect to gluster there.
>>
>>         A few questions around the problem:
>>
>>         1. Can somebody please explain in detail the scenario that
>>         causes the problem?
>>
>>         2. Is hosted engine performing synchronous writes to ensure
>>         that writes are durable?
>>
>>         Also, if there is any documentation that details the hosted
>>         engine architecture that would help in enhancing our
>>         understanding of its interactions with gluster.
>>
>>
>>             
>>
>>             Now my question: does this theory rule out a scenario
>>             such as a gluster replicated volume being mounted as a
>>             glusterfs filesystem and then re-exported as a native
>>             kernel NFS share for the hosted-engine to consume? It
>>             would then be possible to add ctdb to provide a
>>             last-resort failover solution. I have tried this myself
>>             and suggested it to two people who are running a similar
>>             setup. They are now using the native kernel NFS server
>>             for hosted-engine and haven't reported as many issues.
>>             Could anyone validate my theory on this?
>>
>>
>>         If we obtain more details on the use case and obtain gluster
>>         logs from the failed scenarios, we should be able to
>>         understand the problem better. That could be the first step
>>         in validating your theory or evolving further recommendations :).
>>
>>
>>     I'm not sure how useful this is, but Jiri Moskovcak tracked
>>     this down in an off-list message.
>>
>>      Message Quote:
>>
>>      ==
>>
>>     We were able to track it down to this (thanks Andrew for
>>     providing the testing setup):
>>
>>     -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
>>     Traceback (most recent call last):
>>       File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>         response = "success " + self._dispatch(data)
>>       File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>         .get_all_stats_for_service_type(**options)
>>       File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>         d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>       File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>         f = os.open(path, direct_flag | os.O_RDONLY)
>>     OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'
>     Andrew/Jiri,
>     Would it be possible to post the gluster logs of both the
>     mount and the bricks on the BZ? I can take a look at them. If
>     they tell me nothing, I will probably ask for your help in
>     re-creating the issue.
>
>     Pranith
>
>
> Unfortunately, I don't have the logs for that setup any more. I'll 
> try to replicate it when I get a chance. If I understand the comment 
> from the BZ, I don't think it's a gluster bug per se, more just how 
> gluster does its replication.
Hi Andrew,
Thanks for that. I couldn't come to any conclusions because no 
logs were available. It is unlikely that self-heal is involved, because 
no bricks went down or came up according to the bug description.
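
For anyone trying to reproduce this: one way to tell a transient stale
handle from a persistent one is to retry the open a few times and see
whether it recovers. Below is a minimal sketch, assuming Python 3 on
Linux. The function name, its parameters and the retry policy are
hypothetical and are not taken from ovirt_hosted_engine_ha. It does not
address the root cause; it only helps characterise the failure.

```python
import errno
import os
import time


def open_retrying_on_estale(path, flags, retries=3, delay=0.0, opener=os.open):
    """Open `path`, retrying only when the call fails with ESTALE.

    Hypothetical helper for diagnosis: `opener` defaults to os.open and is
    injectable so the retry logic can be exercised without real storage.
    """
    attempt = 0
    while True:
        try:
            return opener(path, flags)
        except OSError as exc:
            # Any other error, or running out of retries, is re-raised as-is.
            if exc.errno != errno.ESTALE or attempt >= retries:
                raise
            attempt += 1
            time.sleep(delay)
```

Calling it with the hosted-engine.metadata path from the traceback and
os.O_DIRECT | os.O_RDONLY mirrors the failing call. If the open succeeds
after a retry, the stale handle was transient; if it keeps failing, the
mount and brick logs from that window are the next thing to look at.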

Pranith
>
>
>>
>>     It's definitely connected to the storage, which leads us to
>>     gluster. I'm not very familiar with gluster, so I need to
>>     check this with our gluster gurus.
>>
>>     == 
>>
>>         Thanks,
>>         Vijay
>>
>>
>>
>>
>>     _______________________________________________
>>     Gluster-devel mailing list
>>     Gluster-devel at gluster.org
>>     http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>
>
