[Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
Itamar Heim
iheim at redhat.com
Tue Jul 22 11:21:25 UTC 2014
On 07/22/2014 04:28 AM, Vijay Bellur wrote:
> On 07/21/2014 05:09 AM, Pranith Kumar Karampuri wrote:
>>
>> On 07/21/2014 02:08 PM, Jiri Moskovcak wrote:
>>> On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote:
>>>>
>>>> On 07/19/2014 11:25 AM, Andrew Lau wrote:
>>>>>
>>>>>
>>>>> On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri
>>>>> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>> wrote:
>>>>>
>>>>>
>>>>> On 07/18/2014 05:43 PM, Andrew Lau wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur
>>>>>> <vbellur at redhat.com <mailto:vbellur at redhat.com>> wrote:
>>>>>>
>>>>>> [Adding gluster-devel]
>>>>>>
>>>>>>
>>>>>> On 07/18/2014 05:20 PM, Andrew Lau wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> As most of you have got hints from previous messages,
>>>>>> hosted engine
>>>>>> won't work on gluster. A quote from BZ1097639:
>>>>>>
>>>>>> "Using hosted engine with Gluster backed storage is
>>>>>> currently something
>>>>>> we really warn against.
>>>>>>
>>>>>>
>>>>>> I think this bug should be closed or re-targeted at
>>>>>> documentation, because there is nothing we can do here.
>>>>>> Hosted engine assumes that all writes are atomic and
>>>>>> (immediately) available for all hosts in the cluster.
>>>>>> Gluster violates those assumptions.
>>>>>> "
>>>>>>
>>>>>> I tried going through BZ1097639 but could not find much
>>>>>> detail with respect to gluster there.
>>>>>>
>>>>>> A few questions around the problem:
>>>>>>
>>>>>> 1. Can somebody please explain in detail the scenario that
>>>>>> causes the problem?
>>>>>>
>>>>>> 2. Is hosted engine performing synchronous writes to ensure
>>>>>> that writes are durable?
>>>>>>
>>>>>> Also, if there is any documentation that details the hosted
>>>>>> engine architecture that would help in enhancing our
>>>>>> understanding of its interactions with gluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Now my question, does this theory prevent a scenario of
>>>>>> perhaps
>>>>>> something like a gluster replicated volume being mounted
>>>>>> as a glusterfs
>>>>>> filesystem and then re-exported as the native kernel NFS
>>>>>> share for the
>>>>>> hosted-engine to consume? It could then be possible to
>>>>>> chuck ctdb in
>>>>>> there to provide a last resort failover solution. I have
>>>>>> tried myself
>>>>>> and suggested it to two people who are running a similar
>>>>>> setup. Now
>>>>>> using the native kernel NFS server for hosted-engine and
>>>>>> they haven't
>>>>>> reported as many issues. Curious, could anyone validate
>>>>>> my theory on this?
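[One way to realize the re-export idea above, sketched under assumptions: the volume name `engine`, host names, and mount points are hypothetical. Note the kernel NFS server requires an explicit fsid= when exporting a FUSE-mounted filesystem, since FUSE has no stable device number to derive one from.]

```shell
# Mount the replicated gluster volume via FUSE on the exporting host
mount -t glusterfs server1:/engine /mnt/hosted-engine

# Re-export it over the kernel NFS server; fsid= is mandatory for FUSE
echo '/mnt/hosted-engine *(rw,sync,no_root_squash,fsid=1)' >> /etc/exports
exportfs -ra
```

[ctdb could then float a VIP across the hosts doing this re-export as the last-resort failover mentioned above.]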
>>>>>>
>>>>>>
>>>>>> If we obtain more details on the use case and obtain gluster
>>>>>> logs from the failed scenarios, we should be able to
>>>>>> understand the problem better. That could be the first step
>>>>>> in validating your theory or evolving further
>>>>>> recommendations :).
>>>>>>
>>>>>>
>>>>>> I'm not sure how useful this is, but Jiri Moskovcak tracked
>>>>>> this down in an off-list message.
>>>>>>
>>>>>> Message Quote:
>>>>>>
>>>>>> ==
>>>>>>
>>>>>> We were able to track it down to this (thanks Andrew for
>>>>>> providing the testing setup):
>>>>>>
>>>>>> -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 165, in handle
>>>>>>     response = "success " + self._dispatch(data)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 261, in _dispatch
>>>>>>     .get_all_stats_for_service_type(**options)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 41, in get_all_stats_for_service_type
>>>>>>     d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>>>>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 74, in get_raw_stats_for_service_type
>>>>>>     f = os.open(path, direct_flag | os.O_RDONLY)
>>>>>> OSError: [Errno 116] Stale file handle:
>>>>>> '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'
>>>>>>
>>>>>>
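[For reference: the failing call in storage_broker.py is a plain os.open() of the metadata file with O_DIRECT, with no handling for ESTALE. A minimal sketch of a defensive retry is below; open_with_retry is a hypothetical helper for illustration, not part of ovirt-hosted-engine-ha, and a retry only helps if a fresh lookup resolves the re-created file.]

```python
import errno
import os


def open_with_retry(path, flags=os.O_RDONLY, retries=1, opener=os.open):
    """Open `path`, retrying on ESTALE.

    On NFS (and similar network filesystems) a file that was deleted and
    re-created on the server leaves clients holding a stale handle; a
    fresh lookup after ESTALE usually resolves the new file.
    """
    for attempt in range(retries + 1):
        try:
            return opener(path, flags)
        except OSError as e:
            # Re-raise anything that is not ESTALE, or the final attempt
            if e.errno != errno.ESTALE or attempt == retries:
                raise
```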
>>>>> Andrew/Jiri,
>>>>> Would it be possible to post the gluster logs of both the
>>>>> mount and the bricks on the bz? I can take a look. If I
>>>>> gather nothing from them, I will probably ask for your help in
>>>>> re-creating the issue.
>>>>>
>>>>> Pranith
>>>>>
>>>>>
>>>>> Unfortunately, I don't have the logs for that setup any more. I'll
>>>>> try to replicate it when I get a chance. If I understand the comment
>>>>> from the BZ, I don't think it's a gluster bug per se, more just how
>>>>> gluster does its replication.
>>>> hi Andrew,
>>>>       Thanks for that. I couldn't come to any conclusions because
>>>> no logs were available. It is unlikely that self-heal is involved,
>>>> because there were no bricks going down/up according to the bug
>>>> description.
>>>>
>>>
>>> Hi,
>>> I've never had such a setup. I guessed it was a problem with gluster
>>> based on "OSError: [Errno 116] Stale file handle:", which happens when
>>> a file opened by an application on the client gets removed on the
>>> server. I'm pretty sure we (hosted-engine) don't remove that file, so
>>> I think it's some gluster magic moving the data around...
>> Hi,
>> Unless bricks go down/up or new bricks are added, gluster does not
>> move data around except in response to a file operation. So I am
>> still not sure why this happened.
>>
>
> Does hosted engine perform deletion & re-creation of file
> <uuid>/ha_agent/hosted-engine.metadata in some operational sequence? In
> such a case, if this file is accessed by a stale gfid, ESTALE is possible.
>
> I see references to 2 hosted engines being operational in the bug report
> and that makes me wonder if this is a likely scenario?
>
> I am also curious to understand why NFS was chosen as the access method
> to the gluster volume. Isn't FUSE based access a possibility here?
>
it is, but it wasn't enabled in that setup due to multiple reports at the
time about gluster robustness with sanlock.
iiuc, with replica 3 we should be in a much better place and can re-enable
it (probably also validating it with replica 3 first?)
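[The delete-and-re-create theory raised above can be illustrated locally: while a reader still holds the old file open, a re-created file necessarily gets a new inode, the local analogue of the new gfid that makes a gluster/NFS client's cached handle go stale. A minimal sketch; the metadata file name is just for flavor.]

```python
import os
import tempfile


def demo():
    """Show that delete + re-create yields a new file identity (inode).

    On NFS or gluster the analogous new gfid is what makes a client's
    cached handle return ESTALE on the next access.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "hosted-engine.metadata")
    with open(path, "w") as f:
        f.write("old contents\n")

    old_fd = os.open(path, os.O_RDONLY)   # reader's cached handle
    old_ino = os.fstat(old_fd).st_ino

    os.unlink(path)                       # writer deletes ...
    with open(path, "w") as f:            # ... and re-creates the file
        f.write("new contents\n")

    new_ino = os.stat(path).st_ino        # different inode: old handle is stale
    os.close(old_fd)
    return old_ino, new_ino
```

[Whether hosted-engine ever rewrites the metadata file this way, rather than in place, is exactly the question Vijay raises above.]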
More information about the Gluster-devel mailing list