[Gluster-devel] [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?
Jiri Moskovcak
jmoskovc at redhat.com
Mon Jul 21 08:38:16 UTC 2014
On 07/19/2014 08:58 AM, Pranith Kumar Karampuri wrote:
>
> On 07/19/2014 11:25 AM, Andrew Lau wrote:
>>
>>
>> On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri
>> <pkarampu at redhat.com> wrote:
>>
>>
>> On 07/18/2014 05:43 PM, Andrew Lau wrote:
>>>
>>>
>>> On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur
>>> <vbellur at redhat.com> wrote:
>>>
>>> [Adding gluster-devel]
>>>
>>>
>>> On 07/18/2014 05:20 PM, Andrew Lau wrote:
>>>
>>> Hi all,
>>>
>>> As most of you have gathered from previous messages, hosted engine
>>> won't work on gluster. A quote from BZ1097639:
>>>
>>> "Using hosted engine with Gluster backed storage is
>>> currently something
>>> we really warn against.
>>>
>>>
>>> I think this bug should be closed or re-targeted at
>>> documentation, because there is nothing we can do here.
>>> Hosted engine assumes that all writes are atomic and
>>> (immediately) available for all hosts in the cluster.
>>> Gluster violates those assumptions.
>>> "
>>>
>>> I tried going through BZ1097639 but could not find much
>>> detail with respect to gluster there.
>>>
>>> A few questions around the problem:
>>>
>>> 1. Can somebody please explain in detail the scenario that
>>> causes the problem?
>>>
>>> 2. Is hosted engine performing synchronous writes to ensure
>>> that writes are durable?
>>>
>>> Also, if there is any documentation that details the hosted
>>> engine architecture, it would help us better understand
>>> its interactions with gluster.
>>>
>>>
>>>
>>>
>>> Now my question: does this theory rule out a setup where,
>>> for example, a gluster replicated volume is mounted as a
>>> glusterfs filesystem and then re-exported as a native kernel
>>> NFS share for the hosted-engine to consume? It would then be
>>> possible to put ctdb in there as a last-resort failover
>>> solution. I have tried this myself and suggested it to two
>>> people who are running a similar setup. They are now using the
>>> native kernel NFS server for hosted-engine and haven't
>>> reported as many issues. Could anyone validate my theory on this?
>>>
>>>
>>> If we obtain more details on the use case and obtain gluster
>>> logs from the failed scenarios, we should be able to
>>> understand the problem better. That could be the first step
>>> in validating your theory or evolving further recommendations :).
>>>
>>>
>>> I'm not sure how useful this is, but Jiri Moskovcak tracked
>>> this down in an off-list message.
>>>
>>> Message Quote:
>>>
>>> ==
>>>
>>> We were able to track it down to this (thanks Andrew for
>>> providing the testing setup):
>>>
>>> -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
>>> Traceback (most recent call last):
>>> File
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>> line 165, in handle
>>> response = "success " + self._dispatch(data)
>>> File
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>> line 261, in _dispatch
>>> .get_all_stats_for_service_type(**options)
>>> File
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>> line 41, in get_all_stats_for_service_type
>>> d = self.get_raw_stats_for_service_type(storage_dir, service_type)
>>> File
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>> line 74, in get_raw_stats_for_service_type
>>> f = os.open(path, direct_flag | os.O_RDONLY)
>>> OSError: [Errno 116] Stale file handle:
>>> '/rhev/data-center/mnt/localhost:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-engine.metadata'
>> Andrew/Jiri,
>> Would it be possible to attach the gluster logs of both the
>> mount and the bricks to the bz? I can take a look at them. If I
>> find nothing, I will probably ask for your help in
>> re-creating the issue.
>>
>> Pranith
>>
>>
>> Unfortunately, I don't have the logs for that setup anymore. I'll
>> try to replicate it when I get a chance. If I understand the comment on
>> the BZ correctly, I don't think it's a gluster bug per se, more just how
>> gluster does its replication.
> Hi Andrew,
> Thanks for that. I couldn't come to any conclusions because no
> logs were available. It is unlikely that self-heal is involved because,
> according to the bug description, no bricks went down or came back up.
>
Hi,
I've never had such a setup; I suspected a problem with gluster based on
"OSError: [Errno 116] Stale file handle:", which happens when a file
opened by an application on the client gets removed on the server. I'm pretty
sure we (hosted-engine) don't remove that file, so I think it's some
gluster magic moving the data around...
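For reference, the call that fails in the traceback is
os.open(path, direct_flag | os.O_RDONLY), and errno 116 is ESTALE on Linux.
Below is a minimal sketch of a reader that re-raises every error except
ESTALE and retries the open by path once on ESTALE. It assumes direct_flag
is os.O_DIRECT. The helper open_metadata and its parameters are hypothetical
and are not part of ovirt_hosted_engine_ha. Whether a retry actually helps on
a gluster-backed mount is an assumption, not something established in this
thread.

```python
import errno
import os


def open_metadata(path, use_direct=True, retries=1, opener=os.open):
    """Open `path` read-only, retrying the open by path on ESTALE.

    Hypothetical helper for illustration only; it is not part of
    ovirt_hosted_engine_ha. `opener` is injectable so the retry
    path can be exercised without an NFS or gluster mount.
    """
    # Assumption: the broker's direct_flag is os.O_DIRECT. O_DIRECT is
    # Linux-specific, so fall back to no extra flag where it is missing.
    direct_flag = getattr(os, "O_DIRECT", 0) if use_direct else 0
    attempt = 0
    while True:
        try:
            return opener(path, direct_flag | os.O_RDONLY)
        except OSError as exc:
            # Anything other than ESTALE (errno 116 on Linux), or running
            # out of retries, is re-raised unchanged to the caller.
            if exc.errno != errno.ESTALE or attempt >= retries:
                raise
            # Assumption: a second lookup by path may resolve to a valid
            # handle if the file was replaced on the server side.
            attempt += 1
```

The caller owns the returned descriptor and should os.close() it when done.
Note that reads on an O_DIRECT descriptor generally need aligned buffers, which
this sketch does not cover.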
--Jirka
> Pranith
>>
>>
>>>
>>> It's definitely connected to the storage, which leads us to
>>> gluster. I'm not very familiar with gluster, so I need to
>>> check this with our gluster gurus.
>>>
>>> ==
>>>
>>> Thanks,
>>> Vijay
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>