[Gluster-users] [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain always complains about "unsynced" elements

Sahina Bose sabose at redhat.com
Tue Jul 25 06:27:21 UTC 2017


On Tue, Jul 25, 2017 at 11:12 AM, Kasturi Narra <knarra at redhat.com> wrote:

> These errors appear because the gluster network (glusternw) is not assigned
> to the correct interface. Once you attach it, these errors should go away.
> This has nothing to do with the problem you are seeing.
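>
> As a quick sanity check (a minimal sketch, to be run on any of the gluster
> nodes), you can confirm which hostnames the bricks and peers are registered
> under -- these should be the names that resolve on the network carrying the
> gluster role in oVirt:
>
>     gluster volume info engine | grep -i brick
>     gluster volume info data | grep -i brick
>     gluster peer status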
>
> sahina any idea about engine not showing the correct volume info ?
>

Please provide the vdsm.log (containing the gluster volume info) and the
engine.log.
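
For reference, the usual default locations (they may differ on your
installation) are /var/log/vdsm/vdsm.log on the hosts and
/var/log/ovirt-engine/engine.log on the engine VM. A rough way to pull out the
relevant parts, assuming those paths:

    # on a host: grab the gluster volume list calls from vdsm.log
    grep -i glustervolumeslist /var/log/vdsm/vdsm.log | tail -n 100
    # on the engine VM: capture the recent engine.log
    tail -n 2000 /var/log/ovirt-engine/engine.log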


> On Mon, Jul 24, 2017 at 7:30 PM, yayo (j) <jaganz at gmail.com> wrote:
>
>> Hi,
>>
>> I refreshed the UI but the problem still remains ...
>>
>> There is no specific error; I only see the errors below, but I've read that
>> these are harmless:
>>
>>
>> 2017-07-24 15:53:59,823+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterServersListVDSCommand(HostName = node01.localdomain.local, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 29a62417
>> 2017-07-24 15:54:01,066+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterServersListVDSCommand, return: [10.10.20.80/24:CONNECTED, node02.localdomain.local:CONNECTED, gdnode04:CONNECTED], log id: 29a62417
>> 2017-07-24 15:54:01,076+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterVolumesListVDSCommand(HostName = node01.localdomain.local, GlusterVolumesListVDSParameters:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 7fce25d3
>> 2017-07-24 15:54:02,209+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
>> 2017-07-24 15:54:02,212+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
>> 2017-07-24 15:54:02,215+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
>> 2017-07-24 15:54:02,218+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
>> 2017-07-24 15:54:02,221+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
>> 2017-07-24 15:54:02,224+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
>> 2017-07-24 15:54:02,224+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterVolumesListVDSCommand, return: {d19c19e3-910d-437b-8ba7-4f2a23d17515=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@fdc91062, c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@999a6f23}, log id: 7fce25d3
>>
>>
>> Thank you
>>
>>
>> 2017-07-24 8:12 GMT+02:00 Kasturi Narra <knarra at redhat.com>:
>>
>>> Hi,
>>>
>>>    Regarding the UI showing incorrect information about the engine and data
>>> volumes, can you please refresh the UI and see if the issue persists, and
>>> also check for any errors in the engine.log file?
>>>
>>> Thanks
>>> kasturi
>>>
>>> On Sat, Jul 22, 2017 at 11:43 AM, Ravishankar N <ravishankar at redhat.com>
>>> wrote:
>>>
>>>>
>>>> On 07/21/2017 11:41 PM, yayo (j) wrote:
>>>>
>>>> Hi,
>>>>
>>>> Sorry for following up again, but while checking the oVirt interface I've
>>>> found that oVirt reports the "engine" volume as an "arbiter" configuration
>>>> and the "data" volume as a fully replicated volume. See these screenshots:
>>>>
>>>>
>>>> This is probably some refresh bug in the UI, Sahina might be able to
>>>> tell you.
>>>>
>>>>
>>>> https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing
>>>>
>>>> But the "gluster volume info" command reports that both volumes are fully
>>>> replicated:
>>>>
>>>>
>>>> Volume Name: data
>>>> Type: Replicate
>>>> Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gdnode01:/gluster/data/brick
>>>> Brick2: gdnode02:/gluster/data/brick
>>>> Brick3: gdnode04:/gluster/data/brick
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> storage.owner-uid: 36
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.low-prio-threads: 32
>>>> network.remote-dio: enable
>>>> cluster.eager-lock: enable
>>>> cluster.quorum-type: auto
>>>> cluster.server-quorum-type: server
>>>> cluster.data-self-heal-algorithm: full
>>>> cluster.locking-scheme: granular
>>>> cluster.shd-max-threads: 8
>>>> cluster.shd-wait-qlength: 10000
>>>> features.shard: on
>>>> user.cifs: off
>>>> storage.owner-gid: 36
>>>> features.shard-block-size: 512MB
>>>> network.ping-timeout: 30
>>>> performance.strict-o-direct: on
>>>> cluster.granular-entry-heal: on
>>>> auth.allow: *
>>>> server.allow-insecure: on
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Volume Name: engine
>>>> Type: Replicate
>>>> Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 3 = 3
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gdnode01:/gluster/engine/brick
>>>> Brick2: gdnode02:/gluster/engine/brick
>>>> Brick3: gdnode04:/gluster/engine/brick
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> performance.readdir-ahead: on
>>>> transport.address-family: inet
>>>> storage.owner-uid: 36
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.low-prio-threads: 32
>>>> network.remote-dio: off
>>>> cluster.eager-lock: enable
>>>> cluster.quorum-type: auto
>>>> cluster.server-quorum-type: server
>>>> cluster.data-self-heal-algorithm: full
>>>> cluster.locking-scheme: granular
>>>> cluster.shd-max-threads: 8
>>>> cluster.shd-wait-qlength: 10000
>>>> features.shard: on
>>>> user.cifs: off
>>>> storage.owner-gid: 36
>>>> features.shard-block-size: 512MB
>>>> network.ping-timeout: 30
>>>> performance.strict-o-direct: on
>>>> cluster.granular-entry-heal: on
>>>> auth.allow: *
>>>> server.allow-insecure: on
>>>>
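>>>> Note that both volumes above report "Number of Bricks: 1 x 3 = 3", i.e. a
>>>> plain replica 3 layout. For comparison (illustrative output only, not taken
>>>> from this cluster), a replica 3 volume that uses an arbiter brick would
>>>> report itself as:
>>>>
>>>>     Number of Bricks: 1 x (2 + 1) = 3
>>>>
>>>> So the CLI output agrees that neither volume is an arbiter configuration.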
>>>>
>>>> 2017-07-21 19:13 GMT+02:00 yayo (j) <jaganz at gmail.com>:
>>>>
>>>>> 2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar at redhat.com>:
>>>>>
>>>>>>
>>>>>> But it does say something. All these gfids of completed heals in the log
>>>>>> below are for the ones that you have given the getfattr output of. So what
>>>>>> is likely happening is that there is an intermittent connection problem
>>>>>> between your mount and the brick process, leading to pending heals again
>>>>>> after the heal completes, which is why the numbers vary each time. You
>>>>>> would need to check why that is the case.
>>>>>> Hope this helps,
>>>>>> Ravi
>>>>>>
>>>>>>
>>>>>>
>>>>>> [2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
>>>>>> [2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
>>>>>> [2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2
>>>>>>
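>>>>>> One way to see that pattern (a rough sketch; adjust the volume name as
>>>>>> needed) is to sample the pending-heal count periodically and check whether
>>>>>> it climbs back up after heals complete:
>>>>>>
>>>>>>     while true; do date; gluster volume heal engine info | grep 'Number of entries'; sleep 60; done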
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> following your suggestion, I've checked the "peer" status and I found that
>>>>> there are too many names for the hosts; I don't know if this can be the
>>>>> problem or part of it:
>>>>>
>>>>> gluster peer status on NODE01:
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: dnode02.localdomain.local
>>>>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>>>>> State: Peer in Cluster (Connected)
>>>>> Other names:
>>>>> 192.168.10.52
>>>>> dnode02.localdomain.local
>>>>> 10.10.20.90
>>>>> 10.10.10.20
>>>>>
>>>>>
>>>>> gluster peer status on NODE02:
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: dnode01.localdomain.local
>>>>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>>>>> State: Peer in Cluster (Connected)
>>>>> Other names:
>>>>> gdnode01
>>>>> 10.10.10.10
>>>>>
>>>>> Hostname: gdnode04
>>>>> Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
>>>>> State: Peer in Cluster (Connected)
>>>>> Other names:
>>>>> 192.168.10.54
>>>>> 10.10.10.40
>>>>>
>>>>>
>>>>> gluster peer status on NODE04:
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: dnode02.neridom.dom
>>>>> Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
>>>>> State: Peer in Cluster (Connected)
>>>>> Other names:
>>>>> 10.10.20.90
>>>>> gdnode02
>>>>> 192.168.10.52
>>>>> 10.10.10.20
>>>>>
>>>>> Hostname: dnode01.localdomain.local
>>>>> Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
>>>>> State: Peer in Cluster (Connected)
>>>>> Other names:
>>>>> gdnode01
>>>>> 10.10.10.10
>>>>>
>>>>>
>>>>>
>>>>> All these IPs are pingable and the hosts are resolvable from all 3 nodes,
>>>>> but only the 10.10.10.0 network is the dedicated network for gluster
>>>>> (resolved using the gdnode* host names) ... Do you think that removing the
>>>>> other entries could fix the problem? And, sorry, but how can I remove the
>>>>> other entries?
>>>>>
>>>> I don't think having extra entries could be a problem. Did you check
>>>> the fuse mount logs for disconnect messages that I referred to in the other
>>>> email?
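>>>>
>>>> Something along these lines (just a sketch; the exact log file name depends
>>>> on how the storage domain is mounted) should show whether the client keeps
>>>> losing its connection to a brick:
>>>>
>>>>     grep -i disconnect /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log | tail -n 20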
>>>>
>>>>
>>>>> And, what about SELinux?
>>>>>
>>>> Not sure about this. See if there are disconnect messages in the mount
>>>> logs first.
>>>> -Ravi
>>>>
>>>>
>>>>> Thank you
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Linux User: 369739 http://counter.li.org
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>
>>
>> --
>> Linux User: 369739 http://counter.li.org
>>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>