[Gluster-users] [ovirt-users] 4.0 - 2nd node fails on deploy

Wed Oct 5 11:01:49 UTC 2016

"No route to host" is a network problem. Given that, the quorum loss looks like appropriate behavior.
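
To spell out why losing both peers costs quorum on a three-node pool, here is a rough sketch of the majority rule. It assumes the default behaviour, where server quorum needs more than half of the peers in the pool and client quorum on a replica 3 volume needs more than half of the bricks. The function names are mine, not Gluster's, and a configured quorum ratio would change the numbers.

```python
def server_quorum_met(active_peers: int, total_peers: int, ratio: float = 0.5) -> bool:
    """Return True if strictly more than `ratio` of the pool is reachable.

    Assumption: default server-quorum behaviour (more than half of the peers,
    counting the local node). A custom ratio would be passed as `ratio`.
    """
    if total_peers <= 0:
        raise ValueError("total_peers must be positive")
    return active_peers > total_peers * ratio


def client_quorum_met(bricks_up: int, replica_count: int = 3) -> bool:
    """Majority rule for an odd replica count such as replica 3 (arbiter included)."""
    return bricks_up * 2 > replica_count


# A node that can reach neither of its two peers only counts itself.
print(server_quorum_met(active_peers=1, total_peers=3))  # False
print(server_quorum_met(active_peers=2, total_peers=3))  # True
```

With both dcasrv02 and dcastor03 reported as disconnected, the local glusterd sees one node out of three, so stopping the local bricks is what this rule predicts.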

On October 5, 2016 12:31:18 PM GMT+02:00, Sahina Bose <sabose at redhat.com> wrote:
>On Wed, Oct 5, 2016 at 1:56 PM, Jason Jeffrey <jason at sudo.co.uk> wrote:
>
>> Hi,
>>
>>
>>
>> Logs attached
>>
>
>Have you probed two interfaces for the same host, that is, dcasrv02 and
>dcastor02? Does "gluster peer status" show both names as belonging to
>the same host?
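
Alongside "gluster peer status", a plain TCP check of the glusterd port from each node can show which of the two names is actually reachable. This is only a sketch: the port 24007 is taken from the log lines in this thread, the helper names are made up, and the host names would need to match your hosts file.

```python
import socket

GLUSTERD_PORT = 24007  # management port seen in the quoted log lines


def can_connect(host: str, port: int = GLUSTERD_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", refused connections, timeouts and
        # name-resolution failures alike.
        return False


def reachability(names, port: int = GLUSTERD_PORT) -> dict:
    """Map each name to True/False so both names of a host can be compared."""
    return {name: can_connect(name, port) for name in names}
```

Running something like reachability(["dcasrv02", "dcastor02"]) on each node would show whether only the storage name, only the management name, or both fail.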
>
>From the glusterd logs and the mount logs, the connection between the
>peers is lost and quorum is lost, which reaffirms what Simone said
>earlier. The logs seem to indicate network issues, so check the direct
>link setup. See below.
>
>From mount logs:
>[2016-10-04 17:26:15.718300] E [socket.c:2292:socket_connect_finish]
>0-engine-client-2: connection to 10.100.103.3:24007 failed (No route to
>host)
>[2016-10-04 17:26:15.718345] W [MSGID: 108001]
>[afr-common.c:4379:afr_notify] 0-engine-replicate-0: Client-quorum is
>not
>met
>[2016-10-04 17:26:16.428290] E [socket.c:2292:socket_connect_finish]
>0-engine-client-1: connection to 10.100.101.2:24007 failed (No route to
>host)
>[2016-10-04 17:26:16.428336] E [MSGID: 108006]
>[afr-common.c:4321:afr_notify] 0-engine-replicate-0: All subvolumes are
>down. Going offline until atleast one of them comes back up
>
>And in glusterd logs:
>[2016-10-04 17:24:39.522402] E [socket.c:2292:socket_connect_finish]
>0-management: connection to 10.100.50.82:24007 failed (No route to
>host)
>[2016-10-04 17:24:39.522578] I [MSGID: 106004]
>[glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer
><dcasrv02> (<1e788fc9-dfe9-4753-92c7-76a95c8d0891>), in state <Peer in
>Cluster>, has disconnected from glusterd.
>[2016-10-04 17:24:39.523272] C [MSGID: 106002]
>[glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action]
>0-management: Server quorum lost for volume engine. Stopping local
>bricks.
>[2016-10-04 17:24:39.523314] I [MSGID: 106132]
>[glusterd-utils.c:1560:glusterd_service_stop] 0-management: brick
>already
>stopped
>[2016-10-04 17:24:39.526188] E [socket.c:2292:socket_connect_finish]
>0-management: connection to 10.100.103.3:24007 failed (No route to
>host)
>[2016-10-04 17:24:39.526219] I [MSGID: 106004]
>[glusterd-handler.c:5201:__glusterd_peer_rpc_notify] 0-management: Peer
><dcastor03> (<9a9c037e-96cd-4f73-9800-a1df5cdd2818>), in state <Peer in
>Cluster>, has disconnected from glusterd.
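
For anyone going through the full logs, a small script can tally which addresses fail to connect. The sketch below assumes the message shape shown in the excerpts above ("connection to IP:PORT failed (reason)") and copes with mail quoting and line wrapping; the function names are illustrative only.

```python
import re
from collections import Counter

# Message shape taken from the excerpts in this thread.
FAILED_CONNECT = re.compile(
    r"connection to (?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+) "
    r"failed \((?P<reason>[^)]*)\)"
)


def unwrap(text: str) -> str:
    """Drop leading '>' quote markers and join wrapped lines with single spaces."""
    parts = [line.lstrip("> ").strip() for line in text.splitlines()]
    return " ".join(part for part in parts if part)


def failed_connections(text: str) -> Counter:
    """Count (ip, reason) pairs for every failed connection in the log text."""
    return Counter(
        (match["ip"], match["reason"])
        for match in FAILED_CONNECT.finditer(unwrap(text))
    )
```

Feeding it the glusterd excerpt above gives one "No route to host" entry each for 10.100.50.82 and 10.100.103.3.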
>
>
>
>> Thanks
>>
>>
>>
>> *From:* Sahina Bose [mailto:sabose at redhat.com]
>> *Sent:* 05 October 2016 08:11
>> *To:* Jason Jeffrey <jason at sudo.co.uk>; gluster-users at gluster.org;
>> Ravishankar Narayanankutty <ravishankar at redhat.com>
>> *Cc:* Simone Tiraboschi <stirabos at redhat.com>; users
><users at ovirt.org>
>>
>> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>>
>>
>>
>> [Adding gluster-users ML]
>>
>> The brick logs are filled with errors :
>> [2016-10-05 19:30:28.659061] E [MSGID: 113077]
>[posix-handle.c:309:posix_handle_pump]
>> 0-engine-posix: malformed internal link /var/run/vdsm/storage/
>>
>0a021563-91b5-4f49-9c6b-fff45e85a025/d84f0551-0f2b-457c-808c-6369c6708d43/
>> 1b5a5e34-818c-4914-8192-2f05733b5583 for /xpool/engine/brick/.
>> glusterfs/b9/8e/b98ed8d2-3bf9-4b11-92fd-ca5324e131a8
>> [2016-10-05 19:30:28.659069] E [MSGID: 113091]
>[posix.c:180:posix_lookup]
>> 0-engine-posix: Failed to create inode handle for path
>> <gfid:b98ed8d2-3bf9-4b11-92fd-ca5324e131a8>
>> The message "E [MSGID: 113018] [posix.c:198:posix_lookup]
>0-engine-posix:
>> lstat on null failed" repeated 3 times between [2016-10-05
>19:30:28.656529]
>> and [2016-10-05 19:30:28.659076]
>> [2016-10-05 19:30:28.659087] W [MSGID: 115005]
>> [server-resolve.c:126:resolve_gfid_cbk] 0-engine-server:
>> b98ed8d2-3bf9-4b11-92fd-ca5324e131a8: failed to resolve (Success)
>>
>> - Ravi, the above are from the data brick of the arbiter volume. Can you
>> take a look?
>>
>>
>>
>> Jason,
>>
>> Could you also provide the mount logs from the first host
>> (/var/log/glusterfs/rhev-data-center-mnt-glusterSD*engine.log) and the
>> glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) from
>> around the same time frame?
>>
>>
>>
>>
>>
>> On Wed, Oct 5, 2016 at 3:28 AM, Jason Jeffrey <jason at sudo.co.uk>
>wrote:
>>
>> Hi,
>>
>>
>>
>> The servers are powered off when I'm not looking at the problem.
>>
>> There may have been instances during the same period where not all
>> three were powered on.
>>
>> Glusterhd log attached. The xpool-engine-brick log is over 1 GB in
>> size, so I've taken a sample of the last couple of days; it looks to
>> be highly repetitive.
>>
>>
>>
>> Cheers
>>
>>
>>
>> Jason
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com]
>> *Sent:* 04 October 2016 16:50
>>
>>
>> *To:* Jason Jeffrey <jason at sudo.co.uk>
>> *Cc:* users <users at ovirt.org>
>> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Oct 4, 2016 at 5:22 PM, Jason Jeffrey <jason at sudo.co.uk>
>wrote:
>>
>> Hi,
>>
>>
>>
>> DCASTORXX is a hosts entry for the dedicated direct 10GB links (each a
>> private /28) between the three servers (i.e. 1 => 2 & 3, 2 => 1 & 3,
>> etc.), planned to be used solely for storage.
>>
>>
>>
>> I.e.
>>
>>
>>
>> 10.100.50.81    dcasrv01
>>
>> 10.100.101.1    dcastor01
>>
>> 10.100.50.82    dcasrv02
>>
>> 10.100.101.2    dcastor02
>>
>> 10.100.50.83    dcasrv03
>>
>> 10.100.103.3    dcastor03
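
Since each direct link is described as its own private /28, it may be worth checking which of the storage names fall into the same /28. The sketch below uses only the addresses listed above; the /28 prefix comes from Jason's description, and the helper name is hypothetical. It does not prove a misconfiguration by itself, because each server may carry more than one storage address that the hosts file does not show.

```python
import ipaddress
from collections import defaultdict

# Storage names and addresses exactly as listed in the hosts entries above.
HOSTS = {
    "dcastor01": "10.100.101.1",
    "dcastor02": "10.100.101.2",
    "dcastor03": "10.100.103.3",
}


def group_by_network(hosts: dict, prefix: int = 28) -> dict:
    """Group host names by the network their address falls into."""
    groups = defaultdict(list)
    for name, addr in hosts.items():
        network = ipaddress.ip_interface(f"{addr}/{prefix}").network
        groups[str(network)].append(name)
    return dict(groups)


for network, names in group_by_network(HOSTS).items():
    print(network, names)
```

With these entries dcastor01 and dcastor02 land in 10.100.101.0/28 while dcastor03 lands in 10.100.103.0/28, so whether every node has a route to every name depends on how the individual links are addressed on each server.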
>>
>>
>>
>> These were set up with the following gluster commands:
>>
>>
>>
>> ·         gluster volume create iso replica 3 arbiter 1
>> dcastor01:/xpool/iso/brick   dcastor02:/xpool/iso/brick
>> dcastor03:/xpool/iso/brick
>>
>> ·         gluster volume create export replica 3 arbiter 1
>> dcastor02:/xpool/export/brick  dcastor03:/xpool/export/brick
>> dcastor01:/xpool/export/brick
>>
>> ·         gluster volume create engine replica 3 arbiter 1
>> dcastor01:/xpool/engine/brick dcastor02:/xpool/engine/brick
>> dcastor03:/xpool/engine/brick
>>
>> ·         gluster volume create data replica 3 arbiter 1
>> dcastor01:/xpool/data/brick  dcastor03:/xpool/data/brick
>> dcastor02:/xpool/data/bricky
>>
>>
>>
>>
>>
>> So yes, DCASRV01 is the server (primary) and has local bricks accessed
>> through the DCASTOR01 interface.
>>
>>
>>
>> Is the issue here not the incorrect soft link ?
>>
>>
>>
>> No, this should be fine.
>>
>>
>>
>> The issue is that periodically your gluster volume loses its server
>> quorum and becomes unavailable.
>>
>> It happened more than once from your logs.
>>
>>
>>
>> Can you please also attach the gluster logs for that volume?
>>
>>
>>
>>
>>
>> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata ->
>>
>/var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-
>> 496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>>
>> [root at dcasrv01 /]# ls -al
>/var/run/vdsm/storage/bbb70623-194a-46d2-a164-
>> 76a4876ecaaf/
>>
>> ls: cannot access
>/var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/:
>> No such file or directory
>>
>> But the data does exist
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
>>
>> drwxr-xr-x. 2 vdsm kvm    4096 Oct  3 17:17 .
>>
>> drwxr-xr-x. 6 vdsm kvm    4096 Oct  3 17:17 ..
>>
>> -rw-rw----. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-
>> c47e6f9cbc93
>>
>> -rw-rw----. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-
>> c47e6f9cbc93.lease
>>
>> -rw-r--r--. 2 vdsm kvm     283 Oct  3 17:17
>cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>> Jason
>>
>>
>>
>>
>>
>>
>>
>> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com]
>> *Sent:* 04 October 2016 14:40
>>
>>
>> *To:* Jason Jeffrey <jason at sudo.co.uk>
>> *Cc:* users <users at ovirt.org>
>> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Oct 4, 2016 at 10:51 AM, Simone Tiraboschi
><stirabos at redhat.com>
>> wrote:
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 11:56 PM, Jason Jeffrey <jason at sudo.co.uk>
>wrote:
>>
>> Hi,
>>
>>
>>
>> Another problem has appeared, after rebooting the primary the VM will
>not
>> start.
>>
>>
>>
>> Appears the symlink is broken between gluster mount ref and vdsm
>>
>>
>>
>> The first host was correctly deployed, but it seems that you are facing
>> some issue connecting the storage.
>>
>> Can you please attach the vdsm logs and /var/log/messages from the first
>> host?
>>
>>
>>
>> Thanks Jason,
>>
>> I suspect that your issue is related to this:
>>
>> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]:
>[2016-10-04
>> 17:24:39.522620] C [MSGID: 106002] [glusterd-server-quorum.c:351:
>> glusterd_do_volume_quorum_action] 0-management: Server quorum lost
>for
>> volume data. Stopping local bricks.
>>
>> Oct  4 18:24:39 dcasrv01 etc-glusterfs-glusterd.vol[2252]:
>[2016-10-04
>> 17:24:39.523272] C [MSGID: 106002] [glusterd-server-quorum.c:351:
>> glusterd_do_volume_quorum_action] 0-management: Server quorum lost
>for
>> volume engine. Stopping local bricks.
>>
>>
>>
>> and for some time your gluster volume has been working.
>>
>>
>>
>> But then:
>>
>> Oct  4 19:02:09 dcasrv01 systemd: Started /usr/bin/mount -t glusterfs
>-o
>> backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
>> /rhev/data-center/mnt/glusterSD/dcastor01:engine.
>>
>> Oct  4 19:02:09 dcasrv01 systemd: Starting /usr/bin/mount -t
>glusterfs -o
>> backup-volfile-servers=dcastor02:dcastor03 dcastor01:engine
>> /rhev/data-center/mnt/glusterSD/dcastor01:engine.
>>
>> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-
>> packages/yajsonrpc/stomp.py:352: DeprecationWarning:
>Dispatcher.pending
>> is deprecated. Use Dispatcher.socket.pending instead.
>>
>> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending =
>getattr(dispatcher,
>> 'pending', lambda: 0)
>>
>> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: /usr/lib/python2.7/site-
>> packages/yajsonrpc/stomp.py:352: DeprecationWarning:
>Dispatcher.pending
>> is deprecated. Use Dispatcher.socket.pending instead.
>>
>> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent: pending =
>getattr(dispatcher,
>> 'pending', lambda: 0)
>>
>> Oct  4 19:02:11 dcasrv01 journal: vdsm vds.dispatcher ERROR SSL error
>> during reading data: unexpected eof
>>
>> Oct  4 19:02:11 dcasrv01 journal: ovirt-ha-agent
>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to
>> storage server failed' - trying to restart agent
>>
>> Oct  4 19:02:11 dcasrv01 ovirt-ha-agent:
>ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error:
>> 'Connection to storage server failed' - trying to restart agent
>>
>> Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]:
>[2016-10-04
>> 18:02:12.384611] C [MSGID: 106003] [glusterd-server-quorum.c:346:
>> glusterd_do_volume_quorum_action] 0-management: Server quorum
>regained
>> for volume data. Starting local bricks.
>>
>> Oct  4 19:02:12 dcasrv01 etc-glusterfs-glusterd.vol[2252]:
>[2016-10-04
>> 18:02:12.388981] C [MSGID: 106003] [glusterd-server-quorum.c:346:
>> glusterd_do_volume_quorum_action] 0-management: Server quorum
>regained
>> for volume engine. Starting local bricks.
>>
>>
>>
>> And at that point VDSM started complaining that the
>hosted-engine-storage
>> domain doesn't exist anymore:
>>
>> Oct  4 19:02:30 dcasrv01 journal: ovirt-ha-agent
>> ovirt_hosted_engine_ha.lib.image.Image ERROR Error fetching volumes
>list:
>> Storage domain does not exist:
>(u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
>>
>> Oct  4 19:02:30 dcasrv01 ovirt-ha-agent:
>ERROR:ovirt_hosted_engine_ha.lib.image.Image:Error
>> fetching volumes list: Storage domain does not exist:
>> (u'bbb70623-194a-46d2-a164-76a4876ecaaf',)
>>
>>
>>
>> I see from the logs that the ovirt-ha-agent is trying to mount the
>> hosted-engine storage domain as:
>>
>> /usr/bin/mount -t glusterfs -o
>backup-volfile-servers=dcastor02:dcastor03
>> dcastor01:engine /rhev/data-center/mnt/glusterSD/dcastor01:engine.
>>
>>
>>
>> Pointing to dcastor01, dcastor02 and dcastor03 while your server is
>> dcasrv01.
>>
>> But at the same time it seems that dcasrv01 also has local bricks for
>> the same engine volume.
>>
>>
>>
>> So, is dcasrv01 just an alias for dcastor01? If not, you probably have
>> some issue with the configuration of your gluster volume.
>>
>>
>>
>>
>>
>>
>>
>> From broker.log
>>
>>
>>
>> Thread-169::ERROR::2016-10-04 22:44:16,189::storage_broker::138::
>> ovirt_hosted_engine_ha.broker.storage_broker.
>> StorageBroker::(get_raw_stats_for_service_type) Failed to read
>metadata
>> from /rhev/data-center/mnt/glusterSD/dcastor01:engine/
>> bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/hosted-engine.metadata
>>
>>
>>
>> [root at dcasrv01 ovirt-hosted-engine-ha]# ls -al /rhev/data-center/mnt/
>>
>glusterSD/dcastor01\:engine/bbb70623-194a-46d2-a164-76a4876ecaaf/ha_agent/
>>
>> total 9
>>
>> drwxrwx---. 2 vdsm kvm 4096 Oct  3 17:27 .
>>
>> drwxr-xr-x. 5 vdsm kvm 4096 Oct  3 17:17 ..
>>
>> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.lockspace ->
>>
>/var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/23d81b73-bcb7-
>> 4742-abde-128522f43d78/11d6a3e1-1817-429d-b2e0-9051a3cf41a4
>>
>> lrwxrwxrwx. 1 vdsm kvm  132 Oct  3 17:27 hosted-engine.metadata ->
>>
>/var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/fd44dbf9-473a-
>> 496a-9996-c8abe3278390/cee9440c-4eb8-453b-bc04-c47e6f9cbc93
>>
>>
>>
>> [root at dcasrv01 /]# ls -al
>/var/run/vdsm/storage/bbb70623-194a-46d2-a164-
>> 76a4876ecaaf/
>>
>> ls: cannot access
>/var/run/vdsm/storage/bbb70623-194a-46d2-a164-76a4876ecaaf/:
>> No such file or directory
>>
>>
>>
>> Though file appears to be there
>>
>>
>>
>> Gluster is setup as xpool/engine
>>
>>
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# pwd
>>
>> /xpool/engine/brick/bbb70623-194a-46d2-a164-76a4876ecaaf/
>> images/fd44dbf9-473a-496a-9996-c8abe3278390
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# ls -al
>>
>> total 2060
>>
>> drwxr-xr-x. 2 vdsm kvm    4096 Oct  3 17:17 .
>>
>> drwxr-xr-x. 6 vdsm kvm    4096 Oct  3 17:17 ..
>>
>> -rw-rw----. 2 vdsm kvm 1028096 Oct  3 20:48 cee9440c-4eb8-453b-bc04-
>> c47e6f9cbc93
>>
>> -rw-rw----. 2 vdsm kvm 1048576 Oct  3 17:17 cee9440c-4eb8-453b-bc04-
>> c47e6f9cbc93.lease
>>
>> -rw-r--r--. 2 vdsm kvm     283 Oct  3 17:17
>cee9440c-4eb8-453b-bc04-c47e6f9cbc93.meta
>>
>>
>>
>>
>>
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume
>info
>>
>>
>>
>> Volume Name: data
>>
>> Type: Replicate
>>
>> Volume ID: 54fbcafc-fed9-4bce-92ec-fa36cdcacbd4
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor01:/xpool/data/brick
>>
>> Brick2: dcastor03:/xpool/data/brick
>>
>> Brick3: dcastor02:/xpool/data/bricky (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> cluster.eager-lock: enable
>>
>> network.remote-dio: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>> Volume Name: engine
>>
>> Type: Replicate
>>
>> Volume ID: dd4c692d-03aa-4fc6-9011-a8dad48dad96
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor01:/xpool/engine/brick
>>
>> Brick2: dcastor02:/xpool/engine/brick
>>
>> Brick3: dcastor03:/xpool/engine/brick (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> performance.quick-read: off
>>
>> performance.read-ahead: off
>>
>> performance.io-cache: off
>>
>> performance.stat-prefetch: off
>>
>> cluster.eager-lock: enable
>>
>> network.remote-dio: enable
>>
>> cluster.quorum-type: auto
>>
>> cluster.server-quorum-type: server
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>> Volume Name: export
>>
>> Type: Replicate
>>
>> Volume ID: 23f14730-d264-4cc2-af60-196b943ecaf3
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor02:/xpool/export/brick
>>
>> Brick2: dcastor03:/xpool/export/brick
>>
>> Brick3: dcastor01:/xpool/export/brick (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>> Volume Name: iso
>>
>> Type: Replicate
>>
>> Volume ID: b2d3d7e2-9919-400b-8368-a0443d48e82a
>>
>> Status: Started
>>
>> Number of Bricks: 1 x (2 + 1) = 3
>>
>> Transport-type: tcp
>>
>> Bricks:
>>
>> Brick1: dcastor01:/xpool/iso/brick
>>
>> Brick2: dcastor02:/xpool/iso/brick
>>
>> Brick3: dcastor03:/xpool/iso/brick (arbiter)
>>
>> Options Reconfigured:
>>
>> performance.readdir-ahead: on
>>
>> storage.owner-uid: 36
>>
>> storage.owner-gid: 36
>>
>>
>>
>>
>>
>> [root at dcasrv01 fd44dbf9-473a-496a-9996-c8abe3278390]# gluster volume status
>>
>> Status of volume: data
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor01:/xpool/data/brick           49153     0          Y       3076
>> Brick dcastor03:/xpool/data/brick           49153     0          Y       3019
>> Brick dcastor02:/xpool/data/bricky          49153     0          Y       3857
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume data
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>>
>> Status of volume: engine
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor01:/xpool/engine/brick         49152     0          Y       3131
>> Brick dcastor02:/xpool/engine/brick         49152     0          Y       3852
>> Brick dcastor03:/xpool/engine/brick         49152     0          Y       2992
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume engine
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>>
>> Status of volume: export
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor02:/xpool/export/brick         49155     0          Y       3872
>> Brick dcastor03:/xpool/export/brick         49155     0          Y       3147
>> Brick dcastor01:/xpool/export/brick         49155     0          Y       3150
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume export
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>>
>> Status of volume: iso
>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dcastor01:/xpool/iso/brick            49154     0          Y       3152
>> Brick dcastor02:/xpool/iso/brick            49154     0          Y       3881
>> Brick dcastor03:/xpool/iso/brick            49154     0          Y       3146
>> NFS Server on localhost                     2049      0          Y       3097
>> Self-heal Daemon on localhost               N/A       N/A        Y       3088
>> NFS Server on dcastor03                     2049      0          Y       3039
>> Self-heal Daemon on dcastor03               N/A       N/A        Y       3114
>> NFS Server on dcasrv02                      2049      0          Y       3871
>> Self-heal Daemon on dcasrv02                N/A       N/A        Y       3864
>>
>> Task Status of Volume iso
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>> Jason
>>
>>
>>
>>
>>
>>
>>
>> *From:* users-bounces at ovirt.org [mailto:users-bounces at ovirt.org] *On
>> Behalf Of *Jason Jeffrey
>> *Sent:* 03 October 2016 18:40
>> *To:* users at ovirt.org
>>
>>
>> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>>
>>
>>
>> Hi,
>>
>>
>>
>> Setup log attached for primary
>>
>>
>>
>> Regards
>>
>>
>>
>> Jason
>>
>>
>>
>> *From:* Simone Tiraboschi [mailto:stirabos at redhat.com
>> <stirabos at redhat.com>]
>> *Sent:* 03 October 2016 09:27
>> *To:* Jason Jeffrey <jason at sudo.co.uk>
>> *Cc:* users <users at ovirt.org>
>> *Subject:* Re: [ovirt-users] 4.0 - 2nd node fails on deploy
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 12:45 AM, Jason Jeffrey <jason at sudo.co.uk>
>wrote:
>>
>> Hi,
>>
>>
>>
>> I am trying to build a x3 HC cluster, with a self hosted engine using
>> gluster.
>>
>>
>>
>> I have successfully built the 1st node. However, when I attempt to run
>> hosted-engine --deploy on node 2, I get the following error:
>>
>>
>>
>> [WARNING] A configuration file must be supplied to deploy Hosted
>Engine on
>> an additional host.
>>
>> [ ERROR ] 'version' is not stored in the HE configuration image
>>
>> [ ERROR ] Unable to get the answer file from the shared storage
>>
>> [ ERROR ] Failed to execute stage 'Environment customization': Unable
>to
>> get the answer file from the shared storage
>>
>> [ INFO  ] Stage: Clean up
>>
>> [ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-
>> setup/answers/answers-20161002232505.conf'
>>
>> [ INFO  ] Stage: Pre-termination
>>
>> [ INFO  ] Stage: Termination
>>
>> [ ERROR ] Hosted Engine deployment failed
>>
>>
>>
>> Looking at the failure in the log file..
>>
>>
>>
>> Can you please attach hosted-engine-setup logs from the first host?
>>
>>
>>
>>
>>
>> 2016-10-02 23:25:05 WARNING otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._customization:151 A configuration file must be supplied to deploy Hosted Engine on an additional host.
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:61 _fetch_answer_file
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:69 fetching from: /rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:69 executing: 'sudo -u vdsm dd if=/rhev/data-center/mnt/glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/78cb2527-a2e2-489a-9fad-465a72221b37 bs=4k'
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:70 executing: 'tar -tvf -'
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:88 stdout:
>> 2016-10-02 23:25:05 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile heconflib._dd_pipe_tar:89 stderr:
>> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile heconflib.validateConfImage:111 'version' is not stored in the HE configuration image
>> 2016-10-02 23:25:05 ERROR otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:73 Unable to get the answer file from the shared storage
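
The log above shows the setup reading the volume with dd, listing it with "tar -tvf -", getting empty output, and then reporting that 'version' is missing. A rough way to reproduce that check by hand is sketched below. It assumes the image is a plain, uncompressed tar archive and that the entry is literally named "version" (taken from the error text); the function name is made up and this is not the hosted-engine code itself.

```python
import io
import tarfile


def conf_image_has_member(fileobj, member: str = "version") -> bool:
    """Return True if `fileobj` is a tar archive containing `member`.

    An image made only of zero bytes lists as an empty archive, and a
    zero-length or otherwise unreadable image raises ReadError; both cases
    return False here.
    """
    try:
        with tarfile.open(fileobj=fileobj, mode="r:") as archive:
            names = {
                name[2:] if name.startswith("./") else name
                for name in archive.getnames()
            }
    except tarfile.ReadError:
        return False
    return member in names


# Usage against a zero-filled buffer standing in for an unwritten image.
print(conf_image_has_member(io.BytesIO(b"\0" * 10240)))  # False
```

On a real host the file object would be the configuration volume opened in binary mode as the vdsm user. An image that was never written, or that is read back as zeros, would give the same empty listing as in the log.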
>>
>>
>>
>> Looking at the detected gluster path - /rhev/data-center/mnt/
>> glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-
>> fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>>
>>
>>
>> [root at dcasrv02 ~]# ls -al /rhev/data-center/mnt/
>> glusterSD/dcastor02:engine/0a021563-91b5-4f49-9c6b-
>> fff45e85a025/images/f055216c-02f9-4cd1-a22c-d6b56a0a8e9b/
>>
>> total 1049609
>>
>> drwxr-xr-x. 2 vdsm kvm       4096 Oct  2 04:46 .
>>
>> drwxr-xr-x. 6 vdsm kvm       4096 Oct  2 04:46 ..
>>
>> -rw-rw----. 1 vdsm kvm 1073741824 Oct  2 04:46
>78cb2527-a2e2-489a-9fad-
>> 465a72221b37
>>
>> -rw-rw----. 1 vdsm kvm    1048576 Oct  2 04:46
>78cb2527-a2e2-489a-9fad-
>> 465a72221b37.lease
>>
>> -rw-r--r--. 1 vdsm kvm        294 Oct  2 04:46
>78cb2527-a2e2-489a-9fad-465a72221b37.meta
>>
>>
>>
>>
>> 78cb2527-a2e2-489a-9fad-465a72221b37 is a 1 GB file. Is this the engine
>> VM?
>>
>> Copying the answers file from the primary
>> (/etc/ovirt-hosted-engine/answers.conf) to node 2 and rerunning produces
>> the same error :(
>>
>> (hosted-engine --deploy --config-append=/root/answers.conf)
>>
>>
>>
>> Also tried on node 3, same issues
>>
>>
>>
>> Happy to provide logs and other debugs
>>
>>
>>
>> Thanks
>>
>>
>>
>> Jason
>>
>> _______________________________________________
>> Users mailing list
>> Users at ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>>
>>
>>
>>
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org
>http://www.gluster.org/mailman/listinfo/gluster-users
