[Gluster-users] gluster and LIO, fairly basic setup, having major issues

Michael Ciccarelli mikecicc01 at gmail.com
Thu Oct 6 20:25:02 UTC 2016


This is the contents of the volume's info file. Is there another config file
you would like to see?
type=2
count=2
status=1
sub_count=2
stripe_count=1
replica_count=2
disperse_count=0
redundancy_count=0
version=3
transport-type=0
volume-id=98c258e6-ae9e-4407-8f25-7e3f7700e100
username=(removed)
password=(removed)
op-version=3
client-op-version=3
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
diagnostics.count-fop-hits=on
diagnostics.latency-measurement=on
performance.readdir-ahead=on
brick-0=media1-be:-gluster-brick1-gluster_volume_0
brick-1=media2-be:-gluster-brick1-gluster_volume_0
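
For reference, the same settings can be pulled via the CLI; this is roughly
what I run (the volume name is gluster_volume_0):

# show the volume type, bricks, and reconfigured options
gluster volume info gluster_volume_0
# show whether each brick and self-heal daemon is online, with ports and PIDs
gluster volume status gluster_volume_0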

Here are some log entries from etc-glusterfs-glusterd.vol.log:
The message "I [MSGID: 106006]
[glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs
has disconnected from glusterd." repeated 39 times between [2016-10-06
20:10:14.963402] and [2016-10-06 20:12:11.979684]
[2016-10-06 20:12:14.980203] I [MSGID: 106006]
[glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs
has disconnected from glusterd.
[2016-10-06 20:13:50.993490] W [socket.c:596:__socket_rwv] 0-nfs: readv on
/var/run/gluster/360710d59bc4799f8c8a6374936d2b1b.socket failed (Invalid
argument)
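
As I understand it, those nfs messages are just glusterd noticing that its
built-in NFS service is down. Since I'm only exporting this volume over
iSCSI, I may disable it to quiet the logs, something like:

# turn off the built-in gluster NFS server for this volume (not in use here)
gluster volume set gluster_volume_0 nfs.disable on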

I can provide any specific details you would like to see. Last night I tried
one more time: it appeared to work OK with a single VM running under VMware,
but as soon as I had three running, the targets became unresponsive. I
believe the gluster volume is fine, but for whatever reason the iSCSI target
daemon seems to be having issues...
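
For what it's worth, this is roughly how I've been checking the target side
when it hangs (a sketch; the service is named "target" on my boxes, yours
may differ):

# dump the LIO configuration tree: backstores, targets, LUNs, and portals
targetcli ls
# confirm the target service itself is still running
systemctl status target
# look for recent iSCSI/LIO messages from the kernel
dmesg | grep -i iscsi | tail -n 50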

Here is an excerpt from the messages file:
Oct  5 23:13:00 media2 kernel: MODE SENSE: unimplemented page/subpage:
0x1c/0x02
Oct  5 23:13:00 media2 kernel: MODE SENSE: unimplemented page/subpage:
0x1c/0x02
Oct  5 23:13:35 media2 kernel:
iSCSI/iqn.1998-01.com.vmware:vmware4-0941d552: Unsupported SCSI Opcode
0x4d, sending CHECK_CONDITION.
Oct  5 23:13:35 media2 kernel:
iSCSI/iqn.1998-01.com.vmware:vmware4-0941d552: Unsupported SCSI Opcode
0x4d, sending CHECK_CONDITION.
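
(From what I can tell, opcode 0x4d is LOG SENSE and page 0x1c is the
informational exceptions mode page; LIO simply doesn't implement those, so I
suspect this particular noise is harmless and not the real problem.)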

And here are some more VMware iSCSI errors:
2016-10-06T20:22:11.496Z cpu2:32825)NMP: nmp_ThrottleLogForDevice:2321: Cmd
0x89 (0x412e808532c0, 32801) to dev "naa.6001405c0d86944f3d2468d80c7d1540"
on
2016-10-06T20:22:11.635Z cpu2:32787)ScsiDeviceIO: 2338: Cmd(0x412e808532c0)
0x89, CmdSN 0x4f05 from world 32801 to dev
"naa.6001405c0d86944f3d2468d80c7d1
2016-10-06T20:22:11.635Z cpu3:35532)Fil3: 15389: Max timeout retries
exceeded for caller Fil3_FileIO (status 'Timeout')

2016-10-06T20:22:11.635Z cpu2:196414)HBX: 2832: Waiting for timed out [HB
state abcdef02 offset 3928064 gen 25 stampUS 49571997650 uuid
57f5c142-45632d75
2016-10-06T20:22:11.635Z cpu3:35532)HBX: 2832: Waiting for timed out [HB
state abcdef02 offset 3928064 gen 25 stampUS 49571997650 uuid
57f5c142-45632d75-
2016-10-06T20:22:11.635Z cpu0:32799)NMP: nmp_ThrottleLogForDevice:2321: Cmd
0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540"
on
2016-10-06T20:22:11.635Z cpu0:32799)ScsiDeviceIO: 2325: Cmd(0x412e80848580)
0x28, CmdSN 0x4f06 from world 32799 to dev
"naa.6001405c0d86944f3d2468d80c7d1
2016-10-06T20:22:11.773Z cpu0:32843)NMP: nmp_ThrottleLogForDevice:2321: Cmd
0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540"
on
2016-10-06T20:22:11.916Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd
0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540"
on
2016-10-06T20:22:12.000Z cpu2:33431)iscsi_vmk: iscsivmk_ConnNetRegister:
socket 0x410987bf0800 network resource pool netsched.pools.persist.iscsi
associa
2016-10-06T20:22:12.000Z cpu2:33431)iscsi_vmk: iscsivmk_ConnNetRegister:
socket 0x410987bf0800 network tracker id 16 tracker.iSCSI.172.16.1.40
associated
2016-10-06T20:22:12.056Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd
0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540"
on
2016-10-06T20:22:12.194Z cpu0:35549)NMP: nmp_ThrottleLogForDevice:2321: Cmd
0x28 (0x412e80848580, 32799) to dev "naa.6001405c0d86944f3d2468d80c7d1540"
on
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk:
iscsivmk_StartConnection: vmhba38:CH:1 T:1 CN:0: iSCSI connection is being
marked "ONLINE"
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk:
iscsivmk_StartConnection: Sess [ISID: 00023d000004 TARGET:
iqn.2016-09.iscsi.gluster:shared TPGT:
2016-10-06T20:22:12.253Z cpu2:33431)WARNING: iscsi_vmk:
iscsivmk_StartConnection: Conn [CID: 0 L: 172.16.1.53:49959 R:
172.16.1.40:3260]

Could it be that the gluster overhead is simply overwhelming the LIO target?
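
Since I already have diagnostics.latency-measurement and
diagnostics.count-fop-hits turned on, my next step is to profile the volume
while the VMs are running, roughly like this:

# start collecting per-brick fop statistics
gluster volume profile gluster_volume_0 start
# ...reproduce the load with 3 VMs running...
# dump per-fop counts and avg/min/max latencies for each brick
gluster volume profile gluster_volume_0 info
# stop collecting when done
gluster volume profile gluster_volume_0 stop

If the WRITE/FSYNC latencies blow up on the bricks as soon as more than one
VM is running, that would point at gluster rather than LIO.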

thanks,
Mike



On Thu, Oct 6, 2016 at 12:22 PM, Vijay Bellur <vbellur at redhat.com> wrote:

> Hi Mike,
>
> Can you please share your gluster volume configuration?
>
> Also, do you notice anything in the client logs on the node where the
> fileio backstore is configured?
>
> Thanks,
> Vijay
>
> On Wed, Oct 5, 2016 at 8:56 PM, Michael Ciccarelli <mikecicc01 at gmail.com>
> wrote:
> > So I have a fairly basic setup using glusterfs between two nodes. The nodes
> > have 10 GbE connections and the bricks reside on SSD LVM LUNs:
> >
> > Brick1: media1-be:/gluster/brick1/gluster_volume_0
> > Brick2: media2-be:/gluster/brick1/gluster_volume_0
> >
> >
> > On this volume I have a LIO iSCSI target with one fileio backstore that's
> > being shared out to VMware ESXi hosts. The volume is around 900 GB and the
> > fileio store is around 850 GB:
> >
> > -rw-r--r-- 1 root root 912680550400 Oct  5 20:47 iscsi.disk.3
> >
> > I set the WWN to be the same so the ESXi hosts see the nodes as two paths
> > to the same target. I believe this is what I want. The issue I'm seeing is
> > that while I/O wait is low, CPU usage is high with only 3 VMs running on
> > just one of the ESX servers:
> >
> > this is media2-be:
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> >  1474 root      20   0 1396620  37912   5980 S 135.0  0.1 157:01.84 glusterfsd
> >  1469 root      20   0  747996  13724   5424 S   2.0  0.0   1:10.59 glusterfs
> >
> > And this morning I had to restart the LIO service on media1-be because
> > VMware was seeing time-out issues. I'm seeing errors like this on the
> > VMware ESX servers:
> >
> > 2016-10-06T00:51:41.100Z cpu0:32785)WARNING: ScsiDeviceIO: 1223: Device
> > naa.600140501ce79002e724ebdb66a6756d performance has deteriorated. I/O
> > latency increased from average value of 33420 microseconds to 732696
> > microseconds.
> >
> > Are there any special settings needed to get gluster+LIO+VMware working?
> > Has anyone gotten this working reliably enough to be stable? What am I
> > missing?
> >
> > thanks,
> > Mike
> >

