[Gluster-users] I/O error on replicated volume
Mohammed Rafi K C
rkavunga at redhat.com
Fri Mar 27 08:24:39 UTC 2015
On 03/27/2015 11:04 AM, Jonathan Heese wrote:
> On Mar 27, 2015, at 1:29 AM, "Mohammed Rafi K C" <rkavunga at redhat.com
> <mailto:rkavunga at redhat.com>> wrote:
>
>>
>> When we change the transport from x to y, it should be reflected in
>> all the vol files. Unfortunately, the volume set command failed to make
>> the change in the nfs-server volfile (of course, that is a bug). As I
>> mentioned in my previous mails, changing the volume files with the
>> volume set command is not recommended; I suggested it only to check
>> whether tcp works fine or not.
>>
>> The reason you are getting the rdma connection error is that the
>> bricks are now running over tcp, so the brick processes are listening
>> on tcp sockets. But the nfs-server still asks for an rdma connection,
>> so it tries to connect from an rdma port to a tcp port, and the
>> connection is obviously rejected.
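
To illustrate: you can check which of the generated volfiles still carry
an rdma transport option, and confirm that the bricks are now listening on
tcp sockets, with something like the following (paths assume the default
/var/lib/glusterd layout and the volume name from this thread; the brick
port 49152 is taken from the "gluster volume status" output quoted below):

  # which generated volfiles still mention rdma
  grep -rn "transport-type" /var/lib/glusterd/nfs/ /var/lib/glusterd/vols/gluster_disk/

  # confirm the brick process has a tcp listener on its port
  ss -ltnp | grep 49152        # or: netstat -ltnp | grep 49152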
>
> Okay, thanks for the thorough explanation there.
>
> Now that we know that TCP does function without the original I/O
> errors (from the start of this thread), how do you suggest that I proceed?
>
> Do I have to wait for a subsequent release to rid myself of this bug?
I will make sure the bug gets fixed. We can also expect the rdma patches
to be merged soon. After the rdma bugs are fixed in 3.5.x, you can test
and switch back to rdma.
>
> Would it be feasible for me to switch from RDMA to TCP in a more
> permanent fashion (maybe wipe the cluster and start over?)?
Either you can manually edit the nfs volfile to change the transport to
"option transport-type tcp" everywhere it appears, and then restart nfs so
it picks up the change; or, if possible, you can start a fresh cluster
running on tcp.
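
For example, a minimal sketch of that manual edit (run it on both nodes
and keep a backup, since glusterd can regenerate this file during later
volume operations); afterwards the gluster nfs process has to be restarted
so it re-reads the volfile:

  cp /var/lib/glusterd/nfs/nfs-server.vol /var/lib/glusterd/nfs/nfs-server.vol.bak
  sed -i 's/option transport-type rdma/option transport-type tcp/' \
      /var/lib/glusterd/nfs/nfs-server.vol

  # verify: every transport-type line should now read tcp
  grep -n "transport-type" /var/lib/glusterd/nfs/nfs-server.vol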
Rafi
>
> Thanks.
>
> Regards,
> Jon Heese
>
>> Regards
>> Rafi KC
>>
>> On 03/27/2015 12:28 AM, Jonathan Heese wrote:
>>>
>>> Rafi,
>>>
>>>
>>> Here is my nfs-server.vol file:
>>>
>>>
>>> [root@duke ~]# cat /var/lib/glusterd/nfs/nfs-server.vol
>>> volume gluster_disk-client-0
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duke-ib
>>> end-volume
>>>
>>> volume gluster_disk-client-1
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duchess-ib
>>> end-volume
>>>
>>> volume gluster_disk-replicate-0
>>> type cluster/replicate
>>> subvolumes gluster_disk-client-0 gluster_disk-client-1
>>> end-volume
>>>
>>> volume gluster_disk-dht
>>> type cluster/distribute
>>> subvolumes gluster_disk-replicate-0
>>> end-volume
>>>
>>> volume gluster_disk-write-behind
>>> type performance/write-behind
>>> subvolumes gluster_disk-dht
>>> end-volume
>>>
>>> volume gluster_disk
>>> type debug/io-stats
>>> option count-fop-hits off
>>> option latency-measurement off
>>> subvolumes gluster_disk-write-behind
>>> end-volume
>>>
>>> volume nfs-server
>>> type nfs/server
>>> option nfs3.gluster_disk.volume-id 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> option rpc-auth.addr.gluster_disk.allow *
>>> option nfs.drc off
>>> option nfs.nlm on
>>> option nfs.dynamic-volumes on
>>> subvolumes gluster_disk
>>> end-volume
>>>
>>> I see that "transport-type rdma" is listed a couple times here, but
>>> "gluster volume info" indicates that the volume is using the tcp
>>> transport:
>>>
>>>
>>> [root@duke ~]# gluster volume info gluster_disk
>>>
>>> Volume Name: gluster_disk
>>> Type: Replicate
>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: duke-ib:/bricks/brick1
>>> Brick2: duchess-ib:/bricks/brick1
>>> Options Reconfigured:
>>> config.transport: tcp
>>>
>>> Please let me know if you need any further information from me to
>>> determine how to correct this discrepancy.
>>>
>>>
>>> Also, I feel compelled to ask: Since the TCP connections are going
>>> over the InfiniBand connections between the two Gluster servers
>>> (based on the hostnames which are pointed to the IB IPs via hosts
>>> files), are there any (significant) drawbacks to using TCP instead
>>> of RDMA here? Thanks.
>>>
>>>
>>> Regards,
>>>
>>> Jon Heese
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Mohammed Rafi K C <rkavunga at redhat.com>
>>> *Sent:* Monday, March 23, 2015 3:29 AM
>>> *To:* Jonathan Heese
>>> *Cc:* gluster-users
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>>
>>> On 03/23/2015 11:28 AM, Jonathan Heese wrote:
>>>> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C"
>>>> <rkavunga at redhat.com <mailto:rkavunga at redhat.com>> wrote:
>>>>
>>>>>
>>>>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>>>>
>>>>>> Mohamed,
>>>>>>
>>>>>>
>>>>>> I have completed the steps you suggested (unmount all, stop the
>>>>>> volume, set the config.transport to tcp, start the volume, mount,
>>>>>> etc.), and the behavior has indeed changed.
>>>>>>
>>>>>>
>>>>>> [root@duke ~]# gluster volume info
>>>>>>
>>>>>> Volume Name: gluster_disk
>>>>>> Type: Replicate
>>>>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>>>>> Status: Started
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: duke-ib:/bricks/brick1
>>>>>> Brick2: duchess-ib:/bricks/brick1
>>>>>> Options Reconfigured:
>>>>>> config.transport: tcp
>>>>>>
>>>>>>
>>>>>> [root@duke ~]# gluster volume status
>>>>>> Status of volume: gluster_disk
>>>>>> Gluster process                            Port    Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick duke-ib:/bricks/brick1               49152   Y       16362
>>>>>> Brick duchess-ib:/bricks/brick1            49152   Y       14155
>>>>>> NFS Server on localhost                    2049    Y       16374
>>>>>> Self-heal Daemon on localhost              N/A     Y       16381
>>>>>> NFS Server on duchess-ib                   2049    Y       14167
>>>>>> Self-heal Daemon on duchess-ib             N/A     Y       14174
>>>>>>
>>>>>> Task Status of Volume gluster_disk
>>>>>> ------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
>>>>>>
>>>>>> I am no longer seeing the I/O errors during prolonged periods of
>>>>>> write I/O that I was seeing when the transport was set to rdma.
>>>>>> However, I am seeing this message on both nodes every 3 seconds
>>>>>> (almost exactly):
>>>>>>
>>>>>>
>>>>>> ==> /var/log/glusterfs/nfs.log <==
>>>>>> [2015-03-21 14:17:40.379719] W
>>>>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1:
>>>>>> cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>>>>> peer:10.10.10.2:49152)
>>>>>>
>>>>>>
>>>>>> Is this something to worry about?
>>>>>>
>>>>> If you are not using nfs to export the volumes, there is nothing to
>>>>> worry about.
>>>>
>>>> I'm using the native glusterfs FUSE component to mount the volume
>>>> locally on both servers -- I assume that you're referring to the
>>>> standard NFS protocol stuff, which I'm not using here.
>>>>
>>>> Incidentally, I would like to keep my logs from filling up with
>>>> junk if possible. Is there something I can do to get rid of these
>>>> (useless?) error messages?
>>>
>>> If I understand correctly, you are now getting this flood of log
>>> messages only in the nfs log, and all the other logs are fine, right?
>>> If that is the case, and you are not using nfs to export the volume at
>>> all, then as a workaround you can disable nfs for your volume
>>> (gluster volume set <volname> nfs.disable on). This will turn off the
>>> gluster nfs server, and you will no longer get those log messages.
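
For example, with the volume name used in this thread:

  gluster volume set gluster_disk nfs.disable on

  # the "NFS Server" rows should then drop out of:
  gluster volume status gluster_disk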
>>>
>>>
>>>>>> Any idea why there are rdma pieces in play when I've set my
>>>>>> transport to tcp?
>>>>>>
>>>>>
>>>>> there should not be any piece of rdma left. If possible, can you
>>>>> paste the volfile for the nfs server? You can find the volfile in
>>>>> /var/lib/glusterd/nfs/nfs-server.vol or
>>>>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol
>>>>
>>>> I will get this for you when I can. Thanks.
>>>
>>> If you can get that, it will be a great help in understanding the problem.
>>>
>>>
>>> Rafi KC
>>>
>>>>
>>>> Regards,
>>>> Jon Heese
>>>>
>>>>> Rafi KC
>>>>>>
>>>>>> The actual I/O appears to be handled properly and I've seen no
>>>>>> further errors in the testing I've done so far.
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Jon Heese
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* gluster-users-bounces at gluster.org
>>>>>> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
>>>>>> <jheese at inetu.net>
>>>>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>>>>> *To:* Mohammed Rafi K C
>>>>>> *Cc:* gluster-users
>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>> Mohammed,
>>>>>>
>>>>>> Thanks very much for the reply. I will try that and report back.
>>>>>>
>>>>>> Regards,
>>>>>> Jon Heese
>>>>>>
>>>>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C"
>>>>>> <rkavunga at redhat.com <mailto:rkavunga at redhat.com>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Does anyone else have any further suggestions for
>>>>>>>> troubleshooting this?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> To sum up: I have a 2 node 2 brick replicated volume, which
>>>>>>>> holds a handful of iSCSI image files which are mounted and
>>>>>>>> served up by tgtd (CentOS 6) to a handful of devices on a
>>>>>>>> dedicated iSCSI network. The most important iSCSI clients
>>>>>>>> (initiators) are four VMware ESXi 5.5 hosts that use the iSCSI
>>>>>>>> volumes as backing for their datastores for virtual machine
>>>>>>>> storage.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> After a few minutes of sustained writing to the volume, I am
>>>>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>>>>> error in /var/log/glusterfs/mnt-gluster-disk.log:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.582801] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When this happens, the ESXi box fails its write operation and
>>>>>>>> returns an error to the effect of “Unable to write data to
>>>>>>>> datastore”. I don’t see anything else in the supporting logs
>>>>>>>> to explain the root cause of the i/o errors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Any and all suggestions are appreciated. Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> From the mount logs, I assume that your volume's transport type is
>>>>>>> rdma. There are some known issues with rdma in 3.5.3, and the
>>>>>>> patches to address those issues have already been sent upstream
>>>>>>> [1]. From the logs alone it is hard to tell whether this problem is
>>>>>>> related to the rdma transport or not. To make sure that the tcp
>>>>>>> transport works well in this scenario, if possible can you try to
>>>>>>> reproduce the same issue using a tcp-type volume? You can change
>>>>>>> the transport type of the volume with the following steps (not
>>>>>>> recommended in normal use); a concrete example follows the steps below.
>>>>>>>
>>>>>>> 1) unmount every client
>>>>>>> 2) stop the volume
>>>>>>> 3) run gluster volume set volname config.transport tcp
>>>>>>> 4) start the volume again
>>>>>>> 5) mount the clients
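
As a concrete sketch of those steps (the volume name is from this thread;
the client mount point and mount server are only examples, adjust them to
your setup):

  umount /mnt/gluster_disk                                # step 1, on every client
  gluster volume stop gluster_disk                        # step 2
  gluster volume set gluster_disk config.transport tcp    # step 3
  gluster volume start gluster_disk                       # step 4
  mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster_disk   # step 5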
>>>>>>>
>>>>>>> [1] : http://goo.gl/2PTL61
>>>>>>>
>>>>>>> Regards
>>>>>>> Rafi KC
>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>> /** This message contains confidential information, which also
>>>>>>>> may be privileged, and is intended only for the person(s)
>>>>>>>> addressed above. Any unauthorized use, distribution, copying or
>>>>>>>> disclosure of confidential and/or privileged information is
>>>>>>>> strictly prohibited. If you have received this communication in
>>>>>>>> error, please erase all copies of the message and its
>>>>>>>> attachments and notify the sender immediately via reply e-mail. **/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:*Jonathan Heese
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>>>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>>>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ravi,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The last lines in the mount log before the massive vomit of I/O
>>>>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126340] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-0: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>>>> (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126687] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-1: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>>>> (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730165] I
>>>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-0:
>>>>>>>> changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>>>> (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739500] I
>>>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-1:
>>>>>>>> changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>>>> (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.741883] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744524] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744537] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0'
>>>>>>>> came back up; going online.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744627] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.753037] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755657] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755676] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.761945] I
>>>>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.762144] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>>>>
>>>>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>>>>> 7.22 kernel 7.14
>>>>>>>>
>>>>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>> …
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've seen no indication of split-brain on any files at any
>>>>>>>> point in this (ever since downgrading from 3.6.2 to 3.5.3, which
>>>>>>>> is when this particular issue started):
>>>>>>>>
>>>>>>>> [root@duke gfapi-module-for-linux-target-driver-]# gluster v heal gluster_disk info
>>>>>>>>
>>>>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:*Ravishankar N [mailto:ravishankar at redhat.com]
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>>>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>>>>>>> <mailto:gluster-users at gluster.org>
>>>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> So I resolved my previous issue with split-brains and the
>>>>>>>> lack of self-healing by dropping my installed glusterfs*
>>>>>>>> packages from 3.6.2 to 3.5.3, but now I've picked up a new
>>>>>>>> issue, which actually makes normal use of the volume
>>>>>>>> practically impossible.
>>>>>>>>
>>>>>>>> A little background for those not already paying close
>>>>>>>> attention:
>>>>>>>> I have a 2 node 2 brick replicating volume whose purpose in
>>>>>>>> life is to hold iSCSI target files, primarily for use to
>>>>>>>> provide datastores to a VMware ESXi cluster. The plan is
>>>>>>>> to put a handful of image files on the Gluster volume,
>>>>>>>> mount them locally on both Gluster nodes, and run tgtd on
>>>>>>>> both, pointed to the image files on the mounted gluster
>>>>>>>> volume. Then the ESXi boxes will use multipath
>>>>>>>> (active/passive) iSCSI to connect to the nodes, with
>>>>>>>> automatic failover in case of planned or unplanned downtime
>>>>>>>> of the Gluster nodes.
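
For context, the tgtd side of such a setup is usually just a backing-store
stanza in /etc/tgt/targets.conf pointing at an image file on the mounted
gluster volume; the IQN and image path below are illustrative only, not
taken from this thread:

  <target iqn.2015-03.local.example:gluster-datastore1>
      backing-store /mnt/gluster_disk/datastore1.img
  </target>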
>>>>>>>>
>>>>>>>> In my most recent round of testing with 3.5.3, I'm seeing a
>>>>>>>> massive failure to write data to the volume after about
>>>>>>>> 5-10 minutes, so I've simplified the scenario a bit (to
>>>>>>>> minimize the variables) to: both Gluster nodes up, only one
>>>>>>>> node (duke) mounted and running tgtd, and just regular
>>>>>>>> (single path) iSCSI from a single ESXi server.
>>>>>>>>
>>>>>>>> About 5-10 minutes into migrating a VM onto the test
>>>>>>>> datastore, /var/log/messages on duke gets blasted with a
>>>>>>>> ton of messages exactly like this:
>>>>>>>>
>>>>>>>> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>>>>> 0x1781e00 2a -1 512 22971904, Input/output error
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted
>>>>>>>> with a ton of messages exactly like this:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.572279] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse:
>>>>>>>> 635299: WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Are there any messages in the mount log from AFR about
>>>>>>>> split-brain just before the above line appears?
>>>>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>>>>> I/O on files that are in split-brain fails with EIO.
>>>>>>>>
>>>>>>>> -Ravi
>>>>>>>>
>>>>>>>> And the write operation from VMware's side fails as soon as
>>>>>>>> these messages start.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't see any other errors (in the log files I know of)
>>>>>>>> indicating the root cause of these i/o errors. I'm sure
>>>>>>>> that this is not enough information to tell what's going
>>>>>>>> on, but can anyone help me figure out what to look at next
>>>>>>>> to figure this out?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've also considered using Dan Lambright's libgfapi gluster
>>>>>>>> module for tgtd (or something similar) to avoid going
>>>>>>>> through FUSE, but I'm not sure whether that would be
>>>>>>>> irrelevant to this problem, since I'm not 100% sure if it
>>>>>>>> lies in FUSE or elsewhere.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>>
>>>>>>>> Gluster-users mailing list
>>>>>>>>
>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>>>>
>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>
>>>
>>