[Gluster-users] I/O error on replicated volume
Mohammed Rafi K C
rkavunga at redhat.com
Fri Mar 27 08:24:39 UTC 2015
On 03/27/2015 11:04 AM, Jonathan Heese wrote:
> On Mar 27, 2015, at 1:29 AM, "Mohammed Rafi K C" <rkavunga at redhat.com
> <mailto:rkavunga at redhat.com>> wrote:
>
>>
>> When we change the transport from x to y, it should be reflected in
>> all the vol files. Unfortunately, the volume set command failed to make
>> the change in the nfs-server volfile (of course, that is a bug). As I
>> mentioned in my previous mails, changing the volume files with the
>> volume set command is not recommended; I suggested it only to check
>> whether tcp works fine or not.
>>
>> The reason you are getting the rdma connection error is that the
>> bricks are now running over tcp, so the brick processes are listening
>> on tcp sockets. But the nfs-server still asks for an rdma connection,
>> so it tries to connect from an rdma port to a tcp port, and the
>> connection is obviously rejected.
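
To illustrate: you can check which of the generated volfiles still carry
an rdma transport option, and confirm that the bricks are now listening on
tcp sockets, with something like the following (paths assume the default
/var/lib/glusterd layout and the volume name from this thread; the brick
port 49152 is taken from the "gluster volume status" output quoted below):

  # which generated volfiles still mention rdma
  grep -rn "transport-type" /var/lib/glusterd/nfs/ /var/lib/glusterd/vols/gluster_disk/

  # confirm the brick process has a tcp listener on its port
  ss -ltnp | grep 49152        # or: netstat -ltnp | grep 49152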
>
> Okay, thanks for the thorough explanation there.
>
> Now that we know that TCP does function without the original I/O
> errors (from the start of this thread), how do you suggest that I proceed?
>
> Do I have to wait for a subsequent release to rid myself of this bug?
I will make sure the bug gets fixed. We can also expect the rdma patches
to be merged soon. After the rdma bugs are fixed in 3.5.x, you can test
and switch back to rdma.
>
> Would it be feasible for me to switch from RDMA to TCP in a more
> permanent fashion (maybe wipe the cluster and start over?)?
Either you can manually edit the nfs volfile to change the transport to
"option transport-type tcp" everywhere it appears, and then restart nfs so
it picks up the change; or, if possible, you can start a fresh cluster
running on tcp.
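
For example, a minimal sketch of that manual edit (run it on both nodes
and keep a backup, since glusterd can regenerate this file during later
volume operations); afterwards the gluster nfs process has to be restarted
so it re-reads the volfile:

  cp /var/lib/glusterd/nfs/nfs-server.vol /var/lib/glusterd/nfs/nfs-server.vol.bak
  sed -i 's/option transport-type rdma/option transport-type tcp/' \
      /var/lib/glusterd/nfs/nfs-server.vol

  # verify: every transport-type line should now read tcp
  grep -n "transport-type" /var/lib/glusterd/nfs/nfs-server.vol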
Rafi
>
> Thanks.
>
> Regards,
> Jon Heese
>
>> Regards
>> Rafi KC
>>
>> On 03/27/2015 12:28 AM, Jonathan Heese wrote:
>>>
>>> Rafi,
>>>
>>>
>>> Here is my nfs-server.vol file:
>>>
>>>
>>> [root@duke ~]# cat /var/lib/glusterd/nfs/nfs-server.vol
>>> volume gluster_disk-client-0
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duke-ib
>>> end-volume
>>>
>>> volume gluster_disk-client-1
>>> type protocol/client
>>> option send-gids true
>>> option password 562ab460-7754-4b5a-82e6-18ed6c130786
>>> option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>>> option transport-type rdma
>>> option remote-subvolume /bricks/brick1
>>> option remote-host duchess-ib
>>> end-volume
>>>
>>> volume gluster_disk-replicate-0
>>> type cluster/replicate
>>> subvolumes gluster_disk-client-0 gluster_disk-client-1
>>> end-volume
>>>
>>> volume gluster_disk-dht
>>> type cluster/distribute
>>> subvolumes gluster_disk-replicate-0
>>> end-volume
>>>
>>> volume gluster_disk-write-behind
>>> type performance/write-behind
>>> subvolumes gluster_disk-dht
>>> end-volume
>>>
>>> volume gluster_disk
>>> type debug/io-stats
>>> option count-fop-hits off
>>> option latency-measurement off
>>> subvolumes gluster_disk-write-behind
>>> end-volume
>>>
>>> volume nfs-server
>>> type nfs/server
>>> option nfs3.gluster_disk.volume-id 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> option rpc-auth.addr.gluster_disk.allow *
>>> option nfs.drc off
>>> option nfs.nlm on
>>> option nfs.dynamic-volumes on
>>> subvolumes gluster_disk
>>> end-volume
>>>
>>> I see that "transport-type rdma" is listed a couple times here, but
>>> "gluster volume info" indicates that the volume is using the tcp
>>> transport:
>>>
>>>
>>> [root@duke ~]# gluster volume info gluster_disk
>>>
>>> Volume Name: gluster_disk
>>> Type: Replicate
>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: duke-ib:/bricks/brick1
>>> Brick2: duchess-ib:/bricks/brick1
>>> Options Reconfigured:
>>> config.transport: tcp
>>>
>>> Please let me know if you need any further information from me to
>>> determine how to correct this discrepancy.
>>>
>>>
>>> Also, I feel compelled to ask: Since the TCP connections are going
>>> over the InfiniBand connections between the two Gluster servers
>>> (based on the hostnames which are pointed to the IB IPs via hosts
>>> files), are there any (significant) drawbacks to using TCP instead
>>> of RDMA here? Thanks.
>>>
>>>
>>> Regards,
>>>
>>> Jon Heese
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Mohammed Rafi K C <rkavunga at redhat.com>
>>> *Sent:* Monday, March 23, 2015 3:29 AM
>>> *To:* Jonathan Heese
>>> *Cc:* gluster-users
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>>
>>> On 03/23/2015 11:28 AM, Jonathan Heese wrote:
>>>> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C"
>>>> <rkavunga at redhat.com <mailto:rkavunga at redhat.com>> wrote:
>>>>
>>>>>
>>>>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>>>>
>>>>>> Mohamed,
>>>>>>
>>>>>>
>>>>>> I have completed the steps you suggested (unmount all, stop the
>>>>>> volume, set the config.transport to tcp, start the volume, mount,
>>>>>> etc.), and the behavior has indeed changed.
>>>>>>
>>>>>>
>>>>>> [root@duke ~]# gluster volume info
>>>>>>
>>>>>> Volume Name: gluster_disk
>>>>>> Type: Replicate
>>>>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>>>>> Status: Started
>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>> Transport-type: tcp
>>>>>> Bricks:
>>>>>> Brick1: duke-ib:/bricks/brick1
>>>>>> Brick2: duchess-ib:/bricks/brick1
>>>>>> Options Reconfigured:
>>>>>> config.transport: tcp
>>>>>>
>>>>>>
>>>>>> [root@duke ~]# gluster volume status
>>>>>> Status of volume: gluster_disk
>>>>>> Gluster process                            Port    Online  Pid
>>>>>> ------------------------------------------------------------------------------
>>>>>> Brick duke-ib:/bricks/brick1               49152   Y       16362
>>>>>> Brick duchess-ib:/bricks/brick1            49152   Y       14155
>>>>>> NFS Server on localhost                    2049    Y       16374
>>>>>> Self-heal Daemon on localhost              N/A     Y       16381
>>>>>> NFS Server on duchess-ib                   2049    Y       14167
>>>>>> Self-heal Daemon on duchess-ib             N/A     Y       14174
>>>>>>
>>>>>> Task Status of Volume gluster_disk
>>>>>> ------------------------------------------------------------------------------
>>>>>> There are no active volume tasks
>>>>>>
>>>>>> I am no longer seeing the I/O errors during prolonged periods of
>>>>>> write I/O that I was seeing when the transport was set to rdma.
>>>>>> However, I am seeing this message on both nodes every 3 seconds
>>>>>> (almost exactly):
>>>>>>
>>>>>>
>>>>>> ==> /var/log/glusterfs/nfs.log <==
>>>>>> [2015-03-21 14:17:40.379719] W
>>>>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1:
>>>>>> cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>>>>> peer:10.10.10.2:49152)
>>>>>>
>>>>>>
>>>>>> Is this something to worry about?
>>>>>>
>>>>> If you are not using nfs to export the volumes, there is nothing to
>>>>> worry about.
>>>>
>>>> I'm using the native glusterfs FUSE component to mount the volume
>>>> locally on both servers -- I assume that you're referring to the
>>>> standard NFS protocol stuff, which I'm not using here.
>>>>
>>>> Incidentally, I would like to keep my logs from filling up with
>>>> junk if possible. Is there something I can do to get rid of these
>>>> (useless?) error messages?
>>>
>>> If I understand correctly, you are now getting this flood of log
>>> messages only in the nfs log, and all the other logs are fine, right?
>>> If that is the case, and you are not using nfs to export the volume at
>>> all, then as a workaround you can disable nfs for your volume
>>> (gluster volume set <volname> nfs.disable on). This will turn off the
>>> gluster nfs server, and you will no longer get those log messages.
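
For example, with the volume name used in this thread:

  gluster volume set gluster_disk nfs.disable on

  # the "NFS Server" rows should then drop out of:
  gluster volume status gluster_disk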
>>>
>>>
>>>>>> Any idea why there are rdma pieces in play when I've set my
>>>>>> transport to tcp?
>>>>>>
>>>>>
>>>>> there should not be any piece of rdma left. If possible, can you
>>>>> paste the volfile for the nfs server? You can find the volfile in
>>>>> /var/lib/glusterd/nfs/nfs-server.vol or
>>>>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol
>>>>
>>>> I will get this for you when I can. Thanks.
>>>
>>> If you can get that, it will be a great help in understanding the problem.
>>>
>>>
>>> Rafi KC
>>>
>>>>
>>>> Regards,
>>>> Jon Heese
>>>>
>>>>> Rafi KC
>>>>>>
>>>>>> The actual I/O appears to be handled properly and I've seen no
>>>>>> further errors in the testing I've done so far.
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Jon Heese
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From:* gluster-users-bounces at gluster.org
>>>>>> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
>>>>>> <jheese at inetu.net>
>>>>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>>>>> *To:* Mohammed Rafi K C
>>>>>> *Cc:* gluster-users
>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>> Mohammed,
>>>>>>
>>>>>> Thanks very much for the reply. I will try that and report back.
>>>>>>
>>>>>> Regards,
>>>>>> Jon Heese
>>>>>>
>>>>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C"
>>>>>> <rkavunga at redhat.com <mailto:rkavunga at redhat.com>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Does anyone else have any further suggestions for
>>>>>>>> troubleshooting this?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> To sum up: I have a 2 node 2 brick replicated volume, which
>>>>>>>> holds a handful of iSCSI image files which are mounted and
>>>>>>>> served up by tgtd (CentOS 6) to a handful of devices on a
>>>>>>>> dedicated iSCSI network. The most important iSCSI clients
>>>>>>>> (initiators) are four VMware ESXi 5.5 hosts that use the iSCSI
>>>>>>>> volumes as backing for their datastores for virtual machine
>>>>>>>> storage.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> After a few minutes of sustained writing to the volume, I am
>>>>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>>>>> error in /var/log/glusterfs/mnt-gluster-disk.log:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.582801] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When this happens, the ESXi box fails its write operation and
>>>>>>>> returns an error to the effect of “Unable to write data to
>>>>>>>> datastore”. I don’t see anything else in the supporting logs
>>>>>>>> to explain the root cause of the i/o errors.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Any and all suggestions are appreciated. Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> From the mount logs, I assume that your volume's transport type is
>>>>>>> rdma. There are some known issues with rdma in 3.5.3, and the
>>>>>>> patches to address those issues have already been sent upstream
>>>>>>> [1]. From the logs alone it is hard to tell whether this problem is
>>>>>>> related to the rdma transport or not. To make sure that the tcp
>>>>>>> transport works well in this scenario, if possible can you try to
>>>>>>> reproduce the same issue using a tcp-type volume? You can change
>>>>>>> the transport type of the volume with the following steps (not
>>>>>>> recommended in normal use); a concrete example follows the steps below.
>>>>>>>
>>>>>>> 1) unmount every client
>>>>>>> 2) stop the volume
>>>>>>> 3) run gluster volume set volname config.transport tcp
>>>>>>> 4) start the volume again
>>>>>>> 5) mount the clients
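
As a concrete sketch of those steps (the volume name is from this thread;
the client mount point and mount server are only examples, adjust them to
your setup):

  umount /mnt/gluster_disk                                # step 1, on every client
  gluster volume stop gluster_disk                        # step 2
  gluster volume set gluster_disk config.transport tcp    # step 3
  gluster volume start gluster_disk                       # step 4
  mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster_disk   # step 5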
>>>>>>>
>>>>>>> [1] : http://goo.gl/2PTL61
>>>>>>>
>>>>>>> Regards
>>>>>>> Rafi KC
>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>> /** This message contains confidential information, which also
>>>>>>>> may be privileged, and is intended only for the person(s)
>>>>>>>> addressed above. Any unauthorized use, distribution, copying or
>>>>>>>> disclosure of confidential and/or privileged information is
>>>>>>>> strictly prohibited. If you have received this communication in
>>>>>>>> error, please erase all copies of the message and its
>>>>>>>> attachments and notify the sender immediately via reply e-mail. **/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:*Jonathan Heese
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>>>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>>>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ravi,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The last lines in the mount log before the massive vomit of I/O
>>>>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126340] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-0: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>>>> (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126687] E
>>>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>>>> 0-gluster_disk-client-1: failed to get the port number for
>>>>>>>> remote subvolume. Please run 'gluster volume status' on server
>>>>>>>> to see if brick process is running.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>>>> (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730165] I
>>>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-0:
>>>>>>>> changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>>>> (peer:10.10.10.1:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739500] I
>>>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-1:
>>>>>>>> changing port to 49152 (from 0)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>>>> [0x7fd9c557bccf]
>>>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>>>> [0x7fd9c557a995]
>>>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>>>> (peer:10.10.10.2:24008)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.741883] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744524] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744537] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0'
>>>>>>>> came back up; going online.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.744627] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.753037] I
>>>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>>>>> (1298437), Version (330)
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755657] I
>>>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152,
>>>>>>>> attached to remote volume '/bricks/brick1'.
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.755676] I
>>>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers
>>>>>>>> are not same, reopening the fds
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.761945] I
>>>>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>>>>
>>>>>>>> [2015-03-16 01:37:10.762144] I
>>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>>>>
>>>>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>>>>> 7.22 kernel 7.14
>>>>>>>>
>>>>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>>>>> WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>> …
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've seen no indication of split-brain on any files at any
>>>>>>>> point in this (ever since downgrading from 3.6.2 to 3.5.3, which
>>>>>>>> is when this particular issue started):
>>>>>>>>
>>>>>>>> [root@duke gfapi-module-for-linux-target-driver-]# gluster v heal gluster_disk info
>>>>>>>>
>>>>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>>>>>
>>>>>>>> Number of entries: 0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:*Ravishankar N [mailto:ravishankar at redhat.com]
>>>>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>>>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>>>>>>> <mailto:gluster-users at gluster.org>
>>>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> So I resolved my previous issue with split-brains and the
>>>>>>>> lack of self-healing by dropping my installed glusterfs*
>>>>>>>> packages from 3.6.2 to 3.5.3, but now I've picked up a new
>>>>>>>> issue, which actually makes normal use of the volume
>>>>>>>> practically impossible.
>>>>>>>>
>>>>>>>> A little background for those not already paying close
>>>>>>>> attention:
>>>>>>>> I have a 2 node 2 brick replicating volume whose purpose in
>>>>>>>> life is to hold iSCSI target files, primarily for use to
>>>>>>>> provide datastores to a VMware ESXi cluster. The plan is
>>>>>>>> to put a handful of image files on the Gluster volume,
>>>>>>>> mount them locally on both Gluster nodes, and run tgtd on
>>>>>>>> both, pointed to the image files on the mounted gluster
>>>>>>>> volume. Then the ESXi boxes will use multipath
>>>>>>>> (active/passive) iSCSI to connect to the nodes, with
>>>>>>>> automatic failover in case of planned or unplanned downtime
>>>>>>>> of the Gluster nodes.
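
For context, the tgtd side of such a setup is usually just a backing-store
stanza in /etc/tgt/targets.conf pointing at an image file on the mounted
gluster volume; the IQN and image path below are illustrative only, not
taken from this thread:

  <target iqn.2015-03.local.example:gluster-datastore1>
      backing-store /mnt/gluster_disk/datastore1.img
  </target>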
>>>>>>>>
>>>>>>>> In my most recent round of testing with 3.5.3, I'm seeing a
>>>>>>>> massive failure to write data to the volume after about
>>>>>>>> 5-10 minutes, so I've simplified the scenario a bit (to
>>>>>>>> minimize the variables) to: both Gluster nodes up, only one
>>>>>>>> node (duke) mounted and running tgtd, and just regular
>>>>>>>> (single path) iSCSI from a single ESXi server.
>>>>>>>>
>>>>>>>> About 5-10 minutes into migrating a VM onto the test
>>>>>>>> datastore, /var/log/messages on duke gets blasted with a
>>>>>>>> ton of messages exactly like this:
>>>>>>>>
>>>>>>>> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>>>>> 0x1781e00 2a -1 512 22971904, Input/output error
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted
>>>>>>>> with a ton of messages exactly like this:
>>>>>>>>
>>>>>>>> [2015-03-16 02:24:07.572279] W
>>>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse:
>>>>>>>> 635299: WRITE => -1 (Input/output error)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Are there any messages in the mount log from AFR about
>>>>>>>> split-brain just before the above line appears?
>>>>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>>>>> I/O on files that are in split-brain fails with EIO.
>>>>>>>>
>>>>>>>> -Ravi
>>>>>>>>
>>>>>>>> And the write operation from VMware's side fails as soon as
>>>>>>>> these messages start.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I don't see any other errors (in the log files I know of)
>>>>>>>> indicating the root cause of these i/o errors. I'm sure
>>>>>>>> that this is not enough information to tell what's going
>>>>>>>> on, but can anyone help me figure out what to look at next
>>>>>>>> to figure this out?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've also considered using Dan Lambright's libgfapi gluster
>>>>>>>> module for tgtd (or something similar) to avoid going
>>>>>>>> through FUSE, but I'm not sure whether that would be
>>>>>>>> irrelevant to this problem, since I'm not 100% sure if it
>>>>>>>> lies in FUSE or elsewhere.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> /Jon Heese/
>>>>>>>> /Systems Engineer/
>>>>>>>> *INetU Managed Hosting*
>>>>>>>> P: 610.266.7441 x 261
>>>>>>>> F: 610.266.7434
>>>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>>
>>>>>>>> Gluster-users mailing list
>>>>>>>>
>>>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>>>>
>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users at gluster.org
>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>
>>>>>
>>>
>>