[Gluster-users] I/O error on replicated volume

Mohammed Rafi K C rkavunga at redhat.com
Fri Mar 27 05:29:44 UTC 2015


When we change the transport from x to y, it should be reflected in all
the vol files. Unfortunately, the volume set command failed to make that
change in the nfs-server vol file (which is, of course, a bug). As I
mentioned in my previous mails, changing the transport type via the
volume set command is not recommended; I suggested it only to check
whether tcp works fine or not.

The reason you are getting the rdma connection errors is that the
bricks are now running over tcp, so each brick process is listening on
a tcp socket, while the nfs server is still asking for an rdma
connection. It therefore tries to connect from an rdma port to a tcp
port, and the connection is obviously rejected.
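
A quick way to confirm where rdma is still being requested (a rough
sketch; /var/lib/glusterd/vols is the usual default location and may
differ on your install) is to grep the generated vol files on both
servers:

    grep -R "transport-type" /var/lib/glusterd/vols/gluster_disk/ /var/lib/glusterd/nfs/nfs-server.vol

The brick and fuse client vol files should now show tcp, while the
stale nfs-server.vol still shows rdma, which matches the rejected
connection errors you are seeing.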


Regards
Rafi KC
 
On 03/27/2015 12:28 AM, Jonathan Heese wrote:
>
> Rafi,
>
>
> Here is my nfs-server.vol file:
>
>
> [root at duke ~]# cat /var/lib/glusterd/nfs/nfs-server.vol
> volume gluster_disk-client-0
>     type protocol/client
>     option send-gids true
>     option password 562ab460-7754-4b5a-82e6-18ed6c130786
>     option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>     option transport-type rdma
>     option remote-subvolume /bricks/brick1
>     option remote-host duke-ib
> end-volume
>
> volume gluster_disk-client-1
>     type protocol/client
>     option send-gids true
>     option password 562ab460-7754-4b5a-82e6-18ed6c130786
>     option username ad5d5754-cf02-4b96-9f85-ff3129ae0405
>     option transport-type rdma
>     option remote-subvolume /bricks/brick1
>     option remote-host duchess-ib
> end-volume
>
> volume gluster_disk-replicate-0
>     type cluster/replicate
>     subvolumes gluster_disk-client-0 gluster_disk-client-1
> end-volume
>
> volume gluster_disk-dht
>     type cluster/distribute
>     subvolumes gluster_disk-replicate-0
> end-volume
>
> volume gluster_disk-write-behind
>     type performance/write-behind
>     subvolumes gluster_disk-dht
> end-volume
>
> volume gluster_disk
>     type debug/io-stats
>     option count-fop-hits off
>     option latency-measurement off
>     subvolumes gluster_disk-write-behind
> end-volume
>
> volume nfs-server
>     type nfs/server
>     option nfs3.gluster_disk.volume-id 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>     option rpc-auth.addr.gluster_disk.allow *
>     option nfs.drc off
>     option nfs.nlm on
>     option nfs.dynamic-volumes on
>     subvolumes gluster_disk
> end-volume
>
> I see that "transport-type rdma" is listed a couple times here, but
> "gluster volume info" indicates that the volume is using the tcp
> transport:
>
>
> [root at duke ~]# gluster volume info gluster_disk
>
> Volume Name: gluster_disk
> Type: Replicate
> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: duke-ib:/bricks/brick1
> Brick2: duchess-ib:/bricks/brick1
> Options Reconfigured:
> config.transport: tcp
>
> Please let me know if you need any further information from me to
> determine how to correct this discrepancy.
>
>
> Also, I feel compelled to ask: Since the TCP connections are going
> over the InfiniBand connections between the two Gluster servers (based
> on the hostnames which are pointed to the IB IPs via hosts files), are
> there any (significant) drawbacks to using TCP instead of RDMA here? 
> Thanks.
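>
> (For reference, a rough sanity check for that would be to confirm that
> the -ib hostnames resolve to the IPoIB addresses and that traffic to
> the peer leaves via the InfiniBand interface; the ib0 name mentioned
> below is just a stand-in for whatever the IPoIB interface is actually
> called:
>
> [root at duke ~]# getent hosts duke-ib duchess-ib
> [root at duke ~]# ip route get 10.10.10.2
>
> The route output should list the IPoIB device, e.g. "dev ib0".)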
>
>
> Regards,
>
> Jon Heese
>
>
> ------------------------------------------------------------------------
> *From:* Mohammed Rafi K C <rkavunga at redhat.com>
> *Sent:* Monday, March 23, 2015 3:29 AM
> *To:* Jonathan Heese
> *Cc:* gluster-users
> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>  
>
> On 03/23/2015 11:28 AM, Jonathan Heese wrote:
>> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C" <rkavunga at redhat.com> wrote:
>>
>>>
>>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>>
>>>> Mohamed,
>>>>
>>>>
>>>> I have completed the steps you suggested (unmount all, stop the
>>>> volume, set the config.transport to tcp, start the volume, mount,
>>>> etc.), and the behavior has indeed changed.
>>>>
>>>>
>>>> [root at duke ~]# gluster volume info
>>>>
>>>> Volume Name: gluster_disk
>>>> Type: Replicate
>>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: duke-ib:/bricks/brick1
>>>> Brick2: duchess-ib:/bricks/brick1
>>>> Options Reconfigured:
>>>> config.transport: tcp
>>>>
>>>>
>>>> [root at duke ~]# gluster volume status
>>>> Status of volume: gluster_disk
>>>> Gluster process                                 Port    Online  Pid
>>>> ------------------------------------------------------------------------------
>>>> Brick duke-ib:/bricks/brick1                    49152   Y       16362
>>>> Brick duchess-ib:/bricks/brick1                 49152   Y       14155
>>>> NFS Server on localhost                         2049    Y       16374
>>>> Self-heal Daemon on localhost                   N/A     Y       16381
>>>> NFS Server on duchess-ib                        2049    Y       14167
>>>> Self-heal Daemon on duchess-ib                  N/A     Y       14174
>>>>
>>>> Task Status of Volume gluster_disk
>>>> ------------------------------------------------------------------------------
>>>> There are no active volume tasks
>>>>
>>>> I am no longer seeing the I/O errors during prolonged periods of
>>>> write I/O that I was seeing when the transport was set to rdma.
>>>> However, I am seeing this message on both nodes every 3 seconds
>>>> (almost exactly):
>>>>
>>>>
>>>> ==> /var/log/glusterfs/nfs.log <==
>>>> [2015-03-21 14:17:40.379719] W
>>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1: cma
>>>> event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>>> peer:10.10.10.2:49152)
>>>>
>>>>
>>>> Is this something to worry about?
>>>>
>>> If you are not using nfs to export the volumes, there is nothing to
>>> worry about.
>>
>> I'm using the native glusterfs FUSE component to mount the volume
>> locally on both servers -- I assume that you're referring to the
>> standard NFS protocol stuff, which I'm not using here.
>>
>> Incidentally, I would like to keep my logs from filling up with junk
>> if possible.  Is there something I can do to get rid of these
>> (useless?) error messages?
>
> If I understand correctly, you are getting this flood of log messages
> from the nfs log only, and all the other logs are fine now, right?
> If that is the case, and you are not using nfs to export the volume at
> all, then as a workaround you can disable nfs for the volume (gluster
> volume set <volname> nfs.disable on). This will turn off the gluster
> nfs server, and you will no longer get those log messages.
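>
> For this particular volume that would be roughly the following (a
> sketch, assuming you never need gluster's built-in nfs export for
> anything else):
>
> gluster volume set gluster_disk nfs.disable on
> gluster volume status gluster_disk
>
> After that, the "NFS Server" entries in the status output should no
> longer be running, and nfs.log should go quiet.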
>
>
>>>> Any idea why there are rdma pieces in play when I've set my
>>>> transport to tcp?
>>>>
>>>
>>> There should not be any piece of rdma left. If possible, can you
>>> paste the volfile for the nfs server? You can find it in
>>> /var/lib/glusterd/nfs/nfs-server.vol or
>>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol.
>>
>> I will get this for you when I can.  Thanks.
>
> If you can get it, that will be a great help in understanding the problem.
>
>
> Rafi KC
>
>>
>> Regards,
>> Jon Heese
>>
>>> Rafi KC
>>>>
>>>> The actual I/O appears to be handled properly and I've seen no
>>>> further errors in the testing I've done so far.
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Jon Heese
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* gluster-users-bounces at gluster.org
>>>> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
>>>> <jheese at inetu.net>
>>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>>> *To:* Mohammed Rafi K C
>>>> *Cc:* gluster-users
>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>  
>>>> Mohammed,
>>>>
>>>> Thanks very much for the reply.  I will try that and report back.
>>>>
>>>> Regards,
>>>> Jon Heese
>>>>
>>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C" <rkavunga at redhat.com> wrote:
>>>>
>>>>>
>>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>>  
>>>>>>
>>>>>> Does anyone else have any further suggestions for troubleshooting
>>>>>> this?
>>>>>>
>>>>>>  
>>>>>>
>>>>>> To sum up: I have a 2 node 2 brick replicated volume, which holds
>>>>>> a handful of iSCSI image files which are mounted and served up by
>>>>>> tgtd (CentOS 6) to a handful of devices on a dedicated iSCSI
>>>>>> network.  The most important iSCSI clients (initiators) are four
>>>>>> VMware ESXi 5.5 hosts that use the iSCSI volumes as backing for
>>>>>> their datastores for virtual machine storage.
>>>>>>
>>>>>>  
>>>>>>
>>>>>> After a few minutes of sustained writing to the volume, I am
>>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>>> error in /var/log/glusterfs/mnt-gluster-disk.log:
>>>>>>
>>>>>> [2015-03-16 02:24:07.582801] W
>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>>> WRITE => -1 (Input/output error)
>>>>>>
>>>>>>  
>>>>>>
>>>>>> When this happens, the ESXi box fails its write operation and
>>>>>> returns an error to the effect of “Unable to write data to
>>>>>> datastore”.  I don’t see anything else in the supporting logs to
>>>>>> explain the root cause of the i/o errors.
>>>>>>
>>>>>>  
>>>>>>
>>>>>> Any and all suggestions are appreciated.  Thanks.
>>>>>>
>>>>>>  
>>>>>>
>>>>>
>>>>> From the mount logs, I assume that your volume transport type is
>>>>> rdma. There are some known issues with rdma in 3.5.3, and the
>>>>> patches to address them have already been sent upstream [1]. From
>>>>> the logs alone it is hard to tell whether this problem is related
>>>>> to the rdma transport or not. To check whether the tcp transport
>>>>> works well in this scenario, please try to reproduce the issue
>>>>> using a tcp volume if possible. You can change the transport type
>>>>> of the volume with the following steps (not recommended in normal
>>>>> use; a concrete command sketch for your volume follows after the
>>>>> link below):
>>>>>
>>>>> 1) unmount every client
>>>>> 2) stop the volume
>>>>> 3) run gluster volume set volname config.transport tcp
>>>>> 4) start the volume again
>>>>> 5) mount the clients
>>>>>
>>>>> [1] : http://goo.gl/2PTL61
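>>>>>
>>>>> For your volume, the commands would look roughly like this (a
>>>>> sketch only; the /mnt/gluster_disk mount point is inferred from
>>>>> your mount log name and may need adjusting):
>>>>>
>>>>> umount /mnt/gluster_disk                # on every client
>>>>> gluster volume stop gluster_disk
>>>>> gluster volume set gluster_disk config.transport tcp
>>>>> gluster volume start gluster_disk
>>>>> mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster_disk   # remount each client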
>>>>>
>>>>> Regards
>>>>> Rafi KC
>>>>>
>>>>>> /Jon Heese/
>>>>>> /Systems Engineer/
>>>>>> *INetU Managed Hosting*
>>>>>> P: 610.266.7441 x 261
>>>>>> F: 610.266.7434
>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>
>>>>>> /** This message contains confidential information, which also
>>>>>> may be privileged, and is intended only for the person(s)
>>>>>> addressed above. Any unauthorized use, distribution, copying or
>>>>>> disclosure of confidential and/or privileged information is
>>>>>> strictly prohibited. If you have received this communication in
>>>>>> error, please erase all copies of the message and its attachments
>>>>>> and notify the sender immediately via reply e-mail. **/
>>>>>>
>>>>>>  
>>>>>>
>>>>>> *From:*Jonathan Heese
>>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>>  
>>>>>>
>>>>>> Ravi,
>>>>>>
>>>>>>  
>>>>>>
>>>>>> The last lines in the mount log before the massive vomit of I/O
>>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>>
>>>>>>  
>>>>>>
>>>>>> [2015-03-16 01:37:07.126340] E
>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>> 0-gluster_disk-client-0: failed to get the port number for remote
>>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>>> brick process is running.
>>>>>>
>>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>> [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>> [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>> (peer:10.10.10.1:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:07.126687] E
>>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>>> 0-gluster_disk-client-1: failed to get the port number for remote
>>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>>> brick process is running.
>>>>>>
>>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>> [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>> [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>> (peer:10.10.10.2:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:10.730165] I
>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-0:
>>>>>> changing port to 49152 (from 0)
>>>>>>
>>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>> [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>> [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>>> (peer:10.10.10.1:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:10.739500] I
>>>>>> [rpc-clnt.c:1729:rpc_clnt_reconfig] 0-gluster_disk-client-1:
>>>>>> changing port to 49152 (from 0)
>>>>>>
>>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>>> [0x7fd9c557bccf]
>>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>>> [0x7fd9c557a995]
>>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>>> (peer:10.10.10.2:24008)
>>>>>>
>>>>>> [2015-03-16 01:37:10.741883] I
>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>>> (1298437), Version (330)
>>>>>>
>>>>>> [2015-03-16 01:37:10.744524] I
>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached
>>>>>> to remote volume '/bricks/brick1'.
>>>>>>
>>>>>> [2015-03-16 01:37:10.744537] I
>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers are
>>>>>> not same, reopening the fds
>>>>>>
>>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0'
>>>>>> came back up; going online.
>>>>>>
>>>>>> [2015-03-16 01:37:10.744627] I
>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>>
>>>>>> [2015-03-16 01:37:10.753037] I
>>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>>> (1298437), Version (330)
>>>>>>
>>>>>> [2015-03-16 01:37:10.755657] I
>>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached
>>>>>> to remote volume '/bricks/brick1'.
>>>>>>
>>>>>> [2015-03-16 01:37:10.755676] I
>>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers are
>>>>>> not same, reopening the fds
>>>>>>
>>>>>> [2015-03-16 01:37:10.761945] I
>>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>>
>>>>>> [2015-03-16 01:37:10.762144] I
>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>>
>>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>>> 7.22 kernel 7.14
>>>>>>
>>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>>> WRITE => -1 (Input/output error)
>>>>>>
>>>>>>  
>>>>>>
>>>>>> I’ve seen no indication of split-brain on any files at any point
>>>>>> in this (ever since downgrading from 3.6.2 to 3.5.3, which is when
>>>>>> this particular issue started):
>>>>>>
>>>>>> [root at duke gfapi-module-for-linux-target-driver-]# gluster v heal
>>>>>> gluster_disk info
>>>>>>
>>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>>>
>>>>>> Number of entries: 0
>>>>>>
>>>>>>  
>>>>>>
>>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>>>
>>>>>> Number of entries: 0
>>>>>>
>>>>>>  
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>  
>>>>>>
>>>>>> /Jon Heese/
>>>>>> /Systems Engineer/
>>>>>> *INetU Managed Hosting*
>>>>>> P: 610.266.7441 x 261
>>>>>> F: 610.266.7434
>>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>>
>>>>>> /** This message contains confidential information, which also
>>>>>> may be privileged, and is intended only for the person(s)
>>>>>> addressed above. Any unauthorized use, distribution, copying or
>>>>>> disclosure of confidential and/or privileged information is
>>>>>> strictly prohibited. If you have received this communication in
>>>>>> error, please erase all copies of the message and its attachments
>>>>>> and notify the sender immediately via reply e-mail. **/
>>>>>>
>>>>>>  
>>>>>>
>>>>>> *From:*Ravishankar N [mailto:ravishankar at redhat.com]
>>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>>
>>>>>>  
>>>>>>
>>>>>>  
>>>>>>
>>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>>
>>>>>>     Hello,
>>>>>>
>>>>>>     So I resolved my previous issue with split-brains and the
>>>>>>     lack of self-healing by dropping my installed glusterfs*
>>>>>>     packages from 3.6.2 to 3.5.3, but now I've picked up a new
>>>>>>     issue, which actually makes normal use of the volume
>>>>>>     practically impossible.
>>>>>>
>>>>>>     A little background for those not already paying close attention:
>>>>>>     I have a 2 node 2 brick replicating volume whose purpose in
>>>>>>     life is to hold iSCSI target files, primarily for use to
>>>>>>     provide datastores to a VMware ESXi cluster.  The plan is to
>>>>>>     put a handful of image files on the Gluster volume, mount
>>>>>>     them locally on both Gluster nodes, and run tgtd on both,
>>>>>>     pointed to the image files on the mounted gluster volume.
>>>>>>     Then the ESXi boxes will use multipath (active/passive) iSCSI
>>>>>>     to connect to the nodes, with automatic failover in case of
>>>>>>     planned or unplanned downtime of the Gluster nodes.
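>>>>>>
>>>>>>     For context, the tgtd side of that plan is just a backing-store
>>>>>>     entry in /etc/tgt/targets.conf pointing at an image file on the
>>>>>>     gluster mount, something like the sketch below (the IQN and the
>>>>>>     file name are made-up examples):
>>>>>>
>>>>>>     <target iqn.2015-03.local.jonheese:gluster-datastore1>
>>>>>>         backing-store /mnt/gluster_disk/datastore1.img
>>>>>>     </target>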
>>>>>>
>>>>>>     In my most recent round of testing with 3.5.3, I'm seeing a
>>>>>>     massive failure to write data to the volume after about 5-10
>>>>>>     minutes, so I've simplified the scenario a bit (to minimize
>>>>>>     the variables) to: both Gluster nodes up, only one node
>>>>>>     (duke) mounted and running tgtd, and just regular (single
>>>>>>     path) iSCSI from a single ESXi server.
>>>>>>
>>>>>>     About 5-10 minutes into migrating a VM onto the test
>>>>>>     datastore, /var/log/messages on duke gets blasted with a ton
>>>>>>     of messages exactly like this:
>>>>>>
>>>>>>     Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>>>     0x1781e00 2a -1 512 22971904, Input/output error
>>>>>>
>>>>>>      
>>>>>>
>>>>>>     And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with
>>>>>>     a ton of messages exactly like this:
>>>>>>
>>>>>>     [2015-03-16 02:24:07.572279] W
>>>>>>     [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse:
>>>>>>     635299: WRITE => -1 (Input/output error)
>>>>>>
>>>>>>      
>>>>>>
>>>>>>
>>>>>> Are there any messages in the mount log from AFR about
>>>>>> split-brain just before the above line appears?
>>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>>> I/O on files that are in split-brain fail with EIO.
>>>>>>
>>>>>> -Ravi
>>>>>>
>>>>>>     And the write operation from VMware's side fails as soon as
>>>>>>     these messages start.
>>>>>>
>>>>>>      
>>>>>>
>>>>>>     I don't see any other errors (in the log files I know of)
>>>>>>     indicating the root cause of these i/o errors.  I'm sure that
>>>>>>     this is not enough information to tell what's going on, but
>>>>>>     can anyone help me figure out what to look at next to figure
>>>>>>     this out?
>>>>>>
>>>>>>      
>>>>>>
>>>>>>     I've also considered using Dan Lambright's libgfapi gluster
>>>>>>     module for tgtd (or something similar) to avoid going through
>>>>>>     FUSE, but I'm not sure whether that would be irrelevant to
>>>>>>     this problem, since I'm not 100% sure if it lies in FUSE or
>>>>>>     elsewhere.
>>>>>>
>>>>>>      
>>>>>>
>>>>>>     Thanks!
>>>>>>
>>>>>>      
>>>>>>
>>>>>>     /Jon Heese/
>>>>>>     /Systems Engineer/
>>>>>>     *INetU Managed Hosting*
>>>>>>     P: 610.266.7441 x 261
>>>>>>     F: 610.266.7434
>>>>>>     www.inetu.net <https://www.inetu.net/>
>>>>>>
>>>>>>     /** This message contains confidential information, which
>>>>>>     also may be privileged, and is intended only for the
>>>>>>     person(s) addressed above. Any unauthorized use,
>>>>>>     distribution, copying or disclosure of confidential and/or
>>>>>>     privileged information is strictly prohibited. If you have
>>>>>>     received this communication in error, please erase all copies
>>>>>>     of the message and its attachments and notify the sender
>>>>>>     immediately via reply e-mail. **/
>>>>>>
>>>>>>      
>>>>>>
>>>>>>
>>>>>>
>>>>>>     _______________________________________________
>>>>>>
>>>>>>     Gluster-users mailing list
>>>>>>
>>>>>>     Gluster-users at gluster.org
>>>>>>
>>>>>>     http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>>  
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>
>
