[Gluster-users] I/O error on replicated volume

Ravishankar N ravishankar at redhat.com
Tue Mar 17 04:35:05 UTC 2015


On 03/17/2015 02:14 AM, Jonathan Heese wrote:
> Hello,
>
> So I resolved my previous issue with split-brains and the lack of 
> self-healing by dropping my installed glusterfs* packages from 3.6.2 
> to 3.5.3, but now I've picked up a new issue, which actually makes 
> normal use of the volume practically impossible.
>
> A little background for those not already paying close attention:
> I have a 2-node, 2-brick replicated volume whose purpose in life is to 
> hold iSCSI target files, primarily to provide datastores to a VMware 
> ESXi cluster.  The plan is to put a handful of image files on the 
> Gluster volume, mount them locally on both Gluster nodes, and run 
> tgtd on both, pointed at the image files on the mounted gluster 
> volume. Then the ESXi boxes will use multipath (active/passive) iSCSI 
> to connect to the nodes, with automatic failover in case of planned or 
> unplanned downtime of the Gluster nodes.
>
> In my most recent round of testing with 3.5.3, I'm seeing a massive 
> failure to write data to the volume after about 5-10 minutes, so I've 
> simplified the scenario a bit (to minimize the variables) to: both 
> Gluster nodes up, only one node (duke) mounted and running tgtd, and 
> just regular (single path) iSCSI from a single ESXi server.
>
> About 5-10 minutes into migrating a VM onto the test datastore, 
> /var/log/messages on duke gets blasted with a ton of messages exactly 
> like this:
>
> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error 0x1781e00 2a 
> -1 512 22971904, Input/output error
>
>
> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a ton of 
> messages exactly like this:
>
> [2015-03-16 02:24:07.572279] W [fuse-bridge.c:2242:fuse_writev_cbk] 
> 0-glusterfs-fuse: 635299: WRITE => -1 (Input/output error)
>
>

Are there any messages in the mount log from AFR about split-brain just 
before the above line appears?
Does `gluster v heal <VOLNAME> info` show any files? Performing I/O on 
files that are in split-brain fails with EIO.
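For reference, the checks above can be run from either node; `<VOLNAME>` is a placeholder for the actual volume name (the original thread does not mention it):

```shell
# List files that need healing on the replicated volume.
gluster volume heal <VOLNAME> info

# 3.5.x can also list only the files currently in split-brain,
# which are the ones that would return EIO on I/O.
gluster volume heal <VOLNAME> info split-brain
```

If either command lists entries around the timestamps of the WRITE failures in the FUSE mount log, the EIO errors are almost certainly split-brain related.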

-Ravi

> And the write operation from VMware's side fails as soon as these 
> messages start.
>
>
> I don't see any other errors (in the log files I know of) indicating 
> the root cause of these i/o errors.  I'm sure that this is not enough 
> information to tell what's going on, but can anyone help me figure out 
> what to look at next to figure this out?
>
>
> I've also considered using Dan Lambright's libgfapi gluster module for 
> tgtd (or something similar) to avoid going through FUSE, but I'm not 
> sure whether that would be irrelevant to this problem, since I'm not 
> 100% sure if it lies in FUSE or elsewhere.
>
>
> Thanks!
>
>
> /Jon Heese/
> /Systems Engineer/
> *INetU Managed Hosting*
> P: 610.266.7441 x 261
> F: 610.266.7434
> www.inetu.net <https://www.inetu.net/>
>
> /** This message contains confidential information, which also may be 
> privileged, and is intended only for the person(s) addressed above. 
> Any unauthorized use, distribution, copying or disclosure of 
> confidential and/or privileged information is strictly prohibited. If 
> you have received this communication in error, please erase all copies 
> of the message and its attachments and notify the sender immediately 
> via reply e-mail. **/
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
