[Gluster-users] I/O error on replicated volume

Mohammed Rafi K C rkavunga at redhat.com
Fri Mar 20 07:26:25 UTC 2015


On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>
> Hello all,
>
>  
>
> Does anyone else have any further suggestions for troubleshooting this?
>
>  
>
> To sum up: I have a 2 node 2 brick replicated volume, which holds a
> handful of iSCSI image files that are mounted and served up by tgtd
> (CentOS 6) to a handful of devices on a dedicated iSCSI network.  The
> most important iSCSI clients (initiators) are four VMware ESXi 5.5
> hosts that use the iSCSI volumes as backing for their datastores for
> virtual machine storage.
>
>  
>
> After a few minutes of sustained writing to the volume, I am seeing a
> massive flood (over 1500 per second at times) of this error in
> /var/log/glusterfs/mnt-gluster-disk.log:
>
> [2015-03-16 02:24:07.582801] W [fuse-bridge.c:2242:fuse_writev_cbk]
> 0-glusterfs-fuse: 635358: WRITE => -1 (Input/output error)
>
>  
>
> When this happens, the ESXi box fails its write operation and returns
> an error to the effect of “Unable to write data to datastore”.  I
> don’t see anything else in the supporting logs to explain the root
> cause of the i/o errors.
>
>  
>
> Any and all suggestions are appreciated.  Thanks.
>
>  
>

From the mount logs, I assume that your volume's transport type is rdma.
There are some known issues with rdma in 3.5.3, and the patches to
address them have already been sent upstream [1]. From the logs alone it
is hard to tell whether this problem is related to the rdma transport or
not. To confirm that the tcp transport works well in this scenario,
could you try to reproduce the issue using a tcp-type volume, if
possible? You can change the transport type of the volume with the
following steps (not recommended in a normal use case); a concrete
example is sketched after the list.

1) Unmount every client
2) Stop the volume
3) Run: gluster volume set <volname> config.transport tcp
4) Start the volume again
5) Remount the clients
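
For example, with the volume and server names from this thread (and
assuming the client mount point is /mnt/gluster_disk; adjust names to
your setup), the sequence would look roughly like:

    # on every client
    umount /mnt/gluster_disk

    # on one of the servers
    gluster volume stop gluster_disk
    gluster volume set gluster_disk config.transport tcp
    gluster volume start gluster_disk

    # on every client again (fuse mounts use tcp by default)
    mount -t glusterfs duke.jonheese.local:/gluster_disk /mnt/gluster_disk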

[1] : http://goo.gl/2PTL61

Regards
Rafi KC

> /Jon Heese/
> /Systems Engineer/
> *INetU Managed Hosting*
> P: 610.266.7441 x 261
> F: 610.266.7434
> www.inetu.net <https://www.inetu.net/>
>
> /** This message contains confidential information, which also may be
> privileged, and is intended only for the person(s) addressed above.
> Any unauthorized use, distribution, copying or disclosure of
> confidential and/or privileged information is strictly prohibited. If
> you have received this communication in error, please erase all copies
> of the message and its attachments and notify the sender immediately
> via reply e-mail. **/
>
>  
>
> *From:* Jonathan Heese
> *Sent:* Tuesday, March 17, 2015 12:36 PM
> *To:* 'Ravishankar N'; gluster-users at gluster.org
> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>
>  
>
> Ravi,
>
>  
>
> The last lines in the mount log before the massive vomit of I/O errors
> are from 22 minutes prior, and seem innocuous to me:
>
>  
>
> [2015-03-16 01:37:07.126340] E
> [client-handshake.c:1760:client_query_portmap_cbk]
> 0-gluster_disk-client-0: failed to get the port number for remote
> subvolume. Please run 'gluster volume status' on server to see if
> brick process is running.
>
> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
> [0x7fd9c557a995]
> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
> (peer:10.10.10.1:24008)
>
> [2015-03-16 01:37:07.126687] E
> [client-handshake.c:1760:client_query_portmap_cbk]
> 0-gluster_disk-client-1: failed to get the port number for remote
> subvolume. Please run 'gluster volume status' on server to see if
> brick process is running.
>
> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
> [0x7fd9c557a995]
> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
> (peer:10.10.10.2:24008)
>
> [2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
> 0-gluster_disk-client-0: changing port to 49152 (from 0)
>
> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
> [0x7fd9c557a995]
> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
> (peer:10.10.10.1:24008)
>
> [2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
> 0-gluster_disk-client-1: changing port to 49152 (from 0)
>
> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
> [0x7fd9c557a995]
> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
> (peer:10.10.10.2:24008)
>
> [2015-03-16 01:37:10.741883] I
> [client-handshake.c:1677:select_server_supported_programs]
> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num (1298437),
> Version (330)
>
> [2015-03-16 01:37:10.744524] I
> [client-handshake.c:1462:client_setvolume_cbk]
> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached to
> remote volume '/bricks/brick1'.
>
> [2015-03-16 01:37:10.744537] I
> [client-handshake.c:1474:client_setvolume_cbk]
> 0-gluster_disk-client-0: Server and Client lk-version numbers are not
> same, reopening the fds
>
> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0' came
> back up; going online.
>
> [2015-03-16 01:37:10.744627] I
> [client-handshake.c:450:client_set_lk_version_cbk]
> 0-gluster_disk-client-0: Server lk version = 1
>
> [2015-03-16 01:37:10.753037] I
> [client-handshake.c:1677:select_server_supported_programs]
> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num (1298437),
> Version (330)
>
> [2015-03-16 01:37:10.755657] I
> [client-handshake.c:1462:client_setvolume_cbk]
> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached to
> remote volume '/bricks/brick1'.
>
> [2015-03-16 01:37:10.755676] I
> [client-handshake.c:1474:client_setvolume_cbk]
> 0-gluster_disk-client-1: Server and Client lk-version numbers are not
> same, reopening the fds
>
> [2015-03-16 01:37:10.761945] I [fuse-bridge.c:5016:fuse_graph_setup]
> 0-fuse: switched to graph 0
>
> [2015-03-16 01:37:10.762144] I
> [client-handshake.c:450:client_set_lk_version_cbk]
> 0-gluster_disk-client-1: Server lk version = 1
>
> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
> kernel 7.14
>
> [*2015-03-16 01:59:26.098670*] W [fuse-bridge.c:2242:fuse_writev_cbk]
> 0-glusterfs-fuse: 292084: WRITE => -1 (Input/output error)
>
>  
>
> I’ve seen no indication of split-brain on any files at any point in
> this (ever since downgrading from 3.6.2 to 3.5.3, which is when this
> particular issue started):
>
> [root@duke gfapi-module-for-linux-target-driver-]# gluster v heal
> gluster_disk info
>
> Brick duke.jonheese.local:/bricks/brick1/
>
> Number of entries: 0
>
>  
>
> Brick duchess.jonheese.local:/bricks/brick1/
>
> Number of entries: 0
>
>  
>
> Thanks.
>
>  
>
> /Jon Heese/
> /Systems Engineer/
> *INetU Managed Hosting*
> P: 610.266.7441 x 261
> F: 610.266.7434
> www.inetu.net <https://www.inetu.net/>
>
>
>  
>
> *From:* Ravishankar N [mailto:ravishankar at redhat.com]
> *Sent:* Tuesday, March 17, 2015 12:35 AM
> *To:* Jonathan Heese; gluster-users at gluster.org
> <mailto:gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>
>  
>
>  
>
> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>
>     Hello,
>
>     So I resolved my previous issue with split-brains and the lack of
>     self-healing by dropping my installed glusterfs* packages from
>     3.6.2 to 3.5.3, but now I've picked up a new issue, which actually
>     makes normal use of the volume practically impossible.
>
>     A little background for those not already paying close attention:
>     I have a 2 node 2 brick replicating volume whose purpose in life
>     is to hold iSCSI target files, primarily for use to provide
>     datastores to a VMware ESXi cluster.  The plan is to put a handful
>     of image files on the Gluster volume, mount the volume locally on
>     both Gluster nodes, and run tgtd on both, pointed at the image files
>     on the mounted gluster volume. Then the ESXi boxes will use multipath
>     (active/passive) iSCSI to connect to the nodes, with automatic
>     failover in case of planned or unplanned downtime of the Gluster
>     nodes.
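>
>     (As a rough sketch of that layout, and not the actual configuration
>     from this setup, each node's /etc/tgt/targets.conf would carry an
>     entry per image file along the lines of:
>
>         <target iqn.2015-03.local.jonheese:gluster-lun0>
>             backing-store /mnt/gluster_disk/lun0.img
>         </target>
>
>     where the IQN and image path are illustrative placeholders.)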
>
>     In my most recent round of testing with 3.5.3, I'm seeing a
>     massive failure to write data to the volume after about 5-10
>     minutes, so I've simplified the scenario a bit (to minimize the
>     variables) to: both Gluster nodes up, only one node (duke) mounted
>     and running tgtd, and just regular (single path) iSCSI from a
>     single ESXi server.
>
>     About 5-10 minutes into migrating a VM onto the test datastore,
>     /var/log/messages on duke gets blasted with a ton of messages
>     exactly like this:
>
>     Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error 0x1781e00
>     2a -1 512 22971904, Input/output error
>
>      
>
>     And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a ton
>     of messages exactly like this:
>
>     [2015-03-16 02:24:07.572279] W
>     [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635299:
>     WRITE => -1 (Input/output error)
>
>      
>
>
> Are there any messages in the mount log from AFR about split-brain
> just before the above line appears?
> Does `gluster v heal <VOLNAME> info` show any files? Performing I/O on
> files that are in split-brain fails with EIO.
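>
> (For example, assuming the volume is named gluster_disk as the mount log
> path suggests, something like
>
>     gluster volume heal gluster_disk info split-brain
>
> lists the entries that have been flagged as split-brained.)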
>
> -Ravi
>
>     And the write operation from VMware's side fails as soon as these
>     messages start.
>
>      
>
>     I don't see any other errors (in the log files I know of)
>     indicating the root cause of these i/o errors.  I'm sure that this
>     is not enough information to tell what's going on, but can anyone
>     help me figure out what to look at next to figure this out?
>
>      
>
>     I've also considered using Dan Lambright's libgfapi gluster module
>     for tgtd (or something similar) to avoid going through FUSE, but
>     I'm not sure whether that would make any difference for this
>     problem, since I'm not 100% sure whether it lies in FUSE or elsewhere.
>
>      
>
>     Thanks!
>
>      
>
>     /Jon Heese/
>     /Systems Engineer/
>     *INetU Managed Hosting*
>     P: 610.266.7441 x 261
>     F: 610.266.7434
>     www.inetu.net <https://www.inetu.net/>
>
>
>      
>
>
>
>     _______________________________________________
>
>     Gluster-users mailing list
>
>     Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>
>     http://www.gluster.org/mailman/listinfo/gluster-users
>
>  
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
