[Gluster-users] I/O error on replicated volume

Mohammed Rafi K C rkavunga at redhat.com
Mon Mar 23 05:20:46 UTC 2015


On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>
> Mohammed,
>
>
> I have completed the steps you suggested (unmount all, stop the
> volume, set the config.transport to tcp, start the volume, mount,
> etc.), and the behavior has indeed changed.
>
>
> [root at duke ~]# gluster volume info
>
> Volume Name: gluster_disk
> Type: Replicate
> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: duke-ib:/bricks/brick1
> Brick2: duchess-ib:/bricks/brick1
> Options Reconfigured:
> config.transport: tcp
>
>
> [root at duke ~]# gluster volume status
> Status of volume: gluster_disk
> Gluster process                                Port   Online  Pid
> ------------------------------------------------------------------------------
> Brick duke-ib:/bricks/brick1                   49152  Y       16362
> Brick duchess-ib:/bricks/brick1                49152  Y       14155
> NFS Server on localhost                        2049   Y       16374
> Self-heal Daemon on localhost                  N/A    Y       16381
> NFS Server on duchess-ib                       2049   Y       14167
> Self-heal Daemon on duchess-ib                 N/A    Y       14174
>
> Task Status of Volume gluster_disk
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> I am no longer seeing the I/O errors during prolonged periods of write
> I/O that I was seeing when the transport was set to rdma. However, I
> am seeing this message on both nodes every 3 seconds (almost exactly):
>
>
> ==> /var/log/glusterfs/nfs.log <==
> [2015-03-21 14:17:40.379719] W [rdma.c:1076:gf_rdma_cm_event_handler]
> 0-gluster_disk-client-1: cma event RDMA_CM_EVENT_REJECTED, error 8
> (me:10.10.10.1:1023 peer:10.10.10.2:49152)
>
>
> Is this something to worry about?
>
If you are not using NFS to export the volumes, there is nothing to
worry about. You can also turn the built-in NFS server off for the
volume with 'gluster volume set gluster_disk nfs.disable on'.
>
> Any idea why there are rdma pieces in play when I've set my transport
> to tcp?
>

There should not be any rdma pieces left. If possible, can you paste
the volfile for the NFS server? You can find the volfile in
/var/lib/glusterd/nfs/nfs-server.vol or
/usr/local/var/lib/glusterd/nfs/nfs-server.vol.
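As a quick check (an editorial sketch; the volfile path below is the
packaged-install default mentioned above):

```shell
# Search the generated NFS volfile for leftover rdma transport options.
# With config.transport set to tcp, this should report no rdma entries.
VOLFILE=/var/lib/glusterd/nfs/nfs-server.vol
grep -n -i 'rdma' "$VOLFILE" 2>/dev/null \
  || echo "no rdma entries found (or volfile not present at $VOLFILE)"
```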

Rafi KC
>
> The actual I/O appears to be handled properly and I've seen no further
> errors in the testing I've done so far.
>
>
> Thanks.
>
>
> Regards,
>
> Jon Heese
>
>
> ------------------------------------------------------------------------
> *From:* gluster-users-bounces at gluster.org
> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
> <jheese at inetu.net>
> *Sent:* Friday, March 20, 2015 7:04 AM
> *To:* Mohammed Rafi K C
> *Cc:* gluster-users
> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>  
> Mohammed,
>
> Thanks very much for the reply.  I will try that and report back.
>
> Regards,
> Jon Heese
>
> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C" <rkavunga at redhat.com
> <mailto:rkavunga at redhat.com>> wrote:
>
>>
>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>
>>> Hello all,
>>>
>>>  
>>>
>>> Does anyone else have any further suggestions for troubleshooting this?
>>>
>>>  
>>>
>>> To sum up: I have a 2 node 2 brick replicated volume, which holds a
>>> handful of iSCSI image files which are mounted and served up by tgtd
>>> (CentOS 6) to a handful of devices on a dedicated iSCSI network. 
>>> The most important iSCSI clients (initiators) are four VMware ESXi
>>> 5.5 hosts that use the iSCSI volumes as backing for their datastores
>>> for virtual machine storage.
>>>
>>>  
>>>
>>> After a few minutes of sustained writing to the volume, I am seeing
>>> a massive flood (over 1500 per second at times) of this error in
>>> /var/log/glusterfs/mnt-gluster-disk.log:
>>>
>>> [2015-03-16 02:24:07.582801] W [fuse-bridge.c:2242:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 635358: WRITE => -1 (Input/output error)
>>>
>>>  
>>>
>>> When this happens, the ESXi box fails its write operation and
>>> returns an error to the effect of “Unable to write data to
>>> datastore”.  I don’t see anything else in the supporting logs to
>>> explain the root cause of the i/o errors.
>>>
>>>  
>>>
>>> Any and all suggestions are appreciated.  Thanks.
>>>
>>>  
>>>
>>
>> From the mount logs, I assume that your volume transport type is
>> rdma. There are some known issues with rdma in 3.5.3, and the patches
>> to address them have already been sent upstream [1]. From the logs
>> alone it is hard to tell whether this problem is related to the rdma
>> transport or not. To rule that out, can you try to reproduce the
>> issue using a tcp-transport volume? You can change the transport type
>> of a volume with the following steps (not recommended in normal use):
>>
>> 1) unmount every client
>> 2) stop the volume
>> 3) run gluster volume set volname config.transport tcp
>> 4) start the volume again
>> 5) mount the clients
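[Editor's note: put together, the sequence looks like this. The volume
name is taken from this thread; the mount point and server name are
assumptions, and as noted above this is not recommended in normal use.]

```shell
# 1) Unmount the volume on every client (repeat on each client machine).
umount /mnt/gluster-disk

# 2)-4) Stop the volume, switch its transport to tcp, start it again.
gluster volume stop gluster_disk
gluster volume set gluster_disk config.transport tcp
gluster volume start gluster_disk

# 5) Remount on the clients (server and mount point are examples).
mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster-disk
```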
>>
>> [1] : http://goo.gl/2PTL61
>>
>> Regards
>> Rafi KC
>>
>>> /Jon Heese/
>>> /Systems Engineer/
>>> *INetU Managed Hosting*
>>> P: 610.266.7441 x 261
>>> F: 610.266.7434
>>> www.inetu.net <https://www.inetu.net/>
>>>
>>> /** This message contains confidential information, which also may
>>> be privileged, and is intended only for the person(s) addressed
>>> above. Any unauthorized use, distribution, copying or disclosure of
>>> confidential and/or privileged information is strictly prohibited.
>>> If you have received this communication in error, please erase all
>>> copies of the message and its attachments and notify the sender
>>> immediately via reply e-mail. **/
>>>
>>>  
>>>
>>> *From:*Jonathan Heese
>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>
>>>  
>>>
>>> Ravi,
>>>
>>>  
>>>
>>> The last lines in the mount log before the massive vomit of I/O
>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>
>>>  
>>>
>>> [2015-03-16 01:37:07.126340] E
>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>> 0-gluster_disk-client-0: failed to get the port number for remote
>>> subvolume. Please run 'gluster volume status' on server to see if
>>> brick process is running.
>>>
>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>> [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>> (peer:10.10.10.1:24008)
>>>
>>> [2015-03-16 01:37:07.126687] E
>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>> 0-gluster_disk-client-1: failed to get the port number for remote
>>> subvolume. Please run 'gluster volume status' on server to see if
>>> brick process is running.
>>>
>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>> [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>> (peer:10.10.10.2:24008)
>>>
>>> [2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>> 0-gluster_disk-client-0: changing port to 49152 (from 0)
>>>
>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>> [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>> (peer:10.10.10.1:24008)
>>>
>>> [2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>> 0-gluster_disk-client-1: changing port to 49152 (from 0)
>>>
>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f) [0x7fd9c557bccf]
>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>> [0x7fd9c557a995]
>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>> (peer:10.10.10.2:24008)
>>>
>>> [2015-03-16 01:37:10.741883] I
>>> [client-handshake.c:1677:select_server_supported_programs]
>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num (1298437),
>>> Version (330)
>>>
>>> [2015-03-16 01:37:10.744524] I
>>> [client-handshake.c:1462:client_setvolume_cbk]
>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached to
>>> remote volume '/bricks/brick1'.
>>>
>>> [2015-03-16 01:37:10.744537] I
>>> [client-handshake.c:1474:client_setvolume_cbk]
>>> 0-gluster_disk-client-0: Server and Client lk-version numbers are
>>> not same, reopening the fds
>>>
>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0' came
>>> back up; going online.
>>>
>>> [2015-03-16 01:37:10.744627] I
>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>> 0-gluster_disk-client-0: Server lk version = 1
>>>
>>> [2015-03-16 01:37:10.753037] I
>>> [client-handshake.c:1677:select_server_supported_programs]
>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num (1298437),
>>> Version (330)
>>>
>>> [2015-03-16 01:37:10.755657] I
>>> [client-handshake.c:1462:client_setvolume_cbk]
>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached to
>>> remote volume '/bricks/brick1'.
>>>
>>> [2015-03-16 01:37:10.755676] I
>>> [client-handshake.c:1474:client_setvolume_cbk]
>>> 0-gluster_disk-client-1: Server and Client lk-version numbers are
>>> not same, reopening the fds
>>>
>>> [2015-03-16 01:37:10.761945] I [fuse-bridge.c:5016:fuse_graph_setup]
>>> 0-fuse: switched to graph 0
>>>
>>> [2015-03-16 01:37:10.762144] I
>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>> 0-gluster_disk-client-1: Server lk version = 1
>>>
>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22
>>> kernel 7.14
>>>
>>> [*2015-03-16 01:59:26.098670*] W
>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084: WRITE
>>> => -1 (Input/output error)
>>>
>>>  
>>>
>>> I’ve seen no indication of split-brain on any files at any point in
>>> this (ever since downgrading from 3.6.2 to 3.5.3, which is when this
>>> particular issue started):
>>>
>>> [root at duke gfapi-module-for-linux-target-driver-]# gluster v heal
>>> gluster_disk info
>>>
>>> Brick duke.jonheese.local:/bricks/brick1/
>>>
>>> Number of entries: 0
>>>
>>>  
>>>
>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>
>>> Number of entries: 0
>>>
>>>  
>>>
>>> Thanks.
>>>
>>>  
>>>
>>> /Jon Heese/
>>> /Systems Engineer/
>>> *INetU Managed Hosting*
>>> P: 610.266.7441 x 261
>>> F: 610.266.7434
>>> www.inetu.net <https://www.inetu.net/>
>>>
>>>
>>>  
>>>
>>> *From:*Ravishankar N [mailto:ravishankar at redhat.com]
>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>> <mailto:gluster-users at gluster.org>
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>>  
>>>
>>>  
>>>
>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>
>>>     Hello,
>>>
>>>     So I resolved my previous issue with split-brains and the lack
>>>     of self-healing by dropping my installed glusterfs* packages
>>>     from 3.6.2 to 3.5.3, but now I've picked up a new issue, which
>>>     actually makes normal use of the volume practically impossible.
>>>
>>>     A little background for those not already paying close attention:
>>>     I have a 2 node 2 brick replicating volume whose purpose in life
>>>     is to hold iSCSI target files, primarily for use to provide
>>>     datastores to a VMware ESXi cluster.  The plan is to put a
>>>     handful of image files on the Gluster volume, mount them locally
>>>     on both Gluster nodes, and run tgtd on both, pointed to the
>>>     image files on the mounted gluster volume. Then the ESXi boxes
>>>     will use multipath (active/passive) iSCSI to connect to the
>>>     nodes, with automatic failover in case of planned or unplanned
>>>     downtime of the Gluster nodes.
>>>
>>>     In my most recent round of testing with 3.5.3, I'm seeing a
>>>     massive failure to write data to the volume after about 5-10
>>>     minutes, so I've simplified the scenario a bit (to minimize the
>>>     variables) to: both Gluster nodes up, only one node (duke)
>>>     mounted and running tgtd, and just regular (single path) iSCSI
>>>     from a single ESXi server.
>>>
>>>     About 5-10 minutes into migrating a VM onto the test datastore,
>>>     /var/log/messages on duke gets blasted with a ton of messages
>>>     exactly like this:
>>>
>>>     Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>     0x1781e00 2a -1 512 22971904, Input/output error
>>>
>>>      
>>>
>>>     And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a
>>>     ton of messages exactly like this:
>>>
>>>     [2015-03-16 02:24:07.572279] W
>>>     [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635299:
>>>     WRITE => -1 (Input/output error)
>>>
>>>      
>>>
>>>
>>> Are there any messages in the mount log from AFR about split-brain
>>> just before the above line appears?
>>> Does `gluster v heal <VOLNAME> info` show any files? Performing I/O
>>> on files that are in split-brain fails with EIO.
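[Editor's note: a concrete form of that check, using the volume name
from this thread. The 'info split-brain' variant narrows the listing
to split-brained entries only.]

```shell
# List pending heal entries per brick; non-zero counts need attention.
gluster volume heal gluster_disk info

# List only entries in split-brain; I/O on these fails with EIO
# until the split-brain is resolved.
gluster volume heal gluster_disk info split-brain
```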
>>>
>>> -Ravi
>>>
>>>     And the write operation from VMware's side fails as soon as
>>>     these messages start.
>>>
>>>      
>>>
>>>     I don't see any other errors (in the log files I know of)
>>>     indicating the root cause of these i/o errors.  I'm sure that
>>>     this is not enough information to tell what's going on, but can
>>>     anyone help me figure out what to look at next to figure this out?
>>>
>>>      
>>>
>>>     I've also considered using Dan Lambright's libgfapi gluster
>>>     module for tgtd (or something similar) to avoid going through
>>>     FUSE, but I'm not sure whether that would be irrelevant to this
>>>     problem, since I'm not 100% sure if it lies in FUSE or elsewhere.
>>>
>>>      
>>>
>>>     Thanks!
>>>
>>>      
>>>
>>>     /Jon Heese/
>>>     /Systems Engineer/
>>>     *INetU Managed Hosting*
>>>     P: 610.266.7441 x 261
>>>     F: 610.266.7434
>>>     www.inetu.net <https://www.inetu.net/>
>>>
>>>
>>>      
>>>
>>>
>>>
>>>     _______________________________________________
>>>
>>>     Gluster-users mailing list
>>>
>>>     Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>
>>>     http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>  
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
