[Gluster-users] I/O error on replicated volume

Mon Mar 23 07:29:11 UTC 2015

On 03/23/2015 11:28 AM, Jonathan Heese wrote:
> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C" <rkavunga at redhat.com>
> wrote:
>
>>
>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>
>>> Mohammed,
>>>
>>>
>>> I have completed the steps you suggested (unmount all, stop the
>>> volume, set the config.transport to tcp, start the volume, mount,
>>> etc.), and the behavior has indeed changed.
>>>
>>>
>>> [root at duke ~]# gluster volume info
>>>
>>> Volume Name: gluster_disk
>>> Type: Replicate
>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: duke-ib:/bricks/brick1
>>> Brick2: duchess-ib:/bricks/brick1
>>> Options Reconfigured:
>>> config.transport: tcp
>>>
>>>
>>> [root at duke ~]# gluster volume status
>>> Status of volume: gluster_disk
>>> Gluster process                                         Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick duke-ib:/bricks/brick1                            49152   Y       16362
>>> Brick duchess-ib:/bricks/brick1                         49152   Y       14155
>>> NFS Server on localhost                                 2049    Y       16374
>>> Self-heal Daemon on localhost                           N/A     Y       16381
>>> NFS Server on duchess-ib                                2049    Y       14167
>>> Self-heal Daemon on duchess-ib                          N/A     Y       14174
>>>
>>> Task Status of Volume gluster_disk
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
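The Online column above is what the "failed to get the port number for remote subvolume" message elsewhere in this thread tells you to check. Here is a minimal sketch that filters `gluster volume status` output down to the processes that are not online. It assumes the layout shown here, where each process sits on one line ending in Port, Online and Pid. `offline_processes` is a hypothetical helper name and is not part of the Gluster CLI.

```shell
#!/bin/sh
# Hypothetical helper: read `gluster volume status` output on stdin and
# print the name of every Brick / NFS Server / Self-heal Daemon line whose
# Online column is not "Y". Assumes each process fits on one line that
# ends in "<Port> <Online> <Pid>".
offline_processes() {
    awk '
        /^(Brick|NFS Server|Self-heal Daemon) / {
            if ($(NF - 1) != "Y") {
                # Rebuild the process name from every field before Port.
                name = $1
                for (i = 2; i <= NF - 3; i++) name = name " " $i
                print name
            }
        }
    '
}
```

Usage: `gluster volume status gluster_disk | offline_processes` should print nothing while every process is up.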
>>>
>>> I am no longer seeing the I/O errors during prolonged periods of
>>> write I/O that I was seeing when the transport was set to rdma.
>>> However, I am seeing this message on both nodes every 3 seconds
>>> (almost exactly):
>>>
>>>
>>> ==> /var/log/glusterfs/nfs.log <==
>>> [2015-03-21 14:17:40.379719] W
>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1: cma
>>> event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>> peer:10.10.10.2:49152)
>>>
>>>
>>> Is this something to worry about?
>>>
>> If you are not using nfs to export the volumes, there is nothing to
>> worry about.
>
> I'm using the native glusterfs FUSE component to mount the volume
> locally on both servers -- I assume that you're referring to the
> standard NFS protocol stuff, which I'm not using here.
>
> Incidentally, I would like to keep my logs from filling up with junk
> if possible.  Is there something I can do to get rid of these
> (useless?) error messages?

If I understand correctly, you are getting this flood of log messages
from the nfs log only, and all other logs and everything else are fine
now, right? If that is the case, and you are not using nfs to export the
volume at all, then as a workaround you can disable nfs for your volume
(gluster volume set gluster_disk nfs.disable on). This will turn off your
gluster nfs server, and you will no longer get those log messages.
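Below is a minimal sketch of that workaround. It assumes the volume name `gluster_disk` from this thread. `nfs.disable` is the standard volume option. The `GLUSTER` override is a hypothetical convenience: set it to `echo` to preview the command instead of running it.

```shell
#!/bin/sh
# Sketch: turn off the built-in Gluster NFS server for one volume.
# GLUSTER is an assumed override hook; GLUSTER=echo prints the command
# instead of running it.
disable_gluster_nfs() {
    vol="${1:-gluster_disk}"   # volume name from this thread; pass another as $1
    "${GLUSTER:-gluster}" volume set "$vol" nfs.disable on
}
```

Once the option is applied, `gluster volume info gluster_disk` should list `nfs.disable: on` under "Options Reconfigured", the same way `config.transport: tcp` appears in the volume info output earlier in this thread.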

>>> Any idea why there are rdma pieces in play when I've set my
>>> transport to tcp?
>>>
>>
>> There should not be any rdma pieces in play. If possible, can you
>> paste the volfile for the nfs server? You can find it at
>> /var/lib/glusterd/nfs/nfs-server.vol or
>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol
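Here is a minimal sketch for checking which transports that volfile references. It assumes the generated volfile declares each client's transport on an `option transport-type <name>` line. `find_nfs_volfile` and `volfile_transports` are hypothetical helper names.

```shell
#!/bin/sh
# Print the first readable NFS server volfile from the two locations above.
find_nfs_volfile() {
    for f in /var/lib/glusterd/nfs/nfs-server.vol \
             /usr/local/var/lib/glusterd/nfs/nfs-server.vol; do
        if [ -r "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Print the value of every "option transport-type" line in a volfile.
# An "rdma" entry here, while config.transport is tcp, could account for
# the RDMA_CM_EVENT_REJECTED warnings in nfs.log.
volfile_transports() {
    sed -n 's/^[[:space:]]*option[[:space:]][[:space:]]*transport-type[[:space:]][[:space:]]*//p' "$1"
}
```

Usage: `f=$(find_nfs_volfile) && volfile_transports "$f"`.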
>
> I will get this for you when I can.  Thanks.

If you can get it, that will be a great help in understanding the problem.

Rafi KC

>
> Regards,
> Jon Heese
>
>> Rafi KC
>>>
>>> The actual I/O appears to be handled properly and I've seen no
>>> further errors in the testing I've done so far.
>>>
>>>
>>> Thanks.
>>>
>>>
>>> Regards,
>>>
>>> Jon Heese
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* gluster-users-bounces at gluster.org
>>> <gluster-users-bounces at gluster.org> on behalf of Jonathan Heese
>>> <jheese at inetu.net>
>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>> *To:* Mohammed Rafi K C
>>> *Cc:* gluster-users
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>> Mohammed,
>>>
>>> Thanks very much for the reply.  I will try that and report back.
>>>
>>> Regards,
>>> Jon Heese
>>>
>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C"
>>> <rkavunga at redhat.com> wrote:
>>>
>>>>
>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>>
>>>>> Does anyone else have any further suggestions for troubleshooting
>>>>> this?
>>>>>
>>>>>
>>>>> To sum up: I have a 2-node, 2-brick replicated volume, which holds
>>>>> a handful of iSCSI image files which are mounted and served up by
>>>>> tgtd (CentOS 6) to a handful of devices on a dedicated iSCSI
>>>>> network.  The most important iSCSI clients (initiators) are four
>>>>> VMware ESXi 5.5 hosts that use the iSCSI volumes as backing for
>>>>> their datastores for virtual machine storage.
>>>>>
>>>>>
>>>>> After a few minutes of sustained writing to the volume, I am
>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>> error in /var/log/glusterfs/mnt-gluster_disk.log:
>>>>>
>>>>> [2015-03-16 02:24:07.582801] W
>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>> WRITE => -1 (Input/output error)
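To put a number on a flood like this, here is a sketch that counts the FUSE write failures per second in the mount log. It assumes each entry starts with the `[YYYY-MM-DD HH:MM:SS.micro]` timestamp shown above. It also assumes the log has one entry per line; the wrapping here comes from the email. `write_errors_per_second` is a hypothetical name.

```shell
#!/bin/sh
# Count "WRITE => -1" failures per second in a glusterfs FUSE mount log.
# Reads the log on stdin; prints "YYYY-MM-DD HH:MM:SS <count>", oldest first.
write_errors_per_second() {
    awk '
        /fuse_writev_cbk/ && /WRITE => -1/ {
            day = substr($1, 2)        # drop the leading "["
            sec = substr($2, 1, 8)     # keep HH:MM:SS, drop microseconds
            count[day " " sec]++
        }
        END { for (k in count) print k, count[k] }
    ' | LC_ALL=C sort
}
```

Usage: `write_errors_per_second < /var/log/glusterfs/mnt-gluster_disk.log | tail`.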
>>>>>
>>>>>
>>>>> When this happens, the ESXi box fails its write operation and
>>>>> returns an error to the effect of “Unable to write data to
>>>>> datastore”.  I don’t see anything else in the supporting logs to
>>>>> explain the root cause of the I/O errors.
>>>>>
>>>>>
>>>>> Any and all suggestions are appreciated.  Thanks.
>>>>>
>>>>>
>>>>
>>>> From the mount logs, I assume that your volume transport type is
>>>> rdma. There are some known issues with rdma in 3.5.3, and the patches
>>>> to address those issues have already been sent upstream [1]. From
>>>> the logs alone, it is hard to tell whether this problem is related
>>>> to the rdma transport or not. To make sure that the tcp transport
>>>> works well in this scenario, can you try to reproduce the problem
>>>> using a tcp-type volume, if possible? You can change the transport
>>>> type of the volume with the following steps (not recommended in the
>>>> normal use case).
>>>>
>>>> 1) unmount every client
>>>> 2) stop the volume
>>>> 3) run gluster volume set volname config.transport tcp
>>>> 4) start the volume again
>>>> 5) mount the clients
>>>>
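The five steps above can be sketched as one shell function. The following values are assumptions for illustration: the mount point `/mnt/gluster_disk`, mounting from `duke-ib`, and the `DRY_RUN` switch. The function handles only the local mount, so step 1 still has to be repeated on every other client. `--mode=script` makes the gluster CLI skip the interactive confirmation that `volume stop` normally asks for.

```shell
#!/bin/sh
# Assumed values; override via the environment.
VOL="${VOL:-gluster_disk}"
SERVER="${SERVER:-duke-ib}"
MNT="${MNT:-/mnt/gluster_disk}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Steps 1-5 for the local node; stops at the first failing step.
switch_transport_to_tcp() {
    run umount "$MNT" || return 1                                  # 1) unmount
    run gluster --mode=script volume stop "$VOL" || return 1       # 2) stop
    run gluster volume set "$VOL" config.transport tcp || return 1 # 3) transport
    run gluster volume start "$VOL" || return 1                    # 4) start
    run mount -t glusterfs "$SERVER:/$VOL" "$MNT"                  # 5) remount
}
```

Usage: `DRY_RUN=1 switch_transport_to_tcp` prints the five commands without touching the volume.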
>>>> [1] : http://goo.gl/2PTL61
>>>>
>>>> Regards
>>>> Rafi KC
>>>>
>>>>> /Jon Heese/
>>>>> /Systems Engineer/
>>>>> *INetU Managed Hosting*
>>>>> P: 610.266.7441 x 261
>>>>> F: 610.266.7434
>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>
>>>>> /** This message contains confidential information, which also may
>>>>> be privileged, and is intended only for the person(s) addressed
>>>>> above. Any unauthorized use, distribution, copying or disclosure
>>>>> of confidential and/or privileged information is strictly
>>>>> prohibited. If you have received this communication in error,
>>>>> please erase all copies of the message and its attachments and
>>>>> notify the sender immediately via reply e-mail. **/
>>>>>
>>>>>
>>>>> *From:* Jonathan Heese
>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>> *To:* 'Ravishankar N'; gluster-users at gluster.org
>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>
>>>>>
>>>>> Ravi,
>>>>>
>>>>>
>>>>> The last lines in the mount log before the massive vomit of I/O
>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>
>>>>>
>>>>> [2015-03-16 01:37:07.126340] E
>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>> 0-gluster_disk-client-0: failed to get the port number for remote
>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>> brick process is running.
>>>>>
>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>> (peer:10.10.10.1:24008)
>>>>>
>>>>> [2015-03-16 01:37:07.126687] E
>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>> 0-gluster_disk-client-1: failed to get the port number for remote
>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>> brick process is running.
>>>>>
>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>> (peer:10.10.10.2:24008)
>>>>>
>>>>> [2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>> 0-gluster_disk-client-0: changing port to 49152 (from 0)
>>>>>
>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>> (peer:10.10.10.1:24008)
>>>>>
>>>>> [2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>> 0-gluster_disk-client-1: changing port to 49152 (from 0)
>>>>>
>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>> (peer:10.10.10.2:24008)
>>>>>
>>>>> [2015-03-16 01:37:10.741883] I
>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>> (1298437), Version (330)
>>>>>
>>>>> [2015-03-16 01:37:10.744524] I
>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached
>>>>> to remote volume '/bricks/brick1'.
>>>>>
>>>>> [2015-03-16 01:37:10.744537] I
>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers are
>>>>> not same, reopening the fds
>>>>>
>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0' came
>>>>> back up; going online.
>>>>>
>>>>> [2015-03-16 01:37:10.744627] I
>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>
>>>>> [2015-03-16 01:37:10.753037] I
>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>> (1298437), Version (330)
>>>>>
>>>>> [2015-03-16 01:37:10.755657] I
>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached
>>>>> to remote volume '/bricks/brick1'.
>>>>>
>>>>> [2015-03-16 01:37:10.755676] I
>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers are
>>>>> not same, reopening the fds
>>>>>
>>>>> [2015-03-16 01:37:10.761945] I
>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>
>>>>> [2015-03-16 01:37:10.762144] I
>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>
>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>> 7.22 kernel 7.14
>>>>>
>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>> WRITE => -1 (Input/output error)
>>>>>
>>>>> …
>>>>>
>>>>>
>>>>> I’ve seen no indication of split-brain on any files at any point
>>>>> in this (ever since downgrading from 3.6.2 to 3.5.3, which is when
>>>>> this particular issue started):
>>>>>
>>>>> [root at duke gfapi-module-for-linux-target-driver-]# gluster v heal
>>>>> gluster_disk info
>>>>>
>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>>
>>>>> Number of entries: 0
>>>>>
>>>>>
>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>>
>>>>> Number of entries: 0
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> *From:* Ravishankar N [mailto:ravishankar at redhat.com]
>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>> *To:* Jonathan Heese; gluster-users at gluster.org
>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>
>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>
>>>>>     Hello,
>>>>>
>>>>>     So I resolved my previous issue with split-brains and the lack
>>>>>     of self-healing by dropping my installed glusterfs* packages
>>>>>     from 3.6.2 to 3.5.3, but now I've picked up a new issue, which
>>>>>     actually makes normal use of the volume practically impossible.
>>>>>
>>>>>     A little background for those not already paying close attention:
>>>>>     I have a 2-node, 2-brick replicated volume whose purpose in
>>>>>     life is to hold iSCSI target files, primarily to provide
>>>>>     datastores to a VMware ESXi cluster.  The plan is to
>>>>>     put a handful of image files on the Gluster volume, mount them
>>>>>     locally on both Gluster nodes, and run tgtd on both, pointed
>>>>>     to the image files on the mounted gluster volume. Then the
>>>>>     ESXi boxes will use multipath (active/passive) iSCSI to
>>>>>     connect to the nodes, with automatic failover in case of
>>>>>     planned or unplanned downtime of the Gluster nodes.
>>>>>
>>>>>     In my most recent round of testing with 3.5.3, I'm seeing a
>>>>>     massive failure to write data to the volume after about 5-10
>>>>>     minutes, so I've simplified the scenario a bit (to minimize
>>>>>     the variables) to: both Gluster nodes up, only one node (duke)
>>>>>     mounted and running tgtd, and just regular (single path) iSCSI
>>>>>     from a single ESXi server.
>>>>>
>>>>>     About 5-10 minutes into migrating a VM onto the test
>>>>>     datastore, /var/log/messages on duke gets blasted with a ton
>>>>>     of messages exactly like this:
>>>>>
>>>>>     Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>>     0x1781e00 2a -1 512 22971904, Input/output error
>>>>>
>>>>>
>>>>>     And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a
>>>>>     ton of messages exactly like this:
>>>>>
>>>>>     [2015-03-16 02:24:07.572279] W
>>>>>     [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635299:
>>>>>     WRITE => -1 (Input/output error)
>>>>>
>>>>> Are there any messages in the mount log from AFR about split-brain
>>>>> just before the above line appears?
>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>> I/O on files that are in split-brain fails with EIO.
>>>>>
>>>>> -Ravi
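Here is a sketch of that check: a helper that totals the per-brick `Number of entries:` lines from `gluster volume heal <VOLNAME> info`. It assumes the output format shown in the heal info output posted in this thread. `heal_entries_total` is a hypothetical name. A non-zero total only means entries are pending heal; it does not by itself mean they are in split-brain.

```shell
#!/bin/sh
# Sum the "Number of entries: N" lines printed once per brick by
# `gluster volume heal <VOLNAME> info`. Reads that output on stdin.
heal_entries_total() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}
```

Usage: `gluster volume heal gluster_disk info | heal_entries_total` should print 0 on a volume with nothing left to heal.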
>>>>>
>>>>>     And the write operation from VMware's side fails as soon as
>>>>>     these messages start.
>>>>>
>>>>>
>>>>>     I don't see any other errors (in the log files I know of)
>>>>>     indicating the root cause of these I/O errors.  I'm sure that
>>>>>     this is not enough information to tell what's going on, but
>>>>>     can anyone help me figure out what to look at next to figure
>>>>>     this out?
>>>>>
>>>>>
>>>>>     I've also considered using Dan Lambright's libgfapi gluster
>>>>>     module for tgtd (or something similar) to avoid going through
>>>>>     FUSE, but I'm not sure whether that would be relevant to
>>>>>     this problem, since I'm not 100% sure if it lies in FUSE or
>>>>>     elsewhere.
>>>>>
>>>>>
>>>>>     Thanks!
>>>>>
>>>>>     _______________________________________________
>>>>>     Gluster-users mailing list
>>>>>     Gluster-users at gluster.org
>>>>>     http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>
