[Gluster-users] [External] Re: Input/output error on FUSE log
Davide Obbi
davide.obbi at booking.com
Sun Jan 6 18:26:10 UTC 2019
Hi,
I would start with some basic checks. "(Input/output error)" is returned
by the operating system, for instance when trying to access a file
system on a device that is no longer available, so I would check the network
connectivity between the clients and the servers, and between the servers
themselves, during the reported time.
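The connectivity checks above can be sketched roughly as follows (a minimal sketch, assuming the host names from the volume info quoted below and the default glusterd management port; adjust to your environment):

```shell
#!/bin/sh
# Hypothetical check script: host names are taken from the brick list
# in this thread; the port 24007 is the standard glusterd management port.
for host in tpc-glus1 tpc-glus2 tpc-glus3 tpc-glus4; do
  ping -c 3 "$host"        # basic reachability from the client
  nc -zv "$host" 24007     # can we reach the glusterd management port?
done
# Confirm all bricks are online and accepting connections:
gluster volume status gv1
```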
Regards
Davide
On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa <rgowdapp at redhat.com>
wrote:
>
>
> On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa <rgowdapp at redhat.com>
> wrote:
>
>>
>>
>> On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack <mwaymack at nsgdv.com> wrote:
>>
>>> Hi all,
>>>
>>>
>>> I'm having a problem writing to our volume. When writing files larger
>>> than about 2GB, I get an intermittent issue where the write fails and
>>> returns an Input/output error. This is also shown in the FUSE log of the
>>> clients (it affects all clients). A snip of a client log is below:
>>>
>>> [2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 51040978: WRITE => -1
>>> gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output
>>> error)
>>>
>>> [2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk]
>>> 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
>>>
>>> [2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 51041266: WRITE => -1
>>> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output
>>> error)
>>>
>>> [2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk]
>>> 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
>>>
>>> [2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk]
>>> 0-glusterfs-fuse: 51041548: WRITE => -1
>>> gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output
>>> error)
>>>
>>> [2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk]
>>> 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
>>>
>>> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
>>> 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times
>>> between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
>>>
>>> The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk]
>>> 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05
>>> 22:39:33.925981] and [2019-01-05 22:39:50.451862]
>>>
>>> The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search]
>>> 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times
>>> between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]
>>>
>>
>> This looks to be a DHT issue. Some questions:
>> * Are all subvolumes of DHT up, and is the client connected to them?
>> Particularly the subvolume which contains the file in question.
>> * Can you get all extended attributes of parent directory of the file
>> from all bricks?
>> * set diagnostics.client-log-level to TRACE, capture these errors again
>> and attach the client log file.
>>
>
> I spoke a bit too early. dht_writev doesn't search for the hashed
> subvolume, as it has already been resolved during lookup. So these messages
> look to be from a different issue - not the writev failure.
>
>
>>
>>> This is intermittent for most files, but eventually if a file is large
>>> enough it will not write. The workflow is SFTP to the client, which then
>>> writes to the volume over FUSE. When files get to a certain point, we can
>>> no longer write to them. The file sizes are different as well, so it's not
>>> like they all reach the same size and just stop. I've ruled out a
>>> free space issue: our files at their largest are only a few hundred GB, and
>>> we have tens of terabytes free on each brick. We are also sharding at 1GB.
>>>
>>> I'm not sure where to go from here, as the error seems vague and I can
>>> only see it in the client log. I'm not seeing these errors on the nodes
>>> themselves. This also happens if I mount the volume via FUSE on any of the
>>> nodes, and it is only reflected in the FUSE log.
>>>
>>> Here is the volume info:
>>> Volume Name: gv1
>>> Type: Distributed-Replicate
>>> Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 8 x (2 + 1) = 24
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: tpc-glus4:/exp/b1/gv1
>>> Brick2: tpc-glus2:/exp/b1/gv1
>>> Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
>>> Brick4: tpc-glus2:/exp/b2/gv1
>>> Brick5: tpc-glus4:/exp/b2/gv1
>>> Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
>>> Brick7: tpc-glus4:/exp/b3/gv1
>>> Brick8: tpc-glus2:/exp/b3/gv1
>>> Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
>>> Brick10: tpc-glus4:/exp/b4/gv1
>>> Brick11: tpc-glus2:/exp/b4/gv1
>>> Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
>>> Brick13: tpc-glus1:/exp/b5/gv1
>>> Brick14: tpc-glus3:/exp/b5/gv1
>>> Brick15: tpc-arbiter2:/exp/b5/gv1 (arbiter)
>>> Brick16: tpc-glus1:/exp/b6/gv1
>>> Brick17: tpc-glus3:/exp/b6/gv1
>>> Brick18: tpc-arbiter2:/exp/b6/gv1 (arbiter)
>>> Brick19: tpc-glus1:/exp/b7/gv1
>>> Brick20: tpc-glus3:/exp/b7/gv1
>>> Brick21: tpc-arbiter2:/exp/b7/gv1 (arbiter)
>>> Brick22: tpc-glus1:/exp/b8/gv1
>>> Brick23: tpc-glus3:/exp/b8/gv1
>>> Brick24: tpc-arbiter2:/exp/b8/gv1 (arbiter)
>>> Options Reconfigured:
>>> performance.cache-samba-metadata: on
>>> performance.cache-invalidation: off
>>> features.shard-block-size: 1000MB
>>> features.shard: on
>>> transport.address-family: inet
>>> nfs.disable: on
>>> cluster.lookup-optimize: on
>>>
>>> I'm a bit stumped on this, any help is appreciated. Thank you!
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
--
Davide Obbi
Senior System Administrator
Booking.com B.V.
Vijzelstraat 66-80 Amsterdam 1017HL Netherlands
Direct +31207031558
Empowering people to experience the world since 1996
43 languages, 214+ offices worldwide, 141,000+ global destinations, 29
million reported listings
Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG)