Hi all,

I'm having a problem writing to our volume. When writing files larger than about 2GB, I get an intermittent issue where the write fails and returns an Input/output error. The error also shows up in the FUSE log on the client (all clients are affected). A snippet of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51040978: WRITE => -1 gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041266: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041548: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 22:39:33.925981] and [2019-01-05 22:39:50.451862]
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]
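
For what it's worth, the failure doesn't need our SFTP workflow to show up; a plain large write straight to the FUSE mount triggers it (the mount point and file name below are just examples):

    # Write a ~3GB file to the FUSE mount; past roughly 2GB the
    # write intermittently fails with Input/output error (EIO)
    dd if=/dev/zero of=/mnt/gv1/testfile bs=1M count=3000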

This is intermittent for most files, but once a file grows large enough the writes fail consistently. The workflow is SFTP to the client, which then writes to the volume over FUSE. When files get to a certain point, we can no longer write to them. The sizes at which this happens vary, so it's not as if they all reach the same size and just stop. I've ruled out a free-space issue: our largest files are only a few hundred GB and we have tens of terabytes free on each brick. We are also sharding at 1GB.
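
To double-check the space and shard settings, these are the sorts of commands I've been running on the nodes:

    # Per-brick disk free/total and inode counts
    gluster volume status gv1 detail

    # Confirm the shard size the volume is actually using
    gluster volume get gv1 features.shard-block-size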

I'm not sure where to go from here, as the error seems vague and I can only see it in the client log; I'm not seeing these errors on the nodes themselves. The same thing happens if I mount the volume via FUSE on any of the nodes, and again it only shows up in the FUSE log.
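
If more detail would help, I can remount a client with a more verbose client log and capture a failed write; something like this, if I have the mount options right (mount point is just an example):

    # Remount the FUSE client with debug-level logging
    umount /mnt/gv1
    mount -t glusterfs -o log-level=DEBUG tpc-glus1:/gv1 /mnt/gv1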

Here is the volume info:
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/exp/b4/gv1
Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
Brick13: tpc-glus1:/exp/b5/gv1
Brick14: tpc-glus3:/exp/b5/gv1
Brick15: tpc-arbiter2:/exp/b5/gv1 (arbiter)
Brick16: tpc-glus1:/exp/b6/gv1
Brick17: tpc-glus3:/exp/b6/gv1
Brick18: tpc-arbiter2:/exp/b6/gv1 (arbiter)
Brick19: tpc-glus1:/exp/b7/gv1
Brick20: tpc-glus3:/exp/b7/gv1
Brick21: tpc-arbiter2:/exp/b7/gv1 (arbiter)
Brick22: tpc-glus1:/exp/b8/gv1
Brick23: tpc-glus3:/exp/b8/gv1
Brick24: tpc-arbiter2:/exp/b8/gv1 (arbiter)
Options Reconfigured:
performance.cache-samba-metadata: on
performance.cache-invalidation: off
features.shard-block-size: 1000MB
features.shard: on
transport.address-family: inet
nfs.disable: on
cluster.lookup-optimize: on
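
In case it's relevant to the "no subvolume for hash" warnings, I can also dump the DHT layout xattrs from the brick roots (run on each node against its own brick paths; /exp/b1/gv1 below is just the first brick as an example):

    # Dump the DHT layout assigned to this brick's root directory
    getfattr -n trusted.glusterfs.dht -e hex /exp/b1/gv1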

I'm a bit stumped on this; any help is appreciated. Thank you!