[Gluster-users] Sporadic Bus error on mmap() on FUSE mount

Jan Wrona wrona at cesnet.cz
Tue Jul 18 11:55:17 UTC 2017


On 18.7.2017 12:17, Niels de Vos wrote:
> On Tue, Jul 18, 2017 at 10:48:45AM +0200, Jan Wrona wrote:
>> Hi,
>>
>> I need to use rrdtool on top of a Gluster FUSE mount, rrdtool uses
>> memory-mapped file IO extensively (I know I can recompile rrdtool with
>> mmap() disabled, but that is just a workaround). I have three FUSE mount
>> points on three different servers, on one of them the command "rrdtool
>> create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U
>> RRA:AVERAGE:0.5:1:24" works fine; on the other two servers the command is
>> killed with a Bus error. With every Bus error, the following two lines
>> appear in the mount log:
>> [2017-07-18 08:30:22.470770] E [MSGID: 108008]
>> [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-flow-replicate-0:
>> Failing FALLOCATE on gfid 6a675cdd-2ea1-473f-8765-2a4c935a22ad: split-brain
>> observed. [Input/output error]
>> [2017-07-18 08:30:22.470843] W [fuse-bridge.c:1291:fuse_err_cbk]
>> 0-glusterfs-fuse: 56589: FALLOCATE() ERR => -1 (Input/output error)
>>
>> I'm not sure about the current state of mmap() on FUSE and Gluster, but
>> it's strange that it works only on certain mounts of the same volume.
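For what it's worth, whether there is a real split-brain on that gfid can be
checked with the standard heal commands and by dumping the AFR xattrs of the
gfid's backend path on the bricks, roughly like this (the .glusterfs path is
the gfid-based hardlink each brick keeps, built here from the gfid in the log
and the brick roots shown below):

  gluster volume heal flow info split-brain
  getfattr -d -m . -e hex \
      /data/glusterfs/flow/brick*/safety_dir/.glusterfs/6a/67/6a675cdd-2ea1-473f-8765-2a4c935a22ad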
> This can be caused when a mmap()'d region is not yet backed by data in
> the file, for example when reading or writing a part of the mapping that
> lies beyond the end-of-file.
> I've seen issues like this before (long ago), and that got fixed in the
> write-behind xlator.
>
> Could you disable the performance.write-behind option for the volume and
> try to reproduce the problem? If the issue is in write-behind, disabling
> it should prevent the issue.
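For reference, disabling that option and later restoring its default should
just be:

  gluster volume set flow performance.write-behind off
  gluster volume reset flow performance.write-behind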
>
> If this helps, please file a bug with strace of the application and
> tcpdump that contains the GlusterFS traffic from start to end when the
> problem is observed.
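If it comes to that, the capture could look roughly like this (24007 is the
glusterd management port, 49152-49156 are the brick ports reported by
"gluster volume status flow" below):

  strace -f -tt -o rrdtool.strace \
      rrdtool create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U RRA:AVERAGE:0.5:1:24
  tcpdump -i any -s 0 -w flow-gluster.pcap 'port 24007 or portrange 49152-49156'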

I've disabled performance.write-behind, unmounted, stopped and started
the volume, then mounted it again, but it had no effect. After that I
successively disabled and re-enabled other options and xlators, and I've
found that the problem is related to the cluster.nufa option. When the
NUFA translator is disabled, rrdtool works fine on all mounts; when it
is enabled again, the problem reappears.
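In case anyone wants to reproduce it, toggling just the NUFA xlator on this
volume is a matter of:

  gluster volume set flow cluster.nufa disable   # rrdtool create works on all mounts
  gluster volume set flow cluster.nufa enable    # the Bus error is back on two of them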

>
>    https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=write-behind
>
> HTH,
> Niels
>
>
>> version: glusterfs 3.10.3
>>
>> [root@dc1]# gluster volume info flow
>> Volume Name: flow
>> Type: Distributed-Replicate
>> Volume ID: dc6a9ea0-97ec-471f-b763-1d395ece73e1
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 3 x 2 = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
>> Brick2: dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
>> Brick3: dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
>> Brick4: dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
>> Brick5: dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
>> Brick6: dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
>> Options Reconfigured:
>> performance.parallel-readdir: on
>> performance.client-io-threads: on
>> cluster.nufa: enable
>> network.ping-timeout: 10
>> transport.address-family: inet
>> nfs.disable: true
>>
>> [root@dc1]# gluster volume status flow
>> Status of volume: flow
>> Gluster process                                                  TCP Port  RDMA Port  Online  Pid
>> ---------------------------------------------------------------------------------------------------
>> Brick dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir  49155     0          Y       26441
>> Brick dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir  49155     0          Y       26110
>> Brick dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir  49156     0          Y       26129
>> Brick dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir  49152     0          Y       8703
>> Brick dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir  49153     0          Y       8722
>> Brick dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir  49156     0          Y       26460
>> Self-heal Daemon on localhost                                    N/A       N/A        Y       26493
>> Self-heal Daemon on dc2.liberouter.org                           N/A       N/A        Y       26151
>> Self-heal Daemon on dc3.liberouter.org                           N/A       N/A        Y       8744
>>
>> Task Status of Volume flow
>> ------------------------------------------------------------------------------
>> There are no active volume tasks
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users



