[Gluster-users] Sporadic Bus error on mmap() on FUSE mount
Niels de Vos
ndevos at redhat.com
Tue Jul 18 10:17:45 UTC 2017

On Tue, Jul 18, 2017 at 10:48:45AM +0200, Jan Wrona wrote:
> Hi,
> 
> I need to use rrdtool on top of a Gluster FUSE mount; rrdtool uses
> memory-mapped file I/O extensively (I know I can recompile rrdtool with
> mmap() disabled, but that is just a workaround). I have three FUSE mount
> points on three different servers. On one of them the command "rrdtool
> create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U
> RRA:AVERAGE:0.5:1:24" works fine; on the other two servers the command is
> killed and a Bus error is reported. With every Bus error, the following two
> lines appear in the mount log:
> [2017-07-18 08:30:22.470770] E [MSGID: 108008] [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-flow-replicate-0: Failing FALLOCATE on gfid 6a675cdd-2ea1-473f-8765-2a4c935a22ad: split-brain observed. [Input/output error]
> [2017-07-18 08:30:22.470843] W [fuse-bridge.c:1291:fuse_err_cbk] 0-glusterfs-fuse: 56589: FALLOCATE() ERR => -1 (Input/output error)
> 
> I'm not sure about the current state of mmap() on FUSE and Gluster, but it's
> strange that it works on only one of the mounts of the same volume.
This can be caused when an mmap()'d region is not written, for example when
reading or writing the part of the mmap()'d region that lies after the end-of-file.
I've seen issues like this before (long ago), and they were fixed in the
write-behind xlator.
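The end-of-file case can be reproduced on a local filesystem without Gluster: touching a page of a shared mapping that lies wholly past end-of-file raises SIGBUS, whose default action kills the process, much like the "Bus error" rrdtool reports. The sketch below is a minimal illustration assuming Linux/POSIX mmap() semantics; `store_raises_sigbus()` and the scratch path are hypothetical names, and the program says nothing about whether write-behind is involved in this particular report.

```c
/* Needed for sigaction, sigsetjmp/siglongjmp and ftruncate under strict
 * ISO C compiler modes. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* SIGBUS handler: jump back into the probe instead of letting the default
 * action terminate the process. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Hypothetical probe: create `path` with `file_len` bytes, map `map_len`
 * bytes of it and store one byte at `offset`. Returns 1 if the store raised
 * SIGBUS, 0 if it succeeded and -1 if setup failed. */
static int store_raises_sigbus(const char *path, off_t file_len,
                               size_t map_len, size_t offset)
{
    assert(map_len > 0 && offset < map_len);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* Only the first file_len bytes have backing store; whole pages of the
     * mapping beyond the page containing EOF do not. */
    if (ftruncate(fd, file_len) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    char *p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        unlink(path);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int faulted = 0;
    if (sigsetjmp(bus_env, 1) == 0)
        ((volatile char *)p)[offset] = 'x'; /* faults if past EOF's page */
    else
        faulted = 1; /* reached via siglongjmp from on_sigbus */

    sigaction(SIGBUS, &old, NULL); /* restore the previous disposition */
    munmap(p, map_len);
    unlink(path);
    return faulted;
}
```

With 4 KiB pages, for example, `store_raises_sigbus("/tmp/probe", 0, 8192, 0)` should return 1, while giving the file a length of 8192 makes the same store return 0.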
Could you disable the performance.write-behind option for the volume and
try to reproduce the problem? If write-behind is the cause, disabling it
should prevent the Bus error.
If this helps, please file a bug and attach an strace of the application and
a tcpdump capture containing the GlusterFS traffic from start to end while the
problem is observed.
  https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=write-behind
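The log in the original report also shows the FALLOCATE request itself failing with EIO, and the AFR message attributes that to split-brain on the file's gfid. If an application preallocates a file before mapping it (an assumption here; the report does not say how rrdtool sizes its files), checking the preallocation result turns such a failure into an error message instead of a later SIGBUS on a short file. A minimal sketch with a hypothetical helper `map_preallocated()`:

```c
/* Needed for posix_fallocate and ftruncate-era POSIX declarations under
 * strict ISO C compiler modes. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, reserve `len` bytes for it and only
 * then map it. Returns the mapping, or NULL with errno set on failure. */
static void *map_preallocated(const char *path, size_t len)
{
    assert(len > 0); /* posix_fallocate() rejects a zero length */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return NULL;

    /* posix_fallocate() returns an error number instead of setting errno.
     * A failed FALLOCATE such as the one in the log would be expected to
     * surface here as EIO, before any page of the file is mapped. */
    int err = posix_fallocate(fd, 0, (off_t)len);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate(%s): %s\n", path, strerror(err));
        close(fd);
        errno = err;
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno; /* keep mmap()'s errno across close() */
    close(fd);
    if (p == MAP_FAILED) {
        errno = saved;
        return NULL;
    }
    return p; /* every byte in [0, len) is now backed by the file */
}
```

The caller still owns the mapping and releases it with `munmap(p, len)`.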
HTH,
Niels
> 
> version: glusterfs 3.10.3
> 
> [root@dc1]# gluster volume info flow
> Volume Name: flow
> Type: Distributed-Replicate
> Volume ID: dc6a9ea0-97ec-471f-b763-1d395ece73e1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> Brick2: dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> Brick3: dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> Brick4: dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> Brick5: dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> Brick6: dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> Options Reconfigured:
> performance.parallel-readdir: on
> performance.client-io-threads: on
> cluster.nufa: enable
> network.ping-timeout: 10
> transport.address-family: inet
> nfs.disable: true
> 
> [root@dc1]# gluster volume status flow
> Status of volume: flow
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dc1.liberouter.org:/data/glusterfs/fl
> ow/brick1/safety_dir                        49155     0          Y       26441
> Brick dc2.liberouter.org:/data/glusterfs/fl
> ow/brick2/safety_dir                        49155     0          Y       26110
> Brick dc2.liberouter.org:/data/glusterfs/fl
> ow/brick1/safety_dir                        49156     0          Y       26129
> Brick dc3.liberouter.org:/data/glusterfs/fl
> ow/brick2/safety_dir                        49152     0          Y       8703
> Brick dc3.liberouter.org:/data/glusterfs/fl
> ow/brick1/safety_dir                        49153     0          Y       8722
> Brick dc1.liberouter.org:/data/glusterfs/fl
> ow/brick2/safety_dir                        49156     0          Y       26460
> Self-heal Daemon on localhost               N/A       N/A        Y       26493
> Self-heal Daemon on dc2.liberouter.org      N/A       N/A        Y       26151
> Self-heal Daemon on dc3.liberouter.org      N/A       N/A        Y       8744
> 
> Task Status of Volume flow
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users