[Bugs] [Bug 1633669] New: Gluster bricks fail frequently

bugzilla at redhat.com
Thu Sep 27 13:58:33 UTC 2018


https://bugzilla.redhat.com/show_bug.cgi?id=1633669

            Bug ID: 1633669
           Summary: Gluster bricks fail frequently
           Product: GlusterFS
           Version: 4.1
         Component: glusterd
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: jaime.dulzura at cevalogistics.com
                CC: bugs at gluster.org



Description of problem:

We are trying to find the Gluster volume options that best fit our need for
shared storage for TibCo EMS.

Unfortunately, bricks are failing after a few runs of stress testing.

Version-Release number of selected component (if applicable):
glusterfs-server 4.1.4 and the latest release, 4.1.5

First setup:

3 VMs / 8 vCPU / 16 GB memory on vSphere 6.5
Gluster volume (replica 3, no arbiter)
Bricks fail after the first run of 50k messages.

Second setup:

3 VMs / 8 vCPU / 16 GB memory on vSphere 6.5
Gluster volume (replica 3 arbiter 1)
Bricks fail after two runs of 50k messages (the presumed volume layouts are
sketched below).
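
For reference, the two layouts were presumably created along these lines. This
is a sketch only: the brick hosts and paths are taken from the tibco volume
info further down, but the exact create commands are not part of this report.

  # First setup: replica 3, no arbiter
  gluster volume create tibco replica 3 \
      iahdvlgfsa001:/local/bricks/volume01/tibco \
      iahdvlgfsb001:/local/bricks/volume01/tibco \
      iahdvlgfsc001:/local/bricks/volume01/tibco

  # Second setup: replica 3 with the third brick as arbiter
  gluster volume create tibco replica 3 arbiter 1 \
      iahdvlgfsa001:/local/bricks/volume01/tibco \
      iahdvlgfsb001:/local/bricks/volume01/tibco \
      iahdvlgfsc001:/local/bricks/volume01/tibco

  gluster volume start tibco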


How reproducible:

Set up the same specs with Gluster 4.1.4 or 4.1.5, mount a volume using the
Gluster native client, then run 50k TibCo EMS messages.

Steps to Reproduce:
1. Set up VMs with the same specs.
2. Run 50k messages from TibCo EMS using Gluster native-client shared storage
   (a mount sketch follows below).
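
A minimal sketch of the native-client mount used in step 2, assuming the mount
point /mnt/tibco (the mount point is an assumption; the volume name tibco
comes from the volume info below):

  # FUSE mount of the tibco volume via the Gluster native client
  mount -t glusterfs iahdvlgfsa001:/tibco /mnt/tibco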

Actual results:
Bricks may fail during the run or after all messages have been processed.


Expected results:
The volume should stay healthy and remain available for the next round of 50k
messages.
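
Brick availability can be checked between rounds with the standard status
command; failed bricks typically show N in the Online column:

  # check whether all brick processes are still online
  gluster volume status tibco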

Additional info:
Core dumps were generated on the node whose bricks are failing.
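
A hedged sketch of pulling a backtrace from such a core (the core file path
below is hypothetical; /usr/sbin/glusterfsd is the brick process binary on a
standard install):

  # load the core against the brick binary and dump backtraces
  gdb /usr/sbin/glusterfsd /path/to/core.<pid>   # core path is hypothetical
  (gdb) bt
  (gdb) thread apply all bt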

Volume info:

 gluster v info

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 255a31c4-13a1-4330-a73d-6d001e71d57c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: iahdvlgfsa001.logistics.corp:/var/lib/glusterd/ss_brick
Brick2: iahdvlgfsb001:/var/lib/glusterd/ss_brick
Brick3: iahdvlgfsc001:/var/lib/glusterd/ss_brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.enable-shared-storage: enable

Volume Name: tibco
Type: Replicate
Volume ID: abc14a06-852d-46c2-8e70-a1f09136bc08
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: iahdvlgfsa001:/local/bricks/volume01/tibco
Brick2: iahdvlgfsb001:/local/bricks/volume01/tibco
Brick3: iahdvlgfsc001:/local/bricks/volume01/tibco (arbiter)
Options Reconfigured:
auth.allow: 127.0.0.1,10.1.25.*,10.1.26.*,10.1.34.*
nfs.disable: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.strict-o-direct: on
performance.strict-write-ordering: on
cluster.enable-shared-storage: enable
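
For reference, the non-default options shown under "Options Reconfigured" for
the tibco volume would have been applied roughly as follows (a reconstruction
from the output above; the order of the commands is an assumption):

  gluster volume set tibco auth.allow "127.0.0.1,10.1.25.*,10.1.26.*,10.1.34.*"
  gluster volume set tibco nfs.disable on
  gluster volume set tibco diagnostics.latency-measurement on
  gluster volume set tibco diagnostics.count-fop-hits on
  gluster volume set tibco performance.strict-o-direct on
  gluster volume set tibco performance.strict-write-ordering on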


