[Bugs] [Bug 1529237] New: Sometimes read failed just after write before file closed when try 100 times in different new files

bugzilla at redhat.com bugzilla at redhat.com
Wed Dec 27 09:02:59 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1529237

            Bug ID: 1529237
           Summary: Sometimes read failed just after write before file
                    closed when try 100 times in different new files
           Product: GlusterFS
           Version: 3.12
         Component: io-cache
          Severity: urgent
          Assignee: bugs at gluster.org
          Reporter: george.lian at nokia.com
                CC: bugs at gluster.org



Created attachment 1372733
  --> https://bugzilla.redhat.com/attachment.cgi?id=1372733&action=edit
Test script and log for this issue

Description of problem:

write 2 long variable and other text into a new file which store in glusterfs
filesystem, and then read and check whether the value is just wrote or not,
after try about 100 times into different new files, it easy to reproduce that
the read value is not correct like just wrote.

if test the same procedure in EXT4 filesytem, the issue will not be exist.

Version-Release number of selected component (if applicable):
3.12.3

How reproducible:


Steps to Reproduce:
1. create a gluster volume named "test", and mount to "/mnt/test" for this
volume.  (we use replicate mode , have SN-0, SN-1 2 SN node, and a Client Node)
2. copy a shell script _loop.sh and file_test.c in attachment
3. compile file_test.c with binary output "file_test" (could use command "make
file_test")
4. run _loop.sh , format is _loop.sh <try_times> <save_to_glusterwhere>

Actual results:


/mnt/test is mounted from gluster storage.

[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__1
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__2
Read values do not match with written data: -2064763152 and -1074501584
Error 6 returned on loop 3
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__3
Read values do not match with written data: -24283408 and -1440483296
Error 6 returned on loop 39
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__4
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__5
Read values do not match with written data: 438253296 and -2074572768
Error 6 returned on loop 34
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__6
Read values do not match with written data: -972917008 and -231886816
Error 6 returned on loop 3
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/test/test__7
Read values do not match with written data: -434657552 and 906727456
Error 6 returned on loop 6



Expected results:
Expect the test result like run the same command in ext4 ("/mnt/tmp" is EXT4
filesystem)
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__1
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__2
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__3
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__4
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__5
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__6
Loop completed 100 times without errors.
[root at mn-0:/home/robot]
# ./_loop.sh 100 /mnt/tmp/test__7
Loop completed 100 times without errors.
[root at mn-0:/home/robot]



Additional info:
with the below performance related translator disabled , the issue still is
exist:
gluster volume set test stat-prefetch off
gluster volume set test read-ahead off
gluster volume set test write-behind off
gluster volume set test io-cache off
gluster volume set test quick-read off
gluster volume set test open-behind off
# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: 2f602a72-2836-45c5-a14f-aaedab65f95d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sn-0.local:/mnt/bricks/test/brick
Brick2: sn-1.local:/mnt/bricks/test/brick
Options Reconfigured:
diagnostics.brick-log-level: TRACE
diagnostics.client-log-level: TRACE
performance.open-behind: off
performance.quick-read: off
performance.io-cache: off
performance.write-behind: off
performance.read-ahead: off
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.server-quorum-ratio: 51%

I also attached the debug log when run the test FYI, and attach some I think
useful log abstract from the attachment below FYI:

[2017-12-27 06:21:04.332524] T [MSGID: 0] [afr-inode-read.c:286:afr_fstat_cbk]
12-stack-trace: stack-address: 0x7fb5ec010680, test-replicate-0 returned 0
[2017-12-27 06:21:04.332724] T [MSGID: 0] [syncop.c:1715:syncop_fgetxattr]
12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-dht to
test-replicate-0
[2017-12-27 06:21:04.332768] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn]
12-test-replicate-0: 70e0f7ee-2f0a-4364-a38e-96e7031eee7b: generation now vs
cached: 2, 2
[2017-12-27 06:21:04.332791] T [MSGID: 0]
[afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address:
0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.332817] T [rpc-clnt.c:1496:rpc_clnt_record]
12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.332837] T [rpc-clnt.c:1353:rpc_clnt_record_build_header]
12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.332862] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt:
submitted request (XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35)
to rpc-transport (test-client-0)
[2017-12-27 06:21:04.333943] T [rpc-clnt.c:675:rpc_clnt_reply_init]
12-test-client-0: received rpc message (RPC XID: 0x2681 Program: GlusterFS 3.3,
ProgVers: 330, Proc: 35) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.334013] D [MSGID: 0]
[client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-0: remote
operation failed: No data available
[2017-12-27 06:21:04.334036] D [MSGID: 0]
[client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address:
0x7fb5fc0618d0, test-client-0 returned -1 error: No data available [No data
available]
[2017-12-27 06:21:04.334114] T [MSGID: 0]
[afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace:
stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.334174] T [rpc-clnt.c:1496:rpc_clnt_record]
12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.334196] T [rpc-clnt.c:1353:rpc_clnt_record_build_header]
12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334229] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt:
submitted request (XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25)
to rpc-transport (test-client-0)
[2017-12-27 06:21:04.334280] T [MSGID: 0]
[afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace:
stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.334311] T [rpc-clnt.c:1496:rpc_clnt_record]
12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.334330] T [rpc-clnt.c:1353:rpc_clnt_record_build_header]
12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334354] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt:
submitted request (XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25)
to rpc-transport (test-client-1)
[2017-12-27 06:21:04.334904] T [rpc-clnt.c:675:rpc_clnt_reply_init]
12-test-client-1: received rpc message (RPC XID: 0x252c Program: GlusterFS 3.3,
ProgVers: 330, Proc: 25) from rpc-transport (test-client-1)
[2017-12-27 06:21:04.334983] T [MSGID: 0]
[client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address:
0x7fb5fc0618d0, test-client-1 returned 0
[2017-12-27 06:21:04.335128] T [rpc-clnt.c:675:rpc_clnt_reply_init]
12-test-client-0: received rpc message (RPC XID: 0x2682 Program: GlusterFS 3.3,
ProgVers: 330, Proc: 25) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.335165] T [MSGID: 0]
[client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address:
0x7fb5fc0618d0, test-client-0 returned 0
[2017-12-27 06:21:04.335193] T [MSGID: 0]
[afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address:
0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.335224] T [rpc-clnt.c:1496:rpc_clnt_record]
12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.335253] T [rpc-clnt.c:1353:rpc_clnt_record_build_header]
12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.335284] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt:
submitted request (XID: 0x252d Program: GlusterFS 3.3, ProgVers: 330, Proc: 35)
to rpc-transport (test-client-1)
[2017-12-27 06:21:04.335870] T [rpc-clnt.c:675:rpc_clnt_reply_init]
12-test-client-1: received rpc message (RPC XID: 0x252d Program: GlusterFS 3.3,
ProgVers: 330, Proc: 35) from rpc-transport (test-client-1)
[2017-12-27 06:21:04.335902] D [MSGID: 0]
[client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-1: remote
operation failed: No data available
[2017-12-27 06:21:04.335922] D [MSGID: 0]
[client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address:
0x7fb5fc0618d0, test-client-1 returned -1 error: No data available [No data
available]
[2017-12-27 06:21:04.335950] D [MSGID: 0]
[afr-inode-read.c:1720:afr_fgetxattr_wind] 12-stack-trace: stack-address:
0x7fb5fc0618d0, test-replicate-0 returned -1 error: No data available [No data
available]

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list