[Gluster-devel] Sometimes read failed just after write before file closed when try 100 times in different new files

Venetjoki, Sami (Nokia - FI/Espoo) sami.venetjoki at nokia.com
Thu Dec 28 14:10:51 UTC 2017


Hi,

yes, in test code there was such issue. The production code that occasionally has issues with first time accessing the file however has proper parameters in use when opening a file.

In production the issue is now gone when the same file being read/written is no longer locked with another file descriptor usin gflock/fcntl but the access prevention is done with a separate lock file, so avoiding dual handle to same file seems to fix the problem. Perhaps I'll need to tune the production code still a bit so that further information could be obtained why the file does not contain valid data right after writing.

br, sami


From: Lian, George (NSB - CN/Hangzhou)
Sent: Thursday, December 28, 2017 8:21 AM
To: Gluster-devel at gluster.org; Venetjoki, Sami (Nokia - FI/Espoo) <sami.venetjoki at nokia.com>
Cc: Zhong, Hua (NSB - CN/Hangzhou) <hua.zhong at nokia-sbell.com>; Li, Deqian (NSB - CN/Hangzhou) <deqian.li at nokia-sbell.com>; I_MBB_HETRAN_SW3_TEAM_MATRIX_GMS <I_MBB_HETRAN_SW3_TEAM_MATRIX at internal.nsn.com>
Subject: RE: Sometimes read failed just after write before file closed when try 100 times in different new files

Hi, Gluster Expert,

Sorry for wrong code "open(argv[1], O_CREAT|O_RDWR|S_IRUSR|S_IWUSR) "in previous mail, after correct it , the issues seems GONE.
But I still confuse why wrong code not failed in EXT4 file system.
After I check tcpdump packet via network, it show that in CREATE process, the mode of file sometime became to " Mode: 0101000, S_ISVTX, Reserved ", with this mode, the file could be seen with "ls" command, but can't seen by "hexdump"command even with root privilege,  but if run this wrong program in ext4 file system, this MODE can't be occurred.
So I suggest maybe Gluster could do some inprovment for this MODE , so that can avoid thiss can't access file with wrong API called.

@Sami,
After I check the manual of open API, the correct format should be
      int open(const char *pathname, int flags);
       int open(const char *pathname, int flags, mode_t mode);
for your test code, I suppose the correct call should be "open(argv[1], O_CREAT|O_RDWR,S_IRUSR|S_IWUSR) "
after I correct it , c and C++ code all PASS in 3.6 and 3.12.

Best Regards,
George

From: Lian, George (NSB - CN/Hangzhou)
Sent: Wednesday, December 27, 2017 5:23 PM
To: Gluster-devel at gluster.org<mailto:Gluster-devel at gluster.org>
Cc: Zhong, Hua (NSB - CN/Hangzhou) <hua.zhong at nokia-sbell.com<mailto:hua.zhong at nokia-sbell.com>>; Venetjoki, Sami (Nokia - FI/Espoo) <sami.venetjoki at nokia.com<mailto:sami.venetjoki at nokia.com>>; Li, Deqian (NSB - CN/Hangzhou) <deqian.li at nokia-sbell.com<mailto:deqian.li at nokia-sbell.com>>
Subject: Sometimes read failed just after write before file closed when try 100 times in different new files

Hi, Gluster expert,

I has just raise a bug with https://bugzilla.redhat.com/show_bug.cgi?id=1529237 on gluster community.

I also attached the reproduce source code and shell script here FYI.

>From my investigation , I suppose the issue maybe caused by "cache in client is invalid but the file still not been persistented in server brick"
But I can't find the root cause from the debug log which attached in https://bugzilla.redhat.com/show_bug.cgi?id=1529237.

>From the following log I abstract client log from the attachment in the Bugzilla, I highlight it with YELLOW which in read phase, you can see both volume(test-client-0 and test-client-1) are failed
I just suppose it maybe some abnormal when it in read phase, but I can't any usefully log in server brick log from myself, so your helps is highly appreciated!

[2017-12-27 06:21:04.332524] T [MSGID: 0] [afr-inode-read.c:286:afr_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5ec010680, test-replicate-0 returned 0
[2017-12-27 06:21:04.332724] T [MSGID: 0] [syncop.c:1715:syncop_fgetxattr] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-dht to test-replicate-0
[2017-12-27 06:21:04.332768] D [MSGID: 0] [afr-read-txn.c:220:afr_read_txn] 12-test-replicate-0: 70e0f7ee-2f0a-4364-a38e-96e7031eee7b: generation now vs cached: 2, 2
[2017-12-27 06:21:04.332791] T [MSGID: 0] [afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.332817] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.332837] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.332862] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) to rpc-transport (test-client-0)
[2017-12-27 06:21:04.333943] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-0: received rpc message (RPC XID: 0x2681 Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.334013] D [MSGID: 0] [client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-0: remote operation failed: No data available
[2017-12-27 06:21:04.334036] D [MSGID: 0] [client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-0 returned -1 error: No data available [No data available]
[2017-12-27 06:21:04.334114] T [MSGID: 0] [afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-0
[2017-12-27 06:21:04.334174] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-0: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.334196] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334229] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) to rpc-transport (test-client-0)
[2017-12-27 06:21:04.334280] T [MSGID: 0] [afr-common.c:1306:afr_inode_refresh_subvol_with_fstat] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.334311] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.334330] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 336, payload: 260, rpc hdr: 76
[2017-12-27 06:21:04.334354] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) to rpc-transport (test-client-1)
[2017-12-27 06:21:04.334904] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-1: received rpc message (RPC XID: 0x252c Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) from rpc-transport (test-client-1)
[2017-12-27 06:21:04.334983] T [MSGID: 0] [client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-1 returned 0
[2017-12-27 06:21:04.335128] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-0: received rpc message (RPC XID: 0x2682 Program: GlusterFS 3.3, ProgVers: 330, Proc: 25) from rpc-transport (test-client-0)
[2017-12-27 06:21:04.335165] T [MSGID: 0] [client-rpc-fops.c:1462:client3_3_fstat_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-0 returned 0
[2017-12-27 06:21:04.335193] T [MSGID: 0] [afr-inode-read.c:1728:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, winding from test-replicate-0 to test-client-1
[2017-12-27 06:21:04.335224] T [rpc-clnt.c:1496:rpc_clnt_record] 12-test-client-1: Auth Info: pid: 5257, uid: 0, gid: 0, owner: 0000000000000000
[2017-12-27 06:21:04.335253] T [rpc-clnt.c:1353:rpc_clnt_record_build_header] 12-rpc-clnt: Request fraglen 140, payload: 64, rpc hdr: 76
[2017-12-27 06:21:04.335284] T [rpc-clnt.c:1699:rpc_clnt_submit] 12-rpc-clnt: submitted request (XID: 0x252d Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) to rpc-transport (test-client-1)
[2017-12-27 06:21:04.335870] T [rpc-clnt.c:675:rpc_clnt_reply_init] 12-test-client-1: received rpc message (RPC XID: 0x252d Program: GlusterFS 3.3, ProgVers: 330, Proc: 35) from rpc-transport (test-client-1)
[2017-12-27 06:21:04.335902] D [MSGID: 0] [client-rpc-fops.c:1143:client3_3_fgetxattr_cbk] 12-test-client-1: remote operation failed: No data available
[2017-12-27 06:21:04.335922] D [MSGID: 0] [client-rpc-fops.c:1151:client3_3_fgetxattr_cbk] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-client-1 returned -1 error: No data available [No data available]
[2017-12-27 06:21:04.335950] D [MSGID: 0] [afr-inode-read.c:1720:afr_fgetxattr_wind] 12-stack-trace: stack-address: 0x7fb5fc0618d0, test-replicate-0 returned -1 error: No data available [No data available]


Best Regards,
George
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/gluster-devel/attachments/20171228/cfaab171/attachment-0003.html>


More information about the Gluster-devel mailing list