[Bugs] [Bug 1361300] New: Direct io to sharded files fails when on zfs backend

bugzilla at redhat.com bugzilla at redhat.com
Thu Jul 28 17:35:08 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1361300

            Bug ID: 1361300
           Summary: Direct io to sharded files fails when on zfs backend
           Product: GlusterFS
           Version: mainline
         Component: sharding
          Keywords: Triaged
          Severity: high
          Assignee: bugs at gluster.org
          Reporter: kdhananj at redhat.com
        QA Contact: bugs at gluster.org
                CC: bugs at gluster.org, dgossage at carouselchecks.com,
                    kdhananj at redhat.com, pkarampu at redhat.com,
                    sabose at redhat.com
        Depends On: 1360785



+++ This bug was initially created as a clone of Bug #1360785 +++

Beginning with 3.7.12 and 3.7.13, direct I/O to sharded files fails when
using ZFS-backed bricks.

How reproducible: Always


Steps to Reproduce:
1. ZFS-backed bricks with default settings, except xattr=sa
2. GlusterFS 3.7.12+ with sharding enabled
3. dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/192.168.71.11\:_glustershard/81e19cd3-ae45-449c-b716-ec3e4ad4c2f0/images/test oflag=direct count=100 bs=1M

Actual results:
dd: error writing ‘/rhev/data-center/mnt/glusterSD/192.168.71.11:_glustershard/81e19cd3-ae45-449c-b716-ec3e4ad4c2f0/images/test’: Operation not permitted

The file 'test' is created with its size defined by the shard block size;
the shard files created under .shard are 0 bytes.


Expected results: 
100+0 records in
100+0 records out
104857600 bytes etc.....


Additional info:
Proxmox users have been able to work around this by changing the VM disk
caching mode from 'none' to writethrough/writeback. Not sure this would help
with oVirt, as the Python script that checks storage with dd and
oflag=direct also fails.
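For what it's worth, that check boils down to an O_DIRECT open-and-write on
the mount. A minimal sketch of such a probe (the path and sizes here are
placeholders, not oVirt's actual script):

#define _GNU_SOURCE             /* O_DIRECT needs this with glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main (int argc, char **argv)
{
        /* Placeholder path; pass a file on the gluster mount instead. */
        const char *path = argc > 1 ? argv[1] : "/mnt/glustershard/probe";
        void *buf = NULL;
        int fd, rc = 1;

        /* O_DIRECT needs block-aligned buffers; 4096 is a safe alignment. */
        if (posix_memalign (&buf, 4096, 4096) != 0)
                return 1;
        memset (buf, 0, 4096);

        fd = open (path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd == -1) {
                perror ("open(O_DIRECT)");
        } else {
                /* On the affected setups the failure surfaces here or at
                 * open(), as EPERM or EINVAL. */
                if (write (fd, buf, 4096) != 4096)
                        perror ("write");
                else
                        rc = 0;
                close (fd);
        }
        free (buf);
        return rc;
}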

Attaching the client and brick logs from the test.

--- Additional comment from David on 2016-07-27 09:36:52 EDT ---

On the oVirt mailing list I was asked to test these settings:

i. Set network.remote-dio to off
        # gluster volume set <VOL> network.remote-dio off

ii. Set performance.strict-o-direct to on
        # gluster volume set <VOL> performance.strict-o-direct on

results:

dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test oflag=direct count=100 bs=1M
dd: error writing ‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’: Invalid argument
dd: closing output file ‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’: Invalid argument


[2016-07-25 18:20:19.393121] E [MSGID: 113039] [posix.c:2939:posix_open]
0-glustershard-posix: open on
/gluster2/brick1/1/.glusterfs/02/f4/02f4783b-2799-46d9-b787-53e4ccd9a052,
flags: 16385 [Invalid argument]
[2016-07-25 18:20:19.393204] E [MSGID: 115070]
[server-rpc-fops.c:1568:server_open_cbk] 0-glustershard-server: 120: OPEN
/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) ==> (Invalid argument) [Invalid
argument]


and from the client log,
/var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.71.10\:_glustershard.log:
[2016-07-25 18:20:19.393275] E [MSGID: 114031]
[client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-0: remote
operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393270] E [MSGID: 114031]
[client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-1: remote
operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393317] E [MSGID: 114031]
[client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-2: remote
operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test
(02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393357] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 117: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393389] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 118: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393611] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 119: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393708] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 120: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393771] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 121: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393840] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 122: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393914] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 123: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.393982] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 124: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052
fd=0x7f5fec0ba08c (Invalid argument)
[2016-07-25 18:20:19.394045] W [fuse-bridge.c:709:fuse_truncate_cbk]
0-glusterfs-fuse: 125: FTRUNCATE() ERR => -1 (Invalid argument)
[2016-07-25 18:20:19.394338] W [fuse-bridge.c:1290:fuse_err_cbk]
0-glusterfs-fuse: 126: FLUSH() ERR => -1 (Invalid argument)
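
As an aside, the "flags: 16385" value in the posix_open error in the brick
log above decodes to O_WRONLY|O_DIRECT on x86-64 Linux (the architecture is
an assumption here), which is consistent with direct I/O being the trigger.
A quick way to confirm the decoding:

#define _GNU_SOURCE             /* O_DIRECT needs this with glibc */
#include <fcntl.h>
#include <stdio.h>

int main (void)
{
        /* Prints 16385 on x86-64 Linux: O_WRONLY (1) | O_DIRECT (16384). */
        printf ("O_WRONLY|O_DIRECT = %d\n", O_WRONLY | O_DIRECT);
        return 0;
}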

--- Additional comment from David on 2016-07-27 10:54:22 EDT ---

I have also heard from others with this issue that the problem exists in
3.8.x as well. I have not tested it myself, as my environment is still on
3.7.x.

--- Additional comment from David on 2016-07-27 11:44:09 EDT ---

These are the full settings I usually apply and run with:


features.shard-block-size: 64MB
features.shard: on
performance.readdir-ahead: on
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
server.allow-insecure: on
cluster.self-heal-window-size: 1024
cluster.background-self-heal-count: 16
performance.strict-write-ordering: off
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off

--- Additional comment from Krutika Dhananjay on 2016-07-28 13:33:25 EDT ---

Hi,

Open() on these affected files seems to be returning ENOENT; however, as per
the find command output you gave on the ovirt-users ML, both the file and
its gfid handle exist on the backend, so the failure cannot really be due to
ENOENT. I looked at the code in posix again, and there is evidence to
suggest that the actual error code (the real reason for open() failing) is
getting masked by the fall-back open() on the path under the .unlink
directory:

30         if (fd->inode->ia_type == IA_IFREG) {                                
 29                 _fd = open (real_path, fd->flags);                          
 28                 if (_fd == -1) {                          
 27                         POSIX_GET_FILE_UNLINK_PATH (priv->base_path,        
 26                                                     fd->inode->gfid,        
 25                                                     unlink_path);           
 24                         _fd = open (unlink_path, fd->flags);                
 23                 }                                                           
 22                 if (_fd == -1) {                                            
 21                         op_errno = errno;                                   
 20                         gf_msg (this->name, GF_LOG_ERROR, op_errno,         
 19                                 P_MSG_READ_FAILED,                          
 18                                 "Failed to get anonymous "                  
 17                                 "real_path: %s _fd = %d", real_path, _fd);  
 16                         GF_FREE (pfd);                                      
 15                         pfd = NULL;                                         
 14                         goto out;                                           
 13                 }                                                           
 12         }                         

In your case, the open on line 29, on
.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d, failed for a reason
other than ENOENT (it can't be ENOENT, because we already saw from the find
output that the file exists). Then line 27 is executed. If the file exists
at its real path, it must be absent from the .unlink directory (the gfid
handle can't be present in both places). So it is the open() on line 24 that
is failing with ENOENT, and not the open on line 29.

I'll be sending a patch to fix this problem.
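
For illustration only (a sketch, not the actual patch): the fall-back could
avoid masking the original error by retrying on the .unlink path only when
the first open() really failed with ENOENT, e.g.:

        if (fd->inode->ia_type == IA_IFREG) {
                _fd = open (real_path, fd->flags);
                /* Only fall back to the .unlink path if the file is really
                 * absent at its real path; otherwise keep the original
                 * errno. */
                if (_fd == -1 && errno == ENOENT) {
                        POSIX_GET_FILE_UNLINK_PATH (priv->base_path,
                                                    fd->inode->gfid,
                                                    unlink_path);
                        _fd = open (unlink_path, fd->flags);
                }
                if (_fd == -1) {
                        op_errno = errno;
                        gf_msg (this->name, GF_LOG_ERROR, op_errno,
                                P_MSG_READ_FAILED,
                                "Failed to get anonymous "
                                "real_path: %s _fd = %d", real_path, _fd);
                        GF_FREE (pfd);
                        pfd = NULL;
                        goto out;
                }
        }

That way the errno reported by gf_msg reflects why the open on the real path
failed, instead of the ENOENT from the .unlink lookup.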

Meanwhile, in order to understand why the open on line 29 failed, could you
attach strace to all of your bricks, run the test again, wait for it to
fail, and then attach both the strace output files and the resulting
glusterfs client and brick logs here?

# strace -ff -p <pid-of-the-brick> -o <path-where-you-want-to-capture-the-output>


Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1360785
[Bug 1360785] Direct io to sharded files fails when on zfs backend