[Gluster-users] Cant read files >64mb

Christian Reiss email at christian-reiss.de
Tue Feb 4 12:37:55 UTC 2020


Hey folks,

I am running a 3-way replica (no arbiter) Gluster setup under oVirt, with 
the bundled Gluster 6.7. After a crash we are unable to start any VMs due 
to "Storage IO error".
After much, much backtracking and debugging we are closing in on the 
symptoms, albeit not the root cause.

Conditions:

  - gluster volume is healthy,
  - No outstanding heal or split-brain files,
  - 3 way without arbiter nodes (3 copies),
  - I already ran several "heal full" commands (see the commands below).
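
  For reference, these are (roughly) the commands used for the checks 
  above (standard gluster CLI, output omitted):

     # gluster volume heal ssd_storage info
     # gluster volume heal ssd_storage info split-brain
     # gluster volume heal ssd_storage full

  Both "info" variants came back empty.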

  Gluster Volume Info
     Volume Name: ssd_storage
     Type: Replicate
     Volume ID: d84ec99a-5db9-49c6-aab4-c7481a1dc57b
     Status: Started
     Snapshot Count: 0
     Number of Bricks: 1 x 3 = 3
     Transport-type: tcp
     Bricks:
     Brick1: node01.company.com:/gluster_bricks/ssd_storage/ssd_storage
     Brick2: node02.company.com:/gluster_bricks/ssd_storage/ssd_storage
     Brick3: node03.company.com:/gluster_bricks/ssd_storage/ssd_storage
     Options Reconfigured:
     cluster.self-heal-daemon: enable
     cluster.granular-entry-heal: enable
     storage.owner-gid: 36
     storage.owner-uid: 36
     network.ping-timeout: 30
     server.event-threads: 4
     client.event-threads: 4
     cluster.choose-local: off
     user.cifs: off
     features.shard: on
     cluster.shd-wait-qlength: 10000
     cluster.shd-max-threads: 8
     cluster.locking-scheme: granular
     cluster.data-self-heal-algorithm: full
     cluster.server-quorum-type: server
     cluster.quorum-type: auto
     cluster.eager-lock: enable
     network.remote-dio: off
     performance.low-prio-threads: 32
     performance.io-cache: off
     performance.read-ahead: off
     performance.quick-read: off
     performance.strict-o-direct: on
     transport.address-family: inet
     nfs.disable: on
     performance.client-io-threads: on

Gluster Volume Status
     Status of volume: ssd_storage
     Gluster process                                                      TCP Port  RDMA Port  Online  Pid
     ------------------------------------------------------------------------------
     Brick node01.company.com:/gluster_bricks/ssd_storage/ssd_storage     49152     0          Y       8218
     Brick node02.company.com:/gluster_bricks/ssd_storage/ssd_storage     49152     0          Y       23595
     Brick node03.company.com:/gluster_bricks/ssd_storage/ssd_storage     49152     0          Y       8080
     Self-heal Daemon on localhost                                        N/A       N/A        Y       66028
     Self-heal Daemon on 10.100.200.12                                    N/A       N/A        Y       52087
     Self-heal Daemon on node03.company.com                               N/A       N/A        Y       8372

     Task Status of Volume ssd_storage
     ------------------------------------------------------------------------------
     There are no active volume tasks


The data on the mounted path where the oVirt VM files reside is 100% okay: 
we copied all the images from there onto standalone hosts and the images 
run just fine, so there is no obvious data corruption. However, launching 
any VM out of oVirt fails with "Storage IO error".

This is where everything gets funny.
oVirt uses a vdsm user to access all the files.

Findings:
  - root can read, edit and write all files inside the oVirt-mounted 
gluster path.
  - the vdsm user can write new files of any size without issues; 
changes get replicated instantly to the other nodes.
  - the vdsm user can append to existing files of any size without 
issues; changes get replicated instantly to the other nodes.
  - the vdsm user can read files that are smaller than 64 MB.
  - the vdsm user gets "Permission denied" if the file to be read is 
65 MB or bigger.
  - the vdsm user gets "Permission denied" whenever a read crosses a 
gluster shard boundary (see the note below).
  - if root does a "dd if=file_larger_than_64mb of=/dev/null" on any 
large file, that file can afterwards be read by the vdsm user on that 
single node. This effect does not carry over to the other nodes.
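
  Note: the 64 MB boundary matches the shard size. features.shard is on 
  and features.shard-block-size does not appear in the reconfigured 
  options, so it is presumably still at its 64 MB default; something like 
  this should confirm it:

     # gluster volume get ssd_storage features.shard-block-size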


  Example:

  ID of the vdsm user (after sudo'ing to it) on all three nodes:

 
[vdsm at node01:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] 
$ id
uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock) 
context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
 
[vdsm at node02:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] 
$ id
uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock) 
context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
 
[vdsm at node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] 
$ id
uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock) 
context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Create a file >64 MB on one node:

[vdsm at node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] 
$ base64 /dev/urandom | head -c 200000000 > file.txt
[vdsm at node03:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] 
$ ls -lha
total 191M
drwxr-xr-x. 2 vdsm kvm   30 Feb  4 13:10 .
drwxr-xr-x. 6 vdsm kvm   80 Jan  1  1970 ..
-rw-r--r--. 1 vdsm kvm 191M Feb  4 13:10 file.txt

File is instantly available on another node:

[vdsm at node01:/rhev/data-center/mnt/glusterSD/node01.company.com:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/test] 
$ ls -lha
total 191M
drwxr-xr-x. 2 vdsm kvm   30 Feb  4 13:10 .
drwxr-xr-x. 6 vdsm kvm   80 Jan  1  1970 ..
-rw-r--r--. 1 vdsm kvm 191M Feb  4 13:10 file.txt

Accessing the whole file fails:

[vdsm at node01:] dd if=file.txt of=/dev/null
dd: error reading ‘file.txt’: Permission denied
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 0.0651919 s, 1.0 GB/s

Reading the first 64 MB works; reading 65 MB (crossing the shard 
boundary) does not:

[vdsm at node01:] $ dd if=file.txt bs=1M count=64  of=/dev/null
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.00801663 s, 8.4 GB/s

[vdsm at node01:] $ dd if=file.txt bs=1M count=65  of=/dev/null
dd: error reading ‘file.txt’: Permission denied
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.00908712 s, 7.4 GB/s

Appending to the file works (the write does not cross a shard boundary):

[vdsm at node01:] $ date >> file.txt
[vdsm at node01:] $

[vdsm at node02:] $ tail -n2 file.txt
E16ACZaLqLhx2oUUUov5JHvQcVFohn6HH+eog6XZCiTaG0Tue  4 Feb 13:18:37 CET 2020

Reading the beginning and the end of the file works; a read that crosses 
the shard boundary does not:

[vdsm at node02:] $ head file.txt
jrZOxGaGvwfpGSwn1BKWWmFC4556KNzXsD2BCwY78tnV1mRY54IxnE+hbnszRyWgVuXhBpVRoJTp
xvVwktZwSytMyvJjsSt7pQbXbHSY66tRe/rvrw5dHr3RNJn9HjqtlKQ9mHVX4ch1HkU5posSmDbg
vwzxBTXWfxLDMmIghyTgBTSFiI9Xg8W6htxDpxrbO+10EzlnaN1Am5tAlTkfrorNLyihpiQhUPGG
ag6tJUcFj3IySGRTAxnStFRQoBXN5dlyx1Sqc4s/Tpl7gkgR8+I7UcdRKISjgcGcpW+zrXKqFF/H
Dwv6ql+2ysPRrtlbt2V8Zf697VsNX5DTgZS9BKmWlAeqejNYaqG5Rsuhn7szbCfkkmsjedk+Rdcv
A3SHMBeHXdtfBHS0AlbEwKgeml08NmCUcwnifhrQywCnu8NN9+RQ3cUxGvIuLLSzi3915wC6hbxr
8xArckQfSUfKA/hrHvoiiCGZU9D23xj3XXtsjdbIIDXATDnCPrKANdvGN5LTKal8bT0jXORfAz1z
MniqVUgvWVNcviPgQ9BfT5qpGo8g7LaoBMGamAGVX6Ezrs04rk8jQ1yz1bB/8URfTRLZdyYkMh0u
MB4xMylnyavgusoi7Duf5RuYJvNaL0g8Lx/cfGpGsGwdD2Lj/qRC45ammn6wCxDVfiJV6Z/TzJcY
PBvzWK5xT++PQgMV8EwtXwA1kFqaGrcuiDHejMQ8O82Edjr+eBCBe0B7bRddoMD6oOlhNm1YsSNt

[vdsm at node02:] $ tail file.txt
9JX8OWCJwbyvEPDyyI30H1/jPZfDo1sS11dZ2JjiO7qhB45VaU8+irG45D0GGJhFf8wE8TD9EGWG
8346QHLX9ZSFsbjpuh71hr5Ju1UduVdvIDwwP8WDBtRUbMAVvsyGR33rkpijepmUjmYl/jeZ7rsC
VyUVlmG5PxrI7KKxz5dSkzApqVHKKgsf93JMDAdPwvXTq4hhZdUJ581w9FC/f9k2wWldEGkAcyB0
cCKp+VJl2vx989KUoqAJzsrvYdK0X7itruqYdpC29JXode+7NixUflhKvPdKmitBYyCEgCcyxUyn
eyMOdaan2x8d8MztLLoWLpp+gLzl2Hev7y3OXq6I9SVN2t+hcVIz8Llmumy0cD+VC4u2/UZszYqS
nDaSSMs35agGUUgIpHjPxCRf/yqnfrJJMTGAcxSEqHtpEdsjEmkf4QkyEgEZ13f4oi7P/DFCIIvV
JBsHzOLDoetnFzAA2/RqbDflPrVWcAR7tXVqGLACCj2s19uUFSNb8nBWmEk8fFz31iJhuL43v0WE
78/THl49T0hhzHQp6kdIiw5p1zPUIFGBZ0BS4mBCHxu+tMlPZe1zWJMJZdPnvDNtHZ4gQ6LFgU4w
E16ACZaLqLhx2oUUUov5JHvQcVFohn6HH+eog6XZCiTaG0Tue  4 Feb 13:18:37 CET 2020

[vdsm at node02:] $ dd if=file.txt of=/dev/null
dd: error reading ‘file.txt’: Permission denied
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 0.106097 s, 633 MB/s

If root does a dd first, all is peachy:

[root at node02] # dd if=file.txt of=/dev/null
390625+1 records in
390625+1 records out
200000058 bytes (200 MB) copied, 0.345906 s, 578 MB/s

[vdsm at node02] $ dd if=file.txt of=/dev/null
390625+1 records in
390625+1 records out
200000058 bytes (200 MB) copied, 0.188451 s, 1.1 GB/s

Error in the gluster.log:

[2020-02-04 12:27:57.915356] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-1: 
remote operation failed. Path: 
/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2 
(00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-04 12:27:57.915404] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-0: 
remote operation failed. Path: 
/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2 
(00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-04 12:27:57.915472] W [MSGID: 114031] 
[client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-ssd_storage-client-2: 
remote operation failed. Path: 
/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2 
(00000000-0000-0000-0000-000000000000) [Permission denied]
[2020-02-04 12:27:57.915490] E [MSGID: 133010] 
[shard.c:2327:shard_common_lookup_shards_cbk] 0-ssd_storage-shard: 
Lookup on shard 2 failed. Base file gfid = 
57200f4f-537d-4e56-9258-38fe6ac64c4e [Permission denied]
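
For reference, the failing shard from the log can also be inspected 
directly on each brick (the .shard directory is the standard location 
for shard files inside the brick); something along these lines shows its 
ownership, permissions and xattrs:

   # ls -l /gluster_bricks/ssd_storage/ssd_storage/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2
   # getfattr -d -m . -e hex /gluster_bricks/ssd_storage/ssd_storage/.shard/57200f4f-537d-4e56-9258-38fe6ac64c4e.2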


What we tried:

  - restarting single hosts,
  - restarting the entire cluster,
  - stat'ing everything, e.g. find /rhev ... -exec stat {} \;
  - dd-reading every file under the mount dir (see below)...
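
  The dd read over everything was roughly (mount path as in the prompts 
  above, run as root):

     # find /rhev/data-center/mnt/glusterSD -type f -exec dd if={} of=/dev/null bs=1M \;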


We are out of ideas, and it seems we are no experts on either Gluster or 
oVirt.
This is supposed to be a production HA environment, so any help would be 
appreciated.
I hope I have included all the relevant data and logs.


-- 
with kind regards,
mit freundlichen Gruessen,

Christian Reiss


