[Gluster-users] 'No data available' at clients, brick xattr ops errors on small I/O -- XFS stripe issue or repeat bug?
LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Owen.M.LaGarde at erdc.dren.mil
Tue Nov 10 07:05:17 UTC 2015
This is possibly another instance of the earlier threads (below). This occurs
with 3.6.6-1 and 3.6.2-1.
http://www.gluster.org/pipermail/gluster-users/2014-June/017635.html
http://www.gluster.org/pipermail/gluster-users/2012-March/009942.html
Synopsis:
A standard user builds Git successfully but is then unable to delete a
relatively small number of files in the build tree. They get 'No data
available', and while the bricks contain the associated nodes, which
getfattr seems happy with, at least one entry for each node has mode 1000
(sticky bit only). The logs are also being spammed again with iobref_unref
and iobuf_unref, which may or may not be connected. The brick logs contain
xattr set errors to start, get/modify errors later on, and finally unlink
errors when the deletion attempts arrive.
I'm mainly hoping it's *not* a case of the latter thread listed above
(i.e., use ext4 instead of xfs for the backing storage), because backing
up the healthy side of 80TB before rebuilding the underlying bricks' LUNs
will be ... interesting.
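For triage, the iobref_unref/iobuf_unref spam can be counted separately from the real errors per brick log. The sample lines below are hypothetical stand-ins just to show the counting; the real entries live under /var/log/glusterfs/bricks/*.log on the servers:

```shell
# Hypothetical stand-ins for brick log lines; real logs are under
# /var/log/glusterfs/bricks/*.log on each server.
cat > sample-brick.log <<'EOF'
E [iobuf.c:0:iobref_unref] sample spam line
E [posix.c:0:posix_unlink] unlink failed: No data available
EOF
# spam vs. actual failures, per log:
grep -c -e iobref_unref -e iobuf_unref sample-brick.log
grep -c 'No data available' sample-brick.log
```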
Environment:
RHEL 6.7, kernel 2.6.32-573.7.1.el6.x86_64
Gluster locations/packages/versions:
servers: "service{4..7,10..13}":
glusterfs-server-3.6.6-1.el6.x86_64
glusterfs-api-3.6.6-1.el6.x86_64
glusterfs-debuginfo-3.6.6-1.el6.x86_64
glusterfs-3.6.6-1.el6.x86_64
glusterfs-fuse-3.6.6-1.el6.x86_64
glusterfs-rdma-3.6.6-1.el6.x86_64
glusterfs-libs-3.6.6-1.el6.x86_64
glusterfs-devel-3.6.6-1.el6.x86_64
glusterfs-api-devel-3.6.6-1.el6.x86_64
glusterfs-extra-xlators-3.6.6-1.el6.x86_64
glusterfs-cli-3.6.6-1.el6.x86_64
clients: "service1" aka "phoenix01":
glusterfs-3.6.6-1.el6.x86_64
glusterfs-api-devel-3.6.6-1.el6.x86_64
glusterfs-libs-3.6.6-1.el6.x86_64
glusterfs-devel-3.6.6-1.el6.x86_64
glusterfs-cli-3.6.6-1.el6.x86_64
glusterfs-extra-xlators-3.6.6-1.el6.x86_64
glusterfs-fuse-3.6.6-1.el6.x86_64
glusterfs-rdma-3.6.6-1.el6.x86_64
glusterfs-api-3.6.6-1.el6.x86_64
glusterfs-debuginfo-3.6.6-1.el6.x86_64
volume info:
Volume Name: home
Type: Distribute
Volume ID: f03fcaf0-3889-45ac-a06a-a4d60d5a673d
Status: Started
Number of Bricks: 28
Transport-type: rdma
Bricks:
Brick1: service4-ib1:/mnt/l1_s4_ost0000_0000/brick
Brick2: service4-ib1:/mnt/l1_s4_ost0001_0001/brick
Brick3: service4-ib1:/mnt/l1_s4_ost0002_0002/brick
Brick4: service5-ib1:/mnt/l1_s5_ost0003_0003/brick
Brick5: service5-ib1:/mnt/l1_s5_ost0004_0004/brick
Brick6: service5-ib1:/mnt/l1_s5_ost0005_0005/brick
Brick7: service5-ib1:/mnt/l1_s5_ost0006_0006/brick
Brick8: service6-ib1:/mnt/l1_s6_ost0007_0007/brick
Brick9: service6-ib1:/mnt/l1_s6_ost0008_0008/brick
Brick10: service6-ib1:/mnt/l1_s6_ost0009_0009/brick
Brick11: service7-ib1:/mnt/l1_s7_ost000a_0010/brick
Brick12: service7-ib1:/mnt/l1_s7_ost000b_0011/brick
Brick13: service7-ib1:/mnt/l1_s7_ost000c_0012/brick
Brick14: service7-ib1:/mnt/l1_s7_ost000d_0013/brick
Brick15: service10-ib1:/mnt/l1_s10_ost000e_0014/brick
Brick16: service10-ib1:/mnt/l1_s10_ost000f_0015/brick
Brick17: service10-ib1:/mnt/l1_s10_ost0010_0016/brick
Brick18: service11-ib1:/mnt/l1_s11_ost0011_0017/brick
Brick19: service11-ib1:/mnt/l1_s11_ost0012_0018/brick
Brick20: service11-ib1:/mnt/l1_s11_ost0013_0019/brick
Brick21: service11-ib1:/mnt/l1_s11_ost0014_0020/brick
Brick22: service12-ib1:/mnt/l1_s12_ost0015_0021/brick
Brick23: service12-ib1:/mnt/l1_s12_ost0016_0022/brick
Brick24: service12-ib1:/mnt/l1_s12_ost0017_0023/brick
Brick25: service13-ib1:/mnt/l1_s13_ost0018_0024/brick
Brick26: service13-ib1:/mnt/l1_s13_ost0019_0025/brick
Brick27: service13-ib1:/mnt/l1_s13_ost001a_0026/brick
Brick28: service13-ib1:/mnt/l1_s13_ost001b_0027/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
storage.build-pgfid: on
performance.stat-prefetch: off
volume status:
Status of volume: home
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick service4-ib1:/mnt/l1_s4_ost0000_0000/brick 49156 Y 7513
Brick service4-ib1:/mnt/l1_s4_ost0001_0001/brick 49157 Y 7525
Brick service4-ib1:/mnt/l1_s4_ost0002_0002/brick 49158 Y 7537
Brick service5-ib1:/mnt/l1_s5_ost0003_0003/brick 49163 Y 7449
Brick service5-ib1:/mnt/l1_s5_ost0004_0004/brick 49164 Y 7461
Brick service5-ib1:/mnt/l1_s5_ost0005_0005/brick 49165 Y 7473
Brick service5-ib1:/mnt/l1_s5_ost0006_0006/brick 49166 Y 7485
Brick service6-ib1:/mnt/l1_s6_ost0007_0007/brick 49155 Y 7583
Brick service6-ib1:/mnt/l1_s6_ost0008_0008/brick 49156 Y 7595
Brick service6-ib1:/mnt/l1_s6_ost0009_0009/brick 49157 Y 7607
Brick service7-ib1:/mnt/l1_s7_ost000a_0010/brick 49160 Y 7490
Brick service7-ib1:/mnt/l1_s7_ost000b_0011/brick 49161 Y 7502
Brick service7-ib1:/mnt/l1_s7_ost000c_0012/brick 49162 Y 7514
Brick service7-ib1:/mnt/l1_s7_ost000d_0013/brick 49163 Y 7526
Brick service10-ib1:/mnt/l1_s10_ost000e_0014/brick 49155 Y 8136
Brick service10-ib1:/mnt/l1_s10_ost000f_0015/brick 49156 Y 8148
Brick service10-ib1:/mnt/l1_s10_ost0010_0016/brick 49157 Y 8160
Brick service11-ib1:/mnt/l1_s11_ost0011_0017/brick 49160 Y 7453
Brick service11-ib1:/mnt/l1_s11_ost0012_0018/brick 49161 Y 7465
Brick service11-ib1:/mnt/l1_s11_ost0013_0019/brick 49162 Y 7477
Brick service11-ib1:/mnt/l1_s11_ost0014_0020/brick 49163 Y 7489
Brick service12-ib1:/mnt/l1_s12_ost0015_0021/brick 49155 Y 7457
Brick service12-ib1:/mnt/l1_s12_ost0016_0022/brick 49156 Y 7469
Brick service12-ib1:/mnt/l1_s12_ost0017_0023/brick 49157 Y 7481
Brick service13-ib1:/mnt/l1_s13_ost0018_0024/brick 49156 Y 7536
Brick service13-ib1:/mnt/l1_s13_ost0019_0025/brick 49157 Y 7548
Brick service13-ib1:/mnt/l1_s13_ost001a_0026/brick 49158 Y 7560
Brick service13-ib1:/mnt/l1_s13_ost001b_0027/brick 49159 Y 7572
NFS Server on localhost 2049 Y 7553
NFS Server on service6-ib1 2049 Y 7625
NFS Server on service13-ib1 2049 Y 7589
NFS Server on service11-ib1 2049 Y 7507
NFS Server on service12-ib1 2049 Y 7498
NFS Server on service10-ib1 2049 Y 8179
NFS Server on service5-ib1 2049 Y 7502
NFS Server on service7-ib1 2049 Y 7543
Task Status of Volume home
------------------------------------------------------------------------------
Task : Rebalance
ID : f3ad27ce-7bcf-4fab-92c1-b40af75d4300
Status : completed
Reproduction: As standard user clone the latest git source into ~/build_tests/, then...
test 0: dup source tree, delete original
success, test0.script
test 1: copy dupe to new, cd into new, make configure, cd out,
delete new
success, test1.script
test 2: mkdir $WORKDIR/temp/, copy dupe to new, cd into it, make
configure, ./configure --prefix $WORKDIR/temp, cd out,
delete new, delete $WORKDIR/temp/
success, test2.script
test 3: mkdir $WORKDIR/temp/, copy dupe to new, cd into it, make
configure, ./configure --prefix $WORKDIR/temp/, make all
doc, cd out, delete new, delete $WORKDIR/temp/
failure on attempt to remove the working tree
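For reference, test 3 written out as a script sketch (not run here; $WORKDIR, the 'dupe' copy, and the 'new' tree name are taken from the test descriptions above, and the exact sequence is an assumption from those descriptions):

```shell
# Sketch of the failing test 3; tree names and $WORKDIR usage are assumed
# from the test descriptions above. Written to a file and syntax-checked only.
cat > test3-sketch.sh <<'EOF'
#!/bin/sh
set -e
cd ~/build_tests
mkdir -p "$WORKDIR/temp"
cp -a dupe new                        # copy dupe to new
cd new
make configure
./configure --prefix "$WORKDIR/temp"
make all doc
cd ..
rm -rf new                            # fails here with 'No data available'
rm -rf "$WORKDIR/temp"
EOF
sh -n test3-sketch.sh && echo 'sketch parses'
```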
as root, trying to remove a sample file (file owner gets same result):
[root at phoenix-smc users]# ssh service1 rm /home/olagarde/build_tests/new/git-diff
rm: cannot remove `/home/olagarde/build_tests/new/git-diff': No data available
the file is homed on backing servers 4 and 13 (what happened on 13?):
[root at phoenix-smc users]# pdsh -g glfs ls -l /mnt/*/brick/olagarde/build_tests/new/git-diff
service10: ...
service11: ...
service6: ...
service12: ...
service7: ...
service5: ...
service13: ---------T 5 500 206 0 Nov 9 23:28 /mnt/l1_s13_ost001a_0026/brick/olagarde/build_tests/new/git-diff
service4: -rwxr----- 81 500 206 7960411 Nov 9 23:28 /mnt/l1_s4_ost0000_0000/brick/olagarde/build_tests/new/git-diff
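The service13 entry has the classic shape of a DHT linkfile: zero bytes, mode 1000 (---------T), living on the brick that doesn't hold the data. A way to enumerate such entries, demonstrated here on a throwaway directory since I can't reproduce the brick layout in a sketch; on the servers the find target would be the brick mounts (/mnt/*/brick):

```shell
# DHT linkfiles are zero-byte files whose permission bits are exactly 1000.
# Demonstrated on a temp dir; point find at /mnt/*/brick on the servers.
d=$(mktemp -d)
touch "$d/git-diff" && chmod 1000 "$d/git-diff"
linkfiles=$(find "$d" -type f -perm 1000 -size 0c)
echo "$linkfiles"
```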
fattrs appear to claim happiness for both backing instances:
[root at phoenix-smc users]# pdsh -w service4 -w service13 -f 1 'getfattr -m . -d -e hex /mnt/*/brick/olagarde/build_tests/new/git-diff'
service4: getfattr: Removing leading '/' from absolute path names
service4: # file: mnt/l1_s4_ost0000_0000/brick/olagarde/build_tests/new/git-diff
service4: trusted.gfid=0xa4daceb603b0485eab77df659ea3d34c
service4: trusted.pgfid.8bfecb0a-bae2-48e9-9992-ddce2ff8e4c7=0x00000050
service4:
service13: getfattr: Removing leading '/' from absolute path names
service13: # file: mnt/l1_s13_ost001a_0026/brick/olagarde/build_tests/new/git-diff
service13: trusted.gfid=0xa4daceb603b0485eab77df659ea3d34c
service13: trusted.glusterfs.dht.linkto=0x686f6d652d636c69656e742d3000
service13:
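The linkto value on service13 is just a NUL-terminated subvolume name; decoding the hex (minus the 0x prefix) shows it points at the first DHT subvolume:

```shell
# Decode the trusted.glusterfs.dht.linkto value shown above,
# dropping the trailing NUL byte.
printf '%s' 686f6d652d636c69656e742d3000 | xxd -r -p | tr -d '\0'; echo
```

So the linkfile on service13 refers lookups to home-client-0, i.e. the service4 brick that actually holds the 7.9MB file.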
Profile output (vol profile home info incremental, 60s snaps) is available if that helps.
Logs are also available but I have to review/sanitize them before they leave the site.
Output of 'script' sessions around the above tests is also available, if it helps.
##END