[Gluster-users] gluster 3.3 sporadic 'Permission denied' failure under load in cluster env
harry mangalam
harry.mangalam at uci.edu
Wed Apr 10 18:35:19 UTC 2013
Sending this again, since I'm not even sure that the 1st made it to the list
and it's just happened again, even with the same user (one of the heaviest
users, but I don't think there's anything odd about his usage).
In the last 3 days, we've had 6 such errors, each resulting in the logged error:
E [posix.c:1730:posix_create] 0-gl-posix: setting gfid on [file] failed
A question that someone may be able to answer: has anyone else seen such errors
show up in their brick logs?
ie:
grep -n "posix.c:1730:posix_create" /var/log/glusterfs/bricks/raid[12].log
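To see how the errors cluster in time, the same match can be tallied per day. This is only a minimal sketch: it assumes each brick log line starts with a "[YYYY-MM-DD HH:MM:SS.usec]" timestamp as in the excerpts in this message, and the function name count_gfid_errors_by_day is made up for illustration. The "1730" line number is specific to this build of posix.c and may differ in other versions.

```shell
#!/bin/sh
# Tally "setting gfid ... failed" errors per day across the given brick logs.
# Assumes each log line begins with "[YYYY-MM-DD HH:MM:SS.usec]".
count_gfid_errors_by_day() {
    # -h drops the file-name prefix so each line starts with the timestamp
    grep -h "posix.c:1730:posix_create" "$@" |
        sed -n 's/^\[\([0-9-]*\) .*/\1/p' |   # keep only the date
        sort | uniq -c                        # count occurrences per date
}
```

Usage: count_gfid_errors_by_day /var/log/glusterfs/bricks/raid[12].log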
hjm
=== previously ===
We have a ~2500-core academic cluster with saturating amounts of use.
The main data store is running on a 4-node/8-brick/340TB/QDR IB gluster 3.3
filesystem. All are 8xOpteron/32GB systems with 3ware 9750 SAS controllers.
The servers are all running SL6.2 and are stable, with load holding steady at
about 2.
gluster is config'ed as:
Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.write-behind-window-size: 1024MB
performance.flush-behind: on
performance.cache-size: 268435456
nfs.disable: on
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*
Many of our users run large array jobs under SGE and especially during those
runs where there is LOTS of IO, we will VERY occasionally (20 times since last
June, according to brick logs) see these kinds of errors, resulting in the
failure of that particular element of the array job.
Sometimes these failures are acceptable, but often the next job depends on all
elements of the array job completing correctly. At any rate, from the fs POV
they should all complete.
The rarity of this error, its type, and where it occurs suggest that it might
be a hash collision. According to the gluster bugzilla this doesn't seem to be
a registered bug, so I am asking here whether others have seen it and how it
might be addressed.
=========================================================================
> The error below being reported by Grid Engine says:
>
> user "root" 03/21/2013 15:29:23 [507:26777]: error: can't open output file
> "/gl/bio/krthornt/WTCCC/autosomal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103":
> Permission denied
> 03/21/2013 15:29:23 [400:25458]: wait3
=========================================================================
Looking through all the server logs
(/var/log/glusterfs/etc-glusterfs-glusterd.vol.log) reveals nothing about this
error, but the brick logs yield this set of lines referencing that file at the
correct time:
/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:18.667171] W [posix-handle.c:461:posix_handle_hard] 0-gl-posix: link /raid1/bio/krthornt/WTCCC/autosomal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 -> /raid1/.glusterfs/5a/0e/5a0e87a6-e35d-4368-841e-b45802fecc4e failed (File exists)
/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:18.667249] E [posix.c:1730:posix_create] 0-gl-posix: setting gfid on /raid1/bio/krthornt/WTCCC/autosomal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 failed
/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:19.241602] I [server3_1-fops.c:1538:server_open_cbk] 0-gl-server: 644765: OPEN /bio/krthornt/WTCCC/autosomal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 (5a0e87a6-e35d-4368-841e-b45802fecc4e) ==> -1 (Permission denied)
/var/log/glusterfs/bricks/raid1.log:[2013-03-21 15:43:19.520455] I [server3_1-fops.c:1538:server_open_cbk] 0-gl-server: 644970: OPEN /bio/krthornt/WTCCC/autosomal_analysis_Jan2013/1958BC/COMPUTE_1958BC.o254058.103 (5a0e87a6-e35d-4368-841e-b45802fecc4e) ==> -1 (Permission denied)
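The first warning shows the brick failing to hard-link the new file to its gfid handle under .glusterfs because a handle with that gfid already exists. Judging only from the path in that log line, the handle appears to live at <brick>/.glusterfs/<first two hex chars>/<next two hex chars>/<gfid>. A minimal sketch for building that path so the existing handle can be inspected on the brick server; the function name gfid_handle_path is hypothetical, and the layout is inferred from the log line rather than taken from gluster documentation.

```shell
#!/bin/sh
# Print the .glusterfs handle path for a gfid on a given brick.
# Layout inferred from the log excerpt: <brick>/.glusterfs/5a/0e/5a0e87a6-...
gfid_handle_path() {
    brick=$1
    gfid=$2
    d1=$(printf '%s' "$gfid" | cut -c1-2)   # first two hex characters
    d2=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex characters
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$d1" "$d2" "$gfid"
}
```

For example, ls -li "$(gfid_handle_path /raid1 5a0e87a6-e35d-4368-841e-b45802fecc4e)" shows the inode number and link count of the existing handle, which can then be compared with the file named in the error.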
---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---