[Gluster-devel] Problem with autofs configuration - sometimes mount does not complete fast enough?

Mon Sep 7 02:42:17 UTC 2009

This seems to happen about 50% of the time:

[root@wcarh035 ~]# ls /gluster/data
ls: cannot open directory /gluster/data: No such file or directory
[root@wcarh035 ~]# ls /gluster/data
00      06.fun  15      23.fun  32      40.fun  47      55.fun  64
00.fun  07      15.fun  24      32.fun  41      47.fun  56      64.fun
01      07.fun  16      24.fun  33      41.fun  50      56.fun  65
01.fun  10      16.fun  25      33.fun  42      50.fun  57      65.fun
02      10.fun  17      25.fun  34      42.fun  51      57.fun  66
02.fun  11      17.fun  26      34.fun  43      51.fun  60      66.fun
03      11.fun  20      26.fun  35      43.fun  52      60.fun  67
03.fun  12      20.fun  27      35.fun  44      52.fun  61      67.fun
04      12.fun  21      27.fun  36      44.fun  53      61.fun  lost+found
04.fun  13      21.fun  30      36.fun  45      53.fun  62
05      13.fun  22      30.fun  37      45.fun  54      62.fun
05.fun  14      22.fun  31      37.fun  46      54.fun  63
06      14.fun  23      31.fun  40      46.fun  55      63.fun

If the mount is not up at the time of accessing the autofs directory, 
then 50% of the time it takes 3 to 5 seconds for the directory listing 
to show properly, and the other 50% of the time it takes the same 3 to 5 
seconds but gives a "No such file or directory" error. This happens 
whether a longer path (/gluster/data/44 for example) or just the top 
level path is used. This happens whether autofs --ghost is used or not. 
It seems like something might time out too soon if glusterfs takes too 
long to start?
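
A rough sketch of how the failure rate could be measured instead of 
estimated by eye. count_failures is a made-up helper name, not part of 
autofs or GlusterFS; it runs a command a given number of times and 
prints how many runs failed.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print "<failures>/<N>".
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard output; only the exit status of CMD matters here.
        if ! "$@" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    echo "$fails/$n"
}
```

One possible use, assuming that unmounting by hand makes autofs mount 
the volume afresh on the next access:

count_failures 20 sh -c 'umount /gluster/data; ls /gluster/data'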

Here are the relevant autofs configurations:

[root@wcarh035 ~]# head -1 /etc/auto.master
/gluster /etc/glusterfs/auto.gluster --timeout=3600

[root@wcarh035 ~]# cat /etc/glusterfs/auto.gluster
data -fstype=glusterfs :/etc/glusterfs/gluster-data.vol

gluster-data.vol describes a 3-node cluster/replicate volume with 
some of the performance/ modules enabled.

Any suggestions?

I don't mind the autofs mount taking a few seconds to complete (although 
if 3 to 5 seconds is unusual, perhaps I can fix that as well). I AM 
concerned that when the autofs mount is used for the first time, or for 
the first time after a period of inactivity, the request might 
spuriously fail. This is bad. Is AutoFS at fault here, or is it GlusterFS?

My current guess is that GlusterFS is saying the mount is complete to 
AutoFS before the actual mount operation takes effect. 50% of the time 
GlusterFS is able to complete the mount before AutoFS lets the user 
continue, and all is well. The other 50% of the time, GlusterFS does not 
quite finish the mount, and AutoFS gives the user a broken directory.

I might try and prove this by adding a sleep 5 to /sbin/mount.glusterfs, 
although I do not consider this a valid solution, as it just reduces the 
effect of the race - it does not eliminate the race.
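
For comparison with a fixed sleep, here is a sketch of polling until the 
mount point actually shows up in the kernel's mount table. 
wait_for_mount is a hypothetical helper, not something shipped with 
GlusterFS or autofs. It reads /proc/mounts rather than touching the 
directory, so the check itself does not access the autofs path. Showing 
up in /proc/mounts may still not mean the volume is ready to serve 
requests, so this narrows the race but is not shown to remove it.

```shell
#!/bin/sh
# wait_for_mount DIR [TRIES] [MOUNTS_FILE]
# Hypothetical helper: poll once per second until DIR is listed as a
# mount point in MOUNTS_FILE (default /proc/mounts), or give up after
# TRIES checks (default 10). Returns 0 if found, 1 otherwise.
wait_for_mount() {
    dir=$1
    tries=${2:-10}
    mounts=${3:-/proc/mounts}
    while [ "$tries" -gt 0 ]; do
        # Field 2 of each /proc/mounts line is the mount point.
        if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
            return 0
        fi
        tries=$((tries - 1))
        # Only sleep if another check is still to come.
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

A call such as "wait_for_mount /gluster/data 10" would return as soon as 
the entry appears, rather than always waiting the full 5 seconds.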

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>