[Gluster-users] [SPAM?] STALE NFS Handle (Transport endpoint is not connected)

Heiko Schröter schroete at iup.physik.uni-bremen.de
Wed Mar 16 13:45:03 UTC 2011


On Wednesday, 16 March 2011, at 14:22:21, you wrote:

Hi Burnash,

Yes, I see your point too, and maybe it is more a philosophical matter.

But if I understand the Gluster philosophy correctly, the information about the files is kept in the EAs (extended attributes) and not in some kind of metadata server.
With a metadata server one could still access the directory info, but not the file or object itself (as Lustre does it), when the brick or OSD is down.
So you would get an error (but not a STALE handle) only when actually accessing the object.
This cannot be achieved in Gluster (if I am not mistaken).
You have a go/no-go situation. Truly binary, alas ;-)
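
For illustration (just a sketch; the exact attribute names depend on the translators in use), those EAs can be inspected directly on a brick:

rd32 ~ # getfattr -d -m . -e hex /mnt/gluster1
(this lists the trusted.* attributes, e.g. the distribute layout, that Gluster keeps on the brick itself instead of on a metadata server)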

The point is that Gluster should not stop working when "one" resource/brick is down.

No matter what state the system is in, i.e. booting, shutting down, etc.
And no, I don't think you are too wrong ...

Regards
Heiko


> Hi Heiko.
> 
> I understand your points. 
> 
> However, if an NFS server connection goes down and the client has a "hard" mount to that server, any activity against that mount point will hang until the connection is restored. If the NFS server crashes and/or is rebooted, the client could see a stale NFS handle the next time it tries to access that mount point.
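> 
> As an illustration (plain kernel NFS, not GlusterFS; server name and paths made up), that behaviour is chosen with the mount options:
> 
> # "hard" (the default): the client retries forever, so applications hang until the server is back
> mount -t nfs -o hard,intr nfsserver:/export /mnt/nfs
> # "soft": the client gives up after timeo/retrans attempts and returns an I/O error to the application
> mount -t nfs -o soft,timeo=30,retrans=3 nfsserver:/export /mnt/nfs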
> 
> I believe that this is the behavior that GlusterFS is emulating.
> 
> If remote storage is mounted but inaccessible, listing the directory should not result in an empty listing, because content is supposed to be available at that mount point.
> 
> Then again ... I could be COMPLETELY wrong. It wouldn't be the first time :-)
> 
> James Burnash, Unix Engineering
> 
> -----Original Message-----
> From: Heiko Schröter [mailto:schroete at iup.physik.uni-bremen.de] 
> Sent: Wednesday, March 16, 2011 9:15 AM
> To: Burnash, James
> Cc: gluster-users at gluster.org
> Subject: Re: [SPAM?] [Gluster-users] STALE NFS Handle (Transport endpoint is not connected)
> 
> On Wednesday, 16 March 2011, at 13:33:27, you wrote:
> 
> Hi Burnash,
> 
> Thanks for the info. Hm, yes, I could do that, but that is not the intention.
> Even with a "distribute" setup the client should never hang when one (or more) servers go down. No matter what.
> 
> Taking the "Replicated-Distribute" scenario into account, when is that going to hang? When both bricks are down (replica 2)?
> Or what will happen if the cloud of the two bricks cannot be reached (network failure)?
> 
> So, to my understanding, a network application should never just hang (or go STALE), but should give a proper error message or whatever is appropriate.
> To me it would be more appropriate if the "resource" (brick) simply appeared as not there, i.e. the directory were empty or similar.
> 
> Correct me if I am wrong, but adding replicas is just a way of saying it is "more unlikely" to happen; it still could ...
> 
> Regards
> Heiko
> 
> 
> 
> 
> > Hi Heiko.
> > 
> > Since you have your bricks set up as Distribute, both bricks participate in the file system - if you look at the underlying directories that make up each brick, you will find some files on each server that appear to be under the same directory when mounted by a client.
> > 
> > You must have both bricks up to present a valid filesystem.
> > 
> > To get the scenario you are describing, you would have to delete and recreate your Gluster volume as Replicated-Distribute, using the arguments "replica 2" in your gluster volume create stanza.
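> > 
> > Roughly like this (a sketch only, reusing your brick paths; please check the 3.1 documentation for the exact syntax, and note that reusing non-empty bricks may need cleanup first):
> > 
> > gluster volume stop test-volume
> > gluster volume delete test-volume
> > gluster volume create test-volume replica 2 transport tcp rd32:/mnt/gluster1 rd33:/mnt/gluster1
> > gluster volume start test-volume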
> > 
> > James Burnash, Unix Engineering
> > 
> > -----Original Message-----
> > From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Heiko Schröter
> > Sent: Wednesday, March 16, 2011 8:26 AM
> > To: gluster-users at gluster.org
> > Subject: [SPAM?] [Gluster-users] STALE NFS Handle (Transport endpoint is not connected)
> > Importance: Low
> > 
> > Hello,
> > 
> > the following produces a stale NFS handle (Transport endpoint is not connected).
> > Scenario: 2 bricks, 1 client
> > Version: 3.1.2
> > 
> > 1) Bring gluster system up
> > rd32 ~ # gluster volume info
> > Volume Name: test-volume
> > Type: Distribute
> > Status: Started
> > Number of Bricks: 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: rd32:/mnt/gluster1
> > Brick2: rd33:/mnt/gluster1
> > 
> > 1a) Mount /mnt/gluster on the client.
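> > (via the native FUSE client; the exact command is an assumption, something along the lines of: client# mount -t glusterfs rd32:/test-volume /mnt/gluster)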
> > 
> > 2) Bring bricks down
> > 
> > 3) Try to access /mnt/gluster on client:
> > client# ls -la /mnt/gluster
> > ls: cannot access /mnt/gluster: Transport endpoint is not connected
> > 
> > 4) Start glusterd on Brick1:
> > client# ls -la /mnt/gluster
> > ls: cannot access /mnt/gluster: Stale NFS file handle
> > 
> > 5) Start glusterd on Brick2:
> > client# ls -la /mnt/gluster
> > total 102408
> > drwxr-xr-x 2 root root        58 Mar 14 15:48 .
> > drwxr-xr-x 7 root root      4096 Mar  2 10:21 ..
> > -rw-r--r-- 1 root root         1 Mar 14 14:36 .version
> > -rw-r--r-- 1 root root 104857600 Mar 14 15:48 willi.dat
> > 
> > The stale NFS handle only vanishes when ALL bricks are up and running, when started as above.
> > This is reproducible.
> > 
> > Expected behaviour:
> > The client should never ever see a stale NFS handle, i.e.:
> > If no brick is running -> /mnt/gluster should just be empty.
> > If at least one brick is running -> /mnt/gluster should show the files on that single brick.
> > 
> > 
> > Thanks and Regards
> > Heiko


