[Gluster-devel] Single-process (server and client) AFR problems
gordan at bobich.net
Tue May 20 10:21:34 UTC 2008
Krishna,
Is that to say that it's a bug, or am I just using it wrong? Or do I just
have a knack for finding dodgy edge cases?
Is there a workaround?
I have just reconfigured my servers to do 2-process client-side AFR (i.e.
the traditional approach), and that works fine. But having single-process
server-side AFR would be more efficient, and simplify my config somewhat.
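
For reference, the client-side config I'm using now is roughly the following
(a minimal sketch; the IPs and volume names match the server configs quoted
below, and I've left out the posix-locks layer):

volume home1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.0.1
option remote-subvolume home1
end-volume

volume home2
type protocol/client
option transport-type tcp/client
option remote-host 192.168.3.1
option remote-subvolume home2
end-volume

volume home
type cluster/afr
subvolumes home1 home2
end-volume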
Thanks.
Gordan
On Tue, 20 May 2008, Krishna Srinivas wrote:
> In this setup, home1 sends its CHILD_UP event to the "server" xlator
> instead of the "home" afr xlator (and home2 is not up), so afr thinks
> none of its subvolumes are up. We will fix it to handle this situation.
>
> Thanks
> Krishna
>
> On Tue, May 20, 2008 at 2:00 PM, Gordan Bobic <gordan at bobich.net> wrote:
> This is with release 1.3.9.
>
> Not much else that seems relevant turns up in the logs with -L DEBUG: DNS
> chatter, plus mentions that the 2nd server isn't talking (glusterfs is
> switched off on it, because running both causes the lock-up).
>
> This gets logged when I try to cat ~/.bashrc:
>
> 2008-05-20 09:14:08 D [fuse-bridge.c:375:fuse_entry_cbk] glusterfs-fuse: 39: (34) /gordan/.bashrc => 60166157
> 2008-05-20 09:14:08 D [inode.c:577:__create_inode] fuse/inode: create inode(60166157)
> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
> 2008-05-20 09:14:08 D [fuse-bridge.c:1517:fuse_open] glusterfs-fuse: 40: OPEN /gordan/.bashrc
> 2008-05-20 09:14:08 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
> 2008-05-20 09:14:08 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 40: (12) /gordan/.bashrc => -1 (5)
>
> On the command line, I get back "Input/output error". I can ls the files,
> but cannot actually read them.
>
> This is with only the first server up. Same happens when I mount home.vol
> via fstab or via something like:
> glusterfs -f /etc/glusterfs/home.vol /home
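>
> The fstab entry is along these lines (paths as in my setup):
> /etc/glusterfs/home.vol /home glusterfs defaults 0 0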
>
> I have also reduced the config (single process, intended for servers) to a
> bare minimum (removed posix lock layer), to get to the bottom of it, but I
> cannot get any reads to work:
>
> volume home1
> type storage/posix
> option directory /gluster/home
> end-volume
>
> volume home2
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.3.1
> option remote-subvolume home2
> end-volume
>
> volume home
> type cluster/afr
> option read-subvolume home1
> subvolumes home1 home2
> end-volume
>
> volume server
> type protocol/server
> option transport-type tcp/server
> subvolumes home home1
> option auth.ip.home.allow 127.0.0.1,192.168.*
> option auth.ip.home1.allow 127.0.0.1,192.168.*
> end-volume
>
> On a related note, if single-process mode is used, how does GlusterFS know
> which volume to mount? For example, if it is trying to mount the
> protocol/client volume (home2), then obviously that won't work, because the
> 2nd server is not up. If it is mounting the protocol/server volume, then is
> it trying to mount home or home1? Or does it mount the outermost volume
> that _isn't_ a protocol/[client|server] (which is "home" in this case)?
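>
> (I'm guessing the top-level volume can be pinned explicitly with the
> --volume-name option, e.g.
> glusterfs --volume-name home -f /etc/glusterfs/home.vol /home
> but I haven't verified that.)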
>
> Thanks.
>
> Gordan
>
> On Tue, 20 May 2008 13:18:07 +0530, Krishna Srinivas
> <krishna at zresearch.com> wrote:
> > Gordan,
> >
> > Which patch set is this? Can you run the glusterfs server side with
> > "-L DEBUG" and send the logs?
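> >
> > Something like the following should do; the log file path is just an
> > example:
> > glusterfs -f /etc/glusterfs/home.vol -L DEBUG -l /tmp/glusterfs-home.log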
> >
> > Thanks
> > Krishna
> >
> > On Tue, May 20, 2008 at 1:56 AM, Gordan Bobic <gordan at bobich.net> wrote:
> >> Hi,
> >>
> >> I'm having rather major problems getting single-process AFR to work
> >> between two servers. When both servers come up, the GlusterFS on both
> >> locks up pretty solid. The processes that try to access the FS
> >> (including ls) seem to get nowhere for a few minutes, and then
> >> complete. But something gets stuck, and glusterfs cannot be killed
> >> even with -9!
> >>
> >> Another worrying thing is that the fuse kernel module ends up holding
> >> a reference count even after the glusterfs process gets killed
> >> (sometimes killing the remote process that isn't locked up on its host
> >> can break the locked-up operations and allow the local glusterfs
> >> process to be killed). So fuse then cannot be unloaded.
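> >>
> >> This can be checked with, for example:
> >> lsmod | grep fuse   # the use count stays non-zero after the kill
> >> rmmod fuse          # refuses, since the module is still in use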
> >>
> >> This error seems to come up in the logs all the time:
> >> 2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the
> >> children are up for locking, returning EIO
> >> 2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse:
> >> 63: (12) /test => -1 (5)
> >>
> >> This implies some kind of locking issue, but the same error and
> >> conditions also arise when the posix locking module is removed.
> >>
> >> The configs for the two servers are attached. They are almost
> >> identical to the examples on the glusterfs wiki:
> >>
> >> http://www.gluster.org/docs/index.php/AFR_single_process
> >>
> >> What am I doing wrong? Have I run into another bug?
> >>
> >> Gordan
> >>
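> >> # Config for server 1 (192.168.0.1):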
> >> volume home1-store
> >> type storage/posix
> >> option directory /gluster/home
> >> end-volume
> >>
> >> volume home1
> >> type features/posix-locks
> >> subvolumes home1-store
> >> end-volume
> >>
> >> volume home2
> >> type protocol/client
> >> option transport-type tcp/client
> >> option remote-host 192.168.3.1
> >> option remote-subvolume home2
> >> end-volume
> >>
> >> volume home
> >> type cluster/afr
> >> option read-subvolume home1
> >> subvolumes home1 home2
> >> end-volume
> >>
> >> volume server
> >> type protocol/server
> >> option transport-type tcp/server
> >> subvolumes home home1
> >> option auth.ip.home.allow 127.0.0.1
> >> option auth.ip.home1.allow 192.168.*
> >> end-volume
> >>
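> >> # Config for server 2 (192.168.3.1):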
> >> volume home2-store
> >> type storage/posix
> >> option directory /gluster/home
> >> end-volume
> >>
> >> volume home2
> >> type features/posix-locks
> >> subvolumes home2-store
> >> end-volume
> >>
> >> volume home1
> >> type protocol/client
> >> option transport-type tcp/client
> >> option remote-host 192.168.0.1
> >> option remote-subvolume home1
> >> end-volume
> >>
> >> volume home
> >> type cluster/afr
> >> option read-subvolume home2
> >> subvolumes home1 home2
> >> end-volume
> >>
> >> volume server
> >> type protocol/server
> >> option transport-type tcp/server
> >> subvolumes home home2
> >> option auth.ip.home.allow 127.0.0.1
> >> option auth.ip.home2.allow 192.168.*
> >> end-volume
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at nongnu.org
> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>
> >>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>
>
>