[Gluster-devel] Single-process (server and client) AFR problems

Krishna Srinivas krishna at zresearch.com
Tue May 20 09:48:08 UTC 2008


In this setup, home1 sends its CHILD_UP event to the "server" xlator instead
of the "home" afr xlator (and home2 is not up). This makes afr think that none
of its subvols are up. We will fix it to handle this situation.
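
To make the failure mode above concrete, here is a toy model in Python. It is
only a sketch and not GlusterFS code: the Xlator class, its methods, and the
rule that a translator reports CHILD_UP to a single parent (the one recorded
last) are assumptions made for illustration. The volume names come from the
volfile quoted below.

```python
import errno

# Toy model only: hypothetical classes, not GlusterFS code.
# Assumption: a translator reports CHILD_UP to a single parent, and the
# parent recorded last wins when two translators list the same subvolume.


class Xlator:
    def __init__(self, name, subvolumes=()):
        self.name = name
        self.subvolumes = list(subvolumes)
        self.parent = None
        self.children_up = set()
        for child in self.subvolumes:
            child.parent = self  # a later parent overwrites an earlier one

    def child_up(self):
        """Report CHILD_UP to whichever parent is currently recorded."""
        if self.parent is not None:
            self.parent.children_up.add(self.name)

    def open(self, path):
        """Mimic the afr check seen in the log: no children up means EIO."""
        if not self.children_up:
            raise OSError(errno.EIO, "none of the children are up", path)
        return "opened " + path


# Same graph shape as the reduced volfile: home1 is listed under both
# "home" (afr) and "server".
home1 = Xlator("home1")
home2 = Xlator("home2")  # remote side is down, so it never reports CHILD_UP
home = Xlator("home", [home1, home2])
server = Xlator("server", [home, home1])

home1.child_up()

print(sorted(server.children_up))  # ['home1']
print(sorted(home.children_up))    # []

try:
    home.open("/gordan/.bashrc")
except OSError as exc:
    print(exc.errno == errno.EIO)  # True
```

In this model "home" never hears that home1 is up, so every open fails with
EIO even though the local store is healthy, which matches the symptom in the
quoted logs.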

Thanks
Krishna

On Tue, May 20, 2008 at 2:00 PM, Gordan Bobic <gordan at bobich.net> wrote:

> This is with release 1.3.9.
>
> Not much more that seems relevant turns up in the logs with -L DEBUG: DNS
> chatter, and mentions that the 2nd server isn't talking (glusterfs is switched
> off on it because that causes the lock-up).
>
> This gets logged when I try to cat ~/.bashrc:
>
> 2008-05-20 09:14:08 D [fuse-bridge.c:375:fuse_entry_cbk] glusterfs-fuse: 39: (34) /gordan/.bashrc => 60166157
> 2008-05-20 09:14:08 D [inode.c:577:__create_inode] fuse/inode: create inode(60166157)
> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
> 2008-05-20 09:14:08 D [fuse-bridge.c:1517:fuse_open] glusterfs-fuse: 40: OPEN /gordan/.bashrc
> 2008-05-20 09:14:08 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
> 2008-05-20 09:14:08 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 40: (12) /gordan/.bashrc => -1 (5)
>
> On the command line, I get back "Input/output error". I can ls the files,
> but cannot actually read them.
>
> This is with only the first server up. Same happens when I mount home.vol
> via fstab or via something like:
> glusterfs -f /etc/glusterfs/home.vol /home
>
> I have also reduced the config (single process, intended for servers) to a
> bare minimum (removed posix lock layer), to get to the bottom of it, but I
> cannot get any reads to work:
>
> volume home1
>        type storage/posix
>        option directory /gluster/home
> end-volume
>
> volume home2
>        type protocol/client
>        option transport-type tcp/client
>        option remote-host 192.168.3.1
>        option remote-subvolume home2
> end-volume
>
> volume home
>        type cluster/afr
>        option read-subvolume home1
>        subvolumes home1 home2
> end-volume
>
> volume server
>        type protocol/server
>        option transport-type tcp/server
>        subvolumes home home1
>        option auth.ip.home.allow 127.0.0.1,192.168.*
>        option auth.ip.home1.allow 127.0.0.1,192.168.*
> end-volume
>
> On a related note, if single-process is used, how does GlusterFS know which
> volume to mount? For example, if it is trying to mount the protocol/client
> volume (home2), then obviously that won't work because the 2nd server is
> not up. If it is mounting the protocol/server volume, then is it trying to
> mount home or home1? Or does it mount the outermost volume that _isn't_ a
> protocol/[client|server] (which is "home" in this case)?
>
> Thanks.
>
> Gordan
>
> On Tue, 20 May 2008 13:18:07 +0530, Krishna Srinivas
> <krishna at zresearch.com> wrote:
> > Gordan,
> >
> > Which patch set is this? Can you run glusterfs server side with "-L DEBUG"
> > and send the logs?
> >
> > Thanks
> > Krishna
> >
> > On Tue, May 20, 2008 at 1:56 AM, Gordan Bobic <gordan at bobich.net> wrote:
> >> Hi,
> >>
> >> I'm having rather major problems getting single-process AFR to work between
> >> two servers. When both servers come up, the GlusterFS on both locks up
> >> pretty solid. The processes that try to access the FS (including ls) seem to
> >> get nowhere for a few minutes, and then complete. But something gets stuck,
> >> and glusterfs cannot be killed even with -9!
> >>
> >> Another worrying thing is that the fuse kernel module ends up having a reference
> >> count even after the glusterfs process gets killed (sometimes killing the remote
> >> process that isn't locked up on its host can break the locked-up operations
> >> and allow the local glusterfs process to be killed). So fuse then cannot
> >> be unloaded.
> >>
> >> This error seems to come up in the logs all the time:
> >> 2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
> >> 2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 63: (12) /test => -1 (5)
> >>
> >> This implies some kind of a locking issue, but the same error and conditions
> >> also arise when the posix locking module is removed.
> >>
> >> The configs for the two servers are attached. They are almost identical to
> >> the examples on the glusterfs wiki:
> >>
> >> http://www.gluster.org/docs/index.php/AFR_single_process
> >>
> >> What am I doing wrong? Have I run into another bug?
> >>
> >> Gordan
> >>
> >> volume home1-store
> >>        type storage/posix
> >>        option directory /gluster/home
> >> end-volume
> >>
> >> volume home1
> >>        type features/posix-locks
> >>        subvolumes home1-store
> >> end-volume
> >>
> >> volume home2
> >>        type protocol/client
> >>        option transport-type tcp/client
> >>        option remote-host 192.168.3.1
> >>        option remote-subvolume home2
> >> end-volume
> >>
> >> volume home
> >>        type cluster/afr
> >>        option read-subvolume home1
> >>        subvolumes home1 home2
> >> end-volume
> >>
> >> volume server
> >>        type protocol/server
> >>        option transport-type tcp/server
> >>        subvolumes home home1
> >>        option auth.ip.home.allow 127.0.0.1
> >>        option auth.ip.home1.allow 192.168.*
> >> end-volume
> >>
> >> volume home2-store
> >>        type storage/posix
> >>        option directory /gluster/home
> >> end-volume
> >>
> >> volume home2
> >>        type features/posix-locks
> >>        subvolumes home2-store
> >> end-volume
> >>
> >> volume home1
> >>        type protocol/client
> >>        option transport-type tcp/client
> >>        option remote-host 192.168.0.1
> >>        option remote-subvolume home1
> >> end-volume
> >>
> >> volume home
> >>        type cluster/afr
> >>        option read-subvolume home2
> >>        subvolumes home1 home2
> >> end-volume
> >>
> >> volume server
> >>        type protocol/server
> >>        option transport-type tcp/server
> >>        subvolumes home home2
> >>        option auth.ip.home.allow 127.0.0.1
> >>        option auth.ip.home2.allow 192.168.*
> >> end-volume
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> Gluster-devel at nongnu.org
> >> http://lists.nongnu.org/mailman/listinfo/gluster-devel