[Gluster-devel] Single-process (server and client) AFR problems

Krishna Srinivas krishna at zresearch.com
Tue May 20 10:59:58 UTC 2008


Gordon,

It is a bug in the way xlators notify UP/DOWN status to other xlators.
Thanks for reporting it to us :)

The workaround is to run the "protocol/server" xlator in a separate process.
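For example (untested sketch, file names below are just illustrative): keep the
existing home.vol for the server process, started with something like
"glusterfsd -f /etc/glusterfs/home.vol", and do the mount from a separate
client process whose volfile only connects back to the exported "home" volume:

# /etc/glusterfs/home-client.vol -- mount-side volfile (illustrative name)
volume home
        type protocol/client
        option transport-type tcp/client
        option remote-host 127.0.0.1
        option remote-subvolume home
end-volume

Then mount with "glusterfs -f /etc/glusterfs/home-client.vol /home". AFR still
runs on the server side; only the fuse mount moves into its own process.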

Krishna

On Tue, May 20, 2008 at 3:51 PM, <gordan at bobich.net> wrote:

> Krishna,
>
> Is that to say that it's a bug, or am I just using it wrong? Or do I just
> have a knack for finding dodgy edge cases?
>
> Is there a workaround?
>
> I have just reconfigured my servers to do 2-process client-side AFR (i.e.
> the traditional approach), and that works fine. But single-process
> server-side AFR would be more efficient and would simplify my config somewhat.
>
> Thanks.
>
> Gordan
>
>
> On Tue, 20 May 2008, Krishna Srinivas wrote:
>
>> In this setup, home1 is sending its CHILD_UP event to the "server" xlator
>> instead of the "home" afr xlator (and home2 is not up), which makes afr
>> think none of its subvolumes are up. We will fix it to handle this situation.
>>
>> Thanks
>> Krishna
>>
>> On Tue, May 20, 2008 at 2:00 PM, Gordan Bobic <gordan at bobich.net> wrote:
>> This is with release 1.3.9.
>>
>> Not much more that seems relevant turns up in the logs with -L DEBUG
>> (DNS chatter, plus mentions that the 2nd server isn't talking; glusterfs
>> is switched off on it because that causes the lock-up).
>>
>> This gets logged when I try to cat ~/.bashrc:
>>
>> 2008-05-20 09:14:08 D [fuse-bridge.c:375:fuse_entry_cbk] glusterfs-fuse: 39: (34) /gordan/.bashrc => 60166157
>> 2008-05-20 09:14:08 D [inode.c:577:__create_inode] fuse/inode: create inode(60166157)
>> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
>> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
>> 2008-05-20 09:14:08 D [fuse-bridge.c:1517:fuse_open] glusterfs-fuse: 40: OPEN /gordan/.bashrc
>> 2008-05-20 09:14:08 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
>> 2008-05-20 09:14:08 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 40: (12) /gordan/.bashrc => -1 (5)
>>
>> On the command line, I get back "Input/output error". I can ls the files,
>> but cannot actually read them.
>>
>> This is with only the first server up. Same happens when I mount home.vol
>> via fstab or via something like:
>> glusterfs -f /etc/glusterfs/home.vol /home
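>>
>> For reference, the fstab entry is the usual volfile-as-device form, roughly
>> (assuming the standard glusterfs mount helper is in place):
>>
>> /etc/glusterfs/home.vol  /home  glusterfs  defaults  0  0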
>>
>> I have also reduced the config (single-process, intended for the servers) to
>> a bare minimum (removed the posix-locks layer) to get to the bottom of it,
>> but I still cannot get any reads to work:
>>
>> volume home1
>>        type storage/posix
>>        option directory /gluster/home
>> end-volume
>>
>> volume home2
>>        type protocol/client
>>        option transport-type tcp/client
>>        option remote-host 192.168.3.1
>>        option remote-subvolume home2
>> end-volume
>>
>> volume home
>>        type cluster/afr
>>        option read-subvolume home1
>>        subvolumes home1 home2
>> end-volume
>>
>> volume server
>>        type protocol/server
>>        option transport-type tcp/server
>>        subvolumes home home1
>>        option auth.ip.home.allow 127.0.0.1,192.168.*
>>        option auth.ip.home1.allow 127.0.0.1,192.168.*
>> end-volume
>>
>> On a related note, if single-process mode is used, how does GlusterFS know
>> which volume to mount? For example, if it tries to mount the protocol/client
>> volume (home2), then obviously that won't work because the 2nd server is
>> not up. If it mounts the protocol/server volume, is it then mounting home
>> or home1? Or does it mount the outermost volume that _isn't_ a
>> protocol/[client|server] (which is "home" in this case)?
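>>
>> (I assume I could also pin it explicitly with something like
>> "glusterfs -f /etc/glusterfs/home.vol --volume-name=home /home",
>> if --volume-name does what I think it does.)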
>>
>> Thanks.
>>
>> Gordan
>>
>> On Tue, 20 May 2008 13:18:07 +0530, Krishna Srinivas
>> <krishna at zresearch.com> wrote:
>> > Gordan,
>> >
>> > Which patch set is this? Can you run glusterfs server side with
>> > "-L DEBUG" and send the logs?
>> >
>> > Thanks
>> > Krishna
>> >
>> > On Tue, May 20, 2008 at 1:56 AM, Gordan Bobic <gordan at bobich.net> wrote:
>> >> Hi,
>> >>
>> >> I'm having rather major problems getting single-process AFR to work
>> >> between two servers. When both servers come up, GlusterFS locks up pretty
>> >> solidly on both. Processes that try to access the FS (including ls) seem
>> >> to get nowhere for a few minutes, and then complete. But something gets
>> >> stuck, and glusterfs cannot be killed even with -9!
>> >>
>> >> Another worrying thing is that the fuse kernel module ends up holding a
>> >> reference count even after the glusterfs process gets killed (sometimes
>> >> killing the remote process that isn't locked up on its host can break the
>> >> locked-up operations and allow the local glusterfs process to be killed).
>> >> So fuse then cannot be unloaded.
>> >>
>> >> This error seems to come up in the logs all the time:
>> >> 2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
>> >> 2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 63: (12) /test => -1 (5)
>> >>
>> >> This implies some kind of locking issue, but the same error and conditions
>> >> also arise when the posix locking module is removed.
>> >>
>> >> The configs for the two servers are attached. They are almost identical to
>> >> the examples on the glusterfs wiki:
>> >>
>> >> http://www.gluster.org/docs/index.php/AFR_single_process
>> >>
>> >> What am I doing wrong? Have I run into another bug?
>> >>
>> >> Gordan
>> >>
>> >> volume home1-store
>> >>        type storage/posix
>> >>        option directory /gluster/home
>> >> end-volume
>> >>
>> >> volume home1
>> >>        type features/posix-locks
>> >>        subvolumes home1-store
>> >> end-volume
>> >>
>> >> volume home2
>> >>        type protocol/client
>> >>        option transport-type tcp/client
>> >>        option remote-host 192.168.3.1
>> >>        option remote-subvolume home2
>> >> end-volume
>> >>
>> >> volume home
>> >>        type cluster/afr
>> >>        option read-subvolume home1
>> >>        subvolumes home1 home2
>> >> end-volume
>> >>
>> >> volume server
>> >>        type protocol/server
>> >>        option transport-type tcp/server
>> >>        subvolumes home home1
>> >>        option auth.ip.home.allow 127.0.0.1
>> >>        option auth.ip.home1.allow 192.168.*
>> >> end-volume
>> >>
>> >> volume home2-store
>> >>        type storage/posix
>> >>        option directory /gluster/home
>> >> end-volume
>> >>
>> >> volume home2
>> >>        type features/posix-locks
>> >>        subvolumes home2-store
>> >> end-volume
>> >>
>> >> volume home1
>> >>        type protocol/client
>> >>        option transport-type tcp/client
>> >>        option remote-host 192.168.0.1
>> >>        option remote-subvolume home1
>> >> end-volume
>> >>
>> >> volume home
>> >>        type cluster/afr
>> >>        option read-subvolume home2
>> >>        subvolumes home1 home2
>> >> end-volume
>> >>
>> >> volume server
>> >>        type protocol/server
>> >>        option transport-type tcp/server
>> >>        subvolumes home home2
>> >>        option auth.ip.home.allow 127.0.0.1
>> >>        option auth.ip.home2.allow 192.168.*
>> >> end-volume
>> >>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
>



More information about the Gluster-devel mailing list