[Gluster-devel] This bug hunt just gets weirder...

Tue Feb 17 19:39:41 UTC 2009

OK, I've managed to resolve this, but it wasn't possible to resync the 
primary off the secondary. What I ended up doing was backing up the 
files that had changed since the primary went down, blanking the 
secondary, resyncing the secondary off the primary, and copying the 
backed-up files back into the file system.
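The backup step can be sketched roughly as below. This is a minimal sketch only, assuming the changed files can be picked out by modification time on the secondary's backing store (/gluster/home in the specs quoted below), and that the moment the primary went down is known as a Unix timestamp. The helper name backup_changed_files is hypothetical and it does not track deletions or renames. It deliberately copies only file data, permission bits and timestamps, not extended attributes (AFR keeps its own bookkeeping in xattrs), and the backup is meant to be restored through the GlusterFS mount, not written straight into a backend directory.

```python
import os
import shutil
from typing import List


def backup_changed_files(src_root: str, dest_root: str, since: float) -> List[str]:
    """Hypothetical helper: copy regular files under src_root whose mtime is
    newer than `since` into dest_root, keeping their relative paths.
    Returns the sorted list of relative paths that were copied."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            st = os.stat(src)
            if st.st_mtime <= since:
                continue
            rel = os.path.relpath(src, src_root)
            dest = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            # Copy data and permission bits only; extended attributes
            # (including any AFR changelog xattrs) are intentionally not copied.
            shutil.copyfile(src, dest)
            shutil.copymode(src, dest)
            # Preserve the original access and modification times.
            os.utime(dest, (st.st_atime, st.st_mtime))
            copied.append(rel)
    return sorted(copied)
```

Picking files by mtime is only a heuristic: a file whose mtime was set into the past (e.g. by rsync or tar) would be missed.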

By primary and secondary here I mean the order in which the two 
volumes are listed in the AFR volume's subvolumes option.

So, to reiterate: resyncing the primary off the secondary didn't work, 
but resyncing the secondary off the primary did.

Can anyone hazard a guess as to how to debug this issue further? Since I 
have the backup of the old data from the secondary, I can probably have a 
go at recreating the problem (I'm hoping it won't be reproducible with 
the freshly synced data).
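One rough starting point, offered only as a sketch: compare the two backing stores directly and list entries that exist on one side but not the other, or that differ in type. In the failing case, entries present in the secondary's store never showed up on the primary until they were looked up by name, so a tree diff at least shows how far the two stores have drifted. The function names are hypothetical, and the comparison covers names and file types only, not file contents or AFR's extended attributes.

```python
import os
from typing import Dict, List, Tuple


def list_entries(root: str) -> Dict[str, str]:
    """Hypothetical helper: map every path under root (relative to root)
    to 'symlink', 'dir', 'file' or 'other'. Symlinks are not followed."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if os.path.islink(full):
                kind = "symlink"
            elif os.path.isdir(full):
                kind = "dir"
            elif os.path.isfile(full):
                kind = "file"
            else:
                kind = "other"
            entries[rel] = kind
    return entries


def compare_trees(a: str, b: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in a, only in b, present in both but with different types)."""
    ea, eb = list_entries(a), list_entries(b)
    only_a = sorted(set(ea) - set(eb))
    only_b = sorted(set(eb) - set(ea))
    type_mismatch = sorted(p for p in set(ea) & set(eb) if ea[p] != eb[p])
    return only_a, only_b, type_mismatch
```

Both stores are /gluster/home on different machines in the specs quoted below, so one side would first have to be copied or mounted locally before the two paths can be passed to compare_trees.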

Gordan

Gordan Bobic wrote:
> OK, now I'm completely stumped.
> 
> I just moved the backing store on the primary server away to a new 
> directory and re-created the share's root directory, so that it could 
> resync from the secondary.
> 
> Only it doesn't. When the primary mounts the AFR volume, "ls -laR" 
> shows the volume as empty. If I blindly cd into a directory and ls 
> it, the directory then gets created in the local store, and I can 
> browse it.
> 
> The setup is CentOS 5.2 x86-64, glusterfs-2.0.0rc1, and Gluster-patched 
> FUSE 2.7.4.
> 
> Volume spec files are pasted here:
> 
> primary server:
> 
> -------------------------------------------
> 
> volume home3
>         type protocol/client
>         option transport-type socket
>         option transport.address-family inet
>         option remote-host 10.2.0.10
>         option remote-port 6997
>         option remote-subvolume home3
> end-volume
> 
> volume home-store
>         type storage/posix
>         option directory /gluster/home
> end-volume
> 
> volume home2
>         type features/posix-locks
>         subvolumes home-store
> end-volume
> 
> volume server
>         type protocol/server
>         option transport-type socket
>         option transport.address-family inet
>         option transport.socket.listen-port 6997
>         subvolumes home2
>         option auth.addr.home2.allow 127.0.0.1,10.2.*
> end-volume
> 
> volume home
>         type cluster/afr
>         subvolumes home2 home3
>         option read-subvolume home2
> end-volume
> 
> ---------------------------------------------
> 
> secondary server:
> 
> volume home2
>         type protocol/client
>         option transport-type socket
>         option transport.address-family inet
>         option remote-host 10.2.3.1
>         option remote-port 6997
>         option remote-subvolume home2
> end-volume
> 
> volume home-store
>         type storage/posix
>         option directory /gluster/home
> end-volume
> 
> volume home3
>         type features/posix-locks
>         subvolumes home-store
> end-volume
> 
> volume server
>         type protocol/server
>         option transport-type socket
>         option transport.address-family inet
>         option transport.socket.listen-port 6997
>         subvolumes home3
>         option auth.addr.home3.allow 127.0.0.1,10.2.*
> end-volume
> 
> volume home
>         type cluster/afr
>         subvolumes home2 home3
>         option read-subvolume home3
> end-volume
> 
> ----------------------------------------
> 
> The only other thing of note is that I'm passing the 
> --disable-direct-io-mode parameter (I wanted tail -f to work properly).
> 
> No error appears in the log when ls-ing the share from the "empty" node.
> 
> Am I doing/overlooking something silly here due to a caffeine underflow 
> error? :-/
> 
> Gordan
> 
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at nongnu.org
> http://lists.nongnu.org/mailman/listinfo/gluster-devel