[Gluster-users] Gluster failure testing

Brian Candler B.Candler at pobox.com
Wed Aug 15 06:12:46 UTC 2012


On Tue, Aug 14, 2012 at 08:19:27PM -0700, stephen pierce wrote:
>    I let both clients run for a while, then I stop one client. I then
>    reset the brick/server that is not active (the other one is servicing
>    the HTTP traffic) now.

Do you mean that client1 sends HTTP traffic to brick/server1, and client2
sends HTTP traffic to brick/server2?

>    While investigating, I discover that there are a lot of phantom
>    files that are listed with just a filename, and lots of question marks
>    (????) when doing an ls -l. rm -rf * on the Gluster volume seems to
>    complete, but leaves behind all the broken files.

It would be helpful if you could show the actual ls -l output, but my guess
is you are seeing something like this (demo on a local filesystem, not
gluster):

$ mkdir testdir
$ touch testdir/testfile
$ chmod -x testdir
$ ls -l testdir
ls: cannot access testdir/testfile: Permission denied
total 0
-????????? ? ? ? ?                ? testfile

If so, these aren't really "phantom files". Rather, the enclosing directory
is missing its execute (search) permission, so ls can read the names but
cannot stat the entries. That might be some intermediate state in gluster
replication; I don't know.

So an "ls -ld" of the parent directory would also be a good thing. Also, are
these filenames those you'd expect your application to create?
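
A minimal sketch of how that information could be gathered in one go. The
helper name report_dir is hypothetical, and the directory you pass it (for
example a path under your gluster mount) is whatever shows the broken entries:

```shell
#!/bin/sh
# report_dir DIR: print the directory's own permissions, then name every
# entry whose metadata cannot be read (the ones "ls -l" shows as "?").
report_dir() {
    dir=$1
    # Permissions, owner and group of the directory itself
    ls -ld -- "$dir"
    # Listing names only needs read permission on the directory;
    # "ls -d" on each entry needs search (execute) permission as well.
    ls -A -- "$dir" | while IFS= read -r name; do
        ls -d -- "$dir/$name" >/dev/null 2>&1 ||
            echo "cannot stat: $dir/$name"
    done
}
```

Calling report_dir on the affected directory and posting the output would
show both the directory mode and which names are affected. Filenames that
contain newlines are not handled by this sketch.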

What might be helpful is to trace your backend application and find out
what's making it return a 500 server error, which may or may not be related
to these permissions. If you can see what file operations the backend is
trying to do and what filesystem error is being returned (e.g. with strace),
this may make it clearer what's going on. Then you can perhaps crank up
gluster logs at the appropriate place too.
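
As a rough sketch of the strace step: the commented command attaches to a
running backend process, with the PID and the output path as placeholders.
failed_calls is a hypothetical helper that keeps only the system calls that
returned an error:

```shell
#!/bin/sh
# Capture file-related system calls of the backend and its children.
# 12345 stands in for the real PID of your backend process:
#   strace -f -e trace=file -o /tmp/backend.trace -p 12345

# failed_calls FILE: print only the calls that returned -1 with an errno,
# which strace writes as e.g.
#   open("/some/path", O_RDONLY) = -1 EACCES (Permission denied)
failed_calls() {
    grep -E '= -1 E[A-Z0-9]+' "$1"
}
```

Running failed_calls /tmp/backend.trace after reproducing a 500 error should
narrow things down to the failing operations. Expect some harmless ENOENT
lines from ordinary path lookups; errors on paths inside the gluster mount
are the ones worth a closer look.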

Any log messages talking about "split brain" would be especially interesting.
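
A quick way to look for them, as a sketch only: this assumes the logs are
under /var/log/glusterfs (adjust if your installation logs elsewhere), and it
matches both "split brain" and "split-brain" since I'm not sure of the exact
wording in your version. find_split_brain is a hypothetical helper name:

```shell
#!/bin/sh
# find_split_brain [LOGDIR]: search all log files under LOGDIR for
# split-brain messages, case-insensitively, printing file and line number.
find_split_brain() {
    logdir=${1:-/var/log/glusterfs}   # assumed default log location
    grep -rniE 'split[- ]brain' "$logdir"
}
```

It would need to be run on both servers and on the clients, since each keeps
its own logs.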

Regards,

Brian.


