[Gluster-users] Deleted files reappearing

Fri Nov 8 19:33:19 UTC 2013

On 11/07/2013 05:19 PM, Øystein Viggen wrote:
> Hi,
>
> I have a small test setup on Ubuntu 12.04, using the
> 3.4.1-ubuntu1~precise1 packages of glusterfs from the recommended PPA.
> There are two gluster servers (replicate 2) and one client.  Bricks are
> 16 GB xfs -i size=512 filesystems.  All servers run on vmware.
>
> I've been using the linux kernel source for some simple performance and
> stability tests with many small files.  When deleting the linux kernel
> tree with rm -Rf while rebooting one glusterfs server, it seems that
> some deletes are missed, or "recreated".  Here's how it goes:
>
> root@client:/mnt# rm -Rf linux-3.12
>
> At this point, I run "shutdown -r now" on one server.  The deletion
> seems to keep running just fine, but just as the server comes back up, I
> get something like this on the client:
>
> rm: cannot remove `linux-3.12/arch/mips/ralink/dts': Directory not empty
>
> After the rm has run to completion:
>
> root@client:/mnt# find linux-3.12 -type f
> linux-3.12/arch/mips/ralink/dts/Makefile
>
> Sometimes it's more than one file, too.  "gluster volume heal volname
> info" shows no outstanding entries.
>
> If I turn off one server before running rm, and turn it on during the rm
> run, a similar thing happens, only it seems worse.  In one test, I had
> 9220 files left after rm had finished.
>
> If both servers are up during the rm run, all files are deleted as
> expected every time.
>
>
> What is happening here, and can I do something to avoid it?
It sounds like a split-brain issue. The following commands will help
you confirm whether it is:

  gluster volume heal <volumeName> info split-brain
  gluster volume heal <volumeName> info heal-failed

If you see any split-brain entries, then it is a bug. We can check with
gluster-devel whether it is fixed in the master branch or whether there
is already a bug filed for it in Bugzilla.
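
If the output is long, a small filter can make it easier to see which
bricks report anything at all. The following is only a sketch: the
function name bricks_with_entries and the VOLNAME variable are made up
for illustration, and it assumes the heal output has a "Brick <host:path>"
line followed by a "Number of entries: N" line for each brick, which is
what I would expect from 3.4 but may differ in other versions.

```shell
#!/bin/sh
# Read "gluster volume heal <vol> info ..." output on stdin and print
# "<brick> <count>" for every brick whose entry count is non-zero.
# Assumed format: a "Brick host:/path" line, then "Number of entries: N".
bricks_with_entries() {
    awk '
        /^Brick /             { brick = $2 }
        /^Number of entries:/ { if ($NF + 0 > 0) print brick, $NF }
    '
}

# Only runs when the gluster CLI is present and VOLNAME is set
# (VOLNAME is a placeholder for your volume name).
if command -v gluster >/dev/null 2>&1 && [ -n "${VOLNAME:-}" ]; then
    for kind in split-brain heal-failed; do
        echo "== $kind =="
        gluster volume heal "$VOLNAME" info "$kind" | bricks_with_entries
    done
fi
```

No output under either heading would mean no brick reported entries of
that kind.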

>
> I was hoping that in a replica 2 cluster, you could safely reboot one
> server at a time (with sync-up time in between) to, say, apply OS
> patches without taking the gluster volume offline.
>
Yes, this should work. I am not sure whether a bug in gluster is causing
the issue for you. As a workaround, you could do the following:

1. Stop or kill all gluster services on one of the machines.
2. Make sure the glusterd service does not start automatically at the
   next boot (a one-time change).
3. Apply the OS patches and reboot.
4. Start the glusterd service.
5. Check that self-heal has completed all the required syncing.

Once this node has consistent data, repeat the steps for the other node.
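
The "wait for self-heal" part of this workaround could be scripted
roughly as below. This is a sketch under assumptions, not a tested
procedure: the function names are invented here, the "Number of
entries: N" line format is assumed from 3.4 output, and the upstart job
name glusterfs-server in the comments is my guess for the Ubuntu PPA
packages. Note also that in your report heal info showed nothing
outstanding even though files came back, so a zero count is a useful
sign but not proof that the bricks agree.

```shell
#!/bin/sh
# Print heal info for a volume; thin wrapper around the real gluster CLI.
heal_info() {
    gluster volume heal "$1" info
}

# Sum the per-brick "Number of entries: N" lines read from stdin.
pending_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Poll until nothing is pending.
#   $1 = volume name, $2 = maximum number of polls, $3 = seconds between polls
# Returns 0 when the count reaches zero, 1 if it never does,
# and 2 if the heal info command itself fails.
wait_until_healed() {
    wh_vol=$1; wh_polls=$2; wh_delay=$3
    while [ "$wh_polls" -gt 0 ]; do
        wh_out=$(heal_info "$wh_vol") || return 2
        if [ "$(printf '%s\n' "$wh_out" | pending_entries)" -eq 0 ]; then
            return 0
        fi
        wh_polls=$((wh_polls - 1))
        sleep "$wh_delay"
    done
    return 1
}

# Rough order of operations on the node being patched (commands are
# assumptions for Ubuntu 12.04 with upstart; adjust for your setup):
#   service glusterfs-server stop
#   pkill glusterfsd; pkill glusterfs     # leftover brick/helper processes
#   echo manual > /etc/init/glusterfs-server.override   # no autostart
#   ... apply OS patches, reboot ...
#   service glusterfs-server start
#   wait_until_healed "$VOLNAME" 60 10    # VOLNAME is a placeholder
```

Only move on to the second node if wait_until_healed returns 0.
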
> I'm thankful for any help.
>
> Øystein