[Bugs] [Bug 1414608] Weird directory appear when rmdir the directory in disk full condition

bugzilla at redhat.com bugzilla at redhat.com
Wed Jan 25 02:10:02 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1414608

George <george.lian at nokia.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(george.lian at nokia |
                   |.com)                       |



--- Comment #2 from George <george.lian at nokia.com> ---
It is a volume of AFR type; in my environment we have a volume with replica 2
(two bricks). With only one brick (or with one brick down), the issue does not
occur.

It can easily be reproduced with an AFR volume of 2 bricks and the steps I
shared, so you should be able to collect whatever logs you need.

And from my investigation over the last few days, the root cause seems clear:
1) In a disk-full condition on the bricks, writing a lot of small files from the
client creates zero-length entries on the bricks while the write FOPs themselves
fail. In this case the set of zero-length entries on the 2 bricks is not the
same: one file entry gets created on brick A but not on brick B, and another
gets created on brick B but not on brick A.
2) Because both bricks are in a disk-full condition, self-heal cannot heal the
entries.
3) So after creating a lot of files, listing (ls) the directory directly on each
brick shows a different number of file entries on the two bricks.
4) When an rm command is executed on the client mount point, it only gets the
directory entries (getdents) from the first brick, e.g. brick A, and unlinks all
of those files. This also unlinks the same files where they exist on brick B,
but it does not remove the files which do not exist on brick A yet do exist on
brick B.
5) The current GlusterFS implementation returns success when the rmdir succeeds
on brick A, even though it fails on brick B because some files there were never
unlinked.
6) Because another process still has the directory open when the rmdir
succeeds, the kernel keeps the inode with S_DEAD set (meaning "removed, but
still open directory").
7) When a "cd" command is executed on the client to change into the "removed"
directory, the GlusterFS lookup returns success because the directory was not
removed on brick B, and a heal is triggered; since more files were unlinked in
the previous step, the files that exist on brick B get synced back to brick A,
so the "cd" succeeds. That means the client can enter the "removed" directory.
8) But when an "ls" command is run in the "removed" directory, it triggers a
getdents syscall, and from the kernel's point of view the directory is removed,
so it terminates the syscall and returns "No such file or directory" to user
space; touch and other write FOPs fail the same way (see the small demo after
this list).

I see 2 possible solutions for this issue:
1) If the FOP is rmdir, return failure if any brick returns failure (currently,
if one brick succeeds it is treated as success). Does this change make sense?
Would it introduce any risks? (Sketch below.)

2) Enforce a reserved-disk-space threshold on the client, say 100 MB, or a
percentage, say 1%: if the free disk space is below the threshold, new incoming
write FOPs should be rejected. I have seen a parameter called
"cluster.min-free-disk", but it does not seem to work for this issue. How can I
use that parameter? Does it apply to this issue, and if not, could it be
enhanced to cover it? (A rough sketch of the idea follows.)
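
And a rough sketch of the client-side threshold idea from solution 2 (again
only an illustration, not GlusterFS code; the 100 MB and 1% values are the
example thresholds mentioned above):

#include <errno.h>
#include <sys/statvfs.h>

#define MIN_FREE_BYTES   (100ULL * 1024 * 1024)  /* 100 MB, example value */
#define MIN_FREE_PERCENT 1.0                     /* 1%, example value     */

/* returns 0 if a new write may proceed, -ENOSPC if it should be rejected */
static int check_free_space(const char *mount_path)
{
    struct statvfs vfs;

    if (statvfs(mount_path, &vfs) != 0)
        return -errno;

    unsigned long long free_bytes  =
        (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    unsigned long long total_bytes =
        (unsigned long long)vfs.f_blocks * vfs.f_frsize;
    double free_percent =
        total_bytes ? 100.0 * free_bytes / total_bytes : 0.0;

    if (free_bytes < MIN_FREE_BYTES || free_percent < MIN_FREE_PERCENT)
        return -ENOSPC;

    return 0;
}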

Your comments are highly welcome :) I will try solution 1 from now on and will
update you with the results once I have them.

Thanks a lot!

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.


More information about the Bugs mailing list