[Gluster-devel] write-behind bug with ftruncate
Emmanuel Dreyfus
manu at netbsd.org
Sun Jul 17 08:59:09 UTC 2011
Pavan T C <tcp at gluster.com> wrote:
> If your version of NetBSD has dtrace ported and enabled, you can check
> if the reordering of the calls is happening within fuse at runtime
> without modifying fuse.
NetBSD FUSE is in userland, and I am actively developing it, so it is
not a problem for me to modify it.
However, the reordering does not occur in FUSE, and it seems I was wrong
about write-behind: removing it just made the bug disappear by chance.
As I now understand, the problem is that fuse_setattr_cbk() will request a
ftruncate() after the SETATTR. Here is what I get in the logs:
fuse_write() size = 4096, offset = 39981056
fuse_setattr() fsi->valid = 0x78 => truncate_needed, size = 39987632
fuse_write() size = 20480, offset = 39985152
(...)
client3_1_writev() size = 4096, offset = 39981056
fuse_setattr_cbk() call fuse_do_truncate, offset = 39987632
client3_1_writev() size = 2480, offset = 39985152
(...)
client3_1_ftruncate() offset = 39987632
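As a quick sanity check on the offsets in this log, the sketch below computes how many bytes of a write still lie below a given truncation size. bytes_kept_after_truncate() is a hypothetical helper written for illustration only; it is not part of glusterfs or NetBSD, and it only does the arithmetic, not any actual I/O.

```c
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): number of bytes of a write
 * covering [offset, offset + len) that lie below new_size, i.e. that
 * would survive a truncation to new_size.
 */
static uint64_t
bytes_kept_after_truncate(uint64_t offset, uint64_t len, uint64_t new_size)
{
    if (new_size <= offset)
        return 0;               /* the whole write is past the new end */
    if (new_size - offset >= len)
        return len;             /* the whole write is below the new end */
    return new_size - offset;   /* only the leading part is kept */
}
```

With the values logged above, the 4096-byte write at offset 39981056 lies entirely below 39987632, while the 20480-byte write at offset 39985152 extends past it: only 39987632 - 39985152 = 2480 of its bytes lie below the truncation size.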
Why does it decide to set truncate_needed? fsi->valid = 0x78 means these
flags are set: FATTR_ATIME | FATTR_MTIME | FATTR_FH | FATTR_SIZE
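The value 0x78 can be decoded bit by bit. The sketch below assumes the FATTR_* bit values from the Linux FUSE protocol header (fuse.h); fattr_decode() is a hypothetical helper written for illustration, not a glusterfs or NetBSD function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: bit values as in the Linux FUSE protocol header (fuse.h). */
#define FATTR_MODE  (1u << 0)
#define FATTR_UID   (1u << 1)
#define FATTR_GID   (1u << 2)
#define FATTR_SIZE  (1u << 3)
#define FATTR_ATIME (1u << 4)
#define FATTR_MTIME (1u << 5)
#define FATTR_FH    (1u << 6)

/*
 * Hypothetical helper: write the names of the bits set in "valid" into
 * buf, separated by " | ", lowest bit first. Returns buf.
 */
static const char *
fattr_decode(uint32_t valid, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } names[] = {
        { FATTR_MODE,  "FATTR_MODE"  }, { FATTR_UID,   "FATTR_UID"   },
        { FATTR_GID,   "FATTR_GID"   }, { FATTR_SIZE,  "FATTR_SIZE"  },
        { FATTR_ATIME, "FATTR_ATIME" }, { FATTR_MTIME, "FATTR_MTIME" },
        { FATTR_FH,    "FATTR_FH"    },
    };
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(valid & names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " | ", len - strlen(buf) - 1);
        strncat(buf, names[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Under that assumption, fattr_decode(0x78, buf, sizeof(buf)) yields "FATTR_SIZE | FATTR_ATIME | FATTR_MTIME | FATTR_FH".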
Here is the offending code:
    #define FATTR_MASK   (FATTR_SIZE                  \
                          | FATTR_UID | FATTR_GID     \
                          | FATTR_ATIME | FATTR_MTIME \
                          | FATTR_MODE)
    (...)
    if ((fsi->valid & (FATTR_MASK)) != FATTR_SIZE) {
            if (fsi->valid & FATTR_SIZE) {
                    state->size = fsi->size;
                    state->truncate_needed = _gf_true;
            }
The sin is therefore to set FATTR_ATIME | FATTR_MTIME, while glusterfs
only treats the request as a ftruncate() call when FATTR_SIZE alone is
set. Am I correct?
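The effect of the quoted condition on 0x78 can be checked in isolation. This is a minimal sketch, again assuming the FATTR_* bit values from the Linux FUSE protocol header (fuse.h); sets_truncate_needed() is a hypothetical helper that mirrors only the condition quoted above, not the rest of fuse_setattr().

```c
#include <stdint.h>

/* Assumption: bit values as in the Linux FUSE protocol header (fuse.h). */
#define FATTR_MODE  (1u << 0)
#define FATTR_UID   (1u << 1)
#define FATTR_GID   (1u << 2)
#define FATTR_SIZE  (1u << 3)
#define FATTR_ATIME (1u << 4)
#define FATTR_MTIME (1u << 5)
#define FATTR_FH    (1u << 6)

/* Same mask as in the quoted glusterfs code. */
#define FATTR_MASK  (FATTR_SIZE \
                     | FATTR_UID | FATTR_GID \
                     | FATTR_ATIME | FATTR_MTIME \
                     | FATTR_MODE)

/*
 * Hypothetical helper: returns 1 when the quoted branch would set
 * truncate_needed for this "valid" bitmask, 0 otherwise.
 */
static int
sets_truncate_needed(uint32_t valid)
{
    if ((valid & FATTR_MASK) != FATTR_SIZE) {
        if (valid & FATTR_SIZE)
            return 1;   /* size change combined with other attributes */
    }
    return 0;           /* size only, or no size change at all */
}
```

For 0x78, valid & FATTR_MASK is 0x38, which differs from FATTR_SIZE (0x08), so the branch is taken and truncate_needed is set. With FATTR_SIZE | FATTR_FH (0x48) the masked value is exactly FATTR_SIZE and the branch is skipped.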
> Let me know if this line of debugging helps. I need to understand the
> details of the conversion of ftruncate() to FUSE SETATTR. A pointer to
> the corresponding NetBSD code will help.
That happens in the kernel, in sys_ftruncate():
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/kern/vfs_syscalls.c?rev=1.431
    if (vp->v_type == VDIR)
            error = EISDIR;
    else if ((error = vn_writechk(vp)) == 0) {
            vattr_null(&vattr);
            vattr.va_size = SCARG(uap, length);
            error = VOP_SETATTR(vp, &vattr, fp->f_cred);
    }
VOP_SETATTR() is the vnode method. It will eventually turn into
FUSE_SETATTR. glusterfs will convert it back to a ftruncate in
fuse_setattr() and fuse_setattr_cbk() from fuse-bridge.c.
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org