[Gluster-devel] Issues with replacing hard links with symlinks in the .glusterfs directory

Fri Oct 23 01:58:53 UTC 2015

First off: I've based my work on the 3.7.3 release, since it was the most
recent release when I started this project and I couldn't get HEAD to
build on FreeBSD. (I'm using a FreeBSD server and Linux clients.)

I realize that many things will be broken by doing this (renaming open
files, deleting open files, possibly some other stuff), but I can live with
those limitations.

What I've done:
 I've modified the code to fall back to a symlink if making a hard link
fails (which it will do somewhat frequently, because the target is on a
different filesystem).
I created an extended attribute on symlinks that are emulating hard links.
I changed the setattr code to check for this attribute before it tries to
set the attributes; if it is set, the code dereferences the link, then
proceeds with the setattr.
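
For readers following along, here is a minimal sketch of the fallback
described above. It is not the actual patch: `link_or_symlink` is a
hypothetical helper, not a GlusterFS function, and it assumes the fallback
should only trigger on EXDEV (the cross-filesystem case) rather than on
every link() failure. It also assumes `target` is an absolute path, since a
relative symlink target would be resolved relative to the link's directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not part of GlusterFS): try a hard link first and
 * fall back to a symlink only when the two paths live on different
 * filesystems. Returns 0 on success, -1 with errno set on failure.
 * *used_symlink is set to 1 only if a symlink was created. */
static int link_or_symlink(const char *target, const char *linkpath,
                           int *used_symlink)
{
    assert(target != NULL && linkpath != NULL && used_symlink != NULL);
    *used_symlink = 0;

    if (link(target, linkpath) == 0)
        return 0;               /* the normal hard-link case */

    if (errno != EXDEV)
        return -1;              /* unrelated failure: report it unchanged */

    if (symlink(target, linkpath) != 0)
        return -1;              /* fallback failed as well */

    *used_symlink = 1;          /* caller can now tag the link, e.g. with an xattr */
    return 0;
}
```

The `used_symlink` flag is where a caller could set the marker attribute
mentioned above, so that later operations know to dereference the link.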

To test this, I made a file and ran chmod +x on it.
the good: attributes were correctly set on the file!
the bad: chmod says it failed with EIO
my issue: I have no clue where this EIO is coming from.  Under the hood,
chmod is calling fchmodat.
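
To take chmod itself out of the picture, the same call can be issued
directly. The sketch below is only an approximation of what chmod +x does
(it ignores the umask, which the real tool consults), and `add_exec_bits`
is a hypothetical name introduced for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical reproducer: add the execute bits to `path` through
 * fchmodat(), the call chmod ends up making. Returns 0 on success, or the
 * errno value of the first call that failed. */
static int add_exec_bits(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return errno;

    /* Keep the existing permission bits and add u+x, g+x, o+x. */
    mode_t mode = (st.st_mode & 07777) | S_IXUSR | S_IXGRP | S_IXOTH;

    if (fchmodat(AT_FDCWD, path, mode, 0) != 0)
        return errno;
    return 0;
}
```

Calling this on a file on the mounted volume and comparing the returned
value against EIO shows whether the error comes from the fchmodat call or
from the stat that precedes it.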

After no luck with printf debugging, I just ran gluster under gdb, and set
a breakpoint on send_fuse_iov.  Here's the backtrace:
#0  send_fuse_iov (this=0x63a150, finh=0x7fffe0005fe0,
iov_out=0x7ffff08e7500, count=2) at fuse-bridge.c:158
#1  0x00007ffff550fcfd in send_fuse_data (this=0x63a150,
finh=0x7fffe0005fe0, data=0x7ffff08e75a0, size=104) at fuse-bridge.c:197
#2  0x00007ffff5511be1 in fuse_attr_cbk (frame=0x7fffe000145c,
cookie=0x7fffe000616c, this=0x63a150, op_ret=0, op_errno=117,
buf=0x7fffe0006734, xdata=0x0) at fuse-bridge.c:734
#3  0x00007ffff0b08714 in io_stats_stat_cbk (frame=0x7fffe000616c,
cookie=0x7fffe000626c, this=0x7fffec014de0, op_ret=0, op_errno=117,
buf=0x7fffe0006734, xdata=0x0) at io-stats.c:1344
#4  0x00007ffff0d2397e in mdc_stat_cbk (frame=0x7fffe000626c,
cookie=0x7fffe000645c, this=0x7fffec013890, op_ret=0, op_errno=117,
buf=0x7fffe0006734, xdata=0x0) at md-cache.c:901
#5  0x00007ffff7b30ad3 in default_stat_cbk (frame=0x7fffe000645c,
cookie=0x7fffe00029ec, this=0x7fffec00b910, op_ret=0, op_errno=117,
buf=0x7fffe0006734, xdata=0x0) at defaults.c:853
#6  0x00007ffff1be3165 in dht_attr_cbk (frame=0x7fffe00029ec,
cookie=0x7fffe0001bdc, this=0x7fffec00a400, op_ret=0, op_errno=0,
stbuf=0x7ffff08e78d0, xdata=0x0) at dht-inode-read.c:250
#7  0x00007ffff1e1e7b7 in client3_3_stat_cbk (req=0x7fffe00075dc,
iov=0x7fffe000761c, count=1, myframe=0x7fffe0001bdc) at
client-rpc-fops.c:535
#8  0x00007ffff78ec67c in rpc_clnt_handle_reply (clnt=0x7fffec02bd40,
pollin=0x7fffe40058a0) at rpc-clnt.c:766
#9  0x00007ffff78eca73 in rpc_clnt_notify (trans=0x7fffec02c020,
mydata=0x7fffec02bd70, event=RPC_TRANSPORT_MSG_RECEIVED,
data=0x7fffe40058a0) at rpc-clnt.c:894
#10 0x00007ffff78e8bb2 in rpc_transport_notify (this=0x7fffec02c020,
event=RPC_TRANSPORT_MSG_RECEIVED, data=0x7fffe40058a0) at
rpc-transport.c:544
#11 0x00007ffff32f1591 in socket_event_poll_in (this=0x7fffec02c020) at
socket.c:2290
#12 0x00007ffff32f1ad5 in socket_event_handler (fd=12, idx=1,
data=0x7fffec02c020, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403
#13 0x00007ffff7b9c02b in event_dispatch_epoll_handler
(event_pool=0x635210, event=0x7ffff08e7e30) at event-epoll.c:575
#14 0x00007ffff7b9c409 in event_dispatch_epoll_worker (data=0x7fffec029460)
at event-epoll.c:678
#15 0x00007ffff698137b in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffff63216fd in clone () from /lib64/libc.so.6

  It is currently trying to send two buffers, one of length 16, one of
length 104. Printing out the individual components of this buffer, we have:
(gdb) p *fouh
$27 = {len = 120, error = 0, unique = 11}
(gdb) p fao
$28 = {attr_valid = 1, attr_valid_nsec = 0, dummy = 0, attr = {ino =
13385484696163529676, size = 6, blocks = 2, atime = 1445490621, mtime =
1444602468, ctime = 1444602468, atimensec = 947118511,
    mtimensec = 479067414, ctimensec = 479067414, mode = 16877, nlink = 2,
uid = 1043, gid = 1045, rdev = 4026597375, blksize = 131072, padding =
32767}}
and to round things out, the input:
(gdb) p *finh
$30 = {len = 56, opcode = 3, unique = 11, nodeid = 140736951487340, uid =
0, gid = 0, pid = 6233, padding = 0}
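
As a sanity check on the numbers above, the following sketch decodes them.
The struct definitions are local mirrors whose field names come from the
gdb output; the field widths and the opcode values (3 for GETATTR, 4 for
SETATTR) are assumptions taken from the FUSE kernel protocol, not from the
GlusterFS sources.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Local mirrors of the reply structures printed by gdb (assumed widths). */
struct out_header { uint32_t len; int32_t error; uint64_t unique; };
struct attr {
    uint64_t ino, size, blocks, atime, mtime, ctime;
    uint32_t atimensec, mtimensec, ctimensec, mode, nlink;
    uint32_t uid, gid, rdev, blksize, padding;
};
struct attr_out {
    uint64_t attr_valid;
    uint32_t attr_valid_nsec, dummy;
    struct attr attr;
};

/* The two iovecs in the backtrace: 16 bytes of header, 104 of payload. */
static_assert(sizeof(struct out_header) == 16, "header iovec size");
static_assert(sizeof(struct attr_out) == 104, "payload iovec size");

/* Opcode values as assumed from the FUSE protocol. */
enum { OPCODE_GETATTR = 3, OPCODE_SETATTR = 4 };

/* Does the header's len field cover exactly header + payload? */
static int reply_len_ok(uint32_t len)
{
    return len == sizeof(struct out_header) + sizeof(struct attr_out);
}

/* File-type and permission parts of the mode field. */
static int is_directory(uint32_t mode) { return S_ISDIR(mode); }
static unsigned perm_bits(uint32_t mode) { return mode & 07777; }
```

With the values from the dump, `reply_len_ok(120)` holds, so the lengths
are consistent. `is_directory(16877)` is true and `perm_bits(16877)` is
0755, and the request opcode 3 matches `OPCODE_GETATTR`. If those
assumptions are right, this breakpoint hit is a getattr reply describing a
directory, which may not be the request that produced the EIO.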

Since all of the code that I've changed is on the server side, I assume
that some pieces of data are being sent incorrectly, but I cannot identify
them.  From what I can tell, the data being sent back to the kernel is
correct.

Questions: Does anything look wrong with the data that is being sent to the
kernel?
Can anyone think of another reason that this would result in an EIO?
Is there any more information that would help you answer any of these
questions?

If you want more real-time conversation than email typically provides, I'm
on IRC as mjrosenb.  Thanks --Marty