[Gluster-devel] Problems with graph switch in disperse
Xavier Hernandez
xhernandez at datalab.es
Wed Dec 24 12:55:17 UTC 2014
Hi,
I'm experiencing a problem when the gluster graph changes as a result of
a replace-brick operation (and probably any other operation that
changes the graph) while the client is also doing other work, such as
writing a file.
When the operation starts, I see that the replaced brick is disconnected,
but writes continue working normally with one brick less.
At some point, another graph is created and comes online. The remaining
bricks on the old graph are disconnected and the old graph is destroyed.
I can see new write requests being sent to the new graph.
This seems correct. However, there's a point where I see this:
[2014-12-24 11:29:58.541130] T [fuse-bridge.c:2305:fuse_write_resume]
0-glusterfs-fuse: 2234: WRITE (0x16dcf3c, size=131072, offset=255721472)
[2014-12-24 11:29:58.541156] T [ec-helpers.c:101:ec_trace] 2-ec:
WIND(INODELK) 0x7f8921b7a9a4(0x7f8921b78e14) [refs=5, winds=3, jobs=1]
frame=0x7f8932e92c38/0x7f8932e9e6b0, min/exp=3/3, err=0 state=1
{111:000:000} idx=0
[2014-12-24 11:29:58.541292] T [rpc-clnt.c:1384:rpc_clnt_record]
2-patchy-client-0: Auth Info: pid: 0, uid: 0, gid: 0, owner:
d025e932897f0000
[2014-12-24 11:29:58.541296] T [io-cache.c:133:ioc_inode_flush]
2-patchy-io-cache: locked inode(0x16d2810)
[2014-12-24 11:29:58.541354] T
[rpc-clnt.c:1241:rpc_clnt_record_build_header] 2-rpc-clnt: Request
fraglen 152, payload: 84, rpc hdr: 68
[2014-12-24 11:29:58.541408] T [io-cache.c:137:ioc_inode_flush]
2-patchy-io-cache: unlocked inode(0x16d2810)
[2014-12-24 11:29:58.541493] T [io-cache.c:133:ioc_inode_flush]
2-patchy-io-cache: locked inode(0x16d2810)
[2014-12-24 11:29:58.541536] T [io-cache.c:137:ioc_inode_flush]
2-patchy-io-cache: unlocked inode(0x16d2810)
[2014-12-24 11:29:58.541537] T [rpc-clnt.c:1577:rpc_clnt_submit]
2-rpc-clnt: submitted request (XID: 0x17 Program: GlusterFS 3.3,
ProgVers: 330, Proc: 29) to rpc-transport (patchy-client-0)
[2014-12-24 11:29:58.541646] W [fuse-bridge.c:2271:fuse_writev_cbk]
0-glusterfs-fuse: 2234: WRITE => -1 (Input/output error)
It seems that fuse still has a write request pending for graph 0. It is
resumed, but it returns EIO without calling the xlator stack (the operations
seen between the two log messages belong to other requests and are
sent to graph 2). I'm not sure why this happens or how I should avoid it.
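To make it easier to see which graph each trace line belongs to, here is a
minimal sketch of a filter for a log excerpt like the one above. It is not
part of gluster: the names unwrap, by_graph and SAMPLE are made up for
illustration. It assumes that the number in front of the xlator name
(the "0" in "0-glusterfs-fuse", the "2" in "2-ec") is the graph id, which
is how I'm reading the messages here.

```python
import re
from collections import defaultdict

# A record starts with a "[YYYY-MM-DD " timestamp; other lines are wrapped text.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} ")
# [timestamp] LEVEL [file:line:function] N-xlator: message
# N is assumed to be the graph id (an assumption, see the text above).
RECORD = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<loc>[^\]]+)\]\s+"
    r"(?P<graph>\d+)-(?P<xlator>[^:\s]+):\s*(?P<msg>.*)$"
)


def unwrap(lines):
    """Join mail-wrapped continuation lines back into one record per entry."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def by_graph(lines):
    """Map graph id -> list of (xlator, function) in log order."""
    groups = defaultdict(list)
    for record in unwrap(lines):
        m = RECORD.match(record)
        if m:
            function = m.group("loc").split(":")[-1]
            groups[int(m.group("graph"))].append((m.group("xlator"), function))
    return dict(groups)


# A few entries copied from the trace above, wrapped as in the mail.
SAMPLE = """\
[2014-12-24 11:29:58.541130] T [fuse-bridge.c:2305:fuse_write_resume]
0-glusterfs-fuse: 2234: WRITE (0x16dcf3c, size=131072, offset=255721472)
[2014-12-24 11:29:58.541156] T [ec-helpers.c:101:ec_trace] 2-ec:
WIND(INODELK) 0x7f8921b7a9a4(0x7f8921b78e14) [refs=5, winds=3, jobs=1]
frame=0x7f8932e92c38/0x7f8932e9e6b0, min/exp=3/3, err=0 state=1
{111:000:000} idx=0
[2014-12-24 11:29:58.541354] T
[rpc-clnt.c:1241:rpc_clnt_record_build_header] 2-rpc-clnt: Request
fraglen 152, payload: 84, rpc hdr: 68
[2014-12-24 11:29:58.541646] W [fuse-bridge.c:2271:fuse_writev_cbk]
0-glusterfs-fuse: 2234: WRITE => -1 (Input/output error)
"""

if __name__ == "__main__":
    for graph, entries in sorted(by_graph(SAMPLE.splitlines()).items()):
        print(graph, entries)
```

On this sample, the only entries with prefix 0 are the fuse resume and the
failing callback for request 2234; everything in between carries prefix 2.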
I tried the same scenario with replicate and it seems to work, so there
must be something wrong in disperse, but I don't see where the problem
could be.
Any ideas?
Thanks,
Xavi