[Gluster-users] Samba vfs_glusterfs no such file or directory

David Gibbons david.c.gibbons at gmail.com
Tue Jun 17 12:37:54 UTC 2014


Hi All,

I am running into a strange error with samba and vfs_glusterfs.

Here is some version information:
[root@gfs-a-3 samba]# smbd -V
Version 3.6.20

[root@gfs-a-3 tmp]# glusterfsd --version
glusterfs 3.4.1 built on Oct 21 2013 09:23:23

Samba is configured in an AD environment, using winbind. Group resolution,
user resolution, and cross-mapping of SIDs to IDs to usernames all work
as expected. The vfs_glusterfs module is working perfectly for the vast
majority of the users I have configured. A small percentage of users,
though, get an "access is denied" error when they attempt to access the
share. They are configured in the same way as the users that are working.
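
For context, the share is wired through vfs_glusterfs in smb.conf roughly
like this (a trimmed sketch rather than our exact stanza; the volume name
"shares" matches the logs below, while the path and log settings are
illustrative):

[shares]
    path = /
    vfs objects = glusterfs
    glusterfs:volume = shares
    glusterfs:loglevel = 10
    glusterfs:logfile = /var/log/samba/glusterfs-shares.log
    read only = no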

We initially thought that perhaps the number of groups the user is a
member of was causing the issue. That may still be the case, but we're not
sure how to verify that guess.
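
In case anyone wants to check the same thing, counting the groups winbind
resolves for a user is straightforward (DOMAIN\someuser is a placeholder):

# group count as seen through nsswitch/winbind
id -G 'DOMAIN\someuser' | wc -w

# group GIDs reported by winbind directly
wbinfo -r 'DOMAIN\someuser' | wc -l

The classic 16-group ceiling in AUTH_UNIX-style RPC credentials is the
sort of limit we had in mind.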

When we connect with a working user, with glusterfs:loglevel = 10, here
are the last bits of the log file. I'm not really sure where the
interesting lines are; any guidance would be much appreciated:

[2014-06-17 12:11:53.753289] D [client-handshake.c:1430:client_setvolume_cbk] 0-shares-client-5: clnt-lk-version = 1, server-lk-version = 0
[2014-06-17 12:11:53.753296] I [client-handshake.c:1456:client_setvolume_cbk] 0-shares-client-5: Connected to 172.16.10.13:49153, attached to remote volume '/mnt/a-3-shares-brick-2/brick'.
[2014-06-17 12:11:53.753301] I [client-handshake.c:1468:client_setvolume_cbk] 0-shares-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2014-06-17 12:11:53.753306] D [client-handshake.c:1318:client_post_handshake] 0-shares-client-5: No fds to open - notifying all parents child up
[2014-06-17 12:11:53.753313] D [client-handshake.c:486:client_set_lk_version] 0-shares-client-5: Sending SET_LK_VERSION
[2014-06-17 12:11:53.753320] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2014-06-17 12:11:53.753327] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 132, payload: 68, rpc hdr: 64
[2014-06-17 12:11:53.753344] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x32x Program: GlusterFS Handshake, ProgVers: 2, Proc: 4) to rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753353] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2014-06-17 12:11:53.753360] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 64, payload: 0, rpc hdr: 64
[2014-06-17 12:11:53.753373] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x33x Program: GlusterFS Handshake, ProgVers: 2, Proc: 3) to rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753381] I [afr-common.c:3698:afr_notify] 0-shares-replicate-2: Subvolume 'shares-client-5' came back up; going online.
[2014-06-17 12:11:53.753393] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2014-06-17 12:11:53.753399] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 84, payload: 20, rpc hdr: 64
[2014-06-17 12:11:53.753413] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x34x Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753430] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x32x Program: GlusterFS Handshake, ProgVers: 2, Proc: 4) from rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753441] I [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-5: Server lk version = 1
[2014-06-17 12:11:53.753451] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x33x Program: GlusterFS Handshake, ProgVers: 2, Proc: 3) from rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753474] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x34x Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) from rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753483] D [dht-diskusage.c:80:dht_du_info_cbk] 0-shares-dht: on subvolume 'shares-replicate-2': avail_percent is: 95.00 and avail_space is: 1050826719232 and avail_inodes is: 99.00


And here is a log snip from the non-working user:

[2014-06-17 12:07:17.866693] W [socket.c:514:__socket_rwv] 0-shares-client-13: readv failed (No data available)
[2014-06-17 12:07:17.866699] D [socket.c:1962:__socket_proto_state_machine] 0-shares-client-13: reading from socket failed. Error (No data available), peer (172.16.10.13:49155)
[2014-06-17 12:07:17.866707] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2014-06-17 12:07:17.866716] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-13: cleaning up state in transport object 0x7f22300aaa60
[2014-06-17 12:07:17.866722] I [client.c:2097:client_rpc_notify] 0-shares-client-13: disconnected
[2014-06-17 12:07:17.866735] E [afr-common.c:3735:afr_notify] 0-shares-replicate-6: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-06-17 12:07:17.866743] D [socket.c:486:__socket_rwv] 0-shares-client-14: EOF on socket
[2014-06-17 12:07:17.866750] W [socket.c:514:__socket_rwv] 0-shares-client-14: readv failed (No data available)
[2014-06-17 12:07:17.866755] D [socket.c:1962:__socket_proto_state_machine] 0-shares-client-14: reading from socket failed. Error (No data available), peer (172.16.10.12:49162)
[2014-06-17 12:07:17.866761] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2014-06-17 12:07:17.866769] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-14: cleaning up state in transport object 0x7f2230085b60
[2014-06-17 12:07:17.866775] I [client.c:2097:client_rpc_notify] 0-shares-client-14: disconnected
[2014-06-17 12:07:17.866781] D [glfs-master.c:106:notify] 0-gfapi: got notify event 8
[2014-06-17 12:07:17.866787] D [socket.c:486:__socket_rwv] 0-shares-client-15: EOF on socket
[2014-06-17 12:07:17.866801] W [socket.c:514:__socket_rwv] 0-shares-client-15: readv failed (No data available)
[2014-06-17 12:07:17.866807] D [socket.c:1962:__socket_proto_state_machine] 0-shares-client-15: reading from socket failed. Error (No data available), peer (172.16.10.14:49159)
[2014-06-17 12:07:17.866813] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2014-06-17 12:07:17.866820] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-15: cleaning up state in transport object 0x7f2230060c00
[2014-06-17 12:07:17.866827] I [client.c:2097:client_rpc_notify] 0-shares-client-15: disconnected
[2014-06-17 12:07:17.866832] E [afr-common.c:3735:afr_notify] 0-shares-replicate-7: All subvolumes are down. Going offline until atleast one of them comes back up.


Note that these log snips are from the same machine, minutes apart, with
the same config other than the username connecting to the share. It almost
appears as though the vfs_glusterfs interaction with the gluster volume
depends on the username.
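
One way to split the problem in half (a sketch; the mount point and user
below are placeholders) is to take Samba out of the picture and access the
same volume over a FUSE mount as the failing user:

mount -t glusterfs gfs-a-3:/shares /mnt/shares-test
sudo -u 'DOMAIN\baduser' ls /mnt/shares-test

If the FUSE mount works for that user, the fault would seem to sit on the
Samba/gfapi side rather than in the volume itself.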

I am trying to relate this to similar bugs I've been able to dig up
online. Is there a limit to the number of clients that a gluster node can
handle?
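
If such a limit exists, I would expect it to show up in the per-brick
client lists; assuming I'm reading the CLI correctly, something like this
should show how many clients each brick is serving:

gluster volume status shares clients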

What am I missing here?

Cheers,
Dave