[Gluster-users] Samba vfs_glusterfs no such file or directory

Niels de Vos ndevos at redhat.com
Mon Jun 23 15:50:19 UTC 2014


On Tue, Jun 17, 2014 at 08:37:54AM -0400, David Gibbons wrote:
> Hi All,
> 
> I am running into a strange error with samba and vfs_glusterfs.
> 
> Here is some version information:
> [root@gfs-a-3 samba]# smbd -V
> Version 3.6.20
> 
> [root@gfs-a-3 tmp]# glusterfsd --version
> glusterfs 3.4.1 built on Oct 21 2013 09:23:23
> 
> Samba is configured in an AD environment, using winbind. Group resolution,
> user resolution, and cross-mapping of SIDs to IDs to usernames all work
> as expected. The vfs_glusterfs module is working perfectly for the vast
> majority of the users I have configured. A small percentage of the users,
> though, get an "access is denied" error when they attempt to access the
> share. They are all configured in the same way as the users that are
> working.
> 
> We initially thought that perhaps the number of groups the user was a
> member of was causing the issue. This still might be the case but we're not
> sure how to verify that guess.

Samba with vfs_glusterfs has a limit of approximately 93 groups. If 
'id $USER' returns more than 93 groups, those users can run into various 
issues; 'access is denied' is one of the most common errors they'll see.
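One way to verify that guess is to count the supplementary groups for a
working user and a failing user and compare; a sketch (the usernames are
placeholders for accounts on your AD domain):

```shell
# Count supplementary groups per user; with vfs_glusterfs, more than
# ~93 groups can produce "access is denied". Compare a working and a
# failing account (names below are hypothetical).
for u in workinguser failinguser; do
    printf '%s: %s groups\n' "$u" "$(id -G "$u" | wc -w)"
done
```

If the failing users consistently come back above ~93 while the working
ones stay below, that points strongly at the group limit.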

The upcoming 3.5.1 release adds a 'server.manage-gids' volume option.  
With this option enabled, the groups are resolved on the server side, 
raising the limit to 65535 groups.
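Once you are on 3.5.1, enabling it is a single volume-set command; a
sketch, assuming the volume is named 'shares' as in the logs below:

```shell
# Resolve group membership on the bricks instead of sending it over
# RPC (requires GlusterFS >= 3.5.1; 'shares' is the volume name taken
# from the log lines in this thread).
gluster volume set shares server.manage-gids on
```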

> When we connect with a working user, with glusterfs:loglevel = 10, here is
> are the last bits of log file. I'm not really sure where the interesting
> lines are, any guidance would be much appreciated:
> 
> [2014-06-17 12:11:53.753289] D
> > [client-handshake.c:1430:client_setvolume_cbk] 0-shares-client-5:
> > clnt-lk-version = 1, server-lk-version = 0
> > [2014-06-17 12:11:53.753296] I
> > [client-handshake.c:1456:client_setvolume_cbk] 0-shares-client-5: Connected
> > to 172.16.10.13:49153, attached to remote volume
> > '/mnt/a-3-shares-brick-2/brick'.
> > [2014-06-17 12:11:53.753301] I
> > [client-handshake.c:1468:client_setvolume_cbk] 0-shares-client-5: Server
> > and Client lk-version numbers are not same, reopening the fds
> > [2014-06-17 12:11:53.753306] D
> > [client-handshake.c:1318:client_post_handshake] 0-shares-client-5: No fds
> > to open - notifying all parents child up
> > [2014-06-17 12:11:53.753313] D
> > [client-handshake.c:486:client_set_lk_version] 0-shares-client-5: Sending
> > SET_LK_VERSION
> > [2014-06-17 12:11:53.753320] T [rpc-clnt.c:1302:rpc_clnt_record]
> > 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
> > [2014-06-17 12:11:53.753327] T
> > [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
> > 132, payload: 68, rpc hdr: 64
> > [2014-06-17 12:11:53.753344] T [rpc-clnt.c:1499:rpc_clnt_submit]
> > 0-rpc-clnt: submitted request (XID: 0x32x Program: GlusterFS Handshake,
> > ProgVers: 2, Proc: 4) to rpc-transport (shares-client-5)
> > [2014-06-17 12:11:53.753353] T [rpc-clnt.c:1302:rpc_clnt_record]
> > 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
> > [2014-06-17 12:11:53.753360] T
> > [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
> > 64, payload: 0, rpc hdr: 64
> > [2014-06-17 12:11:53.753373] T [rpc-clnt.c:1499:rpc_clnt_submit]
> > 0-rpc-clnt: submitted request (XID: 0x33x Program: GlusterFS Handshake,
> > ProgVers: 2, Proc: 3) to rpc-transport (shares-client-5)
> > [2014-06-17 12:11:53.753381] I [afr-common.c:3698:afr_notify]
> > 0-shares-replicate-2: Subvolume 'shares-client-5' came back up; going
> > online.
> > [2014-06-17 12:11:53.753393] T [rpc-clnt.c:1302:rpc_clnt_record]
> > 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
> > [2014-06-17 12:11:53.753399] T
> > [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen
> > 84, payload: 20, rpc hdr: 64
> > [2014-06-17 12:11:53.753413] T [rpc-clnt.c:1499:rpc_clnt_submit]
> > 0-rpc-clnt: submitted request (XID: 0x34x Program: GlusterFS 3.3, ProgVers:
> > 330, Proc: 14) to rpc-transport (shares-client-5)
> > [2014-06-17 12:11:53.753430] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-shares-client-5: received rpc message (RPC XID: 0x32x Program: GlusterFS
> > Handshake, ProgVers: 2, Proc: 4) from rpc-transport (shares-client-5)
> > [2014-06-17 12:11:53.753441] I
> > [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-5:
> > Server lk version = 1
> > [2014-06-17 12:11:53.753451] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-shares-client-5: received rpc message (RPC XID: 0x33x Program: GlusterFS
> > Handshake, ProgVers: 2, Proc: 3) from rpc-transport (shares-client-5)
> > [2014-06-17 12:11:53.753474] T [rpc-clnt.c:669:rpc_clnt_reply_init]
> > 0-shares-client-5: received rpc message (RPC XID: 0x34x Program: GlusterFS
> > 3.3, ProgVers: 330, Proc: 14) from rpc-transport (shares-client-5)
> > [2014-06-17 12:11:53.753483] D [dht-diskusage.c:80:dht_du_info_cbk]
> > 0-shares-dht: on subvolume 'shares-replicate-2': avail_percent is: 95.00
> > and avail_space is: 1050826719232 and avail_inodes is: 99.00
> 
> 
> And here is a log snip from the non-working user:
> 
> [2014-06-17 12:07:17.866693] W [socket.c:514:__socket_rwv]
> > 0-shares-client-13: readv failed (No data available)
> > [2014-06-17 12:07:17.866699] D
> > [socket.c:1962:__socket_proto_state_machine] 0-shares-client-13: reading
> > from socket failed. Error (No data available), peer (172.16.10.13:49155)
> > [2014-06-17 12:07:17.866707] D [socket.c:2236:socket_event_handler]
> > 0-transport: disconnecting now
> > [2014-06-17 12:07:17.866716] T
> > [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-13: cleaning
> > up state in transport object 0x7f22300aaa60
> > [2014-06-17 12:07:17.866722] I [client.c:2097:client_rpc_notify]
> > 0-shares-client-13: disconnected
> > [2014-06-17 12:07:17.866735] E [afr-common.c:3735:afr_notify]
> > 0-shares-replicate-6: All subvolumes are down. Going offline until atleast
> > one of them comes back up.
> > [2014-06-17 12:07:17.866743] D [socket.c:486:__socket_rwv]
> > 0-shares-client-14: EOF on socket
> > [2014-06-17 12:07:17.866750] W [socket.c:514:__socket_rwv]
> > 0-shares-client-14: readv failed (No data available)
> > [2014-06-17 12:07:17.866755] D
> > [socket.c:1962:__socket_proto_state_machine] 0-shares-client-14: reading
> > from socket failed. Error (No data available), peer (172.16.10.12:49162)
> > [2014-06-17 12:07:17.866761] D [socket.c:2236:socket_event_handler]
> > 0-transport: disconnecting now
> > [2014-06-17 12:07:17.866769] T
> > [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-14: cleaning
> > up state in transport object 0x7f2230085b60
> > [2014-06-17 12:07:17.866775] I [client.c:2097:client_rpc_notify]
> > 0-shares-client-14: disconnected
> > [2014-06-17 12:07:17.866781] D [glfs-master.c:106:notify] 0-gfapi: got
> > notify event 8
> > [2014-06-17 12:07:17.866787] D [socket.c:486:__socket_rwv]
> > 0-shares-client-15: EOF on socket
> > [2014-06-17 12:07:17.866801] W [socket.c:514:__socket_rwv]
> > 0-shares-client-15: readv failed (No data available)
> > [2014-06-17 12:07:17.866807] D
> > [socket.c:1962:__socket_proto_state_machine] 0-shares-client-15: reading
> > from socket failed. Error (No data available), peer (172.16.10.14:49159)
> > [2014-06-17 12:07:17.866813] D [socket.c:2236:socket_event_handler]
> > 0-transport: disconnecting now
> > [2014-06-17 12:07:17.866820] T
> > [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-15: cleaning
> > up state in transport object 0x7f2230060c00
> > [2014-06-17 12:07:17.866827] I [client.c:2097:client_rpc_notify]
> > 0-shares-client-15: disconnected
> > [2014-06-17 12:07:17.866832] E [afr-common.c:3735:afr_notify]
> > 0-shares-replicate-7: All subvolumes are down. Going offline until atleast
> > one of them comes back up.
> 
> 
> Note that these log snips are from the same machine, minutes apart, same
> config other than the username that is connecting to the share. It almost
> appears as though the vfs_glusterfs interaction with the gluster volume is
> related to the username.

I'm not sure exactly how vfs_glusterfs fails when the user belongs to 
more than 93 groups. Sending the READ procedure would fail, and that may 
lead to the incorrect assumption that the bricks are unreachable.

> I am trying to relate this to other similar bugs I've been able to dig up
> online. Is there a limit to the number of clients that a gluster node can
> handle?

No, not that I am aware of.

> What am I missing here?

Very little. I would also suspect that the number of groups those
problematic users belong to is too large.

HTH,
Niels
