[Bugs] [Bug 1793852] Mounts fails after reboot of 1/3 gluster nodes

bugzilla at redhat.com bugzilla at redhat.com
Wed Jan 22 05:04:17 UTC 2020


https://bugzilla.redhat.com/show_bug.cgi?id=1793852



--- Comment #1 from Mohit Agrawal <moagrawa at redhat.com> ---
Reproducer:
1) Created a 3-node cluster (host10/11/12) with brick multiplexing (bmux) enabled.
2) Created 500 volumes, each either arbiter or replica 3 (x3).
3) Mounted all 500 volumes on 3 clients (rhsqa6/7/8).
4) Started a Linux untar on 4 volumes in parallel on each of the 3 clients
(i.e. 4 screen sessions per client, each session running the Linux untar
sequentially over its own set of 25 volumes).
5) Rebooted node rhsqa11.
6) After the reboot, checked all clients for volume mount sanity:
client_host -> all volumes mounted
client_host -> arb_b68n0p9myx2id failed to mount
[root@client_host glusterfs]# grep -r "failed: Authentication failed"
mnt-arb_b68n0p9myx2id.log:[2020-01-21 12:32:06.281545] E [MSGID: 114044]
[client-handshake.c:1031:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1:
SETVOLUME on remote-host failed: Authentication failed [Permission denied]
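Step 6 does not say how mount sanity was checked. The following is a minimal sketch, assuming each volume is mounted at /mnt/<volname> as in the log below. It treats a directory as a live mount point when its device ID differs from its parent's. The function name is_mount_point is hypothetical and is not part of GlusterFS.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path is a mount point, 0 if it is a
 * plain directory, -1 if stat() fails (missing path, or e.g. ENOTCONN
 * on a dead FUSE mount). */
int is_mount_point(const char *path)
{
    char parent[4096];
    struct stat self, up;

    if (stat(path, &self) != 0)
        return -1;

    /* Build "<path>/.." so the parent directory can be stat()ed. */
    int n = snprintf(parent, sizeof parent, "%s/..", path);
    if (n < 0 || (size_t)n >= sizeof parent)
        return -1;
    if (stat(parent, &up) != 0)
        return -1;

    /* A different device means another filesystem is mounted here. */
    if (self.st_dev != up.st_dev)
        return 1;
    /* Same device and same inode only happens for the root "/". */
    return self.st_ino == up.st_ino;
}
```

Calling it for each /mnt/<volname> path would flag the arb_b68n0p9myx2id case: once fini has unmounted the volume, the directory reverts to a plain directory and the function returns 0. A return of -1 usually points to a stale FUSE mount. Bind mounts of the same filesystem are not detected by this check.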

RCA:
Below are the client logs emitted when client_setvolume_cbk failed.
As they show, the client gets the error only for one brick (client-1) and not
for the other client xlators, which means those were already connected. The
client then receives an AUTH_FAILED event, and when FUSE gets AUTH_FAILED it
calls fini, so the volume is unmounted.
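The failure sequence described above can be modelled as a small state machine: a brick disconnect is tolerated because the client keeps retrying through glusterd, but AUTH_FAILED shuts the mount down. This is a simplified sketch with hypothetical names (event_t, mount_survives); the real notify() in fuse-bridge.c handles many more events and conditions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced event set; the real GlusterFS event list is larger. */
typedef enum { EV_CHILD_DOWN, EV_CHILD_UP, EV_AUTH_FAILED } event_t;

/* Returns true if, in this model, the FUSE mount is still up after
 * processing the given sequence of events from the client xlators. */
bool mount_survives(const event_t *events, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (events[i]) {
        case EV_CHILD_DOWN:
        case EV_CHILD_UP:
            /* Not fatal: the client keeps asking glusterd for the brick
             * port and reconnects once it is available. */
            break;
        case EV_AUTH_FAILED:
            /* Fatal: FUSE logs the failure, calls fini and unmounts,
             * as happened to the arb_b68n0p9myx2id mount. */
            return false;
        }
    }
    return true;
}
```

In the log below, the disconnect at 12:26:48 and the reconnect attempts did not affect the mount; only the AUTH_FAILED event at 12:32:06 did.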

>>>>>>>>>>>>>>>.

[2020-01-21 11:43:01.806402] I [fuse-bridge.c:5840:fuse_graph_sync] 0-fuse:
switched to graph 0
[2020-01-21 12:26:48.626004] I [MSGID: 114018]
[client.c:2331:client_rpc_notify] 0-arb_b68n0p9myx2id-client-1: disconnected
from arb_b68n0p9myx2id-client-1. Client process will keep trying to connect to
glusterd until brick's port is available
[2020-01-21 12:31:56.095015] E [MSGID: 114058]
[client-handshake.c:1449:client_query_portmap_cbk]
0-arb_b68n0p9myx2id-client-1: failed to get the port number for remote
subvolume. Please run 'gluster volume status' on server to see if brick process
is running.
[2020-01-21 12:31:56.095094] I [MSGID: 114018]
[client.c:2331:client_rpc_notify] 0-arb_b68n0p9myx2id-client-1: disconnected
from arb_b68n0p9myx2id-client-1. Client process will keep trying to connect to
glusterd until brick's port is available
[2020-01-21 12:32:06.071586] I [rpc-clnt.c:2035:rpc_clnt_reconfig]
0-arb_b68n0p9myx2id-client-1: changing port to 49152 (from 0)
[2020-01-21 12:32:06.281470] W [MSGID: 114043]
[client-handshake.c:997:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1:
failed to set the volume [Permission denied]
[2020-01-21 12:32:06.281528] W [MSGID: 114007]
[client-handshake.c:1026:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1:
failed to get 'process-uuid' from reply dict [Invalid argument]
[2020-01-21 12:32:06.281545] E [MSGID: 114044]
[client-handshake.c:1031:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1:
SETVOLUME on remote-host failed: Authentication failed [Permission denied]
[2020-01-21 12:32:06.281558] I [MSGID: 114049]
[client-handshake.c:1115:client_setvolume_cbk] 0-arb_b68n0p9myx2id-client-1:
sending AUTH_FAILED event
[2020-01-21 12:32:06.281596] E [fuse-bridge.c:6358:notify] 0-fuse: Server
authenication failed. Shutting down.
[2020-01-21 12:32:06.281609] I [fuse-bridge.c:6900:fini] 0-fuse: Unmounting
'/mnt/arb_b68n0p9myx2id'.
[2020-01-21 12:32:06.309745] I [fuse-bridge.c:6106:fuse_thread_proc] 0-fuse:
initating unmount of /mnt/arb_b68n0p9myx2id
[2020-01-21 12:32:06.309916] I [fuse-bridge.c:6905:fini] 0-fuse: Closing fuse
connection to '/mnt/arb_b68n0p9myx2id'.
[2020-01-21 12:32:06.311119] W [glusterfsd.c:1581:cleanup_and_exit]
(-->/lib64/libpthread.so.0(+0x7ea5) [0x7f3a88e0cea5]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55a28f6002b5]
-->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55a28f60011b] ) 0-: received
signum (15), shutting down

>>>>>>>>>>>>>>>>>>>>

The client got "Permission denied" because the brick was not yet attached at
that moment. server_setvolume executes the code below before authenticating a
client request. If get_xlator_by_name returns NULL, xl is set to this (the
server xlator itself). In other words, when no xlator with the requested name
(the brick) is found in the graph, the brick process assumes the client should
connect to the already running brick; gf_authenticate then fails and returns
EPERM.

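    LOCK(&ctx->volfile_lock);
    {
        /* Look up the brick xlator requested by the client. */
        xl = get_xlator_by_name(this, name);
        /* Brick not attached yet: fall back to the server xlator itself;
         * gf_authenticate later fails against it with EPERM. */
        if (!xl)
            xl = this;
    }
    UNLOCK(&ctx->volfile_lock);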


This condition needs to be corrected to avoid the issue. The code was changed
by this patch: https://review.gluster.org/#/c/glusterfs/+/18048/
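The condition above, and one possible correction, can be sketched with a toy model. Everything here is hypothetical: xlator_t is reduced to a name, authenticate() stands in for gf_authenticate, and having the server reply -ENOENT so that the client retries instead of raising AUTH_FAILED is an assumed fix, not the confirmed upstream change.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an xlator: just a name (not GlusterFS's xlator_t). */
typedef struct { const char *name; } xlator_t;

/* Linear lookup over the bricks currently attached to the brick process. */
static xlator_t *lookup_brick(xlator_t *bricks, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(bricks[i].name, name) == 0)
            return &bricks[i];
    return NULL;
}

/* Toy authenticator: succeeds only if the chosen xlator is the brick the
 * client asked for (stands in for gf_authenticate and its auth options). */
static int authenticate(const xlator_t *xl, const char *requested)
{
    return strcmp(xl->name, requested) == 0 ? 0 : -EPERM;
}

/* Current behaviour: fall back to the server xlator when the brick is not
 * attached yet, so authentication fails with -EPERM. */
int setvolume_current(xlator_t *self, xlator_t *bricks, size_t n,
                      const char *name)
{
    xlator_t *xl = lookup_brick(bricks, n, name);
    if (!xl)
        xl = self;
    return authenticate(xl, name);
}

/* Hypothetical correction: report "brick not attached yet" with a
 * distinct error instead of authenticating against the wrong xlator. */
int setvolume_fixed(xlator_t *bricks, size_t n, const char *name)
{
    xlator_t *xl = lookup_brick(bricks, n, name);
    if (!xl)
        return -ENOENT;
    return authenticate(xl, name);
}

/* Hypothetical client-side counterpart: only a genuine auth rejection
 * is fatal; a missing brick is treated as "retry later". */
typedef enum { REPLY_OK, REPLY_RETRY, REPLY_AUTH_FAILED } reply_action_t;

reply_action_t classify_setvolume_reply(int ret)
{
    if (ret == 0)
        return REPLY_OK;
    if (ret == -ENOENT)
        return REPLY_RETRY;
    return REPLY_AUTH_FAILED;
}
```

The design point the model illustrates is that "brick not attached yet" and "client not authorized" need distinct replies; otherwise the client cannot tell a transient condition after a reboot from a real permission failure.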


Thanks,
Mohit Agrawal
