[Gluster-users] Permission denied at some directories/files after a split brain

Mon Feb 10 15:28:54 UTC 2020

On February 10, 2020 3:53:08 PM GMT+02:00, Alberto Bengoa <bengoa at gmail.com> wrote:
>Hello guys,
>
>We are running GlusterFS 6.6 in Replicate mode (1 x 3). After a
>split-brain and a massive heal process, we noticed that our app started
>to receive thousands of "permission denied" errors while trying to
>access files and directories.
>
>Example log of a failed access attempt to a specific directory:
>
>[2020-02-10 10:38:17.402080] I [MSGID: 139001]
>[posix-acl.c:263:posix_acl_log_permit_denied]
>0-app_data-access-control:
>client:
>CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1,
>gfid: 092f1e28-d6a8-4ca9-95d5-75dc8ad1c835,
>req(uid:498,gid:498,perm:4,ngrps:1),
>ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
>[Permission denied]
>[2020-02-10 10:38:17.402182] E [MSGID: 115056]
>[server-rpc-fops_v2.c:687:server4_opendir_cbk] 0-app_data-server:
>6257941:
>OPENDIR /mailboxes.old/8692/211411002/Old
>(092f1e28-d6a8-4ca9-95d5-75dc8ad1c835), client:
>CTX_ID:7d744c50-43a1-4f81-9330-001b5dcaddb7-GRAPH_ID:0-PID:2310-HOST:ast10.local.domain-PC_NAME:app_data-client-1-RECON_NO:-1,
>error-xlator: app_data-access-control [Permission denied]
>
>The "permission denied" error happens only for unprivileged users, even
>if that unprivileged user is the directory owner. The root user is able
>to access all files, and if we "touch" the file/directory as root it
>*sometimes* fixes the problem.
>
>We noticed inconsistent Access/Change dates. Here is the stat output
>for a directory before touching it, showing these inconsistencies:
>
>  File: ‘Old’
>  Size: 4096       Blocks: 8          IO Block: 131072 directory
>Device: 27h/39d Inode: 10388898073370567318  Links: 2
>Access: (2775/drwxrwsr-x)  Uid: (  498/app)   Gid: (  498/app)
>Access: 1970-01-01 01:00:00.000000000 +0100
>Modify: 2020-02-07 13:21:10.365297527 +0000
>Change: 1970-01-01 01:00:00.000000000 +0100
> Birth: -
>
>I think this case is similar to the one reported here[1] and discussed
>in the thread "ACL issue v6.6, v6.7, v7.1, v7.2", despite the fact that
>we are not using libvirt. We do use ACLs, but not in this particular
>directory.
>
>Any thoughts on this?
>
>[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1797099
>
>Thanks,
>Alberto Bengoa

Hi Alberto,
Unfortunately, you will first have to verify whether this is the same issue.
Enable trace logging for the bricks and check whether the errors in the logs match those in the Bugzilla report.
Don't forget to turn trace logging off again afterwards, or your log directory will fill up.
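
A minimal sketch of that trace-logging step, assuming the volume is named app_data (inferred from the "0-app_data-server" prefix in the log above) and using the diagnostics.brick-log-level volume option. The helper name set_brick_log_level and the GLUSTER_BIN override are hypothetical and only there to allow a dry run:

```shell
# Hypothetical helper: set the brick log level on a volume.
# GLUSTER_BIN can be overridden (e.g. with "echo") for a dry run.
set_brick_log_level() {
    vol="$1"
    level="$2"
    "${GLUSTER_BIN:-gluster}" volume set "$vol" diagnostics.brick-log-level "$level"
}

# Example (app_data is an assumed volume name):
#   set_brick_log_level app_data TRACE   # then reproduce the failed access
#   set_brick_log_level app_data INFO    # switch back to the default level
```

With the default layout the brick logs should end up under /var/log/glusterfs/bricks/ on each server, so that is the directory to watch for disk usage while TRACE is active.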

What version of gluster are you using?
In my case only a downgrade restored the operation of the cluster, so you should consider that as an option (a last resort, but still an option).

You can also try running a find against the FUSE mount point: 'find /path/to/fuse -exec setfacl -m u:root:rw {} \;'
Maybe that will force gluster to read the ACLs again.
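
If walking the whole mount is too expensive, the find could perhaps be narrowed to the entries that look affected. The sketch below is only an assumption: it treats an access time still at the Unix epoch (as in the stat output quoted above) as the marker for a broken entry, which may not catch every case. The helper name list_epoch_atime is hypothetical, and the -newerat test requires GNU find:

```shell
# Hypothetical helper: list entries under a directory whose access time
# is earlier than 1970-01-02, i.e. effectively still at the Unix epoch.
# Requires GNU find for the -newerat test.
list_epoch_atime() {
    find "$1" ! -newerat '1970-01-02' -print
}

# Example, run as root against the FUSE mount (placeholder path):
#   list_epoch_atime /path/to/fuse
```

The same "! -newerat '1970-01-02'" test can be placed in front of the -exec in the setfacl command above, so that only the listed entries are touched.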

Good luck!
If you have the option, join the next gluster meeting and ask for an update (if the issue is actually the same).

Best Regards,
Strahil Nikolov