[Bugs] [Bug 1449782] New: quota: limit-usage command failed with error "Failed to start aux mount"

bugzilla at redhat.com
Wed May 10 16:29:16 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1449782

            Bug ID: 1449782
           Summary: quota: limit-usage command failed with error "Failed
                    to start aux mount"
           Product: GlusterFS
           Version: 3.8
         Component: quota
          Severity: medium
          Assignee: sunnikri at redhat.com
          Reporter: sunnikri at redhat.com
                CC: amukherj at redhat.com, ashah at redhat.com,
                    asrivast at redhat.com, bugs at gluster.org,
                    nbalacha at redhat.com, rhinduja at redhat.com,
                    rhs-bugs at redhat.com, storage-qa-internal at redhat.com



+++ This bug was initially created as a clone of Bug #1433906 +++

Description of problem:

While running di-staf automation tests, after enabling quota on a volume, the
limit-usage command failed with the error "Failed to start aux mount".

How reproducible:

Intermittent 

Steps to Reproduce (see the command sketch below):
1. Create a 6x2 distributed-replicate volume
2. Start the volume
3. Enable quota
4. Set limit-usage
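
A minimal sketch of the above steps (server names and brick paths are
placeholders; the volume name matches the logs below):

# 1. Create a 6x2 distributed-replicate volume (12 bricks)
gluster volume create testvol0 replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2 \
    server1:/bricks/b3 server2:/bricks/b3 \
    server1:/bricks/b4 server2:/bricks/b4 \
    server1:/bricks/b5 server2:/bricks/b5 \
    server1:/bricks/b6 server2:/bricks/b6
# 2. Start the volume
gluster volume start testvol0
# 3. Enable quota
gluster volume quota testvol0 enable
# 4. Set a usage limit; this is the step that fails intermittently
gluster volume quota testvol0 limit-usage / 10GB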

Actual results:

limit-usage command failed with error "Failed to start aux mount"

Expected results:

The limit-usage command should not fail.

logs from glusterd
====================================
.so.0(runner_log+0x115) [0x7f3fc890e8d5] ) 0-management: Ran script:
/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=testvol0
--first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-01-19 07:42:41.402803] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already
stopped
[2017-01-19 07:42:41.402923] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: quotad service is
stopped
[2017-01-19 07:42:41.402952] I [MSGID: 106567]
[glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting quotad
service
[2017-01-19 07:43:34.720648] E [MSGID: 106176]
[glusterd-quota.c:1929:glusterd_create_quota_auxiliary_mount] 0-management:
Failed to mount glusterfs client. Please check the log file
/var/log/glusterfs/quota-mount-testvol0.log for more details [File exists]
[2017-01-19 07:43:34.720703] E [MSGID: 106528]
[glusterd-quota.c:2107:glusterd_op_stage_quota] 0-management: Failed to start
aux mount
[2017-01-19 07:43:34.720715] E [MSGID: 106301]
[glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of operation
'Volume Quota' failed on localhost : Failed to start aux mount
[2017-01-19 07:43:52.293410] W [socket.c:590:__socket_rwv] 0-management: readv
on 10.70.36.4:24007 failed (No data available)

====================================================

[2017-01-19 07:43:34.719329] I [MSGID: 100030] [glusterfsd.c:2412:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.4 (args:
/usr/sbin/glusterfs --volfile-server localhost --volfile-id testvol0 -l
/var/log/glusterfs/quota-mount-testvol0.log -p /var/run/gluster/testvol0.pid
--client-pid -5 /var/run/gluster/testvol0/)
[2017-01-19 07:43:34.719758] E [fuse-bridge.c:5518:init] 0-fuse: Mountpoint
/var/run/gluster/testvol0/ seems to have a stale mount, run 'umount
/var/run/gluster/testvol0/' and try again.
[2017-01-19 07:43:34.719775] E [MSGID: 101019] [xlator.c:433:xlator_init]
0-fuse: Initialization of volume 'fuse' failed, review your volfile again



--- Additional comment from Sanoj Unnikrishnan on 2017-01-19 09:17:32 EST ---

From the logs, the aux mount location /var/run/gluster/testvol0/ was not
cleanly unmounted after the previous run.

We can also infer that no client process was still serving the aux mount path.

The aux mount is created on the first limit/remove_limit/list command on the
volume and remains until the volume is stopped or deleted, or quota is
disabled on the volume (at which point we do a lazy unmount).

A lazy unmount would have instantaneously removed the path to the mount point;
since the path still exists, we can rule out a lazy unmount on this path.

Hence it looks like the process serving the aux mount was terminated
uncleanly, and so no lazy unmount was performed.
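
As the fuse log above suggests ("run 'umount /var/run/gluster/testvol0/' and
try again"), the stale mount can be cleared by hand before retrying; a minimal
recovery sketch using the paths from this report:

# confirm the stale aux mount (its client process is gone)
mount | grep /var/run/gluster/testvol0
# a lazy unmount detaches it even though the endpoint is disconnected
umount -l /var/run/gluster/testvol0/
# retry the command that failed
gluster volume quota testvol0 limit-usage / 10GB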



Create the volume, start and mount it, then enable quota:

[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

>> kill the glusterfs client process backing the aux mount (PID 9317 here)
[root@localhost mnt]# kill -9 9317

>> notice the stale mount on /run/gluster/v1
[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

>> accessing the stale mount now fails:
[root@localhost mnt]# ls /run/gluster/v1
ls: cannot access '/run/gluster/v1': Transport endpoint is not connected

--- Additional comment from Sanoj Unnikrishnan on 2017-01-23 04:09:20 EST ---

While the scenario leading to the stale mount hasn't been RCA'd, one plausible
approach to avoid the issue would be to have every quota command
(limit/remove_limit/list) unmount the aux path before it finishes; a sketch of
the intended flow follows.
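
Illustrative only (the actual change would live inside glusterd), but the
intended per-command lifecycle would look like this:

# today: the first quota command mounts the aux path and leaves it mounted
# proposed: each command mounts, operates, then unmounts before returning
gluster volume quota testvol0 limit-usage /dir1 1GB
umount -l /var/run/gluster/testvol0/   # done internally by glusterd, per command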

The reason this was not done in the first place was to avoid a fresh mount on
every subsequent command; that is a tiny performance gain we could do away
with.

Another risk of keeping the aux mount around too long is that if the user
inadvertently ran an 'rm' over /var/run, it could delete the volume's
persistent filesystem data through the mount.

While clearing /var/run is not expected, doing so shouldn't have such a side
effect (it being a temporary directory).
