[Bugs] [Bug 1467513] CIFS:[USS]: .snaps is not accessible from the CIFS client after volume stop/start

bugzilla at redhat.com bugzilla at redhat.com
Tue Jul 4 06:19:29 UTC 2017


https://bugzilla.redhat.com/show_bug.cgi?id=1467513

Mohammed Rafi KC <rkavunga at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED



--- Comment #1 from Mohammed Rafi KC <rkavunga at redhat.com> ---
Description of problem:
After the volume is stopped and started, browsing .snaps from the CIFS client fails.

Version-Release number of selected component (if applicable):
[root at gqas005 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-fuse-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.42.1-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
rhs-tests-rhs-tests-beaker-rhs-gluster-qe-libs-dev-bturner-2.37-0.noarch
glusterfs-libs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-api-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-cli-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.42.1-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-server-3.6.0.42.1-1.el6rhs.x86_64
[root at gqas005 ~]# 

How reproducible:
Intermittent


Steps to Reproduce:
1. Create a 6*2 dist-rep volume and start it
2. Mount the volume on the Windows client
3. Enable USS and run I/O at the CIFS mount point
4. Create 256 snapshots for the volume
5. While accessing the 256 snaps present in <Drive>:\.snaps, stop the gluster volume
6. Start the gluster volume again and try to access the .snaps dir

Actual results:
snapd is down and .snaps is not accessible: snapd cannot bind to its existing
socket because of the "Address already in use" error.

This can also happen when snapd is killed for some reason and the kernel still
holds the ports. Ideally, if snapd cannot get its old port, it should bind to a
different free port so that .snaps stays accessible from the client.
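
For illustration, the fallback being asked for boils down to retrying the bind
on a kernel-chosen free port when the preferred one is taken. Below is a
minimal plain-C sketch of that pattern; it is not GlusterFS code, the port
49959 is simply taken from the snapd log further down, and everything else is
illustrative.

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try the preferred port first; if it is still held (EADDRINUSE), fall back
 * to an ephemeral port chosen by the kernel instead of failing the whole
 * listener setup. */
static int bind_with_fallback(int sock, uint16_t preferred_port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(preferred_port);

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return 0;
    if (errno != EADDRINUSE)
        return -1;                      /* a real error, give up */

    fprintf(stderr, "port %u is busy, falling back to an ephemeral port\n",
            (unsigned)preferred_port);
    addr.sin_port = htons(0);           /* 0 = let the kernel pick a free port */
    return bind(sock, (struct sockaddr *)&addr, sizeof(addr));
}

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0 || bind_with_fallback(sock, 49959) < 0) {
        perror("bind_with_fallback");
        return 1;
    }
    listen(sock, 16);
    /* A daemon doing this would also have to report the port it actually
     * bound to (e.g. via getsockname()) back to whatever clients use for
     * port discovery, such as glusterd's portmapper in this case. */
    close(sock);
    return 0;
}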

Expected results:
snapd should come up successfully.

Additional info:

Workaround: re-running "gluster vol set testvol1 features.uss enable" brought
snapd back up (see the transcript below).

snapd.log shows the following:
=============================
[2015-02-10 09:42:43.617728] W [options.c:898:xl_opt_validate]
0-testvol1-server: option 'listen-port' is deprecated, preferred is
'transport.socket.listen-port', continuing with correction
[2015-02-10 09:42:43.617825] E [socket.c:711:__socket_server_bind]
0-tcp.testvol1-server: binding to  failed: Address already in use
[2015-02-10 09:42:43.617840] E [socket.c:714:__socket_server_bind]
0-tcp.testvol1-server: Port is already in use
[2015-02-10 09:42:43.617859] W [rpcsvc.c:1531:rpcsvc_transport_create]
0-rpc-service: listening on transport failed
[2015-02-10 09:42:43.617872] W [server.c:911:init] 0-testvol1-server: creation
of listener failed
[2015-02-10 09:42:43.617883] E [xlator.c:406:xlator_init] 0-testvol1-server:
Initialization of volume 'testvol1-server' failed, review your volfile again
[2015-02-10 09:42:43.617894] E [graph.c:322:glusterfs_graph_init]
0-testvol1-server: initializing translator failed
[2015-02-10 09:42:43.617904] E [graph.c:525:glusterfs_graph_activate] 0-graph:
init failed
[2015-02-10 09:42:43.618221] W [glusterfsd.c:1183:cleanup_and_exit] (--> 0-:
received signum (0), shutting down
[2015-02-10 10:31:33.327204] I [MSGID: 100030] [glusterfsd.c:2016:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.6.0.42.1
(args: /usr/sbin/glusterfsd -s localhost --volfile-id snapd/testvol1 -p
/var/lib/glusterd/vols/testvol1/run/testvol1-snapd.pid -l
/var/log/glusterfs/snaps/testvol1/snapd.log --brick-name snapd-testvol1 -S
/var/run/c3bc0889c974e54aaf844607b33c8054.socket --brick-port 49959
--xlator-option testvol1-server.listen-port=49959 --no-mem-accounting)
[2015-02-10 10:31:34.202665] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2015-02-10 10:31:34.702169] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2015-02-10 10:31:35.160011] I [glusterfsd-mgmt.c:56:mgmt_cbk_spec] 0-mgmt:
Volume file changed
[2015-02-10 10:31:35.187382] I [graph.c:269:gf_add_cmdline_options]
0-testvol1-server: adding option 'listen-port' for volume 'testvol1-server'
with value '49959'
[2015-02-10 10:31:35.225011] I [rpcsvc.c:2142:rpcsvc_set_outstanding_rpc_limit]
0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2015-02-10 10:31:35.225094] W [options.c:898:xl_opt_validate]
0-testvol1-server: option 'listen-port' is deprecated, preferred is
'transport.socket.listen-port', continuing with correction
[2015-02-10 10:31:35.234311] W [graph.c:344:_log_if_unknown_option]
0-testvol1-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2015-02-10 10:31:35.234355] W [graph.c:344:_log_if_unknown_option]
0-testvol1-server: option 'rpc-auth.auth-unix' is not recognized
[2015-02-10 10:31:35.234386] W [graph.c:344:_log_if_unknown_option]
0-testvol1-server: option 'rpc-auth.auth-null' is not recognized

[root at gqas005 ~]# gluster volume info

Volume Name: testvol1
Type: Distributed-Replicate
Volume ID: df8c4ec8-714f-4c58-8a34-65fe8c170dd9
Status: Started
Snap Volume: no
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1
Brick2: gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3
Brick4: gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4
Brick5: gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5
Brick6: gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6
Brick7: gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7
Brick8: gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8
Brick9: gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9
Brick10: gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10
Brick11: gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11
Brick12: gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12
Options Reconfigured:
server.allow-insecure: enable
storage.batch-fsync-delay-usec: 0
features.quota: on
features.uss: enable
performance.readdir-ahead: on
features.show-snapshot-directory: enable
performance.stat-prefetch: enable
performance.io-cache: enable
features.quota-deem-statfs: enable
features.barrier: disable
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
[root at gqas005 ~]# 



[root at gqas005 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-fuse-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.42.1-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
rhs-tests-rhs-tests-beaker-rhs-gluster-qe-libs-dev-bturner-2.37-0.noarch
glusterfs-libs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-api-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-cli-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.42.1-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-server-3.6.0.42.1-1.el6rhs.x86_64
[root at gqas005 ~]# 


[root at gqas005 ~]# gluster volume status testvol1
Status of volume: testvol1
Gluster process                        Port    Online    Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1    49153    Y    25691
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2    49153    Y    23636
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3    49153    Y    22686
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4    49154    Y    27794
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5    49155    Y    27813
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6    49154    Y    22697
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7    49154    Y    25709
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8    49154    Y    23647
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9    49155    Y    22708
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10    49156    Y    27824
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11    49155    Y    25721
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12    49155    Y    23658
Snapshot Daemon on localhost                N/A    N    N/A  ---> snapd is down
NFS Server on localhost                    2049    Y    27844
Self-heal Daemon on localhost                N/A    Y    27855
Quota Daemon on localhost                N/A    Y    27862
Snapshot Daemon on gqas009.sbu.lab.eng.bos.redhat.com    49925    Y    25733
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com    2049    Y    25758
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com    N/A    Y    25793
Quota Daemon on gqas009.sbu.lab.eng.bos.redhat.com    N/A    Y    25830
Snapshot Daemon on gqas006.sbu.lab.eng.bos.redhat.com    49925    Y    22720
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com    2049    Y    22727
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com    N/A    Y    22734
Quota Daemon on gqas006.sbu.lab.eng.bos.redhat.com    N/A    Y    22741
Snapshot Daemon on gqas012.sbu.lab.eng.bos.redhat.com    49925    Y    23670
NFS Server on gqas012.sbu.lab.eng.bos.redhat.com    2049    Y    23678
Self-heal Daemon on gqas012.sbu.lab.eng.bos.redhat.com    N/A    Y    23685
Quota Daemon on gqas012.sbu.lab.eng.bos.redhat.com    N/A    Y    23692

Task Status of Volume testvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# less /var/log/glusterfs/
bricks/                                  geo-replication-slaves/                  quotad.log
cli.log                                  glustershd.log                           quotad.log-20150208
cli.log-20150208                         glustershd.log-20150208                  quota-mount-testvol1.log
.cmd_log_history                         nfs.log                                  quota-mount-testvol1.log-20150208
etc-glusterfs-glusterd.vol.log           nfs.log-20150208                         quota-mount-testvol.log
etc-glusterfs-glusterd.vol.log-20150208  quota-crawl.log                          quota-mount-testvol.log-20150208
geo-replication/                         quota-crawl.log-20150208                 snaps/
[root at gqas005 ~]# less /var/log/glusterfs/s
/var/log/glusterfs/s: No such file or directory
[root at gqas005 ~]# 
[root at gqas005 ~]# rpm -qa | grep gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-fuse-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.42.1-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
rhs-tests-rhs-tests-beaker-rhs-gluster-qe-libs-dev-bturner-2.37-0.noarch
glusterfs-libs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-api-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-cli-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.42.1-1.el6rhs.x86_64
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-3.6.0.42.1-1.el6rhs.x86_64
glusterfs-server-3.6.0.42.1-1.el6rhs.x86_64
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# gluster vol set testvol1 features.uss enable force
Usage: volume set <VOLNAME> <KEY> <VALUE>
[root at gqas005 ~]# gluster vol set testvol1 features.uss enable
volume set: success
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# gluster vol status
Status of volume: testvol1
Gluster process                        Port    Online    Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1    49153    Y    25691
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2    49153    Y    23636
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3    49153    Y    22686
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4    49154    Y    27794
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5    49155    Y    27813
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6    49154    Y    22697
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7    49154    Y    25709
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8    49154    Y    23647
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9    49155    Y    22708
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10    49156    Y    27824
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11    49155    Y    25721
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12    49155    Y    23658
Snapshot Daemon on localhost                49959    Y    29088
NFS Server on localhost                    2049    Y    27844
Self-heal Daemon on localhost                N/A    Y    27855
Quota Daemon on localhost                N/A    Y    27862
Snapshot Daemon on gqas006.sbu.lab.eng.bos.redhat.com    49925    Y    22720
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com    2049    Y    22727
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com    N/A    Y    22734
Quota Daemon on gqas006.sbu.lab.eng.bos.redhat.com    N/A    Y    22741
Snapshot Daemon on gqas009.sbu.lab.eng.bos.redhat.com    49925    Y    25733
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com    2049    Y    25758
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com    N/A    Y    25793
Quota Daemon on gqas009.sbu.lab.eng.bos.redhat.com    N/A    Y    25830
Snapshot Daemon on gqas012.sbu.lab.eng.bos.redhat.com    49925    Y    23670
NFS Server on gqas012.sbu.lab.eng.bos.redhat.com    2049    Y    23678
Self-heal Daemon on gqas012.sbu.lab.eng.bos.redhat.com    N/A    Y    23685
Quota Daemon on gqas012.sbu.lab.eng.bos.redhat.com    N/A    Y    23692

Task Status of Volume testvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# 
[root at gqas005 ~]# gluster volume stop testvol1
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
y
volume stop: testvol1: success
[root at gqas005 ~]# gluster volume status
Volume testvol1 is not started

[root at gqas005 ~]# gluster volume start testvol1
volume start: testvol1: success
[root at gqas005 ~]# gluster volume status
Status of volume: testvol1
Gluster process                        Port    Online    Pid
------------------------------------------------------------------------------
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick1/b1    49153    Y    27126
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick2/b2    49153    Y    25068
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick3/b3    49153    Y    24140
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick4/b4    49154    Y    30889
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick5/b5    49155    Y    30900
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick6/b6    49154    Y    24151
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick7/b7    49154    Y    27137
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick8/b8    49154    Y    25083
Brick gqas006.sbu.lab.eng.bos.redhat.com:/rhs/brick9/b9    49155    Y    24162
Brick gqas005.sbu.lab.eng.bos.redhat.com:/rhs/brick10/b10    49156    Y    30911
Brick gqas009.sbu.lab.eng.bos.redhat.com:/rhs/brick11/b11    49155    Y    27148
Brick gqas012.sbu.lab.eng.bos.redhat.com:/rhs/brick12/b12    49155    Y    25095
Snapshot Daemon on localhost                49959    Y    30923
NFS Server on localhost                    2049    Y    30930
Self-heal Daemon on localhost                N/A    Y    30937
Quota Daemon on localhost                N/A    Y    30944
Snapshot Daemon on gqas009.sbu.lab.eng.bos.redhat.com    49925    Y    27160
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com    N/A    N    N/A
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com    N/A    N    N/A
Quota Daemon on gqas009.sbu.lab.eng.bos.redhat.com    N/A    N    N/A
Snapshot Daemon on gqas012.sbu.lab.eng.bos.redhat.com    49925    Y    25108
NFS Server on gqas012.sbu.lab.eng.bos.redhat.com    2049    Y    25116
Self-heal Daemon on gqas012.sbu.lab.eng.bos.redhat.com    N/A    Y    25124
Quota Daemon on gqas012.sbu.lab.eng.bos.redhat.com    N/A    Y    25131
Snapshot Daemon on gqas006.sbu.lab.eng.bos.redhat.com    49925    Y    24174
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com    2049    Y    24181
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com    N/A    Y    24188
Quota Daemon on gqas006.sbu.lab.eng.bos.redhat.com    N/A    Y    24195

Task Status of Volume testvol1


--- Additional comment from Mohammed Rafi KC on 2017-06-30 12:11:44 EDT ---

RCA:

On a Windows client, .snaps is treated as a special directory, whereas on all
other clients it is a virtual directory. For this reason, a lookup on the root
has to return the snapshot entries to the Windows client. Even though .snaps is
a special directory, it does not have a dedicated inode, which means it gets a
different gfid each time, and that gfid is not present in the backend. When
snapd restarts, the gfid on the backend side is lost, but the client still
holds the older gfid, so the client's lookup fails with ESTALE. Usually, when a
lookup returns ESTALE, we retry with a new inode, but that retry is missing in
this code path, because this was a special lookup issued during readdirp on the
root.
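
For illustration only, the missing retry amounts to the pattern sketched below.
This is a self-contained simulation in plain C, not the actual GlusterFS client
code: inode_t, do_lookup(), the fresh-inode handling and the gfid strings are
all invented here to mimic the situation described in the RCA.

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the client-side lookup path; none of these are
 * real GlusterFS symbols, they only simulate the behaviour from the RCA. */
typedef struct { char gfid[32]; } inode_t;

static int snapd_generation = 2;   /* pretend snapd has restarted once */

/* Simulated server-side lookup: an empty gfid means "nothing cached yet" and
 * gets a fresh gfid assigned; a gfid minted before the snapd restart is
 * rejected, mirroring the ESTALE described in the RCA. */
static int do_lookup(const char *path, inode_t *in)
{
    char current[32];

    (void)path;
    snprintf(current, sizeof(current), "gfid-gen-%d", snapd_generation);
    if (in->gfid[0] == '\0' || strcmp(in->gfid, current) == 0) {
        snprintf(in->gfid, sizeof(in->gfid), "%s", current);
        return 0;
    }
    return -ESTALE;
}

/* The usual ESTALE handling: if the server no longer recognises the cached
 * gfid, retry once with a fresh inode so a new gfid can be negotiated.  Per
 * the RCA above, this retry is what is missing for the special .snaps lookup
 * issued during readdirp on the root. */
static int lookup_with_estale_retry(const char *path, inode_t *cached)
{
    int ret = do_lookup(path, cached);

    if (ret == -ESTALE) {
        fprintf(stderr, "lookup(%s): stale gfid %s, retrying with a fresh inode\n",
                path, cached->gfid);
        inode_t fresh = { .gfid = "" };
        ret = do_lookup(path, &fresh);
    }
    return ret;
}

int main(void)
{
    inode_t stale;   /* gfid the client cached before snapd restarted */

    snprintf(stale.gfid, sizeof(stale.gfid), "gfid-gen-%d", snapd_generation - 1);
    printf(".snaps lookup %s\n",
           lookup_with_estale_retry("/.snaps", &stale) == 0 ? "succeeds" : "fails");
    return 0;
}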
