[Gluster-devel] Glusto failures with dispersed volumes + Samba

Henrik Kuhn henrik.kuhn at gmail.com
Thu Jan 25 10:39:31 UTC 2018


Hi gluster-devel experts,

I've stumbled upon the very same issue/observations as the one 
mentioned in 
http://lists.gluster.org/pipermail/gluster-devel/2017-July/053234.html.
Nigel Babu and Pranith Kumar Karampuri told me that this SELinux-related 
issue had also been fixed in >= v3.13.0.

The test setup is as follows:
OS: openSUSE Leap 42.3 (SELinux features installed/enabled)
SW: Gluster 3.13.1
- 3 Gluster nodes (gnode[1,2,3], with their own VLAN for Gluster 
communication)
- 1 virtualization server (snode) on which the test Samba server 
instance runs under KVM/QEMU. Its image is provided by the Gluster 
volume /vmvol/.
The Gluster volume /vol1/ holds the Samba share.

snode:~ # ssh gnode1 gluster vol info all

Volume Name: vmvol
Type: Replicate
Volume ID: a03b8fc1-4fcb-4268-bf09-0f554ba5e7a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gs-gnode1:/data/glusterfs/vmvol/brick1/brick
Brick2: gs-gnode2:/data/glusterfs/vmvol/brick1/brick
Brick3: gs-gnode3:/data/glusterfs/vmvol/brick1/brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
server.allow-insecure: on
storage.owner-uid: 500
storage.owner-gid: 500

Volume Name: vol1
Type: Disperse
Volume ID: fb081b58-bffc-4ddd-bf62-a87a13abec9b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: gs-gnode1:/data/glusterfs/vol1/brick1/brick
Brick2: gs-gnode2:/data/glusterfs/vol1/brick1/brick
Brick3: gs-gnode3:/data/glusterfs/vol1/brick1/brick
Brick4: gs-gnode1:/data/glusterfs/vol1/brick2/brick
Brick5: gs-gnode2:/data/glusterfs/vol1/brick2/brick
Brick6: gs-gnode3:/data/glusterfs/vol1/brick2/brick
Options Reconfigured:
performance.readdir-ahead: on
storage.batch-fsync-delay-usec: 0
performance.stat-prefetch: off
nfs.disable: on
transport.address-family: inet
server.allow-insecure: on
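
For completeness, a 4+2 dispersed volume with this brick layout is 
created roughly along these lines (illustrative sketch only; 'force' is 
needed because each node hosts two bricks of the same disperse set, and 
the remaining options from "Options Reconfigured" above are set the 
same way):

gnode1:~ # gluster volume create vol1 disperse 6 redundancy 2 \
    gs-gnode1:/data/glusterfs/vol1/brick1/brick \
    gs-gnode2:/data/glusterfs/vol1/brick1/brick \
    gs-gnode3:/data/glusterfs/vol1/brick1/brick \
    gs-gnode1:/data/glusterfs/vol1/brick2/brick \
    gs-gnode2:/data/glusterfs/vol1/brick2/brick \
    gs-gnode3:/data/glusterfs/vol1/brick2/brick force
gnode1:~ # gluster volume set vol1 performance.stat-prefetch off
gnode1:~ # gluster volume set vol1 storage.batch-fsync-delay-usec 0
gnode1:~ # gluster volume start vol1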

The samba server config is:
samba:~ # cat /etc/samba/smb.conf
[global]
   workgroup = TESTGROUP
   server string = Samba Server Version %v
   log file = /var/log/samba/log.%m
   log level = 2
   map to guest = Bad User
   unix charset = UTF-8
   idmap config * : backend = autorid
   idmap config * : range = 1000000-1999999

[gluster]
   kernel share modes = no
   vfs objects = acl_xattr glusterfs
   comment = glusterfs based volume
   browseable = Yes
   read only = No
   writeable = Yes
   public = Yes
   guest ok = Yes
   inherit acls = Yes
   path = /shares/data/
   glusterfs:volume = vol1
   glusterfs:volfile_server = gs-gnode1.origenis.de 
gs-gnode2.origenis.de gs-gnode3.origenis.de
#  glusterfs:volfile_server = 172.17.20.1 172.17.20.2 172.17.20.3
   glusterfs:loglevel = 9
   glusterfs:logfile = /var/log/samba/glusterfs-vol1.%M.log
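
The glfs_stat() errors shown further below appear right at share 
connect time; a simple guest listing such as this (hypothetical test 
command, run from the Samba host, but any SMB access of the share 
should do) is enough to trigger them:

samba:~ # smbclient //localhost/gluster -N -c 'ls'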


The corresponding logs on the samba server are:
* The vfs_gluster log shows lots of:
[2018-01-15 15:27:49.349995] D 
[logging.c:1817:__gf_log_inject_timer_event] 0-logging-infra: Starting 
timer now. Timeout = 120, current buf size = 5
[2018-01-15 15:27:49.351598] D 
[rpc-clnt.c:1047:rpc_clnt_connection_init] 0-gfapi: defaulting 
frame-timeout to 30mins
[2018-01-15 15:27:49.351625] D 
[rpc-clnt.c:1061:rpc_clnt_connection_init] 0-gfapi: disable ping-timeout
[2018-01-15 15:27:49.351644] D [rpc-transport.c:279:rpc_transport_load] 
0-rpc-transport: attempt to load file 
/usr/lib64/glusterfs/3.13.1/rpc-transport/socket.so
[2018-01-15 15:27:49.352372] W [MSGID: 101002] 
[options.c:995:xl_opt_validate] 0-gfapi: option 'address-family' is 
deprecated, preferred is 'transport.address-family', continuing with 
correction
[2018-01-15 15:27:49.352402] T [MSGID: 0] 
[options.c:86:xlator_option_validate_int] 0-gfapi: no range check 
required for 'option remote-port 24007'
[2018-01-15 15:27:49.354865] D [socket.c:4236:socket_init] 0-gfapi: 
Configued transport.tcp-user-timeout=0
[2018-01-15 15:27:49.354885] D [socket.c:4254:socket_init] 0-gfapi: 
Reconfigued transport.keepalivecnt=9
[2018-01-15 15:27:49.354901] D [socket.c:4339:socket_init] 0-gfapi: SSL 
support on the I/O path is NOT enabled
[2018-01-15 15:27:49.354914] D [socket.c:4342:socket_init] 0-gfapi: SSL 
support for glusterd is NOT enabled
[2018-01-15 15:27:49.354926] D [socket.c:4359:socket_init] 0-gfapi: 
using system polling thread
[2018-01-15 15:27:49.354943] D 
[rpc-clnt.c:1567:rpcclnt_cbk_program_register] 0-gfapi: New program 
registered: GlusterFS Callback, Num: 52743234, Ver: 1
[2018-01-15 15:27:49.354958] T [rpc-clnt.c:406:rpc_clnt_reconnect] 
0-gfapi: attempting reconnect
[2018-01-15 15:27:49.354971] T [socket.c:3146:socket_connect] 0-gfapi: 
connecting 0x55e1aca9f9b0, state=0 gen=0 sock=-1
[2018-01-15 15:27:49.354991] T [MSGID: 0] 
[common-utils.c:306:gf_resolve_ip6] 0-resolver: DNS cache not present, 
freshly probing hostname: gs-gnode1.origenis.de
[2018-01-15 15:27:49.360012] D [MSGID: 0] 
[common-utils.c:346:gf_resolve_ip6] 0-resolver: returning ip-172.17.20.1 
(port-24007) for hostname: gs-gnode1.origenis.de and port: 24007
[2018-01-15 15:27:49.360039] D [socket.c:3063:socket_fix_ssl_opts] 
0-gfapi: disabling SSL for portmapper connection
[2018-01-15 15:27:49.360068] W [socket.c:3216:socket_connect] 0-gfapi: 
Error disabling sockopt IPV6_V6ONLY: "Protocol not available"
[2018-01-15 15:27:49.360086] T [socket.c:834:__socket_nodelay] 0-gfapi: 
NODELAY enabled for socket 40
[2018-01-15 15:27:49.360109] T [socket.c:922:__socket_keepalive] 
0-gfapi: Keep-alive enabled for socket: 40, (idle: 20, interval: 2, 
max-probes: 9, timeout: 0)
[2018-01-15 15:27:49.360776] T [rpc-clnt.c:1496:rpc_clnt_record] 
0-gfapi: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2018-01-15 15:27:49.360811] T 
[rpc-clnt.c:1353:rpc_clnt_record_build_header] 0-rpc-clnt: Request 
fraglen 140, payload: 76, rpc hdr: 64
[2018-01-15 15:27:49.360856] T [rpc-clnt.c:1699:rpc_clnt_submit] 
0-rpc-clnt: submitted request (XID: 0x2 Program: GlusterFS Handshake, 
ProgVers: 2, Proc: 2) to rpc-transport (gfapi)
[2018-01-15 15:27:49.360875] D [rpc-clnt-ping.c:317:rpc_clnt_start_ping] 
0-gfapi: ping timeout is 0, returning
[2018-01-15 15:27:49.362364] T [rpc-clnt.c:675:rpc_clnt_reply_init] 
0-gfapi: received rpc message (RPC XID: 0x2 Program: GlusterFS 
Handshake, ProgVers: 2, Proc: 2) from rpc-transport (gfapi)
[2018-01-15 15:27:49.362932] T [MSGID: 0] [graph.y:187:new_volume] 
0-parser: New node for 'vol1-client-0'
....
....
[2018-01-15 15:27:49.386102] T [socket.c:922:__socket_keepalive] 
0-vol1-client-5: Keep-alive enabled for socket: 47, (idle: 20, interval: 
2, max-probes: 9, timeout: 0)
Final graph:
+------------------------------------------------------------------------------+
   1: volume vol1-client-0
   2:     type protocol/client
   3:     option ping-timeout 42
   4:     option remote-host gs-gnode1
   5:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
   6:     option transport-type socket
   7:     option transport.address-family inet
   8:     option transport.tcp-user-timeout 0
   9:     option transport.socket.keepalive-time 20
  10:     option transport.socket.keepalive-interval 2
  11:     option transport.socket.keepalive-count 9
  12:     option send-gids true
  13: end-volume
  14:
  15: volume vol1-client-1
  16:     type protocol/client
  17:     option ping-timeout 42
  18:     option remote-host gs-gnode2
  19:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
  20:     option transport-type socket
  21:     option transport.address-family inet
  22:     option transport.tcp-user-timeout 0
  23:     option transport.socket.keepalive-time 20
  24:     option transport.socket.keepalive-interval 2
  25:     option transport.socket.keepalive-count 9
  26:     option send-gids true
  27: end-volume
  28:
  29: volume vol1-client-2
  30:     type protocol/client
  31:     option ping-timeout 42
  32:     option remote-host gs-gnode3
  33:     option remote-subvolume /data/glusterfs/vol1/brick1/brick
  34:     option transport-type socket
  35:     option transport.address-family inet
  36:     option transport.tcp-user-timeout 0
  37:     option transport.socket.keepalive-time 20
  38:     option transport.socket.keepalive-interval 2
  39:     option transport.socket.keepalive-count 9
  40:     option send-gids true
  41: end-volume
  42:
  43: volume vol1-client-3
  44:     type protocol/client
  45:     option ping-timeout 42
  46:     option remote-host gs-gnode1
  47:     option remote-subvolume /data/glusterfs/vol1/brick2/brick
  48:     option transport-type socket
  49:     option transport.address-family inet
  50:     option transport.tcp-user-timeout 0
  51:     option transport.socket.keepalive-time 20
  52:     option transport.socket.keepalive-interval 2
  53:     option transport.socket.keepalive-count 9
  54:     option send-gids true
  55: end-volume
  56:
  57: volume vol1-client-4
  58:     type protocol/client
  59:     option ping-timeout 42
  60:     option remote-host gs-gnode2
  61:     option remote-subvolume /data/glusterfs/vol1/brick2/brick
  62:     option transport-type socket
  63:     option transport.address-family inet
  64:     option transport.tcp-user-timeout 0
  65:     option transport.socket.keepalive-time 20
  66:     option transport.socket.keepalive-interval 2
  67:     option transport.socket.keepalive-count 9
  68:     option send-gids true
  69: end-volume
  70:
  71: volume vol1-client-5
  72:     type protocol/client
  73:     option ping-timeout 42
  74:     option remote-host gs-gnode3
  75:     option remote-subvolume /data/glusterfs/vol1/brick2/brick
  76:     option transport-type socket
  77:     option transport.address-family inet
  78:     option transport.tcp-user-timeout 0
  79:     option transport.socket.keepalive-time 20
  80:     option transport.socket.keepalive-interval 2
  81:     option transport.socket.keepalive-count 9
  82:     option send-gids true
  83: end-volume
  84:
  85: volume vol1-disperse-0
  86:     type cluster/disperse
  87:     option redundancy 2
  88:     subvolumes vol1-client-0 vol1-client-1 vol1-client-2 
vol1-client-3 vol1-client-4 vol1-client-5
  89: end-volume
  90:
  91: volume vol1-dht
  92:     type cluster/distribute
  93:     option lock-migration off
  94:     subvolumes vol1-disperse-0
  95: end-volume
  96:
  97: volume vol1-write-behind
  98:     type performance/write-behind
  99:     subvolumes vol1-dht
100: end-volume
101:
102: volume vol1-read-ahead
103:     type performance/read-ahead
104:     subvolumes vol1-write-behind
105: end-volume
106:
107: volume vol1-readdir-ahead
108:     type performance/readdir-ahead
109:     option parallel-readdir off
[2018-01-15 15:27:49.387104] T [rpc-clnt.c:675:rpc_clnt_reply_init] 
0-vol1-client-0: received rpc message (RPC XID: 0x2 Program: GF-DUMP, 
ProgVers: 1, Proc: 1) from rpc-transport (vol1-client-0)
....
....
[2018-01-15 15:27:49.390946] T [rpc-clnt.c:1699:rpc_clnt_submit] 
0-rpc-clnt: submitted request (XID: 0x3 Program: PORTMAP, ProgVers: 1, 
Proc: 1) to rpc-transport (vol1-client-1)
110:     option rda-request-size 131072
111:     option rda-cache-limit 10MB
112:     subvolumes vol1-read-ahead
113: end-volume
114:
115: volume vol1-io-cache
116:     type performance/io-cache
117:     subvolumes vol1-readdir-ahead
118: end-volume
119:
120: volume vol1-quick-read
121:     type performance/quick-read
122:     subvolumes vol1-io-cache
123: end-volume
124:
125: volume vol1-open-behind
126:     type performance/open-behind
127:     subvolumes vol1-quick-read
128: end-volume
129:
130: volume vol1-io-threads
131:     type performance/io-threads
132:     subvolumes vol1-open-behind
133: end-volume
134:
135: volume vol1
136:     type debug/io-stats
137:     option log-level INFO
138:     option latency-measurement off
139:     option count-fop-hits off
140:     subvolumes vol1-io-threads
141: end-volume
142:
143: volume meta-autoload
144:     type meta
145:     subvolumes vol1
146: end-volume
147:
+------------------------------------------------------------------------------+
[2018-01-15 15:27:49.391515] T [rpc-clnt.c:675:rpc_clnt_reply_init] 
0-vol1-client-2: received rpc message (RPC XID: 0x2 Program: GF-DUMP, 
ProgVers: 1, Proc: 1) from rpc-transport (vol1-client-2)
....
....
[2018-01-15 15:27:49.446699] W [MSGID: 122019] 
[ec-helpers.c:412:ec_loc_gfid_check] 0-vol1-disperse-0: Mismatching 
GFID's in loc
[2018-01-15 15:27:49.446719] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 
0-ec: MANAGER(LOOKUP) 0x7f92bc00db48((nil)) [refs=3, winds=0, jobs=0] 
frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=0 state=5 
{111111:000000:111111} error=22
[2018-01-15 15:27:49.446739] D [MSGID: 0] 
[defaults.c:1266:default_lookup_cbk] 0-stack-trace: stack-address: 
0x55e1acad3c38, vol1-disperse-0 returned -1 error: Invalid argument 
[Invalid argument]
[2018-01-15 15:27:49.446759] D [MSGID: 0] 
[dht-common.c:1501:dht_revalidate_cbk] 0-vol1-dht: revalidate lookup of 
/shares/data returned with op_ret -1 [Invalid argument]
[2018-01-15 15:27:49.446778] I [MSGID: 109094] 
[dht-common.c:1514:dht_revalidate_cbk] 0-vol1-dht: Revalidate: subvolume 
vol1-disperse-0 for /shares/data (gfid = 
2f5a8895-bf08-41df-a13f-3205c62125ab) returned -1 [Invalid argument]
[2018-01-15 15:27:49.446796] E [MSGID: 101046] 
[dht-common.c:1810:dht_revalidate_cbk] 0-vol1-dht: dict is null
[2018-01-15 15:27:49.446814] D [MSGID: 0] 
[dht-common.c:1814:dht_revalidate_cbk] 0-stack-trace: stack-address: 
0x55e1acad3c38, vol1-dht returned -1 error: Invalid argument [Invalid 
argument]
[2018-01-15 15:27:49.446832] D [MSGID: 0] 
[write-behind.c:2394:wb_lookup_cbk] 0-stack-trace: stack-address: 
0x55e1acad3c38, vol1-write-behind returned -1 error: Invalid argument 
[Invalid argument]
[2018-01-15 15:27:49.446852] D [MSGID: 0] 
[io-cache.c:268:ioc_lookup_cbk] 0-stack-trace: stack-address: 
0x55e1acad3c38, vol1-io-cache returned -1 error: Invalid argument 
[Invalid argument]
[2018-01-15 15:27:49.446871] D [MSGID: 0] 
[quick-read.c:450:qr_lookup_cbk] 0-stack-trace: stack-address: 
0x55e1acad3c38, vol1-quick-read returned -1 error: Invalid argument 
[Invalid argument]
[2018-01-15 15:27:49.446890] D [MSGID: 0] 
[defaults.c:1266:default_lookup_cbk] 0-stack-trace: stack-address: 
0x55e1acad3c38, vol1-io-threads returned -1 error: Invalid argument 
[Invalid argument]
[2018-01-15 15:27:49.446908] D [MSGID: 0] 
[io-stats.c:2216:io_stats_lookup_cbk] 0-stack-trace: stack-address: 
0x55e1acad3c38, vol1 returned -1 error: Invalid argument [Invalid argument]
[2018-01-15 15:27:49.446988] T [MSGID: 0] [syncop.c:1239:syncop_lookup] 
0-stack-trace: stack-address: 0x55e1acad3c38, winding from gfapi to 
meta-autoload
[2018-01-15 15:27:49.447011] T [MSGID: 0] 
[defaults.c:2574:default_lookup] 0-stack-trace: stack-address: 
0x55e1acad3c38, winding from meta-autoload to vol1
[2018-01-15 15:27:49.447028] T [MSGID: 0] 
[io-stats.c:2728:io_stats_lookup] 0-stack-trace: stack-address: 
0x55e1acad3c38, winding from vol1 to vol1-io-threads
[2018-01-15 15:27:49.447047] D [MSGID: 0] 
[io-threads.c:358:iot_schedule] 0-vol1-io-threads: LOOKUP scheduled as 
fast fop
[2018-01-15 15:27:49.447113] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 
0-ec: MANAGER(LOOKUP) 0x7f92bc00db48((nil)) [refs=3, winds=0, jobs=0] 
frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=0 state=0 
{111111:000000:111111} error=22
[2018-01-15 15:27:49.447140] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 
0-ec: RELEASE(LOOKUP) 0x7f92bc00db48((nil)) [refs=3, winds=0, jobs=0] 
frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=22 state=0 
{111111:000000:111111}
[2018-01-15 15:27:49.447159] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 
0-ec: RELEASE(LOOKUP) 0x7f92bc00db48((nil)) [refs=2, winds=0, jobs=0] 
frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=22 state=0 
{111111:000000:111111}
[2018-01-15 15:27:49.447177] T [MSGID: 0] [ec-helpers.c:89:ec_trace] 
0-ec: RELEASE(LOOKUP) 0x7f92bc00db48((nil)) [refs=1, winds=0, jobs=0] 
frame=0x7f92bc0013c8/0x7f92bc003878, min/exp=4/6, err=22 state=0 
{111111:000000:111111}
...
...
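
In case it helps with debugging the "Mismatching GFID's in loc" warning 
above: the trusted.gfid xattr of the share directory can be compared 
directly on the bricks; it should be identical on all six bricks (paths 
assume the share root /shares/data sits directly below each brick root):

gnode1:~ # getfattr -n trusted.gfid -e hex \
    /data/glusterfs/vol1/brick1/brick/shares/data \
    /data/glusterfs/vol1/brick2/brick/shares/data
(and the same on gnode2 and gnode3)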


* The system's journal lists:
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.495269,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.489101,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.483684,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.478285,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.465356,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.459377,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.454542,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   glfs_stat(.) failed: Invalid argument
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.449593,  0] 
../source3/modules/vfs_glusterfs.c:996(vfs_gluster_stat)
Jan 15 15:27:49 samba smbd[3129]:   vfs_gluster_connect: vol1: 
Initialized volume from servers gs-gnode1.origenis.de 
gs-gnode2.origenis.de gs-gnode3.origenis.de
Jan 15 15:27:49 samba smbd[3129]: [2018/01/15 15:27:49.420543,  0] 
../source3/modules/vfs_glusterfs.c:371(vfs_gluster_connect)
Jan 15 15:27:08 samba smbd[3117]:   STATUS=daemon 'smbd' finished 
starting up and ready to serve connections
Jan 15 15:27:08 samba smbd[3117]: [2018/01/15 15:27:08.920529,  0] 
../lib/util/become_daemon.c:124(daemon_ready)
Jan 15 15:27:08 samba systemd[1]: Started Samba SMB Daemon.
Jan 15 15:27:08 samba systemd[1]: Starting Samba SMB Daemon...
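
To narrow down whether the failure is specific to the gfapi/vfs_glusterfs 
path, the same directory can also be checked through a plain FUSE mount 
of vol1 on the Samba host (diagnostic sketch; /mnt/vol1 is just an 
arbitrary mount point and the glusterfs FUSE client is assumed to be 
installed):

samba:~ # mkdir -p /mnt/vol1
samba:~ # mount -t glusterfs gs-gnode1.origenis.de:/vol1 /mnt/vol1
samba:~ # stat /mnt/vol1/shares/data

If stat succeeds via FUSE, the problem is most likely confined to the 
gfapi stack that vfs_glusterfs uses.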

Thanks in advance for your help.

Kind regards,
Henrik

