[Gluster-users] Is transport=rdma tested with "stripe"?
Hatazaki, Takao
takao.hatazaki at hpe.com
Wed Aug 16 11:14:15 UTC 2017
> Note that "stripe" is not tested much and practically unmaintained.
Ah, this was what I suspected. Understood. I'll be happy with "shard".
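If "shard" replaces "stripe", the volume creation changes only slightly. A minimal sketch with the same two bricks; the 64MB shard-block-size is my assumption, not something tested here:

```shell
# Sketch: create gv0 as a plain 2-brick distribute volume and enable
# sharding instead of stripe. shard-block-size 64MB is an assumption;
# tune it for the workload.
create_gv0_with_shard() {
    gluster volume create gv0 transport rdma \
        gluster-s1-fdr:/data/brick1/gv0 gluster-s2-fdr:/data/brick1/gv0
    gluster volume set gv0 features.shard on
    gluster volume set gv0 features.shard-block-size 64MB
}

# Only attempt it where the gluster CLI is actually installed.
if command -v gluster >/dev/null 2>&1; then
    create_gv0_with_shard
else
    echo "gluster CLI not available; skipping"
fi
```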
Having said that, "stripe" works fine with transport=tcp. The failure reproduces with just two RDMA servers (over InfiniBand), one of which also acts as a client.
I looked into the logs. I paste the lengthy logs below, hoping that mail systems do not automatically fold the lines...
Takao
---
Immediately after I started the "gluster" interactive command, the following appeared in cli.log. The last line repeats every 3 seconds.
[2017-08-16 10:49:00.028789] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2017-08-16 10:49:00.032509] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2017-08-16 10:49:00.033038] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-08-16 10:49:00.033092] I [socket.c:2415:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-16 10:49:03.032434] I [socket.c:2415:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
When I do:
gluster> volume create gv0 stripe 2 transport rdma gluster-s1-fdr:/data/brick1/gv0 gluster-s2-fdr:/data/brick1/gv0
volume create: gv0: success: please start the volume to access data
gluster> volume start gv0
volume start: gv0: success
The following appeared in glusterd.log. Note the "E" flag on the last line.
[2017-08-16 10:38:48.451329] I [MSGID: 106062] [glusterd-volume-ops.c:2617:glusterd_op_start_volume] 0-management: Global dict not present.
[2017-08-16 10:38:48.751913] I [MSGID: 106143] [glusterd-pmap.c:277:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0.rdma on port 49152
[2017-08-16 10:38:48.752222] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-08-16 10:38:48.915868] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2017-08-16 10:38:48.915977] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-08-16 10:38:48.916008] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-08-16 10:38:48.916189] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-08-16 10:38:48.916210] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service is stopped
[2017-08-16 10:38:48.916232] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-08-16 10:38:48.916245] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is stopped
[2017-08-16 10:38:49.392687] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdbd7a) [0x7fbb107e5d7a] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb83d) [0x7fbb107e583d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fbb1bc5c385] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=gv0 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-08-16 10:38:49.402177] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdbd7a) [0x7fbb107e5d7a] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb79b) [0x7fbb107e579b] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fbb1bc5c385] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=gv0 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
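The "Failed to execute script ... S30samba-start.sh" line points at the Samba start hook. If Samba is not in use, one way to silence it (my assumption, worth verifying on a test node first) is to drop the script's execute bit, since glusterd should skip non-executable hook scripts. A sketch, with the path taken from the log above:

```shell
# Sketch: disable the Samba start hook when Samba is not in use.
# Assumption: glusterd skips hook scripts without the execute bit;
# moving the script aside would work as well.
disable_samba_hook() {
    hook=/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
    if [ -x "$hook" ]; then
        chmod -x "$hook" && echo "disabled $hook"
    else
        echo "hook not present or already disabled"
    fi
}

disable_samba_hook
```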
This looks related to Samba, which I do not use. The same "E" error occurs even when I use transport=tcp. There were no errors in the brick logs. Below is what was written to data-brick1-gv0.log:
[2017-08-16 10:59:24.127902] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.3 (args: /usr/sbin/glusterfsd -s gluster-s1-fdr --volfile-id gv0.gluster-s1-fdr.data-brick1-gv0 -p /var/lib/glusterd/vols/gv0/run/gluster-s1-fdr-data-brick1-gv0.pid -S /var/run/gluster/6b6de65a92fa07146541a9474ffa2fd2.socket --brick-name /data/brick1/gv0 -l /var/log/glusterfs/bricks/data-brick1-gv0.log --xlator-option *-posix.glusterd-uuid=5c750a8f-c45b-4a7e-af84-16c1999874b7 --brick-port 49152 --xlator-option gv0-server.listen-port=49152 --volfile-server-transport=rdma)
[2017-08-16 10:59:24.134054] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-08-16 10:59:24.137118] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2017-08-16 10:59:24.138384] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-gv0-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
[2017-08-16 10:59:24.142207] I [MSGID: 121050] [ctr-helper.c:259:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
[2017-08-16 10:59:24.237783] I [trash.c:2493:init] 0-gv0-trash: no option specified for 'eliminate', using NULL
[2017-08-16 10:59:24.239129] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2017-08-16 10:59:24.239189] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'rpc-auth.auth-unix' is not recognized
[2017-08-16 10:59:24.239203] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'rpc-auth.auth-null' is not recognized
[2017-08-16 10:59:24.239226] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'auth-path' is not recognized
[2017-08-16 10:59:24.239235] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth.addr./data/brick1/gv0.allow' is not recognized
[2017-08-16 10:59:24.239251] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth-path' is not recognized
[2017-08-16 10:59:24.239257] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password' is not recognized
[2017-08-16 10:59:24.239263] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth.login./data/brick1/gv0.allow' is not recognized
[2017-08-16 10:59:24.239276] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-quota: option 'timeout' is not recognized
[2017-08-16 10:59:24.239311] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-trash: option 'brick-path' is not recognized
Final graph:
+------------------------------------------------------------------------------+
1: volume gv0-posix
2: type storage/posix
3: option glusterd-uuid 5c750a8f-c45b-4a7e-af84-16c1999874b7
4: option directory /data/brick1/gv0
5: option volume-id 6491a59c-866f-4a1d-b21b-f894ea0e50cd
6: end-volume
7:
8: volume gv0-trash
9: type features/trash
10: option trash-dir .trashcan
11: option brick-path /data/brick1/gv0
12: option trash-internal-op off
13: subvolumes gv0-posix
14: end-volume
15:
16: volume gv0-changetimerecorder
17: type features/changetimerecorder
18: option db-type sqlite3
19: option hot-brick off
20: option db-name gv0.db
21: option db-path /data/brick1/gv0/.glusterfs/
22: option record-exit off
23: option ctr_link_consistency off
24: option ctr_lookupheal_link_timeout 300
25: option ctr_lookupheal_inode_timeout 300
26: option record-entry on
27: option ctr-enabled off
28: option record-counters off
29: option ctr-record-metadata-heat off
30: option sql-db-cachesize 12500
31: option sql-db-wal-autocheckpoint 25000
32: subvolumes gv0-trash
33: end-volume
34:
35: volume gv0-changelog
36: type features/changelog
37: option changelog-brick /data/brick1/gv0
38: option changelog-dir /data/brick1/gv0/.glusterfs/changelogs
39: option changelog-barrier-timeout 120
40: subvolumes gv0-changetimerecorder
41: end-volume
42:
43: volume gv0-bitrot-stub
44: type features/bitrot-stub
45: option export /data/brick1/gv0
46: subvolumes gv0-changelog
47: end-volume
48:
49: volume gv0-access-control
50: type features/access-control
51: subvolumes gv0-bitrot-stub
52: end-volume
53:
54: volume gv0-locks
55: type features/locks
56: subvolumes gv0-access-control
57: end-volume
58:
59: volume gv0-worm
60: type features/worm
61: option worm off
62: option worm-file-level off
63: subvolumes gv0-locks
64: end-volume
65:
66: volume gv0-read-only
67: type features/read-only
68: option read-only off
69: subvolumes gv0-worm
70: end-volume
71:
72: volume gv0-leases
73: type features/leases
74: option leases off
75: subvolumes gv0-read-only
76: end-volume
77:
78: volume gv0-upcall
79: type features/upcall
80: option cache-invalidation off
81: subvolumes gv0-leases
82: end-volume
83:
84: volume gv0-io-threads
85: type performance/io-threads
86: subvolumes gv0-upcall
87: end-volume
88:
89: volume gv0-marker
90: type features/marker
91: option volume-uuid 6491a59c-866f-4a1d-b21b-f894ea0e50cd
92: option timestamp-file /var/lib/glusterd/vols/gv0/marker.tstamp
93: option quota-version 0
94: option xtime off
95: option gsync-force-xtime off
96: option quota off
97: option inode-quota off
98: subvolumes gv0-io-threads
99: end-volume
100:
101: volume gv0-barrier
102: type features/barrier
103: option barrier disable
104: option barrier-timeout 120
105: subvolumes gv0-marker
106: end-volume
107:
108: volume gv0-index
109: type features/index
110: option index-base /data/brick1/gv0/.glusterfs/indices
111: subvolumes gv0-barrier
112: end-volume
113:
114: volume gv0-quota
115: type features/quota
116: option volume-uuid gv0
117: option server-quota off
118: option timeout 0
119: option deem-statfs off
120: subvolumes gv0-index
121: end-volume
122:
123: volume gv0-io-stats
124: type debug/io-stats
125: option unique-id /data/brick1/gv0
126: option log-level INFO
127: option latency-measurement off
128: option count-fop-hits off
129: subvolumes gv0-quota
130: end-volume
131:
132: volume /data/brick1/gv0
133: type performance/decompounder
134: option auth.addr./data/brick1/gv0.allow *
135: option auth-path /data/brick1/gv0
136: option auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password e5fe5e7e-6722-4845-8149-edaf14065ac0
137: option auth.login./data/brick1/gv0.allow 2d6e8c76-47ed-4ac4-87ff-f96693f048b5
138: subvolumes gv0-io-stats
139: end-volume
140:
141: volume gv0-server
142: type protocol/server
143: option transport.rdma.listen-port 49152
144: option rpc-auth.auth-glusterfs on
145: option rpc-auth.auth-unix on
146: option rpc-auth.auth-null on
147: option rpc-auth-allow-insecure on
148: option transport-type rdma
149: option auth.login./data/brick1/gv0.allow 2d6e8c76-47ed-4ac4-87ff-f96693f048b5
150: option auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password e5fe5e7e-6722-4845-8149-edaf14065ac0
151: option auth-path /data/brick1/gv0
152: option auth.addr./data/brick1/gv0.allow *
153: subvolumes /data/brick1/gv0
154: end-volume
155:
+------------------------------------------------------------------------------+
Anyway, gluster reports that the volume started successfully.
gluster> volume info gv0
Volume Name: gv0
Type: Stripe
Volume ID: 6491a59c-866f-4a1d-b21b-f894ea0e50cd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster-s1-fdr:/data/brick1/gv0
Brick2: gluster-s2-fdr:/data/brick1/gv0
Options Reconfigured:
nfs.disable: on
gluster>
gluster> volume status gv0
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster-s1-fdr:/data/brick1/gv0 0 49152 Y 2553
Brick gluster-s2-fdr:/data/brick1/gv0 0 49152 Y 2580
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
Next I proceeded to mount the volume:
[root at gluster-s1 ~]# mount -t glusterfs glusterfs-s1-fdr:/gv0 /mnt
Mount failed. Please check the log file for more details.
The following was written to mnt.log:
[2017-08-16 11:09:08.794585] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.3 (args: /usr/sbin/glusterfs --volfile-server=glusterfs-s1-fdr --volfile-id=/gv0 /mnt)
[2017-08-16 11:09:08.949784] E [MSGID: 101075] [common-utils.c:307:gf_resolve_ip6] 0-resolver: getaddrinfo failed (unknown name or service)
[2017-08-16 11:09:08.949815] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host glusterfs-s1-fdr
[2017-08-16 11:09:08.949956] I [glusterfsd-mgmt.c:2134:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: glusterfs-s1-fdr
[2017-08-16 11:09:08.950097] I [glusterfsd-mgmt.c:2155:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-08-16 11:09:08.950105] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-08-16 11:09:08.950277] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xab) [0x7fdfa46bba2b] -->/usr/sbin/glusterfs(+0x10afd) [0x7fdfa4df2afd] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fdfa4debe4b] ) 0-: received signum (1), shutting down
[2017-08-16 11:09:08.950326] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/mnt'.
[2017-08-16 11:09:08.950582] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fdfa3752dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fdfa4dec025] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fdfa4debe4b] ) 0-: received signum (15), shutting down
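Since mnt.log shows getaddrinfo failing on the volfile server name, a quick resolution check before retrying the mount seems worthwhile. A sketch, using the name from my mount command:

```shell
# Sketch: verify the volfile server name resolves before mounting.
# The default name below is the one from my failed mount command.
check_volfile_server() {
    srv=${1:-glusterfs-s1-fdr}
    if getent hosts "$srv" >/dev/null 2>&1; then
        echo "$srv resolves"
    else
        echo "$srv does not resolve: check /etc/hosts or DNS"
    fi
}

check_volfile_server
```

Note that the mount command above used glusterfs-s1-fdr while the bricks are on gluster-s1-fdr, so a typo in the server name may be all that is behind the getaddrinfo failure.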