[Gluster-users] Is transport=rdma tested with "stripe"?
Hatazaki, Takao
takao.hatazaki at hpe.com
Wed Aug 16 11:14:15 UTC 2017
> Note that "stripe" is not tested much and practically unmaintained.
Ah, this was what I suspected. Understood. I'll be happy with "shard".
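If "shard" replaces "stripe", the volume creation changes only slightly. A minimal sketch with the same two bricks; the 64MB shard-block-size is my assumption, not something tested here:

```shell
# Sketch: create gv0 as a plain 2-brick distribute volume and enable
# sharding instead of stripe. shard-block-size 64MB is an assumption;
# tune it for the workload.
create_gv0_with_shard() {
    gluster volume create gv0 transport rdma \
        gluster-s1-fdr:/data/brick1/gv0 gluster-s2-fdr:/data/brick1/gv0
    gluster volume set gv0 features.shard on
    gluster volume set gv0 features.shard-block-size 64MB
}

# Only attempt it where the gluster CLI is actually installed.
if command -v gluster >/dev/null 2>&1; then
    create_gv0_with_shard
else
    echo "gluster CLI not available; skipping"
fi
```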
Having said that, "stripe" works fine with transport=tcp. The failure reproduces with just two RDMA servers (over InfiniBand), one of which also acts as a client.
I looked into the logs. I paste the lengthy logs below, hoping that mail systems do not automatically fold the lines...
Takao
---
Immediately after I started the "gluster" interactive command, the following appeared in cli.log. The last line repeats every 3 seconds.
[2017-08-16 10:49:00.028789] I [cli.c:759:main] 0-cli: Started running gluster with version 3.10.3
[2017-08-16 10:49:00.032509] I [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not installed
[2017-08-16 10:49:00.033038] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-08-16 10:49:00.033092] I [socket.c:2415:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
[2017-08-16 10:49:03.032434] I [socket.c:2415:socket_event_handler] 0-transport: EPOLLERR - disconnecting now
When I do:
gluster> volume create gv0 stripe 2 transport rdma gluster-s1-fdr:/data/brick1/gv0 gluster-s2-fdr:/data/brick1/gv0
volume create: gv0: success: please start the volume to access data
gluster> volume start gv0
volume start: gv0: success
The following appeared in glusterd.log. Note the "E" flag on the last line.
[2017-08-16 10:38:48.451329] I [MSGID: 106062] [glusterd-volume-ops.c:2617:glusterd_op_start_volume] 0-management: Global dict not present.
[2017-08-16 10:38:48.751913] I [MSGID: 106143] [glusterd-pmap.c:277:pmap_registry_bind] 0-pmap: adding brick /data/brick1/gv0.rdma on port 49152
[2017-08-16 10:38:48.752222] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-08-16 10:38:48.915868] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2017-08-16 10:38:48.915977] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-08-16 10:38:48.916008] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-08-16 10:38:48.916189] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-08-16 10:38:48.916210] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service is stopped
[2017-08-16 10:38:48.916232] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-08-16 10:38:48.916245] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is stopped
[2017-08-16 10:38:49.392687] I [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdbd7a) [0x7fbb107e5d7a] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb83d) [0x7fbb107e583d] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fbb1bc5c385] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=gv0 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-08-16 10:38:49.402177] E [run.c:191:runner_log] (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdbd7a) [0x7fbb107e5d7a] -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb79b) [0x7fbb107e579b] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fbb1bc5c385] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=gv0 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
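The "Failed to execute script ... S30samba-start.sh" line points at the Samba start hook. If Samba is not in use, one way to silence it (my assumption, worth verifying on a test node first) is to drop the script's execute bit, since glusterd should skip non-executable hook scripts. A sketch, with the path taken from the log above:

```shell
# Sketch: disable the Samba start hook when Samba is not in use.
# Assumption: glusterd skips hook scripts without the execute bit;
# moving the script aside would work as well.
disable_samba_hook() {
    hook=/var/lib/glusterd/hooks/1/start/post/S30samba-start.sh
    if [ -x "$hook" ]; then
        chmod -x "$hook" && echo "disabled $hook"
    else
        echo "hook not present or already disabled"
    fi
}

disable_samba_hook
```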
This looks related to Samba, which I do not use. The same "E" error occurs even when I use transport=tcp. There were no errors in the brick logs. Below is what was written to data-brick1-gv0.log:
[2017-08-16 10:59:24.127902] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.3 (args: /usr/sbin/glusterfsd -s gluster-s1-fdr --volfile-id gv0.gluster-s1-fdr.data-brick1-gv0 -p /var/lib/glusterd/vols/gv0/run/gluster-s1-fdr-data-brick1-gv0.pid -S /var/run/gluster/6b6de65a92fa07146541a9474ffa2fd2.socket --brick-name /data/brick1/gv0 -l /var/log/glusterfs/bricks/data-brick1-gv0.log --xlator-option *-posix.glusterd-uuid=5c750a8f-c45b-4a7e-af84-16c1999874b7 --brick-port 49152 --xlator-option gv0-server.listen-port=49152 --volfile-server-transport=rdma)
[2017-08-16 10:59:24.134054] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-08-16 10:59:24.137118] I [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2017-08-16 10:59:24.138384] W [MSGID: 101002] [options.c:954:xl_opt_validate] 0-gv0-server: option 'listen-port' is deprecated, preferred is 'transport.rdma.listen-port', continuing with correction
[2017-08-16 10:59:24.142207] I [MSGID: 121050] [ctr-helper.c:259:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
[2017-08-16 10:59:24.237783] I [trash.c:2493:init] 0-gv0-trash: no option specified for 'eliminate', using NULL
[2017-08-16 10:59:24.239129] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2017-08-16 10:59:24.239189] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'rpc-auth.auth-unix' is not recognized
[2017-08-16 10:59:24.239203] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'rpc-auth.auth-null' is not recognized
[2017-08-16 10:59:24.239226] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'auth-path' is not recognized
[2017-08-16 10:59:24.239235] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth.addr./data/brick1/gv0.allow' is not recognized
[2017-08-16 10:59:24.239251] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth-path' is not recognized
[2017-08-16 10:59:24.239257] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password' is not recognized
[2017-08-16 10:59:24.239263] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth.login./data/brick1/gv0.allow' is not recognized
[2017-08-16 10:59:24.239276] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-quota: option 'timeout' is not recognized
[2017-08-16 10:59:24.239311] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-gv0-trash: option 'brick-path' is not recognized
Final graph:
+------------------------------------------------------------------------------+
1: volume gv0-posix
2: type storage/posix
3: option glusterd-uuid 5c750a8f-c45b-4a7e-af84-16c1999874b7
4: option directory /data/brick1/gv0
5: option volume-id 6491a59c-866f-4a1d-b21b-f894ea0e50cd
6: end-volume
7:
8: volume gv0-trash
9: type features/trash
10: option trash-dir .trashcan
11: option brick-path /data/brick1/gv0
12: option trash-internal-op off
13: subvolumes gv0-posix
14: end-volume
15:
16: volume gv0-changetimerecorder
17: type features/changetimerecorder
18: option db-type sqlite3
19: option hot-brick off
20: option db-name gv0.db
21: option db-path /data/brick1/gv0/.glusterfs/
22: option record-exit off
23: option ctr_link_consistency off
24: option ctr_lookupheal_link_timeout 300
25: option ctr_lookupheal_inode_timeout 300
26: option record-entry on
27: option ctr-enabled off
28: option record-counters off
29: option ctr-record-metadata-heat off
30: option sql-db-cachesize 12500
31: option sql-db-wal-autocheckpoint 25000
32: subvolumes gv0-trash
33: end-volume
34:
35: volume gv0-changelog
36: type features/changelog
37: option changelog-brick /data/brick1/gv0
38: option changelog-dir /data/brick1/gv0/.glusterfs/changelogs
39: option changelog-barrier-timeout 120
40: subvolumes gv0-changetimerecorder
41: end-volume
42:
43: volume gv0-bitrot-stub
44: type features/bitrot-stub
45: option export /data/brick1/gv0
46: subvolumes gv0-changelog
47: end-volume
48:
49: volume gv0-access-control
50: type features/access-control
51: subvolumes gv0-bitrot-stub
52: end-volume
53:
54: volume gv0-locks
55: type features/locks
56: subvolumes gv0-access-control
57: end-volume
58:
59: volume gv0-worm
60: type features/worm
61: option worm off
62: option worm-file-level off
63: subvolumes gv0-locks
64: end-volume
65:
66: volume gv0-read-only
67: type features/read-only
68: option read-only off
69: subvolumes gv0-worm
70: end-volume
71:
72: volume gv0-leases
73: type features/leases
74: option leases off
75: subvolumes gv0-read-only
76: end-volume
77:
78: volume gv0-upcall
79: type features/upcall
80: option cache-invalidation off
81: subvolumes gv0-leases
82: end-volume
83:
84: volume gv0-io-threads
85: type performance/io-threads
86: subvolumes gv0-upcall
87: end-volume
88:
89: volume gv0-marker
90: type features/marker
91: option volume-uuid 6491a59c-866f-4a1d-b21b-f894ea0e50cd
92: option timestamp-file /var/lib/glusterd/vols/gv0/marker.tstamp
93: option quota-version 0
94: option xtime off
95: option gsync-force-xtime off
96: option quota off
97: option inode-quota off
98: subvolumes gv0-io-threads
99: end-volume
100:
101: volume gv0-barrier
102: type features/barrier
103: option barrier disable
104: option barrier-timeout 120
105: subvolumes gv0-marker
106: end-volume
107:
108: volume gv0-index
109: type features/index
110: option index-base /data/brick1/gv0/.glusterfs/indices
111: subvolumes gv0-barrier
112: end-volume
113:
114: volume gv0-quota
115: type features/quota
116: option volume-uuid gv0
117: option server-quota off
118: option timeout 0
119: option deem-statfs off
120: subvolumes gv0-index
121: end-volume
122:
123: volume gv0-io-stats
124: type debug/io-stats
125: option unique-id /data/brick1/gv0
126: option log-level INFO
127: option latency-measurement off
128: option count-fop-hits off
129: subvolumes gv0-quota
130: end-volume
131:
132: volume /data/brick1/gv0
133: type performance/decompounder
134: option auth.addr./data/brick1/gv0.allow *
135: option auth-path /data/brick1/gv0
136: option auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password e5fe5e7e-6722-4845-8149-edaf14065ac0
137: option auth.login./data/brick1/gv0.allow 2d6e8c76-47ed-4ac4-87ff-f96693f048b5
138: subvolumes gv0-io-stats
139: end-volume
140:
141: volume gv0-server
142: type protocol/server
143: option transport.rdma.listen-port 49152
144: option rpc-auth.auth-glusterfs on
145: option rpc-auth.auth-unix on
146: option rpc-auth.auth-null on
147: option rpc-auth-allow-insecure on
148: option transport-type rdma
149: option auth.login./data/brick1/gv0.allow 2d6e8c76-47ed-4ac4-87ff-f96693f048b5
150: option auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password e5fe5e7e-6722-4845-8149-edaf14065ac0
151: option auth-path /data/brick1/gv0
152: option auth.addr./data/brick1/gv0.allow *
153: subvolumes /data/brick1/gv0
154: end-volume
155:
+------------------------------------------------------------------------------+
Anyway, gluster reports that the volume started successfully.
gluster> volume info gv0
Volume Name: gv0
Type: Stripe
Volume ID: 6491a59c-866f-4a1d-b21b-f894ea0e50cd
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster-s1-fdr:/data/brick1/gv0
Brick2: gluster-s2-fdr:/data/brick1/gv0
Options Reconfigured:
nfs.disable: on
gluster>
gluster> volume status gv0
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster-s1-fdr:/data/brick1/gv0 0 49152 Y 2553
Brick gluster-s2-fdr:/data/brick1/gv0 0 49152 Y 2580
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
Next I proceeded to mount the volume:
[root at gluster-s1 ~]# mount -t glusterfs glusterfs-s1-fdr:/gv0 /mnt
Mount failed. Please check the log file for more details.
The following was written to mnt.log:
[2017-08-16 11:09:08.794585] I [MSGID: 100030] [glusterfsd.c:2475:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.3 (args: /usr/sbin/glusterfs --volfile-server=glusterfs-s1-fdr --volfile-id=/gv0 /mnt)
[2017-08-16 11:09:08.949784] E [MSGID: 101075] [common-utils.c:307:gf_resolve_ip6] 0-resolver: getaddrinfo failed (unknown name or service)
[2017-08-16 11:09:08.949815] E [name.c:262:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host glusterfs-s1-fdr
[2017-08-16 11:09:08.949956] I [glusterfsd-mgmt.c:2134:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: glusterfs-s1-fdr
[2017-08-16 11:09:08.950097] I [glusterfsd-mgmt.c:2155:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2017-08-16 11:09:08.950105] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-08-16 11:09:08.950277] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xab) [0x7fdfa46bba2b] -->/usr/sbin/glusterfs(+0x10afd) [0x7fdfa4df2afd] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fdfa4debe4b] ) 0-: received signum (1), shutting down
[2017-08-16 11:09:08.950326] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/mnt'.
[2017-08-16 11:09:08.950582] W [glusterfsd.c:1332:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fdfa3752dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fdfa4dec025] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fdfa4debe4b] ) 0-: received signum (15), shutting down
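Since mnt.log shows getaddrinfo failing on the volfile server name, a quick resolution check before retrying the mount seems worthwhile. A sketch, using the name from my mount command:

```shell
# Sketch: verify the volfile server name resolves before mounting.
# The default name below is the one from my failed mount command.
check_volfile_server() {
    srv=${1:-glusterfs-s1-fdr}
    if getent hosts "$srv" >/dev/null 2>&1; then
        echo "$srv resolves"
    else
        echo "$srv does not resolve: check /etc/hosts or DNS"
    fi
}

check_volfile_server
```

Note that the mount command above used glusterfs-s1-fdr while the bricks are on gluster-s1-fdr, so a typo in the server name may be all that is behind the getaddrinfo failure.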