[Gluster-users] Unstable server with server/server encryption

Yannick Perret yannick.perret at liris.cnrs.fr
Mon Dec 7 10:41:21 UTC 2015


On 07/12/2015 11:32, Kaushal M wrote:
> On Mon, Dec 7, 2015 at 2:55 PM, Yannick Perret
> <yannick.perret at liris.cnrs.fr> wrote:
>> Hello,
>>
>> I'm having problems with glusterfs and server/server encryption.
>>
>> I have 2 servers (sto1 & sto2) with latest stable version (3.6.7-1 from
>> gluster repo) on Debian 8.2 (amd64), with one single volume with
>> replication.
>>
>> Without /var/lib/glusterd/secure-access all works as expected.
>>
> Enabling encryption requires a little more work before touching
> /var/lib/glusterd/secure-access. I have written a blog post [1] which
> should help with the steps for getting encryption working with
> GlusterFS. Please check it out, and see if you've done everything
> required.
>
> [1] https://kshlm.in/network-encryption-in-glusterfs/
Yes, I followed this post (and another one).
Please note that I can successfully use my glusterfs volume, but both
servers stay up only if I start them both (mostly) at the same time.
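
For reference, here is roughly what I did on each server, following the
post (a sketch: the /etc/ssl/glusterfs.* paths are the GlusterFS
defaults; the CN values and the sto1.pem/sto2.pem names are mine):

# on each server (shown for sto1; same on sto2 with its own CN)
openssl genrsa -out /etc/ssl/glusterfs.key 2048
openssl req -new -x509 -key /etc/ssl/glusterfs.key \
    -subj "/CN=sto1.liris.cnrs.fr" -days 365 -out /etc/ssl/glusterfs.pem
# shared CA file: each server's self-signed glusterfs.pem, copied and
# concatenated; the result must be identical on both servers
# (sto1.pem/sto2.pem are those copies)
cat sto1.pem sto2.pem > /etc/ssl/glusterfs.ca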

So when I manage to start the 2 servers together everything is fine: I can
mount and use the volume from the client, and I can perform any
configuration command.
My problem is when one server is not started at the same time as the other:
glusterd then crashes, either once or in a "ping-pong" fashion (see below).

Another point: both servers have the same time (it may matter for TLS).
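
For what it's worth, the clocks and the certificate validity windows can
be compared on both servers with, e.g.:

date -u
openssl x509 -in /etc/ssl/glusterfs.pem -noout -dates   # notBefore / notAfter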

Regards,
--
Y.
>> Then I shut down both servers (without any client mounting the volume),
>> touch /var/lib/glusterd/secure-access on both servers, and start the service
>> on one of the servers:
>> root at sto2:~# /etc/init.d/glusterfs-server stop
>> [ ok ] Stopping glusterfs-server (via systemctl): glusterfs-server.service.
>>
>> I touch the file:
>> root at sto2:~# touch /var/lib/glusterd/secure-access
>>
>> I start the service (the other server is still down):
>> root at sto2:~# /etc/init.d/glusterfs-server start
>> [ ok ] Starting glusterfs-server (via systemctl): glusterfs-server.service.
>> root at sto2:~# ps aux | grep glus
>> root     22538  1.3  0.4 402828 18668 ?        Ssl  10:07   0:00
>> /usr/sbin/glusterd -p /var/run/glusterd.pid
>> -> it is running.
>>
>> I check the pool:
>> root at sto2:~# gluster pool list
>> UUID                    Hostname              State
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2    sto1...    Disconnected
>> 8f51f101-254e-43f9-82a3-ec02591110b5    localhost Connected
>>
>> This is what is expected at this point.
>> But now the gluster daemon is dead:
>> root at sto2:~# gluster pool list
>> Connection failed. Please check if gluster daemon is operational.
>>
>> I can stop and start the service again, but it dies after the 1st command,
>> whatever the command (tested with 'gluster volume status', which answers
>> 'Volume HOME is not started', the correct state as I stopped the only
>> volume before activating server/server encryption).
>>
>> Note that at this point the other server is still down and no client is
>> running.
>> See the "crash log" from the server at the end of this mail.
>>
>>
>> I guess this is not the expected behavior, and it clearly differs from the
>> behavior without server/server encryption. For example, if I remove the
>> secure-access file:
>>
>> root at sto2:~# /etc/init.d/glusterfs-server stop
>> [ ok ] Stopping glusterfs-server (via systemctl): glusterfs-server.service.
>> root at sto2:~# rm /var/lib/glusterd/secure-access
>> root at sto2:~# /etc/init.d/glusterfs-server start
>> [ ok ] Starting glusterfs-server (via systemctl): glusterfs-server.service.
>> root at sto2:~# gluster pool list
>> UUID                    Hostname              State
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2    sto1...    Disconnected
>> 8f51f101-254e-43f9-82a3-ec02591110b5    localhost Connected
>>
>> And whatever I do, the daemon is still alive and responding.
>>
>>
>> Is this a bug, or did I miss something needed when moving to server/server
>> encryption?
>>
>>
>> Moreover, if I try to start the other server without performing any action
>> on the 1st one (to prevent the crash), I get a "ping-pong" crash (start on
>> sto2, then start on sto1):
>> root at sto2:~# /etc/init.d/glusterfs-server start
>> [ ok ] Starting glusterfs-server (via systemctl): glusterfs-server.service.
>> root at sto1:~# /etc/init.d/glusterfs-server start
>> [ ok ] Starting glusterfs-server (via systemctl): glusterfs-server.service.
>> root at sto1:~# gluster pool list
>> UUID                    Hostname              State
>> 8f51f101-254e-43f9-82a3-ec02591110b5    sto2.liris.cnrs.fr Disconnected
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2    localhost Connected
>> -> here the daemon is dead on sto2. Let's restart the sto2 daemon:
>> root at sto2:~# /etc/init.d/glusterfs-server restart
>> [ ok ] Restarting glusterfs-server (via systemctl):
>> glusterfs-server.service.
>> root at sto2:~# gluster pool list
>> UUID                    Hostname              State
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2    sto1.liris.cnrs.fr Disconnected
>> 8f51f101-254e-43f9-82a3-ec02591110b5    localhost Connected
>> -> here the daemon is dead on sto1.
>> root at sto1:~# gluster pool list
>> Connection failed. Please check if gluster daemon is operational.
>>
>>
>> If I restart both daemons (mostly) at the same time it works fine:
>> root at sto1:~# /etc/init.d/glusterfs-server restart
>> [ ok ] Restarting glusterfs-server (via systemctl):
>> glusterfs-server.service.
>> root at sto2:~# /etc/init.d/glusterfs-server restart
>> [ ok ] Restarting glusterfs-server (via systemctl): glusterfs-server.service
>> root at sto1:~# gluster pool list
>> UUID                    Hostname              State
>> 8f51f101-254e-43f9-82a3-ec02591110b5    sto2.liris.cnrs.fr Connected
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2    localhost Connected
>> root at sto2:~# gluster pool list
>> UUID                    Hostname              State
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2    sto1.liris.cnrs.fr Connected
>> 8f51f101-254e-43f9-82a3-ec02591110b5    localhost Connected
>>
>>
>> Of course this is not the expected behavior, as after a global shutdown the
>> servers may not restart at the same time. Moreover, it is a real problem when
>> shutting down a single server (e.g. for maintenance), as I get the
>> "ping-pong" problem again.
>>
>>
>> Any help would be appreciated.
>>
>> Note: before that, these 2 servers were used for testing replicated volumes
>> (without encryption) without any problem.
>>
>> Regards,
>> --
>> Y.
>>
>> Log from sto2:
>>
>> cat /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>
>> [2015-12-07 09:09:43.345640] I [MSGID: 100030] [glusterfsd.c:2035:main]
>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.7
>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
>> [2015-12-07 09:09:43.352452] I [glusterd.c:1214:init] 0-management: Maximum
>> allowed open file descriptors set to 65536
>> [2015-12-07 09:09:43.352516] I [glusterd.c:1259:init] 0-management: Using
>> /var/lib/glusterd as working directory
>> [2015-12-07 09:09:43.359063] I [socket.c:3880:socket_init]
>> 0-socket.management: SSL support on the I/O path is ENABLED
>> [2015-12-07 09:09:43.359102] I [socket.c:3883:socket_init]
>> 0-socket.management: SSL support for glusterd is ENABLED
>> [2015-12-07 09:09:43.359138] I [socket.c:3900:socket_init]
>> 0-socket.management: using private polling thread
>> [2015-12-07 09:09:43.361848] W [rdma.c:4440:__gf_rdma_ctx_create]
>> 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such
>> device)
>> [2015-12-07 09:09:43.361885] E [rdma.c:4744:init] 0-rdma.management: Failed
>> to initialize IB Device
>> [2015-12-07 09:09:43.361902] E [rpc-transport.c:333:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2015-12-07 09:09:43.362023] W [rpcsvc.c:1524:rpcsvc_transport_create]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2015-12-07 09:09:43.362267] I [socket.c:3883:socket_init]
>> 0-socket.management: SSL support for glusterd is ENABLED
>> [2015-12-07 09:09:46.812491] I
>> [glusterd-store.c:2048:glusterd_restore_op_version] 0-glusterd: retrieved
>> op-version: 30603
>> [2015-12-07 09:09:47.192205] I
>> [glusterd-handler.c:3179:glusterd_friend_add_from_peerinfo] 0-management:
>> connect returned 0
>> [2015-12-07 09:09:47.192321] I [rpc-clnt.c:969:rpc_clnt_connection_init]
>> 0-management: setting frame-timeout to 600
>> [2015-12-07 09:09:47.192564] I [socket.c:3880:socket_init] 0-management: SSL
>> support on the I/O path is ENABLED
>> [2015-12-07 09:09:47.192585] I [socket.c:3883:socket_init] 0-management: SSL
>> support for glusterd is ENABLED
>> [2015-12-07 09:09:47.192601] I [socket.c:3900:socket_init] 0-management:
>> using private polling thread
>> [2015-12-07 09:09:47.195831] E [socket.c:3016:socket_connect] 0-management:
>> connection attempt on  failed, (Connection refused)
>> [2015-12-07 09:09:47.196341] I [MSGID: 106004]
>> [glusterd-handler.c:4398:__glusterd_peer_rpc_notify] 0-management: Peer
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2, in Peer in Cluster state, has
>> disconnected from glusterd.
>> [2015-12-07 09:09:47.196413] E [socket.c:384:ssl_setup_connection]
>> 0-management: SSL connect error
>> [2015-12-07 09:09:47.196480] E [socket.c:2386:socket_poller] 0-management:
>> client setup failed
>> [2015-12-07 09:09:47.196534] E [glusterd-utils.c:181:glusterd_unlock]
>> 0-management: Cluster lock not held!
>> [2015-12-07 09:09:47.196642] I [mem-pool.c:545:mem_pool_destroy]
>> 0-management: size=588 max=0 total=0
>> [2015-12-07 09:09:47.196671] I [mem-pool.c:545:mem_pool_destroy]
>> 0-management: size=124 max=0 total=0
>> [2015-12-07 09:09:47.196787] I [glusterd.c:146:glusterd_uuid_init]
>> 0-management: retrieved UUID: 8f51f101-254e-43f9-82a3-ec02591110b5
>> Final graph:
>> +------------------------------------------------------------------------------+
>>    1: volume management
>>    2:     type mgmt/glusterd
>>    3:     option transport.socket.ssl-enabled on
>>    4:     option rpc-auth.auth-glusterfs on
>>    5:     option rpc-auth.auth-unix on
>>    6:     option rpc-auth.auth-null on
>>    7:     option transport.socket.listen-backlog 128
>>    8:     option ping-timeout 30
>>    9:     option transport.socket.read-fail-log off
>>   10:     option transport.socket.keepalive-interval 2
>>   11:     option transport.socket.keepalive-time 10
>>   12:     option transport-type rdma
>>   13:     option working-directory /var/lib/glusterd
>>   14: end-volume
>>   15:
>> +------------------------------------------------------------------------------+
>> [2015-12-07 09:09:50.348636] E [socket.c:2859:socket_connect] (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x147)[0x7f1b5a951497]
>> (-->
>> /usr/lib/x86_64-linux-gnu/glusterfs/3.6.7/rpc-transport/socket.so(+0x6c32)[0x7f1b545c3c32]
>> (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f1b5a723469]
>> (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xcd)[0x7f1b5a96b40d]
>> (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7f1b5a0e50a4] )))))
>> 0-socket: invalid argument: this->private
>> [2015-12-07 09:09:53.349724] E [socket.c:2859:socket_connect] (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x147)[0x7f1b5a951497]
>> (-->
>> /usr/lib/x86_64-linux-gnu/glusterfs/3.6.7/rpc-transport/socket.so(+0x6c32)[0x7f1b545c3c32]
>> (-->
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f1b5a723469]
>> (-->
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xcd)[0x7f1b5a96b40d]
>> (--> /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7f1b5a0e50a4] )))))
>> 0-socket: invalid argument: this->private
>> [2015-12-07 09:09:55.604061] W
>> [glusterd-op-sm.c:4073:glusterd_op_modify_op_ctx] 0-management: op_ctx
>> modification failed
>> [2015-12-07 09:09:55.604797] I
>> [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume HOME
>> [2015-12-07 09:09:55.605488] E [glusterd-syncop.c:1184:gd_stage_op_phase]
>> 0-management: Staging of operation 'Volume Status' failed on localhost :
>> Volume HOME is not started
>> [2015-12-07 09:09:47.196634] I [MSGID: 106004]
>> [glusterd-handler.c:4398:__glusterd_peer_rpc_notify] 0-management: Peer
>> 5fdb629d-886f-43cb-9a71-582051b0dbb2, in Peer in Cluster state, has
>> disconnected from glusterd.
>> pending frames:
>> patchset: git://git.gluster.com/glusterfs.git
>> signal received: 11
>> time of crash:
>> 2015-12-07 09:09:56
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 3.6.7
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb1)[0x7f1b5a9522a1]
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f1b5a96919d]
>> /lib/x86_64-linux-gnu/libc.so.6(+0x35180)[0x7f1b5996e180]
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_connect+0x8)[0x7f1b5a721f48]
>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9)[0x7f1b5a723469]
>> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xcd)[0x7f1b5a96b40d]
>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7f1b5a0e50a4]
>> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f1b59a1f04d]
>> ---------

