[Gluster-users] Failure while upgrading gluster to 3.10.1

Pawan Alwandi pawan at platform.sh
Wed May 31 06:00:42 UTC 2017


Hello Atin,

I've tried restarting the glusterd instances one after another, but still
see the same result.


On Tue, May 30, 2017 at 10:40 AM, Atin Mukherjee <amukherj at redhat.com>
wrote:

> Pawan - I haven't reached any conclusive analysis so far. But, looking
> at the client (nfs) & glusterd log files, it does look like there is an
> issue w.r.t. peer connections. Does restarting all the glusterd
> instances one by one solve this?
>
> On Mon, May 29, 2017 at 4:50 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>
>> Sorry for the big attachment in the previous mail... the last 1000 lines
>> of those logs are attached now.
>>
>> On Mon, May 29, 2017 at 4:44 PM, Pawan Alwandi <pawan at platform.sh> wrote:
>>
>>>
>>>
>>> On Thu, May 25, 2017 at 9:54 PM, Atin Mukherjee <amukherj at redhat.com>
>>> wrote:
>>>
>>>>
>>>> On Thu, 25 May 2017 at 19:11, Pawan Alwandi <pawan at platform.sh> wrote:
>>>>
>>>>> Hello Atin,
>>>>>
>>>>> Yes, glusterd on the other instances is up and running.  Below is the
>>>>> requested output on all three hosts.
>>>>>
>>>>> Host 1
>>>>>
>>>>> # gluster peer status
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: 192.168.0.7
>>>>> Uuid: 5ec54b4f-f60c-48c6-9e55-95f2bb58f633
>>>>> State: Peer in Cluster (Disconnected)
>>>>>
>>>>
>>>> Glusterd is disconnected here.
>>>>
>>>>>
>>>>>
>>>>> Hostname: 192.168.0.6
>>>>> Uuid: 83e9a0b9-6bd5-483b-8516-d8928805ed95
>>>>> State: Peer in Cluster (Disconnected)
>>>>>
>>>>
>>>> Same as above
>>>>
>>>> Can you please check what the glusterd log has to say here about
>>>> these disconnects?
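>>>> (e.g. /var/log/glusterfs/etc-glusterfs-glusterd.vol.log, the default
>>>> glusterd log location, assuming a standard package install)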
>>>>
>>>
>>> glusterd keeps logging this every 3s:
>>>
>>> [2017-05-29 11:04:52.182782] W [socket.c:852:__socket_keepalive]
>>> 0-socket: failed to set keep idle -1 on socket 5, Invalid argument
>>> [2017-05-29 11:04:52.182808] E [socket.c:2966:socket_connect]
>>> 0-management: Failed to set keep-alive: Invalid argument
>>> [2017-05-29 11:04:52.183032] W [socket.c:852:__socket_keepalive]
>>> 0-socket: failed to set keep idle -1 on socket 20, Invalid argument
>>> [2017-05-29 11:04:52.183052] E [socket.c:2966:socket_connect]
>>> 0-management: Failed to set keep-alive: Invalid argument
>>> [2017-05-29 11:04:52.183622] E [rpc-clnt.c:362:saved_frames_unwind]
>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8]
>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
>>> called at 2017-05-29 11:04:52.183210 (xid=0x23419)
>>> [2017-05-29 11:04:52.183735] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] )
>>> 0-management: Lock for vol shared not held
>>> [2017-05-29 11:04:52.183928] E [rpc-clnt.c:362:saved_frames_unwind]
>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8]
>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
>>> called at 2017-05-29 11:04:52.183422 (xid=0x23419)
>>> [2017-05-29 11:04:52.184027] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] )
>>> 0-management: Lock for vol shared not held
>>>
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> # gluster volume status
>>>>> Status of volume: shared
>>>>> Gluster process                             TCP Port  RDMA Port  Online  Pid
>>>>> ------------------------------------------------------------------------------
>>>>> Brick 192.168.0.5:/data/exports/shared      49152     0          Y       2105
>>>>> NFS Server on localhost                     2049      0          Y       2089
>>>>> Self-heal Daemon on localhost               N/A       N/A        Y       2097
>>>>>
>>>>
>>>> Volume status output does show all the bricks are up. So I'm not sure
>>>> why you are seeing the volume as read-only. Can you please provide the
>>>> mount log?
>>>>
>>>
>>> The attached tar has nfs.log, etc-glusterfs-glusterd.vol.log, and
>>> glustershd.log from host1.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Task Status of Volume shared
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> Host 2
>>>>>
>>>>> # gluster peer status
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: 192.168.0.7
>>>>> Uuid: 5ec54b4f-f60c-48c6-9e55-95f2bb58f633
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> Hostname: 192.168.0.5
>>>>> Uuid: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>>
>>>>> # gluster volume status
>>>>> Status of volume: shared
>>>>> Gluster process                             Port   Online  Pid
>>>>> ------------------------------------------------------------
>>>>> Brick 192.168.0.5:/data/exports/shared      49152  Y       2105
>>>>> Brick 192.168.0.6:/data/exports/shared      49152  Y       2188
>>>>> Brick 192.168.0.7:/data/exports/shared      49152  Y       2453
>>>>> NFS Server on localhost                     2049   Y       2194
>>>>> Self-heal Daemon on localhost               N/A    Y       2199
>>>>> NFS Server on 192.168.0.5                   2049   Y       2089
>>>>> Self-heal Daemon on 192.168.0.5             N/A    Y       2097
>>>>> NFS Server on 192.168.0.7                   2049   Y       2458
>>>>> Self-heal Daemon on 192.168.0.7             N/A    Y       2463
>>>>>
>>>>> Task Status of Volume shared
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>> Host 3
>>>>>
>>>>> # gluster peer status
>>>>> Number of Peers: 2
>>>>>
>>>>> Hostname: 192.168.0.5
>>>>> Uuid: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> Hostname: 192.168.0.6
>>>>> Uuid: 83e9a0b9-6bd5-483b-8516-d8928805ed95
>>>>> State: Peer in Cluster (Connected)
>>>>>
>>>>> # gluster volume status
>>>>> Status of volume: shared
>>>>> Gluster process                             Port   Online  Pid
>>>>> ------------------------------------------------------------
>>>>> Brick 192.168.0.5:/data/exports/shared      49152  Y       2105
>>>>> Brick 192.168.0.6:/data/exports/shared      49152  Y       2188
>>>>> Brick 192.168.0.7:/data/exports/shared      49152  Y       2453
>>>>> NFS Server on localhost                     2049   Y       2458
>>>>> Self-heal Daemon on localhost               N/A    Y       2463
>>>>> NFS Server on 192.168.0.6                   2049   Y       2194
>>>>> Self-heal Daemon on 192.168.0.6             N/A    Y       2199
>>>>> NFS Server on 192.168.0.5                   2049   Y       2089
>>>>> Self-heal Daemon on 192.168.0.5             N/A    Y       2097
>>>>>
>>>>> Task Status of Volume shared
>>>>> ------------------------------------------------------------------------------
>>>>> There are no active volume tasks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 24, 2017 at 8:32 PM, Atin Mukherjee <amukherj at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Are the other glusterd instances up? Output of gluster peer
>>>>>> status & gluster volume status, please?
>>>>>>
>>>>>> On Wed, May 24, 2017 at 4:20 PM, Pawan Alwandi <pawan at platform.sh>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Atin,
>>>>>>>
>>>>>>> So I got gluster downgraded to 3.7.9 on host 1 and now have the
>>>>>>> glusterfs and glusterfsd processes coming up.  But I see the volume is
>>>>>>> mounted read-only.
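>>>>>>>
>>>>>>> A quick check of the read-only behaviour (with /mnt/shared as a
>>>>>>> placeholder for our actual mount point):
>>>>>>>
>>>>>>>   touch /mnt/shared/.rw-test   # fails with "Read-only file system"
>>>>>>>   grep shared /proc/mounts     # shows the mount and its options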
>>>>>>>
>>>>>>> I see these being logged every 3s:
>>>>>>>
>>>>>>> [2017-05-24 10:45:44.440435] W [socket.c:852:__socket_keepalive]
>>>>>>> 0-socket: failed to set keep idle -1 on socket 17, Invalid argument
>>>>>>> [2017-05-24 10:45:44.440475] E [socket.c:2966:socket_connect]
>>>>>>> 0-management: Failed to set keep-alive: Invalid argument
>>>>>>> [2017-05-24 10:45:44.440734] W [socket.c:852:__socket_keepalive]
>>>>>>> 0-socket: failed to set keep idle -1 on socket 20, Invalid argument
>>>>>>> [2017-05-24 10:45:44.440754] E [socket.c:2966:socket_connect]
>>>>>>> 0-management: Failed to set keep-alive: Invalid argument
>>>>>>> [2017-05-24 10:45:44.441354] E [rpc-clnt.c:362:saved_frames_unwind]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8]
>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
>>>>>>> called at 2017-05-24 10:45:44.440945 (xid=0xbf)
>>>>>>> [2017-05-24 10:45:44.441505] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] )
>>>>>>> 0-management: Lock for vol shared not held
>>>>>>> [2017-05-24 10:45:44.441660] E [rpc-clnt.c:362:saved_frames_unwind]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f767c46d483]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f767c2383af]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f767c2384ce]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7f767c239c8e]
>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f767c23a4a8]
>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
>>>>>>> called at 2017-05-24 10:45:44.441086 (xid=0xbf)
>>>>>>> [2017-05-24 10:45:44.441790] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7f767734dffb]
>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x14a) [0x7f7677357c6a]
>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.9/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7f76773f0ef3] )
>>>>>>> 0-management: Lock for vol shared not held
>>>>>>>
>>>>>>> The heal info says this:
>>>>>>>
>>>>>>> # gluster volume heal shared info
>>>>>>> Brick 192.168.0.5:/data/exports/shared
>>>>>>> Number of entries: 0
>>>>>>>
>>>>>>> Brick 192.168.0.6:/data/exports/shared
>>>>>>> Status: Transport endpoint is not connected
>>>>>>>
>>>>>>> Brick 192.168.0.7:/data/exports/shared
>>>>>>> Status: Transport endpoint is not connected
>>>>>>>
>>>>>>> Any idea what's up here?
>>>>>>>
>>>>>>> Pawan
>>>>>>>
>>>>>>> On Mon, May 22, 2017 at 9:42 PM, Atin Mukherjee <amukherj at redhat.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 22, 2017 at 9:05 PM, Pawan Alwandi <pawan at platform.sh>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, May 22, 2017 at 8:36 PM, Atin Mukherjee <
>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, May 22, 2017 at 7:51 PM, Atin Mukherjee <
>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry Pawan, I did miss the other part of the attachments. So,
>>>>>>>>>>> looking at the glusterd.info file from all the hosts, it looks
>>>>>>>>>>> like host2 and host3 do not have the correct op-version. Can you
>>>>>>>>>>> please set the op-version to "operating-version=30702" on host2
>>>>>>>>>>> and host3 and restart the glusterd instances one by one on all
>>>>>>>>>>> the nodes?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please ensure that all the hosts are upgraded to the same bits
>>>>>>>>>> before doing this change.
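>>>>>>>>>>
>>>>>>>>>> A sketch of that change, assuming the default working directory
>>>>>>>>>> /var/lib/glusterd (as seen in your logs):
>>>>>>>>>>
>>>>>>>>>>   # on host2 and host3, one node at a time
>>>>>>>>>>   sed -i 's/^operating-version=.*/operating-version=30702/' \
>>>>>>>>>>       /var/lib/glusterd/glusterd.info
>>>>>>>>>>   service glusterd restart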
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Having to upgrade all 3 hosts to a newer version before gluster
>>>>>>>>> can work successfully on any of them means application downtime.  The
>>>>>>>>> applications running on these hosts are expected to be highly available.
>>>>>>>>> So, with the way things are right now, is an online upgrade possible?
>>>>>>>>> My upgrade steps, sketched below, are: (1) stop the applications,
>>>>>>>>> (2) unmount the gluster volume, and then (3) upgrade gluster one host
>>>>>>>>> at a time.
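>>>>>>>>>
>>>>>>>>> Roughly, per host (a sketch; the mount point and app commands are
>>>>>>>>> placeholders for what we actually run):
>>>>>>>>>
>>>>>>>>>   stop-application-services          # hypothetical app stop step
>>>>>>>>>   umount /mnt/shared                 # placeholder mount point
>>>>>>>>>   apt-get install glusterfs-server   # pull in the new gluster packages
>>>>>>>>>   service glusterd restart
>>>>>>>>>   mount /mnt/shared
>>>>>>>>>   start-application-services         # hypothetical app start step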
>>>>>>>>>
>>>>>>>>
>>>>>>>> One way to mitigate this is to first do an online upgrade to
>>>>>>>> glusterfs-3.7.9 (op-version: 30707), given this bug was introduced in
>>>>>>>> 3.7.10, and then move to 3.11.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Our goal is to get gluster upgraded from 3.6.9 to 3.11, and to
>>>>>>>>> make this an online upgrade we are okay with taking two steps: 3.6.9 ->
>>>>>>>>> 3.7 and then 3.7 -> 3.11.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Apparently you have uncovered a bug here. During peer
>>>>>>>>>>> handshaking, if one of the glusterd instances is running with old
>>>>>>>>>>> bits, there is a possibility that the uuid received while validating
>>>>>>>>>>> the handshake request will be blank, and that used to be ignored.
>>>>>>>>>>> However, patch http://review.gluster.org/13519 added some changes
>>>>>>>>>>> that always look at this field and do some extra checks, which
>>>>>>>>>>> causes the handshake to fail. For now, the above workaround should
>>>>>>>>>>> suffice. I'll be sending a patch pretty soon.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Posted a patch: https://review.gluster.org/#/c/17358
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, May 22, 2017 at 11:35 AM, Pawan Alwandi <
>>>>>>>>>>> pawan at platform.sh> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Atin,
>>>>>>>>>>>>
>>>>>>>>>>>> The tars have the content of /var/lib/glusterd for all 3
>>>>>>>>>>>> nodes too; please check again.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 22, 2017 at 11:32 AM, Atin Mukherjee <
>>>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Pawan,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see you have provided the log files from the nodes; however,
>>>>>>>>>>>>> it'd be really helpful if you could provide the content of
>>>>>>>>>>>>> /var/lib/glusterd from all the nodes to get to the root cause of this issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 19, 2017 at 12:09 PM, Pawan Alwandi <
>>>>>>>>>>>>> pawan at platform.sh> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Atin,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the continued support.  I've attached the requested
>>>>>>>>>>>>>> files from all 3 nodes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I think we already verified that the UUIDs are correct; anyway,
>>>>>>>>>>>>>> let us know if you find any more info in the logs.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 18, 2017 at 11:45 PM, Atin Mukherjee <
>>>>>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 18 May 2017 at 23:40, Atin Mukherjee <
>>>>>>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, 17 May 2017 at 12:47, Pawan Alwandi
>>>>>>>>>>>>>>>> <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello Atin,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I realized that the instructions at
>>>>>>>>>>>>>>>>> http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/
>>>>>>>>>>>>>>>>> only work for upgrades from 3.7, while we are running 3.6.2.  Are
>>>>>>>>>>>>>>>>> there any instructions/suggestions you have for us to upgrade from 3.6?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe an upgrade from 3.6 to 3.7 and then to 3.10 would
>>>>>>>>>>>>>>>>> work, but I see similar errors reported when I upgraded to 3.7 too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For what it's worth, I was able to set the op-version
>>>>>>>>>>>>>>>>> (gluster v set all cluster.op-version 30702) but that doesn't seem to help.
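>>>>>>>>>>>>>>>>> (The on-disk value can be confirmed on each node with, e.g.,
>>>>>>>>>>>>>>>>> grep operating-version /var/lib/glusterd/glusterd.info.)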
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.700014] I [MSGID: 100030]
>>>>>>>>>>>>>>>>> [glusterfsd.c:2338:main] 0-/usr/sbin/glusterd: Started running
>>>>>>>>>>>>>>>>> /usr/sbin/glusterd version 3.7.20 (args: /usr/sbin/glusterd -p
>>>>>>>>>>>>>>>>> /var/run/glusterd.pid)
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.703808] I [MSGID: 106478]
>>>>>>>>>>>>>>>>> [glusterd.c:1383:init] 0-management: Maximum allowed open file descriptors
>>>>>>>>>>>>>>>>> set to 65536
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.703836] I [MSGID: 106479]
>>>>>>>>>>>>>>>>> [glusterd.c:1432:init] 0-management: Using /var/lib/glusterd as working
>>>>>>>>>>>>>>>>> directory
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.708866] W [MSGID: 103071]
>>>>>>>>>>>>>>>>> [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma:
>>>>>>>>>>>>>>>>> rdma_cm event channel creation failed [No such device]
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.709011] W [MSGID: 103055]
>>>>>>>>>>>>>>>>> [rdma.c:4901:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.709033] W
>>>>>>>>>>>>>>>>> [rpc-transport.c:359:rpc_transport_load] 0-rpc-transport:
>>>>>>>>>>>>>>>>> 'rdma' initialization failed
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.709088] W
>>>>>>>>>>>>>>>>> [rpcsvc.c:1642:rpcsvc_create_listener] 0-rpc-service:
>>>>>>>>>>>>>>>>> cannot create listener, initing the transport failed
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:33.709105] E [MSGID: 106243]
>>>>>>>>>>>>>>>>> [glusterd.c:1656:init] 0-management: creation of 1 listeners failed,
>>>>>>>>>>>>>>>>> continuing with succeeded transport
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.480043] I [MSGID: 106513]
>>>>>>>>>>>>>>>>> [glusterd-store.c:2068:glusterd_restore_op_version]
>>>>>>>>>>>>>>>>> 0-glusterd: retrieved op-version: 30600
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.605779] I [MSGID: 106498]
>>>>>>>>>>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo]
>>>>>>>>>>>>>>>>> 0-management: connect returned 0
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.607059] I
>>>>>>>>>>>>>>>>> [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management:
>>>>>>>>>>>>>>>>> setting frame-timeout to 600
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.607670] I
>>>>>>>>>>>>>>>>> [rpc-clnt.c:1046:rpc_clnt_connection_init] 0-management:
>>>>>>>>>>>>>>>>> setting frame-timeout to 600
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.607025] I [MSGID: 106498]
>>>>>>>>>>>>>>>>> [glusterd-handler.c:3640:glusterd_friend_add_from_peerinfo]
>>>>>>>>>>>>>>>>> 0-management: connect returned 0
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.608125] I [MSGID: 106544]
>>>>>>>>>>>>>>>>> [glusterd.c:159:glusterd_uuid_init] 0-management:
>>>>>>>>>>>>>>>>> retrieved UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Final graph:
>>>>>>>>>>>>>>>>> +------------------------------------------------------------------------------+
>>>>>>>>>>>>>>>>>   1: volume management
>>>>>>>>>>>>>>>>>   2:     type mgmt/glusterd
>>>>>>>>>>>>>>>>>   3:     option rpc-auth.auth-glusterfs on
>>>>>>>>>>>>>>>>>   4:     option rpc-auth.auth-unix on
>>>>>>>>>>>>>>>>>   5:     option rpc-auth.auth-null on
>>>>>>>>>>>>>>>>>   6:     option rpc-auth-allow-insecure on
>>>>>>>>>>>>>>>>>   7:     option transport.socket.listen-backlog 128
>>>>>>>>>>>>>>>>>   8:     option event-threads 1
>>>>>>>>>>>>>>>>>   9:     option ping-timeout 0
>>>>>>>>>>>>>>>>>  10:     option transport.socket.read-fail-log off
>>>>>>>>>>>>>>>>>  11:     option transport.socket.keepalive-interval 2
>>>>>>>>>>>>>>>>>  12:     option transport.socket.keepalive-time 10
>>>>>>>>>>>>>>>>>  13:     option transport-type rdma
>>>>>>>>>>>>>>>>>  14:     option working-directory /var/lib/glusterd
>>>>>>>>>>>>>>>>>  15: end-volume
>>>>>>>>>>>>>>>>>  16:
>>>>>>>>>>>>>>>>> +------------------------------------------------------------------------------+
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.609868] I [MSGID: 101190]
>>>>>>>>>>>>>>>>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll:
>>>>>>>>>>>>>>>>> Started thread with index 1
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.610839] W [socket.c:596:__socket_rwv]
>>>>>>>>>>>>>>>>> 0-management: readv on 192.168.0.7:24007 failed (No data
>>>>>>>>>>>>>>>>> available)
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.611907] E [rpc-clnt.c:370:saved_frames_unwind]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] )))))
>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called
>>>>>>>>>>>>>>>>> at 2017-05-17 06:48:35.609965 (xid=0x1)
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.611928] E [MSGID: 106167]
>>>>>>>>>>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk]
>>>>>>>>>>>>>>>>> 0-management: Error through RPC layer, retry again later
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.611944] I [MSGID: 106004]
>>>>>>>>>>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>),
>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd.
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.612024] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b]
>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0]
>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] )
>>>>>>>>>>>>>>>>> 0-management: Lock for vol shared not held
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.612039] W [MSGID: 106118]
>>>>>>>>>>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.612079] W [socket.c:596:__socket_rwv]
>>>>>>>>>>>>>>>>> 0-management: readv on 192.168.0.6:24007 failed (No data
>>>>>>>>>>>>>>>>> available)
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.612179] E [rpc-clnt.c:370:saved_frames_unwind]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fd6c2d70bb3]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7fd6c2b3ba39]
>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] )))))
>>>>>>>>>>>>>>>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called
>>>>>>>>>>>>>>>>> at 2017-05-17 06:48:35.610007 (xid=0x1)
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.612197] E [MSGID: 106167]
>>>>>>>>>>>>>>>>> [glusterd-handshake.c:2091:__glusterd_peer_dump_version_cbk]
>>>>>>>>>>>>>>>>> 0-management: Error through RPC layer, retry again later
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.612211] I [MSGID: 106004]
>>>>>>>>>>>>>>>>> [glusterd-handler.c:5201:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>),
>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd.
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.612292] W [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4b) [0x7fd6bdc4912b]
>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x160) [0x7fd6bdc52dd0]
>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.20/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x4c3) [0x7fd6bdcef1b3] )
>>>>>>>>>>>>>>>>> 0-management: Lock for vol shared not held
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.613432] W [MSGID: 106118]
>>>>>>>>>>>>>>>>> [glusterd-handler.c:5223:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared
>>>>>>>>>>>>>>>>> [2017-05-17 06:48:35.614317] E [MSGID: 106170]
>>>>>>>>>>>>>>>>> [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req]
>>>>>>>>>>>>>>>>> 0-management: Request from peer 192.168.0.6:991 has an
>>>>>>>>>>>>>>>>> entry in peerinfo, but uuid does not match
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Apologies for the delay. My initial suspicion was correct. You
>>>>>>>>>>>>>>>> have an incorrect UUID in the peer file, which is causing this. Can you
>>>>>>>>>>>>>>>> please provide me the
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Clicked the send button accidentally!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please send me the content of /var/lib/glusterd &
>>>>>>>>>>>>>>> glusterd log from all the nodes?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, May 15, 2017 at 10:31 PM, Atin Mukherjee <
>>>>>>>>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, 15 May 2017 at 11:58, Pawan Alwandi
>>>>>>>>>>>>>>>>>> <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Atin,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I see the below error.  Do I require gluster to be upgraded
>>>>>>>>>>>>>>>>>>> on all 3 hosts for this to work?  Right now I have host 1 running 3.10.1
>>>>>>>>>>>>>>>>>>> and hosts 2 & 3 running 3.6.2.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # gluster v set all cluster.op-version 31001
>>>>>>>>>>>>>>>>>>> volume set: failed: Required op_version (31001) is not supported
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, you should, given that 3.6 is EOLed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, May 15, 2017 at 3:32 AM, Atin Mukherjee <
>>>>>>>>>>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Sun, 14 May 2017 at 21:43, Atin Mukherjee <
>>>>>>>>>>>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Alright, I see that you haven't bumped up the
>>>>>>>>>>>>>>>>>>>>> op-version. Can you please execute:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> gluster v set all cluster.op-version 30101  and then
>>>>>>>>>>>>>>>>>>>>> restart glusterd on all the nodes and check the brick status?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> s/30101/31001
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, May 14, 2017 at 8:55 PM, Pawan Alwandi <
>>>>>>>>>>>>>>>>>>>>> pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello Atin,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for looking at this.  Below is the output you
>>>>>>>>>>>>>>>>>>>>>> requested.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Again, I'm seeing those errors after upgrading
>>>>>>>>>>>>>>>>>>>>>> gluster on host 1.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Host 1
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info
>>>>>>>>>>>>>>>>>>>>>> UUID=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>>>>>>>>>>>>>>>>>>>> operating-version=30600
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/*
>>>>>>>>>>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633
>>>>>>>>>>>>>>>>>>>>>> state=3
>>>>>>>>>>>>>>>>>>>>>> hostname1=192.168.0.7
>>>>>>>>>>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95
>>>>>>>>>>>>>>>>>>>>>> state=3
>>>>>>>>>>>>>>>>>>>>>> hostname1=192.168.0.6
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # gluster --version
>>>>>>>>>>>>>>>>>>>>>> glusterfs 3.10.1
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Host 2
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info
>>>>>>>>>>>>>>>>>>>>>> UUID=83e9a0b9-6bd5-483b-8516-d8928805ed95
>>>>>>>>>>>>>>>>>>>>>> operating-version=30600
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/*
>>>>>>>>>>>>>>>>>>>>>> uuid=5ec54b4f-f60c-48c6-9e55-95f2bb58f633
>>>>>>>>>>>>>>>>>>>>>> state=3
>>>>>>>>>>>>>>>>>>>>>> hostname1=192.168.0.7
>>>>>>>>>>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>>>>>>>>>>>>>>>>>>>> state=3
>>>>>>>>>>>>>>>>>>>>>> hostname1=192.168.0.5
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # gluster --version
>>>>>>>>>>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Host 3
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # cat /var/lib/glusterd/glusterd.info
>>>>>>>>>>>>>>>>>>>>>> UUID=5ec54b4f-f60c-48c6-9e55-95f2bb58f633
>>>>>>>>>>>>>>>>>>>>>> operating-version=30600
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # cat /var/lib/glusterd/peers/*
>>>>>>>>>>>>>>>>>>>>>> uuid=7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>>>>>>>>>>>>>>>>>>>> state=3
>>>>>>>>>>>>>>>>>>>>>> hostname1=192.168.0.5
>>>>>>>>>>>>>>>>>>>>>> uuid=83e9a0b9-6bd5-483b-8516-d8928805ed95
>>>>>>>>>>>>>>>>>>>>>> state=3
>>>>>>>>>>>>>>>>>>>>>> hostname1=192.168.0.6
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # gluster --version
>>>>>>>>>>>>>>>>>>>>>> glusterfs 3.6.2 built on Jan 21 2015 14:23:44
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sat, May 13, 2017 at 6:28 PM, Atin Mukherjee <
>>>>>>>>>>>>>>>>>>>>>> amukherj at redhat.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I have already asked for the following earlier:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Can you please provide the output of the following from all
>>>>>>>>>>>>>>>>>>>>>>> the nodes:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> cat /var/lib/glusterd/glusterd.info
>>>>>>>>>>>>>>>>>>>>>>> cat /var/lib/glusterd/peers/*
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Sat, 13 May 2017 at 12:22, Pawan Alwandi
>>>>>>>>>>>>>>>>>>>>>>> <pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello folks,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Does anyone have any idea what's going on here?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, May 10, 2017 at 5:02 PM, Pawan Alwandi <
>>>>>>>>>>>>>>>>>>>>>>>> pawan at platform.sh> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I'm trying to upgrade gluster from 3.6.2 to 3.10.1
>>>>>>>>>>>>>>>>>>>>>>>>> but don't see the glusterfsd and glusterfs processes coming up.
>>>>>>>>>>>>>>>>>>>>>>>>> http://gluster.readthedocs.io/en/latest/Upgrade-Guide/upgrade_to_3.10/
>>>>>>>>>>>>>>>>>>>>>>>>> is the process that I'm trying to follow.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This is a 3-node server setup with a replicated
>>>>>>>>>>>>>>>>>>>>>>>>> volume with a replica count of 3.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Logs below:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.507959] I [MSGID: 100030]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterfsd.c:2460:main] 0-/usr/sbin/glusterd: Started running
>>>>>>>>>>>>>>>>>>>>>>>>> /usr/sbin/glusterd version 3.10.1 (args: /usr/sbin/glusterd -p
>>>>>>>>>>>>>>>>>>>>>>>>> /var/run/glusterd.pid)
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512827] I [MSGID: 106478]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd.c:1449:init] 0-management: Maximum allowed open file descriptors
>>>>>>>>>>>>>>>>>>>>>>>>> set to 65536
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.512855] I [MSGID: 106479]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd.c:1496:init] 0-management: Using /var/lib/glusterd as working
>>>>>>>>>>>>>>>>>>>>>>>>> directory
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520426] W [MSGID: 103071]
>>>>>>>>>>>>>>>>>>>>>>>>> [rdma.c:4590:__gf_rdma_ctx_create]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-rpc-transport/rdma: rdma_cm event channel creation failed [No such device]
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520452] W [MSGID: 103055]
>>>>>>>>>>>>>>>>>>>>>>>>> [rdma.c:4897:init] 0-rdma.management: Failed to initialize IB Device
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520465] W
>>>>>>>>>>>>>>>>>>>>>>>>> [rpc-transport.c:350:rpc_transport_load]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-rpc-transport: 'rdma' initialization failed
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520518] W
>>>>>>>>>>>>>>>>>>>>>>>>> [rpcsvc.c:1661:rpcsvc_create_listener]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-rpc-service: cannot create listener, initing the transport failed
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:03.520534] E [MSGID: 106243]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd.c:1720:init] 0-management: creation of 1 listeners failed,
>>>>>>>>>>>>>>>>>>>>>>>>> continuing with succeeded transport
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.931764] I [MSGID: 106513]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-store.c:2197:glusterd_restore_op_version]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-glusterd: retrieved op-version: 30600
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.964354] I [MSGID: 106544]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd.c:158:glusterd_uuid_init] 0-management:
>>>>>>>>>>>>>>>>>>>>>>>>> retrieved UUID: 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.993944] I [MSGID: 106498]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: connect returned 0
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995864] I [MSGID: 106498]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: connect returned 0
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995879] W [MSGID: 106062]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glusterd_transport_inet_options_build]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-glusterd: Failed to get tcp-user-timeout
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.995903] I
>>>>>>>>>>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: setting frame-timeout to 600
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996325] I
>>>>>>>>>>>>>>>>>>>>>>>>> [rpc-clnt.c:1059:rpc_clnt_connection_init]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: setting frame-timeout to 600
>>>>>>>>>>>>>>>>>>>>>>>>> Final graph:
>>>>>>>>>>>>>>>>>>>>>>>>> +------------------------------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>>   1: volume management
>>>>>>>>>>>>>>>>>>>>>>>>>   2:     type mgmt/glusterd
>>>>>>>>>>>>>>>>>>>>>>>>>   3:     option rpc-auth.auth-glusterfs on
>>>>>>>>>>>>>>>>>>>>>>>>>   4:     option rpc-auth.auth-unix on
>>>>>>>>>>>>>>>>>>>>>>>>>   5:     option rpc-auth.auth-null on
>>>>>>>>>>>>>>>>>>>>>>>>>   6:     option rpc-auth-allow-insecure on
>>>>>>>>>>>>>>>>>>>>>>>>>   7:     option transport.socket.listen-backlog
>>>>>>>>>>>>>>>>>>>>>>>>> 128
>>>>>>>>>>>>>>>>>>>>>>>>>   8:     option event-threads 1
>>>>>>>>>>>>>>>>>>>>>>>>>   9:     option ping-timeout 0
>>>>>>>>>>>>>>>>>>>>>>>>>  10:     option transport.socket.read-fail-log off
>>>>>>>>>>>>>>>>>>>>>>>>>  11:     option transport.socket.keepalive-interval
>>>>>>>>>>>>>>>>>>>>>>>>> 2
>>>>>>>>>>>>>>>>>>>>>>>>>  12:     option transport.socket.keepalive-time 10
>>>>>>>>>>>>>>>>>>>>>>>>>  13:     option transport-type rdma
>>>>>>>>>>>>>>>>>>>>>>>>>  14:     option working-directory /var/lib/glusterd
>>>>>>>>>>>>>>>>>>>>>>>>>  15: end-volume
>>>>>>>>>>>>>>>>>>>>>>>>>  16:
>>>>>>>>>>>>>>>>>>>>>>>>> +------------------------------------------------------------------------------+
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:04.996310] W [MSGID: 106062]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:3466:glusterd_transport_inet_options_build]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-glusterd: Failed to get tcp-user-timeout
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.000461] I [MSGID: 101190]
>>>>>>>>>>>>>>>>>>>>>>>>> [event-epoll.c:629:event_dispatch_epoll_worker]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-epoll: Started thread with index 1
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001493] W
>>>>>>>>>>>>>>>>>>>>>>>>> [socket.c:593:__socket_rwv] 0-management: readv on
>>>>>>>>>>>>>>>>>>>>>>>>> 192.168.0.7:24007 failed (No data available)
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001513] I [MSGID: 106004]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.7> (<5ec54b4f-f60c-48c6-9e55-95f2bb58f633>),
>>>>>>>>>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd.
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001677] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
>>>>>>>>>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) [0x7f0bf9d74559]
>>>>>>>>>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) [0x7f0bf9d7dcf0]
>>>>>>>>>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) [0x7f0bf9e29ba3] )
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: Lock for vol shared not held
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.001696] W [MSGID: 106118]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003099] E [rpc-clnt.c:365:saved_frames_unwind]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f0bfec904bf]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f0bfec91c21]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710]
>>>>>>>>>>>>>>>>>>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
>>>>>>>>>>>>>>>>>>>>>>>>> called at 2017-05-10 09:07:05.000627 (xid=0x1)
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003129] E [MSGID: 106167]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handshake.c:2181:__glusterd_peer_dump_version_cbk]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: Error through RPC layer, retry again later
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003251] W
>>>>>>>>>>>>>>>>>>>>>>>>> [socket.c:593:__socket_rwv] 0-management: readv on
>>>>>>>>>>>>>>>>>>>>>>>>> 192.168.0.6:24007 failed (No data available)
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003267] I [MSGID: 106004]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:5882:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: Peer <192.168.0.6> (<83e9a0b9-6bd5-483b-8516-d8928805ed95>),
>>>>>>>>>>>>>>>>>>>>>>>>> in state <Peer in Cluster>, has disconnected from glusterd.
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003318] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
>>>>>>>>>>>>>>>>>>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x20559) [0x7f0bf9d74559]
>>>>>>>>>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0x29cf0) [0x7f0bf9d7dcf0]
>>>>>>>>>>>>>>>>>>>>>>>>> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.1/xlator/mgmt/glusterd.so(+0xd5ba3) [0x7f0bf9e29ba3] )
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: Lock for vol shared not held
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003329] W [MSGID: 106118]
>>>>>>>>>>>>>>>>>>>>>>>>> [glusterd-handler.c:5907:__glusterd_peer_rpc_notify]
>>>>>>>>>>>>>>>>>>>>>>>>> 0-management: Lock not released for shared
>>>>>>>>>>>>>>>>>>>>>>>>> [2017-05-10 09:07:05.003457] E [rpc-clnt.c:365:saved_frames_unwind]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13c)[0x7f0bfeeca73c]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7f0bfec904bf]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f0bfec905de]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f0bfec91c21]
>>>>>>>>>>>>>>>>>>>>>>>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x290)[0x7f0bfec92710]
>>>>>>>>>>>>>>>>>>>>>>>>> ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
>>>>>>>>>>>>>>>>>>>>>>>>> called at 2017-05-10 09:07:05.001407 (xid=0x1)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> There are a bunch of errors reported, but I'm not
>>>>>>>>>>>>>>>>>>>>>>>>> sure which are signal and which are noise.  Does anyone have any idea
>>>>>>>>>>>>>>>>>>>>>>>>> what's going on here?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Pawan
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> - Atin (atinm)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> - Atin (atinm)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> - Atin (atinm)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> - Atin (atinm)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> - Atin (atinm)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>> - Atin (atinm)
>>>>
>>>
>>>
>>
>