[Gluster-users] Fedora upgrade to f24 installed 3.8.0 client and broke mounting

Raghavendra Gowdappa rgowdapp at redhat.com
Mon Jun 27 07:38:21 UTC 2016



----- Original Message -----
> From: "Avra Sengupta" <asengupt at redhat.com>
> To: "Vijay Bellur" <vbellur at redhat.com>, "Alastair Neil" <ajneil.tech at gmail.com>, "gluster-users"
> <gluster-users at gluster.org>, "Niels de Vos" <ndevos at redhat.com>, "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> Sent: Monday, June 27, 2016 12:53:41 PM
> Subject: Re: [Gluster-users] Fedora upgrade to f24 installed 3.8.0 client and broke mounting
> 
> On 06/27/2016 12:04 PM, Avra Sengupta wrote:
> > On 06/25/2016 01:19 AM, Vijay Bellur wrote:
> >> On 06/24/2016 02:12 PM, Alastair Neil wrote:
> >>> I upgraded my fedora 23 system to f24 a couple of days ago, now I am
> >>> unable to mount my gluster cluster.
> >>>
> >>> The update installed:
> >>>
> >>> glusterfs-3.8.0-1.fc24.x86_64
> >>> glusterfs-libs-3.8.0-1.fc24.x86_64
> >>> glusterfs-fuse-3.8.0-1.fc24.x86_64
> >>> glusterfs-client-xlators-3.8.0-1.fc24.x86_64
> >>>
> >>> the gluster is running 3.7.11
> >>>
> >>> The volume is replica 3
> >>>
> >>> I see these errors in the mount log:
> >>>
> >>>     [2016-06-24 17:55:34.016462] I [MSGID: 100030]
> >>>     [glusterfsd.c:2408:main] 0-/usr/sbin/glusterfs: Started running
> >>>     /usr/sbin/glusterfs version 3.8.0 (args: /usr/sbin/glusterfs
> >>>     --volfile-server=gluster1 --volfile-id=homes /mnt/homes)
> >>>     [2016-06-24 17:55:34.094345] I [MSGID: 101190]
> >>>     [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started
> >>>     thread with index 1
> >>>     [2016-06-24 17:55:34.240135] I [MSGID: 101190]
> >>>     [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started
> >>>     thread with index 2
> >>>     [2016-06-24 17:55:34.240130] I [MSGID: 101190]
> >>>     [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started
> >>>     thread with index 4
> >>>     [2016-06-24 17:55:34.240130] I [MSGID: 101190]
> >>>     [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started
> >>>     thread with index 3
> >>>     [2016-06-24 17:55:34.241499] I [MSGID: 114020]
> >>>     [client.c:2356:notify] 0-homes-client-2: parent translators are
> >>>     ready, attempting connect on transport
> >>>     [2016-06-24 17:55:34.249172] I [MSGID: 114020]
> >>>     [client.c:2356:notify] 0-homes-client-5: parent translators are
> >>>     ready, attempting connect on transport
> >>>     [2016-06-24 17:55:34.250186] I [rpc-clnt.c:1855:rpc_clnt_reconfig]
> >>>     0-homes-client-2: changing port to 49171 (from 0)
> >>>     [2016-06-24 17:55:34.253347] I [MSGID: 114020]
> >>>     [client.c:2356:notify] 0-homes-client-6: parent translators are
> >>>     ready, attempting connect on transport
> >>>     [2016-06-24 17:55:34.254213] I [rpc-clnt.c:1855:rpc_clnt_reconfig]
> >>>     0-homes-client-5: changing port to 49154 (from 0)
> >>>     [2016-06-24 17:55:34.255115] I [MSGID: 114057]
> >>>     [client-handshake.c:1441:select_server_supported_programs]
> >>>     0-homes-client-2: Using Program GlusterFS 3.3, Num (1298437),
> >>>     Version (330)
> >>>     [2016-06-24 17:55:34.255861] W [MSGID: 114007]
> >>>     [client-handshake.c:1176:client_setvolume_cbk] 0-homes-client-2:
> >>>     failed to find key 'child_up' in the options
> >>>     [2016-06-24 17:55:34.259097] I [MSGID: 114057]
> >>>     [client-handshake.c:1441:select_server_supported_programs]
> >>>     0-homes-client-5: Using Program GlusterFS 3.3, Num (1298437),
> >>>     Version (330)
> >>>     Final graph:
> >>> +------------------------------------------------------------------------------+
> >>>
> >>>       1: volume homes-client-2
> >>>       2:     type protocol/client
> >>>       3:     option clnt-lk-version 1
> >>>       4:     option volfile-checksum 0
> >>>       5:     option volfile-key homes
> >>>       6:     option client-version 3.8.0
> >>>       7:     option process-uuid
> >>>     Island-29185-2016/06/24-17:55:34:10054-homes-client-2-0-0
> >>>       8:     option fops-version 1298437
> >>>       9:     option ping-timeout 20
> >>>      10:     option remote-host gluster-2
> >>>      11:     option remote-subvolume /export/brick2/home
> >>>      12:     option transport-type socket
> >>>      13:     option event-threads 4
> >>>      14:     option send-gids true
> >>>      15: end-volume
> >>>      16:
> >>>      17: volume homes-client-5
> >>>      18:     type protocol/client
> >>>      19:     option clnt-lk-version 1
> >>>      20:     option volfile-checksum 0
> >>>      21:     option volfile-key homes
> >>>      22:     option client-version 3.8.0
> >>>      23:     option process-uuid
> >>>     Island-29185-2016/06/24-17:55:34:10054-homes-client-5-0-0
> >>>      24:     option fops-version 1298437
> >>>      25:     option ping-timeout 20
> >>>      26:     option remote-host gluster1.vsnet.gmu.edu
> >>>      27:     option remote-subvolume /export/brick2/home
> >>>      28:     option transport-type socket
> >>>      29:     option event-threads 4
> >>>      30:     option send-gids true
> >>>      31: end-volume
> >>>      32:
> >>>      33: volume homes-client-6
> >>>      34:     type protocol/client
> >>>      35:     option ping-timeout 20
> >>>      36:     option remote-host gluster0
> >>>      37:     option remote-subvolume /export/brick2/home
> >>>      38:     option transport-type socket
> >>>      39:     option event-threads 4
> >>>      40:     option send-gids true
> >>>      41: end-volume
> >>>      42:
> >>>      43: volume homes-replicate-0
> >>>      44:     type cluster/replicate
> >>>      45:     option background-self-heal-count 20
> >>>      46:     option metadata-self-heal on
> >>>      47:     option data-self-heal off
> >>>      48:     option entry-self-heal on
> >>>      49:     option data-self-heal-window-size 8
> >>>      50:     option data-self-heal-algorithm diff
> >>>      51:     option eager-lock on
> >>>      52:     option quorum-type auto
> >>>      53:     option self-heal-readdir-size 64KB
> >>>      54:     subvolumes homes-client-2 homes-client-5 homes-client-6
> >>>      55: end-volume
> >>>      56:
> >>>      57: volume homes-dht
> >>>      58:     type cluster/distribute
> >>>      59:     option min-free-disk 5%
> >>>      60:     option rebalance-stats on
> >>>      61:     option readdir-optimize on
> >>>      62:     subvolumes homes-replicate-0
> >>>      63: end-volume
> >>>      64:
> >>>      65: volume homes-read-ahead
> >>>      66:     type performance/read-ahead
> >>>      67:     subvolumes homes-dht
> >>>      68: end-volume
> >>>      69:
> >>>      70: volume homes-io-cache
> >>>      71:     type performance/io-cache
> >>>      72:     subvolumes homes-read-ahead
> >>>      73: end-volume
> >>>      74:
> >>>      75: volume homes-quick-read
> >>>      76:     type performance/quick-read
> >>>      77:     subvolumes homes-io-cache
> >>>      78: end-volume
> >>>      79:
> >>>      80: volume homes-open-behind
> >>>      81:     type performance/open-behind
> >>>      82:     subvolumes homes-quick-read
> >>>      83: end-volume
> >>>      84:
> >>>      85: volume homes-md-cache
> >>>      86:     type performance/md-cache
> >>>      87:     subvolumes homes-open-behind
> >>>      88: end-volume
> >>>      89:
> >>>      90: volume homes
> >>>      91:     type debug/io-stats
> >>>      92:     option log-level INFO
> >>>      93:     option latency-measurement off
> >>>      94:     option count-fop-hits on
> >>>      95:     subvolumes homes-md-cache
> >>>      96: end-volume
> >>>      97:
> >>>      98: volume meta-autoload
> >>>      99:     type meta
> >>>     100:     subvolumes homes
> >>>     101: end-volume
> >>>     102:
> >>> +------------------------------------------------------------------------------+
> >>>
> >>>     [2016-06-24 17:55:34.261219] I [rpc-clnt.c:1855:rpc_clnt_reconfig]
> >>>     0-homes-client-6: changing port to 49153 (from 0)
> >>>     [2016-06-24 17:55:34.266096] I [MSGID: 114057]
> >>>     [client-handshake.c:1441:select_server_supported_programs]
> >>>     0-homes-client-6: Using Program GlusterFS 3.3, Num (1298437),
> >>>     Version (330)
> >>>     [2016-06-24 17:55:34.266905] W [MSGID: 114007]
> >>>     [client-handshake.c:1176:client_setvolume_cbk] 0-homes-client-6:
> >>>     failed to find key 'child_up' in the options
> >>>     [2016-06-24 17:55:34.273618] W [MSGID: 114007]
> >>>     [client-handshake.c:1176:client_setvolume_cbk] 0-homes-client-5:
> >>>     failed to find key 'child_up' in the options
> >>
> >>>
> >>>
> >>>
> >>> I checked the release notes for 3.8.0 but I did not see any caveats or
> >>> compatibility warnings.
> >>>
> >>> Anyone else seeing issues with 3.8 clients mounting 3.7 volumes?
> >>>
> >>
> >> Seems like it is due to this commit:
> >>
> >> commit 2bfdc30e0e7fba6f97d8829b2618a1c5907dc404
> >> Author: Avra Sengupta
> >> Date:   Mon Feb 29 14:43:58 2016 +0530
> >>
> >>     protocol client/server: Fix client-server handshake
> >>
> >> This commit introduced a new check to determine the existence of a
> >> key in the dictionary that gets exchanged between clients and servers
> >> during a handshake. Upon not finding the key, the clients bail out.
> >>
> >> Avra - would it be possible to avoid a hard check of 'child_up'
> >> during a handshake?
> > Yes Vijay, This particular failure is because the client is expecting
> > a 'child_up' from the server during a handshake, to determine if all
> > children in the server are up and it's not just a handshake. Although
> > this is the ideal behaviour in which the handshake should work, it is
> > currently breaking backward compatibility with 3.7 volumes, as those
> > servers are not sending the appropriate key which the newer client is
> > expecting.
> >
> > I would prefer not to bypass this check in the client, but rather
> > enforce this check only for connections coming from servers running 3.8.
> >
> > + Adding Raghavendra Gowdappa
> >
> > Raghavendra,
> >
> > Would it be possible to keep this check in the client specific to
> > servers running 3.8 and beyond?
> I have raised a bug for this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1350326 (3.8)
> 
> and I have sent a patch for this in master:
> http://review.gluster.org/#/c/14811/1

This approach fixes the current issue. Is there any reason to propagate CHILD_UP from the server to the client? Couldn't this be handled within the server itself, i.e., fail all setvolume requests on a brick until protocol/server on that brick has received a CHILD_UP (optionally sending an error that states the cause of the failure)? That way we could have fixed the original issue, clients connecting before the xlator stack on the brick is up, for older clients talking to newer servers too.
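
To make the server-side alternative concrete, here is a minimal sketch of gating setvolume on the brick's own CHILD_UP state. Everything in it is hypothetical: `struct brick_state`, `brick_on_child_up`, `handle_setvolume` and the result codes are illustrative names, not symbols from the GlusterFS tree, and the real protocol/server code would look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-brick state; not a GlusterFS structure. */
struct brick_state {
    bool child_up; /* true once the xlator stack below protocol/server is up */
};

/* Hypothetical result codes for a setvolume request. */
enum setvolume_result {
    SETVOLUME_OK = 0,
    SETVOLUME_BRICK_NOT_READY = 1
};

/* Called when protocol/server on the brick is notified of CHILD_UP. */
static void brick_on_child_up(struct brick_state *brick)
{
    assert(brick != NULL);
    brick->child_up = true;
}

/* Refuse the handshake until the brick's stack is up, so the client never
 * needs to be told about CHILD_UP at all. */
static enum setvolume_result handle_setvolume(const struct brick_state *brick)
{
    assert(brick != NULL);
    if (!brick->child_up)
        return SETVOLUME_BRICK_NOT_READY;
    return SETVOLUME_OK;
}
```

With this shape, a client of any version simply sees the handshake fail and can retry; no new key has to be understood on the client side.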

> 
> I will backport it to the 3.8 branch as soon as it is merged in master. With
> this patch we treat the absence of the said key as an indication that
> the server this client is connecting to is running an older version, and
> in that case we set conf->child_up to _gf_true explicitly. This should
> suffice to emulate the older behavior.
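
For readers following along, the fallback described above can be sketched roughly as below. This is not the actual patch: `struct kv`, `lookup` and `resolve_child_up` are hypothetical stand-ins for the handshake dictionary and the client callback, and the "1"/"0" string encoding of the value is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair standing in for the options dictionary
 * exchanged during the handshake; not the GlusterFS dict_t type. */
struct kv {
    const char *key;
    const char *value;
};

/* Return the value stored under key, or NULL if the key is absent. */
static const char *lookup(const struct kv *opts, size_t n, const char *key)
{
    assert(opts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(opts[i].key, key) == 0)
            return opts[i].value;
    }
    return NULL;
}

/* Decide what the client should record as child_up after setvolume. */
static bool resolve_child_up(const struct kv *opts, size_t n)
{
    const char *value = lookup(opts, n, "child_up");

    /* Key absent: assume an older server that never sends it and fall
     * back to the old behavior of treating the brick as up. */
    if (value == NULL)
        return true;

    /* Key present: honor what the server reported (assumed "1" = up). */
    return strcmp(value, "1") == 0;
}
```

The point of the design is that a missing key is no longer a reason to bail out; only an explicit "not up" from a newer server keeps the client from treating the brick as ready.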
> >>
> >> Note that if servers are upgraded ahead of the clients, this problem
> >> should not be seen.
> >>
> >> Thanks,
> >> Vijay
> >>
> >>
> >
> 
> 

