[Gluster-users] Gluster 11.0 upgrade

Tue Feb 21 08:29:28 UTC 2023

Hi Marcus,

On Mon, Feb 20, 2023 at 2:53 PM Marcus Pedersén <marcus.pedersen at slu.se>
wrote:

> Hi again Xavi,
>
> I did some more testing on my virtual machines
> with the same setup:
> Number of Bricks: 1 x (2 + 1) = 3
> If I do it the same way and upgrade the arbiter first,
> I get the same behaviour: the bricks do not start
> and the other nodes do not "see" the upgraded node.
> If I upgrade one of the other (non-arbiter) nodes and restart
> glusterd on both the arbiter and that node, the arbiter starts
> its bricks and connects to the other upgraded node as expected.
> If I upgrade the last (non-arbiter) node, it fails to start
> its bricks, the same behaviour as the arbiter showed at first.
> If I then copy /var/lib/glusterd/vols/<myvol> from the
> upgraded (non-arbiter) node to the node that does not start its
> bricks, replace /var/lib/glusterd/vols/<myvol> there with the copied
> directory, and restart glusterd, it works nicely after that.
> Everything then works the way it should.
>
> So the question is: is the arbiter treated differently
> from the other nodes?
>

It seems so, but at this point I'm not sure what could be the difference.

>
> Is some kind of configuration step at glusterd startup
> making the node fail?
>

Gluster requires that all glusterd instances share the same configuration.
In this case it seems that the "info" file in the volume definition has
different contents on the servers. One of the servers has the value
"nfs.disable=on" but the others do not. This may be the difference that
causes the checksum error.
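
To spot this kind of mismatch, the two "info" files can be compared key by
key. Below is a minimal sketch in Python, assuming the file consists of plain
key=value lines with unique keys (as the diff quoted further down suggests).
The helper names parse_info and diff_info are hypothetical and not part of
Gluster.

```python
def parse_info(text: str) -> dict:
    """Parse key=value lines into a dict; lines without '=' are skipped."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # keep only well-formed key=value lines
            entries[key] = value
    return entries


def diff_info(local: dict, remote: dict) -> dict:
    """Return {key: (local_value, remote_value)} for every key that differs.

    A key that is missing on one side is reported as None on that side.
    """
    keys = sorted(set(local) | set(remote))
    return {
        key: (local.get(key), remote.get(key))
        for key in keys
        if local.get(key) != remote.get(key)
    }
```

Reading both files with pathlib.Path(...).read_text() and passing the parsed
results to diff_info should list only the keys whose values disagree, for
example a key such as nfs.disable that is present on one node only.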

You can try to copy the "info" file from one node to the one that doesn't
start and try restarting glusterd.
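
A minimal sketch of that copy step in Python, keeping a backup of the old
file first. It assumes the good "info" file has already been fetched from a
working node to a local path, and that glusterd is stopped on the node while
the file is replaced (probably the safest approach, but an assumption on my
side). The function name replace_info and the backup directory are
hypothetical.

```python
import shutil
from pathlib import Path


def replace_info(good_info: Path, vol_dir: Path, backup_dir: Path) -> Path:
    """Back up vol_dir/info into backup_dir, then overwrite it with good_info.

    Returns the path of the backup copy so the change can be reverted.
    """
    target = vol_dir / "info"
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / "info.orig"
    shutil.copy2(target, backup)     # keep the old contents and metadata
    shutil.copy2(good_info, target)  # install the file from the working node
    return backup
```

For the volume in this thread, vol_dir would be
/var/lib/glusterd/vols/gds-common; restart glusterd afterwards.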

> Do I dare continue upgrading my real cluster in the way
> described above?
>
> Thanks!
>
> Regards
> Marcus
>
>
>
> On Mon, Feb 20, 2023 at 01:42:47PM +0100, Marcus Pedersén wrote:
> > I made a recursive diff on the upgraded arbiter.
> >
> > /var/lib/glusterd/vols/gds-common is the upgraded arbiter
> > /home/marcus/gds-common is one of the other nodes still on gluster 10
> >
> > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-030:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-030:-urd-gds-gds-common
> > 5c5
> > < listen-port=60419
> > ---
> > > listen-port=0
> > 11c11
> > < brick-fsid=14764358630653534655
> > ---
> > > brick-fsid=0
> > diff -r /var/lib/glusterd/vols/gds-common/bricks/urd-gds-031:-urd-gds-gds-common /home/marcus/gds-common/bricks/urd-gds-031:-urd-gds-gds-common
> > 5c5
> > < listen-port=0
> > ---
> > > listen-port=60891
> > 11c11
> > < brick-fsid=0
> > ---
> > > brick-fsid=1088380223149770683
> > diff -r /var/lib/glusterd/vols/gds-common/cksum /home/marcus/gds-common/cksum
> > 1c1
> > < info=3948700922
> > ---
> > > info=458813151
> > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-030.urd-gds-gds-common.vol
> > 3c3
> > <     option shared-brick-count 1
> > ---
> > >     option shared-brick-count 0
> > diff -r /var/lib/glusterd/vols/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol /home/marcus/gds-common/gds-common.urd-gds-031.urd-gds-gds-common.vol
> > 3c3
> > <     option shared-brick-count 0
> > ---
> > >     option shared-brick-count 1
> > diff -r /var/lib/glusterd/vols/gds-common/info /home/marcus/gds-common/info
> > 23a24
> > > nfs.disable=on
> >
> >
> > I set up 3 virtual machines and configured them with gluster 10 (arbiter 1).
> > After that I upgraded to 11; the first 2 nodes were fine, but on the
> > third node I got the same behaviour: the brick never started.
> >
> > Thanks for the help!
> >
> > Regards
> > Marcus
> >
> >
> > On Mon, Feb 20, 2023 at 12:30:37PM +0100, Xavi Hernandez wrote:
> > > CAUTION: This email originated from outside of the organization. Do
> > > not click links or open attachments unless you recognize the sender and
> > > know the content is safe.
> > >
> > >
> > > Hi Marcus,
> > >
> > > On Mon, Feb 20, 2023 at 8:50 AM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > > Hi Xavi,
> > > I stopped glusterd, ran killall glusterd glusterfs glusterfsd,
> > > and started glusterd again.
> > >
> > > The only log that is not empty is glusterd.log; I attach the log
> > > from the restart time. The brick log, glustershd.log and
> > > glfsheal-gds-common.log are empty.
> > >
> > > These are the errors in the log:
> > > [2023-02-20 07:23:46.235263 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > [2023-02-20 07:23:47.359917 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-031
> > > [2023-02-20 07:23:47.438052 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gds-common differ. local cksum = 3017846959, remote cksum = 2065453698 on peer urd-gds-032
> > >
> > > Geo-replication is not set up, so I guess it is not strange that
> > > there is an error regarding georep.
> > > The checksum error seems natural, as the other nodes are
> > > still on version 10.
> > >
> > > No. The configurations should be identical.
> > >
> > > Can you try to compare volume definitions in
> > > /var/lib/glusterd/vols/gds-common between the upgraded server and one of
> > > the old ones?
> > >
> > > Regards,
> > >
> > > Xavi
> > >
> > >
> > > My previous experience with upgrades is that the local bricks start and
> > > gluster is up and running, with no connection to the other nodes until
> > > they are upgraded as well.
> > >
> > >
> > > gluster peer status gives the output:
> > > Number of Peers: 2
> > >
> > > Hostname: urd-gds-032
> > > Uuid: e6f96ad2-0fea-4d80-bd42-8236dd0f8439
> > > State: Peer Rejected (Connected)
> > >
> > > Hostname: urd-gds-031
> > > Uuid: 2d7c0ad7-dfcf-4eaf-9210-f879c7b406bf
> > > State: Peer Rejected (Connected)
> > >
> > > I guess that this is because the arbiter is on version 11
> > > and the other 2 nodes are on version 10.
> > >
> > > Please let me know if I can provide any other information
> > > to try to solve this issue.
> > >
> > > Many thanks!
> > > Marcus
> > >
> > >
> > > On Mon, Feb 20, 2023 at 07:29:20AM +0100, Xavi Hernandez wrote:
> > > >
> > > > Hi Marcus,
> > > >
> > > > These errors shouldn't prevent the bricks from starting. Isn't there
> > > > any other error or warning?
> > > >
> > > > Regards,
> > > >
> > > > Xavi
> > > >
> > > > On Fri, Feb 17, 2023 at 3:06 PM Marcus Pedersén <marcus.pedersen at slu.se> wrote:
> > > > Hi all,
> > > > I started an upgrade to gluster 11.0 from 10.3 on one of my clusters.
> > > > OS: Debian bullseye
> > > >
> > > > Volume Name: gds-common
> > > > Type: Replicate
> > > > Volume ID: 42c9fa00-2d57-4a58-b5ae-c98c349cfcb6
> > > > Status: Started
> > > > Snapshot Count: 0
> > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: urd-gds-031:/urd-gds/gds-common
> > > > Brick2: urd-gds-032:/urd-gds/gds-common
> > > > Brick3: urd-gds-030:/urd-gds/gds-common (arbiter)
> > > > Options Reconfigured:
> > > > cluster.granular-entry-heal: on
> > > > storage.fips-mode-rchecksum: on
> > > > transport.address-family: inet
> > > > nfs.disable: on
> > > > performance.client-io-threads: off
> > > >
> > > > I started with the arbiter node, stopped all Gluster processes,
> > > > upgraded to 11.0 and all went fine.
> > > > After the upgrade I was able to see the other nodes and
> > > > all nodes were connected.
> > > > After a reboot of the arbiter nothing works the way it should.
> > > > Both brick1 and brick2 are connected, but there is no connection
> > > > with the arbiter.
> > > > On the arbiter glusterd has started and is listening on port 24007;
> > > > the problem seems to be glusterfsd, which never starts!
> > > >
> > > > If I run: gluster volume status
> > > >
> > > > Status of volume: gds-common
> > > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > ------------------------------------------------------------------------------
> > > > Brick urd-gds-030:/urd-gds/gds-common       N/A       N/A        N       N/A
> > > > Self-heal Daemon on localhost               N/A       N/A        N       N/A
> > > >
> > > > Task Status of Volume gds-common
> > > > ------------------------------------------------------------------------------
> > > > There are no active volume tasks
> > > >
> > > >
> > > > In glusterd.log I find the following errors (arbiter node):
> > > > [2023-02-17 12:30:40.519585 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > > [2023-02-17 12:30:40.678031 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}]
> > > >
> > > > In brick/urd-gds-gds-common.log I find the following error:
> > > > [2023-02-17 12:30:43.550753 +0000] E [gf-io-uring.c:404:gf_io_uring_setup] 0-io: [MSGID:101240] Function call failed <{function=io_uring_setup()}, {error=12 (Cannot allocate memory)}>
> > > >
> > > > I enclose both logfiles.
> > > >
> > > > How do I resolve this issue??
> > > >
> > > > Many thanks in advance!!
> > > >
> > > > Marcus
> > > > ---
> > > > E-mailing SLU will result in SLU processing your personal data. For
> > > > more information on how this is done, click here
> > > > <https://www.slu.se/en/about-slu/contact-slu/personal-data/>