[Gluster-users] transport endpoint not connected on just 2 files
Kingsley Tart
gluster at gluster.dogwind.com
Tue Jun 7 11:50:20 UTC 2022
Hi,
Thanks - sorry for the late reply - I was suddenly swamped with other
work, and then it was a UK holiday.
I've tried rsync -A -X with the volume stopped, then restarted it. Will
see whether it heals.
Cheers,
Kingsley.
On Mon, 2022-05-30 at 18:41 +0000, Strahil Nikolov wrote:
> Make a backup from all bricks. Based on the info, two of the bricks
> have the same copy while brick C has another copy (gfid mismatch).
>
> I would use mtime to identify the latest version and use that, but I
> have no clue what kind of application you have.
>
> Usually it's not recommended to manipulate bricks directly, but in
> this case it might be necessary. The simplest way is to move the file
> on brick C (the only one that is different) away, but if you need
> exactly that one, you can rsync/scp it to the other 2 bricks.
>
>
> Best Regards,
> Strahil Nikolov
>
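The mtime comparison suggested above could be sketched roughly as follows. This is only an illustration: the function name `newest_copy` is made up for this example, the brick paths in the usage note are hypothetical, and it assumes the modification time stored on each brick really does reflect the application's last write.

```python
import os
from typing import Iterable


def newest_copy(paths: Iterable[str]) -> str:
    """Return the path whose on-disk modification time is the most recent.

    Each path is expected to be the same file as seen directly on a
    different brick (read-only inspection, nothing is modified).
    """
    return max(paths, key=lambda p: os.stat(p).st_mtime_ns)
```

For example, after copying the three brick-side copies somewhere safe, one might call `newest_copy(["/backup/brickA/gw3", "/backup/brickB/gw3", "/backup/brickC/gw3"])` (hypothetical backup paths) to see which copy was written last.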
> > On Fri, May 27, 2022 at 11:45, Kingsley Tart
> > <gluster at gluster.dogwind.com> wrote:
> > Hi, thanks.
> >
> > OK that's interesting. Picking one of the files, on bricks A and B
> > I see this (and all of the values are identical between bricks A
> > and B):
> >
> > trusted.afr.dirty=0x000000000000000000000000
> > trusted.afr.gw-runqueues-client-2=0x000000010000000200000000
> > trusted.gfid=0xa40bb83ff3784ae09c997d272296a7a9
> > trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d34
> > 6365642d623863632d6261353037333339646364372f677733
> > trusted.glusterfs.mdata=0x01000000000000000000000000628ec5770000000
> > 0007168bb00000000628ec576000000000000000000000000628ec5760000000000
> > 000000
> >
> > and on brick C I see this:
> >
> > trusted.gfid=0xd73992aee03e4021824b1baced973df3
> > trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d34
> > 6365642d623863632d6261353037333339646364372f677733
> > trusted.glusterfs.mdata=0x01000000000000000000000000628ec5230000000
> > 030136ca000000000628ec523000000000000000000000000628ec5230000000000
> > 000000
> >
> > So brick C is missing the trusted.afr attributes and the
> > trusted.gfid and mdata differ.
> >
> > What do I need to do to fix this?
> >
> > Cheers,
> > Kingsley.
> >
> > On Fri, 2022-05-27 at 03:59 +0000, Strahil Nikolov wrote:
> > > Check the file attributes on all bricks:
> > >
> > > getfattr -d -e hex -m. /data/brick/gw-runqueues/<path to file>
> > >
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > > On Thu, May 26, 2022 at 16:05, Kingsley Tart
> > > > <gluster at gluster.dogwind.com> wrote:
> > > > Hi,
> > > >
> > > > > I've got a strange issue where, on all four clients I've tested,
> > > > > I get "transport endpoint is not connected" on two files in a
> > > > > directory, whereas other files can be read fine.
> > > >
> > > > Any ideas?
> > > >
> > > > On one of the servers (all same version):
> > > >
> > > > # gluster --version
> > > > glusterfs 9.1
> > > >
> > > > > On one of the clients (same thing on all of them), the problem
> > > > > is with files "gw3" and "gw11":
> > > >
> > > > > [root@gw6 btl]# cd /mnt/runqueues/runners/
> > > > > [root@gw6 runners]# ls -la
> > > > ls: cannot access gw11: Transport endpoint is not connected
> > > > ls: cannot access gw3: Transport endpoint is not connected
> > > > total 8
> > > > drwxr-xr-x 2 root root 4096 May 26 09:48 .
> > > > drwxr-xr-x 13 root root 4096 Apr 12 2021 ..
> > > > -rw-r--r-- 1 root root 0 May 26 09:49 gw1
> > > > -rw-r--r-- 1 root root 0 May 26 09:49 gw10
> > > > -????????? ? ? ? ? ? gw11
> > > > -rw-r--r-- 1 root root 0 May 26 09:49 gw2
> > > > -????????? ? ? ? ? ? gw3
> > > > -rw-r--r-- 1 root root 0 May 26 09:49 gw4
> > > > -rw-r--r-- 1 root root 0 May 26 09:49 gw6
> > > > -rw-r--r-- 1 root root 0 May 26 09:49 gw7
> > > > > [root@gw6 runners]# cat *
> > > > cat: gw11: Transport endpoint is not connected
> > > > cat: gw3: Transport endpoint is not connected
> > > > > [root@gw6 runners]#
> > > >
> > > >
> > > > Querying on a server shows those two problematic files:
> > > >
> > > > # gluster volume heal gw-runqueues info
> > > > Brick gluster9a:/data/brick/gw-runqueues
> > > > /runners
> > > > /runners/gw11
> > > > /runners/gw3
> > > > Status: Connected
> > > > Number of entries: 3
> > > >
> > > > Brick gluster9b:/data/brick/gw-runqueues
> > > > /runners
> > > > /runners/gw11
> > > > /runners/gw3
> > > > Status: Connected
> > > > Number of entries: 3
> > > >
> > > > Brick gluster9c:/data/brick/gw-runqueues
> > > > Status: Connected
> > > > Number of entries: 0
> > > >
> > > >
> > > > > However, several hours later there's no obvious change. The
> > > > > servers have hardly any load and the volume is tiny. From a client:
> > > >
> > > > # find /mnt/runqueues | wc -l
> > > > 35
> > > >
> > > >
> > > > glfsheal-gw-runqueues.log from server gluster9a:
> > > > https://pastebin.com/7mPszBBM
> > > >
> > > > glfsheal-gw-runqueues.log from server gluster9b:
> > > > https://pastebin.com/rxXs5Tcv
> > > >
> > > >
> > > > Any pointers would be much appreciated!
> > > >
> > > > Cheers,
> > > > Kingsley.
> > > >
> > > > ________
> > > >
> > > >
> > > >
> > > > Community Meeting Calendar:
> > > >
> > > > Schedule -
> > > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > > Gluster-users mailing list
> > > > Gluster-users at gluster.org
> > > > https://lists.gluster.org/mailman/listinfo/gluster-users
> > >
> > > ________
> > >
> > >
> > >
> > > Community Meeting Calendar:
> > >
> > > Schedule -
> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users